- Docs Home
- About TiDB Cloud
- Get Started
- Develop Applications
- Overview
- Quick Start
- Build a TiDB Developer Cluster
- CRUD SQL in TiDB
- Build a Simple CRUD App with TiDB
- Example Applications
- Connect to TiDB
- Design Database Schema
- Write Data
- Read Data
- Transaction
- Optimize
- Troubleshoot
- Reference
- Cloud Native Development Environment
- Manage Cluster
- Plan Your Cluster
- Create a TiDB Cluster
- Connect to Your TiDB Cluster
- Set Up VPC Peering Connections
- Use an HTAP Cluster with TiFlash
- Scale a TiDB Cluster
- Upgrade a TiDB Cluster
- Delete a TiDB Cluster
- Use TiDB Cloud API (Beta)
- Migrate Data
- Import Sample Data
- Migrate Data into TiDB
- Configure Amazon S3 Access and GCS Access
- Migrate from MySQL-Compatible Databases
- Migrate Incremental Data from MySQL-Compatible Databases
- Migrate from Amazon Aurora MySQL in Bulk
- Import or Migrate from Amazon S3 or GCS to TiDB Cloud
- Import CSV Files from Amazon S3 or GCS into TiDB Cloud
- Import Apache Parquet Files from Amazon S3 or GCS into TiDB Cloud
- Troubleshoot Access Denied Errors during Data Import from Amazon S3
- Export Data from TiDB
- Back Up and Restore
- Monitor and Alert
- Overview
- Built-in Monitoring
- Built-in Alerting
- Third-Party Monitoring Integrations
- Tune Performance
- Overview
- Analyze Performance
- SQL Tuning
- Overview
- Understanding the Query Execution Plan
- SQL Optimization Process
- Overview
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- TiKV Follower Read
- Coprocessor Cache
- Garbage Collection (GC)
- Tune TiFlash performance
- Manage User Access
- Billing
- Reference
- TiDB Cluster Architecture
- TiDB Cloud Cluster Limits and Quotas
- TiDB Limitations
- SQL
- Explore SQL with TiDB
- SQL Language Structure and Syntax
- SQL Statements
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ALTER DATABASE
ALTER INDEX
ALTER TABLE
ALTER TABLE COMPACT
ALTER USER
ANALYZE TABLE
BATCH
BEGIN
CHANGE COLUMN
COMMIT
CHANGE DRAINER
CHANGE PUMP
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TABLE
TRACE
TRUNCATE
UPDATE
USE
WITH
- Data Types
- Functions and Operators
- Overview
- Type Conversion in Expression Evaluation
- Operators
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Locking Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- Set Operations
- List of Expressions for Pushdown
- TiDB Specific Functions
- Clustered Indexes
- Constraints
- Generated Columns
- SQL Mode
- Table Attributes
- Transactions
- Views
- Partitioning
- Temporary Tables
- Cached Tables
- Character Set and Collation
- Read Historical Data
- System Tables
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CLIENT_ERRORS_SUMMARY_BY_HOST
CLIENT_ERRORS_SUMMARY_BY_USER
CLIENT_ERRORS_SUMMARY_GLOBAL
CHARACTER_SETS
CLUSTER_INFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DATA_LOCK_WAITS
DDL_JOBS
DEADLOCKS
ENGINES
KEY_COLUMN_USAGE
PARTITIONS
PROCESSLIST
REFERENTIAL_CONSTRAINTS
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS_HISTORY
TIDB_INDEXES
TIDB_SERVERS_INFO
TIDB_TRX
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
- System Variables
- API Reference
- Storage Engines
- Dumpling
- Table Filter
- Troubleshoot Inconsistency Between Data and Indexes
- FAQs
- Release Notes
- Support
- Glossary
Non-Transactional DML Statements
This document describes the usage scenarios, usage methods, and restrictions of non-transactional DML statements in TiDB. In addition, the implementation principle and common issues are also explained.
A non-transactional DML statement is a DML statement split into multiple SQL statements (which is, multiple batches) to be executed in sequence. It enhances the performance and ease of use in batch data processing at the expense of transactional atomicity and isolation.
Non-transactional DML statements include INSERT
, UPDATE
, and DELETE
, of which TiDB currently only supports DELETE
. For detailed syntax, see BATCH
.
A non-transactional DML statement does not guarantee the atomicity and isolation of the statement, and is not equivalent to the original DML statement.
Usage scenarios
In the scenarios of large data processing, you might often need to perform same operations on a large batch of data. If the operation is performed directly using a single SQL statement, the transaction size might exceed the limit and affect the execution performance.
Batch data processing often has no overlap of time or data with the online application operations. Isolation (I in ACID) is unnecessary when no concurrent operations exist. Atomicity is also unnecessary if bulk data operations are idempotent or easily retryable. If your application needs neither data isolation nor atomicity, you can consider using non-transactional DML statements.
Non-transactional DML statements are used to bypass the size limit on large transactions in certain scenarios. One statement is used to complete tasks that would otherwise require manually splitting of transactions, with higher execution efficiency and less resource consumption.
For example, to delete expired data, if you ensure that no application will access the expired data, you can use a non-transactional DML statement to improve the DELETE
performance.
Prerequisites
Before using non-transactional DML statements, make sure that the following conditions are met:
- The statement does not require atomicity, which permits some rows to be modified and some rows to remain unmodified in the execution result.
- The statement is idempotent, or you are prepared to retry on a part of the data according to the error message. If the system variables are set to
tidb_redact_log = 1
andtidb_nontransactional_ignore_error = 1
, this statement must be idempotent. Otherwise, when the statement partially fails, the failed part cannot be accurately located. - The data to be operated on has no other concurrent writes, which means it is not updated by other statements at the same time. Otherwise, unexpected results such as missing deletions and wrong deletions might occur.
- The statement does not modify the data to be read by the statement itself. Otherwise, the following batch will read the data written by the previous batch and easily causes unexpected results.
- The statement meets the restrictions.
- It is not recommended to perform concurrent DDL operations on the table to be read or written by this DML statement.
WARNING:
If
tidb_redact_log
andtidb_nontransactional_ignore_error
are enabled at the same time, you might not get the complete error information of each batch, and you cannot retry the failed batch only. Therefore, if both of the system variables are turned on, the non-transactional DML statement must be idempotent.
Usage examples
Use a non-transactional DML statement
The following sections describe the use of non-transactional DML statements with examples:
Create a table t
with the following schema:
CREATE TABLE t (id INT, v INT, KEY(id));
Query OK, 0 rows affected
Insert some data into table t
.
INSERT INTO t VALUES (1, 2), (2, 3), (3, 4), (4, 5), (5, 6);
Query OK, 5 rows affected
The following operation uses a non-transactional DML statement to delete rows with values less than the integer 6 on column v
of table t
. This statement is split into two SQL statements, with a batch size of 2, divided by the id
column and executed.
BATCH ON id LIMIT 2 DELETE FROM t WHERE v < 6;
+----------------+---------------+
| number of jobs | job status |
+----------------+---------------+
| 2 | all succeeded |
+----------------+---------------+
1 row in set
Check the deletion results of the above non-transactional DML statement.
SELECT * FROM t;
+----+---+
| id | v |
+----+---+
| 5 | 6 |
+----+---+
1 row in set
Check the execution progress
During the execution of a non-transactional DML statement, you can view the progress using SHOW PROCESSLIST
. The Time
field in the returned result indicates the time consumption of the current batch execution. Logs and slow logs also record the progress of each split statement throughout the non-transactional DML execution. For example:
SHOW PROCESSLIST;
+------+------+--------------------+--------+---------+------+------------+----------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+--------------------+--------+---------+------+------------+----------------------------------------------------------------------------------------------------+
| 1203 | root | 100.64.10.62:52711 | test | Query | 0 | autocommit | /* job 506/500000 */ DELETE FROM `test`.`t1` WHERE `test`.`t1`.`_tidb_rowid` BETWEEN 2271 AND 2273 |
| 1209 | root | 100.64.10.62:52735 | <null> | Query | 0 | autocommit | show full processlist |
+------+------+--------------------+--------+---------+------+------------+----------------------------------------------------------------------------------------------------+
Terminate a non-transactional DML statement
To terminate a non-transactional DML statement, you can use KILL TIDB
. Then TiDB will cancel all batches after the batch that is currently being executed. You can get the execution result from the log.
Query the batch-dividing statement
During the execution of a non-transactional DML statement, a statement is internally used to divide the DML statement into multiple batches. To query this batch-dividing statement, you can add DRY RUN QUERY
to this non-transactional DML statement. Then TiDB will not execute this query and the subsequent DML operations.
The following statement queries the batch-dividing statement during the execution of BATCH ON id LIMIT 2 DELETE FROM t WHERE v < 6
:
BATCH ON id LIMIT 2 DRY RUN QUERY DELETE FROM t WHERE v < 6;
+--------------------------------------------------------------------------------+
| query statement |
+--------------------------------------------------------------------------------+
| SELECT `id` FROM `test`.`t` WHERE (`v` < 6) ORDER BY IF(ISNULL(`id`),0,1),`id` |
+--------------------------------------------------------------------------------+
1 row in set
Query the statements corresponding to the first and the last batches
To query the actual DML statements corresponding to the first and the last batches in a non-transactional DML statement, you can add DRY RUN
to this non-transactional DML statement. Then, TiDB only divides batches and does not execute these SQL statements. Because there might be many batches, not all batches are displayed, and only the first one and the last one are displayed.
BATCH ON id LIMIT 2 DRY RUN DELETE FROM t WHERE v < 6;
+-------------------------------------------------------------------+
| split statement examples |
+-------------------------------------------------------------------+
| DELETE FROM `test`.`t` WHERE (`id` BETWEEN 1 AND 2 AND (`v` < 6)) |
| DELETE FROM `test`.`t` WHERE (`id` BETWEEN 3 AND 4 AND (`v` < 6)) |
+-------------------------------------------------------------------+
2 rows in set
Use the optimizer hint
If an optimizer hint is originally supported in the DELETE
statement, the optimizer hint is also supported in the non-transactional DELETE
statement. The position of the hint is the same as that in the ordinary DELETE
statement:
BATCH ON id LIMIT 2 DELETE /*+ USE_INDEX(t)*/ FROM t WHERE v < 6;
Best practices
To use a non-transactional DML statement, the following steps are recommended:
- Select an appropriate dividing column. Integer or string types are recommended.
- (Optional) Add
DRY RUN QUERY
to the non-transactional DML statement, execute the query manually, and confirm whether the data range affected by the DML statement is roughly correct. - (Optional) Add
DRY RUN
to the non-transactional DML statement, execute the query manually, and check the split statements and the execution plans. You need to pay attention to the index selection efficiency. - Execute the non-transactional DML statement.
- If an error is reported, get the specific failed data range from the error message or log, and retry or handle it manually.
Parameter description
Parameter | Description | Default value | Required or not | Recommended value |
---|---|---|---|---|
Dividing column | The column used to divide batches, such as the id column in the above non-transactional DML statement BATCH ON id LIMIT 2 DELETE FROM t WHERE v < 6 . | TiDB tries to automatically select a dividing column. | No | Select a column that can meet the WHERE condition in the most efficient way. |
Batch size | Used to control the size of each batch. The number of batches is the number of SQL statements into which DML operations are split, such as LIMIT 2 in the above non-transactional DML statement BATCH ON id LIMIT 2 DELETE FROM t WHERE v < 6 . The more batches, the smaller the batch size. | N/A | Yes | 1000-1000000. Too small or too large a batch will lead to performance degradation. |
How to select a dividing column
A non-transactional DML statement uses a column as the basis for data batching, which is the dividing column. For higher execution efficiency, a dividing column is required to use index. The execution efficiency brought by different indexes and dividing columns might vary by dozens of times. When choosing the dividing column, consider the following suggestions:
- If you know the application data distribution, according to the
WHERE
condition, choose the column that divides data with smaller ranges after the batching.- Ideally, the
WHERE
condition can take advantage of the index of the dividing column to reduce the amount of data to be scanned per batch. For example, there is a transaction table that records the start and end time of each transaction, and you want to delete all transaction records whose end time is before one month. If there is an index on the start time of the transaction, and the start and end times of the transaction are relatively close, then you can choose the start time column as the dividing column. - In a less-than-ideal case, the data distribution of the dividing column is completely independent of the
WHERE
condition, and the index of the dividing column cannot be used to reduce the scope of the data scan.
- Ideally, the
- When a clustered index exists, it is recommended to use the primary key (including an
INT
primary key and_tidb_rowid
) as the dividing column, so that the execution efficiency is higher. - Choose the column with fewer duplicate values.
You can also choose not to specify a dividing column. Then, TiDB will use the first column of handle
as the dividing column by default. But if the first column of the primary key of the clustered index is of a data type not supported by non-transactional DML statements (which is ENUM
, BIT
, SET
, JSON
), TiDB will report an error. You can choose an appropriate dividing column according to your application needs.
How to set batch size
In non-transactional DML statements, the larger the batch size, the fewer SQL statements are split and the slower each SQL statement is executed. The optimal batch size depends on the workload. It is recommended to start from 50000. Either too small or too large batch sizes will cause decreased execution efficiency.
The information of each batch is stored in memory, so too many batches can significantly increase memory consumption. This explains why the batch size cannot be too small. The upper limit of memory consumed by non-transactional statements for storing batch information is the same as tidb_mem_quota_query
, and the action triggered when this limit is exceeded is determined by the configuration item tidb_mem_oom_action
.
Restrictions
The following are hard restrictions on non-transactional DML statements. If these restrictions are not met, TiDB will report an error.
- You can only operate on a single table. Multi-table joins are currently not supported.
- The DML statements cannot contain
ORDER BY
orLIMIT
clauses. - The dividing column must be indexed. The index can be a single-column index, or the first column of a joint index.
- Must be used in the
autocommit
mode. - Cannot be used when batch-dml is enabled.
- Cannot be used when [
tidb_snapshot
](/read-historical-data.md#operation flow) is set. - Cannot be used with the
prepare
statement. ENUM
,BIT
,SET
,JSON
types are not supported as the dividing columns.- Not supported for temporary tables.
- Common Table Expression is not supported.
Control batch execution failure
Non-transactional DML statements do not satisfy atomicity. Some batches might succeed and some might fail. The system variable tidb_nontransactional_ignore_error
controls how the non-transactional DML statements handle errors.
An exception is that if the first batch fails, there is a high probability that the statement itself is wrong. In this case, the entire non-transactional statement will directly return an error.
How it works
The working principle of non-transactional DML statements is to build into TiDB the automatic splitting of SQL statements. Without non-transactional DML statements, you will need to manually split the SQL statements. To understand the behavior of a non-transactional DML statement, think of it as a user script doing the following tasks:
For the non-transactional DML BATCH ON $C$ LIMIT $N$ DELETE FROM ... WHERE $P$
, is the column used for dividing, is the batch size, and is the filter condition.
- According to the filter condition of the original statement and the specified column for dividing, TiDB queries all that satisfy . TiDB sorts these into groups according to . For each of all , TiDB keeps its first and last as and . The query statement executed in this step can be viewed through
DRY RUN QUERY
. - The data involved in is a subset that satisfies : BETWEEN AND . You can use to narrow down the range of data that each batch needs to process.
- For , TiDB embeds the above condition into the
WHERE
condition of the original statement, which makes it WHERE () AND (). The execution result of this step can be viewed throughDRY RUN
. - For all batches, execute new statements in sequence. The errors for each grouping are collected and combined, and returned as the result of the entire non-transactional DML statement after all groupings are complete.
Comparison with batch-dml
batch-dml is a mechanism for splitting a transaction into multiple transaction commits during the execution of a DML statement.
It is not recommended to use batch-dml. When the batch-dml feature is not properly used, there is a risk of data index inconsistency. batch-dml will be deprecated in a later release of TiDB.
Non-transactional DML statements are not yet a replacement for all batch-dml usage scenarios. Their main differences are as follows:
Performance: When the dividing column is efficient, the performance of non-transactional DML statements is close to that of batch-dml. When the dividing column is less efficient, the performance of non-transactional DML statements is significantly lower than that of batch-dml.
Stability: batch-dml is prone to data index inconsistencies due to improper use. Non-transactional DML statements do not cause data index inconsistencies. However, when used improperly, non-transactional DML statements are not equivalent to the original statements, and the applications might observe unexpected behavior. See the common issues section for details.
Common issues
The actual batch size is not the same as the specified batch size
During the execution of a non-transactional DML statement, the size of data to be processed in the last batch might be smaller than the specified batch size.
When duplicated values exist in the dividing column, each batch will contain all the duplicated values of the last element of the dividing column in this batch. Therefore, the number of rows in this batch might be greater than the specified batch size.
In addition, when other concurrent writes occur, the number of rows processed in each batch might be different from the specified batch size.
The Failed to restore the delete statement, probably because of unsupported type of the shard column
error occurs during execution
The dividing column does not support ENUM
, BIT
, SET
, JSON
types. Try to specify a new dividing column. It is recommended to use an integer or string type column.
If the error occurs when the selected dividing column is not one of these unsupported types, contact PingCAP technical support.
Non-transactional DELETE
has "exceptional" behavior that is not equivalent to ordinary DELETE
A non-transactional DML statement is not equivalent to the original form of this DML statement, which might have the following reasons:
- There are other concurrent writes.
- The non-transactional DML statement modifies a value that the statement itself will read.
- The SQL statement executed in each batch might cause a different execution plan and expression calculation order because the
WHERE
condition is changed. Therefore, the execution result might be different from the original statement. - The DML statements contain non-deterministic operations.
MySQL compatibility
Non-transactional statements are TiDB-specific and are not compatible with MySQL.
See also
- The
BATCH
syntax tidb_nontransactional_ignore_error