- Docs Home
- About TiDB
- Quick Start
- Develop
- Overview
- Quick Start
- Build a TiDB Cluster in TiDB Cloud (Developer Tier)
- CRUD SQL in TiDB
- Build a Simple CRUD App with TiDB
- Example Applications
- Connect to TiDB
- Design Database Schema
- Write Data
- Read Data
- Transaction
- Optimize
- Troubleshoot
- Reference
- Cloud Native Development Environment
- Third-party Support
- Deploy
- Software and Hardware Requirements
- Environment Configuration Checklist
- Plan Cluster Topology
- Install and Start
- Verify Cluster Status
- Test Cluster Performance
- Migrate
- Overview
- Migration Tools
- Migration Scenarios
- Migrate from Aurora
- Migrate MySQL of Small Datasets
- Migrate MySQL of Large Datasets
- Migrate and Merge MySQL Shards of Small Datasets
- Migrate and Merge MySQL Shards of Large Datasets
- Migrate from CSV Files
- Migrate from SQL Files
- Migrate from One TiDB Cluster to Another TiDB Cluster
- Migrate from TiDB to MySQL-compatible Databases
- Advanced Migration
- Integrate
- Maintain
- Monitor and Alert
- Troubleshoot
- TiDB Troubleshooting Map
- Identify Slow Queries
- Analyze Slow Queries
- SQL Diagnostics
- Identify Expensive Queries Using Top SQL
- Identify Expensive Queries Using Logs
- Statement Summary Tables
- Troubleshoot Hotspot Issues
- Troubleshoot Increased Read and Write Latency
- Save and Restore the On-Site Information of a Cluster
- Troubleshoot Cluster Setup
- Troubleshoot High Disk I/O Usage
- Troubleshoot Lock Conflicts
- Troubleshoot TiFlash
- Troubleshoot Write Conflicts in Optimistic Transactions
- Troubleshoot Inconsistency Between Data and Indexes
- Performance Tuning
- Tuning Guide
- Configuration Tuning
- System Tuning
- Software Tuning
- SQL Tuning
- Overview
- Understanding the Query Execution Plan
- SQL Optimization Process
- Overview
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- Tutorials
- TiDB Tools
- Overview
- Use Cases
- Download
- TiUP
- Documentation Map
- Overview
- Terminology and Concepts
- Manage TiUP Components
- FAQ
- Troubleshooting Guide
- Command Reference
- Overview
- TiUP Commands
- TiUP Cluster Commands
- Overview
- tiup cluster audit
- tiup cluster check
- tiup cluster clean
- tiup cluster deploy
- tiup cluster destroy
- tiup cluster disable
- tiup cluster display
- tiup cluster edit-config
- tiup cluster enable
- tiup cluster help
- tiup cluster import
- tiup cluster list
- tiup cluster patch
- tiup cluster prune
- tiup cluster reload
- tiup cluster rename
- tiup cluster replay
- tiup cluster restart
- tiup cluster scale-in
- tiup cluster scale-out
- tiup cluster start
- tiup cluster stop
- tiup cluster template
- tiup cluster upgrade
- TiUP DM Commands
- Overview
- tiup dm audit
- tiup dm deploy
- tiup dm destroy
- tiup dm disable
- tiup dm display
- tiup dm edit-config
- tiup dm enable
- tiup dm help
- tiup dm import
- tiup dm list
- tiup dm patch
- tiup dm prune
- tiup dm reload
- tiup dm replay
- tiup dm restart
- tiup dm scale-in
- tiup dm scale-out
- tiup dm start
- tiup dm stop
- tiup dm template
- tiup dm upgrade
- TiDB Cluster Topology Reference
- DM Cluster Topology Reference
- Mirror Reference Guide
- TiUP Components
- PingCAP Clinic Diagnostic Service
- TiDB Operator
- Dumpling
- TiDB Lightning
- TiDB Data Migration
- About TiDB Data Migration
- Architecture
- Quick Start
- Deploy a DM cluster
- Tutorials
- Advanced Tutorials
- Maintain
- Cluster Upgrade
- Tools
- Performance Tuning
- Manage Data Sources
- Manage Tasks
- Export and Import Data Sources and Task Configurations of Clusters
- Handle Alerts
- Daily Check
- Reference
- Architecture
- Command Line
- Configuration Files
- OpenAPI
- Compatibility Catalog
- Secure
- Monitoring and Alerts
- Error Codes
- Glossary
- Example
- Troubleshoot
- Release Notes
- Backup & Restore (BR)
- TiDB Binlog
- TiCDC
- Dumpling
- sync-diff-inspector
- TiSpark
- Reference
- Cluster Architecture
- Key Monitoring Metrics
- Secure
- Privileges
- SQL
- SQL Language Structure and Syntax
- SQL Statements
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ADMIN SHOW TELEMETRY
ALTER DATABASE
ALTER INDEX
ALTER INSTANCE
ALTER PLACEMENT POLICY
ALTER TABLE
ALTER TABLE COMPACT
ALTER USER
ANALYZE TABLE
BACKUP
BATCH
BEGIN
CHANGE COLUMN
COMMIT
CHANGE DRAINER
CHANGE PUMP
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE PLACEMENT POLICY
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP PLACEMENT POLICY
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
LOAD DATA
LOAD STATS
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
RESTORE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [BACKUPS|RESTORES]
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CONFIG
SHOW CREATE PLACEMENT POLICY
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLACEMENT
SHOW PLACEMENT FOR
SHOW PLACEMENT LABELS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TABLE
TRACE
TRUNCATE
UPDATE
USE
WITH
- Data Types
- Functions and Operators
- Overview
- Type Conversion in Expression Evaluation
- Operators
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Locking Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- Set Operations
- List of Expressions for Pushdown
- TiDB Specific Functions
- Clustered Indexes
- Constraints
- Generated Columns
- SQL Mode
- Table Attributes
- Transactions
- Garbage Collection (GC)
- Views
- Partitioning
- Temporary Tables
- Cached Tables
- Character Set and Collation
- Placement Rules in SQL
- System Tables
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CLIENT_ERRORS_SUMMARY_BY_HOST
CLIENT_ERRORS_SUMMARY_BY_USER
CLIENT_ERRORS_SUMMARY_GLOBAL
CHARACTER_SETS
CLUSTER_CONFIG
CLUSTER_HARDWARE
CLUSTER_INFO
CLUSTER_LOAD
CLUSTER_LOG
CLUSTER_SYSTEMINFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DATA_LOCK_WAITS
DDL_JOBS
DEADLOCKS
ENGINES
INSPECTION_RESULT
INSPECTION_RULES
INSPECTION_SUMMARY
KEY_COLUMN_USAGE
METRICS_SUMMARY
METRICS_TABLES
PARTITIONS
PLACEMENT_POLICIES
PROCESSLIST
REFERENTIAL_CONSTRAINTS
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS
TIDB_HOT_REGIONS_HISTORY
TIDB_INDEXES
TIDB_SERVERS_INFO
TIDB_TRX
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
METRICS_SCHEMA
- UI
- TiDB Dashboard
- Overview
- Maintain
- Access
- Overview Page
- Cluster Info Page
- Top SQL Page
- Key Visualizer Page
- Metrics Relation Graph
- SQL Statements Analysis
- Slow Queries Page
- Cluster Diagnostics
- Search Logs Page
- Instance Profiling
- Session Management and Configuration
- FAQ
- CLI
- Command Line Flags
- Configuration File Parameters
- System Variables
- Storage Engines
- Telemetry
- Errors Codes
- Table Filter
- Schedule Replicas by Topology Labels
- FAQs
- Release Notes
- All Releases
- Release Timeline
- TiDB Versioning
- v6.1
- v6.0
- v5.4
- v5.3
- v5.2
- v5.1
- v5.0
- v4.0
- v3.1
- v3.0
- v2.1
- v2.0
- v1.0
- Glossary
Troubleshoot TiCDC
This document introduces the common errors you might encounter when using TiCDC, and the corresponding maintenance and troubleshooting methods.
In this document, the PD address specified in cdc cli
commands is --pd=http://10.0.10.25:2379
. When you use the command, replace the address with your actual PD address.
TiCDC replication interruptions
How do I know whether a TiCDC replication task is interrupted?
- Check the
changefeed checkpoint
monitoring metric of the replication task (choose the rightchangefeed id
) in the Grafana dashboard. If the metric value stays unchanged, or thecheckpoint lag
metric keeps increasing, the replication task might be interrupted. - Check the
exit error count
monitoring metric. If the metric value is greater than0
, an error has occurred in the replication task. - Execute
cdc cli changefeed list
andcdc cli changefeed query
to check the status of the replication task.stopped
means the task has stopped, and theerror
item provides the detailed error message. After the error occurs, you can searcherror on running processor
in the TiCDC server log to see the error stack for troubleshooting. - In some extreme cases, the TiCDC service is restarted. You can search the
FATAL
level log in the TiCDC server log for troubleshooting.
How do I know whether the replication task is stopped manually?
You can know whether the replication task is stopped manually by executing cdc cli
. For example:
cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
In the output of the above command, admin-job-type
shows the state of this replication task:
0
: In progress, which means that the task is not stopped manually.1
: Paused. When the task is paused, all replicatedprocessor
s exit. The configuration and the replication status of the task are retained, so you can resume the task fromcheckpiont-ts
.2
: Resumed. The replication task resumes fromcheckpoint-ts
.3
: Removed. When the task is removed, all replicatedprocessor
s are ended, and the configuration information of the replication task is cleared up. The replication status is retained only for later queries.
How do I handle replication interruptions?
A replication task might be interrupted in the following known scenarios:
The downstream continues to be abnormal, and TiCDC still fails after many retries.
In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of
gc-ttl
.Handling method: You can resume the replication task via the HTTP interface after the downstream is back to normal.
Replication cannot continue because of incompatible SQL statement(s) in the downstream.
- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of
gc-ttl
. - Handling procedures:
- Query the status information of the replication task using the
cdc cli changefeed query
command and record the value ofcheckpoint-ts
. - Use the new task configuration file and add the
ignore-txn-start-ts
parameter to skip the transaction corresponding to the specifiedstart-ts
. - Stop the old replication task via HTTP API. Execute
cdc cli changefeed create
to create a new task and specify the new task configuration file. Specifycheckpoint-ts
recorded in step 1 as thestart-ts
and start a new task to resume the replication.
- Query the status information of the replication task using the
- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of
In TiCDC v4.0.13 and earlier versions, when TiCDC replicates the partitioned table, it might encounter an error that leads to replication interruption.
- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of
gc-ttl
. - Handling procedures:
- Pause the replication task by executing
cdc cli changefeed pause -c <changefeed-id>
. - Wait for about one munite, and then resume the replication task by executing
cdc cli changefeed resume -c <changefeed-id>
.
- Pause the replication task by executing
- In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of
What should I do to handle the OOM that occurs after TiCDC is restarted after a task interruption?
- Update your TiDB cluster and TiCDC cluster to the latest versions. The OOM problem has already been resolved in v4.0.14 and later v4.0 versions, v5.0.2 and later v5.0 versions, and the latest versions.
How do I handle the Error 1298: Unknown or incorrect time zone: 'UTC'
error when creating the replication task or replicating data to MySQL?
This error is returned when the downstream MySQL does not load the time zone. You can load the time zone by running mysql_tzinfo_to_sql
. After loading the time zone, you can create tasks and replicate data normally.
mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
If the output of the command above is similar to the following one, the import is successful:
Enter password:
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
If the downstream is a special MySQL environment (a public cloud RDS or some MySQL derivative versions) and importing the time zone using the above method fails, you need to specify the MySQL time zone of the downstream using the time-zone
parameter in sink-uri
. You can first query the time zone used by MySQL:
Query the time zone used by MySQL:
show variables like '%time_zone%';
+------------------+--------+ | Variable_name | Value | +------------------+--------+ | system_time_zone | CST | | time_zone | SYSTEM | +------------------+--------+
Specify the time zone when you create the replication task and create the TiCDC service:
cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/?time-zone=CST" --pd=http://10.0.10.25:2379
NoteCST might be an abbreviation for the following four different time zones:
- Central Standard Time (USA) UT-6:00
- Central Standard Time (Australia) UT+9:30
- China Standard Time UT+8:00
- Cuba Standard Time UT-4:00
In China, CST usually stands for China Standard Time.
How do I handle the incompatibility issue of configuration files caused by TiCDC upgrade?
Refer to Notes for compatibility.
The start-ts
timestamp of the TiCDC task is quite different from the current time. During the execution of this task, replication is interrupted and an error [CDC:ErrBufferReachLimit]
occurs
Since v4.0.9, you can try to enable the unified sorter feature in your replication task, or use the BR tool for an incremental backup and restore, and then start the TiCDC replication task from a new time.
When the downstream of a changefeed is a database similar to MySQL and TiCDC executes a time-consuming DDL statement, all other changefeeds are blocked. How should I handle the issue?
- Pause the execution of the changefeed that contains the time-consuming DDL statement. Then you can see that other changefeeds are no longer blocked.
- Search for the
apply job
field in the TiCDC log and confirm thestart-ts
of the time-consuming DDL statement. - Manually execute the DDL statement in the downstream. After the execution finishes, go on performing the following operations.
- Modify the changefeed configuration and add the above
start-ts
to theignore-txn-start-ts
configuration item. - Resume the paused changefeed.
After I upgrade the TiCDC cluster to v4.0.8, the [CDC:ErrKafkaInvalidConfig]Canal requires old value to be enabled
error is reported when I execute a changefeed
Since v4.0.8, if the canal-json
, canal
or maxwell
protocol is used for output in a changefeed, TiCDC enables the old value feature automatically. However, if you have upgraded TiCDC from an earlier version to v4.0.8 or later, when the changefeed uses the canal-json
, canal
or maxwell
protocol and the old value feature is disabled, this error is reported.
To fix the error, take the following steps:
Set the value of
enable-old-value
in the changefeed configuration file totrue
.Execute
cdc cli changefeed pause
to pause the replication task.cdc cli changefeed pause -c test-cf --pd=http://10.0.10.25:2379
Execute
cdc cli changefeed update
to update the original changefeed configuration.cdc cli changefeed update -c test-cf --pd=http://10.0.10.25:2379 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml
Execute
cdc cli changfeed resume
to resume the replication task.cdc cli changefeed resume -c test-cf --pd=http://10.0.10.25:2379
The [tikv:9006]GC life time is shorter than transaction duration, transaction starts at xx, GC safe point is yy
error is reported when I use TiCDC to create a changefeed
You need to run the pd-ctl service-gc-safepoint --pd <pd-addrs>
command to query the current GC safepoint and service GC safepoint. If the GC safepoint is smaller than the start-ts
of the TiCDC replication task (changefeed), you can directly add the --disable-gc-check
option to the cdc cli create changefeed
command to create a changefeed.
If the result of pd-ctl service-gc-safepoint --pd <pd-addrs>
does not have gc_worker service_id
:
- If your PD version is v4.0.8 or earlier, refer to PD issue #3128 for details.
- If your PD is upgraded from v4.0.8 or an earlier version to a later version, refer to PD issue #3366 for details.
When I use TiCDC to replicate messages to Kafka, Kafka returns the Message was too large
error
For TiCDC v4.0.8 or earlier versions, you cannot effectively control the size of the message output to Kafka only by configuring the max-message-bytes
setting for Kafka in the Sink URI. To control the message size, you also need to increase the limit on the bytes of messages to be received by Kafka. To add such a limit, add the following configuration to the Kafka server configuration.
# The maximum byte number of a message that the broker receives
message.max.bytes=2147483648
# The maximum byte number of a message that the broker copies
replica.fetch.max.bytes=2147483648
# The maximum message byte number that the consumer side reads
fetch.message.max.bytes=2147483648
How can I find out whether a DDL statement fails to execute in downstream during TiCDC replication? How to resume the replication?
If a DDL statement fails to execute, the replication task (changefeed) automatically stops. The checkpoint-ts is the DDL statement's finish-ts minus one. If you want TiCDC to retry executing this statement in the downstream, use cdc cli changefeed resume
to resume the replication task. For example:
cdc cli changefeed resume -c test-cf --pd=http://10.0.10.25:2379
If you want to skip this DDL statement that goes wrong, set the start-ts of the changefeed to the checkpoint-ts (the timestamp at which the DDL statement goes wrong) plus one. For example, if the checkpoint-ts at which the DDL statement goes wrong is 415241823337054209
, execute the following commands to skip this DDL statement:
cdc cli changefeed update -c test-cf --pd=http://10.0.10.25:2379 --start-ts 415241823337054210
cdc cli changefeed resume -c test-cf --pd=http://10.0.10.25:2379
- TiCDC replication interruptions
- How do I handle the Error 1298: Unknown or incorrect time zone: 'UTC' error when creating the replication task or replicating data to MySQL?
- How do I handle the incompatibility issue of configuration files caused by TiCDC upgrade?
- The start-ts timestamp of the TiCDC task is quite different from the current time. During the execution of this task, replication is interrupted and an error [CDC:ErrBufferReachLimit] occurs
- When the downstream of a changefeed is a database similar to MySQL and TiCDC executes a time-consuming DDL statement, all other changefeeds are blocked. How should I handle the issue?
- After I upgrade the TiCDC cluster to v4.0.8, the [CDC:ErrKafkaInvalidConfig]Canal requires old value to be enabled error is reported when I execute a changefeed
- The [tikv:9006]GC life time is shorter than transaction duration, transaction starts at xx, GC safe point is yy error is reported when I use TiCDC to create a changefeed
- When I use TiCDC to replicate messages to Kafka, Kafka returns the Message was too large error
- How can I find out whether a DDL statement fails to execute in downstream during TiCDC replication? How to resume the replication?