Migrate Data from CSV Files to TiDB
This document describes how to migrate data from CSV files to TiDB.
TiDB Lightning can read data from CSV files and other delimiter formats, such as tab-separated values (TSV). For other flat file data sources, you can also refer to this document and migrate data to TiDB.
Prerequisites
Step 1. Prepare the CSV files
Put all the CSV files in the same directory. If you need TiDB Lightning to recognize all CSV files, the file names should meet the following requirements:
- If a CSV file contains the data for an entire table, name the file `${db_name}.${table_name}.csv`.
- If the data of one table is separated into multiple CSV files, append a numeric suffix to these CSV files. For example, `${db_name}.${table_name}.003.csv`. The numeric suffixes can be inconsecutive but must be in ascending order. You also need to pad the numbers with extra zeros so that all the suffixes have the same length.
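For example, a valid data source directory for a hypothetical database `mydb` (all names below are illustrative) might look like this:

```shell
ls /data/csv-backup/
# mydb.ratings.csv        <- the entire `ratings` table in one file
# mydb.events.001.csv     <- the `events` table split across three files,
# mydb.events.002.csv        with zero-padded suffixes of equal length
# mydb.events.003.csv
```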
Step 2. Create the target table schema
Because CSV files do not contain schema information, before importing data from CSV files into TiDB, you need to create the target table schema. You can create the target table schema by either of the following two methods:
Method 1: create the target table schema using TiDB Lightning.
Create SQL files that contain the required DDL statements:
- Add `CREATE DATABASE` statements in the `${db_name}-schema-create.sql` files.
- Add `CREATE TABLE` statements in the `${db_name}.${table_name}-schema.sql` files.
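For example, for the hypothetical `mydb.ratings` table above, the two files might be created as follows (a minimal sketch; the table definition is illustrative):

```shell
# mydb-schema-create.sql: creates the database.
cat > mydb-schema-create.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS mydb;
EOF

# mydb.ratings-schema.sql: creates the table.
cat > mydb.ratings-schema.sql <<'EOF'
CREATE TABLE mydb.ratings (
    id BIGINT PRIMARY KEY,
    score INT NOT NULL,
    rated_at DATETIME
);
EOF
```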
Method 2: create the target table schema manually.
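For example, you can connect to the target cluster with the MySQL client and run the DDL statements yourself (a hedged sketch; the host and credentials are placeholders, and TiDB listens on port 4000 by default):

```shell
mysql -h 172.16.32.1 -P 4000 -u root -p -e "
CREATE DATABASE IF NOT EXISTS mydb;
CREATE TABLE IF NOT EXISTS mydb.ratings (
    id BIGINT PRIMARY KEY,
    score INT NOT NULL,
    rated_at DATETIME
);"
```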
Step 3. Create the configuration file
Create a `tidb-lightning.toml` file with the following content:
```toml
[lightning]
# Log
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# "local": Default backend. The local backend is recommended to import large volumes of data (1 TiB or more). During the import, the target TiDB cluster cannot provide any service.
# "tidb": The "tidb" backend is recommended to import data less than 1 TiB. During the import, the target TiDB cluster can provide service normally.
backend = "local"
# Set the temporary storage directory for the sorted key-value files. The directory must be empty, and the storage space must be greater than the size of the dataset to be imported. For better import performance, it is recommended to use a directory different from `data-source-dir` and use flash storage, which can use I/O exclusively.
sorted-kv-dir = "/mnt/ssd/sorted-kv-dir"

[mydumper]
# Directory of the data source.
data-source-dir = "${data-path}" # A local path or S3 path. For example, 's3://my-bucket/sql-backup'.

# Defines the CSV format.
[mydumper.csv]
# Field separator of the CSV file. Must not be empty. If the source file contains fields that are not string or numeric, such as binary, blob, or bit, it is recommended not to use simple delimiters such as ",", and to use an uncommon character combination like "|+|" instead.
separator = ','
# Quoting delimiter. Can be zero or multiple characters.
delimiter = '"'
# Configures whether the CSV file has a table header.
# If this item is set to true, TiDB Lightning uses the first line of the CSV file to parse the corresponding relationship of fields.
header = true
# Configures whether the CSV file contains NULL.
# If this item is set to true, any column of the CSV file cannot be parsed as NULL.
not-null = false
# If `not-null` is set to false (the CSV contains NULL),
# the following value is parsed as NULL.
null = '\N'
# Whether to treat the backslash ('\') in the string as an escape character.
backslash-escape = true
# Whether to trim the last separator at the end of each line.
trim-last-separator = false

[tidb]
# The target cluster.
host = ${host} # e.g.: 172.16.32.1
port = ${port} # e.g.: 4000
user = "${user_name}" # e.g.: "root"
password = "${password}" # e.g.: "rootroot"
status-port = ${status-port} # During the import, TiDB Lightning needs to obtain the table schema information from the TiDB status port. e.g.: 10080
pd-addr = "${ip}:${port}" # The address of the PD cluster, e.g.: 172.16.31.3:2379. TiDB Lightning obtains some information from PD. When backend = "local", you must specify status-port and pd-addr correctly. Otherwise, the import will be abnormal.
```
For more information on the configuration file, refer to TiDB Lightning Configuration.
Step 4. Tune the import performance (optional)
TiDB Lightning performs best when the CSV files you import have a uniform size of about 256 MiB. However, if you import data from a single large CSV file, TiDB Lightning can only use one thread to process the import by default, which might slow down the import speed.
To speed up the import, you can split a large CSV file into smaller ones, either manually (see the sketch at the end of this step) or automatically. For a CSV file in a common format, it is hard to quickly locate the beginning and ending positions of each line before reading the entire file, so TiDB Lightning does not automatically split CSV files by default. However, if your CSV files meet certain format requirements, you can enable the `strict-format` mode. In this mode, TiDB Lightning automatically splits a single large CSV file into multiple files of about 256 MiB each and processes them in parallel.
If a CSV file is not in the strict format but `strict-format` is mistakenly set to true, a field that spans multiple lines might be split into two fields. This causes the parsing to fail, or, worse, TiDB Lightning might quietly import the corrupted data without reporting any error.
In a strict-format CSV file, each field only takes up one line. It must meet the following requirements:
- The delimiter is empty.
- Each field does not contain CR (`\r`) or LF (`\n`).
If your CSV file meets the above requirements, you can speed up the import by enabling the strict-format mode as follows:
```toml
[mydumper]
strict-format = true
```
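If your files cannot meet these requirements, you can instead split a large CSV file manually before the import. The following is a minimal sketch using GNU coreutils `split`, assuming the file has no header line and no field contains a line break (the split is purely line-based, so a field spanning lines would be corrupted):

```shell
# Split into chunks of 1,000,000 lines each, producing
# mydb.mytable.000.csv, mydb.mytable.001.csv, ...
split -d -a 3 -l 1000000 --additional-suffix=.csv \
    mydb.mytable.csv mydb.mytable.
# Move the original file out of the data source directory afterwards,
# so that TiDB Lightning does not import the same data twice.
```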
Step 5. Import the data
To start the import, run `tidb-lightning`. If you launch the program in the command line, the process might exit unexpectedly after receiving a SIGHUP signal. In this case, it is recommended to run the program using `nohup` or the `screen` tool. For example:
```shell
nohup tiup tidb-lightning -config tidb-lightning.toml > nohup.out 2>&1 &
```
After the import starts, you can check the progress of the import by either of the following methods:
- `grep` the keyword `progress` in the log. The progress is updated every 5 minutes by default.
- Check progress in the monitoring dashboard.
- Check progress in the TiDB Lightning web interface.
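For example, a quick log-based check from the shell (assuming the log file configured above is in the current directory):

```shell
# Print the most recent progress line; updated every 5 minutes by default.
grep "progress" tidb-lightning.log | tail -n 1
```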
After TiDB Lightning completes the import, it exits automatically. Check whether `tidb-lightning.log` contains `the whole procedure completed` in the last lines. If yes, the import is successful. If no, the import encountered an error. Address the error as instructed in the error message.
Whether the import is successful or not, the last line of the log shows `tidb lightning exit`. It means that TiDB Lightning exited normally, but does not necessarily mean that the import was successful.
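For example, both checks can be done with `grep` (a minimal sketch using the log messages quoted above):

```shell
# Non-empty output means the import completed successfully:
grep "the whole procedure completed" tidb-lightning.log
# This line only confirms that the process exited, not that the import succeeded:
grep "tidb lightning exit" tidb-lightning.log
```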
If the import fails, refer to TiDB Lightning FAQ for troubleshooting.
Other file formats
If your data source is in another format, to migrate the data, you must end the file names with `.csv` and make corresponding changes in the `[mydumper.csv]` section of the `tidb-lightning.toml` configuration file. Here are example changes for common formats:
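For example, to rename hypothetical TSV dumps so that TiDB Lightning picks them up, keeping the `${db_name}.${table_name}` prefix intact:

```shell
# Rename every .tsv file in the data source directory to end with .csv.
for f in /data/source/*.tsv; do
    mv "$f" "${f%.tsv}.csv"
done
```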
TSV:
```toml
# Format example
# ID    Region    Count
# 1     East      32
# 2     South     NULL
# 3     West      10
# 4     North     39

# Format configuration
[mydumper.csv]
separator = "\t"
delimiter = ''
header = true
not-null = false
null = 'NULL'
backslash-escape = false
trim-last-separator = false
```
TPC-H DBGEN:
```toml
# Format example
# 1|East|32|
# 2|South|0|
# 3|West|10|
# 4|North|39|

# Format configuration
[mydumper.csv]
separator = '|'
delimiter = ''
header = false
not-null = true
backslash-escape = false
trim-last-separator = true
```