- Docs Home
- About TiDB Cloud
- Get Started
- Develop Applications
- Overview
- Quick Start
- Build a TiDB Developer Cluster
- CRUD SQL in TiDB
- Build a Simple CRUD App with TiDB
- Example Applications
- Connect to TiDB
- Design Database Schema
- Write Data
- Read Data
- Transaction
- Optimize
- Troubleshoot
- Reference
- Cloud Native Development Environment
- Manage Cluster
- Plan Your Cluster
- Create a TiDB Cluster
- Connect to Your TiDB Cluster
- Set Up VPC Peering Connections
- Use an HTAP Cluster with TiFlash
- Scale a TiDB Cluster
- Upgrade a TiDB Cluster
- Delete a TiDB Cluster
- Use TiDB Cloud API (Beta)
- Migrate Data
- Import Sample Data
- Migrate Data into TiDB
- Configure Amazon S3 Access and GCS Access
- Migrate from MySQL-Compatible Databases
- Migrate Incremental Data from MySQL-Compatible Databases
- Migrate from Amazon Aurora MySQL in Bulk
- Import or Migrate from Amazon S3 or GCS to TiDB Cloud
- Import CSV Files from Amazon S3 or GCS into TiDB Cloud
- Import Apache Parquet Files from Amazon S3 or GCS into TiDB Cloud
- Troubleshoot Access Denied Errors during Data Import from Amazon S3
- Export Data from TiDB
- Back Up and Restore
- Monitor and Alert
- Overview
- Built-in Monitoring
- Built-in Alerting
- Third-Party Monitoring Integrations
- Tune Performance
- Overview
- Analyze Performance
- SQL Tuning
- Overview
- Understanding the Query Execution Plan
- SQL Optimization Process
- Overview
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- TiKV Follower Read
- Coprocessor Cache
- Garbage Collection (GC)
- Tune TiFlash performance
- Manage User Access
- Billing
- Reference
- TiDB Cluster Architecture
- TiDB Cloud Cluster Limits and Quotas
- TiDB Limitations
- SQL
- Explore SQL with TiDB
- SQL Language Structure and Syntax
- SQL Statements
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ALTER DATABASE
ALTER INDEX
ALTER TABLE
ALTER TABLE COMPACT
ALTER USER
ANALYZE TABLE
BATCH
BEGIN
CHANGE COLUMN
COMMIT
CHANGE DRAINER
CHANGE PUMP
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TABLE
TRACE
TRUNCATE
UPDATE
USE
WITH
- Data Types
- Functions and Operators
- Overview
- Type Conversion in Expression Evaluation
- Operators
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Locking Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- Set Operations
- List of Expressions for Pushdown
- TiDB Specific Functions
- Clustered Indexes
- Constraints
- Generated Columns
- SQL Mode
- Table Attributes
- Transactions
- Views
- Partitioning
- Temporary Tables
- Cached Tables
- Character Set and Collation
- Read Historical Data
- System Tables
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CLIENT_ERRORS_SUMMARY_BY_HOST
CLIENT_ERRORS_SUMMARY_BY_USER
CLIENT_ERRORS_SUMMARY_GLOBAL
CHARACTER_SETS
CLUSTER_INFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DATA_LOCK_WAITS
DDL_JOBS
DEADLOCKS
ENGINES
KEY_COLUMN_USAGE
PARTITIONS
PROCESSLIST
REFERENTIAL_CONSTRAINTS
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS_HISTORY
TIDB_INDEXES
TIDB_SERVERS_INFO
TIDB_TRX
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
- System Variables
- API Reference
- Storage Engines
- Dumpling
- Table Filter
- Troubleshoot Inconsistency Between Data and Indexes
- FAQs
- Release Notes
- Support
- Glossary
TiFlash Overview
TiFlash is the key component that makes TiDB essentially an Hybrid Transactional/Analytical Processing (HTAP) database. As a columnar storage extension of TiKV, TiFlash provides both good isolation level and strong consistency guarantee.
In TiFlash, the columnar replicas are asynchronously replicated according to the Raft Learner consensus algorithm. When these replicas are read, the Snapshot Isolation level of consistency is achieved by validating Raft index and multi-version concurrency control (MVCC).
With TiDB Cloud, you can create an HTAP cluster easily by specifying one or more TiFlash nodes according to your HTAP workload. If the TiFlash node count is not specified when you create the cluster or you want to add more TiFlash nodes, you can change the node count by scaling the cluster.
Architecture
The above figure is the architecture of TiDB in its HTAP form, including TiFlash nodes.
TiFlash provides the columnar storage, with a layer of coprocessors efficiently implemented by ClickHouse. Similar to TiKV, TiFlash also has a Multi-Raft system, which supports replicating and distributing data in the unit of Region (see Data Storage for details).
TiFlash conducts real-time replication of data in the TiKV nodes at a low cost that does not block writes in TiKV. Meanwhile, it provides the same read consistency as in TiKV and ensures that the latest data is read. The Region replica in TiFlash is logically identical to those in TiKV, and is split and merged along with the Leader replica in TiKV at the same time.
TiFlash is compatible with both TiDB and TiSpark, which enables you to freely choose between these two computing engines.
It is recommended that you deploy TiFlash in different nodes from TiKV to ensure workload isolation. It is also acceptable to deploy TiFlash and TiKV in the same node if no business isolation is required.
Currently, data cannot be written directly into TiFlash. You need to write data in TiKV and then replicate it to TiFlash, because it connects to the TiDB cluster as a Learner role. TiFlash supports data replication in the unit of table, but no data is replicated by default after deployment. To replicate data of a specified table, see Create TiFlash replicas for tables.
TiFlash has three components: the columnar storage module, tiflash proxy
, and pd buddy
. tiflash proxy
is responsible for the communication using the Multi-Raft consensus algorithm. pd buddy
works with PD to replicate data from TiKV to TiFlash in the unit of table.
When TiDB receives the DDL command to create replicas in TiFlash, the pd buddy
component acquires the information of the table to be replicated via the status port of TiDB, and sends the information to PD. Then PD performs the corresponding data scheduling according to the information provided by pd buddy
.
Key features
TiFlash has the following key features:
Asynchronous replication
The replica in TiFlash is asynchronously replicated as a special role, Raft Learner. This means when the TiFlash node is down or high network latency occurs, applications in TiKV can still proceed normally.
This replication mechanism inherits two advantages of TiKV: automatic load balancing and high availability.
- TiFlash does not rely on additional replication channels, but directly receives data from TiKV in a many-to-many manner.
- As long as the data is not lost in TiKV, you can restore the replica in TiFlash at any time.
Consistency
TiFlash provides the same Snapshot Isolation level of consistency as TiKV, and ensures that the latest data is read, which means that you can read the data previously written in TiKV. Such consistency is achieved by validating the data replication progress.
Every time TiFlash receives a read request, the Region replica sends a progress validation request (a lightweight RPC request) to the Leader replica. TiFlash performs the read operation only after the current replication progress includes the data covered by the timestamp of the read request.
Intelligent choice
TiDB can automatically choose to use TiFlash (column-wise) or TiKV (row-wise), or use both of them in one query to ensure the best performance.
This selection mechanism is similar to that of TiDB which chooses different indexes to execute query. TiDB optimizer makes the appropriate choice based on statistics of the read cost.
Computing acceleration
TiFlash accelerates the computing of TiDB in two ways:
- The columnar storage engine is more efficient in performing read operation.
- TiFlash shares part of the computing workload of TiDB.
TiFlash shares the computing workload in the same way as the TiKV Coprocessor does: TiDB pushes down the computing that can be completed in the storage layer. Whether the computing can be pushed down depends on the support of TiFlash. For details, see Supported pushdown calculations.
Use TiFlash
After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.
You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:
To experience the whole process from importing data to querying in a TPC-H dataset, refer to Quick Start Guide for TiDB HTAP.
See also
- To deploy a new cluster with TiFlash nodes, see Deploy a TiDB cluster using TiUP.
- To add a TiFlash node in a deployed cluster, see Scale out a TiFlash cluster.
- Maintain a TiFlash cluster.
- Tune TiFlash performance.
- Configure TiFlash.
- Monitor the TiFlash cluster.
- Learn TiFlash alert rules.
- Troubleshoot a TiFlash cluster.
- Supported push-down calculations in TiFlash
- Data validation in TiFlash
- TiFlash compatibility