Docs Home
About TiDB Cloud
Get Started
Develop Applications
- Overview
- Quick Start
  - Build a TiDB Developer Cluster
  - CRUD SQL in TiDB
  - Build a Simple CRUD App with TiDB
    - Java
    - Golang
- Example Applications
  - Build a TiDB Application using Spring Boot
- Connect to TiDB
- Design Database Schema
- Write Data
- Read Data
- Transaction
- Optimize
  - Overview
  - SQL Performance Tuning
  - Best Practices for Performance Tuning
  - Best Practices for Indexing
  - Other Optimization Methods
    - Avoid Implicit Type Conversions
    - Unique Serial Number Generation
- Troubleshoot
- Reference
  - Bookshop Example Application
  - Guidelines
    - Object Naming Convention
    - SQL Development Specifications
- Cloud Native Development Environment
  - Gitpod
Manage Cluster
- Plan Your Cluster
- Create a TiDB Cluster
- Connect to Your TiDB Cluster
  - Connect via a SQL Client
  - Connect via SQL Shell
- Set Up VPC Peering Connections
- Use an HTAP Cluster with TiFlash
- Scale a TiDB Cluster
- Upgrade a TiDB Cluster
- Delete a TiDB Cluster
- Use TiDB Cloud API (Beta)
Migrate Data
Back Up and Restore
Monitor and Alert
- Overview
- Built-in Monitoring
- Built-in Alerting
- Third-Party Monitoring Integrations
  - Datadog Integration
  - Prometheus and Grafana Integration
Tune Performance
- Overview
- Analyze Performance
- SQL Tuning
  - Overview
  - Understanding the Query Execution Plan
  - SQL Optimization Process
    - Overview
    - Logic Optimization
    - Physical Optimization
    - Prepare Execution Plan Cache
  - Control Execution Plans
- TiKV Follower Read
- Coprocessor Cache
- Garbage Collection (GC)
  - Overview
  - Configuration
- Tune TiFlash performance
Manage User Access
- Manage Console User Access
- Configure Cluster Security Settings
Billing
Reference
FAQs
Release Notes
Support
Glossary

Distinct Optimization

This document introduces the distinct optimization in the TiDB query optimizer, including SELECT DISTINCT and DISTINCT in the aggregate functions.

`DISTINCT` modifier in `SELECT` statements

The DISTINCT modifier specifies removal of duplicate rows from the result set. SELECT DISTINCT is transformed to GROUP BY, for example:

mysql> explain SELECT DISTINCT a from t;
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id                       | estRows | task      | access object | operator info                                         |
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
| HashAgg_6                | 2.40    | root      |               | group by:test.t.a, funcs:firstrow(test.t.a)->test.t.a |
| └─TableReader_11         | 3.00    | root      |               | data:TableFullScan_10                                 |
|   └─TableFullScan_10     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
3 rows in set (0.00 sec)

`DISTINCT` option in aggregate functions

Usually, aggregate functions with the DISTINCT option is executed in the TiDB layer in a single-threaded execution model.

The tidb_opt_distinct_agg_push_down system variable or the distinct-agg-push-down configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV or TiFlash Coprocessor.

The tidb_opt_distinct_agg_push_down system variable in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV or TiFlash Coprocessor.

Take the following queries as an example of this optimization. tidb_opt_distinct_agg_push_down is disabled by default, which means the aggregate functions are executed in the TiDB layer. After enabling this optimization by setting its value to 1, the distinct a part of count(distinct a) is pushed to TiKV or TiFlash Coprocessor: there is a HashAgg_5 to remove the duplicated values on column a in the TiKV Coprocessor. It might reduce the computation overhead of HashAgg_8 in the TiDB layer.

mysql> desc select count(distinct a) from test.t;
+-------------------------+----------+-----------+---------------+------------------------------------------+
| id                      | estRows  | task      | access object | operator info                            |
+-------------------------+----------+-----------+---------------+------------------------------------------+
| StreamAgg_6             | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#4 |
| └─TableReader_10        | 10000.00 | root      |               | data:TableFullScan_9                     |
|   └─TableFullScan_9     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
+-------------------------+----------+-----------+---------------+------------------------------------------+
3 rows in set (0.01 sec)

mysql> set session tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> desc select count(distinct a) from test.t;
+---------------------------+----------+-----------+---------------+------------------------------------------+
| id                        | estRows  | task      | access object | operator info                            |
+---------------------------+----------+-----------+---------------+------------------------------------------+
| HashAgg_8                 | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#3 |
| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_5                           |
|   └─HashAgg_5             | 1.00     | cop[tikv] |               | group by:test.t.a,                       |
|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
+---------------------------+----------+-----------+---------------+------------------------------------------+
4 rows in set (0.00 sec)

Download PDF Request docs changes

What’s on this page

DISTINCT modifier in SELECT statements
DISTINCT option in aggregate functions

Was this page helpful?

Distinct Optimization

DISTINCT modifier in SELECT statements

DISTINCT option in aggregate functions

`DISTINCT` modifier in `SELECT` statements

`DISTINCT` option in aggregate functions