Multi-table Join Queries

In many scenarios, you need a single query to get data from multiple tables. You can use the JOIN clause to combine the data from two or more tables.

Join types

This section describes the join types in detail.

INNER JOIN

An inner join returns only the rows that match the join condition.

Inner Join

For example, if you want to know the most prolific author, you need to join the author table named authors with the book author table named book_authors.


In the following SQL statement, the keyword JOIN declares that the rows of the left table authors and the right table book_authors are joined as an inner join with the join condition a.id = ba.author_id. The result set contains only the rows that satisfy the join condition. If an author has not written any books, their record in the authors table does not satisfy the join condition and therefore does not appear in the result set.

SELECT ANY_VALUE(a.id) AS author_id, ANY_VALUE(a.name) AS author_name, COUNT(ba.book_id) AS books
FROM authors a
JOIN book_authors ba ON a.id = ba.author_id
GROUP BY ba.author_id
ORDER BY books DESC
LIMIT 10;

The query results are as follows:

+------------+----------------+-------+
| author_id  | author_name    | books |
+------------+----------------+-------+
|  431192671 | Emilie Cassin  |     7 |
|  865305676 | Nola Howell    |     7 |
|  572207928 | Lamar Koch     |     6 |
| 3894029860 | Elijah Howe    |     6 |
| 1150614082 | Cristal Stehr  |     6 |
| 4158341032 | Roslyn Rippin  |     6 |
| 2430691560 | Francisca Hahn |     6 |
| 3346415350 | Leta Weimann   |     6 |
| 1395124973 | Albin Cole     |     6 |
| 2768150724 | Caleb Wyman    |     6 |
+------------+----------------+-------+
10 rows in set (0.01 sec)

In Java, the same query can be run as follows:

// Needs imports from java.sql (Connection, ResultSet, SQLException, Statement)
// and java.util (ArrayList, List). The text block requires Java 15 or later.
public List<Author> getTop10AuthorsOrderByBooks() throws SQLException {
    List<Author> authors = new ArrayList<>();
    // try-with-resources closes the result set, statement, and connection.
    try (Connection conn = ds.getConnection();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("""
             SELECT ANY_VALUE(a.id) AS author_id, ANY_VALUE(a.name) AS author_name, COUNT(ba.book_id) AS books
             FROM authors a
             JOIN book_authors ba ON a.id = ba.author_id
             GROUP BY ba.author_id
             ORDER BY books DESC
             LIMIT 10
             """)) {
        // Map each row of the result set to an Author object.
        while (rs.next()) {
            Author author = new Author();
            author.setId(rs.getLong("author_id"));
            author.setName(rs.getString("author_name"));
            author.setBooks(rs.getInt("books"));
            authors.add(author);
        }
    }
    return authors;
}

LEFT OUTER JOIN

A left outer join returns all the rows in the left table, together with the values from the right table that match the join condition. If a left-table row has no match in the right table, the right-table columns are filled with NULL.

Left Outer Join

In some cases, you want to query data from multiple tables, but you do not want rows to be dropped from the result set because the join condition is not met.

For example, on the homepage of the Bookshop app, you want to display a list of new books with average ratings. In this case, the new books may not have been rated by anyone yet. Using inner joins will cause the information of these unrated books to be filtered out, which is not what you expect.


In the following SQL statement, use the LEFT JOIN keyword to declare that the left table books will be joined to the right table ratings in a left outer join, thus ensuring that all rows in the books table are returned.

SELECT b.id AS book_id, ANY_VALUE(b.title) AS book_title, AVG(r.score) AS average_score
FROM books b
LEFT JOIN ratings r ON b.id = r.book_id
GROUP BY b.id
ORDER BY b.published_at DESC
LIMIT 10;

The query results are as follows:

+------------+---------------------------------+---------------+
| book_id    | book_title                      | average_score |
+------------+---------------------------------+---------------+
| 3438991610 | The Documentary of lion         |        2.7619 |
| 3897175886 | Torey Kuhn                      |        3.0000 |
| 1256171496 | Elmo Vandervort                 |        2.5500 |
| 1036915727 | The Story of Munchkin           |        2.0000 |
|  270254583 | Tate Kovacek                    |        2.5000 |
| 1280950719 | Carson Damore                   |        3.2105 |
| 1098041838 | The Documentary of grasshopper  |        2.8462 |
| 1476566306 | The Adventures of Vince Sanford |        2.3529 |
| 4036300890 | The Documentary of turtle       |        2.4545 |
| 1299849448 | Antwan Olson                    |        3.0000 |
+------------+---------------------------------+---------------+
10 rows in set (0.30 sec)

The most recently published book already has many ratings. To verify that the left outer join keeps unrated books, delete all the ratings of the book The Documentary of lion with the following SQL statement:

DELETE FROM ratings WHERE book_id = 3438991610;

Run the query again. The book The Documentary of lion still appears in the result set, but the average_score column, which is calculated from the score column of the right table ratings, is filled with NULL.

+------------+---------------------------------+---------------+
| book_id    | book_title                      | average_score |
+------------+---------------------------------+---------------+
| 3438991610 | The Documentary of lion         |          NULL |
| 3897175886 | Torey Kuhn                      |        3.0000 |
| 1256171496 | Elmo Vandervort                 |        2.5500 |
| 1036915727 | The Story of Munchkin           |        2.0000 |
|  270254583 | Tate Kovacek                    |        2.5000 |
| 1280950719 | Carson Damore                   |        3.2105 |
| 1098041838 | The Documentary of grasshopper  |        2.8462 |
| 1476566306 | The Adventures of Vince Sanford |        2.3529 |
| 4036300890 | The Documentary of turtle       |        2.4545 |
| 1299849448 | Antwan Olson                    |        3.0000 |
+------------+---------------------------------+---------------+
10 rows in set (0.30 sec)

What happens if you use INNER JOIN instead? Try it yourself.

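In Java, the same query can be run as follows:

// Needs imports from java.sql (Connection, ResultSet, SQLException, Statement)
// and java.util (ArrayList, List). The text block requires Java 15 or later.
public List<Book> getLatestBooksWithAverageScore() throws SQLException {
    List<Book> books = new ArrayList<>();
    // try-with-resources closes the result set, statement, and connection.
    try (Connection conn = ds.getConnection();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("""
             SELECT b.id AS book_id, ANY_VALUE(b.title) AS book_title, AVG(r.score) AS average_score
             FROM books b
             LEFT JOIN ratings r ON b.id = r.book_id
             GROUP BY b.id
             ORDER BY b.published_at DESC
             LIMIT 10
             """)) {
        while (rs.next()) {
            Book book = new Book();
            book.setId(rs.getLong("book_id"));
            book.setTitle(rs.getString("book_title"));
            // getFloat returns 0 when average_score is SQL NULL (an unrated book);
            // call rs.wasNull() right after it if you need to tell the two apart.
            book.setAverageScore(rs.getFloat("average_score"));
            books.add(book);
        }
    }
    return books;
}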

RIGHT OUTER JOIN

A right outer join returns all the rows in the right table, together with the values from the left table that match the join condition. If a right-table row has no match in the left table, the left-table columns are filled with NULL.

Right Outer Join
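
As a minimal sketch, the left outer join query from the previous section can be rewritten as a right outer join by swapping the two tables, so that books becomes the right table. It reuses the books and ratings tables and the columns from the earlier example, and it is expected to produce the same result as the LEFT JOIN version.

```sql
-- books is now the right table, so every book is kept
-- even when it has no matching row in ratings.
SELECT b.id AS book_id, ANY_VALUE(b.title) AS book_title, AVG(r.score) AS average_score
FROM ratings r
RIGHT JOIN books b ON r.book_id = b.id
GROUP BY b.id
ORDER BY b.published_at DESC
LIMIT 10;
```

In practice, many developers prefer to write such queries as left outer joins, because the table whose rows must all be kept then comes first in the FROM clause.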

CROSS JOIN

When the join condition is constant, the inner join between the two tables is called a cross join. A cross join joins every row of the left table to every row of the right table. If the left table has m rows and the right table has n rows, the result set contains m * n rows.
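
The m * n behavior can be checked with a small sketch against the Bookshop tables. The two derived tables and their LIMIT values are chosen only for illustration, and the example assumes that authors has at least two rows and books has at least three.

```sql
-- Each derived table is capped with LIMIT so the product stays small:
-- 2 author rows x 3 book rows = 6 rows in the result set.
SELECT a.name AS author_name, b.title AS book_title
FROM (SELECT name FROM authors LIMIT 2) a
CROSS JOIN (SELECT title FROM books LIMIT 3) b;
```

Running a cross join on the full tables would multiply their row counts, so it is worth limiting the inputs when experimenting.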

LEFT SEMI JOIN

TiDB does not support the LEFT SEMI JOIN table_name syntax at the SQL level. At the execution plan level, however, subquery-related optimizations use semi join as the default join method for the equivalent JOIN queries that subqueries are rewritten into.
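
As an illustration, a subquery such as the following is the kind of statement that the optimizer may execute as a semi join. The query is a sketch built on the authors and book_authors tables used earlier. Whether the plan actually contains a semi join operator depends on the TiDB version, the statistics, and the rewrite rules that apply, so treat the EXPLAIN output as something to inspect rather than a guaranteed result.

```sql
-- Authors who have written at least one book. No LEFT SEMI JOIN keyword is
-- written; the EXISTS subquery is what the optimizer may run as a semi join.
EXPLAIN SELECT a.id, a.name
FROM authors a
WHERE EXISTS (
    SELECT 1 FROM book_authors ba WHERE ba.author_id = a.id
);
```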

Implicit join

Before the explicit JOIN syntax was added to the SQL standard, you could join two or more tables by listing them in the FROM t1, t2 clause and specifying the join condition in the WHERE t1.id = t2.id clause. This is called an implicit join, and it joins the tables with an inner join.
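
The following sketch shows an implicit join next to its explicit form, using the authors and book_authors tables from the inner join example. The ORDER BY clause is added only so that both statements return the same ten rows.

```sql
-- Implicit join: tables listed in FROM, join condition in WHERE.
SELECT a.id AS author_id, a.name AS author_name, ba.book_id
FROM authors a, book_authors ba
WHERE a.id = ba.author_id
ORDER BY a.id, ba.book_id
LIMIT 10;

-- Explicit form of the same inner join.
SELECT a.id AS author_id, a.name AS author_name, ba.book_id
FROM authors a
JOIN book_authors ba ON a.id = ba.author_id
ORDER BY a.id, ba.book_id
LIMIT 10;
```

The explicit form keeps the join condition separate from other filters in the WHERE clause, which usually makes longer queries easier to read.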

Join algorithms

TiDB supports the following general table join algorithms:

- Index Join
- Hash Join
- Merge Join

The optimizer selects an appropriate join algorithm based on factors such as the data volume of the joined tables. You can use the EXPLAIN statement to see which join algorithm a query uses.

If the TiDB optimizer does not choose the optimal join algorithm, you can use Optimizer Hints to force TiDB to use a better one.

For example, assuming that the left join query above executes faster with the Hash Join algorithm, which the optimizer did not choose, you can add the hint /*+ HASH_JOIN(b, r) */ after the SELECT keyword. Note that if a table has an alias, you must use the alias in the hint.

EXPLAIN SELECT /*+ HASH_JOIN(b, r) */ b.id AS book_id, ANY_VALUE(b.title) AS book_title, AVG(r.score) AS average_score
FROM books b
LEFT JOIN ratings r ON b.id = r.book_id
GROUP BY b.id
ORDER BY b.published_at DESC
LIMIT 10;

Hints related to join algorithms:

- MERGE_JOIN
- INL_JOIN
- INL_HASH_JOIN
- HASH_JOIN

Join orders

In real business scenarios, statements that join multiple tables are very common. The execution efficiency of a join depends on the order in which the tables are joined. TiDB uses the Join Reorder algorithm to determine that order.

If the join order selected by the optimizer is not optimal, you can use STRAIGHT_JOIN to force TiDB to join the tables in the order in which they appear in the FROM clause.

EXPLAIN SELECT *
FROM authors a STRAIGHT_JOIN book_authors ba STRAIGHT_JOIN books b
WHERE b.id = ba.book_id AND ba.author_id = a.id;

For more information about the implementation details and limitations of the Join Reorder algorithm, see Introduction to Join Reorder Algorithm.