Breaking the Limits of Relational Databases: An Analysis of Cloud-Native Database Middleware (1)

Sharding

Regarding performance and availability, traditional solutions that store data on a single data node in a centralized manner can no longer adapt to the massive data scenarios created by the Internet. Most relational database products use B+ tree indexes. When the data volume exceeds the threshold, the increase in the index depth leads to an increased disk I/O count, the substantially degrading query performance. In addition, highly concurrent access requests also turn the centralized database into the biggest bottleneck of the system.

1. Vertical Sharding

Vertical sharding is also known as vertical partitioning. Its key idea is the use of different databases for different purposes. Before sharding is performed, a database can consist of multiple data tables that correspond to different businesses. After sharding is performed, the tables are organized according to business and distributed to different databases, balancing the workload among different databases, as shown below:

2. Horizontal Sharding

Horizontal sharding is also known as horizontal partitioning. In contrast to vertical sharding, horizontal sharding does not organize data by business logic. Instead, it distributes data to multiple databases or tables according to a rule of a specific field, and each shard contains only part of the data.

  1. In NewSQL with the new architecture, the database storage engine is redesigned to store the data from the same table in a distributed file system.
  2. In the sharding middleware, the impacts of sharding are transparent to users, allowing them to use a horizontally sharded database as a common database.

3. Read/Write Separation

Database throughput is challenged by a huge bottleneck due to increasing system access traffic. For applications with a large number of concurrent reads and few writes, you can split a single database into primary and secondary databases. The primary database is used for the addition, deletion, and modification of transactions, while the secondary database is for queries. This effectively prevents the row locking problem caused by data updates and dramatically improves the query performance of the entire system.

4. Key Processes

Sharding consists of the following processes: statement parsing, statement routing, statement modification, statement execution, and result aggregation. Database protocol adaptation is essential to ensure low-cost access by original applications.

Protocol Adaptation

In addition to SQL, NewSQL is compatible with the protocols for traditional relational databases, reducing access costs for users. Open source relational database products act as native relational databases by implementing the NewSQL protocol.

  1. The payload length is of the int❤> type. It indicates the total number of bytes occupied by the subsequent payload. Note that the payload length does not include the length of the sequence ID.
  2. The sequence ID is of the int<1> type. It indicates the serial number of each MySQL packet returned for a request. The maximum sequence ID that occupies one byte is 0xff, that is, 255 in decimal notation. However, this does not imply that a request can only contain up to 255 MySQL packets. If the sequence ID exceeds 255, the sequence ID restarts from zero. For example, hundreds of thousands of records may be returned for a request. In this case, the MySQL packets only need to ensure that their sequence IDs are continuous. If the sequence ID exceeds 255, it is reset and restarts from zero.
  3. The length of the payload is the bytes declared by the payload length. In a MySQL packet, the payload is the actual business data. The content of the payload varies with the packet type.

SQL Parsing

Although SQL is relatively simple compared to other programming languages, it is still a complete programming language. Therefore, it essentially works in the same way as other languages in terms of parsing SQL grammar and parsing other languages (such as Java, C, and Go).

Request Routing

The sharding strategy is to match databases and tables according to the parsing context and generate the routing path. SQL routing with sharding keys can be divided into single-shard routing (where the equal mark is used as the sharding operator), multi-shard routing (where IN is used as the sharding operator), and range routing (where BETWEEN is used as the sharding operator). SQL statements without sharding keys adopt broadcast routing.

SQL Statement Rewriting

NewSQL with the new architecture does not require SQL statement rewriting, which is only required for NewSQL statements of the sharding middleware type. SQL statement rewriting is used to rewrite SQL statements into ones that can be correctly executed in actual databases. This includes replacing the logical table name with the actual table name, rewriting the start and end values of the pagination information, adding the columns that are used for sorting, grouping, and auto-increment keys, and rewriting AVG as SUM or COUNT.

Results Merging

Results merging refers to merging multiple execution result sets into one result set and returning it to the application. Results merging is divided into stream merging and memory merging.

  1. Stream merging is used for simple queries, Order By queries, Group By queries, and Order By and Group By scenarios where the Order By and Group By items are completely consistent. The “next” method is called each time to traverse the stream merging result set without consuming additional memory resources.
  2. Memory merging requires that all data in the result sets must be loaded to the memory for processing. If the result sets contain a large volume of data, lots of memory resources are consumed accordingly.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com