By Zhaofeng Zhou (Muluo)
Table Store is Alibaba Cloud’s first distributed multi-model database, and is a NoSQL database. Many application developers are familiar with NoSQL databases. Many application systems are no longer reliant at a low level on relational databases alone, but using different databases for different business scenarios. For example, cache KeyValue data is stored in Redis, documentary data is stored in MongoDB, and graphical data is stored in Neo4J.
Looking back at the development of NoSQL: NoSQL was created in the Web 2.0 era, and the rapid development of the Internet has also brought about an explosion of Internet data. Traditional relational databases cannot handle such massive amounts of data, which requires a highly-scalable, distributed database. However, it is very challenging to implement a highly-available, scalable distributed database based on the conventional model of relational data. Most data on the Internet is based on a simple model and does not need a relational model to model it. If data could be modeled with a simpler data model than the relational model, weakening transactions and constraints, and aim at high availability and scalability, then databases designed with these in mind would better meet business requirements. Such a concept is behind the development of NoSQL.
In summary, the development of NoSQL is based on the new challenges of business and the new requirements for databases in the Internet era. Developed on this basis, NoSQL has these distinctive features:
- Multi-data model: Different requirements for different data have given rise to many different data models, including KeyValue, Document, Wide Column, Graph and Time Series. This is one key feature of a NoSQL database, which breaks the constraints of the relational model, developing in a diverse directions. The choice of data model is more scenario-oriented, more in line with actual business needs and allows for optimization at a deeper level.
- High concurrency and low latency: The development of the NoSQL database has been driven mainly by the requirements of online businesses. The design aims to provide high-concurrency and low-latency access for online businesses.
- High scalability: Scalability is a core design goal in response to the explosive growth in data volume. A distributed model for the underlying architecture is often considered at the outset of the design process.
There has been significant development in all types of NoSQL databases in the past few years, which is apparent from development trend statistics related to DBEngines. Alibaba Cloud’s Table Store is a distributed NoSQL database that uses a multi-model architecture for its data model, supporting both Wide Column and Timeline.
The wide column storage model, is a classic model, originally seen in Bigtable and subsequently widely adopted by other systems of the same type. Currently most semi-structured and structured data in the world is stored in systems based on this model. In addition to the Wide Column model, we have also introduced another completely new data model: Timeline, which is a new generation model for message data, suited to the storage and synchronization of messages in messaging systems such as IM, Feeds, and IoT device pushdowns, and is now seeing widespread application. Next, we describe these two models in detail.
Wide Column Model
The above is a schematic diagram of the wide column model. We take a relational model for comparison to help us better understand this model. A relational model can be simply understood as a two-dimensional model consisting of rows and columns, with a fixed schema for each row. So the features of a relational model are: two dimensions and fixed schema. This is the simplest understanding, aside from transactions and constraints. The wide column model is a three-dimensional model with the additional dimension of time added to the two dimensions of row and column. The time dimension is reflected in the attribute column, which has multiple values, each value corresponding to a timestamp for the version. And each row is schema free, with no strong schema defined. So the differences between the wide column model and the relational model are: three-dimension, schema free, and simplified transactions and constraints.
Taking a closer look at the composition of this model, it has several main parts:
- Primary key: Each row has a primary key, which consists of multiple (1–4) columns. The primary key is defined by a fixed schema, and is mainly used to uniquely identify a row of data.
- Partition key: The first column of the primary key is called the partition key, which is used to partition the range of the table. Each partition is dispatched in a distributed manner to a different machine for service. Within the same partition key, cross-row transactions are provided.
- Attribute column: Apart from the primary key column, all columns in a row are attribute columns. An attribute column corresponds to multiple values. Different values correspond to different versions, and a row can store an unlimited number of attribute columns.
- Version: Each value corresponds to a different version, the value of which is a timestamp that defines the time to live of the data.
- Data type: Table Store supports a variety of data types, including String, Binary, Double, Integer, and Boolean.
- Time to live (TTL): The TTL of the data can be defined for each table. For example, if the TTL is configured to one month, the data written in the table one month ago will automatically be cleared. The write time of the data is determined by the version, which is generally determined by the server end based on the server time when the data was written, and can also be specified by the application.
- MaxVersion: The maximum number of versions saved in each column can be defined for each table. This can be used to control the number of versions in a column, with data that exceed the maximum number of versions being automatically cleared.
The features of the Wide Column model can be summarized as follows: three-dimensional structure (row, column, and time), wide row, multi-version data, and TTL management. Meanwhile, in terms of data operation, the Wide Column model provides two types of data access APIs: Data API and Stream API.
For more detailed Data API documentation, see here. The Data API is a standard data API that provides online data read-write, including:
- PutRow: Insert a new row, and overwrite the same row if it exists.
- UpdateRow: Update a row. This allows you to add or delete the attribute columns in a row, or update the values of the existing attribute columns. If the row does not exist, a new row will be inserted.
- DeleteRow: Delete a row.
- BatchWriteRow: Update multiple rows of data in multiple tables in batch, which can combine PutRow, UpdateRow, and DeleteRow.
- GetRow: Read data from a row.
- GetRange: Scan data in a range, either in ascending or descending order.
- BatchGetRow: Read multiple rows in multiple tables in batch.
In relational model databases, there are no definitions of standard APIs for the incremental data in the database; while in many application scenarios of traditional relational databases, the use of incremental data (binlog) cannot be ignored. This is widely used inside Alibaba, and provides the DRC with this type of middleware to fully utilize this part of the data capability. After the incremental data has been fully utilized, there are many things we can do in terms of the technical architecture:
- Heterogeneous data source replication: MySQL data can be incrementally synchronized to NoSQL for cold data storage.
- Integration with StreamCompute: MySQL data can be analyzed in real time for some large-screen applications.
- Integration with the search system: The search system can be extended to include a secondary index of MySQL to enhance the data retrieval capability within MySQL.
However, even though the incremental data of a relational database is useful, the industry does not have a standard API definition to obtain this data. Table Store has long recognized the value of this data, and so it provides a standard API that allows this data to be fully utilized. Refer to our Stream API documentation to learn more.
The Stream API generally includes:
- ListStream: Obtain the stream of the table and the ID of the range stream.
- DescribeStream: Obtain the details of the stream, and get the shard list and the shard tree in the stream.
- GetShardIterator: Obtain the iterator for the current incremental data of the shard.
- GetStreamRecord: Obtain the incremental data in the shard according to the shard iterator.
The implementation of Table Store Stream is much more complicated than MySQL Binlog, as Table Store has a distributed architecture, and Stream also has a distributed incremental data consumption framework. The data consumption of the stream must be obtained in an order-preserving manner. The shards of the stream correspond to the partitions of the table inside the Table Store. The partition of the table may be split and merged. To ensure that data consumption for the old and newly-added shards, after partition splitting and merging, is order-preserving, we have designed a more sophisticated mechanism. The design of Table Store Stream is not described at length here, but we will release more detailed design documentation at a later time.
Due to the complexity of Stream’s current internal architecture, which has also been introduced into Stream’s data consumption side, using the Stream API is not all that simple. This year, we have also been planning the launch of a brand new data consumption service, which will simplify the data consumption of Stream and provide a simpler and easier to use API. Be sure to keep an eye out for its release.
The Timeline model is a new data model that we have created for message data scenarios. It is able to meet the special requirements needed for message data scenarios, such as message order preserving, massive message storage, and real-time synchronization.
The above is a schematic diagram of the Timeline model, which abstracts the data in a large table into multiple timelines. There is no upper limit on the number of timelines a large table can hold.
A timeline consists of:
- Timeline ID: The ID uniquely identifies a timeline.
- Timeline Meta: Timeline metadata, which can contain any key-value pair attributes.
- Message Sequence: Message queues hold all messages in the timeline. Messages are stored in order in the queue, and are assigned with incremented IDs according to the write order. A message queue can hold an unlimited number of messages. Within the queue, a message can be randomly located based on its message ID, and ascending or descending order scans can be provided.
- Message Entry: The message body contains the specific content of the message and can contain any key-value pairs.
The Timeline model is similar to the message queue in terms of logic, and a timeline is similar to the topic in a message queue. The difference is that the Table Store Timeline places greater emphasis on the scale of topics. In an instant messaging scenario, both the inbox and outbox of a user is a topic. In an IoT message scenario, each device corresponds to a topic, and the magnitude of topics will reach an order of tens of millions or even hundreds of millions. Table Store Timeline is based on an underlying distributed engine. Thus, a single table can theoretically support an unlimited number of timelines (topics), and the queue’s Pub/Sub model has been simplified. It also supports message order preservation, random positioning, and ascending or descending order scans, while better meeting the requirements of massive message data scenarios, such as instant messaging (IM), feeds, and IoT messaging systems.
Timeline is a data model that was just launched last year and is constantly being improved on. Based on this model, we have helped DingTalk, Cainiao Smart Customer Service, Taopiaopiao Xiaojuchang, Smart Device Management, and other services to build messaging systems for instant messaging, feeds, and IoT messages. We invite you to give it a try.