During the PolarDB session of the 2017 Computing Conference, Alibaba Cloud senior technical expert He Jun delivered a speech on the features and common use cases of PolarDB. He discussed the structure of PolarDB, introduced its features, and shared insights on some common use cases.
The following sections highlight the main points from his speech.
I was pleasantly surprised when I first encountered PolarDB. As I understand it, it is a milestone product that spans generations, combining innovations in computing, storage, networking, and more. It implements a new design concept called Cloud Native, which is very different from the database design concepts we talked about before. The relational databases we know today grew out of the computing power available in the IT era. Moving that computing capability onto the public cloud and connecting it to user businesses produced a number of innovations, but they are far from sufficient in the long term. Why? Because today we need to build a relational database designed specifically for public cloud environments and the user businesses that run in them. This is no small task.
PolarDB utilizes a structure that separates computing and storage, which is much easier said than done. The reason for combining computing and storage, after all, is to improve performance. The primary consideration in building a relational database is performance, so while separating storage and computing seems like an easy concept, actually doing it without sacrificing performance is quite difficult.
Today, the separation of computing and storage in PolarDB is a bold innovation that is no longer stuck in the concept phase; it has been built and deployed. Where is the difficulty in building a relational database? It has to provide ACID semantics, otherwise it cannot support businesses that depend on online transactions. If ACID compliance, performance, and flexibility on the public cloud are all crucial, then we also need to consider the performance-to-cost ratio. For most commercial databases on the market, delivering all of these at once is more or less a fantasy. Is it even possible to combine the required functionality, capability, and an acceptable performance-to-cost ratio in a framework that supports all necessary business scenarios? Drawing on our understanding of business applications and our accumulated experience on the public cloud, we implemented a single-writer, multiple-reader database framework, which is significantly simpler than earlier multiple-writer designs and still satisfies the needs of the vast majority of use cases. At its core is a proprietary distributed storage engine, which allows PolarDB to be flexible along multiple dimensions.
The system has three layers, as we can see in this figure. The top layer is DBserver, which implements a single-master, multiple-slave framework in which the non-master nodes can be added or removed as needed to handle the request load. The lowest layer consists of fast, distributed storage devices.
What makes PolarDB special? First, a relational database absolutely must have high performance. If a relational database has poor performance, it will have difficulty satisfying the need to process the explosive growth of data characteristic of the current Internet era. So when I say that PolarDB performance is high, what exactly does that mean?
- High speed
Single-node QPS can easily reach 500,000. Because PolarDB uses shared distributed storage, a new read-only node attaches to the existing data instead of receiving its own replicated copy. This removes the overhead of replicating data, so adding a new read-only instance takes only 1–5 minutes, regardless of how much data is in the database. What’s more, with a single-master, multiple-reader structure, we are able to keep latency down to a matter of milliseconds. We can also create backups in seconds. Each of these operations is extremely fast.
- Super high capacity
In practice, it seems that once the data size reaches around 2TB, most databases become difficult to use. Today, PolarDB is capable of providing capacity of up to 100TB, which, for a relational database, is an enormous amount of data.
- Automatic scaling on demand
The PolarDB architecture makes full use of the flexibility offered by the cloud, enabling the system to scale as the load from the user’s application changes.
- MySQL compatibility
There are already more open source database instances in use than Oracle instances, and the gap is widening every year. PolarDB is already nearing 100% compatibility with MySQL, and we will continue to improve support for SQL standards as quickly as possible.
- High reliability and availability
PolarDB uses a one-master, many-slaves framework, which naturally offers high availability. If the master node crashes, requests are automatically redirected to another node that takes over. At the same time, the existence of multiple data copies means that the data is naturally more reliable.
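To make the shared-storage and failover points above more concrete, here is a minimal Python sketch. It is a toy model, not PolarDB's actual implementation: the class names (`SharedStorage`, `Node`, `Cluster`) and the "promote the first remaining read node" policy are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the distributed storage layer (one logical copy)."""
    pages: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    role: str  # "primary" or "replica"
    storage: SharedStorage  # a reference to the shared storage, never a copy


class Cluster:
    def __init__(self):
        self._ids = count()
        self.storage = SharedStorage()
        self.nodes = [Node(f"node-{next(self._ids)}", "primary", self.storage)]

    def add_read_node(self):
        # Attaching a read node only hands it a reference to the shared storage,
        # so the work done here does not grow with the amount of stored data.
        node = Node(f"node-{next(self._ids)}", "replica", self.storage)
        self.nodes.append(node)
        return node

    def primary(self):
        return next(n for n in self.nodes if n.role == "primary")

    def fail_over(self):
        # Assumed policy: drop the failed primary, promote the first read node.
        self.nodes.remove(self.primary())
        replicas = [n for n in self.nodes if n.role == "replica"]
        if not replicas:
            raise RuntimeError("no read node available to promote")
        replicas[0].role = "primary"
        return replicas[0]


cluster = Cluster()
cluster.storage.pages["orders"] = ["row-1", "row-2"]
replica = cluster.add_read_node()   # no data is copied
new_primary = cluster.fail_over()   # the read node takes over the same data
```

Because every node holds a reference to the same storage object, the promoted node sees exactly the data the failed master had written, which is the property the shared-storage design relies on.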
PolarDB in Production Scenarios
When talking about the capabilities of PolarDB as a product, remember that a product’s birth, value, and reputation all depend on the services it provides. If users don’t use it and it doesn’t solve the pain points in their application scenarios, then it’s difficult to say that the product has any value at all. For a user on the public cloud, the first question is whether a cloud database can meet their needs. If I have a new service, or an existing service that I want to move to the cloud, then I want a next-generation database with a high performance-to-cost ratio. Moving my data to the cloud also involves the cost of migrating all of my users to the cloud.
This migration cost is quite low if all users are easy to migrate. However, if migrating users involves changing business procedures, then the process becomes quite painful and brings hidden risks that depend on what each user does. We have to provide strong performance if we are to satisfy the needs of high-end users. When I move my business to the cloud, I am placing my trust in the public cloud, and in turn in Alibaba Cloud. When you provide services 24/7, you can’t afford any interruptions. As the number of users grows, it becomes crucial that your database be flexible and scalable enough to satisfy the needs of every business scenario.
Finally, data must be reliable. It is only once these needs are met that a database service is able to provide real value to the user. Next I will introduce and analyze four use cases to illustrate the capabilities and services offered by PolarDB.
Use Case 1: High Throughput Processing of Big Data
The first use case is high-throughput processing of large data volumes. In its earliest days, the public cloud mainly served website users. As the public cloud improved and the software on it continued to evolve, it gradually became something very different. With the arrival of large users, medium users, and even smaller users with high growth potential, the services and data running on the cloud have grown exponentially. In the mobile Internet era, data is used not only to meet users’ needs; it may become far more important than that, serving to balance supply and demand. With today’s computing power, we know how to increase production efficiency, and as production becomes more efficient, so do user service scenarios and performance-to-cost ratios. Because we have learned about user needs by serving users and collecting their data, we have a much better understanding of what we need to provide. This allows us to react to changing needs and even to notice changes in the collected data itself. The ability of data to change the balance between supply and demand is a major contribution of the big data era. As data keeps growing without limit, databases become the computing power that supports commerce on the backend. And as data is added, the database requires more computing power to process and use it.
We use an architecture that separates reads and writes in order to serve more user workloads. At the same time, we implement a shared storage system that allows us to provide storage of over 100TB and respond to the explosive growth of web-scale data.
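As a rough illustration of what separating reads and writes means for request routing, the following Python sketch sends every write to the single master and spreads plain reads across the read nodes in round-robin order. The `ReadWriteRouter` class and the node names are hypothetical; how PolarDB actually routes statements is not described in this talk.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the one master, SELECTs rotate over read nodes."""

    def __init__(self, primary, read_nodes):
        self.primary = primary
        self.read_nodes = list(read_nodes)
        # Round-robin iterator over the read nodes; None if there are none.
        self._cycle = itertools.cycle(self.read_nodes) if self.read_nodes else None

    def route(self, sql):
        words = sql.split()
        # Simplification: only statements that start with SELECT count as reads.
        # A real router would also send SELECT ... FOR UPDATE to the master.
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._cycle is not None:
            return next(self._cycle)
        return self.primary


router = ReadWriteRouter("primary", ["ro-1", "ro-2"])
targets = [
    router.route(q)
    for q in ["SELECT * FROM t", "INSERT INTO t VALUES (1)", "select 1", "SELECT 2"]
]
# targets == ["ro-1", "primary", "ro-2", "ro-1"]
```

With a single writer there is never a question of which node accepts a write, which is part of why the single-writer, multiple-reader design is simpler than a multiple-writer one.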
Use Case 2: High Availability and Business Flexibility
A few years ago, when I was a developer, I was involved in developing high availability software. At the time, we had to install open source MySQL on two standalone nodes, purchase separate high availability software, and learn how to configure it in order to make a LAMP architecture highly available across two machines. Today, on the public cloud, we can get the same technology at a lower cost and use it to serve more users cheaply. The value brought by the cloud is enormous.
Looking at this image, we see that when the CPU and memory on a computing node in PolarDB are insufficient, we can quickly and easily expand them. With a shared storage framework we can also scale out or scale in. When there aren’t many read tasks, we can even delete some read nodes. Because of today’s competition, marketing campaigns, and changes in the Internet ecosystem, the window we have to respond to a change in load can shrink to a matter of hours or even minutes. For example, in e-commerce you sometimes have to deal with flash sales, where traffic can surge within a single hour. However, if we’re able to add a read-only node each minute, this kind of load poses much less of a problem.
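The arithmetic behind this can be sketched in a few lines of Python. The peak load below is an invented figure, the per-node rate simply reuses the 500,000 QPS number quoted earlier in the talk as an optimistic upper bound, and the model assumes read nodes are added one after another at the 1 to 5 minutes per node mentioned above.

```python
import math


def read_nodes_needed(peak_read_qps, per_node_qps):
    """Smallest number of read nodes whose combined rate covers the peak."""
    return math.ceil(peak_read_qps / per_node_qps)


def minutes_to_reach(current_nodes, target_nodes, minutes_per_node):
    """Time to scale out, assuming nodes are added sequentially."""
    missing = max(0, target_nodes - current_nodes)
    return missing * minutes_per_node


peak_read_qps = 1_800_000   # hypothetical flash-sale peak
per_node_qps = 500_000      # optimistic per-node figure quoted in the talk

needed = read_nodes_needed(peak_read_qps, per_node_qps)                        # 4
best_case = minutes_to_reach(1, needed, minutes_per_node=1)    # 3 minutes
worst_case = minutes_to_reach(1, needed, minutes_per_node=5)   # 15 minutes
```

Under these assumptions, a cluster with one read node could absorb the surge within 3 to 15 minutes, well inside a one-hour window. If every new node first had to copy the full data set, the scale-out time would depend on data size as well.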
Use Case 3: Cloudification and Migration
When something new and more advanced comes on the market, we naturally want to give it a try, but that becomes quite difficult if we have to change our business processes. With MySQL compatibility, putting our business on the cloud is quite simple. If we also use cloudification tools and perform a logical migration, the entire cloud migration process is quite smooth.
Today we have already entered an age of cloud computing, IoT, and artificial intelligence. Before, we used to say that the Internet would move from online to offline, maybe some traditional businesses would move to the cloud, and maybe artificial intelligence would open up new forms of business. It’s possible that industry + the Internet will embrace the high performance to cost ratio, flexible, easily deployable cloud. With these kinds of migration tools, issues of compatibility are easily solved and the cost of the entire process of migrating to the cloud is reduced greatly.
Use Case 4: High Reliability and Backups for Disaster Recovery
The last point is high reliability and backups for disaster recovery. The diagram above shows the PolarDB framework, with PolarDB deployed as a cluster at the DBserver layer. For a cluster architecture, network connectivity can be considered a mission-critical service. Because of its high reliability, PolarDB is well suited to backup and disaster recovery scenarios.
Looking back, as I have personally come to understand PolarDB, I see it as a database product that combines imagination with creativity and adaptability. We believe that the spirit of PolarDB is one of faith combined with hard work and effort, and that is why we are able to present such a product to you all today.