A Small Victory — Acceleration of BITFIELD Commands for ApsaraDB for Redis
An Alibaba Cloud customer discovered that, on their read/write splitting instance, CPU utilization on the main node was high while the standby nodes, which carry the read traffic, were relatively idle. When the CPU reached full capacity, the online services on the main node were significantly affected.
1.1) Principles of Read/Write Splitting
The principles ApsaraDB for Redis follows in read/write splitting instances are as follows:
- The keys are written to the main node and then synchronized to the standby nodes.
- The proxy classifies each user request as a read request or a write request.
- Write requests are forwarded to the main node while read requests are distributed to the standby nodes.
This architecture is suitable for businesses with many more read requests than write requests. The following figure shows the read/write splitting architecture.
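The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual proxy: the `Proxy` class, the node names, and the small `WRITE_COMMANDS` set are hypothetical, and round-robin selection of standby nodes is an assumption.

```python
import itertools

# Hypothetical, deliberately tiny command table; a real proxy covers every command.
WRITE_COMMANDS = {"SET", "DEL", "SETBIT", "INCR"}


class Proxy:
    """Toy proxy that routes a command by its read/write classification."""

    def __init__(self, main, standbys):
        self.main = main
        # Assumption: read requests are spread over standby nodes round-robin.
        self._standbys = itertools.cycle(standbys)

    def route(self, command):
        """Return the node that should execute `command` (a list of strings)."""
        name = command[0].upper()
        if name in WRITE_COMMANDS:
            return self.main
        return next(self._standbys)


proxy = Proxy("main", ["standby-1", "standby-2"])
print(proxy.route(["SET", "k", "v"]))  # main
print(proxy.route(["GET", "k"]))       # standby-1
print(proxy.route(["GET", "k"]))       # standby-2
```

Because the decision is made per command name, a command that mixes read and write behavior has to be classified conservatively as a write.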
1.2) BITFIELD Commands
After talking with the customer, we found that they used a large number of BITFIELD commands to read data. BITFIELD commands operate on bitmaps. A bitmap is usually used to record status with minimal space consumption and supports bitwise operations (AND, OR, XOR, and NOT). Common scenarios include:
- Use bitmaps to record users' daily application logons. When user $ID logs on, run “SETBIT logins:20200404 $ID 1” to record that the user logged on on April 4, 2020. Run “BITCOUNT logins:20200404” to obtain the number of users that logged on that day. To count the users that logged on for two consecutive days, run “BITOP AND logins:20200404-05 logins:20200404 logins:20200405” and then “BITCOUNT logins:20200404-05”.
- Determine whether users have read the same articles or watched the same videos.
- Use bitmaps to easily determine whether a user correctly answered all the questions in a live Q&A activity.
The live Q&A system is designed as follows:
1) Create a key for each user in each round. For example, the key for user1 in the first round is round:1:user1.
2) Set the relevant bit to 1 for each correct answer. If user1 answers the fifth question correctly, set the bit at offset 5 to 1 by running SETBIT round:1:user1 5 1. If user1 answers the ninth question correctly in the first round, set the bit at offset 9 to 1 by running SETBIT round:1:user1 9 1. All bits of a bitmap default to 0, so nothing needs to be set for an incorrect answer.
3) To calculate how many questions the user has answered correctly, run the BITCOUNT command to count the number of bits set to 1. For example, if user1 answers three questions correctly and user2 answers all questions correctly in the first round, user2 can continue on to the next round.
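The bitmap operations used in these scenarios can be mimicked in plain Python to see what they compute. The sketch below is an in-memory stand-in, not a Redis client: the `Bitmap` class and `bitop_and` function are hypothetical names, and the only Redis behavior assumed is that offset 0 is the most significant bit of the first byte and that unset bits read as 0.

```python
class Bitmap:
    """In-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value):
        """Set the bit at `offset` and return its previous value."""
        byte, shift = offset >> 3, 7 - (offset & 7)
        if byte >= len(self.data):
            # Grow with zero bytes: unset bits default to 0.
            self.data.extend(bytes(byte + 1 - len(self.data)))
        old = (self.data[byte] >> shift) & 1
        if value:
            self.data[byte] |= 1 << shift
        else:
            self.data[byte] &= ~(1 << shift) & 0xFF
        return old

    def bitcount(self):
        """Count the bits set to 1."""
        return sum(bin(b).count("1") for b in self.data)


def bitop_and(a, b):
    """Return the bitwise AND of two bitmaps, zero-padding the shorter one."""
    size = max(len(a.data), len(b.data))
    out = Bitmap()
    out.data = bytearray(
        x & y for x, y in zip(a.data.ljust(size, b"\0"), b.data.ljust(size, b"\0"))
    )
    return out


# Live Q&A: user1 answers questions 5 and 9 correctly in round 1.
round1_user1 = Bitmap()
for question in (5, 9):
    round1_user1.setbit(question, 1)
print(round1_user1.bitcount())  # 2

# Logons: users 2 and 3 logged on on both days.
day1, day2 = Bitmap(), Bitmap()
for uid in (1, 2, 3):
    day1.setbit(uid, 1)
for uid in (2, 3, 40):
    day2.setbit(uid, 1)
print(bitop_and(day1, day2).bitcount())  # 2
```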
The bitmap API of ApsaraDB for Redis is efficient in both storage and computation. The syntax of the BITFIELD command is as follows:
BITFIELD key
[GET type offset] // Read the bit field of the given type at the given offset
[SET type offset value] // Set the bit field to the given value
[INCRBY type offset increment] // Increment or decrement the bit field
[OVERFLOW WRAP|SAT|FAIL] // Control the overflow behavior of the sub-commands that follow it
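To make the sub-command semantics concrete, here is a minimal pure-Python model of GET, SET, and INCRBY with the three overflow modes. It is a sketch under stated assumptions, not the Redis implementation: the function names are hypothetical, types are written as in the command (`u4`, `i8`), bits are numbered from the most significant bit of the first byte, and FAIL is modeled by returning `None` and leaving the data untouched.

```python
def parse_type(type_name):
    """Split a type such as 'u4' or 'i8' into (signed, width)."""
    return type_name[0] == "i", int(type_name[1:])


def get_field(data, type_name, offset):
    """Read the bit field at bit `offset`; bits past the end read as 0."""
    signed, width = parse_type(type_name)
    end_byte = (offset + width + 7) // 8
    padded = bytes(data).ljust(end_byte, b"\0")
    shift = len(padded) * 8 - offset - width
    raw = (int.from_bytes(padded, "big") >> shift) & ((1 << width) - 1)
    if signed and raw >= 1 << (width - 1):
        raw -= 1 << width  # interpret as two's complement
    return raw


def set_field(data, type_name, offset, value):
    """Write `value` into the bit field in place, growing `data` if needed."""
    _, width = parse_type(type_name)
    end_byte = (offset + width + 7) // 8
    if len(data) < end_byte:
        data.extend(bytes(end_byte - len(data)))
    shift = len(data) * 8 - offset - width
    mask = ((1 << width) - 1) << shift
    as_int = int.from_bytes(data, "big")
    as_int = (as_int & ~mask) | ((value << shift) & mask)
    data[:] = as_int.to_bytes(len(data), "big")


def incrby(data, type_name, offset, increment, overflow="WRAP"):
    """Add `increment` to the field, applying the chosen overflow mode."""
    signed, width = parse_type(type_name)
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    result = get_field(data, type_name, offset) + increment
    if result < lo or result > hi:
        if overflow == "FAIL":
            return None  # no change is made
        if overflow == "SAT":
            result = lo if result < lo else hi
        else:  # WRAP
            result = lo + (result - lo) % (1 << width)
    set_field(data, type_name, offset, result)
    return result


counter = bytearray()
print([incrby(counter, "u2", 0, 1, "SAT") for _ in range(4)])  # [1, 2, 3, 3]
print(incrby(counter, "u2", 0, 1, "FAIL"))  # None
print(incrby(counter, "u2", 0, 1, "WRAP"))  # 0
```

Only `get_field` leaves the data unchanged, which is why GET is the one sub-command that can safely run on a read-only node.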
1.3) BITFIELD Problems in Read/Write Splitting Instances
Among the BITFIELD sub-commands, GET is a read operation, whereas SET and INCRBY are write operations. Because a command is classified as a whole, ApsaraDB for Redis classifies BITFIELD as a write command, so every BITFIELD command, including one that contains only GET sub-commands, is forwarded to the main node. The following figure shows the BITFIELD command routing.
As a result, only the main node had high CPU utilization, while the standby nodes did not receive these commands.
2) Ideas and Problem Resolution
- Solution 1: Mark the BITFIELD command as a read command in the ApsaraDB for Redis kernel. When the command contains write sub-commands, such as SET and INCRBY, it is still synchronized to the standby nodes. This solution requires no changes to the external components (proxy and client), but the engine must handle BITFIELD as a special case, which breaks the uniform processing of engine commands.
- Solution 2: Add a BITFIELD_RO command, similar to the GEORADIUS_RO command, that supports only the GET sub-command. Because every BITFIELD_RO command is a read operation, the standby nodes can process it. This solution is clear and reliable but requires the proxy and client to be adapted.
In the end, we chose solution 2 because it is more elegant and standardized.
2.2) Adding BITFIELD_RO
The new command is registered in the command table with read-only flags:
"read-only fast @bitmap",
The following session shows that the BITFIELD_RO command runs correctly on a standby node, whereas BITFIELD is rejected:
tair-redis > SLAVEOF 127.0.0.1 6379
tair-redis > set k v
(error) READONLY You can't write against a read only replica.
tair-redis > BITFIELD mykey GET u4 0
(error) READONLY You can't write against a read only replica.
tair-redis > BITFIELD_RO mykey GET u4 0
1) (integer) 0
2.3) Proxy Forwarding
So that users do not have to modify their code, we made the proxy compatible with existing BITFIELD commands. If your BITFIELD command contains only GET sub-commands, the proxy converts it to BITFIELD_RO and distributes it across the backend standby nodes to accelerate execution, as shown in Figure 4.
2.4) Contribution to the Community
We contributed our modification to the Redis community, which officially accepted it.
Alibaba Cloud is the largest contributor to the Redis community in China. For example, in the newly released Redis 6.0 RC, Alibaba Cloud ranks third in contributions, following only the author and Redis Labs. Alibaba Cloud keeps contributing to the community.
3) Extension and Discussion
ApsaraDB for Redis introduces the BITFIELD_RO command to solve the problem where official BITFIELD GET sub-commands could not be accelerated on the standby nodes.
ApsaraDB for Redis also performs a compatibility conversion for the GEORADIUS command. For a read/write splitting instance, if a GEORADIUS or GEORADIUSBYMEMBER command does not contain the store or storedist option, it is automatically treated as a read command and forwarded to the standby nodes for accelerated execution.
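The same check can be sketched for the GEORADIUS family. The function below is a hypothetical illustration: the text only says that such commands are treated as read commands, so rewriting them to the `GEORADIUS_RO` and `GEORADIUSBYMEMBER_RO` read-only variants is an assumption made here by analogy with BITFIELD_RO.

```python
# Read-only variants of the two commands.
GEO_READ_VARIANTS = {
    "GEORADIUS": "GEORADIUS_RO",
    "GEORADIUSBYMEMBER": "GEORADIUSBYMEMBER_RO",
}

# Positional arguments after the command name that precede the options:
#   GEORADIUS key longitude latitude radius unit
#   GEORADIUSBYMEMBER key member radius unit
GEO_POSITIONAL_ARGS = {"GEORADIUS": 5, "GEORADIUSBYMEMBER": 4}


def rewrite_georadius(command):
    """Rewrite a GEORADIUS-family command to its read-only variant when it
    has no STORE or STOREDIST option; otherwise return it unchanged."""
    name = command[0].upper() if command else ""
    if name not in GEO_READ_VARIANTS:
        return command
    options = [arg.upper() for arg in command[1 + GEO_POSITIONAL_ARGS[name]:]]
    if "STORE" in options or "STOREDIST" in options:
        return command  # writes a result key, so it must go to the main node
    return [GEO_READ_VARIANTS[name]] + command[1:]


print(rewrite_georadius(["GEORADIUS", "pts", "15", "37", "200", "km", "WITHDIST"]))
# ['GEORADIUS_RO', 'pts', '15', '37', '200', 'km', 'WITHDIST']
print(rewrite_georadius(["GEORADIUS", "pts", "15", "37", "200", "km", "STORE", "out"]))
# ['GEORADIUS', 'pts', '15', '37', '200', 'km', 'STORE', 'out']
```

Only the option section is scanned, so a key or member that happens to be named "store" does not prevent the rewrite.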
3.2) Discussion Questions
Why do we need read/write splitting? Why can’t we just use the cluster edition? To answer these questions, assume the service capability of the community version is K. The following table shows a comparison of different versions. Here, we only compare ApsaraDB for Redis Enhanced Edition (Tair). For the cluster edition, the service capability can be multiplied by the number of shards.
Table 1: Comparison of Redis Community Edition (cluster and read/write splitting instances) and ApsaraDB for Redis Enhanced Edition (main and standby nodes) in simple scenarios
As the comparison shows, the Redis Community Edition’s read/write splitting instances allow expanding the read capability of a single key or a hot key. This version is better suited to small- and medium-sized users with large keys, but it cannot solve burst write bottlenecks. For example, if a BITFIELD command is a write request (its sub-commands include INCRBY or SET), it must still run on the main node, so read/write splitting cannot resolve the resulting performance problem.
According to the table, if you split big keys into multiple smaller keys, you can use the cluster edition for linear acceleration. Big keys cause many problems, for example:
- Big keys cause data skew, which hinders the linear expansion of Redis capacity and service capability.
- A big key is very likely a hot spot.
- If you accidentally perform range operations on a big key, slow queries may occur and bandwidth usage is likely to spike.
One best practice from using ApsaraDB for Redis Enhanced Edition (Tair) in various applications within Alibaba Group is: “Avoiding big keys and slow queries can prevent more than 90% of ApsaraDB for Redis problems.”
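One way to split a big bitmap into smaller keys is to map each bit offset to a sub-key and a local offset. The sketch below is only an illustration: the `locate` function, the `key:chunk` naming scheme, and the chunk size are all hypothetical choices, not part of ApsaraDB for Redis.

```python
CHUNK_BITS = 8 * 1024 * 8  # hypothetical chunk size: 8 KiB worth of bits per sub-key


def locate(key, offset, chunk_bits=CHUNK_BITS):
    """Map a bit offset in one logical bitmap to (sub-key, offset in that sub-key)."""
    chunk, local_offset = divmod(offset, chunk_bits)
    return f"{key}:{chunk}", local_offset


print(locate("logins:20200404", 42))         # ('logins:20200404:0', 42)
print(locate("logins:20200404", 1_000_000))  # ('logins:20200404:15', 16960)
```

With such a scheme, a SETBIT or GETBIT call goes to the sub-key returned by `locate`, the sub-keys can be spread across the shards of a cluster instance, and a total count is obtained by summing BITCOUNT over the sub-keys.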
However, you will still encounter hot spot issues, such as flash sales, popular videos, and super-large broadcasting rooms. In particular, many hot spots are sudden and unexpected. The performance-enhanced instances of ApsaraDB for Redis Enhanced Edition feature service capabilities for 400,000 to 450,000 O(1) operations per second (OPS) for a single key, as well as extremely strong impact resistance. The main and standby nodes of ApsaraDB for Redis Basic Edition are sufficient for medium- and large-scale flash sales. Meanwhile, if you do not have big keys, ApsaraDB for Redis Enhanced Edition cluster instances will support tens of millions of OPS. This is why Alibaba uses ApsaraDB for Redis Enhanced Edition (Tair) to ensure a successful Double 11 each year. Now, you can also leverage this technology.
Learn more about ApsaraDB for Redis at https://www.alibabacloud.com/product/apsaradb-for-redis