This year’s Double 11 promotion has just ended. During the event, Alibaba’s e-commerce platforms processed 10 billion RMB in transactions in just 1 minute and 36 seconds, 30 billion in 5 minutes and 25 seconds, and 50 billion in less than 13 minutes. Under the pressure of each year’s Double 11, new technologies are constantly developed to support an even bigger and better promotion. In fact, you could go so far as to say that, without the pressure of Double 11, it would probably have taken another 20 years to develop the current technologies and computing capabilities.
Making the Impossible Possible
Make 11' Happen
For the past 11 years, ever since the very first Double 11, everyone on the Alipay Technology team has doubted that each year’s objective could actually be achieved. And yet, somehow, every year’s ambitious objectives come to fruition, making for one extraordinary shopping event, powered by some amazing new advancements.
Also, typically, as technology changes and evolves, the results achieved in the previous year become “ordinary” by the next one. And, in the blink of an eye, the huge Double 11 event has been transformed from a “lonely canoe” to a “giant ship” of a shopping event, one that carries the happiness and dreams of billions of online shoppers. Behind this miracle project, of course, lie several advanced, cutting-edge technologies and the engineers who made them possible. Alibaba’s engineers have been working tirelessly to deliver these technologies on very short schedules.
The original intention of these engineers was simple: Once determined to meet their goals, they would do everything they could to make them happen. And through many trials and tribulations, great achievements were made. The impossible was made possible.
Just Seconds from a Crash
Ten years ago, November 11, 2009 started like any other day for Chen Liang, an engineer with Alipay. Chen commuted through the morning rush-hour traffic to his desk, which was then at Huaxing Times Square in Hangzhou. Soon after sitting down, he received an email from Ant Financial CTO Cheng Li. The email said that Taobao Mall (today’s Tmall) was going to hold a big promotion campaign that same day with a high estimated transaction volume, so everyone needed to keep a close eye on the system.
At that time, Chen Liang’s team was responsible for ensuring the stability and reliability of the entire system. During promotional events, their responsibility was generally to ensure that the server was “robust” enough to handle a swarm of online shoppers.
Taobao Mall had just been reorganized and launched in August 2009. At that time, it was very easy for Alipay to ensure stability at the average daily transaction volumes. Even if a temporary peak occurred during a sales promotion, there was usually no need to panic. This could be resolved simply by expanding the capacity of the system.
And, in fact, this exact situation played out. The whole team gathered in the office and stared at the computer screen. They scaled up the system immediately whenever they found that the transaction volume was approaching the upper limit.
However, the increase in the transaction volume was somewhat unusual. At first, only the rattling of keyboards could be heard in the quiet office. Suddenly, someone shouted, “I did it!” Immediately, another person shouted, “I did it too!” The office suddenly burst into excitement.
It turned out that some people in the office had become curious about what Taobao Mall had on sale that was driving transaction volumes so high. They clicked the link and found flash sales with discounts of more than 50%. Of course, they could not resist trying it out.
“I have forgotten what exactly everyone got at that moment. All I can remember is that everyone was super happy and excited,” Chen Liang said.
Happiness is Chen Liang’s initial and most distinctive impression of this promotional event.
However, at the time, most Alipay employees were unaware of the promotion, except for those who were busy scaling up the system on that day. “I didn’t even know there was a promotion until the next day. My colleagues said that site traffic was pretty fierce.” Li Junkui, who is now a researcher at Ant Financial, said the O&M director nervously “protested” at the replay meeting the next day: “What was Taobao Mall doing? Payment volume suddenly surged. It would’ve been risky if we weren’t well prepared for it.”
What exactly did Taobao Mall do? Looking back on it today, it wasn’t actually that big a deal: On that Singles’ Day, they held a promotional campaign jointly with 27 brands, hitting a single-day gross merchandise volume (GMV) of 50 million.
At that time, no one would have predicted that this promotion would grow to where it is today. However, Alipay sensed a “coming tidal wave” based on the data: The peak transaction volume brought by this event was five times the typical volume. Although Alipay withstood the hit, at the time, the load limit of Alipay was almost reached.
Just after the middle of 2010, Alipay contacted Taobao Mall and asked: “Are you going to repeat last year’s promotional event this year?” Taobao Mall replied, “Yes.”
As the saying goes, a soldier wouldn’t go into battle without preparing for it. Similarly, preparing for Double 11 was naturally added to the agenda of Alipay’s weekly stability meeting. The top objective was to prepare sufficient capacity for the promotion. But one question remained: how much would be required? No one had a clue.
“Just promptly estimate a value and then multiply the value by three. Let’s make it simple and straightforward!” Li Junkui said frankly.
To test whether this quick decision was feasible, he conducted a test with his team: they manually modified the system configuration to direct the traffic from multiple machines to one machine, and then measured how much traffic that single machine could sustain. “In retrospect, this was the initial prototype for stress testing.”
They even created a standby work group for the coming event. Because DingTalk, Alibaba’s workplace messaging app, was not yet available, they created all their work groups on Aliwangwang, Taobao’s messaging system. That raised a new doubt, however: “What if Aliwangwang also fails, and we cannot contact each other promptly?”
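The “estimate, then multiply by three” rule combined with a single-machine measurement amounts to simple capacity arithmetic. The sketch below is only an illustration of that arithmetic: the function name `machines_needed` and all of the figures are hypothetical and do not come from Alipay.

```python
import math


def machines_needed(estimated_peak_tps: float,
                    per_machine_tps: float,
                    safety_factor: float = 3.0) -> int:
    """Return how many machines to provision for an estimated peak.

    estimated_peak_tps: the team's rough guess at peak transactions per second.
    per_machine_tps: what one machine sustained in the single-machine test.
    safety_factor: the "multiply by three" headroom (assumed default).
    """
    if per_machine_tps <= 0:
        raise ValueError("per_machine_tps must be positive")
    target_tps = estimated_peak_tps * safety_factor
    # Round up: a fraction of a machine still means one more machine.
    return math.ceil(target_tps / per_machine_tps)


# Hypothetical figures, for illustration only.
print(machines_needed(estimated_peak_tps=1000, per_machine_tps=250))  # 12
```

The per-machine figure is the part that the single-machine test supplies; without it, the multiplied estimate cannot be turned into a machine count.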
Even though the team didn’t spend much time to prepare for the event, they did consider all aspects. “There were accidents every year, regardless of how complete our preparation was,” said Zhao Zunkui, an engineer from the Core Financial Technology Department. He worked in the Financial Accounting Team, where money was involved in every operation, and no mistakes were permitted.
Unfortunately, the unexpected really did occur.
On November 11, shortly after the promotional event began, Alipay’s account database ran short of capacity.
Problems quickly snowballed from there. The situation had already got to a critical stage by the time the team identified the problem. “The system can only stand a few more minutes of this!” The O&M director was anxious. If no solution was made immediately available, Alipay would be at risk of downtime. If Alipay were to go down, then nobody could buy anything on Taobao Mall.
But what could be done to save the promotion? The O&M director bit the bullet and said “Shut down the accounting system to free space for the core financial system.”
There was little time left to consider. With a crowd of senior executives standing behind him, Jiang Tao, an engineer in the Alipay Middleware Team, experienced unprecedented stress, “My hands were shaking during the operation.”
This timely decision saved Alipay from downtime with only seconds to spare. According to the final data report, during Double 11 of 2010, 21 million users participated in the event, and the total GMV reached 1 billion, 20 times that of 2009. It had been difficult for anyone to estimate such a spike in advance.
“We had expected a rise, but never expected such an extreme surge,” Zhao Zunkui said. “From that year on, we would always have this sort of premonition that an even more extreme surge would come during the promotion.”
The Power of Code
Xiao Han, a member of the post-85s generation, and Zheng Yangfei, a member of the post-90s generation, both learned about Double 11 when they were in college.
Xiao Han likes online shopping. He was among the early Duoshouzu in Double 11 of 2009. Duoshouzu (剁手族) in Mandarin literally means a person that chops off their hands. This relatively new cultural metaphor, that came out of the Taobao internet culture in China, refers to the sort of online shopper who is such a shopaholic that they’ll jokingly say that they’ll “chop off” their hands if they buy any more things online.
In a technical exchange group, Xiao Han met the Alipay engineers who participated in Double 11. Zheng Yangfei often bought “Computer News”, one issue of which said that the sales volume of Double 11 in 2010 was equal to the total single-day retail sales volume of Hong Kong. He was impressed by this and wanted to learn more.
“I felt it was amazing and had to have a look into it.”
The two young people who did not know each other came up with the same idea independently.
Xiao Han joined Alipay in 2011, when Alipay had already started the strategy of “building and developing the system in the first half of the year and the major promotion in the second half.” The preparations for the major promotion began in May and June. And soon after he joined the company, he was transferred to develop the traffic access and transfer system, which is referred to as Spanner.
This system was equivalent to the first portal of the Alipay transaction process. “It seemed like a cart for serving food in a restaurant. In a restaurant, one waiter can serve only one dish at a time. However, the challenge from Double 11 demands that one waiter serve 10 dishes at the same time. For this reason, we needed a cart. However, no ready-to-use cart was available in the industry to meet the needs of Alipay. So we had to build one.”
For almost the whole year, Xiao Han and his team spent numerous sleepless nights working on this project. Spanner finally faced its first big test in Double 11 of 2012.
Yet again, the unexpected occurred.
That year, Alipay’s major promotion monitoring system was also launched. The traffic curve for every second could be displayed in real time. Right before the event started, everyone was staring at the screen and waiting eagerly.
And, when the clock struck 00:00, the traffic came roaring in, and the curve began to grow, forming a beautiful arc. Everyone began to cheer. And then, suddenly, the traffic curve fell and then began to fluctuate like an electrocardiogram.
The monitoring system was normal and no errors had been reported. So then why was the traffic failing to come in?
A stone tossed across the water raises a thousand ripples. The fluctuations we saw were also displayed in Taobao’s command center in real time. As the only “representative” of Alipay, He Yan, an Alipay engineer, was preparing for the campaign with Taobao’s technical engineers. That was a job that challenged his psychological endurance. At the moment when the payment curve fluctuated, “Taobao’s technical engineers immediately surrounded me, and asked me one after another, ‘What is wrong with Alipay?’” He Yan recalled.
Xiao Han’s mind went blank. The only thought on his mind was “Keep the transactions moving.”
Within the short 20 minutes from 00:00 through 00:20, 10 minutes were spent on pinpointing the problem, and another 10 minutes were spent on solving it. Within the same short span of 20 minutes, however, the situation became public: “Payments Halted on Alipay” was posted on Weibo Hot Search, and family members, relatives, and friends called in to ask what was happening. “My phone was ready to blow up with all the calls and messages,” Xiao Han said.
The system finally stabilized after a health monitoring module was disabled. Rather than feeling stressed, Xiao Han felt an unprecedented shock: What he was doing had already impacted tens of millions of people, and every minor mistake would also affect an inestimably huge group of people.
“If I had not been there, it would be difficult to realize how important each line of code was.” Zheng Yangfei said. When he joined Alipay as an intern in 2013, Gong Jie, Zheng Yangfei’s mentor said something that made a deep impression on him: Look at the customer service personnel. If you can elaborate on the code to eliminate one error, they will receive far fewer error report calls.
After we maneuvered past the roadblocks in 2012, the Alipay Database Administrator (DBA) repeatedly warned that the room for system expansion was nearly exhausted, and that the system could last for at most a few more months. He added that, at this growth rate, if no solution was developed, the system would fail to withstand 2013’s Double 11 promotion.
Misfortunes never come alone. Another “spell” was cast: The maximum number of connections to the Oracle database also became a bottleneck for expansion. Even worse, Hangzhou’s power supply would no longer be sufficient to support the persistent expansion of our data centers. Sometimes, in order to protect the power supply of the data centers, “We cut off the electricity of the office in the scorching summer. When we did that, we had to use ice cubes to cool ourselves off.” Gong Jie said with a smile that only those who had lived through the middle of summer in Hangzhou would know how harsh that experience was.
They were running out of workarounds. New solutions that tackled the root cause had to be found, for example, modularization, a “revolution” at the architecture level.
A revolution is not as simple as inviting someone to dinner. It is very difficult to make fundamental adjustments at the architecture level. First, no successful experience was available for reference, which meant that exploring new paths was the only way. Second, a large number of departments were involved, each with different needs and, hence, different opinions. Third, a revolution was needed to not only solve the problems of the current year and the next year but also to plan for at least the next three years.
At the same time, after communicating with Taobao Mall, unsurprisingly, Alipay set another goal that caused everyone to shout “Impossible!”: Support for a maximum of 20,000 payments per second.
This matter was so serious that everyone was very cautious. “The architecture adjustment solution alone was discussed for a long time,” Chen Liang said. As the architect of the project, he spent a lot of time trying to convince everyone to agree on the solution.
One part of the burden fell on his shoulders, and the other part was handed over to Jiang Tao, who had been shaking while resolving the crisis in 2010. Jiang Tao was more concerned with stability. “We had to ensure service stability while changing the technical architecture. This was very complex and had high technical risks.”
Time was running out. The Logical Data Center (LDC) architecture project was not approved until the end of 2012, less than one year before 2013’s Double 11 promotion. This was undoubtedly a tight schedule for such a huge project.
Chen Liang initially conceived that a grand system could convert all the existing systems into units all at once. However, Cheng Li rejected this solution. “The major problem lies in transactions in Taobao Mall. We have to start with Taobao Mall.” According to him, the system must be released in 2013, even if only the first phase had been finished.
A bunch of impossible targets were gathered together. Now that the targets had been set, moving forward was the only thing to do.
“After the project was approved, we released updates almost every month.” Jiang Tao said that this frequency was several times that of ordinary project development. Despite this, the entire system was not deployed until half a month before Double 11. Minor errors kept occurring. However, as more and more of them were discovered and rectified, he said, “I was finally getting a little bit more confident about it.”
In 2013, the logical data center (LDC) architecture for Alipay was unveiled during Double 11. Alipay sent a “representative” for the first time to “Guangming Peak” at the Alibaba Xixi Campus, the general command center for Double 11.
The “lucky” representative was Li Junkui. “I was just the ‘sacrificial lamb’.” He laughed and said he felt intense pressure as soon as he walked into “Guangming Peak”. Li Jin, the commander in chief for that year, pointed to the big screen and called him up in front of the hundreds of engineers in the whole of Alibaba Group. He said, “Xiangxiu (Li Junkui’s nickname)! You are responsible for the Alipay data!”
Li Junkui has worked on this stress-intensive task for several consecutive years, and even drawn some lessons from it. “First, don’t panic. No matter what kind of feedback you receive, first answer ‘Got it. Let me take a look.’ Actually, you cannot do anything, rather what you do is just convey the problem symptoms to your teammates at the back end as quickly as possible, and then trust them to handle the rest.”
He said this was one of the most important success secrets of Alipay Technology Team: You are never fighting alone, and you cannot fight alone. Behind you, you can always rely on your teammates.
As for the results of this year, according to Jiang Tao, “We survived.” The new architecture had taken its breathtaking first steps.
Guan Gong, Lingyin Temple, and Stress Tests
Another special feature of Double 11 in 2013 was that a painting of Guan Gong appeared in Alipay’s preparation room. Guan Yu (关羽), a mighty warrior in the epic Romance of the Three Kingdoms, received the honorary title of Guan Gong (关公, literally, General Guan) and is worshiped as a patron god in China.
The painting was brought in by coworker Zheng Yangfei. Actually, “worshiping Guan Gong” had been a tradition of the Alipay Technology Team for a long time even before Zheng joined the company. It is said that this tradition can be traced back to the very beginnings of Alipay. Each time an important system update was to be released, engineers forwarded Guan Gong emojis in their Wangwang groups to pray for smooth updates “without any bugs.”
One year later, Guan Gong’s image was “upgraded.” An employee saw Guan Gong shadow puppets during an on-campus recruitment in a college in Xi’an, and then “invited” one to the preparation room. Later, Cheng Li bought a wooden Guan Gong statue. Last year, Deputy CTO Hu Xi bought a bronze Guan Gong statue.
In addition to worshiping Guan Gong, it is also a routine to go to the temple to burn incense. Depending on their destination, the employees are even divided into the “Lingyin Temple faction” (灵隐寺派) and the “Faxi Temple faction” (法喜寺派), which correspond to two major temples in Hangzhou, close to West Lake. As for which side is more efficacious, opinions vary. After Double 11 every year, Cheng Li and Hu Xi personally lead their teams to redeem a vow to the god by walking all the way from the Alipay Building to Faxi Temple. On their way back, they also collect litter along the way.
Technology is pure science. Why would technical engineers believe that praying to some god could prevent system faults and bugs?
“Spiritually, I think it is quite useful.” Chen Liang said, “it is mainly to express respect and the slight fear of unpredictable things. Although we have been working on technology for many years, the road to technology is still full of unpredictable things.”
Unpredictability is the biggest source of anxiety that engineers face during Double 11 every year.
They use their respective methods to relieve the stress from Double 11. Some people prefer doing sports, such as running or playing ball games to relax. Some people obsessively and repeatedly double-check the code. Some people are chihuo, or foodies, who always organize a group to eat at a popular Sichuan hot pot chain, Haidilao, before Double 11.
When he was asked “which year was the most difficult,” Zhao Zunkui, who has participated in all the Double 11 promotions during the past 11 years, answered without hesitation: “It is tough every year.” Chen Liang, who also had a “perfect attendance record,” said: “Before 2014, our confidence in the system stability at 00:00 of Double 11 was, if I have to give a number, 60%.”
“But after 2014, this number changed to 95%,” he quickly added.
Chen Liang’s confidence came from Alipay’s stress testing system, which was built that year. At this time, instead of manually adjusting configurations to test a single machine, they created a simulation environment to run the system. By using the system, they could pinpoint system problems in advance and fix them promptly, so that they would not be caught in a sudden flurry during the event.
“Stress testing gradually transformed Double 11 from an uncertain thing to a definite one, which greatly changed the way we ensure stability for Double 11,” said Zheng Yangfei, who was honored as the “Prince of Stress Testing.”
Although the stress testing in 2014 only covered the core system, the testing system had been very helpful. Within the month before Double 11, it exposed at least 100 serious problems in advance. “If any of them had not been fixed, our Double 11 of 2014 would have definitely failed,” Chen Liang said.
1%? Or 10%?
The stress testing process not only found many hidden risks, but also revealed a big issue: The Oracle database used by Alipay was “exposed” during the stress testing. Specifically, the testing results showed that the performance of the Oracle database apparently almost hit the upper limit.
It was in 2014 when mobile Internet became mainstream. The exponential increase in the percentage of mobile payments was bound to bring a traffic peak that would be fiercer than the ones we saw in previous years, and the Oracle database we were using apparently could no longer withstand this.
So is the solution to buy more servers? Well, the cost of doing so would add up to something well more than we could afford. Moreover, the number of machines that would be added to cope with peaks would be ultimately a total waste of resources, because these machines would be unused during ordinary, non-big promotion days.
So, was there any other method to fix this problem? Well, yes. Alibaba’s proprietary distributed database OceanBase had been quiet during the first two years it was implemented for Taobao and Alipay, and its owner was anxiously looking for a stage to display its capabilities.
However, upon hearing that the database was proprietary, the service department was full of suspicion. When applied, the database would be directly related to the transactions and the amount. The consequences would be unimaginable if any data was incorrect. For such an untested product, they would consider carefully even on ordinary days, let alone for Double 11 when the traffic tended to be very high.
Switch 1% of the traffic to OceanBase first to give it a try. This is a solution that was reached after prolonged discussion.
However, according to the stress testing results of the Oracle database, the gap was not 1% but 10%.
The department responsible for OceanBase said, “Let’s take on that 10%.”
Ten percent does not sound like much. However, 10% of the traffic on Double 11 is equivalent to the peak traffic of ordinary days. If OceanBase could stably handle that 10%, this meant that it could take on the responsibility to support Alipay’s daily operations.
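One common way to send a fixed share of traffic to a new database is to hash a stable key into 100 buckets and route the lowest buckets to the new system. The sketch below illustrates that general technique only; the article does not say how Alipay split the traffic, and the function name `route`, the choice of user ID as the key, and the backend labels are all assumptions.

```python
import hashlib


def route(user_id: str, new_db_percent: int) -> str:
    """Pick a backend for a request, sending new_db_percent% to the new one.

    Hashing the user ID keeps each user on the same backend across requests,
    so raising the percentage only moves users in one direction.
    """
    if not 0 <= new_db_percent <= 100:
        raise ValueError("new_db_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "oceanbase" if bucket < new_db_percent else "oracle"


# Synthetic user IDs, for illustration only.
user_ids = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 10) == "oceanbase" for u in user_ids) / len(user_ids)
print(f"share routed to the new database: {share:.1%}")
```

With this scheme, moving from 1% to 10% is a change to a single number, and setting it to 0 sends everything back to the old database.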
OceanBase had to prove its ability to cope with this. “We gathered Taobao’s employees and coordinated a lot of resources to conduct a test. The test focused on whether the amount of orders in Taobao matches the amount of transactions in Alipay.” Shi Wenhui, an engineer from the DBA team, said they were very cautious in their planning. They planned to switch back from OceanBase as soon as it showed any fault.
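A check of this kind, comparing order amounts on one side against payment amounts on the other, can be sketched as a simple reconciliation. This is a minimal illustration under assumptions: the function name `reconcile`, the use of order IDs as keys, and the sample records are hypothetical, and the actual test procedure is not described in the article.

```python
from decimal import Decimal


def reconcile(orders: dict[str, Decimal],
              payments: dict[str, Decimal]) -> dict[str, list[str]]:
    """Compare order amounts with payment amounts, keyed by order ID.

    Returns the order IDs that are missing on the payment side, present only
    on the payment side, or present on both sides with different amounts.
    """
    shared = orders.keys() & payments.keys()
    return {
        "missing_payments": sorted(orders.keys() - payments.keys()),
        "unexpected_payments": sorted(payments.keys() - orders.keys()),
        "amount_mismatches": sorted(k for k in shared if orders[k] != payments[k]),
    }


# Hypothetical records, for illustration only.
orders = {"A1": Decimal("19.90"), "A2": Decimal("5.00"), "A3": Decimal("120.00")}
payments = {"A1": Decimal("19.90"), "A2": Decimal("5.50"), "A4": Decimal("8.00")}
report = reconcile(orders, payments)
print(report)
# A clean run has all three lists empty; any entry would be a reason to switch back.
print(all(len(ids) == 0 for ids in report.values()))  # False
```

Amounts are held as `Decimal` rather than floating point so that equal monetary values always compare equal.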
The test result showed that OceanBase did not miss a single piece of data. Cheng Li immediately settled the argument: “All the 10% are yours.”
This decision contributed to OceanBase’s debut on Double 11. “It seems that both the Oracle database and Lusu (Cheng Li’s nickname) helped us,” Shi Wenhui said with a smile.
At this time, they were less than two weeks away from Double 11 of 2014. Although the reliability had been proved sufficient to withstand the test, OceanBase was, after all, a four-year-old database. Minor problems emerged one after another. For example, the response time was as high as 10 milliseconds, which was several orders of magnitude worse than that of the Oracle database. In the last 10 days, Shi Wenhui and all his teammates put forth the utmost effort and finally optimized it to less than 1 millisecond.
“After working on it for so many years, of course I have confidence in its capacity and performance,” Shi Wenhui said.
He spoke lightly. However, no one knew better than he and his team how much effort they had devoted to this product, which had once faced team dismissal and project cancellation.
OceanBase was not originally designed for Double 11. However, on the stage of Double 11, it performed excellently when it was first able to take its place in the spotlight. From then on, Alipay began to fully migrate its core transaction system to OceanBase.
As of 2019, OceanBase internally carries 100% of the traffic for Ant Financial’s business. Externally, in the TPC-C benchmark test, which is known as the “Database World Cup”, OceanBase broke the world record maintained by the US company Oracle for nine years. OceanBase became the first Chinese database product to top the leaderboard.
I Won a Bet and Got an Apple Watch
In 2015, Li Junkui visited the Shanghai Stock Exchange (SSE). The trading system of the SSE was deployed on six large computers, which supported a maximum of 100,000 transactions per second.
He was amazed: “100,000! What an unreachable number! What if Alipay could achieve this?!”
When he returned to Hangzhou, he immediately shared his idea with his teammates. In fact, his teammates told him, their goal for the current year was going to exceed 100,000 transactions per second.
Li Junkui was calm about this, because setting such seemingly impossible goals was exactly Alipay’s style.
In contrast, Zheng Yangfei was troubled by this goal. As the head of the comprehensive stress tests for Double 11 of 2015, he had just bet his supervisor on whether he could ensure that no payment problems would occur during Double 11. They bet an Apple Watch.
That year, post-90s Zheng Yangfei took the lead for the first time, moving from participant to project leader for Double 11. However, that same year, he and his team also “shouldered a burden.” In the first half of the year, their stability team was repeatedly frustrated by availability problems. With morale low, many teammates chose to leave. Even worse, doubts persisted inside and outside of the company. In those months, the air seemed to be filled with the word “tough.”
“At that time, only several members stayed in the team, but everyone held their breath, thinking that they must make everything right for Double 11.” Zheng Yangfei said, “We just didn’t want people to think that Alipay was bad.”
The situation was like a last stand. If they failed, they would have to wait another whole year to “erase the humiliation,” because Double 11 was held only once a year, and it was not only an annual final exam, but also an annual stage for their technological and computing experiments. According to Yang Haiti, a senior technical expert from the system department, “Everyone wants to test themselves and display the results of their year-long efforts during Double 11. They would be upset if they were not on the stage.”
Compared with the comprehensive stress tests done in 2014, the stress tests done in 2015 required even more aggressive improvements from several different aspects. First, the test needed to be expanded from the core system to all the systems. Second, platform tools needed to be developed for the comprehensive stress tests. Third, each test needed to be linked with all the stress tests throughout the entire group.
“To be honest, I was very nervous,” said Zheng Yangfei, feeling very unsure of himself.
When the peak of Double 11 was reached right after 00:00, he had forgotten everything about that Apple Watch bet. A scheduled database task that was not verified during the stress test had made the curve unsmooth. With the aim of “stability overwhelms everything,” the whole team was shaken whenever the system stability curve shook. After quick troubleshooting, the result showed that no system problem had occurred, but rather that some details neglected in the stress test had caused the fluctuations.
“That curve does not look very attractive,” he said with regret. However, despite that, everything went relatively smoothly.
But in the end, Zheng Yangfei won the Apple Watch. For him, this Apple Watch was not only a reward. It also carried special meaning. It reminds him that nothing is foolproof, despite full preparation.
In the Pursuit of “As Smooth As Silk”
Actually, every Alipay engineer has a “perfect curve” in their minds.
Ideally, it should be like this: At 00:00 on Double 11, when the peak comes, the curve climbs smoothly without any sharp rise or drop, and there shouldn’t be any frequent fluctuations that torture everyone’s fragile nerves.
To put it simply, the curve should be “as smooth as silk.”
However, every time the Double 11-oriented technological evolution and architecture change reaches a certain stage, “you’ll find that, even though your assumptions at the beginning may have been fairly accurate, challenges come in succession after the traffic starts to hit some high spikes.” Yang Haiti sighed, “A change in quantity also entails a change in quality. This saying is not a lie.”
The “volume” of Double 11 had already entered an unprecedented range. The transaction volume within more than six hours of Double 11 in 2016 exceeded that of the whole day of Double 11 in 2014. Over the years, we have constantly broken our own records.
The difficulty of ensuring stability for such a huge volume has increased by at least one order of magnitude.
Do you still remember the “three-year plan” for the architecture revolution, which was formulated at the end of 2012? It really did last for three years. This architecture revolution was originally intended to remove the limits of the database connections and data centers. During the three-year evolution, many other architectures emerged, for example, active geo-redundancy, disaster recovery, and elastic capacity scheduling. None of these were fully implemented until 2016. Each step of the evolution enabled the system to dynamically scale up and elastically scale out and in.
A “big” problem is a challenge, and so is a “small” one.
For Shi Wenhui, the most impressive moment was not the moment when OceanBase became well-known externally in 2014, but a small test in 2016. In that test, he found that a metric was a bit abnormal, which was subtle with a slight deviation of 2 milliseconds.
“It was just 2 milliseconds. On other occasions, it might have been deemed irrelevant and ignored.” However, fortunately, one of his teammates checked the problem very carefully. It turned out that if this problem had not been resolved, a big issue would have occurred in that year’s Double 11.
“Despite insufficient resources, tight schedules, and imperfect software, our teammates would never neglect any problem,” Shi Wenhui sighed with emotion.
In earlier years, traffic had been the most important challenge for a perfect curve. As time went by and the business continued to expand, however, engineers became more aware that although stability is the overarching goal, the technology must focus on the future.
In 2017, He Yan celebrated his ninth anniversary at Alipay. In the same year, Alipay implemented a hybrid deployment. A large number of idle resources reserved for offline tasks could now be used for online tasks, which greatly improved overall resource utilization.
“In some small scenarios, the savings from such efficiency improvements might not be obvious. However, at the scale of our system, such efficiency improvements opened up an immeasurable future,” He Yan said.
The road to the future was smoother thanks to the foundation that had been constructed by our predecessors over the years.
In 2018, Alipay actually supported two major promotions during Double 11: in addition to the Double 11 Shopping Festival on Alibaba’s various e-commerce platforms, Alipay also had a “Double 11 Code Festival.” The number of errors dropped by 70% to 80% that year. This was the first time that the major promotion was stable throughout the day.
Team Captain Li Wei was very calm: “To put it bluntly, we control various risks properly through systematic or engineering processes. This peak was also within our expectations.”
In 2019, “cloud native” was applied to the Double 11 promotion for the first time.
If technology is accumulated layer by layer like building blocks, cloud native is the cornerstone. Once this foundation is laid, upper-layer applications can be built with a variety of powerful capabilities, as if standing on the shoulders of giants. The business teams then no longer need to worry about the underlying technical issues and can focus solely on their business code.
Light the World
At this point, I wonder if you still remember the teammates who shouted “Impossible!” at the peak goal of 20,000 transactions per second.
Back then, a peak of 20,000 transactions per second was a goal they had to push towards with their utmost efforts for half a year. Last year, by contrast, 20,000 transactions per second had become the norm for Alipay: that level was reached within a single second.
Such a radical change really took place, but, at that time, none of the people there had expected as much. Almost every engineer said: “Every year, the next year’s goal comes out soon after Double 11 ends. Then, we will continue to prepare for and work towards next year’s goals.”
One goal follows another. In the process of conquering one “impossibility” after another, the once seemingly unreachable targets were left behind one by one. The peak of 100,000 transactions per second from back then seems like a piece of cake today.
You can see it only by looking back. Someday, you may suddenly find that you have already walked so far and climbed so high.
Those young people who shouted “Impossible!” but worked desperately to turn the impossible goal into reality have grown up. They now carry themselves with more calm. As for the upper limit, no one knows where it is, or perhaps there is no upper limit at all.
Working together! There’s no stopping us!
Supporting traffic growth is no longer the only goal of Double 11. More complex services and gameplay have grown out of these technological achievements and, in return, have powered further technological development. Beyond Double 11, they can be integrated into many other scenarios, such as new-year red envelopes and the collection of five Fu (福, fortune) cards.
- They May Emanate from Alipay and Alibaba
The feats achieved by Alipay engineers are now being turned into products that serve more financial institutions. So far, dozens of banks and financial institutions have adopted OceanBase, and technologies such as the stress testing platform and cloud native are also gradually being turned into products. With the technologies and experience accumulated through Double 11 over the years, Alipay is driving China’s Internet financial technology forward.
- They May Reach the World
Double 11 is no longer just China’s Double 11; it has become a global carnival. With Double 11, these technologies are also going global.
“Speaking of our dream, it would be like this: During some future Double 11, the whole preparation room is empty. Except for the Guan Gong statue, no one needs to stay onsite, because the intelligent system can do everything. All we need to do is drink a cup of tea or wine while watching the smooth curve.”
On hearing Yao Jie’s vision, some of the interviewed employees laughed. “Is that even possible?” Every year, Double 11 is like fighting a war, and the responders are like firefighters.
But who can say it is impossible? After all, they are a group of people who have turned too many impossibilities into reality.