SOFAStack: Building a Financial-level Cloud-Native Architecture
By Zhu Sansong
I recently came across a quote that sounds counterintuitive at first but starts to make sense the more you think about it: “Progress is driven by lazy people.”
Technology keeps evolving because “lazy people” are always looking for easier ways to do things. To spare ourselves the labor of traveling, we created the steam engine. To avoid manual calculation, we created the electronic computer. To avoid carrying cash, we invented online transactions and contactless payment. Data and information technology have become the foundation of this era, and innovative services and applications are all around us.
Today, we get almost everything from the Internet, and massive volumes of data and code flow through cyberspace. Developers began to think about packaging common code modules so that the layers above them could reuse those modules at any time. This way, they no longer had to write the same code over and over, and coding became a matter of assembling modules.
This laziness gave rise to a concept now commonly known as “middleware”.
Most people are unfamiliar with middleware. Technically, middleware is a layer of software that sits between the infrastructure and business systems. Programmers have come up with a variety of metaphors for it. Some compare it to “prefabricated parts” on a construction site, which spare workers from mixing cement from scratch. Others compare it to a “middleman” that aggregates sources of goods, freeing sellers from repeatedly inquiring about and comparing prices.
“There is a lot of communication and integration work between the infrastructure and business systems. Enterprises never want to spend extra labor on it for every business system,” said Ma Zhenxiong, a senior product expert at Ant Group. “This is a common requirement for all enterprises.”
The Rise of SOFAStack
To meet this requirement, Ant Group created the Scalable Open Financial Architecture Stack (SOFAStack), a collection of cloud-native middleware components designed to build high-performance, highly reliable distributed systems. SOFAStack has been fully validated in mission-critical financial business scenarios.
SOFAStack came out quietly, with the original intention of “rescuing” Alipay. At that time, Alipay did not offer Ant Forest, Ant Credit Pay, or Health Code. It was a simple application that ran on an application server, used a database, and served Taobao.
Featuring simplicity, ease of use, and convenience, this simple architecture supported the development of Alipay from 2004 to 2006. However, as transaction volume and business complexity grew, Alipay ran into growing pains.
“The technical team grew from dozens of people to hundreds and finally to thousands. At each scale, the R&D and organizational methods were different,” said Huang Ting, a senior technical expert at Ant Group. “As the number of people increases, code written by different developers diverges and conflicts increase.”
In short, R&D efficiency suffered.
If Alipay used to be a bungalow, it now had to grow into a city. Yet to put up each building in that city, workers had to fire bricks and mix cement from scratch, without excavators or hydraulic hammers. This was unacceptable to the team responsible for “building the city”.
For example, iWallet, an electronic wallet system of Alipay, took 5 or 6 minutes to start each time. Whenever a bug was found and fixed, iWallet had to be relaunched. As a result, developers were trapped every day in an “endless loop” of compiling and relaunching.
The root of the issue was that iWallet bundled dozens of projects, developed concurrently by more than 10 teams, into a single system, which is why developers called it a monolith. The original Alipay architecture could neither support such complex business logic nor allow so many engineers to work concurrently.
Alipay therefore had two requirements. First, hundreds of projects had to run concurrently, with engineers working without interfering with each other. Second, as business logic grew more complex, system complexity must not grow exponentially.
The Need for Middleware
To meet these requirements, Alipay needed a set of “middleware”.
The opportunity came in 2006. The technical team held a series of meetings devoted to a single subject: Alipay's future technical architecture. Team members proposed two options. One was a centralized architecture, like those used by banks. The other was a distributed architecture: not the small-scale client/server kind, but an ultra-large-scale distributed architecture for the Internet era.
The second option had never been explored before.
However, Alibaba's engineers were neither timid nor hidebound. After about a year of deliberation and debate, the technical team decisively chose the second option. Starting in 2007, Alipay rebuilt its transaction, merchant, membership, and payment and settlement systems on a new architecture.
This distributed architecture was named the Service-Oriented Fabric Architecture (SOFA).
The name has two origins. First, service-oriented architecture (SOA) was popular at the time, and SOFA was created by bringing financial businesses into SOA.
Second, it is spelled the same as the word “sofa”. As the name implies, the developers hoped engineers could work comfortably with SOFA.
From a “Connector” to a “Tool Library”
What is SOA? Technically, it reorganizes an enterprise's IT system into units called “services”, which are then connected through a “service bus” to form a pluggable enterprise IT architecture.
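To make the idea concrete, here is a minimal, in-process sketch of services registered on a bus and invoked by name. It is purely illustrative: the `ServiceBus` class and the `account.*` service names are hypothetical, not part of SOFA or any SOA product, and a real service bus would add network transport, routing, and protocol translation.

```python
from typing import Any, Callable, Dict


class ServiceBus:
    """Hypothetical in-process service bus: callers look up services by name only."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        # Plugging a service in: the bus records it under a well-known name.
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = handler

    def unregister(self, name: str) -> None:
        # Unplugging a service; callers are unaffected until they invoke it.
        self._services.pop(name, None)

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Callers depend only on the service name, not on who implements it.
        try:
            handler = self._services[name]
        except KeyError:
            raise LookupError(f"no such service: {name}") from None
        return handler(*args, **kwargs)


# Usage: one team registers a balance service; another composes it by name.
bus = ServiceBus()
balances = {"alice": 100}
bus.register("account.balance", lambda user: balances.get(user, 0))
bus.register(
    "account.can_pay",
    lambda user, amount: bus.call("account.balance", user) >= amount,
)

print(bus.call("account.can_pay", "alice", 30))  # True
```

The design point is that callers depend only on a service name, so a service can be replaced or re-implemented without touching the code that calls it, which is what makes the architecture “pluggable”.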
If this definition seems hard to grasp, don't worry. At the time, SOA was just an idea for traditional enterprise IT architecture, a theoretical framework, and there were no proven, successful SOA implementations in the industry.
The pioneers at Ant Group started cautiously. The first-generation SOFA solved only two problems. First, it acted as the glue, or connector, that joined distributed systems into a whole. Second, it made each service component-based, so engineers only needed to focus on their own components. The components were then assembled into services, and the services into a complete system.
Huang Ting said, “SOFA isolates modules so that different developers can work on them separately. Everyone gets a clearer division of labor and does not overlap too much with others.”
The first-generation SOFA clearly defined the boundaries between teams, covering both division of labor and collaboration. Huang Ting gave an example. For a simple bank transfer, the system needs to call the sender's address book, while accounting-related subsystems may have to query the bank to check whether the account balance is sufficient. The whole process involves complex interactions among systems developed and maintained by different teams, and SOFA is what lets them interact efficiently to complete a transaction.
However, the new SOFA distributed middleware could not solve every problem. It had to be continuously iterated.
On this unexplored road, there was no one to follow, only a growing list of technical problems.
With SOFA, Alipay separated its financial business systems (the business middle platform) from its underlying IT systems (the data and computing middle platforms). It also had to cope with massive data volumes during Double 11 and a stream of emerging technical issues. The two SOA transaction standards available in the industry could not handle the transaction volumes of Alipay's core systems, so the Alipay team decided to develop its own standard to ensure consistency across distributed services.
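The article does not describe what Alipay's in-house consistency standard looked like. One widely used pattern for keeping distributed services consistent is Try-Confirm-Cancel (TCC): every service first reserves resources (Try), and only if all reservations succeed does the coordinator commit them (Confirm); otherwise it releases them (Cancel). The sketch below shows that pattern for a transfer, assumed here purely for illustration. `DebitService`, `CreditService`, and `transfer` are hypothetical names, and this is not Alipay's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DebitService:
    """Hypothetical payer-side service: freezes funds in Try, deducts in Confirm."""
    balance: int
    frozen: Dict[str, int] = field(default_factory=dict)

    def try_(self, tx_id: str, amount: int) -> bool:
        # Reserve funds without moving them; a failed Try leaves no reservation.
        available = self.balance - sum(self.frozen.values())
        if available < amount:
            return False
        self.frozen[tx_id] = amount
        return True

    def confirm(self, tx_id: str) -> None:
        self.balance -= self.frozen.pop(tx_id, 0)

    def cancel(self, tx_id: str) -> None:
        self.frozen.pop(tx_id, None)  # idempotent: cancelling twice is harmless


@dataclass
class CreditService:
    """Hypothetical payee-side service: records a pending credit in Try."""
    balance: int
    pending: Dict[str, int] = field(default_factory=dict)

    def try_(self, tx_id: str, amount: int) -> bool:
        self.pending[tx_id] = amount
        return True

    def confirm(self, tx_id: str) -> None:
        self.balance += self.pending.pop(tx_id, 0)

    def cancel(self, tx_id: str) -> None:
        self.pending.pop(tx_id, None)


def transfer(tx_id: str, amount: int, debit: DebitService, credit: CreditService) -> bool:
    """Coordinator: run Try on every participant, then Confirm all or Cancel all."""
    participants = (debit, credit)
    tried = []
    for p in participants:
        if not p.try_(tx_id, amount):
            # Undo reservations made by participants that already succeeded.
            for done in tried:
                done.cancel(tx_id)
            return False
        tried.append(p)
    for p in participants:
        p.confirm(tx_id)
    return True


payer = DebitService(balance=100)
payee = CreditService(balance=0)
print(transfer("tx-1", 70, payer, payee), payer.balance, payee.balance)  # True 30 70
print(transfer("tx-2", 50, payer, payee), payer.balance, payee.balance)  # False 30 70
```

A production coordinator would also persist transaction state and retry Confirm or Cancel after crashes; that is omitted here to keep the pattern visible.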
The Evolution of SOFA
Behind every few lines of SOFA code lie countless obstacles like these, overcome over the years, along with the many ideas and technical practices they produced.
SOFA pioneered new technologies and opened up new areas.
The first-generation SOFA was modular.
The second-generation SOFA was service-based.
The third-generation SOFA was unitized, an approach regarded as one of Ant Group's cutting-edge technologies. Its active geo-redundancy architecture made it easier to scale out server resources and ensured stable, smooth processing of every user order. The SOFA team admitted that the unitized architecture was forced on them by the business: it was needed to rebuild ultra-large-scale Internet financial transaction processing as a distributed system, and there was no precedent for it in the industry.
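A core idea behind a unitized (cell-based) architecture is that each user's requests and data are pinned to one self-contained unit, so capacity grows by adding units and most traffic never crosses unit boundaries. The article does not say how Alipay assigns users to units. The sketch below assumes a simple scheme in which a stable hash of the user ID selects one of a fixed number of shards, and a routing table maps shards to units that may sit in different data centers. `NUM_SHARDS`, `UNIT_OF_SHARD`, `route`, `fail_over`, and the unit labels are all hypothetical.

```python
import hashlib
from typing import Dict

NUM_SHARDS = 100  # hypothetical: fixed number of logical data shards

# Hypothetical routing table mapping each shard to a unit in some data center.
# Moving shards to another unit (to scale out or fail over) only changes this
# table, not the application code.
UNIT_OF_SHARD: Dict[int, str] = {
    shard: ("unit-hz-a" if shard < 50 else "unit-sh-b") for shard in range(NUM_SHARDS)
}


def shard_of(user_id: str) -> int:
    """Stable shard for a user: the same user always lands on the same shard."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def route(user_id: str) -> str:
    """Return the unit that owns this user's data and should process the request."""
    return UNIT_OF_SHARD[shard_of(user_id)]


def fail_over(from_unit: str, to_unit: str) -> None:
    """Reassign every shard owned by a failed unit to a surviving unit."""
    for shard, unit in list(UNIT_OF_SHARD.items()):
        if unit == from_unit:
            UNIT_OF_SHARD[shard] = to_unit


uid = "user-42"  # hypothetical user ID
print(route(uid) == route(uid))  # True: routing is deterministic
fail_over("unit-hz-a", "unit-sh-b")
print(route(uid))  # unit-sh-b: the surviving unit now owns every shard
```

Because any unit can take over another unit's shards by updating the routing table, a scheme like this is one way active geo-redundancy can be made practical; real systems must also replicate each shard's data to the standby unit before switching traffic.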
“We did review some papers and concepts, but no one was sure that a unitized architecture could work at Alipay's scale,” said members of the SOFA team.
SOFA continued to iterate and grow as the Alipay architecture was optimized. It started as a simple framework. Later, it improved communication performance and disaster recovery efficiency, built geo-disaster recovery architectures, underwent unitized reconstruction, and introduced the Logical Data Center (LDC) project. SOFA gradually matured as more and more technical tools were added, until it was no longer a single piece of middleware but a tool library.
Key Features of SOFAStack
With this, SOFA completed its first major evolution. Its full name was changed to Scalable Open Financial Architecture, reflecting its focus on building architectures for financial-level systems. Some developers also appended “Stack”, meaning a suite, to the name.
The new name clearly reflects the developers' vision and expectations:
- Scalable: A scalable architecture can process more transactions, support more businesses, and enable thousands or even tens of thousands of engineers to collaborate.
- Open: The architecture is easy for business applications to use and can integrate with classic architectures.
- Financial: SOFAStack must incorporate financial-level attributes to ensure financial-level consistency, availability, and stability.
In the “SOFAStack Financial Distributed Architecture White Paper” released in 2020, Ant Group defined SOFAStack as a technology stack for building financial-level cloud-native distributed applications.
Over the years, SOFAStack has withstood the tests of many large promotional events and supported the development of all of Ant Group's businesses, becoming one of its star products. Meanwhile, as distributed architecture entered the mainstream, the middleware market grew rapidly.
Some team members proposed bringing SOFAStack to market.
With the agreement of most of the team, SOFAStack went to market.
SOFAStack Goes to Market
The market competition is fierce.
Before SOFAStack came to market, traditional enterprises still ran their core systems on centralized architectures, most notably the well-known IOE (IBM, Oracle, EMC) architecture. Its three core components were IBM minicomputers with powerful computing capabilities, expensive high-end EMC storage devices, and Oracle databases. On top of these, large amounts of business logic ran in J2EE containers or CICS transaction middleware.
Beneath this apparent prosperity, the foundation of the centralized architecture was shaky. Although IBM standalone servers performed well, core applications running on a single server could no longer handle high concurrency once large numbers of financial institutions went digital and expanded their online businesses.
Solving this problem required scaling out.
Under the IOE architecture, however, adding capacity meant costly server upgrades, which not every enterprise could afford. During Double 11 in 2013, Oracle asked Alibaba to pay hefty extra fees for the surge of traffic running on its databases.
Fortunately, Alibaba had developed its own database product, OceanBase.
The in-house OceanBase database helped Alibaba cut costs. Seeing strong market demand for replacing the IOE architecture, Ant Group brought SOFAStack to market.
Bank of Nanjing: The First Customer of SOFAStack
“Many banks have seen Ant Group's past achievements and financial innovations.” As head of the SOFAStack commercialization team, Ma Zhenxiong is optimistic about SOFAStack's future. “Look, customers have reached a consensus and recognized the trend. They want to move in this direction too.”
At the beginning of 2017, Bank of Nanjing started “dual-mode operations”: it retained its stable traditional core system while building an open, flexible, agile core system alongside it. In April 2017, multiple Ant Group teams, including the Platform Architecture Department, Financial Core Platform Department, Technology Risk Department, and Micro-Loan Business Department, carried out a comprehensive assessment of Bank of Nanjing's systems.
Every team gave its best because this was the first customer. SOFAStack showed off all its features in its first-ever showcase.
In July 2017, Ant Group assigned a technical team to Bank of Nanjing. The team was responsible for the roadmap and top-level architecture design of the distributed transformation, to keep the customer from taking detours early on. In October 2017, Bank of Nanjing released its own open Internet finance platform, “Xinyun+”, at the Yunqi Conference.
On November 18, “Xinyun+” officially went live.
Through this first project, the SOFAStack team gained commercialization experience and learned to adjust quickly and flexibly to customer feedback and requirements. Normally, responding to customer requirements takes a long time. Requirements are first reported to the delivery and after-sales O&M departments. The O&M department then analyzes them and passes the core requirements to the product team. The product team schedules the work, has the technical team implement it, releases a new version, and finally hands that version to the after-sales team for maintenance.
For Bank of Nanjing, however, Ant Group assigned a joint team covering product, technology, business, after-sales, delivery, and O&M. Whenever bugs or new product requirements came up, the project team could resolve them quickly, at times releasing as many as six product versions in a single day. This Internet-style rapid iteration stunned the traditional financial industry.
Refined through commercialization and productization, the fourth-generation SOFAStack emerged.
After the Bank of Nanjing project, SOFAStack's financial-level cloud-native architecture solution gained wide recognition in the industry. More and more financial institutions looking to move away from the IOE architecture sought to work with Ant Group.
The market has been thoroughly stirred up, and this “new species” is undergoing a metamorphosis.
Today, SOFAStack has a growing number of customers, from well-known large institutions to small enterprises with distinctive visions. The transition has largely gone smoothly, but the team has also run into many feature adaptation issues. Ma Zhenxiong said that sometimes, after the team deployed the platform and the customer entered the development or testing phase, the customer might raise dozens of questions about a product in a single day.
When I asked him whether this discouraged him, Ma Zhenxiong smiled and said that most of the time the team felt both pain and joy.
It was painful because Ant Group's star product was swamped by incoming problems. It was also joyful because the questions showed that customers had high expectations of the team; if customers had no confidence in the product at all, that would be truly embarrassing. Ma Zhenxiong added that such customers were valuable. “We are not afraid of noise. We are afraid of no noise.”
Cooperation with Huarui Bank
Among these customers, Huarui Bank is a typical case.
Compared with joint-stock banks and city commercial banks holding more than 100 billion in assets, or Bank of Nanjing with assets in the trillions, Huarui Bank, with assets of just over 30 billion, is only a minor customer.
However, a small case can have a big impact. Ma Zhenxiong described the cooperation between SOFAStack and Huarui Bank as “the benchmark for private bank businesses”. Before working with Alibaba and Ant Group, Huarui Bank spent almost a year researching how to build a cloud platform. Huarui Bank has no physical branches or counters; customer acquisition, account opening, deposits, and loans are all handled online.
In other words, it is an Internet-based bank that, like Ant Group, has the Internet in its DNA. At the end of 2019, Huarui Bank used the SOFAStack financial-level distributed architecture, the mPaaS mobile development platform, and Alibaba's Apsara cloud computing system to build its own financial cloud platform, “Xiangyun”, which supports business systems such as mobile banking, marketing, anti-fraud, and loan accounting.
Many of the technical tools work out of the box, and new capabilities keep arriving.
Ye Ning, general manager of Huarui Bank's technology department, said in an interview that small and medium-sized banks must know what they should do and what they should not. Because they lack the technical strength of large state-owned banks and joint-stock banks, they need help from financial technology companies.
“By working with Alibaba Cloud and Ant Group, we can free ourselves from inefficient work. We do not need to spend time building standardized software and hardware.” Ye Ning compared the process to cooking. Some people like to grow their own vegetables, raise their own pigs, and press their own oil, which fits a green and healthy lifestyle. However, not every home cook has the energy for all that.
“Huarui Bank does not want to be a farmer. We just want to buy semi-finished products at the supermarket and make dishes that suit our taste,” Ye Ning said.
This analogy echoes the reason middleware was born in the first place. With mixers on the construction site, semi-finished products in the household refrigerator, and modular components in code, nobody needs to waste energy on repetitive, inefficient work.
In the first quarter of 2020, Huarui Bank's online customer base grew by 468%, its system development speed improved by more than 30%, and the time needed to prepare system environments and expand resources was significantly shortened. During the epidemic, the upgraded financial-level distributed core systems effectively supported the surge in online business volume.
PICC: SOFAStack’s Extension to the Insurance Industry
Beyond banking, SOFAStack has also proven its capabilities in the insurance industry.
In 2018, Ant Group began working with PICC. A solution built on products such as mPaaS and SOFAStack helped this well-known insurer remove technical bottlenecks and build an industry-leading, new-generation core business system.
Within just a few months, PICC increased its policy processing capacity thousands of times over, reaching 1,000 orders issued per second. It also made onboarding of external channel products 6 times more efficient, cut the launch time of new products by 80%, and achieved 99.99% availability for platform services. Processing tens of thousands of daily settlement files used to take PICC 4 hours; now it takes only 6 minutes.
SOFAStack is doing well in the insurance industry.
The Adoption of SOFAStack in Various Industries
Ma Zhenxiong said that SOFAStack's original mission was to support all of Ant Group's businesses, including Yu’E Bao, insurance, and Zhima Credit, which together cover almost every business requirement in the financial industry.
“SOFAStack natively supports every segment of the financial industry,” said Ma Zhenxiong. This support, however, rests on a large body of technology accumulated behind the scenes.
SOFAStack has grown from an initial experiment into a rapidly developing product, and from a pioneer on an unexplored road into a leader of digital transformation. It is also used together with the mPaaS mobile development platform and the OceanBase distributed database.
Other customers, including Shunde Rural Commercial Bank, Shenzhen Rural Commercial Bank, Cathay Century Insurance, and Trust Mutual Life, have also begun working with Ant Group on SOFAStack. The slogan “distributed architecture is the future” is drawing more and more customers to adopt distributed architecture, which promises to bring a new revolution to the era.
Trusted Native: The Future Is Here
SOFAStack has now reached its fifth generation. What began as a simple middleware framework is now a toolbox that includes SOFABoot, SOFARegistry, MOSN, and SOFARPC. In the open-source community, tens of thousands of members have contributed to these projects and components, and SOFAStack has been tested and proven in a growing range of application scenarios.
When I asked Huang Ting what was new in the fifth-generation SOFAStack, he said the major change is “trusted native”. When SOFAStack serves a widely used application, users have high requirements for data privacy, security, and reliability. The SOFAStack team has worked to push past technical boundaries and build stable frameworks to deliver a more secure and reliable architecture.
To explain “trusted native”, we first need to understand “cloud native”.
Cloud native is an approach designed for cloud applications. With it, applications can be built, released, and deployed quickly and frequently, while achieving excellent scalability, availability, and portability. Cloud-native technologies have become a major direction of modern cloud computing, and more and more enterprises are adopting them.
Since 2018, Ant Group has fully embraced cloud-native technologies. As the vehicle for Ant Group's core cloud-native technologies, SOFAStack is also undergoing dramatic changes.
In some technical fields, SOFAStack is already at the forefront of the industry. The best known is service mesh. Building on open-source community projects, the SOFAStack team developed its own component, SOFAMosn, which was later spun off as an independent project under the MOSN brand. During Double 11 in 2019, MOSN carried the traffic of Alipay's core links, forming one of the largest service mesh clusters in the world.
The wave of innovation is surging, but few are willing to be the first to ride it.
Cloud-native technologies have a huge impact on existing technical architectures. Distrustful of emerging technologies, most business personnel and customers have taken a wait-and-see attitude. In recent years, financial institutions have used cloud-native technologies only for new businesses, and few dare to adopt them in their core transaction systems.
As long as customers distrust emerging technologies, SOFAStack will find it hard to develop further. After a long period of reflection and practice, Ant Group put forward “trusted native”, which is essentially about making cloud native trustworthy.
However, making cloud native trustworthy involves a long technical chain. Both businesses and users have high requirements for security, stability, and trustworthiness, which cannot be met by shoring up a few technical points. Instead, everything must be trustworthy: the whole stack from hardware to application, the whole application lifecycle from development, deployment, and upgrade to retirement, and the whole user access path from the mobile client to the core database.
As a practitioner of trusted native, SOFAStack is pursuing an even more ambitious transformation.
In terms of reliability, SOFAStack has been proven in practice over the years by Double 11 and by active geo-redundancy deployments. For secure production, data protection, and related areas, key trusted-native technologies such as “secure containers” and “confidential computing” have been added to SOFAStack. Going forward, SOFAStack will work with academic institutions and industry customers in China and abroad to keep strengthening trusted native.
New technologies bring both risks and opportunities.
“We can use new technologies to build a safer and more reliable system than before,” said Wang Xu, a senior technical expert at Ant Group. “More importantly, we need to deliver the intangible product of ‘trust’ to users through our technologies.”
Is progress really driven by lazy people? Not necessarily.
But I believe that those who invent new technologies in search of easier ways to do things are the laziest people in the world, and also the smartest and most diligent.
This is also true of the people who built middleware. They turned heavy workloads in the world of code into modules, freeing most programmers and making programming smooth, elegant, and easy. They are lazy because they refuse to accept boring, inefficient work. They are hardworking because they put in as much effort as anyone, and the tools they build benefit the whole industry.
Today, distributed architecture faces fierce competition. To reach the summit, SOFAStack still has many issues to tackle.
After all, new technologies keep emerging with the tide of the times.