The Now and Future of Financial Data Intelligence at Ant Financial
In the financial industry, fusion use cases are becoming more complex, and massive data volumes are becoming the norm. A new generation of data technology architecture and core engines for financial computing is therefore needed. At the Developer Summit of Apsara Conference 2019 in Hangzhou, Xiao He, Chief Architect of Computing and Storage at Ant Financial, discussed the present and future of financial data intelligence at Ant Financial. This article is based on his talk.
Over the past decade, Ant Financial has reshaped financial services by using several different advanced and innovative technologies, among which were financial transaction payment and financial data intelligence technologies.
Ant Financial discovered that, in addition to meeting the requirements of traditional big data, financial data intelligence also needs to meet several distinct requirements:
- Highly demanding real-time requirements: The volume of real-time data is growing rapidly, the number of online decisions is increasing, and data timeliness is critical for business development.
- Diversified computing use cases: Computing use cases have developed from simple statistics and decision-making rules to complex systems of graphs, artificial intelligence (AI) models, and complex decision-making rules.
- Research, development, and debugging throughout long data links: Model research and development spans more than 18 systems and involves several different development models, which poses major challenges for the research and development team.
- High availability of computing and storage: Cross-domain disaster recovery and highly reliable computing services are required.
- Data security, regulatory compliance, and risk prevention and control: A rigorous data security and grading system is required to protect user privacy, ensure regulatory compliance, and prevent and control risks.
To meet the requirements of financial data intelligence, Ant Financial has spent much time and effort developing and evolving the computing technologies behind its services. Batch computing based on models such as MapReduce or resilient distributed datasets (RDDs) is needed to process massive data. Real-time computing is needed to meet the industry's demand for real-time solutions. Interactive analysis is needed to meet growing requirements for data analysis. This variety of computing technologies brings its own challenges: inefficient research and development caused by multiple computing models, differing storage requirements across systems, extra costs, differing disaster recovery requirements, the need to ensure data security, and increasing overall complexity.
Open Computing Architecture
For engineers, an ideal solution to the preceding challenges is to build a unified system. However, it is difficult to define such a system, determine its boundaries, and find the right abstractions. Computing engines are closely tied to the businesses they serve, and no single engine can meet all the business requirements of Ant Financial. Moreover, as businesses continue to innovate, new requirements keep emerging. Therefore, Ant Financial requires an open computing architecture that can accommodate all kinds of computing engines.
The open computing architecture requires unified storage. Data can be stored in various formats and in multiple replicas, and its layout can be optimized for different computing engines. However, unified storage must be used at the underlying layer, and unified security control measures must be applied at the site level.
Data security control must be implemented at the site level. The financial data intelligence system must provide unified metadata management, access specifications, security levels, and a privacy protection system across the whole site. Unified metadata management and data security are prerequisites for introducing different computing engines. In addition, different security control policies must be available for each engine.
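The relationship between site-wide metadata, security levels, and per-engine policies can be pictured with a small sketch. Everything in it is hypothetical and chosen for illustration (the `Catalog` class, the three security levels, the engine names); it is not Ant Financial's implementation, only one way such a check might be arranged.

```python
from dataclasses import dataclass
from enum import IntEnum


class SecurityLevel(IntEnum):
    """Hypothetical data grades; a higher value means more sensitive data."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class TableMeta:
    name: str
    level: SecurityLevel


class Catalog:
    """A single site-wide registry of table metadata and per-engine clearances."""

    def __init__(self):
        self._tables = {}            # table name -> TableMeta
        self._engine_clearance = {}  # engine name -> SecurityLevel

    def register_table(self, name, level):
        self._tables[name] = TableMeta(name, level)

    def register_engine(self, engine, clearance):
        self._engine_clearance[engine] = clearance

    def can_read(self, engine, table):
        # Unknown engines and unknown tables are denied by default.
        meta = self._tables.get(table)
        clearance = self._engine_clearance.get(engine)
        if meta is None or clearance is None:
            return False
        return clearance >= meta.level


catalog = Catalog()
catalog.register_table("payments", SecurityLevel.CONFIDENTIAL)
catalog.register_engine("batch", SecurityLevel.CONFIDENTIAL)
catalog.register_engine("adhoc", SecurityLevel.INTERNAL)
```

Because every engine asks the same catalog, adding a new engine in this sketch means registering one clearance rather than re-implementing access rules inside the engine.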
In addition to an open architecture, unified storage, and unified metadata management and data security, Ant Financial also wants a standard programming paradigm. At present, Ant Financial uses standard SQL with extensions so that users can develop business logic directly against the underlying data. The goal is to abstract away engines and storage. Business engineers only need to specify the purpose and timeliness of the data and don’t need to think about stream versus batch computing, because the rest is optimized and handled automatically by the engine and storage layers. The paradigm is also data-oriented: business engineers develop business logic against abstracted data without needing to attend to low-level details. This is, in effect, data virtualization. Ant Financial developed the standard programming paradigm from existing computing models and experience with real-world applications, and it can significantly improve the development experience for users.
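As a rough illustration of declaring timeliness instead of choosing an engine, the sketch below lets the caller state only a query and an acceptable staleness, and leaves the stream-or-batch decision to a planner. The `DataRequest` type, the `plan` function, and the 60-second threshold are all assumptions made for this example, not part of the system described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    """What a business engineer declares: the data wanted and how fresh it must be."""
    query: str                  # standard SQL over a logical (virtualized) table
    max_staleness_seconds: int  # acceptable delay between an event and its visibility


# Hypothetical cut-off: requests needing fresher data than this go to stream computing.
STREAM_THRESHOLD_SECONDS = 60


def plan(request):
    """Pick an execution mode from the declared timeliness alone."""
    if request.max_staleness_seconds < 0:
        raise ValueError("max_staleness_seconds must be non-negative")
    if request.max_staleness_seconds <= STREAM_THRESHOLD_SECONDS:
        return "stream"
    return "batch"


realtime = DataRequest("SELECT user_id, SUM(amount) FROM payments GROUP BY user_id", 5)
daily = DataRequest("SELECT user_id, SUM(amount) FROM payments GROUP BY user_id", 86_400)
```

The same SQL text appears in both requests; only the declared staleness differs, which is the point of keeping the engine choice out of the business logic.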
Together, the infrastructure components described above make up the overall financial data intelligence system that Ant Financial believes can support the continued development of financial businesses. AI is currently one of the hottest topics in the industry, and Ant Financial has also explored applying AI in its financial intelligence services.
In Ant Financial’s current AI pipeline, a dataset is first cleansed in the data warehouse and then used for training on the model platform; the trained model is finally exported and pushed to online services. This process involves multiple systems and requires multiple copies of the data, which creates data security risks and wastes storage. The process also cannot run in real time because of limitations of the model, even though real-time requirements are increasingly important for financial systems. At the same time, users need to work with the data warehouse, understand the machine learning platform, and deploy the model online, which makes for a complex and tedious process. To address these problems, Ant Financial plugged the machine learning engine directly into the new financial data intelligence system.
Ant Financial’s SQLFlow uses SQL to describe the target machine learning task and to connect data with machine learning, with the aim of making machine learning as simple as SQL. Users who understand SQL can train models and run predictions with them.
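To give a feel for what describing a training job in SQL can look like, the following sketch assembles a SQLFlow-style statement from its parts. The clause keywords (`TO TRAIN`, `WITH`, `LABEL`, `INTO`) follow the extended syntax shown in SQLFlow's public examples, but the exact spelling has varied between releases, so treat them as an assumption. The `build_train_statement` helper and the table and model names are hypothetical.

```python
def build_train_statement(select_sql, estimator, params, label, model_name):
    """Compose a SQLFlow-style training statement (syntax assumed, see lead-in)."""
    with_clause = ", ".join(f"{key} = {value}" for key, value in params.items())
    return (
        f"{select_sql.rstrip(';')}\n"
        f"TO TRAIN {estimator}\n"
        f"WITH {with_clause}\n"
        f"LABEL {label}\n"
        f"INTO {model_name};"
    )


statement = build_train_statement(
    select_sql="SELECT * FROM iris.train",
    estimator="DNNClassifier",
    params={"model.n_classes": 3, "model.hidden_units": [10, 20]},
    label="class",
    model_name="sqlflow_models.my_dnn_model",
)
print(statement)
```

The ordinary `SELECT` picks the training data, and the appended clauses name the model type, its hyperparameters, the label column, and where the trained model is stored.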
ElasticDL is Ant Financial’s open-source AI engine based on elastic scheduling. It is built entirely on the open-source TensorFlow framework and adds fault tolerance and elastic scheduling. ElasticDL is also integrated with SQLFlow, which gives users a simpler way to train models.
Financial Graph Computing
Ant Financial has several typical graph computing use cases, such as real-time cash-out recognition, social analysis, and targeting users for marketing. These use cases can be implemented in the financial data intelligence system by plugging a graph computing engine into it, in the same way as other engines. The system provides both offline and online graph computing engines and connects stream computing with batch computing to form a hybrid computing engine. In addition, Ant Financial intends to further optimize its financial data intelligence system and simplify machine learning by using standard languages such as SQL and Gremlin. A highly consistent online graph database is also implemented at the underlying layer of the system to store massive graph data.
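As a toy example of the kind of bounded traversal a graph engine runs, the sketch below checks whether funds leaving an account can flow back to it within a limited number of transfers. Real cash-out recognition is far richer than this; the `returns_within` function, the edge-list representation, and the sample accounts are assumptions made purely for illustration.

```python
from collections import deque


def returns_within(edges, start, max_hops):
    """Return True if funds leaving `start` can flow back to it in at most `max_hops` transfers."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search, tracking how many transfers each account is from `start`.
    queue = deque([(start, 0)])
    seen = set()
    while queue:
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # following another transfer would exceed the hop budget
        for nxt in graph.get(account, []):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return False


# Hypothetical transfers: A -> B -> C -> A forms a three-transfer loop.
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
```

In a graph query language such as Gremlin, the same question would be phrased as a traversal over transfer edges with a hop limit, rather than as hand-written search code.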
Ant Financial’s financial data intelligence system is built on an open architecture. Therefore, whenever a new data engine or data model emerges, it can be easily integrated into the system. When new business requirements arise, deeply customized engines can also be integrated. Any integrated engine can be used directly to process massive financial data. Once these computing engines run stably, they can be abstracted at the upper layer to further improve research and development efficiency.
The Fusion Computing of Financial Data
After the financial data intelligence system is built, we also need to consider how to handle complex business computing use cases. In complex financial use cases, multiple computing engines need to be used at the same time, so fusion computing is required to support more efficient communication between computing engines. For this purpose, Ant Financial developed the Ray fusion computing engine together with the University of California, Berkeley. With the fusion computing engine, a unified research and development process and standard can be used to describe various computing tasks. In addition, the underlying computing state, data, and intermediate results are shared. Users can therefore choose any development language to complete tasks such as data processing, machine learning, and graph computing.
The Ray fusion computing engine is an open-source framework. Ant Financial has made many contributions to the Ray project and has promoted the development of the Ray community together with the University of California, Berkeley. The Ray fusion computing engine can convert users’ simple local logic into massive distributed execution models. Ant Financial has implemented several fusion computing use cases based on the Ray framework. For example, dynamic graph derivation combines stream computing and graph computing and can complete a 6-layer iterative query within one second. For online financial decision-making, the process from data production to distributed query can be completed within one second. Online machine learning enables end-to-end updates, from data samples to pushed models, within seconds.
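The idea of turning simple local logic into parallel execution can be sketched without Ray itself. The example below is a stand-in that uses only Python's standard-library `concurrent.futures` on a single machine: a plain function is left unchanged and fanned out across a pool of workers. The `score` function and its toy formula are hypothetical, and nothing here reflects how Ant Financial uses Ray.

```python
from concurrent.futures import ThreadPoolExecutor


def score(transaction_amount):
    """Plain local logic: a toy risk score, hypothetical and for illustration only."""
    return min(1.0, transaction_amount / 10_000)


def score_all(amounts, workers=4):
    """Fan the unchanged local function out across a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `amounts`.
        return list(pool.map(score, amounts))


scores = score_all([2_500, 20_000, 10_000])
```

With Ray, the corresponding pattern is to decorate the function with `@ray.remote`, submit calls with `score.remote(...)`, and collect results with `ray.get(...)`, which schedules the tasks across a cluster instead of local threads.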
The fusion computing engine is not designed to replace all the engines mentioned earlier, but will be inserted into the open computing architecture as a computing engine for special cases. After gaining an in-depth understanding of all computing engines, we can optimize them and construct a SmartSQL layer.
Finally, I will share Ant Financial’s outlook for the financial data intelligence system. In the future, we hope that the financial data intelligence system can provide unified storage at the underlying layer, and provide open, pluggable, and reusable engines at the intermediate layer. In addition, we hope that the upper layer can be optimized and unified or directly be opened to different engines to form a big data base system. We hope that the financial data intelligence system will be as simple as a database, and that it can be developed into a big data base that has an open computing architecture. In this way, the financial data intelligence system can process data and support unlimited scalability.