What Is Involved in Merging Blink into Apache Flink?

  • Open-sourcing Blink is mainly about handing over to community developers the large body of core code that Alibaba has accumulated in stream and batch processing: new features, performance optimizations, and stability improvements. In short, Blink's core code is built on the open-source Flink engine and shaped by the internal business needs of the Alibaba Group.
  • Blink is being open-sourced as a branch: once it is fully open-source, it becomes a branch under the Apache Flink project.
  • The goal of open-sourcing Blink is not to create another independent, actively developed project, but to make Flink better. With the code public, community developers, including you, can see exactly how Blink was written and implemented, and then use those insights to merge Blink features into Flink more efficiently.

Upgrades to the Underlying Architecture

Generally speaking, when an IT system changes significantly, the cause usually lies in changes or upgrades to its underlying architecture. Flink 1.9.0 is no exception: Flink has taken a big step toward unifying stream and batch processing. First, let's look at the architecture diagram of earlier Flink versions:

Changes to Table API and SQL Queries

When Blink was open-sourced, its Table module was adopted as the basis for a new underlying architecture design in Flink. As a result, in Flink 1.9.0 the Table module naturally became the first module to use the adjusted architecture. However, to minimize the impact on users of earlier versions, developers at Alibaba and in the Apache Flink community still needed to find a way for the two architectures to coexist.
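One way to picture this coexistence is two interchangeable planners sitting behind the same Table API. The sketch below is a toy Python model, not Flink code: the `Planner` and `Mode` enums and the `target_runtime_api` function are hypothetical names. It encodes one documented difference in Flink 1.9: the original planner translates batch queries into DataSet programs and streaming queries into DataStream programs, while the Blink planner treats batch as a special case of streaming and translates both into DataStream programs. In real Flink 1.9 Java code, the planner is chosen when the table environment is created, for example by passing `EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()` to `TableEnvironment.create(...)`.

```python
from enum import Enum


class Mode(Enum):
    STREAMING = "streaming"
    BATCH = "batch"


class Planner(Enum):
    OLD = "old"      # Flink's original Table planner
    BLINK = "blink"  # planner contributed from Blink


def target_runtime_api(planner: Planner, mode: Mode) -> str:
    """Toy model: which runtime API a Table API/SQL query is lowered to."""
    if planner is Planner.BLINK:
        # Blink treats batch as a special case of streaming, so both
        # modes end up as DataStream programs (one unified path).
        return "DataStream"
    # The old planner keeps two separate translation paths.
    return "DataSet" if mode is Mode.BATCH else "DataStream"


# Usage: list every planner/mode combination side by side.
for planner in Planner:
    for mode in Mode:
        print(planner.value, mode.value, "->", target_runtime_api(planner, mode))
```

Keeping both planners selectable is what lets existing jobs stay on the old translation path while new jobs opt in to the unified one.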

Improvements to Batch Processing

Flink's batch processing made significant progress in version 1.9.0. Building on the adjusted architecture, this release adds several improvements to batch processing.

Improvements to Stream Processing

Stream processing is still Flink's core domain, so version 1.9.0 also brings improvements in this area. It adds a very practical feature: FLIP-43, the State Processor API. Access to state data, and to the savepoints composed of that state data, has long been in high demand among community users. Before Flink 1.9.0, Flink offered the Queryable State feature, but its application scenarios are limited and its results are not ideal, so few users adopted it. The State Processor API, by contrast, provides more flexible access to state and enables several advanced use cases:

  1. Bootstrap the state. Users can read data from external systems in advance, convert it to the Flink savepoint format, and then start the Flink job from this savepoint. This avoids many cold-start problems.
  2. Analyze the state. Use the State Processor API to analyze state data directly. State has always been a “black box” for users: they cannot tell whether the data stored in it is right or wrong, or whether any anomalies have occurred. With this API, users can analyze state data just like any other data.
  3. Repair dirty data. If a piece of dirty data contaminates your state, you can use this API to find and correct it.
  4. Migrate the state. Suppose a user modifies the logic of a job but wants to reuse most of the original job's state, with some fine-tuning. The user can use this API to carry out that migration.
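All four use cases share one idea: a savepoint is treated as data that can be created, read, and transformed offline. The toy Python model below illustrates only that idea; it is not Flink's API or savepoint format. It assumes a savepoint is a plain dict from an operator uid to a per-key state map, and `bootstrap`, `find_anomalies`, `repair`, and `migrate` are hypothetical helpers matching items 1 to 4. In Flink 1.9 itself, the State Processor API lives in the `flink-state-processor-api` module and builds on the DataSet API, with entry points such as `Savepoint.load(...)` and `Savepoint.create(...)`.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Toy savepoint: operator uid -> (key -> state value). Not Flink's format.
Savepoint = Dict[str, Dict[str, int]]


def bootstrap(uid: str, records: Iterable[Tuple[str, int]]) -> Savepoint:
    """Use case 1: build initial keyed state (a per-key sum) from external records."""
    state: Dict[str, int] = {}
    for key, value in records:
        state[key] = state.get(key, 0) + value
    return {uid: state}


def find_anomalies(sp: Savepoint, uid: str, is_bad: Callable[[int], bool]) -> List[str]:
    """Use case 2: query state like ordinary data; return keys whose value looks wrong."""
    return sorted(key for key, value in sp[uid].items() if is_bad(value))


def repair(sp: Savepoint, uid: str, fix: Callable[[int], int]) -> Savepoint:
    """Use case 3: return a corrected copy; the input savepoint is left unchanged."""
    return {
        op: ({k: fix(v) for k, v in state.items()} if op == uid else dict(state))
        for op, state in sp.items()
    }


def migrate(sp: Savepoint, old_uid: str, new_uid: str,
            transform: Callable[[int], int]) -> Savepoint:
    """Use case 4: move one operator's state to a new uid, adjusting values on the way."""
    result = {op: dict(state) for op, state in sp.items() if op != old_uid}
    result[new_uid] = {k: transform(v) for k, v in sp[old_uid].items()}
    return result


# Usage: bootstrap, spot a bad value, repair it, then migrate to a new operator uid.
sp = bootstrap("counter", [("a", 1), ("b", 2), ("a", 3), ("c", -5)])
bad_keys = find_anomalies(sp, "counter", lambda v: v < 0)
fixed = repair(sp, "counter", lambda v: max(v, 0))
migrated = migrate(fixed, "counter", "counter-v2", lambda v: v * 10)
```

The helpers return new dicts instead of mutating their input, in the spirit of writing a modified savepoint to a new location rather than editing the original in place.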

Integrating Hive

Apache Hive has always been an important force in the Hadoop ecosystem, so integration with Hive is essential for promoting Flink's batch processing capabilities. During the development of version 1.9.0, we were very pleased to have two Apache Hive PMC members help drive the integration of Flink and Hive.

Summary

In summary, Flink 1.9.0 was finally released after more than six months of intensive development. During this process, many Chinese developers and users joined the Flink community and contributed a large amount of code, a promising start for Flink. Going forward, we will continue to invest in the functionality and ecosystem of the Flink community and to popularize Flink in China and around the world.

Original Source

Alibaba Cloud