An Enterprise-level Data Empowerment System That Features a Closed Loop, Accumulation, and Sustainability
The following content is based on a speech presented by an expert at Alibaba Cloud. This article covers the development of an enterprise-level data empowerment system that features a closed loop, accumulation, and sustainability, and introduces Umeng DataBank.
Developing the Enterprise-level Data Empowerment System
Four Key Steps
Our goal is to quickly form a closed loop that covers data at different enterprise touchpoints, and then bring the scattered data together so that it can quickly empower the business. This effort involves four key steps. The first is the transformation of businesses into data. For this step, we must consider whether all the touchpoints of an enterprise are authentic and connected. The second is the transformation of data into assets. For this step, we must consider whether data can be managed as assets. The third is the transformation of assets into applications. For this step, we must consider whether an enterprise’s assets are effectively applied. The fourth is the transformation of applications into value. For this step, we must consider how to leverage data assets to empower businesses. The ultimate purpose of all applications is to boost business growth, customer acquisition, and value production. To achieve these goals, it is essential to form a closed loop that accumulates data. Ultimately, the data mid-end and data empowerment must be sustainable.
Purpose of Developing This System
The following figure shows how we built a closed-loop, accumulative, and sustainable enterprise-level data empowerment system. The figure shows the enterprise-oriented data bank to be released by Umeng+. Now, let’s discuss how data banks and businesses work with each other. Based on cloud infrastructure, such as MaxCompute, Umeng DataBank will continuously help enterprises collect data from various scenarios and touchpoints, perform data governance, purification, and model processing, and form various application services. Based on the connection capabilities of UMID, multiple accounts and terminals are normalized, and data can be connected across different terminals, specifically different mobile clients, servers, and client platforms. All of this helps developers gather data assets from all scenarios and touchpoints and manage applications.
Cross-terminal user operations involve two issues. First, does the company’s data located on external media flow back to the company, and can the returned data be reused? Second, are users gathered in a user pool through marketing activities, and can efficient operations be performed for users on different terminals? In fact, in addition to marketing, enterprises have many other user touchpoints, which in China include TopBuzz (Toutiao), Weibo, and TikTok (Douyin) accounts. All user asset data must be interconnected to tap into its true value. If you are working on search recommendations, in addition to advanced model algorithms, your company must have a data foundation and collect normalized user behavior data that flows back from each touchpoint. By feeding this data into your search engine, you can make it more intelligent. For example, by incorporating data on served ads into subsequent searches, you can recommend ad content that the customer has interacted with in the past.
The Umeng DataBank
Every company needs to build its own data bank. For example, in the Alibaba ecosystem, we have millions of merchants during the Double 11 Shopping Festival, and many brands and merchants have built data banks on Alibaba. Similarly, Umeng+ has been deeply engaged in data intelligence services for nine years. Drawing on its experience of serving millions of Internet enterprises, Umeng+ launched Umeng DataBank for developers and, together with Alibaba Cloud MaxCompute, formed a set of core solutions for users. Data banks solve the problems of data asset management and data application, which can be summarized in four words: collection, construction, management, and use. Businesses can be transformed into data, and data can be transformed into assets. This process involves data collection and the conversion of terminal data into data assets. Next, these assets are applied. In this process, we push various messages and use marketing to obtain new customers, which includes app push and various operational recommendations. These services can all be provided by data banks.
Umeng DataBank includes three products to help users solve three types of problems. As shown below, the first product is smart data collection (U-SDC) and the second is the customer data platform (U-CDP). These products help enterprises accumulate data assets and provide efficient services to business departments, operation teams, and marketing teams. The third product is the data open platform (U-DOP), which integrates and analyzes collected data by using business data in Umeng Cloud. U-DOP provides comprehensive insights into users and more scenario-based application data.
AI and smart engine products are essentially data production and collection products. Collection is the fundamental basis of data quality, so the efficiency and quality of data collection are crucial. Data collection requires you to answer several questions. Do you have full control over your company’s data tracking? Do you understand how data should be tracked in a given scenario? What kind of data will be generated after tracking? Are the tracking points correct and valid? Tracking is a long-term operation. You must constantly verify that tracking is healthy and ultimately related to the fundamental questions you’re concerned with. If data tracking is inappropriate, then all the capabilities based on it, such as AI, will not function properly.
- Tracking management: Tracking in the big data field is difficult, and few want to get their hands dirty doing it. Often, when a product is launched and usage data is needed, the tracking is rushed. Also, many companies lack people who understand how data tracking actually works. They do not know what types of tracking are appropriate, when they are appropriate, and which are defective. This kind of problem can have a significant impact on company operations. If a company does not understand its own tracking problems and continues to operate with incorrect data for a long time, the impact will only be amplified over time.
- Intelligent tracking solution recommendations: Assume the following scenario. A company in the video industry has two teams, each responsible for a different live broadcast channel business, and both teams define tracking specifications for the company. However, the data specifications of the two teams are inconsistent. When video playback starts, team A defines the global tracking parameter as Play, which indicates the playback start event, while team B defines it as Start. In addition, neither team is aware that their definitions are inconsistent. This problem may not seem serious, but it means that the company’s data will be unsustainable. This problem cannot be solved by tools alone. A company’s data management must be based on a deep understanding of the industry, which will allow the company to define consistent standards and specifications for business scenarios. Umeng+ solves user problems through more standardized scenarios, including providing standard tracking solution recommendations for different industries. Umeng+ aggregates the practices of many excellent enterprises to tell users how to track data and what scenario problems can be solved after tracking. At the same time, Umeng+ provides various intelligent recommendations for tracking and collects the knowledge maps of the company’s scenario-based tracking solution to assist the relevant technical teams.
- Intelligent tracking and verification: Developers use SDK code to perform tracking, report data, and print logs from the backend. However, reporting data does not mark the end of tracking. If you directly place a startup log on the logon page, you may find that, one day, the number of logons is nearly twice the number of page visits. This is because the tracking point is also placed in the loading process of the logout page. In this case, you have accidentally tracked a single event at two locations. Umeng+ hopes to provide various intelligent verification tools. For example, a service will be provided to developers that does the following: if a tracking point is named “start”, a series of intelligent checks are performed to verify that the pages from which the tracking point is reported are correct for the business scenario. Intelligent tracking and verification testing are very important. Umeng+ verifies the correctness of tracking through visual screenshot computing, greatly reducing work costs and the pressure on the technical team.
- One-click checks of tracking health: When all tracking points are completed, the company needs to verify the tracking health, check whether the tracking points comply with their specifications, and detect anomalies. The degree of tracking health determines the accuracy of a company’s data collection. Data teams and client development teams often come into conflict over tracking problems. When the data team believes that the data is problematic, they generally blame it on a tracking problem, but the development team sees it as a failure of cooperation on the part of the data team. The KPIs of tracking call for tracking to be visualized first, so we can see who placed the tracking points, whether there is a problem with operations, and whether tracking complies with relevant specifications. This can tell us whether the team should be held responsible when tracking does not fully comply with the specifications. Therefore, from the management, organization, and product capability perspectives, it is necessary to solve the core problems that impact company data tracking and collection.
Umeng+’s U-SDC aims to solve these problems. It makes user tracking visible, controllable, and manageable and recommends optimal solutions for user tracking. In this way, user tracking can be intelligently debugged and verified, greatly reducing the cost of tracking and collection and, in turn, fundamentally improving the data quality and ensuring the value and quality of data assets.
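The specification and health checks described above can be illustrated with a small sketch. It is not U-SDC's implementation: the spec format, the canonical event names (`play_start`, `login`), the alias table, and the page names are all hypothetical and exist only to show how inconsistent names such as Play and Start, or an event reported from the wrong page, might be flagged automatically.

```python
# Hypothetical company-wide tracking spec: canonical event name -> pages
# that are allowed to report it. This is an illustrative format only.
TRACKING_SPEC = {
    "play_start": {"video_page"},
    "login": {"login_page"},
}

# Hypothetical alias table mapping team-specific names to canonical names.
ALIASES = {"Play": "play_start", "Start": "play_start"}


def check_tracking_health(events):
    """Return a list of (issue, event_name, page) tuples.

    `events` is an iterable of dicts with "name" and "page" keys.
    """
    issues = []
    for event in events:
        reported_name = event["name"]
        page = event["page"]
        canonical = ALIASES.get(reported_name, reported_name)
        if canonical not in TRACKING_SPEC:
            # The event is not defined anywhere in the spec.
            issues.append(("unknown_event", reported_name, page))
            continue
        if reported_name != canonical:
            # A team used its own name instead of the agreed one.
            issues.append(("non_canonical_name", reported_name, page))
        if page not in TRACKING_SPEC[canonical]:
            # The event fired from a page the spec does not allow.
            issues.append(("unexpected_page", canonical, page))
    return issues
```

Run against a day's reported events, a check like this would surface both the Play/Start naming mismatch and a logon event that is also fired from the logout page, without anyone having to read the client code.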
After data is collected, the most important thing is to solve the user asset problem. First, user asset management must solve the problems of trust and normalization. Data is created at many touchpoints. Among all the requests sent to an app, many represent fraudulent traffic. So, how can we ensure that devices are trustworthy? Based on the connection capabilities of UMID, multiple accounts and multiple terminals are normalized. Connecting data across different terminals (different mobile clients, servers, and client platforms) supports stream conversion and relationship insights. After normalization, we can create an automatic tag production library to ensure efficient tag production in private domains. This empowers business teams to quickly create tags, gain insights, identify target users, and ultimately form operation actions for customers.
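How UMID actually matches accounts and terminals is not described here. As a minimal sketch of the general idea of normalization, assuming that two identifiers observed together (for example, a device ID that logs in with an account) belong to the same user, a union-find structure can group identifiers into one user. The class name `IdentityGraph` and the identifier strings are hypothetical.

```python
class IdentityGraph:
    """Groups identifiers (device IDs, account IDs) that belong to one user."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Register unseen identifiers as their own single-member group.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        node = identifier
        while self._parent[node] != root:
            next_node = self._parent[node]
            self._parent[node] = root
            node = next_node
        return root

    def link(self, id_a, id_b):
        """Record that two identifiers were observed for the same user."""
        root_a, root_b = self._find(id_a), self._find(id_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, id_a, id_b):
        """Return True if both identifiers resolve to the same group."""
        return self._find(id_a) == self._find(id_b)


graph = IdentityGraph()
graph.link("app:device-1", "account:alice")
graph.link("web:cookie-9", "account:alice")
graph.link("miniprogram:openid-7", "account:bob")
```

After these links, the app device and the web cookie resolve to the same user because both were seen with the same account, while the mini program identifier stays separate. A production system would also need the trust checks mentioned above before accepting a link.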
- Clear understanding of user assets: U-CDP allows multi-source data to be connected to platforms with one click. This data can come from mobile clients or other clients, servers, and other sources. U-CDP ensures trust identification and multi-terminal normalization. It helps users normalize and purify data, filter junk data, and defend against fraud through global data identification. After identifying and interconnecting data, we can achieve user asset visualization. This clarifies the company touchpoint data sources and shows us the accumulation of private domain users. After clarifying the preceding issues, we can determine the touchpoints that need to be added or enhanced. In the end, this allows us to accumulate our own private domain data assets. One premise of accumulated private domain user assets is that they must be operable. If they cannot be operated on or seen, the data is useless.
- For a user tag management library, configuration means production: Business teams are always dissatisfied with technical teams. When the operation team wants to hold an activity, they need to prepare materials according to the business scenario and prepare the activity page. In addition, they must select a group of target users based on rules and then perform operations for them. To meet these needs, they must first work with the product manager to propose requirements. Then, the product manager communicates with the algorithm and technical teams and writes a product requirements document (PRD). Next, the operation team waits a few days for the activity to be developed and launched. The process often takes a long time and cannot meet the needs of the operation team for quick iteration, quick trial and error, and quick customer operations. The requirements of the operation team are usually not very complicated. For example, the operation team may only want to see the users who have accessed the app and mini program in the last 30 days and the people reached by advertisements over the past 2 days. However, many enterprises face tight technical schedules.
As a result, the operations team is dissatisfied, and the technical team lacks a sense of accomplishment because their daily work generally consists of running SQL statements and other tedious and fragmented tasks. Enterprises need to think about how to efficiently solve such situations. Umeng+ hopes that Umeng DataBank will enable the production of preset private domain tags. This way, as long as the technical team does a good job in data tracking, they will not have to do much more work. All products must support the operations. This enables quick configuration and production on the platform, empowers business teams, and allows them to preconfigure private domain tags. Configuration means production. In addition, Umeng DataBank provides a new capability, global domain tags. Private domain tags are only used for customer labeling and insight. Umeng+ will also support global domain tags to identify the interests of different users, so as to understand and label users in more dimensions. In the future, Umeng+ plans to work with other enterprises to jointly establish a tag lab and share the different data of the parties. Integrated computing can be used to improve tag performance and better serve other enterprises.
- Preset analysis models and custom report structures: The operation team only needs to use preset analysis models and various union, intersection, and difference combinations to obtain various insights. After formulating insights, they can generate and save their own user group package and reuse it for the operations and activities of each business. After the tracking for custom private domain user segments is completed, the MaxCompute data warehouse solution can be implemented in Umeng Cloud to automatically aggregate the behavior of individual users who use multiple terminals each day, automatically generate user archive sequences, and complete automatic configuration. As long as the tracking is correct, the operation team can immediately complete private domain group segmentation. Umeng+ hopes to apply the preceding lightweight solution to solve various support problems in actual production.
- Multiple combination modes for filtering target users: Assume the following scenario. A building material company has a website and initially contacted customers through the website and a messaging account. Later, the company developed an app and mini program. In this case, a single customer may appear in all three areas at the same time. When this problem occurs, the data is not interoperable and the organization must operate these areas separately. In fact, the essential problem is whether mini program customers can be quickly found through the app and whether the client can deliver ads, perform operations, and return data. Umeng+ combines multiple modes to help the operation team find the right users without being tied to a schedule.
- Multi-channel outreach and interaction effect tracking: U-CDP supports multiple channels, including SMS, EDM, and app messages. All operation effects are visible in real time. In essence, U-CDP works with technical teams to empower business teams by solving efficiency problems, enhancing their operational capabilities, and accumulating user data assets.
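The union, intersection, and difference combinations mentioned above map directly onto set operations. The following sketch is only an illustration, not U-CDP's interface: the event records, channel names, user IDs, and the fixed reference date are hypothetical sample data. It selects users who were active on both the app and the mini program in the last 30 days, and then separates those who were or were not reached by ads in the past 2 days.

```python
from datetime import date, timedelta


def users_active_since(events, channel, today, days):
    """Return the IDs of users seen on `channel` within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return {
        e["user_id"]
        for e in events
        if e["channel"] == channel and e["day"] >= cutoff
    }


# Hypothetical sample data; user IDs are assumed to be already normalized.
today = date(2024, 1, 31)
events = [
    {"user_id": "u1", "channel": "app", "day": date(2024, 1, 20)},
    {"user_id": "u1", "channel": "mini_program", "day": date(2024, 1, 25)},
    {"user_id": "u1", "channel": "ad", "day": date(2024, 1, 30)},
    {"user_id": "u2", "channel": "app", "day": date(2024, 1, 10)},
    {"user_id": "u2", "channel": "mini_program", "day": date(2024, 1, 15)},
    {"user_id": "u3", "channel": "app", "day": date(2023, 12, 1)},
    {"user_id": "u3", "channel": "mini_program", "day": date(2024, 1, 29)},
    {"user_id": "u4", "channel": "ad", "day": date(2024, 1, 30)},
]

app_users = users_active_since(events, "app", today, 30)
mini_users = users_active_since(events, "mini_program", today, 30)
ad_users = users_active_since(events, "ad", today, 2)

on_both = app_users & mini_users      # intersection: active on both terminals
on_either = app_users | mini_users    # union: active on at least one terminal
already_reached = on_both & ad_users  # on both terminals and recently reached by ads
not_yet_reached = on_both - ad_users  # difference: candidates for a new campaign
```

Each resulting set plays the role of a saved user group package: once computed, it can be reused across activities without a new development request.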
After collecting data, Umeng+ integrates it with the customer’s data. By seamlessly connecting with MaxCompute in the cloud, Umeng+ supports greater openness and data return capabilities.
- One-click data packet subscriptions and return: As shown below, Umeng Cloud collection helps customers quickly collect data from different platforms, such as mobile clients or other clients, and servers. If customers perform this process by themselves, it takes a long time and the outcome cannot be guaranteed. Based on UMID connection capabilities, multiple accounts and multiple terminals are normalized, supporting data interconnection between different terminals. Umeng+ helps customers process and generate different data packets. As long as customers use SDKs, data packets are automatically generated and data is automatically transferred to MaxCompute. Then, DataWorks, DataV, and Quick BI can be used to integrate this data with customer data, significantly reducing costs. In this way, the customer no longer uses raw data, but data that has been processed by Umeng+. Then, instead of concerning themselves with raw data, customers can focus on business product development, business scenario empowerment, and business innovation.
Seamlessly integrating Umeng+ with the MaxCompute cloud data warehouse not only improves processing performance, but also makes the system easier and more convenient to use. Umeng+ preconfigures all model packages and tables for users and interconnects data. This means the data is ready to use right away.
- Quick BI intelligent data analysis and presentation: The following figure shows a customer’s smart data analysis presentation when Umeng+ and Quick BI are used. After data integration and return, business personnel can use MaxCompute and Quick BI to perform self-service analysis, including drag-and-drop self-service analysis and online table analysis. This greatly improves the efficiency of the analyst team as they no longer need to do the time-consuming work of merging different data.
No matter how powerful an enterprise’s containers, databases, and algorithms are, or how intelligent its applications are, it is necessary to go back to the four key steps. First, we must transform businesses into data to manage data collection and quality. Second, we must transform data into assets. This gives the management a clear understanding of user data assets, the number of terminals, the number of touchpoints, the data generated each day, and the number of accumulated users. Third, we must transform assets into applications. Accumulated data should be quickly converted to applications that serve the business team. This way, the business team can better innovate with the help of technology and data and will not have to wait for resources. Fourth, we must transform applications into value. The most fundamental thing is to build scenarios and closed data loops that cover all touchpoints and business behaviors. Such scenarios and closed loops can accumulate data assets. This is the only way to ensure that the enterprise mid-end and data empowerment are sustainable and that the value of data keeps growing throughout the data utilization process.