At this year’s Computing Conference, a glass-walled structure stood between blocks A and B and captured the attention of attendees. It was the conference’s data operations center and central media editing area. Perhaps its most eye-catching feature was the big screen that displayed live information about video production as well as real-time traffic data. What technologies were behind this solution?
“The age of people-generated video content has witnessed the flourishing of mobile Internet and smart devices, as well as the fast evolution and close integration of artificial intelligence (AI) and video technologies. The entire process of the media asset service is also experiencing significant changes,” said Hu Fan, a senior technical expert at Alibaba Cloud. “To be specific, the entire process linking production and editing, management and control, and distribution and consumption is becoming more and more intelligent. Alibaba Cloud integrates AI into its ApsaraVideo solution to build a global integrated media asset service, enabling enterprises to implement intelligent collaborative production of media assets, resource sharing, and fast distribution.”
This article introduces the application of AI technologies in the entire process of the media asset service, as well as Alibaba Cloud’s exploration and practices related to the next-generation intelligent media asset production service.
To deliver an intelligent and efficient media asset service, Alibaba Cloud has built an extensive range of media products with complete AI capabilities. AI features such as review, recognition, interpretation, and search are fully integrated into every stage of the media asset service, making the whole workflow intelligent and laying the foundation for the entire intelligent production solution.
Intelligent Production and Editing
- Intelligent capture: When a media asset platform needs user-generated content (UGC) and professionally generated content (PGC) captured on a device, Alibaba Cloud’s short video software development kit (SDK) provides product-level features for video recording and non-linear editing. These include eye enlarging, face slimming, and other advanced beautification effects, as well as animated stickers and special effects based on facial recognition and face tracking. This helps you make more diverse and creative videos.
- Intelligent editing: Captured video can be edited in the cloud or on a device. On a device, Alibaba Cloud’s short video SDK provides custom materials and supports multiple types of montage, filters, transitions, and music effects. In the cloud, cloud-based editing and multimodal content interpretation technologies automatically produce new, quality content from the captured footage. Compared with traditional video production, intelligent production cuts processing time to seconds and significantly reduces investment in equipment.
- Real-time subtitles: Unlike traditional manual transcription and translation, intelligent production converts speech to text through automatic speech recognition (ASR), stores the text at the corresponding position on the timeline, and then automatically translates it from the original language into the required language. This greatly reduces human intervention. The technology applies not only to offline videos but also to live conference video, where it produces subtitles in real time.
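The subtitle flow described above (recognized text stored against the timeline, then translated) can be sketched in a few lines. This is a minimal illustration, not Alibaba Cloud’s implementation: the `Segment` type, the `to_srt` helper, and the `translate` callback are hypothetical names, and SubRip (SRT) is assumed as the output format only because it is a widely used plain-text subtitle format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One recognized utterance and its position on the video timeline."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str     # text in the original language, as produced by ASR


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment],
           translate: Callable[[str], str] = lambda text: text) -> str:
    """Build SRT cues from timed segments.

    `translate` is a placeholder for whatever machine translation step
    is used; by default the text is passed through unchanged.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{translate(seg.text)}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

For live video, the same function could be called on each newly recognized segment as it arrives, since every cue depends only on its own timestamps and text.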
Intelligent Media Asset Management
- Content review: Illegal content can pose a serious threat to an enterprise’s operations. Intelligent review can be triggered through workflows or by calling APIs. It helps identify pornographic, reactionary, violent, terrorist, and politically sensitive content so that risks can be controlled and handled. A media blacklist is built from the illegal content identified to further improve review efficiency.
- Intelligent thumbnail extraction: This feature automatically selects the best keyframe or clip to be the video thumbnail to better illustrate the core content and increase views.
- Intelligent catalog: Traditional fine-grained cataloging takes about 2 to 4 hours for a 1-hour video. In an age of content explosion on the Internet, intelligent cataloging applies technologies such as automatic video classification, tagging, character recognition, and speech recognition to generate video information, add videos to the media asset database, and make intelligent recommendations based on natural language processing (NLP) and part-of-speech filtering. The whole process is driven by algorithms and requires no human labor.
- Intelligent cloud-based broadcasting: The cloud caster integrates multimodal content interpretation technology to automatically add character information to the video and produce video highlights precisely and in real time. This lowers the threshold and cost of professional broadcast directing equipment, advanced editing software, and professional personnel. Because no human labor is needed, the solution can cut costs by a factor of hundreds and enables collaboration across multiple locations.
Intelligent Distribution and Consumption
In terms of distribution and consumption, the intelligent media search engine, built on video DNA (one fingerprint for each media file), precisely delivers the highest quality content to users. The video DNA fingerprint remains stable regardless of changes to file format, editing, compression, or rotation. It also helps effectively identify user-generated videos and duplicate videos, which avoids redundant search results and improves the user experience. In addition, video DNA can be used to protect video copyright.
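To make the idea of a fingerprint that survives re-encoding more concrete, here is a toy sketch using a generic average hash over a single grayscale frame. It is not the video DNA algorithm, whose details the article does not give, and it does not handle rotation or editing. The names `block_means`, `average_hash`, and `hamming_distance` are illustrative, and a frame is assumed to be a list of pixel rows with values from 0 to 255.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows, values 0-255


def block_means(frame: Frame, grid: int = 4) -> List[float]:
    """Shrink a frame to grid x grid cells by averaging the pixels in each cell.

    The frame must be at least grid pixels wide and high.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def average_hash(frame: Frame, grid: int = 4) -> int:
    """Return a grid*grid-bit fingerprint: 1 where a cell is brighter than average."""
    means = block_means(frame, grid)
    threshold = sum(means) / len(means)
    bits = 0
    for value in means:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because each bit records only whether a region is brighter than the frame average, a uniform brightness shift or a change of resolution leaves the fingerprint unchanged, and two frames can be treated as likely duplicates when their Hamming distance falls below a chosen threshold.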
Besides the data operations center of the Computing Conference, intelligent video production has also been applied in other scenarios.
Use Case 1: World Cup Highlights
During this year’s World Cup, which was live-streamed by Youku, Alibaba Cloud’s intelligent video production solution generated match highlights in as little as ten seconds, improving production efficiency nearly tenfold. AI produced 20% of the World Cup short videos.
Hu Fan said, “In essence, highlights of star players were produced by connecting each timeline in which the player appears to automatically generate character highlights. To this end, we resorted to face image library definition and face registration. So, we built a face image library for each star player. On this basis, we implemented facial recognition and face image tracking on each registered star player, and performed a comprehensive dynamic analysis on the timeline in which the player appears and the coordinates of the player relative to the image. Of course, mere face images and timeline are not enough. We also applied ASR and optical character recognition (OCR) to do real-time analysis on match commentary and records, and obtained relevant information including player names, key events, and scores. For editing, we applied the multi-segment concurrent editing mode to ensure time effectiveness.”
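The step of connecting the time ranges in which a player appears, then editing the resulting segments concurrently, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the production pipeline: the detections are assumed to arrive as (start, end) pairs in seconds, `max_gap` is an invented tolerance for bridging short gaps between detections, and `render_clip` is a hypothetical stand-in for a real cutting job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_appearances(appearances: List[Interval],
                      max_gap: float = 2.0) -> List[Interval]:
    """Join detections of one player into continuous highlight segments.

    Detections that overlap, or that are separated by at most max_gap
    seconds, are treated as one continuous appearance.
    """
    merged: List[Interval] = []
    for start, end in sorted(appearances):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def render_clip(clip: Interval) -> str:
    """Hypothetical placeholder for cutting one segment; returns a label only."""
    start, end = clip
    return f"clip_{start:.1f}_{end:.1f}"


def render_concurrently(clips: List[Interval], workers: int = 4) -> List[str]:
    """Process several segments at the same time, keeping timeline order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_clip, clips))
```

Since `pool.map` returns results in input order, the rendered segments can be concatenated directly into a highlight reel even though they were processed in parallel.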
Use Case 2: Intelligent Sports Meeting
An intelligent sports meeting was held at this year’s Computing Conference. The intelligent video production solution was also used to generate intelligent highlights of the surfing and 3x3 basketball games.
The intelligent highlights of the surfing game were generated based on Alibaba Cloud’s media processing capabilities and the video AI technology developed by Alibaba’s Machine Intelligence Technology (MIT) Lab. The solution quickly learned from data on multiple surfing matches, then analyzed and modeled players’ poses, actions, and movements from multiple perspectives. This gave the video AI an accurate enough understanding of a surfing performance to judge each player’s performance and pick out that player’s highlights. Then, the intelligent cloud caster cut and merged the live images from the scene. The video on-demand service performed resolution processing, noise reduction, merging, and cutting on the clips of the live stream, and then quickly composed highlights for each player after smoothing in cloud-based editing. These match highlights can be downloaded and shared in real time.
For the 3x3 basketball game, similar technologies were used to produce highlights of matches and players. The leader of this project, Lian Yanan, a senior product manager at Alibaba Cloud video AI, said, “It’s exciting to see that we have overcome the challenges with the time effectiveness and quality of highlights production in such a short time. We have provided a brand new experience for the participants. This is also the first time that intelligent cloud-based broadcasting, intelligent cloud-based editing, and video AI technologies are integrated successfully. Apart from World Cup highlights production, this case again brings innovation to the sports industry.”
Building a Global Integrated Media Asset Production and Management Platform
As the Internet lowers the barriers of time and distance, more and more enterprises are doing business globally. Built on infrastructure with global coverage, Alibaba Cloud’s next-generation media asset service supports multi-region high-speed data synchronization and enables collaborative production, management, and control across multiple centers in and outside China, anytime and anywhere. In addition, Alibaba Cloud CDN has over 1,500 nodes across the globe, allowing customers to quickly distribute media content to more than 70 countries on six continents.
As shown in the following figure, you can access the media asset service from different places in and outside China. A full set of the media asset service is deployed in every region, with the access layer and the application layer independent of each other, but the core metadata in the media asset database is stored in full in every region. For reads and writes within one region, the system guarantees strong consistency. For cross-region reads and writes, the system ensures partition tolerance and availability while also maintaining consistency.
Moreover, multi-region active-active deployment and automatic failover are implemented through domain name resolution, request forwarding, and other means to ensure high stability of the media asset service. To reduce data transmission and back-to-origin requests, media files are stored, processed, and computed within each region. As a result, each region stores the full metadata but only part of the physical files.
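The layout in which every region holds the full metadata but only some of the physical files can be illustrated with a small in-memory model. Everything here is a hypothetical sketch: the `Region` and `MediaMetadata` classes, the `register_media` helper, and the rule of picking the alphabetically first remote region are assumptions made for illustration, and the model does not attempt to represent replication delay or the consistency guarantees described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MediaMetadata:
    media_id: str
    title: str
    # Names of the regions that hold the physical file.
    file_regions: Set[str] = field(default_factory=set)


class Region:
    """One deployment of the service with its own copy of all metadata."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metadata: Dict[str, MediaMetadata] = {}

    def locate_file(self, media_id: str) -> str:
        """Return the region to read the file from, preferring the local copy."""
        meta = self.metadata[media_id]
        if self.name in meta.file_regions:
            return self.name
        # No local copy: fall back to a region that has one (illustrative rule).
        return sorted(meta.file_regions)[0]


def register_media(regions: List[Region], home: str,
                   media_id: str, title: str) -> None:
    """Store the file in the home region only, but copy metadata everywhere."""
    for region in regions:
        region.metadata[media_id] = MediaMetadata(media_id, title, {home})
```

In this model a lookup never leaves the local region to answer a metadata query, and cross-region traffic is needed only when the physical file itself is stored elsewhere.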