Video User Network Profiling and Its Application

By Daole, a Wireless Development Expert, and Xinlv, a Test and Development Engineer at Alibaba Digital Media & Entertainment Group

Released by AliENT

1. Background

Practices of network analysis and environment profiling already exist on the market. For example, the popular mobile game “King of Glory” supports network quality measurement: it presents visual results that describe the current network status using the latency of the current router, community, and public network. Typical environments can also be simulated with fixed parameters. For example, a 500 kbps network speed limit is used to simulate the bus model, and a packet loss rate of 20% is used to simulate the subway model. Network profiling matters because it enables personalization and differentiated decision-making. Network environments vary greatly among video viewers, and the differences fall into two aspects:

  1. Network Quality: Network quality ultimately influences the download rate of video streams. If the download rate stays below the video bitrate for a long time, or if the jitter is severe, playback is likely to freeze. The download rate depends on a combination of factors along the playback process. We must therefore evaluate the quality of each part of the process so that, when the download rate drops, we can identify which part is at fault and apply a targeted solution.
  2. Network Environment: Users may be on a home network, on a public network, or commuting. Recognizing the environment is not a matter of real-time data collection and analysis. It requires collecting data in advance and inferring the user's environment from features, such as network traffic, in order to predict the playback events that are likely in that environment. By applying custom policies for each environment, we can provide users with a better video viewing experience.
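To make the link between download rate and freezing concrete, here is a minimal sketch that assumes a very simplified player model: a constant bitrate, one download-rate sample per second, and a small startup buffer. The function name `count_stalls` and its parameters are hypothetical and are not taken from the original system.

```python
def count_stalls(download_kbps, bitrate_kbps, start_buffer_s=2.0):
    """Count the seconds in which playback would freeze.

    download_kbps: per-second download rate samples (hypothetical input).
    bitrate_kbps: constant video bitrate (a simplifying assumption).
    start_buffer_s: seconds of video buffered before playback starts.
    """
    buffer_s = start_buffer_s  # seconds of video currently buffered
    stalled_seconds = 0
    for rate in download_kbps:
        # One second of downloading adds rate / bitrate seconds of video.
        buffer_s += rate / bitrate_kbps
        if buffer_s >= 1.0:
            buffer_s -= 1.0       # enough data: play one second of video
        else:
            stalled_seconds += 1  # not enough data: playback freezes
    return stalled_seconds


# A download rate equal to the bitrate never stalls in this model, while a
# rate at half the bitrate starts to stall once the startup buffer drains.
print(count_stalls([1000] * 10, bitrate_kbps=1000))
print(count_stalls([500] * 10, bitrate_kbps=1000))
```

In this model, a short dip in download rate is absorbed by the buffer, while a sustained rate below the bitrate eventually causes stalls, which matches the qualitative statement above.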

2. Network Profiling

Figure 1 shows the network involved in video playback on the user terminal. From the perspective of network topology, the main factors that influence video playback and download speeds are user devices, LANs, public networks, and CDNs. Signal strength is the main parameter that determines how much the user device influences network quality. LAN quality reflects the data distribution capability of the gateway. Its quantitative indicators include the data latency, packet loss rate, and network channel congestion between the device and the gateway. Public network quality indicates the quality of random network requests initiated by the device. Its indicators include the latency and packet loss rate of requests to random addresses. The CDN factor concerns whether the quality and scheduling policy of CDNs are normal. Its quantitative indicators include the download speed, Transmission Control Protocol (TCP) latency, and packet loss rate of the playback fragments being downloaded.
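The four layers and their indicators can be grouped into one record per measurement. The following is only a sketch: the field names, the `suspect_layers` helper, and every threshold value are hypothetical placeholders chosen for illustration, not values from the original system.

```python
from dataclasses import dataclass


@dataclass
class NetworkSample:
    """One measurement across the four layers (hypothetical field names)."""
    signal_strength_dbm: float  # user device
    gateway_rtt_ms: float       # LAN: latency between device and gateway
    gateway_loss_rate: float    # LAN: packet loss as a fraction in [0, 1]
    public_rtt_ms: float        # public network: latency to random addresses
    public_loss_rate: float     # public network: packet loss fraction
    cdn_speed_kbps: float       # CDN: fragment download speed
    cdn_tcp_rtt_ms: float       # CDN: TCP latency
    cdn_loss_rate: float        # CDN: packet loss fraction


def suspect_layers(s: NetworkSample) -> list:
    """Return the layers whose indicators look abnormal.

    All thresholds below are illustrative assumptions only.
    """
    suspects = []
    if s.signal_strength_dbm < -80:
        suspects.append("device")
    if s.gateway_rtt_ms > 100 or s.gateway_loss_rate > 0.05:
        suspects.append("lan")
    if s.public_rtt_ms > 300 or s.public_loss_rate > 0.05:
        suspects.append("public")
    if (s.cdn_speed_kbps < 500 or s.cdn_tcp_rtt_ms > 300
            or s.cdn_loss_rate > 0.05):
        suspects.append("cdn")
    return suspects
```

Keeping the indicators of each layer separate is what allows a drop in download rate to be traced back to a specific part of the path.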

Figure 1: The network for video playback on the user terminal

Indicators related to user devices, LANs, public networks, and CDNs come from different sources and have different data dimensions. Therefore, we need to cleanse the indicator data and use statistical feature analysis to determine how well each indicator characterizes the network speed. Figure 2 shows the method of extracting statistical features from the original time series data:

Figure 2: Extracting statistical features from the original time series data

  1. Data Cleansing: Raw data collection introduces dirty data because of problems such as thread timing. Dirty data typically takes the value 0 or the maximum value. For example, the gateway Round-Trip Time (RTT) samples may be mixed with values such as -1, 0, or the timeout value. Abnormal values affect the final decision, so we generally delete them. Missing values can be deleted or filled. Depending on the situation, filling methods include mean imputation, random filling, and k-nearest neighbor (KNN) filling.
  2. Data Standardization: The cleansed data is normalized to remove the effect of units, which makes it easier to compare or weight different indicators. Multiple features are then converted into a multi-dimensional vector, and the vector is standardized. In this way, a value measured in ms and a value measured in kbps can each form an element of the same vector.
  3. Feature Derivation and Selection: Feature derivation transforms the original features to compute the new data we need, for example, the mean, variance, or standard deviation of a feature, or a chosen quantile. For the gateway RTT, the RTT is collected multiple times, the mean or variance of the samples is calculated, and that value is reported to represent the gateway RTT. For network card traffic, we want its maximum value within a short time, so the mean is not the best choice. Instead, we take the 90th percentile to characterize the data.
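The three steps above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the timeout sentinel `TIMEOUT_MS`, the function names, the choice of z-score standardization, and the linear-interpolation percentile are all assumptions, since the article does not specify them.

```python
import statistics

TIMEOUT_MS = 5000  # hypothetical timeout sentinel reported by the collector


def clean_rtt(samples, timeout_ms=TIMEOUT_MS):
    """Data cleansing: drop non-positive readings (-1, 0) and timeouts."""
    return [x for x in samples if 0 < x < timeout_ms]


def fill_missing_with_mean(samples):
    """Mean imputation: replace None with the mean of the observed values."""
    observed = [x for x in samples if x is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = statistics.fmean(observed)
    return [mean if x is None else x for x in samples]


def z_score(values):
    """Standardization (assumed z-score): zero mean, unit variance, no units."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def percentile(values, p):
    """Percentile with linear interpolation, p in [0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def rtt_features(raw_samples):
    """Feature derivation for gateway RTT: mean, variance, standard deviation."""
    cleaned = clean_rtt(raw_samples)
    if not cleaned:
        raise ValueError("no valid RTT samples after cleansing")
    return {
        "mean": statistics.fmean(cleaned),
        "variance": statistics.pvariance(cleaned),
        "std": statistics.pstdev(cleaned),
    }
```

For traffic-like series, `percentile(traffic_samples, 90)` would play the role of the 90th percentile feature described in step 3, while `rtt_features` covers the mean and variance reported for the gateway RTT.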

To select the features that correlate more strongly with the check result, we can verify correlation, for example, by using the Pearson correlation coefficient between variables. For the features obtained through data cleansing, standardization, and selection, their ability to differentiate still needs to be verified with clustering algorithms. We can observe the related data in the scatter and density chart in Figure 3.
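The correlation check mentioned above can be sketched as follows. This is a plain implementation of the standard Pearson formula. The function name `pearson` is hypothetical, and the article does not say which implementation the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to +1 or -1 indicates a strong linear relationship between a feature and the target variable, while a coefficient close to 0 suggests that the feature carries little linear information and is a weaker candidate for selection.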

Figure 3: Scatter and density chart of gateway latency and average network speed

Figure 3 is a scatter chart of the gateway latency and average network speed of some users. The horizontal axis indicates the average network speed in kbps, the vertical axis indicates the average gateway latency in ms for each collection, and each red dot represents one data point. Unlike a traditional scatter chart, Figure 3 also shows the density distribution of the points: the blue areas mark dense regions, and a darker color represents greater density. The figure also contains marginal distribution plots of the data along the right vertical axis and the top horizontal axis. The figure gives the following information:

  1. The highest-density areas are located where the gateway latency is low.
  2. Along the horizontal axis, as the network speed increases, large gateway latency values become less likely.
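The second observation could also be checked numerically instead of visually. The sketch below groups points into speed bins and computes the share of high-latency points in each bin. The bin width, the latency threshold, and the function name are hypothetical, and the sample points are synthetic values made up for illustration, not measured data.

```python
def high_latency_share_by_speed(points, speed_bin_kbps=1000,
                                latency_threshold_ms=100):
    """Share of points with high gateway latency in each speed bin.

    points: iterable of (avg_speed_kbps, avg_gateway_rtt_ms) pairs.
    Returns {bin_start_kbps: share}, ordered by bin start.
    """
    counts = {}
    for speed, rtt in points:
        bin_index = int(speed // speed_bin_kbps)
        total, high = counts.get(bin_index, (0, 0))
        is_high = 1 if rtt > latency_threshold_ms else 0
        counts[bin_index] = (total + 1, high + is_high)
    return {
        bin_index * speed_bin_kbps: high / total
        for bin_index, (total, high) in sorted(counts.items())
    }


# Synthetic points for illustration only (not measured data).
sample_points = [(200, 150), (800, 180), (1500, 30), (1800, 120), (2500, 20)]
print(high_latency_share_by_speed(sample_points))
```

If the share decreases from the low-speed bins to the high-speed bins on real data, that would be consistent with what the chart shows.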

3. Application Scenarios

Table 1: A list of policies for weak networks

1) Weak Network Prompts for Users

2) Scheduling Optimization of Weak Networks

3) Download Optimization of Weak Networks

4) User Scenario Profiling
