Application of Machine Learning in Start-point Road Tracking of AMAP
AMAP is a leading travel solution provider in China, with navigation as its core user scenario. Route planning is a prerequisite for navigation: it produces a personalized travel plan based on the start point, the endpoint, and the path policy settings.
Start-point road tracking is necessary for route planning, and its accuracy is crucial to route-planning quality and user experience. This article outlines how AMAP improves the accuracy of start-point road tracking, and focuses on the exploration and practice of introducing machine learning algorithms.
What Is Start-point Road Tracking?
Put simply, start-point road tracking is the process of taking the location of a user who initiates a route planning request and connecting that start point to the actual road where the user is located.
The AMAP app provides the following three methods for selecting a start point during route planning:
1. Manual Selection: Here, the user manually marks his/her location on the map.
2. Point of Interest (POI) Selection: Here, the user selects a POI as the start point. A POI is a geographic location, such as a store, residential area, or bus stop, in a geographic information system.
3. Automatic Location: Here, the app automatically locates the user through GPS, a base station, or Wi-Fi.
Manual selection and POI selection generate more accurate location information than automatic location mode, and therefore make start-point road tracking considerably more accurate.
The location coordinates in automatic location mode tend to drift due to accuracy issues of GPS, base stations, and network location. The location captured by a locating device may be several meters, dozens of meters, or even hundreds of meters away from the actual road where the user is located. The primary challenge is to accurately identify the user's location (down to the specific road) with limited information.
Why Is Machine Learning Needed?
Before machine learning was introduced, candidate roads were sorted based on manual rules during start-point road tracking. The core idea was to sort candidate roads by weighted scores calculated primarily from distance, in combination with angle, speed, and other features. The weights and thresholds in the manual rules were set by hand, based on accumulated practical experience.
As AMAP's business, route planning requests, and scenarios continue to grow, manual rules show increasingly limited applicability. Key challenges include:
- Although they are developed on the basis of extensive experience, manual rules are prone to bias and blind spots because of imperfect thresholds and weights. This disadvantage cannot be removed.
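The weighted-score idea behind manual rules can be sketched as follows. This is only an illustration: the weights, the distance scale, and the field names are hypothetical and are not AMAP's actual rules, and speed and other features are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    road_id: str
    distance_m: float      # distance from the located point to the road
    angle_diff_deg: float  # difference between device heading and road direction


# Hypothetical hand-tuned parameters; in a rule-based system these come from experience.
W_DISTANCE = 0.7
W_ANGLE = 0.3
MAX_DISTANCE_M = 100.0


def rule_score(c: Candidate) -> float:
    """Weighted score: closer roads and better-aligned roads score higher."""
    distance_term = max(0.0, 1.0 - c.distance_m / MAX_DISTANCE_M)
    angle_term = 1.0 - min(c.angle_diff_deg, 180.0) / 180.0
    return W_DISTANCE * distance_term + W_ANGLE * angle_term


def rank_by_rules(candidates):
    """Sort candidate roads from most to least likely according to the rule score."""
    return sorted(candidates, key=rule_score, reverse=True)


candidates = [
    Candidate("r1", distance_m=10.0, angle_diff_deg=90.0),
    Candidate("r2", distance_m=20.0, angle_diff_deg=5.0),
]
best = rank_by_rules(candidates)[0]
print(best.road_id)
```

In this toy case the slightly farther road wins because its direction agrees with the device heading, which shows how the result depends directly on the hand-picked weights.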
- New features resulting from upstream data updates cannot be promptly integrated into policies.
- Determining manual rules demands extensive experience, which makes it difficult to respond quickly to personnel changes.
Big data and AI drive the inevitable trend of using the power of data to replace manual operations with automatic processes to improve productivity.
To improve on start-point road tracking based on manual rules, we introduce machine learning to automatically determine the relationship between features and road tracking results. The primary challenge for machine learning is acquiring training data, and here AMAP has a unique advantage: a large amount of both route planning data and real-life movement data. With their greater expressiveness, machine learning models can learn the complex relationships between features and therefore improve the accuracy of road tracking.
How Is Machine Learning Implemented?
This section illustrates how to build a machine learning model for start-point road tracking, and walks through how machine learning is used to solve the practical problem step by step:
1. Define the Target Problem
Before introducing a machine learning model, we must mathematically abstract the problem to be solved.
The preceding figure shows a schematic of start-point road tracking: a user initiates a route planning request at point A, and the roads that surround point A constitute a set B. The road where the user is actually located is a unique element C of set B. In this case, start-point road tracking is the process of selecting, from the set of roads that surround point A, the road where the user is most likely to be located.
This process resembles searching and sorting, so we model it with these two operations. It includes the following steps:
- Extract the user’s location information, namely, point A, from the route planning request.
- Recall the roads that surround point A within a certain scope to constitute candidate set B.
- Sort the candidate roads with the model and select the top-ranked road in the output as road C, which is taken as the road where the user is actually located.
Start-point road tracking is thus defined as a supervised searching-and-sorting problem. With the target defined, we proceed to data acquisition and feature engineering.
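The three steps above can be sketched as a small recall-then-sort pipeline. The sketch assumes roads are straight segments in local planar coordinates (meters), and `score_fn` is a hypothetical placeholder for whatever scoring model is used; the function names and the radius are illustrative only.

```python
import math


def point_to_segment_distance(p, a, b):
    """Planar distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def recall_candidates(point_a, roads, radius_m):
    """Step 2: recall the roads within radius_m of point A (candidate set B)."""
    return [
        road_id
        for road_id, (a, b) in roads.items()
        if point_to_segment_distance(point_a, a, b) <= radius_m
    ]


def track_start_road(point_a, roads, radius_m, score_fn):
    """Step 3: score the candidates and return the top-ranked road (road C)."""
    candidates = recall_candidates(point_a, roads, radius_m)
    if not candidates:
        return None
    return max(candidates, key=lambda road_id: score_fn(point_a, roads[road_id]))


# Toy road network: road_id -> (segment start, segment end).
roads = {
    "main": ((0.0, 0.0), (100.0, 0.0)),
    "side": ((0.0, 30.0), (100.0, 30.0)),
    "far": ((0.0, 500.0), (100.0, 500.0)),
}


def nearest_first(point, segment):
    """Placeholder scorer: closer roads get higher scores."""
    return -point_to_segment_distance(point, segment[0], segment[1])


print(recall_candidates((50.0, 10.0), roads, radius_m=50.0))
print(track_start_road((50.0, 10.0), roads, radius_m=50.0, score_fn=nearest_first))
```

Keeping recall and scoring as separate functions means the scorer can be swapped from a rule to a learned model without touching the candidate retrieval.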
2. Acquire Data and Engineer Features
A common saying in the industry is that data and features determine the upper limit of machine learning, while models and algorithms are only the means of approximating that limit. Data and features are therefore critical to the final effect of a project.
To train a machine learning model for start-point road tracking, we need to acquire the following two types of data from raw data:
- Truth-value Data
Truth-value data is the road on which the user is actually located when sending a route planning request. In start-point road tracking, the first issue that machine learning must address is the acquisition of truth-value data. When a user initiates a route planning request at point A, the user's actual location cannot be determined directly due to the accuracy limitations of positioning.
However, if we have real-life movement data for the user in the area around point A, we can match that movement data to the road network to generate a motion track, which can then be used to determine the road where point A resides. By mining AMAP's navigation request data, we combine real-life user movement with route planning information to obtain a dataset with a one-to-one mapping between requests and truth values.
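One way to picture this labeling step is shown below. It is a minimal sketch under stated assumptions: the track has already been map-matched into `(timestamp, road_id)` pairs, and the truth value is taken from the first matched point at or after the request, provided it falls within a hypothetical time window `max_gap_s`. The source does not describe AMAP's actual matching logic.

```python
from bisect import bisect_left


def label_request(request_ts, matched_track, max_gap_s=30.0):
    """Return the truth-value road for a request, or None if no track point is close enough.

    matched_track: list of (timestamp, road_id) pairs sorted by timestamp,
    assumed to be the output of an upstream map-matching step.
    """
    timestamps = [ts for ts, _ in matched_track]
    i = bisect_left(timestamps, request_ts)  # first track point at or after the request
    if i == len(matched_track):
        return None
    ts, road_id = matched_track[i]
    return road_id if ts - request_ts <= max_gap_s else None


track = [(100.0, "r1"), (105.0, "r1"), (110.0, "r2")]
print(label_request(104.0, track))  # uses the point at t=105
print(label_request(200.0, track))  # no movement after the request
```

Requests without a usable track are simply dropped from the labeled dataset, which keeps the mapping between requests and truth values one-to-one.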
- Feature Data
In the start-point road tracking model, we extract three categories of features for building sample sets: anchor point-related features, road features, and features that are a combination of the preceding two feature categories.
Feature processing is the core of feature engineering, and the preprocessing required varies from project to project: it depends on the actual scenario and on professional experience. In start-point road tracking, we perform a series of data cleansing operations on anchor point-related features, including sample deduplication, outlier processing, error value correction, and mapping.
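The cleansing operations named above might look like the following sketch. The field names (`accuracy_m`, `heading_deg`, `speed_mps`) and the threshold are hypothetical, chosen only to show one example each of deduplication, outlier processing, and error value correction.

```python
def clean_samples(samples, max_accuracy_m=1000.0):
    """Deduplicate samples, drop outliers, and correct obviously invalid values."""
    seen = set()
    cleaned = []
    for sample in samples:
        # Sample deduplication: keep one row per (request, candidate road) pair.
        key = (sample["request_id"], sample["road_id"])
        if key in seen:
            continue
        seen.add(key)

        # Outlier processing: drop rows with an implausible positioning accuracy.
        if sample["accuracy_m"] < 0 or sample["accuracy_m"] > max_accuracy_m:
            continue

        # Error value correction: wrap headings into [0, 360) and clamp negative speeds.
        fixed = dict(sample)
        fixed["heading_deg"] = sample["heading_deg"] % 360.0
        fixed["speed_mps"] = max(0.0, sample["speed_mps"])
        cleaned.append(fixed)
    return cleaned


raw = [
    {"request_id": "q1", "road_id": "r1", "accuracy_m": 15.0, "heading_deg": 370.0, "speed_mps": 1.2},
    {"request_id": "q1", "road_id": "r1", "accuracy_m": 15.0, "heading_deg": 370.0, "speed_mps": 1.2},
    {"request_id": "q1", "road_id": "r2", "accuracy_m": 5000.0, "heading_deg": 90.0, "speed_mps": 1.2},
    {"request_id": "q2", "road_id": "r3", "accuracy_m": 20.0, "heading_deg": -10.0, "speed_mps": -1.0},
]
cleaned = clean_samples(raw)
print(len(cleaned))
```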
3. Select a Model
Once start-point road tracking is defined as a searching-and-sorting problem, we can apply the ranking techniques of machine learning, which fall into pointwise, pairwise, and listwise approaches. Based on the characteristics of start-point road tracking, we select the listwise approach, whose learning-to-rank framework has the following characteristics:
- The input is the set of feature vectors of all candidate roads that correspond to the same route planning request (a query).
- The output is the sequence of scores for the feature vectors that correspond to the request (query).
- For the scoring function, we adopt a tree model.
We also select normalized discounted cumulative gain (NDCG) as the model evaluation metric. NDCG is commonly used to measure ranking quality, and it comprehensively considers the relationship between the order produced by the model and the actual order.
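For reference, NDCG for a single query can be computed as in the sketch below. It uses the common exponential-gain form, DCG = sum of (2^rel - 1) / log2(position + 1), and assumes binary relevance labels (1 for the truth-value road, 0 for every other candidate); the source does not state which labeling or gain variant AMAP uses.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so position = rank + 1
        for rank, rel in enumerate(relevances)
    )


def ndcg(relevances_in_predicted_order, k=None):
    """NDCG for one query: DCG of the predicted order divided by the ideal DCG."""
    rels = list(relevances_in_predicted_order)
    ideal = sorted(rels, reverse=True)
    if k is not None:
        rels, ideal = rels[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(rels) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance labels of the candidate roads, in the order the model ranked them.
print(ndcg([1, 0, 0]))  # true road ranked first
print(ndcg([0, 1, 0]))  # true road ranked second
```

With a single relevant road per query, NDCG is 1.0 when the true road is ranked first and decays logarithmically as it slides down the list, so near misses are penalized less than bad misses.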
4. Train the Model and Evaluate the Results
We extract request information for a certain period of time and acquire truth values and feature data by using the method described in step 2. We then build sample sets by labeling, divide them into a training set and a test set, train a model, and check whether the results meet our expectations.
To evaluate the model performance, we perform road tracking for the requests in the test set by using both manual rules and machine learning. Then, we compare the results of each method with the truth values and calculate accuracy.
Comparing the results, we find a 10% difference between the road tracking results of manual rules and those of the machine learning model. Model-based tracking shows a 40% increase in accuracy compared to road tracking based on manual rules. The improvement is significant.
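Building labeled sample sets and splitting them can be sketched as follows. The sketch assumes each row is a `(request_id, road_id, feature_vector)` tuple and that the split is made per request, so all candidate roads of one query stay in the same set; the function names, the test fraction, and the seed are illustrative.

```python
import random
from collections import defaultdict


def build_queries(rows, truth):
    """Group candidate rows by request and label the truth-value road with 1, others with 0."""
    queries = defaultdict(list)
    for request_id, road_id, features in rows:
        if request_id not in truth:
            continue  # no truth value available for this request
        label = 1 if road_id == truth[request_id] else 0
        queries[request_id].append((features, label))
    return dict(queries)


def split_by_request(queries, test_fraction=0.2, seed=0):
    """Split whole queries into training and test sets; a query is never divided."""
    request_ids = sorted(queries)
    random.Random(seed).shuffle(request_ids)
    n_test = max(1, int(len(request_ids) * test_fraction))
    test_ids = set(request_ids[:n_test])
    train = {rid: q for rid, q in queries.items() if rid not in test_ids}
    test = {rid: q for rid, q in queries.items() if rid in test_ids}
    return train, test


# Toy data: five requests, two candidate roads each, with made-up two-dimensional features.
rows = []
truth = {}
for n in range(5):
    rows.append((f"q{n}", "near", [10.0, 5.0]))
    rows.append((f"q{n}", "other", [25.0, 80.0]))
    truth[f"q{n}"] = "near"

queries = build_queries(rows, truth)
train, test = split_by_request(queries)
print(len(train), len(test))
```

Splitting by request rather than by row matters for a listwise model, because the loss and the NDCG metric are both defined over a complete candidate list.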
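The comparison can be expressed with two simple measures, sketched below on made-up toy data (not AMAP's results). It assumes accuracy means top-1 accuracy against the truth values, and that the difference between the two methods is measured as the share of requests for which they select different roads.

```python
def top1_accuracy(predictions, truth):
    """Share of requests whose predicted road equals the truth-value road."""
    if not predictions:
        return 0.0
    hits = sum(1 for rid, road in predictions.items() if truth.get(rid) == road)
    return hits / len(predictions)


def difference_rate(pred_a, pred_b):
    """Share of common requests for which the two methods select different roads."""
    shared = pred_a.keys() & pred_b.keys()
    if not shared:
        return 0.0
    return sum(1 for rid in shared if pred_a[rid] != pred_b[rid]) / len(shared)


# Toy example with four requests; values are illustrative only.
truth = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
rule_pred = {"q1": "a", "q2": "x", "q3": "c", "q4": "y"}
model_pred = {"q1": "a", "q2": "b", "q3": "c", "q4": "y"}

print(top1_accuracy(rule_pred, truth))
print(top1_accuracy(model_pred, truth))
print(difference_rate(rule_pred, model_pred))
```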
At AMAP, we have introduced some scenarios for the application of big data and machine learning to start-point road tracking. The successful launch of the project demonstrates that machine learning plays an important role in improving accuracy and optimizing processes.
In the future, we hope to continue revising existing model scenarios, finding new benefits, and optimizing the effects of machine learning in road tracking through exploration from the perspectives of data and models.