Model Optimization: Application of Machine Learning to AMAP’s Suggestion Service
AMAP's vision is to connect the real world and make travel better. To accomplish this vision, we strive to build a smart connection between location-based services (LBS), big data, and users. Information retrieval is a key technology to this end, and AMAP's suggestion service is an indispensable part of it.
This article describes the specific application of machine learning to AMAP's suggestion service, particularly our attempts at model optimization. These explorations and practices have been verified and have achieved good results, laying the foundation for personalization, deep learning, and vector indexing in the coming years.
Reconstruction of the Search Ranking Module
The suggestion service automatically completes a user-input query into a point of interest (POI), which can be a shop, a residential area, a bus station, or any other place annotated in a geographic information system. After query completion, all candidates are intelligently listed and sorted. By giving smart prompts, the suggestion service reduces users' input cost.
Though the suggestion service features a fast response, it does not support information retrieval for complex queries; it can be understood as a simplified, LBS-specific information retrieval service.
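To make the completion behavior concrete, here is a minimal, purely illustrative sketch of prefix completion over a toy POI list; the POI data, field names, and `suggest` function are hypothetical, not AMAP's actual implementation.

```python
# Minimal sketch of query completion: prefix matching over a toy POI index.
# The POI records and field names below are hypothetical examples.
pois = [
    {"name": "Starbucks (West Lake)", "type": "shop"},
    {"name": "Star City Residential Area", "type": "residential"},
    {"name": "Station Road Bus Stop", "type": "bus_station"},
]

def suggest(prefix: str, limit: int = 10):
    """Return up to `limit` POIs whose names start with the user's input."""
    prefix = prefix.lower()
    return [p["name"] for p in pois if p["name"].lower().startswith(prefix)][:limit]

print(suggest("star"))  # matches the first two POIs only
```

A production system would of course use an inverted index or trie rather than a linear scan, but the input/output contract is the same.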
Similar to general information retrieval systems, the suggestion service is divided into two phases: document recall and ranking (in LBS, documents are also referred to as POIs). In the ranking phase, weighted scores are sorted based on the textual relevance between queries and documents and the features (such as weights and clicks) of documents.
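The pre-LTR ranking phase described above can be sketched as a weighted sum of text relevance and document features; the weights and feature names here are illustrative assumptions, not the actual production formula.

```python
# Sketch of the rule-based ranking phase: each candidate POI receives a
# weighted score combining textual relevance with document features such as
# POI weight and clicks. Weights here are illustrative, not production values.
def rank_score(text_relevance, poi_weight, click_score,
               w_text=0.5, w_weight=0.3, w_click=0.2):
    return w_text * text_relevance + w_weight * poi_weight + w_click * click_score

candidates = [
    ("POI A", rank_score(0.9, 0.4, 0.1)),   # very relevant, few clicks
    ("POI B", rank_score(0.6, 0.8, 0.9)),   # less relevant, popular
]
candidates.sort(key=lambda x: x[1], reverse=True)
```

Hand-tuning such weights is exactly what stops scaling as features multiply, which motivates the LTR reconstruction below.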
As manual parameter tuning becomes complicated with the growing business and the increasing number of features, rule-based ranking no longer meets the requirements. In this scenario, every new business problem demands another patch, making the code difficult to maintain.
Therefore, we decided to reconstruct the ranking module through learning to rank (LTR).
Challenges: Sample Construction and Model Optimization
LTR applies machine learning to ranking problems in retrieval systems. The GBRank model is commonly used, with a pair-wise loss being the most frequent choice, and we follow these practices here. When applying LTR to real problems, the paramount issue is how to acquire samples.
AMAP has a huge daily access volume served from a massive pool of candidate POIs, which makes it infeasible to acquire samples through manual annotation.
Alternatively, automatic sample construction methods, such as building sample pairs from users' POI clicks, require addressing the following issues:
- Click overfitting tends to occur, and as a result, repeated search suggestions are given based on previous clicks.
- Users’ click behavior occasionally fails to measure their real satisfaction.
- The suggestion service displays only the top 10 results at the front end, so the remaining results stay hidden and receive no clicks.
- Some users tend to type complete queries without using the results listed by the suggestion service. As a result, no click statistics can be collected from such users.
In the preceding cases, no click data is available for modeling, and even when a user does click a POI, the click does not necessarily indicate satisfaction.
We also have to address the challenge of feature sparsity during model learning. Statistical learning minimizes global error, so sparse features are often ignored by models: they affect few samples and have limited global impact. In practice, however, these features play an important role in solving long-tail cases, which makes optimizing the model-learning process highly important.
The preceding section describes two modeling challenges: how to construct samples and how to optimize model learning. We propose the following solution for sample construction:
- To evaluate a traveling user’s behavioral session, in addition to analyzing the user’s specific clicks when using the suggestion service, we also need to analyze the user’s behavioral sequence during travel. For example, we must look at the subsequent queries (keywords) entered by the user after the suggestion service gives recommendations and the user’s destination.
- Instead of collecting statistics on query-specific clicks, we need to analyze the user’s click behavior at the end of the session as a whole and generalize the click behavior to all queries of the session.
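The generalization step above can be sketched in a few lines; the session data, POI name, and sample layout are hypothetical, for illustration only.

```python
# Sketch of session-level sample construction: take the POI clicked at the end
# of a behavioral session and generalize that click to every query the user
# typed within the same session. The data below is a hypothetical example.
session_queries = ["x", "xi", "xih", "xihu"]   # successive inputs in one session
final_click = "West Lake Scenic Area"          # POI clicked at session end

# Every query in the session yields a positively labeled sample for that POI,
# even the short prefixes for which no direct click was ever logged.
samples = [(query, final_click, 1) for query in session_queries]
```

This is what lets queries with no direct clicks (hidden results, users who type full queries) still contribute training signal.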
Follow the steps below to implement the sample construction solution:
1. Data Convergence: Integrate multiple log tables on the server, including search suggestions, searches, and navigation.
2. Session Splitting: Split and clean sessions.
3. Session Cleaning: Calculate the clicks on the last query of an input session and generalize the result to all queries of that session, condensing the input session. Refer to the following illustration.
4. Sample Statistics: Randomly sample queries from more than one million online click logs and recall the top N candidate POIs for each query.
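Step 2 (session splitting) is commonly done by cutting a user's event stream wherever the time gap between consecutive events is large; the 30-minute threshold below is an assumption for illustration, not a value from the article.

```python
# Sketch of session splitting: group a user's log events into sessions by
# cutting wherever the gap between consecutive timestamps exceeds a threshold.
# The 1800-second (30-minute) threshold is an illustrative assumption.
def split_sessions(timestamps, gap_seconds=1800):
    sessions, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > gap_seconds:
            sessions.append(current)   # close the current session
            current = []
        current.append(ts)
    sessions.append(current)
    return sessions

print(split_sessions([0, 10, 40, 5000, 5020]))  # -> [[0, 10, 40], [5000, 5020]]
```

In production this runs over joined suggestion, search, and navigation logs rather than bare timestamps, but the splitting logic is the same.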
With the preceding sample construction solution, we acquire tens of millions of valid samples to be used as training samples for GBRank.
As for features, we consider the following four modeling requirements and propose a feature design solution for each requirement.
- Multiple recall links exist, such as city and pinyin recall. A feature design solution is required to make the different recall links comparable.
- Target POIs change dynamically with constant user input. A feature is required to represent the dynamic needs of different queries.
- Posterior features fail for low-frequency long-tail queries with no clicks, so they must be supplemented with prior features.
- The LBS requires a high degree of regional personalization. User requirements vary greatly depending on different regions. To personalize regions, we use the Geohash algorithm to split geographic space into shards, each of which is assigned a unique identifier. Then, we collect statistics by identifier (shard).
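The Geohash sharding mentioned above can be illustrated with a minimal pure-Python encoder. This is a generic sketch of the public Geohash algorithm (interleaved longitude/latitude bits, base-32 alphabet), not AMAP's internal code; the precision and the Beijing coordinates are illustrative.

```python
# Minimal pure-Python Geohash encoder, illustrating how geographic space is
# split into shards with unique identifiers. Per-shard statistics can then be
# keyed by the geohash string. Precision 6 gives cells of roughly 1.2 x 0.6 km.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=6):
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True                  # bits alternate: lon, lat, lon, ...
    while len(bits) < precision * 5:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid                      # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid                      # keep the lower half
        use_lon = not use_lon
    # Pack each run of 5 bits into one base-32 character.
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, precision * 5, 5)
    )

shard = geohash_encode(39.9042, 116.4074)     # a point in Beijing -> shard id
```

Nearby points share a geohash prefix, so coarser shards are obtained simply by truncating the identifier.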
The following figure shows the feature design in detail.
After feature design, we conduct necessary feature engineering, including scaling, feature smoothing, position bias removal, and normalization to maximize the effect of features.
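Two of these steps, feature smoothing and scaling, can be sketched as follows. The additive-smoothing constants and the sample click counts are illustrative assumptions, not values from the article.

```python
# Sketch of two feature-engineering steps: additive (Laplace-style) smoothing
# of a click-rate feature, and min-max scaling. Constants are illustrative.
def smooth_ctr(clicks, impressions, alpha=5.0, beta=100.0):
    """Shrink click-through rate toward a prior for low-impression POIs."""
    return (clicks + alpha) / (impressions + beta)

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# A POI with no impressions falls back to the prior (5/100 = 0.05) instead of
# a noisy or undefined raw CTR.
raw = [smooth_ctr(3, 10), smooth_ctr(300, 1000), smooth_ctr(0, 0)]
scaled = min_max_scale(raw)
```

Smoothing keeps sparse long-tail POIs from producing extreme feature values, while scaling makes features from different recall links comparable.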
Once all rules are removed in the initial version, the MRR of the test set improves by approximately five points. However, model learning issues remain. For example, GBRank feature learning is highly uneven: when tree nodes are split, only a few features are ever selected, while the others do not contribute. This pertains to the second modeling challenge, model-learning optimization. Specifically, we need to address the uneven importance of GBRank features, as described in detail below.
Let’s take a look at the feature importance of the model with the help of the following chart:
After analysis, we identify the following key causes of the feature-learning imbalance:
- The cross-feature query-click is missing in many cases. The feature value for 60% of samples is 0. This feature has a small gain from the tree-node split and cannot be selected. In fact, when sufficient clicks exist, query-click approximates users’ real intentions more than city-click.
- The text-similarity feature is always available but has a relatively low positive-to-negative pair ratio. Therefore, its gain from a tree-node split is lower than that of city-click, and it cannot be selected either.
In aggregate, due to these causes, the same feature, city-click, is always selected as the split node during tree-model learning, rendering other features ineffective. This problem can be solved through the following two methods:
- Method 1: Perform oversampling on samples with sparse features and low-frequency queries to increase the split gain. This method is easy to implement; however, it changes the real distribution of the samples, and because oversampling affects all features, it does not support flexible, per-feature adjustment.
- Method 2 (the one we follow): Modify the loss function. The negative gradient (residual) is adjusted based on the feature-value difference between the two samples of a pair, which in turn modifies the feature's split gain in the next tree. For example, when the model mis-learns a sample pair carrying the query-click feature, a loss is generated, and we add a loss_diff penalty to this loss. The more loss is generated, the larger the split gain in the next tree, which increases the probability that query-click is selected as the split node. The formula is as follows:
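Assuming the standard pair-wise cross-entropy loss over the score difference of a sample pair (with $s_i$, $s_j$ denoting the model scores of the two samples, symbols introduced here for illustration), one plausible form of the modified loss and its negative gradient is:

$$
L = \log\!\left(1 + e^{-(s_i - s_j) + \text{loss\_diff}}\right),
\qquad
-\frac{\partial L}{\partial s_i} = \frac{1}{1 + e^{(s_i - s_j) - \text{loss\_diff}}}
$$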
The preceding formula is the negative gradient of the cross-entropy loss function; loss_diff is equivalent to a translation of the Sigmoid function. The larger the feature-value difference between the two samples, the larger the loss_diff penalty, and the larger the feature's split gain in the next iteration.
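A few lines of Python make the effect of the penalty concrete. This sketch assumes the pair-wise cross-entropy form described above; the function name and the sample scores are illustrative.

```python
import math

# Sketch of the modified negative gradient (residual): the loss_diff penalty,
# proportional to the feature-value difference of the pair, shifts the sigmoid
# and enlarges the residual, so the feature's split gain grows in the next tree.
def neg_gradient(s_i, s_j, loss_diff=0.0):
    """Negative gradient of a pair-wise cross-entropy loss w.r.t. s_i."""
    return 1.0 / (1.0 + math.exp((s_i - s_j) - loss_diff))

base = neg_gradient(0.2, 0.5)                      # mis-ranked pair, no penalty
penalized = neg_gradient(0.2, 0.5, loss_diff=1.0)  # same pair, with penalty
assert penalized > base                            # penalty enlarges the residual
```

With loss_diff = 0 the formula reduces to the ordinary pair-wise gradient, so the modification only amplifies learning for pairs that actually differ on the sparse feature.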
After modifying the loss, we retrain the model and achieve a two-point MRR improvement on the test set over the initial model. The solve rate for historical ranking cases increases from 40% to 70%, which is a significant improvement.
Learning to rank (LTR) replaces the rule-based ranking method of the AMAP suggestion service and eliminates policy coupling and patching, among other significant benefits. The launched GBRank model covers the ranking requirements of queries at all frequencies.
We have successfully launched the models for general population personalization and individual personalization, and are actively promoting the application of deep learning, vector indexing, and user behavior sequence prediction in the AMAP suggestion service.