Alibaba Cloud Machine Learning Platform for AI: Financial Risk Control Experiment with Graph Algorithms


By Garvin Li

Note: The data in this article is hypothetical and was created for experimental use only.

Graph algorithms are typically applied to relationship-based business scenarios. Unlike algorithms that work on structured tabular data, graph algorithms organize data into relationship graphs in which nodes are connected to each other by edges. Alibaba Cloud Machine Learning Platform for AI (PAI) provides several graph algorithm components, including K-Core, maximum connected subgraph, and label propagation classification.

This section uses graph algorithm components in the Alibaba Cloud Machine Learning Platform for AI to create an experiment as follows:

Image for post

The figure above shows the relationships among a group of people. Each arrow represents a relationship between two people, for example, coworkers or relatives. Enoch is a trusted customer and Evan is a fraudulent customer. Graph algorithms are used to calculate a credit score for each of the other people, which indicates the probability that the person is a fraudulent customer. Institutions can use the results for risk control.

Datasets

The following table shows the attributes in the dataset.

Image for post

The following figure shows the dataset.

Image for post

Data Exploration Procedure

The experiment flowchart is as follows:

Image for post

Maximum Connected Subgraph

The input data for graph algorithms is a map of relationships. The maximum connected subgraph component finds the cluster that contains the most interconnected people in that map, so that people who do not contribute to risk control can be removed.

This experiment uses the maximum connected subgraph component to divide the people into two groups and assign each group a group_id. You can then use the SQL script component and the JOIN component to remove the group that does not contribute to risk control from the graph.

Image for post
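
To make the grouping step concrete, here is a minimal Python sketch of the same idea outside PAI. It is not the PAI component itself: the edge list is hypothetical (only Enoch and Evan are named in this article, and Person_A through Person_D are placeholders), edges are assumed to be undirected, and the helper names are invented for illustration. It assigns a group_id to every person with a breadth-first search and then keeps only the edges of the largest group, which roughly mirrors the SQL script and JOIN step.

```python
from collections import defaultdict, deque

# Hypothetical edge list; Person_A..Person_D are placeholder names.
EDGES = [
    ("Enoch", "Person_A"),
    ("Person_A", "Person_B"),
    ("Person_B", "Evan"),
    ("Person_C", "Person_D"),  # a separate group with no link to the others
]


def connected_components(edges):
    """Return {person: group_id}, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    group_of = {}
    next_id = 0
    for start in sorted(neighbors):
        if start in group_of:
            continue
        # Breadth-first search labels everyone reachable from `start`.
        group_of[start] = next_id
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for other in neighbors[person]:
                if other not in group_of:
                    group_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return group_of


def keep_largest_group(edges):
    """Drop every edge that does not belong to the largest group."""
    group_of = connected_components(edges)
    sizes = defaultdict(int)
    for group_id in group_of.values():
        sizes[group_id] += 1
    largest = max(sizes, key=sizes.get)
    return [(src, dst) for src, dst in edges if group_of[src] == largest]


groups = connected_components(EDGES)
kept_edges = keep_largest_group(EDGES)
```

With this sample data, Enoch and Evan end up in the same group, while Person_C and Person_D form a second group whose edge is filtered out.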

Single-Source Shortest Path

The single-source shortest path component lets you explore how closely or distantly people are related. The distance field indicates how many people Enoch needs to go through to contact the target person, as shown in the following figure:

Image for post
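
The distance calculation can be sketched in Python with a breadth-first search. This is an illustration under stated assumptions rather than the PAI implementation: the edges are hypothetical (Person_A and Person_B are placeholders), every relationship is treated as undirected with a weight of 1, and the distance is counted in edges from Enoch, which may not match exactly how the PAI component counts.

```python
from collections import defaultdict, deque

# Hypothetical relationships; Person_A and Person_B are placeholder names.
EDGES = [
    ("Enoch", "Person_A"),
    ("Person_A", "Person_B"),
    ("Person_B", "Evan"),
]


def shortest_distances(edges, source):
    """Return {person: number of edges on the shortest path from source}."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in neighbors[person]:
            if other not in distance:
                # First visit in BFS order is always via a shortest path.
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


distances = shortest_distances(EDGES, "Enoch")
```

People who cannot be reached from the source do not appear in the result, which is one more reason to remove disconnected groups before this step.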

Label Propagation Classification

Label propagation classification is a semi-supervised classification algorithm. It uses the existing label information of the nodes to predict the label information of the unlabeled nodes. Based on the similarity of nodes, label propagation classification propagates each label to other nodes.

To use the label propagation classification component, you need a connected graph that contains all entities, together with the labeled data. This experiment uses the read MaxCompute table component to import the labeled data, as shown in the following figure. The weight field indicates the probability of a person being a fraudulent customer.

Image for post
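
The propagation idea can be illustrated with a simplified Python sketch. It is a clamped, neighbor-averaging variant of label propagation and is not necessarily the algorithm that the PAI component implements. The graph, the edge weights, the starting value of 0.5 for unlabeled people, and the function name are all assumptions made for this example; only the labels for Enoch (trusted, fraud probability 0) and Evan (fraudulent, fraud probability 1) follow the article.

```python
from collections import defaultdict

# Hypothetical weighted relationships; Person_A and Person_B are placeholders.
EDGES = [
    ("Enoch", "Person_A", 1.0),
    ("Person_A", "Person_B", 1.0),
    ("Person_B", "Evan", 1.0),
]

# Labeled data: fraud probability of the known customers.
SEEDS = {"Enoch": 0.0, "Evan": 1.0}


def propagate_labels(edges, seeds, iterations=200):
    """Spread fraud probabilities from labeled people to their neighbors."""
    neighbors = defaultdict(dict)
    for src, dst, weight in edges:
        neighbors[src][dst] = weight
        neighbors[dst][src] = weight

    # Unlabeled people start at 0.5, meaning "unknown" (an assumption).
    score = {person: seeds.get(person, 0.5) for person in neighbors}
    for _ in range(iterations):
        updated = {}
        for person, linked in neighbors.items():
            if person in seeds:
                # Known labels stay fixed in every round.
                updated[person] = seeds[person]
            else:
                # Weighted average of the neighbors' current scores.
                total = sum(linked.values())
                updated[person] = sum(
                    score[other] * weight for other, weight in linked.items()
                ) / total
        score = updated
    return score


fraud_probability = propagate_labels(EDGES, SEEDS)
```

On this small chain, the person closer to Evan ends up with a higher score than the person closer to Enoch, which matches the intuition that proximity to a known fraudulent customer raises the estimated risk.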

Conclusion

After SQL filtering, the final results show the probability of fraud for each person. The larger the value, the more likely the person is a fraudulent customer.

Reference: https://www.alibabacloud.com/blog/alibaba-cloud-machine-learning-platform-for-ai-financial-risk-control-experiment-with-graph-algorithms_594518?spm=a2c65.%2012602492.0.0
