By Harshit Khandelwal, Alibaba Cloud Community Blog author.
We live in an era of Big Data, in which data is abundant and companies across many industries are finding new and unique ways to use it to create value for their customers. Machine learning is an important piece of this puzzle.
What Is Machine Learning?
Machine learning (ML) can be described as a mechanism by which a machine learns patterns from datasets so that it can make predictions about new data. The major types of machine learning algorithms are supervised, semi-supervised, unsupervised, and reinforcement learning. A machine learning pipeline combines training data, a model of that data, and an algorithm that fits the model to the data. After the initial training, a separate test dataset is run through the model to check the accuracy of the pipeline's predictions.
Machine learning pipelines typically have the following steps:
- Data source
- Data preprocessing
- Feature engineering
- Training and prediction
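To make the four steps concrete, here is a minimal pure-Python sketch. Everything in it is hypothetical: the function names, the tiny hard-coded dataset, and the threshold "model", which stands in for a real learning algorithm. For brevity it predicts on the same rows it was trained on; a real pipeline would evaluate on a held-out test dataset, as described above.

```python
from statistics import mean, pstdev

def load_data():
    # Data source: a tiny, hypothetical dataset of (features, label) rows.
    return [
        ([1.0, 200.0], 0),
        ([2.0, 180.0], 0),
        ([3.0, 150.0], 1),
        ([4.0, 120.0], 1),
    ]

def preprocess(rows):
    # Data preprocessing: standardize each feature column to zero mean and unit variance.
    columns = list(zip(*(x for x, _ in rows)))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [([(v - m) / s for v, (m, s) in zip(x, stats)], y) for x, y in rows]

def engineer_features(rows, keep=(0,)):
    # Feature engineering: keep only the selected columns (a hypothetical choice).
    return [([x[i] for i in keep], y) for x, y in rows]

def train(rows):
    # Training: a deliberately simple model that thresholds the first kept feature
    # at the midpoint between the two class means (assumes class 1 has larger values).
    pos = [x[0] for x, y in rows if y == 1]
    neg = [x[0] for x, y in rows if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x[0] > threshold else 0

def run_pipeline():
    rows = engineer_features(preprocess(load_data()))
    model = train(rows)
    # Prediction: apply the trained model to every row.
    return [model(x) for x, _ in rows]

print(run_pipeline())  # prints [0, 0, 1, 1]
```

Each step is a separate function so that any one of them can be swapped out, for example replacing the threshold rule with a real classifier, without touching the others.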
Machine learning can broadly be considered a subfield of artificial intelligence (AI), and it is in turn an umbrella category that includes other families of algorithms, such as deep learning.
The computing power of the machine on which these algorithms are deployed also plays a big role in how powerful an algorithm can be. In the cloud, these algorithms are instrumental to many of the services provided, and they rely on the computing power of the cloud's servers.
What Can Machine Learning Be Used for?
One common application of machine learning in recent years is recommendation systems, which use data about users to recommend content or products to them. One example is the system used by Netflix.
Netflix uses a state-of-the-art recommendation system to provide accurate recommendations. Its algorithm takes inputs such as the user's viewing history, the user's ratings, data from other users with similar tastes, and the time of day at which the user watched content.
This recommendation system matters because about two-thirds of the movies watched on Netflix are recommended ones. The story is similar for comparable services from Amazon and Google: 35 percent of sales on Amazon's e-commerce platform come from recommendations, and on Google, news recommendations improved click-through rates by 38 percent.
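Netflix's production system is proprietary and combines many signals, so the following is not its algorithm. It is a minimal sketch of one well-known technique, user-based collaborative filtering, which captures the "other users with similar tastes" input: it finds the users whose ratings most resemble the target user's and recommends titles those neighbors rated highly. The ratings, user names, and function names are hypothetical, and signals such as time of day are not modeled.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {title: rating}.
RATINGS = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 5, "B": 5, "C": 4, "D": 1},
    "carol": {"A": 1, "B": 2, "C": 1, "D": 5},
}

def cosine_similarity(u, v):
    # Cosine similarity computed over the titles that both users have rated.
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[t] * v[t] for t in common)
    norm_u = sqrt(sum(u[t] ** 2 for t in common))
    norm_v = sqrt(sum(v[t] ** 2 for t in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=1, top_n=1):
    mine = ratings[user]
    # Rank the other users by similarity and keep the k nearest neighbors.
    neighbors = sorted(
        (other for other in ratings if other != user),
        key=lambda other: cosine_similarity(mine, ratings[other]),
        reverse=True,
    )[:k]
    # Score each unseen title by the neighbors' similarity-weighted average rating.
    totals, weights = {}, {}
    for other in neighbors:
        sim = cosine_similarity(mine, ratings[other])
        if sim <= 0:
            continue
        for title, rating in ratings[other].items():
            if title not in mine:
                totals[title] = totals.get(title, 0.0) + sim * rating
                weights[title] = weights.get(title, 0.0) + sim
    ranked = sorted(totals, key=lambda t: totals[t] / weights[t], reverse=True)
    return ranked[:top_n]

print(recommend("alice", RATINGS))  # prints ['C']
```

Here Alice's ratings resemble Bob's more than Carol's, so with `k=1` she is recommended the unseen title that Bob rated highest.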
How Does Machine Learning Work on Alibaba Cloud?
This section walks you through a step-by-step tutorial on how to use machine learning on Alibaba Cloud. In this tutorial, you will create a basic machine learning pipeline that trains a binary classification model.
First, procure the data. To do this, find a data source that you want to work with; some datasets are already available in the console. This example uses breast cancer data.
The data is as follows:
- Next comes feature engineering. In this step, the goal is to make sure that important features are given more weight than others. The node used here has two outputs: the first gives the feature weights (features with higher scores are given greater importance in the resulting model), and the second gives the sorted dataset.
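The article does not say how the feature engineering node computes its weights, so the following sketch assumes one simple scoring method: the absolute Pearson correlation between each feature and the binary label. It produces the same two kinds of output, a weight per feature and a dataset whose columns are sorted by weight. The data, the feature names `f1` to `f3`, and the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient; returns 0.0 if either input is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def feature_weights(rows, labels, names):
    # Output 1: one weight per feature; a higher weight means a more important feature.
    columns = list(zip(*rows))
    return {name: abs(pearson(col, labels)) for name, col in zip(names, columns)}

def sort_dataset(rows, names, weights):
    # Output 2: the dataset with its columns reordered from highest to lowest weight.
    order = sorted(range(len(names)), key=lambda i: weights[names[i]], reverse=True)
    return [names[i] for i in order], [[row[i] for i in order] for row in rows]

# Hypothetical data: three features and a binary label.
names = ["f1", "f2", "f3"]
rows = [[1.0, 5.0, 4.0], [2.0, 6.0, 3.0], [3.0, 5.0, 2.0], [4.0, 6.0, 2.0]]
labels = [0, 0, 1, 1]

weights = feature_weights(rows, labels, names)
sorted_names, sorted_rows = sort_dataset(rows, names, weights)
print(sorted_names)  # prints ['f3', 'f1', 'f2']
```

Taking the absolute value means a strongly negative correlation (as with `f3`) counts as just as informative as a strongly positive one, while `f2`, which does not vary with the label, gets a weight of zero.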
The following is the output for node 2 (the feature weights):
- Third, train the model. To do this, use the split node to divide the dataset into a training dataset and a testing dataset, and feed the training dataset to the machine learning algorithm, which is called LogisticRegression.
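The split and training steps can be sketched in plain Python as follows. This is not the implementation behind the console's LogisticRegression node; it assumes a reproducible random split and fits a binary logistic regression with batch gradient descent on the log-loss. The function names, learning rate, epoch count, and data are hypothetical.

```python
import math
import random

def split_dataset(rows, labels, test_fraction=0.25, seed=0):
    # Shuffle the row indices reproducibly, then cut them into training and test parts.
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(rows) * test_fraction))
    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]
    return pick(idx[n_test:]), pick(idx[:n_test])

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(rows, labels, lr=0.1, epochs=1000):
    # Batch gradient descent on the mean log-loss; the gradient per row is (p - y) * x.
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, rows, threshold=0.5):
    # Label 1 when the predicted probability reaches the threshold.
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0
            for x in rows]

# Hypothetical one-feature data: negative values are class 0, positive values class 1.
rows = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

(train_x, train_y), (test_x, test_y) = split_dataset(rows, labels)
w, b = fit_logistic_regression(train_x, train_y)
print(predict(w, b, test_x), test_y)  # predicted vs. actual labels for the test split
```

Only the training split is passed to `fit_logistic_regression`; the test split is kept aside for evaluation. Because this toy data is separable at zero, the learned weight is positive, and points far from zero, such as -3 and 3, are classified as 0 and 1 respectively.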
- Next is the evaluation step. In this step, the output from the split node (the testing dataset) and the trained logistic regression model are combined in the prediction node, which feeds its result into the confusion matrix evaluation metric to check how accurate the predictions are.
The result from the prediction node is as follows:
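A confusion matrix counts the four possible outcomes of a binary classifier: true positives, false negatives, false positives, and true negatives. This minimal sketch computes it from actual and predicted labels, along with the accuracy, precision, and recall derived from it. The labels and function names are hypothetical, and the exact layout of the console's confusion matrix output may differ.

```python
def confusion_matrix(actual, predicted, positive=1):
    # Count the four outcomes of a binary classifier.
    cm = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            cm["TP" if p == positive else "FN"] += 1
        else:
            cm["FP" if p == positive else "TN"] += 1
    return cm

def summarize(cm):
    # Derive accuracy, precision, and recall from the counts.
    total = sum(cm.values())
    accuracy = (cm["TP"] + cm["TN"]) / total
    predicted_pos = cm["TP"] + cm["FP"]
    actual_pos = cm["TP"] + cm["FN"]
    precision = cm["TP"] / predicted_pos if predicted_pos else 0.0
    recall = cm["TP"] / actual_pos if actual_pos else 0.0
    return accuracy, precision, recall

# Hypothetical actual labels and predictions for eight test rows.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)             # {'TP': 3, 'FN': 1, 'FP': 2, 'TN': 2}
print(summarize(cm))  # (0.625, 0.6, 0.75)
```

Accuracy alone can hide which kind of mistake a model makes; for a medical dataset such as breast cancer data, the false negatives (missed positive cases) are often the count to watch.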
Next, the confusion matrix output is as follows:
In this pipeline, the model is logistic regression, a statistical model that in its basic form uses the logistic function to model the probability of a class label, which makes it suitable for classification (the prediction of labels). By contrast, regression models predict a continuous numerical value. In classification, the labels can be encoded as numbers, such as 0 and 1. When there are exactly two possible labels, the model is called binary logistic regression; when there are more than two categories, it is called multinomial logistic regression; and when those categories have a natural order, it is called ordinal logistic regression.
Machine learning is a means by which machines can predict future data based on current data, which allows companies to use data to provide value to their customers. However, the power of a machine learning algorithm is limited by the machine or device it runs on, and this is also true for servers in the cloud, where machine learning plays an important role. Finally, to develop a machine learning algorithm, you follow the standard steps of a pipeline.
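The difference between the binary and multinomial cases can be shown with the two functions that turn linear scores into probabilities. In this minimal sketch, binary logistic regression maps one score to a probability with the logistic (sigmoid) function, while multinomial logistic regression maps one score per class to a probability distribution with the softmax function; with two classes, softmax reduces to the sigmoid. The scores are hypothetical, and ordinal logistic regression is not covered.

```python
import math

def sigmoid(z):
    # Binary logistic regression: P(y = 1 | x) = sigmoid(w·x + b).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(scores):
    # Multinomial logistic regression: one linear score per class,
    # normalized into a probability distribution over the classes.
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.3  # hypothetical score w·x + b
print(sigmoid(z))                 # probability of label 1 in the binary case
print(softmax([0.0, z])[1])       # a two-class softmax gives the same probability
print(softmax([0.2, 1.5, -0.7]))  # three classes -> three probabilities summing to 1
```

In both cases the predicted label is the one with the highest probability; in the binary case that is equivalent to checking whether the sigmoid output is at least 0.5.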