This article presents the Machine Learning solution from Microsoft: Azure Machine Learning.
About Machine Learning
Herbert Simon, winner of the 1975 Turing Award and of the 1978 Nobel Prize in Economics, said: "Learning is any process by which a system improves performance from experience".
There are various types of learning methods:
- Supervised learning: classification and/or regression algorithms learn from data that has already been labeled, in order to predict the label or value of input data they have not yet seen. Here are some typical examples:
- Unsupervised learning: in this context, the algorithms must discover structure in the data on their own, without labeled examples telling them what to look for. It is, then, a "self-service" exploratory analysis. Here are some typical examples:
- Semi-supervised learning, which is a combination of the two previous types of learning methods. Application example: facial recognition.
- Reinforcement learning, which is a learning mode based on interaction with the environment. Specifically, the algorithms learn from the consequences of their actions, the same way a human being can learn from mistakes. We can also speak of learning by doing, experimenting, testing … Some typical examples:
More concrete use cases of Machine Learning can be found at the following URL: https://www.kaggle.com/wiki/DataScienceUseCases.
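To make the difference between supervised and unsupervised learning more concrete, here is a minimal sketch in plain Python. It is not tied to Azure ML, and the function names and sample data are invented for illustration. The first function predicts a label from labeled examples (a 1-nearest-neighbor classifier), while the second groups unlabeled values into two clusters (a very small k-means with k = 2).

```python
import math

def nearest_neighbor_predict(train, query):
    """Supervised: return the label of the closest labeled training point."""
    # train is a list of (features, label) pairs; the labels are known in advance.
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

def two_means_1d(values, iterations=10):
    """Unsupervised: group unlabeled 1-D values into two clusters (k-means, k=2)."""
    # Start with the two extreme values as cluster centers.
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearest center.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

# Illustrative data only.
labeled = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
           ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]
prediction = nearest_neighbor_predict(labeled, (7.5, 8.0))
clusters = two_means_1d([1.0, 1.5, 2.0, 10.0, 11.0, 12.0])
print(prediction)
print(clusters)
```

In the supervised case the answer ("small" or "large") comes from labels supplied with the training data; in the unsupervised case no label is given and the grouping emerges from the values themselves.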
Introducing Azure Machine Learning
The development and democratization of Machine Learning, originally reserved for researchers, follows that of Big Data. Machine Learning algorithms push the boundaries of data analysis by making it possible to perform what is called "predictive analysis" on massively processed information.
It is in this context that Microsoft implemented Azure Machine Learning (AML), its Machine Learning solution:
To date, Azure Machine Learning has three major components:
- ML Studio, which allows the implementation of predictive models from imported data. This service is accessible via the Azure Management Portal or here: https://studio.azureml.net/. Data can be imported from HDInsight, Azure Storage, local data …
- Azure Portal ML & Service API, which lets you create and manage ML Studio workspaces.
- ML API Service, which allows you to publish predictive analyses from models implemented in ML Studio and deployed via Azure Portal ML & Service API. The results can be published to web applications, mobile applications, interactive reports (PowerBI, …), …
The processed data can come either from web applications, mobile applications, and analytical reports, or from HDInsight, Azure Storage, and local data.
From an operational point of view, each major component corresponds to a type of user profile:
- Operations staff (IT professionals) use Azure Portal ML & Service API to provision and manage workspaces for models.
- Data scientists (scientists, researchers, data analysts) use ML Studio to build models.
- Developers (IT professionals) test the models that have been built and publish them via a Web service.
Using Azure Machine Learning
Before you start…
We will use Azure Management Portal to implement a workspace. But before that:
- Make sure you have an account with a Windows Azure subscription from Microsoft. To check, you can go here: https://account.windowsazure.com/Subscriptions.
- To quickly access the Azure Management Portal, the Azure management interface, you can go directly to the following URL: https://manage.windowsazure.com.
Note that the Creating a workspace in ML Azure subsection is optional; in this article, it is included purely for tutorial purposes. In normal use, it is recommended to have a workspace to store and manage ML models.
If you just want to test Azure ML (see the Using Azure ML Studio subsection) and experiment with models, go directly to the URL: https://studio.azureml.net/.
Getting started with Azure Machine Learning
The presentations are based on the preview version of Azure ML Studio. It is therefore possible that some elements of this post will become obsolete, or that new features will be added in the final version. In any case, please go to the official site for more information: http://azure.microsoft.com/en-us/documentation/services/machine-learning/.
Creating a workspace in ML Azure
To create an ML workspace, follow these steps:
- Access the Machine Learning menu.
In the vertical navigation panel at the left of the Azure Management Portal, click Machine Learning, and then select Create an ML workspace:
This opens the horizontal services panel at the bottom, with the quick creation options (Quick Create):
- The workspace name is the name of the workspace to create, within which the ML models will be implemented. It must be globally unique and contain only lowercase characters, with no special characters.
- The workspace owner is the name of the owner who will manage the ML models. By default, the name of the Azure subscription account is specified.
- The location corresponds to the geographical location where the workspace will be stored.
- The new storage account will be the one used to create the workspace storage.
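The naming rule for the workspace can be checked locally before filling in the form. The sketch below is only an illustration: the function name is hypothetical, and it assumes that "no special characters" means "lowercase letters a-z and digits only", which the portal may interpret differently. Global uniqueness cannot be verified locally; only the portal can confirm it.

```python
import re

# Assumption: "no special characters" is read here as "letters a-z and digits only".
WORKSPACE_NAME_PATTERN = re.compile(r"[a-z0-9]+")

def looks_like_valid_workspace_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters and digits."""
    # fullmatch requires the whole string to match, so empty names are rejected.
    return WORKSPACE_NAME_PATTERN.fullmatch(name) is not None

print(looks_like_valid_workspace_name("mlworkspace01"))
print(looks_like_valid_workspace_name("ML-Workspace"))
```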
In our context, the following information was specified:
- Check the ML workspace.
If all goes well …
A quick look at the Machine Learning section makes it possible to check whether the workspace has been created and is online:
Using Azure ML Studio
You can either:
- Go to the dashboard of the freshly created workspace, then click Access your Workspace:
- At the bottom of the Machine Learning service page, click Open in Studio:
- Directly go to the following URL: https://studio.azureml.net. Then click Get started now:
This launches the following Azure ML Studio homepage:
To start creating an ML model, click New at the bottom of the window, from the Experiments service.
We see two tabs:
Dataset, which is used to create and configure data sources (CSV files, text files, compressed files, …) as objects:
Experiment, which creates an experiment and its ML models for analysis (click Blank Experiment). It is also possible to reuse existing examples produced by Microsoft (see above in this article).
Here is an overview of the ML interface design:
The middle of the interface is the design space in which the operations that make up an ML model are laid out. Modules are dragged from the left pane and dropped into this space to build an ML model.
All Azure ML modules are classified by group:
- Saved Datasets, which allows the use of previously created data sources, including the following examples:
- Data Format Conversion, which converts the data stream to a specific format. Examples of modules:
- Data Input and Output, which performs read and write operations on the data sources:
- Data Transformation, which transforms the data. The main processing operations are:
  - Filtering operations:
  - Manipulation operations:
  - Sampling and splitting operations:
  - Scaling and reducing operations:
- Feature Selection, which selects the most relevant columns (features) of the source data:
- Machine Learning, which is used to implement the models through the use of learning algorithms. Examples of modules:
  - Evaluation operations:
  - The algorithms for classification and regression are part of supervised Machine Learning. Examples of available algorithms:
  - The clustering algorithms are part of unsupervised Machine Learning. An example of an available algorithm:
- OpenCV Library Modules, which allows the use of the OpenCV (Open Source Computer Vision) library for real-time image processing.
- R Language Modules, which runs scripts written in the R language (one of the most widely used programming languages in the statistical world):
- Statistical Functions, which provides various statistical functions:
- Text Analytics, which provides various modules for text mining:
Note that some modules make it possible to work with data sources other than flat files. This is the case, for example, of modules such as Writer or Reader, from the Data Input and Output group:
It is possible that new types of data sources will become available in the future: for example, SQL Server databases, OLAP cubes, etc.
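To give an idea of what the sampling and splitting operations and the scaling operations of the Data Transformation group do conceptually, here is a minimal sketch in plain Python. It does not use the Azure ML Studio modules themselves; the function names, the 75/25 split ratio, and the sample values are assumptions chosen for illustration.

```python
import random

def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows reproducibly, then split them into a training and a test set."""
    shuffled = list(rows)                      # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed gives a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical: map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

rows = list(range(8))
train, test = split_rows(rows)
scaled = min_max_scale([10.0, 20.0, 30.0])
print(len(train), len(test))
print(scaled)
```

A model is typically trained on the first part of the split and evaluated on the second, which is the role of the evaluation operations listed in the Machine Learning group.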
Publishing an ML model
Models made in Azure ML Studio can be published as a Web service through a REST API. The results can therefore be consumed in mobile applications, web applications, PowerBI, an SSIS package, etc.
There are basically two RESTful modes of publication:
- Request Response Service (RRS) for synchronous, low-latency use. Very useful in a near-real-time predictive analytics context.
- Batch Execution Service (BES) for asynchronous use. Useful for testing data, for working with formatted data in a file stored in Azure storage (WABS) or on a Hadoop cluster (HDFS), or when a very large amount of data from several sources (HDInsight, SQL Azure, BLOB service …) needs to be bulk-processed for scoring.
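As an illustration of the synchronous (RRS) mode, a client essentially sends one HTTP POST containing the values to score and waits for the prediction in the response. The sketch below only builds such a request with the Python standard library and does not send it. Everything specific in it is an assumption: the endpoint URL, the API key, the "rows" payload layout, and the bearer-token header are hypothetical placeholders, and the real values and format must be taken from the help page of the published Web service.

```python
import json
import urllib.request

def build_scoring_request(endpoint_url, api_key, feature_rows):
    """Build (but do not send) a JSON POST request for a scoring Web service."""
    # Hypothetical payload layout: the real field names depend on the published service.
    payload = {"rows": feature_rows}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed authentication scheme; check the service documentation.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

# Placeholder endpoint, key, and feature values, for illustration only.
request = build_scoring_request(
    "https://example.invalid/score",
    "PLACEHOLDER_KEY",
    [{"age": 42, "income": 30000}],
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to urllib.request.urlopen(request). In the asynchronous (BES) mode, the client would instead submit a job that points to the stored data and retrieve the results later.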
Within Azure ML Studio, Web services are managed in the Web Services section:
Clicking the selected ML model (Name column) leads to this interface:
For further information …