Azure Machine Learning Designer is a visual interface that allows users to create, manage, and deploy machine learning models without writing code. It's a great tool for beginners and experts alike.
With Azure Machine Learning Designer, you can import data from various sources, including Azure Blob Storage, Azure Data Lake Storage, and SQL Server. You can also use the designer to create and train machine learning models using a variety of algorithms.
One of the key benefits of Azure Machine Learning Designer is its simplicity. Because pipelines are assembled by dragging and connecting components rather than writing code, it is a practical starting point for those who are new to machine learning.
Data Preparation
Data preparation is a crucial step in Azure Machine Learning Designer. First, you must create a pipeline and add the dataset you want to work with. To do this, select the Designer item in Azure Machine Learning studio, create a new pipeline, and change the draft name to something descriptive.
You'll then need to load your dataset onto the canvas, which can be done by selecting the Data button in the Asset library pane and searching for the dataset. Once loaded, you can preview the data to review its schema and distributions.
To clean missing data, use the Clean Missing Data component, which can be found in the datasets and component palette. This component allows you to remove entire rows with missing values, which is a common practice in machine learning. By selecting the Remove entire row option, you can ensure that your dataset is complete and ready for modeling.
The detailed steps for cleaning missing data are covered in the Clean Missing Data section of this article.
Splitting data is another important step in data preparation, which allows you to separate your data into training and testing sets. To do this, use the Split Data component, which can be found in the datasets and component palette. By setting the Fraction of rows in the first output dataset to 0.7, you can ensure that 70% of your data is used for training and 30% for testing.
Load Data
To load data, you must first create a pipeline in Azure Machine Learning Designer. This involves selecting the Designer item in the Authoring section and clicking the + button to create a new pipeline. You can then change the draft name to something more descriptive, such as "Diabetes Training".
The next step is to add the dataset you want to work with. You can do this by selecting the Data button in the Asset library pane and searching for the dataset. In this case, the diabetes-data dataset was used. Once you've located the dataset, you can drag it onto the canvas and right-click on it to preview the data.
As you review the schema of the data in the Profile tab, you'll notice that the Diabetic column contains two values: 0 and 1. These values represent the two possible classes for the label that your model will predict. Most of the other columns are numeric, but each feature is on its own scale.
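To make the idea of "each feature is on its own scale" concrete, here is a minimal sketch of the kind of per-column summary the Profile tab shows. The rows and column names are hypothetical stand-ins, not the actual diabetes-data records, and the function is only an illustration, not how the designer computes its profile.

```python
# Hypothetical rows standing in for a few records of a diabetes-style dataset.
rows = [
    {"Pregnancies": 0, "PlasmaGlucose": 171, "BMI": 43.5, "Diabetic": 0},
    {"Pregnancies": 8, "PlasmaGlucose": 92, "BMI": 21.2, "Diabetic": 0},
    {"Pregnancies": 7, "PlasmaGlucose": 115, "BMI": 41.5, "Diabetic": 1},
]


def profile(rows):
    """Return {column: (min, max, distinct_count)} for a list of dict rows."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = (min(values), max(values), len(set(values)))
    return summary


summary = profile(rows)
for column, (low, high, distinct) in summary.items():
    print(f"{column}: min={low}, max={high}, distinct={distinct}")
```

The label column has only two distinct values, while the feature columns span very different ranges, which is why normalization is applied later.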
To get a better understanding of the data, you can also visualize it. This involves right-clicking on the dataset and selecting Preview Data. You can then select different columns to view information about each one. For example, in the Automobile price data, each row represents an automobile, and the variables associated with each automobile appear as columns.
Clean Missing Data
Clean Missing Data is an essential step in preparing your dataset for analysis. It's a prerequisite for using most components in the designer.
You can remove missing values from your input data by using the Clean Missing Data component. This component is easily accessible in the datasets and component palette to the left of the canvas.
To add the Clean Missing Data component to your pipeline, click Component and search for it in the palette. Then, drag it to the pipeline canvas and connect it to the Select Columns in Dataset component.
Once you've added the component, select it and click on the arrow icon under Settings to open the component details pane. Alternatively, you can double-click the component to open the details pane.
In the component details pane, select Edit column to the right of the pane. In the Columns to be cleaned window that appears, expand the drop-down menu next to Include and select All columns. Then, select Save to apply the changes.
Next, in the Clean Missing Data component details pane, under Cleaning mode, select Remove entire row. This will remove any rows with missing values from your dataset.
You can also add a comment to the Clean Missing Data component by selecting the Comment text box and entering a brief description, such as "Remove missing value rows."
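Conceptually, the Remove entire row cleaning mode drops every record that has a missing value in any of the selected columns. The sketch below shows that idea in plain Python; the rows, the column names, and the `remove_rows_with_missing` helper are hypothetical and are not part of the designer.

```python
def remove_rows_with_missing(rows, columns=None):
    """Keep only rows where none of the checked columns is None.

    If columns is None, every column in the row is checked, which mirrors
    selecting "All columns" as the columns to be cleaned.
    """
    kept = []
    for row in rows:
        checked = columns if columns is not None else row.keys()
        if all(row[name] is not None for name in checked):
            kept.append(row)
    return kept


# Hypothetical automobile-style records; None marks a missing value.
rows = [
    {"make": "a", "horsepower": 111, "price": 13495},
    {"make": "b", "horsepower": None, "price": 16500},
    {"make": "c", "horsepower": 154, "price": None},
    {"make": "d", "horsepower": 102, "price": 13950},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Passing a list such as `["price"]` as `columns` restricts the check to those columns, in the same way that the column selector narrows which columns are cleaned.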
Split the Data
Splitting data is a crucial step in machine learning, where you'll divide your data into two separate datasets for training and testing the model.
Select Component in the datasets and component palette to the left of the canvas, and search for the Split Data component.
Drag the Split Data component to the pipeline canvas and connect the left port of the Clean Missing Data component to the Split Data component.
Make sure the left output port of Clean Missing Data connects to Split Data, as it contains the cleaned data.
Select the Split Data component and open the component details pane by clicking on the arrow icon under Settings or by double-clicking the component.
Set the Fraction of rows in the first output dataset to 0.7, which splits 70 percent of the data for training the model and 30 percent for testing it.
The 70 percent dataset will be accessible through the left output port, and the remaining data is available through the right output port.
To customize the Split Data component, expand Node info and enter a comment in the Comment text box, such as "Split the dataset into training set (0.7) and test set (0.3)".
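The effect of setting the fraction to 0.7 can be sketched in a few lines of Python. This is an illustration of a randomized 70/30 split under assumed behavior (shuffle, then cut); the `split_rows` helper and its `seed` parameter are hypothetical and do not describe the Split Data component's internals.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (first, second) by fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten cleaned records
train, test = split_rows(rows, fraction=0.7)
print(len(train), "training rows,", len(test), "test rows")
```

The first returned list plays the role of the left output port (training data) and the second plays the role of the right output port (test data).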
Data Transformation
Data Transformation is a crucial step in preparing your data for training in Azure Machine Learning Designer. You typically need to apply some pre-processing transformations to the data, such as normalizing numeric columns to put them on the same scale.
To start, select Component in the Asset library pane, which lists a range of modules for data transformation and model training. You can also use the search bar to quickly locate modules.
The Select Columns in Dataset module is a useful tool for selecting specific columns from your dataset. To use it, place the module on the canvas below your dataset, connect the output from the bottom of the dataset to the input at the top of the module, and then double-click on the module to access its settings. Select Edit column, then in the Select columns window, select By name and Add all the columns, and finally remove PatientID and click Save.
Here are the steps to apply normalization to your numeric columns with the Normalize Data module:
- Set the Transformation method to MinMax.
- Set Use 0 for constant columns when checked to True.
- Select Edit columns, choose With Rules, and enter the names of the numeric columns to transform under Include column names.
Once you've applied these transformations, you can preview the transformed data by right-clicking on the Normalize Data module and selecting Preview data, then Transformed dataset.
Remove a Column
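Removing a column from your dataset is a common task in data transformation. When a column has many missing values, as the normalized-losses column does in the Automobile price data, you can exclude it altogether instead of dropping every row that lacks a value.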
To remove a column, start by clicking on the Component button in the datasets and component palette. Search for the Select Columns in Dataset component and drag it onto the canvas.
Drag the Select Columns in Dataset component below the dataset component. Connect the output port of the dataset to the input port of the Select Columns in Dataset component.
Select the Select Columns in Dataset component and click on the arrow icon under Settings to open the component details pane. Alternatively, you can double-click the component to open the details pane.
In the component details pane, select Edit column to the right of the pane. Expand the Column names dropdown next to Include and select All columns.
Select the + to add a new rule. From the dropdown menus, select Exclude and Column names. Enter normalized-losses in the text box.
In the lower right, select Save to close the column selector. In the Select Columns in Dataset component details pane, expand Node info. Select the Comment text box and enter Exclude normalized losses.
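The "include all columns, then exclude normalized-losses" rule amounts to a simple column filter. Here is a minimal sketch of that idea; the sample rows and the `exclude_columns` helper are hypothetical and only illustrate the effect of the rule.

```python
def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Hypothetical automobile-style records.
rows = [
    {"make": "a", "normalized-losses": None, "price": 13495},
    {"make": "b", "normalized-losses": 164, "price": 13950},
]

selected = exclude_columns(rows, {"normalized-losses"})
print(list(selected[0].keys()))
```

The original rows are left unchanged, much as the dataset on the canvas is unchanged and only the component's output omits the column.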
Add Transformations
To add transformations to your data, you'll typically need to apply some pre-processing steps. This involves selecting the right modules from the Asset library.
The Select Columns in Dataset module is a great place to start. You can find it in the Asset library, and then place it on the canvas below your dataset. Next, connect the output from the bottom of your dataset to the input at the top of the Select Columns in Dataset module.
Double-clicking on the Select Columns in Dataset module gives you access to its settings. Here, you can select Edit column and choose to add all the columns by name. Be sure to remove any columns you don't need, like PatientID.
The Normalize Data module is another essential tool for data transformation. Place it on the canvas below the Select Columns in Dataset module, and connect the output from the bottom of the Select Columns module to the input at the top of the Normalize Data module.
To set up the Normalize Data module, double-click on it to view its settings. You'll need to specify the transformation method and the columns to be transformed. For this exercise, you can set the Transformation method to MinMax and the Use 0 for constant columns when checked to True.
Here are the key settings for the Normalize Data module:
- Transformation method: MinMax
- Use 0 for constant columns: True
- Columns to transform: select Edit columns, choose With Rules, and include the names of the numeric columns you want to normalize.
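MinMax scaling maps each selected column onto the range 0 to 1 using (value - min) / (max - min). The sketch below illustrates that formula in plain Python. Returning 0.0 for a column whose values never change is an assumption made here to avoid dividing by zero; it is not a statement of how the Normalize Data module treats constant columns. The rows and column names are hypothetical.

```python
def min_max_normalize(rows, columns):
    """Rescale the named columns to the range [0, 1].

    Assumption: a constant column (max == min) is mapped to 0.0.
    """
    result = [dict(row) for row in rows]  # copy so the input is not modified
    for column in columns:
        values = [row[column] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        for row in result:
            row[column] = 0.0 if span == 0 else (row[column] - low) / span
    return result


# Hypothetical records; PatientID is deliberately left out of the columns list.
rows = [
    {"PatientID": 1, "Age": 20, "Constant": 5},
    {"PatientID": 2, "Age": 30, "Constant": 5},
    {"PatientID": 3, "Age": 40, "Constant": 5},
]

normalized = min_max_normalize(rows, ["Age", "Constant"])
print([row["Age"] for row in normalized])
```

Only the listed columns are rescaled, which matches the idea of choosing the columns to transform by name.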
Use Sample
To make the most of your data transformation projects, use sample pipelines as a starting point. They're available in your Azure Machine Learning designer workspace and can be saved as your own.
These sample pipelines can be found under the New pipeline section, and you can select Show more samples for a complete list of options.
You can sign in to ml.azure.com and select the workspace you want to work with, then select Designer to access the sample pipelines.
To run a pipeline, you first need to set a default compute target. A run might take some time, depending on the sample pipeline and compute settings.
The default compute settings have a minimum node size of 0, so after the compute has been idle the designer must allocate nodes again before the job can start.
Training and Evaluation
To train a model in Azure Machine Learning Designer, you'll need to add training modules to your pipeline. This typically involves splitting your data into training and validation sets, training a model using the training set, and then evaluating its performance using the validation set.
You can use the Split Data module to split your data into training and validation sets. This module takes the original dataset as input and produces two outputs: a Results dataset1 containing the training data and a Results dataset2 containing the validation data.
The Train Model module is used to train the model using the training data. You can configure this module to use a classification algorithm, such as Two-Class Logistic Regression, to predict the Diabetic value.
To evaluate the model's performance, you can use the Evaluate Model module. This module takes the scored dataset as input and produces evaluation results. For a classification model such as the diabetes example, the results include accuracy, precision, recall, and AUC. For a regression model such as the automobile price example, they include Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Coefficient of Determination.
Here are the key metrics to look out for when evaluating a regression model's performance:
- Mean Absolute Error (MAE): The average of absolute errors.
- Root Mean Squared Error (RMSE): The square root of the average of squared errors.
- Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
- Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
- Coefficient of Determination: A statistical metric that indicates how well a model fits the data.
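The definitions above translate directly into arithmetic. The following sketch computes all five metrics from a list of actual and predicted values using their standard textbook formulas; the numbers are made up for illustration, and the `regression_metrics` helper is hypothetical rather than part of the designer.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE, and R^2 from paired value lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Deviations of the actual values from their own mean (the baseline).
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_errors) / sq_dev
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_dev,
        "rse": rse,
        "r2": 1 - rse,  # coefficient of determination
    }


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For the error metrics, smaller is better; for the coefficient of determination, values closer to 1 indicate a better fit.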
Train The Model
In the automobile price example, you train a model by giving it a dataset that includes the price. The algorithm constructs a model that explains the relationship between the features and the price as presented by the training data.
You can use a Linear Regression component to train the model. This component is used to predict a continuous output variable based on one or more input features.
To use the Linear Regression component, drag it to the pipeline canvas. Then, connect the output of the Linear Regression component to the left input of the Train Model component.
The Train Model component is used to train the model using the dataset. Make sure to connect the training data output of the Split Data component to the right input of the Train Model component.
To specify the value that your model is going to predict, select the Train Model component and click on the arrow icon under Settings to open the component details pane. Then, select Edit column and enter the column name exactly, in this case, "price".
Here is a step-by-step guide to training the model:
1. Drag the Linear Regression component to the pipeline canvas.
2. Connect the output of the Linear Regression component to the left input of the Train Model component.
3. Connect the training data output of the Split Data component to the right input of the Train Model component.
4. Select the Train Model component and click on the arrow icon under Settings to open the component details pane.
5. Select Edit column and enter the column name exactly, in this case, "price".
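To give a feel for what "training" means here, the sketch below fits a straight line to one feature with ordinary least squares and uses it to predict a price. It is a deliberately simplified, single-feature illustration; the data is invented, the `fit_line` helper is hypothetical, and nothing here claims to match the Linear Regression component's actual solver or settings.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented training data: one feature (horsepower) and the label (price).
horsepower = [100.0, 150.0, 200.0]
price = [10000.0, 15000.0, 20000.0]

slope, intercept = fit_line(horsepower, price)
predicted_price = slope * 120.0 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction={predicted_price:.1f}")
```

In the pipeline, the label column you enter in Train Model ("price") plays the role of `ys`, and the remaining columns play the role of the features.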
Add Evaluate Model Component
To add an Evaluate Model component to your pipeline, start by returning to the Designer and opening the pipeline you created. Then, search for and place an Evaluate Model module to the canvas under the Score Model module, connecting the output of the Score Model module to the Scored dataset input of the Evaluate Model module.
Check that the output of the Score Model module is connected to the Evaluate Model module. Then select Configure & Submit, and run the pipeline using the existing experiment named mslearn-diabetes-training.
Wait for the experiment run to finish, and then check the status of the job by selecting Jobs under the Assets. From there, select the mslearn-diabetes-training experiment and then select the latest Diabetes Training job.
On the new tab, right-click the Evaluate Model module on the canvas, select Preview data, and then select Evaluation results to view the performance metrics. These metrics help data scientists assess how well the model predicts based on the validation data.
The metrics to review include the Recall metric, which becomes 1 if you move the threshold slider all the way to the left (0), and becomes 0 if you move it all the way to the right (1). Also, look at the ROC curve and AUC metric listed below the Threshold slider.
Here's a summary of the metrics to review:
- Recall metric
- ROC curve
- AUC metric
These metrics can help you understand how well your model performs compared to a random guess. The point of the exercise is to introduce you to classification and the Azure Machine Learning designer interface, not to train a perfect model.
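The behavior of recall at the two ends of the threshold slider follows from its definition, true positives divided by all actual positives. The sketch below demonstrates it with invented scores and labels; the `recall_at` helper is hypothetical, and the rule "predict positive when score >= threshold" is an assumption made for this illustration.

```python
def recall_at(threshold, scores, labels):
    """Recall = true positives / (true positives + false negatives)."""
    predicted = [1 if score >= threshold else 0 for score in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    false_neg = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    return true_pos / (true_pos + false_neg)


# Invented predicted probabilities and true labels (1 = diabetic).
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

for threshold in (0.0, 0.5, 1.0):
    print(threshold, recall_at(threshold, scores, labels))
```

At a threshold of 0 every case is predicted positive, so no actual positive is missed; at a threshold of 1 nothing is predicted positive, so every actual positive is missed.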
Model Deployment
To deploy a service, simply select Deploy at the top of the job window. This is the first step in making your model available for use.
You'll then be taken to the Set up real-time endpoint page, where you'll select Deploy new real-time endpoint and use the provided settings. Be patient, as this process can take several minutes.
Once deployed, your web service will be live and ready for use.
Deploy a Service
To deploy a service, you need to follow a few simple steps. Select Deploy at the top of the job window, such as the Predict Diabetes job window.
In the Set up real-time endpoint section, choose Deploy new real-time endpoint and fill in the endpoint settings for your deployment. Confirming them initiates the deployment process.
Select Deploy and wait for the web service to be deployed - this can take several minutes. You'll know it's done when you see the deployment complete notification.
Here's a quick summary of the deployment process:
- Select Deploy at the top of the job window.
- Choose Deploy new real-time endpoint and fill in the endpoint settings.
- Select Deploy and wait for the web service to be deployed.
Inference
Inference is the step in the model deployment process where the trained model is used to make predictions on new, unseen data.
To test the service, you can use the predict-diabetes real-time endpoint in the Endpoints page. Simply open the predict-diabetes endpoint, select the Test tab, and delete the current data under Input data to test real-time endpoint. Then, copy and paste the provided data into the data section.
The JSON data defines features for a patient, and uses the predict-diabetes service to predict a diabetes diagnosis. When you select Test, you should see the output 'DiabetesPrediction' on the right-hand side of the screen. This output is 1 if the patient is predicted to have diabetes, and 0 if they are predicted not to have diabetes.
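As a rough illustration of what such a request and response might look like, the sketch below builds a JSON body and reads a prediction out of a sample reply. Everything specific here is an assumption: the feature names and values, the `Inputs`/`input1` request envelope, and the `Results` response shape are hypothetical and may differ from your endpoint, so check the endpoint's Test or Consume tab for the real format. No network call is made.

```python
import json

# Hypothetical features for one patient; use the columns your model was trained on.
patient = {
    "PatientID": 1882185,
    "Pregnancies": 9,
    "PlasmaGlucose": 104,
    "BMI": 43.5,
    "Age": 43,
}

# Assumed request envelope, not confirmed by this article.
payload = {"Inputs": {"input1": [patient]}, "GlobalParameters": {}}
body = json.dumps(payload)

# Hypothetical response text, standing in for what the service might return.
sample_response = (
    '{"Results": {"WebServiceOutput0": '
    '[{"PatientID": 1882185, "DiabetesPrediction": 1}]}}'
)


def read_prediction(response_text):
    """Return DiabetesPrediction from the first row of the first output."""
    results = json.loads(response_text)["Results"]
    first_output = next(iter(results.values()))
    return first_output[0]["DiabetesPrediction"]


prediction = read_prediction(sample_response)
print("diabetic" if prediction == 1 else "not diabetic")
```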
You can also use the Score Model component to score the other 30 percent of the data to see how well your model functions. To do this, you'll need to connect the output of the Train Model component to the left input port of Score Model, and the test data output (right port) of the Split Data component to the right input port of Score Model.
By evaluating your model's performance with the metrics described under Training and Evaluation, you can get a better understanding of its strengths and weaknesses, and make adjustments as needed to improve its accuracy.
Submit
To submit a pipeline, you need to click on the "Configure & Submit" button in the top right corner.
This will open a step-by-step wizard that guides you through the submission process. It has four steps: Basics, Inputs & Outputs, Runtime settings, and Review + Submit.
In the Basics step, you can configure your experiment, including the job display name and description.
The Inputs & Outputs step is where you can assign values to Inputs/Outputs that have been promoted to pipeline level. In this example, it will be empty because no Inputs/Outputs have been promoted.
In Runtime settings, you can configure the default datastore and default compute for the pipeline. This setting will apply to all components in the pipeline unless overridden at the component level.
The Review + Submit step is the final step, where you review all your settings before submitting the pipeline job. The wizard will remember your last configuration if you submit the pipeline again.
After submitting the pipeline job, you'll see a message with a link to the job detail. You can click on this link to review the job details.
Here is a summary of the submission process:
- Click on "Configure & Submit" in the top right corner.
- Follow the step-by-step wizard through the Basics, Inputs & Outputs, and Runtime settings steps.
- Review and submit the pipeline job in the Review + Submit step.
- View the job details after submission.
Frequently Asked Questions
What is a designer in Azure machine learning?
Azure Machine Learning's Designer is a drag-and-drop interface for building machine learning pipelines in workspaces. It supports two types of components: classic prebuilt components (v1) and custom components (v2).