To keep your Azure Machine Learning workspace running smoothly, establish best practices and governance early. This means setting clear roles and permissions so that only authorized users can access and modify your workspace.
Having a single owner for your workspace is recommended to maintain control and accountability. This owner should be responsible for creating and managing users, as well as monitoring and managing workspace resources.
A well-organized workspace with clear naming conventions is crucial for efficient collaboration and knowledge sharing. This includes using consistent naming conventions for experiments, datasets, and models to avoid confusion.
Regularly reviewing and updating your workspace configuration is also vital to ensure it remains aligned with your organization's needs and goals. This includes checking for any unused resources and removing them to avoid unnecessary costs.
Configuration and Management
In an Azure Machine Learning workspace, you can save the workspace's Azure Resource Manager (ARM) properties to a config file using the write_config method of the Workspace class (v1 azureml-core SDK). Because the saved properties can later be loaded with the from_config method, this is a simple way to reuse the same workspace across multiple projects or notebooks.
The write_config method takes two parameters: path and file_name. The path parameter specifies the location where the config file will be saved, and it defaults to '.azureml/' in the current working directory. The file_name parameter specifies the name of the config file, and it defaults to 'config.json'.
If you want to save your workspace ARM properties to a specific location, you can provide the path parameter. For example, if you want to save the config file in a directory called 'my_config', you can pass the path '/home/user/my_config' to the write_config method.
You can also customize the name of the config file by passing a string to the file_name parameter. For example, if you want to save the config file with the name 'my_workspace_config', you can pass the string 'my_workspace_config' to the file_name parameter.
The default values for the path and file_name parameters are:
- path: '.azureml/' in the current working directory
- file_name: 'config.json'
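The defaults described above can be illustrated with a short sketch. It assumes the v1 azureml-core package, where write_config is a method on Workspace. The helper expected_config_path is hypothetical: it only mirrors the documented defaults so you can see where the file should land, and the SDK's own handling of a custom path may differ in detail.

```python
import os
from typing import Optional

# Documented defaults of Workspace.write_config (azureml-core, SDK v1)
DEFAULT_DIR = ".azureml"
DEFAULT_FILE_NAME = "config.json"


def expected_config_path(path: Optional[str] = None, file_name: Optional[str] = None) -> str:
    """Hypothetical helper: resolve a config path from the documented defaults.

    Illustration only; it does not call the SDK.
    """
    directory = path if path is not None else os.path.join(os.getcwd(), DEFAULT_DIR)
    return os.path.join(directory, file_name or DEFAULT_FILE_NAME)


def save_workspace_config(
    name: str,
    subscription_id: str,
    resource_group: str,
    path: Optional[str] = None,
    file_name: Optional[str] = None,
) -> None:
    """Fetch an existing workspace and write its ARM properties to a config file."""
    from azureml.core import Workspace  # requires the azureml-core package

    ws = Workspace.get(name=name, subscription_id=subscription_id, resource_group=resource_group)
    # None for path/file_name lets the SDK apply its own defaults
    ws.write_config(path=path, file_name=file_name)
```

For example, `save_workspace_config("my-ws", "<subscription-id>", "my-rg", path="/home/user/my_config", file_name="my_workspace_config")` uses placeholder values for the customized case described earlier.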
Security and Access
An Azure Machine Learning workspace provides several layers of protection for your data, covering storage access, encryption, and credential management.
You can choose between credential-based access and identity-based access when connecting to the default storage account. For identity-based authentication, the workspace managed identity must be granted the Storage Blob Data Contributor role on the storage account.
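As a sketch of the identity-based setup, the helpers below build the Azure CLI command that grants that role, scoped to the storage account. They assume you already know the principal (object) ID of the workspace managed identity; the function names and placeholder values are hypothetical, and you would run the printed command yourself (or make the same assignment in the Azure portal).

```python
import shlex
from typing import List

ROLE_NAME = "Storage Blob Data Contributor"


def storage_account_id(subscription_id: str, resource_group: str, account_name: str) -> str:
    """ARM resource ID of a storage account, used here as the role-assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> List[str]:
    """Build an `az role assignment create` call that grants ROLE_NAME to the
    workspace managed identity, identified by its principal (object) ID."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",  # managed identities are service principals
        "--role", ROLE_NAME,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own IDs and names.
    scope = storage_account_id("<subscription-id>", "my-rg", "mystorageaccount")
    print(shlex.join(role_assignment_command("<workspace-identity-principal-id>", scope)))
```

Scoping the assignment to the single storage account, rather than the resource group, keeps the identity's access as narrow as the scenario requires.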
Data is also encrypted: by default, all data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.
Azure Machine Learning uses TLS to secure internal communication between various microservices, and all Azure Storage access occurs over a secure channel.
To further secure your data, Azure Machine Learning uses the Azure Key Vault instance associated with the workspace to store credentials of various kinds, including storage account connection strings and passwords to Azure Container Registry instances.
If your workspace will hold sensitive data, you can also flag it as high business impact (HBI) when you create it. This reduces the amount of diagnostic data Microsoft collects and enables additional encryption steps. The flag can only be set at creation time and can't be changed afterward.
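With the v2 Python SDK (azure-ai-ml), this flag is the hbi_workspace parameter of the Workspace entity. The sketch below assumes an MLClient already scoped to your subscription and resource group; the function names and placeholder values are hypothetical.

```python
def define_hbi_workspace(name: str, location: str, resource_group: str):
    """Return a Workspace definition flagged as high business impact (HBI).

    Requires the azure-ai-ml package. Nothing is created until the definition
    is passed to MLClient.workspaces.begin_create.
    """
    from azure.ai.ml.entities import Workspace

    return Workspace(
        name=name,
        location=location,
        resource_group=resource_group,
        hbi_workspace=True,  # can only be chosen at creation time
    )


def create_hbi_workspace(ml_client, name: str, location: str, resource_group: str):
    """Create the HBI workspace and block until provisioning finishes."""
    workspace = define_hbi_workspace(name, location, resource_group)
    return ml_client.workspaces.begin_create(workspace).result()
```

Separating the definition from the creation call makes it easy to review the settings before anything is provisioned.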
Networking and Endpoints
You can configure your Azure Machine Learning workspace to have a private endpoint, which limits access to your workspace to an Azure Virtual Network you created. This requires an existing virtual network.
To create a private endpoint, you'll need to select Private with Internet Outbound or Private with Approved Outbound in the workspace settings. This will allow you to configure the network and outbound rules.
You can also use the Azure portal or the SDK to retrieve the Azure resource IDs of existing resources, such as a resource group, storage account, or key vault. Passing these IDs when you create a workspace lets it reuse those resources instead of creating new ones.
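A minimal sketch of reusing existing resources with azure-ai-ml (v2): build the ARM resource IDs and pass them to the Workspace entity's storage_account and key_vault parameters. The helper names and resource names are hypothetical, and the sketch assumes an authenticated MLClient scoped to the target resource group.

```python
from typing import Dict


def arm_resource_id(subscription_id: str, resource_group: str, provider: str, resource_type: str, name: str) -> str:
    """Build an Azure Resource Manager resource ID."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/{provider}/{resource_type}/{name}"
    )


def existing_resources(subscription_id: str, resource_group: str, storage_account: str, key_vault: str) -> Dict[str, str]:
    """Map Workspace parameters to the IDs of resources that already exist."""
    return {
        "storage_account": arm_resource_id(
            subscription_id, resource_group, "Microsoft.Storage", "storageAccounts", storage_account
        ),
        "key_vault": arm_resource_id(
            subscription_id, resource_group, "Microsoft.KeyVault", "vaults", key_vault
        ),
    }


def create_workspace_reusing(ml_client, name: str, location: str, resource_ids: Dict[str, str]):
    """Create a workspace that attaches existing resources (requires azure-ai-ml)."""
    from azure.ai.ml.entities import Workspace

    workspace = Workspace(name=name, location=location, **resource_ids)
    # begin_create returns a poller; result() waits for provisioning to finish
    return ml_client.workspaces.begin_create(workspace).result()
```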
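Here's a summary of the network configuration options:
- Public endpoint (default): the workspace is accessible on the public internet.
- Private with Internet Outbound: access to the workspace is limited to an Azure Virtual Network you created; you configure the network and outbound rules in the Workspace Outbound access section.
- Private with Approved Outbound: access is limited to your virtual network, and you add outbound rules to the required set.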
By using a private endpoint, you can improve the security and isolation of your workspace.
Networking
Networking is a crucial aspect of setting up a workspace. You can choose from three network configurations: Public endpoint, Private with Internet Outbound, and Private with Approved Outbound.
To use a private endpoint, you'll need to create a virtual network first. The default network configuration uses a Public endpoint, which is accessible on the public internet.
You can select Private with Internet Outbound or Private with Approved Outbound to limit access to your workspace to an Azure Virtual Network you created. Either option requires you to add a private endpoint and configure its network settings.
You can also use the studio to create a workspace with internet connectivity disabled by specifying a hub workspace that has public network access disabled. Private hubs are marked with a lock icon.
Here are the steps to create a private endpoint:
- Add a private endpoint to your workspace
- Set the location, name, and virtual network to use
- Integrate with a private DNS zone if needed
- Configure the Workspace Outbound access section if using Private with Internet Outbound
- Add more rules to the required set if using Private with Approved Outbound
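In the v2 Python SDK (azure-ai-ml), these portal choices roughly correspond to the workspace's public_network_access setting plus a managed network isolation mode. The mapping below is an assumption, not an official table, and the sketch does not create the private endpoint itself, which is a separate resource. Function names and rule names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed mapping from the portal's network options to azure-ai-ml settings.
NETWORK_OPTIONS: Dict[str, Dict[str, str]] = {
    "Public endpoint": {"public_network_access": "Enabled", "isolation_mode": "disabled"},
    "Private with Internet Outbound": {
        "public_network_access": "Disabled",
        "isolation_mode": "allow_internet_outbound",
    },
    "Private with Approved Outbound": {
        "public_network_access": "Disabled",
        "isolation_mode": "allow_only_approved_outbound",
    },
}


def network_settings(option: str) -> Dict[str, str]:
    """Look up the settings for a portal option name."""
    if option not in NETWORK_OPTIONS:
        raise ValueError(f"Unknown network option: {option!r}")
    return dict(NETWORK_OPTIONS[option])


def define_network_workspace(
    name: str, location: str, option: str, approved_fqdns: Optional[List[str]] = None
):
    """Return a Workspace definition for the chosen network option.

    `approved_fqdns` become outbound FQDN rules and only make sense with
    "Private with Approved Outbound".
    """
    settings = network_settings(option)
    if approved_fqdns and settings["isolation_mode"] != "allow_only_approved_outbound":
        raise ValueError("Outbound rules require 'Private with Approved Outbound'")

    from azure.ai.ml.entities import FqdnDestination, ManagedNetwork, Workspace

    rules = [
        FqdnDestination(name=f"allow-fqdn-{i}", destination=fqdn)
        for i, fqdn in enumerate(approved_fqdns or [])
    ]
    network = ManagedNetwork(isolation_mode=settings["isolation_mode"], outbound_rules=rules or None)
    return Workspace(
        name=name,
        location=location,
        public_network_access=settings["public_network_access"],
        managed_network=network,
    )
```

Validating the option before importing the SDK means a typo in the option name fails fast, even on a machine without azure-ai-ml installed.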
Tools for Interaction
You can interact with your Azure Machine Learning workspace in several ways, including using the Azure Machine Learning SDK in any Python environment.
The Azure Machine Learning CLI extension v2 allows you to interact with your workspace on the command line.
You can also use the Azure Machine Learning CLI extension v1 to interact with your workspace on the command line, although this is an older version.
Here are the ways you can interact with your workspace:
- Azure Machine Learning studio
- Azure Machine Learning SDK for Python
- Azure Machine Learning CLI extension (v2, or the older v1)
- Visual Studio Code
- Azure portal
To create a workspace, you can use the Python SDK, the Azure CLI, VS Code, the studio, or the Azure portal.
Common workspace management tasks include creating a workspace, managing workspace access, creating and managing compute resources, and creating a compute instance. Not every task is available in every interface.
You can use the Azure Machine Learning Studio, Python SDK, or CLI to create a compute instance.
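For the Python SDK route, a minimal sketch with azure-ai-ml (v2). It assumes an authenticated MLClient for the workspace; the function name, instance name, and VM size are placeholders.

```python
def create_compute_instance(ml_client, name: str, size: str = "Standard_DS3_v2"):
    """Create (or update) a compute instance with azure-ai-ml (SDK v2)."""
    from azure.ai.ml.entities import ComputeInstance

    instance = ComputeInstance(name=name, size=size)
    # begin_create_or_update returns a poller; result() waits for provisioning
    return ml_client.compute.begin_create_or_update(instance).result()
```

Usage, assuming `ml_client` is already connected: `create_compute_instance(ml_client, "my-compute-instance")`.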
Because these tools all work with the same workspace, you can use whichever one you prefer for a given task.
Authentication and Credentials
You can connect to an Azure Machine Learning (AML) workspace by providing raw credentials as strings to the Workspace constructor, but this approach is not recommended as it stores your workspace information in code and makes it harder to use the same code to affect multiple workspaces.
Azure Machine Learning uses the Azure Key Vault instance associated with the workspace to store credentials, including the associated storage account connection string, passwords to Azure Container Registry instances, and connection strings to data stores.
You can also use a configuration file to authenticate with your workspace, or use parameters to create an MLClient object from scratch.
Azure Machine Learning uses a user-assigned managed identity that has access to all keys, secrets, and certificates in the key vault, which makes it easier to manage credentials.
To add or update a connection under the workspace (for example, with the set_connection method of the v1 Workspace class), you provide the connection's name, category, target, and authorization type, plus a value that holds the connection details as a JSON string.
Here are some common connection types you can use with Azure Machine Learning:
- container registry
- storage account
- key vault
- application insights
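A sketch of that call with the v1 azureml-core SDK, where Workspace.set_connection takes name, category, target, authType, and value. The helper names and every example value (category, auth type, detail keys) are hypothetical placeholders; check the SDK reference for the values your connection type accepts, and load secrets from a secure source rather than hard-coding them.

```python
import json
from typing import Any, Dict


def connection_args(
    name: str, category: str, target: str, auth_type: str, details: Dict[str, Any]
) -> Dict[str, str]:
    """Assemble keyword arguments for Workspace.set_connection.

    The connection details are serialized to a JSON string for `value`.
    """
    return {
        "name": name,
        "category": category,
        "target": target,
        "authType": auth_type,
        "value": json.dumps(details),
    }


def add_or_update_connection(ws, args: Dict[str, str]) -> None:
    """`ws` is an azureml.core.Workspace; set_connection adds or updates the connection."""
    ws.set_connection(**args)
```

Usage with placeholder values: `add_or_update_connection(Workspace.from_config(), connection_args("my-connection", "<category>", "<target-url>", "<auth-type>", {"<key>": "<value>"}))`.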
Credential Encryption
Credential encryption is a vital aspect of Azure Machine Learning, and it's handled through the Azure Key Vault instance associated with the workspace. This instance stores credentials of various kinds, including the associated storage account connection string, passwords to Azure Container Registry instances, and connection strings to data stores.
The workspace shares a user-assigned managed identity that has the same name as the workspace, and this managed identity has access to all keys, secrets, and certificates in the key vault. Keeping credentials in Key Vault, rather than in code or configuration files, means they can only be read by identities that have been granted access.
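To see which Key Vault holds a workspace's credentials, you can read the vault's resource ID from the workspace definition. A sketch with azure-ai-ml (v2), assuming an authenticated MLClient; key_vault_name is a hypothetical helper, and reading the secrets in the vault still requires access permissions on it.

```python
def key_vault_name(resource_id: str) -> str:
    """Extract the vault name from a Key Vault ARM resource ID
    (.../providers/Microsoft.KeyVault/vaults/<name>)."""
    parts = resource_id.rstrip("/").split("/")
    for i, part in enumerate(parts[:-1]):
        if part.lower() == "vaults":
            return parts[i + 1]
    raise ValueError(f"Not a Key Vault resource ID: {resource_id!r}")


def workspace_key_vault(ml_client, workspace_name: str) -> str:
    """Return the name of the Key Vault associated with a workspace."""
    workspace = ml_client.workspaces.get(workspace_name)
    return key_vault_name(workspace.key_vault)  # key_vault holds the ARM resource ID
```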
Connecting with Credentials
Connecting with Credentials is a crucial aspect of authentication in Azure Machine Learning. You can manually connect to a workspace by providing the necessary information as strings to the Workspace constructor, but this approach is not recommended as it stores your workspace information in code that's likely tracked in version control.
Azure Machine Learning uses the Azure Key Vault instance associated with the workspace to store credentials of various kinds, including storage account connection strings, passwords to Azure Container Registry instances, and connection strings to data stores.
You can create an MLClient object from parameters, or with a configuration file, to specify the connection to your workspace. The MLClient object requires a credential, subscription_id, resource_group_name, and workspace_name to connect to the workspace.
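A minimal sketch of the parameter route with azure-ai-ml (v2) and azure-identity. Reading the values from environment variables keeps them out of code; the variable names and helper names are hypothetical, and DefaultAzureCredential is one of several credential types you could pass.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names; use whatever your team standardizes on.
ENV_VARS = {
    "subscription_id": "AZUREML_SUBSCRIPTION_ID",
    "resource_group_name": "AZUREML_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the MLClient workspace parameters, failing clearly if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {param: env[var] for param, var in ENV_VARS.items()}


def connect_from_params(env: Optional[Mapping[str, str]] = None):
    """Create an MLClient from explicit parameters (requires azure-ai-ml, azure-identity)."""
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(credential=DefaultAzureCredential(), **workspace_params(env))
```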
The second way to connect to a workspace is by using a config.json file to represent your workspace. This file lives in the same directory as your Python code and can be downloaded from the Azure Portal.
The config.json file contains the three values needed to identify the workspace:
- subscription_id
- resource_group
- workspace_name
You can then connect to the workspace in Python by passing a credential to MLClient.from_config, which reads these values from the file.
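A sketch of the config-file route with azure-ai-ml (v2) and azure-identity. The helper read_workspace_config is hypothetical: it checks for the three keys before connecting so a malformed file fails with a clear message. By default from_config starts searching in the current directory; here the path is passed explicitly.

```python
import json
from pathlib import Path
from typing import Dict, Union

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path: Union[str, Path] = "config.json") -> Dict[str, str]:
    """Load config.json and check that the workspace keys are present."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config


def connect_from_config(path: Union[str, Path] = "config.json"):
    """Create an MLClient from config.json (requires azure-ai-ml, azure-identity)."""
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    read_workspace_config(path)  # fail early with a clear message
    return MLClient.from_config(credential=DefaultAzureCredential(), path=str(path))
```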
Azure subscription IDs can be considered sensitive information, so if you store your code in a publicly visible repository, add config.json to your .gitignore file rather than tracking it in version control.
Sync Keys
Sync Keys is a feature that allows you to update keys for resources in your workspace immediately. This is particularly useful if you need to access storage after regenerating storage keys.
If keys for any resource in the workspace are changed, it can take around an hour for them to automatically be updated. This is why Sync Keys is a valuable tool to have.
You can use the `sync_keys` method of the v1 Workspace class to update keys on request. It takes an optional argument called `no_wait`, which determines whether to wait for the workspace key sync to complete.
The `no_wait` argument is a boolean value that defaults to `False`. If you set `no_wait` to `True`, the function will not wait for the workspace sync keys to complete, allowing you to continue with other tasks while the sync process runs in the background.
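Here's a summary of the `sync_keys` method and its `no_wait` argument:
- `sync_keys(no_wait=False)` triggers the workspace to synchronize keys immediately instead of waiting for the automatic update.
- `no_wait` (bool, default `False`): when `True`, the method returns without waiting for the sync to complete.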
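A minimal sketch with the v1 azureml-core SDK, for example right after regenerating storage account keys. The wrapper name is hypothetical; it only translates a `wait` flag into `no_wait`. The v2 SDK offers a comparable `begin_sync_keys` operation on `MLClient.workspaces`.

```python
def sync_workspace_keys(ws, wait: bool = True) -> None:
    """Ask the workspace to resynchronize resource keys now.

    `ws` is an azureml.core.Workspace. With wait=True this blocks until the
    sync completes (no_wait=False, the SDK default); with wait=False it
    returns immediately and the sync continues in the background.
    """
    ws.sync_keys(no_wait=not wait)


if __name__ == "__main__":
    from azureml.core import Workspace  # requires the azureml-core package

    workspace = Workspace.from_config()  # loads the workspace from a saved config file
    sync_workspace_keys(workspace)
```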
Frequently Asked Questions
What is workspace in Azure machine learning?
In Azure Machine Learning, a workspace is a collaborative environment where you can create, manage, and organize machine learning artifacts with your team.