Azure Batch Essentials for Cloud Computing

Azure Batch is a powerful cloud computing service that allows you to run large-scale batch jobs on a scalable and on-demand infrastructure.

You can use Azure Batch to run jobs that require significant processing power, memory, or storage, making it ideal for tasks such as data processing, scientific simulations, and machine learning model training.

Azure Batch provides a managed environment for running jobs: the service provisions and manages the underlying compute nodes, so you can focus on your application and business logic.

Getting Started

To get started with Azure Batch, you'll first need to create a Batch account. This is a straightforward process that can be completed in a few steps.

You must have a Batch account to create pools and jobs, so it's essential to get this set up first. A storage account is not required for this process, but it's useful for deploying applications and storing input and output data.

To create a Batch account, navigate to the Azure portal and choose to create a resource. Type "batch service" in the search box and select Batch Service. Select Create to begin the process.

You'll then need to select Create New under Resource Group and name your resource group. Next, enter a name for the Batch account. The account name must be unique within the Azure region you selected, can contain only lowercase letters and numbers, and must be between 3 and 24 characters long.
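
The naming rule above (lowercase letters and digits only, 3 to 24 characters) is the one Azure applies to Batch account names, and it can be checked locally before you submit the form. The following is a minimal sketch; the function name is hypothetical, and it cannot check whether the name is already taken in your region.

```python
import re

# Lowercase letters and digits only, 3 to 24 characters in total.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_batch_account_name(name: str) -> bool:
    """Return True if `name` satisfies the character and length rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_batch_account_name("mybatchaccount"))
print(is_valid_batch_account_name("My-Batch"))
```

Uniqueness is only verified by Azure when the account is created, so a name that passes this check can still be rejected.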

Compute Environment

There are two ways to create an Azure Batch compute environment in Seqera Platform: Batch Forge and Manual. Batch Forge automatically creates Azure Batch resources, while Manual is for using existing Azure Batch resources.

You can create a Batch pool with a specific Azure virtual machine configuration and operating system, such as Windows Server or Linux. The operating system and version you choose will depend on the type of compute nodes you want to run in your pool.

When creating a pool, you can specify two types of nodes: dedicated nodes and Spot nodes.

Resource Group

Creating a resource group is a crucial step in setting up your compute environment. You can create a resource group in your preferred region.

Azure Batch and Azure Storage accounts must belong to a resource group. You can create the group ahead of time, or create it while creating the Azure Storage account or Azure Batch account.

Resource groups are a way to organize and manage related resources in Azure. They help you keep your resources organized and make it easier to scale and manage them.

You can create a resource group in the Azure portal, or using Azure CLI or PowerShell commands.
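
For the Azure CLI route, the command involved is `az group create`. The sketch below only builds the argument list so that it can be reviewed or passed to `subprocess.run` later; the group name and region are placeholder values, not ones taken from this article.

```python
import shlex
from typing import List


def resource_group_create_command(name: str, location: str) -> List[str]:
    """Build the argument list for `az group create` without running it."""
    return ["az", "group", "create", "--name", name, "--location", location]


# Placeholder name and region, chosen only for illustration.
cmd = resource_group_create_command("seqera-batch-rg", "eastus")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if the name ever contains unusual characters.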

Dedicated and Spot Nodes

You can choose from two types of nodes when creating a pool: Dedicated nodes and Spot nodes. Dedicated nodes are reserved for your workloads and are more expensive, but they are guaranteed to never be preempted.

Spot nodes, on the other hand, take advantage of surplus capacity in Azure and are less expensive per hour than dedicated nodes. However, they may be preempted when Azure has insufficient surplus capacity, so it's essential to consider this when deciding which type of node to use.

You can have both Spot and dedicated compute nodes in the same pool, each with its own target setting for the desired number of nodes.
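
As a rough illustration of the two target settings, the sketch below assembles the part of a pool definition that carries them. The property names follow the Batch REST API as I understand it (`targetDedicatedNodes` and `targetLowPriorityNodes`, the latter covering Spot nodes); the helper name, pool ID, and counts are hypothetical, and the fragment is not a complete request body.

```python
import json


def pool_node_targets(pool_id: str, vm_size: str, dedicated: int, spot: int) -> dict:
    """Return a pool fragment with separate targets for each node type."""
    if dedicated < 0 or spot < 0:
        raise ValueError("node targets must be zero or greater")
    return {
        "id": pool_id,
        "vmSize": vm_size,
        "targetDedicatedNodes": dedicated,   # never preempted
        "targetLowPriorityNodes": spot,      # cheaper, may be preempted
    }


# Example values only: a small dedicated core plus a larger Spot tier.
fragment = pool_node_targets("example-pool", "Standard_DS2_v2", dedicated=2, spot=8)
print(json.dumps(fragment, indent=2))
```

A common pattern is to keep a few dedicated nodes so work always progresses, and let Spot nodes absorb the bulk of the load when capacity is available.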

Configuration

Azure Batch lets you scale compute resources on demand for large-scale batch processing, and most of its configuration concerns the pool that does the work.

To configure Azure Batch, you'll need to create a pool of virtual machines, which can be either Windows or Linux-based. The pool size and type will depend on your specific needs.

You can choose from a wide range of virtual machine sizes, for example Standard_DS2_v2 through Standard_DS14_v2 in the DSv2 series, each with its own set of characteristics. The Standard_DS2_v2, for example, has 2 vCPUs and 7 GiB of memory.

The configuration process also involves specifying the number of nodes in the pool. The upper bound is set by your Batch account's quotas rather than by a single fixed number, so check the quotas on your account before planning a large pool. You can also specify the node size and type, including the operating system and software requirements.

Virtual Machine Configuration

Virtual Machine Configuration is a crucial step in setting up your Azure Batch pool. You'll specify that the pool is composed of Azure virtual machines, which can be created from either Linux or Windows images.

To create a pool, you'll need to choose the size of the nodes and the source of the images used to create them. You'll also need to specify the virtual machine image reference and the Batch node agent SKU to be installed on the nodes.

The Batch node agent is a program that runs on each node in the pool, providing the command-and-control interface between the node and the Batch service. There are different implementations of the node agent, known as SKUs, for different operating systems.

You can optionally attach one or more empty data disks to pool VMs created from Marketplace images. This can be useful for storing data that needs to be accessed by the nodes in the pool.

To use data disks, you'll need to mount and format them from within a VM. This requires some technical knowledge, but it's a great way to expand the storage capacity of your nodes.
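
The pieces described in this section (image reference, node agent SKU, optional data disks) can be pictured as one nested structure. The sketch below builds it as a plain dictionary using property names from the Batch REST API as I understand them; the image and node agent values are examples that should be checked against the images your Batch account currently supports, and the helper name is hypothetical.

```python
import json
from typing import Sequence


def vm_configuration(publisher: str, offer: str, sku: str,
                     node_agent_sku_id: str,
                     data_disk_sizes_gb: Sequence[int] = ()) -> dict:
    """Describe the VM image, node agent, and any empty data disks for a pool."""
    config = {
        "imageReference": {
            "publisher": publisher,
            "offer": offer,
            "sku": sku,
            "version": "latest",
        },
        # The node agent SKU must match the operating system of the image.
        "nodeAgentSKUId": node_agent_sku_id,
    }
    if data_disk_sizes_gb:
        # Each disk gets its own logical unit number (LUN), starting at 0.
        config["dataDisks"] = [
            {"lun": lun, "diskSizeGB": size}
            for lun, size in enumerate(data_disk_sizes_gb)
        ]
    return config


# Example values only; verify them before use.
example = vm_configuration(
    "canonical", "0001-com-ubuntu-server-focal", "20_04-lts",
    "batch.node.ubuntu 20.04", data_disk_sizes_gb=[128],
)
print(json.dumps(example, indent=2))
```

The disks attached this way arrive empty, so a start task or setup script still has to format and mount them inside the VM.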

VNet and Firewall Configuration

You can associate a pool of compute nodes in Batch with a subnet of an Azure virtual network (VNet). To use an Azure VNet, the Batch client API must use Microsoft Entra authentication.

Azure Batch support for Microsoft Entra ID is documented in Authenticate Batch service solutions with Active Directory.

To set up a Batch pool in a VNet, see Create a pool of virtual machines with your virtual network.
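
When a pool is attached to a VNet, the subnet is identified by its full Azure resource ID. The sketch below assembles that ID and places it in a `networkConfiguration` fragment with a `subnetId` property, which is the shape I understand the Batch REST API to expect; the subscription ID, resource group, and network names are placeholders.

```python
def subnet_resource_id(subscription_id: str, resource_group: str,
                       vnet_name: str, subnet_name: str) -> str:
    """Build the Azure resource ID of a subnet inside a virtual network."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
        f"/subnets/{subnet_name}"
    )


# Placeholder identifiers, for illustration only.
network_configuration = {
    "subnetId": subnet_resource_id(
        "00000000-0000-0000-0000-000000000000",
        "example-rg", "example-vnet", "batch-subnet",
    )
}
print(network_configuration["subnetId"])
```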

Pools and Nodes

Pools and nodes are the building blocks of Azure Batch, and understanding how they work together is crucial for designing an efficient and scalable solution.

A pool is a collection of nodes, which are Azure virtual machines or cloud service VMs that process a portion of your application's workload. You can create pools of Windows or Linux nodes by using Azure Cloud Services, images from the Azure Virtual Machines Marketplace, or custom images.

Each node has its own unique characteristics, including the number of CPU cores, memory capacity, and local file system size. You can specify the type of node you want, such as dedicated or Spot nodes, and the target number for each.

Here are the two types of nodes:

  • Dedicated nodes: reserved for your workloads, more expensive, and guaranteed to never be preempted.
  • Spot nodes: take advantage of surplus capacity, less expensive, and may be preempted when Azure has insufficient surplus capacity.

By creating a pool, you can scale the number of nodes up or down according to the job load, either reactively or proactively. This allows you to maximize utilization and ensure that your tasks are executed efficiently.

Nodes

A node is an Azure virtual machine (VM) or cloud service VM dedicated to processing a portion of your application's workload. The size of a node determines the number of CPU cores, memory capacity, and local file system size allocated to the node.

You can create pools of Windows or Linux nodes by using Azure Cloud Services, images from the Azure Virtual Machines Marketplace, or custom images you prepare.

Nodes can run any executable or script supported by the operating system environment of the node, including *.exe, *.cmd, *.bat, and PowerShell scripts (for Windows) and binaries, shell, and Python scripts (for Linux).

All compute nodes in Batch include a standard folder structure and associated environment variables, firewall settings, and remote access to both Windows (RDP) and Linux (SSH) nodes.
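
A task script can read the environment variables mentioned above to find out where it is running. The sketch below looks up a handful of the variables that Batch sets on compute nodes (the `AZ_BATCH_*` names are the ones I am aware of; the list is not exhaustive) and falls back to a marker when run elsewhere, for example on a laptop.

```python
import os
from typing import Mapping, Optional

# A small, non-exhaustive selection of variables set on Batch compute nodes.
BATCH_ENV_VARS = (
    "AZ_BATCH_POOL_ID",
    "AZ_BATCH_JOB_ID",
    "AZ_BATCH_TASK_ID",
    "AZ_BATCH_TASK_WORKING_DIR",
    "AZ_BATCH_NODE_SHARED_DIR",
)


def describe_batch_environment(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return the selected Batch variables, or '<not set>' when missing."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, "<not set>") for name in BATCH_ENV_VARS}


for name, value in describe_batch_environment().items():
    print(f"{name}={value}")
```

Passing a dictionary instead of relying on `os.environ` makes the function easy to exercise outside a Batch node.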

By default, nodes can communicate with each other, but not with virtual machines outside the same pool. To allow secure communication with other virtual machines or an on-premises network, you can provision the pool in a subnet of an Azure virtual network (VNet).

When creating a pool, you can specify two types of nodes, dedicated and Spot. Each type has its own target setting, for which you can specify the desired number of nodes.

Custom VM Pool Images

Custom VM images let you standardize the software installed on every node in a pool.

You can create a pool from a custom image by using the Azure Compute Gallery.

To learn more, see the Azure Batch documentation on creating a pool with the Azure Compute Gallery.

Security and Identity

Security with certificates is a vital aspect of Azure Batch, allowing you to encrypt sensitive information for tasks. You can install certificates on nodes, which are then used to decrypt encrypted secrets passed to tasks via command-line parameters or embedded in task resources.

To add a certificate to a Batch account, you can use the Add certificate operation or the CertificateOperations.CreateCertificate method. This will allow you to associate the certificate with a new or existing pool.

If you add a certificate to an existing pool, you must reboot its compute nodes for the certificate to be applied.

Managed identity is another secure way to authenticate to Azure services using Nextflow. To use this method, you need to create a user-assigned managed identity in Azure and associate it with the Azure Batch Pool.

Here are the steps to follow:

  1. Create a user-assigned managed identity in Azure.
  2. Assign the necessary access roles to the managed identity.
  3. Associate the managed identity with the Azure Batch Pool.
  4. Set up the Platform compute environment with the managed identity client ID.

Access keys and Entra service principals are the two types of Azure credentials available. Access keys are simple to use but have limitations, such as being long-lived and providing full access to Azure Storage and Azure Batch accounts.

Entra service principals, on the other hand, enable role-based access control with more precise permissions and map to a many-to-many relationship with Azure Batch and Azure Storage accounts.

Accounts

In Azure, each service has its own account, such as Azure Storage or Azure Batch. These accounts are used to house various resources like blob containers, file shares, queues, and tables.

A single Azure subscription can have multiple Azure Storage and Azure Batch accounts. However, a compute environment on the Seqera Platform can only use one of each, so you'll need to choose the right accounts for your needs.

You can create multiple compute environments on the platform with different credentials, storage accounts, and Batch accounts. This allows for flexibility and scalability in managing your resources.

For this setup, create the resource group and storage account before the Batch account, because the Batch account must be created in the same region as the storage account.

Here are the key steps to create a Batch account:

  1. Log in to the Azure portal and choose to create a Batch account.
  2. Select the existing resource group or create a new one.
  3. Enter a name for the Batch account, such as seqeracomputebatch.
  4. Choose the preferred region, which must be the same as the Storage account.
  5. Select Advanced, then choose Batch service for Pool allocation mode and Shared Key for Authentication mode.
  6. Select Networking and ensure sufficient access for the platform and any additional required resources.
  7. Add any necessary tags to the Batch account.
  8. Review and create the Batch account.
  9. Go to your new Batch account, then select Access Keys and store them for use with your Seqera compute environment.
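
Alongside the access keys, client tools usually need the account endpoint. As far as I know it follows the pattern `https://<account>.<region>.batch.azure.com`, which the sketch below reproduces; the region is a placeholder, and the authoritative value is the one shown on the Batch account's overview page.

```python
def batch_account_endpoint(account_name: str, region: str) -> str:
    """Build the expected service endpoint for a Batch account."""
    return f"https://{account_name}.{region}.batch.azure.com"


# Account name from the steps above; the region is an assumed example.
endpoint = batch_account_endpoint("seqeracomputebatch", "eastus")
print(endpoint)
```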

Security with Certificates

Security with Certificates is crucial for protecting sensitive information. You typically need to use certificates for tasks like encrypting or decrypting Azure Storage account keys.

Certificates can be installed on nodes to support encryption and decryption. You can install certificates using the Add certificate operation or the CertificateOperations.CreateCertificate method.

The Batch service installs certificates on each node in a pool when a certificate is associated with a pool. The service installs the certificates before launching any tasks, including the start task and job manager task.

If you add a certificate to an existing pool, you must reboot its compute nodes for the certificate to be applied. This ensures the certificate is installed on the nodes before they start running tasks.
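
Certificates in a Batch account are identified by their thumbprint. The sketch below shows how such a thumbprint could be derived for a DER-encoded .cer file, and wraps it in a dictionary whose property names follow the REST "Add certificate" operation as I understand it. It is an illustration under those assumptions: for a PFX file the thumbprint belongs to the certificate inside the file, which this helper does not extract.

```python
import base64
import hashlib


def certificate_request_body(der_bytes: bytes) -> dict:
    """Describe a DER-encoded certificate by SHA-1 thumbprint and base64 data."""
    return {
        "thumbprintAlgorithm": "sha1",
        # The thumbprint is the SHA-1 digest of the DER-encoded certificate.
        "thumbprint": hashlib.sha1(der_bytes).hexdigest(),
        "certificateFormat": "cer",
        "data": base64.b64encode(der_bytes).decode("ascii"),
    }


# Stand-in bytes; a real call would read the .cer file in binary mode.
body = certificate_request_body(b"abc")
print(body["thumbprint"])
```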

Managed Identity

Managed identity is a more secure way to authenticate to Azure services, but it requires running on Azure infrastructure.

To use managed identity with Nextflow, you need to create a user-assigned managed identity in Azure. This involves recording the Client ID of the managed identity.

The user-assigned managed identity must have the necessary access roles for Nextflow to work properly. The required role assignments are listed in the platform documentation.

To associate the user-assigned managed identity with the Azure Batch Pool, you need to follow the steps outlined in the documentation. This will allow Nextflow to authenticate using the managed identity associated with the Azure Batch node it runs on.

Here's a step-by-step guide to setting up managed identity:

  1. Create a user-assigned managed identity in Azure.
  2. Record the Client ID of the managed identity.
  3. Associate the user-assigned managed identity with the Azure Batch Pool.
  4. When setting up the Platform compute environment, select the Azure Batch pool by name and enter the managed identity client ID.
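
The client ID recorded in step 2 matters because code running on an Azure VM typically asks the Azure Instance Metadata Service for a token on behalf of a specific identity. The sketch below only constructs such a request to show where the client ID ends up; it describes generic Azure behaviour rather than Nextflow's own implementation, the client ID is a placeholder, and nothing is sent over the network.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Link-local metadata endpoint, reachable only from inside an Azure VM.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def managed_identity_token_request(client_id: str,
                                   resource: str) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers of a token request for one managed identity."""
    query = urlencode({
        "api-version": "2018-02-01",
        "resource": resource,      # the service the token is meant for
        "client_id": client_id,    # selects the user-assigned identity
    })
    return f"{IMDS_TOKEN_ENDPOINT}?{query}", {"Metadata": "true"}


url, headers = managed_identity_token_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder client ID
    "https://storage.azure.com/",
)
print(url)
```

Because the endpoint exists only on Azure infrastructure, this style of authentication cannot be used from a machine outside Azure.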

Managed identity offers enhanced security compared to access keys, but it's still necessary to use access keys or an Entra service principal to submit the initial task to Azure Batch.

Credentials

Credentials play a crucial role in securing your Azure resources. Access keys are simple to use but have several limitations, including being long-lived and providing full access to Azure Storage and Batch accounts.

Access keys are a single point of failure, with Azure allowing only two keys per account. Entra service principals, on the other hand, offer more precise permissions and a many-to-many relationship with Azure Batch and Storage accounts.

Service principals are only supported in manual compute environments, while Batch Forge compute environments must use access keys for authentication. If you're using access keys, be sure to store them securely and delete any temporary copies after adding them to a credential in your Platform workspace.

Understanding how access keys and Entra service principals differ in lifetime, scope of access, and compute environment support helps you decide which option is best for your Azure resources.

Scheduling and Scaling

Automatic scaling adjusts the number of nodes in a pool to match your workload and resource usage, which suits workloads whose demand changes over time.

You can write an automatic scaling formula and associate it with a pool, and the Batch service will use the formula to determine the target number of nodes for the next scaling interval.

A scaling formula can be based on time metrics, resource metrics, or task metrics, such as CPU usage, bandwidth usage, memory usage, or task state.

Here are some key metrics to consider when creating a scaling formula:

  • Time metrics: based on statistics collected every five minutes in the specified number of hours.
  • Resource metrics: CPU usage, bandwidth usage, memory usage, and number of nodes.
  • Task metrics: task state, such as Active (queued), Running, or Completed.

To handle tasks running at the time of a decrease operation, you can include a node deallocation option in your formula, such as stopping and requeuing tasks or allowing them to finish before removing the node.
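
To make the idea of a scaling formula concrete, the sketch below pairs a formula based on pending tasks with a plain Python function that mirrors its arithmetic. The formula is modelled on a pattern commonly shown for Batch autoscaling; the Python mirror is an approximation for reasoning about the numbers locally (it truncates to a whole node count, which is my simplification), not a reimplementation of the service.

```python
from typing import Sequence

# Formula text as it might be attached to a pool (pending-task based).
AUTOSCALE_FORMULA = """\
startingNumberOfVMs = 1;
maxNumberofVMs = 25;
pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
$TargetDedicatedNodes = min(maxNumberofVMs, pendingTaskSamples);
$NodeDeallocationOption = taskcompletion;
"""


def target_dedicated_nodes(pending_samples: Sequence[float], sample_percent: float,
                           starting: int = 1, maximum: int = 25) -> int:
    """Mirror the formula: average pending tasks, capped at `maximum`."""
    if sample_percent < 70 or not pending_samples:
        # Too few samples to trust the average, so fall back to the floor.
        pending = float(starting)
    else:
        pending = sum(pending_samples) / len(pending_samples)
    return int(min(maximum, pending))


print(target_dedicated_nodes([10, 20, 30], sample_percent=100))
```

The sample-percentage guard is the important design choice: when metrics are sparse, the pool stays at a known small size instead of reacting to unreliable data.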

Task Scheduling Policy

The task scheduling policy is a crucial aspect of efficient resource utilization. The max tasks per node configuration option determines the maximum number of tasks that can be run in parallel on each compute node within the pool.

By default, only one task runs on a node at a time, but there are scenarios where running two or more tasks simultaneously can be beneficial. You can specify a fill type to determine how Batch assigns tasks to nodes.

Batch can spread tasks evenly across all nodes in a pool or pack each node with the maximum number of tasks before assigning tasks to another node.
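
The difference between the two fill types is easier to see with numbers. The following is a simplified model, not the actual Batch scheduler: it places tasks on nodes either round-robin ("spread") or by filling one node completely before moving on ("pack"), and returns how many tasks each node ends up running. The function and parameter names are hypothetical.

```python
from typing import List


def assign_tasks(task_count: int, node_count: int,
                 slots_per_node: int, fill_type: str) -> List[int]:
    """Return the number of tasks placed on each node under a fill type."""
    if fill_type not in ("spread", "pack"):
        raise ValueError("fill_type must be 'spread' or 'pack'")
    # Tasks beyond the pool's total capacity stay queued in this model.
    placeable = min(task_count, node_count * slots_per_node)
    nodes = [0] * node_count
    for i in range(placeable):
        if fill_type == "pack":
            index = i // slots_per_node   # fill node 0, then node 1, ...
        else:
            index = i % node_count        # rotate across all nodes
        nodes[index] += 1
    return nodes


print(assign_tasks(5, node_count=4, slots_per_node=4, fill_type="pack"))
print(assign_tasks(5, node_count=4, slots_per_node=4, fill_type="spread"))
```

Packing leaves whole nodes idle, which helps when scaling down, while spreading gives each task more of a node's resources.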

Automatic Scaling Policy

An automatic scaling policy suits workloads whose demand changes over time. You can apply an automatic scaling policy to a pool, allowing the Batch service to periodically evaluate a formula and dynamically adjust the number of nodes within the pool based on the current workload and resource usage.

This approach helps lower the overall cost of running your application by using only the resources you need, and releasing those you don't need. You can specify the automatic scaling settings for a pool when you create it, or enable scaling on a pool later.

A scaling formula can be based on time metrics, resource metrics, or task metrics. Time metrics are based on statistics collected every five minutes in the specified number of hours. Resource metrics include CPU usage, bandwidth usage, memory usage, and number of nodes. Task metrics are based on task state, such as Active (queued), Running, or Completed.

To accommodate tasks that are running at the time of the decrease operation, Batch provides a node deallocation option that you can include in your formulas. You can specify that running tasks are stopped immediately and then requeued for execution on another node, or allowed to finish before the node is removed from the pool.

Here are the node deallocation options:

  • Requeue (requeue): running tasks are stopped immediately and then requeued for execution on another node
  • Terminate (terminate): running tasks are stopped immediately and are not run again
  • Task completion (taskcompletion): running tasks are allowed to finish before the node is removed from the pool
  • Retained data (retaineddata): running tasks are allowed to finish, and the node is removed only after the retention period for task data has expired

To maximize compute resource utilization, set the target number of nodes to zero at the end of a job, but allow running tasks to finish.
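
The end-of-job advice above can be expressed as a short formula. The sketch below generates one that sets both node targets to zero and picks a deallocation option; the identifiers in the dictionary are the option names as I understand they are written inside formulas, and the helper itself is hypothetical.

```python
# Option identifiers as written in autoscale formulas, with a short meaning.
NODE_DEALLOCATION_OPTIONS = {
    "requeue": "stop running tasks and requeue them on another node",
    "terminate": "stop running tasks without running them again",
    "taskcompletion": "let running tasks finish, then remove the node",
    "retaineddata": "let tasks finish and wait for task data retention to expire",
}


def scale_to_zero_formula(option: str = "taskcompletion") -> str:
    """Return a formula that drains the pool using the given option."""
    if option not in NODE_DEALLOCATION_OPTIONS:
        raise ValueError(f"unknown node deallocation option: {option}")
    return (
        "$TargetDedicatedNodes = 0;\n"
        "$TargetLowPriorityNodes = 0;\n"
        f"$NodeDeallocationOption = {option};"
    )


print(scale_to_zero_formula())
```

Using `taskcompletion` here matches the advice to let running tasks finish while still releasing the nodes as soon as they are idle.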

Start Tasks

Start tasks are a crucial part of Azure Batch, allowing you to execute tasks on each node as it joins the pool or is restarted.

You can add a start task to prepare compute nodes for task execution, such as installing applications that your tasks run on.

Start tasks are especially useful for workloads such as simulations, rendering, and data processing, where every node needs the same software installed before it can pick up tasks in parallel.

To add a start task, you can specify the application to deploy when nodes join the pool, and the Batch service will take care of the rest.

Here are some key things to consider when adding a start task:

  • Specify the application to deploy when nodes join the pool.
  • Ensure the application is installed correctly on the compute nodes.
  • Test the start task to ensure it executes correctly.

By adding a start task, you can streamline your workflow and ensure that your compute nodes are properly prepared for task execution.
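
A start task is ultimately a small piece of pool configuration. The sketch below builds one for a Linux pool as a dictionary, using property names from the Batch REST API as I understand them (`commandLine`, `waitForSuccess`, `maxTaskRetryCount`, `userIdentity`). The install command is only an example, and the explicit `/bin/bash -c` wrapper reflects the fact that Batch command lines do not run under a shell by default.

```python
import shlex


def start_task(command: str, wait_for_success: bool = True,
               max_retries: int = 1) -> dict:
    """Describe a start task that runs `command` as admin on each Linux node."""
    return {
        # Wrap the command in a shell so operators such as && are interpreted.
        "commandLine": f"/bin/bash -c {shlex.quote(command)}",
        # Hold back regular tasks until the node has been prepared.
        "waitForSuccess": wait_for_success,
        "maxTaskRetryCount": max_retries,
        "userIdentity": {
            "autoUser": {"scope": "pool", "elevationLevel": "admin"}
        },
    }


# Example command only; replace it with whatever your tasks depend on.
task = start_task("apt-get update && apt-get install -y samtools")
print(task["commandLine"])
```

Setting `waitForSuccess` means a node that fails its preparation step does not receive tasks, which makes installation problems visible early.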

Frequently Asked Questions

What is Azure Batch?

Azure Batch is a cloud-based service that efficiently runs large-scale, compute-intensive applications, allowing you to scale resources on demand without managing infrastructure. It's ideal for complex tasks that require significant processing power.

What is the difference between Azure Functions and Azure Batch?

Azure Batch is designed for large-scale job scheduling and compute management, whereas Azure Functions focuses on executing event-driven serverless code. If you need to process big jobs, use Azure Batch; for real-time, event-driven tasks, Azure Functions is the way to go.

What is the difference between Azure Batch and VM?

Azure Batch is a job scheduling and cluster management service that runs applications in parallel at scale across a pool of VMs, whereas a VM (Virtual Machine) is a single computing resource that you manage yourself. For large-scale workloads, Azure Batch can also reduce costs by running tasks on lower-priced Spot or low-priority VMs.

Is Azure Batch deprecated?

The Azure Batch service itself is not deprecated. Individual API versions do age out: Azure Batch REST APIs are supported for two years after release, after which they enter a one-year deprecation period. During this time, notifications are sent to users so they can plan for updates or alternative solutions.

Jeannie Larson

Senior Assigning Editor

Jeannie Larson is a seasoned Assigning Editor with a keen eye for compelling content. With a passion for storytelling, she has curated articles on a wide range of topics, from technology to lifestyle. Jeannie's expertise lies in assigning and editing articles that resonate with diverse audiences.
