Azure Windows Cluster Setup and Management


Setting up an Azure Windows Cluster is a straightforward process that can be completed in a few steps. To begin, you'll need to create a new virtual network in the Azure portal, which will serve as the foundation for your cluster.

Clustering on Azure Windows VMs is built on Windows Server Failover Clustering (WSFC), the clustering feature included in supported Windows Server releases such as Windows Server 2019. WSFC is the recommended choice for most scenarios, as it provides high availability and scalability.

To create a WSFC cluster, you'll need to deploy at least two virtual machines (VMs) with a minimum of 4 vCPUs and 16 GB of RAM each. This is because each VM will need to be a node in the cluster, and they must meet the minimum hardware requirements.

The Azure portal provides an easy-to-use interface for creating and managing clusters. You can use the portal to create a new cluster, add or remove nodes, and configure cluster settings such as the cluster name and IP address.
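Alongside the portal, the same first step can be scripted. Here is a minimal Az PowerShell sketch of creating the virtual network that will host the cluster nodes; the resource group, region, names, and address ranges are placeholders, not values from this article:

```powershell
# Create a resource group and the virtual network for the cluster nodes.
# All names, the region, and the address prefixes below are hypothetical.
$rg = "wsfc-rg"
New-AzResourceGroup -Name $rg -Location "eastus2"

$subnet = New-AzVirtualNetworkSubnetConfig -Name "cluster-subnet" `
    -AddressPrefix "10.0.0.0/24"
New-AzVirtualNetwork -Name "wsfc-vnet" -ResourceGroupName $rg `
    -Location "eastus2" -AddressPrefix "10.0.0.0/16" -Subnet $subnet
```

The cluster VMs are then deployed into this subnet, and everything above can equally be done through the portal as described.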

Health Monitoring


Health monitoring is a crucial aspect of maintaining a high-availability Azure Windows cluster. The cluster service monitors the health of the cluster based on system and network parameters to detect and respond to failures.

Setting the threshold for declaring a failure is a delicate balance between prompt response and avoiding false failures. A relaxed monitoring strategy is recommended for failover clusters on Azure VMs to avoid premature failures and longer outages.

The cluster service offers two monitoring settings: Aggressive and Relaxed. Aggressive settings provide rapid failure detection and recovery of hard failures, but may lead to premature failures and longer outages.

Relaxed settings, on the other hand, provide more forgiving failure detection with a greater tolerance for brief transient network issues. This approach avoids transient failures but may delay the detection of a true failure.

To adjust threshold settings, refer to cluster best practices for more detail.
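As a hedged sketch, the monitoring thresholds can be inspected and relaxed from an elevated PowerShell session on a cluster node. The threshold values shown are commonly cited relaxed settings for Azure VMs, but treat them as illustrative and confirm against current Microsoft guidance:

```powershell
# Inspect the current heartbeat delay and threshold parameters.
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, `
    CrossSubnetDelay, CrossSubnetThreshold

# Relax the thresholds so brief transient network issues are tolerated.
# These numbers are illustrative, not prescriptive.
(Get-Cluster).SameSubnetThreshold  = 20
(Get-Cluster).CrossSubnetThreshold = 20
```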


Configuration

To configure a Windows Server Failover Cluster (WSFC) on Azure VMware Solution vSAN, you'll need to ensure that an Active Directory environment is available. This is crucial for the cluster to function properly.



The configuration process involves creating virtual machines (VMs) on the vSAN datastore and powering them on to configure the hostname and IP addresses, join them to an Active Directory domain, and install the latest available OS updates.

You'll also need to install the latest VMware Tools and enable and configure the Windows Server Failover Cluster feature on each VM. Additionally, you'll need to configure a Cluster Witness for quorum, which can be a file share witness.
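The per-VM preparation steps above can be sketched in PowerShell run inside each guest. The computer name, domain, and credentials are placeholders:

```powershell
# Set the hostname (the VM restarts afterward).
Rename-Computer -NewName "wsfc-node1" -Restart

# After the restart, join the Active Directory domain.
Add-Computer -DomainName "corp.contoso.com" -Credential (Get-Credential) -Restart

# Install the Failover Clustering feature and its management tools.
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
```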

Here are the shared disk configuration parameters, summarized from the steps later in this guide:

  • SCSI controller: VMware Paravirtual (PVSCSI), up to four per VM
  • Disk sharing: Unspecified (default)
  • Disk mode: Independent - Persistent
  • Disk SCSI IDs: identical on every WSFC node

VM Configuration Requirements

To create a well-configured virtual machine, start with the operating system it will run. For WSFC nodes this means a supported Windows Server release; Azure also offers Linux images for other workloads.

The amount of RAM required will depend on the OS you choose, with Windows typically needing at least 2GB and Linux requiring a minimum of 512MB.

Adequate storage is also essential, with a minimum of 16GB recommended for most OS installations.


WSFC Node Boot Configuration


To configure a WSFC node for booting, you'll need to ensure the SCSI controller type is set to LSI Logic SAS. This is a specific requirement for WSFC nodes.

You'll also need to configure the disk mode to virtual, and disable SCSI bus sharing. This is crucial for ensuring the boot device is properly configured.

To modify advanced settings for a virtual SCSI controller hosting the boot device, you'll need to add specific settings to each WSFC node. These settings include scsiX.returnNoConnectDuringAPD = "TRUE" and scsiX.returnBusyOnNoConnectStatus = "FALSE", where X is the boot device SCSI bus controller ID number. By default, X is set to 0.

Here's a summary of the required settings:

  • SCSI controller type: LSI Logic SAS
  • Disk mode: virtual, with SCSI bus sharing disabled
  • scsiX.returnNoConnectDuringAPD = "TRUE"
  • scsiX.returnBusyOnNoConnectStatus = "FALSE"

In each scsiX entry, X is the boot device's SCSI bus controller ID (0 by default).

These settings are essential for ensuring the boot device is properly configured and the WSFC node can boot successfully.

Configure WSFC with Shared Disks

To configure a Windows Server Failover Cluster (WSFC) with shared disks, you'll need to create virtual machines on the vSAN datastore and power them on. Ensure that an Active Directory environment is available.


Power on all VMs, configure the hostname and IP addresses, join all VMs to an Active Directory domain, and install the latest available OS updates. Install the latest VMware Tools.

To enable and configure the Windows Server Failover Cluster feature on each VM, add one or more VMware Paravirtual SCSI (PVSCSI) controllers (up to four) to each VM that is part of the WSFC, using the settings from the previous paragraphs.

On the first cluster node, add all needed shared disks using Add New Device > Hard Disk. Leave Disk sharing as Unspecified (default) and Disk mode as Independent - Persistent, then attach the disks to the controller(s) created in the previous steps.

To recap, the key shared disk settings are:

  • Disk sharing: Unspecified (default)
  • Disk mode: Independent - Persistent
  • Controller: the PVSCSI controller(s) added in the previous step

Continue with the remaining WSFC nodes. Add the disks created in the previous step by selecting Add New Device > Existing Hard Disk. Be sure to maintain the same disk SCSI IDs on all WSFC nodes.

Power on the first WSFC node, sign in, and open the Disk Management console (diskmgmt.msc). Verify that the added shared disks are visible to the OS, initialize them, then format the disks and assign drive letters.
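The disk-preparation step above can also be done in PowerShell on the first node. This is a hedged sketch; the drive letter and volume label are placeholders, and it deliberately targets only uninitialized (RAW) disks:

```powershell
# Initialize, partition, and format all not-yet-initialized disks.
Get-Disk | Where-Object PartitionStyle -eq 'RAW' |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter S -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "ClusterData"
```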

Resource Limits


Resource limits can cause VM performance issues if workloads demand more resources than the purchased Azure VM size provides.

Intensive SQL Server I/O can hit the VM's or disk's IOPS or MB/s throughput limits, leaving SQL Server unable to respond to an IsAlive/LooksAlive check in time.

Resource bottlenecks can make the node or resource appear down to the cluster or SQL Server, resulting in a failed health check.

When the VM or disk reaches its limits, performance degrades and maintenance operations such as backups can also be impacted.

Monitoring the server for disk or VM-level capping is crucial if your SQL Server is experiencing unexpected failovers.
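Azure platform metrics are the authoritative place to see VM- or disk-level capping, but as an in-guest spot check you can sample standard Windows performance counters. A hedged sketch:

```powershell
# Sample disk throughput and queue depth every 5 seconds for one minute.
# Sustained high queue length alongside flat transfer rates can hint at
# the disk being capped at its provisioned limits.
Get-Counter -Counter `
    '\LogicalDisk(_Total)\Disk Transfers/sec', `
    '\LogicalDisk(_Total)\Avg. Disk Queue Length' `
    -SampleInterval 5 -MaxSamples 12
```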


Network

To match the on-premises experience, deploy your SQL Server VMs to multiple subnets within the same virtual network, negating the need for an Azure Load Balancer to route traffic to your HADR solution.

In a traditional on-premises environment, clustered resources rely on the Virtual Network Name to route traffic to the appropriate target. This virtual name binds the IP address in DNS, allowing clients to use either the virtual name or the IP address to connect to their high availability target.



Deploying across multiple subnets is what lets you avoid the extra dependency on an Azure Load Balancer; if your cluster sits in a single subnet, you must still configure a load balancer to route traffic from clients to the Virtual Network Name.

For that single-subnet case, use a Public Load Balancer if your clients connect over the public internet; for clients residing in the same vNet, create an Internal Load Balancer. The load balancer redirects traffic to the active cluster node, making it possible for clients to connect to the cluster IP address.
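An internal Standard load balancer for a single-subnet deployment can be sketched with Az PowerShell as follows; the names and the front-end IP address are placeholders, and a health probe and load-balancing rule still need to be added afterward:

```powershell
# Create an internal Standard load balancer in the cluster's subnet.
$vnet = Get-AzVirtualNetwork -Name "wsfc-vnet" -ResourceGroupName "wsfc-rg"

$fe = New-AzLoadBalancerFrontendIpConfig -Name "wsfc-fe" `
    -PrivateIpAddress "10.0.0.10" -SubnetId $vnet.Subnets[0].Id
$pool = New-AzLoadBalancerBackendAddressPoolConfig -Name "wsfc-pool"

New-AzLoadBalancer -Name "wsfc-ilb" -ResourceGroupName "wsfc-rg" `
    -Location "eastus2" -Sku "Standard" `
    -FrontendIpConfiguration $fe -BackendAddressPool $pool
```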

Virtual Network Name (VNN)

In a traditional on-premises environment, clustered resources rely on the Virtual Network Name to route traffic to the appropriate target.

The Virtual Network Name is a network name and address managed by the cluster, which moves the network address from node to node during a failover event.

On Azure Virtual Machines in a single subnet, an additional component is necessary to route traffic from the client to the Virtual Network Name of the clustered resource.


This component is a load balancer, which holds the IP address for the VNN and is necessary to route traffic to the appropriate high availability target.

The load balancer detects failures with the networking components and moves the address to a new host, but it also introduces a slight failover delay due to health probes conducting alive checks every 10 seconds by default.

The load balancer distributes inbound flows to the instances defined by the back-end pool, which are the Azure virtual machines running SQL Server for FCI, or the Azure virtual machines that can become the primary replica for the listener in availability groups.
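On the cluster side, the load balancer's health probe is wired to the clustered IP address resource. A hedged example, with the resource name, front-end IP, and probe port as placeholders:

```powershell
# Point the cluster's IP address resource at the load balancer probe.
$ipResource = Get-ClusterResource -Name "Cluster IP Address"
$ipResource | Set-ClusterParameter -Multiple @{
    "Address"    = "10.0.0.10"         # load balancer front-end IP
    "ProbePort"  = 59999               # must match the LB health probe port
    "SubnetMask" = "255.255.255.255"
    "EnableDhcp" = 0
}
```

Take the IP address resource offline and online again (or fail over) for the parameters to take effect.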

Having multiple subnets negates the need for the extra dependency on an Azure Load Balancer to route traffic to your HADR solution.

The Virtual Network Name can be cumbersome to configure and is an additional source of failure: it can delay failure detection, and it adds overhead and cost for managing the extra resource.

Static IP Address


Each cluster node should use a static IP address; an address that changes can break connectivity between nodes and clients.

Set this up as soon as each VM is provisioned by changing the network settings to use a static IP. A consistent address for every node is essential for a stable cluster network.
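In Azure, the static assignment is made on the VM's network interface rather than inside the guest. A hedged Az PowerShell sketch, with the NIC and resource group names as placeholders:

```powershell
# Pin the node's private IP to static allocation on its Azure NIC.
$nic = Get-AzNetworkInterface -Name "wsfc-node1-nic" -ResourceGroupName "wsfc-rg"
$nic.IpConfigurations[0].PrivateIpAllocationMethod = "Static"
$nic | Set-AzNetworkInterface
```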

Storage

Storage is a crucial aspect of an Azure Windows cluster, and there are some specific best practices to keep in mind. You'll want to consult Performance best practices for SQL Server in Azure Virtual Machines for more information.

To ensure optimal storage performance, you'll need to add at least one additional Managed Disk to each of your cluster nodes.

DataKeeper can utilize Basic Disk, Premium Storage, or even multiple disks striped together in a local Storage Space.

Creating a local Storage Space requires careful planning, as you should create the Storage Space BEFORE you do any cluster configuration due to a known issue with Failover Clustering and local Storage Spaces.

All disks should be formatted NTFS.
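If you choose the Storage Spaces route, the pool should be built from the extra managed disks BEFORE any cluster configuration, as noted above. A hedged sketch with placeholder pool and disk names:

```powershell
# Pool all poolable physical disks into a local Storage Space,
# then initialize, partition, and format the resulting virtual disk as NTFS.
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "DataPool" `
    -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $disks
New-VirtualDisk -StoragePoolFriendlyName "DataPool" -FriendlyName "DataDisk" `
    -ResiliencySettingName Simple -UseMaximumSize

Get-VirtualDisk -FriendlyName "DataDisk" | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS
```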

High Availability


SIOS DataKeeper Cluster Edition is the first Azure certified high availability and disaster recovery solution in the Azure Marketplace. It provides efficient data replication and seamless integration into Windows Server Failover Clustering environments for high availability clusters without the need for costly shared storage.

To achieve high availability, SIOS DataKeeper synchronizes local storage using highly efficient block-level replication, making it appear to WSFC as traditional shared storage. This allows IT teams to continue using familiar WSFC in the cloud without the cost and complexity of a SAN or other shared storage.

In a cloud environment, SIOS DataKeeper provides a simple way to use Windows Server Failover Clustering – including SQL Server Always On Failover Clustering – to protect business-critical applications.

High Availability Guide

Microsoft Azure clustering ensures high availability protection for critical applications running in Azure environments by eliminating single points of failure.

As noted above, SIOS DataKeeper Cluster Edition is an Azure certified high availability and disaster recovery solution, providing efficient data replication and seamless integration into Windows Server Failover Clustering environments.


To build a 2-node File Server Failover Cluster Instance in Azure, you'll need to provision two new virtual machines which will act as the two nodes in your cluster.

SIOS DataKeeper Cluster Edition synchronizes local storage using highly efficient block-level replication, making it appear to WSFC as traditional shared storage, without the need for costly shared storage.

SIOS LifeKeeper for Linux provides a comprehensive range of Linux clustering, protecting applications in SUSE Linux, Red Hat Linux, Oracle Linux, and Rocky Linux environments.

For customers running critical Windows applications in Windows Server Failover Clustering (WSFC) environments, SIOS DataKeeper Cluster Edition provides a simple way to use familiar WSFC in the cloud without the cost and complexity of a SAN or other shared storage.

SIOS LifeKeeper clustering software is part of the SIOS Protection Suite, a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies.

Here are some key features of SIOS DataKeeper Cluster Edition:

  • Provides efficient data replication and seamless integration into Windows Server Failover Clustering environments
  • Synchronizes local storage using highly efficient block-level replication
  • Appears to WSFC as traditional shared storage, without the need for costly shared storage
  • Supports SAP, SQL Server, and Oracle applications
  • Replicates the cluster to a geographically separated location using Azure Site Recovery

SIOS DataKeeper Cluster Edition is the only high availability solution certified for use with Microsoft Azure Site Recovery, providing cost-efficient high availability and disaster recovery protection for business-critical applications in Azure.

Cluster Heartbeat


Cluster Heartbeat is a critical component of high availability, ensuring that nodes in a cluster can communicate with each other and detect issues promptly. The frequency at which cluster heartbeats are sent between nodes is defined by the Delay setting, which is the number of seconds before the next heartbeat is sent.

The Delay setting can be configured differently within the same cluster, depending on whether nodes are on the same subnet or different subnets. This flexibility allows for more tailored performance in various network environments.

The Threshold setting determines the number of heartbeats that can be missed before the cluster takes recovery action. This setting can also be configured differently within the same cluster, depending on the subnet configuration.

Default values for these settings may be too low for cloud environments, leading to unnecessary failures due to transient network issues. To mitigate this, consider using relaxed Threshold settings for failover clusters in Azure VMs.

Here's a summary of the primary Cluster Heartbeat settings:

  • SameSubnetDelay – interval between heartbeats sent to nodes on the same subnet
  • SameSubnetThreshold – heartbeats that can be missed on the same subnet before recovery action is taken
  • CrossSubnetDelay – interval between heartbeats sent to nodes on different subnets
  • CrossSubnetThreshold – heartbeats that can be missed across subnets before recovery action is taken
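Since the approximate failure-detection window is Delay multiplied by Threshold, you can compute it directly from the live cluster settings. A hedged sketch (these cluster properties store the delay in milliseconds):

```powershell
# Estimate how long a failure can go undetected under current settings.
$c = Get-Cluster
"Same-subnet detection window:  {0} ms" -f ($c.SameSubnetDelay  * $c.SameSubnetThreshold)
"Cross-subnet detection window: {0} ms" -f ($c.CrossSubnetDelay * $c.CrossSubnetThreshold)
```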

Quorum


Quorum is a crucial component of high availability in Azure clustering. It eliminates single points of failure by ensuring that critical applications remain online even if one node fails.

A two-node cluster can function without a quorum resource, but one is required for production support, and cluster validation will not pass a cluster that lacks one.

A three-node cluster can survive a single node loss without a quorum resource, but there's a risk of a split-brain scenario if a node is lost or there's a communication failure between the nodes.

To prevent this, you can configure a quorum resource, which will allow the cluster resources to remain online with only one node online. This is especially important in Azure environments, where high availability is critical.

There are three types of quorum resources: cloud witness, disk witness, and file share witness. Each has its own advantages and disadvantages.
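For clusters in Azure, the cloud witness is typically the simplest of the three to set up. A hedged example; the storage account name is a placeholder and the access key is elided:

```powershell
# Configure a cloud witness backed by an Azure Storage account.
Set-ClusterQuorum -CloudWitness `
    -AccountName "mystorageaccount" `
    -AccessKey "<storage-account-access-key>"
```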


Here's a comparison of the three types of quorum resources:

  • Cloud witness – an Azure Blob Storage account; needs no extra VM or shared disk, and is the usual choice for clusters running in Azure
  • Disk witness – a small clustered disk holding a copy of the cluster database; requires shared storage, which is often impractical in the cloud
  • File share witness – an SMB file share on a separate server; needs no shared disk, but does not keep a copy of the cluster database and adds a share to maintain

In summary, quorum is a critical component of high availability in Azure clustering, and there are three types of quorum resources to choose from. By understanding the advantages and disadvantages of each, you can make an informed decision about which one is best for your environment.

Rosemary Boyer

Writer
