
To create an Azure SQL Failover Group, you define a primary and a secondary: a primary and a secondary logical server for Azure SQL Database, or a primary and a secondary managed instance for SQL Managed Instance. The databases on the primary are replicated to the secondary.
You can create a failover group in the Azure portal, or by using the Azure CLI or Azure PowerShell. The Azure portal is a user-friendly interface for managing Azure resources, including SQL databases.
A failover group can be configured to fail over automatically to the secondary in case of a failure. Applications connect through the failover group's listener endpoints rather than to an individual server, and the read-write listener directs connections to whichever database is currently the primary.
You can monitor the health of the databases in a failover group, and receive notifications when issues arise, by using Azure Monitor metrics and alerts.
Business Continuity
Having a business continuity plan in place is crucial for Azure SQL databases, as it ensures minimal downtime and data loss in the event of a disaster.
Azure SQL failover groups can be configured to automatically fail over to a secondary database in the event of a primary database failure, ensuring high availability.
This feature is particularly useful for businesses with high-availability requirements, such as e-commerce sites or financial institutions.
Business Continuity for PaaS Services
Business Continuity for PaaS Services is crucial for any organization that relies on cloud-based infrastructure.
To ensure business continuity for PaaS (Platform as a Service) services, you can create a failover group for your database. This allows you to automatically switch to a secondary server in case the primary one fails.
Use the Switch-AzSqlDatabaseFailoverGroup command to fail over to the secondary server. This command is specifically designed for Azure SQL databases.
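After creating a failover group, you add databases to it. For databases in an elastic pool, use Get-AzSqlDatabaseFailoverGroup to retrieve the group and Get-AzSqlElasticPoolDatabase to retrieve the pool's databases, and then pipe the group to Add-AzSqlDatabaseToFailoverGroup.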
Testing the failover process is also essential to ensure business continuity. You can test the failover of your failover group using PowerShell commands.
Prerequisites
To ensure a smooth setup for business continuity, you need to consider a few prerequisites. Your primary database should already be created, so make sure you've got that taken care of first.
A key aspect of setting up a failover group is making sure your secondary server is properly configured. If your secondary server already exists in a different region from the primary server, its server logins and firewall settings must match those of your primary server.
To configure a failover group, having proper permissions is essential. You'll need to review the limitations and ensure you have the necessary permissions before proceeding.
Here are the specific prerequisites to keep in mind:
- Your primary database should already be created.
- Your secondary server must have matching login and firewall settings if it's in a different region than the primary server.
- You need the proper permissions and, for SQL Managed Instance failover groups, an existing managed instance to use as the primary.
Initial Seeding
The initial seeding phase is the longest and most expensive operation in setting up a failover group. This phase can take a significant amount of time, depending on the size of your data, number of replicated databases, and the load on the primary databases.
The seeding speed can be up to 500 GB an hour for SQL Database, and seeding is performed for all databases in parallel. This means that multiple databases can be seeded at the same time, which can significantly reduce the overall seeding time.
For SQL Managed Instance, initial seeding is likewise the longest and most expensive part of establishing a failover group between instances. Its duration depends on the same factors: the size of the data, the number of replicated databases, and the workload intensity on the primary databases.
Under normal circumstances, and when connectivity is established using recommended global virtual network peering, seeding speed is up to 360 GB an hour for SQL Managed Instance. This is slower than the seeding speed for SQL Database.
A slow link between the two instances can significantly affect the time it takes for the initial seeding phase to complete. For example, if the link can only transfer 10 GB per hour, seeding a 100-GB database can take about 10 hours.
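You can turn the estimate above into a small helper. This is a rough sketch that rests on one simplifying assumption: effective seeding throughput is the lower of two values. The first is the documented service maximum (up to 500 GB an hour for SQL Database, up to 360 GB an hour for SQL Managed Instance). The second is the measured throughput of the link. The helper treats all databases as a single volume of data. The function and dictionary names are illustrative, and the estimate ignores workload on the primary.

```python
# Documented upper bounds on seeding speed, in GB per hour.
MAX_SEEDING_GB_PER_HOUR = {
    "sql_database": 500.0,
    "sql_managed_instance": 360.0,
}


def estimate_seeding_hours(data_gb: float, link_gb_per_hour: float, service: str) -> float:
    """Rough lower bound on initial seeding time, in hours.

    Assumption: throughput is capped by whichever is slower, the service's
    documented maximum or the link between primary and secondary. Primary
    workload and per-database overhead are ignored.
    """
    if data_gb < 0 or link_gb_per_hour <= 0:
        raise ValueError("data size must be >= 0 and link throughput must be > 0")
    effective_rate = min(MAX_SEEDING_GB_PER_HOUR[service], link_gb_per_hour)
    return data_gb / effective_rate


# The example from the text: a 10 GB/hour link and a 100-GB database.
print(estimate_seeding_hours(100, 10, "sql_managed_instance"))  # 10.0
# With a fast link, the service maximum becomes the bottleneck.
print(estimate_seeding_hours(720, 1000, "sql_managed_instance"))  # 2.0
```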
Test Planned Failover
Testing planned failover is a crucial step in ensuring business continuity. You can test failover of your failover group by using the Azure portal or PowerShell.
To test failover using the Azure portal, go directly to the logical server that hosts your database, or follow the steps to find the server. On the SQL server resource menu, select Failover groups under Data management, and then choose an existing failover group to open the Failover group page.
Select Failover from the command bar, and then select Yes on the warning that TDS sessions to the database will be disconnected. Review which server is now primary and which is secondary. Once failover succeeds, the two servers swap roles, so the former primary becomes the secondary. You can select Failover again to fail the servers back to their original roles.
You can also test failover with PowerShell. If the instances are in different subscriptions or resource groups, initiate the failover from the secondary instance.
For SQL Managed Instance, here are the steps to test failover using the Azure portal:
- Go to either the primary or secondary managed instance within the Azure portal.
- Under Data management, select Failover groups.
- On the Failover groups pane, note which instance is the primary instance, and which instance is the secondary instance.
- On the Failover groups pane, select Failover from the command bar. Select Yes on the warning about TDS sessions being disconnected, and note the licensing implication.
- On the Failover groups pane, after failover succeeds, the instances switch roles so the previous secondary becomes the new primary and the previous primary becomes the new secondary.
- (Optional) On the Failover groups pane, use Failover to switch the roles back so the original primary becomes primary again.
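A scripted failover drill should verify the outcome described in these steps. The toy model below is a hypothetical helper, not an Azure SDK type, and the instance names are placeholders. It only states the role swap you expect to see on the Failover groups pane after each failover.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroupRoles:
    """Toy model of the two members of a failover group (illustrative only)."""

    primary: str
    secondary: str

    def fail_over(self) -> None:
        # A successful planned failover swaps the roles: the former secondary
        # becomes primary, and the former primary becomes secondary.
        self.primary, self.secondary = self.secondary, self.primary


roles = FailoverGroupRoles(primary="mi-eastus", secondary="mi-westus")
roles.fail_over()
print(roles.primary, roles.secondary)  # mi-westus mi-eastus
roles.fail_over()  # optional step: fail back to the original roles
print(roles.primary, roles.secondary)  # mi-eastus mi-westus
```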
Log Replay Service
When migrating databases to Azure SQL Managed Instance using the Log Replay Service, it's essential to understand the implications on business continuity.
Databases migrated with LRS are in a restoring state until the cutover step is executed.
This means they can't be added to a failover group until the cutover step is complete, which can delay creating the failover group until the database restore completes.
A database in the restoring state isn't online, so it can't participate in a failover group.
This is a critical consideration for businesses that rely on continuous database availability for their operations.
Failover Group Configuration
To configure a failover group, you'll need to create a secondary managed instance that's empty, without any user databases. This ensures a clean start for the secondary instance.
The secondary managed instance must be the same service tier and have the same storage size as the primary instance. Having equal compute sizes is also recommended to ensure the secondary instance can handle changes replicated from the primary instance.
The IP address range for the virtual network of the primary instance must not overlap with the address range of the virtual network for the secondary managed instance, or any other virtual network peered with either the primary or secondary virtual network.
Both instances must be in the same DNS zone, which you'll need to specify when creating the secondary managed instance. This ensures consistency across both instances.
To facilitate communication between the two instances, Network Security Groups (NSG) rules for the subnets of both instances must have open inbound and outbound TCP connections for port 5022 and port range 11000-11999.
Here are the key configuration requirements summarized:
- The secondary managed instance must be empty, without any user databases.
- The primary and secondary instances must be the same service tier and have the same storage size.
- The IP address range for the virtual network of the primary instance must not overlap with the address range of the virtual network for the secondary managed instance.
- Both instances must be in the same DNS zone.
- NSG rules for the subnets of both instances must have open inbound and outbound TCP connections for port 5022 and port range 11000-11999.
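Several of these requirements can be checked before you create the group. The sketch below covers the empty-secondary, tier, storage, DNS zone, and address-range rules. The InstanceConfig shape is hypothetical, and you would fill it from your own inventory. The overlap test uses Python's standard ipaddress module and compares only the primary's range against the secondary's range and any peered ranges, as the rule above states.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class InstanceConfig:
    """Hypothetical summary of a managed instance's settings (not an Azure SDK type)."""

    name: str
    service_tier: str
    storage_gb: int
    vnet_range: str
    dns_zone: str
    user_database_count: int


def check_pair(primary: InstanceConfig, secondary: InstanceConfig,
               peered_ranges: list[str]) -> list[str]:
    """Return the failover group requirements that this pair violates."""
    problems = []
    if secondary.user_database_count != 0:
        problems.append("secondary must not contain user databases")
    if primary.service_tier != secondary.service_tier:
        problems.append("service tiers differ")
    if primary.storage_gb != secondary.storage_gb:
        problems.append("storage sizes differ")
    if primary.dns_zone != secondary.dns_zone:
        problems.append("instances are in different DNS zones")
    # The primary's range must not overlap the secondary's range or any
    # virtual network peered with either side.
    primary_net = ipaddress.ip_network(primary.vnet_range)
    for other in [secondary.vnet_range, *peered_ranges]:
        if primary_net.overlaps(ipaddress.ip_network(other)):
            problems.append(f"{primary.vnet_range} overlaps {other}")
    return problems


primary = InstanceConfig("mi-primary", "GeneralPurpose", 256, "10.0.0.0/16", "abc123", 3)
secondary = InstanceConfig("mi-secondary", "GeneralPurpose", 256, "10.1.0.0/16", "abc123", 0)
print(check_pair(primary, secondary, peered_ranges=["10.0.128.0/20"]))
# ['10.0.0.0/16 overlaps 10.0.128.0/20']
```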
Deploying managed instances to paired regions is also recommended for performance reasons, as it benefits from a significantly higher geo-replication speed compared to unpaired regions.
Geo-Replication
Geo-replication is a feature of Azure SQL Database that continuously replicates data from a primary database to one or more secondary databases. It's unrelated to SQL Server replication, so don't confuse the two terms.
Active geo-replication uses availability group technologies to replicate data asynchronously to secondary databases, and you can configure up to four secondary databases. If you need more than four, you can configure secondaries of secondary databases, which is known as chaining.
You can use geo-replication to offload read-only queries from the primary database and route them to respective geo secondaries to load balance the database workload. Another use of active geo-replication is in the case of database migration to other servers.
To ensure uninterrupted geo-replication traffic flow, you need to establish and maintain connectivity between the virtual network subnets that host the primary and secondary instances. You can do this using global virtual network peering, VPN gateways, or Azure ExpressRoute. Global virtual network peering is the recommended way to connect two instances in a failover group, because it provides a low-latency, high-bandwidth private connection between the peered virtual networks over the Microsoft backbone infrastructure.
Here are the ways to provide connectivity between instances:
- Global virtual network peering
- VPN gateways
- Azure ExpressRoute
Active Geo-Replication
Active geo-replication is designed as a business continuity option for SQL databases. It protects your data by keeping a copy in another location.
You can fail over the primary database to one of its secondary databases in case of an unplanned event or a disaster recovery scenario. However, active geo-replication doesn't support automatic failover, so failing over requires manual intervention.
Active geo-replication has a Recovery Time Objective (RTO) of 30 seconds and a Recovery Point Objective (RPO) of 5 seconds. These are targets rather than guarantees. After a geo-failover, the database is expected to be available within about 30 seconds, and up to about 5 seconds of the most recent transactions could be lost.
To offload read-only queries from the primary database, you can route them to the geo-secondaries, also known as geo-replicas. These can serve read-only transactions much like availability group secondary replicas.
Paired Regions
Paired regions are a crucial aspect of geo-replication, and understanding their benefits and best practices is essential for a smooth and high-performance setup.
Using paired regions for failover groups can provide better performance compared to unpaired regions.
For Azure SQL Database, updates are generally not deployed to both regions of a pair at the same time, and it's impossible to predict which region will be upgraded first.
If your primary and secondary databases aren't in paired regions, select different maintenance window schedules for them.
For example, you can choose a Weekday maintenance window for your geo-secondary database and a Weekend maintenance window for your geo-primary database.
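The rule above can be encoded as a small planning helper. This is a sketch under stated assumptions. REGION_PAIRS is a short illustrative subset of real Azure region pairs, so check Microsoft's region-pair documentation for the current list. The "Weekend" and "Weekday" labels mirror the example in the text. For paired regions, the helper deliberately returns None because the text doesn't prescribe a schedule for that case.

```python
from typing import Optional

# Illustrative subset of Azure region pairs; not an authoritative list.
REGION_PAIRS = {
    frozenset({"eastus", "westus"}),
    frozenset({"eastus2", "centralus"}),
    frozenset({"northeurope", "westeurope"}),
    frozenset({"uksouth", "ukwest"}),
}


def is_paired(region_a: str, region_b: str) -> bool:
    """True if the two regions form a pair in the illustrative table above."""
    return frozenset({region_a, region_b}) in REGION_PAIRS


def suggest_maintenance_windows(primary_region: str,
                                secondary_region: str) -> Optional[dict[str, str]]:
    """Suggest distinct maintenance windows when the regions are not a pair.

    Returns None for paired regions, where this sketch makes no suggestion.
    """
    if is_paired(primary_region, secondary_region):
        return None
    return {"primary": "Weekend", "secondary": "Weekday"}


print(suggest_maintenance_windows("eastus", "northeurope"))
# {'primary': 'Weekend', 'secondary': 'Weekday'}
```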
Deploying managed instances to paired regions can also improve performance, especially for SQL Managed Instance failover groups.
In paired regions, Azure SQL Managed Instance generally doesn't update both regions at the same time, so the order of deployment isn't guaranteed.
To ensure a smooth transition when changing the secondary region, follow these steps:
- Create additional secondaries of each database on the primary server to the new secondary server using active geo-replication.
- Delete the failover group and re-create it with the same name between the primary and new secondary servers.
- Add all primary databases to the new failover group.
- Delete the old secondary server.
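When several databases are involved, it can help to generate the steps above as an explicit, ordered runbook. The sketch below only produces text and calls no Azure API. The function name, server names, and group name are hypothetical. It expands the per-database steps (creating the new geo-secondaries and adding databases to the new group) so the ordering is easy to review.

```python
def plan_secondary_region_change(databases: list[str], primary: str,
                                 old_secondary: str, new_secondary: str,
                                 group_name: str) -> list[str]:
    """Produce an ordered, human-readable runbook for moving the secondary.

    Generates text only; the step order follows the procedure in the text.
    """
    # 1. Create an additional geo-secondary of each database on the new server.
    steps = [f"geo-replicate {db} from {primary} to {new_secondary}" for db in databases]
    # 2. Delete the failover group and re-create it with the same name.
    steps.append(f"delete failover group {group_name}")
    steps.append(f"re-create failover group {group_name} between {primary} and {new_secondary}")
    # 3. Add all primary databases to the new failover group.
    steps += [f"add {db} to failover group {group_name}" for db in databases]
    # 4. Retire the old secondary server last.
    steps.append(f"delete old secondary server {old_secondary}")
    return steps


for step in plan_secondary_region_change(["sales", "hr"], "srv-primary",
                                         "srv-old", "srv-new", "fog1"):
    print(step)
```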
By following these steps, and by selecting different maintenance window schedules when your regions aren't paired, you can keep a high-performance geo-replication setup.
Transactional Replication
Transactional replication is supported with instances in a failover group.
If you configure replication before adding your SQL managed instance to a failover group, replication pauses and shows the status "Replicated transactions are waiting for the next log backup or for mirroring partner to catch up."
Replication resumes once the failover group is created successfully.
A SQL managed instance administrator must clean up all publications on the old primary and reconfigure them on the new primary after a failover occurs.
Listener and Endpoints
Failover groups provide read-write and read-only listener endpoints that remain unchanged during geo-failovers.
You can locate the listener endpoint in the Azure portal by going to your logical server or SQL managed instance and selecting Failover groups under Data management.
For Azure SQL Database, the Read/write listener endpoint, in the form fog-name.database.windows.net, routes traffic to the primary database. The Read-only listener endpoint, in the form fog-name.secondary.database.windows.net, routes traffic to the secondary database.
Here are the details of the listener endpoints:
- The Read/write listener endpoint routes traffic to the primary database.
- The Read-only listener endpoint routes traffic to the secondary database.
By using the listener endpoints, you don't have to manually update your connection string every time your failover group fails over since traffic is always routed to the current primary.
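In practice, this means your connection strings name the listener rather than a server. The sketch below builds ADO.NET-style connection strings for the SQL Database endpoint formats described above. "myfog" and "sales" are placeholder names. Authentication settings are omitted on purpose. Adding ApplicationIntent=ReadOnly on the read-only listener is a common convention that signals read-only sessions.

```python
def listener_connection_string(fog_name: str, database: str, read_only: bool = False) -> str:
    """Build a connection string that targets a failover group listener.

    Assumes the SQL Database listener formats:
    <fog-name>.database.windows.net (read-write) and
    <fog-name>.secondary.database.windows.net (read-only).
    """
    if read_only:
        host = f"{fog_name}.secondary.database.windows.net"
    else:
        host = f"{fog_name}.database.windows.net"
    parts = [f"Server=tcp:{host},1433", f"Database={database}", "Encrypt=True"]
    if read_only:
        # Declare that this session only reads data.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


print(listener_connection_string("myfog", "sales"))
print(listener_connection_string("myfog", "sales", read_only=True))
```

Because the host name never changes, the same strings keep working after a geo-failover.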
Configure Ports and NSG Rules
To ensure communication between your frontend components and databases, you need to configure the right ports and network security group (NSG) rules.
For Azure SQL Database, if your frontend reaches the database through a public load balancer, create a SQL Database firewall rule that allows inbound traffic from the public IP address you created for that load balancer.
To facilitate communication between two managed instances, you need to open inbound and outbound TCP connections for port 5022 and port range 11000-11999 in the NSG rules for the subnets of both instances.
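Here are the specific NSG rule requirements:
- Inbound: allow TCP on port 5022 and on port range 11000-11999 for traffic between the subnets of the two instances.
- Outbound: allow TCP on port 5022 and on port range 11000-11999 for traffic between the subnets of the two instances.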
By following these configuration requirements, you can ensure that your database and frontend components can communicate with each other, even during regional outages.
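Before testing connectivity, you can audit a subnet's allow rules against these port requirements. The NsgRule shape below is a simplified, hypothetical model, not the Azure NSG schema. The check ignores rule priority, deny rules, and source/destination prefixes. It also treats a required range as covered only when a single rule spans the whole range.

```python
from dataclasses import dataclass

# Port ranges that must be open in both directions between the instances.
REQUIRED_PORT_RANGES = [(5022, 5022), (11000, 11999)]


@dataclass
class NsgRule:
    """Simplified allow rule (hypothetical shape)."""

    direction: str  # "Inbound" or "Outbound"
    protocol: str   # "Tcp", "Udp", or "*"
    port_from: int
    port_to: int


def missing_rules(rules: list[NsgRule]) -> list[tuple[str, int, int]]:
    """List (direction, from, to) port ranges not covered by any single allow rule."""
    missing = []
    for direction in ("Inbound", "Outbound"):
        for low, high in REQUIRED_PORT_RANGES:
            covered = any(
                r.direction == direction
                and r.protocol in ("Tcp", "*")
                and r.port_from <= low
                and high <= r.port_to
                for r in rules
            )
            if not covered:
                missing.append((direction, low, high))
    return missing


rules = [
    NsgRule("Inbound", "Tcp", 5022, 5022),
    NsgRule("Inbound", "Tcp", 11000, 11999),
    NsgRule("Outbound", "*", 5022, 5022),
]
print(missing_rules(rules))  # [('Outbound', 11000, 11999)]
```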
Security and Permissions
Azure SQL failover groups have specific security and permission requirements that support your business continuity design while restricting access to the data tier. Permissions for a failover group are managed via Azure role-based access control (Azure RBAC).
To create and manage failover groups, Azure RBAC write access is necessary, and the SQL Server Contributor role has all the necessary permissions. However, for more restricted access, consider the SQL Managed Instance Contributor role, scoped to the resource groups of the primary and the secondary managed instance.
The following table outlines the minimal required permissions and their respective scope levels for management operations on failover groups:
Permissions
If you're working with Azure SQL Database, you'll need to review the specific permission scopes listed in the table below:
In some cases, you might need to work with managed instances. In that scenario, the SQL Managed Instance Contributor role, scoped to the resource groups of the primary and the secondary managed instance, is sufficient to perform all management operations on failover groups.
Here's a more detailed breakdown of the minimal required permissions and their respective minimal required scope levels for management operations on failover groups:
Scalability and Performance
To ensure your Azure SQL failover group performs well, scalability is key. You can scale the primary database up or down to a different compute size within the same service tier without disconnecting any geo-secondaries.
To avoid overloading the geo-secondary, scale it up first and then scale up the primary. Scale down in the reverse order: the primary first, and then the secondary. This sequence helps prevent issues during the upgrade or downgrade.
You can also scale a database to a higher service tier, but you must scale the geo-secondary to that tier first, and then the primary. If you try to scale the primary or geo-secondary in a way that violates this rule, you'll receive an error.
Scaling Databases
Scaling instances within a failover group also requires a specific sequence of operations. To avoid overloading the geo-secondary, scale up the geo-secondary first, and then scale up the primary. When scaling down, reverse the order: scale down the primary first, and then scale down the secondary.
Here are some key things to keep in mind when scaling databases:
- Scaling the primary database in a failover group to a higher service tier requires scaling the geo-secondary to the higher tier first.
- Scaling the geo-secondary down isn't recommended, because it needs enough capacity to take over the workload after a geo-failover.
- Scaling a geo-secondary after an unplanned failover may not be possible if the former geo-primary is unavailable.
- Scaling instances to or from the Next-gen General Purpose tier requires deleting the failover group first.
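The ordering rule is easy to get backwards in automation, so a helper that returns the order can make scripts self-documenting. In this sketch, compute sizes are represented by a hypothetical integer rank (higher means bigger). It encodes only the up/down ordering from the text. It doesn't cover the service-tier restrictions or the Next-gen General Purpose case.

```python
def scaling_order(current_rank: int, target_rank: int) -> list[str]:
    """Return which replica to resize first within a geo-replicated pair.

    Ranks are a hypothetical ordering of compute sizes (higher = bigger).
    Scaling up: geo-secondary first, so it can keep up with the primary.
    Scaling down: primary first, then geo-secondary.
    """
    if target_rank > current_rank:
        return ["geo-secondary", "primary"]
    if target_rank < current_rank:
        return ["primary", "geo-secondary"]
    return []  # no change needed


print(scaling_order(2, 4))  # ['geo-secondary', 'primary']
print(scaling_order(4, 2))  # ['primary', 'geo-secondary']
```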
Potential Performance Degradation
Automatic geo-failover of a failover group is triggered by the state of the Azure SQL components alone, not by the state of the rest of your application, which can lead to performance issues.
A typical Azure application uses multiple Azure services and consists of multiple components, some of which might still be available in the primary region even after an outage.
Ensure the redundancy of all the application's components in the secondary region to avoid performance degradation.
Once the primary databases switch to the secondary region, latency between dependent components can increase, affecting the application's performance.
Fail over application components together with the database to mitigate the impact of higher cross-region latency.
Protect Critical Data
To protect critical transactions from data loss, an application developer can call the sp_wait_for_database_copy_sync stored procedure immediately after committing the transaction. This procedure blocks the calling thread until the last committed transaction has been transmitted and hardened in the transaction log of the secondary database.
sp_wait_for_database_copy_sync prevents data loss after geo-failover for specific transactions, but doesn't guarantee full synchronization for read access. The delay caused by a call to sp_wait_for_database_copy_sync can be significant. It depends on how much transaction log on the primary hasn't yet been transmitted at the time of the call.
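The commit-then-wait pattern can be sketched in Python. This assumes a DB-API 2.0 connection to the primary database that uses the qmark ("?") parameter style, as pyodbc does. The function name is hypothetical, and the two arguments identify the server and database that host the geo-secondary. Treat it as a sketch, and check the procedure's documentation for any calling restrictions before relying on it.

```python
def commit_and_wait_for_geo_secondary(conn, target_server: str, target_database: str) -> None:
    """Commit, then block until the commit is hardened on the geo-secondary.

    `conn` is assumed to be a DB-API 2.0 connection to the primary database
    using the qmark parameter style (for example, pyodbc).
    """
    # Commit the critical transaction first.
    conn.commit()
    cursor = conn.cursor()
    try:
        # Blocks until the last committed transaction has been transmitted
        # and hardened in the secondary's transaction log.
        cursor.execute(
            "EXEC sys.sp_wait_for_database_copy_sync "
            "@target_server = ?, @target_database = ?",
            (target_server, target_database),
        )
    finally:
        cursor.close()
```

Calling this only after the transactions that truly need protection keeps the added latency off the rest of the workload.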
Using manual planned failover to move the primary back to the original location once the outage that caused the geo-failover is mitigated can also help prevent data loss.
To minimize the risk of data loss, it's essential to understand the potential issues with forced failover. Because replication is asynchronous, the most recent transactions might not yet have reached the geo-secondary. As a result, a forced failover, such as one performed during an outage in the primary region, can lose data.
For SQL Managed Instance, the following conditions must be met before you initiate a forced failover if you want it to complete without data loss:
- The workload is stopped on the primary managed instance.
- All long running transactions have completed.
- All client connections to the primary managed instance have been disconnected.
- Failover group status is 'Synchronizing'.
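These conditions can be checked as a gate in a failover runbook. In the sketch below, FailoverReadiness is a hypothetical record of values you would gather yourself, for example from monitoring queries and the portal. The function reports every unmet condition instead of stopping at the first one, so an operator sees the full picture at once.

```python
from dataclasses import dataclass


@dataclass
class FailoverReadiness:
    """Observed state before a forced failover (hypothetical shape)."""

    workload_stopped: bool
    long_running_transactions: int
    open_client_connections: int
    failover_group_status: str


def blockers_for_lossless_forced_failover(state: FailoverReadiness) -> list[str]:
    """Return the conditions from the list above that are not yet met."""
    blockers = []
    if not state.workload_stopped:
        blockers.append("workload on the primary is still running")
    if state.long_running_transactions > 0:
        blockers.append("long-running transactions have not completed")
    if state.open_client_connections > 0:
        blockers.append("client connections to the primary are still open")
    if state.failover_group_status != "Synchronizing":
        blockers.append(
            f"failover group status is {state.failover_group_status!r}, not 'Synchronizing'"
        )
    return blockers


ready = FailoverReadiness(True, 0, 0, "Synchronizing")
print(blockers_for_lossless_forced_failover(ready))  # []
```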
Management and Monitoring
Managing an Azure SQL Failover Group is a straightforward process. You can monitor the status of your failover group in the Azure portal.
To check the status, navigate to the Azure portal and select your failover group. The status will be displayed on the Overview tab.
You can also set up alerts and notifications to receive notifications when there are issues with your failover group. This can be done by clicking on the "Alerts" tab and setting up new alerts.
Monitoring the performance of your Azure SQL databases within the failover group is also important. This can be done using Azure Monitor, which provides detailed metrics and logs for your databases.
By regularly checking the status and performance of your failover group, you can identify and resolve issues quickly, ensuring minimal downtime for your applications.
Frequently Asked Questions
What is the difference between Azure SQL failover Group and geo-replication?
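Failover groups are built on geo-replication, and both replicate data asynchronously to another region. Geo-replication works per database, supports up to four readable secondaries, and requires manual failover. A failover group manages a group of databases as a unit, provides read-write and read-only listener endpoints that don't change after failover, and can be configured to fail over automatically. Geo-replication is specific to Azure SQL Database, whereas failover groups are applicable to both Azure SQL Database and managed instances.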