Deploying Cross-AZ Windows Server Failover Clustering (WSFC) in Alibaba Cloud
WSFC is a feature of the Windows Server platform that is generally used to improve the availability of applications and services on your network. WSFC is the successor to the Microsoft Cluster Service (MSCS). We recommend using WSFC and SQL Server AlwaysOn Availability Groups as your SQL Server high availability (HA) solution on Alibaba Cloud’s Elastic Compute Service (ECS) instances.
An Alibaba Cloud ECS Instance provides fast memory and the latest Intel CPUs to help you to power your cloud applications and achieve faster results with low latency. All ECS instances come with Anti-DDoS protection to safeguard your data and applications from DDoS and Trojan attacks.
Alibaba Cloud ECS lets you run applications on multiple operating systems and manage network access rights and permissions. Within the user console, you can also access the latest storage features, including auto snapshots, which are useful for testing new tasks or operating systems because they let you make a quick copy and restore it later. ECS offers a variety of configurable CPU, memory, data disk, and bandwidth options, allowing you to tailor each instance to your specific needs.
When using WSFC in conjunction with Alibaba Cloud ECS, if one cluster node fails, another node can take over. We can configure this failover to happen automatically, which is the usual configuration, or we can manually trigger a failover.
In this tutorial, we deploy a Cross-Availability Zone (AZ) WSFC on Alibaba Cloud ECS instances. This tutorial assumes a basic understanding of Alibaba Cloud’s suite of products and services, the Alibaba Cloud Console, failover clustering, Active Directory (AD), and the administration of Windows Server.
1.1 The Architecture
We recommend the following configuration, which contains three servers and runs inside an Alibaba Cloud Virtual Private Cloud (VPC), an isolated cloud network in which you can operate your resources securely:
• A primary ECS instance running Windows Server 2016.
• A secondary ECS instance, configured to match the primary instance, running in another Availability Zone.
• An Active Directory (AD) / domain name server (DNS) instance. This server will serve several roles:
- Providing a Windows domain.
- Resolving hostnames to IP addresses.
- Hosting the file share witness that acts as a third “vote” to achieve the required quorum for the cluster.
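Note: the witness is sometimes referred to as the disk witness or the file share witness. A disk witness is a small clustered disk in the cluster's available storage group. In this tutorial we use a file share witness instead, because the cluster nodes have no shared storage.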
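To see why the third vote matters, the quorum arithmetic can be sketched in a few lines of PowerShell. This is a simplified illustration only: the function name `Get-QuorumMajority` is hypothetical and is not part of any cluster module, and the calculation ignores any dynamic vote adjustment that the cluster may perform at run time.

```powershell
# Hypothetical helper: the smallest number of votes that forms a majority.
function Get-QuorumMajority {
    param([int]$TotalVotes)
    return [math]::Floor($TotalVotes / 2) + 1
}

# Two nodes and no witness: both votes are required, so no node may fail.
$withoutWitness = Get-QuorumMajority -TotalVotes 2

# Two nodes plus the file share witness on ad-1: one vote may be lost.
$withWitness = Get-QuorumMajority -TotalVotes 3

"Without witness: need $withoutWitness of 2 votes"
"With witness:    need $withWitness of 3 votes"
```

With only two votes, the loss of either node leaves the survivor without a majority. With the witness as a third vote, a single surviving node plus the witness still holds two of three votes.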
Figure 1: The Architecture
1.2 Understanding the Network Routing
When the cluster fails over, requests must go to the newly active node. In a traditional network, this rerouting is handled by the Address Resolution Protocol (ARP), which associates IP addresses with MAC addresses.
However, in Alibaba Cloud, the VPC system uses software-defined networking, which does not provide MAC addresses. This means the changes broadcast by ARP don’t affect routing. To make routing work, we need to make use of an Alibaba Cloud product called HAVIP (Highly Available Virtual IP).
In this scenario we need to form a cluster across two different subnets in two availability zones. So, we will need to employ two HAVIPs.
1.3 Understanding a Failover
When a failover happens in the cluster, the following changes take place:
- Windows failover clustering changes the status of the active node to indicate that it has failed.
- Failover clustering moves any cluster resources and roles from the failing node to the best node, as defined by the quorum. This action includes moving the associated cluster IP addresses.
- Failover clustering broadcasts ARP packets to notify the hardware-based network routers that the IP addresses have moved. For this scenario, the HAVIP in the other subnet/availability zone will pick up this change and will promote the corresponding instance to become the new master, and the cluster DNS will now be mapped to the new HAVIP address.
That’s it! Let’s start the tutorial from the Alibaba Cloud Console.
2. Preparing the Environment in the Alibaba Cloud Console
First, log in to your Alibaba Cloud Console. We are now going to set up your Alibaba Cloud account to work with the WSFC environment.
2.1 Create Your VPC
- In the Alibaba Cloud Console, find and click “VPC” on the left-hand menu.
- Then, click “Create VPC” and follow the on-screen instructions.
- More details on creating a VPC are available here: https://www.alibabacloud.com/help/product/27706.htm
2.2 Create Three ECS Instances
- In the Alibaba Cloud Console, find and click “Elastic Compute Service” on the left-hand menu.
- Create the three ECS instances, which include:
  - Two ECS instances to form the cluster, which are in separate availability zones. Call these instances “wsfc-a” and “wsfc-b”.
  - One AD instance, in one of the two availability zones of your ECS instances. Call this instance “ad-1”.
- Remember to select “VPC” for Network Type and “Windows Server” for the OS when creating these ECS instances.
- More details on creating an ECS instance are available here: https://www.alibabacloud.com/help/doc-detail/27549.htm
- When you have created the three instances, your Console dashboard should look like this:
2.3 Create Two HAVIPs
Next, we need to create two HAVIPs, one in each availability zone, and then bind the instance in each subnet to the HAVIP for that subnet.
In Alibaba Cloud, all IPs on any VPC and underlying switches are assigned dynamically. So, you must use “HAVIP” to configure a static IP that can be used as Virtual IP for Windows Server Failover Cluster and other application clusters on ECS.
By default, the HAVIP button is not available. You will need to open a support ticket asking to have HAVIP whitelisted for your account.
Once HAVIP is available under VPC, complete the following steps:
- Click on “Create a HAVIP Address”.
- Select the VSwitch and specify the private IP that you want to use as a static virtual IP.
- Add both nodes that will be part of the high availability cluster.
- The primary node is designated the “Master”, while the secondary node is designated the “Slave”.
- Check that this new HAVIP is reachable from the ECS instance. If you can successfully ping it, this IP can now be used for your Windows cluster.
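The reachability check in the last step can be run from a PowerShell terminal on the ECS instance. The following is a minimal sketch, assuming the HAVIP address is 192.168.1.110 as in the examples later in this tutorial; substitute the address you specified.

```powershell
# Assumed value: the HAVIP created for this instance's subnet.
$HaVip = "192.168.1.110"

# Send four ICMP echo requests; -Quiet returns $true if any reply is received.
if (Test-Connection -ComputerName $HaVip -Count 4 -Quiet) {
    "HAVIP $HaVip is reachable"
} else {
    "HAVIP $HaVip did not respond; check the HAVIP binding for this instance"
}
```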
2.4 In Summary
For the remainder of this tutorial, we will assume the following environment has been set up:
3. Configure Both Instances to Join the Domain
- Use RDP to connect to the wsfc-a instance.
- Before we can join this instance to the domain, we need to fix its duplicated SID: instances created from the same public image share the same machine SID.
- So, download the file from the following URL: sysprep.ps1
- Open a PowerShell terminal as Administrator.
- Execute the script, and enter the administrative password when prompted:
[wsfc-a]> .\Sysprep.ps1 -ReserveHostname -ReserveNetwork -skiprearm -post_action "reboot"
- After the restart, reconnect to the instance and open a PowerShell terminal as Administrator.
- Set the following variables:
[wsfc-a]> $DNS = "192.168.1.1"              # Private IP of ad-1 instance
[wsfc-a]> $LocalStaticIp = "192.168.1.111"  # Private IP of this instance
[wsfc-a]> $DefaultGateway = "192.168.1.253"
- Find the name of the network interface that holds the private static IP. In this case it is “Ethernet”:
[wsfc-a]> netsh interface ip show address
Configuration for interface "Ethernet"
    DHCP enabled:       No
    IP Address:         192.168.1.111
    Subnet Prefix:      192.168.1.0/24 (mask 255.255.255.0)
    Default Gateway:    192.168.1.253
    Gateway Metric:     1
    InterfaceMetric:    15
- Set the static IP address and default gateway:
[wsfc-a]> netsh interface ip set address name="Ethernet" static $LocalStaticIp 255.255.255.0 $DefaultGateway 1
- Note: RDP might lose connectivity for a few seconds or require you to reconnect.
- Configure the primary DNS server:
[wsfc-a]> netsh interface ip set dns name="Ethernet" static $DNS
- Open “Server Manager > Local Server”, click the default WORKGROUP entry, and change the membership to our domain:
- Enter the credentials of an account with the permission to join the domain when prompted.
- Finally, restart the instance to complete the operation.
- Repeat the above steps for the wsfc-b instance, adapting to its own static IP address.
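The domain join done through Server Manager can also be scripted. The following is a minimal sketch, assuming a domain named `example.local`; that name is hypothetical, because this tutorial does not state the domain name, so replace it with the domain hosted on ad-1. Run it in a PowerShell terminal opened as Administrator, after the DNS server has been set as shown above.

```powershell
# Hypothetical domain name; replace with the domain hosted on ad-1.
$DomainName = "example.local"

# Confirm that the domain resolves through the DNS server configured above.
# -ErrorAction Stop aborts the script here if resolution fails.
Resolve-DnsName -Name $DomainName -ErrorAction Stop | Out-Null

# Prompt for an account that is allowed to join computers to the domain,
# join the domain, and restart the instance to complete the operation.
Add-Computer -DomainName $DomainName -Credential (Get-Credential) -Restart
```

Checking name resolution first separates DNS problems from credential problems, which otherwise surface as the same generic join failure.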
4. Configuring the Cluster
- Use RDP to connect to the wsfc-a instance with the domain credentials used in the previous step.
- Open a PowerShell terminal as Administrator.
- Add the clustering tools to the instance by running the following command:
[wsfc-a]> Install-WindowsFeature Failover-Clustering -IncludeManagementTools
- Restart to complete the configuration.
- Repeat the steps above for the wsfc-b instance.
- Now we are ready to create the cluster. The subsequent steps can be performed on either one of the instances.
- Open Failover Cluster Manager.
- Right-click Failover Cluster Manager and select Create Cluster.
- On the Select Servers page, add both servers.
- Click Next and keep the option to run configuration validation tests.
- Click Next to get to the Validate a Configuration Wizard screen.
- On the Testing Options page, select Run only tests I select, and then click Next.
- Clear the Storage check box on the Test Selection page, because the Storage tests will fail in our setup (as they would for separate standalone physical servers). Shared storage is needed for traditional failover cluster instances (FCIs), where every node must see the shared storage locations in which data and log files reside. In the cloud, we favor a solution like SQL Server AlwaysOn Availability Groups that doesn’t require shared storage.
- Click Next twice to run the tests. Make sure none of the tests have failed.
- Common issues found during cluster validation include:
  - Only one network path between replicas. Previously for physical servers, we would build a separate cluster heartbeat network. Because you are now working with the cloud, you can ignore this one.
  - Windows Updates may not be the same on both replicas. If you configured them to apply updates automatically, one of them might have applied updates that the other hasn’t downloaded yet.
  - Pending reboot. We might have made changes to one of the servers, and it needs a reboot to apply.
- Click Finish to return to Create Cluster Wizard.
- Name the cluster wsfc-cluster-1 on the Access Point for Administering the Cluster page and specify the two HAVIP addresses as the cluster IP for each subnet.
- Click Next twice to create the cluster and then Finish to complete the wizard. Before confirming, we can also clear the Add all eligible storage to the cluster option for now.
- We can now move on to create the file-share witness to help the cluster to achieve quorum.
- Right-click on the cluster, select More Actions and then Configure Cluster Quorum Settings.
- Click Next.
- Choose the Select the quorum witness option and then click Next.
- Choose the Configure a file share witness option.
- Click Browse, create a new file share on the AD instance ad-1, and click Next.
- Click Next after confirming the settings.
- Click Finish to end the wizard.
- Verify that all resources are online for the cluster:
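The wizard steps in this section can also be approximated from PowerShell, which additionally gives a text view of the resource state. The following is a sketch, not a verified replacement for the wizard. It assumes the HAVIP addresses 192.168.1.110 and 192.168.2.110 used in this tutorial's examples and a witness share named `\\ad-1\witness`; the share name is hypothetical, and the share must already exist. Run it on one of the cluster nodes as a domain user with administrative rights on both nodes.

```powershell
Import-Module FailoverClusters

# Values taken from this tutorial's examples; adjust to your environment.
$Nodes       = "wsfc-a", "wsfc-b"
$ClusterName = "wsfc-cluster-1"
$HaVips      = "192.168.1.110", "192.168.2.110"   # one HAVIP per subnet

# Hypothetical share name. The share must already exist on ad-1, and the
# cluster's computer account needs write access to it.
$WitnessPath = "\\ad-1\witness"

# Run validation, skipping the Storage category as in the wizard steps.
Test-Cluster -Node $Nodes -Ignore "Storage"

# Create the cluster with one static address per subnet and no shared storage.
New-Cluster -Name $ClusterName -Node $Nodes -StaticAddress $HaVips -NoStorage

# Configure the file share witness on the AD instance.
Set-ClusterQuorum -Cluster $ClusterName -FileShareWitness $WitnessPath

# List the cluster resources with their state.
Get-ClusterResource -Cluster $ClusterName |
    Format-Table Name, State, OwnerGroup, ResourceType
```

In a cluster that spans two subnets, it is normal for only one of the two cluster IP address resources to be online at a time: the one that belongs to the subnet of the active node.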
5. Testing the Setup
- In the HAVIP web console, each server has been promoted to Master of its own HAVIP. From the WSFC perspective, however, the cluster resource is online on ‘192.168.2.110’, which means that wsfc-b is the active node in this cluster setup.
- Next, we will try to simulate a failover and make sure the connection is working as expected.
- First, use RDP to connect to the ad-1 server, open a PowerShell terminal, and start pinging the cluster. In this example, the currently active IP is 192.168.2.110 (wsfc-b).
- Use RDP to connect to either of the cluster instances. In Failover Cluster Manager, right-click the cluster and select More Actions > Move Core Cluster Resources > Select Node.
- Because the resources are currently online on wsfc-b, only wsfc-a is listed as a failover candidate. Select the node and click OK to complete this action.
- The failover should complete very quickly. Back on the ad-1 server, after refreshing the DNS, ping the cluster again and note that it now resolves to 192.168.1.110.
- To fail back to the previous server, repeat the Move Core Cluster Resources step above. This time wsfc-b appears in the selection list.
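The manual failover test above can also be driven from PowerShell on one of the cluster nodes. This is a minimal sketch, assuming the cluster name `wsfc-cluster-1` from this tutorial and the default name of the core cluster group, "Cluster Group". The DNS lookup at the end may still return the old address for a short time, depending on how quickly the record is updated.

```powershell
Import-Module FailoverClusters

$ClusterName = "wsfc-cluster-1"   # cluster name used in this tutorial
$TargetNode  = "wsfc-a"           # node to move the core resources to

# Show which node currently owns the core cluster resources.
Get-ClusterGroup -Name "Cluster Group" | Format-Table Name, OwnerNode, State

# Move the core cluster resources to the other node.
Move-ClusterGroup -Name "Cluster Group" -Node $TargetNode

# Drop cached DNS answers on this machine, then check which address
# the cluster name resolves to after the move.
Clear-DnsClientCache
Resolve-DnsName -Name $ClusterName -Type A | Format-Table Name, IPAddress
```

To fail back, set `$TargetNode` to "wsfc-b" and run the sketch again.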
That’s it! We have successfully created a cross-AZ failover cluster using Windows Server on Alibaba Cloud.
To read other tutorials covering Windows Server Failover Clustering and SQL Server, visit: https://www.alibabacloud.com/getting-started/projects.