SAP HANA High Availability Cross-Zone Solution on Alibaba Cloud With SUSE Linux Enterprise Server for SAP Applications

  • Jinhui Li, Product Manager SAP Solutions on Alibaba Cloud, AliCloud (Germany)
  • Bernd Schubert, SAP Solution Architect, SUSE

1. Solution Overview

1.1 SAP HANA System Replication

SAP HANA provides a feature called System Replication which is available in every SAP HANA installation. It offers inherent disaster recovery support.

1.2 High Availability Extension Included with SUSE Linux Enterprise Server for SAP Applications

The SUSE High Availability Extension is a high availability solution based on Corosync and Pacemaker. With SUSE Linux Enterprise Server for SAP Applications, SUSE provides SAP-specific Resource Agents (SAPHana, SAPHanaTopology, etc.) for use with Pacemaker. These help you build your SAP HANA HA solution more effectively.

1.3 Architecture Overview

This document guides you through deploying an SAP HANA High Availability (HA) solution across different zones. In brief, the architecture consists of an SAP HANA primary node in Zone A and a secondary node in Zone B of the same region, kept in sync via SAP HANA System Replication and managed by a Pacemaker cluster with STONITH fencing and a moving virtual IP address.

1.4 Network Design

The network design separates client access, cluster heartbeat, and SAP HANA System Replication traffic onto distinct subnets, as laid out in the infrastructure preparation below.

2. Infrastructure Preparation

The next sections contain information about how to prepare the infrastructure.

2.1 Infrastructure List

To set up your infrastructure, the following components are required:

2.2 Creating VPC

First, create a VPC via Console > Virtual Private Cloud > VPCs > Create VPC. In this example, a VPC named suse_hana_ha in the Region EU Central 1 (Frankfurt) has been created:

  • Switch1 192.168.0.0/24 Zone A, for SAP HANA Primary Node;
  • Switch2 192.168.1.0/24 Zone B, for SAP HANA Secondary Node;

2.3 Creating ECS Instances

Two ECS instances are created in different Zones of the same VPC via Console > Elastic Compute Service ECS > Instances > Create Instance. Choose the “SUSE Linux Enterprise Server for SAP Applications” image from the Image Marketplace.

2.4 Creating ENIs and Binding to ECS Instances

Create two ENIs via Console > Elastic Compute Service ECS > Network and Security > ENI, and attach one to each ECS instance for HANA System Replication purposes. Configure the IP addresses of the ENIs to the subnet reserved for HANA System Replication only.

echo "192.168.0.82 hana0 hana0" >> /etc/hosts
echo "192.168.1.245 hana1 hana1" >> /etc/hosts
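The two echo commands above append unconditionally and will create duplicate entries if re-run. A minimal idempotent sketch (the helper name and the scratch-file default are this sketch's own; point HOSTS_FILE at /etc/hosts on the real node):

```shell
#!/bin/sh
# Add "IP shortname shortname" entries to a hosts file only when the IP is
# not already present, so the script can be re-run safely.
# HOSTS_FILE points at a scratch file here; set it to /etc/hosts on the node.
HOSTS_FILE="${HOSTS_FILE:-./hosts.demo}"
touch "$HOSTS_FILE"

add_host_entry() {
    ip="$1"; name="$2"
    # Note: dots in $ip act as regex wildcards; good enough for a sketch.
    if ! grep -q "^${ip}[[:space:]]" "$HOSTS_FILE"; then
        printf '%s %s %s\n' "$ip" "$name" "$name" >> "$HOSTS_FILE"
    fi
}

add_host_entry 192.168.0.82 hana0    # HANA System Replication IP, primary
add_host_entry 192.168.1.245 hana1   # HANA System Replication IP, secondary
```

Running the script a second time leaves the file unchanged.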

2.5 Creating a NAT Gateway and Configuring an SNAT Entry

Now create a NAT Gateway attached to the given VPC. In the example at hand, a NAT Gateway named suse_hana_ha_GW has been created:

2.6 Creating STONITH Device and Virtual IP Resource Agent

Download the STONITH fencing software and set it up with the following commands:

wget http://repository-iso.oss-cn-beijing.aliyuncs.com/ha/aliyun-ecs-pacemaker.tar.gz
tar -xvf aliyun-ecs-pacemaker.tar.gz
./install
pip install aliyun-python-sdk-ecs aliyun-python-sdk-vpc aliyuncli
aliyuncli configure

3. Software Preparation

The next sections contain information about the required software.

3.1 Software List

The following software must be available:

  • SUSE Linux Enterprise Server for SAP Applications 12 SP2
  • HANA Installation Media
  • SAP Host Agent Installation Media

3.2 High Availability Extension Installation

Both ECS instances are created with the SUSE Linux Enterprise Server for SAP Applications image. On both ECS instances, the High Availability Extension (with the major software components: Corosync and Pacemaker), and the package SAPHanaSR should be installed. To do so, you can use zypper.

zypper in -t pattern ha_sles
zypper in SAPHanaSR SAPHanaSR-doc

3.3 SAP HANA Installation

Next, install the SAP HANA software on both ECS instances. Make sure the SAP HANA SID and Instance Number are the same (this is required by SAP HANA System Replication). It is recommended to use hdblcm to do the installation. For details refer to the SAP HANA Server Installation and Update Guide at https://help.sap.com/viewer/2c1988d620e04368aa4103bf26f17727/2.0.03/en-US.

3.4 SAP Host Agent Installation

When you have finished the HANA installation with hdblcm as mentioned above, the SAP Host Agent should already be installed on your server. To install it manually, refer to the article Installing SAP Host Agent Manually: https://help.sap.com/saphelp_nw73ehp1/helpdata/en/8b/92b1cf6d5f4a7eac40700295ea687f/content.htm?no_cache=true.

4. Configuring SAP HANA System Replication

The following sections detail how to configure SAP HANA System Replication.

4.1 Backing up SAP HANA on Primary ECS Instance

To back up SAP HANA, you can use either SAP HANA Studio or hdbsql as the client tool.

BACKUP DATA USING FILE ('COMPLETE_DATA_BACKUP');
BACKUP DATA FOR <DATABASE> USING FILE ('COMPLETE_DATA_BACKUP');
BACKUP DATA FOR SYSTEMDB USING FILE ('COMPLETE_DATA_BACKUP');
BACKUP DATA FOR JL0 USING FILE ('COMPLETE_DATA_BACKUP');
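The same statement pattern repeats once per database; a small sketch that emits it for each tenant (the helper name is this sketch's own, and the hdbsql invocation in the comment is an assumption about how you would feed it to the client):

```shell
# Emit a full data backup statement for each database name given; the output
# can be piped to hdbsql on the primary node, e.g.:
#   backup_sql SYSTEMDB JL0 | hdbsql -i 00 -d SYSTEMDB -u SYSTEM
# Database names below are this guide's examples.
backup_sql() {
    for db in "$@"; do
        printf "BACKUP DATA FOR %s USING FILE ('COMPLETE_DATA_BACKUP');\n" "$db"
    done
}

backup_sql SYSTEMDB JL0
```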

4.2 Configuring SAP HANA System Replication on Primary Node

Log on to the primary node with su - <sid>adm, where <sid> is your SAP HANA database SID in lowercase. In our example it is su - jl0adm.

vi /hana/shared/<SID>/global/hdb/custom/config/global.ini
[system_replication_hostname_resolution]
<IP> = <HOSTNAME>
[system_replication_hostname_resolution]
192.168.1.246 = hana1
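A sketch that makes this edit scriptable and repeatable (the helper name and the scratch-file default are this sketch's own; the plain append assumes the section, if present, is the last one in the file, so review real global.ini changes before relying on it):

```shell
# Ensure global.ini maps the replication peer's IP to its hostname.
# INI defaults to a scratch copy; on the node the real path is
#   /hana/shared/<SID>/global/hdb/custom/config/global.ini
INI="${INI:-./global.ini.demo}"
touch "$INI"

ensure_resolution() {
    ip="$1"; host="$2"
    # Add the section header once, then the entry once.
    grep -q '^\[system_replication_hostname_resolution\]' "$INI" || \
        printf '[system_replication_hostname_resolution]\n' >> "$INI"
    grep -q "^${ip} = ${host}" "$INI" || \
        printf '%s = %s\n' "$ip" "$host" >> "$INI"
}

ensure_resolution 192.168.1.246 hana1   # on the primary: resolve the secondary
```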

4.3 Configuring SAP HANA System Replication on Secondary Node

Perform the same steps on the Secondary node as outlined above for the Primary node. However, be sure to use the IP address and hostname of the Primary node here, not those of the Secondary node.

[system_replication_hostname_resolution]
192.168.0.83 = hana0

4.4 Enable SAP HANA System Replication on Primary Node

Log on to the primary node with su - <sid>adm (in our example, su - jl0adm):

hdbnsutil -sr_enable --name=[primary location name]
hdbnsutil -sr_enable --name=hana0

4.5 Register the Secondary Node to the Primary SAP HANA Node

Log on to the secondary node with su - <sid>adm (in our example, su - jl0adm):

hdbnsutil -sr_register --remoteHost=[location of primary Node] --remoteInstance=[instance number of primary node] --replicationMode=sync --name=[location of the secondary node] --operationMode=logreplay
hdbnsutil -sr_register --name=hana1 --remoteHost=hana0 --remoteInstance=00 --replicationMode=sync --operationMode=logreplay
hdbnsutil -sr_state
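After registration, hdbnsutil -sr_state reports the node's replication role. A sketch of a check that scripts can reuse (the helper name is this sketch's own, and the sample output is abbreviated and illustrative, not a verbatim capture):

```shell
# Check the replication mode reported by hdbnsutil -sr_state on stdin.
# Real usage on a node:  hdbnsutil -sr_state | check_sr_mode primary
check_sr_mode() {
    grep -q "^mode: ${1}$"
}

# Abbreviated, illustrative sample of the primary's output:
sample='mode: primary
site id: 1
site name: hana0'

printf '%s\n' "$sample" | check_sr_mode primary && echo "replication role OK"
```

On the secondary, the reported mode is the configured replication mode (sync in this setup) rather than "primary".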

5. Configuring High Availability Extension for SAP HANA

5.1 Configuration of Corosync

It is recommended that you add more redundancy for messaging (Heartbeat) by using separate ENIs attached to the ECS instances with a separate network range.

  • Create the authentication key as root with corosync-keygen; this generates /etc/corosync/authkey, which is later copied to the Secondary node.
  • Configure /etc/corosync/corosync.conf as root on the Primary SAP HANA node with the following content:
totem {
version: 2
token: 5000
token_retransmits_before_loss_const: 6
secauth: on
crypto_hash: sha1
crypto_cipher: aes256
clear_node_high_bit: yes
interface {
ringnumber: 0
bindnetaddr: <IP address used for heartbeat on the current server>
mcastport: 5405
ttl: 1
}
# On Alibaba Cloud, transport should be set to udpu, which means unicast
transport: udpu
}
logging {
fileline: off
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}
nodelist {
node {
ring0_addr: <IP of node 1>
nodeid: 1
}
node {
ring0_addr: <IP of node 2>
nodeid: 2
}
}
quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}
scp /etc/corosync/authkey root@<hostname of Secondary node>:/etc/corosync
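The corosync.conf above differs between the two nodes only in bindnetaddr. A sketch that renders the varying sections from variables (the variable names and the scratch output path are this sketch's own; it omits the logging block, which is identical on both nodes, and the result should go to /etc/corosync/corosync.conf on each node):

```shell
# Render a minimal corosync.conf from per-node variables. Values default to
# this guide's example addresses; OUT is a scratch path, not /etc/corosync.
BIND_ADDR="${BIND_ADDR:-192.168.0.83}"     # heartbeat IP of the local node
NODE1_ADDR="${NODE1_ADDR:-192.168.0.83}"
NODE2_ADDR="${NODE2_ADDR:-192.168.1.246}"
OUT="${OUT:-./corosync.conf.demo}"

cat > "$OUT" <<EOF
totem {
    version: 2
    token: 5000
    secauth: on
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: ${BIND_ADDR}
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
    node {
        ring0_addr: ${NODE1_ADDR}
        nodeid: 1
    }
    node {
        ring0_addr: ${NODE2_ADDR}
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
EOF
echo "wrote ${OUT}"
```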

5.2 Configuration of Pacemaker

For the SAP HANA High Availability solution, you need to configure seven resources and the corresponding constraints in Pacemaker.

5.2.1 Cluster Bootstrap and More

Add the configuration of the bootstrap and default setting of the resource and operations to the cluster. Save the following scripts in a file: crm-bs.txt.

property $id='cib-bootstrap-options' \
stonith-enabled="true" \
stonith-action="off" \
stonith-timeout="150s"
rsc_defaults $id="rsc-options" \
resource-stickiness="1000" \
migration-threshold="5000"
op_defaults $id="op-options" \
timeout="600"
crm configure load update crm-bs.txt

5.2.2 STONITH Device

This part defines the Aliyun STONITH devices in the cluster.

primitive res_ALIYUN_STONITH_1 stonith:fence_aliyun \
op monitor interval=120 timeout=60 \
params pcmk_host_list=<primary node hostname> port=<primary node instance id> \
access_key=<access key> secret_key=<secret key> \
region=<region> \
meta target-role=Started
primitive res_ALIYUN_STONITH_2 stonith:fence_aliyun \
op monitor interval=120 timeout=60 \
params pcmk_host_list=<secondary node hostname> port=<secondary node instance id> \
access_key=<access key> secret_key=<secret key> \
region=<region> \
meta target-role=Started
location loc_<primary node hostname>_stonith_not_on_<primary node hostname> res_ALIYUN_STONITH_1 -inf: <primary node hostname>
# STONITH 1 should not run on the primary node, because it controls (fences) the primary node
location loc_<secondary node hostname>_stonith_not_on_<secondary node hostname> res_ALIYUN_STONITH_2 -inf: <secondary node hostname>
# STONITH 2 should not run on the secondary node, because it controls (fences) the secondary node
  • <primary node hostname> / <secondary node hostname> should be replaced by the real hostnames of your Primary and Secondary nodes.
  • <primary node instance id> / <secondary node instance id> should be replaced by the real instance IDs of your Primary and Secondary nodes. You can get these from the console.
  • <access key> should be replaced with your real access key.
  • <secret key> should be replaced with your real secret key.
  • <region> should be replaced with the real region name where the nodes are located.
crm configure load update crm-stonith.txt
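Replacing the angle-bracket placeholders by hand invites typos; a sed-based sketch of a one-pass substitution (the function name, instance IDs, and keys below are dummies of this sketch's own, and the rendered output should be reviewed before loading it with crm):

```shell
# Substitute the placeholders of crm-stonith.txt in one pass and print the
# result. All substituted values are dummies; use your real IDs and keys.
TEMPLATE="${TEMPLATE:-crm-stonith.txt}"

render_stonith() {
    sed -e 's/<primary node hostname>/hana0/g' \
        -e 's/<secondary node hostname>/hana1/g' \
        -e 's/<primary node instance id>/i-DUMMYPRIMARY/g' \
        -e 's/<secondary node instance id>/i-DUMMYSECONDARY/g' \
        -e 's/<access key>/DUMMY_ACCESS_KEY/g' \
        -e 's/<secret key>/DUMMY_SECRET_KEY/g' \
        -e 's/<region>/eu-central-1/g' \
        "$TEMPLATE"
}
# Typical use:
#   render_stonith > crm-stonith.rendered.txt
#   crm configure load update crm-stonith.rendered.txt
```

The same pattern applies to the crm-saphanatop.txt, crm-saphana.txt, and crm-vip.txt snippets below.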

5.2.3 SAPHanaTopology

This part defines an SAPHanaTopology RA, and a clone of the SAPHanaTopology on both nodes in the cluster. Save the following scripts in a file: crm-saphanatop.txt.

primitive rsc_SAPHanaTopology_<SID>_HDB<instance number> ocf:suse:SAPHanaTopology \
operations $id="rsc_SAPHanaTopology_<SID>_HDB<instance number>-operations" \
op monitor interval="10" timeout="600" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300" \
params SID="<SID>" InstanceNumber="<instance number>"
clone cln_SAPHanaTopology_<SID>_HDB<instance number> rsc_SAPHanaTopology_<SID>_HDB<instance number> \
meta clone-node-max="1" interleave="true"
  • <SID> should be replaced by the real SAP HANA SID.
  • <instance number> should be replaced by the real SAP HANA Instance Number.
crm configure load update crm-saphanatop.txt

5.2.4 SAPHana

This part defines an SAPHana RA, and a multi-state resource of SAPHana on both nodes in the cluster. Save the following scripts in a file: crm-saphana.txt.

primitive rsc_SAPHana_<SID>_HDB<instance number> ocf:suse:SAPHana \
operations $id="rsc_sap_<SID>_HDB<instance number>-operations" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Master" timeout="700" \
op monitor interval="61" role="Slave" timeout="700" \
params SID="<SID>" InstanceNumber="<instance number>" PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"
ms msl_SAPHana_<SID>_HDB<instance number> rsc_SAPHana_<SID>_HDB<instance number> \
meta clone-max="2" clone-node-max="1" interleave="true"
  • <SID> should be replaced by the real SAP HANA SID.
  • <instance number> should be replaced by the real SAP HANA Instance Number.
crm configure load update crm-saphana.txt

5.2.5 Virtual IP

This part defines a Virtual IP RA in the cluster. Save the following scripts in a file: crm-vip.txt.

primitive rsc_vip_<SID>_HDB<instance number> ocf:aliyun:vpc-move-ip \
op monitor interval=60 \
meta target-role=Started \
params address=<virtual_IPv4_address> routing_table=<route_table_ID> interface=eth0
  • <virtual_IPv4_address> should be replaced by the real IP address under which you prefer to provide the service.
  • <route_table_ID> should be replaced by the route table ID of your VPC.
  • <SID> should be replaced by the real SAP HANA SID.
  • <instance number> should be replaced by the real SAP HANA Instance Number.
crm configure load update crm-vip.txt

5.2.6 Constraints

Two constraints organize the correct placement of the virtual IP address for client database access and the start order between the two resource agents SAPHana and SAPHanaTopology. Save the following scripts in a file: crm-constraint.txt.

colocation col_SAPHana_vip_<SID>_HDB<instance number> 2000: rsc_vip_<SID>_HDB<instance number>:started \
msl_SAPHana_<SID>_HDB<instance number>:Master
order ord_SAPHana_<SID>_HDB<instance number> Optional: cln_SAPHanaTopology_<SID>_HDB<instance number> \
msl_SAPHana_<SID>_HDB<instance number>
  • <SID> should be replaced by the real SAP HANA SID.
  • <instance number> should be replaced by the real SAP HANA Instance Number.
crm configure load update crm-constraint.txt

5.3 Check the Cluster Status

Start the SAP HANA High Availability Cluster on both nodes:

systemctl start pacemaker
systemctl status pacemaker
crm_mon -r
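A scripted health check can grep the one-shot output of crm_mon (the helper name is this sketch's own; the node-list line parsed below is illustrative, and its exact wording can vary between Pacemaker versions):

```shell
# Return 0 if the cluster status text on stdin lists every given node online.
# Real usage:  crm_mon -1 | cluster_nodes_online hana0 hana1
cluster_nodes_online() {
    status="$(cat)"
    for node in "$@"; do
        printf '%s\n' "$status" | grep -q "Online:.*${node}" || return 1
    done
}

# Illustrative sample line from crm_mon output:
printf 'Online: [ hana0 hana1 ]\n' | cluster_nodes_online hana0 hana1 \
    && echo "all nodes online"
```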

5.4 Verify the High Availability Takeover

Shut down the primary node. The cluster should promote the Secondary SAP HANA instance to Primary and move the virtual IP address over to it. Because AUTOMATED_REGISTER is set to false, the former Primary is not re-registered automatically; once it is back online, register it as the new Secondary as described in Section 4.5.

6. Example

6.1 Example of Cluster Configuration

You can check the cluster configuration via the command crm configure show. For the example at hand, the cluster configuration should display the following content:

node 1: hana0 \
attributes hana_jl0_vhost=hana0 hana_jl0_srmode=sync hana_jl0_remoteHost=hana1 hana_jl0_site=hana0 lpa_jl0_lpt=10 hana_jl0_op_mode=logreplay
node 2: hana1 \
attributes lpa_jl0_lpt=1529509236 hana_jl0_op_mode=logreplay hana_jl0_vhost=hana1 hana_jl0_site=hana1 hana_jl0_srmode=sync hana_jl0_remoteHost=hana0
primitive res_ALIYUN_STONITH_0 stonith:fence_aliyun \
op monitor interval=120 timeout=60 \
params pcmk_host_list=hana0 port=i-gw8byf3m4f9a8os6rke8 access_key=<access key> secret_key=<secret key> region=eu-central-1 \
meta target-role=Started
primitive res_ALIYUN_STONITH_1 stonith:fence_aliyun \
op monitor interval=120 timeout=60 \
params pcmk_host_list=hana1 port=i-gw8byf3m4f9a8os6rke9 access_key=<access key> secret_key=<secret key> region=eu-central-1 \
meta target-role=Started
primitive rsc_SAPHanaTopology_JL0_HDB00 ocf:suse:SAPHanaTopology \
operations $id=rsc_SAPHanaTopology_JL0_HDB00-operations \
op monitor interval=10 timeout=600 \
op start interval=0 timeout=600 \
op stop interval=0 timeout=300 \
params SID=JL0 InstanceNumber=00
primitive rsc_SAPHana_JL0_HDB00 ocf:suse:SAPHana \
operations $id=rsc_SAPHana_JL0_HDB00-operations \
op start interval=0 timeout=3600 \
op stop interval=0 timeout=3600 \
op promote interval=0 timeout=3600 \
op monitor interval=60 role=Master timeout=700 \
op monitor interval=61 role=Slave timeout=700 \
params SID=JL0 InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
primitive rsc_vip_JL0_HDB00 ocf:aliyun:vpc-move-ip \
op monitor interval=60 \
meta target-role=Started \
params address=192.168.4.1 routing_table=vtb-gw8fii1g1d8cp14tzynub interface=eth0
ms msl_SAPHana_JL0_HDB00 rsc_SAPHana_JL0_HDB00 \
meta clone-max=2 clone-node-max=1 interleave=true target-role=Started
clone cln_SAPHanaTopology_JL0_HDB00 rsc_SAPHanaTopology_JL0_HDB00 \
meta clone-node-max=1 interleave=true
colocation col_SAPHana_vip_JL0_HDB00 2000: rsc_vip_JL0_HDB00:Started msl_SAPHana_JL0_HDB00:Master
location loc_hana0_stonith_not_on_hana0 res_ALIYUN_STONITH_0 -inf: hana0
location loc_hana1_stonith_not_on_hana1 res_ALIYUN_STONITH_1 -inf: hana1
order ord_SAPHana_JL0_HDB00 Optional: cln_SAPHanaTopology_JL0_HDB00 msl_SAPHana_JL0_HDB00
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.15-21.1-e174ec8 \
cluster-infrastructure=corosync \
stonith-action=off \
stonith-enabled=true \
stonith-timeout=150s \
last-lrm-refresh=1529503606 \
maintenance-mode=false
rsc_defaults rsc-options: \
resource-stickiness=1000 \
migration-threshold=5000
op_defaults op-options: \
timeout=600

6.2 Example of /etc/corosync/corosync.conf

For the example at hand, the corosync.conf on hana1 should display the following content:

totem {
version: 2
token: 5000
token_retransmits_before_loss_const: 6
secauth: on
crypto_hash: sha1
crypto_cipher: aes256
clear_node_high_bit: yes
interface {
ringnumber: 0
bindnetaddr: 192.168.0.83
mcastport: 5405
ttl: 1
}
# On Alibaba Cloud, transport should be set to udpu, means: unicast
transport: udpu
}
logging {
fileline: off
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}
nodelist {
node {
ring0_addr: 192.168.0.83
nodeid: 1
}
node {
ring0_addr: 192.168.1.246
nodeid: 2
}
}
quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}

