HANA DB Failover and Failback
2023-11-29 00:39:27 Author: blogs.sap.com

In this blog, I share an SAP HANA database disaster recovery exercise that uses the new option of the takeover command, --suspendPrimary.

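It is a much simpler way to fail over from one site to another, and it avoids a “split brain” situation (two sites acting as primary at the same time), which is relatively complex to fix. This works because the original primary is suspended during the takeover instead of being left running.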

The complete syntax: hdbnsutil -sr_takeover --suspendPrimary

The new option is available when the primary system runs SAP HANA 2.0 SPS 04 or later.

Reference: SAP HANA High Availability (New and Changed) | SAP Help Portal

The environment I tested:

Three databases: two running in a cluster for high availability (HA), Site1 and Site2, and one for disaster recovery, Site3. Site3 is in a remote location, away from Site1 and Site2, so it replicates asynchronously.

Prerequisite:

Verify that system replication is active and that all services are in sync.

To check this, confirm that the REPLICATION_STATUS column of the monitoring view M_SERVICE_REPLICATION shows ACTIVE for all services.

You can also check this at the OS level with the Python script systemReplicationStatus.py, located in /usr/sap/<SID>/HDB<XX>/exe/python_support, where <XX> is the instance number.
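A minimal bash sketch of this OS-level check, assuming it runs as <sid>adm on the primary host with the HANA-bundled python on the PATH, and that systemReplicationStatus.py reports the overall state through its exit code, with 15 meaning every service is ACTIVE (our understanding of the script's convention; confirm it on your release). The function names sr_status_text and sr_check are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check HANA system replication before a takeover.
# Assumed exit codes of systemReplicationStatus.py:
#   10 = no system replication, 11 = error, 12 = unknown,
#   13 = initializing, 14 = syncing, 15 = active (all services in sync).

# Translate an exit code from systemReplicationStatus.py into a label.
sr_status_text() {
  case "$1" in
    10) echo "NO_HSR" ;;
    11) echo "ERROR" ;;
    12) echo "UNKNOWN" ;;
    13) echo "INITIALIZING" ;;
    14) echo "SYNCING" ;;
    15) echo "ACTIVE" ;;
    *)  echo "UNEXPECTED($1)" ;;
  esac
}

# Run the status script for a given SID and instance number (hypothetical helper).
# Prints the script's own report plus a summary line; succeeds only on exit code 15.
sr_check() {
  local sid="$1" inst="$2"
  local script="/usr/sap/${sid}/HDB${inst}/exe/python_support/systemReplicationStatus.py"
  local rc=0
  python "$script" || rc=$?
  echo "System replication status: $(sr_status_text "$rc")"
  [ "$rc" -eq 15 ]
}
```

For example, sr_check ABC 00 (ABC being a placeholder SID) returns success only when the script exited with 15, so it can gate the steps that follow.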

The initial replication site mapping is as follows:

Site1: Primary

Site2: Secondary, replicating synchronously from Site1

Site3: Secondary, replicating asynchronously from Site2

In a disaster situation, we want Site3 to be the primary database.

Steps for Site3 to take over as the primary:

1. Make sure replication is active and all sites are fully synchronized.

2. Register Site3 to Site1; otherwise, you will get a “no consumer” error.

Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.

Command (run on the Site3 host): hdbnsutil -sr_register --name=Site3 --online --remoteHost=Site1_Host --remoteInstance=00 --replicationMode=sync
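hdbnsutil -sr_register is executed on the host of the site being registered as a secondary. The bash sketch below wraps it with a hypothetical DRY_RUN switch (default 1) so the exact command line can be reviewed before it runs. The function name sr_register is our own; the options are exactly those used in this blog.

```bash
#!/usr/bin/env bash
# Sketch: build, and optionally run, an hdbnsutil -sr_register call.
# Run on the host of the site being registered as secondary, as <sid>adm.
# DRY_RUN is a hypothetical switch: 1 (default) prints, 0 executes.

sr_register() {
  if [ "$#" -ne 4 ]; then
    echo "usage: sr_register <name> <remoteHost> <remoteInstance> <replicationMode>" >&2
    return 2
  fi
  local name="$1" remote_host="$2" remote_inst="$3" mode="$4"
  # Options as used in this blog; an array keeps each argument intact.
  local -a cmd=(hdbnsutil -sr_register
    "--name=${name}" --online
    "--remoteHost=${remote_host}"
    "--remoteInstance=${remote_inst}"
    "--replicationMode=${mode}")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For this step, on the Site3 host, sr_register Site3 Site1_Host 00 sync prints the command; prefixing it with DRY_RUN=0 executes it.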

3. Register Site2 to Site3.

If you skip this step, Site2 goes down when the takeover is executed, which causes a cluster error.

Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.

Command (run on the Site2 host): hdbnsutil -sr_register --name=Site2 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync
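Waiting for synchronization to finish can be scripted as a polling loop. This is a minimal bash sketch, assuming it runs as <sid>adm on the current primary (Site1), that DIR_INSTANCE is set in that user's environment, and that systemReplicationStatus.py exits with 15 once all services are ACTIVE. wait_for_sync and the SR_STATUS_CMD override are hypothetical names; the override only exists so the loop can be tried without a HANA system.

```bash
#!/usr/bin/env bash
# Sketch: poll replication status until all services are ACTIVE or a timeout expires.
# Assumption: the status command exits with 15 when every service is ACTIVE.
# SR_STATUS_CMD is a hypothetical override (e.g. a test stub function).

wait_for_sync() {
  local timeout="${1:-1800}" interval="${2:-30}"
  local status_cmd="${SR_STATUS_CMD:-python ${DIR_INSTANCE}/exe/python_support/systemReplicationStatus.py}"
  local waited=0 rc
  while :; do
    rc=0
    # Unquoted on purpose so the command string splits into words.
    $status_cmd >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 15 ]; then
      echo "in sync after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s (last exit code ${rc})" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

wait_for_sync 1800 30 checks every 30 seconds for up to 30 minutes and returns non-zero on timeout, leaving the troubleshooting to you.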

4. Perform the takeover on Site3

Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.

Command (run on the Site3 host): hdbnsutil -sr_takeover --suspendPrimary
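The takeover should only be issued once the primary reports full synchronization. A minimal bash sketch of that guard, to run on the Site3 host: it takes the exit code that systemReplicationStatus.py returned on Site1 (15 assumed to mean all services ACTIVE) and only then issues the takeover, printing it instead while the hypothetical DRY_RUN switch is 1 (the default). takeover_if_synced is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: issue the takeover only if the primary reported full sync.
# Run on the Site3 host as <sid>adm. $1 is the exit code that
# systemReplicationStatus.py returned on the current primary (Site1);
# 15 is assumed to mean all services are ACTIVE.
# DRY_RUN (hypothetical, default 1) prints the command instead of running it.

takeover_if_synced() {
  local primary_rc="$1"
  if [ "$primary_rc" != "15" ]; then
    echo "refusing takeover: primary reported exit code ${primary_rc:-none}, not 15" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hdbnsutil -sr_takeover --suspendPrimary"
  else
    hdbnsutil -sr_takeover --suspendPrimary
  fi
}
```

The exit code is passed in rather than computed locally because the sync check belongs on the primary, while the takeover itself runs on Site3.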

5. Register Site1 to Site3

After the takeover, the suspended primary is unblocked when you register it as the new secondary.

Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.

Command (run on the Site1 host): hdbnsutil -sr_register --name=Site1 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync
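The failover sequence in steps 2 to 5, together with the host each command runs on, can be summarized by a small bash sketch that only prints the plan. The host assignment follows the rule that hdbnsutil -sr_register runs on the site being registered and -sr_takeover on the site taking over; Site1_Host, Site2_Host, Site3_Host and instance 00 are the blog's placeholders, and print_failover_plan is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print the failover runbook as "step, host, command".
# Placeholders (Site1_Host, Site2_Host, Site3_Host, instance 00) are the blog's own.

print_failover_plan() {
  # hosts[i] is where cmds[i] is executed (as <sid>adm).
  local -a hosts=(Site3_Host Site2_Host Site3_Host Site1_Host)
  local -a cmds=(
    "hdbnsutil -sr_register --name=Site3 --online --remoteHost=Site1_Host --remoteInstance=00 --replicationMode=sync"
    "hdbnsutil -sr_register --name=Site2 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync"
    "hdbnsutil -sr_takeover --suspendPrimary"
    "hdbnsutil -sr_register --name=Site1 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync"
  )
  local i
  # The blog numbers these steps 2 to 5, hence the offset of 2.
  for i in "${!cmds[@]}"; do
    printf 'step %d on %s: %s\n' "$((i + 2))" "${hosts[$i]}" "${cmds[$i]}"
  done
}
```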

6. Status after the takeover

Site3 is now the new primary, and Site1 and Site2 both replicate from it as secondaries.
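To confirm the new roles, hdbnsutil -sr_state can be run on each host. The bash sketch below extracts the replication mode from that output, assuming it contains a line of the form "mode: <value>", such as "mode: primary" on the primary; the exact output format is an assumption to verify on your release, and sr_mode_from_state is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: extract the replication mode from `hdbnsutil -sr_state` output.
# Assumption: the output contains a line "mode: <value>"
# (e.g. "mode: primary" on the primary, the replication mode on a secondary).

sr_mode_from_state() {
  # Read sr_state output on stdin; print the value of the first "mode:" line.
  # Lines such as "operation mode: ..." are skipped because their key differs.
  awk -F': *' '{ key = $1; sub(/^[ \t]+/, "", key) } key == "mode" { print $2; exit }'
}
```

On the Site3 host, hdbnsutil -sr_state | sr_mode_from_state would then be expected to print primary.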

With the new takeover option --suspendPrimary, I did not observe any database downtime, and no cluster maintenance mode was needed.

To fail back, follow the same steps in reverse. Please let me know if you need the detailed steps.

Thank you

Welly Sunarko


Source: https://blogs.sap.com/2023/11/28/hana-db-failover-and-failback/