HA cluster takeover takes too long on HANA indexserver failure
This document (000020845) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 12
Situation
- A software failure causes one or more HANA processes to be restarted in place by the HANA daemon (hdbdaemon).
- A hardware error causes the HANA indexserver (hdbindexserver) to restart locally.
Resolution
The API method srServiceStateChanged() is called when HANA processes are failing, starting or stopping.
The SUSE hook script susChkSrv.py can be called on any srServiceStateChanged() event and then executes a predefined action on HANA. As soon as the HANA landscapeHostConfiguration status changes to 1, the Linux cluster takes action. The cluster action depends on the HANA system replication status and on the resource agent's configuration parameters PREFER_SITE_TAKEOVER and AUTOMATED_REGISTER.
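The cluster reacts to the return code of HANA's landscapeHostConfiguration.py. The minimal sketch below translates that return code into a status name. It assumes the convention used by the SAPHana resource agent (0 = fatal, 1 = error, 2 = warning, 3 = info, 4 = OK). The function name `lss_label` is hypothetical. The commented usage line assumes that `HDBSettings.sh` is in the PATH of `<sid>adm`.

```shell
# Hypothetical helper: translate a landscapeHostConfiguration.py return code
# into a status name (assumed SAPHana resource agent convention).
lss_label() {
    case "$1" in
        0) echo "FATAL" ;;
        1) echo "ERROR" ;;    # the status on which the cluster takes action
        2) echo "WARNING" ;;
        3) echo "INFO" ;;
        4) echo "OK" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Usage as <sid>adm on a HANA node (assumption: HDBSettings.sh is in PATH):
#   HDBSettings.sh landscapeHostConfiguration.py >/dev/null; lss_label $?
```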
The resolution below is described for SAP HANA scale-up systems.
It can be adapted for scale-out. See manual page susChkSrv.py(7) for details.
The resolution is implemented in four steps:
1. Updating the software packages
The package SAPHanaSR should be updated on all nodes. It has to provide the hook script susChkSrv.py.
# zypper up SAPHanaSR SAPHanaSR-doc
# rpm -ql SAPHanaSR | grep susChkSrv.py
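The HANA configuration locates the hook script by directory, so it can be useful to confirm on every node that the file is present there. This is a minimal sketch. The function name is hypothetical. The default directory is the `path` value from the `[ha_dr_provider_suschksrv]` section shown in this document.

```shell
# Hypothetical helper: confirm that susChkSrv.py exists in the directory
# that global.ini's "path" parameter points to.
check_suschksrv_installed() {
    local dir="${1:-/usr/share/SAPHanaSR}"
    if [ -f "$dir/susChkSrv.py" ]; then
        echo "found: $dir/susChkSrv.py"
        return 0
    fi
    echo "missing: $dir/susChkSrv.py" >&2
    return 1
}

# Usage on each cluster node after updating the package:
#   check_suschksrv_installed || echo "update SAPHanaSR first"
```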
2. Adding the HADR provider hook script to global.ini
The section [ha_dr_provider_suschksrv] has to be added to the HANA global.ini at both sites.
---
[ha_dr_provider_suschksrv]
provider = susChkSrv
path = /usr/share/SAPHanaSR/
execution_order = 3
action_on_lost = stop
---
Refer to the SAP HANA documentation on how to change the global.ini.
Alternatively, you may use SAPHanaSR-manageProvider. See manual pages susChkSrv.py(7) and SAPHanaSR-manageProvider(8).
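You can check the edited file before reloading anything. The minimal sketch below prints the key/value lines of one INI section and confirms that the susChkSrv provider is named there. The function names are hypothetical. The commented usage line assumes the usual custom configuration location of global.ini, with `<SID>` left as a placeholder.

```shell
# Hypothetical helper: print the non-empty lines of one section of an
# INI-style file such as HANA's global.ini.
# $1 = file, $2 = section name without brackets
ini_section() {
    awk -v want="[$2]" '
        /^[ \t]*\[/ { gsub(/[ \t]/, ""); insec = ($0 == want); next }
        insec && NF { print }
    ' "$1"
}

# Hypothetical check: the section exists and sets provider = susChkSrv.
check_suschksrv_section() {
    ini_section "$1" ha_dr_provider_suschksrv |
        grep -Eq '^[[:space:]]*provider[[:space:]]*=[[:space:]]*susChkSrv[[:space:]]*$'
}

# Usage (assumed path, replace <SID>):
#   check_suschksrv_section /hana/shared/<SID>/global/hdb/custom/config/global.ini
#   ini_section /hana/shared/<SID>/global/hdb/custom/config/global.ini ha_dr_provider_suschksrv
```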
3. Loading the new HADR provider hook script
The newly added HADR provider hook script needs to be loaded.
# su - <sid>adm
~> hdbnsutil -reloadHADRProviders; echo rc=$?
Refer to the SAP HANA documentation for details about loading HADR provider hook scripts.
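When the reload is scripted as root on the node of each site, the `<sid>adm` user name has to be derived from the SID. This is a minimal sketch of such a wrapper. Both function names are hypothetical. The SID `HA1` in the usage line is only an example. The wrapper runs the same `hdbnsutil` command as the manual step and passes its return code through.

```shell
# Hypothetical helper: derive the <sid>adm user name from a HANA SID,
# e.g. HA1 -> ha1adm.
sidadm_user() {
    printf '%sadm\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Hypothetical wrapper: reload the HADR providers as <sid>adm and report
# the return code of hdbnsutil (to be run as root).
reload_hadr_providers() {
    local rc
    su - "$(sidadm_user "$1")" -c 'hdbnsutil -reloadHADRProviders'
    rc=$?
    echo "rc=$rc"
    return "$rc"
}

# Usage (example SID):
#   reload_hadr_providers HA1
```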
4. Checking if the hook script has been loaded
Loading of the hook script should be logged in the HANA nameserver trace files at both sites. The hook script should also write to its own trace file nameserver_suschksrv.trc.
# su - <sid>adm
~> cdtrace
~> grep 'HADR.*load.*susChkSrv' nameserver_*.trc
~> grep susChkSrv.init nameserver_*.trc
See manual page susChkSrv.py(7).
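The two trace checks can be bundled so that a single return code answers whether the hook script was loaded and initialized. This is a minimal sketch that reuses the grep patterns of the manual check. The function name is hypothetical. It defaults to the current directory, which is the trace directory after `cdtrace`.

```shell
# Hypothetical helper: run both trace-file checks from the manual step.
# $1 = HANA trace directory (defaults to the current directory)
check_suschksrv_loaded() {
    local dir="${1:-.}"
    if grep -q 'HADR.*load.*susChkSrv' "$dir"/nameserver_*.trc 2>/dev/null; then
        echo "load message: found"
    else
        echo "load message: not found"; return 1
    fi
    if grep -q 'susChkSrv.init' "$dir"/nameserver_*.trc 2>/dev/null; then
        echo "init message: found"
    else
        echo "init message: not found"; return 1
    fi
}

# Usage as <sid>adm:
#   cdtrace; check_suschksrv_loaded
```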
Additional Information
susChkSrv.py(7)
SAPHanaSR-manageProvider(8)
zypper(8)
https://www.suse.com/c/emergency-braking-for-sap-hana-dying-indexserver/
https://documentation.suse.com/sbp/all/single-html/SLES4SAP-hana-sr-guide-PerfOpt-15/
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020845
- Creation Date: 07-Nov-2022
- Modified Date: 15-Nov-2022
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com