1019ohasd Start Fail - Log


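A few days ago I helped a colleague with a case: after the host rebooted unexpectedly, the database would not start. The environment was 11.2.0.3 standalone on AIX, using ASM.

I did not record the exact procedure at the time, so this is only a brief summary.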

# check that the init.ohasd process is running
ps -ef | grep init.ohasd
# if it is not, run this as root
/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
# check that the ohasd process is running
ps -ef | grep ohasd.bin

# if ohasd.bin is not running, start it. In a Standalone install (SIHA),
# ohasd.bin should run as the CRS owner; if configured for a cluster, it runs as root.

crsctl start has


crsctl check has              # shows HAS online
crsctl check css
crsctl check crs
crsctl stat res -t -init      # no errors, but nothing is returned
ps -ef|grep d.bin             # only ohasd.bin is listed
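
The process checks above can be collected into one helper. This is a minimal sketch, not part of the original procedure: the function name gi_daemon_status is made up for illustration, and it assumes the CSS daemon shows up in the process list as ocssd.bin. It reads `ps -ef` output from stdin, so it can also be run against saved output.

```shell
# Summarize which Grid Infrastructure daemons appear in `ps -ef` output (stdin).
# gi_daemon_status is a hypothetical helper name; ocssd.bin is assumed to be
# the process name of the CSS daemon.
gi_daemon_status() {
    ps_out=$(cat)
    for d in init.ohasd ohasd.bin ocssd.bin; do
        # -F treats the name as a fixed string, so the dot is literal;
        # the second grep drops any "grep ..." lines from the listing.
        if printf '%s\n' "$ps_out" | grep -F -- "$d" | grep -v grep >/dev/null; then
            echo "$d: running"
        else
            echo "$d: NOT running"
        fi
    done
}
```

Typical use would be `ps -ef | gi_daemon_status`. In the situation described here it would report ohasd.bin as running and ocssd.bin as not running.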

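At this point HAS was up, but cssd had not started. cssd and some local OCR entries are resources that start along with HAS; if no database uses ASM, cssd and ASM are not started automatically by default. However, running crsctl start res ora.cssd -init reported that the resource did not exist. The GI alert log showed the following: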

# gi alert
[ohasd(7210)]CRS-8017:location: /etc/oracle/lastgasp has 4 reboot advisory log
files, 0 were announced and 0 errors occurred
2015-10-18 12:51:23.952
[ohasd(7210)]CRS-2772:Server 'anbob' has been assigned to pool 'Free'.
2015-10-18 13:52:40.249
[ohasd(8085)]CRS-2112:The OLR service started on node anbob.
2015-10-18 13:52:40.314
[ohasd(8085)]CRS-2772:Server 'anbob' has been assigned to pool 'Free'.
2015-10-18 14:01:57.696
[ohasd(8668)]CRS-2112:The OLR service started on node anbob.
2015-10-18 14:01:57.761
[ohasd(8668)]CRS-2772:Server 'anbob' has been assigned to pool 'Free'.
2015-10-18 15:03:27.157
[ohasd(11026)]CRS-2112:The OLR service started on node anbob.
2015-10-18 15:03:27.202
[ohasd(11296)]CRS-1339:Oracle High Availability Service aborted due to an
unexpected error
[Failed to initialize Oracle Local Registry]. Details at (:OHAS00106:) in
/oracle/app/oracle/product/11.2.0/grid/log/anbob/ohasd/ohasd.log.
2015-10-18 15:03:27.220
[ohasd(11026)]CRS-2772:Server 'anbob' has been assigned to pool 'Free'.
2015-10-18 15:23:08.435
[ohasd(12135)]CRS-2112:The OLR service started on node anbob.
2015-10-18 15:23:08.499
[ohasd(12135)]CRS-2772:Server 'anbob' has been assigned to pool 'Free'.
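
In a long alert log, the one abnormal message is easy to miss among the repeated startup messages. The following is a small sketch (the function name crs_code_counts is hypothetical) that counts each CRS message code in an alert log read from stdin, so that a rare code such as CRS-1339 stands out from the routine CRS-2112 and CRS-2772 entries.

```shell
# Count occurrences of each CRS-nnnn message code in an alert log (stdin).
# crs_code_counts is a hypothetical helper name, not an Oracle tool.
crs_code_counts() {
    awk '{
        # pull every CRS-nnnn token out of the line, one match at a time
        while (match($0, /CRS-[0-9]+/)) {
            count[substr($0, RSTART, RLENGTH)]++
            $0 = substr($0, RSTART + RLENGTH)
        }
    }
    END { for (code in count) print code, count[code] }' | sort
}
```

It could be run as `crs_code_counts < alert<hostname>.log`; the log file name and location depend on the installation.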

# OHASD.LOG

[grid@anbob anbob]$ vi /oracle/app/oracle/product/11.2.0/grid/log/anbob/ohasd/ohasd.log
2015-10-18 15:03:27.159: [ OCRSRV][1032845088]th_init: Local listener did not
reach valid state
2015-10-18 15:03:27.159: [ CRSOCR][555579168] CAAOCR GET Debug sblevel
Level[default]: 0
...
[ clsdmt][4118787840]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=anbobDBG_OHASD))
2015-10-18 15:03:27.170: [ clsdmt][4118787840]PID for the Process [11026], connkey
8
2015-10-18 15:03:27.176: [ default][555579168] Ohasd Daemon Started.
2015-10-18 15:03:27.176: [ CLSLG][3747604224] Last Gasp Monitor thread started
2015-10-18 15:03:27.176: [ CLSLG][3747604224] processing Last Gasp disk
location /etc/oracle/lastgasp
2015-10-18 15:03:27.177: [ CLSE][555579168]clse_get_auth_loc: Returning default
authloc: /oracle/app/oracle/product/11.2.0/grid/auth/ohasd/anbob
2015-10-18 15:03:27.177: [ default][555579168] AuthLoc
/oracle/app/oracle/product/11.2.0/grid/auth/ohasd/anbob
2015-10-18 15:03:27.177: [ default][555579168] PE Engine: NEW
2015-10-18 15:03:27.177: [ default][555579168] Using OCR batch ops : ENABLED
2015-10-18 15:03:27.177: [ default][555579168] RD registrations and Clusterization
disabled.
2015-10-18 15:03:27.177: [ CLSLG][3747604224] monitoring new interface 0.0.0.0
2015-10-18 15:03:27.178: [ default][555579168][F-ALGO] getIpcPath returning
(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_IPC_SOCKET_11))
2015-10-18 15:03:27.178: [CLSFRAME][555579168] Inited lsf context 0x220efe0
2015-10-18 15:03:27.178: [CLSFRAME][555579168] Initing CLS Framework messaging
2015-10-18 15:03:27.179: [ CLSVER][555579168] Static Version 11.2.0.1.0
2015-10-18 15:03:27.179: [ default][555579168][F-ALGO] getIpcPath returning
(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_IPC_SOCKET_11))
2015-10-18 15:03:27.180: [UiServer][555579168] UI Comms initalize() 1
2015-10-18 15:03:27.181: [CLSFRAME][555579168] New Framework state: 2
2015-10-18 15:03:27.181: [CLSFRAME][555579168] M2M is starting...
2015-10-18 15:03:27.182: [ CRSCOMM][555579168]
m_pClscCtx=0x22c8030m_pUgblm=0x22d32f0
2015-10-18 15:03:27.182: [ CRSCOMM][555579168] Starting send thread
2015-10-18 15:03:27.182: [ CRSCOMM][555579168] IPC Listener instantiated for:
(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_IPC_SOCKET_11))
2015-10-18 15:03:27.182: [ CRSCOMM][205494016] clsIpc: sendWork thread started.
2015-10-18 15:03:27.182: [ CRSCOMM][555579168] IPC Listener started listening.
2015-10-18 15:03:27.183: [ CRSCOMM][4097808128] IPCL thread started listening
2015-10-18 15:03:27.183: [CLSFRAME][555579168] Starting thread model named:
AgfwProxySrvTM
2015-10-18 15:03:27.183: [CLSFRAME][555579168] Starting thread model named:
OcrModuleTM
2015-10-18 15:03:27.183: [CLSFRAME][555579168] Starting thread model named:
PolicyEngineTM
2015-10-18 15:03:27.184: [CLSFRAME][555579168] Starting thread model named:
SharedThreadTM
2015-10-18 15:03:27.184: [CLSFRAME][555579168] Starting thread model named:
UiServerTM
2015-10-18 15:03:27.184: [CLSFRAME][555579168] New Framework state: 3
2015-10-18 15:03:27.185: [ CRSRPT][3699324672] Enabled
2015-10-18 15:03:27.185: [ CRSPE][3701425920] PE Role|State Update: old role
[INVALID] new [INVALID]; old state [Not yet initialized] new [Enabling: waiting for
role]
2015-10-18 15:03:27.185: [ CRSSE][3699324672] SE module master election disabled
2015-10-18 15:03:27.185: [ CRSSE][3699324672] Master Change Event; New Master
Node ID:0 This Node's ID:0
2015-10-18 15:03:27.189: [ CRSPE][3701425920] Sent request to write event
sequence number 1400000 to repository
2015-10-18 15:03:27.189: [ CRSPE][3701425920] Reading (1) servers
2015-10-18 15:03:27.189: [ CRSPE][3701425920] There are no resource types to
read.
2015-10-18 15:03:27.189: [ CRSPE][3701425920] There are no resources to read.
2015-10-18 15:03:27.191: [ CRSPE][3701425920] Wrote new event sequence to
repository
2015-10-18 15:03:27.192: [ CRSPE][3701425920] Reading (1) server pools
2015-10-18 15:03:27.196: [ CRSPE][3701425920] Finished reading configuration.
Parsing...
2015-10-18 15:03:27.202: [ CRSPE][3701425920] Parsing server pools...
2015-10-18 15:03:27.202: [ CRSOCR][1032845088] OCR context init failure. Error:
PROCL-24: Error in the messaging layer Messaging error [18]
2015-10-18 15:03:27.202: [ default][1032845088] OLR initalization failured, rc=24
2015-10-18 15:03:27.202: [ CRSPE][3701425920] Parsed and validated SERVERPOOL:
Free [min:0][max:-1][importance:0] NO SERVERS ASSIGNED
2015-10-18 15:03:27.202: [ default][1032845088]Created alert : (:OHAS00106:) :
Failed to initialize Oracle Local Registry
2015-10-18 15:03:27.202: [ CRSPE][3701425920] Server pools parsed
2015-10-18 15:03:27.202: [ default][1032845088][PANIC] OHASD exiting; Could not
init OLR
2015-10-18 15:03:27.202: [ CRSPE][3701425920] Server Pool Free has been
registered
2015-10-18 15:03:27.202: [ default][1032845088] Done.

2015-10-18 15:03:27.202: [ CRSPE][3701425920] Cluster reboot took place.


2015-10-18 15:03:27.203: [ CRSPE][3701425920] Configuration has been parsed
2015-10-18 15:03:27.203: [ default][3697223424][F-ALGO] getIpcPath returning
(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_UI_SOCKET))
2015-10-18 15:03:27.204: [UiServer][3697223424] UI socket on:
(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_UI_SOCKET))
2015-10-18 15:03:27.204: [ default][3697223424][F-ALGO] getIpcPath returning
(ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))
2015-10-18 15:03:27.204: [UiServer][3697223424] UI socket on:
(ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))
2015-10-18 15:03:27.204: [UiServer][3695122176] UI comms listening for events.
2015-10-18 15:03:27.204: [CLSFRAME][3711932160] Module Enabling is complete

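The log above shows that OLR initialization failed, so the OLR was probably missing or corrupted. Because this is a Standalone setup, finding a local OLR backup and restoring it should resolve the problem.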
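
The ohasd.log excerpt is dominated by normal startup messages; only a few lines carry the failure. As a rough sketch, the filter below keeps just the lines that match the failure markers seen in this log (a PROCL error code, the [PANIC] tag, and the OHAS00106 alert id). The function name olr_init_errors is hypothetical, and other failures may use different markers.

```shell
# Print only the ohasd.log lines (stdin) that match the failure markers
# seen in this case. olr_init_errors is a hypothetical helper name.
olr_init_errors() {
    grep -E 'PROCL-[0-9]+|\[PANIC\]|OHAS00106'
}
```

For example: `olr_init_errors < ohasd.log`.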

[grid@anbob ~]$ cd $ORACLE_HOME


[grid@anbob grid]$ cd cdata
[grid@anbob cdata]$ ls
anbob localhost
[grid@anbob cdata]$ cd anbob
[grid@anbob anbob]$ ls
backup_20140818_113812.olr
[grid@anbob anbob]$ ls -lrt
total 5260
-rwxr-xr-x 1 grid oinstall 5386240 Aug 18 2014 backup_20140818_113812.olr

[grid@anbob anbob]$ ocrconfig -local -showbackup


PROTL-25: Manual backups for the Oracle Local Registry are not available

There is a single OLR backup from 2014, stored under $GRID_HOME/cdata/<hostname>; there are no other manual backups.
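
When the backup directory holds more than one file, the newest one is usually the one to restore. A minimal sketch, assuming the backups keep the backup_YYYYMMDD_HHMMSS.olr naming seen in the listing, so that sorting by name is the same as sorting by time (the function name latest_olr_backup is hypothetical):

```shell
# Print the newest OLR backup in a directory, or nothing if there is none.
# Assumes names of the form backup_YYYYMMDD_HHMMSS.olr, so a plain name
# sort is also a chronological sort. latest_olr_backup is a hypothetical name.
latest_olr_backup() {
    dir=$1
    ls "$dir"/backup_*.olr 2>/dev/null | sort | tail -n 1
}
```

For the directory shown above, `latest_olr_backup "$ORACLE_HOME/cdata/anbob"` would print the path of backup_20140818_113812.olr.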

[grid@anbob anbob]$ ls -lrt /etc/oracle/


total 24
drwxr-xr-x 3 root oinstall 4096 Oct 18 04:58 scls_scr
drwxrwxr-x 5 root oinstall 4096 Oct 18 04:58 oprocd
-rw-r--r-- 1 root root 0 Oct 18 04:58 olr.loc.orig
-rw-r--r-- 1 root oinstall 130 Oct 18 04:58 olr.loc
-rw-r--r-- 1 root root 16 Oct 18 04:58 ocr.loc.orig
-rw-r----- 1 grid oinstall 95 Oct 18 04:58 ocr.loc
drwxrwx--- 3 root oinstall 4096 Oct 18 13:33 lastgasp

# To restore OLR
crsctl stop has
ocrconfig -local -restore /oracle/app/oracle/product/11.2.0/grid/cdata/anbob/backup_20140818_113812.olr

crsctl start has
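
The stop, restore, and start sequence above can be wrapped so that it refuses to run when the backup file is not readable. This is only a sketch: restore_olr and the RUN variable are hypothetical names. By default it only prints the commands; setting RUN to an empty string would execute them.

```shell
# Stop HAS, restore the OLR from a backup, and start HAS again.
# restore_olr and RUN are hypothetical names. With RUN unset, the commands
# are only echoed (dry run); call with RUN= to actually execute them.
restore_olr() {
    backup=$1
    if [ ! -r "$backup" ]; then
        echo "backup not readable: $backup" >&2
        return 1
    fi
    run=${RUN-echo}
    # each step runs only if the previous one succeeded
    $run crsctl stop has &&
    $run ocrconfig -local -restore "$backup" &&
    $run crsctl start has
}
```

A dry run such as `restore_olr /path/to/backup.olr` prints the three commands in order, which makes it easy to review the exact backup path before anything is changed.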


crsctl start res ora.cssd -init
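The resources were finally displayed, and cssd had started automatically. The remaining work was to start ASM manually, mount the ASM disk groups, and start the database.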
su - grid
sqlplus / as sysasm
startup
alter diskgroup <diskgroup name> mount;
su - oracle
sqlplus / as sysdba
startup

In this case an earlier backup happened to be available. If there is no backup, refer to NOTE 1539020.1 to rebuild the OLR:

In an environment with fresh GI installation:


1. Deconfigure the existing clusterware. This is done only on the problematic node;
the other nodes should have Clusterware up and running. The command should deconfigure
the clusterware configuration for the local node only and should not touch or change
the OCR.
# <GRID_HOME>/crs/install/rootcrs.pl -deconfig -force
2. Rerun root.sh
# <GRID_HOME>/root.sh
