Oracle RAC does not start after a reboot. The following shows some symptoms, tests, and what to look for.
Node2 does not return any RAC service status after first startup or a reboot.
[root@rac02 bin]# ./crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
Upon starting it all up, there are a couple of errors: "CRS-1705" and "ora.diskmon". These indicate an issue with the ASM storage being provisioned on node2.
[root@rac01 bin]# ./crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2672: Attempting to start 'ora.cssd' on 'rac02'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac02'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.diskmon' on 'rac02' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac01'
CRS-2676: Start of 'ora.storage' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac01'
CRS-2676: Start of 'ora.crsd' on 'rac01' succeeded
CRS-1705: Found 0 configured voting files but 1 voting files are required, terminating to ensure data integrity; details at (:CSSNM00065:) in /u01/app/oracle/diag/crs/rac02/crs/trace/ocssd.trc
CRS-2674: Start of 'ora.cssd' on 'rac02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac02'
CRS-2681: Clean of 'ora.cssd' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac02'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac02'
CRS-2676: Start of 'ora.diskmon' on 'rac02' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac02'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac02'
CRS-2676: Start of 'ora.ctssd' on 'rac02' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac02'
CRS-2676: Start of 'ora.asm' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac02'
CRS-2676: Start of 'ora.storage' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac02'
CRS-2676: Start of 'ora.crsd' on 'rac02' succeeded
Running any of the RAC-related storage checks appears to hang. After running oracleasm scandisks, which instantiated the disk on node2, the RAC-related disk checks return the disk output.
[root@rac02 bin]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DISK01"
[root@rac02 bin]# ./crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE cacc790e79514f14bf658a94d092b503 (/dev/oracleasm/disks/DISK01) [DATA]
Located 1 voting disk(s).
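The voting disk count reported above is the number that CRS-1705 complained about ("Found 0 configured voting files but 1 voting files are required"), so it can be handy to pull it out of the output in a post-reboot check. A minimal sketch, assuming the output ends with a "Located N voting disk(s)." line as shown here; the function name count_voting_disks is my own and not part of any Oracle tooling.

```shell
# Hypothetical helper: read `crsctl query css votedisk` output on stdin
# and print the N from the "Located N voting disk(s)." line.
# Prints nothing if no such line is present.
count_voting_disks() {
    sed -n 's/^Located \([0-9][0-9]*\) voting disk(s).*/\1/p'
}
```

It would be used as ./crsctl query css votedisk | count_voting_disks, comparing the result against the expected count (1 in this cluster).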
[root@rac02 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 4
Total space (kbytes) : 491684
Used space (kbytes) : 84360
Available space (kbytes) : 407324
ID : 1922254439
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
Oracle Cluster Registry check was cancelled because an ongoing update was detected.
All of the oracleasm configuration appears to be set appropriately.
[root@rac02 ~]# oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
[root@rac02 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=oinstall
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_SCAN_DIRECTORIES=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"
systemd did have oracleasm enabled at startup.
[root@rac02 bin]# systemctl list-unit-files --type=service|grep oracleasm
oracleasm.service enabled
The oracleasm log "/var/log/oracleasm" showed "Disk "DISK01" does not exist or is not instantiated" when node2 rebooted, even though oracleasm configure shows that the boot-time disk scan is enabled. Scanning the disks manually after the reboot fixes the issue, so the problem has to do with storage and timing. After some googling, my issue seems to match the following 2 notes from Oracle Metalink.
Oracle Linux 7: ASM Disks Created on FCOE Target Disks are Not Visible After System Reboot (Doc ID 2065945.1)
/usr/sbin/oracleasm.init" prior to scandisk and it solved my issue. That gives about 20 seconds for the storage to be presented before the scandisk and instantiation.
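A fixed delay works, but the same idea can be written as a bounded wait that returns as soon as the storage shows up. This is only a sketch of that alternative, not the change described in the Oracle note: the function name wait_for_device and the 20-second default are my own assumptions, and the device path passed to it has to be whatever block device backs the ASM disk on your system.

```shell
# Hypothetical helper: wait up to $2 seconds (default 20) for path $1 to exist.
# Returns 0 as soon as the path is present, 1 if the timeout expires first.
wait_for_device() {
    dev=$1
    timeout=${2:-20}
    waited=0
    while [ ! -e "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, wait_for_device /dev/sdb1 20 && oracleasm scandisks would run the scan only once the device is visible, where /dev/sdb1 is a placeholder for the real device.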