An example of how to perform an automatic failover and a later switchover on Oracle Data Guard.
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
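For context, automatic failover in Data Guard means fast-start failover driven by an observer. A minimal sketch of how it could be enabled for the broker configuration shown below; the property values and the separate observer session are assumptions, not taken from the original environment:
DGMGRL> EDIT DATABASE orclprm SET PROPERTY FastStartFailoverTarget='orclstb';
DGMGRL> EDIT DATABASE orclstb SET PROPERTY FastStartFailoverTarget='orclprm';
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;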
In dgmgrl, the physical standby is showing as offline and Real Time Query is showing OFF.
DGMGRL> show configuration
Configuration - orcl
Protection Mode: MaxPerformance
Members:
orclprm - Primary database
orclstb - Physical standby database
Error: ORA-16664: unable to receive the result from a member
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 66 seconds ago)
DGMGRL> show database orclprm
Database - orclprm
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
orcl
Database Status:
SUCCESS
DGMGRL> show database orclstb
Database - orclstb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: (unknown)
Apply Lag: (unknown)
Average Apply Rate: (unknown)
Real Time Query: OFF
Instance(s):
orcl
Database Status:
DGM-17016: failed to retrieve status for database "orclstb"
ORA-16664: unable to receive the result from a member
The listener on the primary node is not started, and that is what caused the inability to communicate with the physical standby database.
[oracle@Primary ~]$ lsnrctl status
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 16-SEP-2022 16:36:15
Copyright (c) 1991, 2019, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Linux Error: 111: Connection refused
[oracle@Primary ~]$ lsnrctl start
The physical standby status returns to SUCCESS, with Real Time Query back to ON, after starting up the listener.
[oracle@Primary ~]$ dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Fri Sep 16 16:36:32 2022
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "ORCLPRM"
Connected as SYSDG.
DGMGRL> show configuration
Configuration - orcl
Protection Mode: MaxPerformance
Members:
orclprm - Primary database
orclstb - Physical standby database
Error: ORA-16664: unable to receive the result from a member
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 71 seconds ago)
DGMGRL> show database orclstb
Database - orclstb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 2.00 KByte/s
Real Time Query: ON
Instance(s):
orcl
Database Status:
SUCCESS
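With the configuration healthy again, the later switchover is a short sequence in DGMGRL; a minimal sketch, assuming the same broker configuration as above:
DGMGRL> VALIDATE DATABASE orclstb;
DGMGRL> SWITCHOVER TO orclstb;
DGMGRL> SHOW CONFIGURATION;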
While in the middle of installing the Oracle software on RAC, one of the terminals threw an error and the session was terminated.
[oracle@rac02 ~]$
Message from syslogd@rac02 at Jan 13 15:47:15 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [osysmond.bin:4024]
In another session, the installation was still running with no other errors reported, and files were still being copied from node1 to node2.
This issue seems to be coming from VMware and indicates that these are simply transient performance (latency) hiccups.
https://kb.vmware.com/s/article/67623
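If the messages really are only transient latency hiccups, one low-impact check is the kernel's soft lockup watchdog threshold; raising it (the 30-second value below is only an illustrative assumption, not a VMware or Oracle recommendation) reduces the noise while the underlying contention is investigated:
[root@rac02 ~]# cat /proc/sys/kernel/watchdog_thresh
[root@rac02 ~]# sysctl -w kernel.watchdog_thresh=30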
Oracle RAC not starting upon rebooting. The following shows the symptoms, the tests, and what to look for.
Node2 does not return any RAC services status on first startup or after a reboot.
[root@rac02 bin]# ./crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
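A quick way to see which layer of the stack is actually down on node2 is to check the HA services and the lower-stack (init) resources; these are standard checks and were not part of the original session:
[root@rac02 bin]# ./crsctl check crs
[root@rac02 bin]# ./crsctl stat res -t -init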
Upon starting it all up, it hits a couple of errors: the "CRS-1705" voting-file error and the failed start of "ora.cssd" on node2 (right after "ora.diskmon" starts). Those are indications that there is an issue with the ASM storage provisioned on node2.
[root@rac01 bin]# ./crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2672: Attempting to start 'ora.cssd' on 'rac02'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac02'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.diskmon' on 'rac02' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac01'
CRS-2676: Start of 'ora.storage' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac01'
CRS-2676: Start of 'ora.crsd' on 'rac01' succeeded
CRS-1705: Found 0 configured voting files but 1 voting files are required, terminating to ensure data integrity; details at (:CSSNM00065:) in /u01/app/oracle/diag/crs/rac02/crs/trace/ocssd.trc
CRS-2674: Start of 'ora.cssd' on 'rac02' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac02'
CRS-2681: Clean of 'ora.cssd' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac02'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac02'
CRS-2676: Start of 'ora.diskmon' on 'rac02' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac02'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac02'
CRS-2676: Start of 'ora.ctssd' on 'rac02' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac02'
CRS-2676: Start of 'ora.asm' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac02'
CRS-2676: Start of 'ora.storage' on 'rac02' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac02'
CRS-2676: Start of 'ora.crsd' on 'rac02' succeeded
All of the RAC-related storage checks appear to hang. Once oracleasm scandisks is run and it instantiates the disk on node2, the RAC-related disk checks return their output.
[root@rac02 bin]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DISK01"
[root@rac02 bin]# ./crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE cacc790e79514f14bf658a94d092b503 (/dev/oracleasm/disks/DISK01) [DATA]
Located 1 voting disk(s).
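Since both the voting file and the OCR live in +DATA, it is also worth confirming that the disk group mounts on node2 once the disk is instantiated; a sketch using asmcmd (the +ASM2 instance name and the Grid Infrastructure environment settings are assumptions):
[oracle@rac02 ~]$ export ORACLE_SID=+ASM2
[oracle@rac02 ~]$ asmcmd lsdg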
[root@rac02 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 4
Total space (kbytes) : 491684
Used space (kbytes) : 84360
Available space (kbytes) : 407324
ID : 1922254439
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
Oracle Cluster Registry check was cancelled because an ongoing update was detected.
All of the oracleasm configuration settings appear to be set appropriately.
[root@rac02 ~]# oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
[root@rac02 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=oinstall
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_SCAN_DIRECTORIES=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"
systemd does have the oracleasm service enabled at startup.
[root@rac02 bin]# systemctl list-unit-files --type=service|grep oracleasm
oracleasm.service enabled
The oracleasm log "/var/log/oracleasm" was showing "Disk "DISK01" does not exist or is not instantiated" when node2 rebooted, even though oracleasm configure shows that scanning the disks at boot is enabled. Scanning the disks manually after the reboot fixes the issue, so the problem has to do with storage and timing. After some searching, my issue seems to match the following notes from Oracle Metalink.
Oracle Linux 7: ASM Disks Created on FCOE Target Disks are Not Visible After System Reboot (Doc ID 2065945.1)
Adding a delay in "/usr/sbin/oracleasm.init" prior to the scandisk step solved my issue. That gives about 20 seconds for the storage to be presented before the scandisk and disk instantiation.
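For reference, an alternative way to introduce the same kind of delay without editing the Oracle-supplied init script (this is an assumption on my part, not the method documented in the note) is a systemd drop-in for the oracleasm.service shown earlier:
[root@rac02 ~]# mkdir -p /etc/systemd/system/oracleasm.service.d
[root@rac02 ~]# cat > /etc/systemd/system/oracleasm.service.d/delay.conf <<'EOF'
[Service]
ExecStartPre=/usr/bin/sleep 20
EOF
[root@rac02 ~]# systemctl daemon-reload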