Wednesday, January 27, 2016

Oracle Data Pump migration to 11g triggered ORA-600 [qerxtGetRefOffset_911]

I worked on this issue with Oracle Support for over 8 months. It took 5 months for Oracle Engineering to recognize it as a bug and another 3 months to find a resolution. Oracle Engineering finally came back and stated that they were unable to fix it due to its complexity and that it would instead be fixed in the latest 12.1.0.2.0 version. The patch ID is 13770504, and it has been rolled into that version as well.

I have done hundreds of Data Pump runs and this one just would not go through. I tried everything I could dig up in Oracle Metalink and attempted informal patches from Oracle on 11.2.0.1, 11.2.0.3, 11.2.0.4 and 12.1.0.1. Among the things I tried were VERSION=12, parallel=1 and the recommended one-off patch from Oracle.

OK, OK, so what was my issue?

I was trying to migrate their database from 10.2.0.5 to 11.2.0.4 and kept getting ORA-600 [qerxtGetRefOffset_911]. Judging by the Oracle alert.log, the trace files and the Data Pump traces, the issue appeared to hit a Data Pump worker process bug. Quite a few similar bugs have been reported, and it seems to me that the fix did not make it into 11.2.0.4 at first. They all look similar to the following errors:


ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 1 with process name "DW00" prematurely terminated
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [qerxtGetRefOffset_911], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1887
ORA-06512: at line 2
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 2 with process name "DW00" prematurely terminated
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [qerxtGetRefOffset_911], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1887
ORA-06512: at line 2
Job "SYS"."SYS_IMPORT_FULL_01" stopped due to fatal error
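When comparing many failed import attempts, it can help to boil each log down to its error codes. The following is only a minimal sketch in Python, not anything Oracle ships: the function name summarize_log and the sample text are made up for illustration, and it assumes the log uses plain ASCII hyphens in the ORA- codes.

```python
import re
from collections import Counter

# Matches codes such as ORA-39014 or ORA-00600 anywhere in a line.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")
# Captures the first bracketed argument on an ORA-00600 line.
ORA_600_ARG = re.compile(r"ORA-00600:.*?\[([^\]]+)\]")


def summarize_log(text):
    """Return (Counter of ORA codes, set of first ORA-00600 arguments)."""
    codes = Counter()
    internal_args = set()
    for line in text.splitlines():
        codes.update("ORA-" + code for code in ORA_CODE.findall(line))
        match = ORA_600_ARG.search(line)
        if match:
            internal_args.add(match.group(1))
    return codes, internal_args


# Hypothetical excerpt modelled on the errors shown above.
sample = """\
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 1 with process name "DW00" prematurely terminated
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [qerxtGetRefOffset_911], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1887
ORA-06512: at line 2
"""

codes, internal_args = summarize_log(sample)
print(codes.most_common())
print(internal_args)
```

Running the same function over the log of every attempt makes it easy to see whether a patch changed the ORA-00600 argument or only moved the failure elsewhere.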

The relevant Oracle Support notes are:

DataPump Import (IMPDP) Fails With Internal Error ORA-600 [qerxtgetrefoffset_911] When Importing V10 Dump Files (Doc ID 1369347.1)
ORA-600 [qerxtGetRefOffset_911] (Doc ID 1267718.1)

The latter note states that the bug is fixed in 11.2.0.2, 11.2.0.3 and 12.1.0.1. Even after patching, my issue was still not fixed.

Basically, I nailed it down to one table in Oracle 10g, with a CLOB column storing XML documentation, that was causing the whole issue. I was able to reproduce the issue in all Oracle 11g dot releases and in the early Oracle 12c base release as well. Oracle finally stated that the fix has been checked into patch 13770504. I no longer have the opportunity to test it as of this writing, so I will just take their word for it.

I did not try conventional export/import or copying files, as those options were not given to me and the decision was not my call. I was simply instructed to fix the issue using Data Pump.

Thursday, January 21, 2016

Oracle RAC SCAN and resolv.conf issue

An Oracle RAC node got evicted and SCAN was no longer working. At first, everything looked sporadic. Then it seemed to have something to do with the network being bounced.


Things to check/verify and double check in case someone changed something.

/etc/hosts
/etc/hostname or hostname
/etc/resolv.conf
/etc/named.conf
/var/named/mhrac1.zone (the zone file)
/etc/sysconfig/network-scripts -- check all the eth0, eth1 and eth2 files here to get a good picture of what the networking looks like in this setup

Long story short -
When running an nslookup check, I kept getting the result below. I had disabled the firewall for the test, so port 53 should not have been blocked. I could ping freely in and out of the node, and all configured IPs pinged fine.

nslookup mhrac1-scan.wwdom.com
Server:         10.11.188.211
Address:        10.11.188.211#53

** server can't find mhrac1-scan.wwdom.com: NXDOMAIN

Upon checking my resolv.conf, the contents did not look right at all. I should have had a "search" line and only the bottom 2 IP addresses, the ones I set up in my mhrac1.zone file. Instead, I had the following:

# Generated by NetworkManager
search wwdom.com
nameserver 10.11.188.211
nameserver 10.11.188.212
nameserver 10.16.169.90
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.10.1.110
nameserver 75.75.75.75


After removing the 3 extra IPs from the top, nslookup returned the expected result. I was also able to reproduce the issue by restarting the network ("service network restart"): new IPs got inserted every time I did that.


The likely cause is that, when the network restarted, other nameserver IPs were added on top, which pushed the intended ones below the third entry, where the resolver did not read them, as the note in the file already hinted.
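To make that concrete, here is a small Python sketch that splits the nameserver entries of a resolv.conf into the ones the resolver will use and the ones it will skip. It assumes the glibc limit of three nameservers (MAXNS), which is what the comment in the generated file refers to; the function name split_nameservers is made up for illustration.

```python
MAXNS = 3  # assumed glibc limit: only the first three nameserver lines are used


def split_nameservers(resolv_conf_text, limit=MAXNS):
    """Return (used, ignored) nameserver lists, in file order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers[:limit], servers[limit:]


# The resolv.conf contents shown above.
sample = """\
# Generated by NetworkManager
search wwdom.com
nameserver 10.11.188.211
nameserver 10.11.188.212
nameserver 10.16.169.90
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.10.1.110
nameserver 75.75.75.75
"""

used, ignored = split_nameservers(sample)
print("used:   ", used)
print("ignored:", ignored)
```

With the file above, the two servers I actually wanted (10.10.1.110 and 75.75.75.75) land in the ignored list, which matches the NXDOMAIN result from nslookup.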

My guess is that, to prevent this issue permanently, I need to change the PEERDNS setting in ifcfg-eth0 so that, when the network restarts, the VM will not reach outside and grab the real domain DNS IPs. (By the way, ifcfg-eth0 is a made-up name. CentOS 7 will assign a random-looking name by default, so keep track of your MAC addresses and assigned IPs. The default might look like ifcfg-234133445. The same goes for eth1 and eth2.)
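A sketch of that change, under the assumption that the value wanted here is PEERDNS=no (the setting that tells the network scripts not to rewrite /etc/resolv.conf with DNS servers learned from DHCP). I have not verified this on the affected cluster. The helper name set_peerdns and the sample file contents are hypothetical, and the function only transforms text; it does not touch any file.

```python
def set_peerdns(ifcfg_text, value="no"):
    """Return the ifcfg text with exactly one PEERDNS=<value> line."""
    out = []
    replaced = False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("PEERDNS="):
            # Keep only the first PEERDNS line, rewritten with the new value.
            if not replaced:
                out.append("PEERDNS=" + value)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        # The key was absent, so append it at the end.
        out.append("PEERDNS=" + value)
    return "\n".join(out) + "\n"


# Hypothetical ifcfg contents, for illustration only.
sample = "DEVICE=eth0\nBOOTPROTO=dhcp\nONBOOT=yes\nPEERDNS=yes\n"
print(set_peerdns(sample))
```

After editing the real file, the check would be the same as before: run "service network restart" and confirm that resolv.conf still lists only the intended nameservers.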