Wednesday, November 12, 2014

Oracle RDBMS 11.2 Online patch

    Today a colleague asked me whether he could apply a patch online or should do it in the traditional fashion, and what risks were involved. Answering this took some explanation and examples, which I thought would be useful for others if posted on the blog.

   A normal patch consists of one or more object (.o) files and/or libraries (.a files). Installation requires shutting down the RDBMS instance, relinking the oracle binary, and restarting the instance, whereas an online patch can be applied to a live RDBMS instance.

    An online patch contains a single shared library; installing it does not require shutting down the instance or relinking the oracle binary. An online patch can be installed/uninstalled using OPatch (which uses oradebug commands internally). Since this is done with oradebug, if you have multiple databases running from a single Oracle home you have to push the patch to each database that needs it, as in the sketch below.
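
A minimal sketch of pushing the same online patch to two databases running out of one home (the SIDs DB1/DB2 and the sys password are placeholders, not from the original post):

$ $ORACLE_HOME/OPatch/opatch apply online -connectString DB1:sys:oracle,DB2:sys:oracle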
  
  Is patching online always recommended when we have the option to do so? I would say use the online option only when a quick fix is required and downtime cannot be arranged immediately; otherwise stay away from it. Online patches, as mentioned earlier, are shared libraries and require additional memory, since the modified functions have to stay in memory. Oracle defines the overhead calculation as below:

Unix: memory overhead = (number of processes + 1) x size of the .pch file

 In my example the overhead is as below:
 
 The processes parameter is set to 200, and the .pch file size is 1.67 MB:
 Memory overhead = (200 + 1) * 1.67 MB ≈ 336 MB
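
 To plug in your own numbers, the processes setting can be read from the instance and the .pch file size from the patch staging area (the staging path below is a placeholder):

 SQL> show parameter processes

 $ find /staging/17018214 -name "*.pch" -exec ls -lh {} \;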
 
 When we have multiple online patches like these the overhead adds up, so even if we apply online patches it is recommended to remove them and install them in the normal fashion during a downtime arranged later.
 Here we will take the example of applying patch 17018214 to an 11.2.0.3 database on AIX, to see the actual steps and how the patch gets loaded.
 
How to determine whether a patch can be applied online:

1. It should be mentioned in the readme, and the patch should have an online directory with a .pch file under its subdirectories.
2. cd to the patch directory and run: opatch query -all online (a sketch follows this list).
   The output should contain the text "Patch is an online patch: true"
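
A minimal illustration of that check (the staging path is a placeholder):

$ cd /staging/17018214
$ $ORACLE_HOME/OPatch/opatch query -all online
...
Patch is an online patch: true
...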
   
How to apply the patch online:

syntax:

$ORACLE_HOME/OPatch/opatch apply online -connectString <SID>:<username>:<password>:<node>,<SID>:<username>:<password>:<node>

Since I am on a single node, my command is as below:

$ORACLE_HOME/OPatch/opatch apply online -connectString TST:sys:oracle

The output should contain text as below after successful application

Patching component oracle.rdbms, 11.2.0.3.0...
Installing and enabling the online patch 'bug17018214.pch', on database 'TST'.
Verifying the update...
Patch 17018214 successfully applied

How to verify the patch is applied

 Log in via SQL*Plus as SYSDBA and query via oradebug. Enabling and disabling the patch can likewise be done via oradebug, but when you want to remove the online patch it should be done via OPatch. You can also check the shared libraries loaded by the pmon process with the procmap command (use pmap on Solaris and Linux).

SQL> oradebug patch list
Patch File Name                                   State
================                                =========
bug17018214.pch                                  ENABLED

I had issues with procmap, hence here is a sample pmap output of the pmon process after the online patch is installed:

00002abecb53c000       8 r-x-- 000000000c64e000 008:00002 oracle
00002abecb53e000    5052 r-x-- 00000000000bd000 008:00002 oracle
00002abecba2d000     140 r-x-- 0000000000000000 008:00002 bug17018214.so
00002abecba50000    1024 ----- 0000000000023000 008:00002 bug17018214.so
00002abecbb50000       8 rwx-- 0000000000023000 008:00002 bug17018214.so
00007fff3342d000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
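
For reference, output like the above can be gathered by locating the pmon PID and running pmap against it (the grep pattern and PID are illustrative):

$ ps -ef | grep ora_pmon_TST
$ pmap <pid of ora_pmon_TST>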

You can also enable and disable the patch with the commands below.

SQL> oradebug patch disable bug17018214.pch
Statement processed.
SQL> oradebug patch list
Patch File Name                                   State
================                                =========
bug17018214.pch                                  DISABLED

SQL> oradebug patch enable bug17018214.pch
Statement processed.
SQL> oradebug patch list
Patch File Name                                   State
================                                =========
bug17018214.pch                                  ENABLED

 If you are wondering where the patch information is stored, it is in $ORACLE_HOME/hpatch:
  
$ ls
bug17018214.pch            bug17018214.pchIFPT.fixup  bug17018214.so             orapatchIFPT.cfg

How to rollback the online patch

syntax:

opatch rollback -id <patch id> -connectString <SID>:<username>:<password>:<node>

The command for my single-node database is as below:

$ORACLE_HOME/OPatch/opatch rollback -id 17018214 -connectString TST:sys:oracle

The output should contain text as below after a successful rollback:
The patch will be removed from database instances.
Disabling and removing online patch 'bug17018214.pch', on database 'TST'
RollbackSession removing interim patch '17018214' from inventory

Reference:
MoS Note Doc ID 761111.1 - RDBMS Online Patching Aka Hot Patching       

Monday, February 10, 2014

ORA-15260 and ORA-15027

I am writing this post to show the trivial mistakes we can make while transitioning from 10g/11gR1 ASM to 11gR2 ASM/Grid Infrastructure. Yes, I made one as well!
  
       $ sqlplus / as sysdba
     SQL> ALTER diskgroup DATA mount;
         ALTER diskgroup DATA mount
         *
         ERROR at line 1:
         ORA-15032: not all alterations performed
         ORA-15260: permission denied on ASM disk group

 
   This is not because you have wrong privileges on the disks or diskgroups; we have just slipped back to the old habit of logging in to the ASM instance as SYSDBA. Log in as SYSASM and the error goes away. Starting with Oracle 11g, the SYSASM role should be used to administer ASM instances.

  
  
         $ sqlplus / as sysasm
     SQL> ALTER diskgroup DATA mount;
   
         Diskgroup altered.

       
       
        When I tried to drop the DATA diskgroup I got the error ORA-15027, even though I did log in as SYSASM.
       
       
         $ sqlplus / as sysasm
    SQL> drop diskgroup DATA including contents;
       
         *

         ERROR at line 1:
         ORA-15039: diskgroup not dropped
         ORA-15027: active use of diskgroup "DATA" precludes its dismount
 
        The next obvious thing to do is to check V$ASM_CLIENT for active sessions, but it returned no rows.
       
        
        SQL> select * from v$asm_client;

          no rows selected
        
         The reason for this is that the spfile for the ASM instance is stored in the same diskgroup and is in use by the instance itself. I had to move the spfile out of the diskgroup, in my case out of ASM entirely, since this is the only diskgroup we have. The issue was resolved after this.
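
         A quick way to confirm this (a minimal check, not part of the original troubleshooting output) is to look at where the ASM instance reads its spfile from; if the value points into the diskgroup you are trying to drop, the instance itself is the "active use" the error complains about:

         SQL> show parameter spfile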

        
        
           SQL> create pfile='$ORACLE_HOME/dbs/init+ASM.ora' from spfile='';
     SQL> shutdown immediate;
     SQL> startup pfile=$ORACLE_HOME/dbs/init+ASM.ora
     SQL> drop diskgroup DATA including contents;

Monday, February 27, 2012

Flushing a single SQL from the shared pool

Many DBAs want to flush individual SQLs when there is a need to reparse a SQL for testing, or even at times as a quick workaround for plan deviations. Traditionally we opt for flushing the whole shared pool, which at times could hurt other SQLs. So what can we do to avoid that, can't we flush one SQL at a time? There was a partial answer to this from Kerry Osborne, by creating and dropping outlines, which was great but didn't always work seamlessly (still, it is the only option if you are stuck on an old version).


But if you are on 10.2.0.4 or above there is another winner, DBMS_SHARED_POOL.PURGE, which does purge a single SQL.

The DBMS_SHARED_POOL package with the PURGE procedure is included in the 10.2.0.4 patchset release, but might not work if the database was upgraded from 10.2.0.x to 10.2.0.4. For 10.2.0.2/3, patch 5614566 can be installed to get this going.


The syntax for the procedure is as below and I will show more examples for different scenarios through this article.



DBMS_SHARED_POOL.PURGE (
name VARCHAR2,
flag CHAR DEFAULT 'P',
heaps NUMBER DEFAULT 1)
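
To purge a cursor you first need its ADDRESS and HASH_VALUE from V$SQLAREA. The listing below was produced by a lookup along these lines (the test statement is just an illustration):

SQL> select SQL_ID, SQL_TEXT from V$SQLAREA where SQL_TEXT like 'select * from dba_objects where rownum%';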


SQL_ID        SQL_TEXT
------------- ----------------------------------------------------
1384ubj5yw6d4 select * from dba_objects where rownum < 5

SQL> select ADDRESS, HASH_VALUE from V$SQLAREA where SQL_ID='1384ubj5yw6d4';

ADDRESS          HASH_VALUE
---------------- ----------
000000019BFCB8E8 1273895332


SQL> alter session set events '5614566 trace name context forever'; ---> the purge is protected by this event in 10.2.0.4; not required in 11g

Session altered.

SQL> exec DBMS_SHARED_POOL.PURGE ('000000019BFCB8E8,1273895332','C');

PL/SQL procedure successfully completed.

SQL> select ADDRESS, HASH_VALUE from V$SQLAREA where SQL_ID='1384ubj5yw6d4';

no rows selected


Perfect, that has flushed just the one SQL.


Before we close, we will look at one other variant of this procedure, clearing a sequence from the shared pool, which could also find a use in a DBA's life.

SQL> create sequence test_seq start with 1 increment by 1 cache 5; ---> created a sequence with cache value 5

Sequence created.

SQL> select test_seq.nextval from dual;

NEXTVAL
----------
1

SQL> exec DBMS_SHARED_POOL.PURGE ('TEST_SEQ','Q')

PL/SQL procedure successfully completed.

SQL> select test_seq.nextval from dual; ---> since we flushed the sequence that had a cache value of 5, the sequence has jumped to 6

NEXTVAL
----------
6


Hope you found this useful...

Wednesday, November 3, 2010

ASM mount fails with ORA-15032 + ORA-15063

Development DBAs were using a single-node Dell box for 11gR1 ASM and RDBMS, and later decided to move it to a two-node RAC. They therefore deinstalled the existing software and left the DB files as they were, so the same DBs could be used after the RAC setup. After the CRS, ASM and RDBMS were installed, they had 2 new disks to be added for the RAC nodes.

Using DBCA an ASM instance was created and a diskgroup called DG1 was created with the 2 new disks, and one of the ASM instances' alert logs started throwing the error below. My help was sought to see whether anything like this had been faced in production.


ERROR: diskgroup DG1 was not mounted
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DG1"
ORA-15038: disk '' size mismatch with diskgroup [1048576] [4096] [512]
ERROR: ALTER DISKGROUP ALL MOUNT

First, check the disks on both nodes:

Node - 1

SQL> select MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,NAME,PATH,TOTAL_MB,FREE_MB from v$asm_disk;

MOUNT_S HEADER_STA MODE_ST NAME       PATH              TOTAL_MB    FREE_MB
------- ---------- ------- ---------- --------------- ---------- ----------
CLOSED  MEMBER     ONLINE             /dev/raw/raw3            0          0
CLOSED  FOREIGN    ONLINE             /dev/raw/raw5            0          0
CLOSED  MEMBER     ONLINE             /dev/raw/raw1            0          0
CLOSED  FOREIGN    ONLINE             /dev/raw/raw2            0          0
IGNORED MEMBER     ONLINE             /dev/raw/raw4            0          0


Node - 2

SQL> select MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,NAME,PATH,TOTAL_MB,FREE_MB from v$asm_disk;

MOUNT_S HEADER_STA MODE_ST NAME       PATH              TOTAL_MB    FREE_MB
------- ---------- ------- ---------- --------------- ---------- ----------
CLOSED  FOREIGN    ONLINE             /dev/raw/raw2            0          0
CLOSED  FOREIGN    ONLINE             /dev/raw/raw5            0          0
CACHED  MEMBER     ONLINE  DG1_0001   /dev/raw/raw3       547419     168410
CACHED  MEMBER     ONLINE  DG1_0000   /dev/raw/raw4       547419     168257

Inference 1:

From the above details, /dev/raw/raw1 is the old disk from the old standalone ASM (it still carries a DG1 header), and it is not visible on the new node at all.

Note: I could also see all the files from the standalone DB synchronized into DG1 on the second node; this takes us closer to our issue.

From this I used kfed to check what the disk headers had to say (content shortened for readability):

[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw1

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0000 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0000 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0

[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw4

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0000 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0000 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0


[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw3

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0001 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0001 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0

Yes, now we know the issue: the disk '/dev/raw/raw1' was earlier part of diskgroup DG1 on the standalone setup (and of course was never cleaned up). This could have been avoided if the new disks had been put into a diskgroup with a new name, e.g. DG2. So the end result is that the two nodes see mismatching sets of disks for DG1, and we have two disks both named "DG1_0000".

So what could be done to resolve this (a dev setup gives me more liberty :-) ), in our case the below (see the sketch after this list):

1. Commented out /dev/raw/raw1 and restarted node 1.
2. ASM now had the same disks on both sides and came up fine.
3. We returned the old disk raw1 to the storage team.
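
On RHEL-style systems of that era the raw device bindings usually live in /etc/sysconfig/rawdevices, so "commenting out" the disk looked roughly like the sketch below (the block device names are placeholders, not from the original setup):

# /etc/sysconfig/rawdevices on node 1
# /dev/raw/raw1 /dev/sdb1    <-- old standalone DG1 disk, commented out
/dev/raw/raw3 /dev/sdd1
/dev/raw/raw4 /dev/sde1
...

followed by a reboot of the node (or a 'service rawdevices restart').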

What could have been done to avoid this:

1. Mount the raw1 disk on both the nodes, or
2. Create the new diskgroup with a different name; very simple, I guess.

Friday, October 29, 2010

Oracle 11G : root.sh fails with - Failure at final check of Oracle CRS stack. 10

I was setting up an Oracle 11g RAC on a two-node Linux cluster and ran into an issue while running root.sh on the second node of the cluster, as below:


/rdbms/crs/root.sh
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up Network socket directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: devdb03b devdb03b-priv devdb03b
node 2: devdb03a devdb03a-priv devdb03a
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
Failure at final check of Oracle CRS stack.
10


After the error I manually verified that the basic setup was right; there were a couple of trivial issues which had escaped the cluvfy verification:

1. The private and virtual host names were commented out in the /etc/hosts file on one node.
2. The time was not synced between the two nodes, which could cause node eviction.

[oracle@devdb03b cssd]$ date
Thu Oct 28 10:49:38 GMT 2010
[oracle@devdb03a ~]$ date
Thu Oct 28 10:48:39 GMT 2010
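
A quick way to bring the clocks together before re-running root.sh (assuming an NTP server is reachable; the server name is a placeholder) is to run the below on both nodes, or better, configure ntpd so they stay in sync:

# ntpdate -u ntp.example.com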


After the required changes were done, the installation was cleaned up following the note below and reinstalled.

Note: How to Clean Up After a Failed 10g or 11.1 Oracle Clusterware Installation [ID 239998.1]

This still did not resolve the issue. After some more analysis of the trace dumps from ocssd, we could see that the network heartbeat was not coming through for some other reason, such as a port block or a firewall issue; checking /etc/services and iptables confirmed it.

[ CSSD]2010-10-28 10:55:58.709 [1098586432] >TRACE: clssnmReadDskHeartbeat: node 1, devdb03a, has a disk HB, but no network HB, DHB has rcfg 183724820, wrtcnt, 476, LATS 51264, lastSeqNo 476, timestamp 1288262691/387864

OL=tcp)(HOST=wv1devdb03b-priv)(PORT=49895))

iptables was enabled and had many restrictions, so the following entries were added to iptables and the nodes were restarted (on one node the CRS restart was hanging forever).

In node devdb03a


ACCEPT all -- devdb03b anywhere

In node devdb03b


ACCEPT all -- devdb03a anywhere
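
The lines above are how the rules show up in 'iptables -L'; the commands used to add them were along these lines (chain position and the save step are a sketch from memory):

On devdb03a:  iptables -I INPUT -s devdb03b -j ACCEPT ; service iptables save
On devdb03b:  iptables -I INPUT -s devdb03a -j ACCEPT ; service iptables save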

After this the CRS became healthy, but no resources were present.

This was due to the root.sh failure on the second node. To fix it, vipca was run as the root user from the first node and everything fell into place quickly; the VIP, ONS and GSD resources all came up fine.

Tuesday, October 12, 2010

Oracle Netbackup restore to a different user/server

I am working in an environment where we use Oracle RMAN with NetBackup for our backup strategy, and today one of our APPS DBAs sought help to restore a production backup to a different server (dev) under a different user.

Restoring to a different server was straightforward, as I had done it multiple times before. The solution, shown below, is to send the name of the client which took the backup via the NB_ORA_CLIENT parameter, which lets the NetBackup client browse the backups taken from the production server (prd-bkp).

Note: the actual client here is prd-bkp

run
{
host "date";
allocate channel t1 DEVICE TYPE sbt_tape PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64.1' format '%d_dbf_%u_%t' ;
send 'NB_ORA_CLIENT=prd-bkp';
restore controlfile to '/upg04/FINDEV/control01.ctl' ;
set until time "to_date('05-10-2010 10:01:00','dd-mm-yyyy hh24:mi:ss')";
release channel t1 ;
debug off;
host "date";
}

Coming to the second issue, where we have to restore the files as a different user, I had to dig into the problem, and with the help of the backup admin I pulled the log files from the NetBackup master server and could see the following messages:


07:21:46.888 [6581] <2> db_valid_master_server: dev-bkp is not a valid server
07:21:46.933 [6581] <2> process_request: command C_BPLIST_4_5 (82) received
07:21:46.933 [6581] <2> process_request: list request = 329199 82 oradev dbadev prd-bkp dev-bkp
dev-bkp NONE 0 3 999 1281405910 1284084310 4 4 1 1 1 0 4 7230 9005 4 0 C C C C C 0 2 0 0 0
07:21:46.947 [6581] <2> get_type_of_client_list_restore: list and restore not specified for dev-bkp
07:21:46.947 [6581] <2> get_type_of_client_free_browse: Free browse allowed for dev-bkp
07:21:46.948 [6581] <2> db_valid_client: -all clients valid-
07:21:46.949 [6581] <2> fileslist: sockfd = 9
07:21:46.949 [6581] <2> fileslist: owner = oradev
07:21:46.949 [6581] <2> fileslist: group = dbadev
07:21:46.949 [6581] <2> fileslist: client = prd-bkp
07:21:46.949 [6581] <2> fileslist: sched_type = 12

The reason is that the backup was taken as the oracle user (dba group), while the restore was attempted as the oradev user (dbadev group); the user and group have to match for the restore to succeed. Changing the existing setup to a production-like user was not possible, because the box hosts users like oradev and oratst which are all going to hold copies of production.

Finally we found that since NetBackup 6.0 MP4 the backup images can be read as long as the group of the user matches, and voila, we were on 6.5. This change was possible for us with the help of the sysadmin without compromising on security. After doing so the restore went fine; the solution was as below (a sketch of the OS-level commands follows the list):

1.Stop the oracle instance running in dev.
2.Change the group of oradev from dbadev to dba.
3.Change the ownership of the binaries to oradev:dba.
4.Start the oracle instance as the oradev user.
5.Rerun the restore.
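
A rough sketch of steps 2 and 3 at the OS level, run as root (exact commands depend on the platform, so treat this as illustrative):

# usermod -g dba oradev
# chown -R oradev:dba $ORACLE_HOME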

Now we had a smooth restore, and below is the log from the NetBackup master server.


07:53:11.923 [9423] <2> db_valid_client: -all clients valid-
07:53:11.924 [9423] <2> fileslist: sockfd = 9
07:53:11.924 [9423] <2> fileslist: owner = oradev
07:53:11.924 [9423] <2> fileslist: group = dba
07:53:11.924 [9423] <2> fileslist: client = prd-bkp
07:53:11.924 [9423] <2> fileslist: sched_type = 12.

Wednesday, September 8, 2010

Oracle Virtual Index

Oracle has an undocumented feature which helps to test whether a query would use a specific index before you actually create it during a tuning exercise. It is a fake index, so it is very fast to create for the initial test and does not occupy space. Given below is a very simple example of testing a virtual index.



1.Create a test table for the activity.

create table test as select * from dba_objects;

2.Collect the statistics for the table

EXEC DBMS_STATS.gather_table_stats(user, 'TEST')

3.Check the plan for the query

explain plan for
select * from test where object_id=200;

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 98 | 196 (1)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| TEST | 1 | 98 | 196 (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------

1 - filter("OBJECT_ID"=200)

13 rows selected.

4.Create a virtual index with the nosegment clause.

create index test_idx1 on test(object_id) nosegment;

5.Alter session to use the virtual index

ALTER SESSION SET "_use_nosegment_indexes" = TRUE;

6.Collect statistics for the table
Note: The statistics will not be populated for the index.

EXEC DBMS_STATS.gather_table_stats(user, 'TEST', cascade => TRUE);


7.Check the virtual index exists
Note: The information is not populated in DBA_INDEXES but in DBA_OBJECTS

SQL> SELECT o.object_name AS fake_index_name FROM user_objects o
WHERE o.object_type = 'INDEX' AND NOT EXISTS
(SELECT null FROM user_indexes i WHERE o.object_name = i.index_name );

FAKE_INDEX_NAME
------------------------------------------------------------------------
TEST_IDX1


8.Check the plan to see if the index is used.

explain plan for
select * from test where object_id=200;

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 2624864549

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 98 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 98 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_IDX1 | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
2 - access("OBJECT_ID"=200)

14 rows selected.


9.Drop the virtual index and create a real one if it helps.

drop index test_idx1;
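
If the plan above confirms the index helps, the natural follow-up (not part of the original example) is to create the real index in its place and gather its statistics:

create index test_idx1 on test(object_id);
EXEC DBMS_STATS.gather_index_stats(user, 'TEST_IDX1');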