
Be aware of these environment variables in .bashrc et al.

This is a quick post about one of my pet peeves: statically setting environment variables in .bashrc or other shells' equivalents. I have been bitten by this a number of times. Sometimes it's my own code, as in this story.

Background

Many installation instructions for Oracle version x tell you to add variables to your shell session when you log in. What's meant as a convenience can backfire. Sure, it's nice to have ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, CLASSPATH etc. set automatically without having to find out about them the hard way. However, there are situations where this doesn't help.

A few years back I migrated Clusterware from 10.2.0.4 to 11.2.0.1. It went pretty well for DEV, INT, UAT and DR, but not for PROD (d'oh!). Why not? It took a little while to figure out, but Oracle assumes that variables such as ORA_CRS_HOME are NOT set, when in my case they were. The full story is told on MOS: NETCA & ASMCA Fail during Upgrade of CRS/ASM to Grid Infrastructure 11gR2 (Doc ID 952925.1). Thankfully it was recoverable.

My case

I have a script on my lab servers which allows me to set the environment for a particular database, including ORA_NLS10, NLS_DATE_FORMAT, and the usual suspects such as ORACLE_HOME, ORACLE_SID etc. The bug I identified this morning was a copy-and-paste error for one of the databases: instead of pointing to the 11.2.0.3 home where I wanted to create another database, the ORA_NLS10 variable pointed to the home of the database I had previously worked on, an 11.2.0.1 home. I didn't know that though when I performed the following steps:

DBCA crashes

I normally execute dbca in silent mode in a screen session for database creation. In this case, it silently aborted, with no error output that I could make out. Checking $ORACLE_BASE/cfgtoollogs/ I noticed a trace file with the following contents:

Could not connect to ASM due to following error:
 ORA-03113: end-of-file on communication channel

Well, that's no good. There wasn't a database alert.log available to check, but there were entries in the ASM alert.log for the core dump:

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x57598BCB] [PC:0xA635B06, lxcsu22m()+22]
 [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_3742.trc  (incident=281963):
ORA-07445: exception encountered: core dump [lxcsu22m()+22] [SIGSEGV] [ADDR:0x57598BCB]
 [PC:0xA635B06] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_281963/+ASM1_ora_3742_i281963.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Checking the file it referenced didn't give me much of a clue, and the call stack in the incident didn't tell me much either. A quick search on MOS revealed that core dumps with lxcsu22m in the stack hint at invalid NLS settings. I didn't recall having set NLS parameters in my session, and then it dawned on me: env | grep -i nls revealed that ORA_NLS10 was pointing to the 11.2.0.1 home I had previously worked on. Unsetting the variable allowed DBCA to complete without error.

Lesson learned

I have added a function called cleanEnv() to my environment script which explicitly unsets all Oracle-related environment variables before setting new ones. It is also called first thing when logging in to avoid this kind of situation. I also assume that ORACLE_HOME is correct, check other paths against it, and raise an error on mismatches.
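To illustrate, here is a minimal sketch of what such a cleanup-and-check approach might look like in bash. The variable list and function names are my assumptions, not the actual script:

```shell
# Hypothetical sketch of cleanEnv(): unset Oracle-related variables
# before setting new ones, so nothing lingers from a previous session.
cleanEnv() {
    unset ORACLE_HOME ORACLE_SID ORACLE_BASE ORA_CRS_HOME \
          ORA_NLS10 NLS_DATE_FORMAT NLS_LANG TNS_ADMIN \
          LD_LIBRARY_PATH CLASSPATH
}

# Sanity check: a path-style variable such as ORA_NLS10 should live
# under ORACLE_HOME; raise an error instead of silently carrying a
# stale value from another home.
checkEnv() {
    if [ -n "${ORA_NLS10}" ] && [ "${ORA_NLS10#$ORACLE_HOME}" = "${ORA_NLS10}" ]; then
        echo "ERROR: ORA_NLS10 (${ORA_NLS10}) is not under ORACLE_HOME (${ORACLE_HOME})" >&2
        return 1
    fi
    return 0
}
```

Calling cleanEnv at the top of the environment script (and at login) guarantees that nothing from a previous session survives into the new one.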

Clusterware and listener management gotcha in 11.2

I have come across an interesting situation recently and thought it was worth blogging about. My friend Doug Burns might like it, as it has to do with consolidation.

Background

I have seen quite a few sites in my career where separation (of duties/listeners/disk space/log destinations) was paramount, and for good reason! In fact Oracle promotes it as well, as a quick search with your favourite search engine will show. In my example I came across a system that used a different listener per database, which is very common and prevents users from “accidentally” connecting to the wrong system. If you are using such a setup, please read on. If you are not using Oracle Restart/Clusterware/RAC then this is not immediately relevant to your Oracle estate.

The case is easy to reproduce. All you need is a Grid Infrastructure setup and a database; in my case it's Oracle Restart 11.2.0.3.0 plus an 11.2.0.3.0 single instance database. I would classify the behaviour you are going to see here as a bug, but it will probably come back from development as “not a bug”. A quick test revealed it is actually fixed in 12.1.

Test Case

My test case requires quite a few listeners. Oracle Restart comes with the default LISTENER on port 1521 without you having to do anything. My setup uses the same OS account for the Oracle Restart and RDBMS homes. If you use a separate grid user for Oracle Restart, the log file to tail will be different, but the problem is the same.

Back to the test case, which is not recommended to reproduce outside your lab VM!

Begin by creating a few more listeners, maybe like this:

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl add listener -l lsnr$i -p "tcp:152$i"; done

In a different (screen?) session navigate to the oraagent log file. In a Grid Infrastructure environment the listener(s), among many other things, are so-called resources. Each resource has a profile which tells Clusterware what to do with it. If you have never seen one, here is the resource profile for the ninth listener just created:

[oracle@asmtest oraagent_oracle]$ crsctl stat res ora.LSNR9.lsnr -p
NAME=ora.LSNR9.lsnr
TYPE=ora.listener.type
ACL=owner:oracle:rwx,pgrp:dba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=CRS resource type for the Listener
ENABLED=1
ENDPOINTS=TCP:1529
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1529
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_OPI=false
VERSION=11.2.0.3.0

There are lots of interesting things to see here. Once you understand that Clusterware is actually just a framework to deal with resource metadata, it becomes a lot less complex and you enter a land of possibilities!

Anyway, there is an AGENT_FILENAME attribute which points to the oraagent binary. In other words, the oraagent binary is responsible for managing the resource. Depending on your installation type the agent logs are either in

  • $ORACLE_HOME/log/$(hostname)/agent/ohasd/oraagent_oracle/ for Oracle Restart or
  • $ORACLE_HOME/log/$(hostname)/agent/crsd/oraagent_oracle/ for a clustered installation.

Should the grid owner be different you will see a directory oraagent_<grid_owner>. Since the listeners are controlled from the $GRID_HOME, the grid owner owns the log files: simple!

Anyway, tail the oraagent_oracle.log file now in a separate session.

By now the creation of listeners 2-9 should have completed. Before we can see them in action they need to be started. Are you tailing that log file?

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl start listener -l lsnr$i ; done

After a little while, and a few hundred KB of log file later, you should see the listeners:

[oracle@asmtest ~]$ ps -ef | grep tnslsnr
oracle    3930     1  0 11:24 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
oracle    5920     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR2 -inherit
oracle    5953     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR3 -inherit
oracle    5986     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR4 -inherit
oracle    6020     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR5 -inherit
oracle    6056     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR6 -inherit
oracle    6089     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR7 -inherit
oracle    6130     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR8 -inherit
oracle    6165     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR9 -inherit
oracle    6452  4786  0 11:35 pts/1    00:00:00 grep tnslsnr
[oracle@asmtest ~]$

Clusterware will also tell you they exist:

[oracle@asmtest ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): asmtest
Listener LSNR2 is enabled
Listener LSNR2 is running on node(s): asmtest
Listener LSNR3 is enabled
Listener LSNR3 is running on node(s): asmtest
Listener LSNR4 is enabled
Listener LSNR4 is running on node(s): asmtest
Listener LSNR5 is enabled
Listener LSNR5 is running on node(s): asmtest
Listener LSNR6 is enabled
Listener LSNR6 is running on node(s): asmtest
Listener LSNR7 is enabled
Listener LSNR7 is running on node(s): asmtest
Listener LSNR8 is enabled
Listener LSNR8 is running on node(s): asmtest
Listener LSNR9 is enabled
Listener LSNR9 is running on node(s): asmtest
[oracle@asmtest ~]$

Now here is the reason for the blog post: are you still tailing the log? It's probably flying by right now :) I noticed the following in the log file:

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} Value of LOCAL_LISTENER is
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1522))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1523))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1524))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1525))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1526))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1527))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1528))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1529))))' SCOPE=MEMORY SID='orcl' /* db agent *//* {0:0:3198} */
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} DbAgent::DedicatedThread::setLocalListener Exception OCIException
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} InstConnection:~InstConnection: this 3417da90
2013-09-10 11:36:48.673: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib: attr GEN_START_OPTIONS clsagfw attribVal nomount clscrs tmpAttrValue open
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib cmdid 0x0107 checktype 0xffff
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib modifyResource attr GEN_START_OPTIONS value nomount
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib clsagfw_modify_attribute attr GEN_START_OPTIONS value nomount retCode 1
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} DbAgent::DThread getConnxn
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr UsrOraEnv  m_oracleHome /u01/app/oracle/product/11.2.0/dbhome_1 Crshome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr LIBRARY_PATH1 ,LD_LIBRARY_PATH=
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle)(ARGV0=oracleorcl)(ENVS='ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1,ORACLE_SID=orcl,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=orcl)))
2013-09-10 11:36:48.697: [ USRTHRD][666883840] {0:0:3198} InstConnection::connectInt: server not attached
2013-09-10 11:36:48.715: [ USRTHRD][666883840] {0:0:3198} In DedicatedThread::generateLocalListener
2013-09-10 11:36:48.716: [ USRTHRD][666883840] {0:0:3198} m_crsHome = /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} Add endpoint from ora.LISTENER.lsnr to LOCAL_LISTENER
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0

You need to scroll, sorry; check the sqlStmt. This is the agent's way of telling me that the local listener of the database ORCL isn't set and that it wants to set it. But it can't, because the ALTER SYSTEM command fails with ORA-32021: parameter value longer than 255 characters. Bummer, better set this manually! As you can see, Clusterware tries to set the local listener to ALL the ports, 1521-1529, and fails. The same message will be repeated thousands of times.

So what?

Bug or feature? I don't really care, but it is something to be aware of, at least until you migrate to 12.1. If your consolidation platform uses lots of listeners (and by lots I mean more than one), please set the local_listener parameter in the databases to prevent this from happening. But also be aware that the listener resource profile dictates a check every 60 seconds. Each check produces log information that is written to file, and the amount of CPU spent performing this task is proportional to the number of listeners.
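One way to stay under the 255-character limit when setting it manually, sketched here under the assumption of a tnsnames.ora alias called LISTENERS_ORCL in the RDBMS home, is to hide the long address list behind a TNS alias and point local_listener at that instead:

```
# tnsnames.ora in the RDBMS home (alias name is an assumption)
LISTENERS_ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.110)(PORT = 1521))
      # ... addresses for ports 1522 to 1528 go here ...
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.110)(PORT = 1529))))

-- then, in the database:
ALTER SYSTEM SET local_listener='LISTENERS_ORCL' SCOPE=BOTH SID='orcl';
```

The parameter value is now just the alias name, comfortably under 255 characters, and this is the manual setting recommended above.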

This is not necessarily a problem but might help you find out why a server has a high load average and why oraagent_{oracle|grid} ranks high in the “top” output.

DBMS_FILE_TRANSFER: potentially cool, but then it is not

This post is interesting for all those of you who plan to transfer data files between database instances. Why would you consider this? Here's an excerpt from the official 12.1 package documentation:

The DBMS_FILE_TRANSFER package provides procedures to copy a binary file within a database or to transfer a binary file between databases.

But it gets better:

The destination database converts each block when it receives a file from a platform with different endianness. Datafiles can be imported after they are moved to the destination database as part of a transportable operation without RMAN conversion.

So that’s a way not only to copy data files from one database to another but it also allows me to get a file from SPARC and make it available on Linux!

Before you get too excited like I did, here's the catch: if the source file is in ASM you are kind of stuffed. Consider the below (11.2.0.3.0) example, created with a bog-standard dbca “General Purpose” database. I have created the directories and the database link, and now want to get the files from database “source” to database “dbmsft”:

begin
 dbms_file_transfer.get_file('dbmsft_src1','system.256.823861409','source','dbmsft_dst1','system.256.823861409');
 dbms_file_transfer.get_file('dbmsft_src1','sysaux.257.823861409','source','dbmsft_dst1','sysaux.257.823861409');
 dbms_file_transfer.get_file('dbmsft_src1','undotbs1.258.823861411','source','dbmsft_dst1','undotbs1.258.823861411');
 dbms_file_transfer.get_file('dbmsft_src1','users.259.823861411','source','dbmsft_dst1','users.259.823861411');
 dbms_file_transfer.get_file('dbmsft_src2','nonomf_01.dbf','source','dbmsft_dst2','nonomf_01.dbf');
end;
/
begin
*
ERROR at line 1:
ORA-19504: failed to create file
"+DATA/dbmsft/datafile/system.256.823861409"
ORA-17502: ksfdcre:4 Failed to create file
+DATA/dbmsft/datafile/system.256.823861409
ORA-15046: ASM file name '+DATA/dbmsft/datafile/system.256.823861409' is
not in single-file creation form
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 37
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 132
ORA-06512: at line 2

And yes, that's not a bug, it's a feature, the same as with creating tablespaces: you simply cannot specify a fully qualified ASM file name when creating a file. You can of course use an alias name, but that's not pretty, is it? I hate mixing non-OMF and OMF file names in ASM.
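For completeness, here is what the alias-based route might look like; the destination file name below is an assumption. The destination directory object points into ASM, and the destination file name is a plain alias rather than a fully qualified OMF name:

```sql
-- destination directory object, created beforehand as in the example:
-- CREATE DIRECTORY dbmsft_dst1 AS '+DATA/dbmsft/datafile';
BEGIN
  -- an alias-style name such as system_01.dbf is in single-file
  -- creation form; the OMF name system.256.823861409 is not
  dbms_file_transfer.get_file(
    source_directory_object      => 'dbmsft_src1',
    source_file_name             => 'system.256.823861409',
    source_database              => 'source',
    destination_directory_object => 'dbmsft_dst1',
    destination_file_name        => 'system_01.dbf');
END;
/
```

The obvious downside is exactly the one lamented above: the copy ends up with a non-OMF alias name in ASM.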

Need to check if that’s possible in 12c…

I am speaking at AIM & Database Server combined SIG

If you haven't thought about attending UKOUG's AIM & Database Server combined SIG, you definitely should! The agenda is seriously good, with many well-known speakers and lots of networking opportunities. It's also one of the few events outside an Oracle office.

I am speaking about how you can use incremental backups to reduce cross-platform transportable tablespace (TTS) downtime:

A little-known MOS note describes how you can potentially reduce downtime by a significant amount when migrating databases between systems with different endianness. Many sites are looking for ways to move their databases from big-endian platforms such as HP-UX, AIX or Solaris/SPARC to an Exadata system running Linux. New functionality called cross-platform incremental backup, in conjunction with the already-known cross-platform transportable tablespaces, makes this possible. When using incrementally rolled-forward backups the source system stays up while the destination database is instantiated. Further incremental backups are then applied to the destination to keep it current, while the source system stays fully available.

At the time of the cut-over only a final incremental backup needs to be transferred to the destination host, which is significantly smaller than the full set normally transferred. The window during which the source tablespaces have to be offline is often a lot smaller than with the traditional TTS process.

Hope to see you there!

An introduction to Policy Managed Databases in 11.2 RAC

I just realised this week that I haven't really detailed anything about policy-managed RAC databases. I remember having done some research about server pools way back when 11.2.0.1 came out. I promised to spend some time looking at the new type of database that comes with server pools, policy-managed databases, but somehow didn't get around to it. Since I'm lazy I'll refer to these databases as PMDs from now on, as it saves a fair bit of typing.

So how are PMDs different from Administrator Managed Databases?

First of all, you can have PMDs with RAC only, i.e. in a multi-instance active/active configuration. Before 11.2 RAC, you had to tie an Oracle instance to a cluster node. This is why you see instance prefixes in a RAC spfile. Here is an example from my lab 11.2.0.3.6 cluster:

DEMO1.__db_cache_size=1073741824
[...]
DEMO2.__streams_pool_size=0
*.audit_file_dest='/u01/app/oracle/admin/DEMO/adump'
*.audit_trail='db'
*.cluster_database=true
*.compatible='11.2.0.0.0'
*.control_files='...'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='DEMO'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4558159872
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=DEMOXDB)'
DEMO2.instance_number=2
DEMO1.instance_number=1
*.nls_language='ENGLISH'
*.nls_territory='UNITED KINGDOM'
*.open_cursors=300
*.pga_aggregate_target=310378496
*.processes=300
*.remote_listener='rac11gr2scan.example.com:1521'
*.remote_login_passwordfile='exclusive'
*.sessions=335
*.sga_target=1610612736
DEMO1.thread=1
DEMO2.thread=2
DEMO1.undo_tablespace='UNDOTBS2'
DEMO2.undo_tablespace='UNDOTBS1'

Note that the instance_number, thread and undo tablespace are manually (=administrator) managed. If these aren't set, or are configured incorrectly, you will run into all sorts of fun.
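For contrast, a policy-managed database is placed in a server pool instead of being pinned to named nodes. A sketch of the commands involved follows; the pool name and sizes are assumptions:

```
# create a server pool that may hold between one and two servers
$ srvctl add srvpool -g demo_pool -l 1 -u 2

# register the database against the pool rather than against nodes
$ srvctl add database -d DEMO -o /u01/app/oracle/product/11.2.0/dbhome_1 -g demo_pool
```

Instances of such a database are named DEMO_1, DEMO_2 and so on, and Clusterware takes care of instance numbers, threads and undo tablespaces for you.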

4k sector size and Grid Infrastructure 11.2 installation gotcha

Some days are just too good to be true :) I ran into an interesting problem trying to install 11.2.0.3.0 Grid Infrastructure for a two node cluster. The storage was presented via iSCSI, which turned out to be a blessing and the inspiration for this blog post. So far I haven't found out how to create “shareable” LUNs in KVM the same way I did successfully with Xen. I wouldn't recommend general-purpose iSCSI for anything besides lab setups though. If you want network-based storage, go and use 10GBit/s Ethernet and either FCoE or (direct) NFS.

Here is my setup. Storage is presented in 3 targets using tgtd on the host:

  1. Target 1 contains 3×2 GB LUNs for OCR and voting disks in normal redundancy.
  2. Target 2 contains 3×10 GB LUNs for +DATA
  3. Target 3 contains 3×10 GB LUNs for +RECO

iSCSI initiators are Oracle Linux 6.4 on KVM, with the host running OpenSuSE 12.3 providing the iSCSI targets. Yes, I know I'm probably the only Oracle DBA running SuSE, but in my defence I have a similar system with Oracle Linux 6.4 throughout, and both work.

So besides the weird host OS there is nothing special. Since I'm lazy sometimes and don't particularly like udev, I decided to use ASMLib for device name persistence on the iSCSI LUNs. This turned out to be crucial; otherwise I'd never have written this post.


Grid Infrastructure And Database High Availability Deep Dive Seminars 2013

So this is a little bit of a plug for myself and Enkitec, but I'm running my Grid Infrastructure And Database High Availability Deep Dive Seminars again for Oracle University. This time the events are online, so there is no need to come to a classroom at all.

Here is the short description of the course:

Providing a highly available database architecture fit for today’s fast changing requirements can be a complex task. Many technologies are available to provide resilience, each with its own advantages and possible disadvantages. This seminar begins with an overview of available HA technologies (hard and soft partitioning of servers, cold failover clusters, RAC and RAC One Node) and complementary tools and techniques to provide recovery from site failure (Data Guard or storage replication).

In the second part of the seminar, we look at Grid Infrastructure in great detail. Oracle Grid Infrastructure is the latest incarnation of the Clusterware HA framework which successfully powers every single 10g and 11g RAC installation. Despite its widespread implementation, many of its features are still not well understood by its users. We focus on Grid Infrastructure, what it is, what it does and how it can be put to best use, including the creation of an active/passive cold failover cluster for web and database resources.

If you are interested, I would like to invite you to head over to the Oracle University website here, which has a more extensive synopsis and all the detail you need:

http://education.oracle.com/pls/web_prod-plq-dad/db_pages.getCourseDesc?dc=D81641_2043034

UPDATE: I received several emails and comments that the above link does not work. I couldn't reproduce this until today. It appears to be an issue with the country selection: if you have USA selected in the top right corner the link won't work; switching to United Kingdom (my preference) will fetch the course detail. I don't quite understand why that is, since the class is virtual and not dependent on a country…

I hope to hear from you during the course!

Limiting the Degree of Parallelism via Resource Manager and a gotcha

This might be something very obvious for the reader, but I had an interesting revelation recently when implementing parallel_degree_limit_p1 in a resource consumer group. My aim was to prevent users mapped to a resource consumer group from executing any query in parallel. The environment is fictional, but let's assume it is possible that maintenance operations, for example, leave indexes and tables decorated with a PARALLEL x attribute. Another common case is restricting PQ resources to prevent users from consuming all of the machine's resources.

This can happen when you rebuild an index in parallel to speed the operation up, for example. However, the DOP stays with the index after the maintenance operation, and you have to set it back explicitly:

SQL> alter index I_T1$ID1 rebuild parallel 4;

Index altered.

SQL> select index_name,degree from ind where index_name = 'I_T1$ID1';

INDEX_NAME		       DEGREE
------------------------------ ----------------------------------------
I_T1$ID1		       4

SQL>
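As a reminder of how the limit itself is put in place, here is a sketch of a plan directive using parallel_degree_limit_p1; the plan and consumer group names are made up. A limit of 1 effectively forbids parallel execution for the group:

```sql
BEGIN
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_plan_directive(
    plan                     => 'DAYTIME_PLAN',
    group_or_subplan         => 'REPORTING_GROUP',
    comment                  => 'no parallel execution for this group',
    parallel_degree_limit_p1 => 1);
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
END;
/
```

The directive only takes effect once the plan is active and the users are actually mapped to the consumer group, of course.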


How to set up data guard broker for RAC

This is pretty much a note to myself on how to set up Data Guard broker for RAC 11.2.0.2+.

UPDATE: Please note that a lot of this has changed in 12.1, as described in a different blog post of mine (Little things worth knowing: Data Guard Broker Setup changes in 12c). If you are looking for information on how to implement this in 12.1, please continue on the other post. If you are on Oracle 11.2, please read on :)

The Test Environment

The tests have been performed on Oracle Linux 5.5 with the Red Hat kernel; Oracle was 11.2.0.2. Sadly my lab server didn't support more than 2 RAC nodes, so everything has been done on the same cluster. It shouldn't make a difference though (if it does, please let me know).

WARNING: there are some rather deep changes to the cluster here, be sure to have proper change control around making such amendments as it can cause outages! Nuff said.

Unfortunately I didn't take notes of the configuration as it was before, so the post is going to be a lot shorter and less dramatic, but it's useful as a reference (I hope) nevertheless. Now what's the situation? Imagine you have a two node RAC cluster with separation of duties in place: “grid” owns the GRID_HOME, while “oracle” owns the RDBMS binaries. Imagine further you have two RAC databases, ORCL and STDBY. STDBY has only just been duplicated for standby, so there is nothing in place which links the two together.
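Once listeners and log shipping are prepared, the broker configuration itself boils down to a short DGMGRL session, sketched here with connect identifiers that are assumptions:

```
DGMGRL> CREATE CONFIGURATION 'racdg' AS PRIMARY DATABASE IS 'orcl' CONNECT IDENTIFIER IS orcl;
DGMGRL> ADD DATABASE 'stdby' AS CONNECT IDENTIFIER IS stdby;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
```

The RAC-specific work is in what comes before: dg_broker_start must be true on all instances, and the broker configuration files should live on shared storage such as ASM so that every node can see them.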


Kernel UEK 2 on Oracle Linux 6.2 fixed lab server memory loss

A few days ago I wrote about my new lab server and the misfortune with kernel UEK (aka 2.6.32 + backports). It simply wouldn’t recognise the memory in the server:

# free -m
             total       used       free     shared    buffers     cached
Mem:          3385        426       2958          0          9        233
-/+ buffers/cache:        184       3200
Swap:          511          0        511

Ouch. Today I gave it another go, especially since my new M4 SSD had arrived. My first idea was to upgrade to UEK2, and indeed, following the instructions on Wim Coekaerts's blog (see references), it worked:

[root@ol62 ~]# uname -a
Linux ol62.localdomain 2.6.39-100.7.1.el6uek.x86_64 #1 SMP Wed May 16 04:04:37 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@ol62 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         32221        495      31725          0          5         34
-/+ buffers/cache:        456      31764
Swap:          511          0        511

Note the 2.6.39-100.7.1! The kernel is actually past that, a 3.x release, but to preserve compatibility with software that parses the kernel revision number as three tuples, Oracle decided to stick with 2.6.39. But then the big distributions don't really follow the mainline kernel numbers anyway.

Now, can anyone tell me whether UEK2 is out of beta? I know it's not supported for the database yet, but it's a cool kernel release and I can finally play around with the “perf” utility Kevin Closson and Frits Hoogland have been writing so much about recently.