Little things worth knowing: scheduler jobs and Instance Caging

While updating material for a training class about Database Resource Management I briefly wondered whether Instance Caging applies to scheduler jobs as well. The obvious answer is “yes”, but I wanted to find proof. I believe I did in the end, and want to share how I got there.

The test environment

My system is based on the shiny new AMD Ryzen 7 1700X Eight-Core Processor, and it shows as 1s8c16t. I really like it! My 12.2.0.1 database named DEMO is patched to August 2017 – 12.2.0.1.170814 to be precise. It’s difficult to test resource management, and specifically Instance Caging, with a dual-core laptop, so this machine should help!

Test setup

Apart from the database workload I am soon going to launch, the system is otherwise idle. I set cpu_count to 4 and made sure I had a resource manager plan active:

SQL> select name, value from v$parameter 
  2   where name in ('cpu_count','resource_manager_plan');

NAME                           VALUE                                             
------------------------------ ------------------------------------------------
cpu_count                      4                                                 
resource_manager_plan          SCHEDULER[0x4ABF]:DEFAULT_MAINTENANCE_PLAN        

For this case it doesn’t matter that my resource manager plan is associated with the maintenance window. All I needed was some resource manager plan. In production systems the situation is most often somewhat different and proper resource management is crucial.
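
For reference, enabling Instance Caging takes just two steps, and both parameters can be changed online. The sketch below assumes a test system; the plan name is only an example, any active resource manager plan will do:

-- limit the number of CPUs the instance may use
alter system set cpu_count = 4;

-- Instance Caging only kicks in while a resource manager plan is active
alter system set resource_manager_plan = 'DEFAULT_PLAN';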

The scheduler needs something to work with, and I opted for a stored procedure that needlessly burns CPU. Like this one:

SQL> create or replace procedure martin.burn_proc(
  2   pi_num_iterations number)
  3  as
  4   i number;
  5  begin
  6  for j in 1..pi_num_iterations loop
  7   i := dbms_random.random;
  8  end loop;
  9 end;
 10 /

Procedure created.

Now all I had to do was to create an anonymous block of PL/SQL scheduling a number of jobs. This little piece of code does just that:

SQL> !cat burn_sched.sql 
select to_char(sysdate,'dd.mm.yyyy hh24:mi:ss') now from dual;

show user

accept num_jobs number prompt 'how many jobs to schedule? '

prompt creating a number of scheduler jobs
declare
        i number;
begin
        for i in 1..&num_jobs loop
                dbms_scheduler.create_job(
                        job_name => 'martin.burn_proc_job' || i, 
                        job_type => 'PLSQL_BLOCK', 
                        job_action => 'begin burn_proc(1000000000); end;', 
                        start_date => systimestamp, 
                        enabled => true);
        end loop;
        dbms_lock.sleep(5);
end;
/

prompt checking for running jobs
select count(*) from dba_scheduler_running_jobs where owner = 'MARTIN';

Now it’s time to run the code!

SQL> @burn_sched

NOW
-------------------
05.11.2017 09:38:49

USER is "MARTIN"
how many jobs to schedule? 3
creating a number of scheduler jobs
old   4:        for i in 1..&num_jobs loop
new   4:        for i in 1..         3 loop

PL/SQL procedure successfully completed.

checking for running jobs

  COUNT(*)
----------
         3

With this first test I wanted to see what happens when keeping the number of jobs lower than the value I defined for cpu_count. Using Active Session History (ASH) as the source for performance data is probably the easiest way to analyse what happened.

Careful: using Active Session History (ASH) requires a Diagnostics Pack license!

Provided you are licensed to use ASH, a query such as the following provides valuable insights:

SQL> select count(*), event, session_state, session_type, username
  2  from v$active_session_history ash left join dba_users u on (ash.user_id = u.user_id)
  3  where module = 'DBMS_SCHEDULER'
  4  and sample_time > to_date('05.11.2017 09:38:49','dd.mm.yyyy hh24:mi:ss')
  5  group by event, session_state, session_type, username;

  COUNT(*) EVENT                SESSION SESSION_TY USERNAME
---------- -------------------- ------- ---------- ----------
        89                      ON CPU  FOREGROUND MARTIN

As you can see, none of the samples in ASH show any waits for CPU. Let’s increase the number of jobs to a value that exceeds my cpu_count of 4:

SQL> @burn_sched

NOW
-------------------
05.11.2017 09:51:45

USER is "MARTIN"
how many jobs to schedule? 5
creating a number of scheduler jobs
old   4:        for i in 1..&num_jobs loop
new   4:        for i in 1..         5 loop

PL/SQL procedure successfully completed.

checking for running jobs

  COUNT(*)
----------
         5

After these 5 jobs completed, I checked ASH again to see if there was a difference:

SQL> select count(*), event, session_state, session_type, username
  2  from v$active_session_history ash left join dba_users u on (ash.user_id = u.user_id)
  3  where module = 'DBMS_SCHEDULER'
  4  and sample_time > to_date('05.11.2017 09:51:45','dd.mm.yyyy hh24:mi:ss')
  5  group by event, session_state, session_type, username;

  COUNT(*) EVENT                          SESSION SESSION_TY USERNAME
---------- ------------------------------ ------- ---------- ----------
       153                                ON CPU  FOREGROUND MARTIN
        66 resmgr:cpu quantum             WAITING FOREGROUND MARTIN

And indeed, there are a number of Resource Manager CPU waits! To me this is proof enough that scheduler jobs also fall into the category of workload that can be caged.
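
If ASH is not an option licence-wise, Resource Manager’s own metric views tell a similar story at no extra cost. The sketch below queries v$rsrcmgrmetric_history, which keeps roughly an hour of per-minute data for each consumer group; throttling by Instance Caging shows up as non-zero cpu_wait_time:

-- minutes in which Resource Manager made sessions wait for CPU
select begin_time,
       consumer_group_name,
       cpu_consumed_time,
       cpu_wait_time,
       avg_running_sessions,
       avg_waiting_sessions
  from v$rsrcmgrmetric_history
 where cpu_wait_time > 0
 order by begin_time;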

12.2 New Feature: the FLEX ASM disk group part 3

In the previous two parts of this mini-series I introduced the FLEX ASM disk group and two related concepts, the Quota Group and the File Group. In what should have been the final part (but isn’t) I am interested in checking whether quotas are enforced.

(Un)fortunately I have uncovered a few more things that are worth investigating and blogging about, which is why a) this isn’t the last post and b) it is a bit shorter than the previous two. Had I combined parts 3 and 4 it would certainly have been too long … BTW, you can navigate all posts using the links at the very bottom of the page.

Are quotas enforced?

The purpose of the Quota Group is … to enforce quotas on a disk group, much like on a file system. This is quite interesting, because you now have a hard limit on how far databases can grow within a disk group, even for non-CDBs.

The question I set out to answer in this part is whether quotas are enforced. As you read in the previous post’s interlude, I have created 2 databases on the FLEX ASM disk group: ORCL and CDB, both container databases.

The current state of affairs for my File Groups is this:

ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name    Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      2488                                       
CDB_CDB$ROOT       FLEX        QG_CDB       6744           CDB_CDB$ROOT   DATABASE     
CDB_PDB$SEED       FLEX        QG_CDB       1656           CDB_PDB$SEED   DATABASE     
PDB1               FLEX        QG_CDB       1784           PDB1           DATABASE     
ORCL_CDB$ROOT      FLEX        GENERIC      9104           ORCL_CDB$ROOT  DATABASE     
ORCL_PDB$SEED      FLEX        GENERIC      1616           ORCL_PDB$SEED  DATABASE     
PDB1_0001          FLEX        GENERIC      9424           PDB1           DATABASE   

Database CDB is nicely tucked away in QG_CDB, but none of the ORCL database’s components are assigned to a dedicated Quota Group yet; they all live in GENERIC. I wanted another Quota Group, QG_ORCL, for my second CDB. Somehow I think that a Quota Group per database makes sense.

ASMCMD> mkqg -G FLEX QG_ORCL quota 20G
Diskgroup altered.
ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
2          1               GENERIC          1            22632          0               
2          2               QG_ORCL          7            0              20480           
2          3               QG_CDB           1            10184          20480           
ASMCMD> 

If you followed parts 1 and 2 you may have noticed that my FLEX disk group’s group_number changed from 5 to 2 after a server reboot.

SQL> select group_number, name, state, type from v$asm_diskgroup
  2  where group_number = 2;

GROUP_NUMBER NAME                           STATE       TYPE
------------ ------------------------------ ----------- ------
           2 FLEX                           CONNECTED   FLEX

Don’t let the change in numbers confuse you

Back to the example: File Groups ORCL_CDB$ROOT, ORCL_PDB$SEED and PDB1_0001 are not yet within QG_ORCL. This can be rectified using 3 simple mvfg commands in ASMCMD, or the corresponding SQL commands (shown after the session below). After the move commands completed, the Quota Group still has space left, although it’s very limited:

ASMCMD> mvfg -G flex --filegroup PDB1_0001 QG_ORCL
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup ORCL_PDB$SEED  QG_ORCL
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup ORCL_CDB$ROOT  QG_ORCL
Diskgroup altered.
ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name    Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      2488                                       
CDB_CDB$ROOT       FLEX        QG_CDB       6744           CDB_CDB$ROOT   DATABASE     
CDB_PDB$SEED       FLEX        QG_CDB       1656           CDB_PDB$SEED   DATABASE     
PDB1               FLEX        QG_CDB       1784           PDB1           DATABASE     
ORCL_CDB$ROOT      FLEX        QG_ORCL      9104           ORCL_CDB$ROOT  DATABASE     
ORCL_PDB$SEED      FLEX        QG_ORCL      1616           ORCL_PDB$SEED  DATABASE     
PDB1_0001          FLEX        QG_ORCL      9424           PDB1           DATABASE     
ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
2          1               GENERIC          1            2488           0               
2          2               QG_ORCL          7            20144          20480           
2          3               QG_CDB           1            10184          20480           
ASMCMD> 
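
If you prefer SQL over ASMCMD, the same can be achieved with ALTER DISKGROUP. A sketch of what I believe are the equivalent statements, to be run while connected to the ASM instance:

-- mkqg equivalent: create the quota group with a 20 GB limit
alter diskgroup flex add quotagroup qg_orcl set 'quota' = 20g;

-- mvfg equivalent: move a file group into the new quota group
alter diskgroup flex modify filegroup pdb1_0001 set 'quota_group' = 'qg_orcl';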

Now I just need to try to push it over the edge to see if the quota has any effect. This is quite simple: all I need to do is create another tablespace in ORCL’s PDB1:

SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> create tablespace userdata datafile size 500m;
create tablespace userdata datafile size 500m
*
ERROR at line 1:
ORA-01119: error in creating database file '+FLEX'
ORA-17502: ksfdcre:4 Failed to create file +FLEX
ORA-15437: Not enough quota available in quota group QG_ORCL.

SQL> 

This is similar to what we could already enforce using the storage clause in the create pluggable database command. But what about ASM file types that aren’t data files? Querying the database I can find quite a few of these:

SQL> get /tmp/non_data_files
  1  SELECT
  2      f.group_number as DG_NUMBER,
  3      f.file_number,
  4      f.incarnation as file_incarnation,
  5      f.type,
  6      fg.name as filegroup_name,
  7      a.name as file_name
  8  FROM
  9      v$asm_alias a,
 10      v$asm_file f,
 11      v$asm_filegroup fg
 12  WHERE
 13          a.group_number = f.group_number
 14      AND
 15          a.file_number = f.file_number
 16      AND
 17          a.file_incarnation = f.incarnation
 18      AND
 19          fg.group_number = f.group_number
 20      AND
 21          fg.filegroup_number = f.filegroup_number
 22      AND
 23*         f.type <> 'DATAFILE';
SQL> start /tmp/non_data_files

 DG_NUMBER FILE_NUMBER FILE_INCARNATION TYPE            FILEGROUP_NAME       FILE_NAME
---------- ----------- ---------------- --------------- -------------------- ------------------------------
         2         282        948453843 PASSWORD        DEFAULT_FILEGROUP    pwdorcl.282.948453843
         2         287        948462715 PASSWORD        DEFAULT_FILEGROUP    pwdorcl.287.948462715
         2         293        948462849 CONTROLFILE     ORCL_CDB$ROOT        Current.293.948462849
         2         292        948462849 CONTROLFILE     ORCL_CDB$ROOT        Current.292.948462849
         2         294        948462855 ONLINELOG       ORCL_CDB$ROOT        group_2.294.948462855
         2         295        948462855 ONLINELOG       ORCL_CDB$ROOT        group_1.295.948462855
         2         296        948462861 ONLINELOG       ORCL_CDB$ROOT        group_1.296.948462861
         2         297        948462861 ONLINELOG       ORCL_CDB$ROOT        group_2.297.948462861
         2         304        948463227 ONLINELOG       ORCL_CDB$ROOT        group_3.304.948463227
         2         305        948463231 ONLINELOG       ORCL_CDB$ROOT        group_3.305.948463231
         2         306        948463239 ONLINELOG       ORCL_CDB$ROOT        group_4.306.948463239
         2         307        948463243 ONLINELOG       ORCL_CDB$ROOT        group_4.307.948463243
         2         298        948462891 TEMPFILE        ORCL_CDB$ROOT        TEMP.298.948462891
         2         302        948462937 TEMPFILE        ORCL_PDB$SEED        TEMP.302.948462937
         2         308        948463249 PARAMETERFILE   ORCL_CDB$ROOT        spfile.308.948463249
         2         312        948464283 TEMPFILE        PDB1_0001            TEMP.312.948464283

16 rows selected.

Let’s take the online redo logs as an example and try to add another log file group to thread 1:

SQL> alter database add logfile thread 1 size 1g;
alter database add logfile thread 1 size 1g
*
ERROR at line 1:
ORA-00301: error in adding log file '+FLEX' - file cannot be created
ORA-17502: ksfdcre:4 Failed to create file +FLEX
ORA-15437: Not enough quota available in quota group QG_ORCL.


SQL> 

This proves that quotas are enforced, at least for a couple of very simple examples.

Summary Part 3

There is certainly a lot more to discover about Quota Groups: data files set to autoextend, archivelog growth up to a certain size, temp and undo tablespaces, etc. At first glance it looks good. The key, again, is to have suitable monitoring in place that warns DBAs about File Groups running out of space. I wonder if that’s part of Enterprise Manager; otherwise it’s easy to write such checks yourself, either as custom metrics in OEM or in tools such as Nagios.
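
As a starting point for such a check, the sketch below queries v$asm_quotagroup on the ASM instance and reports how full each Quota Group is; the threshold logic is obviously site-specific. Note that GENERIC shows a limit of 0, meaning no quota is enforced:

-- quota usage per Quota Group, fullest first
select name,
       used_quota_mb,
       quota_limit_mb,
       round(used_quota_mb / quota_limit_mb * 100, 1) as pct_used
  from v$asm_quotagroup
 where quota_limit_mb > 0
 order by pct_used desc;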

DOAG Exaday 2017: a preview

I am quite excited to have been accepted to speak at the upcoming DOAG Exaday taking place on June 20 in Frankfurt. It is the third time I have been lucky enough to present at the event, having spoken in both Frankfurt and Hamburg in previous years.

As you can probably imagine by looking at this weblog, I am a great supporter of the Oracle Engineered Systems community. My interest is not limited to Germany: I still remember the many great meetings in London in the context of UKOUG’s Exadays. I am also very lucky to work for @enkitec where we have been sharing real-world experience at E4 for years and will do so again in June this year.

Unfortunately interest in the UKOUG’s Systems Event has dwindled a bit recently, which led me to focus on the Exaday in Germany this year. I am going to speak – in German – about how you can make your migrations to Exadata easier. This is a talk I have delivered a few times before, but I leaped at the opportunity to update the material for release 12.2.

I am very glad to see my colleague @fritshoogland and many other great speakers too numerous to mention at the event. If you are an engineer working on Oracle Exadata and its cousins this is definitely the event to attend.

Little things worth knowing: Executing RDA on RAC

Result! I have finally been able to gather a complete RDA (Oracle Remote Diagnostic Agent) output on my 2-node RAC system. After consulting the relevant documentation on MOS (which is spread over at least 42 Doc IDs) I found it not to be very helpful; some of what I read is actually wrong or contradictory. I put together a short note, primarily for myself, to remind me of the process. I hope you find it useful, too.

The RDA version I used for this post is 8.14.x, downloaded from MOS on March 4th, 2017. My RAC nodes are based on Oracle Linux 7.3/UEK 4.

Starting the data collection

RDA is one of the standard tools I use, and I have previously blogged about it. RDA started off as a simple-to-use tool, but after having used it for some years I started to run into issues with the interactive configuration, which simply took too long to complete. As soon as I learned about profiles I fell in love with them: RDA profiles only prompt you for relevant information about the product you want to collect data for.

Running RDA on a RAC system is similar to single instance, and you automatically use profiles-a nice touch. It appears to me that RAC data collection is triggered via SSH on the remote nodes with the results being transferred to the local node.

I believe there are two methods of data gathering on RAC: one for Grid Infrastructure, and another one for the RDBMS layer. Depending on your settings of ORACLE_HOME and ORACLE_SID different parts of the stack are analysed. In this post I was primarily interested in running RDA for the RDBMS part of my cluster.

I started off by downloading the current RDA version to the system I want to run it on. This is a 2-node RAC, with nodes named rac12pri1 and rac12pri2. Both are based on Oracle Linux 7.3 with the most current UEK4 at the time of writing. My RDBMS homes are 12.1.0.2 patched with the January 2017 proactive bundle patch + OJVM.

The first caveat looms right at the beginning of the entire process: I didn’t find it stated explicitly in the documentation on MOS, but it seems that you need to deploy RDA on all nodes in the cluster, and in the exact same location, before you start the data collection. I use /home/oracle/rda for that purpose. Make sure you have plenty of space in the location you choose; sometimes /home doesn’t provide enough for larger, more active systems.
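
Deploying to the other node boils down to copying the unpacked directory. A sketch assuming passwordless SSH between the nodes (a given on RAC) and my directory layout:

# copy the unpacked RDA directory to the same location on the second node
scp -rpq /home/oracle/rda rac12pri2:/home/oracle/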

During my first unsuccessful attempts I didn’t deploy RDA on all nodes before starting the data collection, only to see it not gather any information from the remote node. This is somewhat confusing, because the output (when collecting data) states this:

NOD002: Installing RDA software

NOD002 is short for the second node in my cluster.

Also, when configuring the data gathering process, you will see a prompt like this one:

Enter the absolute path of the directory where RDA will be installed on the remote nodes.

To the casual observer like me these messages suggest that RDA is actually installed on the remote nodes as part of the data collection process, but that is only partially true. Comparing directory sizes between the remote and local nodes revealed /home/oracle/rda to be greater than 200 MB locally, while on the remote node it was only 37 MB in size. Something seems to be missing…

Once deployed, you can change to the RDA directory and prepare for the data collection. You actually execute RDA on the first node; the rest of the work is done programmatically. The MOS note seems to be correct this time; here is an example of my configuration session:

[oracle@rac12pri1 rda]$ ./rda.pl -vXRda start CLOUD -pRac_Assessment
Creating collection "output" ...
        - Trying to identify the domain ... (can take time)
 Inside DFT scenario, define the input and profile targets
 Inside DFT scenario, check Oracle home or Middleware presence
        - RDA:DCbegin ...
------------------------------------------------------------------------------
RDA.BEGIN: Initializes the Data Collection
------------------------------------------------------------------------------
Enter the Oracle home to be used for data analysis
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

        - RDA:DCconfig ...
------------------------------------------------------------------------------
RDA.CONFIG: Collects Key Configuration Information
------------------------------------------------------------------------------
        - RDA:DCocm ...
------------------------------------------------------------------------------
RDA.OCM: Set up the Configuration Manager Interface
------------------------------------------------------------------------------
        - RDA:DCstatus ...
------------------------------------------------------------------------------
RDA.STATUS: Produces the Remote Data Collection Reports
------------------------------------------------------------------------------
        - RDA:DCload ...
------------------------------------------------------------------------------
RDA.LOAD: Produces the External Collection Reports
------------------------------------------------------------------------------
        - RDA:DCfilter ...
------------------------------------------------------------------------------
RDA.FILTER: Controls Report Content Filtering
------------------------------------------------------------------------------
        - RDA:DCend ...
------------------------------------------------------------------------------
RDA.END: Finalizes the Data Collection
------------------------------------------------------------------------------
In CRS start scenario, getting nodes using /u01/app/12.1.0.2/grid/bin/olsnodes
2>/dev/null
------------------------------------------------------------------------------
Requesting common cluster information
------------------------------------------------------------------------------
Enter the absolute path of the directory where RDA will be installed on the
remote nodes.
Press Return to accept the default (/home/oracle/rda)
>

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

In CRS start scenario, getting local node information
In CRS start scenario, pinging node rac12pri2
------------------------------------------------------------------------------
Requesting information for node rac12pri1
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri1
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri1
Press Return to accept the default (NCDBA1)
>

------------------------------------------------------------------------------
Requesting information for node rac12pri2
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri2
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri2
> NCDBA2

        - RDA:DCremote ...
------------------------------------------------------------------------------
RAC Setup Summary
------------------------------------------------------------------------------
Nodes:
. NOD001  rac12pri1/NCDBA1
. NOD002  rac12pri2/NCDBA2
2 nodes found

So RDA understands the RAC scenario, and it gathers data for each node that is part of the cluster as reported by the olsnodes tool. There was nothing really surprising about the prompts, except that I haven’t found a way to analyse more than one database, or ASM and a database together.

Be careful to enter the correct ORACLE_SID for your (remote) RAC nodes. RDA picks up your ORACLE_HOME and ORACLE_SID if they are set.
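
To be on the safe side, set both explicitly before starting the configuration. A trivial sketch using the values from my first node:

# make sure RDA proposes the right defaults
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
export ORACLE_SID=NCDBA1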

Optionally verify the correct number of nodes was detected

I am only mentioning this for the sake of completeness; the RAC Setup Summary you saw in the previous step was more than adequate for me. If you really want to find out about the nodes you are about to collect information from, be careful: MOS Doc ID 359395.1 is wrong – the command ./rda.pl -vC RDA.REMOTE list will not only list the nodes, it will also start the data collection. Use this one instead, which I found in MOS Doc ID 1682909.1:

[oracle@rac12pri1 rda]$ ./rda.pl -XRemote list
Defined nodes:
  NOD001  rac12pri1  NCDBA1
  NOD002  rac12pri2  NCDBA2
[oracle@rac12pri1 rda]$

It merely lists the nodes, without actually starting to do any work.

Initiate data collection

When you are happy with the output and configuration, start collecting data. Here is an example of my session:

[oracle@rac12pri1 rda]$ ./rda.pl -v -e TRC/TRACE=1
Collecting diagnostic data ...
------------------------------------------------------------------------------
RDA Data Collection Started 04-Mar-2017 16:10:15
------------------------------------------------------------------------------
Processing RDA.BEGIN module ...
 Inside BEGIN module, testing the RDA engine code build
 Inside BEGIN module, testing the report directory
 Inside BEGIN module, testing the module targets
 Inside BEGIN module, launching parallel executions
Processing RDA.CONFIG module ...
 Inside CONFIG module, listing Oracle homes
 Inside CONFIG module, getting Oracle home inventory (can take time)
Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package
Processing RDA.END module ...
 Inside END module, gathering system information
 Inside END module, getting CPU information (linux)
 Inside END module, getting memory information (linux)
 Inside END module, producing the file catalog
 Inside END module, producing target overview
 Inside END module, waiting for parallel execution completion
 Inside END module, producing setting overview
------------------------------------------------------------------------------
RDA Data Collection Ended 04-Mar-2017 16:17:44
------------------------------------------------------------------------------
Generating the reports ...
        - collect/RDA_CONFIG_homes.txt ...
        - collect/RDA_CONFIG_oh_inv.txt ...
        - collect/RDA_END_system.txt ...
        - collect/RDA_END_files.txt ...
        - collect/RDA_END_target.txt ...
        - collect/RDA_END_report.txt ...
        - Report index ...
Packaging collection results ...

[...skipping a lot of text...]

  You can review the reports by transferring the /home/oracle/rda/output
  directory structure to a location where you have web-browser access.
Then, point your browser at this file to display the reports:
    RDA__start.htm

[...skipping a lot of text...]

It is crucially important to see these lines:

Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package

In my first attempts, when I didn’t deploy RDA on all nodes myself, the lines “NOD002: Running RDA command” and “NOD002: Transfering report package” weren’t shown. Unsurprisingly no data was gathered on the remote nodes.

Viewing the output

At the end of the data collection you should see a *.tar.gz file per node. In my 2 node cluster setup, there are two:

[oracle@rac12pri1 rda]$ ls output/remote/
RDA_nod001_output.tar.gz  RDA_nod002_output.tar.gz

You can view these in your browser after extracting them to a temporary location; start with the file named RDA__start.htm, which presents the various parts of the report.
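
Unpacking them is straightforward; a sketch using the file names from my system:

# extract the first node's collection to a temporary location
mkdir -p /tmp/rda_review/nod001
tar -xzf output/remote/RDA_nod001_output.tar.gz -C /tmp/rda_review/nod001
# then open RDA__start.htm from that directory in a browser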

Further reading: Testing user equivalence prior to gathering data

While trying to troubleshoot my remote data gathering problems (I suspected a problem with SSH at first) I noticed that RDA offers test modules as well (see MOS Doc ID 330760.1). What’s missing from the Doc ID is an example of how to invoke the SSH test module; or rather, the command in the RAC-specific instructions seems not to work. But it isn’t too hard to figure out the proper call to the RDA executable. The following worked for me:

[oracle@rac12pri1 rda]$ ./rda.pl -T ssh
Processing remote operation test module ...

Command Availability:
  rcp          (Not found)
  remsh        (Not found)
  rsh          (Not found)
  scp          /bin/scp
  ssh          /bin/ssh
  ssh-add      /bin/ssh-add
  ssh-agent    /bin/ssh-agent

Related files:
  /etc/ssh/moduli
  /etc/ssh/ssh_config
  /etc/ssh/ssh_host_ecdsa_key
  /etc/ssh/ssh_host_ecdsa_key.pub
  /etc/ssh/ssh_host_ed25519_key
  /etc/ssh/ssh_host_ed25519_key.pub
  /etc/ssh/ssh_host_rsa_key
  /etc/ssh/ssh_host_rsa_key.pub
  /etc/ssh/sshd_config
  /home/oracle/.ssh/authorized_keys
  /home/oracle/.ssh/config
  /home/oracle/.ssh/config.backup
  /home/oracle/.ssh/id_rsa
  /home/oracle/.ssh/id_rsa.pub
  /home/oracle/.ssh/known_hosts

Check if an authentication agent has been started
  SSH_AGENT_PID=1
  SSH_AUTH_SOCK=1
Agent identities:
  1024 some:colums:separated:by:colons:indicating:the:ID
  /home/oracle/.ssh/id_rsa (RSA)

Driver Availability:
    da Available
  jsch Available
   ssh Available
   rsh -

Settings modified by the remote operation initialization:
  REMOTE.F_SCP_COMMAND=/bin/scp
  REMOTE.F_SSH_COMMAND=/bin/ssh
  REMOTE.T_SCP_OPTIONS=-BCpq -o ConnectTimeout=30
  REMOTE.T_SSH_OPTIONS=-Cnq -o ConnectTimeout=30

------------------------------------------------------------------------------
Test a remote command
------------------------------------------------------------------------------
Enter the remote host name
Press Return to accept the default (localhost)
> rac12pri2

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

Check remote command execution with best driver
Exit code: 0

Check remote command execution using DA
Exit code: 0

Check remote command execution using JSCH
Exit code: 0

Check remote command execution using SSH
Exit code: 0

Check remote command execution using SSH (no timeout)
Exit code: 0
[oracle@rac12pri1 rda]$

Looking at the above output led me to believe that there isn’t an SSH-related problem with my cluster.

New theme – it was about time

After a lot of searching and testing I have finally decided to put the ageing Andreas09 theme to rest and try a new one. It has been nagging me for years that the presentation of the blog was more like the early 2000s than 201x. Or was it late 1990s? Anyway, out with the old, in with the new.

The biggest problem I have with finding suitable templates is a) that I have insufficient talent in the fine arts, and b) that I present a lot of code examples. I guess point a) can’t be helped much, but b) had to be reasonably taken care of.

The monospaced font I use in the code blocks is usually far too large (10pt or more) in most themes that I otherwise liked a lot. The current one seems to use 10pt at most, which is a good compromise, although I would have preferred 9pt. My favourite font family for coding since my MacOS 7 days is Monaco, 9pt. I wish :)

I think the new theme works, at least for the last 10 posts I checked on my laptop. I am not 100% sure about tablet or mobile devices though, so please let me know. I am willing to take a leap of faith here: although I have never checked, I guess most visitors to this blog hit the pages via a search engine, using a desktop computer or laptop of some sort.

So, in summary, I hope you like the new layout as it is, and I appreciate your comments. Should you know a better theme for my style of writing (code!) then let me know in the comments as well; I might give it a whirl.

Creating a RAC 12.1 Data Guard Physical Standby environment (4)

In the previous three parts of this series a lot of preparation work needed for the configuration of Data Guard was performed. In this part of the mini-series it all comes to fruition: using the Data Guard broker, a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:

When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.

I have always wanted to test that in a quiet moment…

I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Now let’s get to it.

Step 1: Check the status of the configuration

In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).

The following commands are somewhat sensitive to network availability – you should protect your sessions against any type of network failure! I am using screen(1) for that purpose; there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…
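
As an example, a named screen session is cheap insurance: the sketch below starts one to run dgmgrl in, and reattaches to it after a dropped connection. The session name is arbitrary:

# start a named session and run dgmgrl inside it
screen -S switchover
# after a network glitch, reattach from a new terminal
screen -dr switchover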

[oracle@rac12sec1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdbb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL> validate database 'NCDBB'
...

The command to check for switchover readiness, “validate database”, is new in 12c as well. I don’t have screen output from that moment – just take my word for it that I was ready :) Don’t switch over if you have any concerns or doubts the operation might not succeed! “Validate database” does not relieve you of your duty to check for switchover readiness – follow your procedures.

Step 2: Switch Over

Finally, the big moment has come! It takes just one line to perform the switchover:

DGMGRL> switchover to 'NCDBB'
Performing switchover NOW, please wait...
New primary database "NCDBB" is opening...
Oracle Clusterware is restarting database "NCDBA" ...
Switchover succeeded, new primary is "NCDBB"
DGMGRL> 

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 9.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    NCDBA1
    NCDBA2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBB1
    NCDBB2

Database Status:
SUCCESS

DGMGRL> 

Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.

If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?
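
Should that happen, I would start by checking what Clusterware knows about the database. A sketch using 12.1 srvctl syntax:

# show the OCR registration for the database, including its Data Guard role
srvctl config database -d NCDBA
# check which instances are currently running
srvctl status database -d NCDBA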

Validating the new standby database shows no issues. I hadn’t noticed it before, but “validate database” allows you to get more verbose output:

DGMGRL> validate database verbose 'NCDBA';

  Database Role:     Physical standby database
  Primary Database:  NCDBB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    NCDBB     2                2
    NCDBA     2                2

  Temporary Tablespace File Information:
    NCDBB TEMP Files:  1
    NCDBA TEMP Files:  1

  Flashback Database Status:
    NCDBB:  On
    NCDBA:  Off

  Data file Online Move in Progress:
    NCDBB:  No
    NCDBA:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 1 second ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 1 second ago)
    Transport Status:  Success

  Log Files Cleared:
    NCDBB Standby Redo Log Files:  Cleared
    NCDBA Online Redo Log Files:   Cleared
    NCDBA Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                   (NCDBB)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    LogXptMode                      ASYNC                    ASYNC
    RedoRoutes                      <empty>                  <empty>
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       NCDBB    NCDBA
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could have thought of myself (e.g. the online datafile move!).

Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about it in “Interesting observation about standby redo logs in Data Guard”.
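
For reference, adding standby redo logs is straightforward; the common rule of thumb is one SRL group per thread more than there are online log groups, sized identically to the online logs. A sketch matching the 50 MByte log files shown above:

-- repeat per thread until each has (online log groups + 1) standby groups
alter database add standby logfile thread 1 size 50m;
alter database add standby logfile thread 2 size 50m;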

Summary

After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.

In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.

Signed on the dotted line-Enkitec!

I had to think of @OyvindIsene, a great ambassador of the Norwegian Oracle User Group, when I typed the heading for this post. Unlike him I have not actively been looking for new challenges, but sometimes things just develop, and in my case that was a great turn of events. I am very happy to have signed on the dotted line; in a couple of weeks I will join Enkitec in Europe.

How did that happen? At an Oracle conference I met Andy Colvin and some of his colleagues during a break in the busy schedule. I already knew and respected Enkitec as a great company with lots of seriously experienced DBAs. I feel fortunate to actually know some of them already from email and other social media exchanges.

Andy and I have exchanged a few tweets in the past and I really like his blog, so I was curious to meet him in person. I haven’t yet had the opportunity to go and speak at an American conference, so anytime someone I know from the other side of the Atlantic comes to Europe I try to meet up. I had a great time but unfortunately had to run since my talk started a few minutes later. It was quite funny actually, although I’m not so sure my presentation was up to my own expectations. The conversations I had made a lasting impression on me.

Over the course of the next months we remained in contact, and I had the great pleasure to meet Kerry Osborne together with Andy a little later. That was when I seriously started to think I wanted to join a team I admire. Now, with all the paperwork done and dusted and the contract signed, I can’t wait to get started.

Lights-out management console on Supermicro boards

This is slightly off topic, as it doesn’t deal with anything directly related to Oracle; it’s more a reference for those who are using Supermicro boards like I do. The nice thing about professional hardware is that you do not need a keyboard or mouse, or even a monitor. The board I have comes with an IPMI interface and has KVM-over-IP capability. The interface can be accessed from a dedicated application or a web browser.

IPMIView

The suitable application to access your KVM is called IPMIView, and it’s available for MacOS, Linux and Windows.

Since I didn’t want to install a lot of applications on my laptop I opted to install IPMIView for Linux on my Ubuntu 12.04 LTS desktop running within a virtual machine. However, I couldn’t get it to install at all until I found a great source describing how to do so. Here are the steps:

  1. Download IPMIView from the Supermicro website: ftp://ftp.supermicro.com/utility/IPMIView
  2. Make the file executable: chmod u+x IPMIView…bin
  3. Execute the file (note it requires an X11 DISPLAY): ./IPMIView…bin

It will then open an installer wizard that guides you through the installation. Although JVM-based, you don’t need to download a JRE beforehand: it comes with its own. Just be sure not to use that JRE for anything but IPMIView, as it seems to be an old one: 1.6.03.

That’s it: change directory to where IPMIView has been installed and launch it. Bizarrely, on my system that fails unless you are root!

To connect to your server, you obviously need a bridged network interface in your virtual machine. In VMware Workstation you might have to change the bridged network definition and ensure that it is bridging to the LAN port, not WLAN, as it did in my case.

If you are unsure which IP you assigned to your IPMI interface you can instruct the software to perform a scan, which is quite useful. Once discovered, you can save your server(s) in the interface and connect. On initial contact, the default username is ADMIN, and the default password is (you guessed it …) ADMIN, in upper case.
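
Needless to say, this default combination is well known and should be changed immediately. If you prefer the command line over a GUI, ipmitool can do it too; a sketch with a made-up IP address, assuming the ADMIN account has user ID 2 (a common default on Supermicro boards, hence the list command to verify first):

# confirm which user ID belongs to ADMIN on LAN channel 1
ipmitool -I lanplus -H 192.168.1.20 -U ADMIN -P ADMIN user list 1
# set a new password for that user ID (ipmitool prompts if it is omitted)
ipmitool -I lanplus -H 192.168.1.20 -U ADMIN -P ADMIN user set password 2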

Within IPMIView, I suggest you change the password RIGHT NOW (click on the USERS tab and do so). Here’s one of the reasons I really like the IPMIView application: the remote power control.

OK, it’s not that much of an advantage right now as I can easily reach under my desk, but once the server is out of reach the option to reboot it remotely is nice. Another great tool is the graphical KVM console! I can even use “virtual storage”, for example an ISO image on my laptop, and present it as if it were inserted in the DVD drive – even if there is no physical DVD drive at all :)

There was a glitch with the software that prevented me from selecting installation media in the virtual media menu on my Linux VM (I haven’t tried the Windows version of IPMIView). I could see from the manual that there should be an additional tab at the bottom, next to the KVM Console, but for some reason it was greyed out. There might be a way to get it back, but I didn’t explore further: the virtual media worked with the web interface.

Web browser

This looks like the way to go: simply point your web browser at the IP address of the IPMI card and off you go! For the KVM to work you need a Java runtime on your host. To be fair, the web interface is a lot nicer than the Java client, especially since it allows you to resize the window, with the UI elements scaling accordingly. When used from Windows you finally have the opportunity to mount an ISO image on the server: a Windows share (I assume Samba works as well) containing the ISOs is required, and it’s very simple to set up and mount.

OT: I want this (WordPress) theme

Apologies for a completely non-technical blog item – feel free to skip!

However, if you have used a Mac SE like I have, running on System 6, then you’d love this WordPress theme I just discovered:

System 6 theme

Unfortunately I can’t use it as most of the source code I’m posting is too wide.

I can almost hear the characteristic SuperDrive noise when reading one of my floppy disks. A moment of nostalgia.

Installing OEM 12c agents in RPM format

One of the questions I have always asked myself is: “why doesn’t Oracle package certain software as an RPM on Linux?” Well, this question has recently been answered in the form of the OEM 12c agent. It IS possible to use an RPM-based installation, although it doesn’t make 100% use of RPM. I have written this post to give you an idea of what happens.

The procedure is described in the OEM 12c Cloud Control Advanced Installation and Configuration Guide, chapter 6. The process is very similar to the non-RPM based agent deployment. Let’s have a look at it in detail.

Continue reading