Saturday, February 16, 2019

Validate weather the oracle DBs you manage are in sync with your catalogue or inventory

Objective: Validate weather the oracle DBs you manage are in sync with your catalogue or inventory. Here our ask is to ensure that all the DBs or instance listed under that server are really there in oratab & running @ the os level.

Assumption: You manage your oracle DB catalogue or inventory in excel or in DB as a table. You dump the information as you see below.

hostname, RAC, os-user,#of instance,instance_names
host1,    N,    oracle,    2,    orainst1;orainst2;
host2,    N,    oracle,    2,    orainst3;orainst4;


Step 1: Ensure you have passwordless ssh setup from the server your going to run this validation.

ssh-keygen -t rsa
ssh-copy-id oracle@host2
ssh host2  << shouldnt prompt for password

Step 2: Processs the csv file.

while read line
do
srvnm=`echo $line|cut -d ',' -f 1`
srvusr=`echo $line|cut -d ',' -f 3`
invcnt=`echo $line|cut -d ',' -f 4`
inslst1=`echo $line|cut -d ',' -f 5`
inslst2=`echo $inslst1|tr ";" "\n"`

Step 3: Remote connect to the target server & do necessary validations

#!/bin/ksh
while read line
do
srvnm=`echo $line|cut -d ',' -f 1`
srvusr=`echo $line|cut -d ',' -f 3`
invcnt=`echo $line|cut -d ',' -f 4`
inslst1=`echo $line|cut -d ',' -f 5`
inslst2=`echo $inslst1|tr ";" "\n"`
srvrsp=$(ssh -o StrictHostKeyChecking=no $srvusr@$srvnm /bin/sh -s $inslst2 <<"EOF"
hstnm=`uname -n|cut -d '.' -f 1`
echo $hstnm
typeset -i j
j=0
typeset -i k
k=0
for i in "$@"
do
k=`cat /etc/oratab|grep -v ASM|grep -v MGMTDB|grep -v "^$"|grep -v "^#"|cut -d ":" -f 1|grep -iw $i|wc -l`
if [ $k -eq 1 ]
then
j=j+1
fi
done
echo $j
j=0
k=0
for i in "$@"
do
k=`ps -ef|grep -iw ora_pmon_$i|grep -v grep|wc -l`
if [ $k -eq 1 ]
then
j=j+1
fi
done
echo $j
EOF
)
rtc=$?
if [ $rtc -eq 0 ]
then
oracnt=`echo $srvrsp|cut -d ' ' -f 2`
runcnt=`echo $srvrsp|cut -d ' ' -f 3`
        if [ $invcnt -eq $oracnt ]
        then
                if [ $invcnt -eq $runcnt ]
                then
                echo "$srvnm, Conn_OK, Oratab_OK, Pmon_OK"
                else
                echo "$srvnm, Conn_OK, Oratab_OK, Pmon_NOT"
                fi
        else
                if [ $invcnt -eq $runcnt ]
                then
                echo "$srvnm, Conn_OK, Oratab_NOT, Pmon_OK"
                else
                echo "$srvnm, Conn_OK, Oratab_NOT, Pmon_NOT"
                fi
        fi
else
echo "$srvnm, conn_NOT, Oratab_NA, Pmon_NA"
fi
done < catalogue.input

./invVAL_v1.sh
host1, Conn_OK, Oratab_OK, Pmon_OK
host2, Conn_OK, Oratab_OK, Pmon_NOT


so in host2, I have only 1 instance alive, so pmon_NOT is reported (a _NOT means we arent in sync with our catalogue), which is fine.

Thanks

Sunday, February 3, 2019

Objective: Lost controlfile backup - will resetlogs with latest controlfile with new df - work?

version : 12.1

[please read previous post with no new df before reading this post to get some knowledge on the setup]

step 1) create a tablespace

SQL> alter session set container=pdb2;

Session altered.



SQL> create tablespace tbstest datafile '.../tbstest1.dbf' size 100k autoextend on next 100k;

Tablespace created.


step 2) create a table in the new tablespace with dummy data such as dba_users.

SQL> alter user scott quota unlimited on tbstest;

User altered.

SQL> create table scott.tbstest1 tablespace tbstest as select * from dba_users;

Table created.



step 3) backup current controlfile

piece handle=/oradata_1/backup/cf_ORCL_0jtp04n5_1_1-02 tag=TAG20190203T224244 comment=NONE

step 4) restart the instance using the pfile used with objective (i), since it was a pfile, it wasnt dropped by oracle - 'drop database command'

step 5) restore the controlfile

[oracle@host backup]$ ls -altr /oradata_1/orCL/cf_resorcl.ctl
ls: cannot access /oradata_1/orCL/cf_resorcl.ctl: No such file or directory
[oracle@host backup]$


step 6) mount the db and validate that the controlfile is intact new

SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5336799             5336873 03/02/2019 22:42:45

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5336799                  4
                 2             668608                  2
                 4            5336799                 18

see on con_id =>4 we have +1 df, which is pdb2's tbstest.

the actual controlfile backup points to the below...

   CHECKPOINT_CHANGE#   CONTROLFILE_CHANGE# CONTROLFILE_TIME
--------------------- --------------------- -------------------
              5332853               5332833 03/02/2019 19:22:33

so this is different and is the latest controlfile used for restore.

step 7) let us do the restore using the same command as objective (i)...

pre - restore:

[oracle@host RESORCL]$ pwd
/oradata_2/orCL/RESORCL
[oracle@host RESORCL]$ ls -altr
total 20
drwxrwxr-x. 3 oracle oracle 4096 Feb  3 22:03 ..
-rw-rw-r--. 1 oracle oracle 4781 Feb  3 22:24 controlresorcl.trc
-rw-rw-r--. 1 oracle oracle 2053 Feb  3 22:26 controlresorcl.sql
drwxrwx---. 2 oracle oracle 4096 Feb  3 22:34 .
[oracle@host RESORCL]$


RMAN> run
{
2> 3> set until time "TO_DATE('02/03/2019 19:25:31', 'MM/DD/YYYY HH24:MI:SS')";
4> allocate channel c1 device type disk;
5> allocate channel c2 device type disk;
6> allocate channel c3 device type disk;
7> allocate channel c4 device type disk;
set newname for database to new;
8> 9> restore database;
}10>


..

restore is complete

step 8) switch database to copy... will rely on the actual dbf... let us see...


RMAN> switch database to copy;

using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 02/03/2019 23:17:37
RMAN-06571: datafile 49 does not have recoverable copy

RMAN>

so we need to switch individual files.

set lines 300
set pages 3000
select 'switch datafile '||file#||' to copy;' from v$datafile_copy where status='A' order by file#;


RMAN>
datafile 46 switched to datafile copy "/oradata_2/orCL/RESORCL/datafile/o1_mf_test123_g5g05o6w_.dbf"

RMAN>
datafile 47 switched to datafile copy "/oradata_2/orCL/RESORCL/datafile/o1_mf_test123_g5g05o80_.dbf"
..


SQL> select substr(name,1,15),count(1) from v$datafile group by substr(name,1,15);

SUBSTR(NAME,1,15)                                              COUNT(1)
------------------------------------------------------------ ----------
...............                                                       1
/oradata_2/orCL                                                      23

SQL> select file#,name,status from v$datafile where SUBSTR(NAME,1,15)='.....';

     FILE#
----------
NAME
--------------------------------------------------------------------------------
STATUS
-------
        49
..../tbstest1.dbf
ONLINE



SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5336799             5336873 03/02/2019 22:42:45

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5336799                  4
                 2             668608                  2
                 4            5336799                 18

SQL> select con_id,checkpoint_change#,count(1) from v$datafile_header group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5332853                  4
                 2             668608                  2
                 4            5332833                 17
                 4            5337495                  1

the only datafile left as is in source. Now let us proceed to recreate the controlfile after removing the #49 file from controlfile trace.

the actual controlfile or coldbackup ckpt info:

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change#;

               CON_ID    CHECKPOINT_CHANGE#              COUNT(1)
--------------------- --------------------- ---------------------
                    1               5332853                     4
                    2                668608                     2
                    4               5332833                    17

SQL> select con_id,file#,checkpoint_change#,checkpoint_time from v$datafile_header where file#=49;

            CON_ID              FILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME
------------------ ------------------ ------------------ -------------------
                 4                 49            5337495 03/02/2019 22:44:51


so the rest of the file are intact except the file#49.

step 9) create controlfile trace for resetlogs and edit it.

hint: Please pay attention to the controlfile edit when you remove the last line in the file list, the comma at the end of the line should be taken care for previous line.

Otherwise you will hit the below error...
CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01163: SIZE clause indicates 12800 (blocks), but should match header 13
ORA-01110: data file 47:
'/oradata_2/orCL/RESORCL/datafile/o1_mf_test123_g5g05o80_.dbf'

step 10) create controlfile and perform resetlogs togather


SQL> startup nomount pfile='/oradata_1/backup/pfileRESORCL.ora';
ORACLE instance started.

Total System Global Area         1319473152 bytes
Fixed Size                          2288056 bytes
Variable Size                    1006634568 bytes
Database Buffers                  301989888 bytes
Redo Buffers                        8560640 bytes
SQL> select instance_name,status from v$instance;

INSTANCE_NAME    STATUS
---------------- ------------
resorcl          STARTED

SQL> @/oradata_2/orCL/RESORCL/cntrl_resorcl_3feb_obj2.sql

Control file created.


Database altered.


Pluggable database altered.

SQL>
...

...
Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS  NOARCHIVELOG
...
2019-02-04 00:10:43.170000 +08:00
Starting background process CJQ0
Completed: ALTER DATABASE OPEN RESETLOGS

step 11) validate the db

SQL> select count(1) from scott.tbstest1;
select count(1) from scott.tbstest1
                           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select con_id,owner,table_name from cdb_tables where lower(table_name)='tbstest1';

no rows selected

SQL> select tablespace_name from dba_tablespaces where upper(tablespace_name)='TBSTEST';

no rows selected

SQL>

SQL> select count(1) from scott.t1;

          COUNT(1)
------------------
                16

looks fine.

SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5332857             5333345 04/02/2019 00:10:43

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5332857                  4
                 2             668608                  2
                 4            5333332                 17

SQL> select con_id,checkpoint_change#,count(1) from v$datafile_header group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5332857                  4
                 2             668608                  2
                 4            5333332                 17

SQL> select status,count(1) from v$datafile group by status;

STATUS            COUNT(1)
------- ------------------
ONLINE                  20
SYSTEM                   3

so no datafiles in offline or recovery mode,this is specifically to verify file#49 causing any issues.

SQL> select count(1) from v$datafile where file#=49;

          COUNT(1)
------------------
                 0

SQL>


also there is no noname entry or tbstest1.dbf found in the db.

[oracle@host RESORCL]$ grep -i noname control_postreset_resorcl.trc
[oracle@host RESORCL]$ grep -i tbstest1.dbf control_postreset_resorcl.trc

so all set with this test as well.

Inference: This is no different from no new df situation, all we need to do is remove all of those newly added datafiles from copntrolfile trace before recreating the controlfile.

Thanks

Objective: Lost controlfile backup - will resetlogs with latest controlfile with no new df work?

version: 12.1
step 1) setup a test table

SQL> create table t1 as select * from user_objects;

Table created.

SQL> select count(1) from t1;

  COUNT(1)
----------
        16


step 2) Perform a cold backup of the db in compressed form
shutdown immediate;
startup mount;

run
{
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;
allocate channel c4 device type disk;
backup as compressed backupset database format '/oradata_1/backup/%d/db_bkp_%d_%U';
backup current controlfile format '/oradata_1/backup/%d/cf_bkp_%d_%U';
}

point in time info:

Recovery Manager: Release 12.1.0.1.0 - Production on Sun Feb 3 19:25:31 2019


step 3) note down the checkpoint change and controlfile time.

SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

   CHECKPOINT_CHANGE#   CONTROLFILE_CHANGE# CONTROLFILE_TIME
--------------------- --------------------- -------------------
              5332853               5332833 03/02/2019 19:22:33

SQL> select distinct(checkpoint_change#) from v$datafile;

   CHECKPOINT_CHANGE#
---------------------
               668608
              5332833
              5332853

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change#;

               CON_ID    CHECKPOINT_CHANGE#              COUNT(1)
--------------------- --------------------- ---------------------
                    1               5332853                     4
                    2                668608                     2
                    4               5332833                    17

SQL>

step 4) open the db and add some records to the test table

alter database open;
select count(1) from t1;
insert into t1
select * from t1;
commit;

<< here we verified we had 32 records.

step 5) checkpoint the db and note down the checkpoint change in v$database and controlfile_change,controlfile_time

SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5333481             5333482 03/02/2019 19:44:25

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5333481                  4
                 2             668608                  2
                 4            5333481                 17



SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5333481             5333559 03/02/2019 19:46:28

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5333481                  4
                 2             668608                  2
                 4            5333481                 17


step 6) backup the controlfile, create pfile from spfile.

channel ORA_DISK_1: finished piece 1 at 03-FEB-19
piece handle=/oradata_1/backup/cf_ORCL_0htovqbk_1_1 tag=TAG20190203T194556 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

SQL> create pfile='/oradata_1/backup/pfileorcl.ora' from spfile;

File created.

step 7) shutdown the database

step 8) modify the pfile with new db_unique_name and set db_create_file_dest,controlfile_path to something else than original

step 9) Rename the original controlfile backup that was created along with the backupset

piece handle=/oradata_1/backup/ORCL/cf_bkp_ORCL_0ftovps1_1_1 tag=TAG20190203T193736 comment=NONE

[oracle@db12c ORCL]$ ls -altr /oradata_1/backup/ORCL/cf_bkp_ORCL_0ftovps1_1_1
ls: cannot access /oradata_1/backup/ORCL/cf_bkp_ORCL_0ftovps1_1_1: No such file or directory
[oracle@db12c ORCL]$

step 10) startup nomount,restore controlfile from latest cf backup (mimic of no original cf backup)

run
{
restore controlfile from '/oradata_1/backup/cf_ORCL_0htovqbk_1_1';
}

step 11) verify the controlfile time, see to be sure the controlfile isnt the backup controlfile associated with our point in time restore.

RMAN> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
ORCL      MOUNTED

RMAN>


SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5333481             5333531 03/02/2019 19:45:56

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5333481                  4
                 2             668608                  2
                 4            5333481                 17

the actual controlfile backup points to the below...

   CHECKPOINT_CHANGE#   CONTROLFILE_CHANGE# CONTROLFILE_TIME
--------------------- --------------------- -------------------
              5332853               5332833 03/02/2019 19:22:33

so it ensures we arent using the controlfile backup associated with the backup. Now let us move on to perform a point in time restore.

step 12) perform DB PITR - Point In Time Restore.

run
{
set until time "TO_DATE('02/03/2019 19:25:31', 'MM/DD/YYYY HH24:MI:SS')";
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;
allocate channel c4 device type disk;
set newname for database to new;
restore database;
}

select OPERATION,status,MBYTES_PROCESSED,START_TIME,END_TIME,INPUT_BYTES/1024/1024 in_mb,OUTPUT_BYTES/1024/1024 out_mb
from v$rman_status
where status='RUNNING'
order by start_time;

hint: always validate the syntax before firing the command in rman using "rman checksyntax"

step 13) switch database to copy

RMAN> switch database to copy;
..>done



step 14) verify the vdf.ckpt,vdb.ckpt,vdb.ctnl_ckpt,vdf_header.ckpt

SQL> select checkpoint_change#,controlfile_change#,controlfile_time from v$database;

CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME
------------------ ------------------- -------------------
           5333481             5333531 03/02/2019 19:45:56

SQL> select con_id,checkpoint_change#,count(1) from v$datafile group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5333481                  4
                 2             668608                  2
                 4            5333481                 17

SQL> select con_id,checkpoint_change#,count(1) from v$datafile_header group by con_id,checkpoint_change# order by con_id;

            CON_ID CHECKPOINT_CHANGE#           COUNT(1)
------------------ ------------------ ------------------
                 1            5332853                  4
                 2             668608                  2
                 4            5332833                 17


select con_id,fuzzy,ERROR,RECOVER,status, CHECKPOINT_CHANGE#,CHECKPOINT_TIME,count(1) from v$datafile_header group by con_id,fuzzy,ERROR,RECOVER,status, CHECKPOINT_CHANGE#,CHECKPOINT_TIME order by con_id;

            CON_ID FUZ ERROR      REC STATUS  CHECKPOINT_CHANGE# CHECKPOINT_TIME               COUNT(1)
------------------ --- ---------- --- ------- ------------------ ------------------- ------------------
                 1 NO                 ONLINE             5332853 03/02/2019 19:22:35                  4
                 2 NO                 ONLINE              668608 30/08/2013 15:44:52                  2
                 4 NO                 ONLINE             5332833 03/02/2019 19:22:33                 17


so in short, vdf.ckpt != vdf_header.ckpt. So this is expected since controlfile feeds the vdf view.


step 15) rename the redologs and try to do resetlogs.

resetlogs will throw an error saying the system df needs more recovery. So the resetlogs didnt work.

step 16) work around is to recreate the controlfile...

SQL> startup nomount pfile='/oradata_1/backup/pfileRESORCL.ora';
ORACLE instance started.

Total System Global Area 1319473152 bytes
Fixed Size                  2288056 bytes
Variable Size            1006634568 bytes
Database Buffers          301989888 bytes
Redo Buffers                8560640 bytes
SQL> @/oradata_2/orCL/RESORCL/controlresorcl.sql

Control file created.


Database altered.

SQL>


alert log:
...there were complaints abt temp tbspc being empty on cdb,pdb and other container dbs, anyway they will be added later.
2019-02-03 22:29:20.742000 +08:00
Starting background process CJQ0
Completed: ALTER DATABASE OPEN RESETLOGS
CJQ0 started with pid=33, OS id=3238

SQL> select CON_ID,name,open_mode from v$pdbs;

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         4 PDB2                           READ WRITE

SQL> alter session set container=pdb2;

Session altered.


SQL> select count(1) from scott.t1;

  COUNT(1)
----------
        16

SQL> select con_id,owner,table_name from cdb_tables where owner='SCOTT' and table_name='T1';

    CON_ID
----------
OWNER
--------------------------------------------------------------------------------
TABLE_NAME
--------------------------------------------------------------------------------
         4
SCOTT
T1

so this confirms the # of records matches the actual controlfile time.

Inference: So a resetlogs with advanced controlfile isnt going to work event without any new dfs. So we need to recreate the controlfile which is the only option.

Flashback data archive steps

 Objective: Ways to track DML changes in a table Detailed objective: We should be able to track the changes in the table at any point in tim...