Friday, December 29, 2023

Oracle GoldenGate - 1403 errors and automatic KEYCOLS definition by GoldenGate

********************* READ my update below for a quick read **********

Dec 30 2023:

Can we trust Oracle GoldenGate's default mechanism to pick KEYCOLS for the target when there is no primary key, or unique key with a NOT NULL constraint, defined?

NO!!!

Oracle GoldenGate tries its level best to pick a proper KEYCOLS definition for the target to perform the DMLs.

But at times it fails, so be prepared to face this in case the tables in your replication list don't have a PK, or a UK on NOT NULL columns, defined.

GoldenGate replication will throw 1403 errors during DELETE and UPDATE operations,

since the key columns it derives may at times fail to identify the record in the target.

For example:

You may have a source record with the below signature...

N,C1,C2,C3 -- attributes

Source:

1,A,C,D

Target:

1,B,D,D

In this case, if GoldenGate picks N as the key column and N happens to be a unique key, you're good.

If Oracle GoldenGate goes the extra mile and identifies N,C1 as the key columns for the target,

then you will get a 1403 error (no data found), because no target row matches N=1 and C1=A.


So it is wise, as soon as you face a 1403 error (or better, proactively), to identify all the tables without unique identifiers and get your fixup routine ready.


The discard file will tell you the reason and the report file will tell you the mapping info. These two pieces of information should help you identify the cause and the fix.
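One way out is to define the key columns yourself instead of relying on GoldenGate's choice. A minimal sketch of a replicat MAP entry, with illustrative schema/table/column names:

-- replicat parameter file: force the key columns used for UPDATE/DELETE lookups
MAP src.orders, TARGET tgt.orders, KEYCOLS (order_id);

GoldenGate then builds the WHERE clause for UPDATEs and DELETEs on the listed columns, so pick a combination that is genuinely unique on the target.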


Hope this helps!


Thanks


****************** Update to this blog **********************

Jan 27 2024:

After handling a migration project involving GoldenGate and working with Oracle Support, my experience with Oracle GoldenGate improved. I learnt that the 1403 error in my case popped up because the key column values themselves were being updated.

Table 1: cola, colb, colc

Unique index: (cola, colb)

OGG properly picks the unique index as the KEYCOLS definition for replication.

If a key column value is updated in the source, the corresponding key value pair might not be available in the target, and this means a no-data-found error.

Imagine a row:

1,2,'A'

keycols: (1,2)

If this key value pair is changed to 2,3 --> then the row won't be found in the target.
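A minimal sketch of the kind of statement that triggers this (table/column names are illustrative):

-- source-side update that rewrites the unique index columns
UPDATE table1
   SET cola = 2, colb = 3
 WHERE cola = 1 AND colb = 2;

If the two sides have drifted at all, the replicat's lookup on the key pair fails on the target with a no-data-found (1403).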

Oracle assumes you're not changing the search/index fields, which is a fair expectation for the product.

So it is up to the GoldenGate DBA to ensure this is the case. In my case, it was my responsibility.

Thank you!

Wednesday, December 27, 2023

Oracle: Will "set newname for database to new" work in active DB duplication?

Dears,

Today, I was assisting my colleague in getting a standby DB built using the active DB duplication command.

Source DB is 30TB in size.

Background: the standby database was activated last Saturday due to a primary server hardware issue.

We were left with no flashback option, since it wasn't enabled. So a DB restore on the old primary was the only way to reinstate it (we deferred the OS utilities).


Challenge: the DB had a mix of non-OMF and OMF datafiles.

So I discussed with my colleague and proposed moving all the files to OMF format for:

  1. automatic standby file management
  2. easier datafile maintenance operations later

We noticed the DB had the
  • db_file_name_convert
  • db_create_file_dest
parameters set. When I fired the active DB duplication as-is, with no change to these two parameter settings,
I noticed Oracle trying to restore even the OMF files into the same path on the target, as if it were restoring the non-OMF files (the file names carried cryptic, system-generated identifiers).

So I decided to remove db_file_name_convert; after reading the Oracle docs, I felt this was the main culprit.
To ensure the datafiles get the benefit of the db_create_file_dest parameter, I also introduced

"set newname for database to new" into the duplicate RMAN run command:

run
{
allocate channel prmry1 type disk;
...
allocate auxiliary channel stby1 type disk;
...
set newname for database to new;
duplicate target database for standby from active database
dorecover
spfile
   parameter...
...
NOFILENAMECHECK;
}

All the files got restored to the location indicated by db_create_file_dest as I expected.
This worked perfectly fine.

Thanks

Block Change Tracking File Invalid (RMAN-08606)

Dears,

Today, while performing a FULL DB backup of my production database to disk, I received a weird RMAN-08606 warning.

RMAN-08606: WARNING: The Change Tracking File Is Invalid


On reading the Oracle note, I learnt it is a risk to our incremental level 1 backups, since the change tracking file is missing changed block information.

Oracle says to discard any incremental backup taken since the last level 0 if you see this warning, since those backups are potentially incomplete due to the missing changed-block info.

So Oracle proposes a few options, like:
  1. placing the BCT file on a relatively less hot filesystem
  2. increasing large_pool_size
  3. increasing _bct_public_dba_buffer_size to a larger value (use the x$krcstat view to derive a suitable value)
These should reduce the BCT-related wait events and avoid the situation highlighted.

What I did as a first measure was disable BCT, since the risk to backups is huge for me. My DB is small, around 2TB, so I am not bothered about reading 2TB in place of 200GB.
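For reference, a minimal sketch of the statements involved (the file path is illustrative):

-- disable block change tracking (the first measure I took)
ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

-- later, re-enable it, optionally on a less busy filesystem
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u02/bct/prod_bct.f' REUSE;

The current state can be checked in v$block_change_tracking.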

Thanks

 

DBMS_SQLTUNE.REPORT_SQL_MONITOR - reporting "DONE(ERROR)"

 Dears,

Today, while I was watching a SQL query's execution statistics, I noticed a weird behaviour coming out of Oracle's DBMS_SQLTUNE.REPORT_SQL_MONITOR function.


DBMS_SQLTUNE.REPORT_SQL_MONITOR is used to report the live execution statistics of a SQL statement.
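A minimal sketch of how the report is pulled (the sql_id is illustrative):

SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id       => '0abc123xyz9f1',  -- illustrative sql_id
         type         => 'TEXT',
         report_level => 'ALL') AS report
FROM   dual;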

This function reported my active SQL as "DONE(ERROR)", as if someone had killed the session or it had failed due to a temp, undo, or datafile space error.


It still didn't report why it failed, though. Interestingly, the elapsed time was 7748s while the overall query duration reported was 317s.


It took me by surprise. I then checked v$session and noticed it wasn't rolling back the MERGE it was running, since it wasn't reading from the undo datafiles.


So I suspected a weird reporting issue and continued checking whether the session was really active.


Yes, the session was really active, and v$sql kept incrementing elapsed time, CPU time, and IO wait time, along with the other corresponding metrics.
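A minimal sketch of that check (same illustrative sql_id as above):

SELECT elapsed_time/1e6      AS elapsed_s,
       cpu_time/1e6          AS cpu_s,
       user_io_wait_time/1e6 AS io_wait_s,
       rows_processed
FROM   v$sql
WHERE  sql_id = '0abc123xyz9f1';  -- illustrative sql_id

Re-running this and watching the times climb confirmed the statement was still executing.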


Cause: the query ran using parallel slaves; the query had a plan regression and ended up running longer, with a few slaves staying idle for more than 30 minutes.

Now, Oracle says this particular condition, where a PX slave that is part of the query execution stays idle, is treated as an anomaly and is reported as a failed SQL execution from Oracle's end; but that really isn't the case :)


RDBMS: Oracle

Version: 12.2

Oracle note: 1900060.1

Bug: 13523091 (Incorrect status in v$sql_monitor with parallel query)


Thank you!

Friday, December 22, 2023

One of the reasons why my referential constraint enabling was slow - Oracle 19c

Dears,

I was working on a project which involved:

1.    Data copy with table_exists_action=truncate (70 tables)

2.    Data copy with table_exists_action=replace (5 tables)


I had to disable constraints and triggers on the candidate schema in preparation for this exercise, before kicking off impdp.

Post impdp, the constraint enabling took ages for one table. At one point it was already past 20000 seconds.

I was going through various recommendations and options. One notable callout was:

a. Enable the constraint without validation

b. Enable parallel DDL at the session level and run the validation statement separately (see the sketch below)
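A minimal sketch of that workaround (table/constraint names are illustrative):

-- enable without validating existing rows
ALTER TABLE child_tab ENABLE NOVALIDATE CONSTRAINT fk_child_parent;

-- then validate separately, with parallel DDL enabled for the session
ALTER SESSION ENABLE PARALLEL DDL;
ALTER TABLE child_tab MODIFY CONSTRAINT fk_child_parent VALIDATE;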

I was thinking of logging an SR and trying this workaround. But suddenly I realised I was missing something: a basic diagnosis of the issue.

Though I was constantly collecting metrics for the session, I couldn't sense any issue, except that the buffer gets had already crossed 1 billion!!!

So the session was constantly spinning on CPU.

Finally, I pulled an AWR report for one isolated window where only this constraint enabling was in progress.


I found a recursive SQL in progress, triggered by the constraint-enabling SQL.

I noticed the plan of the recursive SQL was pathetic (a nested loop, projected to drive 35 billion buffer gets). The constraint was a FK, so the recursive query was trying to find orphan records in the table where I was attempting to enable the constraint.

So I decided to fix the plan:

a. Patch the SQL

b. Look for a proper fix --- more info needed.


I continued: I looked at the table statistics. The auto stats job had clashed with the truncate and left all the tables that were part of the refresh with num_rows = 0 statistics.

So I thought this could be the concern and collected fresh stats on the tables participating in the referential constraint.
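A minimal sketch of the stats collection (schema/table names are illustrative):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',   -- illustrative schema
    tabname => 'CHILD_TAB',   -- illustrative table
    degree  => 8,
    cascade => TRUE);         -- include the indexes
END;
/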

I tried generating the plan now: it was using a hash outer join. I tried running the recursive SQL, and all went well - 20000 seconds came down to 90 seconds!!!

I then tried the constraint enable statement, and it just flew by in no time.

So a happy ending for the story :)

Monday, December 18, 2023

Difference between connections established directly to postgres and a connection established through pgbouncer - a view from netstat



pgbouncer acts as a proxy for the client and is used as a connection pooler.
pgbouncer greatly reduces connection latency, which is very useful for OLTP apps.

Direct connection:


Client:


-bash-4.2$ psql -U postgres -h 127.0.0.1 -p 5432

psql (15.2, server 14.7)

Type "help" for help.


postgres=# \c

psql (15.2, server 14.7)

You are now connected to database "postgres" as user "postgres".

postgres=#


Netstat output:


-bash-4.2$ netstat -plantu|grep 33336

(Not all processes could be identified, non-owned process info

 will not be shown, you would have to be root to see it all.)

tcp        0      0 127.0.0.1:5432          127.0.0.1:33336         ESTABLISHED 4997/postgres: post

tcp        0      0 127.0.0.1:33336         127.0.0.1:5432          ESTABLISHED 4996/psql

-bash-4.2$



Pgbouncer based connection:


Client:

-bash-4.2$ psql -p 6432 testdb -h 127.0.0.1 -U postgres

Password for user postgres:

psql (15.2, server 14.7)

Type "help" for help.


testdb=# \c

psql (15.2, server 14.7)

You are now connected to database "testdb" as user "postgres".

testdb=#



Netstat output:


-bash-4.2$ netstat -plantu|grep -i 6432

(Not all processes could be identified, non-owned process info

 will not be shown, you would have to be root to see it all.)

tcp        0      0 0.0.0.0:6432            0.0.0.0:*               LISTEN      4196/pgbouncer <<< ipv4 pgbouncer listener

tcp        0      0 127.0.0.1:6432          127.0.0.1:33676         ESTABLISHED 4196/pgbouncer <<< this is the incoming connection

tcp        0      0 127.0.0.1:33674         127.0.0.1:6432          TIME_WAIT   -

tcp        0      0 127.0.0.1:33676         127.0.0.1:6432          ESTABLISHED 4771/psql >>> this is the outgoing connection from client

tcp        0      0 127.0.0.1:33670         127.0.0.1:6432          TIME_WAIT   -

tcp6       0      0 :::6432                 :::*                    LISTEN      4196/pgbouncer <<< ipv6 pgbouncer listener

-bash-4.2$


-bash-4.2$ netstat -plantu|grep 5432

(Not all processes could be identified, non-owned process info

 will not be shown, you would have to be root to see it all.)

tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      1337/postmaster <<< ipv4 postgres listener

tcp        0      0 127.0.0.1:33326         127.0.0.1:5432          ESTABLISHED 4196/pgbouncer <<< pgbouncer redirecting the connection here, which is an outgoing connection

tcp        0      0 127.0.0.1:5432          127.0.0.1:33326         ESTABLISHED 4772/postgres: pgte <<< postgres accepting the incoming connection

tcp6       0      0 :::5432                 :::*                    LISTEN      1337/postmaster <<< ipv6 postgres listener

-bash-4.2$ ps -ef|grep 4196
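Besides netstat, the same client-to-server mapping can be seen from pgbouncer's admin console; a minimal sketch, assuming the connecting user is listed in admin_users or stats_users in pgbouncer.ini:

-bash-4.2$ psql -p 6432 -h 127.0.0.1 -U postgres pgbouncer
pgbouncer=# SHOW CLIENTS;  -- incoming client connections to pgbouncer
pgbouncer=# SHOW SERVERS;  -- outgoing connections from pgbouncer to postgres
pgbouncer=# SHOW POOLS;    -- pool state per database/user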




Thanks for visiting the blog.
