Monday, May 15, 2023

Oracle: HASH_JOIN - MEMORY, SWAP and TEMP Tablespace an impact analysis - Part1

Objective: Assess the impact of HASH_JOIN on SWAP and TEMP Tablespace

Find out how much disk IO and swapping is created when the Oracle's *_area_size is suboptimal.

For this test, we will have 2 test tables identical by defination and rowdata (atleast the join column). Both the tables were loaded with 25M records. When they are joined, they yield 25M records as well.

Current Setting:

CPU:2
Memory:3GB
SGA:1504M
PGA:500M
hash_area_size:131072
sort_area_size:65536
bitmap_area_size:1048576
workarea_size_policy: AUTO (PGA in effect)
Container deployment: yes
CON_ID: 3 (OPDB1)

Instrumentations used:
1. vmstat
2. iostat
3. ASH
4. dbms_sqltune
5. dbms_xplan

Test table creation:

1) Test table creation step

create table jt1

as
with generator as (
    select            rownum            id
    from        dual
    connect by
                rownum <= 5000
)
select rownum id
,trunc((rownum - 1) / 1000)                clustered
,mod(rownum - 1,1000)                      scattered
,trunc(dbms_random.value(0,1000))          randomised
,substr(dbms_random.string('U',12),1,12)   random_string
,lpad(rownum,10)                           vc_small
,rpad('x',10,'x')                          vc_padding
from
    generator   g1,
    generator   g2
;

create table jt2
as
with generator as (
    select            rownum            id
    from        dual
    connect by
                rownum <= 5000
)
select rownum id
,trunc((rownum - 1) / 1000)                clustered
,mod(rownum - 1,1000)                      scattered
,trunc(dbms_random.value(0,1000))          randomised
,substr(dbms_random.string('U',12),1,12)   random_string
,lpad(rownum,10)                           vc_small
,rpad('x',10,'x')                          vc_padding
from
    generator   g1,
    generator   g2
;

No indexes. 
>>> Tables were created.

2) Dont change any of the db settings, run a query which joins both

alter session set statistics_level=all;

statistics_level                     string      ALL

set autotrace trace statistics
select /*+ monitor */ t1.*
,t2.*
from jt1 t1
,jt2 t2
where t1.id=t2.id
;

Actual output:

19:04:06 SQL> select /*+ monitor */ t1.*
,t2.*
from jt1 t1
,jt2 t2
where t1.id=t2.id
;19:04:22   2  19:04:22   3  19:04:22   4  19:04:22   5  19:04:22   6
from jt1 t1
     *
ERROR at line 3:
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 267015
Additional information: 4294967295
ORA-01114: IO error writing block to file 203 (block # 266949)
...
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 265894
Additional information: 4294967295
ORA-01114: IO error writing block to file 203 (


Elapsed: 00:01:44.30
19:06:08 SQL>

>>> the query failed because TEMP tablespace grew and consumed all /oradata fs. Query ran between 19:04:06 and 19:06:08.

df -h /oradata
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/appdata--vg-oradata--lv   10G   10G  108K 100% /oradata


from same session - the last query execution stats:
19:16:05 SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6hz2uy35hb6jy, child number 0
-------------------------------------
select /*+ monitor */ t1.* ,t2.* from jt1 t1 ,jt2 t2 where t1.id=t2.id

Plan hash value: 913581377

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       |   266K(100)|      0 |00:00:00.01 |       0 |      0 |         |       |          |
|*  1 |  HASH JOIN         |      |      1 |     25M|  2593M|  1573M|   266K  (1)|      0 |00:00:00.01 |       0 |      0 |  2478M|    41M|   56M (1)|
|   2 |   TABLE ACCESS FULL| JT1  |      1 |     25M|  1287M|       | 55303   (1)|     25M|00:00:02.80 |     202K|    202K|         |       |          |
|   3 |   TABLE ACCESS FULL| JT2  |      1 |     25M|  1287M|       | 55303   (1)|   6804K|00:00:01.65 |   54333 |  54465 |         |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."ID"="T2"."ID")


20 rows selected.

Elapsed: 00:00:00.25
19:18:40 SQL>

Before the query:
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/appdata--vg-oradata--lv   10G  8.1G  2.0G  81% /oradata

After query failure:
/dev/mapper/appdata--vg-oradata--lv   10G   10G  156K 100% /oradata


ran in parallel these commands:
vmstat 2 1000 -t >/tmp/vms14may23.out
iostat -x /dev/sdc1 2 1000 -t >/tmp/ios14may23.out

/dev/sdc1 is the /oradata mountpoint's disk.

Let us pick live sql monitoring:

SET trimspool ON
SET TRIM ON
SET pages 0
SET linesize 32767
SET LONG 1000000
SET longchunksize 1000000
set feedback off
set echo off
set verify off
SELECT dbms_sqltune.Report_sql_monitor(SQL_ID=>'6hz2uy35hb6jy', TYPE=>'text',report_level=>'ALL')
FROM dual;

Output:

SQL> SELECT dbms_sqltune.Report_sql_monitor(SQL_ID=>'6hz2uy35hb6jy', TYPE=>'text',report_level=>'ALL')
FROM dual;  2
SQL Monitoring Report

SQL Text
------------------------------
select /*+ monitor */ t1.* ,t2.* from jt1 t1 ,jt2 t2 where t1.id=t2.id

Error: ORA-1114
------------------------------
ORA-01114: IO error writing block to file 203 (block # 266468)
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 266468
Additional information: 4294967295


Global Information
------------------------------
 Status              :  DONE (ERROR)
 Instance ID         :  1
 Session             :  SYS (1:9741)
 SQL ID              :  6hz2uy35hb6jy
 SQL Execution ID    :  16777216
 Execution Started   :  05/14/2023 19:04:22
 First Refresh Time  :  05/14/2023 19:04:22
 Last Refresh Time   :  05/14/2023 19:06:07
 Duration            :  105s
 Module/Action       :  sqlplus@vcentos79-oracle-sa1 (TNS V1-V3)/-
 Service             :  opdb1
 Program             :  sqlplus@vcentos79-oracle-sa1 (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
==========================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Fetch | Buffer | Read  | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs  | Bytes | Reqs  | Bytes |
==========================================================================================================
|     104 |      20 |       69 |        0.00 |       15 |     1 |   267K | 29290 |   2GB | 23043 |   2GB | << look here 60% iowaits and 2GB temp writes. 
==========================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=913581377)
==========================================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity | Activity Detail |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |     (%)    |   (# samples)   |
==========================================================================================================================================================================
|  0 | SELECT STATEMENT     |      |         |       |       104 |     +2 |     1 |        0 |      |       |       |       |       . |     . |          |                 |
|  1 |   HASH JOIN          |      |     25M |  267K |       104 |     +2 |     1 |        0 |      |       |  8398 |   2GB |  56MB |     2GB |          |                 |
|  2 |    TABLE ACCESS FULL | JT1  |     25M | 55303 |        79 |     +2 |     1 |      25M | 1595 |   2GB |       |       |       . |     . |          |                 |
|  3 |    TABLE ACCESS FULL | JT2  |     25M | 55303 |        24 |    +82 |     1 |       7M |  435 | 426MB |       |       |       . |     . |          |                 |
==========================================================================================================================================================================

SQL>

This how memory & swap looks  - now after failure:
[root@vcentos79-oracle-sa1 ~]# free -k
              total        used        free      shared  buff/cache   available
Mem:        2914064      730944       22228     1346740     2160892      797796
Swap:       3145724       11008     3134716
[root@vcentos79-oracle-sa1 ~]#


SQL> exec dbms_workload_repository.create_snapshot; << secured a snap after failure.
SQL>

Let us look @ how the memory usage and disk io looked:

Look at the chart we draw from the data collected by vmstat and iostat.
There is  no swapping activity, which means the Oracle contained this queries demand to stay within PGA_AGGREGATE_TARGET which is 500MB (actual usage was 56MB).
The spilled over operation did go to tempfile pushing the temp file usage to 2GB and ended up consuming all of the /oradata FS free space left.

SQL> 1
  1* select con_id,name,open_mode from v$pdbs
SQL> /

    CON_ID NAME                 OPEN_MODE
---------- -------------------- ----------
         2 PDB$SEED             READ ONLY
         3 OPDB1                READ WRITE <<< conid=3
         4 INVENTORYPDB         MOUNTED
SQL>

SQL> 1
  1* select con_id,tablespace_name,sum(bytes)/1024/1024 from cdb_temp_files group by con_id,tablespace_name order by 1,2
SQL> /

    CON_ID TABLESPACE SUM(BYTES)/1024/1024
---------- ---------- --------------------
         1 TEMP                         32
         3 TEMP                       2133 <<< 2GB in use.
SQL>


Query ran between 19:04:06 and 19:06:08

A Linux shell script to get relevant info from the vmstat and iostat file:

while read line
do
if [[ "${line}" == "05/14/2023"* ]];
then
tmstmp=${line};
fi
if [[ "${line}" == "sdc1"* ]];
then
deviceperformance=$(echo ${line}|awk '{print $6","$7","$13","$14}');
textout=${tmstmp},${deviceperformance};
echo ${textout};
fi
done </tmp/ios14may23.out


while read line
do
if ! [[ "${line}" == "procs"* || "${line}" == *"free"* ]];
then 
tmstmp=$(echo $line|awk '{print $19}');
restofthefields=$(echo $line|awk '{print $7","$8","$9","$10","$16}');
echo ${tmstmp},${restofthefields};
fi
done </tmp/vms14may23.out

If we look below from the OS stats we collected, we could see the db generated a large amount of IO to /oradata FS during query execution.


We also get the ASH reports dumped...

col event for a40
set lines 1000
set colsep |
set time on
set timing on
alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS';
select cast(sample_Time as date) sampltm
,SQL_ID
,SQL_OPCODE
,SQL_OPNAME
,SQL_PLAN_LINE_ID
,SQL_PLAN_OPERATION
,SQL_PLAN_OPTIONS
,event
,SESSION_STATE
,TIME_WAITED
,DELTA_READ_IO_BYTES
,DELTA_WRITE_IO_BYTES
,PGA_ALLOCATED
,TEMP_SPACE_ALLOCATED
from dba_hist_active_sess_history
where session_id=1
and con_id=3
and cast(sample_Time as date) between '14/05/2023 19:00:00' and '14/05/2023 19:07:00'
order by 1;


ASH confirms the total PGA and Temp tablespace we consumed as we saw from the dbms_xplan and dbms_sqltune outputs.

So we defined the issue. Now let us look at the solution in next blogs.

Thanks






No comments:

Post a Comment

Troubleshooting the “Cannot Generate SSPI Context” Error After SQL Server Migration

  Introduction After a recent  SQL Server migration from 2012 to 2022 , our team encountered a classic authentication issue: The target prin...