Monday, October 24, 2022

Postgresql: What happens to the other index than the one used for cluster table command?

Objective: Let us examine the impact of the cluster table command on the other indexes than the one used for cluster command (in case of a table has more than 1 index).


create a table with 2 indexes and perform cluster operation and observe what happens to the other index which isnt participating in the cluster operation

SET search_path TO pgtst_schema;
create table pgtst_tbl2
(
id int primary key
,name text
,age int
);
create index pgtst_tbl2_idx_age on pgtst_tbl2(age);

Actual output:

pgtst_db=> SET search_path TO pgtst_schema;
SET

pgtst_db=> show search_path;
-[ RECORD 1 ]-------------
search_path | pgtst_schema

pgtst_db=> create table pgtst_tbl2
pgtst_db-> (
pgtst_db(> id int primary key
pgtst_db(> ,name text
pgtst_db(> ,age int
pgtst_db(> );
CREATE TABLE

pgtst_db=> create index pgtst_tbl2_idx_age on pgtst_tbl2(age);
CREATE INDEX

pgtst_db=> \d pgtst_tbl2
          Table "pgtst_schema.pgtst_tbl2"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 name   | text    |           |          |
 age    | integer |           |          |
Indexes:
    "pgtst_tbl2_pkey" PRIMARY KEY, btree (id)
    "pgtst_tbl2_idx_age" btree (age)
pgtst_db=>

Now load data:

insert into pgtst_tbl2
SELECT id,name,mod(id,100)
FROM pgtst_tbl1
where id < 1000
order by random();

pgtst_db=> insert into pgtst_tbl2
pgtst_db-> SELECT id,name,mod(id,100)
pgtst_db-> FROM pgtst_tbl1
pgtst_db-> where id < 1000
pgtst_db-> order by random();
INSERT 0 999

pgtst_db=> commit;
WARNING:  there is no transaction in progress
COMMIT

pgtst_db=>
pgtst_db=> select * from pgtst_tbl2 limit 10;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------
id   | 704
name | CTST
age  | 4
-[ RECORD 2 ]--------------------------------------------------------------------------------------------
id   | 42
name | CHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKTST
age  | 42
-[ RECORD 3 ]--------------------------------------------------------------------------------------------
id   | 512
name | CHKCHKCHKTST
age  | 12
-[ RECORD 4 ]--------------------------------------------------------------------------------------------
id   | 38
name | CHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHTST
age  | 38
-[ RECORD 5 ]--------------------------------------------------------------------------------------------
id   | 567
name | CHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCTST
age  | 67
-[ RECORD 6 ]--------------------------------------------------------------------------------------------
id   | 605
name | CHTST
age  | 5
-[ RECORD 7 ]--------------------------------------------------------------------------------------------
id   | 450
name | CHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHTST
age  | 50
-[ RECORD 8 ]--------------------------------------------------------------------------------------------
id   | 85
name | CHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCHKCTST
age  | 85
pgtst_db=>


Rows arent sorted. So this time, we will evaluate the status of both the indexes and also when we do cluster we will increase the fillfactor.

pg_indexes
pg_class
select c.schemaname
,c.tablename
,c.indexname
,b.indisunique
,b.indisprimary
,b.indisclustered
,b.indisvalid
,b.indisready
,b.indislive
from pg_class a
,pg_index b
,pg_indexes c
where a.oid=b.indexrelid
and a.relnamespace=(select oid from pg_namespace where nspname='pgtst_schema')
and c.schemaname='pgtst_schema'
and c.indexname=a.relname
;

Actual output:

pgtst_db=> select c.schemaname
pgtst_db-> ,c.tablename
pgtst_db-> ,c.indexname
pgtst_db-> ,b.indisunique
pgtst_db-> ,b.indisprimary
pgtst_db-> ,b.indisclustered
pgtst_db-> ,b.indisvalid
pgtst_db-> ,b.indisready
pgtst_db-> ,b.indislive
pgtst_db-> from pg_class a
pgtst_db-> ,pg_index b
pgtst_db-> ,pg_indexes c
pgtst_db-> where a.oid=b.indexrelid
pgtst_db-> and a.relnamespace=(select oid from pg_namespace where nspname='pgtst_schema')
pgtst_db-> and c.schemaname='pgtst_schema'
pgtst_db-> and c.indexname=a.relname
pgtst_db-> ;

  schemaname  |     tablename     |       indexname        | indisunique | indisprimary | indisclustered | indisvalid | in
disready | indislive
--------------+-------------------+------------------------+-------------+--------------+----------------+------------+---
---------+-----------
 pgtst_schema | pgtst_tbl1        | pgtst_tbl1_pkey        | t           | t            | f              | t          | t
         | t
 pgtst_schema | PGTST_TBL1        | PGTST_TBL1_pkey        | t           | t            | f              | t          | t
         | t
 pgtst_schema | pgtst_tbl1_random | pgtst_tbl1_random_pkey | t           | t            | t              | t          | t
         | t
 pgtst_schema | pgtst_tbl2        | pgtst_tbl2_pkey        | t           | t            | f              | t          | t
         | t
 pgtst_schema | pgtst_tbl2        | pgtst_tbl2_idx_age     | f           | f            | f              | t          | t
         | t
(5 rows)
pgtst_db=>

So we see is index valid (indisvalid) and is index ready (indisready) flags hold correct. All the indexes are valid and are ready for the usage.
Let us now do 2 tasks...
1) Adjust fillfactor to 80 for the table
2) Run cluster command and check the index on age column for its status

alter table pgtst_schema.pgtst_tbl2 set (fillfactor=80);

Actual output:

pgtst_db=> select reloptions from pg_class where relname='pgtst_tbl2';
   reloptions
-----------------
 {fillfactor=80}
(1 row)
pgtst_db=>

Now we have 2 options - vaccum full on the table impacted or running cluster command.

We run cluster:
Actual output:

pgtst_db=> cluster pgtst_tbl2 using pgtst_tbl2_pkey;
CLUSTER
pgtst_db=> select * from pgtst_tbl2 limit 10;
 id |    name    | age
----+------------+-----
  1 | T          |   1
  2 | TS         |   2
  3 | TST        |   3
  4 | CTST       |   4
  5 | CHTST      |   5
  6 | CHKTST     |   6
  7 | CHKCTST    |   7
  8 | CHKCHTST   |   8
  9 | CHKCHKTST  |   9
 10 | CHKCHKCTST |  10
(10 rows)
pgtst_db=>

pgtst_db=> select c.schemaname
pgtst_db-> ,c.tablename
pgtst_db-> ,c.indexname
pgtst_db-> ,b.indisunique
pgtst_db-> ,b.indisprimary
pgtst_db-> ,b.indisclustered
pgtst_db-> ,b.indisvalid
pgtst_db-> ,b.indisready
pgtst_db-> ,b.indislive
pgtst_db-> from pg_class a
pgtst_db-> ,pg_index b
pgtst_db-> ,pg_indexes c
pgtst_db-> where a.oid=b.indexrelid
pgtst_db-> and a.relnamespace=(select oid from pg_namespace where nspname='pgtst_schema')
pgtst_db-> and c.schemaname='pgtst_schema'
pgtst_db-> and c.indexname=a.relname
pgtst_db-> ;
  schemaname  |     tablename     |       indexname        | indisunique | indisprimary | indisclustered | indisvalid | in
disready | indislive
--------------+-------------------+------------------------+-------------+--------------+----------------+------------+---
---------+-----------
 pgtst_schema | pgtst_tbl1        | pgtst_tbl1_pkey        | t           | t            | f              | t          | t
         | t
 pgtst_schema | PGTST_TBL1        | PGTST_TBL1_pkey        | t           | t            | f              | t          | t
         | t
 pgtst_schema | pgtst_tbl1_random | pgtst_tbl1_random_pkey | t           | t            | t              | t          | t
         | t
 pgtst_schema | pgtst_tbl2        | pgtst_tbl2_idx_age     | f           | f            | f              | t          | t
         | t
 pgtst_schema | pgtst_tbl2        | pgtst_tbl2_pkey        | t           | t            | t              | t          | t
         | t
(5 rows)

We see now pgtst_tbl2_idx_age is still valid and ready. Also note pgtst_tbl2_pkey became clustered true vs false last time before clustering done using this sql.
So this means clustering doesnt affect other indexes in the table.

pgtst_db=> \dt+
                                               List of relations
    Schema    |       Name        | Type  |   Owner   | Persistence | Access method |    Size    | Description
--------------+-------------------+-------+-----------+-------------+---------------+------------+-------------
 pgtst_schema | PGTST_TBL1        | table | pgtst_usr | permanent   | heap          | 8192 bytes |
 pgtst_schema | pgtst_tbl1        | table | pgtst_usr | permanent   | heap          | 825 MB     |
 pgtst_schema | pgtst_tbl1_random | table | pgtst_usr | permanent   | heap          | 828 MB     |
 pgtst_schema | pgtst_tbl2        | table | pgtst_usr | permanent   | heap          | 120 kB     |
(4 rows)
pgtst_db=>

Thanks



Foot notes... later additions:
Index status examination sql:

select c.schemaname
,c.tablename
,c.indexname
,b.indisunique
,b.indisprimary
,b.indisclustered
,b.indisvalid
,b.indisready
,b.indislive
from pg_class a
,pg_index b
,pg_indexes c
,pg_namespace d
where d.nspname='pgtst_schema'
and c.schemaname=d.nspname
and c.tablename='pgtst_tbl1_random'
and c.indexname=a.relname
and a.oid=b.indexrelid
and a.relnamespace=d.oid
;

pgtst_db=# select c.schemaname
pgtst_db-# ,c.tablename
pgtst_db-# ,c.indexname
pgtst_db-# ,b.indisunique
pgtst_db-# ,b.indisprimary
pgtst_db-# ,b.indisclustered
pgtst_db-# ,b.indisvalid
pgtst_db-# ,b.indisready
pgtst_db-# ,b.indislive
pgtst_db-# from pg_class a
pgtst_db-# ,pg_index b
pgtst_db-# ,pg_indexes c
pgtst_db-# ,pg_namespace d
pgtst_db-# where d.nspname='pgtst_schema'
pgtst_db-# and c.schemaname=d.nspname
pgtst_db-# and c.tablename='pgtst_tbl1_random'
pgtst_db-# and c.indexname=a.relname
pgtst_db-# and a.oid=b.indexrelid
pgtst_db-# and a.relnamespace=d.oid
pgtst_db-# ;
  schemaname  |     tablename     |       indexname        | indisunique | indisprimary | indisclustered | indisvalid | in
disready | indislive
--------------+-------------------+------------------------+-------------+--------------+----------------+------------+---
---------+-----------
 pgtst_schema | pgtst_tbl1_random | pgtst_tbl1_random_pkey | t           | t            | t              | t          | t
         | t
(1 row)

No comments:

Post a Comment

Flashback data archive steps

 Objective: Ways to track DML changes in a table Detailed objective: We should be able to track the changes in the table at any point in tim...