Objective: Compare performance of searching & updating a partitioned key vs non partitioned key in postgresql
Test Setup:
Table 1 -- candidate for updating the non-partition key (lvl) in a list partition table:
lvl integer,
dob date,
hosptl text,
fn text,
ln text,
poscode text
) partition by list(hosptl);
CREATE TABLE public.test_part_noparkeyupd_a partition of public.test_part_noparkeyupd for values in ('AAAAAAAAAAAAAAAAAAAA');
CREATE TABLE public.test_part_noparkeyupd_b partition of public.test_part_noparkeyupd for values in ('BBBBBBBBBBBBBBBBBBBA');
CREATE TABLE public.test_part_noparkeyupd_c partition of public.test_part_noparkeyupd for values in ('CCCCCCCCCCCCCCCCCCCA');
CREATE TABLE public.test_part_noparkeyupd_d partition of public.test_part_noparkeyupd for values in ('DDDDDDDDDDDDDDDDDDDA');
CREATE TABLE public.test_part_noparkeyup_ddefault partition of public.test_part_noparkeyupd default;
Table 2 -- candidate for updating the partition-key(lvl) in a range partition table:
We will test 2 scenario here, 1 the row stays within the partition boundary and other the row moves to another partition.
CREATE TABLE public.test_part_partkeyupd2 (
dob date,
hosptl text,
fn text,
ln text,
poscode text
) partition by range(lvl);
CREATE TABLE public.test_part_partkeyupd2_54 partition of public.test_part_partkeyupd2 FOR VALUES FROM (0) TO (54);
CREATE TABLE public.test_part_partkeyupd2_108 partition of public.test_part_partkeyupd2 FOR VALUES FROM (54) TO (108);
CREATE TABLE public.test_part_partkeyupd2_162 partition of public.test_part_partkeyupd2 FOR VALUES FROM (108) TO (162);
CREATE TABLE public.test_part_partkeyupd2_216 partition of public.test_part_partkeyupd2 FOR VALUES FROM (162) TO (216);
CREATE TABLE public.test_part_partkeyupd2_max partition of public.test_part_partkeyupd2 FOR VALUES FROM (216) TO (maxvalue);
Table created:
List of relations
Schema | Name | Type | Owner
--------+---------------------------------+-------------------+----------
...
public | test_part_noparkeyupd | partitioned table | postgres
public | test_part_noparkeyupd_a | table | postgres
public | test_part_noparkeyupd_b | table | postgres
public | test_part_noparkeyupd_c | table | postgres
public | test_part_noparkeyupd_d | table | postgres
public | test_part_partkeyupd | partitioned table | postgres
public | test_part_partkeyupd_200 | table | postgres
public | test_part_partkeyupd_400 | table | postgres
public | test_part_partkeyupd_600 | table | postgres
public | test_part_partkeyupd_800 | table | postgres
public | test_part_partkeyupd_max | table | postgres
(22 rows)
postgres=#
Load data:
with t as
(
select generate_series(0,1000) lvl
)
select t.lvl
,(mod(t.lvl,100)+date '1/1/2022') as dob
,lpad('A',(mod(t.lvl,20)+(20-mod(t.lvl,20))),chr(65+mod((t.lvl/26),26))) as hosptl
,substr(md5(random()::text),15) as fn
,substr(md5(random()::text),15) as ln
,'UKCO'||chr(65+mod(t.lvl,26))||chr(65+mod((t.lvl/26),26)) as POSCODE
from t;
insert into public.test_part_partkeyupd2
with t as
(
select generate_series(0,1000) lvl
)
select t.lvl
,(mod(t.lvl,100)+date '1/1/2022') as dob
,lpad('A',(mod(t.lvl,20)+(20-mod(t.lvl,20))),chr(65+mod((t.lvl/26),26))) as hosptl
,substr(md5(random()::text),15) as fn
,substr(md5(random()::text),15) as ln
,'UKCO'||chr(65+mod(t.lvl,26))||chr(65+mod((t.lvl/26),26)) as POSCODE
from t;
Post stats update:
relname | relpages | reltuples | relallvisible
-------------------------------+----------+-----------+---------------
test_part_noparkeyup_ddefault | 11 | 793 | 0
test_part_noparkeyupd | -1 | 1001 | 0
test_part_noparkeyupd_a | 1 | 52 | 0
test_part_noparkeyupd_b | 1 | 52 | 0
test_part_noparkeyupd_c | 1 | 52 | 0
test_part_noparkeyupd_d | 1 | 52 | 0
test_part_partkeyupd | -1 | 1001 | 0
test_part_partkeyupd2 | -1 | 1001 | 0
test_part_partkeyupd2_108 | 1 | 54 | 0
test_part_partkeyupd2_162 | 1 | 54 | 0
test_part_partkeyupd2_216 | 1 | 54 | 0
test_part_partkeyupd2_54 | 1 | 54 | 0
test_part_partkeyupd2_max | 11 | 785 | 0
test_part_partkeyupd_200 | 3 | 200 | 0
test_part_partkeyupd_400 | 3 | 200 | 0
test_part_partkeyupd_600 | 3 | 200 | 0
test_part_partkeyupd_800 | 3 | 200 | 0
test_part_partkeyupd_max | 3 | 201 | 0
(18 rows)
Time: 0.919 ms
Test 1: Update non part key column - by searching only the non part key column
postgres=# explain analyze update public.test_part_noparkeyupd set lvl=10 where lvl=2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_noparkeyupd (cost=0.00..27.54 rows=0 width=0) (actual time=0.313..0.315 rows=0 loops=1)
Update on test_part_noparkeyupd_a test_part_noparkeyupd_1
Update on test_part_noparkeyupd_b test_part_noparkeyupd_2
Update on test_part_noparkeyupd_c test_part_noparkeyupd_3
Update on test_part_noparkeyupd_d test_part_noparkeyupd_4
Update on test_part_noparkeyup_ddefault test_part_noparkeyupd_5
-> Append (cost=0.00..27.54 rows=5 width=14) (actual time=0.010..0.264 rows=1 loops=1)
-> Seq Scan on test_part_noparkeyupd_a test_part_noparkeyupd_1 (cost=0.00..1.65 rows=1 width=14) (actual time=0.009..0.019 rows=1 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 51
-> Seq Scan on test_part_noparkeyupd_b test_part_noparkeyupd_2 (cost=0.00..1.65 rows=1 width=14) (actual time=0.016..0.016 rows=0 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 52
-> Seq Scan on test_part_noparkeyupd_c test_part_noparkeyupd_3 (cost=0.00..1.65 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 52
-> Seq Scan on test_part_noparkeyupd_d test_part_noparkeyupd_4 (cost=0.00..1.65 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 52
-> Seq Scan on test_part_noparkeyup_ddefault test_part_noparkeyupd_5 (cost=0.00..20.91 rows=1 width=14) (actual time=0.198..0.198 rows=0 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 793
Planning Time: 0.333 ms
Execution Time: 0.448 ms
(24 rows)
Time: 2.602 ms
postgres=#
Observation:
1) no partition pruning as expected, since we are searching with a nonpart key column.
Test 2: Update non part key column - by searching both part key (for partition pruning) & non part key (update) columns.
postgres=# explain analyze update public.test_part_noparkeyupd set lvl=10 where lvl=75 and hosptl='BBBBBBBBBBBBBBBBBBBA';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_noparkeyupd (cost=0.00..1.78 rows=0 width=0) (actual time=0.023..0.024 rows=0 loops=1)
Update on test_part_noparkeyupd_b test_part_noparkeyupd_1
-> Seq Scan on test_part_noparkeyupd_b test_part_noparkeyupd_1 (cost=0.00..1.78 rows=1 width=14) (actual time=0.022..0.023 rows=0 loops=1)
Filter: ((lvl = 75) AND (hosptl = 'BBBBBBBBBBBBBBBBBBBA'::text))
Rows Removed by Filter: 52
Planning Time: 0.110 ms
Execution Time: 0.105 ms
(7 rows)
Time: 1.399 ms
postgres=#
Observation:
1) Partition pruning improved the performance and decreased the query run duration to 1/4th or by 75%.
Repeat test 2 for different values in preference clause (to consider cache effect):
postgres=# explain analyze update public.test_part_noparkeyupd set lvl=20 where lvl=85 and hosptl='BBBBBBBBBBBBBBBBBBBA';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_noparkeyupd (cost=0.00..1.78 rows=0 width=0) (actual time=0.016..0.016 rows=0 loops=1)
Update on test_part_noparkeyupd_b test_part_noparkeyupd_1
-> Seq Scan on test_part_noparkeyupd_b test_part_noparkeyupd_1 (cost=0.00..1.78 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
Filter: ((lvl = 85) AND (hosptl = 'BBBBBBBBBBBBBBBBBBBA'::text))
Rows Removed by Filter: 52
Planning Time: 0.074 ms
Execution Time: 0.034 ms
(7 rows)
Time: 0.379 ms
postgres=#
Observation:
1) The run duration reduced by 70% more.
Test 3: Update part key column - stay within same partition (no record movement) by searching on part key column
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=10 where lvl=2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.065..0.065 rows=0 loops=1)
Update on test_part_partkeyupd2_54 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_54 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.009..0.019 rows=1 loops=1)
Filter: (lvl = 2)
Rows Removed by Filter: 53
Planning Time: 0.261 ms
Execution Time: 0.118 ms
(7 rows)
Time: 2.726 ms
postgres=#
Observation:
1) partition pruning has happened and row (though we updated partition key column) was updated without any issues.
2) Query performance was comparable against the other nonpartkey update routine.
Repeat no row movement tests to rule out cache effect with partition key update:
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=12 where lvl=4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.071..0.072 rows=0 loops=1)
Update on test_part_partkeyupd2_54 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_54 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.018..0.029 rows=1 loops=1)
Filter: (lvl = 4)
Rows Removed by Filter: 53
Planning Time: 0.098 ms
Execution Time: 0.097 ms
(7 rows)
Time: 1.963 ms
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=16 where lvl=6;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.054..0.054 rows=0 loops=1)
Update on test_part_partkeyupd2_54 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_54 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.016..0.026 rows=1 loops=1)
Filter: (lvl = 6)
Rows Removed by Filter: 53
Planning Time: 0.079 ms
Execution Time: 0.077 ms
(7 rows)
Time: 1.816 ms
postgres=#
Observation:
There is a overhead with update on the partition key column. If we observe our test with no part key column update it took .034secs vs .077secs with part key updates.
Test 4: Update part key column - Move between partitions (row movement); here 56 stays in 2nd partition and 112 stays in 3rd parition. So a row should be deleted from 2nd partition and a row should be inserted in 3rd partition with value update.
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=112 where lvl=56;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.160..0.161 rows=0 loops=1)
Update on test_part_partkeyupd2_108 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_108 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.014..0.025 rows=1 loops=1)
Filter: (lvl = 56)
Rows Removed by Filter: 53
Planning Time: 0.585 ms
Execution Time: 0.221 ms
(7 rows)
Time: 2.571 ms
postgres=#
Observation:
1) partition pruning happened as expected and no issues, which is good :)
2) Run duration incrased by 1x more!! As expected the query performance degrades. You can see below the row movement (part108 ,1 row deleted. part162, 1 row inserted)
relid | 16665 relid | 16665
schemaname | public schemaname | public
relname | test_part_partkeyupd2_108 relname | test_part_partkeyupd2_108
seq_scan | 0 | seq_scan | 1
seq_tup_read | 0 | seq_tup_read | 54
..
n_tup_del | 0 | n_tup_del | 1 <<< 1 row deleted here
..
n_live_tup | 54 | n_live_tup | 53
n_dead_tup | 0 | n_dead_tup | 1
n_mod_since_analyze | 0 | n_mod_since_analyze | 1
..
-[ RECORD 16 ]------+------------------------------ -[ RECORD 16 ]------+------------------------------
relid | 16670 relid | 16670
schemaname | public schemaname | public
relname | test_part_partkeyupd2_162 relname | test_part_partkeyupd2_162
..
n_tup_ins | 54 | n_tup_ins | 55 <<< 1 row inserted here
..
n_live_tup | 54 | n_live_tup | 55
..
n_mod_since_analyze | 0 | n_mod_since_analyze | 1
n_ins_since_vacuum | 54 | n_ins_since_vacuum | 55
Repeat row movement tests to rule out cache effect:
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=116 where lvl=58;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.097..0.097 rows=0 loops=1)
Update on test_part_partkeyupd2_108 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_108 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.021..0.033 rows=1 loops=1)
Filter: (lvl = 58)
Rows Removed by Filter: 52
Planning Time: 0.102 ms
Execution Time: 0.125 ms
(7 rows)
Time: 2.116 ms
postgres=#
explain analyze update public.test_part_partkeyupd2 set lvl=186 where lvl=112;
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=186 where lvl=112;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.131..0.131 rows=0 loops=1)
Update on test_part_partkeyupd2_162 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_162 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.011..0.022 rows=2 loops=1)
Filter: (lvl = 112)
Rows Removed by Filter: 54
Planning Time: 0.166 ms
Execution Time: 0.190 ms
(7 rows)
Time: 2.791 ms
postgres=#
explain analyze update public.test_part_partkeyupd2 set lvl=190 where lvl=114;
postgres=# explain analyze update public.test_part_partkeyupd2 set lvl=190 where lvl=114;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Update on test_part_partkeyupd2 (cost=0.00..1.68 rows=0 width=0) (actual time=0.059..0.059 rows=0 loops=1)
Update on test_part_partkeyupd2_162 test_part_partkeyupd2_1
-> Seq Scan on test_part_partkeyupd2_162 test_part_partkeyupd2_1 (cost=0.00..1.68 rows=1 width=14) (actual time=0.016..0.026 rows=1 loops=1)
Filter: (lvl = 114)
Rows Removed by Filter: 53
Planning Time: 0.075 ms
Execution Time: 0.082 ms
(7 rows)
Time: 2.040 ms
postgres=#
Observations with repeated test case which had row movement:
The partkey update with or without row movement has a comparable performance overhead against the non partkey update in a partition table. Hence updating a partkey column negatively impacts the query performance.
Conclusion:
It is better to avoid any DMLs on the part key columns in a partition table though postgresql doesnt fail. It properly shows in table stats a row deleted from old partition and inserted in to new partition.
Thank you!
No comments:
Post a Comment