PostgreSQL Bugs

Collected from the PG bugs email list.

Bug ID:      16280
PG Version:  11.6
OS:          Centos 7.4
Opened:      2020-02-27 11:28:11+00
Reported by: Ilya Serbin
Status:      New

Body of first available message related to this bug follows.

The following bug has been logged on the website:

Bug reference:      16280
Logged by:          Ilya Serbin
Email address:      (redacted)
PostgreSQL version: 11.6
Operating system:   Centos 7.4
Description:        

Hello all,
I am seeing some strange plan changes with a query (the query itself is quite
bad, but I can't understand why its plan changes).
Context:

db1=> \d+ table1
                                                    Table "db1.table1"
   Column   |            Type             | Collation | Nullable |              Default               | Storage  | Stats target | Description
------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id         | bigint                      |           | not null | nextval('table1_id_seq'::regclass) | plain    |              |
 table1_id  | character varying(255)      |           | not null |                                    | extended |              |
 content    | jsonb                       |           |          |                                    | extended |              |
 created_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 updated_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 queued_at  | timestamp with time zone    |           |          |                                    | plain    |              |
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "uk_table1_id" UNIQUE CONSTRAINT, btree (table1_id)
    "content_idx" gin (content jsonb_path_ops)
Referenced by:
    TABLE "collection_table1s" CONSTRAINT "fk_collection_table1s_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
    TABLE "db1_table1s" CONSTRAINT "fk_db1_table1s_table1_id" FOREIGN KEY (table1s_id) REFERENCES table1(id)
    TABLE "table1_sort" CONSTRAINT "fk_table1_sort_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(table1_id)

              name              |  setting
--------------------------------+-----------
 constraint_exclusion           | partition
 cpu_index_tuple_cost           | 0.005
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 default_statistics_target      | 200
 effective_cache_size           | 1572864
 enable_bitmapscan              | on
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexonlyscan           | on
 enable_indexscan               | on
 enable_material                | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_parallel_append         | on
 enable_parallel_hash           | on
 enable_partition_pruning       | on
 enable_partitionwise_aggregate | off
 enable_partitionwise_join      | off
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 random_page_cost               | 1.1
 seq_page_cost                  | 1
 shared_buffers                 | 524288
(25 rows)


First plan (good plan, with the problematic query):
https://explain.tensor.ru/archive/explain/1cf3c0181a9574bf2fd06d6fd07dc201:0:2020-02-27#context
Second plan (same query, its bad plan):
https://explain.tensor.ru/archive/explain/3770d39f786135e38bebeb1a8a4b1da9:0:2020-02-27#context

Issue description:
After some minimal activity in the database, the first plan changes to the
second. Running ANALYZE on table1 does not help (I tried various
default_statistics_target values).
Recreating the content_idx index helps for a while, but several minutes later
the plan degrades back to the second one.
Surprisingly, the only thing that helped is VACUUM. It too helps only for a
while: once the number of dead tuples reaches something like 300-500, the
plan changes back to the second one.
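
The flip described above can be watched from psql. The sketch below uses the table and index names from this report; the EXPLAIN statement is a hypothetical stand-in, since the actual query is only available via the explain.tensor.ru links:

```sql
-- Track dead-tuple accumulation on table1 (names taken from the report above).
SELECT n_live_tup, n_dead_tup, n_mod_since_analyze, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'table1';

-- Hypothetical probe against the content_idx GIN index; re-run it as
-- n_dead_tup grows to see when the plan switches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM table1 WHERE content @> '{"some_key": "some_value"}';
```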

Table details with bad plan:

db1=> select * from pg_stat_user_tables where relname='table1';
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116536
idx_tup_fetch       | 42353778
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 379
n_mod_since_analyze | 1169
last_vacuum         | 2020-02-12 08:58:06.147247+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        |
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 2
autovacuum_count    | 702
analyze_count       | 0
autoanalyze_count   | 20

Table details with good plan after vacuum:
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116662
idx_tup_fetch       | 42360694
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-26 08:10:49.884625+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        | 2020-02-26 08:13:07.253307+01
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 3
autovacuum_count    | 702
analyze_count       | 1
autoanalyze_count   | 20

I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6 since
I thought it might be a bug fixed in 11.5: "Fix possible failure of planner's
index endpoint probes (Tom Lane)"). It is always reproducible, and can be
reproduced on production and all lower environments.
I can't understand why it happens. As far as I understand, it has something
to do with the visibility map, but per the documentation that should only
affect index-only scans, which is not my case.
My two main questions are:
1) Is it a bug?
2) If it is expected behaviour, can someone please explain why it happens,
and whether there is any way to keep the good plan (without installing
extensions to force/pin plans)?
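
Since a manual VACUUM restores the good plan until dead tuples re-accumulate, one mitigation worth testing (an assumption on my part, not something suggested in this thread) is tightening the per-table autovacuum thresholds so vacuum fires before the observed breaking point:

```sql
-- Hypothetical mitigation: make autovacuum run on table1 well before
-- dead tuples reach the ~300-500 range observed in this report.
ALTER TABLE table1 SET (
    autovacuum_vacuum_scale_factor = 0,   -- ignore table size entirely...
    autovacuum_vacuum_threshold    = 200  -- ...and vacuum after ~200 dead tuples
);
```

This trades extra vacuum activity for plan stability; whether that is acceptable depends on the table's update rate.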

Best regards, 
Ilya

Messages

Date                   | Author                | Subject
-----------------------+-----------------------+----------------------------------------------------------------------
2020-02-27 11:28:11+00 | PG Bug reporting form | BUG #16280: dead tuples (probably) effect plan and query performance
2020-02-27 14:07:51+00 | Alexey Ermakov        | Re: BUG #16280: dead tuples (probably) effect plan and query performance
2020-02-27 15:41:06+00 | Tom Lane              | Re: BUG #16280: dead tuples (probably) effect plan and query performance
2020-02-28 09:25:22+00 | "Serbin, Ilya"        | Re: BUG #16280: dead tuples (probably) effect plan and query performance
2020-03-11 10:50:41+00 | "Serbin, Ilya"        | Re: BUG #16280: dead tuples (probably) effect plan and query performance
2020-03-11 13:01:16+00 | James Coleman         | Re: BUG #16280: dead tuples (probably) effect plan and query performance
2020-03-17 17:36:47+00 | "Serbin, Ilya"        | Re: BUG #16280: dead tuples (probably) effect plan and query performance