From c2fe139c201c48f1133e9fbea2dd99b8efe2fadd Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 11 Mar 2019 12:46:41 -0700
Subject: [PATCH] tableam: Add and use scan APIs.

To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM-independent fields from
HeapScanDesc.

The previous heap_{beginscan,rescan,endscan} et al. have been replaced
with table_ versions. There's no direct replacement for
heap_getnext(), as that returned a HeapTuple, which is undesirable for
other AMs. Instead there's table_scan_getnextslot(). Note, however,
that heap_getnext() lives on: it's still widely used to access catalog
tables.

This is achieved by new scan_begin, scan_end, scan_rescan, and
scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends needs
to be managed without the user doing per-AM work. To achieve that, new
parallelscan_{estimate, initialize, reinitialize} callbacks are
introduced. They operate on a new ParallelTableScanDesc, which again
can be subclassed by AMs.

As it is likely that several AMs are going to be block oriented,
block-oriented callbacks that can be shared between such AMs are
provided and used by heap. These are
table_block_parallelscan_{estimate, initialize, reinitialize} as
callbacks, and table_block_parallelscan_{nextpage, startblock_init}
for use in AMs. These operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
state such as buffers needs to be kept across individual accesses to
the heap. That's now handled by introducing a sort-of-scan
IndexFetchTable, which again is intended to be subclassed by
individual AMs (for heap, IndexFetchHeap).
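The following is a minimal C sketch of the "base class" arrangement described in 3). All types and functions in it are hypothetical stand-ins (FetchBase, HeapLikeFetch, BlockNo and the fetch_* functions are illustrative names, not PostgreSQL APIs).

It shows three things:
- The AM-independent part is embedded as the first member of the AM's own struct.
- Callers only hold a pointer to that base.
- The AM downcasts to reach its private state. Here that state is a cached block kept across individual fetches, so consecutive fetches from the same block avoid a re-read.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are PostgreSQL APIs. */
typedef unsigned int BlockNo;
#define INVALID_BLOCK ((BlockNo) 0xFFFFFFFFu)

/* AM-independent "base class": every AM's fetch state starts with it. */
typedef struct FetchBase
{
    const char *relname;        /* relation being fetched from */
} FetchBase;

/* Heap-like "subclass": base first, then AM-private state that must
 * survive across individual fetches (here, the currently held block). */
typedef struct HeapLikeFetch
{
    FetchBase   base;           /* must be the first member */
    BlockNo     cur_block;      /* block currently held, if any */
    int         reads;          /* number of block reads performed */
} HeapLikeFetch;

static FetchBase *
fetch_begin(const char *relname)
{
    HeapLikeFetch *f = malloc(sizeof(HeapLikeFetch));

    if (f == NULL)
        abort();
    f->base.relname = relname;
    f->cur_block = INVALID_BLOCK;
    f->reads = 0;
    return &f->base;            /* callers only ever see the base type */
}

/* Return a fake payload for (block, offset); re-read only on block change. */
static int
fetch_tuple(FetchBase *scan, BlockNo block, unsigned offset)
{
    HeapLikeFetch *f = (HeapLikeFetch *) scan;  /* downcast to the AM type */

    assert(scan != NULL);
    if (f->cur_block != block)
    {
        /* A real AM would release the old buffer and read the new block. */
        f->cur_block = block;
        f->reads++;
    }
    return (int) (block * 100 + offset);
}

/* Forget cached state, e.g. before the scan is restarted. */
static void
fetch_reset(FetchBase *scan)
{
    ((HeapLikeFetch *) scan)->cur_block = INVALID_BLOCK;
}

static void
fetch_end(FetchBase *scan)
{
    fetch_reset(scan);
    free((HeapLikeFetch *) scan);   /* free the original allocation */
}

static int
fetch_reads(const FetchBase *scan)
{
    return ((const HeapLikeFetch *) scan)->reads;
}
```

Fetching (7,1), (7,3) and then (9,2) performs two block reads in this sketch. The diff below uses the same embedding idea when it casts a TableScanDesc to a HeapScanDesc and reaches the shared fields through rs_base.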
The relevant callbacks for an AM are index_fetch_{begin, reset, end},
which manage the necessary state, and index_fetch_tuple, which
retrieves an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT: the
currently alive tuple in the update chain needs to be fetched if
appropriate.

Similar to table_scan_getnextslot(), it's undesirable to continue to
return HeapTuples. Thus index_fetch_heap (might want to rename that
later) now accepts a slot as an argument. Core code doesn't have a lot
of call sites performing index scans without going through the
systable_* API (in contrast to loads of heap_getnext calls and working
directly with HeapTuples).

Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup.t_self. As the target
is not generally a HeapTuple anymore, that seems cleaner.

To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's type.
table_slot_callbacks() and table_slot_create() are based upon that,
but have additional logic to deal with views, foreign tables, etc.

While this change could have been done separately, nearly all the call
sites that needed to be adapted for the rest of this commit would also
have needed to be adapted for table_slot_callbacks(), making
separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required because a
few places no longer have a buffer + HeapTuple around, only a slot
(which in heap's case internally has that information).

Additionally, a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al., now
internally uses a slot to keep track of tuples.
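The following is a minimal C sketch of the arrangement in I). All types and functions in it are hypothetical stand-ins (Row, Slot, SlotOps, SysScanLike and the sysscan_* / rowlike_* functions are illustrative names, not PostgreSQL APIs).

The scan descriptor owns a slot. That slot's behaviour comes from a per-AM ops table, in the spirit of the slot_callbacks in a). The scan stores each row into the slot, and the legacy-style getnext still hands the caller a plain row pointer taken from that slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not PostgreSQL's types. */
typedef struct Row
{
    int         key;
} Row;

typedef struct Slot Slot;

/* Per-AM slot behaviour, in the spirit of what slot_callbacks hands out. */
typedef struct SlotOps
{
    void        (*store) (Slot *slot, const Row *row);  /* fill the slot */
    const Row  *(*get_row) (Slot *slot);                /* expose as a row */
} SlotOps;

struct Slot
{
    const SlotOps *ops;
    Row         data;
    int         empty;
};

static void
rowlike_store(Slot *slot, const Row *row)
{
    slot->data = *row;
    slot->empty = 0;
}

static const Row *
rowlike_get_row(Slot *slot)
{
    return slot->empty ? NULL : &slot->data;
}

static const SlotOps RowLikeSlotOps = {rowlike_store, rowlike_get_row};

/* A scan descriptor that owns a slot and keeps its current row in it. */
typedef struct SysScanLike
{
    const Row  *rows;           /* pretend table contents */
    size_t      nrows;
    size_t      next;           /* index of the next row to return */
    Slot        slot;           /* scan-owned slot holding the current row */
} SysScanLike;

static void
sysscan_begin(SysScanLike *scan, const Row *rows, size_t nrows)
{
    assert(rows != NULL || nrows == 0);
    scan->rows = rows;
    scan->nrows = nrows;
    scan->next = 0;
    scan->slot.ops = &RowLikeSlotOps;
    scan->slot.empty = 1;
}

/* Legacy-style getnext: fill the internal slot, then hand back a plain
 * row pointer into it. The pointer is only valid until the next call. */
static const Row *
sysscan_getnext(SysScanLike *scan)
{
    if (scan->next >= scan->nrows)
    {
        scan->slot.empty = 1;
        return NULL;
    }
    scan->slot.ops->store(&scan->slot, &scan->rows[scan->next++]);
    return scan->slot.ops->get_row(&scan->slot);
}
```

Returning a pointer into the scan-owned slot keeps callers that expect a tuple pointer working. The cost is that the pointer is overwritten by the next call; that lifetime rule is an assumption of this sketch.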
While systable_getnext() still returns HeapTuples, and will do so for
the foreseeable future, the index API (see 3) above) now only deals
with slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
---
contrib/amcheck/verify_nbtree.c | 15 +- contrib/pgrowlocks/pgrowlocks.c | 20 +- contrib/pgstattuple/pgstattuple.c | 22 +- contrib/tsm_system_rows/tsm_system_rows.c | 14 +- contrib/tsm_system_time/tsm_system_time.c | 9 +- src/backend/access/gist/gistget.c | 4 +- src/backend/access/hash/hashsearch.c | 4 +- src/backend/access/heap/heapam.c | 650 +++++++-------------- src/backend/access/heap/heapam_handler.c | 166 ++++++ src/backend/access/index/genam.c | 110 ++-- src/backend/access/index/indexam.c | 164 ++---- src/backend/access/nbtree/nbtree.c | 2 +- src/backend/access/nbtree/nbtsearch.c | 6 +- src/backend/access/nbtree/nbtsort.c | 20 +- src/backend/access/spgist/spgscan.c | 2 +- src/backend/access/table/tableam.c | 293 +++++++++- src/backend/access/table/tableamapi.c | 26 +- src/backend/access/tablesample/system.c | 7 +- src/backend/bootstrap/bootstrap.c | 21 +- src/backend/catalog/aclchk.c | 13 +- src/backend/catalog/index.c | 114 ++-- src/backend/catalog/pg_conversion.c | 7 +- src/backend/catalog/pg_db_role_setting.c | 7 +- src/backend/catalog/pg_publication.c | 7 +- src/backend/catalog/pg_subscription.c | 7 +- src/backend/commands/cluster.c | 29 +- src/backend/commands/constraint.c | 68 ++- src/backend/commands/copy.c | 7 +- src/backend/commands/dbcommands.c | 19 +- src/backend/commands/indexcmds.c | 7 +- src/backend/commands/tablecmds.c | 135 +++-- src/backend/commands/tablespace.c | 37 +- src/backend/commands/typecmds.c | 29 +- src/backend/commands/vacuum.c | 13 +- src/backend/executor/execCurrent.c | 2 +-
src/backend/executor/execIndexing.c | 18 +- src/backend/executor/execMain.c | 6 +- src/backend/executor/execPartition.c | 11 +- src/backend/executor/execReplication.c | 62 +- src/backend/executor/execUtils.c | 7 +- src/backend/executor/nodeBitmapHeapscan.c | 74 +-- src/backend/executor/nodeIndexonlyscan.c | 25 +- src/backend/executor/nodeIndexscan.c | 38 +- src/backend/executor/nodeModifyTable.c | 17 +- src/backend/executor/nodeSamplescan.c | 86 +-- src/backend/executor/nodeSeqscan.c | 67 +-- src/backend/executor/nodeTidscan.c | 3 +- src/backend/partitioning/partbounds.c | 18 +- src/backend/postmaster/autovacuum.c | 17 +- src/backend/postmaster/pgstat.c | 7 +- src/backend/replication/logical/launcher.c | 7 +- src/backend/replication/logical/worker.c | 13 +- src/backend/rewrite/rewriteDefine.c | 12 +- src/backend/utils/adt/ri_triggers.c | 23 +- src/backend/utils/adt/selfuncs.c | 17 +- src/backend/utils/init/postinit.c | 7 +- src/include/access/genam.h | 6 +- src/include/access/heapam.h | 92 ++- src/include/access/relscan.h | 122 ++-- src/include/access/tableam.h | 468 ++++++++++++++- src/include/catalog/index.h | 5 +- src/include/nodes/execnodes.h | 2 +- src/tools/pgindent/typedefs.list | 11 +- 63 files changed, 2031 insertions(+), 1266 deletions(-) diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c index 964200a767..bb6442de82 100644 --- a/contrib/amcheck/verify_nbtree.c +++ b/contrib/amcheck/verify_nbtree.c @@ -26,6 +26,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/nbtree.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "catalog/index.h" @@ -481,7 +482,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly, if (state->heapallindexed) { IndexInfo *indexinfo = BuildIndexInfo(state->rel); - HeapScanDesc scan; + TableScanDesc scan; /* Report on extra downlink checks performed in readonly case */ if (state->readonly) @@ -500,12 +501,12 @@ 
bt_check_every_level(Relation rel, Relation heaprel, bool readonly, * * Note that IndexBuildHeapScan() calls heap_endscan() for us. */ - scan = heap_beginscan_strat(state->heaprel, /* relation */ - snapshot, /* snapshot */ - 0, /* number of keys */ - NULL, /* scan key */ - true, /* buffer access strategy OK */ - true); /* syncscan OK? */ + scan = table_beginscan_strat(state->heaprel, /* relation */ + snapshot, /* snapshot */ + 0, /* number of keys */ + NULL, /* scan key */ + true, /* buffer access strategy OK */ + true); /* syncscan OK? */ /* * Scan will behave as the first scan of a CREATE INDEX CONCURRENTLY diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c index df2ad7f2c9..2d2a6cf153 100644 --- a/contrib/pgrowlocks/pgrowlocks.c +++ b/contrib/pgrowlocks/pgrowlocks.c @@ -27,6 +27,7 @@ #include "access/heapam.h" #include "access/multixact.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/namespace.h" #include "catalog/pg_authid.h" @@ -55,7 +56,7 @@ PG_FUNCTION_INFO_V1(pgrowlocks); typedef struct { Relation rel; - HeapScanDesc scan; + TableScanDesc scan; int ncolumns; } MyData; @@ -70,7 +71,8 @@ Datum pgrowlocks(PG_FUNCTION_ARGS) { FuncCallContext *funcctx; - HeapScanDesc scan; + TableScanDesc scan; + HeapScanDesc hscan; HeapTuple tuple; TupleDesc tupdesc; AttInMetadata *attinmeta; @@ -124,7 +126,8 @@ pgrowlocks(PG_FUNCTION_ARGS) aclcheck_error(aclresult, get_relkind_objtype(rel->rd_rel->relkind), RelationGetRelationName(rel)); - scan = heap_beginscan(rel, GetActiveSnapshot(), 0, NULL); + scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL); + hscan = (HeapScanDesc) scan; mydata = palloc(sizeof(*mydata)); mydata->rel = rel; mydata->scan = scan; @@ -138,6 +141,7 @@ pgrowlocks(PG_FUNCTION_ARGS) attinmeta = funcctx->attinmeta; mydata = (MyData *) funcctx->user_fctx; scan = mydata->scan; + hscan = (HeapScanDesc) scan; /* scan the relation */ while ((tuple = heap_getnext(scan, 
ForwardScanDirection)) != NULL) @@ -147,11 +151,11 @@ pgrowlocks(PG_FUNCTION_ARGS) uint16 infomask; /* must hold a buffer lock to call HeapTupleSatisfiesUpdate */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); htsu = HeapTupleSatisfiesUpdate(tuple, GetCurrentCommandId(false), - scan->rs_cbuf); + hscan->rs_cbuf); xmax = HeapTupleHeaderGetRawXmax(tuple->t_data); infomask = tuple->t_data->t_infomask; @@ -284,7 +288,7 @@ pgrowlocks(PG_FUNCTION_ARGS) BackendXidGetPid(xmax)); } - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); /* build a tuple */ tuple = BuildTupleFromCStrings(attinmeta, values); @@ -301,11 +305,11 @@ pgrowlocks(PG_FUNCTION_ARGS) } else { - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); } } - heap_endscan(scan); + table_endscan(scan); table_close(mydata->rel, AccessShareLock); SRF_RETURN_DONE(funcctx); diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c index 2ac9863463..7e1c308000 100644 --- a/contrib/pgstattuple/pgstattuple.c +++ b/contrib/pgstattuple/pgstattuple.c @@ -29,6 +29,7 @@ #include "access/heapam.h" #include "access/nbtree.h" #include "access/relscan.h" +#include "access/tableam.h" #include "catalog/namespace.h" #include "catalog/pg_am.h" #include "funcapi.h" @@ -317,7 +318,8 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo) static Datum pgstat_heap(Relation rel, FunctionCallInfo fcinfo) { - HeapScanDesc scan; + TableScanDesc scan; + HeapScanDesc hscan; HeapTuple tuple; BlockNumber nblocks; BlockNumber block = 0; /* next block to count free space in */ @@ -327,10 +329,12 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo) SnapshotData SnapshotDirty; /* Disable syncscan because we assume we scan from block zero upwards */ - scan = heap_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false); + scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, 
false); + hscan = (HeapScanDesc) scan; + InitDirtySnapshot(SnapshotDirty); - nblocks = scan->rs_nblocks; /* # blocks to be scanned */ + nblocks = hscan->rs_nblocks; /* # blocks to be scanned */ /* scan the relation */ while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) @@ -338,9 +342,9 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo) CHECK_FOR_INTERRUPTS(); /* must hold a buffer lock to call HeapTupleSatisfiesVisibility */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); - if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, scan->rs_cbuf)) + if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf)) { stat.tuple_len += tuple->t_len; stat.tuple_count++; @@ -351,7 +355,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo) stat.dead_tuple_count++; } - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); /* * To avoid physically reading the table twice, try to do the @@ -366,7 +370,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo) CHECK_FOR_INTERRUPTS(); buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block, - RBM_NORMAL, scan->rs_strategy); + RBM_NORMAL, hscan->rs_strategy); LockBuffer(buffer, BUFFER_LOCK_SHARE); stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer)); UnlockReleaseBuffer(buffer); @@ -379,14 +383,14 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo) CHECK_FOR_INTERRUPTS(); buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block, - RBM_NORMAL, scan->rs_strategy); + RBM_NORMAL, hscan->rs_strategy); LockBuffer(buffer, BUFFER_LOCK_SHARE); stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer)); UnlockReleaseBuffer(buffer); block++; } - heap_endscan(scan); + table_endscan(scan); relation_close(rel, AccessShareLock); stat.table_len = (uint64) nblocks * BLCKSZ; diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c index c92490f938..1d35ea3c53 100644 
--- a/contrib/tsm_system_rows/tsm_system_rows.c +++ b/contrib/tsm_system_rows/tsm_system_rows.c @@ -209,7 +209,8 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node) { SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state; - HeapScanDesc scan = node->ss.ss_currentScanDesc; + TableScanDesc scan = node->ss.ss_currentScanDesc; + HeapScanDesc hscan = (HeapScanDesc) scan; /* First call within scan? */ if (sampler->doneblocks == 0) @@ -221,14 +222,14 @@ system_rows_nextsampleblock(SampleScanState *node) SamplerRandomState randstate; /* If relation is empty, there's nothing to scan */ - if (scan->rs_nblocks == 0) + if (hscan->rs_nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ sampler_random_init_state(sampler->seed, randstate); /* Compute nblocks/firstblock/step only once per query */ - sampler->nblocks = scan->rs_nblocks; + sampler->nblocks = hscan->rs_nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ @@ -258,7 +259,7 @@ system_rows_nextsampleblock(SampleScanState *node) { /* Advance lb, using uint64 arithmetic to forestall overflow */ sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks; - } while (sampler->lb >= scan->rs_nblocks); + } while (sampler->lb >= hscan->rs_nblocks); return sampler->lb; } @@ -278,7 +279,8 @@ system_rows_nextsampletuple(SampleScanState *node, OffsetNumber maxoffset) { SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state; - HeapScanDesc scan = node->ss.ss_currentScanDesc; + TableScanDesc scan = node->ss.ss_currentScanDesc; + HeapScanDesc hscan = (HeapScanDesc) scan; OffsetNumber tupoffset = sampler->lt; /* Quit if we've returned all needed tuples */ @@ -308,7 +310,7 @@ system_rows_nextsampletuple(SampleScanState *node, } /* Found a candidate? 
*/ - if (SampleOffsetVisible(tupoffset, scan)) + if (SampleOffsetVisible(tupoffset, hscan)) { sampler->donetuples++; break; diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c index edeacf0b53..1cc7264e08 100644 --- a/contrib/tsm_system_time/tsm_system_time.c +++ b/contrib/tsm_system_time/tsm_system_time.c @@ -216,7 +216,8 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node) { SystemTimeSamplerData *sampler = (SystemTimeSamplerData *) node->tsm_state; - HeapScanDesc scan = node->ss.ss_currentScanDesc; + TableScanDesc scan = node->ss.ss_currentScanDesc; + HeapScanDesc hscan = (HeapScanDesc) scan; instr_time cur_time; /* First call within scan? */ @@ -229,14 +230,14 @@ system_time_nextsampleblock(SampleScanState *node) SamplerRandomState randstate; /* If relation is empty, there's nothing to scan */ - if (scan->rs_nblocks == 0) + if (hscan->rs_nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ sampler_random_init_state(sampler->seed, randstate); /* Compute nblocks/firstblock/step only once per query */ - sampler->nblocks = scan->rs_nblocks; + sampler->nblocks = hscan->rs_nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ @@ -272,7 +273,7 @@ system_time_nextsampleblock(SampleScanState *node) { /* Advance lb, using uint64 arithmetic to forestall overflow */ sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks; - } while (sampler->lb >= scan->rs_nblocks); + } while (sampler->lb >= hscan->rs_nblocks); return sampler->lb; } diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index 156b9d699f..8108fbb7d8 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -561,7 +561,7 @@ getNextNearest(IndexScanDesc scan) if (GISTSearchItemIsHeap(*item)) { /* found a heap item at currently minimal distance */ - 
scan->xs_ctup.t_self = item->data.heap.heapPtr; + scan->xs_heaptid = item->data.heap.heapPtr; scan->xs_recheck = item->data.heap.recheck; index_store_float8_orderby_distances(scan, so->orderByTypes, @@ -650,7 +650,7 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir) so->pageData[so->curPageData - 1].offnum; } /* continuing to return tuples from a leaf page */ - scan->xs_ctup.t_self = so->pageData[so->curPageData].heapPtr; + scan->xs_heaptid = so->pageData[so->curPageData].heapPtr; scan->xs_recheck = so->pageData[so->curPageData].recheck; /* in an index-only scan, also return the reconstructed tuple */ diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c index ccd3fdceac..61c90e6bb7 100644 --- a/src/backend/access/hash/hashsearch.c +++ b/src/backend/access/hash/hashsearch.c @@ -119,7 +119,7 @@ _hash_next(IndexScanDesc scan, ScanDirection dir) /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; - scan->xs_ctup.t_self = currItem->heapTid; + scan->xs_heaptid = currItem->heapTid; return true; } @@ -432,7 +432,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir) /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; - scan->xs_ctup.t_self = currItem->heapTid; + scan->xs_heaptid = currItem->heapTid; /* if we're here, _hash_readpage found a valid tuples */ return true; diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index dc3499349b..3c8a5da0bc 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -41,6 +41,7 @@ #include "access/parallel.h" #include "access/relscan.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/tuptoaster.h" #include "access/valid.h" @@ -68,22 +69,6 @@ #include "utils/snapmgr.h" -/* GUC variable */ -bool synchronize_seqscans = true; - - -static HeapScanDesc heap_beginscan_internal(Relation relation, - Snapshot 
snapshot, - int nkeys, ScanKey key, - ParallelHeapScanDesc parallel_scan, - bool allow_strat, - bool allow_sync, - bool allow_pagemode, - bool is_bitmapscan, - bool is_samplescan, - bool temp_snap); -static void heap_parallelscan_startblock_init(HeapScanDesc scan); -static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan); static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, CommandId cid, int options); static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, @@ -207,6 +192,7 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] = static void initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) { + ParallelBlockTableScanDesc bpscan = NULL; bool allow_strat; bool allow_sync; @@ -221,10 +207,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) * results for a non-MVCC snapshot, the caller must hold some higher-level * lock that ensures the interesting tuple(s) won't change.) */ - if (scan->rs_parallel != NULL) - scan->rs_nblocks = scan->rs_parallel->phs_nblocks; + if (scan->rs_base.rs_parallel != NULL) + { + bpscan = (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; + scan->rs_nblocks = bpscan->phs_nblocks; + } else - scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); + scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd); /* * If the table is large relative to NBuffers, use a bulk-read access @@ -238,11 +227,11 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) * Note that heap_parallelscan_initialize has a very similar test; if you * change this, consider changing that one, too. 
*/ - if (!RelationUsesLocalBuffers(scan->rs_rd) && + if (!RelationUsesLocalBuffers(scan->rs_base.rs_rd) && scan->rs_nblocks > NBuffers / 4) { - allow_strat = scan->rs_allow_strat; - allow_sync = scan->rs_allow_sync; + allow_strat = scan->rs_base.rs_allow_strat; + allow_sync = scan->rs_base.rs_allow_sync; } else allow_strat = allow_sync = false; @@ -260,10 +249,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) scan->rs_strategy = NULL; } - if (scan->rs_parallel != NULL) + if (scan->rs_base.rs_parallel != NULL) { - /* For parallel scan, believe whatever ParallelHeapScanDesc says. */ - scan->rs_syncscan = scan->rs_parallel->phs_syncscan; + /* For parallel scan, believe whatever ParallelTableScanDesc says. */ + scan->rs_base.rs_syncscan = scan->rs_base.rs_parallel->phs_syncscan; } else if (keep_startblock) { @@ -272,16 +261,16 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) * so that rewinding a cursor doesn't generate surprising results. * Reset the active syncscan setting, though. 
*/ - scan->rs_syncscan = (allow_sync && synchronize_seqscans); + scan->rs_base.rs_syncscan = (allow_sync && synchronize_seqscans); } else if (allow_sync && synchronize_seqscans) { - scan->rs_syncscan = true; - scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks); + scan->rs_base.rs_syncscan = true; + scan->rs_startblock = ss_get_location(scan->rs_base.rs_rd, scan->rs_nblocks); } else { - scan->rs_syncscan = false; + scan->rs_base.rs_syncscan = false; scan->rs_startblock = 0; } @@ -298,15 +287,15 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) * copy the scan key, if appropriate */ if (key != NULL) - memcpy(scan->rs_key, key, scan->rs_nkeys * sizeof(ScanKeyData)); + memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData)); /* * Currently, we don't have a stats counter for bitmap heap scans (but the * underlying bitmap index scans will be counted) or sample scans (we only * update stats for tuple fetches there) */ - if (!scan->rs_bitmapscan && !scan->rs_samplescan) - pgstat_count_heap_scan(scan->rs_rd); + if (!scan->rs_base.rs_bitmapscan && !scan->rs_base.rs_samplescan) + pgstat_count_heap_scan(scan->rs_base.rs_rd); } /* @@ -316,10 +305,12 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) * numBlks is number of pages to scan (InvalidBlockNumber means "all") */ void -heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks) +heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks) { + HeapScanDesc scan = (HeapScanDesc) sscan; + Assert(!scan->rs_inited); /* else too late to change */ - Assert(!scan->rs_syncscan); /* else rs_startblock is significant */ + Assert(!scan->rs_base.rs_syncscan); /* else rs_startblock is significant */ /* Check startBlk is valid (but allow case of zero blocks...) 
*/ Assert(startBlk == 0 || startBlk < scan->rs_nblocks); @@ -336,8 +327,9 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks) * which tuples on the page are visible. */ void -heapgetpage(HeapScanDesc scan, BlockNumber page) +heapgetpage(TableScanDesc sscan, BlockNumber page) { + HeapScanDesc scan = (HeapScanDesc) sscan; Buffer buffer; Snapshot snapshot; Page dp; @@ -364,20 +356,20 @@ heapgetpage(HeapScanDesc scan, BlockNumber page) CHECK_FOR_INTERRUPTS(); /* read page using selected strategy */ - scan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, page, + scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, page, RBM_NORMAL, scan->rs_strategy); scan->rs_cblock = page; - if (!scan->rs_pageatatime) + if (!scan->rs_base.rs_pageatatime) return; buffer = scan->rs_cbuf; - snapshot = scan->rs_snapshot; + snapshot = scan->rs_base.rs_snapshot; /* * Prune and repair fragmentation for the whole page, if possible. */ - heap_page_prune_opt(scan->rs_rd, buffer); + heap_page_prune_opt(scan->rs_base.rs_rd, buffer); /* * We must hold share lock on the buffer content while examining tuple @@ -387,7 +379,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page) LockBuffer(buffer, BUFFER_LOCK_SHARE); dp = BufferGetPage(buffer); - TestForOldSnapshot(snapshot, scan->rs_rd, dp); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp); lines = PageGetMaxOffsetNumber(dp); ntup = 0; @@ -422,7 +414,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page) HeapTupleData loctup; bool valid; - loctup.t_tableOid = RelationGetRelid(scan->rs_rd); + loctup.t_tableOid = RelationGetRelid(scan->rs_base.rs_rd); loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp); loctup.t_len = ItemIdGetLength(lpp); ItemPointerSet(&(loctup.t_self), page, lineoff); @@ -432,8 +424,8 @@ heapgetpage(HeapScanDesc scan, BlockNumber page) else valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer); - CheckForSerializableConflictOut(valid, scan->rs_rd, 
&loctup, - buffer, snapshot); + CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd, + &loctup, buffer, snapshot); if (valid) scan->rs_vistuples[ntup++] = lineoff; @@ -476,7 +468,7 @@ heapgettup(HeapScanDesc scan, ScanKey key) { HeapTuple tuple = &(scan->rs_ctup); - Snapshot snapshot = scan->rs_snapshot; + Snapshot snapshot = scan->rs_base.rs_snapshot; bool backward = ScanDirectionIsBackward(dir); BlockNumber page; bool finished; @@ -502,11 +494,16 @@ heapgettup(HeapScanDesc scan, tuple->t_data = NULL; return; } - if (scan->rs_parallel != NULL) + if (scan->rs_base.rs_parallel != NULL) { - heap_parallelscan_startblock_init(scan); + ParallelBlockTableScanDesc pbscan = + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; - page = heap_parallelscan_nextpage(scan); + table_block_parallelscan_startblock_init(scan->rs_base.rs_rd, + pbscan); + + page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, + pbscan); /* Other processes might have already finished the scan. */ if (page == InvalidBlockNumber) @@ -518,7 +515,7 @@ heapgettup(HeapScanDesc scan, } else page = scan->rs_startblock; /* first page */ - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); lineoff = FirstOffsetNumber; /* first offnum */ scan->rs_inited = true; } @@ -533,7 +530,7 @@ heapgettup(HeapScanDesc scan, LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(snapshot, scan->rs_rd, dp); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp); lines = PageGetMaxOffsetNumber(dp); /* page and lineoff now reference the physically next tid */ @@ -542,7 +539,7 @@ heapgettup(HeapScanDesc scan, else if (backward) { /* backward parallel scan not supported */ - Assert(scan->rs_parallel == NULL); + Assert(scan->rs_base.rs_parallel == NULL); if (!scan->rs_inited) { @@ -562,13 +559,13 @@ heapgettup(HeapScanDesc scan, * time, and much more likely that we'll just bollix things for * forward scanners. 
*/ - scan->rs_syncscan = false; + scan->rs_base.rs_syncscan = false; /* start from last page of the scan */ if (scan->rs_startblock > 0) page = scan->rs_startblock - 1; else page = scan->rs_nblocks - 1; - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); } else { @@ -579,7 +576,7 @@ heapgettup(HeapScanDesc scan, LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(snapshot, scan->rs_rd, dp); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp); lines = PageGetMaxOffsetNumber(dp); if (!scan->rs_inited) @@ -610,11 +607,11 @@ heapgettup(HeapScanDesc scan, page = ItemPointerGetBlockNumber(&(tuple->t_self)); if (page != scan->rs_cblock) - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); /* Since the tuple was previously fetched, needn't lock page here */ dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(snapshot, scan->rs_rd, dp); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp); lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self)); lpp = PageGetItemId(dp, lineoff); Assert(ItemIdIsNormal(lpp)); @@ -649,11 +646,12 @@ heapgettup(HeapScanDesc scan, snapshot, scan->rs_cbuf); - CheckForSerializableConflictOut(valid, scan->rs_rd, tuple, - scan->rs_cbuf, snapshot); + CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd, + tuple, scan->rs_cbuf, + snapshot); if (valid && key != NULL) - HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd), + HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd), nkeys, key, valid); if (valid) @@ -696,9 +694,13 @@ heapgettup(HeapScanDesc scan, page = scan->rs_nblocks; page--; } - else if (scan->rs_parallel != NULL) + else if (scan->rs_base.rs_parallel != NULL) { - page = heap_parallelscan_nextpage(scan); + ParallelBlockTableScanDesc pbscan = + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; + + page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, + pbscan); finished = (page == InvalidBlockNumber); } else @@ -721,8 +723,8 @@ 
heapgettup(HeapScanDesc scan, * a little bit backwards on every invocation, which is confusing. * We don't guarantee any specific ordering in general, though. */ - if (scan->rs_syncscan) - ss_report_location(scan->rs_rd, page); + if (scan->rs_base.rs_syncscan) + ss_report_location(scan->rs_base.rs_rd, page); } /* @@ -739,12 +741,12 @@ heapgettup(HeapScanDesc scan, return; } - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(snapshot, scan->rs_rd, dp); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp); lines = PageGetMaxOffsetNumber((Page) dp); linesleft = lines; if (backward) @@ -806,11 +808,16 @@ heapgettup_pagemode(HeapScanDesc scan, tuple->t_data = NULL; return; } - if (scan->rs_parallel != NULL) + if (scan->rs_base.rs_parallel != NULL) { - heap_parallelscan_startblock_init(scan); + ParallelBlockTableScanDesc pbscan = + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; + + table_block_parallelscan_startblock_init(scan->rs_base.rs_rd, + pbscan); - page = heap_parallelscan_nextpage(scan); + page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, + pbscan); /* Other processes might have already finished the scan. 
*/ if (page == InvalidBlockNumber) @@ -822,7 +829,7 @@ heapgettup_pagemode(HeapScanDesc scan, } else page = scan->rs_startblock; /* first page */ - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); lineindex = 0; scan->rs_inited = true; } @@ -834,7 +841,7 @@ heapgettup_pagemode(HeapScanDesc scan, } dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp); lines = scan->rs_ntuples; /* page and lineindex now reference the next visible tid */ @@ -843,7 +850,7 @@ heapgettup_pagemode(HeapScanDesc scan, else if (backward) { /* backward parallel scan not supported */ - Assert(scan->rs_parallel == NULL); + Assert(scan->rs_base.rs_parallel == NULL); if (!scan->rs_inited) { @@ -863,13 +870,13 @@ heapgettup_pagemode(HeapScanDesc scan, * time, and much more likely that we'll just bollix things for * forward scanners. */ - scan->rs_syncscan = false; + scan->rs_base.rs_syncscan = false; /* start from last page of the scan */ if (scan->rs_startblock > 0) page = scan->rs_startblock - 1; else page = scan->rs_nblocks - 1; - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); } else { @@ -878,7 +885,7 @@ heapgettup_pagemode(HeapScanDesc scan, } dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp); lines = scan->rs_ntuples; if (!scan->rs_inited) @@ -908,11 +915,11 @@ heapgettup_pagemode(HeapScanDesc scan, page = ItemPointerGetBlockNumber(&(tuple->t_self)); if (page != scan->rs_cblock) - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); /* Since the tuple was previously fetched, needn't lock page here */ dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp); lineoff = 
ItemPointerGetOffsetNumber(&(tuple->t_self)); lpp = PageGetItemId(dp, lineoff); Assert(ItemIdIsNormal(lpp)); @@ -950,7 +957,7 @@ heapgettup_pagemode(HeapScanDesc scan, { bool valid; - HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd), + HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd), nkeys, key, valid); if (valid) { @@ -986,9 +993,13 @@ heapgettup_pagemode(HeapScanDesc scan, page = scan->rs_nblocks; page--; } - else if (scan->rs_parallel != NULL) + else if (scan->rs_base.rs_parallel != NULL) { - page = heap_parallelscan_nextpage(scan); + ParallelBlockTableScanDesc pbscan = + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; + + page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, + pbscan); finished = (page == InvalidBlockNumber); } else @@ -1011,8 +1022,8 @@ heapgettup_pagemode(HeapScanDesc scan, * a little bit backwards on every invocation, which is confusing. * We don't guarantee any specific ordering in general, though. */ - if (scan->rs_syncscan) - ss_report_location(scan->rs_rd, page); + if (scan->rs_base.rs_syncscan) + ss_report_location(scan->rs_base.rs_rd, page); } /* @@ -1029,10 +1040,10 @@ heapgettup_pagemode(HeapScanDesc scan, return; } - heapgetpage(scan, page); + heapgetpage((TableScanDesc) scan, page); dp = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp); lines = scan->rs_ntuples; linesleft = lines; if (backward) @@ -1095,86 +1106,16 @@ fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, */ -/* ---------------- - * heap_beginscan - begin relation scan - * - * heap_beginscan is the "standard" case. - * - * heap_beginscan_catalog differs in setting up its own temporary snapshot. 
- * - * heap_beginscan_strat offers an extended API that lets the caller control - * whether a nondefault buffer access strategy can be used, and whether - * syncscan can be chosen (possibly resulting in the scan not starting from - * block zero). Both of these default to true with plain heap_beginscan. - * - * heap_beginscan_bm is an alternative entry point for setting up a - * HeapScanDesc for a bitmap heap scan. Although that scan technology is - * really quite unlike a standard seqscan, there is just enough commonality - * to make it worth using the same data structure. - * - * heap_beginscan_sampling is an alternative entry point for setting up a - * HeapScanDesc for a TABLESAMPLE scan. As with bitmap scans, it's worth - * using the same data structure although the behavior is rather different. - * In addition to the options offered by heap_beginscan_strat, this call - * also allows control of whether page-mode visibility checking is used. - * ---------------- - */ -HeapScanDesc +TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key) -{ - return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL, - true, true, true, false, false, false); -} - -HeapScanDesc -heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key) -{ - Oid relid = RelationGetRelid(relation); - Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid)); - - return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL, - true, true, true, false, false, true); -} - -HeapScanDesc -heap_beginscan_strat(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key, - bool allow_strat, bool allow_sync) -{ - return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL, - allow_strat, allow_sync, true, - false, false, false); -} - -HeapScanDesc -heap_beginscan_bm(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key) -{ - return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL, - false, false, true, true, 
false, false); -} - -HeapScanDesc -heap_beginscan_sampling(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key, - bool allow_strat, bool allow_sync, bool allow_pagemode) -{ - return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL, - allow_strat, allow_sync, allow_pagemode, - false, true, false); -} - -static HeapScanDesc -heap_beginscan_internal(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key, - ParallelHeapScanDesc parallel_scan, - bool allow_strat, - bool allow_sync, - bool allow_pagemode, - bool is_bitmapscan, - bool is_samplescan, - bool temp_snap) + int nkeys, ScanKey key, + ParallelTableScanDesc parallel_scan, + bool allow_strat, + bool allow_sync, + bool allow_pagemode, + bool is_bitmapscan, + bool is_samplescan, + bool temp_snap) { HeapScanDesc scan; @@ -1192,21 +1133,22 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot, */ scan = (HeapScanDesc) palloc(sizeof(HeapScanDescData)); - scan->rs_rd = relation; - scan->rs_snapshot = snapshot; - scan->rs_nkeys = nkeys; - scan->rs_bitmapscan = is_bitmapscan; - scan->rs_samplescan = is_samplescan; + scan->rs_base.rs_rd = relation; + scan->rs_base.rs_snapshot = snapshot; + scan->rs_base.rs_nkeys = nkeys; + scan->rs_base.rs_bitmapscan = is_bitmapscan; + scan->rs_base.rs_samplescan = is_samplescan; scan->rs_strategy = NULL; /* set in initscan */ - scan->rs_allow_strat = allow_strat; - scan->rs_allow_sync = allow_sync; - scan->rs_temp_snap = temp_snap; - scan->rs_parallel = parallel_scan; + scan->rs_base.rs_allow_strat = allow_strat; + scan->rs_base.rs_allow_sync = allow_sync; + scan->rs_base.rs_temp_snap = temp_snap; + scan->rs_base.rs_parallel = parallel_scan; /* * we can use page-at-a-time mode if it's an MVCC-safe snapshot */ - scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot); + scan->rs_base.rs_pageatatime = + allow_pagemode && snapshot && IsMVCCSnapshot(snapshot); /* * For a seqscan in a serializable transaction, acquire a predicate lock @@ 
-1230,23 +1172,29 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot, * initscan() and we don't want to allocate memory again */ if (nkeys > 0) - scan->rs_key = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys); + scan->rs_base.rs_key = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys); else - scan->rs_key = NULL; + scan->rs_base.rs_key = NULL; initscan(scan, key, false); - return scan; + return (TableScanDesc) scan; } -/* ---------------- - * heap_rescan - restart a relation scan - * ---------------- - */ void -heap_rescan(HeapScanDesc scan, - ScanKey key) +heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params, + bool allow_strat, bool allow_sync, bool allow_pagemode) { + HeapScanDesc scan = (HeapScanDesc) sscan; + + if (set_params) + { + scan->rs_base.rs_allow_strat = allow_strat; + scan->rs_base.rs_allow_sync = allow_sync; + scan->rs_base.rs_pageatatime = + allow_pagemode && IsMVCCSnapshot(scan->rs_base.rs_snapshot); + } + /* * unpin scan buffers */ @@ -1259,37 +1207,11 @@ heap_rescan(HeapScanDesc scan, initscan(scan, key, true); } -/* ---------------- - * heap_rescan_set_params - restart a relation scan after changing params - * - * This call allows changing the buffer strategy, syncscan, and pagemode - * options before starting a fresh scan. Note that although the actual use - * of syncscan might change (effectively, enabling or disabling reporting), - * the previously selected startblock will be kept. - * ---------------- - */ void -heap_rescan_set_params(HeapScanDesc scan, ScanKey key, - bool allow_strat, bool allow_sync, bool allow_pagemode) +heap_endscan(TableScanDesc sscan) { - /* adjust parameters */ - scan->rs_allow_strat = allow_strat; - scan->rs_allow_sync = allow_sync; - scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot); - /* ... 
and rescan */ - heap_rescan(scan, key); -} + HeapScanDesc scan = (HeapScanDesc) sscan; -/* ---------------- - * heap_endscan - end relation scan - * - * See how to integrate with index scans. - * Check handling if reldesc caching. - * ---------------- - */ -void -heap_endscan(HeapScanDesc scan) -{ /* Note: no locking manipulations needed */ /* @@ -1301,246 +1223,20 @@ heap_endscan(HeapScanDesc scan) /* * decrement relation reference count and free scan descriptor storage */ - RelationDecrementReferenceCount(scan->rs_rd); + RelationDecrementReferenceCount(scan->rs_base.rs_rd); - if (scan->rs_key) - pfree(scan->rs_key); + if (scan->rs_base.rs_key) + pfree(scan->rs_base.rs_key); if (scan->rs_strategy != NULL) FreeAccessStrategy(scan->rs_strategy); - if (scan->rs_temp_snap) - UnregisterSnapshot(scan->rs_snapshot); + if (scan->rs_base.rs_temp_snap) + UnregisterSnapshot(scan->rs_base.rs_snapshot); pfree(scan); } -/* ---------------- - * heap_parallelscan_estimate - estimate storage for ParallelHeapScanDesc - * - * Sadly, this doesn't reduce to a constant, because the size required - * to serialize the snapshot can vary. - * ---------------- - */ -Size -heap_parallelscan_estimate(Snapshot snapshot) -{ - Size sz = offsetof(ParallelHeapScanDescData, phs_snapshot_data); - - if (IsMVCCSnapshot(snapshot)) - sz = add_size(sz, EstimateSnapshotSpace(snapshot)); - else - Assert(snapshot == SnapshotAny); - - return sz; -} - -/* ---------------- - * heap_parallelscan_initialize - initialize ParallelHeapScanDesc - * - * Must allow as many bytes of shared memory as returned by - * heap_parallelscan_estimate. Call this just once in the leader - * process; then, individual workers attach via heap_beginscan_parallel. 
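The heap_rescan() and heap_endscan() hunks above now accept a TableScanDesc and immediately cast it back to HeapScanDesc, while heap_beginscan() returns `(TableScanDesc) scan`. This works because the AM-independent fields moved into an `rs_base` member placed first in the heap struct. The sketch below shows that "base class" embedding in isolation; `TableScanSketch`, `HeapScanSketch`, `sketch_beginscan` and `sketch_endscan` are hypothetical, simplified stand-ins, not PostgreSQL's real declarations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the AM-independent part (like TableScanDescData). */
typedef struct TableScanSketch
{
    int         rs_nkeys;       /* AM-independent field */
    int         rs_syncscan;    /* AM-independent field */
} TableScanSketch;

/* Hypothetical stand-in for an AM-specific scan (like HeapScanDescData). */
typedef struct HeapScanSketch
{
    TableScanSketch rs_base;    /* must be the FIRST member for the casts below */
    int         rs_cbuf;        /* AM-specific field */
} HeapScanSketch;

/* "scan_begin": allocate the derived struct, hand out the base pointer. */
static TableScanSketch *
sketch_beginscan(int nkeys)
{
    HeapScanSketch *scan = calloc(1, sizeof(HeapScanSketch));

    assert(scan != NULL);
    scan->rs_base.rs_nkeys = nkeys;
    scan->rs_cbuf = -1;         /* placeholder for "no buffer pinned" */

    /* A pointer to a struct, suitably cast, points to its first member. */
    return (TableScanSketch *) scan;
}

/* "scan_end": the AM casts back to its own type, as heap_endscan() does. */
static int
sketch_endscan(TableScanSketch *sscan)
{
    HeapScanSketch *scan = (HeapScanSketch *) sscan;
    int         nkeys = scan->rs_base.rs_nkeys;

    free(scan);
    return nkeys;
}
```

Generic code only ever holds the base pointer, so it cannot depend on heap-only fields such as `rs_cbuf`; only the AM's own callbacks perform the downcast.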
- * ---------------- - */ -void -heap_parallelscan_initialize(ParallelHeapScanDesc target, Relation relation, - Snapshot snapshot) -{ - target->phs_relid = RelationGetRelid(relation); - target->phs_nblocks = RelationGetNumberOfBlocks(relation); - /* compare phs_syncscan initialization to similar logic in initscan */ - target->phs_syncscan = synchronize_seqscans && - !RelationUsesLocalBuffers(relation) && - target->phs_nblocks > NBuffers / 4; - SpinLockInit(&target->phs_mutex); - target->phs_startblock = InvalidBlockNumber; - pg_atomic_init_u64(&target->phs_nallocated, 0); - if (IsMVCCSnapshot(snapshot)) - { - SerializeSnapshot(snapshot, target->phs_snapshot_data); - target->phs_snapshot_any = false; - } - else - { - Assert(snapshot == SnapshotAny); - target->phs_snapshot_any = true; - } -} - -/* ---------------- - * heap_parallelscan_reinitialize - reset a parallel scan - * - * Call this in the leader process. Caller is responsible for - * making sure that all workers have finished the scan beforehand. - * ---------------- - */ -void -heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan) -{ - pg_atomic_write_u64(&parallel_scan->phs_nallocated, 0); -} - -/* ---------------- - * heap_beginscan_parallel - join a parallel scan - * - * Caller must hold a suitable lock on the correct relation.
- * ---------------- - */ -HeapScanDesc -heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan) -{ - Snapshot snapshot; - - Assert(RelationGetRelid(relation) == parallel_scan->phs_relid); - - if (!parallel_scan->phs_snapshot_any) - { - /* Snapshot was serialized -- restore it */ - snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data); - RegisterSnapshot(snapshot); - } - else - { - /* SnapshotAny passed by caller (not serialized) */ - snapshot = SnapshotAny; - } - - return heap_beginscan_internal(relation, snapshot, 0, NULL, parallel_scan, - true, true, true, false, false, - !parallel_scan->phs_snapshot_any); -} - -/* ---------------- - * heap_parallelscan_startblock_init - find and set the scan's startblock - * - * Determine where the parallel seq scan should start. This function may - * be called many times, once by each parallel worker. We must be careful - * only to set the startblock once. - * ---------------- - */ -static void -heap_parallelscan_startblock_init(HeapScanDesc scan) -{ - BlockNumber sync_startpage = InvalidBlockNumber; - ParallelHeapScanDesc parallel_scan; - - Assert(scan->rs_parallel); - parallel_scan = scan->rs_parallel; - -retry: - /* Grab the spinlock. */ - SpinLockAcquire(&parallel_scan->phs_mutex); - - /* - * If the scan's startblock has not yet been initialized, we must do so - * now. If this is not a synchronized scan, we just start at block 0, but - * if it is a synchronized scan, we must get the starting position from - * the synchronized scan machinery. We can't hold the spinlock while - * doing that, though, so release the spinlock, get the information we - * need, and retry. If nobody else has initialized the scan in the - * meantime, we'll fill in the value we fetched on the second time - * through.
- */ - if (parallel_scan->phs_startblock == InvalidBlockNumber) - { - if (!parallel_scan->phs_syncscan) - parallel_scan->phs_startblock = 0; - else if (sync_startpage != InvalidBlockNumber) - parallel_scan->phs_startblock = sync_startpage; - else - { - SpinLockRelease(&parallel_scan->phs_mutex); - sync_startpage = ss_get_location(scan->rs_rd, scan->rs_nblocks); - goto retry; - } - } - SpinLockRelease(&parallel_scan->phs_mutex); -} - -/* ---------------- - * heap_parallelscan_nextpage - get the next page to scan - * - * Get the next page to scan. Even if there are no pages left to scan, - * another backend could have grabbed a page to scan and not yet finished - * looking at it, so it doesn't follow that the scan is done when the - * first backend gets an InvalidBlockNumber return. - * ---------------- - */ -static BlockNumber -heap_parallelscan_nextpage(HeapScanDesc scan) -{ - BlockNumber page; - ParallelHeapScanDesc parallel_scan; - uint64 nallocated; - - Assert(scan->rs_parallel); - parallel_scan = scan->rs_parallel; - - /* - * phs_nallocated tracks how many pages have been allocated to workers - * already. When phs_nallocated >= rs_nblocks, all blocks have been - * allocated. - * - * Because we use an atomic fetch-and-add to fetch the current value, the - * phs_nallocated counter will exceed rs_nblocks, because workers will - * still increment the value, when they try to allocate the next block but - * all blocks have been allocated already. The counter must be 64 bits - * wide because of that, to avoid wrapping around when rs_nblocks is close - * to 2^32. - * - * The actual page to return is calculated by adding the counter to the - * starting block number, modulo nblocks. - */ - nallocated = pg_atomic_fetch_add_u64(&parallel_scan->phs_nallocated, 1); - if (nallocated >= scan->rs_nblocks) - page = InvalidBlockNumber; /* all blocks have been allocated */ - else - page = (nallocated + parallel_scan->phs_startblock) % scan->rs_nblocks; - - /* - * Report scan location.
Normally, we report the current page number. - * When we reach the end of the scan, though, we report the starting page, - * not the ending page, just so the starting positions for later scans - * doesn't slew backwards. We only report the position at the end of the - * scan once, though: subsequent callers will report nothing. - */ - if (scan->rs_syncscan) - { - if (page != InvalidBlockNumber) - ss_report_location(scan->rs_rd, page); - else if (nallocated == scan->rs_nblocks) - ss_report_location(scan->rs_rd, parallel_scan->phs_startblock); - } - - return page; -} - -/* ---------------- - * heap_update_snapshot - * - * Update snapshot info in heap scan descriptor. - * ---------------- - */ -void -heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot) -{ - Assert(IsMVCCSnapshot(snapshot)); - - RegisterSnapshot(snapshot); - scan->rs_snapshot = snapshot; - scan->rs_temp_snap = true; -} - -/* ---------------- - * heap_getnext - retrieve next tuple in scan - * - * Fix to work with index relations. - * We don't return the buffer anymore, but you can get it from the - * returned HeapTuple. - * ---------------- - */ - #ifdef HEAPDEBUGALL #define HEAPDEBUG_1 \ elog(DEBUG2, "heap_getnext([%s,nkeys=%d],dir=%d) called", \ @@ -1557,17 +1253,32 @@ heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot) HeapTuple -heap_getnext(HeapScanDesc scan, ScanDirection direction) +heap_getnext(TableScanDesc sscan, ScanDirection direction) { + HeapScanDesc scan = (HeapScanDesc) sscan; + + /* + * This is still widely used directly, without going through table AM, so + * add a safety check. It's possible we should, at a later point, + * downgrade this to an assert. The reason for checking the AM routine, + * rather than the AM oid, is that this allows to write regression tests + * that create another AM reusing the heap handler. 
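The removed heap_parallelscan_nextpage() comment above describes the block-handout scheme whose role is now taken over by table_block_parallelscan_nextpage(): a shared 64-bit counter advanced with an atomic fetch-and-add, offset by the start block and taken modulo the relation size. The sketch below reproduces only that arithmetic with C11 atomics; `BlockAllocSketch`, `sketch_init`, `sketch_nextpage` and `INVALID_BLOCK_SKETCH` are hypothetical names, and the synchronized-scan location reporting is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t BlockNumberSketch;
#define INVALID_BLOCK_SKETCH ((BlockNumberSketch) 0xFFFFFFFF)

/* Hypothetical shared state, standing in for the block-oriented parallel scan descriptor. */
typedef struct
{
    BlockNumberSketch nblocks;
    BlockNumberSketch startblock;
    _Atomic uint64_t  nallocated;   /* 64 bits: may exceed nblocks without wrapping */
} BlockAllocSketch;

static void
sketch_init(BlockAllocSketch *p, BlockNumberSketch nblocks,
            BlockNumberSketch startblock)
{
    assert(nblocks == 0 || startblock < nblocks);
    p->nblocks = nblocks;
    p->startblock = startblock;
    atomic_init(&p->nallocated, 0);
}

/* Each counter value is returned to exactly one caller, so no block is
 * handed out twice even when many workers call this concurrently. */
static BlockNumberSketch
sketch_nextpage(BlockAllocSketch *p)
{
    uint64_t    n = atomic_fetch_add(&p->nallocated, 1);

    if (n >= p->nblocks)
        return INVALID_BLOCK_SKETCH;    /* all blocks already allocated */

    /* Start at startblock and wrap around the end of the relation. */
    return (BlockNumberSketch) ((n + p->startblock) % p->nblocks);
}
```

For example, with 4 blocks and a start block of 2, successive calls return 2, 3, 0, 1 and then the invalid marker, which keeps being returned because the counter only grows.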
+ */ + if (unlikely(sscan->rs_rd->rd_tableam != GetHeapamTableAmRoutine())) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("only heap AM is supported"))); + /* Note: no locking manipulations needed */ HEAPDEBUG_1; /* heap_getnext( info ) */ - if (scan->rs_pageatatime) + if (scan->rs_base.rs_pageatatime) heapgettup_pagemode(scan, direction, - scan->rs_nkeys, scan->rs_key); + scan->rs_base.rs_nkeys, scan->rs_base.rs_key); else - heapgettup(scan, direction, scan->rs_nkeys, scan->rs_key); + heapgettup(scan, direction, + scan->rs_base.rs_nkeys, scan->rs_base.rs_key); if (scan->rs_ctup.t_data == NULL) { @@ -1581,9 +1292,58 @@ heap_getnext(HeapScanDesc scan, ScanDirection direction) */ HEAPDEBUG_3; /* heap_getnext returning tuple */ - pgstat_count_heap_getnext(scan->rs_rd); + pgstat_count_heap_getnext(scan->rs_base.rs_rd); + + return &scan->rs_ctup; +} + +#ifdef HEAPAMSLOTDEBUGALL +#define HEAPAMSLOTDEBUG_1 \ + elog(DEBUG2, "heapam_getnextslot([%s,nkeys=%d],dir=%d) called", \ + RelationGetRelationName(scan->rs_base.rs_rd), scan->rs_base.rs_nkeys, (int) direction) +#define HEAPAMSLOTDEBUG_2 \ + elog(DEBUG2, "heapam_getnextslot returning EOS") +#define HEAPAMSLOTDEBUG_3 \ + elog(DEBUG2, "heapam_getnextslot returning tuple") +#else +#define HEAPAMSLOTDEBUG_1 +#define HEAPAMSLOTDEBUG_2 +#define HEAPAMSLOTDEBUG_3 +#endif + +bool +heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot) +{ + HeapScanDesc scan = (HeapScanDesc) sscan; + + /* Note: no locking manipulations needed */ + + HEAPAMSLOTDEBUG_1; /* heap_getnextslot( info ) */ + + if (scan->rs_base.rs_pageatatime) + heapgettup_pagemode(scan, direction, + scan->rs_base.rs_nkeys, scan->rs_base.rs_key); + else + heapgettup(scan, direction, scan->rs_base.rs_nkeys, scan->rs_base.rs_key); - return &(scan->rs_ctup); + if (scan->rs_ctup.t_data == NULL) + { + HEAPAMSLOTDEBUG_2; /* heap_getnextslot returning EOS */ + ExecClearTuple(slot); + return false; + } + + /* + * if we get 
here it means we have a new current scan tuple, so point to + * the proper return buffer and return the tuple. + */ + HEAPAMSLOTDEBUG_3; /* heap_getnextslot returning tuple */ + + pgstat_count_heap_getnext(scan->rs_base.rs_rd); + + ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, + scan->rs_cbuf); + return true; } /* diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index 518d1df84a..6a26fcef94 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -19,15 +19,181 @@ */ #include "postgres.h" +#include "access/heapam.h" #include "access/tableam.h" +#include "storage/bufmgr.h" #include "utils/builtins.h" static const TableAmRoutine heapam_methods; +/* ------------------------------------------------------------------------ + * Slot related callbacks for heap AM + * ------------------------------------------------------------------------ + */ + +static const TupleTableSlotOps * +heapam_slot_callbacks(Relation relation) +{ + return &TTSOpsBufferHeapTuple; +} + + +/* ------------------------------------------------------------------------ + * Index Scan Callbacks for heap AM + * ------------------------------------------------------------------------ + */ + +static IndexFetchTableData * +heapam_index_fetch_begin(Relation rel) +{ + IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData)); + + hscan->xs_base.rel = rel; + hscan->xs_cbuf = InvalidBuffer; + + return &hscan->xs_base; +} + +static void +heapam_index_fetch_reset(IndexFetchTableData *scan) +{ + IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan; + + if (BufferIsValid(hscan->xs_cbuf)) + { + ReleaseBuffer(hscan->xs_cbuf); + hscan->xs_cbuf = InvalidBuffer; + } +} + +static void +heapam_index_fetch_end(IndexFetchTableData *scan) +{ + IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan; + + heapam_index_fetch_reset(scan); + + pfree(hscan); +} + +static bool +heapam_index_fetch_tuple(struct IndexFetchTableData 
*scan, + ItemPointer tid, + Snapshot snapshot, + TupleTableSlot *slot, + bool *call_again, bool *all_dead) +{ + IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan; + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; + bool got_heap_tuple; + + Assert(TTS_IS_BUFFERTUPLE(slot)); + + /* We can skip the buffer-switching logic if we're in mid-HOT chain. */ + if (!*call_again) + { + /* Switch to correct buffer if we don't have it already */ + Buffer prev_buf = hscan->xs_cbuf; + + hscan->xs_cbuf = ReleaseAndReadBuffer(hscan->xs_cbuf, + hscan->xs_base.rel, + ItemPointerGetBlockNumber(tid)); + + /* + * Prune page, but only if we weren't already on this page + */ + if (prev_buf != hscan->xs_cbuf) + heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf); + } + + /* Obtain share-lock on the buffer so we can examine visibility */ + LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_SHARE); + got_heap_tuple = heap_hot_search_buffer(tid, + hscan->xs_base.rel, + hscan->xs_cbuf, + snapshot, + &bslot->base.tupdata, + all_dead, + !*call_again); + bslot->base.tupdata.t_self = *tid; + LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_UNLOCK); + + if (got_heap_tuple) + { + /* + * Only in a non-MVCC snapshot can more than one member of the HOT + * chain be visible. + */ + *call_again = !IsMVCCSnapshot(snapshot); + + slot->tts_tableOid = RelationGetRelid(scan->rel); + ExecStoreBufferHeapTuple(&bslot->base.tupdata, slot, hscan->xs_cbuf); + } + else + { + /* We've reached the end of the HOT chain. 
*/ + *call_again = false; + } + + return got_heap_tuple; +} + + +/* ------------------------------------------------------------------------ + * Callbacks for non-modifying operations on individual tuples for heap AM + * ------------------------------------------------------------------------ + */ + +static bool +heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, + Snapshot snapshot) +{ + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; + bool res; + + Assert(TTS_IS_BUFFERTUPLE(slot)); + Assert(BufferIsValid(bslot->buffer)); + + /* + * We need buffer pin and lock to call HeapTupleSatisfiesVisibility. + * Caller should be holding pin, but not lock. + */ + LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE); + res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot, + bslot->buffer); + LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK); + + return res; +} + + +/* ------------------------------------------------------------------------ + * Definition of the heap table access method. 
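heapam_index_fetch_tuple() above uses `*call_again` as a two-way protocol. On entry, false means "start a fresh lookup for this TID". On exit, it is true only when further members of the same HOT chain could also be visible, which can happen only under a non-MVCC snapshot. The sketch below models that contract with a plain array of visibility flags instead of a buffer page; `ChainFetchSketch` and `sketch_fetch` are hypothetical and do not reproduce heap_hot_search_buffer().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one HOT chain: visible[i] says whether chain
 * member i is visible to the snapshot in use. */
typedef struct
{
    const bool *visible;
    int         nversions;
    int         pos;            /* next chain member to examine */
} ChainFetchSketch;

/* Returns true and sets *out to the next visible member. */
static bool
sketch_fetch(ChainFetchSketch *f, bool is_mvcc, bool *call_again, int *out)
{
    if (!*call_again)
        f->pos = 0;             /* fresh lookup: restart at the chain root */

    while (f->pos < f->nversions)
    {
        int         i = f->pos++;

        if (f->visible[i])
        {
            /* Under an MVCC snapshot at most one member can be visible. */
            *call_again = !is_mvcc;
            *out = i;
            return true;
        }
    }

    *call_again = false;        /* reached the end of the chain */
    return false;
}
```

A caller keeps invoking the fetch with the same TID while `*call_again` stays true, and moves on to the next index entry once it turns false.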
+ * ------------------------------------------------------------------------ + */ + static const TableAmRoutine heapam_methods = { .type = T_TableAmRoutine, + + .slot_callbacks = heapam_slot_callbacks, + + .scan_begin = heap_beginscan, + .scan_end = heap_endscan, + .scan_rescan = heap_rescan, + .scan_getnextslot = heap_getnextslot, + + .parallelscan_estimate = table_block_parallelscan_estimate, + .parallelscan_initialize = table_block_parallelscan_initialize, + .parallelscan_reinitialize = table_block_parallelscan_reinitialize, + + .index_fetch_begin = heapam_index_fetch_begin, + .index_fetch_reset = heapam_index_fetch_reset, + .index_fetch_end = heapam_index_fetch_end, + .index_fetch_tuple = heapam_index_fetch_tuple, + + .tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot, }; diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c index e0a5ea42d5..5222966e51 100644 --- a/src/backend/access/index/genam.c +++ b/src/backend/access/index/genam.c @@ -22,6 +22,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/transam.h" #include "catalog/index.h" #include "lib/stringinfo.h" @@ -83,6 +84,7 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys) scan = (IndexScanDesc) palloc(sizeof(IndexScanDescData)); scan->heapRelation = NULL; /* may be set later */ + scan->xs_heapfetch = NULL; scan->indexRelation = indexRelation; scan->xs_snapshot = InvalidSnapshot; /* caller must initialize this */ scan->numberOfKeys = nkeys; @@ -123,11 +125,6 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys) scan->xs_hitup = NULL; scan->xs_hitupdesc = NULL; - ItemPointerSetInvalid(&scan->xs_ctup.t_self); - scan->xs_ctup.t_data = NULL; - scan->xs_cbuf = InvalidBuffer; - scan->xs_continue_hot = false; - return scan; } @@ -335,6 +332,7 @@ systable_beginscan(Relation heapRelation, sysscan->heap_rel = heapRelation; sysscan->irel = irel; + 
sysscan->slot = table_slot_create(heapRelation, NULL); if (snapshot == NULL) { @@ -384,9 +382,9 @@ systable_beginscan(Relation heapRelation, * disadvantage; and there are no compensating advantages, because * it's unlikely that such scans will occur in parallel. */ - sysscan->scan = heap_beginscan_strat(heapRelation, snapshot, - nkeys, key, - true, false); + sysscan->scan = table_beginscan_strat(heapRelation, snapshot, + nkeys, key, + true, false); sysscan->iscan = NULL; } @@ -401,28 +399,46 @@ systable_beginscan(Relation heapRelation, * Note that returned tuple is a reference to data in a disk buffer; * it must not be modified, and should be presumed inaccessible after * next getnext() or endscan() call. + * + * XXX: It'd probably make sense to offer a slot based interface, at least + * optionally. */ HeapTuple systable_getnext(SysScanDesc sysscan) { - HeapTuple htup; + HeapTuple htup = NULL; if (sysscan->irel) { - htup = index_getnext(sysscan->iscan, ForwardScanDirection); + if (index_getnext_slot(sysscan->iscan, ForwardScanDirection, sysscan->slot)) + { + bool shouldFree; - /* - * We currently don't need to support lossy index operators for any - * system catalog scan. It could be done here, using the scan keys to - * drive the operator calls, if we arranged to save the heap attnums - * during systable_beginscan(); this is practical because we still - * wouldn't need to support indexes on expressions. - */ - if (htup && sysscan->iscan->xs_recheck) - elog(ERROR, "system catalog scans with lossy index conditions are not implemented"); + htup = ExecFetchSlotHeapTuple(sysscan->slot, false, &shouldFree); + Assert(!shouldFree); + + /* + * We currently don't need to support lossy index operators for + * any system catalog scan. It could be done here, using the scan + * keys to drive the operator calls, if we arranged to save the + * heap attnums during systable_beginscan(); this is practical + * because we still wouldn't need to support indexes on + * expressions. 
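systable_getnext() above no longer receives a HeapTuple pointer from the scan. It asks index_getnext_slot() or table_scan_getnextslot() to fill `sysscan->slot` and checks the boolean result, mirroring heap_getnextslot(), which calls ExecClearTuple() and returns false at end of scan. The sketch below isolates that "caller-owned slot, boolean result" iteration shape; `SlotSketch`, `ScanSketch`, `slot_clear` and `scan_getnextslot` are hypothetical types and functions, not the executor's TupleTableSlot API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple slot owned by the caller. */
typedef struct
{
    bool        empty;
    int         value;
} SlotSketch;

/* Hypothetical scan over an in-memory array of "rows". */
typedef struct
{
    const int  *rows;
    size_t      nrows;
    size_t      next;
} ScanSketch;

static void
slot_clear(SlotSketch *slot)
{
    slot->empty = true;
}

/* The scan stores into the caller's slot and reports success via the
 * return value, so the caller never handles AM-specific tuple formats. */
static bool
scan_getnextslot(ScanSketch *scan, SlotSketch *slot)
{
    assert(slot != NULL);

    if (scan->next >= scan->nrows)
    {
        slot_clear(slot);       /* end of scan: leave the slot empty */
        return false;
    }

    slot->value = scan->rows[scan->next++];
    slot->empty = false;
    return true;
}
```

A typical loop is `while (scan_getnextslot(&scan, &slot)) { ... }`; the slot is allocated once up front, much as systable_beginscan() now calls table_slot_create() and systable_endscan() drops the slot.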
+ */ + if (sysscan->iscan->xs_recheck) + elog(ERROR, "system catalog scans with lossy index conditions are not implemented"); + } } else - htup = heap_getnext(sysscan->scan, ForwardScanDirection); + { + if (table_scan_getnextslot(sysscan->scan, ForwardScanDirection, sysscan->slot)) + { + bool shouldFree; + + htup = ExecFetchSlotHeapTuple(sysscan->slot, false, &shouldFree); + Assert(!shouldFree); + } + } return htup; } @@ -446,37 +462,20 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup) Snapshot freshsnap; bool result; + Assert(tup == ExecFetchSlotHeapTuple(sysscan->slot, false, NULL)); + /* - * Trust that LockBuffer() and HeapTupleSatisfiesMVCC() do not themselves + * Trust that table_tuple_satisfies_snapshot() and its subsidiaries + * (commonly LockBuffer() and HeapTupleSatisfiesMVCC()) do not themselves * acquire snapshots, so we need not register the snapshot. Those * facilities are too low-level to have any business scanning tables. */ freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel)); - if (sysscan->irel) - { - IndexScanDesc scan = sysscan->iscan; - - Assert(IsMVCCSnapshot(scan->xs_snapshot)); - Assert(tup == &scan->xs_ctup); - Assert(BufferIsValid(scan->xs_cbuf)); - /* must hold a buffer lock to call HeapTupleSatisfiesVisibility */ - LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE); - result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->xs_cbuf); - LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK); - } - else - { - HeapScanDesc scan = sysscan->scan; - - Assert(IsMVCCSnapshot(scan->rs_snapshot)); - Assert(tup == &scan->rs_ctup); - Assert(BufferIsValid(scan->rs_cbuf)); - /* must hold a buffer lock to call HeapTupleSatisfiesVisibility */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); - result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf); - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); - } + result = table_tuple_satisfies_snapshot(sysscan->heap_rel, + sysscan->slot, + freshsnap); + return result; } @@ -488,13 
+487,19 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup) void systable_endscan(SysScanDesc sysscan) { + if (sysscan->slot) + { + ExecDropSingleTupleTableSlot(sysscan->slot); + sysscan->slot = NULL; + } + if (sysscan->irel) { index_endscan(sysscan->iscan); index_close(sysscan->irel, AccessShareLock); } else - heap_endscan(sysscan->scan); + table_endscan(sysscan->scan); if (sysscan->snapshot) UnregisterSnapshot(sysscan->snapshot); @@ -541,6 +546,7 @@ systable_beginscan_ordered(Relation heapRelation, sysscan->heap_rel = heapRelation; sysscan->irel = indexRelation; + sysscan->slot = table_slot_create(heapRelation, NULL); if (snapshot == NULL) { @@ -586,10 +592,12 @@ systable_beginscan_ordered(Relation heapRelation, HeapTuple systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction) { - HeapTuple htup; + HeapTuple htup = NULL; Assert(sysscan->irel); - htup = index_getnext(sysscan->iscan, direction); + if (index_getnext_slot(sysscan->iscan, direction, sysscan->slot)) + htup = ExecFetchSlotHeapTuple(sysscan->slot, false, NULL); + /* See notes in systable_getnext */ if (htup && sysscan->iscan->xs_recheck) elog(ERROR, "system catalog scans with lossy index conditions are not implemented"); @@ -603,6 +611,12 @@ systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction) void systable_endscan_ordered(SysScanDesc sysscan) { + if (sysscan->slot) + { + ExecDropSingleTupleTableSlot(sysscan->slot); + sysscan->slot = NULL; + } + Assert(sysscan->irel); index_endscan(sysscan->iscan); if (sysscan->snapshot) diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index 4ad30186d9..ae1c87ebad 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -72,6 +72,7 @@ #include "access/amapi.h" #include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xlog.h" #include "catalog/index.h" @@ -235,6 +236,9 @@ 
index_beginscan(Relation heapRelation, scan->heapRelation = heapRelation; scan->xs_snapshot = snapshot; + /* prepare to fetch index matches from table */ + scan->xs_heapfetch = table_index_fetch_begin(heapRelation); + return scan; } @@ -318,16 +322,12 @@ index_rescan(IndexScanDesc scan, Assert(nkeys == scan->numberOfKeys); Assert(norderbys == scan->numberOfOrderBys); - /* Release any held pin on a heap page */ - if (BufferIsValid(scan->xs_cbuf)) - { - ReleaseBuffer(scan->xs_cbuf); - scan->xs_cbuf = InvalidBuffer; - } - - scan->xs_continue_hot = false; + /* Release resources (like buffer pins) from table accesses */ + if (scan->xs_heapfetch) + table_index_fetch_reset(scan->xs_heapfetch); scan->kill_prior_tuple = false; /* for safety */ + scan->xs_heap_continue = false; scan->indexRelation->rd_indam->amrescan(scan, keys, nkeys, orderbys, norderbys); @@ -343,11 +343,11 @@ index_endscan(IndexScanDesc scan) SCAN_CHECKS; CHECK_SCAN_PROCEDURE(amendscan); - /* Release any held pin on a heap page */ - if (BufferIsValid(scan->xs_cbuf)) + /* Release resources (like buffer pins) from table accesses */ + if (scan->xs_heapfetch) { - ReleaseBuffer(scan->xs_cbuf); - scan->xs_cbuf = InvalidBuffer; + table_index_fetch_end(scan->xs_heapfetch); + scan->xs_heapfetch = NULL; } /* End the AM's scan */ @@ -379,17 +379,16 @@ index_markpos(IndexScanDesc scan) /* ---------------- * index_restrpos - restore a scan position * - * NOTE: this only restores the internal scan state of the index AM. - * The current result tuple (scan->xs_ctup) doesn't change. See comments - * for ExecRestrPos(). - * - * NOTE: in the presence of HOT chains, mark/restore only works correctly - * if the scan's snapshot is MVCC-safe; that ensures that there's at most one - * returnable tuple in each HOT chain, and so restoring the prior state at the - * granularity of the index AM is sufficient. 
Since the only current user - * of mark/restore functionality is nodeMergejoin.c, this effectively means - * that merge-join plans only work for MVCC snapshots. This could be fixed - * if necessary, but for now it seems unimportant. + * NOTE: this only restores the internal scan state of the index AM. See + * comments for ExecRestrPos(). + * + * NOTE: For heap, in the presence of HOT chains, mark/restore only works + * correctly if the scan's snapshot is MVCC-safe; that ensures that there's at + * most one returnable tuple in each HOT chain, and so restoring the prior + * state at the granularity of the index AM is sufficient. Since the only + * current user of mark/restore functionality is nodeMergejoin.c, this + * effectively means that merge-join plans only work for MVCC snapshots. This + * could be fixed if necessary, but for now it seems unimportant. * ---------------- */ void @@ -400,9 +399,12 @@ index_restrpos(IndexScanDesc scan) SCAN_CHECKS; CHECK_SCAN_PROCEDURE(amrestrpos); - scan->xs_continue_hot = false; + /* release resources (like buffer pins) from table accesses */ + if (scan->xs_heapfetch) + table_index_fetch_reset(scan->xs_heapfetch); scan->kill_prior_tuple = false; /* for safety */ + scan->xs_heap_continue = false; scan->indexRelation->rd_indam->amrestrpos(scan); } @@ -483,6 +485,9 @@ index_parallelrescan(IndexScanDesc scan) { SCAN_CHECKS; + if (scan->xs_heapfetch) + table_index_fetch_reset(scan->xs_heapfetch); + /* amparallelrescan is optional; assume no-op if not provided by AM */ if (scan->indexRelation->rd_indam->amparallelrescan != NULL) scan->indexRelation->rd_indam->amparallelrescan(scan); @@ -513,6 +518,9 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel, int nkeys, scan->heapRelation = heaprel; scan->xs_snapshot = snapshot; + /* prepare to fetch index matches from table */ + scan->xs_heapfetch = table_index_fetch_begin(heaprel); + return scan; } @@ -535,7 +543,7 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection 
direction) /* * The AM's amgettuple proc finds the next index entry matching the scan - * keys, and puts the TID into scan->xs_ctup.t_self. It should also set + * keys, and puts the TID into scan->xs_heaptid. It should also set * scan->xs_recheck and possibly scan->xs_itup/scan->xs_hitup, though we * pay no attention to those fields here. */ @@ -543,23 +551,23 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction) /* Reset kill flag immediately for safety */ scan->kill_prior_tuple = false; + scan->xs_heap_continue = false; /* If we're out of index entries, we're done */ if (!found) { - /* ... but first, release any held pin on a heap page */ - if (BufferIsValid(scan->xs_cbuf)) - { - ReleaseBuffer(scan->xs_cbuf); - scan->xs_cbuf = InvalidBuffer; - } + /* release resources (like buffer pins) from table accesses */ + if (scan->xs_heapfetch) + table_index_fetch_reset(scan->xs_heapfetch); + return NULL; } + Assert(ItemPointerIsValid(&scan->xs_heaptid)); pgstat_count_index_tuples(scan->indexRelation, 1); /* Return the TID of the tuple we found. */ - return &scan->xs_ctup.t_self; + return &scan->xs_heaptid; } /* ---------------- @@ -580,53 +588,18 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction) * enough information to do it efficiently in the general case. * ---------------- */ -HeapTuple -index_fetch_heap(IndexScanDesc scan) +bool +index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot) { - ItemPointer tid = &scan->xs_ctup.t_self; bool all_dead = false; - bool got_heap_tuple; - - /* We can skip the buffer-switching logic if we're in mid-HOT chain. 
*/ - if (!scan->xs_continue_hot) - { - /* Switch to correct buffer if we don't have it already */ - Buffer prev_buf = scan->xs_cbuf; - - scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf, - scan->heapRelation, - ItemPointerGetBlockNumber(tid)); + bool found; - /* - * Prune page, but only if we weren't already on this page - */ - if (prev_buf != scan->xs_cbuf) - heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf); - } + found = table_index_fetch_tuple(scan->xs_heapfetch, &scan->xs_heaptid, + scan->xs_snapshot, slot, + &scan->xs_heap_continue, &all_dead); - /* Obtain share-lock on the buffer so we can examine visibility */ - LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE); - got_heap_tuple = heap_hot_search_buffer(tid, scan->heapRelation, - scan->xs_cbuf, - scan->xs_snapshot, - &scan->xs_ctup, - &all_dead, - !scan->xs_continue_hot); - LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK); - - if (got_heap_tuple) - { - /* - * Only in a non-MVCC snapshot can more than one member of the HOT - * chain be visible. - */ - scan->xs_continue_hot = !IsMVCCSnapshot(scan->xs_snapshot); + if (found) pgstat_count_heap_fetch(scan->indexRelation); - return &scan->xs_ctup; - } - - /* We've reached the end of the HOT chain. */ - scan->xs_continue_hot = false; /* * If we scanned a whole HOT chain and found only dead tuples, tell index @@ -638,17 +611,17 @@ index_fetch_heap(IndexScanDesc scan) if (!scan->xactStartedInRecovery) scan->kill_prior_tuple = all_dead; - return NULL; + return found; } /* ---------------- - * index_getnext - get the next heap tuple from a scan + * index_getnext_slot - get the next tuple from a scan * - * The result is the next heap tuple satisfying the scan keys and the - * snapshot, or NULL if no more matching tuples exist. + * The result is true if a tuple satisfying the scan keys and the snapshot was + * found, false otherwise. The tuple is stored in the specified slot. 
* - * On success, the buffer containing the heap tup is pinned (the pin will be - * dropped in a future index_getnext_tid, index_fetch_heap or index_endscan + * On success, resources (like buffer pins) are likely to be held, and will be + * dropped by a future index_getnext_tid, index_fetch_heap or index_endscan - * call). + * call. * * Note: caller must check scan->xs_recheck, and perform rechecking of the @@ -656,32 +629,23 @@ index_fetch_heap(IndexScanDesc scan) * enough information to do it efficiently in the general case. * ---------------- */ -HeapTuple -index_getnext(IndexScanDesc scan, ScanDirection direction) +bool +index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *slot) { - HeapTuple heapTuple; - ItemPointer tid; - for (;;) { - if (scan->xs_continue_hot) - { - /* - * We are resuming scan of a HOT chain after having returned an - * earlier member. Must still hold pin on current heap page. - */ - Assert(BufferIsValid(scan->xs_cbuf)); - Assert(ItemPointerGetBlockNumber(&scan->xs_ctup.t_self) == - BufferGetBlockNumber(scan->xs_cbuf)); - } - else + if (!scan->xs_heap_continue) { + ItemPointer tid; + /* Time to fetch the next TID from the index */ tid = index_getnext_tid(scan, direction); /* If we're out of index entries, we're done */ if (tid == NULL) break; + + Assert(ItemPointerEquals(tid, &scan->xs_heaptid)); } /* @@ -689,12 +653,12 @@ index_getnext(IndexScanDesc scan, ScanDirection direction) * If we don't find anything, loop around and grab the next TID from
*/ - heapTuple = index_fetch_heap(scan); - if (heapTuple != NULL) - return heapTuple; + Assert(ItemPointerIsValid(&scan->xs_heaptid)); + if (index_fetch_heap(scan, slot)) + return true; } - return NULL; /* failure exit */ + return false; } /* ---------------- diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 98917de2ef..60e0b90ccf 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -310,7 +310,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm) if (_bt_first(scan, ForwardScanDirection)) { /* Save tuple ID, and continue scanning */ - heapTid = &scan->xs_ctup.t_self; + heapTid = &scan->xs_heaptid; tbm_add_tuples(tbm, heapTid, 1, false); ntids++; diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c index 92832237a8..af3da3aa5b 100644 --- a/src/backend/access/nbtree/nbtsearch.c +++ b/src/backend/access/nbtree/nbtsearch.c @@ -1135,7 +1135,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir) readcomplete: /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; - scan->xs_ctup.t_self = currItem->heapTid; + scan->xs_heaptid = currItem->heapTid; if (scan->xs_want_itup) scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset); @@ -1185,7 +1185,7 @@ _bt_next(IndexScanDesc scan, ScanDirection dir) /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; - scan->xs_ctup.t_self = currItem->heapTid; + scan->xs_heaptid = currItem->heapTid; if (scan->xs_want_itup) scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset); @@ -1964,7 +1964,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir) /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; - scan->xs_ctup.t_self = currItem->heapTid; + scan->xs_heaptid = currItem->heapTid; if (scan->xs_want_itup) scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset); 
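The control flow of the new index_getnext_slot() in the indexam.c hunks above can be sketched outside the backend. This is a toy model under stated assumptions, not PostgreSQL code: ToyTid, ToyIndexScan, toy_getnext_tid(), toy_fetch_tuple() and toy_getnext_slot() are hypothetical stand-ins for ItemPointerData, IndexScanDesc, index_getnext_tid(), table_index_fetch_tuple() and index_getnext_slot(), and the "slot" is just an unsigned integer encoding which tuple was returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not PostgreSQL's real types or functions. */
typedef struct ToyTid
{
    unsigned block;
    unsigned offset;
} ToyTid;

typedef struct ToyIndexScan
{
    const ToyTid *tids;        /* TIDs the "index AM" returns, in order */
    const unsigned *nvisible;  /* visible chain members behind each TID */
    size_t ntids;
    size_t next;               /* position of the next TID to return */
    ToyTid heaptid;            /* plays the role of scan->xs_heaptid */
    bool heap_continue;        /* plays the role of scan->xs_heap_continue */
    unsigned chain_pos;        /* chain members already returned for heaptid */
} ToyIndexScan;

/* Like index_getnext_tid(): advance the index and remember the TID. */
static const ToyTid *
toy_getnext_tid(ToyIndexScan *scan)
{
    scan->heap_continue = false;
    scan->chain_pos = 0;
    if (scan->next >= scan->ntids)
        return NULL;
    scan->heaptid = scan->tids[scan->next++];
    return &scan->heaptid;
}

/*
 * Like table_index_fetch_tuple(): return the next visible chain member for
 * heaptid. In this toy, heap_continue is set only while more members remain;
 * heap itself sets it after any hit under a non-MVCC snapshot.
 */
static bool
toy_fetch_tuple(ToyIndexScan *scan, unsigned *slot)
{
    unsigned avail = scan->nvisible[scan->next - 1];

    if (scan->chain_pos >= avail)
    {
        scan->heap_continue = false;    /* end of chain, nothing visible */
        return false;
    }
    *slot = scan->heaptid.block * 100 + scan->heaptid.offset * 10 + scan->chain_pos;
    scan->chain_pos++;
    scan->heap_continue = scan->chain_pos < avail;
    return true;
}

/* Like index_getnext_slot(): loop until a visible tuple is found. */
static bool
toy_getnext_slot(ToyIndexScan *scan, unsigned *slot)
{
    for (;;)
    {
        if (!scan->heap_continue)
        {
            const ToyTid *tid = toy_getnext_tid(scan);

            if (tid == NULL)
                return false;       /* out of index entries */
            assert(tid == &scan->heaptid);  /* cf. the Assert in index_getnext_slot() */
        }
        if (toy_fetch_tuple(scan, slot))
            return true;
        /* nothing visible behind this TID: ask the index for the next one */
    }
}
```

The index is only asked for a new TID when the continue flag is false; otherwise the same TID is handed back to the table AM, which is how the patch replaces the old xs_continue_hot / xs_cbuf bookkeeping with state owned by the AM.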
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c index dc398e1186..e37cbac7b3 100644 --- a/src/backend/access/nbtree/nbtsort.c +++ b/src/backend/access/nbtree/nbtsort.c @@ -61,6 +61,7 @@ #include "access/nbtree.h" #include "access/parallel.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog.h" #include "access/xloginsert.h" @@ -158,9 +159,9 @@ typedef struct BTShared /* * This variable-sized field must come last. * - * See _bt_parallel_estimate_shared() and heap_parallelscan_estimate(). + * See _bt_parallel_estimate_shared() and table_parallelscan_estimate(). */ - ParallelHeapScanDescData heapdesc; + ParallelTableScanDescData heapdesc; } BTShared; /* @@ -282,7 +283,7 @@ static void _bt_load(BTWriteState *wstate, static void _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request); static void _bt_end_parallel(BTLeader *btleader); -static Size _bt_parallel_estimate_shared(Snapshot snapshot); +static Size _bt_parallel_estimate_shared(Relation heap, Snapshot snapshot); static double _bt_parallel_heapscan(BTBuildState *buildstate, bool *brokenhotchain); static void _bt_leader_participate_as_worker(BTBuildState *buildstate); @@ -1275,7 +1276,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) * Estimate size for our own PARALLEL_KEY_BTREE_SHARED workspace, and * PARALLEL_KEY_TUPLESORT tuplesort workspace */ - estbtshared = _bt_parallel_estimate_shared(snapshot); + estbtshared = _bt_parallel_estimate_shared(btspool->heap, snapshot); shm_toc_estimate_chunk(&pcxt->estimator, estbtshared); estsort = tuplesort_estimate_shared(scantuplesortstates); shm_toc_estimate_chunk(&pcxt->estimator, estsort); @@ -1316,7 +1317,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) btshared->havedead = false; btshared->indtuples = 0.0; btshared->brokenhotchain = false; - heap_parallelscan_initialize(&btshared->heapdesc, 
btspool->heap, snapshot); + table_parallelscan_initialize(btspool->heap, &btshared->heapdesc, + snapshot); /* * Store shared tuplesort-private state, for which we reserved space. @@ -1403,10 +1405,10 @@ _bt_end_parallel(BTLeader *btleader) * btree index build based on the snapshot its parallel scan will use. */ static Size -_bt_parallel_estimate_shared(Snapshot snapshot) +_bt_parallel_estimate_shared(Relation heap, Snapshot snapshot) { return add_size(offsetof(BTShared, heapdesc), - heap_parallelscan_estimate(snapshot)); + table_parallelscan_estimate(heap, snapshot)); } /* @@ -1617,7 +1619,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2, { SortCoordinate coordinate; BTBuildState buildstate; - HeapScanDesc scan; + TableScanDesc scan; double reltuples; IndexInfo *indexInfo; @@ -1670,7 +1672,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2, /* Join parallel scan */ indexInfo = BuildIndexInfo(btspool->index); indexInfo->ii_Concurrent = btshared->isconcurrent; - scan = heap_beginscan_parallel(btspool->heap, &btshared->heapdesc); + scan = table_beginscan_parallel(btspool->heap, &btshared->heapdesc); reltuples = IndexBuildHeapScan(btspool->heap, btspool->index, indexInfo, true, _bt_build_callback, (void *) &buildstate, scan); diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c index dc0d63924d..9365bc57ad 100644 --- a/src/backend/access/spgist/spgscan.c +++ b/src/backend/access/spgist/spgscan.c @@ -927,7 +927,7 @@ spggettuple(IndexScanDesc scan, ScanDirection dir) if (so->iPtr < so->nPtrs) { /* continuing to return reported tuples */ - scan->xs_ctup.t_self = so->heapPtrs[so->iPtr]; + scan->xs_heaptid = so->heapPtrs[so->iPtr]; scan->xs_recheck = so->recheck[so->iPtr]; scan->xs_hitup = so->reconTups[so->iPtr]; diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c index 84851e4ff8..628d930c13 100644 --- a/src/backend/access/table/tableam.c +++ 
b/src/backend/access/table/tableam.c @@ -6,13 +6,304 @@ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California * - * src/backend/access/table/tableam.c + * + * IDENTIFICATION + * src/backend/access/table/tableam.c + * + * NOTES + * Note that most functions in here are documented in tableam.h, rather than + * here. That's because there are a lot of inline functions in tableam.h and + * it'd be harder to understand if one constantly had to switch between files. + * *---------------------------------------------------------------------- */ #include "postgres.h" +#include "access/heapam.h" /* for ss_* */ #include "access/tableam.h" +#include "access/xact.h" +#include "storage/bufmgr.h" +#include "storage/shmem.h" /* GUC variables */ char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD; +bool synchronize_seqscans = true; + + +/* ---------------------------------------------------------------------------- + * Slot functions. + * ---------------------------------------------------------------------------- + */ + +const TupleTableSlotOps * +table_slot_callbacks(Relation relation) +{ + const TupleTableSlotOps *tts_cb; + + if (relation->rd_tableam) + tts_cb = relation->rd_tableam->slot_callbacks(relation); + else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE) + { + /* + * Historically FDWs expect to store heap tuples in slots. Continue + * handing them one, to make it less painful to adapt FDWs to new + * versions. The cost of a heap slot over a virtual slot is pretty + * small. + */ + tts_cb = &TTSOpsHeapTuple; + } + else + { + /* + * These need to be supported, as some parts of the code (like COPY) + * need to create slots for such relations too. It seems better to + * centralize the knowledge that a virtual slot is the right thing in + * that case here.
+ */ + Assert(relation->rd_rel->relkind == RELKIND_VIEW || + relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); + tts_cb = &TTSOpsVirtual; + } + + return tts_cb; +} + +TupleTableSlot * +table_slot_create(Relation relation, List **reglist) +{ + const TupleTableSlotOps *tts_cb; + TupleTableSlot *slot; + + tts_cb = table_slot_callbacks(relation); + slot = MakeSingleTupleTableSlot(RelationGetDescr(relation), tts_cb); + + if (reglist) + *reglist = lappend(*reglist, slot); + + return slot; +} + + +/* ---------------------------------------------------------------------------- + * Table scan functions. + * ---------------------------------------------------------------------------- + */ + +TableScanDesc +table_beginscan_catalog(Relation relation, int nkeys, struct ScanKeyData *key) +{ + Oid relid = RelationGetRelid(relation); + Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid)); + + return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key, NULL, + true, true, true, false, false, true); +} + +void +table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot) +{ + Assert(IsMVCCSnapshot(snapshot)); + + RegisterSnapshot(snapshot); + scan->rs_snapshot = snapshot; + scan->rs_temp_snap = true; +} + + +/* ---------------------------------------------------------------------------- + * Parallel table scan related functions. 
+ * ---------------------------------------------------------------------------- + */ + +Size +table_parallelscan_estimate(Relation rel, Snapshot snapshot) +{ + Size sz = 0; + + if (IsMVCCSnapshot(snapshot)) + sz = add_size(sz, EstimateSnapshotSpace(snapshot)); + else + Assert(snapshot == SnapshotAny); + + sz = add_size(sz, rel->rd_tableam->parallelscan_estimate(rel)); + + return sz; +} + +void +table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan, + Snapshot snapshot) +{ + Size snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan); + + pscan->phs_snapshot_off = snapshot_off; + + if (IsMVCCSnapshot(snapshot)) + { + SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off); + pscan->phs_snapshot_any = false; + } + else + { + Assert(snapshot == SnapshotAny); + pscan->phs_snapshot_any = true; + } +} + +TableScanDesc +table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan) +{ + Snapshot snapshot; + + Assert(RelationGetRelid(relation) == parallel_scan->phs_relid); + + if (!parallel_scan->phs_snapshot_any) + { + /* Snapshot was serialized -- restore it */ + snapshot = RestoreSnapshot((char *) parallel_scan + + parallel_scan->phs_snapshot_off); + RegisterSnapshot(snapshot); + } + else + { + /* SnapshotAny passed by caller (not serialized) */ + snapshot = SnapshotAny; + } + + return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL, parallel_scan, + true, true, true, false, false, !parallel_scan->phs_snapshot_any); +} + + +/* ---------------------------------------------------------------------------- + * Helper functions to implement parallel scans for block oriented AMs. 
+ * ---------------------------------------------------------------------------- + */ + +Size +table_block_parallelscan_estimate(Relation rel) +{ + return sizeof(ParallelBlockTableScanDescData); +} + +Size +table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan) +{ + ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan; + + bpscan->base.phs_relid = RelationGetRelid(rel); + bpscan->phs_nblocks = RelationGetNumberOfBlocks(rel); + /* compare phs_syncscan initialization to similar logic in initscan */ + bpscan->base.phs_syncscan = synchronize_seqscans && + !RelationUsesLocalBuffers(rel) && + bpscan->phs_nblocks > NBuffers / 4; + SpinLockInit(&bpscan->phs_mutex); + bpscan->phs_startblock = InvalidBlockNumber; + pg_atomic_init_u64(&bpscan->phs_nallocated, 0); + + return sizeof(ParallelBlockTableScanDescData); +} + +void +table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan) +{ + ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan; + + pg_atomic_write_u64(&bpscan->phs_nallocated, 0); +} + +/* + * find and set the scan's startblock + * + * Determine where the parallel seq scan should start. This function may be + * called many times, once by each parallel worker. We must be careful only + * to set the startblock once. + */ +void +table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan) +{ + BlockNumber sync_startpage = InvalidBlockNumber; + +retry: + /* Grab the spinlock. */ + SpinLockAcquire(&pbscan->phs_mutex); + + /* + * If the scan's startblock has not yet been initialized, we must do so + * now. If this is not a synchronized scan, we just start at block 0, but + * if it is a synchronized scan, we must get the starting position from + * the synchronized scan machinery. We can't hold the spinlock while + * doing that, though, so release the spinlock, get the information we + * need, and retry. 
If nobody else has initialized the scan in the + * meantime, we'll fill in the value we fetched on the second time + * through. + */ + if (pbscan->phs_startblock == InvalidBlockNumber) + { + if (!pbscan->base.phs_syncscan) + pbscan->phs_startblock = 0; + else if (sync_startpage != InvalidBlockNumber) + pbscan->phs_startblock = sync_startpage; + else + { + SpinLockRelease(&pbscan->phs_mutex); + sync_startpage = ss_get_location(rel, pbscan->phs_nblocks); + goto retry; + } + } + SpinLockRelease(&pbscan->phs_mutex); +} + +/* + * get the next page to scan + * + * Get the next page to scan. Even if there are no pages left to scan, + * another backend could have grabbed a page to scan and not yet finished + * looking at it, so it doesn't follow that the scan is done when the first + * backend gets an InvalidBlockNumber return. + */ +BlockNumber +table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan) +{ + BlockNumber page; + uint64 nallocated; + + /* + * phs_nallocated tracks how many pages have been allocated to workers + * already. When phs_nallocated >= phs_nblocks, all blocks have been + * allocated. + * + * Because we use an atomic fetch-and-add to fetch the current value, the + * phs_nallocated counter will exceed phs_nblocks, because workers will + * still increment the value, when they try to allocate the next block but + * all blocks have been allocated already. The counter must be 64 bits + * wide because of that, to avoid wrapping around when phs_nblocks is close + * to 2^32. + * + * The actual page to return is calculated by adding the counter to the + * starting block number, modulo nblocks. + */ + nallocated = pg_atomic_fetch_add_u64(&pbscan->phs_nallocated, 1); + if (nallocated >= pbscan->phs_nblocks) + page = InvalidBlockNumber; /* all blocks have been allocated */ + else + page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks; + + /* + * Report scan location. Normally, we report the current page number.
+ * When we reach the end of the scan, though, we report the starting page, + * not the ending page, just so the starting positions for later scans + * don't slew backwards. We only report the position at the end of the + * scan once, though: subsequent callers will report nothing. + */ + if (pbscan->base.phs_syncscan) + { + if (page != InvalidBlockNumber) + ss_report_location(rel, page); + else if (nallocated == pbscan->phs_nblocks) + ss_report_location(rel, pbscan->phs_startblock); + } + + return page; +} diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c index 54a078d68a..3d3b82e1e5 100644 --- a/src/backend/access/table/tableamapi.c +++ b/src/backend/access/table/tableamapi.c @@ -44,6 +44,26 @@ GetTableAmRoutine(Oid amhandler) elog(ERROR, "Table access method handler %u did not return a TableAmRoutine struct", amhandler); + /* + * Assert that all required callbacks are present. That makes it a bit + * easier to keep AMs up to date, e.g. when forward porting them to a new + * major version.
+ */ + Assert(routine->scan_begin != NULL); + Assert(routine->scan_end != NULL); + Assert(routine->scan_rescan != NULL); + + Assert(routine->parallelscan_estimate != NULL); + Assert(routine->parallelscan_initialize != NULL); + Assert(routine->parallelscan_reinitialize != NULL); + + Assert(routine->index_fetch_begin != NULL); + Assert(routine->index_fetch_reset != NULL); + Assert(routine->index_fetch_end != NULL); + Assert(routine->index_fetch_tuple != NULL); + + Assert(routine->tuple_satisfies_snapshot != NULL); + return routine; } @@ -98,7 +118,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok) { Oid result; Relation rel; - HeapScanDesc scandesc; + TableScanDesc scandesc; HeapTuple tuple; ScanKeyData entry[1]; @@ -113,7 +133,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok) Anum_pg_am_amname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(tableamname)); - scandesc = heap_beginscan_catalog(rel, 1, entry); + scandesc = table_beginscan_catalog(rel, 1, entry); tuple = heap_getnext(scandesc, ForwardScanDirection); /* We assume that there can be at most one matching tuple */ @@ -123,7 +143,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok) else result = InvalidOid; - heap_endscan(scandesc); + table_endscan(scandesc); heap_close(rel, AccessShareLock); if (!OidIsValid(result) && !missing_ok) diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c index fb1a563424..26f7de3e45 100644 --- a/src/backend/access/tablesample/system.c +++ b/src/backend/access/tablesample/system.c @@ -180,7 +180,8 @@ static BlockNumber system_nextsampleblock(SampleScanState *node) { SystemSamplerData *sampler = (SystemSamplerData *) node->tsm_state; - HeapScanDesc scan = node->ss.ss_currentScanDesc; + TableScanDesc scan = node->ss.ss_currentScanDesc; + HeapScanDesc hscan = (HeapScanDesc) scan; BlockNumber nextblock = sampler->nextblock; uint32 hashinput[2]; @@ -199,7 +200,7 @@ system_nextsampleblock(SampleScanState 
*node) * Loop over block numbers until finding suitable block or reaching end of * relation. */ - for (; nextblock < scan->rs_nblocks; nextblock++) + for (; nextblock < hscan->rs_nblocks; nextblock++) { uint32 hash; @@ -211,7 +212,7 @@ system_nextsampleblock(SampleScanState *node) break; } - if (nextblock < scan->rs_nblocks) + if (nextblock < hscan->rs_nblocks) { /* Found a suitable block; remember where we should start next time */ sampler->nextblock = nextblock + 1; diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 4d7ed8ad1a..d8776e192e 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -20,6 +20,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog_internal.h" #include "bootstrap/bootstrap.h" @@ -594,7 +595,7 @@ boot_openrel(char *relname) int i; struct typmap **app; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; if (strlen(relname) >= NAMEDATALEN) @@ -604,16 +605,16 @@ boot_openrel(char *relname) { /* We can now load the pg_type data */ rel = table_open(TypeRelationId, NoLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); i = 0; while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) ++i; - heap_endscan(scan); + table_endscan(scan); app = Typ = ALLOC(struct typmap *, i + 1); while (i-- > 0) *app++ = ALLOC(struct typmap, 1); *app = NULL; - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); app = Typ; while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -623,7 +624,7 @@ boot_openrel(char *relname) sizeof((*app)->am_typ)); app++; } - heap_endscan(scan); + table_endscan(scan); table_close(rel, NoLock); } @@ -915,7 +916,7 @@ gettype(char *type) { int i; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; struct typmap 
**app; @@ -939,16 +940,16 @@ gettype(char *type) } elog(DEBUG4, "external type: %s", type); rel = table_open(TypeRelationId, NoLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); i = 0; while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) ++i; - heap_endscan(scan); + table_endscan(scan); app = Typ = ALLOC(struct typmap *, i + 1); while (i-- > 0) *app++ = ALLOC(struct typmap, 1); *app = NULL; - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); app = Typ; while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -957,7 +958,7 @@ gettype(char *type) (char *) GETSTRUCT(tup), sizeof((*app)->am_typ)); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, NoLock); return gettype(type); } diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 11ddce2a8b..a600f43a67 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -21,6 +21,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/binary_upgrade.h" #include "catalog/catalog.h" @@ -821,7 +822,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames) ScanKeyData key[2]; int keycount; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; keycount = 0; @@ -843,7 +844,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames) CharGetDatum(PROKIND_PROCEDURE)); rel = table_open(ProcedureRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, keycount, key); + scan = table_beginscan_catalog(rel, keycount, key); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -852,7 +853,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames) objects = lappend_oid(objects, oid); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); } break; @@ -877,7 +878,7 @@ 
getRelationsInNamespace(Oid namespaceId, char relkind) List *relations = NIL; ScanKeyData key[2]; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; ScanKeyInit(&key[0], @@ -890,7 +891,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind) CharGetDatum(relkind)); rel = table_open(RelationRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 2, key); + scan = table_beginscan_catalog(rel, 2, key); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -899,7 +900,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind) relations = lappend_oid(relations, oid); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); return relations; diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1ee1ed2894..c339a2bb77 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -28,6 +28,7 @@ #include "access/multixact.h" #include "access/relscan.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/visibilitymap.h" #include "access/xact.h" @@ -2138,7 +2139,7 @@ index_update_stats(Relation rel, ReindexIsProcessingHeap(RelationRelationId)) { /* don't assume syscache will work */ - HeapScanDesc pg_class_scan; + TableScanDesc pg_class_scan; ScanKeyData key[1]; ScanKeyInit(&key[0], @@ -2146,10 +2147,10 @@ index_update_stats(Relation rel, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid)); - pg_class_scan = heap_beginscan_catalog(pg_class, 1, key); + pg_class_scan = table_beginscan_catalog(pg_class, 1, key); tuple = heap_getnext(pg_class_scan, ForwardScanDirection); tuple = heap_copytuple(tuple); - heap_endscan(pg_class_scan); + table_endscan(pg_class_scan); } else { @@ -2431,7 +2432,7 @@ IndexBuildHeapScan(Relation heapRelation, bool allow_sync, IndexBuildCallback callback, void *callback_state, - HeapScanDesc scan) + TableScanDesc scan) { return IndexBuildHeapRangeScan(heapRelation, indexRelation, 
indexInfo, allow_sync, @@ -2460,8 +2461,9 @@ IndexBuildHeapRangeScan(Relation heapRelation, BlockNumber numblocks, IndexBuildCallback callback, void *callback_state, - HeapScanDesc scan) + TableScanDesc scan) { + HeapScanDesc hscan; bool is_system_catalog; bool checking_uniqueness; HeapTuple heapTuple; @@ -2502,8 +2504,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, */ estate = CreateExecutorState(); econtext = GetPerTupleExprContext(estate); - slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation), - &TTSOpsHeapTuple); + slot = table_slot_create(heapRelation, NULL); /* Arrange for econtext's scan tuple to be the tuple under test */ econtext->ecxt_scantuple = slot; @@ -2540,12 +2541,12 @@ IndexBuildHeapRangeScan(Relation heapRelation, else snapshot = SnapshotAny; - scan = heap_beginscan_strat(heapRelation, /* relation */ - snapshot, /* snapshot */ - 0, /* number of keys */ - NULL, /* scan key */ - true, /* buffer access strategy OK */ - allow_sync); /* syncscan OK? */ + scan = table_beginscan_strat(heapRelation, /* relation */ + snapshot, /* snapshot */ + 0, /* number of keys */ + NULL, /* scan key */ + true, /* buffer access strategy OK */ + allow_sync); /* syncscan OK? */ } else { @@ -2561,6 +2562,8 @@ IndexBuildHeapRangeScan(Relation heapRelation, snapshot = scan->rs_snapshot; } + hscan = (HeapScanDesc) scan; + /* * Must call GetOldestXmin() with SnapshotAny. Should never call * GetOldestXmin() with MVCC snapshot. (It's especially worth checking @@ -2618,15 +2621,15 @@ IndexBuildHeapRangeScan(Relation heapRelation, * tuple per HOT-chain --- else we could create more than one index * entry pointing to the same root tuple. 
*/ - if (scan->rs_cblock != root_blkno) + if (hscan->rs_cblock != root_blkno) { - Page page = BufferGetPage(scan->rs_cbuf); + Page page = BufferGetPage(hscan->rs_cbuf); - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); heap_get_root_tuples(page, root_offsets); - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); - root_blkno = scan->rs_cblock; + root_blkno = hscan->rs_cblock; } if (snapshot == SnapshotAny) @@ -2643,7 +2646,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, * be conservative about it. (This remark is still correct even * with HOT-pruning: our pin on the buffer prevents pruning.) */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); /* * The criteria for counting a tuple as live in this block need to @@ -2652,7 +2655,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, * values, e.g. when there are many recently-dead tuples. */ switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin, - scan->rs_cbuf)) + hscan->rs_cbuf)) { case HEAPTUPLE_DEAD: /* Definitely dead, we can ignore it */ @@ -2733,7 +2736,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, /* * Must drop the lock on the buffer before we wait */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); XactLockTableWait(xwait, heapRelation, &heapTuple->t_self, XLTW_InsertIndexUnique); @@ -2800,7 +2803,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, /* * Must drop the lock on the buffer before we wait */ - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); XactLockTableWait(xwait, heapRelation, &heapTuple->t_self, XLTW_InsertIndexUnique); @@ -2852,7 +2855,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, break; } - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); if (!indexIt) continue; @@ -2867,7 +2870,7 @@ 
IndexBuildHeapRangeScan(Relation heapRelation, MemoryContextReset(econtext->ecxt_per_tuple_memory); /* Set up for predicate or expression evaluation */ - ExecStoreHeapTuple(heapTuple, slot, false); + ExecStoreBufferHeapTuple(heapTuple, slot, hscan->rs_cbuf); /* * In a partial index, discard tuples that don't satisfy the @@ -2931,7 +2934,7 @@ IndexBuildHeapRangeScan(Relation heapRelation, } } - heap_endscan(scan); + table_endscan(scan); /* we can now forget our snapshot, if set and registered by us */ if (need_unregister_snapshot) @@ -2966,8 +2969,7 @@ IndexCheckExclusion(Relation heapRelation, Relation indexRelation, IndexInfo *indexInfo) { - HeapScanDesc scan; - HeapTuple heapTuple; + TableScanDesc scan; Datum values[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; ExprState *predicate; @@ -2990,8 +2992,7 @@ IndexCheckExclusion(Relation heapRelation, */ estate = CreateExecutorState(); econtext = GetPerTupleExprContext(estate); - slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation), - &TTSOpsHeapTuple); + slot = table_slot_create(heapRelation, NULL); /* Arrange for econtext's scan tuple to be the tuple under test */ econtext->ecxt_scantuple = slot; @@ -3003,22 +3004,17 @@ IndexCheckExclusion(Relation heapRelation, * Scan all live tuples in the base relation. 
*/ snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan_strat(heapRelation, /* relation */ - snapshot, /* snapshot */ - 0, /* number of keys */ - NULL, /* scan key */ - true, /* buffer access strategy OK */ - true); /* syncscan OK */ - - while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + scan = table_beginscan_strat(heapRelation, /* relation */ + snapshot, /* snapshot */ + 0, /* number of keys */ + NULL, /* scan key */ + true, /* buffer access strategy OK */ + true); /* syncscan OK */ + + while (table_scan_getnextslot(scan, ForwardScanDirection, slot)) { CHECK_FOR_INTERRUPTS(); - MemoryContextReset(econtext->ecxt_per_tuple_memory); - - /* Set up for predicate or expression evaluation */ - ExecStoreHeapTuple(heapTuple, slot, false); - /* * In a partial index, ignore tuples that don't satisfy the predicate. */ @@ -3042,11 +3038,13 @@ IndexCheckExclusion(Relation heapRelation, */ check_exclusion_constraint(heapRelation, indexRelation, indexInfo, - &(heapTuple->t_self), values, isnull, + &(slot->tts_tid), values, isnull, estate, true); + + MemoryContextReset(econtext->ecxt_per_tuple_memory); } - heap_endscan(scan); + table_endscan(scan); UnregisterSnapshot(snapshot); ExecDropSingleTupleTableSlot(slot); @@ -3281,7 +3279,8 @@ validate_index_heapscan(Relation heapRelation, Snapshot snapshot, v_i_state *state) { - HeapScanDesc scan; + TableScanDesc scan; + HeapScanDesc hscan; HeapTuple heapTuple; Datum values[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; @@ -3324,12 +3323,13 @@ validate_index_heapscan(Relation heapRelation, * here, because it's critical that we read from block zero forward to * match the sorted TIDs. 
*/ - scan = heap_beginscan_strat(heapRelation, /* relation */ - snapshot, /* snapshot */ - 0, /* number of keys */ - NULL, /* scan key */ - true, /* buffer access strategy OK */ - false); /* syncscan not OK */ + scan = table_beginscan_strat(heapRelation, /* relation */ + snapshot, /* snapshot */ + 0, /* number of keys */ + NULL, /* scan key */ + true, /* buffer access strategy OK */ + false); /* syncscan not OK */ + hscan = (HeapScanDesc) scan; /* * Scan all tuples matching the snapshot. @@ -3358,17 +3358,17 @@ validate_index_heapscan(Relation heapRelation, * already-passed-over tuplesort output TIDs of the current page. We * clear that array here, when advancing onto a new heap page. */ - if (scan->rs_cblock != root_blkno) + if (hscan->rs_cblock != root_blkno) { - Page page = BufferGetPage(scan->rs_cbuf); + Page page = BufferGetPage(hscan->rs_cbuf); - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); heap_get_root_tuples(page, root_offsets); - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); memset(in_index, 0, sizeof(in_index)); - root_blkno = scan->rs_cblock; + root_blkno = hscan->rs_cblock; } /* Convert actual tuple TID to root TID */ @@ -3493,7 +3493,7 @@ validate_index_heapscan(Relation heapRelation, } } - heap_endscan(scan); + table_endscan(scan); ExecDropSingleTupleTableSlot(slot); diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c index a3bd8c2c15..04c207662a 100644 --- a/src/backend/catalog/pg_conversion.c +++ b/src/backend/catalog/pg_conversion.c @@ -17,6 +17,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "catalog/catalog.h" #include "catalog/dependency.h" #include "catalog/indexing.h" @@ -152,7 +153,7 @@ RemoveConversionById(Oid conversionOid) { Relation rel; HeapTuple tuple; - HeapScanDesc scan; + TableScanDesc scan; ScanKeyData scanKeyData; 
ScanKeyInit(&scanKeyData, @@ -163,14 +164,14 @@ RemoveConversionById(Oid conversionOid) /* open pg_conversion */ rel = table_open(ConversionRelationId, RowExclusiveLock); - scan = heap_beginscan_catalog(rel, 1, &scanKeyData); + scan = table_beginscan_catalog(rel, 1, &scanKeyData); /* search for the target tuple */ if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection))) CatalogTupleDelete(rel, &tuple->t_self); else elog(ERROR, "could not find tuple for conversion %u", conversionOid); - heap_endscan(scan); + table_endscan(scan); table_close(rel, RowExclusiveLock); } diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c index 5189c6f7a5..20acac2eea 100644 --- a/src/backend/catalog/pg_db_role_setting.c +++ b/src/backend/catalog/pg_db_role_setting.c @@ -13,6 +13,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "catalog/indexing.h" #include "catalog/objectaccess.h" #include "catalog/pg_db_role_setting.h" @@ -169,7 +170,7 @@ void DropSetting(Oid databaseid, Oid roleid) { Relation relsetting; - HeapScanDesc scan; + TableScanDesc scan; ScanKeyData keys[2]; HeapTuple tup; int numkeys = 0; @@ -195,12 +196,12 @@ DropSetting(Oid databaseid, Oid roleid) numkeys++; } - scan = heap_beginscan_catalog(relsetting, numkeys, keys); + scan = table_beginscan_catalog(relsetting, numkeys, keys); while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) { CatalogTupleDelete(relsetting, &tup->t_self); } - heap_endscan(scan); + table_endscan(scan); table_close(relsetting, RowExclusiveLock); } diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c index a994d7bb6d..bbf2173936 100644 --- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -20,6 +20,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include 
"access/xact.h" #include "catalog/catalog.h" @@ -328,7 +329,7 @@ GetAllTablesPublicationRelations(void) { Relation classRel; ScanKeyData key[1]; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; List *result = NIL; @@ -339,7 +340,7 @@ GetAllTablesPublicationRelations(void) BTEqualStrategyNumber, F_CHAREQ, CharGetDatum(RELKIND_RELATION)); - scan = heap_beginscan_catalog(classRel, 1, key); + scan = table_beginscan_catalog(classRel, 1, key); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -350,7 +351,7 @@ GetAllTablesPublicationRelations(void) result = lappend_oid(result, relid); } - heap_endscan(scan); + table_endscan(scan); table_close(classRel, AccessShareLock); return result; diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 935d7670e4..afee2838cc 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -19,6 +19,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/indexing.h" @@ -390,7 +391,7 @@ void RemoveSubscriptionRel(Oid subid, Oid relid) { Relation rel; - HeapScanDesc scan; + TableScanDesc scan; ScanKeyData skey[2]; HeapTuple tup; int nkeys = 0; @@ -416,12 +417,12 @@ RemoveSubscriptionRel(Oid subid, Oid relid) } /* Do the search and delete what we found. 
*/ - scan = heap_beginscan_catalog(rel, nkeys, skey); + scan = table_beginscan_catalog(rel, nkeys, skey); while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) { CatalogTupleDelete(rel, &tup->t_self); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, RowExclusiveLock); } diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 4d6453d924..3e2a807640 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -22,6 +22,7 @@ #include "access/multixact.h" #include "access/relscan.h" #include "access/rewriteheap.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/tuptoaster.h" #include "access/xact.h" @@ -764,6 +765,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, Datum *values; bool *isnull; IndexScanDesc indexScan; + TableScanDesc tableScan; HeapScanDesc heapScan; bool use_wal; bool is_system_catalog; @@ -779,6 +781,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, BlockNumber num_pages; int elevel = verbose ? 
INFO : DEBUG2; PGRUsage ru0; + TupleTableSlot *slot; + BufferHeapTupleTableSlot *hslot; pg_rusage_init(&ru0); @@ -924,16 +928,21 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, */ if (OldIndex != NULL && !use_sort) { + tableScan = NULL; heapScan = NULL; indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0); index_rescan(indexScan, NULL, 0, NULL, 0); } else { - heapScan = heap_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL); + tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL); + heapScan = (HeapScanDesc) tableScan; indexScan = NULL; } + slot = table_slot_create(OldHeap, NULL); + hslot = (BufferHeapTupleTableSlot *) slot; + /* Log what we're doing */ if (indexScan != NULL) ereport(elevel, @@ -968,19 +977,19 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, if (indexScan != NULL) { - tuple = index_getnext(indexScan, ForwardScanDirection); - if (tuple == NULL) + if (!index_getnext_slot(indexScan, ForwardScanDirection, slot)) break; /* Since we used no scan keys, should never need to recheck */ if (indexScan->xs_recheck) elog(ERROR, "CLUSTER does not support lossy index conditions"); - buf = indexScan->xs_cbuf; + tuple = hslot->base.tuple; + buf = hslot->buffer; } else { - tuple = heap_getnext(heapScan, ForwardScanDirection); + tuple = heap_getnext(tableScan, ForwardScanDirection); if (tuple == NULL) break; @@ -1066,7 +1075,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, if (indexScan != NULL) index_endscan(indexScan); if (heapScan != NULL) - heap_endscan(heapScan); + table_endscan(tableScan); + if (slot) + ExecDropSingleTupleTableSlot(slot); /* * In scan-and-sort mode, complete the sort, then read out all live tuples @@ -1694,7 +1705,7 @@ static List * get_tables_to_cluster(MemoryContext cluster_context) { Relation indRelation; - HeapScanDesc scan; + TableScanDesc scan; ScanKeyData entry; HeapTuple indexTuple; Form_pg_index index; @@ 
-1713,7 +1724,7 @@ get_tables_to_cluster(MemoryContext cluster_context) Anum_pg_index_indisclustered, BTEqualStrategyNumber, F_BOOLEQ, BoolGetDatum(true)); - scan = heap_beginscan_catalog(indRelation, 1, &entry); + scan = table_beginscan_catalog(indRelation, 1, &entry); while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { index = (Form_pg_index) GETSTRUCT(indexTuple); @@ -1734,7 +1745,7 @@ get_tables_to_cluster(MemoryContext cluster_context) MemoryContextSwitchTo(old_context); } - heap_endscan(scan); + table_endscan(scan); relation_close(indRelation, AccessShareLock); diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c index f9ada29af8..cd04e4ea81 100644 --- a/src/backend/commands/constraint.c +++ b/src/backend/commands/constraint.c @@ -15,6 +15,7 @@ #include "access/genam.h" #include "access/heapam.h" +#include "access/tableam.h" #include "catalog/index.h" #include "commands/trigger.h" #include "executor/executor.h" @@ -41,7 +42,7 @@ unique_key_recheck(PG_FUNCTION_ARGS) { TriggerData *trigdata = castNode(TriggerData, fcinfo->context); const char *funcname = "unique_key_recheck"; - HeapTuple new_row; + ItemPointerData checktid; ItemPointerData tmptid; Relation indexRel; IndexInfo *indexInfo; @@ -73,28 +74,30 @@ unique_key_recheck(PG_FUNCTION_ARGS) * Get the new data that was inserted/updated. 
*/ if (TRIGGER_FIRED_BY_INSERT(trigdata->tg_event)) - new_row = trigdata->tg_trigtuple; + checktid = trigdata->tg_trigslot->tts_tid; else if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)) - new_row = trigdata->tg_newtuple; + checktid = trigdata->tg_newslot->tts_tid; else { ereport(ERROR, (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), errmsg("function \"%s\" must be fired for INSERT or UPDATE", funcname))); - new_row = NULL; /* keep compiler quiet */ + ItemPointerSetInvalid(&checktid); /* keep compiler quiet */ } + slot = table_slot_create(trigdata->tg_relation, NULL); + /* - * If the new_row is now dead (ie, inserted and then deleted within our - * transaction), we can skip the check. However, we have to be careful, - * because this trigger gets queued only in response to index insertions; - * which means it does not get queued for HOT updates. The row we are - * called for might now be dead, but have a live HOT child, in which case - * we still need to make the check --- effectively, we're applying the - * check against the live child row, although we can use the values from - * this row since by definition all columns of interest to us are the - * same. + * If the row pointed at by checktid is now dead (ie, inserted and then + * deleted within our transaction), we can skip the check. However, we + * have to be careful, because this trigger gets queued only in response + * to index insertions; which means it does not get queued e.g. for HOT + * updates. The row we are called for might now be dead, but have a live + * HOT child, in which case we still need to make the check --- + * effectively, we're applying the check against the live child row, + * although we can use the values from this row since by definition all + * columns of interest to us are the same. * * This might look like just an optimization, because the index AM will * make this identical test before throwing an error. 
But it's actually @@ -103,13 +106,23 @@ unique_key_recheck(PG_FUNCTION_ARGS) * it's possible the index entry has also been marked dead, and even * removed. */ - tmptid = new_row->t_self; - if (!heap_hot_search(&tmptid, trigdata->tg_relation, SnapshotSelf, NULL)) + tmptid = checktid; { - /* - * All rows in the HOT chain are dead, so skip the check. - */ - return PointerGetDatum(NULL); + IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation); + bool call_again = false; + + if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot, + &call_again, NULL)) + { + /* + * All rows referenced by the index entry are dead, so skip the + * check. + */ + ExecDropSingleTupleTableSlot(slot); + table_index_fetch_end(scan); + return PointerGetDatum(NULL); + } + table_index_fetch_end(scan); } /* @@ -121,14 +134,6 @@ unique_key_recheck(PG_FUNCTION_ARGS) RowExclusiveLock); indexInfo = BuildIndexInfo(indexRel); - /* - * The heap tuple must be put into a slot for FormIndexDatum. - */ - slot = MakeSingleTupleTableSlot(RelationGetDescr(trigdata->tg_relation), - &TTSOpsHeapTuple); - - ExecStoreHeapTuple(new_row, slot, false); - /* * Typically the index won't have expressions, but if it does we need an * EState to evaluate them. We need it for exclusion constraints too, @@ -163,11 +168,12 @@ unique_key_recheck(PG_FUNCTION_ARGS) { /* * Note: this is not a real insert; it is a check that the index entry - * that has already been inserted is unique. Passing t_self is - * correct even if t_self is now dead, because that is the TID the - * index will know about. + * that has already been inserted is unique. Passing the tuple's tid + * (i.e. unmodified by table_index_fetch_tuple()) is correct even if + * the row is now dead, because that is the TID the index will know + * about. 
*/ - index_insert(indexRel, values, isnull, &(new_row->t_self), + index_insert(indexRel, values, isnull, &checktid, trigdata->tg_relation, UNIQUE_CHECK_EXISTING, indexInfo); } diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 12415b4e99..a0ea4f6c38 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -21,6 +21,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog.h" #include "catalog/dependency.h" @@ -2073,13 +2074,13 @@ CopyTo(CopyState cstate) { Datum *values; bool *nulls; - HeapScanDesc scandesc; + TableScanDesc scandesc; HeapTuple tuple; values = (Datum *) palloc(num_phys_attrs * sizeof(Datum)); nulls = (bool *) palloc(num_phys_attrs * sizeof(bool)); - scandesc = heap_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL); + scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL); processed = 0; while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL) @@ -2094,7 +2095,7 @@ CopyTo(CopyState cstate) processed++; } - heap_endscan(scandesc); + table_endscan(scandesc); pfree(values); pfree(nulls); diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index d207cd899f..35cad0b629 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -26,6 +26,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xloginsert.h" #include "access/xlogutils.h" @@ -97,7 +98,7 @@ static int errdetail_busy_db(int notherbackends, int npreparedxacts); Oid createdb(ParseState *pstate, const CreatedbStmt *stmt) { - HeapScanDesc scan; + TableScanDesc scan; Relation rel; Oid src_dboid; Oid src_owner; @@ -589,7 +590,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) * each one to the new database. 
*/ rel = table_open(TableSpaceRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Form_pg_tablespace spaceform = (Form_pg_tablespace) GETSTRUCT(tuple); @@ -643,7 +644,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) XLOG_DBASE_CREATE | XLR_SPECIAL_REL_UPDATE); } } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); /* @@ -1870,11 +1871,11 @@ static void remove_dbtablespaces(Oid db_id) { Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; rel = table_open(TableSpaceRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple); @@ -1917,7 +1918,7 @@ remove_dbtablespaces(Oid db_id) pfree(dstpath); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); } @@ -1938,11 +1939,11 @@ check_db_file_conflict(Oid db_id) { bool result = false; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; rel = table_open(TableSpaceRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple); @@ -1967,7 +1968,7 @@ check_db_file_conflict(Oid db_id) pfree(dstpath); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); return result; diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 5dcedc337a..94006c1189 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -20,6 +20,7 @@ #include "access/htup_details.h" #include "access/reloptions.h" #include 
"access/sysattr.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/catalog.h" #include "catalog/index.h" @@ -2336,7 +2337,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, { Oid objectOid; Relation relationRelation; - HeapScanDesc scan; + TableScanDesc scan; ScanKeyData scan_keys[1]; HeapTuple tuple; MemoryContext private_context; @@ -2410,7 +2411,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, * rels will be processed indirectly by reindex_relation). */ relationRelation = table_open(RelationRelationId, AccessShareLock); - scan = heap_beginscan_catalog(relationRelation, num_keys, scan_keys); + scan = table_beginscan_catalog(relationRelation, num_keys, scan_keys); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple); @@ -2469,7 +2470,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, MemoryContextSwitchTo(old); } - heap_endscan(scan); + table_endscan(scan); table_close(relationRelation, AccessShareLock); /* Now reindex each rel in a separate transaction */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 59341e2a40..5ed560b02f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -4736,12 +4736,9 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) if (newrel || needscan) { ExprContext *econtext; - Datum *values; - bool *isnull; TupleTableSlot *oldslot; TupleTableSlot *newslot; - HeapScanDesc scan; - HeapTuple tuple; + TableScanDesc scan; MemoryContext oldCxt; List *dropped_attrs = NIL; ListCell *lc; @@ -4769,19 +4766,27 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) econtext = GetPerTupleExprContext(estate); /* - * Make tuple slots for old and new tuples. 
Note that even when the - * tuples are the same, the tupDescs might not be (consider ADD COLUMN - * without a default). + * Create necessary tuple slots. When rewriting, two slots are needed, + * otherwise one suffices. In the case where one slot suffices, we + * need to use the new tuple descriptor, otherwise some constraints + * can't be evaluated. Note that even when the tuple layout is the + * same and no rewrite is required, the tupDescs might not be + * (consider ADD COLUMN without a default). */ - oldslot = MakeSingleTupleTableSlot(oldTupDesc, &TTSOpsHeapTuple); - newslot = MakeSingleTupleTableSlot(newTupDesc, &TTSOpsHeapTuple); - - /* Preallocate values/isnull arrays */ - i = Max(newTupDesc->natts, oldTupDesc->natts); - values = (Datum *) palloc(i * sizeof(Datum)); - isnull = (bool *) palloc(i * sizeof(bool)); - memset(values, 0, i * sizeof(Datum)); - memset(isnull, true, i * sizeof(bool)); + if (tab->rewrite) + { + Assert(newrel != NULL); + oldslot = MakeSingleTupleTableSlot(oldTupDesc, + table_slot_callbacks(oldrel)); + newslot = MakeSingleTupleTableSlot(newTupDesc, + table_slot_callbacks(newrel)); + } + else + { + oldslot = MakeSingleTupleTableSlot(newTupDesc, + table_slot_callbacks(oldrel)); + newslot = NULL; + } /* * Any attributes that are dropped according to the new tuple @@ -4799,7 +4804,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) * checking all the constraints. 
*/ snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(oldrel, snapshot, 0, NULL); + scan = table_beginscan(oldrel, snapshot, 0, NULL); /* * Switch to per-tuple memory context and reset it for each tuple @@ -4807,55 +4812,69 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) */ oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); - while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + while (table_scan_getnextslot(scan, ForwardScanDirection, oldslot)) { + TupleTableSlot *insertslot; + if (tab->rewrite > 0) { /* Extract data from old tuple */ - heap_deform_tuple(tuple, oldTupDesc, values, isnull); + slot_getallattrs(oldslot); + ExecClearTuple(newslot); + + /* copy attributes */ + memcpy(newslot->tts_values, oldslot->tts_values, + sizeof(Datum) * oldslot->tts_nvalid); + memcpy(newslot->tts_isnull, oldslot->tts_isnull, + sizeof(bool) * oldslot->tts_nvalid); /* Set dropped attributes to null in new tuple */ foreach(lc, dropped_attrs) - isnull[lfirst_int(lc)] = true; + newslot->tts_isnull[lfirst_int(lc)] = true; /* * Process supplied expressions to replace selected columns. * Expression inputs come from the old tuple. */ - ExecStoreHeapTuple(tuple, oldslot, false); econtext->ecxt_scantuple = oldslot; foreach(l, tab->newvals) { NewColumnValue *ex = lfirst(l); - values[ex->attnum - 1] = ExecEvalExpr(ex->exprstate, - econtext, - &isnull[ex->attnum - 1]); + newslot->tts_values[ex->attnum - 1] + = ExecEvalExpr(ex->exprstate, + econtext, + &newslot->tts_isnull[ex->attnum - 1]); } - /* - * Form the new tuple. Note that we don't explicitly pfree it, - * since the per-tuple memory context will be reset shortly. - */ - tuple = heap_form_tuple(newTupDesc, values, isnull); + ExecStoreVirtualTuple(newslot); /* * Constraints might reference the tableoid column, so * initialize t_tableOid before evaluating them. 
*/ - tuple->t_tableOid = RelationGetRelid(oldrel); + newslot->tts_tableOid = RelationGetRelid(oldrel); + insertslot = newslot; + } + else + { + /* + * If there's no rewrite, old and new table are guaranteed to + * have the same AM, so we can just use the old slot to + * verify new constraints etc. + */ + insertslot = oldslot; } /* Now check any constraints on the possibly-changed tuple */ - ExecStoreHeapTuple(tuple, newslot, false); - econtext->ecxt_scantuple = newslot; + econtext->ecxt_scantuple = insertslot; foreach(l, notnull_attrs) { int attn = lfirst_int(l); - if (heap_attisnull(tuple, attn + 1, newTupDesc)) + if (slot_attisnull(insertslot, attn + 1)) { Form_pg_attribute attr = TupleDescAttr(newTupDesc, attn); @@ -4905,6 +4924,9 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) /* Write the tuple out to the new relation */ if (newrel) { + HeapTuple tuple; + + tuple = ExecFetchSlotHeapTuple(newslot, true, NULL); heap_insert(newrel, tuple, mycid, hi_options, bistate); ItemPointerCopy(&tuple->t_self, &newslot->tts_tid); } @@ -4915,11 +4937,12 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) } MemoryContextSwitchTo(oldCxt); - heap_endscan(scan); + table_endscan(scan); UnregisterSnapshot(snapshot); ExecDropSingleTupleTableSlot(oldslot); - ExecDropSingleTupleTableSlot(newslot); + if (newslot) + ExecDropSingleTupleTableSlot(newslot); } FreeExecutorState(estate); @@ -5310,7 +5333,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be { Relation classRel; ScanKeyData key[1]; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; List *result = NIL; @@ -5321,7 +5344,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(typeOid)); - scan = heap_beginscan_catalog(classRel, 1, key); + scan = table_beginscan_catalog(classRel, 1, key); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ 
-5337,7 +5360,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be result = lappend_oid(result, classform->oid); } - heap_endscan(scan); + table_endscan(scan); table_close(classRel, AccessShareLock); return result; @@ -8822,9 +8845,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup) char *conbin; Expr *origexpr; ExprState *exprstate; - TupleDesc tupdesc; - HeapScanDesc scan; - HeapTuple tuple; + TableScanDesc scan; ExprContext *econtext; MemoryContext oldcxt; TupleTableSlot *slot; @@ -8859,12 +8880,11 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup) exprstate = ExecPrepareExpr(origexpr, estate); econtext = GetPerTupleExprContext(estate); - tupdesc = RelationGetDescr(rel); - slot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple); + slot = table_slot_create(rel, NULL); econtext->ecxt_scantuple = slot; snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(rel, snapshot, 0, NULL); + scan = table_beginscan(rel, snapshot, 0, NULL); /* * Switch to per-tuple memory context and reset it for each tuple @@ -8872,10 +8892,8 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup) */ oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); - while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + while (table_scan_getnextslot(scan, ForwardScanDirection, slot)) { - ExecStoreHeapTuple(tuple, slot, false); - if (!ExecCheck(exprstate, econtext)) ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION), @@ -8887,7 +8905,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup) } MemoryContextSwitchTo(oldcxt); - heap_endscan(scan); + table_endscan(scan); UnregisterSnapshot(snapshot); ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); @@ -8906,8 +8924,8 @@ validateForeignKeyConstraint(char *conname, Oid pkindOid, Oid constraintOid) { - HeapScanDesc scan; - HeapTuple tuple; + TupleTableSlot *slot; + TableScanDesc scan; Trigger trig; Snapshot snapshot; @@ -8942,9 
+8960,10 @@ validateForeignKeyConstraint(char *conname, * ereport(ERROR) and that's that. */ snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(rel, snapshot, 0, NULL); + slot = table_slot_create(rel, NULL); + scan = table_beginscan(rel, snapshot, 0, NULL); - while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + while (table_scan_getnextslot(scan, ForwardScanDirection, slot)) { LOCAL_FCINFO(fcinfo, 0); TriggerData trigdata; @@ -8962,7 +8981,8 @@ validateForeignKeyConstraint(char *conname, trigdata.type = T_TriggerData; trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW; trigdata.tg_relation = rel; - trigdata.tg_trigtuple = tuple; + trigdata.tg_trigtuple = ExecFetchSlotHeapTuple(slot, true, NULL); + trigdata.tg_trigslot = slot; trigdata.tg_newtuple = NULL; trigdata.tg_trigger = &trig; @@ -8971,8 +8991,9 @@ validateForeignKeyConstraint(char *conname, RI_FKey_check_ins(fcinfo); } - heap_endscan(scan); + table_endscan(scan); UnregisterSnapshot(snapshot); + ExecDropSingleTupleTableSlot(slot); } static void @@ -11618,7 +11639,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt) ListCell *l; ScanKeyData key[1]; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; Oid orig_tablespaceoid; Oid new_tablespaceoid; @@ -11683,7 +11704,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt) ObjectIdGetDatum(orig_tablespaceoid)); rel = table_open(RelationRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 1, key); + scan = table_beginscan_catalog(rel, 1, key); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Form_pg_class relForm = (Form_pg_class) GETSTRUCT(tuple); @@ -11742,7 +11763,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt) relations = lappend_oid(relations, relOid); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); if (relations == NIL) diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 
4afd178e97..3784ea4b4f 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -54,6 +54,7 @@ #include "access/reloptions.h" #include "access/htup_details.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog.h" #include "access/xloginsert.h" @@ -405,7 +406,7 @@ DropTableSpace(DropTableSpaceStmt *stmt) { #ifdef HAVE_SYMLINK char *tablespacename = stmt->tablespacename; - HeapScanDesc scandesc; + TableScanDesc scandesc; Relation rel; HeapTuple tuple; Form_pg_tablespace spcform; @@ -421,7 +422,7 @@ DropTableSpace(DropTableSpaceStmt *stmt) Anum_pg_tablespace_spcname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(tablespacename)); - scandesc = heap_beginscan_catalog(rel, 1, entry); + scandesc = table_beginscan_catalog(rel, 1, entry); tuple = heap_getnext(scandesc, ForwardScanDirection); if (!HeapTupleIsValid(tuple)) @@ -439,7 +440,7 @@ DropTableSpace(DropTableSpaceStmt *stmt) (errmsg("tablespace \"%s\" does not exist, skipping", tablespacename))); /* XXX I assume I need one or both of these next two calls */ - heap_endscan(scandesc); + table_endscan(scandesc); table_close(rel, NoLock); } return; @@ -467,7 +468,7 @@ DropTableSpace(DropTableSpaceStmt *stmt) */ CatalogTupleDelete(rel, &tuple->t_self); - heap_endscan(scandesc); + table_endscan(scandesc); /* * Remove any comments or security labels on this tablespace. 
@@ -918,7 +919,7 @@ RenameTableSpace(const char *oldname, const char *newname) Oid tspId; Relation rel; ScanKeyData entry[1]; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; HeapTuple newtuple; Form_pg_tablespace newform; @@ -931,7 +932,7 @@ RenameTableSpace(const char *oldname, const char *newname) Anum_pg_tablespace_spcname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(oldname)); - scan = heap_beginscan_catalog(rel, 1, entry); + scan = table_beginscan_catalog(rel, 1, entry); tup = heap_getnext(scan, ForwardScanDirection); if (!HeapTupleIsValid(tup)) ereport(ERROR, @@ -943,7 +944,7 @@ RenameTableSpace(const char *oldname, const char *newname) newform = (Form_pg_tablespace) GETSTRUCT(newtuple); tspId = newform->oid; - heap_endscan(scan); + table_endscan(scan); /* Must be owner */ if (!pg_tablespace_ownercheck(tspId, GetUserId())) @@ -961,7 +962,7 @@ RenameTableSpace(const char *oldname, const char *newname) Anum_pg_tablespace_spcname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(newname)); - scan = heap_beginscan_catalog(rel, 1, entry); + scan = table_beginscan_catalog(rel, 1, entry); tup = heap_getnext(scan, ForwardScanDirection); if (HeapTupleIsValid(tup)) ereport(ERROR, @@ -969,7 +970,7 @@ RenameTableSpace(const char *oldname, const char *newname) errmsg("tablespace \"%s\" already exists", newname))); - heap_endscan(scan); + table_endscan(scan); /* OK, update the entry */ namestrcpy(&(newform->spcname), newname); @@ -993,7 +994,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt) { Relation rel; ScanKeyData entry[1]; - HeapScanDesc scandesc; + TableScanDesc scandesc; HeapTuple tup; Oid tablespaceoid; Datum datum; @@ -1011,7 +1012,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt) Anum_pg_tablespace_spcname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(stmt->tablespacename)); - scandesc = heap_beginscan_catalog(rel, 1, entry); + scandesc = table_beginscan_catalog(rel, 1, entry); tup = heap_getnext(scandesc, 
ForwardScanDirection); if (!HeapTupleIsValid(tup)) ereport(ERROR, @@ -1053,7 +1054,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt) heap_freetuple(newtuple); /* Conclude heap scan. */ - heap_endscan(scandesc); + table_endscan(scandesc); table_close(rel, NoLock); return tablespaceoid; @@ -1387,7 +1388,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok) { Oid result; Relation rel; - HeapScanDesc scandesc; + TableScanDesc scandesc; HeapTuple tuple; ScanKeyData entry[1]; @@ -1402,7 +1403,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok) Anum_pg_tablespace_spcname, BTEqualStrategyNumber, F_NAMEEQ, CStringGetDatum(tablespacename)); - scandesc = heap_beginscan_catalog(rel, 1, entry); + scandesc = table_beginscan_catalog(rel, 1, entry); tuple = heap_getnext(scandesc, ForwardScanDirection); /* We assume that there can be at most one matching tuple */ @@ -1411,7 +1412,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok) else result = InvalidOid; - heap_endscan(scandesc); + table_endscan(scandesc); table_close(rel, AccessShareLock); if (!OidIsValid(result) && !missing_ok) @@ -1433,7 +1434,7 @@ get_tablespace_name(Oid spc_oid) { char *result; Relation rel; - HeapScanDesc scandesc; + TableScanDesc scandesc; HeapTuple tuple; ScanKeyData entry[1]; @@ -1448,7 +1449,7 @@ get_tablespace_name(Oid spc_oid) Anum_pg_tablespace_oid, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(spc_oid)); - scandesc = heap_beginscan_catalog(rel, 1, entry); + scandesc = table_beginscan_catalog(rel, 1, entry); tuple = heap_getnext(scandesc, ForwardScanDirection); /* We assume that there can be at most one matching tuple */ @@ -1457,7 +1458,7 @@ get_tablespace_name(Oid spc_oid) else result = NULL; - heap_endscan(scandesc); + table_endscan(scandesc); table_close(rel, AccessShareLock); return result; diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 448926db12..f94248dc95 100644 --- 
a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -34,6 +34,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/binary_upgrade.h" #include "catalog/catalog.h" @@ -2362,14 +2363,15 @@ AlterDomainNotNull(List *names, bool notNull) RelToCheck *rtc = (RelToCheck *) lfirst(rt); Relation testrel = rtc->rel; TupleDesc tupdesc = RelationGetDescr(testrel); - HeapScanDesc scan; - HeapTuple tuple; + TupleTableSlot *slot; + TableScanDesc scan; Snapshot snapshot; /* Scan all tuples in this relation */ snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(testrel, snapshot, 0, NULL); - while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + scan = table_beginscan(testrel, snapshot, 0, NULL); + slot = table_slot_create(testrel, NULL); + while (table_scan_getnextslot(scan, ForwardScanDirection, slot)) { int i; @@ -2379,7 +2381,7 @@ AlterDomainNotNull(List *names, bool notNull) int attnum = rtc->atts[i]; Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1); - if (heap_attisnull(tuple, attnum, tupdesc)) + if (slot_attisnull(slot, attnum)) { /* * In principle the auxiliary information for this @@ -2398,7 +2400,8 @@ AlterDomainNotNull(List *names, bool notNull) } } } - heap_endscan(scan); + ExecDropSingleTupleTableSlot(slot); + table_endscan(scan); UnregisterSnapshot(snapshot); /* Close each rel after processing, but keep lock */ @@ -2776,14 +2779,15 @@ validateDomainConstraint(Oid domainoid, char *ccbin) RelToCheck *rtc = (RelToCheck *) lfirst(rt); Relation testrel = rtc->rel; TupleDesc tupdesc = RelationGetDescr(testrel); - HeapScanDesc scan; - HeapTuple tuple; + TupleTableSlot *slot; + TableScanDesc scan; Snapshot snapshot; /* Scan all tuples in this relation */ snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(testrel, snapshot, 0, NULL); - while ((tuple = heap_getnext(scan, 
ForwardScanDirection)) != NULL) + scan = table_beginscan(testrel, snapshot, 0, NULL); + slot = table_slot_create(testrel, NULL); + while (table_scan_getnextslot(scan, ForwardScanDirection, slot)) { int i; @@ -2796,7 +2800,7 @@ validateDomainConstraint(Oid domainoid, char *ccbin) Datum conResult; Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1); - d = heap_getattr(tuple, attnum, tupdesc, &isNull); + d = slot_getattr(slot, attnum, &isNull); econtext->domainValue_datum = d; econtext->domainValue_isNull = isNull; @@ -2826,7 +2830,8 @@ validateDomainConstraint(Oid domainoid, char *ccbin) ResetExprContext(econtext); } - heap_endscan(scan); + ExecDropSingleTupleTableSlot(slot); + table_endscan(scan); UnregisterSnapshot(snapshot); /* Hold relation lock till commit (XXX bad for concurrency) */ diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index da13a5a619..1b5b50cf01 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -28,6 +28,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/multixact.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "catalog/namespace.h" @@ -745,12 +746,12 @@ get_all_vacuum_rels(int options) { List *vacrels = NIL; Relation pgclass; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; pgclass = table_open(RelationRelationId, AccessShareLock); - scan = heap_beginscan_catalog(pgclass, 0, NULL); + scan = table_beginscan_catalog(pgclass, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -784,7 +785,7 @@ get_all_vacuum_rels(int options) MemoryContextSwitchTo(oldcontext); } - heap_endscan(scan); + table_endscan(scan); table_close(pgclass, AccessShareLock); return vacrels; @@ -1381,7 +1382,7 @@ vac_truncate_clog(TransactionId frozenXID, { TransactionId nextXID = ReadNewTransactionId(); Relation relation; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tuple; Oid 
oldestxid_datoid; Oid minmulti_datoid; @@ -1412,7 +1413,7 @@ vac_truncate_clog(TransactionId frozenXID, */ relation = table_open(DatabaseRelationId, AccessShareLock); - scan = heap_beginscan_catalog(relation, 0, NULL); + scan = table_beginscan_catalog(relation, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { @@ -1451,7 +1452,7 @@ vac_truncate_clog(TransactionId frozenXID, } } - heap_endscan(scan); + table_endscan(scan); table_close(relation, AccessShareLock); diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c index fe99096efc..fdb2c36246 100644 --- a/src/backend/executor/execCurrent.c +++ b/src/backend/executor/execCurrent.c @@ -204,7 +204,7 @@ execCurrentOf(CurrentOfExpr *cexpr, */ IndexScanDesc scan = ((IndexOnlyScanState *) scanstate)->ioss_ScanDesc; - *current_tid = scan->xs_ctup.t_self; + *current_tid = scan->xs_heaptid; } else { diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c index fd0520105d..e67dd6750c 100644 --- a/src/backend/executor/execIndexing.c +++ b/src/backend/executor/execIndexing.c @@ -108,6 +108,7 @@ #include "access/genam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/index.h" #include "executor/executor.h" @@ -651,7 +652,6 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index, Oid *index_collations = index->rd_indcollation; int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(index); IndexScanDesc index_scan; - HeapTuple tup; ScanKeyData scankeys[INDEX_MAX_KEYS]; SnapshotData DirtySnapshot; int i; @@ -707,8 +707,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index, * to this slot. Be sure to save and restore caller's value for * scantuple. 
*/ - existing_slot = MakeSingleTupleTableSlot(RelationGetDescr(heap), - &TTSOpsHeapTuple); + existing_slot = table_slot_create(heap, NULL); econtext = GetPerTupleExprContext(estate); save_scantuple = econtext->ecxt_scantuple; @@ -724,11 +723,9 @@ retry: index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0); index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0); - while ((tup = index_getnext(index_scan, - ForwardScanDirection)) != NULL) + while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot)) { TransactionId xwait; - ItemPointerData ctid_wait; XLTW_Oper reason_wait; Datum existing_values[INDEX_MAX_KEYS]; bool existing_isnull[INDEX_MAX_KEYS]; @@ -739,7 +736,7 @@ retry: * Ignore the entry for the tuple we're trying to check. */ if (ItemPointerIsValid(tupleid) && - ItemPointerEquals(tupleid, &tup->t_self)) + ItemPointerEquals(tupleid, &existing_slot->tts_tid)) { if (found_self) /* should not happen */ elog(ERROR, "found self tuple multiple times in index \"%s\"", @@ -752,7 +749,6 @@ retry: * Extract the index column values and isnull flags from the existing * tuple. */ - ExecStoreHeapTuple(tup, existing_slot, false); FormIndexDatum(indexInfo, existing_slot, estate, existing_values, existing_isnull); @@ -787,7 +783,6 @@ retry: DirtySnapshot.speculativeToken && TransactionIdPrecedes(GetCurrentTransactionId(), xwait)))) { - ctid_wait = tup->t_data->t_ctid; reason_wait = indexInfo->ii_ExclusionOps ? 
XLTW_RecheckExclusionConstr : XLTW_InsertIndex; index_endscan(index_scan); @@ -795,7 +790,8 @@ retry: SpeculativeInsertionWait(DirtySnapshot.xmin, DirtySnapshot.speculativeToken); else - XactLockTableWait(xwait, heap, &ctid_wait, reason_wait); + XactLockTableWait(xwait, heap, + &existing_slot->tts_tid, reason_wait); goto retry; } @@ -807,7 +803,7 @@ retry: { conflict = true; if (conflictTid) - *conflictTid = tup->t_self; + *conflictTid = existing_slot->tts_tid; break; } diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 61be56fe0b..499917d45f 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -40,6 +40,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "catalog/namespace.h" @@ -2802,9 +2803,8 @@ EvalPlanQualSlot(EPQState *epqstate, oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt); if (relation) - *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable, - RelationGetDescr(relation), - &TTSOpsBufferHeapTuple); + *slot = table_slot_create(relation, + &epqstate->estate->es_tupleTable); else *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable, epqstate->origslot->tts_tupleDescriptor, diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index aaa81f0620..37e96a6013 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -14,6 +14,7 @@ #include "postgres.h" #include "access/table.h" +#include "access/tableam.h" #include "catalog/partition.h" #include "catalog/pg_inherits.h" #include "catalog/pg_type.h" @@ -727,10 +728,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate, if (node->onConflictAction == ONCONFLICT_UPDATE) { TupleConversionMap *map; - TupleDesc leaf_desc; map = leaf_part_rri->ri_PartitionInfo->pi_RootToPartitionMap; - leaf_desc = 
RelationGetDescr(leaf_part_rri->ri_RelationDesc); Assert(node->onConflictSet != NIL); Assert(rootResultRelInfo->ri_onConflict != NULL); @@ -743,9 +742,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate, * descriptors match. */ leaf_part_rri->ri_onConflict->oc_Existing = - ExecInitExtraTupleSlot(mtstate->ps.state, - leaf_desc, - &TTSOpsBufferHeapTuple); + table_slot_create(leaf_part_rri->ri_RelationDesc, + &mtstate->ps.state->es_tupleTable); /* * If the partition's tuple descriptor matches exactly the root @@ -920,8 +918,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, * end of the command. */ partrouteinfo->pi_PartitionTupleSlot = - ExecInitExtraTupleSlot(estate, RelationGetDescr(partrel), - &TTSOpsHeapTuple); + table_slot_create(partrel, &estate->es_tupleTable); } else partrouteinfo->pi_PartitionTupleSlot = NULL; diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 5c5aa96e7f..95dfc4987d 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -17,6 +17,7 @@ #include "access/genam.h" #include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "commands/trigger.h" @@ -118,7 +119,6 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid, TupleTableSlot *searchslot, TupleTableSlot *outslot) { - HeapTuple scantuple; ScanKeyData skey[INDEX_MAX_KEYS]; IndexScanDesc scan; SnapshotData snap; @@ -144,10 +144,9 @@ retry: index_rescan(scan, skey, IndexRelationGetNumberOfKeyAttributes(idxrel), NULL, 0); /* Try to find the tuple */ - if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL) + if (index_getnext_slot(scan, ForwardScanDirection, outslot)) { found = true; - ExecStoreHeapTuple(scantuple, outslot, false); ExecMaterializeSlot(outslot); xwait = TransactionIdIsValid(snap.xmin) ? 
@@ -222,19 +221,21 @@ retry: } /* - * Compare the tuple and slot and check if they have equal values. + * Compare the tuples in the slots by checking if they have equal values. */ static bool -tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot) +tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2) { - Datum values[MaxTupleAttributeNumber]; - bool isnull[MaxTupleAttributeNumber]; - int attrnum; + int attrnum; - heap_deform_tuple(tup, desc, values, isnull); + Assert(slot1->tts_tupleDescriptor->natts == + slot2->tts_tupleDescriptor->natts); + + slot_getallattrs(slot1); + slot_getallattrs(slot2); /* Check equality of the attributes. */ - for (attrnum = 0; attrnum < desc->natts; attrnum++) + for (attrnum = 0; attrnum < slot1->tts_tupleDescriptor->natts; attrnum++) { Form_pg_attribute att; TypeCacheEntry *typentry; @@ -243,16 +244,16 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot) * If one value is NULL and other is not, then they are certainly not * equal */ - if (isnull[attrnum] != slot->tts_isnull[attrnum]) + if (slot1->tts_isnull[attrnum] != slot2->tts_isnull[attrnum]) return false; /* * If both are NULL, they can be considered equal. 
*/ - if (isnull[attrnum]) + if (slot1->tts_isnull[attrnum] || slot2->tts_isnull[attrnum]) continue; - att = TupleDescAttr(desc, attrnum); + att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum); typentry = lookup_type_cache(att->atttypid, TYPECACHE_EQ_OPR_FINFO); if (!OidIsValid(typentry->eq_opr_finfo.fn_oid)) @@ -262,8 +263,8 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot) format_type_be(att->atttypid)))); if (!DatumGetBool(FunctionCall2(&typentry->eq_opr_finfo, - values[attrnum], - slot->tts_values[attrnum]))) + slot1->tts_values[attrnum], + slot2->tts_values[attrnum]))) return false; } @@ -284,33 +285,33 @@ bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode, TupleTableSlot *searchslot, TupleTableSlot *outslot) { - HeapTuple scantuple; - HeapScanDesc scan; + TupleTableSlot *scanslot; + TableScanDesc scan; SnapshotData snap; TransactionId xwait; bool found; - TupleDesc desc = RelationGetDescr(rel); + TupleDesc desc PG_USED_FOR_ASSERTS_ONLY = RelationGetDescr(rel); Assert(equalTupleDescs(desc, outslot->tts_tupleDescriptor)); /* Start a heap scan. */ InitDirtySnapshot(snap); - scan = heap_beginscan(rel, &snap, 0, NULL); + scan = table_beginscan(rel, &snap, 0, NULL); + scanslot = table_slot_create(rel, NULL); retry: found = false; - heap_rescan(scan, NULL); + table_rescan(scan, NULL); /* Try to find the tuple */ - while ((scantuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot)) { - if (!tuple_equals_slot(desc, scantuple, searchslot)) + if (!tuples_equal(scanslot, searchslot)) continue; found = true; - ExecStoreHeapTuple(scantuple, outslot, false); - ExecMaterializeSlot(outslot); + ExecCopySlot(outslot, scanslot); xwait = TransactionIdIsValid(snap.xmin) ? 
snap.xmin : snap.xmax; @@ -375,7 +376,8 @@ retry: } } - heap_endscan(scan); + table_endscan(scan); + ExecDropSingleTupleTableSlot(scanslot); return found; } @@ -458,11 +460,9 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate, ResultRelInfo *resultRelInfo = estate->es_result_relation_info; Relation rel = resultRelInfo->ri_RelationDesc; HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot; - HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot; - /* We expect both searchslot and the slot to contain a heap tuple. */ + /* We expect the searchslot to contain a heap tuple. */ Assert(TTS_IS_HEAPTUPLE(searchslot) || TTS_IS_BUFFERTUPLE(searchslot)); - Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)); /* For now we support only tables. */ Assert(rel->rd_rel->relkind == RELKIND_RELATION); @@ -493,11 +493,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate, tuple = ExecFetchSlotHeapTuple(slot, true, NULL); /* OK, update the tuple and index entries for it */ - simple_heap_update(rel, &hsearchslot->tuple->t_self, hslot->tuple); - ItemPointerCopy(&hslot->tuple->t_self, &slot->tts_tid); + simple_heap_update(rel, &hsearchslot->tuple->t_self, tuple); + ItemPointerCopy(&tuple->t_self, &slot->tts_tid); if (resultRelInfo->ri_NumIndices > 0 && - !HeapTupleIsHeapOnly(hslot->tuple)) + !HeapTupleIsHeapOnly(tuple)) recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false, NULL, NIL); diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 044d62a56e..3b23de9fac 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -48,6 +48,7 @@ #include "access/parallel.h" #include "access/relscan.h" #include "access/table.h" +#include "access/tableam.h" #include "access/transam.h" #include "executor/executor.h" #include "jit/jit.h" @@ -1121,7 +1122,7 @@ ExecGetTriggerOldSlot(EState *estate, ResultRelInfo *relInfo) relInfo->ri_TrigOldSlot = ExecInitExtraTupleSlot(estate, 
RelationGetDescr(rel), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(rel)); MemoryContextSwitchTo(oldcontext); } @@ -1143,7 +1144,7 @@ ExecGetTriggerNewSlot(EState *estate, ResultRelInfo *relInfo) relInfo->ri_TrigNewSlot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(rel)); MemoryContextSwitchTo(oldcontext); } @@ -1165,7 +1166,7 @@ ExecGetReturningSlot(EState *estate, ResultRelInfo *relInfo) relInfo->ri_ReturningSlot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(rel)); MemoryContextSwitchTo(oldcontext); } diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c index 5e74585d5e..3a82857770 100644 --- a/src/backend/executor/nodeBitmapHeapscan.c +++ b/src/backend/executor/nodeBitmapHeapscan.c @@ -39,6 +39,7 @@ #include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/visibilitymap.h" #include "executor/execdebug.h" @@ -61,7 +62,7 @@ static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node, TBMIterateResult *tbmres); static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node); static inline void BitmapPrefetch(BitmapHeapScanState *node, - HeapScanDesc scan); + TableScanDesc scan); static bool BitmapShouldInitializeSharedState( ParallelBitmapHeapState *pstate); @@ -76,7 +77,8 @@ static TupleTableSlot * BitmapHeapNext(BitmapHeapScanState *node) { ExprContext *econtext; - HeapScanDesc scan; + TableScanDesc scan; + HeapScanDesc hscan; TIDBitmap *tbm; TBMIterator *tbmiterator = NULL; TBMSharedIterator *shared_tbmiterator = NULL; @@ -92,6 +94,7 @@ BitmapHeapNext(BitmapHeapScanState *node) econtext = node->ss.ps.ps_ExprContext; slot = node->ss.ss_ScanTupleSlot; scan = node->ss.ss_currentScanDesc; + hscan = (HeapScanDesc) scan; tbm = node->tbm; if (pstate == NULL) tbmiterator = node->tbmiterator; @@ -219,7 +222,7 
@@ BitmapHeapNext(BitmapHeapScanState *node) * least AccessShareLock on the table before performing any of the * indexscans, but let's be safe.) */ - if (tbmres->blockno >= scan->rs_nblocks) + if (tbmres->blockno >= hscan->rs_nblocks) { node->tbmres = tbmres = NULL; continue; @@ -242,14 +245,14 @@ BitmapHeapNext(BitmapHeapScanState *node) * The number of tuples on this page is put into * scan->rs_ntuples; note we don't fill scan->rs_vistuples. */ - scan->rs_ntuples = tbmres->ntuples; + hscan->rs_ntuples = tbmres->ntuples; } else { /* * Fetch the current heap page and identify candidate tuples. */ - bitgetpage(scan, tbmres); + bitgetpage(hscan, tbmres); } if (tbmres->ntuples >= 0) @@ -260,7 +263,7 @@ BitmapHeapNext(BitmapHeapScanState *node) /* * Set rs_cindex to first slot to examine */ - scan->rs_cindex = 0; + hscan->rs_cindex = 0; /* Adjust the prefetch target */ BitmapAdjustPrefetchTarget(node); @@ -270,7 +273,7 @@ BitmapHeapNext(BitmapHeapScanState *node) /* * Continuing in previously obtained page; advance rs_cindex */ - scan->rs_cindex++; + hscan->rs_cindex++; #ifdef USE_PREFETCH @@ -297,7 +300,7 @@ BitmapHeapNext(BitmapHeapScanState *node) /* * Out of range? If so, nothing more to look at on this page */ - if (scan->rs_cindex < 0 || scan->rs_cindex >= scan->rs_ntuples) + if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples) { node->tbmres = tbmres = NULL; continue; @@ -324,15 +327,15 @@ BitmapHeapNext(BitmapHeapScanState *node) /* * Okay to fetch the tuple. 
*/ - targoffset = scan->rs_vistuples[scan->rs_cindex]; - dp = (Page) BufferGetPage(scan->rs_cbuf); + targoffset = hscan->rs_vistuples[hscan->rs_cindex]; + dp = (Page) BufferGetPage(hscan->rs_cbuf); lp = PageGetItemId(dp, targoffset); Assert(ItemIdIsNormal(lp)); - scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp); - scan->rs_ctup.t_len = ItemIdGetLength(lp); - scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id; - ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset); + hscan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp); + hscan->rs_ctup.t_len = ItemIdGetLength(lp); + hscan->rs_ctup.t_tableOid = scan->rs_rd->rd_id; + ItemPointerSet(&hscan->rs_ctup.t_self, tbmres->blockno, targoffset); pgstat_count_heap_fetch(scan->rs_rd); @@ -340,9 +343,9 @@ BitmapHeapNext(BitmapHeapScanState *node) * Set up the result slot to point to this tuple. Note that the * slot acquires a pin on the buffer. */ - ExecStoreBufferHeapTuple(&scan->rs_ctup, + ExecStoreBufferHeapTuple(&hscan->rs_ctup, slot, - scan->rs_cbuf); + hscan->rs_cbuf); /* * If we are using lossy info, we have to recheck the qual @@ -392,17 +395,17 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres) Assert(page < scan->rs_nblocks); scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf, - scan->rs_rd, + scan->rs_base.rs_rd, page); buffer = scan->rs_cbuf; - snapshot = scan->rs_snapshot; + snapshot = scan->rs_base.rs_snapshot; ntup = 0; /* * Prune and repair fragmentation for the whole page, if possible. 
*/ - heap_page_prune_opt(scan->rs_rd, buffer); + heap_page_prune_opt(scan->rs_base.rs_rd, buffer); /* * We must hold share lock on the buffer content while examining tuple @@ -430,8 +433,8 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres) HeapTupleData heapTuple; ItemPointerSet(&tid, page, offnum); - if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot, - &heapTuple, NULL, true)) + if (heap_hot_search_buffer(&tid, scan->rs_base.rs_rd, buffer, + snapshot, &heapTuple, NULL, true)) scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid); } } @@ -456,16 +459,16 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres) continue; loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp); loctup.t_len = ItemIdGetLength(lp); - loctup.t_tableOid = scan->rs_rd->rd_id; + loctup.t_tableOid = scan->rs_base.rs_rd->rd_id; ItemPointerSet(&loctup.t_self, page, offnum); valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer); if (valid) { scan->rs_vistuples[ntup++] = offnum; - PredicateLockTuple(scan->rs_rd, &loctup, snapshot); + PredicateLockTuple(scan->rs_base.rs_rd, &loctup, snapshot); } - CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup, - buffer, snapshot); + CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd, + &loctup, buffer, snapshot); } } @@ -598,7 +601,7 @@ BitmapAdjustPrefetchTarget(BitmapHeapScanState *node) * BitmapPrefetch - Prefetch, if prefetch_pages are behind prefetch_target */ static inline void -BitmapPrefetch(BitmapHeapScanState *node, HeapScanDesc scan) +BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan) { #ifdef USE_PREFETCH ParallelBitmapHeapState *pstate = node->pstate; @@ -741,7 +744,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node) PlanState *outerPlan = outerPlanState(node); /* rescan to release any page pin */ - heap_rescan(node->ss.ss_currentScanDesc, NULL); + table_rescan(node->ss.ss_currentScanDesc, NULL); /* release bitmaps and buffers if any */ if (node->tbmiterator) @@ 
-785,7 +788,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node) void ExecEndBitmapHeapScan(BitmapHeapScanState *node) { - HeapScanDesc scanDesc; + TableScanDesc scanDesc; /* * extract information from the node @@ -830,7 +833,7 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node) /* * close heap scan */ - heap_endscan(scanDesc); + table_endscan(scanDesc); } /* ---------------------------------------------------------------- @@ -914,8 +917,7 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags) */ ExecInitScanTupleSlot(estate, &scanstate->ss, RelationGetDescr(currentRelation), - &TTSOpsBufferHeapTuple); - + table_slot_callbacks(currentRelation)); /* * Initialize result type and projection. @@ -953,10 +955,10 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags) * Even though we aren't going to do a conventional seqscan, it is useful * to create a HeapScanDesc --- most of the fields in it are usable. */ - scanstate->ss.ss_currentScanDesc = heap_beginscan_bm(currentRelation, - estate->es_snapshot, - 0, - NULL); + scanstate->ss.ss_currentScanDesc = table_beginscan_bm(currentRelation, + estate->es_snapshot, + 0, + NULL); /* * all done. 
@@ -1104,5 +1106,5 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node, node->pstate = pstate; snapshot = RestoreSnapshot(pstate->phs_snapshot_data); - heap_update_snapshot(node->ss.ss_currentScanDesc, snapshot); + table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot); } diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c index 26758e7703..2d954b722a 100644 --- a/src/backend/executor/nodeIndexonlyscan.c +++ b/src/backend/executor/nodeIndexonlyscan.c @@ -32,6 +32,7 @@ #include "access/genam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/tupdesc.h" #include "access/visibilitymap.h" #include "executor/execdebug.h" @@ -119,7 +120,7 @@ IndexOnlyNext(IndexOnlyScanState *node) */ while ((tid = index_getnext_tid(scandesc, direction)) != NULL) { - HeapTuple tuple = NULL; + bool tuple_from_heap = false; CHECK_FOR_INTERRUPTS(); @@ -165,17 +166,18 @@ IndexOnlyNext(IndexOnlyScanState *node) * Rats, we have to visit the heap to check visibility. */ InstrCountTuples2(node, 1); - tuple = index_fetch_heap(scandesc); - if (tuple == NULL) + if (!index_fetch_heap(scandesc, slot)) continue; /* no visible tuple, try next index entry */ + ExecClearTuple(slot); + /* * Only MVCC snapshots are supported here, so there should be no * need to keep following the HOT chain once a visible entry has * been found. If we did want to allow that, we'd need to keep * more state to remember not to call index_getnext_tid next time. */ - if (scandesc->xs_continue_hot) + if (scandesc->xs_heap_continue) elog(ERROR, "non-MVCC snapshots are not supported in index-only scans"); /* @@ -184,13 +186,15 @@ IndexOnlyNext(IndexOnlyScanState *node) * but it's not clear whether it's a win to do so. The next index * entry might require a visit to the same heap page. */ + + tuple_from_heap = true; } /* * Fill the scan tuple slot with data from the index. This might be - * provided in either HeapTuple or IndexTuple format. 
Conceivably an - * index AM might fill both fields, in which case we prefer the heap - * format, since it's probably a bit cheaper to fill a slot from. + * provided in either HeapTuple or IndexTuple format. Conceivably + * an index AM might fill both fields, in which case we prefer the + * heap format, since it's probably a bit cheaper to fill a slot from. */ if (scandesc->xs_hitup) { @@ -201,7 +205,7 @@ IndexOnlyNext(IndexOnlyScanState *node) */ Assert(slot->tts_tupleDescriptor->natts == scandesc->xs_hitupdesc->natts); - ExecStoreHeapTuple(scandesc->xs_hitup, slot, false); + ExecForceStoreHeapTuple(scandesc->xs_hitup, slot); } else if (scandesc->xs_itup) StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc); @@ -244,7 +248,7 @@ IndexOnlyNext(IndexOnlyScanState *node) * anyway, then we already have the tuple-level lock and can skip the * page lock. */ - if (tuple == NULL) + if (!tuple_from_heap) PredicateLockPage(scandesc->heapRelation, ItemPointerGetBlockNumber(tid), estate->es_snapshot); @@ -523,7 +527,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags) * suitable data anyway.) */ tupDesc = ExecTypeFromTL(node->indextlist); - ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc, &TTSOpsHeapTuple); + ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc, + table_slot_callbacks(currentRelation)); /* * Initialize result type and projection info. 
The node's targetlist will diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c index 337b561c24..8f39cc2b6b 100644 --- a/src/backend/executor/nodeIndexscan.c +++ b/src/backend/executor/nodeIndexscan.c @@ -31,6 +31,7 @@ #include "access/nbtree.h" #include "access/relscan.h" +#include "access/tableam.h" #include "catalog/pg_am.h" #include "executor/execdebug.h" #include "executor/nodeIndexscan.h" @@ -64,7 +65,7 @@ static int cmp_orderbyvals(const Datum *adist, const bool *anulls, IndexScanState *node); static int reorderqueue_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg); -static void reorderqueue_push(IndexScanState *node, HeapTuple tuple, +static void reorderqueue_push(IndexScanState *node, TupleTableSlot *slot, Datum *orderbyvals, bool *orderbynulls); static HeapTuple reorderqueue_pop(IndexScanState *node); @@ -83,7 +84,6 @@ IndexNext(IndexScanState *node) ExprContext *econtext; ScanDirection direction; IndexScanDesc scandesc; - HeapTuple tuple; TupleTableSlot *slot; /* @@ -130,20 +130,10 @@ IndexNext(IndexScanState *node) /* * ok, now that we have what we need, fetch the next tuple. */ - while ((tuple = index_getnext(scandesc, direction)) != NULL) + while (index_getnext_slot(scandesc, direction, slot)) { CHECK_FOR_INTERRUPTS(); - /* - * Store the scanned tuple in the scan tuple slot of the scan state. - * Note: we pass 'false' because tuples returned by amgetnext are - * pointers onto disk pages and must not be pfree()'d. - */ - ExecStoreBufferHeapTuple(tuple, /* tuple to store */ - slot, /* slot to store in */ - scandesc->xs_cbuf); /* buffer containing - * tuple */ - /* * If the index was lossy, we have to recheck the index quals using * the fetched tuple. 
@@ -183,7 +173,6 @@ IndexNextWithReorder(IndexScanState *node) EState *estate; ExprContext *econtext; IndexScanDesc scandesc; - HeapTuple tuple; TupleTableSlot *slot; ReorderTuple *topmost = NULL; bool was_exact; @@ -252,6 +241,8 @@ IndexNextWithReorder(IndexScanState *node) scandesc->xs_orderbynulls, node) <= 0) { + HeapTuple tuple; + tuple = reorderqueue_pop(node); /* Pass 'true', as the tuple in the queue is a palloc'd copy */ @@ -271,8 +262,7 @@ IndexNextWithReorder(IndexScanState *node) */ next_indextuple: slot = node->ss.ss_ScanTupleSlot; - tuple = index_getnext(scandesc, ForwardScanDirection); - if (!tuple) + if (!index_getnext_slot(scandesc, ForwardScanDirection, slot)) { /* * No more tuples from the index. But we still need to drain any @@ -282,14 +272,6 @@ next_indextuple: continue; } - /* - * Store the scanned tuple in the scan tuple slot of the scan state. - */ - ExecStoreBufferHeapTuple(tuple, /* tuple to store */ - slot, /* slot to store in */ - scandesc->xs_cbuf); /* buffer containing - * tuple */ - /* * If the index was lossy, we have to recheck the index quals and * ORDER BY expressions using the fetched tuple. @@ -358,7 +340,7 @@ next_indextuple: node) > 0)) { /* Put this tuple to the queue */ - reorderqueue_push(node, tuple, lastfetched_vals, lastfetched_nulls); + reorderqueue_push(node, slot, lastfetched_vals, lastfetched_nulls); continue; } else @@ -478,7 +460,7 @@ reorderqueue_cmp(const pairingheap_node *a, const pairingheap_node *b, * Helper function to push a tuple to the reorder queue. 
*/ static void -reorderqueue_push(IndexScanState *node, HeapTuple tuple, +reorderqueue_push(IndexScanState *node, TupleTableSlot *slot, Datum *orderbyvals, bool *orderbynulls) { IndexScanDesc scandesc = node->iss_ScanDesc; @@ -488,7 +470,7 @@ reorderqueue_push(IndexScanState *node, HeapTuple tuple, int i; rt = (ReorderTuple *) palloc(sizeof(ReorderTuple)); - rt->htup = heap_copytuple(tuple); + rt->htup = ExecCopySlotHeapTuple(slot); rt->orderbyvals = (Datum *) palloc(sizeof(Datum) * scandesc->numberOfOrderBys); rt->orderbynulls = @@ -949,7 +931,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags) */ ExecInitScanTupleSlot(estate, &indexstate->ss, RelationGetDescr(currentRelation), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(currentRelation)); /* * Initialize result type and projection. diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index a7efe8dcae..fa92db130b 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -39,6 +39,7 @@ #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/catalog.h" #include "commands/trigger.h" @@ -2147,7 +2148,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags); mtstate->mt_scans[i] = ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]), - &TTSOpsHeapTuple); + table_slot_callbacks(resultRelInfo->ri_RelationDesc)); /* Also let FDWs init themselves for foreign-table result rels */ if (!resultRelInfo->ri_usesFdwDirectModify && @@ -2207,8 +2208,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) if (update_tuple_routing_needed) { ExecSetupChildParentMapForSubplan(mtstate); - mtstate->mt_root_tuple_slot = MakeTupleTableSlot(RelationGetDescr(rel), - &TTSOpsHeapTuple); + mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL); 
} /* @@ -2320,8 +2320,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) /* initialize slot for the existing tuple */ resultRelInfo->ri_onConflict->oc_Existing = - ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc, - &TTSOpsBufferHeapTuple); + table_slot_create(resultRelInfo->ri_RelationDesc, + &mtstate->ps.state->es_tupleTable); /* create the tuple slot for the UPDATE SET projection */ tupDesc = ExecTypeFromTL((List *) node->onConflictSet); @@ -2430,15 +2430,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) for (i = 0; i < nplans; i++) { JunkFilter *j; + TupleTableSlot *junkresslot; subplan = mtstate->mt_plans[i]->plan; if (operation == CMD_INSERT || operation == CMD_UPDATE) ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc, subplan->targetlist); + junkresslot = + ExecInitExtraTupleSlot(estate, NULL, + table_slot_callbacks(resultRelInfo->ri_RelationDesc)); j = ExecInitJunkFilter(subplan->targetlist, - ExecInitExtraTupleSlot(estate, NULL, - &TTSOpsHeapTuple)); + junkresslot); if (operation == CMD_UPDATE || operation == CMD_DELETE) { diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c index 65ad959641..817e4ca41f 100644 --- a/src/backend/executor/nodeSamplescan.c +++ b/src/backend/executor/nodeSamplescan.c @@ -16,6 +16,7 @@ #include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "access/tsmapi.h" #include "executor/executor.h" #include "executor/nodeSamplescan.h" @@ -48,6 +49,7 @@ SampleNext(SampleScanState *node) { HeapTuple tuple; TupleTableSlot *slot; + HeapScanDesc hscan; /* * if this is first call within a scan, initialize @@ -61,11 +63,12 @@ SampleNext(SampleScanState *node) tuple = tablesample_getnext(node); slot = node->ss.ss_ScanTupleSlot; + hscan = (HeapScanDesc) node->ss.ss_currentScanDesc; if (tuple) ExecStoreBufferHeapTuple(tuple, /* tuple to store */ slot, /* slot to store in */ - node->ss.ss_currentScanDesc->rs_cbuf); /* 
tuple's buffer */ + hscan->rs_cbuf); /* tuple's buffer */ else ExecClearTuple(slot); @@ -147,7 +150,7 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags) /* and create slot with appropriate rowtype */ ExecInitScanTupleSlot(estate, &scanstate->ss, RelationGetDescr(scanstate->ss.ss_currentRelation), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(scanstate->ss.ss_currentRelation)); /* * Initialize result type and projection. @@ -219,7 +222,7 @@ ExecEndSampleScan(SampleScanState *node) * close heap scan */ if (node->ss.ss_currentScanDesc) - heap_endscan(node->ss.ss_currentScanDesc); + table_endscan(node->ss.ss_currentScanDesc); } /* ---------------------------------------------------------------- @@ -319,19 +322,19 @@ tablesample_init(SampleScanState *scanstate) if (scanstate->ss.ss_currentScanDesc == NULL) { scanstate->ss.ss_currentScanDesc = - heap_beginscan_sampling(scanstate->ss.ss_currentRelation, - scanstate->ss.ps.state->es_snapshot, - 0, NULL, - scanstate->use_bulkread, - allow_sync, - scanstate->use_pagemode); + table_beginscan_sampling(scanstate->ss.ss_currentRelation, + scanstate->ss.ps.state->es_snapshot, + 0, NULL, + scanstate->use_bulkread, + allow_sync, + scanstate->use_pagemode); } else { - heap_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL, - scanstate->use_bulkread, - allow_sync, - scanstate->use_pagemode); + table_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL, + scanstate->use_bulkread, + allow_sync, + scanstate->use_pagemode); } pfree(params); @@ -350,8 +353,9 @@ static HeapTuple tablesample_getnext(SampleScanState *scanstate) { TsmRoutine *tsm = scanstate->tsmroutine; - HeapScanDesc scan = scanstate->ss.ss_currentScanDesc; - HeapTuple tuple = &(scan->rs_ctup); + TableScanDesc scan = scanstate->ss.ss_currentScanDesc; + HeapScanDesc hscan = (HeapScanDesc) scan; + HeapTuple tuple = &(hscan->rs_ctup); Snapshot snapshot = scan->rs_snapshot; bool pagemode = scan->rs_pageatatime; BlockNumber blockno; @@ -359,14 
+363,14 @@ tablesample_getnext(SampleScanState *scanstate) bool all_visible; OffsetNumber maxoffset; - if (!scan->rs_inited) + if (!hscan->rs_inited) { /* * return null immediately if relation is empty */ - if (scan->rs_nblocks == 0) + if (hscan->rs_nblocks == 0) { - Assert(!BufferIsValid(scan->rs_cbuf)); + Assert(!BufferIsValid(hscan->rs_cbuf)); tuple->t_data = NULL; return NULL; } @@ -380,15 +384,15 @@ tablesample_getnext(SampleScanState *scanstate) } } else - blockno = scan->rs_startblock; - Assert(blockno < scan->rs_nblocks); + blockno = hscan->rs_startblock; + Assert(blockno < hscan->rs_nblocks); heapgetpage(scan, blockno); - scan->rs_inited = true; + hscan->rs_inited = true; } else { /* continue from previously returned page/tuple */ - blockno = scan->rs_cblock; /* current page */ + blockno = hscan->rs_cblock; /* current page */ } /* @@ -396,9 +400,9 @@ tablesample_getnext(SampleScanState *scanstate) * visibility checks. */ if (!pagemode) - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); - page = (Page) BufferGetPage(scan->rs_cbuf); + page = (Page) BufferGetPage(hscan->rs_cbuf); all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery; maxoffset = PageGetMaxOffsetNumber(page); @@ -431,18 +435,18 @@ tablesample_getnext(SampleScanState *scanstate) if (all_visible) visible = true; else - visible = SampleTupleVisible(tuple, tupoffset, scan); + visible = SampleTupleVisible(tuple, tupoffset, hscan); /* in pagemode, heapgetpage did this for us */ if (!pagemode) CheckForSerializableConflictOut(visible, scan->rs_rd, tuple, - scan->rs_cbuf, snapshot); + hscan->rs_cbuf, snapshot); if (visible) { /* Found visible tuple, return it. */ if (!pagemode) - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); break; } else @@ -457,7 +461,7 @@ tablesample_getnext(SampleScanState *scanstate) * it's time to move to the next. 
*/ if (!pagemode) - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); if (tsm->NextSampleBlock) { @@ -469,7 +473,7 @@ tablesample_getnext(SampleScanState *scanstate) { /* Without NextSampleBlock, just do a plain forward seqscan. */ blockno++; - if (blockno >= scan->rs_nblocks) + if (blockno >= hscan->rs_nblocks) blockno = 0; /* @@ -485,7 +489,7 @@ tablesample_getnext(SampleScanState *scanstate) if (scan->rs_syncscan) ss_report_location(scan->rs_rd, blockno); - finished = (blockno == scan->rs_startblock); + finished = (blockno == hscan->rs_startblock); } /* @@ -493,23 +497,23 @@ tablesample_getnext(SampleScanState *scanstate) */ if (finished) { - if (BufferIsValid(scan->rs_cbuf)) - ReleaseBuffer(scan->rs_cbuf); - scan->rs_cbuf = InvalidBuffer; - scan->rs_cblock = InvalidBlockNumber; + if (BufferIsValid(hscan->rs_cbuf)) + ReleaseBuffer(hscan->rs_cbuf); + hscan->rs_cbuf = InvalidBuffer; + hscan->rs_cblock = InvalidBlockNumber; tuple->t_data = NULL; - scan->rs_inited = false; + hscan->rs_inited = false; return NULL; } - Assert(blockno < scan->rs_nblocks); + Assert(blockno < hscan->rs_nblocks); heapgetpage(scan, blockno); /* Re-establish state for new page */ if (!pagemode) - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); - page = (Page) BufferGetPage(scan->rs_cbuf); + page = (Page) BufferGetPage(hscan->rs_cbuf); all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery; maxoffset = PageGetMaxOffsetNumber(page); } @@ -517,7 +521,7 @@ tablesample_getnext(SampleScanState *scanstate) /* Count successfully-fetched tuples as heap fetches */ pgstat_count_heap_getnext(scan->rs_rd); - return &(scan->rs_ctup); + return &(hscan->rs_ctup); } /* @@ -526,7 +530,7 @@ tablesample_getnext(SampleScanState *scanstate) static bool SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan) { - if (scan->rs_pageatatime) + if (scan->rs_base.rs_pageatatime) { /* * In 
pageatatime mode, heapgetpage() already did visibility checks, @@ -559,7 +563,7 @@ SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan) { /* Otherwise, we have to check the tuple individually. */ return HeapTupleSatisfiesVisibility(tuple, - scan->rs_snapshot, + scan->rs_base.rs_snapshot, scan->rs_cbuf); } } diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c index e5482859ef..8bd7430a91 100644 --- a/src/backend/executor/nodeSeqscan.c +++ b/src/backend/executor/nodeSeqscan.c @@ -27,8 +27,8 @@ */ #include "postgres.h" -#include "access/heapam.h" #include "access/relscan.h" +#include "access/tableam.h" #include "executor/execdebug.h" #include "executor/nodeSeqscan.h" #include "utils/rel.h" @@ -49,8 +49,7 @@ static TupleTableSlot *SeqNext(SeqScanState *node); static TupleTableSlot * SeqNext(SeqScanState *node) { - HeapTuple tuple; - HeapScanDesc scandesc; + TableScanDesc scandesc; EState *estate; ScanDirection direction; TupleTableSlot *slot; @@ -69,34 +68,18 @@ SeqNext(SeqScanState *node) * We reach here if the scan is not parallel, or if we're serially * executing a scan that was planned to be parallel. */ - scandesc = heap_beginscan(node->ss.ss_currentRelation, - estate->es_snapshot, - 0, NULL); + scandesc = table_beginscan(node->ss.ss_currentRelation, + estate->es_snapshot, + 0, NULL); node->ss.ss_currentScanDesc = scandesc; } /* * get the next tuple from the table */ - tuple = heap_getnext(scandesc, direction); - - /* - * save the tuple and the buffer returned to us by the access methods in - * our scan tuple slot and return the slot. Note: we pass 'false' because - * tuples returned by heap_getnext() are pointers onto disk pages and were - * not created with palloc() and so should not be pfree()'d. Note also - * that ExecStoreHeapTuple will increment the refcount of the buffer; the - * refcount will not be dropped until the tuple table slot is cleared. 
- */ - if (tuple) - ExecStoreBufferHeapTuple(tuple, /* tuple to store */ - slot, /* slot to store in */ - scandesc->rs_cbuf); /* buffer associated - * with this tuple */ - else - ExecClearTuple(slot); - - return slot; + if (table_scan_getnextslot(scandesc, direction, slot)) + return slot; + return NULL; } /* @@ -174,7 +157,7 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags) /* and create slot with the appropriate rowtype */ ExecInitScanTupleSlot(estate, &scanstate->ss, RelationGetDescr(scanstate->ss.ss_currentRelation), - &TTSOpsBufferHeapTuple); + table_slot_callbacks(scanstate->ss.ss_currentRelation)); /* * Initialize result type and projection. @@ -200,7 +183,7 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags) void ExecEndSeqScan(SeqScanState *node) { - HeapScanDesc scanDesc; + TableScanDesc scanDesc; /* * get information from node @@ -223,7 +206,7 @@ ExecEndSeqScan(SeqScanState *node) * close heap scan */ if (scanDesc != NULL) - heap_endscan(scanDesc); + table_endscan(scanDesc); } /* ---------------------------------------------------------------- @@ -240,13 +223,13 @@ ExecEndSeqScan(SeqScanState *node) void ExecReScanSeqScan(SeqScanState *node) { - HeapScanDesc scan; + TableScanDesc scan; scan = node->ss.ss_currentScanDesc; if (scan != NULL) - heap_rescan(scan, /* scan desc */ - NULL); /* new scan keys */ + table_rescan(scan, /* scan desc */ + NULL); /* new scan keys */ ExecScanReScan((ScanState *) node); } @@ -269,7 +252,8 @@ ExecSeqScanEstimate(SeqScanState *node, { EState *estate = node->ss.ps.state; - node->pscan_len = heap_parallelscan_estimate(estate->es_snapshot); + node->pscan_len = table_parallelscan_estimate(node->ss.ss_currentRelation, + estate->es_snapshot); shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len); shm_toc_estimate_keys(&pcxt->estimator, 1); } @@ -285,15 +269,15 @@ ExecSeqScanInitializeDSM(SeqScanState *node, ParallelContext *pcxt) { EState *estate = node->ss.ps.state; - ParallelHeapScanDesc pscan; + 
ParallelTableScanDesc pscan; pscan = shm_toc_allocate(pcxt->toc, node->pscan_len); - heap_parallelscan_initialize(pscan, - node->ss.ss_currentRelation, - estate->es_snapshot); + table_parallelscan_initialize(node->ss.ss_currentRelation, + pscan, + estate->es_snapshot); shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan); node->ss.ss_currentScanDesc = - heap_beginscan_parallel(node->ss.ss_currentRelation, pscan); + table_beginscan_parallel(node->ss.ss_currentRelation, pscan); } /* ---------------------------------------------------------------- @@ -306,9 +290,10 @@ void ExecSeqScanReInitializeDSM(SeqScanState *node, ParallelContext *pcxt) { - HeapScanDesc scan = node->ss.ss_currentScanDesc; + ParallelTableScanDesc pscan; - heap_parallelscan_reinitialize(scan->rs_parallel); + pscan = node->ss.ss_currentScanDesc->rs_parallel; + table_parallelscan_reinitialize(node->ss.ss_currentRelation, pscan); } /* ---------------------------------------------------------------- @@ -321,9 +306,9 @@ void ExecSeqScanInitializeWorker(SeqScanState *node, ParallelWorkerContext *pwcxt) { - ParallelHeapScanDesc pscan; + ParallelTableScanDesc pscan; pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false); node->ss.ss_currentScanDesc = - heap_beginscan_parallel(node->ss.ss_currentRelation, pscan); + table_beginscan_parallel(node->ss.ss_currentRelation, pscan); } diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c index 9a877874b7..08872ef9b4 100644 --- a/src/backend/executor/nodeTidscan.c +++ b/src/backend/executor/nodeTidscan.c @@ -24,6 +24,7 @@ #include "access/heapam.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "catalog/pg_type.h" #include "executor/execdebug.h" #include "executor/nodeTidscan.h" @@ -538,7 +539,7 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags) */ ExecInitScanTupleSlot(estate, &tidstate->ss, RelationGetDescr(currentRelation), - &TTSOpsBufferHeapTuple); + 
table_slot_callbacks(currentRelation)); /* * Initialize result type and projection. diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c index e71eb3793b..5b897d50ee 100644 --- a/src/backend/partitioning/partbounds.c +++ b/src/backend/partitioning/partbounds.c @@ -14,7 +14,9 @@ #include "postgres.h" -#include "access/heapam.h" +#include "access/relation.h" +#include "access/table.h" +#include "access/tableam.h" #include "catalog/partition.h" #include "catalog/pg_inherits.h" #include "catalog/pg_type.h" @@ -1202,12 +1204,10 @@ check_default_partition_contents(Relation parent, Relation default_rel, Expr *constr; Expr *partition_constraint; EState *estate; - HeapTuple tuple; ExprState *partqualstate = NULL; Snapshot snapshot; - TupleDesc tupdesc; ExprContext *econtext; - HeapScanDesc scan; + TableScanDesc scan; MemoryContext oldCxt; TupleTableSlot *tupslot; @@ -1254,7 +1254,6 @@ check_default_partition_contents(Relation parent, Relation default_rel, continue; } - tupdesc = CreateTupleDescCopy(RelationGetDescr(part_rel)); constr = linitial(def_part_constraints); partition_constraint = (Expr *) map_partition_varattnos((List *) constr, @@ -1266,8 +1265,8 @@ check_default_partition_contents(Relation parent, Relation default_rel, econtext = GetPerTupleExprContext(estate); snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(part_rel, snapshot, 0, NULL); - tupslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple); + tupslot = table_slot_create(part_rel, &estate->es_tupleTable); + scan = table_beginscan(part_rel, snapshot, 0, NULL); /* * Switch to per-tuple memory context and reset it for each tuple @@ -1275,9 +1274,8 @@ check_default_partition_contents(Relation parent, Relation default_rel, */ oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); - while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + while (table_scan_getnextslot(scan, ForwardScanDirection, tupslot)) { - 
ExecStoreHeapTuple(tuple, tupslot, false); econtext->ecxt_scantuple = tupslot; if (!ExecCheck(partqualstate, econtext)) @@ -1291,7 +1289,7 @@ check_default_partition_contents(Relation parent, Relation default_rel, } MemoryContextSwitchTo(oldCxt); - heap_endscan(scan); + table_endscan(scan); UnregisterSnapshot(snapshot); ExecDropSingleTupleTableSlot(tupslot); FreeExecutorState(estate); diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index e9fe0a6e1f..3bfac919c4 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -69,6 +69,7 @@ #include "access/htup_details.h" #include "access/multixact.h" #include "access/reloptions.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "catalog/dependency.h" @@ -1865,7 +1866,7 @@ get_database_list(void) { List *dblist = NIL; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; MemoryContext resultcxt; @@ -1883,7 +1884,7 @@ get_database_list(void) (void) GetTransactionSnapshot(); rel = table_open(DatabaseRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) { @@ -1912,7 +1913,7 @@ get_database_list(void) MemoryContextSwitchTo(oldcxt); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); CommitTransactionCommand(); @@ -1931,7 +1932,7 @@ do_autovacuum(void) { Relation classRel; HeapTuple tuple; - HeapScanDesc relScan; + TableScanDesc relScan; Form_pg_database dbForm; List *table_oids = NIL; List *orphan_oids = NIL; @@ -2043,7 +2044,7 @@ do_autovacuum(void) * wide tables there might be proportionally much more activity in the * TOAST table than in its parent. 
*/ - relScan = heap_beginscan_catalog(classRel, 0, NULL); + relScan = table_beginscan_catalog(classRel, 0, NULL); /* * On the first pass, we collect main tables to vacuum, and also the main @@ -2132,7 +2133,7 @@ do_autovacuum(void) } } - heap_endscan(relScan); + table_endscan(relScan); /* second pass: check TOAST tables */ ScanKeyInit(&key, @@ -2140,7 +2141,7 @@ do_autovacuum(void) BTEqualStrategyNumber, F_CHAREQ, CharGetDatum(RELKIND_TOASTVALUE)); - relScan = heap_beginscan_catalog(classRel, 1, &key); + relScan = table_beginscan_catalog(classRel, 1, &key); while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL) { Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple); @@ -2187,7 +2188,7 @@ do_autovacuum(void) table_oids = lappend_oid(table_oids, relid); } - heap_endscan(relScan); + table_endscan(relScan); table_close(classRel, AccessShareLock); /* diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 43ec33834b..ba31f532ea 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -36,6 +36,7 @@ #include "access/heapam.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/twophase_rmgr.h" #include "access/xact.h" @@ -1206,7 +1207,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid) HTAB *htab; HASHCTL hash_ctl; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; Snapshot snapshot; @@ -1221,7 +1222,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid) rel = table_open(catalogid, AccessShareLock); snapshot = RegisterSnapshot(GetLatestSnapshot()); - scan = heap_beginscan(rel, snapshot, 0, NULL); + scan = table_beginscan(rel, snapshot, 0, NULL); while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) { Oid thisoid; @@ -1234,7 +1235,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid) (void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL); } - heap_endscan(scan); + 
table_endscan(scan); UnregisterSnapshot(snapshot); table_close(rel, AccessShareLock); diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index 55b91b5e12..186057bd93 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -24,6 +24,7 @@ #include "access/heapam.h" #include "access/htup.h" #include "access/htup_details.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/pg_subscription.h" @@ -118,7 +119,7 @@ get_subscription_list(void) { List *res = NIL; Relation rel; - HeapScanDesc scan; + TableScanDesc scan; HeapTuple tup; MemoryContext resultcxt; @@ -136,7 +137,7 @@ get_subscription_list(void) (void) GetTransactionSnapshot(); rel = table_open(SubscriptionRelationId, AccessShareLock); - scan = heap_beginscan_catalog(rel, 0, NULL); + scan = table_beginscan_catalog(rel, 0, NULL); while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) { @@ -164,7 +165,7 @@ get_subscription_list(void) MemoryContextSwitchTo(oldcxt); } - heap_endscan(scan); + table_endscan(scan); table_close(rel, AccessShareLock); CommitTransactionCommand(); diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index a5e5007e81..07f4ec9055 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -24,6 +24,7 @@ #include "postgres.h" #include "access/table.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog_internal.h" #include "catalog/catalog.h" @@ -698,10 +699,9 @@ apply_handle_update(StringInfo s) estate = create_estate_for_relation(rel); remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), - &TTSOpsHeapTuple); - localslot = ExecInitExtraTupleSlot(estate, - RelationGetDescr(rel->localrel), - &TTSOpsHeapTuple); + &TTSOpsVirtual); + localslot = table_slot_create(rel->localrel, + &estate->es_tupleTable); 
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1); PushActiveSnapshot(GetTransactionSnapshot()); @@ -819,9 +819,8 @@ apply_handle_delete(StringInfo s) remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); - localslot = ExecInitExtraTupleSlot(estate, - RelationGetDescr(rel->localrel), - &TTSOpsHeapTuple); + localslot = table_slot_create(rel->localrel, + &estate->es_tupleTable); EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1); PushActiveSnapshot(GetTransactionSnapshot()); diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c index 7ad470d34a..6bd889461e 100644 --- a/src/backend/rewrite/rewriteDefine.c +++ b/src/backend/rewrite/rewriteDefine.c @@ -17,6 +17,7 @@ #include "access/heapam.h" #include "access/htup_details.h" #include "access/multixact.h" +#include "access/tableam.h" #include "access/transam.h" #include "access/xact.h" #include "catalog/catalog.h" @@ -423,8 +424,9 @@ DefineQueryRewrite(const char *rulename, if (event_relation->rd_rel->relkind != RELKIND_VIEW && event_relation->rd_rel->relkind != RELKIND_MATVIEW) { - HeapScanDesc scanDesc; + TableScanDesc scanDesc; Snapshot snapshot; + TupleTableSlot *slot; if (event_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ereport(ERROR, @@ -439,13 +441,15 @@ DefineQueryRewrite(const char *rulename, RelationGetRelationName(event_relation)))); snapshot = RegisterSnapshot(GetLatestSnapshot()); - scanDesc = heap_beginscan(event_relation, snapshot, 0, NULL); - if (heap_getnext(scanDesc, ForwardScanDirection) != NULL) + scanDesc = table_beginscan(event_relation, snapshot, 0, NULL); + slot = table_slot_create(event_relation, NULL); + if (table_scan_getnextslot(scanDesc, ForwardScanDirection, slot)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("could not convert table \"%s\" to a view because it is not empty", RelationGetRelationName(event_relation)))); - heap_endscan(scanDesc); + 
ExecDropSingleTupleTableSlot(slot); + table_endscan(scanDesc); UnregisterSnapshot(snapshot); if (event_relation->rd_rel->relhastriggers) diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index ef04fa5009..d715709b7c 100644 --- a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -23,10 +23,10 @@ #include "postgres.h" -#include "access/heapam.h" #include "access/htup_details.h" #include "access/sysattr.h" #include "access/table.h" +#include "access/tableam.h" #include "access/xact.h" #include "catalog/pg_collation.h" #include "catalog/pg_constraint.h" @@ -253,26 +253,9 @@ RI_FKey_check(TriggerData *trigdata) * checked). Test its liveness according to SnapshotSelf. We need pin * and lock on the buffer to call HeapTupleSatisfiesVisibility. Caller * should be holding pin, but not lock. - * - * XXX: Note that the buffer-tuple specificity will be removed in the near - * future. */ - if (TTS_IS_BUFFERTUPLE(newslot)) - { - BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) newslot; - - Assert(BufferIsValid(bslot->buffer)); - - LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE); - if (!HeapTupleSatisfiesVisibility(bslot->base.tuple, SnapshotSelf, bslot->buffer)) - { - LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK); - return PointerGetDatum(NULL); - } - LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK); - } - else - elog(ERROR, "expected buffer tuple"); + if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf)) + return PointerGetDatum(NULL); /* * Get the relation descriptors of the FK and PK tables. 
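The ri_triggers.c hunk is the clearest small example of callback item (b) from the commit message. Before the patch, RI_FKey_check had to cast newslot to BufferHeapTupleTableSlot, lock its buffer, and call HeapTupleSatisfiesVisibility. After the patch it calls table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf), which asks the relation's table AM through its tuple_satisfies_snapshot callback. The dispatch pattern can be sketched as a standalone toy model.

This is not PostgreSQL code. Every Toy* name is hypothetical, and the "visible iff xmin < xmax" rule is a deliberate simplification, not PostgreSQL's real visibility logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-ins, loosely modelled on the patch; NOT PostgreSQL's real
 * definitions. A "slot" here only records the inserting transaction id.
 */
typedef struct ToySlot
{
    int xmin;                   /* hypothetical: id of inserting xact */
} ToySlot;

typedef struct ToySnapshot
{
    int xmax;                   /* hypothetical: xacts below this are visible */
} ToySnapshot;

struct ToyRelation;

/* Per-AM callback table, in the spirit of the tuple_satisfies_snapshot callback */
typedef struct ToyTableAmRoutine
{
    bool (*tuple_satisfies_snapshot) (const struct ToyRelation *rel,
                                      const ToySlot *slot,
                                      const ToySnapshot *snapshot);
} ToyTableAmRoutine;

typedef struct ToyRelation
{
    const char *relname;
    const ToyTableAmRoutine *rd_tableam;    /* AM that stores this table */
} ToyRelation;

/*
 * AM-independent wrapper: the caller (e.g. an RI trigger) never touches
 * buffers or heap tuples; it just asks the relation's AM.
 */
static bool
toy_table_tuple_satisfies_snapshot(const ToyRelation *rel,
                                   const ToySlot *slot,
                                   const ToySnapshot *snapshot)
{
    assert(rel != NULL && rel->rd_tableam != NULL);
    return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot, snapshot);
}

/* One hypothetical AM: a row is visible iff its xmin precedes the snapshot's xmax */
static bool
toyheap_tuple_satisfies_snapshot(const ToyRelation *rel,
                                 const ToySlot *slot,
                                 const ToySnapshot *snapshot)
{
    (void) rel;                 /* unused by this toy AM */
    return slot->xmin < snapshot->xmax;
}

static const ToyTableAmRoutine toyheap_methods = {
    .tuple_satisfies_snapshot = toyheap_tuple_satisfies_snapshot,
};
```

Take a relation declared as `ToyRelation rel = { "orders", &toyheap_methods };`. A row with xmin 5 is visible to a snapshot with xmax 10, and a row with xmin 12 is not. The caller never learns how the AM stores tuples. That separation is the reason the buffer-specific branch in RI_FKey_check could be deleted.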
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index e6837869cf..12d30d7d63 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -106,6 +106,7 @@ #include "access/htup_details.h" #include "access/sysattr.h" #include "access/table.h" +#include "access/tableam.h" #include "catalog/index.h" #include "catalog/pg_am.h" #include "catalog/pg_collation.h" @@ -5099,7 +5100,6 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, bool typByVal; ScanKeyData scankeys[1]; IndexScanDesc index_scan; - HeapTuple tup; Datum values[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; SnapshotData SnapshotNonVacuumable; @@ -5122,8 +5122,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, indexInfo = BuildIndexInfo(indexRel); /* some other stuff */ - slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel), - &TTSOpsHeapTuple); + slot = table_slot_create(heapRel, NULL); econtext->ecxt_scantuple = slot; get_typlenbyval(vardata->atttype, &typLen, &typByVal); InitNonVacuumableSnapshot(SnapshotNonVacuumable, RecentGlobalXmin); @@ -5175,11 +5174,9 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, index_rescan(index_scan, scankeys, 1, NULL, 0); /* Fetch first tuple in sortop's direction */ - if ((tup = index_getnext(index_scan, - indexscandir)) != NULL) + if (index_getnext_slot(index_scan, indexscandir, slot)) { - /* Extract the index column values from the heap tuple */ - ExecStoreHeapTuple(tup, slot, false); + /* Extract the index column values from the slot */ FormIndexDatum(indexInfo, slot, estate, values, isnull); @@ -5208,11 +5205,9 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, index_rescan(index_scan, scankeys, 1, NULL, 0); /* Fetch first tuple in reverse direction */ - if ((tup = index_getnext(index_scan, - -indexscandir)) != NULL) + if (index_getnext_slot(index_scan, -indexscandir, slot)) { - /* Extract the index column 
values from the heap tuple */ - ExecStoreHeapTuple(tup, slot, false); + /* Extract the index column values from the slot */ FormIndexDatum(indexInfo, slot, estate, values, isnull); diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index a5ee209f91..752010ed27 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -24,6 +24,7 @@ #include "access/htup_details.h" #include "access/session.h" #include "access/sysattr.h" +#include "access/tableam.h" #include "access/xact.h" #include "access/xlog.h" #include "catalog/catalog.h" @@ -1245,15 +1246,15 @@ static bool ThereIsAtLeastOneRole(void) { Relation pg_authid_rel; - HeapScanDesc scan; + TableScanDesc scan; bool result; pg_authid_rel = table_open(AuthIdRelationId, AccessShareLock); - scan = heap_beginscan_catalog(pg_authid_rel, 0, NULL); + scan = table_beginscan_catalog(pg_authid_rel, 0, NULL); result = (heap_getnext(scan, ForwardScanDirection) != NULL); - heap_endscan(scan); + table_endscan(scan); table_close(pg_authid_rel, AccessShareLock); return result; diff --git a/src/include/access/genam.h b/src/include/access/genam.h index c4aba39496..cad66513f6 100644 --- a/src/include/access/genam.h +++ b/src/include/access/genam.h @@ -159,8 +159,10 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel, ParallelIndexScanDesc pscan); extern ItemPointer index_getnext_tid(IndexScanDesc scan, ScanDirection direction); -extern HeapTuple index_fetch_heap(IndexScanDesc scan); -extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction); +struct TupleTableSlot; +extern bool index_fetch_heap(IndexScanDesc scan, struct TupleTableSlot *slot); +extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction, + struct TupleTableSlot *slot); extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap); extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info, diff --git a/src/include/access/heapam.h 
b/src/include/access/heapam.h index ab0879138f..1b6607fe90 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -15,6 +15,7 @@ #define HEAPAM_H #include "access/relation.h" /* for backward compatibility */ +#include "access/relscan.h" #include "access/sdir.h" #include "access/skey.h" #include "access/table.h" /* for backward compatibility */ @@ -60,6 +61,48 @@ typedef struct HeapUpdateFailureData CommandId cmax; } HeapUpdateFailureData; +/* + * Descriptor for heap table scans. + */ +typedef struct HeapScanDescData +{ + TableScanDescData rs_base; /* AM independent part of the descriptor */ + + /* state set up at initscan time */ + BlockNumber rs_nblocks; /* total number of blocks in rel */ + BlockNumber rs_startblock; /* block # to start at */ + BlockNumber rs_numblocks; /* max number of blocks to scan */ + /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ + + /* scan current state */ + bool rs_inited; /* false = scan not init'd yet */ + BlockNumber rs_cblock; /* current block # in scan, if any */ + Buffer rs_cbuf; /* current buffer in scan, if any */ + /* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ + + /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ + BufferAccessStrategy rs_strategy; /* access strategy for reads */ + + HeapTupleData rs_ctup; /* current tuple in scan, if any */ + + /* these fields only used in page-at-a-time mode and for bitmap scans */ + int rs_cindex; /* current tuple's index in vistuples */ + int rs_ntuples; /* number of visible tuples on page */ + OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */ +} HeapScanDescData; +typedef struct HeapScanDescData *HeapScanDesc; + +/* + * Descriptor for fetches from heap via an index. 
+ */ +typedef struct IndexFetchHeapData +{ + IndexFetchTableData xs_base; /* AM independent part of the descriptor */ + + Buffer xs_cbuf; /* current heap buffer in scan, if any */ + /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ +} IndexFetchHeapData; + /* Result codes for HeapTupleSatisfiesVacuum */ typedef enum { @@ -79,42 +122,32 @@ typedef enum */ -/* struct definitions appear in relscan.h */ -typedef struct HeapScanDescData *HeapScanDesc; -typedef struct ParallelHeapScanDescData *ParallelHeapScanDesc; - /* * HeapScanIsValid * True iff the heap scan is valid. */ #define HeapScanIsValid(scan) PointerIsValid(scan) -extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key); -extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys, - ScanKey key); -extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key, - bool allow_strat, bool allow_sync); -extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key); -extern HeapScanDesc heap_beginscan_sampling(Relation relation, - Snapshot snapshot, int nkeys, ScanKey key, - bool allow_strat, bool allow_sync, bool allow_pagemode); -extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, +extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot, + int nkeys, ScanKey key, + ParallelTableScanDesc parallel_scan, + bool allow_strat, + bool allow_sync, + bool allow_pagemode, + bool is_bitmapscan, + bool is_samplescan, + bool temp_snap); +extern void heap_setscanlimits(TableScanDesc scan, BlockNumber startBlk, BlockNumber endBlk); -extern void heapgetpage(HeapScanDesc scan, BlockNumber page); -extern void heap_rescan(HeapScanDesc scan, ScanKey key); -extern void heap_rescan_set_params(HeapScanDesc scan, ScanKey key, +extern void heapgetpage(TableScanDesc scan, BlockNumber page); +extern void heap_rescan(TableScanDesc scan, ScanKey key, 
bool set_params, + bool allow_strat, bool allow_sync, bool allow_pagemode); +extern void heap_rescan_set_params(TableScanDesc scan, ScanKey key, bool allow_strat, bool allow_sync, bool allow_pagemode); -extern void heap_endscan(HeapScanDesc scan); -extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction); - -extern Size heap_parallelscan_estimate(Snapshot snapshot); -extern void heap_parallelscan_initialize(ParallelHeapScanDesc target, - Relation relation, Snapshot snapshot); -extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan); -extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc); +extern void heap_endscan(TableScanDesc scan); +extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction); +extern bool heap_getnextslot(TableScanDesc sscan, + ScanDirection direction, struct TupleTableSlot *slot); extern bool heap_fetch(Relation relation, Snapshot snapshot, HeapTuple tuple, Buffer *userbuf, bool keep_buf, @@ -164,7 +197,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup); extern void heap_sync(Relation relation); -extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot); /* in heap/pruneheap.c */ extern void heap_page_prune_opt(Relation relation, Buffer buffer); @@ -190,7 +222,7 @@ extern void heap_vacuum_rel(Relation onerel, int options, /* in heap/heapam_visibility.c */ extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot, - Buffer buffer); + Buffer buffer); extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid, Buffer buffer); extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin, diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h index b78ef2f47d..82de4cdcf2 100644 --- a/src/include/access/relscan.h +++ b/src/include/access/relscan.h @@ -21,63 +21,76 @@ #include "storage/spin.h" #include "utils/relcache.h" + +struct 
ParallelTableScanDescData; + /* - * Shared state for parallel heap scan. - * - * Each backend participating in a parallel heap scan has its own - * HeapScanDesc in backend-private memory, and those objects all contain - * a pointer to this structure. The information here must be sufficient - * to properly initialize each new HeapScanDesc as workers join the scan, - * and it must act as a font of block numbers for those workers. + * Generic descriptor for table scans. This is the base-class for table scans, + * which needs to be embedded in the scans of individual AMs. */ -typedef struct ParallelHeapScanDescData -{ - Oid phs_relid; /* OID of relation to scan */ - bool phs_syncscan; /* report location to syncscan logic? */ - BlockNumber phs_nblocks; /* # blocks in relation at start of scan */ - slock_t phs_mutex; /* mutual exclusion for setting startblock */ - BlockNumber phs_startblock; /* starting block number */ - pg_atomic_uint64 phs_nallocated; /* number of blocks allocated to - * workers so far. */ - bool phs_snapshot_any; /* SnapshotAny, not phs_snapshot_data? */ - char phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER]; -} ParallelHeapScanDescData; - -typedef struct HeapScanDescData +typedef struct TableScanDescData { /* scan parameters */ Relation rs_rd; /* heap relation descriptor */ struct SnapshotData *rs_snapshot; /* snapshot to see */ int rs_nkeys; /* number of scan keys */ - struct ScanKeyData *rs_key; /* array of scan key descriptors */ + struct ScanKeyData *rs_key; /* array of scan key descriptors */ bool rs_bitmapscan; /* true if this is really a bitmap scan */ bool rs_samplescan; /* true if this is really a sample scan */ bool rs_pageatatime; /* verify visibility page-at-a-time? */ bool rs_allow_strat; /* allow or disallow use of access strategy */ bool rs_allow_sync; /* allow or disallow use of syncscan */ bool rs_temp_snap; /* unregister snapshot at scan end? 
*/ - - /* state set up at initscan time */ - BlockNumber rs_nblocks; /* total number of blocks in rel */ - BlockNumber rs_startblock; /* block # to start at */ - BlockNumber rs_numblocks; /* max number of blocks to scan */ - /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ - BufferAccessStrategy rs_strategy; /* access strategy for reads */ bool rs_syncscan; /* report location to syncscan logic? */ - /* scan current state */ - bool rs_inited; /* false = scan not init'd yet */ - HeapTupleData rs_ctup; /* current tuple in scan, if any */ - BlockNumber rs_cblock; /* current block # in scan, if any */ - Buffer rs_cbuf; /* current buffer in scan, if any */ - /* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ - struct ParallelHeapScanDescData *rs_parallel; /* parallel scan information */ + struct ParallelTableScanDescData *rs_parallel; /* parallel scan + * information */ - /* these fields only used in page-at-a-time mode and for bitmap scans */ - int rs_cindex; /* current tuple's index in vistuples */ - int rs_ntuples; /* number of visible tuples on page */ - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */ -} HeapScanDescData; +} TableScanDescData; +typedef struct TableScanDescData *TableScanDesc; + +/* + * Shared state for parallel table scan. + * + * Each backend participating in a parallel table scan has its own + * TableScanDesc in backend-private memory, and those objects all contain a + * pointer to this structure. The information here must be sufficient to + * properly initialize each new TableScanDesc as workers join the scan, and it + * must provide information about what to scan for those workers. + */ +typedef struct ParallelTableScanDescData +{ + Oid phs_relid; /* OID of relation to scan */ + bool phs_syncscan; /* report location to syncscan logic? */ + bool phs_snapshot_any; /* SnapshotAny, not phs_snapshot_data?
*/ + Size phs_snapshot_off; /* data for snapshot */ +} ParallelTableScanDescData; +typedef struct ParallelTableScanDescData *ParallelTableScanDesc; + +/* + * Shared state for parallel table scans, for block oriented storage. + */ +typedef struct ParallelBlockTableScanDescData +{ + ParallelTableScanDescData base; + + BlockNumber phs_nblocks; /* # blocks in relation at start of scan */ + slock_t phs_mutex; /* mutual exclusion for setting startblock */ + BlockNumber phs_startblock; /* starting block number */ + pg_atomic_uint64 phs_nallocated; /* number of blocks allocated to + * workers so far. */ +} ParallelBlockTableScanDescData; +typedef struct ParallelBlockTableScanDescData *ParallelBlockTableScanDesc; + +/* + * Base class for fetches from a table via an index. This is the base-class + * for such scans, which needs to be embedded in the respective struct for + * individual AMs. + */ +typedef struct IndexFetchTableData +{ + Relation rel; +} IndexFetchTableData; /* * We use the same IndexScanDescData structure for both amgettuple-based @@ -92,7 +105,7 @@ typedef struct IndexScanDescData struct SnapshotData *xs_snapshot; /* snapshot to see */ int numberOfKeys; /* number of index qualifier conditions */ int numberOfOrderBys; /* number of ordering operators */ - struct ScanKeyData *keyData; /* array of index qualifier descriptors */ + struct ScanKeyData *keyData; /* array of index qualifier descriptors */ struct ScanKeyData *orderByData; /* array of ordering op descriptors */ bool xs_want_itup; /* caller requests index tuples */ bool xs_temp_snap; /* unregister snapshot at scan end? 
*/ @@ -115,12 +128,13 @@ typedef struct IndexScanDescData IndexTuple xs_itup; /* index tuple returned by AM */ struct TupleDescData *xs_itupdesc; /* rowtype descriptor of xs_itup */ HeapTuple xs_hitup; /* index data returned by AM, as HeapTuple */ - struct TupleDescData *xs_hitupdesc; /* rowtype descriptor of xs_hitup */ + struct TupleDescData *xs_hitupdesc; /* rowtype descriptor of xs_hitup */ + + ItemPointerData xs_heaptid; /* result */ + bool xs_heap_continue; /* T if must keep walking, potential + * further results */ + IndexFetchTableData *xs_heapfetch; - /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_getnext */ - HeapTupleData xs_ctup; /* current heap tuple, if any */ - Buffer xs_cbuf; /* current heap buffer in scan, if any */ - /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ bool xs_recheck; /* T means scan keys must be rechecked */ /* @@ -134,9 +148,6 @@ typedef struct IndexScanDescData bool *xs_orderbynulls; bool xs_recheckorderby; - /* state data for traversing HOT chains in index_getnext */ - bool xs_continue_hot; /* T if must keep walking HOT chain */ - /* parallel index scan information, in shared memory */ struct ParallelIndexScanDescData *parallel_scan; } IndexScanDescData; @@ -150,14 +161,17 @@ typedef struct ParallelIndexScanDescData char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]; } ParallelIndexScanDescData; -/* Struct for heap-or-index scans of system tables */ +struct TupleTableSlot; + +/* Struct for storage-or-index scans of system tables */ typedef struct SysScanDescData { Relation heap_rel; /* catalog being scanned */ Relation irel; /* NULL if doing heap scan */ - struct HeapScanDescData *scan; /* only valid in heap-scan case */ - struct IndexScanDescData *iscan; /* only valid in index-scan case */ - struct SnapshotData *snapshot; /* snapshot to unregister at end of scan */ + struct TableScanDescData *scan; /* only valid in storage-scan case */ + struct IndexScanDescData *iscan; /* only valid in 
index-scan case */ + struct SnapshotData *snapshot; /* snapshot to unregister at end of scan */ + struct TupleTableSlot *slot; } SysScanDescData; #endif /* RELSCAN_H */ diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h index ccdc6de3ae..f2913b8cff 100644 --- a/src/include/access/tableam.h +++ b/src/include/access/tableam.h @@ -14,31 +14,497 @@ #ifndef TABLEAM_H #define TABLEAM_H +#include "access/relscan.h" +#include "access/sdir.h" #include "utils/guc.h" +#include "utils/rel.h" +#include "utils/snapshot.h" #define DEFAULT_TABLE_ACCESS_METHOD "heap" extern char *default_table_access_method; - +extern bool synchronize_seqscans; /* * API struct for a table AM. Note this must be allocated in a * server-lifetime manner, typically as a static const struct, which then gets * returned by FormData_pg_am.amhandler. + * + * In most cases it's not appropriate to call the callbacks directly; + * instead, use the table_* wrapper functions. + * + * GetTableAmRoutine() asserts that required callbacks are filled in; remember + * to update it when adding a callback. */ typedef struct TableAmRoutine { /* this must be set to T_TableAmRoutine */ NodeTag type; + + + /* ------------------------------------------------------------------------ + * Slot related callbacks. + * ------------------------------------------------------------------------ + */ + + /* + * Return slot implementation suitable for storing a tuple of this AM. + */ + const TupleTableSlotOps *(*slot_callbacks) (Relation rel); + + + /* ------------------------------------------------------------------------ + * Table scan callbacks. + * ------------------------------------------------------------------------ + */ + + /* + * Start a scan of `rel`. The callback has to return a TableScanDesc, + * which will typically be embedded in a larger, AM specific, struct. + * + * If nkeys != 0, the results need to be filtered by those scan keys.
+ * + * pscan, if not NULL, will have already been initialized with + * parallelscan_initialize(), and has to be for the same relation. It will + * only be set when coming from table_beginscan_parallel(). + * + * allow_{strat, sync, pagemode} specify whether a scan strategy, + * synchronized scans, or page mode may be used (although not every AM + * will support those). + * + * is_{bitmapscan, samplescan} specify whether the scan is intended to + * support those types of scans. + * + * If temp_snap is true, the snapshot will need to be deallocated at + * scan_end. + */ + TableScanDesc (*scan_begin) (Relation rel, + Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + ParallelTableScanDesc pscan, + bool allow_strat, + bool allow_sync, + bool allow_pagemode, + bool is_bitmapscan, + bool is_samplescan, + bool temp_snap); + + /* + * Release resources and deallocate scan. If TableScanDesc.rs_temp_snap is + * set, TableScanDesc.rs_snapshot needs to be unregistered. + */ + void (*scan_end) (TableScanDesc scan); + + /* + * Restart relation scan. If set_params is set to true, allow_{strat, + * sync, pagemode} (see scan_begin) changes should be taken into account. + */ + void (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params, + bool allow_strat, bool allow_sync, bool allow_pagemode); + + /* + * Return the next tuple from `scan` and store it in `slot`. + */ + bool (*scan_getnextslot) (TableScanDesc scan, + ScanDirection direction, TupleTableSlot *slot); + + + /* ------------------------------------------------------------------------ + * Parallel table scan related functions. + * ------------------------------------------------------------------------ + */ + + /* + * Estimate the size of shared memory needed for a parallel scan of this + * relation. The snapshot does not need to be accounted for. + */ + Size (*parallelscan_estimate) (Relation rel); + + /* + * Initialize ParallelTableScanDesc for a parallel scan of this relation.
+ * pscan will be sized according to parallelscan_estimate() for the same + * relation. + */ + Size (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan); + + /* + * Reinitialize `pscan` for a new scan. `rel` will be the same relation as + * when `pscan` was initialized by parallelscan_initialize. + */ + void (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan); + + + /* ------------------------------------------------------------------------ + * Index Scan Callbacks + * ------------------------------------------------------------------------ + */ + + /* + * Prepare to fetch tuples from the relation, as needed when fetching + * tuples for an index scan. The callback has to return an + * IndexFetchTableData, which the AM will typically embed in a larger + * structure with additional information. + * + * Tuples for an index scan can then be fetched via index_fetch_tuple. + */ + struct IndexFetchTableData *(*index_fetch_begin) (Relation rel); + + /* + * Reset index fetch. Typically this will release cross index fetch + * resources held in IndexFetchTableData. + */ + void (*index_fetch_reset) (struct IndexFetchTableData *data); + + /* + * Release resources and deallocate index fetch. + */ + void (*index_fetch_end) (struct IndexFetchTableData *data); + + /* + * Fetch tuple at `tid` into `slot`, after doing a visibility test + * according to `snapshot`. If a tuple was found and passed the visibility + * test, return true, false otherwise. + * + * Note that AMs that do not necessarily update indexes when indexed + * columns do not change need to return the current/correct version of a + * tuple as appropriate, even if the tid points to an older version of the + * tuple. + * + * *call_again is false on the first call to index_fetch_tuple for a tid.
+ * If there potentially is another tuple matching the tid, *call_again + * needs to be set to true by index_fetch_tuple, signalling to the caller + * that index_fetch_tuple should be called again for the same tid. + * + * *all_dead should be set to true by index_fetch_tuple iff it is + * guaranteed that no backend needs to see that tuple. Index AMs can use + * that to avoid returning that tid in future searches. + */ + bool (*index_fetch_tuple) (struct IndexFetchTableData *scan, + ItemPointer tid, + Snapshot snapshot, + TupleTableSlot *slot, + bool *call_again, bool *all_dead); + + /* ------------------------------------------------------------------------ + * Callbacks for non-modifying operations on individual tuples + * ------------------------------------------------------------------------ + */ + + /* + * Does the tuple in `slot` satisfy `snapshot`? The slot needs to be of + * the appropriate type for the AM. + */ + bool (*tuple_satisfies_snapshot) (Relation rel, + TupleTableSlot *slot, + Snapshot snapshot); + } TableAmRoutine; +/* ---------------------------------------------------------------------------- + * Slot functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Returns slot callbacks suitable for holding tuples of the appropriate type + * for the relation. Works for tables, views, foreign tables and partitioned + * tables. + */ +extern const TupleTableSlotOps *table_slot_callbacks(Relation rel); + +/* + * Returns a slot using the callbacks returned by table_slot_callbacks(), and + * registers it on *reglist (unless reglist is NULL). + */ +extern TupleTableSlot *table_slot_create(Relation rel, List **reglist); + + +/* ---------------------------------------------------------------------------- + * Table scan functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Start a scan of `rel`.
Returned tuples pass a visibility test of + * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys. + */ +static inline TableScanDesc +table_beginscan(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + true, true, true, false, false, false); +} + +/* + * Like table_beginscan(), but for scanning catalog. It'll automatically use a + * snapshot appropriate for scanning catalog relations. + */ +extern TableScanDesc table_beginscan_catalog(Relation rel, int nkeys, + struct ScanKeyData *key); + +/* + * Like table_beginscan(), but table_beginscan_strat() offers an extended API + * that lets the caller control whether a nondefault buffer access strategy + * can be used, and whether syncscan can be chosen (possibly resulting in the + * scan not starting from block zero). Both of these default to true with + * plain table_beginscan. + */ +static inline TableScanDesc +table_beginscan_strat(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + bool allow_strat, bool allow_sync) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + allow_strat, allow_sync, true, + false, false, false); +} + + +/* + * table_beginscan_bm is an alternative entry point for setting up a + * TableScanDesc for a bitmap heap scan. Although that scan technology is + * really quite unlike a standard seqscan, there is just enough commonality to + * make it worth using the same data structure. + */ +static inline TableScanDesc +table_beginscan_bm(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + false, false, true, true, false, false); +} + +/* + * table_beginscan_sampling is an alternative entry point for setting up a + * TableScanDesc for a TABLESAMPLE scan. 
As with bitmap scans, it's worth + * using the same data structure although the behavior is rather different. + * In addition to the options offered by table_beginscan_strat, this call + * also allows control of whether page-mode visibility checking is used. + */ +static inline TableScanDesc +table_beginscan_sampling(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + bool allow_strat, bool allow_sync, bool allow_pagemode) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + allow_strat, allow_sync, allow_pagemode, + false, true, false); +} + +/* + * table_beginscan_analyze is an alternative entry point for setting up a + * TableScanDesc for an ANALYZE scan. As with bitmap scans, it's worth using + * the same data structure although the behavior is rather different. + */ +static inline TableScanDesc +table_beginscan_analyze(Relation rel) +{ + return rel->rd_tableam->scan_begin(rel, NULL, 0, NULL, NULL, + true, false, true, + false, true, false); +} + +/* + * End relation scan. + */ +static inline void +table_endscan(TableScanDesc scan) +{ + scan->rs_rd->rd_tableam->scan_end(scan); +} + + +/* + * Restart a relation scan. + */ +static inline void +table_rescan(TableScanDesc scan, + struct ScanKeyData *key) +{ + scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false); +} + +/* + * Restart a relation scan after changing params. + * + * This call allows changing the buffer strategy, syncscan, and pagemode + * options before starting a fresh scan. Note that although the actual use of + * syncscan might change (effectively, enabling or disabling reporting), the + * previously selected startblock will be kept. + */ +static inline void +table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key, + bool allow_strat, bool allow_sync, bool allow_pagemode) +{ + scan->rs_rd->rd_tableam->scan_rescan(scan, key, true, + allow_strat, allow_sync, + allow_pagemode); +} + +/* + * Update snapshot used by the scan. 
+ */ +extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot); + + +/* + * Return next tuple from `scan`, store in slot. + */ +static inline bool +table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot) +{ + slot->tts_tableOid = RelationGetRelid(sscan->rs_rd); + return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot); +} + + +/* ---------------------------------------------------------------------------- + * Parallel table scan related functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Estimate the size of shared memory needed for a parallel scan of this + * relation. + */ +extern Size table_parallelscan_estimate(Relation rel, Snapshot snapshot); + +/* + * Initialize ParallelTableScanDesc for a parallel scan of this + * relation. `pscan` needs to be sized according to parallelscan_estimate() + * for the same relation. Call this just once in the leader process; then, + * individual workers attach via table_beginscan_parallel. + */ +extern void table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan, Snapshot snapshot); + +/* + * Begin a parallel scan. `pscan` needs to have been initialized with + * table_parallelscan_initialize(), for the same relation. The initialization + * does not need to have happened in this backend. + * + * Caller must hold a suitable lock on the correct relation. + */ +extern TableScanDesc table_beginscan_parallel(Relation rel, ParallelTableScanDesc pscan); + +/* + * Restart a parallel scan. Call this in the leader process. Caller is + * responsible for making sure that all workers have finished the scan + * beforehand. 
+ */ +static inline void +table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan) +{ + rel->rd_tableam->parallelscan_reinitialize(rel, pscan); +} + + +/* ---------------------------------------------------------------------------- + * Index scan related functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Prepare to fetch tuples from the relation, as needed when fetching tuples + * for an index scan. + * + * Tuples for an index scan can then be fetched via table_index_fetch_tuple(). + */ +static inline IndexFetchTableData * +table_index_fetch_begin(Relation rel) +{ + return rel->rd_tableam->index_fetch_begin(rel); +} + +/* + * Reset index fetch. Typically this will release cross index fetch resources + * held in IndexFetchTableData. + */ +static inline void +table_index_fetch_reset(struct IndexFetchTableData *scan) +{ + scan->rel->rd_tableam->index_fetch_reset(scan); +} + +/* + * Release resources and deallocate index fetch. + */ +static inline void +table_index_fetch_end(struct IndexFetchTableData *scan) +{ + scan->rel->rd_tableam->index_fetch_end(scan); +} + +/* + * Fetches tuple at `tid` into `slot`, after doing a visibility test according + * to `snapshot`. If a tuple was found and passed the visibility test, returns + * true, false otherwise. + * + * *call_again needs to be false on the first call to table_index_fetch_tuple() for + * a tid. If there potentially is another tuple matching the tid, *call_again + * will be set to true, signalling that table_index_fetch_tuple() should be called + * again for the same tid. + * + * *all_dead will be set to true by table_index_fetch_tuple() iff it is guaranteed + * that no backend needs to see that tuple. Index AMs can use that to avoid + * returning that tid in future searches.
+ */ +static inline bool +table_index_fetch_tuple(struct IndexFetchTableData *scan, + ItemPointer tid, + Snapshot snapshot, + TupleTableSlot *slot, + bool *call_again, bool *all_dead) +{ + + return scan->rel->rd_tableam->index_fetch_tuple(scan, tid, snapshot, + slot, call_again, + all_dead); +} + + +/* ------------------------------------------------------------------------ + * Functions for non-modifying operations on individual tuples + * ------------------------------------------------------------------------ + */ /* + * Return true iff tuple in slot satisfies the snapshot. + * + * This assumes the slot's tuple is valid, and of the appropriate type for the + * AM. + * + * Some AMs might modify the data underlying the tuple as a side-effect. If so + * they ought to mark the relevant buffer dirty. + */ +static inline bool +table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snapshot) +{ + return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot, snapshot); +} + + +/* ---------------------------------------------------------------------------- + * Helper functions to implement parallel scans for block oriented AMs. 
+ * ---------------------------------------------------------------------------- + */ + +extern Size table_block_parallelscan_estimate(Relation rel); +extern Size table_block_parallelscan_initialize(Relation rel, + ParallelTableScanDesc pscan); +extern void table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan); +extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan); +extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan); + + +/* ---------------------------------------------------------------------------- * Functions in tableamapi.c + * ---------------------------------------------------------------------------- */ + extern const TableAmRoutine *GetTableAmRoutine(Oid amhandler); extern const TableAmRoutine *GetTableAmRoutineByAmId(Oid amoid); extern const TableAmRoutine *GetHeapamTableAmRoutine(void); diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h index 330c481a8b..29f7ed6237 100644 --- a/src/include/catalog/index.h +++ b/src/include/catalog/index.h @@ -110,13 +110,14 @@ extern void index_build(Relation heapRelation, bool isreindex, bool parallel); +struct TableScanDescData; extern double IndexBuildHeapScan(Relation heapRelation, Relation indexRelation, IndexInfo *indexInfo, bool allow_sync, IndexBuildCallback callback, void *callback_state, - struct HeapScanDescData *scan); + struct TableScanDescData *scan); extern double IndexBuildHeapRangeScan(Relation heapRelation, Relation indexRelation, IndexInfo *indexInfo, @@ -126,7 +127,7 @@ extern double IndexBuildHeapRangeScan(Relation heapRelation, BlockNumber end_blockno, IndexBuildCallback callback, void *callback_state, - struct HeapScanDescData *scan); + struct TableScanDescData *scan); extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index fd13c170d7..62eb1a06ee 100644 --- 
a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -1270,7 +1270,7 @@ typedef struct ScanState { PlanState ps; /* its first field is NodeTag */ Relation ss_currentRelation; - struct HeapScanDescData *ss_currentScanDesc; + struct TableScanDescData *ss_currentScanDesc; TupleTableSlot *ss_ScanTupleSlot; } ScanState; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 7a5d8c47e1..b821df9e71 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1018,6 +1018,8 @@ IndexBulkDeleteCallback IndexBulkDeleteResult IndexClauseSet IndexElem +IndexFetchHeapData +IndexFetchTableData IndexInfo IndexList IndexOnlyScan @@ -1602,6 +1604,8 @@ PagetableEntry Pairs ParallelAppendState ParallelBitmapHeapState +ParallelBlockTableScanDesc +ParallelBlockTableScanDescData ParallelCompletionPtr ParallelContext ParallelExecutorInfo @@ -1609,8 +1613,8 @@ ParallelHashGrowth ParallelHashJoinBatch ParallelHashJoinBatchAccessor ParallelHashJoinState -ParallelHeapScanDesc -ParallelHeapScanDescData +ParallelTableScanDesc +ParallelTableScanDescData ParallelIndexScanDesc ParallelSlot ParallelState @@ -2316,6 +2320,8 @@ TableFuncScanState TableInfo TableLikeClause TableSampleClause +TableScanDesc +TableScanDescData TableSpaceCacheEntry TableSpaceOpts TablespaceList @@ -2410,6 +2416,7 @@ TupleHashIterator TupleHashTable TupleQueueReader TupleTableSlot +TupleTableSlotOps TuplesortInstrumentation TuplesortMethod TuplesortSpaceType -- 2.40.0
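The central pattern of this patch is an AM-independent base struct (`TableScanDescData`) embedded as the first member of an AM-specific struct (`HeapScanDescData`), with callers going through `table_*` inline wrappers that dispatch via `rel->rd_tableam`. It can be illustrated with a standalone toy program. This is a minimal sketch under stated assumptions, not PostgreSQL code: `RelationData`, `TupleTableSlot`, `TableAmRoutine` and the wrappers are drastically simplified stand-ins (no snapshot, scan keys, scan direction or parallel descriptor), and everything prefixed `Toy`/`toy_` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of tableam scan dispatch; all types are simplified stand-ins. */
typedef struct TableAmRoutine TableAmRoutine;

typedef struct RelationData
{
    const TableAmRoutine *rd_tableam;   /* callbacks of the relation's AM */
    const int  *rows;                   /* toy storage: one int per tuple */
    int         nrows;
} RelationData, *Relation;

/* AM-independent part of a scan; each AM embeds it as its first member. */
typedef struct TableScanDescData
{
    Relation    rs_rd;
} TableScanDescData, *TableScanDesc;

typedef struct TupleTableSlot { int value; } TupleTableSlot;  /* toy slot */

struct TableAmRoutine
{
    TableScanDesc (*scan_begin) (Relation rel);
    bool        (*scan_getnextslot) (TableScanDesc scan, TupleTableSlot *slot);
    void        (*scan_end) (TableScanDesc scan);
};

/* Wrappers: callers use these rather than calling the callbacks directly. */
static inline TableScanDesc
table_beginscan(Relation rel)
{
    return rel->rd_tableam->scan_begin(rel);
}

static inline bool
table_scan_getnextslot(TableScanDesc scan, TupleTableSlot *slot)
{
    return scan->rs_rd->rd_tableam->scan_getnextslot(scan, slot);
}

static inline void
table_endscan(TableScanDesc scan)
{
    scan->rs_rd->rd_tableam->scan_end(scan);
}

/* Hypothetical AM whose scan struct "subclasses" TableScanDescData. */
typedef struct ToyScanDescData
{
    TableScanDescData rs_base;  /* must be first, so the casts are valid */
    int         rs_cindex;      /* AM-private: index of the next row */
} ToyScanDescData;

static TableScanDesc
toy_scan_begin(Relation rel)
{
    ToyScanDescData *scan = malloc(sizeof(ToyScanDescData));
    assert(scan != NULL);
    scan->rs_base.rs_rd = rel;
    scan->rs_cindex = 0;
    return &scan->rs_base;      /* hand out the base-class pointer */
}

static bool
toy_scan_getnextslot(TableScanDesc sscan, TupleTableSlot *slot)
{
    ToyScanDescData *scan = (ToyScanDescData *) sscan;  /* downcast */
    if (scan->rs_cindex >= sscan->rs_rd->nrows)
        return false;
    slot->value = sscan->rs_rd->rows[scan->rs_cindex++];
    return true;
}

static void
toy_scan_end(TableScanDesc sscan)
{
    free((ToyScanDescData *) sscan);    /* release the whole AM struct */
}

static const TableAmRoutine toyam_methods = {
    .scan_begin = toy_scan_begin,
    .scan_getnextslot = toy_scan_getnextslot,
    .scan_end = toy_scan_end,
};

/* AM-agnostic caller, similar in spirit to a sequential scan node. */
static long
sum_all_rows(Relation rel)
{
    TupleTableSlot slot;
    TableScanDesc scan = table_beginscan(rel);
    long        sum = 0;
    while (table_scan_getnextslot(scan, &slot))
        sum += slot.value;
    table_endscan(scan);
    return sum;
}
```

Because `rs_base` sits at offset 0, the AM can hand out `&scan->rs_base` and later cast the `TableScanDesc` back to its own type; C guarantees that a pointer to a structure, suitably converted, points to its initial member and vice versa. `sum_all_rows()` never names the toy AM, which is the property the patch aims for at executor and catalog call sites.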