From: Peter Geoghegan Date: Wed, 15 May 2019 19:22:07 +0000 (-0700) Subject: Reverse order of newitem nbtree candidate splits. X-Git-Tag: REL_12_BETA1~34 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=7505da2f459776a397177f7d591991f5591c2812;p=postgresql Reverse order of newitem nbtree candidate splits. Commit fab25024, which taught nbtree to choose candidate split points more carefully, had _bt_findsplitloc() record all possible split points in an initial pass over a page that is about to be split. The order that candidate split points were processed and stored in was assumed to match the offset number order of split points on an imaginary version of the page that contains the same items as the original, but also fits newitem (the item that provoked the split precisely because it didn't fit). However, the order of split points in the final array was not quite what was expected: the split point that makes newitem the firstright item came after the split point that makes newitem the lastleft item -- not before. As a result, _bt_findsplitloc() could get confused about the leftmost and rightmost tuples among all possible split points recorded for the page. This seems to have no appreciable impact on the quality of the final split point chosen by _bt_findsplitloc(), but it's still wrong. To fix, switch the order in which newitem candidate splits are recorded in. This also makes it possible to describe candidate split points in terms of which pair of adjoining tuples enclose the split point within _bt_findsplitloc(), making it clearer why it's generally safe for _bt_split() to expect lastleft and firstright tuples. --- diff --git a/src/backend/access/nbtree/nbtsplitloc.c b/src/backend/access/nbtree/nbtsplitloc.c index 50d2faba29..d045fe0d10 100644 --- a/src/backend/access/nbtree/nbtsplitloc.c +++ b/src/backend/access/nbtree/nbtsplitloc.c @@ -196,11 +196,7 @@ _bt_findsplitloc(Relation rel, /* * Scan through the data items and calculate space usage for a split at - * each possible position. We start at the first data offset rather than - * the second data offset to handle the "newitemoff == first data offset" - * case (any other split whose firstoldonright is the first data offset - * can't be legal, though, and so won't actually end up being recorded in - * first loop iteration). + * each possible position */ olddataitemstoleft = 0; @@ -214,27 +210,38 @@ _bt_findsplitloc(Relation rel, itemsz = MAXALIGN(ItemIdGetLength(itemid)) + sizeof(ItemIdData); /* - * Will the new item go to left or right of split? + * When item offset number is not newitemoff, neither side of the + * split can be newitem. Record a split after the previous data item + * from original page, but before the current data item from original + * page. (_bt_recsplitloc() will reject the split when there are no + * previous data items, which we rely on.) */ - if (offnum > newitemoff) - _bt_recsplitloc(&state, offnum, true, olddataitemstoleft, itemsz); - else if (offnum < newitemoff) + if (offnum < newitemoff) _bt_recsplitloc(&state, offnum, false, olddataitemstoleft, itemsz); + else if (offnum > newitemoff) + _bt_recsplitloc(&state, offnum, true, olddataitemstoleft, itemsz); else { - /* may need to record a split on one or both sides of new item */ - _bt_recsplitloc(&state, offnum, true, olddataitemstoleft, itemsz); + /* + * Record a split after all "offnum < newitemoff" original page + * data items, but before newitem + */ _bt_recsplitloc(&state, offnum, false, olddataitemstoleft, itemsz); + + /* + * Record a split after newitem, but before data item from + * original page at offset newitemoff/current offset + */ + _bt_recsplitloc(&state, offnum, true, olddataitemstoleft, itemsz); } olddataitemstoleft += itemsz; } /* - * If the new item goes as the last item, record the split point that - * leaves all the old items on the left page, and the new item on the - * right page. This is required because a split that leaves the new item - * as the firstoldonright won't have been reached within the loop. + * Record a split after all original page data items, but before newitem. + * (Though only when it's possible that newitem will end up alone on new + * right page.) */ Assert(olddataitemstoleft == olddataitemstotal); if (newitemoff > maxoff)