granicus.if.org Git - postgresql/commitdiff
In XLogReadBufferExtended, don't assume P_NEW yields consecutive pages.
author Tom Lane <tgl@sss.pgh.pa.us>
Wed, 12 Feb 2014 19:52:23 +0000 (14:52 -0500)
committer Tom Lane <tgl@sss.pgh.pa.us>
Wed, 12 Feb 2014 19:52:23 +0000 (14:52 -0500)
In a database that's not yet reached consistency, it's possible that some
segments of a relation are not full-size but are not the last ones either.
Because of the way smgrnblocks() works, asking for a new page with P_NEW
will fill in the last not-full-size segment --- and if that makes it full
size, the apparent EOF of the relation will increase by more than one page,
so that the next P_NEW request will yield a page past the next consecutive
one.  This breaks the relation-extension logic in XLogReadBufferExtended,
possibly allowing a page update to be applied to some page far past where
it was intended to go.  This appears to be the explanation for reports of
table bloat on replication slaves compared to their masters, and probably
explains some corrupted-slave reports as well.

Fix the loop to check the page number it actually got, rather than merely
Assert()'ing that dead reckoning got it to the desired place.  AFAICT,
there are no other places that make assumptions about exactly which page
they'll get from P_NEW.
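
The failure mode can be reproduced with a toy model. The sketch below is
not PostgreSQL code: ToyRel, toy_nblocks(), toy_extend(), toy_old_loop(),
toy_new_loop() and the four-page TOY_RELSEG_SIZE are all hypothetical
stand-ins, assuming only that the apparent EOF is found by scanning
segments until the first one that is not full, and that a P_NEW request
adds a page at that apparent EOF.

```c
#include <assert.h>

#define TOY_RELSEG_SIZE 4	/* hypothetical: pages per segment, tiny for clarity */
#define TOY_MAX_SEGS    4	/* hypothetical: segments in the toy relation */

/* Pages currently present in each segment file of one toy relation. */
typedef struct ToyRel
{
	int			seg_pages[TOY_MAX_SEGS];
} ToyRel;

/* Apparent EOF: scan segments and stop at the first one that is not full. */
int
toy_nblocks(const ToyRel *rel)
{
	int			segno;

	for (segno = 0; segno < TOY_MAX_SEGS; segno++)
	{
		if (rel->seg_pages[segno] < TOY_RELSEG_SIZE)
			return segno * TOY_RELSEG_SIZE + rel->seg_pages[segno];
	}
	return TOY_MAX_SEGS * TOY_RELSEG_SIZE;
}

/* Stand-in for a P_NEW request: add one page at the apparent EOF. */
int
toy_extend(ToyRel *rel)
{
	int			blkno = toy_nblocks(rel);

	assert(blkno < TOY_MAX_SEGS * TOY_RELSEG_SIZE);
	rel->seg_pages[blkno / TOY_RELSEG_SIZE]++;
	return blkno;				/* block number of the page just added */
}

/* Old logic: count requests and trust that the last page is blkno. */
int
toy_old_loop(ToyRel *rel, int blkno)
{
	int			lastblock = toy_nblocks(rel);
	int			got = -1;

	while (blkno >= lastblock)
	{
		got = toy_extend(rel);
		lastblock++;			/* dead reckoning */
	}
	return got;					/* may not equal blkno */
}

/* New logic: look at the page number actually returned. */
int
toy_new_loop(ToyRel *rel, int blkno)
{
	int			got;

	do
	{
		got = toy_extend(rel);
	} while (got < blkno);

	if (got != blkno)
		got = blkno;			/* stands in for re-reading blkno directly */
	return got;
}
```

Starting from segments holding {3, 2, 0, 0} pages, the apparent EOF is
block 3. Asking for block 4, the first request fills segment 0 and the
second one lands on block 6, so toy_old_loop() hands back page 6 while
toy_new_loop() notices the overshoot and returns page 4.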

Problem identified by Greg Stark, though this is not the same as his
proposed patch.

It's been like this for a long time, so back-patch to all supported
branches.

src/backend/access/transam/xlogutils.c

index ee70340d7a6d4781442a8783094c2f68d78bf305..99414d98bc4f8c0560b6093d8d001a679872a7e1 100644
@@ -337,15 +337,21 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                /* we do this in recovery only - no rel-extension lock needed */
                Assert(InRecovery);
                buffer = InvalidBuffer;
-               while (blkno >= lastblock)
+               do
                {
                        if (buffer != InvalidBuffer)
                                ReleaseBuffer(buffer);
                        buffer = ReadBufferWithoutRelcache(rnode, forknum,
                                                                                           P_NEW, mode, NULL);
-                       lastblock++;
                }
-               Assert(BufferGetBlockNumber(buffer) == blkno);
+               while (BufferGetBlockNumber(buffer) < blkno);
+               /* Handle the corner case that P_NEW returns non-consecutive pages */
+               if (BufferGetBlockNumber(buffer) != blkno)
+               {
+                       ReleaseBuffer(buffer);
+                       buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
+                                                                                          mode, NULL);
+               }
        }
 
        if (mode == RBM_NORMAL)