From: Tom Lane Date: Sat, 20 Aug 2011 18:51:02 +0000 (-0400) Subject: Fix performance problem when building a lossy tidbitmap. X-Git-Tag: REL9_2_BETA1~1254 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=08e1eedf247e04a9652d997a74ceb46d889124ba;p=postgresql Fix performance problem when building a lossy tidbitmap. As pointed out by Sergey Koposov, repeated invocations of tbm_lossify can make building a large tidbitmap into an O(N^2) operation. To fix, make sure we remove more than the minimum amount of information per call, and add a fallback path to behave sanely if we're unable to fit the bitmap within the requested amount of memory. This has been wrong since the tidbitmap code was written, so back-patch to all supported branches. --- diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c index 33f94c03b2..6f806fda85 100644 --- a/src/backend/nodes/tidbitmap.c +++ b/src/backend/nodes/tidbitmap.c @@ -953,8 +953,11 @@ tbm_lossify(TIDBitmap *tbm) /* * XXX Really stupid implementation: this just lossifies pages in * essentially random order. We should be paying some attention to the - * number of bits set in each page, instead. Also it might be a good idea - * to lossify more than the minimum number of pages during each call. + * number of bits set in each page, instead. + * + * Since we are called as soon as nentries exceeds maxentries, we should + * push nentries down to significantly less than maxentries, or else we'll + * just end up doing this again very soon. We shoot for maxentries/2. */ Assert(!tbm->iterating); Assert(tbm->status == TBM_HASH); @@ -975,7 +978,7 @@ tbm_lossify(TIDBitmap *tbm) /* This does the dirty work ... */ tbm_mark_page_lossy(tbm, page->blockno); - if (tbm->nentries <= tbm->maxentries) + if (tbm->nentries <= tbm->maxentries / 2) { /* we have done enough */ hash_seq_term(&status); @@ -988,6 +991,19 @@ tbm_lossify(TIDBitmap *tbm) * not care whether we visit lossy chunks or not. */ } + + /* + * With a big bitmap and small work_mem, it's possible that we cannot + * get under maxentries. Again, if that happens, we'd end up uselessly + * calling tbm_lossify over and over. To prevent this from becoming a + * performance sink, force maxentries up to at least double the current + * number of entries. (In essence, we're admitting inability to fit + * within work_mem when we do this.) Note that this test will not fire + * if we broke out of the loop early; and if we didn't, the current + * number of entries is simply not reducible any further. + */ + if (tbm->nentries > tbm->maxentries / 2) + tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2; } /*