Add bitmap index mention.

author Bruce Momjian <bruce@momjian.us>

Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)

committer Bruce Momjian <bruce@momjian.us>

Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)
author Bruce Momjian <bruce@momjian.us>
Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)
committer Bruce Momjian <bruce@momjian.us>
Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)
diff --git a/doc/TODO.detail/performance b/doc/TODO.detail/performance

index 8d38aa5ddb79a6e90c2b3eb7086c8263a1b13919..19a6df3de38a1018db820abbddce173b5d65abf4 100644 (file)
--- a/doc/TODO.detail/performance
+++ b/doc/TODO.detail/performance
@@ -345,7 +345,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
  Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
         by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
         for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
  Received: from localhost (majordom@localhost)
         by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
         Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
  Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
         by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
         for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
  Received: from localhost (majordom@localhost)
         by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
         Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1006,7 +1006,7 @@ From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
  Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
         by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165
         for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
-Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.12 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
+Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.13 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
  Received: from hub.org (majordom@localhost [127.0.0.1])
         by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477;
         Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
@@ -1649,3 +1649,593 @@ OEV6eO8MnBSlbJMHiQ08gNE=
  --=-Ivchb84S75fOMzJ9DxwK--
  
  
+From pgsql-hackers-owner+M26157@postgresql.org Tue Aug  6 23:06:34 2002
+Date: Wed, 7 Aug 2002 13:07:38 +1000 (EST)
+From: Gavin Sherry <swm@linuxworld.com.au>
+To: Curt Sampson <cjs@cynic.net>
+cc: pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] CLUSTER and indisclustered
+In-Reply-To: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net>
+Message-ID: <Pine.LNX.4.21.0208071259210.13438-100000@linuxworld.com.au>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1357
+
+On Wed, 7 Aug 2002, Curt Sampson wrote:
+
+> But after doing some benchmarking of various sorts of random reads
+> and writes, it occurred to me that there might be optimizations
+> that could help a lot with this sort of thing. What if, when we've
+> got an index block with a bunch of entries, instead of doing the
+> reads in the order of the entries, we do them in the order of the
+> blocks the entries point to? That would introduce a certain amount
+> of "sequentialness" to the reads that the OS is not capable of
+> introducing (since it can't reschedule the reads you're doing, the
+> way it could reschedule, say, random writes).
+
+This sounds more or less like the method employed by Firebird as described
+by Ann Douglas to Tom at OSCON (correct me if I get this wrong).
+
+Basically, firebird populates a bitmap with entries the scan is interested
+in. The bitmap is populated in page order so that all entries on the same
+heap page can be fetched at once.
+
+This is totally different to the way postgres does things and would
+require significant modification to the index access methods.
+
+Gavin
+
+
+---------------------------(end of broadcast)---------------------------
+TIP 3: if posting/reading through Usenet, please send an appropriate
+subscribe-nomail command to majordomo@postgresql.org so that your
+message can get through to the mailing list cleanly
+
+From pgsql-hackers-owner+M26162@postgresql.org Wed Aug  7 00:42:35 2002
+To: Curt Sampson <cjs@cynic.net>
+cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] CLUSTER and indisclustered 
+In-Reply-To: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net> 
+References: <Pine.NEB.4.44.0208071126590.1214-100000@angelic.cynic.net>
+Comments: In-reply-to Curt Sampson <cjs@cynic.net>
+       message dated "Wed, 07 Aug 2002 11:31:32 +0900"
+Date: Wed, 07 Aug 2002 00:41:47 -0400
+Message-ID: <12593.1028695307@sss.pgh.pa.us>
+From: Tom Lane <tgl@sss.pgh.pa.us>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  3063
+
+Curt Sampson <cjs@cynic.net> writes:
+> But after doing some benchmarking of various sorts of random reads
+> and writes, it occurred to me that there might be optimizations
+> that could help a lot with this sort of thing. What if, when we've
+> got an index block with a bunch of entries, instead of doing the
+> reads in the order of the entries, we do them in the order of the
+> blocks the entries point to?
+
+I thought to myself "didn't I just post something about that?"
+and then realized it was on a different mailing list.  Here ya go
+(and no, this is not the first time around on this list either...)
+
+
+I am currently thinking that bitmap indexes per se are not all that
+interesting.  What does interest me is bitmapped index lookup, which
+came back into mind after hearing Ann Harrison describe how FireBird/
+InterBase does it.
+
+The idea is that you don't scan the index and base table concurrently
+as we presently do it.  Instead, you scan the index and make a list
+of the TIDs of the table tuples you need to visit.  This list can
+be conveniently represented as a sparse bitmap.  After you've finished
+looking at the index, you visit all the required table tuples *in
+physical order* using the bitmap.  This eliminates multiple fetches
+of the same heap page, and can possibly let you get some win from
+sequential access.
+
+Once you have built this mechanism, you can then move on to using
+multiple indexes in interesting ways: you can do several indexscans
+in one query and then AND or OR their bitmaps before doing the heap
+scan.  This would allow, for example, "WHERE a = foo and b = bar"
+to be handled by ANDing results from separate indexes on the a and b
+columns, rather than having to choose only one index to use as we do
+now.
+
+Some thoughts about implementation: FireBird's implementation seems
+to depend on an assumption about a fixed number of tuple pointers
+per page.  We don't have that, but we could probably get away with
+just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
+Also, the main downside of this approach is that the bitmap could
+get large --- but you could have some logic that causes you to fall
+back to plain sequential scan if you get too many index hits.  (It's
+interesting to think of this as lossy compression of the bitmap...
+which leads to the idea of only being fuzzy in limited areas of the
+bitmap, rather than losing all the information you have.)
+
+A possibly nasty issue is that lazy VACUUM has some assumptions in it
+about indexscans holding pins on index pages --- that's what prevents
+it from removing heap tuples that a concurrent indexscan is just about
+to visit.  It might be that there is no problem: even if lazy VACUUM
+removes a heap tuple and someone else then installs a new tuple in that
+same TID slot, you should be okay because the new tuple is too new to
+pass your visibility test.  But I'm not convinced this is safe.
+
+                       regards, tom lane
+
+---------------------------(end of broadcast)---------------------------
+TIP 6: Have you searched our list archives?
+
+http://archives.postgresql.org
+
+From pgsql-hackers-owner+M26172@postgresql.org Wed Aug  7 02:49:56 2002
+X-Authentication-Warning: rh72.home.ee: hannu set sender to hannu@tm.ee using -f
+Subject: Re: [HACKERS] CLUSTER and indisclustered
+From: Hannu Krosing <hannu@tm.ee>
+To: Tom Lane <tgl@sss.pgh.pa.us>
+cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>, 
+          Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+In-Reply-To: <12776.1028697148@sss.pgh.pa.us>
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> 
+       <12776.1028697148@sss.pgh.pa.us>
+X-Mailer: Ximian Evolution 1.0.7 
+Date: 07 Aug 2002 09:46:29 +0500
+Message-ID: <1028695589.2133.11.camel@rh72.home.ee>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1064
+
+On Wed, 2002-08-07 at 10:12, Tom Lane wrote:
+> Curt Sampson <cjs@cynic.net> writes:
+> > On Wed, 7 Aug 2002, Tom Lane wrote:
+> >> Also, the main downside of this approach is that the bitmap could
+> >> get large --- but you could have some logic that causes you to fall
+> >> back to plain sequential scan if you get too many index hits.
+> 
+> > Well, what I was thinking of, should the list of TIDs to fetch get too
+> > long, was just to break it down in to chunks.
+> 
+> But then you lose the possibility of combining multiple indexes through
+> bitmap AND/OR steps, which seems quite interesting to me.  If you've
+> visited only a part of each index then you can't apply that concept.
+
+When the tuples are small relative to pagesize, you may get some
+"compression" by saving just pages and not the actual tids in the the
+bitmap.
+
+-------------
+Hannu
+
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
+
+From pgsql-hackers-owner+M26166@postgresql.org Wed Aug  7 00:55:52 2002
+Date: Wed, 7 Aug 2002 13:55:41 +0900 (JST)
+From: Curt Sampson <cjs@cynic.net>
+To: Tom Lane <tgl@sss.pgh.pa.us>
+cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>,  <pgsql-hackers@postgresql.org>
+Subject: Re: [HACKERS] CLUSTER and indisclustered 
+In-Reply-To: <12593.1028695307@sss.pgh.pa.us>
+Message-ID: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1840
+
+On Wed, 7 Aug 2002, Tom Lane wrote:
+
+> I thought to myself "didn't I just post something about that?"
+> and then realized it was on a different mailing list.  Here ya go
+> (and no, this is not the first time around on this list either...)
+
+Wow. I'm glad to see you looking at this, because this feature would so
+*so* much for the performance of some of my queries, and really, really
+impress my "billion-row-database" client.
+
+> The idea is that you don't scan the index and base table concurrently
+> as we presently do it.  Instead, you scan the index and make a list
+> of the TIDs of the table tuples you need to visit.
+
+Right.
+
+> Also, the main downside of this approach is that the bitmap could
+> get large --- but you could have some logic that causes you to fall
+> back to plain sequential scan if you get too many index hits.
+
+Well, what I was thinking of, should the list of TIDs to fetch get too
+long, was just to break it down in to chunks. If you want to limit to,
+say, 1000 TIDs, and your index has 3000, just do the first 1000, then
+the next 1000, then the last 1000. This would still result in much less
+disk head movement and speed the query immensely.
+
+(BTW, I have verified this emperically during testing of random read vs.
+random write on a RAID controller. The writes were 5-10 times faster
+than the reads because the controller was caching a number of writes and
+then doing them in the best possible order, whereas the reads had to be
+satisfied in the order they were submitted to the controller.)
+
+cjs
+-- 
+Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
+    Don't you know, in this new Dark Age, we're all light.  --XTC
+
+
+---------------------------(end of broadcast)---------------------------
+TIP 5: Have you checked our extensive FAQ?
+
+http://www.postgresql.org/users-lounge/docs/faq.html
+
+From pgsql-hackers-owner+M26167@postgresql.org Wed Aug  7 01:12:54 2002
+To: Curt Sampson <cjs@cynic.net>
+cc: mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] CLUSTER and indisclustered 
+In-Reply-To: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> 
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
+Comments: In-reply-to Curt Sampson <cjs@cynic.net>
+       message dated "Wed, 07 Aug 2002 13:55:41 +0900"
+Date: Wed, 07 Aug 2002 01:12:28 -0400
+Message-ID: <12776.1028697148@sss.pgh.pa.us>
+From: Tom Lane <tgl@sss.pgh.pa.us>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1428
+
+Curt Sampson <cjs@cynic.net> writes:
+> On Wed, 7 Aug 2002, Tom Lane wrote:
+>> Also, the main downside of this approach is that the bitmap could
+>> get large --- but you could have some logic that causes you to fall
+>> back to plain sequential scan if you get too many index hits.
+
+> Well, what I was thinking of, should the list of TIDs to fetch get too
+> long, was just to break it down in to chunks.
+
+But then you lose the possibility of combining multiple indexes through
+bitmap AND/OR steps, which seems quite interesting to me.  If you've
+visited only a part of each index then you can't apply that concept.
+
+Another point to keep in mind is that the bigger the bitmap gets, the
+less useful an indexscan is, by definition --- sooner or later you might
+as well fall back to a seqscan.  So the idea of lossy compression of a
+large bitmap seems really ideal to me.  In principle you could seqscan
+the parts of the table where matching tuples are thick on the ground,
+and indexscan the parts where they ain't.  Maybe this seems natural
+to me as an old JPEG campaigner, but if you don't see the logic I
+recommend thinking about it a little ...
+
+                       regards, tom lane
+
+---------------------------(end of broadcast)---------------------------
+TIP 3: if posting/reading through Usenet, please send an appropriate
+subscribe-nomail command to majordomo@postgresql.org so that your
+message can get through to the mailing list cleanly
+
+From tgl@sss.pgh.pa.us Wed Aug  7 09:27:05 2002
+To: Hannu Krosing <hannu@tm.ee>
+cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>, 
+          Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] CLUSTER and indisclustered 
+In-Reply-To: <1028726966.13418.12.camel@taru.tm.ee> 
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee> <1028726966.13418.12.camel@taru.tm.ee>
+Comments: In-reply-to Hannu Krosing <hannu@tm.ee>
+       message dated "07 Aug 2002 15:29:26 +0200"
+Date: Wed, 07 Aug 2002 09:26:42 -0400
+Message-ID: <15010.1028726802@sss.pgh.pa.us>
+From: Tom Lane <tgl@sss.pgh.pa.us>
+Content-Length:  1120
+
+Hannu Krosing <hannu@tm.ee> writes:
+> Now I remembered my original preference for page bitmaps (vs. tuple
+> bitmaps): one can't actually make good use of a bitmap of tuples because
+> there is no fixed tuples/page ratio and thus no way to quickly go from
+> bit position to actual tuple. You mention the same problem but propose a
+> different solution.
+
+> Using page bitmap, we will at least avoid fetching any unneeded pages -
+> essentially we will have a sequential scan over possibly interesting
+> pages.
+
+Right.  One form of the "lossy compression" idea I suggested is to
+switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
+too large to work with.  Again, one could imagine doing that only in
+denser areas of the bitmap.
+
+> But I guess that CLUSTER support for INSERT will not be touched for 7.3
+> as will real bitmap indexes ;)
+
+All of this is far-future work I think.  Adding a new scan type to the
+executor would probably be pretty localized, but the ramifications in
+the planner could be extensive --- especially if you want to do plans
+involving ANDed or ORed bitmaps.
+
+                       regards, tom lane
+
+From pgsql-hackers-owner+M26178@postgresql.org Wed Aug  7 08:28:14 2002
+X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
+Subject: Re: [HACKERS] CLUSTER and indisclustered
+From: Hannu Krosing <hannu@tm.ee>
+To: Hannu Krosing <hannu@tm.ee>
+cc: Tom Lane <tgl@sss.pgh.pa.us>, Curt Sampson <cjs@cynic.net>, 
+          mark Kirkwood <markir@slithery.org>, Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+In-Reply-To: <1028695589.2133.11.camel@rh72.home.ee>
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> 
+       <12776.1028697148@sss.pgh.pa.us>  <1028695589.2133.11.camel@rh72.home.ee>
+X-Mailer: Ximian Evolution 1.0.3.99 
+Date: 07 Aug 2002 15:29:26 +0200
+Message-ID: <1028726966.13418.12.camel@taru.tm.ee>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1837
+
+On Wed, 2002-08-07 at 06:46, Hannu Krosing wrote:
+> On Wed, 2002-08-07 at 10:12, Tom Lane wrote:
+> > Curt Sampson <cjs@cynic.net> writes:
+> > > On Wed, 7 Aug 2002, Tom Lane wrote:
+> > >> Also, the main downside of this approach is that the bitmap could
+> > >> get large --- but you could have some logic that causes you to fall
+> > >> back to plain sequential scan if you get too many index hits.
+> > 
+> > > Well, what I was thinking of, should the list of TIDs to fetch get too
+> > > long, was just to break it down in to chunks.
+> > 
+> > But then you lose the possibility of combining multiple indexes through
+> > bitmap AND/OR steps, which seems quite interesting to me.  If you've
+> > visited only a part of each index then you can't apply that concept.
+> 
+> When the tuples are small relative to pagesize, you may get some
+> "compression" by saving just pages and not the actual tids in the the
+> bitmap.
+
+Now I remembered my original preference for page bitmaps (vs. tuple
+bitmaps): one can't actually make good use of a bitmap of tuples because
+there is no fixed tuples/page ratio and thus no way to quickly go from
+bit position to actual tuple. You mention the same problem but propose a
+different solution.
+
+Using page bitmap, we will at least avoid fetching any unneeded pages -
+essentially we will have a sequential scan over possibly interesting
+pages.
+
+If we were to use page-bitmap index for something with only a few values
+like booleans, some insert-time local clustering should be useful, so
+that TRUEs and FALSEs end up on different pages.
+
+But I guess that CLUSTER support for INSERT will not be touched for 7.3
+as will real bitmap indexes ;)
+
+---------------
+Hannu
+
+
+---------------------------(end of broadcast)---------------------------
+TIP 6: Have you searched our list archives?
+
+http://archives.postgresql.org
+
+From pgsql-hackers-owner+M26192@postgresql.org Wed Aug  7 10:26:30 2002
+To: Hannu Krosing <hannu@tm.ee>
+cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>, 
+          Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] CLUSTER and indisclustered 
+In-Reply-To: <1028733234.13418.113.camel@taru.tm.ee> 
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net> <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee> <1028726966.13418.12.camel@taru.tm.ee> <15010.1028726802@sss.pgh.pa.us> <1028733234.13418.113.camel@taru.tm.ee>
+Comments: In-reply-to Hannu Krosing <hannu@tm.ee>
+       message dated "07 Aug 2002 17:13:54 +0200"
+Date: Wed, 07 Aug 2002 10:26:13 -0400
+Message-ID: <15622.1028730373@sss.pgh.pa.us>
+From: Tom Lane <tgl@sss.pgh.pa.us>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  1224
+
+Hannu Krosing <hannu@tm.ee> writes:
+> On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
+>> Right.  One form of the "lossy compression" idea I suggested is to
+>> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
+>> too large to work with.  
+
+> If it is a real bitmap, should it not be easyeast to allocate at the
+> start ?
+
+But it isn't a "real bitmap".  That would be a really poor
+implementation, both for space and speed --- do you really want to scan
+over a couple of megs of zeroes to find the few one-bits you care about,
+in the typical case?  "Bitmap" is a convenient term because it describes
+the abstract behavior we want, but the actual data structure will
+probably be nontrivial.  If I recall Ann's description correctly,
+Firebird's implementation uses run length coding of some kind (anyone
+care to dig in their source and get all the details?).  If we tried
+anything in the way of lossy compression then there'd be even more stuff
+lurking under the hood.
+
+                       regards, tom lane
+
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
+
+From pgsql-hackers-owner+M26188@postgresql.org Wed Aug  7 10:12:26 2002
+X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
+Subject: Re: [HACKERS] CLUSTER and indisclustered
+From: Hannu Krosing <hannu@tm.ee>
+To: Tom Lane <tgl@sss.pgh.pa.us>
+cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>, 
+          Gavin Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+In-Reply-To: <15010.1028726802@sss.pgh.pa.us>
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
+       <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee>
+       <1028726966.13418.12.camel@taru.tm.ee>  <15010.1028726802@sss.pgh.pa.us>
+X-Mailer: Ximian Evolution 1.0.3.99 
+Date: 07 Aug 2002 17:13:54 +0200
+Message-ID: <1028733234.13418.113.camel@taru.tm.ee>
+X-Virus-Scanned: by AMaViS new-20020517
+Precedence: bulk
+Sender: pgsql-hackers-owner@postgresql.org
+X-Virus-Scanned: by AMaViS new-20020517
+Content-Length:  2812
+
+On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
+> Hannu Krosing <hannu@tm.ee> writes:
+> > Now I remembered my original preference for page bitmaps (vs. tuple
+> > bitmaps): one can't actually make good use of a bitmap of tuples because
+> > there is no fixed tuples/page ratio and thus no way to quickly go from
+> > bit position to actual tuple. You mention the same problem but propose a
+> > different solution.
+> 
+> > Using page bitmap, we will at least avoid fetching any unneeded pages -
+> > essentially we will have a sequential scan over possibly interesting
+> > pages.
+> 
+> Right.  One form of the "lossy compression" idea I suggested is to
+> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
+> too large to work with.  
+
+If it is a real bitmap, should it not be easyeast to allocate at the
+start ?
+
+a page bitmap for a 100 000 000 tuple table with 10 tuples/page will be
+sized 10000000/8 = 1.25 MB, which does not look too big for me for that
+amount of data (the data table itself would occupy 80 GB).
+
+Even having the bitmap of 16 bits/page (with the bits 0-14 meaning
+tuples 0-14 and bit 15 meaning "seq scan the rest of page") would
+consume just 20 MB of _local_ memory, and would be quite justifyiable
+for a query on a table that large.
+
+For a real bitmap index the tuples-per-page should be a user-supplied
+tuning parameter.
+
+> Again, one could imagine doing that only in denser areas of the bitmap.
+
+I would hardly call the resulting structure "a bitmap" ;)
+
+And I'm not sure the overhead for a more complex structure would win us
+any additional performance for most cases.
+
+> > But I guess that CLUSTER support for INSERT will not be touched for 7.3
+> > as will real bitmap indexes ;)
+> 
+> All of this is far-future work I think. 
+
+After we do that we will probably be able claim support for
+"datawarehousing" ;)
+
+> Adding a new scan type to the
+> executor would probably be pretty localized, but the ramifications in
+> the planner could be extensive --- especially if you want to do plans
+> involving ANDed or ORed bitmaps.
+
+Also going to "smart inserter" which can do local clustering on sets of
+real bitmap indexes for INSERTS (and INSERT side of UPDATE) would
+probably be a major change from our current "stupid inserter" ;)
+
+This will not be needed for bitmap resolution higher than 1bit/page but
+default local clustering on bitmap indexes will probably buy us some
+extra performance. by avoiding data page fetches when such indexes are
+used.
+
+AN anyway the support for INSERT being aware of clustering will probably
+come up sometime.
+
+------------
+Hannu
+
+
+
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
+
+From hannu@tm.ee Wed Aug  7 11:22:53 2002
+X-Authentication-Warning: taru.tm.ee: hannu set sender to hannu@tm.ee using -f
+Subject: Re: [HACKERS] CLUSTER and indisclustered
+From: Hannu Krosing <hannu@tm.ee>
+To: Tom Lane <tgl@sss.pgh.pa.us>
+cc: Curt Sampson <cjs@cynic.net>, mark Kirkwood <markir@slithery.org>, 
+          Gavin 
+        Sherry <swm@linuxworld.com.au>, 
+          Bruce Momjian <pgman@candle.pha.pa.us>, pgsql-hackers@postgresql.org
+In-Reply-To: <15622.1028730373@sss.pgh.pa.us>
+References: <Pine.NEB.4.44.0208071351440.1214-100000@angelic.cynic.net>
+       <12776.1028697148@sss.pgh.pa.us> <1028695589.2133.11.camel@rh72.home.ee>
+       <1028726966.13418.12.camel@taru.tm.ee> <15010.1028726802@sss.pgh.pa.us>
+       <1028733234.13418.113.camel@taru.tm.ee>  <15622.1028730373@sss.pgh.pa.us>
+X-Mailer: Ximian Evolution 1.0.3.99 
+Date: 07 Aug 2002 18:24:30 +0200
+Message-ID: <1028737470.13419.182.camel@taru.tm.ee>
+Content-Length:  2382
+
+On Wed, 2002-08-07 at 16:26, Tom Lane wrote:
+> Hannu Krosing <hannu@tm.ee> writes:
+> > On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
+> >> Right.  One form of the "lossy compression" idea I suggested is to
+> >> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
+> >> too large to work with.  
+> 
+> > If it is a real bitmap, should it not be easyeast to allocate at the
+> > start ?
+> 
+> But it isn't a "real bitmap".  That would be a really poor
+> implementation, both for space and speed --- do you really want to scan
+> over a couple of megs of zeroes to find the few one-bits you care about,
+> in the typical case?
+
+I guess that depends on data. The typical case should be somthing the
+stats process will find out so the optimiser can use it
+
+The bitmap must be less than 1/48 (size of TID) full for best
+uncompressed "active-tid-list" to be smaller than plain bitmap. If there
+were some structure above list then this ratio would be even higher.
+
+I have had good experience using "compressed delta lists", which will
+scale well ofer the whole "fullness" spectrum of bitmap, but this is for
+storage, not for initial constructing of lists.
+
+>  "Bitmap" is a convenient term because it describes
+> the abstract behavior we want, but the actual data structure will
+> probably be nontrivial.  If I recall Ann's description correctly,
+> Firebird's implementation uses run length coding of some kind (anyone
+> care to dig in their source and get all the details?).
+
+Plain RLL is probably a good way to store it and for merging two or more
+bitmaps, but not as good for constructing it bit-by-bit. I guess the
+most effective structure for updating is often still a plain bitmap
+(maybe not if it is very sparse and all of it does not fit in cache),
+followed by some kind of balanced tree (maybe rb-tree).
+
+If the bitmap is relatively full then the plain bitmap is almost always
+the most effective to update.
+
+> If we tried anything in the way of lossy compression then there'd
+> be even more stuff lurking under the hood.
+
+Having three-valued (0,1,maybe) RLL-encoded "tritmap" would be a good
+way to represent lossy compression, and it would also be quite
+straightforward to merge two of these using AND or OR. It may even be
+possible to easily construct it using a fixed-length b-tree and going
+from 1 to "maybe" for nodes that get too dense.
+
+---------------
+Hannu
+
+
author	Bruce Momjian <bruce@momjian.us>
	Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)
committer	Bruce Momjian <bruce@momjian.us>
	Tue, 13 Aug 2002 05:08:35 +0000 (05:08 +0000)