--- /dev/null
+From aoki@postgres.Berkeley.EDU Sun Jun 22 19:31:06 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA19488
+ for <maillist@candle.pha.pa.us>; Sun, 22 Jun 1997 19:31:03 -0400 (EDT)
+Received: from faerie.CS.Berkeley.EDU (faerie.CS.Berkeley.EDU [128.32.37.53]) by renoir.op.net ($ Revision: 1.12 $) with SMTP id TAA18795 for <maillist@candle.pha.pa.us>; Sun, 22 Jun 1997 19:18:06 -0400 (EDT)
+Received: from localhost.Berkeley.EDU (localhost.Berkeley.EDU [127.0.0.1]) by faerie.CS.Berkeley.EDU (8.6.10/8.6.3) with SMTP id QAA07816 for maillist@candle.pha.pa.us; Sun, 22 Jun 1997 16:16:44 -0700
+Message-Id: <199706222316.QAA07816@faerie.CS.Berkeley.EDU>
+X-Authentication-Warning: faerie.CS.Berkeley.EDU: Host localhost.Berkeley.EDU didn't use HELO protocol
+From: aoki@CS.Berkeley.EDU (Paul M. Aoki)
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: PostgreSQL psort() function performance
+Reply-To: aoki@CS.Berkeley.EDU (Paul M. Aoki)
+In-reply-to: Your message of Sun, 22 Jun 1997 09:45:31 -0400 (EDT)
+ <199706221345.JAA11476@candle.pha.pa.us>
+Date: Sun, 22 Jun 97 16:16:43 -0700
+Sender: aoki@postgres.Berkeley.EDU
+X-Mts: smtp
+Status: OR
+
+the mariposa distribution (http://mariposa.cs.berkeley.edu/) contains
+some hacks to nodeSort.c and psort.c that
+ - make psort read directly from the executor node below it
+ (instead of an input relation)
+ - make the Sort node read directly from the last set of psort runs
+ (instead of an output relation)
+these speed things up quite a bit. they kind of ruin psort for other
+purposes, though (which is why nbtsort.c exists).
+
+i'd merge these in first and see how far that gets you.
+--
+ Paul M. Aoki | University of California at Berkeley
+ aoki@CS.Berkeley.EDU | Dept. of EECS, Computer Science Division #1776
+ | Berkeley, CA 94720-1776
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 09:31:04 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01676
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 09:31:02 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA07345 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 09:13:20 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA13315; Mon, 3 Nov 1997 08:50:26 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:48:07 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA11722 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:48:02 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA11539 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 08:47:34 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id UAA19066; Mon, 3 Nov 1997 20:48:04 +0700 (KRS)
+Message-ID: <345DD614.345BF651@sable.krasnoyarsk.su>
+Date: Mon, 03 Nov 1997 20:48:04 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Marc Howard Zuckman <marc@fallon.classyad.com>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <Pine.LNX.3.95.971103090709.21917A-100000@fallon.classyad.com>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Marc Howard Zuckman wrote:
+>
+> On Mon, 3 Nov 1997, Bruce Momjian wrote:
+>
+> > With fsync off, I just did an insert of 1000 integers into a table
+> > containing a single int4 column and no indexes, and it completed in 2.3
+> > seconds. This is on the new source tree. That is 434 inserts/second.
+> > Pretty major performance, or 2.3 ms/insert. This is on an idle PP200
+> > with UltraSCSI drives.
+> >
+> > With fsync on, the time goes to 51 seconds. Wow, big difference.
+>
+> If better alternative error recovery methods were available, perhaps
+> a facility to replay an interval transactions log from a prior dump,
+> it would be reasonable to run the backend without fsync and
+> take advantage of the performance gains.
+
+???
+
+>
+> I don't know the answer, but I suspect that the commercial databases
+> don't "fsync" the way pgsql does.
+
+Could someone try 1000 int4 inserts using postgres and
+some commercial database (on the same machine) ?
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 09:01:02 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01183
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 09:01:00 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id IAA06632 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 08:51:58 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA05964; Mon, 3 Nov 1997 08:39:39 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:37:32 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA04729 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:37:26 -0500 (EST)
+Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA04614 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 08:37:16 -0500 (EST)
+Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id JAA22108; Mon, 3 Nov 1997 09:11:09 -0500
+Date: Mon, 3 Nov 1997 09:11:09 -0500 (EST)
+From: Marc Howard Zuckman <marc@fallon.classyad.com>
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+cc: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+In-Reply-To: <199711030513.AAA23474@candle.pha.pa.us>
+Message-ID: <Pine.LNX.3.95.971103090709.21917A-100000@fallon.classyad.com>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+On Mon, 3 Nov 1997, Bruce Momjian wrote:
+
+> >
+> > Removed...
+> >
+> > Also, ItemPointerData t_chain (6 bytes) removed from HeapTupleHeader.
+> > CommandId is uint32 now (up to the 2^32 - 1 commands per transaction).
+> > DOUBLEALIGN(Sizeof(HeapTupleHeader)) is 40 bytes now.
+> >
+> > 1000 inserts (into table with single int4 column, 1 insert per transaction)
+> > takes 70 - 80 sec now (12.5 - 14 transactions/sec).
+> > This is hardware/OS limitation:
+> >
+> > fd = open ("t", O_RDWR);
+> > for (i = 1; i <= 1000; i++)
+> > {
+> > lseek(fd, 0, SEEK_END);
+> > write(fd, buf, 56);
+> > fsync(fd);
+> > }
+> > close (fd);
+> >
+> > takes 33 - 39 sec and so it's not possible to be faster
+> > having 2 fsync-s per transaction.
+> >
+> > The same test on 6.2.1: 92 - 107 sec
+>
+> With fsync off, I just did an insert of 1000 integers into a table
+> containing a single int4 column and no indexes, and it completed in 2.3
+> seconds. This is on the new source tree. That is 434 inserts/second.
+> Pretty major performance, or 2.3 ms/insert. This is on an idle PP200
+> with UltraSCSI drives.
+>
+> With fsync on, the time goes to 51 seconds. Wow, big difference.
+
+If better alternative error recovery methods were available, perhaps
+a facility to replay an interval transactions log from a prior dump,
+it would be reasonable to run the backend without fsync and
+take advantage of the performance gains.
+
+I don't know the answer, but I suspect that the commercial databases
+don't "fsync" the way pgsql does.
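
For anyone who wants to repeat the measurement, the quoted write/fsync loop can be turned into a small self-contained function. This is only a sketch: the name append_records, its parameters, and the O_CREAT | O_TRUNC flags are assumptions added here so the file need not exist beforehand; the original loop opened an existing file "t" and always called fsync().

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Append `count` records of `record_size` bytes to `path`, optionally
 * calling fsync() after every write, as in the quoted loop.
 * Returns the number of bytes written, or -1 on error.
 * The name and parameters are illustrative, not from the thread.
 */
long append_records(const char *path, int count, size_t record_size, int do_fsync)
{
    char buf[512];
    long total = 0;
    int fd;

    assert(record_size <= sizeof(buf));
    memset(buf, 'x', sizeof(buf));

    /* Assumption: create/truncate the file so the run is repeatable. */
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        if (lseek(fd, 0, SEEK_END) == (off_t) -1 ||
            write(fd, buf, record_size) != (ssize_t) record_size ||
            (do_fsync && fsync(fd) != 0)) {
            close(fd);
            return -1;
        }
        total += (long) record_size;
    }

    if (close(fd) != 0)
        return -1;
    return total;
}
```

Timing append_records(path, 1000, 56, 1) against append_records(path, 1000, 56, 0) on the same disk should show how much of the per-transaction cost is the fsync itself.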
+
+Marc Zuckman
+marc@fallon.classyad.com
+
+_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
+_ Visit The Home and Condo MarketPlace _
+_ http://www.ClassyAd.com _
+_ _
+_ FREE basic property listings/advertisements and searches. _
+_ _
+_ Try our premium, yet inexpensive services for a real _
+_ selling or buying edge! _
+_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
+
+
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 11:31:03 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA04080
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:31:00 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA13680 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:21:30 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA07566; Mon, 3 Nov 1997 11:04:52 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:02:59 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA07372 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:02:52 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id LAA07196 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 11:02:22 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id KAA02525;
+ Mon, 3 Nov 1997 10:42:03 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711031542.KAA02525@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 3 Nov 1997 10:42:03 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <345DD614.345BF651@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 08:48:04 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > I don't know the answer, but I suspect that the commercial databases
+> > don't "fsync" the way pgsql does.
+>
+> Could someone try 1000 int4 inserts using postgres and
+> some commercial database (on the same machine) ?
+
+I have been thinking about this since seeing the performance change
+with/without fsync.
+
+Commercial databases usually do a log write every 5 or 15 minutes, and
+guarantee the logs will contain everything up to this time interval.
+
+Couldn't we have some such mechanism? Usually they have raw space, so
+they can control when the data is hitting the disk. Using a file
+system, some of it may be getting to the disk without our knowing it.
+
+What exactly is a scenario where lack of doing explicit fsync's will
+cause data corruption, rather than just lost data from the past few
+minutes?
+
+I think Vadim has gotten fsync's down to fsync'ing the modified data
+page, and pg_log.
+
+Let's suppose we did not fsync. There could be cases where pg_log was
+fsync'ed by the OS, and some of the modified data pages are fsync'ed by
+the OS, but not others. This would leave us with a partial transaction.
+
+However, let's suppose we prevent pg_log from being fsync'ed somehow.
+Then, because we have a no-overwrite database, we could keep control of
+this, and a write of some data pages but not others would not cause us
+problems because the pg_log would show all such transactions, which had
+not had all their modified data pages fsync'ed, as non-committed.
+
+Perhaps we can even set a flag in pg_log every five minutes to indicate
+whether all buffers for the page have been flushed? That way we would
+not have to worry about preventing flushing of pg_log.
+
+Comments?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 12:00:42 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA04456
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 12:00:40 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA26054; Mon, 3 Nov 1997 11:46:49 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:46:33 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA25932 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:46:30 -0500 (EST)
+Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id LAA25750 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 11:45:53 -0500 (EST)
+Received: by orion.SAPserv.Hamburg.dsh.de
+ (Linux Smail3.1.29.1 #1)}
+ id m0xSPfE-000BGZC; Mon, 3 Nov 97 17:47 MET
+Message-Id: <m0xSPfE-000BGZC@orion.SAPserv.Hamburg.dsh.de>
+From: wieck@sapserv.debis.de
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Mon, 3 Nov 1997 17:47:43 +0100 (MET)
+Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com,
+ hackers@postgreSQL.org
+Reply-To: wieck@sapserv.debis.de (Jan Wieck)
+In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=iso-8859-1
+Content-Transfer-Encoding: 8bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> > > I don't know the answer, but I suspect that the commercial databases
+> > > don't "fsync" the way pgsql does.
+> >
+> > Could someone try 1000 int4 inserts using postgres and
+> > some commercial database (on the same machine) ?
+>
+> I have been thinking about this since seeing the performance change
+> with/without fsync.
+>
+> Commercial databases usually do a log write every 5 or 15 minutes, and
+> guarantee the logs will contain everything up to this time interval.
+>
+
+ Without fsync PostgreSQL would only lose data if the OS
+ crashes between the last write operation of a backend and the
+ next regular update sync. This is seldom but if it happens it
+ really hurts.
+
+ A database can omit fsync on data files (e.g. tablespaces) if
+ it writes a redo log. With that redo log, a backup can be
+ restored and then all transactions since the backup redone.
+
+ PostgreSQL doesn't write such a redo log. So an OS crash
+ after the fsync of pg_log could corrupt the database without
+ a chance to recover.
+
+ Isn't it time to get an (optional) redo log? I don't exactly
+ know all the places where our datafiles can get modified, but
+ I hope this is only done in the heap access methods and
+ vacuum. So these are the places where the redo log data
+ comes from (plus transaction commit/rollback).
+
+
+Until later, Jan
+
+--
+#define OPINIONS "they are all mine - not those of debis or daimler-benz"
+
+#======================================================================#
+# It's easier to get forgiveness for being wrong than for being right. #
+# Let's break this rule - forgive me. #
+#================================== wieck@sapserv.debis.de (Jan Wieck) #
+
+
+
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 14:01:06 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA06775
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 14:01:04 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA22235 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 13:43:15 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA11482; Mon, 3 Nov 1997 13:32:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 13:32:02 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA11204 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 13:31:58 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA11119 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 13:31:44 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id MAA05464;
+ Mon, 3 Nov 1997 12:59:01 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711031759.MAA05464@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: wieck@sapserv.debis.de
+Date: Mon, 3 Nov 1997 12:59:01 -0500 (EST)
+Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com,
+ hackers@postgreSQL.org
+In-Reply-To: <m0xSPfE-000BGZC@orion.SAPserv.Hamburg.dsh.de> from "wieck@sapserv.debis.de" at Nov 3, 97 05:47:43 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> >
+> > > > I don't know the answer, but I suspect that the commercial databases
+> > > > don't "fsync" the way pgsql does.
+> > >
+> > > Could someone try 1000 int4 inserts using postgres and
+> > > some commercial database (on the same machine) ?
+> >
+> > I have been thinking about this since seeing the performance change
+> > with/without fsync.
+> >
+> > Commercial databases usually do a log write every 5 or 15 minutes, and
+> > guarantee the logs will contain everything up to this time interval.
+> >
+>
+> Without fsync PostgreSQL would only lose data if the OS
+> crashes between the last write operation of a backend and the
+> next regular update sync. This is seldom but if it happens it
+> really hurts.
+>
+> A database can omit fsync on data files (e.g. tablespaces) if
+> it writes a redo log. With that redo log, a backup can be
+> restored and then all transactions since the backup redone.
+>
+> PostgreSQL doesn't write such a redo log. So an OS crash
+> after the fsync of pg_log could corrupt the database without
+> a chance to recover.
+>
+> Isn't it time to get an (optional) redo log? I don't exactly
+> know all the places where our datafiles can get modified, but
+> I hope this is only done in the heap access methods and
+> vacuum. So these are the places where the redo log data
+> comes from (plus transaction commit/rollback).
+>
+
+Yes, but because we are a non-over-write database, I don't see why we
+can't just do this without a redo log.
+
+Every five minutes, we fsync() all dirty pages, mark all completed
+transactions as fsync'ed in pg_log, and fsync() pg_log.
+
+On postmaster startup, any transaction marked as completed, but not
+marked as fsync'ed gets marked as aborted.
+
+Of course, all vacuum operations would have to be fsync'ed.
+
+Comments?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 16:46:01 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA10292
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 16:45:59 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id QAA02040 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 16:42:40 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA17422; Mon, 3 Nov 1997 16:34:28 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 16:34:10 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA17210 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 16:34:06 -0500 (EST)
+Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA16690 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 16:33:27 -0500 (EST)
+Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id RAA32498; Mon, 3 Nov 1997 17:33:42 -0500
+Date: Mon, 3 Nov 1997 17:33:42 -0500 (EST)
+From: Marc Howard Zuckman <marc@fallon.classyad.com>
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+cc: wieck@sapserv.debis.de, vadim@sable.krasnoyarsk.su, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+In-Reply-To: <199711031759.MAA05464@candle.pha.pa.us>
+Message-ID: <Pine.LNX.3.95.971103173129.32055B-100000@fallon.classyad.com>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+On Mon, 3 Nov 1997, Bruce Momjian wrote:
+
+> >
+> > >
+> > > > > I don't know the answer, but I suspect that the commercial databases
+> > > > > don't "fsync" the way pgsql does.
+> > > >
+> > > > Could someone try 1000 int4 inserts using postgres and
+> > > > some commercial database (on the same machine) ?
+> > >
+> > > I have been thinking about this since seeing the performance change
+> > > with/without fsync.
+> > >
+> > > Commercial databases usually do a log write every 5 or 15 minutes, and
+> > > guarantee the logs will contain everything up to this time interval.
+> > >
+> >
+> > Without fsync PostgreSQL would only lose data if the OS
+> > crashes between the last write operation of a backend and the
+> > next regular update sync. This is seldom but if it happens it
+> > really hurts.
+> >
+> > A database can omit fsync on data files (e.g. tablespaces) if
+> > it writes a redo log. With that redo log, a backup can be
+> > restored and then all transactions since the backup redone.
+> >
+> > PostgreSQL doesn't write such a redo log. So an OS crash
+> > after the fsync of pg_log could corrupt the database without
+> > a chance to recover.
+> >
+> > Isn't it time to get an (optional) redo log? I don't exactly
+> > know all the places where our datafiles can get modified, but
+> > I hope this is only done in the heap access methods and
+> > vacuum. So these are the places where the redo log data
+> > comes from (plus transaction commit/rollback).
+> >
+>
+> Yes, but because we are a non-over-write database, I don't see why we
+> can't just do this without a redo log.
+
+Because if the hard drive is the reason for the failure (instead of
+power out, OS bites dust, etc), the database won't be of much help.
+
+The redo log should be on a device different than the database.
+
+Marc Zuckman
+marc@fallon.classyad.com
+
+
+
+
+From maillist Mon Nov 3 22:59:31 1997
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id WAA16264;
+ Mon, 3 Nov 1997 22:59:31 -0500 (EST)
+From: Bruce Momjian <maillist>
+Message-Id: <199711040359.WAA16264@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Mon, 3 Nov 1997 22:59:30 -0500 (EST)
+Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com,
+ hackers@postgreSQL.org
+In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+>
+> > > I don't know the answer, but I suspect that the commercial databases
+> > > don't "fsync" the way pgsql does.
+> >
+> > Could someone try 1000 int4 inserts using postgres and
+> > some commercial database (on the same machine) ?
+>
+> I have been thinking about this since seeing the performance change
+> with/without fsync.
+>
+> Commercial databases usually do a log write every 5 or 15 minutes, and
+> guarantee the logs will contain everything up to this time interval.
+>
+> Couldn't we have some such mechanism? Usually they have raw space, so
+> they can control when the data is hitting the disk. Using a file
+> system, some of it may be getting to the disk without our knowing it.
+>
+> What exactly is a scenario where lack of doing explicit fsync's will
+> cause data corruption, rather than just lost data from the past few
+> minutes?
+>
+> I think Vadim has gotten fsync's down to fsync'ing the modified data
+> page, and pg_log.
+>
+> Let's suppose we did not fsync. There could be cases where pg_log was
+> fsync'ed by the OS, and some of the modified data pages are fsync'ed by
+> the OS, but not others. This would leave us with a partial transaction.
+>
+> However, let's suppose we prevent pg_log from being fsync'ed somehow.
+> Then, because we have a no-overwrite database, we could keep control of
+> this, and a write of some data pages but not others would not cause us
+> problems because the pg_log would show all such transactions, which had
+> not had all their modified data pages fsync'ed, as non-committed.
+>
+> Perhaps we can even set a flag in pg_log every five minutes to indicate
+> whether all buffers for the page have been flushed? That way we would
+> not have to worry about preventing flushing of pg_log.
+>
+> Comments?
+
+OK, here is a more formal description of what I am suggesting. It will
+give us commercial dbms reliability with no-fsync performance.
+Commercial dbms's usually only give restore up to 5 minutes before the
+crash, and this is what I am suggesting. If we can do this, we can
+remove the no-fsync option.
+
+First, let's suppose there exists a shared queue that is visible to all
+backends and the postmaster that allows transaction id's to be added to
+the queue. We also add a bit to the pg_log record called 'been_synced'
+that is initially false.
+
+OK, once a backend starts a transaction, it puts a transaction id in
+pg_log. Once the transaction is finished, it is marked as committed.
+At the same time, we now put the transaction id on the shared queue.
+
+Every five minutes, or as defined by the administrator, the postmaster
+does a sync() call. On my OS, any user can call sync, and I think
+this is typical. update/pagecleaner does this every 30 seconds anyway,
+so it is no big deal for the postmaster to call it every 5 minutes. The
+nice thing about this is that the OS does the syncing of all the dirty
+pages for us. (An alarm() call can set up this 5 minute timing.)
+
+The postmaster then locks the shared transaction id queue, makes a copy
+of the entries in the queue, clears the queue, and unlocks the queue.
+It does this so no one else modifies the queue while it is being
+cleared.
+
+The postmaster then goes through pg_log, and marks each transaction in
+its copy of the queue as 'been_synced'.
+
+The postmaster also performs this on shutdown.
+
+On postmaster startup, all transactions are checked and any transaction
+that is marked as committed but not 'been_synced' is marked as not
+committed. In this way, we prevent non-synced or partially synced
+transactions from being used.
+
+Of course, vacuum would have to do normal fsyncs because it is removing
+the transaction log.
+
+We need the shared transaction id queue because there is no way to find
+the newly committed transactions since the last sync. A transaction
+can last for hours.
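
The scheme described above can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the ToyLog structure, the toy_* function names, and the fixed-size arrays stand in for pg_log and the shared queue, and the real sync() call and queue locking are only indicated in comments.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_XACTS = 64 };

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

typedef struct {
    XactStatus status;
    int been_synced;               /* the proposed extra bit, initially false */
} LogEntry;

typedef struct {
    LogEntry entries[MAX_XACTS];   /* hypothetical stand-in for pg_log, indexed by xid */
    int queue[MAX_XACTS];          /* hypothetical stand-in for the shared xid queue */
    size_t queue_len;
} ToyLog;

/* Backend side: mark the transaction committed and put its id on the queue. */
void toy_commit(ToyLog *log, int xid)
{
    assert(xid >= 0 && xid < MAX_XACTS && log->queue_len < MAX_XACTS);
    log->entries[xid].status = XACT_COMMITTED;
    log->entries[xid].been_synced = 0;
    log->queue[log->queue_len++] = xid;
}

/*
 * Postmaster side, run every N minutes and at shutdown. A real version
 * would call sync() first and lock the queue while copying and clearing it.
 */
void toy_checkpoint(ToyLog *log)
{
    for (size_t i = 0; i < log->queue_len; i++)
        log->entries[log->queue[i]].been_synced = 1;
    log->queue_len = 0;
}

/*
 * Startup: any transaction marked committed but not been_synced is
 * demoted to aborted. Returns the number of transactions demoted.
 */
int toy_recover(ToyLog *log)
{
    int demoted = 0;

    for (int xid = 0; xid < MAX_XACTS; xid++) {
        LogEntry *e = &log->entries[xid];
        if (e->status == XACT_COMMITTED && !e->been_synced) {
            e->status = XACT_ABORTED;
            demoted++;
        }
    }
    log->queue_len = 0;
    return demoted;
}
```

In this model a transaction that commits after the last checkpoint is rolled back by toy_recover, which is exactly the window of lost work the proposal accepts.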
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+From owner-pgsql-hackers@hub.org Tue Nov 4 02:13:08 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA17544
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 02:13:06 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA14126; Tue, 4 Nov 1997 02:07:55 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 02:04:59 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA12859 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 02:04:51 -0500 (EST)
+Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id CAA12625 for <hackers@postgreSQL.org>; Tue, 4 Nov 1997 02:04:12 -0500 (EST)
+Received: by orion.SAPserv.Hamburg.dsh.de
+ (Linux Smail3.1.29.1 #1)}
+ id m0xSd44-000BFQC; Tue, 4 Nov 97 08:06 MET
+Message-Id: <m0xSd44-000BFQC@orion.SAPserv.Hamburg.dsh.de>
+From: wieck@sapserv.debis.de
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Tue, 4 Nov 1997 08:06:16 +0100 (MET)
+Cc: maillist@candle.pha.pa.us, vadim@sable.krasnoyarsk.su,
+ marc@fallon.classyad.com, hackers@postgreSQL.org
+Reply-To: wieck@sapserv.debis.de (Jan Wieck)
+In-Reply-To: <199711040359.WAA16264@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:59:30 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=iso-8859-1
+Content-Transfer-Encoding: 8bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> OK, here is a more formal description of what I am suggesting. It will
+> give us commercial dbms reliability with no-fsync performance.
+> Commercial dbms's usually only give restore up to 5 minutes before the
+> crash, and this is what I am suggesting. If we can do this, we can
+> remove the no-fsync option.
+
+ I'm not 100% sure but as far as I know Oracle, it can recover
+ up to the last committed transaction using the online redo
+ logs. And even if commercial dbms's aren't able to do that,
+ it should be our target.
+
+> [description about transaction queue]
+
+ This all depends on the fact that PostgreSQL is a no
+ overwrite dbms. Otherwise the space of deleted tuples might
+ get overwritten by later transactions and the information is
+ finally lost.
+
+ Another issue: All we have thought of up to now are crashes
+ where the database files are still usable after restart. But
+ take the simple case of a write error. A new bad block or track
+ will get remapped (in some way) but the data in it is lost.
+ So we end up with one or more totally corrupted database
+ files. And I don't trust mirrored disks farther than I can
+ throw them. A bug in the OS or a memory failure (many new
+ PeeCee boards don't support parity and even with parity a two
+ bit failure is still the wrong data but with a valid parity
+ bit) can also corrupt the data.
+
+ I still prefer redo logs. They should reside on a different
+ disk and the possibility of losing the database files along
+ with the redo log is very small.
+
+
+Until later, Jan
+
+--
+#define OPINIONS "they are all mine - not those of debis or daimler-benz"
+
+#======================================================================#
+# It's easier to get forgiveness for being wrong than for being right. #
+# Let's break this rule - forgive me. #
+#================================== wieck@sapserv.debis.de (Jan Wieck) #
+
+
+
+
+From vadim@sable.krasnoyarsk.su Tue Nov 4 04:12:50 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA18487
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 04:12:48 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA03152 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 04:12:06 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA20591; Tue, 4 Nov 1997 16:14:06 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345EE75D.398A68D@sable.krasnoyarsk.su>
+Date: Tue, 04 Nov 1997 16:14:05 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711040359.WAA16264@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, here is a more formal description of what I am suggesting. It will
+> give us commercial dbms reliability with no-fsync performance.
+> Commercial dbms's usually only give restore up to 5 minutes before the
+ ^^^^^^^^^^^^^^^^^^^^^^^
+I'm sure that this is not true!
+If the on-line redo file is damaged, then you have only
+one option: restore your last backup.
+In all other cases the database will be recovered up to the last
+committed transaction automatically!
+
+DBMSs using WAL have to fsync only the redo file on commit
+(and they do it!); non-overwriting systems have to
+fsync the data files and the transaction log.
+
+We could optimize fsyncs for a multi-user environment: do not
+fsync when we can be sure that our changes were already flushed
+to disk by another backend.
+
+> crash, and this is what I am suggesting. If we can do this, we can
+> remove the no-fsync option.
+>
+...
+>
+> On postmaster startup, all transactions are checked and any transaction
+> that is marked as committed but not 'been_synced' is marked as not
+> committed. In this way, we prevent non-synced or partially synced
+> transactions from being used.
+
+And what should users (assured that their transactions were
+committed) do in this case?
+
+Vadim
+
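A minimal sketch of the optimization suggested above (skip the fsync when another backend has already flushed our changes), written in Python rather than PostgreSQL's C. The `RedoLog` class, its method names, and the `sync_fn` callback that stands in for fsync() on the redo file are all hypothetical names chosen for illustration; none of this is PostgreSQL code.

```python
import threading


class RedoLog:
    """Toy in-memory redo log shared by several backends (hypothetical)."""

    def __init__(self, sync_fn):
        self._lock = threading.Lock()
        self._records = []        # records appended so far
        self._flushed_upto = 0    # how many records are known to be durable
        self._sync_fn = sync_fn   # assumption: stands in for fsync() on the log

    def append(self, record):
        """Append a record and return its 1-based position in the log."""
        with self._lock:
            self._records.append(record)
            return len(self._records)

    def commit(self, position):
        """Make the log durable up to `position`; return True if we synced."""
        with self._lock:
            if position <= self._flushed_upto:
                # Another backend's sync already covered our record.
                return False
            # One sync makes everything appended so far durable.
            self._sync_fn()
            self._flushed_upto = len(self._records)
            return True
```

If two backends append their commit records and the second one commits first, its single sync also covers the first backend, whose own `commit()` call then returns without syncing.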
+From owner-pgsql-hackers@hub.org Tue Nov 4 06:43:00 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA19743
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 06:42:57 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id GAA10352; Tue, 4 Nov 1997 06:36:08 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 06:35:42 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id GAA10158 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 06:35:37 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id GAA10096 for <hackers@postgreSQL.org>; Tue, 4 Nov 1997 06:35:27 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id GAA19665;
+ Tue, 4 Nov 1997 06:35:10 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711041135.GAA19665@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: wieck@sapserv.debis.de
+Date: Tue, 4 Nov 1997 06:35:10 -0500 (EST)
+Cc: hackers@postgreSQL.org (PostgreSQL-development)
+In-Reply-To: <m0xSd44-000BFQC@orion.SAPserv.Hamburg.dsh.de> from "wieck@sapserv.debis.de" at Nov 4, 97 08:06:16 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> > OK, here is a more formal description of what I am suggesting. It will
+> > give us commercial dbms reliability with no-fsync performance.
+> > Commercial dbms's usually only give restore up to 5 minutes before the
+> > crash, and this is what I am suggesting. If we can do this, we can
+> > remove the no-fsync option.
+>
+> I'm not 100% sure but as far as I know Oracle, it can recover
+> up to the last committed transaction using the online redo
+> logs. And even if commercial dbms's aren't able to do that,
+> it should be our target.
+>
+> > [description about transaction queue]
+>
+> This all depends on the fact that PostgreSQL is a no
+> overwrite dbms. Otherwise the space of deleted tuples might
+> get overwritten by later transactions and the information is
+> finally lost.
+>
+> Another issue: All we have thought of up to now are crashes where
+> the database files are still usable after restart. But take
+> the simple case of a write error. A new bad block or track
+> will get remapped (in some way) but the data in it is lost.
+> So we end up with one or more totally corrupted database
+> files. And I don't trust mirrored disks farther than I can
+> throw them. A bug in the OS or a memory failure (many new
+> PeeCee boards don't support parity and even with parity a two
+> bit failure is still the wrong data but with a valid parity
+> bit) can also corrupt the data.
+>
+> I still prefer redo logs. They should reside on a different
+> disk and the possibility of losing the database files along
+> with the redo log is very small.
+
+I have been thinking about re-do logs, and I think it is a good idea.
+It would not be hard to have the queries spit out to a separate file
+configurable by the user.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Tue Nov 4 07:31:01 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22051
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:30:59 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07444 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:25:14 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08818; Tue, 4 Nov 1997 07:03:30 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:02:44 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08418 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:29 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08331 for <hackers@postgreSQL.org>; Tue, 4 Nov 1997 07:02:07 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id GAA21484;
+ Tue, 4 Nov 1997 06:50:24 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711041150.GAA21484@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Tue, 4 Nov 1997 06:50:24 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 04:14:05 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > OK, here is a more formal description of what I am suggesting. It will
+> > give us commercial dbms reliability with no-fsync performance.
+> > Commercial dbms's usually only give restore up to 5 minutes before the
+> ^^^^^^^^^^^^^^^^^^^^^^^
+> I'm sure that this is not true!
+
+You may be right. This five minute figure is when you restore from your
+previous backup, then restore from the log file.
+
+Can't we do something like sync every 5 seconds, rather than after every
+transaction? It just seems like such overkill.
+
+Actually, I found a problem with my description. Because pg_log is not
+fsync'ed, after a crash, pages with new transactions could have been
+flushed to disk, but not the pg_log table that contains the transaction
+ids. The problem is that the new backend could assign a transaction id
+that is already in use.
+
+We could set a flag upon successful shutdown, and if it is not set on
+reboot, either do a vacuum to find the max transaction id and
+invalidate all those not marked in pg_log as synced, or increase the next
+transaction id to some huge number and invalidate all those in between.
+
+
+> If on-line redo_file is damaged then you have
+> single ability: restore your last backup.
+> In all other cases database will be recovered up to the last
+> committed transaction automatically!
+>
+> DBMS-s using WAL have to fsync only redo file on commit
+> (and they do it!), non-overwriting systems have to
+> fsync data files and transaction log.
+>
+> We could optimize fsync-s for multi-user environment: do not
+> fsync when we're ensured that our changes flushed to disk by
+> another backend.
+>
+> > crash, and this is what I am suggesting. If we can do this, we can
+> > remove the no-fsync option.
+> >
+> ...
+> >
+> > On postmaster startup, all transactions are checked and any transaction
+> > that is marked as committed but not 'been_synced' is marked as not
+> > committed. In this way, we prevent non-synced or partially synced
+> > transactions from being used.
+>
+> And what should users (ensured that their transaction are
+> committed) do in this case ?
+>
+> Vadim
+>
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From wieck@sapserv.debis.de Tue Nov 4 07:01:00 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA21697
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:00:58 -0500 (EST)
+From: wieck@sapserv.debis.de
+Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by renoir.op.net (o1/$ Revision: 1.14 $) with SMTP id GAA06401 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 06:48:25 -0500 (EST)
+Received: by orion.SAPserv.Hamburg.dsh.de
+ (Linux Smail3.1.29.1 #1)}
+ id m0xShVQ-000BGZC; Tue, 4 Nov 97 12:50 MET
+Message-Id: <m0xShVQ-000BGZC@orion.SAPserv.Hamburg.dsh.de>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Tue, 4 Nov 1997 12:50:45 +0100 (MET)
+Cc: wieck@sapserv.debis.de, hackers@postgreSQL.org
+Reply-To: wieck@sapserv.debis.de (Jan Wieck)
+In-Reply-To: <199711041135.GAA19665@candle.pha.pa.us> from "Bruce Momjian" at Nov 4, 97 06:35:10 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=iso-8859-1
+Content-Transfer-Encoding: 8bit
+Status: OR
+
+
+Bruce Momjian wrote:
+> I have been thinking about re-do logs, and I think it is a good idea.
+> It would not be hard to have the queries spit out to a separate file
+> configurable by the user.
+
+ This way the recovery process will be very complicated. When
+ multiple backends run concurrently, there are multiple
+ transactions active at the same time. And what tuples are
+ affected by an update e.g. depends much on the timing.
+
+ I had something different in mind. The redo log contains the
+ information from the executor (e.g. the transactionId, the
+ tupleId and the new tuple values when calling ExecReplace())
+ and the information about which transactions commit and which
+ do not. When recovering, those operations whose transactions
+ committed are passed again to the executor's functions that do
+ the real updates with the values from the logfile.
+
+
+Until later, Jan
+
+--
+#define OPINIONS "they are all mine - not those of debis or daimler-benz"
+
+#======================================================================#
+# It's easier to get forgiveness for being wrong than for being right. #
+# Let's break this rule - forgive me. #
+#================================== wieck@sapserv.debis.de (Jan Wieck) #
+
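The recovery scheme described in the message above can be sketched as follows. This is only an illustration in Python under stated assumptions: the record layout (`"replace"`, `"commit"`, `"abort"` tuples) and the `replay` function are hypothetical, and a plain dict stands in for the table that the executor's functions would update.

```python
def replay(redo_log, table):
    """Re-apply logged tuple replacements, but only for committed transactions.

    redo_log: list of records in the order they were written, each one of
        ("replace", xid, tuple_id, new_values)   # hypothetical layout
        ("commit", xid)
        ("abort", xid)
    table: dict mapping tuple_id -> values; stands in for the real relation.
    """
    # First pass: find out which transactions committed.
    committed = {rec[1] for rec in redo_log if rec[0] == "commit"}

    # Second pass: redo the operations of committed transactions in log order.
    for rec in redo_log:
        if rec[0] == "replace" and rec[1] in committed:
            _, _xid, tuple_id, new_values = rec
            table[tuple_id] = new_values
    return table
```

Because the log stores the tuple id and the new values rather than the query text, replay does not depend on how concurrent transactions happened to interleave at run time.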
+
+
+From owner-pgsql-hackers@hub.org Tue Nov 4 07:30:59 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22048
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:30:57 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07189 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:18:02 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08856; Tue, 4 Nov 1997 07:03:37 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:03:03 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08487 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:46 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08192 for <hackers@postgreSQL.org>; Tue, 4 Nov 1997 07:02:02 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id HAA21653;
+ Tue, 4 Nov 1997 07:00:20 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711041200.HAA21653@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Tue, 4 Nov 1997 07:00:19 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 04:14:05 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > OK, here is a more formal description of what I am suggesting. It will
+> > give us commercial dbms reliability with no-fsync performance.
+> > Commercial dbms's usually only give restore up to 5 minutes before the
+> ^^^^^^^^^^^^^^^^^^^^^^^
+> I'm sure that this is not true!
+> If on-line redo_file is damaged then you have
+> single ability: restore your last backup.
+> In all other cases database will be recovered up to the last
+> committed transaction automatically!
+
+I doubt commercial dbms's sync to disk after every transaction. They
+pick a time, maybe five seconds, and see that all dirty pages get flushed
+by then.
+
+What they do do is to make certain that you are restored to a consistent
+state, perhaps 15 seconds ago.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Tue Nov 4 07:32:45 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22066
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:32:35 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20889; Tue, 4 Nov 1997 19:35:12 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345F1680.60E33853@sable.krasnoyarsk.su>
+Date: Tue, 04 Nov 1997 19:35:12 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Jan Wieck <wieck@sapserv.debis.de>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, marc@fallon.classyad.com,
+ hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <m0xSd44-000BFQC@orion.SAPserv.Hamburg.dsh.de>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+wieck@sapserv.debis.de wrote:
+>
+> I still prefer redo logs. They should reside on a different
+> disk and the possibility of losing the database files along
+> with the redo log is very small.
+
+Agreed. This way we could skip fsyncing the data files and
+fsync only the redo log and pg_log. This is much faster.
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Tue Nov 4 08:00:58 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA22371
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 08:00:56 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA08540 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 07:57:25 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20935; Tue, 4 Nov 1997 19:59:46 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345F1C42.1F1A7590@sable.krasnoyarsk.su>
+Date: Tue, 04 Nov 1997 19:59:46 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Jan Wieck <wieck@sapserv.debis.de>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <m0xShVQ-000BGZC@orion.SAPserv.Hamburg.dsh.de>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+wieck@sapserv.debis.de wrote:
+>
+> Bruce Momjian wrote:
+> > I have been thinking about re-do logs, and I think it is a good idea.
+> > It would not be hard to have the queries spit out to a separate file
+> > configurable by the user.
+>
+> This way the recovery process will be very complicated. When
+> multiple backends run concurrently, there are multiple
+> transactions active at the same time. And what tuples are
+> affected by an update e.g. depends much on the timing.
+>
+> I had something different in mind. The redo log contains the
+> information from the executor (e.g. the transactionId, the
+> tupleId and the new tuple values when calling ExecReplace())
+> and the information which transactions commit and which not.
+> When recovering, those operations where the transactions
+> committed are again passed to the executors functions that do
+> the real updates with the values from the logfile.
+
+It seems that this is what Oracle does, but Sybase writes queries
+(with transaction ids, of course, and before execution) and
+begin, commit/abort events <-- this is better for a non-overwriting
+system (shorter redo file), but, agreed, recovery is more complicated.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Tue Nov 4 22:35:45 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA05060
+ for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 22:35:43 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA26725 for <maillist@candle.pha.pa.us>; Tue, 4 Nov 1997 22:35:10 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA27875; Tue, 4 Nov 1997 22:23:14 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 22:20:55 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA24162 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 22:20:50 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA22727 for <hackers@postgreSQL.org>; Tue, 4 Nov 1997 22:20:18 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id WAA04674;
+ Tue, 4 Nov 1997 22:17:52 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711050317.WAA04674@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Tue, 4 Nov 1997 22:17:52 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <345F14E7.28CC1042@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 07:28:23 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > >
+> > > Bruce Momjian wrote:
+> > > >
+> > > > OK, here is a more formal description of what I am suggesting. It will
+> > > > give us commercial dbms reliability with no-fsync performance.
+> > > > Commercial dbms's usually only give restore up to 5 minutes before the
+> > > ^^^^^^^^^^^^^^^^^^^^^^^
+> > > I'm sure that this is not true!
+> >
+> > You may be right. This five minute figure is when you restore from your
+> > previous backup, then restore from the log file.
+> >
+> > Can't we do something like sync every 5 seconds, rather than after every
+> > transaction? It just seems like such overkill.
+>
+> Isn't -F and sync in crontab the same ?
+
+OK, let me again try to marshall some (any?) support for my suggestion.
+
+Informix version 5/7 has three levels of logging: unbuffered
+logging (our normal fsync mode), buffered logging, and no logging (our
+no-fsync mode).
+
+We don't have buffered logging. Buffered logging guarantees you get put
+back to a consistent state after an os/server crash, usually to within
+30/90 seconds. You do not have any partial transactions lying around,
+but you do have some transactions that you thought were done, but are
+not.
+
+This is faster than non-buffered logging, but not as fast as no logging.
+Guess what mode everyone uses? The one we don't have, buffered logging!
+
+Unbuffered logging performance is terrible. No logging is
+used to load huge chunks of data during off-hours.
+
+The problem we have is that we fsync every transaction, which causes a
+9-times slowdown in performance on single-integer inserts.
+
+That is a pretty heavy cost. But the alternative we give people is
+no-fsync mode, where we don't sync anything. In a crash, you could
+come back with partially committed data in your database if pg_log was
+sync'ed by the database but only some of the data pages were sync'ed.
+So if any data was changing within 30 seconds of the crash, you have to
+restore your previous backup.
+
+We really need a middle solution, that gives better data integrity, for
+a smaller price.
+
+>
+> >
+> > Actually, I found a problem with my description. Because pg_log is not
+> > fsync'ed, after a crash, pages with new transactions could have been
+> > flushed to disk, but not the pg_log table that contains the transaction
+> > ids. The problem is that the new backend could assign a transaction id
+> > that is already in use.
+>
+> Impossible. Backend flushes pg_variable after fetching next 32 xids.
+
+My suggestion is that we don't need to flush pg_variable or pg_log that
+much. My suggestion would speed up the test you do with 100 inserts
+inside a single transaction vs. 100 separate inserts.
+
+> >
+> > We could set a flag upon successful shutdown, and if it is not set on
+> > reboot, either do a vacuum to find the max transaction id, and
+> > invalidate all them not in pg_log as synced, or increase the next
+> > transaction id to some huge number and invalidate all them in between.
+> >
+
+I have a fix for the problem stated above, and it doesn't require a
+vacuum.
+
+We decide to fsync pg_variable and pg_log every 10,000 transactions or
+oids. Then if the database is brought up, and it was not brought down
+cleanly, you increment oid and transaction_id by 10,000, because you
+know you couldn't have gotten more than that. All intermediate
+transactions that are not marked committed/synced are marked aborted.
+
+---------------------------------------------------------------------------
+
+The problem we have with the current system is that we sync by action,
+not by time interval. If you are doing tons of inserts or updates, it
+is syncing after every one. What people really want is something that
+will sync not after every action, but after every minute or five
+minutes, so when the system is busy, syncing every minute is just a
+small cost, and when the system is idle, no one cares if it syncs, and
+no one has to wait for the sync to complete.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
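The skip-ahead fix proposed in the message above can be illustrated with a short Python sketch. The function name `recover`, the `SYNC_INTERVAL` constant, and the set-based bookkeeping are assumptions made for this example; they do not describe how pg_variable or pg_log are actually stored.

```python
SYNC_INTERVAL = 10_000  # counters are fsync'ed once per this many xids


def recover(persisted_next_xid, clean_shutdown, synced_committed):
    """Return (next xid to hand out, set of xids to mark aborted).

    persisted_next_xid: the counter value that last reached disk.
    clean_shutdown:     True if the shutdown flag was set.
    synced_committed:   xids marked both committed and synced on disk.
    """
    if clean_shutdown:
        # The on-disk counter is exact; nothing to invalidate.
        return persisted_next_xid, set()

    # After a crash, at most SYNC_INTERVAL xids can have been handed out
    # since the last sync, so jumping ahead by that much cannot reuse one.
    new_next_xid = persisted_next_xid + SYNC_INTERVAL
    aborted = {
        xid
        for xid in range(persisted_next_xid, new_next_xid)
        if xid not in synced_committed
    }
    return new_next_xid, aborted
```

The cost of the scheme is that up to `SYNC_INTERVAL` ids are burned on every unclean restart, in exchange for not scanning the data files to find the highest id in use.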
+
+From matti@algonet.se Wed Nov 5 11:02:33 1997
+Received: from smtp.algonet.se (tomei.algonet.se [194.213.74.114])
+ by candle.pha.pa.us (8.8.5/8.8.5) with SMTP id LAA02099
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 11:02:28 -0500 (EST)
+Received: (qmail 6685 invoked from network); 5 Nov 1997 17:01:06 +0100
+Received: from du228-6.ppp.algonet.se (HELO gamma) (root@195.100.6.228)
+ by tomei.algonet.se with SMTP; 5 Nov 1997 17:01:06 +0100
+Sender: root
+Message-ID: <34609871.27EED9D@algonet.se>
+Date: Wed, 05 Nov 1997 17:02:16 +0100
+From: Mattias Kregert <matti@algonet.se>
+Organization: Algonet ISP
+X-Mailer: Mozilla 3.0Gold (X11; I; Linux 2.0.29 i586)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: pgsql-hackers@postgresql.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711050317.WAA04674@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> We don't have buffered logging. Buffered logging guarantees you get put
+> back to a consistent state after an os/server crash, usually to within
+> 30/90 seconds. You do not have any partial transactions lying around,
+> but you do have some transactions that you thought were done, but are
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> not.
+ ^^^^
+>
+> This is faster than non-buffered logging, but not as fast as no logging.
+> Guess what mode everyone uses? The one we don't have, buffered logging!
+
+Ouch! I would *not* like to use "buffered logging".
+What's the point in having the wrong data in the database and not
+knowing what updates, inserts or deletes to do to get the correct data?
+
+That's irrecoverable loss of data. Not what *I* want. Do *you* want it?
+
+
+> We really need a middle solution, that gives better data integrity, for
+> a smaller price.
+
+What I would like to have is this:
+
+If a backend tells the frontend that a transaction has completed,
+then that transaction should absolutely not get lost in case of a crash.
+
+What is needed is a log of changes since the last backup. This
+log would preferably reside on a remote machine or at least
+another disk. Then, if the power goes in the middle of a disk write,
+the disk explodes and the computer goes up in flames, you can
+install PostgreSQL on a new machine, restore the last backup and
+re-run the change log.
+
+
+> The problem we have with the current system is that we sync by action,
+> not by time interval. If you are doing tons of inserts or updates, it
+> is syncing after every one. What people really want is something that
+> will sync not after every action, but after every minute or five
+> minutes, so when the system is busy, syncing every minute is just a
+> small cost, and when the system is idle, no one cares if it syncs, and
+> no one has to wait for the sync to complete.
+
+Yes, but this would only be the first step on the way to better
+crash-recovery.
+
+/* m */
+
+From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:23 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05156
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 12:20:13 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24123 for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 11:44:49 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id XAA23062; Wed, 5 Nov 1997 23:48:52 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <3460A374.41C67EA6@sable.krasnoyarsk.su>
+Date: Wed, 05 Nov 1997 23:48:52 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711050317.WAA04674@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, let me again try to marshall some (any?) support for my suggestion.
+>
+> Informix version 5/7 has three levels of logging: unbuffered
+> logging(our normal fsync mode), buffered logging, and no logging(our no
+> fsync mode).
+>
+> We don't have buffered logging. Buffered logging guarantees you get put
+> back to a consistent state after an os/server crash, usually to within
+> 30/90 seconds. You do not have any partial transactions lying around,
+> but you do have some transactions that you thought were done, but are
+> not.
+>
+> This is faster than non-buffered logging, but not as fast as no logging.
+> Guess what mode everyone uses? The one we don't have, buffered logging!
+>
+> Unbuffered logging performance is terrible. Non-buffered logging is
+> used to load huge chunks of data during off-hours.
+>
+> The problem we have is that we fsync every transaction, which causes a
+> 9-times slowdown in performance on single-integer inserts.
+>
+> That is a pretty heavy cost. But the alternative we give people is
+> no-fsync mode, where we don't sync anything, and in a crash, you could
+> come back with partially committed data in your database, if pg_log was
+> sync'ed by the database, and only some of the data pages were sync'ed,
+> so if any data was changing within 30 seconds of the crash, you have to
+> restore your previous backup.
+>
+> We really need a middle solution, that gives better data integrity, for
+> a smaller price.
+
+There is no fsync synchronization currently.
+How can we be sure that all modified data pages have been flushed
+when we decide to flush pg_log?
+If a backend doesn't fsync data pages and pg_log at commit time,
+then when must it flush them (data first)?
+
+This is what Oracle does:
+
+it uses a dedicated DBWR process for writing/flushing modified
+data pages and an LGWR process for writing/flushing the redo log
+(the redo log is also the transaction log). LGWR always flushes log pages
+when committing, but dirty data pages can be flushed _after_ transaction
+commit, when DBWR decides that it's time to do it (a la checkpoint interval).
+
+Using a redo log we could implement buffered logging quite easily.
+We could even do without dedicated processes (but flush redo before pg_log),
+though having LGWR could simplify things.
+
+Without a redo log or some fsync synchronization we can't implement
+buffered logging. BTW, a shared system cache could help with
+fsync synchronization, but, imho, redo is better (and faster for
+un-buffered logging too).
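+
+As an illustration of the redo-log ordering described above, here is a
+minimal sketch in Python. It is not Oracle's or PostgreSQL's code: the
+class RedoDatabase and its method names are hypothetical, and plain
+dicts and lists stand in for disk and memory.
+
```python
class RedoDatabase:
    """Toy model: redo log forced at commit, data pages written lazily."""

    def __init__(self):
        self.redo_log = []    # "on disk": survives a crash
        self.disk_pages = {}  # "on disk": survives a crash
        self.cache = {}       # in memory: lost in a crash

    def commit(self, changes):
        # LGWR's job: the redo record is durable before commit returns.
        self.redo_log.append(dict(changes))
        # The data pages are only dirtied in memory for now.
        self.cache.update(changes)

    def checkpoint(self):
        # DBWR's job: write dirty pages; older redo is then no longer needed.
        self.disk_pages.update(self.cache)
        self.redo_log.clear()

    def crash_and_recover(self):
        # Memory is gone; rebuild it from data pages plus redo replay.
        self.cache = dict(self.disk_pages)
        for changes in self.redo_log:
            self.cache.update(changes)
        return self.cache
```
+
+In this model a commit only has to force the sequential redo log, while
+data pages reach disk whenever the checkpoint runs.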
+
+> > > Actually, I found a problem with my description. Because pg_log is not
+> > > fsync'ed, after a crash, pages with new transactions could have been
+> > > flushed to disk, but not the pg_log table that contains the transaction
+> > > ids. The problem is that the new backend could assign a transaction id
+> > > that is already in use.
+> >
+> > Impossible. Backend flushes pg_variable after fetching next 32 xids.
+>
+> My suggestion is that we don't need to flush pg_variable or pg_log that
+> much. My suggestion would speed up the test you do with 100 inserts
+> inside a single transaction vs. 100 separate inserts.
+>
+> > >
+> > > We could set a flag upon successful shutdown, and if it is not set on
+> > > reboot, either do a vacuum to find the max transaction id, and
+> > > invalidate all those not marked in pg_log as synced, or increase the next
+> > > transaction id to some huge number and invalidate all those in between.
+> > >
+>
+> I have a fix for the problem stated above, and it doesn't require a
+> vacuum.
+>
+> We decide to fsync pg_variable and pg_log every 10,000 transactions or
+> oids. Then if the database is brought up, and it was not brought down
+> cleanly, you increment oid and transaction_id by 10,000, because you
+> know you couldn't have gotten more than that. All intermediate
+> transactions that are not marked committed/synced are marked aborted.
+
+This is what I propose to do by placing the next available oid/xid
+in shmem: this allows pre-fetching many more than 32 ids at once
+without losing them when a session closes.
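+
+The batching idea quoted above (persist the counter only every N ids and
+jump ahead by N after an unclean shutdown) can be sketched as follows.
+This is an illustrative model only: IdAllocator, BATCH, and the dict
+standing in for an fsync'ed pg_variable are hypothetical names.
+
```python
BATCH = 10_000  # assumed batch size, taken from the figure quoted above


class IdAllocator:
    """Hands out ids while persisting the counter once per BATCH ids."""

    def __init__(self, durable, clean_shutdown=True):
        # `durable` stands in for the fsync'ed copy of pg_variable.
        self.durable = durable
        start = durable.get("next_id", 0)
        if not clean_shutdown:
            # After a crash we only know that fewer than BATCH ids were
            # handed out past the durable value, so skip the whole batch.
            start += BATCH
        self.next_id = start
        self._sync()

    def _sync(self):
        # Pretend this write is forced to disk.
        self.durable["next_id"] = self.next_id
        self.last_synced = self.next_id

    def allocate(self):
        if self.next_id - self.last_synced >= BATCH:
            self._sync()
        new_id = self.next_id
        self.next_id += 1
        return new_id
```
+
+The cost of the scheme is a gap of up to BATCH unused ids after each
+crash, in exchange for one forced write per BATCH allocations.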
+
+> The problem we have with the current system is that we sync by action,
+> not by time interval. If you are doing tons of inserts or updates, it
+> is syncing after every one. What people really want is something that
+> will sync not after every action, but after every minute or five
+> minutes, so when the system is busy, syncing every minute is just a
+> small cost, and when the system is idle, no one cares if it syncs, and
+> no one has to wait for the sync to complete.
+
+When I'm really doing tons of inserts/updates/deletes I use
+BEGIN/END. But it doesn't work for a multi-user environment, of course.
+As for what people really want, I remember that recently someone
+said on the user list that if one wants to have 10-20 inserts/sec then he
+should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise
+when using one session, 32 inserts/sec with two sessions inserting
+into two different tables and only 20 inserts/sec with two sessions
+inserting into the same table. Imho, this difference between 20 and 32
+is the more important thing to fix, and these results are not so bad
+in comparison with others.
+
+(BTW, we shouldn't forget about using raw devices to speed up things).
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:08 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05150
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 12:20:07 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24889 for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 11:59:27 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id AAA23096; Thu, 6 Nov 1997 00:03:19 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <3460A6D7.167EB0E7@sable.krasnoyarsk.su>
+Date: Thu, 06 Nov 1997 00:03:19 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Mattias Kregert <matti@algonet.se>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, pgsql-hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711050317.WAA04674@candle.pha.pa.us> <34609871.27EED9D@algonet.se>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Mattias Kregert wrote:
+>
+> Bruce Momjian wrote:
+> >
+> > We don't have buffered logging. Buffered logging guarantees you get put
+> > back to a consistent state after an os/server crash, usually to within
+> > 30/90 seconds. You do not have any partial transactions lying around,
+> > but you do have some transactions that you thought were done, but are
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > not.
+> ^^^^
+> >
+> > This is faster than non-buffered logging, but not as fast as no logging.
+> > Guess what mode everyone uses? The one we don't have, buffered logging!
+>
+> Ouch! I would *not* like to use "buffered logging".
+
+Neither would I.
+
+> What's the point in having the wrong data in the database and not
+> knowing what updates, inserts or deletes to do to get the correct data?
+>
+> That's irrecoverable loss of data. Not what *I* want. Do *you* want it?
+>
+> > We really need a middle solution, that gives better data integrity, for
+> > a smaller price.
+>
+> What I would like to have is this:
+>
+> If a backend tells the frontend that a transaction has completed,
+> then that transaction should absolutely not get lost in case of a crash.
+
+Agreed.
+
+>
+> What is needed is a log of changes since the last backup. This
+> log would preferably reside on a remote machine or at least
+> another disk. Then, if the power goes in the middle of a disk write,
+> the disk explodes and the computer goes up in flames, you can
+> install Postgresql on a new machine, restore the last backup and
+> re-run the change log.
+
+Yes. And as I already said - this will speed things up because
+flushing redo is faster than flushing NNN tables, which can then stay
+unflushed for some interval.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Wed Nov 5 14:01:02 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07017
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 14:00:59 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA01759 for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 13:52:36 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA03611; Wed, 5 Nov 1997 13:29:43 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 13:27:48 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA03291 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 13:27:41 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA02823 for <hackers@postgreSQL.org>; Wed, 5 Nov 1997 13:26:20 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id NAA05863;
+ Wed, 5 Nov 1997 13:16:09 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711051816.NAA05863@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Wed, 5 Nov 1997 13:16:09 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> There is no fsync synchronization currently.
+> How can we be sure that all modified data pages are flushed
+> when we decide to flush pg_log?
+> If the backend doesn't fsync data pages & pg_log at commit time,
+> then when must it flush them (data first)?
+
+My idea was to have the backend do a 'sync' that causes the OS to sync
+all dirty pages, then mark all committed transactions on pg_log as
+'synced'. Then sync pg_log. That way, there is a clear system where we
+know everything is flushed to disk, and we mark the transactions as
+synced.
+
+The only time that synced flag is used, is when the database starts up,
+and it sees that the previous shutdown was not clean.
+
+What am I missing here?
+
+>
+> This is what Oracle does:
+>
+> it uses a dedicated DBWR process for writing/flushing modified
+> data pages and an LGWR process for writing/flushing the redo log
+> (the redo log is also the transaction log). LGWR always flushes log pages
+> when committing, but dirty data pages can be flushed _after_ transaction
+> commit, when DBWR decides that it's time to do it (a la checkpoint interval).
+>
+> Using a redo log we could implement buffered logging quite easily.
+> We could even do without dedicated processes (but flush redo before pg_log),
+> though having LGWR could simplify things.
+>
+> Without a redo log or some fsync synchronization we can't implement
+> buffered logging. BTW, a shared system cache could help with
+> fsync synchronization, but, imho, redo is better (and faster for
+> un-buffered logging too).
+>
+
+I suggested my solution because it is clean, does flushing in one
+central location(postmaster), and does quick restores.
+
+> > > > Actually, I found a problem with my description. Because pg_log is not
+> > > > fsync'ed, after a crash, pages with new transactions could have been
+> > > > flushed to disk, but not the pg_log table that contains the transaction
+> > > > ids. The problem is that the new backend could assign a transaction id
+> > > > that is already in use.
+> > >
+> > > Impossible. Backend flushes pg_variable after fetching next 32 xids.
+> >
+> > My suggestion is that we don't need to flush pg_variable or pg_log that
+> > much. My suggestion would speed up the test you do with 100 inserts
+> > inside a single transaction vs. 100 separate inserts.
+> >
+> > > >
+> > > > We could set a flag upon successful shutdown, and if it is not set on
+> > > > reboot, either do a vacuum to find the max transaction id, and
+> > > > invalidate all those not marked in pg_log as synced, or increase the next
+> > > > transaction id to some huge number and invalidate all those in between.
+> > > >
+> >
+> > I have a fix for the problem stated above, and it doesn't require a
+> > vacuum.
+> >
+> > We decide to fsync pg_variable and pg_log every 10,000 transactions or
+> > oids. Then if the database is brought up, and it was not brought down
+> > cleanly, you increment oid and transaction_id by 10,000, because you
+> > know you couldn't have gotten more than that. All intermediate
+> > transactions that are not marked committed/synced are marked aborted.
+>
+> This is what I propose to do by placing the next available oid/xid
+> in shmem: this allows pre-fetching many more than 32 ids at once
+> without losing them when a session closes.
+>
+> > The problem we have with the current system is that we sync by action,
+> > not by time interval. If you are doing tons of inserts or updates, it
+> > is syncing after every one. What people really want is something that
+> > will sync not after every action, but after every minute or five
+> > minutes, so when the system is busy, syncing every minute is just a
+> > small cost, and when the system is idle, no one cares if it syncs, and
+> > no one has to wait for the sync to complete.
+>
+> When I'm really doing tons of inserts/updates/deletes I use
+> BEGIN/END. But it doesn't work for a multi-user environment, of course.
+> As for what people really want, I remember that recently someone
+> said on the user list that if one wants to have 10-20 inserts/sec then he
+> should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise
+> when using one session, 32 inserts/sec with two sessions inserting
+> into two different tables and only 20 inserts/sec with two sessions
+> inserting into the same table. Imho, this difference between 20 and 32
+> is the more important thing to fix, and these results are not so bad
+> in comparison with others.
+>
+> (BTW, we shouldn't forget about using raw devices to speed up things).
+>
+> Vadim
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From james@blarg.net Wed Nov 5 13:26:46 1997
+Received: from animal.blarg.net (mail@animal.blarg.net [206.114.144.1])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA06130
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 13:26:26 -0500 (EST)
+Received: from animal.blarg.net (james@animal.blarg.net [206.114.144.1])
+ by animal.blarg.net (8.8.5/8.8.4) with SMTP
+ id KAA09775; Wed, 5 Nov 1997 10:26:10 -0800
+Date: Wed, 5 Nov 1997 10:26:10 -0800 (PST)
+From: "James A. Hillyerd" <james@blarg.net>
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+cc: Mattias Kregert <matti@algonet.se>, pgsql-hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+In-Reply-To: <199711051615.LAA02260@candle.pha.pa.us>
+Message-ID: <Pine.LNX.3.95.971105102332.6252E-100000@animal.blarg.net>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Status: OR
+
+On Wed, 5 Nov 1997, Bruce Momjian wrote:
+>
+> The strange thing I am hearing is that the people who use PostgreSQL are
+> more worried about data recovery from a crash than million-dollar
+> companies that use commercial databases.
+>
+
+If I may throw in my 2 cents, I'd prefer to see that database in a
+consistent state, with the data being up to date as of 1 minute or
+less before the crash. I'd rather have higher performance than up to the
+second data.
+
+-james
+
+[ James A. Hillyerd (JH2162) - james@blarg.net - Web Developer ]
+[ http://www.blarg.net/~james/ http://www.hyperglyphics.com/ ]
+[ 1024/B11C3751 CA 1C B3 A9 07 2F 57 C9 91 F4 73 F2 19 A4 C5 88 ]
+
+
+From vadim@sable.krasnoyarsk.su Wed Nov 5 14:24:03 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07830
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 14:24:02 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA02778 for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 14:13:45 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id CAA23376; Thu, 6 Nov 1997 02:17:51 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <3460C65E.446B9B3D@sable.krasnoyarsk.su>
+Date: Thu, 06 Nov 1997 02:17:50 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711051816.NAA05863@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > There is no fsync synchronization currently.
+> > How can we be sure that all modified data pages are flushed
+> > when we decide to flush pg_log?
+> > If the backend doesn't fsync data pages & pg_log at commit time,
+> > then when must it flush them (data first)?
+>
+> My idea was to have the backend do a 'sync' that causes the OS to sync
+> all dirty pages, then mark all committed transactions on pg_log as
+> 'synced'. Then sync pg_log. That way, there is a clear system where we
+> know everything is flushed to disk, and we mark the transactions as
+> synced.
+>
+> The only time that synced flag is used, is when the database starts up,
+> and it sees that the previous shutdown was not clean.
+>
+> What am I missing here?
+
+Ok, I see. But we can avoid the 'synced' flag: just before
+sync-ing data pages, we can make in-memory copies of the "on-line" dirty
+pg_log pages that are to be written/fsynced, and perform the write/fsync
+from these copies without stopping new commits in the "on-line" page(s)
+(nothing must go to disk from the "on-line" log pages).
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Wed Nov 5 14:32:25 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA08101
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 14:32:21 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id OAA22970; Wed, 5 Nov 1997 14:26:47 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:24:59 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA22344 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:24:56 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA22319 for <hackers@postgreSQL.org>; Wed, 5 Nov 1997 14:24:38 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id OAA07661;
+ Wed, 5 Nov 1997 14:22:46 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711051922.OAA07661@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Wed, 5 Nov 1997 14:22:45 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Just a clarification. When I say the postmaster issues a sync, I mean
+sync(2), not fsync(2).
+
+The sync flushes all dirty pages on all file systems. Ordinary users
+can issue this, and update usually does this every 30 seconds anyway.
+
+By using this, we let the kernel figure out which buffers are dirty. We
+don't have to figure this out in the postmaster.
+
+Then we update the pg_log table to mark those transactions as synced.
+On recovery from a crash, we mark the committed transactions as
+uncommitted if they do not have the synced flag.
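+
+The sync-flag scheme above can be sketched as a toy model in Python.
+The names here (PgLog, periodic_sync, recover_after_crash) are
+hypothetical and chosen for illustration; os.sync() is Python's wrapper
+for sync(2) on Unix, and the sketch lets a caller substitute another
+function for it.
+
```python
import os

ABORTED, COMMITTED = "aborted", "committed"


class PgLog:
    """Toy transaction log mapping xid -> [status, synced_flag]."""

    def __init__(self):
        self.entries = {}

    def commit(self, xid):
        # The commit is recorded at once but not yet known to be on disk.
        self.entries[xid] = [COMMITTED, False]

    def periodic_sync(self, sync=None):
        if sync is None:
            sync = os.sync  # Unix only; flushes all dirty pages
        # Snapshot first: only commits made before the sync may be marked.
        pending = [xid for xid, (status, synced) in self.entries.items()
                   if status == COMMITTED and not synced]
        sync()
        for xid in pending:
            self.entries[xid][1] = True
        # A real system would now force pg_log itself to disk.

    def recover_after_crash(self):
        # Unsynced commits may have only partial data on disk.
        for entry in self.entries.values():
            if entry[0] == COMMITTED and not entry[1]:
                entry[0] = ABORTED
```
+
+Taking the snapshot before calling sync matters: a transaction that
+commits while the sync is running must wait for the next round before
+it can be marked.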
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Wed Nov 5 15:11:07 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA08751
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 15:10:59 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA01986; Wed, 5 Nov 1997 15:01:24 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:59:32 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA01414 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:59:28 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA01403 for <hackers@postgreSQL.org>; Wed, 5 Nov 1997 14:59:14 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id OAA08283;
+ Wed, 5 Nov 1997 14:53:55 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711051953.OAA08283@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Wed, 5 Nov 1997 14:53:54 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <3460C65E.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 6, 97 02:17:50 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > The only time that synced flag is used, is when the database starts up,
+> > and it sees that the previous shutdown was not clean.
+> >
+> > What am I missing here?
+>
+> Ok, I see. But we can avoid the 'synced' flag: just before
+> sync-ing data pages, we can make in-memory copies of the "on-line" dirty
+> pg_log pages that are to be written/fsynced, and perform the write/fsync
+> from these copies without stopping new commits in the "on-line" page(s)
+> (nothing must go to disk from the "on-line" log pages).
+
+[Working late tonight?]
+
+OK, now I am lost. We need the sync'ed flag so when we start the
+postmaster, and we see the database was not shut down properly, we use
+the flag to clear the commit flag from committed transactions that were
+not sync'ed by the postmaster.
+
+In my opinion, we don't need any extra copies of pg_log, we can set
+those sync'ed flags while others are making changes, because before we
+did our sync, we gathered a list of committed transaction ids from the
+shared transaction id queue that I mentioned a while ago.
+
+We need this queue so we can find the newly-committed transactions that
+do not have a sync flag. Another way we could do this would be to scan
+pg_log before we sync, getting all the committed transaction ids without
+sync flags. No lock is needed on the table. If we miss some new ones,
+we will get them next time we scan. The problem I saw is that there is
+no way to see when to stop scanning the pg_log table for such
+transactions, so I thought each backend would have to put its newly
+committed transactions in a separate place. Maybe I am wrong.
+
+This syncing method just seems so natural since we have pg_log. That is
+why I keep bringing it up until people tell me I am stupid.
+
+This transaction commit/sync stuff is complicated, and takes a while to
+hash out in a group.
+
+---------------------------------------------------------------------------
+
+I just re-read your description, and I see what you are saying. My idea
+has the pg_log commit flags be real commit flags while the system is running,
+but on reboot after failure, we remove the commit flags on non-synced
+stuff before we start up.
+
+Your idea is to make pg_log commit flags only appear in in-memory copies
+of pg_log, and write the commit flags to disk only after the sync is
+done.
+
+Either way will work. The question is, "Which is easier?" The OS is
+going to sync pg_log on its own. We would almost need a second copy of
+pg_log, one copy to be used on postmaster startup, and a second to be
+used by running backends, and the postmaster would make a copy of the
+running backend pg_log, sync the disks, and copy it to the boot copy.
+
+I don't see how the backend is going to figure out which pg_log pages
+were modified and need to be sent to the boot copy of pg_log.
+
+Now that I am thinking, here is a good idea. Instead of a fancy
+transaction queue, what if we just have each backend record the
+lowest-numbered transaction it commits in a shared memory area. If the
+transaction id it is committing is greater than the recorded minimum,
+it changes nothing. That way, the backend could copy all pg_log pages
+containing that minimum pg_log transaction id up to the most recent
+pg_log page, do the sync, and copy just those to the boot copy of
+pg_log.
+
+This eliminates the transaction id queue.
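+
+A minimal sketch of this minimum-xid bookkeeping, in Python. Everything
+named here is hypothetical: CommitWatermark, take_pages, and in
+particular XIDS_PER_PAGE, which assumes 8 kB pages holding two status
+bits per transaction purely for illustration.
+
```python
XIDS_PER_PAGE = 8192 * 4  # assumption: 8 kB pages, 2 status bits per xid


class CommitWatermark:
    """Tracks the lowest xid committed since the last sync."""

    def __init__(self):
        self.min_xid = None

    def note_commit(self, xid):
        # Called by each backend at commit; only a lower xid changes state.
        if self.min_xid is None or xid < self.min_xid:
            self.min_xid = xid

    def take_pages(self, latest_xid):
        """Return the pg_log page numbers to copy, then reset the mark."""
        if self.min_xid is None:
            return range(0)  # nothing committed since the last sync
        first = self.min_xid // XIDS_PER_PAGE
        last = latest_xid // XIDS_PER_PAGE
        self.min_xid = None
        return range(first, last + 1)
```
+
+Only the pages in the returned range would need to be copied to the boot
+copy of pg_log after the sync, which is what removes the need for a
+per-transaction queue.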
+
+The nice thing about the sync-flag in pg_log is that there is no copying
+by the backend. But we would have to spin through the file to set those
+sync bits. Your method just copies whole pages to the boot copy.
+
+---------------------------------------------------------------------------
+
+I don't want to force this idea on anyone, or annoy anyone. I just
+think it needs to be considered. The concepts are unusual, so once
+people get the full idea, if they don't like it, we can trash it. I
+still think it holds promise.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From hotz@jpl.nasa.gov Wed Nov 5 15:30:18 1997
+Received: from hotzsun.jpl.nasa.gov (hotzsun.jpl.nasa.gov [137.79.51.138])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA09500
+ for <maillist@candle.pha.pa.us>; Wed, 5 Nov 1997 15:30:16 -0500 (EST)
+Received: from [137.79.51.141] (hotzmac [137.79.51.141]) by hotzsun.jpl.nasa.gov (8.7.6/8.7.3) with SMTP id MAA10100; Wed, 5 Nov 1997 12:29:58 -0800 (PST)
+X-Sender: hotzmail@hotzsun.jpl.nasa.gov
+Message-Id: <v02140b02b0868294bbc1@[137.79.51.141]>
+Mime-Version: 1.0
+Content-Type: text/plain; charset="us-ascii"
+Date: Wed, 5 Nov 1997 12:29:58 -0800
+To: Bruce Momjian <maillist@candle.pha.pa.us>,
+ matti@algonet.se (Mattias Kregert)
+From: hotz@jpl.nasa.gov (Henry B. Hotz)
+Subject: Re: [HACKERS] My $.02, was: PERFORMANCE and Good Bye, Time Travel!
+Cc: pgsql-hackers@postgreSQL.org
+Status: OR
+
+At 11:15 AM 11/5/97, Bruce Momjian wrote:
+>The strange thing I am hearing is that the people who use PostgreSQL are
+>more worried about data recovery from a crash than million-dollar
+>companies that use commercial databases.
+>
+>I don't get it.
+
+I would run PG to make sure that committed transactions were really written
+to disk because that seems "correct" and I don't have the kind of
+performance requirements that would push me to do otherwise.
+
+That said, I can see a need for varying performance/crash-immunity
+tradeoffs, and at least *one* option in between "correct" and "unprotected"
+operation would seem desirable.
+
+Signature failed Preliminary Design Review.
+Feasibility of a new signature is currently being evaluated.
+h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu
+
+
+
+From owner-pgsql-hackers@hub.org Thu Nov 6 15:51:23 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA04634
+ for <maillist@candle.pha.pa.us>; Thu, 6 Nov 1997 15:51:08 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA24783; Thu, 6 Nov 1997 15:36:47 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 15:36:07 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA24514 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 15:36:02 -0500 (EST)
+Received: from guevara.bildbasen.kiruna.se (guevara.bildbasen.kiruna.se [193.45.225.110]) by hub.org (8.8.5/8.7.5) with SMTP id PAA24319 for <pgsql-hackers@postgreSQL.org>; Thu, 6 Nov 1997 15:35:32 -0500 (EST)
+Received: (qmail 9764 invoked by uid 129); 6 Nov 1997 20:34:35 -0000
+Date: 6 Nov 1997 20:34:35 -0000
+Message-ID: <19971106203435.9763.qmail@guevara.bildbasen.kiruna.se>
+From: Goran Thyni <goran@bildbasen.se>
+To: pgsql-hackers@postgreSQL.org
+In-reply-to: <34619E9E.622F563@algonet.se> (message from Mattias Kregert on
+ Thu, 06 Nov 1997 11:40:30 +0100)
+Subject: [HACKERS] Re: Performance vs. Crash Recovery
+Mime-Version: 1.0
+Content-Type: text/plain; charset=ISO-8859-1
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+
+I am getting quite bored by this discussion,
+if someone has a strong opinion about how this
+should be done go ahead and make a test implementation
+then we have something to discuss.
+
+In the meantime, if you want the best possible data protection,
+mount your database disk synchronously. This is safer than any scheme
+we could come up with.
+D*mned slow too, so everybody should be happy. :-)
+
+And I see no point in implementing a periodic sync in the postmaster.
+All unices have cron, so why not just use that?
+Or even a stupid 1-liner (ba)sh-script like:
+
+while true; do sleep 20; sync; done
+
+ best regards,
+--
+---------------------------------------------
+Göran Thyni, sysadm, JMS Bildbasen, Kiruna
+
+
+
+From vadim@sable.krasnoyarsk.su Thu Nov 6 23:31:41 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04723
+ for <maillist@candle.pha.pa.us>; Thu, 6 Nov 1997 23:31:21 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25438; Fri, 7 Nov 1997 11:36:25 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <34629AC9.15FB7483@sable.krasnoyarsk.su>
+Date: Fri, 07 Nov 1997 11:36:25 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711051953.OAA08283@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > The only time that synced flag is used, is when the database starts up,
+> > > and it sees that the previous shutdown was not clean.
+> > >
+> > > What am I missing here?
+> >
+> > Ok, I see. But we can avoid 'synced' flag: we can make (just before
+> > sync-ing data pages) in-memory copies of "on-line" dirty pg_log pages
+> > to be written/fsynced and perform write/fsync from these copies
+> > without stopping new commits in "on-line" page(s) (nothing must go
+> > to disk from "on-line" log pages).
+>
+> [Working late tonight?]
+
+[Yes]
+
+> I just re-read your description, and I see what you are saying. My idea
+> has pg_log commit flag be real commit flags while the system is running,
+> but on reboot after failure, we remove the commit flags on non-synced
+> stuff before we start up.
+>
+> Your idea is to make pg_log commit flags only appear in in-memory copies
+> of pg_log, and write the commit flags to disk only after the sync is
+> done.
+>
+> Either way will work. The question is, "Which is easier?" The OS is
+> going to sync pg_log on its own. We would almost need a second copy of
+> pg_log, one copy to be used on postmaster startup, and a second to be
+> used by running backends, and the postmaster would make a copy of the
+> running backend pg_log, sync the disks, and copy it to the boot copy.
+>
+> I don't see how the backend is going to figure out which pg_log pages
+> were modified and need to be sent to the boot copy of pg_log.
+>
+> Now that I am thinking, here is a good idea. Instead of a fancy
+> transaction queue, what if we just have the backend record the lowest
+> numbered transaction they commit in a shared memory area. If the
+> current transaction id they commit is greater than the minimum, then
+> change nothing. That way, the backend could copy all pg_log pages
+> containing that minimum pg_log transaction id up to the most recent
+> pg_log page, do the sync, and copy just those to the boot copy of
+> pg_log.
+>
+> This eliminates the transaction id queue.
+>
+> The nice thing about the sync-flag in pg_log is that there is no copying
+> by the backend. But we would have to spin through the file to set those
+> sync bits. Your method just copies whole pages to the boot copy.
+
+ In my plans to redesign the transaction system I intended to keep the last two
+pg_log pages in shmem. They are the most often used, and using ReadBuffer/WriteBuffer
+to access them is not a good idea. Also, we could use a spinlock instead of the
+lock manager to synchronize access to these pages (as I see in spin.c,
+spinlocks could be shared, but only exclusive ones are used) - spinlocks
+are faster.
+ These last two pg_log pages are the "online" ones. Race condition: when one or
+both of the online pages become non-online, i.e. pg_log has to be expanded
+when writing commit/abort of a "big" xid. This is how we could handle this
+in "buffered" logging (delayed fsync) mode:
+
+ When a backend wants to write commit/abort status it acquires the exclusive
+OnLineLogLock. If the xid belongs to the online pages then the backend writes the
+status and releases the spinlock. If the xid is less than the least xid on the 1st
+online page then the backend releases the spinlock and does exactly what it does
+in normal mode:
+flush (write and fsync) all dirty data files, lock pg_log for write, ReadBuffer,
+update xid status, WriteBuffer, release write lock, flush pg_log.
+If the xid is greater than the max xid on the 2nd online page then the simplest way
+is to just do sync(); sync() (two times), flush the 1st or both online pages,
+read the new page(s) into the online pages' space, update xid status, and
+release the OnLineLogLock spinlock. We could try other ways, but pg_log expansion
+is a rare case (32K xids in one pg_log page)...
+ All the postmaster will have to do is:
+1. Get shared OnLineLogLock.
+2. Copy 2 x 8K data to private place.
+3. Release spinlock.
+4. sync(); sync(); (two times!)
+5. Flush online pages.
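
The three cases a backend has to distinguish when writing a commit/abort status can be sketched as below. This is only an illustrative Python model, not PostgreSQL code: the function name `route_status_write`, the constant names, and the assumption that the two online pages are consecutive are hypothetical. The 8K page size and the two status bits per xid are taken from the message.

```python
PAGE_SIZE = 8192                      # bytes per pg_log page (8K, per the message)
XIDS_PER_PAGE = PAGE_SIZE * 8 // 2    # two status bits per xid -> 32768 xids per page


def route_status_write(xid, first_online_page):
    """Return which path a commit/abort status write would take.

    first_online_page is the page number of the 1st online page; the 2nd
    online page is assumed (hypothetically) to be the page right after it.
    """
    least_online_xid = first_online_page * XIDS_PER_PAGE
    max_online_xid = (first_online_page + 2) * XIDS_PER_PAGE - 1
    if xid < least_online_xid:
        # Old page: release the spinlock and commit as in normal (fsync) mode.
        return "normal-mode"
    if xid > max_online_xid:
        # pg_log must be expanded: sync, flush online pages, load new page(s).
        return "expand"
    # Common case: update the in-memory online page under the spinlock.
    return "online"
```

With `first_online_page = 1`, xids 32768 through 98303 fall on the online pages; anything lower takes the normal-mode path and anything higher forces an expansion.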
+
+We could use -F DELAY_TIME to turn fsync delayed mode ON.
+
+And, btw, having two bits for xact status we have only one unused
+status value (binary 11) currently - I would like to use this for
+nested xactions and savepoints...
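
To make the two-bits-per-xid layout concrete, here is a minimal Python sketch of reading and writing one status in a page held as a `bytearray`. The helper names and the bit order within a byte (lowest xid in the lowest two bits) are assumptions for illustration only; the message does not specify them.

```python
XIDS_PER_BYTE = 4  # two status bits per xid


def get_status(page, xid_offset):
    """Read the 2-bit status of the xid at xid_offset within a page."""
    byte_index, slot = divmod(xid_offset, XIDS_PER_BYTE)
    return (page[byte_index] >> (slot * 2)) & 0b11


def set_status(page, xid_offset, status):
    """Overwrite the 2-bit status of the xid at xid_offset within a page."""
    byte_index, slot = divmod(xid_offset, XIDS_PER_BYTE)
    shift = slot * 2
    cleared = page[byte_index] & ~(0b11 << shift) & 0xFF   # zero the old two bits
    page[byte_index] = cleared | ((status & 0b11) << shift)
```

Two bits give four possible values per xid, which is why only one value (both bits set) is left over once in-progress, aborted, and committed are assigned.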
+
+> I don't want to force this idea on anyone, or annoy anyone. I just
+> think it needs to be considered. The concepts are unusual, so once
+> people get the full idea, if they don't like it, we can trash it. I
+> still think it holds promise.
+
+Agreed.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Nov 7 01:32:49 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07651
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:32:47 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23328 for <maillist@candle.pha.pa.us>; Thu, 6 Nov 1997 23:46:08 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA19565; Thu, 6 Nov 1997 23:38:55 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 23:36:53 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA18911 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 23:36:44 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA18779 for <pgsql-hackers@postgreSQL.org>; Thu, 6 Nov 1997 23:36:02 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25448; Fri, 7 Nov 1997 11:40:29 +0700 (KRS)
+Message-ID: <34629BBD.59E2B600@sable.krasnoyarsk.su>
+Date: Fri, 07 Nov 1997 11:40:29 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: Mattias Kregert <matti@algonet.se>, pgsql-hackers@postgreSQL.org
+Subject: Re: Sync:ing data and log (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!)
+References: <199711061810.NAA02118@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > Never use sync(). Use fsync(). Other processes should take care of their
+> > own syncing. If you use sync(), and you have a lot of disks, the sync
+> > can
+> > take half a minute if you are unlucky.
+>
+> We could use fsync() but then the postmaster has to know what tables
+> have dirty buffers, and I don't think there is an easy way to do this.
+
+There is one way - shared system cache...
+
+Vadim
+
+
+From vadim@sable.krasnoyarsk.su Fri Nov 7 01:31:24 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07639
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:31:22 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23094 for <maillist@candle.pha.pa.us>; Thu, 6 Nov 1997 23:39:00 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25457; Fri, 7 Nov 1997 11:43:52 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <34629C87.3F54BC7E@sable.krasnoyarsk.su>
+Date: Fri, 07 Nov 1997 11:43:51 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Mattias Kregert <matti@algonet.se>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, pgsql-hackers@postgreSQL.org
+Subject: Re: Performance vs. Crash Recovery (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!)
+References: <199711051615.LAA02260@candle.pha.pa.us> <34619E9E.622F563@algonet.se>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Mattias Kregert wrote:
+>
+> > The strange thing I am hearing is that the people who use PostgreSQL are
+> > more worried about data recovery from a crash than million-dollar
+> > companies that use commercial databases.
+> >
+> > I don't get it.
+>
+> Perhaps the million-dollar companies have more sophisticated hardware,
+> like big expensive disk arrays, big UPS:es and parallel backup
+> servers?
+> If so, the risk of hardware failure is much smaller for them.
+
+More than that: Informix is more stable than postgres. elog(FATAL)
+occurs sometimes, and in delayed-fsync mode this will cause
+lost xactions too, not only hardware/OS failures.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Nov 7 01:31:26 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07642
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:31:24 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id AAA24358 for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 00:09:47 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA00167; Fri, 7 Nov 1997 00:03:17 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:01:26 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA29427 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:01:19 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA29364 for <hackers@postgreSQL.org>; Fri, 7 Nov 1997 00:01:02 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id XAA05565;
+ Thu, 6 Nov 1997 23:54:33 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711070454.XAA05565@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 6 Nov 1997 23:54:33 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <34629AC9.15FB7483@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 11:36:25 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+I was worried when you didn't respond to my last list of ideas. I
+thought perhaps the idea was getting on your nerves.
+
+I haven't dropped the idea because:
+
+ 1) it offers 2-9 times speedup in database modifications
+ 2) this is how the big commercial systems handle it, and I think
+ we need to give users this option.
+ 3) in the way I had it designed, it wouldn't take much work to
+ do it.
+
+Anything that promises that much speedup, if it can be done easily, I say
+let's consider it, even if you lose 60 seconds of changes.
+
+
+> In my plans to re-design transaction system I supposed to keep in shmem
+> two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer
+> to access them is not good idea. Also, we could use spinlock instead of
+> lock manager to synchronize access to these pages (as I see in spin.c
+> spinlock-s could be shared, but only exclusive ones are used) - spinlocks
+> are faster.
+
+Ah, so you already had the idea of having on-line pages in shared memory
+as part of a transaction system overhaul? Right now, does each backend
+lock/read/write/unlock to get at pg_log? Wow, that is bad.
+
+Perhaps mmap() would be a good idea. My system has msync() to flush
+mmap()'ed pages to the underlying file. You would still run fsync()
+after that. This may give us the best of both worlds: a shared-memory
+area of variable size, and control of when it gets flushed to disk. Do
+other OS's have this? I have a feeling OS's with unified buffer caches
+don't have this ability to determine when the mmap'ed pages
+get sent to the underlying file and disk.
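
The mmap()/msync()/fsync() sequence discussed above can be sketched in Python, whose `mmap.flush()` calls msync() on Unix. The function names and the scratch file are hypothetical, and this only illustrates the ordering of the calls; it says nothing about how portable the flush timing is across operating systems.

```python
import mmap
import os
import tempfile


def update_and_flush(path, offset, value):
    """Set one byte in a file through a shared mapping, then force it to disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:   # shared read-write mapping of the whole file
            m[offset] = value                  # change is visible through the mapping
            m.flush()                          # msync(): push dirty mapped pages to the file
        os.fsync(f.fileno())                   # fsync(): push the file to stable storage


def demo():
    """Create an 8K scratch file, update byte 0 via the mapping, and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(8192))
        path = tmp.name
    try:
        update_and_flush(path, 0, 0b10)
        with open(path, "rb") as f:
            return f.read(1)
    finally:
        os.remove(path)
```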
+
+
+> These two last pg_log pages are "online" ones. Race condition: when one or
+> both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> when writing commit/abort of "big" xid. This is how we could handle this
+> in "buffered" logging (delayed fsync) mode:
+>
+> When backend want to write commit/abort status he acquires exclusive
+> OnLineLogLock. If xid belongs to online pages then backend writes status
+> and releases spin. If xid is less than least xid on 1st online page then
+> backend releases spin and does exactly the same what he does in normal mode:
+> flush (write and fsync) all dirty data files, lock pg_log for write, ReadBuffer,
+> update xid status, WriteBuffer, release write lock, flush pg_log.
+> If xid is greater than max xid on 2nd online page then the simplest way is
+> just do sync(); sync() (two times), flush 1st or both online pages,
+> read new page(s) into online pages space, update xid status,
+> release OnLineLogLock spin. We could try other ways but pg_log expanding
+> is rare case (32K xids in one pg_log page)...
+> All what postmaster will have to do is:
+> 1. Get shared OnLineLogLock.
+> 2. Copy 2 x 8K data to private place.
+> 3. Release spinlock.
+> 4. sync(); sync(); (two times!)
+> 5. Flush online pages.
+>
+> We could use -F DELAY_TIME to turn fsync delayed mode ON.
+>
+> And, btw, having two bits for xact status we have only one unused
+> status value (binary 11) currently - I would like to use this for
+> nested xactions and savepoints...
+
+I saw that. By keeping two copies of pg_log, one in memory to be used
+by all backends, and another that hits the disk, it certainly will work.
+
+>
+> > I don't want to force this idea on anyone, or annoy anyone. I just
+> > think it needs to be considered. The concepts are unusual, so once
+> > people get the full idea, if they don't like it, we can trash it. I
+> > still think it holds promise.
+>
+> Agreed.
+>
+> Vadim
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Nov 7 01:03:09 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07314
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:03:05 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA07879; Fri, 7 Nov 1997 00:57:42 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:55:52 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA03918 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:55:46 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA02961 for <hackers@postgreSQL.org>; Fri, 7 Nov 1997 00:55:18 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA25567; Fri, 7 Nov 1997 12:59:29 +0700 (KRS)
+Message-ID: <3462AE40.FF6D5DF@sable.krasnoyarsk.su>
+Date: Fri, 07 Nov 1997 12:59:28 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711070454.XAA05565@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> I was worried when you didn't respond to my last list of ideas. I
+> thought perhaps the idea was getting on your nerves.
+
+No, I was (and, unfortunately, still am) busy...
+
+>
+> I haven't dropped the idea because:
+>
+> 1) it offers 2-9 times speedup in database modifications
+> 2) this is how the big commercial system handle it, and I think
+> we need to give users this option.
+> 3) in the way I had it designed, it wouldn't take much work to
+> do it.
+>
+> Anything that promises that much speedup, if it can be done easily, I say
+> let's consider it, even if you lose 60 seconds of changes.
+
+I agree with your un-buffered logging idea. This would be an excellent
+feature for non-critical database uses (WWW, etc).
+
+>
+> > In my plans to re-design transaction system I supposed to keep in shmem
+> > two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer
+> > to access them is not good idea. Also, we could use spinlock instead of
+> > lock manager to synchronize access to these pages (as I see in spin.c
+> > spinlock-s could be shared, but only exclusive ones are used) - spinlocks
+> > are faster.
+>
+> Ah, so you already had the idea of having on-line pages in shared memory
+> as part of a transaction system overhaul? Right now, does each backend
+
+Yes. I hope to implement this in the next 1-2 weeks.
+
+> lock/read/write/unlock to get at pg_log? Wow, that is bad.
+
+Yes, he does.
+
+>
+> Perhaps mmap() would be a good idea. My system has msync() to flush
+> mmap()'ed pages to the underlying file. You would still run fsync()
+> after that. This may give us the best of both worlds: a shared-memory
+ ^^^^^^^^^^^^^
+> area of variable size, and control of when it get flushed to disk. Do
+ ^^^^^^^^^^^^^^^^^^^^^
+I like it. FreeBSD supports
+
+MAP_ANON Map anonymous memory not associated with any specific file.
+
+It would be nice to use mmap to get more "shared" memory, but I don't see
+any reason to mmap any particular file into memory. Having the last two pg_log
+pages in memory, plus the xact commit/abort writeback optimization (any scan
+updates the commit/abort xmin/xmax status in tuples - we already have this),
+reduces access to "old" pg_log pages to zero.
+
+> other OS's have this? I have a feeling OS's with unified buffer caches
+> don't have this ability to determine when the underlying mmap'ed file
+> gets sent to the underlying file and disk.
+>
+> > These two last pg_log pages are "online" ones. Race condition: when one or
+> > both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> > when writing commit/abort of "big" xid. This is how we could handle this
+> > in "buffered" logging (delayed fsync) mode:
+> >
+> > When backend want to write commit/abort status he acquires exclusive
+> > OnLineLogLock. If xid belongs to online pages then backend writes status
+> > and releases spin. If xid is less than least xid on 1st online page then
+> > backend releases spin and does exactly the same what he does in normal mode:
+> > flush (write and fsync) all dirty data files, lock pg_log for write, ReadBuffer,
+> > update xid status, WriteBuffer, release write lock, flush pg_log.
+> > If xid is greater than max xid on 2nd online page then the simplest way is
+> > just do sync(); sync() (two times), flush 1st or both online pages,
+> > read new page(s) into online pages space, update xid status,
+> > release OnLineLogLock spin. We could try other ways but pg_log expanding
+> > is rare case (32K xids in one pg_log page)...
+> > All what postmaster will have to do is:
+> > 1. Get shared OnLineLogLock.
+> > 2. Copy 2 x 8K data to private place.
+> > 3. Release spinlock.
+> > 4. sync(); sync(); (two times!)
+> > 5. Flush online pages.
+> >
+> > We could use -F DELAY_TIME to turn fsync delayed mode ON.
+> >
+> > And, btw, having two bits for xact status we have only one unused
+> > status value (binary 11) currently - I would like to use this for
+> > nested xactions and savepoints...
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+More about this: binary 11 could mean "this _child_ transaction is committed -
+you have to look up pg_xact_child to get the parent xid and use pg_log again
+to get the parent xact status". If the parent committed then the child xact status
+will be changed to binary 10 (committed), otherwise to binary 01 (aborted). Using
+this we could get xact nesting and savepoints by starting a new child xaction
+inside a running one...
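
The child-transaction lookup described here can be modelled in a few lines of Python. Everything in this sketch is hypothetical: plain dicts stand in for pg_log and pg_xact_child, the value 00 is assumed to mean "in progress", and leaving the child untouched while its parent is still running is an assumption not covered by the message.

```python
IN_PROGRESS, ABORTED, COMMITTED, CHILD_COMMITTED = 0b00, 0b01, 0b10, 0b11


def resolve_status(xid, pg_log, pg_xact_child):
    """Return the effective status of xid, following child -> parent links."""
    status = pg_log[xid]
    if status != CHILD_COMMITTED:
        return status
    # Child committed: its fate depends on the parent's status in pg_log.
    parent_status = resolve_status(pg_xact_child[xid], pg_log, pg_xact_child)
    if parent_status == COMMITTED:
        pg_log[xid] = COMMITTED      # parent committed: child becomes committed
    elif parent_status == ABORTED:
        pg_log[xid] = ABORTED        # parent aborted: child becomes aborted
    return parent_status
```

Once the parent's outcome is known, the child's entry is rewritten, so later lookups for that xid no longer need pg_xact_child.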
+
+>
+> I saw that. By keeping two copies of pg_log, one in memory to be used
+ ^^^^^^
+ Just two pg_log pages...
+
+> by all backend, and another that hits the disk, it certainly will work.
+
+Vadim
+
+
+From vadim@sable.krasnoyarsk.su Fri Nov 7 01:30:59 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07599
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:30:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26793 for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 01:12:33 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA25592; Fri, 7 Nov 1997 13:16:39 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <3462B247.ABD322C@sable.krasnoyarsk.su>
+Date: Fri, 07 Nov 1997 13:16:39 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Jan Wieck <wieck@sapserv.debis.de>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <m0xT9nq-000BFQC@orion.SAPserv.Hamburg.dsh.de>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+wieck@sapserv.debis.de wrote:
+>
+> Bruce wrote:
+> >
+> > > > It seems that this is what Oracle does, but Sybase writes queries
+> > > > (with transaction ids, of 'course, and before execution) and
+> > > > begin, commit/abort events <-- this is better for non-overwriting
+> > > > system (shorter redo file), but, agreed, recovering is more complicated.
+> > > >
+> > > > Vadim
+> > > >
+> > >
+> > > Writing only the queries (and only those that really modify
+> > > data - no selects) would be much smarter and the redo files
+> > > will be shorter. But it wouldn't fit for PostgreSQL as long
+> > > as someone can submit a query like
+> > >
+> > > DELETE FROM xxx WHERE oid = 59337;
+> >
+> > Interesting point. Currently, an insert shows the OID as output in
+> > psql. Perhaps we could do a little oid-manipulating to set the oid of
+> > the insert.
+>
+> Only for simple inserts, not on
+>
+> INSERT INTO xxx SELECT any_type_of_merge_join;
+
+I don't know how, but Sybase handles this, and IDENTITY (analogous to OIDs) too.
+But I don't object, Jan; it is just that I don't have time to do a
+"log queries" redo implementation, so I would like to have "log changes"
+redo at least. (Actually, "log changes" is good enough for my production database
+with 1 - 2 thousand updates per day).
+(BTW, "incremental" backup could be implemented without redo - I have
+some thoughts about this - but having additional recovery is good
+in any case).
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Nov 7 15:42:58 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA22341
+ for <maillist@candle.pha.pa.us>; Fri, 7 Nov 1997 15:42:55 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA02769; Fri, 7 Nov 1997 15:28:54 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 15:24:00 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA01318 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 15:23:52 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id PAA00705 for <hackers@postgreSQL.org>; Fri, 7 Nov 1997 15:21:56 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id PAA20010;
+ Fri, 7 Nov 1997 15:20:10 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711072020.PAA20010@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Fri, 7 Nov 1997 15:20:10 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <3462AE40.FF6D5DF@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 12:59:28 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > Anything that promises that much speedup, if it can be done easily, I say
+> > let's consider it, even if you lose 60 seconds of changes.
+>
+> I agreed with your un-buffered logging idea. This would be excellent
+> feature for un-critical dbase usings (WWW, etc).
+
+Actually, it is buffered logging. We currently have unbuffered logging,
+I think.
+
+> > > In my plans to re-design transaction system I supposed to keep in shmem
+> > > two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer
+> > > to access them is not good idea. Also, we could use spinlock instead of
+> > > lock manager to synchronize access to these pages (as I see in spin.c
+> > > spinlock-s could be shared, but only exclusive ones are used) - spinlocks
+> > > are faster.
+> >
+> > Ah, so you already had the idea of having on-line pages in shared memory
+> > as part of a transaction system overhaul? Right now, does each backend
+>
+> Yes. I hope to implement this in the next 1-2 weeks.
+>
+> > lock/read/write/unlock to get at pg_log? Wow, that is bad.
+>
+> Yes, he does.
+>
+> >
+> > Perhaps mmap() would be a good idea. My system has msync() to flush
+> > mmap()'ed pages to the underlying file. You would still run fsync()
+> > after that. This may give us the best of both worlds: a shared-memory
+> ^^^^^^^^^^^^^
+> > area of variable size, and control of when it get flushed to disk. Do
+> ^^^^^^^^^^^^^^^^^^^^^
+> I like it. FreeBSD supports
+>
+> MAP_ANON Map anonymous memory not associated with any specific file.
+>
+> It would be nice to use mmap to get more "shared" memory, but I don't see
+> reasons to mmap any particular file to memory. Having two last pg_log pages
+> in memory + xact commit/abort writeback optimization (updation of commit/abort
+> xmin/xmax status in tuples by any scan - we already have this) reduce access
+> to "old" pg_log pages to zero.
+
+I totally agree. There is no advantage to mmap() vs. shared memory for
+us. I thought if we could control when the mmap() gets flushed to disk,
+we could let the OS handle the syncing, but I doubt this is going to be
+portable.
+
+Though, we could mmap() pg_log, and that way backends would not have to
+read/write the blocks, and they could all see the same data. But with
+the new scheme, they have most transaction ids in shared memory.
+
+Interesting that you mention the scan updating the transaction status. We
+would have a problem here. It is possible a backend will update the
+commit status of a data page, and that data page will make it to disk,
+but if there is a crash before the updated pg_log gets sync'ed, there
+would be a partial transaction in the system.
+
+I don't know of any way for a backend to know that the transaction has hit
+disk and that the data commit flag can be set. You don't want to update
+the commit flag of the data page until the entire transaction has been
+sync'ed. The only way to do that would be to have a 'commit and synced'
+flag, but you want to save that for nested transactions.
+
+Another case where this could come in handy is allowing reuse of superseded
+data rows. If the transaction is committed and synced, the row space
+could be reused by another transaction.
+
+> > other OS's have this? I have a feeling OS's with unified buffer caches
+> > don't have this ability to determine when the underlying mmap'ed file
+> > gets sent to the underlying file and disk.
+> >
+> > > These two last pg_log pages are "online" ones. Race condition: when one or
+> > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> > > when writing commit/abort of "big" xid. This is how we could handle this
+> > > in "buffered" logging (delayed fsync) mode:
+> > >
+> > > When backend want to write commit/abort status he acquires exclusive
+> > > OnLineLogLock. If xid belongs to online pages then backend writes status
+
+This confuses me. Why does a backend need to lock pg_log to update a
+transaction status?
+
+> > > and releases spin. If xid is less than least xid on 1st online page then
+> > > backend releases spin and does exactly the same what he does in normal mode:
+> > > flush (write and fsync) all dirty data files, lock pg_log for write, ReadBuffer,
+> > > update xid status, WriteBuffer, release write lock, flush pg_log.
+> > > If xid is greater than max xid on 2nd online page then the simplest way is
+> > > just do sync(); sync() (two times), flush 1st or both online pages,
+> > > read new page(s) into online pages space, update xid status,
+> > > release OnLineLogLock spin. We could try other ways but pg_log expanding
+> > > is rare case (32K xids in one pg_log page)...
+> > > All what postmaster will have to do is:
+> > > 1. Get shared OnLineLogLock.
+> > > 2. Copy 2 x 8K data to private place.
+> > > 3. Release spinlock.
+> > > 4. sync(); sync(); (two times!)
+> > > 5. Flush online pages.
+
+Great.
+
+> > >
+> > > We could use -F DELAY_TIME to turn fsync delayed mode ON.
+> > >
+> > > And, btw, having two bits for xact status we have only one unused
+> > > status value (binary 11) currently - I would like to use this for
+> > > nested xactions and savepoints...
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> More about this: binary 11 could mean "this _child_ transaction is committed -
+> you have to lookup in pg_xact_child to get parent xid and use pg_log again
+> to get parent xact status". If parent committed then child xact status
+> will be changed to binary 10 (committed) else - to binary 01 (aborted). Using this
+> we could get xact nesting and savepoints by starting new child xaction
+> inside running one...
+
+OK.
+
+>
+> >
+> > I saw that. By keeping two copies of pg_log, one in memory to be used
+> ^^^^^^
+> Just two pg_log pages...
+
+Got it.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Sun Nov 9 22:07:36 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA04655
+ for <maillist@candle.pha.pa.us>; Sun, 9 Nov 1997 22:07:30 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id VAA07023; Sun, 9 Nov 1997 21:55:54 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 21:52:20 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id VAA06174 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 21:52:13 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id VAA06092 for <hackers@postgreSQL.org>; Sun, 9 Nov 1997 21:51:58 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id VAA04150;
+ Sun, 9 Nov 1997 21:50:29 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711100250.VAA04150@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! (fwd)
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Sun, 9 Nov 1997 21:50:29 -0500 (EST)
+Cc: hackers@postgreSQL.org (PostgreSQL-development)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Forwarded message:
+> > > Perhaps mmap() would be a good idea. My system has msync() to flush
+> > > mmap()'ed pages to the underlying file. You would still run fsync()
+> > > after that. This may give us the best of both worlds: a shared-memory
+> > ^^^^^^^^^^^^^
+> > > area of variable size, and control of when it gets flushed to disk. Do
+> > ^^^^^^^^^^^^^^^^^^^^^
+> > I like it. FreeBSD supports
+> >
+> > MAP_ANON Map anonymous memory not associated with any specific file.
+> >
+> > It would be nice to use mmap to get more "shared" memory, but I don't see
+> > reasons to mmap any particular file to memory. Having the last two pg_log pages
+> > in memory + xact commit/abort writeback optimization (updating of commit/abort
+> > xmin/xmax status in tuples by any scan - we already have this) reduces access
+> > to "old" pg_log pages to zero.
+>
+> I totally agree. There is no advantage to mmap() vs. shared memory for
+> us. I thought if we could control when the mmap() gets flushed to disk,
+> we could let the OS handle the syncing, but I doubt this is going to be
+> portable.
+>
+> Though, we could mmap() pg_log, and that way backends would not have to
+> read/write the blocks, and they could all see the same data. But with
+> the new scheme, they have most transaction ids in shared memory.
+>
+> Interesting you mention the scan updating the transaction status. We
+> would have a problem here. It is possible a backend will update the
+> commit status of a data page, and that data page will make it to disk,
+> but if there is a crash before the updated pg_log gets sync'ed, there
+> would be a partial transaction in the system.
+>
+> I don't know any way that a backend would know the transaction has hit
+> disk, and the data commit flag could be set. You don't want to update
+> the commit flag of the data page until entire transaction has been
+> sync'ed. The only way to do that would be to have a 'commit and synced'
+> flag, but you want to save that for nested transactions.
+>
+> Another case this could come in handy is to allow reuse of superseded
+> data rows. If the transaction is committed and synced, the row space
+> could be reused by another transaction.
+>
+
+I have been thinking about the mmap() issue, and it seems a natural fit for
+pg_log. You can have every backend mmap() pg_log. It becomes a dynamic
+shared memory area that is auto-initialized to the contents of pg_log,
+and all changes can be made by all backends. No locking needed. We can
+also flush the changes to the underlying file. Under bsdi, you can also
+have the mmap area follow you across exec() calls, so each backend
+doesn't have to do anything. I want to replace exec with fork also, so
+the stuff would be auto-loaded in the address space of each backend.
+
+This way, you don't have to have two on-line pages and move them around
+as pg_log grows.
+
+The only remaining problem is how to mark certain transactions as synced,
+or force only synced transactions to hit the pg_log file itself; data
+row commit status should only be updated for synced transactions.
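A hedged sketch of the mmap() idea above, using only POSIX calls (open, mmap, msync). The helper names log_map and log_flush are hypothetical, the file is assumed to already be at least len bytes long, and this says nothing about how the synced/unsynced problem would be solved.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the first len bytes of the file at path as shared, writable memory.
 * Every process that maps the same file this way sees the same bytes.
 * Returns NULL on failure.
 */
static unsigned char *log_map(const char *path, size_t len)
{
    assert(len > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (unsigned char *) p;
}

/* Push modified pages of the mapping out to the underlying file. */
static int log_flush(unsigned char *base, size_t len)
{
    return msync(base, len, MS_SYNC);
}
```

A caller would map the file once, update status bytes through the returned pointer, and call log_flush() at the points where it wants the changes on disk. Concurrent updates to the same byte still need some form of coordination.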
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Sun Nov 9 23:00:58 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA05394
+ for <maillist@candle.pha.pa.us>; Sun, 9 Nov 1997 23:00:55 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA25139 for <maillist@candle.pha.pa.us>; Sun, 9 Nov 1997 22:42:33 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id KAA01845; Mon, 10 Nov 1997 10:49:25 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <34668444.237C228A@sable.krasnoyarsk.su>
+Date: Mon, 10 Nov 1997 10:49:24 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: marc@fallon.classyad.com, hackers@postgreSQL.org
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+References: <199711072020.PAA20010@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > Anything that promises that much speedup, if it can be done easily, I say
+> > > let's consider it, even if you lose 60 seconds of changes.
+> >
+> > I agreed with your un-buffered logging idea. This would be an excellent
+> > feature for non-critical database uses (WWW, etc).
+>
+> Actually, it is buffered logging. We currently have unbuffered logging,
+> I think.
+
+Sorry - mistyping.
+
+>
+> Interesting you mention the scan updating the transaction status. We
+> would have a problem here. It is possible a backend will update the
+> commit status of a data page, and that data page will make it to disk,
+> but if there is a crash before the updated pg_log gets sync'ed, there
+> would be a partial transaction in the system.
+
+You're right! Currently, only system relations can be affected by this:
+backend releases locks on user tables after syncing data and pg_log.
+I'll keep this in mind...
+
+> > > > These two last pg_log pages are "online" ones. Race condition: when one or
+> > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> > > > when writing commit/abort of "big" xid. This is how we could handle this
+> > > > in "buffered" logging (delayed fsync) mode:
+> > > >
+> > > > When a backend wants to write commit/abort status, it acquires exclusive
+> > > > OnLineLogLock. If xid belongs to online pages then backend writes status
+>
+> This confuses me. Why does a backend need to lock pg_log to update a
+> transaction status?
+
+What if two backends try to change xact statuses in the same byte ?
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Sun Nov 9 23:59:50 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA06523
+ for <maillist@candle.pha.pa.us>; Sun, 9 Nov 1997 23:59:48 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA27105 for <maillist@candle.pha.pa.us>; Sun, 9 Nov 1997 23:41:39 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA08860; Sun, 9 Nov 1997 23:35:42 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 23:31:50 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA07962 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 23:31:43 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA07875 for <hackers@postgreSQL.org>; Sun, 9 Nov 1997 23:31:28 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id XAA05566;
+ Sun, 9 Nov 1997 23:17:41 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711100417.XAA05566@candle.pha.pa.us>
+Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Sun, 9 Nov 1997 23:17:41 -0500 (EST)
+Cc: marc@fallon.classyad.com, hackers@postgreSQL.org
+In-Reply-To: <34668444.237C228A@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 10, 97 10:49:24 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > > > > These two last pg_log pages are "online" ones. Race condition: when one or
+> > > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded
+> > > > > when writing commit/abort of "big" xid. This is how we could handle this
+> > > > > in "buffered" logging (delayed fsync) mode:
+> > > > >
+> > > > > When a backend wants to write commit/abort status, it acquires exclusive
+> > > > > OnLineLogLock. If xid belongs to online pages then backend writes status
+> >
+> > This confuses me. Why does a backend need to lock pg_log to update a
+> > transaction status?
+>
+> What if two backends try to change xact statuses in the same byte ?
+
+Ooo, you got me. I so hoped to prevent locking. It would be nice if:
+
+ *x |= 3;
+
+would be atomic, but I don't think it is. Most RISC machines don't even
+have an OR against a memory address, I think.
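For illustration only: C11 (which postdates this thread) provides atomic_fetch_or in <stdatomic.h>, which performs the read-modify-write as one atomic step. The sketch below assumes four two-bit statuses packed per byte, which is my reading of "xact statuses in the same byte"; the names xact_set_status and xact_get_status are hypothetical and this is not how PostgreSQL handles it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two-bit status codes (assumed reading of the values quoted in the thread). */
enum {
    XACT_IN_PROGRESS     = 0x0,
    XACT_ABORTED         = 0x1,
    XACT_COMMITTED       = 0x2,
    XACT_CHILD_COMMITTED = 0x3
};

#define XACTS_PER_BYTE 4    /* assumption: 2 bits per transaction */

/* Atomically OR the status bits of xid into its shared byte. */
static void xact_set_status(_Atomic uint8_t *page, uint32_t xid, unsigned status)
{
    assert(status <= 0x3);
    unsigned shift = (xid % XACTS_PER_BYTE) * 2;
    atomic_fetch_or(&page[xid / XACTS_PER_BYTE], (uint8_t) (status << shift));
}

/* Read back the two status bits of xid. */
static unsigned xact_get_status(_Atomic uint8_t *page, uint32_t xid)
{
    unsigned shift = (xid % XACTS_PER_BYTE) * 2;
    return (atomic_load(&page[xid / XACTS_PER_BYTE]) >> shift) & 0x3;
}
```

An OR can only set bits, so it covers the move from in-progress (00) to a final status. Changing 0x11 to 0x10 or 0x01 later would need a compare-and-exchange loop (atomic_compare_exchange_weak) or a lock.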
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996
+Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
+Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
+Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
+ id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
+Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
+ id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
+Date: Thu, 5 Dec 1996 10:07:56 -0500
+From: aixssd!darrenk@abs.net (Darren King)
+Message-Id: <9612051507.AA34942@ceodev>
+To: maillist@candle.pha.pa.us
+Subject: Subselect info.
+Mime-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
+Status: OR
+
+> Any of them deal with implementing subselects?
+
+There's a white paper at the www.sybase.com that might
+help a little. It's just a copy of a presentation
+given by the optimizer guru there. Nothing code-wise,
+but he gives a few ways of flattening them with temp
+tables, etc...
+
+Darren
+
+From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
+ for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:04:31 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Considering the complexity of the primary/secondary changes you are
+> making, I believe subselects will be easier than that.
+
+I'm not making changes for P/F keys - just thinking...
+Yes, I think that implementing referential integrity is
+more complex work.
+
+As for subselects:
+
+in plannodes.h
+
+typedef struct Plan {
+...
+ struct Plan *lefttree;
+ struct Plan *righttree;
+} Plan;
+
+/* ----------------
+ * these are are defined to avoid confusion problems with "left"
+ ^^^^^^^^^^^^^^^^^^
+ * and "right" and "inner" and "outer". The convention is that
+ * the "left" plan is the "outer" plan and the "right" plan is
+ * the inner plan, but these make the code more readable.
+ * ----------------
+ */
+#define innerPlan(node) (((Plan *)(node))->righttree)
+#define outerPlan(node) (((Plan *)(node))->lefttree)
+
+My first thought is to avoid any confusion by defining
+
+#define rightPlan(node) (((Plan *)(node))->righttree)
+#define leftPlan(node) (((Plan *)(node))->lefttree)
+
+and change all occurrences of 'outer' & 'inner' in the code
+to 'left' & 'right':
+
+this will allow us to use 'outer' & 'inner' for subselects
+later, without confusion. My hope is that we can change the Executor
+very easily by adding outer/inner plans/TupleSlots to
+EState, CommonState, JoinState, etc and by doing node
+processing in the right order.
+
+Subselects are mostly Planner problem.
+
+Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
+ for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:22:37 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> this will allow to use 'outer' & 'inner' things for subselects
+> later, without confusion. My hope is that we may change Executor
+
+Or maybe use 'high' & 'low' for subselects (to avoid confusion
+with outer joins).
+
+> very easy by adding outer/inner plans/TupleSlots to
+> EState, CommonState, JoinState, etc and by doing node
+> processing in right order.
+ ^^^^^^^^^^^^^^
+Rule is easy:
+1. Uncorrelated subselect - do 'low' plan node first
+2. Correlated - do left/right first
+
+- just some flag in structures.
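The ordering rule can be sketched in C as follows. This is only an illustration: the lowtree field, the correlated flag, the lowPlan macro and plan_order are hypothetical names, and the sketch records the order in which nodes would be processed rather than executing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Plan {
    const char  *name;
    struct Plan *lefttree;
    struct Plan *righttree;
    struct Plan *lowtree;     /* hypothetical: plan of a subselect */
    bool         correlated;  /* hypothetical: subselect uses upper values */
} Plan;

#define leftPlan(node)  (((Plan *)(node))->lefttree)
#define rightPlan(node) (((Plan *)(node))->righttree)
#define lowPlan(node)   (((Plan *)(node))->lowtree)

/*
 * Append node names to out[] in processing order and return the new count.
 * Uncorrelated: the 'low' plan goes first. Correlated: left/right go first.
 */
static size_t plan_order(Plan *node, const char **out, size_t n, size_t cap)
{
    if (node == NULL)
        return n;
    if (lowPlan(node) != NULL && !node->correlated)
        n = plan_order(lowPlan(node), out, n, cap);
    n = plan_order(leftPlan(node), out, n, cap);
    n = plan_order(rightPlan(node), out, n, cap);
    if (lowPlan(node) != NULL && node->correlated)
        n = plan_order(lowPlan(node), out, n, cap);
    assert(n < cap);
    out[n++] = node->name;
    return n;
}
```

In a real executor a correlated 'low' plan would be re-run for each tuple coming from the left/right plans; the sketch only shows which side is started first.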
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
+ for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
+ for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+The only thing I have to add to what I had written earlier is that I
+think it is best to have these subqueries executed as early in query
+execution as possible.
+
+Every piece of the backend: parser, optimizer, executor, is designed to
+work on a single query. The earlier we can split up the queries, the
+better those pieces will work at doing their job. You want to be able
+to use the parser and optimizer on each part of the query separately, if
+you can.
+
+
+Forwarded message:
+> I have done some thinking about subselects. There are basically two
+> issues:
+>
+> Does the query return one row or several rows? This can be
+> determined by seeing if the user uses equals or 'IN' to join the
+> subquery.
+>
+> Is the query correlated, meaning "Does the subquery reference
+> values from the outer query?"
+>
+> (We already have the third type of subquery, the INSERT...SELECT query.)
+>
+> So we have these four combinations:
+>
+> 1) one row, no correlation
+> 2) multiple rows, no correlation
+> 3) one row, correlated
+> 4) multiple rows, correlated
+>
+>
+> With #1, we can execute the subquery, get the value, replace the
+> subquery with the constant returned from the subquery, and execute the
+> outer query.
+>
+> With #2, we can execute the subquery and put the result into a temporary
+> table. We then rewrite the outer query to access the temporary table
+> and replace the subquery with the column name from the temporary table.
+> We probably put an index on the temp. table, which has only one
+> column, because a subquery can only return one column. We remove the
+> temp. table after query execution.
+>
+> With #3 and #4, we potentially need to execute the subquery for every
+> row returned by the outer query. Performance would be horrible for
+> anything but the smallest query. Another way to handle this is to
+> execute the subquery WITHOUT using any of the outer-query columns to
+> restrict the WHERE clause, and add those columns used to join the outer
+> variables into the target list of the subquery. So for query:
+>
+>     select t1.name
+>     from tab t1
+>     where t1.age = (select max(t2.age)
+>                     from tab2 t2
+>                     where t2.name = t1.name)
+>
+> Execute the subquery and put it in a temporary table:
+>
+>     select t2.name, max(t2.age) as age
+>     into table temp999
+>     from tab2 t2
+>     group by t2.name
+>
+> create index i_temp999 on temp999 (name)
+>
+> Then re-write the outer query:
+>
+> select t1.name
+> from tab t1, temp999
+> where t1.age = temp999.age and
+> t1.name = temp999.name
+>
+> The only problem here is that the subselect is running for all entries
+> in tab2, even if the outer query is only going to need a few rows.
+> Deciding whether to execute the subquery each time or create a temp.
+> table is often difficult. Even some non-correlated
+> subqueries are better executed for each row rather than pre-executing the
+> entire subquery, especially if the outer query returns few rows.
+>
+> One requirement to handle these issues is better column statistics,
+> which I am working on.
+>
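To make the two strategies for the correlated case concrete, here is a small C sketch over in-memory rows. It is an illustration of the idea only, not executor code: Row, correlated_scan, build_temp and join_temp are hypothetical names, and the linear searches stand in for whatever index or join method would really be used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int age; } Row;

/* Strategy 1: rerun "max(age) where name = outer.name" for every outer row. */
static size_t correlated_scan(const Row *tab, size_t n,
                              const Row *tab2, size_t m, size_t *out)
{
    assert(out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        int found = 0, max_age = 0;
        for (size_t j = 0; j < m; j++) {
            if (strcmp(tab2[j].name, tab[i].name) != 0)
                continue;
            if (!found || tab2[j].age > max_age)
                max_age = tab2[j].age;
            found = 1;
        }
        if (found && tab[i].age == max_age)
            out[k++] = i;           /* outer row i qualifies */
    }
    return k;
}

/* Strategy 2, step 1: group tab2 by name once (the "temp999" table). */
static size_t build_temp(const Row *tab2, size_t m, Row *temp)
{
    size_t t = 0;
    for (size_t j = 0; j < m; j++) {
        size_t g = 0;
        while (g < t && strcmp(temp[g].name, tab2[j].name) != 0)
            g++;
        if (g == t)
            temp[t++] = tab2[j];            /* first row of a new group */
        else if (tab2[j].age > temp[g].age)
            temp[g].age = tab2[j].age;      /* keep the group maximum */
    }
    return t;
}

/* Strategy 2, step 2: join the outer table to the temp table. */
static size_t join_temp(const Row *tab, size_t n,
                        const Row *temp, size_t t, size_t *out)
{
    assert(out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t g = 0; g < t; g++)
            if (strcmp(tab[i].name, temp[g].name) == 0 &&
                tab[i].age == temp[g].age) {
                out[k++] = i;
                break;
            }
    return k;
}
```

Both strategies select the same outer rows. The first does work proportional to the outer rows actually examined; the second always aggregates all of tab2, including groups the outer query never asks about, which is the trade-off described above.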
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
+ for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
+ Fri, 31 Oct 1997 21:37:06 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselects
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+One more issue I thought of. You can have multiple subselects in a
+single query, and subselects can have their own subselects.
+
+This makes it particularly important that we define a system that always
+is able to process the subselect BEFORE the upper select. This will
+allow us to handle all these cases without limitations.
+
+>
+> The only thing I have to add to what I had written earlier is that I
+> think it is best to have these subqueries executed as early in query
+> execution as possible.
+>
+> Every piece of the backend: parser, optimizer, executor, is designed to
+> work on a single query. The earlier we can split up the queries, the
+> better those pieces will work at doing their job. You want to be able
+> to use the parser and optimizer on each part of the query separately, if
+> you can.
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From hannu@trust.ee Sun Nov 2 10:33:33 1997
+Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
+ for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
+Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
+ by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
+ Sun, 2 Nov 1997 17:30:11 +0200
+Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
+Date: Sun, 02 Nov 1997 17:27:57 +0200
+From: Hannu Krosing <hannu@trust.ee>
+X-Mailer: Mozilla 4.02 [en] (Win95; I)
+MIME-Version: 1.0
+To: hackers-digest@postgresql.org
+CC: maillist@candle.pha.pa.us
+Subject: Re: [HACKERS] subselects
+References: <199711010401.XAA09216@hub.org>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
+> From: Bruce Momjian <maillist@candle.pha.pa.us>
+> Subject: Re: [HACKERS] subselects
+>
+> One more issue I thought of. You can have multiple subselects in a
+> single query, and subselects can have their own subselects.
+>
+> This makes it particularly important that we define a system that always
+> is able to process the subselect BEFORE the upper select. This will
+> allow us to handle all these cases without limitations.
+
+This would severely limit what subselects can be used for, as you can't use any of the fields in the upper select in the
+search criteria for the subselect;
+for example you can't do
+
+update parts p1
+set parts.current_id = (
+ select new_id
+ from parts p2
+ where p1.old_id = p2.new_id);
+or
+
+select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
+from parts p1;
+
+there may be of course ways to rewrite these queries (which the optimiser should do
+if it can) but IMHO, these kinds of subselects should still be allowed
+
+> > The only thing I have to add to what I had written earlier is that I
+> > think it is best to have these subqueries executed as early in query
+> > execution as possible.
+> >
+> > Every piece of the backend: parser, optimizer, executor, is designed to
+> > work on a single query. The earlier we can split up the queries, the
+> > better those pieces will work at doing their job. You want to be able
+> > to use the parser and optimizer on each part of the query separately, if
+> > you can.
+> >
+>
+
+Hannu
+
+
+From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
+ for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
+Date: Mon, 03 Nov 1997 09:22:38 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199711021848.NAA08319@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > One more issue I thought of. You can have multiple subselects in a
+> > > single query, and subselects can have their own subselects.
+> > >
+> > > This makes it particularly important that we define a system that always
+> > > is able to process the subselect BEFORE the upper select. This will
+> > > allow us to handle all these cases without limitations.
+> >
+> > This would severely limit what subselects can be used for, as you can't use any of the fields in the upper select in the
+> > search criteria for the subselect;
+> > for example you can't do
+> >
+> > update parts p1
+> > set parts.current_id = (
+> > select new_id
+> > from parts p2
+> > where p1.old_id = p2.new_id);
+> > or
+> >
+> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
+> > from parts p1;
+> >
+> > there may be of course ways to rewrite these queries (which the optimiser should do
+> > if it can) but IMHO, these kinds of subselects should still be allowed
+>
+> I hadn't even gotten to this point yet, but it is a good thing to keep
+> in mind.
+>
+> In these cases, as in correlated subqueries in the where clause, we will
+> create a temporary table, and add the proper join fields and tables to
+> the clauses. Our version of UPDATE accepts a FROM section, and we will
+> certainly use this for this purpose.
+
+We can't replace a subselect with a join if there is an aggregate
+in the subselect.
+
+Actually, I don't see any problems if we are going to process subselects
+like SQL functions: non-correlated subselects can be emulated by
+funcs without args; for correlated subselects the parser (analyze.c)
+has to change all upper query references to $1, $2,...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
+Date: Mon, 03 Nov 1997 18:09:43 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselects
+References: <199711030316.WAA15401@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > > In these cases, as in correlated subqueries in the where clause, we will
+> > > create a temporary table, and add the proper join fields and tables to
+> > > the clauses. Our version of UPDATE accepts a FROM section, and we will
+> > > certainly use this for this purpose.
+> >
+> > We can't replace subselect with join if there is aggregate
+> > in subselect.
+>
+> I got lost here. Why can't we handle aggregates?
+
+Sorry, I missed the use of temp tables. Sybase uses joins (without
+temp tables) for non-correlated subqueries:
+
+ A noncorrelated subquery can be evaluated as if it were an independent query.
+ Conceptually, the results of the subquery are substituted in the main statement, or
+ outer query. This is not how SQL Server actually processes statements with
+ subqueries. Noncorrelated subqueries can be alternatively stated as joins and
+ are processed as joins by SQL Server.
+
+but this is not possible if there are aggregates in subquery.
+
+>
+> My idea was this. This is a non-correlated subquery.
+...
+No problems with it...
+
+>
+> Here is a correlated example:
+>
+> select *
+> from table_a
+> where table_a.col_a in (select table_b.col_b
+> from table_b
+> where table_b.col_b = table_a.col_c)
+>
+> rewrite as:
+>
+> select distinct table_b.col_b, table_a.col_c -- the distinct is needed
+> into table_sub
+> from table_a, table_b
+
+First, could we add 'where table_b.col_b = table_a.col_c' here ?
+Just to avoid Cartesian results ? I hope we can.
+
+Note that for query
+
+ select *
+ from table_a
+ where table_a.col_a in (select table_b.col_b * table_a.col_c
+ from table_b)
+
+it's better to do
+
+ select distinct table_a.col_a
+ into table table_sub
+ from table_b, table_a
+ where table_a.col_a = table_b.col_b * table_a.col_c
+
+once again - to avoid Cartesians.
+
+But what could we do for
+
+ select *
+ from table_a
+ where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+ from table_b)
+???
+ select max(table_b.col_b * table_a.col_c), table_a.col_a
+ into table table_sub
+ from table_b, table_a
+ group by table_a.col_a
+
+first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
+For tables big and small with 100 000 and 1000 tuples
+
+select max(x*y), x from big, small group by x
+
+"ate" all free 140M in my file system after 20 minutes (just for
+sorting - nothing more) and was killed...
+
+select x from big where x = cor(x);
+(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
+this is bad too.
+
+> >
+> > Actually, I don't see any problems if we going to process subselect
+> > like sql-funcs: non-correlated subselects can be emulated by
+> > funcs without args, for correlated subselects parser (analyze.c)
+> > has to change all upper query references to $1, $2,...
+>
+> Yes, logically, they are SQL functions, but aren't we going to see
+> terrible performance in such circumstances. My experience is that when
+ ^^^^^^^^^^^^^^^^^^^^
+You're right.
+
+> people are given subselects, they start to do huge jobs with them.
+>
+> In fact, the final solution may be to have both methods available, and
+> switch between them depending on the size of the query sets. Each
+> method has its advantages. The function example lets the outside query
+> be executed, and only calls the subquery when needed.
+>
+> For large tables where the subselect is small and is the entire WHERE
+> restriction, the SQL function gets called much too often. A simple join
+> of the subquery result and the large table would be much better. This
+> method also allows for sort/merge join of the subquery results, and
+> index use.
+
+...keep thinking...
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
+ Mon, 3 Nov 1997 10:25:34 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> Sorry, I missed using of temp tables. Sybase uses joins (without
+> temp tables) for non-correlated subqueries:
+>
+> A noncorrelated subquery can be evaluated as if it were an independent query.
+> Conceptually, the results of the subquery are substituted in the main statement, or
+> outer query. This is not how SQL Server actually processes statements with
+> subqueries. Noncorrelated subqueries can be alternatively stated as joins and
+> are processed as joins by SQL Server.
+>
+> but this is not possible if there are aggregates in subquery.
+>
+> >
+> > My idea was this. This is a non-correlated subquery.
+> ...
+> No problems with it...
+>
+> >
+> > Here is a correlated example:
+> >
+> > select *
+> > from table_a
+> > where table_a.col_a in (select table_b.col_b
+> > from table_b
+> > where table_b.col_b = table_a.col_c)
+> >
+> > rewrite as:
+> >
+> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed
+> > into table_sub
+> > from table_a, table_b
+>
+> First, could we add 'where table_b.col_b = table_a.col_c' here ?
+> Just to avoid Cartesian results ? I hope we can.
+
+Yes, of course. I forgot that line here. We can also be fancy and move
+some of the outer where restrictions on table_a into the subquery.
+
+I think the classic subquery for this would be if someone wanted all
+customer names that had invoices in the past month:
+
+select custname
+from customer
+where custid in (select order.custid
+ from order
+ where order.date >= "09/01/97" and
+ order.date <= "09/30/97")
+
+In this case, the subquery can use an index on 'date' to quickly
+evaluate the query, and the resulting temp table can quickly be joined
+to the customer table. If we used SQL functions, every customer would
+have an order query evaluated for it, and there may be no multi-column
+index on customer and date, or even if there is, this could be many
+query executions.
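
The two strategies compared above can be sketched with Python's bundled sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for the backend, the table is called `orders` and the column `order_date` (hypothetical names, chosen because `order` is a reserved word), and the rows are made up. It shows that evaluating the subquery once into a temp table and joining gives the same answer as the IN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custid INTEGER PRIMARY KEY, custname TEXT);
    CREATE TABLE orders (custid INTEGER, order_date TEXT);
    CREATE INDEX orders_date_idx ON orders (order_date);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (1, '1997-09-03'), (1, '1997-09-17'),  -- Ann ordered twice in range
        (2, '1997-08-30'),                     -- Bob is outside the range
        (3, '1997-09-30');
""")

# Form 1: the subselect as the user would write it.
in_form = conn.execute("""
    SELECT custname FROM customer
    WHERE custid IN (SELECT custid FROM orders
                     WHERE order_date >= '1997-09-01'
                       AND order_date <= '1997-09-30')
    ORDER BY custname
""").fetchall()

# Form 2: run the subquery once into a temp table (DISTINCT so that
# the join cannot duplicate customers), then join to the outer table.
conn.execute("""
    CREATE TEMP TABLE table_sub AS
    SELECT DISTINCT custid FROM orders
    WHERE order_date >= '1997-09-01' AND order_date <= '1997-09-30'
""")
join_form = conn.execute("""
    SELECT custname FROM customer, table_sub
    WHERE customer.custid = table_sub.custid
    ORDER BY custname
""").fetchall()

print(in_form)
print(join_form)
```

Both queries list Ann and Cy. The DISTINCT matters: without it Ann would appear twice in the join form.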
+
+
+>
+> Note that for query
+>
+> select *
+> from table_a
+> where table_a.col_a in (select table_b.col_b * table_a.col_c
+> from table_b)
+>
+> it's better to do
+>
+> select distinct table_a.col_a
+> into table table_sub
+> from table_b, table_a
+> where table_a.col_a = table_b.col_b * table_a.col_c
+
+Yes, I had not thought of cases where they are doing correlated column
+arithmetic, but it looks like this would work.
+
+>
+> once again - to avoid Cartesians.
+>
+> But what could we do for
+>
+> select *
+> from table_a
+> where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+> from table_b)
+
+OK, who wrote this horrible query. :-)
+
+Without a join of table_b and table_a, even an SQL function would die on
+this. You have to take the current value table_a.col_c, and multiply by
+every value of table_b.col_b to get the maximum.
+
+Trying to do a temp table on this is certainly going to be a cartesian
+product, but using an SQL function is also going to be a cartesian
+product, except that the product is generated in small pieces instead of
+in one big query. The SQL function example may eventually complete, but
+it will take forever to do so in cases where the temp table would bomb.
+
+I can recommend some SQL books for anyone who sends in a bug report on
+this query. :-)
+
+
+
+> ???
+> select max(table_b.col_b * table_a.col_c), table_a.col_a
+> into table table_sub
+> from table_b, table_a
+> group by table_a.col_a
+>
+> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
+> For tables big and small with 100 000 and 1000 tuples
+>
+> select max(x*y), x from big, small group by x
+>
+> "ate" all free 140M in my file system after 20 minutes (just for
+> sorting - nothing more) and was killed...
+>
+> select x from big where x = cor(x);
+> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
+> this is bad too.
+
+Again, my feeling is that in cases where the temp table would bomb, the
+SQL function will be so slow that neither will be acceptable.
+
+>
+> > >
+> > > Actually, I don't see any problems if we going to process subselect
+> > > like sql-funcs: non-correlated subselects can be emulated by
+> > > funcs without args, for correlated subselects parser (analyze.c)
+> > > has to change all upper query references to $1, $2,...
+> >
+> > Yes, logically, they are SQL functions, but aren't we going to see
+> > terrible performance in such circumstances. My experience is that when
+> ^^^^^^^^^^^^^^^^^^^^
+> You're right.
+>
+> > people are given subselects, they start to do huge jobs with them.
+> >
+> > In fact, the final solution may be to have both methods available, and
+> > switch between them depending on the size of the query sets. Each
+> > method has its advantages. The function example lets the outside query
+> > be executed, and only calls the subquery when needed.
+> >
+> > For large tables where the subselect is small and is the entire WHERE
+> > restriction, the SQL function gets called much too often. A simple join
+> > of the subquery result and the large table would be much better. This
+> > method also allows for sort/merge join of the subquery results, and
+> > index use.
+>
+> ...keep thinking...
+>
+> Vadim
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
+ for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
+ for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
+Subject: [HACKERS] subselect
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+I am going to overhaul all the /parser files, and I may give subselects
+a try while I am in there. This is where it is going to have to be done.
+
+Two things I think I need are:
+
+ temp tables that go away at the end of a statement, so if the
+query elog's out, the temp file gets destroyed
+
+ how do I implement "not in":
+
+ select * from a where x not in (select y from b)
+
+Using <> is not going to work because that returns multiple copies of a,
+one for every one that doesn't equal. It is like we need not equals,
+but don't return multiple rows.
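
The problem described above is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in backend with made-up rows (both are assumptions, not part of the original discussion). It compares NOT IN, a join on <>, and a NOT EXISTS formulation, which is one common way to get "not equals, but no multiple rows".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# What the user asked for: rows of a whose x matches no y in b.
not_in = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x"
).fetchall()

# A join on <> emits one row per (a, b) pair that differs, so every
# row of a comes back, most of them several times.
ne_join = conn.execute(
    "SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x"
).fetchall()

# An anti-join: keep a row of a only if no row of b equals it.
not_exists = conn.execute("""
    SELECT x FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x)
    ORDER BY x
""").fetchall()

print(not_in)      # [(1,)]
print(ne_join)     # seven rows: 1, 1, 1, 2, 2, 3, 3
print(not_exists)  # [(1,)]
```

NOT EXISTS matches NOT IN here only because neither table contains NULLs; with NULLs in b the two forms can differ.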
+
+Any ideas?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
+ for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
+ Thu, 20 Nov 1997 06:27:21 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
+Date: Thu, 20 Nov 1997 06:27:21 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: [HACKERS] subselect
+References: <199711200457.XAA03103@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> I am going to overhaul all the /parser files
+
+??
+
+> , and I may give subselects
+> a try while I am in there. This is where it is going to have to be done.
+
+A first cut at the subselect syntax is already in gram.y. I'm sure that the
+e-mail you had sent which collected several items regarding subselects
+covers some of this topic. I've been thinking about subselects also, and
+had thought that there must be some existing mechanisms in the backend
+which can be used to help implement subselects. It seems to me that UNION
+might be a good thing to implement first, because it has a fairly
+well-defined set of behaviors:
+
+ select a union select b;
+
+chooses elements from a and from b and then sorts/uniques the result.
+
+ select a union all select b;
+
+chooses elements from a and then adds all elements from b, with no sort/unique step.
+
+ select a union select b union all select c;
+
+evaluates left to right, and first evaluates a union b, sorts/uniques, and
+then evaluates
+
+ (result) union all select c;
+
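
A quick way to check UNION behavior is to run the three forms against a small data set. The sketch below uses Python's sqlite3 module, which is an assumption (SQLite stands in for the backend); the tables a, b, c and their single column f are illustrative. SQLite groups compound selects left to right, and in standard SQL UNION ALL removes no duplicates from either input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    CREATE TABLE c (f INTEGER);
    INSERT INTO a VALUES (1), (1), (2);
    INSERT INTO b VALUES (2), (3);
    INSERT INTO c VALUES (3), (3);
""")

def run(sql):
    # Sort in Python so the comparison does not depend on row order.
    return sorted(row[0] for row in conn.execute(sql))

# UNION removes duplicates across both inputs.
union = run("SELECT f FROM a UNION SELECT f FROM b")

# UNION ALL concatenates the inputs and keeps every duplicate.
union_all = run("SELECT f FROM a UNION ALL SELECT f FROM b")

# Left to right: (a UNION b) is made unique first, then all of c is appended.
chained = run("SELECT f FROM a UNION SELECT f FROM b UNION ALL SELECT f FROM c")

print(union)      # [1, 2, 3]
print(union_all)  # [1, 1, 2, 2, 3]
print(chained)    # [1, 2, 3, 3, 3]
```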
+There are several types of subselects. Examples of some are:
+
+1) select a.f from a union select b.f from b order by 1;
+Needs temporary table(s), optional sort/unique, final order by.
+
+2) select a.f from a where a.f in (select b.f from b);
+Needs temporary table(s). "in" can first be implemented by count(*) > 0, but
+it would perform better to have the backend return after the first
+match.
+
+3) select a.f from a where exists (select b.f from b where b.f = a.f);
+Need to do the select and do a subselect on _each_ of the returned values?
+Again could use count(*) to help implement.
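
The relationship between IN, EXISTS and a count(*) > 0 test can be checked on toy data. This sketch uses Python's sqlite3 module as a stand-in (an assumption) and made-up rows. It only shows that the three forms select the same rows; it says nothing about how a backend would execute them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);
""")

in_rows = conn.execute(
    "SELECT a.f FROM a WHERE a.f IN (SELECT b.f FROM b) ORDER BY a.f"
).fetchall()

# count(*) has to visit every matching row of b before comparing to 0.
count_rows = conn.execute("""
    SELECT a.f FROM a
    WHERE (SELECT count(*) FROM b WHERE b.f = a.f) > 0
    ORDER BY a.f
""").fetchall()

# EXISTS only needs to know that at least one row matches, so an
# executor is free to stop after the first one.
exists_rows = conn.execute("""
    SELECT a.f FROM a
    WHERE EXISTS (SELECT b.f FROM b WHERE b.f = a.f)
    ORDER BY a.f
""").fetchall()

print(in_rows, count_rows, exists_rows)
```

All three return the rows 2 and 3. The duplicate 2 in b does not duplicate the output, which is what distinguishes these forms from a plain join.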
+
+This brings up the point that perhaps the backend needs a row-counting
+atomic operation and count(*) could be re-implemented using that. At the
+moment count(*) is transformed to a select of OID columns and does not
+quite work on table joins.
+
+I would think that outer joins could use some of these support routines
+also.
+
+ - Tom
+
+> Two things I think I need are:
+>
+> temp tables that go away at the end of a statement, so if the
+> query elog's out, the temp file gets destroyed
+>
+> how do I implement "not in":
+>
+> select * from a where x not in (select y from b)
+>
+> Using <> is not going to work because that returns multiple copies of a,
+> one for every one that doesn't equal. It is like we need not equals,
+> but don't return multiple rows.
+>
+> Any ideas?
+>
+> --
+> Bruce Momjian
+> maillist@candle.pha.pa.us
+
+
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
+ for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
+ for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+OK, a few questions:
+
+ Should we use sortmerge, so we can use our psort as temp tables,
+or do we use hashunique?
+
+ How do we pass the query to the optimizer? How do we represent
+the range table for each, and the links between them in correlated
+subqueries?
+
+I have to think about this. Comments are welcome.
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
+ for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
+ for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
+Subject: [HACKERS] subselects (fwd)
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Forwarded message:
+> OK, a few questions:
+>
+> Should we use sortmerge, so we can use our psort as temp tables,
+> or do we use hashunique?
+>
+> How do we pass the query to the optimizer? How do we represent
+> the range table for each, and the links between them in correlated
+> subqueries?
+>
+> I have to think about this. Comments are welcome.
+
+One more thing. I guess I am seeing subselects as a different thing
+than temp tables. I can see people wanting to put indexes on their temp
+tables, so I think they will need more system catalog support. For
+subselects, I think we can just stuff them into psort, perhaps, and do
+the unique as we unload them.
+
+Seems like a natural to me.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
+ for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
+ Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
+Date: Tue, 23 Dec 1997 16:08:56 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects (fwd)
+References: <199712220605.BAA17354@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Forwarded message:
+> > OK, a few questions:
+> >
+> > Should we use sortmerge, so we can use our psort as temp tables,
+> > or do we use hashunique?
+> >
+> > How do we pass the query to the optimizer? How do we represent
+> > the range table for each, and the links between them in correlated
+> > subqueries?
+> >
+> > I have to think about this. Comments are welcome.
+>
+> One more thing. I guess I am seeing subselects as a different thing
+> than temp tables. I can see people wanting to put indexes on their temp
+> tables, so I think they will need more system catalog support. For
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+What's the difference between temp tables and temp indices ?
+Both of them are handled via catalog cache...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
+ for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
+ Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
+Date: Sat, 03 Jan 1998 16:08:51 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>,
+ "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Subject: Re: subselects
+References: <199712290516.AAA12579@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> With UNIONs done, how are things going with you on subselects? UNIONs
+> are much easier than subselects.
+>
+> I am stumped on how to record the subselect query information in the
+> parser and stuff.
+
+ So am I. We definitely need an EXISTS node and maybe an IN one.
+Also, we have to support the ANY and ALL modifiers of comparison operators
+(it would be nice to support ANY and ALL for all operators returning
+bool: >, =, ..., like, ~ and so on). Note that IN is the same as
+= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
+and so we could avoid an IN node, but I'm not sure that I like such an
+assumption: postgres is an OO-like system allowing operators to be overridden,
+and so '=' can, in theory, mean not EQUAL but something else (someday
+we could allow specifying the "meaning" of an operator in CREATE OPERATOR) -
+in short, I would like an IN node.
+ Also, I would suggest nodes for ANY and ALL.
+ (I need a few days to think more about how to record this stuff...)
+
+>
+> Please let me know what I can do to help, if anything.
+
+Thanks. As I remember, Tom also wished to work here. Tom ?
+
+Bye,
+ Vadim
+
+P.S. I'll be "on-line" Jan 5.
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
+ Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
+Date: Mon, 05 Jan 1998 19:35:59 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselect
+References: <199801050516.AAA28005@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> I was thinking about subselects, and how to attach the two queries.
+>
+> What if the subquery makes a range table entry in the outer query, and
+> the query is set up like the UNION queries where we put the scans in a
+> row, but in the case we put them over/under each other.
+>
+> And we push a temp table into the catalog cache that represents the
+> result of the subquery, then we could join to it in the outer query as
+> though it was a real table.
+>
+> Also, can't we do the correlated subqueries by adding the proper
+> target/output columns to the subquery, and have the outer query
+> reference those columns in the subquery range table entry.
+
+Yes, this is a way to handle subqueries by joining to temp table.
+After getting plan we could change temp table access path to
+node material. On the other hand, it could be useful to let optimizer
+know about cost of temp table creation (have to think more about it)...
+Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+is one example of this - joining by <> will give us invalid results.
+Setting special NOT EQUAL flag is not enough: subquery plan must be
+always inner one in this case. The same for handling ALL modifier.
+Note that we generally can't use aggregates here: we can't add MAX to the
+subquery in the case of > ALL (subquery), because > ALL should return FALSE
+if the subquery returns NULL(s), but aggregates don't take NULLs into account.
+
+>
+> Maybe I can write up a sample of this? Vadim, would this help? Is this
+> the point we are stuck at?
+
+Personally, I was stuck by holidays -:)
+Now I can spend ~ 8 hours ~ each day for development...
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
+ Mon, 5 Jan 1998 10:28:48 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselect
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> Yes, this is a way to handle subqueries by joining to temp table.
+> After getting plan we could change temp table access path to
+> node material. On the other hand, it could be useful to let optimizer
+> know about cost of temp table creation (have to think more about it)...
+> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+> is one example of this - joining by <> will give us invalid results.
+> Setting special NOT EQUAL flag is not enough: subquery plan must be
+> always inner one in this case. The same for handling ALL modifier.
+> Note that we generally can't use aggregates here: we can't add MAX to the
+> subquery in the case of > ALL (subquery), because > ALL should return FALSE
+> if the subquery returns NULL(s), but aggregates don't take NULLs into account.
+
+OK, here are my ideas. First, I think you have to handle subselects in
+the outer node because a subquery could have its own subquery. Also, we
+now have a field in Aggreg to allow us to 'usenulls'.
+
+OK, here it is. I recommend we pass the outer and subquery through
+the parser and optimizer separately.
+
+We parse the subquery first. If the subquery is not correlated, it
+should parse fine. If it is correlated, any columns we find in the
+subquery that are not already in the FROM list, we add the table to the
+subquery FROM list, and add the referenced column to the target list of
+the subquery.
+
+When we are finished parsing the subquery, we create a catalog cache
+entry for it called 'sub1' and make its fields match the target
+list of the subquery.
+
+In the outer query, we add 'sub1' to its FROM list, and change
+the subquery reference to point to the new range table. We also add
+WHERE clauses to do any correlated joins.
+
+Here is a simple example:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb)
+
+This is not correlated, and the subquery parses easily. We create a
+'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
+clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.
+
+Here is a more complex correlated subquery:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb
+ where taba.col3 = tabb.col4)
+
+Here we must add 'taba' to the subquery's FROM list, and add col3 to the
+target list of the subquery. After we parse the subquery, add 'sub1' to
+the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
+sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
+The optimizer will do the correlation for us.
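
The correlated rewrite described above can be tried on toy data. The sketch below uses Python's sqlite3 module as a stand-in (an assumption), materializes 'sub1' as an ordinary temp table rather than a catalog cache entry, and uses made-up rows in which each outer row matches at most one subquery row. It checks that the rewritten join returns the same rows as the original correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER, col3 INTEGER);
    CREATE TABLE tabb (col2 INTEGER, col4 INTEGER);
    INSERT INTO taba VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO tabb VALUES (10, 1), (99, 2), (30, 3);
""")

# The query as written by the user.
original = conn.execute("""
    SELECT col1, col3 FROM taba
    WHERE col1 = (SELECT col2 FROM tabb WHERE taba.col3 = tabb.col4)
    ORDER BY col1
""").fetchall()

# Step 1: the subquery gains 'taba' in its FROM list and col3 in its
# target list, and its result is stored as sub1.
conn.execute("""
    CREATE TEMP TABLE sub1 AS
    SELECT DISTINCT tabb.col2 AS col2, taba.col3 AS col3
    FROM tabb, taba
    WHERE taba.col3 = tabb.col4
""")

# Step 2: the outer query joins to sub1 and carries the correlation
# as an extra WHERE clause.
rewritten = conn.execute("""
    SELECT taba.col1, taba.col3 FROM taba, sub1
    WHERE taba.col1 = sub1.col2 AND taba.col3 = sub1.col3
    ORDER BY taba.col1
""").fetchall()

print(original)
print(rewritten)
```

Both forms return (10, 1) and (30, 3). The sketch does not check that the subquery yields a single row per outer row, which a real implementation of '= (subquery)' would have to enforce.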
+
+In the optimizer, we can parse the subquery first, then the outer query,
+and then replace all 'sub1' references in the outer query to use the
+subquery plan.
+
+I realize merging the two plans and doing IN and NOT IN is the
+real challenge, but I hoped this would give us a start.
+
+What do you think?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
+ Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 02:55:57 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801051528.KAA10375@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > always inner one in this case. The same for handling ALL modifier.
+> > Note that we generally can't use aggregates here: we can't add MAX to the
+> > subquery in the case of > ALL (subquery), because > ALL should return FALSE
+> > if the subquery returns NULL(s), but aggregates don't take NULLs into account.
+>
+> OK, here are my ideas. First, I think you have to handle subselects in
+> the outer node because a subquery could have its own subquery. Also, we
+
+I hope this doesn't matter: if the results of a subquery (with or without sub-subqueries)
+go into a temp table, then this table will be re-scanned for each outer tuple.
+
+> now have a field in Aggreg to all us to 'usenulls'.
+ ^^^^^^^^
+ This can't help:
+
+vac=> select * from x;
+y
+-
+1
+2
+3
+ <<< this is NULL
+(4 rows)
+
+vac=> select max(y) from x;
+max
+---
+ 3
+
+==> we can't replace
+
+select * from A where A.a > ALL (select y from x);
+ ^^^^^^^^^^^^^^^
+ (NULL will be returned and so A.a > ALL is FALSE - this is what
+ Sybase does, is it right ?)
+with
+
+select * from A where A.a > (select max(y) from x);
+ ^^^^^^^^^^^^^^^^^^^^
+just because we lose knowledge about NULLs here.
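
The NULL problem above can be modelled directly. The sketch below is plain Python, not backend code: None stands for NULL, and the helper names gt, gt_all and sql_max are hypothetical. Under standard three-valued logic, '> ALL' over a set containing a NULL is UNKNOWN (or FALSE) rather than TRUE, so the row is rejected either way, while MAX silently drops the NULL.

```python
from typing import Iterable, Optional

def gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style a > b: the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a > b

def gt_all(a: Optional[int], values: Iterable[Optional[int]]) -> Optional[bool]:
    """SQL-style a > ALL (values) under three-valued logic."""
    saw_unknown = False
    for v in values:
        result = gt(a, v)
        if result is False:
            return False          # one definite failure decides the answer
        if result is None:
            saw_unknown = True    # remember that a NULL was involved
    return None if saw_unknown else True

def sql_max(values: Iterable[Optional[int]]) -> Optional[int]:
    """SQL-style MAX: NULLs are ignored; an all-NULL or empty input gives NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

x = [1, 2, 3, None]               # the table from the psql session above

print(gt_all(5, x))               # None: unknown, so a WHERE clause rejects the row
print(gt(5, sql_max(x)))          # True: the MAX rewrite accepts the row
print(gt_all(2, x))               # False: 2 > 2 fails before the NULL matters
```

A WHERE clause keeps a row only when the condition is TRUE, so the first two lines show the two forms disagreeing for A.a = 5.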
+
+Also, I would like to handle ANY and ALL modifiers for all bool
+operators, either built-in or user-defined, for all data types -
+isn't PostgreSQL OO-like RDBMS -:)
+
+> OK, here it is. I recommend we pass the outer and subquery through
+> the parser and optimizer separately.
+
+I don't like this. I would like to get a parse-tree from the parser for the
+entire query and let the optimizer (at its upper level) decide how to rewrite
+the parse-tree, what plans to produce, and how these plans should be
+merged. Note that I don't object to your methods below, only to where
+the handling of this is placed. I don't understand why we should add a
+new part to the system which will do the optimizer's work (parse-tree -->
+execution plan) and deal with optimizer nodes. Imho, the upper optimizer
+level is a nice place to do this.
+
+>
+> We parse the subquery first. If the subquery is not correlated, it
+> should parse fine. If it is correlated, any columns we find in the
+> subquery that are not already in the FROM list, we add the table to the
+> subquery FROM list, and add the referenced column to the target list of
+> the subquery.
+>
+> When we are finished parsing the subquery, we create a catalog cache
+> entry for it called 'sub1' and make its fields match the target
+> list of the subquery.
+>
+> In the outer query, we add 'sub1' to its target list, and change
+> the subquery reference to point to the new range table. We also add
+> WHERE clauses to do any correlated joins.
+...
+> Here is a more complex correlated subquery:
+>
+> select *
+> from taba
+> where col1 = (select col2
+> from tabb
+> where taba.col3 = tabb.col4)
+>
+> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
+> target list of the subquery. After we parse the subquery, add 'sub1' to
+> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
+> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
+> The optimizer will do the correlation for us.
+>
+> In the optimizer, we can parse the subquery first, then the outer query,
+> and then replace all 'sub1' references in the outer query to use the
+> subquery plan.
+>
+> I realize making merging the two plans and doing IN and NOT IN is the
+ ^^^^^^^^^^^^^^^^^^^^^
+This is very easy to do! As I already said, we just have to replace the sub1
+access path (SeqScan of sub1) with a SeqScan of a Material node holding the
+subquery plan.
+
+> real challenge, but I hoped this would give us a start.
+
+A decision about how to record subquery stuff in the parse tree
+would be a very good start -:)
+
+BTW, note that for _expression_ subqueries (which are introduced without
+IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples -
+we have to check that subquery returns single tuple...
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
+Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
+ by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
+ for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
+ Mon, 5 Jan 1998 17:16:40 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselect
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > I am confused. Do you want one flat query and want to pass the whole
+> > thing into the optimizer? That brings up some questions:
+>
+> No. I just want to follow Tom's way: I would like to see new
+> SubSelect node as shortened version of struct Query (or use
+> Query structure for each subquery - no matter for me), some
+> subquery-related stuff added to Query (and SubSelect) to help
+> optimizer to start, and see
+
+OK, so you want the subquery to actually be INSIDE the outer query
+expression. Do they share a common range table? If they don't, we
+could very easily just fly through when processing the WHERE clause, and
+start a new query using a new query structure for the subquery. Believe
+me, you don't want a separate SubQuery-type, just re-use Query for it.
+It allows you to call all the normal query stuff with a consistent
+structure.
+
+The parser will need to know it is in a subquery, so it can add the
+proper target columns to the subquery, or are you going to do that in
+the optimizer? You can do it in the optimizer, and join the range table
+references there too.
+
+>
+> typedef struct A_Expr
+> {
+> NodeTag type;
+> int oper; /* type of operation
+> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> IN, NOT IN, ANY, ALL, EXISTS here,
+>
+> char *opname; /* name of operator/function */
+> Node *lexpr; /* left argument */
+> Node *rexpr; /* right argument */
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> and SubSelect (Query) here (as possible case).
+>
+> One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+> Query - how else can we implement VIEWs on selects with subqueries ?
+
+Views are stored as nodeout structures, and are merged into the query's
+from list, target list, and where clause. I am working out
+readfunc,outfunc now to make sure they are up-to-date with all the
+current fields.
+
+>
+> BTW, is
+>
+> select * from A where (select TRUE from B);
+>
+> valid syntax ?
+
+I don't think so.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
+ Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 05:18:11 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801052051.PAA29341@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > OK, here it is. I recommend we pass the outer and subquery through
+> > > the parser and optimizer separately.
+> >
+> > I don't like this. I would like to get parse-tree from parser for
+> > entire query and let optimizer (on upper level) decide how to rewrite
+> > parse-tree and what plans to produce and how these plans should be
+> > merged. Note, that I don't object your methods below, but only where
+> > to place handling of this. I don't understand why should we add
+> > new part to the system which will do optimizer' work (parse-tree -->
+> > execution plan) and deal with optimizer nodes. Imho, upper optimizer
+> > level is nice place to do this.
+>
+> I am confused. Do you want one flat query and want to pass the whole
+> thing into the optimizer? That brings up some questions:
+
+No. I just want to follow Tom's way: I would like to see new
+SubSelect node as shortened version of struct Query (or use
+Query structure for each subquery - no matter for me), some
+subquery-related stuff added to Query (and SubSelect) to help
+optimizer to start, and see
+
+typedef struct A_Expr
+{
+ NodeTag type;
+ int oper; /* type of operation
+ * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ IN, NOT IN, ANY, ALL, EXISTS here,
+
+ char *opname; /* name of operator/function */
+ Node *lexpr; /* left argument */
+ Node *rexpr; /* right argument */
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ and SubSelect (Query) here (as possible case).
+
+One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+Query - how else can we implement VIEWs on selects with subqueries ?
+
+BTW, is
+
+select * from A where (select TRUE from B);
+
+valid syntax ?
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
+ Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 05:48:58 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Goran Thyni <goran@bildbasen.se>
+CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Goran Thyni wrote:
+>
+> Vadim,
+>
+> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+> is one example of this - joining by <> will give us invalid results.
+>
+> What is your approach towards this problem?
+
+Actually, this is a problem of the ALL modifier (NOT IN is _not_equal_ ALL),
+and so we need not just a NOT EQUAL flag but an ALL node
+with a modified operator.
+
+After that, one way is to put the subquery into the inner plan of a join node,
+to be sure that for each outer tuple all corresponding subquery tuples
+are tested with the modified operator (this will require either
+changing the code of all join nodes or adding a new plan type - we'll see),
+and another way is ... suggested by you:
+
+> I got an idea that one could reverse the order,
+> that is execute the outer first into a temptable
+> and delete from that according to the result of the
+> subquery and then return it.
+> Probably this is too raw and slow. ;-)
+
+This will be faster in some cases (when subquery returns many results
+and there are "not so many" results from outer query) - thanks for idea!
+
+>
+> Personally, I was stuck by holydays -:)
+> Now I can spend ~ 8 hours ~ each day for development...
+>
+> Oh, isn't it christmas eve right now in Russia?
+
+Due to historic reasons New Year is mu-u-u-uch popular
+holiday in Russia -:)
+
+Vadim
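+
+[Editor's sketch, not part of the original message.] The point about
+NOT IN can be checked with a small C model. It assumes integer columns
+and ignores NULLs, and both function names are hypothetical. A join on <>
+emits one row per unequal pair, while NOT IN (that is, <> ALL) requires
+every subquery row to differ from the outer value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * x NOT IN (set), modeled as "x <> ALL (set)": true only when x differs
 * from every element.  NULL handling is deliberately left out.
 */
bool not_in_all(int x, const int *set, size_t n)
{
    assert(set != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (x == set[i])
            return false;       /* one equal row is enough to reject x */
    return true;                /* also true for an empty subquery result */
}

/*
 * A plain join on "outer <> inner": returns how many result rows the
 * outer value x would produce, one per unequal inner row.
 */
size_t join_not_equal_count(int x, const int *set, size_t n)
{
    size_t matches = 0;

    assert(set != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (x != set[i])
            matches++;
    return matches;
}
```

+With a subquery result of {1, 2, 3}, the outer value 2 still finds two
+join partners under <>, so the join would return that row (twice) even
+though 2 NOT IN (1, 2, 3) is false. With an empty subquery result the
+join returns nothing, although NOT IN should keep every outer row.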
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
+ Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 06:09:56 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801052216.RAA02675@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > I am confused. Do you want one flat query and want to pass the whole
+> > > thing into the optimizer? That brings up some questions:
+> >
+> > No. I just want to follow Tom's way: I would like to see new
+> > SubSelect node as shortened version of struct Query (or use
+> > Query structure for each subquery - no matter for me), some
+> > subquery-related stuff added to Query (and SubSelect) to help
+> > optimizer to start, and see
+>
+> OK, so you want the subquery to actually be INSIDE the outer query
+> expression. Do they share a common range table? If they don't, we
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+No.
+
+> could very easily just fly through when processing the WHERE clause, and
+> start a new query using a new query structure for the subquery. Believe
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+... and filling some subquery-related stuff in upper query structure -
+still don't know what exactly this could be -:)
+
+> me, you don't want a separate SubQuery-type, just re-use Query for it.
+> It allows you to call all the normal query stuff with a consistent
+> structure.
+
+No objections.
+
+>
+> The parser will need to know it is in a subquery, so it can add the
+> proper target columns to the subquery, or are you going to do that in
+
+I don't think we need that, but a list of correlation clauses
+could be a good thing - after all, the parser has to check all column
+references...
+
+> the optimizer. You can do it in the optimizer, and join the range table
+> references there too.
+
+Yes.
+
+> > typedef struct A_Expr
+> > {
+> > NodeTag type;
+> > int oper; /* type of operation
+> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > IN, NOT IN, ANY, ALL, EXISTS here,
+> >
+> > char *opname; /* name of operator/function */
+> > Node *lexpr; /* left argument */
+> > Node *rexpr; /* right argument */
+> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > and SubSelect (Query) here (as possible case).
+> >
+> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+> > Query - how else can we implement VIEWs on selects with subqueries ?
+>
+> Views are stored as nodeout structures, and are merged into the query's
+> from list, target list, and where clause. I am working out
+> readfunc,outfunc now to make sure they are up-to-date with all the
+> current fields.
+
+Nice! This stuff was out-of-date for too long time.
+
+> > BTW, is
+> >
+> > select * from A where (select TRUE from B);
+> >
+> > valid syntax ?
+>
+> I don't think so.
+
+And so, *rexpr can be of Query type only when oper is one of OP, IN, NOT IN,
+ANY, ALL, EXISTS - good.
+
+(Time to sleep -:)
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
+ for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
+ Thu, 8 Jan 1998 22:55:03 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
+Cc: hackers@postgreSQL.org (PostgreSQL-development)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Vadim, I know you are still thinking about subselects, but I have some
+more clarification that may help.
+
+We have to add phantom range table entries to correlated subselects so
+they will pass the parser. We might as well add those fields to the
+target list of the subquery at the same time:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb
+ where taba.col3 = tabb.col4)
+
+becomes:
+
+ select *
+ from taba
+ where col1 = (select col2, tabb.col4 <---
+ from tabb, taba <---
+ where taba.col3 = tabb.col4)
+
+We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+was entered as a correlation entry:
+
+ bool isCorrelated;
+
+Second, we need to hook the subselect to the main query. I recommend we
+add two fields to Query for this:
+
+ Query *parentQuery;
+ List *subqueries;
+
+The parentQuery pointer is used to resolve field names in the correlated
+subquery.
+
+ select *
+ from taba
+ where col1 = (select col2, tabb.col4 <---
+ from tabb, taba <---
+ where taba.col3 = tabb.col4)
+
+In the query above, the subquery can be easily parsed, and we add the
+subquery to the parent's subqueries list.
+
+In the parent query, to parse the WHERE clause, we create a new operator
+type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+right side is an index to a slot in the subqueries List.
+
+We can then do the rest in the upper optimizer.
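+
+[Editor's sketch, not part of the original message.] The two proposed
+Query fields could be wired up roughly as follows. This is a simplified
+model only: the range table is reduced to an array of relation names,
+fixed-size arrays stand in for List, and add_subquery() and
+find_relation_owner() are hypothetical helpers, not existing functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RELS        4   /* arbitrary limits for the sketch */
#define MAX_SUBQUERIES  4

typedef struct Query
{
    const char   *rtable[MAX_RELS];            /* stand-in for the range table */
    int           nrels;
    struct Query *parentQuery;                 /* enclosing query, NULL at top */
    struct Query *subqueries[MAX_SUBQUERIES];  /* stand-in for List *subqueries */
    int           nsubqueries;
} Query;

/* Hook child under parent; returns its slot index, or -1 if the list is full. */
int add_subquery(Query *parent, Query *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->nsubqueries >= MAX_SUBQUERIES)
        return -1;
    child->parentQuery = parent;
    parent->subqueries[parent->nsubqueries] = child;
    return parent->nsubqueries++;
}

/*
 * Resolve a relation name for query q: look in q's own range table first,
 * then walk up the parentQuery chain.  Returns the owning query or NULL.
 */
Query *find_relation_owner(Query *q, const char *relname)
{
    for (; q != NULL; q = q->parentQuery)
        for (int i = 0; i < q->nrels; i++)
            if (strcmp(q->rtable[i], relname) == 0)
                return q;
    return NULL;
}
```

+For the taba/tabb example, the slot index returned by add_subquery() is
+what the right side of the new operator would store, and a reference to
+taba.col3 inside the subquery resolves to the outer query through
+parentQuery.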
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
+ for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
+ Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
+Date: Fri, 09 Jan 1998 22:10:06 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: subselects
+References: <199801090355.WAA09243@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Vadim, I know you are still thinking about subselects, but I have some
+> more clarification that may help.
+>
+> We have to add phantom range table entries to correlated subselects so
+> they will pass the parser. We might as well add those fields to the
+> target list of the subquery at the same time:
+>
+> select *
+> from taba
+> where col1 = (select col2
+> from tabb
+> where taba.col3 = tabb.col4)
+>
+> becomes:
+>
+> select *
+> from taba
+> where col1 = (select col2, tabb.col4 <---
+> from tabb, taba <---
+> where taba.col3 = tabb.col4)
+>
+> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+> was entered as a correlation entry:
+>
+> bool isCorrelated;
+
+No, I don't like to add anything in parser. Example:
+
+ select *
+ from tabA
+ where col1 = (select col2
+ from tabB
+ where tabA.col3 = tabB.col4
+ and exists (select *
+ from tabC
+ where tabB.colX = tabC.colX and
+ tabC.colY = tabA.col2)
+ )
+
+: a column of tabA is referenced in the sub-subselect
+(is this allowed by the standard?) - in this case it's better
+not to add tabA to the 1st subselect but to add tabA to the second one
+and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
+this gives us a 2-table join in the 1st subquery instead of a 3-table join.
+(And I'm still not sure that using temp tables is the best that can be
+done in all cases...)
+
+Instead of using isCorrelated in TE & RTE we can add
+
+Index varlevel;
+
+to the Var node to reflect the (sub)query this Var comes from
+(i.e. which range table to use to find the Var's relation by varno). The topmost
+query will have varlevel = 0, all its (direct) children - varlevel = 1 and so on.
+ ^^^ ^^^^^^^^^^^^
+(I don't see problems with distinguishing Vars of different children
+on the same level...)
+
+>
+> Second, we need to hook the subselect to the main query. I recommend we
+> add two fields to Query for this:
+>
+> Query *parentQuery;
+> List *subqueries;
+
+Agreed. And maybe Index queryLevel.
+
+> In the parent query, to parse the WHERE clause, we create a new operator
+> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+ ^^^^^^^^^^^^^^^^^^
+No. We have to handle (a,b,c) OP (select x, y, z ...) and
+'_a_constant_' OP (select ...) - I don't know whether the latter is in the
+standard; Sybase has it.
+
+Well,
+
+typedef enum OpType
+{
+ OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR,
+
+ OP_EXISTS, OP_ALL, OP_ANY /* proposed additions for subselects */
+
+} OpType;
+
+typedef struct Expr
+{
+ NodeTag type;
+ Oid typeOid; /* oid of the type of this expr */
+ OpType opType; /* type of the op */
+ Node *oper; /* could be Oper or Func */
+ List *args; /* list of argument nodes */
+} Expr;
+
+OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
+ List, following your suggestion)
+
+OP_ALL, OP_ANY:
+
+oper is a List of Oper nodes. We need a list because the data types of
+a, b, c (above) can be different, and so the Oper nodes will be different too.
+
+lfirst(args) is a List of expression nodes (Const, Var, Func ?, a + b ?) -
+the left side of the subquery's operator.
+lsecond(args) is the SubSelect.
+
+Note that there are no OP_IN, OP_NOTIN in the OpType-s for Expr. We need
+IN, NOTIN in A_Expr (parser node), but both of them have to be transformed
+by the parser into the corresponding ANY and ALL. At the moment we can do:
+
+IN --> = ANY, NOT IN --> <> ALL
+
+but this will be a "known bug": it breaks the OO nature of Postgres, because
+operators can be overridden and '=' can mean s o m e t h i n g (not equality).
+Example: the box data type. For boxes, = means equality of _areas_ and =~
+means that the boxes are the same ==> =~ ANY should be used for IN.
+
+> right side is an index to a slot in the subqueries List.
+
+Vadim
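+
+[Editor's sketch, not part of the original message.] The rewrite
+IN --> = ANY, NOT IN --> <> ALL can be modeled with the operator passed
+in as a function pointer, which mirrors the remark that the operator used
+for IN may need to differ per data type. The sketch assumes a single
+integer column on the left side and ignores NULLs; Quantifier, CompareFn
+and eval_quantified() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { QUANT_ANY, QUANT_ALL } Quantifier;

/* The comparison operator is a parameter, so a type can supply its own. */
typedef bool (*CompareFn)(int lhs, int rhs);

bool int_eq(int a, int b) { return a == b; }
bool int_ne(int a, int b) { return a != b; }

/*
 * Evaluate "lhs OP ANY (rows)" or "lhs OP ALL (rows)".
 * ANY stops at the first true comparison, ALL at the first false one.
 */
bool eval_quantified(int lhs, CompareFn op, Quantifier quant,
                     const int *rows, size_t nrows)
{
    assert(op != NULL);
    for (size_t i = 0; i < nrows; i++)
    {
        bool result = op(lhs, rows[i]);

        if (quant == QUANT_ANY && result)
            return true;
        if (quant == QUANT_ALL && !result)
            return false;
    }
    /* No deciding row found: ANY is false, ALL is true (empty set included). */
    return quant == QUANT_ALL;
}
```

+Under these assumptions, x IN (subquery) is eval_quantified(x, int_eq,
+QUANT_ANY, ...) and x NOT IN (subquery) is eval_quantified(x, int_ne,
+QUANT_ALL, ...); a type such as box would pass its own "same" operator
+instead of int_eq.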
+
+From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
+ for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
+ Fri, 9 Jan 1998 17:31:41 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
+Subject: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > Vadim, I know you are still thinking about subselects, but I have some
+> > more clarification that may help.
+> >
+> > We have to add phantom range table entries to correlated subselects so
+> > they will pass the parser. We might as well add those fields to the
+> > target list of the subquery at the same time:
+> >
+> > select *
+> > from taba
+> > where col1 = (select col2
+> > from tabb
+> > where taba.col3 = tabb.col4)
+> >
+> > becomes:
+> >
+> > select *
+> > from taba
+> > where col1 = (select col2, tabb.col4 <---
+> > from tabb, taba <---
+> > where taba.col3 = tabb.col4)
+> >
+> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+> > was entered as a correlation entry:
+> >
+> > bool isCorrelated;
+>
+> No, I don't like to add anything in parser. Example:
+>
+> select *
+> from tabA
+> where col1 = (select col2
+> from tabB
+> where tabA.col3 = tabB.col4
+> and exists (select *
+> from tabC
+> where tabB.colX = tabC.colX and
+> tabC.colY = tabA.col2)
+> )
+>
+> : a column of tabA is referenced in sub-subselect
+
+This is a strange case that I don't think we need to handle in our first
+implementation.
+
+> (is it allowable by standards ?) - in this case it's better
+> to don't add tabA to 1st subselect but add tabA to second one
+> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> (And I'm still not sure that using temp tables is best of what can be
+> done in all cases...)
+
+I don't see any use for temp tables in subselects anymore. After having
+implemented UNIONS, I now see how much can be done in the upper
+optimizer. I see you just putting the subquery PLAN into the proper
+place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+
+>
+> Instead of using isCorrelated in TE & RTE we can add
+>
+> Index varlevel;
+
+OK. Sounds good.
+
+>
+> to Var node to reflect (sub)query from where this Var is come
+> (where is range table to find var's relation using varno). Upmost query
+> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
+> ^^^ ^^^^^^^^^^^^
+> (I don't see problems with distinguishing Vars of different children
+> on the same level...)
+>
+> >
+> > Second, we need to hook the subselect to the main query. I recommend we
+> > add two fields to Query for this:
+> >
+> > Query *parentQuery;
+> > List *subqueries;
+>
+> Agreed. And maybe Index queryLevel.
+
+Sure. If it helps.
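+
+The varlevel, parentQuery and queryLevel fields discussed above can be
+sketched roughly as follows. This is only an illustration under stated
+assumptions: SketchQuery, SketchVar and both functions are hypothetical,
+simplified stand-ins, not the backend's real Query and Var nodes, and the
+range table is reduced to an array of relation names.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the backend's real node structs. */
typedef struct SketchQuery
{
    struct SketchQuery *parentQuery; /* enclosing query, NULL for the top query */
    unsigned int queryLevel;         /* 0 for the top query, 1 for its children */
    const char **rtable;             /* range table reduced to relation names */
    size_t       rtable_len;
} SketchQuery;

typedef struct SketchVar
{
    unsigned int varno;    /* 1-based index into the owning range table */
    unsigned int varlevel; /* level of the query whose range table owns it */
} SketchVar;

/*
 * Walk up from the query in which the Var appears to the ancestor whose
 * level equals var->varlevel.  Returns NULL if there is no such ancestor.
 */
static const SketchQuery *
sketch_owning_query(const SketchQuery *current, const SketchVar *var)
{
    const SketchQuery *q = current;

    while (q != NULL && q->queryLevel > var->varlevel)
        q = q->parentQuery;
    if (q == NULL || q->queryLevel != var->varlevel)
        return NULL;
    return q;
}

/* Resolve the Var to a relation name, or NULL if it cannot be resolved. */
static const char *
sketch_var_relation(const SketchQuery *current, const SketchVar *var)
{
    const SketchQuery *owner = sketch_owning_query(current, var);

    if (owner == NULL || var->varno == 0 || var->varno > owner->rtable_len)
        return NULL;
    return owner->rtable[var->varno - 1];
}
```
+
+With this layout the subquery's own range table holds only tabb, and a
+reference to taba.col3 inside it is a Var with varlevel = 0 that resolves
+through parentQuery, so no phantom range table entry is needed.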
+
+>
+> > In the parent query, to parse the WHERE clause, we create a new operator
+> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> ^^^^^^^^^^^^^^^^^^
+> No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> '_a_constant_' OP (select ...) - I don't know is last in standards,
+> Sybase has this.
+
+I have never seen this in my eight years of SQL. Perhaps we can leave
+this for later, maybe much later.
+
+>
+> Well,
+>
+> typedef enum OpType
+> {
+> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
+>
+> + OP_EXISTS, OP_ALL, OP_ANY
+>
+> } OpType;
+>
+> typedef struct Expr
+> {
+> NodeTag type;
+> Oid typeOid; /* oid of the type of this expr */
+> OpType opType; /* type of the op */
+> Node *oper; /* could be Oper or Func */
+> List *args; /* list of argument nodes */
+> } Expr;
+>
+> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
+> List, following your suggestion)
+>
+> OP_ALL, OP_ANY:
+>
+> oper is List of Oper nodes. We need in list because of data types of
+> a, b, c (above) can be different and so Oper nodes will be different too.
+>
+> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
+> left side of subquery' operator.
+> lsecond(args) is SubSelect.
+>
+> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> by parser into corresponding ANY and ALL. At the moment we can do:
+>
+> IN --> = ANY, NOT IN --> <> ALL
+>
+> but this will be "known bug": this breaks OO-nature of Postgres, because of
+> operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> Example: box data type. For boxes, = means equality of _areas_ and =~
+> means that boxes are the same ==> =~ ANY should be used for IN.
+
+That is interesting, to use =~ for ANY.
+
+Yes, but how many operators take a SUBQUERY as an operand. This is a
+special case to me.
+
+I think I see where you are trying to go. You want subselects to behave
+like any other operator, with a subselect type, and you do all the
+subselect handling in the optimizer, with special Nodes and actions.
+
+I think this may be just too much of a leap. We have such clean query
+logic for single queries, I can't imagine having an operator that has a
+Query operand, and trying to get everything to properly handle it.
+UNIONS were very easy to implement as a List off of Query, with some
+foreach()'s in rewrite and the high optimizer.
+
+Subselects are SQL standard, and are never going to be over-ridden by a
+user. Same with UNION. They want UNION, they get UNION. They want
+Subselect, we are going to spin through the Query structure and give
+them what they want.
+
+The complexities of subselects and correlated queries and range tables
+and stuff is so bizarre that trying to get it to work inside the type
+system could be a huge project.
+
+>
+> > right side is an index to a slot in the subqueries List.
+
+I guess the question is what can we have by February 1?
+
+I have been reading some postings, and it seems to me that subselects
+are the litmus test for many evaluators when deciding if a database
+engine is full-featured.
+
+Sorry to be so straightforward, but I want to keep hashing this around
+until we get a conclusion, so coding can start.
+
+My suggestions have been, I believe, trying to get subselects working
+with the fullest functionality by adding the least amount of code, and
+keeping the logic clean.
+
+Have you checked out the UNION code? It is very small, but it works. I
+think it could make a good sample for subselects.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
+ Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:19:08 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Subject: Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > No, I don't like to add anything in parser. Example:
+> >
+> > select *
+> > from tabA
+> > where col1 = (select col2
+> > from tabB
+> > where tabA.col3 = tabB.col4
+> > and exists (select *
+> > from tabC
+> > where tabB.colX = tabC.colX and
+> > tabC.colY = tabA.col2)
+> > )
+> >
+> > : a column of tabA is referenced in sub-subselect
+>
+> This is a strange case that I don't think we need to handle in our first
+> implementation.
+
+I don't know is this strange case or not :)
+But I would like to know is this allowed by standards - can someone
+comment on this ?
+And I don't see problems with handling this...
+
+>
+> > (is it allowable by standards ?) - in this case it's better
+> > to don't add tabA to 1st subselect but add tabA to second one
+> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> > (And I'm still not sure that using temp tables is best of what can be
+> > done in all cases...)
+>
+> I don't see any use for temp tables in subselects anymore. After having
+> implemented UNIONS, I now see how much can be done in the upper
+> optimizer. I see you just putting the subquery PLAN into the proper
+> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+
+When saying about temp tables, I meant tables created by node Material
+for subquery plan. This is one of two ways - run subquery once for all
+possible upper plan tuples and then just join result table with upper
+query. Another way is re-run subquery for each upper query tuple,
+without temp table but may be with caching results by some ways.
+Actually, there is special case - when subquery can be alternatively
+formulated as joins, - but this is just special case.
+
+> > > In the parent query, to parse the WHERE clause, we create a new operator
+> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> > ^^^^^^^^^^^^^^^^^^
+> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> > '_a_constant_' OP (select ...) - I don't know is last in standards,
+> > Sybase has this.
+>
+> I have never seen this in my eight years of SQL. Perhaps we can leave
+> this for later, maybe much later.
+
+Are you saying about (a, b, c) or about 'a_constant' ?
+Again, can someone comment on are they in standards or not ?
+Tom ?
+If yes then please add parser' support for them now...
+
+> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> > by parser into corresponding ANY and ALL. At the moment we can do:
+> >
+> > IN --> = ANY, NOT IN --> <> ALL
+> >
+> > but this will be "known bug": this breaks OO-nature of Postgres, because of
+> > operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> > Example: box data type. For boxes, = means equality of _areas_ and =~
+> > means that boxes are the same ==> =~ ANY should be used for IN.
+>
+> That is interesting, to use =~ for ANY.
+>
+> Yes, but how many operators take a SUBQUERY as an operand. This is a
+> special case to me.
+>
+> I think I see where you are trying to go. You want subselects to behave
+> like any other operator, with a subselect type, and you do all the
+> subselect handling in the optimizer, with special Nodes and actions.
+>
+> I think this may be just too much of a leap. We have such clean query
+> logic for single queries, I can't imagine having an operator that has a
+> Query operand, and trying to get everything to properly handle it.
+> UNIONS were very easy to implement as a List off of Query, with some
+> foreach()'s in rewrite and the high optimizer.
+>
+> Subselects are SQL standard, and are never going to be over-ridden by a
+> user. Same with UNION. They want UNION, they get UNION. They want
+> Subselect, we are going to spin through the Query structure and give
+> them what they want.
+>
+> The complexities of subselects and correlated queries and range tables
+> and stuff is so bizarre that trying to get it to work inside the type
+> system could be a huge project.
+
+PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
+derived from the Berkeley Postgres database management system. While
+PostgreSQL retains the powerful object-relational data model, rich data types and
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+easy extensibility of Postgres, it replaces the PostQuel query language with an
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+extended subset of SQL.
+^^^^^^^^^^^^^^^^^^^^^^
+
+Should we say users that subselect will work for standard data types only ?
+I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
+Is there difference between handling = ANY and ~ ANY ? I don't see any.
+Currently we can't get IN working properly for boxes (and may be for others too)
+and I don't like to try to resolve these problems now, but hope that someday
+we'll be able to do this. At the moment - just convert IN into = ANY and
+NOT IN into <> ALL in parser.
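+
+A small sketch of why the mapping is IN --> = ANY but NOT IN --> <> ALL
+(and not <> ANY). It is an illustration only: it works on plain C ints
+with the built-in == operator, ignores NULLs entirely, and the sketch_*
+function names are hypothetical.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x = ANY (set): true if at least one element equals x. */
static bool
sketch_eq_any(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return true;
    return false;
}

/* x <> ALL (set): true only if every element differs from x. */
static bool
sketch_ne_all(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return false;
    return true;
}

/* x <> ANY (set): true if at least one element differs from x. */
static bool
sketch_ne_any(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] != x)
            return true;
    return false;
}
```
+
+For the set {1, 2} and x = 1, sketch_ne_all() is false (1 is in the set,
+so NOT IN must fail) while sketch_ne_any() is true (1 <> 2). Note that the
+row_expr rule in the gram.y patch later in this thread builds "<>any" for
+the row form of NOT IN, which differs from the mapping stated here.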
+
+(BTW, do you know how DISTINCT is implemented ? It doesn't use = but
+use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
+
+> >
+> > > right side is an index to a slot in the subqueries List.
+>
+> I guess the question is what can we have by February 1?
+>
+> I have been reading some postings, and it seems to me that subselects
+> are the litmus test for many evaluators when deciding if a database
+> engine is full-featured.
+>
+> Sorry to be so straightforward, but I want to keep hashing this around
+> until we get a conclusion, so coding can start.
+>
+> My suggestions have been, I believe, trying to get subselects working
+> with the fullest functionality by adding the least amount of code, and
+> keeping the logic clean.
+>
+> Have you checked out the UNION code? It is very small, but it works. I
+> think it could make a good sample for subselects.
+
+There is big difference between subqueries and queries in UNION -
+there are not dependences between UNION queries.
+
+Ok, opened issues:
+
+1. Is using upper query' vars in all subquery levels in standard ?
+2. Is (a, b, c) OP (subselect) in standard ?
+3. What types of expressions (Var, Const, ...) are allowed on the left
+ side of operator with subquery on the right ?
+4. What types of operators should we support (=, >, ..., like, ~, ...) ?
+ (My vote for all boolean operators).
+
+And - did we get consensus on presentation subqueries stuff in Query,
+Expr and Var ?
+I would like to have something done in parser near Jan 17 to get
+subqueries working by Feb 1. I vote for support of all standard
+things (1. - 3.) in parser right now - if there will be no time
+to implement something like (a, b, c) then optimizer will call
+elog(WARN) (oh, sorry, - elog(ERROR)).
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
+ Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:41:19 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199712220545.AAA11605@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, a few questions:
+>
+> Should we use sortmerge, so we can use our psort as temp tables,
+> or do we use hashunique?
+>
+> How do we pass the query to the optimizer? How do we represent
+> the range table for each, and the links between them in correlated
+> subqueries?
+
+My suggestion is just use varlevel in Var and don't put upper query'
+relations into subquery range table.
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
+ Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:58:52 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> Bruce Momjian wrote:
+> >
+> > OK, a few questions:
+> >
+> > Should we use sortmerge, so we can use our psort as temp tables,
+> > or do we use hashunique?
+> >
+> > How do we pass the query to the optimizer? How do we represent
+> > the range table for each, and the links between them in correlated
+> > subqueries?
+>
+> My suggestion is just use varlevel in Var and don't put upper query'
+> relations into subquery range table.
+
+Hmm... Sorry, it seems that I did reply to very old message - forget it.
+
+Vadim
+
+From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
+ Sat, 10 Jan 1998 18:01:03 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
+Date: Sat, 10 Jan 1998 18:01:03 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> > > by parser into corresponding ANY and ALL. At the moment we can do:
+> > >
+> > > IN --> = ANY, NOT IN --> <> ALL
+> > >
+> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
+> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> > > Example: box data type. For boxes, = means equality of _areas_ and =~
+> > > means that boxes are the same ==> =~ ANY should be used for IN.
+> >
+> > That is interesting, to use =~ for ANY.
+
+If I understand the discussion, I would think it is fine to make an assumption about
+which operator is used to implement a subselect expression. If someone remaps an
+operator to mean something different, then they will get a different result (or a
+nonsensical one) from a subselect.
+
+I'd be happy to remap existing operators to fit into a convention which would work
+with subselects (especially if I got to help choose :).
+
+> > Subselects are SQL standard, and are never going to be over-ridden by a
+> > user. Same with UNION. They want UNION, they get UNION. They want
+> > Subselect, we are going to spin through the Query structure and give
+> > them what they want.
+>
+> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
+> derived from the Berkeley Postgres database management system. While
+> PostgreSQL retains the powerful object-relational data model, rich data types and
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> easy extensibility of Postgres, it replaces the PostQuel query language with an
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> extended subset of SQL.
+> ^^^^^^^^^^^^^^^^^^^^^^
+>
+> Should we say users that subselect will work for standard data types only ?
+> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
+> Is there difference between handling = ANY and ~ ANY ? I don't see any.
+> Currently we can't get IN working properly for boxes (and may be for others too)
+> and I don't like to try to resolve these problems now, but hope that someday
+> we'll be able to do this. At the moment - just convert IN into = ANY and
+> NOT IN into <> ALL in parser.
+>
+> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
+> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
+
+?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
+list? That would give more consistent behavior...
+
+> > I have been reading some postings, and it seems to me that subselects
+> > are the litmus test for many evaluators when deciding if a database
+> > engine is full-featured.
+> >
+> > Sorry to be so straightforward, but I want to keep hashing this around
+> > until we get a conclusion, so coding can start.
+> >
+> > My suggestions have been, I believe, trying to get subselects working
+> > with the fullest functionality by adding the least amount of code, and
+> > keeping the logic clean.
+> >
+> > Have you checked out the UNION code? It is very small, but it works. I
+> > think it could make a good sample for subselects.
+>
+> There is big difference between subqueries and queries in UNION -
+> there are not dependences between UNION queries.
+>
+> Ok, opened issues:
+>
+> 1. Is using upper query' vars in all subquery levels in standard ?
+
+I'm not certain. Let me know if you do not get an answer from someone else and I will
+research it.
+
+> 2. Is (a, b, c) OP (subselect) in standard ?
+
+Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
+the parens are allowed to be omitted from a one element list.
+
+> 3. What types of expressions (Var, Const, ...) are allowed on the left
+> side of operator with subquery on the right ?
+
+I think most expressions are allowed. The "constant OP (subselect)" case you were
+asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
+a and b are column references should be allowed. Of course, our optimizer could
+perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
+example "EXISTS (subselect where x = constant)".
+
+> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
+> (My vote for all boolean operators).
+
+Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
+important to get an initial implementation for v6.3 which covers a little, some, or
+all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
+we will have the benefit of feedback from others in practical applications which
+always uncovers new things to consider.
+
+> And - did we get consensus on presentation subqueries stuff in Query,
+> Expr and Var ?
+> I would like to have something done in parser near Jan 17 to get
+> subqueries working by Feb 1. I vote for support of all standard
+> things (1. - 3.) in parser right now - if there will be no time
+> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
+> sorry, - elog(ERROR)).
+
+Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
+does the right thing with expression comparisons but just parses then ignores
+subselect expressions. Let me know what structures you want passed back and I'll put
+them in, or if you prefer put in the first one and I'll go through and clean up and
+add the rest.
+
+ - Tom
+
+
+From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
+ Sat, 10 Jan 1998 19:31:30 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
+Date: Sat, 10 Jan 1998 19:31:29 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ?
+> Tom ?
+> If yes then please add parser' support for them now...
+
+As I mentioned a few minutes ago in my last message, I parse the row descriptors and
+the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
+ignore the result. I didn't want to pass things back as lists until something in the
+backend was ready to receive them.
+
+If it is OK, I'll go ahead and start passing back a list of expressions when a row
+descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
+being a list rather than an atomic node.
+
+Also, I can start passing back the subselect expression as the rexpr; right now the
+parser calls elog() and quits.
+
+btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
+makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
+If lists are handled farther back, this routine should move to there also and the
+parser will just pass the lists. Note that some assumptions have to be made about the
+meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
+"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
+to disallow those cases or to look for specific appearance of the operator to guess
+the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
+it has "<>" or "!" then build as "or"s).
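+
+The expansion described above can be sketched like this. It is a minimal
+illustration, not the actual makeRowExpr(): the sketch_* functions are
+hypothetical, they build a text string instead of a parse tree, and the
+AND/OR choice follows only the guess-by-operator-name heuristic from this
+message (which, for "<" and ">", may not match the standard's row ordering).
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Heuristic from the message: operators containing "<>" or "!" are
 * joined with OR, everything else is joined with AND.
 */
static const char *
sketch_row_connective(const char *op)
{
    if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
        return "OR";
    return "AND";
}

/*
 * Write "(l0 op r0) CONN (l1 op r1) ..." into out.
 * Returns 0 on success, -1 if n is 0 or the buffer is too small.
 */
static int
sketch_expand_row_expr(const char *op,
                       const char *const *left,
                       const char *const *right,
                       size_t n, char *out, size_t outsz)
{
    const char *conn = sketch_row_connective(op);
    size_t      used = 0;

    if (n == 0 || outsz == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++)
    {
        /* Emit " CONN " before every pair except the first. */
        int w = snprintf(out + used, outsz - used, "%s%s%s(%s %s %s)",
                         i == 0 ? "" : " ",
                         i == 0 ? "" : conn,
                         i == 0 ? "" : " ",
                         left[i], op, right[i]);

        if (w < 0 || (size_t) w >= outsz - used)
            return -1;          /* output would have been truncated */
        used += (size_t) w;
    }
    return 0;
}
```
+
+For example, "=" over (a,b,c) and (d,e,f) yields
+"(a = d) AND (b = e) AND (c = f)", and "<>" over (a,b) and (d,e) yields
+"(a <> d) OR (b <> e)".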
+
+Let me know what you want...
+
+ - Tom
+
+
+From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
+ for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
+ Sun, 11 Jan 1998 05:58:01 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
+Date: Sun, 11 Jan 1998 05:58:01 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
+Status: OR
+
+This is a multi-part message in MIME format.
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+
+Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
+These start sending lists of arguments toward the backend from the parser to
+implement row descriptors and subselects.
+
+They should apply OK even over Bruce's recent changes...
+
+ - Tom
+
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline; filename="gram.y.patch"
+
+*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998
+--- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998
+***************
+*** 195,200 ****
+--- 195,201 ----
+ having_clause
+ %type <list> row_descriptor, row_list
+ %type <node> row_expr
++ %type <str> RowOp, row_opt
+ %type <list> OptCreateAs, CreateAsList
+ %type <node> CreateAsElement
+ %type <value> NumConst
+***************
+*** 242,248 ****
+ */
+
+ /* Keywords (in SQL92 reserved words) */
+! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC,
+ BEGIN_TRANS, BETWEEN, BOTH, BY,
+ CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT,
+ CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME,
+--- 243,249 ----
+ */
+
+ /* Keywords (in SQL92 reserved words) */
+! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
+ BEGIN_TRANS, BETWEEN, BOTH, BY,
+ CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT,
+ CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME,
+***************
+*** 258,264 ****
+ ON, OPTION, OR, ORDER, OUTER_P,
+ PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
+ REFERENCES, REVOKE, RIGHT, ROLLBACK,
+! SECOND_P, SELECT, SET, SUBSTRING,
+ TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
+ UNION, UNIQUE, UPDATE, USING,
+ VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
+--- 259,265 ----
+ ON, OPTION, OR, ORDER, OUTER_P,
+ PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
+ REFERENCES, REVOKE, RIGHT, ROLLBACK,
+! SECOND_P, SELECT, SET, SOME, SUBSTRING,
+ TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
+ UNION, UNIQUE, UPDATE, USING,
+ VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
+***************
+*** 2853,2866 ****
+ /* Expressions using row descriptors
+ * Define row_descriptor to allow yacc to break the reduce/reduce conflict
+ * with singleton expressions.
+ */
+ row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
+ {
+! $$ = NULL;
+ }
+ | '(' row_descriptor ')' NOT IN '(' SubSelect ')'
+ {
+! $$ = NULL;
+ }
+ | '(' row_descriptor ')' '=' '(' row_descriptor ')'
+ {
+--- 2854,2878 ----
+ /* Expressions using row descriptors
+ * Define row_descriptor to allow yacc to break the reduce/reduce conflict
+ * with singleton expressions.
++ *
++ * Note that "SOME" is the same as "ANY" in syntax.
++ * - thomas 1998-01-10
+ */
+ row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
+ {
+! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
+ }
+ | '(' row_descriptor ')' NOT IN '(' SubSelect ')'
+ {
+! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7);
+! }
+! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
+! {
+! char *opr;
+! opr = palloc(strlen($4)+strlen($5)+1);
+! strcpy(opr, $4);
+! strcat(opr, $5);
+! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
+ }
+ | '(' row_descriptor ')' '=' '(' row_descriptor ')'
+ {
+***************
+*** 2880,2885 ****
+--- 2892,2907 ----
+ }
+ ;
+
++ RowOp: '=' { $$ = "="; }
++ | '<' { $$ = "<"; }
++ | '>' { $$ = ">"; }
++ ;
++
++ row_opt: ALL { $$ = "all"; }
++ | ANY { $$ = "any"; }
++ | SOME { $$ = "any"; }
++ ;
++
+ row_descriptor: row_list ',' a_expr
+ {
+ $$ = lappend($1, $3);
+***************
+*** 3432,3441 ****
+ ;
+
+ in_expr: SubSelect
+! {
+! elog(ERROR,"IN (SUBSELECT) not yet implemented");
+! $$ = $1;
+! }
+ | in_expr_nodes
+ { $$ = $1; }
+ ;
+--- 3454,3460 ----
+ ;
+
+ in_expr: SubSelect
+! { $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
+ | in_expr_nodes
+ { $$ = $1; }
+ ;
+***************
+*** 3449,3458 ****
+ ;
+
+ not_in_expr: SubSelect
+! {
+! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
+! $$ = $1;
+! }
+ | not_in_expr_nodes
+ { $$ = $1; }
+ ;
+--- 3468,3474 ----
+ ;
+
+ not_in_expr: SubSelect
+! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
+ | not_in_expr_nodes
+ { $$ = $1; }
+ ;
+
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline; filename="keywords.c.patch"
+
+*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998
+--- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998
+***************
+*** 39,44 ****
+--- 39,45 ----
+ {"alter", ALTER},
+ {"analyze", ANALYZE},
+ {"and", AND},
++ {"any", ANY},
+ {"append", APPEND},
+ {"archive", ARCHIVE},
+ {"as", AS},
+***************
+*** 178,183 ****
+--- 179,185 ----
+ {"set", SET},
+ {"setof", SETOF},
+ {"show", SHOW},
++ {"some", SOME},
+ {"stdin", STDIN},
+ {"stdout", STDOUT},
+ {"substring", SUBSTRING},
+
+--------------D8B38A0D1F78A10C0023F702--
+
+
+From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
+ for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
+ Sun, 11 Jan 1998 00:59:23 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
+Subject: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
+Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
+In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> I would like to have something done in parser near Jan 17 to get
+> subqueries working by Feb 1. I vote for support of all standard
+> things (1. - 3.) in parser right now - if there will be no time
+> to implement something like (a, b, c) then optimizer will call
+> elog(WARN) (oh, sorry, - elog(ERROR)).
+
+First, let me say I am glad we are still on schedule for Feb 1. I was
+panicking because I thought we wouldn't make it in time.
+
+
+> > > (is it allowable by standards ?) - in this case it's better
+> > > to don't add tabA to 1st subselect but add tabA to second one
+> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> > > (And I'm still not sure that using temp tables is best of what can be
+> > > done in all cases...)
+> >
+> > I don't see any use for temp tables in subselects anymore. After having
+> > implemented UNIONS, I now see how much can be done in the upper
+> > optimizer. I see you just putting the subquery PLAN into the proper
+> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+>
+> When saying about temp tables, I meant tables created by node Material
+> for subquery plan. This is one of two ways - run subquery once for all
+> possible upper plan tuples and then just join result table with upper
+> query. Another way is re-run subquery for each upper query tuple,
+> without temp table but may be with caching results by some ways.
+> Actually, there is special case - when subquery can be alternatively
+> formulated as joins, - but this is just special case.
+
+This is interesting. It really only applies for correlated subqueries,
+and certainly it may help sometimes to just evaluate the subquery for
+valid values that are going to come from the upper query than for all
+possible values. Perhaps we can use the 'cost' value of each query to
+decide how to handle this.
+
+>
+> > > > In the parent query, to parse the WHERE clause, we create a new operator
+> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> > > ^^^^^^^^^^^^^^^^^^
+> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
+> > > Sybase has this.
+> >
+> > I have never seen this in my eight years of SQL. Perhaps we can leave
+> > this for later, maybe much later.
+>
+> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ?
+> Tom ?
+> If yes then please add parser' support for them now...
+
+OK, Thomas says it is, so we will put in as much code as we can to handle
+it.
+
+> Should we say users that subselect will work for standard data types only ?
+> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
+> Is there difference between handling = ANY and ~ ANY ? I don't see any.
+> Currently we can't get IN working properly for boxes (and may be for others too)
+> and I don't like to try to resolve these problems now, but hope that someday
+> we'll be able to do this. At the moment - just convert IN into = ANY and
+> NOT IN into <> ALL in parser.
+
+OK.
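
A rough sketch of what the rewrite agreed on above means at evaluation
time: IN becomes "= ANY" and NOT IN becomes "<> ALL". A plain int array
stands in for the subselect result, the function names are hypothetical,
and NULL handling is ignored. Note that "<> ANY", the operator string the
grammar patch above builds for NOT IN, is a different predicate from
"<> ALL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x = ANY (set): true if at least one element equals x. */
bool eq_any(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return true;
    return false;               /* also the result for an empty set */
}

/* x <> ALL (set): true only if every element differs from x. */
bool ne_all(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return false;
    return true;                /* vacuously true for an empty set */
}

/* x <> ANY (set): true if at least one element differs from x. */
bool ne_any(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] != x)
            return true;
    return false;
}
```

For the set {1, 2, 3} and x = 2, ne_all() is false (2 is in the set, so
NOT IN fails) while ne_any() is true, which is why the two cannot be
used interchangeably.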
+
+>
+> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
+> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
+
+I did not know that either.
+
+> There is big difference between subqueries and queries in UNION -
+> there are not dependences between UNION queries.
+
+Yes, I know UNIONS are trivial compared to subselects.
+
+>
+> Ok, opened issues:
+>
+> 1. Is using upper query' vars in all subquery levels in standard ?
+> 2. Is (a, b, c) OP (subselect) in standard ?
+> 3. What types of expressions (Var, Const, ...) are allowed on the left
+> side of operator with subquery on the right ?
+> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
+> (My vote for all boolean operators).
+>
+> And - did we get consensus on presentation subqueries stuff in Query,
+> Expr and Var ?
+
+OK, here are my concrete ideas on changes and structures.
+
+I think we all agreed that Query needs new fields:
+
+ Query *parentQuery;
+ List *subqueries;
+
+Maybe query level too, but I don't think so (see later ideas on Var).
+
+We need a new Node structure, call it Sublink:
+
+ int linkType; /* IN, NOTIN, ANY, EXISTS, OPERATOR... */
+ Oid operator; /* subquery must return single row */
+ List *lefthand; /* parent stuff */
+ Node *subquery; /* represents nodes from parser */
+ Index Subindex; /* filled in to index Query->subqueries */
+
+Of course, the names are just suggestions. Every time we run through
+the parsenodes of a query to create a Query* structure, when we do the
+WHERE clause, if we come upon one of these Sublink nodes (created in the
+parser), we move the supplied Query* in Sublink->subquery to a local
+List variable, and we set Sublink->Subindex to equal the index of the
+new query, i.e. is it the first subquery we found, 1, or the second, 2,
+etc.
+
+After we have created the parent Query structure, we run through our
+local List variable of subquery parsenodes we created above, and add
+Query* entries to Query->subqueries. In each subquery Query*, we set
+the parentQuery pointer.
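
The bookkeeping described above could look roughly like the sketch
below. The struct and function names are hypothetical stand-ins, not the
real Query or Sublink nodes, and the two passes from the description
(collect into a local List, then fill Query->subqueries) are collapsed
into a single step for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8            /* arbitrary limit for the sketch */

typedef struct QuerySketch QuerySketch;

/* Hypothetical stand-in for the proposed Sublink node. */
typedef struct SublinkSketch
{
    int          linkType;          /* IN, NOT IN, ANY, EXISTS, ... */
    QuerySketch *subquery;          /* subquery as delivered by the parser */
    int          subindex;          /* 1-based position in parent's list */
} SublinkSketch;

struct QuerySketch
{
    QuerySketch *parentQuery;
    QuerySketch *subqueries[MAX_SUBQUERIES];
    int          nsubqueries;
};

/*
 * Register the subquery behind one Sublink with its parent query:
 * number it in order of discovery and set the back pointer.
 * Returns the assigned index, or 0 if the fixed-size list is full.
 */
int attach_sublink(QuerySketch *parent, SublinkSketch *link)
{
    if (parent->nsubqueries >= MAX_SUBQUERIES)
        return 0;
    parent->subqueries[parent->nsubqueries++] = link->subquery;
    link->subquery->parentQuery = parent;
    link->subindex = parent->nsubqueries;   /* first found is 1, second is 2 */
    return link->subindex;
}
```

Calling attach_sublink() once per Sublink found in the WHERE clause
numbers the subqueries 1, 2, ... in the order they were encountered.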
+
+Also, when parsing the subqueries, we need to keep track of correlated
+references. I recommend we add a field to the Var structure:
+
+ Index sublevel; /* range table reference:
+ = 0 current level of query
+ < 0 parent above this many levels
+ > 0 index into subquery list
+ */
+
+This way, a Var node with sublevel 0 is the current level, and is true
+in most cases. This helps us not have to change much code. sublevel =
+-1 means it references the range table in the parent query. sublevel =
+-2 means the parent's parent. sublevel = 2 means it references the range
+table of the second entry in Query->subqueries. Varno and varattno are
+still meaningful. Of course, we can't reference variables in the
+subqueries from the parent in the parser code, but Vadim may want to.
+
+When doing a Var lookup in the parser, we look in the current level
+first, but if not found, if it is a subquery, we can look at the parent
+and parent's parent to set the sublevel, varno, and varattno properly.
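
A minimal sketch of that lookup, assuming each query level has a single
list of column names and a pointer to its parent. LevelSketch, VarSketch
and lookup_var() are hypothetical names; varno is left out, and sublevel
is a plain int here because the proposal needs negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified query level: one range table of column names. */
typedef struct LevelSketch
{
    struct LevelSketch *parent;     /* NULL at the top-level query */
    const char *const  *columns;    /* column names visible at this level */
    int                 ncolumns;
} LevelSketch;

typedef struct VarSketch
{
    int sublevel;   /* 0 = current level, -1 = parent, -2 = parent's parent */
    int varattno;   /* 1-based column position within that level */
} VarSketch;

/*
 * Resolve a column name starting at the current level and walking
 * outward through the parents. Returns 1 and fills *var on success,
 * 0 if no enclosing level has the column.
 */
int lookup_var(const LevelSketch *cur, const char *name, VarSketch *var)
{
    int up = 0;

    for (const LevelSketch *lv = cur; lv != NULL; lv = lv->parent, up++)
    {
        for (int i = 0; i < lv->ncolumns; i++)
        {
            if (strcmp(lv->columns[i], name) == 0)
            {
                var->sublevel = -up;
                var->varattno = i + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

Because the current level is searched first, a column that exists both
in the subquery and in its parent resolves to sublevel 0.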
+
+We create no phantom range table entries in the subquery, and no phantom
+target list entries. We can leave that all for the upper optimizer.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Tue Dec 9 12:14:09 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA16186
+ for <maillist@candle.pha.pa.us>; Tue, 9 Dec 1997 12:14:05 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA17524; Tue, 9 Dec 1997 12:05:31 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 09 Dec 1997 12:05:01 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA17316 for pgsql-hackers-outgoing; Tue, 9 Dec 1997 12:04:55 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id MAA17304 for <hackers@postgresql.org>; Tue, 9 Dec 1997 12:04:40 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id MAA15973;
+ Tue, 9 Dec 1997 12:05:03 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712091705.MAA15973@candle.pha.pa.us>
+Subject: Re: [HACKERS] Items for 6.3
+To: lockhart@alumni.caltech.edu (Thomas G. Lockhart)
+Date: Tue, 9 Dec 1997 12:05:03 -0500 (EST)
+Cc: hackers@postgreSQL.org, vadim@sable.krasnoyarsk.su
+In-Reply-To: <348CE8BE.FE0F8AA1@alumni.caltech.edu> from "Thomas G. Lockhart" at Dec 9, 97 06:44:14 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+>
+> > Here are the items I think would make 6.3 a truly great release:
+> >
+> > subselects
+> > outer joins
+>
+> These two would be sufficient (along with the changes already in the
+> tree) to address the most visible deficiencies in SQL functionality.
+>
+> > temp tables
+> > fix "Reliability" items attached to specific queries
+>
+> Sure, why not?
+
+We will need temp tables for subselects anyway.
+
+I could implement them, but again we come up against the problem of
+storing these plans and executing them later. We need to do some of the
+temp table stuff in the optimizer because the plan could be passed with
+a temp table, and we can't bind the temp name to a real name in the
+parser, especially if we save those plans in system tables that other
+backends can execute. Multiple backends would be using the same temp
+name.
+
+At the same time, we need some temp stuff in the parser so the parser
+can recognize the temp table and its fields when it sees it.
+
+The hardest part is:
+
+select * into tmp mytmp from z where x=y;
+select * from mytmp;
+
+If they are passed together, and we have to plan them both, before
+either is executed, you have to make the parser aware of the fields in
+mytmp, even though you have not executed the select yet, you are just
+storing the plan.
+
+This was Vadim's point about not doing subselects in the parser.
+
+>
+> > postmaster sync's pglog, giving almost fsync reliability with
+> > no-fsync performance
+>
+> OK to save for v6.4.
+>
+> Could we try to do the subselect/join/union features for 6.3? I know you
+> have been looking at it, and found the deepest parts of the backend to
+> be a bit murky. I'm not familiar with that area at all, but perhaps we
+> could divert Vadim for a week or two or three when he has some time.
+> Especially if we trade him for help on his favorite topics for v6.4??
+>
+
+Sure. I may be able to do some of the pglog change myself, though Vadim
+has some definite ideas on this.
+
+As for Vadim, trading help is a good idea, but what trade can we make?
+He can do most of these tough things without us, and in 1/4 the time.
+We can't even see where to start them.
+
+Basically, without Vadim, this project would have really major problems.
+
+He certainly likes working on PostgreSQL, so he must be busy with other
+things.
+
+It is not fair to keep counting on Vadim to do all these tough jobs. We
+really need to get other people up to Vadim's level of ability.
+Unfortunately, the odds of this happening are very slim.
+
+This leaves me scratching my head.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Fri Dec 19 00:08:21 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25029
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:08:13 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA11825;
+ Fri, 19 Dec 1997 12:13:15 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349A0265.7329D4EE@sable.krasnoyarsk.su>
+Date: Fri, 19 Dec 1997 12:13:09 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: [HACKERS] Items for 6.3
+References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> Could we try to do the subselect/join/union features for 6.3? I know you
+> have been looking at it, and found the deepest parts of the backend to
+> be a bit murky. I'm not familiar with that area at all, but perhaps we
+> could divert Vadim for a week or two or three when he has some time.
+ ^^^^^
+More realistic... And this is for initial release only: tuning performance
+of subselects is very hard, long work.
+
+Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
+may appear in 6.4 only. And I'll need in help: could someone add support
+for them in parser ? Not handling - but parsing and common checking.
+Also, it would be nice to have better temp tables implementation
+(without affecting pg_class etc) - node material need in query-level
+temp tables anyway. I'd really like to see temp table files created
+only when its data must go to disk due to local buffer pool is full
+and can't more keep table data in memory. Also, local buffer manager
+should be re-written to use hash table (like shared bufmgr) for buffer search,
+not sequential scan as now (this is item for TODO) - this will speed up
+things and allow to use more than 64 local buffers.
+
+I'm still sure that handling subselects in parser is not right way.
+And the main problem is not in execution plans (we could use tricks
+to resolve this) but in performance. Example:
+
+select b from big where b in (select s from small);
+
+If there is no duplicates in small then this is the same as
+
+select b from big, small where b = s;
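
To make the "no duplicates" condition concrete, here is a small sketch
that counts the rows each form of the query would return, with int
arrays standing in for the two tables. The function names are
hypothetical and the nested loops are only the simplest way to express
the semantics, not a suggested plan.

```c
#include <assert.h>
#include <stddef.h>

/* Rows produced by: select b from big where b in (select s from small) */
size_t count_in(const int *big_vals, size_t nbig,
                const int *small_vals, size_t nsmall)
{
    size_t rows = 0;

    for (size_t i = 0; i < nbig; i++)
    {
        for (size_t j = 0; j < nsmall; j++)
        {
            if (big_vals[i] == small_vals[j])
            {
                rows++;     /* one output row per big row, then stop */
                break;
            }
        }
    }
    return rows;
}

/* Rows produced by: select b from big, small where b = s */
size_t count_join(const int *big_vals, size_t nbig,
                  const int *small_vals, size_t nsmall)
{
    size_t rows = 0;

    for (size_t i = 0; i < nbig; i++)
        for (size_t j = 0; j < nsmall; j++)
            if (big_vals[i] == small_vals[j])
                rows++;     /* one output row per matching pair */
    return rows;
}
```

With big = {1, 2, 3, 4} and small = {2, 2, 4}, the IN form returns 2
rows while the plain join returns 3, so the join is only a valid
replacement when small has no duplicates or they are removed first.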
+
+Without index on big postgres does seq scan of big and uses hashjoin with
+hash on small. Using temp table makes query only 20% slower (in my test).
+But with index on big postgres uses nestloop with seq scan of small and
+index scan of big => select run faster and temp table stuff makes query
+2.5 times slower! In the case of duplicates in small, handling in parser
+will use distinct (and so - sorting). But using hashjoin plan distinct
+may be avoided! Who can analyze this ? Optimizer only. He can be smart
+to check is there unique index on small or not. If not - what is more
+costless: nestloop with sorting or slower hashjoin without sorting.
+Only optimizer can find best way to execute query, parser can't.
+
+> Especially if we trade him for help on his favorite topics for v6.4??
+
+Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Dec 19 00:58:54 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25460
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:58:52 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA27667; Fri, 19 Dec 1997 00:54:39 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:54:09 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA27633 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:54:04 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA27623 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:53:53 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA25415;
+ Fri, 19 Dec 1997 00:53:15 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712190553.AAA25415@candle.pha.pa.us>
+Subject: Re: [HACKERS] Items for 6.3
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Fri, 19 Dec 1997 00:53:15 -0500 (EST)
+Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
+In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Thomas G. Lockhart wrote:
+> >
+> > Could we try to do the subselect/join/union features for 6.3? I know you
+> > have been looking at it, and found the deepest parts of the backend to
+> > be a bit murky. I'm not familiar with that area at all, but perhaps we
+> > could divert Vadim for a week or two or three when he has some time.
+> ^^^^^
+> More realistic... And this is for initial release only: tuning performance
+> of subselects is very hard, long work.
+>
+> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
+
+Great.
+
+> may appear in 6.4 only. And I'll need in help: could someone add support
+> for them in parser ? Not handling - but parsing and common checking.
+> Also, it would be nice to have better temp tables implementation
+> (without affecting pg_class etc) - node material need in query-level
+> temp tables anyway. I'd really like to see temp table files created
+> only when its data must go to disk due to local buffer pool is full
+> and can't more keep table data in memory. Also, local buffer manager
+> should be re-written to use hash table (like shared bufmgr) for buffer search,
+> not sequential scan as now (this is item for TODO) - this will speed up
+> things and allow to use more than 64 local buffers.
+>
+> I'm still sure that handling subselects in parser is not right way.
+> And the main problem is not in execution plans (we could use tricks
+> to resolve this) but in performance. Example:
+>
+> select b from big where b in (select s from small);
+>
+> If there is no duplicates in small then this is the same as
+>
+> select b from big, small where b = s;
+>
+> Without index on big postgres does seq scan of big and uses hashjoin with
+> hash on small. Using temp table makes query only 20% slower (in my test).
+> But with index on big postgres uses nestloop with seq scan of small and
+> index scan of big => select run faster and temp table stuff makes query
+> 2.5 times slower! In the case of duplicates in small, handling in parser
+> will use distinct (and so - sorting). But using hashjoin plan distinct
+> may be avoided! Who can analyze this ? Optimizer only. He can be smart
+> to check is there unique index on small or not. If not - what is more
+> costless: nestloop with sorting or slower hashjoin without sorting.
+> Only optimizer can find best way to execute query, parser can't.
+>
+
+OK, let me comment on this. Let's take your example:
+
+> select b from big where b in (select s from small);
+>
+> If there is no duplicates in small then this is the same as
+>
+> select b from big, small where b = s;
+
+My idea was to do this:
+
+ select distinct s into temp table small2 from small;
+ select b from big,small2 where b = s;
+
+And let the optimizer decide how to do the join. Is this what you are
+saying?
+
+The problem I see is that the temp table is already distinct, and was
+sorted to do that, but you can't pass that information into the
+optimizer. Is that the problem with using the parser?
+
+But you want the temp table never to hit disk unless it has to, but that
+will not work unless we do a really good job with temp tables.
+
+Also NOT IN will need some type of non-join operator, perhaps a flag in
+the Plan to say "look for a match, but only output if you find it." How
+do we do that?
+
+We definitely need temp tables, and I think we can stuff it into the
+cache as LOCAL, which will make it usable without adding to pg_class.
+
+Perhaps if we create a special Plan in the optimizer called IN, and we
+have the outer and inner queries as plans, and work that plan into the
+executor.
+
+The problem with that is we need to specify a way to join the two plans,
+and the same logic that determines what type of join to do can do this too.
+Maybe that's why you wanted stuff done in the optimizer and not the
+parser.
+
+At least now, I understand enough to come up with ideas, and can
+understand what you are saying.
+
+> > Especially if we trade him for help on his favorite topics for v6.4??
+>
+> Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
+>
+> Vadim
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Dec 19 01:00:58 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25512
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:00:56 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA28102; Fri, 19 Dec 1997 00:56:52 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:56:40 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA28077 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:56:36 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA28065 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:56:19 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA25436;
+ Fri, 19 Dec 1997 00:55:56 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712190555.AAA25436@candle.pha.pa.us>
+Subject: Re: [HACKERS] Items for 6.3
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Fri, 19 Dec 1997 00:55:56 -0500 (EST)
+Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
+In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> select b from big where b in (select s from small);
+>
+> If there is no duplicates in small then this is the same as
+>
+> select b from big, small where b = s;
+
+I think I see the problem you are describing now. If we put the
+subselect into a temp table, we can't use the existing index on small.s,
+even if there is one, or if sorting was involved in creating the temp
+table.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From lockhart@alumni.caltech.edu Fri Dec 19 01:34:26 1997
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25750
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:34:23 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA15234;
+ Fri, 19 Dec 1997 06:29:45 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <349A1459.EBFE2C84@alumni.caltech.edu>
+Date: Fri, 19 Dec 1997 06:29:45 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: [HACKERS] Items for 6.3
+References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> > Could we try to do the subselect/join/union features for 6.3? I know you
+> > have been looking at it, and found the deepest parts of the backend to
+> > be a bit murky. I'm not familiar with that area at all, but perhaps we
+> > could divert Vadim for a week or two or three when he has some time.
+> ^^^^^
+> More realistic... And this is for initial release only: tuning performance
+> of subselects is very hard, long work.
+>
+> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
+> may appear in 6.4 only. And I'll need in help: could someone add support
+> for them in parser ? Not handling - but parsing and common checking.
+
+Yes, I've already added subselect syntax in the parser, but we will need to
+modify or add to the parse tree nodes to push that past the parser into the
+backend. I'm happy to focus on that, since I understand those pieces pretty well.
+There are several places where "subselect syntax" is used: subselects and unions
+come to mind right away. If you have an opinion on how the parse nodes should be
+structured I can start with that, or I can just put something in and then modify
+it as you need later. Do you see unions as being similar to subselects, or are
+they a separate problem? To me, they seem like a simpler case since (perhaps) not
+as much optimization and internal reorganizing needs to happen.
+
+> Also, it would be nice to have better temp tables implementation
+> (without affecting pg_class etc) - node material need in query-level
+> temp tables anyway. I'd really like to see temp table files created
+> only when its data must go to disk due to local buffer pool is full
+> and can't more keep table data in memory.
+
+This sounds very desirable. I noticed that there are, or used to be, multiple
+storage managers. Could a manager for temporary storage be written which stores
+things in memory until it gets too big and then go to disk? Could that manager
+use the mm and md managers internally? Or is all of that at too low a level to be
+helpful for this problem?
+
+SQL92 has the concept of transaction-only and session-only tables and variables.
+Could an implementation of "temporary tables" be used to implement this feature
+at the same time (or form the basis for it later)? It seems like none of these
+non-permanent tables need to go to any of the pg_ tables, since other backends do
+not need to see them and they are allowed to disappear at the end of the session
+(or at a crash). We would just need the "table manager" to cache information on
+temporary stuff before looking at the permanent tables (??).
+
+> Also, local buffer manager
+> should be re-written to use hash table (like shared bufmgr) for buffer search,
+> not sequential scan as now (this is item for TODO) - this will speed up
+> things and allow to use more than 64 local buffers.
+>
+> I'm still sure that handling subselects in parser is not right way.
+> And the main problem is not in execution plans (we could use tricks
+> to resolve this) but in performance.
+
+Seems to me that the subselect needs to stay untransformed (i.e. executable but
+non-optimized) so that an optimizer can independently decide how to transform for
+faster execution. That way, in the first implementation we have reliable but
+stupid execution, but then can add a subselect optimizer which looks for cases
+which can be transformed to run faster.
+
+> > Especially if we trade him for help on his favorite topics for v6.4??
+>
+> Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
+
+Sure. (Tell me what it is later :)
+
+ - Tom
+
+
+
+From vadim@sable.krasnoyarsk.su Fri Dec 19 06:23:14 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27849
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 06:22:46 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id SAA12239;
+ Fri, 19 Dec 1997 18:28:13 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349A5A4C.DA366B47@sable.krasnoyarsk.su>
+Date: Fri, 19 Dec 1997 18:28:12 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
+Subject: Re: [HACKERS] Items for 6.3
+References: <199712190553.AAA25415@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, let me comment on this. Let's take your example:
+>
+> > select b from big where b in (select s from small);
+> >
+> > If there is no duplicates in small then this is the same as
+> >
+> > select b from big, small where b = s;
+>
+> My idea was to do this:
+>
+> select distinct s into temp table small2 from small;
+> select b from big,small2 where b = s;
+>
+> And let the optimizer decide how to do the join. Is this what you are
+> saying?
+>
+> The problem I see is that the temp table is already distinct, and was
+> sorted to do that, but you can't pass that information into the
+> optimizer. Is that the problem with using the parser?
+
+No. I said that in some cases we can avoid distinct at all: if either
+unique index on small exists or by using hashjoin plans with !new!
+HashUnique node (there was mistake in my prev description - not Hash,
+but HashUnique on small should be used, - HashUnique is hash table
+without duplicates, just another way to implement distinct, without
+sorting). This new node can also be useful for "normal" queries
+(without subselects).
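
The HashUnique idea, a hash table that refuses duplicates so that
distinct values fall out without a sort, can be sketched as follows.
This is only an illustration under simplifying assumptions: int keys, a
fixed-size table with linear probing, and hypothetical names throughout.
It is not a proposal for the actual node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HU_SLOTS 64     /* table size; must exceed the number of distinct keys */

typedef struct HashUniqueSketch
{
    int  keys[HU_SLOTS];
    bool used[HU_SLOTS];
} HashUniqueSketch;

/*
 * Insert key with linear probing. Returns true if the key was new,
 * false if it was already present (a duplicate) or the table is full.
 */
bool hu_insert(HashUniqueSketch *h, int key)
{
    size_t start = (size_t) (unsigned int) key % HU_SLOTS;

    for (size_t probe = 0; probe < HU_SLOTS; probe++)
    {
        size_t slot = (start + probe) % HU_SLOTS;

        if (!h->used[slot])
        {
            h->used[slot] = true;
            h->keys[slot] = key;
            return true;
        }
        if (h->keys[slot] == key)
            return false;
    }
    return false;
}

/* Copy the distinct values of in[] to out[] in first-seen order; no sort. */
size_t hu_distinct(const int *in, size_t n, int *out)
{
    HashUniqueSketch h = {0};
    size_t           nout = 0;

    for (size_t i = 0; i < n; i++)
        if (hu_insert(&h, in[i]))
            out[nout++] = in[i];
    return nout;
}
```

Each input row costs one probe sequence instead of taking part in a
sort, and the same table could then serve as the inner side of the hash
join.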
+
+My example is very simple. I just want to say that by handling subqueries
+in optimizer we will have more chances to do better optimization. Maybe not
+now, but later. I'm sure that subqueries require some specific optimization
+and this is not task of parser.
+
+>
+> But you want the temp table never to hit disk unless it has to, but that
+> will not work unless we do a really good job with temp tables.
+
+Of course.
+
+>
+> Also NOT IN will need some type of non-join operator, perhaps a flag in
+> the Plan to say "look for a match, but only output if you find it." How
+ ^^
+ don't ?
+> do we do that?
+
+Just as you said - by using some flag.
+
+>
+> We definitely need temp tables, and I think we can stuff it into the
+> cache as LOCAL, which will make it usable without adding to pg_class.
+
+We have Relation->rd_istemp flag... Just change it from bool to int:
+0 -> is not temp, 1 -> session level temp table, etc...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Fri Dec 19 08:09:11 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00349
+ for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 08:09:05 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA12377;
+ Fri, 19 Dec 1997 20:14:25 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349A7327.9A484B74@sable.krasnoyarsk.su>
+Date: Fri, 19 Dec 1997 20:14:15 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: [HACKERS] Items for 6.3
+References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> <349A1459.EBFE2C84@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> > Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
+> > may appear in 6.4 only. And I'll need help: could someone add support
+> > for them in the parser ? Not handling - just parsing and common checking.
+>
+> Yes, I've already added subselect syntax in the parser, but we will need to
+> modify or add to the parse tree nodes to push that past the parser into the
+> backend. I'm happy to focus on that, since I understand those pieces pretty well.
+
+Nice!
+
+> There are several places where "subselect syntax" is used: subselects and unions
+> come to mind right away. If you have an opinion on how the parse nodes should be
+> structured I can start with that, or I can just put something in and then modify
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+It's ok for me.
+
+> it as you need later. Do you see unions as being similar to subselects, or are
+> they a separate problem? To me, they seem like a simpler case since (perhaps) not
+> as much optimization and internal reorganizing needs to happen.
+
+I didn't think about unions at all... Yes, it's simpler to implement.
+BTW, I recall Bruce mentioned that unions are used for selects from
+superclass and all descendant classes (select ... from table* ) - maybe
+something is already implemented ? Bruce ?
+
+>
+> > Also, it would be nice to have a better temp table implementation
+> > (without affecting pg_class etc) - the Material node needs query-level
+> > temp tables anyway. I'd really like to see temp table files created
+> > only when the data must go to disk because the local buffer pool is full
+> > and can't keep more table data in memory.
+>
+> This sounds very desirable. I noticed that there are, or used to be, multiple
+> storage managers. Could a manager for temporary storage be written which stores
+> things in memory until it gets too big and then go to disk? Could that manager
+> use the mm and md managers internally? Or is all of that at too low a level to be
+> helpful for this problem?
+
+mm uses shmem... This feature could be implemented in the local bufmgr
+directly: when the requested buffer is not found in the pool and there is no
+free, !dirty buffer, then try to find some dirty buffer of an already created
+relation, flush it to disk and use it (exception below); if there is no such
+buffer -> create some relation (and flush its 1st block); exception: also
+create some relation if the # of buffers occupied by already created
+relations is too small (just so as not to break buffering of created
+relations).
+(Note that using some additional in-memory storage manager would mean
+keeping some buffers in memory twice - in the local pool and in the manager.
+The way above uses the local bufmgr as the storage manager.)
+
+> >
+> > I'm still sure that handling subselects in the parser is not the right way.
+> > And the main problem is not in execution plans (we could use tricks
+> > to resolve this) but in performance.
+>
+> Seems to me that the subselect needs to stay untransformed (i.e. executable but
+> non-optimized) so that an optimizer can independently decide how to transform for
+> faster execution. That way, in the first implementation we have reliable but
+> stupid execution, but then can add a subselect optimizer which looks for cases
+> which can be transformed to run faster.
+
+Yes, I believe that this is the right way.
+
+>
+> > > Especially if we trade him for help on his favorite topics for v6.4??
+> >
+> > Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
+>
+> Sure. (Tell me what it is later :)
+
+Ok -:)
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:21 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08884
+ for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:01:18 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA24250 for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 03:57:12 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23028;
+ Tue, 23 Dec 1997 16:04:25 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349F7E97.48C63F17@sable.krasnoyarsk.su>
+Date: Tue, 23 Dec 1997 16:04:23 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
+Subject: Re: [HACKERS] Items for 6.3
+References: <199712191607.LAA02362@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+> >
+> > I didn't think about unions at all... Yes, it's simpler to implement.
+> > BTW, I recall Bruce mentioned that unions are used for selects from
+> > superclass and all descendant classes (select ... from table* ) - maybe
+> > something is already implemented ? Bruce ?
+>
+> Yes, it is already there. See optimizer/prep/prepunion.c, and see the
+> call to it from optimizer/plan/planner.c. The current source tree has a
+> cleaned up version that will be easier to understand. Basically, if
+> there are any inherited tables, it calls prepunion, and cycles
+> through each inherited table, copying the Query plan, and calling the
+> planner() for each one, then it returns to the planner() to do sorting
+> and uniqueness. I am working on fixing aggregates.
+
+Could you try with unions ?
+I would like to concentrate on a single thing - subqueries.
+
+>
+> > mm uses shmem... This feature could be implemented in the local bufmgr
+> > directly: when the requested buffer is not found in the pool and there is no
+> > free, !dirty buffer, then try to find some dirty buffer of an already created
+> > relation, flush it to disk and use it (exception below); if there is no such
+> > buffer -> create some relation (and flush its 1st block); exception: also
+> > create some relation if the # of buffers occupied by already created
+> > relations is too small (just so as not to break buffering of created
+> > relations).
+> > (Note that using some additional in-memory storage manager would mean
+> > keeping some buffers in memory twice - in the local pool and in the manager.
+> > The way above uses the local bufmgr as the storage manager.)
+>
+> In the psort code, we do a nice job of keeping the stuff in files or
+> memory. Seems to work well. Can we use that somehow? Perhaps make it
+> a separate module, or just force a psort rather than a hash!
+
+I would rather not be restricted to psort only, but use whatever is better
+in each case. I can even foresee using indices on temp tables: we could
+put data in the index without putting data in the table itself!
+In any case, we can leave in-memory tables for the future.
+
+Vadim
+
+
+From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996
+Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
+Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
+Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
+ id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
+Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
+ id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
+Date: Thu, 5 Dec 1996 10:07:56 -0500
+From: aixssd!darrenk@abs.net (Darren King)
+Message-Id: <9612051507.AA34942@ceodev>
+To: maillist@candle.pha.pa.us
+Subject: Subselect info.
+Mime-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
+Status: OR
+
+> Any of them deal with implementing subselects?
+
+There's a white paper at www.sybase.com that might
+help a little. It's just a copy of a presentation
+given by the optimizer guru there. Nothing code-wise,
+but he gives a few ways of flattening them with temp
+tables, etc...
+
+Darren
+
+From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
+ for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:04:31 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Considering the complexity of the primary/secondary changes you are
+> making, I believe subselects will be easier than that.
+
+I'm not making changes for P/F keys - just thinking...
+Yes, I think that implementing referential integrity is
+the more complex work.
+
+As for subselects:
+
+in plannodes.h
+
+typedef struct Plan {
+...
+ struct Plan *lefttree;
+ struct Plan *righttree;
+} Plan;
+
+/* ----------------
+ * these are are defined to avoid confusion problems with "left"
+ ^^^^^^^^^^^^^^^^^^
+ * and "right" and "inner" and "outer". The convention is that
+ * the "left" plan is the "outer" plan and the "right" plan is
+ * the inner plan, but these make the code more readable.
+ * ----------------
+ */
+#define innerPlan(node) (((Plan *)(node))->righttree)
+#define outerPlan(node) (((Plan *)(node))->lefttree)
+
+First thought is to avoid any confusion by re-defining
+
+#define rightPlan(node) (((Plan *)(node))->righttree)
+#define leftPlan(node) (((Plan *)(node))->lefttree)
+
+and changing all occurrences of 'outer' & 'inner' in the code
+to 'left' & 'right' ones:
+
+this will allow us to use 'outer' & 'inner' for subselects
+later, without confusion. My hope is that we can change the Executor
+very easily by adding outer/inner plans/TupleSlots to
+EState, CommonState, JoinState, etc and by doing node
+processing in the right order.
+
+Subselects are mostly a Planner problem.
+
+Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
+ for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
+Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
+Sender: root@www.krasnet.ru
+Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
+Date: Fri, 22 Aug 1997 12:22:37 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> this will allow to use 'outer' & 'inner' things for subselects
+> later, without confusion. My hope is that we may change Executor
+
+Or maybe use 'high' & 'low' for subselects (to avoid confusion
+with outer joins).
+
+> very easy by adding outer/inner plans/TupleSlots to
+> EState, CommonState, JoinState, etc and by doing node
+> processing in right order.
+ ^^^^^^^^^^^^^^
+Rule is easy:
+1. Uncorrelated subselect - do 'low' plan node first
+2. Correlated - do left/right first
+
+- just some flag in structures.
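+
+(To make the two cases concrete - a sketch only, with big, small and the
+column k assumed for illustration:)
+
+```sql
+-- 1. uncorrelated: the subselect never looks at big, so the 'low' plan
+--    can be run once, before the upper plan
+select b from big where b in (select s from small);
+
+-- 2. correlated: the subselect refers to big.k, so it depends on the
+--    current row of the upper plan
+select b from big where b = (select max(s) from small where small.k = big.k);
+```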
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
+ for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
+ for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+The only thing I have to add to what I had written earlier is that I
+think it is best to have these subqueries executed as early in query
+execution as possible.
+
+Every piece of the backend: parser, optimizer, executor, is designed to
+work on a single query. The earlier we can split up the queries, the
+better those pieces will work at doing their job. You want to be able
+to use the parser and optimizer on each part of the query separately, if
+you can.
+
+
+Forwarded message:
+> I have done some thinking about subselects. There are basically two
+> issues:
+>
+> Does the query return one row or several rows? This can be
+> determined by seeing if the user uses equals or 'IN' to join the
+> subquery.
+>
+> Is the query correlated, meaning "Does the subquery reference
+> values from the outer query?"
+>
+> (We already have the third type of subquery, the INSERT...SELECT query.)
+>
+> So we have these four combinations:
+>
+> 1) one row, no correlation
+> 2) multiple rows, no correlation
+> 3) one row, correlated
+> 4) multiple rows, correlated
+>
+>
+> With #1, we can execute the subquery, get the value, replace the
+> subquery with the constant returned from the subquery, and execute the
+> outer query.
+>
+> With #2, we can execute the subquery and put the result into a temporary
+> table. We then rewrite the outer query to access the temporary table
+> and replace the subquery with the column name from the temporary table.
+> We probably put an index on the temp. table, which has only one
+> column, because a subquery can only return one column. We remove the
+> temp. table after query execution.
+>
+> With #3 and #4, we potentially need to execute the subquery for every
+> row returned by the outer query. Performance would be horrible for
+> anything but the smallest query. Another way to handle this is to
+> execute the subquery WITHOUT using any of the outer-query columns to
+> restrict the WHERE clause, and add those columns used to join the outer
+> variables into the target list of the subquery. So for query:
+>
+>      select t1.name
+>      from tab t1
+>      where t1.age = (select max(t2.age)
+>                      from tab2 t2
+>                      where t2.name = t1.name)
+>
+> Execute the subquery and put it in a temporary table:
+>
+>      select t2.name, max(t2.age) as age
+>      into table temp999
+>      from tab2 t2
+>      group by t2.name
+>
+>      create index i_temp999 on temp999 (name)
+>
+> Then re-write the outer query:
+>
+>      select t1.name
+>      from tab t1, temp999
+>      where t1.age = temp999.age and
+>            t1.name = temp999.name
+>
+> The only problem here is that the subselect is running for all entries
+> in tab2, even if the outer query is only going to need a few rows.
+> Deciding whether to execute the subquery each time or to create a temp.
+> table is often difficult. Even some non-correlated
+> subqueries are better executed for each row rather than pre-executing the
+> entire subquery, especially if the outer query returns few rows.
+>
+> One requirement to handle these issues is better column statistics,
+> which I am working on.
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
+ for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
+ Fri, 31 Oct 1997 21:37:06 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselects
+To: maillist@candle.pha.pa.us (Bruce Momjian)
+Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+One more issue I thought of. You can have multiple subselects in a
+single query, and subselects can have their own subselects.
+
+This makes it particularly important that we define a system that always
+is able to process the subselect BEFORE the upper select. This will
+allow us to handle all these cases without limitations.
+
+>
+> The only thing I have to add to what I had written earlier is that I
+> think it is best to have these subqueries executed as early in query
+> execution as possible.
+>
+> Every piece of the backend: parser, optimizer, executor, is designed to
+> work on a single query. The earlier we can split up the queries, the
+> better those pieces will work at doing their job. You want to be able
+> to use the parser and optimizer on each part of the query separately, if
+> you can.
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From hannu@trust.ee Sun Nov 2 10:33:33 1997
+Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
+ for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
+Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
+ by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
+ Sun, 2 Nov 1997 17:30:11 +0200
+Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
+Date: Sun, 02 Nov 1997 17:27:57 +0200
+From: Hannu Krosing <hannu@trust.ee>
+X-Mailer: Mozilla 4.02 [en] (Win95; I)
+MIME-Version: 1.0
+To: hackers-digest@postgresql.org
+CC: maillist@candle.pha.pa.us
+Subject: Re: [HACKERS] subselects
+References: <199711010401.XAA09216@hub.org>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
+> From: Bruce Momjian <maillist@candle.pha.pa.us>
+> Subject: Re: [HACKERS] subselects
+>
+> One more issue I thought of. You can have multiple subselects in a
+> single query, and subselects can have their own subselects.
+>
+> This makes it particularly important that we define a system that always
+> is able to process the subselect BEFORE the upper select. This will
+> allow us to handle all these cases without limitations.
+
+This would severely limit what subselects can be used for, as you can't use
+any of the fields in the upper select in a search criterion for the subselect.
+For example you can't do
+
+update parts p1
+set parts.current_id = (
+    select new_id
+    from parts p2
+    where p1.old_id = p2.new_id);
+
+or
+
+select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
+from parts p1;
+
+there may be of course ways to rewrite these queries (which the optimiser should do
+if it can) but IMHO, these kinds of subselects should still be allowed
+
+> > The only thing I have to add to what I had written earlier is that I
+> > think it is best to have these subqueries executed as early in query
+> > execution as possible.
+> >
+> > Every piece of the backend: parser, optimizer, executor, is designed to
+> > work on a single query. The earlier we can split up the queries, the
+> > better those pieces will work at doing their job. You want to be able
+> > to use the parser and optimizer on each part of the query separately, if
+> > you can.
+> >
+>
+
+Hannu
+
+
+From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
+ for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
+Date: Mon, 03 Nov 1997 09:22:38 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199711021848.NAA08319@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > One more issue I thought of. You can have multiple subselects in a
+> > > single query, and subselects can have their own subselects.
+> > >
+> > > This makes it particularly important that we define a system that always
+> > > is able to process the subselect BEFORE the upper select. This will
+> > > allow us to handle all these cases without limitations.
+> >
+> > This would severely limit what subselects can be used for, as you can't use
+> > any of the fields in the upper select in a search criterion for the subselect.
+> > For example you can't do
+> >
+> > update parts p1
+> > set parts.current_id = (
+> >     select new_id
+> >     from parts p2
+> >     where p1.old_id = p2.new_id);
+> >
+> > or
+> >
+> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
+> > from parts p1;
+> >
+> > there may be of course ways to rewrite these queries (which the optimiser should do
+> > if it can) but IMHO, these kinds of subselects should still be allowed
+>
+> I hadn't even gotten to this point yet, but it is a good thing to keep
+> in mind.
+>
+> In these cases, as in correlated subqueries in the where clause, we will
+> create a temporary table, and add the proper join fields and tables to
+> the clauses. Our version of UPDATE accepts a FROM section, and we will
+> certainly use this for this purpose.
+
+We can't replace a subselect with a join if there is an aggregate
+in the subselect.
+
+Actually, I don't see any problems if we are going to process subselects
+like sql-funcs: non-correlated subselects can be emulated by
+funcs without args; for correlated subselects the parser (analyze.c)
+has to change all upper query references to $1, $2,...
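+
+(A sketch of that idea for the second example above; the function name
+total_price and the int4/float8 types are assumptions made for
+illustration, not something taken from the parts table definition:)
+
+```sql
+-- the reference to the upper query (p1.id) becomes $1
+create function total_price(int4) returns float8
+    as 'select sum(price) from parts where id = $1'
+    language 'sql';
+
+select id, price, total_price(id) as totalprice
+from parts;
+```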
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
+Date: Mon, 03 Nov 1997 18:09:43 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselects
+References: <199711030316.WAA15401@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > > In these cases, as in correlated subqueries in the where clause, we will
+> > > create a temporary table, and add the proper join fields and tables to
+> > > the clauses. Our version of UPDATE accepts a FROM section, and we will
+> > > certainly use this for this purpose.
+> >
+> > We can't replace subselect with join if there is aggregate
+> > in subselect.
+>
+> I got lost here. Why can't we handle aggregates?
+
+Sorry, I missed the use of temp tables. Sybase uses joins (without
+temp tables) for non-correlated subqueries:
+
+ A noncorrelated subquery can be evaluated as if it were an independent query.
+ Conceptually, the results of the subquery are substituted in the main statement, or
+ outer query. This is not how SQL Server actually processes statements with
+ subqueries. Noncorrelated subqueries can be alternatively stated as joins and
+ are processed as joins by SQL Server.
+
+but this is not possible if there are aggregates in the subquery.
+
+>
+> My idea was this. This is a non-correlated subquery.
+...
+No problems with it...
+
+>
+> Here is a correlated example:
+>
+> select *
+> from table_a
+> where table_a.col_a in (select table_b.col_b
+> from table_b
+> where table_b.col_b = table_a.col_c)
+>
+> rewrite as:
+>
+> select distinct table_b.col_b, table_a.col_c -- the distinct is needed
+> into table_sub
+> from table_a, table_b
+
+First, could we add 'where table_b.col_b = table_a.col_c' here ?
+Just to avoid Cartesian results ? I hope we can.
+
+Note that for query
+
+ select *
+ from table_a
+ where table_a.col_a in (select table_b.col_b * table_a.col_c
+ from table_b)
+
+it's better to do
+
+ select distinct table_a.col_a
+ into table table_sub
+ from table_b, table_a
+ where table_a.col_a = table_b.col_b * table_a.col_c
+
+once again - to avoid Cartesians.
+
+But what could we do for
+
+ select *
+ from table_a
+ where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+ from table_b)
+???
+ select max(table_b.col_b * table_a.col_c), table_a.col_a
+ into table table_sub
+ from table_b, table_a
+ group by table_a.col_a
+
+first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
+For tables big and small with 100 000 and 1000 tuples
+
+select max(x*y), x from big, small group by x
+
+"ate" all free 140M in my file system after 20 minutes (just for
+sorting - nothing more) and was killed...
+
+select x from big where x = cor(x);
+(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
+this is bad too.
+
+> >
+> > Actually, I don't see any problems if we are going to process subselect
+> > like sql-funcs: non-correlated subselects can be emulated by
+> > funcs without args, for correlated subselects parser (analyze.c)
+> > has to change all upper query references to $1, $2,...
+>
+> Yes, logically, they are SQL functions, but aren't we going to see
+> terrible performance in such circumstances. My experience is that when
+ ^^^^^^^^^^^^^^^^^^^^
+You're right.
+
+> people are given subselects, they start to do huge jobs with them.
+>
+> In fact, the final solution may be to have both methods available, and
+> switch between them depending on the size of the query sets. Each
+> method has its advantages. The function example lets the outside query
+> be executed, and only calls the subquery when needed.
+>
+> For large tables where the subselect is small and is the entire WHERE
+> restriction, the SQL function gets called much too often. A simple join
+> of the subquery result and the large table would be much better. This
+> method also allows for sort/merge join of the subquery results, and
+> index use.
+
+...keep thinking...
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
+ for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
+ Mon, 3 Nov 1997 10:25:34 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> Sorry, I missed using of temp tables. Sybase uses joins (without
+> temp tables) for non-correlated subqueries:
+>
+> A noncorrelated subquery can be evaluated as if it were an independent query.
+> Conceptually, the results of the subquery are substituted in the main statement, or
+> outer query. This is not how SQL Server actually processes statements with
+> subqueries. Noncorrelated subqueries can be alternatively stated as joins and
+> are processed as joins by SQL Server.
+>
+> but this is not possible if there are aggregates in subquery.
+>
+> >
+> > My idea was this. This is a non-correlated subquery.
+> ...
+> No problems with it...
+>
+> >
+> > Here is a correlated example:
+> >
+> > select *
+> > from table_a
+> > where table_a.col_a in (select table_b.col_b
+> > from table_b
+> > where table_b.col_b = table_a.col_c)
+> >
+> > rewrite as:
+> >
+> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed
+> > into table_sub
+> > from table_a, table_b
+>
+> First, could we add 'where table_b.col_b = table_a.col_c' here ?
+> Just to avoid Cartesian results ? I hope we can.
+
+Yes, of course. I forgot that line here. We can also be fancy and move
+some of the outer where restrictions on table_a into the subquery.
+
+I think the classic subquery for this would be if someone wanted all
+customer names that had invoices in the past month:
+
+select custname
+from customer
+where custid in (select order.custid
+ from order
+ where order.date >= "09/01/97" and
+ order.date <= "09/30/97")
+
+In this case, the subquery can use an index on 'date' to quickly
+evaluate the query, and the resulting temp table can quickly be joined
+to the customer table. If we used SQL functions, every customer would
+have an order query evaluated for it, and there may be no multi-column
+index on customer and date, or even if there is, this could be many
+query executions.
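+
+A minimal sketch of the temp-table evaluation described above, assuming
+a hypothetical table orders(custid, date) standing in for 'order' (a
+reserved word), ISO date literals, and a made-up temp table name sub1:
+
+```sql
+-- step 1: run the subquery once; an index on orders.date can be used
+select distinct custid
+into table sub1
+from orders
+where date >= '1997-09-01' and
+      date <= '1997-09-30';
+
+-- step 2: join the small result to customer; no per-customer
+-- subquery execution is needed
+select custname
+from customer, sub1
+where customer.custid = sub1.custid;
+```
+
+The distinct keeps a customer with several invoices from being
+returned more than once by the join.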
+
+
+>
+> Note that for query
+>
+> select *
+> from table_a
+> where table_a.col_a in (select table_b.col_b * table_a.col_c
+> from table_b)
+>
+> it's better to do
+>
+> select distinct table_a.col_a
+> into table table_sub
+> from table_b, table_a
+> where table_a.col_a = table_b.col_b * table_a.col_c
+
+Yes, I had not thought of cases where they are doing correlated column
+arithmetic, but it looks like this would work.
+
+>
+> once again - to avoid Cartesians.
+>
+> But what could we do for
+>
+> select *
+> from table_a
+> where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
+> from table_b)
+
+OK, who wrote this horrible query. :-)
+
+Without a join of table_b and table_a, even an SQL function would die on
+this. You have to take the current value table_a.col_c, and multiply by
+every value of table_b.col_b to get the maximum.
+
+Trying to do a temp table on this is certainly going to be a cartesian
+product, but using an SQL function is also going to be a cartesian
+product, except that the product is generated in small pieces instead of
+in one big query. The SQL function example may eventually complete, but
+it will take forever to do so in cases where the temp table would bomb.
+
+I can recommend some SQL books for anyone who sends in a bug report on
+this query. :-)
+
+
+
+> ???
+> select max(table_b.col_b * table_a.col_c), table_a.col_a
+> into table table_sub
+> from table_b, table_a
+> group by table_a.col_a
+>
+> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
+> For tables big and small with 100 000 and 1000 tuples
+>
+> select max(x*y), x from big, small group by x
+>
+> "ate" all free 140M in my file system after 20 minutes (just for
+> sorting - nothing more) and was killed...
+>
+> select x from big where x = cor(x);
+> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
+> this is bad too.
+
+Again, my feeling is that in cases where the temp table would bomb, the
+SQL function will be so slow that neither will be acceptable.
+
+>
+> > >
+> > > Actually, I don't see any problems if we are going to process subselect
+> > > like sql-funcs: non-correlated subselects can be emulated by
+> > > funcs without args, for correlated subselects parser (analyze.c)
+> > > has to change all upper query references to $1, $2,...
+> >
+> > Yes, logically, they are SQL functions, but aren't we going to see
+> > terrible performance in such circumstances. My experience is that when
+> ^^^^^^^^^^^^^^^^^^^^
+> You're right.
+>
+> > people are given subselects, they start to do huge jobs with them.
+> >
+> > In fact, the final solution may be to have both methods available, and
+> > switch between them depending on the size of the query sets. Each
+> > method has its advantages. The function example lets the outside query
+> > be executed, and only calls the subquery when needed.
+> >
+> > For large tables where the subselect is small and is the entire WHERE
+> > restriction, the SQL function gets called much too often. A simple join
+> > of the subquery result and the large table would be much better. This
+> > method also allows for sort/merge join of the subquery results, and
+> > index use.
+>
+> ...keep thinking...
+>
+> Vadim
+>
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
+ for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
+ for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
+Subject: [HACKERS] subselect
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+I am going to overhaul all the /parser files, and I may give subselects
+a try while I am in there. This is where it is going to have to be done.
+
+Two things I think I need are:
+
+ temp tables that go away at the end of a statement, so if the
+query elog's out, the temp file gets destroyed
+
+ how do I implement "not in":
+
+ select * from a where x not in (select y from b)
+
+Using <> is not going to work because that returns multiple copies of
+each row of a, one for every row of b it doesn't equal. It is like we
+need a not-equals test that doesn't return multiple rows.
+
+Any ideas?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
+ for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
+ Thu, 20 Nov 1997 06:27:21 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
+Date: Thu, 20 Nov 1997 06:27:21 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: [HACKERS] subselect
+References: <199711200457.XAA03103@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> I am going to overhaul all the /parser files
+
+??
+
+> , and I may give subselects
+> a try while I am in there. This is where it is going to have to be done.
+
+A first cut at the subselect syntax is already in gram.y. I'm sure that the
+e-mail you had sent which collected several items regarding subselects
+covers some of this topic. I've been thinking about subselects also, and
+had thought that there must be some existing mechanisms in the backend
+which can be used to help implement subselects. It seems to me that UNION
+might be a good thing to implement first, because it has a fairly
+well-defined set of behaviors:
+
+ select a union select b;
+
+chooses elements from a and from b and then sorts/uniques the result.
+
+ select a union all select b;
+
+chooses all elements from a and then adds all elements from b, without
+the sort/unique step.
+
+ select a union select b union all select c;
+
+evaluates left to right, and first evaluates a union b, sorts/uniques, and
+then evaluates
+
+ (result) union all select c;
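+
+As a small illustration of the behaviors listed above, assume
+hypothetical one-column tables a and b, where a.f holds 1 and 2 and
+b.f holds 2 and 3:
+
+```sql
+-- duplicates are removed: 1, 2, 3
+select f from a union select f from b;
+
+-- all rows of both inputs are kept: 1, 2, 2, 3
+select f from a union all select f from b;
+```
+
+Row order is unspecified in both cases unless an order by is added.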
+
+There are several types of subselects. Examples of some are:
+
+1) select a.f from a union select b.f from b order by 1;
+Needs temporary table(s), optional sort/unique, final order by.
+
+2) select a.f from a where a.f in (select b.f from b);
+Needs temporary table(s). "in" can be first implemented by count(*) > 0, but
+it would perform better to have the backend return after the first
+match.
+
+3) select a.f from a where exists (select b.f from b where b.f = a.f);
+Need to do the select and do a subselect on _each_ of the returned values?
+Again could use count(*) to help implement.
+
+This brings up the point that perhaps the backend needs a row-counting
+atomic operation and count(*) could be re-implemented using that. At the
+moment count(*) is transformed to a select of OID columns and does not
+quite work on table joins.
+
+I would think that outer joins could use some of these support routines
+also.
+
+ - Tom
+
+> Two things I think I need are:
+>
+> temp tables that go away at the end of a statement, so if the
+> query elog's out, the temp file gets destroyed
+>
+> how do I implement "not in":
+>
+> select * from a where x not in (select y from b)
+>
+> Using <> is not going to work because that returns multiple copies of a,
+> one for every one that doesn't equal. It is like we need not equals,
+> but don't return multiple rows.
+>
+> Any ideas?
+>
+> --
+> Bruce Momjian
+> maillist@candle.pha.pa.us
+
+
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
+ for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
+ for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+OK, a few questions:
+
+ Should we use sortmerge, so we can use our psort as temp tables,
+or do we use hashunique?
+
+ How do we pass the query to the optimizer? How do we represent
+the range table for each, and the links between them in correlated
+subqueries?
+
+I have to think about this. Comments are welcome.
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
+ for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
+ for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
+Subject: [HACKERS] subselects (fwd)
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Forwarded message:
+> OK, a few questions:
+>
+> Should we use sortmerge, so we can use our psort as temp tables,
+> or do we use hashunique?
+>
+> How do we pass the query to the optimizer? How do we represent
+> the range table for each, and the links between them in correlated
+> subqueries?
+>
+> I have to think about this. Comments are welcome.
+
+One more thing. I guess I am seeing subselects as a different thing
+than temp tables. I can see people wanting to put indexes on their temp
+tables, so I think they will need more system catalog support. For
+subselects, I think we can just stuff them into psort, perhaps, and do
+the unique as we unload them.
+
+Seems like a natural to me.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
+ for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
+ Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
+Date: Tue, 23 Dec 1997 16:08:56 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects (fwd)
+References: <199712220605.BAA17354@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Forwarded message:
+> > OK, a few questions:
+> >
+> > Should we use sortmerge, so we can use our psort as temp tables,
+> > or do we use hashunique?
+> >
+> > How do we pass the query to the optimizer? How do we represent
+> > the range table for each, and the links between them in correlated
+> > subqueries?
+> >
+> > I have to think about this. Comments are welcome.
+>
+> One more thing. I guess I am seeing subselects as a different thing
+> than temp tables. I can see people wanting to put indexes on their temp
+> tables, so I think they will need more system catalog support. For
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+What's the difference between temp tables and temp indices ?
+Both of them are handled via catalog cache...
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
+ for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
+ Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
+Date: Sat, 03 Jan 1998 16:08:51 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>,
+ "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Subject: Re: subselects
+References: <199712290516.AAA12579@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> With UNIONs done, how are things going with you on subselects? UNIONs
+> are much easier than subselects.
+>
+> I am stumped on how to record the subselect query information in the
+> parser and stuff.
+
+ So am I. We definitely need an EXISTS node and maybe an IN one.
+Also, we have to support the ANY and ALL modifiers of comparison operators
+(it would be nice to support ANY and ALL for all operators returning
+bool: >, =, ..., like, ~ and so on). Note that IN is the same as
+= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
+and so we could avoid an IN node, but I'm not sure that I like such an
+assumption: postgres is an OO-like system allowing operators to be overridden,
+and so '=' can, in theory, mean not EQUAL but something else (someday
+we could allow specifying the "meaning" of an operator in CREATE OPERATOR) -
+in short, I would like an IN node.
+ Also, I would suggest nodes for ANY and ALL.
+ (I need a few days to think more about how to record this stuff...)
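+
+A sketch of the equivalences mentioned above, assuming hypothetical
+tables a(x) and b(y) and that '=' and '<>' have their usual meaning
+for the column type:
+
+```sql
+-- IN is the same as = ANY
+select * from a where x in (select y from b);
+select * from a where x = any (select y from b);
+
+-- NOT IN is the same as <> ALL
+select * from a where x not in (select y from b);
+select * from a where x <> all (select y from b);
+```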
+
+>
+> Please let me know what I can do to help, if anything.
+
+Thanks. As I remember, Tom also wished to work here. Tom ?
+
+Bye,
+ Vadim
+
+P.S. I'll be "on-line" Jan 5.
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
+ Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
+Date: Mon, 05 Jan 1998 19:35:59 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselect
+References: <199801050516.AAA28005@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> I was thinking about subselects, and how to attach the two queries.
+>
+> What if the subquery makes a range table entry in the outer query, and
+> the query is set up like the UNION queries where we put the scans in a
+> row, but in the case we put them over/under each other.
+>
+> And we push a temp table into the catalog cache that represents the
+> result of the subquery, then we could join to it in the outer query as
+> though it was a real table.
+>
+> Also, can't we do the correlated subqueries by adding the proper
+> target/output columns to the subquery, and have the outer query
+> reference those columns in the subquery range table entry.
+
+Yes, this is a way to handle subqueries by joining to temp table.
+After getting plan we could change temp table access path to
+node material. On the other hand, it could be useful to let optimizer
+know about cost of temp table creation (have to think more about it)...
+Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+is one example of this - joining by <> will give us invalid results.
+Setting a special NOT EQUAL flag is not enough: the subquery plan must
+always be the inner one in this case. The same goes for handling the ALL
+modifier. Note that we generally can't use aggregates here: we can't add
+MAX to the subquery in the case of > ALL (subquery), because > ALL should
+return FALSE if the subquery returns NULL(s) but aggregates don't take
+NULLs into account.
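+
+To see why joining by <> gives invalid results for NOT IN, assume
+hypothetical tables a(x) holding 1 and 2 and b(y) holding 1 and 3:
+
+```sql
+-- intended result: the single row x = 2
+select * from a where x not in (select y from b);
+
+-- join by <>: returns x = 1 once (it differs from 3) and
+-- x = 2 twice (it differs from both 1 and 3)
+select a.x from a, b where a.x <> b.y;
+```
+
+The join both keeps a row that should be rejected and duplicates the
+row that should be kept.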
+
+>
+> Maybe I can write up a sample of this? Vadim, would this help? Is this
+> the point we are stuck at?
+
+Personally, I was stuck by holidays -:)
+Now I can spend ~ 8 hours ~ each day for development...
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
+ Mon, 5 Jan 1998 10:28:48 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselect
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> Yes, this is a way to handle subqueries by joining to temp table.
+> After getting plan we could change temp table access path to
+> node material. On the other hand, it could be useful to let optimizer
+> know about cost of temp table creation (have to think more about it)...
+> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+> is one example of this - joining by <> will give us invalid results.
+> Setting special NOT EQUAL flag is not enough: subquery plan must be
+> always inner one in this case. The same for handling ALL modifier.
+> Note that we generally can't use aggregates here: we can't add MAX to
+> subquery in the case of > ALL (subquery), because > ALL should return FALSE
+> if subquery returns NULL(s) but aggregates don't take NULLs into account.
+
+OK, here are my ideas. First, I think you have to handle subselects in
+the outer node because a subquery could have its own subquery. Also, we
+now have a field in Aggreg to allow us to 'usenulls'.
+
+OK, here it is. I recommend we pass the outer and subquery through
+the parser and optimizer separately.
+
+We parse the subquery first. If the subquery is not correlated, it
+should parse fine. If it is correlated, any columns we find in the
+subquery that are not already in the FROM list, we add the table to the
+subquery FROM list, and add the referenced column to the target list of
+the subquery.
+
+When we are finished parsing the subquery, we create a catalog cache
+entry for it called 'sub1' and make its fields match the target
+list of the subquery.
+
+In the outer query, we add 'sub1' to its FROM list, and change
+the subquery reference to point to the new range table. We also add
+WHERE clauses to do any correlated joins.
+
+Here is a simple example:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb)
+
+This is not correlated, and the subquery parses easily. We create a
+'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
+clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.
+
+Here is a more complex correlated subquery:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb
+ where taba.col3 = tabb.col4)
+
+Here we must add 'taba' to the subquery's FROM list, and add col3 to the
+target list of the subquery. After we parse the subquery, add 'sub1' to
+the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
+sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
+The optimizer will do the correlation for us.
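+
+Written out by hand, the rewrite described above might look like the
+following sketch. Using select distinct ... into table to materialize
+'sub1' is an assumption borrowed from the temp-table examples earlier
+in this thread; the proposal itself only creates a catalog cache entry:
+
+```sql
+-- subquery with taba added to its FROM list and col3 to its target list
+select distinct tabb.col2, taba.col3
+into table sub1
+from tabb, taba
+where taba.col3 = tabb.col4;
+
+-- outer query joined to sub1 instead of evaluating the subquery
+select taba.*
+from taba, sub1
+where taba.col1 = sub1.col2 and
+      taba.col3 = sub1.col3;
+```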
+
+In the optimizer, we can plan the subquery first, then the outer query,
+and then replace all 'sub1' references in the outer query to use the
+subquery plan.
+
+I realize merging the two plans and doing IN and NOT IN is the
+real challenge, but I hoped this would give us a start.
+
+What do you think?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
+ Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 02:55:57 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801051528.KAA10375@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > always inner one in this case. The same for handling ALL modifier.
+> > Note, that we generally can't use aggregates here: we can't add MAX to
+> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
+> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
+>
+> OK, here are my ideas. First, I think you have to handle subselects in
+> the outer node because a subquery could have its own subquery. Also, we
+
+I hope this doesn't matter: if the results of a subquery (with/without sub-subqueries)
+go into a temp table then this table will be re-scanned for each outer tuple.
+
+> now have a field in Aggreg to allow us to 'usenulls'.
+ ^^^^^^^^
+ This can't help:
+
+vac=> select * from x;
+y
+-
+1
+2
+3
+ <<< this is NULL
+(4 rows)
+
+vac=> select max(y) from x;
+max
+---
+ 3
+
+==> we can't replace
+
+select * from A where A.a > ALL (select y from x);
+ ^^^^^^^^^^^^^^^
+ (NULL will be returned and so A.a > ALL is FALSE - this is what
+ Sybase does, is it right ?)
+with
+
+select * from A where A.a > (select max(y) from x);
+ ^^^^^^^^^^^^^^^^^^^^
+just because we lose knowledge about NULLs here.
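+
+A minimal sketch of the NULL problem described above, in C with made-up
+names (TriBool, NullableInt, gt_all and gt_max are illustrative only, not
+PostgreSQL code). It assumes standard three-valued logic, where a
+comparison against NULL is unknown and WHERE rejects anything not TRUE:
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum TriBool { TB_FALSE, TB_TRUE, TB_UNKNOWN } TriBool;

typedef struct NullableInt
{
    int  value;
    bool is_null;
} NullableInt;

/* a > ALL (rows): false if any non-NULL row fails, unknown if only
 * NULLs stand in the way, true otherwise (including an empty set). */
TriBool
gt_all(int a, const NullableInt *rows, size_t nrows)
{
    bool saw_null = false;

    assert(rows != NULL || nrows == 0);

    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].is_null)
            saw_null = true;
        else if (!(a > rows[i].value))
            return TB_FALSE;
    }
    return saw_null ? TB_UNKNOWN : TB_TRUE;
}

/* a > (select max(y) ...): the aggregate skips NULLs, so the NULL row
 * leaves no trace in the result. */
TriBool
gt_max(int a, const NullableInt *rows, size_t nrows)
{
    bool have_max = false;
    int  max = 0;

    assert(rows != NULL || nrows == 0);

    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].is_null)
            continue;
        if (!have_max || rows[i].value > max)
        {
            max = rows[i].value;
            have_max = true;
        }
    }
    if (!have_max)
        return TB_UNKNOWN;      /* max() over no non-NULL rows is NULL */
    return a > max ? TB_TRUE : TB_FALSE;
}
```
+
+With y = {1, 2, 3, NULL} and A.a = 5, gt_all() yields unknown (the row is
+rejected) while gt_max() yields true (the row is kept), so the two forms
+are not interchangeable.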
+
+Also, I would like to handle ANY and ALL modifiers for all bool
+operators, either built-in or user-defined, for all data types -
+isn't PostgreSQL OO-like RDBMS -:)
+
+> OK, here it is. I recommend we pass the outer and subquery through
+> the parser and optimizer separately.
+
+I don't like this. I would like to get parse-tree from parser for
+entire query and let optimizer (on upper level) decide how to rewrite
+parse-tree and what plans to produce and how these plans should be
+merged. Note that I don't object to your methods below, only to where
+this handling is placed. I don't understand why we should add a
+new part to the system which will do the optimizer's work (parse-tree -->
+execution plan) and deal with optimizer nodes. Imho, the upper optimizer
+level is a nice place to do this.
+
+>
+> We parse the subquery first. If the subquery is not correlated, it
+> should parse fine. If it is correlated, any columns we find in the
+> subquery that are not already in the FROM list, we add the table to the
+> subquery FROM list, and add the referenced column to the target list of
+> the subquery.
+>
+> When we are finished parsing the subquery, we create a catalog cache
+> entry for it called 'sub1' and make its fields match the target
+> list of the subquery.
+>
+> In the outer query, we add 'sub1' to its target list, and change
+> the subquery reference to point to the new range table. We also add
+> WHERE clauses to do any correlated joins.
+...
+> Here is a more complex correlated subquery:
+>
+> select *
+> from taba
+> where col1 = (select col2
+> from tabb
+> where taba.col3 = tabb.col4)
+>
+> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
+> target list of the subquery. After we parse the subquery, add 'sub1' to
+> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
+> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
+> The optimizer will do the correlation for us.
+>
+> In the optimizer, we can parse the subquery first, then the outer query,
+> and then replace all 'sub1' references in the outer query to use the
+> subquery plan.
+>
+> I realize making merging the two plans and doing IN and NOT IN is the
+ ^^^^^^^^^^^^^^^^^^^^^
+This is very easy to do! As I already said, we just have to replace the
+sub1 access path (SeqScan of sub1) with a SeqScan of a Material node
+holding the subquery plan.
+
+> real challenge, but I hoped this would give us a start.
+
+A decision about how to record subquery stuff in the parse-tree
+would be a very good start -:)
+
+BTW, note that for _expression_ subqueries (which are introduced without
+IN, EXISTS, ALL, ANY - this follows Sybase's naming) - as in your examples -
+we have to check that subquery returns single tuple...
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
+Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
+ by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
+ for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
+ Mon, 5 Jan 1998 17:16:40 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
+Subject: Re: [HACKERS] subselect
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> > I am confused. Do you want one flat query and want to pass the whole
+> > thing into the optimizer? That brings up some questions:
+>
+> No. I just want to follow Tom's way: I would like to see new
+> SubSelect node as shortened version of struct Query (or use
+> Query structure for each subquery - no matter for me), some
+> subquery-related stuff added to Query (and SubSelect) to help
+> optimizer to start, and see
+
+OK, so you want the subquery to actually be INSIDE the outer query
+expression. Do they share a common range table? If they don't, we
+could very easily just fly through when processing the WHERE clause, and
+start a new query using a new query structure for the subquery. Believe
+me, you don't want a separate SubQuery-type, just re-use Query for it.
+It allows you to call all the normal query stuff with a consistent
+structure.
+
+The parser will need to know it is in a subquery, so it can add the
+proper target columns to the subquery, or are you going to do that in
+the optimizer? You can do it in the optimizer, and join the range table
+references there too.
+
+>
+> typedef struct A_Expr
+> {
+> NodeTag type;
+> int oper; /* type of operation
+> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> IN, NOT IN, ANY, ALL, EXISTS here,
+>
+> char *opname; /* name of operator/function */
+> Node *lexpr; /* left argument */
+> Node *rexpr; /* right argument */
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> and SubSelect (Query) here (as possible case).
+>
+> One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+> Query - how else can we implement VIEWs on selects with subqueries ?
+
+Views are stored as nodeout structures, and are merged into the query's
+from list, target list, and where clause. I am working out
+readfunc,outfunc now to make sure they are up-to-date with all the
+current fields.
+
+>
+> BTW, is
+>
+> select * from A where (select TRUE from B);
+>
+> valid syntax ?
+
+I don't think so.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
+ Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 05:18:11 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801052051.PAA29341@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > OK, here it is. I recommend we pass the outer and subquery through
+> > > the parser and optimizer separately.
+> >
+> > I don't like this. I would like to get parse-tree from parser for
+> > entire query and let optimizer (on upper level) decide how to rewrite
+> > parse-tree and what plans to produce and how these plans should be
+> > merged. Note, that I don't object your methods below, but only where
+> > to place handling of this. I don't understand why should we add
+> > new part to the system which will do optimizer' work (parse-tree -->
+> > execution plan) and deal with optimizer nodes. Imho, upper optimizer
+> > level is nice place to do this.
+>
+> I am confused. Do you want one flat query and want to pass the whole
+> thing into the optimizer? That brings up some questions:
+
+No. I just want to follow Tom's way: I would like to see new
+SubSelect node as shortened version of struct Query (or use
+Query structure for each subquery - no matter for me), some
+subquery-related stuff added to Query (and SubSelect) to help
+optimizer to start, and see
+
+typedef struct A_Expr
+{
+ NodeTag type;
+ int oper; /* type of operation
+ * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ IN, NOT IN, ANY, ALL, EXISTS here,
+
+ char *opname; /* name of operator/function */
+ Node *lexpr; /* left argument */
+ Node *rexpr; /* right argument */
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ and SubSelect (Query) here (as possible case).
+
+One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+Query - how else can we implement VIEWs on selects with subqueries ?
+
+BTW, is
+
+select * from A where (select TRUE from B);
+
+valid syntax ?
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
+ Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 05:48:58 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Goran Thyni <goran@bildbasen.se>
+CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Goran Thyni wrote:
+>
+> Vadim,
+>
+> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
+> is one example of this - joining by <> will give us invalid results.
+>
+> What is your approach towards this problem?
+
+Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL)
+and so, we have to have not just NOT EQUAL flag but some ALL node
+with modified operator.
+
+After that, one way is to put the subquery into the inner plan of a join node
+to be sure that for an outer tuple all corresponding subquery tuples
+will be tested with modified operator (this will require either
+changing code of all join nodes or addition of new plan type - we'll see)
+and another way is ... suggested by you:
+
+> I got an idea that one could reverse the order,
+> that is execute the outer first into a temptable
+> and delete from that according to the result of the
+> subquery and then return it.
+> Probably this is too raw and slow. ;-)
+
+This will be faster in some cases (when subquery returns many results
+and there are "not so many" results from outer query) - thanks for idea!
+
+>
+> Personally, I was stuck by holydays -:)
+> Now I can spend ~ 8 hours ~ each day for development...
+>
+> Oh, isn't it christmas eve right now in Russia?
+
+Due to historic reasons New Year is mu-u-u-uch popular
+holiday in Russia -:)
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
+ for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
+ Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
+Date: Tue, 06 Jan 1998 06:09:56 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselect
+References: <199801052216.RAA02675@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > > I am confused. Do you want one flat query and want to pass the whole
+> > > thing into the optimizer? That brings up some questions:
+> >
+> > No. I just want to follow Tom's way: I would like to see new
+> > SubSelect node as shortened version of struct Query (or use
+> > Query structure for each subquery - no matter for me), some
+> > subquery-related stuff added to Query (and SubSelect) to help
+> > optimizer to start, and see
+>
+> OK, so you want the subquery to actually be INSIDE the outer query
+> expression. Do they share a common range table? If they don't, we
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+No.
+
+> could very easily just fly through when processing the WHERE clause, and
+> start a new query using a new query structure for the subquery. Believe
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+... and filling some subquery-related stuff in upper query structure -
+still don't know what exactly this could be -:)
+
+> me, you don't want a separate SubQuery-type, just re-use Query for it.
+> It allows you to call all the normal query stuff with a consistent
+> structure.
+
+No objections.
+
+>
+> The parser will need to know it is in a subquery, so it can add the
+> proper target columns to the subquery, or are you going to do that in
+
+I don't think that we need it, but a list of correlation clauses
+could be a good thing - all in all, the parser has to check all column
+references...
+
+> the optimizer. You can do it in the optimizer, and join the range table
+> references there too.
+
+Yes.
+
+> > typedef struct A_Expr
+> > {
+> > NodeTag type;
+> > int oper; /* type of operation
+> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
+> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > IN, NOT IN, ANY, ALL, EXISTS here,
+> >
+> > char *opname; /* name of operator/function */
+> > Node *lexpr; /* left argument */
+> > Node *rexpr; /* right argument */
+> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > and SubSelect (Query) here (as possible case).
+> >
+> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
+> > Query - how else can we implement VIEWs on selects with subqueries ?
+>
+> Views are stored as nodeout structures, and are merged into the query's
+> from list, target list, and where clause. I am working out
+> readfunc,outfunc now to make sure they are up-to-date with all the
+> current fields.
+
+Nice! This stuff was out-of-date for too long.
+
+> > BTW, is
+> >
+> > select * from A where (select TRUE from B);
+> >
+> > valid syntax ?
+>
+> I don't think so.
+
+And so, *rexpr can be of Query type only when oper is one of OP, IN,
+NOT IN, ANY, ALL, EXISTS - well.
+
+(Time to sleep -:)
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
+ for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
+ Thu, 8 Jan 1998 22:55:03 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
+Cc: hackers@postgreSQL.org (PostgreSQL-development)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Vadim, I know you are still thinking about subselects, but I have some
+more clarification that may help.
+
+We have to add phantom range table entries to correlated subselects so
+they will pass the parser. We might as well add those fields to the
+target list of the subquery at the same time:
+
+ select *
+ from taba
+ where col1 = (select col2
+ from tabb
+ where taba.col3 = tabb.col4)
+
+becomes:
+
+ select *
+ from taba
+ where col1 = (select col2, tabb.col4 <---
+ from tabb, taba <---
+ where taba.col3 = tabb.col4)
+
+We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+was entered as a correlation entry:
+
+ bool isCorrelated;
+
+Second, we need to hook the subselect to the main query. I recommend we
+add two fields to Query for this:
+
+ Query *parentQuery;
+ List *subqueries;
+
+The parentQuery pointer is used to resolve field names in the correlated
+subquery.
+
+ select *
+ from taba
+ where col1 = (select col2, tabb.col4 <---
+ from tabb, taba <---
+ where taba.col3 = tabb.col4)
+
+In the query above, the subquery can be easily parsed, and we add the
+subquery to the parent's subqueries list.
+
+In the parent query, to parse the WHERE clause, we create a new operator
+type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+right side is an index to a slot in the subqueries List.
+
+We can then do the rest in the upper optimizer.
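+
+A rough sketch of the hookup described above, assuming a fixed-size
+array in place of a real List and a hypothetical helper named
+attach_subquery(). Only parentQuery and subqueries come from the
+proposal; everything else is invented for illustration:
+
```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8        /* arbitrary limit for this sketch */

typedef struct Query
{
    struct Query *parentQuery;                 /* NULL for the outer query */
    struct Query *subqueries[MAX_SUBQUERIES];  /* stand-in for List * */
    int           nsubqueries;
} Query;

/*
 * Link child under parent and return the slot index that the new
 * IN / NOT_IN / ALL operator node would carry on its right side,
 * or -1 if the sketch's fixed array is full.
 */
int
attach_subquery(Query *parent, Query *child)
{
    assert(parent != NULL && child != NULL);

    if (parent->nsubqueries >= MAX_SUBQUERIES)
        return -1;

    child->parentQuery = parent;
    parent->subqueries[parent->nsubqueries] = child;
    return parent->nsubqueries++;
}
```
+
+The returned slot number is what the right side of the operator would
+store, and parentQuery gives the subquery a way back to the outer range
+table when resolving names such as taba.col3.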
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
+ for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
+ Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
+Date: Fri, 09 Jan 1998 22:10:06 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgresql.org>
+Subject: Re: subselects
+References: <199801090355.WAA09243@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Vadim, I know you are still thinking about subselects, but I have some
+> more clarification that may help.
+>
+> We have to add phantom range table entries to correlated subselects so
+> they will pass the parser. We might as well add those fields to the
+> target list of the subquery at the same time:
+>
+> select *
+> from taba
+> where col1 = (select col2
+> from tabb
+> where taba.col3 = tabb.col4)
+>
+> becomes:
+>
+> select *
+> from taba
+> where col1 = (select col2, tabb.col4 <---
+> from tabb, taba <---
+> where taba.col3 = tabb.col4)
+>
+> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+> was entered as a correlation entry:
+>
+> bool isCorrelated;
+
+No, I don't like to add anything in parser. Example:
+
+ select *
+ from tabA
+ where col1 = (select col2
+ from tabB
+ where tabA.col3 = tabB.col4
+ and exists (select *
+ from tabC
+ where tabB.colX = tabC.colX and
+ tabC.colY = tabA.col2)
+ )
+
+: a column of tabA is referenced in the sub-subselect
+(is this allowed by the standard?) - in this case it's better
+not to add tabA to the 1st subselect but to add tabA to the second one
+and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
+this gives us a 2-table join in the 1st subquery instead of a 3-table join.
+(And I'm still not sure that using temp tables is the best that can be
+done in all cases...)
+
+Instead of using isCorrelated in TE & RTE we can add
+
+Index varlevel;
+
+to the Var node to record which (sub)query this Var comes from
+(i.e. which range table to use to find the var's relation via varno). The topmost query
+will have varlevel = 0, all its (direct) children - varlevel = 1 and so on.
+ ^^^ ^^^^^^^^^^^^
+(I don't see problems with distinguishing Vars of different children
+on the same level...)
+
+>
+> Second, we need to hook the subselect to the main query. I recommend we
+> add two fields to Query for this:
+>
+> Query *parentQuery;
+> List *subqueries;
+
+Agreed. And maybe Index queryLevel.
+
+> In the parent query, to parse the WHERE clause, we create a new operator
+> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+ ^^^^^^^^^^^^^^^^^^
+No. We have to handle (a,b,c) OP (select x, y, z ...) and
+'_a_constant_' OP (select ...) - I don't know whether the latter is in the
+standard, but Sybase has it.
+
+Well,
+
+typedef enum OpType
+{
+ OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
+
++ OP_EXISTS, OP_ALL, OP_ANY
+
+} OpType;
+
+typedef struct Expr
+{
+ NodeTag type;
+ Oid typeOid; /* oid of the type of this expr */
+ OpType opType; /* type of the op */
+ Node *oper; /* could be Oper or Func */
+ List *args; /* list of argument nodes */
+} Expr;
+
+OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
+ List, following your suggestion)
+
+OP_ALL, OP_ANY:
+
+oper is a List of Oper nodes. We need a list because the data types of
+a, b, c (above) can differ, and so the Oper nodes will differ too.
+
+lfirst(args) is a List of expression nodes (Const, Var, Func ?, a + b ?) -
+the left side of the subquery's operator.
+lsecond(args) is SubSelect.
+
+Note that there are no OP_IN, OP_NOTIN in the OpTypes for Expr. We need
+IN, NOTIN in A_Expr (parser node), but both of them have to be transformed
+by the parser into the corresponding ANY and ALL. At the moment we can do:
+
+IN --> = ANY, NOT IN --> <> ALL
+
+but this will be a "known bug": it breaks the OO nature of Postgres, because
+operators can be overridden and '=' can mean s o m e t h i n g (not equality).
+Example: box data type. For boxes, = means equality of _areas_ and =~
+means that boxes are the same ==> =~ ANY should be used for IN.
+
+> right side is an index to a slot in the subqueries List.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
+ for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
+ Fri, 9 Jan 1998 17:31:41 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
+Subject: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > Vadim, I know you are still thinking about subselects, but I have some
+> > more clarification that may help.
+> >
+> > We have to add phantom range table entries to correlated subselects so
+> > they will pass the parser. We might as well add those fields to the
+> > target list of the subquery at the same time:
+> >
+> > select *
+> > from taba
+> > where col1 = (select col2
+> > from tabb
+> > where taba.col3 = tabb.col4)
+> >
+> > becomes:
+> >
+> > select *
+> > from taba
+> > where col1 = (select col2, tabb.col4 <---
+> > from tabb, taba <---
+> > where taba.col3 = tabb.col4)
+> >
+> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
+> > was entered as a correlation entry:
+> >
+> > bool isCorrelated;
+>
+> No, I don't like to add anything in parser. Example:
+>
+> select *
+> from tabA
+> where col1 = (select col2
+> from tabB
+> where tabA.col3 = tabB.col4
+> and exists (select *
+> from tabC
+> where tabB.colX = tabC.colX and
+> tabC.colY = tabA.col2)
+> )
+>
+> : a column of tabA is referenced in sub-subselect
+
+This is a strange case that I don't think we need to handle in our first
+implementation.
+
+> (is it allowable by standards ?) - in this case it's better
+> to don't add tabA to 1st subselect but add tabA to second one
+> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> (And I'm still not sure that using temp tables is best of what can be
+> done in all cases...)
+
+I don't see any use for temp tables in subselects anymore. After having
+implemented UNIONS, I now see how much can be done in the upper
+optimizer. I see you just putting the subquery PLAN into the proper
+place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+
+>
+> Instead of using isCorrelated in TE & RTE we can add
+>
+> Index varlevel;
+
+OK. Sounds good.
+
+>
+> to Var node to reflect (sub)query from where this Var is come
+> (where is range table to find var's relation using varno). Upmost query
+> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
+> ^^^ ^^^^^^^^^^^^
+> (I don't see problems with distinguishing Vars of different children
+> on the same level...)
+>
+> >
+> > Second, we need to hook the subselect to the main query. I recommend we
+> > add two fields to Query for this:
+> >
+> > Query *parentQuery;
+> > List *subqueries;
+>
+> Agreed. And maybe Index queryLevel.
+
+Sure. If it helps.
+
+>
+> > In the parent query, to parse the WHERE clause, we create a new operator
+> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> ^^^^^^^^^^^^^^^^^^
+> No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> '_a_constant_' OP (select ...) - I don't know is last in standards,
+> Sybase has this.
+
+I have never seen this in my eight years of SQL. Perhaps we can leave
+this for later, maybe much later.
+
+>
+> Well,
+>
+> typedef enum OpType
+> {
+> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
+>
+> + OP_EXISTS, OP_ALL, OP_ANY
+>
+> } OpType;
+>
+> typedef struct Expr
+> {
+> NodeTag type;
+> Oid typeOid; /* oid of the type of this expr */
+> OpType opType; /* type of the op */
+> Node *oper; /* could be Oper or Func */
+> List *args; /* list of argument nodes */
+> } Expr;
+>
+> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
+> List, following your suggestion)
+>
+> OP_ALL, OP_ANY:
+>
+> oper is List of Oper nodes. We need in list because of data types of
+> a, b, c (above) can be different and so Oper nodes will be different too.
+>
+> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
+> left side of subquery' operator.
+> lsecond(args) is SubSelect.
+>
+> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> by parser into corresponding ANY and ALL. At the moment we can do:
+>
+> IN --> = ANY, NOT IN --> <> ALL
+>
+> but this will be "known bug": this breaks OO-nature of Postgres, because of
+> operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> Example: box data type. For boxes, = means equality of _areas_ and =~
+> means that boxes are the same ==> =~ ANY should be used for IN.
+
+That is interesting, to use =~ for ANY.
+
+Yes, but how many operators take a SUBQUERY as an operand? This is a
+special case to me.
+
+I think I see where you are trying to go. You want subselects to behave
+like any other operator, with a subselect type, and you do all the
+subselect handling in the optimizer, with special Nodes and actions.
+
+I think this may be just too much of a leap. We have such clean query
+logic for single queries, I can't imagine having an operator that has a
+Query operand, and trying to get everything to properly handle it.
+UNIONS were very easy to implement as a List off of Query, with some
+foreach()'s in rewrite and the high optimizer.
+
+Subselects are SQL standard, and are never going to be over-ridden by a
+user. Same with UNION. They want UNION, they get UNION. They want
+Subselect, we are going to spin through the Query structure and give
+them what they want.
+
+The complexities of subselects and correlated queries and range tables
+and stuff are so bizarre that trying to get it to work inside the type
+system could be a huge project.
+
+>
+> > right side is an index to a slot in the subqueries List.
+
+I guess the question is what can we have by February 1?
+
+I have been reading some postings, and it seems to me that subselects
+are the litmus test for many evaluators when deciding if a database
+engine is full-featured.
+
+Sorry to be so straightforward, but I want to keep hashing this around
+until we get a conclusion, so coding can start.
+
+My suggestions have been, I believe, trying to get subselects working
+with the fullest functionality by adding the least amount of code, and
+keeping the logic clean.
+
+Have you checked out the UNION code? It is very small, but it works. I
+think it could make a good sample for subselects.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
+ Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:19:08 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Subject: Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > No, I don't like to add anything in parser. Example:
+> >
+> > select *
+> > from tabA
+> > where col1 = (select col2
+> > from tabB
+> > where tabA.col3 = tabB.col4
+> > and exists (select *
+> > from tabC
+> > where tabB.colX = tabC.colX and
+> > tabC.colY = tabA.col2)
+> > )
+> >
+> > : a column of tabA is referenced in sub-subselect
+>
+> This is a strange case that I don't think we need to handle in our first
+> implementation.
+
+I don't know whether this is a strange case or not :)
+But I would like to know whether this is allowed by the standard - can someone
+comment on this?
+And I don't see any problems with handling this...
+
+>
+> > (is it allowable by standards ?) - in this case it's better
+> > to don't add tabA to 1st subselect but add tabA to second one
+> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> > (And I'm still not sure that using temp tables is best of what can be
+> > done in all cases...)
+>
+> I don't see any use for temp tables in subselects anymore. After having
+> implemented UNIONS, I now see how much can be done in the upper
+> optimizer. I see you just putting the subquery PLAN into the proper
+> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+
+By temp tables I meant the tables created by a Material node
+for the subquery plan. This is one of two ways: run the subquery once for all
+possible upper plan tuples and then just join the result table with the upper
+query. The other way is to re-run the subquery for each upper query tuple,
+without a temp table but maybe with caching of results in some way.
+Actually, there is a special case - when the subquery can alternatively be
+formulated as joins - but this is just a special case.
+
+> > > In the parent query, to parse the WHERE clause, we create a new operator
+> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> > ^^^^^^^^^^^^^^^^^^
+> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> > '_a_constant_' OP (select ...) - I don't know is last in standards,
+> > Sybase has this.
+>
+> I have never seen this in my eight years of SQL. Perhaps we can leave
+> this for later, maybe much later.
+
+Are you talking about (a, b, c) or about 'a_constant'?
+Again, can someone comment on whether they are in the standard or not?
+Tom?
+If yes, then please add parser support for them now...
+
+> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> > by parser into corresponding ANY and ALL. At the moment we can do:
+> >
+> > IN --> = ANY, NOT IN --> <> ALL
+> >
+> > but this will be "known bug": this breaks OO-nature of Postgres, because of
+> > operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> > Example: box data type. For boxes, = means equality of _areas_ and =~
+> > means that boxes are the same ==> =~ ANY should be used for IN.
+>
+> That is interesting, to use =~ for ANY.
+>
+> Yes, but how many operators take a SUBQUERY as an operand. This is a
+> special case to me.
+>
+> I think I see where you are trying to go. You want subselects to behave
+> like any other operator, with a subselect type, and you do all the
+> subselect handling in the optimizer, with special Nodes and actions.
+>
+> I think this may be just too much of a leap. We have such clean query
+> logic for single queries, I can't imagine having an operator that has a
+> Query operand, and trying to get everything to properly handle it.
+> UNIONS were very easy to implement as a List off of Query, with some
+> foreach()'s in rewrite and the high optimizer.
+>
+> Subselects are SQL standard, and are never going to be over-ridden by a
+> user. Same with UNION. They want UNION, they get UNION. They want
+> Subselect, we are going to spin through the Query structure and give
+> them what they want.
+>
+> The complexities of subselects and correlated queries and range tables
+> and stuff is so bizarre that trying to get it to work inside the type
+> system could be a huge project.
+
+PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
+derived from the Berkeley Postgres database management system. While
+PostgreSQL retains the powerful object-relational data model, rich data types and
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+easy extensibility of Postgres, it replaces the PostQuel query language with an
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+extended subset of SQL.
+^^^^^^^^^^^^^^^^^^^^^^
+
+Should we tell users that subselects will work for standard data types only?
+I don't see why a subquery can't be used with the ~, ~*, @@, ... operators, do you?
+Is there a difference between handling = ANY and ~ ANY? I don't see any.
+Currently we can't get IN working properly for boxes (and maybe for others too)
+and I don't want to try to resolve these problems now, but I hope that someday
+we'll be able to do this. At the moment - just convert IN into = ANY and
+NOT IN into <> ALL in the parser.
+
+(BTW, do you know how DISTINCT is implemented? It doesn't use = but
+uses the type_out funcs and strcmp()... DISTINCT is a standard SQL thing...)
+
+> >
+> > > right side is an index to a slot in the subqueries List.
+>
+> I guess the question is what can we have by February 1?
+>
+> I have been reading some postings, and it seems to me that subselects
+> are the litmus test for many evaluators when deciding if a database
+> engine is full-featured.
+>
+> Sorry to be so straightforward, but I want to keep hashing this around
+> until we get a conclusion, so coding can start.
+>
+> My suggestions have been, I believe, trying to get subselects working
+> with the fullest functionality by adding the least amount of code, and
+> keeping the logic clean.
+>
+> Have you checked out the UNION code? It is very small, but it works. I
+> think it could make a good sample for subselects.
+
+There is a big difference between subqueries and queries in a UNION -
+there are no dependencies between UNION queries.
+
+OK, open issues:
+
+1. Is using the upper query's vars at all subquery levels in the standard?
+2. Is (a, b, c) OP (subselect) in the standard?
+3. What types of expressions (Var, Const, ...) are allowed on the left
+ side of an operator with a subquery on the right?
+4. What types of operators should we support (=, >, ..., like, ~, ...)?
+ (My vote is for all boolean operators).
+
+And - did we reach consensus on the representation of subqueries in Query,
+Expr and Var?
+I would like to have something done in the parser around Jan 17 to get
+subqueries working by Feb 1. I vote for support of all standard
+things (1. - 3.) in the parser right now - if there is no time
+to implement something like (a, b, c) then the optimizer will call
+elog(WARN) (oh, sorry, - elog(ERROR)).
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
+ Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:41:19 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199712220545.AAA11605@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, a few questions:
+>
+> Should we use sortmerge, so we can use our psort as temp tables,
+> or do we use hashunique?
+>
+> How do we pass the query to the optimizer? How do we represent
+> the range table for each, and the links between them in correlated
+> subqueries?
+
+My suggestion is to just use varlevel in Var and not put the upper query's
+relations into the subquery's range table.
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
+ Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
+Date: Sun, 11 Jan 1998 00:58:52 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects
+References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Vadim B. Mikheev wrote:
+>
+> Bruce Momjian wrote:
+> >
+> > OK, a few questions:
+> >
+> > Should we use sortmerge, so we can use our psort as temp tables,
+> > or do we use hashunique?
+> >
+> > How do we pass the query to the optimizer? How do we represent
+> > the range table for each, and the links between them in correlated
+> > subqueries?
+>
+> My suggestion is just use varlevel in Var and don't put upper query'
+> relations into subquery range table.
+
+Hmm... Sorry, it seems that I replied to a very old message - forget it.
+
+Vadim
+
+From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
+ Sat, 10 Jan 1998 18:01:03 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
+Date: Sat, 10 Jan 1998 18:01:03 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
+> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
+> > > by parser into corresponding ANY and ALL. At the moment we can do:
+> > >
+> > > IN --> = ANY, NOT IN --> <> ALL
+> > >
+> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
+> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality).
+> > > Example: box data type. For boxes, = means equality of _areas_ and =~
+> > > means that boxes are the same ==> =~ ANY should be used for IN.
+> >
+> > That is interesting, to use =~ for ANY.
+
+If I understand the discussion, I would think it is fine to make an assumption about
+which operator is used to implement a subselect expression. If someone remaps an
+operator to mean something different, then they will get a different result (or a
+nonsensical one) from a subselect.
+
+I'd be happy to remap existing operators to fit into a convention which would work
+with subselects (especially if I got to help choose :).
+
+> > Subselects are SQL standard, and are never going to be over-ridden by a
+> > user. Same with UNION. They want UNION, they get UNION. They want
+> > Subselect, we are going to spin through the Query structure and give
+> > them what they want.
+>
+> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
+> derived from the Berkeley Postgres database management system. While
+> PostgreSQL retains the powerful object-relational data model, rich data types and
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> easy extensibility of Postgres, it replaces the PostQuel query language with an
+> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> extended subset of SQL.
+> ^^^^^^^^^^^^^^^^^^^^^^
+>
+> Should we say users that subselect will work for standard data types only ?
+> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
+> Is there difference between handling = ANY and ~ ANY ? I don't see any.
+> Currently we can't get IN working properly for boxes (and may be for others too)
+> and I don't like to try to resolve these problems now, but hope that someday
+> we'll be able to do this. At the moment - just convert IN into = ANY and
+> NOT IN into <> ALL in parser.
+>
+> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
+> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
+
+?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
+list? That would give more consistent behavior...
+
+> > I have been reading some postings, and it seems to me that subselects
+> > are the litmus test for many evaluators when deciding if a database
+> > engine is full-featured.
+> >
+> > Sorry to be so straightforward, but I want to keep hashing this around
+> > until we get a conclusion, so coding can start.
+> >
+> > My suggestions have been, I believe, trying to get subselects working
+> > with the fullest functionality by adding the least amount of code, and
+> > keeping the logic clean.
+> >
+> > Have you checked out the UNION code? It is very small, but it works. I
+> > think it could make a good sample for subselects.
+>
+> There is big difference between subqueries and queries in UNION -
+> there are not dependences between UNION queries.
+>
+> Ok, opened issues:
+>
+> 1. Is using upper query' vars in all subquery levels in standard ?
+
+I'm not certain. Let me know if you do not get an answer from someone else and I will
+research it.
+
+> 2. Is (a, b, c) OP (subselect) in standard ?
+
+Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
+the parens are allowed to be omitted from a one element list.
+
+> 3. What types of expressions (Var, Const, ...) are allowed on the left
+> side of operator with subquery on the right ?
+
+I think most expressions are allowed. The "constant OP (subselect)" case you were
+asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
+a and b are column references should be allowed. Of course, our optimizer could
+perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
+example "EXISTS (subselect where x = constant)".
+
+> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
+> (My vote for all boolean operators).
+
+Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
+important to get an initial implementation for v6.3 which covers a little, some, or
+all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
+we will have the benefit of feedback from others in practical applications which
+always uncovers new things to consider.
+
+> And - did we get consensus on presentation subqueries stuff in Query,
+> Expr and Var ?
+> I would like to have something done in parser near Jan 17 to get
+> subqueries working by Feb 1. I vote for support of all standard
+> things (1. - 3.) in parser right now - if there will be no time
+> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
+> sorry, - elog(ERROR)).
+
+Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
+does the right thing with expression comparisons but just parses then ignores
+subselect expressions. Let me know what structures you want passed back and I'll put
+them in, or if you prefer put in the first one and I'll go through and clean up and
+add the rest.
+
+ - Tom
+
+
+From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
+ for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
+ Sat, 10 Jan 1998 19:31:30 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
+Date: Sat, 10 Jan 1998 19:31:29 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ?
+> Tom ?
+> If yes then please add parser' support for them now...
+
+As I mentioned a few minutes ago in my last message, I parse the row descriptors and
+the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
+ignore the result. I didn't want to pass things back as lists until something in the
+backend was ready to receive them.
+
+If it is OK, I'll go ahead and start passing back a list of expressions when a row
+descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
+being a list rather than an atomic node.
+
+Also, I can start passing back the subselect expression as the rexpr; right now the
+parser calls elog() and quits.
+
+btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
+makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
+If lists are handled farther back, this routine should move to there also and the
+parser will just pass the lists. Note that some assumptions have to be made about the
+meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
+"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
+to disallow those cases or to look for specific appearance of the operator to guess
+the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
+it has "<>" or "!" then build as "or"s).
+
+Let me know what you want...
+
+ - Tom
+
+
+From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
+Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
+ for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
+Received: from alumni.caltech.edu (localhost [127.0.0.1])
+ by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
+ Sun, 11 Jan 1998 05:58:01 GMT
+Sender: tgl@gnet04.jpl.nasa.gov
+Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
+Date: Sun, 11 Jan 1998 05:58:01 +0000
+From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+Organization: Caltech/JPL
+X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
+MIME-Version: 1.0
+To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
+Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
+Status: OR
+
+This is a multi-part message in MIME format.
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+
+Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
+These start sending lists of arguments toward the backend from the parser to
+implement row descriptors and subselects.
+
+They should apply OK even over Bruce's recent changes...
+
+ - Tom
+
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline; filename="gram.y.patch"
+
+*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998
+--- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998
+***************
+*** 195,200 ****
+--- 195,201 ----
+ having_clause
+ %type <list> row_descriptor, row_list
+ %type <node> row_expr
++ %type <str> RowOp, row_opt
+ %type <list> OptCreateAs, CreateAsList
+ %type <node> CreateAsElement
+ %type <value> NumConst
+***************
+*** 242,248 ****
+ */
+
+ /* Keywords (in SQL92 reserved words) */
+! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC,
+ BEGIN_TRANS, BETWEEN, BOTH, BY,
+ CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT,
+ CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME,
+--- 243,249 ----
+ */
+
+ /* Keywords (in SQL92 reserved words) */
+! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
+ BEGIN_TRANS, BETWEEN, BOTH, BY,
+ CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT,
+ CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME,
+***************
+*** 258,264 ****
+ ON, OPTION, OR, ORDER, OUTER_P,
+ PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
+ REFERENCES, REVOKE, RIGHT, ROLLBACK,
+! SECOND_P, SELECT, SET, SUBSTRING,
+ TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
+ UNION, UNIQUE, UPDATE, USING,
+ VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
+--- 259,265 ----
+ ON, OPTION, OR, ORDER, OUTER_P,
+ PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
+ REFERENCES, REVOKE, RIGHT, ROLLBACK,
+! SECOND_P, SELECT, SET, SOME, SUBSTRING,
+ TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
+ UNION, UNIQUE, UPDATE, USING,
+ VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
+***************
+*** 2853,2866 ****
+ /* Expressions using row descriptors
+ * Define row_descriptor to allow yacc to break the reduce/reduce conflict
+ * with singleton expressions.
+ */
+ row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
+ {
+! $$ = NULL;
+ }
+ | '(' row_descriptor ')' NOT IN '(' SubSelect ')'
+ {
+! $$ = NULL;
+ }
+ | '(' row_descriptor ')' '=' '(' row_descriptor ')'
+ {
+--- 2854,2878 ----
+ /* Expressions using row descriptors
+ * Define row_descriptor to allow yacc to break the reduce/reduce conflict
+ * with singleton expressions.
++ *
++ * Note that "SOME" is the same as "ANY" in syntax.
++ * - thomas 1998-01-10
+ */
+ row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
+ {
+! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
+ }
+ | '(' row_descriptor ')' NOT IN '(' SubSelect ')'
+ {
+! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7);
+! }
+! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
+! {
+! char *opr;
+! opr = palloc(strlen($4)+strlen($5)+1);
+! strcpy(opr, $4);
+! strcat(opr, $5);
+! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
+ }
+ | '(' row_descriptor ')' '=' '(' row_descriptor ')'
+ {
+***************
+*** 2880,2885 ****
+--- 2892,2907 ----
+ }
+ ;
+
++ RowOp: '=' { $$ = "="; }
++ | '<' { $$ = "<"; }
++ | '>' { $$ = ">"; }
++ ;
++
++ row_opt: ALL { $$ = "all"; }
++ | ANY { $$ = "any"; }
++ | SOME { $$ = "any"; }
++ ;
++
+ row_descriptor: row_list ',' a_expr
+ {
+ $$ = lappend($1, $3);
+***************
+*** 3432,3441 ****
+ ;
+
+ in_expr: SubSelect
+! {
+! elog(ERROR,"IN (SUBSELECT) not yet implemented");
+! $$ = $1;
+! }
+ | in_expr_nodes
+ { $$ = $1; }
+ ;
+--- 3454,3460 ----
+ ;
+
+ in_expr: SubSelect
+! { $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
+ | in_expr_nodes
+ { $$ = $1; }
+ ;
+***************
+*** 3449,3458 ****
+ ;
+
+ not_in_expr: SubSelect
+! {
+! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
+! $$ = $1;
+! }
+ | not_in_expr_nodes
+ { $$ = $1; }
+ ;
+--- 3468,3474 ----
+ ;
+
+ not_in_expr: SubSelect
+! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
+ | not_in_expr_nodes
+ { $$ = $1; }
+ ;
+
+--------------D8B38A0D1F78A10C0023F702
+Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline; filename="keywords.c.patch"
+
+*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998
+--- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998
+***************
+*** 39,44 ****
+--- 39,45 ----
+ {"alter", ALTER},
+ {"analyze", ANALYZE},
+ {"and", AND},
++ {"any", ANY},
+ {"append", APPEND},
+ {"archive", ARCHIVE},
+ {"as", AS},
+***************
+*** 178,183 ****
+--- 179,185 ----
+ {"set", SET},
+ {"setof", SETOF},
+ {"show", SHOW},
++ {"some", SOME},
+ {"stdin", STDIN},
+ {"stdout", STDOUT},
+ {"substring", SUBSTRING},
+
+--------------D8B38A0D1F78A10C0023F702--
+
+
+From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
+ for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
+ Sun, 11 Jan 1998 00:59:23 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
+Subject: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
+Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
+In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> I would like to have something done in parser near Jan 17 to get
+> subqueries working by Feb 1. I vote for support of all standard
+> things (1. - 3.) in parser right now - if there will be no time
+> to implement something like (a, b, c) then optimizer will call
+> elog(WARN) (oh, sorry, - elog(ERROR)).
+
+First, let me say I am glad we are still on schedule for Feb 1. I was
+panicking because I thought we wouldn't make it in time.
+
+
+> > > (is it allowable by standards ?) - in this case it's better
+> > > to don't add tabA to 1st subselect but add tabA to second one
+> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
+> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
+> > > (And I'm still not sure that using temp tables is best of what can be
+> > > done in all cases...)
+> >
+> > I don't see any use for temp tables in subselects anymore. After having
+> > implemented UNIONS, I now see how much can be done in the upper
+> > optimizer. I see you just putting the subquery PLAN into the proper
+> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
+>
+> When speaking about temp tables, I meant tables created by a Material node
+> for the subquery plan. This is one of two ways: run the subquery once for all
+> possible upper plan tuples and then just join the result table with the upper
+> query. The other way is to re-run the subquery for each upper query tuple,
+> without a temp table but maybe with caching of results in some way.
+> Actually, there is a special case, when the subquery can be alternatively
+> formulated as joins, but this is just a special case.
+
+This is interesting. It really only applies to correlated subqueries,
+and it may sometimes help to evaluate the subquery only for the values
+that will actually come from the upper query, rather than for all
+possible values. Perhaps we can use the 'cost' value of each query to
+decide how to handle this.
+
+>
+> > > > In the parent query, to parse the WHERE clause, we create a new operator
+> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
+> > > ^^^^^^^^^^^^^^^^^^
+> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
+> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
+> > > Sybase has this.
+> >
+> > I have never seen this in my eight years of SQL. Perhaps we can leave
+> > this for later, maybe much later.
+>
+> Are you saying about (a, b, c) or about 'a_constant' ?
+> Again, can someone comment on are they in standards or not ?
+> Tom ?
+> If yes then please add parser' support for them now...
+
+OK, Thomas says it is, so we will put in as much code as we can to handle
+it.
+
+> Should we say users that subselect will work for standard data types only ?
+> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
+> Is there difference between handling = ANY and ~ ANY ? I don't see any.
+> Currently we can't get IN working properly for boxes (and may be for others too)
+> and I don't like to try to resolve these problems now, but hope that someday
+> we'll be able to do this. At the moment - just convert IN into = ANY and
+> NOT IN into <> ALL in parser.
+
+OK.
+
+>
+> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
+> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
+
+I did not know that either.
+
+> There is big difference between subqueries and queries in UNION -
+> there are not dependences between UNION queries.
+
+Yes, I know UNIONS are trivial compared to subselects.
+
+>
+> Ok, opened issues:
+>
+> 1. Is using upper query' vars in all subquery levels in standard ?
+> 2. Is (a, b, c) OP (subselect) in standard ?
+> 3. What types of expressions (Var, Const, ...) are allowed on the left
+> side of operator with subquery on the right ?
+> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
+> (My vote for all boolean operators).
+>
+> And - did we get consensus on presentation subqueries stuff in Query,
+> Expr and Var ?
+
+OK, here are my concrete ideas on changes and structures.
+
+I think we all agreed that Query needs new fields:
+
+ Query *parentQuery;
+ List *subqueries;
+
+Maybe query level too, but I don't think so (see later ideas on Var).
+
+We need a new Node structure, call it Sublink:
+
+ int linkType; /* IN, NOTIN, ANY, EXISTS, OPERATOR... */
+ Oid operator; /* subquery must return single row */
+ List *lefthand; /* parent stuff */
+ Node *subquery; /* represents nodes from parser */
+ Index Subindex; /* filled in to index Query->subqueries */
+
+Of course, the names are just suggestions. Every time we run through
+the parsenodes of a query to create a Query* structure, when we do the
+WHERE clause, if we come upon one of these Sublink nodes (created in the
+parser), we move the supplied Query* in Sublink->subquery to a local
+List variable, and we set Sublink->Subindex to the position of the new
+query in that list: 1 for the first subquery we found, 2 for the
+second, etc.
+
+After we have created the parent Query structure, we run through our
+local List variable of subquery parsenodes we created above, and add
+Query* entries to Query->subqueries. In each subquery Query*, we set
+the parentQuery pointer.
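+
+The two passes described above might look roughly like this. It is a toy
+model, not backend code: Query and Sublink are cut down to the fields under
+discussion, a fixed array stands in for List, the index field is spelled
+subindex, and attach_subqueries() is a hypothetical name.
+
```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8

typedef struct Query Query;

/* Toy stand-in for the proposed Sublink node. */
typedef struct Sublink
{
    Query  *subquery;   /* subquery as produced by the parser */
    int     subindex;   /* 1-based position in parent->subqueries */
} Sublink;

struct Query
{
    Query  *parentQuery;
    Query  *subqueries[MAX_SUBQUERIES];
    int     nsubqueries;
};

/*
 * Number the Sublinks found in a WHERE clause (here simply an array) in
 * discovery order, then attach each subquery to the parent and set its
 * parentQuery pointer.  Returns the number of subqueries attached, or -1
 * if there are more than MAX_SUBQUERIES.
 */
static int
attach_subqueries(Query *parent, Sublink *links, int nlinks)
{
    Query  *found[MAX_SUBQUERIES];  /* the "local List variable" */
    int     nfound = 0;
    int     i;

    assert(parent != NULL);
    if (nlinks > MAX_SUBQUERIES)
        return -1;

    /* Pass 1: done while transforming the WHERE clause. */
    for (i = 0; i < nlinks; i++)
    {
        found[nfound] = links[i].subquery;
        links[i].subquery = NULL;       /* moved to the local list */
        links[i].subindex = ++nfound;   /* 1 for the first, 2 for the second */
    }

    /* Pass 2: done after the parent Query has been built. */
    for (i = 0; i < nfound; i++)
    {
        found[i]->parentQuery = parent;
        parent->subqueries[i] = found[i];
    }
    parent->nsubqueries = nfound;
    return nfound;
}
```
+
+After the call, each Sublink carries only its index, and the subquery
+itself is reachable from the parent as subqueries[subindex - 1].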
+
+Also, when parsing the subqueries, we need to keep track of correlated
+references. I recommend we add a field to the Var structure:
+
+ Index sublevel; /* range table reference:
+ = 0 current level of query
+ < 0 parent above this many levels
+ > 0 index into subquery list
+ */
+
+This way, a Var node with sublevel 0 refers to the current level, which
+is true in most cases. This helps us not have to change much code. sublevel =
+-1 means it references the range table in the parent query. sublevel =
+-2 means the parent's parent. sublevel = 2 means it references the range
+table of the second entry in Query->subqueries. Varno and varattno are
+still meaningful. Of course, we can't reference variables in the
+subqueries from the parent in the parser code, but Vadim may want to.
+
+When doing a Var lookup in the parser, we look in the current level
+first, but if not found, if it is a subquery, we can look at the parent
+and parent's parent to set the sublevel, varno, and varattno properly.
+
+We create no phantom range table entries in the subquery, and no phantom
+target list entries. We can leave that all for the upper optimizer.
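+
+A rough sketch of the Var lookup described above, under simplifying
+assumptions: each query level has a single flat list of column names (so
+varno is omitted), QueryLevel, VarRef and lookup_var() are hypothetical
+names, and sublevel is a plain signed int because it has to hold negative
+values.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy query level: just the visible column names and a parent link. */
typedef struct QueryLevel
{
    struct QueryLevel  *parent;     /* NULL at the top level */
    const char *const  *cols;       /* column names visible at this level */
    int                 ncols;
} QueryLevel;

typedef struct VarRef
{
    int     sublevel;   /* 0 = current level, -1 = parent, -2 = grandparent */
    int     varattno;   /* 1-based column number at that level */
} VarRef;

/*
 * Look for "name" in the current level first, then in each enclosing
 * level.  Returns 0 and fills *out on success, -1 if no level has it.
 */
static int
lookup_var(const QueryLevel *q, const char *name, VarRef *out)
{
    int     depth = 0;

    assert(name != NULL && out != NULL);

    for (; q != NULL; q = q->parent, depth++)
    {
        int     i;

        for (i = 0; i < q->ncols; i++)
        {
            if (strcmp(q->cols[i], name) == 0)
            {
                out->sublevel = -depth;
                out->varattno = i + 1;
                return 0;
            }
        }
    }
    return -1;
}
```
+
+A column found at the subquery's own level gets sublevel 0; one found only
+in the parent gets sublevel -1, which marks it as a correlated reference.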
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Nov 28 16:34:03 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA17454
+ for <maillist@candle.pha.pa.us>; Fri, 28 Nov 1997 16:33:59 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA10553; Fri, 28 Nov 1997 16:20:03 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 28 Nov 1997 16:17:50 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA10116 for pgsql-hackers-outgoing; Fri, 28 Nov 1997 16:17:45 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA09997 for <hackers@postgreSQL.org>; Fri, 28 Nov 1997 16:17:26 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id QAA17309
+ for hackers@postgreSQL.org; Fri, 28 Nov 1997 16:18:08 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199711282118.QAA17309@candle.pha.pa.us>
+Subject: [HACKERS] querytrees and multiple statements
+To: hackers@postgreSQL.org (PostgreSQL-development)
+Date: Fri, 28 Nov 1997 16:18:08 -0500 (EST)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Currently, if a query string arrives that has multiple sql statements in
+it, the parser breaks it down into separate queries, analyzes each one,
+then executes them in order. (psql automatically breaks things down
+into separate queries, so this will not work there.) The problem is
+that if the first query creates a table, and the second query goes to
+access it, the parser analysis fails because the table is not yet
+created. See the attached pginterface source for an example. The real
+problem is that all the queries in the string are analyzed first, then
+executed, rather than having one analyzed then executed, then the next.
+
+I am going to have trouble with subselects and temp tables. I want to
+pull out the subselect, change it into a SELECT ... INTO TEMP, and add it to
+the QueryTree before the outer select. But when the outer select is analyzed
+by the parser, the temp table doesn't exist yet, and that will cause an
+error.
+
+Currently postgres.c does each step on all queries before moving to the
+next step. Does anyone know what the ramifications would be if I
+changed this to do the full set of operations on each statement first
+before moving to the next?
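+
+The difference between the two orderings can be shown with a toy model.
+Nothing here is postgres.c code: Stmt, Catalog, run_batch() and
+run_one_at_a_time() are hypothetical names, and "analysis" is reduced to
+checking that a SELECT's table already exists.
+
```c
#include <assert.h>
#include <string.h>

#define MAX_TABLES 8

typedef enum { STMT_CREATE, STMT_SELECT } StmtKind;

typedef struct Stmt
{
    StmtKind    kind;
    const char *table;
} Stmt;

typedef struct Catalog
{
    const char *tables[MAX_TABLES];
    int         ntables;
} Catalog;

static int
table_exists(const Catalog *c, const char *name)
{
    int     i;

    for (i = 0; i < c->ntables; i++)
        if (strcmp(c->tables[i], name) == 0)
            return 1;
    return 0;
}

/* Analysis fails only when a SELECT names a table not yet in the catalog. */
static int
analyze(const Catalog *c, const Stmt *s)
{
    assert(c != NULL && s != NULL);
    return s->kind == STMT_CREATE || table_exists(c, s->table);
}

static void
execute(Catalog *c, const Stmt *s)
{
    if (s->kind == STMT_CREATE && c->ntables < MAX_TABLES)
        c->tables[c->ntables++] = s->table;
}

/* Analyze every statement, then execute them all.
 * Returns the index of the first statement that fails analysis, or -1. */
static int
run_batch(Catalog *c, const Stmt *stmts, int n)
{
    int     i;

    for (i = 0; i < n; i++)
        if (!analyze(c, &stmts[i]))
            return i;
    for (i = 0; i < n; i++)
        execute(c, &stmts[i]);
    return -1;
}

/* Analyze and execute each statement before moving to the next. */
static int
run_one_at_a_time(Catalog *c, const Stmt *stmts, int n)
{
    int     i;

    for (i = 0; i < n; i++)
    {
        if (!analyze(c, &stmts[i]))
            return i;
        execute(c, &stmts[i]);
    }
    return -1;
}
```
+
+For "create table test; select from test", run_batch() stops at the SELECT
+(statement index 1) with nothing executed, while run_one_at_a_time()
+completes because the table exists by the time the SELECT is analyzed.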
+
+---------------------------------------------------------------------------
+
+
+/*
+ * pgnulltest.c
+ *
+*/
+
+#include <stdio.h>
+#include <signal.h>
+#include <time.h>
+#include <halt.h>
+#include <postgres.h>
+#include <libpq-fe.h>
+#include <pginterface.h>
+
+int main(int argc, char **argv)
+{
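+ /* Both statements are analyzed before either is executed, so the
+ * SELECT fails: table "test" does not exist yet at analysis time. */
+ char query[4000];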
+
+ if (argc != 2)
+ halt("Usage: %s database\n",argv[0]);
+
+ connectdb(argv[1],NULL,NULL,NULL,NULL);
+
+ sprintf(query,"create table test(x int); select x from test;");
+ doquery(query);
+
+ disconnectdb();
+ return 0;
+}
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Sat Nov 29 05:01:01 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA27942
+ for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 05:00:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA13666 for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 04:35:08 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA17107; Sat, 29 Nov 1997 16:38:58 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <347FE2B1.167EB0E7@sable.krasnoyarsk.su>
+Date: Sat, 29 Nov 1997 16:38:57 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] querytrees and multiple statements
+References: <199711282118.QAA17309@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> Currently, if a query string arrives that has multiple sql statements in
+> it, the parser breaks it down into separate queries, analyzes each one,
+> then executes them in order. (psql automatically breaks things down
+> into separate queries, so this will not work there.) The problem is
+> that if the first query creates a table, and the second query goes to
+> access it, the parser analysis fails because the table is not yet
+> created. See the attached pginterface source for an example. The real
+> problem is that all the queries in the string are analyzed first, then
+> executed, rather than having one analyzed then executed, then the next.
+>
+> I am going to have trouble with subselects and temp tables. I want to
+> pull out the subselect, change it into a SELECT ... INTO TEMP, and add it to
+> the QueryTree before the outer select. But when the outer select is analyzed
+> by the parser, the temp table doesn't exist yet, and that will cause an
+> error.
+>
+> Currently postgres.c does each step on all queries before moving to the
+> next step. Does anyone know what the ramifications would be if I
+> changed this to do the full set of operations on each statement first
+> before moving to the next?
+
+This will break the ability to prepare a plan (parser + optimizer) for later
+execution. This ability is used by RULEs (and so - by VIEWs) and will be
+used by PL(s)...
+
+Please, take a look at nodeMaterial.c:
+
+/*-------------------------------------------------------------------------
+ *
+ * nodeMaterial.c--
+ * Routines to handle materialization nodes.
+...
+/*
+ * INTERFACE ROUTINES
+ * ExecMaterial - generate a temporary relation
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+(I'm still very busy. Hope to return soon.)
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sun Nov 30 02:30:56 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA15439
+ for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:30:55 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id CAA17743 for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:27:40 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id OAA18937; Sun, 30 Nov 1997 14:32:14 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <3481167E.2781E494@sable.krasnoyarsk.su>
+Date: Sun, 30 Nov 1997 14:32:14 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] querytrees and multiple statements
+References: <199711291854.NAA05185@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > This will break the ability to prepare a plan (parser + optimizer) for later
+> > execution. This ability is used by RULEs (and so - by VIEWs) and will be
+> > used by PL(s)...
+> >
+> > Please, take a look at nodeMaterial.c:
+> >
+> > /*-------------------------------------------------------------------------
+> > *
+> > * nodeMaterial.c--
+> > * Routines to handle materialization nodes.
+> > ...
+> > /*
+> > * INTERFACE ROUTINES
+> > * ExecMaterial - generate a temporary relation
+> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+>
+> I understand what you are saying here. The temp table has transaction
+> scope, and breaking each query into multiple commands, each with its own
+> transaction scope will cause the temp table to go away.
+
+No. I just said that there will be no ability to prepare queries with
+subselects for later execution: there will be no way to get an execution plan which
+could be passed to executor to get results without additional parser/planner
+invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
+(==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
+in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
+
+Ability to have execution plans seems important to me. Other DBMS-es use
+this for stored procedures and views.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Dec 1 01:30:57 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA10903
+ for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:30:55 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26262 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:21:28 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA05263; Mon, 1 Dec 1997 01:02:12 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:00:12 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA03357 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:00:07 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA03290 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 00:59:45 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA10395;
+ Mon, 1 Dec 1997 00:57:07 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712010557.AAA10395@candle.pha.pa.us>
+Subject: Re: [HACKERS] querytrees and multiple statements
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 1 Dec 1997 00:57:07 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <3481167E.2781E494@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 30, 97 02:32:14 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> No. I just said that there will be no ability to prepare queries with
+> subselects for later execution: there will be no way to get an execution plan which
+> could be passed to executor to get results without additional parser/planner
+> invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
+> (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
+> in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
+>
+> Ability to have execution plans seems important to me. Other DBMS-es use
+> this for stored procedures and views.
+>
+> Vadim
+>
+
+I see what you are saying about other people calling pg_plan(). pg_plan
+returns the query rewritten, and a plan, and some areas use that. I
+will have to make sure I honor that functionality in any changes I make
+to it. I will think more about this. I may have to add an 'execute me'
+flag to it. However, I am unsure how I am going to generate 'just a
+plan or rewritten query structure' without actually running the query
+and having the temp table created so the rest can be parsed.
+
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 1 02:00:58 1997
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11221
+ for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:00:57 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26994 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:55:19 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA23269; Mon, 1 Dec 1997 01:47:13 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:45:31 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA22653 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:45:25 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22590 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 01:45:13 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA21318; Mon, 1 Dec 1997 13:49:58 +0700 (KRS)
+Message-ID: <34825E16.446B9B3D@sable.krasnoyarsk.su>
+Date: Mon, 01 Dec 1997 13:49:58 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org
+Subject: Re: [HACKERS] querytrees and multiple statements
+References: <199712010557.AAA10395@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > No. I just said that there will be no ability to prepare queries with
+> > subselects for later execution: there will be no way to get an execution plan which
+> > could be passed to executor to get results without additional parser/planner
+> > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
+> > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
+> > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
+> >
+> > Ability to have execution plans seems important to me. Other DBMS-es use
+> > this for stored procedures and views.
+> >
+> > Vadim
+> >
+>
+> I see what you are saying about other people calling pg_plan(). pg_plan
+> returns the query rewritten, and a plan, and some areas use that. I
+> will have to make sure I honor that functionality in any changes I make
+> to it. I will think more about this. I may have to add an 'execute me'
+> flag to it. However, I am unsure how I am going to generate 'just a
+> plan or rewritten query structure' without actually running the query
+> and having the temp table created so the rest can be parsed.
+
+That's why I suggest trying nodeMaterial(): this could allow us to handle
+subqueries at the optimizer level and get a single execution plan for
+a single user query.
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Mon Dec 1 02:46:23 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11762
+ for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:46:21 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA11681; Mon, 1 Dec 1997 02:35:00 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 02:33:17 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA11451 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 02:33:09 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id CAA11110 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 02:32:10 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id CAA11574;
+ Mon, 1 Dec 1997 02:32:45 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712010732.CAA11574@candle.pha.pa.us>
+Subject: Re: [HACKERS] querytrees and multiple statements
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 1 Dec 1997 02:32:45 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34825E16.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 1, 97 01:49:58 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > >
+> > > No. I just said that there will be no ability to prepare queries with
+> > > subselects for later execution: there will be no way to get an execution plan which
+> > > could be passed to executor to get results without additional parser/planner
+> > > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
+> > > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
+> > > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
+> > >
+> > > Ability to have execution plans seems important to me. Other DBMS-es use
+> > > this for stored procedures and views.
+> > >
+> > > Vadim
+> > >
+> >
+> > I see what you are saying about other people calling pg_plan(). pg_plan
+> > returns the query rewritten, and a plan, and some areas use that. I
+> > will have to make sure I honor that functionality in any changes I make
+> > to it. I will think more about this. I may have to add an 'execute me'
+> > flag to it. However, I am unsure how I am going to generate 'just a
+> > plan or rewritten query structure' without actually running the query
+> > and having the temp table created so the rest can be parsed.
+>
+> That's why I suggest trying nodeMaterial(): this could allow us to handle
+> subqueries at the optimizer level and get a single execution plan for
+> a single user query.
+
+Can you give me more details on this? I realize I can create an empty
+tmp table to get through the parser analysis stuff, but how do I do
+something in nodeMaterial?
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Tue Dec 2 00:04:05 1997
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA00350
+ for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 00:03:58 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA22889; Tue, 2 Dec 1997 12:09:57 +0700 (KRS)
+Sender: root@www.krasnet.ru
+Message-ID: <34839824.3F54BC7E@sable.krasnoyarsk.su>
+Date: Tue, 02 Dec 1997 12:09:56 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: "Vadim B. Mikheev" <vadim@post.krasnet.ru>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] querytrees and multiple statements
+References: <199712010732.CAA11574@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > That's why I suggest to try with nodeMaterial(): this could allow to handle
+> > subqueries on optimizer level and got single execution plan for
+> > single user query.
+>
+> Can you give me more details on this? I realize I can create an empty
+> tmp table to get through the parser analysis stuff, but how do I do
+> something in nodeMaterial?
+
+ * ExecMaterial
+ *
+ * The first time this is called, ExecMaterial retrieves tuples
+ * this node's outer subplan and inserts them into a temporary
+ ^^^^^^^
+
+ * relation. After this is done, a flag is set indicating that
+ * the subplan has been materialized. Once the relation is
+ * materialized, the first tuple is then returned. Successive
+ * calls to ExecMaterial return successive tuples from the temp
+ * relation.
+
+As you see, this node materializes some plan results into a temp relation:
+instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
+create a Material node using the plan for 'SELECT ... FROM ... WHERE ...' as
+its subplan. A SeqScan of this materialized relation can be used in any
+join plan just like a scan of a normal relation, e.g. a NESTLOOP plan:
+
+ NESTLOOP
+ SeqScan A
+ SeqScan B
+
+becomes
+
+ NESTLOOP
+ SeqScan
+ Material
+ ...subplan here...
+ SeqScan B (or other Material)
+
+and so on...
+
+Vadim
+
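+A minimal C sketch of the materialize-once behaviour that the ExecMaterial
+comment above describes. Every name here (ToySubplan, ToyMaterial,
+toy_material_next, toy_material_rescan) is hypothetical and is not the
+PostgreSQL executor API; a fixed-size array stands in for the temporary
+relation and plain integers stand in for tuples.
+
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TUPLES 16

/* Hypothetical subplan: hands out integers from an array, one per call. */
typedef struct ToySubplan {
    const int *rows;
    size_t     nrows;
    size_t     next;    /* next row to hand out */
    size_t     calls;   /* how many times the subplan was pulled */
} ToySubplan;

/* Returns 1 and stores a row in *out, or 0 when the subplan is exhausted. */
static int toy_subplan_next(ToySubplan *sp, int *out)
{
    assert(sp != NULL && out != NULL);
    sp->calls++;
    if (sp->next >= sp->nrows)
        return 0;
    *out = sp->rows[sp->next++];
    return 1;
}

/* Hypothetical Material node sitting on top of one outer subplan. */
typedef struct ToyMaterial {
    ToySubplan *outer;                 /* the "outer subplan" */
    int         temp[TOY_MAX_TUPLES];  /* stands in for the temp relation */
    size_t      ntemp;
    size_t      cursor;
    int         materialized;          /* flag set after the first call */
} ToyMaterial;

/* First call drains the subplan into temp[]; later calls only read temp[]. */
static int toy_material_next(ToyMaterial *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->materialized) {
        int row;
        /* Sketch only: rows beyond TOY_MAX_TUPLES are silently dropped. */
        while (m->ntemp < TOY_MAX_TUPLES && toy_subplan_next(m->outer, &row))
            m->temp[m->ntemp++] = row;
        m->materialized = 1;
        m->cursor = 0;
    }
    if (m->cursor >= m->ntemp)
        return 0;
    *out = m->temp[m->cursor++];
    return 1;
}

/* Restart the scan of the materialized rows without re-running the subplan. */
static void toy_material_rescan(ToyMaterial *m)
{
    assert(m != NULL);
    m->cursor = 0;
}
```
+
+In a nested loop, the inner side could call toy_material_rescan() once per
+outer row, so the subplan itself runs only once however often the join
+rescans it.
+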
+From owner-pgsql-hackers@hub.org Tue Dec 2 01:28:02 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA02313
+ for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 01:28:00 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA00346; Tue, 2 Dec 1997 01:03:55 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 02 Dec 1997 01:03:04 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28750 for pgsql-hackers-outgoing; Tue, 2 Dec 1997 01:02:57 -0500 (EST)
+Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA28254 for <hackers@postgreSQL.org>; Tue, 2 Dec 1997 01:02:38 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id BAA01042;
+ Tue, 2 Dec 1997 01:02:15 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199712020602.BAA01042@candle.pha.pa.us>
+Subject: Re: [HACKERS] querytrees and multiple statements
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Tue, 2 Dec 1997 01:02:15 -0500 (EST)
+Cc: vadim@post.krasnet.ru, hackers@postgreSQL.org
+In-Reply-To: <34839824.3F54BC7E@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 2, 97 12:09:56 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > >
+> > > That's why I suggest to try with nodeMaterial(): this could allow to handle
+> > > subqueries on optimizer level and got single execution plan for
+> > > single user query.
+> >
+> > Can you give me more details on this? I realize I can create an empty
+> > tmp table to get through the parser analysis stuff, but how do I do
+> > something in nodeMaterial?
+>
+> * ExecMaterial
+> *
+> * The first time this is called, ExecMaterial retrieves tuples
+> * this node's outer subplan and inserts them into a temporary
+> ^^^^^^^
+>
+> * relation. After this is done, a flag is set indicating that
+> * the subplan has been materialized. Once the relation is
+> * materialized, the first tuple is then returned. Successive
+> * calls to ExecMaterial return successive tuples from the temp
+> * relation.
+>
+> As you see, this node materializes some plan results into a temp relation:
+> instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
+> create a Material node using the plan for 'SELECT ... FROM ... WHERE ...' as
+> its subplan. A SeqScan of this materialized relation can be used in any
+> join plan just like a scan of a normal relation, e.g. a NESTLOOP plan:
+>
+> NESTLOOP
+> SeqScan A
+> SeqScan B
+>
+> becomes
+>
+> NESTLOOP
+> SeqScan
+> Material
+> ...subplan here...
+> SeqScan B (or other Material)
+>
+> and so on...
+
+The problem now is that I don't understand much about what happens
+inside the optimizer or executor. I am sure you are correct that we can
+have the subselect as a subnode, and if you think that is best, then it
+is.
+
+This pretty much stops me in developing subselects. I have the concepts
+down of what has to happen, but I can not implement it. It will take me
+several months to learn how the optimizer and executor work in enough
+detail to implement this.
+
+I usually allot 2-3 days a month for PostgreSQL development.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Thu Oct 30 01:30:59 1997
+Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA17986
+ for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:30:58 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA27090 for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:19:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA28901; Thu, 30 Oct 1997 01:16:38 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 01:16:17 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28673 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 01:16:10 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA27557 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 01:15:27 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA20275; Thu, 30 Oct 1997 13:16:10 +0700 (KRS)
+Message-ID: <34582629.33590565@sable.krasnoyarsk.su>
+Date: Thu, 30 Oct 1997 13:16:09 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: PostgreSQL Developers List <hackers@postgreSQL.org>
+Subject: [HACKERS] Subqueries?
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Hi!
+
+Bruce, have you started on them?
+I agree that subqueries should be implemented like SQL functions, but
+I would suggest not using CREATE FUNCTION, which is quite bad for
+performance. Instead, use some new node (VirtualFunc, SubQuery, or similar)
+and handle such nodes the way SQL functions are handled in function.c
+(but without invoking the parser/planner on each call, which should be
+fixed!). Also, uncorrelated subqueries returning a single result
+can't be replaced in the parser/planner by a constant node: rules (and so
+views), SPI and PL use _prepared_ plans...
+It seems that this is not hard work...
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Thu Oct 30 16:31:59 1997
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA07360
+ for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 16:31:49 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA11483; Thu, 30 Oct 1997 16:27:11 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:26:14 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA11163 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:26:07 -0500 (EST)
+Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA10874 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:25:12 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id QAA06370;
+ Thu, 30 Oct 1997 16:07:52 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199710302107.QAA06370@candle.pha.pa.us>
+Subject: Re: [HACKERS] Subqueries?
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 30 Oct 1997 16:07:51 -0500 (EST)
+Cc: hackers@postgreSQL.org
+In-Reply-To: <34582629.33590565@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Oct 30, 97 01:16:09 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Hi!
+>
+> Bruce, have you started on them?
+> I agree that subqueries should be implemented like SQL functions, but
+> I would suggest not using CREATE FUNCTION, which is quite bad for
+> performance. Instead, use some new node (VirtualFunc, SubQuery, or similar)
+> and handle such nodes the way SQL functions are handled in function.c
+> (but without invoking the parser/planner on each call, which should be
+> fixed!). Also, uncorrelated subqueries returning a single result
+> can't be replaced in the parser/planner by a constant node: rules (and so
+> views), SPI and PL use _prepared_ plans...
+> It seems that this is not hard work...
+>
+> Vadim
+>
+>
+
+OK, here is what I have collected over the months about subqueries.
+The Sybase whitepaper is also attached.
+
+This should get us thinking about how to implement each subquery type,
+what operations need to be performed, and in what order.
+
+---------------------------------------------------------------------------
+
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: [PG95-DEV] Need info on other databases.
+To: pg95-dev@ki.net
+Date: Fri, 22 Nov 1996 12:49:24 -0500 (EST)
+
+>
+>
+> What I'm specifically interested in is the SQL-92 spec
+> for the ANSI things that postgres95 is missing and the
+> syntax/limitations on systems like Informix, Sybase,
+> Microsoft, et.al...
+>
+> Any technical info such as performance hits, disabling
+> the use of indices, stuff like that would be _greatly_
+> appreciated. I have a decent understanding of this for
+> Oracle, but not for any other systems. I want to get
+> an idea of the work load of adding the IN, BETWEEN/AND
+> and HAVING clauses.
+
+I have done some thinking about subselects. There are basically two
+issues:
+
+ Does the query return one row or several rows? This can be
+ determined by seeing if the user uses equals or 'IN' to join the
+ subquery.
+
+ Is the query correlated, meaning "Does the subquery reference
+ values from the outer query?"
+
+(We already have the third type of subquery, the INSERT...SELECT query.)
+
+So we have these four combinations:
+
+ 1) one row, no correlation
+ 2) multiple rows, no correlation
+ 3) one row, correlated
+ 4) multiple rows, correlated
+
+
+With #1, we can execute the subquery, get the value, replace the
+subquery with the constant returned from the subquery, and execute the
+outer query.
+
+With #2, we can execute the subquery and put the result into a temporary
+table. We then rewrite the outer query to access the temporary table
+and replace the subquery with the column name from the temporary table.
+We probably put an index on the temp. table, which has only one
+column, because a subquery can only return one column. We remove the
+temp. table after query execution.
+
+With #3 and #4, we potentially need to execute the subquery for every
+row returned by the outer query. Performance would be horrible for
+anything but the smallest query. Another way to handle this is to
+execute the subquery WITHOUT using any of the outer-query columns to
+restrict the WHERE clause, and add those columns used to join the outer
+variables into the target list of the subquery. So for query:
+
+ select t1.name
+ from tab t1
+ where t1.age = (select max(t2.age)
+ from tab2 t2
+ where t2.name = t1.name)
+
+Execute the subquery and put it in a temporary table:
+
+ select t2.name, max(t2.age) as age
+ into table temp999
+ from tab2 t2
+ group by t2.name
+
+ create index i_temp999 on temp999 (name)
+
+Then re-write the outer query:
+
+ select t1.name
+ from tab t1, temp999
+ where t1.age = temp999.age and
+ t1.name = temp999.name
+
+The only problem here is that the subselect is running for all entries
+in tab2, even if the outer query is only going to need a few rows.
+Deciding whether to execute the subquery each time or to create a temp.
+table is often difficult. Even some non-correlated
+subqueries are better executed for each row rather than pre-executing the
+entire subquery, especially if the outer query returns few rows.
+
+One requirement to handle these issues is better column statistics,
+which I am working on.
+
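+The temp-table rewrite for the correlated case above can be sketched in C
+as two passes: group the inner table by the correlation column, then join
+the outer table against the grouped result. This is only an illustration
+under stated assumptions: ToyRow, toy_build_temp and toy_join are
+hypothetical names, and a linear search stands in for the index on
+temp999 (name).
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row shape shared by tab, tab2 and the temp table. */
typedef struct ToyRow {
    const char *name;
    int         age;
} ToyRow;

/*
 * Pass 1: the equivalent of
 *   select name, max(age) from tab2 group by name
 * written into temp[]. Returns the number of groups.
 */
static size_t toy_build_temp(const ToyRow *tab2, size_t n2,
                             ToyRow *temp, size_t cap)
{
    size_t ntemp = 0;

    assert(tab2 != NULL && temp != NULL);
    for (size_t i = 0; i < n2; i++) {
        size_t g;

        for (g = 0; g < ntemp; g++)
            if (strcmp(temp[g].name, tab2[i].name) == 0)
                break;
        if (g == ntemp) {
            if (ntemp == cap)
                continue;           /* sketch only: extra groups are dropped */
            temp[ntemp++] = tab2[i];
        } else if (tab2[i].age > temp[g].age) {
            temp[g].age = tab2[i].age;
        }
    }
    return ntemp;
}

/*
 * Pass 2: the rewritten outer query, joining tab to temp on name and age.
 * Stores matching names in out[] and returns how many were stored.
 */
static size_t toy_join(const ToyRow *tab, size_t n1,
                       const ToyRow *temp, size_t ntemp,
                       const char **out, size_t cap)
{
    size_t nout = 0;

    assert(tab != NULL && temp != NULL && out != NULL);
    for (size_t i = 0; i < n1; i++)
        for (size_t g = 0; g < ntemp; g++)
            if (strcmp(tab[i].name, temp[g].name) == 0 &&
                tab[i].age == temp[g].age && nout < cap)
                out[nout++] = tab[i].name;
    return nout;
}
```
+
+Pass 1 visits every row of tab2 whether or not the outer query needs it,
+which is the cost the paragraph above warns about.
+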
+------------------------------------------------------------------------------
+
+Date: Thu, 5 Dec 1996 10:07:56 -0500
+From: aixssd!darrenk@abs.net (Darren King)
+To: maillist@candle.pha.pa.us
+Subject: Subselect info.
+
+> Any of them deal with implementing subselects?
+
+There's a white paper at the www.sybase.com that might
+help a little. It's just a copy of a presentation
+given by the optimizer guru there. Nothing code-wise,
+but he gives a few ways of flattening them with temp
+tables, etc...
+
+Darren
+
+------------------------------------------------------------------------------
+
+Date: Fri, 22 Aug 1997 12:04:31 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+
+Bruce Momjian wrote:
+>
+> Considering the complexity of the primary/secondary changes you are
+> making, I believe subselects will be easier than that.
+
+I don't do changes for P/F keys - just thinking...
+Yes, I think that impl of referential integrity is
+more complex work.
+
+As for subselects:
+
+in plannodes.h
+
+typedef struct Plan {
+...
+ struct Plan *lefttree;
+ struct Plan *righttree;
+} Plan;
+
+/* ----------------
+ * these are are defined to avoid confusion problems with "left"
+ ^^^^^^^^^^^^^^^^^^
+ * and "right" and "inner" and "outer". The convention is that
+ * the "left" plan is the "outer" plan and the "right" plan is
+ * the inner plan, but these make the code more readable.
+ * ----------------
+ */
+#define innerPlan(node) (((Plan *)(node))->righttree)
+#define outerPlan(node) (((Plan *)(node))->lefttree)
+
+First thought is to avoid any confusion by re-defining
+
+#define rightPlan(node) (((Plan *)(node))->righttree)
+#define leftPlan(node) (((Plan *)(node))->lefttree)
+
+and changing all occurrences of 'outer' & 'inner' in the code
+to 'left' & 'right':
+
+this will allow us to use 'outer' & 'inner' for subselects
+later, without confusion. My hope is that we can change the Executor
+very easily by adding outer/inner plans/TupleSlots to
+EState, CommonState, JoinState, etc. and by doing node
+processing in the right order.
+
+Subselects are mostly a Planner problem.
+
+Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
+
+Vadim
+
+------------------------------------------------------------------------------
+
+Date: Fri, 22 Aug 1997 12:22:37 +0800
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: subselects
+
+Vadim B. Mikheev wrote:
+>
+> this will allow to use 'outer' & 'inner' things for subselects
+> later, without confusion. My hope is that we may change Executor
+
+Or maybe use 'high' & 'low' for subselects (to avoid confusion
+with outer joins).
+
+> very easy by adding outer/inner plans/TupleSlots to
+> EState, CommonState, JoinState, etc and by doing node
+> processing in right order.
+ ^^^^^^^^^^^^^^
+Rule is easy:
+1. Uncorrelated subselect - do 'low' plan node first
+2. Correlated - do left/right first
+
+- just some flag in structures.
+
+Vadim
+
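+A small C sketch of the ordering rule above. The lefttree/righttree fields
+and the leftPlan/rightPlan macros follow the plannodes.h excerpt quoted
+earlier; the lowtree field, the correlated flag, ToyPlan and toy_visit are
+hypothetical names invented for this illustration. The sketch records only
+the order in which nodes would be processed; a real correlated subselect
+would also have to be re-run for each outer row.
+
```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyPlan {
    int             id;
    struct ToyPlan *lefttree;
    struct ToyPlan *righttree;
    struct ToyPlan *lowtree;     /* hypothetical: the subselect's plan */
    int             correlated;  /* hypothetical: "just some flag" */
} ToyPlan;

#define leftPlan(node)  ((node)->lefttree)
#define rightPlan(node) ((node)->righttree)
#define lowPlan(node)   ((node)->lowtree)

/*
 * Appends node ids to order[] in processing order, starting at slot n.
 * Returns the new number of filled slots.
 */
static size_t toy_visit(const ToyPlan *node, int *order, size_t n, size_t cap)
{
    assert(order != NULL);
    if (node == NULL)
        return n;

    /* Rule 1: an uncorrelated subselect is done before left/right. */
    if (lowPlan(node) != NULL && !node->correlated)
        n = toy_visit(lowPlan(node), order, n, cap);

    n = toy_visit(leftPlan(node), order, n, cap);
    n = toy_visit(rightPlan(node), order, n, cap);

    /* Rule 2: a correlated subselect is done after left/right. */
    if (lowPlan(node) != NULL && node->correlated)
        n = toy_visit(lowPlan(node), order, n, cap);

    if (n < cap)
        order[n++] = node->id;
    return n;
}
```
+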
+
+---------------------------------------------------------------------------
+
+
+Performance Tips for Transact-SQL
+
+Slides from a presentation by Jeff Lichtman
+
+----------------------------------------------------------------------------
+
+Table of Contents
+
+Overview
+> versus >=
+Exists Versus Not Exists
+Exists Versus Not Exists II
+Correlated Subqueries with Restrictive Outer Joins
+Correlated Subqueries with Restrictive Outer Joins Example
+Correlated Subqueries with Restrictive Outer Joins III
+Correlated Subqueries with Restrictive Outer Joins IV
+Correlated Subqueries with Restrictive Outer Joins V
+Correlated Subqueries with Restrictive Outer Joins Example
+Creating Tables in Stored Procedures
+Creating Tables in Stored Procedures Example
+Variables versus Parameters in Where Clause
+Variables versus Parameters in Where Clause Example
+Count versus Exists
+Count versus Exists II
+Or versus Union
+Or versus Union Example
+MAX and MIN Aggregates
+MAX and MIN Aggregates II
+MAX and MIN Aggregates Example
+MAX and MIN Aggregates III
+Joins and Datatypes
+Joins and Datatypes Example
+Joins and Datatypes II
+Joins and Datatypes III
+Parameters and Datatypes
+Parameters and Datatypes Example
+Summary
+----------------------------------------------------------------------------
+
+Overview
+
+ * Goal Is to Learn Some Tips to Help You Improve the Performance of Your
+ Queries.
+ * Emphasis Is on Queries, Not on Schema.
+ * Many Tips Are Not Related to Query Optimizer.
+ * Tips Are Based on Actual Customer Cases Seen by SQL Server Development
+ Engineer.
+ * These Tips Are Intended As Suggestions and Guidelines, Not Absolute
+ Rules.
+ * Some of These Tips Could Become Obsolete As Sybase Improves the SQL
+ Server.
+
+----------------------------------------------------------------------------
+
+> versus >=
+
+Given the query:
+
+select * from tab where x > 3
+
+with an index on x. This query works by using the index to find the first
+value where x = 3, and scanning forward.
+
+Suppose there are many rows in tab where x = 3.
+
+In this case, the server has to scan many pages before finding the first row
+where x > 3.
+
+It is more efficient to write the query like this:
+
+select * from tab where x >= 4
+
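+To make the cost difference concrete, here is a C sketch over a sorted
+array standing in for the index on x. ToyScan, toy_lower_bound, toy_scan_gt
+and toy_scan_ge are hypothetical names, and the scan strategy is only the
+one the slide describes, not a statement about any particular server.
+Note that rewriting x > 3 as x >= 4 gives the same rows only when x is
+integer-valued.
+
```c
#include <assert.h>
#include <stddef.h>

/* Where a scan starts returning rows, and how many rows it stepped over. */
typedef struct ToyScan {
    size_t first;
    size_t skipped;
} ToyScan;

/* Index of the first element >= key in a sorted array (binary search). */
static size_t toy_lower_bound(const int *v, size_t n, int key)
{
    size_t lo = 0, hi = n;

    assert(v != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (v[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* "x > key": position on the first x = key, then step over the equal rows. */
static ToyScan toy_scan_gt(const int *v, size_t n, int key)
{
    ToyScan s;

    s.first = toy_lower_bound(v, n, key);
    s.skipped = 0;
    while (s.first < n && v[s.first] <= key) {
        s.first++;
        s.skipped++;
    }
    return s;
}

/* "x >= key": the search lands directly on the first qualifying row. */
static ToyScan toy_scan_ge(const int *v, size_t n, int key)
{
    ToyScan s;

    s.first = toy_lower_bound(v, n, key);
    s.skipped = 0;
    return s;
}
```
+
+Both scans start returning rows at the same position; only the number of
+rows stepped over on the way differs.
+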
+----------------------------------------------------------------------------
+
+Exists Versus Not Exists
+
+In subqueries and IF statements, EXISTS and IN are faster than NOT EXISTS
+and NOT IN.
+
+With IF statements, one can easily avoid NOT EXISTS:
+
+if not exists (select * from ...)
+begin /* Statement group 1 */
+...
+end else begin /* Statement group 2 */
+...
+end
+
+can be re-written as:
+
+if exists (select * from ...)
+begin /* Statement group 2 */
+...
+end else begin /* Statement group 1 */
+...
+end
+
+----------------------------------------------------------------------------
+
+Exists versus Not Exists (cont.)
+
+Even without an ELSE clause, it is possible to avoid NOT EXISTS in IF
+statements:
+
+if not exists (select * from ...)
+begin
+ /* Statement group */
+ ...
+end
+...
+
+can be re-written as:
+
+if exists (select * from ...)
+begin
+ goto exists_label
+end
+/* Statement group */
+...
+exists_label:
+...
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins
+
+ * SQL Server Processes Subqueries "Inside-Out"
+ * For Correlated Subqueries, It Creates a Worktable Containing Subquery
+ Results
+ * The Worktable Is Grouped on the Correlation Columns
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins
+
+For example:
+
+select w from outer where x =
+ (select sum(a) from inner
+ where inner.b = outer.z)
+
+becomes:
+
+select outer.z, summ = sum(inner.a)
+into #work
+from outer, inner
+where inner.b = outer.z
+group by outer.z
+select outer.w
+from outer, #work
+where outer.z = #work.z
+and outer.x = #work.summ
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+The SQL Server copies search clauses from the outer query to the subquery to
+improve performance:
+
+select w from outer
+where y = 1
+and x = (select sum(a)
+ from inner
+ where inner.b = outer.z)
+
+becomes:
+
+select outer.z, summ = sum(inner.a)
+into #work
+from outer, inner
+where inner.b = outer.z and outer.y = 1
+group by outer.z
+select outer.w
+from outer, #work
+where outer.z = #work.z and outer.y = 1 and outer.x = #work.summ
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+ * The SQL Server Does Not Copy Join Clauses Into Correlated Subqueries As
+ It Does With Search Clauses.
+ * Copying Search Clauses Will Always Make the Query Run Faster, but
+ Copying a Join Clause Might Make It Run Slower.
+ * Copying the Join Clause Is Beneficial Only If the Join Clause Is Very
+ Restrictive.
+ * Only the Query Optimizer Knows Whether a Join Clause Is Restrictive,
+ but the SQL Server Breaks the Query Into Steps Before Optimization.
+ * Since You Know Your Data, You Can Copy Join Clauses Into Subqueries
+ When You Know It Will Help.
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+An example of when to copy join clause:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.unique_column = single_row_tab.a
+and huge_tab.b = (select sum(c)
+ from inner
+ where huge_tab.d = inner.e)
+
+should be re-written as:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.unique_column = single_row_tab.a
+and huge_tab.b = (select sum(c)
+ from inner
+ where huge_tab.d = inner.e
+ and huge_tab.unique_column = single_row_tab.a)
+
+----------------------------------------------------------------------------
+
+Correlated Subqueries with Restrictive Outer Joins (cont.)
+
+An example of when not to copy join clause:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.many_duplicates_in_column = single_row_tab.a and
+single_row_tab.b = (select sum(c)
+ from inner
+ where single_row_tab.d = inner.e)
+
+Should not be re-written as:
+
+select *
+from huge_tab, single_row_tab
+where huge_tab.many_duplicates_in_column = single_row_tab.a and
+single_row_tab.b = (select sum(c)
+ from inner
+ where single_row_tab.d = inner.e
+ and huge_tab.many_duplicates_in_column = single_row_tab.a)
+
+----------------------------------------------------------------------------
+
+Creating Tables in Stored Procedures
+
+ * When You Create a Table in the Same Stored Procedure Where It Is Used,
+ the Query Optimizer Cannot Know How Big the Table Is.
+ * The Optimizer Assumes That Any Such Table Has 10 Data Pages and 100
+ Rows.
+ * If the Table Is Really Big, This Assumption Can Lead the Optimizer to
+ Choose a Sub-Optimal Query Plan.
+ * In Cases Like This, It Is Better to Create the Table Outside the
+ Procedure, Which Allows the Optimizer to See How Large the Table Is.
+
+----------------------------------------------------------------------------
+
+Creating Tables in Stored Procedures (cont)
+
+For example:
+
+create proc p as
+ select * into #huge_result from ...
+ select * from tab, #huge_result where
+ ...
+
+can be re-written as:
+
+create proc p as
+ select * into #huge_result from ...
+ exec s
+create proc s as
+ select * from tab, #huge_result where
+ ...
+
+----------------------------------------------------------------------------
+
+Variables versus Parameters in Where Clause
+
+ * The Query Optimizer Cannot Predict the Value of a Declared Variable.
+ * The Query Optimizer Does Know the Value of a Parameter to a Stored
+ Procedure at Compile Time.
+ * Knowing the Values in the WHERE Clause of a Query Can Help the
+ Optimizer Make Better Choices.
+ * To Avoid Putting Variables Into WHERE Clauses, One Can Split up Stored
+ Procedures.
+
+----------------------------------------------------------------------------
+
+Variables versus Parameters in Where Clause (cont)
+
+For example:
+
+create procedure p as
+ declare @x int
+ select @x = col from tab where ...
+ select * from tab2 where col2 = @x
+
+can be re-written as:
+
+create procedure p as
+ declare @x int
+ select @x = col from tab where ...
+ exec s @x
+create procedure s @x int as
+ select * from tab2 where col2 = @x
+
+----------------------------------------------------------------------------
+
+Count versus Exists
+
+It is possible to use the COUNT aggregate in a subquery to do an existence
+check:
+
+select * from tab where 0 <
+ (select count(*) from tab2 where ...)
+
+It is possible to write this same query using EXISTS (or IN):
+
+select * from tab where exists
+ (select * from tab2 where ...)
+
+----------------------------------------------------------------------------
+
+Count versus Exists (cont)
+
+ * Using COUNT to Do an Existence Check Is Slower Than Using EXISTS.
+ * When You Use COUNT, the SQL Server Does Not Know That You Are Doing an
+ Existence Check. It Counts All of the Matching Values.
+ * When You Use EXISTS, the SQL Server Knows You Are Doing an Existence
+ Check, So It Stops Looking When It Finds the First Matching Value.
+ * The Same Applies to Using COUNT Instead of IN or ANY.
+
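+The difference between the two checks can be sketched in C over a plain
+array. toy_count and toy_exists are hypothetical names; the examined
+counter exists only to show how much of the input each approach touches.
+
```c
#include <assert.h>
#include <stddef.h>

/* COUNT-style check: looks at every element, even after the first match. */
static size_t toy_count(const int *v, size_t n, int key, size_t *examined)
{
    size_t hits = 0;

    assert(v != NULL && examined != NULL);
    *examined = 0;
    for (size_t i = 0; i < n; i++) {
        (*examined)++;
        if (v[i] == key)
            hits++;
    }
    return hits;
}

/* EXISTS-style check: stops at the first match. */
static int toy_exists(const int *v, size_t n, int key, size_t *examined)
{
    assert(v != NULL && examined != NULL);
    *examined = 0;
    for (size_t i = 0; i < n; i++) {
        (*examined)++;
        if (v[i] == key)
            return 1;
    }
    return 0;
}
```
+
+Testing toy_count(...) > 0 and calling toy_exists(...) answer the same
+question; only the amount of scanning differs.
+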
+----------------------------------------------------------------------------
+
+Or versus Union
+
+ * The SQL Server Cannot Optimize Join Clauses That Are Linked With OR.
+ * The SQL Server Can Optimize Selects That Are Linked With UNION.
+ * The Result of OR Is Somewhat Like the Result of UNION, Except For the
+ Treatment of Duplicate Rows and Empty Tables.
+
+----------------------------------------------------------------------------
+
+Or versus Union (cont)
+
+For example:
+
+select * from tab1, tab2
+where tab1.a = tab2.b
+or tab1.x = tab2.y
+
+can be re-written as:
+
+select * from tab1, tab2
+where tab1.a = tab2.b
+union all
+select * from tab1, tab2
+where tab1.x = tab2.y
+
+You can use UNION instead of UNION ALL if you want to eliminate duplicates,
+but this will eliminate all duplicates. It may not be possible to get
+exactly the same set of duplicates from the re-written query.
+----------------------------------------------------------------------------
+
+MAX and MIN Aggregates
+
+ * The SQL Server Uses Special Optimizations for the MAX and MIN
+ Aggregates When There Is an Index on the Aggregated Column.
+ * For MIN, It Stops the Scan on the First Qualifying Row.
+ * For MAX, It Goes Directly to the End of the Index to Find the Last Row.
+ * The Optimization Is Not Applied If:
+ o The Expression Inside the MAX or MIN Is Anything but a Column
+ o The Column Inside the MAX or MIN Is Not the First Column of an
+ Index
+ o There Is Another Aggregate in the Query
+ o There Is a GROUP BY Clause
+ * In Addition, the MAX Optimization Is Not Applied If There Is a WHERE
+ Clause.
+
+----------------------------------------------------------------------------
+
+MAX and MIN Aggregates (cont)
+
+If you have an optimizable MAX or MIN aggregate, it can pay to put it in a
+query separate from other aggregates. For example:
+
+select max(x), min(x) from tab
+
+will result in a full scan of tab, even if there is an index on x. The query
+can be re-written as:
+
+select max(x) from tab
+select min(x) from tab
+
+This can result in using the index twice, rather than scanning the entire
+table once.
+----------------------------------------------------------------------------
+
+MAX and MIN Aggregates (cont)
+
+The MIN optimization can backfire if the where clause is highly selective.
+For example:
+
+select min(index_col)
+from tab
+where
+ col_in_other_index = "value only at end of first index"
+
+The MIN optimization will result in a nearly complete scan of the entire
+index.
+
+This is counter-intuitive. The more selective the WHERE clause, the slower
+the query.
+----------------------------------------------------------------------------
+
+MAX and MIN Aggregates (cont)
+
+In cases like this, it can pay to disable the MIN optimization by combining
+it with another aggregate:
+
+select min(index_col), max(index_col)
+from tab
+where
+ col_in_other_index = "value only at end of first index"
+
+This convinces the optimizer not to use the MIN optimization, so it chooses
+the next best plan, which might be the other index.
+----------------------------------------------------------------------------
+
+Joins and Datatypes
+
+ * When Joining Between Two Columns of Different Datatypes, One of the
+ Columns Must Be Converted to the Type of the Other.
+ * The Commands Reference Manual Shows the Hierarchy of Types.
+ * The Column Whose Type Is Lower in the Hierarchy Is the One That Is
+ Converted.
+ * The Query Optimizer Cannot Choose an Index on the Column That Is
+ Converted.
+
+----------------------------------------------------------------------------
+
+Joins and Datatypes (cont)
+
+For example:
+
+select *
+from tab1, tab2
+where tab1.float_column = tab2.int_column
+
+In this case, no index on tab2.int_column can be used, because int is lower
+in the hierarchy than float.
+
+Note that CHAR NULL is really VARCHAR, and BINARY NULL is really VARBINARY.
+
+Joining CHAR NOT NULL with CHAR NULL involves a conversion (BINARY too).
+----------------------------------------------------------------------------
+
+Joins and Datatypes (cont)
+
+It's best to avoid datatype problems in joins by designing the schema
+accordingly.
+
+If a join between different datatypes is unavoidable, and it hurts
+performance, you can force the conversion to be on the other side of the
+join.
+
+For example:
+
+select *
+from tab1, tab2
+where tab1.char_column = convert(char(75),tab2.varchar_column)
+
+----------------------------------------------------------------------------
+
+Joins and Datatypes (cont)
+
+Be careful! This tactic can change the meaning of the query.
+
+For example:
+
+select *
+from tab1, tab2
+where tab1.int_column = convert(int, tab2.float_column)
+
+This will not return the same results as the join without the convert. It
+can be salvaged by adding:
+
+and tab2.float_column = convert(int, tab2.float_column)
+
+This assumes that all values in tab2.float_column can be converted to int.
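+
+The change in meaning, and the effect of the extra clause, can be seen in
+a C sketch of the three join predicates. The function names are
+hypothetical, and the C cast truncates toward zero; whether convert(int,
+...) rounds the same way on a given server is an assumption here.
+
```c
#include <assert.h>
#include <limits.h>

/* Convert a float value to int; the cast is undefined outside int range. */
static int toy_to_int(double f)
{
    assert(f >= (double)INT_MIN && f <= (double)INT_MAX);
    return (int)f;
}

/* Original join: the int side is widened, so 2 never equals 2.5. */
static int toy_join_plain(int i, double f)
{
    return (double)i == f;
}

/* Conversion forced onto the float side: 2.5 becomes 2 and now matches. */
static int toy_join_converted(int i, double f)
{
    return i == toy_to_int(f);
}

/* With the extra clause: only floats that are whole numbers can match. */
static int toy_join_guarded(int i, double f)
{
    return i == toy_to_int(f) && f == (double)toy_to_int(f);
}
```
+
+The guarded form agrees with the original join on every in-range value,
+while the unguarded converted form lets fractional values through.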
+----------------------------------------------------------------------------
+
+Parameters and Datatypes
+
+ * The Query Optimizer Can Use the Values of Parameters to Stored
+ Procedures to Help Determine Costs.
+ * If a Parameter Is Not of the Same Type As the Column in The WHERE
+ Clause That It Is Being Compared to, the Server Has to Convert the
+ Parameter.
+ * The Optimizer Cannot Use the Value of a Converted Parameter.
+ * It Pays to Make Sure That Parameters Have the Same Type As the Columns
+ They Are Compared To.
+
+----------------------------------------------------------------------------
+
+Parameters and Datatypes (cont)
+
+For example:
+
+create proc p @x varchar(30) as
+select * from tab where char_column = @x
+
+may get a poorer query plan than:
+
+create proc p @x char(30) as
+select * from tab where char_column = @x
+
+Remember that CHAR NULL is really VARCHAR, and BINARY NULL is really
+VARBINARY.
+----------------------------------------------------------------------------
+
+Summary
+
+ * How you write your queries can make a big difference in performance.
+ * Two different queries that do the same thing may perform differently.
+ * There are few absolutes to improving performance, but the tips given
+ here can help.
+ * These tips are not all there is to know about performance.
+
+About the Author
+
+Jeff Lichtman has worked at Sybase since 1987. In 1994, he was given the new
+position of architect of query processing for SQL Server. He is informally
+known as Sybase's optimizer guru.
+
+For more info send email to webmaster@sybase.com
+
+Copyright 1995 © Sybase, Inc. All Rights Reserved.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Sun Jan 11 23:49:44 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA19252
+ for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 23:49:02 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA08095;
+ Mon, 12 Jan 1998 12:09:24 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B9A580.55DD4645@sable.krasnoyarsk.su>
+Date: Mon, 12 Jan 1998 12:09:20 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+Subject: Re: [HACKERS] Re: subselects
+References: <199801110559.AAA11801@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> We need a new Node structure, call it Sublink:
+>
+> int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...)
+> Oid operator /* subquery must return single row */
+> List *lefthand; /* parent stuff */
+> Node *subquery; /* represents nodes from parser */
+> Index Subindex; /* filled in to index Query->subqueries */
+
+OK, I agree that it's better to have a new node and not put the subquery stuff
+into the Expr node.
+
+int linkType
+ is one of EXISTS, ANY, ALL, EXPR. EXPR is for expression subqueries
+ (following Sybase naming), which must return a single row:
+ (a, b, c) = (subquery).
+ Note again that there is no linkType for IN and NOTIN here.
+ The user's IN and NOT IN must be converted to = ANY and <> ALL by the parser.
+
+We do not need Oid operator! In all cases we need
+
+List *oper
+ a list of Oper nodes, one for each of a, b, c, ..., with the operator (=, ...)
+ corresponding to the data type of a, b, c, ...
+
+List *lefthand
+ is a list of Var/Const nodes, the representation of (a, b, c, ...)
+
+What is Node *subquery?
+In the optimizer we need either Subindex (to get the subquery from
+Query->subqueries when we are in a Sublink) or Node *subquery inside the
+Sublink itself.
+BTW, after some thought I don't see how Query->subqueries will be useful.
+So maybe just add bool hassubqueries to Query (and Query *parentQuery)
+and use Query *subquery in Sublink, instead of Subindex?
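
The IN / NOT IN rule from the linkType note above can be sketched in C. This is a minimal illustration, not PostgreSQL source: SubLinkType, NormalizedLink and normalize_in() are hypothetical names invented here. Only the mapping itself (IN becomes = ANY, NOT IN becomes <> ALL) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical link types, named after the four kinds listed in the message. */
typedef enum { LINK_EXISTS, LINK_ANY, LINK_ALL, LINK_EXPR } SubLinkType;

/* Hypothetical result of rewriting a user-level IN / NOT IN. */
typedef struct {
    SubLinkType linkType;
    const char *opname;   /* operator applied to each left-hand column */
} NormalizedLink;

/*
 * IN     -> "="  ANY
 * NOT IN -> "<>" ALL
 * After this step no separate IN/NOTIN link type is left for later stages.
 */
NormalizedLink normalize_in(bool negated)
{
    NormalizedLink r;

    r.linkType = negated ? LINK_ALL : LINK_ANY;
    r.opname = negated ? "<>" : "=";
    return r;
}
```

Doing this once in the parser means later stages only ever have to handle the four link types.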
+
+>
+> Also, when parsing the subqueries, we need to keep track of correlated
+> references. I recommend we add a field to the Var structure:
+>
+> Index sublevel; /* range table reference:
+> = 0 current level of query
+> < 0 parent above this many levels
+> > 0 index into subquery list
+> */
+>
+> This way, a Var node with sublevel 0 is the current level, and is true
+> in most cases. This helps us not have to change much code. sublevel =
+> -1 means it references the range table in the parent query. sublevel =
+> -2 means the parent's parent. sublevel = 2 means it references the range
+> table of the second entry in Query->subqueries. Varno and varattno are
+> still meaningful. Of course, we can't reference variables in the
+> subqueries from the parent in the parser code, but Vadim may want to.
+ ^^^^^^^^^^^^^^^^^
+No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ...
+sublevel is for the optimizer only; the executor will not use it.
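
A Var lookup that fills in sublevel with this non-negative convention might look like the sketch below. It is an illustration only: Scope and lookup_var() are hypothetical names, and each query level is reduced to a flat list of column names, so there is no varno or range table here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-query scope: the column names visible at one query level. */
typedef struct Scope {
    const char **cols;      /* column i has attribute number i + 1 */
    int ncols;
    struct Scope *parent;   /* enclosing query, NULL at the top level */
} Scope;

/*
 * Resolve a column name starting at the current level and walking outward.
 * On success, returns the 1-based attribute number and stores the number of
 * levels climbed in *uplevels (0 = current query, 1 = parent, ...).
 * Returns 0 if no enclosing level has the column.
 */
int lookup_var(const Scope *scope, const char *name, int *uplevels)
{
    int depth = 0;

    assert(name != NULL && uplevels != NULL);
    for (; scope != NULL; scope = scope->parent, depth++) {
        for (int i = 0; i < scope->ncols; i++) {
            if (strcmp(scope->cols[i], name) == 0) {
                *uplevels = depth;
                return i + 1;
            }
        }
    }
    return 0;
}
```

Because the search starts at the innermost level, a column defined in the subquery shadows one with the same name in a parent, and a correlated reference is simply any hit with a nonzero level count.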
+
+>
+> When doing a Var lookup in the parser, we look in the current level
+> first, but if not found, if it is a subquery, we can look at the parent
+> and parent's parent to set the sublevel, varno, and varattno properly.
+>
+> We create no phantom range table entries in the subquery, and no phantom
+> target list entries. We can leave that all for the upper optimizer.
+
+Ok.
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:41 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00786
+ for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:39 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12270 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:16:10 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460;
+ Mon, 12 Jan 1998 16:34:54 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su>
+Date: Mon, 12 Jan 1998 16:34:45 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
+> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
+> If lists are handled farther back, this routine should move to there also and the
+> parser will just pass the lists. Note that some assumptions have to be made about the
+> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
+> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
+> to disallow those cases or to look for specific appearance of the operator to guess
+> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
+> it has "<>" or "!" then build as "or"s.
+
+Oh, god! I never thought about this!
+Ok, I have to agree:
+
+1. Only <, <=, =, >, >=, <> are allowed with subselects
+2. Use ORs for <>, so we need bool useor in SubLink
+ for <>, <> ANY and <> ALL:
+
+typedef struct SubLink {
+    NodeTag  type;
+    int      linkType;   /* EXISTS, ALL, ANY, EXPR */
+    bool     useor;      /* TRUE for <> */
+    List    *lefthand;   /* List of Var/Const nodes on the left */
+    List    *oper;       /* List of Oper nodes */
+    Query   *subquery;   /* the subselect itself */
+} SubLink;
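
The two rules above (the operator whitelist and useor) can be sketched as follows. This is a toy model under stated assumptions: classify_op() and row_compare() are hypothetical names, operators are plain strings and columns are ints. It only shows how the per-column results are combined with AND, or with OR for <>, as proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if op needs OR (useor), 0 if it needs AND, -1 if not allowed. */
int classify_op(const char *op)
{
    static const char *and_ops[] = {"<", "<=", "=", ">", ">="};

    assert(op != NULL);
    if (strcmp(op, "<>") == 0)
        return 1;
    for (size_t i = 0; i < sizeof and_ops / sizeof and_ops[0]; i++)
        if (strcmp(op, and_ops[i]) == 0)
            return 0;
    return -1;
}

/* Apply one of the allowed operators to a single pair of values. */
static bool apply_op(const char *op, int l, int r)
{
    if (strcmp(op, "=") == 0)  return l == r;
    if (strcmp(op, "<>") == 0) return l != r;
    if (strcmp(op, "<") == 0)  return l < r;
    if (strcmp(op, "<=") == 0) return l <= r;
    if (strcmp(op, ">") == 0)  return l > r;
    return l >= r;             /* ">=" is the only allowed operator left */
}

/*
 * Evaluate (l1, ..., ln) OP (r1, ..., rn) column by column.
 * With useor, one matching pair is enough; otherwise every pair must match.
 */
bool row_compare(const char *op, const int *lhs, const int *rhs, int n)
{
    int useor = classify_op(op);

    assert(useor >= 0 && n > 0);
    for (int i = 0; i < n; i++) {
        bool hit = apply_op(op, lhs[i], rhs[i]);

        if (useor && hit)
            return true;
        if (!useor && !hit)
            return false;
    }
    return !useor;
}
```

For example, (1, 2) = (1, 3) is false because the second pair fails, while (1, 2) <> (1, 3) is true because one differing pair is enough.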
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:38 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00783
+ for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:36 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12377 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:21:55 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08470;
+ Mon, 12 Jan 1998 16:40:49 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34B9E520.4C0EA6BC@sable.krasnoyarsk.su>
+Date: Mon, 12 Jan 1998 16:40:48 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
+> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
+> If lists are handled farther back, this routine should move to there also and the
+> parser will just pass the lists. Note that some assumptions have to be made about the
+> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
+> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
+> to disallow those cases or to look for specific appearance of the operator to guess
+> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
+> it has "<>" or "!" then build as "or"s.
+
+Sorry, I forgot something: is (a, b) OP (x, y) in the standard?
+If not, then I suggest we don't implement it at all and allow only
+(a, b) OP [ANY|ALL] (subselect).
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Tue Jan 13 09:30:58 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA28551
+ for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:30:56 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA26483 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:21:36 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id VAA04356;
+ Tue, 13 Jan 1998 21:20:31 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34BB7829.2B18D4B5@sable.krasnoyarsk.su>
+Date: Tue, 13 Jan 1998 21:20:25 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+Subject: Re: [HACKERS] Re: subselects
+References: <199801121424.JAA02440@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+OK. I don't see how Query->subqueries could help me, but I foresee
+that Query->sublinks can. Could you add this?
+
+Bruce Momjian wrote:
+>
+> >
+> > What is Node *subquery?
+> > In the optimizer we need either Subindex (to get the subquery from
+> > Query->subqueries when we are in a Sublink) or Node *subquery inside the
+> > Sublink itself.
+> > BTW, after some thought I don't see how Query->subqueries will be useful.
+> > So maybe just add bool hassubqueries to Query (and Query *parentQuery)
+> > and use Query *subquery in Sublink, instead of Subindex?
+>
+> OK, I originally created it because the parser would have trouble
+> filling in a List* field in SelectStmt while it was parsing a WHERE
+> clause. I decided to just stick the SelectStmt* into Sublink->subquery.
+>
+> While we are going through the parse output to fill in the Query*, I
+> thought we should move the actual subquery parse output to a separate
+> place, and once the Query* was completed, spin through the saved
+> subquery parse list and stuff Query->subqueries with a list of Query*
+> for the subqueries. I thought this would be easier, because we would
+> then have all the subqueries in a nice list that we can manage easier.
+>
+> In fact, we can fill Query->subqueries with SelectStmt* as we process
+> the WHERE clause, then convert them to Query* at the end.
+>
+> If you would rather keep the subquery Query* entries in the Sublink
+> structure, we can do that. The only issue I see is that when you want
+> to get to them, you have to wade through the WHERE clause to find them.
+> For example, we will have to run the subquery Query* through the rewrite
+> system. Right now, for UNION, I have a nice union List* in Query, and I
+> just spin through it in postgres.c for each Union query. If we keep the
+> subquery Query* inside Sublink, we have to have some logic to go through
+> and find them.
+>
+> If we just have an Index in Sublink to the Query->subqueries, we can use
+> the nth() macro to find them quite easily.
+>
+> But it is up to you. I really don't know how you are going to handle
+> things like:
+>
+> select *
+> from taba
+> where x = 3 and y = 5 and (z=6 or q in (select g from tabb ))
+
+No problems.
+
+>
+> My logic was to break the problem down to single queries as much as
+> possible, so we would be breaking the problem up into pieces. Whatever
+> is easier for you.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Tue Jan 13 10:32:35 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA29523
+ for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 10:32:33 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA03743; Tue, 13 Jan 1998 10:32:13 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 13 Jan 1998 10:31:57 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA03708 for pgsql-hackers-outgoing; Tue, 13 Jan 1998 10:31:51 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA03628 for <hackers@postgreSQL.org>; Tue, 13 Jan 1998 10:31:20 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id JAA28747;
+ Tue, 13 Jan 1998 09:48:00 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801131448.JAA28747@candle.pha.pa.us>
+Subject: Re: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Tue, 13 Jan 1998 09:48:00 -0500 (EST)
+Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+In-Reply-To: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 13, 98 09:20:25 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> OK. I don't see how Query->subqueries could help me, but I foresee
+> that Query->sublinks can. Could you add this?
+
+OK, so instead of moving the query out of the SubLink structure, you
+want the Query* in the Sublink structure, and a List* of SubLink
+pointers in the query structure?
+
+ Query
+ {
+ ...
+ List *sublink; /* list of pointers to Sublinks */
+ ...
+ }
+
+I can do that. Let me know.
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Tue Jan 13 22:23:46 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA08806
+ for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:23:45 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA11486 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:09:55 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id KAA05660;
+ Wed, 14 Jan 1998 10:09:07 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34BC2C4E.83E92D82@sable.krasnoyarsk.su>
+Date: Wed, 14 Jan 1998 10:09:02 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+Subject: Re: [HACKERS] Re: subselects
+References: <199801131448.JAA28747@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > OK. I don't see how Query->subqueries could help me, but I foresee
+> > that Query->sublinks can. Could you add this?
+>
+> OK, so instead of moving the query out of the SubLink structure, you
+> want the Query* in the Sublink structure, and a List* of SubLink
+> pointers in the query structure?
+
+Yes.
+
+>
+> Query
+> {
+> ...
+> List *sublink; /* list of pointers to Sublinks */
+> ...
+> }
+>
+> I can do that. Let me know.
+
+Thanks!
+
+Are there any open issues?
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:40 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21676
+ for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:39 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23948 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:59 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27814; Thu, 15 Jan 1998 18:32:40 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:32:20 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27668 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:32:08 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27425 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:32 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id SAA12920;
+ Thu, 15 Jan 1998 18:18:32 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801152318.SAA12920@candle.pha.pa.us>
+Subject: Re: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 15 Jan 1998 18:18:31 -0500 (EST)
+Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+In-Reply-To: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 14, 98 10:09:02 am
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+>
+> Bruce Momjian wrote:
+> >
+> > >
+> > > OK. I don't see how Query->subqueries could help me, but I foresee
+> > > that Query->sublinks can. Could you add this?
+> >
+> > OK, so instead of moving the query out of the SubLink structure, you
+> > want the Query* in the Sublink structure, and a List* of SubLink
+> > pointers in the query structure?
+>
+> Yes.
+>
+> >
+> > Query
+> > {
+> > ...
+> > List *sublink; /* list of pointers to Sublinks */
+> > ...
+> > }
+> >
+> > I can do that. Let me know.
+>
+> Thanks!
+>
+> Are there any open issues?
+
+OK, what do you need me to do? Do you want me to create the Sublink
+support stuff, fill them in in the parser, and pass them through the
+rewrite section and into the optimizer? I will prepare a list of
+changes.
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:38 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21663
+ for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:36 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23925 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:42 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27796; Thu, 15 Jan 1998 18:32:37 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:31:52 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27463 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:31:37 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27167 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:06 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id SAA26747;
+ Thu, 15 Jan 1998 18:26:42 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801152326.SAA26747@candle.pha.pa.us>
+Subject: Re: [HACKERS] Re: subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Thu, 15 Jan 1998 18:26:41 -0500 (EST)
+Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
+In-Reply-To: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 12, 98 04:34:45 pm
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> typedef struct SubLink {
+> NodeTag type;
+> int linkType; /* EXISTS, ALL, ANY, EXPR */
+> bool useor; /* TRUE for <> */
+> List *lefthand; /* List of Var/Const nodes on the left */
+> List *oper; /* List of Oper nodes */
+> Query *subquery; /* */
+> } SubLink;
+
+OK, we add this structure above. During parsing, *subquery actually
+will hold Node *parsetree, not Query *.
+
+And add to Query:
+
+ bool hasSubLinks;
+
+Also need a function to return a List* of SubLink*. I just did a
+similar thing with Aggreg*. And Var gets:
+
+ int uplevels;
+
+Is that it?
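
The function that returns a List* of SubLink* could look roughly like the sketch below. It rests on assumptions: Node, NodeKind and SubLinkList are simplified stand-ins invented here, a fixed-capacity array replaces List*, and every expression node has at most two children. Only the idea of walking the qualification tree and gathering the SubLink nodes comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a tiny WHERE-clause tree. */
typedef enum { KIND_BOOL, KIND_OP, KIND_SUBLINK } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *left;
    struct Node *right;
} Node;

/* Stand-in for List*: a caller-supplied array with a fill count. */
typedef struct {
    Node **items;
    int count;
    int cap;
} SubLinkList;

/*
 * Depth-first walk over the qualification tree, appending every SubLink
 * node found. The walk stops at a SubLink: anything nested inside it
 * belongs to the subquery, not to this query level.
 */
void collect_sublinks(Node *n, SubLinkList *list)
{
    if (n == NULL)
        return;
    if (n->kind == KIND_SUBLINK) {
        assert(list->count < list->cap);
        list->items[list->count++] = n;
        return;
    }
    collect_sublinks(n->left, list);
    collect_sublinks(n->right, list);
}
```

For the earlier example, x = 3 and y = 5 and (z = 6 or q in (select g from tabb)), the walk would return a single SubLink, found under the OR branch, so the caller does not have to wade through the WHERE clause itself.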
+
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From owner-pgsql-hackers@hub.org Fri Jan 16 04:36:05 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09604
+ for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:03 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA07040; Fri, 16 Jan 1998 04:35:27 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 16 Jan 1998 04:35:18 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA06936 for pgsql-hackers-outgoing; Fri, 16 Jan 1998 04:35:13 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA06823 for <hackers@postgreSQL.org>; Fri, 16 Jan 1998 04:34:22 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10384;
+ Fri, 16 Jan 1998 16:34:15 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Message-ID: <34BF2997.97B40172@sable.krasnoyarsk.su>
+Date: Fri, 16 Jan 1998 16:34:15 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
+Subject: Re: [HACKERS] Re: subselects
+References: <199801152326.SAA26747@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> > typedef struct SubLink {
+> > NodeTag type;
+> > int linkType; /* EXISTS, ALL, ANY, EXPR */
+> > bool useor; /* TRUE for <> */
+> > List *lefthand; /* List of Var/Const nodes on the left */
+> > List *oper; /* List of Oper nodes */
+> > Query *subquery; /* */
+> > } SubLink;
+>
+> OK, we add this structure above. During parsing, *subquery actually
+> will hold Node *parsetree, not Query *.
+ ^^^^^^^^^^^^^^^
+But the optimizer will get a Query node here, yes?
+
+>
+> And add to Query:
+>
+> bool hasSubLinks;
+>
+> Also need a function to return a List* of SubLink*. I just did a
+> similar thing with Aggreg*. And Var gets:
+>
+> int uplevels;
+>
+> Is that it?
+
+Yes.
+
+Vadim
+
+
+From vadim@sable.krasnoyarsk.su Fri Jan 16 04:36:21 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09607
+ for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:06 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10396;
+ Fri, 16 Jan 1998 16:37:21 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34BF2A50.A357A16D@sable.krasnoyarsk.su>
+Date: Fri, 16 Jan 1998 16:37:20 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
+Subject: Re: [HACKERS] Re: subselects
+References: <199801152318.SAA12920@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > Are there any open issues?
+>
+> OK, what do you need me to do? Do you want me to create the Sublink
+> support stuff, fill them in in the parser, and pass them through the
+> rewrite section and into the optimizer? I will prepare a list of
+> changes.
+
+Please do this. I'm ready to start coding things in the optimizer.
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Sun Jan 18 07:32:52 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA14786
+ for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:32:51 -0500 (EST)
+Received: from www.krasnet.ru ([193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA29385 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:25:55 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780;
+ Sun, 18 Jan 1998 19:27:14 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su>
+Date: Sun, 18 Jan 1998 19:27:09 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects coding started
+References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> Bruce Momjian wrote:
+>
+> > OK, I have created the SubLink structure with supporting routines, and
+> > have added code to create the SubLink structures in the parser, and have
+> > added Query->hasSubLink.
+> >
+> > I changed gram.y to support:
+> >
+> > (x,y,z) OP (subselect)
+> >
+> > where OP is any operator. Is that right, or are we doing only certain
+> > ones, and if so, do we limit it in the parser?
+>
+> Seems like we would want to pass most operators and expressions through
+> gram.y, and then call elog() in either the transformation or in the
+> optimizer if it is an operator which can't be supported.
+
+Not in the optimizer; in the parser, please.
+Remember that for <> SubLink->useor must be TRUE, and this is parser work
+(the optimizer doesn't know about "=", "<>", etc., only about Oper nodes).
+
+IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Sun Jan 18 21:08:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA00825
+ for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 21:08:57 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA25254 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 19:18:24 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA06912; Sun, 18 Jan 1998 19:17:01 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 18 Jan 1998 19:11:05 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA06322 for pgsql-hackers-outgoing; Sun, 18 Jan 1998 19:11:01 -0500 (EST)
+Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA06144 for <hackers@postgresql.org>; Sun, 18 Jan 1998 19:10:31 -0500 (EST)
+Received: from www.krasnet.ru ([193.125.44.86])
+ by clio.trends.ca (8.8.8/8.8.8) with ESMTP id HAA12383
+ for <hackers@postgreSQL.org>; Sun, 18 Jan 1998 07:28:38 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780;
+ Sun, 18 Jan 1998 19:27:14 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su>
+Date: Sun, 18 Jan 1998 19:27:09 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
+CC: Bruce Momjian <maillist@candle.pha.pa.us>,
+ PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: [HACKERS] subselects coding started
+References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Thomas G. Lockhart wrote:
+>
+> Bruce Momjian wrote:
+>
+> > OK, I have created the SubLink structure with supporting routines, and
+> > have added code to create the SubLink structures in the parser, and have
+> > added Query->hasSubLink.
+> >
+> > I changed gram.y to support:
+> >
+> > (x,y,z) OP (subselect)
+> >
+> > where OP is any operator. Is that right, or are we doing only certain
+> > ones, and if so, do we limit it in the parser?
+>
+> Seems like we would want to pass most operators and expressions through
+> gram.y, and then call elog() in either the transformation or in the
+> optimizer if it is an operator which can't be supported.
+
+Not in the optimizer; in the parser, please.
+Remember that for <> SubLink->useor must be TRUE, and this is parser work
+(the optimizer doesn't know about "=", "<>", etc., only about Oper nodes).
+
+IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work.
+
+Vadim
+
+
+From vadim@sable.krasnoyarsk.su Sun Jan 18 23:59:08 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA10497
+ for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:59:07 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA06941 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:44:32 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id LAA16745
+ for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 11:46:28 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34C2DAA3.78E54042@sable.krasnoyarsk.su>
+Date: Mon, 19 Jan 1998 11:46:27 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: SubLink->oper
+References: <199801190419.XAA04367@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> In SubLink->oper, do you want the oid of the pg_operator, or the oid of
+> the pg_proc assigned to the operator?
+>
+> Currently, I am giving you the oid of pg_operator.
+
+No! I need Oper nodes here. For "normal" operators the parser
+returns an Expr node with opType = OP_EXPR and the corresponding Oper
+in Node *oper. Nearly the same for SubLink: I need an Oper node
+for each pair of Var/Const from the left side and target entry from
+the subquery.
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Mon Jan 19 01:02:23 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24036
+ for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:02:21 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA13913; Mon, 19 Jan 1998 01:02:16 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:01:41 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA13824 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:01:34 -0500 (EST)
+Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA13699 for <hackers@postgreSQL.org>; Mon, 19 Jan 1998 01:00:59 -0500 (EST)
+Received: (from maillist@localhost)
+ by candle.pha.pa.us (8.8.5/8.8.5) id AAA23866;
+ Mon, 19 Jan 1998 00:54:49 -0500 (EST)
+From: Bruce Momjian <maillist@candle.pha.pa.us>
+Message-Id: <199801190554.AAA23866@candle.pha.pa.us>
+Subject: [HACKERS] subselects
+To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
+Date: Mon, 19 Jan 1998 00:54:49 -0500 (EST)
+Cc: hackers@postgreSQL.org (PostgreSQL-development)
+X-Mailer: ELM [version 2.4 PL25]
+MIME-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+
+OK, I have added code to allow the SubLinks to make it to the optimizer.
+
+I implemented ParseState->parentParseState, but not parentQuery, because
+parentParseState is much more valuable to me, and Vadim thought parentQuery
+might be useful, but was not positive. Also, keeping that parentQuery
+pointer valid through rewrite may be difficult, so I dropped it.
+ParseState is only valid in the parser.
+
+I have not yet:
+
+ handled correlated subquery column references
+ added Var->sublevels_up
+ gotten this to work in the rewrite system
+ added full CopyNode support
+
+I will address these in the next few days.
+
+--
+Bruce Momjian
+maillist@candle.pha.pa.us
+
+
+From vadim@sable.krasnoyarsk.su Mon Jan 19 01:32:54 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24335
+ for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:32:52 -0500 (EST)
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA10610 for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:23:02 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16879
+ for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 13:25:28 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34C2F1D2.9CD191CC@sable.krasnoyarsk.su>
+Date: Mon, 19 Jan 1998 13:25:22 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+Subject: Re: SubLink->oper
+References: <199801190500.AAA10576@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> >
+> > Bruce Momjian wrote:
+> > >
+> > > In SubLink->oper, do you want the oid of the pg_operator, or the oid of
+> > > the pg_proc assigned to the operator?
+> > >
+> > > Currently, I am giving you the oid of pg_operator.
+> >
+> > No! I need Oper nodes here. For "normal" operators the parser
+> > returns an Expr node with opType = OP_EXPR and the corresponding Oper
+> > in Node *oper. Nearly the same for SubLink: I need an Oper node
+> > for each pair of Var/Const from the left side and target entry from
+> > the subquery.
+> >
+> > Vadim
+> >
+>
+> OK, can I give you an Oper* for each field?
+
+Nice! But what's this:
+
+typedef struct SubLink
+{
+struct Query;
+^^^^^^^^^^^^^
+ NodeTag type;
+
+Vadim
+
+From vadim@sable.krasnoyarsk.su Mon Jan 19 01:34:39 1998
+Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24346
+ for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:34:33 -0500 (EST)
+Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
+ by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904;
+ Mon, 19 Jan 1998 13:37:42 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Sender: root@www.krasnet.ru
+Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su>
+Date: Mon, 19 Jan 1998 13:37:41 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: Re: subselects
+References: <199801190554.AAA23866@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+Bruce Momjian wrote:
+>
+> OK, I have added code to allow the SubLinks to make it to the optimizer.
+>
+> I implemented ParseState->parentParseState, but not parentQuery, because
+> parentParseState is much more valuable to me, and Vadim thought parentQuery
+> might be useful, but was not positive. Also, keeping that parentQuery
+> pointer valid through rewrite may be difficult, so I dropped it.
+> ParseState is only valid in the parser.
+>
+> I have not yet:
+>
+> handled correlated subquery column references
+> added Var->sublevels_up
+> gotten this to work in the rewrite system
+> added full CopyNode support
+>
+> I will address these in the next few days.
+
+Nice! I'm starting with non-correlated subqueries...
+
+Vadim
+
+From owner-pgsql-hackers@hub.org Wed Jan 21 04:00:59 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA14981
+ for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 04:00:56 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA02432 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 03:46:22 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id DAA12583; Wed, 21 Jan 1998 03:45:43 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 03:44:07 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id DAA12288 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 03:44:02 -0500 (EST)
+Received: from gandalf.sd.spardat.at (gandalf.telecom.at [194.118.26.84]) by hub.org (8.8.8/8.7.5) with ESMTP id DAA12263 for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 03:43:18 -0500 (EST)
+Received: from sdgtw.sd.spardat.at (sdgtw.sd.spardat.at [172.18.99.31])
+ by gandalf.sd.spardat.at (8.8.8/8.8.8) with ESMTP id JAA38408
+ for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 09:42:55 +0100
+Received: by sdgtw.sd.spardat.at with Internet Mail Service (5.0.1458.49)
+ id <DAF4ZATD>; Wed, 21 Jan 1998 09:42:55 +0100
+Message-ID: <219F68D65015D011A8E000006F8590C6010A51A2@sdexcsrv1.sd.spardat.at>
+From: Zeugswetter Andreas DBT <Andreas.Zeugswetter@telecom.at>
+To: "'pgsql-hackers@hub.org'" <pgsql-hackers@hub.org>
+Subject: [HACKERS] Re: subselects
+Date: Wed, 21 Jan 1998 09:42:52 +0100
+X-Priority: 3
+MIME-Version: 1.0
+X-Mailer: Internet Mail Service (5.0.1458.49)
+Content-Type: text/plain
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce wrote:
+> I have completed adding Var.varlevelsup, and have added code to the
+> parser to properly set the field. It will allow correlated references
+> in the WHERE clause, but not in the target list.
+
+select i2.ip1, i1.ip4 from nameip i1
+  where ip1 = (select ip1 from nameip i2);
+ 522: Table (i2) not selected in query.
+select i1.ip4 from nameip i1
+  where ip1 = (select i1.ip1 from nameip i2);
+ 284: A subquery has returned not exactly one row.
+select i1.ip4 from nameip i1
+  where ip1 = (select i1.ip1 from nameip i2 where name='zeus');
+ 2 row(s) retrieved.
+
+Informix allows correlated references in the target list. It also allows
+subselects in the target list as in:
+select i1.ip4, (select i1.ip1 from nameip i2) from nameip i1;
+ 284: A subquery has returned not exactly one row.
+select i1.ip4, (select i1.ip1 from nameip i2 where name='zeus')
+  from nameip i1;
+ 2 row(s) retrieved.
+
+Is this what you were looking for?
+
+Andreas
+
+
+From owner-pgsql-hackers@hub.org Wed Jan 21 05:31:02 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA15884
+ for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:31:01 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id FAA04709 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:16:16 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id FAA05191; Wed, 21 Jan 1998 05:15:42 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 05:14:02 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id FAA04951 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 05:13:57 -0500 (EST)
+Received: from dune.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id FAA04610 for <hackers@postgreSQL.org>; Wed, 21 Jan 1998 05:12:18 -0500 (EST)
+Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86])
+ by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id RAA01918;
+ Wed, 21 Jan 1998 17:10:24 +0700 (KRS)
+ (envelope-from vadim@sable.krasnoyarsk.su)
+Message-ID: <34C5C98E.3E085F52@sable.krasnoyarsk.su>
+Date: Wed, 21 Jan 1998 17:10:22 +0700
+From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
+Organization: ITTS (Krasnoyarsk)
+X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
+MIME-Version: 1.0
+To: Bruce Momjian <maillist@candle.pha.pa.us>
+CC: PostgreSQL-development <hackers@postgreSQL.org>
+Subject: [HACKERS] Re: subselects
+References: <199801210324.WAA02161@candle.pha.pa.us>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+Bruce Momjian wrote:
+>
+> We are only going to have subselects in the WHERE clause, not in the
+> target list, right?
+>
+> The standard says we can have them in either place, but I didn't think we
+> were implementing the target list subselects.
+>
+> Is that correct?
+
+Yes, this is right for 6.3. I hope that we'll support subselects in the
+target list, FROM, etc. in the future.
+
+BTW, I'm going to implement subselects in the (let's say) "natural" way:
+not by substituting parent query relations into the subselect and so on,
+but by executing (correlated) subqueries for each upper query row
+(maybe with caching of results in a hash table for better performance).
+This is certainly a much cleaner way, and it is much clearer how to do it.
+It resembles the SQL-function way, but functions start/run/stop the
+Executor each time they are called, and this hurts performance.
+
+Vadim
+
+
+From owner-pgsql-hackers@hub.org Wed Jan 21 10:02:02 1998
+Received: from hub.org (hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA20456
+ for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 10:02:01 -0500 (EST)
+Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA06778; Wed, 21 Jan 1998 10:02:13 -0500 (EST)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 10:00:41 -0500 (EST)
+Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA06544 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 10:00:37 -0500 (EST)
+Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA06326 for <pgsql-hackers@postgresql.org>; Wed, 21 Jan 1998 10:00:03 -0500 (EST)
+Received: from insightdist.com (nobody@localhost)
+ by u1.abs.net (8.8.5/8.8.5) with UUCP id JAA08009
+ for pgsql-hackers@postgresql.org; Wed, 21 Jan 1998 09:40:29 -0500 (EST)
+X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!darrenk using -f
+Received: by insightdist.com (AIX 3.2/UCB 5.64/4.03)
+ id AA33174; Wed, 21 Jan 1998 09:26:09 -0500
+Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
+ id AA36452; Wed, 21 Jan 1998 09:13:05 -0500
+Date: Wed, 21 Jan 1998 09:13:05 -0500
+From: darrenk@insightdist.com (Darren King)
+Message-Id: <9801211413.AA36452@ceodev>
+To: pgsql-hackers@postgreSQL.org
+Subject: Re: [HACKERS] subselects
+Mime-Version: 1.0
+Content-Type: text/plain; charset=US-ASCII
+Content-Transfer-Encoding: 7bit
+Content-Md5: 4wI6dUsUAXei+yg3JycjGw==
+Sender: owner-pgsql-hackers@hub.org
+Precedence: bulk
+Status: OR
+
+> We are only going to have subselects in the WHERE clause, not in the
+> target list, right?
+>
+> The standard says we can have them in either place, but I didn't think we
+> were implementing the target list subselects.
+>
+> Is that correct?
+
+What about the HAVING clause? It is currently not in, but someone here wants
+to take a stab at it.
+
+It doesn't seem that tough: loop over the tuples returned from the GROUP BY
+node and check an expression such as "x > 5" or "x = (subselect)".
+
+The cost analysis in the optimizer could be tricky, come to think of it.
+If a subselect has a HAVING, we would need a formula to determine
+its selectivity. Hmmm...
+
+darrenk
+
+