From da0e6bfeaf4dfe3dc760b3dcb2cd4c9812864d9c Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Mon, 29 Jan 2001 17:52:47 +0000 Subject: [PATCH] Remove subquery. --- doc/TODO | 1 - doc/TODO.detail/subquery | 9706 -------------------------------------- 2 files changed, 9707 deletions(-) delete mode 100644 doc/TODO.detail/subquery diff --git a/doc/TODO b/doc/TODO index 2fc015adda..443b2f28f7 100644 --- a/doc/TODO +++ b/doc/TODO @@ -292,7 +292,6 @@ MISC * -Make oid use oidin/oidout not int4in/int4out in pg_type.h (Tom) * Improve Subplan list handling * Allow Subplans to use efficient joins(hash, merge) with upper variable - [subquery] * -use fmgr_info()/fmgr_faddr() instead of fmgr() calls in high-traffic places, like GROUP BY, UNIQUE, index processing, etc. * improve dynamic memory allocation by introducing tuple-context memory diff --git a/doc/TODO.detail/subquery b/doc/TODO.detail/subquery deleted file mode 100644 index cdc55c8580..0000000000 --- a/doc/TODO.detail/subquery +++ /dev/null @@ -1,9706 +0,0 @@ -From vadim@krs.ru Fri Aug 6 00:02:02 1999 -Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA22890 - for ; Fri, 6 Aug 1999 00:02:00 -0400 (EDT) -Received: from krs.ru (dune.krs.ru [195.161.16.38]) - by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id MAA23302; - Fri, 6 Aug 1999 12:01:59 +0800 (KRSS) -Sender: root@sunpine.krs.ru -Message-ID: <37AA5E35.66C03F2E@krs.ru> -Date: Fri, 06 Aug 1999 12:01:57 +0800 -From: Vadim Mikheev -Organization: OJSC Rostelecom (Krasnoyarsk) -X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386) -X-Accept-Language: ru, en -MIME-Version: 1.0 -To: Bruce Momjian -CC: Tom Lane , pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries -References: <199908060331.XAA22277@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: RO - -Bruce Momjian wrote: -> -> Isn't it something that 
takes only a few hours to implement. We can't -> keep telling people to us EXISTS, especially because most SQL people -> think correlated queries are slower that non-correlated ones. Can we -> just on-the-fly rewrite the query to use exists? - -This seems easy to implement. We could look does subquery have -aggregates or not before calling union_planner() in -subselect.c:_make_subplan() and rewrite it (change -slink->subLinkType from IN to EXISTS and add quals). - -Without caching implemented IN-->EXISTS rewriting always -has sence. - -After implementation of caching we probably should call union_planner() -for both original/modified subqueries and compare costs/sizes -of EXISTS/IN_with_caching plans and maybe even make -decision what plan to use after parent query is planned -and we know for how many parent rows subplan will be executed. - -Vadim - -From tgl@sss.pgh.pa.us Fri Aug 6 00:15:23 1999 -Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA23058 - for ; Fri, 6 Aug 1999 00:15:22 -0400 (EDT) -Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1]) - by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id AAA06786; - Fri, 6 Aug 1999 00:14:50 -0400 (EDT) -To: Bruce Momjian -cc: Vadim Mikheev , pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries -In-reply-to: Your message of Thu, 5 Aug 1999 23:31:01 -0400 (EDT) - <199908060331.XAA22277@candle.pha.pa.us> -Date: Fri, 06 Aug 1999 00:14:50 -0400 -Message-ID: <6783.933912890@sss.pgh.pa.us> -From: Tom Lane -Status: RO - -Bruce Momjian writes: -> Isn't it something that takes only a few hours to implement. We can't -> keep telling people to us EXISTS, especially because most SQL people -> think correlated queries are slower that non-correlated ones. Can we -> just on-the-fly rewrite the query to use exists? - -I was just about to suggest exactly that. 
The "IN (subselect)" -notation seems to be a lot more intuitive --- at least, people -keep coming up with it --- so why not rewrite it to the EXISTS -form, if we can handle that more efficiently? - - regards, tom lane - -From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996 -Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for ; Thu, 5 Dec 1996 10:30:43 -0500 (EST) -Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST) -Received: by aixssd (AIX 3.2/UCB 5.64/4.03) - id AA36963; Thu, 5 Dec 1996 10:10:24 -0500 -Received: by ceodev (AIX 4.1/UCB 5.64/4.03) - id AA34942; Thu, 5 Dec 1996 10:07:56 -0500 -Date: Thu, 5 Dec 1996 10:07:56 -0500 -From: aixssd!darrenk@abs.net (Darren King) -Message-Id: <9612051507.AA34942@ceodev> -To: maillist@candle.pha.pa.us -Subject: Subselect info. -Mime-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Content-Md5: jaWdPH2KYtdr7ESzqcOp5g== -Status: OR - -> Any of them deal with implementing subselects? - -There's a white paper at the www.sybase.com that might -help a little. It's just a copy of a presentation -given by the optimizer guru there. Nothing code-wise, -but he gives a few ways of flattening them with temp -tables, etc... - -Darren - -From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109 - for ; Thu, 21 Aug 1997 23:42:43 -0400 (EDT) -Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD) -Sender: root@www.krasnet.ru -Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> -Date: Fri, 22 Aug 1997 12:04:31 +0800 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: subselects -References: <199708220219.WAA23745@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Considering the complexity of the primary/secondary changes you are -> making, I believe subselects will be easier than that. - -I don't do changes for P/F keys - just thinking... -Yes, I think that impl of referential integrity is -more complex work. - -As for subselects: - -in plannodes.h - -typedef struct Plan { -... - struct Plan *lefttree; - struct Plan *righttree; -} Plan; - -/* ---------------- - * these are are defined to avoid confusion problems with "left" - ^^^^^^^^^^^^^^^^^^ - * and "right" and "inner" and "outer". The convention is that - * the "left" plan is the "outer" plan and the "right" plan is - * the inner plan, but these make the code more readable. - * ---------------- - */ -#define innerPlan(node) (((Plan *)(node))->righttree) -#define outerPlan(node) (((Plan *)(node))->lefttree) - -First thought is to avoid any confusion by re-defining - -#define rightPlan(node) (((Plan *)(node))->righttree) -#define leftPlan(node) (((Plan *)(node))->lefttree) - -and change all occurrences of 'outer' & 'inner' in code -to 'left' & 'right' ones: - -this will allow to use 'outer' & 'inner' things for subselects -later, without confusion. My hope is that we may change Executor -very easy by adding outer/inner plans/TupleSlots to -EState, CommonState, JoinState, etc and by doing node -processing in right order. - -Subselects are mostly Planner problem. - -Unfortunately, I haven't time at the moment: CHECK/DEFAULT... 
- -Vadim - -From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354 - for ; Fri, 22 Aug 1997 00:00:51 -0400 (EDT) -Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD) -Sender: root@www.krasnet.ru -Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su> -Date: Fri, 22 Aug 1997 12:22:37 +0800 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: subselects -References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Vadim B. Mikheev wrote: -> -> this will allow to use 'outer' & 'inner' things for subselects -> later, without confusion. My hope is that we may change Executor - -Or maybe use 'high' & 'low' for subselects (to avoid confusion -with outer joins). - -> very easy by adding outer/inner plans/TupleSlots to -> EState, CommonState, JoinState, etc and by doing node -> processing in right order. - ^^^^^^^^^^^^^^ -Rule is easy: -1. Uncorrelated subselect - do 'low' plan node first -2. Correlated - do left/right first - -- just some flag in structures. 
- -Vadim - -From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682 - for ; Thu, 30 Oct 1997 17:02:28 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for ; Thu, 30 Oct 1997 16:57:54 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726 - for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST) -From: Bruce Momjian -Message-Id: <199710302150.QAA07726@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -The only thing I have to add to what I had written earlier is that I -think it is best to have these subqueries executed as early in query -execution as possible. - -Every piece of the backend: parser, optimizer, executor, is designed to -work on a single query. The earlier we can split up the queries, the -better those pieces will work at doing their job. You want to be able -to use the parser and optimizer on each part of the query separately, if -you can. - - -Forwarded message: -> I have done some thinking about subselects. There are basically two -> issues: - > -> Does the query return one row or several rows? 
This can be -> determined by seeing if the user uses equals on 'IN' to join the -> subquery. -> -> Is the query correlated, meaning "Does the subquery reference -> values from the outer query?" -> -> (We already have the third type of subquery, the INSERT...SELECT query.) -> -> So we have these four combinations: -> -> 1) one row, no correlation -> 2) multiple rows, no correlation -> 3) one row, correlated -> 4) multiple rows, correlated -> -> -> With #1, we can execute the subquery, get the value, replace the -> subquery with the constant returned from the subquery, and execute the -> outer query. -> -> With #2, we can execute the subquery and put the result into a temporary -> table. We then rewrite the outer query to access the temporary table -> and replace the subquery with the column name from the temporary table. -> We probabally put an index on the temp. table, which has only one -> column, because a subquery can only return one column. We remove the -> temp. table after query execution. -> -> With #3 and #4, we potentially need to execute the subquery for every -> row returned by the outer query. Performance would be horrible for -> anything but the smallest query. Another way to handle this is to -> execute the subquery WITHOUT using any of the outer-query columns to -> restrict the WHERE clause, and add those columns used to join the outer -> variables into the target list of the subquery. 
So for query: -> -> select t1.name -> from tab t1 -> where t1.age = (select max(t2.age) -> from tab2 -> where tab2.name = t1.name) -> -> Execute the subquery and put it in a temporary table: -> -> select t2.name, max(t2.age) -> into table temp999 -> from tab2 -> where tab2.name = t1.name -> -> create index i_temp999 on temp999 (name) -> -> Then re-write the outer query: -> -> select t1.name -> from tab t1, temp999 -> where t1.age = temp999.age and -> t1.name = temp999.name -> -> The only problem here is that the subselect is running for all entries -> in tab2, even if the outer query is only going to need a few rows. -> Determining whether to execute the subquery each time, or create a temp. -> table is often difficult to determine. Even some non-correlated -> subqueries are better to execute for each row rather the pre-execute the -> entire subquery, expecially if the outer query returns few rows. -> -> One requirement to handle these issues is better column statistics, -> which I am working on. 
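The temp-table rewrite described in the forwarded message can be tried out as a minimal sketch. It uses Python's sqlite3 module rather than the PostgreSQL backend, and the row values are invented for illustration. Two things here are assumptions rather than the message's own text: the subquery is given an explicit `t2` alias, and the temp-table query drops the reference to `t1` and groups by `name`, which is one reading of "execute the subquery WITHOUT using any of the outer-query columns".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab  (name TEXT, age INTEGER);
    CREATE TABLE tab2 (name TEXT, age INTEGER);
    INSERT INTO tab  VALUES ('ann', 30), ('ann', 25), ('bob', 40), ('cy', 9);
    INSERT INTO tab2 VALUES ('ann', 30), ('ann', 20), ('bob', 41);
""")

# Correlated form: the subquery is conceptually re-run for every row of tab.
correlated = conn.execute("""
    SELECT t1.name
    FROM tab t1
    WHERE t1.age = (SELECT max(t2.age) FROM tab2 t2 WHERE t2.name = t1.name)
    ORDER BY t1.name
""").fetchall()

# Rewrite: run the subquery once with no reference to t1, carrying the
# join column (name) in the target list, then index the result.
# A plain table stands in for the temporary table of the message.
conn.executescript("""
    CREATE TABLE temp999 AS
        SELECT t2.name AS name, max(t2.age) AS age
        FROM tab2 t2
        GROUP BY t2.name;
    CREATE INDEX i_temp999 ON temp999 (name);
""")

rewritten = conn.execute("""
    SELECT t1.name
    FROM tab t1, temp999
    WHERE t1.age = temp999.age AND t1.name = temp999.name
    ORDER BY t1.name
""").fetchall()

print(correlated, rewritten)
conn.execute("DROP TABLE temp999")  # removed again after the query, as described
```

On this data both forms select the same rows. As the message notes, the grouped query still scans all of tab2 even when the outer query needs only a few names.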
-> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643 - for ; Fri, 31 Oct 1997 22:30:56 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for ; Fri, 31 Oct 1997 22:06:08 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for ; Fri, 31 Oct 1997 22:00:53 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566; - Fri, 31 Oct 1997 21:37:06 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711010237.VAA14566@candle.pha.pa.us> -Subject: Re: [HACKERS] subselects -To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -One more issue I thought of. You can have multiple subselects in a -single query, and subselects can have their own subselects. - -This makes it particularly important that we define a system that always -is able to process the subselect BEFORE the upper select. This will -allow use to handle all these cases without limitations. 
- -> -> The only thing I have to add to what I had written earlier is that I -> think it is best to have these subqueries executed as early in query -> execution as possible. -> -> Every piece of the backend: parser, optimizer, executor, is designed to -> work on a single query. The earlier we can split up the queries, the -> better those pieces will work at doing their job. You want to be able -> to use the parser and optimizer on each part of the query separately, if -> you can. -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From hannu@trust.ee Sun Nov 2 10:33:33 1997 -Received: from sid.trust.ee (sid.trust.ee [194.204.23.180]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619 - for ; Sun, 2 Nov 1997 10:32:04 -0500 (EST) -Received: from sid.trust.ee (wink.trust.ee [194.204.23.184]) - by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233; - Sun, 2 Nov 1997 17:30:11 +0200 -Message-ID: <345C9BFD.986C68AA@sid.trust.ee> -Date: Sun, 02 Nov 1997 17:27:57 +0200 -From: Hannu Krosing -X-Mailer: Mozilla 4.02 [en] (Win95; I) -MIME-Version: 1.0 -To: hackers-digest@postgresql.org -CC: maillist@candle.pha.pa.us -Subject: Re: [HACKERS] subselects -References: <199711010401.XAA09216@hub.org> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) -> From: Bruce Momjian -> Subject: Re: [HACKERS] subselects -> -> One more issue I thought of. You can have multiple subselects in a -> single query, and subselects can have their own subselects. -> -> This makes it particularly important that we define a system that always -> is able to process the subselect BEFORE the upper select. This will -> allow use to handle all these cases without limitations. 
- -This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a -search criteria for the subselect, -for example you can't do - -update parts p1 -set parts.current_id = ( - select new_id - from parts p2 - where p1.old_id = p2.new_id);or - -select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice -from parts p1; - -there may be of course ways to rewrite these queries (which the optimiser should do -if it can) but IMHO, these kinds of subselects should still be allowed - -> > The only thing I have to add to what I had written earlier is that I -> > think it is best to have these subqueries executed as early in query -> > execution as possible. -> > -> > Every piece of the backend: parser, optimizer, executor, is designed to -> > work on a single query. The earlier we can split up the queries, the -> > better those pieces will work at doing their job. You want to be able -> > to use the parser and optimizer on each part of the query separately, if -> > you can. -> > -> - -Hannu - - -From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831 - for ; Sun, 2 Nov 1997 21:30:57 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for ; Sun, 2 Nov 1997 21:20:13 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su> -Date: Mon, 03 Nov 1997 09:22:38 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199711021848.NAA08319@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > One more issue I thought of. You can have multiple subselects in a -> > > single query, and subselects can have their own subselects. -> > > -> > > This makes it particularly important that we define a system that always -> > > is able to process the subselect BEFORE the upper select. This will -> > > allow use to handle all these cases without limitations. -> > -> > This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a -> > search criteria for the subselect, -> > for example you can't do -> > -> > update parts p1 -> > set parts.current_id = ( -> > select new_id -> > from parts p2 -> > where p1.old_id = p2.new_id);or -> > -> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice -> > from parts p1; -> > -> > there may be of course ways to rewrite these queries (which the optimiser should do -> > if it can) but IMHO, these kinds of subselects should still be allowed -> -> I hadn't even gotten to this point yet, but it is a good thing to keep -> in mind. -> -> In these cases, as in correlated subqueries in the where clause, we will -> create a temporary table, and add the proper join fields and tables to -> the clauses. Our version of UPDATE accepts a FROM section, and we will -> certainly use this for this purpose. - -We can't replace subselect with join if there is aggregate -in subselect. 
- -Actually, I don't see any problems if we going to process subselect -like sql-funcs: non-correlated subselects can be emulated by -funcs without args, for correlated subselects parser (analyze.c) -has to change all upper query references to $1, $2,... - -Vadim - -From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433 - for ; Mon, 3 Nov 1997 06:07:03 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su> -Date: Mon, 03 Nov 1997 18:09:43 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselects -References: <199711030316.WAA15401@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > > In these cases, as in correlated subqueries in the where clause, we will -> > > create a temporary table, and add the proper join fields and tables to -> > > the clauses. Our version of UPDATE accepts a FROM section, and we will -> > > certainly use this for this purpose. -> > -> > We can't replace subselect with join if there is aggregate -> > in subselect. -> -> I got lost here. Why can't we handle aggregates? - -Sorry, I missed using of temp tables. Sybase uses joins (without -temp tables) for non-correlated subqueries: - - A noncorrelated subquery can be evaluated as if it were an independent query. - Conceptually, the results of the subquery are substituted in the main statement, or - outer query. This is not how SQL Server actually processes statements with - subqueries. 
Noncorrelated subqueries can be alternatively stated as joins and - are processed as joins by SQL Server. - -but this is not possible if there are aggregates in subquery. - -> -> My idea was this. This is a non-correlated subquery. -... -No problems with it... - -> -> Here is a correlated example: -> -> select * -> from table_a -> where table_a.col_a in (select table_b.col_b -> from table_b -> where table_b.col_b = table_a.col_c) -> -> rewrite as: -> -> select distinct table_b.col_b, table_a.col_c -- the distinct is needed -> into table_sub -> from table_a, table_b - -First, could we add 'where table_b.col_b = table_a.col_c' here ? -Just to avoid Cartesian results ? I hope we can. - -Note that for query - - select * - from table_a - where table_a.col_a in (select table_b.col_b * table_a.col_c - from table_b) - -it's better to do - - select distinct table_a.col_a - into table table_sub - from table_b, table_a - where table_a.col_a = table_b.col_b * table_a.col_c - -once again - to avoid Cartesians. - -But what could we do for - - select * - from table_a - where table_a.col_a = (select max(table_b.col_b * table_a.col_c) - from table_b) -??? - select max(table_b.col_b * table_a.col_c), table_a.col_a - into table table_sub - from table_b, table_a - group by table_a.col_a - -first tries to sort sizeof(table_a) * sizeof(table_b) tuples... -For tables big and small with 100 000 and 1000 tuples - -select max(x*y), x from big, small group by x - -"ate" all free 140M in my file system after 20 minutes (just for -sorting - nothing more) and was killed... - -select x from big where x = cor(x); -(cor(int4) is 'select max($1*y) from small') takes 20 minutes - -this is bad too. - -> > -> > Actually, I don't see any problems if we going to process subselect -> > like sql-funcs: non-correlated subselects can be emulated by -> > funcs without args, for correlated subselects parser (analyze.c) -> > has to change all upper query references to $1, $2,... 
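The "subselect as SQL function" idea quoted above can be imitated outside the backend as a rough sketch. It uses Python's sqlite3 module; the helper `cor` and the row values are hypothetical stand-ins, with the `?` placeholder playing the role of `$1`. It only illustrates the per-outer-row evaluation being discussed and says nothing about how analyze.c would do the substitution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big   (x INTEGER);
    CREATE TABLE small (y INTEGER);
    INSERT INTO big   VALUES (-1), (0), (1), (2);
    INSERT INTO small VALUES (1), (2);
""")

def cor(x):
    """Stand-in for the SQL function 'select max($1*y) from small'."""
    return conn.execute("SELECT max(? * y) FROM small", (x,)).fetchone()[0]

# Function style: fetch the outer rows, then evaluate the subquery once per row.
outer_rows = [row[0] for row in conn.execute("SELECT x FROM big ORDER BY x").fetchall()]
via_function = [x for x in outer_rows if x == cor(x)]

# The same predicate written as a correlated subselect.
via_subquery = [row[0] for row in conn.execute("""
    SELECT x FROM big
    WHERE x = (SELECT max(big.x * y) FROM small)
    ORDER BY x
""")]

print(via_function, via_subquery)
```

The number of subquery executions equals the number of outer rows, which is the cost the thread is worried about for large tables.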
-> -> Yes, logically, they are SQL functions, but aren't we going to see -> terrible performance in such circumstances. My experience is that when - ^^^^^^^^^^^^^^^^^^^^ -You're right. - -> people are given subselects, they start to do huge jobs with them. -> -> In fact, the final solution may be to have both methods available, and -> switch between them depending on the size of the query sets. Each -> method has its advantages. The function example lets the outside query -> be executed, and only calls the subquery when needed. -> -> For large tables where the subselect is small and is the entire WHERE -> restriction, the SQL function gets call much too often. A simple join -> of the subquery result and the large table would be much better. This -> method also allows for sort/merge join of the subquery results, and -> index use. - -...keep thinking... - -Vadim - -From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633 - for ; Mon, 3 Nov 1997 11:00:59 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for ; Mon, 3 Nov 1997 10:49:42 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for ; Mon, 3 Nov 1997 10:31:23 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262; - Mon, 3 Nov 1997 10:25:34 -0500 (EST) -From: Bruce Momjian -Message-Id: 
<199711031525.KAA02262@candle.pha.pa.us> -Subject: Re: [HACKERS] subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> Sorry, I missed using of temp tables. Sybase uses joins (without -> temp tables) for non-correlated subqueries: -> -> A noncorrelated subquery can be evaluated as if it were an independent query. -> Conceptually, the results of the subquery are substituted in the main statement, or -> outer query. This is not how SQL Server actually processes statements with -> subqueries. Noncorrelated subqueries can be alternatively stated as joins and -> are processed as joins by SQL Server. -> -> but this is not possible if there are aggregates in subquery. -> -> > -> > My idea was this. This is a non-correlated subquery. -> ... -> No problems with it... -> -> > -> > Here is a correlated example: -> > -> > select * -> > from table_a -> > where table_a.col_a in (select table_b.col_b -> > from table_b -> > where table_b.col_b = table_a.col_c) -> > -> > rewrite as: -> > -> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed -> > into table_sub -> > from table_a, table_b -> -> First, could we add 'where table_b.col_b = table_a.col_c' here ? -> Just to avoid Cartesian results ? I hope we can. - -Yes, of course. I forgot that line here. We can also be fancy and move -some of the outer where restrictions on table_a into the subquery. 
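The correlated IN rewrite quoted above, with the join condition added to the table_sub query, can be checked on toy data. This is a sketch using Python's sqlite3 module with invented rows. The final join of table_a against table_sub is an assumption, since that part of the rewrite is not shown in the quoted text. The helper also runs the rewrite without DISTINCT to show why the comment says it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col_a INTEGER, col_c INTEGER);
    CREATE TABLE table_b (col_b INTEGER);
    INSERT INTO table_a VALUES (1, 1), (2, 3), (3, 3), (4, 4);
    INSERT INTO table_b VALUES (1), (1), (3);
""")

original = conn.execute("""
    SELECT * FROM table_a
    WHERE table_a.col_a IN (SELECT table_b.col_b FROM table_b
                            WHERE table_b.col_b = table_a.col_c)
    ORDER BY col_a
""").fetchall()

def rewrite(distinct):
    """Build table_sub (optionally with DISTINCT) and join it back to table_a."""
    keyword = "DISTINCT" if distinct else ""
    conn.executescript(f"""
        DROP TABLE IF EXISTS table_sub;
        CREATE TABLE table_sub AS
            SELECT {keyword} table_b.col_b AS col_b, table_a.col_c AS col_c
            FROM table_a, table_b
            WHERE table_b.col_b = table_a.col_c;
    """)
    # Assumed outer query: join on both carried columns.
    return conn.execute("""
        SELECT table_a.col_a, table_a.col_c
        FROM table_a, table_sub
        WHERE table_a.col_a = table_sub.col_b
          AND table_a.col_c = table_sub.col_c
        ORDER BY table_a.col_a
    """).fetchall()

with_distinct = rewrite(distinct=True)
without_distinct = rewrite(distinct=False)
print(original, with_distinct, without_distinct)
```

With DISTINCT the join returns the same rows as the IN form. Without it, duplicate pairs in table_sub multiply the outer rows.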
- -I think the classic subquery for this would be if someone wanted all -customer names that had invoices in the past month: - -select custname -from customer -where custid in (select order.custid - from order - where order.date >= "09/01/97" and - order.date <= "09/30/97" - -In this case, the subquery can use an index on 'date' to quickly -evaluate the query, and the resulting temp table can quickly be joined -to the customer table. If we used SQL functions, every customer would -have an order query evaluated for it, and there may be no multi-column -index on customer and date, or even if there is, this could be many -query executions. - - -> -> Note that for query -> -> select * -> from table_a -> where table_a.col_a in (select table_b.col_b * table_a.col_c -> from table_b) -> -> it's better to do -> -> select distinct table_a.col_a -> into table table_sub -> from table_b, table_a -> where table_a.col_a = table_b.col_b * table_a.col_c - -Yes, I had not thought of cases where they are doing correlated column -arithmetic, but it looks like this would work. - -> -> once again - to avoid Cartesians. -> -> But what could we do for -> -> select * -> from table_a -> where table_a.col_a = (select max(table_b.col_b * table_a.col_c) -> from table_b) - -OK, who wrote this horrible query. :-) - -Without a join of table_b and table_a, even an SQL function would die on -this. You have to take the current value table_a.col_c, and multiply by -every value of table_b.col_b to get the maximum. - -Trying to do a temp table on this is certainly going to be a cartesian -product, but using an SQL function is also going to be a cartesian -product, except that the product is generated in small pieces instead of -in one big query. The SQL function example may eventually complete, but -it will take forever to do so in cases where the temp table would bomb. - -I can recommend some SQL books for anyone go sends in a bug report on -this query. :-) - - - -> ??? 
-> select max(table_b.col_b * table_a.col_c), table_a.col_a -> into table table_sub -> from table_b, table_a -> group by table_a.col_a -> -> first tries to sort sizeof(table_a) * sizeof(table_b) tuples... -> For tables big and small with 100 000 and 1000 tuples -> -> select max(x*y), x from big, small group by x -> -> "ate" all free 140M in my file system after 20 minutes (just for -> sorting - nothing more) and was killed... -> -> select x from big where x = cor(x); -> (cor(int4) is 'select max($1*y) from small') takes 20 minutes - -> this is bad too. - -Again, my feeling is that in cases where the temp table would bomb, the -SQL function will be so slow that neither will be acceptable. - -> -> > > -> > > Actually, I don't see any problems if we going to process subselect -> > > like sql-funcs: non-correlated subselects can be emulated by -> > > funcs without args, for correlated subselects parser (analyze.c) -> > > has to change all upper query references to $1, $2,... -> > -> > Yes, logically, they are SQL functions, but aren't we going to see -> > terrible performance in such circumstances. My experience is that when -> ^^^^^^^^^^^^^^^^^^^^ -> You're right. -> -> > people are given subselects, they start to do huge jobs with them. -> > -> > In fact, the final solution may be to have both methods available, and -> > switch between them depending on the size of the query sets. Each -> > method has its advantages. The function example lets the outside query -> > be executed, and only calls the subquery when needed. -> > -> > For large tables where the subselect is small and is the entire WHERE -> > restriction, the SQL function gets call much too often. A simple join -> > of the subquery result and the large table would be much better. This -> > method also allows for sort/merge join of the subquery results, and -> > index use. -> -> ...keep thinking... 
-> -> Vadim -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239 - for ; Thu, 20 Nov 1997 00:09:11 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for ; Wed, 19 Nov 1997 23:58:16 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103 - for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711200457.XAA03103@candle.pha.pa.us> -Subject: [HACKERS] subselect -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -I am going to overhaul all the /parser files, and I may give subselects -a try while I am in there. This is where it is going to have to be done. - -Two things I think I need are: - - temp tables that go away at the end of a statement, so if the -query elog's out, the temp file gets destroyed - - how do I implement "not in": - - select * from a where x not in (select y from b) - -Using <> is not going to work because that returns multiple copies of a, -one for every row of b that doesn't equal. It is like we need not equals, -but don't return multiple rows. - -Any ideas?
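
The "not in" problem described above can be tried out with a small sketch. This is only an illustration using Python's sqlite3 module, not anything from the backend; the tables a(x) and b(y) follow the query in the message, and the sample rows are made up for the example.

```python
import sqlite3

# Hypothetical sample data for tables a(x) and b(y); the values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (4);
""")

# Joining on <> pairs each row of a with every row of b it differs from,
# so rows of a come back several times, and x = 2 still shows up.
joined = conn.execute(
    "SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x"
).fetchall()

# NOT IN keeps a row of a only when no row of b equals it.
not_in = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x"
).fetchall()

# The same answer written as an anti-join with NOT EXISTS
# (equivalent here because the sample data contains no NULLs).
not_exists = conn.execute(
    "SELECT x FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x) ORDER BY x"
).fetchall()

print(joined)      # duplicates, and includes 2
print(not_in)      # each qualifying row once
print(not_exists)
```

The <> join both duplicates rows and lets through a value that is present in b, which is why a plain "not equals" join cannot stand in for NOT IN.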
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019 - for ; Thu, 20 Nov 1997 10:00:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for ; Thu, 20 Nov 1997 09:52:55 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754; - Thu, 20 Nov 1997 06:27:21 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <3473D849.16F67A2A@alumni.caltech.edu> -Date: Thu, 20 Nov 1997 06:27:21 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselect -References: <199711200457.XAA03103@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> I am going to overhaul all the /parser files - -?? - -> , and I may give subselects -> a try while I am in there. This is where it going to have to be done. - -A first cut at the subselect syntax is already in gram.y. I'm sure that the -e-mail you had sent which collected several items regarding subselects -covers some of this topic. I've been thinking about subselects also, and -had thought that there must be some existing mechanisms in the backend -which can be used to help implement subselects. It seems to me that UNION -might be a good thing to implement first, because it has a fairly -well-defined set of behaviors: - - select a union select b; - -chooses elements from a and from b and then sorts/uniques the result. - - select a union all select b; - -chooses elements from a, sorts/uniques, and then adds all elements from b. 
- - select a union select b union all select c; - -evaluates left to right, and first evaluates a union b, sorts/uniques, and -then evaluates - - (result) union all select c; - -There are several types of subselects. Examples of some are: - -1) select a.f from a union select b.f from b order by 1; -Needs temporary table(s), optional sort/unique, final order by. - -2) select a.f from a where a.f in (select b.f from b); -Needs temporary table(s). "in" can be first implemented by count(*) > 0 but -would be better performance to have the backend return after the first -match. - -3) select a.f from a where exists (select b.f from b where b.f = a); -Need to do the select and do a subselect on _each_ of the returned values? -Again could use count(*) to help implement. - -This brings up the point that perhaps the backend needs a row-counting -atomic operation and count(*) could be re-implemented using that. At the -moment count(*) is transformed to a select of OID columns and does not -quite work on table joins. - -I would think that outer joins could use some of these support routines -also. - - - Tom - -> Two things I think I need are: -> -> temp tables that go away at the end of a statement, so if the -> query elog's out, the temp file gets destroyed -> -> how do I implement "not in": -> -> select * from a where x not in (select y from b) -> -> Using <> is not going to work because that returns multiple copies of a, -> one for every one that doesn't equal. It is like we need not equals, -> but don't return multiple rows. -> -> Any ideas? 
-> -> -- -> Bruce Momjian -> maillist@candle.pha.pa.us - - - - -From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311 - for ; Mon, 22 Dec 1997 00:49:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for ; Mon, 22 Dec 1997 00:44:57 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605 - for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712220545.AAA11605@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -OK, a few questions: - - Should we use sortmerge, so we can use our psort as temp tables, -or do we use hashunique? - - How do we pass the query to the optimizer? How do we represent -the range table for each, and the links between them in correlated -subqueries? - -I have to think about this. Comments are welcome. 
--- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608 - for ; Mon, 22 Dec 1997 02:01:25 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for ; Mon, 22 Dec 1997 01:37:29 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for ; Mon, 22 Dec 1997 01:30:15 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354 - for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712220605.BAA17354@candle.pha.pa.us> -Subject: [HACKERS] subselects (fwd) -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Forwarded message: -> OK, a few questions: -> -> Should we use sortmerge, so we can use our psort as temp tables, -> or do we use hashunique? -> -> How do we pass the query to the optimizer? How do we represent -> the range table for each, and the links between them in correlated -> subqueries? -> -> I have to think about this. Comments are welcome. - -One more thing. I guess I am seeing subselects as a different thing -than temp tables.
I can see people wanting to put indexes on their temp -tables, so I think they will need more system catalog support. For -subselects, I think we can just stuff them into psort, perhaps, and do -the unique as we unload them. - -Seems like a natural to me. - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876 - for ; Tue, 23 Dec 1997 04:00:57 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042; - Tue, 23 Dec 1997 16:08:56 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su> -Date: Tue, 23 Dec 1997 16:08:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects (fwd) -References: <199712220605.BAA17354@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Forwarded message: -> > OK, a few questions: -> > -> > Should we use sortmerge, so we can use our psort as temp tables, -> > or do we use hashunique? -> > -> > How do we pass the query to the optimizer? How do we represent -> > the range table for each, and the links between them in correlated -> > subqueries? -> > -> > I have to think about this. Comments are welcome. -> -> One more thing. I guess I am seeing subselects as a different thing -> that temp tables. I can see people wanting to put indexes on their temp -> tables, so I think they will need more system catalog support. For - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -What's the difference between temp tables and temp indices ? 
-Both of them are handled via catalog cache... - -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565 - for ; Sat, 3 Jan 1998 04:00:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for ; Sat, 3 Jan 1998 03:47:07 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017; - Sat, 3 Jan 1998 16:08:55 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su> -Date: Sat, 03 Jan 1998 16:08:51 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian , - "Thomas G. Lockhart" -Subject: Re: subselects -References: <199712290516.AAA12579@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> With UNIONs done, how are things going with you on subselects? UNIONs -> are much easier that subselects. -> -> I am stumped on how to record the subselect query information in the -> parser and stuff. - - And I'm too. We definitely need in EXISTS node and may be in IN one. -Also, we have to support ANY and ALL modifiers of comparison operators -(it would be nice to support ANY and ALL for all operators returning -bool: >, =, ..., like, ~ and so on). 
Note, that IN is the same as -= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types, -and so, we could avoid IN node, but I'm not sure that I like such -assumption: postgres is OO-like system allowing operators to be overriden -and so, '=' can, in theory, mean not EQUAL but something else (someday -we could allow to specify "meaning" of operator in CREATE OPERATOR) - -in short, I would like IN node. - Also, I would suggest nodes for ANY and ALL. - (I need in few days to think more about recording of this stuff...) - -> -> Please let me know what I can do to help, if anything. - -Thanks. As I remember, Tom also wished to work here. Tom ? - -Bye, - Vadim - -P.S. I'll be "on-line" Jan 5. - -From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466 - for ; Mon, 5 Jan 1998 07:30:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for ; Mon, 5 Jan 1998 07:20:57 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278; - Mon, 5 Jan 1998 19:36:06 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> -Date: Mon, 05 Jan 1998 19:35:59 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselect -References: <199801050516.AAA28005@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> I was thinking about subselects, and how to attach the two queries. -> -> What if the subquery makes a range table entry in the outer query, and -> the query is set up like the UNION queries where we put the scans in a -> row, but in the case we put them over/under each other. -> -> And we push a temp table into the catalog cache that represents the -> result of the subquery, then we could join to it in the outer query as -> though it was a real table. -> -> Also, can't we do the correlated subqueries by adding the proper -> target/output columns to the subquery, and have the outer query -> reference those columns in the subquery range table entry. - -Yes, this is a way to handle subqueries by joining to temp table. -After getting plan we could change temp table access path to -node material. On the other hand, it could be useful to let optimizer -know about cost of temp table creation (have to think more about it)... -Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -is one example of this - joining by <> will give us invalid results. -Setting special NOT EQUAL flag is not enough: subquery plan must be -always inner one in this case. The same for handling ALL modifier. -Note, that we generaly can't use aggregates here: we can't add MAX to -subquery in the case of > ALL (subquery), because of > ALL should return FALSE -if subquery returns NULL(s) but aggregates don't take NULLs into account. - -> -> Maybe I can write up a sample of this? Vadim, would this help? Is this -> the point we are stuck at? 
- -Personally, I was stuck by holydays -:) -Now I can spend ~ 8 hours ~ each day for development... - -Vadim - - -From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769 - for ; Mon, 5 Jan 1998 10:45:28 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for ; Mon, 5 Jan 1998 10:31:06 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375; - Mon, 5 Jan 1998 10:28:48 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801051528.KAA10375@candle.pha.pa.us> -Subject: Re: [HACKERS] subselect -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> Yes, this is a way to handle subqueries by joining to temp table. -> After getting plan we could change temp table access path to -> node material. On the other hand, it could be useful to let optimizer -> know about cost of temp table creation (have to think more about it)... -> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -> is one example of this - joining by <> will give us invalid results. 
-> Setting special NOT EQUAL flag is not enough: subquery plan must be -> always inner one in this case. The same for handling ALL modifier. -> Note, that we generaly can't use aggregates here: we can't add MAX to -> subquery in the case of > ALL (subquery), because of > ALL should return FALSE -> if subquery returns NULL(s) but aggregates don't take NULLs into account. - -OK, here are my ideas. First, I think you have to handle subselects in -the outer node because a subquery could have its own subquery. Also, we -now have a field in Aggreg to allow us to 'usenulls'. - -OK, here it is. I recommend we pass the outer and subquery through -the parser and optimizer separately. - -We parse the subquery first. If the subquery is not correlated, it -should parse fine. If it is correlated, then for any column we find in the -subquery whose table is not already in the FROM list, we add the table to the -subquery FROM list, and add the referenced column to the target list of -the subquery. - -When we are finished parsing the subquery, we create a catalog cache -entry for it called 'sub1' and make its fields match the target -list of the subquery. - -In the outer query, we add 'sub1' to its target list, and change -the subquery reference to point to the new range table. We also add -WHERE clauses to do any correlated joins. - -Here is a simple example: - - select * - from taba - where col1 = (select col2 - from tabb) - -This is not correlated, and the subquery parses easily. We create a -'sub1' catalog cache entry, and add 'sub1' to the outer query FROM -clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'. - -Here is a more complex correlated subquery: - - select * - from taba - where col1 = (select col2 - from tabb - where taba.col3 = tabb.col4) - -Here we must add 'taba' to the subquery's FROM list, and add col3 to the -target list of the subquery.
After we parse the subquery, add 'sub1' to -the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = -sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. -The optimizer will do the correlation for us. - -In the optimizer, we can parse the subquery first, then the outer query, -and then replace all 'sub1' references in the outer query to use the -subquery plan. - -I realize merging the two plans and doing IN and NOT IN is the -real challenge, but I hoped this would give us a start. - -What do you think? - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690 - for ; Mon, 5 Jan 1998 15:02:44 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for ; Mon, 5 Jan 1998 14:28:43 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904; - Tue, 6 Jan 1998 02:56:00 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 02:55:57 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801051528.KAA10375@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > always inner one in this case. The same for handling ALL modifier.
-> > Note, that we generaly can't use aggregates here: we can't add MAX to -> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE -> > if subquery returns NULL(s) but aggregates don't take NULLs into account. -> -> OK, here are my ideas. First, I think you have to handle subselects in -> the outer node because a subquery could have its own subquery. Also, we - -I hope that this is no matter: if results of subquery (with/without sub-subqueries) -will go into temp table then this table will be re-scanned for each outer tuple. - -> now have a field in Aggreg to all us to 'usenulls'. - ^^^^^^^^ - This can't help: - -vac=> select * from x; -y -- -1 -2 -3 - <<< this is NULL -(4 rows) - -vac=> select max(y) from x; -max ---- - 3 - -==> we can't replace - -select * from A where A.a > ALL (select y from x); - ^^^^^^^^^^^^^^^ - (NULL will be returned and so A.a > ALL is FALSE - this is what - Sybase does, is it right ?) -with - -select * from A where A.a > (select max(y) from x); - ^^^^^^^^^^^^^^^^^^^^ -just because of we lose knowledge about NULLs here. - -Also, I would like to handle ANY and ALL modifiers for all bool -operators, either built-in or user-defined, for all data types - -isn't PostgreSQL OO-like RDBMS -:) - -> OK, here it is. I recommend we pass the outer and subquery through -> the parser and optimizer separately. - -I don't like this. I would like to get parse-tree from parser for -entire query and let optimizer (on upper level) decide how to rewrite -parse-tree and what plans to produce and how these plans should be -merged. Note, that I don't object your methods below, but only where -to place handling of this. I don't understand why should we add -new part to the system which will do optimizer' work (parse-tree --> -execution plan) and deal with optimizer nodes. Imho, upper optimizer -level is nice place to do this. - -> -> We parse the subquery first. If the subquery is not correlated, it -> should parse fine. 
If it is correlated, any columns we find in the -> subquery that are not already in the FROM list, we add the table to the -> subquery FROM list, and add the referenced column to the target list of -> the subquery. -> -> When we are finished parsing the subquery, we create a catalog cache -> entry for it called 'sub1' and make its fields match the target -> list of the subquery. -> -> In the outer query, we add 'sub1' to its target list, and change -> the subquery reference to point to the new range table. We also add -> WHERE clauses to do any correlated joins. -... -> Here is a more complex correlated subquery: -> -> select * -> from taba -> where col1 = (select col2 -> from tabb -> where taba.col3 = tabb.col4) -> -> Here we must add 'taba' to the subquery's FROM list, and add col3 to the -> target list of the subquery. After we parse the subquery, add 'sub1' to -> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = -> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. -> THe optimizer will do the correlation for us. -> -> In the optimizer, we can parse the subquery first, then the outer query, -> and then replace all 'sub1' references in the outer query to use the -> subquery plan. -> -> I realize making merging the two plans and doing IN and NOT IN is the - ^^^^^^^^^^^^^^^^^^^^^ -This is very easy to do! As I already said we have just change sub1 -access path (SeqScan of sub1) with SeqScan of Material node with -subquery plan. - -> real challenge, but I hoped this would give us a start. - -Decision about how to record subquery stuff in to parse-tree -would be very good start -:) - -BTW, note that for _expression_ subqueries (which are introduced without -IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - -we have to check that subquery returns single tuple... 
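
The > ALL versus MAX point made earlier in this message can be sketched with SQL's three-valued logic. This is a minimal model in Python, not backend code: gt_all and sql_max are hypothetical helper names, None stands for NULL/UNKNOWN, and the rows mirror the table x shown in the psql session. Under the standard rules the comparison comes out UNKNOWN rather than strictly FALSE, but WHERE rejects the row in both cases.

```python
from typing import Iterable, Optional


def gt_all(value: int, rows: Iterable[Optional[int]]) -> Optional[bool]:
    """Three-valued `value > ALL (rows)`; None models SQL NULL / UNKNOWN."""
    unknown = False
    for y in rows:
        if y is None:
            unknown = True   # a comparison with NULL is UNKNOWN
        elif not value > y:
            return False     # a single FALSE comparison decides the result
    return None if unknown else True


def sql_max(rows: Iterable[Optional[int]]) -> Optional[int]:
    """MAX() as an aggregate: NULL inputs are skipped."""
    present = [y for y in rows if y is not None]
    return max(present) if present else None


x = [1, 2, 3, None]            # rows of x(y), including the NULL

with_all = gt_all(5, x)        # UNKNOWN, so WHERE would not return the row
with_max = 5 > sql_max(x)      # True, so WHERE would return the row

print(with_all, with_max)
```

The two forms disagree as soon as the subquery yields a NULL, which is the reason the MAX rewrite is not a safe replacement for > ALL.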
- -Vadim - -From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836 - for ; Mon, 5 Jan 1998 20:31:01 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for ; Mon, 5 Jan 1998 19:56:05 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST) -Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for ; Mon, 5 Jan 1998 19:02:30 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) - by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337 - for ; Mon, 5 Jan 1998 17:31:04 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675; - Mon, 5 Jan 1998 17:16:40 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801052216.RAA02675@candle.pha.pa.us> -Subject: Re: [HACKERS] subselect -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> > I am confused. Do you want one flat query and want to pass the whole -> > thing into the optimizer? That brings up some questions: -> -> No. 
I just want to follow Tom's way: I would like to see new -> SubSelect node as shortened version of struct Query (or use -> Query structure for each subquery - no matter for me), some -> subquery-related stuff added to Query (and SubSelect) to help -> optimizer to start, and see - -OK, so you want the subquery to actually be INSIDE the outer query -expression. Do they share a common range table? If they don't, we -could very easily just fly through when processing the WHERE clause, and -start a new query using a new query structure for the subquery. Believe -me, you don't want a separate SubQuery-type, just re-use Query for it. -It allows you to call all the normal query stuff with a consistent -structure. - -The parser will need to know it is in a subquery, so it can add the -proper target columns to the subquery, or are you going to do that in -the optimizer. You can do it in the optimizer, and join the range table -references there too. - -> -> typedef struct A_Expr -> { -> NodeTag type; -> int oper; /* type of operation -> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> IN, NOT IN, ANY, ALL, EXISTS here, -> -> char *opname; /* name of operator/function */ -> Node *lexpr; /* left argument */ -> Node *rexpr; /* right argument */ -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> and SubSelect (Query) here (as possible case). -> -> One thought to follow this way: RULEs (and so - VIEWs) are handled by using -> Query - how else can we implement VIEWs on selects with subqueries ? - -Views are stored as nodeout structures, and are merged into the query's -from list, target list, and where clause. I am working out -readfunc,outfunc now to make sure they are up-to-date with all the -current fields. - -> -> BTW, is -> -> select * from A where (select TRUE from B); -> -> valid syntax ? - -I don't think so. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066 - for ; Mon, 5 Jan 1998 17:01:47 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063; - Tue, 6 Jan 1998 05:18:13 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 05:18:11 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801052051.PAA29341@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > OK, here it is. I recommend we pass the outer and subquery through -> > > the parser and optimizer separately. -> > -> > I don't like this. I would like to get parse-tree from parser for -> > entire query and let optimizer (on upper level) decide how to rewrite -> > parse-tree and what plans to produce and how these plans should be -> > merged. Note, that I don't object your methods below, but only where -> > to place handling of this. I don't understand why should we add -> > new part to the system which will do optimizer' work (parse-tree --> -> > execution plan) and deal with optimizer nodes. Imho, upper optimizer -> > level is nice place to do this. -> -> I am confused. Do you want one flat query and want to pass the whole -> thing into the optimizer? That brings up some questions: - -No. 
I just want to follow Tom's way: I would like to see new -SubSelect node as shortened version of struct Query (or use -Query structure for each subquery - no matter for me), some -subquery-related stuff added to Query (and SubSelect) to help -optimizer to start, and see - -typedef struct A_Expr -{ - NodeTag type; - int oper; /* type of operation - * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - IN, NOT IN, ANY, ALL, EXISTS here, - - char *opname; /* name of operator/function */ - Node *lexpr; /* left argument */ - Node *rexpr; /* right argument */ - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - and SubSelect (Query) here (as possible case). - -One thought to follow this way: RULEs (and so - VIEWs) are handled by using -Query - how else can we implement VIEWs on selects with subqueries ? - -BTW, is - -select * from A where (select TRUE from B); - -valid syntax ? - -Vadim - -From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296 - for ; Mon, 5 Jan 1998 18:00:55 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for ; Mon, 5 Jan 1998 17:22:21 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094; - Tue, 6 Jan 1998 05:49:02 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 05:48:58 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Goran Thyni -CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Goran Thyni wrote: -> -> Vadim, -> -> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -> is one example of this - joining by <> will give us invalid results. -> -> What is you approach towards this problem? - -Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL) -and so, we have to have not just NOT EQUAL flag but some ALL node -with modified operator. - -After that, one way is put subquery into inner plan of an join node -to be sure that for an outer tuple all corresponding subquery tuples -will be tested with modified operator (this will require either -changing code of all join nodes or addition of new plan type - we'll see) -and another way is ... suggested by you: - -> I got an idea that one could reverse the order, -> that is execute the outer first into a temptable -> and delete from that according to the result of the -> subquery and then return it. -> Probably this is too raw and slow. ;-) - -This will be faster in some cases (when subquery returns many results -and there are "not so many" results from outer query) - thanks for idea! - -> -> Personally, I was stuck by holydays -:) -> Now I can spend ~ 8 hours ~ each day for development... -> -> Oh, isn't it christmas eve right now in Russia? 
- -Due to historic reasons New Year is mu-u-u-uch popular -holiday in Russia -:) - -Vadim - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300 - for ; Mon, 5 Jan 1998 18:00:57 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for ; Mon, 5 Jan 1998 17:42:15 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; - Tue, 6 Jan 1998 06:10:05 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 06:09:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801052216.RAA02675@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > I am confused. Do you want one flat query and want to pass the whole -> > > thing into the optimizer? That brings up some questions: -> > -> > No. I just want to follow Tom's way: I would like to see new -> > SubSelect node as shortened version of struct Query (or use -> > Query structure for each subquery - no matter for me), some -> > subquery-related stuff added to Query (and SubSelect) to help -> > optimizer to start, and see -> -> OK, so you want the subquery to actually be INSIDE the outer query -> expression. Do they share a common range table? If they don't, we - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -No. - -> could very easily just fly through when processing the WHERE clause, and -> start a new query using a new query structure for the subquery. 
Believe - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -... and filling some subquery-related stuff in upper query structure - -still don't know what exactly this could be -:) - -> me, you don't want a separate SubQuery-type, just re-use Query for it. -> It allows you to call all the normal query stuff with a consistent -> structure. - -No objections. - -> -> The parser will need to know it is in a subquery, so it can add the -> proper target columns to the subquery, or are you going to do that in - -I don't think that we need in it, but list of correlation clauses -could be good thing - all in all parser has to check all column -references... - -> the optimizer. You can do it in the optimizer, and join the range table -> references there too. - -Yes. - -> > typedef struct A_Expr -> > { -> > NodeTag type; -> > int oper; /* type of operation -> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > IN, NOT IN, ANY, ALL, EXISTS here, -> > -> > char *opname; /* name of operator/function */ -> > Node *lexpr; /* left argument */ -> > Node *rexpr; /* right argument */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > and SubSelect (Query) here (as possible case). -> > -> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using -> > Query - how else can we implement VIEWs on selects with subqueries ? -> -> Views are stored as nodeout structures, and are merged into the query's -> from list, target list, and where clause. I am working out -> readfunc,outfunc now to make sure they are up-to-date with all the -> current fields. - -Nice! This stuff was out-of-date for too long time. - -> > BTW, is -> > -> > select * from A where (select TRUE from B); -> > -> > valid syntax ? -> -> I don't think so. - -And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, -ANY, ALL, EXISTS - well. 
- -(Time to sleep -:) - -Vadim - -From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707 - for ; Thu, 8 Jan 1998 23:10:48 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for ; Thu, 8 Jan 1998 23:08:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for ; Thu, 8 Jan 1998 23:00:50 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243; - Thu, 8 Jan 1998 22:55:03 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801090355.WAA09243@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: vadim@sable.krasnoyarsk.su (Vadim B.
Mikheev) -Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST) -Cc: hackers@postgreSQL.org (PostgreSQL-development) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Vadim, I know you are still thinking about subselects, but I have some -more clarification that may help. - -We have to add phantom range table entries to correlated subselects so -they will pass the parser. We might as well add those fields to the -target list of the subquery at the same time: - - select * - from taba - where col1 = (select col2 - from tabb - where taba.col3 = tabb.col4) - -becomes: - - select * - from taba - where col1 = (select col2, tabb.col4 <--- - from tabb, taba <--- - where taba.col3 = tabb.col4) - -We add a field to TargetEntry and RangeTblEntry to mark the fact that it -was entered as a correlation entry: - - bool isCorrelated; - -Second, we need to hook the subselect to the main query. I recommend we -add two fields to Query for this: - - Query *parentQuery; - List *subqueries; - -The parentQuery pointer is used to resolve field names in the correlated -subquery. - - select * - from taba - where col1 = (select col2, tabb.col4 <--- - from tabb, taba <--- - where taba.col3 = tabb.col4) - -In the query above, the subquery can be easily parsed, and we add the -subquery to the parsent's parentQuery list. - -In the parent query, to parse the WHERE clause, we create a new operator -type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -right side is an index to a slot in the subqueries List. - -We can then do the rest in the upper optimizer. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305 - for ; Fri, 9 Jan 1998 10:00:59 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for ; Fri, 9 Jan 1998 09:52:17 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623; - Fri, 9 Jan 1998 22:10:25 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> -Date: Fri, 09 Jan 1998 22:10:06 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: subselects -References: <199801090355.WAA09243@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Vadim, I know you are still thinking about subselects, but I have some -> more clarification that may help. -> -> We have to add phantom range table entries to correlated subselects so -> they will pass the parser. We might as well add those fields to the -> target list of the subquery at the same time: -> -> select * -> from taba -> where col1 = (select col2 -> from tabb -> where taba.col3 = tabb.col4) -> -> becomes: -> -> select * -> from taba -> where col1 = (select col2, tabb.col4 <--- -> from tabb, taba <--- -> where taba.col3 = tabb.col4) -> -> We add a field to TargetEntry and RangeTblEntry to mark the fact that it -> was entered as a correlation entry: -> -> bool isCorrelated; - -No, I don't like to add anything in parser. 
Example: - - select * - from tabA - where col1 = (select col2 - from tabB - where tabA.col3 = tabB.col4 - and exists (select * - from tabC - where tabB.colX = tabC.colX and - tabC.colY = tabA.col2) - ) - -: a column of tabA is referenced in sub-subselect -(is it allowable by standards ?) - in this case it's better -to don't add tabA to 1st subselect but add tabA to second one -and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -this gives us 2-tables join in 1st subquery instead of 3-tables join. -(And I'm still not sure that using temp tables is best of what can be -done in all cases...) - -Instead of using isCorrelated in TE & RTE we can add - -Index varlevel; - -to Var node to reflect (sub)query from where this Var is come -(where is range table to find var's relation using varno). Upmost query -will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. - ^^^ ^^^^^^^^^^^^ -(I don't see problems with distinguishing Vars of different children -on the same level...) - -> -> Second, we need to hook the subselect to the main query. I recommend we -> add two fields to Query for this: -> -> Query *parentQuery; -> List *subqueries; - -Agreed. And maybe Index queryLevel. - -> In the parent query, to parse the WHERE clause, we create a new operator -> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the - ^^^^^^^^^^^^^^^^^^ -No. We have to handle (a,b,c) OP (select x, y, z ...) and -'_a_constant_' OP (select ...) - I don't know is last in standards, -Sybase has this. 
- -Well, - -typedef enum OpType -{ - OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR - -+ OP_EXISTS, OP_ALL, OP_ANY - -} OpType; - -typedef struct Expr -{ - NodeTag type; - Oid typeOid; /* oid of the type of this expr */ - OpType opType; /* type of the op */ - Node *oper; /* could be Oper or Func */ - List *args; /* list of argument nodes */ -} Expr; - -OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries - List, following your suggestion) - -OP_ALL, OP_ANY: - -oper is List of Oper nodes. We need in list because of data types of -a, b, c (above) can be different and so Oper nodes will be different too. - -lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - -left side of subquery' operator. -lsecond(args) is SubSelect. - -Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -by parser into corresponding ANY and ALL. At the moment we can do: - -IN --> = ANY, NOT IN --> <> ALL - -but this will be "known bug": this breaks OO-nature of Postgres, because of -operators can be overrided and '=' can mean s o m e t h i n g (not equality). -Example: box data type. For boxes, = means equality of _areas_ and =~ -means that boxes are the same ==> =~ ANY should be used for IN. - -> right side is an index to a slot in the subqueries List. 
- -Vadim - -From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779 - for ; Fri, 9 Jan 1998 17:44:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for ; Fri, 9 Jan 1998 17:31:24 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282; - Fri, 9 Jan 1998 17:31:41 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801092231.RAA24282@candle.pha.pa.us> -Subject: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > Vadim, I know you are still thinking about subselects, but I have some -> > more clarification that may help. -> > -> > We have to add phantom range table entries to correlated subselects so -> > they will pass the parser. 
We might as well add those fields to the -> > target list of the subquery at the same time: -> > -> > select * -> > from taba -> > where col1 = (select col2 -> > from tabb -> > where taba.col3 = tabb.col4) -> > -> > becomes: -> > -> > select * -> > from taba -> > where col1 = (select col2, tabb.col4 <--- -> > from tabb, taba <--- -> > where taba.col3 = tabb.col4) -> > -> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it -> > was entered as a correlation entry: -> > -> > bool isCorrelated; -> -> No, I don't like to add anything in parser. Example: -> -> select * -> from tabA -> where col1 = (select col2 -> from tabB -> where tabA.col3 = tabB.col4 -> and exists (select * -> from tabC -> where tabB.colX = tabC.colX and -> tabC.colY = tabA.col2) -> ) -> -> : a column of tabA is referenced in sub-subselect - -This is a strange case that I don't think we need to handle in our first -implementation. - -> (is it allowable by standards ?) - in this case it's better -> to don't add tabA to 1st subselect but add tabA to second one -> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> this gives us 2-tables join in 1st subquery instead of 3-tables join. -> (And I'm still not sure that using temp tables is best of what can be -> done in all cases...) - -I don't see any use for temp tables in subselects anymore. After having -implemented UNIONS, I now see how much can be done in the upper -optimizer. I see you just putting the subquery PLAN into the proper -place in the plan tree, with some proper JOIN nodes for IN, NOT IN. - -> -> Instead of using isCorrelated in TE & RTE we can add -> -> Index varlevel; - -OK. Sounds good. - -> -> to Var node to reflect (sub)query from where this Var is come -> (where is range table to find var's relation using varno). Upmost query -> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. 
-> ^^^ ^^^^^^^^^^^^ -> (I don't see problems with distinguishing Vars of different children -> on the same level...) -> -> > -> > Second, we need to hook the subselect to the main query. I recommend we -> > add two fields to Query for this: -> > -> > Query *parentQuery; -> > List *subqueries; -> -> Agreed. And maybe Index queryLevel. - -Sure. If it helps. - -> -> > In the parent query, to parse the WHERE clause, we create a new operator -> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> ^^^^^^^^^^^^^^^^^^ -> No. We have to handle (a,b,c) OP (select x, y, z ...) and -> '_a_constant_' OP (select ...) - I don't know is last in standards, -> Sybase has this. - -I have never seen this in my eight years of SQL. Perhaps we can leave -this for later, maybe much later. - -> -> Well, -> -> typedef enum OpType -> { -> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR -> -> + OP_EXISTS, OP_ALL, OP_ANY -> -> } OpType; -> -> typedef struct Expr -> { -> NodeTag type; -> Oid typeOid; /* oid of the type of this expr */ -> OpType opType; /* type of the op */ -> Node *oper; /* could be Oper or Func */ -> List *args; /* list of argument nodes */ -> } Expr; -> -> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries -> List, following your suggestion) -> -> OP_ALL, OP_ANY: -> -> oper is List of Oper nodes. We need in list because of data types of -> a, b, c (above) can be different and so Oper nodes will be different too. -> -> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - -> left side of subquery' operator. -> lsecond(args) is SubSelect. -> -> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> by parser into corresponding ANY and ALL. 
At the moment we can do: -> -> IN --> = ANY, NOT IN --> <> ALL -> -> but this will be "known bug": this breaks OO-nature of Postgres, because of -> operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> Example: box data type. For boxes, = means equality of _areas_ and =~ -> means that boxes are the same ==> =~ ANY should be used for IN. - -That is interesting, to use =~ for ANY. - -Yes, but how many operators take a SUBQUERY as an operand. This is a -special case to me. - -I think I see where you are trying to go. You want subselects to behave -like any other operator, with a subselect type, and you do all the -subselect handling in the optimizer, with special Nodes and actions. - -I think this may be just too much of a leap. We have such clean query -logic for single queries, I can't imagine having an operator that has a -Query operand, and trying to get everything to properly handle it. -UNIONS were very easy to implement as a List off of Query, with some -foreach()'s in rewrite and the high optimizer. - -Subselects are SQL standard, and are never going to be over-ridden by a -user. Same with UNION. They want UNION, they get UNION. They want -Subselect, we are going to spin through the Query structure and give -them what they want. - -The complexities of subselects and correlated queries and range tables -and stuff is so bizarre that trying to get it to work inside the type -system could be a huge project. - -> -> > right side is an index to a slot in the subqueries List. - -I guess the question is what can we have by February 1? - -I have been reading some postings, and it seems to me that subselects -are the litmus test for many evaluators when deciding if a database -engine is full-featured. - -Sorry to be so straightforward, but I want to keep hashing this around -until we get a conclusion, so coding can start. 
- -My suggestions have been, I believe, trying to get subselects working -with the fullest functionality by adding the least amount of code, and -keeping the logic clean. - -Have you checked out the UNION code? It is very small, but it works. I -think it could make a good sample for subselects. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742 - for ; Sat, 10 Jan 1998 12:00:43 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684; - Sun, 11 Jan 1998 00:19:10 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:19:08 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgresql.org, "Thomas G. Lockhart" -Subject: Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > No, I don't like to add anything in parser. Example: -> > -> > select * -> > from tabA -> > where col1 = (select col2 -> > from tabB -> > where tabA.col3 = tabB.col4 -> > and exists (select * -> > from tabC -> > where tabB.colX = tabC.colX and -> > tabC.colY = tabA.col2) -> > ) -> > -> > : a column of tabA is referenced in sub-subselect -> -> This is a strange case that I don't think we need to handle in our first -> implementation. - -I don't know is this strange case or not :) -But I would like to know is this allowed by standards - can someone -comment on this ? -And I don't see problems with handling this... - -> -> > (is it allowable by standards ?) 
- in this case it's better -> > to don't add tabA to 1st subselect but add tabA to second one -> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> > this gives us 2-tables join in 1st subquery instead of 3-tables join. -> > (And I'm still not sure that using temp tables is best of what can be -> > done in all cases...) -> -> I don't see any use for temp tables in subselects anymore. After having -> implemented UNIONS, I now see how much can be done in the upper -> optimizer. I see you just putting the subquery PLAN into the proper -> place in the plan tree, with some proper JOIN nodes for IN, NOT IN. - -When saying about temp tables, I meant tables created by node Material -for subquery plan. This is one of two ways - run subquery once for all -possible upper plan tuples and then just join result table with upper -query. Another way is re-run subquery for each upper query tuple, -without temp table but may be with caching results by some ways. -Actually, there is special case - when subquery can be alternatively -formulated as joins, - but this is just special case. - -> > > In the parent query, to parse the WHERE clause, we create a new operator -> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> > ^^^^^^^^^^^^^^^^^^ -> > No. We have to handle (a,b,c) OP (select x, y, z ...) and -> > '_a_constant_' OP (select ...) - I don't know is last in standards, -> > Sybase has this. -> -> I have never seen this in my eight years of SQL. Perhaps we can leave -> this for later, maybe much later. - -Are you saying about (a, b, c) or about 'a_constant' ? -Again, can someone comment on are they in standards or not ? -Tom ? -If yes then please add parser' support for them now... - -> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> > by parser into corresponding ANY and ALL. 
At the moment we can do: -> > -> > IN --> = ANY, NOT IN --> <> ALL -> > -> > but this will be "known bug": this breaks OO-nature of Postgres, because of -> > operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> > Example: box data type. For boxes, = means equality of _areas_ and =~ -> > means that boxes are the same ==> =~ ANY should be used for IN. -> -> That is interesting, to use =~ for ANY. -> -> Yes, but how many operators take a SUBQUERY as an operand. This is a -> special case to me. -> -> I think I see where you are trying to go. You want subselects to behave -> like any other operator, with a subselect type, and you do all the -> subselect handling in the optimizer, with special Nodes and actions. -> -> I think this may be just too much of a leap. We have such clean query -> logic for single queries, I can't imagine having an operator that has a -> Query operand, and trying to get everything to properly handle it. -> UNIONS were very easy to implement as a List off of Query, with some -> foreach()'s in rewrite and the high optimizer. -> -> Subselects are SQL standard, and are never going to be over-ridden by a -> user. Same with UNION. They want UNION, they get UNION. They want -> Subselect, we are going to spin through the Query structure and give -> them what they want. -> -> The complexities of subselects and correlated queries and range tables -> and stuff is so bizarre that trying to get it to work inside the type -> system could be a huge project. - -PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), -derived from the Berkeley Postgres database management system. While -PostgreSQL retains the powerful object-relational data model, rich data types and - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -easy extensibility of Postgres, it replaces the PostQuel query language with an -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -extended subset of SQL. 
-^^^^^^^^^^^^^^^^^^^^^^ - -Should we say users that subselect will work for standard data types only ? -I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -Is there difference between handling = ANY and ~ ANY ? I don't see any. -Currently we can't get IN working properly for boxes (and may be for others too) -and I don't like to try to resolve these problems now, but hope that someday -we'll be able to do this. At the moment - just convert IN into = ANY and -NOT IN into <> ALL in parser. - -(BTW, do you know how DISTINCT is implemented ? It doesn't use = but -use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -> > -> > > right side is an index to a slot in the subqueries List. -> -> I guess the question is what can we have by February 1? -> -> I have been reading some postings, and it seems to me that subselects -> are the litmus test for many evaluators when deciding if a database -> engine is full-featured. -> -> Sorry to be so straightforward, but I want to keep hashing this around -> until we get a conclusion, so coding can start. -> -> My suggestions have been, I believe, trying to get subselects working -> with the fullest functionality by adding the least amount of code, and -> keeping the logic clean. -> -> Have you checked out the UNION code? It is very small, but it works. I -> think it could make a good sample for subselects. - -There is big difference between subqueries and queries in UNION - -there are not dependences between UNION queries. - -Ok, opened issues: - -1. Is using upper query' vars in all subquery levels in standard ? -2. Is (a, b, c) OP (subselect) in standard ? -3. What types of expressions (Var, Const, ...) are allowed on the left - side of operator with subquery on the right ? -4. What types of operators should we support (=, >, ..., like, ~, ...) ? - (My vote for all boolean operators). - -And - did we get consensus on presentation subqueries stuff in Query, -Expr and Var ? 
-I would like to have something done in parser near Jan 17 to get -subqueries working by Feb 1. I vote for support of all standard -things (1. - 3.) in parser right now - if there will be no time -to implement something like (a, b, c) then optimizer will call -elog(WARN) (oh, sorry, - elog(ERROR)). - -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045 - for ; Sat, 10 Jan 1998 12:31:01 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for ; Sat, 10 Jan 1998 12:22:30 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725; - Sun, 11 Jan 1998 00:41:22 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:41:19 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199712220545.AAA11605@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, a few questions: -> -> Should we use sortmerge, so we can use our psort as temp tables, -> or do we use hashunique? -> -> How do we pass the query to the optimizer? How do we represent -> the range table for each, and the links between them in correlated -> subqueries? - -My suggestion is just use varlevel in Var and don't put upper query' -relations into subquery range table. 
- -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357 - for ; Sat, 10 Jan 1998 13:00:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for ; Sat, 10 Jan 1998 12:40:02 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741; - Sun, 11 Jan 1998 00:58:56 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:58:52 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Vadim B. Mikheev wrote: -> -> Bruce Momjian wrote: -> > -> > OK, a few questions: -> > -> > Should we use sortmerge, so we can use our psort as temp tables, -> > or do we use hashunique? -> > -> > How do we pass the query to the optimizer? How do we represent -> > the range table for each, and the links between them in correlated -> > subqueries? -> -> My suggestion is just use varlevel in Var and don't put upper query' -> relations into subquery range table. - -Hmm... Sorry, it seems that I did reply to very old message - forget it. 
- -Vadim - -From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664 - for ; Sat, 10 Jan 1998 13:30:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for ; Sat, 10 Jan 1998 13:05:09 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623; - Sat, 10 Jan 1998 18:01:03 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu> -Date: Sat, 10 Jan 1998 18:01:03 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> > > by parser into corresponding ANY and ALL. At the moment we can do: -> > > -> > > IN --> = ANY, NOT IN --> <> ALL -> > > -> > > but this will be "known bug": this breaks OO-nature of Postgres, because of -> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> > > Example: box data type. For boxes, = means equality of _areas_ and =~ -> > > means that boxes are the same ==> =~ ANY should be used for IN. -> > -> > That is interesting, to use =~ for ANY. - -If I understand the discussion, I would think is is fine to make an assumption about -which operator is used to implement a subselect expression. 
If someone remaps an -operator to mean something different, then they will get a different result (or a -nonsensical one) from a subselect. - -I'd be happy to remap existing operators to fit into a convention which would work -with subselects (especially if I got to help choose :). - -> > Subselects are SQL standard, and are never going to be over-ridden by a -> > user. Same with UNION. They want UNION, they get UNION. They want -> > Subselect, we are going to spin through the Query structure and give -> > them what they want. -> -> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), -> derived from the Berkeley Postgres database management system. While -> PostgreSQL retains the powerful object-relational data model, rich data types and -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> easy extensibility of Postgres, it replaces the PostQuel query language with an -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> extended subset of SQL. -> ^^^^^^^^^^^^^^^^^^^^^^ -> -> Should we say users that subselect will work for standard data types only ? -> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -> Is there difference between handling = ANY and ~ ANY ? I don't see any. -> Currently we can't get IN working properly for boxes (and may be for others too) -> and I don't like to try to resolve these problems now, but hope that someday -> we'll be able to do this. At the moment - just convert IN into = ANY and -> NOT IN into <> ALL in parser. -> -> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but -> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted -list? That would give more consistant behavior... - -> > I have been reading some postings, and it seems to me that subselects -> > are the litmus test for many evaluators when deciding if a database -> > engine is full-featured. 
-> > -> > Sorry to be so straightforward, but I want to keep hashing this around -> > until we get a conclusion, so coding can start. -> > -> > My suggestions have been, I believe, trying to get subselects working -> > with the fullest functionality by adding the least amount of code, and -> > keeping the logic clean. -> > -> > Have you checked out the UNION code? It is very small, but it works. I -> > think it could make a good sample for subselects. -> -> There is big difference between subqueries and queries in UNION - -> there are not dependences between UNION queries. -> -> Ok, opened issues: -> -> 1. Is using upper query' vars in all subquery levels in standard ? - -I'm not certain. Let me know if you do not get an answer from someone else and I will -research it. - -> 2. Is (a, b, c) OP (subselect) in standard ? - -Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where -the parens are allowed to be omitted from a one element list. - -> 3. What types of expressions (Var, Const, ...) are allowed on the left -> side of operator with subquery on the right ? - -I think most expressions are allowed. The "constant OP (subselect)" case you were -asking about is just a simplified case since "(a, b, constant) OP (subselect)" where -a and b are column references should be allowed. Of course, our optimizer could -perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first -example "EXISTS (subselect where x = constant)". - -> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? -> (My vote for all boolean operators). - -Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is -important to get an initial implementation for v6.3 which covers a little, some, or -all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then -we will have the benefit of feedback from others in practical applications which -always uncovers new things to consider. 
- -> And - did we get consensus on presentation subqueries stuff in Query, -> Expr and Var ? -> I would like to have something done in parser near Jan 17 to get -> subqueries working by Feb 1. I vote for support of all standard -> things (1. - 3.) in parser right now - if there will be no time -> to implement something like (a, b, c) then optimizer will callelog(WARN) (oh, -> sorry, - elog(ERROR)). - -Great. I'd like to help with the remaining parser issues; at the moment "row_expr" -does the right thing with expression comparisions but just parses then ignores -subselect expressions. Let me know what structures you want passed back and I'll put -them in, or if you prefer put in the first one and I'll go through and clean up and -add the rest. - - - Tom - - -From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728 - for ; Sat, 10 Jan 1998 15:00:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for ; Sat, 10 Jan 1998 14:35:19 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002; - Sat, 10 Jan 1998 19:31:30 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu> -Date: Sat, 10 Jan 1998 19:31:29 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> Are you saying about (a, b, c) or about 'a_constant' ? 
-> Again, can someone comment on are they in standards or not ? -> Tom ? -> If yes then please add parser' support for them now... - -As I mentioned a few minutes ago in my last message, I parse the row descriptors and -the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)" I currently -ignore the result. I didn't want to pass things back as lists until something in the -backend was ready to receive them. - -If it is OK, I'll go ahead and start passing back a list of expressions when a row -descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node -being a list rather than an atomic node. - -Also, I can start passing back the subselect expression as the rexpr; right now the -parser calls elog() and quits. - -btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called -makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. -If lists are handled farther back, this routine should move to there also and the -parser will just pass the lists. Note that some assumptions have to be made about the -meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of -"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK -to disallow those cases or to look for specific appearance of the operator to guess -the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if -it has "<>" or "!" then build as "or"s. - -Let me know what you want... 
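The and/or expansion described above can be sketched in a few lines of C. This is only an illustration of the heuristic in the message, not the actual makeRowExpr() from gram.y: the names row_connective() and expand_row_expr() are hypothetical, and the sketch builds a string rather than parser nodes. It assumes, as the message does, that an operator containing "<>" or "!" is joined with OR and any other operator with AND.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose the connective for a row comparison.
 * Operators containing "<>" or '!' are joined with OR, all others with AND. */
static const char *row_connective(const char *op)
{
    if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
        return "OR";
    return "AND";
}

/* Expand (l[0],...,l[n-1]) OP (r[0],...,r[n-1]) into buf as text.
 * Returns 0 on success, -1 if n is 0 or buf is too small. */
static int expand_row_expr(char *buf, size_t buflen,
                           const char *const *l, const char *const *r,
                           size_t n, const char *op)
{
    const char *conn = row_connective(op);
    size_t used = 0;
    size_t i;

    if (n == 0 || buflen == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++)
    {
        /* Every term after the first is preceded by " AND " or " OR ". */
        int w = snprintf(buf + used, buflen - used, "%s%s%s(%s %s %s)",
                         i ? " " : "", i ? conn : "", i ? " " : "",
                         l[i], op, r[i]);

        if (w < 0 || (size_t) w >= buflen - used)
            return -1;
        used += (size_t) w;
    }
    return 0;
}
```

With left side (a, b) and right side (c, d), the operator "=" yields "(a = c) AND (b = d)" and "<>" yields "(a <> c) OR (b <> d)". Joining "<" terms with AND follows the guess made in the message and is not a claim about what the SQL standard requires for row ordering comparisons.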
- - - Tom - - -From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998 -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953 - for ; Sun, 11 Jan 1998 01:01:51 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797; - Sun, 11 Jan 1998 05:58:01 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu> -Date: Sun, 11 Jan 1998 05:58:01 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702" -Status: OR - -This is a multi-part message in MIME format. ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit - -Here are context diffs of gram.y and keywords.c; sorry about sending the full files. -These start sending lists of arguments toward the backend from the parser to -implement row descriptors and subselects. - -They should apply OK even over Bruce's recent changes... 
- - - Tom - ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii; name="gram.y.patch" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; filename="gram.y.patch" - -*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998 ---- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998 -*************** -*** 195,200 **** ---- 195,201 ---- - having_clause - %type row_descriptor, row_list - %type row_expr -+ %type RowOp, row_opt - %type OptCreateAs, CreateAsList - %type CreateAsElement - %type NumConst -*************** -*** 242,248 **** - */ - - /* Keywords (in SQL92 reserved words) */ -! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC, - BEGIN_TRANS, BETWEEN, BOTH, BY, - CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, - CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, ---- 243,249 ---- - */ - - /* Keywords (in SQL92 reserved words) */ -! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC, - BEGIN_TRANS, BETWEEN, BOTH, BY, - CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, - CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, -*************** -*** 258,264 **** - ON, OPTION, OR, ORDER, OUTER_P, - PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, - REFERENCES, REVOKE, RIGHT, ROLLBACK, -! SECOND_P, SELECT, SET, SUBSTRING, - TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, - UNION, UNIQUE, UPDATE, USING, - VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, ---- 259,265 ---- - ON, OPTION, OR, ORDER, OUTER_P, - PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, - REFERENCES, REVOKE, RIGHT, ROLLBACK, -! 
SECOND_P, SELECT, SET, SOME, SUBSTRING, - TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, - UNION, UNIQUE, UPDATE, USING, - VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, -*************** -*** 2853,2866 **** - /* Expressions using row descriptors - * Define row_descriptor to allow yacc to break the reduce/reduce conflict - * with singleton expressions. - */ - row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' - { -! $$ = NULL; - } - | '(' row_descriptor ')' NOT IN '(' SubSelect ')' - { -! $$ = NULL; - } - | '(' row_descriptor ')' '=' '(' row_descriptor ')' - { ---- 2854,2878 ---- - /* Expressions using row descriptors - * Define row_descriptor to allow yacc to break the reduce/reduce conflict - * with singleton expressions. -+ * -+ * Note that "SOME" is the same as "ANY" in syntax. -+ * - thomas 1998-01-10 - */ - row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' - { -! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6); - } - | '(' row_descriptor ')' NOT IN '(' SubSelect ')' - { -! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7); -! } -! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')' -! { -! char *opr; -! opr = palloc(strlen($4)+strlen($5)+1); -! strcpy(opr, $4); -! strcat(opr, $5); -! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7); - } - | '(' row_descriptor ')' '=' '(' row_descriptor ')' - { -*************** -*** 2880,2885 **** ---- 2892,2907 ---- - } - ; - -+ RowOp: '=' { $$ = "="; } -+ | '<' { $$ = "<"; } -+ | '>' { $$ = ">"; } -+ ; -+ -+ row_opt: ALL { $$ = "all"; } -+ | ANY { $$ = "any"; } -+ | SOME { $$ = "any"; } -+ ; -+ - row_descriptor: row_list ',' a_expr - { - $$ = lappend($1, $3); -*************** -*** 3432,3441 **** - ; - - in_expr: SubSelect -! { -! elog(ERROR,"IN (SUBSELECT) not yet implemented"); -! $$ = $1; -! } - | in_expr_nodes - { $$ = $1; } - ; ---- 3454,3460 ---- - ; - - in_expr: SubSelect -! 
{ $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); } - | in_expr_nodes - { $$ = $1; } - ; -*************** -*** 3449,3458 **** - ; - - not_in_expr: SubSelect -! { -! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented"); -! $$ = $1; -! } - | not_in_expr_nodes - { $$ = $1; } - ; ---- 3468,3474 ---- - ; - - not_in_expr: SubSelect -! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); } - | not_in_expr_nodes - { $$ = $1; } - ; - ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; filename="keywords.c.patch" - -*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998 ---- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998 -*************** -*** 39,44 **** ---- 39,45 ---- - {"alter", ALTER}, - {"analyze", ANALYZE}, - {"and", AND}, -+ {"any", ANY}, - {"append", APPEND}, - {"archive", ARCHIVE}, - {"as", AS}, -*************** -*** 178,183 **** ---- 179,185 ---- - {"set", SET}, - {"setof", SETOF}, - {"show", SHOW}, -+ {"some", SOME}, - {"stdin", STDIN}, - {"stdout", STDOUT}, - {"substring", SUBSTRING}, - ---------------D8B38A0D1F78A10C0023F702-- - - -From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255 - for ; Sun, 11 Jan 1998 01:31:10 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for ; Sun, 11 Jan 1998 01:10:48 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST) -Received: from 
candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for ; Sun, 11 Jan 1998 01:01:05 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801; - Sun, 11 Jan 1998 00:59:23 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801110559.AAA11801@candle.pha.pa.us> -Subject: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST) -Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu -In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> I would like to have something done in parser near Jan 17 to get -> subqueries working by Feb 1. I vote for support of all standard -> things (1. - 3.) in parser right now - if there will be no time -> to implement something like (a, b, c) then optimizer will call -> elog(WARN) (oh, sorry, - elog(ERROR)). - -First, let me say I am glad we are still on schedule for Feb 1. I was -panicking because I thought we wouldn't make it in time. - - -> > > (is it allowable by standards ?) - in this case it's better -> > > to don't add tabA to 1st subselect but add tabA to second one -> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> > > this gives us 2-tables join in 1st subquery instead of 3-tables join. -> > > (And I'm still not sure that using temp tables is best of what can be -> > > done in all cases...) -> > -> > I don't see any use for temp tables in subselects anymore. After having -> > implemented UNIONS, I now see how much can be done in the upper -> > optimizer. I see you just putting the subquery PLAN into the proper -> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN. 
-> -> When saying about temp tables, I meant tables created by node Material -> for subquery plan. This is one of two ways - run subquery once for all -> possible upper plan tuples and then just join result table with upper -> query. Another way is re-run subquery for each upper query tuple, -> without temp table but may be with caching results by some ways. -> Actually, there is special case - when subquery can be alternatively -> formulated as joins, - but this is just special case. - -This is interesting. It really only applies for correlated subqueries, -and certainly it may help sometimes to just evaluate the subquery for -valid values that are going to come from the upper query than for all -possible values. Perhaps we can use the 'cost' value of each query to -decide how to handle this. - -> -> > > > In the parent query, to parse the WHERE clause, we create a new operator -> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> > > ^^^^^^^^^^^^^^^^^^ -> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and -> > > '_a_constant_' OP (select ...) - I don't know is last in standards, -> > > Sybase has this. -> > -> > I have never seen this in my eight years of SQL. Perhaps we can leave -> > this for later, maybe much later. -> -> Are you saying about (a, b, c) or about 'a_constant' ? -> Again, can someone comment on are they in standards or not ? -> Tom ? -> If yes then please add parser' support for them now... - -OK, Thomas says it is, so we will put in as much code as we can to handle -it. - -> Should we say users that subselect will work for standard data types only ? -> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -> Is there difference between handling = ANY and ~ ANY ? I don't see any. -> Currently we can't get IN working properly for boxes (and may be for others too) -> and I don't like to try to resolve these problems now, but hope that someday -> we'll be able to do this. 
At the moment - just convert IN into = ANY and -> NOT IN into <> ALL in parser. - -OK. - -> -> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but -> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -I did not know that either. - -> There is big difference between subqueries and queries in UNION - -> there are not dependences between UNION queries. - -Yes, I know UNIONS are trivial compared to subselects. - -> -> Ok, opened issues: -> -> 1. Is using upper query' vars in all subquery levels in standard ? -> 2. Is (a, b, c) OP (subselect) in standard ? -> 3. What types of expressions (Var, Const, ...) are allowed on the left -> side of operator with subquery on the right ? -> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? -> (My vote for all boolean operators). -> -> And - did we get consensus on presentation subqueries stuff in Query, -> Expr and Var ? - -OK, here are my concrete ideas on changes and structures. - -I think we all agreed that Query needs new fields: - - Query *parentQuery; - List *subqueries; - -Maybe query level too, but I don't think so (see later ideas on Var). - -We need a new Node structure, call it Sublink: - - int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...) - Oid operator /* subquery must return single row */ - List *lefthand; /* parent stuff */ - Node *subquery; /* represents nodes from parser */ - Index Subindex; /* filled in to index Query->subqueries */ - -Of course, the names are just suggestions. Every time we run through -the parsenodes of a query to create a Query* structure, when we do the -WHERE clause, if we come upon one of these Sublink nodes (created in the -parser), we move the supplied Query* in Sublink->subquery to a local -List variable, and we set Subquery->subindex to equal the index of the -new query, i.e. is it the first subquery we found, 1, or the second, 2, -etc. 
- -After we have created the parent Query structure, we run through our -local List variable of subquery parsenodes we created above, and add -Query* entries to Query->subqueries. In each subquery Query*, we set -the parentQuery pointer. - -Also, when parsing the subqueries, we need to keep track of correlated -references. I recommend we add a field to the Var structure: - - Index sublevel; /* range table reference: - = 0 current level of query - < 0 parent above this many levels - > 0 index into subquery list - */ - -This way, a Var node with sublevel 0 is the current level, and is true -in most cases. This helps us not have to change much code. sublevel = --1 means it references the range table in the parent query. sublevel = --2 means the parent's parent. sublevel = 2 means it references the range -table of the second entry in Query->subqueries. Varno and varattno are -still meaningful. Of course, we can't reference variables in the -subqueries from the parent in the parser code, but Vadim may want to. - -When doing a Var lookup in the parser, we look in the current level -first, but if not found, if it is a subquery, we can look at the parent -and parent's parent to set the sublevel, varno, and varatno properly. - -We create no phantom range table entries in the subquery, and no phantom -target list entries. We can leave that all for the upper optimizer. 
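The lookup rule proposed above (search the current level first, then the parent, then the parent's parent, recording a negative sublevel) could look roughly like the following. This is a minimal sketch under stated assumptions: QueryLevel, VarRef and resolve_rel() are hypothetical stand-ins, not the real Query and Var nodes, and only relation names are resolved, not attributes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one query level and its range table. */
typedef struct QueryLevel
{
    struct QueryLevel *parent;      /* NULL for the top-level query */
    const char *const *rangeTable;  /* relation names visible at this level */
    int         nrels;
} QueryLevel;

/* Hypothetical stand-in for the fields a Var would carry. */
typedef struct VarRef
{
    int         sublevel;   /* 0 = current level, -1 = parent, -2 = parent's parent */
    int         varno;      /* 1-based index into that level's range table */
} VarRef;

/* Search the current level first, then walk up through the parents.
 * Returns 1 and fills *out when the relation is found, 0 otherwise. */
static int resolve_rel(const QueryLevel *q, const char *relname, VarRef *out)
{
    int level = 0;

    for (; q != NULL; q = q->parent, level--)
    {
        int i;

        for (i = 0; i < q->nrels; i++)
        {
            if (strcmp(q->rangeTable[i], relname) == 0)
            {
                out->sublevel = level;
                out->varno = i + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

Applied to the tabA/tabB/tabC example quoted earlier in the thread, a reference to tabA made inside the innermost subselect would resolve with sublevel -2 and varno 1, and no phantom range table entry is needed in either subquery.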
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Tue Dec 9 12:14:09 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA16186 - for ; Tue, 9 Dec 1997 12:14:05 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA17524; Tue, 9 Dec 1997 12:05:31 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 09 Dec 1997 12:05:01 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA17316 for pgsql-hackers-outgoing; Tue, 9 Dec 1997 12:04:55 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id MAA17304 for ; Tue, 9 Dec 1997 12:04:40 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id MAA15973; - Tue, 9 Dec 1997 12:05:03 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712091705.MAA15973@candle.pha.pa.us> -Subject: Re: [HACKERS] Items for 6.3 -To: lockhart@alumni.caltech.edu (Thomas G. Lockhart) -Date: Tue, 9 Dec 1997 12:05:03 -0500 (EST) -Cc: hackers@postgreSQL.org, vadim@sable.krasnoyarsk.su -In-Reply-To: <348CE8BE.FE0F8AA1@alumni.caltech.edu> from "Thomas G. Lockhart" at Dec 9, 97 06:44:14 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> -> > Here are the items I think would make 6.3 a truly great release: -> > -> > subselects -> > outer joins -> -> These two would be sufficient (along with the changes already in the -> tree) to address the most visible deficiencies in SQL functionality. -> -> > temp tables -> > fix "Reliability" items attached to specific queries -> -> Sure, why not? - -We will need temp tables for subselects anyway. 
- -I could implement them, but again we come up against the problem of -storing these plans and executing them later. We need to do some of the -temp table stuff in the optimizer because the plan could be passed with -a temp table, and we can't bind the temp name to a real name in the -parser, especially if we save those plans in system tables that other -backends can execute. Multiple backends would be using the same temp -name. - -At the same time, we need some temp stuff in the parser so the parser -can recognize the temp table and its fields when it sees it. - -The hardest part is: - -select * into tmp mytmp from z where x=y; -select * from mytmp; - -If they are passed together, and we have to plan them both, before -either is executed, you have to make the parser aware of the fields in -mytmp, even though you have not executed the select yet, you are just -storing the plan. - -This was Vadim's point about not doing subselects in the parser. - -> -> > postmaster sync's pglog, giving almost fsync reliability with -> > no-fsync performance -> -> OK to save for v6.4. -> -> Could we try to do the subselect/join/union features for 6.3? I know you -> have been looking at it, and found the deepest parts of the backend to -> be a bit murky. I'm not familiar with that area at all, but perhaps we -> could divert Vadim for a week or two or three when he has some time. -> Especially if we trade him for help on his favorite topics for v6.4?? -> - -Sure. I may be able to do some of the pglog change myself, though Vadim -has some definite ideas on this. - -As for Vadim, trading help is a good idea, but what trade can we make? -He can do most of these tough things without us, and in 1/4 the time. -We can't even see where to start them. - -Basically, without Vadim, this project would have really major problems. - -He certainly likes working on PostgreSQL, so he must be busy with other -things. - -It is not fair to keep counting on Vadim to do all these tough jobs. 
We -really need to get other people up to Vadim's level of ability. -Unfortunately, the odds of this happening are very slim. - -This leaves me scratching my head. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Fri Dec 19 00:08:21 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25029 - for ; Fri, 19 Dec 1997 00:08:13 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA11825; - Fri, 19 Dec 1997 12:13:15 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349A0265.7329D4EE@sable.krasnoyarsk.su> -Date: Fri, 19 Dec 1997 12:13:09 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] Items for 6.3 -References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Thomas G. Lockhart wrote: -> -> Could we try to do the subselect/join/union features for 6.3? I know you -> have been looking at it, and found the deepest parts of the backend to -> be a bit murky. I'm not familiar with that area at all, but perhaps we -> could divert Vadim for a week or two or three when he has some time. - ^^^^^ -More realistic... And this is for initial release only: tuning performance -of subselects is very hard, long work. - -Ok - I'm ready to do subselects for 6.3 but this means that foreign keys -may appear in 6.4 only. And I'll need in help: could someone add support -for them in parser ? Not handling - but parsing and common checking. 
-Also, it would be nice to have better temp tables implementation -(without affecting pg_class etc) - node material need in query-level -temp tables anyway. I'd really like to see temp table files created -only when its data must go to disk due to local buffer pool is full -and can't more keep table data in memory. Also, local buffer manager -should be re-written to use hash table (like shared bufmgr) for buffer search, -not sequential scan as now (this is item for TODO) - this will speed up -things and allow to use more than 64 local buffers. - -I'm still sure that handling subselects in parser is not right way. -And the main problem is not in execution plans (we could use tricks -to resolve this) but in performance. Example: - -select b from big where b in (select s from small); - -If there is no duplicates in small then this is the same as - -select b from big, small where b = s; - -Without index on big postgres does seq scan of big and uses hashjoin with -hash on small. Using temp table makes query only 20% slower (in my test). -But with index on big postgres uses nestloop with seq scan of small and -index scan of big => select run faster and temp table stuff makes query -2.5 times slower! In the case of duplicates in small, handling in parser -will use distinct (and so - sorting). But using hashjoin plan distinct -may be avoided! Who can analize this ? Optimizer only. He can be smart -to check is there unique index on small or not. If not - what is more -costless: nestloop with sorting or slower hashjoin without sorting. -Only optimizer can find best way to execute query, parser can't. - -> Especially if we trade him for help on his favorite topics for v6.4?? - -Ok, I'd like to see shared catalog cache implemeted in 6.4... 
-:) - -Vadim - -From owner-pgsql-hackers@hub.org Fri Dec 19 00:58:54 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25460 - for ; Fri, 19 Dec 1997 00:58:52 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA27667; Fri, 19 Dec 1997 00:54:39 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:54:09 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA27633 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:54:04 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA27623 for ; Fri, 19 Dec 1997 00:53:53 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA25415; - Fri, 19 Dec 1997 00:53:15 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712190553.AAA25415@candle.pha.pa.us> -Subject: Re: [HACKERS] Items for 6.3 -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Fri, 19 Dec 1997 00:53:15 -0500 (EST) -Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org -In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Thomas G. Lockhart wrote: -> > -> > Could we try to do the subselect/join/union features for 6.3? I know you -> > have been looking at it, and found the deepest parts of the backend to -> > be a bit murky. I'm not familiar with that area at all, but perhaps we -> > could divert Vadim for a week or two or three when he has some time. -> ^^^^^ -> More realistic... And this is for initial release only: tuning performance -> of subselects is very hard, long work. 
-> -> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys - -Great. - -> may appear in 6.4 only. And I'll need in help: could someone add support -> for them in parser ? Not handling - but parsing and common checking. -> Also, it would be nice to have better temp tables implementation -> (without affecting pg_class etc) - node material need in query-level -> temp tables anyway. I'd really like to see temp table files created -> only when its data must go to disk due to local buffer pool is full -> and can't more keep table data in memory. Also, local buffer manager -> should be re-written to use hash table (like shared bufmgr) for buffer search, -> not sequential scan as now (this is item for TODO) - this will speed up -> things and allow to use more than 64 local buffers. -> -> I'm still sure that handling subselects in parser is not right way. -> And the main problem is not in execution plans (we could use tricks -> to resolve this) but in performance. Example: -> -> select b from big where b in (select s from small); -> -> If there is no duplicates in small then this is the same as -> -> select b from big, small where b = s; -> -> Without index on big postgres does seq scan of big and uses hashjoin with -> hash on small. Using temp table makes query only 20% slower (in my test). -> But with index on big postgres uses nestloop with seq scan of small and -> index scan of big => select run faster and temp table stuff makes query -> 2.5 times slower! In the case of duplicates in small, handling in parser -> will use distinct (and so - sorting). But using hashjoin plan distinct -> may be avoided! Who can analize this ? Optimizer only. He can be smart -> to check is there unique index on small or not. If not - what is more -> costless: nestloop with sorting or slower hashjoin without sorting. -> Only optimizer can find best way to execute query, parser can't. -> - -OK, let me comment on this. 
Let's take your example: - -> select b from big where b in (select s from small); -> -> If there is no duplicates in small then this is the same as -> -> select b from big, small where b = s; - -My idea was to do this: - - select distinct s into temp table small2 from small; - select b from big,small2 where b = s; - -And let the optimizer decide how to do the join. Is this what you are -saying? - -The problem I see is that the temp table is already distinct, and was -sorted to do that, but you can't pass that information into the -optimizer. Is that the problem with using the parser? - -But you want the temp table never to hit disk unless it has to, but that -will not work unless we do a really good job with temp tables. - -Also NOT IN will need some type of non-join operator, perhaps a flag in -the Plan to say "look for a match, but only output if you find it." How -do we do that? - -We definately need temp tables, and I think we can stuff it into the -cache as LOCAL, which will make it usable without adding to pg_class. - -Perhaps if we create a special Plan in the optimizer called IN, and we -have the outer and inner queries as plans, and work that plan into the -executor. - -The problem with that is we need to specify a way to join the two plans, -and the same logic that determines what type of join to do can this too. -Maybe that's why you wanted stuff done in the optimizer and not the -parser. - -At least now, I understand enough to come up with ideas, and can -understand what you are saying. - -> > Especially if we trade him for help on his favorite topics for v6.4?? -> -> Ok, I'd like to see shared catalog cache implemeted in 6.4... 
-:) -> -> Vadim -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Dec 19 01:00:58 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25512 - for ; Fri, 19 Dec 1997 01:00:56 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA28102; Fri, 19 Dec 1997 00:56:52 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:56:40 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA28077 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:56:36 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA28065 for ; Fri, 19 Dec 1997 00:56:19 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA25436; - Fri, 19 Dec 1997 00:55:56 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712190555.AAA25436@candle.pha.pa.us> -Subject: Re: [HACKERS] Items for 6.3 -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Fri, 19 Dec 1997 00:55:56 -0500 (EST) -Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org -In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> select b from big where b in (select s from small); -> -> If there is no duplicates in small then this is the same as -> -> select b from big, small where b = s; - -I think I see the problem you are describing now. If we put the -subselect into a temp table, we can't use the existing index on small.s, -even if there is one, or if sorting was involved in creating the temp -table. 
- - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From lockhart@alumni.caltech.edu Fri Dec 19 01:34:26 1997 -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25750 - for ; Fri, 19 Dec 1997 01:34:23 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA15234; - Fri, 19 Dec 1997 06:29:45 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <349A1459.EBFE2C84@alumni.caltech.edu> -Date: Fri, 19 Dec 1997 06:29:45 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] Items for 6.3 -References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> > Could we try to do the subselect/join/union features for 6.3? I know you -> > have been looking at it, and found the deepest parts of the backend to -> > be a bit murky. I'm not familiar with that area at all, but perhaps we -> > could divert Vadim for a week or two or three when he has some time. -> ^^^^^ -> More realistic... And this is for initial release only: tuning performance -> of subselects is very hard, long work. -> -> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys -> may appear in 6.4 only. And I'll need in help: could someone add support -> for them in parser ? Not handling - but parsing and common checking. - -Yes, I've already added subselect syntax in the parser, but we will need to -modify or add to the parse tree nodes to push that past the parser into the -backend. I'm happy to focus on that, since I understand those pieces pretty well. 
-There are several places where "subselect syntax" is used: subselects and unions -come to mind right away. If you have an opinion on how the parse nodes should be -structured I can start with that, or I can just put something in and then modify -it as you need later. Do you see unions as being similar to subselects, or are -they a separate problem? To me, they seem like a simpler case since (perhaps) not -as much optimization and internal reorganizing needs to happen. - -> Also, it would be nice to have better temp tables implementation -> (without affecting pg_class etc) - node material need in query-level -> temp tables anyway. I'd really like to see temp table files created -> only when its data must go to disk due to local buffer pool is full -> and can't more keep table data in memory. - -This sounds very desirable. I noticed that there are, or used to be, multiple -storage managers. Could a manager for temporary storage be written which stores -things in memory until it gets too big and then go to disk? Could that manager -use the mm and md managers internally? Or is all of that at too low a level to be -helpful for this problem? - -SQL92 has the concept of transaction-only and session-only tables and variables. -Could an implementation of "temporary tables" be used to implement this feature -at the same time (or form the basis for it later)? It seems like none of these -non-permanent tables need to go to any of the pg_ tables, since other backends do -not need to see them and they are allowed to disappear at the end of the session -(or at a crash). We would just need the "table manager" to cache information on -temporary stuff before looking at the permanent tables (??). - -> Also, local buffer manager -> should be re-written to use hash table (like shared bufmgr) for buffer search, -> not sequential scan as now (this is item for TODO) - this will speed up -> things and allow to use more than 64 local buffers. 
-> -> I'm still sure that handling subselects in parser is not right way. -> And the main problem is not in execution plans (we could use tricks -> to resolve this) but in performance. - -Seems to me that the subselect needs to stay untransformed (i.e. executable but -non-optimized) so that an optimizer can independently decide how to transform for -faster execution. That way, in the first implementation we have reliable but -stupid execution, but then can add a subselect optimizer which looks for cases -which can be transformed to run faster. - -> > Especially if we trade him for help on his favorite topics for v6.4?? -> -> Ok, I'd like to see shared catalog cache implemeted in 6.4... -:) - -Sure. (Tell me what it is later :) - - - Tom - - - -From vadim@sable.krasnoyarsk.su Fri Dec 19 06:23:14 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27849 - for ; Fri, 19 Dec 1997 06:22:46 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id SAA12239; - Fri, 19 Dec 1997 18:28:13 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349A5A4C.DA366B47@sable.krasnoyarsk.su> -Date: Fri, 19 Dec 1997 18:28:12 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: lockhart@alumni.caltech.edu, hackers@postgresql.org -Subject: Re: [HACKERS] Items for 6.3 -References: <199712190553.AAA25415@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, let me comment on this. 
Let's take your example: -> -> > select b from big where b in (select s from small); -> > -> > If there is no duplicates in small then this is the same as -> > -> > select b from big, small where b = s; -> -> My idea was to do this: -> -> select distinct s into temp table small2 from small; -> select b from big,small2 where b = s; -> -> And let the optimizer decide how to do the join. Is this what you are -> saying? -> -> The problem I see is that the temp table is already distinct, and was -> sorted to do that, but you can't pass that information into the -> optimizer. Is that the problem with using the parser? - -No. I said that in some cases we can avoid distinct altogether: either when a -unique index on small exists, or by using hashjoin plans with a !new! -HashUnique node (there was a mistake in my previous description: not Hash, -but HashUnique on small should be used; HashUnique is a hash table -without duplicates, just another way to implement distinct, without -sorting). This new node can also be useful for "normal" queries -(without subselects). - -My example is very simple. I just want to say that by handling subqueries -in the optimizer we will have more chances to do better optimization. Maybe not -now, but later. I'm sure that subqueries require some specific optimization -and this is not a task for the parser. - -> -> But you want the temp table never to hit disk unless it has to, but that -> will not work unless we do a really good job with temp tables. - -Of course. - -> -> Also NOT IN will need some type of non-join operator, perhaps a flag in -> the Plan to say "look for a match, but only output if you find it." How - ^^ - don't ? -> do we do that? - -Just as you said - by using some flag. - -> -> We definately need temp tables, and I think we can stuff it into the -> cache as LOCAL, which will make it usable without adding to pg_class. - -We have the Relation->rd_istemp flag... Just change it from bool to int: -0 -> is not temp, 1 -> session level temp table, etc...
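The HashUnique idea in this message (a hash table without duplicates used as the inner side of the join, so that no sort-based distinct is needed) can be sketched in C as follows. This is only an illustration under stated assumptions: IntSet, intset_add, intset_contains and in_subselect are hypothetical names, the columns are plain ints, and nothing here is PostgreSQL executor code.

```c
#include <stdlib.h>

/* Hypothetical open-addressing set of ints, standing in for HashUnique. */
typedef struct IntSet {
    int *keys;
    unsigned char *used;
    size_t cap;     /* always a power of two */
} IntSet;

static size_t slot_for(const IntSet *s, int key)
{
    return ((size_t)(unsigned int)key * 2654435761u) & (s->cap - 1);
}

/* Size the table at >= 2x the expected key count so probing always ends. */
static int intset_init(IntSet *s, size_t expected)
{
    size_t cap = 8;
    while (cap < expected * 2)
        cap <<= 1;
    s->keys = malloc(cap * sizeof(int));
    s->used = calloc(cap, 1);
    s->cap = cap;
    if (s->keys == NULL || s->used == NULL) {
        free(s->keys);
        free(s->used);
        return 0;
    }
    return 1;
}

/* Insert a key; duplicates are silently dropped (this is the "unique" part). */
static void intset_add(IntSet *s, int key)
{
    size_t i = slot_for(s, key);
    while (s->used[i]) {
        if (s->keys[i] == key)
            return;
        i = (i + 1) & (s->cap - 1);
    }
    s->used[i] = 1;
    s->keys[i] = key;
}

static int intset_contains(const IntSet *s, int key)
{
    size_t i = slot_for(s, key);
    while (s->used[i]) {
        if (s->keys[i] == key)
            return 1;
        i = (i + 1) & (s->cap - 1);
    }
    return 0;
}

/* Evaluate "select b from big where b in (select s from small)".
 * Each qualifying row of big is emitted exactly once, even when small
 * contains duplicates. out must have room for nbig values. Returns the
 * number of rows written (0 also on allocation failure). */
size_t in_subselect(const int *big, size_t nbig,
                    const int *small, size_t nsmall, int *out)
{
    IntSet set;
    size_t nout = 0;

    if (!intset_init(&set, nsmall))
        return 0;
    for (size_t i = 0; i < nsmall; i++)
        intset_add(&set, small[i]);         /* build side: dedup while hashing */
    for (size_t i = 0; i < nbig; i++)
        if (intset_contains(&set, big[i]))  /* probe side: one pass over big */
            out[nout++] = big[i];
    free(set.keys);
    free(set.used);
    return nout;
}
```

Because duplicates are discarded while the hash table is built, the probe never multiplies rows of big, which is the property the sort-based distinct was there to guarantee.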
- -Vadim - -From vadim@sable.krasnoyarsk.su Fri Dec 19 08:09:11 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00349 - for ; Fri, 19 Dec 1997 08:09:05 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA12377; - Fri, 19 Dec 1997 20:14:25 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349A7327.9A484B74@sable.krasnoyarsk.su> -Date: Fri, 19 Dec 1997 20:14:15 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] Items for 6.3 -References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> <349A1459.EBFE2C84@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Thomas G. Lockhart wrote: -> -> > Ok - I'm ready to do subselects for 6.3 but this means that foreign keys -> > may appear in 6.4 only. And I'll need in help: could someone add support -> > for them in parser ? Not handling - but parsing and common checking. -> -> Yes, I've already added subselect syntax in the parser, but we will need to -> modify or add to the parse tree nodes to push that past the parser into the -> backend. I'm happy to focus on that, since I understand those pieces pretty well. - -Nice! - -> There are several places where "subselect syntax" is used: subselects and unions -> come to mind right away. If you have an opinion on how the parse nodes should be -> structured I can start with that, or I can just put something in and then modify - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -It's ok for me. - -> it as you need later. 
Do you see unions as being similar to subselects, or are -> they a separate problem? To me, they seem like a simpler case since (perhaps) not -> as much optimization and internal reorganizing needs to happen. - -I didn't think about unions at all... Yes, it's simpler to implement. -BTW, I recall Bruce mentioned that unions are used for selects from -superclass and all descendant classes (select ... from table* ) - maybe -something is already implemented ? Bruce ? - -> -> > Also, it would be nice to have better temp tables implementation -> > (without affecting pg_class etc) - node material need in query-level -> > temp tables anyway. I'd really like to see temp table files created -> > only when its data must go to disk due to local buffer pool is full -> > and can't more keep table data in memory. -> -> This sounds very desirable. I noticed that there are, or used to be, multiple -> storage managers. Could a manager for temporary storage be written which stores -> things in memory until it gets too big and then go to disk? Could that manager -> use the mm and md managers internally? Or is all of that at too low a level to be -> helpful for this problem? - -mm uses shmem... This feature could be implemented in local bufmgr -directly: when requested buffer is not found in pool and there is no free, -!dirty buffer then try to find some dirty buffer of created relation, flush -it to disk and use (exception below); if no such buffer -> create some relation -(and flush 1st block); exception: also create some relation if # of buffers -occupied by already created relations is too small (just to do not break -buffering of created relations). -(Note, that using some additional in-memory storage manager will cause -keeping some buffers in-memory twice - in local pool and in manager. -The way above is using local bufmgr as storage manager). - -> > -> > I'm still sure that handling subselects in parser is not right way. 
-> > And the main problem is not in execution plans (we could use tricks -> > to resolve this) but in performance. -> -> Seems to me that the subselect needs to stay untransformed (i.e. executable but -> non-optimized) so that an optimizer can independently decide how to transform for -> faster execution. That way, in the first implementation we have reliable but -> stupid execution, but then can add a subselect optimizer which looks for cases -> which can be transformed to run faster. - -Yes, I believe that this is right way. - -> -> > > Especially if we trade him for help on his favorite topics for v6.4?? -> > -> > Ok, I'd like to see shared catalog cache implemeted in 6.4... -:) -> -> Sure. (Tell me what it is later :) - -Ok -:) - -Vadim - -From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:21 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08884 - for ; Tue, 23 Dec 1997 04:01:18 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA24250 for ; Tue, 23 Dec 1997 03:57:12 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23028; - Tue, 23 Dec 1997 16:04:25 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349F7E97.48C63F17@sable.krasnoyarsk.su> -Date: Tue, 23 Dec 1997 16:04:23 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: lockhart@alumni.caltech.edu, hackers@postgresql.org -Subject: Re: [HACKERS] Items for 6.3 -References: <199712191607.LAA02362@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> > -> > I didn't think about unions at all... Yes, it's simpler to implement. 
-> > BTW, I recall Bruce mentioned that unions are used for selects from -> > superclass and all descendant classes (select ... from table* ) - maybe -> > something is already implemented ? Bruce ? -> -> Yes, it is already there. See optimizer/prep/prepunion.c, and see the -> call to it from optimizer/plan/planner.c. The current source tree has a -> cleaned up version that will be easier to understand. Basically, if -> there are any inherited tables, it calls prepunion, and and cycles -> through each inherited table, copying the Query plan, and calling the -> planner() for each one, then it returns to the planner() to so sorting -> and uniqueness. I am working on fixing aggregates. - -Could you try with unions ? -I would like to concentrate on single thing - subqueries. - -> -> > mm uses shmem... This feature could be implemented in local bufmgr -> > directly: when requested buffer is not found in pool and there is no free, -> > !dirty buffer then try to find some dirty buffer of created relation, flush -> > it to disk and use (exception below); if no such buffer -> create some relation -> > (and flush 1st block); exception: also create some relation if # of buffers -> > occupied by already created relations is too small (just to do not break -> > buffering of created relations). -> > (Note, that using some additional in-memory storage manager will cause -> > keeping some buffers in-memory twice - in local pool and in manager. -> > The way above is using local bufmgr as storage manager). -> -> In the psort code, we do a nice job of keeping the stuff in files or -> memory. Seems to work well. Can we use that somehow? Perhaps make it -> a separate module, or just force a psort rather than a hash! - -I would like to be not restricted to psort only, but use what is better -in each case. I even can foresee using indices on temp tables: we could -put data in index without putting data in table itself! -In any case, we can leave in-memory tables for future. 
- -Vadim - - -From aixssd!darrenk@abs.net Thu Dec 5 10:30:53 1996 -Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for ; Thu, 5 Dec 1996 10:30:43 -0500 (EST) -Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST) -Received: by aixssd (AIX 3.2/UCB 5.64/4.03) - id AA36963; Thu, 5 Dec 1996 10:10:24 -0500 -Received: by ceodev (AIX 4.1/UCB 5.64/4.03) - id AA34942; Thu, 5 Dec 1996 10:07:56 -0500 -Date: Thu, 5 Dec 1996 10:07:56 -0500 -From: aixssd!darrenk@abs.net (Darren King) -Message-Id: <9612051507.AA34942@ceodev> -To: maillist@candle.pha.pa.us -Subject: Subselect info. -Mime-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Content-Md5: jaWdPH2KYtdr7ESzqcOp5g== -Status: OR - -> Any of them deal with implementing subselects? - -There's a white paper at the www.sybase.com that might -help a little. It's just a copy of a presentation -given by the optimizer guru there. Nothing code-wise, -but he gives a few ways of flattening them with temp -tables, etc... - -Darren - -From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109 - for ; Thu, 21 Aug 1997 23:42:43 -0400 (EDT) -Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD) -Sender: root@www.krasnet.ru -Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> -Date: Fri, 22 Aug 1997 12:04:31 +0800 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: subselects -References: <199708220219.WAA23745@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Considering the complexity of the primary/secondary changes you are -> making, I believe subselects will be easier than that. - -I don't do changes for P/F keys - just thinking... -Yes, I think that impl of referential integrity is -more complex work. - -As for subselects: - -in plannodes.h - -typedef struct Plan { -... - struct Plan *lefttree; - struct Plan *righttree; -} Plan; - -/* ---------------- - * these are are defined to avoid confusion problems with "left" - ^^^^^^^^^^^^^^^^^^ - * and "right" and "inner" and "outer". The convention is that - * the "left" plan is the "outer" plan and the "right" plan is - * the inner plan, but these make the code more readable. - * ---------------- - */ -#define innerPlan(node) (((Plan *)(node))->righttree) -#define outerPlan(node) (((Plan *)(node))->lefttree) - -First thought is avoid any confusions by re-defining - -#define rightPlan(node) (((Plan *)(node))->righttree) -#define leftPlan(node) (((Plan *)(node))->lefttree) - -and change all occurrences of 'outer' & 'inner' in code -to 'left' & 'inner' ones: - -this will allow to use 'outer' & 'inner' things for subselects -latter, without confusion. My hope is that we may change Executor -very easy by adding outer/inner plans/TupleSlots to -EState, CommonState, JoinState, etc and by doing node -processing in right order. - -Subselects are mostly Planner problem. - -Unfortunately, I havn't time at the moment: CHECK/DEFAULT... 
- -Vadim - -From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354 - for ; Fri, 22 Aug 1997 00:00:51 -0400 (EDT) -Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD) -Sender: root@www.krasnet.ru -Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su> -Date: Fri, 22 Aug 1997 12:22:37 +0800 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: subselects -References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Vadim B. Mikheev wrote: -> -> this will allow to use 'outer' & 'inner' things for subselects -> latter, without confusion. My hope is that we may change Executor - -Or may be use 'high' & 'low' for subselecs (to avoid confusion -with outter hoins). - -> very easy by adding outer/inner plans/TupleSlots to -> EState, CommonState, JoinState, etc and by doing node -> processing in right order. - ^^^^^^^^^^^^^^ -Rule is easy: -1. Uncorrelated subselect - do 'low' plan node first -2. Correlated - do left/right first - -- just some flag in structures. 
- -Vadim - -From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682 - for ; Thu, 30 Oct 1997 17:02:28 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for ; Thu, 30 Oct 1997 16:57:54 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726 - for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST) -From: Bruce Momjian -Message-Id: <199710302150.QAA07726@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -The only thing I have to add to what I had written earlier is that I -think it is best to have these subqueries executed as early in query -execution as possible. - -Every piece of the backend: parser, optimizer, executor, is designed to -work on a single query. The earlier we can split up the queries, the -better those pieces will work at doing their job. You want to be able -to use the parser and optimizer on each part of the query separately, if -you can. - - -Forwarded message: -> I have done some thinking about subselects. There are basically two -> issues: - > -> Does the query return one row or several rows? 
This can be
-> determined by seeing if the user uses equals on 'IN' to join the
-> subquery.
->
-> Is the query correlated, meaning "Does the subquery reference
-> values from the outer query?"
->
-> (We already have the third type of subquery, the INSERT...SELECT query.)
->
-> So we have these four combinations:
->
-> 1) one row, no correlation
-> 2) multiple rows, no correlation
-> 3) one row, correlated
-> 4) multiple rows, correlated
->
->
-> With #1, we can execute the subquery, get the value, replace the
-> subquery with the constant returned from the subquery, and execute the
-> outer query.
->
-> With #2, we can execute the subquery and put the result into a temporary
-> table. We then rewrite the outer query to access the temporary table
-> and replace the subquery with the column name from the temporary table.
-> We probabally put an index on the temp. table, which has only one
-> column, because a subquery can only return one column. We remove the
-> temp. table after query execution.
->
-> With #3 and #4, we potentially need to execute the subquery for every
-> row returned by the outer query. Performance would be horrible for
-> anything but the smallest query. Another way to handle this is to
-> execute the subquery WITHOUT using any of the outer-query columns to
-> restrict the WHERE clause, and add those columns used to join the outer
-> variables into the target list of the subquery. So for query:
->
-> select t1.name
-> from tab t1
-> where t1.age = (select max(t2.age)
-> from tab2
-> where tab2.name = t1.name)
->
-> Execute the subquery and put it in a temporary table:
->
-> select t2.name, max(t2.age)
-> into table temp999
-> from tab2
-> where tab2.name = t1.name
->
-> create index i_temp999 on temp999 (name)
->
-> Then re-write the outer query:
->
-> select t1.name
-> from tab t1, temp999
-> where t1.age = temp999.age and
-> t1.name = temp999.name
->
-> The only problem here is that the subselect is running for all entries
-> in tab2, even if the outer query is only going to need a few rows.
-> Determining whether to execute the subquery each time, or create a temp.
-> table is often difficult to determine. Even some non-correlated
-> subqueries are better to execute for each row rather the pre-execute the
-> entire subquery, expecially if the outer query returns few rows.
->
-> One requirement to handle these issues is better column statistics,
-> which I am working on.
-> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643 - for ; Fri, 31 Oct 1997 22:30:56 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for ; Fri, 31 Oct 1997 22:06:08 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for ; Fri, 31 Oct 1997 22:00:53 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566; - Fri, 31 Oct 1997 21:37:06 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711010237.VAA14566@candle.pha.pa.us> -Subject: Re: [HACKERS] subselects -To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -One more issue I thought of. You can have multiple subselects in a -single query, and subselects can have their own subselects. - -This makes it particularly important that we define a system that always -is able to process the subselect BEFORE the upper select. This will -allow use to handle all these cases without limitations. 
- -> -> The only thing I have to add to what I had written earlier is that I -> think it is best to have these subqueries executed as early in query -> execution as possible. -> -> Every piece of the backend: parser, optimizer, executor, is designed to -> work on a single query. The earlier we can split up the queries, the -> better those pieces will work at doing their job. You want to be able -> to use the parser and optimizer on each part of the query separately, if -> you can. -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From hannu@trust.ee Sun Nov 2 10:33:33 1997 -Received: from sid.trust.ee (sid.trust.ee [194.204.23.180]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619 - for ; Sun, 2 Nov 1997 10:32:04 -0500 (EST) -Received: from sid.trust.ee (wink.trust.ee [194.204.23.184]) - by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233; - Sun, 2 Nov 1997 17:30:11 +0200 -Message-ID: <345C9BFD.986C68AA@sid.trust.ee> -Date: Sun, 02 Nov 1997 17:27:57 +0200 -From: Hannu Krosing -X-Mailer: Mozilla 4.02 [en] (Win95; I) -MIME-Version: 1.0 -To: hackers-digest@postgresql.org -CC: maillist@candle.pha.pa.us -Subject: Re: [HACKERS] subselects -References: <199711010401.XAA09216@hub.org> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST) -> From: Bruce Momjian -> Subject: Re: [HACKERS] subselects -> -> One more issue I thought of. You can have multiple subselects in a -> single query, and subselects can have their own subselects. -> -> This makes it particularly important that we define a system that always -> is able to process the subselect BEFORE the upper select. This will -> allow use to handle all these cases without limitations. 
- -This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a -search criteria for the subselect, -for example you can't do - -update parts p1 -set parts.current_id = ( - select new_id - from parts p2 - where p1.old_id = p2.new_id);or - -select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice -from parts p1; - -there may be of course ways to rewrite these queries (which the optimiser should do -if it can) but IMHO, these kinds of subselects should still be allowed - -> > The only thing I have to add to what I had written earlier is that I -> > think it is best to have these subqueries executed as early in query -> > execution as possible. -> > -> > Every piece of the backend: parser, optimizer, executor, is designed to -> > work on a single query. The earlier we can split up the queries, the -> > better those pieces will work at doing their job. You want to be able -> > to use the parser and optimizer on each part of the query separately, if -> > you can. -> > -> - -Hannu - - -From vadim@sable.krasnoyarsk.su Sun Nov 2 21:30:59 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831 - for ; Sun, 2 Nov 1997 21:30:57 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for ; Sun, 2 Nov 1997 21:20:13 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su> -Date: Mon, 03 Nov 1997 09:22:38 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199711021848.NAA08319@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > One more issue I thought of. You can have multiple subselects in a -> > > single query, and subselects can have their own subselects. -> > > -> > > This makes it particularly important that we define a system that always -> > > is able to process the subselect BEFORE the upper select. This will -> > > allow use to handle all these cases without limitations. -> > -> > This would severely limit what subselects can be used for as you can't useany of the fields in the upper select in a -> > search criteria for the subselect, -> > for example you can't do -> > -> > update parts p1 -> > set parts.current_id = ( -> > select new_id -> > from parts p2 -> > where p1.old_id = p2.new_id);or -> > -> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice -> > from parts p1; -> > -> > there may be of course ways to rewrite these queries (which the optimiser should do -> > if it can) but IMHO, these kinds of subselects should still be allowed -> -> I hadn't even gotten to this point yet, but it is a good thing to keep -> in mind. -> -> In these cases, as in correlated subqueries in the where clause, we will -> create a temporary table, and add the proper join fields and tables to -> the clauses. Our version of UPDATE accepts a FROM section, and we will -> certainly use this for this purpose. - -We can't replace subselect with join if there is aggregate -in subselect. 
- -Actually, I don't see any problems if we going to process subselect -like sql-funcs: non-correlated subselects can be emulated by -funcs without args, for correlated subselects parser (analyze.c) -has to change all upper query references to $1, $2,... - -Vadim - -From vadim@sable.krasnoyarsk.su Mon Nov 3 06:07:12 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433 - for ; Mon, 3 Nov 1997 06:07:03 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su> -Date: Mon, 03 Nov 1997 18:09:43 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselects -References: <199711030316.WAA15401@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > > In these cases, as in correlated subqueries in the where clause, we will -> > > create a temporary table, and add the proper join fields and tables to -> > > the clauses. Our version of UPDATE accepts a FROM section, and we will -> > > certainly use this for this purpose. -> > -> > We can't replace subselect with join if there is aggregate -> > in subselect. -> -> I got lost here. Why can't we handle aggregates? - -Sorry, I missed using of temp tables. Sybase uses joins (without -temp tables) for non-correlated subqueries: - - A noncorrelated subquery can be evaluated as if it were an independent query. - Conceptually, the results of the subquery are substituted in the main statement, or - outer query. This is not how SQL Server actually processes statements with - subqueries. 
Noncorrelated subqueries can be alternatively stated as joins and - are processed as joins by SQL Server. - -but this is not possible if there are aggregates in subquery. - -> -> My idea was this. This is a non-correlated subquery. -... -No problems with it... - -> -> Here is a correlated example: -> -> select * -> from table_a -> where table_a.col_a in (select table_b.col_b -> from table_b -> where table_b.col_b = table_a.col_c) -> -> rewrite as: -> -> select distinct table_b.col_b, table_a.col_c -- the distinct is needed -> into table_sub -> from table_a, table_b - -First, could we add 'where table_b.col_b = table_a.col_c' here ? -Just to avoid Cartesian results ? I hope we can. - -Note that for query - - select * - from table_a - where table_a.col_a in (select table_b.col_b * table_a.col_c - from table_b) - -it's better to do - - select distinct table_a.col_a - into table table_sub - from table_b, table_a - where table_a.col_a = table_b.col_b * table_a.col_c - -once again - to avoid Cartesians. - -But what could we do for - - select * - from table_a - where table_a.col_a = (select max(table_b.col_b * table_a.col_c) - from table_b) -??? - select max(table_b.col_b * table_a.col_c), table_a.col_a - into table table_sub - from table_b, table_a - group by table_a.col_a - -first tries to sort sizeof(table_a) * sizeof(table_b) tuples... -For tables big and small with 100 000 and 1000 tuples - -select max(x*y), x from big, small group by x - -"ate" all free 140M in my file system after 20 minutes (just for -sorting - nothing more) and was killed... - -select x from big where x = cor(x); -(cor(int4) is 'select max($1*y) from small') takes 20 minutes - -this is bad too. - -> > -> > Actually, I don't see any problems if we going to process subselect -> > like sql-funcs: non-correlated subselects can be emulated by -> > funcs without args, for correlated subselects parser (analyze.c) -> > has to change all upper query references to $1, $2,... 
-> -> Yes, logically, they are SQL functions, but aren't we going to see -> terrible performance in such circumstances. My experience is that when - ^^^^^^^^^^^^^^^^^^^^ -You're right. - -> people are given subselects, they start to do huge jobs with them. -> -> In fact, the final solution may be to have both methods available, and -> switch between them depending on the size of the query sets. Each -> method has its advantages. The function example lets the outside query -> be executed, and only calls the subquery when needed. -> -> For large tables where the subselect is small and is the entire WHERE -> restriction, the SQL function gets call much too often. A simple join -> of the subquery result and the large table would be much better. This -> method also allows for sort/merge join of the subquery results, and -> index use. - -...keep thinking... - -Vadim - -From owner-pgsql-hackers@hub.org Mon Nov 3 11:01:01 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633 - for ; Mon, 3 Nov 1997 11:00:59 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for ; Mon, 3 Nov 1997 10:49:42 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for ; Mon, 3 Nov 1997 10:31:23 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262; - Mon, 3 Nov 1997 10:25:34 -0500 (EST) -From: Bruce Momjian -Message-Id: 
<199711031525.KAA02262@candle.pha.pa.us> -Subject: Re: [HACKERS] subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> Sorry, I missed using of temp tables. Sybase uses joins (without -> temp tables) for non-correlated subqueries: -> -> A noncorrelated subquery can be evaluated as if it were an independent query. -> Conceptually, the results of the subquery are substituted in the main statement, or -> outer query. This is not how SQL Server actually processes statements with -> subqueries. Noncorrelated subqueries can be alternatively stated as joins and -> are processed as joins by SQL Server. -> -> but this is not possible if there are aggregates in subquery. -> -> > -> > My idea was this. This is a non-correlated subquery. -> ... -> No problems with it... -> -> > -> > Here is a correlated example: -> > -> > select * -> > from table_a -> > where table_a.col_a in (select table_b.col_b -> > from table_b -> > where table_b.col_b = table_a.col_c) -> > -> > rewrite as: -> > -> > select distinct table_b.col_b, table_a.col_c -- the distinct is needed -> > into table_sub -> > from table_a, table_b -> -> First, could we add 'where table_b.col_b = table_a.col_c' here ? -> Just to avoid Cartesian results ? I hope we can. - -Yes, of course. I forgot that line here. We can also be fancy and move -some of the outer where restrictions on table_a into the subquery. 
- -I think the classic subquery for this would be if someone wanted all -customer names that had invoices in the past month: - -select custname -from customer -where custid in (select order.custid - from order - where order.date >= "09/01/97" and - order.date <= "09/30/97" - -In this case, the subquery can use an index on 'date' to quickly -evaluate the query, and the resulting temp table can quickly be joined -to the customer table. If we used SQL functions, every customer would -have an order query evaluated for it, and there may be no multi-column -index on customer and date, or even if there is, this could be many -query executions. - - -> -> Note that for query -> -> select * -> from table_a -> where table_a.col_a in (select table_b.col_b * table_a.col_c -> from table_b) -> -> it's better to do -> -> select distinct table_a.col_a -> into table table_sub -> from table_b, table_a -> where table_a.col_a = table_b.col_b * table_a.col_c - -Yes, I had not thought of cases where they are doing correlated column -arithmetic, but it looks like this would work. - -> -> once again - to avoid Cartesians. -> -> But what could we do for -> -> select * -> from table_a -> where table_a.col_a = (select max(table_b.col_b * table_a.col_c) -> from table_b) - -OK, who wrote this horrible query. :-) - -Without a join of table_b and table_a, even an SQL function would die on -this. You have to take the current value table_a.col_c, and multiply by -every value of table_b.col_b to get the maximum. - -Trying to do a temp table on this is certainly going to be a cartesian -product, but using an SQL function is also going to be a cartesian -product, except that the product is generated in small pieces instead of -in one big query. The SQL function example may eventually complete, but -it will take forever to do so in cases where the temp table would bomb. - -I can recommend some SQL books for anyone go sends in a bug report on -this query. :-) - - - -> ??? 
-> select max(table_b.col_b * table_a.col_c), table_a.col_a -> into table table_sub -> from table_b, table_a -> group by table_a.col_a -> -> first tries to sort sizeof(table_a) * sizeof(table_b) tuples... -> For tables big and small with 100 000 and 1000 tuples -> -> select max(x*y), x from big, small group by x -> -> "ate" all free 140M in my file system after 20 minutes (just for -> sorting - nothing more) and was killed... -> -> select x from big where x = cor(x); -> (cor(int4) is 'select max($1*y) from small') takes 20 minutes - -> this is bad too. - -Again, my feeling is that in cases where the temp table would bomb, the -SQL function will be so slow that neither will be acceptable. - -> -> > > -> > > Actually, I don't see any problems if we going to process subselect -> > > like sql-funcs: non-correlated subselects can be emulated by -> > > funcs without args, for correlated subselects parser (analyze.c) -> > > has to change all upper query references to $1, $2,... -> > -> > Yes, logically, they are SQL functions, but aren't we going to see -> > terrible performance in such circumstances. My experience is that when -> ^^^^^^^^^^^^^^^^^^^^ -> You're right. -> -> > people are given subselects, they start to do huge jobs with them. -> > -> > In fact, the final solution may be to have both methods available, and -> > switch between them depending on the size of the query sets. Each -> > method has its advantages. The function example lets the outside query -> > be executed, and only calls the subquery when needed. -> > -> > For large tables where the subselect is small and is the entire WHERE -> > restriction, the SQL function gets call much too often. A simple join -> > of the subquery result and the large table would be much better. This -> > method also allows for sort/merge join of the subquery results, and -> > index use. -> -> ...keep thinking... 
-> -> Vadim -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239 - for ; Thu, 20 Nov 1997 00:09:11 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for ; Wed, 19 Nov 1997 23:58:16 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103 - for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711200457.XAA03103@candle.pha.pa.us> -Subject: [HACKERS] subselect -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -I am going to overhaul all the /parser files, and I may give subselects -a try while I am in there. This is where it going to have to be done. - -Two things I think I need are: - - temp tables that go away at the end of a statement, so if the -query elog's out, the temp file gets destroyed - - how do I implement "not in": - - select * from a where x not in (select y from b) - -Using <> is not going to work because that returns multiple copies of a, -one for every one that doesn't equal. It is like we need not equals, -but don't return multiple rows. - -Any ideas? 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019 - for ; Thu, 20 Nov 1997 10:00:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for ; Thu, 20 Nov 1997 09:52:55 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754; - Thu, 20 Nov 1997 06:27:21 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <3473D849.16F67A2A@alumni.caltech.edu> -Date: Thu, 20 Nov 1997 06:27:21 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselect -References: <199711200457.XAA03103@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> I am going to overhaul all the /parser files - -?? - -> , and I may give subselects -> a try while I am in there. This is where it going to have to be done. - -A first cut at the subselect syntax is already in gram.y. I'm sure that the -e-mail you had sent which collected several items regarding subselects -covers some of this topic. I've been thinking about subselects also, and -had thought that there must be some existing mechanisms in the backend -which can be used to help implement subselects. It seems to me that UNION -might be a good thing to implement first, because it has a fairly -well-defined set of behaviors: - - select a union select b; - -chooses elements from a and from b and then sorts/uniques the result. - - select a union all select b; - -chooses elements from a, sorts/uniques, and then adds all elements from b. 
- - select a union select b union all select c; - -evaluates left to right, and first evaluates a union b, sorts/uniques, and -then evaluates - - (result) union all select c; - -There are several types of subselects. Examples of some are: - -1) select a.f from a union select b.f from b order by 1; -Needs temporary table(s), optional sort/unique, final order by. - -2) select a.f from a where a.f in (select b.f from b); -Needs temporary table(s). "in" can be first implemented by count(*) > 0 but -would be better performance to have the backend return after the first -match. - -3) select a.f from a where exists (select b.f from b where b.f = a); -Need to do the select and do a subselect on _each_ of the returned values? -Again could use count(*) to help implement. - -This brings up the point that perhaps the backend needs a row-counting -atomic operation and count(*) could be re-implemented using that. At the -moment count(*) is transformed to a select of OID columns and does not -quite work on table joins. - -I would think that outer joins could use some of these support routines -also. - - - Tom - -> Two things I think I need are: -> -> temp tables that go away at the end of a statement, so if the -> query elog's out, the temp file gets destroyed -> -> how do I implement "not in": -> -> select * from a where x not in (select y from b) -> -> Using <> is not going to work because that returns multiple copies of a, -> one for every one that doesn't equal. It is like we need not equals, -> but don't return multiple rows. -> -> Any ideas? 
-> -> -- -> Bruce Momjian -> maillist@candle.pha.pa.us - - - - -From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311 - for ; Mon, 22 Dec 1997 00:49:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for ; Mon, 22 Dec 1997 00:44:57 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605 - for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712220545.AAA11605@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -OK, a few questions: - - Should we use sortmerge, so we can use our psort as temp tables, -or do we use hashunique? - - How do we pass the query to the optimizer? How do we represent -the range table for each, and the links between them in correlated -subqueries? - -I have to think about this. Comments are welcome. 
--- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608 - for ; Mon, 22 Dec 1997 02:01:25 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for ; Mon, 22 Dec 1997 01:37:29 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for ; Mon, 22 Dec 1997 01:30:15 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354 - for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712220605.BAA17354@candle.pha.pa.us> -Subject: [HACKERS] subselects (fwd) -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Forwarded message: -> OK, a few questions: -> -> Should we use sortmerge, so we can use our psort as temp tables, -> or do we use hashunique? -> -> How do we pass the query to the optimizer? How do we represent -> the range table for each, and the links between them in correlated -> subqueries? -> -> I have to think about this. Comments are welcome. - -One more thing. I guess I am seeing subselects as a different thing -that temp tables. 
I can see people wanting to put indexes on their temp -tables, so I think they will need more system catalog support. For -subselects, I think we can just stuff them into psort, perhaps, and do -the unique as we unload them. - -Seems like a natural to me. - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876 - for ; Tue, 23 Dec 1997 04:00:57 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042; - Tue, 23 Dec 1997 16:08:56 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su> -Date: Tue, 23 Dec 1997 16:08:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects (fwd) -References: <199712220605.BAA17354@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Forwarded message: -> > OK, a few questions: -> > -> > Should we use sortmerge, so we can use our psort as temp tables, -> > or do we use hashunique? -> > -> > How do we pass the query to the optimizer? How do we represent -> > the range table for each, and the links between them in correlated -> > subqueries? -> > -> > I have to think about this. Comments are welcome. -> -> One more thing. I guess I am seeing subselects as a different thing -> that temp tables. I can see people wanting to put indexes on their temp -> tables, so I think they will need more system catalog support. For - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -What's the difference between temp tables and temp indices ? 
-Both of them are handled via catalog cache... - -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 3 04:01:00 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565 - for ; Sat, 3 Jan 1998 04:00:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for ; Sat, 3 Jan 1998 03:47:07 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017; - Sat, 3 Jan 1998 16:08:55 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su> -Date: Sat, 03 Jan 1998 16:08:51 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian , - "Thomas G. Lockhart" -Subject: Re: subselects -References: <199712290516.AAA12579@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> With UNIONs done, how are things going with you on subselects? UNIONs -> are much easier that subselects. -> -> I am stumped on how to record the subselect query information in the -> parser and stuff. - - And I'm too. We definitely need in EXISTS node and may be in IN one. -Also, we have to support ANY and ALL modifiers of comparison operators -(it would be nice to support ANY and ALL for all operators returning -bool: >, =, ..., like, ~ and so on). 
Note, that IN is the same as -= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types, -and so, we could avoid IN node, but I'm not sure that I like such -assumption: postgres is OO-like system allowing operators to be overriden -and so, '=' can, in theory, mean not EQUAL but something else (someday -we could allow to specify "meaning" of operator in CREATE OPERATOR) - -in short, I would like IN node. - Also, I would suggest nodes for ANY and ALL. - (I need in few days to think more about recording of this stuff...) - -> -> Please let me know what I can do to help, if anything. - -Thanks. As I remember, Tom also wished to work here. Tom ? - -Bye, - Vadim - -P.S. I'll be "on-line" Jan 5. - -From owner-pgsql-hackers@hub.org Mon Jan 5 07:30:51 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466 - for ; Mon, 5 Jan 1998 07:30:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for ; Mon, 5 Jan 1998 07:20:57 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278; - Mon, 5 Jan 1998 19:36:06 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> -Date: Mon, 05 Jan 1998 19:35:59 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselect -References: <199801050516.AAA28005@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> I was thinking about subselects, and how to attach the two queries. -> -> What if the subquery makes a range table entry in the outer query, and -> the query is set up like the UNION queries where we put the scans in a -> row, but in the case we put them over/under each other. -> -> And we push a temp table into the catalog cache that represents the -> result of the subquery, then we could join to it in the outer query as -> though it was a real table. -> -> Also, can't we do the correlated subqueries by adding the proper -> target/output columns to the subquery, and have the outer query -> reference those columns in the subquery range table entry. - -Yes, this is a way to handle subqueries by joining to temp table. -After getting plan we could change temp table access path to -node material. On the other hand, it could be useful to let optimizer -know about cost of temp table creation (have to think more about it)... -Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -is one example of this - joining by <> will give us invalid results. -Setting special NOT EQUAL flag is not enough: subquery plan must be -always inner one in this case. The same for handling ALL modifier. -Note, that we generaly can't use aggregates here: we can't add MAX to -subquery in the case of > ALL (subquery), because of > ALL should return FALSE -if subquery returns NULL(s) but aggregates don't take NULLs into account. - -> -> Maybe I can write up a sample of this? Vadim, would this help? Is this -> the point we are stuck at? 
- -Personally, I was stuck by holydays -:) -Now I can spend ~ 8 hours ~ each day for development... - -Vadim - - -From owner-pgsql-hackers@hub.org Mon Jan 5 10:45:30 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769 - for ; Mon, 5 Jan 1998 10:45:28 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for ; Mon, 5 Jan 1998 10:31:06 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375; - Mon, 5 Jan 1998 10:28:48 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801051528.KAA10375@candle.pha.pa.us> -Subject: Re: [HACKERS] subselect -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> Yes, this is a way to handle subqueries by joining to temp table. -> After getting plan we could change temp table access path to -> node material. On the other hand, it could be useful to let optimizer -> know about cost of temp table creation (have to think more about it)... -> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -> is one example of this - joining by <> will give us invalid results. 
-> Setting special NOT EQUAL flag is not enough: subquery plan must be -> always inner one in this case. The same for handling ALL modifier. -> Note, that we generaly can't use aggregates here: we can't add MAX to -> subquery in the case of > ALL (subquery), because of > ALL should return FALSE -> if subquery returns NULL(s) but aggregates don't take NULLs into account. - -OK, here are my ideas. First, I think you have to handle subselects in -the outer node because a subquery could have its own subquery. Also, we -now have a field in Aggreg to all us to 'usenulls'. - -OK, here it is. I recommend we pass the outer and subquery through -the parser and optimizer separately. - -We parse the subquery first. If the subquery is not correlated, it -should parse fine. If it is correlated, any columns we find in the -subquery that are not already in the FROM list, we add the table to the -subquery FROM list, and add the referenced column to the target list of -the subquery. - -When we are finished parsing the subquery, we create a catalog cache -entry for it called 'sub1' and make its fields match the target -list of the subquery. - -In the outer query, we add 'sub1' to its target list, and change -the subquery reference to point to the new range table. We also add -WHERE clauses to do any correlated joins. - -Here is a simple example: - - select * - from taba - where col1 = (select col2 - from tabb) - -This is not correlated, and the subquery parser easily. We create a -'sub1' catalog cache entry, and add 'sub1' to the outer query FROM -clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'. - -Here is a more complex correlated subquery: - - select * - from taba - where col1 = (select col2 - from tabb - where taba.col3 = tabb.col4) - -Here we must add 'taba' to the subquery's FROM list, and add col3 to the -target list of the subquery. 
After we parse the subquery, add 'sub1' to -the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = -sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. -THe optimizer will do the correlation for us. - -In the optimizer, we can parse the subquery first, then the outer query, -and then replace all 'sub1' references in the outer query to use the -subquery plan. - -I realize making merging the two plans and doing IN and NOT IN is the -real challenge, but I hoped this would give us a start. - -What do you think? - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 15:02:46 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690 - for ; Mon, 5 Jan 1998 15:02:44 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for ; Mon, 5 Jan 1998 14:28:43 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904; - Tue, 6 Jan 1998 02:56:00 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 02:55:57 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801051528.KAA10375@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > always inner one in this case. The same for handling ALL modifier. 
-> > Note, that we generaly can't use aggregates here: we can't add MAX to -> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE -> > if subquery returns NULL(s) but aggregates don't take NULLs into account. -> -> OK, here are my ideas. First, I think you have to handle subselects in -> the outer node because a subquery could have its own subquery. Also, we - -I hope that this is no matter: if results of subquery (with/without sub-subqueries) -will go into temp table then this table will be re-scanned for each outer tuple. - -> now have a field in Aggreg to all us to 'usenulls'. - ^^^^^^^^ - This can't help: - -vac=> select * from x; -y -- -1 -2 -3 - <<< this is NULL -(4 rows) - -vac=> select max(y) from x; -max ---- - 3 - -==> we can't replace - -select * from A where A.a > ALL (select y from x); - ^^^^^^^^^^^^^^^ - (NULL will be returned and so A.a > ALL is FALSE - this is what - Sybase does, is it right ?) -with - -select * from A where A.a > (select max(y) from x); - ^^^^^^^^^^^^^^^^^^^^ -just because of we lose knowledge about NULLs here. - -Also, I would like to handle ANY and ALL modifiers for all bool -operators, either built-in or user-defined, for all data types - -isn't PostgreSQL OO-like RDBMS -:) - -> OK, here it is. I recommend we pass the outer and subquery through -> the parser and optimizer separately. - -I don't like this. I would like to get parse-tree from parser for -entire query and let optimizer (on upper level) decide how to rewrite -parse-tree and what plans to produce and how these plans should be -merged. Note, that I don't object your methods below, but only where -to place handling of this. I don't understand why should we add -new part to the system which will do optimizer' work (parse-tree --> -execution plan) and deal with optimizer nodes. Imho, upper optimizer -level is nice place to do this. - -> -> We parse the subquery first. If the subquery is not correlated, it -> should parse fine. 
If it is correlated, any columns we find in the -> subquery that are not already in the FROM list, we add the table to the -> subquery FROM list, and add the referenced column to the target list of -> the subquery. -> -> When we are finished parsing the subquery, we create a catalog cache -> entry for it called 'sub1' and make its fields match the target -> list of the subquery. -> -> In the outer query, we add 'sub1' to its target list, and change -> the subquery reference to point to the new range table. We also add -> WHERE clauses to do any correlated joins. -... -> Here is a more complex correlated subquery: -> -> select * -> from taba -> where col1 = (select col2 -> from tabb -> where taba.col3 = tabb.col4) -> -> Here we must add 'taba' to the subquery's FROM list, and add col3 to the -> target list of the subquery. After we parse the subquery, add 'sub1' to -> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = -> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. -> THe optimizer will do the correlation for us. -> -> In the optimizer, we can parse the subquery first, then the outer query, -> and then replace all 'sub1' references in the outer query to use the -> subquery plan. -> -> I realize making merging the two plans and doing IN and NOT IN is the - ^^^^^^^^^^^^^^^^^^^^^ -This is very easy to do! As I already said we have just change sub1 -access path (SeqScan of sub1) with SeqScan of Material node with -subquery plan. - -> real challenge, but I hoped this would give us a start. - -Decision about how to record subquery stuff in to parse-tree -would be very good start -:) - -BTW, note that for _expression_ subqueries (which are introduced without -IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - -we have to check that subquery returns single tuple... 
- -Vadim - -From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:03 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836 - for ; Mon, 5 Jan 1998 20:31:01 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for ; Mon, 5 Jan 1998 19:56:05 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST) -Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for ; Mon, 5 Jan 1998 19:02:30 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) - by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337 - for ; Mon, 5 Jan 1998 17:31:04 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675; - Mon, 5 Jan 1998 17:16:40 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801052216.RAA02675@candle.pha.pa.us> -Subject: Re: [HACKERS] subselect -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> > I am confused. Do you want one flat query and want to pass the whole -> > thing into the optimizer? That brings up some questions: -> -> No. 
I just want to follow Tom's way: I would like to see new -> SubSelect node as shortened version of struct Query (or use -> Query structure for each subquery - no matter for me), some -> subquery-related stuff added to Query (and SubSelect) to help -> optimizer to start, and see - -OK, so you want the subquery to actually be INSIDE the outer query -expression. Do they share a common range table? If they don't, we -could very easily just fly through when processing the WHERE clause, and -start a new query using a new query structure for the subquery. Believe -me, you don't want a separate SubQuery-type, just re-use Query for it. -It allows you to call all the normal query stuff with a consistent -structure. - -The parser will need to know it is in a subquery, so it can add the -proper target columns to the subquery, or are you going to do that in -the optimizer. You can do it in the optimizer, and join the range table -references there too. - -> -> typedef struct A_Expr -> { -> NodeTag type; -> int oper; /* type of operation -> * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> IN, NOT IN, ANY, ALL, EXISTS here, -> -> char *opname; /* name of operator/function */ -> Node *lexpr; /* left argument */ -> Node *rexpr; /* right argument */ -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> and SubSelect (Query) here (as possible case). -> -> One thought to follow this way: RULEs (and so - VIEWs) are handled by using -> Query - how else can we implement VIEWs on selects with subqueries ? - -Views are stored as nodeout structures, and are merged into the query's -from list, target list, and where clause. I am working out -readfunc,outfunc now to make sure they are up-to-date with all the -current fields. - -> -> BTW, is -> -> select * from A where (select TRUE from B); -> -> valid syntax ? - -I don't think so. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 17:01:54 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066 - for ; Mon, 5 Jan 1998 17:01:47 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063; - Tue, 6 Jan 1998 05:18:13 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 05:18:11 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801052051.PAA29341@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > OK, here it is. I recommend we pass the outer and subquery through -> > > the parser and optimizer separately. -> > -> > I don't like this. I would like to get parse-tree from parser for -> > entire query and let optimizer (on upper level) decide how to rewrite -> > parse-tree and what plans to produce and how these plans should be -> > merged. Note, that I don't object your methods below, but only where -> > to place handling of this. I don't understand why should we add -> > new part to the system which will do optimizer' work (parse-tree --> -> > execution plan) and deal with optimizer nodes. Imho, upper optimizer -> > level is nice place to do this. -> -> I am confused. Do you want one flat query and want to pass the whole -> thing into the optimizer? That brings up some questions: - -No. 
I just want to follow Tom's way: I would like to see new -SubSelect node as shortened version of struct Query (or use -Query structure for each subquery - no matter for me), some -subquery-related stuff added to Query (and SubSelect) to help -optimizer to start, and see - -typedef struct A_Expr -{ - NodeTag type; - int oper; /* type of operation - * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - IN, NOT IN, ANY, ALL, EXISTS here, - - char *opname; /* name of operator/function */ - Node *lexpr; /* left argument */ - Node *rexpr; /* right argument */ - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - and SubSelect (Query) here (as possible case). - -One thought to follow this way: RULEs (and so - VIEWs) are handled by using -Query - how else can we implement VIEWs on selects with subqueries ? - -BTW, is - -select * from A where (select TRUE from B); - -valid syntax ? - -Vadim - -From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:57 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296 - for ; Mon, 5 Jan 1998 18:00:55 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for ; Mon, 5 Jan 1998 17:22:21 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094; - Tue, 6 Jan 1998 05:49:02 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 05:48:58 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Goran Thyni -CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Goran Thyni wrote: -> -> Vadim, -> -> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN -> is one example of this - joining by <> will give us invalid results. -> -> What is you approach towards this problem? - -Actually, this is problem of ALL modifier (NOT IN is _not_equal_ ALL) -and so, we have to have not just NOT EQUAL flag but some ALL node -with modified operator. - -After that, one way is put subquery into inner plan of an join node -to be sure that for an outer tuple all corresponding subquery tuples -will be tested with modified operator (this will require either -changing code of all join nodes or addition of new plan type - we'll see) -and another way is ... suggested by you: - -> I got an idea that one could reverse the order, -> that is execute the outer first into a temptable -> and delete from that according to the result of the -> subquery and then return it. -> Probably this is too raw and slow. ;-) - -This will be faster in some cases (when subquery returns many results -and there are "not so many" results from outer query) - thanks for idea! - -> -> Personally, I was stuck by holydays -:) -> Now I can spend ~ 8 hours ~ each day for development... -> -> Oh, isn't it christmas eve right now in Russia? 
- -Due to historic reasons New Year is mu-u-u-uch popular -holiday in Russia -:) - -Vadim - - -From vadim@sable.krasnoyarsk.su Mon Jan 5 18:00:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300 - for ; Mon, 5 Jan 1998 18:00:57 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for ; Mon, 5 Jan 1998 17:42:15 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; - Tue, 6 Jan 1998 06:10:05 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 06:09:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801052216.RAA02675@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > I am confused. Do you want one flat query and want to pass the whole -> > > thing into the optimizer? That brings up some questions: -> > -> > No. I just want to follow Tom's way: I would like to see new -> > SubSelect node as shortened version of struct Query (or use -> > Query structure for each subquery - no matter for me), some -> > subquery-related stuff added to Query (and SubSelect) to help -> > optimizer to start, and see -> -> OK, so you want the subquery to actually be INSIDE the outer query -> expression. Do they share a common range table? If they don't, we - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -No. - -> could very easily just fly through when processing the WHERE clause, and -> start a new query using a new query structure for the subquery. 
Believe - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -... and filling some subquery-related stuff in upper query structure - -still don't know what exactly this could be -:) - -> me, you don't want a separate SubQuery-type, just re-use Query for it. -> It allows you to call all the normal query stuff with a consistent -> structure. - -No objections. - -> -> The parser will need to know it is in a subquery, so it can add the -> proper target columns to the subquery, or are you going to do that in - -I don't think that we need in it, but list of correlation clauses -could be good thing - all in all parser has to check all column -references... - -> the optimizer. You can do it in the optimizer, and join the range table -> references there too. - -Yes. - -> > typedef struct A_Expr -> > { -> > NodeTag type; -> > int oper; /* type of operation -> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > IN, NOT IN, ANY, ALL, EXISTS here, -> > -> > char *opname; /* name of operator/function */ -> > Node *lexpr; /* left argument */ -> > Node *rexpr; /* right argument */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > and SubSelect (Query) here (as possible case). -> > -> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using -> > Query - how else can we implement VIEWs on selects with subqueries ? -> -> Views are stored as nodeout structures, and are merged into the query's -> from list, target list, and where clause. I am working out -> readfunc,outfunc now to make sure they are up-to-date with all the -> current fields. - -Nice! This stuff was out-of-date for too long time. - -> > BTW, is -> > -> > select * from A where (select TRUE from B); -> > -> > valid syntax ? -> -> I don't think so. - -And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, -ANY, ALL, EXISTS - well. 
- -(Time to sleep -:) - -Vadim - -From owner-pgsql-hackers@hub.org Mon Jan 5 20:31:08 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06842 - for ; Mon, 5 Jan 1998 20:31:06 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id UAA00621 for ; Mon, 5 Jan 1998 20:03:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28043; Mon, 5 Jan 1998 19:06:11 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:38 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27270 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:22 -0500 (EST) -Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27141 for ; Mon, 5 Jan 1998 19:02:50 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09919 - for ; Mon, 5 Jan 1998 17:54:47 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129; - Tue, 6 Jan 1998 06:10:05 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su> -Date: Tue, 06 Jan 1998 06:09:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] subselect -References: <199801052216.RAA02675@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> > > I am confused. Do you want one flat query and want to pass the whole -> > > thing into the optimizer? 
That brings up some questions: -> > -> > No. I just want to follow Tom's way: I would like to see new -> > SubSelect node as shortened version of struct Query (or use -> > Query structure for each subquery - no matter for me), some -> > subquery-related stuff added to Query (and SubSelect) to help -> > optimizer to start, and see -> -> OK, so you want the subquery to actually be INSIDE the outer query -> expression. Do they share a common range table? If they don't, we - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -No. - -> could very easily just fly through when processing the WHERE clause, and -> start a new query using a new query structure for the subquery. Believe - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -... and filling some subquery-related stuff in upper query structure - -still don't know what exactly this could be -:) - -> me, you don't want a separate SubQuery-type, just re-use Query for it. -> It allows you to call all the normal query stuff with a consistent -> structure. - -No objections. - -> -> The parser will need to know it is in a subquery, so it can add the -> proper target columns to the subquery, or are you going to do that in - -I don't think that we need in it, but list of correlation clauses -could be good thing - all in all parser has to check all column -references... - -> the optimizer. You can do it in the optimizer, and join the range table -> references there too. - -Yes. - -> > typedef struct A_Expr -> > { -> > NodeTag type; -> > int oper; /* type of operation -> > * {OP,OR,AND,NOT,ISNULL,NOTNULL} */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > IN, NOT IN, ANY, ALL, EXISTS here, -> > -> > char *opname; /* name of operator/function */ -> > Node *lexpr; /* left argument */ -> > Node *rexpr; /* right argument */ -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > and SubSelect (Query) here (as possible case). 
-> > -> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using -> > Query - how else can we implement VIEWs on selects with subqueries ? -> -> Views are stored as nodeout structures, and are merged into the query's -> from list, target list, and where clause. I am working out -> readfunc,outfunc now to make sure they are up-to-date with all the -> current fields. - -Nice! This stuff was out-of-date for too long time. - -> > BTW, is -> > -> > select * from A where (select TRUE from B); -> > -> > valid syntax ? -> -> I don't think so. - -And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN, -ANY, ALL, EXISTS - well. - -(Time to sleep -:) - -Vadim - - -From owner-pgsql-hackers@hub.org Thu Jan 8 23:10:50 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707 - for ; Thu, 8 Jan 1998 23:10:48 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for ; Thu, 8 Jan 1998 23:08:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for ; Thu, 8 Jan 1998 23:00:50 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243; - Thu, 8 Jan 1998 22:55:03 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801090355.WAA09243@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) -Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST) -Cc: hackers@postgreSQL.org (PostgreSQL-development) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Vadim, I know you are still thinking about subselects, but I have some -more clarification that may help. - -We have to add phantom range table entries to correlated subselects so -they will pass the parser. We might as well add those fields to the -target list of the subquery at the same time: - - select * - from taba - where col1 = (select col2 - from tabb - where taba.col3 = tabb.col4) - -becomes: - - select * - from taba - where col1 = (select col2, tabb.col4 <--- - from tabb, taba <--- - where taba.col3 = tabb.col4) - -We add a field to TargetEntry and RangeTblEntry to mark the fact that it -was entered as a correlation entry: - - bool isCorrelated; - -Second, we need to hook the subselect to the main query. I recommend we -add two fields to Query for this: - - Query *parentQuery; - List *subqueries; - -The parentQuery pointer is used to resolve field names in the correlated -subquery. - - select * - from taba - where col1 = (select col2, tabb.col4 <--- - from tabb, taba <--- - where taba.col3 = tabb.col4) - -In the query above, the subquery can be easily parsed, and we add the -subquery to the parsent's parentQuery list. - -In the parent query, to parse the WHERE clause, we create a new operator -type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -right side is an index to a slot in the subqueries List. - -We can then do the rest in the upper optimizer. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Fri Jan 9 10:01:01 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305 - for ; Fri, 9 Jan 1998 10:00:59 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for ; Fri, 9 Jan 1998 09:52:17 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623; - Fri, 9 Jan 1998 22:10:25 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> -Date: Fri, 09 Jan 1998 22:10:06 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: subselects -References: <199801090355.WAA09243@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Vadim, I know you are still thinking about subselects, but I have some -> more clarification that may help. -> -> We have to add phantom range table entries to correlated subselects so -> they will pass the parser. We might as well add those fields to the -> target list of the subquery at the same time: -> -> select * -> from taba -> where col1 = (select col2 -> from tabb -> where taba.col3 = tabb.col4) -> -> becomes: -> -> select * -> from taba -> where col1 = (select col2, tabb.col4 <--- -> from tabb, taba <--- -> where taba.col3 = tabb.col4) -> -> We add a field to TargetEntry and RangeTblEntry to mark the fact that it -> was entered as a correlation entry: -> -> bool isCorrelated; - -No, I don't like to add anything in parser. 
Example: - - select * - from tabA - where col1 = (select col2 - from tabB - where tabA.col3 = tabB.col4 - and exists (select * - from tabC - where tabB.colX = tabC.colX and - tabC.colY = tabA.col2) - ) - -: a column of tabA is referenced in sub-subselect -(is it allowable by standards ?) - in this case it's better -to don't add tabA to 1st subselect but add tabA to second one -and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -this gives us 2-tables join in 1st subquery instead of 3-tables join. -(And I'm still not sure that using temp tables is best of what can be -done in all cases...) - -Instead of using isCorrelated in TE & RTE we can add - -Index varlevel; - -to Var node to reflect (sub)query from where this Var is come -(where is range table to find var's relation using varno). Upmost query -will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. - ^^^ ^^^^^^^^^^^^ -(I don't see problems with distinguishing Vars of different children -on the same level...) - -> -> Second, we need to hook the subselect to the main query. I recommend we -> add two fields to Query for this: -> -> Query *parentQuery; -> List *subqueries; - -Agreed. And maybe Index queryLevel. - -> In the parent query, to parse the WHERE clause, we create a new operator -> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the - ^^^^^^^^^^^^^^^^^^ -No. We have to handle (a,b,c) OP (select x, y, z ...) and -'_a_constant_' OP (select ...) - I don't know is last in standards, -Sybase has this. 
- -Well, - -typedef enum OpType -{ - OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR - -+ OP_EXISTS, OP_ALL, OP_ANY - -} OpType; - -typedef struct Expr -{ - NodeTag type; - Oid typeOid; /* oid of the type of this expr */ - OpType opType; /* type of the op */ - Node *oper; /* could be Oper or Func */ - List *args; /* list of argument nodes */ -} Expr; - -OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries - List, following your suggestion) - -OP_ALL, OP_ANY: - -oper is List of Oper nodes. We need in list because of data types of -a, b, c (above) can be different and so Oper nodes will be different too. - -lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - -left side of subquery' operator. -lsecond(args) is SubSelect. - -Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -by parser into corresponding ANY and ALL. At the moment we can do: - -IN --> = ANY, NOT IN --> <> ALL - -but this will be "known bug": this breaks OO-nature of Postgres, because of -operators can be overrided and '=' can mean s o m e t h i n g (not equality). -Example: box data type. For boxes, = means equality of _areas_ and =~ -means that boxes are the same ==> =~ ANY should be used for IN. - -> right side is an index to a slot in the subqueries List. 
- -Vadim - -From owner-pgsql-hackers@hub.org Fri Jan 9 17:44:04 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779 - for ; Fri, 9 Jan 1998 17:44:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for ; Fri, 9 Jan 1998 17:31:24 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282; - Fri, 9 Jan 1998 17:31:41 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801092231.RAA24282@candle.pha.pa.us> -Subject: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > Vadim, I know you are still thinking about subselects, but I have some -> > more clarification that may help. -> > -> > We have to add phantom range table entries to correlated subselects so -> > they will pass the parser. 
We might as well add those fields to the -> > target list of the subquery at the same time: -> > -> > select * -> > from taba -> > where col1 = (select col2 -> > from tabb -> > where taba.col3 = tabb.col4) -> > -> > becomes: -> > -> > select * -> > from taba -> > where col1 = (select col2, tabb.col4 <--- -> > from tabb, taba <--- -> > where taba.col3 = tabb.col4) -> > -> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it -> > was entered as a correlation entry: -> > -> > bool isCorrelated; -> -> No, I don't like to add anything in parser. Example: -> -> select * -> from tabA -> where col1 = (select col2 -> from tabB -> where tabA.col3 = tabB.col4 -> and exists (select * -> from tabC -> where tabB.colX = tabC.colX and -> tabC.colY = tabA.col2) -> ) -> -> : a column of tabA is referenced in sub-subselect - -This is a strange case that I don't think we need to handle in our first -implementation. - -> (is it allowable by standards ?) - in this case it's better -> to don't add tabA to 1st subselect but add tabA to second one -> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> this gives us 2-tables join in 1st subquery instead of 3-tables join. -> (And I'm still not sure that using temp tables is best of what can be -> done in all cases...) - -I don't see any use for temp tables in subselects anymore. After having -implemented UNIONS, I now see how much can be done in the upper -optimizer. I see you just putting the subquery PLAN into the proper -place in the plan tree, with some proper JOIN nodes for IN, NOT IN. - -> -> Instead of using isCorrelated in TE & RTE we can add -> -> Index varlevel; - -OK. Sounds good. - -> -> to Var node to reflect (sub)query from where this Var is come -> (where is range table to find var's relation using varno). Upmost query -> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. 
-> ^^^ ^^^^^^^^^^^^ -> (I don't see problems with distinguishing Vars of different children -> on the same level...) -> -> > -> > Second, we need to hook the subselect to the main query. I recommend we -> > add two fields to Query for this: -> > -> > Query *parentQuery; -> > List *subqueries; -> -> Agreed. And maybe Index queryLevel. - -Sure. If it helps. - -> -> > In the parent query, to parse the WHERE clause, we create a new operator -> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> ^^^^^^^^^^^^^^^^^^ -> No. We have to handle (a,b,c) OP (select x, y, z ...) and -> '_a_constant_' OP (select ...) - I don't know is last in standards, -> Sybase has this. - -I have never seen this in my eight years of SQL. Perhaps we can leave -this for later, maybe much later. - -> -> Well, -> -> typedef enum OpType -> { -> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR -> -> + OP_EXISTS, OP_ALL, OP_ANY -> -> } OpType; -> -> typedef struct Expr -> { -> NodeTag type; -> Oid typeOid; /* oid of the type of this expr */ -> OpType opType; /* type of the op */ -> Node *oper; /* could be Oper or Func */ -> List *args; /* list of argument nodes */ -> } Expr; -> -> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries -> List, following your suggestion) -> -> OP_ALL, OP_ANY: -> -> oper is List of Oper nodes. We need in list because of data types of -> a, b, c (above) can be different and so Oper nodes will be different too. -> -> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - -> left side of subquery' operator. -> lsecond(args) is SubSelect. -> -> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> by parser into corresponding ANY and ALL. 
At the moment we can do: -> -> IN --> = ANY, NOT IN --> <> ALL -> -> but this will be "known bug": this breaks OO-nature of Postgres, because of -> operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> Example: box data type. For boxes, = means equality of _areas_ and =~ -> means that boxes are the same ==> =~ ANY should be used for IN. - -That is interesting, to use =~ for ANY. - -Yes, but how many operators take a SUBQUERY as an operand. This is a -special case to me. - -I think I see where you are trying to go. You want subselects to behave -like any other operator, with a subselect type, and you do all the -subselect handling in the optimizer, with special Nodes and actions. - -I think this may be just too much of a leap. We have such clean query -logic for single queries, I can't imagine having an operator that has a -Query operand, and trying to get everything to properly handle it. -UNIONS were very easy to implement as a List off of Query, with some -foreach()'s in rewrite and the high optimizer. - -Subselects are SQL standard, and are never going to be over-ridden by a -user. Same with UNION. They want UNION, they get UNION. They want -Subselect, we are going to spin through the Query structure and give -them what they want. - -The complexities of subselects and correlated queries and range tables -and stuff is so bizarre that trying to get it to work inside the type -system could be a huge project. - -> -> > right side is an index to a slot in the subqueries List. - -I guess the question is what can we have by February 1? - -I have been reading some postings, and it seems to me that subselects -are the litmus test for many evaluators when deciding if a database -engine is full-featured. - -Sorry to be so straightforward, but I want to keep hashing this around -until we get a conclusion, so coding can start. 
- -My suggestions have been, I believe, trying to get subselects working -with the fullest functionality by adding the least amount of code, and -keeping the logic clean. - -Have you checked out the UNION code? It is very small, but it works. I -think it could make a good sample for subselects. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742 - for ; Sat, 10 Jan 1998 12:00:43 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684; - Sun, 11 Jan 1998 00:19:10 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:19:08 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgresql.org, "Thomas G. Lockhart" -Subject: Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > No, I don't like to add anything in parser. Example: -> > -> > select * -> > from tabA -> > where col1 = (select col2 -> > from tabB -> > where tabA.col3 = tabB.col4 -> > and exists (select * -> > from tabC -> > where tabB.colX = tabC.colX and -> > tabC.colY = tabA.col2) -> > ) -> > -> > : a column of tabA is referenced in sub-subselect -> -> This is a strange case that I don't think we need to handle in our first -> implementation. - -I don't know is this strange case or not :) -But I would like to know is this allowed by standards - can someone -comment on this ? -And I don't see problems with handling this... - -> -> > (is it allowable by standards ?) 
- in this case it's better -> > to don't add tabA to 1st subselect but add tabA to second one -> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> > this gives us 2-tables join in 1st subquery instead of 3-tables join. -> > (And I'm still not sure that using temp tables is best of what can be -> > done in all cases...) -> -> I don't see any use for temp tables in subselects anymore. After having -> implemented UNIONS, I now see how much can be done in the upper -> optimizer. I see you just putting the subquery PLAN into the proper -> place in the plan tree, with some proper JOIN nodes for IN, NOT IN. - -When saying about temp tables, I meant tables created by node Material -for subquery plan. This is one of two ways - run subquery once for all -possible upper plan tuples and then just join result table with upper -query. Another way is re-run subquery for each upper query tuple, -without temp table but may be with caching results by some ways. -Actually, there is special case - when subquery can be alternatively -formulated as joins, - but this is just special case. - -> > > In the parent query, to parse the WHERE clause, we create a new operator -> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> > ^^^^^^^^^^^^^^^^^^ -> > No. We have to handle (a,b,c) OP (select x, y, z ...) and -> > '_a_constant_' OP (select ...) - I don't know is last in standards, -> > Sybase has this. -> -> I have never seen this in my eight years of SQL. Perhaps we can leave -> this for later, maybe much later. - -Are you saying about (a, b, c) or about 'a_constant' ? -Again, can someone comment on are they in standards or not ? -Tom ? -If yes then please add parser' support for them now... - -> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> > by parser into corresponding ANY and ALL. 
At the moment we can do: -> > -> > IN --> = ANY, NOT IN --> <> ALL -> > -> > but this will be "known bug": this breaks OO-nature of Postgres, because of -> > operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> > Example: box data type. For boxes, = means equality of _areas_ and =~ -> > means that boxes are the same ==> =~ ANY should be used for IN. -> -> That is interesting, to use =~ for ANY. -> -> Yes, but how many operators take a SUBQUERY as an operand. This is a -> special case to me. -> -> I think I see where you are trying to go. You want subselects to behave -> like any other operator, with a subselect type, and you do all the -> subselect handling in the optimizer, with special Nodes and actions. -> -> I think this may be just too much of a leap. We have such clean query -> logic for single queries, I can't imagine having an operator that has a -> Query operand, and trying to get everything to properly handle it. -> UNIONS were very easy to implement as a List off of Query, with some -> foreach()'s in rewrite and the high optimizer. -> -> Subselects are SQL standard, and are never going to be over-ridden by a -> user. Same with UNION. They want UNION, they get UNION. They want -> Subselect, we are going to spin through the Query structure and give -> them what they want. -> -> The complexities of subselects and correlated queries and range tables -> and stuff is so bizarre that trying to get it to work inside the type -> system could be a huge project. - -PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), -derived from the Berkeley Postgres database management system. While -PostgreSQL retains the powerful object-relational data model, rich data types and - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -easy extensibility of Postgres, it replaces the PostQuel query language with an -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -extended subset of SQL. 
-^^^^^^^^^^^^^^^^^^^^^^ - -Should we say users that subselect will work for standard data types only ? -I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -Is there difference between handling = ANY and ~ ANY ? I don't see any. -Currently we can't get IN working properly for boxes (and may be for others too) -and I don't like to try to resolve these problems now, but hope that someday -we'll be able to do this. At the moment - just convert IN into = ANY and -NOT IN into <> ALL in parser. - -(BTW, do you know how DISTINCT is implemented ? It doesn't use = but -use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -> > -> > > right side is an index to a slot in the subqueries List. -> -> I guess the question is what can we have by February 1? -> -> I have been reading some postings, and it seems to me that subselects -> are the litmus test for many evaluators when deciding if a database -> engine is full-featured. -> -> Sorry to be so straightforward, but I want to keep hashing this around -> until we get a conclusion, so coding can start. -> -> My suggestions have been, I believe, trying to get subselects working -> with the fullest functionality by adding the least amount of code, and -> keeping the logic clean. -> -> Have you checked out the UNION code? It is very small, but it works. I -> think it could make a good sample for subselects. - -There is big difference between subqueries and queries in UNION - -there are not dependences between UNION queries. - -Ok, opened issues: - -1. Is using upper query' vars in all subquery levels in standard ? -2. Is (a, b, c) OP (subselect) in standard ? -3. What types of expressions (Var, Const, ...) are allowed on the left - side of operator with subquery on the right ? -4. What types of operators should we support (=, >, ..., like, ~, ...) ? - (My vote for all boolean operators). - -And - did we get consensus on presentation subqueries stuff in Query, -Expr and Var ? 
-I would like to have something done in parser near Jan 17 to get -subqueries working by Feb 1. I vote for support of all standard -things (1. - 3.) in parser right now - if there will be no time -to implement something like (a, b, c) then optimizer will call -elog(WARN) (oh, sorry, - elog(ERROR)). - -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045 - for ; Sat, 10 Jan 1998 12:31:01 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for ; Sat, 10 Jan 1998 12:22:30 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725; - Sun, 11 Jan 1998 00:41:22 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:41:19 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199712220545.AAA11605@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, a few questions: -> -> Should we use sortmerge, so we can use our psort as temp tables, -> or do we use hashunique? -> -> How do we pass the query to the optimizer? How do we represent -> the range table for each, and the links between them in correlated -> subqueries? - -My suggestion is just use varlevel in Var and don't put upper query' -relations into subquery range table. 
- -Vadim - -From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357 - for ; Sat, 10 Jan 1998 13:00:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for ; Sat, 10 Jan 1998 12:40:02 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741; - Sun, 11 Jan 1998 00:58:56 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su> -Date: Sun, 11 Jan 1998 00:58:52 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] subselects -References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Vadim B. Mikheev wrote: -> -> Bruce Momjian wrote: -> > -> > OK, a few questions: -> > -> > Should we use sortmerge, so we can use our psort as temp tables, -> > or do we use hashunique? -> > -> > How do we pass the query to the optimizer? How do we represent -> > the range table for each, and the links between them in correlated -> > subqueries? -> -> My suggestion is just use varlevel in Var and don't put upper query' -> relations into subquery range table. - -Hmm... Sorry, it seems that I did reply to very old message - forget it. 
- -Vadim - -From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664 - for ; Sat, 10 Jan 1998 13:30:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for ; Sat, 10 Jan 1998 13:05:09 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623; - Sat, 10 Jan 1998 18:01:03 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu> -Date: Sat, 10 Jan 1998 18:01:03 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in -> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred -> > > by parser into corresponding ANY and ALL. At the moment we can do: -> > > -> > > IN --> = ANY, NOT IN --> <> ALL -> > > -> > > but this will be "known bug": this breaks OO-nature of Postgres, because of -> > > operators can be overrided and '=' can mean s o m e t h i n g (not equality). -> > > Example: box data type. For boxes, = means equality of _areas_ and =~ -> > > means that boxes are the same ==> =~ ANY should be used for IN. -> > -> > That is interesting, to use =~ for ANY. - -If I understand the discussion, I would think it is fine to make an assumption about -which operator is used to implement a subselect expression.
If someone remaps an -operator to mean something different, then they will get a different result (or a -nonsensical one) from a subselect. - -I'd be happy to remap existing operators to fit into a convention which would work -with subselects (especially if I got to help choose :). - -> > Subselects are SQL standard, and are never going to be over-ridden by a -> > user. Same with UNION. They want UNION, they get UNION. They want -> > Subselect, we are going to spin through the Query structure and give -> > them what they want. -> -> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), -> derived from the Berkeley Postgres database management system. While -> PostgreSQL retains the powerful object-relational data model, rich data types and -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> easy extensibility of Postgres, it replaces the PostQuel query language with an -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> extended subset of SQL. -> ^^^^^^^^^^^^^^^^^^^^^^ -> -> Should we say users that subselect will work for standard data types only ? -> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -> Is there difference between handling = ANY and ~ ANY ? I don't see any. -> Currently we can't get IN working properly for boxes (and may be for others too) -> and I don't like to try to resolve these problems now, but hope that someday -> we'll be able to do this. At the moment - just convert IN into = ANY and -> NOT IN into <> ALL in parser. -> -> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but -> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted -list? That would give more consistent behavior... - -> > I have been reading some postings, and it seems to me that subselects -> > are the litmus test for many evaluators when deciding if a database -> > engine is full-featured.
-> > -> > Sorry to be so straightforward, but I want to keep hashing this around -> > until we get a conclusion, so coding can start. -> > -> > My suggestions have been, I believe, trying to get subselects working -> > with the fullest functionality by adding the least amount of code, and -> > keeping the logic clean. -> > -> > Have you checked out the UNION code? It is very small, but it works. I -> > think it could make a good sample for subselects. -> -> There is big difference between subqueries and queries in UNION - -> there are not dependences between UNION queries. -> -> Ok, opened issues: -> -> 1. Is using upper query' vars in all subquery levels in standard ? - -I'm not certain. Let me know if you do not get an answer from someone else and I will -research it. - -> 2. Is (a, b, c) OP (subselect) in standard ? - -Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where -the parens are allowed to be omitted from a one element list. - -> 3. What types of expressions (Var, Const, ...) are allowed on the left -> side of operator with subquery on the right ? - -I think most expressions are allowed. The "constant OP (subselect)" case you were -asking about is just a simplified case since "(a, b, constant) OP (subselect)" where -a and b are column references should be allowed. Of course, our optimizer could -perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first -example "EXISTS (subselect where x = constant)". - -> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? -> (My vote for all boolean operators). - -Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is -important to get an initial implementation for v6.3 which covers a little, some, or -all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then -we will have the benefit of feedback from others in practical applications which -always uncovers new things to consider. 
- -> And - did we get consensus on presentation subqueries stuff in Query, -> Expr and Var ? -> I would like to have something done in parser near Jan 17 to get -> subqueries working by Feb 1. I vote for support of all standard -> things (1. - 3.) in parser right now - if there will be no time -> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh, -> sorry, - elog(ERROR)). - -Great. I'd like to help with the remaining parser issues; at the moment "row_expr" -does the right thing with expression comparisons but just parses and then ignores -subselect expressions. Let me know what structures you want passed back and I'll put -them in, or, if you prefer, put in the first one and I'll go through and clean up and -add the rest. - - - Tom - - -From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728 - for ; Sat, 10 Jan 1998 15:00:56 -0500 (EST) -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for ; Sat, 10 Jan 1998 14:35:19 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002; - Sat, 10 Jan 1998 19:31:30 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu> -Date: Sat, 10 Jan 1998 19:31:29 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -> Are you saying about (a, b, c) or about 'a_constant' ?
-> Again, can someone comment on are they in standards or not ? -> Tom ? -> If yes then please add parser' support for them now... - -As I mentioned a few minutes ago in my last message, I parse the row descriptors and -the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)" I currently -ignore the result. I didn't want to pass things back as lists until something in the -backend was ready to receive them. - -If it is OK, I'll go ahead and start passing back a list of expressions when a row -descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node -being a list rather than an atomic node. - -Also, I can start passing back the subselect expression as the rexpr; right now the -parser calls elog() and quits. - -btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called -makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. -If lists are handled farther back, this routine should move to there also and the -parser will just pass the lists. Note that some assumptions have to be made about the -meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of -"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK -to disallow those cases or to look for specific appearance of the operator to guess -the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if -it has "<>" or "!" then build as "or"s. - -Let me know what you want... 
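The and/or expansion described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: row_joiner() and expand_row_expr() are hypothetical names, they build a text form rather than A_Expr nodes, and they are not the makeRowExpr() from the parser. Joining per-column "<" or ">" tests with AND follows the guess made in the message and is not necessarily how SQL defines row-wise ordering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: choose the connective for the per-column tests,
 * using the operator-name guess from the message.
 */
static const char *
row_joiner(const char *op)
{
    /* Test "<>" and "!" first, because "<>" also contains '<' and '>'. */
    if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
        return "OR";
    if (strpbrk(op, "<=>") != NULL)
        return "AND";
    return NULL;                /* unknown operator: caller should reject it */
}

/*
 * Hypothetical helper: expand "(l1,..,ln) op (r1,..,rn)" into
 * "(l1 op r1) JOIN (l2 op r2) ..." in buf.
 * Returns 0 on success, -1 if op is not recognised or buf is too small.
 */
static int
expand_row_expr(const char *op, const char **lhs, const char **rhs,
                int n, char *buf, size_t buflen)
{
    const char *joiner = row_joiner(op);
    size_t      used = 0;
    int         i;

    assert(n > 0 && buflen > 0);
    if (joiner == NULL)
        return -1;

    buf[0] = '\0';
    for (i = 0; i < n; i++)
    {
        /* Prefix every term after the first with " JOINER ". */
        int w = snprintf(buf + used, buflen - used, "%s%s%s(%s %s %s)",
                         i ? " " : "", i ? joiner : "", i ? " " : "",
                         lhs[i], op, rhs[i]);

        if (w < 0 || (size_t) w >= buflen - used)
            return -1;
        used += (size_t) w;
    }
    return 0;
}
```

Operators that match neither pattern are rejected rather than guessed at, which corresponds to the "maybe it is OK to disallow those cases" option in the message.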
- - - Tom - - -From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998 -Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953 - for ; Sun, 11 Jan 1998 01:01:51 -0500 (EST) -Received: from alumni.caltech.edu (localhost [127.0.0.1]) - by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797; - Sun, 11 Jan 1998 05:58:01 GMT -Sender: tgl@gnet04.jpl.nasa.gov -Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu> -Date: Sun, 11 Jan 1998 05:58:01 +0000 -From: "Thomas G. Lockhart" -Organization: Caltech/JPL -X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686) -MIME-Version: 1.0 -To: "Vadim B. Mikheev" -CC: Bruce Momjian , hackers@postgresql.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> -Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702" -Status: OR - -This is a multi-part message in MIME format. ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit - -Here are context diffs of gram.y and keywords.c; sorry about sending the full files. -These start sending lists of arguments toward the backend from the parser to -implement row descriptors and subselects. - -They should apply OK even over Bruce's recent changes... 
- - - Tom - ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii; name="gram.y.patch" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; filename="gram.y.patch" - -*** ../src/backend/parser/gram.y.orig Sat Jan 10 05:44:36 1998 ---- ../src/backend/parser/gram.y Sat Jan 10 19:29:37 1998 -*************** -*** 195,200 **** ---- 195,201 ---- - having_clause - %type row_descriptor, row_list - %type row_expr -+ %type RowOp, row_opt - %type OptCreateAs, CreateAsList - %type CreateAsElement - %type NumConst -*************** -*** 242,248 **** - */ - - /* Keywords (in SQL92 reserved words) */ -! %token ACTION, ADD, ALL, ALTER, AND, AS, ASC, - BEGIN_TRANS, BETWEEN, BOTH, BY, - CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, - CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, ---- 243,249 ---- - */ - - /* Keywords (in SQL92 reserved words) */ -! %token ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC, - BEGIN_TRANS, BETWEEN, BOTH, BY, - CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, - CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, -*************** -*** 258,264 **** - ON, OPTION, OR, ORDER, OUTER_P, - PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, - REFERENCES, REVOKE, RIGHT, ROLLBACK, -! SECOND_P, SELECT, SET, SUBSTRING, - TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, - UNION, UNIQUE, UPDATE, USING, - VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, ---- 259,265 ---- - ON, OPTION, OR, ORDER, OUTER_P, - PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC, - REFERENCES, REVOKE, RIGHT, ROLLBACK, -! 
SECOND_P, SELECT, SET, SOME, SUBSTRING, - TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM, - UNION, UNIQUE, UPDATE, USING, - VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW, -*************** -*** 2853,2866 **** - /* Expressions using row descriptors - * Define row_descriptor to allow yacc to break the reduce/reduce conflict - * with singleton expressions. - */ - row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' - { -! $$ = NULL; - } - | '(' row_descriptor ')' NOT IN '(' SubSelect ')' - { -! $$ = NULL; - } - | '(' row_descriptor ')' '=' '(' row_descriptor ')' - { ---- 2854,2878 ---- - /* Expressions using row descriptors - * Define row_descriptor to allow yacc to break the reduce/reduce conflict - * with singleton expressions. -+ * -+ * Note that "SOME" is the same as "ANY" in syntax. -+ * - thomas 1998-01-10 - */ - row_expr: '(' row_descriptor ')' IN '(' SubSelect ')' - { -! $$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6); - } - | '(' row_descriptor ')' NOT IN '(' SubSelect ')' - { -! $$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7); -! } -! | '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')' -! { -! char *opr; -! opr = palloc(strlen($4)+strlen($5)+1); -! strcpy(opr, $4); -! strcat(opr, $5); -! $$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7); - } - | '(' row_descriptor ')' '=' '(' row_descriptor ')' - { -*************** -*** 2880,2885 **** ---- 2892,2907 ---- - } - ; - -+ RowOp: '=' { $$ = "="; } -+ | '<' { $$ = "<"; } -+ | '>' { $$ = ">"; } -+ ; -+ -+ row_opt: ALL { $$ = "all"; } -+ | ANY { $$ = "any"; } -+ | SOME { $$ = "any"; } -+ ; -+ - row_descriptor: row_list ',' a_expr - { - $$ = lappend($1, $3); -*************** -*** 3432,3441 **** - ; - - in_expr: SubSelect -! { -! elog(ERROR,"IN (SUBSELECT) not yet implemented"); -! $$ = $1; -! } - | in_expr_nodes - { $$ = $1; } - ; ---- 3454,3460 ---- - ; - - in_expr: SubSelect -! 
{ $$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); } - | in_expr_nodes - { $$ = $1; } - ; -*************** -*** 3449,3458 **** - ; - - not_in_expr: SubSelect -! { -! elog(ERROR,"NOT IN (SUBSELECT) not yet implemented"); -! $$ = $1; -! } - | not_in_expr_nodes - { $$ = $1; } - ; ---- 3468,3474 ---- - ; - - not_in_expr: SubSelect -! { $$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); } - | not_in_expr_nodes - { $$ = $1; } - ; - ---------------D8B38A0D1F78A10C0023F702 -Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; filename="keywords.c.patch" - -*** ../src/backend/parser/keywords.c.orig Mon Jan 5 07:51:33 1998 ---- ../src/backend/parser/keywords.c Sat Jan 10 19:22:07 1998 -*************** -*** 39,44 **** ---- 39,45 ---- - {"alter", ALTER}, - {"analyze", ANALYZE}, - {"and", AND}, -+ {"any", ANY}, - {"append", APPEND}, - {"archive", ARCHIVE}, - {"as", AS}, -*************** -*** 178,183 **** ---- 179,185 ---- - {"set", SET}, - {"setof", SETOF}, - {"show", SHOW}, -+ {"some", SOME}, - {"stdin", STDIN}, - {"stdout", STDOUT}, - {"substring", SUBSTRING}, - ---------------D8B38A0D1F78A10C0023F702-- - - -From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255 - for ; Sun, 11 Jan 1998 01:31:10 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for ; Sun, 11 Jan 1998 01:10:48 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST) -Received: from 
candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for ; Sun, 11 Jan 1998 01:01:05 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801; - Sun, 11 Jan 1998 00:59:23 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801110559.AAA11801@candle.pha.pa.us> -Subject: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST) -Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu -In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> I would like to have something done in parser near Jan 17 to get -> subqueries working by Feb 1. I vote for support of all standard -> things (1. - 3.) in parser right now - if there will be no time -> to implement something like (a, b, c) then optimizer will call -> elog(WARN) (oh, sorry, - elog(ERROR)). - -First, let me say I am glad we are still on schedule for Feb 1. I was -panicking because I thought we wouldn't make it in time. - - -> > > (is it allowable by standards ?) - in this case it's better -> > > to don't add tabA to 1st subselect but add tabA to second one -> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - -> > > this gives us 2-tables join in 1st subquery instead of 3-tables join. -> > > (And I'm still not sure that using temp tables is best of what can be -> > > done in all cases...) -> > -> > I don't see any use for temp tables in subselects anymore. After having -> > implemented UNIONS, I now see how much can be done in the upper -> > optimizer. I see you just putting the subquery PLAN into the proper -> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN. 
-> -> When saying about temp tables, I meant tables created by node Material -> for subquery plan. This is one of two ways - run subquery once for all -> possible upper plan tuples and then just join result table with upper -> query. Another way is re-run subquery for each upper query tuple, -> without temp table but may be with caching results by some ways. -> Actually, there is special case - when subquery can be alternatively -> formulated as joins, - but this is just special case. - -This is interesting. It really only applies for correlated subqueries, -and certainly it may help sometimes to just evaluate the subquery for -valid values that are going to come from the upper query than for all -possible values. Perhaps we can use the 'cost' value of each query to -decide how to handle this. - -> -> > > > In the parent query, to parse the WHERE clause, we create a new operator -> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the -> > > ^^^^^^^^^^^^^^^^^^ -> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and -> > > '_a_constant_' OP (select ...) - I don't know is last in standards, -> > > Sybase has this. -> > -> > I have never seen this in my eight years of SQL. Perhaps we can leave -> > this for later, maybe much later. -> -> Are you saying about (a, b, c) or about 'a_constant' ? -> Again, can someone comment on are they in standards or not ? -> Tom ? -> If yes then please add parser' support for them now... - -OK, Thomas says it is, so we will put in as much code as we can to handle -it. - -> Should we say users that subselect will work for standard data types only ? -> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ? -> Is there difference between handling = ANY and ~ ANY ? I don't see any. -> Currently we can't get IN working properly for boxes (and may be for others too) -> and I don't like to try to resolve these problems now, but hope that someday -> we'll be able to do this. 
At the moment - just convert IN into = ANY and -> NOT IN into <> ALL in parser. - -OK. - -> -> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but -> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...) - -I did not know that either. - -> There is big difference between subqueries and queries in UNION - -> there are not dependences between UNION queries. - -Yes, I know UNIONS are trivial compared to subselects. - -> -> Ok, opened issues: -> -> 1. Is using upper query' vars in all subquery levels in standard ? -> 2. Is (a, b, c) OP (subselect) in standard ? -> 3. What types of expressions (Var, Const, ...) are allowed on the left -> side of operator with subquery on the right ? -> 4. What types of operators should we support (=, >, ..., like, ~, ...) ? -> (My vote for all boolean operators). -> -> And - did we get consensus on presentation subqueries stuff in Query, -> Expr and Var ? - -OK, here are my concrete ideas on changes and structures. - -I think we all agreed that Query needs new fields: - - Query *parentQuery; - List *subqueries; - -Maybe query level too, but I don't think so (see later ideas on Var). - -We need a new Node structure, call it Sublink: - - int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...) - Oid operator /* subquery must return single row */ - List *lefthand; /* parent stuff */ - Node *subquery; /* represents nodes from parser */ - Index Subindex; /* filled in to index Query->subqueries */ - -Of course, the names are just suggestions. Every time we run through -the parsenodes of a query to create a Query* structure, when we do the -WHERE clause, if we come upon one of these Sublink nodes (created in the -parser), we move the supplied Query* in Sublink->subquery to a local -List variable, and we set Subquery->subindex to equal the index of the -new query, i.e. is it the first subquery we found, 1, or the second, 2, -etc. 
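The numbering scheme above can be sketched in C roughly as follows. This is an illustration only: the struct layouts are cut down to the fields named in the message, a fixed array stands in for the List type, and attach_sublink() and MAX_SUBQUERIES are hypothetical. For brevity the sketch also sets parentQuery in the same step instead of in a second pass.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 16       /* arbitrary limit, for the sketch only */

typedef struct Query Query;

/* Cut-down stand-in for the proposed Sublink node. */
typedef struct Sublink
{
    int     linkType;           /* IN, NOTIN, ANY, EXISTS, ... */
    Query  *subquery;           /* subquery as supplied by the parser */
    int     subindex;           /* 1-based slot in parent's subqueries; 0 = unset */
} Sublink;

/* Cut-down stand-in for Query with the two proposed fields. */
struct Query
{
    Query  *parentQuery;
    Query  *subqueries[MAX_SUBQUERIES];
    int     nsubqueries;
};

/*
 * Hypothetical helper: record link's subquery in parent and number it in
 * order of discovery (first subquery found is 1, second is 2, ...).
 * Returns the assigned index, or 0 if the array is full.
 */
static int
attach_sublink(Query *parent, Sublink *link)
{
    assert(parent != NULL && link != NULL && link->subquery != NULL);
    if (parent->nsubqueries >= MAX_SUBQUERIES)
        return 0;

    parent->subqueries[parent->nsubqueries++] = link->subquery;
    link->subquery->parentQuery = parent;
    link->subindex = parent->nsubqueries;
    return link->subindex;
}
```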
- -After we have created the parent Query structure, we run through our -local List variable of subquery parsenodes we created above, and add -Query* entries to Query->subqueries. In each subquery Query*, we set -the parentQuery pointer. - -Also, when parsing the subqueries, we need to keep track of correlated -references. I recommend we add a field to the Var structure: - - Index sublevel; /* range table reference: - = 0 current level of query - < 0 parent above this many levels - > 0 index into subquery list - */ - -This way, a Var node with sublevel 0 is the current level, and is true -in most cases. This helps us not have to change much code. sublevel = --1 means it references the range table in the parent query. sublevel = --2 means the parent's parent. sublevel = 2 means it references the range -table of the second entry in Query->subqueries. Varno and varattno are -still meaningful. Of course, we can't reference variables in the -subqueries from the parent in the parser code, but Vadim may want to. - -When doing a Var lookup in the parser, we look in the current level -first, but if not found, if it is a subquery, we can look at the parent -and parent's parent to set the sublevel, varno, and varatno properly. - -We create no phantom range table entries in the subquery, and no phantom -target list entries. We can leave that all for the upper optimizer. 
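The lookup rule above (current level first, then the parent, then the parent's parent) can be sketched in C roughly as follows. This is an illustration only: QueryLevel, VarRef and resolve_rel() are hypothetical, the range table is reduced to an array of relation names, and varattno is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one query level and its range table. */
typedef struct QueryLevel
{
    struct QueryLevel *parent;  /* NULL for the outermost query */
    const char **rtable;        /* relation names in this level's range table */
    int          nrels;
} QueryLevel;

/* Hypothetical stand-in for the part of Var discussed in the message. */
typedef struct VarRef
{
    int     sublevel;           /* 0 = current level, -1 = parent, -2 = parent's parent */
    int     varno;              /* 1-based index into that level's range table */
} VarRef;

/*
 * Search the current level, then each enclosing level in turn.
 * Returns 1 and fills *out if relname is found, 0 otherwise.
 */
static int
resolve_rel(const QueryLevel *q, const char *relname, VarRef *out)
{
    int depth = 0;

    assert(relname != NULL && out != NULL);
    for (; q != NULL; q = q->parent, depth--)
    {
        int i;

        for (i = 0; i < q->nrels; i++)
        {
            if (strcmp(q->rtable[i], relname) == 0)
            {
                out->sublevel = depth;
                out->varno = i + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

Because inner levels are searched first, a relation name in the subquery shadows the same name in an enclosing query, and nothing from the parent has to be copied into the subquery's range table.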
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Nov 28 16:34:03 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA17454 - for ; Fri, 28 Nov 1997 16:33:59 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA10553; Fri, 28 Nov 1997 16:20:03 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 28 Nov 1997 16:17:50 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA10116 for pgsql-hackers-outgoing; Fri, 28 Nov 1997 16:17:45 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA09997 for ; Fri, 28 Nov 1997 16:17:26 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id QAA17309 - for hackers@postgreSQL.org; Fri, 28 Nov 1997 16:18:08 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711282118.QAA17309@candle.pha.pa.us> -Subject: [HACKERS] querytrees and multiple statements -To: hackers@postgreSQL.org (PostgreSQL-development) -Date: Fri, 28 Nov 1997 16:18:08 -0500 (EST) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Currently, if a query string arrives that has multiple SQL statements in -it, the parser breaks it down into separate queries, analyzes each one, -then executes them in order. (psql automatically breaks things down -into separate queries, so this will not work there.) The problem is -that if the first query creates a table, and the second query goes to -access it, the parser analysis fails because the table is not yet -created. See the attached pginterface source for an example.
The real -problem is that all the queries in the string are analyzed first, then -executed, rather than having one analyzed and executed, then the next. - -I am going to have trouble with subselects and temp tables. I want to -pull out the subselect, change it into a SELECT ... INTO TEMP, and add it to -the QueryTree before the outer select. But when the outer select is analyzed -by the parser, the temp table doesn't exist yet, which will cause an -error. - -Currently postgres.c does each step on all queries before moving to the -next step. Does anyone know what the ramifications would be if I -changed this to do the full set of operations on each statement first -before moving to the next? - ---------------------------------------------------------------------------- - - -/* - * pgnulltest.c - * -*/ - -#include -#include -#include -#include -#include -#include -#include - -int main(int argc, char **argv) -{ - char query[4000]; - int i; - - if (argc != 2) - halt("Usage: %s database\n",argv[0]); - - connectdb(argv[1],NULL,NULL,NULL,NULL); - - sprintf(query,"create table test(x int); select x from test;"); - doquery(query); - - disconnectdb(); - return 0; -} - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Sat Nov 29 05:01:01 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA27942 - for ; Sat, 29 Nov 1997 05:00:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA13666 for ; Sat, 29 Nov 1997 04:35:08 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA17107; Sat, 29 Nov 1997 16:38:58 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <347FE2B1.167EB0E7@sable.krasnoyarsk.su> -Date: Sat, 29 Nov 1997 16:38:57 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: [HACKERS] querytrees and multiple statements -References: <199711282118.QAA17309@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> Currently, if a query string arrives that has multiple sql statements in -> it, the parser breaks it down into separate queries, analyzes each one, -> then executes them in order. (psql automatically breaks things down -> into separate queries, do this will not work there.) The problem is -> that if the first query creates a table, and the second query goes to -> access it, the parser analysis fails because the table is not yet -> created. See the attached pginterface source for an example. The real -> problem is that all the queries in the string are analyzed first, then -> executed, rather than having one analyzed then execute, then the next. -> -> I am going to have touble with subselects and temp tables. I want to -> pull out the subselect, change it into a SELECT ... INTO TEMP, add it to -> the QueryTree before the outer select, then the outer select is analyzed -> by the parser, the temp table doesn't exist yet, and will cause an -> error. -> -> Currently postgres.c does each step on all queries before moving to the -> next step. Does anyone know what the ramifications would be if I -> changed this to do to the full set of operations on each statement first -> before moving to the next? - -This will break ability to prepare plan (parser + optimizer) for latter -execution. This ability is used by RULEs (and so - by VIEWs) and will be -used by PL(s)... - -Please, take a look at nodeMaterial.c: - -/*------------------------------------------------------------------------- - * - * nodeMaterial.c-- - * Routines to handle materialization nodes. -... 
-/* - * INTERFACE ROUTINES - * ExecMaterial - generate a temporary relation - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -(I'm still very busy. Hope to return soon.) - -Vadim - -From vadim@sable.krasnoyarsk.su Sun Nov 30 02:30:56 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA15439 - for ; Sun, 30 Nov 1997 02:30:55 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id CAA17743 for ; Sun, 30 Nov 1997 02:27:40 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id OAA18937; Sun, 30 Nov 1997 14:32:14 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <3481167E.2781E494@sable.krasnoyarsk.su> -Date: Sun, 30 Nov 1997 14:32:14 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] querytrees and multiple statements -References: <199711291854.NAA05185@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > This will break ability to prepare plan (parser + optimizer) for latter -> > execution. This ability is used by RULEs (and so - by VIEWs) and will be -> > used by PL(s)... -> > -> > Please, take a look at nodeMaterial.c: -> > -> > /*------------------------------------------------------------------------- -> > * -> > * nodeMaterial.c-- -> > * Routines to handle materialization nodes. -> > ... -> > /* -> > * INTERFACE ROUTINES -> > * ExecMaterial - generate a temporary relation -> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> -> I understand what you are saying here. The temp table has transaction -> scope, and breaking each query into multiple commands, each with its own -> transaction scope will cause the temp table to go away. 
- -No. I just said that there will be no ability to prepare queries with -subselects for latter execution: will be no ability to get execution plan which -could be passed to executor to get results without additional parser/planner -invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() -(==> PLs). RULEs don't use execution plan, but use parsed query tree (stored -in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. - -Ability to have execution plans seems important to me. Other DBMS-es use -this for stored procedures and views. - -Vadim - -From owner-pgsql-hackers@hub.org Mon Dec 1 01:30:57 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA10903 - for ; Mon, 1 Dec 1997 01:30:55 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26262 for ; Mon, 1 Dec 1997 01:21:28 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA05263; Mon, 1 Dec 1997 01:02:12 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:00:12 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA03357 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:00:07 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA03290 for ; Mon, 1 Dec 1997 00:59:45 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA10395; - Mon, 1 Dec 1997 00:57:07 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712010557.AAA10395@candle.pha.pa.us> -Subject: Re: [HACKERS] querytrees and multiple statements -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 1 Dec 1997 00:57:07 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <3481167E.2781E494@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 30, 97 02:32:14 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> No. I just said that there will be no ability to prepare queries with -> subselects for latter execution: will be no ability to get execution plan which -> could be passed to executor to get results without additional parser/planner -> invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() -> (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored -> in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. -> -> Ability to have execution plans seems important to me. Other DBMS-es use -> this for stored procedures and views. -> -> Vadim -> - -I see what you are saying about other people calling pg_plan(). pg_plan -returns the query rewritten, and a plan, and some areas use that. I -will have to make sure I honor that functionality in any changes I make -to it. I will think more about this. I may have to add an 'execute me' -flag to it. However, I am unsure how I am going to generate 'just a -plan or rewritten query structure' without actually running the query -and having the temp table created so the rest can be parsed. 
- - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Mon Dec 1 02:00:58 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11221 - for ; Mon, 1 Dec 1997 02:00:57 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26994 for ; Mon, 1 Dec 1997 01:55:19 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA23269; Mon, 1 Dec 1997 01:47:13 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:45:31 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA22653 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:45:25 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22590 for ; Mon, 1 Dec 1997 01:45:13 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA21318; Mon, 1 Dec 1997 13:49:58 +0700 (KRS) -Message-ID: <34825E16.446B9B3D@sable.krasnoyarsk.su> -Date: Mon, 01 Dec 1997 13:49:58 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org -Subject: Re: [HACKERS] querytrees and multiple statements -References: <199712010557.AAA10395@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> > -> > No. I just said that there will be no ability to prepare queries with -> > subselects for latter execution: will be no ability to get execution plan which -> > could be passed to executor to get results without additional parser/planner -> > invocations. 
This ability is used by SQL-functions and SPI_prepare()/SPI_execp() -> > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored -> > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. -> > -> > Ability to have execution plans seems important to me. Other DBMS-es use -> > this for stored procedures and views. -> > -> > Vadim -> > -> -> I see what you are saying about other people calling pg_plan(). pg_plan -> returns the query rewritten, and a plan, and some areas use that. I -> will have to make sure I honor that functionality in any changes I make -> to it. I will think more about this. I may have to add an 'execute me' -> flag to it. However, I am unsure how I am going to generate 'just a -> plan or rewritten query structure' without actually running the query -> and having the temp table created so the rest can be parsed. - -That's why I suggest to try with nodeMaterial(): this could allow to handle -subqueries on optimizer level and got single execution plan for -single user query. 
- -Vadim - - -From owner-pgsql-hackers@hub.org Mon Dec 1 02:46:23 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11762 - for ; Mon, 1 Dec 1997 02:46:21 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA11681; Mon, 1 Dec 1997 02:35:00 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 02:33:17 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA11451 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 02:33:09 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id CAA11110 for ; Mon, 1 Dec 1997 02:32:10 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id CAA11574; - Mon, 1 Dec 1997 02:32:45 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712010732.CAA11574@candle.pha.pa.us> -Subject: Re: [HACKERS] querytrees and multiple statements -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 1 Dec 1997 02:32:45 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34825E16.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 1, 97 01:49:58 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > > -> > > No. I just said that there will be no ability to prepare queries with -> > > subselects for latter execution: will be no ability to get execution plan which -> > > could be passed to executor to get results without additional parser/planner -> > > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp() -> > > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored -> > > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects. 
-> > > -> > > Ability to have execution plans seems important to me. Other DBMS-es use -> > > this for stored procedures and views. -> > > -> > > Vadim -> > > -> > -> > I see what you are saying about other people calling pg_plan(). pg_plan -> > returns the query rewritten, and a plan, and some areas use that. I -> > will have to make sure I honor that functionality in any changes I make -> > to it. I will think more about this. I may have to add an 'execute me' -> > flag to it. However, I am unsure how I am going to generate 'just a -> > plan or rewritten query structure' without actually running the query -> > and having the temp table created so the rest can be parsed. -> -> That's why I suggest to try with nodeMaterial(): this could allow to handle -> subqueries on optimizer level and got single execution plan for -> single user query. - -Can you give me more details on this? I realize I can create an empty -tmp table to get through the parser analysis stuff, but how do I do -something in nodeMaterial? - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Tue Dec 2 00:04:05 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA00350 - for ; Tue, 2 Dec 1997 00:03:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA22889; Tue, 2 Dec 1997 12:09:57 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <34839824.3F54BC7E@sable.krasnoyarsk.su> -Date: Tue, 02 Dec 1997 12:09:56 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: "Vadim B. 
Mikheev" , hackers@postgreSQL.org -Subject: Re: [HACKERS] querytrees and multiple statements -References: <199712010732.CAA11574@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > That's why I suggest to try with nodeMaterial(): this could allow to handle -> > subqueries on optimizer level and got single execution plan for -> > single user query. -> -> Can you give me more details on this? I realize I can create an empty -> tmp table to get through the parser analysis stuff, but how do I do -> something in nodeMaterial? - - * ExecMaterial - * - * The first time this is called, ExecMaterial retrieves tuples - * this node's outer subplan and inserts them into a temporary - ^^^^^^^ - - * relation. After this is done, a flag is set indicating that - * the subplan has been materialized. Once the relation is - * materialized, the first tuple is then returned. Successive - * calls to ExecMaterial return successive tuples from the temp - * relation. - -As you see, this node materializes some plan results into temp relation: -instead of doing SELECT ... INTO temp FROM ... WHERE ... you could -create Material node using plan for 'SELECT ... FROM ... WHERE ...' as -its subplan. SeqScan of this materialized relation can be used in any -join plans just like scan od normal relation, e.g. - NESTLOOP plan: - - NESTLOOP - SeqScan A - SeqScan B - -becomes - - NESTLOOP - SeqScan - Material - ...subplan here... - SeqScan B (or other Material) - -and so on... 
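The Material-node behaviour described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the nodeMaterial.c code: the names Material, material_next, material_rescan and the counter_next example subplan are hypothetical, tuples are reduced to plain ints, and the "temp relation" is an in-memory array rather than a real relation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subplan interface: returns 1 and fills *out, or 0 when done. */
typedef int (*NextFn)(void *state, int *out);

typedef struct Material
{
    NextFn  sub_next;      /* subplan's "get next tuple" routine */
    void   *sub_state;     /* subplan's private state */
    int    *tuples;        /* stand-in for the temporary relation */
    size_t  ntuples;
    size_t  capacity;
    size_t  pos;           /* read cursor into the buffered tuples */
    int     materialized;  /* set once the subplan has been drained */
} Material;

void material_init(Material *m, NextFn sub_next, void *sub_state)
{
    assert(m != NULL && sub_next != NULL);
    m->sub_next = sub_next;
    m->sub_state = sub_state;
    m->tuples = NULL;
    m->ntuples = m->capacity = m->pos = 0;
    m->materialized = 0;
}

/* Returns 1 with a tuple in *out, 0 at end of data, -1 if out of memory. */
int material_next(Material *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->materialized)
    {
        int v;

        /* First call: pull every tuple from the subplan and buffer it. */
        while (m->sub_next(m->sub_state, &v))
        {
            if (m->ntuples == m->capacity)
            {
                size_t newcap = m->capacity ? m->capacity * 2 : 8;
                int *p = realloc(m->tuples, newcap * sizeof *p);

                if (p == NULL)
                    return -1;
                m->tuples = p;
                m->capacity = newcap;
            }
            m->tuples[m->ntuples++] = v;
        }
        m->materialized = 1;
    }
    if (m->pos >= m->ntuples)
        return 0;
    *out = m->tuples[m->pos++];
    return 1;
}

/* Restart the scan without re-running the subplan. */
void material_rescan(Material *m)
{
    assert(m != NULL);
    m->pos = 0;
}

void material_free(Material *m)
{
    assert(m != NULL);
    free(m->tuples);
    m->tuples = NULL;
    m->ntuples = m->capacity = m->pos = 0;
}

/* Example subplan (hypothetical): yields 0 .. limit-1 and counts its calls. */
typedef struct Counter { int next; int limit; int calls; } Counter;

int counter_next(void *state, int *out)
{
    Counter *c = state;

    c->calls++;
    if (c->next >= c->limit)
        return 0;
    *out = c->next++;
    return 1;
}
```

With this shape a parent join node can call material_rescan() and re-read the buffered rows as often as it needs, while the subplan itself runs only once.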
- -Vadim - -From owner-pgsql-hackers@hub.org Tue Dec 2 01:28:02 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA02313 - for ; Tue, 2 Dec 1997 01:28:00 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA00346; Tue, 2 Dec 1997 01:03:55 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 02 Dec 1997 01:03:04 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28750 for pgsql-hackers-outgoing; Tue, 2 Dec 1997 01:02:57 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA28254 for ; Tue, 2 Dec 1997 01:02:38 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id BAA01042; - Tue, 2 Dec 1997 01:02:15 -0500 (EST) -From: Bruce Momjian -Message-Id: <199712020602.BAA01042@candle.pha.pa.us> -Subject: Re: [HACKERS] querytrees and multiple statements -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Tue, 2 Dec 1997 01:02:15 -0500 (EST) -Cc: vadim@post.krasnet.ru, hackers@postgreSQL.org -In-Reply-To: <34839824.3F54BC7E@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 2, 97 12:09:56 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > > -> > > That's why I suggest to try with nodeMaterial(): this could allow to handle -> > > subqueries on optimizer level and got single execution plan for -> > > single user query. -> > -> > Can you give me more details on this? I realize I can create an empty -> > tmp table to get through the parser analysis stuff, but how do I do -> > something in nodeMaterial? 
-> -> * ExecMaterial -> * -> * The first time this is called, ExecMaterial retrieves tuples -> * this node's outer subplan and inserts them into a temporary -> ^^^^^^^ -> -> * relation. After this is done, a flag is set indicating that -> * the subplan has been materialized. Once the relation is -> * materialized, the first tuple is then returned. Successive -> * calls to ExecMaterial return successive tuples from the temp -> * relation. -> -> As you see, this node materializes some plan results into temp relation: -> instead of doing SELECT ... INTO temp FROM ... WHERE ... you could -> create Material node using plan for 'SELECT ... FROM ... WHERE ...' as -> its subplan. SeqScan of this materialized relation can be used in any -> join plans just like scan od normal relation, e.g. - NESTLOOP plan: -> -> NESTLOOP -> SeqScan A -> SeqScan B -> -> becomes -> -> NESTLOOP -> SeqScan -> Material -> ...subplan here... -> SeqScan B (or other Material) -> -> and so on... - -The problem now is that I don't understand much about what happens -inside the optimizer or executor. I am sure you are correct that we can -have the subselect as a subnode, and if you think that is best, then it -is. - -This pretty much stops me in developing subselects. I have the concepts -down of what has to happen, but I can not implement it. It will take me -several months to learn how the optimizer and executor work in enough -detail to implement this. - -I usually alot 2-3 days a month for PostgreSQL development. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Thu Oct 30 01:30:59 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA17986 - for ; Thu, 30 Oct 1997 01:30:58 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA27090 for ; Thu, 30 Oct 1997 01:19:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA28901; Thu, 30 Oct 1997 01:16:38 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 01:16:17 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28673 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 01:16:10 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA27557 for ; Thu, 30 Oct 1997 01:15:27 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA20275; Thu, 30 Oct 1997 13:16:10 +0700 (KRS) -Message-ID: <34582629.33590565@sable.krasnoyarsk.su> -Date: Thu, 30 Oct 1997 13:16:09 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: PostgreSQL Developers List -Subject: [HACKERS] Subqueries? -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Hi! - -Bruce, did you begin with them ? -I agreed that subqueries should be implemented like SQL-funcs, but -I would suggest to don't CREATE FUNCTION - this is quite bad for -performance, but use some new node (VirtualFunc or SubQuery or) and -handle such nodes like sql-funcs are handled in function.c -(but without parser/planner invocation on each call - should be -fixed!). 
Also, not corelated subqueries returning single result -can't be replaced in parser/planner by constant node: rules (and so - -views), spi and PL use _prepared_ plans... -It seems that this is not hard work... - -Vadim - - -From owner-pgsql-hackers@hub.org Thu Oct 30 16:31:59 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA07360 - for ; Thu, 30 Oct 1997 16:31:49 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA11483; Thu, 30 Oct 1997 16:27:11 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:26:14 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA11163 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:26:07 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA10874 for ; Thu, 30 Oct 1997 16:25:12 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id QAA06370; - Thu, 30 Oct 1997 16:07:52 -0500 (EST) -From: Bruce Momjian -Message-Id: <199710302107.QAA06370@candle.pha.pa.us> -Subject: Re: [HACKERS] Subqueries? -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Thu, 30 Oct 1997 16:07:51 -0500 (EST) -Cc: hackers@postgreSQL.org -In-Reply-To: <34582629.33590565@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Oct 30, 97 01:16:09 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Hi! -> -> Bruce, did you begin with them ? 
-> I agreed that subqueries should be implemented like SQL-funcs, but -> I would suggest to don't CREATE FUNCTION - this is quite bad for -> performance, but use some new node (VirtualFunc or SubQuery or) and -> handle such nodes like sql-funcs are handled in function.c -> (but without parser/planner invocation on each call - should be -> fixed!). Also, not corelated subqueries returning single result -> can't be replaced in parser/planner by constant node: rules (and so - -> views), spi and PL use _prepared_ plans... -> It seems that this is not hard work... -> -> Vadim -> -> - -OK, here is what I have collected over the months about subqueries. -The Sybase whitepaper is also attached. - -This should get us thinking about how to implement each subquery type, -what operations need to be performed, and in what order. - ---------------------------------------------------------------------------- - -From: Bruce Momjian -Subject: Re: [PG95-DEV] Need info on other databases. -To: pg95-dev@ki.net -Date: Fri, 22 Nov 1996 12:49:24 -0500 (EST) - -> -> -> What I'm specifically interested in is the SQL-92 spec -> for the ANSI things that postgres95 is missing and the -> syntax/limitations on systems like Informix, Sybase, -> Microsoft, et.al... -> -> Any technical info such as performance hits, disabling -> the use of indices, stuff like that would be _greatly_ -> appreciated. I have a decent understanding of this for -> Oracle, but not for any other systems. I want to get -> an idea of the work load of adding the IN, BETWEEN/AND -> and HAVING clauses. - -I have done some thinking about subselects. There are basically two -issues: - - Does the query return one row or several rows? This can be - determined by seeing if the user uses equals on 'IN' to join the - subquery. - - Is the query correlated, meaning "Does the subquery reference - values from the outer query?" - -(We already have the third type of subquery, the INSERT...SELECT query.) 
- -So we have these four combinations: - - 1) one row, no correlation - 2) multiple rows, no correlation - 3) one row, correlated - 4) multiple rows, correlated - - -With #1, we can execute the subquery, get the value, replace the -subquery with the constant returned from the subquery, and execute the -outer query. - -With #2, we can execute the subquery and put the result into a temporary -table. We then rewrite the outer query to access the temporary table -and replace the subquery with the column name from the temporary table. -We probably put an index on the temp. table, which has only one -column, because a subquery can only return one column. We remove the -temp. table after query execution. - -With #3 and #4, we potentially need to execute the subquery for every -row returned by the outer query. Performance would be horrible for -anything but the smallest query. Another way to handle this is to -execute the subquery WITHOUT using any of the outer-query columns to -restrict the WHERE clause, and add those columns used to join the outer -variables into the target list of the subquery. So for query: - - select t1.name - from tab t1 - where t1.age = (select max(t2.age) - from tab2 t2 - where t2.name = t1.name) - -Execute the subquery, grouped on the join column, and put it in a -temporary table: - - select t2.name, max(t2.age) as age - into table temp999 - from tab2 t2 - group by t2.name - - create index i_temp999 on temp999 (name) - -Then re-write the outer query: - - select t1.name - from tab t1, temp999 - where t1.age = temp999.age and - t1.name = temp999.name - -The only problem here is that the subselect is running for all entries -in tab2, even if the outer query is only going to need a few rows. -Deciding whether to execute the subquery each time, or create a temp. -table, is often difficult. Even some non-correlated -subqueries are better to execute for each row rather than pre-execute the -entire subquery, especially if the outer query returns few rows.
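The trade-off between running a correlated subquery per outer row and pre-computing it into a temp table can be sketched in C. This is a toy model under stated assumptions, not planner code: the Row type and both function names are hypothetical, the name column is replaced by a small integer key, and the "temp table" is a fixed-size array indexed by that key.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 16  /* assumption: keys are small ints in [0, MAX_KEYS) */

typedef struct Row { int key; int age; } Row;  /* key stands in for "name" */

/*
 * Strategy 1: re-run the subquery for every outer row.
 * Cost grows as n_outer * n_inner.  out_keys must hold n_outer entries.
 */
size_t match_per_row(const Row *outer, size_t n_outer,
                     const Row *inner, size_t n_inner, int *out_keys)
{
    size_t n = 0;

    for (size_t i = 0; i < n_outer; i++)
    {
        int found = 0, max_age = 0;

        for (size_t j = 0; j < n_inner; j++)
        {
            if (inner[j].key != outer[i].key)
                continue;
            if (!found || inner[j].age > max_age)
                max_age = inner[j].age;
            found = 1;
        }
        /* No inner rows means max() is NULL, so the row does not qualify. */
        if (found && outer[i].age == max_age)
            out_keys[n++] = outer[i].key;
    }
    return n;
}

/*
 * Strategy 2: run the subquery once, grouped on the correlation column,
 * into a "temp table", then join the outer rows against it.
 * Cost grows as n_outer + n_inner, but every inner group is computed
 * even if the outer query never asks for it.
 */
size_t match_via_temp(const Row *outer, size_t n_outer,
                      const Row *inner, size_t n_inner, int *out_keys)
{
    int has[MAX_KEYS] = {0};
    int max_age[MAX_KEYS] = {0};
    size_t n = 0;

    for (size_t j = 0; j < n_inner; j++)
    {
        int k = inner[j].key;

        assert(k >= 0 && k < MAX_KEYS);
        if (!has[k] || inner[j].age > max_age[k])
            max_age[k] = inner[j].age;
        has[k] = 1;
    }
    for (size_t i = 0; i < n_outer; i++)
    {
        int k = outer[i].key;

        assert(k >= 0 && k < MAX_KEYS);
        if (has[k] && outer[i].age == max_age[k])
            out_keys[n++] = k;
    }
    return n;
}
```

Both functions return the same rows; which one is cheaper depends on how many outer rows survive and how many inner groups are never referenced, which is why the choice needs statistics.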
- -One requirement to handle these issues is better column statistics, -which I am working on. - ------------------------------------------------------------------------------- - -Date: Thu, 5 Dec 1996 10:07:56 -0500 -From: aixssd!darrenk@abs.net (Darren King) -To: maillist@candle.pha.pa.us -Subject: Subselect info. - -> Any of them deal with implementing subselects? - -There's a white paper at the www.sybase.com that might -help a little. It's just a copy of a presentation -given by the optimizer guru there. Nothing code-wise, -but he gives a few ways of flattening them with temp -tables, etc... - -Darren - ------------------------------------------------------------------------------- - -Date: Fri, 22 Aug 1997 12:04:31 +0800 -From: "Vadim B. Mikheev" -To: Bruce Momjian -Subject: Re: subselects - -Bruce Momjian wrote: -> -> Considering the complexity of the primary/secondary changes you are -> making, I believe subselects will be easier than that. - -I don't do changes for P/F keys - just thinking... -Yes, I think that impl of referential integrity is -more complex work. - -As for subselects: - -in plannodes.h - -typedef struct Plan { -... - struct Plan *lefttree; - struct Plan *righttree; -} Plan; - -/* ---------------- - * these are are defined to avoid confusion problems with "left" - ^^^^^^^^^^^^^^^^^^ - * and "right" and "inner" and "outer". The convention is that - * the "left" plan is the "outer" plan and the "right" plan is - * the inner plan, but these make the code more readable. - * ---------------- - */ -#define innerPlan(node) (((Plan *)(node))->righttree) -#define outerPlan(node) (((Plan *)(node))->lefttree) - -First thought is avoid any confusions by re-defining - -#define rightPlan(node) (((Plan *)(node))->righttree) -#define leftPlan(node) (((Plan *)(node))->lefttree) - -and change all occurrences of 'outer' & 'inner' in code -to 'left' & 'inner' ones: - -this will allow to use 'outer' & 'inner' things for subselects -latter, without confusion. 
My hope is that we may change Executor -very easy by adding outer/inner plans/TupleSlots to -EState, CommonState, JoinState, etc and by doing node -processing in right order. - -Subselects are mostly Planner problem. - -Unfortunately, I haven't time at the moment: CHECK/DEFAULT... - -Vadim - ------------------------------------------------------------------------------- - -Date: Fri, 22 Aug 1997 12:22:37 +0800 -From: "Vadim B. Mikheev" -To: Bruce Momjian -Subject: Re: subselects - -Vadim B. Mikheev wrote: -> -> this will allow to use 'outer' & 'inner' things for subselects -> latter, without confusion. My hope is that we may change Executor - -Or maybe use 'high' & 'low' for subselects (to avoid confusion -with outer joins). - -> very easy by adding outer/inner plans/TupleSlots to -> EState, CommonState, JoinState, etc and by doing node -> processing in right order. - ^^^^^^^^^^^^^^ -Rule is easy: -1. Uncorrelated subselect - do 'low' plan node first -2. Correlated - do left/right first - -- just some flag in structures.
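The ordering rule above (one flag deciding whether the 'low' plan runs first or per outer row) might look roughly like this in C. It is a sketch under assumptions: SubPlan, subplan_start, subplan_eval and the two example subqueries are hypothetical names, and a subquery result is reduced to a single int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subquery body: gets the current outer value and a context. */
typedef int (*SubFn)(int outer_value, void *arg);

typedef struct SubPlan
{
    SubFn fn;
    void *arg;
    int   correlated;   /* the "flag in structures" */
    int   have_result;  /* set once an uncorrelated result is cached */
    int   result;
} SubPlan;

/* Rule 1: an uncorrelated subselect runs first, once, before the outer scan. */
void subplan_start(SubPlan *sp)
{
    assert(sp != NULL && sp->fn != NULL);
    sp->have_result = 0;
    if (!sp->correlated)
    {
        sp->result = sp->fn(0, sp->arg);  /* outer value is not used */
        sp->have_result = 1;
    }
}

/* Rule 2: a correlated subselect runs after the outer row is available. */
int subplan_eval(SubPlan *sp, int outer_value)
{
    assert(sp != NULL && sp->fn != NULL);
    if (sp->correlated)
        return sp->fn(outer_value, sp->arg);
    assert(sp->have_result);
    return sp->result;
}

/* Example subqueries (hypothetical); arg points at a call counter. */
int constant_subquery(int outer_value, void *arg)
{
    (void) outer_value;
    ++*(int *) arg;
    return 42;
}

int doubling_subquery(int outer_value, void *arg)
{
    ++*(int *) arg;
    return outer_value * 2;
}
```

The uncorrelated case costs one subquery execution regardless of how many outer rows are scanned; the correlated case costs one per outer row.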
- -Vadim - - ---------------------------------------------------------------------------- - -Performance Tips for Transact-SQL - -Slides from a presentation by Jeff Lichtman - ----------------------------------------------------------------------------- - -Table of Contents - -Overview -> versus >= -Exists Versus Not Exists -Exists Versus Not Exists II -Correlated Subqueries with Restrictive Outer Joins -Correlated Subqueries with Restrictive Outer Joins Example -Correlated Subqueries with Restrictive Outer Joins III -Correlated Subqueries with Restrictive Outer Joins IV -Correlated Subqueries with Restrictive Outer Joins V -Correlated Subqueries with Restrictive Outer Joins Example -Creating Tables in Stored Procedures -Creating Tables in Stored Procedures Example -Variables versus Parameters in Where Clause -Variables versus Parameters in Where Clause Example -Count versus Exists -Count versus Exists II -Or versus Union -Or versus Union Example -MAX and MIN Aggregates -MAX and MIN Aggregates II -MAX and MIN Aggregates Example -MAX and MIN Aggregates III -Joins and Datatypes -Joins and Datatypes Example -Joins and Datatypes II -Joins and Datatypes III -Parameters and Datatypes -Parameters and Datatypes Example -Summary ----------------------------------------------------------------------------- - -Overview - - * Goal Is to Learn Some Tips to Help You Improve the Performance of Your - Queries. - * Emphasis Is on Queries, Not on Schema. - * Many Tips Are Not Related to Query Optimizer. - * Tips Are Based on Actual Customer Cases Seen by SQL Server Development - Engineer. - * These Tips Are Intended As Suggestions and Guidelines, Not Absolute - Rules. - * Some of These Tips Could Become Obsolete As Sybase Improves the SQL - Server. - ----------------------------------------------------------------------------- - -> versus >= - -Given the query: - -select * from tab where x > 3 - -with an index on x.
This query works by using the index to find the first -value where x = 3, and scanning forward. - -Suppose there are many rows in tab where x = 3. - -In this case, the server has to scan many pages before finding the first row -where x > 3. - -It is more efficient to write the query like this: - -select * from tab where x >= 4 - ----------------------------------------------------------------------------- - -Exists Versus Not Exists - -In subqueries and IF statements, EXISTS and IN are faster than NOT EXISTS -and NOT IN. - -With IF statements, one can easily avoid NOT EXISTS: - -if not exists (select * from ...) -begin /* Statement group 1 */ -... -end else begin /* Statement group 2 */ -... -end - -can be re-written as: - -if exists (select * from ...) -begin /* Statement group 2 */ -... -end else begin /* Statement group 1 */ -... -end - ----------------------------------------------------------------------------- - -Exists versus Not Exists (cont.) - -Even without an ELSE clause, it is possible to avoid - -NOT EXISTS in IF statements : - -if not exists (select * from ...) -begin - /* Statement group */ - ... -end -... - -can be re-written as: - -if exists (select * from ...) -begin - goto exists_label -end -/* Statement group */ -... -exists_label: -... 
- ---------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins - - * SQL Server Processes Subqueries "Inside-Out" - * For Correlated Subqueries, It Creates a Worktable Containing Subquery - Results - * The Worktable Is Grouped on the Correlation Columns - ---------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins - -For example: - -select w from outer where x = - (select sum(a) from inner - where inner.b = outer.z) - -becomes: - -select outer.z, summ = sum(inner.a) -into #work -from outer, inner -where inner.b = outer.z -group by outer.z -select outer.w -from outer, #work -where outer.z = #work.z -and outer.x = #work.summ - ---------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins (cont.) - -The SQL Server copies search clauses from the outer query to the subquery to -improve performance: - -select w from outer -where y = 1 -and x = (select sum(a) - from inner - where inner.b = outer.z) - -becomes: - -select outer.z, summ = sum(inner.a) -into #work -from outer, inner -where inner.b = outer.z and outer.y = 1 -group by outer.z -select outer.w -from outer, #work -where outer.z = #work.z and outer.y = 1 and outer.x = #work.summ - ---------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins (cont.) - - * The SQL Server Does Not Copy Join Clauses Into Correlated Subqueries As - It Does With Search Clauses. - * Copying Search Clauses Will Always Make the Query Run Faster, but - Copying a Join Clause Might Make It Run Slower. - * Copying the Join Clause Is Beneficial Only If the Join Clause Is Very - Restrictive. - * Only the Query Optimizer Knows Whether a Join Clause Is Restrictive, - but the SQL Server Breaks the Query Into Steps Before Optimization.
- * Since You Know Your Data, You Can Copy Join Clauses Into Subqueries - When You Know It Will Help. - ----------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins (cont.) - -An example of when to copy join clause: - -select * -from huge_tab, single_row_tab -where huge_tab.unique_column = single_row_tab.a -and huge_tab.b = (select sum(c) - from inner - where huge_tab.d = inner.e) - -should be re-written as: - -select * -from huge_tab, single_row_tab -where huge_tab.unique_column = single_row_tab.a -and huge_tab.b = (select sum(c) - from inner - where huge_tab.d = inner.e - and huge_tab.unique_column = single_row_tab.a) - ----------------------------------------------------------------------------- - -Correlated Subqueries with Restrictive Outer Joins (cont.) - -An example of when not to copy join clause: - -select * -from huge_tab, single_row_tab -where huge_tab.many_duplicates_in_column = single_row_tab.a and -single_row_tab.b = (select sum(c) - from inner - where single_row_tab.d = inner.e) - -Should not be re-written as: - -select * -from huge_tab, single_row_tab -where huge_tab.many_duplicates_in_column = single_row_tab.a and -single_row_tab.b = (select sum(c) - from inner - where single_row_tab.d = inner.e - and huge_tab.many_duplicates_in_column = single_row_tab.a) - ----------------------------------------------------------------------------- - -Creating Tables in Stored Procedures - - * When You Create a Table in the Same Stored Procedure Where It Is Used, - the Query Optimizer Cannot Know How Big the Table Is. - * The Optimizer Assumes That Any Such Table Has 10 Data Pages and 100 - Rows. - * If the Table Is Really Big, This Assumption Can Lead the Optimizer to - Choose a Sub-Optimal Query Plan. - * In Cases Like This, It Is Better to Create the Table Outside the - Procedure, Which Allows the Optimizer to See How Large the Table Is.
- ----------------------------------------------------------------------------- - -Creating Tables in Stored Procedures (cont) - -For example: - -create proc p as - select * into #huge_result from ... - select * from tab, #huge_result where - ... - -can be re-written as: - -create proc p as - select * into #huge_result from ... - exec s -create proc s as - select * from tab, #huge_result where - ... - ----------------------------------------------------------------------------- - -Variables versus Parameters in Where Clause - - * The Query Optimizer Cannot Predict the Value of a Declared Variable. - * The Query Does Know the Value of a Parameter to a Stored Procedure at - Compile Time. - * Knowing the Values in the WHERE Clause of a Query Can Help the - Optimizer Make Better Choices. - * To Avoid Putting Variables Into WHERE Clauses, One Can Split up Stored - Procedures. - ----------------------------------------------------------------------------- - -Variables versus Parameters in Where Clause (cont) - -For example: - -create procedure p as - declare @x int - select @x = col from tab where ... - select * from tab2 where col2 = @x - -can be re-written as: - -create procedure p as - declare @x int - select @x = col from tab where ... - exec s @x -create procedure s @x int as - select * from tab2 where col2 = @x - ----------------------------------------------------------------------------- - -Count versus Exists - -It is possible to use the COUNT aggregate in a subquery to do an existence -check: - -select * from tab where 0 < - (select count(*) from tab2 where ...) - -It is possible to write this same query using EXISTS (or IN): - -select * from tab where exists - (select * from tab2 where ...) - ----------------------------------------------------------------------------- - -Count versus Exists (cont) - - * Using COUNT to Do an Existence Check Is Slower Than Using EXISTS. 
- * When You Use COUNT, the SQL Server Does Not Know That You Are Doing an - Existence Check. It Counts All of the Matching Values. - * When You Use EXISTS, the SQL Server Knows You Are Doing an Existence - Check, So It Stops Looking When It Finds the First Matching Value. - * The Same Applies to Using COUNT Instead of IN or ANY. - ----------------------------------------------------------------------------- - -Or versus Union - - * The SQL Server Cannot Optimize Join Clauses That Are Linked With OR. - * The SQL Server Can Optimize Selects That Are Linked With UNION. - * The Result of OR Is Somewhat Like the Result of UNION, Except For the - Treatment of Duplicate Rows and Empty Tables. - ----------------------------------------------------------------------------- - -Or versus Union (cont) - -For example: - -select * from tab1, tab2 -where tab1.a = tab2.b -or tab1.x = tab2.y - -can be re-written as: - -select * from tab1, tab2 -where tab1.a = tab2.b -union all -select * from tab1, tab2 -where tab1.x = tab2.y - -You can use UNION instead of UNION ALL if you want to eliminate duplicates, -but this will eliminate all duplicates. It may not be possible to get -exactly the same set of duplicates from the re-written query. ----------------------------------------------------------------------------- - -MAX and MIN Aggregates - - * The SQL Server Uses Special Optimizations for the MAX and MIN - Aggregates When There Is an Index on the Aggregated Column. - * For MIN, It Stops the Scan on the First Qualifying Row. - * For MAX, It Goes Directly to the End of the Index to Find the Last Row. - * The Optimization Is Not Applied If: - o The Expression Inside the MAX or MIN Is Anything but a Column - o The Column Inside the MAX or MIN Is Not the First Column of an - Index - o There Is Another Aggregate in the Query - o There Is a GROUP BY Clause - * In Addition, the MAX Optimization Is Not Applied If There Is a WHERE - Clause. 
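The MIN and MAX shortcuts listed above can be pictured with a sorted array standing in for the index. The following is a minimal sketch in C under that assumption, not SQL Server's actual implementation; IndexEntry, index_min, and index_max are hypothetical names introduced only for illustration. The visited counter reports how many index entries each aggregate has to examine.

```c
#include <stdbool.h>
#include <stddef.h>

/* One index entry: the key plus whether its row passes the WHERE clause.
 * (Hypothetical structure; a real index stores row pointers, not flags.) */
typedef struct {
    int  key;
    bool qualifies;
} IndexEntry;

/*
 * MIN over an ascending index: walk from the front and stop at the first
 * qualifying entry. Returns false if no entry qualifies.
 * *visited is set to the number of entries examined.
 */
bool index_min(const IndexEntry *idx, size_t n, int *result, size_t *visited)
{
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (idx[i].qualifies) {
            *result = idx[i].key;
            return true;
        }
    }
    return false;
}

/*
 * MAX over an ascending index with no WHERE clause: the last entry is the
 * answer, so only one entry is examined. Returns false for an empty index.
 */
bool index_max(const IndexEntry *idx, size_t n, int *result, size_t *visited)
{
    if (n == 0) {
        *visited = 0;
        return false;
    }
    *visited = 1;
    *result = idx[n - 1].key;
    return true;
}
```

When the first entry qualifies, index_min examines a single entry; when only the last entry qualifies, it examines every entry before returning.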
- ----------------------------------------------------------------------------- - -MAX and MIN Aggregates (cont) - -If you have an optimizable MAX or MIN aggregate, it can pay to put it in a -query separate from other aggregates. For example: - -select max(x), min(x) from tab - -will result in a full scan of tab, even if there is an index on x. The query -can be re-written as: - -select max(x) from tab -select min(x) from tab - -This can result in using the index twice, rather than scanning the entire -table once. ----------------------------------------------------------------------------- - -MAX and MIN Aggregates (cont) - -The MIN optimization can backfire if the where clause is highly selective. -For example: - -select min(index_col) -from tab -where - col_in_other_index = "value only at end of first index" - -The MIN optimization will result in a nearly complete scan of the entire -index. - -This is counter-intuitive. The more selective the WHERE clause, the slower -the query. ----------------------------------------------------------------------------- - -MAX and MIN Aggregates (cont) - -In cases like this, it can pay to disable the MIN optimization by combining -it with another aggregate: - -select min(index_col), max(index_col) -from tab -where -col_in_other_index = "value only at end of first index" - -This convinces the optimizer not to use the MIN optimization, so it chooses -the next best plan, which might be the other index. ----------------------------------------------------------------------------- - -Joins and Datatypes - - * When Joining Between Two Columns of the Different Datatypes, One of the - Columns Must Be Converted to the Type of the Other. - * The Commands Reference Manual Shows the Hierarchy of Types. - * The Column Whose Type Is Lower in the Hierarchy Is the One That Is - Converted. - * The Query Optimizer Cannot Choose an Index on the Column That Is - Converted.
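C applies a similar rule when an int is compared with a double: the int, being the lower of the two types in C's conversion rules, is the operand that gets converted. The sketch below uses that as an analogy for a join between an int column and a float column. It is not SQL Server code, the function names are hypothetical, and it assumes every float value fits in an int. It also shows that forcing the conversion onto the float side changes which pairs match unless non-integral values are filtered out.

```c
#include <stdbool.h>

/* Implicit comparison: the int operand is converted to double,
 * so 1 and 1.5 do not match. */
bool join_implicit(int i, double f)
{
    return i == f;
}

/* Conversion forced onto the float side: the cast truncates toward zero,
 * so 1 and 1.5 now match. The meaning of the comparison has changed. */
bool join_forced(int i, double f)
{
    return i == (int)f;
}

/* Forced conversion plus a guard that keeps only floats with no fractional
 * part, which restores the result of the implicit comparison. */
bool join_forced_guarded(int i, double f)
{
    return i == (int)f && f == (double)(int)f;
}
```

For the pair (1, 1.5), join_implicit and join_forced_guarded reject the match while join_forced accepts it; for (2, 2.0), all three accept it.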
- ----------------------------------------------------------------------------- - -Joins and Datatypes (cont) - -For example: - -select * -from tab1, tab2 -where tab1.float_column = tab2.int_column - -In this case, no index on tab2.int_column can be used, because int is lower -in the hierarchy than float. - -Note that CHAR NULL is really VARCHAR, and BINARY NULL is really VARBINARY. - -Joining CHAR NOT NULL with CHAR NULL involves a conversion (BINARY too). ----------------------------------------------------------------------------- - -Joins and Datatypes (cont) - -It's best to avoid datatype problems in joins by designing the schema -accordingly. - -If a join between different datatypes is unavoidable, and it hurts -performance, you can force the conversion to be on the other side of the -join. - -For example: - -select * -from tab1, tab2 -where tab1.char_column = convert(char(75),tab2.varchar_column) - ----------------------------------------------------------------------------- - -Joins and Datatypes (cont) - -Be careful! This tactic can change the meaning of the query. - -For example: - -select * -from tab1, tab2 -where tab1.int_column = convert(int, tab2.float_column) - -This will not return the same results as the join without the convert. It -can be salvaged by adding: - -and tab2.float_column = convert(int, tab2.float_column) - -This assumes that all values in tab2.float_column can be converted to int. ----------------------------------------------------------------------------- - -Parameters and Datatypes - - * The Query Optimizer Can Use the Values of Parameters to Stored - Procedures to Help Determine Costs. - * If a Parameter Is Not of the Same Type As the Column in The WHERE - Clause That It Is Being Compared to, the Server Has to Convert the - Parameter. - * The Optimizer Cannot Use the Value of a Converted Parameter. - * It Pays to Make Sure That Parameters Have the Same Type As the Columns - They Are Compared To. 
- ----------------------------------------------------------------------------- - -Parameters and Datatypes (cont) - -For example: - -create proc p @x varchar(30) as -select * from tab where char_column = @x - -may get a poorer query plan than: - -create proc p @x char(30) as -select * from tab where char_column = @x - -Remember that CHAR NULL is really VARCHAR, and BINARY NULL is really -VARBINARY. ----------------------------------------------------------------------------- - -Summary - - * How you write your queries can make a big difference in performance. - * Two different queries that do the same thing may perform differently. - * There are few absolutes to improving performance, but the tips given - here can help. - * These tips are not all there is to know about performance. - -About the Author - -Jeff Lichtman has worked at Sybase since 1987. In 1994, he was given the new -position of architect of query processing for SQL Server. He is informally -known as Sybase's optimizer guru. - -For more info send email to webmaster@sybase.com - -Copyright 1995 © Sybase, Inc. All Rights Reserved. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Sun Jan 11 23:49:44 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA19252 - for ; Sun, 11 Jan 1998 23:49:02 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA08095; - Mon, 12 Jan 1998 12:09:24 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B9A580.55DD4645@sable.krasnoyarsk.su> -Date: Mon, 12 Jan 1998 12:09:20 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -Subject: Re: [HACKERS] Re: subselects -References: <199801110559.AAA11801@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> We need a new Node structure, call it Sublink: -> -> int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...) -> Oid operator /* subquery must return single row */ -> List *lefthand; /* parent stuff */ -> Node *subquery; /* represents nodes from parser */ -> Index Subindex; /* filled in to index Query->subqueries */ - -Ok, I agreed that it's better to have new node and don't put subquery stuff -into Expr node. - -int linkType - is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression - subqueries (following Sybase naming) which must return single row - - (a, b, c) = (subquery). - Note again, that there are no linkType for IN and NOTIN here. - User' IN and NOT IN must be converted to = ANY and <> ALL by parser. - -We need not in Oid operator! In all cases we need in - -List *oper - list of Oper nodes for each of a, b, c, ... and operator (=, ...) - corresponding to data type of a, b, c, ... - -List *lefthand - is list of Var/Const nodes - representation of (a, b, c, ...) - -What is Node *subquery ? -In optimizer we need either in Subindex (to get subquery from Query->subqueries -when beeing in Sublink) or in Node *subquery inside Sublink itself. -BTW, after some thought I don't see how Query->subqueries will be usefull. -So, may be just add bool hassubqueries to Query (and Query *parentQuery) -and use Query *subquery in Sublink, but not subindex ? - -> -> Also, when parsing the subqueries, we need to keep track of correlated -> references. 
I recommend we add a field to the Var structure: -> -> Index sublevel; /* range table reference: -> = 0 current level of query -> < 0 parent above this many levels -> > 0 index into subquery list -> */ -> -> This way, a Var node with sublevel 0 is the current level, and is true -> in most cases. This helps us not have to change much code. sublevel = -> -1 means it references the range table in the parent query. sublevel = -> -2 means the parent's parent. sublevel = 2 means it references the range -> table of the second entry in Query->subqueries. Varno and varattno are -> still meaningful. Of course, we can't reference variables in the -> subqueries from the parent in the parser code, but Vadim may want to. - ^^^^^^^^^^^^^^^^^ -No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ... -sublevel is for optimizer only - executor will not use it. - -> -> When doing a Var lookup in the parser, we look in the current level -> first, but if not found, if it is a subquery, we can look at the parent -> and parent's parent to set the sublevel, varno, and varatno properly. -> -> We create no phantom range table entries in the subquery, and no phantom -> target list entries. We can leave that all for the upper optimizer. - -Ok. 
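The lookup order described above (current level first, then the parent, then the parent's parent) can be sketched as follows, using the convention from the reply that sublevel 0 is the current query and 1 is one level up. This is a simplified illustration, not the PostgreSQL parser: ParseLevel, VarRef, and lookup_var are hypothetical, and each range table is reduced to a flat list of column names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified parse state: one instance per query level. */
typedef struct ParseLevel {
    const char **columns;        /* column names visible at this level */
    size_t ncolumns;
    struct ParseLevel *parent;   /* enclosing query, NULL at the top level */
} ParseLevel;

/* Hypothetical stand-in for the fields a Var node would carry. */
typedef struct {
    int sublevel;   /* 0 = current level, 1 = one level up, ... */
    int varattno;   /* 1-based column position within that level */
} VarRef;

/*
 * Resolve a column name starting at the current level and walking outward.
 * Returns 0 and fills *out on success, -1 if no level has the column.
 */
int lookup_var(const ParseLevel *level, const char *name, VarRef *out)
{
    int up = 0;

    for (; level != NULL; level = level->parent, up++) {
        for (size_t i = 0; i < level->ncolumns; i++) {
            if (strcmp(level->columns[i], name) == 0) {
                out->sublevel = up;
                out->varattno = (int)i + 1;
                return 0;
            }
        }
    }
    return -1;
}
```

A column found in the subquery itself gets sublevel 0; a correlated reference to the enclosing query gets sublevel 1, and no phantom range table entry is created in the subquery.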
- -Vadim - -From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:41 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00786 - for ; Mon, 12 Jan 1998 08:06:39 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12270 for ; Mon, 12 Jan 1998 04:16:10 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460; - Mon, 12 Jan 1998 16:34:54 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> -Date: Mon, 12 Jan 1998 16:34:45 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , hackers@postgreSQL.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Thomas G. Lockhart wrote: -> -> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called -> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. -> If lists are handled farther back, this routine should move to there also and the -> parser will just pass the lists. Note that some assumptions have to be made about the -> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of -> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK -> to disallow those cases or to look for specific appearance of the operator to guess -> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if -> it has "<>" or "!" then build as "or"s. 
- -Oh, god! I never thought about this! -Ok, I have to agree: - -1. Only <, <=, =, >, >=, <> is allowed with subselects -2. Use OR's for <>, and so - we need in bool useor in SubLink - for <>, <> ANY and <> ALL: - -typedef struct SubLink { - NodeTag type; - int linkType; /* EXISTS, ALL, ANY, EXPR */ - bool useor; /* TRUE for <> */ - List *lefthand; /* List of Var/Const nodes on the left */ - List *oper; /* List of Oper nodes */ - Query *subquery; /* */ -} SubLink; - -Vadim - - -From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:38 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00783 - for ; Mon, 12 Jan 1998 08:06:36 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12377 for ; Mon, 12 Jan 1998 04:21:55 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08470; - Mon, 12 Jan 1998 16:40:49 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34B9E520.4C0EA6BC@sable.krasnoyarsk.su> -Date: Mon, 12 Jan 1998 16:40:48 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , hackers@postgreSQL.org -Subject: Re: [HACKERS] Re: subselects -References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Thomas G. Lockhart wrote: -> -> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called -> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions. -> If lists are handled farther back, this routine should move to there also and the -> parser will just pass the lists.
Note that some assumptions have to be made about the -> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of -> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK -> to disallow those cases or to look for specific appearance of the operator to guess -> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if -> it has "<>" or "!" then build as "or"s. - -Sorry, I forgot something: is (a, b) OP (x, y) in standard ? -If not then I suggest to don't implement it at all and allow -(a, b) OP [ANY|ALL] (subselect) only. - -Vadim - -From vadim@sable.krasnoyarsk.su Tue Jan 13 09:30:58 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA28551 - for ; Tue, 13 Jan 1998 09:30:56 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA26483 for ; Tue, 13 Jan 1998 09:21:36 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id VAA04356; - Tue, 13 Jan 1998 21:20:31 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> -Date: Tue, 13 Jan 1998 21:20:25 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -Subject: Re: [HACKERS] Re: subselects -References: <199801121424.JAA02440@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Ok. I don't see how Query->subqueries could me help, but I foresee -that Query->sublinks can do it. Could you add this ? - -Bruce Momjian wrote: -> -> > -> > What is Node *subquery ? 
-> > In optimizer we need either in Subindex (to get subquery from Query->subqueries -> > when beeing in Sublink) or in Node *subquery inside Sublink itself. -> > BTW, after some thought I don't see how Query->subqueries will be usefull. -> > So, may be just add bool hassubqueries to Query (and Query *parentQuery) -> > and use Query *subquery in Sublink, but not subindex ? -> -> OK, I originally created it because the parser would have trouble -> filling in a List* field in SelectStmt while it was parsing a WHERE -> clause. I decided to just stick the SelectStmt* into Sublink->subquery. -> -> While we are going through the parse output to fill in the Query*, I -> thought we should move the actual subquery parse output to a separate -> place, and once the Query* was completed, spin through the saved -> subquery parse list and stuff Query->subqueries with a list of Query* -> for the subqueries. I thought this would be easier, because we would -> then have all the subqueries in a nice list that we can manage easier. -> -> In fact, we can fill Query->subqueries with SelectStmt* as we process -> the WHERE clause, then convert them to Query* at the end. -> -> If you would rather keep the subquery Query* entries in the Sublink -> structure, we can do that. The only issue I see is that when you want -> to get to them, you have to wade through the WHERE clause to find them. -> For example, we will have to run the subquery Query* through the rewrite -> system. Right now, for UNION, I have a nice union List* in Query, and I -> just spin through it in postgres.c for each Union query. If we keep the -> subquery Query* inside Sublink, we have to have some logic to go through -> and find them. -> -> If we just have an Index in Sublink to the Query->subqueries, we can use -> the nth() macro to find them quite easily. -> -> But it is up to you. 
I really don't know how you are going to handle -> things like: -> -> select * -> from taba -> where x = 3 and y = 5 and (z=6 or q in (select g from tabb )) - -No problems. - -> -> My logic was to break the problem down to single queries as much as -> possible, so we would be breaking the problem up into pieces. Whatever -> is easier for you. - -Vadim - -From owner-pgsql-hackers@hub.org Tue Jan 13 10:32:35 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA29523 - for ; Tue, 13 Jan 1998 10:32:33 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA03743; Tue, 13 Jan 1998 10:32:13 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 13 Jan 1998 10:31:57 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA03708 for pgsql-hackers-outgoing; Tue, 13 Jan 1998 10:31:51 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA03628 for ; Tue, 13 Jan 1998 10:31:20 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id JAA28747; - Tue, 13 Jan 1998 09:48:00 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801131448.JAA28747@candle.pha.pa.us> -Subject: Re: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Tue, 13 Jan 1998 09:48:00 -0500 (EST) -Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -In-Reply-To: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 13, 98 09:20:25 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Ok. I don't see how Query->subqueries could me help, but I foresee -> that Query->sublinks can do it. Could you add this ? 
- -OK, so instead of moving the query out of the SubLink structure, you -want the Query* in the Sublink structure, and a List* of SubLink -pointers in the query structure? - - Query - { - ... - List *sublink; /* list of pointers to Sublinks - ... - } - -I can do that. Let me know. --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Tue Jan 13 22:23:46 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA08806 - for ; Tue, 13 Jan 1998 22:23:45 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA11486 for ; Tue, 13 Jan 1998 22:09:55 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id KAA05660; - Wed, 14 Jan 1998 10:09:07 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> -Date: Wed, 14 Jan 1998 10:09:02 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -Subject: Re: [HACKERS] Re: subselects -References: <199801131448.JAA28747@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > Ok. I don't see how Query->subqueries could me help, but I foresee -> > that Query->sublinks can do it. Could you add this ? -> -> OK, so instead of moving the query out of the SubLink structure, you -> want the Query* in the Sublink structure, and a List* of SubLink -> pointers in the query structure? - -Yes. - -> -> Query -> { -> ... -> List *sublink; /* list of pointers to Sublinks -> ... -> } -> -> I can do that. Let me know. - -Thanks! - -Are there any opened issues ? 
- -Vadim - -From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:40 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21676 - for ; Thu, 15 Jan 1998 19:00:39 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23948 for ; Thu, 15 Jan 1998 18:35:59 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27814; Thu, 15 Jan 1998 18:32:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:32:20 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27668 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:32:08 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27425 for ; Thu, 15 Jan 1998 18:31:32 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id SAA12920; - Thu, 15 Jan 1998 18:18:32 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801152318.SAA12920@candle.pha.pa.us> -Subject: Re: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Thu, 15 Jan 1998 18:18:31 -0500 (EST) -Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -In-Reply-To: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 14, 98 10:09:02 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > > -> > > Ok. I don't see how Query->subqueries could me help, but I foresee -> > > that Query->sublinks can do it. Could you add this ? 
-> > -> > OK, so instead of moving the query out of the SubLink structure, you -> > want the Query* in the Sublink structure, and a List* of SubLink -> > pointers in the query structure? -> -> Yes. -> -> > -> > Query -> > { -> > ... -> > List *sublink; /* list of pointers to Sublinks -> > ... -> > } -> > -> > I can do that. Let me know. -> -> Thanks! -> -> Are there any opened issues ? - -OK, what do you need me to do. Do you want me to create the Sublink -support stuff, fill them in in the parser, and pass them through the -rewrite section and into the optimizer. I will prepare a list of -changes. - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:38 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21663 - for ; Thu, 15 Jan 1998 19:00:36 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23925 for ; Thu, 15 Jan 1998 18:35:42 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27796; Thu, 15 Jan 1998 18:32:37 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:31:52 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27463 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:31:37 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27167 for ; Thu, 15 Jan 1998 18:31:06 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id SAA26747; - Thu, 15 Jan 1998 18:26:42 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801152326.SAA26747@candle.pha.pa.us> -Subject: Re: [HACKERS] Re: subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) -Date: Thu, 15 Jan 1998 18:26:41 -0500 (EST) -Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org -In-Reply-To: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 12, 98 04:34:45 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> typedef struct SubLink { -> NodeTag type; -> int linkType; /* EXISTS, ALL, ANY, EXPR */ -> bool useor; /* TRUE for <> */ -> List *lefthand; /* List of Var/Const nodes on the left */ -> List *oper; /* List of Oper nodes */ -> Query *subquery; /* */ -> } SubLink; - -OK, we add this structure above. During parsing, *subquery actually -will hold Node *parsetree, not Query *. - -And add to Query: - - bool hasSubLinks; - -Also need a function to return a List* of SubLink*. I just did a -similar thing with Aggreg*. And Var gets: - - int uplevels; - -Is that it? - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Jan 16 04:36:05 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09604 - for ; Fri, 16 Jan 1998 04:36:03 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA07040; Fri, 16 Jan 1998 04:35:27 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 16 Jan 1998 04:35:18 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA06936 for pgsql-hackers-outgoing; Fri, 16 Jan 1998 04:35:13 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA06823 for ; Fri, 16 Jan 1998 04:34:22 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10384; - Fri, 16 Jan 1998 16:34:15 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) 
-Message-ID: <34BF2997.97B40172@sable.krasnoyarsk.su> -Date: Fri, 16 Jan 1998 16:34:15 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: lockhart@alumni.caltech.edu, hackers@postgreSQL.org -Subject: Re: [HACKERS] Re: subselects -References: <199801152326.SAA26747@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> > typedef struct SubLink { -> > NodeTag type; -> > int linkType; /* EXISTS, ALL, ANY, EXPR */ -> > bool useor; /* TRUE for <> */ -> > List *lefthand; /* List of Var/Const nodes on the left */ -> > List *oper; /* List of Oper nodes */ -> > Query *subquery; /* */ -> > } SubLink; -> -> OK, we add this structure above. During parsing, *subquery actually -> will hold Node *parsetree, not Query *. - ^^^^^^^^^^^^^^^ -But optimizer will get node Query here, yes ? - -> -> And add to Query: -> -> bool hasSubLinks; -> -> Also need a function to return a List* of SubLink*. I just did a -> similar thing with Aggreg*. And Var gets: -> -> int uplevels; -> -> Is that it? - -Yes. - -Vadim - - -From vadim@sable.krasnoyarsk.su Fri Jan 16 04:36:21 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09607 - for ; Fri, 16 Jan 1998 04:36:06 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10396; - Fri, 16 Jan 1998 16:37:21 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34BF2A50.A357A16D@sable.krasnoyarsk.su> -Date: Fri, 16 Jan 1998 16:37:20 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu -Subject: Re: [HACKERS] Re: subselects -References: <199801152318.SAA12920@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > Are there any opened issues ? -> -> OK, what do you need me to do. Do you want me to create the Sublink -> support stuff, fill them in in the parser, and pass them through the -> rewrite section and into the optimizer. I will prepare a list of -> changes. - -Please do this. I'm ready to start coding of things in optimizer. - -Vadim - -From vadim@sable.krasnoyarsk.su Sun Jan 18 07:32:52 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA14786 - for ; Sun, 18 Jan 1998 07:32:51 -0500 (EST) -Received: from www.krasnet.ru ([193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA29385 for ; Sun, 18 Jan 1998 07:25:55 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780; - Sun, 18 Jan 1998 19:27:14 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su> -Date: Sun, 18 Jan 1998 19:27:09 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] subselects coding started -References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Thomas G. 
Lockhart wrote: -> -> Bruce Momjian wrote: -> -> > OK, I have created the SubLink structure with supporting routines, and -> > have added code to create the SubLink structures in the parser, and have -> > added Query->hasSubLink. -> > -> > I changed gram.y to support: -> > -> > (x,y,z) OP (subselect) -> > -> > where OP is any operator. Is that right, or are we doing only certain -> > ones, and of so, do we limit it in the parser? -> -> Seems like we would want to pass most operators and expressions through -> gram.y, and then call elog() in either the transformation or in the -> optimizer if it is an operator which can't be supported. - -Not in optimizer, in parser, please. -Remember that for <> SubLink->useor must be TRUE and this is parser work -(optimizer don't know about "=", "<>", etc but only about Oper nodes). - -IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work. - -Vadim - -From owner-pgsql-hackers@hub.org Sun Jan 18 21:08:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA00825 - for ; Sun, 18 Jan 1998 21:08:57 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA25254 for ; Sun, 18 Jan 1998 19:18:24 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA06912; Sun, 18 Jan 1998 19:17:01 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 18 Jan 1998 19:11:05 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA06322 for pgsql-hackers-outgoing; Sun, 18 Jan 1998 19:11:01 -0500 (EST) -Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA06144 for ; Sun, 18 Jan 1998 19:10:31 -0500 (EST) -Received: from www.krasnet.ru ([193.125.44.86]) - by clio.trends.ca (8.8.8/8.8.8) with ESMTP id HAA12383 - for ; Sun, 18 Jan 1998 07:28:38 -0500 (EST) 
-Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780; - Sun, 18 Jan 1998 19:27:14 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su> -Date: Sun, 18 Jan 1998 19:27:09 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: "Thomas G. Lockhart" -CC: Bruce Momjian , - PostgreSQL-development -Subject: Re: [HACKERS] subselects coding started -References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Thomas G. Lockhart wrote: -> -> Bruce Momjian wrote: -> -> > OK, I have created the SubLink structure with supporting routines, and -> > have added code to create the SubLink structures in the parser, and have -> > added Query->hasSubLink. -> > -> > I changed gram.y to support: -> > -> > (x,y,z) OP (subselect) -> > -> > where OP is any operator. Is that right, or are we doing only certain -> > ones, and of so, do we limit it in the parser? -> -> Seems like we would want to pass most operators and expressions through -> gram.y, and then call elog() in either the transformation or in the -> optimizer if it is an operator which can't be supported. - -Not in optimizer, in parser, please. -Remember that for <> SubLink->useor must be TRUE and this is parser work -(optimizer don't know about "=", "<>", etc but only about Oper nodes). - -IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work. 
- -Vadim - - -From vadim@sable.krasnoyarsk.su Sun Jan 18 23:59:08 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA10497 - for ; Sun, 18 Jan 1998 23:59:07 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA06941 for ; Sun, 18 Jan 1998 23:44:32 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id LAA16745 - for ; Mon, 19 Jan 1998 11:46:28 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34C2DAA3.78E54042@sable.krasnoyarsk.su> -Date: Mon, 19 Jan 1998 11:46:27 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: SubLink->oper -References: <199801190419.XAA04367@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> In SubLink->oper, do you want the oid of the pg_operator, or the oid of -> the pg_proc assigned to the operator? -> -> Currently, I am giving you the oid of pg_operator. - -No! I need in Oper nodes here. For "normal" operators parser -returns Expr node with opType = OP_EXPR and corresponding Oper -in Node *oper. Near the same for SubLink: I need in Oper node -for each pair of Var/Const from the left side and target entry from -the subquery. 
- -Vadim - -From owner-pgsql-hackers@hub.org Mon Jan 19 01:02:23 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24036 - for ; Mon, 19 Jan 1998 01:02:21 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA13913; Mon, 19 Jan 1998 01:02:16 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:01:41 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA13824 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:01:34 -0500 (EST) -Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA13699 for ; Mon, 19 Jan 1998 01:00:59 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id AAA23866; - Mon, 19 Jan 1998 00:54:49 -0500 (EST) -From: Bruce Momjian -Message-Id: <199801190554.AAA23866@candle.pha.pa.us> -Subject: [HACKERS] subselects -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 19 Jan 1998 00:54:49 -0500 (EST) -Cc: hackers@postgreSQL.org (PostgreSQL-development) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - - -OK, I have added code to allow the SubLinks make it to the optimizer. - -I implemented ParseState->parentParseState, but not parentQuery, because -the parentParseState is much more valuable to me, and Vadim thought it -might be useful, but was not positive. Also, keeping that parentQuery -pointer valid through rewrite may be difficult, so I dropped it. -ParseState is only valid in the parser. - -I have not done: - - correlated subquery column references - added Var->sublevels_up - gotten this to work in the rewrite system - have not added full CopyNode support - -I will address these in the next few days. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Mon Jan 19 01:32:54 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24335 - for ; Mon, 19 Jan 1998 01:32:52 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA10610 for ; Mon, 19 Jan 1998 01:23:02 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16879 - for ; Mon, 19 Jan 1998 13:25:28 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34C2F1D2.9CD191CC@sable.krasnoyarsk.su> -Date: Mon, 19 Jan 1998 13:25:22 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -Subject: Re: SubLink->oper -References: <199801190500.AAA10576@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > -> > Bruce Momjian wrote: -> > > -> > > In SubLink->oper, do you want the oid of the pg_operator, or the oid of -> > > the pg_proc assigned to the operator? -> > > -> > > Currently, I am giving you the oid of pg_operator. -> > -> > No! I need in Oper nodes here. For "normal" operators parser -> > returns Expr node with opType = OP_EXPR and corresponding Oper -> > in Node *oper. Near the same for SubLink: I need in Oper node -> > for each pair of Var/Const from the left side and target entry from -> > the subquery. -> > -> > Vadim -> > -> -> OK, can I give you an Oper* for each field. - -Nice! 
But what's this: - -typedef struct SubLink -{ -struct Query; -^^^^^^^^^^^^^ - NodeTag type; - -Vadim - -From vadim@sable.krasnoyarsk.su Mon Jan 19 01:34:39 1998 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24346 - for ; Mon, 19 Jan 1998 01:34:33 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904; - Mon, 19 Jan 1998 13:37:42 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Sender: root@www.krasnet.ru -Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su> -Date: Mon, 19 Jan 1998 13:37:41 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: Re: subselects -References: <199801190554.AAA23866@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, I have added code to allow the SubLinks make it to the optimizer. -> -> I implemented ParseState->parentParseState, but not parentQuery, because -> the parentParseState is much more valuable to me, and Vadim thought it -> might be useful, but was not positive. Also, keeping that parentQuery -> pointer valid through rewrite may be difficult, so I dropped it. -> ParseState is only valid in the parser. -> -> I have not done: -> -> correlated subquery column references -> added Var->sublevels_up -> gotten this to work in the rewrite system -> have not added full CopyNode support -> -> I will address these in the next few days. - -Nice! I'm starting with non-correlated subqueries... 
- -Vadim - -From owner-pgsql-hackers@hub.org Mon Jan 19 01:35:50 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24362 - for ; Mon, 19 Jan 1998 01:35:48 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA17531; Mon, 19 Jan 1998 01:35:39 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:35:33 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA17460 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:35:28 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA17323 for ; Mon, 19 Jan 1998 01:35:03 -0500 (EST) -Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86]) - by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904; - Mon, 19 Jan 1998 13:37:42 +0700 (KRS) - (envelope-from vadim@sable.krasnoyarsk.su) -Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su> -Date: Mon, 19 Jan 1998 13:37:41 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: [HACKERS] Re: subselects -References: <199801190554.AAA23866@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> OK, I have added code to allow the SubLinks make it to the optimizer. -> -> I implemented ParseState->parentParseState, but not parentQuery, because -> the parentParseState is much more valuable to me, and Vadim thought it -> might be useful, but was not positive. Also, keeping that parentQuery -> pointer valid through rewrite may be difficult, so I dropped it. -> ParseState is only valid in the parser. 
-> -> I have not done: -> -> correlated subquery column references -> added Var->sublevels_up -> gotten this to work in the rewrite system -> have not added full CopyNode support -> -> I will address these in the next few days. - -Nice! I'm starting with non-correlated subqueries... - -Vadim - - -From owner-pgsql-hackers@hub.org Wed Jan 21 04:00:59 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA14981 - for ; Wed, 21 Jan 1998 04:00:56 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA02432 for ; Wed, 21 Jan 1998 03:46:22 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id DAA12583; Wed, 21 Jan 1998 03:45:43 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 03:44:07 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id DAA12288 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 03:44:02 -0500 (EST) -Received: from gandalf.sd.spardat.at (gandalf.telecom.at [194.118.26.84]) by hub.org (8.8.8/8.7.5) with ESMTP id DAA12263 for ; Wed, 21 Jan 1998 03:43:18 -0500 (EST) -Received: from sdgtw.sd.spardat.at (sdgtw.sd.spardat.at [172.18.99.31]) - by gandalf.sd.spardat.at (8.8.8/8.8.8) with ESMTP id JAA38408 - for ; Wed, 21 Jan 1998 09:42:55 +0100 -Received: by sdgtw.sd.spardat.at with Internet Mail Service (5.0.1458.49) - id ; Wed, 21 Jan 1998 09:42:55 +0100 -Message-ID: <219F68D65015D011A8E000006F8590C6010A51A2@sdexcsrv1.sd.spardat.at> -From: Zeugswetter Andreas DBT -To: "'pgsql-hackers@hub.org'" -Subject: [HACKERS] Re: subselects -Date: Wed, 21 Jan 1998 09:42:52 +0100 -X-Priority: 3 -MIME-Version: 1.0 -X-Mailer: Internet Mail Service (5.0.1458.49) -Content-Type: text/plain -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce wrote: -> I have completed adding Var.varlevelsup, and have added code to 
the -> parser to properly set the field. It will allow correlated references -> in the WHERE clause, but not in the target list. - -select i2.ip1, i1.ip4 from nameip i1 where ip1 = (select ip1 from nameip -i2); - 522: Table (i2) not selected in query. -select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2); - 284: A subquery has returned not exactly one row. -select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2 -where name='zeus'); - 2 row(s) retrieved. - -Informix allows correlated references in the target list. It also allows -subselects in the target list as in: -select i1.ip4, (select i1.ip1 from nameip i2) from nameip i1; - 284: A subquery has returned not exactly one row. -select i1.ip4, (select i1.ip1 from nameip i2 where name='zeus') from -nameip i1; - 2 row(s) retrieved. - -Is this what you were looking for ? - -Andreas - - -From owner-pgsql-hackers@hub.org Wed Jan 21 05:31:02 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA15884 - for ; Wed, 21 Jan 1998 05:31:01 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id FAA04709 for ; Wed, 21 Jan 1998 05:16:16 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id FAA05191; Wed, 21 Jan 1998 05:15:42 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 05:14:02 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id FAA04951 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 05:13:57 -0500 (EST) -Received: from dune.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id FAA04610 for ; Wed, 21 Jan 1998 05:12:18 -0500 (EST) -Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86]) - by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id RAA01918; - Wed, 21 Jan 1998 17:10:24 +0700 (KRS) - (envelope-from 
vadim@sable.krasnoyarsk.su) -Message-ID: <34C5C98E.3E085F52@sable.krasnoyarsk.su> -Date: Wed, 21 Jan 1998 17:10:22 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: PostgreSQL-development -Subject: [HACKERS] Re: subselects -References: <199801210324.WAA02161@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> We are only going to have subselects in the WHERE clause, not in the -> target list, right? -> -> The standard says we can have them either place, but I didn't think we -> were implementing the target list subselects. -> -> Is that correct? - -Yes, this is right for 6.3. I hope that we'll support subselects in -target list, FROM, etc in future. - -BTW, I'm going to implement subselect in (let's say) "natural" way - -without substitution of parent query relations into subselect and so on, -but by execution of (correlated) subqueries for each upper query row -(may be with cacheing of results in hash table for better performance). -Sure, this is much more clean way and much more clear how to do this. -This seems like SQL-func way, but funcs start/run/stop Executor each time -when called and this breaks performance. 
- -Vadim - - -From owner-pgsql-hackers@hub.org Wed Jan 21 10:02:02 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA20456 - for ; Wed, 21 Jan 1998 10:02:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA06778; Wed, 21 Jan 1998 10:02:13 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 10:00:41 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA06544 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 10:00:37 -0500 (EST) -Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA06326 for ; Wed, 21 Jan 1998 10:00:03 -0500 (EST) -Received: from insightdist.com (nobody@localhost) - by u1.abs.net (8.8.5/8.8.5) with UUCP id JAA08009 - for pgsql-hackers@postgresql.org; Wed, 21 Jan 1998 09:40:29 -0500 (EST) -X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!darrenk using -f -Received: by insightdist.com (AIX 3.2/UCB 5.64/4.03) - id AA33174; Wed, 21 Jan 1998 09:26:09 -0500 -Received: by ceodev (AIX 4.1/UCB 5.64/4.03) - id AA36452; Wed, 21 Jan 1998 09:13:05 -0500 -Date: Wed, 21 Jan 1998 09:13:05 -0500 -From: darrenk@insightdist.com (Darren King) -Message-Id: <9801211413.AA36452@ceodev> -To: pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] subselects -Mime-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Content-Md5: 4wI6dUsUAXei+yg3JycjGw== -Sender: owner-pgsql-hackers@hub.org -Precedence: bulk -Status: OR - -> We are only going to have subselects in the WHERE clause, not in the -> target list, right? -> -> The standard says we can have them either place, but I didn't think we -> were implementing the target list subselects. -> -> Is that correct? - -What about the HAVING clause? Currently not in, but someone here wants -to take a stab at it. 
- -Doesn't seem that tough...loops over the tuples returned from the group -by node and checks the expression such as "x > 5" or "x = (subselect)". - -The cost analysis in the optimizer could be tricky come to think of it. -If a subselect has a HAVING, would have to have a formula to determine -the selectiveness. Hmmm... - -darrenk - - -- 2.40.0