From 97f447b2cd81e029ca5008792c867512008f4b05 Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Thu, 25 Jan 2001 03:53:25 +0000 Subject: [PATCH] Add to TODO.detail. --- doc/TODO.detail/cidr | 2769 -------------------------------------- doc/TODO.detail/cnfify | 1556 --------------------- doc/TODO.detail/drop | 71 +- doc/TODO.detail/pglog | 2900 ---------------------------------------- 4 files changed, 66 insertions(+), 7230 deletions(-) delete mode 100644 doc/TODO.detail/cidr delete mode 100644 doc/TODO.detail/cnfify delete mode 100644 doc/TODO.detail/pglog diff --git a/doc/TODO.detail/cidr b/doc/TODO.detail/cidr deleted file mode 100644 index 1a6479c062..0000000000 --- a/doc/TODO.detail/cidr +++ /dev/null @@ -1,2769 +0,0 @@ -From pgsql-hackers-owner+M4219@hub.org Tue Jul 4 20:10:16 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA13204 - for ; Tue, 4 Jul 2000 20:10:15 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e650A8S29252; - Tue, 4 Jul 2000 20:10:08 -0400 (EDT) -Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6505pS14530 - for ; Tue, 4 Jul 2000 20:05:52 -0400 (EDT) -Received: from regulus.student.UU.SE ([130.238.5.2]:37402 "EHLO - regulus.its.uu.se") by merganser.its.uu.se with ESMTP - id ; Wed, 5 Jul 2000 02:05:20 +0200 -Received: from peter (helo=localhost) - by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) - id 139cnr-0003QO-00 - for pgsql-hackers@postgresql.org; Wed, 05 Jul 2000 02:12:35 +0200 -Date: Wed, 5 Jul 2000 02:12:35 +0200 (CEST) -From: Peter Eisentraut -To: PostgreSQL Development -Subject: [HACKERS] Repair plan for inet and cidr types -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=ISO-8859-1 -Content-Transfer-Encoding: 8BIT -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org 
-Status: OR - -As we know, the inet and cidr types are still broken in several ways, -amongst others input and output functions, operators, ordering. I've -collected the bug reports from the last year or so from the archives. - -There's apparently a lack of understanding of what exactly are these types -are supposed to do. Therefore, instead of addressing each bug -individually, let me first state what I reconstructed as the specification -of these types, and then add what is currently wrong with it. - -* CIDR - -The cidr type stores the identity of an IP _network_. A network -specification is of the form 'x.x.x.x/y'. The documentation states that if -y is omitted then it is constructed from the old A, B, C class scheme. So -be it. In a real world network, the bits (y+1)...32 have to be zero, but -the cidr type does not currently enforce this. This has been the source of -bugs in the past, and no doubt the source of some confusion as well. I -propose that cidr _reject_ input of the form '127.0.0.5/16'. If you think -about it, this is the same as int4 rejecting 3.5 as input. - -* INET - -The inet type stores the identity of an IP _host_. A host specification is -of the form 'x.x.x.x'. Optionally, the inet type also stores the identity -of the network the host is in. E.g., '127.0.0.5/16' means the host -127.0.0.5 in the network 127.0/16. - -* Type equivalency - -This has also been a source of problems. I propose that cidr and inet are -not made equivalent types at any level. No automatic casting either. A -network and a host are not the same thing. To construct a cidr value from -an inet value, you'd have to use some sort of (to be created) network() -function, e.g., network('127.0.0.5/16') => '127.0/16'. IMO, there is no -reasonable way to construct an inet value from a cidr value. - -* Operators - -Because the types are equivalent, the operators have also been bunched -together in confusing ways. 
I propose that ordering operators (>, +, <) -between inet and cidr be eliminated, they do not make sense. The only -useful operation between cidr and inet is the << ("contains") operator. -Ordering withing cidr and inet be defined in terms of their bit -representation, as is the case now. The << family of operators should also -be removed for the inet type -- a host cannot "contain" another host. What -you probably wanted is `inet1 << network(inet2)'. - - -Does anyone see this differently? If not, can we agree on this -specification? - --- -Peter Eisentraut Sernanders väg 10:115 -peter_e@gmx.net 75262 Uppsala -http://yi.org/peter-e/ Sweden - - -From pgsql-hackers-owner+M4230@hub.org Tue Jul 4 22:13:37 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA13773 - for ; Tue, 4 Jul 2000 22:13:37 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e652DSS19722; - Tue, 4 Jul 2000 22:13:28 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e652D9S19504 - for ; Tue, 4 Jul 2000 22:13:09 -0400 (EDT) -Received: from localhost (4223 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Tue, 4 Jul 2000 22:13:06 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. 
Cain) -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-Reply-To: - "from Peter Eisentraut at Jul 5, 2000 02:12:35 am" -To: Peter Eisentraut -Date: Tue, 4 Jul 2000 22:13:06 -0400 (EDT) -CC: PostgreSQL Development -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Peter Eisentraut -> There's apparently a lack of understanding of what exactly are these types -> are supposed to do. Therefore, instead of addressing each bug -> individually, let me first state what I reconstructed as the specification -> of these types, and then add what is currently wrong with it. - -I have been browsing through the old messages on the topic. There was, in -fact some very good work defining the type before anyone actually started -to code. There was a surprising amount of controversy over the actual -definitions but I think in the end we hammered it out at least to the -point that everyone could work with it. - -> * CIDR -> -> The cidr type stores the identity of an IP _network_. A network -> specification is of the form 'x.x.x.x/y'. The documentation states that if -> y is omitted then it is constructed from the old A, B, C class scheme. So -> be it. In a real world network, the bits (y+1)...32 have to be zero, but -> the cidr type does not currently enforce this. This has been the source of -> bugs in the past, and no doubt the source of some confusion as well. I -> propose that cidr _reject_ input of the form '127.0.0.5/16'. If you think -> about it, this is the same as int4 rejecting 3.5 as input. - -There is also the option of accepting it but masking out the host bits -before storing it. That gives us automatic conversion if we store an -inet into a cidr if our intent is to store the network part. - -What sort of bugs do you think it caused btw? 
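The two options on the table for cidr input (reject a value with host bits set, or accept it and mask the host bits out) can be sketched roughly as follows. This is only an illustration using Python's standard ipaddress module, not the PostgreSQL input routine; the function names parse_cidr_strict and parse_cidr_masking are made up for the example.

```python
import ipaddress

def parse_cidr_strict(text):
    """Reject input with bits set beyond the prefix (the 'reject' proposal)."""
    # strict=True makes ip_network raise ValueError if host bits are set.
    return ipaddress.ip_network(text, strict=True)

def parse_cidr_masking(text):
    """Accept the input but clear the host bits (the 'mask' alternative)."""
    # strict=False silently zeroes everything beyond the prefix length.
    return ipaddress.ip_network(text, strict=False)

if __name__ == "__main__":
    print(parse_cidr_masking("127.0.0.5/16"))
    try:
        parse_cidr_strict("127.0.0.5/16")
    except ValueError as err:
        print("rejected:", err)
```

The masking variant is what makes an inet-to-cidr conversion lossy, which is the point the type-equivalency part of the thread turns on.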
- -> * INET -> -> The inet type stores the identity of an IP _host_. A host specification is -> of the form 'x.x.x.x'. Optionally, the inet type also stores the identity -> of the network the host is in. E.g., '127.0.0.5/16' means the host -> 127.0.0.5 in the network 127.0/16. - -That sounds right. We also allowed for hosts to be stored implicitely by -simply making the netmask /32. - -> * Type equivalency -> -> This has also been a source of problems. I propose that cidr and inet are -> not made equivalent types at any level. No automatic casting either. A -> network and a host are not the same thing. To construct a cidr value from -> an inet value, you'd have to use some sort of (to be created) network() -> function, e.g., network('127.0.0.5/16') => '127.0/16'. IMO, there is no -> reasonable way to construct an inet value from a cidr value. - -I'm not sure I understand why this is necessary. I can see not allowing -cidr ==> inet conversions but inet ==> cidr can be done as it is a matter -of dropping information - the host part. - - -> * Operators -> -> Because the types are equivalent, the operators have also been bunched -> together in confusing ways. I propose that ordering operators (>, +, <) -> between inet and cidr be eliminated, they do not make sense. The only -> useful operation between cidr and inet is the << ("contains") operator. -> Ordering withing cidr and inet be defined in terms of their bit -> representation, as is the case now. The << family of operators should also -> be removed for the inet type -- a host cannot "contain" another host. What -> you probably wanted is `inet1 << network(inet2)'. - -Then let's define that as the meaning of "inet1 << inet2" i.e. define -the << operator between inet types as meaning "tell me if inet1 is in -the same network as inet2." In fact, if we define << as only allowed -between inet and cidr (or cidr and cidr?) 
then the implied cast will -deal with it if that cast causes the host bits to drop as suggested -above. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - -From pgsql-hackers-owner+M4232@hub.org Tue Jul 4 22:20:30 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA13808 - for ; Tue, 4 Jul 2000 22:20:29 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e652KDS33988; - Tue, 4 Jul 2000 22:20:13 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e652JuS33839 - for ; Tue, 4 Jul 2000 22:19:57 -0400 (EDT) -Received: from localhost (1460 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Tue, 4 Jul 2000 22:19:49 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. Cain) -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-Reply-To: - "from Peter Eisentraut at Jul 5, 2000 02:12:35 am" -To: Peter Eisentraut -Date: Tue, 4 Jul 2000 22:19:49 -0400 (EDT) -CC: PostgreSQL Development -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Peter Eisentraut -> network and a host are not the same thing. To construct a cidr value from -> an inet value, you'd have to use some sort of (to be created) network() -> function, e.g., network('127.0.0.5/16') => '127.0/16'. 
IMO, there is no - -Oh, I forgot to mention: - -darcy=> select network('127.1.2.3/24'::inet); -network ----------- -127.1.2/24 -(1 row) - -There is also a host and netmask function and note: - -darcy=> select host('127.1.2.3/24'::cidr); -ERROR: CIDR type has no host part - -But I still see no reason why that can't be implicit if we assign the -"'127.1.2.3/24'::inet" value to a cidr. In other words let "select -('127.1.2.3/24'::inet)::cidr" give the same output. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - -From pgsql-hackers-owner+M4234@hub.org Tue Jul 4 22:31:46 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA13855 - for ; Tue, 4 Jul 2000 22:31:46 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e652VdS74063; - Tue, 4 Jul 2000 22:31:39 -0400 (EDT) -Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) - by hub.org (8.10.1/8.10.1) with ESMTP id e652VSS73985 - for ; Tue, 4 Jul 2000 22:31:28 -0400 (EDT) -Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) - by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA29694; - Tue, 4 Jul 2000 22:31:26 -0400 (EDT) -To: Peter Eisentraut -cc: PostgreSQL Development -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-reply-to: -References: -Comments: In-reply-to Peter Eisentraut - message dated "Wed, 05 Jul 2000 02:12:35 +0200" -Date: Tue, 04 Jul 2000 22:31:25 -0400 -Message-ID: <29691.962764285@sss.pgh.pa.us> -From: Tom Lane -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Peter Eisentraut writes: -> There's apparently a lack of understanding of what exactly are these types -> are supposed to do. 
Therefore, instead of addressing each bug -> individually, let me first state what I reconstructed as the specification -> of these types, and then add what is currently wrong with it. - -This sounds good offhand, but then I never paid a whole lot of attention -to the details originally. Did you go through the original inet/cidr -design discussions (the threads where Paul Vixie was participating)? - -I don't believe Paul is subscribed here anymore, but I'd feel a lot -happier if you can contact him and get him to sign off on the clarified -design. Maybe this is what he had in mind all along, or maybe not. - - regards, tom lane - -PS: You do know who Paul Vixie is, I assume ;-). I can think of few -better-qualified experts in this domain... - -From pgsql-hackers-owner+M4312@hub.org Wed Jul 5 10:48:17 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA02483 - for ; Wed, 5 Jul 2000 10:48:16 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e65EllS08607; - Wed, 5 Jul 2000 10:47:47 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl (root@elektron.elka.pw.edu.pl [148.81.63.249]) - by hub.org (8.10.1/8.10.1) with ESMTP id e65CiPS89307 - for ; Wed, 5 Jul 2000 08:44:55 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl ([148.81.63.249]:41059 "EHLO - elektron.elka.pw.edu.pl") by elektron.elka.pw.edu.pl with ESMTP - id ; Wed, 5 Jul 2000 14:44:01 +0200 -Date: Wed, 5 Jul 2000 14:43:49 +0200 (MET DST) -From: Jakub Bartosz Bielecki -To: Sevo Stille -cc: Jakub Bartosz Bielecki , - pgsql-hackers@PostgreSQL.org -Subject: Re: [HACKERS] Re: postgres - development of inet/cidr -In-Reply-To: <3960A5FE.E626BAE1@ip23.net> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - - - -On Mon, 3 Jul 2000, Sevo Stille wrote: -> -> This 
would be proper behaviour for the cidr datatype, which describes a -> network. "select '10.0.0.1/27'::cidr='10.0.0.2/27'::cidr;" has to return -> true, as both define the same network, the mask putting the 1 vs. 2 -> outside the comparison scope. -> -> On inet, I consider the above broken - going by the documentation, -> having a netmask on a inet datatype does not define a network address -> but rather supplies additional information on the cidr network the host -> as specified by the address is in. Accordingly, it should only truncate -> if the comparison casts to cidr. - -OK. After some inspection in list's archives I found the following -statement (http://www.postgresql.org/mhonarc/pgsql-hackers/1998-07): -> It does not work that way. /24 is -> not a shorthand for specifying a netmask -- in CIDR, it's a "prefix -> length". -> That means "192.7.34.21/24" is either (a) a syntax error or -> (b) equivilent to "192.7.34/24". - -Everybody seemed to agree with the above opinion at that time. - -This is obviously _not_ the way that CIDR is handled at this moment. -"select '1.2.3.4/24'" returns "1.2.3/24" only because the _output_ routine -silently cuts host bits. Input routine stores it exactly as '1.2.3.4/24'. - -Since IMHO it's wrong I prepared a patch (I'm sending it to pgsql-patch). -It fixes the CIDR input routine to zero host bits (ie beyond-prefix bits). -Please note that I didn't change the INET input routine. - -Eventually I had to change a bit comparison functions. -To this moment they worked in a CIDR way (didn't compare host bits at all) -although they were used by both INET and CIDR. -Since CIDR is zero-padded now, whole 32 bits are compared by > = < -operators. -Subnet operators <<, >> are still the same, don't compare host bits. - -> The big question is whether comparisons that only work on a cidr data -> type (contains/contained) or have a cidr type on one side can safely -> cast the inet type to cidr implicitly. 
For: -> "select '10.0.0.1/27'::inet = '10.0.0.2/27'::inet;" FALSE -> "select '10.0.0.1/27'::cidr = '10.0.0.2/27'::cidr;" TRUE -> "select '10.0.0.1/27'::cidr = '10.0.0.2/27'::inet;" FALSE -> "select '10.0.0.1/27'::cidr >> '10.0.0.2/27'::inet;" TRUE -OK. -> "select '10.0.0.1/27'::cidr << '10.0.0.2/27'::inet;" ERROR - -Currently it's not an error... There is no way (and no reason) to -distinguish between INET and CIDR. Above example is exactly -equivalent to: - select '10.0.0.0/27'::inet << '10.0.0.2/27'::inet; -- FALSE -but: - select '10.0.0.0/27'::inet <<= '10.0.0.2/27'::inet; -- TRUE - -> But we need to reach an agreement on the proper -> behaviour on greater/smaller comparisons. Should: -> -> "select '10.0.0.1/27'::inet > '10.0.0.2/27'::cidr;" -> -> be true or false? Casting to cidr prior to comparison would make it -> equivalent to "select '10.0.0.0/27'::cidr > '10.0.0.0/27'::cidr;", which -> is false, both networks being equal. - -It should be (and is!) true... Since second argument is -really '10.0.0.0/27'. 
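The behaviour described in this message (cidr input zeroes the beyond-prefix bits, and = < > then compare all 32 bits followed by the prefix length) can be modelled in a few lines. This is a hedged sketch in Python, not the patch itself: values are represented as (address, prefix) tuples, IPv4 only, and inet_in and cidr_in are hypothetical names chosen for the example.

```python
import ipaddress

def inet_in(text):
    """Hypothetical INET input: keep every address bit plus the prefix length."""
    iface = ipaddress.ip_interface(text)
    return (int(iface.ip), iface.network.prefixlen)

def cidr_in(text):
    """Hypothetical CIDR input: zero the bits beyond the prefix before storing."""
    addr, bits = inet_in(text)
    # For bits == 0 the shift pushes every set bit out of the low 32, giving mask 0.
    mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return (addr & mask, bits)

# Tuple comparison orders by the full 32-bit address first and the prefix
# length second, which is the ordering the message describes.
if __name__ == "__main__":
    print(inet_in("10.0.0.1/27") == inet_in("10.0.0.2/27"))
    print(cidr_in("10.0.0.1/27") == cidr_in("10.0.0.2/27"))
    print(cidr_in("10.0.0.1/27") == inet_in("10.0.0.2/27"))
    print(inet_in("10.0.0.1/27") > cidr_in("10.0.0.2/27"))
```

Under this model the stored cidr value for '10.0.0.2/27' is 10.0.0.0/27, so the last comparison comes out true for the reason given above.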
- - -From pgsql-patches-owner+M284@hub.org Wed Jul 5 09:03:39 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA27744 - for ; Wed, 5 Jul 2000 09:03:38 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e65D3OS38516; - Wed, 5 Jul 2000 09:03:24 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl (root@elektron.elka.pw.edu.pl [148.81.63.249]) - by hub.org (8.10.1/8.10.1) with ESMTP id e65Cr5S11483 - for ; Wed, 5 Jul 2000 08:53:06 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl ([148.81.63.249]:42221 "EHLO - elektron.elka.pw.edu.pl") by elektron.elka.pw.edu.pl with ESMTP - id ; Wed, 5 Jul 2000 14:52:43 +0200 -Date: Wed, 5 Jul 2000 14:52:33 +0200 (MET DST) -From: Jakub Bartosz Bielecki -To: pgsql-patches@postgresql.org -Subject: [PATCHES] Re: [HACKERS] Re: postgres - development of inet/cidr -Message-ID: -MIME-Version: 1.0 -Content-Type: MULTIPART/MIXED; BOUNDARY="-559023410-959030623-962801553=:1267" -X-Mailing-List: pgsql-patches@postgresql.org -Precedence: bulk -Sender: pgsql-patches-owner@hub.org -Status: OR - - This message is in MIME format. The first part should be readable text, - while the remaining parts are likely unreadable without MIME-aware tools. - Send mail to mime@docserver.cac.washington.edu for more info. - ----559023410-959030623-962801553=:1267 -Content-Type: TEXT/PLAIN; charset=US-ASCII - - -1. Fixed obvious bug with strcpy() called on text type in network.c -2. Fixed CIDR input routine to cut 'host' bits in inet_net_pton.c -3. 
Changed network_{lt,gt,eq} to compare all bits of INET/CIDR in network.c - -Jakub - ----559023410-959030623-962801553=:1267 -Content-Type: TEXT/PLAIN; charset=US-ASCII; name=inet-patch -Content-Transfer-Encoding: BASE64 -Content-ID: -Content-Description: inet-patch -Content-Disposition: attachment; filename=inet-patch - -KioqIC4vYmFja2VuZC91dGlscy9hZHQvaW5ldF9uZXRfcHRvbi5jLm9yaWcJ -VHVlIEp1bCAgNCAyMzowMDowNiAyMDAwDQotLS0gLi9iYWNrZW5kL3V0aWxz -L2FkdC9pbmV0X25ldF9wdG9uLmMJV2VkIEp1bCAgNSAxMToxMTozMiAyMDAw -DQoqKioqKioqKioqKioqKioNCioqKiAxMDEsMTA3ICoqKioNCiAgCQkJCXRt -cCwNCiAgCQkJCWRpcnR5LA0KICAJCQkJYml0czsNCiEgCWNvbnN0IHVfY2hh -ciAqb2RzdCA9IGRzdDsNCiAgDQogIAljaCA9ICpzcmMrKzsNCiAgCWlmIChj -aCA9PSAnMCcgJiYgKHNyY1swXSA9PSAneCcgfHwgc3JjWzBdID09ICdYJykN -Ci0tLSAxMDEsMTA3IC0tLS0NCiAgCQkJCXRtcCwNCiAgCQkJCWRpcnR5LA0K -ICAJCQkJYml0czsNCiEgCXVfY2hhciAqb2RzdCA9IGRzdDsNCiAgDQogIAlj -aCA9ICpzcmMrKzsNCiAgCWlmIChjaCA9PSAnMCcgJiYgKHNyY1swXSA9PSAn -eCcgfHwgc3JjWzBdID09ICdYJykNCioqKioqKioqKioqKioqKg0KKioqIDIx -MywyMTggKioqKg0KLS0tIDIxMywyMjYgLS0tLQ0KICAJCS8qIElmIGltcHV0 -ZWQgbWFzayBpcyBuYXJyb3dlciB0aGFuIHNwZWNpZmllZCBvY3RldHMsIHdp -ZGVuLiAqLw0KICAJCWlmIChiaXRzID49IDggJiYgYml0cyA8ICgoZHN0IC0g -b2RzdCkgKiA4KSkNCiAgCQkJYml0cyA9IChkc3QgLSBvZHN0KSAqIDg7DQor -IAl9DQorIAkvKiBaZXJvIGhvc3QgYml0cyBpZiBhbnkgKi8NCisgCW4gPSBi -aXRzLzg7DQorIAlpZiggbiA8IChkc3QgLSBvZHN0KSApDQorIAl7DQorIAkJ -b2RzdFtuKytdICY9IFVDSEFSX01BWDw8KDggLSAoYml0cyAlIDgpKTsNCisg -CQlmb3IgKDtuIDwgKGRzdCAtIG9kc3QpOyBuKyspDQorIAkJCW9kc3Rbbl09 -J1wwJzsNCiAgCX0NCiAgCS8qIEV4dGVuZCBuZXR3b3JrIHRvIGNvdmVyIHRo -ZSBhY3R1YWwgbWFzay4gKi8NCiAgCXdoaWxlIChiaXRzID4gKChkc3QgLSBv -ZHN0KSAqIDgpKQ0KKioqIC4vYmFja2VuZC91dGlscy9hZHQvbmV0d29yay5j -Lm9yaWcJVHVlIEp1bCAgNCAyMzowMjowMSAyMDAwDQotLS0gLi9iYWNrZW5k -L3V0aWxzL2FkdC9uZXR3b3JrLmMJVHVlIEp1bCAgNCAyMzozNToyMSAyMDAw -DQoqKioqKioqKioqKioqKioNCioqKiAxOCwyMyAqKioqDQotLS0gMTgsMjQg -LS0tLQ0KICAjaW5jbHVkZSAicG9zdGdyZXMuaCINCiAgI2luY2x1ZGUgInV0 -aWxzL2J1aWx0aW5zLmgiDQogIA0KKyBzdGF0aWMgaW50CXY0Yml0Y21wKHVu 
-c2lnbmVkIGludCBhMSwgdW5zaWduZWQgaW50IGEyKTsNCiAgc3RhdGljIGlu -dAl2NGJpdG5jbXAodW5zaWduZWQgaW50IGExLCB1bnNpZ25lZCBpbnQgYTIs -IGludCBiaXRzKTsNCiAgDQogIC8qDQoqKioqKioqKioqKioqKioNCioqKiAx -MzcsMTQzICoqKioNCiAgCQlyZXR1cm4gRkFMU0U7DQogIAlpZiAoKGlwX2Zh -bWlseShhMSkgPT0gQUZfSU5FVCkgJiYgKGlwX2ZhbWlseShhMikgPT0gQUZf -SU5FVCkpDQogIAl7DQohIAkJaW50CQkJb3JkZXIgPSB2NGJpdG5jbXAoaXBf -djRhZGRyKGExKSwgaXBfdjRhZGRyKGEyKSwgaXBfYml0cyhhMikpOw0KICAN -CiAgCQlyZXR1cm4gKChvcmRlciA8IDApIHx8ICgob3JkZXIgPT0gMCkgJiYg -KGlwX2JpdHMoYTEpIDwgaXBfYml0cyhhMikpKSk7DQogIAl9DQotLS0gMTM4 -LDE0NCAtLS0tDQogIAkJcmV0dXJuIEZBTFNFOw0KICAJaWYgKChpcF9mYW1p -bHkoYTEpID09IEFGX0lORVQpICYmIChpcF9mYW1pbHkoYTIpID09IEFGX0lO -RVQpKQ0KICAJew0KISAJCWludAkJCW9yZGVyID0gdjRiaXRjbXAoaXBfdjRh -ZGRyKGExKSwgaXBfdjRhZGRyKGEyKSk7DQogIA0KICAJCXJldHVybiAoKG9y -ZGVyIDwgMCkgfHwgKChvcmRlciA9PSAwKSAmJiAoaXBfYml0cyhhMSkgPCBp -cF9iaXRzKGEyKSkpKTsNCiAgCX0NCioqKioqKioqKioqKioqKg0KKioqIDE2 -NiwxNzIgKioqKg0KICAJaWYgKChpcF9mYW1pbHkoYTEpID09IEFGX0lORVQp -ICYmIChpcF9mYW1pbHkoYTIpID09IEFGX0lORVQpKQ0KICAJew0KICAJCXJl -dHVybiAoKGlwX2JpdHMoYTEpID09IGlwX2JpdHMoYTIpKQ0KISAJCSAmJiAo -djRiaXRuY21wKGlwX3Y0YWRkcihhMSksIGlwX3Y0YWRkcihhMiksIGlwX2Jp -dHMoYTEpKSA9PSAwKSk7DQogIAl9DQogIAllbHNlDQogIAl7DQotLS0gMTY3 -LDE3MyAtLS0tDQogIAlpZiAoKGlwX2ZhbWlseShhMSkgPT0gQUZfSU5FVCkg -JiYgKGlwX2ZhbWlseShhMikgPT0gQUZfSU5FVCkpDQogIAl7DQogIAkJcmV0 -dXJuICgoaXBfYml0cyhhMSkgPT0gaXBfYml0cyhhMikpDQohIAkJICYmICh2 -NGJpdGNtcChpcF92NGFkZHIoYTEpLCBpcF92NGFkZHIoYTIpKSA9PSAwKSk7 -DQogIAl9DQogIAllbHNlDQogIAl7DQoqKioqKioqKioqKioqKioNCioqKiAx -OTIsMTk4ICoqKioNCiAgCQlyZXR1cm4gRkFMU0U7DQogIAlpZiAoKGlwX2Zh -bWlseShhMSkgPT0gQUZfSU5FVCkgJiYgKGlwX2ZhbWlseShhMikgPT0gQUZf -SU5FVCkpDQogIAl7DQohIAkJaW50CQkJb3JkZXIgPSB2NGJpdG5jbXAoaXBf -djRhZGRyKGExKSwgaXBfdjRhZGRyKGEyKSwgaXBfYml0cyhhMikpOw0KICAN -CiAgCQlyZXR1cm4gKChvcmRlciA+IDApIHx8ICgob3JkZXIgPT0gMCkgJiYg -KGlwX2JpdHMoYTEpID4gaXBfYml0cyhhMikpKSk7DQogIAl9DQotLS0gMTkz -LDE5OSAtLS0tDQogIAkJcmV0dXJuIEZBTFNFOw0KICAJaWYgKChpcF9mYW1p 
-bHkoYTEpID09IEFGX0lORVQpICYmIChpcF9mYW1pbHkoYTIpID09IEFGX0lO -RVQpKQ0KICAJew0KISAJCWludAkJCW9yZGVyID0gdjRiaXRjbXAoaXBfdjRh -ZGRyKGExKSwgaXBfdjRhZGRyKGEyKSk7DQogIA0KICAJCXJldHVybiAoKG9y -ZGVyID4gMCkgfHwgKChvcmRlciA9PSAwKSAmJiAoaXBfYml0cyhhMSkgPiBp -cF9iaXRzKGEyKSkpKTsNCiAgCX0NCioqKioqKioqKioqKioqKg0KKioqIDM0 -MSwzNTMgKioqKg0KICANCiAgCWlmICgocHRyID0gc3RyY2hyKHRtcCwgJy8n -KSkgIT0gTlVMTCkNCiAgCQkqcHRyID0gMDsNCiEgCWxlbiA9IFZBUkhEUlNa -ICsgc3RybGVuKHRtcCkgKyAxOw0KICAJcmV0ID0gcGFsbG9jKGxlbik7DQog -IAlpZiAocmV0ID09IE5VTEwpDQogIAkJZWxvZyhFUlJPUiwgInVuYWJsZSB0 -byBhbGxvY2F0ZSBtZW1vcnkgaW4gbmV0d29ya19ob3N0KCkiKTsNCiAgDQog -IAlWQVJTSVpFKHJldCkgPSBsZW47DQohIAlzdHJjcHkoVkFSREFUQShyZXQp -LCB0bXApOw0KICAJcmV0dXJuIChyZXQpOw0KICB9DQogIA0KLS0tIDM0Miwz -NTQgLS0tLQ0KICANCiAgCWlmICgocHRyID0gc3RyY2hyKHRtcCwgJy8nKSkg -IT0gTlVMTCkNCiAgCQkqcHRyID0gMDsNCiEgCWxlbiA9IFZBUkhEUlNaICsg -c3RybGVuKHRtcCk7DQogIAlyZXQgPSBwYWxsb2MobGVuKTsNCiAgCWlmIChy -ZXQgPT0gTlVMTCkNCiAgCQllbG9nKEVSUk9SLCAidW5hYmxlIHRvIGFsbG9j -YXRlIG1lbW9yeSBpbiBuZXR3b3JrX2hvc3QoKSIpOw0KICANCiAgCVZBUlNJ -WkUocmV0KSA9IGxlbjsNCiEgCW1lbWNweShWQVJEQVRBKHJldCksIHRtcCwg -bGVuLVZBUkhEUlNaKTsNCiAgCXJldHVybiAocmV0KTsNCiAgfQ0KICANCioq -KioqKioqKioqKioqKg0KKioqIDM5MSw0MDMgKioqKg0KICANCiAgCWlmICgo -cHRyID0gc3RyY2hyKHRtcCwgJy8nKSkgIT0gTlVMTCkNCiAgCQkqcHRyID0g -MDsNCiEgCWxlbiA9IFZBUkhEUlNaICsgc3RybGVuKHRtcCkgKyAxOw0KICAJ -cmV0ID0gcGFsbG9jKGxlbik7DQogIAlpZiAocmV0ID09IE5VTEwpDQogIAkJ -ZWxvZyhFUlJPUiwgInVuYWJsZSB0byBhbGxvY2F0ZSBtZW1vcnkgaW4gbmV0 -d29ya19icm9hZGNhc3QoKSIpOw0KICANCiAgCVZBUlNJWkUocmV0KSA9IGxl -bjsNCiEgCXN0cmNweShWQVJEQVRBKHJldCksIHRtcCk7DQogIAlyZXR1cm4g -KHJldCk7DQogIH0NCiAgDQotLS0gMzkyLDQwNCAtLS0tDQogIA0KICAJaWYg -KChwdHIgPSBzdHJjaHIodG1wLCAnLycpKSAhPSBOVUxMKQ0KICAJCSpwdHIg -PSAwOw0KISAJbGVuID0gVkFSSERSU1ogKyBzdHJsZW4odG1wKTsNCiAgCXJl -dCA9IHBhbGxvYyhsZW4pOw0KICAJaWYgKHJldCA9PSBOVUxMKQ0KICAJCWVs -b2coRVJST1IsICJ1bmFibGUgdG8gYWxsb2NhdGUgbWVtb3J5IGluIG5ldHdv -cmtfYnJvYWRjYXN0KCkiKTsNCiAgDQogIAlWQVJTSVpFKHJldCkgPSBsZW47 
-DQohIAltZW1jcHkoVkFSREFUQShyZXQpLCB0bXAsIGxlbi1WQVJIRFJTWik7 -DQogIAlyZXR1cm4gKHJldCk7DQogIH0NCiAgDQoqKioqKioqKioqKioqKioN -CioqKiA0MjQsNDM2ICoqKioNCiAgCQkvKiBHbyBmb3IgYW4gSVBWNiBhZGRy -ZXNzIGhlcmUsIGJlZm9yZSBmYXVsdGluZyBvdXQ6ICovDQogIAkJZWxvZyhF -UlJPUiwgInVua25vd24gYWRkcmVzcyBmYW1pbHkgKCVkKSIsIGlwX2ZhbWls -eShpcCkpOw0KICANCiEgCWxlbiA9IFZBUkhEUlNaICsgc3RybGVuKHRtcCkg -KyAxOw0KICAJcmV0ID0gcGFsbG9jKGxlbik7DQogIAlpZiAocmV0ID09IE5V -TEwpDQogIAkJZWxvZyhFUlJPUiwgInVuYWJsZSB0byBhbGxvY2F0ZSBtZW1v -cnkgaW4gbmV0d29ya19uZXR3b3JrKCkiKTsNCiAgDQogIAlWQVJTSVpFKHJl -dCkgPSBsZW47DQohIAlzdHJjcHkoVkFSREFUQShyZXQpLCB0bXApOw0KICAJ -cmV0dXJuIChyZXQpOw0KICB9DQogIA0KLS0tIDQyNSw0MzcgLS0tLQ0KICAJ -CS8qIEdvIGZvciBhbiBJUFY2IGFkZHJlc3MgaGVyZSwgYmVmb3JlIGZhdWx0 -aW5nIG91dDogKi8NCiAgCQllbG9nKEVSUk9SLCAidW5rbm93biBhZGRyZXNz -IGZhbWlseSAoJWQpIiwgaXBfZmFtaWx5KGlwKSk7DQogIA0KISAJbGVuID0g -VkFSSERSU1ogKyBzdHJsZW4odG1wKTsNCiAgCXJldCA9IHBhbGxvYyhsZW4p -Ow0KICAJaWYgKHJldCA9PSBOVUxMKQ0KICAJCWVsb2coRVJST1IsICJ1bmFi -bGUgdG8gYWxsb2NhdGUgbWVtb3J5IGluIG5ldHdvcmtfbmV0d29yaygpIik7 -DQogIA0KICAJVkFSU0laRShyZXQpID0gbGVuOw0KISAJbWVtY3B5KFZBUkRB -VEEocmV0KSwgdG1wLCBsZW4tVkFSSERSU1opOw0KICAJcmV0dXJuIChyZXQp -Ow0KICB9DQogIA0KKioqKioqKioqKioqKioqDQoqKiogNDYxLDQ3OSAqKioq -DQogIA0KICAJaWYgKChwdHIgPSBzdHJjaHIodG1wLCAnLycpKSAhPSBOVUxM -KQ0KICAJCSpwdHIgPSAwOw0KISAJbGVuID0gVkFSSERSU1ogKyBzdHJsZW4o -dG1wKSArIDE7DQogIAlyZXQgPSBwYWxsb2MobGVuKTsNCiAgCWlmIChyZXQg -PT0gTlVMTCkNCiAgCQllbG9nKEVSUk9SLCAidW5hYmxlIHRvIGFsbG9jYXRl -IG1lbW9yeSBpbiBuZXR3b3JrX25ldG1hc2soKSIpOw0KICANCiAgCVZBUlNJ -WkUocmV0KSA9IGxlbjsNCiEgCXN0cmNweShWQVJEQVRBKHJldCksIHRtcCk7 -DQogIAlyZXR1cm4gKHJldCk7DQogIH0NCiAgDQogIC8qDQogICAqCUJpdHdp -c2UgY29tcGFyaXNvbiBmb3IgVjQgYWRkcmVzc2VzLiAgQWRkIFY2IGltcGxl -bWVudGF0aW9uIQ0KICAgKi8NCiAgDQogIHN0YXRpYyBpbnQNCiAgdjRiaXRu -Y21wKHVuc2lnbmVkIGludCBhMSwgdW5zaWduZWQgaW50IGEyLCBpbnQgYml0 -cykNCi0tLSA0NjIsNDkxIC0tLS0NCiAgDQogIAlpZiAoKHB0ciA9IHN0cmNo -cih0bXAsICcvJykpICE9IE5VTEwpDQogIAkJKnB0ciA9IDA7DQohIAlsZW4g 
-PSBWQVJIRFJTWiArIHN0cmxlbih0bXApOw0KICAJcmV0ID0gcGFsbG9jKGxl -bik7DQogIAlpZiAocmV0ID09IE5VTEwpDQogIAkJZWxvZyhFUlJPUiwgInVu -YWJsZSB0byBhbGxvY2F0ZSBtZW1vcnkgaW4gbmV0d29ya19uZXRtYXNrKCki -KTsNCiAgDQogIAlWQVJTSVpFKHJldCkgPSBsZW47DQohIAltZW1jcHkoVkFS -REFUQShyZXQpLCB0bXAsIGxlbi1WQVJIRFJTWik7DQogIAlyZXR1cm4gKHJl -dCk7DQogIH0NCiAgDQogIC8qDQogICAqCUJpdHdpc2UgY29tcGFyaXNvbiBm -b3IgVjQgYWRkcmVzc2VzLiAgQWRkIFY2IGltcGxlbWVudGF0aW9uIQ0KICAg -Ki8NCisgDQorIHN0YXRpYyBpbnQNCisgdjRiaXRjbXAodW5zaWduZWQgaW50 -IGExLCB1bnNpZ25lZCBpbnQgYTIpDQorIHsNCisgCWExID0gbnRvaGwoYTEp -Ow0KKyAJYTIgPSBudG9obChhMik7DQorIAlpZiAoYTEgPCBhMikNCisgCQly -ZXR1cm4gKC0xKTsNCisgCWVsc2UgDQorIAkJcmV0dXJuIChhMSA+IGEyKTsN -CisgfQ0KICANCiAgc3RhdGljIGludA0KICB2NGJpdG5jbXAodW5zaWduZWQg -aW50IGExLCB1bnNpZ25lZCBpbnQgYTIsIGludCBiaXRzKQ0K - ----559023410-959030623-962801553=:1267-- - -From pgsql-hackers-owner+M4284@hub.org Wed Jul 5 09:04:09 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA27751 - for ; Wed, 5 Jul 2000 09:04:08 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e65D44S42069; - Wed, 5 Jul 2000 09:04:04 -0400 (EDT) -Received: from turing.csis.gvsu.edu (IDENT:qmailr@csis.gvsu.edu [148.61.162.182]) - by hub.org (8.10.1/8.10.1) with SMTP id e65D2HS35607 - for ; Wed, 5 Jul 2000 09:02:17 -0400 (EDT) -Received: (qmail 4436 invoked by uid 0); 5 Jul 2000 13:02:17 -0000 -Received: from eos05.csis.gvsu.edu (eisentrp@148.61.162.105) - by turing.csis.gvsu.edu with QMQP; 5 Jul 2000 13:02:17 -0000 -From: eisentrp@csis.gvsu.edu -Date: Wed, 5 Jul 2000 09:02:17 -0400 (EDT) -Reply-To: Peter Eisentraut -To: "D'Arcy J.M. 
Cain" -cc: PostgreSQL Development -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-Reply-To: -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -On Tue, 4 Jul 2000, D'Arcy J.M. Cain wrote: - -> I'm not sure I understand why this is necessary. I can see not allowing -> cidr ==> inet conversions but inet ==> cidr can be done as it is a matter -> of dropping information - the host part. - -Automatic casts should not lose information. How would you feel if floats -were automatically rounded when you store them into int fields? I think -this is an important principle in any type system. - -> Then let's define that as the meaning of "inet1 << inet2" i.e. define -> the << operator between inet types as meaning "tell me if inet1 is in -> the same network as inet2." - -Again, let the user say what he wants: inet1 << network(inet2). - -Also note that "is inet1 in the same network as inet2" is different from -"is inet1 contained in the network of inet2" (which is what it does now). -The operator you defined is symmetric (if inet1 is in the same network as -inet2, then inet2 is also in the same network as inet1), whereas the << is -antisymmetric. In fact, AFAICT, the operator you defined doesn't exist -yet, although it perhaps should. 
- - --- -Peter Eisentraut Sernanders vaeg 10:115 -peter_e@gmx.net 75262 Uppsala -http://yi.org/peter-e/ Sweden - - -From pgsql-hackers-owner+M4293@hub.org Wed Jul 5 09:50:15 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA28183 - for ; Wed, 5 Jul 2000 09:50:14 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e65Do1S55862; - Wed, 5 Jul 2000 09:50:01 -0400 (EDT) -Received: from andie.ip23.net (andie.ip23.net [212.83.32.23]) - by hub.org (8.10.1/8.10.1) with ESMTP id e65DmGS51928 - for ; Wed, 5 Jul 2000 09:48:16 -0400 (EDT) -Received: from imap1.ip23.net (imap1.ip23.net [212.83.32.35]) - by andie.ip23.net (8.9.3/8.9.3) with ESMTP id PAA33008; - Wed, 5 Jul 2000 15:48:10 +0200 (CEST) -Received: from ip23.net (spc.ip23.net [212.83.32.122]) - by imap1.ip23.net (8.9.3/8.9.3) with ESMTP id QAA00989; - Wed, 5 Jul 2000 16:01:01 +0200 (CEST) -Message-ID: <39633C99.DD58D11F@ip23.net> -Date: Wed, 05 Jul 2000 15:48:09 +0200 -From: Sevo Stille -Organization: IP23 -X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.10 i686) -X-Accept-Language: en, de -MIME-Version: 1.0 -To: Jakub Bartosz Bielecki -CC: pgsql-hackers@PostgreSQL.org -Subject: Re: [HACKERS] Re: postgres - development of inet/cidr -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Jakub Bartosz Bielecki wrote: - -> > "select '10.0.0.1/27'::cidr << '10.0.0.2/27'::inet;" ERROR -> -> Currently it's not an error... There is no way (and no reason) to -> distinguish between INET and CIDR. - -Yes, there is. CIDR is defined as the network 10.0.0.1 & /27, while INET -is defined as host 10.0.0.1 within network 10.0.0.1 & /27. You can do -almost every network and host calculation both in CIDR and INET, but -you need implicit knowledge for it. 
Two columns are necessary to define -a host and its network in CIDR, and a network cannot be specified -without a host using INET - except for ugly in-band hacks like using -10.0.0.0/27 for the network which would prevent you from specifying a -base address. - -> Above example is exactly -> equivalent to: -> select '10.0.0.0/27'::inet << '10.0.0.2/27'::inet; -- FALSE - -Nope. If the right hand side is automatically propagated to a network, -it is true. If not, the above IMHO should better raise an error, as a -host can never contain a host. - -> but: -> select '10.0.0.0/27'::inet <<= '10.0.0.2/27'::inet; -- TRUE - -Well, you might argue that a host could contain-or-equal a host, but as -only the equals part could ever be true, that is a redundant operator -without any meaning beyond equals, and accordingly it should not be -valid for that case. - -> > But we need to reach an agreement on the proper -> > behaviour on greater/smaller comparisons. Should: -> > -> > "select '10.0.0.1/27'::inet > '10.0.0.2/27'::cidr;" -> > -> > be true or false? Casting to cidr prior to comparison would make it -> > equivalent to "select '10.0.0.0/27'::cidr > '10.0.0.0/27'::cidr;", which -> > is false, both networks being equal. -> -> It should be (and is!) true... Since second argument is -> really '10.0.0.0/27'. - -Yes, but that does not make it any truer. CIDR 10.0.0.0/27 is -definitively not 10.0.0.0 but [10.0.0.0 .. 10.0.0.31]. A CIDR address is -never synonymous to a plain host address. You'll see the problem if you -try to calculate the inverse - any zeroed CIDR address in the entire -range from 10.0/8 to 10.0.0.0/32 would mask to 10.0.0.0. Accordingly, -there is no simple answer to a "host bigger/smaller than network" -question. For many applications, it may be useful to define that to mean -that the host is smaller than the network bottom address respectively -bigger than the top address, but any of the other possible views would -be perfectly legal as well. 
- -Sevo - --- -sevo@ip23.net - -From pgsql-hackers-owner+M4354@hub.org Wed Jul 5 16:49:21 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA17585 - for ; Wed, 5 Jul 2000 16:49:20 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e65KmdS82556; - Wed, 5 Jul 2000 16:48:39 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e65KkqS77601 - for ; Wed, 5 Jul 2000 16:46:52 -0400 (EDT) -Received: from localhost (2500 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Wed, 5 Jul 2000 16:46:48 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. Cain) -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-Reply-To: - "from eisentrp@csis.gvsu.edu at Jul 5, 2000 09:02:17 am" -To: Peter Eisentraut -Date: Wed, 5 Jul 2000 16:46:48 -0400 (EDT) -CC: PostgreSQL Development -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake eisentrp@csis.gvsu.edu -> On Tue, 4 Jul 2000, D'Arcy J.M. Cain wrote: -> > I'm not sure I understand why this is necessary. I can see not allowing -> > cidr ==> inet conversions but inet ==> cidr can be done as it is a matter -> > of dropping information - the host part. -> -> Automatic casts should not lose information. How would you feel if floats -> were automatically rounded when you store them into int fields? I think -> this is an important principle in any type system. - -If it was defined well I would have no problem with it. - -> > Then let's define that as the meaning of "inet1 << inet2" i.e. 
define -> > the << operator between inet types as meaning "tell me if inet1 is in -> > the same network as inet2." -> -> Again, let the user say what he wants: inet1 << network(inet2). - -I think that's what I meant. I'm just saying that inet::cidr should be -the same as network(inet). Allowing that cast makes a lot of operations -work intuitively. - -> Also note that "is inet1 in the same network as inet2" is different from -> "is inet1 contained in the network of inet2" (which is what it does now). - -Hmm. It is a subtle difference and I did miss it. - -> The operator you defined is symmetric (if inet1 is in the same network as -> inet2, then inet2 is also in the same network as inet1), whereas the << is -> antisymmetric. In fact, AFAICT, the operator you defined doesn't exist -> yet, although it perhaps should. - -I guess what I was really getting at was this. - - host OP cidr - -where inet would cast to host on one side and cidr on the other. What -we have now is - - cidr OP cidr - -with both sides casting to cidr. Of course there is no such thing as a host -type so I don't know how we would cast such a thing. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. 
- -From pgsql-hackers-owner+M4421@hub.org Thu Jul 6 08:54:47 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA06169 - for ; Thu, 6 Jul 2000 08:54:46 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e66CrgS44851; - Thu, 6 Jul 2000 08:53:42 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl (root@elektron.elka.pw.edu.pl [148.81.63.249]) - by hub.org (8.10.1/8.10.1) with ESMTP id e66Cr5S44024 - for ; Thu, 6 Jul 2000 08:53:05 -0400 (EDT) -Received: from elektron.elka.pw.edu.pl ([148.81.63.249]:64907 "EHLO - elektron.elka.pw.edu.pl") by elektron.elka.pw.edu.pl with ESMTP - id ; Thu, 6 Jul 2000 14:52:28 +0200 -Date: Thu, 6 Jul 2000 14:52:17 +0200 (MET DST) -From: Jakub Bartosz Bielecki -To: Sevo Stille -cc: Jakub Bartosz Bielecki , - pgsql-hackers@PostgreSQL.org -Subject: Re: [HACKERS] Re: postgres - development of inet/cidr -In-Reply-To: <39633C99.DD58D11F@ip23.net> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - - - -On Wed, 5 Jul 2000, Sevo Stille wrote: -> -> > > "select '10.0.0.1/27'::cidr << '10.0.0.2/27'::inet;" ERROR -> > -> > Currently it's not an error... There is no way (and no reason) to -> > distinguish between INET and CIDR. -> -> Yes, there is. CIDR is defined as the network 10.0.0.1 & /27, while INET -> is defined as host 10.0.0.1 within network 10.0.0.1 & /27. You can do -> almost every network and host calculation both in CIDR and INET, but -> you need implicit knowledge for it. - -I was talking about *current* implementation of INET/CIDR (which IMHO -is very ill). -There is INET for users that want simply to store IP's and don't care -about all the technical jargon. -There is CIDR for advanced users who want to store network data. 
- -Currently these 2 types are handled by 1 implementation, moreover despite -INET netmask and CIDR prefix-length are something completely different, -both are stored in the same field of inet structure (yuck). - -At the moment it works fine. But that's only a hack. -I guess the purpose was to prevent duplication of code... Blah... - -> > select '10.0.0.0/27'::inet << '10.0.0.2/27'::inet; -- FALSE -> -> Nope. If the right hand side is automatically propagated to a network, -> it is true. If not, the above IMHO should better raise an error, as a -> host can never contain a host. -> -> > select '10.0.0.0/27'::inet <<= '10.0.0.2/27'::inet; -- TRUE -> -> Well, you might argue that a host could contain-or-equal a host, but as -> only the equals part could ever be true, that is a redundant operator -> without any meaning beyond equals, and accordingly it should not be -> valid for that case. -> -> > > "select '10.0.0.1/27'::inet > '10.0.0.2/27'::cidr;" -> > It should be (and is!) true... Since second argument is -> > really '10.0.0.0/27'. -> -> Yes, but that does not make it any truer. CIDR 10.0.0.0/27 is -> definitively not 10.0.0.0 but [10.0.0.0 .. 10.0.0.31]. - -Same as above... You are perfectly right. - -Everything works until user starts messing with _both_ INET and CIDR -at the same time. - -The possible solution is: -- inhibit cidr-to-inet cast (and maybe also inet-to-cidr, because - it would throw away netmask), -- CIDR operators: > = < << >> -- INET operators: > = < (and why not & | if it would be useful???) - functions: cidr network(inet); // '10.0.0.0/27' - text host(inet); // '10.0.0.1' - int masklen(inet); // 27 -- write an usable manual. - -Comments? -I *might* work on it if I find some spare time. 
But it's unlikely :( - - -From pgsql-hackers-owner+M4503@hub.org Fri Jul 7 12:11:37 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA26802 - for ; Fri, 7 Jul 2000 12:11:36 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e67GAgW67823; - Fri, 7 Jul 2000 12:10:42 -0400 (EDT) -Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) - by hub.org (8.10.1/8.10.1) with ESMTP id e67G9qW66262 - for ; Fri, 7 Jul 2000 12:09:52 -0400 (EDT) -Received: from regulus.student.UU.SE ([130.238.5.2]:53522 "EHLO - regulus.its.uu.se") by merganser.its.uu.se with ESMTP - id ; Fri, 7 Jul 2000 18:09:14 +0200 -Received: from peter (helo=localhost) - by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) - id 13Aani-0003A6-00; Fri, 07 Jul 2000 18:16:26 +0200 -Date: Fri, 7 Jul 2000 18:16:26 +0200 (CEST) -From: Peter Eisentraut -To: "D'Arcy J.M. Cain" -cc: PostgreSQL Development -Subject: Re: [HACKERS] Repair plan for inet and cidr types -In-Reply-To: -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=ISO-8859-1 -Content-Transfer-Encoding: 8BIT -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -D'Arcy J.M. Cain writes: - -> > Automatic casts should not lose information. How would you feel if floats -> > were automatically rounded when you store them into int fields? I think -> > this is an important principle in any type system. -> -> If it was defined well I would have no problem with it. - -That is certainly not how type systems operate anywhere. - -> I guess what I was really getting at was this. -> -> host OP cidr -> -> where inet would cast to host on one side and cidr on the other. What -> we have now is -> -> cidr OP cidr -> -> with both sides casting to cidr. Of course there is no such thing as a host -> type so I don't know how we would cast such a thing. 
- -I think that while the implicit casting could sometimes be convenient, -it's also a source of confusion. Consider the statement - -select '10.0.0.3'::cidr < '10.0.0.2'::inet; => f - -This cannot possibly make sense on closer inspection. Firstly, it's -semantic nonsense, you cannot order a network and a host. Secondly, it's -also wrong. According to the documentation, the '10.0.0.3'::cidr should be -converted to '10/8' internally. Then one of two things could have happened -here: 1) cidr was implicitly converted to inet and '10.0.0.3' is taken to -be a host, which is completely wrong. Or 2) inet was converted to cidr. -But then we're looking at '10/8' < '10.0.0.2/32', which should be true. - -See also - -select '10.0.0.2'::cidr = '10.0.0.2'::inet; => t - -which is wrong for similar reasons. - - -Then let's look at the << family of operators. - -select '10.0.0.2'::cidr >> '10.0.0.2'::inet; => f - -Again, there are two ways this could currently be resolved: - - '10/8'::cidr >> '10.0.0.2/32'::cidr which does return true -or - '10.0.0.2'::inet >> '10.0.0.2'::inet -which doesn't make any sense. - -On closer inspection, the inet << cidr case is completely misbehaving: - -select '10.0.0.5/8'::inet << '10.0.0.0/16'::cidr; => f -select '10.0.0.5/24'::inet << '10.0.0.0/16'::cidr; => t - -This is not what I'd expect. - -Concretely, the cases - inet << cidr - cidr << cidr -are not the same: - - '10.0.0.5/8'::inet << '10.0.0.0/16'::cidr -should be true - - '10.0.0.5/8'::cidr << '10.0.0.0/16'::cidr -should be false, if you allow the left-side value in at all, which I -wouldn't. - -What this tells me is that the cast from inet to cidr is not well-defined -in the mathematical sense, and therefore no implicit casting should be -allowed. - -So the bottom line here is that these two types are, while from a related -domain, different, and the user should be the one that controls when and -how they are mixed together. 
- - --- -Peter Eisentraut Sernanders väg 10:115 -peter_e@gmx.net 75262 Uppsala -http://yi.org/peter-e/ Sweden - - -From pgsql-hackers-owner+M5242@hub.org Sun Jul 23 10:01:45 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA21261 - for ; Sun, 23 Jul 2000 10:01:44 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6NE1Th91342; - Sun, 23 Jul 2000 10:01:29 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6NE10h91172 - for ; Sun, 23 Jul 2000 10:01:00 -0400 (EDT) -Received: (from ler@localhost) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) id e6NE0w219946 - for pgsql-hackers@hub.org; Sun, 23 Jul 2000 09:00:58 -0500 (CDT) -From: Larry Rosenman -Message-Id: <200007231400.e6NE0w219946@lerami.lerctr.org> -Subject: [HACKERS] INET/CIDR types -To: pgsql-hackers@hub.org -Date: Sun, 23 Jul 2000 09:00:57 -0500 (CDT) -X-Mailer: ELM [version 2.4ME+ PL79 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -I noticed a discussion on this list about a re-do of the INET/CIDR -types. I was wondering if there was ANY way at all to add -an output function that ALWAYS returns all 4 octets of an INET or CIDR -type with and without the /netmask? - -I'm writing a IP Allocation/Tracking app for the ISP I work for, and -find the current output format causes confusion for the less -technical types. 
- -Larry Rosenman --- -Larry Rosenman http://www.lerctr.org/~ler -Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org -US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 - -From pgsql-hackers-owner+M5264@hub.org Mon Jul 24 10:34:39 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA09676 - for ; Mon, 24 Jul 2000 10:34:38 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OEXZh83378; - Mon, 24 Jul 2000 10:33:35 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OEXGh83201 - for ; Mon, 24 Jul 2000 10:33:16 -0400 (EDT) -Received: from localhost (1444 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Mon, 24 Jul 2000 10:33:15 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. Cain) -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <200007231400.e6NE0w219946@lerami.lerctr.org> "from Larry Rosenman - at Jul 23, 2000 09:00:57 am" -To: Larry Rosenman -Date: Mon, 24 Jul 2000 10:33:14 -0400 (EDT) -CC: pgsql-hackers@hub.org -Reply-To: pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Larry Rosenman -> I noticed a discussion on this list about a re-do of the INET/CIDR -> types. I was wondering if there was ANY way at all to add -> an output function that ALWAYS returns all 4 octets of an INET or CIDR -> type with and without the /netmask? -> -> I'm writing a IP Allocation/Tracking app for the ISP I work for, and -> find the current output format causes confusion for the less -> technical types. 
- -The host() function does this for the INET type. It doesn't work for -the CIDR type (it throws an error) because CIDR doesn't have a host -part per se. - -darcy=> select '1.2.0.0/23'::inet; -?column? --------- -1.2.0/23 -(1 row) - -darcy=> select host('1.2.0.0/23'::inet); - host -------- -1.2.0.0 -(1 row) - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - -From pgsql-hackers-owner+M5265@hub.org Mon Jul 24 10:43:14 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA09722 - for ; Mon, 24 Jul 2000 10:43:13 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OEfLh86364; - Mon, 24 Jul 2000 10:41:21 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OEf6h86190 - for ; Mon, 24 Jul 2000 10:41:06 -0400 (EDT) -Received: (from ler@localhost) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) id e6OEf5D12433; - Mon, 24 Jul 2000 09:41:05 -0500 (CDT) -From: Larry Rosenman -Message-Id: <200007241441.e6OEf5D12433@lerami.lerctr.org> -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: "from darcy@druid.net at Jul 24, 2000 - 10:33:14 am" -To: pgsql-hackers@hub.org -Date: Mon, 24 Jul 2000 09:41:05 -0500 (CDT) -CC: Larry Rosenman -X-Mailer: ELM [version 2.4ME+ PL79 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -The bad news is that I'm tracking CIDR blocks. - -If I could get a network() function to return essentially -host()::inet for CIDR's that would work. - -Larry -> Thus spake Larry Rosenman -> > I noticed a discussion on this list about a re-do of the INET/CIDR -> > types. 
I was wondering if there was ANY way at all to add -> > an output function that ALWAYS returns all 4 octets of an INET or CIDR -> > type with and without the /netmask? -> > -> > I'm writing a IP Allocation/Tracking app for the ISP I work for, and -> > find the current output format causes confusion for the less -> > technical types. -> -> The host() function does this for the INET type. It doesn't work for -> the CIDR type (it throws an error) because CIDR doesn't have a host -> part per se. -> -> darcy=> select '1.2.0.0/23'::inet; -> ?column? -> -------- -> 1.2.0/23 -> (1 row) -> -> darcy=> select host('1.2.0.0/23'::inet); -> host -> ------- -> 1.2.0.0 -> (1 row) -> -> -- -> D'Arcy J.M. Cain | Democracy is three wolves -> http://www.druid.net/darcy/ | and a sheep voting on -> +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - - --- -Larry Rosenman http://www.lerctr.org/~ler -Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org -US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 - -From pgsql-hackers-owner+M5270@hub.org Mon Jul 24 15:17:30 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA11467 - for ; Mon, 24 Jul 2000 15:17:29 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OJHEh72992; - Mon, 24 Jul 2000 15:17:14 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OJF3h71969 - for ; Mon, 24 Jul 2000 15:15:04 -0400 (EDT) -Received: from localhost (1687 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Mon, 24 Jul 2000 15:15:02 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. 
Cain) -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <200007241441.e6OEf5D12433@lerami.lerctr.org> "from Larry Rosenman - at Jul 24, 2000 09:41:05 am" -To: Larry Rosenman -Date: Mon, 24 Jul 2000 15:15:01 -0400 (EDT) -CC: pgsql-hackers@hub.org -Reply-To: pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Larry Rosenman -> The bad news is that I'm tracking CIDR blocks. - -Then there is no host part. I would argue that if someone is getting -confused with the current output then perhaps they shouldn't be dealing -with client networks. - -> If I could get a network() function to return essentially -> host()::inet for CIDR's that would work. - -There is a network function. It returns the network. - -darcy=> select network('1.2.0.0/23'::cidr); -network --------- -1.2.0/23 -(1 row) - -A lot of work went into these types to make them correct. I don't think -we should be undermining that to allow people to work with incorrect -assumptions. If you want Micro$oft you know where to find it. - -If you really must do this then store your blocks in the INET type. It -pretty much does what you want but doesn't try to pretend to be a CIDR. - - -Hmmm. I just noticed this. - -darcy=> select '1.2.0.1/23'::cidr; -?column? --------- -1.2.0/23 -(1 row) - -Shouldn't that throw an error? - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. 
- -From pgsql-hackers-owner+M5271@hub.org Mon Jul 24 15:28:37 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA11521 - for ; Mon, 24 Jul 2000 15:28:36 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OJSMh77820; - Mon, 24 Jul 2000 15:28:22 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OJQhh76867 - for ; Mon, 24 Jul 2000 15:26:43 -0400 (EDT) -Received: from lerdesk (ler-desk.iadfw.net [206.66.13.18]) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) with SMTP id e6OJQcc24312; - Mon, 24 Jul 2000 14:26:38 -0500 (CDT) -From: "Larry Rosenman" -To: , "Larry Rosenman" -Subject: RE: [HACKERS] INET/CIDR types -Date: Mon, 24 Jul 2000 14:26:37 -0500 -Message-ID: -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) -X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 -Importance: Normal -In-Reply-To: -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -The problem is NON-TECHNICAL people will be getting the output, -and they expect 4 octet output. - -I really think that we should have a way to coerce a CIDR to -an INET, and then allow host(). - -Remember that I am dealing with $10/hour clerks. - -I really don't get the hostility to changing the OUTPUT format... - -Larry Rosenman - ------Original Message----- -From: D'Arcy J.M. Cain [mailto:darcy@druid.net] -Sent: Monday, July 24, 2000 2:15 PM -To: Larry Rosenman -Cc: pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types - - -Thus spake Larry Rosenman -> The bad news is that I'm tracking CIDR blocks. - -Then there is no host part. 
I would argue that if someone is getting -confused with the current output then perhaps they shouldn't be dealing -with client networks. - -> If I could get a network() function to return essentially -> host()::inet for CIDR's that would work. - -There is a network function. It returns the network. - -darcy=> select network('1.2.0.0/23'::cidr); -network --------- -1.2.0/23 -(1 row) - -A lot of work went into these types to make them correct. I don't think -we should be undermining that to allow people to work with incorrect -assumptions. If you want Micro$oft you know where to find it. - -If you really must do this then store your blocks in the INET type. It -pretty much does what you want but doesn't try to pretend to be a CIDR. - - -Hmmm. I just noticed this. - -darcy=> select '1.2.0.1/23'::cidr; -?column? --------- -1.2.0/23 -(1 row) - -Shouldn't that throw an error? - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. 
- -From pgsql-hackers-owner+M5272@hub.org Mon Jul 24 15:35:28 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA11554 - for ; Mon, 24 Jul 2000 15:35:28 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OJZFh80569; - Mon, 24 Jul 2000 15:35:16 -0400 (EDT) -Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OJXgh80113 - for ; Mon, 24 Jul 2000 15:33:42 -0400 (EDT) -Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) - by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id MAA07579; - Mon, 24 Jul 2000 12:33:32 -0700 (PDT) -Message-Id: <3.0.1.32.20000724123008.01189db0@mail.pacifier.com> -X-Sender: dhogaza@mail.pacifier.com -X-Mailer: Windows Eudora Pro Version 3.0.1 (32) -Date: Mon, 24 Jul 2000 12:30:08 -0700 -To: "Larry Rosenman" , , - "Larry Rosenman" -From: Don Baccus -Subject: RE: [HACKERS] INET/CIDR types -In-Reply-To: -References: -Mime-Version: 1.0 -Content-Type: text/plain; charset="us-ascii" -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -At 02:26 PM 7/24/00 -0500, Larry Rosenman wrote: ->The problem is NON-TECHNICAL people will be getting the output, ->and they expect 4 octet output. -> ->I really think that we should have a way to coerce a CIDR to ->an INET, and then allow host(). -> ->Remember that I am dealing with $10/hour clerks. -> ->I really don't get the hostility to changing the OUTPUT format... - -Are these $10/hour clerks typing in SQL to psql? Strange ... - -If not, formatting issues like this can easily be broken (or fixed, -according to your POV) by your application client. - - - -- Don Baccus, Portland OR - Nature photos, on-line guides, Pacific Northwest - Rare Bird Alert Service and other goodies at - http://donb.photo.net. 
- -From pgsql-hackers-owner+M5273@hub.org Mon Jul 24 15:38:46 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA11571 - for ; Mon, 24 Jul 2000 15:38:45 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OJcPh81593; - Mon, 24 Jul 2000 15:38:25 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OJanh80997 - for ; Mon, 24 Jul 2000 15:36:49 -0400 (EDT) -Received: from lerdesk (ler-desk.iadfw.net [206.66.13.18]) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) with SMTP id e6OJajc24698; - Mon, 24 Jul 2000 14:36:45 -0500 (CDT) -From: "Larry Rosenman" -To: "Don Baccus" , "Larry Rosenman" , - , "Larry Rosenman" -Subject: RE: [HACKERS] INET/CIDR types -Date: Mon, 24 Jul 2000 14:36:44 -0500 -Message-ID: -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) -X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 -Importance: Normal -In-Reply-To: <3.0.1.32.20000724123008.01189db0@mail.pacifier.com> -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -I was hoping to have some niceties out of the SQL retrieve to print directly -in PHP, and not have to massage it. - -Why is there such animosity to printing out all 4 octets in some function -somewhere for a CIDR block? - -Larry - ------Original Message----- -From: Don Baccus [mailto:dhogaza@pacifier.com] -Sent: Monday, July 24, 2000 2:30 PM -To: Larry Rosenman; pgsql-hackers@hub.org; Larry Rosenman -Subject: RE: [HACKERS] INET/CIDR types - - -At 02:26 PM 7/24/00 -0500, Larry Rosenman wrote: ->The problem is NON-TECHNICAL people will be getting the output, ->and they expect 4 octet output. 
-> ->I really think that we should have a way to coerce a CIDR to ->an INET, and then allow host(). -> ->Remember that I am dealing with $10/hour clerks. -> ->I really don't get the hostility to changing the OUTPUT format... - -Are these $10/hour clerks typing in SQL to psql? Strange ... - -If not, formatting issues like this can easily be broken (or fixed, -according to your POV) by your application client. - - - -- Don Baccus, Portland OR - Nature photos, on-line guides, Pacific Northwest - Rare Bird Alert Service and other goodies at - http://donb.photo.net. - -From pgsql-hackers-owner+M5274@hub.org Mon Jul 24 16:19:47 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA11771 - for ; Mon, 24 Jul 2000 16:19:46 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OKJRh99659; - Mon, 24 Jul 2000 16:19:28 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OKHbh98841 - for ; Mon, 24 Jul 2000 16:17:37 -0400 (EDT) -Received: from localhost (1546 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Mon, 24 Jul 2000 16:17:36 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. 
Cain) -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: "from Larry Rosenman - at Jul 24, 2000 02:36:44 pm" -To: Larry Rosenman -Date: Mon, 24 Jul 2000 16:17:36 -0400 (EDT) -CC: Don Baccus , pgsql-hackers@hub.org -Reply-To: pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Larry Rosenman -> I was hoping to have some niceties out of the SQL retrieve to print directly -> in PHP, and not have to massage it. -> -> Why is there such animosity to printing out all 4 octets in some function -> somewhere for a CIDR block? - -You keep saying "hostility" as if we are ganging up against you. Believe -me, I have no animosity towards you and I am sure no one else has either. -We are resisting the change you want simply because it would violate the -RFC which we agreed to follow when we created the types. - -If you think this is hostile, you will probably think that the original -discussions in the archives are nuclear war :-). If you would like to -look it over make sure to set aside a lot of time. We spent a long time -hashing this out. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. 
- -From pgsql-hackers-owner+M5275@hub.org Mon Jul 24 16:29:30 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA11819 - for ; Mon, 24 Jul 2000 16:29:28 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OKT6h02534; - Mon, 24 Jul 2000 16:29:06 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OKRUh02012 - for ; Mon, 24 Jul 2000 16:27:30 -0400 (EDT) -Received: from lerdesk (ler-desk.iadfw.net [206.66.13.18]) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) with SMTP id e6OKRPc26861; - Mon, 24 Jul 2000 15:27:25 -0500 (CDT) -From: "Larry Rosenman" -To: , "Larry Rosenman" -Cc: "Don Baccus" -Subject: RE: [HACKERS] INET/CIDR types -Date: Mon, 24 Jul 2000 15:27:24 -0500 -Message-ID: -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) -X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 -Importance: Normal -In-Reply-To: -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -What RFC says you can't print all 4 octets of a CIDR Netnumber? - -Why does network(cidr) return the whole cidr output just like -select cidr? - -I'm just trying to figure out the logic here. - -Here is what my Cisco Router that speaks BGP says: -big-bro#term ip netmask-format bit-count -big-bro#sh ip bg 206.66.0.0/20 -BGP routing table entry for 206.66.0.0/20, version 150832 -Paths: (5 available, best #4) - Advertised to non peer-group peers: - 157.130.140.109 166.63.135.33 206.66.12.3 206.66.12.4 206.66.12.7 -206.66.12. 
-8 - Local, (aggregated by 4278 206.66.12.3), (received & used) - 206.66.12.3 from 206.66.12.3 (206.66.12.3) - Origin IGP, localpref 0, valid, internal, atomic-aggregate - Local, (aggregated by 4278 206.66.12.7), (received & used) - 206.66.12.7 from 206.66.12.7 (206.66.12.7) - Origin IGP, localpref 0, valid, internal, atomic-aggregate - Local, (aggregated by 4278 206.66.12.8), (received & used) - 206.66.12.8 from 206.66.12.8 (206.66.12.8) - Origin IGP, localpref 0, valid, internal, atomic-aggregate - Local, (aggregated by 4278 206.66.12.1) - 0.0.0.0 from 0.0.0.0 (206.66.12.1) - Origin IGP, localpref 100, weight 32768, valid, aggregated, local, -atomic- -aggregate, best - Local, (received & used) - 206.66.12.4 from 206.66.12.4 (206.66.12.4) - Origin IGP, metric 0, localpref 100, valid, internal -big-bro# - -I am just asking for the same type output. - -Why is this so hard? - -The info is in the type, and the print routine wouldn't be so hard. - -I can probably write the function in less than 1 hour, but getting it -integrated is -my stumbling block. - - - ------Original Message----- -From: D'Arcy J.M. Cain [mailto:darcy@druid.net] -Sent: Monday, July 24, 2000 3:18 PM -To: Larry Rosenman -Cc: Don Baccus; pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types - - -Thus spake Larry Rosenman -> I was hoping to have some niceties out of the SQL retrieve to print -directly -> in PHP, and not have to massage it. -> -> Why is there such animosity to printing out all 4 octets in some function -> somewhere for a CIDR block? - -You keep saying "hostility" as if we are ganging up against you. Believe -me, I have no animosity towards you and I am sure no one else has either. -We are resisting the change you want simply because it would violate the -RFC which we agreed to follow when we created the types. - -If you think this is hostile, you will probably think that the original -discussions in the archives are nuclear war :-). 
If you would like to -look it over make sure to set aside a lot of time. We spent a long time -hashing this out. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - - -From pgsql-hackers-owner+M5276@hub.org Mon Jul 24 16:54:14 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA11929 - for ; Mon, 24 Jul 2000 16:54:13 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OKruh24719; - Mon, 24 Jul 2000 16:53:56 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OKrPh24294 - for ; Mon, 24 Jul 2000 16:53:25 -0400 (EDT) -Received: from localhost (1495 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Mon, 24 Jul 2000 16:53:24 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. Cain) -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: "from Larry Rosenman - at Jul 24, 2000 03:27:24 pm" -To: Larry Rosenman -Date: Mon, 24 Jul 2000 16:53:24 -0400 (EDT) -CC: pgsql-hackers@hub.org, Don Baccus -Reply-To: pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Larry Rosenman -> What RFC says you can't print all 4 octets of a CIDR Netnumber? - -Can't recall. It was Paul Vixie who made the claim and since he was -probably the one who wrote it I tend to believe him. - -In fact it may be that it suggested rather than required but someone -would have to dig out the RFC before we considered changing it I think. 
- -> Why does network(cidr) return the whole cidr output just like -> select cidr? - -Yah, it's redundant. "network(cidr)" is just a long way to say "cidr." -The only reason it is there is because of the way the code was written -for the two types. Not having it would have required a special test to -look for it and technically it is correct so we didn't bother. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - -From pgsql-hackers-owner+M5277@hub.org Mon Jul 24 17:12:12 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA12251 - for ; Mon, 24 Jul 2000 17:12:11 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OLBth32813; - Mon, 24 Jul 2000 17:11:55 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OLBah32716 - for ; Mon, 24 Jul 2000 17:11:36 -0400 (EDT) -Received: from lerdesk (ler-desk.iadfw.net [206.66.13.18]) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) with SMTP id e6OLBZc28924; - Mon, 24 Jul 2000 16:11:35 -0500 (CDT) -From: "Larry Rosenman" -To: , "Larry Rosenman" -Cc: "Don Baccus" -Subject: RE: [HACKERS] INET/CIDR types -Date: Mon, 24 Jul 2000 16:11:34 -0500 -Message-ID: -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) -X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 -Importance: Normal -In-Reply-To: -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Can we dig up the RFC? - -Larry - ------Original Message----- -From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On -Behalf Of D'Arcy J.M. 
Cain -Sent: Monday, July 24, 2000 3:53 PM -To: Larry Rosenman -Cc: pgsql-hackers@hub.org; Don Baccus -Subject: Re: [HACKERS] INET/CIDR types - - -Thus spake Larry Rosenman -> What RFC says you can't print all 4 octets of a CIDR Netnumber? - -Can't recall. It was Paul Vixie who made the claim and since he was -probably the one who wrote it I tend to believe him. - -In fact it may be that it suggested rather than required but someone -would have to dig out the RFC before we considered changing it I think. - -> Why does network(cidr) return the whole cidr output just like -> select cidr? - -Yah, it's redundant. "network(cidr)" is just a long way to say "cidr." -The only reason it is there is because of the way the code was written -for the two types. Not having it would have required a special test to -look for it and technically it is correct so we didn't bother. - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. 
- -From pgsql-hackers-owner+M5278@hub.org Mon Jul 24 18:31:03 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA12804 - for ; Mon, 24 Jul 2000 18:31:03 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OMUmh59689; - Mon, 24 Jul 2000 18:30:48 -0400 (EDT) -Received: from anubis (anubis.ip23.net [212.83.32.60]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OMTxh58656 - for ; Mon, 24 Jul 2000 18:29:59 -0400 (EDT) -Received: from ip23.net (umbriel [212.83.32.61]) by anubis (980427.SGI.8.8.8/980728.SGI.AUTOCF) via ESMTP id AAA96097; Tue, 25 Jul 2000 00:29:42 +0200 (CEST) -Message-ID: <397CC356.7C78A018@ip23.net> -Date: Tue, 25 Jul 2000 00:29:42 +0200 -From: Sevo Stille -Reply-To: sevo@ip23.net -Organization: IP23 -X-Mailer: Mozilla 4.7 [en] (WinNT; U) -X-Accept-Language: en-GB,en,de,en-US -MIME-Version: 1.0 -To: Larry Rosenman -CC: pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Larry Rosenman wrote: -> -> The problem is NON-TECHNICAL people will be getting the output, -> and they expect 4 octet output. - -Well, but what are they going to do if they see, say, that 196.100.0.0 -is already allocated? Any CIDR net starting off on the .0 will have -exactly the same 4 octet notation. That is, the above entry would only -tell that there is some indeterminable number of addresses starting off -196.100.0.0 allocated, which could be anything between a measly /31 and -a whopping big /16. To repeat: CIDR having no implicit netmask encoded -in the class, there is no way of figuring out your allocation if you -lose the explicit mask. Which presumably will cause considerable -problems in a network allocation and tracking application! 
- -> I really think that we should have a way to coerce a CIDR to -> an INET, and then allow host(). - -There is no unique mapping of a CIDR network to a INET host address, -except for the special case of /32. - -> Remember that I am dealing with $10/hour clerks. - -Then given them a interface which makes the concept of CIDR obvious to -them. Faking a classed notation is no way to go! IP v.4 being what it -is, and registries being on the move to enforce CIDR more and more, they -will inevitably encounter CIDR sooner or later, probably in a business -critical way. - -> I really don't get the hostility to changing the OUTPUT format... - -Anything broken that is added will sooner or later be used by somebody. -Which means that it can't be fixed without breaking some applications. -That alone should be a good enough reason not to introduce any broken -notions intentionally. - -Sevo - --- -Sevo Stille -sevo@ip23.net - -From pgsql-hackers-owner+M5279@hub.org Mon Jul 24 18:31:24 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA12809 - for ; Mon, 24 Jul 2000 18:31:23 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OMVBh60018; - Mon, 24 Jul 2000 18:31:11 -0400 (EDT) -Received: from paprika.michvhf.com (paprika.michvhf.com [209.103.136.12]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OMUrh59759 - for ; Mon, 24 Jul 2000 18:30:53 -0400 (EDT) -Received: (qmail 39846 invoked by uid 1001); 24 Jul 2000 22:30:59 -0000 -Date: Mon, 24 Jul 2000 18:30:59 -0400 (EDT) -From: Vince Vielhaber -To: Larry Rosenman -cc: pgsql-hackers@hub.org -Subject: RE: [HACKERS] INET/CIDR types -In-Reply-To: -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -On Mon, 24 Jul 2000, Larry Rosenman wrote: - -> Can we dig up the RFC? - -Feel free. 
You can start your research with RFC1467 and look back at -what it touches on, then on to 1517, 1518 and 1519 then to 1817 and -then to 2317. If, after reading these, you don't understand why and/or -why not you can check with Paul himself at www.vix.com, 'cuze if you -don't understand then he's your only hope. - -Vince. --- -========================================================================== -Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net - 128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking - Online Campground Directory http://www.camping-usa.com - Online Giftshop Superstore http://www.cloudninegifts.com -========================================================================== - - - - -From pgsql-hackers-owner+M5280@hub.org Mon Jul 24 18:53:56 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA12905 - for ; Mon, 24 Jul 2000 18:53:55 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6OMrjh74011; - Mon, 24 Jul 2000 18:53:45 -0400 (EDT) -Received: from anubis (anubis.ip23.net [212.83.32.60]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6OMrNh73763 - for ; Mon, 24 Jul 2000 18:53:24 -0400 (EDT) -Received: from ip23.net (umbriel [212.83.32.61]) by anubis (980427.SGI.8.8.8/980728.SGI.AUTOCF) via ESMTP id AAA95973; Tue, 25 Jul 2000 00:52:30 +0200 (CEST) -Message-ID: <397CC8AE.FC342360@ip23.net> -Date: Tue, 25 Jul 2000 00:52:30 +0200 -From: Sevo Stille -Reply-To: sevo@ip23.net -Organization: IP23 -X-Mailer: Mozilla 4.7 [en] (WinNT; U) -X-Accept-Language: en-GB,en,de,en-US -MIME-Version: 1.0 -To: Larry Rosenman -CC: pgsql-hackers@hub.org, Don Baccus -Subject: Re: [HACKERS] INET/CIDR types -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Larry 
Rosenman wrote: -> -> What RFC says you can't print all 4 octets of a CIDR Netnumber? - -Implicitly in 1518, for example: ----------8<----------------------8<----------------------8<------------ -For the purposes of this paper, an IP prefix is an IP address and - some indication of the leftmost contiguous significant bits within - this address. ----------8<----------------------8<----------------------8<------------ - -As I already explained, the use of variable-length masks implies that -they have to be explicitly stated. This was not neccessary in classed -networks, as the MSB's encoded the class (mask). - -> Why does network(cidr) return the whole cidr output just like -> select cidr? - -Because a cast to network is a cast to CIDR - casting to the same type -obviously won't change a thing. - -> I'm just trying to figure out the logic here. - -As a matter of fact, it is the side effect of the current -implementations shortcomings - we have common code for INET and CIDR, -otherwise, network would not have to be a valid operator for CIDR. - -> Here is what my Cisco Router that speaks BGP says: -> big-bro#term ip netmask-format bit-count -> big-bro#sh ip bg 206.66.0.0/20 -> BGP routing table entry for 206.66.0.0/20, version 150832 -> Paths: (5 available, best #4) -> Advertised to non peer-group peers: -> 157.130.140.109 166.63.135.33 206.66.12.3 206.66.12.4 206.66.12.7 -> 206.66.12. -> ... -> I am just asking for the same type output. - -Huh? The only *network* I see in there IS in /bits notation. 
- -Sevo - --- -Sevo Stille -sevo@ip23.net - -From pgsql-hackers-owner+M5282@hub.org Mon Jul 24 19:05:38 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA13152 - for ; Mon, 24 Jul 2000 19:05:37 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6ON5Mh82281; - Mon, 24 Jul 2000 19:05:22 -0400 (EDT) -Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6ON4qh81966 - for ; Mon, 24 Jul 2000 19:04:52 -0400 (EDT) -Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) - by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id QAA16750; - Mon, 24 Jul 2000 16:04:27 -0700 (PDT) -Message-Id: <3.0.1.32.20000724160016.01192100@mail.pacifier.com> -X-Sender: dhogaza@mail.pacifier.com -X-Mailer: Windows Eudora Pro Version 3.0.1 (32) -Date: Mon, 24 Jul 2000 16:00:16 -0700 -To: Vince Vielhaber , Larry Rosenman -From: Don Baccus -Subject: RE: [HACKERS] INET/CIDR types -Cc: pgsql-hackers@hub.org -In-Reply-To: -References: -Mime-Version: 1.0 -Content-Type: text/plain; charset="us-ascii" -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -At 06:30 PM 7/24/00 -0400, Vince Vielhaber wrote: ->On Mon, 24 Jul 2000, Larry Rosenman wrote: -> ->> Can we dig up the RFC? -> ->Feel free. You can start your research with RFC1467 and look back at ->what it touches on, then on to 1517, 1518 and 1519 then to 1817 and ->then to 2317. If, after reading these, you don't understand why and/or ->why not you can check with Paul himself at www.vix.com, 'cuze if you ->don't understand then he's your only hope. - -I bet just hacking your PHP script to format it in the way you want -would involve a heck of a lot less effort ... 
- - - -- Don Baccus, Portland OR - Nature photos, on-line guides, Pacific Northwest - Rare Bird Alert Service and other goodies at - http://donb.photo.net. - -From pgsql-hackers-owner+M5281@hub.org Mon Jul 24 19:01:25 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA12969 - for ; Mon, 24 Jul 2000 19:01:25 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6ON1Dh79863; - Mon, 24 Jul 2000 19:01:13 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6ON0rh79630 - for ; Mon, 24 Jul 2000 19:00:53 -0400 (EDT) -Received: (from ler@localhost) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) id e6ON0lO03554; - Mon, 24 Jul 2000 18:00:47 -0500 (CDT) -From: Larry Rosenman -Message-Id: <200007242300.e6ON0lO03554@lerami.lerctr.org> -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <397CC8AE.FC342360@ip23.net> "from Sevo Stille at Jul 25, 2000 00:52:30 - am" -To: sevo@ip23.net -Date: Mon, 24 Jul 2000 18:00:46 -0500 (CDT) -CC: Larry Rosenman , pgsql-hackers@hub.org, - Don Baccus -X-Mailer: ELM [version 2.4ME+ PL79 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -> Larry Rosenman wrote: -> > -> > What RFC says you can't print all 4 octets of a CIDR Netnumber? -> -> Implicitly in 1518, for example: -> ---------8<----------------------8<----------------------8<------------ -> For the purposes of this paper, an IP prefix is an IP address and -> some indication of the leftmost contiguous significant bits within -> this address. -> ---------8<----------------------8<----------------------8<------------ -This doesn't prohibit listing all 4 octets, which is my argument... 
-> -> As I already explained, the use of variable-length masks implies that -> they have to be explicitly stated. This was not neccessary in classed -> networks, as the MSB's encoded the class (mask). -I know this... -> -> > Why does network(cidr) return the whole cidr output just like -> > select cidr? -> -> Because a cast to network is a cast to CIDR - casting to the same type -> obviously won't change a thing. -> -> > I'm just trying to figure out the logic here. -> -> As a matter of fact, it is the side effect of the current -> implementations shortcomings - we have common code for INET and CIDR, -> otherwise, network would not have to be a valid operator for CIDR. - -I just want something equivalent to host(inet) that -prints all 4 octets of a CIDR type with no mask. - -Is that hard? -> -> > Here is what my Cisco Router that speaks BGP says: -> > big-bro#term ip netmask-format bit-count -> > big-bro#sh ip bg 206.66.0.0/20 -> > BGP routing table entry for 206.66.0.0/20, version 150832 -> > Paths: (5 available, best #4) -> > Advertised to non peer-group peers: -> > 157.130.140.109 166.63.135.33 206.66.12.3 206.66.12.4 206.66.12.7 -> > 206.66.12. -> > ... -> > I am just asking for the same type output. -> -> Huh? The only *network* I see in there IS in /bits notation. -Yes, but with all 4 octets, which is all I'm looking for.... 
- - -> -> Sevo -> -> -- -> Sevo Stille -> sevo@ip23.net - - --- -Larry Rosenman http://www.lerctr.org/~ler -Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org -US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 - -From pgsql-hackers-owner+M5285@hub.org Mon Jul 24 22:17:47 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA22852 - for ; Mon, 24 Jul 2000 22:17:46 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P2CVh51074; - Mon, 24 Jul 2000 22:12:31 -0400 (EDT) -Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P2Boh50884 - for ; Mon, 24 Jul 2000 22:11:50 -0400 (EDT) -Received: from localhost (alexmail@localhost) - by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id WAA03783; - Mon, 24 Jul 2000 22:13:58 -0400 (EDT) -Date: Mon, 24 Jul 2000 22:13:57 -0400 (EDT) -From: Alex Pilosov -To: Sevo Stille -cc: Larry Rosenman , pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <397CC356.7C78A018@ip23.net> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -This whole discussion is quite silly guys. - -It is quite reasonable to have ability to split CIDR net into two pieces: -the network and the bitshift. Second one is already possible, the first -one can be accomplished by having functions to convert a cidr/inet to int8 -(not int4 because of sign thing), and back. - -Its also very easy to implement ;) - -This will actually come very useful for many applications. Something I'm -working on now (allocation of 'most appropriate' block) requires ability -to split a netblock into two, which could be most easily accomplished -using int8 math. (net::int8+2^(netmask(net)-1)). - -Now, patch anyone? 
:) --alex -On Tue, 25 Jul 2000, Sevo Stille wrote: - -> Larry Rosenman wrote: -> > -> > The problem is NON-TECHNICAL people will be getting the output, -> > and they expect 4 octet output. -> -> Well, but what are they going to do if they see, say, that 196.100.0.0 -> is already allocated? Any CIDR net starting off on the .0 will have -> exactly the same 4 octet notation. That is, the above entry would only -> tell that there is some indeterminable number of addresses starting off -> 196.100.0.0 allocated, which could be anything between a measly /31 and -> a whopping big /16. To repeat: CIDR having no implicit netmask encoded -> in the class, there is no way of figuring out your allocation if you -> lose the explicit mask. Which presumably will cause considerable -> problems in a network allocation and tracking application! -> -> > I really think that we should have a way to coerce a CIDR to -> > an INET, and then allow host(). -> -> There is no unique mapping of a CIDR network to a INET host address, -> except for the special case of /32. -> -> > Remember that I am dealing with $10/hour clerks. -> -> Then given them a interface which makes the concept of CIDR obvious to -> them. Faking a classed notation is no way to go! IP v.4 being what it -> is, and registries being on the move to enforce CIDR more and more, they -> will inevitably encounter CIDR sooner or later, probably in a business -> critical way. -> -> > I really don't get the hostility to changing the OUTPUT format... -> -> Anything broken that is added will sooner or later be used by somebody. -> Which means that it can't be fixed without breaking some applications. -> That alone should be a good enough reason not to introduce any broken -> notions intentionally. 
-> -> Sevo -> -> - - -From pgsql-hackers-owner+M5287@hub.org Mon Jul 24 22:48:05 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA23082 - for ; Mon, 24 Jul 2000 22:48:04 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P2hrh58014; - Mon, 24 Jul 2000 22:43:53 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P2hbh57922 - for ; Mon, 24 Jul 2000 22:43:37 -0400 (EDT) -Received: (from ler@localhost) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) id e6P2eVl12604; - Mon, 24 Jul 2000 21:40:31 -0500 (CDT) -From: Larry Rosenman -Message-Id: <200007250240.e6P2eVl12604@lerami.lerctr.org> -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: - "from Alex Pilosov at Jul 24, 2000 10:13:57 pm" -To: Alex Pilosov -Date: Mon, 24 Jul 2000 21:40:31 -0500 (CDT) -CC: Sevo Stille , Larry Rosenman , - pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL79 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -> This whole discussion is quite silly guys. -> -> It is quite reasonable to have ability to split CIDR net into two pieces: -> the network and the bitshift. Second one is already possible, the first -> one can be accomplished by having functions to convert a cidr/inet to int8 -> (not int4 because of sign thing), and back. -> -> Its also very easy to implement ;) -> -> This will actually come very useful for many applications. Something I'm -> working on now (allocation of 'most appropriate' block) requires ability -> to split a netblock into two, which could be most easily accomplished -> using int8 math. (net::int8+2^(netmask(net)-1)). 
-All I'm looking for is to be able to print all 4 octets of an IP address -out so that joe user can take the 4 numbers and type it into the -4 boxes on a Windows 98 box, and use them. - -Is that really that abhorrent? - -They also need the 4 octet netmask which I can get now. - -All we are missing is a way to print ALL 4 NUMBERS ALL THE TIME -for the output. It's not asking for classful, and for sure -we use CIDR all over the place, but for the final output that my -users see, why can't I have the database just print all 4 octets? - -Why is this discussion so hard? - -Larry -> -> Now, patch anyone? :) -> -alex -> On Tue, 25 Jul 2000, Sevo Stille wrote: -> -> > Larry Rosenman wrote: -> > > -> > > The problem is NON-TECHNICAL people will be getting the output, -> > > and they expect 4 octet output. -> > -> > Well, but what are they going to do if they see, say, that 196.100.0.0 -> > is already allocated? Any CIDR net starting off on the .0 will have -> > exactly the same 4 octet notation. That is, the above entry would only -> > tell that there is some indeterminable number of addresses starting off -> > 196.100.0.0 allocated, which could be anything between a measly /31 and -> > a whopping big /16. To repeat: CIDR having no implicit netmask encoded -> > in the class, there is no way of figuring out your allocation if you -> > lose the explicit mask. Which presumably will cause considerable -> > problems in a network allocation and tracking application! -> > -> > > I really think that we should have a way to coerce a CIDR to -> > > an INET, and then allow host(). -> > -> > There is no unique mapping of a CIDR network to a INET host address, -> > except for the special case of /32. -> > -> > > Remember that I am dealing with $10/hour clerks. -> > -> > Then given them a interface which makes the concept of CIDR obvious to -> > them. Faking a classed notation is no way to go! 
IP v.4 being what it -> > is, and registries being on the move to enforce CIDR more and more, they -> > will inevitably encounter CIDR sooner or later, probably in a business -> > critical way. -> > -> > > I really don't get the hostility to changing the OUTPUT format... -> > -> > Anything broken that is added will sooner or later be used by somebody. -> > Which means that it can't be fixed without breaking some applications. -> > That alone should be a good enough reason not to introduce any broken -> > notions intentionally. -> > -> > Sevo -> > -> > -> - - --- -Larry Rosenman http://www.lerctr.org/~ler -Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org -US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 - -From pgsql-hackers-owner+M5288@hub.org Mon Jul 24 22:51:31 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA23098 - for ; Mon, 24 Jul 2000 22:51:30 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P2omh59374; - Mon, 24 Jul 2000 22:50:48 -0400 (EDT) -Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P2oah59301 - for ; Mon, 24 Jul 2000 22:50:36 -0400 (EDT) -Received: from localhost (alexmail@localhost) - by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id WAA03944; - Mon, 24 Jul 2000 22:53:20 -0400 (EDT) -Date: Mon, 24 Jul 2000 22:53:20 -0400 (EDT) -From: Alex Pilosov -To: Larry Rosenman -cc: Sevo Stille , pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <200007250240.e6P2eVl12604@lerami.lerctr.org> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -On Mon, 24 Jul 2000, Larry Rosenman wrote: - -> > This whole discussion is quite silly guys. 
-> > -> > It is quite reasonable to have ability to split CIDR net into two pieces: -> > the network and the bitshift. Second one is already possible, the first -> > one can be accomplished by having functions to convert a cidr/inet to int8 -> > (not int4 because of sign thing), and back. -> > -> > Its also very easy to implement ;) -> > -> > This will actually come very useful for many applications. Something I'm -> > working on now (allocation of 'most appropriate' block) requires ability -> > to split a netblock into two, which could be most easily accomplished -> > using int8 math. (net::int8+2^(netmask(net)-1)). -> All I'm looking for is to be able to print all 4 octets of an IP address -> out so that joe user can take the 4 numbers and type it into the -> 4 boxes on a Windows 98 box, and use them. -> -> Is that really that abhorrent? -> -> They also need the 4 octet netmask which I can get now. -> -> All we are missing is a way to print ALL 4 NUMBERS ALL THE TIME -> for the output. It's not asking for classful, and for sure -> we use CIDR all over the place, but for the final output that my -> users see, why can't I have the database just print all 4 octets? 
- -Larry, -With my suggestion, you can do it as follows: - -net::int8::inet - -(net being of cidr type) --alex - - -From pgsql-hackers-owner+M5290@hub.org Mon Jul 24 23:10:44 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA23371 - for ; Mon, 24 Jul 2000 23:10:44 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P3ATh64149; - Mon, 24 Jul 2000 23:10:29 -0400 (EDT) -Received: from lerami.lerctr.org (lerami.lerctr.org [207.158.72.11]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P3AAh64000 - for ; Mon, 24 Jul 2000 23:10:10 -0400 (EDT) -Received: (from ler@localhost) - by lerami.lerctr.org (8.10.1/8.10.1/20000715) id e6P3A1v13825; - Mon, 24 Jul 2000 22:10:01 -0500 (CDT) -From: Larry Rosenman -Message-Id: <200007250310.e6P3A1v13825@lerami.lerctr.org> -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: - "from Alex Pilosov at Jul 24, 2000 10:53:20 pm" -To: Alex Pilosov -Date: Mon, 24 Jul 2000 22:10:01 -0500 (CDT) -CC: Larry Rosenman , Sevo Stille , - pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL79 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -The bad news is it doesn't work now... - - -ler=# select host(netblock::int8::inet) from networks; -ERROR: Cannot cast type 'cidr' to 'int8' -ler=# \d networks - Table "networks" - Attribute | Type | Modifier - ---------------+--------------+---------- - netblock | cidr | - router | integer | - interface | varchar(64) | - dest_ip | inet | - net_name | varchar(64) | - owner | integer | - origin | varchar(256) | - assigned_date | date | - assigned_by | varchar(64) | - asn | smallint | - - ler=# - -> On Mon, 24 Jul 2000, Larry Rosenman wrote: -> -> > > This whole discussion is quite silly guys. 
-> > > -> > > It is quite reasonable to have ability to split CIDR net into two pieces: -> > > the network and the bitshift. Second one is already possible, the first -> > > one can be accomplished by having functions to convert a cidr/inet to int8 -> > > (not int4 because of sign thing), and back. -> > > -> > > Its also very easy to implement ;) -> > > -> > > This will actually come very useful for many applications. Something I'm -> > > working on now (allocation of 'most appropriate' block) requires ability -> > > to split a netblock into two, which could be most easily accomplished -> > > using int8 math. (net::int8+2^(netmask(net)-1)). -> > All I'm looking for is to be able to print all 4 octets of an IP address -> > out so that joe user can take the 4 numbers and type it into the -> > 4 boxes on a Windows 98 box, and use them. -> > -> > Is that really that abhorrent? -> > -> > They also need the 4 octet netmask which I can get now. -> > -> > All we are missing is a way to print ALL 4 NUMBERS ALL THE TIME -> > for the output. It's not asking for classful, and for sure -> > we use CIDR all over the place, but for the final output that my -> > users see, why can't I have the database just print all 4 octets? 
-> -> Larry, -> With my suggestion, you can do it as follows: -> -> net::int8::inet -> -> (net being of cidr type) -> -alex -> - - --- -Larry Rosenman http://www.lerctr.org/~ler -Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org -US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 - -From pgsql-hackers-owner+M5291@hub.org Mon Jul 24 23:24:11 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA23433 - for ; Mon, 24 Jul 2000 23:24:10 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P3Nfh69885; - Mon, 24 Jul 2000 23:23:41 -0400 (EDT) -Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P3Mvh69635 - for ; Mon, 24 Jul 2000 23:22:57 -0400 (EDT) -Received: from localhost (alexmail@localhost) - by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id XAA27580; - Mon, 24 Jul 2000 23:25:41 -0400 (EDT) -Date: Mon, 24 Jul 2000 23:25:41 -0400 (EDT) -From: Alex Pilosov -To: Larry Rosenman -cc: Sevo Stille , pgsql-hackers@hub.org -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: <200007250310.e6P3A1v13825@lerami.lerctr.org> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Yes, I know. -I didn't say it existed, I proposed to create a simple conversion function -that would do that, which is why I asked for a patch. - -I'd do it myself but it'll take some time. Should be really simple, -something to the effect of return a.s_addr (where a is struct in_addr), -however, I'm not sure what's POSIXly correct way to do that. - - - -On Mon, 24 Jul 2000, Larry Rosenman wrote: - -> The bad news is it doesn't work now... 
-> -> -> ler=# select host(netblock::int8::inet) from networks; -> ERROR: Cannot cast type 'cidr' to 'int8' -> ler=# \d networks -> Table "networks" -> Attribute | Type | Modifier -> ---------------+--------------+---------- -> netblock | cidr | -> router | integer | -> interface | varchar(64) | -> dest_ip | inet | -> net_name | varchar(64) | -> owner | integer | -> origin | varchar(256) | -> assigned_date | date | -> assigned_by | varchar(64) | -> asn | smallint | -> -> ler=# -> -> > On Mon, 24 Jul 2000, Larry Rosenman wrote: -> > -> > > > This whole discussion is quite silly guys. -> > > > -> > > > It is quite reasonable to have ability to split CIDR net into two pieces: -> > > > the network and the bitshift. Second one is already possible, the first -> > > > one can be accomplished by having functions to convert a cidr/inet to int8 -> > > > (not int4 because of sign thing), and back. -> > > > -> > > > Its also very easy to implement ;) -> > > > -> > > > This will actually come very useful for many applications. Something I'm -> > > > working on now (allocation of 'most appropriate' block) requires ability -> > > > to split a netblock into two, which could be most easily accomplished -> > > > using int8 math. (net::int8+2^(netmask(net)-1)). -> > > All I'm looking for is to be able to print all 4 octets of an IP address -> > > out so that joe user can take the 4 numbers and type it into the -> > > 4 boxes on a Windows 98 box, and use them. -> > > -> > > Is that really that abhorrent? -> > > -> > > They also need the 4 octet netmask which I can get now. -> > > -> > > All we are missing is a way to print ALL 4 NUMBERS ALL THE TIME -> > > for the output. It's not asking for classful, and for sure -> > > we use CIDR all over the place, but for the final output that my -> > > users see, why can't I have the database just print all 4 octets? 
-> > -> > Larry, -> > With my suggestion, you can do it as follows: -> > -> > net::int8::inet -> > -> > (net being of cidr type) -> > -alex -> > -> -> -> - - -From pgsql-hackers-owner+M5292@hub.org Tue Jul 25 01:27:56 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA25020 - for ; Tue, 25 Jul 2000 01:27:56 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6P5RWh12976; - Tue, 25 Jul 2000 01:27:32 -0400 (EDT) -Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6P5PHh12429 - for ; Tue, 25 Jul 2000 01:25:17 -0400 (EDT) -Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) - by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA04370; - Mon, 24 Jul 2000 22:25:00 -0700 (PDT) -Message-Id: <3.0.1.32.20000724221204.01198400@mail.pacifier.com> -X-Sender: dhogaza@mail.pacifier.com -X-Mailer: Windows Eudora Pro Version 3.0.1 (32) -Date: Mon, 24 Jul 2000 22:12:04 -0700 -To: Larry Rosenman , Alex Pilosov -From: Don Baccus -Subject: Re: [HACKERS] INET/CIDR types -Cc: Sevo Stille , Larry Rosenman , - pgsql-hackers@hub.org -In-Reply-To: <200007250240.e6P2eVl12604@lerami.lerctr.org> -References: -Mime-Version: 1.0 -Content-Type: text/plain; charset="us-ascii" -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -At 09:40 PM 7/24/00 -0500, Larry Rosenman wrote: - ->Why is this discussion so hard? - -Because it's an output format you could easily solve yourself. Could've -solved yourself long ago. - -If you care so much, change the sources and run your own custom version. -The beauty of open source, you get to break it in whatever manner you -choose. - -Or hack your PHP script. - -If you need help hacking your script you can probably get help, here. 
I'm -sure people are tired enough of this thread to write it for you, if that's -necessary. - -Next I suppose you'll ask that Unix "ls" output switch "/" to -"\" so your $10 clerks can understand the output? - - - -- Don Baccus, Portland OR - Nature photos, on-line guides, Pacific Northwest - Rare Bird Alert Service and other goodies at - http://donb.photo.net. - -From pgsql-hackers-owner+M5321@hub.org Tue Jul 25 16:45:35 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA19032 - for ; Tue, 25 Jul 2000 16:45:34 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6PKieh98955; - Tue, 25 Jul 2000 16:44:40 -0400 (EDT) -Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6PKenh96652 - for ; Tue, 25 Jul 2000 16:40:49 -0400 (EDT) -Received: from regulus.student.UU.SE ([130.238.5.2]:40690 "EHLO - regulus.its.uu.se") by merganser.its.uu.se with ESMTP - id ; Tue, 25 Jul 2000 22:40:07 +0200 -Received: from peter (helo=localhost) - by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) - id 13HBVx-0001cn-00; Tue, 25 Jul 2000 22:41:21 +0200 -Date: Tue, 25 Jul 2000 22:41:21 +0200 (CEST) -From: Peter Eisentraut -To: pgsql-hackers@hub.org -cc: Larry Rosenman -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=ISO-8859-1 -Content-Transfer-Encoding: 8BIT -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -D'Arcy J.M. Cain writes: - -> Hmmm. I just noticed this. -> -> darcy=> select '1.2.0.1/23'::cidr; -> ?column? -> -------- -> 1.2.0/23 -> (1 row) -> -> Shouldn't that throw an error? - -Isn't that what I've been saying all along? 
- - --- -Peter Eisentraut Sernanders väg 10:115 -peter_e@gmx.net 75262 Uppsala -http://yi.org/peter-e/ Sweden - - -From pgsql-hackers-owner+M5370@hub.org Thu Jul 27 06:17:36 2000 -Received: from hub.org (root@hub.org [216.126.84.1]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA24699 - for ; Thu, 27 Jul 2000 06:17:35 -0400 (EDT) -Received: from hub.org (majordom@localhost [127.0.0.1]) - by hub.org (8.10.1/8.10.1) with SMTP id e6RAHRA44622; - Thu, 27 Jul 2000 06:17:27 -0400 (EDT) -Received: from druid.net (root@druid.net [216.126.72.98]) - by hub.org (8.10.1/8.10.1) with ESMTP id e6RAH2A44416 - for ; Thu, 27 Jul 2000 06:17:02 -0400 (EDT) -Received: from localhost (1387 bytes) by druid.net - via sendmail with P:stdio/R:bind_hosts/T:inet_zone_bind_smtp - (sender: ) (ident using unix) - id - for ; Thu, 27 Jul 2000 06:16:54 -0400 (EDT) - (Smail-3.2.0.109 1999-Oct-27 #3 built 2000-Jun-28) -Message-Id: -From: darcy@druid.net (D'Arcy J.M. Cain) -Subject: Re: [HACKERS] INET/CIDR types -In-Reply-To: - "from Peter Eisentraut at Jul 25, 2000 10:41:21 pm" -To: Peter Eisentraut -Date: Thu, 27 Jul 2000 06:16:54 -0400 (EDT) -CC: pgsql-hackers@hub.org, Larry Rosenman -Reply-To: pgsql-hackers@hub.org -X-Mailer: ELM [version 2.4ME+ PL78 (25)] -MIME-Version: 1.0 -Content-Transfer-Encoding: 7bit -Content-Type: text/plain; charset=US-ASCII -X-Mailing-List: pgsql-hackers@postgresql.org -Precedence: bulk -Sender: pgsql-hackers-owner@hub.org -Status: OR - -Thus spake Peter Eisentraut -> > Hmmm. I just noticed this. -> > -> > darcy=> select '1.2.0.1/23'::cidr; -> > ?column? -> > -------- -> > 1.2.0/23 -> > (1 row) -> > -> > Shouldn't that throw an error? -> -> Isn't that what I've been saying all along? - -Well, yes but I thought that it was now and that you were arguing to keep -that behaviour. This seems to be the behaviour that I was suggesting -although you have half convinced me that this should throw an error. 
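The two behaviours being weighed here, silently truncating '1.2.0.1/23' to its network versus rejecting it, can be sketched on plain 32-bit integers. This is a minimal illustration under stated assumptions, not PostgreSQL's implementation: the function names `ipv4_pack`, `prefix_mask`, `cidr_truncate` and `cidr_is_strict` are hypothetical, addresses are held in host byte order, and only IPv4 is considered.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four octets into a host-order 32-bit IPv4 address (hypothetical helper). */
uint32_t
ipv4_pack(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t) a << 24) | ((uint32_t) b << 16) |
           ((uint32_t) c << 8) | (uint32_t) d;
}

/* Netmask with the top prefix_len bits set; prefix_len must be 0..32. */
uint32_t
prefix_mask(int prefix_len)
{
    assert(prefix_len >= 0 && prefix_len <= 32);
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    return prefix_len == 0 ? 0 : (uint32_t) (UINT32_MAX << (32 - prefix_len));
}

/* "Truncate" behaviour: clear the host bits, keeping only the network part. */
uint32_t
cidr_truncate(uint32_t addr, int prefix_len)
{
    return addr & prefix_mask(prefix_len);
}

/* "Reject" behaviour: nonzero only if all bits past the prefix are zero. */
int
cidr_is_strict(uint32_t addr, int prefix_len)
{
    return (addr & ~prefix_mask(prefix_len)) == 0;
}
```

With these helpers, 1.2.0.1/23 truncates to 1.2.0.0/23 but fails the strict check, which is the case the quoted query runs into; an input routine that rejects such values would test `cidr_is_strict` and raise an error instead of masking.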
- -So, it looks like the status quo is for inet::cidr to be a different -spelling for network(inet). Is this the way we want to keep it? - --- -D'Arcy J.M. Cain | Democracy is three wolves -http://www.druid.net/darcy/ | and a sheep voting on -+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. - diff --git a/doc/TODO.detail/cnfify b/doc/TODO.detail/cnfify deleted file mode 100644 index e7201adea0..0000000000 --- a/doc/TODO.detail/cnfify +++ /dev/null @@ -1,1556 +0,0 @@ -From daybee@bellatlantic.net Sun Aug 23 20:21:48 1998 -Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA26688 - for ; Sun, 23 Aug 1998 20:21:46 -0400 (EDT) -Received: from bellatlantic.net (client196-126-169.bellatlantic.net [151.196.126.169]) - by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id UAA09478; - Sun, 23 Aug 1998 20:18:35 -0400 (EDT) -Message-ID: <35E0ABF0.578694C8@bellatlantic.net> -Date: Sun, 23 Aug 1998 19:55:29 -0400 -From: David Hartwig -Organization: Home -X-Mailer: Mozilla 4.04 [en] (Win95; I) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org -Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 -References: <199808220353.XAA04528@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: ROr - - - -Bruce Momjian wrote: - -> > -> > Hannu Krosing wrote: -> > -> > > > The days where every release fixed server crashes, or added a feature -> > > > that users were 'screaming for' may be a thing of the past. -> > > -> > > Is anyone working on fixing the exploding optimisations for many OR-s, -> > > at least the canonic case used by access? -> > > -> > > My impression is that this has fallen somewhere between -> > > insightdist and Vadim. -> > -> > This is really big for the ODBCers. (And I suspect for JDBCers too.) 
Many -> > desktop libraries and end-user tools depend on this "record set" strategy to -> > operate effectively. -> > -> > I have put together a workable hack that runs just before cnfify(). The -> > option is activated through the SET command. Once activated, it identifies -> > queries with this particular multi-OR pattern generated by these RECORD SET -> > strategies. Qualified query trees are rewritten as multiple UNIONs. (One -> > for each OR grouping). -> > -> > The results are profound. Queries that used to scan tables because of the -> > ORs, now make use of any indexes. Thus, the size of the table has virtually -> > no effect on performance. Furthermore, queries that used to crash the -> > backend, now run in under a second. -> > -> > Currently the down sides are: -> > 1. If there is no usable index, performance is significantly worse. The -> > patch does not check to make sure that there is a usable index. I could use -> > some pointers on this. -> > -> > 2. Small tables are actually a bit slower than without the patch. -> > -> > 3. Not very elegant. I am looking for a more generalized solution. -> > I have lots of ideas, but I would need to know the backend much better before -> > attempting any of them. My favorite idea is before cnfify(), to factor the -> > OR terms and pull out the constants into a virtual (temporary) table spaces. -> > Then rewrite the query as a join. The optimizer will (should) treat the new -> > query accordingly. This assumes that an efficient factoring algorithm exists -> > and that temporary tables can exist in the heap. -> > -> > Illustration: -> > SELECT ... FROM tab WHERE -> > (var1 = const1 AND var2 = const2) OR -> > (var1 = const3 AND var2 = const4) OR -> > (var1 = const5 AND var2 = const6) -> > -> > SELECT ... FROM tab, tmp WHERE -> > (var1 = var_x AND var2 = var_y) -> > -> > tmp -> > var_x | var_y -> > -------------- -> > const1|const2 -> > const3|const4 -> > const5|const6 -> -> David, where are we on this? 
I know we have OR's using indexes. Do we -> still need to look this as a fix, or are we OK. I have not gotten far -> enough in the optimizer to know how to fix the - -Bruce, - -If the question is, have I come up with a solution for the cnf'ify problem: No - -If the question is, is it still important: Very much yes. - -It is essential for many RAD tools using remote data objects which make use of key -sets. Your recent optimization of the OR list goes a long way, but inevitably -users are confronted with multi-part keys. - -When I look at the problem my head spins. I do not have the experience (yet?) -with the backend to be mucking around in the optimizer. As I see it, cnf'ify is -doing just what it is supposed to do. Boundless boolean logic. - -I think hope may lay though, in identifying each AND'ed group associated with a key -and tagging it as a special sub-root node which cnf'ify does not penetrate. This -node would be allowed to pass to the later stages of the optimizer where it will be -used to plan index scans. Easy for me to say. - -In the meantime, I still have the patch that I described in prior email. It has -worked well for us. Let me restate that. We could not survive without it! -However, I do not feel that is a sufficiently functional approach that should be -incorporated as a final solution. I will submit the patch if you, (anyone) does -not come up with a better solution. It is coded to be activated by a SET KSQO to -minimize its reach. 
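The rewrite described above, one UNION branch per OR'ed key group, can be illustrated at the level of SQL text. This is only a sketch under assumptions: the patch discussed in this thread is described as rewriting the parsed query tree, whereas the hypothetical helper `build_union_query` below merely assembles a string. The `key_pair` struct and the column names x and y are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One (x, y) value pair, standing in for a two-part key (hypothetical). */
struct key_pair
{
    int x;
    int y;
};

/*
 * Write "SELECT * FROM <table> WHERE (x=.. AND y=..) UNION SELECT ..."
 * into buf, with one SELECT per key pair.  Returns 0 on success, or -1
 * if nkeys is not positive or buf is too small for the whole query.
 */
int
build_union_query(char *buf, size_t buflen, const char *table,
                  const struct key_pair *keys, int nkeys)
{
    size_t used = 0;
    int    i;

    assert(buf != NULL && table != NULL && keys != NULL);
    if (nkeys <= 0 || buflen == 0)
        return -1;

    for (i = 0; i < nkeys; i++)
    {
        /* Every branch after the first is joined with UNION. */
        int n = snprintf(buf + used, buflen - used,
                         "%sSELECT * FROM %s WHERE (x=%d AND y=%d)",
                         i > 0 ? " UNION " : "", table,
                         keys[i].x, keys[i].y);

        /* snprintf reports the length it wanted; detect truncation. */
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }
    return 0;
}
```

Each branch carries only a simple AND of column = constant terms, so no OR group ever has to be distributed over the others. The cost, as the downsides listed above note, is one scan per branch when no usable index exists.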
- - -From daybee@bellatlantic.net Sun Aug 30 12:06:24 1998 -Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA12860 - for ; Sun, 30 Aug 1998 12:06:22 -0400 (EDT) -Received: from bellatlantic.net (client196-126-73.bellatlantic.net [151.196.126.73]) - by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id MAA18468; - Sun, 30 Aug 1998 12:03:33 -0400 (EDT) -Message-ID: <35E9726E.C6E73049@bellatlantic.net> -Date: Sun, 30 Aug 1998 11:40:31 -0400 -From: David Hartwig -Organization: Home -X-Mailer: Mozilla 4.06 [en] (Win98; I) -MIME-Version: 1.0 -To: Bruce Momjian -CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org -Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 -References: <199808290344.XAA28089@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: RO - - - -Bruce Momjian wrote: - -> OK, let me try this one. -> -> Why is the system cnf'ifying the query. Because it wants to have a -> list of qualifications that are AND'ed, so it can just pick the most -> restrictive/cheapest, and evaluate that one first. -> -> If you have: -> -> (a=b and c=d) or e=1 -> -> In this case, without cnf'ify, it has to evaluate both of them, because -> if one is false, you can't be sure another would be true. In the -> cnf'ify case, -> -> (a=b or e=1) and (c=d or e=1) -> -> In this case, it can choose either, and act on just one, if a row fails -> to meet it, it can stop and not evaluate it using the other restriction. -> -> The fact is that it is only going to use fancy join/index in one of the -> two cases, so it tries to pick the best one, and does a brute-force -> qualification test on the remaining item if the first one tried is true. -> -> The problem is of course large where clauses can exponentially expand -> this. 
What it really trying to do is to pick a cheapest restriction, -> but the memory explosion and query failure are serious problems. -> -> The issue is that it thinks it is doing something to help things, while -> it is actually hurting things. -> -> In the ODBC case of: -> -> (x=3 and y=4) or -> (x=3 and y=5) or -> (x=3 and y=6) or ... -> -> it clearly is not going to gain anything by choosing any CHEAPEST path, -> because they are all the same in terms of cost, and the use by ODBC -> clients is hurting reliability. -> -> I am inclined to agree with David's solution of breaking apart the query -> into separate UNION queries in certain cases. It seems to be the most -> logical solution, because the cnf'ify code is working counter to its -> purpose in these cases. -> -> Now, the question is how/where to implement this. I see your idea of -> making the OR a join to a temp table that holds all the constants. -> Another idea would be to do actual UNION queries: -> -> SELECT * FROM tab -> WHERE (x=3 and y=4) -> UNION -> SELECT * FROM tab -> WHERE (x=3 and y=5) -> UNION -> SELECT * FROM tab -> WHERE (x=3 and y=6) ... -> -> This would work well for tables with indexes, but for a sequential scan, -> you are doing a sequential scan for each UNION. - -Practically speaking, the lack of an index concern, may not be justified. The reason -these queries are being generated, with this shape, is because remote data objects on the -client side are being told that a primary key exists on these tables. The object is told -about these keys in one of two ways. - -1. It queries the database for the primary key of the table. The ODBC driver serviced -this request by querying for the attributes used in {table_name}_pkey. - -2. The user manually specifies the primary key. In this case an actual index may not -exist. (i.e. MS Access asks the user for this information if a primary key is not found -in a table) - -The second case is the only one that would cause a problem. 
Fortunately, the solution is -simple. Add a primary key index! - -My only concern is to be able to accurately identify a query with the proper signature -before rewriting it as a UNION. To what degree should this inspection be taken? - -BTW, I would not do the rewrite on OR's without AND's since you have fixed the OR's use -of the index. - -There is one other potential issue. My experience with using arrays in tables and UNIONS -creates problems. There are missing array comparison operators which are used by the -implied DISTINCT. - -> Another idea is -> subselects. Also, you have to make sure you return the proper rows, -> keeping duplicates where they are in the base table, but not returning -> them when the meet more than one qualification. -> -> SELECT * FROM tab -> WHERE (x,y) IN (SELECT 3, 4 -> UNION -> SELECT 3, 5 -> UNION -> SELECT 3, 6) -> -> I believe we actually support this. This is not going to use an index -> on tab, so it may be slow if x and y are indexed. -> -> Another more bizarre solution is: -> -> SELECT * FROM tab -> WHERE (x,y) = (SELECT 3, 4) OR -> (x,y) = (SELECT 3, 5) OR -> (x,y) = (SELECT 3, 6) -> -> Again, I think we do this too. I don't think cnf'ify does anything with -> this. I also believe "=" uses indexes on subselects, while IN does not -> because IN could return lots of rows, and an index is slower than a -> non-index join on lots of rows. Of course, now that we index OR's. -> -> Let me ask another question. If I do: -> -> SELECT * FROM tab WHERE x=3 OR x=4 -> -> it works, and uses indexes. Why can't the optimizer just not cnf'ify -> things sometimes, and just do: -> -> SELECT * FROM tab -> WHERE (x=3 AND y=4) OR -> (x=3 AND y=5) OR -> (x=3 AND y=6) -> -> Why can it handle x=3 OR x=4, but not the more complicated case above, -> without trying to be too smart? If x,y is a multi-key index, it could -> use that quite easily. If not, it can do a sequentail scan and run the -> tests. -> -> Another issue. 
To the optimizer, x=3 and x=y are totally different. In -> x=3, it is a column compared to a constant, while in x=y, it is a join. -> That makes a huge difference. -> -> In the case of (a=b and c=d) or e=1, you pick the best path and do the -> a=b join, and throw in the e=1 entries. You can't easily do both joins, -> because you also need the e=1 stuff. -> -> I wounder what would happen if we prevent cnf'ifying of cases where the -> OR represent only column = constant restrictions. -> -> I meant to really go through the optimizer this month, but other backend -> items took my time. -> -> Can someone run some tests on disabling the cnf'ify calls. It is my -> understanding that with the non-cnf-ify'ed query, it can't choose an -> optimial path, and starts to do either straight index matches, -> sequential scans, or cartesian products where it joins every row to -> every other row looking for a match. -> -> Let's say we turn off cnf-ify just for non-join queries. Does that -> help? -> -> I am not sure of the ramifications of telling the optimizer it no longer -> has a variety of paths to choose for evaluating the query. - -I did not try this earlier because I thought it was too good to be true. I was right. -I tried commenting out the normalize() function in the cnfify(). The EXPLAIN showed a -sequential scan and the resulting tuple set was empty. Time will not allow me to dig -into this further this weekend. - -Unless you come up with a better solution, I am going to submit my patch on Monday to -make the Sept. 1st deadline. It includes a SET switch to activate the rewrite so as not -to cause problems outside the ODBC users. We can either improve, it or yank it, by the -Oct. 1st deadline. 
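The exponential growth mentioned in the quoted explanation can be made concrete with a little arithmetic. When a qualification of the form G1 OR G2 OR ... OR Gn is converted to conjunctive normal form by plain distribution, each resulting clause takes one term from every group, so the number of clauses is the product of the group sizes. The sketch below assumes naive distribution with no duplicate elimination or other simplification; the function name `cnf_clause_count` is hypothetical and is not part of the backend.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of AND'ed clauses produced by naively distributing OR over AND
 * for G1 OR G2 OR ... OR Gn, where group i is an AND of group_sizes[i]
 * terms.  Returns UINT64_MAX if the product does not fit in 64 bits.
 */
uint64_t
cnf_clause_count(const int *group_sizes, int ngroups)
{
    uint64_t count = 1;
    int      i;

    for (i = 0; i < ngroups; i++)
    {
        assert(group_sizes[i] > 0);
        /* Check before multiplying so the product cannot wrap around. */
        if (count > UINT64_MAX / (uint64_t) group_sizes[i])
            return UINT64_MAX;
        count *= (uint64_t) group_sizes[i];
    }
    return count;
}
```

For (a=b and c=d) or e=1 the group sizes are 2 and 1, giving the two clauses shown in the quoted text. For a keyset query with ten OR'ed groups on a two-part key the same rule gives 2^10 = 1024 clauses of ten terms each, and every further row in the keyset doubles that figure, which is consistent with the memory exhaustion reported in this thread.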
- - -From infotecn@tin.it Mon Aug 31 03:01:51 1998 -Received: from mail.tol.it (mail.tin.it [194.243.154.49]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id DAA09740 - for ; Mon, 31 Aug 1998 03:01:48 -0400 (EDT) -Received: from Server.InfoTecna.com (a-mz6-50.tin.it [212.216.9.113]) - by mail.tol.it (8.8.4/8.8.4) with ESMTP - id JAA16451; Mon, 31 Aug 1998 09:00:35 +0200 (MET DST) -Received: from tm3.InfoTecna.com (Tm1.InfoTecna.com [192.168.1.1]) - by Server.InfoTecna.com (8.8.5/8.8.5) with SMTP id IAA18678; - Mon, 31 Aug 1998 08:53:13 +0200 -Message-Id: <3.0.5.32.19980831085312.00986cc0@MBox.InfoTecna.com> -X-Sender: denis@MBox.InfoTecna.com -X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) -Date: Mon, 31 Aug 1998 08:53:12 +0200 -To: David Hartwig , - Bruce Momjian -From: Sbragion Denis -Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 -Cc: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org -In-Reply-To: <35E9726E.C6E73049@bellatlantic.net> -References: <199808290344.XAA28089@candle.pha.pa.us> -Mime-Version: 1.0 -Content-Type: text/plain; charset="us-ascii" -Status: RO - -Hello, - -At 11.40 30/08/98 -0400, David Hartwig wrote: ->> Why is the system cnf'ifying the query. Because it wants to have a ->> list of qualifications that are AND'ed, so it can just pick the most ->> restrictive/cheapest, and evaluate that one first. - -Just a small question about all this optimizations stuff. I'm not a -database expert but I think we are talking about a NP-complete problem. -Could'nt we convert this optimization problem into another NP one that is -known to have a good solution ? For example for the traveling salesman -problem there's an alghoritm that provide a solution that's never more than -two times the optimal one an provides results that are *really* near the -optimal one most of the times. The simplex alghoritm may be another -example. 
I think that this kind of alghoritm would be better than a -collection ot tricks for special cases, and this tricks could be used -anyway when special cases are detected. Furthermore I also know that exists -a free program I used in the past that provides this kind of optimizations -for chip design. I don't remember the exact name of the program but I -remember it came from Berkeley university. Of course may be I'm totally -missing the point. - -Hope it helps ! - -Bye! - - Dr. Sbragion Denis - InfoTecna - Tel, Fax: +39 39 2324054 - URL: http://space.tin.it/internet/dsbragio - -From andreas.zeugswetter@telecom.at Mon Aug 31 06:31:13 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA14231 - for ; Mon, 31 Aug 1998 06:31:12 -0400 (EDT) -Received: from gandalf.telecom.at (gandalf.telecom.at [194.118.26.84]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id GAA21099 for ; Mon, 31 Aug 1998 06:23:41 -0400 (EDT) -Received: from zeugswettera.user.lan.at (zeugswettera.user.lan.at [10.4.123.227]) by gandalf.telecom.at (A.B.C.Delta4/8.8.8) with SMTP id MAA38132; Mon, 31 Aug 1998 12:22:07 +0200 -Received: by zeugswettera.user.lan.at with Microsoft Mail - id <01BDD4DA.C7F5B690@zeugswettera.user.lan.at>; Mon, 31 Aug 1998 12:27:55 +0200 -Message-ID: <01BDD4DA.C7F5B690@zeugswettera.user.lan.at> -From: Andreas Zeugswetter -To: "'maillist@candle.pha.pa.us'" -Cc: "hackers@postgreSQL.org" -Subject: AW: [INTERFACES] Re: [HACKERS] changes in 6.4 -Date: Mon, 31 Aug 1998 12:22:05 +0200 -Encoding: 31 TEXT -Status: RO - - ->Another idea would be to do actual UNION queries: -> -> SELECT * FROM tab -> WHERE (x=3 and y=4) -> UNION -> SELECT * FROM tab -> WHERE (x=3 and y=5) -> UNION -> SELECT * FROM tab -> WHERE (x=3 and y=6) ... -> ->This would work well for tables with indexes, but for a sequential scan, ->you are doing a sequential scan for each UNION. 
- -The most important Application for this syntax will be M$ Access -because it uses this syntax to display x rows from a table in a particular -sort order. In this case x and y will be the primary key and therefore have a -unique index. So I think this special case should work good. - -The strategy could be something like: -iff x, y is a unique index - do the union access path -else - do something else -done - -I think hand written SQL can always be rewritten if it is not fast enough -using this syntax. - -Andreas - - -From owner-pgsql-patches@hub.org Tue Sep 1 02:01:10 1998 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA28687 - for ; Tue, 1 Sep 1998 02:01:06 -0400 (EDT) -Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA02180; Tue, 1 Sep 1998 01:48:43 -0400 (EDT) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 01 Sep 1998 01:47:48 +0000 (EDT) -Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA02160 for pgsql-patches-outgoing; Tue, 1 Sep 1998 01:47:46 -0400 (EDT) -Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA02147 for ; Tue, 1 Sep 1998 01:47:42 -0400 (EDT) -Received: from bellatlantic.net (client196-126-3.bellatlantic.net [151.196.126.3]) - by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id XAA27530 - for ; Mon, 31 Aug 1998 23:24:07 -0400 (EDT) -Message-ID: <35EB2B33.EBF1E9AA@bellatlantic.net> -Date: Mon, 31 Aug 1998 19:01:07 -0400 -From: David Hartwig -Organization: Insight Distribution Systems -X-Mailer: Mozilla 4.04 [en] (X11; I; Linux 2.0.29 i586) -MIME-Version: 1.0 -To: patches -Subject: [PATCHES] Interim AND/OR memory exaustion fix. -Content-Type: multipart/mixed; boundary="------------BEFD1E6DA78A2DC20B524E32" -Sender: owner-pgsql-patches@hub.org -Precedence: bulk -Status: ROr - -This is a multi-part message in MIME format. 
---------------BEFD1E6DA78A2DC20B524E32 -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit - -I will be cleaning this up more before the Oct 1 deadline. - ---------------BEFD1E6DA78A2DC20B524E32 -Content-Type: text/plain; charset=us-ascii; name="keyset.patch" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; filename="keyset.patch" - -*** ./backend/commands/variable.c.orig Thu Jul 30 19:25:26 1998 ---- ./backend/commands/variable.c Mon Aug 31 17:23:32 1998 -*************** -*** 24,29 **** ---- 24,30 ---- - extern bool _use_geqo_; - extern int32 _use_geqo_rels_; - extern bool _use_right_sided_plans_; -+ extern bool _use_keyset_query_optimizer; - - /*-----------------------------------------------------------------------*/ - static const char * -*************** -*** 559,564 **** ---- 560,568 ---- - }, - #endif - { -+ "ksqo", parse_ksqo, show_ksqo, reset_ksqo -+ }, -+ { - NULL, NULL, NULL, NULL - } - }; -*************** -*** 611,615 **** ---- 615,663 ---- - - elog(NOTICE, "Unrecognized variable %s", name); - -+ return TRUE; -+ } -+ -+ -+ /*----------------------------------------------------------------------- -+ KSQO code will one day be unnecessary when the optimizer makes use of -+ indexes when multiple ORs are specified in the where clause. -+ See optimizer/prep/prepkeyset.c for more on this. 
-+ daveh@insightdist.com 6/16/98 -+ -----------------------------------------------------------------------*/ -+ bool -+ parse_ksqo(const char *value) -+ { -+ if (value == NULL) -+ { -+ reset_ksqo(); -+ return TRUE; -+ } -+ -+ if (strcasecmp(value, "on") == 0) -+ _use_keyset_query_optimizer = true; -+ else if (strcasecmp(value, "off") == 0) -+ _use_keyset_query_optimizer = false; -+ else -+ elog(ERROR, "Bad value for Key Set Query Optimizer (%s)", value); -+ -+ return TRUE; -+ } -+ -+ bool -+ show_ksqo() -+ { -+ -+ if (_use_keyset_query_optimizer) -+ elog(NOTICE, "Key Set Query Optimizer is ON"); -+ else -+ elog(NOTICE, "Key Set Query Optimizer is OFF"); -+ return TRUE; -+ } -+ -+ bool -+ reset_ksqo() -+ { -+ _use_keyset_query_optimizer = false; - return TRUE; - } -*** ./backend/optimizer/plan/planner.c.orig Sun Aug 30 04:28:02 1998 ---- ./backend/optimizer/plan/planner.c Mon Aug 31 17:23:32 1998 -*************** -*** 69,74 **** ---- 69,75 ---- - PlannerInitPlan = NULL; - PlannerPlanId = 0; - -+ transformKeySetQuery(parse); - result_plan = union_planner(parse); - - Assert(PlannerQueryLevel == 1); -*** ./backend/optimizer/prep/Makefile.orig Sun Apr 5 20:23:48 1998 ---- ./backend/optimizer/prep/Makefile Mon Aug 31 17:23:32 1998 -*************** -*** 13,19 **** - - CFLAGS += -I../.. - -! OBJS = prepqual.o preptlist.o prepunion.o - - # not ready yet: predmig.o xfunc.o - ---- 13,19 ---- - - CFLAGS += -I../.. - -! OBJS = prepqual.o preptlist.o prepunion.o prepkeyset.o - - # not ready yet: predmig.o xfunc.o - -*** ./backend/optimizer/prep/prepkeyset.c.orig Mon Aug 31 17:23:32 1998 ---- ./backend/optimizer/prep/prepkeyset.c Mon Aug 31 18:30:58 1998 -*************** -*** 0 **** ---- 1,213 ---- -+ /*------------------------------------------------------------------------- -+ * -+ * prepkeyset.c-- -+ * Special preperation for keyset queries. 
-+ * -+ * Copyright (c) 1994, Regents of the University of California -+ * -+ *------------------------------------------------------------------------- -+ */ -+ #include -+ #include -+ -+ #include "postgres.h" -+ #include "nodes/pg_list.h" -+ #include "nodes/parsenodes.h" -+ #include "utils/elog.h" -+ -+ #include "nodes/nodes.h" -+ #include "nodes/execnodes.h" -+ #include "nodes/plannodes.h" -+ #include "nodes/primnodes.h" -+ #include "nodes/relation.h" -+ -+ #include "catalog/pg_type.h" -+ #include "lib/stringinfo.h" -+ #include "optimizer/planmain.h" -+ /* -+ * Node_Copy-- -+ * a macro to simplify calling of copyObject on the specified field -+ */ -+ #define Node_Copy(from, newnode, field) newnode->field = copyObject(from->field) -+ -+ /***** DEBUG stuff -+ #define TABS {int i; printf("\n"); for (i = 0; igroupClause || -+ origNode->havingQual || -+ origNode->hasAggs || -+ origNode->utilityStmt || -+ origNode->unionClause || -+ origNode->unionall || -+ origNode->hasSubLinks || -+ origNode->commandType != CMD_SELECT) -+ return; -+ -+ /* Qualify single table query */ -+ -+ /* Qualify where clause */ -+ if ( ! 
inspectOrNode((Expr*)origNode->qual)) { -+ return; -+ } -+ -+ /* Copy essential elements into a union node */ -+ /* -+ elog(NOTICE, "OR_EXPR=%d, OP_EXPR=%d, AND_EXPR=%d", OR_EXPR, OP_EXPR, AND_EXPR); -+ elog(NOTICE, "T_List=%d, T_Expr=%d, T_Var=%d, T_Const=%d", T_List, T_Expr, T_Var, T_Const); -+ elog(NOTICE, "opType=%d", ((Expr*)origNode->qual)->opType); -+ */ -+ while (((Expr*)origNode->qual)->opType == OR_EXPR) { -+ Query *unionNode = makeNode(Query); -+ -+ /* Pull up Expr = */ -+ unionNode->qual = lsecond(((Expr*)origNode->qual)->args); -+ -+ /* Pull up balance of tree */ -+ origNode->qual = lfirst(((Expr*)origNode->qual)->args); -+ -+ /* -+ elog(NOTICE, "origNode: opType=%d, nodeTag=%d", ((Expr*)origNode->qual)->opType, nodeTag(origNode->qual)); -+ elog(NOTICE, "unionNode: opType=%d, nodeTag=%d", ((Expr*)unionNode->qual)->opType, nodeTag(unionNode->qual)); -+ */ -+ -+ unionNode->commandType = origNode->commandType; -+ unionNode->resultRelation = origNode->resultRelation; -+ unionNode->isPortal = origNode->isPortal; -+ unionNode->isBinary = origNode->isBinary; -+ -+ if (origNode->uniqueFlag) -+ unionNode->uniqueFlag = pstrdup(origNode->uniqueFlag); -+ -+ Node_Copy(origNode, unionNode, sortClause); -+ Node_Copy(origNode, unionNode, rtable); -+ Node_Copy(origNode, unionNode, targetList); -+ -+ origNode->unionClause = lappend(origNode->unionClause, unionNode); -+ } -+ return; -+ } -+ -+ -+ -+ -+ static int -+ inspectOrNode(Expr *expr) -+ { -+ int fr = 0, sr = 0; -+ Expr *firstExpr, *secondExpr; -+ -+ if ( ! 
(expr && nodeTag(expr) == T_Expr && expr->opType == OR_EXPR)) -+ return 0; -+ -+ firstExpr = lfirst(expr->args); -+ secondExpr = lsecond(expr->args); -+ if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr) -+ return 0; -+ -+ if (firstExpr->opType == OR_EXPR) -+ fr = inspectOrNode(firstExpr); -+ else if (firstExpr->opType == OP_EXPR) /* Need to make sure it is last */ -+ fr = inspectOpNode(firstExpr); -+ else if (firstExpr->opType == AND_EXPR) /* Need to make sure it is last */ -+ fr = inspectAndNode(firstExpr); -+ -+ -+ if (secondExpr->opType == AND_EXPR) -+ sr = inspectAndNode(secondExpr); -+ else if (secondExpr->opType == OP_EXPR) -+ sr = inspectOpNode(secondExpr); -+ -+ return (fr && sr); -+ } -+ -+ -+ static int -+ inspectAndNode(Expr *expr) -+ { -+ int fr = 0, sr = 0; -+ Expr *firstExpr, *secondExpr; -+ -+ if ( ! (expr && nodeTag(expr) == T_Expr && expr->opType == AND_EXPR)) -+ return 0; -+ -+ firstExpr = lfirst(expr->args); -+ secondExpr = lsecond(expr->args); -+ if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr) -+ return 0; -+ -+ if (firstExpr->opType == AND_EXPR) -+ fr = inspectAndNode(firstExpr); -+ else if (firstExpr->opType == OP_EXPR) -+ fr = inspectOpNode(firstExpr); -+ -+ if (secondExpr->opType == OP_EXPR) -+ sr = inspectOpNode(secondExpr); -+ -+ return (fr && sr); -+ } -+ -+ -+ static int -+ /****************************************************************** -+ * Return TRUE if T_Var = T_Const, else FALSE -+ * Actually it does not test for =. Need to do this! 
-+ ******************************************************************/ -+ inspectOpNode(Expr *expr) -+ { -+ Expr *firstExpr, *secondExpr; -+ -+ if (nodeTag(expr) != T_Expr || expr->opType != OP_EXPR) -+ return 0; -+ -+ firstExpr = lfirst(expr->args); -+ secondExpr = lsecond(expr->args); -+ return (firstExpr && secondExpr && nodeTag(firstExpr) == T_Var && nodeTag(secondExpr) == T_Const); -+ } -*** ./include/commands/variable.h.orig Thu Jul 30 19:27:05 1998 ---- ./include/commands/variable.h Mon Aug 31 17:23:32 1998 -*************** -*** 54,58 **** ---- 54,61 ---- - extern bool show_geqo(void); - extern bool reset_geqo(void); - extern bool parse_geqo(const char *); -+ extern bool show_ksqo(void); -+ extern bool reset_ksqo(void); -+ extern bool parse_ksqo(const char *); - - #endif /* VARIABLE_H */ -*** ./include/optimizer/planmain.h.orig Mon Aug 31 18:27:03 1998 ---- ./include/optimizer/planmain.h Mon Aug 31 18:26:04 1998 -*************** -*** 67,71 **** ---- 67,72 ---- - extern List *check_having_qual_for_aggs(Node *clause, - List *subplanTargetList, List *groupClause); - extern List *check_having_qual_for_vars(Node *clause, List *targetlist_so_far); -+ extern void transformKeySetQuery(Query *origNode); - - #endif /* PLANMAIN_H */ - ---------------BEFD1E6DA78A2DC20B524E32-- - - - -From daveh@insightdist.com Thu Sep 3 12:34:48 1998 -Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA07696 - for ; Thu, 3 Sep 1998 12:34:46 -0400 (EDT) -Received: from insightdist.com (nobody@localhost) - by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA23590 - for maillist@candle.pha.pa.us; Thu, 3 Sep 1998 12:17:44 -0400 (EDT) -X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f -Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03) - id AA56436; Thu, 3 Sep 1998 11:51:24 -0400 -Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03) - id AA45986; Thu, 3 Sep 1998 11:51:24 -0400 
-Message-Id: <35EEBBEF.2158F68A@insightdist.com> -Date: Thu, 03 Sep 1998 11:55:28 -0400 -From: David Hartwig -Organization: Insight Distribution Systems -X-Mailer: Mozilla 4.05 [en] (Win95; I) -Mime-Version: 1.0 -To: Bruce Momjian -Cc: David Hartwig , pgsql-patches@postgreSQL.org -Subject: Re: [PATCHES] Interim AND/OR memory exaustion fix. -References: <199809030236.WAA22888@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: RO - - - -Bruce Momjian wrote: - -> > I will be cleaning this up more before the Oct 1 deadline. -> -> > *** ./backend/commands/variable.c.orig Thu Jul 30 19:25:26 1998 -> > --- ./backend/commands/variable.c Mon Aug 31 17:23:32 1998 -> -> Applied. Let's keep talking to see if we can come up with a nice -> general solution to this. -> - -Agreed. - -> I have been thinking, and the trouble case is a query that uses only one -> table, and had only "column = value" statements. I believe this can be -> easily identified and reworked somehow. -> - -If you are referring to the AND'less set of OR's, I do have plans to not let -that qualify since you have gotten the index scan working with OR's. - -I also think that the qualification process should be tightened up. For -example force the number of AND's to be the same in each OR grouping. And -have at least n OR's to qualify. We just need to head off the memory -exhaustion. - -> Your subtable idea may be a good one. -> - -This sounds like a 6.5 thing. I needed to stop the bleeding for 6.4. 
- - -From bga@mug.org Tue Sep 8 03:39:37 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA06237 - for ; Tue, 8 Sep 1998 03:39:36 -0400 (EDT) -Received: from bgalli.mug.org (bajor.mug.org [207.158.132.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id DAA03648 for ; Tue, 8 Sep 1998 03:38:52 -0400 (EDT) -Received: from localhost (bga@localhost) by bgalli.mug.org (8.8.7/SCO5) with SMTP id DAA02895 for ; Tue, 8 Sep 1998 03:31:26 -0400 (EDT) -Message-Id: <199809080731.DAA02895@bgalli.mug.org> -X-Authentication-Warning: bgalli.mug.org: bga@localhost didn't use HELO protocol -X-Mailer: exmh version 2.0.2 2/24/98 -From: "Billy G. Allie" -Reply-To: "Billy G. Allie" -To: Bruce Momjian -Subject: Re: [HACKERS] flock patch breaks things here -In-reply-to: Your message of "Mon, 31 Aug 1998 00:36:34 EDT." - <199808310436.AAA07618@candle.pha.pa.us> -Mime-Version: 1.0 -Content-Type: text/plain; charset=us-ascii -Date: Tue, 08 Sep 1998 03:31:26 -0400 -Sender: bga@mug.org -Status: ROr - -Bruce Momjian writes: - -> I have been thinking about this. First, we can easily use fopen(r+) to -> check to see if the file exists, and if it does read the pid and do a -> kill -0 to see if it is running. If no one else does it, I will take it -> on. - -It is better to use open with the O_CREAT and O_EXCL flags set. If the file does not -exist it will be created and the PID can be written to it. If the file exists -then the call will fail, at which point the file can be opened and read, and the -PID it contains can be checked to see if it still exists with kill. The open -call has the added advantage that 'The check for the existence of the file and -the creation of the file if it does not exist is atomic with respect to other -processes executing open naming the same filename in the same directory with -O_EXCL and O_CREAT set.' [from the UnixWare 7 man page, open(2)].
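The O_CREAT/O_EXCL approach described above could look roughly like the following. This is only a minimal sketch assuming a POSIX system; the function name acquire_pid_lock and its return codes are hypothetical and are not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not PostgreSQL code).
 * Returns  0 if we atomically created the lock file and wrote our PID,
 *          1 if the file exists and names a process that is still alive,
 *          2 if the file exists but looks stale or unreadable,
 *         -1 on any other error.
 */
int acquire_pid_lock(const char *path)
{
    assert(path != NULL);

    /* O_CREAT | O_EXCL makes "check for existence and create" one atomic step. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        char buf[32];
        int len = snprintf(buf, sizeof buf, "%ld\n", (long) getpid());
        ssize_t written = write(fd, buf, (size_t) len);
        close(fd);
        if (written != (ssize_t) len) {
            unlink(path);           /* do not leave a half-written lock behind */
            return -1;
        }
        return 0;
    }
    if (errno != EEXIST)
        return -1;

    /* The file already exists: read the PID it contains. */
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long pid = 0;
    int got = fscanf(fp, "%ld", &pid);
    fclose(fp);
    if (got != 1 || pid <= 0)
        return 2;

    /* Signal 0 performs the existence check without delivering a signal. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;
    return 2;
}
```

A caller that gets 2 back still has to decide how to remove the stale file safely; that removal is not atomic, and the sketch deliberately leaves it out.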
- -Also, you can't just delete the file, create it and write your PID to it -and assume that you have the lock; you need to close the file, sleep some -small amount of time and then open and read the file to see if you still have -the lock. If you like, I can take this task on. - -Oh, the postmaster must clear the PID when it exits. - -> -> Second, where to put the pid file. There is reason to put in /tmp, -> because it will get cleared in a reboot, and because it is locking the -> port number 5432. There is also reason to put it in /data because you -> can't have more than one postmaster running on a single data directory. -> -> So, we really want to lock both places. If this is going to make it -> easier for people to run more than one postmaster, because it will -> prevent/warn administrators when they try and put two postmasters in the -> same data dir or port, I say create the pid lock files both places, and -> give the admin a clear description of what he is doing wrong in each -> case. - -IMHO, the pid should be put in the data directory. The reasoning that it will get cleared in a reboot is not sufficient since the logic used to create the PID file will delete it if the PID it contains is not a running process. Besides, I have used systems where /tmp was not cleared out on a re-boot (for various reasons). Also, I would rather have a script that explicitly removes the PID locking file at system startup (if it exists), in which case, it doesn't matter where it resides. --- -____ | Billy G.
Allie | Domain....: Bill.Allie@mug.org -| /| | 7436 Hartwell | Compuserve: 76337,2061 -|-/-|----- | Dearborn, MI 48126| MSN.......: B_G_Allie@email.msn.com -|/ |LLIE | (313) 582-1540 | - - - -From owner-pgsql-general@hub.org Thu Oct 1 14:00:57 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA12443 - for ; Thu, 1 Oct 1998 14:00:56 -0400 (EDT) -Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA07930 for ; Thu, 1 Oct 1998 13:57:47 -0400 (EDT) -Received: from localhost (majordom@localhost) - by hub.org (8.8.8/8.8.8) with SMTP id NAA26913; - Thu, 1 Oct 1998 13:56:29 -0400 (EDT) - (envelope-from owner-pgsql-general@hub.org) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 01 Oct 1998 13:55:56 +0000 (EDT) -Received: (from majordom@localhost) - by hub.org (8.8.8/8.8.8) id NAA26856 - for pgsql-general-outgoing; Thu, 1 Oct 1998 13:55:54 -0400 (EDT) - (envelope-from owner-pgsql-general@postgreSQL.org) -X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-general@postgreSQL.org using -f -Received: from mail.utexas.edu (wb3-a.mail.utexas.edu [128.83.126.138]) - by hub.org (8.8.8/8.8.8) with SMTP id NAA26840 - for ; Thu, 1 Oct 1998 13:55:49 -0400 (EDT) - (envelope-from taral@mail.utexas.edu) -Received: (qmail 1198 invoked by uid 0); 1 Oct 1998 17:55:40 -0000 -Received: from dial-24-13.ots.utexas.edu (HELO taral) (128.83.128.157) - by umbs-smtp-3 with SMTP; 1 Oct 1998 17:55:40 -0000 -From: "Taral" -To: -Subject: [GENERAL] CNF vs DNF -Date: Thu, 1 Oct 1998 12:55:39 -0500 -Message-ID: <000001bded64$b34b2200$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -In-Reply-To: 
-Importance: Normal -Sender: owner-pgsql-general@postgreSQL.org -Precedence: bulk -Status: RO - -> select * from aa where (bb = 2 and ff = 3) or (bb = 4 and ff = 5); - -I've been told that the system restructures these in CNF (conjunctive normal -form)... i.e. the above query turns into: - -select * from aa where (bb = 2 or bb = 4) and (ff = 3 or bb = 4) and (bb = 2 -or ff = 5) and (ff = 3 or ff = 5); - -Much longer and much less efficient, AFAICT. Isn't it more efficient to do a -union of many queries (DNF) than an intersection of many subqueries (CNF)? -Certainly remembering the subqueries takes less memory... Also, queries -already in DNF are probably more common than queries in CNF, requiring less -rewrite. - -Can someone clarify this? - -Taral - - - -From taral@mail.utexas.edu Fri Oct 2 01:35:42 1998 -Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134]) - by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id BAA28231 - for ; Fri, 2 Oct 1998 01:35:27 -0400 (EDT) -Received: (qmail 16318 invoked by uid 0); 2 Oct 1998 05:35:13 -0000 -Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216) - by umbs-smtp-1 with SMTP; 2 Oct 1998 05:35:13 -0000 -From: "Taral" -To: "Bruce Momjian" -Cc: -Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF) -Date: Fri, 2 Oct 1998 00:35:12 -0500 -Message-ID: <000001bdedc6$6cf75d20$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -In-Reply-To: <199810020218.WAA23299@candle.pha.pa.us> -Importance: Normal -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Status: ROr - -> It currently convert to CNF so it can select the most restrictive -> restriction and join, and use those first. 
However, the CNF conversion -> is a memory exploder for some queries, and we certainly need to have -> another method to split up those queries into UNIONS. I think we need -> to code to identify those queries capable of being converted to UNIONS, -> and do that before the query gets to the CNF section. That would be -> great, and David Hartwig has implemented a limited capability of doing -> this, but we really need a general routine to do this with 100% -> reliability. - -Well, if you're talking about a routine to generate a heuristic for CNF vs. -DNF, it is possible to precalculate the query sizes for CNF and DNF -rewrites... - -For conversion to CNF: - -At every node: - -if nodeType = AND then f(node) = f(left) + f(right) -if nodeType = OR then f(node) = f(left) * f(right) - -f(root) = a reasonably (but not wonderful) metric - -For DNF just switch AND and OR in the above. You may want to compute both -metrics and compare... take the smaller one and use that path. - -How to deal with other operators depends on their implementation... - -Taral - - -From taral@mail.utexas.edu Fri Oct 2 12:48:27 1998 -Received: from mail.utexas.edu (wb4-a.mail.utexas.edu [128.83.126.140]) - by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id MAA11438 - for ; Fri, 2 Oct 1998 12:48:25 -0400 (EDT) -Received: (qmail 15628 invoked by uid 0); 2 Oct 1998 16:47:50 -0000 -Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216) - by umbs-smtp-4 with SMTP; 2 Oct 1998 16:47:50 -0000 -From: "Taral" -To: "Bruce Momjian" -Cc: -Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. 
DNF) -Date: Fri, 2 Oct 1998 11:47:48 -0500 -Message-ID: <000301bdee24$63308740$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -In-reply-to: <199810021640.MAA10925@candle.pha.pa.us> -Importance: Normal -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Status: RO - -> > Create a temporary oid hash? (for each table selected on, I guess) -> -> What I did with indexes was to run the previous OR clause index -> restrictions through the qualification code, and make sure it failed, -> but I am not sure how that is going to work with a more complex WHERE -> clause. Perhaps I need to restrict this to just simple cases of -> constants, which are easy to pick out an run through. Doing this with -> joins would be very hard, I think. - -Actually, I was thinking more of an index of returned rows... After each -subquery, the backend would check each row to see if it was already in the -index... Simple duplicate check, in other words. Of course, I don't know how -well this would behave with large tables being returned... - -Anyone else have some ideas they want to throw in? - -Taral - - -From taral@mail.utexas.edu Fri Oct 2 17:13:01 1998 -Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134]) - by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA20838 - for ; Fri, 2 Oct 1998 17:12:27 -0400 (EDT) -Received: (qmail 17418 invoked by uid 0); 2 Oct 1998 21:12:19 -0000 -Received: from dial-46-30.ots.utexas.edu (HELO taral) (128.83.112.158) - by umbs-smtp-1 with SMTP; 2 Oct 1998 21:12:19 -0000 -From: "Taral" -To: "Bruce Momjian" , -Cc: -Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. 
DNF) -Date: Fri, 2 Oct 1998 16:12:19 -0500 -Message-ID: <000001bdee49$56c7cd40$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -In-reply-to: <199810021758.NAA15524@candle.pha.pa.us> -Importance: Normal -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Status: ROr - -> Another idea is that we rewrite queries such as: -> -> SELECT * -> FROM tab -> WHERE (a=1 AND b=2 AND c=3) OR -> (a=1 AND b=2 AND c=4) OR -> (a=1 AND b=2 AND c=5) OR -> (a=1 AND b=2 AND c=6) -> -> into: -> -> SELECT * -> FROM tab -> WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6) - -Very nice, but that's like trying to code factorization of numbers... not -pretty, and very CPU intensive on complex queries... - -Taral - - -From taral@mail.utexas.edu Fri Oct 2 17:49:59 1998 -Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) - by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA21488 - for ; Fri, 2 Oct 1998 17:49:52 -0400 (EDT) -Received: (qmail 23729 invoked by uid 0); 2 Oct 1998 21:49:27 -0000 -Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22) - by umbs-smtp-2 with SMTP; 2 Oct 1998 21:49:27 -0000 -From: "Taral" -To: "Bruce Momjian" -Cc: , -Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF) -Date: Fri, 2 Oct 1998 16:49:26 -0500 -Message-ID: <000001bdee4e$86688b20$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -In-Reply-To: <199810022139.RAA21082@candle.pha.pa.us> -Importance: Normal -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Status: ROr - -> > Very nice, but that's like trying to code factorization of -> numbers... 
not -> > pretty, and very CPU intensive on complex queries... -> -> Yes, but how large are the WHERE clauses going to be? Considering the -> cost of cnfify() and UNION, it seems like a clear win. Is it general -> enough to solve our problems? - -Could be... the examples I received where the cnfify() was really bad were -cases where the query was submitted already in DNF... and where the UNION was -a simple one. However, I don't know of any algorithms for generic -simplification of logical constraints. One problem is resolution/selection -of factors: - -SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 2 AND c -= 3) OR (a = 1 AND b = 5 AND c = 3) OR (a = 1 AND b = 2 AND c = 6); - -Try that on for size. You can understand why that code gets ugly, fast. -Somebody could try coding it, but it's not a clear win to me. - -My original heuristic was missing one thing: "Where the heuristic fails to -process or decide, default to CNF." Since that's the current behavior, we're -less likely to break things.
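The size heuristic from earlier in this thread (add the child sizes at AND and multiply them at OR for CNF, the reverse for DNF, and fall back to CNF when undecided) might be sketched as below. The QualNode type and both function names are assumptions made for illustration; they are not the backend's Expr structures, and overflow of the counts on very large trees is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified qualification tree (not the backend's Expr node). */
typedef enum { NODE_LEAF, NODE_AND, NODE_OR } NodeKind;

typedef struct QualNode {
    NodeKind kind;
    const struct QualNode *left;
    const struct QualNode *right;
} QualNode;

/*
 * Estimated number of clauses after rewriting to CNF (to_cnf != 0)
 * or number of terms after rewriting to DNF (to_cnf == 0).
 * CNF: AND adds the child sizes, OR multiplies them; DNF is the mirror image.
 */
unsigned long rewrite_size(const QualNode *n, int to_cnf)
{
    assert(n != NULL);              /* the tree is assumed to be well formed */
    if (n->kind == NODE_LEAF)
        return 1;

    unsigned long l = rewrite_size(n->left, to_cnf);
    unsigned long r = rewrite_size(n->right, to_cnf);
    int multiply = to_cnf ? (n->kind == NODE_OR) : (n->kind == NODE_AND);
    return multiply ? l * r : l + r;
}

/* Returns 1 only when DNF is strictly smaller; ties default to CNF. */
int prefer_dnf(const QualNode *root)
{
    return rewrite_size(root, 0) < rewrite_size(root, 1);
}
```

For (bb = 2 AND ff = 3) OR (bb = 4 AND ff = 5) the estimate is four CNF clauses against two DNF terms, which matches the hand expansion shown earlier in the thread.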
- -Taral - - -From owner-pgsql-hackers@hub.org Fri Oct 2 19:28:09 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA23341 - for ; Fri, 2 Oct 1998 19:28:08 -0400 (EDT) -Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id SAA18003 for ; Fri, 2 Oct 1998 18:21:37 -0400 (EDT) -Received: from localhost (majordom@localhost) - by hub.org (8.8.8/8.8.8) with SMTP id SAA01250; - Fri, 2 Oct 1998 18:08:02 -0400 (EDT) - (envelope-from owner-pgsql-hackers@hub.org) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 02 Oct 1998 18:04:37 +0000 (EDT) -Received: (from majordom@localhost) - by hub.org (8.8.8/8.8.8) id SAA00847 - for pgsql-hackers-outgoing; Fri, 2 Oct 1998 18:04:35 -0400 (EDT) - (envelope-from owner-pgsql-hackers@postgreSQL.org) -X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f -Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) - by hub.org (8.8.8/8.8.8) with SMTP id SAA00806 - for ; Fri, 2 Oct 1998 18:04:26 -0400 (EDT) - (envelope-from taral@mail.utexas.edu) -Received: (qmail 29662 invoked by uid 0); 2 Oct 1998 22:04:25 -0000 -Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22) - by umbs-smtp-2 with SMTP; 2 Oct 1998 22:04:25 -0000 -From: "Taral" -To: "Bruce Momjian" -Cc: , -Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. 
DNF) -Date: Fri, 2 Oct 1998 17:04:24 -0500 -Message-ID: <000201bdee50$9d9c4320$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -In-Reply-To: <199810022157.RAA21769@candle.pha.pa.us> -Importance: Normal -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Sender: owner-pgsql-hackers@postgreSQL.org -Precedence: bulk -Status: ROr - -> How do we do that with UNION, and return the right rows. Seems the -> _join_ happening multiple times would be much worse than the factoring. - -Ok... We have two problems: - -1) DNF for unjoined queries. -2) Factorization for the rest. - -I have some solutions for (1). Not for (2). Remember that unjoined queries -are quite common. :) - -For (1), we can always try to parallel the multiple queries... especially in -the case where a sequential search is required. - -Taral - - - -From owner-pgsql-hackers@hub.org Sat Oct 3 23:32:35 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06644 - for ; Sat, 3 Oct 1998 23:31:13 -0400 (EDT) -Received: from hub.org (root@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id XAA26912 for ; Sat, 3 Oct 1998 23:14:01 -0400 (EDT) -Received: from localhost (majordom@localhost) - by hub.org (8.8.8/8.8.8) with SMTP id WAA04407; - Sat, 3 Oct 1998 22:07:05 -0400 (EDT) - (envelope-from owner-pgsql-hackers@hub.org) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sat, 03 Oct 1998 22:02:00 +0000 (EDT) -Received: (from majordom@localhost) - by hub.org (8.8.8/8.8.8) id WAA04010 - for pgsql-hackers-outgoing; Sat, 3 Oct 1998 22:01:59 -0400 (EDT) - (envelope-from owner-pgsql-hackers@postgreSQL.org) -X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f -Received: from
candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) - by hub.org (8.8.8/8.8.8) with ESMTP id WAA03968 - for ; Sat, 3 Oct 1998 22:00:37 -0400 (EDT) - (envelope-from maillist@candle.pha.pa.us) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.9.0/8.9.0) id VAA04640; - Sat, 3 Oct 1998 21:57:30 -0400 (EDT) -From: Bruce Momjian -Message-Id: <199810040157.VAA04640@candle.pha.pa.us> -Subject: Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF) -In-Reply-To: <000201bdee50$9d9c4320$3b291f0a@taral> from Taral at "Oct 2, 1998 5: 4:24 pm" -To: taral@mail.utexas.edu (Taral) -Date: Sat, 3 Oct 1998 21:57:30 -0400 (EDT) -Cc: jwieck@debis.com, hackers@postgreSQL.org -X-Mailer: ELM [version 2.4ME+ PL47 (25)] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-pgsql-hackers@postgreSQL.org -Precedence: bulk -Status: RO - - -I have another idea. - -When we cnfify, this: - - (A AND B) OR (C AND D) - -becomes - - (A OR C) AND (A OR D) AND (B OR C) AND (B OR D) - -however if A and C are identical, this could become: - - (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) - -and A OR A is A: - - A AND (A OR D) AND (B OR A) AND (B OR D) - -and since we are now saying A has to be true, we can remove OR's with A: - - A AND (B OR D) - -Much smaller, and a big win for queries like: - - SELECT * - FROM tab - WHERE (a=1 AND b=2) OR - (a=1 AND b=3) - -This becomes: - - (a=1) AND (b=2 OR b=3) - -which is accurate, and uses our OR indexing. - -Seems I could code cnfify() to look for identical qualifications in two -joined OR clauses and remove the duplicates. - -Sound like big win, and fairly easy and inexpensive in processing time. - -Comments? - --- - Bruce Momjian | http://www.op.net/~candle - maillist@candle.pha.pa.us | (610) 853-3000 - + If your life is a hard drive, | 830 Blythe Avenue - + Christ can be your backup. 
| Drexel Hill, Pennsylvania 19026 - - - -From taral@mail.utexas.edu Sat Oct 3 22:43:41 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA05961 - for ; Sat, 3 Oct 1998 22:42:18 -0400 (EDT) -Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) by renoir.op.net (o1/$ Revision: 1.18 $) with SMTP id WAA25111 for ; Sat, 3 Oct 1998 22:27:34 -0400 (EDT) -Received: (qmail 25622 invoked by uid 0); 4 Oct 1998 02:26:21 -0000 -Received: from dial-42-9.ots.utexas.edu (HELO taral) (128.83.111.217) - by umbs-smtp-2 with SMTP; 4 Oct 1998 02:26:21 -0000 -From: "Taral" -To: "Bruce Momjian" -Cc: , -Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF) -Date: Sat, 3 Oct 1998 21:26:20 -0500 -Message-ID: <000501bdef3e$5f5293a0$3b291f0a@taral> -MIME-Version: 1.0 -Content-Type: text/plain; - charset="iso-8859-1" -Content-Transfer-Encoding: 7bit -X-Priority: 3 (Normal) -X-MSMail-Priority: Normal -X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 -Importance: Normal -In-Reply-To: <199810040157.VAA04640@candle.pha.pa.us> -X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 -Status: ROr - -> however if A and C are identical, this could become: -> -> (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) -> -> and A OR A is A: -> -> A AND (A OR D) AND (B OR A) AND (B OR D) -> -> and since we are now saying A has to be true, we can remove OR's with A: -> -> A AND (B OR D) - -Very nice... and you could do that after each iteration of the rewrite, -preventing the size from getting too big. :) - -I have a symbolic expression tree evaluator that would be perfect for -this... I'll see if I can't adapt it. - -Can someone mail me the structures for expression trees? I don't want to -have to excise them from the source. Please? 
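The simplification quoted above (A OR A collapses to A, and any OR clause containing a clause that already stands alone can be dropped) amounts to removing absorbed clauses from the CNF list. A rough sketch follows, assuming each OR clause is encoded as a bitmask of atom numbers; the encoding and the name absorb_clauses are hypothetical and not part of cnfify().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical encoding: each CNF clause is a bitmask in which bit i set
 * means "atom i appears in this OR clause". Duplicate atoms inside a clause
 * (A OR A) collapse automatically because a bit can only be set once.
 * The number of distinct atoms is limited by the width of unsigned.
 *
 * Copies to out[] every clause of in[] that is not absorbed by another
 * clause, and returns how many were kept. A clause is absorbed when some
 * other clause is a subset of it: X AND (X OR Y) is just X.
 */
size_t absorb_clauses(const unsigned *in, size_t n, unsigned *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int redundant = 0;

        for (size_t j = 0; j < n && !redundant; j++) {
            if (j == i)
                continue;
            int is_subset = (in[j] & in[i]) == in[j];
            /* Of several identical clauses, keep only the first one. */
            if (is_subset && (in[j] != in[i] || j < i))
                redundant = 1;
        }
        if (!redundant)
            out[kept++] = in[i];
    }
    return kept;
}
```

With A, B and D as bits 0, 1 and 2, the four clauses (A OR A), (A OR D), (B OR A), (B OR D) reduce to A and (B OR D), as in the quoted example. The pairwise comparison is quadratic in the number of clauses, so running it after each rewrite step, as suggested above, is what keeps that number small.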
- -Taral - - -From daveh@insightdist.com Mon Nov 9 13:31:07 1998 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA00997 - for ; Mon, 9 Nov 1998 13:31:00 -0500 (EST) -Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA26657 for ; Mon, 9 Nov 1998 13:10:14 -0500 (EST) -Received: from insightdist.com (nobody@localhost) - by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA17710 - for maillist@candle.pha.pa.us; Mon, 9 Nov 1998 12:52:05 -0500 (EST) -X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f -Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03) - id AA43498; Mon, 9 Nov 1998 12:38:24 -0500 -Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03) - id AA54446; Mon, 9 Nov 1998 12:38:23 -0500 -Message-Id: <3647296F.6F7FDDD2@insightdist.com> -Date: Mon, 09 Nov 1998 12:42:07 -0500 -From: David Hartwig -Organization: Insight Distribution Systems -X-Mailer: Mozilla 4.5 [en] (Win98; I) -X-Accept-Language: en -Mime-Version: 1.0 -To: Bob Kruger , - Bruce Momjian -Cc: pgsql-general@postgreSQL.org, Byron Nikolaidis -Subject: Re: [GENERAL] Incrementing a Serial Field -References: <3.0.5.32.19981109110757.0082c950@mindspring.com> -Content-Type: multipart/mixed; - boundary="------------3D3EE7F67DFC542D3928BB7E" -Status: ROr - -This is a multi-part message in MIME format. ---------------3D3EE7F67DFC542D3928BB7E -Content-Type: multipart/alternative; - boundary="------------43E2CC34278FA08EFC9E0611" - - ---------------43E2CC34278FA08EFC9E0611 -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit - - - -Bob Kruger wrote: - -> The second question is that I noticed the ODBC bug (feature?) when linking -> Postgres to MS Access still exists. This bug occurs when linking a MS -> Access table to a Postgres table, and identifying more than one field as -> the unique record identifier. 
This makes Postgres run until it exhausts -> all available memory. Does anyone know a way around this? Enabling read -> only ODBC is a feature I would like to make available, but I do not want -> the possibility of postgres crashing because of an error on the part of a -> MS Access user. -> -> BTW - Having capability to be linked to an Access database is not an -> option. The current project I am working on calls for that, so it is a -> necessary evil that I have to live with. -> - -In the driver connection settings add the following line. - - SET ksqo TO 'on'; - -Stands for: keyset query optimization. This is not considered a final -solution. As such, it is undocumented. Some time in the next day or so, we -will be releasing a version of the driver which will automatically SET ksqo. - -You will most likely be satisfied with the results. One problem with this -solution, however, is that it does not work if you have any (some kinds of?) -arrays in the table you are browsing. This is a side effect of the rewrite to a -UNION which performs an internal sort unique. - -Also, if you are using row versioning you may need to overload some operators -for xid and int4. I have included a script that will take care of this. - -Bruce, can I get these operators hardcoded into 6.4.1, assuming there will be -one? The operators are necessitated by the UNION side effects. - - ---------------43E2CC34278FA08EFC9E0611 -Content-Type: text/html; charset=us-ascii -Content-Transfer-Encoding: 7bit - - - -  -

  - ---------------43E2CC34278FA08EFC9E0611-- - ---------------3D3EE7F67DFC542D3928BB7E -Content-Type: text/plain; charset=us-ascii; - name="xidint4.sql" -Content-Transfer-Encoding: 7bit -Content-Disposition: inline; - filename="xidint4.sql" - --- Insight Distribution Systems - System V - Apr 1998 --- @(#)xidint4.sql 1.2 :/sccs/sql/extend/s.xidint4.sql 10/2/98 13:40:19" - -create function int4eq(xid,int4) - returns bool - as '' - language 'internal'; - -create operator = ( - leftarg=xid, - rightarg=int4, - procedure=int4eq, - commutator='=', - negator='<>', - restrict=eqsel, - join=eqjoinsel - ); - -create function int4lt(xid,xid) - returns bool - as '' - language 'internal'; - -create operator < ( - leftarg=xid, - rightarg=xid, - procedure=int4lt, - commutator='=', - negator='<>', - restrict=eqsel, - join=eqjoinsel - ); - - - ---------------3D3EE7F67DFC542D3928BB7E-- - - diff --git a/doc/TODO.detail/drop b/doc/TODO.detail/drop index f4001b4ab0..ae87fe3b30 100644 --- a/doc/TODO.detail/drop +++ b/doc/TODO.detail/drop @@ -2,7 +2,7 @@ From pgsql-hackers-owner+M3040@hub.org Thu Jun 8 00:31:01 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA13157 for ; Thu, 8 Jun 2000 00:31:00 -0400 (EDT) -Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id AAA01089 for ; Thu, 8 Jun 2000 00:17:19 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA01089 for ; Thu, 8 Jun 2000 00:17:19 -0400 (EDT) Received: from hub.org (majordom@localhost [127.0.0.1]) by hub.org (8.10.1/8.10.1) with SMTP id e5846ib99782; Thu, 8 Jun 2000 00:06:44 -0400 (EDT) @@ -280,7 +280,7 @@ From Inoue@tpf.co.jp Sat Jun 10 01:01:01 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA10355 for ; Sat, 10 Jun 2000 01:01:00 -0400 (EDT) -Received: from 
sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id AAA25467 for ; Sat, 10 Jun 2000 00:41:32 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA25467 for ; Sat, 10 Jun 2000 00:41:32 -0400 (EDT) Received: from mcadnote1 (ppm110.noc.fukui.nsk.ne.jp [210.161.188.29] (may be forged)) by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP id NAA03125; Sat, 10 Jun 2000 13:40:40 +0900 @@ -411,7 +411,7 @@ From tgl@sss.pgh.pa.us Sat Jun 10 01:31:04 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA10922 for ; Sat, 10 Jun 2000 01:31:03 -0400 (EDT) -Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id BAA27265 for ; Sat, 10 Jun 2000 01:16:07 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA27265 for ; Sat, 10 Jun 2000 01:16:07 -0400 (EDT) Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA06206; Sat, 10 Jun 2000 01:14:37 -0400 (EDT) @@ -457,7 +457,7 @@ From dhogaza@pacifier.com Sat Jun 10 09:30:59 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA25987 for ; Sat, 10 Jun 2000 09:30:58 -0400 (EDT) -Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id JAA18716 for ; Sat, 10 Jun 2000 09:15:08 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA18716 for ; Sat, 10 Jun 2000 09:15:08 -0400 (EDT) Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id GAA15799; Sat, 10 Jun 2000 
06:14:28 -0700 (PDT) @@ -509,7 +509,7 @@ From tgl@sss.pgh.pa.us Sun Jun 11 12:31:03 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA05771 for ; Sun, 11 Jun 2000 12:31:01 -0400 (EDT) -Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id MAA19315 for ; Sun, 11 Jun 2000 12:24:06 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA19315 for ; Sun, 11 Jun 2000 12:24:06 -0400 (EDT) Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA09503; Sun, 11 Jun 2000 12:22:42 -0400 (EDT) @@ -778,3 +778,64 @@ jdavis@dynworks.com http://dynworks.com +From owner-pgsql-hackers@hub.org Sat Feb 26 01:07:45 2000 +Received: from hub.org (hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA17776 + for ; Sat, 26 Feb 2000 01:07:43 -0500 (EST) +Received: from localhost (majordom@localhost) + by hub.org (8.9.3/8.9.3) with SMTP id BAA06232; + Sat, 26 Feb 2000 01:03:53 -0500 (EST) + (envelope-from owner-pgsql-hackers) +Received: by hub.org (bulk_mailer v1.5); Sat, 26 Feb 2000 01:03:26 -0500 +Received: (from majordom@localhost) + by hub.org (8.9.3/8.9.3) id BAA05808 + for pgsql-hackers-outgoing; Sat, 26 Feb 2000 01:02:28 -0500 (EST) + (envelope-from owner-pgsql-hackers@postgreSQL.org) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.9.3/8.9.3) with ESMTP id BAA05426 + for ; Sat, 26 Feb 2000 01:01:46 -0500 (EST) + (envelope-from tgl@sss.pgh.pa.us) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA14228; + Sat, 26 Feb 2000 01:01:34 -0500 (EST) +To: Bruce Momjian +cc: Peter Eisentraut , + PostgreSQL Development +Subject: Re: [HACKERS] ALTER TABLE DROP COLUMN +In-reply-to: 
<200002260412.XAA14752@candle.pha.pa.us> +References: <200002260412.XAA14752@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Fri, 25 Feb 2000 23:12:26 -0500" +Date: Sat, 26 Feb 2000 01:01:33 -0500 +Message-ID: <14225.951544893@sss.pgh.pa.us> +From: Tom Lane +Sender: owner-pgsql-hackers@postgreSQL.org +Status: ORr + +Bruce Momjian writes: +> You can exclusively lock the table, then do a heap_getnext() scan over +> the entire table, remove the dropped column, do a heap_insert(), then a +> heap_delete() on the current tuple, making sure to skip over the tuples +> inserted by the current transaction. When completed, remove the column +> from pg_attribute, mark the transaction as committed (if desired), and +> run vacuum over the table to remove the deleted rows. + +Hmm, that would work --- the new tuples commit at the same instant that +the schema updates commit, so it should be correct. You have the 2x +disk usage problem, but there's no way around that without losing +rollback ability. + +A potentially tricky bit will be persuading the tuple-reading and tuple- +writing subroutines to pay attention to different versions of the tuple +structure for the same table. I haven't looked to see if this will be +difficult or not. If you can pass the TupleDesc explicitly then it +shouldn't be a problem. + +I'd suggest that the cleanup vacuum *not* be an automatic part of +the operation; just recommend that people do it ASAP after dropping +a column. Consider needing to drop several columns... 
+ + regards, tom lane + +************ + diff --git a/doc/TODO.detail/pglog b/doc/TODO.detail/pglog deleted file mode 100644 index 1810a8911f..0000000000 --- a/doc/TODO.detail/pglog +++ /dev/null @@ -1,2900 +0,0 @@ -From aoki@postgres.Berkeley.EDU Sun Jun 22 19:31:06 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA19488 - for ; Sun, 22 Jun 1997 19:31:03 -0400 (EDT) -Received: from faerie.CS.Berkeley.EDU (faerie.CS.Berkeley.EDU [128.32.37.53]) by renoir.op.net ($ Revision: 1.12 $) with SMTP id TAA18795 for ; Sun, 22 Jun 1997 19:18:06 -0400 (EDT) -Received: from localhost.Berkeley.EDU (localhost.Berkeley.EDU [127.0.0.1]) by faerie.CS.Berkeley.EDU (8.6.10/8.6.3) with SMTP id QAA07816 for maillist@candle.pha.pa.us; Sun, 22 Jun 1997 16:16:44 -0700 -Message-Id: <199706222316.QAA07816@faerie.CS.Berkeley.EDU> -X-Authentication-Warning: faerie.CS.Berkeley.EDU: Host localhost.Berkeley.EDU didn't use HELO protocol -From: aoki@CS.Berkeley.EDU (Paul M. Aoki) -To: Bruce Momjian -Subject: Re: PostgreSQL psort() function performance -Reply-To: aoki@CS.Berkeley.EDU (Paul M. Aoki) -In-reply-to: Your message of Sun, 22 Jun 1997 09:45:31 -0400 (EDT) - <199706221345.JAA11476@candle.pha.pa.us> -Date: Sun, 22 Jun 97 16:16:43 -0700 -Sender: aoki@postgres.Berkeley.EDU -X-Mts: smtp -Status: OR - -the mariposa distribution (http://mariposa.cs.berkeley.edu/) contains -some hacks to nodeSort.c and psort.c that - - make psort read directly from the executor node below it - (instead of an input relation) - - makes the Sort node read directly from the last set of psort runs - (instead of an output relation) -speeds things up quite a bit. kind of ruins psort for other purposes, -though (which is why nbtsort.c exists). - -i'd merge these in first and see how far that gets you. --- - Paul M. Aoki | University of California at Berkeley - aoki@CS.Berkeley.EDU | Dept. 
of EECS, Computer Science Division #1776 - | Berkeley, CA 94720-1776 - -From owner-pgsql-hackers@hub.org Mon Nov 3 09:31:04 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01676 - for ; Mon, 3 Nov 1997 09:31:02 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA07345 for ; Mon, 3 Nov 1997 09:13:20 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA13315; Mon, 3 Nov 1997 08:50:26 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:48:07 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA11722 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:48:02 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA11539 for ; Mon, 3 Nov 1997 08:47:34 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id UAA19066; Mon, 3 Nov 1997 20:48:04 +0700 (KRS) -Message-ID: <345DD614.345BF651@sable.krasnoyarsk.su> -Date: Mon, 03 Nov 1997 20:48:04 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Marc Howard Zuckman -CC: Bruce Momjian , hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Marc Howard Zuckman wrote: -> -> On Mon, 3 Nov 1997, Bruce Momjian wrote: -> -> > With fsync off, I just did an insert of 1000 integers into a table -> > containing a single int4 column and no indexes, and it completed in 2.3 -> > seconds. This is on the new source tree.. That is 434 inserts/second. 
-> > Pretty major performance, or 2.3 ms/insert. This is on a idle PP200 -> > with UltraSCSI drives. -> > -> > With fsync on, the time goes to 51 seconds. Wow, big difference. -> -> If better alternative error recovery methods were available, perhaps -> a facility to replay an interval transactions log from a prior dump, -> it would be reasonable to run the backend without fsync and -> take advantage of the performance gains. - -??? - -> -> I don't know the answer, but I suspect that the commercial databases -> don't "fsync" the way pgsql does. - -Could someone try 1000 int4 inserts using postgres and -some commercial database (on the same machine) ? - -Vadim - - -From owner-pgsql-hackers@hub.org Mon Nov 3 09:01:02 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA01183 - for ; Mon, 3 Nov 1997 09:01:00 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id IAA06632 for ; Mon, 3 Nov 1997 08:51:58 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id IAA05964; Mon, 3 Nov 1997 08:39:39 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 08:37:32 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id IAA04729 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 08:37:26 -0500 (EST) -Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id IAA04614 for ; Mon, 3 Nov 1997 08:37:16 -0500 (EST) -Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id JAA22108; Mon, 3 Nov 1997 09:11:09 -0500 -Date: Mon, 3 Nov 1997 09:11:09 -0500 (EST) -From: Marc Howard Zuckman -To: Bruce Momjian -cc: "Vadim B. Mikheev" , hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! 
-In-Reply-To: <199711030513.AAA23474@candle.pha.pa.us> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -On Mon, 3 Nov 1997, Bruce Momjian wrote: - -> > -> > Removed... -> > -> > Also, ItemPointerData t_chain (6 bytes) removed from HeapTupleHeader. -> > CommandId is uint32 now (up to the 2^32 - 1 commands per transaction). -> > DOUBLEALIGN(Sizeof(HeapTupleHeader)) is 40 bytes now. -> > -> > 1000 inserts (into table with single int4 column, 1 insert per transaction) -> > takes 70 - 80 sec now (12.5 - 14 transactions/sec). -> > This is hardware/OS limitation: -> > -> > fd = open ("t", O_RDWR); -> > for (i = 1; i <= 1000; i++) -> > { -> > lseek(fd, 0, SEEK_END); -> > write(fd, buf, 56); -> > fsync(fd); -> > } -> > close (fd); -> > -> > takes 33 - 39 sec and so it's not possible to be faster -> > having 2 fsync-s per transaction. -> > -> > The same test on 6.2.1: 92 - 107 sec -> -> With fsync off, I just did an insert of 1000 integers into a table -> containing a single int4 column and no indexes, and it completed in 2.3 -> seconds. This is on the new source tree.. That is 434 inserts/second. -> Pretty major performance, or 2.3 ms/insert. This is on a idle PP200 -> with UltraSCSI drives. -> -> With fsync on, the time goes to 51 seconds. Wow, big difference. - -If better alternative error recovery methods were available, perhaps -a facility to replay an interval transactions log from a prior dump, -it would be reasonable to run the backend without fsync and -take advantage of the performance gains. - -I don't know the answer, but I suspect that the commercial databases -don't "fsync" the way pgsql does. - -Marc Zuckman -marc@fallon.classyad.com - -_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ -_ Visit The Home and Condo MarketPlace _ -_ http://www.ClassyAd.com _ -_ _ -_ FREE basic property listings/advertisements and searches. 
_ -_ _ -_ Try our premium, yet inexpensive services for a real _ -_ selling or buying edge! _ -_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ - - - -From owner-pgsql-hackers@hub.org Mon Nov 3 11:31:03 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA04080 - for ; Mon, 3 Nov 1997 11:31:00 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA13680 for ; Mon, 3 Nov 1997 11:21:30 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA07566; Mon, 3 Nov 1997 11:04:52 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:02:59 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA07372 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:02:52 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id LAA07196 for ; Mon, 3 Nov 1997 11:02:22 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id KAA02525; - Mon, 3 Nov 1997 10:42:03 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711031542.KAA02525@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Mon, 3 Nov 1997 10:42:03 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <345DD614.345BF651@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 08:48:04 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> > I don't know the answer, but I suspect that the commercial databases -> > don't "fsync" the way pgsql does. 
-> -> Could someone try 1000 int4 inserts using postgres and -> some commercial database (on the same machine) ? - -I have been thinking about this since seeing the performance change -with/without fsync. - -Commercial databases usually do a log write every 5 or 15 minutes, and -guarantee the logs will contain everything up to this time interval. - -Couldn't we have some such mechanism? Usually they have raw space, so -they can control when the data is hitting the disk. Using a file -system, some of it may be getting to the disk without our knowing it. - -What exactly is a scenario where lack of doing explicit fsync's will -cause data corruption, rather than just lost data from the past few -minutes? - -I think Vadim has gotten fsync's down to fsync'ing the modified data -page, and pg_log. - -Let's suppose we did not fsync. There could be cases where pg_log was -fsync'ed by the OS, and some of the modified data pages are fsync'ed by -the OS, but not others. This would leave us with a partial transaction. - -However, let's suppose we prevent pg_log from being fsync'ed somehow. -Then, because we have a no-overwrite database, we could keep control of -this, and a write of some data pages but not others would not cause us -problems because the pg_log would show all such transactions, which had -not had all their modified data pages fsync'ed, as non-committed. - -Perhaps we can even set a flag in pg_log every five minutes to indicate -whether all buffers for the page have been flushed? That way we would -not have to worry about preventing flushing of pg_log. - -Comments? 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Mon Nov 3 12:00:42 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA04456 - for ; Mon, 3 Nov 1997 12:00:40 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id LAA26054; Mon, 3 Nov 1997 11:46:49 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 11:46:33 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id LAA25932 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 11:46:30 -0500 (EST) -Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id LAA25750 for ; Mon, 3 Nov 1997 11:45:53 -0500 (EST) -Received: by orion.SAPserv.Hamburg.dsh.de - (Linux Smail3.1.29.1 #1)} - id m0xSPfE-000BGZC; Mon, 3 Nov 97 17:47 MET -Message-Id: -From: wieck@sapserv.debis.de -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Mon, 3 Nov 1997 17:47:43 +0100 (MET) -Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, - hackers@postgreSQL.org -Reply-To: wieck@sapserv.debis.de (Jan Wieck) -In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=iso-8859-1 -Content-Transfer-Encoding: 8bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> > > I don't know the answer, but I suspect that the commercial databases -> > > don't "fsync" the way pgsql does. -> > -> > Could someone try 1000 int4 inserts using postgres and -> > some commercial database (on the same machine) ? -> -> I have been thinking about this since seeing the performance change -> with/without fsync. 
-> -> Commercial databases usually do a log write every 5 or 15 minutes, and -> guarantee the logs will contain everything up to this time interval. -> - - Without fsync PostgreSQL would only lose data if the OS - crashes between the last write operation of a backend and the - next regular update sync. This is rare but if it happens it - really hurts. - - A database can omit fsync on data files (e.g. tablespaces) if - it writes a redo log. With that redo log, a backup can be - restored and then all transactions since the backup redone. - - PostgreSQL doesn't write such a redo log. So an OS crash - after the fsync of pg_log could corrupt the database without - a chance to recover. - - Isn't it time to get an (optional) redo log? I don't exactly - know all the places where our datafiles can get modified, but - I hope this is only done in the heap access methods and - vacuum. So these are the places from where the redo log data - comes from (plus transaction commit/rollback). - - -Until later, Jan - --- -#define OPINIONS "they are all mine - not those of debis or daimler-benz" - -#======================================================================# -# It's easier to get forgiveness for being wrong than for being right. # -# Let's break this rule - forgive me. 
# -#================================== wieck@sapserv.debis.de (Jan Wieck) # - - - - -From owner-pgsql-hackers@hub.org Mon Nov 3 14:01:06 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA06775 - for ; Mon, 3 Nov 1997 14:01:04 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA22235 for ; Mon, 3 Nov 1997 13:43:15 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA11482; Mon, 3 Nov 1997 13:32:40 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 13:32:02 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA11204 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 13:31:58 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA11119 for ; Mon, 3 Nov 1997 13:31:44 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id MAA05464; - Mon, 3 Nov 1997 12:59:01 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711031759.MAA05464@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: wieck@sapserv.debis.de -Date: Mon, 3 Nov 1997 12:59:01 -0500 (EST) -Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, - hackers@postgreSQL.org -In-Reply-To: from "wieck@sapserv.debis.de" at Nov 3, 97 05:47:43 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> > -> > > > I don't know the answer, but I suspect that the commercial databases -> > > > don't "fsync" the way pgsql does. -> > > -> > > Could someone try 1000 int4 inserts using postgres and -> > > some commercial database (on the same machine) ? 
-> > -> > I have been thinking about this since seeing the performance change -> > with/without fsync. -> > -> > Commerical databases usually do a log write every 5 or 15 minutes, and -> > guarantee the logs will contain everything up to this time interval. -> > -> -> Without fsync PostgreSQL would only loose data if the OS -> crashes between the last write operation of a backend and the -> next regular update sync. This is seldom but if it happens it -> really hurts. -> -> A database can omit fsync on data files (e.g. tablespaces) if -> it writes a redo log. With that redo log, a backup can be -> restored and than all transactions since the backup redone. -> -> PostgreSQL doesn't write such a redo log. So an OS crash -> after the fsync of pg_log could corrupt the database without -> a chance to recover. -> -> Isn't it time to get an (optional) redo log. I don't exactly -> know all the places where our datafiles can get modified, but -> I hope this is only done in the heap access methods and -> vacuum. So these are the places from where the redo log data -> comes from (plus transaction commit/rollback). -> - -Yes, but because we are a non-over-write database, I don't see why we -can't just do this without a redo log. - -Every five minutes, we fsync() all dirty pages, mark all completed -transactions as fsync'ed in pg_log, and fsync() pg_log. - -On postmaster startup, any transaction marked as completed, but not -marked as fsync'ed gets marked as aborted. - -Of course, all vacuum operations would have to be fsync'ed. - -Comments? 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Mon Nov 3 16:46:01 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA10292 - for ; Mon, 3 Nov 1997 16:45:59 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id QAA02040 for ; Mon, 3 Nov 1997 16:42:40 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA17422; Mon, 3 Nov 1997 16:34:28 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 16:34:10 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA17210 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 16:34:06 -0500 (EST) -Received: from fallon.classyad.com (root@classyad.com [152.160.43.1]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA16690 for ; Mon, 3 Nov 1997 16:33:27 -0500 (EST) -Received: from fallon.classyad.com (marc@fallon.classyad.com [152.160.43.1]) by fallon.classyad.com (8.8.5/8.7.3) with SMTP id RAA32498; Mon, 3 Nov 1997 17:33:42 -0500 -Date: Mon, 3 Nov 1997 17:33:42 -0500 (EST) -From: Marc Howard Zuckman -To: Bruce Momjian -cc: wieck@sapserv.debis.de, vadim@sable.krasnoyarsk.su, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -In-Reply-To: <199711031759.MAA05464@candle.pha.pa.us> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -On Mon, 3 Nov 1997, Bruce Momjian wrote: - -> > -> > > -> > > > > I don't know the answer, but I suspect that the commercial databases -> > > > > don't "fsync" the way pgsql does. -> > > > -> > > > Could someone try 1000 int4 inserts using postgres and -> > > > some commercial database (on the same machine) ? -> > > -> > > I have been thinking about this since seeing the performance change -> > > with/without fsync. 
-> > > -> > > Commerical databases usually do a log write every 5 or 15 minutes, and -> > > guarantee the logs will contain everything up to this time interval. -> > > -> > -> > Without fsync PostgreSQL would only loose data if the OS -> > crashes between the last write operation of a backend and the -> > next regular update sync. This is seldom but if it happens it -> > really hurts. -> > -> > A database can omit fsync on data files (e.g. tablespaces) if -> > it writes a redo log. With that redo log, a backup can be -> > restored and than all transactions since the backup redone. -> > -> > PostgreSQL doesn't write such a redo log. So an OS crash -> > after the fsync of pg_log could corrupt the database without -> > a chance to recover. -> > -> > Isn't it time to get an (optional) redo log. I don't exactly -> > know all the places where our datafiles can get modified, but -> > I hope this is only done in the heap access methods and -> > vacuum. So these are the places from where the redo log data -> > comes from (plus transaction commit/rollback). -> > -> -> Yes, but because we are a non-over-write database, I don't see why we -> can't just do this without a redo log. - -Because if the hard drive is the reason for the failure (instead of -power out, OS bites dust, etc), the database won't be of much help. - -The redo log should be on a device different than the database. - -Marc Zuckman -marc@fallon.classyad.com - -_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ -_ Visit The Home and Condo MarketPlace _ -_ http://www.ClassyAd.com _ -_ _ -_ FREE basic property listings/advertisements and searches. _ -_ _ -_ Try our premium, yet inexpensive services for a real _ -_ selling or buying edge! 
_ -_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ - - - -From maillist Mon Nov 3 22:59:31 1997 -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id WAA16264; - Mon, 3 Nov 1997 22:59:31 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711040359.WAA16264@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Mon, 3 Nov 1997 22:59:30 -0500 (EST) -Cc: vadim@sable.krasnoyarsk.su, marc@fallon.classyad.com, - hackers@postgreSQL.org -In-Reply-To: <199711031542.KAA02525@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:42:03 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Status: OR - -> -> > > I don't know the answer, but I suspect that the commercial databases -> > > don't "fsync" the way pgsql does. -> > -> > Could someone try 1000 int4 inserts using postgres and -> > some commercial database (on the same machine) ? -> -> I have been thinking about this since seeing the performance change -> with/without fsync. -> -> Commercial databases usually do a log write every 5 or 15 minutes, and -> guarantee the logs will contain everything up to this time interval. -> -> Couldn't we have some such mechanism? Usually they have raw space, so -> they can control when the data is hitting the disk. Using a file -> system, some of it may be getting to the disk without our knowing it. -> -> What exactly is a scenario where lack of doing explicit fsync's will -> cause data corruption, rather than just lost data from the past few -> minutes? -> -> I think Vadim has gotten fsync's down to fsync'ing the modified data -> page, and pg_log. -> -> Let's suppose we did not fsync. There could be cases where pg_log was -> fsync'ed by the OS, and some of the modified data pages are fyncs'ed by -> the OS, but not others. This would leave us with a partial transaction. 
-> -> However, let's suppose we prevent pg_log from being fsync'ed somehow. -> Then, because we have a no-overwrite database, we could keep control of -> this, and write of some data pages, but not others would not cause us -> problems because the pg_log would show all such transactions, which had -> not had all their modified data pages fsync'ed, as non-committed. -> -> Perhaps we can even set a flag in pg_log every five minutes to indicate -> whether all buffers for the page have been flushed? That way we could -> not have to worry about preventing flushing of pg_log. -> -> Comments? - -OK, here is a more formal description of what I am suggesting. It will -give us commercial dbms reliability with no-fsync performance. -Commercial dbms's usually only give restore up to 5 minutes before the -crash, and this is what I am suggesting. If we can do this, we can -remove the no-fsync option. - -First, lets suppose there exists a shared queue that is visible to all -backends and the postmaster that allows transaction id's to be added to -the queue. We also add a bit to the pg_log record called 'been_synced' -that is initially false. - -OK, once a backend starts a transaction, it puts a transaction id in -pg_log. Once the transaction is finished, it is marked as committed. -At the same time, we now put the transaction id on the shared queue. - -Every five minutes, or as defined by the administrator, the postmaster -does a sync() call. On my OS, anyone use can call sync, and I think -this is typical. update/pagecleaner does this every 30 seconds anyway, -so it is no big deal for the postmaster to call it every 5 minutes. The -nice thing about this is that the OS does the syncing of all the dirty -pages for us. (An alarm() call can set up this 5 minute timing.) - -The postmaster then locks the shared transaction id queue, makes a copy -of the entries in the queue, clears the queue, and unlocks the queue. 
-It does this so no one else modifies the queue while it is being -cleared. - -The postmaster then goes through pg_log, and marks each transaction as -'been_synced'. - -The postmaster also performs this on shutdown. - -On postmaster startup, all transactions are checked and any transaction -that is marked as committed but not 'been_synced' is marked as not -committed. In this way, we prevent non-synced or partially synced -transactions from being used. - -Of course, vacuum would have to do normal fsyncs because it is removing -the transaction log. - -We need the shared transaction id queue because there is no way to find -the newly committed transactions since the last sync. A transaction -can last for hours. - --- -Bruce Momjian -maillist@candle.pha.pa.us - -From owner-pgsql-hackers@hub.org Tue Nov 4 02:13:08 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA17544 - for ; Tue, 4 Nov 1997 02:13:06 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA14126; Tue, 4 Nov 1997 02:07:55 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 02:04:59 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA12859 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 02:04:51 -0500 (EST) -Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by hub.org (8.8.5/8.7.5) with SMTP id CAA12625 for ; Tue, 4 Nov 1997 02:04:12 -0500 (EST) -Received: by orion.SAPserv.Hamburg.dsh.de - (Linux Smail3.1.29.1 #1)} - id m0xSd44-000BFQC; Tue, 4 Nov 97 08:06 MET -Message-Id: -From: wieck@sapserv.debis.de -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! 
-To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Tue, 4 Nov 1997 08:06:16 +0100 (MET) -Cc: maillist@candle.pha.pa.us, vadim@sable.krasnoyarsk.su, - marc@fallon.classyad.com, hackers@postgreSQL.org -Reply-To: wieck@sapserv.debis.de (Jan Wieck) -In-Reply-To: <199711040359.WAA16264@candle.pha.pa.us> from "Bruce Momjian" at Nov 3, 97 10:59:30 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=iso-8859-1 -Content-Transfer-Encoding: 8bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> OK, here is a more formal description of what I am suggesting. It will -> give us commercial dbms reliability with no-fsync performance. -> Commercial dbms's usually only give restore up to 5 minutes before the -> crash, and this is what I am suggesting. If we can do this, we can -> remove the no-fsync option. - - I'm not 100% sure but as far as I know Oracle, it can recover - up to the last committed transaction using the online redo - logs. And even if commercial dbms's aren't able to do that, - it should be our target. - -> [description about transaction queue] - - This all depends on the fact that PostgreSQL is a no - overwrite dbms. Otherwise the space of deleted tuples might - get overwritten by later transactions and the information is - finally lost. - - Another issue: All we up to now though of are crashes where - the database files are still usable after restart. But take - the simple case of a write error. A new bad block or track - will get remapped (in some way) but the data in it is lost. - So we end up with one or more totally corrupted database - files. And I don't trust mirrored disks farer than I can - throw them. A bug in the OS or a memory failure (many new - PeeCee boards don't support parity and even with parity a two - bit failure is still the wrong data but with a valid parity - bit) can also currupt the data. - - I still prefer redo logs. 
They should reside on a different - disk and the possibility of loosing the database files along - with the redo log is very small. - - -Until later, Jan - --- -#define OPINIONS "they are all mine - not those of debis or daimler-benz" - -#======================================================================# -# It's easier to get forgiveness for being wrong than for being right. # -# Let's break this rule - forgive me. # -#================================== wieck@sapserv.debis.de (Jan Wieck) # - - - - -From vadim@sable.krasnoyarsk.su Tue Nov 4 04:12:50 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA18487 - for ; Tue, 4 Nov 1997 04:12:48 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA03152 for ; Tue, 4 Nov 1997 04:12:06 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA20591; Tue, 4 Nov 1997 16:14:06 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345EE75D.398A68D@sable.krasnoyarsk.su> -Date: Tue, 04 Nov 1997 16:14:05 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711040359.WAA16264@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, here is a more formal description of what I am suggesting. It will -> give us commercial dbms reliability with no-fsync performance. -> Commercial dbms's usually only give restore up to 5 minutes before the - ^^^^^^^^^^^^^^^^^^^^^^^ -I'm sure that this is not true! -If on-line redo_file is damaged then you have -single ability: restore your last backup. 
-In all other cases database will be recovered up to the last -committed transaction automatically! - -DBMS-s using WAL have to fsync only redo file on commit -(and they do it!), non-overwriting systems have to -fsync data files and transaction log. - -We could optimize fsync-s for multi-user environment: do not -fsync when we're ensured that our changes flushed to disk by -another backend. - -> crash, and this is what I am suggesting. If we can do this, we can -> remove the no-fsync option. -> -... -> -> On postmaster startup, all transactions are checked and any transaction -> that is marked as committed but not 'been_synced' is marked as not -> committed. In this way, we prevent non-synced or partially synced -> transactions from being used. - -And what should users (ensured that their transaction are -committed) do in this case ? - -Vadim - -From owner-pgsql-hackers@hub.org Tue Nov 4 04:21:04 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA18536 - for ; Tue, 4 Nov 1997 04:21:01 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id EAA15551; Tue, 4 Nov 1997 04:15:15 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 04:14:23 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id EAA14464 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 04:14:18 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id EAA13437 for ; Tue, 4 Nov 1997 04:13:33 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA20591; Tue, 4 Nov 1997 16:14:06 +0700 (KRS) -Message-ID: <345EE75D.398A68D@sable.krasnoyarsk.su> -Date: Tue, 04 Nov 1997 16:14:05 +0700 -From: "Vadim B. 
Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711040359.WAA16264@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> OK, here is a more formal description of what I am suggesting. It will -> give us commercial dbms reliability with no-fsync performance. -> Commercial dbms's usually only give restore up to 5 minutes before the - ^^^^^^^^^^^^^^^^^^^^^^^ -I'm sure that this is not true! -If on-line redo_file is damaged then you have -single ability: restore your last backup. -In all other cases database will be recovered up to the last -committed transaction automatically! - -DBMS-s using WAL have to fsync only redo file on commit -(and they do it!), non-overwriting systems have to -fsync data files and transaction log. - -We could optimize fsync-s for multi-user environment: do not -fsync when we're ensured that our changes flushed to disk by -another backend. - -> crash, and this is what I am suggesting. If we can do this, we can -> remove the no-fsync option. -> -... -> -> On postmaster startup, all transactions are checked and any transaction -> that is marked as committed but not 'been_synced' is marked as not -> committed. In this way, we prevent non-synced or partially synced -> transactions from being used. - -And what should users (ensured that their transaction are -committed) do in this case ? 
- -Vadim - - -From owner-pgsql-hackers@hub.org Tue Nov 4 06:43:00 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA19743 - for ; Tue, 4 Nov 1997 06:42:57 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id GAA10352; Tue, 4 Nov 1997 06:36:08 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 06:35:42 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id GAA10158 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 06:35:37 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id GAA10096 for ; Tue, 4 Nov 1997 06:35:27 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id GAA19665; - Tue, 4 Nov 1997 06:35:10 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711041135.GAA19665@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: wieck@sapserv.debis.de -Date: Tue, 4 Nov 1997 06:35:10 -0500 (EST) -Cc: hackers@postgreSQL.org (PostgreSQL-development) -In-Reply-To: from "wieck@sapserv.debis.de" at Nov 4, 97 08:06:16 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> > OK, here is a more formal description of what I am suggesting. It will -> > give us commercial dbms reliability with no-fsync performance. -> > Commercial dbms's usually only give restore up to 5 minutes before the -> > crash, and this is what I am suggesting. If we can do this, we can -> > remove the no-fsync option. -> -> I'm not 100% sure but as far as I know Oracle, it can recover -> up to the last committed transaction using the online redo -> logs. And even if commercial dbms's aren't able to do that, -> it should be our target. 
-> -> > [description about transaction queue] -> -> This all depends on the fact that PostgreSQL is a no -> overwrite dbms. Otherwise the space of deleted tuples might -> get overwritten by later transactions and the information is -> finally lost. -> -> Another issue: All we up to now though of are crashes where -> the database files are still usable after restart. But take -> the simple case of a write error. A new bad block or track -> will get remapped (in some way) but the data in it is lost. -> So we end up with one or more totally corrupted database -> files. And I don't trust mirrored disks farer than I can -> throw them. A bug in the OS or a memory failure (many new -> PeeCee boards don't support parity and even with parity a two -> bit failure is still the wrong data but with a valid parity -> bit) can also currupt the data. -> -> I still prefer redo logs. They should reside on a different -> disk and the possibility of loosing the database files along -> with the redo log is very small. - -I have been thinking about re-do logs, and I think it is a good idea. -It would not be hard to have the queries spit out to a separate file -configurable by the user. 
- --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Tue Nov 4 07:31:01 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22051 - for ; Tue, 4 Nov 1997 07:30:59 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07444 for ; Tue, 4 Nov 1997 07:25:14 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08818; Tue, 4 Nov 1997 07:03:30 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:02:44 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08418 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:29 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08331 for ; Tue, 4 Nov 1997 07:02:07 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id GAA21484; - Tue, 4 Nov 1997 06:50:24 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711041150.GAA21484@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Tue, 4 Nov 1997 06:50:24 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 04:14:05 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > OK, here is a more formal description of what I am suggesting. It will -> > give us commercial dbms reliability with no-fsync performance. 
-> > Commercial dbms's usually only give restore up to 5 minutes before the -> ^^^^^^^^^^^^^^^^^^^^^^^ -> I'm sure that this is not true! - -You may be right. This five minute figure is when you restore from your -previous backup, then restore from the log file. - -Can't we do something like sync every 5 seconds, rather than after every -transaction? It just seems like such overkill. - -Actually, I found a problem with my description. Because pg_log is not -fsync'ed, after a crash, pages with new transactions could have been -flushed to disk, but not the pg_log table that contains the transaction -ids. The problem is that the new backend could assign a transaction id -that is already in use. - -We could set a flag upon successful shutdown, and if it is not set on -reboot, either do a vacuum to find the max transaction id, and -invalidate all them not in pg_log as synced, or increase the next -transaction id to some huge number and invalidate all them in between. - - -> If on-line redo_file is damaged then you have -> single ability: restore your last backup. -> In all other cases database will be recovered up to the last -> committed transaction automatically! -> -> DBMS-s using WAL have to fsync only redo file on commit -> (and they do it!), non-overwriting systems have to -> fsync data files and transaction log. -> -> We could optimize fsync-s for multi-user environment: do not -> fsync when we're ensured that our changes flushed to disk by -> another backend. -> -> > crash, and this is what I am suggesting. If we can do this, we can -> > remove the no-fsync option. -> > -> ... -> > -> > On postmaster startup, all transactions are checked and any transaction -> > that is marked as committed but not 'been_synced' is marked as not -> > committed. In this way, we prevent non-synced or partially synced -> > transactions from being used. -> -> And what should users (ensured that their transaction are -> committed) do in this case ? 
-> -> Vadim -> -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From wieck@sapserv.debis.de Tue Nov 4 07:01:00 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA21697 - for ; Tue, 4 Nov 1997 07:00:58 -0500 (EST) -From: wieck@sapserv.debis.de -Received: from orion.SAPserv.Hamburg.dsh.de (polaris.sapserv.debis.de [53.2.131.8]) by renoir.op.net (o1/$ Revision: 1.14 $) with SMTP id GAA06401 for ; Tue, 4 Nov 1997 06:48:25 -0500 (EST) -Received: by orion.SAPserv.Hamburg.dsh.de - (Linux Smail3.1.29.1 #1)} - id m0xShVQ-000BGZC; Tue, 4 Nov 97 12:50 MET -Message-Id: -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: maillist@candle.pha.pa.us (Bruce Momjian) -Date: Tue, 4 Nov 1997 12:50:45 +0100 (MET) -Cc: wieck@sapserv.debis.de, hackers@postgreSQL.org -Reply-To: wieck@sapserv.debis.de (Jan Wieck) -In-Reply-To: <199711041135.GAA19665@candle.pha.pa.us> from "Bruce Momjian" at Nov 4, 97 06:35:10 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=iso-8859-1 -Content-Transfer-Encoding: 8bit -Status: OR - - -Bruce Momjian wrote: -> I have been thinking about re-do logs, and I think it is a good idea. -> It would not be hard to have the queries spit out to a separate file -> configurable by the user. - - This way the recovery process will be very complicated. When - multiple backends run concurrently, there are multiple - transactions active at the same time. And what tuples are - affected by an update e.g. depends much on the timing. - - I had something different in mind. The redo log contains the - information from the executor (e.g. the transactionId, the - tupleId and the new tuple values when calling ExecReplace()) - and the information which transactions commit and which not. 
- When recovering, those operations where the transactions - committed are again passed to the executors functions that do - the real updates with the values from the logfile. - - -Until later, Jan - --- -#define OPINIONS "they are all mine - not those of debis or daimler-benz" - -#======================================================================# -# It's easier to get forgiveness for being wrong than for being right. # -# Let's break this rule - forgive me. # -#================================== wieck@sapserv.debis.de (Jan Wieck) # - - - -From owner-pgsql-hackers@hub.org Tue Nov 4 07:30:59 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22048 - for ; Tue, 4 Nov 1997 07:30:57 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA07189 for ; Tue, 4 Nov 1997 07:18:02 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id HAA08856; Tue, 4 Nov 1997 07:03:37 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 07:03:03 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id HAA08487 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 07:02:46 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA08192 for ; Tue, 4 Nov 1997 07:02:02 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id HAA21653; - Tue, 4 Nov 1997 07:00:20 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711041200.HAA21653@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!u -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Tue, 4 Nov 1997 07:00:19 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <345EE75D.398A68D@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 4, 97 04:14:05 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > OK, here is a more formal description of what I am suggesting. It will -> > give us commercial dbms reliability with no-fsync performance. -> > Commercial dbms's usually only give restore up to 5 minutes before the -> ^^^^^^^^^^^^^^^^^^^^^^^ -> I'm sure that this is not true! -> If on-line redo_file is damaged then you have -> single ability: restore your last backup. -> In all other cases database will be recovered up to the last -> committed transaction automatically! - -I doubt commercial dbms's sync to disk after every transaction. They -pick a time, maybe five seconds, and see all dirty pages get flushed by -then. - -What they do do is to make certain that you are restored to a consistent -state, perhaps 15 seconds ago. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Tue Nov 4 07:32:45 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA22066 - for ; Tue, 4 Nov 1997 07:32:35 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20889; Tue, 4 Nov 1997 19:35:12 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345F1680.60E33853@sable.krasnoyarsk.su> -Date: Tue, 04 Nov 1997 19:35:12 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Jan Wieck -CC: Bruce Momjian , marc@fallon.classyad.com, - hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! 
-References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -wieck@sapserv.debis.de wrote: -> -> I still prefer redo logs. They should reside on a different -> disk and the possibility of loosing the database files along -> with the redo log is very small. - -Agreed. This way we could don't fsync data files and -fsync both redo and pg_log. This is much faster. - -Vadim - -From vadim@sable.krasnoyarsk.su Tue Nov 4 08:00:58 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA22371 - for ; Tue, 4 Nov 1997 08:00:56 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA08540 for ; Tue, 4 Nov 1997 07:57:25 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id TAA20935; Tue, 4 Nov 1997 19:59:46 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <345F1C42.1F1A7590@sable.krasnoyarsk.su> -Date: Tue, 04 Nov 1997 19:59:46 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Jan Wieck -CC: Bruce Momjian , hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -wieck@sapserv.debis.de wrote: -> -> Bruce Momjian wrote: -> > I have been thinking about re-do logs, and I think it is a good idea. -> > It would not be hard to have the queries spit out to a separate file -> > configurable by the user. -> -> This way the recovery process will be very complicated. When -> multiple backends run concurrently, there are multiple -> transactions active at the same time. And what tuples are -> affected by an update e.g. depends much on the timing. -> -> I had something different in mind. 
The redo log contains the -> information from the executor (e.g. the transactionId, the -> tupleId and the new tuple values when calling ExecReplace()) -> and the information which transactions commit and which not. -> When recovering, those operations where the transactions -> committed are again passed to the executors functions that do -> the real updates with the values from the logfile. - -It seems that this is what Oracle does, but Sybase writes queries -(with transaction ids, of 'course, and before execution) and -begin, commit/abort events <-- this is better for non-overwriting -system (shorter redo file), but, agreed, recovering is more complicated. - -Vadim - -From owner-pgsql-hackers@hub.org Tue Nov 4 22:35:45 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA05060 - for ; Tue, 4 Nov 1997 22:35:43 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA26725 for ; Tue, 4 Nov 1997 22:35:10 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA27875; Tue, 4 Nov 1997 22:23:14 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 04 Nov 1997 22:20:55 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA24162 for pgsql-hackers-outgoing; Tue, 4 Nov 1997 22:20:50 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA22727 for ; Tue, 4 Nov 1997 22:20:18 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id WAA04674; - Tue, 4 Nov 1997 22:17:52 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711050317.WAA04674@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. 
Mikheev) -Date: Tue, 4 Nov 1997 22:17:52 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <345F14E7.28CC1042@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 4, 97 07:28:23 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> -> Bruce Momjian wrote: -> > -> > > -> > > Bruce Momjian wrote: -> > > > -> > > > OK, here is a more formal description of what I am suggesting. It will -> > > > give us commercial dbms reliability with no-fsync performance. -> > > > Commercial dbms's usually only give restore up to 5 minutes before the -> > > ^^^^^^^^^^^^^^^^^^^^^^^ -> > > I'm sure that this is not true! -> > -> > You may be right. This five minute figure is when you restore from your -> > previous backup, then restore from the log file. -> > -> > Can't we do something like sync every 5 seconds, rather than after every -> > transaction? It just seems like such overkill. -> -> Isn't -F and sync in crontab the same ? - -OK, let me again try to marshall some (any?) support for my suggestion. - -Informix version 5/7 has three levels of logging: unbuffered -logging(our normal fsync mode), buffered logging, and no logging(our no -fsync mode). - -We don't have buffered logging. Buffered logging guarantees you get put -back to a consistent state after an os/server crash, usually to within -30/90 seconds. You do not have any partial transactions lying around, -but you do have some transactions that you thought were done, but are -not. - -This is faster then non-buffered logging, but not as fast as no logging. -Guess what mode everyone uses? The one we don't have, buffered logging! - -Unbuffered logging performance is terrible. Non-buffered logging is -used to load huge chunks of data during off-hours. 
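Editorial sketch: the three logging modes described above differ only in when a commit forces the log to disk. The toy model below illustrates that difference. It is not Informix or PostgreSQL code; the class name `CommitLog`, the mode strings, the 30-second default, and the injectable `clock` are all assumptions made for this example.

```python
import os
import time


class CommitLog:
    """Toy append-only log; 'mode' decides when a commit is forced to disk."""

    def __init__(self, path, mode, flush_interval=30.0, clock=time.monotonic):
        if mode not in ("unbuffered", "buffered", "none"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.flush_interval = flush_interval  # seconds between forced syncs
        self.clock = clock                    # injectable so tests can fake time
        self.fsync_count = 0
        self._file = open(path, "ab")
        self._last_sync = clock()

    def _sync(self):
        # Push Python's buffer to the OS, then ask the OS to hit the disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_count += 1
        self._last_sync = self.clock()

    def commit(self, record: bytes):
        self._file.write(record + b"\n")
        if self.mode == "unbuffered":
            # Every commit waits for the disk (the normal fsync mode).
            self._sync()
        elif self.mode == "buffered":
            # Commits return at once; the disk is forced only periodically,
            # so a crash can lose the last flush_interval seconds of commits.
            if self.clock() - self._last_sync >= self.flush_interval:
                self._sync()
        # mode "none": never force anything (the no-fsync mode).

    def close(self):
        self._file.close()
```

With one commit every 10 seconds, the unbuffered log pays one fsync per commit, the buffered log pays one per 30-second window, and the "none" log pays nothing, which is the cost trade-off the message argues about.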
- -The problem we have is that we fsync every transaction, which causes a -9-times slowdown in performance on single-integer inserts. - -That is a pretty heavy cost. But the alternative we give people is -no-fsync mode, where we don't sync anything, and in a crash, you could -come back with partially committed data in your database, if pg_log was -sync'ed by the database, and only some of the data pages were sync'ed, -so if any data was changing within 30 seconds of the crash, you have to -restore your previous backup. - -We really need a middle solution, that gives better data integrity, for -a smaller price. - -> -> > -> > Actually, I found a problem with my description. Because pg_log is not -> > fsync'ed, after a crash, pages with new transactions could have been -> > flushed to disk, but not the pg_log table that contains the transaction -> > ids. The problem is that the new backend could assign a transaction id -> > that is already in use. -> -> Impossible. Backend flushes pg_variable after fetching nex 32 xids. - -My suggestion is that we don't need to flush pg_variable or pg_log that -much. My suggestion would speed up the test you do with 100 inserts -inside a single transaction vs. 100 separate inserts. - -> > -> > We could set a flag upon successful shutdown, and if it is not set on -> > reboot, either do a vacuum to find the max transaction id, and -> > invalidate all them not in pg_log as synced, or increase the next -> > transaction id to some huge number and invalidate all them in between. -> > - -I have a fix for the problem stated above, and it doesn't require a -vacuum. - -We decide to fsync pg_variable and pg_log every 10,000 transactions or -oids. Then if the database is brought up, and it was not brought down -cleanly, you increment oid and transaction_id by 10,000, because you -know you couldn't have gotten more than that. All intermediate -transactions that are not marked committed/synced are marked aborted. 
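Editorial sketch: the recovery rule just described can be modeled in a few lines. This is only an illustration under stated assumptions, not code from the PostgreSQL tree. pg_log is modeled as a plain dictionary, and the names `SYNC_INTERVAL`, `recover_after_crash`, and the status strings are hypothetical. The sketch also assumes that everything below the on-disk counter was already made durable by the last sync.

```python
# Hypothetical model of the "skip ahead by the sync interval" recovery idea.

SYNC_INTERVAL = 10_000  # counters are forced to disk every 10,000 xids


def recover_after_crash(next_xid_on_disk, pg_log, clean_shutdown):
    """Return the next transaction id that is safe to hand out.

    next_xid_on_disk -- value of the xid counter as of the last forced sync
    pg_log           -- dict mapping xid -> status string, updated in place
    clean_shutdown   -- True if the "shut down cleanly" flag was found set
    """
    if clean_shutdown:
        # The counter on disk is exact; nothing needs to be invalidated.
        return next_xid_on_disk

    # At most SYNC_INTERVAL ids can have been handed out since the last sync,
    # so the first id past that window cannot already be in use.
    window_end = next_xid_on_disk + SYNC_INTERVAL

    # Any id inside the window that is not known to be committed AND synced
    # is treated as aborted, whether or not pg_log has an entry for it.
    for xid in range(next_xid_on_disk, window_end):
        if pg_log.get(xid) != "committed_synced":
            pg_log[xid] = "aborted"

    return window_end
```

For example, with an on-disk counter of 100 and an unclean shutdown, ids 100 through 10099 are checked and the next id handed out is 10100. The price is a gap of unused ids after each crash, in exchange for not having to scan the data files to find the highest id in use.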
- ---------------------------------------------------------------------------- - -The problem we have with the current system is that we sync by action, -not by time interval. If you are doing tons of inserts or updates, it -is syncing after every one. What people really want is something that -will sync not after every action, but after every minute or five -minutes, so when the system is busy, the syncing every minutes is just a -small amount, and when the system is idle, no one cares if is syncs, and -no one has to wait for the sync to complete. - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From matti@algonet.se Wed Nov 5 11:02:33 1997 -Received: from smtp.algonet.se (tomei.algonet.se [194.213.74.114]) - by candle.pha.pa.us (8.8.5/8.8.5) with SMTP id LAA02099 - for ; Wed, 5 Nov 1997 11:02:28 -0500 (EST) -Received: (qmail 6685 invoked from network); 5 Nov 1997 17:01:06 +0100 -Received: from du228-6.ppp.algonet.se (HELO gamma) (root@195.100.6.228) - by tomei.algonet.se with SMTP; 5 Nov 1997 17:01:06 +0100 -Sender: root -Message-ID: <34609871.27EED9D@algonet.se> -Date: Wed, 05 Nov 1997 17:02:16 +0100 -From: Mattias Kregert -Organization: Algonet ISP -X-Mailer: Mozilla 3.0Gold (X11; I; Linux 2.0.29 i586) -MIME-Version: 1.0 -To: Bruce Momjian -CC: pgsql-hackers@postgresql.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711050317.WAA04674@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> We don't have buffered logging. Buffered logging guarantees you get put -> back to a consistent state after an os/server crash, usually to within -> 30/90 seconds. You do not have any partial transactions lying around, -> but you do have some transactions that you thought were done, but are - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> not. - ^^^^ -> -> This is faster then non-buffered logging, but not as fast as no logging. 
-> Guess what mode everyone uses? The one we don't have, buffered logging! - -Ouch! I would *not* like to use "buffered logging". -What's the point in having the wrong data in the database and not -knowing what updates, inserts or deletes to do to get the correct data? - -That's irrecoverable loss of data. Not what *I* want. Do *you* want it? - - -> We really need a middle solution, that gives better data integrity, for -> a smaller price. - -What I would like to have is this: - -If a backend tells the frontend that a transaction has completed, -then that transaction should absolutely not get lost in case of a crash. - -What is needed is a log of changes since the last backup. This -log would preferrably reside on a remote machine or at least -another disk. Then, if the power goes in the middle of a disk write, -the disk explodes and the computer goes up in flames, you can -install Postgresql on a new machine, restore the last backup and -re-run the change log. - - -> The problem we have with the current system is that we sync by action, -> not by time interval. If you are doing tons of inserts or updates, it -> is syncing after every one. What people really want is something that -> will sync not after every action, but after every minute or five -> minutes, so when the system is busy, the syncing every minutes is just a -> small amount, and when the system is idle, no one cares if is syncs, and -> no one has to wait for the sync to complete. - -Yes, but this would only be the first step on the way to better -crash-recovery. 
- -/* m */ - -From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:23 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05156 - for ; Wed, 5 Nov 1997 12:20:13 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24123 for ; Wed, 5 Nov 1997 11:44:49 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id XAA23062; Wed, 5 Nov 1997 23:48:52 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <3460A374.41C67EA6@sable.krasnoyarsk.su> -Date: Wed, 05 Nov 1997 23:48:52 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711050317.WAA04674@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> OK, let me again try to marshall some (any?) support for my suggestion. -> -> Informix version 5/7 has three levels of logging: unbuffered -> logging(our normal fsync mode), buffered logging, and no logging(our no -> fsync mode). -> -> We don't have buffered logging. Buffered logging guarantees you get put -> back to a consistent state after an os/server crash, usually to within -> 30/90 seconds. You do not have any partial transactions lying around, -> but you do have some transactions that you thought were done, but are -> not. -> -> This is faster then non-buffered logging, but not as fast as no logging. -> Guess what mode everyone uses? The one we don't have, buffered logging! -> -> Unbuffered logging performance is terrible. Non-buffered logging is -> used to load huge chunks of data during off-hours. 
-> -> The problem we have is that we fsync every transaction, which causes a -> 9-times slowdown in performance on single-integer inserts. -> -> That is a pretty heavy cost. But the alternative we give people is -> no-fsync mode, where we don't sync anything, and in a crash, you could -> come back with partially committed data in your database, if pg_log was -> sync'ed by the database, and only some of the data pages were sync'ed, -> so if any data was changing within 30 seconds of the crash, you have to -> restore your previous backup. -> -> We really need a middle solution, that gives better data integrity, for -> a smaller price. - -There is no fsync synchronization currently. -How could we be ensured that all modified data pages are flushed -when we decided to flush pg_log ? -If backend doesn't fsync data pages & pg_log at the commit time -then when he must flush them (data first) ? - -This is what Oracle does: - -it uses dedicated DBWR process for writing/flushing modified -data pages and LGWR process for writing/flushing redo log -(redo log is transaction log also). LGWR always flushes log pages -when committing, but durty data pages can be flushed _after_ transaction -commit when DBWR decides that it's time to do it (ala checkpoints interval). - -Using redo log we could implement buffered logging quite easy. -We can even don't use dedicated processes (but flush redo before pg_log), -though having LGWR could simplify things. - -Without redo log or without some fsync synchronization we can't implement -buffered logging. BTW, shared system cache could help with -fsync synchonization, but, imho, redo is better (and faster for -un-buffered logging too). - -> > > Actually, I found a problem with my description. Because pg_log is not -> > > fsync'ed, after a crash, pages with new transactions could have been -> > > flushed to disk, but not the pg_log table that contains the transaction -> > > ids. 
The problem is that the new backend could assign a transaction id -> > > that is already in use. -> > -> > Impossible. Backend flushes pg_variable after fetching nex 32 xids. -> -> My suggestion is that we don't need to flush pg_variable or pg_log that -> much. My suggestion would speed up the test you do with 100 inserts -> inside a single transaction vs. 100 separate inserts. -> -> > > -> > > We could set a flag upon successful shutdown, and if it is not set on -> > > reboot, either do a vacuum to find the max transaction id, and -> > > invalidate all them not in pg_log as synced, or increase the next -> > > transaction id to some huge number and invalidate all them in between. -> > > -> -> I have a fix for the problem stated above, and it doesn't require a -> vacuum. -> -> We decide to fsync pg_variable and pg_log every 10,000 transactions or -> oids. Then if the database is brought up, and it was not brought down -> cleanly, you increment oid and transaction_id by 10,000, because you -> know you couldn't have gotten more than that. All intermediate -> transactions that are not marked committed/synced are marked aborted. - -This is what I suppose to do by placing next available oid/xid -in shmem: this allows pre-fetch much more than 32 ids at once -without losing them when session closed. - -> The problem we have with the current system is that we sync by action, -> not by time interval. If you are doing tons of inserts or updates, it -> is syncing after every one. What people really want is something that -> will sync not after every action, but after every minute or five -> minutes, so when the system is busy, the syncing every minutes is just a -> small amount, and when the system is idle, no one cares if is syncs, and -> no one has to wait for the sync to complete. - -When I'm really doing tons of inserts/updates/deletes I use -BEGIN/END. But it doesn't work for multi-user environment, of 'course. 
-As for about what people really want, I remember that recently someone -said in user list that if one want to have 10-20 inserts/sec then he -should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise -when using one session, 32 inserts/sec with two sessions inserting -in two different tables and only 20 inserts/sec with two sessions -inserting in the same table. Imho, this difference between 20 and 32 -is more important thing to fix, and these results are not so bad -in comparison with others. - -(BTW, we shouldn't forget about using raw devices to speed up things). - -Vadim - -From vadim@sable.krasnoyarsk.su Wed Nov 5 12:20:08 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05150 - for ; Wed, 5 Nov 1997 12:20:07 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id LAA24889 for ; Wed, 5 Nov 1997 11:59:27 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id AAA23096; Thu, 6 Nov 1997 00:03:19 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <3460A6D7.167EB0E7@sable.krasnoyarsk.su> -Date: Thu, 06 Nov 1997 00:03:19 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Mattias Kregert -CC: Bruce Momjian , pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711050317.WAA04674@candle.pha.pa.us> <34609871.27EED9D@algonet.se> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Mattias Kregert wrote: -> -> Bruce Momjian wrote: -> > -> > We don't have buffered logging. Buffered logging guarantees you get put -> > back to a consistent state after an os/server crash, usually to within -> > 30/90 seconds. 
You do not have any partial transactions lying around, -> > but you do have some transactions that you thought were done, but are -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > not. -> ^^^^ -> > -> > This is faster then non-buffered logging, but not as fast as no logging. -> > Guess what mode everyone uses? The one we don't have, buffered logging! -> -> Ouch! I would *not* like to use "buffered logging". - -And I. - -> What's the point in having the wrong data in the database and not -> knowing what updates, inserts or deletes to do to get the correct data? -> -> That's irrecoverable loss of data. Not what *I* want. Do *you* want it? -> -> > We really need a middle solution, that gives better data integrity, for -> > a smaller price. -> -> What I would like to have is this: -> -> If a backend tells the frontend that a transaction has completed, -> then that transaction should absolutely not get lost in case of a crash. - -Agreed. - -> -> What is needed is a log of changes since the last backup. This -> log would preferrably reside on a remote machine or at least -> another disk. Then, if the power goes in the middle of a disk write, -> the disk explodes and the computer goes up in flames, you can -> install Postgresql on a new machine, restore the last backup and -> re-run the change log. - -Yes. And as I already said - this will speed up things because -redo flushing is faster than flushing NNN tables which can be -unflushed for some interval. 
- -Vadim - -From owner-pgsql-hackers@hub.org Wed Nov 5 12:20:39 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA05168 - for ; Wed, 5 Nov 1997 12:20:38 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA25888 for ; Wed, 5 Nov 1997 12:14:14 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA02259; Wed, 5 Nov 1997 12:02:33 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 12:00:21 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA00750 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 12:00:10 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id LAA00598 for ; Wed, 5 Nov 1997 11:59:45 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id AAA23096; Thu, 6 Nov 1997 00:03:19 +0700 (KRS) -Message-ID: <3460A6D7.167EB0E7@sable.krasnoyarsk.su> -Date: Thu, 06 Nov 1997 00:03:19 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Mattias Kregert -CC: Bruce Momjian , pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711050317.WAA04674@candle.pha.pa.us> <34609871.27EED9D@algonet.se> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Mattias Kregert wrote: -> -> Bruce Momjian wrote: -> > -> > We don't have buffered logging. Buffered logging guarantees you get put -> > back to a consistent state after an os/server crash, usually to within -> > 30/90 seconds. 
You do not have any partial transactions lying around, -> > but you do have some transactions that you thought were done, but are -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> > not. -> ^^^^ -> > -> > This is faster then non-buffered logging, but not as fast as no logging. -> > Guess what mode everyone uses? The one we don't have, buffered logging! -> -> Ouch! I would *not* like to use "buffered logging". - -And I. - -> What's the point in having the wrong data in the database and not -> knowing what updates, inserts or deletes to do to get the correct data? -> -> That's irrecoverable loss of data. Not what *I* want. Do *you* want it? -> -> > We really need a middle solution, that gives better data integrity, for -> > a smaller price. -> -> What I would like to have is this: -> -> If a backend tells the frontend that a transaction has completed, -> then that transaction should absolutely not get lost in case of a crash. - -Agreed. - -> -> What is needed is a log of changes since the last backup. This -> log would preferrably reside on a remote machine or at least -> another disk. Then, if the power goes in the middle of a disk write, -> the disk explodes and the computer goes up in flames, you can -> install Postgresql on a new machine, restore the last backup and -> re-run the change log. - -Yes. And as I already said - this will speed up things because -redo flushing is faster than flushing NNN tables which can be -unflushed for some interval. 
- -Vadim - - -From owner-pgsql-hackers@hub.org Wed Nov 5 14:01:02 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07017 - for ; Wed, 5 Nov 1997 14:00:59 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA01759 for ; Wed, 5 Nov 1997 13:52:36 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id NAA03611; Wed, 5 Nov 1997 13:29:43 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 13:27:48 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id NAA03291 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 13:27:41 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id NAA02823 for ; Wed, 5 Nov 1997 13:26:20 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id NAA05863; - Wed, 5 Nov 1997 13:16:09 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711051816.NAA05863@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Wed, 5 Nov 1997 13:16:09 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> There is no fsync synchronization currently. -> How could we be ensured that all modified data pages are flushed -> when we decided to flush pg_log ? -> If backend doesn't fsync data pages & pg_log at the commit time -> then when he must flush them (data first) ? 
- -My idea was to have the backend do a 'sync' that causes the OS to sync -all dirty pages, then mark all committed transactions on pg_log as -'synced'. Then sync pg_log. That way, there is a clear system where we -know everything is flushed to disk, and we mark the transactions as -synced. - -The only time that synced flag is used, is when the database starts up, -and it sees that the previous shutdown was not clean. - -What am I missing here? - -> -> This is what Oracle does: -> -> it uses dedicated DBWR process for writing/flushing modified -> data pages and LGWR process for writing/flushing redo log -> (redo log is transaction log also). LGWR always flushes log pages -> when committing, but durty data pages can be flushed _after_ transaction -> commit when DBWR decides that it's time to do it (ala checkpoints interval). -> -> Using redo log we could implement buffered logging quite easy. -> We can even don't use dedicated processes (but flush redo before pg_log), -> though having LGWR could simplify things. -> -> Without redo log or without some fsync synchronization we can't implement -> buffered logging. BTW, shared system cache could help with -> fsync synchonization, but, imho, redo is better (and faster for -> un-buffered logging too). -> - -I suggested my solution because it is clean, does flushing in one -central location(postmaster), and does quick restores. - -> > > > Actually, I found a problem with my description. Because pg_log is not -> > > > fsync'ed, after a crash, pages with new transactions could have been -> > > > flushed to disk, but not the pg_log table that contains the transaction -> > > > ids. The problem is that the new backend could assign a transaction id -> > > > that is already in use. -> > > -> > > Impossible. Backend flushes pg_variable after fetching nex 32 xids. -> > -> > My suggestion is that we don't need to flush pg_variable or pg_log that -> > much. 
My suggestion would speed up the test you do with 100 inserts -> > inside a single transaction vs. 100 separate inserts. -> > -> > > > -> > > > We could set a flag upon successful shutdown, and if it is not set on -> > > > reboot, either do a vacuum to find the max transaction id, and -> > > > invalidate all them not in pg_log as synced, or increase the next -> > > > transaction id to some huge number and invalidate all them in between. -> > > > -> > -> > I have a fix for the problem stated above, and it doesn't require a -> > vacuum. -> > -> > We decide to fsync pg_variable and pg_log every 10,000 transactions or -> > oids. Then if the database is brought up, and it was not brought down -> > cleanly, you increment oid and transaction_id by 10,000, because you -> > know you couldn't have gotten more than that. All intermediate -> > transactions that are not marked committed/synced are marked aborted. -> -> This is what I suppose to do by placing next available oid/xid -> in shmem: this allows pre-fetch much more than 32 ids at once -> without losing them when session closed. -> -> > The problem we have with the current system is that we sync by action, -> > not by time interval. If you are doing tons of inserts or updates, it -> > is syncing after every one. What people really want is something that -> > will sync not after every action, but after every minute or five -> > minutes, so when the system is busy, the syncing every minutes is just a -> > small amount, and when the system is idle, no one cares if is syncs, and -> > no one has to wait for the sync to complete. -> -> When I'm really doing tons of inserts/updates/deletes I use -> BEGIN/END. But it doesn't work for multi-user environment, of 'course. 
-> As for about what people really want, I remember that recently someone -> said in user list that if one want to have 10-20 inserts/sec then he -> should use mysql, but I got 25 inserts/sec on AIC-7880 & WD Enterprise -> when using one session, 32 inserts/sec with two sessions inserting -> in two different tables and only 20 inserts/sec with two sessions -> inserting in the same table. Imho, this difference between 20 and 32 -> is more important thing to fix, and these results are not so bad -> in comparison with others. -> -> (BTW, we shouldn't forget about using raw devices to speed up things). -> -> Vadim -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From james@blarg.net Wed Nov 5 13:26:46 1997 -Received: from animal.blarg.net (mail@animal.blarg.net [206.114.144.1]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA06130 - for ; Wed, 5 Nov 1997 13:26:26 -0500 (EST) -Received: from animal.blarg.net (james@animal.blarg.net [206.114.144.1]) - by animal.blarg.net (8.8.5/8.8.4) with SMTP - id KAA09775; Wed, 5 Nov 1997 10:26:10 -0800 -Date: Wed, 5 Nov 1997 10:26:10 -0800 (PST) -From: "James A. Hillyerd" -To: Bruce Momjian -cc: Mattias Kregert , pgsql-hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -In-Reply-To: <199711051615.LAA02260@candle.pha.pa.us> -Message-ID: -MIME-Version: 1.0 -Content-Type: TEXT/PLAIN; charset=US-ASCII -Status: OR - -On Wed, 5 Nov 1997, Bruce Momjian wrote: -> -> The strange thing I am hearing is that the people who use PostgreSQL are -> more worried about data recovery from a crash than million-dollar -> companies that use commercial databases. -> - -If I may throw in my 2 cents, I'd prefer to see that database in a -consistent state, with the data being up to date as of 1 minute or -less before the crash. I'd rather have higher performance than up to the -second data. - --james - -[ James A. 
Hillyerd (JH2162) - james@blarg.net - Web Developer ] -[ http://www.blarg.net/~james/ http://www.hyperglyphics.com/ ] -[ 1024/B11C3751 CA 1C B3 A9 07 2F 57 C9 91 F4 73 F2 19 A4 C5 88 ] - - -From vadim@sable.krasnoyarsk.su Wed Nov 5 14:24:03 1997 -Received: from renoir.op.net (root@renoir.op.net [206.84.208.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA07830 - for ; Wed, 5 Nov 1997 14:24:02 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA02778 for ; Wed, 5 Nov 1997 14:13:45 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id CAA23376; Thu, 6 Nov 1997 02:17:51 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <3460C65E.446B9B3D@sable.krasnoyarsk.su> -Date: Thu, 06 Nov 1997 02:17:50 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711051816.NAA05863@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > There is no fsync synchronization currently. -> > How could we be ensured that all modified data pages are flushed -> > when we decided to flush pg_log ? -> > If backend doesn't fsync data pages & pg_log at the commit time -> > then when he must flush them (data first) ? -> -> My idea was to have the backend do a 'sync' that causes the OS to sync -> all dirty pages, then mark all committed transactions on pg_log as -> 'synced'. Then sync pg_log. That way, there is a clear system where we -> know everything is flushed to disk, and we mark the transactions as -> synced. 
-> -> The only time that synced flag is used, is when the database starts up, -> and it sees that the previous shutdown was not clean. -> -> What am I missing here? - -Ok, I see. But we can avoid 'synced' flag: we can make (just before -sync-ing data pages) in-memory copies of "on-line" durty pg_log pages -to being written/fsynced and perform write/fsync from these copies -without stopping new commits in "on-line" page(s) (nothing must go -to disk from "on-line" log pages). - -Vadim - -From owner-pgsql-hackers@hub.org Wed Nov 5 14:32:25 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id OAA08101 - for ; Wed, 5 Nov 1997 14:32:21 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id OAA22970; Wed, 5 Nov 1997 14:26:47 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:24:59 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA22344 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:24:56 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA22319 for ; Wed, 5 Nov 1997 14:24:38 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id OAA07661; - Wed, 5 Nov 1997 14:22:46 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711051922.OAA07661@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Wed, 5 Nov 1997 14:22:45 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <3460A374.41C67EA6@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 5, 97 11:48:52 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Just a clarification. 
When I say the postmaster issues a sync, I mean -sync(2), not fsync(2). - -The sync flushes all dirty pages on all file systems. Ordinary users -can issue this, and update usually does this every 30 seconds anyway. - -By using this, we let the kernel figure out which buffers are dirty. We -don't have to figure this out in the postmaster. - -Then we update the pg_log table to mark those transactions as synced. -On recovery from a crash, we mark the committed transactions as -uncommitted if they do not have the synced flag. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Wed Nov 5 15:11:07 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA08751 - for ; Wed, 5 Nov 1997 15:10:59 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA01986; Wed, 5 Nov 1997 15:01:24 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 05 Nov 1997 14:59:32 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id OAA01414 for pgsql-hackers-outgoing; Wed, 5 Nov 1997 14:59:28 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id OAA01403 for ; Wed, 5 Nov 1997 14:59:14 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id OAA08283; - Wed, 5 Nov 1997 14:53:55 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711051953.OAA08283@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Wed, 5 Nov 1997 14:53:54 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <3460C65E.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. 
Mikheev" at Nov 6, 97 02:17:50 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> > The only time that synced flag is used, is when the database starts up, -> > and it sees that the previous shutdown was not clean. -> > -> > What am I missing here? -> -> Ok, I see. But we can avoid 'synced' flag: we can make (just before -> sync-ing data pages) in-memory copies of "on-line" durty pg_log pages -> to being written/fsynced and perform write/fsync from these copies -> without stopping new commits in "on-line" page(s) (nothing must go -> to disk from "on-line" log pages). - -[Working late tonight?] - -OK, now I am lost. We need the sync'ed flag so when we start the -postmaster, and we see the database we not shut down properly, we use -the flag to clear the commit flag from comitted transactions that were -not sync'ed by the postmaster. - -In my opinion, we don't need any extra copies of pg_log, we can set -those sync'ed flags while others are making changes, because before we -did our sync, we gathered a list of committed transaction ids from the -shared transaction id queue that I mentioned a while ago. - -We need this queue so we can find the newly-committed transactions that -do not have a sync flag. Another way we could do this would be to scan -pg_log before we sync, getting all the committed transaction ids without -sync flags. No lock is needed on the table. If we miss some new ones, -we will get them next time we scan. The problem I saw is that there is -no way to see when to stop scanning the pg_log table for such -transactions, so I thought each backend would have to put its newly -committed transactions in a separate place. Maybe I am wrong. - -This syncing method just seems so natural since we have pg_log. That is -why I keep bringing it up until people tell me I am stupid. 
- -This transaction commit/sync stuff is complicated, and takes a while to -hash out in a group. - ---------------------------------------------------------------------------- - -I just re-read your description, and I see what you are saying. My idea -has pg_log commit flag be real commit flags while the system is running, -but on reboot after failure, we remove the commit flags on non-synced -stuff before we start up. - -Your idea is to make pg_log commit flags only appear in in-memory copies -of pg_log, and write the commit flags to disk only after the sync is -done. - -Either way will work. The question is, "Which is easier?" The OS is -going to sync pg_log on its own. We would almost need a second copy of -pg_log, one copy to be used on postmaster startup, and a second to be -used by running backends, and the postmaster would make a copy of the -running backend pg_log, sync the disks, and copy it to the boot copy. - -I don't see how the backend is going to figure out which pg_log pages -were modified and need to be sent to the boot copy of pg_log. - -Now that I am thinking, here is a good idea. Instead of a fancy -transaction queue, what if we just have the backend record the lowest -numbered transaction they commit in a shared memory area. If the -current transaction id they commit is greater than the minimum, then -change nothing. That way, the backend could copy all pg_log pages -containing that minimum pg_log transaction id up to the most recent -pg_log page, do the sync, and copy just those to the boot copy of -pg_log. - -This eliminates the transaction id queue. - -The nice thing about the sync-flag in pg_log is that there is no copying -by the backend. But we would have to spin through the file to set those -sync bits. Your method just copies whole pages to the boot copy. - ---------------------------------------------------------------------------- - -I don't want to force this idea on anyone, or annoy anyone. I just -think it needs to be considered. 
The concepts are unusual, so once -people get the full idea, if they don't like it, we can trash it. I -still think it holds promise. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From hotz@jpl.nasa.gov Wed Nov 5 15:30:18 1997 -Received: from hotzsun.jpl.nasa.gov (hotzsun.jpl.nasa.gov [137.79.51.138]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA09500 - for ; Wed, 5 Nov 1997 15:30:16 -0500 (EST) -Received: from [137.79.51.141] (hotzmac [137.79.51.141]) by hotzsun.jpl.nasa.gov (8.7.6/8.7.3) with SMTP id MAA10100; Wed, 5 Nov 1997 12:29:58 -0800 (PST) -X-Sender: hotzmail@hotzsun.jpl.nasa.gov -Message-Id: -Mime-Version: 1.0 -Content-Type: text/plain; charset="us-ascii" -Date: Wed, 5 Nov 1997 12:29:58 -0800 -To: Bruce Momjian , - matti@algonet.se (Mattias Kregert) -From: hotz@jpl.nasa.gov (Henry B. Hotz) -Subject: Re: [HACKERS] My $.02, was: PERFORMANCE and Good Bye, Time Travel! -Cc: pgsql-hackers@postgreSQL.org -Status: OR - -At 11:15 AM 11/5/97, Bruce Momjian wrote: ->The strange thing I am hearing is that the people who use PostgreSQL are ->more worried about data recovery from a crash than million-dollar ->companies that use commercial databases. -> ->I don't get it. - -I would run PG to make sure that committed transactions were really written -to disk because that seems "correct" and I don't have the kind of -performance requirements that would push me to do otherwise. - -That said, I can see a need for varying performance/crash-immunity -tradeoffs, and at least *one* option in between "correct" and "unprotected" -operation would seem desirable. - -Signature failed Preliminary Design Review. -Feasibility of a new signature is currently being evaluated. 
-h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu - - - -From owner-pgsql-hackers@hub.org Thu Nov 6 15:51:23 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA04634 - for ; Thu, 6 Nov 1997 15:51:08 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA24783; Thu, 6 Nov 1997 15:36:47 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 15:36:07 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA24514 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 15:36:02 -0500 (EST) -Received: from guevara.bildbasen.kiruna.se (guevara.bildbasen.kiruna.se [193.45.225.110]) by hub.org (8.8.5/8.7.5) with SMTP id PAA24319 for ; Thu, 6 Nov 1997 15:35:32 -0500 (EST) -Received: (qmail 9764 invoked by uid 129); 6 Nov 1997 20:34:35 -0000 -Date: 6 Nov 1997 20:34:35 -0000 -Message-ID: <19971106203435.9763.qmail@guevara.bildbasen.kiruna.se> -From: Goran Thyni -To: pgsql-hackers@postgreSQL.org -In-reply-to: <34619E9E.622F563@algonet.se> (message from Mattias Kregert on - Thu, 06 Nov 1997 11:40:30 +0100) -Subject: [HACKERS] Re: Performance vs. Crash Recovery -Mime-Version: 1.0 -Content-Type: text/plain; charset=ISO-8859-1 -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - - -I am getting quiet bored by this discussion, -if someone has a strong opinion about how this -should be done go ahead and make a test implementation -then we have something to discuss. - -In the mean time, if you want best possible data protection -mount you database disk sync:ed. This is safer than any scheme -we could come up with. -D*mned slow too, so everybody should be happy. :-) - -And I see no point implement a periodic sync in postmaster. -All unices has cron, why not just use that. 
-Or even a stupid 1-liner (ba)sh-script like: - -while true; do sleep 20; sync; done - - best regards, --- ---------------------------------------------- -Göran Thyni, sysadm, JMS Bildbasen, Kiruna - - - -From vadim@sable.krasnoyarsk.su Thu Nov 6 23:31:41 1997 -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04723 - for ; Thu, 6 Nov 1997 23:31:21 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25438; Fri, 7 Nov 1997 11:36:25 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <34629AC9.15FB7483@sable.krasnoyarsk.su> -Date: Fri, 07 Nov 1997 11:36:25 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711051953.OAA08283@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > The only time that synced flag is used, is when the database starts up, -> > > and it sees that the previous shutdown was not clean. -> > > -> > > What am I missing here? -> > -> > Ok, I see. But we can avoid 'synced' flag: we can make (just before -> > sync-ing data pages) in-memory copies of "on-line" durty pg_log pages -> > to being written/fsynced and perform write/fsync from these copies -> > without stopping new commits in "on-line" page(s) (nothing must go -> > to disk from "on-line" log pages). -> -> [Working late tonight?] - -[Yes] - -> I just re-read your description, and I see what you are saying. My idea -> has pg_log commit flag be real commit flags while the system is running, -> but on reboot after failure, we remove the commit flags on non-synced -> stuff before we start up. 
-> -> Your idea is to make pg_log commit flags only appear in in-memory copies -> of pg_log, and write the commit flags to disk only after the sync is -> done. -> -> Either way will work. The question is, "Which is easier?" The OS is -> going to sync pg_log on its own. We would almost need a second copy of -> pg_log, one copy to be used on postmaster startup, and a second to be -> used by running backends, and the postmaster would make a copy of the -> running backend pg_log, sync the disks, and copy it to the boot copy. -> -> I don't see how the backend is going to figure out which pg_log pages -> were modified and need to be sent to the boot copy of pg_log. -> -> Now that I am thinking, here is a good idea. Instead of a fancy -> transaction queue, what if we just have the backend record the lowest -> numbered transaction they commit in a shared memory area. If the -> current transaction id they commit is greater than the minimum, then -> change nothing. That way, the backend could copy all pg_log pages -> containing that minimum pg_log transaction id up to the most recent -> pg_log page, do the sync, and copy just those to the boot copy of -> pg_log. -> -> This eliminates the transaction id queue. -> -> The nice thing about the sync-flag in pg_log is that there is no copying -> by the backend. But we would have to spin through the file to set those -> sync bits. Your method just copies whole pages to the boot copy. - - In my plans to re-design transaction system I supposed to keep in shmem -two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer -to access them is not good idea. Also, we could use spinlock instead of -lock manager to synchronize access to these pages (as I see in spin.c -spinlock-s could be shared, but only exclusive ones are used) - spinlocks -are faster. - These two last pg_log pages are "online" ones. Race condition: when one or -both of online pages becomes non-online ones, i.e. 
pg_log has to be expanded -when writing commit/abort of "big" xid. This is how we could handle this -in "buffered" logging (delayed fsync) mode: - - When backend want to write commit/abort status he acquires exclusive -OnLineLogLock. If xid belongs to online pages then backend writes status -and releases spin. If xid is less than least xid on 1st online page then -backend releases spin and does exactly the same what he does in normal mode: -flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, -update xid status, WriteBuffer, release write lock, flush pg_log. -If xid is greater than max xid on 2nd online page then the simplest way is -just do sync(); sync() (two times), flush 1st or both online pages, -read new page(s) into online pages space, update xid status, -release OnLineLogLock spin. We could try other ways but pg_log expanding -is rare case (32K xids in one pg_log page)... - All what postmaster will have to do is: -1. Get shared OnLineLogLock. -2. Copy 2 x 8K data to private place. -3. Release spinlock. -4. sync(); sync(); (two times!) -5. Flush online pages. - -We could use -F DELAY_TIME to turn fsync delayed mode ON. - -And, btw, having two bits for xact status we have only one unused -status value (0x11) currently - I would like to use this for -nested xactions and savepoints... - -> I don't want to force this idea on anyone, or annoy anyone. I just -> think it needs to be considered. The concepts are unusual, so once -> people get the full idea, if they don't like it, we can trash it. I -> still think it holds promise. - -Agreed. 
- -Vadim - -From owner-pgsql-hackers@hub.org Fri Nov 7 01:32:49 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07651 - for ; Fri, 7 Nov 1997 01:32:47 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23328 for ; Thu, 6 Nov 1997 23:46:08 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA19565; Thu, 6 Nov 1997 23:38:55 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 06 Nov 1997 23:36:53 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA18911 for pgsql-hackers-outgoing; Thu, 6 Nov 1997 23:36:44 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA18779 for ; Thu, 6 Nov 1997 23:36:02 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25448; Fri, 7 Nov 1997 11:40:29 +0700 (KRS) -Message-ID: <34629BBD.59E2B600@sable.krasnoyarsk.su> -Date: Fri, 07 Nov 1997 11:40:29 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: Mattias Kregert , pgsql-hackers@postgreSQL.org -Subject: Re: Sync:ing data and log (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!) -References: <199711061810.NAA02118@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> > -> > Never use sync(). Use fsync(). Other processes should take care of their -> > own syncing. If you use sync(), and you have a lot of disks, the sync -> > can -> > take half a minute if you are unlucky. 
-> -> We could use fsync() but then the postmaster has to know what tables -> have dirty buffers, and I don't think there is an easy way to do this. - -There is one way - shared system cache... - -Vadim - - -From vadim@sable.krasnoyarsk.su Fri Nov 7 01:31:24 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07639 - for ; Fri, 7 Nov 1997 01:31:22 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA23094 for ; Thu, 6 Nov 1997 23:39:00 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id LAA25457; Fri, 7 Nov 1997 11:43:52 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <34629C87.3F54BC7E@sable.krasnoyarsk.su> -Date: Fri, 07 Nov 1997 11:43:51 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Mattias Kregert -CC: Bruce Momjian , pgsql-hackers@postgreSQL.org -Subject: Re: Performance vs. Crash Recovery (Was: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel!) -References: <199711051615.LAA02260@candle.pha.pa.us> <34619E9E.622F563@algonet.se> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Mattias Kregert wrote: -> -> > The strange thing I am hearing is that the people who use PostgreSQL are -> > more worried about data recovery from a crash than million-dollar -> > companies that use commercial databases. -> > -> > I don't get it. -> -> Perhaps the million-dollar companies have more sophisticated hardware, -> like big expensive disk arrays, big UPS:es and parallell backup -> servers? -> If so, the risk of harware failure is much smaller for them. 
- -More of that - Informix is more stable than postgres: elog(FATAL) -occures sometime and in fsync delayed mode this will cause -of losing xaction too, not onle hard/OS failure. - -Vadim - -From owner-pgsql-hackers@hub.org Fri Nov 7 01:31:26 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07642 - for ; Fri, 7 Nov 1997 01:31:24 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id AAA24358 for ; Fri, 7 Nov 1997 00:09:47 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA00167; Fri, 7 Nov 1997 00:03:17 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:01:26 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA29427 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:01:19 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA29364 for ; Fri, 7 Nov 1997 00:01:02 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id XAA05565; - Thu, 6 Nov 1997 23:54:33 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711070454.XAA05565@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Thu, 6 Nov 1997 23:54:33 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <34629AC9.15FB7483@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 11:36:25 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -I was worried when you didn't respond to my last list of ideas. I -thought perhaps the idea was getting on your nerves. 
- -I haven't dropped the idea because: - - 1) it offers 2-9 times speedup in database modifications - 2) this is how the big commercial system handle it, and I think - we need to give users this option. - 3) in the way I had it designed, it wouldn't take much work to - do it. - -Anything that promises that much speedup, if it can be done easy, I say -lets consider it, even if you loose 60 seconds of changes. - - -> In my plans to re-design transaction system I supposed to keep in shmem -> two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer -> to access them is not good idea. Also, we could use spinlock instead of -> lock manager to synchronize access to these pages (as I see in spin.c -> spinlock-s could be shared, but only exclusive ones are used) - spinlocks -> are faster. - -Ah, so you already had the idea of having on-line pages in shared memory -as part of a transaction system overhaul? Right now, does each backend -lock/read/write/unlock to get at pg_log? Wow, that is bad. - -Perhaps mmap() would be a good idea. My system has msync() to flush -mmap()'ed pages to the underlying file. You would still run fsync() -after that. This may give us the best of both worlds: a shared-memory -area of variable size, and control of when it get flushed to disk. Do -other OS's have this? I have a feeling OS's with unified buffer caches -don't have this ability to determine when the underlying mmap'ed file -gets sent to the underlying file and disk. - - -> These two last pg_log pages are "online" ones. Race condition: when one or -> both of online pages becomes non-online ones, i.e. pg_log has to be expanded -> when writing commit/abort of "big" xid. This is how we could handle this -> in "buffered" logging (delayed fsync) mode: -> -> When backend want to write commit/abort status he acquires exclusive -> OnLineLogLock. If xid belongs to online pages then backend writes status -> and releases spin. 
If xid is less than least xid on 1st online page then -> backend releases spin and does exactly the same what he does in normal mode: -> flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, -> update xid status, WriteBuffer, release write lock, flush pg_log. -> If xid is greater than max xid on 2nd online page then the simplest way is -> just do sync(); sync() (two times), flush 1st or both online pages, -> read new page(s) into online pages space, update xid status, -> release OnLineLogLock spin. We could try other ways but pg_log expanding -> is rare case (32K xids in one pg_log page)... -> All what postmaster will have to do is: -> 1. Get shared OnLineLogLock. -> 2. Copy 2 x 8K data to private place. -> 3. Release spinlock. -> 4. sync(); sync(); (two times!) -> 5. Flush online pages. -> -> We could use -F DELAY_TIME to turn fsync delayed mode ON. -> -> And, btw, having two bits for xact status we have only one unused -> status value (0x11) currently - I would like to use this for -> nested xactions and savepoints... - -I saw that. By keeping two copies of pg_log, one in memory to be used -by all backend, and another that hits the disk, it certainly will work. - -> -> > I don't want to force this idea on anyone, or annoy anyone. I just -> > think it needs to be considered. The concepts are unusual, so once -> > people get the full idea, if they don't like it, we can trash it. I -> > still think it holds promise. -> -> Agreed. 
-> -> Vadim -> - - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Fri Nov 7 01:03:09 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07314 - for ; Fri, 7 Nov 1997 01:03:05 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA07879; Fri, 7 Nov 1997 00:57:42 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 00:55:52 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA03918 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 00:55:46 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA02961 for ; Fri, 7 Nov 1997 00:55:18 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA25567; Fri, 7 Nov 1997 12:59:29 +0700 (KRS) -Message-ID: <3462AE40.FF6D5DF@sable.krasnoyarsk.su> -Date: Fri, 07 Nov 1997 12:59:28 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711070454.XAA05565@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Bruce Momjian wrote: -> -> I was worried when you didn't respond to my last list of ideas. I -> thought perhaps the idea was getting on your nerves. - -No, I was (and, unfortunately, I still) busy... - -> -> I haven't dropped the idea because: -> -> 1) it offers 2-9 times speedup in database modifications -> 2) this is how the big commercial system handle it, and I think -> we need to give users this option. 
-> 3) in the way I had it designed, it wouldn't take much work to -> do it. -> -> Anything that promises that much speedup, if it can be done easy, I say -> lets consider it, even if you loose 60 seconds of changes. - -I agreed with your un-buffered logging idea. This would be excellent -feature for un-critical dbase usings (WWW, etc). - -> -> > In my plans to re-design transaction system I supposed to keep in shmem -> > two last pg_log pages. They are most often used and using ReadBuffer/WriteBuffer -> > to access them is not good idea. Also, we could use spinlock instead of -> > lock manager to synchronize access to these pages (as I see in spin.c -> > spinlock-s could be shared, but only exclusive ones are used) - spinlocks -> > are faster. -> -> Ah, so you already had the idea of having on-line pages in shared memory -> as part of a transaction system overhaul? Right now, does each backend - -Yes. I hope to implement this in the next 1-2 weeks. - -> lock/read/write/unlock to get at pg_log? Wow, that is bad. - -Yes, he does. - -> -> Perhaps mmap() would be a good idea. My system has msync() to flush -> mmap()'ed pages to the underlying file. You would still run fsync() -> after that. This may give us the best of both worlds: a shared-memory - ^^^^^^^^^^^^^ -> area of variable size, and control of when it get flushed to disk. Do - ^^^^^^^^^^^^^^^^^^^^^ -I like it. FreeBSD supports - -MAP_ANON Map anonymous memory not associated with any specific file. - -It would be nice to use mmap to get more "shared" memory, but I don't see -reasons to mmap any particular file to memory. Having two last pg_log pages -in memory + xact commit/abort writeback optimization (updation of commit/abort -xmin/xmax status in tuples by any scan - we already have this) reduce access -to "old" pg_log pages to zero. - -> other OS's have this? 
I have a feeling OS's with unified buffer caches -> don't have this ability to determine when the underlying mmap'ed file -> gets sent to the underlying file and disk. -> -> > These two last pg_log pages are "online" ones. Race condition: when one or -> > both of online pages becomes non-online ones, i.e. pg_log has to be expanded -> > when writing commit/abort of "big" xid. This is how we could handle this -> > in "buffered" logging (delayed fsync) mode: -> > -> > When backend want to write commit/abort status he acquires exclusive -> > OnLineLogLock. If xid belongs to online pages then backend writes status -> > and releases spin. If xid is less than least xid on 1st online page then -> > backend releases spin and does exactly the same what he does in normal mode: -> > flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, -> > update xid status, WriteBuffer, release write lock, flush pg_log. -> > If xid is greater than max xid on 2nd online page then the simplest way is -> > just do sync(); sync() (two times), flush 1st or both online pages, -> > read new page(s) into online pages space, update xid status, -> > release OnLineLogLock spin. We could try other ways but pg_log expanding -> > is rare case (32K xids in one pg_log page)... -> > All what postmaster will have to do is: -> > 1. Get shared OnLineLogLock. -> > 2. Copy 2 x 8K data to private place. -> > 3. Release spinlock. -> > 4. sync(); sync(); (two times!) -> > 5. Flush online pages. -> > -> > We could use -F DELAY_TIME to turn fsync delayed mode ON. -> > -> > And, btw, having two bits for xact status we have only one unused -> > status value (0x11) currently - I would like to use this for -> > nested xactions and savepoints... - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -More about this: 0x11 could mean "this _child_ transaction is committed - -you have to lookup in pg_xact_child to get parent xid and use pg_log again -to get parent xact status". 
If parent committed then child xact status -will be changed to 0x10 (committed) else - to 0x01 (aborted). Using this -we could get xact nesting and savepoints by starting new child xaction -inside running one... - -> -> I saw that. By keeping two copies of pg_log, one in memory to be used - ^^^^^^ - Just two pg_log pages... - -> by all backend, and another that hits the disk, it certainly will work. - -Vadim - - -From vadim@sable.krasnoyarsk.su Fri Nov 7 01:30:59 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA07599 - for ; Fri, 7 Nov 1997 01:30:58 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26793 for ; Fri, 7 Nov 1997 01:12:33 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA25592; Fri, 7 Nov 1997 13:16:39 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <3462B247.ABD322C@sable.krasnoyarsk.su> -Date: Fri, 07 Nov 1997 13:16:39 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Jan Wieck -CC: Bruce Momjian , hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -wieck@sapserv.debis.de wrote: -> -> Bruce wrote: -> > -> > > > It seems that this is what Oracle does, but Sybase writes queries -> > > > (with transaction ids, of 'course, and before execution) and -> > > > begin, commit/abort events <-- this is better for non-overwriting -> > > > system (shorter redo file), but, agreed, recovering is more complicated. 
-> > > > -> > > > Vadim -> > > > -> > > -> > > Writing only the queries (and only those that really modify -> > > data - no selects) would be much smarter and the redo files -> > > will be shorter. But it wouldn't fit for PostgreSQL as long -> > > as someone can submit a query like -> > > -> > > DELETE FROM xxx WHERE oid = 59337; -> > -> > Interesting point. Currently, an insert shows the OID as output in -> > psql. Perhaps we could do a little oid-manipulating to set the oid of -> > the insert. -> -> Only for simple inserts, not on -> -> INSERT INTO xxx SELECT any_type_of_merge_join; - -I don't know how but Sybase handle this and IDENTITY (case of OIDs) too. -But I don't object you, Jan, just because I havn't time to do -"log queries" redo implementation and so I would like to have "log changes" -redo at least. (Actually, "log changes" is good for my production dbase -with 1 - 2 thousand updations per day). -(BTW, "incrementing" backup could be implemented without redo - I have -some thoughts about this, - but having additional recovering is good -in any case). 
- -Vadim - -From owner-pgsql-hackers@hub.org Fri Nov 7 15:42:58 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA22341 - for ; Fri, 7 Nov 1997 15:42:55 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id PAA02769; Fri, 7 Nov 1997 15:28:54 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 07 Nov 1997 15:24:00 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id PAA01318 for pgsql-hackers-outgoing; Fri, 7 Nov 1997 15:23:52 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id PAA00705 for ; Fri, 7 Nov 1997 15:21:56 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id PAA20010; - Fri, 7 Nov 1997 15:20:10 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711072020.PAA20010@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Fri, 7 Nov 1997 15:20:10 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <3462AE40.FF6D5DF@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 7, 97 12:59:28 pm -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> > Anything that promises that much speedup, if it can be done easy, I say -> > lets consider it, even if you loose 60 seconds of changes. -> -> I agreed with your un-buffered logging idea. This would be excellent -> feature for un-critical dbase usings (WWW, etc). - -Actually, it is buffered logging. We currently have unbuffered logging, -I think. - -> > > In my plans to re-design transaction system I supposed to keep in shmem -> > > two last pg_log pages. 
They are most often used and using ReadBuffer/WriteBuffer -> > > to access them is not good idea. Also, we could use spinlock instead of -> > > lock manager to synchronize access to these pages (as I see in spin.c -> > > spinlock-s could be shared, but only exclusive ones are used) - spinlocks -> > > are faster. -> > -> > Ah, so you already had the idea of having on-line pages in shared memory -> > as part of a transaction system overhaul? Right now, does each backend -> -> Yes. I hope to implement this in the next 1-2 weeks. -> -> > lock/read/write/unlock to get at pg_log? Wow, that is bad. -> -> Yes, he does. -> -> > -> > Perhaps mmap() would be a good idea. My system has msync() to flush -> > mmap()'ed pages to the underlying file. You would still run fsync() -> > after that. This may give us the best of both worlds: a shared-memory -> ^^^^^^^^^^^^^ -> > area of variable size, and control of when it get flushed to disk. Do -> ^^^^^^^^^^^^^^^^^^^^^ -> I like it. FreeBSD supports -> -> MAP_ANON Map anonymous memory not associated with any specific file. -> -> It would be nice to use mmap to get more "shared" memory, but I don't see -> reasons to mmap any particular file to memory. Having two last pg_log pages -> in memory + xact commit/abort writeback optimization (updation of commit/abort -> xmin/xmax status in tuples by any scan - we already have this) reduce access -> to "old" pg_log pages to zero. - -I totally agree. There is no advantage to mmap() vs. shared memory for -us. I thought if we could control when the mmap() gets flushed to disk, -we could let the OS handle the syncing, but I doubt this is going to be -portable. - -Though, we could mmap() pg_log, and that way backends would not have to -read/write the blocks, and they could all see the same data. But with -the new scheme, they have most transaction ids in shared memory. - -Interesting you mention the scan updating the transaction status. We -would have a problem here. 
It is possible a backend will update the -commit status of a data page, and that data page will make it to disk, -but if there is a crash before the update pg_log gets sync'ed, there -would be a partial transaction in the system. - -I don't know any way that a backend would know the transaction has hit -disk, and the data commit flag could be set. You don't want to update -the commit flag of the data page until entire transaction has been -sync'ed. The only way to do that would be to have a 'commit and synced' -flag, but you want to save that for nested transactions. - -Another case this could come in handy is to allow reuse of superceeded -data rows. If the transaction is committed and synced, the row space -could be reused by another transaction. - -> > other OS's have this? I have a feeling OS's with unified buffer caches -> > don't have this ability to determine when the underlying mmap'ed file -> > gets sent to the underlying file and disk. -> > -> > > These two last pg_log pages are "online" ones. Race condition: when one or -> > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded -> > > when writing commit/abort of "big" xid. This is how we could handle this -> > > in "buffered" logging (delayed fsync) mode: -> > > -> > > When backend want to write commit/abort status he acquires exclusive -> > > OnLineLogLock. If xid belongs to online pages then backend writes status - -This confuses me. Why does a backend need to lock pg_log to update a -transaction status? - -> > > and releases spin. If xid is less than least xid on 1st online page then -> > > backend releases spin and does exactly the same what he does in normal mode: -> > > flush (write and fsync) all durty data files, lock pg_log for write, ReadBuffer, -> > > update xid status, WriteBuffer, release write lock, flush pg_log. 
-> > > If xid is greater than max xid on 2nd online page then the simplest way is -> > > just do sync(); sync() (two times), flush 1st or both online pages, -> > > read new page(s) into online pages space, update xid status, -> > > release OnLineLogLock spin. We could try other ways but pg_log expanding -> > > is rare case (32K xids in one pg_log page)... -> > > All what postmaster will have to do is: -> > > 1. Get shared OnLineLogLock. -> > > 2. Copy 2 x 8K data to private place. -> > > 3. Release spinlock. -> > > 4. sync(); sync(); (two times!) -> > > 5. Flush online pages. - -Great. - -> > > -> > > We could use -F DELAY_TIME to turn fsync delayed mode ON. -> > > -> > > And, btw, having two bits for xact status we have only one unused -> > > status value (0x11) currently - I would like to use this for -> > > nested xactions and savepoints... -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> More about this: 0x11 could mean "this _child_ transaction is committed - -> you have to lookup in pg_xact_child to get parent xid and use pg_log again -> to get parent xact status". If parent committed then child xact status -> will be changed to 0x10 (committed) else - to 0x01 (aborted). Using this -> we could get xact nesting and savepoints by starting new child xaction -> inside running one... - -OK. - -> -> > -> > I saw that. By keeping two copies of pg_log, one in memory to be used -> ^^^^^^ -> Just two pg_log pages... - -Got it. 
- - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From owner-pgsql-hackers@hub.org Sun Nov 9 22:07:36 1997 -Received: from hub.org (hub.org [209.47.148.200]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA04655 - for ; Sun, 9 Nov 1997 22:07:30 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id VAA07023; Sun, 9 Nov 1997 21:55:54 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 21:52:20 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id VAA06174 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 21:52:13 -0500 (EST) -Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id VAA06092 for ; Sun, 9 Nov 1997 21:51:58 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id VAA04150; - Sun, 9 Nov 1997 21:50:29 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711100250.VAA04150@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! (fwd) -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Sun, 9 Nov 1997 21:50:29 -0500 (EST) -Cc: hackers@postgreSQL.org (PostgreSQL-development) -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -Forwarded message: -> > > Perhaps mmap() would be a good idea. My system has msync() to flush -> > > mmap()'ed pages to the underlying file. You would still run fsync() -> > > after that. This may give us the best of both worlds: a shared-memory -> > ^^^^^^^^^^^^^ -> > > area of variable size, and control of when it get flushed to disk. Do -> > ^^^^^^^^^^^^^^^^^^^^^ -> > I like it. FreeBSD supports -> > -> > MAP_ANON Map anonymous memory not associated with any specific file. 
-> > -> > It would be nice to use mmap to get more "shared" memory, but I don't see -> > reasons to mmap any particular file to memory. Having two last pg_log pages -> > in memory + xact commit/abort writeback optimization (updation of commit/abort -> > xmin/xmax status in tuples by any scan - we already have this) reduce access -> > to "old" pg_log pages to zero. -> -> I totally agree. There is no advantage to mmap() vs. shared memory for -> us. I thought if we could control when the mmap() gets flushed to disk, -> we could let the OS handle the syncing, but I doubt this is going to be -> portable. -> -> Though, we could mmap() pg_log, and that way backends would not have to -> read/write the blocks, and they could all see the same data. But with -> the new scheme, they have most transaction ids in shared memory. -> -> Interesting you mention the scan updating the transaction status. We -> would have a problem here. It is possible a backend will update the -> commit status of a data page, and that data page will make it to disk, -> but if there is a crash before the update pg_log gets sync'ed, there -> would be a partial transaction in the system. -> -> I don't know any way that a backend would know the transaction has hit -> disk, and the data commit flag could be set. You don't want to update -> the commit flag of the data page until entire transaction has been -> sync'ed. The only way to do that would be to have a 'commit and synced' -> flag, but you want to save that for nested transactions. -> -> Another case this could come in handy is to allow reuse of superceeded -> data rows. If the transaction is committed and synced, the row space -> could be reused by another transaction. -> - -I have been thinking about the mmap() issue, and it seems a natural for -pg_log. You can have every backend mmap() pg_log. It becomes a dynamic -shared memory area that is auto-initialized to the contents of pg_log, -and all changes can be made by all backends. 
No locking needed. We can -also flush the changes to the underlying file. Under bsdi, you can also -have the mmap area follow you across exec() calls, so each backend -doesn't have to do anything. I want to replace exec with fork also, so -the stuff would be auto-loaded in the address space of each backend. - -This way, you don't have to have two on-line pages and move them around -as pg_log grows. - -The only problem remains how to mark certain transactions as synced or -force only synced transactions to hit the pg_log file itself, and data -row commit status only should be updated for synced transactions. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -From vadim@sable.krasnoyarsk.su Sun Nov 9 23:00:58 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA05394 - for ; Sun, 9 Nov 1997 23:00:55 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA25139 for ; Sun, 9 Nov 1997 22:42:33 -0500 (EST) -Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id KAA01845; Mon, 10 Nov 1997 10:49:25 +0700 (KRS) -Sender: root@www.krasnet.ru -Message-ID: <34668444.237C228A@sable.krasnoyarsk.su> -Date: Mon, 10 Nov 1997 10:49:24 +0700 -From: "Vadim B. Mikheev" -Organization: ITTS (Krasnoyarsk) -X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386) -MIME-Version: 1.0 -To: Bruce Momjian -CC: marc@fallon.classyad.com, hackers@postgreSQL.org -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -References: <199711072020.PAA20010@candle.pha.pa.us> -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit -Status: OR - -Bruce Momjian wrote: -> -> > > Anything that promises that much speedup, if it can be done easy, I say -> > > lets consider it, even if you loose 60 seconds of changes. -> > -> > I agreed with your un-buffered logging idea. 
This would be excellent -> > feature for un-critical dbase usings (WWW, etc). -> -> Actually, it is buffered logging. We currently have unbuffered logging, -> I think. - -Sorry - mistyping. - -> -> Interesting you mention the scan updating the transaction status. We -> would have a problem here. It is possible a backend will update the -> commit status of a data page, and that data page will make it to disk, -> but if there is a crash before the update pg_log gets sync'ed, there -> would be a partial transaction in the system. - -You're right! Currently, only system relations can be affected by this: -backend releases locks on user tables after syncing data and pg_log. -I'll keep this in mind... - -> > > > These two last pg_log pages are "online" ones. Race condition: when one or -> > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded -> > > > when writing commit/abort of "big" xid. This is how we could handle this -> > > > in "buffered" logging (delayed fsync) mode: -> > > > -> > > > When backend want to write commit/abort status he acquires exclusive -> > > > OnLineLogLock. If xid belongs to online pages then backend writes status -> -> This confuses me. Why does a backend need to lock pg_log to update a -> transaction status? - -What if two backends try to change xact statuses in the same byte ? 
- -Vadim - -From owner-pgsql-hackers@hub.org Sun Nov 9 23:59:50 1997 -Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) - by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA06523 - for ; Sun, 9 Nov 1997 23:59:48 -0500 (EST) -Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA27105 for ; Sun, 9 Nov 1997 23:41:39 -0500 (EST) -Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA08860; Sun, 9 Nov 1997 23:35:42 -0500 (EST) -Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 09 Nov 1997 23:31:50 -0500 (EST) -Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA07962 for pgsql-hackers-outgoing; Sun, 9 Nov 1997 23:31:43 -0500 (EST) -Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA07875 for ; Sun, 9 Nov 1997 23:31:28 -0500 (EST) -Received: (from maillist@localhost) - by candle.pha.pa.us (8.8.5/8.8.5) id XAA05566; - Sun, 9 Nov 1997 23:17:41 -0500 (EST) -From: Bruce Momjian -Message-Id: <199711100417.XAA05566@candle.pha.pa.us> -Subject: Re: [HACKERS] PERFORMANCE and Good Bye, Time Travel! -To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev) -Date: Sun, 9 Nov 1997 23:17:41 -0500 (EST) -Cc: marc@fallon.classyad.com, hackers@postgreSQL.org -In-Reply-To: <34668444.237C228A@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 10, 97 10:49:24 am -X-Mailer: ELM [version 2.4 PL25] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit -Sender: owner-hackers@hub.org -Precedence: bulk -Status: OR - -> > > > > These two last pg_log pages are "online" ones. Race condition: when one or -> > > > > both of online pages becomes non-online ones, i.e. pg_log has to be expanded -> > > > > when writing commit/abort of "big" xid. 
This is how we could handle this -> > > > > in "buffered" logging (delayed fsync) mode: -> > > > > -> > > > > When backend want to write commit/abort status he acquires exclusive -> > > > > OnLineLogLock. If xid belongs to online pages then backend writes status -> > -> > This confuses me. Why does a backend need to lock pg_log to update a -> > transaction status? -> -> What if two backends try to change xact statuses in the same byte ? - -Ooo, you got me. I so hoped to prevent locking. It would be nice if: - - *x |= 3; - -would be atomic, but I don't think it is. Most RISC machines don't even -have an OR against a memory address, I think. - --- -Bruce Momjian -maillist@candle.pha.pa.us - - -- 2.40.0
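Note on the closing question of the thread above, whether `*x |= 3;` can be made atomic: a minimal sketch, assuming a C11 compiler with <stdatomic.h> (which did not exist when these messages were written). The function names, the status constants, and the layout of four 2-bit xid slots per byte are illustrative assumptions, not the actual pg_log code. The binary values follow the thread (01 aborted, 10 committed, 11 spare).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 2-bit status codes, using the binary values from the thread. */
enum {
    XACT_IN_PROGRESS     = 0x0, /* 00 */
    XACT_ABORTED         = 0x1, /* 01 */
    XACT_COMMITTED       = 0x2, /* 10 */
    XACT_CHILD_COMMITTED = 0x3  /* 11, the spare value */
};

/*
 * Atomically OR a status into the 2-bit slot for xid.
 * Four xids share one byte, so two backends may hit the same byte;
 * atomic_fetch_or makes the read-modify-write indivisible, so neither
 * update is lost. A plain "*x |= bits" gives no such guarantee.
 * Assumption: page is an in-memory image of one status page.
 */
static void set_xact_status(_Atomic uint8_t *page, uint32_t xid, unsigned status)
{
    unsigned shift = (xid % 4) * 2;

    assert(status <= 0x3u);
    atomic_fetch_or(&page[xid / 4], (uint8_t)(status << shift));
}

/* Read back the 2-bit status for xid. */
static unsigned get_xact_status(_Atomic uint8_t *page, uint32_t xid)
{
    unsigned shift = (xid % 4) * 2;

    return (atomic_load(&page[xid / 4]) >> shift) & 0x3u;
}
```

An OR can only set bits, so this covers the transition from 00 (in progress) to a final status. Rewriting the spare value 11 to 10 or 01, as the nested-transaction idea requires, would need a compare-and-exchange loop or a lock instead. Whether the operation is lock-free on a given platform, and therefore safe across processes sharing the memory, is also an assumption to check.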