From: Bruce Momjian
Date: Mon, 26 Nov 2001 20:19:30 +0000 (+0000)
Subject: Add to TODO item about raw device performance.
X-Git-Tag: REL7_2_BETA4~160
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=07d5117a762b7e6d73f5d3d57d84ede43898fc43;p=postgresql

Add to TODO item about raw device performance.
---

diff --git a/doc/TODO.detail/performance b/doc/TODO.detail/performance
index e5123a668d..1d3ed185fd 100644
--- a/doc/TODO.detail/performance
+++ b/doc/TODO.detail/performance
@@ -345,7 +345,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
  by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
  for ; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id KAA27535 for ; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id KAA27535 for ; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
 Received: from localhost (majordom@localhost)
  by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
  Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
  by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
  for ; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id VAA10512 for ; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id VAA10512 for ; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
 Received: from localhost (majordom@localhost)
  by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
  Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1002,3 +1002,114 @@ Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(095)939-16-83, +007(095)939-23-83
 
 
+From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
+Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165
+ for ; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
+Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id RAA13110 for ; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
+Received: from hub.org (majordom@localhost [127.0.0.1])
+ by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477;
+ Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
+Received: from home.dialix.com ([203.15.150.26])
+ by hub.org (8.10.1/8.10.1) with ESMTP id e5GLCQM14064
+ for ; Fri, 16 Jun 2000 17:12:27 -0400 (EDT)
+Received: from nemeton.com.au ([202.76.153.71])
+ by home.dialix.com (8.9.3/8.9.3/JustNet) with SMTP id HAA95516
+ for ; Sat, 17 Jun 2000 07:11:44 +1000 (EST)
+ (envelope-from giles@nemeton.com.au)
+Received: (qmail 10213 invoked from network); 16 Jun 2000 09:52:29 -0000
+Received: from nemeton.com.au (203.8.3.17)
+ by nemeton.com.au with SMTP; 16 Jun 2000 09:52:29 -0000
+To: Jurgen Defurne
+cc: Mark Stier ,
+ postgreSQL general mailing list
+Subject: Re: [GENERAL] optimization by removing the file system layer?
+In-Reply-To: Message from Jurgen Defurne
+ of "Thu, 15 Jun 2000 20:26:57 +0200." <39491FF1.E1E583F8@glo.be>
+Date: Fri, 16 Jun 2000 19:52:28 +1000
+Message-ID: <10210.961149148@nemeton.com.au>
+From: Giles Lean
+X-Mailing-List: pgsql-general@postgresql.org
+Precedence: bulk
+Sender: pgsql-general-owner@hub.org
+Status: OR
+
+
+
+> I think that the Un*x filesystem is one of the reasons that large
+> database vendors rather use raw devices, than filesystem storage
+> files.
+
+This used to be the preference, back in the late 80s and possibly
+early 90s. I'm seeing a preference toward using the filesystem now,
+possibly with some sort of async I/O and co-operation from the OS
+filesystem about interactions with the filesystem cache.
+
+Performance preferences don't stand still. The hardware changes, the
+software changes, the volume of data changes, and different solutions
+become preferable.
+
+> Using a raw device on the disk gives them the possibility to have
+> complete control over their files, indices and objects without being
+> bothered by the operating system.
+>
+> This speeds up things in several ways :
+> - the least possible OS intervention
+
+Not that this is especially useful, necessarily. If the "raw" device
+is in fact managed by a logical volume manager doing mirroring onto
+some sort of storage array there is still plenty of OS code involved.
+
+The cost of using a filesystem in addition may not be much if anything
+and of course a filesystem is considerably more flexible to
+administer (backup, move, change size, check integrity, etc.)
+
+> - choose block sizes according to applications
+> - reducing fragmentation
+> - packing data in nearby cilinders
+
+... but when this storage area is spread over multiple mechanisms in a
+smart storage array with write caching, you've no idea what is where
+anyway. Better to let the hardware or at least the OS manage this;
+there are so many levels of caching between a database and the
+magnetic media that working hard to influence layout is almost
+certainly a waste of time.
+
+Kirk McKusick tells a lovely story that once upon a time it used to be
+sensible to check some registers on a particular disk controller to
+find out where the heads were when scheduling I/O. Needless to say,
+that is history now!
+
+There's a considerable cost in complexity and code in using "raw"
+storage too, and it's not a one off cost: as the technologies change,
+the "fast" way to do things will change and the code will have to be
+updated to match. Better to leave this to the OS vendor where
+possible, and take advantage of the tuning they do.
+
+> - Anyone other ideas -> the sky is the limit here
+
+> It also aids portability, at least on platforms that have an
+> equivalent of a raw device.
+
+I don't understand that claim. Not much is portable about raw
+devices, and they're typically not nearlly as well documented as the
+filesystem interfaces.
+
+> It is also independent of the standard implemented Un*x filesystems,
+> for which you will have to pay extra if you want to take extra
+> measures against power loss.
+
+Rather, it is worse. With a Unix filesystem you get quite defined
+semantics about what is written when.
+
+> The problem with e.g. e2fs, is that it is not robust enough if a CPU
+> fails.
+
+ext2fs doesn't even claim to have Unix filesystem semantics.
+
+Regards,
+
+Giles
+
+
+