From: Bruce Momjian
Date: Wed, 7 Jul 2010 14:42:09 +0000 (+0000)
Subject: Document the interaction of write-barrier-enabled file systems, and BBU
X-Git-Tag: REL9_0_BETA3~14
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=e3243488b06aa17a4ce14a5c4e3234284a3789b8;p=postgresql

Document the interaction of write-barrier-enabled file systems, and BBU
caches, per June email thread.
---
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index b69bc0be68..f1c95b3484 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -1,4 +1,4 @@
-
+
 Reliability and the Write-Ahead Log
@@ -48,21 +48,27 @@
 some later time. Such caches can be a reliability hazard because the memory in the disk controller cache is volatile, and will lose its contents in a power failure. Better controller cards have
- battery-backed caches, meaning the card has a battery that
+ battery-backed unit (BBU) caches, meaning
+ the card has a battery that
 maintains power to the cache in case of system power loss. After power is restored the data will be written to the disk drives. And finally, most disk drives have caches. Some are write-through
- while some are write-back, and the
- same concerns about data loss exist for write-back drive caches as
- exist for disk controller caches. Consumer-grade IDE and SATA drives are
- particularly likely to have write-back caches that will not survive a
- power failure, though ATAPI-6 introduced a drive cache
- flush command (FLUSH CACHE EXT) that some file systems use, e.g. ZFS.
- Many solid-state drives (SSD) also have volatile write-back
- caches, and many do not honor cache flush commands by default.
+ while some are write-back, and the same concerns about data loss
+ exist for write-back drive caches as exist for disk controller
+ caches. Consumer-grade IDE and SATA drives are particularly likely
+ to have write-back caches that will not survive a power failure,
+ though ATAPI-6 introduced a drive cache flush command
+ (FLUSH CACHE EXT) that some file systems use, e.g.
+ ZFS, ext4. (The SCSI command
+ SYNCHRONIZE CACHE has long been available.) Many
+ solid-state drives (SSD) also have volatile write-back caches, and
+ many do not honor cache flush commands by default.
+
+
 To check write caching on Linux use hdparm -I; it is enabled if there is a * next to Write cache; hdparm -W to turn off
@@ -88,6 +88,25 @@
 fsync_writethrough never do write caching.
+
+ Many file systems that use write barriers (e.g. ZFS,
+ ext4) internally use FLUSH CACHE EXT or
+ SYNCHRONIZE CACHE commands to flush data to the platters on
+ write-back-enabled drives. Unfortunately, such write barrier file
+ systems behave suboptimally when combined with battery-backed unit
+ (BBU) disk controllers. In such setups, the synchronize
+ command forces all data from the BBU to the disks, eliminating much
+ of the benefit of the BBU. You can run the utility
+ src/tools/fsync in the PostgreSQL source tree to see
+ if you are affected. If you are affected, the performance benefits
+ of the BBU cache can be regained by turning off write barriers in
+ the file system or reconfiguring the disk controller, if that is
+ an option. If write barriers are turned off, make sure the battery
+ remains active; a faulty battery can potentially lead to data loss.
+ Hopefully file system and disk controller designers will eventually
+ address this suboptimal behavior.
+
+
 When the operating system sends a write request to the storage hardware, there is little it can do to make sure the data has arrived at a truly