Brian Behlendorf [Wed, 20 May 2015 21:36:37 +0000 (14:36 -0700)]
Wait in libzfs_init() for the /dev/zfs device
While module loading itself is synchronous the creation of the /dev/zfs
device is not. This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs. This small window between module loading and device creation
can result in spurious failures of libzfs_init().
This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created. This allows scripts to reliably use the following
shell construct without the need for additional error handling.
$ /sbin/modprobe zfs && /sbin/zpool import -a
To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed. The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently. If it
takes longer than this it will fall back to polling for up to 10 seconds.
This behavior can be customized to some degree by setting the following
new environment variables. This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior. By default module auto-loading is now disabled.
* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>" - Seconds to wait for /dev/zfs
The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #2556
If the command "shellcheck" exists, then find all shell scripts and
run shellcheck on them.
* Use 'gcc' format with shellcheck.
* Exclude zfs-script-config.sh (which isn't really a script).
Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3428
Hajo Möller [Fri, 15 May 2015 21:14:56 +0000 (23:14 +0200)]
Change 3-digit octal escapes to 4-digit ones
Prefixing an octal value with a leading zero is the standard way
to disambiguate it. This change only impacts the `zfs diff` output
and is therefore very limited in scope.
Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3417
The mount helper mount.zfs MUST be in /sbin (not '$sbindir').
Commit 60e9f69 added the --with-mounthelperdir option for Gentoo
and in the process accidentally modified the default installation
location. For security reasons mount(8) expects it to only be
installed under /sbin.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3426
Tim Chase [Sat, 16 May 2015 15:40:45 +0000 (10:40 -0500)]
Initialize dbu_tqent in dmu_buf_init_user()
The dbu_evict_taskq added in 0c66c32d is only invoked via
taskq_dispatch_ent(). In these cases, ZoL's implementation of taskqs
requires the entries to be initialized first with taskq_init_ent() in
order that, among other things, the embedded spinlock is initialized
properly.
Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3419
Alexander Eremin [Wed, 13 May 2015 22:15:56 +0000 (15:15 -0700)]
Illumos 5847 - libzfs_diff should check zfs_prop_get() return
5847 libzfs_diff should check zfs_prop_get() return
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Matthew Ahrens [Thu, 14 May 2015 23:41:29 +0000 (17:41 -0600)]
Illumos 5243 - zdb -b could be much faster
5243 zdb -b could be much faster
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Jan Sanislo [Tue, 12 May 2015 20:30:19 +0000 (13:30 -0700)]
Return -ESTALE to force lookup for missing NFS file handles
There seems to be a annoying problem using NFSv4 to access ZFS
file systems under certain circumstances. It's easily reproduced:
nfs_client1: mount server:/export /mnt
nfs_client1: cd /mnt
nfs_client1: echo foo >junk
nfs_client1: cat junk
foo
Now on a different NFSv4 client:
nfs_client2: mount server:/export /mnt
nfs_client2: cd /mnt
nfs_client2: vi junk
# Make some changes to /mnt/junk and save
# This change the inode associated with /mnt/junk
Now back to the original client:
nfs_client1: cat junk
cat: junk: No such file or directory
Admittedly NFSv4 is not advertised as a cluster file system that
maintains a completely coherent view of data across multiple nodes.
But it does have some mechanisms built in that try to deal with
situations like the above. Namely, it employs specialized file
handle lookup routines that return ESTALE when a file handle contains
a non-existant inode value. The ESTALE return triggers a return
full file path lookup from the client to determine if the file has
actually gone away or if the cached file handle is no longer valid.
ZFS behavior can be brought into line with other file systems
(e.g., ext4) by applying the following patch:
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3224
Antonio Russo [Wed, 13 May 2015 14:16:42 +0000 (07:16 -0700)]
Relax restriction on zfs_ioc_next_obj() iteration
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.
In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com>
Closes #2081
Brian Behlendorf [Wed, 13 May 2015 17:50:35 +0000 (10:50 -0700)]
Remove unused 'dsl_pool_t *dp' variable
When ASSERTs are compiled out by using the --disable-debug configure
option. Then the local variable 'dsl_pool_t *dp' will be unused and
generate a compiler warning. Since this variable is only used once
in the ASSERT replace it with 'ds->ds_dir->dd_pool'.
This has the additional advantage of potentially saving a few bytes
on the stack depending on how gcc decides to compile the function.
This issue was not noticed immediately because the automated builders
use --enable-debug to make the testing as rigorous as possible.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes #3410
Max Grossman [Wed, 8 Apr 2015 18:37:13 +0000 (11:37 -0700)]
Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
Matthew Ahrens [Sun, 26 Apr 2015 22:27:36 +0000 (15:27 -0700)]
Illumos 5810 - zdb should print details of bpobj
5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Matthew Ahrens [Wed, 6 May 2015 17:38:29 +0000 (03:38 +1000)]
Illumos 5351, 5352 - scrub pauses
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Matthew Ahrens [Wed, 6 May 2015 17:24:09 +0000 (03:24 +1000)]
Illumos 5350 - clean up code in dnode_sync()
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Alex Reece [Wed, 6 May 2015 17:08:25 +0000 (03:08 +1000)]
Illumos 5422 - preserve AVL invariants in dn_dbufs
Author: Alex Reece <alex@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Albert Lee <trisk@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Tim Chase [Fri, 8 May 2015 20:49:56 +0000 (15:49 -0500)]
Linux 2.6.36 compat, use REQ_FAILFAST_MASK and remove pre-2.6.36 support
Commit f4af6bb783b0b7f2a6075cb1c74c225db8a157b2 which added support
for REQ_FAILFAST_MASK but the new autoconf test didn't use the same
preprocessor macro name as the code did.
The effect is that FAILFAST mode has not been enabled for ZoL in any
post-2.6.35 kernel.
Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels.
Raise an error condition if the FAILFAST interface can't be detected.
Signed-off-by: Tim Chase <tim@onlight.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3386
Chris Dunlap [Wed, 6 May 2015 22:56:03 +0000 (15:56 -0700)]
Update ZED copyright boilerplate
This commit updates the copyright boilerplate within the ZED subtree.
The instructions for appending a contributor copyright line have
been removed. Manually maintaining copyright notices in this
manner is error-prone, imprecise at a file-scope granularity, and
oftentimes inaccurate. These lines can become a pernicious source of
merge conflicts. A commit log is better suited to maintaining this
information. Consequently, a line has been added to the boilerplate
to refer to the git commit log for authoritative copyright attribution.
To account for the scenario where a file may become separated from
the codebase and commit history (i.e., it is copied somewhere else),
a line has been added to identify the file's origin.
David Lamparter [Wed, 2 Jul 2014 21:47:02 +0000 (23:47 +0200)]
Safely handle security / ACL failures
The security and ACL operations should all be performed atomically.
To accomplish this there would need to significant invasive changes
made to the common code base. For the moment it's desirable for
compatibility reasons to avoid this. Therefore the code has been
updated to attempt to unwind the operation in case of failure
rather than panic.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2445
Brian Behlendorf [Thu, 29 Jan 2015 20:45:40 +0000 (12:45 -0800)]
ztest should randomly change recordsize
Improve the large block feature test coverage by extending ztest
to frequently change the recordsize. This is specificially designed
to catch corner cases which might otherwise go unnoticed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #354
* Included in this patch is a tiny ISP2() cleanup in zio_init() from
Illumos 5255.
* Unlike the upstream Illumos commit this patch does not impose an
arbitrary 128K block size limit on volumes. Volumes, like filesystems,
are limited by the zfs_max_recordsize=1M module option.
* By default the maximum record size is limited to 1M by the module
option zfs_max_recordsize. This value may be safely increased up to
16M which is the largest block size supported by the on-disk format.
At the moment, 1M blocks clearly offer a significant performance
improvement but the benefits of going beyond this for the majority
of workloads are less clear.
* The illumos version of this patch increased DMU_MAX_ACCESS to 32M.
This was determined not to be large enough when using 16M blocks
because the zfs_make_xattrdir() function will fail (EFBIG) when
assigning a TX. This was immediately observed under Linux because
all newly created files must have a security xattr created and
that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M.
* On 32-bit platforms a hard limit of 1M is set for blocks due
to the limited virtual address space. We should be able to relax
this one the ABD patches are merged.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #354
Brian Behlendorf [Mon, 11 May 2015 19:05:05 +0000 (12:05 -0700)]
Fix type mismatch on 32-bit systems
The umem_alloc_aligned() function should not assume that a 'void *'
type is 64-bit. It will not be on 32-bit platforms. Rather than
complicating the ASSERT to handle this it is simply removed.
Additionally, the '%lu' format specifier should not be assumed to
imply a 64-bit value. Fix this by using the 'llu' format specifier
which will always be atleast 64-bit and explicitly casing the
variable to an u_longlong_t. This issue is handled the same way
in many other parts of the code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The metaslab_min_alloc_size option is no longer used in the code.
This functionality was removed by commit f3a7f66 and the module
options should have been dropped at that time.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Chris Dunlop [Tue, 5 May 2015 23:59:17 +0000 (09:59 +1000)]
arc_evict, arc_evict_ghost: reduce stack usage using kmem_zalloc
With debugging enabled and depending on your kernel config, the size of
arc_buf_hdr_t can blow out the stack of arc_evict() and arc_evict_ghost()
to greater than 1024 bytes. Let's avoid this.
Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3377
Matthew Ahrens [Wed, 26 Nov 2014 17:57:30 +0000 (09:57 -0800)]
Illumos 5349 - verify that block pointer is plausible before reading
5349 verify that block pointer is plausible before reading
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Xin Li <delphij@FreeBSD.org>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Chris Dunlop [Sat, 2 May 2015 05:47:06 +0000 (15:47 +1000)]
Wait for all znodes to be released before tearing down the superblock
By the time we're tearing down our superblock the VFS has started releasing
all our inodes/znodes. Some of this work may have been handed off to our
iput taskq so we need to wait for that work to complete. However the iput
from the taskq can itself result in additional work being added to the
taskq:
Matthew Ahrens [Sun, 23 Nov 2014 03:24:52 +0000 (19:24 -0800)]
Illumos 5348 - zio_checksum_error() only fills in info if ECKSUM
5348 zio_checksum_error() only fills in info if ECKSUM
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Dan McDonald <danmcd@omniti.com>
Matthew Ahrens [Sun, 26 Apr 2015 22:24:34 +0000 (15:24 -0700)]
Illumos 5808 - spa_check_logs is not necessary on readonly pools
5808 spa_check_logs is not necessary on readonly pools
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Will Andrews [Sun, 26 Apr 2015 22:30:46 +0000 (15:30 -0700)]
Illumos 5814 - bpobj_iterate_impl(): Close a refcount leak iterating on a sublist.
5814 bpobj_iterate_impl(): Close a refcount leak iterating on a sublist.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Support exporting all imported pools in one go, using 'zpool export -a'.
This is accomplished by moving the export parts from zpool_do_export()
in to the new function zpool_export_one(). The for_each_pool() function
is used to enumerate the list of pools to be exported. Passing an argc
of 0 implies the function should be called on all pools.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #3203
Matthew Ahrens [Mon, 3 Nov 2014 20:28:43 +0000 (12:28 -0800)]
Illumos 4951 - ZFS administrative commands should use reserved space
4951 ZFS administrative commands should use reserved space, not with ENOSPC
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Matthew Ahrens [Mon, 3 Nov 2014 19:12:40 +0000 (11:12 -0800)]
Illumos 3654,3656
3654 zdb should print number of ganged blocks
3656 remove unused function zap_cursor_move_to_key()
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
tuxoko [Fri, 1 May 2015 03:11:01 +0000 (11:11 +0800)]
Add cond_resched to zfs_zget to prevent infinite loop
It's been reported that threads would loop infinitely inside zfs_zget. The
speculated cause for this is that if an inode is marked for evict, zfs_zget
would see that and loop. However, if the looping thread doesn't yield, the
inode may not have a chance to finish evict, thus causing a infinite loop.
This patch solve this issue by add cond_resched to zfs_zget, making the
looping thread to yield when needed.
Jason Zaman [Thu, 30 Apr 2015 12:20:38 +0000 (16:20 +0400)]
dmu: fix integer overflows
The params to the functions are uint64_t, but the offsets to memcpy
/ bcopy are calculated using 32bit ints. This patch changes them to
also be uint64_t so there isnt an overflow. PaX's Size Overflow
caught this when formatting a zvol.
George Wilson [Mon, 20 Oct 2014 22:07:45 +0000 (22:07 +0000)]
Illumos #5244 - zio pipeline callers should explicitly invoke next stage
5244 zio pipeline callers should explicitly invoke next stage
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Gordon Ross <gwr@nexenta.com>
1. The unported "2932 support crash dumps to raidz, etc. pools"
caused a merge conflict due to a copyright difference in
module/zfs/vdev_raidz.c.
2. The unported "4128 disks in zpools never go away when pulled"
and additional Linux-specific changes caused merge conflicts in
module/zfs/vdev_disk.c.
Ported-by: Richard Yao <richard.yao@clusterhq.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2828
Matthew Ahrens [Sun, 26 Apr 2015 22:29:43 +0000 (15:29 -0700)]
Illumos 5812 - assertion failed in zrl_tryenter(): zr_owner==NULL
5812 assertion failed in zrl_tryenter(): zr_owner==NULL
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Illumos 5592 - NULL pointer dereference in dsl_prop_notify_all_cb()
5592 NULL pointer dereference in dsl_prop_notify_all_cb()
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Illumos 5531 - NULL pointer dereference in dsl_prop_get_ds()
5531 NULL pointer dereference in dsl_prop_get_ds()
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Illumos 5056 - ZFS deadlock on db_mtx and dn_holds
5056 ZFS deadlock on db_mtx and dn_holds
Author: Justin Gibbs <justing@spectralogic.com>
Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
sa_handle_get_from_db():
- the original patch includes an otherwise unmentioned fix for a
possible usage of an uninitialised variable
dmu_objset_open_impl():
- Under Illumos list_link_init() is the same as filling a list_node_t
with NULLs, so they don't notice if they miss doing list_link_init()
on a zero'd containing structure (e.g. allocated with kmem_zalloc as
here). Under Linux, not so much: an uninitialised list_node_t goes
"Boom!" some time later when it's used or destroyed.
dmu_objset_evict_dbufs():
- reduce stack usage using kmem_alloc()
Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Alex Reece [Wed, 1 Apr 2015 15:10:58 +0000 (02:10 +1100)]
Illumos 5095 - panic when adding a duplicate dbuf to dn_dbufs
5095 panic when adding a duplicate dbuf to dn_dbufs
Author: Alex Reece <alex@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Mattew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Alex Reece [Fri, 3 Apr 2015 03:14:28 +0000 (14:14 +1100)]
Illumos 4873 - zvol unmap calls can take a very long time for larger datasets
4873 zvol unmap calls can take a very long time for larger datasets
Author: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
dbuf_free_range():
- reduce stack usage using kmem_alloc()
- the sorted AVL tree will handle the spill block case correctly
without all the special handling in the for() loop
Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Alex Reece [Wed, 1 Apr 2015 13:10:21 +0000 (00:10 +1100)]
Illumos 3897 - zfs filesystem and snapshot limits (fix leak)
3897 zfs filesystem and snapshot limits (fix leak)
Author: Alex Reece <alex.reece@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>
Jerry Jelinek [Wed, 1 Apr 2015 13:07:48 +0000 (00:07 +1100)]
Illumos 3897 - zfs filesystem and snapshot limits
3897 zfs filesystem and snapshot limits
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>
Matthew Ahrens [Fri, 5 Sep 2014 22:03:09 +0000 (00:03 +0200)]
Illumos 5134 - if ZFS_DEBUG or debug= is set, libzpool should enable debug prints
5134 if ZFS_DEBUG or debug= is set, libzpool should enable debug prints
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
In traverse_visitbp(), the input argument dnp is modified in the middle to
point to a temporary buffer. Originally this doesn't matter, because no user
of TRAVERSE_POST dereferences it. However, in fbeddd6 a piece of code is added
dereferencing dnp after the modification, creating a possible bug.
We fix this by creating a new local variable cdnp for the DMU_OT_DNODE case,
so we don't modify the input argument. Also we introduce different local
variables in the DMU_OT_OBJSET case to prevent confusion between the input
argument.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2060
Brian Behlendorf [Mon, 27 Apr 2015 19:15:42 +0000 (12:15 -0700)]
Merge branch 'zed-pushbullet'
This patch stack begins with cleaning up the existing ZEDLETs,
refactoring common code blocks into zed-functions.sh, adopting a
more consistent coding style, updating exit codes, etc. All scripts
now run cleanly through ShellCheck.
The old "email" ZEDLETs are replaced with new "notify" ZEDLETs. A
notification can now be sent via email and/or Pushbullet. Additional
notification methods will likely be added in the future.
Pushbullet notifications are enabled by setting the
ZED_PUSHBULLET_ACCESS_TOKEN and (optionally) ZED_PUSHBULLET_CHANNEL_TAG
in zed.rc. The Pushbullet implementation requires awk, curl, and sed
executables to be installed in the standard PATH.
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3327
Chris Dunlap [Wed, 25 Mar 2015 20:10:32 +0000 (13:10 -0700)]
Combine data-notify.sh with io-notify.sh
The data-notify.sh ZEDLET serves a very similar purpose to
io-notify.sh, namely, to generate a notification in response to a
particular error event. Initially, data-notify.sh was separated from
io-notify.sh since the "data" zevent does not (as I understand it)
pertain to a specific vdev device. This stands in contrast to the
"checksum" and "io" zevents (both handled by io-notify.sh) that can
be attributed to a specific vdev. At the time, it seemed simpler to
handle these two cases in separate scripts.
This commit adds support for the "data" zevent to io-notify.sh, and
symlinks io-notify.sh to data-notify.sh. It also adds the counts
for vdev_read_errors, vdev_write_errors, and vdev_cksum_errors to
the notification message.
Chris Dunlap [Fri, 20 Mar 2015 00:40:54 +0000 (17:40 -0700)]
Add notification to io-spare.sh
The io-spare.sh ZEDLET does not generate a notification when a failing
device is replaced with a hot spare. Maybe it should tell someone.
This commit adds a notification message to the io-spare.sh ZEDLET.
This notification is triggered when a failing device is successfully
replaced with a hot spare after encountering a checksum or io error.
Chris Dunlap [Fri, 27 Feb 2015 00:26:48 +0000 (16:26 -0800)]
Add support for Pushbullet notifications
This commit adds the zed_notify_pushbullet() function and hooks
it into zed_notify(), thereby integrating it with the existing
"notify" ZEDLETs. This enables ZED to push notifications to your
desktop computer and/or mobile device(s). It is configured with the
ZED_PUSHBULLET_ACCESS_TOKEN and ZED_PUSHBULLET_CHANNEL_TAG variables
in zed.rc.
https://www.pushbullet.com/
The Makefile install-data-local target has been replaced with
install-data-hook. With the "-local" target, there is no particular
guarantee of execution order. But with the zed.rc now potentially
containing sensitive information (i.e., the Pushbullet access token),
the recommended permissions have changed to 0600. The "-hook" target
is always executed after the main rule's work is done; thus, the
chmod will always take place after the zed.rc file has been installed.
Chris Dunlap [Fri, 27 Feb 2015 00:26:25 +0000 (16:26 -0800)]
Replace "email" ZEDLETs with "notify" ZEDLETs
Several ZEDLETs already exist for sending email in reponse to a
particular zevent. While email is ubiquitous, alternative methods may
be better suited for some configurations. Instead of duplicating the
"email" ZEDLETs for every future notification method, it is preferable
to abstract the notification method into a function. This has the
added benefit of reducing the amount of code duplicated between
ZEDLETs, and allowing related bugs to be fixed in a single location.
This commit replaces the existing "email" ZEDLETs with corresponding
"notify" ZEDLETs. In addition, the ZEDLET code for sending an
email message has been moved into the zed_notify_email() function.
And this zed_notify_email() has been added to a generic zed_notify()
function for sending notifications via all available methods that
have been configured.
This commit also changes a couple of related zed.rc variables.
ZED_EMAIL_INTERVAL_SECS is changed to ZED_NOTIFY_INTERVAL_SECS,
and ZED_EMAIL_VERBOSE is changed to ZED_NOTIFY_VERBOSE. Note that
ZED_EMAIL remains unchanged as its use is solely for the email
notification method.
Chris Dunlap [Wed, 18 Feb 2015 01:23:54 +0000 (17:23 -0800)]
Cleanup ZEDLETs
This commit factors out several common ZEDLET code blocks into
zed-functions.sh. This shortens the length of the scripts, thereby
(hopefully) making them easier to understand and maintain.
In addition, this commit revamps the coding style used by the
scripts to be more consistent and (again, hopefully) maintainable.
It now mostly follows the Google Shell Style Guide. I've tried to
assimilate the following resources:
Google Shell Style Guide
https://google-styleguide.googlecode.com/svn/trunk/shell.xml
Dash as /bin/sh
https://wiki.ubuntu.com/DashAsBinSh
Filenames and Pathnames in Shell: How to do it Correctly
http://www.dwheeler.com/essays/filenames-in-shell.html
Common shell script mistakes
http://www.pixelbeat.org/programming/shell_script_mistakes.html
Finally, this commit updates the exit codes used by the ZEDLETs to be
more consistent with one another.
All scripts run cleanly through ShellCheck <http://www.shellcheck.net/>.
All scripts have been tested on bash and dash.
Isaac Huang [Sun, 26 Apr 2015 04:25:45 +0000 (22:25 -0600)]
Remove useless variable spa_active_count
This isn't required for the Linux port because the kernel tracks
if a module is busy. The prototype for spa_busy() is also removed
since its definition was already removed.
Signed-off-by: Isaac Huang <he.huang@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3262
5313 Allow I/Os to be aggregated across ZIO priority classes
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Will Andrews <willa@SpectraLogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ned Bass [Thu, 23 Apr 2015 19:32:59 +0000 (12:32 -0700)]
Serialize access to spa->spa_feat_stats nvlist
The function spa_add_feature_stats() manipulates the shared nvlist
spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to
protect the list.
Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3335
The following panic would occur under certain heavy load:
[ 4692.202686] Kernel panic - not syncing: thread ffff8800c4f5dd60 terminating with rrw lock ffff8800da1b9c40 held
[ 4692.228053] CPU: 1 PID: 6250 Comm: mmap_deadlock Tainted: P OE 3.18.10 #7
The culprit is that ZFS_EXIT(zsb) would call tsd_exit() every time, which
would purge all tsd data for the thread. However, ZFS_ENTER is designed to be
reentrant, so we cannot allow ZFS_EXIT to blindly purge tsd data.
Instead, we rely on the new behavior of tsd_set. When NULL is passed as the
new value to tsd_set, it will automatically remove the tsd entry specified the
the key for the current thread.
rrw_tsd_key and zfs_allow_log_key already calls tsd_set(key, NULL) when
they're done. The zfs_fsyncer_key relied on ZFS_EXIT(zsb) to call tsd_exit() to
do clean up. Now we explicitly call tsd_set(key, NULL) on them.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3247
Tim Chase [Sun, 19 Apr 2015 03:57:36 +0000 (23:57 -0400)]
Support the "version" property on volumes via the zfs_prop_get_int() API
As of this commit, volumes do not possess the version property in
any existing OpenZFS implementation. The zpool upgrade code, however,
uses zfs_prop_get_int() to fetch the version property of all children in
a pool. The semantics of the function, however, demand that it only be
used for known valid properties so it returns a garbage value for volumes.
This patch causes the version of a volume to appear to callers using
the zfs_prop_get_int() API to be that of the default ZPL version for
the implementation. In the future, should volumes gain the property,
its actual value will be used.
Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes #3313
Brian Behlendorf [Thu, 23 Apr 2015 17:09:19 +0000 (10:09 -0700)]
Extend PF_FSTRANS critical regions
Additional testing has shown that the region covered by PF_FSTRANS
needs to be extended to cover the zpl_xattr_security_init() and
init_acl() functions. The zpl_mark_dirty() function can also recurse
and therefore must always be protected.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #3331
DHE [Sat, 18 Apr 2015 11:07:53 +0000 (07:07 -0400)]
Fix formatting error in zfs(8)
Commit b1a3e93217e6e474e86345010469994c066cf875 accidentally
introduced an intentation error between the 'zfs receive'
and 'zfs allow' detailed documentation sections.
Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3312
Chris Dunlap [Fri, 10 Apr 2015 19:56:21 +0000 (12:56 -0700)]
Fix io-spare.sh to work with disk vdevs
The "zpool status" output shows the full pathname for file-type vdevs,
but only the basename component for disk-type vdevs. In commit bee6665, the "basename" command was dropped from altering the vdev
name used when searching the "zpool status" output. Consequently,
hot-disk sparing for disk vdevs broke since "zpool status" output
was now being searched for the full pathname to the disk vdev.
Parsing the "zpool status" output in this manner is rather brittle.
It would be preferable to search for the vdev based on its guid.
But until that happens, this commit adds back the "basename" command
to fix the vdev name breakage.
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3310
DHE [Fri, 10 Apr 2015 15:14:47 +0000 (11:14 -0400)]
Rebuild init scripts on source file updates
The resulting script is not removed by 'make clean' or rebuilt
when the source files are changed. Users with long standing git
trees may find their init script is out of date.
Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3273
Tim Chase [Tue, 14 Apr 2015 05:06:40 +0000 (01:06 -0400)]
Allocate zfs_znode_cache on the Linux slab
The Linux slab, in general, performs better than the SPl slab in cases
where a lot of objects are allocated and fragmentation is likely present.
This patch fixes pathologically bad behavior in cases where the ARC is
filled with mostly metadata and a user program needs to allocate and
dirty enough memory which would require an insignificant amount of the
ARC to be reclaimed.
If zfs_znode_cache is on the SPL slab, the system may spin for a very
long time trying to reclaim sufficient memory. If it is on the Linux
slab, the behavior has been observed to be much more predictible; the
memory is reclaimed more efficiently.
Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3283
The packed nvlist allocated in spa_config_write() may exceed the
warning threshold for large configurations. Use the vmem interfaces
for this short lived allocation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3251
Matthew Ahrens [Fri, 27 Mar 2015 06:11:50 +0000 (17:11 +1100)]
Illumus 5693 - ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override
5693 ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
George Wilson [Fri, 27 Mar 2015 04:31:52 +0000 (15:31 +1100)]
Illumos 5694 - traverse_prefetcher does not prefetch enough
5694 traverse_prefetcher does not prefetch enough
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Chris Dunlop [Fri, 27 Mar 2015 04:04:12 +0000 (15:04 +1100)]
Align code with Illumos
Align code in traverse_visitbp() with that in Illumos in preparation for
applying Illumos-5694.
No functional change: use a temporary variable pd to replace multiple
occurrences of td->td_pfd. This increases our stack use slightly more
then normal because the function is called recursively.
Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3230
Prakash Surya [Fri, 27 Mar 2015 02:03:22 +0000 (13:03 +1100)]
Illumos 5695 - dmu_sync'ed holes do not retain birth time
5695 dmu_sync'ed holes do not retain birth time
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Ned Bass [Thu, 26 Mar 2015 19:10:26 +0000 (12:10 -0700)]
zpool import should honor overlay property
Make the 'zpool import' command honor the overlay property to allow
filesystems to be mounted on a non-empty directory. As it stands now
this property is only checked by the 'zfs mount' command. Move the
check into 'zfs_mount()` in libzpool so the property is honored for all
callers.
Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3227
Brian Behlendorf [Wed, 25 Mar 2015 23:59:17 +0000 (16:59 -0700)]
Add RHEL style kmod packages
Provide a Redhat specific zfs-kmod.spec file which uses the old style
kmods (not kmods2) packaging. By using the provided kmodtool script
packages can be built which support weak modules. This allows for the
kernel to be updated without having to rebuild the ZFS kernel modules.
Packages for RHEL/Centos/SL/TOSS which use this spec file can by built
as follows:
$ ./configure --with-spec=redhat
$ make rpms
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Fri, 27 Mar 2015 21:30:23 +0000 (14:30 -0700)]
Remove rpm/fedora directory
Originally it was thought that custom spec files might be required
for Fedora. Happily that has turns out not to be the case. Since
this directory just contains symlinks to the generic spec files it
can be removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Fri, 20 Mar 2015 22:10:24 +0000 (15:10 -0700)]
Check all vdev labels in 'zpool import'
When using 'zpool import' to scan for available pools prefer vdev names
which reference vdevs with more valid labels. There should be two labels
at the start of the device and two labels at the end of the device. If
labels are missing then the device has been damaged or is in some other
way incomplete. Preferring names with fully intact labels helps weed out
bad paths and improves the likelihood of being able to import the pool.
This behavior only applies when scanning /dev/ for valid pools. If a
cache file exists the pools described by the cache file will be used.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes #3145
Closes #2844
Closes #3107
Ned Bass [Wed, 25 Mar 2015 00:00:08 +0000 (17:00 -0700)]
dbuf_free_range() overzealously frees dbufs
When called to free a spill block from a dnode, dbuf_free_range() has a
bug that results in all dbufs for the dnode getting freed. A variety of
problems may result from this bug, but a common one was a zap lookup
tripping an ASSERT because the zap buffers had been zeroed out. This
could happen on a dataset with xattr=sa set when extended attributes are
written and removed on a directory concurrently with I/O to files in
that directory.
Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes #3195
Fixes #3204
Fixes #3222
Tim Chase [Mon, 23 Mar 2015 17:10:19 +0000 (12:10 -0500)]
Set the maximum ZVOL transfer size correctly
ZoL had been setting max_sectors to UINT_MAX, but until Linux 3.19, it
the kernel artifically capped it at 1024 (BLK_DEF_MAX_SECTORS).
This cap was removed in torvalds/linux@34b48db. This patch changes
it to DMU_MAX_ACCESS (in sectors) and also changes the ASSERT in
dmu_tx_hold_write() to allow the maximum transfer size.
Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3212
Gordan Bobic [Mon, 23 Mar 2015 16:17:56 +0000 (16:17 +0000)]
Execute udevadm settle before trying to import pools
Execute udevadm settle before trying to import pools. Otherwise the
disk device nodes may not be ready before import time. This is
analogous to the behavior of the init scripts and systemd units.
Signed-off-by: Gordan Bobic <gordan@steel.shatteredsilicon.net> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3213
Chris Dunlop [Mon, 16 Mar 2015 01:21:21 +0000 (12:21 +1100)]
Reduce size of zfs_sb_t: allocate z_hold_mtx separately
zfs_sb_t has grown to the point where using kmem_zalloc() for allocations
is triggering the 32k warning threshold.
We can't safely convert this entire allocation to use vmem_alloc() instead
of kmem_alloc() because the backing_dev_info structure is embedded here.
It depends on the bit_waitqueue() function which won't behave properly
when given a virtual address.
Instead, use vmem_alloc() to allocate the z_hold_mtx array separately.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #3178
Brian Behlendorf [Tue, 17 Mar 2015 22:08:22 +0000 (15:08 -0700)]
Fix arc_adjust_meta() behavior
The goal of this function is to evict enough meta data buffers from the
ARC in order to enforce the arc_meta_limit. Achieving this is slightly
more complicated than it appears because it is common for data buffers
to have holds on meta data buffers. In addition, dnode meta data buffers
will be held by the dnodes in the block preventing them from being freed.
This means we can't simply traverse the ARC and expect to always find
enough unheld meta data buffer to release.
Therefore, this function has been updated to make alternating passes
over the ARC releasing data buffers and then newly unheld meta data
buffers. This ensures forward progress is maintained and arc_meta_used
will decrease. Normally this is sufficient, but if required the ARC
will call the registered prune callbacks causing dentry and inodes to
be dropped from the VFS cache. This will make dnode meta data buffers
available for reclaim. The number of total restarts in limited by
zfs_arc_meta_adjust_restarts to prevent spinning in the rare case
where all meta data is pinned.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Issue #3160
Brian Behlendorf [Tue, 17 Mar 2015 22:07:47 +0000 (15:07 -0700)]
Restructure per-filesystem reclaim
Originally when the ARC prune callback was introduced the idea was
to register a single callback for the ZPL. The ARC could invoke this
call back if it needed the ZPL to drop dentries, inodes, or other
cache objects which might be pinning buffers in the ARC. The ZPL
would iterate over all ZFS super blocks and perform the reclaim.
For the most part this design has worked well but due to limitations
in 2.6.35 and earlier kernels there were some problems. This patch
is designed to address those issues.
1) iterate_supers_type() is not provided by all kernels which makes
it impossible to safely iterate over all zpl_fs_type filesystems in
a single callback. The most straight forward and portable way to
resolve this is to register a callback per-filesystem during mount.
The arc_*_prune_callback() functions have always supported multiple
callbacks so this is functionally a very small change.
2) Commit 050d22b removed the non-portable shrink_dcache_memory()
and shrink_icache_memory() functions and didn't replace them with
equivalent functionality. This meant that for Linux 3.1 and older
kernels the ARC had no mechanism to drop dentries and inodes from
the caches if needed. This patch adds that missing functionality
by calling shrink_dcache_parent() to release dentries which may be
pinning inodes. This will result in all unused cache entries being
dropped which is a bit heavy handed but it's the only interface
available for old kernels.
3) A zpl_drop_inode() callback is registered for kernels older than
2.6.35 which do not support the .evict_inode callback. This ensures
that when the last reference on an inode is dropped it is immediately
removed from the cache. If this isn't done than inode can end up on
the global unused LRU with no mechanism available to ZFS to drop them.
Since the ARC buffers are not dropped the hottest inodes can still
be recreated without performing disk IO.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Issue #3160
Richard Yao [Wed, 11 Mar 2015 18:24:46 +0000 (14:24 -0400)]
Increase Linux pipe buffer size on 'zfs receive'
I noticed when reviewing documentation that it is possible for user
space to use fctnl(fd, F_SETPIPE_SZ, (unsigned long) size) to change
the kernel pipe buffer size on Linux to increase the pipe size up to
the value specified in /proc/sys/fs/pipe-max-size. There are users using
mbuffer to improve zfs recv performance when piping over the network, so
it seems advantageous to integrate such functionality directly into the
zfs recv tool. This avoids the addition of two buffers and two copies
(one for the buffer mbuffer adds and another for the additional pipe),
so it should be more efficient.
This could have been made configurable and/or this could have changed
the value back to the original after we were done with the file
descriptor, but I do not see a strong case for doing either, so I
went with a simple implementation.
Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1161