Brian Behlendorf [Tue, 19 Jan 2016 18:41:21 +0000 (10:41 -0800)]
Close possible zfs_znode_held() race
Check if the lock is held while holding the z_hold_locks() lock.
This prevents a possible use-after-free bug for callers which are
not holding the lock. There currently are no such callers so this
can't cause a problem today but it has been fixed regardless.
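The fix can be sketched as a user-space C model. This is not the ZoL code. `hold_entry_t`, `hold_bucket_t` and `obj_is_held()` are hypothetical names. One pthread mutex stands in for a single z_hold_locks slot, and a linked list stands in for its AVL tree. The point is that the per-object entry is only read while the bucket lock is held, so a concurrent exit path cannot free it in the middle of the check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object hold entry; the exit path frees it when unused. */
typedef struct hold_entry {
	uint64_t he_obj;
	bool he_locked;			/* is the per-object lock held? */
	pthread_t he_owner;		/* valid only when he_locked is true */
	struct hold_entry *he_next;
} hold_entry_t;

/* Hypothetical stand-in for one z_hold_locks slot and its tree. */
typedef struct {
	pthread_mutex_t hb_lock;
	hold_entry_t *hb_head;
} hold_bucket_t;

/*
 * Return true if the calling thread holds the lock for 'obj'.  Both the
 * lookup and the ownership test happen under hb_lock, so the entry
 * cannot be removed and freed between finding it and reading it.
 */
static bool
obj_is_held(hold_bucket_t *hb, uint64_t obj)
{
	bool held = false;

	pthread_mutex_lock(&hb->hb_lock);
	for (hold_entry_t *he = hb->hb_head; he != NULL; he = he->he_next) {
		if (he->he_obj == obj) {
			held = he->he_locked &&
			    pthread_equal(he->he_owner, pthread_self());
			break;
		}
	}
	pthread_mutex_unlock(&hb->hb_lock);

	return (held);
}
```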
Brian Behlendorf [Tue, 19 Jan 2016 17:04:44 +0000 (09:04 -0800)]
Linux 4.5 compat: pfn_t typedef
The pfn_t typedef was inherited from Illumos but never directly
used by any libspl consumers. This doesn't cause any issues in
user space but for consistency with the kernel build it has been
removed. See torvalds/linux/commit/34c0fd54.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
Brian Behlendorf [Thu, 14 Jan 2016 23:01:24 +0000 (18:01 -0500)]
Linux 4.5 compat: xattr list handler
The registered xattr .list handler was simplified in the 4.5 kernel
to only perform a permission check. Given a dentry for the file it
must return a boolean indicating if the name is visible. This
differs slightly from the previous APIs, which also required the
function to copy the name into the provided list and return its
size. That is now entirely the responsibility of the caller.
This should be a straightforward change to make to ZoL since we've
always required the caller to make the copy. However, this was
slightly complicated by the need to support 3 older APIs. Yes,
between 2.6.32 and 4.5 there are 4 versions of this interface!
Therefore, while the functional change in this patch is small it
includes significant cleanup to make the code understandable and
maintainable. These changes include:
- Improved configure checks for .list, .get, and .set interfaces.
- Interfaces checked from newest to oldest.
- Strict checking for each possible known interface.
- Configure fails when no known interface is available.
- HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency
with similar iops and fops configure checks.
- POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to
move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}.
Compatibility wrappers were added for old kernels.
- ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing
ZPL_XATTR_{GET|SET} WRAPPERs. Only the inode is guaranteed to be
a valid pointer, passing NULL for the 'list' and 'name' variables
is allowed and must be checked for. All .list functions were
updated to use the wrapper to aid readability.
- zpl_xattr_filldir() updated to use the .list function for its
permission check which is consistent with the updated Linux 4.5
interface. If a .list function is registered, it should return 0
to indicate that the name should be skipped; if no function is
registered, the name will be added.
- Additional documentation from xattr(7) describing the correct
behavior for each namespace was added before the relevant handlers.
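The updated permission check in zpl_xattr_filldir() can be modeled in user space. This is an illustrative sketch, not the ZoL code. `xattr_handler_model_t`, `filldir_model()` and `list_if_privileged()` are hypothetical, and a `const void *` stands in for the dentry. The `trusted.` rule (visible only to privileged callers, per xattr(7)) serves as the example policy. The sketch shows the 4.5 division of labor: `.list` only decides visibility, and the caller does the copying and the size accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a 4.5-style handler: .list only answers
 * "is this name visible?" and never copies anything.
 */
typedef struct {
	const char *prefix;
	bool (*list)(const void *dentry);	/* may be NULL */
} xattr_handler_model_t;

/* Example .list: the "dentry" is just a flag meaning "caller is privileged". */
static bool
list_if_privileged(const void *dentry)
{
	return (*(const bool *)dentry);
}

/*
 * Model of the filldir step.  Find the handler for 'name' and use its
 * .list (if any) as the permission check.  If the name is visible, append
 * "name\0" to 'buf' at 'offset' (the caller guarantees space); a NULL
 * 'buf' only sizes the result.  Returns bytes used, 0 if skipped.
 */
static size_t
filldir_model(const xattr_handler_model_t *handlers, size_t nhandlers,
    const void *dentry, const char *name, char *buf, size_t offset)
{
	for (size_t i = 0; i < nhandlers; i++) {
		const xattr_handler_model_t *h = &handlers[i];

		if (strncmp(name, h->prefix, strlen(h->prefix)) != 0)
			continue;
		if (h->list != NULL && !h->list(dentry))
			return (0);	/* registered .list says: skip */
		break;
	}

	size_t len = strlen(name) + 1;	/* include the NUL separator */
	if (buf != NULL)
		memcpy(buf + offset, name, len);
	return (len);
}
```

Passing NULL for `buf` mirrors the size query that listxattr(2) callers make before allocating a buffer.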
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
Brian Behlendorf [Thu, 14 Jan 2016 18:25:10 +0000 (13:25 -0500)]
Linux 4.5 compat: get_link() / put_link()
The follow_link() interface was retired in favor of get_link().
In the process of phasing in get_link(), the Linux kernel went
through two different versions of it: the first depended on
put_link(), and the final version on a delayed done function.
- Improved configure checks for .follow_link, .get_link, .put_link.
- Interfaces checked from newest to oldest.
- Strict checking for each possible known interface.
- Configure fails when no known interface is available.
- Both versions of .get_link are detected and supported, as well as
the two previous versions of .follow_link.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
Matthew Ahrens [Thu, 14 Jan 2016 00:10:38 +0000 (16:10 -0800)]
Illumos 4953, 4954, 4955
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Marcel Telka [Wed, 13 Jan 2016 23:35:55 +0000 (15:35 -0800)]
Illumos 4039 - zfs_rename()/zfs_link() needs stronger test for XDEV
4039 zfs_rename()/zfs_link() needs stronger test for XDEV
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
Porting notes:
- This check was updated in Linux in a similar fashion early on in
the port. Therefore, this patch just reorders the function and
updates the comment so it flows the same way as the upstream code.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4218
Joe Stein [Wed, 13 Jan 2016 23:05:59 +0000 (15:05 -0800)]
Illumos 6298 - zfs_create_008_neg and zpool_create_023_neg
6298 zfs_create_008_neg and zpool_create_023_neg need to be updated
for large block support
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
George Wilson [Wed, 13 Jan 2016 22:37:39 +0000 (14:37 -0800)]
Illumos 3557, 3558, 3559, 3560
3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
3560 dumpadm should be able to remove a dump device
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Albert Lee <trisk@nexenta.com>
Porting notes:
- Internal zvol.c changes not applied due to implementation differences.
The external interface and behavior were already consistent with the
latest upstream code.
- Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All
supported kernels (2.6.32 and newer) provide this interface.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4217
Chunwei Chen [Wed, 30 Dec 2015 23:47:11 +0000 (15:47 -0800)]
Prevent duplicated xattr between SA and dir
When replacing an xattr would overflow the SA, we would fall back
to the xattr dir. However, the current implementation doesn't clear the
copy in the SA, so we would end up with a duplicated xattr.
For example, running the following script on an xattr=sa filesystem
would create a duplicate "user.1".
-- dup_xattr.sh begin --
randbase64()
{
dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A
}
Also, when a filesystem is switched from xattr=sa to xattr=on, the
xattrs already stored in the SA are never modified. This causes strange
behavior: you cannot delete such an xattr, and setxattr creates a
duplicate, so the result does not match what getxattr returns.
For example, the following shell sequence.
-- shell begin --
$ sudo zfs set xattr=sa pp/fs0
$ touch zzz
$ setfattr -n user.test -v asdf zzz
$ sudo zfs set xattr=on pp/fs0
$ setfattr -x user.test zzz
setfattr: zzz: No such attribute
$ getfattr -d zzz
user.test="asdf"
$ setfattr -n user.test -v zxcv zzz
$ getfattr -d zzz
user.test="asdf"
user.test="asdf"
-- shell end --
We fix this by first finding where the xattr resides before the
setxattr. Then, after we have successfully updated the xattr in one
location, we clear it from the other. Note that because the update and
the clear are not in a single tx, we could still end up with a
duplicated xattr, but running setxattr again will fix it.
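The ordering of the fix (locate the existing copy, write, then clear the other location) can be sketched with a toy in-memory model. Everything here is hypothetical. Two small arrays stand in for the SA and the xattr directory, `sa_value_limit` stands in for the point at which the SA would overflow, and `xattr_sa` models the dataset property. As noted above, the real update and clear happen in separate transactions; this model never crashes, so it cannot show that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS	8
#define NAME_LEN	32
#define VALUE_LEN	128

typedef struct {
	char name[NAME_LEN];
	char value[VALUE_LEN];
	bool used;
} attr_slot_t;

typedef struct {
	attr_slot_t slots[MAX_ATTRS];
} attr_store_t;

typedef struct {
	attr_store_t sa;	/* models SA-based storage */
	attr_store_t dir;	/* models the xattr directory */
	size_t sa_value_limit;	/* longer values "overflow" the SA */
	bool xattr_sa;		/* models the xattr=sa property */
} file_model_t;

static attr_slot_t *
store_find(attr_store_t *s, const char *name)
{
	for (int i = 0; i < MAX_ATTRS; i++) {
		if (s->slots[i].used && strcmp(s->slots[i].name, name) == 0)
			return (&s->slots[i]);
	}
	return (NULL);
}

static bool
store_put(attr_store_t *s, const char *name, const char *value)
{
	if (strlen(name) >= NAME_LEN || strlen(value) >= VALUE_LEN)
		return (false);
	attr_slot_t *slot = store_find(s, name);
	for (int i = 0; slot == NULL && i < MAX_ATTRS; i++) {
		if (!s->slots[i].used)
			slot = &s->slots[i];
	}
	if (slot == NULL)
		return (false);
	strcpy(slot->name, name);
	strcpy(slot->value, value);
	slot->used = true;
	return (true);
}

static void
store_clear(attr_store_t *s, const char *name)
{
	attr_slot_t *slot = store_find(s, name);
	if (slot != NULL)
		slot->used = false;
}

static bool
setxattr_model(file_model_t *f, const char *name, const char *value)
{
	/* Step 1: find where the xattr currently lives. */
	bool was_in_sa = (store_find(&f->sa, name) != NULL);
	bool was_in_dir = (store_find(&f->dir, name) != NULL);
	bool wrote_sa = false;

	/* Step 2: prefer the SA when enabled and the value fits. */
	if (f->xattr_sa && strlen(value) <= f->sa_value_limit)
		wrote_sa = store_put(&f->sa, name, value);

	if (wrote_sa) {
		if (was_in_dir)		/* Step 3: drop the stale dir copy. */
			store_clear(&f->dir, name);
		return (true);
	}

	/* Fall back to the xattr directory. */
	if (!store_put(&f->dir, name, value))
		return (false);
	if (was_in_sa)			/* Step 3: drop the stale SA copy. */
		store_clear(&f->sa, name);
	return (true);
}
```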
Richard Yao [Tue, 28 Jul 2015 14:22:56 +0000 (10:22 -0400)]
Remove fastwrite mutex
The fast write mutex is intended to protect accounting, but it is
redundant because all accounting is performed through atomic operations.
It also serializes all metaslab IO behind a mutex, which introduces a
theoretical scaling regression that the Illumos developers did not like
when we showed this to them. Removing it makes the selection of the
metaslab_group lock-free, as it is on Illumos. The selection is not quite
the same without the lock because the loop races with IO completions,
but any imbalances caused by this are likely to be corrected by
subsequent metaslab group selections.
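Below is a minimal sketch of lock-free accounting and selection using C11 `<stdatomic.h>`. The `mg_model_t` type and the `mg_io_start()`, `mg_io_done()` and `mg_select()` functions are hypothetical. The "fewest outstanding bytes wins" rule is a simple selection policy assumed for illustration; it is not copied from the metaslab code. Each counter update is atomic, so no mutex is needed for correctness. The selection loop may read slightly stale values while completions race with it, which is the kind of imbalance the text expects later selections to correct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-group accounting: bytes of IO currently outstanding. */
typedef struct {
	_Atomic uint64_t mg_alloc_bytes;
} mg_model_t;

static void
mg_io_start(mg_model_t *mg, uint64_t size)
{
	atomic_fetch_add_explicit(&mg->mg_alloc_bytes, size,
	    memory_order_relaxed);
}

static void
mg_io_done(mg_model_t *mg, uint64_t size)
{
	uint64_t old = atomic_fetch_sub_explicit(&mg->mg_alloc_bytes, size,
	    memory_order_relaxed);

	assert(old >= size);	/* accounting must never go negative */
}

/*
 * Pick the group with the fewest outstanding bytes, without a lock.
 * Each load is atomic, but the loop as a whole is not a snapshot.
 */
static int
mg_select(mg_model_t *groups, int ngroups)
{
	int best = 0;
	uint64_t best_bytes = atomic_load_explicit(&groups[0].mg_alloc_bytes,
	    memory_order_relaxed);

	for (int i = 1; i < ngroups; i++) {
		uint64_t b = atomic_load_explicit(&groups[i].mg_alloc_bytes,
		    memory_order_relaxed);
		if (b < best_bytes) {
			best = i;
			best_bytes = b;
		}
	}
	return (best);
}
```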
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3643
Brian Behlendorf [Tue, 22 Dec 2015 21:47:38 +0000 (13:47 -0800)]
Fix zsb->z_hold_mtx deadlock
The zfs_znode_hold_enter() / zfs_znode_hold_exit() functions are used to
serialize access to a znode and its SA buffer while the object is being
created or destroyed. This kind of locking would normally reside in the
znode itself but in this case that's impossible because the znode and SA
buffer may not yet exist. Therefore the locking is handled externally
with an array of mutexes and AVL trees which contain per-object locks.
In zfs_znode_hold_enter() a per-object lock is created as needed, inserted
into the correct AVL tree and finally the per-object lock is held. In
zfs_znode_hold_exit() the process is reversed. The per-object lock is
released, removed from the AVL tree and destroyed if there are no waiters.
This scheme has two important properties:
1) No memory allocations are performed while holding one of the z_hold_locks.
This ensures evict(), which can be called from direct memory reclaim, will
never block waiting on a z_hold_locks entry which just happens to hash
to the same index.
2) All locks used to serialize access to an object are per-object and never
shared. This minimizes lock contention without creating a large number
of dedicated locks.
On the downside it does require znode_lock_t structures to be frequently
allocated and freed. However, because these are backed by a kmem cache
and very short-lived, this cost is minimal.
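The enter/exit scheme can be sketched in user-space C with pthreads. This is a model under stated assumptions, not the ZoL code. `obj_hold_t`, `hold_enter()`, `hold_exit()`, `hold_count()` and `HOLD_BUCKETS` are hypothetical names. `calloc()` stands in for the kmem cache, a singly linked list stands in for each AVL tree, and the bucket index is a simple modulo. The sketch keeps both properties listed above: allocation and freeing happen outside the bucket lock, and the only lock a caller waits on for an object is that object's own mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define HOLD_BUCKETS	64	/* hypothetical size of the hold-lock array */

typedef struct obj_hold {
	uint64_t oh_obj;
	int oh_refcnt;			/* protected by the bucket lock */
	pthread_mutex_t oh_lock;	/* the per-object lock itself */
	struct obj_hold *oh_next;
} obj_hold_t;

typedef struct {
	pthread_mutex_t hb_lock;	/* analogue of one z_hold_locks entry */
	obj_hold_t *hb_head;		/* list stands in for the AVL tree */
} hold_bucket_t;

static hold_bucket_t hold_buckets[HOLD_BUCKETS];

static void
hold_init(void)
{
	for (int i = 0; i < HOLD_BUCKETS; i++) {
		pthread_mutex_init(&hold_buckets[i].hb_lock, NULL);
		hold_buckets[i].hb_head = NULL;
	}
}

static hold_bucket_t *
hold_bucket(uint64_t obj)
{
	return (&hold_buckets[obj % HOLD_BUCKETS]);
}

static obj_hold_t *
hold_enter(uint64_t obj)
{
	hold_bucket_t *hb = hold_bucket(obj);
	obj_hold_t *oh, *prealloc;

	/* Property 1: allocate before taking the bucket lock. */
	if ((prealloc = calloc(1, sizeof (*prealloc))) == NULL)
		abort();
	prealloc->oh_obj = obj;
	pthread_mutex_init(&prealloc->oh_lock, NULL);

	pthread_mutex_lock(&hb->hb_lock);
	for (oh = hb->hb_head; oh != NULL; oh = oh->oh_next) {
		if (oh->oh_obj == obj)
			break;
	}
	if (oh == NULL) {		/* first holder: insert ours */
		prealloc->oh_next = hb->hb_head;
		hb->hb_head = oh = prealloc;
		prealloc = NULL;
	}
	oh->oh_refcnt++;
	pthread_mutex_unlock(&hb->hb_lock);

	if (prealloc != NULL) {		/* unused: free outside the lock */
		pthread_mutex_destroy(&prealloc->oh_lock);
		free(prealloc);
	}
	/* Property 2: block only on the per-object lock. */
	pthread_mutex_lock(&oh->oh_lock);
	return (oh);
}

static void
hold_exit(obj_hold_t *oh)
{
	hold_bucket_t *hb = hold_bucket(oh->oh_obj);
	int last;

	pthread_mutex_unlock(&oh->oh_lock);
	pthread_mutex_lock(&hb->hb_lock);
	if ((last = (--oh->oh_refcnt == 0))) {
		obj_hold_t **pp = &hb->hb_head;
		while (*pp != oh)
			pp = &(*pp)->oh_next;
		*pp = oh->oh_next;	/* unlink; no waiters remain */
	}
	pthread_mutex_unlock(&hb->hb_lock);
	if (last) {
		pthread_mutex_destroy(&oh->oh_lock);
		free(oh);
	}
}

/* Test helper: number of hold entries in the bucket for 'obj'. */
static int
hold_count(uint64_t obj)
{
	hold_bucket_t *hb = hold_bucket(obj);
	int n = 0;

	pthread_mutex_lock(&hb->hb_lock);
	for (obj_hold_t *oh = hb->hb_head; oh != NULL; oh = oh->oh_next)
		n++;
	pthread_mutex_unlock(&hb->hb_lock);
	return (n);
}
```

Two objects that hash to the same bucket still get distinct per-object mutexes, so holding one never blocks the other.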
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4106
Brian Behlendorf [Fri, 18 Dec 2015 20:19:14 +0000 (12:19 -0800)]
Add zfs_object_mutex_size module option
Add a zfs_object_mutex_size module option to facilitate resizing the
per-dataset znode mutex array. Increasing this value may help
make the deadlock described in #4106 less common, but this is not a
proper fix. This patch is primarily to aid debugging and analysis.
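A small sketch shows why a larger array only makes the deadlock less common. For illustration, it assumes an object number maps to a mutex slot by masking with a power-of-two array size. `obj_mutex_index()` and `objs_share_mutex()` are hypothetical names. Two unrelated objects contend on the same mutex only when they map to the same slot. Growing the array makes that rarer, but never impossible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hash: mask the object number with a power-of-two size. */
static uint64_t
obj_mutex_index(uint64_t obj, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (obj & (size - 1));
}

/* Two unrelated objects contend only if they land in the same slot. */
static bool
objs_share_mutex(uint64_t a, uint64_t b, uint64_t size)
{
	return (obj_mutex_index(a, size) == obj_mutex_index(b, size));
}
```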
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #4106
Brian Behlendorf [Wed, 13 Jan 2016 19:39:25 +0000 (11:39 -0800)]
Illumos 3465, 3466, 3467, 3468, 3470, 3473
3465 ::walk ... | ::<dcmd> misinterprets input as symbol names
3466 ::tsd should handle missing/NULL values better
3467 mdb_ctf_vread() could be more useful
3468 mdb enhancements for zfs development
3470 ::whatis does not print callers from KMF_LITE
3473 mdb_get_module() returns wrong module
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
Brian Behlendorf [Wed, 13 Jan 2016 18:41:24 +0000 (10:41 -0800)]
Increase default user space stack size
Under RHEL6/CentOS6 the default stack size must be increased to 32K
to prevent overflowing the stack when running ztest. This isn't an
issue for other distributions due to either the version of pthreads
or perhaps the compiler. Doubling the stack size resolves the
issue safely for all distributions and leaves us some headroom.
loading space map for vdev 0 of 1, metaslab 0 of 30 ...
...
loading space map for vdev 0 of 1, metaslab 14 of 30 ...
child died with signal 11
Exited ztest with error 3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4215
Will Andrews [Thu, 31 Dec 2015 16:38:59 +0000 (17:38 +0100)]
Illumos 3749 - zfs event processing should work on R/O root filesystems
3749 zfs event processing should work on R/O root filesystems
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>
Porting notes:
- [include/sys/spa_impl.h]
- ffe9d38 Add generic errata infrastructure
- 1421c89 Add visibility in to arc_read
- [include/sys/fm/fs/zfs.h]
- 2668527 Add linux events
- 6283f55 Support custom build directories and move includes
- [module/zfs/spa_config.c]
- Updated spa_config_sync() to match illumos with the exception
of a Linux specific block.
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Marcel Telka [Sun, 10 Jan 2016 22:35:29 +0000 (23:35 +0100)]
Illumos 6280 - libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Matthew Ahrens [Sat, 9 Jan 2016 18:33:11 +0000 (19:33 +0100)]
Illumos 5141 - zfs minimum indirect block size is 4K
5141 zfs minimum indirect block size is 4K
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Justin Gibbs [Sat, 9 Jan 2016 17:29:05 +0000 (18:29 +0100)]
Illumos 5438 - zfs_blkptr_verify should continue after zfs_panic_recover
5438 zfs_blkptr_verify should continue after zfs_panic_recover
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Xin LI <delphij@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
George Wilson [Sat, 9 Jan 2016 16:19:10 +0000 (17:19 +0100)]
Illumos 6281 - prefetching should apply to 1MB reads
6281 prefetching should apply to 1MB reads
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Alexander Motin <mav@freebsd.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Dan Vatca [Sat, 9 Jan 2016 17:42:21 +0000 (18:42 +0100)]
Illumos 6358 - A faulted pool with only unavailable vdevs
6358 A faulted pool with only unavailable vdevs triggers assertion
failure in libzfs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Serban Maduta <serban.maduta@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Joe Stein [Wed, 23 Dec 2015 19:51:02 +0000 (20:51 +0100)]
Illumos 6295 - metaslab_condense's dbgmsg should include vdev id
6295 metaslab_condense's dbgmsg should include vdev id
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Joshua M. Clulow [Thu, 12 Nov 2015 02:33:52 +0000 (03:33 +0100)]
Illumos 6268 - zfs diff confused by moving a file to another directory
6268 zfs diff confused by moving a file to another directory
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Porting notes:
- Conflicts
- 3558fd7 Prototype/structure update for Linux
- 2cf7f52 Linux compat 2.6.39: mount_nodev()
- 13fe019 Illumos #3464
- 241b541 Illumos 5959 - clean up per-dataset feature count code
- dsl_prop_unregister() preserved until out of tree consumers
like Lustre can transition to dsl_prop_unregister_all().
- Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Matthew Ahrens [Wed, 4 Nov 2015 20:37:33 +0000 (21:37 +0100)]
Illumos 6288 - dmu_buf_will_dirty could be faster
6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Porting notes:
- [module/zfs/dbuf.c]
- Fix 'warning: ISO C90 forbids mixed declarations and code'
by moving 'dbuf_dirty_record_t *dr' to start of code block.
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
George Wilson [Wed, 4 Nov 2015 20:12:40 +0000 (21:12 +0100)]
Illumos 6292 - exporting a pool while an async destroy
6292 exporting a pool while an async destroy is running can leave
entries in the deferred tree
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Fabian Keil <fk@fabiankeil.de>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Matthew Ahrens [Wed, 4 Nov 2015 20:19:17 +0000 (21:19 +0100)]
Illumos 6319 - assertion failed in zio_ddt_write: bp->blk_birth == txg
6319 assertion failed in zio_ddt_write: bp->blk_birth == txg
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Matthew Ahrens [Sat, 26 Dec 2015 21:10:31 +0000 (22:10 +0100)]
Illumos 5987 - zfs prefetch code needs work
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
References:
https://www.illumos.org/issues/5987 zfs prefetch code needs work
illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work
Porting notes:
- [module/zfs/dbuf.c]
- 5f6d0b6 Handle block pointers with a corrupt logical size
- [module/zfs/dmu_zfetch.c]
- c65aa5b Fix gcc missing parenthesis warnings
- 428870f Update core ZFS code from build 121 to build 141.
- 79c76d5 Change KM_PUSHPAGE -> KM_SLEEP
- b8d06fc Switch KM_SLEEP to KM_PUSHPAGE
- Account for ISO C90 - mixed declarations and code - warnings
- Module parameters (new/changed):
- Replaced zfetch_block_cap with zfetch_max_distance
(Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024))
- Preserved zfs_prefetch_disable as 'int' for consistency with
existing Linux module options.
- [include/sys/trace_arc.h]
- Added new tracepoints
- DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async);
- DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch);
- [man/man5/zfs-module-parameters.5]
- Updated man page
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Mon, 11 Jan 2016 21:23:04 +0000 (13:23 -0800)]
Illumos 5039 - ztest should default to larger device sizes
5039 ztest should default to larger device sizes
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Porting notes:
- [include/sys/fs/zfs.h]
- f67d70 Create an 'overlay' property
- 11b9ec Add full SELinux support
- [fs/zfs/dsl_dataset.c]
- This increases the stack size of dsl_dataset_stats() but
nothing has been changed until this is shown to be an issue.
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Matthew Ahrens [Fri, 1 Jan 2016 13:42:58 +0000 (14:42 +0100)]
Illumos 4891 - want zdb option to dump all metadata
4891 want zdb option to dump all metadata
Reviewed by: Sonu Pillai <sonu.pillai@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
We'd like a way for zdb to dump metadata in a machine-readable
format, so that we can bring that back from a customer site for
in-house diagnosis. Think of it as a crash dump for zpools,
which can be used for post-mortem analysis of a malfunctioning
pool.
Porting notes:
- [cmd/zdb/zdb.c]
- a5778ea zdb: Introduce -V for verbatim import
- In main() getopt 'opt' variable removed and the code was
brought back in line with illumos.
- [lib/libzpool/kernel.c]
- 1e33ac1 Fix Solaris thread dependency by using pthreads
- f0e324f Update utsname support
- 4d58b69 Fix vn_open/vn_rdwr error handling
- In vn_open() allocate 'dumppath' on heap instead of stack
- Properly handle 'dump_fd == -1' error path
- Free 'realpath' after added vn_dumpdir_code block
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting notes:
- [module/zfs/zfs_vnops.c]
- 3558fd7 Prototype/structure update for Linux
- 2cf7f52 Linux compat 2.6.39: mount_nodev()
- Use zfs_is_readonly() wrapper
- Remove first line of comment which doesn't apply
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fix an off by one error introduced by fcff0f3 which triggers an
assertion when 16M blocks are used with send/recv. This fix was
intentionally not folded into the Illumos commit so it can be
easily cherry-picked by upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting notes:
- [lib/libzfs/libzfs_sendrecv.c]
- b8864a2 Fix gcc cast warnings
- 325f023 Add linux kernel device support
- 5c3f61e Increase Linux pipe buffer size on 'zfs receive'
- [module/zfs/zfs_vnops.c]
- 3558fd7 Prototype/structure update for Linux
- c12e3a5 Restructure zfs_readdir() to fix regressions
- [module/zfs/zvol.c]
- Function @zvol_map_block() isn't needed in ZoL
- 9965059 Prefetch start and end of volumes
- [module/zfs/dmu.c]
- Fixed ISO C90 - mixed declarations and code
- Function dmu_prefetch() 'int i' is declared before
the following code block (c90 vs. c99)
- [module/zfs/dbuf.c]
- fc5bb51 Fix stack dbuf_hold_impl()
- 9b67f60 Illumos 4757, 4913
- 34229a2 Reduce stack usage for recursive traverse_visitbp()
- [module/zfs/dmu_send.c]
- Fixed ISO C90 - mixed declarations and code
- b58986e Use large stacks when available
- 241b541 Illumos 5959 - clean up per-dataset feature count code
- 77aef6f Use vmem_alloc() for nvlists
- 00b4602 Add linux kernel memory support
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Richard Sharpe [Mon, 28 Dec 2015 00:08:05 +0000 (16:08 -0800)]
Fix casesensitivity=insensitive deadlock
When casesensitivity=insensitive is set for the file system, a rename
can deadlock if the user uses a different case for each path. For
example rename("A/some-file.txt", "a/some-file.txt").
The simple test for this is:
1. mkdir some-dir in a ZFS file system
2. touch some-dir/some-file.txt
3. mv Some-dir/some-file.txt some-dir/some-other-file.txt
This last request deadlocks trying to relock the i_mutex on the inode for
the parent directory.
The solution is to use d_add_ci in zpl_lookup if we are on a file system
that has the casesensitivity=insensitive attribute set.
This patch checks whether we are working on a case-insensitive file system
and, if so, allocates storage for the case-insensitive name, passes it to
zfs_lookup and then calls d_add_ci instead of d_splice_alias.
The performance impact seems to be minimal even though we have introduced a
kmalloc and kfree in the lookup path.
The problem was found when running Microsoft's FSCT against Samba on top of
ZFS On Linux.
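The lookup side of the fix can be sketched in user space. The directory contents, `lookup_ci()` and its buffer handling are hypothetical. `strcasecmp()` stands in for ZFS's own case-insensitive name matching, which may differ, for example by applying Unicode normalization. The idea is that the lookup reports the name as it is actually stored, so the dentry can be created under that name (roughly the role `d_add_ci` plays in the kernel). As a result, "Some-dir" and "some-dir" do not become two separate entries for the same directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical directory: the names exactly as stored on disk. */
static const char *dir_entries[] = { "some-dir", "other.txt" };
#define NENTRIES	(sizeof (dir_entries) / sizeof (dir_entries[0]))

/*
 * Case-insensitive lookup.  On a match, copy the stored name into
 * 'realname' (caller-provided storage, like the buffer the patch
 * allocates for zfs_lookup) and return the entry index; else -1.
 */
static int
lookup_ci(const char *name, char *realname, size_t len)
{
	for (size_t i = 0; i < NENTRIES; i++) {
		if (strcasecmp(name, dir_entries[i]) == 0) {
			if (strlen(dir_entries[i]) >= len)
				return (-1);	/* buffer too small */
			strcpy(realname, dir_entries[i]);
			return ((int)i);
		}
	}
	return (-1);
}
```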
Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4136
Hajo Möller [Fri, 1 Jan 2016 01:20:43 +0000 (02:20 +0100)]
Make arc_summary.py and dbufstat.py compatible with python3
To make arc_summary.py and dbufstat.py compatible with python3,
some minor fixes were required. These were made automatically by
`2to3 -w arc_summary.py` and `2to3 -w dbufstat.py`.
Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Jeremy Jones [Thu, 31 Dec 2015 15:41:52 +0000 (16:41 +0100)]
Illumos 3139 - zdb dies when it tries to determine path of unlinked file
3139 zdb dies when it tries to determine path of unlinked file
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
Matthew Ahrens [Mon, 6 Jul 2015 03:20:31 +0000 (05:20 +0200)]
Illumos 5746 - more checksumming in zfs send
5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
Ned Bass [Wed, 30 Dec 2015 02:41:22 +0000 (18:41 -0800)]
Prevent SA length overflow
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.
The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.
Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent with the practice of other callers
of sa_update().
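The overflow check can be sketched as follows. `bulk_attr_model_t` and `bulk_attr_set()` are hypothetical stand-ins for `sa_bulk_attr_t` and the check added to `sa_update()`. The real patch uses a VERIFY, which panics; this sketch returns `EOVERFLOW` instead so the behavior can be exercised. Without the check, a 65536-byte length assigned to a `uint16_t` silently becomes 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of the 16-bit length field in sa_bulk_attr_t. */
typedef struct {
	void *sa_data;
	uint16_t sa_length;
} bulk_attr_model_t;

/*
 * Accept a 32-bit length but refuse any value the 16-bit on-disk
 * field cannot represent, instead of silently truncating it.
 */
static int
bulk_attr_set(bulk_attr_model_t *attr, void *buf, uint32_t buflen)
{
	if (buflen > UINT16_MAX)
		return (EOVERFLOW);
	attr->sa_data = buf;
	attr->sa_length = (uint16_t)buflen;
	return (0);
}
```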
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4150
Illumos 5745 - zfs set allows only one dataset property to be set at a time
5745 zfs set allows only one dataset property to be set at a time
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Richard PALO <richard@NetBSD.org>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Rich Lowe <richlowe@richlowe.net>
Chunwei Chen [Mon, 21 Dec 2015 19:57:18 +0000 (11:57 -0800)]
Make xattr dir truncate and remove in one tx
We need the truncate and the remove to be in the same tx when doing zfs_rmnode
on an xattr dir. Otherwise, if we truncate and then crash, we'll end up with an
inconsistent zap object on the delete queue. We do this by skipping
dmu_free_long_range and letting zfs_znode_delete do the work.
Chunwei Chen [Fri, 18 Dec 2015 19:39:41 +0000 (11:39 -0800)]
Fix empty xattr dir causing lockup
During zfs_rmnode on an xattr dir, if the system crashes just after
dmu_free_long_range, we would get an empty xattr dir in the delete queue. This
would cause blkid=0 to be passed into zap_get_leaf_byblk when doing
zfs_purgedir during mount, which would then try to do rw_enter on the wrong
structure and lock up the system.
We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.
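The guard can be illustrated with a toy model. It assumes the fat-ZAP layout, where block 0 of the object holds the ZAP header and leaves live in later blocks. Under that assumption, a block id of 0 (as can be read back once the object's blocks have been freed and read as zeros) never names a valid leaf. `blk_model_t` and `get_leaf_byblk_model()` are hypothetical; the real function works on DMU buffers and takes locks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BLK_HEADER, BLK_LEAF } blk_type_t;

/* Hypothetical in-memory block of a fat-ZAP-like object. */
typedef struct {
	blk_type_t type;
	int nentries;
} blk_model_t;

/*
 * Return leaf 'blkid' in '*leafp'.  Block 0 is the header and never a
 * leaf, so blkid 0 means the pointer table is empty or damaged: fail
 * with ENOENT instead of treating the header as a leaf.
 */
static int
get_leaf_byblk_model(blk_model_t *blks, uint64_t nblks, uint64_t blkid,
    blk_model_t **leafp)
{
	if (blkid == 0)
		return (ENOENT);
	assert(blkid < nblks);
	assert(blks[blkid].type == BLK_LEAF);
	*leafp = &blks[blkid];
	return (0);
}
```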
Brian Behlendorf [Mon, 21 Dec 2015 17:27:24 +0000 (09:27 -0800)]
Fix z_xattr_lock/z_teardown_lock inversion
There exists a lock inversion between the z_xattr_lock and the
z_teardown_lock. Resolve this by taking the z_teardown_lock in
all registered xattr callbacks prior to taking the z_xattr_lock.
This ensures the locks are always taken in the same order, thus
preventing a deadlock. Note the z_teardown_lock is taken again
in zfs_lookup() and this is safe because the z_teardown_lock is
a re-entrant reader/writer lock.
Brian Behlendorf [Mon, 21 Dec 2015 22:02:22 +0000 (17:02 -0500)]
Fix ztest truncated cache file
Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file. This is
the correct fix but it should have only been applied for the kernel
build. In user space rename(2) is needed because ztest depends on
the cache file.
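The user-space behavior ztest relies on is the common write-then-rename pattern, sketched below. `write_file_atomic()` and the `.tmp` suffix are hypothetical and are not the spa_config_write() code. Because rename(2) replaces the file in one step, a concurrent reader sees either the old cache file or the new one, never a truncated one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace 'path' atomically: write a temporary file, flush and fsync
 * it, then rename(2) it over the original.  Returns 0 or -1.
 */
static int
write_file_atomic(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	int n = snprintf(tmp, sizeof (tmp), "%s.tmp", path);

	if (n < 0 || (size_t)n >= sizeof (tmp))
		return (-1);

	FILE *fp = fopen(tmp, "wb");
	if (fp == NULL)
		return (-1);
	if (fwrite(data, 1, len, fp) != len || fflush(fp) != 0 ||
	    fsync(fileno(fp)) != 0) {
		(void) fclose(fp);
		(void) unlink(tmp);
		return (-1);
	}
	if (fclose(fp) != 0 || rename(tmp, path) != 0) {
		(void) unlink(tmp);
		return (-1);
	}
	return (0);
}
```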
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4129
Olaf Faaland [Thu, 15 Oct 2015 20:08:27 +0000 (13:08 -0700)]
Identify locks flagged by lockdep
When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible
recursive locking in some cases and possible circular locking dependency
in others, within the SPL and ZFS modules.
This patch uses a mutex type defined in SPL, MUTEX_NOLOCKDEP, to mark
such mutexes when they are initialized. This mutex type causes
attempts to take or release those locks to be wrapped in lockdep_off()
and lockdep_on() calls to silence the dependency checker and allow the
use of lock_stats to examine contention.
For RW locks, it uses an analogous lock type, RW_NOLOCKDEP.
The goal is that these locks are ultimately changed back to type
MUTEX_DEFAULT or RW_DEFAULT, once the locks are annotated to reflect
their relationship (e.g. z_name_lock below) or once any real problem
with the lock dependencies is fixed.
Some of the affected locks are:
tc_open_lock:
=============
This is an array of locks, all with the same name, all of which
txg_quiesce must take in order to move the txg to the next state. They
all default to the same lockdep class, so to lockdep this appears recursive.
zp->z_name_lock:
================
In zfs_rmdir,
dzp = znode for the directory (input to zfs_dirent_lock)
zp = znode for the entry being removed (output of zfs_dirent_lock)
zfs_rmdir()->zfs_dirent_lock() takes z_name_lock in dzp
zfs_rmdir() takes z_name_lock in zp
Since both dzp and zp are type znode_t, the locks have the same default
class, and lockdep considers it a possible recursive lock attempt.
l->l_rwlock:
============
zap_expand_leaf() sometimes creates two new zap leaf structures, via
these call paths:
Because both zap_leaf_open() and zap_create_leaf() initialize
l->l_rwlock in their (separate) leaf structures, the lockdep class is
the same, and the Linux kernel believes these might both be the same
lock and emits a possible recursive lock warning.
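A toy checker illustrates the class-based reasoning behind these reports. It is not the kernel's lockdep, which also tracks ordering between classes. `toy_lock_t`, `toy_checker_t` and the functions are hypothetical. Here a class stands for one initialization site, so every lock in an array such as tc_open_lock shares one class. The `nolockdep` flag models the effect of MUTEX_NOLOCKDEP and RW_NOLOCKDEP: such locks are simply not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD	8

/* Hypothetical lock: 'cls' models a lockdep class (one per init site). */
typedef struct {
	int cls;
	bool nolockdep;		/* models MUTEX_NOLOCKDEP / RW_NOLOCKDEP */
} toy_lock_t;

typedef struct {
	const toy_lock_t *held[MAX_HELD];
	int nheld;
	int warnings;		/* "possible recursive locking" reports */
} toy_checker_t;

/* Warn if a checked lock of the same class is already held. */
static void
toy_acquire(toy_checker_t *c, const toy_lock_t *l)
{
	assert(c->nheld < MAX_HELD);
	if (!l->nolockdep) {
		for (int i = 0; i < c->nheld; i++) {
			if (!c->held[i]->nolockdep &&
			    c->held[i]->cls == l->cls) {
				c->warnings++;
				break;
			}
		}
	}
	c->held[c->nheld++] = l;
}

static void
toy_release(toy_checker_t *c, const toy_lock_t *l)
{
	for (int i = c->nheld - 1; i >= 0; i--) {
		if (c->held[i] == l) {
			c->held[i] = c->held[--c->nheld];
			return;
		}
	}
	assert(0 && "releasing a lock that is not held");
}
```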
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3895
Activate LVM volume groups before looking for zpools.
Original-patch-by: @jgoerzen
Signed-off-by: Benjamin Albrecht <git@albrecht.io>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/pkg-zfs#102
Closes #4029
Brian Behlendorf [Thu, 17 Dec 2015 17:26:05 +0000 (09:26 -0800)]
Fix zfs_vdev_aggregation_limit bounds checking
Update the bounds checking for zfs_vdev_aggregation_limit so that
it has a floor of zero and a maximum value of the supported block
size for the pool.
Additionally add an early return when zfs_vdev_aggregation_limit
equals zero to disable aggregation. For very fast solid state or
memory devices it may be more expensive to perform the aggregation
than to issue the IO immediately.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
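The bounds checking and early return described above can be sketched as follows. This is a minimal, hypothetical illustration: clamp_aggregation_limit(), aggregation_enabled() and the maxblocksize parameter (standing in for the pool's supported block size) are illustrative names, not the ZoL source.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: clamp the aggregation limit to [0, maxblocksize].
 * maxblocksize stands in for the largest block size the pool supports.
 */
static int64_t
clamp_aggregation_limit(int64_t limit, int64_t maxblocksize)
{
	if (limit < 0)
		return (0);
	if (limit > maxblocksize)
		return (maxblocksize);
	return (limit);
}

/* Returns 1 if aggregation should be attempted, 0 to issue the IO now. */
static int
aggregation_enabled(int64_t limit, int64_t maxblocksize)
{
	/* A limit of zero disables aggregation (the early return). */
	return (clamp_aggregation_limit(limit, maxblocksize) != 0);
}
```

Clamping once up front means the aggregation code never has to reason about a negative limit or one larger than any block it could legally build.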
Brian Behlendorf [Wed, 16 Dec 2015 19:28:15 +0000 (11:28 -0800)]
Fix vdev_queue_aggregate() deadlock
This deadlock may manifest itself in slightly different ways, but
at its core it is caused by a memory allocation blocking on
filesystem reclaim in the zio pipeline. This is normally impossible
because zio_execute() disables filesystem reclaim by setting
PF_FSTRANS on the thread. However, kmem cache allocations may
still indirectly block on filesystem reclaim while holding the
critical vq->vq_lock, as shown below.
To resolve this issue, zio_buf_alloc_flags() is introduced, which
allows allocation flags to be passed. This can then be used in
vdev_queue_aggregate() with KM_NOSLEEP when allocating the
aggregate IO buffer. Since aggregating the IO is purely a
performance optimization, we want this to either succeed or fail
quickly. Trying too hard to allocate this memory under the
vq->vq_lock can negatively impact performance and result in
this deadlock.
* z_wr_iss
    zio_vdev_io_start
    vdev_queue_io -> Takes vq->vq_lock
    vdev_queue_io_to_issue
    vdev_queue_aggregate
    zio_buf_alloc -> Waiting on spl_kmem_cache process
* z_wr_int
    zio_vdev_io_done
    vdev_queue_io_done
    mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss
* txg_sync
    spa_sync
    dsl_pool_sync
    zio_wait -> Waiting on zio being handled by z_wr_int
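The "succeed or fail quickly" policy for the aggregate buffer can be modeled in user space as below. This is a hypothetical sketch: try_aggregate(), struct agg_result and always_fail() are illustrative names, and a function pointer stands in for a KM_NOSLEEP allocation so that failure can be simulated. It is not the vdev_queue_aggregate() code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a non-blocking (KM_NOSLEEP-style) allocator. */
typedef void *(*try_alloc_fn)(size_t);

struct agg_result {
	void *buf;		/* aggregate buffer, or NULL */
	int aggregated;		/* 1 if the IOs will be merged */
};

static struct agg_result
try_aggregate(size_t total_size, try_alloc_fn try_alloc)
{
	struct agg_result r = { NULL, 0 };

	/*
	 * Never block here: the caller holds the queue lock. If the
	 * allocation fails, skip aggregation and let the caller issue
	 * the IOs individually.
	 */
	r.buf = try_alloc(total_size);
	if (r.buf != NULL)
		r.aggregated = 1;
	return (r);
}

/* Simulated allocator that always fails, as under memory pressure. */
static void *
always_fail(size_t size)
{
	(void) size;
	return (NULL);
}
```

Because aggregation is only an optimization, a failed allocation costs some throughput but never correctness, which is why giving up immediately is acceptable.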
Brian Behlendorf [Wed, 16 Dec 2015 22:17:49 +0000 (14:17 -0800)]
Fix z_xattr_lock/z_teardown_lock lock inversion
There exists a lock inversion between the z_xattr_lock and the
z_teardown_lock. Detect this case and return EBUSY so zfs_resume_fs()
will mark the inode stale and it can be safely revalidated on next
access.
Chunwei Chen [Tue, 8 Dec 2015 20:26:18 +0000 (12:26 -0800)]
Fix uio_prefaultpages for 0 length iovec
Userspace can freely pass in whatever iovec it feels like, and it's perfectly
legal to pass an iovec which contains a zero length segment. In the current
implementation, uio_prefaultpages would touch an out-of-bounds byte in the
"last byte" logic. While this probably wouldn't cause any critical error, we
would like uio_prefaultpages to be able to continue gracefully.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4078
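The prefault loop with the zero-length guard might look like the following user-space sketch. prefault_iovecs() and SKETCH_PAGESIZE are hypothetical names, and a 4 KiB page size is assumed. Without the guard, the "last byte" address `base + len - 1` for a zero-length segment would be one byte before the segment.

```c
#include <stddef.h>
#include <sys/uio.h>

#define	SKETCH_PAGESIZE	4096	/* assumed page size for the sketch */

/*
 * Read one byte per page of each iovec segment, plus its last byte,
 * so the pages are faulted in. Returns the number of bytes touched.
 */
static size_t
prefault_iovecs(const struct iovec *iov, int iovcnt)
{
	size_t touched = 0;

	for (int i = 0; i < iovcnt; i++) {
		const volatile char *p = iov[i].iov_base;
		size_t len = iov[i].iov_len;

		if (len == 0)
			continue;	/* nothing to fault in; skip safely */

		for (size_t off = 0; off < len; off += SKETCH_PAGESIZE) {
			(void) p[off];
			touched++;
		}
		/* The last byte may sit on a page the loop did not reach. */
		(void) p[len - 1];
		touched++;
	}
	return (touched);
}
```

For a 10000-byte segment this touches offsets 0, 4096, 8192 and 9999; zero-length segments (including ones with a NULL base) are skipped without being dereferenced.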
Brian Behlendorf [Fri, 11 Dec 2015 19:09:41 +0000 (11:09 -0800)]
Handle damaged blk_birth in dsl_deadlist_insert()
If a bit were cleared in `bp->blk_birth` such that the txg birth
was now lower than any other txg_birth in the deadlist, then there
would be no entry before it in the tree.
This should be impossible, but regardless, error handling code has
been added for this case. By default this is left as a fatal case
and the blk_birth is logged. However, setting `zfs_recover=1` will
cause the bp to be placed at the start of the deadlist even though
it contains an invalid blk_birth.
Commit 5f6d0b6 was originally added to gracefully handle block
pointers with a damaged logical size. However, it incorrectly
assumed that every arc_done_func_t passed to it could handle a NULL
arc_buf_t.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4069
Closes #4080
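The damaged-blk_birth handling in dsl_deadlist_insert() can be modeled as a lookup in a sorted list of entries. This is a hypothetical sketch: a plain array of mintxg values replaces the AVL tree, and deadlist_pick_entry() returns -1 where the real code would treat the case as fatal. It assumes a block born in txg B belongs to the last entry whose mintxg is strictly less than B.

```c
#include <stdint.h>

/*
 * mintxg[] holds deadlist entry keys in ascending order. Returns the
 * index of the entry the block belongs to, or -1 for the fatal case.
 */
static int
deadlist_pick_entry(const uint64_t *mintxg, int n, uint64_t birth,
    int recover)
{
	int found = -1;

	for (int i = 0; i < n; i++) {
		if (mintxg[i] < birth)
			found = i;
		else
			break;	/* sorted: no later entry can qualify */
	}

	/* Damaged birth below every entry: zfs_recover=1 uses the first. */
	if (found == -1 && n > 0 && recover)
		found = 0;
	return (found);
}
```

With entries { 10, 20, 30 }, a birth of 25 maps to index 1, while a damaged birth of 5 has no entry before it and is only placed (at index 0) when recovery is enabled.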
Richard Yao [Fri, 11 Dec 2015 23:40:05 +0000 (18:40 -0500)]
Unconditionally build zdb and ztest with -DDEBUG
Illumos unconditionally builds zdb and ztest with -DDEBUG. This helps
catch bugs and eliminates the need for commits like 202619623022722f30c2ee49931a4fa6896421c7, which changed ASSERTs to
VERIFYs. The following files in the illumos tree show this:
Brian Behlendorf [Mon, 14 Dec 2015 18:59:25 +0000 (10:59 -0800)]
Hold the zfs_snapentry_t before dispatch
While exceptionally unlikely to cause a problem, the zfs_snapentry_t
hold should be taken before the dispatch to prevent any possibility
of the task being processed before the hold.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Chunwei Chen [Fri, 11 Dec 2015 23:24:34 +0000 (15:24 -0800)]
Fix snapshot automount race causing EREMOTE
When a concurrent mount finishes just before the call to
zfsctl_snapshot_ismounted, returning EISDIR causes the VFS to fail
with EREMOTE. We should instead just return 0, so the VFS may retry
and will likely notice that the dentry is already mounted. This is
in line with the behavior when the usermode helper returns EBUSY.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Thu, 10 Dec 2015 23:53:37 +0000 (15:53 -0800)]
Change zfs_snapshot_lock from mutex to rw lock
By changing the zfs_snapshot_lock from a mutex to a rw lock the
zfsctl_lookup_objset() function can be allowed to run concurrently.
This should reduce the latency of fh_to_dentry lookups in ZFS
snapshots which are being accessed over NFS.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Brian Behlendorf [Thu, 10 Dec 2015 23:47:18 +0000 (15:47 -0800)]
Fix zfsctl_lookup_objset() deadlock
The zfsctl_snapshot_unmount_delay() function must not be called
from zfsctl_lookup_objset() while it is currently holding the
zfs_snapshot_lock. This will result in a deadlock. It is safe
to call zfsctl_snapshot_unmount_delay_impl() directly because the
function already has a reference on the zfs_snapentry_t.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #3997
Brian Behlendorf [Thu, 10 Dec 2015 23:23:26 +0000 (15:23 -0800)]
Set 'zfs_expire_snapshot=0' to disable auto-unmount
There are cases where it's desirable that auto-mounted snapshots
not expire after a fixed duration. They should be unmounted only
when the filesystem they are a snapshot of is unmounted.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
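The meaning of `zfs_expire_snapshot=0` can be sketched as a scheduling decision. snapshot_unmount_delay_secs() is a hypothetical helper, and treating negative values like zero is an assumption of this sketch.

```c
/*
 * Returns the delay in seconds before an automounted snapshot should
 * be unmounted, or -1 if no unmount task should be scheduled at all.
 */
static int
snapshot_unmount_delay_secs(int expire_snapshot)
{
	/* 0 disables expiry: unmount only with the parent filesystem. */
	if (expire_snapshot <= 0)
		return (-1);
	return (expire_snapshot);
}
```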
For some arm, powerpc, and sparc platforms it was possible that
neither _ILP32 nor _LP64 would be defined. Update the isa_defs.h
header to explicitly set these macros and generate a compile error
in the case neither are defined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4048
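A header check in the spirit of the isa_defs.h change might look like the sketch below. It is not the actual header: instead of per-architecture checks it falls back on `__SIZEOF_LONG__` and `__SIZEOF_POINTER__`, which GCC and Clang predefine, and then fails the build if no data model could be determined.

```c
/* Derive the data model only if the architecture checks did not. */
#if !defined(_LP64) && !defined(_ILP32)
#if defined(__SIZEOF_LONG__) && defined(__SIZEOF_POINTER__)
#if __SIZEOF_LONG__ == 8 && __SIZEOF_POINTER__ == 8
#define	_LP64
#elif __SIZEOF_LONG__ == 4 && __SIZEOF_POINTER__ == 4
#define	_ILP32
#endif
#endif
#endif

/* Generate a compile error rather than silently guessing. */
#if !defined(_LP64) && !defined(_ILP32)
#error "Neither _LP64 nor _ILP32 is defined"
#endif

#if defined(_LP64) && defined(_ILP32)
#error "Both _LP64 and _ILP32 are defined"
#endif
```

A target matching neither model (for example an LLP64 platform, where `long` is 4 bytes but pointers are 8) would hit the first `#error`, which is the intended behavior.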
Chunwei Chen [Mon, 7 Dec 2015 23:43:53 +0000 (15:43 -0800)]
Use spa as key besides objsetid for snapentry
objsetid is not unique across pools, so using it alone as the key would
cause a panic when automounting two snapshots on different pools with the
same objsetid. We fix this by adding the spa pointer as an additional key.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3948
Issue #3786
Issue #3887
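A comparator keyed on both the pool and the objset id can be sketched as below. struct snapentry_key and snapentry_compare() are hypothetical stand-ins for the real zfs_snapentry_t fields. The function returns exactly -1, 0 or 1, as an avl_create() comparator is expected to do.

```c
#include <stdint.h>

struct snapentry_key {
	const void *se_spa;	/* pool the snapshot belongs to */
	uint64_t se_objsetid;	/* unique only within one pool */
};

static int
snapentry_compare(const void *a, const void *b)
{
	const struct snapentry_key *sa = a, *sb = b;
	uintptr_t pa = (uintptr_t)sa->se_spa;
	uintptr_t pb = (uintptr_t)sb->se_spa;

	/* Order by pool first, so equal objsetids on two pools differ. */
	if (pa != pb)
		return (pa < pb ? -1 : 1);
	if (sa->se_objsetid != sb->se_objsetid)
		return (sa->se_objsetid < sb->se_objsetid ? -1 : 1);
	return (0);
}
```

Comparing the pool first keeps all snapshots of one pool adjacent in the tree. Two snapshots that share an objsetid but live on different pools then never compare equal.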
While stack size will vary by architecture, it has historically defaulted to
8K on x86_64 systems. However, as of Linux 3.15 the default thread stack
size was increased to 16K. These kernels are now the default in most
non-enterprise distributions, which means we no longer need to assume 8K
stacks. This patch takes advantage of that fact by appropriately reverting
stack conservation changes which were made to ensure stability. Those
changes may have had a negative impact on performance for certain
workloads. This also has the side effect of bringing the code slightly
more in line with upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4059
Matthew Ahrens [Fri, 24 Jul 2015 16:53:55 +0000 (09:53 -0700)]
Illumos 5959 - clean up per-dataset feature count code
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
illumos code doesn't check for feature_get_refcount() returning
ENOTSUP (which means the feature is disabled) in zdb. zfsonlinux added
a check in https://github.com/zfsonlinux/zfs/commit/784652c
due to #3468. The check was reintroduced here.
Ported-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3965
Provide a generic interface to prefetch ZAP entries by name. This
functionality is being added for external consumers such as Lustre.
It is based on the existing zap_prefetch_uint64() version which is
used by the deduplication code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4061
ilovezfs [Sun, 22 Nov 2015 12:06:21 +0000 (04:06 -0800)]
Ext4's typical GPT partition type not recognized
Adding additional entries to the efi conversion array will help prevent
the overwriting of the GPTs of disks with in-use file systems in more
cases. Most notably, this adds partition type 8300 "Linux filesystem"
(0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and
btrfs, among others.
This commit itself does nothing to address the underlying problematic
behavior that check_slice() isn't called on partitions of an
unrecognized type, even when they contain a currently mounted file
system.
The additional entries were derived from these two resources:
https://en.wikipedia.org/wiki/GUID_Partition_Table
http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc
Signed-off-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4016
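Recognizing a partition type comes down to looking up its GUID in a table. The sketch below is hypothetical and deliberately partial: only the "Linux filesystem" GUID comes from this commit, and the table and helper names are illustrative rather than the efi conversion array itself.

```c
#include <ctype.h>
#include <stddef.h>

/* Partial, illustrative table of partition type GUIDs. */
static const char *const known_fs_part_types[] = {
	"0FC63DAF-8483-4772-8E79-3D69D8477DE4",	/* Linux filesystem (8300) */
};

/* GUID strings may appear in either case, so compare case-insensitively. */
static int
guid_equal(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
			return (0);
		a++;
		b++;
	}
	return (*a == *b);	/* both strings must end together */
}

static int
is_known_fs_part_type(const char *guid)
{
	size_t n = sizeof (known_fs_part_types) /
	    sizeof (known_fs_part_types[0]);

	for (size_t i = 0; i < n; i++) {
		if (guid_equal(guid, known_fs_part_types[i]))
			return (1);
	}
	return (0);
}
```

As the commit notes, a lookup like this only helps for types listed in the table. A partition whose type is missing is still treated as unrecognized.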
Yuri Pankov [Thu, 23 Feb 2012 03:11:44 +0000 (07:11 +0400)]
Illumos 934 - FreeBSD's GPT not recognized
Reviewed by: Alexander Eremin <alexander.r.eremin@gmail.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Richard Yao [Wed, 25 Nov 2015 15:39:57 +0000 (10:39 -0500)]
Only trigger SET_ERROR tracepoint event on error
Currently, the SET_ERROR tracepoint triggers regardless of whether there
is an error or not. On Illumos, SET_ERROR only triggers on an actual
error, which avoids irrelevant noise. Linux 2.6.38 added support for
conditional tracepoints, so we modify SET_ERROR to use them when they
are available, for functionality equivalent to that of Illumos.
Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4043
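The conditional behavior can be modeled in user space as below. This is a hypothetical stand-in, not the kernel tracepoint. set_error_impl() and the set_error_events counter are illustrative. The event fires only for a non-zero error, and the value is always passed through unchanged.

```c
#include <stdio.h>

static unsigned long set_error_events;	/* events actually emitted */

static inline int
set_error_impl(int err, const char *file, int line)
{
	/* Emit the event only for a real error, as on Illumos. */
	if (err != 0) {
		set_error_events++;
		fprintf(stderr, "set_error: %s:%d error=%d\n", file, line, err);
	}
	return (err);
}

#define	SET_ERROR(err)	set_error_impl((err), __FILE__, __LINE__)
```

Because the macro evaluates to its argument, `return (SET_ERROR(0));` on a success path stays silent while still returning 0.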
Chunwei Chen [Fri, 20 Nov 2015 23:50:06 +0000 (15:50 -0800)]
Fix zdb calling behavior in ztest
The current zdb calling behaviour is really fragile, and is guaranteed to
segfault if ztest is not installed in either /sbin or /usr/sbin. With this
patch, ztest will try to call zdb in the following order.
1. Use the environment variable ZDB_PATH if provided.
2. If ztest resides in the build tree, guess the in-tree zdb path.
3. Just pass zdb to popen and let it search for it in PATH.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3126
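The three-step lookup order can be expressed as a small function. resolve_zdb() is a hypothetical helper, and in_tree_guess stands for an already-computed in-tree path (NULL when ztest is not running from the build tree). How ztest actually derives that guess is not shown.

```c
#include <stddef.h>

/*
 * Pick the zdb command to run. zdb_path_env would normally come
 * from getenv("ZDB_PATH").
 */
static const char *
resolve_zdb(const char *zdb_path_env, const char *in_tree_guess)
{
	if (zdb_path_env != NULL && zdb_path_env[0] != '\0')
		return (zdb_path_env);	/* 1. explicit override */
	if (in_tree_guess != NULL)
		return (in_tree_guess);	/* 2. build-tree location */
	return ("zdb");			/* 3. let popen() search PATH */
}
```

A caller would use something like `resolve_zdb(getenv("ZDB_PATH"), guess)` and pass the result to popen() when building the zdb command line.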
Adding VPATH support, commit 47a4a6f, required that a `src`
and `obj` line be added to the top of the Makefiles. They
must be removed from the Makefiles when built in.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/spl#481
Issue zfsonlinux/spl#498
Chunwei Chen [Mon, 23 Nov 2015 23:06:46 +0000 (15:06 -0800)]
Linux 4.4 compat: xattr operations takes xattr_handler
The xattr_handler->{list,get,set} functions were changed to take an
xattr_handler, and the handler_flags argument was removed; the flags
should now be accessed via handler->flags.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
Chunwei Chen [Mon, 23 Nov 2015 22:47:29 +0000 (14:47 -0800)]
Linux 4.4 compat: make_request_fn returns blk_qc_t
As part of block polling support in Linux 4.4, make_request_fn should
return a cookie value of type blk_qc_t. For now, we make zvol_request
always return BLK_QC_T_NONE until we assess whether and how we want
to support block polling.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
tuxoko [Fri, 30 Oct 2015 23:10:01 +0000 (16:10 -0700)]
Fix zfs_dirty_data_max overflow on 32-bit
On 32-bit systems, the calculation of zfs_dirty_data_max from physmem will
overflow, causing it to be smaller than zfs_dirty_data_sync. This delays
txgs while nothing is being written to disk. The end result is horrendous
write speed. On a 32-bit VM with 4G of RAM, a simple dd ran at ~7MB/s before
this patch. Now it can reach speeds on par with a 64-bit VM.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3973
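The overflow is easy to reproduce with fixed-width arithmetic. This sketch assumes the limit is computed as a percentage (10% here) of physmem (a page count) times the page size. The helper names and the constant are illustrative, not the ZFS source. With 4 GiB of RAM and 4 KiB pages, the 32-bit product is exactly 2^32 and wraps to 0. Smaller memory sizes can also wrap once the percentage is multiplied in.

```c
#include <stdint.h>

#define	SKETCH_DIRTY_PERCENT	10	/* assumed default fraction */

/* Buggy form: the whole product is computed in 32 bits and wraps. */
static uint32_t
dirty_max_32(uint32_t physmem_pages, uint32_t pagesize)
{
	return (physmem_pages * pagesize * SKETCH_DIRTY_PERCENT / 100);
}

/* Fixed form: widen to 64 bits before multiplying. */
static uint64_t
dirty_max_64(uint32_t physmem_pages, uint32_t pagesize)
{
	return ((uint64_t)physmem_pages * pagesize *
	    SKETCH_DIRTY_PERCENT / 100);
}
```

For 1048576 pages of 4096 bytes, dirty_max_32() yields 0 while dirty_max_64() yields 429496729 bytes (10% of 4 GiB, truncated).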
Chunwei Chen [Tue, 17 Nov 2015 00:39:52 +0000 (16:39 -0800)]
Fix snapshot automount behavior when concurrent or fail
When concurrent threads access the snapdir, one will succeed in the usermode
helper mount while the others will get EBUSY. However, the original code
treats those EBUSY threads as successful and goes on to call
zfsctl_snapshot_add, which causes repeated avl_add calls and thus a panic.
Also, if the snapshot is already mounted somewhere else, a thread accessing
the snapdir will also get EBUSY from the usermode helper mount. This causes
strange behavior: follow_down_one will fail, and then follow_up will jump
up to the mountpoint of the filesystem and confuse the hell out of the VFS.
This patch fixes both behaviors by returning 0 immediately for the EBUSY
threads. Note that this has a side effect in the second case, where the
VFS will retry several times before returning ELOOP.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4018
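The post-mount decision can be sketched as follows. snapshot_mount_result() is a hypothetical helper, and it assumes the helper's result is 0 or a positive errno value. Only the thread whose mount actually succeeded records the snapshot, so the tree never sees a duplicate avl_add().

```c
#include <errno.h>

/*
 * helper_ret: result of the usermode mount helper.
 * *add_entry: set to 1 if the caller should record the snapshot.
 */
static int
snapshot_mount_result(int helper_ret, int *add_entry)
{
	*add_entry = 0;

	if (helper_ret == EBUSY)
		return (0);		/* someone else mounted it; let VFS retry */
	if (helper_ret != 0)
		return (helper_ret);	/* genuine failure */

	*add_entry = 1;			/* only the winning thread adds it */
	return (0);
}
```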
Jason Zaman [Sat, 24 Oct 2015 06:01:08 +0000 (14:01 +0800)]
sysmacros: Make P2ROUNDUP not trigger int overflow
The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which
triggers PaX's integer overflow detection for unsigned integers.
Replace the macros with an equivalent version that does not trigger
the overflow.
Axioms:
A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement.
B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law.
C. ~(~x) === x under the law of excluded middle.
Proof:
0. (-(-(x) & -(align))) original
1. (~(-(x) & -(align)) + 1) by A
2. (((~(-(x))) | (~(-(align)))) + 1) by B
3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by A
4. (((((x) - 1)) | (((align) - 1))) + 1) by C
Q.E.D.
Signed-off-by: Jason Zaman <jason@perfinion.com> Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3949
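The proof can also be checked numerically. The sketch below defines both forms side by side under illustrative names (P2ROUNDUP_OLD, P2ROUNDUP_NEW), not the actual sysmacros.h definitions. It assumes unsigned operands and a power-of-two align, which is what the P2 macros require.

```c
#include <stdint.h>

/* Original form: negates x, which PaX flags for unsigned values. */
#define	P2ROUNDUP_OLD(x, align)	(-(-(x) & -(align)))

/* Replacement form from step 4 of the proof: no negation of x. */
#define	P2ROUNDUP_NEW(x, align)	((((x) - 1) | ((align) - 1)) + 1)
```

For example, rounding 5 up to a multiple of 4 gives ((4 | 3) + 1) = 8, and 8 rounded to 8 stays 8.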
Brian Behlendorf [Mon, 16 Nov 2015 17:47:43 +0000 (09:47 -0800)]
zimport.sh: Add configure/make option support
Allow the following environment variables to control the build
behavior of the zimport.sh script. This can be useful when you
want a debug build or require specific build options. The
default values are:
CONFIG_OPTIONS=""
MAKE_OPTIONS="-s -j$(nproc)"
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>