granicus.if.org Git - zfs/log

Olaf Faaland [Sat, 8 Jul 2017 03:20:35 +0000 (20:20 -0700)]
Multi-modifier protection (MMP)

Add a multihost=on|off pool property to control MMP.  When enabled,
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts that the pool is actively
imported.  These uberblocks are the last synced uberblock with an
updated timestamp.  The property defaults to off.

During tryimport, repeatedly find the "best" uberblock (newest txg and
timestamp) and check whether the uberblock found has changed.  Include
the results of this activity test in the config returned by tryimport.
These results are reported to the user by "zpool import".
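
The activity test above can be sketched in C as follows. This is a
simplified illustration under stated assumptions, not the ZFS
implementation: ub_sketch_t, ub_find_best() and pool_is_active() are
hypothetical names, and real uberblocks carry many more fields
(including the mmp_delay mentioned below). The comparison orders by txg
first and timestamp second, matching the "newest txg and timestamp" rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an on-disk uberblock. */
typedef struct ub_sketch {
	uint64_t txg;		/* transaction group number */
	uint64_t timestamp;	/* MMP writes refresh this value */
} ub_sketch_t;

/* Nonzero if a is "better" than b: newer txg first, then newer timestamp. */
static int
ub_is_better(const ub_sketch_t *a, const ub_sketch_t *b)
{
	if (a->txg != b->txg)
		return (a->txg > b->txg);
	return (a->timestamp > b->timestamp);
}

/* Scan every uberblock read from every label and keep the best one. */
static ub_sketch_t
ub_find_best(const ub_sketch_t *ubs, size_t n)
{
	ub_sketch_t best = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (ub_is_better(&ubs[i], &best))
			best = ubs[i];
	}
	return (best);
}

/*
 * If a later scan finds a different best uberblock than the initial
 * scan, some other host is still writing: treat the pool as active.
 */
static bool
pool_is_active(const ub_sketch_t *initial, const ub_sketch_t *later)
{
	return (initial->txg != later->txg ||
	    initial->timestamp != later->timestamp);
}
```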

Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval.  The period is specified in milliseconds.  The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
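
The commit message does not give the exact formula for the activity
test duration. The sketch below only illustrates the idea of deriving
one duration from both inputs by taking the larger of two estimates;
SKETCH_IMPORT_INTERVALS and activity_test_duration_ns() are
hypothetical, and the real calculation may differ.

```c
#include <stdint.h>

#define	NSEC_PER_MSEC	1000000ULL

/* Hypothetical multiplier: how many MMP periods the test waits. */
#define	SKETCH_IMPORT_INTERVALS	10ULL

static uint64_t
activity_test_duration_ns(uint64_t multihost_interval_ms,
    uint64_t ub_mmp_delay_ns)
{
	/* Estimate based on the locally configured write period. */
	uint64_t from_interval =
	    multihost_interval_ms * NSEC_PER_MSEC * SKETCH_IMPORT_INTERVALS;
	/* Estimate based on the delay recorded by the last writer. */
	uint64_t from_delay = ub_mmp_delay_ns * SKETCH_IMPORT_INTERVALS;

	/* Wait long enough to cover whichever estimate is larger. */
	return (from_interval > from_delay ? from_interval : from_delay);
}
```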

Add a kstat interface to export statistics about Multi-Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path.  Abbreviated
output is shown below.

$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg   timestamp  mmp_delay   vdev_guid   vdev_label vdev_path
20468    261337  250274925   68396651780       3    /dev/sda
20468    261339  252023374   6267402363293     1    /dev/sdc
20468    261340  252000858   6698080955233     1    /dev/sdx
20468    261341  251980635   783892869810      2    /dev/sdy
20468    261342  253385953   8923255792467     3    /dev/sdd
20468    261344  253336622   042125143176      0    /dev/sdab
20468    261345  253310522   1200778101278     2    /dev/sde
20468    261346  253286429   0950576198362     2    /dev/sdt
20468    261347  253261545   96209817917       3    /dev/sds
20468    261349  253238188   8555725937673     3    /dev/sdb
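
Because the kstat rows above are whitespace-separated, a consumer could
parse each one with sscanf(). The struct and function names are
hypothetical and the column layout is taken only from the abbreviated
sample; the kstat header line and the column-name line are rejected by
the six-field check.

```c
#include <inttypes.h>
#include <stdio.h>

/* One data row of the multihost kstat, using the columns shown above. */
typedef struct mmp_row {
	uint64_t txg;
	uint64_t timestamp;
	uint64_t mmp_delay;
	uint64_t vdev_guid;
	int vdev_label;
	char vdev_path[256];
} mmp_row_t;

/* Returns 0 on success, -1 if the line does not have all six columns. */
static int
parse_mmp_row(const char *line, mmp_row_t *row)
{
	int n = sscanf(line,
	    "%" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64 " %d %255s",
	    &row->txg, &row->timestamp, &row->mmp_delay, &row->vdev_guid,
	    &row->vdev_label, row->vdev_path);

	return (n == 6 ? 0 : -1);
}
```

The SCNu64 conversions parse in base 10, so the leading zeros in some
of the abbreviated GUIDs above are harmless.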

Add a new tunable, zfs_multihost_history, to specify the number of MMP
updates for which history is stored. By default it is set to zero,
meaning that no MMP statistics are stored.
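
One way to keep history for a bounded number of updates is a ring
buffer whose capacity plays the role of zfs_multihost_history, with a
capacity of zero disabling storage entirely. This is a hypothetical
sketch (mmp_hist_t and its functions are not ZFS names) and it only
records txg numbers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bounded history of MMP updates; capacity 0 stores nothing. */
typedef struct mmp_hist {
	uint64_t *txgs;		/* ring storage, NULL when capacity is 0 */
	size_t capacity;	/* plays the role of zfs_multihost_history */
	size_t count;		/* entries currently held (<= capacity) */
	size_t next;		/* slot the next update will overwrite */
} mmp_hist_t;

static int
mmp_hist_init(mmp_hist_t *h, size_t capacity)
{
	h->capacity = capacity;
	h->count = 0;
	h->next = 0;
	h->txgs = NULL;
	if (capacity == 0)
		return (0);	/* history disabled: nothing to allocate */
	h->txgs = calloc(capacity, sizeof (uint64_t));
	return (h->txgs == NULL ? -1 : 0);
}

/* Record one update, overwriting the oldest entry once full. */
static void
mmp_hist_add(mmp_hist_t *h, uint64_t txg)
{
	if (h->capacity == 0)
		return;
	h->txgs[h->next] = txg;
	h->next = (h->next + 1) % h->capacity;
	if (h->count < h->capacity)
		h->count++;
}

static void
mmp_hist_fini(mmp_hist_t *h)
{
	free(h->txgs);
	h->txgs = NULL;
}
```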

When ztest is used to generate activity for automated tests of the MMP
function, some ztest functions interfere with the test.  For example,
the pool is exported to run zdb and then imported again.  Add a new
ztest option, "-M", that alters ztest behavior to prevent this.

Add new tests to verify the new functionality.  Tests provided by
Giuseppe Di Natale.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279

Olaf Faaland [Thu, 25 May 2017 20:32:06 +0000 (13:32 -0700)]
Make hostid consistent in user and kernel space

If no spl_hostid was set, and no /etc/hostid file existed, user space
and the kernel would have different values for the hostid.

The kernel's would be 0.  User space's would depend on the libc
implementation.  On systems with glibc, it would be a generated value,
probably the first 4 bytes of an IP address (see man 3 gethostid and
comments above hostid_read in SPL for details).

As a result, the hostid stored in the labels and in the pool config
would not match the hostid that user space obtains from
get_system_hostid().

Since the kernel has no way to know the libc's generated hostid value,
it serves no purpose for ZFS to use the value.

This patch changes user space's get_system_hostid() to conform to the
kernel's method, first checking for the spl_hostid via sysfs, and then
reading from /etc/hostid directly.

It does not look up spl_hostid_path, because if that is set and the
file it points to exists, spl_hostid will reflect its contents.

It eliminates the call to libc's gethostid().
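
The lookup order described above might look like this in user space.
get_system_hostid_sketch() is a hypothetical name; the paths are
parameters, and the assumed Linux defaults
(/sys/module/spl/parameters/spl_hostid for the module parameter, and a
4-byte binary /etc/hostid in host byte order) are assumptions about a
typical setup rather than details stated in this commit message.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper following the spl_hostid, then /etc/hostid order. */
static unsigned long
get_system_hostid_sketch(const char *sysfs_path, const char *hostid_path)
{
	unsigned long hostid = 0;
	FILE *f;

	/* 1. spl_hostid module parameter, assumed exported as text. */
	if ((f = fopen(sysfs_path, "r")) != NULL) {
		char buf[32];

		if (fgets(buf, sizeof (buf), f) != NULL)
			hostid = strtoul(buf, NULL, 0);
		(void) fclose(f);
	}
	if (hostid != 0)
		return (hostid);

	/* 2. hostid file, assumed to hold 4 raw bytes in host order. */
	if ((f = fopen(hostid_path, "rb")) != NULL) {
		uint32_t raw;

		if (fread(&raw, sizeof (raw), 1, f) == 1)
			hostid = raw;
		(void) fclose(f);
	}

	/* 3. No gethostid() fallback: 0 matches the kernel's value. */
	return (hostid);
}
```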

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279

Dave Eddy [Tue, 30 May 2017 18:39:17 +0000 (11:39 -0700)]
OpenZFS 6939 - add sysevents to zfs core for commands

Authored by: Dave Eddy <dave@daveeddy.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6939
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce1577b
Closes #6328

Tom Caputi [Thu, 13 Jul 2017 00:15:24 +0000 (20:15 -0400)]
Fixed VERIFY3_IMPL() bug from 682ce104

When VERIFY3_IMPL() was adjusted in 682ce104, the values of
the operands were omitted from the variadic arguments list.
This patch simply corrects this.
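
A user-space sketch of why the operands matter: the format string
contains one conversion per operand, so the operand values must be
passed after it. verify_fail(), verify_msg and verify_clear() are
hypothetical stand-ins that record the message instead of panicking,
and only an unsigned variant is shown; this is not the SPL/ZFS macro
itself.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char verify_msg[256];

static void
verify_clear(void)
{
	(void) memset(verify_msg, 0, sizeof (verify_msg));
}

/* Hypothetical stand-in for the panic path: format and record. */
static void
verify_fail(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	(void) vsnprintf(verify_msg, sizeof (verify_msg), fmt, ap);
	va_end(ap);
}

/* _l and _r must follow the format string: one per FMT conversion. */
#define	VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE, FMT) do {			\
	const TYPE _l = (TYPE)(LEFT);					\
	const TYPE _r = (TYPE)(RIGHT);					\
	if (!(_l OP _r))						\
		verify_fail("VERIFY3(" #LEFT " " #OP " " #RIGHT ") "	\
		    "failed (" FMT " " #OP " " FMT ")", _l, _r);	\
} while (0)

#define	VERIFY3U(L, OP, R)	VERIFY3_IMPL(L, OP, R, uint64_t, "%" PRIu64)
```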

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6343

LOLi [Wed, 12 Jul 2017 20:05:37 +0000 (22:05 +0200)]
Add port of FreeBSD 'volmode' property

The volmode property may be set to control the visibility of ZVOL
block devices.

This allows switching a ZVOL between three modes:
   full - existing fully functional behaviour (default)
   dev  - hide partitions on ZVOL block devices
   none - do not expose volumes outside ZFS

Additionally the new zvol_volmode module parameter can be used to
control the default behaviour.

This functionality can be used, for instance, on "backup" pools to
avoid cluttering /dev with unneeded zd* devices.
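
The three modes map naturally onto an enum plus two visibility
predicates. The names below are hypothetical, and treating any
unrecognized value as "use the zvol_volmode default" is an assumption
made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of the three modes listed above. */
typedef enum {
	VOLMODE_FULL,	/* block device plus partitions */
	VOLMODE_DEV,	/* block device only, partitions hidden */
	VOLMODE_NONE	/* nothing exposed outside ZFS */
} volmode_sketch_t;

/* Unknown strings fall back to the module-parameter default (assumed). */
static volmode_sketch_t
volmode_parse(const char *s, volmode_sketch_t module_default)
{
	if (strcmp(s, "full") == 0)
		return (VOLMODE_FULL);
	if (strcmp(s, "dev") == 0)
		return (VOLMODE_DEV);
	if (strcmp(s, "none") == 0)
		return (VOLMODE_NONE);
	return (module_default);
}

static bool
volmode_exposes_device(volmode_sketch_t m)
{
	return (m != VOLMODE_NONE);
}

static bool
volmode_exposes_partitions(volmode_sketch_t m)
{
	return (m == VOLMODE_FULL);
}
```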

Original-patch-by: mav <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb
Closes #1796
Closes #3438
Closes #6233

Yuri Pankov [Tue, 13 Jun 2017 03:16:28 +0000 (20:16 -0700)]
OpenZFS 5428 - provide fts(), reallocarray(), and strtonum()

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
* All hunks unrelated to ZFS were dropped.

OpenZFS-issue: https://www.illumos.org/issues/5428
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4585130
Closes #6326

Giuseppe Di Natale [Sat, 8 Jul 2017 00:07:40 +0000 (17:07 -0700)]
Exit test-runner with non-zero if tests are KILLED

fe46eeb introduced non-zero exit codes to test-runner.
A non-zero exit code should also be returned when test-runner
decides to kill a test and marks it as KILLED.
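
The exit-status rule can be summarized as: succeed only when no test
FAILed or was KILLED. test-runner itself is not written in C; this
hypothetical sketch only illustrates the rule, and whether SKIP counts
as success, or which non-zero codes the real runner uses, are
assumptions.

```c
#include <stddef.h>

/* Hypothetical per-test outcomes named after test-runner's results. */
typedef enum {
	RESULT_PASS,
	RESULT_SKIP,
	RESULT_FAIL,
	RESULT_KILLED
} result_t;

/* 0 only if no test FAILed or was KILLED; 1 otherwise (assumed). */
static int
runner_exit_code(const result_t *results, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (results[i] == RESULT_FAIL || results[i] == RESULT_KILLED)
			return (1);
	}
	return (0);
}
```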

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6325

LOLi [Fri, 7 Jul 2017 22:45:29 +0000 (00:45 +0200)]
Fix chattr_001_pos

Commands should be eval()ed if they involve a shell redirection;
otherwise the messages printed by the log_* functions end up in the
redirected output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6300
Closes #6323

Matthew Ahrens [Mon, 20 Mar 2017 22:38:11 +0000 (15:38 -0700)]
OpenZFS 8126 - ztest assertion failed in dbuf_dirty due to dn_nlevels changing

The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8126
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0ef125d
Closes #6314

Matthew Ahrens [Mon, 1 May 2017 18:06:07 +0000 (11:06 -0700)]
OpenZFS 8067 - zdb should be able to dump literal embedded block pointer

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8067
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085
Closes #6319

Antonio Russo [Fri, 7 Jul 2017 17:45:17 +0000 (13:45 -0400)]
Prevent dependencies on Debianized packages

Call dpkg-shlibdeps with arguments excluding the Debianized packages
lib{uutil1,nvpair1,zfs2,zpool2}linux from the auto-generated
dependencies of generated .debs. A shim dh_shlibdeps that calls the
real dh_shlibdeps with corresponding arguments is installed into a
temporary directory, which is in turn prepended to the PATH for the
alien call, working around alien's inability to directly alter the
dependencies of its output debs. Resolves #6106.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6309
Closes #6106

LOLi [Fri, 7 Jul 2017 17:39:53 +0000 (19:39 +0200)]
Fix 'zpool clear' on readonly pools

Illumos 4080 inadvertently allowed 'zpool clear' on readonly pools: fix
this by reintroducing a check (POOL_CHECK_READONLY) in the zfs_ioc_clear
registration code.
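
The registration-time check can be pictured as a per-ioctl flag word
consulted before the handler runs. CHECK_READONLY and the other names
below are hypothetical stand-ins that mirror POOL_CHECK_READONLY, and
returning EROFS for a read-only pool is the behavior this sketch
assumes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-ioctl check flags, mirroring POOL_CHECK_* in spirit. */
enum {
	CHECK_NONE	= 0,
	CHECK_READONLY	= 1 << 0	/* refuse to run on a read-only pool */
};

typedef struct ioc_sketch {
	const char *name;
	int checks;	/* checks requested when the ioctl is registered */
} ioc_sketch_t;

/* Consulted before dispatching the ioctl handler. */
static int
pool_status_check_sketch(const ioc_sketch_t *ioc, bool pool_readonly)
{
	if ((ioc->checks & CHECK_READONLY) && pool_readonly)
		return (EROFS);
	return (0);
}
```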

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6306

Alek P [Fri, 7 Jul 2017 05:16:13 +0000 (22:16 -0700)]
Implemented zpool scrub pause/resume

Currently, there is no way to pause a scrub. Pausing may be
useful to preserve bandwidth when the pool is busy with other
I/O.

This patch adds the ability to pause and resume scrubbing.
This is achieved by maintaining a persistent on-disk scrub state.
While the state is 'paused' we do not scrub any more blocks.
We do, however, continue regular scan housekeeping while paused,
such as freeing async destroyed and deadlist blocks.
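
A minimal sketch of a sync pass that honors the persistent pause flag:
housekeeping runs on every pass, while the scrub cursor only advances
when the scan is not paused. scan_sketch_t and scan_sync_sketch() are
hypothetical; keeping the cursor in the persistent state is what lets a
resumed scrub continue where it stopped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical persistent scan state; in ZFS this lives on disk. */
typedef struct scan_sketch {
	bool paused;
	uint64_t next_block;	/* scrub cursor */
	uint64_t total_blocks;
	uint64_t pending_frees;	/* async-destroy / deadlist work */
} scan_sketch_t;

/* One sync pass with a budget of blocks to scrub. */
static void
scan_sync_sketch(scan_sketch_t *scn, uint64_t budget)
{
	/* Housekeeping continues even while paused. */
	scn->pending_frees = 0;

	if (scn->paused)
		return;

	while (budget-- > 0 && scn->next_block < scn->total_blocks)
		scn->next_block++;	/* "scrub" one block */
}
```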

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6167

Arkadiusz Bubała [Thu, 6 Jul 2017 15:38:24 +0000 (17:38 +0200)]
Reschedule processes on -ERESTARTSYS

On a single-core machine the system may hang when the
spa_namespace_lock acquisition fails in the zvol_first_open()
function. It returns an -ERESTARTSYS error, which causes an
endless loop in the __blkdev_get() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #6283
Closes #6312

George Melikov [Wed, 5 Jul 2017 17:46:52 +0000 (20:46 +0300)]
ZTS: replace su commands by run_user function

This is needed for the PATH variable to be passed into su.  The
posix* tests were fixed, but they need further investigation
before they can be enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6303

alaviss [Wed, 5 Jul 2017 17:39:13 +0000 (00:39 +0700)]
Musl libc fixes

Musl libc's <stdio.h> doesn't include <stdarg.h>, which causes
`va_start` and `va_end` to end up as undefined symbols.
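
The portable fix is to include <stdarg.h> explicitly wherever the va_*
macros are used, rather than relying on <stdio.h> to pull it in.
format_msg() below is a hypothetical helper, not a function from the
ZFS tree.

```c
#include <stdarg.h>	/* va_list, va_start, va_end: never implied by stdio.h */
#include <stdio.h>

/* Format into buf; returns what vsnprintf() returns. */
static int
format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}
```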

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6310

alaviss [Wed, 5 Jul 2017 17:38:20 +0000 (00:38 +0700)]
Clang fixes

Clang doesn't support `/` as a comment character in assembly; this
patch replaces such comments with `#`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6311

Matthew Ahrens [Fri, 14 Apr 2017 19:59:18 +0000 (12:59 -0700)]
OpenZFS 8378 - crash due to bp in-memory modification of nopwrite block

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

The problem is that zfs_get_data() supplies a stale zgd_bp to
dmu_sync(), which we then nopwrite against.

zfs_get_data() doesn't hold any DMU-related locks, so after it
copies db_blkptr to zgd_bp, dbuf_write_ready() could change
db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale BP and that the dbuf is not dirty,
so it is eligible for nop-writing.

The fix is for dmu_sync() to copy db_blkptr to zgd_bp after
acquiring the db_mtx. We could still see a stale db_blkptr,
but if it is stale then the dirty record will still exist and
thus we won't attempt to nopwrite.
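
The essence of the fix is that the block-pointer copy and the dirty
check are taken under the same lock, so they are consistent with each
other. The sketch uses a pthread mutex as a stand-in for db_mtx; all
types and names are hypothetical simplifications.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a block pointer and a dbuf. */
typedef struct blkptr_sketch {
	uint64_t birth_txg;
	uint64_t checksum;
} blkptr_sketch_t;

typedef struct dbuf_sketch {
	pthread_mutex_t mtx;	/* stands in for db_mtx */
	blkptr_sketch_t blkptr;	/* stands in for db_blkptr */
	bool dirty;		/* a dirty record still exists */
} dbuf_sketch_t;

/* Returns true if a nopwrite against *bp_out would be allowed. */
static bool
sync_snapshot_sketch(dbuf_sketch_t *db, blkptr_sketch_t *bp_out)
{
	bool nopwrite_ok;

	(void) pthread_mutex_lock(&db->mtx);
	*bp_out = db->blkptr;		/* copy only after acquiring the lock */
	/* If the copy is stale, the dirty record still exists: no nopwrite. */
	nopwrite_ok = !db->dirty;
	(void) pthread_mutex_unlock(&db->mtx);
	return (nopwrite_ok);
}
```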

OpenZFS-issue: https://www.illumos.org/issues/8378
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3127742
Closes #6293

Andriy Gapon [Sat, 11 Mar 2017 18:26:47 +0000 (20:26 +0200)]
OpenZFS 7600 - zfs rollback should pass target snapshot to kernel

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

The existing kernel-side code only provides a method to roll back to
the latest snapshot, whatever it happens to be at the time when the
rollback is actually done.  That could be unsafe or confusing in
environments where concurrent DSL changes are possible, as the
resulting state could correspond to a newer or older snapshot than the
originally requested one.

This change amends that method so that the rollback is performed only
when the latest snapshot has a specific name.  That is, if a new
snapshot is concurrently created or the target snapshot is destroyed,
then no rollback is done and an EXDEV error is returned.

A new libzfs_core function, lzc_rollback_to(), is provided for the new
functionality.  libzfs is changed to use lzc_rollback_to() to implement
the zfs rollback command.

Perhaps we should return different errors to distinguish the case where
the desired snapshot exists but is not the latest snapshot from the
case where the desired snapshot does not exist.
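
lzc_rollback_to() is the new libzfs_core entry point; the condition it
relies on in the kernel can be sketched as a name comparison.
rollback_to_check_sketch() is hypothetical and, as the previous
paragraph notes, returns EXDEV in both failure cases.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical check: roll back only if the newest snapshot is the one
 * the caller named. `latest` is NULL when no snapshot exists.
 */
static int
rollback_to_check_sketch(const char *latest, const char *target)
{
	if (latest == NULL || strcmp(latest, target) != 0)
		return (EXDEV);	/* raced with a create or destroy: do nothing */
	return (0);
}
```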

OpenZFS-issue: https://www.illumos.org/issues/7600
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3d645eb
Closes #6292

Andriy Gapon [Sat, 11 Mar 2017 17:48:35 +0000 (19:48 +0200)]
OpenZFS 7910 - l2arc_write_buffers() may write beyond target_sz

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7910
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb6af4b
Closes #6291

Marcel Telka [Thu, 22 Jun 2017 13:30:49 +0000 (15:30 +0200)]
OpenZFS 8418 - zfs_prop_get_table() call in zfs_validate_name() is a no-op

Authored by: Marcel Telka <marcel@telka.sk>
Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8418
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e09ba01
Closes #6305

George Melikov [Mon, 3 Jul 2017 21:21:12 +0000 (00:21 +0300)]
ZTS: minor typo and old default values

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6298

Alek P [Fri, 30 Jun 2017 18:14:26 +0000 (14:14 -0400)]
On failure tests-runner should do non-zero exit

Right now test runner will always exit(0).
It's helpful to have zfs-tests.sh return different
exit values depending on whether or not everything passed.
We can then use common shell commands to run tests until failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6285

Giuseppe Di Natale [Fri, 30 Jun 2017 18:12:29 +0000 (11:12 -0700)]
Print fail messages before callbacks in test suite

Reorder operations in _endlog so failure messages get
printed prior to performing callbacks and cleanup. This
helps clarify why a test failed and places the message
closer to the point of incident in the resulting logs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6281

Sowrabha Gopal [Thu, 1 Jun 2017 20:27:02 +0000 (13:27 -0700)]
OpenZFS 8430 - dir_is_empty_readdir() doesn't properly handle error from fdopendir()

Authored by: Sowrabha Gopal <sowrabha.gopal@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

dir_is_empty_readdir() immediately returns if fdopendir() fails.
We should close dirfd when that happens.
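
In POSIX terms, a successful fdopendir() transfers ownership of the
descriptor to the DIR stream (closedir() closes it), but on failure the
caller still owns it. A sketch of the corrected pattern follows; the
function name is hypothetical, and treating errors as "not empty" is a
choice of this sketch that may differ from the real function.

```c
#define	_POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: true if dirname has no entries besides "." and "..". */
static bool
dir_is_empty_sketch(const char *dirname)
{
	struct dirent *de;
	DIR *dirp;
	int fd;		/* plays the role of dirfd in the commit message */

	if ((fd = open(dirname, O_RDONLY | O_DIRECTORY | O_CLOEXEC)) < 0)
		return (false);

	if ((dirp = fdopendir(fd)) == NULL) {
		/* fdopendir() failed, so fd is still ours to close. */
		(void) close(fd);
		return (false);
	}

	while ((de = readdir(dirp)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		(void) closedir(dirp);	/* also closes fd */
		return (false);
	}
	(void) closedir(dirp);
	return (true);
}
```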

OpenZFS-issue: https://www.illumos.org/issues/8430
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e165e20
Closes #6289

Andriy Gapon [Wed, 21 Jun 2017 20:47:54 +0000 (23:47 +0300)]
OpenZFS 8416 - abd.h is not C++ friendly

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8416
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/589c189
Closes #6288

Andriy Gapon [Mon, 26 Jun 2017 10:46:45 +0000 (13:46 +0300)]
OpenZFS 8426 - mark immutable buffer arguments as such in abd.h

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8426
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/37359a6
Closes #6287

Matthew Ahrens [Fri, 14 Apr 2017 19:52:43 +0000 (12:52 -0700)]
OpenZFS 8377 - Panic in bookmark deletion

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

The problem is that when dsl_bookmark_destroy_check() is
executed from open context (the pre-check), it fills in
dbda_success based on the existence of the bookmark. But
the bookmark (or containing filesystem as in this case)
can be destroyed before we get to syncing context. When
we re-run dsl_bookmark_destroy_check() in syncing context,
it will not add the deleted bookmark to dbda_success,
intending for dsl_bookmark_destroy_sync() to not process
it. But because the bookmark is still in dbda_success from
the open-context call, we do try to destroy it.

The fix is that dsl_bookmark_destroy_check() should not
modify dbda_success when called from open context.
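
The fix can be modeled with a check function that is run twice but only
records results in syncing context. All names below are hypothetical;
success[] stands in for dbda_success, and the exists_now[] arrays model
which bookmarks exist at each point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument block for a bookmark-destroy request. */
typedef struct bm_destroy_arg {
	const char *names[4];
	bool success[4];	/* stands in for dbda_success */
	size_t n;
} bm_destroy_arg_t;

/* Returns how many bookmarks exist; records them only when syncing. */
static size_t
bm_destroy_check_sketch(bm_destroy_arg_t *arg, const bool *exists_now,
    bool syncing)
{
	size_t found = 0;

	for (size_t i = 0; i < arg->n; i++) {
		if (!exists_now[i])
			continue;
		found++;
		if (syncing)
			arg->success[i] = true;	/* open context must not */
	}
	return (found);
}

/* The sync step destroys exactly what the syncing check recorded. */
static size_t
bm_destroy_sync_sketch(const bm_destroy_arg_t *arg)
{
	size_t destroyed = 0;

	for (size_t i = 0; i < arg->n; i++) {
		if (arg->success[i])
			destroyed++;	/* would destroy arg->names[i] here */
	}
	return (destroyed);
}
```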

OpenZFS-issue: https://www.illumos.org/issues/8377
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0b6fe3
Closes #6286

Matthew Ahrens [Thu, 29 Jun 2017 17:18:03 +0000 (10:18 -0700)]
Clean up large dnode code

Resolves issues discovered when porting to OpenZFS.

* Lint warnings.
* Made dnode_move_impl() large dnode aware.  This
  functionality is currently unused on Linux.

Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6262

chrisrd [Thu, 29 Jun 2017 16:57:27 +0000 (02:57 +1000)]
Set arc_meta_limit, arc_dnode_limit on change

Make zfs_arc_meta_limit_percent and zfs_arc_dnode_limit_percent behave
as described in zfs-module-parameters(5).

- recalculate arc_meta_limit if zfs_arc_meta_limit_percent changes
- recalculate arc_dnode_limit if zfs_arc_dnode_limit_percent changes
- correctly set arc_meta_limit and arc_dnode_limit if zfs_arc_max or
  zfs_arc_meta_min changes
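
A sketch of the recalculation, under the assumptions that
zfs_arc_meta_limit_percent is a share of the ARC maximum, that
zfs_arc_dnode_limit_percent is a share of arc_meta_limit, and that
arc_meta_min acts as a floor. The struct and function are hypothetical;
the point is that both limits are recomputed together whenever any
input changes.

```c
#include <stdint.h>

/* Hypothetical snapshot of the ARC sizing inputs and derived limits. */
typedef struct arc_limits_sketch {
	uint64_t arc_c_max;
	uint64_t arc_meta_min;
	uint64_t arc_meta_limit;
	uint64_t arc_dnode_limit;
} arc_limits_sketch_t;

/* Recompute both limits; divide first to avoid 64-bit overflow. */
static void
arc_recalc_limits_sketch(arc_limits_sketch_t *a, uint64_t meta_pct,
    uint64_t dnode_pct)
{
	uint64_t meta = a->arc_c_max / 100 * meta_pct;

	/* Never drop below the configured metadata minimum (assumed). */
	a->arc_meta_limit = meta > a->arc_meta_min ? meta : a->arc_meta_min;
	a->arc_dnode_limit = a->arc_meta_limit / 100 * dnode_pct;
}
```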

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6269

Brian Behlendorf [Thu, 29 Jun 2017 16:55:30 +0000 (09:55 -0700)]
Convert man zfs.8 to mdoc (OpenZFS sync)

* Fixed some typos
* Additional description for some command arguments
* Text reworked to be in sync with OpenZFS
* Added Linux as .Os type

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6282

Tony Hutter [Wed, 28 Jun 2017 17:05:16 +0000 (10:05 -0700)]
GCC 7.1 fixes

GCC 7.1 will warn when we're not checking the snprintf()
return code in cases where the buffer could be truncated. This
patch either checks the snprintf() return code (where applicable),
or simply disables the warnings (ztest.c).
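
The check being referred to compares snprintf()'s return value (the
length the full output would have had) against the buffer size; GCC 7's
-Wformat-truncation warning targets calls where that can be exceeded.
join_name() is a hypothetical example, not code from this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Build "<pool>/<dataset>" into buf. Returns false on error or if the
 * result was truncated, i.e. snprintf() returned a length >= len.
 */
static bool
join_name(char *buf, size_t len, const char *pool, const char *ds)
{
	int n = snprintf(buf, len, "%s/%s", pool, ds);

	return (n >= 0 && (size_t)n < len);
}
```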

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6253

George Melikov [Sun, 18 Jun 2017 18:27:06 +0000 (21:27 +0300)]
Convert man zpool.8 to mdoc (OpenZFS sync)

* Fixed some typos
* Additional description for some commands arguments
* `listsnapshots` remained
* Text reworked to be in sync with OpenZFS
* Added Linux as .Os type
* Updated `zpool events` section.
* Updated `zpool iostat|status -c` sections
* Added zed(8) reference to SEE ALSO

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6245

Tony Hutter [Tue, 27 Jun 2017 19:00:27 +0000 (12:00 -0700)]
Fix RHEL 7.4 bio_set_op_attrs build error

On RHEL 7.4, include/linux/bio.h now includes a macro for
bio_set_op_attrs that conflicts with the ifndef in ZFS
include/linux/blkdev_compat.h.  This patch fixes the build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6234
Closes #6271

Brian Behlendorf [Tue, 27 Jun 2017 17:09:16 +0000 (10:09 -0700)]
Cap maximum aggregate IO size

Commit 8542ef8 allowed optional IOs to be aggregated beyond
the specified aggregation limit.  Since the aggregation limit
was also used to enforce the maximum block size, setting
`zfs_vdev_aggregation_limit=16777216` could result in an
attempt to allocate an ABD larger than 16M.
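
A hypothetical sketch of the capped calculation: optional I/Os may
still extend an aggregate past zfs_vdev_aggregation_limit, but the
result is clamped to the 16M maximum block size.
aggregate_size_sketch() and SKETCH_MAXBLOCKSIZE are illustrative names,
not ZFS code.

```c
#include <stdint.h>

#define	SKETCH_MAXBLOCKSIZE	(16ULL << 20)	/* 16M, as in the text */

static uint64_t
aggregate_size_sketch(uint64_t required, uint64_t optional,
    uint64_t agg_limit)
{
	/* Required I/Os are aggregated only up to the tunable limit. */
	uint64_t size = required < agg_limit ? required : agg_limit;

	/* Optional I/Os may extend the aggregate past that limit... */
	size += optional;

	/* ...but never past the maximum block size. */
	return (size < SKETCH_MAXBLOCKSIZE ? size : SKETCH_MAXBLOCKSIZE);
}
```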

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6259
Closes #6270

bunder2015 [Tue, 27 Jun 2017 17:06:07 +0000 (13:06 -0400)]
Fix zpool_add_005_pos

Under Linux, the existence of a block device in /etc/fstab is
not sufficient to prevent the use of the force flag.  Without
the force flag a warning will be printed that the device has
a filesystem of a given type.  Providing the force option
will overwrite that filesystem as long as it is not actively
mounted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Tested-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6267
Closes #6272

Boris Protopopov [Tue, 13 Jun 2017 16:03:44 +0000 (12:03 -0400)]
Refine use of zv_state_lock.

Use zv_state_lock to protect all members of the zvol_state structure,
and add relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock,
and do not hold zv_state_lock across suspend/resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #6226

Giuseppe Di Natale [Tue, 27 Jun 2017 00:32:43 +0000 (17:32 -0700)]
OpenZFS 5220 - L2ARC does not support devices that do not provide 512B access

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/5220
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/403a8da
Closes #6260

Giuseppe Di Natale [Mon, 26 Jun 2017 23:56:09 +0000 (16:56 -0700)]
OpenZFS 8264 - want support for promoting datasets in libzfs_core

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8264
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a4b8c9a
Closes #6254

bunder2015 [Mon, 26 Jun 2017 21:48:54 +0000 (17:48 -0400)]
Fix arithmetic error message in zfs_clone_010_pos

zfs_clone_010_pos.ksh: line 234: ZFS_MAXPROPLEN: arithmetic syntax error

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6268

Boris Protopopov [Mon, 26 Jun 2017 21:36:49 +0000 (17:36 -0400)]
Call cv_signal() with mutex held

In bqueue_dequeue(), call cv_signal() with bq_lock held.
Re-enable rsend_009_pos to test the fix.
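
The same pattern in user space, with a pthread mutex and condition
variables standing in for bq_lock and the kernel cv: the state change
and the signal both happen before the unlock. bq_sketch_t is a
hypothetical bounded queue; one common reason for signalling under the
lock is that a woken thread then cannot tear the queue down while the
signalling call is still using it.

```c
#include <pthread.h>
#include <stddef.h>

#define	BQ_CAP	4

/* Hypothetical bounded queue of ints in the spirit of bqueue. */
typedef struct bq_sketch {
	pthread_mutex_t lock;		/* stands in for bq_lock */
	pthread_cond_t not_empty;
	pthread_cond_t not_full;
	int items[BQ_CAP];
	size_t head;
	size_t count;
} bq_sketch_t;

static void
bq_enqueue(bq_sketch_t *q, int v)
{
	(void) pthread_mutex_lock(&q->lock);
	while (q->count == BQ_CAP)
		(void) pthread_cond_wait(&q->not_full, &q->lock);
	q->items[(q->head + q->count) % BQ_CAP] = v;
	q->count++;
	(void) pthread_cond_signal(&q->not_empty);	/* lock still held */
	(void) pthread_mutex_unlock(&q->lock);
}

static int
bq_dequeue(bq_sketch_t *q)
{
	int v;

	(void) pthread_mutex_lock(&q->lock);
	while (q->count == 0)
		(void) pthread_cond_wait(&q->not_empty, &q->lock);
	v = q->items[q->head];
	q->head = (q->head + 1) % BQ_CAP;
	q->count--;
	/* Signal before unlocking, as the fix does in bqueue_dequeue(). */
	(void) pthread_cond_signal(&q->not_full);
	(void) pthread_mutex_unlock(&q->lock);
	return (v);
}
```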

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #5887

Andrew Stormont [Mon, 12 Jun 2017 16:56:09 +0000 (17:56 +0100)]
OpenZFS 8331 - zfs_unshare returns wrong error code for smb unshare failure

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8331
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4f4378c
Closes #6255

Tony Hutter [Thu, 22 Jun 2017 16:39:01 +0000 (09:39 -0700)]
Dashes for zero latency values in zpool iostat -p

This prints dashes instead of zeros for zero latency values in
'zpool iostat -p'.  You'll get zero latencies reported when the
disk is idle, but technically a zero latency is invalid, since you
can't measure the latency of doing nothing.
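
The formatting rule is small enough to show directly. format_latency()
is a hypothetical helper; zpool's own printing code is organized
differently.

```c
#include <stdint.h>
#include <stdio.h>

/* Parsable (-p style) latency: "-" when there is nothing to report. */
static void
format_latency(char *buf, size_t len, uint64_t ns)
{
	if (ns == 0)
		(void) snprintf(buf, len, "-");
	else
		(void) snprintf(buf, len, "%llu", (unsigned long long)ns);
}
```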

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6210

Morgan Jones [Mon, 19 Jun 2017 16:43:16 +0000 (16:43 +0000)]
Add kpreempt_disable/enable around CPU_SEQID uses

In zfs/dmu_object and icp/core/kcf_sched, the CPU_SEQID macro
should be surrounded by `kpreempt_disable` and `kpreempt_enable`
calls to avoid a Linux kernel BUG warning.  These code paths use
the cpuid to minimize lock contention and it is safe to reschedule
the process to a different processor at any time.
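
Assuming CPU_SEQID maps to reading the current CPU id (smp_processor_id()), Linux warns when that is done from preemptible context with preemption debugging enabled; disabling preemption keeps the id stable while it is read. The userspace sketch below (hypothetical names, POSIX mutexes) shows why a stale id is still harmless afterwards: it only selects which lock to take, and that lock keeps the update correct wherever the thread ends up running.

```c
#include <pthread.h>

#define NSHARDS 8

/* Hypothetical per-shard counters, each protected by its own lock. */
static pthread_mutex_t shard_lock[NSHARDS];
static unsigned long shard_count[NSHARDS];

void shards_init(void)
{
	for (int i = 0; i < NSHARDS; i++)
		pthread_mutex_init(&shard_lock[i], NULL);
}

/*
 * The CPU id is only a hint for spreading contention. If the thread
 * migrates after the hint was read, it simply uses another CPU's shard;
 * the shard lock still makes the update correct.
 */
int shard_add(int cpu_hint)
{
	int idx = (cpu_hint < 0 ? 0 : cpu_hint) % NSHARDS;

	pthread_mutex_lock(&shard_lock[idx]);
	shard_count[idx]++;
	pthread_mutex_unlock(&shard_lock[idx]);
	return idx;
}

unsigned long shard_total(void)
{
	unsigned long sum = 0;

	for (int i = 0; i < NSHARDS; i++) {
		pthread_mutex_lock(&shard_lock[i]);
		sum += shard_count[i];
		pthread_mutex_unlock(&shard_lock[i]);
	}
	return sum;
}
```

On glibc, a caller could pass sched_getcpu() (declared when _GNU_SOURCE is defined) as the hint.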

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Morgan Jones <me@numin.it>
Closes #6239

7 years agoInject zinject(8) a percentage amount of dev errs
Don Brady [Sat, 17 Jun 2017 00:21:11 +0000 (18:21 -0600)]
Inject zinject(8) a percentage amount of dev errs

Device error injection was originally all or nothing.  To help simulate
intermittent error conditions, you can now
specify a real number percentage value. This is also very useful for our
ZFS fault diagnosis testing and for injecting intermittent errors during
load testing.
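
A hedged sketch of how a percentage frequency can be turned into a per-I/O decision; should_inject() and its uniform-draw parameter are hypothetical, and the real injection code may draw its random numbers differently.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decide whether a single I/O should fail, given an
 * injection frequency in percent (0.0 .. 100.0) and a uniform draw in
 * [0, 1). A frequency of 100% reproduces the old all-or-nothing behavior.
 */
bool should_inject(double freq_pct, double draw)
{
	if (freq_pct >= 100.0)
		return true;
	if (freq_pct <= 0.0)
		return false;
	return draw * 100.0 < freq_pct;
}

/* Convenience wrapper using the C library PRNG (not cryptographic). */
bool should_inject_random(double freq_pct)
{
	return should_inject(freq_pct, rand() / ((double)RAND_MAX + 1.0));
}
```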

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #6227

7 years agoProvide links to info about ZFS Buildbot options
Giuseppe Di Natale [Fri, 16 Jun 2017 00:52:18 +0000 (17:52 -0700)]
Provide links to info about ZFS Buildbot options

Add links for information about the ZFS buildbot options
to the contributing guidelines and PR template.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6235

7 years agoAvoid 'queue not locked' warning at pool import.
Boris Protopopov [Wed, 14 Jun 2017 20:18:36 +0000 (16:18 -0400)]
Avoid 'queue not locked' warning at pool import.

Use queue_flag_set_unlocked() in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Issue #6226

7 years agoFix zvol_state_t->zv_open_count race
LOLi [Thu, 15 Jun 2017 18:08:45 +0000 (20:08 +0200)]
Fix zvol_state_t->zv_open_count race

5559ba0 added zv_state_lock to protect zvol_state_t internal data:
this, however, doesn't guard zv->zv_open_count and
zv->zv_disk->private_data in zvol_remove_minors_impl().

Fix this by taking zv->zv_state_lock before we check its zv_open_count.

P1 (z_zvol)                       P2 (systemd-udevd)
---                               ---
zvol_remove_minors_impl()
: zv->zv_open_count==0
                                  zvol_open()
                                  ->mutex_enter(zv_state_lock)
                                  : zv->zv_open_count++
                                  ->mutex_exit(zv_state_lock)
->mutex_enter(zv->zv_state_lock)
->zvol_remove(zv)
->mutex_exit(zv->zv_state_lock)
: zv->zv_disk->private_data = NULL
->zvol_free()
-->ASSERT(zv->zv_open_count==0) *
                                  zvol_release()
                                  : zv = disk->private_data
                                  ->ASSERT(zv && zv->zv_open_count>0) *
---                               ---
* ASSERT() fails
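
A userspace sketch of the fix, using POSIX mutexes and a hypothetical vol_t in place of zvol_state_t: the open count is checked, and the volume marked removed, under the same lock that openers take, so the interleaving shown above can no longer occur.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for zvol_state_t. */
typedef struct {
	pthread_mutex_t state_lock;
	int open_count;
	bool removed;
} vol_t;

void vol_init(vol_t *v)
{
	pthread_mutex_init(&v->state_lock, NULL);
	v->open_count = 0;
	v->removed = false;
}

bool vol_open(vol_t *v)
{
	bool ok;

	pthread_mutex_lock(&v->state_lock);
	ok = !v->removed;
	if (ok)
		v->open_count++;
	pthread_mutex_unlock(&v->state_lock);
	return ok;
}

void vol_release(vol_t *v)
{
	pthread_mutex_lock(&v->state_lock);
	v->open_count--;
	pthread_mutex_unlock(&v->state_lock);
}

/*
 * The fix: read open_count only while holding state_lock, so no opener
 * can slip in between the check and the removal.
 */
bool vol_try_remove(vol_t *v)
{
	bool removed = false;

	pthread_mutex_lock(&v->state_lock);
	if (v->open_count == 0) {
		v->removed = true;
		removed = true;
	}
	pthread_mutex_unlock(&v->state_lock);
	return removed;
}
```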

Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6213

7 years agoFix manual description of zfs_arc_dnode_limit
chrisrd [Wed, 14 Jun 2017 20:23:02 +0000 (06:23 +1000)]
Fix manual description of zfs_arc_dnode_limit

In arc_evict_state() we start pruning when arc_dnode_size >
arc_dnode_limit, i.e. arc_dnode_limit is a ceiling rather than a
floor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6228

7 years agoFix zvol_init error handling
Richard Yao [Sat, 20 May 2017 18:01:55 +0000 (14:01 -0400)]
Fix zvol_init error handling

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@prophetstor.com>

7 years agoMake zvol operations use _by_dnode routines
Richard Yao [Tue, 13 Jun 2017 16:18:08 +0000 (12:18 -0400)]
Make zvol operations use _by_dnode routines

This continues what was started in
0eef1bde31d67091d3deed23fe2394f5a8bf2276 by fully converting zvols
to avoid unnecessary dnode_hold() calls. This saves a small amount
of CPU time and slightly improves latencies of operations on zvols.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@prophetstor.com>
Closes #6058

7 years agoFix zpool_import_all_001_pos
Giuseppe Di Natale [Tue, 13 Jun 2017 16:05:55 +0000 (09:05 -0700)]
Fix zpool_import_all_001_pos

Clean up zpool_import_all_001_pos so it no longer uses devices.
The test is meant to exercise 'zpool import -a', and since it no
longer requires devices, a number of dependencies are no longer
necessary.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6198

7 years agoReduce stack usage of dsl_dir_tempreserve_impl
DeHackEd [Mon, 12 Jun 2017 18:41:03 +0000 (14:41 -0400)]
Reduce stack usage of dsl_dir_tempreserve_impl

Buildbots and zfs-tests regularly see 7 kilobytes of stack
usage with this function.  Convert the self-calls into iterations.
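
The transformation can be sketched with a hypothetical dir_t chain that checks a quota at each ancestor. The real dsl_dir_tempreserve_impl() carries far more state per call, which is what made each recursive frame expensive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory node: each level may impose its own quota. */
typedef struct dir {
	struct dir *parent;
	uint64_t used;
	uint64_t quota;		/* 0 means no quota */
} dir_t;

/* Recursive form: one stack frame per ancestor. */
int reserve_recursive(const dir_t *d, uint64_t asize)
{
	if (d == NULL)
		return 0;
	if (d->quota != 0 && d->used + asize > d->quota)
		return -1;
	return reserve_recursive(d->parent, asize);
}

/* Iterative form: constant stack usage regardless of depth. */
int reserve_iterative(const dir_t *d, uint64_t asize)
{
	for (; d != NULL; d = d->parent) {
		if (d->quota != 0 && d->used + asize > d->quota)
			return -1;
	}
	return 0;
}
```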

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #6219

7 years agoUse log_must_busy in destroy_pool
Brian Behlendorf [Mon, 12 Jun 2017 16:45:32 +0000 (09:45 -0700)]
Use log_must_busy in destroy_pool

The log function log_must_busy was added in commit e623aea2 for
this purpose.  Update destroy_pool to use it.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6217

7 years agoAdd missing \n for "invalid optionusage" output
kpande [Fri, 9 Jun 2017 16:51:13 +0000 (12:51 -0400)]
Add missing \n for "invalid optionusage" output

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: DHE <git@dehacked.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Jack Draak <jackdraak@gmail.com>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #6203

7 years agoOpenZFS 8056 - zfs send size estimate is inaccurate for some zvols
Paul Dagnelie [Thu, 7 Jul 2016 22:00:51 +0000 (15:00 -0700)]
OpenZFS 8056 - zfs send size estimate is inaccurate for some zvols

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

The send size estimate for a zvol can be too low if the size of the
record headers (dmu_replay_record_t's) is a significant portion of the
size. This is typically the case when the data is highly compressible,
especially with embedded blocks.

The problem is that dmu_adjust_send_estimate_for_indirects() assumes
that blocks are the size of the "recordsize" property (128KB). However,
for zvols, the blocks are the size of the "volblocksize" property (8KB).
Therefore, we estimate that there will be 16x fewer record headers than
there really will be.

The fix is to check the type of the object set (whether it is a zvol or
not) and pick the appropriate property. In addition, while we are at it,
we also add the size of the BEGIN and END records to the estimate.
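
The arithmetic behind the 16x figure, as a hypothetical sketch: one header per block, so dividing the same data by 8KB instead of 128KB yields 16 times as many headers. The BEGIN and END records are counted as two extra headers, and the header size is passed in rather than assumed.

```c
#include <stdint.h>

/* Block sizes quoted in the text above. */
#define RECORDSIZE	(128u * 1024u)	/* filesystem "recordsize" */
#define VOLBLOCKSIZE	(8u * 1024u)	/* zvol "volblocksize" */

/*
 * Hypothetical estimate of record-header overhead: one header per block
 * plus the BEGIN and END records. The block size depends on whether the
 * object set is a zvol.
 */
uint64_t header_overhead(uint64_t data_bytes, int is_zvol, uint64_t hdr_size)
{
	uint64_t blksz = is_zvol ? VOLBLOCKSIZE : RECORDSIZE;
	uint64_t nblocks = (data_bytes + blksz - 1) / blksz;

	return (nblocks + 2) * hdr_size;	/* +2: BEGIN and END */
}
```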

OpenZFS-issue: https://www.illumos.org/issues/8056
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/faf09cd
Closes #6205

7 years agoOpenZFS 8156 - dbuf_evict_notify() does not need dbuf_evict_lock
Matthew Ahrens [Tue, 28 Mar 2017 22:31:49 +0000 (15:31 -0700)]
OpenZFS 8156 - dbuf_evict_notify() does not need dbuf_evict_lock

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should
do the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention.  It isn't necessary to hold
the lock, because if we make the wrong choice occasionally, nothing bad
will happen. This commit results in a ~60% performance improvement for
ARC-cached sequential reads.
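
A sketch of the lock-free decision, assuming C11 atomics and a hypothetical size counter and high-water margin (the 10% figure is illustrative only). The check reads a possibly stale value without taking any lock, which is acceptable because a wrong answer only shifts when eviction happens.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache-size accounting shared by many threads. */
static _Atomic uint64_t cache_size;
static const uint64_t cache_max = 1000;

/*
 * Decide, without taking any lock, whether the caller should evict on
 * its own because the background evictor is falling behind. A stale
 * read only makes us evict slightly too early or too late.
 */
bool should_self_evict(void)
{
	uint64_t sz = atomic_load_explicit(&cache_size, memory_order_relaxed);

	return sz > cache_max + cache_max / 10;
}
```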

OpenZFS-issue: https://www.illumos.org/issues/8156
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f73e5d9
Closes #6204

7 years agoOpenZFS 8199 - multi-threaded dmu_object_alloc()
Matthew Ahrens [Fri, 13 May 2016 04:16:36 +0000 (21:16 -0700)]
OpenZFS 8199 - multi-threaded dmu_object_alloc()

dmu_object_alloc() is single-threaded, so when multiple threads are
creating files in a single filesystem, they spend a lot of time waiting
for the os_obj_lock.  To improve performance of multi-threaded file
creation, we must make dmu_object_alloc() typically not grab any
filesystem-wide locks.

The solution is to have a "next object to allocate" for each CPU. Each
of these "next object"s is in a different block of the dnode object, so
that concurrent allocation holds dnodes in different dbufs.  When a
thread's "next object" reaches the end of a chunk of objects (by default
4 blocks worth -- 128 dnodes), it will be reset to the per-objset
os_obj_next, which will be increased by a chunk of objects (128).  Only
when manipulating the os_obj_next will we need to grab the os_obj_lock.
This decreases lock contention dramatically, because each thread only
needs to grab the os_obj_lock briefly, once per 128 allocations.
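
A simplified sketch of the chunked per-CPU allocator described above; objset_t, obj_alloc() and the NCPU constant are hypothetical. Unlike the real code, it assumes each per-CPU cursor is used by one thread at a time, and it does not check whether an object number is already in use.

```c
#include <pthread.h>
#include <stdint.h>

#define NCPU	4
#define CHUNK	128	/* objects handed out per refill, as in the text */

/* Hypothetical objset: one shared cursor, one private cursor per CPU. */
typedef struct {
	pthread_mutex_t obj_lock;	/* protects obj_next only */
	uint64_t obj_next;		/* start of the next unclaimed chunk */
	uint64_t cpu_next[NCPU];	/* per-CPU next object */
	uint64_t cpu_end[NCPU];		/* end of per-CPU chunk (exclusive) */
	unsigned lock_grabs;		/* for illustration only */
} objset_t;

void objset_init(objset_t *os)
{
	pthread_mutex_init(&os->obj_lock, NULL);
	os->obj_next = 0;
	os->lock_grabs = 0;
	for (int i = 0; i < NCPU; i++)
		os->cpu_next[i] = os->cpu_end[i] = 0;
}

/*
 * Allocate an object number. Only a chunk refill touches the shared
 * lock, so a thread takes it about once per CHUNK allocations.
 */
uint64_t obj_alloc(objset_t *os, int cpu)
{
	if (os->cpu_next[cpu] == os->cpu_end[cpu]) {
		pthread_mutex_lock(&os->obj_lock);
		os->cpu_next[cpu] = os->obj_next;
		os->obj_next += CHUNK;
		os->lock_grabs++;
		pthread_mutex_unlock(&os->obj_lock);
		os->cpu_end[cpu] = os->cpu_next[cpu] + CHUNK;
	}
	return os->cpu_next[cpu]++;
}
```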

This results in a 70% performance improvement to multi-threaded object
creation (where each thread is creating objects in its own directory),
from 67,000/sec to 115,000/sec, with 8 CPUs.

Work sponsored by Intel Corp.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/8199
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/374
Closes #4703
Closes #6117

7 years agoOpenZFS 7578 - Fix/improve some aspects of ZIL writing
Giuseppe Di Natale [Fri, 9 Jun 2017 16:15:37 +0000 (09:15 -0700)]
OpenZFS 7578 - Fix/improve some aspects of ZIL writing

 - After some ZIL changes 6 years ago, zil_slog_limit got partially broken
because zl_itx_list_sz was not updated when async itxs were upgraded to sync.
Actually, because of other changes made around that time, zl_itx_list_sz is
not really required to implement the functionality, so this patch removes
some unneeded broken code and variables.

 - The original idea of zil_slog_limit was to reduce the chance of SLOG abuse
by a single heavy logger, which increased latency for other (more latency
critical) loggers, by pushing the heavy log out into the main pool instead of
the SLOG.  Besides a huge latency increase for heavy writers, this
implementation caused a double write of all data, since the log records were
explicitly prepared for the SLOG.  Since we now have an I/O scheduler, I've
found it can be much more efficient to reduce the priority of heavy logger
SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while
still leaving them on the SLOG.

 - The existing ZIL implementation had a problem with space efficiency when
it had to write large chunks of data into log blocks of limited size.  In
some cases efficiency dropped to as low as 50%.  With the ZIL stored on
spinning rust, that also cut log write speed in half, since the head had to
uselessly fly over allocated but not written areas.  This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows the real state of log block allocation and
can split large requests into pieces much more efficiently.  As a side
effect, it also removes one of the two data copy operations done by the ZIL
code in the WR_COPIED case.

 - While there, untangle and unify the code of the z*_log_write() functions.
Also, like zvol_log_write(), zfs_log_write() can now handle writes crossing a
block boundary, which may also improve efficiency if the ZPL is made to do that.
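
The space-efficiency point in the third item can be illustrated with a hypothetical splitter that fills whatever room is left in the current log block before opening a new one. The block size, the per-record header size and split_write() are assumptions, not zil_lwb_commit() itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log-block geometry. */
#define LWB_SIZE	4096u
#define REC_HDR		64u	/* assumed per-record header size */

/*
 * Split a write of `len` bytes into records that use the space left in
 * the current log block before opening a new one. `used` (<= LWB_SIZE)
 * is the number of bytes already consumed in the current block. Returns
 * the number of records; the payload written is stored in *payload_out.
 */
unsigned split_write(uint64_t len, uint32_t used, uint64_t *payload_out)
{
	unsigned nrec = 0;
	uint64_t payload = 0;

	while (len > 0) {
		uint32_t left = LWB_SIZE - used;

		if (left <= REC_HDR) {		/* no room for data: new block */
			used = 0;
			continue;
		}
		uint64_t chunk = left - REC_HDR;
		if (chunk > len)
			chunk = len;
		used += REC_HDR + (uint32_t)chunk;
		len -= chunk;
		payload += chunk;
		nrec++;
	}
	*payload_out = payload;
	return nrec;
}
```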

Sponsored by:   iXsystems, Inc.

Authored by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7578
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aeb13ac
Closes #6191

7 years agoOpenZFS 8155 - simplify dmu_write_policy handling of pre-compressed buffers
Matthew Ahrens [Thu, 23 Mar 2017 16:07:27 +0000 (09:07 -0700)]
OpenZFS 8155 - simplify dmu_write_policy handling of pre-compressed buffers

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

When writing pre-compressed buffers, arc_write() requires that
the compression algorithm used to compress the buffer matches
the compression algorithm requested by the zio_prop_t, which is
set by dmu_write_policy(). This makes dmu_write_policy() and its
callers a bit more complicated.

We simplify this by making arc_write() trust the caller to supply
the type of pre-compressed buffer that it wants to write,
and override the compression setting in the zio_prop_t.

OpenZFS-issue: https://www.illumos.org/issues/8155
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b55ff58
Closes #6200

7 years agoAdd MS_MANDLOCK mount failure message
Brian Behlendorf [Wed, 7 Jun 2017 17:59:44 +0000 (10:59 -0700)]
Add MS_MANDLOCK mount failure message

Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we cannot reliably detect prior to the mount(2) system
call if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and then print a more useful error message. e.g.

  filesystem 'tank/fs' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.
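
The error-reporting logic can be sketched like this; mount_error_msg(), has_mount_opt() and the fallback message format are hypothetical, and only the EPERM-plus-'mand' condition comes from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `opt` appears as a whole entry in a comma-separated list. */
static bool has_mount_opt(const char *opts, const char *opt)
{
	size_t n = strlen(opt);
	const char *p = opts;

	while (p != NULL && *p != '\0') {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == n && strncmp(p, opt, n) == 0)
			return true;
		p = comma ? comma + 1 : NULL;
	}
	return false;
}

/*
 * Hypothetical helper: build the message printed after mount(2) fails.
 * EPERM together with "mand" is the best available hint that the kernel
 * lacks mandatory locking; anything else falls back to strerror().
 */
void mount_error_msg(const char *fs, const char *opts, int err,
    char *buf, size_t len)
{
	if (err == EPERM && has_mount_opt(opts, "mand"))
		snprintf(buf, len, "filesystem '%s' has the 'nbmand=on' "
		    "property set, this mount option may be disabled in "
		    "your kernel", fs);
	else
		snprintf(buf, len, "cannot mount '%s': %s", fs, strerror(err));
}
```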

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4729
Closes #6199

7 years agoSkip tests that are slow on 32-bit builders
Giuseppe Di Natale [Wed, 7 Jun 2017 02:04:01 +0000 (22:04 -0400)]
Skip tests that are slow on 32-bit builders

zpool_create_024_pos, zvol_misc_002_pos, write_dirs_002_pos are slow
on the buildbot 32-bit builder. Skip the test cases for now on 32-bit
builders.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6195

7 years agoReduce async_destroy_001_pos memory requirements
Brian Behlendorf [Tue, 6 Jun 2017 18:30:47 +0000 (11:30 -0700)]
Reduce async_destroy_001_pos memory requirements

The number of blocks which can be freed per TXG is controlled
by the zfs_free_max_blocks module option (defaults to 100,000).
Speed up this test case and reduce its memory requirements by
creating only 4 TXGs' worth of blocks to be freed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5479
Closes #6192

7 years agoAllow add of raidz and mirror with same redundancy
Håkan Johansson [Mon, 5 Jun 2017 20:53:09 +0000 (22:53 +0200)]
Allow add of raidz and mirror with same redundancy

Allow new members to be added to a pool mixing raidz and mirror vdevs
without giving -f, as long as they have matching redundancy.  This case
was missed in #5915, which only handled zpool create.

Add zfstest zpool_add_010_pos.ksh, with test of zpool create
followed by zpool add of mixed raidz and mirror vdevs.

Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh.
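
A sketch of the redundancy comparison, assuming a mirror of N disks survives N-1 failures and a raidzP vdev survives P. The types and add_needs_force() are hypothetical, and the real zpool checks cover more cases than this.

```c
#include <stdbool.h>

/* Hypothetical top-level vdev description. */
typedef enum { VDEV_MIRROR, VDEV_RAIDZ } vdev_kind_t;

typedef struct {
	vdev_kind_t kind;
	int children;	/* disks in a mirror */
	int parity;	/* 1..3 for raidz1..raidz3 */
} vdev_spec_t;

/* Number of device failures the vdev can survive. */
int vdev_redundancy(const vdev_spec_t *v)
{
	return v->kind == VDEV_MIRROR ? v->children - 1 : v->parity;
}

/* Adding `add` needs -f only if its redundancy differs from the pool's. */
bool add_needs_force(const vdev_spec_t *pool, int npool,
    const vdev_spec_t *add)
{
	for (int i = 0; i < npool; i++)
		if (vdev_redundancy(&pool[i]) != vdev_redundancy(add))
			return true;
	return false;
}
```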

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan Johansson <f96hajo@chalmers.se>
Issue #5915
Closes #6181

7 years agoLinux 4.9 compat: fix zfs_ctldir xattr handling
LOLi [Mon, 5 Jun 2017 18:26:25 +0000 (20:26 +0200)]
Linux 4.9 compat: fix zfs_ctldir xattr handling

Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode
has xattr support: clear it for the ctldir inodes to avoid EIO errors.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6189

7 years agozpool iostat/status -c improvements
Giuseppe Di Natale [Mon, 5 Jun 2017 17:52:15 +0000 (13:52 -0400)]
zpool iostat/status -c improvements

Users can now provide their own scripts to be run
with 'zpool iostat/status -c'. User scripts should be
placed in ~/.zpool.d to be included in zpool's
default search path.

Provide a script which can be used with
'zpool iostat|status -c' that will return the type of
device (hdd, ssd, file).

Provide a script to get various values from smartctl
when using 'zpool iostat/status -c'.

Allow users to define the ZPOOL_SCRIPTS_PATH
environment variable which can be used to override
the default 'zpool iostat/status -c' search path.

Allow the ZPOOL_SCRIPTS_ENABLED environment
variable to enable or disable 'zpool status/iostat -c'
functionality.

Use the new smart script to provide the serial command.

Install /etc/sudoers.d/zfs file which contains the sudoer
rule for smartctl as a sample.

Allow 'zpool iostat/status -c' tests to run in tree.
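
A hedged sketch of resolving the script search path. It takes the environment values as arguments, so a caller would pass getenv("ZPOOL_SCRIPTS_PATH") and getenv("HOME"). The system directory /etc/zfs/zpool.d and the rule that ZPOOL_SCRIPTS_ENABLED=0 disables the feature are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed system-wide script directory; depends on the install prefix. */
#define SYSTEM_SCRIPT_DIR "/etc/zfs/zpool.d"

/*
 * Build the colon-separated search path for -c scripts: an explicit
 * ZPOOL_SCRIPTS_PATH value wins; otherwise use ~/.zpool.d followed by
 * the (assumed) system directory.
 */
void script_search_path(const char *env_path, const char *home,
    char *buf, size_t len)
{
	if (env_path != NULL)
		snprintf(buf, len, "%s", env_path);
	else if (home != NULL)
		snprintf(buf, len, "%s/.zpool.d:%s", home, SYSTEM_SCRIPT_DIR);
	else
		snprintf(buf, len, "%s", SYSTEM_SCRIPT_DIR);
}

/* Assumption: scripts are enabled unless ZPOOL_SCRIPTS_ENABLED is "0". */
bool scripts_enabled(const char *env_enabled)
{
	return env_enabled == NULL || strcmp(env_enabled, "0") != 0;
}
```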

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6121
Closes #6153

7 years agoFix "snapdev" property issues
LOLi [Fri, 2 Jun 2017 14:17:00 +0000 (16:17 +0200)]
Fix "snapdev" property issues

When inheriting the "snapdev" property, we don't always call
zfs_prop_set_special(): this prevents device nodes from being created in
certain situations. Because "snapdev" is the only *special* property
that is also inheritable we need to call zfs_prop_set_special() even
when we're not reverting it to the received value ('zfs inherit -S').

Additionally, fix a NULL pointer dereference accidentally introduced in
5559ba0 that can be triggered when setting the "snapdev" property to
the value "hidden" twice.

Finally, add a new test case "zvol_misc_snapdev" to the ZFS Test Suite.

Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6131
Closes #6175
Closes #6176

7 years agoFix import wrong spare/l2 device when path change
Chunwei Chen [Thu, 25 May 2017 22:56:12 +0000 (15:56 -0700)]
Fix import wrong spare/l2 device when path change

If, for example, your aux device was /dev/sdc, but that device has since
been removed and /dev/sdc now points to another device, zpool import will
still use that device and corrupt it.

The problem is that spa_validate_aux() in spa_import(), rather than
validating the on-disk label, would actually write a label to disk.  We
remove those calls since spa_load_{spares,l2cache} seem to do everything we
need, and they actually validate the on-disk label.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158

7 years agoFix import finding spare/l2cache when path changes
Chunwei Chen [Wed, 24 May 2017 22:11:23 +0000 (15:11 -0700)]
Fix import finding spare/l2cache when path changes

When a spare or l2cache device path changes, zpool import will not fix up
its path as it does for normal vdevs.  The issue is that when you supply a
pool name argument to zpool import, it uses it to filter out devices which
don't have the pool name in their label.  Since spare and l2cache devices
never have that in their label, they always get filtered out.

We fix this by making sure we never filter out a spare or l2cache
device.
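
A sketch of the filtering rule, using a hypothetical label_info_t summary of each scanned label: aux devices are always kept, and only regular devices are compared against the requested pool name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what a device label says about itself. */
typedef struct {
	const char *pool_name;	/* NULL for spare and l2cache labels */
	bool is_aux;		/* spare or l2cache device */
} label_info_t;

/*
 * Decide whether a scanned device stays a candidate for
 * `zpool import <name>`. Aux labels carry no pool name, so filtering
 * them by name would always discard them.
 */
bool keep_for_import(const label_info_t *l, const char *wanted)
{
	if (l->is_aux)
		return true;
	return l->pool_name != NULL && strcmp(l->pool_name, wanted) == 0;
}
```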

Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158

7 years agoRetire filebench testing
Giuseppe Di Natale [Thu, 1 Jun 2017 13:24:28 +0000 (09:24 -0400)]
Retire filebench testing

We no longer perform automated filebench testing.  Remove
references to it for the automated testing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6186

7 years agoFix memory leak in zvol_set_volsize()
LOLi [Wed, 31 May 2017 19:52:12 +0000 (21:52 +0200)]
Fix memory leak in zvol_set_volsize()

Move kmem_free() so it's called for every error path: this is
preferred over making `dmu_object_info_t doi` local to accommodate
older kernels with limited stacks.
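
The single-exit cleanup pattern can be sketched as follows; obj_info_t stands in for dmu_object_info_t, and set_volsize() with its fail_step parameter is hypothetical, used only to drive each error path. The struct is heap-allocated to keep stack usage low, and every return after the allocation goes through the out: label.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for dmu_object_info_t: too big for a small stack. */
typedef struct {
	char payload[512];
	unsigned long blksz;
} obj_info_t;

static int frees;	/* counts releases, for illustration */

static void info_free(obj_info_t *p)
{
	free(p);
	frees++;
}

/*
 * Heap-allocate the large struct and funnel every exit after the
 * allocation through one label, so it is freed on all paths.
 */
int set_volsize(unsigned long volsize, int fail_step)
{
	int error = 0;
	obj_info_t *doi = malloc(sizeof (*doi));

	if (doi == NULL)
		return ENOMEM;

	if (fail_step == 1) {		/* simulated lookup failure */
		error = EIO;
		goto out;
	}
	doi->blksz = 8192;
	if (volsize % doi->blksz != 0) {	/* simulated validation failure */
		error = EINVAL;
		goto out;
	}
	/* ... the actual resize would happen here ... */
out:
	info_free(doi);
	return error;
}
```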

Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6177

7 years agoExplain reason for Signed-off-by in CONTRIBUTING
kpande [Wed, 31 May 2017 14:30:07 +0000 (10:30 -0400)]
Explain reason for Signed-off-by in CONTRIBUTING

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #6183

7 years agoFix ida leak in zvol_create_minor_impl
Boris Protopopov [Sat, 27 May 2017 00:50:25 +0000 (20:50 -0400)]
Fix ida leak in zvol_create_minor_impl

Added missing ida_simple_remove() in the error handling path.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #6159
Closes #6172

7 years agoDon't dirty bpobj if it has no entries
Alek P [Fri, 26 May 2017 18:42:10 +0000 (08:42 -1000)]
Don't dirty bpobj if it has no entries

In certain cases (dsl_scan_sync() is one), we may end up calling
bpobj_iterate() on an empty bpobj. Even though we don't end up
modifying the bpobj it still gets dirtied, causing unneeded writes
to the pool.

This patch adds an early bail from bpobj_iterate_impl() if bpobj
is empty to prevent unneeded writes.
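
A sketch of the early bail, with a hypothetical bplist_t whose dirty flag stands in for dirtying the bpobj: an empty list returns before anything is marked dirty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block-pointer list with a dirty flag. */
typedef struct {
	uint64_t num_entries;
	bool dirty;
} bplist_t;

typedef int (*bp_cb_t)(uint64_t idx, void *arg);

/* Example callback: counts visited entries. */
static int count_cb(uint64_t idx, void *arg)
{
	(void)idx;
	(*(int *)arg)++;
	return 0;
}

/*
 * Iterate entries, dirtying the object only when there is something to
 * visit. The early return avoids a needless write for an empty list.
 */
int bplist_iterate(bplist_t *bl, bp_cb_t cb, void *arg)
{
	if (bl->num_entries == 0)
		return 0;		/* early bail: stay clean */

	bl->dirty = true;		/* stands in for scheduling a write */
	for (uint64_t i = 0; i < bl->num_entries; i++) {
		int err = cb(i, arg);
		if (err != 0)
			return err;
	}
	return 0;
}
```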

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6164

7 years agoRevert "Fix "snapdev" property inheritance behaviour"
Brian Behlendorf [Fri, 26 May 2017 18:40:44 +0000 (11:40 -0700)]
Revert "Fix "snapdev" property inheritance behaviour"

This reverts commit 959f56b99366c8727647b5b19fb3d47555c96cf3.
An issue was uncovered by the new zvol_misc_snapdev test case
which needs to be investigated and resolved.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6174
Issue #6131

7 years agoOpenZFS 8077 - zfs-tests suite fails zpool_get_002_pos
Yuri Pankov [Wed, 24 May 2017 11:11:47 +0000 (07:11 -0400)]
OpenZFS 8077 - zfs-tests suite fails zpool_get_002_pos

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <jwk404@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: bunder2015 <omfgbunder@gmail.com>
Porting Notes:
* Also corrected a quoting mistake found in our copy

OpenZFS-issue: https://www.illumos.org/issues/8077
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/481467d
Closes #6163

7 years agoOpenZFS 8076 - zfs-tests suite fails rootpool_002_neg
Yuri Pankov [Wed, 24 May 2017 11:01:49 +0000 (07:01 -0400)]
OpenZFS 8076 - zfs-tests suite fails rootpool_002_neg

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: bunder2015 <omfgbunder@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/8076
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ab3407e
Closes #6162

7 years agoOpenZFS 8071 - zfs-tests: 7290 missed some cases
Yuri Pankov [Wed, 24 May 2017 10:46:28 +0000 (06:46 -0400)]
OpenZFS 8071 - zfs-tests: 7290 missed some cases

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <jwk404@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: bunder2015 <omfgbunder@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/8071
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e84991e
Closes #6161

7 years agoOpenZFS 8070 - Add some ZFS comments
Alan Somers [Wed, 24 May 2017 10:34:56 +0000 (06:34 -0400)]
OpenZFS 8070 - Add some ZFS comments

Authored by: Alan Somers <asomers@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: bunder2015 <omfgbunder@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/8070
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/40713f2
Closes #6160

7 years agoFix "snapdev" property inheritance behaviour
LOLi [Thu, 25 May 2017 23:43:46 +0000 (01:43 +0200)]
Fix "snapdev" property inheritance behaviour

When inheriting the "snapdev" property, we don't always call
zfs_prop_set_special(): this prevents device nodes from being created in
certain situations. Because "snapdev" is the only *special* property
that is also inheritable we need to call zfs_prop_set_special() even
when we're not reverting it to the received value ('zfs inherit -S').

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6131

7 years agoOpenZFS 8072 - zfs-tests: several test cases incorrectly spell TESTPOOL
Yuri Pankov [Tue, 16 May 2017 18:22:23 +0000 (11:22 -0700)]
OpenZFS 8072 - zfs-tests: several test cases incorrectly spell TESTPOOL

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <jwk404@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8072
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/56e4733
Closes #6137

7 years agoconfig: allow --with-linux without --with-linux-obj
Chunwei Chen [Wed, 24 May 2017 23:02:04 +0000 (16:02 -0700)]
config: allow --with-linux without --with-linux-obj

Don't use `uname -r` to determine kernel build directory when the user
specified kernel source with --with-linux. Otherwise, the user is forced
to use --with-linux-obj even if they are the same directory, which is
very counterintuitive.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/617/head

7 years agoImprove gitignore
Chunwei Chen [Wed, 24 May 2017 22:56:10 +0000 (15:56 -0700)]
Improve gitignore

Ignore .*.d and exclude Makefile.in in module/
Also, ignore *.patch and *.orig files

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

7 years agoLinux 4.12 compat: fix super_setup_bdi_name() call
LOLi [Thu, 25 May 2017 16:55:55 +0000 (18:55 +0200)]
Linux 4.12 compat: fix super_setup_bdi_name() call

Provide a format parameter to super_setup_bdi_name() so we don't
create duplicate names in '/devices/virtual/bdi' sysfs namespace which
would prevent us from mounting more than one ZFS filesystem at a time.
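
Since Linux 4.12, super_setup_bdi_name() takes a printf-style format, so a per-mount counter can be formatted into the name. The userspace sketch below only demonstrates generating unique names; the "zfs-%ld" format and the counter are hypothetical, not necessarily what ZFS passes.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global sequence number shared by all mounts. */
static atomic_long bdi_seq;

/*
 * Produce a distinct backing-device name per mount by formatting a
 * sequence number into it, so two mounts never collide in sysfs.
 */
void make_bdi_name(char *buf, size_t len)
{
	long n = atomic_fetch_add(&bdi_seq, 1);

	snprintf(buf, len, "zfs-%ld", n);
}
```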

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6147

7 years agoRetire zconfig.sh
Brian Behlendorf [Fri, 19 May 2017 17:08:23 +0000 (13:08 -0400)]
Retire zconfig.sh

All of the test coverage provided by this script is now handled
as part of the ZFS Test Suite.  Remove it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6128

7 years agoAdd zpool events tests
Brian Behlendorf [Thu, 18 May 2017 19:57:21 +0000 (15:57 -0400)]
Add zpool events tests

* events_001_pos - Verify the expected events are generated when
  invoking the various zpool sub-commands.  These events must
  appear in `zpool events` and be consumed by the ZED.

* events_002_pos - Verify that, when started, the ZED consumes events
  which were generated while it wasn't running.
  Additionally, verify that events are only processed once.

As part of this change the default.cfg used by the test suite
was changed to a default.cfg.in file.  This was needed so the
install location of all zed scripts, not only the enabled ones,
could be reliably determined.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6128

7 years agoEnable xattr tests
Brian Behlendorf [Fri, 19 May 2017 00:22:04 +0000 (20:22 -0400)]
Enable xattr tests

Updated the xattr_common.ksh helper functions to use the attr
command on Linux to manipulate xattrs.  Added an xattr.cfg file
and reworked the user/group functionality to be consistent with
the existing delegate test cases.  The intent of each test
case was preserved.

* xattr_001_pos, xattr_002_neg - Updated to verify xattr=on
  and xattr=sa style xattrs.

* xattr_003_neg - Use user_run helper instead of su.

* xattr_004_pos - Updated to work with ext2 xattrs.

* xattr_007_neg - Updated to use attr instead of runat.

* xattr_008_pos, xattr_009_neg, xattr_010_neg -
  Test cases disabled since they aren't applicable to Linux.

* xattr_011_pos - Updated to expect the behavior of the GNU
  versions of the tested utilities.

* xattr_012_pos - Updated to use xattrtest to create many
  small xattrs instead of a single large one.

* xattr_013_pos - Updated to use attr instead of runat.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6128

7 years agoEnable remaining tests
Brian Behlendorf [Fri, 19 May 2017 00:21:15 +0000 (20:21 -0400)]
Enable remaining tests

Enable most of the remaining test cases which were previously
disabled.  The required fixes are as follows:

* cache_001_pos - No changes required.

* cache_010_neg - Updated to use losetup under Linux.  Loopback
  cache devices are allowed; ZVOLs as cache devices are not.
  Disabled until all the builders pass reliably.

* cachefile_001_pos, cachefile_002_pos, cachefile_003_pos,
  cachefile_004_pos - Set set_device_dir path in cachefile.cfg,
  updated CPATH1 and CPATH2 to reference unique files.

* zfs_clone_005_pos - Wait for udev to create volumes.

* zfs_mount_007_pos - Updated mount options to expected Linux names.

* zfs_mount_009_neg, zfs_mount_all_001_pos - No changes required.

* zfs_unmount_005_pos, zfs_unmount_009_pos, zfs_unmount_all_001_pos -
  Updated to expect -f to not unmount busy mount points under Linux.

* rsend_019_pos - Observed to occasionally take a long time on both
  32-bit systems and the kmemleak builder.

* zfs_written_property_001_pos - Switched sync(1) to sync_pool.

* devices_001_pos, devices_002_neg - Updated create_dev_file() helper
  for Linux.

* exec_002_neg.ksh - Fixed mmap_exec.c to preserve errno.  Updated
  test case to expect EPERM from Linux as described by mmap(2).

* grow_pool_001_pos - Added missing setup.ksh and cleanup.ksh
  scripts from OpenZFS.

* grow_replicas_001_pos.ksh - Added missing $SLICE_* variables.

* history_004_pos, history_006_neg, history_008_pos - Fixed by
  previous commits but never enabled.  No changes required.

* zfs_allow_010_pos - Added missing spaces after assorted zfs
  commands in delegate_common.kshlib.

* inuse_* - Illumos dump device tests skipped.  Remaining test
  cases updated to correctly create required partitions.

* large_files_001_pos - Fixed largest_file.c to accept EINVAL
  as well as EFBIG as described in write(2).

* link_count_001 - Added nproc to required commands.

* umountall_001 - Updated to use umount -a.

* online_offline_001_* - Pull in OpenZFS change to file_trunc.c
  to make the '-c 0' option run the test in a loop.  Included
  online_offline.cfg file in all test cases.

* rename_dirs_001_pos - Updated to use the rename_dir test binary,
  pkill restricted to exact matches and total runtime reduced.

* slog_013_neg, write_dirs_002_pos - No changes required.

* slog_013_pos.ksh - Updated to use losetup under Linux.

* slog_014_pos.ksh - Since the ZED will not be running, manually
  degrade the damaged vdev as expected.

* nopwrite_varying_compression, nopwrite_volume - Forced pool
  sync with sync_pool to ensure up to date property values.

* Fixed typos in ZED log messages.  Refactored zed_* helper
  functions to resolve all-syslog exit=1 errors in zedlog.

* zfs_copies_005_neg, zfs_get_004_pos, zpool_add_004_pos,
  zpool_destroy_001_pos, largest_pool_001_pos, clone_001_pos.ksh,
  clone_001_pos - Skip until layering pools on zvols is solid.

* largest_pool_001_pos - Limited to a 7eb pool; the maximum
  supported size is 8eb-1 on Linux.

* zpool_expand_001_pos, zpool_expand_003_neg - Requires
  additional support from the ZED, updated skip reason.

* zfs_rollback_001_pos, zfs_rollback_002_pos - Properly clean up
  busy mount points under Linux between test loops.

* privilege_001_pos, privilege_003_pos, rollback_003_pos,
  threadsappend_001_pos - Skip with log_unsupported.

* snapshot_016_pos - No changes required.

* snapshot_008_pos - Increased LIMIT from 512K to 2M and added
  sync_pool to avoid false positives.
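The exec_002_neg fix depends on preserving errno: a cleanup call such as close(2) can overwrite errno before the caller reports it, hiding the EPERM the test expects. A minimal sketch of the pattern, assuming a hypothetical helper `mmap_exec_file()`; this is not the test suite's actual mmap_exec.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file with PROT_EXEC.  Returns 0 on success
 * or the errno of the first failing call.  errno is saved before
 * close(2) runs, since close(2) could otherwise overwrite it.
 */
int
mmap_exec_file(const char *path)
{
	struct stat st;
	void *p;
	int fd, err;

	if ((fd = open(path, O_RDONLY)) < 0)
		return (errno);

	if (fstat(fd, &st) != 0) {
		err = errno;		/* save before close() */
		(void) close(fd);
		return (err);
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
	    MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;		/* e.g. EPERM on a noexec mount */
		(void) close(fd);
		return (err);
	}

	(void) munmap(p, (size_t)st.st_size);
	(void) close(fd);
	return (0);
}
```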

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6128

7 years agoFix LZ4_uncompress_unknownOutputSize caused panic
Feng Sun [Fri, 19 May 2017 20:45:46 +0000 (04:45 +0800)]
Fix LZ4_uncompress_unknownOutputSize caused panic

Sync with kernel patches for lz4

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/lib/lz4

4a3a99 lz4: add overrun checks to lz4_uncompress_unknownoutputsize()
d5e7ca LZ4 : fix the data abort issue
bea2b5 lib/lz4: Pull out constant tables
99b7e9 lz4: fix system halt at boot kernel on x86_64
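Overrun checks of this kind validate every literal and match copy against both the input and the output bounds before copying. A minimal sketch of that idea for the LZ4 block format, written from the public format description; `lz4_decode_checked()` and `read_len_ext()` are hypothetical names, and this is not the code in ZFS's lz4.c or the kernel's lib/lz4.

```c
#include <stddef.h>
#include <string.h>

/*
 * Read an LZ4 length extension: keep adding bytes while the previous
 * byte was 255.  Returns -1 if the input ends mid-extension.
 */
static int
read_len_ext(const unsigned char **ipp, const unsigned char *iend,
    size_t *len)
{
	unsigned char b;

	do {
		if (*ipp >= iend)
			return (-1);
		b = *(*ipp)++;
		*len += b;
	} while (b == 255);
	return (0);
}

/*
 * Decode one LZ4 block into out[0..out_cap).  Each copy is checked
 * against both buffers, so malformed input returns -1 instead of
 * reading or writing out of bounds.  Returns the decoded size.
 */
long
lz4_decode_checked(const unsigned char *in, size_t in_len,
    unsigned char *out, size_t out_cap)
{
	const unsigned char *ip = in, *iend = in + in_len;
	unsigned char *op = out, *oend = out + out_cap;

	while (ip < iend) {
		unsigned char token = *ip++;
		size_t lit = token >> 4, mlen = token & 15, offset, i;
		const unsigned char *match;

		if (lit == 15 && read_len_ext(&ip, iend, &lit) != 0)
			return (-1);
		/* Overrun checks for the literal run. */
		if (lit > (size_t)(iend - ip) || lit > (size_t)(oend - op))
			return (-1);
		memcpy(op, ip, lit);
		op += lit;
		ip += lit;

		if (ip == iend)		/* last sequence has no match */
			break;

		if (iend - ip < 2)
			return (-1);
		offset = (size_t)ip[0] | ((size_t)ip[1] << 8);
		ip += 2;
		/* The match must start inside already-decoded output. */
		if (offset == 0 || offset > (size_t)(op - out))
			return (-1);

		if (mlen == 15 && read_len_ext(&ip, iend, &mlen) != 0)
			return (-1);
		mlen += 4;		/* minimum match length */
		if (mlen > (size_t)(oend - op))
			return (-1);

		/* Byte-wise copy so overlapping matches replicate. */
		match = op - offset;
		for (i = 0; i < mlen; i++)
			op[i] = match[i];
		op += mlen;
	}
	return ((long)(op - out));
}
```

For example, the input bytes `0x35 'a' 'b' 'c' 0x03 0x00` decode to `abcabcabcabc`, but the same input with an 8-byte output buffer is rejected rather than overrunning it.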

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Feng Sun <loyou85@gmail.com>
Closes #5975
Closes #5973

7 years agoImplemented zpool sync command
Alek P [Fri, 19 May 2017 19:33:11 +0000 (12:33 -0700)]
Implemented zpool sync command

This addition will enable us to sync an open TXG to the main pool
on demand. The functionality is similar to 'sync(2)' but 'zpool sync'
will return when data has hit the main storage instead of potentially
just the ZIL, as is the case with sync(2).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6122

7 years agoForce fault a vdev with 'zpool offline -f'
Tony Hutter [Fri, 19 May 2017 19:30:16 +0000 (12:30 -0700)]
Force fault a vdev with 'zpool offline -f'

This patch adds a '-f' option to 'zpool offline' to fault a vdev
instead of bringing it offline.  Unlike the OFFLINE state, the
FAULTED state will trigger the FMA code, allowing for things like
autoreplace and triggering the slot fault LED.  The -f faults
persist across imports, unless they were set with the temporary
(-t) flag.  Both persistent and temporary faults can be cleared
with zpool clear.
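The OFFLINE/FAULTED rules above can be summarized in a small state model. This is a hypothetical sketch for illustration only: `model_vdev_t` and the `model_*` functions are invented names, not libzfs or kernel interfaces, and the model assumes an export/import cycle is the only point at which temporary state is dropped.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the vdev states described above. */
typedef enum { VS_HEALTHY, VS_OFFLINE, VS_FAULTED } vstate_t;

typedef struct {
	vstate_t state;
	bool temporary;		/* set by -t: not persisted across import */
	bool fma_notified;	/* FAULTED, unlike OFFLINE, reaches FMA */
} model_vdev_t;

/* zpool offline [-f] [-t] */
void
model_offline(model_vdev_t *vd, bool force_fault, bool temporary)
{
	vd->state = force_fault ? VS_FAULTED : VS_OFFLINE;
	vd->temporary = temporary;
	vd->fma_notified = force_fault;
}

/* Export followed by import: only non-temporary state survives. */
void
model_reimport(model_vdev_t *vd)
{
	if (vd->temporary) {
		vd->state = VS_HEALTHY;
		vd->temporary = false;
	}
}

/* zpool clear removes persistent and temporary faults alike. */
void
model_clear(model_vdev_t *vd)
{
	if (vd->state == VS_FAULTED) {
		vd->state = VS_HEALTHY;
		vd->temporary = false;
	}
}
```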

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6094

7 years agoFixed small memory leak in ereport handling
Tom Caputi [Fri, 19 May 2017 00:35:49 +0000 (20:35 -0400)]
Fixed small memory leak in ereport handling

One pre-check in zfs_ereport_start() was being performed after the
nvlists had been allocated, so they leaked whenever that check failed.
This change performs the check before the allocations.
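The fix follows a general rule: run checks that can bail out before allocating anything the early return would have to free. A minimal sketch, assuming a hypothetical `ereport_start()` with an allocation counter standing in for the two nvlists; it is not the real zfs_ereport_start().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocation counter used to detect leaks. */
typedef struct { int nallocs; } alloc_stats_t;

static void *
tracked_alloc(alloc_stats_t *st, size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		st->nallocs++;
	return (p);
}

static void
tracked_free(alloc_stats_t *st, void *p)
{
	if (p != NULL) {
		st->nallocs--;
		free(p);
	}
}

/*
 * Hypothetical ereport_start(): validate first, allocate second.  If the
 * check came after the allocations, this early return would leak them.
 */
int
ereport_start(alloc_stats_t *st, int should_post, void **ereport,
    void **detector)
{
	if (!should_post)		/* pre-check before any allocation */
		return (EINVAL);

	*ereport = tracked_alloc(st, 64);
	*detector = tracked_alloc(st, 64);
	if (*ereport == NULL || *detector == NULL) {
		tracked_free(st, *ereport);
		tracked_free(st, *detector);
		*ereport = *detector = NULL;
		return (ENOMEM);
	}
	return (0);
}
```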

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6140

7 years agoFix large dnode send stream flag conflict
Brian Behlendorf [Thu, 18 May 2017 17:02:16 +0000 (10:02 -0700)]
Fix large dnode send stream flag conflict

Bit 21 of the send stream flags was inadvertently used for two
different features under concurrent development.  To avoid any
future compatibility problems the large dnode flag is being
switched to bit 23 which is unused.

The large dnode feature has only been present in pre-releases of
ZoL and dnodesize defaults to legacy which is compatible with
existing OpenZFS implementations.  Users with dnodesize=auto
needing to use zfs send/recv must update ZoL on both the
source and destination systems.
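A compile-time check is one way to keep feature bits from colliding again. A minimal sketch with hypothetical flag names (`FEAT_OTHER_BIT21`, `FEAT_LARGE_DNODE`, `stream_features_ok()` are not the real ZFS definitions), assuming the receiver refuses any stream carrying a feature bit it does not recognize, which is why both sides need the update.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical feature bits illustrating the fix: the large dnode flag
 * moves from bit 21, which another feature also claimed, to bit 23.
 */
#define	FEAT_OTHER_BIT21	(UINT64_C(1) << 21)
#define	FEAT_LARGE_DNODE	(UINT64_C(1) << 23)

/* Catch any future overlap at compile time. */
_Static_assert((FEAT_OTHER_BIT21 & FEAT_LARGE_DNODE) == 0,
    "send stream feature bits must not overlap");

/* Accept a stream only if every feature bit in it is understood. */
bool
stream_features_ok(uint64_t stream_flags, uint64_t known_flags)
{
	return ((stream_flags & ~known_flags) == 0);
}
```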

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6139

7 years agoCompatibility with glibc-2.23
Justin Lecher [Wed, 17 May 2017 00:00:16 +0000 (01:00 +0100)]
Compatibility with glibc-2.23

In glibc-2.23 <sys/sysmacros.h> isn't automatically included in
<sys/types.h> [1], so we need to explicitly include it.

https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html
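Code that uses the device-number macros major(), minor(), or makedev() should therefore include <sys/sysmacros.h> itself rather than relying on <sys/types.h>. A minimal sketch; `split_dev()` is a hypothetical helper, not a function from the ZFS tree.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() */

/* Split a device number into its major and minor parts. */
void
split_dev(dev_t dev, unsigned int *maj, unsigned int *min)
{
	*maj = major(dev);
	*min = minor(dev);
}
```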

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Justin Lecher <jlec@gentoo.org>
Closes #6132

7 years agoIntroduce zv_state_lock
Boris Protopopov [Wed, 10 May 2017 17:51:29 +0000 (13:51 -0400)]
Introduce zv_state_lock

The lock is designed to protect the internal state of zvol_state_t and
to avoid taking spa_namespace_lock (e.g. in the dmu_objset_own() code
path) while holding zvol_state_lock.  Refactor the code accordingly.
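The pattern is to hold the global list lock only long enough to find the object and take its per-object lock, then drop the global lock before any work that needs the namespace lock. A minimal pthreads sketch of that ordering; `list_lock`, `namespace_lock`, `vol_t`, and `vol_open()` are hypothetical stand-ins for zvol_state_lock, spa_namespace_lock, zvol_state_t, and zvol_open(), not the real code.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the global locks described above. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct vol {
	pthread_mutex_t state_lock;	/* protects this volume's state */
	const char *name;
	int open_count;
} vol_t;

static vol_t vols[2] = {
	{ PTHREAD_MUTEX_INITIALIZER, "pool/a", 0 },
	{ PTHREAD_MUTEX_INITIALIZER, "pool/b", 0 },
};

/*
 * Find a volume and return it with state_lock held.  list_lock is held
 * only for the lookup, so no caller holds it when taking namespace_lock.
 */
static vol_t *
vol_find_and_lock(const char *name)
{
	vol_t *v = NULL;
	size_t i;

	pthread_mutex_lock(&list_lock);
	for (i = 0; i < sizeof (vols) / sizeof (vols[0]); i++) {
		if (strcmp(vols[i].name, name) == 0) {
			v = &vols[i];
			pthread_mutex_lock(&v->state_lock);
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return (v);
}

int
vol_open(const char *name)
{
	vol_t *v = vol_find_and_lock(name);
	int count;

	if (v == NULL)
		return (-1);
	/* Stands in for work such as dmu_objset_own(); list_lock is free. */
	pthread_mutex_lock(&namespace_lock);
	count = ++v->open_count;
	pthread_mutex_unlock(&namespace_lock);
	pthread_mutex_unlock(&v->state_lock);
	return (count);
}
```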

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3484
Closes #6065
Closes #6134

7 years agoRevert commit 1ee159f4
Boris Protopopov [Thu, 11 May 2017 20:40:33 +0000 (16:40 -0400)]
Revert commit 1ee159f4

The reverted fix for a lock order inversion in zvol_open() did not
account for the use of zvols as vdevs.  Those use cases resulted in
lock order inversion deadlocks involving spa_namespace_lock and
bdev->bd_mutex.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #6065
Issue #6134

7 years agoSkip spurious resilver IO on raidz vdev
Isaac Huang [Sat, 13 May 2017 00:28:03 +0000 (18:28 -0600)]
Skip spurious resilver IO on raidz vdev

On a raidz vdev, a block that does not span all child vdevs, excluding
its skip sectors if any, may not be affected by a child vdev outage or
failure. In such cases, the block does not need to be resilvered.
However, the current resilver algorithm simply resilvers all blocks on a
degraded raidz vdev. Such spurious IO is not only wasteful, but also
adds the risk of overwriting good data.

This patch eliminates such spurious IOs.
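The core test is whether the failed child holds any of the block's columns. A minimal sketch under a simplified model: a block's columns (parity plus data) are assumed to sit on consecutive children starting at child `first` and wrapping around, and skip sectors are ignored. `raidz_block_touches_child()` is a hypothetical name, not the function added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: a raidz vdev with `children` disks places a block's
 * `cols` columns on children first, first+1, ... (mod children).
 * Assumes first < children and failed < children.  The block needs a
 * resilver only if the failed child holds one of its columns.
 */
bool
raidz_block_touches_child(uint64_t children, uint64_t first, uint64_t cols,
    uint64_t failed)
{
	if (cols >= children)		/* block spans every child */
		return (true);
	/* Forward distance from `first` to `failed`, with wrap-around. */
	return ((failed + children - first) % children < cols);
}
```

With five children, a three-column block starting at child 3 occupies children 3, 4, and 0, so an outage of child 1 or 2 leaves it intact and it can be skipped.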

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5316

7 years agoEnable additional test cases
Brian Behlendorf [Thu, 11 May 2017 21:27:57 +0000 (14:27 -0700)]
Enable additional test cases

Enable additional test cases.  In most cases this required a few
minor modifications to the test scripts.  In a few cases a real
bug was uncovered and fixed.  And in a handful of cases where pools
are layered on pools the test case will be skipped until this is
supported.  Details below for each test case.

* zpool_add_004_pos - Skip test on Linux until adding zvols to pools
  is fully supported and deadlock free.

* zpool_add_005_pos.ksh - Skip dumpadm portion of the test which isn't
  relevant for Linux.  The find_vfstab_dev, find_mnttab_dev, and
  save_dump_dev functions were updated accordingly for Linux.  Add
  O_EXCL to the in-use check to prevent the -f (force) option from
  working for mounted filesystems and improve the resulting error.

* zpool_add_006_pos - Update test case such that it doesn't depend
  on nested pools.  Switch to truncate from mkfile to reduce space
  requirements and speed up the test case.

* zpool_clear_001_pos - Speed up test case by filling filesystem to
  25% capacity.

* zpool_create_002_pos, zpool_create_004_pos - Use sparse files for
  file vdevs in order to avoid increasing the partition size.

* zpool_create_006_pos - 6ba1ce9 allows raidz+mirror configs with
  similar redundancy.  Updated the valid_args and forced_args cases.

* zpool_create_008_pos - Disable overlapping partition portion.

* zpool_create_011_neg - Fix to correctly create the extra partition.
  Modified zpool_vdev.c to use fstat64_blk() wrapper which includes
  the st_size even for block devices.

* zpool_create_012_neg - Updated to properly find swap devices.

* zpool_create_014_neg, zpool_create_015_neg - Updated to use
  swap_setup() and swap_cleanup() wrappers which do the right thing
  on Linux and Illumos.  Removed '-n' option which succeeds under
  Linux due to differences in the in-use checks.

* zpool_create_016_pos.ksh - Skipped because the test case isn't useful.

* zpool_create_020_pos - Added missing / to cleanup() function.
  Remove cache file prior to test to ensure a clean environment
  and avoid false positives.

* zpool_destroy_001_pos - Removed test case which creates a pool on
  a zvol.  This is more likely to deadlock under Linux and has never
  been completely supported on any platform.

* zpool_destroy_002_pos - 'zpool destroy -f' is unsupported on Linux.
  Mount points must not be busy in order to be unmounted.

* zfs_destroy_001_pos - Handle EBUSY error which can occur with
  volumes when racing with udev.

* zpool_expand_001_pos, zpool_expand_003_neg - Skip test on Linux
  until adding zvols to pools is fully supported and deadlock free.
  The test could be modified to use loop-back devices but it would
  be preferable to use the test case as is for improved coverage.

* zpool_export_004_pos - Updated test case such that it doesn't
  depend on nested pools.  Normal file vdevs under /var/tmp are fine.

* zpool_import_all_001_pos - Updated to skip partition 1, which is
  known as slice 2, on Illumos.  This prevents overwriting the
  default TESTPOOL which was causing the failure.

* zpool_import_002_pos, zpool_import_012_pos - No changes needed.

* zpool_remove_003_pos - No changes needed.

* zpool_upgrade_002_pos, zpool_upgrade_004_pos - Root cause addressed
  by upstream OpenZFS commit 3b7f360.

* zpool_upgrade_007_pos - Disabled in test case due to known failure.
  Opened issue https://github.com/zfsonlinux/zfs/issues/6112

* zvol_misc_002_pos - Updated to use ext2.

* zvol_misc_001_neg, zvol_misc_003_neg, zvol_misc_004_pos,
  zvol_misc_005_neg, zvol_misc_006_pos - Moved to skip list, these
  test cases could be updated to use Linux's crash dump facility.

* zvol_swap_* - Updated to use swap_setup/swap_cleanup helpers.
  File creation switched from /tmp to /var/tmp.  Enabled minimal
  useful tests for Linux, skip test cases which aren't applicable.
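The zpool_create_011_neg change relies on the fact that fstat(2) on Linux typically reports an st_size of 0 for block devices. A minimal sketch of such a wrapper, assuming the size is read with the Linux BLKGETSIZE64 ioctl; `fstat_with_dev_size()` is a hypothetical name and this is not the actual fstat64_blk() implementation.

```c
#include <fcntl.h>
#include <linux/fs.h>		/* BLKGETSIZE64 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical fstat() wrapper: regular files keep their st_size, and
 * for block devices st_size is filled in from BLKGETSIZE64.  Linux-only.
 */
int
fstat_with_dev_size(int fd, struct stat *st)
{
	uint64_t size;

	if (fstat(fd, st) != 0)
		return (-1);
	if (S_ISBLK(st->st_mode)) {
		if (ioctl(fd, BLKGETSIZE64, &size) != 0)
			return (-1);
		st->st_size = (off_t)size;
	}
	return (0);
}
```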

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3484
Issue #5634
Issue #2437
Issue #5202
Issue #4034
Closes #6095

7 years agoOpenZFS 8063 - verify that we do not attempt to access inactive txg
Matthew Ahrens [Mon, 24 Apr 2017 16:34:36 +0000 (09:34 -0700)]
OpenZFS 8063 - verify that we do not attempt to access inactive txg

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
A standard practice in ZFS is to keep track of "per-txg" state. Any of
the 3 active TXG's (open, quiescing, syncing) can have different values
for this state. We should assert that we do not attempt to modify other
(inactive) TXG's.
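Per-txg state is typically stored in a small array indexed by `txg & TXG_MASK`, so writing to an inactive txg silently lands in a slot that belongs to another txg. A minimal sketch of the kind of assertion this change adds, assuming ZFS's TXG_SIZE of 4 and three concurrent states; `txg_state_model_t`, `txg_is_active()`, and `txg_add_dirty()` are hypothetical names, not the functions touched by the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	TXG_CONCURRENT_STATES	3	/* open, quiescing, syncing */
#define	TXG_SIZE		4	/* power of two >= concurrent states */
#define	TXG_MASK		(TXG_SIZE - 1)

/* Hypothetical per-txg state, indexed by txg & TXG_MASK. */
typedef struct {
	uint64_t open_txg;		/* newest active txg */
	uint64_t dirty_bytes[TXG_SIZE];
} txg_state_model_t;

/* True if txg is currently open, quiescing, or syncing. */
bool
txg_is_active(const txg_state_model_t *ts, uint64_t txg)
{
	return (txg <= ts->open_txg &&
	    txg + TXG_CONCURRENT_STATES > ts->open_txg);
}

void
txg_add_dirty(txg_state_model_t *ts, uint64_t txg, uint64_t bytes)
{
	/* An inactive txg's slot may belong to a different txg. */
	assert(txg_is_active(ts, txg));
	ts->dirty_bytes[txg & TXG_MASK] += bytes;
}
```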

Porting Notes:
- ASSERTV added to txg_sync_waiting() for unused variable.

OpenZFS-issue: https://www.illumos.org/issues/8063
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/01acb46
Closes #6109

7 years agoOpenZFS 8166 - zpool scrub thinks it repaired offline device
Matthew Ahrens [Wed, 10 May 2017 17:32:40 +0000 (10:32 -0700)]
OpenZFS 8166 - zpool scrub thinks it repaired offline device

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>

If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged.  When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started.  The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.

The fix is to never clear the DTL of offline devices.  Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.
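The fix amounts to one extra condition when a completed scrub trims a leaf's DTL. A minimal sketch under a deliberately simplified model: each leaf's DTL is a single txg range, and a finished scrub that started at `scrub_txg` is assumed to have repaired everything older than that txg on online leaves. `leaf_model_t` and `scrub_done_update_dtl()` are hypothetical names, not the real vdev_dtl_reassess() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical leaf with a single DTL range [dtl_min, dtl_max]. */
typedef struct {
	bool offline;
	bool dtl_empty;
	uint64_t dtl_min, dtl_max;
} leaf_model_t;

/*
 * After a scrub that started at scrub_txg completes, drop the part of
 * the DTL older than scrub_txg.  An offline leaf received no repair
 * writes, so its DTL is left untouched.
 */
void
scrub_done_update_dtl(leaf_model_t *leaf, uint64_t scrub_txg)
{
	if (leaf->offline || leaf->dtl_empty)
		return;
	if (leaf->dtl_max < scrub_txg)
		leaf->dtl_empty = true;
	else if (leaf->dtl_min < scrub_txg)
		leaf->dtl_min = scrub_txg;
}
```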

The problem can be worked around by running "zpool scrub" after
"zpool online".

OpenZFS-issue: https://www.illumos.org/issues/8166
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372
Closes #5806
Closes #6103