]> granicus.if.org Git - zfs/log
zfs
5 years agoTag 0.8.0-rc2 zfs-0.8.0-rc2
Brian Behlendorf [Mon, 12 Nov 2018 19:57:03 +0000 (11:57 -0800)]
Tag 0.8.0-rc2

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
5 years agoAllow spaces in pool names for cmdline argument
kpande [Mon, 12 Nov 2018 02:23:11 +0000 (21:23 -0500)]
Allow spaces in pool names for cmdline argument

PR #8114 quoted the ${ENCRYPTIONROOT} parameter to ensure we don't
lose spaces when unlocking root filesystem in the off chance that
it has a space in its name.

Unfortunately, dracut and initramfs-tools do not actually get the
quotes from the cmdline. If we use root=ZFS="root pool/filesystem
name" the script still only sees root=ZFS=root and no quotation
marks.

Because + is a reserved character in ZFS, it's used as a
placeholder for spaces in the kernel cmdline.  In this way,
root=ZFS=root+pool/filesystem+name will properly expand by
replacing the character with sed (POSIX compliant method).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: Kash Pande <kash@tripleback.net>
Issue #8114
Closes #8117

5 years agoFix coverity defects: CID 184285
LOLi [Mon, 12 Nov 2018 02:09:00 +0000 (03:09 +0100)]
Fix coverity defects: CID 184285

CID 184285: Read from pointer after free (USE_AFTER_FREE)

This patch fixes an use-after-free in vdev_config_generate_stats()
moving the kmem_free() call at the end of the function.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8120

5 years agoFix systemd spec file macros
Brian Behlendorf [Mon, 12 Nov 2018 02:06:36 +0000 (18:06 -0800)]
Fix systemd spec file macros

Ensure that the _unitdir, _presetdir, _modulesloaddir, and
_systemdgeneratordir macros are always defined.  If not set
them to the expected default values.  Pass all of these options
to ./configure and package the resulting files in those locations.

Additionally, set __brp_mangle_shebangs_exclude_from until the
conversion to Python 3 is complete so they may be built cleanly
under mock.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7567
Closes #8119

5 years agoMake initramfs-tools script encryption aware
Garrett Fields [Fri, 9 Nov 2018 19:30:09 +0000 (14:30 -0500)]
Make initramfs-tools script encryption aware

Changed decrypt_fs zfs command to "load-key"
Plymouth case code based on "contrib/dracut/90zfs/zfs-lib.sh.in"
Systemd case based on "contrib/dracut/90zfs/zfs-load-key.sh.in"
Cleaned up misspelling of "available" throughout

Code style fixes
Single quote for ${ENCRYPTIONROOT}
Changed "${DECRYPT_CMD}"  to "eval ${DECRYPT_CMD}"

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #8093

5 years agozed: detect and offline physically removed devices
loli10K [Tue, 18 Sep 2018 21:45:52 +0000 (23:45 +0200)]
zed: detect and offline physically removed devices

This commit adds a new test case to the ZFS Test Suite to verify ZED
can detect when a device is physically removed from a running system:
the device will be offlined if a spare is not available in the pool.

We implement this by using the existing libudev functionality and
without relying solely on the FM kernel module capabilities which have
been observed to be unreliable with some kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1537
Closes #7926

5 years agoAdd quotations for ${ENCRYPTIONROOT}
kpande [Fri, 9 Nov 2018 17:32:01 +0000 (12:32 -0500)]
Add quotations for ${ENCRYPTIONROOT}

Add quotations for ${ENCRYPTIONROOT} to avoid breaking systems
with a space in the name.

Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kash Pande <kash@tripleback.net>
Related-to: #8093
Closes #8114

5 years agoAdd zpool status -s (slow I/Os) and -p (parseable)
Tony Hutter [Fri, 9 Nov 2018 00:47:24 +0000 (16:47 -0800)]
Add zpool status -s (slow I/Os) and -p (parseable)

This patch adds a new slow I/Os (-s) column to zpool status to show the
number of VDEV slow I/Os. This is the number of I/Os that didn't
complete in zio_slow_io_ms milliseconds. It also adds a new parsable
(-p) flag to display exact values.

  NAME         STATE     READ WRITE CKSUM  SLOW
  testpool     ONLINE       0     0     0     -
  mirror-0   ONLINE       0     0     0     -
      loop0    ONLINE       0     0     0    20
      loop1    ONLINE       0     0     0     0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7756
Closes #6885

5 years agoUpdate zfs_admin_snapshot value (disabled)
George Melikov [Fri, 9 Nov 2018 00:17:12 +0000 (03:17 +0300)]
Update zfs_admin_snapshot value (disabled)

It's disabled by default, update code and tests to reflect
the documentation.

Minor cleanup in delegate_common.kshlib.

Reviewed-by: Gregor Kopka <gregor@kopka.net>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7835
Closes #8045

5 years agoZTS: Fix and reenable zfs_rename tests
Tom Caputi [Thu, 8 Nov 2018 00:59:27 +0000 (19:59 -0500)]
ZTS: Fix and reenable zfs_rename tests

zfs_rename_006_pos has been flaky in the past because it was
missing a call to block_device_wait to ensure the zvols it creates
are present before running dd. Whenever this this happened,
zfs_rename_009_neg would also fail because the first test would
leak a zvol clone that it did not know how to clean up. This patch
fixes the root cause and reenables the test. It also fixes some
minor grammar errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #5647
Closes #5648
Closes #8088

5 years agoZTS: Fix test zfs_mount_006_pos
Paul Zuchowski [Thu, 8 Nov 2018 00:54:08 +0000 (19:54 -0500)]
ZTS: Fix test zfs_mount_006_pos

For Linux, place a file in the mount point folder so it will be
considered "busy".  Fix the while loop so it doesn't rm in
directories above the testdir.  Add Linux-specific code to test
overlay on|off.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #4990
Closes #8081

5 years agoAdd BuildRequires gcc, make, elfutils-libelf-devel
Tony Hutter [Wed, 7 Nov 2018 23:48:24 +0000 (15:48 -0800)]
Add BuildRequires gcc, make, elfutils-libelf-devel

This adds a BuildRequires for gcc, make, and elfutils-libelf-devel
into our spec files.  gcc has been a packaging requirement for
awhile now:

https://fedoraproject.org/wiki/Packaging:C_and_C%2B%2B

These additional BuildRequires allow us to mock build in
Fedora 29.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8095
Closes #8102

5 years agoFix !zilog_is_dirty() assert during ztest
Tom Caputi [Wed, 7 Nov 2018 23:46:50 +0000 (18:46 -0500)]
Fix !zilog_is_dirty() assert during ztest

ztest occasionally hits an assert that !zilog_is_dirty() during
zil_close(). This is caused by an interaction between 2 threads.
First, ztest_run() waits for each test thread to complete and
closes the associated dataset as soon as the thread joins. At
the same time, the ztest_vdev_add_remove() test may attempt to
remove the slog, which will open, dirty, and reset the logs on
every dataset in the pool (including those of other threads).
This patch simply ensures that we always join all of the test
threads before closing any datasets.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8094

5 years agoFix divide by zero during indirect split damage
Tom Caputi [Wed, 7 Nov 2018 23:44:56 +0000 (18:44 -0500)]
Fix divide by zero during indirect split damage

This patch simply ensures that vdev_indirect_splits_damage()
cannot hit a divide by zero exception if a split has no
children with valid data. The normal reconstruction code
path in vdev_indirect_reconstruct_io_done() already has this
check.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8086

5 years agoFix dirtying vdev config on with RO spa
Tom Caputi [Wed, 7 Nov 2018 23:44:14 +0000 (18:44 -0500)]
Fix dirtying vdev config on with RO spa

This patch simply corrects an issue where vdev_dtl_reassess()
could attempt to dirty the vdev config even when the spa was
not elligable for writing.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8085

5 years agoReplay logs before starting ztest workers
Tom Caputi [Wed, 7 Nov 2018 23:40:24 +0000 (18:40 -0500)]
Replay logs before starting ztest workers

This patch ensures that logs are replayed on all datasets prior
to starting ztest workers. This ensures that the call to
vdev_offline() a log device in ztest_fault_inject() will not fail
due to the log device being required for replay.

This patch also fixes a small issue found during testing where
spa_keystore_load_wkey() does not check that the dataset specified
is an encryption root. This check was present in libzfs, however.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8084

5 years agoFix vdev removal finishing race
Tom Caputi [Wed, 7 Nov 2018 23:38:10 +0000 (18:38 -0500)]
Fix vdev removal finishing race

This patch fixes a race condition where the end of
vdev_remove_replace_with_indirect(), which holds
svr_lock, would race against spa_vdev_removal_destroy(),
which destroys the same lock and is called asynchronously
via dsl_sync_task_nowait().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #6900
Closes #8083

5 years agoMake vdev_set_deferred_resilver() recursive
Tom Caputi [Wed, 7 Nov 2018 23:33:17 +0000 (18:33 -0500)]
Make vdev_set_deferred_resilver() recursive

vdev_clear() can call vdev_set_deferred_resilver() with a
non-leaf vdev to setup a deferred resilver. However, this
function is currently written to only handle leaf vdevs.
This bug was introduced with deferred resilvers in 80a91e74.
This patch makes this function recursive so that it can find
appropriate vdevs to resilver and set vdev_resilver_deferred
on them.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #7732
Closes #8082

5 years agoFix libudev dependency in libzutil
Don Brady [Wed, 7 Nov 2018 01:47:52 +0000 (18:47 -0700)]
Fix libudev dependency in libzutil

ZFS should be able to build without libudev installed. The recent
change for libzutil inadvertently broke that.  Make the libudev code
conditional in zutil_import.c to resolve the build failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8097

5 years agozpool: bogus error for invalid dedupditto value
LOLi [Tue, 6 Nov 2018 18:14:56 +0000 (19:14 +0100)]
zpool: bogus error for invalid dedupditto value

When provided with an invalid 'dedupditto' value zpool prints
a misleading error message:

    $ sudo zpool set dedupditto=99 pp
    cannot set property for 'pp': property 'dedupditto'(14) not defined

Fix this by printing a meaningful error description for unsupported
'dedupditto' values.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8079

5 years agoztest: reduce gangblock creation
Brian Behlendorf [Mon, 5 Nov 2018 19:53:49 +0000 (11:53 -0800)]
ztest: reduce gangblock creation

In order to validate the gang block code ztest is configured to
artificially force a fraction of large blocks to be written as
gang blocks.  The default setting chosen for this was to
write 25% of all blocks 32k or larger using gang blocks.

The confluence of an unrealistically large number of gang blocks,
the aggressive fault injection done by ztest, and the split
segment reconstruction logic introduced by device removal has
resulted in the following type of failure:

  zdb -bccsv -G -d ... exit code 3

Specifically, zdb was unable to open the pool because it was
unable to reconstruct a damaged block.  Manual investigation
of multiple failures clearly showed that the block could be
reconstructed.  However, due to the large number of damaged
segments (>35) it could not be done in the allotted time.

Furthermore, the large number of gang blocks was determined
to be the reason for the unrealistically large number of
damaged segments.  In order to make this situation less
likely, this change both increases the forced gang block
size to 64k and reduces the frequency to 3% of blocks.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8080

5 years agoAdd libzutil for libzfs or libzpool consumers
Don Brady [Mon, 5 Nov 2018 19:22:33 +0000 (12:22 -0700)]
Add libzutil for libzfs or libzpool consumers

Adds a libzutil for utility functions that are common to libzfs and
libzpool consumers (most of what was in libzfs_import.c).  This
removes the need for utilities to link against both libzpool and
libzfs.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8050

5 years agoUpdate zfs-events.5 with info from PSARC 2009/497
Richard Elling [Thu, 1 Nov 2018 22:54:55 +0000 (15:54 -0700)]
Update zfs-events.5 with info from PSARC 2009/497

Update zfs-events.5 with info from PSARC 2009/497 regarding ereport fields.
Also updates ZIO_STAGE_* and ZIO_FLAG_* descriptions to match current source.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8057

5 years agoZTS: Fix posix ACL tests that should pass
Paul Zuchowski [Wed, 31 Oct 2018 23:58:43 +0000 (19:58 -0400)]
ZTS: Fix posix ACL tests that should pass

Make sure tests have proper include files.  Make sure underlying
"chmod" style permissions don't interfere with ACLs.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #8069

5 years agoZTS: change `$(cat)` to `$(<)` for speedup
George Melikov [Wed, 31 Oct 2018 17:00:06 +0000 (20:00 +0300)]
ZTS: change `$(cat)` to `$(<)` for speedup

It's better to use ksh/bash built in methods,
rather than spawn new processes every time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #8071

5 years agobpobj_enqueue_subobj() should copy small subobj's
Matthew Ahrens [Wed, 31 Oct 2018 16:58:17 +0000 (09:58 -0700)]
bpobj_enqueue_subobj() should copy small subobj's

When we delete a snapshot, we consolidate some bpobj's together because
we no longer need to keep their entries in separate buckets.  This is
done in constant time by including the "sub" bpobj by reference in the
parent bpobj.

After many snapshots have been deleted, we may have many sub-bpobj's.
Usually, most sub-bpobj's don't contain many BP's.  Compared to this
small payload, the sub-bpobj is relatively heavyweight since it is a
object in the MOS.  A common scenario on a long-lived pool is for the
vast majority of MOS objects to be small sub-bpobj's.

To improve this situation, when consolidating bpobj's together,
bpobj_enqueue_subobj() can copy the contents of small bpobj's into the
parent, and then delete the enqueued bpobj, rather than including it by
reference.  Since this copying is limited in size (to one block), the
consolidation is still constant time, though with a larger constant due
to reading in the one block of the enqueued bpobj.

This idea and mechanism are similar to how we handle "sub-subobj's".
When including a sub-bpobj by reference, if the sub-bpobj itself has
less than a block of sub-sub-bpobj's, the list of sub-sub-bpobj's is
copied to the parent bpobj's list of sub-bpobj's.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8053
Issue #7908

5 years agoLinux 4.20 compat: current_kernel_time()
Brian Behlendorf [Wed, 31 Oct 2018 16:50:42 +0000 (11:50 -0500)]
Linux 4.20 compat: current_kernel_time()

Commit torvalds/linux@976516404 removed the current_kernel_time()
function (and several others).  All callers are expected to use
current_kernel_time64().  Update the gethrestime_sec() wrapper
accordingly.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8074

5 years agoImprove snapshot listing error message
Md Islam [Tue, 30 Oct 2018 16:47:50 +0000 (12:47 -0400)]
Improve snapshot listing error message

Provide a hint in the error message if listing snapshots for a
single dataset fails.

Using -r is not needed to list all snapshots so requiring it when
listing snapshots for a single dataset makes it confusing. This
change will make the error message more clear.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Md Islam <mdnahian@outlook.com>
Closes #8047

5 years agozdb -k does not work on Linux when used with -e
Serapheim Dimitropoulos [Tue, 30 Oct 2018 16:46:18 +0000 (09:46 -0700)]
zdb -k does not work on Linux when used with -e

This minor bug was introduced with the port of the feature from
OpenZFS to ZoL. This patch fixes the issue that was caused by
a minor re-ordering from the original code.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8001

5 years agoAdded column definitions to arcstat.py
Gregor Kopka [Mon, 29 Oct 2018 23:18:20 +0000 (00:18 +0100)]
Added column definitions to arcstat.py

grow: ARC Grow enabled (!arc_no_grow)
free: ARC Free memory (arc_sys_free)
need: ARC Reclaim need (arc_need_free)

Fixed alignment issues (mread had wrong width).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #8058

5 years agoZTS: Fix auto_replace_001_pos test
Brian Behlendorf [Mon, 29 Oct 2018 20:05:14 +0000 (15:05 -0500)]
ZTS: Fix auto_replace_001_pos test

The root cause of these failures is that udev can notify the
ZED of newly created partition before its links are created.
Handle this by allowing an auto-replace to briefly wait until
udev confirms the links exist.

Distill this test case down to its essentials so it can be run
reliably.  What we need to check is that:

  1) A new disk, in the same physical location, is automatically
     brought online when added to the system,
  2) It completes the replacement process, and
  3) The pool is now ONLINE and healthy.

There is no need to remove the scsi_debug module.  After exporting
the pool the disk can be zeroed, removed, and then re-added to the
system as a new disk.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8051

5 years agoFix flake8 "invalid escape sequence 'x'" warning
Brian Behlendorf [Thu, 25 Oct 2018 06:26:08 +0000 (23:26 -0700)]
Fix flake8 "invalid escape sequence 'x'" warning

From, https://lintlyci.github.io/Flake8Rules/rules/W605.html

As of Python 3.6, a backslash-character pair that is not a valid
escape sequence now generates a DeprecationWarning. Although this
will eventually become a SyntaxError, that will not be for several
Python releases.

Note 'float_pobj' was simply removed from arcstat.py since it
was entirely unused.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8056

5 years agoRemove zfs_gitrev.h
Brian Behlendorf [Wed, 24 Oct 2018 21:48:14 +0000 (14:48 -0700)]
Remove zfs_gitrev.h

This generated file was accidentally included in previous commit,
80a91e7, and should not be included in the repository.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8054

5 years agoFix 2 small bugs with cached dsl_scan_phys_t
Tom Caputi [Tue, 23 Oct 2018 19:17:18 +0000 (15:17 -0400)]
Fix 2 small bugs with cached dsl_scan_phys_t

This patch corrects 2 small bugs where scn->scn_phys_cached was
not properly updated to match the primary copy when it needed to
be. The first resulted in the pause state not being properly
updated and the second resulted in the cached version being
completely zeroed even if the primary was not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix waiting in ztest_device_removal()
Tom Caputi [Mon, 22 Oct 2018 13:54:01 +0000 (09:54 -0400)]
Fix waiting in ztest_device_removal()

spa->spa_vdev_removal is created in a sync task that is initiated
via dsl_sync_task_nowait(). Since the task may not run before
spa_vdev_remove() returns, we must wait at least 1 txg to ensure
that the removal struct has been created.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix ENXIO from spa_ld_verify_logs() in ztest
Tom Caputi [Fri, 19 Oct 2018 22:10:23 +0000 (18:10 -0400)]
Fix ENXIO from spa_ld_verify_logs() in ztest

This patch fixes a small issue where the zil_check_log_chain()
code path would hit an EBUSY error. This would occur when
2 threads attempted to call metaslab_activate() at the same time.
In this case, the "loser" would receive an error code which should
have been ignored, but was instead floated to the caller. This
ended up resulting in an ENXIO being returned from from
spa_ld_verify_logs().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix ztest deadman panic with indirect vdev damage
Tom Caputi [Thu, 18 Oct 2018 20:53:27 +0000 (16:53 -0400)]
Fix ztest deadman panic with indirect vdev damage

This patch fixes an issue where ztest's deadman thread would
trigger a panic because reconstructing artifically damaged
blocks would take too long to reconstruct. This patch simply
limits how often ztest inflicts split-block damage and how
many segments it can damage when it does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix issue with scanning dedup blocks as scan ends
Tom Caputi [Thu, 18 Oct 2018 08:13:07 +0000 (04:13 -0400)]
Fix issue with scanning dedup blocks as scan ends

This patch fixes an issue discovered by ztest where
dsl_scan_ddt_entry() could add I/Os to the dsl scan queues
between when the scan had finished all required work and
when the scan was marked as complete. This caused the scan
to spin indefinitely without ending.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix lock inversion in txg_sync_thread()
Tom Caputi [Tue, 16 Oct 2018 21:00:55 +0000 (17:00 -0400)]
Fix lock inversion in txg_sync_thread()

This patch fixes a lock inversion issue in txg_sync_thread() where
the code would attempt hold the spa config lock as a reader while
holding tx->tx_sync_lock. This races with spa_vdev_remove() which
attempts to hold the tx->tx_sync_lock to assign a new tx (via
spa_history_log_internal()) while holding the spa config lock as a
writer.

Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix dbgmsg printing in ztest and zdb
Tom Caputi [Mon, 15 Oct 2018 19:14:22 +0000 (15:14 -0400)]
Fix dbgmsg printing in ztest and zdb

This patch resolves a problem where the -G option in both zdb and
ztest would cause the code to call __dprintf() to print zfs_dbgmsg
output. This function was not properly wired to add messages to the
dbgmsg log as it is in userspace and so the messages were simply
dropped. This patch also tries to add some degree of distinction to
dprintf() (which now prints directly to stdout) and zfs_dbgmsg()
(which adds messages to an internal list that can be dumped with
zfs_dbgmsg_print()).

In addition, this patch corrects an issue where ztest used a global
variable to decide whether to dump the dbgmsg buffer on a crash.
This did not work because ztest spins up more instances of itself
using execv(), which did not copy the global variable to the new
process. The option has been moved to the ztest_shared_opts_t
which already exists for interprocess communication.

This patch also changes zfs_dbgmsg_print() to use write() calls
instead of printf() so that it will not fail when used in a signal
handler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix ASSERT in zil_create() during ztest
Tom Caputi [Thu, 11 Oct 2018 20:38:27 +0000 (16:38 -0400)]
Fix ASSERT in zil_create() during ztest

This patch corrects an ASSERT in zil_create() that will only be
true if the call to zio_alloc_zil() does not fail.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoFix random ztest_deadman_thread failures
Tom Caputi [Wed, 10 Oct 2018 20:48:33 +0000 (16:48 -0400)]
Fix random ztest_deadman_thread failures

The zloop test has been failing in buildbot for the last few weeks
with various failures in ztest_deadman_thread(). This is due to the
fact that this thread is not stopped when performing pool import /
export tests as it should be. This patch simply corrects this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8010

5 years agoAllow use of pool GUID as root pool
George Melikov [Wed, 24 Oct 2018 03:06:40 +0000 (06:06 +0300)]
Allow use of pool GUID as root pool

It's helpful if there are pools with same names,
but you need to use only one of them.

Main case is twin servers, meanwhile some software
requires the same name of pools (e.g. Proxmox).

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Igor ‘guardian’ Lidin of Moscow, Russia
Closes #8052

5 years agoZTS: Update project quota tests
Brian Behlendorf [Wed, 24 Oct 2018 02:53:14 +0000 (19:53 -0700)]
ZTS: Update project quota tests

e2fsprogs v1.44.1, which provides lsattr, added a new attribute
for ext3 called "verity".  It is reported after the project quota
flag as a 'V' character in the `lsattr` output.

Update projectid_001_pos.ksh and projecttree_001_pos.ksh to use
a pattern which will match the expected output in both cases.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8043

5 years agoMake gitrev more reliable
Matthew Ahrens [Mon, 22 Oct 2018 19:23:30 +0000 (12:23 -0700)]
Make gitrev more reliable

In some build methods, the gitrev is unnecessarily set to "unknown".
We can improve this by changing the gitrev to use
`git describe --always --long --dirty`.

This gets the revision even when no tag matches (--always).  It prints
the hash even when it exactly matches a tag (--long).  And if there are
uncommitted changes, it appends "-dirty", rather than failing (--dirty).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8034

5 years agoOpenZFS 9688 - aggsum_fini leaks memory
Paul Dagnelie [Wed, 23 May 2018 17:32:31 +0000 (10:32 -0700)]
OpenZFS 9688 - aggsum_fini leaks memory

Porting Notes:
- Most of these fixes were applied in the original 37fb3e43
  commit when this change was ported for Linux.

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/9688
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/29bf2d68be
Closes #8042

5 years agoOpenZFS 9682 - page fault in dsl_async_clone_destroy() while opening pool
Serapheim Dimitropoulos [Sun, 17 Jun 2018 00:39:14 +0000 (17:39 -0700)]
OpenZFS 9682 - page fault in dsl_async_clone_destroy() while opening pool

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/9682
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ade2c82828
Closes #8037

5 years agoOpenZFS 9690 - metaslab of vdev with no space maps was flushed during removal
Serapheim Dimitropoulos [Tue, 15 May 2018 17:43:25 +0000 (10:43 -0700)]
OpenZFS 9690 - metaslab of vdev with no space maps was flushed during removal

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/9690
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4e75ba6826
Closes #8039

5 years agoOpenZFS 9681 - ztest failure in spa_history_log_internal due to spa_rename()
Matthew Ahrens [Fri, 18 May 2018 22:56:13 +0000 (15:56 -0700)]
OpenZFS 9681 - ztest failure in spa_history_log_internal due to spa_rename()

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/9681
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6aee0ad7
Closes #8041

5 years agoDefer new resilvers until the current one ends
Tom Caputi [Fri, 19 Oct 2018 04:06:18 +0000 (00:06 -0400)]
Defer new resilvers until the current one ends

Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @mmaybee
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7732

5 years agoOpenZFS 9862 - fix typo in comment in vdev_impl.h
Allan Jude [Thu, 18 Oct 2018 11:14:04 +0000 (14:14 +0300)]
OpenZFS 9862 - fix typo in comment in vdev_impl.h

Authored by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tony Hutter <hutter2@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/9862
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/84927f52
Closes #8036

5 years agoAllow copy-builtin to work with modified sources
Matthew Thode [Wed, 17 Oct 2018 19:06:05 +0000 (14:06 -0500)]
Allow copy-builtin to work with modified sources

`scripts/make_gitrev.sh` had 'set -e' so if any command failed it would
fail and cause copy-builtin to fail (copy-builtin also has `set -e`.
This commit also simplifies scripts/make_gitrev.sh to always write a
file by using a cleanup function.  It also simplifies other areas of
the script as well (making it much shorter).

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #8022
Closes #8025

5 years agozpool: allow sharing of spare device among pools
LOLi [Wed, 17 Oct 2018 18:21:07 +0000 (20:21 +0200)]
zpool: allow sharing of spare device among pools

ZFS allows, by default, sharing of spare devices among different pools;
this commit simply restores this functionality for disk devices and
adds an additional tests case to the ZFS Test Suite to prevent future
regression.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7999

5 years agoLinux does not HAVE_SMB_SHARE
Matthew Ahrens [Wed, 17 Oct 2018 17:31:38 +0000 (10:31 -0700)]
Linux does not HAVE_SMB_SHARE

Since Linux does not have an in-kernel SMB server, we don't need the
code to manage it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8032

5 years agoLinux does not HAVE_DNLC
Matthew Ahrens [Wed, 17 Oct 2018 17:30:08 +0000 (10:30 -0700)]
Linux does not HAVE_DNLC

Since Linux does not have the Directory Name Lookup Cache, we don't need
the code to manage it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8031

5 years agoAdvise users to retain issue/PR templates
bunder2015 [Wed, 17 Oct 2018 17:25:38 +0000 (13:25 -0400)]
Advise users to retain issue/PR templates

Occasionally we get issues and PRs from users who delete the
templates.  Advise users that their issues and PRs may be closed if
they do not fill out the templates as we really need this information.

Also updating PR template to drop unneeded approval toggle as we are
now using issue labels for status tracking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #8029

5 years agoAdd types to featureflags in zfs
Paul Dagnelie [Tue, 16 Oct 2018 18:15:04 +0000 (11:15 -0700)]
Add types to featureflags in zfs

The boolean featureflags in use thus far in ZFS are extremely useful,
but because they take advantage of the zap layer, more interesting data
than just a true/false value can be stored in a featureflag. In redacted
send/receive, this is used to store the list of redaction snapshots for
a redacted dataset.

This change adds the ability for ZFS to store types other than a boolean
in a featureflag. The only other implemented type is a uint64_t array.
It also modifies the interfaces around dataset features to accomodate
the new capabilities, and adds a few new functions to increase
encapsulation.

This functionality will be used by the Redacted Send/Receive feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7981

5 years agodeadlock between mm_sem and tx assign in zfs_write() and page fault
ilbsmart [Tue, 16 Oct 2018 18:11:24 +0000 (02:11 +0800)]
deadlock between mm_sem and tx assign in zfs_write() and page fault

The bug time sequence:
1. thread #1, `zfs_write` assign a txg "n".
2. In a same process, thread #2, mmap page fault (which means the
   `mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed,
   and wait previous txg "n" completed.
3. thread #1 call `uiomove` to write, however page fault is occurred
   in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by
   thread #2, so it stuck and can't complete,  then txg "n" will
   not complete.

So thread #1 and thread #2 are deadlocked.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Grady Wong <grady.w@xtaotech.com>
Closes #7939

5 years agoAdd zts-report.py to python shebang exclusion
Brian Behlendorf [Mon, 15 Oct 2018 23:59:28 +0000 (16:59 -0700)]
Add zts-report.py to python shebang exclusion

Include zts-report.py is the __brp_mangle_shebangs_exclude_from
to resolve build failures in Fedora 28 and newer.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8020
Issue #7360

5 years agoOpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979)
Matthew Ahrens [Fri, 12 Oct 2018 18:28:26 +0000 (11:28 -0700)]
OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979)

OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects

We're leaking the dd_clones objects in dsl_dir_destroy_sync.  This bug
appears to have been around forever.  Thankfully the amount of space
typically involved is tiny.

In addition this adds a mechanism in ZDB to find objects in the MOS
which are leaked (not referenced anywhere).

Porting notes:
* Added dd_crypto_obj to ZDB MOS object leak tracking

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>
OpenZFS-issue: https://illumos.org/issues/9847
Closes #7979

5 years agoImproved error handling for extreme rewinds
Brian Behlendorf [Tue, 9 Oct 2018 22:42:42 +0000 (15:42 -0700)]
Improved error handling for extreme rewinds

The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attemping to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool in import a far less likely, therefore
the behavior for call paths was not modified.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7809
Closes #7921

5 years agoRevert "Allow ECKSUM in vdev_checkpoint_sm_object()"
Brian Behlendorf [Tue, 9 Oct 2018 22:21:27 +0000 (15:21 -0700)]
Revert "Allow ECKSUM in vdev_checkpoint_sm_object()"

This reverts commit e927fc8a522e1c0db89955cc555841aa23bbd634.

Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7921

5 years agoDefine timestruc_t for Lustre compatibility
Tony Hutter [Fri, 12 Oct 2018 18:13:34 +0000 (11:13 -0700)]
Define timestruc_t for Lustre compatibility

Lustre 2.8 (and possibly other versions) are still using timestruc_t,
which was removed in spl-0.7.10 in favor of inode_timespec_t.  Add
in a backwards compatibility #define for timestruc_t so that Lustre
builds.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8014

5 years agoOpenZFS 9689 - zfs range lock code should not be zpl-specific
Matt Ahrens [Mon, 1 Oct 2018 22:13:12 +0000 (15:13 -0700)]
OpenZFS 9689 - zfs range lock code should not be zpl-specific

The ZFS range locking code in zfs_rlock.c/h depends on ZPL-specific
data structures, specifically znode_t.  However, it's also used by
the ZVOL code, which uses a "dummy" znode_t to pass to the range
locking code.

We should clean this up so that the range locking code is generic
and can be used equally by ZPL and ZVOL, and also can be used by
future consumers that may need to run in userland (libzpool) as
well as the kernel.

Porting notes:
* Added missing sys/avl.h include to sys/zfs_rlock.h.
* Removed 'dbuf is within the locked range' ASSERTs from dmu_sync().
  This was needed because ztest does not yet use a locked_range_t.
* Removed "Approved by:" tag requirement from OpenZFS commit
  check to prevent needless warnings when integrating changes
  which has not been merged to illumos.
* Reverted free_list range lock changes which were originally
  needed to defer the cv_destroy() which was called immediately
  after cv_broadcast().  With d2733258 this should be safe but
  if not we may need to reintroduce this logic.
* Reverts: The following two commits were reverted and squashed in
  to this change in order to make it easier to apply OpenZFS 9689.
  - d88895a0, which removed the dummy znode from zvol_state
  - e3a07cd0, which updated ztest to use range locks
* Preserved optimized rangelock comparison function.  Preserved the
  rangelock free list.  The cv_destroy() function will block waiting
  for all processes in cv_wait() to be scheduled and drop their
  reference.  This is done to ensure it's safe to free the condition
  variable.  However, blocking while holding the rl->rl_lock mutex
  can result in a deadlock on Linux.  A free list is introduced to
  defer the cv_destroy() and kmem_free() until after the mutex is
  released.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9689
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/680
External-issue: DLPX-58662
Closes #7980

5 years agoFix changelist mounted-dataset iteration
Alek P [Thu, 11 Oct 2018 04:13:13 +0000 (00:13 -0400)]
Fix changelist mounted-dataset iteration

Commit 0c6d093 caused a regression in the inherit codepath.
The fix is to restrict the changelist iteration on mountpoints and
add proper handling for 'legacy' mountpoints

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7988
Closes #7991

5 years agoCheck scheduler for "noop" before setting "noop"
Garrett Fields [Wed, 10 Oct 2018 15:46:22 +0000 (11:46 -0400)]
Check scheduler for "noop" before setting "noop"

Originally code only checked for presence of "/sys/block/$i/queue/
scheduler".  "sh: write error: Invalid argument" was produced when
trying to set "noop" on certain devices (eg. virtio) when it isn't
a listed option. This modification continues to check for the presence
of "/sys/block/$i/queue/scheduler" and also checks that it contains
"noop" as an option before setting "noop".

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #8004

5 years agoPrint "(repairing)" in zpool status again
Tony Hutter [Wed, 10 Oct 2018 03:30:32 +0000 (20:30 -0700)]
Print "(repairing)" in zpool status again

Historically, zpool status prints "(repairing)" for any drives that
have errors during a scrub:

        NAME            STATE     READ WRITE CKSUM
        mypool          ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            /tmp/file1  ONLINE      13     0     0  (repairing)
            /tmp/file2  ONLINE       0     0     0
            /tmp/file3  ONLINE       0     0     0

This was accidentally broken in "OpenZFS 9166 - zfs storage pool
checkpoint" (d2734cc).  This patch adds it back in.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7779
Closes #7978

5 years ago Refactor dmu_recv into its own file
Paul Dagnelie [Tue, 9 Oct 2018 21:05:13 +0000 (14:05 -0700)]
 Refactor dmu_recv into its own file

This change moves the bottom half of dmu_send.c (where the receive
logic is kept) into a new file, dmu_recv.c, and does similarly
for receive-related changes in header files.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7982

5 years agoFix arc_release() refcount
Brian Behlendorf [Mon, 8 Oct 2018 21:59:34 +0000 (14:59 -0700)]
Fix arc_release() refcount

Update arc_release to use arc_buf_size().  This hunk was accidentally
dropped when porting compressed send/recv, 2aa34383b.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8000

5 years agoAdd zfs_refcount_transfer_ownership_many()
Brian Behlendorf [Mon, 8 Oct 2018 21:58:21 +0000 (14:58 -0700)]
Add zfs_refcount_transfer_ownership_many()

When debugging is enabled and a zfs_refcount_t contains multiple holders
using the same key, but different ref_counts, the wrong reference_t may
be transferred.  Add a zfs_refcount_transfer_ownership_many() function,
like the existing zfs_refcount_*_many() functions, to match and transfer
the correct refcount_t;

This issue may occur when using encryption with refcount debugging
enabled.  An arc_buf_hdr_t can have references for both the
hdr->b_l1hdr.b_pabd and hdr->b_crypt_hdr.b_rabd both of which use
the hdr as the reference holder.  When unsharing the buffer the
p_abd should be transferred.

This issue does not impact production builds because refcount holders
are not tracked.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7219
Closes #8000

5 years agoCreate /proc/sys/kernel/spl/gitrev with git hash
Matthew Ahrens [Tue, 9 Oct 2018 04:57:02 +0000 (21:57 -0700)]
Create /proc/sys/kernel/spl/gitrev with git hash

The existing mechanisms for determining what code is running in the
kernel do not always correctly report the git hash.  The versions
reported there do not reflect changes made since `configure` was run
(i.e. incremental builds do not update the version) and they are
misleading if git tags are not set up properly.  This applies to
`modinfo zfs`, `dmesg`, and `/sys/module/zfs/version`.

There are complicated requirements on how the existing version is
generated.  Therefore we are leaving that alone, and adding a new
mechanism to record and retrieve the git hash:
`cat /proc/sys/kernel/spl/gitrev`

The gitrev is re-generated at compile time, when running `make`
(including for incremental builds).  The value is the output of `git
describe` (or "unknown" if not in a git repo or there are uncommitted
changes).

We're also removing /proc/sys/kernel/spl/version, which was never very
useful.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7931
Closes #7965

5 years agoOpenZFS 9617 - too-frequent TXG sync causes excessive write inflation
Matthew Ahrens [Tue, 12 Dec 2017 23:46:58 +0000 (15:46 -0800)]
OpenZFS 9617 - too-frequent TXG sync causes excessive write inflation

Porting notes:
* Renamed zfs_dirty_data_sync_pct to zfs_dirty_data_sync_percent and
  changed the type to be consistent with the other dirty module params.
* Updated zfs-module-parameters.5 accordingly.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9617
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7928f4ba
Closes #7976

5 years agoWarn if checking programs are not installed
Matthew Ahrens [Thu, 4 Oct 2018 20:11:45 +0000 (13:11 -0700)]
Warn if checking programs are not installed

`make checkstyle` silently skips checks if the required programs are not
installed (e.g. shellcheck, mandoc).  Therefore developers may not
realize that they are not getting the full suite of code checks.  This
also applies to more specific targets like `make shellcheck`.

We should print a warning message when a check is skipped due to missing
tools.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7984

5 years agoAdd codecheck make target
Matthew Ahrens [Thu, 4 Oct 2018 20:10:10 +0000 (13:10 -0700)]
Add codecheck make target

We'd like to have tooling that verifies code style, while ignoring the
commit message.  For example, code does not need to be signed off in
order to be tested.  Current workarounds are to run `git checkstyle` and
ignore the commit message errors, or to run `make cstyle shellcheck
flake8 mancheck testscheck`, and make sure that list stays updated.

Solution is to add a new make target, `codecheck` which does all the
code checks.  `checkstyle` is now simply `codecheck` + `commitcheck`.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7985

5 years agoFix ASSERT macros to not over-expand
Paul Dagnelie [Thu, 4 Oct 2018 03:16:45 +0000 (20:16 -0700)]
Fix ASSERT macros to not over-expand

The code reuse in the definitions of the ASSERT and VERIFY macros result
in expansion of their arguments before they are stringified, which
produces ugly and undesirable output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7884

5 years agoAdd new fnvlist_lookup_* functions
Paul Dagnelie [Wed, 3 Oct 2018 22:30:55 +0000 (15:30 -0700)]
Add new fnvlist_lookup_* functions

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7977

5 years agoVerify 'zfs destroy' will unshare the dataset
Prakash Surya [Fri, 21 Sep 2018 17:54:49 +0000 (10:54 -0700)]
Verify 'zfs destroy' will unshare the dataset

This change adds a new test case to the zfs-test suite to verify that
when 'zfs destroy' is used on a shared dataset, the dataset will be
unshared after the destroy operation completes.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #7941

5 years agoFix "zfs destroy" when "sharenfs=on" is used
Prakash Surya [Fri, 21 Sep 2018 15:47:42 +0000 (08:47 -0700)]
Fix "zfs destroy" when "sharenfs=on" is used

When using "zfs destroy" on a dataset that is using "sharenfs=on" and
has been automatically exported (by libzfs), the dataset will not be
automatically unexported as it should be. This workflow appears to have
been broken by this commit: 3fd3e56cfd543d7d7a1bf502bfc0db6e24139668

In that change, the "zfs_unmount" function was modified to use the
"mnt.mnt_special" field when determining the mount point that is being
unmounted, rather than "mnt.mnt_mountp".

As a result, when "mntpt" is passed into "zfs_unshare_proto", it's value
is now the dataset name rather than the mountpoint. Thus, when this
value is used with the "is_shared" function (via "zfs_unshare_proto") it
will not find a match (since that function assumes it'll be passed the
mountpoint) and incorrectly reports that the dataset is not shared.

This can be easily reproduced with the following commands:

    $ sudo zpool create tank xvdb
    $ sudo zfs create -o sharenfs=on tank/fish
    $ sudo zfs destroy tank/fish

    $ sudo zfs list -r tank
    NAME   USED  AVAIL  REFER  MOUNTPOINT
    tank  97.5K  7.27G    24K  /tank

    $ sudo exportfs
    /tank/fish      <world>
    $ sudo cat /etc/dfs/sharetab
    /tank/fish      -       nfs     rw,crossmnt

At this point, the "tank/fish" filesystem doesn't exist, but it's still
listed as exported when looking at "exportfs" and "/etc/dfs/sharetab".

Also note, this change brings us back in-sync with the illumos code, as
it pertains to this one line; on illumos, "mnt.mnt_mountp" is used.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Issue #6143
Closes #7941

5 years agoOpenZFS 9677 - panic from zio_write_gang_block()
Brad Lewis [Thu, 15 Mar 2018 19:47:26 +0000 (13:47 -0600)]
OpenZFS 9677 - panic from zio_write_gang_block()

Panic from zio_write_gang_block() when creating dump device
on fragmented rpool.

Authored by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9677
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7341a7d
Closes #7975

5 years agoOpenZFS 9616 - Bogus error when attempting to set property on read-only pool
Andrew Stormont [Sun, 17 Jun 2018 10:53:29 +0000 (11:53 +0100)]
OpenZFS 9616 - Bogus error when attempting to set property on read-only pool

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9616
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f62db44d
Closes #7974

5 years agoRefcounted DSL Crypto Key Mappings
Tom Caputi [Wed, 3 Oct 2018 16:47:11 +0000 (12:47 -0400)]
Refcounted DSL Crypto Key Mappings

Since native ZFS encryption was merged, we have been fighting
against a series of bugs that come down to the same problem: Key
mappings (which must be present during all I/O operations) are
created and destroyed based on dataset ownership, but I/Os can
have traditionally been allowed to "leak" into the next txg after
the dataset is disowned.

In the past we have attempted to solve this problem by trying to
ensure that datasets are disowned ater all I/O is finished by
calling txg_wait_synced(), but we have repeatedly found edge cases
that need to be squashed and code paths that might incur a high
number of txg syncs. This patch attempts to resolve this issue
differently, by adding a reference to the key mapping for each txg
it is dirtied in. By doing so, we can remove many of the
unnecessary calls to txg_wait_synced() we have added in the past
and ensure we don't need to deal with this problem in the future.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7949

5 years agoOpenZFS 9700 - ZFS resilvered mirror does not balance reads
Jerry Jelinek [Thu, 30 Aug 2018 13:13:30 +0000 (13:13 +0000)]
OpenZFS 9700 - ZFS resilvered mirror does not balance reads

Authored by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9700
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/82f63c3c
Closes #7973

5 years agoOpenZFS 9763 - zfs(1M): broken formatting in allow/unallow description
Yuri Pankov [Fri, 24 Aug 2018 00:36:04 +0000 (03:36 +0300)]
OpenZFS 9763 -  zfs(1M): broken formatting in allow/unallow description

Porting notes:
* Two of the three changes from the upstream patch were already
  applied for Linux.  Only the last one is required.

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9763
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8a702e55
Closes #7972

5 years agochangelist should be able to iter on mounts
Alek P [Tue, 2 Oct 2018 19:30:58 +0000 (15:30 -0400)]
changelist should be able to iter on mounts

Modified changelist_gather()ing for the mountpoint property.
Now instead of iterating on all dataset descendants, we read
/proc/self/mounts and iterate on the mounted descendant datasets only.

Switched changelist implementation from a uu_list_* to uu_avl_* in
order to  reduce changlist code-path's worst case time complexity.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7967

5 years agoZTS: Fix snapshot_009_pos, snapshot_010_pos
Brian Behlendorf [Tue, 2 Oct 2018 00:15:57 +0000 (17:15 -0700)]
ZTS: Fix snapshot_009_pos, snapshot_010_pos

Mitigate the likelihood of the newly created volumes being busy
when the 'zfs destroy -r' is issued by waiting for udev to settle.
Since this is not a iron clad fix I've added the test case to
the known list of possible failures and referenced issue #7961.

Finally, in the case this test does fail fix the cleanup logic
so subsequent tests won't incorrectly fail.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7961
Closes #7962

5 years agoPrefix all refcount functions with zfs_
Tim Schumacher [Mon, 1 Oct 2018 17:42:05 +0000 (19:42 +0200)]
Prefix all refcount functions with zfs_

Recent changes in the Linux kernel made it necessary to prefix
the refcount_add() function with zfs_ due to a name collision.

To bring the other functions in line with that and to avoid future
collisions, prefix the other refcount functions as well.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7963

5 years agoRemove duplicate macro in dsl_dir.h
Matthew Ahrens [Mon, 1 Oct 2018 17:40:11 +0000 (10:40 -0700)]
Remove duplicate macro in dsl_dir.h

The DD_FIELD_LAST_REMAP_TXG macro was added twice (with the same value).
This change removes one of them.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7968

5 years agoRefine split block reconstruction
Brian Behlendorf [Mon, 1 Oct 2018 17:36:34 +0000 (10:36 -0700)]
Refine split block reconstruction

Due to a flaw in 4589f3ae the number of unique combinations
could be calculated incorrectly.  This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.

This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.

The vdev_indirect_splits_damage() function was introduced
to validate both the enumeration and random reconstruction
logic with ztest.  It is implemented such it will never
make a known recoverable block unrecoverable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #6900
Closes #7934

5 years agoFixes for procfs files backed by linked lists
John Gallagher [Wed, 26 Sep 2018 18:08:12 +0000 (11:08 -0700)]
Fixes for procfs files backed by linked lists

There are some issues with the way the seq_file interface is implemented
for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool
debugging info):

* We don't account for the fact that seq_file sometimes visits a node
  multiple times, which results in missing messages when read through
  procfs.
* We don't keep separate state for each reader of a file, so concurrent
  readers will receive incorrect results.
* We don't account for the fact that entries may have been removed from
  the list between read syscalls, so reading from these files in procfs
  can cause the system to crash.

This change fixes these issues and adds procfs_list, a wrapper around a
linked list which abstracts away the details of implementing the
seq_file interface for a list and exposing the contents of the list
through procfs.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1211
Closes #7819

5 years agoFix flake 8 style warnings
Gregor Kopka [Wed, 26 Sep 2018 18:02:26 +0000 (20:02 +0200)]
Fix flake 8 style warnings

Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.
- try/except wrapping of configparser import as there are still
  python 2.7 systems that lack a compatibility shim

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7952

5 years agoLinux 4.19-rc3+ compat: Remove refcount_t compat
Tim Schumacher [Wed, 26 Sep 2018 17:29:26 +0000 (19:29 +0200)]
Linux 4.19-rc3+ compat: Remove refcount_t compat

torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.

Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.

Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7885
Closes #7932

5 years agoFix small sysfs leak
Brian Behlendorf [Wed, 26 Sep 2018 16:50:58 +0000 (09:50 -0700)]
Fix small sysfs leak

When zfs_kobj_init() is called with an attr_cnt of 0 only the
kobj->zko_default_attrs is allocated.  It subsequently won't
get freed in zfs_kobj_release since the free is wrapped in
a kobj->zko_attr_count != 0 conditional.

Split the block in zfs_kobj_release() to make sure the
kobj->zko_default_attrs are freed in this case.

Additionally, fix a minor spelling mistake and typo in
zfs_kobj_init() which could also cause a leak but in practice
is almost certain not to fail.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7957

5 years agoZpool iostat: remove latency/queue scaling
Gregor Kopka [Tue, 25 Sep 2018 23:29:16 +0000 (01:29 +0200)]
Zpool iostat: remove latency/queue scaling

Bandwidth and iops are average per second while *_wait are averages
per request for latency or, for queue depths, an instantaneous
measurement at the end of an interval (according to man zpool).

When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.

But applying the same math for the latter (*_wait latencies/queue) is
wrong as there is no interval_duration component in the values (these
are time/requests to get to average_time/request or already an
absulute number).

This bug leads to the only correct continuous *_wait figures for both
latencies and queue depths from 'zpool iostat -l/q' being with
duration=1 as then the wrong math cancels itself (x/1 is a nop).

This removes temporal scaling from latency and queue depth figures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7945
Closes #7694

5 years agoRevert "Fix flake 8 style warnings"
Brian Behlendorf [Tue, 25 Sep 2018 00:16:41 +0000 (17:16 -0700)]
Revert "Fix flake 8 style warnings"

This reverts commit b8fd4310c54444eecb66140d99a6156f4353b29b which
accidentally introduced a regression for some versions of python.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7929

5 years agoFix statfs(2) for 32-bit user space
Brian Behlendorf [Tue, 25 Sep 2018 00:11:25 +0000 (17:11 -0700)]
Fix statfs(2) for 32-bit user space

When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927
Closes #7122
Closes #7937

5 years agoZTS: Fix removal_resume_export
LOLi [Mon, 24 Sep 2018 19:58:16 +0000 (21:58 +0200)]
ZTS: Fix removal_resume_export

This change simplify the test case removing part of the logic which was
introducing a race condition and thus causing spurious failures: we use
attempt_during_removal() from removal.kshlib instead which has been
observed to be more stable.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7894
Closes #7913

5 years agoFix flake 8 style warnings
Gregor Kopka [Mon, 24 Sep 2018 17:12:59 +0000 (19:12 +0200)]
Fix flake 8 style warnings

Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7929

5 years agovdev_disk_error() prints ASCII SOH to debug log
LOLi [Fri, 21 Sep 2018 16:42:42 +0000 (18:42 +0200)]
vdev_disk_error() prints ASCII SOH to debug log

Currently vdev_disk_error() prepends its messages sent to the internal
ZFS debug log with KERN_WARNING, which is currently defined as follows:

   #define KERN_SOH      "\001"
   #define KERN_WARNING  KERN_SOH "4"

Since "\001" (ASCII Start Of Header) is not printable this results in
weird characters displayed when inspecting the debug log. This commit
simply removes this superfluous prefix passed to zfs_dbgmsg().

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7936

5 years agoFix reference to zpool-features(5)
DeHackEd [Fri, 21 Sep 2018 16:41:08 +0000 (12:41 -0400)]
Fix reference to zpool-features(5)

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: DHE <git@dehacked.net>
Closes #7938

5 years agoAdd limits to spa_slop_shift tunable
LOLi [Fri, 21 Sep 2018 04:10:12 +0000 (06:10 +0200)]
Add limits to spa_slop_shift tunable

This change adds limits to the possible spa_slop_shift values set via
the sysfs interface. Accepted values are from a minimum of 1 to a
maximum of 31 (inclusive): these limits are based on the following
values observed on a 128PB file-vdev test pool:

spa_slop_shift=1, spa_get_slop_space=63.5PiB
spa_slop_shift=2, spa_get_slop_space=31.8PiB
spa_slop_shift=3, spa_get_slop_space=15.9PiB
spa_slop_shift=4, spa_get_slop_space=7.9PiB
spa_slop_shift=5, spa_get_slop_space=4PiB
spa_slop_shift=6, spa_get_slop_space=2PiB
...
spa_slop_shift=25, spa_get_slop_space=4GiB
spa_slop_shift=26, spa_get_slop_space=2GiB
spa_slop_shift=27, spa_get_slop_space=1016MiB
spa_slop_shift=28, spa_get_slop_space=508MiB
spa_slop_shift=29, spa_get_slop_space=254MiB
spa_slop_shift=30, spa_get_slop_space=128MiB
spa_slop_shift=31, spa_get_slop_space=128MiB
spa_slop_shift=32, spa_get_slop_space=128MiB

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7876
Closes #7900