Matthew Macy [Tue, 1 Oct 2019 23:35:05 +0000 (16:35 -0700)]
OpenZFS restructuring - arc_stats
Make arc_stats visible to platform code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9386
dacianstremtan [Tue, 1 Oct 2019 19:54:27 +0000 (15:54 -0400)]
Fix for zfs-dracut regression
Line 31 and 32 overwrote the ${root} variable which broke mount-zfs.sh
We have create a new variable for the dataset instead of overwriting the
${root} variable in zfs-load-key.sh${root} variable in zfs-load-key.sh
Reduce the time required for ./configure to perform the needed
KABI checks by allowing kbuild to compile multiple test cases in
parallel. This was accomplished by splitting each test's source
code from the logic handling whether that code could be compiled
or not.
By introducing this split it's possible to minimize the number of
times kbuild needs to be invoked. As importantly, it means all of
the tests can be built in parallel. This does require a little extra
care since we expect some tests to fail, so the --keep-going (-k)
option must be provided otherwise some tests may not get compiled.
Furthermore, since a failure during the kbuild modpost phase will
result in an early exit; the final linking phase is limited to tests
which passed the initial compilation and produced an object file.
Once everything has been built the configure script proceeds as
previously. The only significant difference is that it now merely
needs to test for the existence of a .ko file to determine the
result of a given test. This vastly speeds up the entire process.
New test cases should use ZFS_LINUX_TEST_SRC to declare their test
source code and ZFS_LINUX_TEST_RESULT to check the result. All of
the existing kernel-*.m4 files have been updated accordingly, see
config/kernel-current-time.m4 for a basic example. The legacy
ZFS_LINUX_TRY_COMPILE macro has been kept to handle special cases
but it's use is not encouraged.
master (secs) patched (secs)
------------- ----------------
autogen.sh 61 68
configure 137 24 (~17% of current run time)
make -j $(nproc) 44 44
make rpms 287 150
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8547
Closes #9132
Closes #9341
Prakash Surya [Tue, 1 Oct 2019 19:33:12 +0000 (12:33 -0700)]
Timeout waiting for ZVOL device to be created
We've seen cases where after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our applications gives up on waiting and reports an error.
The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.
When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:
That data is from a 10 second sample, using the BCC "funclatency" tool.
As we can see, in this 10 second sample, most calls took 128ms at a
minimum. Thus, some basic math tells us that in any 20 second interval,
we could only process at most about 150 removals, which is much less
than the 400+ that'll occur based on the workload.
As a result of this, and since all ZVOL minor operations will go through
the single threaded "z_zvol" taskq, the latency for creating a single
ZVOL device can be unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.
When profiling the "zvol_remove_minors_impl" function, I saw that most
of the time in the function was spent off-cpu, blocked in the function
"taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl"
will dispatch calls to "zvol_free" using the "system_taskq", and then
the "taskq_wait_outstanding" function is used to wait for all of those
dispatched calls to occur before "zvol_remove_minors_impl" will return.
As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_oustanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9380
Matthew Macy [Mon, 30 Sep 2019 19:16:06 +0000 (12:16 -0700)]
OpenZFS restructuring - zpool
Factor Linux specific functions out of the zpool command.
Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9333
Matthew Macy [Fri, 27 Sep 2019 17:46:28 +0000 (10:46 -0700)]
OpenZFS restructuring - zfs_ioctl
Refactor the zfs ioctls in to platform dependent and independent bits.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Signed-off-by: Matthew Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9301
Ben McGough [Thu, 26 Sep 2019 16:52:10 +0000 (09:52 -0700)]
Adding slack notifier
Allow ZED notification via slack incoming webhook.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Ben McGough <bmcgough@fredhutch.org>
Closes #9076
Closes #9350
Tom Caputi [Thu, 26 Sep 2019 00:02:33 +0000 (20:02 -0400)]
Fix encryption hierarchy issues with zfs recv -d
Currently, the recv_fix_encryption_hierarchy() function accepts
'destsnap' as one of its parameters. Originally, this was intended
to be the top-level dataset of a receive (whether or not the
receive was recursive). Unfortunately, this parameter actually is
simply the input that is passed in from the command line. When
the user specifies 'zfs recv -d', this string is actually only the
name of the receiving pool since the rest of the name is derived
from the send stream. This causes the function to fail, leaving
some datasets with an invalid encryption hierarchy.
This patch resolves this problem by passing in the top_zfs variable
instead. In order to make this work, this patch also includes some
changes that ensure the value is always present when we need it.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9273
Closes #9309
Ryan Moeller [Wed, 25 Sep 2019 18:42:04 +0000 (14:42 -0400)]
ZTS: Fix typos in zfs_destroy tests cleanup
lot_must -> log_must
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9362
Brian Behlendorf [Wed, 25 Sep 2019 16:24:45 +0000 (09:24 -0700)]
ZTS: harden xattr/cleanup.ksh
When the xattr/cleanup.ksh script is unable to remove the test group
due to an active process then it will not call default_cleanup. This
will result in a zvol_ENOSPC/setup failure when attempting to create
the /mnt/testdir directory which will already exist.
Resolve the issue by performing the default_cleanup before removing
the test user and group to ensure this step always happens. Also
allow one more retry to further minimize the likelihood of the
cleanup failing.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9358
Brian Behlendorf [Wed, 25 Sep 2019 16:23:29 +0000 (09:23 -0700)]
Add warning for zfs_vdev_elevator option removal
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.
This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
While well intentioned this introduced a bug which could cause a
system hang, that issue was subsequently fixed by commit 2a0d4188.
In order to avoid future bugs in this area, and to simplify the code,
this functionality is being deprecated. A console warning has been
added to notify any existing consumers and the documentation updated
accordingly. This option will remain for the lifetime of the 0.8.x
series for compatibility but if planned to be phased out of master.
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9317
Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try
to access its interior slots when dnodesize > sizeof(dnode_phys_t).
This is normally not an issue because the interior slots are
zero-filled, which report_dnode() handles calling
report_free_dnode_range(). However this is not the case for encrypted
large dnodes or filesystem using many SA based xattrs where the extra
data past the legacy dnode size boundary is interpreted as a
dnode_phys_t.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7678
Closes #8931
Closes #9343
Ryan Moeller [Sun, 22 Sep 2019 22:27:53 +0000 (18:27 -0400)]
Use signed types to prevent subtraction overflow
The difference between the sizes could be positive or negative. Leaving
the types as unsigned means the result overflows when the difference is
negative and removing the labs() means we'll have introduced a bug. The
subtraction results in the correct value when the unsigned integer is
interpreted as a signed integer by labs().
Clang doesn't see that we're doing a subtraction and abusing the types.
It sees the result of the subtraction, an unsigned value, being passed
to an absolute value function and emits a warning which we treat as an
error.
Reviewed by: Youzhong Yang <youzhong@gmail.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9355
Disabled resilver_defer feature leads to looping resilvers
When a disk is replaced with another on a pool with the resilver_defer
feature present, but not enabled the resilver activity restarts during
each spa_sync. This patch checks to make sure that the resilver_defer
feature is first enabled before requesting a deferred resilver.
This was originally fixed in illumos-joyent as OS-7982.
Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Signed-off-by: Kody A Kantor <kody@kkantor.com>
External-issue: illumos-joyent OS-7982
Closes #9299
Closes #9338
Ryan Moeller [Wed, 18 Sep 2019 16:05:57 +0000 (12:05 -0400)]
Refactor libzfs_error_init newlines
Move the trailing newlines from the error message strings to the format
strings to more closely match the other error messages.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9330
The was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.
In both cases, if ds1 was already present, then it would be first
replaced with ds2 and then ds would be replaced back with ds1.
Also, both cases did not properly handle a situation where both ds1 and
ds2 are already queued. A duplicate insertion would be attempted and
its failure would result in a panic.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9140
Closes #9163
Prevent gcc -Werror=maybe-uninitialized warnings in spa_wait_common()
This commit fixes the following build failure detected on Debian9
(GCC 6.3.0):
CC [M] module/zfs/spa.o
module/zfs/spa.c: In function ‘spa_wait_common.part.31’:
module/zfs/spa.c:9468:6: error: ‘in_progress’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (!in_progress || spa->spa_waiters_cancel || error)
^
cc1: all warnings being treated as errors
Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9326
ZTS: Fix /usr/bin/env: 'python2': No such file or directory
Since 4f342e45 env(1) must be able to find a "python2" executable in
the "constrained path" on systems configured with --with-python=2.x
otherwise the ZFS Test Suite won't be able to use Python scripts.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9325
Tom Caputi [Mon, 16 Sep 2019 17:07:33 +0000 (13:07 -0400)]
Fix clone handling with encryption roots
Currently, spa_keystore_change_key_sync_impl() does not recurse
into clones when updating encryption roots for either a call to
'zfs promote' or 'zfs change-key'. This can cause children of
these clones to end up in a state where they point to the wrong
dataset as the encryption root. It can also trigger ASSERTs in
some cases where the code checks reference counts on wrapping
keys. This patch fixes this issue by ensuring that this function
properly recurses into clones during processing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9267
Closes #9294
Scrubbing root pools may deadlock on kernels without elevator_change() (#9321)
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.
This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
Unfortunately changing the evelator via usermodehelper requires reading
some userland binaries, most notably modprobe(8) or sh(1), from a zfs
dataset on systems with root-on-zfs. This can deadlock the system if
used during the following call path because it may need, if the data
is not already cached in the ARC, reading directly from disk while
holding the spa config lock as a writer:
While the usermodehelper waits sh(1), modprobe(8) is blocked in the
ZIO pipeline trying to read from disk:
INFO: task modprobe:2650 blocked for more than 10 seconds.
Tainted: P OE 5.2.14
modprobe D 0 2650 206 0x00000000
Call Trace:
? __schedule+0x244/0x5f0
schedule+0x2f/0xa0
cv_wait_common+0x156/0x290 [spl]
? do_wait_intr_irq+0xb0/0xb0
spa_config_enter+0x13b/0x1e0 [zfs]
zio_vdev_io_start+0x51d/0x590 [zfs]
? tsd_get_by_thread+0x3b/0x80 [spl]
zio_nowait+0x142/0x2f0 [zfs]
arc_read+0xb2d/0x19d0 [zfs]
...
zpl_iter_read+0xfa/0x170 [zfs]
new_sync_read+0x124/0x1b0
vfs_read+0x91/0x140
ksys_read+0x59/0xd0
do_syscall_64+0x4f/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This commit changes how we use the usermodehelper functionality from
synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents
scrubs, and other vdev_elevator_switch() consumers, from triggering the
aforementioned issue.
John Gallagher [Sat, 14 Sep 2019 01:09:06 +0000 (18:09 -0700)]
Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.
This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:
- Scrubs or resilvers to complete
- Devices to initialized
- Devices to be replaced
- Devices to be removed
- Checkpoints to be discarded
- Background freeing to complete
For example, a scrub that is in progress could be waited for by running
zpool wait -t scrub <pool>
This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.
This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communiction primarily for the
sake of portability.
Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:
- Added ZoL-style ioctl input declaration.
- Reorganized error handling in zpool_initialize in libzfs to integrate
better with changes made for TRIM support.
- Fixed check for whether a checkpoint discard is in progress.
Previously it also waited if the pool had a checkpoint, instead of
just if a checkpoint was being discarded.
- Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
- Updated more existing tests to make use of new 'zpool wait'
functionality, tests that don't exist in Delphix OS.
- Used existing ZoL tunable zfs_scan_suspend_progress, together with
zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
- Added support for a non-integral interval argument to zpool wait.
Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
1. Fix issue: Kernel BUG with QAT during decompression #9276.
Now it is uninterruptible for a specific given QAT request,
but Ctrl-C interrupt still works in user-space process.
2. Copy the digest result to the buffer only when doing encryption,
and vise-versa for decryption.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Closes #9276
Closes #9303
Ryan Moeller [Thu, 12 Sep 2019 20:32:32 +0000 (16:32 -0400)]
Canonicalize Python shebangs
/usr/bin/env python3 is the suggested[1] shebang for Python in general
(likewise for python2) and is conventional across platforms. This eases
development on systems where python is not installed in /usr/bin
(FreeBSD for example) and makes it possible to develop in virtual
environments (venv) for isolating dependencies.
Many packaging guidelines discourage the use of /usr/bin/env, but since
this is the canonical way of writing shebangs in the Python community,
many packaging scripts are already equipped to handle substituting the
appropriate absolute path to python automatically.
Some RPM package builders lacking brp-mangle-shebangs need a small
fallback mechanism in the package spec to stamp the appropriate shebang
on installed Python scripts.
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9314
Matthew Macy [Thu, 12 Sep 2019 20:31:09 +0000 (13:31 -0700)]
Move objnode handling to common code
objnode is OS agnostic and used only by dmu_redact.c.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9315
Matthew Macy [Thu, 12 Sep 2019 20:28:26 +0000 (13:28 -0700)]
Enable compiler to typecheck logging
Annotate spa logging declarations with printflike
Workaround gcc bug (non disable-able warning) by
replacing "" with " "
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9316
Matthew Macy [Wed, 11 Sep 2019 21:25:53 +0000 (14:25 -0700)]
OpenZFS restructuring - move linux tracing code to platform directories
Move Linux specific tracing headers and source to platform directories
and update the build system.
Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9290
Tom Caputi [Wed, 11 Sep 2019 18:16:48 +0000 (14:16 -0400)]
Fix stalled txg with repeated noop scans
Currently, the DSL scan code figures out when it should suspend
processing and allow a txg to continue by calling the function
dsl_scan_check_suspend(). Unfortunately, this function only
allows the scan to suspend at a level 0 block. In the event that
the system is scanning a bunch of empty snapshots or a resilver
is running with a high enough scn_cur_min_txg, the scan will
stop processing each dataset at the root level, deciding it
has nothing left to do. This means that the check_suspend
function is never called and the txg remains stuck until a
dataset is found that has data to scan.
This patch fixes the problem by allowing scans to suspend at
the root level of the objset. For backwards compatibility, we
use the bookmark <objsetid, 0, 0, 0> when we suspend here so
that older versions of the code will work as intended.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9300
Brian Behlendorf [Wed, 11 Sep 2019 18:14:50 +0000 (11:14 -0700)]
kmodtool: depmod path
Determine the location of depmod on the system, either /sbin/depmod or
/usr/sbin/depmod. Then use that path when generating the specfile.
Additionally, update the Requires lines to reference the package which
provides depmod rather than the binary itself. For CentOS/RHEL 7+8
and all supported Fedora releases this is the kmod package, and for
CentOS/RHEL 6 it is the module-init-tools package.
Reviewed-by: Minh Diep <mdiep@whamcloud.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8724
Closes #9310
Brian Behlendorf [Wed, 11 Sep 2019 18:09:50 +0000 (11:09 -0700)]
copy-builtin: SPL must be in Kbuild first (again)
Commit bced7e3 accidentally reintroduced issue #7595 which was
previously addressed by 517d247. Re-apply the original fix to
resolve the issue and include a comment to make it clear the
ordering is important.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Matthew Thode <prometheanfire@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9302
Closes #9208
Brian Behlendorf [Tue, 10 Sep 2019 20:42:30 +0000 (13:42 -0700)]
Fix /etc/hostid on root pool deadlock
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify(). A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool. This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer. For example, to perform a
`zpool attach` as shown in the abbreviated stack below.
To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property. The cached
value is then relied upon for subsequent accesses.
Ryan Moeller [Tue, 10 Sep 2019 20:27:53 +0000 (16:27 -0400)]
Add/generalize abstractions in arc_summary3
Code for interfacing with procfs for kstats and tunables is Linux-
specific. A more generic interface can be used for the abstractions of
loading kstats and various tunable parameters, allowing other platforms
to implement the functions cleanly. In a similar vein, determining the
ZFS/SPL version can be abstracted away in order for other platforms to
provide their own implementations of this function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9279
Ryan Moeller [Tue, 10 Sep 2019 19:17:54 +0000 (15:17 -0400)]
Add/generalize abstraction in arc_summary2
A more generic interface can be used for the abstraction of loading
kstats, allowing other platforms to implement the function cleanly.
In a similar vein, loading tunables can be abstracted away in order for
other platforms to provide their own implementations of this function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9277
Brian Behlendorf [Tue, 10 Sep 2019 17:45:46 +0000 (10:45 -0700)]
Enable SIMD for encryption
When adding the SIMD compatibility code in e5db313 the decryption of a
dataset wrapping key was left in a user thread context. This was done
intentionally since it's a relatively infrequent operation. However,
this also meant that the encryption context templates were initialized
using the generic operations. Therefore, subsequent encryption and
decryption operations would use the generic implementation even when
executed by an I/O pipeline thread.
Resolve the issue by initializing the context templates in an I/O
pipeline thread. And by updating zio_do_crypt_uio() to dispatch any
encryption operations to a pipeline thread when called from the user
context. For example, when performing a read from the ARC.
Tested-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9215
Closes #9296
filetest_001_pos verifies that various checksum algorithms detect
corruption by overwriting the underlying vdev on which a file resides.
It is possible for the overwrite to miss the blocks of a file, causing a
spurious failure. This change introduces a function to corrupt the
individual blocks of a file as determined by zdb.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9288
Ryan Moeller [Mon, 9 Sep 2019 23:04:05 +0000 (19:04 -0400)]
Clean up do_vol_test in zfs_copies tests
Get rid of the `get_used_prop` function. `get_prop used` works fine.
Fix the comment describing the function parameters. The type does not
have a default, and mntp is also used for ext2.
Rename the variable for the number of copies from `copy` to `copies`.
Use a `case` statement to match the type parameter, order the cases
alphabetically, and add a little sanity checking for good measure.
Use eval to make sure the output of commands is silenced rather than
the log messages when redirecting output to /dev/null.
Simplify cases where zfs requires special behavior.
Don't allow the test to loop forever in the event space usage does not
change. Bail out of the loop and fail after an arbitrary number of
iterations.
Add more information to the log message when the test fails, to help
debugging.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9286
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9289
Matthew Macy [Fri, 6 Sep 2019 18:26:26 +0000 (11:26 -0700)]
OpenZFS restructuring - move platform specific sources
Move platform specific Linux source under module/os/linux/
and update the build system accordingly. Additional code
restructuring will follow to make the common code fully
portable.
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9206
Tom Caputi [Thu, 5 Sep 2019 23:22:05 +0000 (19:22 -0400)]
Fix noop receive of raw send stream
Currently, the noop receive code fails to work with raw send streams
and resuming send streams. This happens because zfs_receive_impl()
reads the DRR_BEGIN payload without reading the payload itself.
Normally, the kernel expects to read this itself, but in this case
the recv_skip() code runs instead and it is not prepared to handle
the stream being left at any place other than the beginning of a
record.
This patch resolves this issue by manually reading the DRR_BEGIN
payload in the dry-run case. This patch also includes a number of
small fixups in this code path.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9221
Closes #9173
Ryan Moeller [Thu, 5 Sep 2019 23:20:09 +0000 (19:20 -0400)]
Clean up zfs_clone_010_pos
Remove a lot of unnecessary setting and incrementing of `i`.
Remove unused variable `j`.
Instead of calling out to Python in a loop to generate the same string
repeatedly, generate the string once using shell constructs before
entering the loop.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9284
metaslab_verify_weight_and_frag() shouldn't cause side-effects
`metaslab_verify_weight_and_frag()` a verification function and
by the end of it there shouldn't be any side-effects.
The function calls `metaslab_weight()` which in turn calls
`metaslab_set_fragmentation()`. The latter can dirty and otherwise
not dirty metaslab fro the next TXGand set `metaslab_condense_wanted`
if the spacemaps were just upgraded (meaning we just enabled the
SPACEMAP_HISTOGRAM feature through upgrade).
This patch adds a new flag as a parameter to `metaslab_weight()` and
`metaslab_set_fragmentation()` making the dirtying of the metaslab
optional.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9185
Closes #9282
Ryan Moeller [Thu, 5 Sep 2019 16:51:59 +0000 (12:51 -0400)]
Refactor checksum operations in tests
md5sum in particular but also sha256sum to a lesser extent is used
in several areas of the test suite for computing checksums. The vast
majority of invocations are followed by `| awk '{ print $1 }'`.
Introduce functions to wrap up `md5sum $file | awk '{ print $1 }'` and
likewise for sha256sum. These also serve as a convenient interface for
alternative implementations on other platforms.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9280
Matthew Macy [Thu, 5 Sep 2019 16:34:54 +0000 (09:34 -0700)]
OpenZFS restructuring - move platform specific headers
Move platform specific Linux headers under include/os/linux/.
Update the build system accordingly to detect the platform.
This lays some of the initial groundwork to supporting building
for other platforms.
As part of this change it was necessary to create both a user
and kernel space sys/simd.h header which can be included in
either context. No functional change, the source has been
refactored and the relevant #include's updated.
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Matthew Macy <mmacy@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9198
Debian zfs-dkms package generated by alien doesn't call the prerm script
(rpm's %preun) with an integer as first parameter, which results in the
following warning when the package is uninstalled:
"zfs-dkms.prerm: line 3: [: remove: integer expression expected"
Modify the if-condition to avoid the warning.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9271
Ryan Moeller [Tue, 3 Sep 2019 19:44:08 +0000 (15:44 -0400)]
Use the right booleans
TRUE and FALSE happen to be defined, but we should use B_TRUE and
B_FALSE for the sake of consistency.
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9264
Pavel Zakharov [Tue, 3 Sep 2019 18:29:52 +0000 (14:29 -0400)]
zvol_wait script should ignore partially received zvols
Partially received zvols won't have links in /dev/zvol.
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #9260
Always refuse receving non-resume stream when resume state exists
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).
Additionally, distinguish incremental and resume streams in error
messages.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9252
Igor K [Tue, 3 Sep 2019 17:46:41 +0000 (20:46 +0300)]
ZTS: Fix removal_cancel.ksh
Create a larger file to extend the time required to perform the
removal. Occasional failures were observed due to the removal
completing before the cancel could be requested.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9259
George Wilson [Tue, 3 Sep 2019 02:17:51 +0000 (22:17 -0400)]
maxinflight can overflow in spa_load_verify_cb()
When running on larger memory systems, we can overflow the value of
maxinflight. This can result in maxinflight having a value of 0 causing
the system to hang.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #9272
Andrea Gelmini [Tue, 3 Sep 2019 01:17:39 +0000 (03:17 +0200)]
Fix typos
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9251
Andrea Gelmini [Tue, 3 Sep 2019 01:14:53 +0000 (03:14 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9250
Andrea Gelmini [Tue, 3 Sep 2019 01:13:19 +0000 (03:13 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9249
Andrea Gelmini [Tue, 3 Sep 2019 01:12:01 +0000 (03:12 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9247
Andrea Gelmini [Tue, 3 Sep 2019 01:10:31 +0000 (03:10 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9246
Andrea Gelmini [Tue, 3 Sep 2019 01:08:56 +0000 (03:08 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9244
Andrea Gelmini [Tue, 3 Sep 2019 01:07:35 +0000 (03:07 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9243
Andrea Gelmini [Tue, 3 Sep 2019 00:58:26 +0000 (02:58 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9242
Andrea Gelmini [Tue, 3 Sep 2019 00:56:41 +0000 (02:56 +0200)]
Fix typos in module/zfs/
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9240
Andrea Gelmini [Tue, 3 Sep 2019 00:53:27 +0000 (02:53 +0200)]
Fix typos in lib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9237
Andrea Gelmini [Fri, 30 Aug 2019 23:53:48 +0000 (01:53 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9248
Andrea Gelmini [Fri, 30 Aug 2019 23:52:00 +0000 (01:52 +0200)]
Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9245
Andrea Gelmini [Fri, 30 Aug 2019 21:32:18 +0000 (23:32 +0200)]
Fix typos in module/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9241
Andrea Gelmini [Fri, 30 Aug 2019 21:26:07 +0000 (23:26 +0200)]
Fix typos in modules/icp/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9239
Andrea Gelmini [Fri, 30 Aug 2019 16:53:15 +0000 (18:53 +0200)]
Fix typos in include/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9238
Andrea Gelmini [Fri, 30 Aug 2019 16:46:52 +0000 (18:46 +0200)]
Fix typos in etc/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9236
Andrea Gelmini [Fri, 30 Aug 2019 16:44:43 +0000 (18:44 +0200)]
Fix typos in contrib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9235
Andrea Gelmini [Fri, 30 Aug 2019 16:43:30 +0000 (18:43 +0200)]
Fix typos in cmd/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9234
Andrea Gelmini [Fri, 30 Aug 2019 16:41:35 +0000 (18:41 +0200)]
Fix typos in man/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9233
Andrea Gelmini [Fri, 30 Aug 2019 16:40:30 +0000 (18:40 +0200)]
Fix typos in config/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9232
Igor K [Fri, 30 Aug 2019 16:32:25 +0000 (19:32 +0300)]
Fix refquota_007_neg.ksh
Must use 'zfs' instead of '$ZFS' which is undefined.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9257
Paul Dagnelie [Fri, 30 Aug 2019 16:28:31 +0000 (09:28 -0700)]
Prevent metaslab_sync panic due to spa_final_dirty_txg
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being
exported, we can enter a situation that causes a kernel panic. Any metaslabs
that are loaded during the final dirty txg and haven't already been condensed
will cause metaslab_sync to proceed after the final dirty txg so that the
condense can be performed, which there are assertions to prevent. Because of
the nature of this issue, there are a number of ways we can enter this
state. Rather than try to prevent each of them one by one, potentially missing
some edge cases, we instead cut it off at the point of intersection; by
preventing metaslab_sync from proceeding if it would only do so to perform a
condense and we're past the final dirty txg, we preserve the utility of the
existing asserts while preventing this particular issue.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9185
Closes #9186
Closes #9231
Closes #9253
Ryan Moeller [Thu, 29 Aug 2019 20:11:29 +0000 (16:11 -0400)]
Simplify deleting partitions in libtest
Eliminate unnecessary code duplication. We can use a for-loop instead
of a while-loop. There is no need to echo $DISKSARRAY in a subshell or
return 0. Declare all variables with typeset.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9224
Ryan Moeller [Thu, 29 Aug 2019 18:03:09 +0000 (14:03 -0400)]
Use compatible arg order in tests
BSD getopt() and getopt_long() want options before arguments.
Reorder arguments to zfs/zpool in tests to put all the options first.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9228
Paul Dagnelie [Thu, 29 Aug 2019 17:20:36 +0000 (10:20 -0700)]
Keep more metaslabs loaded
With the other metaslab changes loaded onto a system, we can
significantly reduce the memory usage of each loaded metaslab and
unload them on demand if there is memory pressure. However, none
of those changes actually result in us keeping more metaslabs loaded.
If we don't keep more metaslabs loaded, we will still have to wait
for demand-loading to finish when no loaded metaslab can satisfy our
allocation, which can cause ZIL performance issues. In addition,
performance is traditionally measured by IOs per unit time, while
unloading is currently done on a txg-count basis. Txgs can take a
widely varying range of times, from tenths of a second to several
seconds. This can result in confusing, hard to predict behavior.
This change simply adds a time-based component to metaslab unloading.
A metaslab will remain loaded for one minute and 8 txgs (by default)
after it was last used, unless it is evicted due to memory pressure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-65016
External-issue: DLPX-65047
Closes #9197
Pavel Zakharov [Wed, 28 Aug 2019 22:02:58 +0000 (18:02 -0400)]
zfs_handle used after being closed/freed in change_one callback
This is a typical case of use after free. We would call zfs_close(zhp)
which would free the handle, and then call zfs_iter_children() on that
handle later. This change ensures that the zfs_handle is only closed
when we are ready to return.
Running `zfs inherit -r sharenfs pool` was failing with an error
code without any error messages. After some debugging I've pinpointed
the issue to be memory corruption, which would cause zfs to try to
issue an ioctl to the wrong device and receive ENOTTY.
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Issue #7967
Closes #9165
Tony Nguyen [Wed, 28 Aug 2019 21:56:54 +0000 (15:56 -0600)]
Use smaller default slack/delta value for schedule_hrtimeout_range()
For interrupt coalescing, cv_timedwait_hires() uses a 100us slack/delta
for calls to schedule_hrtimeout_range(). This 100us slack can be costly
for small writes.
This change improves small write performance by passing resolution `res`
parameter to schedule_hrtimeout_range() to be used as delta/slack. A new
tunable `spl_schedule_hrtimeout_slack_us` is added to preserve old
behavior when desired.
Performance observations on 8K recordsize filesystem:
- 8K random writes at 1-64 threads, up to 60% improvement for one thread
and smaller gains as thread count increases. At >64 threads, 2-5%
decrease in performance was observed.
- 8K sequential writes, similar 60% improvement for one thread and
leveling out around 64 threads. At >64 threads, 5-10% decrease in
performance was observed.
- 128K sequential write sees 1-5 for the 128K. No observed regression at
high thread count.
Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9217
Don Brady [Wed, 28 Aug 2019 17:44:46 +0000 (11:44 -0600)]
Tag ABD pages for exclusion in kernel crash dumps
Tag the ABD data pages so that they can be identified for exclusion
from kernel crash dumps. Eliminating the zfs file data allows for
significantly smaller crash dump files. Note that ZFS in illumos has
always excluded the zfs data pages from a kernel crash dump.
This change tags ARC scatter data pages so they can be identified from
the makedumpfile(8) command. That command is used to create smaller
dump files by ignoring some memory regions and using compression. It
already filters file data from the VFS page cache and will now be able
to exclude ZFS file data pages from the dump file.
A corresponding change to makeumpfile(8) is required to identify ZFS
data pages.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8899
Chunwei Chen [Wed, 28 Aug 2019 17:42:02 +0000 (10:42 -0700)]
Fix zil replay panic when TX_REMOVE followed by TX_CREATE
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.
We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
Ryan Moeller [Wed, 28 Aug 2019 17:38:40 +0000 (13:38 -0400)]
Prefer `for (;;)` to `while (TRUE)`
Defining a special constant to make an infinite loop is excessive,
especially when the name clashes with symbols commonly defined on
some platforms (ie FreeBSD).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9219
Andriy Gapon [Tue, 27 Aug 2019 20:45:53 +0000 (23:45 +0300)]
zfs_ioc_snapshot: check user-prop permissions on snapshotted datasets
Previously, the permissions were checked on the pool which was obviously
incorrect.
After this change, zfs_check_userprops() only validates the properties
without any permission checks. The permissions are checked individually
for each snapshotted dataset.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9179
Closes #9180
Richard Allen [Tue, 27 Aug 2019 20:44:02 +0000 (21:44 +0100)]
Fix Plymouth passphrase prompt in initramfs script
Entering the ZFS encryption passphrase under Plymouth wasn't working
because in the ZFS initrd script, Plymouth was calling zfs via
"--command", which wasn't passing through the filesystem argument to
zfs load-key properly (it was passing through the single quotes around
the filesystem name intended to handle spaces literally,
which zfs load-key couldn't understand).
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Garrett Fields <ghfields@gmail.com> Signed-off-by: Richard Allen <belperite@gmail.com>
Issue #9193
Closes #9202
Tom Caputi [Tue, 27 Aug 2019 16:55:51 +0000 (12:55 -0400)]
Fix deadlock in 'zfs rollback'
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a a part of the suspended fs a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.
This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.
Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9203
Ryan Moeller [Mon, 26 Aug 2019 18:48:31 +0000 (14:48 -0400)]
Restore :: in Makefile.am
The double-colon looked like a typo, but it's actually an obscure
feature. Rules with :: may appear multiple times and are run
independently of one another in the order they appear. The use of ::
for distclean-local was conventional, not accidental.
Add comments to indicate the intentional use of double-colon rules.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9210
Ryan Moeller [Mon, 26 Aug 2019 01:30:39 +0000 (21:30 -0400)]
Split argument list, satisfy shellcheck SC2086
Split the arguments for ${TEST_RUNNER} across multiple lines for
clarity. Also added quotes in the message to match the invoked command.
Unquoted variables in argument lists are subject to splitting. In this
particular case we can't quote the variable because it is an optional
argument. Use the method suggested in the description linked below,
instead.
The technique is to use an unquoted variable with an alternate value.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9212
Brian Behlendorf [Fri, 23 Aug 2019 00:37:48 +0000 (17:37 -0700)]
ZTS: Fix in-tree dbufstats test case
Commit a887d653 updated the dbufstats such that escalated privileges
are required. Since all tests under cli_user are run with normal
privileges move this test case to a location where it will be run
required privileges.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9118
Closes #9196
Ryan Moeller [Fri, 23 Aug 2019 00:26:51 +0000 (20:26 -0400)]
Make slog test setup more robust
The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9194
Brian Behlendorf [Thu, 22 Aug 2019 17:36:57 +0000 (10:36 -0700)]
ZTS: Use decimal values when setting tunables
The mdb_set_uint32 function requires that the values passed in be
decimal. This was overlooked initially because the matching Linux
function accepts both decimal and hexadecimal values.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9125
Closes #9195
Brian Behlendorf [Thu, 22 Aug 2019 16:53:45 +0000 (09:53 -0700)]
Fix automake program name transformations (#9190)
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.