Matthew Ahrens [Mon, 3 Apr 2017 16:47:11 +0000 (09:47 -0700)]
OpenZFS 8375 - Kernel memory leak in nvpair code
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com> Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8375
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/843c211
Closes #6578
alaviss [Tue, 29 Aug 2017 17:17:49 +0000 (00:17 +0700)]
libtpool: don't clone affinity if not supported
pthread_attr_(get/set)affinity_np() is glibc-only. This commit
disable the code path that use those functions in non-glibc
system. Fixes the following when building with musl:
Richard Yao [Tue, 23 Sep 2014 18:29:30 +0000 (14:29 -0400)]
Implement --enable-debuginfo to force debuginfo
Inspection of a Ubuntu 14.04 x64 system revealed that the config file
used to build the kernel image differs from the config file used to
build kernel modules by the presence of CONFIG_DEBUG_INFO=y:
This in itself is insufficient to show that the kernel is built with
debuginfo, but a cursory analysis of the debuginfo provided and the
size of the kernel strongly suggests that it was built with
CONFIG_DEBUG_INFO=y while the modules were not. Installing
linux-image-$(uname -r)-dbgsym had no obvious effect on the debuginfo
provided by either the modules or the kernel.
The consequence is that issue reports from distributions such as Ubuntu
and its derivatives build kernel modules without debuginfo contain
nonsensical backtraces. It is therefore desireable to force generation
of debuginfo, so we implement --enable-debuginfo. Since the build system
can build both userspace components and kernel modules, the generic
--enable-debuginfo option will force debuginfo for both. However, it
also supports --enable-debuginfo=kernel and --enable-debuginfo=user for
finer grained control.
Enabling debuginfo for the kernel modules works by injecting
CONFIG_DEBUG_INFO=y into the make environment. This is enables
generation of debuginfo by the kernel build systems on all Linux
kernels, but the build environment is slightly different int hat
CONFIG_DEBUG_INFO has not been in the CPP. Adding -DCONFIG_DEBUG_INFO
would fix that, but it would also cause build failures on kernels where
CONFIG_DEBUG_INFO=y is already set. That would complicate its use in
DKMS environments that support a range of kernels and is therefore
undesireable. We could write a compatibility shim to enable
CONFIG_DEBUG_INFO only when it is explicitly disabled, but we forgo
doing that because it is unnecessary. Nothing in ZoL or the kernel uses
CONFIG_DEBUG_INFO in the CPP at this time and that is unlikely to
change.
Enabling debuginfo for the userspace components is done by injecting -g
into CPPFLAGS. This is not necessary because the build system honors the
environment's CPPFLAGS by appending them to the actual CPPFLAGS used,
but it is supported for consistency.
Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Closes #2734
Richard Yao [Tue, 23 Sep 2014 17:31:33 +0000 (13:31 -0400)]
Make --enable-debug fail when given bogus args
Currently, bogus options to --enable-debug become --disable-debug. That
means that passing --enable-debug=true is analogous to --disable-debug,
but the result is counterintuitive. We switch to AS_CASE to allow us to
fail when given a bogus option.
Also, we modify the text printed to clarify that --enable-debug enables
assertions.
Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Closes #2734
Matthew Ahrens [Tue, 29 Aug 2017 16:00:28 +0000 (09:00 -0700)]
Enhance comments for large dnode project
Fix a few nits in the comments from large dnodes. Also import
some of the commit message as a comment in the code, making
it more accessible.
Reviewed-by: @rottegift Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Matt Ahrens <mahrens@delphix.com>
Closes #6551
dbavatar [Thu, 24 Aug 2017 17:48:23 +0000 (13:48 -0400)]
Linux 4.8+ compatibility fix for vm stats
vm_node_stat must be used instead of vm_zone_stat. Unfortunately the
old code still compiles potentially leading to silent failure of
arc_evictable_memory()
AKAMAI: CR 3816601: Regression in zfs dropcache test
George Melikov [Thu, 24 Aug 2017 17:36:17 +0000 (20:36 +0300)]
Remove copyright duplicate in zpool man page
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6553
chrisrd [Thu, 24 Aug 2017 17:31:59 +0000 (03:31 +1000)]
dbuf_cons: deduplicate multilist_link_init()
Remove harmless duplicate multilist_link_init() introduced by
commit d3c2ae1.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6552
LOLi [Tue, 22 Aug 2017 18:53:40 +0000 (20:53 +0200)]
Add support for DMU_OTN_* types in dbufstat.py
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6535
vdev_queue:
- Track the last position of each vdev, including the io size,
in order to detect linear access of the following zio.
- Remove duplicate `vq_lastoffset`
vdev_mirror:
- Correctly calculate the zio offset (signedness issue)
- Deprecate `vdev_queue_register_lastoffset()`
- Add `VDEV_LABEL_START_SIZE` to zio offset of leaf vdevs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #6461
Brian Behlendorf [Mon, 21 Aug 2017 17:00:12 +0000 (10:00 -0700)]
zimport.sh: Allow custom pool create options
Allow custom options to be passed to 'zpool create` when creating
a new pool.
Normally zimport.sh is intented to prevent accidentally introduced
incompatibilities so we want the default behavior. However, when
introducing a known incompatibility with a feature flag we need a
way to disable the feature. By adding a line like the following
to the commit message the feature can be disabled allowing the
pool to be compatibile with older versions.
LOLi [Mon, 21 Aug 2017 16:31:54 +0000 (18:31 +0200)]
Disable mount(8) canonical paths in do_mount()
By default the mount(8) command, as invoked by 'zfs mount', will try
to resolve any path parameter in its canonical form: this could lead
to mount failures when the cwd contains a symlink having the same name
of the dataset being mounted.
Fix this by explicitly disabling mount(8) path canonicalization.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1791
Closes #6429
Closes #6437
LOLi [Mon, 21 Aug 2017 15:59:48 +0000 (17:59 +0200)]
Fix range locking in ZIL commit codepath
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
offset and length to the offset and length of the BIO from
zvol_write()->zvol_log_write(): these offset and length are later used
to take a range lock in zillog->zl_get_data function: zvol_get_data().
Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
offset 0: we will only be range-locking 0-4096. This means the
ASSERTion we make in dbuf_unoverride() is no longer valid because now
dmu_sync() is called from zilog's get_data functions holding a partial
lock on the dbuf.
Fix this by taking a range lock on the whole block in zvol_get_data().
LOLi [Thu, 17 Aug 2017 21:28:17 +0000 (23:28 +0200)]
Fix remounting snapshots read-write
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().
* Removed zfs-script-config.sh.in. When building 'make' generates
a common.sh with in-tree path information from the common.sh.in
template. This file and sourced by the test scripts and used
for in-tree testing, it is not included in the packages. When
building packages 'make install' uses the same template to
create a new common.sh which is appropriate for the packaging.
* Removed unused functions/variables from scripts/common.sh.in.
Only minimal path information and configuration environment
variables remain.
* Removed unused scripts from scripts/ directory.
* Remaining shell scripts in the scripts directory updated to
cleanly pass shellcheck and added to checked scripts.
* Renamed tests/test-runner/cmd/ to tests/test-runner/bin/ to
match install location name.
* Removed last traces of the --enable-debug-dmu-tx configure
options which was retired some time ago.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6509
Brian Behlendorf [Tue, 15 Aug 2017 23:40:04 +0000 (16:40 -0700)]
Fix ZTS grow_pool/setup
The addition of the large_dnode_008_pos test case, which runs
right before this one, exposed some racy behavior in grow_pool
setup.sh on the Ubuntu kmemleak builder. Before creating
partitions on a device destroying any existing ones.
ERROR: set_partition 1 100mb loop0 exited 1
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6499
Closes #6516
sckobras [Mon, 14 Aug 2017 22:18:26 +0000 (00:18 +0200)]
vdev_id: implement slot numbering by port id
With HPE hardware and hpsa-driven SAS adapters, only a single phy is
reported, but no individual per-port phys (ie. no phy* entry below
port_dir), which breaks topology detection in the current sas_handler
code. Instead, slot information can be derived directly from the port
number. This change implements a new slot keyword "port" similar to
"id" and "lun", and assumes a default phy/port of 0 if no individual
phy entry can be found. It allows to use the "sas_direct" topology with
current HPE Dxxxx and Apollo 45xx JBODs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Closes #6484
Don Brady [Mon, 14 Aug 2017 22:17:15 +0000 (18:17 -0400)]
Add corruption failure option to zinject(8)
Added a 'corrupt' error option that will flip a bit in the data
after a read operation. This is useful for generating checksum
errors at the device layer (in a mirror config for example). It
is also used to validate the diagnosis of checksum errors from
the zfs diagnosis engine.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@intel.com>
Closes #6345
Tom Caputi [Mon, 14 Aug 2017 17:36:48 +0000 (13:36 -0400)]
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
gaurkuma [Fri, 11 Aug 2017 15:56:24 +0000 (08:56 -0700)]
Allow longer SPA names in stats
The pool name can be 256 chars long. Today, in /proc/spl/kstat/zfs/
the name is limited to < 32 characters. This change is to allows
bigger pool names.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: gaurkuma <gauravk.18@gmail.com>
Closes #6481
Brian Behlendorf [Fri, 11 Aug 2017 15:51:44 +0000 (08:51 -0700)]
Simplify threads, mutexs, cvs and rwlocks
* Simplify threads, mutexs, cvs and rwlocks
* Update the zk_thread_create() function to use the same trick
as Illumos. Specifically, cast the new pthread_t to a void
pointer and return that as the kthread_t *. This avoids the
issues associated with managing a wrapper structure and is
safe as long as the callers never attempt to dereference it.
* Update all function prototypes passed to pthread_create() to
match the expected prototype. We were getting away this with
before since the function were explicitly cast.
* Replaced direct zk_thread_create() calls with thread_create()
for code consistency. All consumers of libzpool now use the
proper wrappers.
* The mutex_held() calls were converted to MUTEX_HELD().
* Removed all mutex_owner() calls and retired the interface.
Instead use MUTEX_HELD() which provides the same information
and allows the implementation details to be hidden. In this
case the use of the pthread_equals() function.
* The kthread_t, kmutex_t, krwlock_t, and krwlock_t types had
any non essential fields removed. In the case of kthread_t
and kcondvar_t they could be directly typedef'd to pthread_t
and pthread_cond_t respectively.
* Removed all extra ASSERTS from the thread, mutex, rwlock, and
cv wrapper functions. In practice, pthreads already provides
the vast majority of checks as long as we check the return
code. Removing this code from our wrappers help readability.
* Added TS_JOINABLE state flag to pass to request a joinable rather
than detached thread. This isn't a standard thread_create() state
but it's the least invasive way to pass this information and is
only used by ztest.
sanjeevbagewadi [Thu, 10 Aug 2017 22:53:40 +0000 (04:23 +0530)]
zio_dva_throttle_done() should allow zinjected ZIO
If fault injection is enabled, the ZIO_FLAG_IO_RETRY could be set by
zio_handle_device_injection() to generate the FMA events and update
stats. Hence, ignore the flag and process such zios.
A better fix would be to add another flag in the zio_t to indicate that
the zio is failed because of a zinject rule. However, considering the
fact that we do this in debug bits, we could do with the crude check
using the global flag zio_injection_enabled which is set to 1 when
zinject records are added.
* ztest.1 man page: fix typo
* zfs-module-parameters.5 man page: fix grammar
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #6492
The test case frequently hangs on buildbot
TEST builders. Disable it for now.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6487
OpenZFS provides a library called tpool which implements thread
pools for user space applications. Porting this library means
the zpool utility no longer needs to borrow the kernel mutex and
taskq interfaces from libzpool. This code was updated to use
the tpool library which behaves in a very similar fashion.
Porting libtpool was relatively straight forward and minimal
modifications were needed. The core changes were:
* Fully convert the library to use pthreads.
* Updated signal handling.
* lmalloc/lfree converted to calloc/free
* Implemented portable pthread_attr_clone() function.
Finally, update the build system such that libzpool.so is no
longer linked in to zfs(8), zpool(8), etc. All that is required
is libzfs to which the zcommon soures were added (which is the way
it always should have been). Removing the libzpool dependency
resulted in several build issues which needed to be resolved.
* Moved zfeature support to module/zcommon/zfeature_common.c
* Moved ratelimiting to to module/zfs/zfs_ratelimit.c
* Moved get_system_hostid() to lib/libspl/gethostid.c
* Removed use of cmn_err() in zcommon source
* Removed dprintf_setup() call from zpool_main.c and zfs_main.c
* Removed highbit() and lowbit()
* Removed unnecessary library dependencies from Makefiles
* Removed fletcher-4 kstat in user space
* Added sha2 support explicitly to libzfs
* Added highbit64() and lowbit64() to zpool_util.c
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6442
Ned Bass [Tue, 8 Aug 2017 15:41:31 +0000 (08:41 -0700)]
Add debug log entries for failed receive records
Log contents of a receive record if an error occurs while writing
it out to the pool. This may help determine the cause when backup
streams are rejected as invalid.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6465
When performing concurrent object allocations using the new
multi-threaded allocator and large dnodes it's possible to
allocate overlapping large dnodes.
This case should have been handled by detecting an error
returned by dnode_hold_impl(). But that logic only checked
the returned dnp was not-NULL, and the dnp variable was not
reset to NULL when retrying. Resolve this issue by properly
checking the return value of dnode_hold_impl().
Additionally, it was possible that dnode_hold_impl() would
misreport a dnode as free when it was in fact in use. This
could occurs for two reasons:
* The per-slot zrl_lock must be held over the entire critical
section which includes the alloc/free until the new dnode
is assigned to children_dnodes. Additionally, all of the
zrl_lock's in the range must be held to protect moving
dnodes.
* The dn->dn_ot_type cannot be solely relied upon to check
the type. When allocating a new dnode its type will be
DMU_OT_NONE after dnode_create(). Only latter when
dnode_allocate() is called will it transition to the new
type. This means there's a window when allocating where
it can mistaken for a free dnode.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6414
Closes #6439
Sen Haerens [Thu, 3 Aug 2017 16:56:15 +0000 (18:56 +0200)]
Fix zpool events scripted mode tab separator
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Sen Haerens <sen@senhaerens.be>
Closes #6444
Closes #6445
rsend tests in the test suite frequently create and
destroy datasets. It is possible for zfs destroy to
return an error code indicating the dataset is busy.
Simply use a log_must_busy in these cases to retry
destroying those datasets. Other fixes to rsend test
cases to avoid unmounting and remounting filesystems
and some cleanup.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6418
Ned Bass [Thu, 3 Aug 2017 04:16:12 +0000 (21:16 -0700)]
Use SET_ERROR for constant non-zero return codes
Update many return and assignment statements to follow the convention
of using the SET_ERROR macro when returning a hard-coded non-zero
value from a function. This aids debugging by recording the error
codes in the debug log.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6441
Tony Hutter [Wed, 2 Aug 2017 16:08:38 +0000 (09:08 -0700)]
Only record zio->io_delay on reads and writes
While investigating https://github.com/zfsonlinux/zfs/issues/6425 I
noticed that ioctl ZIOs were not setting zio->io_delay correctly. They
would set the start time in zio_vdev_io_start(), but never set the end
time in zio_vdev_io_done(), since ioctls skip it and go straight to
zio_done(). This was causing spurious "delayed IO" events to appear,
which would eventually get rate-limited and displayed as
"Missed events" messages in zed.
To get around the problem, this patch only sets zio->io_delay for read
and write ZIOs, since that's all we care about anyway.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6425
Closes #6440
At import time spa_import() calls zvol_create_minors() directly: with
the current implementation we have no way to avoid device node
creation when volmode=none.
Fix this by enforcing volmode=none directly in zvol_alloc().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6426
Test case zfs_send_007_pos regularly is killed
by test-runner during zfs-tests on buildbot. Disable
it for now until further investigation can be done.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6422
If we are in the middle of an incremental 'zfs receive', the child
.../%recv will exist. If we run 'zfs promote' .../%recv, it will "work",
but then zfs gets confused about the status of the new dataset.
Attempting to do this promote should be an error.
Similarly renaming .../%recv datasets should not be allowed.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4843
Closes #6339
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6409
Closes #6410
Ned Bass [Wed, 26 Jul 2017 06:09:48 +0000 (23:09 -0700)]
Add line info and SET_ERROR() to ZFS debug log
Redefine the SET_ERROR macro in terms of __dprintf() so the error
return codes get logged as both tracepoint events (if tracepoints are
enabled) and as ZFS debug log entries. This also allows us to use
the same definition of SET_ERROR() in kernel and user space.
Define a new debug flag ZFS_DEBUG_SET_ERROR=512 that may be bitwise
or'd into zfs_flags. Setting this flag enables both dprintf() and
SET_ERROR() messages in the debug log. That is, setting
ZFS_DEBUG_SET_ERROR and ZFS_DEBUG_DPRINTF|ZFS_DEBUG_SET_ERROR are
equivalent (this was done for sake of simplicity). Leaving
ZFS_DEBUG_SET_ERROR unset suppresses the SET_ERROR() messages which
helps avoid cluttering up the logs.
Remove the zfs_set_error_class tracepoints event class since
SET_ERROR() now uses __dprintf(). This sacrifices a bit of
granularity when selecting individual tracepoint events to enable but
it makes the code simpler.
Include file, function, and line number information in debug log
entries. The information is now added to the message buffer in
__dprintf() and as a result the zfs_dprintf_class tracepoints event
class was changed from a 4 parameter interface to a single parameter.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6400
Brian Behlendorf [Wed, 26 Jul 2017 01:57:00 +0000 (18:57 -0700)]
Fix zpool-features.5 indentation
The userobj_accounting feature described in the zpool-features.5
man page was incorrectly indented. Fix it.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6402
Ned Bass [Wed, 26 Jul 2017 01:52:40 +0000 (18:52 -0700)]
Some additional send stream validity checking
Check in the DMU whether an object record in a send stream being
received contains an unsupported dnode slot count, and return an
error if it does. Failure to catch an unsupported dnode slot count
would result in a panic when the SPA attempts to increment the
reference count for the large_dnode feature and the pool has the
feature disabled. This is not normally an issue for a well-formed
send stream which would have the DMU_BACKUP_FEATURE_LARGE_DNODE flag
set if it contains large dnodes, so it will be rejected as
unsupported if the required feature is disabled. This change adds a
missing object record field validation.
Add missing stream feature flag checks in
dmu_recv_resume_begin_check().
Consolidate repetitive comment blocks in dmu_recv_begin_check().
Update zstreamdump to print the dnode slot count (dn_slots) for an
object record when running in verbose mode.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6396
Brian Behlendorf [Tue, 25 Jul 2017 19:20:52 +0000 (12:20 -0700)]
Fix 'zpool clear' on suspended pools
'zpool clear' should be able to resume I/O on suspended, but otherwise
healthy, pools.
4a283c7 accidentally introduced a new code path where we call
txg_wait_synced() on the suspended pool before we had the chance to
resume I/O via zio_resume(): this results in the 'zpool clear'
command hanging indefinitely, waiting for a TXG that cannot be synced.
Fix this by avoiding the call to txg_wait_synced().
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6399
The previous autoconf test for the presence of super_setup_bdi_name()
uses an invocation with an incorrect type signature, producing a
warning by the compiler when the test is run. This gets elevated to an
error when compiling with -Werror=format-security, causing autoconf to
falsely infer super_setup_bdi_name() is not present. This updates the
testing code to match the invocation used in
include/linux/vfs_compat.h.
Olaf Faaland [Sat, 15 Jul 2017 01:15:00 +0000 (18:15 -0700)]
Report MMP_STATE_NO_HOSTID immediately
There is no need to perform the activity check before detecting that the
user must set the system hostid, because the pool's multihost property
is on, but spa_get_hostid() returned 0. The initial call to
vdev_uberblock_load() provided the information required.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6388
Olaf Faaland [Fri, 21 Jul 2017 00:54:26 +0000 (17:54 -0700)]
Add callback for zfs_multihost_interval
Add a callback to wake all running mmp threads when
zfs_multihost_interval is changed.
This is necessary when the interval is changed from a very large value
to a significantly lower one, while pools are imported that have the
multihost property enabled.
Without this commit, the mmp thread does not wake up and detect the new
interval until after it has waited the old multihost interval time. A
user monitoring mmp writes via the provided kstat would be led to
believe that the changed setting did not work.
Added a test in the ZTS under mmp to verify the new functionality is
working.
Added a test to ztest which starts and stops mmp threads, and calls into
the code to signal sleeping mmp threads, to test for deadlocks or
similar locking issues.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6387
Olaf Faaland [Fri, 14 Jul 2017 23:32:55 +0000 (16:32 -0700)]
Skip activity check for zhack RO import
"zhack feature stat" performs a read-only import, so the MMP activity
check is not necessary.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6388
Closes #6389
Olaf Faaland [Wed, 19 Jul 2017 01:11:08 +0000 (18:11 -0700)]
Add zgenhostid utility script
Turning the multihost property on requires that a hostid be set to allow
ZFS to determine when a foreign system is attemping to import a pool.
The error message instructing the user to set a hostid refers to
genhostid(1).
Genhostid(1) is not available on SUSE Linux. This commit adds a script
modeled after genhostid(1) for those users.
Zgenhostid checks for an /etc/hostid file; if it does not exist, it
creates one and stores a value. If the user has provided a hostid as an
argument, that value is used. Otherwise, a random hostid is generated
and stored.
This differs from the CENTOS 6/7 versions of genhostid, which overwrite
the /etc/hostid file even though their manpages state otherwise.
A man page for zgenhostid is added. The one for genhostid is in (1), but
I put zgenhostid in (8) because I believe it's more appropriate.
The mmp tests are modified to use zgenhostid to set the hostid instead
of using the spl_hostid module parameter. zgenhostid will not replace
an existing /etc/hostid file, so new mmp_clear_hostid calls are
required.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6358
Closes #6379
Olaf Faaland [Mon, 24 Jul 2017 15:48:28 +0000 (08:48 -0700)]
Release SCL_STATE in map_write_done()
The config lock must be held for the duration of the MMP write.
Since the I/Os are executed via map_nowait(), the done function
is the only place where we know the write has completed.
Since SCL_STATE is taken as reader, overlapping I/Os do not
create a deadlock. The refcount is simply increased when new
I/Os are queued and decreased when I/Os complete.
Test case added which exercises the probe IO call path to
verify the fix and prevent a regression.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6394
Olaf Faaland [Tue, 18 Jul 2017 18:43:55 +0000 (11:43 -0700)]
Revert Fix vdev_probe() call wrt SCL_STATE_ALL
This reverts commit cc9c6bc, which has been causing intermittent
test failures on buildbot. A correct fix for this locking issue
has been applied in a separate patch.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
In zed event test cases, a brief delay was introduced
to allow for events to make it to the zed log. On at least
one buildbot builder, the 1 second delay is not long enough.
Therefore, increasing the delay should ensure the zed has
more than enough time to write to its log.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6395
If we're creating a pool with version >= SPA_VERSION_DSL_SCRUB (v11)
we need to account for additional space needed by the origin dataset
which will also be snapshotted: "poolname"+"/"+"$ORIGIN"+"@"+"$ORIGIN".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6374
When replacing a disk with non-wholedisk spare, we shouldn't zero_label
it. The wholedisk case already skip it. In fact, zero_label function
will fail saying device busy because it's already opened exclusively,
but since there's no error checking, the replace command will succeed,
causing great confusion.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6369
zpool iostat/status -c is supposed to be restricted
by its search path, but currently isn't. To prevent
arbitrary scripts from being executed, disallow '/'
from commands.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6353
Closes #6359
Olaf Faaland [Mon, 24 Jul 2017 18:22:10 +0000 (11:22 -0700)]
Use correct macro for hz in mmp.c
Commit 379ca9c Multi-modifier protection (MMP) used HZ to convert
nanoseconds to ticks for use with cv_timedwait() and ddi_get_lbolt().
The correct macro is hz, which is defined within the SPL for kernel
space, and within zfs_context.h for user space.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6357
Closes #6360
CID 165755: Division or modulo by zero (DIVIDE_BY_ZERO)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6352
Use log_must_busy when destroying the snapshot
and dataset during cleanup in zfs_mount_001_neg.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6382
This change allows mountpoint_003_pos and send-c_props
to run on Linux kernels that do not support mandatory
locking. Linux kernel versions greater than or equal to
4.4 no longer support mandatory locking and the test
suite will now account for that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6346
Closes #6347
Closes #6362
Tony Hutter [Mon, 24 Jul 2017 17:58:14 +0000 (10:58 -0700)]
Add new fsck return code to zvol_misc_002_pos
zvol_misc_002_pos was failing on Fedora 26 because its newer version
of fsck was returning a different code than previous versions. The
new fsck error code is valid and is been added to the test in this
patch.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6350
OpenZFS 8491 - uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS
The zpool checkpoint feature in DxOS added a new field in the uberblock.
The Multi-Modifier Protection Pull Request from ZoL adds three new fields
in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
As these two changes come from two different sources and once upstreamed
and deployed will introduce an incompatibility with each other we want
to upstream a change that will reserve the padding for both of them so
integration goes smoothly and everyone gets both features.
Porting Notes: Preserved MMP comments in uberblock struct.
Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8491
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d84fa5f
Closes #6390
Ned Bass [Fri, 21 Jul 2017 00:04:35 +0000 (17:04 -0700)]
Minor fixes in zpool iostat -c documentation (#6370)
- Use nested [] notation to denote optional script list elements
- Fix space before comma after smarctl(8)
- Fix typo and formatting error in reference to -v option
- Fix spelling errors
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6370
George Melikov [Fri, 14 Jul 2017 16:34:35 +0000 (19:34 +0300)]
Fix coverity defects: CID 165757
CID 165757: Control flow issues (MISSING_BREAK)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6348
Brian Behlendorf [Wed, 12 Jul 2017 00:35:34 +0000 (20:35 -0400)]
Fix vdev_probe() call outside SCL_STATE_ALL lock
When an IO fails then zio_vdev_io_done() can call vdev_probe()
to determine the health of the vdev. This is safe as long as
the original zio was submitted with zio_wait() and holds the
SCL_STATE_ALL lock over the operation.
If zio_no_wait() was used then the done callback will submit
the probe IO outside the SCL_STATE_ALL lock and hit this
ASSERT in zio_create()
Resolve the issue by only allowing vdev_probe() to be called
when there's a waiter indicating the caller is using zio_wait().
This assumes that caller is still holding SCL_STATE_ALL.
This issue isn't MMP specific but was surfaced when testing.
Without this patch it can be reproduced by running:
zpool set multihost on <pool>
zinject -d <vdev> -e io -T write -f 50 <pool> -L uber
Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@intel.com>
Closes #745
Closes #6279
Olaf Faaland [Sat, 8 Jul 2017 03:20:35 +0000 (20:20 -0700)]
Multi-modifier protection (MMP)
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279
Olaf Faaland [Thu, 25 May 2017 20:32:06 +0000 (13:32 -0700)]
Make hostid consistent in user and kernel space
If no spl_hostid was set, and no /etc/hostid file existed, the user
and kernel would have different values for the hostid.
The kernel's would be 0. User space's would depend on the libc
implementation. On systems with glibc, it would be a generated value,
probably the first 4 bytes of an IP address (see man 3 gethostid and
comments above hostid_read in SPL for details).
This then causes the hostid stored in the labels and in the pool
config not to match the hostid userspace obtains from
get_system_hostid().
Since the kernel has no way to know the libc's generated hostid value,
it serves no purpose for ZFS to use the value.
This patch changes user space's get_system_hostid() to conform to the
kernel's method, first checking for the spl_hostid via sysfs, and then
reading from /etc/hostid directly.
It does not look up spl_hostid_path, because if that is set and the
file it pointed to exists, spl_hostid will reflect its contents.
It eliminates the call to libc's gethostid().
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279
Dave Eddy [Tue, 30 May 2017 18:39:17 +0000 (11:39 -0700)]
OpenZFS 6939 - add sysevents to zfs core for commands
Authored by: Dave Eddy <dave@daveeddy.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6939
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce1577b
Closes #6328
When VERIFY3_IMPL() was adjusted in 682ce104, the values of
the operands were omitted from the variadic arguments list.
This patch simply corrects this.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6343
The volmode property may be set to control the visibility of ZVOL
block devices.
This allow switching ZVOL between three modes:
full - existing fully functional behaviour (default)
dev - hide partitions on ZVOL block devices
none - not exposing volumes outside ZFS
Additionally the new zvol_volmode module parameter can be used to
control the default behaviour.
This functionality can be used, for instance, on "backup" pools to
avoid cluttering /dev with unneeded zd* devices.
Yuri Pankov [Tue, 13 Jun 2017 03:16:28 +0000 (20:16 -0700)]
OpenZFS 5428 - provide fts(), reallocarray(), and strtonum()
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
* All hunks unrelated to ZFS were dropped.
Exit test-runner with non-zero if tests are KILLED
fe46eeb introduced non-zero exit codes to test-runner.
A non-zero exit code should be returned when test-runner
decided to kill a test and mark it as KILLED.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6325
Commands should be eval()ed if they involve a shell redirection,
otherwise we end up writing log_* functions messages to the output.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6300
Closes #6323
Matthew Ahrens [Mon, 20 Mar 2017 22:38:11 +0000 (15:38 -0700)]
OpenZFS 8126 - ztest assertion failed in dbuf_dirty due to dn_nlevels changing
The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8126
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0ef125d
Closes #6314
Matthew Ahrens [Mon, 1 May 2017 18:06:07 +0000 (11:06 -0700)]
OpenZFS 8067 - zdb should be able to dump literal embedded block pointer
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8067
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085
Closes #6319
Antonio Russo [Fri, 7 Jul 2017 17:45:17 +0000 (13:45 -0400)]
Prevent dependencies on Debianized packages
Call dpkg-shlibdeps with arguments excluding the Debianized packages
lib{uutil1,nvpair1,zfs2,zpool2}linux from the auto-generated
dependencies of generated .debs. A shim dh_shlibdeps that calls the
real dh_shlibdeps with corresponding arguments is installed into a
temporary directory, which is in turn pre-pended to the PATH for the
alien call, working around alien's inability to directly alter the
dependencies of its output debs. Resolves #6106.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6309
Closes #6106
Illumos 4080 inadvertently allows 'zpool clear' on readonly pools: fix
this by reintroducing a check (POOL_CHECK_READONLY) in zfs_ioc_clear
registration code.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6306
Alek P [Fri, 7 Jul 2017 05:16:13 +0000 (22:16 -0700)]
Implemented zpool scrub pause/resume
Currently, there is no way to pause a scrub. Pausing may
be useful when the pool is busy with other I/O to preserve
bandwidth.
This patch adds the ability to pause and resume scrubbing.
This is achieved by maintaining a persistent on-disk scrub state.
While the state is 'paused' we do not scrub any more blocks.
We do however perform regular scan housekeeping such as
freeing async destroyed and deadlist blocks while paused.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com> Reviewed-by: Serapheim Dimitropoulos <serapheimd@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6167
On the single core machine the system may hang when the
spa_namespare_lock acquisition fails in the zvol_first_open
function. It returns -ERESTARTSYS error what causes the
endless loop in __blkdev_get function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #6283
Closes #6312
George Melikov [Wed, 5 Jul 2017 17:46:52 +0000 (20:46 +0300)]
ZTS: replace su commands by run_user function
Needed for PATH variable to be passed into su. The
posix* tests were fixed, but they need further investigation
before they can be enabled.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6303
Matthew Ahrens [Fri, 14 Apr 2017 19:59:18 +0000 (12:59 -0700)]
OpenZFS 8378 - crash due to bp in-memory modification of nopwrite block
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The problem is that zfs_get_data() supplies a stale zgd_bp to
dmu_sync(), which we then nopwrite against.
zfs_get_data() doesn't hold any DMU-related locks, so after it
copies db_blkptr to zgd_bp, dbuf_write_ready() could change
db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale BP and that the dbuf it not dirty,
so it is eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after
acquiring the db_mtx. We could still see a stale db_blkptr,
but if it is stale then the dirty record will still exist and
thus we won't attempt to nopwrite.
Andriy Gapon [Sat, 11 Mar 2017 18:26:47 +0000 (20:26 +0200)]
OpenZFS 7600 - zfs rollback should pass target snapshot to kernel
Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The existing kernel-side code only provides a method to rollback to a
latest snapshot, whatever it happens to be at the time when the rollback
is actually done. That could be unsafe or confusing in environments
where concurrent DSL changes are possible as the resulting state could
correspond to a newer or older snapshot than the originally requested
one.
This change allows to amend that method such that the rollback is
performed only when the latest snapshot has a specific name. That is,
if a new snapshot is concurrently created or the target snapshot is
destroyed, then no rollback is done and EXDEV error is returned.
New libzfs_core function lzc_rollback_to() is provided for the new
functionality. libzfs is changed to use lzc_rollback_to() to implement
zfs rollback command.
Perhaps we should return different errors to distinguish the case where
the desired snapshot exists but it's not the latest snapshot and the
case where the desired snapshot does not exist.
Andriy Gapon [Sat, 11 Mar 2017 17:48:35 +0000 (19:48 +0200)]
OpenZFS 7910 - l2arc_write_buffers() may write beyond target_sz
Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7910
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb6af4b
Closes #6291