Tony Hutter [Fri, 21 Apr 2017 16:27:04 +0000 (09:27 -0700)]
Prebaked scripts for zpool status/iostat -c
This patch updates the "zpool status/iostat -c" commands to only run
"pre-baked" scripts from the /etc/zfs/zpool.d directory (or wherever
you install to). The scripts can only be run from -c as an unprivileged
user (unless the ZPOOL_SCRIPTS_AS_ROOT environment var is
set by root). This was done to encourage scripts to be written in such
a way that normal users can use them, and to be cautious. If your
script needs to run a privileged command, consider adding the
appropriate line in /etc/sudoers. See zpool(8) for an example of how
to do this.
The patch also allows the scripts to output custom column names. If
the script outputs a line like:
name=value
then "name" is used for the column name, and "value" is its value.
Multiple columns can be specified by outputting multiple lines. Column
names and values can have spaces. If the value is empty, a dash (-) is
printed instead.
After all the "name=value" lines are read (if any), zpool will take the
next line of output (if any) and print it without a column
header. After that, no more lines will be processed. This can be
useful for printing errors.
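A minimal C sketch of the parsing rules described above, assuming the script's output has already been captured into a writable buffer. The names `struct script_col`, `parse_script_output()` and `MAX_COLS` are hypothetical and not taken from the zpool source; the column cap in particular is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COLS 16	/* hypothetical cap on columns per script run */

struct script_col {
	const char *name;	/* text before the first '=' */
	const char *value;	/* text after it, or "-" if empty */
};

/*
 * Parse script output in place. Leading "name=value" lines become columns;
 * the first line without '=' is returned in *trailer (to be printed without
 * a column header), and nothing after it is examined.
 */
static int
parse_script_output(char *buf, struct script_col *cols, int maxcols,
    const char **trailer)
{
	int n = 0;
	char *line = buf;

	*trailer = NULL;
	while (line != NULL && *line != '\0') {
		char *next = strchr(line, '\n');
		char *eq;

		if (next != NULL)
			*next++ = '\0';		/* terminate line, advance */

		eq = strchr(line, '=');
		if (eq == NULL) {
			*trailer = line;	/* headerless line; stop */
			break;
		}
		if (n == maxcols)
			break;			/* sketch: drop extra columns */

		*eq = '\0';
		cols[n].name = line;
		cols[n].value = (eq[1] != '\0') ? eq + 1 : "-";
		n++;
		line = next;
	}
	return (n);
}
```

For example, the output lines `enc=A1`, `slot=` and `error: no such device` yield two columns (`slot` shown as `-`) followed by the headerless error line; any later lines are ignored.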
Lastly, this patch also disables the -c option with the latency and
request size histograms, since it produced awkward output and made the
code harder to maintain.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5852
Ned Bass [Thu, 20 Apr 2017 19:10:55 +0000 (12:10 -0700)]
vdev_id: fix failure due to multipath -l bug
Udev may fail to create the expected symbolic links in
/dev/disk/by-vdev on systems with the
device-mapper-multipath-0.4.9-100.el6 package installed. This affects
RHEL 6.9 and possibly other downstream distributions.
That version of the multipath command may incorrectly list a drive
state as "unknown" instead of "running". The issue was introduced
in the patch for https://bugzilla.redhat.com/show_bug.cgi?id=1401769
The vdev_id udev helper uses the state reported by "multipath -l" to
detect an online component disk of a multipath device in order to
resolve its physical slot and enclosure. Changing the command
invocation to "multipath -ll" works around the above issue by causing
multipath to consult additional sources of information to determine
the drive state.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6039
This lets users create a bookmark from the command line by its name
only, without the need to specify the dataset path, which is extracted
from the snapshot parameter.
George Melikov [Thu, 20 Apr 2017 19:05:39 +0000 (23:05 +0400)]
zfs_receive_010_pos: change dd arguments
The `dd` command as written will not create a hole in the file.
Additionally, the `stride` argument isn't understood by `dd`, so
it's replaced with `seek`, which isn't equivalent but will result in
a single hole, which is sufficient for the test case. Finally,
`conv=notrunc` is added to avoid truncating the file.
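For readers less familiar with `dd`, a rough C analogue of what `seek=` combined with `conv=notrunc` does: the file is not truncated, and data is written at an offset beyond its current end, so the skipped range is never written and typically becomes a hole on file systems that support sparse files. The helper `write_at_offset()` is hypothetical; this is not code from the test suite.

```c
#define _XOPEN_SOURCE 700	/* pwrite(), fileno() */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at offset without truncating the file first (the
 * effect of dd's seek= plus conv=notrunc). Returns the resulting file
 * size, or -1 on error.
 */
static off_t
write_at_offset(int fd, const void *buf, size_t len, off_t offset)
{
	struct stat st;

	if (pwrite(fd, buf, len, offset) != (ssize_t)len)
		return (-1);
	if (fstat(fd, &st) != 0)
		return (-1);
	return (st.st_size);
}
```

Writing one 4 KiB block at offset 0 and another at 1 MiB leaves the range in between unwritten; the file size becomes 1 MiB + 4 KiB even though only 8 KiB of data was written.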
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6023
Tim Crawford [Wed, 19 Apr 2017 23:36:32 +0000 (19:36 -0400)]
Fix leak in send_iterate_fs
Fix a leak when generating a replication stream of a cloned dataset.
Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tim Crawford <tcrawford@datto.com>
Closes #6034
Richard Yao [Thu, 13 Apr 2017 21:28:46 +0000 (14:28 -0700)]
OpenZFS 6392 - zdb: introduce -V for verbatim import
Authored by: Richard Yao <ryao@gentoo.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Porting Notes:
This was already implemented in ZFS on Linux. This patch
resolves the deltas present in our version.
Ensure `zinject -c all` gets called whenever
zpool_scrub_004_pos exits.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Issue #5444
Closes #6021
George Melikov [Tue, 18 Apr 2017 16:44:17 +0000 (20:44 +0400)]
zfstest: add dmesg command to $PATH
Error example in `zfs_list_007_pos`:
`sudo: dmesg: command not found`
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6024
DHE [Sun, 26 Mar 2017 02:36:28 +0000 (22:36 -0400)]
Increase zfs_vdev_async_write_min_active to 2
Resilver operations frequently cause only a small amount of dirty data
to be written to disk at a time, causing the IO scheduler to issue
only 1 write at a time to the resilvering disk. On rotational
media, the drive will often travel past the next sector to be written
before receiving a write command from ZFS, significantly delaying the
write of the next sector.
Raise zfs_vdev_async_write_min_active so that drives are kept fed
during resilvering.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net>
Issue #4825
Closes #5926
Matthew Ahrens [Thu, 13 Apr 2017 21:38:16 +0000 (14:38 -0700)]
OpenZFS 8061 - sa_find_idx_tab can be declared more type-safely
Authored by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
sa_find_idx_tab() is declared as taking and returning "void *" parameters.
These can be declared to be the specific types.
Alan Somers [Thu, 13 Apr 2017 21:22:32 +0000 (14:22 -0700)]
OpenZFS 7900 - zdb shouldn't print the path of a znode at verbosity < 5
Authored by: Alan Somers <asomers@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
There are two reasons:
1) Finding a znode's path is slower than printing any other znode
information at verbosity < 5.
2) On a corrupted pool like the one mentioned below, zdb will crash when it
tries to determine the znode's path. But with this patch, zdb can still
extract useful information from such pools.
OpenZFS 6101 - attempt to lzc_create() a filesystem under a volume results in a panic
Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
When querying ZPL properties verify that the objset is of type
DMU_OST_ZFS.
Forced flushing of txgs can be painfully slow when competing for disk
IO, since this is a process meant to execute asynchronously. Optimize
this path by allowing data/hole seeking if the file is clean, and
falling back to the old logic if it is dirty. This is a compromise
short of disabling the feature entirely.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Closes #4306
Closes #5962
Brian Behlendorf [Thu, 13 Apr 2017 16:40:56 +0000 (09:40 -0700)]
OpenZFS 6410 - teach zdb to perform object lookups by path
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Replaced zdb.8 with upstream mdoc zdb.1m version. Updated to
include Linux specific features: -V verbatim imports and
improved label printing (-u, and -l).
- Minor changes to `zdb -h` output to honor 80 character limit.
Brian Behlendorf [Thu, 13 Apr 2017 16:40:00 +0000 (09:40 -0700)]
OpenZFS 5120 - zfs should allow large block/gzip/raidz boot pool (loader project)
Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Don Brady <don.brady@intel.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- grub-2.02-beta2-422-gcad5cc0 includes support for large blocks.
- Commit 8aab121 allowed GZIP[1-9].
- Grub allows pools with multiple top-level vdevs.
Be sure to invalidate a vdev's cache before performing
a zpool labelclear. There are cases where the cache is
stale because we did some operation that bypassed it,
and since we are doing an open with only O_RDWR, we
should invalidate it to be safe.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6009
Brian Behlendorf [Wed, 12 Apr 2017 20:36:48 +0000 (13:36 -0700)]
OpenZFS 7503 - zfs-test should tail ::zfs_dbgmsg on test failure
Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Enable internal log for DEBUG builds and in zfs-tests.sh.
- callbacks/zfs_dbgmsg.ksh - Dump internal log via kstat.
- callbacks/zfs_dmesg.ksh - Dump dmesg log.
- default.cfg - 'Test Suite Specific Commands' dropped.
Richard Yao [Sat, 8 Apr 2017 16:51:04 +0000 (12:51 -0400)]
Fix header inclusions for standards conformance
musl's sys/errno.h is literally:
#warning redirecting incorrect #include <sys/errno.h> to <errno.h>
#include <errno.h>
It does the same for sys/{poll,signal}.h. This is rather noisy when
building ZoL against musl. musl is also correct in pointing out that the
correct headers are outside of sys/ according to the Single UNIX
Specification:
Let's implement our own sys/* versions of these headers to redirect to
the proper userland ones when building in userspace. That will silence
the warning.
There are also some instances where we include incorrectly from sys/ or
from outside of sys/ in userspace-only code. In these instances, let's
just fix the includes directly.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993
Richard Yao [Sun, 9 Apr 2017 19:00:03 +0000 (15:00 -0400)]
Fix `zpool iostat -T d 1` on musl
When building on Gentoo against musl, GCC complains:
timestamp.c: In function ‘print_timestamp’:
timestamp.c:32:19: warning: passing argument 1 of ‘nl_langinfo’ makes
integer from pointer without a cast
#define _DATE_FMT "%+"
^
timestamp.c:47:21: note: in expansion of macro ‘_DATE_FMT’
fmt = nl_langinfo(_DATE_FMT);
^
The error was wrapped to meet comment style requirements.
This code is used by `zpool iostat -T d 1` to print a date and upon
testing it, I see no date printed. Let's use D_T_FMT so that something
gets printed and if D_T_FMT is not available, then we can fall back to
"%+".
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993
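A minimal sketch of the D_T_FMT-with-fallback idea from the commit above, assuming the date is rendered with strftime(3). D_T_FMT is the POSIX langinfo item for the locale's date-and-time format; note that `%+` is not a standard C strftime conversion. The function names are hypothetical and this is not the actual timestamp.c change.

```c
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/* Prefer the locale's date-and-time format; fall back to "%+". */
static const char *
date_format(void)
{
#ifdef D_T_FMT
	return (nl_langinfo(D_T_FMT));
#else
	return ("%+");	/* non-standard strftime conversion */
#endif
}

/* Render t in local time; returns the length written, or 0 on failure. */
static size_t
format_timestamp(char *buf, size_t len, time_t t)
{
	struct tm *tm = localtime(&t);

	if (tm == NULL)
		return (0);
	return (strftime(buf, len, date_format(), tm));
}
```

Passing the format string to strftime(), rather than handing "%+" to nl_langinfo(), avoids the pointer-to-integer conversion GCC warned about.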
Richard Yao [Sat, 8 Apr 2017 17:14:14 +0000 (13:14 -0400)]
Add missing includes to zed_log.c
GCC 4.9.4 complains about implicit function declarations when building
against musl on Gentoo.
zed_log.c: In function ‘zed_log_pipe_open’:
zed_log.c:69:7: warning: implicit declaration of function ‘getpid’
(int)getpid());
^
zed_log.c:71:2: warning: implicit declaration of function ‘pipe’
if (pipe(_ctx.pipe_fd) < 0)
^
zed_log.c: In function ‘zed_log_pipe_close_reads’:
zed_log.c:90:2: warning: implicit declaration of function ‘close’
if (close(_ctx.pipe_fd[0]) < 0)
^
zed_log.c: In function ‘zed_log_pipe_wait’:
zed_log.c:141:3: warning: implicit declaration of function ‘read’
n = read(_ctx.pipe_fd[0], &c, sizeof (c));
The [-Wimplicit-function-declaration] at the end of each warning has
been removed to meet comment style requirements.
The man pages say to include <sys/types.h> and <unistd.h>. Doing that
silences the warnings.
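A small self-contained illustration of the fix: with <sys/types.h> and <unistd.h> included, as the man pages direct, getpid(), pipe(), read() and close() are declared and the implicit-declaration warnings go away. The helpers below are hypothetical and only exercise those calls; they are not the zed_log.c code.

```c
#include <sys/types.h>	/* pid_t */
#include <unistd.h>	/* getpid(), pipe(), read(), write(), close() */

/* Push one byte through a fresh pipe; returns 0 on success, -1 on error. */
static int
pipe_roundtrip(unsigned char in, unsigned char *out)
{
	int fd[2];
	int rc = -1;

	if (pipe(fd) < 0)
		return (-1);
	if (write(fd[1], &in, 1) == 1 && read(fd[0], out, 1) == 1)
		rc = 0;
	(void) close(fd[0]);
	(void) close(fd[1]);
	return (rc);
}

/* Wrapper so the sketch also exercises getpid(). */
static pid_t
current_pid(void)
{
	return (getpid());
}
```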
Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993
Brian Behlendorf [Wed, 12 Apr 2017 15:47:42 +0000 (08:47 -0700)]
OpenZFS 7535 - need test for resumed send of top most filesystem
Authored by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- zfs_share_001_pos.ksh - Older versions of exportfs will match
multiple exports that share a common prefix. Reorder the 'fs'
list so unshares occur from most to least unique.
- zfs_share_005_pos.ksh - Enabled and updated for Linux.
Yuri Pankov [Fri, 13 Jan 2017 17:25:15 +0000 (09:25 -0800)]
OpenZFS 6865 - want zfs-tests cases for zpool labelclear command
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Updated 'zpool labelclear' and 'zdb -l' such that they attempt
to find a vdev given solely its short name. This behavior is
consistent with the upstream OpenZFS code and the test cases
depend on it. The actual implementation differs slightly due
to device naming conventions on Linux.
- auto_online_001_pos, auto_replace_001_pos and add-o_ashift
test cases updated to expect failure when no label exists.
- read_efi_label() and zpool_label_disk_check() are read-only
operations and should use O_RDONLY at open time to enforce this.
- zpool_label_disk() and zpool_relabel_disk() write the partition
information using O_DIRECT, fsync() and page cache invalidation
to ensure a consistent view of the device.
- dump_label() in zdb should invalidate the page cache in order
to get the authoritative label from disk.
When we try to assign a new transaction to a TXG we must know beforehand
if there is sufficient free space on disk. This is to decide,
in dmu_tx_assign(), if we should reject the TX with ENOSPC.
We rely on spa_get_worst_case_asize() to inflate the size of our
logical writes by a factor of spa_asize_inflation which is
calculated as:
The problem with the current implementation is that we don't take
into account what happens with very small writes on VDEVs with large
physical block sizes.
Consider the case of writes to a dataset with recordsize=512,
copies=3 on a VDEV with ashift=13 (usually SSD with 8K block size):
every logical IO will end up allocating 3 * 8K = 24K on disk, i.e. 48
times the 512-byte logical size, which is double the size we account for.
If we allow such writes to be assigned a TX, it is possible,
when the pool is almost full, to trigger an allocation failure
(ENOSPC) in the ZIO pipeline, which will in turn result in the whole
pool being suspended.
The bug is fixed by using, in spa_get_worst_case_asize(), the MAX()
value chosen between the logical io size from zfs_write() and the
maximum physical block size used among our VDEVs.
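A worked version of the arithmetic above as a C sketch. It assumes an inflation factor of 24, which is what the remark that 48x is "double the size we account for" implies, and models the fixed estimate as MAX(logical size, largest physical block) multiplied by that factor. The function and macro names are hypothetical and do not reproduce the real spa_get_worst_case_asize() signature.

```c
#include <stdint.h>

#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

#define ASIZE_INFLATION 24	/* assumed; implied by the 48x = 2x remark */

/* Old estimate: inflate only the logical write size. */
static uint64_t
worst_case_asize_old(uint64_t lsize)
{
	return (lsize * ASIZE_INFLATION);
}

/* Fixed estimate: never assume less than one max-ashift physical block. */
static uint64_t
worst_case_asize_new(uint64_t lsize, unsigned max_ashift)
{
	return (MAX(lsize, (uint64_t)1 << max_ashift) * ASIZE_INFLATION);
}

/* Space a sub-block write really consumes: one physical block per copy. */
static uint64_t
small_write_alloc(unsigned copies, unsigned ashift)
{
	return ((uint64_t)copies << ashift);
}
```

With recordsize=512, copies=3 and ashift=13, the old estimate is 12 KiB against 24 KiB actually allocated, while the fixed estimate is 192 KiB.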
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5941
Matthew Ahrens [Mon, 10 Apr 2017 22:21:45 +0000 (15:21 -0700)]
OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations
Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Don Brady <don.brady@intel.com> Ported-by: Matt Ahrens <mahrens@delphix.com>
RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
is a unit of 2^ashift bytes, typically 512B or 4KB.
To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks. The contents of these pad sectors are not used, so we do not
need to read or write these sectors. However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
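The rounding rule above as a short C sketch; sector counts are in units of 2^ashift bytes, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Round an allocation (data plus parity sectors) up to a multiple of
 * nparity + 1 sectors.
 */
static uint64_t
raidz_alloc_sectors(uint64_t sectors, uint64_t nparity)
{
	uint64_t unit = nparity + 1;

	return ((sectors + unit - 1) / unit * unit);
}

/* Pad sectors added by the rounding: at most nparity of them. */
static uint64_t
raidz_pad_sectors(uint64_t sectors, uint64_t nparity)
{
	return (raidz_alloc_sectors(sectors, nparity) - sectors);
}
```

For example, 7 sectors on RAIDZ2 round up to 9 (2 pad sectors), and 5 sectors on RAIDZ3 round up to 8 (3 pad sectors, the maximum).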
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors. If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance. The gap-filling performance improvement was introduced in
July 2009.
Writing the optional zio is done by the io aggregation code in
vdev_queue.c. The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB. For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.
The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks). It
cannot occur with the default recordsize=128KB. If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.
The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore
this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem".
The problem also occurs with the following configurations:
With recordsize=512KB or 256KB, compression=off, the problem occurs only
in rarely-used configurations:
* 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors)
* 4-wide RAIDZ2 (either recordsize, either ashift)
* 5-wide RAIDZ2 with recordsize=512KB (either ashift)
* 6-wide RAIDZ2 with recordsize=512KB (either ashift)
With recordsize=1MB, compression=off, ashift=9 (512B sectors)
* RAIDZ1 with 4 or 8 disks
* RAIDZ2 with 4, 8, or 10 disks
* RAIDZ3 with 6, 8, 9, or 10 disks
With recordsize=1MB, compression=off, ashift=12 (4KB sectors)
* RAIDZ1 with 7 or 8 disks
* RAIDZ2 with 4, 5, or 10 disks
* RAIDZ3 with 6, 9, or 10 disks
With recordsize=2MB and larger (which can only be selected by changing
kernel tunables), many configurations are affected, including with
higher numbers of disks (up to 18 disks with recordsize=2MB).
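As a cross-check on the lists above, here is a simplified C model. It assumes compression=off (so the block is exactly recordsize bytes), fills the data columns row by row, adds parity for each row, and assumes that any pad sector lands on a column holding the shorter count of q sectors, making that leaf's write q + 1 sectors. A block counts as affected when pad sectors exist and that padded leaf write exceeds the aggregation limit. This is an approximation written for illustration, not the vdev_raidz.c or vdev_queue.c logic; with a 128KB limit it reproduces, for instance, the 10-wide RAIDZ2 case at both ashift values and the absence of the problem at recordsize=128KB.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model (requires width > nparity): does a block of `size`
 * bytes on a `width`-disk RAID-Z group with `nparity` parity disks need
 * pad sectors whose aggregation with one leaf's data would exceed
 * `agg_limit` bytes?
 */
static bool
raidz_pad_exceeds_limit(uint64_t size, unsigned ashift, uint64_t width,
    uint64_t nparity, uint64_t agg_limit)
{
	uint64_t ndata = width - nparity;	/* data columns */
	uint64_t s = size >> ashift;		/* data sectors */
	uint64_t q = s / ndata;			/* full rows */
	uint64_t r = s % ndata;			/* sectors in a partial row */
	uint64_t total = s + nparity * (q + (r != 0 ? 1 : 0));
	uint64_t unit = nparity + 1;
	uint64_t pad = (unit - total % unit) % unit;

	if (pad == 0)
		return (false);
	/* Assumed: the pad sector extends a q-sector column to q + 1. */
	return (((q + 1) << ashift) > agg_limit);
}
```

Raising the limit to 256KB in this model clears the 10-wide RAIDZ2 case, since the padded leaf write is only 128KB plus one sector.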
Increasing zfs_vdev_aggregation_limit to allow the optional zio to be
aggregated would eliminate the problem; setting it to 256KB fixes all
commonly-used configurations.
Instead, the solution taken here is to aggregate optional zio's
regardless of the aggregation size limit.
George Melikov [Sun, 9 Apr 2017 23:17:55 +0000 (03:17 +0400)]
zfstest - replace dircmp with diff
`dircmp` doesn't exist in Linux while `diff` is already used
by zfstests on all platforms.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5996
George Melikov [Sun, 9 Apr 2017 23:15:44 +0000 (03:15 +0400)]
zfstest reservation_009_pos.sh missed backslash
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5997
George Wilson [Fri, 7 Apr 2017 20:50:18 +0000 (13:50 -0700)]
OpenZFS 8023 - Panic destroying a metaslab deferred range tree
Authored by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
We don't want to dirty any data when we're in the final txgs of the pool
export logic. This change introduces checks to make sure that no data is
dirtied after a certain point. It also addresses the culprit of this
specific bug – the space map cannot be upgraded when we're in the final
stages of pool export. If we encounter a space map that wants to be
upgraded in this phase, then we simply ignore the request as it will get
retried the next time we set the fragmentation metric on that metaslab.
OpenZFS 5380 - receive of a send -p stream doesn't need to try renaming snapshots
Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
recv_incremental_replication() takes care of things like removing
datasets that have been removed on the sending side, detecting renamed
datasets, ensuring that all datasets in the affected hierarchy have the
same properties as their counterparts on the sending side.
None of the above is necessary if we are receiving a stream for a
single dataset that has been generated with zfs send -p, that is, a
stream that includes properties. zfs_receive_one() already takes care
of applying the properties to the received datasets.
Pedro Giffuni [Fri, 7 Apr 2017 20:36:06 +0000 (13:36 -0700)]
OpenZFS 8046 - Let calloc() do the multiplication in libzfs_fru_refresh
Authored by: Pedro Giffuni <pfg@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8046
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3a3c0d5
Closes #5989
George Melikov [Sun, 9 Apr 2017 23:01:54 +0000 (03:01 +0400)]
zfs_receive_010_pos.ksh local => typeset change
Ksh uses `typeset`; `local` is the Bash equivalent.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5995
George Melikov [Sun, 9 Apr 2017 23:00:43 +0000 (03:00 +0400)]
zfstests cli_user/misc/setup.ksh space missed
Ksh syntax requires a space after `!` in if statement.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5994
Toomas Soome [Sat, 3 Dec 2016 07:13:44 +0000 (23:13 -0800)]
OpenZFS 7404 - rootpool_007_neg, bootfs_006_pos and bootfs_008_neg tests fail with the loader project bits
Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Removed gzip and zle compression restriction on bootfs
datasets. Grub added support for these long ago. Any
version of grub which understands lz4 also supports this.
- Enabled rootpool tests in runfile but skipped by default
in setup on Linux since they modify the rootpool.
- bootfs_006_pos.ksh - striped pools are allowed as bootfs.
OpenZFS 7629 - Fix for 7290 neglected to remove some escape sequences
Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Multiple changes in this commit were applied in c1d9abf.
Consolidated the shellcheck calls in the
make recipe into a single call of
shellcheck. Corrected script errors that
had been skipped, as well as script errors
that had been introduced because make
wasn't reporting any errors from shellcheck.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5976
The ZFS-enabled version of xfstests fails to build cleanly on
Amazon Linux. This issue should be resolved by rebasing the ZFS
patches against the latest xfstests and pushing those patches
upstream. This would allow us to use an unmodified xfstests.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5481
Closes #5977
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5974
OpenZFS 7290 - ZFS test suite needs to control what utilities it can run
Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru>
Porting Notes:
- Utilities which aren't available under Linux have been removed.
- Because of sudo's default secure path behavior PATH must be
explicitly reset at the top of libtest.shlib. This avoids the
need for all users to customize secure path on their system.
- Updated ZoL infrastructure to manage constrained path
- Updated all test cases
- Check permissions for usergroup tests
- When testing in-tree create links under bin/
- Update fault cleanup such that missing files during
cleanup aren't fatal.
- Configure su environment with constrained path
Sydney Vanda [Thu, 2 Mar 2017 16:47:26 +0000 (09:47 -0700)]
Added auto-replace FMA test for the ZFS Test Suite
Also included are updates to auto-online test
Automated auto-replace test to go along with ZED FMA integration
(PR 4673). auto-replace_001.pos works using a scsi_debug device
(currently the only usable virtual device, because the whole_disk
variable needs to be set).
Functionality for automated FMA auto-replace test to work with
scsi_debug devs: Some functionality/exceptions needed to be
added for automation of auto-replace to work correctly.
In the test an alias vdev_id rule is added for any scsi_debug
device which sets the phys_path="scsidebug" after a udevadm
trigger command.
A symlink is created for the vdev_id.conf file (in /etc/zfs/ by
default) to be used in-tree for the test suite
(/var/tmp/zfs/vdev_id.conf). "./scripts/zfs-helpers.sh -i" needs
to be run before fault tests in the ZTS (to use udev rules in-tree).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: David Quigley <david.quigley@intel.com> Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes #5944
Don Brady [Wed, 5 Apr 2017 21:24:26 +0000 (15:24 -0600)]
Fix regression in zfs_ereport_start()
On 32-bit platforms, spa_state was passed without a cast and is only 32
bits wide, so reading it as a 64-bit value from the variable argument
list caused a NULL pointer dereference. Accidentally introduced by bcdb96a.
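A minimal sketch of this failure mode, under the assumption that the variadic consumer reads each value with va_arg(ap, uint64_t). Passing a narrower value (such as a 32-bit enum) where a uint64_t is read is undefined behavior; on 32-bit ABIs it typically also shifts where every later argument is read from, which can turn a following pointer into garbage or NULL. Both functions are hypothetical; widening with an explicit (uint64_t) cast at the call site is a common remedy, shown here for illustration rather than as the exact patch.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical consumer: reads `count` uint64_t values from the arg list. */
static uint64_t
sum_u64_args(int count, ...)
{
	va_list ap;
	uint64_t total = 0;

	va_start(ap, count);
	for (int i = 0; i < count; i++)
		total += va_arg(ap, uint64_t);	/* always consumes 64 bits */
	va_end(ap);
	return (total);
}

/* Correct call: widen the 32-bit value before it enters the arg list. */
static uint64_t
sum_state_and_value(uint32_t state, uint64_t value)
{
	return (sum_u64_args(2, (uint64_t)state, value));
}
```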
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com> Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5966
Closes #5965
In _zed_event_add_nvpair, when handling DATA_TYPE_UINT64,
we should be using i64 throughout the entire case.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5964
Steven Hartland [Mon, 3 Apr 2017 23:38:51 +0000 (16:38 -0700)]
OpenZFS 7885 - zpool list can report 16.0e for expandsz
Authored by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
When a member of a RAIDZ has been replaced with a device smaller than
the original, then the top level vdev can report its expand size as
16.0E.
The reduced child asize causes the RAIDZ to have a vdev_asize lower than
its vdev_max_asize, which then results in an underflow during the
calculation of the parent's expand size.
Fix this by updating the vdev_asize if it shrinks, which is already
protected by a check against vdev_min_asize and so should always be safe.
Also for RAIDZ vdevs, ensure that the sum of their child vdev_min_asize
is always greater than the parent's vdev_min_size.
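Why an underflow shows up as 16.0E: expand sizes are unsigned 64-bit byte counts, so if the value being subtracted ends up even slightly larger than the value it is subtracted from, the result wraps to just under 2^64 bytes, i.e. just under 16 EiB, which a one-decimal formatter displays as 16.0E. The sketch below is illustrative; the function names are hypothetical, and the guarded variant is a generic defensive pattern, not the change this commit makes (which instead updates vdev_asize when it shrinks).

```c
#include <stdint.h>

/* Unguarded difference: wraps modulo 2^64 when asize > max_asize. */
static uint64_t
expandsz_unguarded(uint64_t max_asize, uint64_t asize)
{
	return (max_asize - asize);
}

/* Defensive alternative: report no expandable space instead of wrapping. */
static uint64_t
expandsz_guarded(uint64_t max_asize, uint64_t asize)
{
	return (asize >= max_asize ? 0 : max_asize - asize);
}

/* Size in EiB (2^60 bytes), as a human-readable formatter would scale it. */
static double
to_eib(uint64_t bytes)
{
	return ((double)bytes / (double)(1ULL << 60));
}
```

A 1 GiB shortfall, for example, wraps to 2^64 - 2^30 bytes, roughly 15.999999999 EiB.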
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7885
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bb0dbaa
Closes #5963
Tom Matthews [Tue, 4 Apr 2017 18:03:33 +0000 (19:03 +0100)]
list -o props should be alloc,free not used,avail
Manpage suggests the zpool list properties include 'used'
and 'available', when these are invalid property names.
Use alloc and free in their place.
```
$ zpool list -o name,size,used 2>&1 |head -1
bad property list: invalid property 'used'
$ zpool list -o name,size,avail 2>&1 |head -1
bad property list: invalid property 'avail'
$ zpool list -o name,size,available 2>&1 |head -1
bad property list: invalid property 'available'
$ zpool list -o name,size,alloc,free
NAME    SIZE  ALLOC   FREE
apool   464M   203M   261M
bpool  3.62T  1.97T  1.65T
```
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Matthews <tom@axiom-partners.com>
Closes #5959
N Clark [Mon, 3 Apr 2017 21:23:02 +0000 (17:23 -0400)]
Additional Information for Zedlets
* Add ZPOOL pool state to zfs_post_common to
allow differentiation between export and destroy
by zedlets.
* Add pool name as a standard export. This ensures the
pool name is exported to zedlets.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Closes #5942
A stray semicolon was causing commitcheck.sh
to run twice when running make checkstyle.
Updated regexes for matching tagged lines.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5952
George Melikov [Mon, 3 Apr 2017 18:06:04 +0000 (22:06 +0400)]
zfs_get_005_neg.ksh fix typos
`test_options_bookmark` function must have an `s` at the end.
Reviewed-by: Marcel Telka <marcel@telka.sk> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5957
Add the need to have a commit message with a specific
format to the contributing guidelines. Provide a script
to help enforce commit message style.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5943
Olaf Faaland [Fri, 31 Mar 2017 16:32:00 +0000 (09:32 -0700)]
glibc 2.5 compat: use correct header for makedev() et al.
In glibc 2.5, makedev(), major(), and minor() are defined in
sys/sysmacros.h. They are also defined in types.h for backward
compatibility, but using these definitions triggers a compile warning.
This breaks the ZFS build, as it builds with -Werror.
autoconf email threads indicate these macros may be defined in
sys/mkdev.h in some cases.
This commit adds configure checks to detect where makedev() is defined:
sys/sysmacros.h
sys/mkdev.h
It assumes major() and minor() are defined in the same place.
The libspl types.h then includes
sys/sysmacros.h (preferred) or
sys/mkdev.h (2nd choice)
if one of those defines makedev().
This is done before including the system types.h.
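The header preference can be sketched in C. In this sketch `__has_include` (GCC 5+, Clang) stands in for the configure results; that substitution is an assumption for illustration, since libspl itself relies on configure-generated macros. The helper `toy_make_dev()` is hypothetical.

```c
/*
 * Prefer <sys/sysmacros.h>, then <sys/mkdev.h>, and include either one
 * before the system <sys/types.h>.  __has_include replaces the configure
 * results here purely for illustration.
 */
#if defined(__has_include)
#if __has_include(<sys/sysmacros.h>)
#include <sys/sysmacros.h>
#elif __has_include(<sys/mkdev.h>)
#include <sys/mkdev.h>
#endif
#endif
#include <sys/types.h>
#include <assert.h>

/* Build a device number and check that major()/minor() recover its parts. */
static dev_t
toy_make_dev(unsigned int maj, unsigned int min)
{
	dev_t dev = makedev(maj, min);

	assert(major(dev) == maj);
	assert(minor(dev) == min);
	return (dev);
}
```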
An alternative would be to remove uses of major, minor, and makedev,
instead comparing the st_dev returned from stat64. These configure
checks would then be unnecessary.
This change revealed that __NORETURN was being defined unnecessarily in
libspl/include/sys/sysmacros.h. That definition is removed.
The files in which __NORETURN is used all include types.h, and so all
will get the definition provided by feature_tests.h.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5945
Brian Behlendorf [Fri, 31 Mar 2017 16:25:23 +0000 (09:25 -0700)]
Fix add-o_ashift.ksh permissions
Test cases must be executable or they will be skipped.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5947
LOLi [Wed, 29 Mar 2017 00:21:11 +0000 (02:21 +0200)]
Check ashift validity in 'zpool add'
df83110 added the ability to specify a custom "ashift" value from the command
line in 'zpool add' and 'zpool attach'. This commit adds additional checks to
the provided ashift to prevent invalid values from being used, which could
result in disastrous consequences for the whole pool.
Additionally provide ASHIFT_MAX and ASHIFT_MIN definitions in spa.h.
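A minimal sketch of the kind of range check described above. The bounds `TOY_ASHIFT_MIN`/`TOY_ASHIFT_MAX` are assumed values for illustration only (check `ASHIFT_MIN` and `ASHIFT_MAX` in spa.h for the real ones), and it assumes, as with the pool `ashift` property, that 0 means "auto-detect".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bounds for illustration only; see ASHIFT_MIN and ASHIFT_MAX
 * in spa.h for the real values.
 */
#define	TOY_ASHIFT_MIN	9	/* 512-byte sectors */
#define	TOY_ASHIFT_MAX	13	/* 8 KiB sectors */

/* An ashift of 0 means "auto-detect"; anything else must be in range. */
static bool
toy_ashift_valid(uint64_t ashift)
{
	return (ashift == 0 ||
	    (ashift >= TOY_ASHIFT_MIN && ashift <= TOY_ASHIFT_MAX));
}

/* Sector size implied by an explicit, validated ashift. */
static uint64_t
toy_ashift_to_bytes(uint64_t ashift)
{
	assert(ashift != 0 && toy_ashift_valid(ashift));
	return (1ULL << ashift);
}
```

Rejecting out-of-range values up front matters because the ashift is a shift count: a value like 64 would be meaningless as a sector size.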
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5878
Sen Haerens [Tue, 28 Mar 2017 17:47:50 +0000 (19:47 +0200)]
Fix "undefined reference to xdr_control" when building raidz_test cmd
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: SenH <sen@senhaerens.be>
Closes #5933
Brian Behlendorf [Tue, 28 Mar 2017 16:58:23 +0000 (09:58 -0700)]
Disable rsend_009_pos
Test rsend_009_pos has been observed to fail pretty frequently
when testing using a kmemleak enabled kernel. For the moment
disable this test case until the underlying issue is resolved.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5887
Closes #5934
wli5 [Mon, 27 Mar 2017 19:33:57 +0000 (03:33 +0800)]
Update documentation for new parameter "zfs_qat_disable"
Update documentation in zfs-module-parameters.5 for new
parameter "zfs_qat_disable" which was introduced by #5846.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #5914
Brian Behlendorf [Mon, 27 Mar 2017 19:31:15 +0000 (12:31 -0700)]
Allow c99 when building ZFS in the kernel tree
Commit 4a5d7f82 enabled C99 when building out of the kernel tree.
However, when building as part of the kernel, different Makefiles
are used and -std=gnu99 must additionally be added there.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5919
LOLi [Fri, 24 Mar 2017 01:57:54 +0000 (02:57 +0100)]
Fix 'zdb -o' segmentation fault
Fix a regression accidentally introduced by OpenZFS 7280 in ed828c0: since
whether strchr() accepts NULL as its first parameter is implementation
specific, add an additional check to avoid crashing.
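The guard pattern can be sketched as follows. `toy_option_value()` is a hypothetical helper, not the zdb code: it returns the text after `=` in a `name=value` option and checks for NULL before calling strchr(), whose behaviour with a NULL argument is undefined in ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the text after '=' in a "name=value"
 * option, or NULL.  The explicit NULL check is needed because passing
 * NULL to strchr() is undefined behaviour in ISO C.
 */
static const char *
toy_option_value(const char *opt)
{
	const char *eq;

	if (opt == NULL)
		return (NULL);
	eq = strchr(opt, '=');
	return (eq == NULL ? NULL : eq + 1);
}
```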
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5917
Brian Behlendorf [Fri, 24 Mar 2017 01:26:50 +0000 (18:26 -0700)]
Retry zfs_znode_alloc() in zfs_mknode()
For historical reasons zfs_mknode() was written such that it could
never fail. This poses a problem for Linux since zfs_znode_alloc()
could potentially fail due to low memory. Handle this gracefully
by retrying zfs_znode_alloc() until it succeeds; direct reclaim
will eventually be able to allocate memory.
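A user-space sketch of the retry-until-success pattern, assuming a hypothetical allocator callback that can fail transiently. In the kernel the expectation is that direct reclaim eventually frees enough memory for a later attempt to succeed; here a test double simply fails a fixed number of times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An allocator that may fail transiently, e.g. under memory pressure. */
typedef void *(*toy_alloc_fn_t)(void *arg);

/*
 * Keep retrying until the allocation succeeds so the caller never sees
 * a failure, mirroring a function that is not allowed to fail.
 */
static void *
toy_alloc_retry(toy_alloc_fn_t fn, void *arg, unsigned int *attempts)
{
	void *p;

	*attempts = 0;
	do {
		(*attempts)++;
		p = fn(arg);
	} while (p == NULL);
	return (p);
}

/* Test double: fails while *arg is positive, then succeeds. */
static void *
toy_flaky_alloc(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return (NULL);
	}
	return (malloc(64));
}
```

The trade-off is that the caller can block for an unbounded time under memory pressure, which is acceptable only because the surrounding code was designed never to handle an allocation failure.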
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5535
Closes #5908
Brian Behlendorf [Fri, 24 Mar 2017 01:24:09 +0000 (18:24 -0700)]
Fix undefined reference to `libzfs_fru_compare'
Add trivial libzfs_fru_compare() function which can be used when
HAVE_LIBTOPO is not defined. The only caller is find_vdev() and
this function should never be reached because search_fru must be
NULL unless HAVE_LIBTOPO is defined.
Rename _HAS_FMD_TOPO to existing HAVE_LIBTOPO which was
originally added for this purpose. This macro will never be defined.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5402
Closes #5909
Brian Behlendorf [Thu, 23 Mar 2017 01:08:55 +0000 (18:08 -0700)]
Fix `zpool status -v` error message
When a pool is suspended it's impossible to read the list
of damaged files from disk. This would result in a generic
misleading "insufficient permissions" error message.
Update zpool_get_errlog() to use the standard zpool error
logging functions to generate a useful error message. In
this case:
errors: List of errors unavailable: pool I/O is currently suspended
This patch does not address the related issue of potentially
not being able to resume a suspended pool when the underlying
device names have changed.
Additionally, remove the error handling from zfs_alloc()
in zpool_get_errlog() for readability since this function
can never fail.
Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4031
Closes #5731
Closes #5907
wli5 [Thu, 23 Mar 2017 00:58:47 +0000 (08:58 +0800)]
GZIP compression offloading with QAT accelerator
This patch implements hardware-accelerated GZIP compression in ZFS.
When GZIP compression is enabled, compression requests are
automatically offloaded to the hardware accelerator to free up CPU
resources and speed up compression.
* To enable Intel QAT hardware acceleration in ZOL you need to have QAT
hardware and the driver installed:
* QAT hardware DH8950:
http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
* QAT driver:
https://01.org/intel-quickassist-technology
* Start QAT driver in your system:
service qat_service start
* Enable QAT in ZFS, e.g.:
./configure --with-qat=<qat-driver-path>/QAT1.6
make
* Set GZIP compression in ZFS dataset:
zfs set compression=gzip <dataset>
* Get QAT hardware statistics by:
cat /proc/spl/kstat/zfs/qat
* To disable QAT in ZFS:
insmod zfs.ko zfs_qat_disable=1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #5846
libspl: Fix incorrect use of platform defines on sparc64
libspl tries to detect sparc64 by checking whether __sparc64__
is defined. Unfortunately, this assumption is not correct as
sparc64 does not define __sparc64__ but it defines __sparc__
and __arch64__ instead. This leads to sparc64 being detected
as 32-bit sparc and the build fails because both _ILP32 and
_LP64 are defined in this case.
To fix the problem, remove the checks for __sparc64__ and
just check __arch64__ if a sparc host was previously
detected with __sparc__.
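The corrected detection order can be sketched with the preprocessor. The `TOY_DATA_MODEL` name is hypothetical and used so the sketch does not redefine the real `_LP64`/`_ILP32` macros; it checks `__sparc__` first and only then `__arch64__`, and never relies on `__sparc64__`.

```c
#include <assert.h>
#include <string.h>

/*
 * Check __sparc__ first, then __arch64__, instead of __sparc64__
 * (which sparc64 compilers do not define).  TOY_ names avoid
 * clobbering the real _LP64/_ILP32 macros.
 */
#if defined(__sparc__)
#if defined(__arch64__)
#define	TOY_DATA_MODEL	"sparc64 (_LP64)"
#else
#define	TOY_DATA_MODEL	"sparc32 (_ILP32)"
#endif
#else
#define	TOY_DATA_MODEL	"not sparc"
#endif

static const char *
toy_data_model(void)
{
	return (TOY_DATA_MODEL);
}
```

On an x86_64 build host `toy_data_model()` returns "not sparc"; exactly one of the two sparc branches can ever be selected, so `_ILP32` and `_LP64` cannot both end up defined.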
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Closes #5913
Matthew Ahrens [Tue, 21 Mar 2017 01:36:00 +0000 (18:36 -0700)]
OpenZFS 7968 - multi-threaded spa_sync()
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Matthew Ahrens <mahrens@delphix.com>
spa_sync() iterates over all the dirty dnodes and processes each of them
by calling dnode_sync(). If there are many dirty dnodes (e.g. because we
created or removed a lot of files), the single thread of spa_sync()
calling dnode_sync() can become a bottleneck. Additionally, if many
dnodes are dirtied concurrently in open context (e.g. due to concurrent
file creation), the os_lock will experience lock contention via
dnode_setdirty().
The solution is to track dirty dnodes on a multilist_t, and for
spa_sync() to use separate threads to process each of the sublists in
the multilist.
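A single-threaded sketch of the multilist idea, assuming a hypothetical fixed-size `toy_multilist_t` of dirty object numbers rather than the real `multilist_t`. Objects are spread across sublists by a simple index function; in real code each sublist has its own lock, and spa_sync() could hand each sublist index to a separate worker thread, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_NSUBLISTS	4
#define	TOY_SUBLIST_CAP	64

/*
 * Hypothetical, simplified multilist of dirty object numbers.  In real
 * code every sublist carries its own lock, so threads dirtying different
 * objects rarely contend on a single list lock.
 */
typedef struct {
	uint64_t objs[TOY_NSUBLISTS][TOY_SUBLIST_CAP];
	size_t count[TOY_NSUBLISTS];
} toy_multilist_t;

/* Pick a sublist from the object number (a stand-in for a hash). */
static size_t
toy_sublist_index(uint64_t object)
{
	return ((size_t)(object % TOY_NSUBLISTS));
}

static void
toy_multilist_insert(toy_multilist_t *ml, uint64_t object)
{
	size_t i = toy_sublist_index(object);

	assert(ml->count[i] < TOY_SUBLIST_CAP);
	ml->objs[i][ml->count[i]++] = object;
}

/*
 * "Sync" one sublist and empty it.  Sublists are independent, so a
 * multi-threaded sync could process each index on its own thread.
 */
static size_t
toy_sync_sublist(toy_multilist_t *ml, size_t i)
{
	size_t synced = ml->count[i];

	ml->count[i] = 0;
	return (synced);
}
```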
Olaf Faaland [Tue, 21 Mar 2017 00:51:16 +0000 (17:51 -0700)]
Linux 4.11 compat: iops.getattr and friends
In torvalds/linux@a528d35, there are changes to the getattr family of functions,
struct kstat, and the interface of inode_operations .getattr.
The inode_operations .getattr and simple_getattr() interface changed to:
int (*getattr) (const struct path *, struct dentry *, struct kstat *,
u32 request_mask, unsigned int query_flags)
The request_mask argument indicates which field(s) the caller intends to use.
Fields the caller has not specified via request_mask may be set in the returned
struct anyway, but their values may be approximate.
The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.
Currently both fields are ignored. It is possible that getattr-related
functions within zfs could be optimized based on the request_mask.
struct kstat includes new fields:
u32 result_mask; /* What fields the user got */
u64 attributes; /* See STATX_ATTR_* flags */
struct timespec btime; /* File creation time */
The attributes and btime fields are cleared; result_mask reflects this.
These fields appear to be optional based on simple_getattr() and
vfs_getattr() within the kernel, which take the same approach.
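A user-space model of how the new kstat fields are filled, using a hypothetical `struct toy_kstat` that holds only the three new fields. The mask constants mirror the values of Linux's statx `STATX_BASIC_STATS` and `STATX_BTIME`; reporting basic stats in `result_mask` is an assumption modelled on the kernel's generic path.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Mask bits mirroring Linux's statx STATX_BASIC_STATS and STATX_BTIME. */
#define	TOY_STATX_BASIC_STATS	0x000007ffU
#define	TOY_STATX_BTIME		0x00000800U

/* Hypothetical slice of struct kstat holding only the new fields. */
struct toy_kstat {
	uint32_t result_mask;	/* what fields the user got */
	uint64_t attributes;	/* STATX_ATTR_* flags */
	struct timespec btime;	/* file creation time */
};

/*
 * request_mask and query_flags are ignored; attributes and btime are
 * cleared, and result_mask omits btime so callers know it is absent.
 */
static void
toy_getattr_new_fields(struct toy_kstat *stat, uint32_t request_mask,
    unsigned int query_flags)
{
	(void) request_mask;
	(void) query_flags;
	stat->result_mask = TOY_STATX_BASIC_STATS;
	stat->attributes = 0;
	stat->btime.tv_sec = 0;
	stat->btime.tv_nsec = 0;
}
```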
Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5875
DeHackEd [Mon, 20 Mar 2017 22:14:28 +0000 (18:14 -0400)]
zfs(8) fixes
Documentation fixes for zfs(8)
* White space issue in the userused@user property section
* zfs send supports using bookmarks as the origin snapshot
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: DHE <git@dehacked.net>
Closes #5906
Matthew Ahrens [Wed, 15 Mar 2017 12:49:59 +0000 (08:49 -0400)]
OpenZFS 7801 - add more by-dnode routines (lint)
Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7801
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f25efb3
Closes #5894
Matthew Ahrens [Wed, 11 May 2016 03:49:02 +0000 (20:49 -0700)]
OpenZFS 6874 - rollback and receive need to reset ZPL state to what's on disk
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the
ZPL doesn't completely reload the state from the DMU; some values remain
cached in the zfsvfs_t.
Brian Behlendorf [Mon, 13 Mar 2017 22:08:40 +0000 (15:08 -0700)]
Align mount options handling and type/function names with OpenZFS
Refactor the temporary mount option in a way which minimizes
differences with upstream. Additionally, replace the zfs_sb_t
type with zfsvfs_t and rename several functions to be consistent
with the upstream names.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5876
Restructure the handling of mount options to be consistent with
upstream OpenZFS. This required making the following changes.
- The zfs_mntopts_t was renamed vfs_t and adjusted to provide
the minimal needed functionality. This includes a pointer
back to the associated zfsvfs_t. Plus it made it possible
to revert zfs_register_callbacks() and zfsvfs_create() back
to their original prototypes.
- A zfs_mnt_t structure was added for the sole purpose of
providing a structure to pass the osname and raw mount
pointer to zfs_domount() without having to copy them.
- Mount option parsing was moved down from the zpl_* wrapper
functions into the zfs_* functions. This allowed the code to be
simplified, and it is where similar functionality appears on
other platforms.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Several functions were renamed when ZFS was originally ported to
Linux. Revert the code to the original names to minimize the
delta with upstream OpenZFS.
The use of zfs_sb_t instead of zfsvfs_t results in unnecessary
conflicts with the upstream source. Change all instances of
zfs_sb_t to zfsvfs_t, including updating the variable names.
Whenever possible the code was updated to be consistent with
how it appears in the upstream OpenZFS source.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Fri, 10 Mar 2017 01:43:36 +0000 (17:43 -0800)]
Fix ZVOL BLKFLSBUF ioctl
The BLKFLSBUF ioctl is expected to do two things:
- flush dirty pages to stable storage, and
- invalidate clean pages
Unfortunately, the existing implementation of BLKFLSBUF in
zvol_ioctl() only flushes pages which are part of the current
TXG to disk. There may be additional dirty pages in the
page cache which haven't yet been submitted to the DMU and
therefore aren't part of any TXG.
Furthermore, because zvol_ioctl() returns 0, the generic
blkdev_flushbuf() does not invalidate the page cache.
Resolve the issue by moving bdev_flush() into zvol_ioctl()
and explicitly waiting for a full TXG sync. Then invalidate
the page cache. The associated ARC buffers need not be
evicted since they cannot be bypassed using O_DIRECT.
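For reference, a user-space caller issues BLKFLSBUF with a plain ioctl(2) on the block device (util-linux `blockdev --flushbufs` does the same). `toy_flush_blockdev()` is a hypothetical helper; it returns 0 or a negative errno, and the ioctl typically requires CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the block layer to flush and invalidate a device's buffers.
 * Returns 0 on success or a negative errno.
 */
static int
toy_flush_blockdev(const char *path)
{
	int fd = open(path, O_RDONLY);
	int err = 0;

	if (fd < 0)
		return (-errno);
	if (ioctl(fd, BLKFLSBUF, 0) != 0)
		err = -errno;
	(void) close(fd);
	return (err);
}
```

Called as `toy_flush_blockdev("/dev/zvol/tank/vol")` (a hypothetical pool and volume name), the request reaches zvol_ioctl(), which with this change waits for a full TXG sync and then invalidates the page cache.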
Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5871
Closes #5879
Newer versions of cppcheck find the potential NULL pointer
bug in zfs_write(). The function is difficult to refactor without
extensive work, so suppress the potential NULL pointer error
which cannot occur for now.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5882
arc_summary and dbufstat should have two spaces
after their last function definitions.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5881
Enable shellcheck to run on zed scripts,
paxcheck.sh, zfs-tests.sh, zfs.sh, and zloop.sh.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5812
Chunwei Chen [Wed, 8 Mar 2017 17:26:33 +0000 (09:26 -0800)]
Fix nfs snapdir automount
The current implementation for allowing nfs to access snapdir is very buggy.
It uses a special fh for snapdirs, such that the next time nfsd does
fh_to_dentry, it actually returns the root inode inside the snapshot, so nfsd
never knows it crossed a mountpoint.
The problem is that nfsd will not hold a reference on the vfsmount of the
snapshot. This causes the automatic unmounter to unmount the snapshot even
though nfs is still holding dentries in it.
To fix this, we return the inode for the snapdirs themselves. However, we also
trigger automount upon fh_to_dentry, and return ESTALE so nfsd will revalidate
and see the mountpoint and do crossmnt.
Because nfsd will now be aware that these are different filesystems, users
must add crossmnt to their export options to access snapshot directories.
Tony Hutter [Wed, 8 Mar 2017 17:20:21 +0000 (09:20 -0800)]
Fix harmless "BARRIER is deprecated" kernel warning on Centos 6.8
A one time warning after module load that "BARRIER is deprecated" was seen
on the heavily patched 2.6.32-642.13.1.el6.x86_64 Centos 6.8 kernel. It seems
that kernel had both the old BARRIER and the newer FLUSH/FUA interfaces
defined. This fixes the warning by preferring the newer FLUSH/FUA interface
if it's available.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5739
Closes #5828
Andriy Gapon [Mon, 27 Feb 2017 22:47:33 +0000 (14:47 -0800)]
OpenZFS 7867 - ARC space accounting leak
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Tim Chase <tim@chase2k.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7867
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aa1f740d
Closes #5874
bunder2015 [Tue, 7 Mar 2017 21:01:39 +0000 (16:01 -0500)]
Corrected highlight for zpool man page
SS is already highlighted and the \fB/\fR tags break the highlighting
prematurely; removing the tags highlights the entire line.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #5873
bunder2015 [Tue, 7 Mar 2017 17:54:55 +0000 (12:54 -0500)]
Fix multi-line error messages in blkdev_compat.h
Fix multi-line error messages in blkdev_compat.h by changing
error-generating multi-line error messages to single line errors.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #5860
OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Background information: This assertion about tx_space_* verifies that we
are not dirtying more stuff than we thought we would. We “need” to know
how much we will dirty so that we can check if we should fail this
transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
typically less than a millisecond), we call dbuf_dirty() on the exact
blocks that will be modified. Once this happens, the temporary
accounting in tx_space_* is unnecessary, because we know exactly what
blocks are newly dirtied; we call dnode_willuse_space() to track this
more exact accounting.
The fundamental problem causing this bug is that dmu_tx_hold_*() relies
on the current state in the DMU (e.g. dn_nlevels) to predict how much
will be dirtied by this transaction, but this state can change before we
actually perform the transaction (i.e. call dbuf_dirty()).
This bug will be fixed by removing the assertion that the tx_space_*
accounting is perfectly accurate (i.e. we never dirty more than was
predicted by dmu_tx_hold_*()). By removing the requirement that this
accounting be perfectly accurate, we can also vastly simplify it, e.g.
removing most of the logic in dmu_tx_count_*().
The new tx space accounting will be very approximate, and may be more or
less than what is actually dirtied. It will still be used to determine
if this transaction will put us over quota. Transactions that are marked
by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
an attempt to determine how much space will be freed by the transaction
— this was rarely accurate enough to determine if a transaction should
be permitted when we are over quota, which is why dmu_tx_mark_netfree()
was introduced in 2014.
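A minimal sketch of the simplified, approximate quota check described above, assuming a hypothetical `toy_tx_t` that records only an estimate of bytes to write and a net-free flag (standing in for dmu_tx_mark_netfree()). No attempt is made to estimate what a net-free transaction will free; it is simply exempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified transaction: only what the quota check needs. */
typedef struct {
	uint64_t towrite;	/* approximate estimate; may be high or low */
	bool netfree;		/* marked as expected to free space */
} toy_tx_t;

/*
 * Approximate quota check at assign time.  Net-free transactions are
 * exempt; the subtraction form avoids overflowing used + towrite.
 */
static int
toy_tx_check_quota(const toy_tx_t *tx, uint64_t used, uint64_t quota)
{
	if (tx->netfree)
		return (0);
	if (used > quota || tx->towrite > quota - used)
		return (EDQUOT);
	return (0);
}
```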
We also won’t attempt to give “credit” when overwriting existing blocks,
if those blocks may be freed. This allows us to remove the
do_free_accounting logic in dbuf_dirty(), and associated routines. This
logic attempted to predict what will be on disk when this txg syncs, to
know if the overwritten block will be freed (i.e. exists, and has no
snapshots).
Porting notes:
- DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode().
Using the default dnode size would be slightly better.
- DEBUG_DMU_TX wrappers and configure option removed.
- Resolved _by_dnode() conflicts; these changes have not yet been
applied to OpenZFS.
OpenZFS 7843 - get_clones_stat() is suboptimal for lots of clones
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7843
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4d519e7
Closes #5868
Olaf Faaland [Tue, 7 Mar 2017 00:01:45 +0000 (16:01 -0800)]
Dump unique configurations and Uberblocks in zdb -lu
For zdb -l, detect when the configuration nvlist in some label l (l>0)
is the same as a configuration already dumped. If so, do not dump it.
Make a similar check when dumping Uberblocks for zdb -lu. Check whether
a label already dumped contains an identical Uberblock. If so, do not
dump the Uberblock.
When dumping a configuration or Uberblock, state which labels it is
found in (0-3), for example: labels = 1 2 3
Detecting redundant uberblocks or configurations is accomplished by
calculating checksums of the uberblocks and the packed nvlists
containing the configuration.
If there is nothing unique to be dumped for a label (i.e., the
configuration and uberblocks have checksums matching those already
dumped), print nothing for that label.
With additional l's or u's, increase verbosity as follows:
-l Dump each unique configuration only once.
Indicate which labels it appears in.
-ll In addition, dump label space usage stats.
-lll Dump every configuration, unique or not.
-u Dump each unique, valid, uberblock only once.
Indicate which labels it appears in.
-uu In addition, state which slots are invalid.
-uuu Dump every uberblock, unique or not.
-uuuu Dump the uberblock blockpointer (used to be -uuu)
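The deduplication step can be sketched as follows. This is a simplified model, not zdb's code: strings stand in for the packed configuration nvlists, and 64-bit FNV-1a stands in for the checksum zdb computes. For each label it records the first label with matching contents; only labels that are their own owner would be dumped, with the others listed on the owner's "labels = ..." line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TOY_NLABELS	4

/* 64-bit FNV-1a, used here only as a stand-in for zdb's checksum. */
static uint64_t
toy_fnv1a64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return (h);
}

/*
 * owner[l] is set to the first label whose checksum matches label l.
 * Labels with owner[l] == l are unique and would be dumped.
 */
static void
toy_dedupe_labels(const char *cfg[TOY_NLABELS], int owner[TOY_NLABELS])
{
	uint64_t sum[TOY_NLABELS];

	for (int l = 0; l < TOY_NLABELS; l++) {
		sum[l] = toy_fnv1a64(cfg[l], strlen(cfg[l]));
		owner[l] = l;
		for (int k = 0; k < l; k++) {
			if (sum[k] == sum[l]) {
				owner[l] = owner[k];
				break;
			}
		}
	}
}
```

With one distinct configuration in label 0 and identical ones in labels 1 to 3, the sketch yields two unique entries, the second reported as "labels = 1 2 3".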
Make exit values conform to the manual page. Failing to unpack a
configuration nvlist is considered an error, as well as failing to open
or read from the device.
Add three tests, zdb_00{3,4,5}_pos to verify the above functionality.
Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5738
Chunwei Chen [Mon, 6 Mar 2017 17:20:20 +0000 (09:20 -0800)]
Fix loop device becomes read-only
Commit 933ec99 removes read and write from f_op because the vfs layer will
select iter_write or aio_write automatically. However, for Linux <= 4.0,
loop_set_fd will actually check f_op->write and set the device read-only if it
does not exist.
This patch adds them back and uses the generic do_sync_{read,write} for
aio_{read,write} and new_sync_{read,write} for {read,write}_iter.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5776
Closes #5855
Unlike other architectures, which sanitize LDFLAGS from the
environment in arch/<arch>/Makefile, the powerpc Makefile
allows LDFLAGS to be passed through, resulting in the following
build failure:
/usr/bin/ld: unrecognized option '-Wl,-z,relro'
LDFLAGS is set in /usr/lib/rpm/redhat/macros by default. Clear
the environment variable when building kmods for powerpc.
Additionally, now that ppc64le exists, it is no longer safe to
assume a powerpc system is big endian. Rely on the endianness
provided by the compiler.
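Relying on the compiler can look like the following sketch, which uses the `__BYTE_ORDER__` predefined macro that GCC and Clang provide instead of inferring byte order from the architecture name. The `TOY_` macro and helper are hypothetical; the runtime check only cross-validates the compile-time answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Take byte order from the compiler's predefined macros instead of
 * assuming "powerpc means big endian".
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define	TOY_IS_LITTLE_ENDIAN	0
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define	TOY_IS_LITTLE_ENDIAN	1
#else
#error "compiler does not report a supported byte order"
#endif

/* Runtime cross-check: inspect the first byte of a known value. */
static int
toy_runtime_is_little_endian(void)
{
	uint32_t v = 1;
	unsigned char first;

	memcpy(&first, &v, 1);
	return (first == 1);
}
```

On ppc64le the first branch is skipped and the build is correctly treated as little endian, whereas a check keyed on `__powerpc__` alone would get it wrong.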
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5856
Reduce size of zvol and enforce 4k blocksize in zvol tests
32-bit builders in the buildbot are having trouble completing
their ENOSPC testing in less than the timeout. Reduce the
zvol size and use a 4k block size to reduce read-modify-writes
which are particularly expensive on 32-bit systems due to the
reduced maximum ARC size.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kash Pande <kash@tripleback.net> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5845
Sydney Vanda [Fri, 23 Sep 2016 20:51:08 +0000 (13:51 -0700)]
Add auto-online test for ZED/FMA as part of the ZTS
Automated auto-online test to go along with ZED FMA integration (PR 4673).
auto_online_001.pos works with real devices (sd- and mpath) and with non-real
block devices (loop) by adding a scsi_debug device to the pool.
Note: In order for the test group to run, ZED must not currently be running.
Kernel 3.16.37 or higher is needed for scsi_debug to work properly.
If a timeout occurs on a test using a scsi_debug device (error noticed on an
Ubuntu system), a reboot might be needed in order for the test to pass (this
needs more investigation).
Also suppressed output from is_real_device/is_loop_device/is_mpath_device,
which was cluttering the log file with useless error messages such as
"/dev/mapper/sdc is not a block device" from the previous patch.
Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: David Quigley <david.quigley@intel.com> Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes #5774
Olaf Faaland [Wed, 1 Mar 2017 00:10:18 +0000 (16:10 -0800)]
Linux 4.11 compat: avoid refcount_t name conflict
Linux 4.11 introduces a new type, refcount_t, which conflicts with the
type of the same name defined within ZFS.
Rename the ZFS type zfs_refcount_t. Within the ZFS code, use a macro to
cause references to refcount_t to be changed to zfs_refcount_t at
compile time. This reduces conflicts when later landing OpenZFS
patches.
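A user-space sketch of the rename-plus-macro approach. The single-field `zfs_refcount_t` and the `toy_` functions are hypothetical simplifications; the point is that code still spelled `refcount_t` compiles against `zfs_refcount_t`. In a kernel build the macro would need to be defined after any kernel header that declares its own `refcount_t`, an assumption this sketch does not exercise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified version of the renamed ZFS type. */
typedef struct {
	uint64_t rc_count;
} zfs_refcount_t;

/*
 * Existing code keeps spelling the type refcount_t; the preprocessor
 * rewrites it to zfs_refcount_t at compile time.
 */
#define	refcount_t	zfs_refcount_t

static void
toy_refcount_create(refcount_t *rc)
{
	rc->rc_count = 0;
}

static uint64_t
toy_refcount_add(refcount_t *rc)
{
	return (++rc->rc_count);
}
```

Keeping the old spelling in the source is what reduces conflicts when later OpenZFS patches, which still use `refcount_t`, are applied.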
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5823
Closes #5842
Matt Kemp [Mon, 27 Feb 2017 20:03:23 +0000 (14:03 -0600)]
Fix initramfs hook for merged /usr/lib and /lib
Under a merged `/lib` -> `/usr/lib` which renders `/lib` as a symlink,
`find /lib -type f -name libgcc_s.so.1` will not return a result as
`find` will not traverse the symlink. Modifying it to `find /lib/ -type
f -name libgcc_s.so.1` should work for both symlinked and non-symlinked
`/lib` directories.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Kemp <matt@mattikus.com>
Closes #5834
Matthew Ahrens [Fri, 24 Feb 2017 21:34:26 +0000 (13:34 -0800)]
Clean up by-dnode code in dmu_tx.c
https://github.com/zfsonlinux/zfs/commit/0eef1bde31d67091d3deed23fe2394f5a8bf2276
introduced some changes whose style we slightly improved when
porting them to illumos.
There is also one minor error-handling fix, in zap_add() the "zap" may
become NULL in case of an error re-opening the ZAP.
Originally suggested at: https://github.com/openzfs/openzfs/pull/276
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #5805
Isaac Huang [Fri, 24 Feb 2017 20:05:42 +0000 (13:05 -0700)]
ABD style cleanups
The commit a6255b7fce400d485a0e87cbe369aa0ed7dc5dc4 removed a few
assertions which help catch errors and improve code readability. It also
duplicated two conditionals, which was unnecessary and made the code
confusing to read. This patch cleans it up.
Reviewed-by: David Quigley <david.quigley@intel.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5802