granicus.if.org Git - zfs/log
zfs
8 years ago Compile zio.h and zio_impl.h mutual include
cao [Thu, 1 Dec 2016 23:36:25 +0000 (07:36 +0800)]
Compile zio.h and zio_impl.h mutual include

zio.h includes zio_impl.h but zio_impl.h also includes zio.h, so the
header files include each other.  Get rid of the zio_impl.h include
in zio.h and update zio_inject.c to include zio.h instead of zio_impl.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5439

8 years ago Do not force VDEV_NAME_TYPE_ID in max_width()
Håkan Johansson [Thu, 1 Dec 2016 00:46:16 +0000 (01:46 +0100)]
Do not force VDEV_NAME_TYPE_ID in max_width()

Do not force VDEV_NAME_TYPE_ID in max_width(), instead add it
in the relevant calls to max_width().

The first max_width() call where VDEV_NAME_TYPE_ID is now added is
in show_import(), which is followed by print_import_config() and
print_logs().  Both of these print child vdev names that have been
retrieved with VDEV_NAME_TYPE_ID explicitly added.

The second location is in status_callback().  This is followed by
print_status_config(), print_logs(), print_l2cache(), and
print_spares(). For l2cache and spares it should not matter as there
are no mirror-X or raidz-X involved.  print_status_config() as above
retrieves the name using explicit VDEV_NAME_TYPE_ID before
calling itself to print children.

The call of max_width() in get_namewidth() is not changed, as this is
used by zpool_do_iostat(), followed by print_iostat(), which does not
add VDEV_NAME_TYPE_ID.

Overall, we should consider adding VDEV_NAME_TYPE_ID to the
relevant name_flags / cb_name_flags fields, and removing the explicit
additions in the called routines.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes #5401

8 years ago Convert zio_buf_alloc() consumers
Brian Behlendorf [Wed, 30 Nov 2016 23:18:20 +0000 (16:18 -0700)]
Convert zio_buf_alloc() consumers

In multiple cases zio_buf_alloc() was used instead of kmem_alloc()
or vmem_alloc().  This was often done because the allocations
could be large and it was easy to use zio_buf_alloc() for them.

But this isn't ideal for allocations which are small or short
lived.  In these cases it is better to use kmem_alloc() or
vmem_alloc().  If possible we want to avoid the case where
we have slabs allocated for kmem caches which are rarely used.

Note for small allocations vmem_alloc() will be internally
converted to kmem_alloc().  Therefore as long as large
allocations are infrequent and short lived the penalty for
using vmem_alloc() is small.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5409

8 years ago Introduce ARC Buffer Data (ABD)
Brian Behlendorf [Wed, 30 Nov 2016 21:48:16 +0000 (14:48 -0700)]
Introduce ARC Buffer Data (ABD)

ZFS currently uses ARC buffers which are backed by virtual memory.
While functional, there are some major problems with this approach
which can be observed on all OpenZFS platforms.  ABD was designed
to address these issues and includes contributions from OpenZFS
developers from multiple platforms.

While all OpenZFS platforms will benefit from ABD this functionality
is critical for Linux.  Unlike the other OpenZFS platforms the Linux
kernel discourages extensive use of virtual memory.  The provided
interfaces are not optimized for frequent allocations from the virtual
address space.  To maintain good performance a kmem cache is
used which contains relatively long lived slabs backed by virtual
memory.  The downside to the approach is that those slabs can
become highly fragmented resulting in an inefficient use of memory.

Another issue is that on 32-bit systems the available virtual
address space in the kernel is only a small fraction of total
system memory.  This means the ARC size is highly constrained
which hurts performance and makes allocating memory difficult
and OOMs more likely.

ABD is designed to address these issues by using scatter lists
of pages for data buffers.  This removes the need for slabs
which resolves the fragmentation issue.  It also allows high
memory pages to be allocated which alleviates the virtual
address space pressure on 32-bit systems.

For metadata buffers, which are small, linear ABDs are allocated
from the slab.  This is preferable because there are many places
in the code which expect to be able to read from a given offset
in the buffer.  Using linear ABDs means none of that code needs
to be modified.  The majority of these buffers are allocated with
kmalloc so there's minimal impact on the virtual address space.
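
The difference between the two buffer layouts can be sketched in userspace C. This is only an illustration of why a scatter buffer needs a chunk-aware copy while a linear buffer allows direct offset reads; the `sketch_abd` type, the `SKETCH_PAGE` size and the function name are hypothetical and are not the ABD interface.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 4096 /* assumed chunk size, hypothetical */

/* Hypothetical scatter buffer: the data lives in separate page-sized chunks. */
struct sketch_abd {
    size_t size;            /* total number of bytes described */
    unsigned char **chunks; /* each entry points at SKETCH_PAGE bytes */
};

/*
 * Copy len bytes starting at offset off out of the scatter buffer.
 * Returns 0 on success, -1 if the requested range is out of bounds.
 */
int
sketch_abd_copy_out(const struct sketch_abd *abd, size_t off, void *dst,
    size_t len)
{
    unsigned char *out = dst;

    if (off > abd->size || len > abd->size - off)
        return (-1);

    while (len > 0) {
        size_t idx = off / SKETCH_PAGE; /* which chunk holds off */
        size_t in = off % SKETCH_PAGE;  /* position inside that chunk */
        size_t n = SKETCH_PAGE - in;    /* bytes left in that chunk */

        if (n > len)
            n = len;
        memcpy(out, abd->chunks[idx] + in, n);
        out += n;
        off += n;
        len -= n;
    }
    return (0);
}
```

A linear buffer would make the same read a single pointer addition, which is why code that reads metadata at fixed offsets does not need to change.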

Tested-by: Kash Pande <kash@tripleback.net>
Tested-by: kernelOfTruth <kerneloftruth@gmail.com>
Tested-by: RageLtMan <rageltman@sempervictus>
Tested-by: DHE <git@dehacked.net>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: David Quigley <david.quigley@intel.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3441
Closes #5135

8 years ago Enable ro_props_001_pos
ChaoyuZhang [Wed, 30 Nov 2016 18:27:04 +0000 (02:27 +0800)]
Enable ro_props_001_pos

This script was disabled as the avail/used space changed slightly.
Add sync_pool() and a short delay after snapshots are created to
ensure everything in flight has been written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5201
Closes #5419

8 years ago Fix coverity defects: CID 154591
luozhengzheng [Wed, 30 Nov 2016 17:48:01 +0000 (01:48 +0800)]
Fix coverity defects: CID 154591

CID 154591: Incorrect expression (SIZEOF_MISMATCH)

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5435

8 years ago ABD optimized page allocation code
Chunwei Chen [Wed, 26 Oct 2016 04:32:23 +0000 (00:32 -0400)]
ABD optimized page allocation code

* Convert ABD to use the Linux Kernel scatterlist implementation
  instead of the hand rolled one from illumos.

* Scatter ABDs are preferentially populated with higher order
  compound pages from a single zone.  Allocation size is
  progressively decreased until it can be satisfied without
  performing reclaim or compaction.

* An alternate page allocator is provided for kernels older
  than 3.6 and for CONFIG_HIGHMEM systems.  This allocator
  is designed as a fallback for maximum compatibility.

* Extended abdstats to provide visibility into the allocator.

* Add cached value for PAGESIZE in userspace.
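
The "progressively decreased" allocation size described in the list above can be sketched as a loop that lowers the page order whenever a request cannot be satisfied cheaply. This is a userspace model under stated assumptions: the hook type, `sketch_limit_alloc()` and `sketch_fill_pages()` are hypothetical names, and the hook stands in for a kernel allocation attempt that refuses to reclaim or compact.

```c
#include <stddef.h>

/*
 * Hypothetical hook: returns nonzero when 2^order contiguous pages could be
 * obtained without reclaim or compaction.
 */
typedef int (*sketch_try_alloc_fn)(unsigned int order, void *arg);

/* Demo hook: pretend only orders up to *(int *)arg can be satisfied. */
int
sketch_limit_alloc(unsigned int order, void *arg)
{
    return ((int)order <= *(int *)arg);
}

/*
 * Cover npages pages with chunks, starting at max_order (assumed small,
 * e.g. below 16) and lowering the order whenever the hook declines.
 * Returns the number of chunks used, or -1 if even order 0 fails.
 */
long
sketch_fill_pages(size_t npages, unsigned int max_order,
    sketch_try_alloc_fn try_alloc, void *arg)
{
    unsigned int order = max_order;
    long chunks = 0;

    while (npages > 0) {
        /* Never request more pages than are still needed. */
        while (order > 0 && ((size_t)1 << order) > npages)
            order--;

        if (try_alloc(order, arg)) {
            npages -= (size_t)1 << order;
            chunks++;
        } else if (order > 0) {
            order--; /* retry with a smaller request */
        } else {
            return (-1);
        }
    }
    return (chunks);
}
```

With a hook that only satisfies orders up to 2, ten pages end up as two order-2 chunks and one order-1 chunk.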

Contributions-by:
Chunwei Chen <david.chen@osnexus.com>
Gvozden Neskovic <neskovic@gmail.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Isaac Huang <he.huang@intel.com>
David Quigley <david.quigley@intel.com>
Brian Behlendorf <behlendorf1@llnl.gov>

8 years ago ABD kmap to kmap_atomic
Chunwei Chen [Tue, 27 Sep 2016 21:30:02 +0000 (17:30 -0400)]
ABD kmap to kmap_atomic

Convert usage of kmap to kmap_atomic while correctly saving off
irq state.

8 years ago ABD raidz NEON support
Romain Dolbeau [Tue, 22 Nov 2016 07:38:34 +0000 (08:38 +0100)]
ABD raidz NEON support

Port NEON implementation of RAID-Z functions to ABD.

Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>

8 years ago ABD raidz avx512f support
Gvozden Neskovic [Sun, 20 Nov 2016 05:01:31 +0000 (06:01 +0100)]
ABD raidz avx512f support

Implement shift based multiplication for 512f. Higher IPC over lookup based
methods yields up to 40% better performance on the current hardware.

Results on Xeon Phi(TM) CPU 7210:
implementation   gen_p           gen_pq          gen_pqr         rec_p           rec_q           rec_r           rec_pq          rec_pr          rec_qr          rec_pqr
original         142232671       24411492        12948205        283053705       22348167        4215911         9171609         2265548         2378370         1648495
scalar           295711162       49851491        33253815        293198109       88179448        61866752        27941684        25764416        17384442        12138153
sse2             410055998       199642658       117973654       406240463       152688682       121092250       84968180        79291076        47473657        20779719
ssse3            411641595       199669571       117937647       406211024       137638508       117050346       81263322        76120405        46281559        32696722
avx2             616485806       311515332       188595628       605455115       260602390       230554476       148198817       138800254       92273356        62937819
avx512f          832191523       408509425       253599522       810094481       404325734       317590971       218235687       197204920       133101937       94001219
fastest          avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f         avx512f
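
As a scalar illustration of shift-based multiplication, the sketch below multiplies GF(2^8) elements by 2 using only shifts, masks and XOR, reducing by the polynomial 0x11d used for RAID-Z parity. It is not the avx512f code from this commit; the function names are made up for the example, and the packed variant simply processes eight bytes held in one 64-bit word.

```c
#include <stdint.h>

/* Multiply one GF(2^8) element by 2, reducing by the polynomial 0x11d. */
uint8_t
gf_mul2_byte(uint8_t x)
{
    return ((uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0)));
}

/* Multiply eight packed GF(2^8) elements by 2 with shifts and masks only. */
uint64_t
gf_mul2_x8(uint64_t x)
{
    /* Bytes whose top bit is set will overflow when shifted. */
    uint64_t hi = x & 0x8080808080808080ULL;
    /* Expand each such top bit into 0xff across its own byte. */
    uint64_t mask = (hi << 1) - (hi >> 7);

    return (((x << 1) & 0xfefefefefefefefeULL) ^
        (mask & 0x1d1d1d1d1d1d1d1dULL));
}
```

Multiplication by other constants can be built from repeated doubling and XOR, which avoids table lookups at the cost of more arithmetic instructions.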

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>

8 years ago ABD Vectorized raidz
Gvozden Neskovic [Wed, 24 Aug 2016 13:51:33 +0000 (15:51 +0200)]
ABD Vectorized raidz

Enable vectorized raidz code on ABD buffers.  The avx512f,
avx512bw, neon and aarch64_neonx2 are disabled in this commit.
With the exception of avx512bw these implementations are
updated for ABD in the subsequent commits.

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>

8 years ago ABD changes for vectorized RAIDZ
Gvozden Neskovic [Wed, 24 Aug 2016 13:42:51 +0000 (15:42 +0200)]
ABD changes for vectorized RAIDZ

* userspace: aligned buffers. Minimum of 32B alignment is
  needed for AVX2. Kernel buffers are aligned 512B or more.
* add abd_get_offset_size() interface
* abd_iter_map(): fix calculation of iter_mapsize
* add abd_raidz_gen_iterate() and abd_raidz_rec_iterate()

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>

8 years ago ABD page support to vdev_disk.c
Isaac Huang [Wed, 31 Aug 2016 06:26:43 +0000 (00:26 -0600)]
ABD page support to vdev_disk.c

Signed-off-by: Isaac Huang <he.huang@intel.com>

8 years ago DLPX-44812 integrate EP-220 large memory scalability
David Quigley [Fri, 22 Jul 2016 15:52:49 +0000 (11:52 -0400)]
DLPX-44812 integrate EP-220 large memory scalability

8 years ago zstreamdump needs to initialize fletcher 4 support
Tim Chase [Tue, 29 Nov 2016 21:47:05 +0000 (15:47 -0600)]
zstreamdump needs to initialize fletcher 4 support

Otherwise, the checksum function pointer isn't initialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5411

8 years ago Add -c to zpool iostat & status to run command
Tony Hutter [Tue, 29 Nov 2016 21:45:38 +0000 (13:45 -0800)]
Add -c to zpool iostat & status to run command

This patch adds a command (-c) option to zpool status and zpool iostat.  The
-c option allows you to run an arbitrary command on each vdev and display
the first line of output in zpool status/iostat.  The environment vars
VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path"
before running the command.  For device mapper, multipath, or partitioned
vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk.  This can be useful
if the command you're running requires a /dev/sd* device.

The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device
instead of using libdevmapper.  This not only removes the libdevmapper
requirement at build time, but also allows you to resolve device mapper
devices without being root.  This means that VDEV_UPATH gets set correctly
when running zpool status/iostat as an unprivileged user.

Example:

$ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH'

NAME        STATE     READ WRITE CKSUM
mypool      ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    mpatha  ONLINE       0     0     0  I am /dev/mapper/mpatha, /dev/sdc
    sdb     ONLINE       0     0     0  I am /dev/sdb1, /dev/sdb
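
The command given to -c can be any program, so a small C helper could read the same environment variables. The sketch below only shows that side of the contract; `sketch_format_vdev_line()` is a hypothetical name, and printing "-" for an unset variable is an assumption made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build the one-line message a -c command might print, using the
 * VDEV_PATH and VDEV_UPATH variables described above.
 * Returns the snprintf() result.
 */
int
sketch_format_vdev_line(char *buf, size_t len)
{
    const char *path = getenv("VDEV_PATH");
    const char *upath = getenv("VDEV_UPATH");

    return (snprintf(buf, len, "I am %s, %s",
        path != NULL ? path : "-",
        upath != NULL ? upath : "-"));
}
```

Only the first line such a program prints would be shown, per the description above.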

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5368

8 years ago Allow zfs unshare <protocol> -a
LOLi [Tue, 29 Nov 2016 19:22:38 +0000 (20:22 +0100)]
Allow zfs unshare <protocol> -a

Allow `zfs unshare <protocol> -a` command to share or unshare all datasets
of a given protocol, nfs or smb.

Additionally, enable most of ZFS Test Suite zfs_share/zfs_unshare test cases.
To work around some Illumos-specific functionalities ($SHARE/$UNSHARE) some
function wrappers were added around them.

Finally, fix an issue in smb_is_share_active() that would leave SMB shares
exported when invoking 'zfs unshare -a'.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3238
Closes #5367

8 years ago Ensure that perf regression tests cleanup properly
Giuseppe Di Natale [Tue, 29 Nov 2016 00:24:47 +0000 (16:24 -0800)]
Ensure that perf regression tests cleanup properly

Each test in the performance regression test suite
creates a pool and a dataset for use. Unfortunately,
these tests do not clean up the pool and dataset
correctly once they complete. Each test now kills
fio and iostat, destroys the dataset, and finally
destroys the pool. Each test also now traps the
SIGTERM signal to handle cases where test-runner
kills a test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Requires-builders: all
Closes #5407

8 years ago Enable user_property_002_pos
ChaoyuZhang [Sat, 19 Nov 2016 00:25:06 +0000 (08:25 +0800)]
Enable user_property_002_pos

The user_property_002_pos passes as expected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5406

8 years ago Kernel 4.9 compat: file_operations->aio_fsync removal
DeHackEd [Tue, 15 Nov 2016 17:20:46 +0000 (12:20 -0500)]
Kernel 4.9 compat: file_operations->aio_fsync removal

Linux kernel commit 723c038475b78 removed this field.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #5393

8 years ago Fix man page formatting in zfs-module-parameters
DeHackEd [Tue, 15 Nov 2016 01:03:57 +0000 (20:03 -0500)]
Fix man page formatting in zfs-module-parameters

Bold and Normal codes were mixed up in a few places resulting in
bad highlighting.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #5397

8 years ago Repair indent of zpool.8 man page
Håkan Johansson [Mon, 14 Nov 2016 17:47:49 +0000 (18:47 +0100)]
Repair indent of zpool.8 man page

Repair indent of zpool.8 man page, just before zpool labelclear
details.  Accidentally introduced by 193a37cb2 (git bisect).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes #5394

8 years ago Fix 'zpool import' detection issue
Brian Behlendorf [Mon, 14 Nov 2016 17:40:18 +0000 (09:40 -0800)]
Fix 'zpool import' detection issue

Before adding the entry to the configuration verify that the
device can be opened exclusively.  This ensures that as long
as multipathd is running the underlying multipath devices, which
otherwise appear identical to their /dev/mapper counterpart,
are pruned from the configuration.

Failure to do so can result in the vdev appearing
as UNAVAIL when the vdev path provided to the kernel can't be
opened exclusively.

This check would normally be performed in zpool_open_func()
but placing it there would result in false positives because
it is called concurrently for many devices.
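
An exclusive-open probe of this kind can be sketched as follows. On Linux, open(2) documents that O_EXCL without O_CREAT on a block device fails with EBUSY when the device is already in use. The function name is hypothetical, and this is a minimal illustration rather than the code added by the patch.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if path can be opened with O_EXCL, 0 otherwise.
 * For a block device that is claimed by another user (for example a
 * multipath member held by device mapper) the open is expected to fail.
 */
int
sketch_can_open_exclusive(const char *path)
{
    int fd = open(path, O_RDONLY | O_EXCL);

    if (fd < 0)
        return (0);
    (void) close(fd);
    return (1);
}
```

Because the probe briefly holds the device exclusively, running it concurrently against the same device could make a second probe fail, which matches the false-positive concern described above.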

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5387

8 years ago Add a statechange notify zedlet
Don Brady [Thu, 10 Nov 2016 21:52:59 +0000 (14:52 -0700)]
Add a statechange notify zedlet

Now that ZED has internal fault diagnosis and the statechange event
is generated for faulted states, we can replace the io-notify and
checksum-notify zedlets with one based on statechange.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5383

8 years ago Fix coverity defects: CID 147503
luozhengzheng [Thu, 10 Nov 2016 16:50:32 +0000 (00:50 +0800)]
Fix coverity defects: CID 147503

CID 147503: Dereference after null check (FORWARD_NULL)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5326

8 years ago Fix coverity defects: CID 147540, 147542
cao [Thu, 10 Nov 2016 01:35:26 +0000 (09:35 +0800)]
Fix coverity defects: CID 147540, 147542

CID 147540: unsigned_compare
- Cast nsec to an int32_t to properly detect the expected overflow.
CID 147542: unsigned_compare
- intval can never be less than ZIO_FAILURE_MODE_WAIT which is
  defined to be zero.  Remove this useless check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5379

8 years ago Fix ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE check
Gvozden Neskovic [Wed, 9 Nov 2016 21:53:13 +0000 (22:53 +0100)]
Fix ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE check

Pass `ACL_TYPE_ACCESS` as the type parameter of `set_cached_acl()` and
`forget_cached_acl()` to avoid removal of dead code after BUG() at
compile time. Tested on 3.2.0 kernel.

Introduced in 3779913

Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5378

8 years ago Export symbol dmu_objset_userobjspace_upgradable
jxiong [Wed, 9 Nov 2016 21:51:12 +0000 (13:51 -0800)]
Export symbol dmu_objset_userobjspace_upgradable

It's used by Lustre to determine if the objset can be upgraded.
The inline version doesn't work because dmu_objset_is_snapshot()
is not exported.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Closes #5385

8 years ago Linux 3.14 compat: assign inode->set_acl
tuxoko [Wed, 9 Nov 2016 18:37:17 +0000 (10:37 -0800)]
Linux 3.14 compat: assign inode->set_acl

Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which is handled by the acl xattr_handler, and we already
handle that well. However, nfsd will directly call inode->set_acl or
return an error if it doesn't exist.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5371
Closes #5375

8 years ago Fix symlinks for {vdev_clear,statechange}-led.sh
Olaf Faaland [Wed, 9 Nov 2016 18:19:43 +0000 (10:19 -0800)]
Fix symlinks for {vdev_clear,statechange}-led.sh

These were named in the zed/Makefile.am as vdev_clear-blinkled.sh
and statechange-blinkled.sh causing bad symlinks to be created.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5384

8 years ago Fix coverity defects: CID 147586
cao [Wed, 9 Nov 2016 01:33:23 +0000 (09:33 +0800)]
Fix coverity defects: CID 147586

CID 147586: function:allow_usage Type:out-of-bounds read

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5364

8 years ago Fix coverity defects: CID 147629
cao [Wed, 9 Nov 2016 00:41:31 +0000 (08:41 +0800)]
Fix coverity defects: CID 147629

CID 147629: Type:Dereference before null check

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5376

8 years ago Fix coverity defects: 154021
luozhengzheng [Tue, 8 Nov 2016 22:34:52 +0000 (06:34 +0800)]
Fix coverity defects: 154021

CID 154021: Null pointer dereference

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5380

8 years ago Fix coverity defects: CID 147626, 147628
cao [Tue, 8 Nov 2016 22:28:17 +0000 (06:28 +0800)]
Fix coverity defects: CID 147626, 147628

CID 147626: Type:Dereference before null check
CID 147628: Type:Dereference before null check

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5304

8 years ago Skip test suites on 32-bit TEST builders
Brian Behlendorf [Tue, 8 Nov 2016 21:57:17 +0000 (13:57 -0800)]
Skip test suites on 32-bit TEST builders

The ztest, filebench, xfstests, and zfsstress test suites should
be skipped when testing on 32-bit platforms until they pass
reliably.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5381

8 years ago Add illumos FMD ZFS logic to ZED -- phase 2
Don Brady [Mon, 7 Nov 2016 23:01:38 +0000 (16:01 -0700)]
Add illumos FMD ZFS logic to ZED -- phase 2

The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.

The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corresponding
fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.
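
A SERD engine fires when N events arrive within a time window T. The sketch below models that rule with a small ring buffer of timestamps. The type and function names are hypothetical, the inclusive window comparison and the reset on firing are assumptions, and this is not the FMD or ZED implementation.

```c
#include <stdint.h>

#define SKETCH_SERD_MAX_N 16

/* Hypothetical SERD state: fire when n events fall within a window of t. */
struct sketch_serd {
    unsigned int n;     /* events required to fire */
    uint64_t t;         /* window length, same unit as the timestamps */
    unsigned int count; /* events currently stored */
    unsigned int head;  /* index of the oldest stored event */
    uint64_t when[SKETCH_SERD_MAX_N];
};

void
sketch_serd_init(struct sketch_serd *s, unsigned int n, uint64_t t)
{
    /* Clamp n to what the fixed-size buffer can hold. */
    if (n < 1)
        n = 1;
    if (n > SKETCH_SERD_MAX_N)
        n = SKETCH_SERD_MAX_N;
    s->n = n;
    s->t = t;
    s->count = 0;
    s->head = 0;
}

/*
 * Record an event at time now (timestamps must be non-decreasing).
 * Returns 1 if the engine fires, 0 otherwise. Firing resets the engine.
 */
int
sketch_serd_record(struct sketch_serd *s, uint64_t now)
{
    if (s->count == s->n) {
        /* Buffer full: overwrite the oldest event. */
        s->when[s->head] = now;
        s->head = (s->head + 1) % s->n;
    } else {
        s->when[(s->head + s->count) % s->n] = now;
        s->count++;
    }

    /* Fire when the oldest of the last n events is within the window. */
    if (s->count == s->n && now - s->when[s->head] <= s->t) {
        s->count = 0;
        s->head = 0;
        return (1);
    }
    return (0);
}
```

In this model each tracked vdev would own one such state per ereport class, mirroring the "one case per vdev" description.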

The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5343

8 years ago Fix coverity defects: CID 147575, 147577, 147578, 147579
cao [Mon, 7 Nov 2016 22:54:32 +0000 (06:54 +0800)]
Fix coverity defects: CID 147575, 147577, 147578, 147579

CID 147575, Type:Unintentional integer overflow
CID 147577, Type:Unintentional integer overflow
CID 147578, Type:Unintentional integer overflow
CID 147579, Type:Unintentional integer overflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5365

8 years ago Use set_cached_acl and forget_cached_acl when possible
Chunwei Chen [Wed, 2 Nov 2016 00:19:52 +0000 (17:19 -0700)]
Use set_cached_acl and forget_cached_acl when possible

Originally, these two functions were inline, so their usability was tied to
posix_acl_release. However, since Linux 3.14 they have been EXPORT_SYMBOL, so we
can always use them. In this patch, we create an independent test for these
two functions so we can use them when possible.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years ago Batch free zpl_posix_acl_release
Chunwei Chen [Fri, 28 Oct 2016 20:37:00 +0000 (13:37 -0700)]
Batch free zpl_posix_acl_release

Currently every call to zpl_posix_acl_release will schedule a delayed task,
and each delayed task will add a timer. This used to be fine except for a
possibly bad performance impact.

However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In
this new implementation, the larger the delay, the less accurate the timer is.
So when we have a flood of timers from zpl_posix_acl_release, they will expire
at the same time. Coupled with the fact that task_expire will do a linear
search with the lock held, this causes an extreme amount of contention inside
interrupt context and would actually lock up the system.

We fix this by doing batch free to prevent a flood of delayed tasks. Every call
to zpl_posix_acl_release will put the posix_acl to be freed on a lockless
list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free
every posix_acl that passed the grace period on the list. This way, we only
have one delayed task every second.
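
The batching scheme can be modelled in userspace with C11 atomics: releases push onto a lock-free list, and a periodic pass detaches the whole list and frees what has aged past the grace period. All names below are hypothetical, time is passed in explicitly instead of being read from a clock, and the kernel code uses its own list and task primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object awaiting a deferred free. */
struct sketch_node {
    struct sketch_node *next;
    uint64_t release_time; /* when the release was requested */
};

static _Atomic(struct sketch_node *) sketch_pending = NULL;

/* Lock-free push: done on every release instead of scheduling a task. */
void
sketch_release(struct sketch_node *n, uint64_t now)
{
    struct sketch_node *old = atomic_load(&sketch_pending);

    n->release_time = now;
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&sketch_pending, &old, n));
}

/*
 * Runs once per batch window. Detaches the whole list in one step, frees
 * entries whose grace period has passed and re-queues the rest.
 * Returns the number of entries freed.
 */
size_t
sketch_batch_free(uint64_t now, uint64_t grace)
{
    struct sketch_node *n = atomic_exchange(&sketch_pending, NULL);
    size_t freed = 0;

    while (n != NULL) {
        struct sketch_node *next = n->next;

        if (n->release_time + grace <= now) {
            free(n);
            freed++;
        } else {
            /* Too young: keep it for the next window. */
            sketch_release(n, n->release_time);
        }
        n = next;
    }
    return (freed);
}
```

The point of the design is that the number of timers no longer scales with the number of releases: one periodic pass handles everything queued since the previous one.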

[1] https://lwn.net/Articles/646950/

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years ago Fix 'zpool import' detection issues
Brian Behlendorf [Mon, 7 Nov 2016 18:28:57 +0000 (10:28 -0800)]
Fix 'zpool import' detection issues

This patch addresses multiple 'zpool import' block device
identification problems which are most likely to occur on a
system configured to use blkid, by_vdev paths, multipath and
failover.  The symptom most commonly observed is that the import
uses different path names to import the pool than would
normally be expected.

* When using blkid to identify vdevs the listed devices may
be added to the cache in any order.  In order to apply the
preferred search order heuristic a zfs_path_order() function
was added to calculate the order given full path names.

* Since it's possible to have multiple block devices with
different vdev guids which refer to the same ZPOOL_CONFIG_PATH
the slice cache must be indexed by guid and name.  By avoiding
collisions the preferred ordering can be maintained even
when multiple block devices claim the same ZPOOL_CONFIG_PATH.
The preferred sorting by partition was never beneficial for
a Linux system and was removed as part of this change.

* When adding entries to the blkid cache avl_find/avl_insert
are used instead of avl_add because collisions are possible
and must be handled gracefully.

* For pools using multipath devices there are, at a minimum,
three devices where a vdev label may be read.  They are the
dm-* device and each underlying /dev/sd* device.  Due to the
way the block cache is implemented each of these devices may
have a different cached copy of the vdev label.  This can
result in "ghost pools" which appear to persist even after
a 'zpool labelclear' has been done to the dm-* device.  In
order to prevent this the vdev label is read with O_DIRECT
in order to bypass any caching to get the on-disk version.

* When opening a block device verify that vdev guid read from
the disk matches the expected vdev guid.  This allows for bad
labels to be filtered out.
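
A path-ordering heuristic like the one in the first bullet above can be sketched as a prefix lookup: the earlier a directory appears in the search list, the more preferred a path under it is. The table contents and the name `sketch_path_order()` are illustrative assumptions; the actual search order and the zfs_path_order() signature are not given in this log.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative search order only; the real list may differ. */
static const char *const sketch_search_order[] = {
    "/dev/disk/by-vdev/",
    "/dev/mapper/",
    "/dev/disk/by-id/",
    "/dev/",
};

/*
 * Return the index of the first prefix matching path (lower is preferred),
 * or the table size when nothing matches.
 */
size_t
sketch_path_order(const char *path)
{
    size_t n = sizeof (sketch_search_order) /
        sizeof (sketch_search_order[0]);

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(sketch_search_order[i]);

        if (strncmp(path, sketch_search_order[i], len) == 0)
            return (i);
    }
    return (n);
}
```

Sorting candidate devices by this value makes the result independent of the order in which blkid happened to list them.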

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5359

8 years ago Allow 16M zio buffers in user space
Brian Behlendorf [Sat, 5 Nov 2016 04:54:48 +0000 (04:54 +0000)]
Allow 16M zio buffers in user space

Only restrict the maximum zio alloc size to 32-bit kernel space.
The same virtual address space limitations don't apply to user
space.  This resolves a memory allocation failure in raidz_test
where it expects to be able to exercise all valid zio sizes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

8 years ago Replace ISAINFO with is_32bit function
Brian Behlendorf [Fri, 4 Nov 2016 21:10:17 +0000 (21:10 +0000)]
Replace ISAINFO with is_32bit function

The isainfo(1) utility was used by the ZFS Test Suite to determine
when running on a 32-bit platform.  This non-portable check has been
replaced with an is_32bit helper function which uses getconf(1).
The getconf(1) utility is available for Linux, FreeBSD, and Illumos.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

8 years ago Allow autoreplace even when enclosure LED sysfs entries don't exist
Tony Hutter [Fri, 4 Nov 2016 20:34:13 +0000 (13:34 -0700)]
Allow autoreplace even when enclosure LED sysfs entries don't exist

The previous autoreplace code assumed that if you were using autoreplace, then
you also had the enclosure SES driver loaded.  This could lead to autoreplace
not working if the SES driver wasn't loaded, or if it wasn't creating the
proper enclosure_device symlinks (which has happened).  This patch removes
that assumption.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5363

8 years ago Add superscalar fletcher4
Romain Dolbeau [Fri, 4 Nov 2016 17:53:03 +0000 (18:53 +0100)]
Add superscalar fletcher4

This is the Fletcher4 algorithm implemented in pure C, but using
multiple counters using algorithms identical to those used for
SSE/NEON and AVX2.

This allows for faster execution on cores with strong superscalar
capabilities but weak SIMD capabilities.
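
The multiple-counter idea can be shown with a two-way split: even words feed one set of accumulators, odd words feed another, and the partial sums are recombined at the end. This is a sketch with hypothetical names, assuming an even word count; the recombination coefficients follow from the position weights of Fletcher-4 and hold modulo 2^64. The number of streams used by the actual implementation is not stated in this log.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_cksum {
    uint64_t a, b, c, d;
};

/* Reference Fletcher-4 over n 32-bit words with 64-bit accumulators. */
struct sketch_cksum
sketch_fletcher4_scalar(const uint32_t *w, size_t n)
{
    struct sketch_cksum s = { 0, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        s.a += w[i];
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return (s);
}

/*
 * Two independent accumulator sets: even words feed s0, odd words feed s1.
 * n must be even. The two dependency chains can execute in parallel.
 */
struct sketch_cksum
sketch_fletcher4_2way(const uint32_t *w, size_t n)
{
    struct sketch_cksum s0 = { 0, 0, 0, 0 };
    struct sketch_cksum s1 = { 0, 0, 0, 0 };
    struct sketch_cksum r;

    for (size_t i = 0; i + 1 < n; i += 2) {
        s0.a += w[i];
        s1.a += w[i + 1];
        s0.b += s0.a;
        s1.b += s1.a;
        s0.c += s0.b;
        s1.c += s1.b;
        s0.d += s0.c;
        s1.d += s1.c;
    }

    /* Recombine the partial sums (unsigned arithmetic wraps modulo 2^64). */
    r.a = s0.a + s1.a;
    r.b = 2 * s0.b + 2 * s1.b - s1.a;
    r.c = 4 * s0.c - s0.b + 4 * s1.c - 3 * s1.b;
    r.d = 8 * s0.d - 4 * s0.c + 8 * s1.d - 8 * s1.c + s1.b;
    return (r);
}
```

Both functions produce the same four values for any even-length input, so the split changes only how much instruction-level parallelism the CPU can extract.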

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #5317

8 years ago Add support for O_TMPFILE
Chunwei Chen [Tue, 26 Jan 2016 20:29:46 +0000 (12:29 -0800)]
Add support for O_TMPFILE

Linux 3.11 adds O_TMPFILE to open(2), which allows creating an unlinked file on
a supported filesystem. It's basically doing open(2) and unlink(2) atomically.

The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to the unlinked set.

We also add support for linkat(2) to link a tmpfile. However, since all previous
file operations will skip the ZIL, we force a txg_wait_synced to make sure we are
sync safe.
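
From userspace, the feature is exercised roughly as below, following the pattern documented in open(2). The helper names are hypothetical; the sketch assumes Linux with glibc and a mounted /proc, and the calls fail with -1 on filesystems or directories that do not support O_TMPFILE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for O_TMPFILE with glibc */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create an unnamed file in dir; returns an fd, or -1 with errno set. */
int
sketch_open_tmpfile(const char *dir)
{
    return (open(dir, O_TMPFILE | O_RDWR, 0600));
}

/* Give the unnamed file a name via linkat(2). Returns 0, or -1 on error. */
int
sketch_link_tmpfile(int fd, const char *newpath)
{
    char proc[64];

    (void) snprintf(proc, sizeof (proc), "/proc/self/fd/%d", fd);
    return (linkat(AT_FDCWD, proc, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW));
}
```

The second helper is the linkat(2) case mentioned above: the file only becomes visible in the namespace once it is linked.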

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years ago Fix unlinked file cannot do xattr operations
Chunwei Chen [Thu, 13 Oct 2016 00:30:46 +0000 (17:30 -0700)]
Fix unlinked file cannot do xattr operations

Currently, doing things like fsetxattr(2) on an unlinked file will result in
ENODATA. There are two places that cause this: zfs_dirent_lock and zfs_zget.

The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we
need it to not return an error when the zp is unlinked. This is a pretty big
change in behavior, but skimming through all the callers, I don't think this
change would cause any problem. Also there's nothing preventing z_unlinked
from being set after the z_lock mutex is dropped but before zfs_zget
returns anyway.

The rest of the stuff is to make sure we don't log xattr stuff when owner is
unlinked.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years ago Add parity generation/rebuild using AVX-512 for x86-64
Romain Dolbeau [Wed, 2 Nov 2016 19:40:23 +0000 (20:40 +0100)]
Add parity generation/rebuild using AVX-512 for x86-64

avx512f should work on all AVX512 hardware, since it only uses
Foundation instructions.

avx512bw should be faster on hardware supporting the AVX512BW
extension. We can use full-width pshufb (instead of relying on the
256-bit AVX2 pshufb). As a side-effect, the code is also unrolled more.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.github@dolbeau.name>
Closes #5219

8 years ago Fix dsl_prop_get_all_dsl() memory leak
BearBabyLiu [Wed, 2 Nov 2016 19:34:10 +0000 (03:34 +0800)]
Fix dsl_prop_get_all_dsl() memory leak

On error dsl_prop_get_all_ds() does not free the nvlist it allocates.
This behavior may have been intentional when originally written
but is atypical and often confusing.  Since no callers rely on this
behavior the function has been updated to always free the nvlist
on error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: BearBabyLiu <liu.huang@zte.com.cn>
Closes #5320

8 years ago Skip async_destroy_001_pos on 32-bit systems
Brian Behlendorf [Mon, 31 Oct 2016 21:16:37 +0000 (21:16 +0000)]
Skip async_destroy_001_pos on 32-bit systems

The async_destroy_001_pos test case currently hangs when testing on
a 32-bit system.  Conditionally skip this test case on 32-bit
systems until the root cause is identified and resolved.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5352
Issue #5347

8 years agoUse vmem_size() for 32-bit systems
Brian Behlendorf [Mon, 31 Oct 2016 19:24:54 +0000 (19:24 +0000)]
Use vmem_size() for 32-bit systems

On 32-bit Linux systems use vmem_size() to correctly size the ARC
and better determine when IO should be throttled due to low memory.

On 64-bit systems this change has no effect since the virtual
address space available far exceeds the physical memory available.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347

8 years agoFix 32-bit maximum volume size
Brian Behlendorf [Fri, 28 Oct 2016 23:53:24 +0000 (23:53 +0000)]
Fix 32-bit maximum volume size

A limit of 1TB exists for zvols on 32-bit systems.  Update the code
to correctly reflect this limitation in a similar manner as the
OpenZFS implementation.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347

8 years agoEnable .zfs/snapshot for 32-bit systems
Brian Behlendorf [Fri, 28 Oct 2016 22:42:56 +0000 (22:42 +0000)]
Enable .zfs/snapshot for 32-bit systems

Originally the .zfs/snapshot directory was disabled for 32-bit systems
because 64-bit inode numbers were not supported.  This is no longer
the case and this functionality can be enabled by default.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347
Closes #2002

8 years agoAdd TASKQID_INVALID
Brian Behlendorf [Fri, 28 Oct 2016 22:40:14 +0000 (22:40 +0000)]
Add TASKQID_INVALID

Add the TASKQID_INVALID macro and update callers to use the macro
instead of testing against 0.  There is no functional change
even though the functions in zfs_ctldir.c incorrectly used -1
instead of 0.
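
A rough sketch of the pattern, for illustration.  The typedef and macro
value below are assumptions about how the SPL defines them, and
example_dispatch() and example_dispatch_ok() are hypothetical helpers.

```c
#include <stdbool.h>

/* Assumed definitions for illustration; the real ones live in the SPL. */
typedef unsigned long taskqid_t;
#define	TASKQID_INVALID	((taskqid_t)0)

/* Hypothetical dispatcher: returns TASKQID_INVALID when it cannot queue. */
taskqid_t
example_dispatch(bool can_queue)
{
	static taskqid_t next_id = 1;

	if (!can_queue)
		return (TASKQID_INVALID);
	return (next_id++);
}

/* Callers compare against the named sentinel rather than a bare 0 or -1. */
bool
example_dispatch_ok(taskqid_t id)
{
	return (id != TASKQID_INVALID);
}
```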

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347

8 years agoProcess all systemd services through the systemd scriptlets
Neal Gompa (ニール・ゴンパ) [Wed, 2 Nov 2016 17:56:36 +0000 (13:56 -0400)]
Process all systemd services through the systemd scriptlets

This patch ensures that all systemd services are processed through the
systemd scriptlets, so that services are properly configured per the
preset file installed by the package.

Without this, zfs.target is set, but none of the services are enabled per
the preset file, meaning automounting filesystems and such won't work
out of the box.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa13@gmail.com>
Closes #5356

8 years agoFix sa_legacy_attr_count to use ARRAY_SIZE
cao [Wed, 2 Nov 2016 17:26:12 +0000 (01:26 +0800)]
Fix sa_legacy_attr_count to use ARRAY_SIZE

Replace magic value 16 with ARRAY_SIZE() to correctly handle
when the sa_legacy_attrs array size changes.
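
As an illustration of the idiom, the sketch below uses a hypothetical
three-entry table (example_attrs) rather than the real sa_legacy_attrs
array, and assumes the usual ARRAY_SIZE() definition.

```c
#include <stddef.h>

/* Common definition; assumed to match the one the tree already provides. */
#define	ARRAY_SIZE(a)	(sizeof (a) / sizeof ((a)[0]))

/* Hypothetical attribute table, much shorter than sa_legacy_attrs. */
typedef struct example_attr {
	const char *ea_name;
	int ea_length;
} example_attr_t;

static const example_attr_t example_attrs[] = {
	{ "ZPL_ATIME", 16 },
	{ "ZPL_MTIME", 16 },
	{ "ZPL_CTIME", 16 },
};

/* The count follows the initializer; adding an entry needs no other edit. */
size_t
example_attr_count(void)
{
	return (ARRAY_SIZE(example_attrs));
}
```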

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5354

8 years agoFix coverity defects: CID 147553
cao [Tue, 1 Nov 2016 17:20:24 +0000 (01:20 +0800)]
Fix coverity defects: CID 147553

CID 147553: Type:Dereference null return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5305

8 years agoFix coverity defects: CID 147548
cao [Mon, 31 Oct 2016 23:56:10 +0000 (07:56 +0800)]
Fix coverity defects: CID 147548

CID 147548: Type:Dereference null return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5321

8 years agoFix coverity defects: CID 152975
cao [Mon, 31 Oct 2016 23:23:56 +0000 (07:23 +0800)]
Fix coverity defects: CID 152975

CID 152975: Type:Dereference null return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5322

8 years agoFix coverity defects: CID 147509
GeLiXin [Mon, 31 Oct 2016 23:04:01 +0000 (07:04 +0800)]
Fix coverity defects: CID 147509

CID 147509: Explicit null dereferenced
- l2arc_sublist_lock is fragile because it relies too heavily on its caller.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5319

8 years agoUpdate migration tests
legend-hua [Mon, 31 Oct 2016 21:55:40 +0000 (05:55 +0800)]
Update migration tests

Due to the instability of the migration tests, the test will skip.
The migration tests focus on migrating a test file from another fs to a
ZFS fs.  We can create the zpool and ext2 directly on a loop device,
rather than with set_partition.

Reviewed-by: Sydney Vanda <sydney.m.vanda@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: legend-hua <liu.hua130@zte.com.cn>
Closes #5315

8 years agoAdd paxcheck make lint target
Jason Zaman [Fri, 28 Oct 2016 23:10:00 +0000 (07:10 +0800)]
Add paxcheck make lint target

This uses scanelf (from pax-utils) to check for any issues with the
binaries. It currently checks for executable stacks and textrels.
The checks are in a script so can be extended easily in the future for
more checks.

Executable stacks and textrels are frequently caused by issues in asm
files and lead to security and perf problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Zaman <jason@perfinion.com>
Closes #5338

8 years agoTag 0.7.0-rc2 zfs-0.7.0-rc2
Brian Behlendorf [Wed, 26 Oct 2016 17:36:33 +0000 (10:36 -0700)]
Tag 0.7.0-rc2

Second release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

8 years agoFix lookup_bdev() on Ubuntu
Hajo Möller [Wed, 26 Oct 2016 17:30:43 +0000 (19:30 +0200)]
Fix lookup_bdev() on Ubuntu

Ubuntu added support for checking inode permissions to lookup_bdev() in kernel
commit 193fb6a2c94fab8eb8ce70a5da4d21c7d4023bee (merged in 4.4.0-6.21).
Upstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517

This patch adds a test for Ubuntu's variant of lookup_bdev() to configure and
calls the function in the correct way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Closes #5336

8 years agoDisable zio_dva_throttle_enabled by default
Brian Behlendorf [Wed, 26 Oct 2016 16:13:43 +0000 (09:13 -0700)]
Disable zio_dva_throttle_enabled by default

Until it can be determined definitively that a performance
regression wasn't introduced accidentally by 3dfb57a this
functionality is being disabled by default.  It can be re-
enabled by setting zio_dva_throttle_enabled=1.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5335
Issue #5289

8 years agoAllow for '-o feature@<feature>=disabled' on the command line
LOLi [Tue, 25 Oct 2016 23:17:47 +0000 (01:17 +0200)]
Allow for '-o feature@<feature>=disabled' on the command line

Sometimes it is desirable to specifically disable one or several
features directly on the 'zpool create' command line.

$ zpool create -o feature@<feature>=disabled ...

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3460
Closes #5142
Closes #5324

8 years agoDo not upgrade userobj accounting for snapshot dataset
jxiong [Tue, 25 Oct 2016 20:21:05 +0000 (04:21 +0800)]
Do not upgrade userobj accounting for snapshot dataset

'zfs recv' could disown a living objset without calling
dmu_objset_disown(). This will cause the problem that the objset
would be released while the upgrading thread is still running.

This patch avoids the problem by checking if a dataset is a snapshot
before calling dmu_objset_userobjspace_upgrade().  Snapshots
are immutable and therefore it doesn't make sense to update them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Closes #5295
Closes #5328

8 years agoFix statechange-led.sh & unnecessary libdevmapper warning
Tony Hutter [Tue, 25 Oct 2016 18:05:30 +0000 (11:05 -0700)]
Fix statechange-led.sh & unnecessary libdevmapper warning

- Fix autoreplace behaviour on statechange-led.sh script.

ZED sends the following events on an auto-replace:

1. statechange: Disk goes UNAVAIL->ONLINE
2. statechange: Disk goes ONLINE->UNAVAIL
3. vdev_attach: Disk goes ONLINE

Events 1-2 happen when ZED first attempts to do an auto-online.  When that
fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.

In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE
transition to turn off the LED.  It ignored the #2 ONLINE->UNAVAIL transition,
assuming it was just the "old" VDEV going offline.  This is problematic, as
a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want
to ignore that.

This new patch correctly turns on the fault LED every time a drive becomes
UNAVAIL.  It also monitors vdev_attach events to trigger turning off the LED
when an auto-replaced disk comes online.

- Remove unnecessary libdevmapper warning with --with-config=kernel

This fixes an unnecessary libdevmapper warning when building
--with-config=kernel.  Kernel code does not use libdevmapper, so the warning
is not needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2375
Closes #5312
Closes #5331

8 years agoicp: mark asm files with noexec stack
Jason Zaman [Tue, 25 Oct 2016 17:44:09 +0000 (01:44 +0800)]
icp: mark asm files with noexec stack

Similar to commit a3600a106.  Asm files need an explicit note
that they do not require an executable stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Zaman <jason@perfinion.com>
Closes #5332

8 years agoFix cred leak in zpl_fallocate_common
tuxoko [Mon, 24 Oct 2016 23:41:56 +0000 (16:41 -0700)]
Fix cred leak in zpl_fallocate_common

This is caught by kmemleak when running compress_004_pos

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5244
Closes #5330

8 years agoDisable zpool_upgrade_002_pos test case
Brian Behlendorf [Mon, 24 Oct 2016 23:39:47 +0000 (16:39 -0700)]
Disable zpool_upgrade_002_pos test case

This test case frequently triggers issue #4034.  There exists a
fix for this which is in the process of being upstreamed.  Until
that fix is available disable the test case.

Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5329
Issue #4034

8 years agoFix coverity defects: CID 147511, 147513
cao [Mon, 24 Oct 2016 20:37:38 +0000 (04:37 +0800)]
Fix coverity defects: CID 147511, 147513

CID 147511: Type:Dereference before null check
CID 147513: Type:Dereference before null check

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5306

8 years agoFix taskq creation failure in vdev_open_children()
Brian Behlendorf [Mon, 24 Oct 2016 20:28:58 +0000 (13:28 -0700)]
Fix taskq creation failure in vdev_open_children()

When creating and destroying pools in a tight loop it's possible to
exhaust the number of allowed threads on a system.  This results
in taskq_create() failing and a NULL dereference.

Resolve the issue by falling back to opening the vdevs all
synchronously.
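
The fallback can be pictured with the sketch below.  All names are
hypothetical and no real threads are involved; a NULL queue stands for
the taskq_create() failure, and both paths open the children inline so
that the control flow is the only thing being shown.

```c
#include <stddef.h>

typedef struct example_vdev {
	int ev_opened;
} example_vdev_t;

/* Hypothetical queue handle; NULL stands for "thread creation failed". */
typedef struct example_taskq {
	int et_unused;
} example_taskq_t;

static void
example_open_one(example_vdev_t *vd)
{
	vd->ev_opened = 1;
}

/*
 * Open every child.  With a task queue the work would be dispatched to
 * worker threads; this sketch runs it inline in both branches and only
 * reports which path was taken.  Returns 1 for the parallel path and 0
 * for the synchronous fallback.
 */
int
example_open_children(example_taskq_t *tq, example_vdev_t *children, size_t n)
{
	size_t i;

	if (tq == NULL) {
		/* Fallback: no NULL dereference, just open them in order. */
		for (i = 0; i < n; i++)
			example_open_one(&children[i]);
		return (0);
	}

	for (i = 0; i < n; i++)
		example_open_one(&children[i]);	/* stand-in for a dispatch */
	return (1);
}
```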

Reviewed-by: Denys Rtveliashvili <denys@rtveliashvili.name>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#521
Closes #4637

8 years agoTurn on/off enclosure slot fault LED even when disk isn't present
Tony Hutter [Mon, 24 Oct 2016 17:45:59 +0000 (10:45 -0700)]
Turn on/off enclosure slot fault LED even when disk isn't present

Previously when a drive faulted, the statechange-led.sh script would look up
the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and
turn it on.  During testing we noticed that if you pulled out a drive, or if
the drive was so badly broken that it no longer appeared to Linux, the
/sys/block/sd* path would be removed, and the script could not look up the
LED entry.

To fix this, this patch looks up the disk's more persistent
"/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import.  It then
passes that path to the statechange-led script to use, rather than having the
script look it up on the fly.  This allows the script to turn on/off the slot
LEDs even when the drive is missing.

Closes #5309
Closes #2375

8 years agoChange location of current symlink created by test-runner
Giuseppe Di Natale [Mon, 24 Oct 2016 17:24:10 +0000 (10:24 -0700)]
Change location of current symlink created by test-runner

test-runner should be creating the current symlink in the
directory above the output directory. In a previous commit,
the current symlink was placed in the current working
directory, which could be inaccessible. It is more likely
that the output directory is always accessible.

This is needed because without this there's no deterministic
way to get the path to ZFS Test Suite results until after the
test suite has started. This makes it difficult for buildbot to
follow the log file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5314

8 years agoFletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bits
Romain Dolbeau [Fri, 21 Oct 2016 17:55:49 +0000 (19:55 +0200)]
Fletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bits

This is not useful on micro-architectures with a weak NEON
implementation (only 64 bits); the native version is slower &
the byteswap barely faster than scalar.  On A53 or A57, it's
a small improvement on scalar but OK for byteswap.

Results from an A53 system:
0 0 0x01 -1 0 1499068294333000 1499101101878000
implementation   native         byteswap
scalar           1008227510     755880264
aarch64_neon     1198098720     1044818671
fastest          aarch64_neon   aarch64_neon

Results from a A57 system:
0 0 0x01 -1 0 4407214734807033 4407233933777404
implementation   native         byteswap
scalar           2302071241     1124873346
aarch64_neon     2542214946     2245570352
fastest          aarch64_neon   aarch64_neon
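
For orientation, the scalar computation that the NEON code is compared
against can be sketched as below.  This is a simplified illustration of
the four running sums for native byte order; example_cksum_t and
example_fletcher4() are hypothetical names, not the functions in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Four running sums, as in the scalar Fletcher-4 variant. */
typedef struct example_cksum {
	uint64_t ec_word[4];
} example_cksum_t;

/*
 * Scalar sketch: buf holds nwords 32-bit words in native byte order.
 * Each word is added to the first sum, and every sum is then folded
 * into the next one.  Overflow wraps modulo 2^64.
 */
void
example_fletcher4(const uint32_t *buf, size_t nwords, example_cksum_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;
	size_t i;

	for (i = 0; i < nwords; i++) {
		a += buf[i];
		b += a;
		c += b;
		d += c;
	}

	out->ec_word[0] = a;
	out->ec_word[1] = b;
	out->ec_word[2] = c;
	out->ec_word[3] = d;
}
```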

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #5248

8 years agoFix userquota_compare() function
Brian Behlendorf [Fri, 21 Oct 2016 15:23:27 +0000 (08:23 -0700)]
Fix userquota_compare() function

The AVL tree compare function requires that either -1, 0, or 1 be
returned.  However the strcmp() function only guarantees that a
negative, zero, or positive value is returned.  Therefore, the
return value of strcmp() needs to be sanitized with AVL_ISIGN.

This was initially overlooked because the x86_64 implementation
of strcmp() happens to only return the allowed values.  This
was observed on an aarch64 platform which behaves correctly but
differently as described above.
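
A minimal sketch of the sanitizing step.  EXAMPLE_ISIGN is assumed to
mirror the AVL_ISIGN macro, and example_node_t / example_compare() are
hypothetical; the real comparator works on the userquota node type.

```c
#include <string.h>

/* Sign-normalizing macro, assumed to mirror AVL_ISIGN from the tree. */
#define	EXAMPLE_ISIGN(a)	(((a) > 0) - ((a) < 0))

/* Hypothetical node type keyed by a domain string. */
typedef struct example_node {
	const char *en_domain;
} example_node_t;

/* Comparator that always returns exactly -1, 0, or 1. */
int
example_compare(const void *l, const void *r)
{
	const example_node_t *ln = l;
	const example_node_t *rn = r;
	int rv = strcmp(ln->en_domain, rn->en_domain);

	return (EXAMPLE_ISIGN(rv));
}
```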

Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5311
Closes #5313

8 years agoFix coverity defects: CID 153459
luozhengzheng [Thu, 20 Oct 2016 18:54:02 +0000 (02:54 +0800)]
Fix coverity defects: CID 153459

CID 153459: Null pointer dereferences (FORWARD_NULL)
Accidentally introduced by #5159.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5310

8 years agoFix coverity defects: CID 147551, 147552
cao [Thu, 20 Oct 2016 18:49:50 +0000 (02:49 +0800)]
Fix coverity defects: CID 147551, 147552

CID 147551: Type:dereference null return value
CID 147552: Type:dereference null return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5279

8 years agoFix coverity defects: CID 147472
cao [Thu, 20 Oct 2016 18:24:01 +0000 (02:24 +0800)]
Fix coverity defects: CID 147472

CID 147472: Type: 'Constant' variable guards dead code

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5288

8 years agoFix coverity defects: CID 150919, 150923
luozhengzheng [Thu, 20 Oct 2016 18:09:39 +0000 (02:09 +0800)]
Fix coverity defects: CID 150919, 150923

CID 150919: Buffer not null terminated (BUFFER_SIZE_WARNING)
CID 150923: Buffer not null terminated (BUFFER_SIZE_WARNING)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5298

8 years agoUpdate migration_004_pos, migration_005_pos, migration_006_pos
legend-hua [Thu, 20 Oct 2016 18:04:30 +0000 (02:04 +0800)]
Update migration_004_pos, migration_005_pos, migration_006_pos

Log function should be "log_fail", rather than "log_failED"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: legend-hua <liu.hua130@zte.com.cn>
Closes #5300

8 years agoFix make distclean Makefile.am removal
Brian Behlendorf [Thu, 20 Oct 2016 16:55:03 +0000 (09:55 -0700)]
Fix make distclean Makefile.am removal

The file tests/zfs-tests/tests/stress/Makefile.am gets mistakenly
removed by the distclean target because it's empty.  Adding a
`SUBDIRS =` line prevents the removal.

This directory is being preserved as the location to add assorted
stress tests.  These may include but are not limited to:

  http://kernel.ubuntu.com/~cking/stress-ng/
  https://github.com/zfsonlinux/zfsstress/

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5308

8 years agoLinux 4.9 compat: inode_change_ok() renamed setattr_prepare()
Brian Behlendorf [Tue, 18 Oct 2016 23:49:23 +0000 (23:49 +0000)]
Linux 4.9 compat: inode_change_ok() renamed setattr_prepare()

In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry rather
than an inode.  Update the code to call setattr_prepare()
and add a wrapper function which calls inode_change_ok() for
older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/581/head

8 years agoLinux 4.9 compat: remove iops->{set,get,remove}xattr
Chunwei Chen [Wed, 19 Oct 2016 18:19:17 +0000 (11:19 -0700)]
Linux 4.9 compat: remove iops->{set,get,remove}xattr

In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years agoLinux 4.9 compat: iops->rename() wants flags
Chunwei Chen [Wed, 19 Oct 2016 18:19:01 +0000 (11:19 -0700)]
Linux 4.9 compat: iops->rename() wants flags

In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), it now wants flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years agoRemove dir inode operations from zpl_inode_operations
Chunwei Chen [Wed, 19 Oct 2016 18:12:20 +0000 (11:12 -0700)]
Remove dir inode operations from zpl_inode_operations

These operations are dir specific, there's no point putting them in
zpl_inode_operations which is for regular files.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

8 years agoUpdate .gitignore
Brian Behlendorf [Wed, 19 Oct 2016 21:29:33 +0000 (14:29 -0700)]
Update .gitignore

Two additional files were recently introduced and should be
ignored by git.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5299

8 years agoMultipath autoreplace, control enclosure LEDs, event rate limiting
Tony Hutter [Wed, 19 Oct 2016 19:55:59 +0000 (12:55 -0700)]
Multipath autoreplace, control enclosure LEDs, event rate limiting

1. Enable multipath autoreplace support for FMA.

This extends FMA autoreplace to work with multipath disks.  This
requires libdevmapper to be installed at build time.

2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online

Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED.  Your enclosure must
be supported by the Linux SES driver for this to work.  The enclosure LED
scripts work for multipath devices as well.  The scripts will clear the LED
when the fault is cleared.

3. Rate limit ZIO delay and checksum events so as not to flood ZED

ZIO delay and checksum events are rate limited to 5/sec in the zfs module.
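
One way to picture a 5/sec limit is the fixed-window counter below.  It
is a sketch under assumptions: the structure and function names are
hypothetical, time is passed in as whole seconds, and the module's own
limiter may be built differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter; not the structure used by the module. */
typedef struct example_ratelimit {
	uint64_t er_window_start;	/* second in which the window began */
	unsigned int er_count;		/* events seen in this window */
	unsigned int er_burst;		/* events allowed per window */
} example_ratelimit_t;

void
example_ratelimit_init(example_ratelimit_t *rl, unsigned int burst)
{
	rl->er_window_start = 0;
	rl->er_count = 0;
	rl->er_burst = burst;
}

/*
 * Return true if an event at time now_sec (whole seconds) may be posted.
 * Entering a new one-second window resets the counter.
 */
bool
example_ratelimit_allow(example_ratelimit_t *rl, uint64_t now_sec)
{
	if (now_sec != rl->er_window_start) {
		rl->er_window_start = now_sec;
		rl->er_count = 0;
	}
	if (rl->er_count >= rl->er_burst)
		return (false);
	rl->er_count++;
	return (true);
}
```

Initialized with a burst of 5, the sixth event inside the same second is
dropped and posting resumes in the next second.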

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2449
Closes #3017
Closes #5159

8 years agoFix coverity defects: CID 150926
luozhengzheng [Tue, 18 Oct 2016 18:32:59 +0000 (02:32 +0800)]
Fix coverity defects: CID 150926

CID 150926: Unchecked return value (CHECKED_RETURN)
- This case cannot occur given the existing taskq implementation
  and flags passed to taskq_dispatch().

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5272

8 years agoFix unused variable
Brian Behlendorf [Tue, 18 Oct 2016 17:44:44 +0000 (10:44 -0700)]
Fix unused variable

Accidentally introduced by 3dfb57a, when building with debugging
disabled several variables are unused.  Resolve this by wrapping
them in ASSERTV to remove them for non-debug builds.
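
The effect of the wrapper can be sketched as follows.  The macro shapes
are an assumption about how ASSERTV behaves, and EXAMPLE_DEBUG,
EXAMPLE_ASSERTV, EXAMPLE_ASSERT and example_sum() are hypothetical names
used only for this illustration.

```c
#include <assert.h>

/*
 * Assumed shape of the macro: keep the declaration in debug builds and
 * drop it otherwise.  EXAMPLE_DEBUG is a hypothetical build switch.
 */
#ifdef EXAMPLE_DEBUG
#define	EXAMPLE_ASSERTV(x)	x
#define	EXAMPLE_ASSERT(c)	assert(c)
#else
#define	EXAMPLE_ASSERTV(x)
#define	EXAMPLE_ASSERT(c)	((void)0)
#endif

int
example_sum(const int *vals, int n)
{
	EXAMPLE_ASSERTV(int expected_n = n);	/* only exists in debug builds */
	int sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += vals[i];

	/* The only user of expected_n disappears together with it. */
	EXAMPLE_ASSERT(i == expected_n);
	return (sum);
}
```

Built without EXAMPLE_DEBUG, neither the variable nor its check is
compiled, so no unused-variable warning is produced.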

Reviewed by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5284

8 years agoFix coverity defects: CID 147643, 152204, 49339
GeLiXin [Tue, 18 Oct 2016 17:43:22 +0000 (01:43 +0800)]
Fix coverity defects: CID 147643, 152204, 49339

CID 147643: Type: String not null terminated
- make sure that the string is null terminated before strlen
  and fprintf.

CID 152204: Type: Copy into fixed size buffer
- since strlcpy isn't available here, use strncpy and terminate
  the string manually.

CID 49339: Type: Buffer not null terminated
- since strlcpy isn't available here, terminate the string
  manually before fprintf.
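
The strncpy-plus-terminator approach used for these defects can be
sketched as below; example_copy() is a hypothetical helper, not a
function from the patch.

```c
#include <string.h>

/*
 * Copy src into a fixed-size buffer without strlcpy().  strncpy() does
 * not terminate dst when src is at least dstsize - 1 bytes long, so the
 * last byte is set explicitly.  dstsize must be at least 1.
 */
void
example_copy(char *dst, size_t dstsize, const char *src)
{
	(void) strncpy(dst, src, dstsize - 1);
	dst[dstsize - 1] = '\0';
}
```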

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5283

8 years agoFix coverity defects: CID 49339, 153393
cao [Tue, 18 Oct 2016 17:31:57 +0000 (01:31 +0800)]
Fix coverity defects: CID 49339, 153393

CID 49339: Type:Buffer not null terminated
CID 153393: Type:Buffer not null terminated

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5296

8 years agoCreate a symlink to current test-runner output
Giuseppe Di Natale [Tue, 18 Oct 2016 17:19:28 +0000 (10:19 -0700)]
Create a symlink to current test-runner output

Generate a symlink in the current working directory to
test-runner.py output. This will make it easier for the
ZFS buildbot to collect logs.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5293

8 years agoFix coverity defects: CID 150924
luozhengzheng [Mon, 17 Oct 2016 19:03:52 +0000 (03:03 +0800)]
Fix coverity defects: CID 150924

CID 150924: Unchecked return value (CHECKED_RETURN)
- On taskq_dispatch failure the reference must be dropped and
  this entry can be safely skipped.  This case should be impossible
  in the existing implementation but should be handled regardless.
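
A sketch of the reference handling described above, with hypothetical
types and a dispatcher that can be told to fail; it is not the code
touched by the patch.

```c
#include <stdbool.h>

typedef struct example_entry {
	int ee_refcount;
} example_entry_t;

/* Hypothetical dispatcher: returns 0 (an invalid id) when it fails. */
static unsigned long
example_dispatch(example_entry_t *e, bool fail)
{
	(void) e;
	return (fail ? 0UL : 1UL);
}

/*
 * Take a reference on behalf of the queued task.  If the dispatch fails
 * no task will ever run to drop it, so drop it here and skip the entry.
 * Returns true when the entry was queued.
 */
bool
example_queue_entry(example_entry_t *e, bool fail)
{
	e->ee_refcount++;
	if (example_dispatch(e, fail) == 0) {
		e->ee_refcount--;
		return (false);
	}
	return (true);
}
```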

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5278

8 years agoProperly use the Dracut cleanup hook to order pool shutdown
Rudd-O [Mon, 17 Oct 2016 18:51:15 +0000 (18:51 +0000)]
Properly use the Dracut cleanup hook to order pool shutdown

When Dracut starts up, it needs to determine whether a pool will remain
"hanging open" before the system shuts off. In such a case, the
code to clean up the pool (using the previous export -F work) must
be invoked. Since Dracut has had a recent change that makes
mount-zfs.sh simply not run when the root dataset is already mounted,
we must use the cleanup hook to order Dracut to do shutdown cleanup.

Important note: this code will not accomplish its stated goal until this
bug is fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1385432

That bug impacts more than just ZFS. It impacts LUKS, dmraid, and
unmount during poweroff. It is a Fedora-wide bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes #5287

8 years agoPass status_cbdata_t to print_status_config() and friends
Håkan Johansson [Mon, 17 Oct 2016 18:46:35 +0000 (20:46 +0200)]
Pass status_cbdata_t to print_status_config() and friends

First rename spare_cbdata_t cb -> spare_cb in print_status_config(),
to free up cb.

Using the structure removes the explicit parameters namewidth
and name_flags from several functions.  Also use status_cbdata_t
for print_import_config().  This simplifies print_logs().

Remove the parameter 'verbose' for print_logs().  It does not really
mean verbose, it selected between the print_status_config and
print_import_config() paths.  This selection is now done by
cb_print_config of spare_cbdata_t.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Håkan Johansson <f96hajo@chalmers.se>
Closes #5259

8 years agoUse -F to export pools so as not to dirty up device labels
Rudd-O [Sun, 16 Oct 2016 03:30:53 +0000 (03:30 +0000)]
Use -F to export pools so as not to dirty up device labels

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes #5228
Closes #5238

8 years agoAllow partition aliases in vdev_id.conf (#5266)
Brian Behlendorf [Fri, 14 Oct 2016 23:11:16 +0000 (16:11 -0700)]
Allow partition aliases in vdev_id.conf (#5266)

When pools are assembled from partitions, vdev_id.conf aliases
do not work.  The directory /dev/disk/by-vdev is not created because
the associated udev rule for parsing vdev_id.conf is never called.
Extend the logic to match "disk" and "partition".

Patch-proposed-by: @sparksh
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3859
Closes #5266

8 years agoFix coverity defects: CID 147488, 147490
cao [Fri, 14 Oct 2016 18:00:47 +0000 (02:00 +0800)]
Fix coverity defects: CID 147488, 147490

CID 147488, Type:explicit null dereferenced
CID 147490, Type:dereference null return value

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5237

8 years agoOpenZFS 6877 - zfs_rename_006_pos fails due to missing zvol snapshot device file
Akash Ayare [Wed, 20 Apr 2016 04:07:54 +0000 (21:07 -0700)]
OpenZFS 6877 - zfs_rename_006_pos fails due to missing zvol snapshot device file

Authored by: Akash Ayare <aayare@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Bug was caused due to a change in functionality. At some point, ZFS
snapshots no longer created associated device files which were being
used in the test. To resolve this issue, a clone of the snapshot can be
produced which will also create the expected device files; then, the
test will behave as it did historically.

OpenZFS-issue: https://www.illumos.org/issues/6877
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2200f27
Closes #5275

Porting Notes:
- Hardcoded /dev/zvol/rdsk changed to $ZVOL_RDEVDIR for compatibility.
- Enabled in linux runfile.