Nathaniel Clark [Thu, 18 Apr 2013 20:57:29 +0000 (16:57 -0400)]
Make spl directory setable when building rpms and add --buildroot
This adds ability to set the location of spl via defines when
building from the spec files. This is useful for build systems
that build spl and zfs together without installing the actual rpms.
Signed-off-by: Nathaniel Clark <Nathaniel.Clark@misrule.us> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1486
Brian Behlendorf [Tue, 18 Jun 2013 17:15:33 +0000 (10:15 -0700)]
Register correct handlers in nvlist_alloc()
The non-blocking allocation handlers in nvlist_alloc() would be
mistakenly assigned if any flags other than KM_SLEEP were passed.
This meant that nvlists allocated with KM_PUSHPUSH or other KM_*
debug flags were effectively always using atomic allocations.
While these failures were unlikely it could lead to assertions
because KM_PUSHPAGE allocations in particular are guaranteed to
succeed or block. They must never fail.
Since the existing API does not allow us to pass allocation
flags to the private allocators the cleanest thing to do is to
add a KM_PUSHPAGE allocator.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#249
Matthew Ahrens [Thu, 6 Jun 2013 22:46:55 +0000 (18:46 -0400)]
Illumos #3805 arc shouldn't cache freed blocks
3805 arc shouldn't cache freed blocks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Elling <richard.elling@dey-sys.com>
Reviewed by: Will Andrews <will@firepipe.net>
Approved by: Dan McDonald <danmcd@nexenta.com>
ZFS should proactively evict freed blocks from the cache.
On dcenter, we saw that we were caching ~256GB of metadata, while the
pool only had <4GB of metadata on disk. We were wasting about half the
system's RAM (252GB) on blocks that have been freed.
Even though these freed blocks will never be used again, and thus will
eventually be evicted, this causes us to use memory inefficiently for 2
reasons:
1. A block that is freed has no chance of being accessed again, but will
be kept in memory preferentially to a block that was accessed before it
(and is thus older) but has not been freed and thus has at least some
chance of being accessed again.
2. We partition the ARC into several buckets:
user data that has been accessed only once (MRU)
metadata that has been accessed only once (MRU)
user data that has been accessed more than once (MFU)
metadata that has been accessed more than once (MFU)
The user data vs metadata split is somewhat arbitrary, and the primary
control on how much memory is used to cache data vs metadata is to
simply try to keep the proportion the same as it has been in the past
(each bucket "evicts against" itself). The secondary control is to
evict data before evicting metadata.
Because of this bucketing, we may end up with one bucket mostly
containing freed blocks that are very old, while another bucket has more
recently accessed, still-allocated blocks. Data in the useful bucket
(with still-allocated blocks) may be evicted in preference to data in
the useless bucket (with old, freed blocks).
On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU
data bucket was 27GB and the MRU metadata bucket was 256GB. However,
the vast majority of data in the MRU metadata bucket (256GB) was freed
blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB)
was constantly evicting useful blocks that will be soon needed.
The problem of cache segmentation is a larger problem that needs more
investigation. However, if we stop caching freed blocks, it should
reduce the impact of this more fundamental issue.
Ported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1503
Ying Zhu [Sat, 15 Jun 2013 14:25:48 +0000 (22:25 +0800)]
Fix compile warning on 32-bit systems
The definition of zfs_vdev_holder casts VDEV_HOLDER into a function pointer
passing to linux kernel's block layer function blkdev_get_by_path.
However current VDEV_HOLDER is defined to be wider than 32 bits and the compiler
warns about potential overflows. Instead of specifying different values for 32-bit and
64-bit systems using ifdefs, choose the common factor 32-bit addresses.
Redefine VDEV_HOLDER to 0x2401de7("zholder") here.
Signed-off-by: Ying Zhu <casualfisher@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1520
George Wilson [Sat, 15 Jun 2013 03:30:35 +0000 (22:30 -0500)]
Illumos #3552, #3564
3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread
3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing)
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Madhav Suresh [Fri, 10 May 2013 21:17:03 +0000 (14:17 -0700)]
Illumos #3006
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first
argument is zero
Reviewed by Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by George Wilson <george.wilson@delphix.com>
Approved by Eric Schrock <eric.schrock@delphix.com>
Richard Yao [Wed, 29 May 2013 00:08:15 +0000 (20:08 -0400)]
Improve OpenRC init script
The current zfs OpenRC script's dependencies cause OpenRC to attempt to
unmount ZFS filesystems at shutdown while things were still using them,
which would fail. This is a cosmetic issue, but it should still be
addressed. It probably does not affect systems where the rootfs is a
legacy filesystem, but any system with the rootfs on ZFS needs to run
the ZFS init script after the system is ready to shutdown filesystems.
OpenRC's shutdown process occurs in the reverse order of the startup
process. Therefore running the ZFS shutdown procedure after filesystems
are ready to be unmounted requires running the startup procedure before
fstab. This patch changes the dependencies of the script to expliclty
run before fstab at boot when the rootfs is ZFS and to run after fstab
at boot whenever the rootfs is not ZFS. This should cover most use
cases.
The only cases not covered well by this are systems with legacy
root filesystems where people want to configure fstab to mount a non-ZFS
filesystem off a zvol and possibly also systems whose pools are stored
on network block devices. The former requires that the ZFS script run
before fstab, which could cause ZFS datasets to mount too early and
appear under the fstab mount points. The latter requires that the ZFS
script run after networking starts, which precludes the ability to store
any system information on ZFS. An additional OpenRC script could be
written to handle non-root pools on network block devices, but that will
depend on user demand and developer time.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1479
Ned Bass [Tue, 14 May 2013 02:48:24 +0000 (19:48 -0700)]
Don't leak mount flags into kernel
When calling mount(), care must be taken to avoid passing in flags
that are used only by the user space utilities. Otherwise we may
stomp on flags that are reserved for other purposes in the kernel.
In particular, openSUSE 12.3 kernels have added a new MS_RICHACL
super-block flag whose value conflicts with our MS_COMMENT flag. This
causes incorrect behavior such as the umask being ignored. The
MS_COMMENT flag essentially serves as a placeholder in the option_map
data structure of zfs_mount.c, but its value is never used. Therefore
we can avoid the conflict by defining it to 0.
The MS_USERS, MS_OWNER, and MS_GROUP flags also conflict with reserved
flags in the kernel. While this is not known to have caused any
problems, it is nevertheless incorrect. For the purposes of the
mount.zfs helper, the "users", "owner", and "group" options just serve
as hints to set additional implied options. Therefore we now define
their associated mount flags in terms of the options that they imply
rather than giving them unique values.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1457
Steven Burgess [Sun, 12 May 2013 17:14:44 +0000 (13:14 -0400)]
Adds zpool split to man page
Adds zpool split documentation to the zpool man page. I only documented
the options that I could get to work. While it is documented on some
sun blogs that devices can be specified for split, I was not able to get
that to work during my testing.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1456
When SA xattrs are enabled only fallback to checking the directory
xattrs when the name is not found as a SA xattr. Otherwise, the SA
error which should be returned to the caller is overwritten by the
directory xattr errors. Positive return values indicating success
will also be immediately returned.
In the case of #1437 the ERANGE error was being correctly returned
by zpl_xattr_get_sa() only to be overridden with ENOENT which was
returned by the subsequent unnessisary call to zpl_xattr_get_dir().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1437
Cyril Plisko [Wed, 24 Oct 2012 10:26:56 +0000 (12:26 +0200)]
zfs_scrub_limit tunable is not used anywhere
As a part of scrub/resilver tuning zfs_scrub_limit fell out of use,
but the definition of the variable remained in place.
Moreover various guides still (misleadingly) mention it as a way
to influence resilver/scrub behavior.
This commit removes its finally.
Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1444
Fix incorrect assertions in ddt_phys_decref and ddt_sync_entry
The assertions in ddt_phys_decref and ddt_sync_entry cast ddp->ddp_refcnt
from uint64_t to int64_t, with a reference count bigger than 2^63, e.g. the
reference count of zero blocks commonly available in spare files, we may
mistakenly hit these assertations, so drop the type conversions here.
Signed-off-by: Ying Zhu <casualfisher@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1436
The vn_rdwr() function performs I/O by calling the vfs_write() or
vfs_read() functions. These functions reside just below the system
call layer and the expectation is they have almost the entire 8k of
stack space to work with. In fact, certain layered configurations
such as ext+lvm+md+multipath require the majority of this stack to
avoid stack overflows.
To avoid this posibility the vn_rdwr() call in dump_bytes() has been
moved to the ZIO_TYPE_FREE, taskq. This ensures that all I/O will be
performed with the majority of the stack space available. This ends
up being very similiar to as if the I/O were issued via sys_write()
or sys_read().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1399
Closes #1423
3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Earlier commit 08d08eb reduced contention on this taskq lock by simply
reducing the number of z_fr_iss threads from 100 to one-per-CPU. We
also optimized the taskq implementation in zfsonlinux/spl@3c6ed54.
These changes significantly improved unlink performance to acceptable
levels.
This patch further reduces time spent spinning on this lock by
randomly dispatching the work items over multiple independent task
queues. The Illumos ZFS developers stated that this lock contention
only arose after "3329 spa_sync() spends 10-20% of its time in
spa_free_sync_cb()" was landed. It's not clear if 3329 affects the
Linux port or not. I didn't see spa_free_sync_cb() show up in
oprofile sessions while unlinking large files, but I may just not
have used the right test case.
I tested unlinking a 1 TB of data with and without the patch and
didn't observe a meaningful difference in elapsed time. However,
oprofile showed that the percent time spent in taskq_thread() was
reduced from about 16% to about 5%. Aside from a possible slight
performance benefit this may be worth landing if only for the sake of
maintaining consistency with upstream.
George Wilson [Mon, 6 May 2013 17:14:52 +0000 (10:14 -0700)]
Illumos #3329, #3330, #3331, #3335
3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb()
3330 space_seg_t should have its own kmem_cache
3331 deferred frees should happen after sync_pass 1
3335 make SYNC_PASS_* constants tunable
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Eric Schrock <eric.schrock@delphix.com>
George Wilson [Thu, 2 May 2013 23:36:32 +0000 (16:36 -0700)]
Illumos #3306, #3321
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man
page and help
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
The vdev_file.c implementation in this patch diverges significantly
from the upstream version. For consistenty with the vdev_disk.c
code the upstream version leverages the Illumos bio interfaces.
This makes sense for Illumos but not for ZoL for two reasons.
1) The vdev_disk.c code in ZoL has been rewritten to use the
Linux block device interfaces which differ significantly
from those in Illumos. Therefore, updating the vdev_file.c
to use the Illumos interfaces doesn't get you consistency
with vdev_disk.c.
2) Using the upstream patch as is would requiring implementing
compatibility code for those Solaris block device interfaces
in user and kernel space. That additional complexity could
lead to confusion and doesn't buy us anything.
For these reasons I've opted to simply move the existing vn_rdwr()
as is in to the taskq function. This has the advantage of being
low risk and easy to understand. Moving the vn_rdwr() function
in to its own taskq thread also neatly avoids the possibility of
a stack overflow.
Finally, because of the additional work which is being handled by
the free taskq the number of threads has been increased. The
thread count under Illumos defaults to 100 but was decreased to 2
in commit 08d08e due to contention. We increase it to 8 until
the contention can be address by porting Illumos #3581.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1354
Ensure --with-spl-timeout waits for spl_config.h and symvers
The previous code was only waiting for the symver file. But the
postinst target of the DKMS script for SPL will not only create
the symvers file, but also the header spl_config.h.
If we are waiting in the configure script of ZFS for the SPL
symvers file, then we also need to wait for spl_config.h.
Otherwise the configure script will abort because the spl_config.h
is not yet available.
On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved
to the end of the function ZFS_AC_SPL to allow both checks share
the with-spl-timeout parameter.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1431
Brian Behlendorf [Mon, 29 Apr 2013 21:07:46 +0000 (14:07 -0700)]
Silence 'old_umask' uninit variable warning
Recent changes have caused older versions of gcc to mistakenly
flag 'old_umask' in vn_open() as an unitialized variable. To
silence the warning initialize it.
kernel.c: In function 'vn_open':
kernel.c:525:6: error: 'old_umask' may be used uninitialized
in this function
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The zfs_fd must be opened before calling print_all_handlers() or
the ioctl() cannot be used to the zfs control device. This brings
the zinject code back in sync with the Illumos implementation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
NOTES: This patch has been reworked from the original in the
following ways to accomidate Linux ZFS implementation
*) Usage of the cyclic interface was replaced by the delayed taskq
interface. This avoids the need to implement new compatibility
code and allows us to rely on the existing taskq implementation.
*) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h
because declaring externs in source files as was done in the
original patch is just plain wrong.
*) Instead of panicing the system when the deadman triggers a
zevent describing the blocked vdev and the first pending I/O
is posted. If the panic behavior is desired Linux provides
other generic methods to panic the system when threads are
observed to hang.
*) For reference, to delay zios by 30 seconds for testing you can
use zinject as follows: 'zinject -d <vdev> -D30 <pool>'
Brian Behlendorf [Thu, 25 Apr 2013 23:29:22 +0000 (16:29 -0700)]
Fix txg_quiesce thread deadlock
A deadlock was accidentally introduced by commit e95853a which
can occur when the system is under memory pressure. What happens
is that while the txg_quiesce thread is holding the tx->tx_cpu
locks it enters memory reclaim. In the context of this memory
reclaim it then issues synchronous I/O to a ZVOL swap device.
Because the txg_quiesce thread is holding the tx->tx_cpu locks
a new txg cannot be opened to handle the I/O. Deadlock.
The fix is straight forward. Move the memory allocation outside
the critical region where the tx->tx_cpu locks are held. And for
good measure change the offending allocation to KM_PUSHPAGE to
ensure it never attempts to issue I/O during reclaim.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1274
Brian Behlendorf [Tue, 23 Apr 2013 23:40:47 +0000 (16:40 -0700)]
Set RPM_DEFINE_COMMON options
When the kmod packaging was introduced the ability to pass the
--enable-debug and --enable-dmu-tx options from configure all
the way through to `make rpm|deb` was accidenally lost. Update
ZFS_AC_RPM to explicitlu set RPM_DEFINE_COMMON with these
rpmbuild defines.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.
There are a number of issues with the generic kmod RPM spec in its
current state:
- The "%{__id_u}" macro seems to not be available on some systems (e.g.
Debian squeeze). It appears it has been deprecated. Use "${__id} -u"
instead.
- The way the "--with-linux=" configure option is generated in the
non-RHEL/Fedora case is completely wrong with various newline and
escaping issues (also, $kernel_version is not available in the
generator context).
The second issue made the generator shell snippet (almost) silently
fail, which under specific circumstances can result in broken builds
against the wrong kernel sources.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1416
Brian Behlendorf [Wed, 17 Apr 2013 20:07:36 +0000 (13:07 -0700)]
Correctly return ERANGE in getxattr(2)
According to the getxattr(2) man page the ERANGE errno should be
returned when the size of the value buffer is to small to hold the
result. Prior to this patch the implementation would just truncate
the value to size bytes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1408
The zpl_readdir() function shouldn't be registered as part of
the zpl_file_operations table, it must only be part of the
zpl_dir_file_operations table. By removing this callback
the VFS will now correctly return ENOTDIR when calling
getdents() on a file.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1404
Martin Matuska [Fri, 12 Apr 2013 17:26:03 +0000 (10:26 -0700)]
Allow setting a lower ashift with -o ashift
Previous patches have allowed you to set an increased ashift to
avoid doing 512b IO with 4k sector devices. However, it was not
possible to set the ashift lower than the reported physical sector
size even when a smaller logical size was supported. In practice,
there are several cases where settong a lower ashift is useful:
* Most modern drives now correctly report their physical sector
size as 4k. This causes zfs to correctly default to using a 4k
sector size (ashift=12). However, for some usage models this
new default ashift value causes an unacceptable increase in
space usage. Filesystems with many small files may see the
total available space reduced to 30-40% which is unacceptable.
* When replacing a drive in an existing pool which was created
with ashift=9 a modern 4k sector drive cannot be used. The
'zpool replace' command will issue an error that the new drive
has an 'incompatible sector alignment'. However, by allowing
the ashift to be manual specified as smaller, non-optimal,
value the device may still be safely used.
For the DKMS package to successfully build the kernel-devel
headers must be included along gcc, make, and perl. The ZFS
code never directly invokes perl but the kernel build system
depends on it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1380
The only remaining perl dependency is part of the ZFS_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1380
Commit f6fb7651a0d05b357dc179cc4853263ce15da6ed introduced the idea
of working builds which work correctly. However, because the zfs-kmod
depends on a specific 'spl-devel-kmod = {version}-%{release}' package
and the release component is unique the dependency is never satisfied.
This requires line was introduced to ensure the correct version of the
spl is always used. In this context only the version number is required
so the release component has been dropped to satisfy the dependency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Part of the automated testing involves building the source on Debian Lenny
which ships an ancient version of automake (1.10.1). Historically, this
has caused a non-fatal warning about AM_SILENT_RULES not being defined.
But when the autogen.sh script was updated to use autoreconf the warning
became fatal.
configure.ac:31: warning: macro `AM_SILENT_RULES' not found in library
autoreconf: running: /usr/bin/autoconf --force
configure.ac:34: error: possibly undefined macro: AM_SILENT_RULES
If this token and others are legitimate, please use m4_pattern_allow.
To resolve this build issue the call to AM_SILENT_RULES has been wrapped
by m4_ifdef(). This prevents the macro from being expanded on platforms
where it's undefined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
* Update manpage with more information about the ACL, guest access
and that samba needs to be able to authenticate user(s).
* Add information that 'net' can be used to modify the share after
ZFS sharing and that it will be undone with a 'zfs unshare'.
* Give an example on how to mount a SMB filesystem shared via ZFS.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1181
Issue #1170
Brian Behlendorf [Tue, 26 Mar 2013 15:40:44 +0000 (08:40 -0700)]
Include init scripts in packages
The distribution specific init scripts where excluded from the
packaging when it was reworked. The intention is to replace
them with systemd equivilants. However, that work has not yet
been done and the init scripts are still useful so they have
been added back in to the packaging.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Mon, 25 Mar 2013 18:28:18 +0000 (11:28 -0700)]
Provide ${kmodname}-devel-kmod for yum-builddep
In order to ensure that yum-builddep pulls in all the build
requirements a generic ${kmodname}-devel-kmod provides line is
added. This allows a version of the development headers to be
included without requiring knowledge of the kernel version.
This is important because unlike rpmbuild which does correctly
expand the source rpm spec file, yum-builddep does not. Without
this generic provides line mock which relies on yum-builddep is
unable to automatically satisfy the dependency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Fri, 22 Mar 2013 21:46:11 +0000 (14:46 -0700)]
Use 'git describe' for working builds
When building from an arbitrary commit in the git tree it's useful
for the resulting packages to be uniquely identifiable. Therefore,
the build system has been updated to detect if your compiling in
git tree.
If you are building in a git tree, and there are commits after the
last annotated tag. Then the <id>-<hash> component of 'git describe'
will be used to overwrite the 'Release:' field in the META file.
The only tricky part is that to ensure the 'make dist' tarball is
built using the correct release. A dist-hook was added to the top
level make file to rewrite the META file using the correct release.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Darik Horn [Thu, 21 Mar 2013 03:52:53 +0000 (22:52 -0500)]
Fix minor typos and update marketing copy.
Correct spelling mistakes in the AUTHORS and DISCLAIMER files, and
update the README.markdown file to credit Illumos and mention that
the ZPL is finished.
The README.markdown file is also the first impression for a handful
of new users that discover ZoL through a web search because it
doubles as the splash page for the Github repository. The build blurbs
are therefore removed because these people should be encouraged to
visit the regular home page before installing the product.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1366
Brian Behlendorf [Wed, 20 Mar 2013 22:15:05 +0000 (15:15 -0700)]
Use requested kernel for dkms builds
The --with-linux and --with-linux-obj options must be specified
as part of the dkms build otherwise the package will be built
against the running kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Wed, 20 Mar 2013 18:01:48 +0000 (11:01 -0700)]
Use BUILD_DEPENDS option for dkms builds
Support was added to dkms so build dependencies can be specified.
This allows us to ensure that the spl package will always be built
before the zfs package. Those patches have not yet been merged
upstream but they are available in the zfsonlinux/dkms repository.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Wed, 20 Mar 2013 18:28:00 +0000 (11:28 -0700)]
Remove zfs-dkms conflict with zfs-kmod
Because the zfs-dkms package also provides zfs-kmod for the
zfs user package yum flags this as a conflict. To avoid the
problem remove the Conflicts tag from zfs-dkms and just rely
on the one in zfs-kmod.
zfs-dkms-0.6.0-rc14.fc18.noarch has installed conflicts
zfs-kmod: zfs-dkms-0.6.0-rc14.fc18.noarch
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Wed, 20 Mar 2013 18:25:50 +0000 (11:25 -0700)]
Refresh dracut module setup
60-zpool.rules was retired some time ago in favor of 69-vdev.rules.
Additionally, there is no guarentee a zpool.cache file exists so
only install it conditionally upon its existance.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Wed, 20 Mar 2013 02:25:01 +0000 (19:25 -0700)]
Add missing dependencies
The spl, zfs-test, and zfs-dracut packages should be pulled in
by the core zfs package. This allows all the required zfs packages
to be installed from a yum repository by running:
yum install zfs
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Tue, 19 Mar 2013 22:55:44 +0000 (15:55 -0700)]
Add spl-dkms dependency to zfs-dkms
Adding the 'spl-dkms = VERSION' dependency to the zfs-dkms
package ensures the spl will be installed before zfs. This
cleanly handles the initial 'yum localinstall' case.
However, this does not address the dkms rebuilds caused by
a new kernel being installed. For that we still rely on the
clunky --with-spl-timeout=<timeout> configure option which
allows the zfs compilation to be briefly delayed while the
spl components are built.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The assumption in zio_ddt_free() is that ddt_phys_select() must
always find a match. However, if that fails due to a damaged
DDT or some other reason the code will NULL dereference in
ddt_phys_decref().
While this should never happen it has been observed on various
platforms. The result is that unless your willing to patch the
ZFS code the pool is inaccessible. Therefore, we're choosing
to more gracefully handle this case rather than leave it fatal.
Brian Behlendorf [Mon, 18 Mar 2013 20:03:09 +0000 (13:03 -0700)]
Add metaslab_debug option
Enabling metaslab debugging will prevent space maps from being
automatically unloaded. This can significantly increase the
memory footprint but being able to dynamically control this is
helpful for debugging and certain performance testing.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Sun, 17 Feb 2013 20:10:17 +0000 (12:10 -0800)]
Refresh RPM packaging
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'. This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.
While the spec files have been entirely rewritten from a
user perspective the only major changes are:
* The Fedora packages now have a build dependency on the
rpmfusion repositories. The generic kmod packages also
have a new dependency on kmodtool-1.22 but it is bundled
with the source rpm so no additional packages are needed.
* The kernel binary module packages have been renamed from
zfs-modules-* to kmod-zfs-* as specificed by kmods2.
* The is now a common kmod-zfs-devel-* package in addition
to the per-kernel devel packages. The common package
contains the development headers while the per-kernel
package contains kernel specific build products.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1341
Brian Behlendorf [Fri, 22 Feb 2013 23:50:00 +0000 (15:50 -0800)]
Configure --with-spl{-obj} auto-detect cleanup
Because the install location for the spl/zfs-devel headers was
changed we need to refresh the auto-detect code. Note that
for packaging which already explicitly calls --with-spl{-obj}
nothing has changed.
The updated code is now structured like that in ZFS_AC_KERNEL
and should be cleaner and easier to maintain. In addition,
it's stricter about detecting a valid source and object
directory. It requires:
* The source directory contains the file 'spl.release'
* The object directory contains the file 'spl_config.h'
* The following paths will be checked. Notice the /var/lib/
and /usr/src paths require that the spl and zfs version be
matched. This is done to prevent accidentally mixing releases.
Brian Behlendorf [Thu, 21 Feb 2013 23:10:11 +0000 (15:10 -0800)]
Change zfs-kmod-devel install path
Install the common zfs kernel development headers under
/usr/src/zfs-<version>/ rather than in a kernel specific
directory. The kernel specific build products such as
zfs_config.h and Modules.symvers are left installed under
/usr/src/zfs-<version>/<kernel>.
This was done to be consistent with where dkms expects
kernel module source to be packaged. It also allows for
a common zfs-kmod-devel package which includes the headers,
and per-kernel zfs-kmod-devel-<kernel> packages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
There's no reason to keep this file around it's redundant with
the COPYRIGHT and OPENSOLARIS.LICENSE files and causes lintian
to emit an extra-license-file warning.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Richard Yao [Mon, 4 Mar 2013 20:52:11 +0000 (15:52 -0500)]
Linux 3.9 compat: Undefine GCC_VERSION
The mainline kernel started defining GCC_VERSION with commit
torvalds/linux@3f3f8d2f48acfd8ed3b8e6b7377935da57b27b16.
Unfortunately, LZ4 also defines this macro, but the two
defintions are incompatible. We undefine GCC_VERSION in lz4.c
to handle this.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1339
Brian Behlendorf [Fri, 22 Feb 2013 19:28:28 +0000 (11:28 -0800)]
Fix zdb.8 macro warning
Detected by rpmlint the 'rpool/export/home' section was being
interpretted by troff as an undefined macro. This resulted
in the 'rpool/export/home' output being omitted from 'man zdb'.
This was caused by starting the line with a ' character. By
moving the 'in' down to the next line we're able to fix it.
zfs.x86_64: W: manual-page-warning /usr/share/man/man8/zdb.8.gz
450: warning: macro `rpool/export/home'' not defined
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Sun, 17 Feb 2013 19:16:22 +0000 (11:16 -0800)]
Retire ZFS.RELEASE file
The ZFS.RELEASE file was originally added to document which
version of OpenSolaris the ZoL code was based on. However,
that's no longer particularly important or useful. We'll
likely never see a new onnv_* drop from Solaris, and even
if we do the ZoL changes are now extensive enough they
could not be easily applied. We now treat Illumos as the
official upstream and cherry pick the patches we need.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Sun, 17 Feb 2013 19:11:41 +0000 (11:11 -0800)]
Remove ARCH packaging
The kernel modules are now available in the Arch User Repository
(AUR) via zfs. Since their packaging is maintained and superior
to ours it is being removed from the tree.
https://wiki.archlinux.org/index.php/ZFS
Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Fri, 22 Feb 2013 18:16:16 +0000 (10:16 -0800)]
Add --with-dracutdir configure option
The standard dracut directory has moved from /usr/share/dracut to
/usr/lib/dracut. To ensure the dracut modules get installed in
the correct location provide a --with-dracutdir configure option
to set the path.
The default install location has been updated to /usr/lib/dracut
which is used by more current versions of Fedora. However, this
default is overriden by the RPM packaging for consistency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Sun, 17 Feb 2013 19:05:11 +0000 (11:05 -0800)]
Add KMODDIR to install target
Provide a mechanism to control the directory name the modules
are installed in. The kernel privdes INSTALL_MOD_DIR for
this but it was hardcoded to be 'addon/zfs'.
Add a KMODDIR variable which can be passed to 'make install'
to override the default directory name. While we're here
change the default from 'addon/zfs' to 'extra' which is the
kernel.org default.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Eric Dillmann [Wed, 13 Feb 2013 23:11:59 +0000 (00:11 +0100)]
Add snapdev=[hidden|visible] dataset property
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices. By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/. When set to
'visible' all zvol snapshots for the dataset will be visible.
This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created. When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions. This leads
to a variety of issues:
1) The zvol partition tables will be read in the context of
the `modprobe zfs` for automatically imported pools. This
is undesirable and should be done asynchronously, but for
now reducing the number of visible devices helps.
2) Udev expects to be able to complete its work for a new
block devices fairly quickly. When many zvol devices are
added at the same time this is no longer be true. It can
lead to udev timeouts and missing /dev/zvol links.
3) Simply having lots of devices in /dev/ can be aukward from
a management standpoint. Hidding the devices your unlikely
to ever use helps with this. Any snapshot device which is
needed can be made visible by changing the snapdev property.
NOTE: This patch changes the default behavior for zvols which
was effectively 'snapdev=visible'.
George Wilson [Sun, 3 Mar 2013 05:57:39 +0000 (00:57 -0500)]
Merge zvol.c changes from PSARC 2010/306 Read-only ZFS pools
The changes to zvol.c were never merged from the last onnv_147
bulk update. This was because zvol.c was largely rewritten
for Linux making it fairly easy to miss these sorts of changes.
This causes a regression when importing a zpool with zvols
read-only. This does not impact pool which only contain
filesystem datasets.
Richard Yao [Fri, 15 Feb 2013 04:37:43 +0000 (23:37 -0500)]
Constify structures containing function pointers
The PaX team modified the kernel's modpost to report writeable function
pointers as section mismatches because they are potential exploit
targets. We could ignore the warnings, but their presence can obscure
actual issues. Proper const correctness can also catch programming
mistakes.
Building the kernel modules against a PaX/GrSecurity patched Linux 3.4.2
kernel reports 133 section mismatches prior to this patch. This patch
eliminates 130 of them. The quantity of writeable function pointers
eliminated by constifying each structure is as follows:
The remaining 3 writeable function pointers cannot be addressed by this
patch. 2 of them are in zpl_fs_type. The kernel's sget function requires
that this be non-const. The final writeable function pointer is created
by SPL_SHRINKER_DECLARE. The kernel's set_shrinker() and
remove_shrinker() functions also require that this be non-const.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1300
Richard Yao [Thu, 14 Feb 2013 23:54:04 +0000 (18:54 -0500)]
Eliminate runtime function pointer mods in autotools checks
PaX/GrSecurity patched kernels implement a dialect of C that relies on a
GCC plugin for enforcement. A basic idea in this dialect is that
function pointers in structures should not change during runtime.
This causes code that modifies function pointers at runtime to fail to
compile in many instances. The autotools checks rely on whether or
not small test cases compile against a given kernel. Some
autotools checks assume some default case if other cases fail. When one
of these autotools checks tests a PaX/GrSecurity patched kernel by
modifying a function pointer at runtime, the default case will be used.
Early detection of such situations is possible by relying on compiler
warnings, which are compiler errors when --enable-debug is used.
Unfortunately, very few people build ZFS with --enable-debug. The more
common situation is that these issues manifest themselves as runtime
failures in the form of NULL pointer exceptions.
Previous patches that addressed such issues with PaX/GrSecurity
compatibility largely relied on rewriting autotools checks to avoid
runtime function pointer modification or the addition of PaX/GrSecurity
specific checks. This patch takes the previous work to its logical
conclusion by eliminating the use of runtime function pointer
modification. This permits the removal of PaX-specific autotools checks
in favor of ones that work across all supported kernels.
This should resolve issues that were reported to occur with
PaX/GrSecurity-patched Linux 3.7.5 kernels on Gentoo Linux.
https://bugs.gentoo.org/show_bug.cgi?id=457176
We should be able to prevent future regressions in PaX/GrSecurity
compatibility by ensuring that all changes to ZFSOnLinux avoid runtime
function pointer modification. At the same time, this does not solve the
issue of silent failures triggering default cases in the autotools
check, which is what permitted these regressions to become runtime
failures in the first place. This will need to be addressed in a future
patch.
Reported-by: Marcin Mirosław <bug@mejor.pl> Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1300
Brian Behlendorf [Wed, 27 Feb 2013 01:02:27 +0000 (17:02 -0800)]
Fix hot spares
The issue with hot spares in ZoL is because it opens all leaf
vdevs exclusively (O_EXCL). On Linux, exclusive opens cause
subsequent exclusive opens to fail with EBUSY.
This could be resolved by not opening any of the devices
exclusively, which is what Illumos does, but the additional
protection offered by exclusive opens is desirable. It cleanly
prevents you from accidentally adding an in-use non-ZFS device
to your pool.
To fix this we very slightly relaxed the usage of O_EXCL in
the following ways.
1) Functions which open the device but only read had the
O_EXCL flag removed and were updated to use O_RDONLY.
2) A common holder was added to the vdev disk code. This
allow the ZFS code to internally open the device multiple
times but non-ZFS callers may not.
3) An exception was added to make_disks() for hot spare when
creating partition tables. For hot spare devices which
are already opened exclusively we skip creating the partition
table because this must already have been done when the disk
was originally added as a hot spare.
Additional minor changes include fixing check_in_use() to use
a partition instead of a slice suffix. And is_spare() was moved
above make_disks() to avoid adding a forward reference.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #250
Brian Behlendorf [Tue, 26 Feb 2013 19:28:14 +0000 (11:28 -0800)]
Remove wholedisk check from vdev_disk_open()
As described by the comment and enforced the by assertion the
v->vdev_wholedisk will never be -1. The wholedisk handling
is performed by the user space utilities. To prevent confusion
this dead code is being removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Tue, 26 Feb 2013 19:25:55 +0000 (11:25 -0800)]
Leaf vdevs should not be reopened
When vdev_disk.c was implemented for Linux we failed to handle the
reopen case. According to the vdev_reopen() comment leaf vdevs should
not be closed or opened when v->vdev_reopening is set. Under Linux
we would always close and open the device.
This issue was only noticed when a 'zpool scrub' command was run while
the leaf vdev device names in /dev/disk/by-vdev were missing. The
scrub command calls vdev_reopen() which caused the vdevs to be closed
but they couldn't be reopened due to the missing links. The result
was that all the vdevs were marked unavailable and the pool was
halted due to failmode=wait.
This patch adds the missing functionality in a similiar fashion to
to the Illumos code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tim Connors [Mon, 25 Feb 2013 21:00:45 +0000 (08:00 +1100)]
-x shouldn't warn about old on-disk format or unavailable features
`zpool status -x` should only flag errors or where the pool is
unavailable. If it imported fine but isn't using the latest features
available in the code, that's not an error.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1319
Etienne Dechamps [Sun, 24 Feb 2013 11:22:07 +0000 (11:22 +0000)]
Remove the bio_empty_barrier() check.
To determine whether the kernel is capable of handling empty barrier
BIOs, we check for the presence of the bio_empty_barrier() macro,
which was introduced in 2.6.24. If this macro is defined, then we can
flush disk vdevs; if it isn't, then flushing is disabled.
Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37,
even though the kernel is still capable of handling empty barrier BIOs.
As a result, flushing is effectively disabled on kernels >= 2.6.37,
meaning that starting from this kernel version, zfs doesn't use
barriers to guarantee on-disk data consistency. This is quite bad and
can lead to potential data corruption on power failures.
This patch fixes the issue by removing the configure check for
bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore.
Thanks to Richard Kojedzinszky for catching this nasty bug.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1318
Etienne Dechamps [Sun, 24 Feb 2013 12:42:28 +0000 (12:42 +0000)]
Use -Werror for all kernel configure tests.
As a matter of fact, we're already using -Werror for most tests because
of a bug in kernel-bio-empty-barrier.m4 which sets -Werror without
reverting it afterwards. This meant that all tests which ran after this
one was using -Werror.
This patch simply makes it clear that we're using -Werror and makes
the code more readable and more predictable.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1317
Brian Behlendorf [Thu, 21 Feb 2013 20:14:44 +0000 (12:14 -0800)]
Enable zfs_arc_memory_throttle_disable by default
The zfs_arc_memory_throttle_disable module option was introduced
by commit 0c5493d47059f25ce9dbf20c9fe87655f55102a1 to resolve a
memory miscalculation which could result in the txg_sync thread
spinning.
When this was first introduced the default behavior was left
unchanged until enough real world usage confirmed there were no
unexpected issues. We've now reached that point. Linux's
direct reclaim is working as expected so we're enabling this
behavior by default.
This helps pave the way to retire the spl_kmem_availrmem()
functionality in the SPL layer. This was the only caller.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #938
Rather then setting _prefix=/ and having to override all the
default install locations. It's cleaner, and more understandable,
to leave prefix=/usr and only override _sbindir and _libdir. This
fixes three issues:
* The commands no longer get built with an incorrect rpath for
the libraries. This is good because fixing this sort of
thing is required by the Fedora packaging guidelines.
Richard Yao [Sun, 10 Feb 2013 00:25:55 +0000 (19:25 -0500)]
Make spa.c assertions catch unsupported pre-feature flag pool versions
A couple of assertions in spa.c were designed to prevent the use of
invalid pool versions. They were written under the assumption
that all valid pools are less than SPA_VERSION. Since feature flags
jumped from 28 to 5000, any numbers in the range 28 to 5000
non-inclusive will fail to trigger them. We switch to the new
SPA_VERSION_IS_SUPPORTED macro to correct this.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1282
Brian Behlendorf [Mon, 11 Feb 2013 20:55:24 +0000 (12:55 -0800)]
Add explicit MAXNAMELEN check
It turns out that the Linux VFS doesn't strictly handle all cases
where a component path name exceeds MAXNAMELEN. It does however
appear to correctly handle MAXPATHLEN for us.
The right way to handle this appears to be to add an explicit
check to the zpl_lookup() function. Several in-tree filesystems
handle this case the same way.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1279
Ned Bass [Wed, 6 Feb 2013 18:15:13 +0000 (10:15 -0800)]
Switch KM_SLEEP to KM_PUSHPAGE
Two more locations where KM_SLEEP was used in a call which must
use KM_PUSHPAGE were found while using the zpool upgrade command.
See commit b8d06fc for additional details.
Also make a small correction to the comment block above
dsl_dir_open_spa().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1268
Richard Yao [Tue, 5 Feb 2013 23:14:30 +0000 (18:14 -0500)]
Fix function relocations in libzpool
binutils 2.23.1 fails in situations that generate function relocations
on PowerPC and possibly other architectures. This causes linking of
libzpool to fail because it depends on libnvpair. We add a dependency on
libnvpair to lib/libzpool/Makefile.am to correct that.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1267
Explicitly case this value to an unsigned long long for 32-bit
systems to inform the compiler that a long type should not be
used. Otherwise we get the following compiler error:
dmu_send.c:376: error: integer constant is too large for
‘long’ type
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The zpool-features(5) man page was accidentally omitted from the
build target when feature flags was merged. As a result it doesn't
get installed as part of 'make install' so none of the packages
include this man page.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1262
The way in which virtual box ab(uses) memory can throw off the
free memory calculation in arc_memory_throttle(). The result is
the txg_sync thread will effectively spin waiting for memory to
be released even though there's lots of memory on the system.
To handle this case I'm adding a zfs_arc_memory_throttle_disable
module option largely for virtual box users. Setting this option
disables free memory checks which allows the txg_sync thread to
make progress.
By default this option is disabled to preserve the current
behavior. However, because Linux supports direct memory reclaim
it's doubtful throttling due to perceived memory pressure is ever
a good idea. We should enable this option by default once we've
done enough real world testing to convince ourselve there aren't
any unexpected side effects.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #938
Commit 1eb5bfa introduced a new zfs_disable_dup_eviction tunable.
It should have been made available as a module option in the
original patch but was overlooked.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Brian Behlendorf [Thu, 31 Jan 2013 19:02:21 +0000 (11:02 -0800)]
Honor 80 character limit in 'zpool status'
This is a minor nit, but the second line of the 'action' message
when you need to upgrade your pool to support feature flags exceeds
the standard 80 character limit. Fix it by moving the word
'feature' on to the third line.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>