granicus.if.org Git

Tag spl-0.6.3

META file and release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Set LANG to a reasonable default (C)

Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #306

Fix DKMS package upgrade and packager

Running 'yum upgrade spl-dkms' package could appear to work properly
and still leave you with no spl modules installed.  This will occur
when only the spl release, and not the version, are incremented.
This may be the case for a fast moving spl-testing repository.

During the upgrade process DKMS will realize that spl-x.y.z is already
installed and remove it.  DKMS then correctly builds the new modules
for spl-x.y.z.  However, as a final step when the old spl-x.y.z-r is
removed the %preun script runs and removes the newly build modules.
To handle this case the %preun script has been updated to only run
when the installed version exactly matches the full spec file version.

This change also updated ChangeLog section based on the DKMS
reference spec file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Restrict release number to META version

When creating packages in a git repository the release number
can be automatically set by 'git describe'. This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated. To
prevent this the match patten has been restricted to NAME-VERSION.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Add spl_kmem_cache_reclaim module option

The correct behavior for all registered shrinkers is to return the
number of objects in their cache.  In theory this allows the Linux
VM to balance memory reclaim across all registered caches.

In commit b9b3715 this behavior was disabled in favor of returning
-1 which notifies the VM that no additional objects are available
for reclaim.  This was done as a workaround to resolve thrashing
in shrink_slabs() which could occur when memory was low and numerous
core where in reclaim.  Unfortunately, this has been observed to
increase the likelihood of OOM events when SPL slab consumers are
responsible for consuming the majority of memory.

Therefore, this patch makes this behavior tunable.  Setting the
spl_kmem_cache_reclaim module option to 0x1 will result in the
shrinker only being called once.  This is the default behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes #358

Add KMC_SLAB cache type

For small objects the Linux slab allocator has several advantages
over its counterpart in the SPL.  These include:

1) It is more memory-efficient and packs objects more tightly.
2) It is continually tuned to maximize performance.

Therefore it makes sense to layer the SPLs slab allocator on top
of the Linux slab allocator.  This allows us to leverage the
advantages above while preserving the Illumos semantics we depend
on.  However, there are some things we need to be careful of:

1) The Linux slab allocator was never designed to work well with
   large objects.  Because the SPL slab must still handle this use
   case a cut off limit was added to transition from Linux slab
   backed objects to kmem or vmem backed slabs.

   spl_kmem_cache_slab_limit - Objects less than or equal to this
   size in bytes will be backed by the Linux slab.  By default
   this value is zero which disables the Linux slab functionality.
   Reasonable values for this cut off limit are in the range of
   4096-16386 bytes.

   spl_kmem_cache_kmem_limit - Objects less than or equal to this
   size in bytes will be backed by a kmem slab.  Objects over this
   size will be vmem backed instead.  This value defaults to
   1/8 a page, or 512 bytes on an x86_64 architecture.

2) Be aware that using the Linux slab may inadvertently introduce
   new deadlocks.  Care has been taken previously to ensure that
   all allocations which occur in the write path use GFP_NOIO.
   However, there may be internal allocations performed in the
   Linux slab which do not honor these flags.  If this is the case
   a deadlock may occur.

The path forward is definitely to start relying on the Linux slab.
But for that to happen we need to start building confidence that
there aren't any unexpected surprises lurking for us.  And ideally
need to move completely away from using the SPLs slab for large
memory allocations.  This patch is a first step.

NOTES:
1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab
   backed caches but it is not supported for kmem/vmem backed caches.

2) Regardless of the spl_kmem_cache_*_limit settings a cache may
   be explicitly set to a given type by passed the KMC_KMEM,
   KMC_VMEM, or KMC_SLAB flags during cache creation.

3) The constructors, destructors, and reclaim callbacks are all
   functional and will be called regardless of the cache type.

4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to
   the issues involved in presenting correct object accounting.
   Instead they will appear in /proc/slabinfo under the same names.

5) Several kmem SPLAT tests needed to be fixed because they relied
   incorrectly on internal kmem slab accounting.  With the updated
   test cases all the SPLAT tests pass as expected.

6) An autoconf test was added to ensure that the __GFP_COMP flag
   was correctly added to the default flags used when allocating
   a slab.  This is required to ensure all pages in higher order
   slabs are properly refcounted, see ae16ed9.

7) When using the SLUB allocator there is no need to attempt to
   set the __GFP_COMP flag.  This has been the default behavior
   for the SLUB since Linux 2.6.25.

8) When using the SLUB it may be desirable to set the slub_nomerge
   kernel parameter to prevent caches from being merged.

Original-patch-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #356

Linux 3.15: vfs_rename() added a flags argument

Detect the updated vfs_rename() interface and call it with an
extra flags argument.

References:
torvalds/linux@520c8b1

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #355

Linux 3.15 compat: NICE_TO_PRIO and PRIO_TO_NICE

These macro's were exposed to make them available to other
parts of the kernel and modules.

References:
torvalds/linux@6b6350f

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #355

Evenly distribute the taskq threads across available CPUs

The problem is described in commit aeeb4e0c0ae75b99ebbaa3056f0afc8e12949532.
However, instead of disabling the binding to CPU altogether we just keep the
last CPU index across calls to taskq_create() and thus achieve even
distribution of the taskq threads across all available CPUs.

The implementation based on assumption that task queues initialization
performed in serial manner.

Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Signed-off-by: Andrey Vesnovaty <andreyv@infinidat.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #336

Fix crash when using ZFS on Ceph rbd

When using __get_free_pages to get high order memory, only the first page's
_count will set to 1, other's will be 0. When an internal page get passed into
rbd, it will eventully go into tcp_sendpage. There, it will be called with
get_page and put_page, and get freed erroneously when _count jump back to 0.

The solution to this problem is to use compound page. All pages in a
high order compound page share a single _count. So get_page and put_page in
tcp_sendpage will not cause _count jump to 0.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #251

Add support for aarch64 (ARMv8)

Using the ARM reference simulation (fast model foundation v8) I
cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit
arm architecture, called aarch64 in Linux).

As it is based on previous ARM porting, the resulting patch is
disappointingly small, there was very little to do. The code fixes
the compile issues and has light testing done.

Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #351

Change spl_kmem_cache_expire default setting to 2

This behavior is more consistent with the way memory reclaim
is expected to work under Linux.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #349

Expose max/min objs per slab and max slab size

By default maximal number of objects in slab can't exceed (16*2 - 1) and slab
size can't exceed 32M.
Today's high end servers having couple hundreds of RAM available for ARC may
run into a trouble with virtual memory because of the restriction mentioned
above.

Problem:
Reasons for very high number of virtual memory allocations:
* Real slab size very small relative to the size of the entire RAM
* Slabs allocated on virtual memory and fill entire ARC

The result is very high number of allocated virtual memory ranges (hundreds of
ranges). When virtual memory subsystem manages high number of ranges its
performance become so poor that it freezes from time to time.

Solution:
Number of objects per slab should be increased taking into account maximal
slab size which can also be increased if needed.

Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #337

Add ddi_time_after and friends

When comparing times gotten from ddi_get_lbolt, we have to take account of
wrap around of jiffies. Therefore, we cannot use 't1 < t2'. Instead we should
use 't1 - t2 < 0'.

This patch add ddi_time_after and friends to address this issue. They have
strict type restriction, clock_t for vanilla and int64_t for 64 version, to
prevent type conversion from screwing things.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #335

This patch add a CTASSERT macro for compile time assertion.

This macro makes the compile to spit "mixed definition and code"
warning, I can't find a way to avoid it.

This patch lays some groundwork for the persistent l2arc feature.
See https://www.illumos.org/issues/3525.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #303

Simplify hostid logic

There is plenty of compatibility code for a hw_hostid
that isn't used by anything. At the same time, there are apparently
issues with the current hostid logic. coredumb in #zfsonlinux on
freenode reported that Fedora 17 changes its hostid on every boot, which
required force importing his pool. A suggestion by wca was to adopt
FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does
not exist

Adopting FreeBSD's behavior permits us to eliminate plenty of code,
including a userland helper that invokes the system's hostid as a
fallback.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #224

Call kthread_create() correctly with fixed arguments.

The kernel's kthread_create() function is defined as "..." and there is
no va_list variant at the moment. The task name is pre-formatted into
a local buffer and passed to kthread_create() with fixed arguments.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #347

De-inline spl_kthread_create().

The function was defined as a static inline with variable arguments
which causes gcc to generate errors on some distros.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #346

Support post-3.13 kthread_create() semantics.

Provide spl_kthread_create() as a wrapper to the kernel's kthread_create()
to provide pre-3.13 semantics. Re-try if the call is interrupted or if it
would have returned -ENOMEM. Otherwise return NULL.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #339

splat cred:groupmember: Fix false positives

Due to certain assumptions made in the the cred:groupmember test it
could result in false positives when run on specific distributions.
This was solely a bug in the test case and not in the groupmember()
function which the test case was validating.

To prevent future false positives the test case has been rewritten
to be both more rigerous and to make fewer assumptions about the
system.

Minor style cleanup was done to cr_groups_search() and groupmember()
functions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

splat kmem:slab_reclaim: Test cleanup

By setting __GFP_NORETRY the kernel memory reclaim logic was allowed to
abort early and dump a falled allocation stack to the console.  Since
this was done in a tight loop to fill memory it could result in a large
number of stacks being dumped to the console.  This in turn slowed down
the test sufficiently so it exceeded the time limit and failed.

To resolve this issue the __GFP_NORETRY flag is being removed.  This is
how it should have been called originally to ensure we're simulating
the behavior of most callers which will use the GFP_KERNEL flag.

In addition, the reclaim granularity of 1000 objects was far to coarse
for this to be a realistic test.  For kmem:slab_reclaim there might
only be a few thousand objects total in the cache.  Therefore, the
SPLAT_KMEM_OBJ_RECLAIM constant for these tests was lowered.  This
will cause the reclaim callback to run more frequently which makes
for a better test case.

The frequency of the cache reaping in kmem:slab_reap was increased
to accommodate the reduced number of objects released during the
reclaim.

These changes only impact the test cases and were done to remove
false positives caused by the test case itself.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Remove incorrect use of EXTRA_DIST for man pages

Setting the 'dist_' prefix is the correct way to instruct Automake
to include these files in the distribution. The EXTRA_DIST variable
is reserved for files which are not covered by the automatic rules.

http://www.gnu.org/software/automake/manual/automake.html#Basics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Define the needed ISA types for Sparc

Add the minimum required ISA types to support the Sparc
architecture.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: marku89 <mar42@kola.li>
Closes #317

Remove default taskq thread to CPU bindings

When this code was written it appears to have been assumed that
every taskq would have a large number of threads. In this case
it would make sense to attempt to evenly bind the threads over
all available CPUs. However, it failed to consider that creating
taskqs with a small number of threads will cause the CPUs with
lower ids become over-subscribed.

For this reason the kthread_bind() call is being removed and
we're leaving the kernel to schedule these threads as it sees fit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #325

Include linux/vmalloc.h for ARM and Sparc

Related to issue #257 which added Linux 3.10 compatibility. For
ARM and Sparc architectures we must explicitly include the
<linux/vmalloc.h> header to ensure the vmalloc_info structure
is always defined when available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257
Closes #291

Add module versioning

Use the standard Linux MODULE_VERSION macro to expose the installed
spl and splat module versions.  This will also automatically add a
checksum of the .c files and headers in "srcversion".  See:

  /sys/module/spl/version
  /sys/module/spl/srcversion
  /sys/module/splat/version
  /sys/module/splat/srcversion

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1923

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.13 compat: Pass NULL for new delegated inode argument

This check was originally added for SLES10, a093c6a, to check for
a 'struct vfsmount *' argument which they added. However, since
SLES10 is based on a 2.6.16 kernel which is no longer supported
this functionality was dropped. The checks were refactored to
support Linux 3.13 without concern for historical versions.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #312

Linux 3.13 compat: Remove unused flags variable from __cv_init()

GCC 4.8.1 complained about an unused flags variable when building
against Linux 2.6.26.8:

/var/tmp/portage/sys-kernel/spl-9999/work/spl-9999/module/spl/../../module/spl/spl-condvar.c:
In function ‘__cv_init’:
/var/tmp/portage/sys-kernel/spl-9999/work/spl-9999/module/spl/../../module/spl/spl-condvar.c:39:6:
error: variable ‘flags’ set but not used
[-Werror=unused-but-set-variable]
int flags = KM_SLEEP;
^
cc1: all warnings being treated as errors

Additionally, the superfluous code uses a preempt_count variable that is
no longer available on Linux 3.13. Deleting the unnecessary code fixes a
Linux 3.13 compatibility issue.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #312

Document SPL module parameters.

This is a first draft of a spl-module-parameters(5) man page. I have
just extracted the parameter name and its description with modinfo,
then checked the source what type it is and its default value.

This will need more work, preferably someone that actually know these
values and what to use them for. Similar to zfsonlinux/zfs#1856, but
for the spl.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1856

Retroactively fix bogus %changelog date

New versions of rpmbuild detect the invalid date which was added
incorrectly to the changelog. To silence this noise fix it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #306

Tighten spl dependency on spl-kmod

Make spl depend on the same version of spl-kmod, rather than on same or
better. When yum repository contains a number of versions the dependency
resolution breaks on trying to install non-latest version.

Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1677

Linux 3.12 compat: New shrinker API

torvalds/linux@24f7c6 introduced a new shrinker API while
torvalds/linux@a0b021 dropped support for the old shrinker API.
This patch adds support for the new shrinker API by wrapping
the old one with the new one.

This change also reorganizes the autotools checks on the shrinker
API such that the configure script will fail early if an unknown
API is encountered in the future.

Support for the set_shrinker() API which was used by Linux 2.6.22
and older has been dropped. As a general rule compatibility is
only maintained back to Linux 2.6.26.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1732
Closes zfsonlinux/zfs#1822
Closes #293
Closes #307

Emulate illumos interface cv_timedwait_hires()

Needed for Illumos #3582. This interface is supposed to support
a variable-resolution timeout with nanosecond granularity.  This
implementation rounds up to microsecond resolution, as nanosecond-
precision timing is rarely needed for real-world performance
tuning and may incur unnecessary busy-waiting.  usleep_range() is
used if available, otherwise udelay() or msleep() are used
depending on the length of the delay interval.

Add flags from sys/callo.h as these are used to control the behavior of
cv_timedwait_hires().  Specifically,

CALLOUT_FLAG_ABSOLUTE
    Normally, the expiration passed to the timeout API functions is
    an expiration interval. If this flag is specified, then it is
    interpreted as the expiration time itself.

CALLOUT_FLAG_ROUNDUP
    Roundup the expiration time to the next resolution boundary. If this
    flag is not specified, the expiration time is rounded down.

References:
    https://www.illumos.org/issues/3582
    illumos/illumos-gate@0689f76

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #304

Merge branch 'kstat'

This branch updates the existing kstat infrastructure to be
more flexible. In particular, it extends the KSTAT_TYPE_RAW
type so it may be used to generate more dynamic kstats without
the need for additional custom types.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

3537 add kstat_waitq_enter and friends

These kstat interfaces are required to port
"Illumos #3537 want pool io kstats" to ZFS on Linux.

kstat_waitq_enter()
kstat_waitq_exit()
kstat_runq_enter()
kstat_runq_exit()

Additionally, zero out the ks_data buffer in __kstat_create() so
that the kstat_io_t counters are initialized to zero.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Kstat to use private lock by default

While porting Illumos #3537 I found that ks_lock member of kstat_t
structure is different between Illumos and SPL. It is a pointer to
the kmutex_t in Illumos, but the mutex lock itself in SPL.
Apparently Illumos kstat API allows consumer to override the lock
if required. With SPL implementation it is not possible anymore.

Things were alright until the first attempt to actually override
the lock. Porting of Illumos #3537 introduced such code for the
first time.

In order to provide the Solaris/Illumos like functionality we:
  1. convert ks_lock to "kmutex_t *ks_lock"
  2. create a new field "kmutex_t ks_private_lock"
  3. On kstat_create() ks_lock = &ks_private_lock

Thus if consumer doesn't care we still have our internal lock in use.
If, however, consumer does care she has a chance to set ks_lock to
anything else before calling kstat_install().

The rest of the code will use ks_lock regardless of its origin.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #286

Revert "Add KSTAT_TYPE_TXG type"

This reverts commit dba79fcbf2cc50be5caef84ae01657e884ac5d89 in
favor of using the generic KSTAT_TYPE_RAW callbacks. The advantage
of this approach is that arbitrary types can be added without the
need to add them to the SPL.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #296

Add wrappers for accessing PID and command info

This change adds simple wrappers for accessing a thread's PID and
command character string.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #296

Add callbacks for displaying KSTAT_TYPE_RAW kstats

The current implementation for displaying kstats of type KSTAT_TYPE_RAW
is rather crude. This patch attempts to enhance this handling by
allowing a kstat user to register formatting callbacks which can
optionally be used.

The callbacks allow the user to implement functions for interpreting
their data and transposing it into a character buffer. This buffer,
containing a string representation of the raw data, is then be displayed
through the current /proc textual interface.

Additionally the kstats are made writable because it's now possible
to provide a useful handler via the existing ks_update() interface.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #296

Define SET_ERROR()

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Consistently use local_irq_disable/local_irq_enable

It was observed that spl_kmem_cache_alloc() uses local_irq_save()
and saves the interrupt state in a local variable. This would
normally be fine except that spl_kmem_cache_alloc() calls
spl_cache_refill() which re-enables interrupts. It is then
possible that while interrupts are enabled the process is
rescheduled to a different cpu before being disable again.
This could result in us restoring the saved interrupt state
from one cpu to another.

What the consequences of this are aren't perfectly clear, but
this is clearly a bug and it has the potential to cause issues.
The code has been updated to just use local_irq_enable() and
local_irq_disable() to avoid this.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Document how to run SPLAT

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #294

Add kpreempt() compatibility macro

This is needed for the Illumos #4045 write throttle patch. It is used
in the arc eviction code to avoid blocking all arc activity by sitting on
arcs_mtx too long.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #286

Replace current_kernel_time() with getnstimeofday()

current_kernel_time() is used by the SPLAT, but it is not meant for
performance measurement. We modify the SPLAT to use getnstimeofday(),
which is equivalent to the gethrestime() function on Solaris.
Additionally, we update gethrestime() to invoke getnstimeofday().

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #279

Tag spl-0.6.2

META file and release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.8 compat: Use kuid_t/kgid_t when required

When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/git_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.

We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.

Original-patch-by: DHE
Rewrite-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #260

PaX/GrSecurity Linux 3.8.y compat: Use __no_const on struct ctl_table

The PaX team started constifying `struct ctl_table` as of their Linux
3.8.0 patchset. This lead to zfsonlinux/spl#225 and Gentoo bug #463012.

While investigating our options, I learned that there is a preprocessor
directive called CONSTIFY_PLUGIN that we can use to detect the presence
of the PaX changes and adjust the code accordingly.

The PaX Team had suggested adopting ctl_table_no_const, but supporting
older kernels required declaring that whenever the CONSTIFY_PLUGIN was
set. Future compiler changes could potentially cause that to break in
the presence of -Werror, so instead we define our own spl_ctl_table
typdef and use that. This should be compatible with all PaX kernels.

This introduces a Linux kernel version number check to prevent a build
failure on versions of the PaX GCC plugin that existed for kernels
before Linux 3.8.0. Affected versions of the PaX plugin will trigger a
compiler error when they see no_const cast on a non-constified
structure. Ordinarily, we would need an autotools check to catch that.
However, it is safe to do a kernel version check instead of an autotools
check in this specific instance because the affected versions of the PaX
GCC plugin only exist for Linux kernels before 3.8.0 and the
constification of `struct ctl_table` by the PaX developers only occurs
in Linux 3.8.0 and later.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #225

Fix race in spl_kmem_cache_reap_now()

The current code contains a race condition that triggers when bit 2 in
spl.spl_kmem_cache_expire is set, spl_kmem_cache_reap_now() is invoked
and another thread is concurrently accessing its magazine.

spl_kmem_cache_reap_now() currently invokes spl_cache_flush() on each
magazine in the same thread when bit 2 in spl.spl_kmem_cache_expire is
set. This is unsafe because there is one magazine per CPU and the
magazines are lockless, so it is impossible to guarentee that another
CPU is not using its magazine when this function is called.

The solution is to only touch the local CPU's magazine and leave other
CPU's magazines to other CPUs.

Reported-by: DHE
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #274

Linux 3.11 compat: Replace num_physpages with totalram_pages

num_physpages was removed by
torvalds/linux@cfa11e08ed39eb28a9eff9a907b20913020c69b5, so lets replace
it with totalram_pages.

This is a bug fix as much as it is a compatibility fix because
num_physpages did not reflect the number of pages actually available to
the kernel:

http://lkml.indiana.edu/hypermail/linux/kernel/0908.2/01001.html

Also, there are known issues with memory calculations when ZFS is in a
Xen dom0. There is a chance that using totalram_pages could resolve
them. This conjecture is untested at the time of writing.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #273

Add kmod repo integration

When the kmod packaging infrastructure was originally added the
dependency on the rpmfusion yum repositories was disabled.  This
was done at the time in favour of getting local builds working.

Now the time has come to conditionally re-enable that functionality
so we can properly provide binary kmod packages.

  ./configure --with-config=srpm
  make SRPM_DEFINE_KMOD='--define="repo rpmfusion"' srpm-kmod
  mock rebuild spl-kmod-x.y.z-r.el6.src.rpm

One nice benefit of finishing this work is that the generic and
fedora spl-kmod spec files can be merged again.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Fix KMC_OFFSLAB type caches

Because spl_slab_size() was always returning -ENOSPC for caches of
type KMC_OFFSLAB the cache could never be created. Additionally
the slab size is rounded up to a page which is what kv_alloc()
expects. The kv_alloc() code will minimally allocate a page,
in the KMC_OFFSLAB case this could be reduced.

The basic regression tests kmem:slab_small, kmem:slab_large,
and kmem:slab_align regression were updated to test KMC_OFFSLAB.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ying Zhu <casualfisher@gmail.com>
Closes #266

Return -1 for generic kmem cache shrinker

It has been observed that it's possible to get in a state where
shrink_slabs() will spin repeated invoking the generic kmem cache
shrinker. It fails to detect it's not making forward progress
reclaiming from the cache and doesn't give up. To ensure this
never occurs we unconditionally return -1 after reclaiming what
we can.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes zfsonlinux/zfs#1276
Closes zfsonlinux/zfs#1598
Closes zfsonlinux/zfs#1432

Modify gethrestime to use current_kernel_time()

This allows us to get nanosecond resolution. It also means
we use the same time source as utimensat(now) etc.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #255

Improve build instructions

Make it clear that when building directly from the Git tree
the configure script must be manually generated by running the
autogen.sh script. This requires that the GNU autotools packages
be installed for your distribution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1448

Fix bogus kmem leak warning

Commit 5c7a036 correctly relocated the creation of a taskq
and the registraction of the kmem_cache_shrinker after the
initialization of the kmem tracking code. However, the
cleanup of these structures was not done before the leak
checks in spl_kmem_fini(). This resulted in an incorrect
'kmem leaked' warning even though there was no actual leak.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1569

Fix --enable-debug-kmem-tracking option

This code has gotten something stale and no longer builds cleanly
against modern kernels.  The two issues addressed here are as
follows:

* The hlist_*_rcu interfaces in the kernel have been relatively
  unstable.  Since this isn't performance critical code just use
  the long standing hlist_* variants.

* In older kernels the hash_ptr() function takes a 'void *' but
  in newer kernels it expects a 'const void *'.  To silence the
  compiler warnings about this explicitly cast it to a 'void *'.
  The memset function is a similar case but it always expects
  a 'void *'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #256

Merge branch 'linux-3.10'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #257

Linux 3.10 compat: Do not rely on struct proc_dir_entry definition

Linux kernel commit torvalds/linux#59d8053f moved the definition of
struct proc_dir_entry from include/linux/proc_fs.h to the private
header fs/proc/internal.h. The SPL relied on that to map Solaris'
kstat to entries in /proc/spl/kstat.

Since the proc_dir_entry structure is now private the only safe
thing to do is wrap the opaque proc handle with our own structure.
This actually ends up simplify the code and is good because it
moves us away from depending on implementation details of /proc.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257

Linux 3.10 compat: add missing include of linux/slab.h

Linux kernel commit torvalds/linux@0d01ff2 changes some
includes we were depending on through linux/proc_fs.h.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257

Linux 3.10 compat: replace PDE()->data with PDE_DATA()

Linux kernel commit torvalds/linux@d9dda78b renamed PDE() to
PDE_DATA(). To handle this detect the prefered interface
and define a PDE_DATA() wrapper for consistency.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257

Linux 3.10 compat: struct vmalloc_info moved

Linux kernel commmit torvalds/linux@db3808c1 moved the
vmalloc_info structure from a private to a public header.
Now that it's available for kernel modules use it.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257

Add --buildroot option to kmod build

This allows rpmbuild to define buildroot to point to where kernel
data is located.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #242

Copy spl.release.in to kernel dir

Required when compiling ZFS in the kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #253

Fix ASSERT0 and VERIFY0 macro typo

Ensure the value is cast to a 'long long' for printing purposes. The
expectation is that ASSERT0/VERIFY0 are mostly used for validating
return values and thus may commonly be negative.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #246

Add ASSERT0 and VERIFY0 macros

The Illumos code introduced the ASSERT0 and VERIFY0 macros which
are to be used instead of ASSERT3S(x, ==, 0) and VERIFY3S(x, ==, 0).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Madhav Suresh <madhav.suresh@delphix.com>
Closes #246

Fix --enable-debug-kmem-tracking option

Re-order initialization in spl_kmem_init to allow for kmem tracing
to work. The spl_kmem_init function calls taskq_create prior to
initializing the tracking (calling spl_kmem_init_tracking). Since
taskq_create uses kmem_alloc, NULL dereferences occur because the
global kmem_list hasn't had its next & prev pointers initialized yet.

This commit moves the calls to spl_kmem_init_tracking earlier in the
spl_kmem_init function in order that the subsequent kmem_alloc calls
(by taskq_create) work properly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #243

Fix taskq_wait_id()

The existing taskq_wait_id() function can incorrectly block
indefinitely. Reimplement it more simply using wait_event()
in a similar fashion to taskq_wait_all().

This flaw was uncovered in the context of moving vn_rdwr() to
a taskq. Previously taskq_wait_id() had no consumers outside
the SPLAT task framework which is why the issue went unnoticed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Fix delay()

Somewhat amazingly it went unnoticed that the delay() function
doesn't actually cause the task to block. Since the task state
is never changed from TASK_RUNNING before schedule_timeout() the
scheduler allows to task to continue running without any delay.
Using schedule_timeout_interruptible() resolves the issue by
correctly setting TASK_UNINTERRUPTIBLE.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Add msec/usec/nsec to tick convertors

Add wrappers for the Solaris MSEC_TO_TICK, USEC_TO_TICK, and
NSEC_TO_TICK conversion functions. They are mapped directly to
their Linux counterparts with the exception of NSEC_TO_TICK
can cannot use usecs_to_jiffies() because it is not exported
by the kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Ignore *.{deb,rpm,tar.gz} files in the top directory.

These are build products and should be ignored.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402

Add --bump=0 to alien

Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402
Issue zfsonlinux/zfs#928

Support .nogitrelease file

When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1402

Fix various generic kmod RPM spec issues.

There are a number of issues with the generic kmod RPM spec in its
current state:
- The "%{__id_u}" macro seems to not be available on some systems (e.g.
   Debian squeeze). It appears it has been deprecated. Use "${__id} -u"
   instead.
- The way the "--with-linux=" configure option is generated in the
   non-RHEL/Fedora case is completely wrong with various newline and
   escaping issues (also, $kernel_version is not available in the
   generator context).

The second issue made the generator shell snippet (almost) silently
fail, which under specific circumstances can result in broken builds
against the wrong kernel sources.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #231

Add additional dependencies for DKMS package

For the DKMS package to successfully build the kernel-devel
headers must be included along gcc, make, and perl. The SPL
code never directly invokes perl but the kernel build system
depends on it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380

Replace the SPL_AC_META perl dependency with awk

The only remaining perl dependency is part of the SPL_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380

Automake 1.10.1 compat: AM_SILENT_RULES

Part of the automated testing involves building the source on Debian Lenny
which ships an ancient version of automake (1.10.1).  Historically, this
has caused a non-fatal warning about AM_SILENT_RULES not being defined.
But when the autogen.sh script was updated to use autoreconf the warning
became fatal.

  configure.ac:31: warning: macro `AM_SILENT_RULES' not found in library
  autoreconf: running: /usr/bin/autoconf --force
  configure.ac:34: error: possibly undefined macro: AM_SILENT_RULES
        If this token and others are legitimate, please use m4_pattern_allow.

To resolve this build issue the call to AM_SILENT_RULES has been wrapped
by m4_ifdef().  This prevents the macro from being expanded on platforms
where it's undefined.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

build: do not call boilerplate ourself

Rationale see section 3.5 "Using `autoreconf' to Update `configure'
Scripts" of the autoconf manual.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

gitignore: anchor entries at their respective directory

.ko is specific to module, .m4 to config, etc.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

build: use CPPFLAGS

-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

build: resolve orthographic and other grammatical errors

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Tag spl-0.6.1

META file and release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Provide ${kmodname}-devel-kmod for yum-builddep

In order to ensure that yum-builddep pulls in all the build
requirements a generic ${kmodname}-devel-kmod provides line is
added. This allows a version of the development headers to be
included without requiring knowledge of the kernel version.

This is important because unlike rpmbuild which does correctly
expand the source rpm spec file, yum-builddep does not. Without
this generic provides line mock which relies on yum-builddep is
unable to automatically satisfy the dependency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Use 'git describe' for working builds

When building from an arbitrary commit in the git tree it's useful
for the resulting packages to be uniquely identifiable.  Therefore,
the build system has been updated to detect if your compiling in
git tree.

If you are building in a git tree, and there are commits after the
last annotated tag.  Then the <id>-<hash> component of 'git describe'
will be used to overwrite the 'Release:' field in the META file.

The only tricky part is that to ensure the 'make dist' tarball is
built using the correct release.  A dist-hook was added to the top
level make file to rewrite the META file using the correct release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #195
Issue #111

Do not call cond_resched() in spl_slab_reclaim()

Calling cond_resched() after each object is freed and then after each
slab is freed can cause slabs of objects to live for excessive periods
of time following reclaimation. This interferes with the kernel's own
memory management when called from kswapd and can cause direct reclaim
to occur in response to memory pressure that should have been resolved.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>

Use requested kernel for dkms builds

The --with-linux and --with-linux-obj options must be specified
as part of the dkms build otherwise the package will be built
against the running kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Remove spl-dkms conflict with spl-kmod

Because the spl-dkms package also provides spl-kmod for the
spl user package yum flags this as a conflict.  To avoid the
problem remove the Conflicts tag from spl-dkms and just rely
on the one in spl-kmod.

  spl-dkms-0.6.0-rc14.fc18.noarch has installed conflicts
    spl-kmod: spl-dkms-0.6.0-rc14.fc18.noarch

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Create splat man page

The automake templates have been updated to install this man
page and the existing packaging was updated to include it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Refresh RPM packaging

Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'.  This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.

  http://fedoraproject.org/wiki/Packaging:Guidelines
  http://rpmfusion.org/Packaging/KernelModules/Kmods2

While the spec files have been entirely rewritten from a
user perspective the only major changes are:

* The Fedora packages now have a build dependency on the
  rpmfusion repositories.  The generic kmod packages also
  have a new dependency on kmodtool-1.22 but it is bundled
  with the source rpm so no additional packages are needed.

* The kernel binary module packages have been renamed from
  spl-modules-* to kmod-spl-* as specificed by kmods2.

* The is now a common kmod-spl-devel-* package in addition
  to the per-kernel devel packages.  The common package
  contains the development headers while the per-kernel
  package contains kernel specific build products.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #222

Change spl-kmod-devel install path

Install the common spl kernel development headers under
/usr/src/spl-<version>/ rather than in a kernel specific
directory. The kernel specific build products such as
spl_config.h and Modules.symvers are left installed under
/usr/src/spl-<version>/<kernel>.

This was done to be consistent with where dkms expects
kernel module source to be packaged. It also allows for
a common spl-kmod-devel package which includes the headers,
and per-kernel spl-kmod-devel-<kernel> packages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Merge branch 'linux-3.9'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #221

Linux 3.9 compat: Switch to hlist_for_each{,_rcu}

torvalds/linux@b67bfe0d42cac56c512dd5da4b1b347a23f4b70a changed
hlist_for_each_entry{,_rcu} to take 3 arguments instead of 4. We handle
this by switching to hlist_for_each{,_rcu}, which works across all
supported kernels.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Drop support for 3 argument version of set_fs_pwd

This was a suggestion that Brian Behlendorf made when reviewing an early
pull request for Linux 3.9 support. This commit was made intentionally
easy to revert should we ever have a reason to reintroduce support for
older kernels.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.9 compat: set_fs_root takes const struct path *

torvalds/linux@dcf787f39162ce32ca325b3e784aba2d2444619a enforces
const-correctness in passing struct path *.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.9 compat: vfs_getattr takes two arguments

The function prototype of vfs_getattr previoulsy took struct vfsmount *
and struct dentry * as arguments. These would always be defined together
in a struct path *.

torvalds/linux@3dadecce20603aa380023c65e6f55f108fd5e952 modified
vfs_getattr to take struct path * is taken as an argument instead.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.9 compat: Do not depend on f_vfsmnt

torvalds/linux@182be684784334598eee1d90274e7f7aa0063616 removed the
preprocessor definition for f_vfsmnt. The ability to access the
mountpoint via ->f_path.mnt has been stable for a long time, so we
switch to that.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 3.9 compat: Include linux/sched/rt.h

Linux 3.9 reorganized sched.h, splitting it into numerous files.
torvalds/linux@8bd75c77b7c6a3954140dd2e20346aef3efe4a35 moved MAX_PRIO
and MAX_RT_PRIO to linux/sched/rt.h.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Merge branch 'build-system'

Refresh links to web site

Update links to refer to the official ZFS on Linux website instead of
@behlendorf's personal fork on github.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Remove ARCH packaging

The kernel modules are now available in the Arch User Repository
(AUR) via zfs. Since their packaging is maintained and superior
to ours it is being removed from the tree.

https://wiki.archlinux.org/index.php/ZFS

Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Remove custom install-data-local for headers

Rather than use a custom install target it is cleaner to define
a 'kerneldir' and set 'kernel_HEADERS' appropriately. This
allows us to leverage the standing configure install support.

Additionally, I took this opertunity add the missing make files
to the include subdirectories.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>