granicus.if.org Git

Optimize lowest outstanding taskqid calculation in taskq_lowest_id()

In the initial version of taskq_lowest_id() the entire pending and
work list was locked under the tq->tq_lock to determine the lowest
outstanding taskqid.  At the time this done because I was rushed
and wanted to make sure it was right... fast was secondary.  Well now
fast is important too so I carefully thought through the pending
and work list management and convinced myself it is safe and correct
to simply check the first entry.  I added a large comment to the source
to explain this.  But basically as long as we are careful to ensure the
pending and work list stay sorted this is safe and fast.

The motivation for this chance was that I was observing as much as
10% of the total CPU time go to waiting on the tq->tq_lock when the
pending list was long.  This resolves that problems and frees up
that CPU time for something useful.

Strip __GFP_ZERO from kmalloc it is not available for older kernels.

This is needed to avoid a BUG_ON() on RHEL5.4 kernel 2.6.18-164.6.1,
since __GFP_ZERO is not a valid flag for kmalloc().

Fix kmem:slab_overcommit regression test locking

This regression test could crash in splat_kmem_cache_test_reclaim()
due to a race between the slab relclaim and the normal exiting of
the thread.  Specifically, the kct structure could be free'd by
the thread performing the allocations while the reclaim function
was also working on that's threads kct structure.  The simplest
fix is to extend the kcp->kcp_lock over the reclaim to prevent
the kct from being freed.  A better fix would be to ref count
these structures, but since is just a regression this locking
change is enough.  Surprisingly this was only observed commonly
under RHEL5.4 but all platform could have hit this.

Check for changed gaurd macro in 2.6.28+ for rwsem implementation.

As part of the 2.6.28 cleanup which moved all the linux/include/asm/
headers in to linux/arch, the guard headers for many header files
changed. The i386 rwsem implementation keys off this header to
ensure the internal members of the rwsem structure are interpreted
correctly. This change checks for the new guard macro in addition
to the only one, the implementation of the rwsem has not changed
for i386 so this is safe and correct.

Add skc_flags and full header to /proc/spl/kmem/slab.

Splat vnode tests must return negative error codes.

I must have been in a hurry when I wrote the vnode regression tests
because the error code handling is not correct.  The Solaris vnode
API returns positive errno's, these need to be converted to negative
errno's for Linux before being passed back to user space.  Otherwise
the test hardness with report the failure but errno will not be set
with the correct error code.

Additionally tests 3, 4, 6, and 7 may fail in the test file already
exists.  To avoid false positives a user mode helper has added to
remove the test files in /tmp/ before running the actual test.

Atomic64 compatibility for 32-bit systems without kernel support.

This patch is another step towards updating the code to handle the
32-bit kernels which I have not been regularly testing.  This changes
do not really impact the common case I'm expected which is the latest
kernel running on an x86_64 arch.

Until the linux-2.6.31 kernel the x86 arch did not have support for
64-bit atomic operations.  Additionally, the new atomic_compat.h support
for this case was wrong because it embedded a spinlock in the atomic
variable which must always and only be 64-bits total.  To handle these
32-bit issues we now simply fall back to the --enable-atomic-spinlock
implementation if the kernel does not provide the 64-bit atomic funcs.

The second issue this patch addresses is the DEBUG_KMEM assumption that
there will always be atomic64 funcs available.  On 32-bit archs this may
not be true, and actually that's just fine.  In that case the kernel will
will never be able to allocate more the 32-bits worth anyway.  So just
check if atomic64 funcs are available, if they are not it means this
is a 32-bit machine and we can safely use atomic_t's instead.

Correctly handle division on 32-bit RHEL5 systems by returning dividend.

When using x86 specific rwsem correctly intepret rwsem->count.

Only run the kmem overcommit test on 64-bit systems.

Add missing atomic64 compat helpers for 32-bit systems.

The use of these functions was added with the recent atomic work
and not tested on 32-bit systems. Add the missing compat functions:
atomic64_inc, atomic64_dec, atomic64_add_return, atomic64_sub_return,
atomic64_inc_return, atomic64_dec_return.

Type long expected explicitly cast for 32-bit systems.

spl-modules-devel package must depend on the exact version of kernel
devel package it was built against.

Add 'srpm' --with-config option for creation of spec files.

Add chaos5 and rhel6 macro's to the spl-modules.spec.in

Prep for 0.4.7 tag, updated META and ChangeLog.

Ensure *.order and *.markers build products are removed by distclean rule.

Ensure spl_config.h is include in spl-generic.c

Always use the generic mutex_destroy().

Add mutex_enter_nested() as wrapper for mutex_lock_nested().

This symbol can be used by GPL modules which use the SPL to handle
cases where a call path takes a two different locks by the same
name. This is needed to avoid a false positive in the lock checker.

Linux 2.6.31 kmem cache alignment fixes and cleanup.

The big fix here is the removal of kmalloc() in kv_alloc().  It used
to be true in previous kernels that kmallocs over PAGE_SIZE would
always be pages aligned.  This is no longer true atleast in 2.6.31
there are no longer any alignment expectations.  Since kv_alloc()
requires the resulting address to be page align we no only either
directly allocate pages in the KMC_KMEM case, or directly call
__vmalloc() both of which will always return a page aligned address.
Additionally, to avoid wasting memory size is always a power of two.

As for cleanup several helper functions were introduced to calculate
the aligned sizes of various data structures.  This helps ensure no
case is accidentally missed where the alignment needs to be taken in
to account.  The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP
which is safer since the type will be explict and we no longer count
on the compiler to auto promote types hopefully as we expected.

Always wnforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE)
alignment restrictions at cache creation time.

Use SPL_KMEM_CACHE_ALIGN in splat alignment test.

Remove __GFP_NOFAIL in kmem and retry internally.

As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it
may disappear from the kernel at any time. To handle this I have simply
added *_nofail wrappers in the kmem implementation which perform the
retry for non-atomic allocations.

From linux-2.6.31 mm/page_alloc.c:1166
/*
* __GFP_NOFAIL is not to be used in new code.
*
* All __GFP_NOFAIL callers should be fixed so that they
* properly detect and handle allocation failures.
*
* We most definitely don't want callers attempting to
* allocate greater than order-1 page units with
* __GFP_NOFAIL.
*/
WARN_ON_ONCE(order > 1);

Linux 2.6.31 Compatibility Updates

SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include
linux/fs_struct.h which was dropped from linux/sched.h.

min_wmark_pages, low_wmark_pages, high_wmark_pages macros
introduced in newer kernels. For older kernels mm_compat.h
was introduced to define them as needed as direct mappings
to per zone min_pages, low_pages, max_pages.

Prep for 0.4.6 tag, updated META and ChangeLog.

Autoconf --enable-debug-* cleanup

Cleanup the --enable-debug-* configure options, this has been pending
for quite some time and I am glad I finally got to it.  To summerize:

1) All SPL_AC_DEBUG_* macros were updated to be a more autoconf
friendly.  This mainly involved shift to the GNU approved usage of
AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using
an if [ test ] construct.

2) --enable-debug-kmem=yes by default.  This simply enabled keeping
a running tally of total memory allocated and freed and reporting a
memory leak if there was one at module unload.  Additionally, it
ensure /proc/spl/kmem/slab will exist by default which is handy.
The overhead is low for this and it should not impact performance.

3) --enable-debug-kmem-tracking=no by default.  This option was added
to provide a configure option to enable to detailed memory allocation
tracking.  This support was always there but you had to know where to
turn it on.  By default this support is disabled because it is known
to badly hurt performence, however it is invaluable when chasing a
memory leak.

4) --enable-debug-kstat removed.  After further reflection I can't see
why you would ever really want to turn this support off.  It is now
always on which had the nice side effect of simplifying the proc handling
code in spl-proc.c.  We can now always assume the top level directory
will be there.

5) --enable-debug-callb removed.  This never really did anything, it was
put in provisionally because it might have been needed.  It turns out
it was not so I am just removing it to prevent confusion.

Add autoconf checks for atomic64_cmpxchg + atomic64_xchg

These functions didn't exist for all archs prior to 2.6.24. This
patch addes an autoconf test to detect this and add them when needed.
The autoconf check is needed instead of just an #ifndef because in
the most modern kernels atomic64_{cmp}xchg are implemented as in
inline function and not a #define.

Use Linux atomic primitives by default.

Previously Solaris style atomic primitives were implemented simply by
wrapping the desired operation in a global spinlock.  This was easy to
implement at the time when I wasn't 100% sure I could safely layer the
Solaris atomic primatives on the Linux counterparts.  It however was
likely not good for performance.

After more investigation however it does appear the Solaris primitives
can be layered on Linux's fairly safely.  The Linux atomic_t type really
just wraps a long so we can simply cast the Solaris unsigned value to
either a atomic_t or atomic64_t.  The only lingering problem for both
implementations is that Solaris provides no atomic read function.  This
means reading a 64-bit value on a 32-bit arch can (and will) result in
word breaking.  I was very concerned about this initially, but upon
further reflection it is a limitation of the Solaris API.  So really
we are just being bug-for-bug compatible here.

With this change the default implementation is layered on top of Linux
atomic types.  However, because we're assuming a lot about the internal
implementation of those types I've made it easy to fall-back to the
generic approach.  Simply build with --enable-atomic_spinlocks if
issues are encountered with the new implementation.

I should not have removed these, they are important.

Rebase cmn_err on vcmn_err and don't warn about missing \n

The cmn_err/vcmn_err functions are layered on top of the debug
system which usually expects a newline at the end. However, there
really doesn't need to be a newline there and there in fact should
not be for the CE_CONT case so let's just drop the warning.

Also we make a half-hearted attempt to handle a leading ! which
means only send it to the syslog not the console. In this case
we just send to the the debug logs and not the console.

Remove usage of the __id_u macro for portability.

This macro was removed from the default RPM macro file. Interestly,
some of the arch specific macro's add it back it based on your distro
but it should not be counted on. However, __id still exists and its
command line args have historically been fairly stable so we will
directly use %{__id} -un to get the user name.

Use kobject_set_name() for increased portability.

As of 2.6.25 kobj->k_name was replaced with kobj->name. Some distros
such as RHEL5 (2.6.18) add a patch to prevent this from being a problem
but other older distros such as SLES10 (2.6.16) have not. To avoid
the whole issue I'm updating the code to use kobject_set_name() which
does what I want and has existed all the way back to 2.6.11.

Set cwd to '/' for the process executing insmod.

Ricardo has pointed out that under Solaris the cwd is set to '/'
during module load, while under Linux it is set to the callers cwd.
To handle this cleanly I've reworked the module *_init()/_exit()
macros so they call a *_setup()/_cleanup() function when any SPL
dependent module is loaded or unloaded.  This gives us a chance to
perform any needed modification of the process, in this case changing
the cwd.  It also handily provides a way to avoid creating wrapper
init()/exit() functions because the Solaris and Linux prototypes
differ slightly.  All dependent modules should now call the spl
helper macros spl_module_{init,exit}() instead of the native linux
versions.

Unfortunately, it appears that under Linux there has been no consistent
API in the kernel to set the cwd in a module.  Because of this I have
had to add more autoconf magic than I'd like.  However, what I have
done is correct and has been tested on RHEL5, SLES11, FC11, and CHAOS
kernels.

In addition, I have change the rootdir type from a 'void *' to the
correct 'vnode_t *' type.  And I've set rootdir to a non-NULL value.

Expand SEM() outside init_rwsem and directly call __init_rwsem().

We need to directly call __init_rwsem() or the name gets expanded
to SEM(lock-name). This is safe and correct for the support arches
x86/x86_64/ppc/ppc64.

Reimplement mutexs for Linux lock profiling/analysis

For a generic explanation of why mutexs needed to be reimplemented
to work with the kernel lock profiling see commits:
  e811949a57044d60d12953c5c3b808a79a7d36ef and
  d28db80fd0fd4fd63aec09037c44408e51a222d6

The specific changes made to the mutex implemetation are as follows.
The Linux mutex structure is now directly embedded in the kmutex_t.
This allows a kmutex_t to be directly case to a mutex struct and
passed directly to the Linux primative.

Just like with the rwlocks it is critical that these functions be
implemented as '#defines to ensure the location information is
preserved.  The preprocessor can then do a direct replacement of
the Solaris primative with the linux primative.

Just as with the rwlocks we need to track the lock owner.  Here
things get a little more interesting because depending on your
kernel version, and how you've built your kernel Linux may already
do this for you.  If your running a 2.6.29 or newer kernel on a
SMP system the lock owner will be tracked.  This was added to Linux
to support adaptive mutexs, more on that shortly.  Alternately, your
kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES
in the kernel build.  If neither of the above things is true for
your kernel the kmutex_t type will include and track the lock owner
to ensure correct behavior.  This is all handled by a new autoconf
check called SPL_AC_MUTEX_OWNER.

Concerning adaptive mutexs these are a very recent development and
they did not make it in to either the latest FC11 of SLES11 kernels.
Ideally, I'd love to see this kernel change appear in one of these
distros because it does help performance.  From Linux kernel commit:
  0d66bf6d3514b35eb6897629059443132992dbd7
  "Testing with Ingo's test-mutex application...
  gave a 345% boost for VFS scalability on my testbox"
However, if you don't want to backport this change yourself you
can still simply export the task_curr() symbol.  The kmutex_t
implementation will use this symbol when it's available to
provide it's own adaptive mutexs.

Finally, DEBUG_MUTEX support was removed including the proc handlers.
This was done because now that we are cleanly integrated with the
kernel profiling all this information and much much more is available
in debug kernel builds.  This code was now redundant.

Update mutexs validated on:
    - SLES10   (ppc64)
    - SLES11   (x86_64)
    - CHAOS4.2 (x86_64)
    - RHEL5.3  (x86_64)
    - RHEL6    (x86_64)
    - FC11     (x86_64)

Update rwlocks to track owner to ensure correct semantics

The behavior of RW_*_HELD was updated because it was not quite right.
It is not sufficient to return non-zero when the lock is help, we must
only do this when the current task in the holder.

This means we need to track the lock owner which is not something
tracked in a Linux semaphore.  After some experimentation the
solution I settled on was to embed the Linux semaphore at the start
of a larger krwlock_t structure which includes the owner field.
This maintains good performance and allows us to cleanly intergrate
with the kernel lock analysis tools.  My reasons:

1) By placing the Linux semaphore at the start of krwlock_t we can
then simply cast krwlock_t to a rw_semaphore and pass that on to
the linux kernel.  This allows us to use '#defines so the preprocessor
can do direct replacement of the Solaris primative with the linux
equivilant.  This is important because it then maintains the location
information for each rw_* call point.

2) Additionally, by adding the owner to krwlock_t we can keep this
needed extra information adjacent to the lock itself.  This removes
the need for a fancy lookup to get the owner which is optimal for
performance.  We can also leverage the existing spin lock in the
semaphore to ensure owner is updated correctly.

3) All helper functions which do not need to strictly be implemented
as a define to preserve location information can be done as a static
inline function.

4) Adding the owner to krwlock_t allows us to remove all memory
allocations done during lock initialization.  This is good for all
the obvious reasons, we do give up the ability to specific the lock
name.  The Linux profiling tools will stringify the lock name used
in the code via the preprocessor and use that.

Update rwlocks validated on:
- SLES10   (ppc64)
- SLES11   (x86_64)
- CHAOS4.2 (x86_64)
- RHEL5.3  (x86_64)
- RHEL6    (x86_64)
- FC11     (x86_64)

Reimplement rwlocks for Linux lock profiling/analysis.

It turns out that the previous rwlock implementation worked well but
did not integrate properly with the upstream kernel lock profiling/
analysis tools.  This is a major problem since it would be awfully
nice to be able to use the automatic lock checker and profiler.

The problem is that the upstream lock tools use the pre-processor
to create a lock class for each uniquely named locked.  Since the
rwsem was embedded in a wrapper structure the name was always the
same.  The effect was that we only ended up with one lock class for
the entire SPL which caused the lock dependency checker to flag
nearly everything as a possible deadlock.

The solution was to directly map a krwlock to a Linux rwsem using
a typedef there by eliminating the wrapper structure.  This was not
done initially because the rwsem implementation is specific to the arch.
To fully implement the Solaris krwlock API using only the provided rwsem
API is not possible.  It can only be done by directly accessing some of
the internal data member of the rwsem structure.

For example, the Linux API provides a different function for dropping
a reader vs writer lock.  Whereas the Solaris API uses the same function
and the caller does not pass in what type of lock it is.  This means to
properly drop the lock we need to determine if the lock is currently a
reader or writer lock.  Then we need to call the proper Linux API function.
Unfortunately, there is no provided API for this so we must extracted this
information directly from arch specific lock implementation.  This is
all do able, and what I did, but it does complicate things considerably.

The good news is that in addition to the profiling benefits of this
change.  We may see performance improvements due to slightly reduced
overhead when creating rwlocks and manipulating them.

The only function I was forced to sacrafice was rw_owner() because this
information is simply not stored anywhere in the rwsem.  Luckily this
appears not to be a commonly used function on Solaris, and it is my
understanding it is mainly used for debugging anyway.

In addition to the core rwlock changes, extensive updates were made to
the rwlock regression tests.  Each class of test was extended to provide
more API coverage and to be more rigerous in checking for misbehavior.

This is a pretty significant change and with that in mind I have been
careful to validate it on several platforms before committing.  The full
SPLAT regression test suite was run numberous times on all of the following
platforms.  This includes various kernels ranging from 2.6.16 to 2.6.29.

- SLES10   (ppc64)
- SLES11   (x86_64)
- CHAOS4.2 (x86_64)
- RHEL5.3  (x86_64)
- RHEL6    (x86_64)
- FC11     (x86_64)

Various spec file tweaks to handle rpm building of several distros.
Supported and tested distros now include SLES10, SLES11, Chaos 4.x,
RHEL5, and Fedora 11. This update was mainly to address rebuildable
kernel module rpms, and correct rpm dependencies for each distro.

Explicit check for requires_* rpm defines
Due to different distros and/or versions of rpm mishandling the shorthand
syntax simply use the longer version which get interpreted correctly.

Tag spl-0.4.5.
Update the ChangeLog with a summary of the changes since the last release
and update the META file to reflect the new version number.

Required missing symbols for FC11 kernels (2.6.29.4-167.fc11.x86_64)

Disable stack overflow checking by default.
The run time stack overflow checking is being disabled by default
because it is not safe for use with 2.6.29 and latter kernels. These
kernels do now have their own stack overflow checking so this support
has become redundant anyway. It can be re-enabled for older kernels or
arches without stack overflow checking by redefining CHECK_STACK().

Update global_page_state() support for 2.6.29 kernels.
Basically everything we need to monitor the global memory state of
the system is now cleanly available via global_page_state().  The
problem is that this interface is still fairly recent, and there
has been one change in the page state enum which we need to handle.
These changes basically boil down to the following:
- If global_page_state() is available we should use it.  Several
  autoconf checks have been added to detect the correct enum names.
- If global_page_state() is not available check to see if
  get_zone_counts() symbol is available and use that.
- If the get_zone_counts() symbol is not exported we have no choice
  be to dynamically aquire it at load time.  This is an absolute
  last resort for old kernel which we don't want to patch to
  cleanly export the symbol.

Remove get/put_task_struct as they are not available for SLES11
This interface is going away, and it's not as if most callers actually
use crhold/crfree when working with credentials. So it'll be okay
they we're not taking a reference on the task structure the odds of
it going away while working with a credential and pretty small.

Add basic credential support and splat tests.
The previous credential implementation simply provided the needed types and
a couple of dummy functions needed.  This update correctly ties the basic
Solaris credential API in to one of two Linux kernel APIs.

Prior to 2.6.29 the linux kernel embeded all credentials in the task
structure.  For these kernels, we pass around the entire task struct as if
it were the credential, then we use the helper functions to extract the
credential related bits.

As of 2.6.29 a new credential type was added which we can and do fairly
cleanly layer on top of.  Once again the helper functions nicely hide
the implementation details from all callers.

Three tests were added to the splat test framework to verify basic
correctness.  They should be extended as needed when need credential
functions are added.

Remove LINUXINCLUDE from autoconf wrapper, breaks 2.6.28+ kernels.
Modern kernel build systems at least post 2.6.16 will set this properly
so we should not. In fact post 2.6.28 the include headers have moved
under arch so the guess we make here is completely wrong. Letting
the kernel build system set this ensure it will be correct.

Positive Solaris ioctl return codes need to be negated for use by libc

Allow kmem or vmem based slab for slab_lock and slab_overcommit tests.
The slab_overcommit test case could hang on a system with fragmented
memory because it was creating a kmem based slab with 256K objects.
To avoid this I've removed the KMC_KMEM flag which allows the slab
to decide if it should be kmem or vmem backed based on the object
side. The slab_lock test shares this code and will also be effected.
But the point of these two tests is to stress cache locking and
memory overcommit, the type of slab is not critical. In fact, allowing
the slab to do the default smart thing is preferable.

The HAVE_PATH_IN_NAMEIDATA compat macros should have been used here.

Check arch/default/ path when detecting kernel objects on SLES
We still preferentially use arch/arch looking for a native version
but if that fails it is acceptable to use default.

Register a basic compat ioctl handler (32 vs 64 bit compat)
Simply pass the ioctl on to the normal handler.  If the ioctl
helper macros are used correctly this should be safe as they
will handle the packing/unpacking of the data encoded in the
ioctl command.  And actually, if the caller does not use the
IO* macros at all, and just passes small values, it will probably
be OK as well.  We only get in to trouble if they try and use
the upper 32-bits.  Endianness is not really a concern here, we
we are pretty much assumed they user and kernel will match.

Fixed NULL dereference by tcd_for_each() when the kmalloc() call in module/spl/spl-debug.c:1163 returns NULL.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Prevent integer overflow after ~164 days of uptime.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Add a little paranoia here to ensure endianess is set correctly.

Add basic groupmember() function, not sup groups.

Add ddi_copyin/ddi_copyout support for fake kernel originated ioctls.

Define ACE_ALL_PERMS for use by ACLs

Define FKIOCTL which is used on Solaris to mark an in-kernel ioctl.

Add ASSERTV macro to simplify removing variables (the V in ASSERTV)
which are only used in ASSERT().

Add basic support for TASKQ_THREADS_CPU_PCT taskq flag which is
used to scale the number of threads based on the number of online
CPUs. As CPUs are added/removed we should rescale the thread
count appropriately, but currently this is only done at create.

Update ChangeLog

Cleanly handle --with-linux=NONE option when used to generate source
rpms. These should not be fatal because we actually don't need them
until we build the source rpm. When doing mock builds this is
important because these dependent rpms will only be installed if
they are specificed in the source rpms spec file.

Simplify rpm build rules, added config/rpm.am.
Distro friendly changes such that the kernel modules are packaged seperately.

Add spl.release to spl-devel to simply dependent package version check.

Install spl-devel products in /usr/src/spl-SPL_VERSION/LINUX_VERSION/
Remove the spl symlink, it's just confusing

Use do_div on older kernel where do_div64 doesn't exist.

Additional tuning to get the BuildRequires right for all cases.
pl.spec~

Simplify the kernel depenency logic

Spec file update, for some reason the following shorthand syntax
was failing so it was replaced with the longer %if version.

%{!?foo: %define foo bar}

changed to

%if %{undefined foo}
%define foo bar
%endif

SRPM build farm / mock itergration

Build farm integration to ensure BuildRequires are correct

Packaging Fixes
- Kernel modules should be built using the LINUX_OBJ Makefiles and
not the LINUX Makefiles to ensure the proper install paths are used.
- Install modules in to addon/spl/
- Ensure no additional kernel module build products are packaged.
- Simplified spl.spec.in which supports RHEL, CHAOS, SLES, FEDORA.

Update ChangeLog with a high level summary of the changes from
0.4.3 to 0.4.4 prior to tagging. Full details can be found in
the git commit history.

Packaging improvements for RHEL and SLES (part 2)
- Allow checking for exported symbols in both Module.symvers
  and Module.symvers.  My stock SLES kernel ships an objects
  directory with Module.symvers, yet produces a Module.symvers
  in the local build directory.

Packaging improvements for RHEL and SLES
- Properly honor --prefix in build system and rpm spec file.
- Add '--define require_kdir' to spec file to support building
  rpms against kernel sources installed in non-default locations.
- Add '--define require_kobj' to spec file to support building
  rpms against kernel object installed in non-default locations.
- Stop suppressing errors in autogen.sh script.
- Improved logic to detect missing kernel objects when they are
  not located with the source.  This is the common case for SLES
  as well as in-tree chaos kernel builds and is done to simply
  support for multiple arches.
- Moved spl-devel build products to /usr/src/spl-<version>, a
  spl symlink is created to reference the last installed version.

SLES10 Fixes (part 9)
- Proper ioctl() 32/64-bit binary compatibility.  We need to ensure the
  ioctl data itself is always packed the same for 32/64-bit binaries.
  Additionally, the correct thing to do is encode this size in bytes
  as part of the command using _IOC_SIZE().
- Minor formatting changes to respect the 80 character limit.
- Move all SPLAT_SUBSYSTEM_* defines in to splat-ctl.h.
- Increase SPLAT_SUBSYSTEM_UNKNOWN because we were getting close
  to accidentally using it for a real registered subsystem.

SLES10 Fixes (part 8)
- Add compat_ioctl() handler, by default 64-bit SLES systems build 32-bit
  ELF binaries.  For the 32-bit binaries to pass ioctl information to a
  64-bit kernel a compatibility handler needs to be registered.  In our
  case no additional conversions are needed to convert 32-bit ioctl()
  commands to 64-bit commands so we can just call the default handler.

SLES10 Fixes (part 7)
- Initial SLES testing uncovered a long standing bug in the debug
  tracing.  The tcd_for_each() macro expected a NULL to terminate
  the trace_data[i] array but this was only ever true due to luck.
  All trace_data[] iterators are now properly capped by TCD_TYPE_MAX.
- SPLAT_MAJOR 229 conflicted with a 'hvc' device on my SLES system.
  Since this was always an arbitrary choice I picked something else.
- The HAVE_PGDAT_LIST case should set pgdat_list_addr to the value stored
  at the address of the memory location returned by kallsyms_lookup_name().

SLES10 Fixes (part 6)
- Prior to 2.6.17 there were no *_pgdat helper functions in mm/mmzone.c.
  Instead for_each_zone() operated directly on pgdat_list which may or
  may not have been exported depending on how your kernel was compiled.
  Now new configure checks determine if you have the helpers or not, and
  if the needed symbols are exported.  If they are not exported then they
  are dynamically aquired at runtime by kallsyms_lookup_name().

Powerpc Fixes (part 1):
- Enable builds for powerpc ISA type.
- Add DIV_ROUND_UP and roundup macros if unavailable.
- Cast 64-bit values for %lld format string to (long long) to
quiet compile warning.

SLES10 Fixes (part 5):
- Fix incorrect mapping for spl_device_create()->class_device_create()
which is the prefered API for 2.6.13 to 2.6.17 based kernels.

SLES10 Fixes (part 4):
- Configure check for SLES specific API change to vfs_unlink()
  and vfs_rename() which added a 'struct vfsmount *' argument.
  This was for something called the linux-security-module, but
  it appears that it was never adopted upstream.

SLES10 Fixes (part 3):
- Configure check for mutex_lock_nested().  This function was introduced
  as part of the mutex validator in 2.6.18, but if it's unavailable then
  it's safe to fallback to a plain mutex_lock().

SLES10 Fixes (part 2):
- Configure check, the div64_64() function was renamed to
  div64_u64() as of 2.6.26.
- Configure check, the global_page_state() fuction was introduced
  in 2.6.18 kernels.  The earlier 2.6.16 based SLES10 must not try
  and use it, thankfully get_zone_counts() is still available.
- To simplify debugging poison all symbols aquired dynamically
  using spl_kallsyms_lookup_name() with SYMBOL_POISON.
- Add console messages when the user mode helpers fail.
- spl_kmem_init_globals() use bit shifts instead of division.
- When the monotonic clock is unavailable __gethrtime() must perform
  the HZ division as an 'unsigned long long' because the SPL only
  implements __udivdi3(), and not __divdi3() for 'long long' division
  on 32-bit arches.

SLES10 Fixes (part 1):
- Exclude -obj when detecting installed kernel source.
- Detect -obj directory for out of tree kernel builds.
- Allow kernel build system to set CC to ensure -m64 is set properly.
This is an issue on 64-bit SLES systems which by default always
build 32-bit binaries (unlike RHEL/Fedora which default to 64-bit)

Prep for spl-0.4.3 tag.

Add list_move_tail() function.

Remove useless EOL white space padding from `splat -l` command.

Fix vmem leak in kmem_cache_test (missing splat_kmem_cache_test_kcp_free())

Allow spl_config.h to be included by dependant packages

We need dependent packages to be able to include spl_config.h so they
can leverage the configure checks the SPL has done.  This is important
because several of the spl headers need the results of these checks to
work properly.  Unfortunately, the autoheader build product is always
private to a particular build and defined certain common things.
(PACKAGE, VERSION, etc).  This prevents other packages which also use
autoheader from being include because the definitions conflict.  To
avoid this problem the SPL build system leverage AH_BOTTOM to include
a spl_unconfig.h at the botton of the autoheader build product.  This
custom include undefs all known shared symbols to prevent the confict.
This does however mean that those definition are also not availble
to the SPL package either.  The SPL package therefore uses the
equivilant SPL_META_* definitions.

FC10/i686 Compatibility Update (2.6.27.19-170.2.35.fc10.i686)

In the interests of portability I have added a FC10/i686 box to
my list of development platforms.  The hope is this will allow me
to keep current with upstream kernel API changes, and at the same
time ensure I don't accidentally break x86 support.  This patch
resolves all remaining issues observed under that environment.

1) SPL_AC_ZONE_STAT_ITEM_FIA autoconf check added.  As of 2.6.21
the kernel added a clean API for modules to get the global count
for free, inactive, and active pages.  The SPL attempts to detect
if this API is available and directly map spl_global_page_state()
to global_page_state().  If the full API is not available then
spl_global_page_state() is implemented as a thin layer to get
these values via get_zone_counts() if that symbol is available.

2) New kmem:vmem_size regression test added to validate correct
vmem_size() functionality.  The test case acquires the current
global vmem state, allocates from the vmem region, then verifies
the allocation is correctly reflected in the vmem_size() stats.

3) Change splat_kmem_cache_thread_test() to always use KMC_KMEM
based memory.  On x86 systems with limited virtual address space
failures resulted due to exhaustig the address space.  The tests
really need to problem exhausting all memory on the system thus
we need to use the physical address space.

4) Change kmem:slab_lock to cap it's memory usage at availrmem
instead of using the native linux nr_free_pages().  This provides
additional test coverage of the SPL Linux VM integration.

5) Change kmem:slab_overcommit to perform allocation of 256K
instead of 1M.  On x86 based systems it is not possible to create
a kmem backed slab with entires of that size.  To compensate for
this the number of allocations performed in increased by 4x.

6) Additional autoconf documentation for proposed upstream API
changes to make additional symbols available to modules.

7) Console error messages added when spl_kallsyms_lookup_name()
fails to locate an expected symbol.  This causes the module to fail
to load and we need to know exactly which symbol was not available.

Fix taskq_wait() not waiting bug

I'm very surprised this has not surfaced until now.  But the taskq_wait()
implementation work only wait successfully the first time it was called.
Subsequent usage of taskq_wait() on the taskq would not wait.

The issue was caused by tq->tq_lowest_id being set to MAX_INT after the
first wait completed.  This caused subsequent waits which check that the
waiting id is less than the lowest taskq id to always succeed.  The fix
is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next.id.

Additional fixes which were added to this patch include:
1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock.
2) taskq_wait() should wait for the largest outstanding id.
3) Multiple spelling corrections.
4) Added taskq wait regression test to validate correct behavior.

Mutex tests updated to use task queues instead of work queues.

Mainly for portability reasons I have rebased the mutex tests on Solaris
taskqs instead of linux work queues. The linux workqueue API changed post
2.6.18 kernels and using task queues avoids having to conditionally detect
which workqueue API to use.

Additionally, this is basically free additional testing for the task queues.
Much to my surprise after updating these test cases they did expose a long
standing bug in the taskq_wait() implementation. This patch does not
address that issue but the followup patch does.

Added SPL_AC_5ARGS_DEVICE_CREATE autoconf configure check

As of 2.6.27 kernels the device_create() API changed to include
a private data argument. This check detects which version of
device_create() function the kernel has and properly defines
spl_device_create() to use the correct prototype.

Fix off-by-1 truncation of hw_serial when converting from integer to string, when writing to /proc/sys/kernel/spl/spl_hostid.
Fixes hostid mismatch which leads to assertion failure when the hostid/hw_serial is a 10-character decimal number:

$ zpool status
pool: lustre
state: ONLINE
lt-zpool: zpool_main.c:3176: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed.
zsh: 5262 abort zpool status

Minor bug fix in XDR code introduced in last minute change before landing.

1) Removed xdr_bytesrec typedef which has no consumers. If we re-add
it should also probably be xdr_bytesrec_t.

Add XDR implementation

Added proper XDR implementation (Lustre bug 17662), needed for on-disk
compatibility between platforms of different endianness.

Build system cleanup

1) Undefine non-unique entries in spl_config.h
2) Minor Makefile cleanup
3) Don't use includedir for proper kernel header install

Build System Default Kernel

Update the method used for determining which kernel to build against
when not specified. Previous 'uname -r' was used but this makes the
assumption that the running kernel is the one you want to use, this is
often not the case. It is better to examine the usual kernel-devel
install locations and select one of the installed kernels.

Build system and packaging (RPM support)

An update to the build system to properly support all commonly
used Makefile targets these include:

  make all        # Build everything
  make install    # Install everything
  make clean   # Clean up build products
  make distclean  # Clean up everything
  make dist       # Create package tarball
  make srpm       # Create package source RPM
  make rpm        # Create package binary RPMs
  make tags       # Create ctags and etags for everything

Extra care was taken to ensure that the source RPMs are fully
rebuildable against Fedora/RHEL/Chaos kernels.  To build binary
RPMs from the source RPM for your system simply run:

  rpmbuild --rebuild spl-x.y.z-1.src.rpm

This will produce two binary RPMs with correct 'requires'
dependencies for your kernel.  One will contain all spl modules
and support utilities, the other is a devel package for compiling
additional kernel modules which are dependant on the spl.

  spl-x.y.z-1_<kernel version>.x86_64.rpm
  spl-devel-x.y.2-1_<kernel version>.x86_64.rpm

XXX: Temporarily disable vmem_size().