Brian Behlendorf [Sat, 31 Jan 2009 04:54:49 +0000 (20:54 -0800)]
kmem_cache hardening and performance improvements
- Added slab work queue task which gradually ages and frees slabs
from the cache which have not been used recently.
- Optimized slab packing algorithm to ensure each slab contains the
maximum number of objects without creating too large a slab.
- Fix deadlock: we can never call kv_free() under the skc_lock. We
now unlink the objects and slabs from the cache itself and attach
them to a private work list. The contents of the list are then
subsequently freed outside the spin lock.
- Move magazine create/destroy operations onto the local CPU.
- Further performance optimizations by minimizing the usage of the large
per-cache skc_lock. This includes the addition of the KMC_BIT_REAPING
bit mask which is used to prevent concurrent reaping, and to defer
new slab creation when reaping is occurring.
- Add KMC_BIT_DESTROYING bit mask which is set when the cache is being
destroyed. This is used to catch any task accessing the cache while
it is being destroyed.
- Add comments to all the functions and additional comments to try
and make everything as clear as possible.
- Major cleanup and additions to the SPLAT kmem tests to more
rigorously stress the cache implementation and look for any problems.
This includes correctness and performance tests.
- Updated portable work queue interfaces
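
The deadlock fix above (unlink under the lock, free outside it) can be sketched in userspace roughly as follows. This is a minimal sketch, not the SPL code: every `sketch_*` name is hypothetical, the lock is modeled as a plain flag so the free routine can assert it is not held, and unlike the real reaper it releases every slab rather than only those not used recently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the cache and its slabs; not the SPL types. */
typedef struct sketch_slab {
    struct sketch_slab *next;
} sketch_slab_t;

typedef struct sketch_cache {
    int lock_held;          /* models the per-cache lock being held */
    sketch_slab_t *slabs;   /* slabs currently owned by the cache */
} sketch_cache_t;

static void sketch_lock(sketch_cache_t *c)
{
    assert(!c->lock_held);
    c->lock_held = 1;
}

static void sketch_unlock(sketch_cache_t *c)
{
    assert(c->lock_held);
    c->lock_held = 0;
}

/* Models a free routine which must never run under the cache lock. */
static void sketch_free(sketch_cache_t *c, sketch_slab_t *slab)
{
    assert(!c->lock_held);
    free(slab);
}

/* Link one new slab into the cache; returns 0 on success, -1 on failure. */
static int sketch_cache_grow(sketch_cache_t *c)
{
    sketch_slab_t *slab = malloc(sizeof(*slab));

    if (slab == NULL)
        return -1;

    sketch_lock(c);
    slab->next = c->slabs;
    c->slabs = slab;
    sketch_unlock(c);
    return 0;
}

/* Unlink all slabs under the lock, then free them after dropping it. */
static int sketch_cache_reap(sketch_cache_t *c)
{
    sketch_slab_t *private_list, *slab;
    int freed = 0;

    sketch_lock(c);
    private_list = c->slabs;    /* detach onto a private work list */
    c->slabs = NULL;
    sketch_unlock(c);

    while ((slab = private_list) != NULL) {
        private_list = slab->next;
        sketch_free(c, slab);   /* lock is no longer held here */
        freed++;
    }
    return freed;
}
```

The point of the private list is that the expensive, possibly blocking free happens only after the cache no longer references the slabs.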
Brian Behlendorf [Tue, 20 Jan 2009 19:59:47 +0000 (11:59 -0800)]
Ensure -NDEBUG does not get added to spl_config.h and is only set in the build options. This allows other kernel modules to use spl_config to leverage the rest of the config checks without getting confused with the debug options.
Brian Behlendorf [Tue, 13 Jan 2009 19:43:05 +0000 (11:43 -0800)]
TASKQ_DYNAMIC is not yet supported, do not create the global taskq with that flag or we crash with debug enabled. Also don't bother dumping debug when debugging is disabled, it's pointless.
Brian Behlendorf [Tue, 13 Jan 2009 17:30:59 +0000 (09:30 -0800)]
Rework ddi_strtox calls to a native implementation which actually supports the EINVAL, ERANGE error handling, plus add a regression suite to ensure I got it at least mostly right.
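
For illustration, the EINVAL/ERANGE contract can be sketched in userspace on top of the standard strtoul(). The helper name `sketch_strtoul` and its signature are hypothetical and are not the SPL implementation; the sketch only shows the two error cases being reported as return codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the SPL ddi_strtoul(): returns 0 on success,
 * EINVAL when no digits could be parsed, and ERANGE on overflow.
 * *result is only written on success.
 */
static int sketch_strtoul(const char *str, int base, unsigned long *result)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(str, &end, base);

    if (end == str)         /* nothing was consumed: not a number */
        return EINVAL;
    if (errno == ERANGE)    /* value does not fit in unsigned long */
        return ERANGE;

    *result = value;
    return 0;
}
```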
Brian Behlendorf [Wed, 26 Nov 2008 21:09:37 +0000 (13:09 -0800)]
Prefix all META_* #defines with SPL to prevent collisions in packages which include our spl_config.h. Dependent packages may do this to leverage the autoconf checks we have already run against the kernel.
behlendo [Thu, 13 Nov 2008 21:43:30 +0000 (21:43 +0000)]
* include/sys/sunddi.h, modules/spl/spl-module.c : Removed default
udev support from sunddi implementation because it uses GPL-only
symbols. This support is optionally available for SPL consumers
if they define HAVE_GPL_ONLY_SYMBOLS and license their module as
GPL using the MODULE_LICENSE("GPL") macro.
behlendo [Wed, 5 Nov 2008 21:43:37 +0000 (21:43 +0000)]
Add proper error handling for the case where a thread can not be
created. Instead of asserting we simply abort the test, wait for
any tasks we created to finish, and return -ESRCH back to the user
space component.
behlendo [Tue, 4 Nov 2008 23:38:29 +0000 (23:38 +0000)]
Fix a small corner case in the test infrastructure where
we might end up with a non-NULL terminated buffer if the
test name or desc is too long. Only copy N-1 bytes.
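
A minimal sketch of the N-1 copy described above, assuming a fixed-size destination buffer. `sketch_copy_name` and `SKETCH_NAME_SIZE` are hypothetical names, not the test infrastructure's own.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NAME_SIZE 8  /* hypothetical buffer size */

/* Copy at most size-1 bytes and always NUL-terminate the destination. */
static void sketch_copy_name(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;

    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';   /* strncpy() does not terminate on truncation */
}
```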
behlendo [Tue, 4 Nov 2008 23:30:15 +0000 (23:30 +0000)]
3 minor fixups where sprintf() was used instead of snprintf() with
a known max length. Additionally the function return value is cast
to void to make it explicit that the value is not needed.
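
The pattern looks roughly like this. The function name and format string are hypothetical, chosen only to show a bounded snprintf() call with its return value explicitly discarded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a hypothetical "<name>/<id>" label into a fixed-size buffer. */
static void sketch_format_label(char *buf, size_t size, const char *name, int id)
{
    /*
     * snprintf() writes at most size bytes including the terminator.
     * The cast documents that truncation is acceptable here.
     */
    (void) snprintf(buf, size, "%s/%d", name, id);
}
```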
behlendo [Mon, 3 Nov 2008 22:02:15 +0000 (22:02 +0000)]
* spl-09-fix-kmem-track-oops.patch
This fixes an oops when unloading the modules, in the case where memory
tracking was enabled and there were memory leaks. The comment in the
code explains what the problem was.
* spl-10-fix-assert-verify-ndebug.patch
This fixes ASSERT*() and VERIFY*() macros in non-debug builds. VERIFY*()
macros are supposed to check the condition and panic even in production
builds, and ASSERT*() macros don't need to evaluate the arguments.
Also some 32-bit fixes.
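
The intended macro semantics can be sketched as below. This is an illustration under assumptions, not the patch itself: the `SKETCH_*` names are hypothetical, `SKETCH_NDEBUG` stands in for a non-debug build, and abort() stands in for a kernel panic.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch standing in for a non-debug build. */
#define SKETCH_NDEBUG 1

/* Always evaluates the condition and aborts on failure, even in production. */
#define SKETCH_VERIFY(cond)                                         \
    do {                                                            \
        if (!(cond)) {                                              \
            fprintf(stderr, "VERIFY(%s) failed at %s:%d\n",         \
                #cond, __FILE__, __LINE__);                         \
            abort();                                                \
        }                                                           \
    } while (0)

#if SKETCH_NDEBUG
/* Non-debug build: the argument is discarded and never evaluated. */
#define SKETCH_ASSERT(cond) ((void)0)
#else
#define SKETCH_ASSERT(cond) SKETCH_VERIFY(cond)
#endif

/* Returns how many times the condition expression was actually evaluated. */
static int sketch_count_evaluations(void)
{
    int evaluated = 0;

    SKETCH_ASSERT(++evaluated > 0);     /* compiled out: no side effect */
    SKETCH_VERIFY(++evaluated > 0);     /* always evaluated */
    return evaluated;
}
```

With `SKETCH_NDEBUG` set, only the VERIFY expression runs, which is why conditions with side effects belong in VERIFY and never in ASSERT.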
behlendo [Mon, 3 Nov 2008 21:51:33 +0000 (21:51 +0000)]
Under Solaris KM_SLEEP ensures success (or at least you hang forever).
That said, when working with a finite resource like memory, failure really
is always a possibility. It would be far better longer term if the ZFS
code could be weaned off this assumption and properly handle the cases
where an allocation fails. Still I've applied the patch to spl-0.3.4
since this layer is supposed to emulate Solaris as closely as possible.
behlendo [Mon, 3 Nov 2008 21:06:04 +0000 (21:06 +0000)]
Add a SPL_AC_TYPE_ATOMIC64_T test to configure for systems which
already support atomic64_t types.
* spl-07-kmem-cleanup.patch
This moves all the debugging code from sys/kmem.h to spl-kmem.c, because
the huge macros were hard to debug and were bloating functions that
allocated memory. I also fixed some other minor problems, including
32-bit fixes and a reported memory leak which was just due to using the
wrong free function.
behlendo [Mon, 3 Nov 2008 20:21:08 +0000 (20:21 +0000)]
Apply a nice fix caught by Ricardo,
* spl-04-fix-taskq-spinlock-lockup.patch
Fixes a deadlock in the BIO completion handler, due to the taskq code
prematurely re-enabling interrupts when another spinlock had disabled
them in the IDE IRQ handler.
behlendo [Mon, 3 Nov 2008 20:07:20 +0000 (20:07 +0000)]
Reviewed and applied spl-01-rm-gpl-symbol-set_cpus_allowed.patch
from Ricardo which removes a dependency on the GPL-only symbol
set_cpus_allowed(). Using this symbol is simpler but in the name
of portability we are adopting a spinlock based solution here
to remove this dependency.
behlendo [Mon, 3 Nov 2008 19:53:23 +0000 (19:53 +0000)]
Reviewed and applied spl-00-rm-gpl-symbol-notifier_chain.patch
from Ricardo which removes a dependency on the GPL-only symbol
needed for a panic time notifier. This functionality was never
used and this improves our portability.
behlendo [Sun, 10 Aug 2008 03:50:36 +0000 (03:50 +0000)]
Add class / device portability code. Two autoconf tests
were added to cover the 3 possible APIs from 2.6.9 to
2.6.26. We attempt to use the newest interfaces and if
not available fall back to the oldest. This is a rework of
some changes proposed by Ricardo for RHEL4.
behlendo [Tue, 5 Aug 2008 04:16:09 +0000 (04:16 +0000)]
Start bringing in Ricardo's spl-00-rhel4-compat.patch, a few chunks
at a time as I audit it. This chunk finishes moving the SPL entirely
off the linux slab on to the SPL implementation. It differs slightly
from the proposed version in that the spl continues to export
all the Solaris types and functions. These do conflict with the
Linux slab so a module using these interfaces must not include the
SPL slab if it also intends to use the linux slab. Or it must
explicitly #undef the macros which remap the functions to their
spl_* equivalents.
A nice side effect of dropping the entire linux slab is we
don't need the autoconf checks anymore. They kept messing with
the slab API endlessly!
- Remove hash functionality from slab in favor of direct lookups
based on the spl_kmem_obj_t tacked on the end of each object.
This actually isn't so bad because we are now allocating large
chunks for the slab and partitioning it ourselves. So there's
not a ton of wasted space. We may suffer a performance hit
however due to alignment issues.
- Remove remaining dependencies on the linux slab implementation.
We're standing on our own now for better or worse.
- Rework slabs to be either kmem or vmem based. If neither
KMC_VMEM nor KMC_KMEM are specified we make a decent guess
about what will work best for them based on the object
size. Additionally we provide a kmem_virt() function callers
can use to see if they have a virtual or physical address.
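
The direct lookup that replaces the hash can be sketched as a small trailer placed after each object. This is a userspace illustration only: the real spl_kmem_obj_t layout is not shown in this log, so the `sketch_*` types and helpers below are hypothetical, and malloc() stands in for carving objects out of a slab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the SPL structures. */
typedef struct sketch_slab {
    int id;
} sketch_slab_t;

typedef struct sketch_obj_trailer {
    sketch_slab_t *slab;    /* owning slab, found without a hash lookup */
} sketch_obj_trailer_t;

/* Round the object size up so the trailer behind it is properly aligned. */
static size_t sketch_obj_stride(size_t obj_size)
{
    size_t align = _Alignof(sketch_obj_trailer_t);

    return (obj_size + align - 1) / align * align;
}

/* The trailer lives at a fixed offset past the start of the object. */
static sketch_obj_trailer_t *sketch_obj_trailer(void *obj, size_t obj_size)
{
    return (sketch_obj_trailer_t *)
        ((unsigned char *)obj + sketch_obj_stride(obj_size));
}

/* Allocate one object plus trailer and record the owning slab. */
static void *sketch_obj_alloc(sketch_slab_t *slab, size_t obj_size)
{
    void *obj = malloc(sketch_obj_stride(obj_size) +
        sizeof(sketch_obj_trailer_t));

    if (obj != NULL)
        sketch_obj_trailer(obj, obj_size)->slab = slab;
    return obj;
}

/* Constant-time lookup from an object pointer back to its slab. */
static sketch_slab_t *sketch_obj_to_slab(void *obj, size_t obj_size)
{
    return sketch_obj_trailer(obj, obj_size)->slab;
}
```

The rounding in `sketch_obj_stride()` is the wasted space and alignment cost the entry above alludes to.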
behlendo [Sat, 28 Jun 2008 20:03:11 +0000 (20:03 +0000)]
Remove stray call to spl_cache_free() and remove all the
cycle counters, which were costing me overhead. It was hurting
performance pretty badly for heavily used caches. I'm also
thinking the hash may be hurting me as well and it might
be worth sticking a pointer in to a little space after the
alloced object.
behlendo [Sat, 28 Jun 2008 05:04:46 +0000 (05:04 +0000)]
Victory! I've reworked caches with large objects which are
backed by vmalloc()'ed memory. I now alloc a slab which is
roughly 32*spl_obj_size and in this block of memory I place
the slab descriptor, slab object descriptors, and objects
themselves. This greatly reduces vmalloc lock contention.
Still some minor cleanup remains and fine tuning but
it's working pretty well.
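
The single-allocation layout can be sketched as follows. This is a simplified illustration, not the SPL code: all `sketch_*` names are hypothetical, malloc() stands in for vmalloc(), the per-object descriptors are omitted for brevity, and the exact ordering inside the block is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_OBJS_PER_SLAB 32     /* mirrors the "roughly 32 objects" above */

/* Hypothetical slab descriptor stored at the front of the block. */
typedef struct sketch_slab {
    size_t stride;          /* aligned size of one object */
    size_t obj_count;       /* objects carved out of this block */
    unsigned char *objs;    /* first object inside the same block */
} sketch_slab_t;

static size_t sketch_round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* One allocation holds the descriptor followed by all of the objects. */
static sketch_slab_t *sketch_slab_create(size_t obj_size)
{
    size_t align = _Alignof(max_align_t);
    size_t hdr = sketch_round_up(sizeof(sketch_slab_t), align);
    size_t stride = sketch_round_up(obj_size, align);
    sketch_slab_t *slab = malloc(hdr + SKETCH_OBJS_PER_SLAB * stride);

    if (slab == NULL)
        return NULL;

    slab->stride = stride;
    slab->obj_count = SKETCH_OBJS_PER_SLAB;
    slab->objs = (unsigned char *)slab + hdr;
    return slab;
}

/* Return the i-th object in the slab, or NULL when out of range. */
static void *sketch_slab_obj(sketch_slab_t *slab, size_t i)
{
    if (i >= slab->obj_count)
        return NULL;
    return slab->objs + i * slab->stride;
}
```

One backing allocation per 32 objects, rather than one per object, is what cuts down the trips through the contended allocator lock.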
behlendo [Fri, 27 Jun 2008 21:40:11 +0000 (21:40 +0000)]
Further slab improvements, I'm getting close to something which works
well for the expected workloads. Improvements in this commit include:
- Added DEBUG_KMEM_TRACKING #define which can optionally be set
when DEBUG_KMEM is defined to do per allocation tracking. This
allows us to enable the lightweight kmem debugging by default,
and only briefly enable the per alloc tracking when looking
for a memory leak.
- Added set_normalized_timespec() in to SPL to simplify using
the timespec primitives from within a module.
- Added per-spinlock cycle counters to the slab in an attempt
to run down a lock contention issue. The contended lock
was in vmalloc() but I'm going to leave the cycle counters
in place for a little while until I'm convinced there aren't
other locking improvements possible in the slab.
- Added a proc interface to the slab to export per slab
cache statistics to /proc/spl/kmem/slab for analysis.
- Reworked spl_slab_alloc() function to allocate from kmem for
small allocations and vmem for large allocations. This improved
things considerably but further work is needed.
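
As an illustration of what a helper like set_normalized_timespec() does, here is a userspace sketch of the normalization step. It assumes the helper folds an out-of-range nanosecond count into the seconds field; the `sketch_` names are hypothetical and this is not the SPL or kernel source.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Store sec/nsec into ts with tv_nsec folded into the range [0, 1e9). */
static void sketch_set_normalized_timespec(struct timespec *ts,
    time_t sec, long nsec)
{
    while (nsec >= SKETCH_NSEC_PER_SEC) {   /* carry whole seconds up */
        nsec -= SKETCH_NSEC_PER_SEC;
        ++sec;
    }
    while (nsec < 0) {                      /* borrow from the seconds */
        nsec += SKETCH_NSEC_PER_SEC;
        --sec;
    }
    ts->tv_sec = sec;
    ts->tv_nsec = nsec;
}
```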
behlendo [Thu, 26 Jun 2008 19:49:42 +0000 (19:49 +0000)]
Fix for memory corruption caused by overrunning the magazine
when repopulating it. Plus I fixed a few more subtle races in
that part of the code which were catching me. Finally I fixed
a small race in kmem_test8.
behlendo [Wed, 25 Jun 2008 20:57:45 +0000 (20:57 +0000)]
Implement per-cpu local caches. This seems to have bought me another
factor of 10x improvement on SMP systems due to reduced lock contention.
This may put me in the ballpark of what is needed. We can still further
improve things on NUMA systems by creating an additional L3 cache per
memory node instead of the current global pool. With luck this won't
be needed. I should also take another look at the locking now that
everything is working. There's a good chance I can tighten it up a
little bit and improve things a little more.
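
The per-cpu idea can be sketched in userspace as one small magazine per CPU in front of a shared pool. Everything here is hypothetical (names, sizes, the single-threaded model with the CPU passed in explicitly); real code would take the cache lock around the shared pool and pin the caller to its CPU. The refill loop is bounded by the magazine capacity so it cannot overrun the array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NCPUS     4      /* hypothetical CPU count */
#define SKETCH_MAG_SIZE  4      /* hypothetical magazine capacity */
#define SKETCH_POOL_SIZE 16     /* hypothetical shared pool capacity */

typedef struct sketch_magazine {
    int avail;
    void *objs[SKETCH_MAG_SIZE];
} sketch_magazine_t;

typedef struct sketch_cache {
    sketch_magazine_t mag[SKETCH_NCPUS];    /* one local cache per CPU */
    int pool_avail;
    void *pool[SKETCH_POOL_SIZE];           /* shared pool; locked in real code */
    int pool_refills;                       /* trips made to the shared pool */
} sketch_cache_t;

/* Fill the shared pool with objects carved from a caller-supplied buffer. */
static void sketch_cache_init(sketch_cache_t *c, char *backing, size_t obj_size)
{
    int i;

    for (i = 0; i < SKETCH_NCPUS; i++)
        c->mag[i].avail = 0;
    for (i = 0; i < SKETCH_POOL_SIZE; i++)
        c->pool[i] = backing + (size_t)i * obj_size;
    c->pool_avail = SKETCH_POOL_SIZE;
    c->pool_refills = 0;
}

/* Slow path: move objects from the pool, never past the magazine capacity. */
static void sketch_magazine_refill(sketch_cache_t *c, sketch_magazine_t *m)
{
    c->pool_refills++;
    while (m->avail < SKETCH_MAG_SIZE && c->pool_avail > 0)
        m->objs[m->avail++] = c->pool[--c->pool_avail];
}

/* Fast path: take from the local magazine, touching no shared state. */
static void *sketch_cache_alloc(sketch_cache_t *c, int cpu)
{
    sketch_magazine_t *m = &c->mag[cpu];

    if (m->avail == 0)
        sketch_magazine_refill(c, m);
    if (m->avail == 0)
        return NULL;                        /* pool exhausted */
    return m->objs[--m->avail];
}

/* Return an object to the local magazine, or to the pool when it is full. */
static void sketch_cache_free(sketch_cache_t *c, int cpu, void *obj)
{
    sketch_magazine_t *m = &c->mag[cpu];

    if (m->avail < SKETCH_MAG_SIZE)
        m->objs[m->avail++] = obj;
    else if (c->pool_avail < SKETCH_POOL_SIZE)
        c->pool[c->pool_avail++] = obj;
}
```

Most allocations and frees are served from the local magazine, so the shared pool is visited only once per batch of objects.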