Brian Behlendorf [Wed, 25 Feb 2009 21:20:40 +0000 (13:20 -0800)]
Linux VM Integration Cleanup
Remove all instances of functions being reimplemented in the SPL.
When the prototypes are available in the linux headers but the
function address itself is not exported use kallsyms_lookup_name()
to find the address. The function name itself can then become a
define which calls a function pointer. This is preferable to
reimplementing the function in the SPL because it ensures we get
the correct version of the function for the running kernel. This
is actually pretty safe because the prototype is defined in the
headers so we know we are calling the function properly.
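A minimal user-space sketch of the define-plus-function-pointer idea
described above. Everything here is hypothetical and for illustration
only: lookup_symbol() stands in for kallsyms_lookup_name(), hidden_add()
stands in for an unexported kernel function, and compat_add() is the
define callers would use. It is not the SPL code itself.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used to carry an address around. */
typedef void (*generic_fn_t)(void);

/* Hypothetical stand-in for a function whose prototype is public
 * but whose address is not exported. */
static int hidden_add(int a, int b)
{
	return a + b;
}

/* Hypothetical lookup by name; in the kernel this role is played
 * by kallsyms_lookup_name(), which returns an address. */
static generic_fn_t lookup_symbol(const char *name)
{
	if (strcmp(name, "hidden_add") == 0)
		return (generic_fn_t)hidden_add;
	return NULL;
}

/* Pointer with the exact prototype taken from the header. */
typedef int (*hidden_add_t)(int, int);
static hidden_add_t hidden_add_fn;

/* Callers use the define, which calls through the pointer. */
#define compat_add(a, b)	hidden_add_fn((a), (b))

/* Resolve the address once at init time; fail if it is missing. */
static int compat_init(void)
{
	hidden_add_fn = (hidden_add_t)lookup_symbol("hidden_add");
	return hidden_add_fn ? 0 : -1;
}
```

Because the pointer type is built from the known prototype, the
compiler still type checks every call made through the define.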
This patch also includes a rhel5 kernel patch which exports the needed
symbols so we don't need to use kallsyms_lookup_name(). There are
autoconf checks to detect if the symbol is exported and if so to
use it directly. We should add patches for stock upstream kernels
as needed if for no other reason than so we can easily track which
additional symbols we needed exported. Those patches can also be
used by anyone willing to rebuild their kernel, but this should
not be a requirement. The rhel5 version of the export-symbols
patch has been applied to the chaos kernel.
Additional fixes:
1) Implement vmem_size() function using get_vmalloc_info()
2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead
of $LINUX because Module.symvers is a build product. When
$LINUX_OBJ != $LINUX we will not properly detect exported symbols.
3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and
$LINUX/include search paths to allow proper compilation when
the kernel target build directory is not the source directory.
Brian Behlendorf [Thu, 19 Feb 2009 19:26:17 +0000 (11:26 -0800)]
Add zone_get_hostid() function
Minimal support added for the zone_get_hostid() function. Only
global zones are supported therefore this function must be called
with a NULL argument. Additionally, I've added the HW_HOSTID_LEN
define and updated all instances where a hard coded magic value
of 11 was used; "A good riddance of bad rubbish!"
Brian Behlendorf [Wed, 18 Feb 2009 18:16:26 +0000 (10:16 -0800)]
Coverity 9657: Resource Leak
Accidentally leaked list item li in error path. The fix is to
adjust this error path to ensure the allocated list item which
has not yet been added to the list gets freed. To do this we
simply add a new goto label slightly earlier to use the existing
cleanup logic and minimize the number of unique return points.
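A small user-space sketch of the kind of error path described above,
assuming a simple singly linked list. The names (struct item,
list_add_checked(), live_items) are made up for the example and are
not the code Coverity flagged.

```c
#include <stdlib.h>

struct item {
	int value;
	struct item *next;
};

/* Number of allocated items not yet freed, for illustration only. */
static int live_items;

/* Allocate an item and add it to the list. If validation fails
 * after the allocation, jump to a label which frees the item that
 * was never added to the list. */
static int list_add_checked(struct item **head, int value)
{
	struct item *li;
	int rc;

	li = malloc(sizeof(*li));
	if (li == NULL)
		return -1;
	live_items++;

	if (value < 0) {		/* hypothetical validation failure */
		rc = -2;
		goto out_free;
	}

	li->value = value;
	li->next = *head;
	*head = li;
	return 0;

out_free:
	free(li);
	live_items--;
	return rc;
}
```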
Brian Behlendorf [Wed, 18 Feb 2009 18:09:01 +0000 (10:09 -0800)]
Coverity 9656: Forward NULL
This was a false positive; the callpath being walked is impossible
because the splat_kmem_cache_test_kcp_alloc() function will ensure
kcp->kcp_kcd[0] is initialized to NULL. However, there is no harm
in making this explicit for the test case so I'm adding a line to
clearly set it to correct the analysis.
Brian Behlendorf [Wed, 18 Feb 2009 17:48:07 +0000 (09:48 -0800)]
Coverity 9649, 9650, 9651: Uninit
This check was originally added to detect double initializations
of mutex types (which it did find). Unfortunately, Coverity is
right that there is a very small chance we could trigger the
assertion by accident because an uninitialized stack variable
happens to contain the mutex magic. This is particularly unlikely
since we do poison the mutexes when destroyed but still possible.
Therefore I'm simply removing the assertion.
Brian Behlendorf [Wed, 18 Feb 2009 00:41:08 +0000 (16:41 -0800)]
Coverity 9654, 9654: Use After Free
Because vmem_free() was implemented as a macro using the ','
operator to evaluate both arguments and we performed the free
before evaluating size we would dereference the freed pointer.
To resolve the problem we just invert the ordering and evaluate
size first just as if it was evaluated by the caller when being
passed to this function. This ensures that if the caller is
doing something reckless like performing an assignment as
part of the size argument we still perform it and it simply
doesn't get removed by the macro. Of course nobody should
be doing this sort of thing, but just in case.
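A user-space sketch of the ordering issue, assuming a macro built on
free() rather than the real vmem_free(). demo_free() and struct buf
are hypothetical names. The comma operator evaluates its left operand
first, so putting size on the left means any expression in the size
argument runs before the pointer is released.

```c
#include <stddef.h>
#include <stdlib.h>

struct buf {
	size_t len;
	char data[16];
};

/* Problematic ordering, shown only as a comment: the pointer is
 * freed first, so a size expression reading ptr->len would touch
 * freed memory.
 *
 *   #define demo_free(ptr, size)  (free(ptr), (void)(size))
 */

/* Fixed ordering: evaluate size first, then free the pointer. */
#define demo_free(ptr, size)	((void)(size), free(ptr))
```

With the fixed ordering a call such as demo_free(b, n = b->len) reads
b->len while b is still valid and then frees b.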
Brian Behlendorf [Wed, 18 Feb 2009 00:24:26 +0000 (16:24 -0800)]
Coverity 9641: Buffer Size
When SPLAT_TEST_INIT() initialized SPLAT_KMEM_TEST11_NAME the
short test name overran the static length buffer of SPLAT_NAME_SIZE.
This was fixed by increasing the buffer length from 16 to 20 bytes.
Brian Behlendorf [Tue, 17 Feb 2009 23:52:18 +0000 (15:52 -0800)]
kmem slab magazine ageing deadlock
- The previous magazine ageing scheme relied on the on_each_cpu()
function to call spl_magazine_age() on each cpu. It turns out
this could deadlock with do_flush_tlb_all() which also relies
on the IPI based on_each_cpu(). To avoid this problem a
per-magazine delayed work item is created and independently
scheduled to the correct cpu removing the need for on_each_cpu().
- Additionally two unused fields were removed from the type
spl_kmem_cache_t, they were holdovers from previous cleanup.
- struct work_struct work
- struct timer_list timer
Brian Behlendorf [Fri, 13 Feb 2009 18:28:55 +0000 (10:28 -0800)]
kmem slab fixes
- spl_slab_reclaim() 'continue' changed back to 'break' from commit
37db7d8cf9936e6d2851a4329c11efcd9f61305c. The original was correct,
I have added a comment to ensure this does not happen again.
- spl_slab_reclaim() further optimized by moving the destructor call
in spl_slab_free() outside the skc->skc_lock. This minimizes the
length of time the spin lock is held, allows the destructors to
be invoked concurrently for different objects, and as a bonus makes
it safe (although unwise) to sleep in the destructors.
Brian Behlendorf [Thu, 12 Feb 2009 21:32:10 +0000 (13:32 -0800)]
kmem slab fixes
- Default SPL_KMEM_CACHE_DELAY changed to 15 to match Solaris.
- Aged out slab checking occurs every SPL_KMEM_CACHE_DELAY / 3.
- skc->skc_reap tunable added which allows callers of
spl_slab_reclaim() to cap the number of slabs reclaimed.
On Solaris all eligible slabs are always reclaimed, and this
is still the default behavior. However, I suspect that is
not always wise for reasons such as in the next comment.
- spl_slab_reclaim() added cond_resched() while walking the
slab/object free lists. Soft lockups were observed when
freeing large numbers of vmalloc'd slabs/objects.
- spl_slab_reclaim() 'sks->sks_ref > 0' check changed from
incorrect 'break' to 'continue' to ensure all slabs are
checked.
- spl_cache_age() reworked to avoid a deadlock with
do_flush_tlb_all() which occurred because we slept waiting
for completion in spl_cache_age(). Waiting for magazine
reclamation to finish is not required so we no longer wait.
- spl_magazine_create() and spl_magazine_destroy() shifted
back to using for_each_online_cpu() instead of the
spl_on_each_cpu() approach which was of course a bad idea
due to memory allocations which Ricardo pointed out.
Added support for Solaris swapfs_minfree, and swapfs_reserve tunables.
In addition availrmem is now available and returns a value
which is reasonably analogous to the Solaris meaning. On linux we
return the sum of free and inactive pages since these are all easily
reclaimable.
All tunables are available in /proc/sys/kernel/spl/vm/* and they may
need a little adjusting once we observe the real behavior. Some of
the defaults are mapped to similar linux counterparts, others are
straight from the OpenSolaris defaults.
Support added to provide reasonable values for the global Solaris
VM variables: minfree, desfree, lotsfree, needfree. These values
are set to the sum of their per-zone linux counterparts which
should be close enough for Solaris consumers.
When a non-GPL app links against the SPL we cannot use the udev
interfaces, which means none of the device special files are created.
Because of this I have added a poor man's udev which causes the SPL
to invoke an upcall and create the basic devices when a minor
is registered. When a minor is unregistered we use the vnode
interface to unlink the special file.
- Added SPL_AC_3ARGS_ON_EACH_CPU configure check to determine
if the older 4 argument version of on_each_cpu() should be
used or the new 3 argument version. The retry argument was
dropped in the new API which was never used anyway.
- Updated work queue compatibility wrappers. The old way this
worked was to pass a data pointer when initializing the workqueue.
The new API assumes the work item is embedded in a structure
and we use container_of() to find that data pointer.
- Updated skc->skc_flags to be an unsigned long which is now
type checked in the bit operations. This silences the warnings.
- Updated autogen products and splat tests accordingly
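The embedded work item approach mentioned in the work queue item above
can be sketched in user space as follows. This is only an illustration:
struct work_item, struct my_cache, cache_age() and run_work() are
hypothetical names, and container_of() is defined locally with
offsetof() rather than taken from the kernel headers.

```c
#include <stddef.h>

/* Local definition for the sketch; the kernel provides its own. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for a work item which only carries a callback. */
struct work_item {
	void (*func)(struct work_item *);
};

/* The data the callback needs embeds the work item. */
struct my_cache {
	int aged;
	struct work_item work;
};

/* The callback receives only the work item and recovers the
 * enclosing structure with container_of(). */
static void cache_age(struct work_item *w)
{
	struct my_cache *c = container_of(w, struct my_cache, work);

	c->aged++;
}

/* Stand-in for the queue running a work item. */
static void run_work(struct work_item *w)
{
	w->func(w);
}
```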
Brian Behlendorf [Sat, 31 Jan 2009 04:54:49 +0000 (20:54 -0800)]
kmem_cache hardening and performance improvements
- Added slab work queue task which gradually ages and frees slabs
from the cache which have not been used recently.
- Optimized slab packing algorithm to ensure each slab contains the
maximum number of objects without creating too large a slab.
- Fix deadlock, we can never call kv_free() under the skc_lock. We
now unlink the objects and slabs from the cache itself and attach
them to a private work list. The contents of the list are then
subsequently freed outside the spin lock.
- Move magazine create/destroy operation on to local cpu.
- Further performance optimizations by minimizing the usage of the large
per-cache skc_lock. This includes the addition of KMC_BIT_REAPING
bit mask which is used to prevent concurrent reaping, and to defer
new slab creation when reaping is occurring.
- Add KMC_BIT_DESTROYING bit mask which is set when the cache is being
destroyed, this is used to catch any task accessing the cache while
it is being destroyed.
- Add comments to all the functions and additional comments to try
and make everything as clear as possible.
- Major cleanup and additions to the SPLAT kmem tests to more
rigorously stress the cache implementation and look for any problems.
This includes correctness and performance tests.
- Updated portable work queue interfaces
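The private work list approach from the deadlock fix above can be
sketched in user space like this. It is a simplified illustration
under stated assumptions: a C11 atomic_flag spin lock stands in for
skc_lock, free() stands in for kv_free(), and struct slab, struct
cache, cache_add() and cache_reclaim() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct slab {
	struct slab *next;
};

struct cache {
	atomic_flag lock;	/* stand-in for the per-cache spin lock */
	struct slab *slabs;	/* slabs currently owned by the cache */
};

static void cache_lock(struct cache *c)
{
	while (atomic_flag_test_and_set(&c->lock))
		;		/* spin until the lock is released */
}

static void cache_unlock(struct cache *c)
{
	atomic_flag_clear(&c->lock);
}

/* Add one empty slab to the cache. */
static int cache_add(struct cache *c)
{
	struct slab *s = malloc(sizeof(*s));

	if (s == NULL)
		return -1;
	cache_lock(c);
	s->next = c->slabs;
	c->slabs = s;
	cache_unlock(c);
	return 0;
}

/* Unlink every slab under the lock, then free them after the lock
 * has been dropped. Returns the number of slabs freed. */
static int cache_reclaim(struct cache *c)
{
	struct slab *private_list, *s;
	int freed = 0;

	cache_lock(c);
	private_list = c->slabs;	/* detach onto a private list */
	c->slabs = NULL;
	cache_unlock(c);

	while ((s = private_list) != NULL) {
		private_list = s->next;
		free(s);		/* never called with the lock held */
		freed++;
	}
	return freed;
}
```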
Brian Behlendorf [Tue, 20 Jan 2009 19:59:47 +0000 (11:59 -0800)]
Ensure -NDEBUG does not get added to spl_config.h and is only set in the build options. This allows other kernel modules to use spl_config to leverage the rest of the config checks without getting confused with the debug options.
Brian Behlendorf [Tue, 13 Jan 2009 19:43:05 +0000 (11:43 -0800)]
TASKQ_DYNAMIC not yet supported, do not create the global taskq with that flag or we crash with debug enabled. Also don't bother dumping debug when debugging is disabled, it's pointless
Brian Behlendorf [Tue, 13 Jan 2009 17:30:59 +0000 (09:30 -0800)]
Rework ddi_strtox calls to a native implementation which actually supports the EINVAL, ERANGE error handling, plus add a regression suite to ensure I got it at least mostly right
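A rough user-space sketch of the EINVAL and ERANGE handling, assuming
the Solaris convention of returning 0 or an errno value and storing
the converted number through a result pointer. demo_strtoul() is a
hypothetical name built on the C library strtoul(); it is not the
native SPL implementation.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert str to an unsigned long. Returns 0 on success, EINVAL if
 * no digits could be converted, or ERANGE if the value overflows.
 * On success *result holds the value and, when endptr is not NULL,
 * *endptr points just past the last character consumed. */
static int demo_strtoul(const char *str, char **endptr, int base,
    unsigned long *result)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, base);

	if (end == str)
		return EINVAL;		/* nothing was converted */
	if (errno == ERANGE)
		return ERANGE;		/* value does not fit */

	if (endptr != NULL)
		*endptr = end;
	*result = val;
	return 0;
}
```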
Brian Behlendorf [Wed, 26 Nov 2008 21:09:37 +0000 (13:09 -0800)]
Prefix all META_* #defines with SPL to prevent collisions in packages which include our spl_config.h. Dependent packages may do this to leverage the autoconf checks we have already run against the kernel.
behlendo [Thu, 13 Nov 2008 21:43:30 +0000 (21:43 +0000)]
* include/sys/sunddi.h, modules/spl/spl-module.c : Removed default
udev support from sunddi implementation because it uses GPL-only
symbols. This support is optionally available for SPL consumers
if they define HAVE_GPL_ONLY_SYMBOLS and license their module as
GPL using the MODULE_LICENSE("GPL") macro.
behlendo [Wed, 5 Nov 2008 21:43:37 +0000 (21:43 +0000)]
Add proper error handling for the case where a thread can not be
created. Instead of asserting we simply abort the test, wait for
any tasks we created to finish, and return -ESRCH back to the user
space component.
behlendo [Tue, 4 Nov 2008 23:38:29 +0000 (23:38 +0000)]
Fix a small corner case in the test infrastructure where
we might end up with a non-NULL terminated buffer if the
test name or desc is too long. Only copy N-1 bytes.
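A minimal sketch of the N-1 copy, assuming a fixed size name buffer.
DEMO_NAME_SIZE and demo_set_name() are hypothetical names. The sketch
also writes the terminator explicitly so it does not depend on the
buffer having been zeroed beforehand.

```c
#include <string.h>

#define DEMO_NAME_SIZE	16

/* Copy at most DEMO_NAME_SIZE - 1 bytes and always terminate, so a
 * source string which is too long is truncated instead of leaving
 * the buffer without a terminating NUL. */
static void demo_set_name(char dst[DEMO_NAME_SIZE], const char *src)
{
	strncpy(dst, src, DEMO_NAME_SIZE - 1);
	dst[DEMO_NAME_SIZE - 1] = '\0';
}
```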
behlendo [Tue, 4 Nov 2008 23:30:15 +0000 (23:30 +0000)]
3 minor fixups where sprintf() was used instead of snprintf() with
a known max length. Additionally the function return value is cast
to void to make it explicit that the value is not needed.
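For illustration, a bounded format call with the return value cast to
void might look like the following. DEMO_BUF_LEN and demo_label() are
hypothetical names, not the call sites changed by this commit.

```c
#include <stdio.h>

#define DEMO_BUF_LEN	8

/* snprintf() never writes more than DEMO_BUF_LEN bytes including
 * the terminating NUL. The (void) cast documents that truncation
 * is acceptable here and the return value is deliberately unused. */
static void demo_label(char buf[DEMO_BUF_LEN], const char *name)
{
	(void) snprintf(buf, DEMO_BUF_LEN, "name=%s", name);
}
```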
behlendo [Mon, 3 Nov 2008 22:02:15 +0000 (22:02 +0000)]
* spl-09-fix-kmem-track-oops.patch
This fixes an oops when unloading the modules, in the case where memory
tracking was enabled and there were memory leaks. The comment in the
code explains what was the problem.
* spl-10-fix-assert-verify-ndebug.patch
This fixes ASSERT*() and VERIFY*() macros in non-debug builds. VERIFY*()
macros are supposed to check the condition and panic even in production
builds, and ASSERT*() macros don't need to evaluate the arguments.
Also some 32-bit fixes.
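The intended difference between the two macro families in a non-debug
build can be sketched as below. DEMO_VERIFY(), DEMO_ASSERT() and the
DEMO_DEBUG switch are hypothetical stand-ins, and abort() stands in
for a kernel panic; this is not the patch itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Always evaluates the condition and aborts on failure, even in a
 * production build. */
#define DEMO_VERIFY(cond)						\
do {									\
	if (!(cond)) {							\
		fprintf(stderr, "VERIFY(%s) failed\n", #cond);		\
		abort();						\
	}								\
} while (0)

#ifdef DEMO_DEBUG
/* Debug build: behaves exactly like DEMO_VERIFY(). */
#define DEMO_ASSERT(cond)	DEMO_VERIFY(cond)
#else
/* Non-debug build: the argument is not evaluated at all. */
#define DEMO_ASSERT(cond)	((void)0)
#endif
```

One consequence is that an expression with side effects belongs in
the VERIFY form, since the ASSERT form drops it in non-debug builds.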
behlendo [Mon, 3 Nov 2008 21:51:33 +0000 (21:51 +0000)]
Under Solaris KM_SLEEP ensures success (or at least you hang forever).
That said when working with a finite resource like memory failure really
is always a possibility. It would be far better longer term if the ZFS
code could be weaned off this assumption and properly handle the cases
where an allocation fails. Still I've applied the patch to spl-0.3.4
since this layer is supposed to emulate Solaris as closely as possible.
behlendo [Mon, 3 Nov 2008 21:06:04 +0000 (21:06 +0000)]
Add a SPL_AC_TYPE_ATOMIC64_T test to configure for systems which do
already support atomic64_t types.
* spl-07-kmem-cleanup.patch
This moves all the debugging code from sys/kmem.h to spl-kmem.c, because
the huge macros were hard to debug and were bloating functions that
allocated memory. I also fixed some other minor problems, including
32-bit fixes and a reported memory leak which was just due to using the
wrong free function.
behlendo [Mon, 3 Nov 2008 20:21:08 +0000 (20:21 +0000)]
Apply a nice fix caught by Ricardo,
* spl-04-fix-taskq-spinlock-lockup.patch
Fixes a deadlock in the BIO completion handler, due to the taskq code
prematurely re-enabling interrupts when another spinlock had disabled
them in the IDE IRQ handler.
behlendo [Mon, 3 Nov 2008 20:07:20 +0000 (20:07 +0000)]
Reviewed and applied spl-01-rm-gpl-symbol-set_cpus_allowed.patch
from Ricardo which removes a dependency on the GPL-only symbol
set_cpus_allowed(). Using this symbol is simpler but in the name
of portability we are adopting a spinlock based solution here
to remove this dependency.
behlendo [Mon, 3 Nov 2008 19:53:23 +0000 (19:53 +0000)]
Reviewed and applied spl-00-rm-gpl-symbol-notifier_chain.patch
from Ricardo which removes a dependency on the GPL-only symbol
needed for a panic time notifier. This functionality was never
used and this improves our portability.