behlendo [Tue, 6 May 2008 20:38:28 +0000 (20:38 +0000)]
Lots of fixes here:
- Detailed kmem memory allocation tracking. We can now get on
spl module unload a list of all memory allocations which were
not free'd and where the original alloc was. E.g.
- Shift to using rwsems in kmem implementation, to simplify locking and
improve concurrency.
- Shift to using rwsems in mutex implementation, additionally ensure we
never sleep in the init function if non-zero preempt_count or
interrupts are disabled as can happen in a slab cache ctor/dtor.
- Other minor formatting fixes and such.
TODO:
- Finish the vmem memory allocation tracking
- Vet all other SPL primitives for potential sleeping during *_init. I
suspect the rwlock implementation does this and should be fixed just
like the mutex implementation.
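A minimal user-space sketch of the allocation tracking idea described above, assuming a simple linked list of records. The names `tracked_alloc`, `tracked_free`, and `tracked_report` are hypothetical and are not the SPL's interfaces; the real code lives in the kernel and reports at module unload.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One record per live allocation: address, size, and origin. */
struct track_rec {
    struct track_rec *next;
    void *ptr;
    size_t size;
    const char *file;
    int line;
};

static struct track_rec *track_head; /* hypothetical list of live allocations */

static void *tracked_alloc_at(size_t size, const char *file, int line)
{
    struct track_rec *rec = malloc(sizeof(*rec));
    void *ptr = malloc(size);

    if (rec == NULL || ptr == NULL) {
        free(rec);
        free(ptr);
        return NULL;
    }
    rec->ptr = ptr;
    rec->size = size;
    rec->file = file;
    rec->line = line;
    rec->next = track_head;
    track_head = rec;
    return ptr;
}

/* Record the caller's file and line automatically. */
#define tracked_alloc(size) tracked_alloc_at((size), __FILE__, __LINE__)

static void tracked_free(void *ptr)
{
    struct track_rec **pp;

    for (pp = &track_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ptr == ptr) {
            struct track_rec *rec = *pp;
            *pp = rec->next;
            free(rec->ptr);
            free(rec);
            return;
        }
    }
    assert(!"tracked_free: pointer was never allocated");
}

/* Print every allocation still outstanding; returns the number of leaks. */
static int tracked_report(void)
{
    const struct track_rec *rec;
    int leaks = 0;

    for (rec = track_head; rec != NULL; rec = rec->next) {
        fprintf(stderr, "leaked %zu bytes allocated at %s:%d\n",
                rec->size, rec->file, rec->line);
        leaks++;
    }
    return leaks;
}
```

Calling `tracked_report()` at teardown plays the role of the module unload hook: anything still on the list is reported with the location of the original allocation.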
behlendo [Mon, 5 May 2008 20:18:49 +0000 (20:18 +0000)]
Commit adaptive mutexes. This seems to have introduced some new
crashes but it's not clear to me yet if these are a problem with
the mutex implementation or ZFS's usage of it.
Minor taskq fixes to add new tasks to the end of the pending list.
New and improved taskq implementation for the SPL. It allows a
configurable number of threads like the Solaris version and almost
all of the options are supported. Unfortunately, it appears to have
made absolutely no difference to our performance numbers. I need
to keep looking for where we are bottlenecking.
Make sure that when calling __vmem_alloc that we
do not have __GFP_ZERO set. Once the memory is allocated
then zero out the memory if __GFP_ZERO is passed to
__vmem_alloc.
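A minimal user-space sketch of the flag handling described above: strip the zeroing flag before calling the underlying allocator, then zero the memory afterwards if the caller asked for it. `SK_ZERO`, `sk_backend_alloc`, and `sk_vmem_alloc` are hypothetical stand-ins, with `malloc()` playing the role of the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_ZERO 0x1u /* hypothetical stand-in for __GFP_ZERO */

/* Stand-in for the low-level allocator, which must never see the zero flag. */
static void *sk_backend_alloc(size_t size, unsigned int flags)
{
    assert((flags & SK_ZERO) == 0);
    return malloc(size);
}

/* Mask the flag off for the backend, then honor it by hand. */
static void *sk_vmem_alloc(size_t size, unsigned int flags)
{
    void *ptr = sk_backend_alloc(size, flags & ~SK_ZERO);

    if (ptr != NULL && (flags & SK_ZERO))
        memset(ptr, 0, size);
    return ptr;
}
```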
Be careful to never use any of the debug infrastructure either
before the debug subsystem is fully set up, or after the debug
subsystem has been torn down.
Stack usage is my enemy. Trade cpu cycles in the debug code to
ensure I never add anything to the stack I don't absolutely need.
All this debug code could be removed from a production build
anyway so I'm not so worried about the performance impact. We
may also consider revisiting the mutex and condvar implementation
to ensure no additional stack is used there.
Initial indications are I have reduced the worst case stack
usage to 9080 bytes. Still too large for the default 8k stacks
so I have been forced to run with 16k stacks until I can
reduce the worst offenders.
More fixes to ensure we get good debug logs even if we're in the
process of destroying the stacks. Threshold set fairly aggressively
to 80% of stack usage.
Update SPL to use new debug infrastructure. This means:
- Replacing all BUG_ON()'s with proper ASSERT()'s
- Using ENTRY, EXIT, GOTO, and RETURN macros to instrument call paths
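A rough user-space sketch of what call-path instrumentation macros of this kind can look like. The `SK_` names, the message format, and the event counter are assumptions made for illustration; they are not the SPL's or Lustre's definitions.

```c
#include <assert.h>
#include <stdio.h>

static int sk_trace_events; /* counts trace points hit, for illustration only */

/* Log function entry. */
#define SK_ENTRY()                                                       \
    do {                                                                 \
        sk_trace_events++;                                               \
        fprintf(stderr, "%s: entered\n", __func__);                      \
    } while (0)

/* Evaluate rc (often an assignment), log it, then jump to label. */
#define SK_GOTO(label, rc)                                               \
    do {                                                                 \
        long sk_rc_ = (long)(rc);                                        \
        sk_trace_events++;                                               \
        fprintf(stderr, "%s: goto %s (rc=%ld)\n",                        \
                __func__, #label, sk_rc_);                               \
        goto label;                                                      \
    } while (0)

/* Log the return code and return it (int-returning functions only). */
#define SK_RETURN(rc)                                                    \
    do {                                                                 \
        int sk_rc_ = (rc);                                               \
        sk_trace_events++;                                               \
        fprintf(stderr, "%s: leaving (rc=%d)\n", __func__, sk_rc_);      \
        return sk_rc_;                                                   \
    } while (0)

/* Example function instrumented with the macros above. */
static int sk_double_positive(int value)
{
    int rc = 0;

    SK_ENTRY();
    if (value <= 0)
        SK_GOTO(out, rc = -1);
    rc = value * 2;
out:
    SK_RETURN(rc);
}
```

Every exit from the function passes through `SK_RETURN`, so the log shows a matching enter/leave pair for each call, plus the error path taken when one is used.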
First commit of lustre style internal debug support. These
changes bring over everything lustre had for debugging with
two exceptions. I dropped the debug daemon and upcalls
just because it made things a little easier. They can be
re-added easily enough if we feel they are needed.
Everything compiles and seems to work on first inspection
but I suspect there are a handful of issues still lingering
which I'll be sorting out right away. I just wanted to get
all these changes committed and safe. I'm getting a little
paranoid about losing them.
* modules/spl/spl-kmem.c : Make sure to disable interrupts
when necessary to avoid deadlocks. We were seeing the deadlock
when calling kmem_cache_generic_constructor() and then an interrupt
forced us to end up calling kmem_cache_generic_destructor()
which caused our deadlock.
- Add some spinlocks to cover all the private data in the mutex. I don't
think this should fix anything but it's a good idea regardless.
- Drop the lock before calling the construct/destructor for the slab
otherwise we can't sleep in a constructor/destructor and for long running
functions we may NMI.
- Do something braindead, but safe for the console debug logs for now.
Add hw_serial support based on a usermodehelper which runs
at spl module load time and calls hostid. The resolved hostid
is then fed back in to a proc entry for later use. It's
not a pretty thing, but it will work for now. The hw_serial
is required for things such as 'zpool status' to work.
Adjust the condition variables to simply sleep uninterruptibly.
This way we don't have to contend with spurious wakeups which
it appears ZFS is not so careful to handle anyway. So this is
probably for the best.
Fix race in rwlock implementation which can occur when
your task is rescheduled to a different cpu after you've
taken the lock but before RW_LOCK_HELD is called.
We need the spinlock to ensure there is a wmb() there.
- Fix write-only behavior in vn_open()
- Ensure we have at least 1 write-only splat test
- Fix return codes for vn_*. Solaris does not use negative return
codes in the kernel, so Linux errnos must be inverted.
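A minimal sketch of the return-code inversion, assuming a Linux-style helper that returns 0 or a negative errno. Both function names are hypothetical and only illustrate the sign convention.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical Linux-style helper: 0 on success, negative errno on failure. */
static int sk_linux_style_open(int should_fail)
{
    return should_fail ? -ENOENT : 0;
}

/* Solaris-style wrapper: 0 on success, positive errno on failure. */
static int sk_solaris_style_open(int should_fail)
{
    int rc = sk_linux_style_open(should_fail);

    assert(rc <= 0); /* the Linux convention never returns a positive error */
    return -rc;
}
```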
Update the thread shim to use the current kernel threading API.
We need to use kthread_create() here for a few reasons. First
off, the old kernel_thread() API function will be going away.
Secondly, and more importantly, if I use kthread_create() we can
then properly implement a thread_exit() function which terminates
the kernel thread at any point with do_exit(). This fixes our
cleanup bug which was caused by dropping a mutex twice after
thread_exit() didn't really exit.
Correctly implement atomic_cas_ptr() function. Ideally all of these
atomic operations will be rewritten anyway with the correct arch
specific assembly. But not today.
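For reference, a user-space sketch of compare-and-swap on a pointer with the semantics Solaris documents for atomic_cas_ptr(): store the new value only if the target equals the comparison value, and return the old value either way. It uses C11 `<stdatomic.h>` rather than kernel primitives, and `sk_cas_ptr` is a hypothetical name, not the shim's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * If *target == cmp, store newval. Always return the value that was in
 * *target before the operation.
 */
static void *sk_cas_ptr(_Atomic(void *) *target, void *cmp, void *newval)
{
    void *expected = cmp;

    assert(target != NULL);
    /*
     * On success, expected still equals cmp, which was the old value.
     * On failure, expected is overwritten with the current value.
     */
    atomic_compare_exchange_strong(target, &expected, newval);
    return expected;
}
```

A caller detects success by comparing the return value against `cmp`.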
- Remapped ldi_handle_t to struct block_device * which is much more useful
- Added linux block device headers to sunldi.h
- Made __taskq_dispatch safe for interrupt context where it turns out we
need to be using it.
- Fixed NULL constructor/destructor bug for kmem slab caches
- Placed __dprintf debugging messages under a spin_lock_irqsave
so it's safe to use them in interrupt handlers. For debugging only!
Apparently it's OK for done to be NULL, which was not clear in the
Solaris man page. Anyway, since apparently this usage is acceptable
I've updated the function to handle it.
Double large kmalloc warning size to 4 pages. It was 2 pages, and ideally
it should be dropped to one page but in the short term we should be able
to easily live with 4 page allocations.
Fix the nvlist bug. It turns out the user space side of things was
packing the nvlists correctly as little endian, and the kernel space
side of things, due to a missing #define, was unpacking them as big endian.
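To illustrate why a byte-order mismatch like this corrupts every multi-byte field, here is a small sketch that decodes the same four bytes both ways. It is not the nvlist wire format; `sk_decode_le32` and `sk_decode_be32` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret four bytes as a little-endian 32-bit value. */
static uint32_t sk_decode_le32(const unsigned char *b)
{
    assert(b != NULL);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Interpret the same four bytes as a big-endian 32-bit value. */
static uint32_t sk_decode_be32(const unsigned char *b)
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

A value of 1 packed as little endian (bytes `01 00 00 00`) reads back as 0x01000000 when unpacked as big endian.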
behlendo [Fri, 28 Mar 2008 18:21:09 +0000 (18:21 +0000)]
Correctly functioning 64-bit atomic shim layer. It's not
what I would call efficient but it does have the advantage
of being correct which is all I need right now. I added
a regression test as well.
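One common way to get a correct but slow 64-bit atomic is to serialize every operation behind a single lock. The sketch below shows that approach in user space with a pthread mutex; whether the shim works this way is an assumption, and `sk_atomic_add_64_nv` is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* A single global lock serializes all 64-bit "atomic" updates. */
static pthread_mutex_t sk_atomic_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add delta to *target under the lock and return the new value. */
static uint64_t sk_atomic_add_64_nv(uint64_t *target, int64_t delta)
{
    uint64_t nv;

    assert(target != NULL);
    pthread_mutex_lock(&sk_atomic_lock);
    *target += (uint64_t)delta; /* unsigned arithmetic wraps, so negatives subtract */
    nv = *target;
    pthread_mutex_unlock(&sk_atomic_lock);
    return nv;
}
```

The carry across the 32-bit boundary is the case a regression test would most want to cover, since that is where a naive two-word update can be observed half done.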
behlendo [Thu, 27 Mar 2008 22:06:59 +0000 (22:06 +0000)]
- Thinko fix to the SPL module interface
- Enhance the VERIFY() support to output the values which
failed to compare as expected before crashing. This makes
debugging much much much easier.
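A user-space sketch of a comparison check that prints both operands before aborting, in the spirit of the enhancement above. `SK_VERIFY3` and `sk_verify_msg` are hypothetical names, the message format is an assumption, and `abort()` stands in for a kernel crash.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the failure message; kept separate so it can be exercised safely. */
static int sk_verify_msg(char *buf, size_t len, const char *expr,
                         long long left, long long right)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "VERIFY(%s) failed (%lld vs %lld)",
                    expr, left, right);
}

/* Evaluate both sides once, compare, and report the values on failure. */
#define SK_VERIFY3(left, op, right)                                      \
    do {                                                                 \
        long long sk_l_ = (long long)(left);                             \
        long long sk_r_ = (long long)(right);                            \
        if (!(sk_l_ op sk_r_)) {                                         \
            char sk_msg_[128];                                           \
            sk_verify_msg(sk_msg_, sizeof(sk_msg_),                      \
                          #left " " #op " " #right, sk_l_, sk_r_);       \
            fprintf(stderr, "%s at %s:%d\n", sk_msg_,                    \
                    __FILE__, __LINE__);                                 \
            abort();                                                     \
        }                                                                \
    } while (0)
```

Used as `SK_VERIFY3(rc, ==, 0);`, a failing check reports the actual value of `rc` instead of only the expression text.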
behlendo [Thu, 20 Mar 2008 23:30:15 +0000 (23:30 +0000)]
OK, a first reasonable attempt at a solaris module/chdev shim layer.
This should handle the absolute minimum I need for ZFS. It will
register the chdev with the right callbacks. Then the generic
registered linux callback will find the right registered solaris
callback for the function and munge the args just right before
passing it on. Should work, but untested (just compiled), so I
expect bugs.
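A rough user-space sketch of the dispatch idea: one generic Linux-style entry point looks up the registered Solaris-style callback and adapts the arguments and return convention. Every name and signature here is hypothetical; the real shim works with kernel structures and character device callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical Solaris-style ioctl: 0 or positive errno, result via *rvalp. */
typedef int (*sk_sol_ioctl_fn)(int minor, int cmd, long arg, int *rvalp);

struct sk_dev {
    int minor;
    sk_sol_ioctl_fn ioctl;
};

#define SK_MAX_DEVS 4
static struct sk_dev sk_devs[SK_MAX_DEVS];
static int sk_ndevs;

/* Register a Solaris-style callback for a minor number. */
static int sk_register(int minor, sk_sol_ioctl_fn fn)
{
    assert(fn != NULL);
    if (sk_ndevs == SK_MAX_DEVS)
        return ENOSPC;
    sk_devs[sk_ndevs].minor = minor;
    sk_devs[sk_ndevs].ioctl = fn;
    sk_ndevs++;
    return 0;
}

/* Generic Linux-style entry point: result on success, negative errno on failure. */
static long sk_linux_ioctl(int minor, int cmd, long arg)
{
    int i;

    for (i = 0; i < sk_ndevs; i++) {
        if (sk_devs[i].minor == minor) {
            int rval = 0;
            int rc = sk_devs[i].ioctl(minor, cmd, arg, &rval);

            return rc != 0 ? -rc : rval;
        }
    }
    return -ENODEV;
}

/* Example Solaris-style handler: command 1 doubles its argument. */
static int sk_example_ioctl(int minor, int cmd, long arg, int *rvalp)
{
    (void)minor;
    if (cmd != 1)
        return EINVAL;
    *rvalp = (int)(arg * 2);
    return 0;
}
```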
behlendo [Tue, 18 Mar 2008 23:20:30 +0000 (23:20 +0000)]
OK, some pretty substantial rework here. I've merged the spl-file
stuff which only included the getf()/releasef() in to the vnode area
where it will only really be used. These calls allow a user to
grab an open file struct given only the known open fd for a particular
user context. ZFS makes use of these, but they're a bit tricky to
test from within the kernel since you already need the file open
and know the fd. So we basically spoof the system calls to set up
the environment we need for the splat test case and verify given
just the known fd we can get the file, create the needed vnode, and
then use the vnode interface as usual to read and write from it.
While I was hacking away I also noticed a NULL termination issue
in the second kobj test case so I fixed that too. In fact, I fixed
a few other things as well but all for the best!
behlendo [Wed, 12 Mar 2008 20:52:46 +0000 (20:52 +0000)]
- Implemented vnode interfaces and 6 test cases to the test suite.
- Re-implemented kobj support based on the vnode support.
- Add TESTS option to check.sh, and removed delay after module load.
behlendo [Tue, 11 Mar 2008 20:54:40 +0000 (20:54 +0000)]
Apply fix from bug239 for rwlock deadlock.
Update check.sh script to take V=1 env var so you can run it verbosely as
follows if you're chasing something: sudo make check V=1
Add new kobj api and needed regression tests to allow reading of files from
within the kernel. Normally that's not something I support but the spa layer
needs the support for its config file.
behlendo [Thu, 6 Mar 2008 23:12:55 +0000 (23:12 +0000)]
Add highbit func,
Add sloppy atomic declaration which will need to be fixed (eventually)
Fill out more of the Solaris VM hooks
Adjust the create_thread function
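For readers unfamiliar with highbit: as I understand the Solaris convention, it returns the 1-based position of the highest set bit, or 0 when no bit is set. A minimal portable sketch follows; `sk_highbit` is a hypothetical name and this loop is not the SPL's implementation.

```c
#include <assert.h>

/* Return the 1-based index of the highest set bit, or 0 if value is 0. */
static int sk_highbit(unsigned long value)
{
    int bit = 0;

    while (value != 0) {
        bit++;
        value >>= 1;
    }
    return bit;
}
```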
behlendo [Sat, 1 Mar 2008 18:30:12 +0000 (18:30 +0000)]
Stub out some missing headers which are expected. I'll fill
in what the contents need to be as I encounter the warnings
about missing prototypes, symbols, and such.
behlendo [Thu, 28 Feb 2008 00:48:31 +0000 (00:48 +0000)]
OK, I think this is the last of the major cleanup and restructuring.
We've dropped all the linux- prefixes on the files in favor of spl-
which makes more sense. And we've cleaned up some of the includes
so everybody should be including their own dependencies properly.
All a module which wants to use the spl support needs to do is
include spl.h and ensure it has access to Module.symvers.
behlendo [Thu, 28 Feb 2008 00:16:24 +0000 (00:16 +0000)]
Add top level make check target which runs the validation
suite. Careful with this right now; one of the tests still
causes a lockup on the node. This happened before the move
from the ZFS repo so it's not a new issue.
behlendo [Wed, 27 Feb 2008 23:42:31 +0000 (23:42 +0000)]
More cleanup.
- Removed all references to kzt and replaced with splat
- Moved portions of include files which do not need to be
available to all source files in to local.h files in
proper source subdirs.
behlendo [Wed, 27 Feb 2008 20:52:44 +0000 (20:52 +0000)]
OK, everything builds now. My initial intent was to place all of
the directories at the top level but that proved troublesome. The
kernel buildsystem and autoconf were conflicting too much. To
resolve the issue I moved the kernel bits in to a modules directory
which can then only use the kernel build system. We just pass
along the likely make targets to the kernel build system.
behlendo [Wed, 27 Feb 2008 19:09:51 +0000 (19:09 +0000)]
Lots of build fixes. This is turning out to be a very good
idea since it is forcefully codifying the ABI. Since the shim
layer is no longer linked at build time in to the test suite
we can't cut any corners and get away with it.
Everything is working now with the exception of sorting out
setting Module.symvers properly. This may take a little
Makefile reorg.
behlendo [Tue, 26 Feb 2008 23:20:41 +0000 (23:20 +0000)]
User space build fixes:
- Add list handling compatibility library
- Drop uu_* list handling in favor of local list implementation
- libtoolize
- generic makefile cleanup
behlendo [Tue, 26 Feb 2008 20:36:04 +0000 (20:36 +0000)]
Initial commit. All spl source written up to this point wrapped
in an initial reasonable autoconf style build system. This does
not yet build but the configure system does appear to work properly
and integrate with the kernel. Hopefully the next commit gets
us back to a buildable version we can run the test suite against.