The first locking issue was caused by the semaphore I used. I was trying
to be overly clever, and the context switch taken whenever the semaphore
was busy was destroying performance. Converting to a simple spin lock
bought me a factor of 50 or so. That said, it's still not good enough:
tests show poor performance and we are still CPU bound. The logical fix
is to implement per-CPU hot caches to minimize SMP contention. Linux and
Solaris both have these; I was hoping to do without, but it looks like
that's not to be.
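
For illustration only, here is a rough user-space sketch of what a per-CPU
hot cache in front of a spin-locked shared pool might look like. It is not
the SPL implementation. Every name in it (obj_cache, hot_cache, NCPUS,
HOT_MAX, POOL_MAX, lock_taken) is hypothetical, and it assumes each caller
owns its cpu slot exclusively, which a kernel would have to guarantee
itself, for example by disabling preemption around the fast path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPUS    4   /* hypothetical CPU count */
#define HOT_MAX  8   /* objects one per-CPU hot cache may hold */
#define POOL_MAX 64  /* objects the shared pool may hold */

struct hot_cache {
    void  *objs[HOT_MAX];
    size_t count;
};

struct obj_cache {
    atomic_flag      lock;        /* spin lock guarding the shared pool */
    void            *pool[POOL_MAX];
    size_t           pool_count;
    unsigned long    lock_taken;  /* times the shared lock was needed */
    struct hot_cache hot[NCPUS];
};

static void pool_lock(struct obj_cache *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;  /* busy-wait instead of sleeping */
    c->lock_taken++;
}

static void pool_unlock(struct obj_cache *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Seed the shared pool with n caller-provided objects. */
void obj_cache_init(struct obj_cache *c, void **objs, size_t n)
{
    assert(n <= POOL_MAX);
    atomic_flag_clear(&c->lock);
    c->pool_count = 0;
    c->lock_taken = 0;
    for (size_t i = 0; i < n; i++)
        c->pool[c->pool_count++] = objs[i];
    for (int cpu = 0; cpu < NCPUS; cpu++)
        c->hot[cpu].count = 0;
}

void *obj_cache_alloc(struct obj_cache *c, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    struct hot_cache *h = &c->hot[cpu];

    if (h->count == 0) {
        /* Slow path: refill half of the hot cache under the shared lock. */
        pool_lock(c);
        while (h->count < HOT_MAX / 2 && c->pool_count > 0)
            h->objs[h->count++] = c->pool[--c->pool_count];
        pool_unlock(c);
    }
    /* Fast path: pop from this CPU's cache without touching the lock. */
    return h->count > 0 ? h->objs[--h->count] : NULL;
}

void obj_cache_free(struct obj_cache *c, int cpu, void *obj)
{
    assert(cpu >= 0 && cpu < NCPUS);
    struct hot_cache *h = &c->hot[cpu];

    if (h->count == HOT_MAX) {
        /* Slow path: flush half of the hot cache back to the shared pool. */
        pool_lock(c);
        while (h->count > HOT_MAX / 2 && c->pool_count < POOL_MAX)
            c->pool[c->pool_count++] = h->objs[--h->count];
        pool_unlock(c);
    }
    assert(h->count < HOT_MAX);
    h->objs[h->count++] = obj;
}
```

In this sketch a steady alloc/free loop on one CPU takes the shared lock
only when its hot cache runs empty or fills up, so most operations never
contend with other CPUs. The cost is that objects parked in one CPU's hot
cache are not visible to the others until they are flushed.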

kmem_lock: time (sec)    slabs           objs              hash
kmem_lock:               tot/max/calc    tot/max/calc      size/depth
kmem_lock: 0.022000000   7/6/64          224/177/2048      32768/1
kmem_lock: 0.039000000   13/13/128       416/404/4096      32768/1
kmem_lock: 0.079000000   23/21/256       736/672/8192      32768/1
kmem_lock: 0.158000000   48/47/512       1536/1504/16384   32768/1
kmem_lock: 0.345000000   105/105/1024    3360/3358/32768   32768/2
kmem_lock: 0.760000000   202/200/2048    6464/6400/65536   32768/3

git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@135 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c