+++ /dev/null
-I spent the last couple of days debugging a very weird race condition.
-
-The problem only occurred when running XSP (SVN revision 60518) inside
-the debugger and only with particular command-line parameters.
-
-I'm using Mono from SVN revision 60564 (that's from last Thursday) and
-XSP from SVN revision 60518, and I manually installed xsp.exe.mdb and
-Mono.WebServer.dll.mdb into $prefix/lib/xsp/1.0/.
-
-With this setup, I'm running XSP with
-
- mdb -args /work/asgard/INSTALL/lib/xsp/1.0/xsp.exe --root /work/asgard/INSTALL/lib/xsp/test/
-
-Note that adding options like --nonstop or changing the --root may
-make the problem go away or make it crash somewhere else.
-
-Then I insert a breakpoint on line 476 (that's the line before the
-Console.ReadLine()) and continue.
-
-Using `set env GC_DONT_GC 1' inside mdb makes the problem go away and
-running a stand-alone mono with -O=shared (and all the other
-optimization flags the debugger is using) works fine.
-
-So my first guess was that this is a GC issue.
-
-After implementing hardware breakpoints in the debugger, I was finally
-able to track this down. If I understand things correctly, the
-problem goes like this:
-
-Some code inside XSP calls mono_thread_pool_add() - inside that
-method, we GC-allocate an `ASyncCall *ac' structure, store the `msg'
-and `state' objects in it and create a `MonoAsyncResult *ares'.
-
-Then we call mono_thread_create() passing it async_invoke_thread() and
-the `ares'.
-
-mono_thread_create() stores them as `func' and `start_arg' in the
-g_new()-allocated `start_info' and calls CreateThread() which calls
-pthread_create().
-
-pthread_create() is in fact a wrapper in libgc - it calls the "real"
-pthread_create() and then blocks on a semaphore until the thread is
-actually started.
-
-Now somehow (and I still don't fully understand why) the parent
-"loses" all references to the `ac' and `ares' after calling the real
-pthread_create().
-
-If I understand this correctly, mono_thread_pool_add() only stores
-them in registers and not on the stack, so the `start_info' contains
-the only references to them. The `start_info', however, is just
-passed to the clone() system call and not accessed anymore after that.
-
-This means that all references to the `ac' and the `ares' may
-disappear from the parent's stack between the clone() and sem_wait()
-system calls. Under normal circumstances, this is no problem since
-the child's stack is created with a reference to the `start_info'.
-
-I said "under normal circumstances" because this is where race
-condition #1 comes into the picture:
-
-The GC's pthread_create() passes a wrapper called GC_start_func()
-around the original `start_func' to the real pthread_create(). This
-wrapper calls GC_new_thread() and stores some information about the
-newly created thread in its internal structures - this information is
-also used to determine the child's stack.
-
-After that, it posts the semaphore on which the parent thread is
-blocked and releases the allocation lock, and everything is fine.
-
-However, GC_new_thread() uses GC_INTERNAL_MALLOC() to allocate the
-`GC_thread' structure, and GC_INTERNAL_MALLOC() may in fact trigger a
-collection!
-
-Doing a collection at this time means the GC doesn't know about the
-child's stack yet, and if the parent doesn't keep a reference to the
-`ares' anymore, it gets collected.
-
-Fixing this was really easy: all I had to do was make GC_new_thread()
-use calloc() instead of GC_INTERNAL_MALLOC().
-
-The second issue is a debugger-only problem: we need to tell the
-debugger about newly created threads while still holding the
-allocation lock to ensure that no collection may happen in the
-meantime.