From 92466a2729192e1bfb27536e910b77de18c125b1 Mon Sep 17 00:00:00 2001
From: Martin Baulig
Date: Wed, 24 May 2006 14:56:07 +0000
Subject: [PATCH] 2006-05-24 Martin Baulig

	* doc/debugger-support.txt: Removed; this issue turned out to be
	something completely different and the patch mentioned in this
	file is already reverted.

svn path=/trunk/mono/; revision=61062
---
 ChangeLog               |  6 +++
 doc/debugger-issues.txt | 85 -----------------------------------------
 2 files changed, 6 insertions(+), 85 deletions(-)
 delete mode 100644 doc/debugger-issues.txt

diff --git a/ChangeLog b/ChangeLog
index 96384b33..7c68d663 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2006-05-24 Martin Baulig
+
+	* doc/debugger-support.txt: Removed; this issue turned out to be
+	something completely different and the patch mentioned in this
+	file is already reverted.
+
 2006-05-23 Zoltan Varga
 
 	* os_dep.c (GC_unix_get_mem): Add an assert to bail out early if the runtime is
diff --git a/doc/debugger-issues.txt b/doc/debugger-issues.txt
deleted file mode 100644
index a739393e..00000000
--- a/doc/debugger-issues.txt
+++ /dev/null
@@ -1,85 +0,0 @@
-I spent the last couple of days debugging a very weird race condition.
-
-The problem only occured when running XSP (SVN revision 60518) inside
-the debugger and only when using special parameters.
-
-I'm using Mono from SVN revision 60564 (that's from last Thursday),
-XSP from SVN revision 60518 and manually installed xsp.exe.mdb and
-Mono.WebServer.dll.mdb into $prefix/lib/xsp/1.0/.
-
-With this setup, I'm running XSP with
-
-  mdb -args /work/asgard/INSTALL/lib/xsp/1.0/xsp.exe --root /work/asgard/INSTALL/lib/xsp/test/
-
-Note that adding options like --nonstop or changing the --root may
-make the problem go away or make it crash somewhere else.
-
-Then I insert a breakpoint on line 476 (that's the line before the
-Console.ReadLine()) and continue.
-
-Using `set env GC_DONT_GC 1' inside mdb makes the problem go away and
-running a stand-alone mono with -O=shared (and all the other
-optimization flags the debugger is using) works fine.
-
-So my first guess was that this is a GC issue.
-
-After implementing hardware breakpoints in the debugger, I was finally
-able to track this down. If I understand things correctly, the
-problem goes like this:
-
-Some code inside XSP calls mono_thread_pool_add() - inside that
-method, we GC-allocate an `ASyncCall *ac' structure, store the `msg'
-and `state' objects in it and create a `MonoAsyncResult *ares'.
-
-Then we call mono_thread_create() passing it async_invoke_thread() and
-the `ares'.
-
-mono_thread_create() stores them as `func' and `start_arg' in the
-g_new()-allocated `start_info' and calls CreateThread() which calls
-pthread_create().
-
-pthread_create() is in fact a wrapper in libgc - it calls the "real"
-pthread_create() and then blocks on a semaphore until the thread is
-actually started.
-
-Now - somehow - and I still don't fully understand why - the parent
-"loses" all references to the `ac' and `ares' after calling the real
-pthread_create().
-
-If I understand this correctly, mono_thread_pool_add() only stores
-them in registers and not on the stack, so the `start_info' contains
-the only references to them. The `start_info', however, is just
-passed to the clone() system call and not accessed anymore after that.
-
-This means that all references to the `ac' and the `ares' may
-disappear from the parent's stack between the clone() and sem_wait()
-system calls. Under normal circumstances, this is no problem since
-the child's stack is created with a reference to the `start_info'.
-
-I said under normal circumstances, because this is where race
-condition #1 comes into the picture:
-
-The GC's pthread_create() passes a wrapper called GC_start_func()
-around the original `start_func' to the real pthread_create(). This
-wrapper calls GC_new_thread() and stores some information about the
-newly created thread in its internal structures - this information is
-also used to determine the child's stack.
-
-After that, it posts the semaphore on which the parent thread is
-blocking, we release the allocation lock and everything is fine.
-
-However - GC_new_thread() uses GC_INTERNAL_MALLOC() to allocate the
-`GC_thread' structure - and GC_INTERNAL_MALLOC() may in fact trigger a
-collection !
-
-Doing a collection at this time means we don't know about the child's
-stack yet - and if the parent doesn't keep a reference to the `ares'
-anymore, it's gone ....
-
-Fixing this was really easy, all I had to do is make GC_new_thread()
-use calloc() instead of GC_INTERNAL_MALLOC().
-
-The second issue is a debugger-only problem: we need to tell the
-debugger about newly created threads while still holding the
-allocation lock to ensure that no collection may happen in the
-meantime.
-- 
2.40.0