From b810e8e0895cb6bc758bee23e913b9e64fb724a0 Mon Sep 17 00:00:00 2001 From: Ivan Maidanski Date: Fri, 19 May 2017 18:55:29 +0300 Subject: [PATCH] Convert some (small) .html files to Markdown format The "doc" files converted: debugging.html, finalization.html, leak.html, porting.html, scale.html, simple_example.html. * README.md: Replace leak.html with leak.md; replace debugging.html with debugging.md. * doc/debugging.html: Change file suffix to .md; convert text format from HTML to Markdown. * doc/finalization.html: Likewise. * doc/leak.html: Likewise. * doc/porting.html: Likewise. * doc/scale.html: Likewise. * doc/simple_example.html: Likewise. * doc/gcdescr.html: Replace scale.html with scale.md; replace finalization.html with finalization.md. * doc/gcinterface.html: Likewise. * doc/doc.am (dist_doc_DATA): Rename doc/debugging.html to doc/debugging.md, finalization.html to finalization.md, leak.html to leak.md, porting.html to porting.md, scale.html to scale.md, simple_example.html to simple_example.md. * doc/overview.html: Replace simple_example.html with simple_example.md; replace leak.html with leak.md; replace scale.html with scale.md; replace finalization.html with finalization.md; replace debugging.html with debugging.md. --- README.md | 6 +- doc/debugging.html | 304 ------------------------------------- doc/debugging.md | 279 ++++++++++++++++++++++++++++++++++ doc/doc.am | 12 +- doc/finalization.html | 190 ----------------------- doc/finalization.md | 159 ++++++++++++++++++++ doc/gcdescr.html | 9 +- doc/gcinterface.html | 2 +- doc/leak.html | 207 ------------------------- doc/leak.md | 189 +++++++++++++++++++++++ doc/overview.html | 18 +-- doc/porting.html | 326 ---------------------------------------- doc/porting.md | 262 ++++++++++++++++++++++++++++++++ doc/scale.html | 199 ------------------------ doc/scale.md | 169 +++++++++++++++++++++ doc/simple_example.html | 212 -------------------------- doc/simple_example.md | 181 ++++++++++++++++++++++ 17 files changed, 1262 insertions(+), 1462 deletions(-) delete mode 100644 doc/debugging.html create mode 100644 doc/debugging.md delete mode 100644 doc/finalization.html create mode 100644 doc/finalization.md delete mode 100644 doc/leak.html create mode 100644 doc/leak.md delete mode 100644 doc/porting.html create mode 100644 doc/porting.md delete mode 100644 doc/scale.html create mode 100644 doc/scale.md delete mode 100644 doc/simple_example.html create mode 100644 doc/simple_example.md diff --git a/README.md b/README.md index 2a07e67a..ce11b7b7 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ was part of version 8 UNIX (tm), but appears to not have received widespread use. Rudimentary tools for use of the collector as a -[leak detector](http://www.hboehm.info/gc/leak.html) are included, +[leak detector](doc/leak.md) are included, as is a fairly sophisticated string package "cord" that makes use of the collector. (See doc/README.cords and H.-J. Boehm, R. Atkinson, and M. Plass, "Ropes: An Alternative to Strings", Software Practice and Experience 25, 12 @@ -113,7 +113,7 @@ large objects to be disregarded, greatly reducing the probability of accidental retention of large objects. For most purposes it seems best to compile with `ALL_INTERIOR_POINTERS` and to use `GC_malloc_ignore_off_page` if you get collector warnings from -allocations of very large objects. See doc/debugging.html for details. +allocations of very large objects. See [here](doc/debugging.md) for details. _WARNING_: pointers inside memory allocated by the standard `malloc` are not seen by the garbage collector. Thus objects pointed to only from such a @@ -260,7 +260,7 @@ or 64 bit addresses will require a major effort. A port to plain MSDOS or win16 is hard. For machines not already mentioned, or for nonstandard compilers, -some porting suggestions are provided in doc/porting.html. +some porting suggestions are provided [here](doc/porting.md). ## The C Interface to the Allocator diff --git a/doc/debugging.html b/doc/debugging.html deleted file mode 100644 index 5c24862b..00000000 --- a/doc/debugging.html +++ /dev/null @@ -1,304 +0,0 @@ - - - - -Debugging Garbage Collector Related Problems - - -

Debugging Garbage Collector Related Problems

-This page contains some hints on -debugging issues specific to -the Boehm-Demers-Weiser conservative garbage collector. -It applies both to debugging issues in client code that manifest themselves -as collector misbehavior, and to debugging the collector itself. -

-If you suspect a bug in the collector itself, it is strongly recommended -that you try the latest collector release before proceeding. -

Bus Errors and Segmentation Violations

-

-If the fault occurred in GC_find_limit, or with incremental collection enabled, -this is probably normal. The collector installs handlers to take care of -these. You will not see these unless you are using a debugger. -Your debugger should allow you to continue. -It's often preferable to tell the debugger to ignore SIGBUS and SIGSEGV -("handle SIGSEGV SIGBUS nostop noprint" in gdb, -"ignore SIGSEGV SIGBUS" in most versions of dbx) -and set a breakpoint in abort. -The collector will call abort if the signal had another cause, -and there was not other handler previously installed. -

-We recommend debugging without incremental collection if possible. -(This applies directly to UNIX systems. -Debugging with incremental collection under win32 is worse. See README.win32.) -

-If the application generates an unhandled SIGSEGV or equivalent, it may -often be easiest to set the environment variable GC_LOOP_ON_ABORT. On many -platforms, this will cause the collector to loop in a handler when the -SIGSEGV is encountered (or when the collector aborts for some other reason), -and a debugger can then be attached to the looping -process. This sidesteps common operating system problems related -to incomplete core files for multi-threaded applications, etc. -

Other Signals

-On most platforms, the multi-threaded version of the collector needs one or -two other signals for internal use by the collector in stopping threads. -It is normally wise to tell the debugger to ignore these. On Linux, -the collector currently uses SIGPWR and SIGXCPU by default. -

Warning Messages About Needing to Allocate Blacklisted Blocks

-The garbage collector generates warning messages of the form -
-Needed to allocate blacklisted block at 0x...
-
-or -
-Repeated allocation of very large block ...
-
-when it needs to allocate a block at a location that it knows to be -referenced by a false pointer. These false pointers can be either permanent -(e.g. a static integer variable that never changes) or temporary. -In the latter case, the warning is largely spurious, and the block will -eventually be reclaimed normally. -In the former case, the program will still run correctly, but the block -will never be reclaimed. Unless the block is intended to be -permanent, the warning indicates a memory leak. -
    -
  1. Ignore these warnings while you are using GC_DEBUG. Some of the routines -mentioned below don't have debugging equivalents. (Alternatively, write -the missing routines and send them to me.) -
  2. Replace allocator calls that request large blocks with calls to -GC_malloc_ignore_off_page or -GC_malloc_atomic_ignore_off_page. You may want to set a -breakpoint in GC_default_warn_proc to help you identify such calls. -Make sure that a pointer to somewhere near the beginning of the resulting block -is maintained in a (preferably volatile) variable as long as -the block is needed. -
  3. -If the large blocks are allocated with realloc, we suggest instead allocating -them with something like the following. Note that the realloc size increment -should be fairly large (e.g. a factor of 3/2) for this to exhibit reasonable -performance. But we all know we should do that anyway. -
    -void * big_realloc(void *p, size_t new_size)
    -{
    -    size_t old_size = GC_size(p);
    -    void * result;
    -
    -    if (new_size <= 10000) return(GC_realloc(p, new_size));
    -    if (new_size <= old_size) return(p);
    -    result = GC_malloc_ignore_off_page(new_size);
    -    if (result == 0) return(0);
    -    memcpy(result,p,old_size);
    -    GC_free(p);
    -    return(result);
    -}
    -
    - -
  4. In the unlikely case that even relatively small object -(<20KB) allocations are triggering these warnings, then your address -space contains lots of "bogus pointers", i.e. values that appear to -be pointers but aren't. Usually this can be solved by using GC_malloc_atomic -or the routines in gc_typed.h to allocate large pointer-free regions of bitmaps, etc. Sometimes the problem can be solved with trivial changes of encoding -in certain values. It is possible, to identify the source of the bogus -pointers by building the collector with -DPRINT_BLACK_LIST, -which will cause it to print the "bogus pointers", along with their location. - -
  5. If you get only a fixed number of these warnings, you are probably only -introducing a bounded leak by ignoring them. If the data structures being -allocated are intended to be permanent, then it is also safe to ignore them. -The warnings can be turned off by calling GC_set_warn_proc with a procedure -that ignores these warnings (e.g. by doing absolutely nothing). -
- -

The Collector References a Bad Address in GC_malloc

- -This typically happens while the collector is trying to remove an entry from -its free list, and the free list pointer is bad because the free list link -in the last allocated object was bad. -

-With > 99% probability, you wrote past the end of an allocated object. -Try setting GC_DEBUG before including gc.h and -allocating with GC_MALLOC. This will try to detect such -overwrite errors. - -

Unexpectedly Large Heap

- -Unexpected heap growth can be due to one of the following: -
    -
  1. Data structures that are being unintentionally retained. This -is commonly caused by data structures that are no longer being used, -but were not cleared, or by caches growing without bounds. -
  2. Pointer misidentification. The garbage collector is interpreting -integers or other data as pointers and retaining the "referenced" -objects. A common symptom is that GC_dump() shows much of the heap -as black-listed. -
  3. Heap fragmentation. This should never result in unbounded growth, -but it may account for larger heaps. This is most commonly caused -by allocation of large objects. On some platforms it can be reduced -by building with -DUSE_MUNMAP, which will cause the collector to unmap -memory corresponding to pages that have not been recently used. -
  4. Per object overhead. This is usually a relatively minor effect, but -it may be worth considering. If the collector recognizes interior -pointers, object sizes are increased, so that one-past-the-end pointers -are correctly recognized. The collector can be configured not to do this -(-DDONT_ADD_BYTE_AT_END). -

    -The collector rounds up object sizes so the result fits well into the -chunk size (HBLKSIZE, normally 4K on 32 bit machines, 8K -on 64 bit machines) used by the collector. Thus it may be worth avoiding -objects of size 2K + 1 (or 2K if a byte is being added at the end.) -

-The last two cases can often be identified by looking at the output -of a call to GC_dump(). Among other things, it will print the -list of free heap blocks, and a very brief description of all chunks in -the heap, the object sizes they correspond to, and how many live objects -were found in the chunk at the last collection. -

-Growing data structures can usually be identified by -

    -
  1. Building the collector with -DKEEP_BACK_PTRS, -
  2. Preferably using debugging allocation (defining GC_DEBUG -before including gc.h and allocating with GC_MALLOC), -so that objects will be identified by their allocation site, -
  3. Running the application long enough so -that most of the heap is composed of "leaked" memory, and -
  4. Then calling GC_generate_random_backtrace() from gc_backptr.h -a few times to determine why some randomly sampled objects in the heap are -being retained. -
-

-The same technique can often be used to identify problems with false -pointers, by noting whether the reference chains printed by -GC_generate_random_backtrace() involve any misidentified pointers. -An alternate technique is to build the collector with --DPRINT_BLACK_LIST which will cause it to report values that -are almost, but not quite, look like heap pointers. It is very likely that -actual false pointers will come from similar sources. -

-In the unlikely case that false pointers are an issue, it can usually -be resolved using one or more of the following techniques: -

    -
  1. Use GC_malloc_atomic for objects containing no pointers. -This is especially important for large arrays containing compressed data, -pseudo-random numbers, and the like. It is also likely to improve GC -performance, perhaps drastically so if the application is paging. -
  2. If you allocate large objects containing only -one or two pointers at the beginning, either try the typed allocation -primitives is gc_typed.h, or separate out the pointer-free component. -
  3. Consider using GC_malloc_ignore_off_page() -to allocate large objects. (See gc.h and above for details. -Large means > 100K in most environments.) -
  4. If your heap size is larger than 100MB or so, build the collector with --DLARGE_CONFIG. -This allows the collector to keep more precise black-list -information. -
  5. If you are using heaps close to, or larger than, a gigabyte on a 32-bit -machine, you may want to consider moving to a platform with 64-bit pointers. -This is very likely to resolve any false pointer issues. -
-

Prematurely Reclaimed Objects

-The usual symptom of this is a segmentation fault, or an obviously overwritten -value in a heap object. This should, of course, be impossible. In practice, -it may happen for reasons like the following: -
    -
  1. The collector did not intercept the creation of threads correctly in -a multi-threaded application, e.g. because the client called -pthread_create without including gc.h, which redefines it. -
  2. The last pointer to an object in the garbage collected heap was stored -somewhere were the collector couldn't see it, e.g. in an -object allocated with system malloc, in certain types of -mmaped files, -or in some data structure visible only to the OS. (On some platforms, -thread-local storage is one of these.) -
  3. The last pointer to an object was somehow disguised, e.g. by -XORing it with another pointer. -
  4. Incorrect use of GC_malloc_atomic or typed allocation. -
  5. An incorrect GC_free call. -
  6. The client program overwrote an internal garbage collector data structure. -
  7. A garbage collector bug. -
  8. (Empirically less likely than any of the above.) A compiler optimization -that disguised the last pointer. -
-The following relatively simple techniques should be tried first to narrow -down the problem: -
    -
  1. If you are using the incremental collector try turning it off for -debugging. -
  2. If you are using shared libraries, try linking statically. If that works, -ensure that DYNAMIC_LOADING is defined on your platform. -
  3. Try to reproduce the problem with fully debuggable unoptimized code. -This will eliminate the last possibility, as well as making debugging easier. -
  4. Try replacing any suspect typed allocation and GC_malloc_atomic -calls with calls to GC_malloc. -
  5. Try removing any GC_free calls (e.g. with a suitable -#define). -
  6. Rebuild the collector with -DGC_ASSERTIONS. -
  7. If the following works on your platform (i.e. if gctest still works -if you do this), try building the collector with --DREDIRECT_MALLOC=GC_malloc_uncollectable. This will cause -the collector to scan memory allocated with malloc. -
-If all else fails, you will have to attack this with a debugger. -Suggested steps: -
    -
  1. Call GC_dump() from the debugger around the time of the failure. Verify -that the collectors idea of the root set (i.e. static data regions which -it should scan for pointers) looks plausible. If not, i.e. if it doesn't -include some static variables, report this as -a collector bug. Be sure to describe your platform precisely, since this sort -of problem is nearly always very platform dependent. -
  2. Especially if the failure is not deterministic, try to isolate it to -a relatively small test case. -
  3. Set a break point in GC_finish_collection. This is a good -point to examine what has been marked, i.e. found reachable, by the -collector. -
  4. If the failure is deterministic, run the process -up to the last collection before the failure. -Note that the variable GC_gc_no counts collections and can be used -to set a conditional breakpoint in the right one. It is incremented just -before the call to GC_finish_collection. -If object p was prematurely recycled, it may be helpful to -look at *GC_find_header(p) at the failure point. -The hb_last_reclaimed field will identify the collection number -during which its block was last swept. -
  5. Verify that the offending object still has its correct contents at -this point. -Then call GC_is_marked(p) from the debugger to verify that the -object has not been marked, and is about to be reclaimed. Note that -GC_is_marked(p) expects the real address of an object (the -address of the debug header if there is one), and thus it may -be more appropriate to call GC_is_marked(GC_base(p)) -instead. -
  6. Determine a path from a root, i.e. static variable, stack, or -register variable, -to the reclaimed object. Call GC_is_marked(q) for each object -q along the path, trying to locate the first unmarked object, say -r. -
  7. If r is pointed to by a static root, -verify that the location -pointing to it is part of the root set printed by GC_dump(). If it -is on the stack in the main (or only) thread, verify that -GC_stackbottom is set correctly to the base of the stack. If it is -in another thread stack, check the collector's thread data structure -(GC_thread[] on several platforms) to make sure that stack bounds -are set correctly. -
  8. If r is pointed to by heap object s, check that the -collector's layout description for s is such that the pointer field -will be scanned. Call *GC_find_header(s) to look at the descriptor -for the heap chunk. The hb_descr field specifies the layout -of objects in that chunk. See gc_mark.h for the meaning of the descriptor. -(If its low order 2 bits are zero, then it is just the length of the -object prefix to be scanned. This form is always used for objects allocated -with GC_malloc or GC_malloc_atomic.) -
  9. If the failure is not deterministic, you may still be able to apply some -of the above technique at the point of failure. But remember that objects -allocated since the last collection will not have been marked, even if the -collector is functioning properly. On some platforms, the collector -can be configured to save call chains in objects for debugging. -Enabling this feature will also cause it to save the call stack at the -point of the last GC in GC_arrays._last_stack. -
  10. When looking at GC internal data structures remember that a number -of GC_xxx variables are really macro defined to -GC_arrays._xxx, so that -the collector can avoid scanning them. -
- - diff --git a/doc/debugging.md b/doc/debugging.md new file mode 100644 index 00000000..b714dfb2 --- /dev/null +++ b/doc/debugging.md @@ -0,0 +1,279 @@ +# Debugging Garbage Collector Related Problems + +This page contains some hints on debugging issues specific to the +Boehm-Demers-Weiser conservative garbage collector. It applies both +to debugging issues in client code that manifest themselves as collector +misbehavior, and to debugging the collector itself. + +If you suspect a bug in the collector itself, it is strongly recommended that +you try the latest collector release before proceeding. + +## Bus Errors and Segmentation Violations + +If the fault occurred in `GC_find_limit`, or with incremental collection +enabled, this is probably normal. The collector installs handlers to take care +of these. You will not see these unless you are using a debugger. Your +debugger _should_ allow you to continue. It's often preferable to tell the +debugger to ignore SIGBUS and SIGSEGV ("handle SIGSEGV SIGBUS nostop noprint" +in gdb, "ignore SIGSEGV SIGBUS" in most versions of dbx) and set a breakpoint +in `abort`. The collector will call abort if the signal had another cause, and +there was not other handler previously installed. + +We recommend debugging without incremental collection if possible. (This +applies directly to UNIX systems. Debugging with incremental collection under +win32 is worse. See README.win32.) + +If the application generates an unhandled SIGSEGV or equivalent, it may often +be easiest to set the environment variable `GC_LOOP_ON_ABORT`. On many +platforms, this will cause the collector to loop in a handler when the SIGSEGV +is encountered (or when the collector aborts for some other reason), and +a debugger can then be attached to the looping process. This sidesteps common +operating system problems related to incomplete core files for multi-threaded +applications, etc. + +## Other Signals + +On most platforms, the multi-threaded version of the collector needs one or +two other signals for internal use by the collector in stopping threads. It is +normally wise to tell the debugger to ignore these. On Linux, the collector +currently uses SIGPWR and SIGXCPU by default. + +## Warning Messages About Needing to Allocate Blacklisted Blocks + +The garbage collector generates warning messages of the form: + + + Needed to allocate blacklisted block at 0x... + + +or + + + Repeated allocation of very large block ... + + +when it needs to allocate a block at a location that it knows to be referenced +by a false pointer. These false pointers can be either permanent (e.g. +a static integer variable that never changes) or temporary. In the latter +case, the warning is largely spurious, and the block will eventually +be reclaimed normally. In the former case, the program will still run +correctly, but the block will never be reclaimed. Unless the block is intended +to be permanent, the warning indicates a memory leak. + + 1. Ignore these warnings while you are using GC_DEBUG. Some of the routines + mentioned below don't have debugging equivalents. (Alternatively, write the + missing routines and send them to me.) + 2. Replace allocator calls that request large blocks with calls to + `GC_malloc_ignore_off_page` or `GC_malloc_atomic_ignore_off_page`. You may + want to set a breakpoint in `GC_default_warn_proc` to help you identify such + calls. Make sure that a pointer to somewhere near the beginning of the + resulting block is maintained in a (preferably volatile) variable as long + as the block is needed. + 3. If the large blocks are allocated with realloc, we suggest instead + allocating them with something like the following. Note that the realloc + size increment should be fairly large (e.g. a factor of 3/2) for this to + exhibit reasonable performance. But we all know we should do that anyway. + + + void * big_realloc(void *p, size_t new_size) { + size_t old_size = GC_size(p); + void * result; + if (new_size <= 10000) return(GC_realloc(p, new_size)); + if (new_size <= old_size) return(p); + result = GC_malloc_ignore_off_page(new_size); + if (result == 0) return(0); + memcpy(result,p,old_size); + GC_free(p); + return(result); + } + + + 4. In the unlikely case that even relatively small object (<20KB) + allocations are triggering these warnings, then your address space contains + lots of "bogus pointers", i.e. values that appear to be pointers but aren't. + Usually this can be solved by using `GC_malloc_atomic` or the routines + in `gc_typed.h` to allocate large pointer-free regions of bitmaps, etc. + Sometimes the problem can be solved with trivial changes of encoding + in certain values. It is possible, to identify the source of the bogus + pointers by building the collector with `-DPRINT_BLACK_LIST`, which will + cause it to print the "bogus pointers", along with their location. + 5. If you get only a fixed number of these warnings, you are probably only + introducing a bounded leak by ignoring them. If the data structures being + allocated are intended to be permanent, then it is also safe to ignore them. + The warnings can be turned off by calling `GC_set_warn_proc` with + a procedure that ignores these warnings (e.g. by doing absolutely nothing). + +## The Collector References a Bad Address in GC_malloc + +This typically happens while the collector is trying to remove an entry from +its free list, and the free list pointer is bad because the free list link +in the last allocated object was bad. + +With >99% probability, you wrote past the end of an allocated object. Try +setting `GC_DEBUG` before including `gc.h` and allocating with `GC_MALLOC`. +This will try to detect such overwrite errors. + +## Unexpectedly Large Heap + +Unexpected heap growth can be due to one of the following: + + 1. Data structures that are being unintentionally retained. This is commonly + caused by data structures that are no longer being used, but were not + cleared, or by caches growing without bounds. + 2. Pointer misidentification. The garbage collector is interpreting integers + or other data as pointers and retaining the "referenced" objects. A common + symptom is that GC_dump() shows much of the heap as black-listed. + 3. Heap fragmentation. This should never result in unbounded growth, but + it may account for larger heaps. This is most commonly caused by allocation + of large objects. On some platforms it can be reduced by building with + `-DUSE_MUNMAP`, which will cause the collector to unmap memory corresponding + to pages that have not been recently used. + 4. Per object overhead. This is usually a relatively minor effect, but + it may be worth considering. If the collector recognizes interior pointers, + object sizes are increased, so that one-past-the-end pointers are correctly + recognized. The collector can be configured not to do this + (`-DDONT_ADD_BYTE_AT_END`). + +The collector rounds up object sizes so the result fits well into the chunk +size (`HBLKSIZE`, normally 4K on 32 bit machines, 8K on 64 bit machines) used +by the collector. Thus it may be worth avoiding objects of size 2K + 1 (or 2K +if a byte is being added at the end.) The last two cases can often +be identified by looking at the output of a call to `GC_dump`. Among other +things, it will print the list of free heap blocks, and a very brief +description of all chunks in the heap, the object sizes they correspond to, +and how many live objects were found in the chunk at the last collection. + +Growing data structures can usually be identified by: + + 1. Building the collector with `-DKEEP_BACK_PTRS`, + 2. Preferably using debugging allocation (defining `GC_DEBUG` before + including `gc.h` and allocating with `GC_MALLOC`), so that objects will + be identified by their allocation site, + 3. Running the application long enough so that most of the heap is composed + of "leaked" memory, and + 4. Then calling `GC_generate_random_backtrace` from gc_backptr.h a few times + to determine why some randomly sampled objects in the heap are being + retained. + +The same technique can often be used to identify problems with false pointers, +by noting whether the reference chains printed +by `GC_generate_random_backtrace` involve any misidentified pointers. +An alternate technique is to build the collector with `-DPRINT_BLACK_LIST` +which will cause it to report values that are almost, but not quite, look like +heap pointers. It is very likely that actual false pointers will come from +similar sources. + +In the unlikely case that false pointers are an issue, it can usually +be resolved using one or more of the following techniques: + + 1. Use `GC_malloc_atomic` for objects containing no pointers. This is + especially important for large arrays containing compressed data, + pseudo-random numbers, and the like. It is also likely to improve GC + performance, perhaps drastically so if the application is paging. + 2. If you allocate large objects containing only one or two pointers at the + beginning, either try the typed allocation primitives is`gc_typed.h`, + or separate out the pointer-free component. + 3. Consider using `GC_malloc_ignore_off_page` to allocate large objects. + (See `gc.h` and above for details. Large means >100K in most environments.) + 4. If your heap size is larger than 100MB or so, build the collector with + `-DLARGE_CONFIG`. This allows the collector to keep more precise black-list + information. + 5. If you are using heaps close to, or larger than, a gigabyte on a 32-bit + machine, you may want to consider moving to a platform with 64-bit pointers. + This is very likely to resolve any false pointer issues. + +## Prematurely Reclaimed Objects + +The usual symptom of this is a segmentation fault, or an obviously overwritten +value in a heap object. This should, of course, be impossible. In practice, +it may happen for reasons like the following: + + 1. The collector did not intercept the creation of threads correctly + in a multi-threaded application, e.g. because the client called + `pthread_create` without including `gc.h`, which redefines it. + 2. The last pointer to an object in the garbage collected heap was stored + somewhere were the collector could not see it, e.g. in an object allocated + with system `malloc`, in certain types of `mmap`ed files, or in some data + structure visible only to the OS. (On some platforms, thread-local storage + is one of these.) + 3. The last pointer to an object was somehow disguised, e.g. by XORing + it with another pointer. + 4. Incorrect use of `GC_malloc_atomic` or typed allocation. + 5. An incorrect `GC_free` call. + 6. The client program overwrote an internal garbage collector data + structure. + 7. A garbage collector bug. + 8. (Empirically less likely than any of the above.) A compiler optimization + that disguised the last pointer. + +The following relatively simple techniques should be tried first to narrow +down the problem: + + 1. If you are using the incremental collector try turning it off for + debugging. + 2. If you are using shared libraries, try linking statically. If that works, + ensure that DYNAMIC_LOADING is defined on your platform. + 3. Try to reproduce the problem with fully debuggable unoptimized code. This + will eliminate the last possibility, as well as making debugging easier. + 4. Try replacing any suspect typed allocation and `GC_malloc_atomic` calls + with calls to `GC_malloc`. + 5. Try removing any `GC_free` calls (e.g. with a suitable `#define`). + 6. Rebuild the collector with `-DGC_ASSERTIONS`. + 7. If the following works on your platform (i.e. if gctest still works if + you do this), try building the collector with + `-DREDIRECT_MALLOC=GC_malloc_uncollectable`. This will cause the collector + to scan memory allocated with malloc. + +If all else fails, you will have to attack this with a debugger. The suggested +steps are: + + 1. Call `GC_dump` from the debugger around the time of the failure. Verify + that the collectors idea of the root set (i.e. static data regions which + it should scan for pointers) looks plausible. If not, i.e. if it does not + include some static variables, report this as a collector bug. Be sure + to describe your platform precisely, since this sort of problem is nearly + always very platform dependent. + 2. Especially if the failure is not deterministic, try to isolate + it to a relatively small test case. + 3. Set a break point in `GC_finish_collection`. This is a good point + to examine what has been marked, i.e. found reachable, by the collector. + 4. If the failure is deterministic, run the process up to the last + collection before the failure. Note that the variable `GC_gc_no` counts + collections and can be used to set a conditional breakpoint in the right + one. It is incremented just before the call to `GC_finish_collection`. + If object `p` was prematurely recycled, it may be helpful to look + at `*GC_find_header(p)` at the failure point. The `hb_last_reclaimed` field + will identify the collection number during which its block was last swept. + 5. Verify that the offending object still has its correct contents at this + point. Then call `GC_is_marked(p)` from the debugger to verify that the + object has not been marked, and is about to be reclaimed. Note that + `GC_is_marked(p)` expects the real address of an object (the address of the + debug header if there is one), and thus it may be more appropriate to call + `GC_is_marked(GC_base(p))` instead. + 6. Determine a path from a root, i.e. static variable, stack, or register + variable, to the reclaimed object. Call `GC_is_marked(q)` for each object + `q` along the path, trying to locate the first unmarked object, say `r`. + 7. If `r` is pointed to by a static root, verify that the location pointing + to it is part of the root set printed by `GC_dump`. If it is on the stack + in the main (or only) thread, verify that `GC_stackbottom` is set correctly + to the base of the stack. If it is in another thread stack, check the + collector's thread data structure (`GC_thread[]` on several platforms) + to make sure that stack bounds are set correctly. + 8. If `r` is pointed to by heap object `s`, check that the collector's + layout description for `s` is such that the pointer field will be scanned. + Call `*GC_find_header(s)` to look at the descriptor for the heap chunk. + The `hb_descr` field specifies the layout of objects in that chunk. + See `gc_mark.h` for the meaning of the descriptor. (If its low order 2 bits + are zero, then it is just the length of the object prefix to be scanned. + This form is always used for objects allocated with `GC_malloc` or + `GC_malloc_atomic`.) + 9. If the failure is not deterministic, you may still be able to apply some + of the above technique at the point of failure. But remember that objects + allocated since the last collection will not have been marked, even if the + collector is functioning properly. On some platforms, the collector can + be configured to save call chains in objects for debugging. Enabling this + feature will also cause it to save the call stack at the point of the last + GC in `GC_arrays._last_stack`. + 10. When looking at GC internal data structures remember that a number + of `GC_xxx` variables are really macro defined to `GC_arrays._xxx`, so that + the collector can avoid scanning them. diff --git a/doc/doc.am b/doc/doc.am index d280a58c..cea7f661 100644 --- a/doc/doc.am +++ b/doc/doc.am @@ -36,14 +36,14 @@ dist_doc_DATA = \ doc/README.uts \ doc/README.win32 \ doc/README.win64 \ - doc/debugging.html \ - doc/finalization.html \ + doc/debugging.md \ + doc/finalization.md \ doc/gc.man \ doc/gcdescr.html \ doc/gcinterface.html \ - doc/leak.html \ + doc/leak.md \ doc/overview.html \ - doc/porting.html \ - doc/scale.html \ - doc/simple_example.html \ + doc/porting.md \ + doc/scale.md \ + doc/simple_example.md \ doc/tree.html diff --git a/doc/finalization.html b/doc/finalization.html deleted file mode 100644 index 3733c0c4..00000000 --- a/doc/finalization.html +++ /dev/null @@ -1,190 +0,0 @@ - - -Finalization in the Boehm-Demers-Weiser collector - - -

Finalization

-Many garbage collectors provide a facility for executing user code -just before an object is collected. This can be used to reclaim any -system resources or non-garbage-collected memory associated with the -object. -Experience has shown that this can be a useful facility. -It is indispensable in cases in which system resources are embedded -in complex data structures (e.g. file descriptors -in the cord package). -

-Our collector provides the necessary functionality through -GC_register_finalizer in -gc.h, or by -inheriting from gc_cleanup -in gc_cpp.h. -

-However, finalization should not be used in the same way as C++ -destructors. In well-written programs there will typically be -very few uses of finalization. (Garbage collected programs that -interact with explicitly memory-managed libraries may be an exception.) -

-In general the following guidelines should be followed: -

-

Topologically Ordered Finalization

-Our conservative garbage collector supports -a form of finalization -(with GC_register_finalizer) -in which objects are finalized in topological -order. If A points to B, and both are registered for -finalization, it is guaranteed the A will be finalized first. -This usually guarantees that finalization procedures see only -unfinalized objects. -

-This decision is often questioned, particularly since it has an obvious -disadvantage. The current implementation finalizes long chains of -finalizable objects one per collection. This is hard to avoid, since -the first finalizer invoked may store a pointer to the rest of the chain -in a global variable, making it accessible again. Or it may mutate the -rest of the chain. -

-Cycles involving one or more finalizable objects are never finalized. -

-Why topological ordering? -

-It is important to keep in mind that the choice of finalization ordering -matters only in relatively rare cases. In spite of the fact that it has -received a lot of discussion, it is not one of the more important -decisions in designing a system. Many, especially smaller, applications -will never notice the difference. Nonetheless, we believe that topologically -ordered finalization is the right choice. -

-To understand the justification, observe that if As -finalization procedure does not refer to B, we could fairly easily have -avoided the dependency. We could have split A into A' -and A'' such that any references to A become references to -A', A' points to A'' but not vice-versa, only fields -needed for finalization are stored in A'', and A'' is enabled -for finalization. (GC_register_disappearing_link provides an -alternative mechanism that does not require breaking up objects.) -

-Thus assume that A actually does need access to B during -finalization. To make things concrete, assume that B is -finalizable because it holds a pointer to a C object, which must be -explicitly deallocated. (This is likely to be one of the most common -uses of finalization.) If B happens to be finalized first, -A will see a dangling pointer during its finalization. But a -principal goal of garbage collection was to avoid dangling pointers. -

-Note that the client program could enforce topological ordering -even if the system didn't. A pointer to B could be stored in -some globally visible place, where it is cleared only by As -finalizer. But this puts the burden to ensure safety back on the -programmer. -

-With topologically ordered finalization, the programmer -can fail to split an object, thus leaving an accidental cycle. This -results in a leak, which is arguably less dangerous than a dangling -pointer. More importantly, it is much easier to diagnose, -since the garbage collector would have to go out of its way not to -notice finalization cycles. It can trivially report them. -

-Furthermore unordered finalization does not really solve the problem -of cycles. Consider the above case in which As -finalization procedure depends on B, and thus a pointer to B -is stored in a global data structure, to be cleared by As finalizer. -If there is an accidental pointer from B back to A, and -thus a cycle, neither B nor A will become unreachable. -The leak is there, just as in the topologically ordered case, but it is -hidden from easy diagnosis. -

-A number of alternative finalization orderings have been proposed, e.g. -based on statically assigned priorities. In our opinion, these are much -more likely to require complex programming discipline to use in a large -modular system. (Some of them, e.g. Guardians proposed by Dybvig, -Bruggeman, and Eby, do avoid some problems which arise in combination -with certain other collection algorithms.) -

-Fundamentally, a garbage collector assumes that objects reachable -via pointer chains may be accessed, and thus should be preserved. -Topologically ordered finalization simply extends this to object finalization; -an finalizable object reachable from another finalizer via a pointer chain -is presumed to be accessible by the finalizer, and thus should not be -finalized. - -

Programming with topological finalization

-Experience with Cedar has shown that cycles or long chains of finalizable -objects are typically not a problem. -Finalizable objects are typically rare. -There are several ways to reduce spurious dependencies between finalizable -objects. Splitting objects as discussed above is one technique. -The collector also provides GC_register_disappearing_link, which -explicitly nils a pointer before determining finalization ordering. -

-Some so-called "operating systems" fail to clean up some resources associated -with a process. These resources must be deallocated at all cost before -process exit whether or not they are still referenced. Probably the best -way to deal with those is by not relying exclusively on finalization. -They should be registered in a table of weak pointers (implemented as -disguised pointers cleared by the finalization procedure that deallocates -the resource). If any references are still left at process exit, they -can be explicitly deallocated then. - -

Getting around topological finalization ordering

-There are certain situations in which cycles between finalizable objects are -genuinely unavoidable. Most notably, C++ compilers introduce self-cycles -to represent inheritance. GC_register_finalizer_ignore_self tells the -finalization part of the collector to ignore self cycles. -This is used by the C++ interface. -

-Finalize.c actually contains an intentionally undocumented mechanism -for registering a finalizable object with user-defined dependencies. -The problem is that this dependency information is also used for memory -reclamation, not just finalization ordering. Thus misuse can result in -dangling pointers even if finalization doesn't create any. -The risk of dangling pointers can be eliminated by building the collector -with -DJAVA_FINALIZATION. This forces objects reachable from finalizers -to be marked, even though this dependency is not considered for finalization -ordering. - - - diff --git a/doc/finalization.md b/doc/finalization.md new file mode 100644 index 00000000..13c14afb --- /dev/null +++ b/doc/finalization.md @@ -0,0 +1,159 @@ +# Finalization + +Many garbage collectors provide a facility for executing user code just before +an object is collected. This can be used to reclaim any system resources +or non-garbage-collected memory associated with the object. Experience has +shown that this can be a useful facility. It is indispensable in cases +in which system resources are embedded in complex data structures (e.g. file +descriptors in the `include/cord.h`). + +Our collector provides the necessary functionality through +`GC_register_finalizer` in `include/gc.h`, or by inheriting from `gc_cleanup` +in `include/gc_cpp.h`). + +However, finalization should not be used in the same way as C++ destructors. +In well-written programs there will typically be very few uses +of finalization. (Garbage collected programs that interact with explicitly +memory-managed libraries may be an exception.) + +In general the following guidelines should be followed: + + * Actions that must be executed promptly do not belong in finalizers. They + should be handled by explicit calls in the code (or C++ destructors if you + prefer). If you expect the action to occur at a specific point, this + is probably not hard. + * Finalizers are intended for resource reclamation. + * Scarce system resources should be managed explicitly whenever convenient. + Use finalizers only as a backup mechanism for the cases that would be hard + to handle explicitly. + * If scarce resources are managed with finalization, the allocation routine + for that resource (e.g. open for file handles) should force a garbage + collection (two if that does not suffice) if it finds itself short of the + resource. + * If extremely scarce resources are managed by finalization (e.g. file + descriptors on systems which have a limit of 20 open files), it may + be necessary to introduce a descriptor caching scheme to hide the resource + limit. (E.g., the program would keep real file descriptors for the 20 most + recently used logically open files. Any other needed files would be closed + after saving their state. They would then be reopened on demand. + Finalization would logically close the file, closing the real descriptor + only if it happened to be cached.) Note that most modern systems (e.g. Irix) + allow hundreds or thousands of open files, and this is typically not + an issue. + * Finalization code may be run anyplace an allocation or other call to the + collector takes place. In multi-threaded programs, finalizers have to obey + the normal locking conventions to ensure safety. Code run directly from + finalizers should not acquire locks that may be held during allocation. + This restriction can be easily circumvented by registering a finalizer which + enqueues the real action for execution in a separate thread. + +In single-threaded code, it is also often easiest to have finalizers queue +actions, which are then explicitly run during an explicit call by the user's +program. + +# Topologically Ordered Finalization + +Our _conservative garbage collector_ supports a form of finalization (with +`GC_register_finalizer`) in which objects are finalized in topological order. +If _A_ points to _B_ and both are registered for finalization, it is +guaranteed the _A_ will be finalized first. This usually guarantees that +finalization procedures see only unfinalized objects. + +This decision is often questioned, particularly since it has an obvious +disadvantage. The current implementation finalizes long chains of finalizable +objects one per collection. This is hard to avoid, since the first finalizer +invoked may store a pointer to the rest of the chain in a global variable, +making it accessible again. Or it may mutate the rest of the chain. + +Cycles involving one or more finalizable objects are never finalized. + +# Why topological ordering? + +It is important to keep in mind that the choice of finalization ordering +matters only in relatively rare cases. In spite of the fact that it has +received a lot of discussion, it is not one of the more important decisions +in designing a system. Many, especially smaller, applications will never +notice the difference. Nonetheless, we believe that topologically ordered +finalization is the right choice. + +To understand the justification, observe that if _A_'s finalization procedure +does not refer to _B_, we could fairly easily have avoided the dependency. +We could have split _A_ into _A'_ and _A''_ such that any references to _A_ +become references to _A'_, _A'_ points to _A''_ but not vice-versa, only +fields needed for finalization are stored in _A''_, and _A''_ is enabled for +finalization. (`GC_register_disappearing_link` provides an alternative +mechanism that does not require breaking up objects.) + +Thus assume that _A_ actually does need access to _B_ during finalization. +To make things concrete, assume that _B_ is finalizable because it holds +a pointer to a C object, which must be explicitly deallocated. (This is likely +to be one of the most common uses of finalization.) If _B_ happens to be +finalized first, _A_ will see a dangling pointer during its finalization. But +a principal goal of garbage collection was to avoid dangling pointers. + +Note that the client program could enforce topological ordering even if the +system did not. A pointer to _B_ could be stored in some globally visible +place, where it is cleared only by _A_'s finalizer. But this puts the burden +to ensure safety back on the programmer. + +With topologically ordered finalization, the programmer can fail to split +an object, thus leaving an accidental cycle. This results in a leak, which +is arguably less dangerous than a dangling pointer. More importantly, it is +_much_ easier to diagnose, since the garbage collector would have to go out of +its way not to notice finalization cycles. It can trivially report them. + +Furthermore unordered finalization does not really solve the problem +of cycles. Consider the above case in which _A_'s finalization procedure +depends on _B_, and thus a pointer to _B_ is stored in a global data +structure, to be cleared by _A_'s finalizer. If there is an accidental pointer +from _B_ back to _A_, and thus a cycle, neither _B_ nor _A_ will become +unreachable. The leak is there, just as in the topologically ordered case, but +it is hidden from easy diagnosis. + +A number of alternative finalization orderings have been proposed, e.g. based +on statically assigned priorities. In our opinion, these are much more likely +to require complex programming discipline to use in a large modular system. +(Some of them, e.g. Guardians proposed by Dybvig, Bruggeman, and Eby, do avoid +some problems which arise in combination with certain other collection +algorithms.) + +Fundamentally, a garbage collector assumes that objects reachable via pointer +chains may be accessed, and thus should be preserved. Topologically ordered +finalization simply extends this to object finalization; an finalizable object +reachable from another finalizer via a pointer chain is presumed to be +accessible by the finalizer, and thus should not be finalized. + +# Programming with topological finalization + +Experience with Cedar has shown that cycles or long chains of finalizable +objects are typically not a problem. Finalizable objects are typically rare. +There are several ways to reduce spurious dependencies between finalizable +objects. Splitting objects as discussed above is one technique. The collector +also provides `GC_register_disappearing_link`, which explicitly nils a pointer +before determining finalization ordering. + +Some so-called "operating systems" fail to clean up some resources associated +with a process. These resources must be deallocated at all cost before process +exit whether or not they are still referenced. Probably the best way to deal +with those is by not relying exclusively on finalization. They should +be registered in a table of weak pointers (implemented as disguised pointers +cleared by the finalization procedure that deallocates the resource). If any +references are still left at process exit, they can be explicitly deallocated +then. + +# Getting around topological finalization ordering + +There are certain situations in which cycles between finalizable objects are +genuinely unavoidable. Most notably, C++ compilers introduce self-cycles +to represent inheritance. `GC_register_finalizer_ignore_self` tells the +finalization part of the collector to ignore self cycles. This is used by the +C++ interface. + +Finalize.c actually contains an intentionally undocumented mechanism for +registering a finalizable object with user-defined dependencies. The problem +is that this dependency information is also used for memory reclamation, not +just finalization ordering. Thus misuse can result in dangling pointers even +if finalization does not create any. The risk of dangling pointers can be +eliminated by building the collector with `-DJAVA_FINALIZATION`. This forces +objects reachable from finalizers to be marked, even though this dependency +is not considered for finalization ordering. diff --git a/doc/gcdescr.html b/doc/gcdescr.html index a30e776b..25377354 100644 --- a/doc/gcdescr.html +++ b/doc/gcdescr.html @@ -275,8 +275,7 @@ of the garbage collection time. The fact that it performs only a small amount of work per call also allows it to be used as the core routine of the parallel marker. In that case it is normally invoked on thread-private mark stacks instead of the -global mark stack. More details can be found in -scale.html +global mark stack. More details can be found here.

The marker correctly handles mark stack overflows. Whenever the mark stack overflows, the mark state is reset to MS_INVALID. @@ -407,7 +406,7 @@ object itself becomes marked, we have uncovered a cycle involving the object. This usually results in a warning from the collector. Such objects are not finalized, since it may be unsafe to do so. See the more detailed - discussion of finalization semantics. + discussion of finalization semantics.

Any objects remaining unmarked at the end of this process are added to a queue of objects whose finalizers can be run. Depending on collector @@ -553,7 +552,7 @@ by using ld's function call wrapping mechanism under Linux.

Recent versions of the collector support several facilities to enhance the processor-scalability and thread performance of the collector. -These are discussed in more detail here. +These are discussed in more detail here. We briefly outline the data approach to thread-local allocation in the next section.

Thread-local allocation

@@ -606,7 +605,7 @@ Note that if the collector is configured for thread-local allocation, GC versions before 7 do not invoke the thread-local allocator by default. GC_malloc only uses thread-local allocation in version 7 and later.

-For some more details see here, and the +For some more details see here, and the technical report entitled "Fast Multiprocessor Memory Allocation and Garbage Collection" diff --git a/doc/gcinterface.html b/doc/gcinterface.html index 57729549..a7bca8d2 100644 --- a/doc/gcinterface.html +++ b/doc/gcinterface.html @@ -144,7 +144,7 @@ inaccessible. It is not an acceptable method to perform actions that must be performed in a timely fashion. See gc.h for details of the interface. -See here for a more detailed discussion +See here for a more detailed discussion of the design.

Note that an object may become inaccessible before client code is done diff --git a/doc/leak.html b/doc/leak.html deleted file mode 100644 index 43bc66d5..00000000 --- a/doc/leak.html +++ /dev/null @@ -1,207 +0,0 @@ - - - - -Using the Garbage Collector as Leak Detector - - -

Using the Garbage Collector as Leak Detector

-The garbage collector may be used as a leak detector. -In this case, the primary function of the collector is to report -objects that were allocated (typically with GC_MALLOC), -not deallocated (normally with GC_FREE), but are -no longer accessible. Since the object is no longer accessible, -there in normally no way to deallocate the object at a later time; -thus it can safely be assumed that the object has been "leaked". -

-This is substantially different from counting leak detectors, -which simply verify that all allocated objects are eventually -deallocated. A garbage-collector based leak detector can provide -somewhat more precise information when an object was leaked. -More importantly, it does not report objects that are never -deallocated because they are part of "permanent" data structures. -Thus it does not require all objects to be deallocated at process -exit time, a potentially useless activity that often triggers -large amounts of paging. -

-All non-ancient versions of the garbage collector provide -leak detection support. Version 5.3 adds the following -features: -

    -
  1. Leak detection mode can be initiated at run-time by -setting GC_find_leak instead of building the -collector with FIND_LEAK -defined. This variable should be set to a nonzero value -at program startup. -
  2. Leaked objects should be reported and then correctly garbage collected. -Prior versions either reported leaks or functioned as a garbage collector. -
-For the rest of this description we will give instructions that work -with any reasonable version of the collector. -

-To use the collector as a leak detector, follow the following steps: -

    -
  1. Build the collector with -DFIND_LEAK. Otherwise use default -build options. -
  2. Change the program so that all allocation and deallocation goes -through the garbage collector. -
  3. Arrange to call GC_gcollect at appropriate points to check -for leaks. -(For sufficiently long running programs, this will happen implicitly, -but probably not with sufficient frequency.) -
-The second step can usually be accomplished with the --DREDIRECT_MALLOC=GC_malloc option when the collector is built, -or by defining malloc, calloc, -realloc and free -to call the corresponding garbage collector functions. -But this, by itself, will not yield very informative diagnostics, -since the collector does not keep track of information about -how objects were allocated. The error reports will include -only object addresses. -

-For more precise error reports, as much of the program as possible -should use the all uppercase variants of these functions, after -defining GC_DEBUG, and then including gc.h. -In this environment GC_MALLOC is a macro which causes -at least the file name and line number at the allocation point to -be saved as part of the object. Leak reports will then also include -this information. -

-Many collector features (e.g. stubborn objects, finalization, -and disappearing links) are less useful in this context, and are not -fully supported. Their use will usually generate additional bogus -leak reports, since the collector itself drops some associated objects. -

-The same is generally true of thread support. However, as of 6.0alpha4, -correct leak reports should be generated with linuxthreads. -

-On a few platforms (currently Solaris/SPARC, Irix, and, with -DSAVE_CALL_CHAIN, -Linux/X86), GC_MALLOC -also causes some more information about its call stack to be saved -in the object. Such information is reproduced in the error -reports in very non-symbolic form, but it can be very useful with the -aid of a debugger. -

An Example

-The following header file leak_detector.h is included in the -"include" subdirectory of the distribution: -
-#define GC_DEBUG
-#include "gc.h"
-#define malloc(n) GC_MALLOC(n)
-#define calloc(m,n) GC_MALLOC((m)*(n))
-#define free(p) GC_FREE(p)
-#define realloc(p,n) GC_REALLOC((p),(n))
-#define CHECK_LEAKS() GC_gcollect()
-
-

-Assume the collector has been built with -DFIND_LEAK. (For -newer versions of the collector, we could instead add the statement -GC_find_leak = 1 as the first statement in main(). -

-The program to be tested for leaks can then look like: -

-#include "leak_detector.h"
-
-main() {
-    int *p[10];
-    int i;
-    /* GC_find_leak = 1; for new collector versions not         */
-    /* compiled with -DFIND_LEAK.                               */
-    for (i = 0; i < 10; ++i) {
-        p[i] = malloc(sizeof(int)+i);
-    }
-    for (i = 1; i < 10; ++i) {
-        free(p[i]);
-    }
-    for (i = 0; i < 9; ++i) {
-        p[i] = malloc(sizeof(int)+i);
-    }
-    CHECK_LEAKS();
-}
-
-

-On an Intel X86 Linux system this produces on the stderr stream: -

-Leaked composite object at 0x806dff0 (leak_test.c:8, sz=4)
-
-(On most unmentioned operating systems, the output is similar to this. -If the collector had been built on Linux/X86 with -DSAVE_CALL_CHAIN, -the output would be closer to the Solaris example. For this to work, -the program should not be compiled with -fomit_frame_pointer.) -

-On Irix it reports -

-Leaked composite object at 0x10040fe0 (leak_test.c:8, sz=4)
-        Caller at allocation:
-                ##PC##= 0x10004910
-
-and on Solaris the error report is -
-Leaked composite object at 0xef621fc8 (leak_test.c:8, sz=4)
-        Call chain at allocation:
-                args: 4 (0x4), 200656 (0x30FD0)
-                ##PC##= 0x14ADC
-                args: 1 (0x1), -268436012 (0xEFFFFDD4)
-                ##PC##= 0x14A64
-
-In the latter two cases some additional information is given about -how malloc was called when the leaked object was allocated. For -Solaris, the first line specifies the arguments to GC_debug_malloc -(the actual allocation routine), The second the program counter inside -main, the third the arguments to main, and finally the program -counter inside the caller to main (i.e. in the C startup code). -

-In the Irix case, only the address inside the caller to main is given. -

-In many cases, a debugger is needed to interpret the additional information. -On systems supporting the "adb" debugger, the tools/callprocs.sh -script can be used to replace program counter values with symbolic names. -As of version 6.1, the collector tries to generate symbolic names for -call stacks if it knows how to do so on the platform. This is true on -Linux/X86, but not on most other platforms. -

Simplified leak detection under Linux

-Since version 6.1, it should be possible to run the collector in leak -detection mode on a program a.out under Linux/X86 as follows: -
    -
  1. Ensure that a.out is a single-threaded executable, or you are using -a very recent (7.0alpha7+) collector version on Linux. -On most platforms this does not work at all for the multi-threaded programs. -
  2. If possible, ensure that the addr2line program is installed in -/usr/bin. (It comes with most Linux distributions.) -
  3. If possible, compile your program, which we'll call a.out, -with full debug information. -This will improve the quality of the leak reports. With this approach, it is -no longer necessary to call GC_ routines explicitly, -though that can also -improve the quality of the leak reports. -
  4. Build the collector and install it in directory foo as follows: -
      -
    • configure --prefix=foo --enable-gc-debug --enable-redirect-malloc ---disable-threads -
    • make -
    • make install -
    -With a very recent collector on Linux, it may sometimes be safe to omit -the --disable-threads. But the combination of thread support -and malloc replacement is not yet rock solid. -
  5. Set environment variables as follows: -
      -
    • LD_PRELOAD=foo/lib/libgc.so -
    • GC_FIND_LEAK -
    • You may also want to set GC_PRINT_STATS -(to confirm that the collector is running) and/or -GC_LOOP_ON_ABORT (to facilitate debugging from another -window if something goes wrong). -
    -
  6. Simply run a.out as you normally would. Note that if you run anything -else (e.g. your editor) with those environment variables set, -it will also be leak tested. This may or may not be useful and/or -embarrassing. It can generate -mountains of leak reports if the application wasn't designed to avoid leaks, -e.g. because it's always short-lived. -
-This has not yet been thoroughly tested on large applications, but it's known -to do the right thing on at least some small ones. - - diff --git a/doc/leak.md b/doc/leak.md new file mode 100644 index 00000000..9af33e87 --- /dev/null +++ b/doc/leak.md @@ -0,0 +1,189 @@ +# Using the Garbage Collector as Leak Detector + +The garbage collector may be used as a leak detector. In this case, the +primary function of the collector is to report objects that were allocated +(typically with `GC_MALLOC`), not deallocated (normally with `GC_FREE`), but +are no longer accessible. Since the object is no longer accessible, there +in normally no way to deallocate the object at a later time; thus it can +safely be assumed that the object has been "leaked". + +This is substantially different from counting leak detectors, which simply +verify that all allocated objects are eventually deallocated. +A garbage-collector based leak detector can provide somewhat more precise +information when an object was leaked. More importantly, it does not report +objects that are never deallocated because they are part of "permanent" data +structures. Thus it does not require all objects to be deallocated at process +exit time, a potentially useless activity that often triggers large amounts +of paging. + +All non-ancient versions of the garbage collector provide leak detection +support. Version 5.3 adds the following features: + + 1. Leak detection mode can be initiated at run-time by setting + `GC_find_leak` instead of building the collector with `FIND_LEAK` defined. + This variable should be set to a nonzero value at program startup. + 2. Leaked objects should be reported and then correctly garbage collected. + Prior versions either reported leaks or functioned as a garbage collector. + For the rest of this description we will give instructions that work with + any reasonable version of the collector. + +To use the collector as a leak detector, follow the following steps: + + 1. Build the collector with `-DFIND_LEAK`. Otherwise use default build + options. + 2. Change the program so that all allocation and deallocation goes through + the garbage collector. + 3. Arrange to call `GC_gcollect` at appropriate points to check for leaks. + (For sufficiently long running programs, this will happen implicitly, but + probably not with sufficient frequency.) The second step can usually + be accomplished with the `-DREDIRECT_MALLOC=GC_malloc` option when the + collector is built, or by defining `malloc`, `calloc`, `realloc` and `free` + to call the corresponding garbage collector functions. But this, by itself, + will not yield very informative diagnostics, since the collector does not + keep track of information about how objects were allocated. The error + reports will include only object addresses. + +For more precise error reports, as much of the program as possible should use +the all uppercase variants of these functions, after defining `GC_DEBUG`, and +then including `gc.h`. In this environment `GC_MALLOC` is a macro which causes +at least the file name and line number at the allocation point to be saved +as part of the object. Leak reports will then also include this information. + +Many collector features (e.g. stubborn objects, finalization, and +disappearing links) are less useful in this context, and are not fully +supported. Their use will usually generate additional bogus leak reports, +since the collector itself drops some associated objects. + +The same is generally true of thread support. However, as of 6.0alpha4, +correct leak reports should be generated with linuxthreads. + +On a few platforms (currently Solaris/SPARC, Irix, and, with +-DSAVE_CALL_CHAIN, Linux/X86), `GC_MALLOC` also causes some more information +about its call stack to be saved in the object. Such information is reproduced +in the error reports in very non-symbolic form, but it can be very useful with +the aid of a debugger. + +## An Example + +The following header file `leak_detector.h` is included in the "include" +subdirectory of the distribution: + + + #define GC_DEBUG + #include "gc.h" + #define malloc(n) GC_MALLOC(n) + #define calloc(m,n) GC_MALLOC((m)*(n)) + #define free(p) GC_FREE(p) + #define realloc(p,n) GC_REALLOC((p),(n)) + #define CHECK_LEAKS() GC_gcollect() + + +Assume the collector has been built with `-DFIND_LEAK`. (For newer versions +of the collector, we could instead add the statement `GC_find_leak = 1` as the +first statement in `main`. + +The program to be tested for leaks can then look like: + + + #include "leak_detector.h" + + main() { + int *p[10]; + int i; + /* GC_find_leak = 1; for new collector versions not */ + /* compiled with -DFIND_LEAK. */ + for (i = 0; i < 10; ++i) { + p[i] = malloc(sizeof(int)+i); + } + for (i = 1; i < 10; ++i) { + free(p[i]); + } + for (i = 0; i < 9; ++i) { + p[i] = malloc(sizeof(int)+i); + } + CHECK_LEAKS(); + } + + +On an Intel X86 Linux system this produces on the stderr stream: + + + Leaked composite object at 0x806dff0 (leak_test.c:8, sz=4) + + +(On most unmentioned operating systems, the output is similar to this. If the +collector had been built on Linux/X86 with `-DSAVE_CALL_CHAIN`, the output +would be closer to the Solaris example. For this to work, the program should +not be compiled with `-fomit_frame_pointer`.) + +On Irix it reports: + + + Leaked composite object at 0x10040fe0 (leak_test.c:8, sz=4) + Caller at allocation: + ##PC##= 0x10004910 + + +and on Solaris the error report is: + + + Leaked composite object at 0xef621fc8 (leak_test.c:8, sz=4) + Call chain at allocation: + args: 4 (0x4), 200656 (0x30FD0) + ##PC##= 0x14ADC + args: 1 (0x1), -268436012 (0xEFFFFDD4) + ##PC##= 0x14A64 + + +In the latter two cases some additional information is given about how malloc +was called when the leaked object was allocated. For Solaris, the first line +specifies the arguments to `GC_debug_malloc` (the actual allocation routine), +The second the program counter inside main, the third the arguments to `main`, +and finally the program counter inside the caller to main (i.e. in the +C startup code). + +In the Irix case, only the address inside the caller to main is given. + +In many cases, a debugger is needed to interpret the additional information. +On systems supporting the "adb" debugger, the `tools/callprocs.sh` script can +be used to replace program counter values with symbolic names. As of version +6.1, the collector tries to generate symbolic names for call stacks if it +knows how to do so on the platform. This is true on Linux/X86, but not on most +other platforms. + +## Simplified leak detection under Linux + +Since version 6.1, it should be possible to run the collector in leak +detection mode on a program a.out under Linux/X86 as follows: + + 1. _Ensure that a.out is a single-threaded executable, or you are using + a very recent (7.0alpha7+) collector version on Linux._ On most platforms + this does not work at all for the multi-threaded programs. + 2. If possible, ensure that the `addr2line` program is installed + in `/usr/bin`. (It comes with most Linux distributions.) + 3. If possible, compile your program, which we'll call `a.out`, with full + debug information. This will improve the quality of the leak reports. + With this approach, it is no longer necessary to call `GC_` routines + explicitly, though that can also improve the quality of the leak reports. + 4. Build the collector and install it in directory _foo_ as follows: + * `configure --prefix=_foo_ --enable-gc-debug --enable-redirect-malloc --disable-threads` + * `make` + * `make install` + + With a very recent collector on Linux, it may sometimes be safe to omit + the `--disable-threads`. But the combination of thread support and + `malloc` replacement is not yet rock solid. + 5. Set environment variables as follows: + * `LD_PRELOAD=`_foo_`/lib/libgc.so` + * `GC_FIND_LEAK` + + You may also want to set `GC_PRINT_STATS` (to confirm that the collector + is running) and/or `GC_LOOP_ON_ABORT` (to facilitate debugging from + another window if something goes wrong). + 6. Simply run `a.out` as you normally would. Note that if you run anything + else (e.g. your editor) with those environment variables set, it will also + be leak tested. This may or may not be useful and/or embarrassing. It can + generate mountains of leak reports if the application was not designed + to avoid leaks, e.g. because it's always short-lived. This has not yet + been thoroughly tested on large applications, but it's known to do the right + thing on at least some small ones. diff --git a/doc/overview.html b/doc/overview.html index b8c6f333..2c451204 100644 --- a/doc/overview.html +++ b/doc/overview.html @@ -6,7 +6,7 @@ Interface Overview Tutorial Slides FAQ - Example + Example Download License @@ -39,7 +39,7 @@ without explicitly deallocating memory that is no longer useful. The collector automatically recycles memory when it determines that it can no longer be otherwise accessed. A simple example of such a use is given -here. +here.

The collector is also used by a number of programming language implementations that either use C as intermediate code, want @@ -49,7 +49,7 @@ For a more detailed description of the interface, see here.

Alternatively, the garbage collector may be used as -a leak detector +a leak detector for C or C++ programs, though that is not its primary goal.

Typically several versions will be available. @@ -129,14 +129,14 @@ based on this one. Their collector takes advantage of multiple processors during a collection. Starting with collector version 6.0alpha1 we also do this, though with more modest processor scalability goals. Our approach is discussed briefly in -scale.html. +this document.

Some Collector Details

The collector uses a mark-sweep algorithm. It provides incremental and generational collection under operating systems which provide the right kind of virtual memory support. (Currently this includes SunOS[45], IRIX, OSF/1, Linux, and Windows, with varying restrictions.) -It allows finalization code +It allows finalization code to be invoked when an object is collected. It can take advantage of type information to locate pointers if such information is provided, but it is usually used without such information. @@ -340,7 +340,7 @@ system. Asymptote LaTeX-compatible vector graphics language.

More collector information at this site

-A simple illustration of how to build and +A simple illustration of how to build and use the collector.

Description of alternate interfaces to the @@ -350,9 +350,9 @@ garbage collector.

A FAQ (frequently asked questions) list.

-How to use the garbage collector as a leak detector. +How to use the garbage collector as a leak detector.

-Some hints on debugging garbage collected +Some hints on debugging garbage collected applications.

An overview of the implementation of the @@ -360,7 +360,7 @@ garbage collector.

The data structure used for fast pointer lookups.

-Scalability of the collector to multiprocessors. +Scalability of the collector to multiprocessors.

Directory containing garbage collector source.

More background information at this site

diff --git a/doc/porting.html b/doc/porting.html deleted file mode 100644 index 2b019d7d..00000000 --- a/doc/porting.html +++ /dev/null @@ -1,326 +0,0 @@ - - - Conservative GC Porting Directions - - -

Conservative GC Porting Directions

-The collector is designed to be relatively easy to port, but is not -portable code per se. The collector inherently has to perform operations, -such as scanning the stack(s), that are not possible in portable C code. -

-All of the following assumes that the collector is being ported to a -byte-addressable 32- or 64-bit machine. Currently all successful ports -to 64-bit machines involve LP64 targets. The code base includes some -provisions for P64 targets (notably win64), but that has not been tested. -You are hereby discouraged from attempting a port to non-byte-addressable, -or 8-bit, or 16-bit machines. -

-The difficulty of porting the collector varies greatly depending on the needed -functionality. In the simplest case, only some small additions are needed -for the include/private/gcconfig.h file. This is described in the -following section. Later sections discuss some of the optional features, -which typically involve more porting effort. -

-Note that the collector makes heavy use of ifdefs. Unlike -some other software projects, we have concluded repeatedly that this is preferable -to system dependent files, with code duplicated between the files. -However, to keep this manageable, we do strongly believe in indenting -ifdefs correctly (for historical reasons usually without the leading -sharp sign). (Separate source files are of course fine if they don't result in -code duplication.) -

Adding Platforms to gcconfig.h

-If neither thread support, nor tracing of dynamic library data is required, -these are often the only changes you will need to make. -

-The gcconfig.h file consists of three sections: -

    -
  1. A section that defines GC-internal macros -that identify the architecture (e.g. IA64 or I386) -and operating system (e.g. LINUX or MSWIN32). -This is usually done by testing predefined macros. By defining -our own macros instead of using the predefined ones directly, we can -impose a bit more consistency, and somewhat isolate ourselves from -compiler differences. -

    -It is relatively straightforward to add a new entry here. But please try -to be consistent with the existing code. In particular, 64-bit variants -of 32-bit architectures general are not treated as a new architecture. -Instead we explicitly test for 64-bit-ness in the few places in which it -matters. (The notable exception here is I386 and X86_64. -This is partially historical, and partially justified by the fact that there -are arguably more substantial architecture and ABI differences here than -for RISC variants.) -

    -on GNU-based systems, cpp -dM empty_source_file.c seems to generate -a set of predefined macros. On some other systems, the "verbose" -compiler option may do so, or the manual page may list them. -

  2. -A section that defines a small number of platform-specific macros, which are -then used directly by the collector. For simple ports, this is where most of -the effort is required. We describe the macros below. -

    -This section contains a subsection for each architecture (enclosed in a -suitable ifdef. Each subsection usually contains some -architecture-dependent defines, followed by several sets of OS-dependent -defines, again enclosed in ifdefs. -

  3. -A section that fills in defaults for some macros left undefined in the preceding -section, and defines some other macros that rarely need adjustment for -new platforms. You will typically not have to touch these. -If you are porting to an OS that -was previously completely unsupported, it is likely that you will -need to add another clause to the definition of GET_MEM. -
-The following macros must be defined correctly for each architecture and operating -system: -
-
MACH_TYPE -
-Defined to a string that represents the machine architecture. Usually -just the macro name used to identify the architecture, but enclosed in quotes. -
OS_TYPE -
-Defined to a string that represents the operating system name. Usually -just the macro name used to identify the operating system, but enclosed in quotes. -
CPP_WORDSZ -
-The word size in bits as a constant suitable for preprocessor tests, -i.e. without casts or sizeof expressions. Currently always defined as -either 64 or 32. For platforms supporting both 32- and 64-bit ABIs, -this should be conditionally defined depending on the current ABI. -There is a default of 32. -
ALIGNMENT -
-Defined to be the largest N, such that -all pointer are guaranteed to be aligned on N-byte boundaries. -defining it to be 1 will always work, but perform poorly. -For all modern 32-bit platforms, this is 4. For all modern 64-bit -platforms, this is 8. Whether or not X86 qualifies as a modern -architecture here is compiler- and OS-dependent. -
DATASTART -
-The beginning of the main data segment. The collector will trace all -memory between DATASTART and DATAEND for root pointers. -On some platforms, this can be defined to a constant address, -though experience has shown that to be risky. Ideally the linker will -define a symbol (e.g. _data whose address is the beginning -of the data segment. Sometimes the value can be computed using -the GC_SysVGetDataStart function. Not used if either -the next macro is defined, or if dynamic loading is supported, and the -dynamic loading support defines a function -GC_register_main_static_data() which returns false. -
SEARCH_FOR_DATA_START -
-If this is defined DATASTART will be defined to a dynamically -computed value which is obtained by starting with the address of -_end and walking backwards until non-addressable memory is found. -This often works on Posix-like platforms. It makes it harder to debug -client programs, since startup involves generating and catching a -segmentation fault, which tends to confuse users. -
DATAEND -
-Set to the end of the main data segment. Defaults to end, -where that is declared as an array. This works in some cases, since -the linker introduces a suitable symbol. -
DATASTART2, DATAEND2 -
-Some platforms have two discontiguous main data segments, e.g. -for initialized and uninitialized data. If so, these two macros -should be defined to the limits of the second main data segment. -
STACK_GROWS_UP -
-Should be defined if the stack (or thread stacks) grow towards higher -addresses. (This appears to be true only on PA-RISC. If your architecture -has more than one stack per thread, and is not supported yet, you will -need to do more work. Grep for "IA64" in the source for an example.) -
STACKBOTTOM -
-Defined to be the cool end of the stack, which is usually the -highest address in the stack. It must bound the region of the -stack that contains pointers into the GC heap. With thread support, -this must be the cold end of the main stack, which typically -cannot be found in the same way as the other thread stacks. -If this is not defined and none of the following three macros -is defined, client code must explicitly set -GC_stackbottom to an appropriate value before calling -GC_INIT() or any other GC_ routine. -
LINUX_STACKBOTTOM -
-May be defined instead of STACKBOTTOM. -If defined, then the cold end of the stack will be determined -Currently we usually read it from /proc. -
HEURISTIC1 -
-May be defined instead of STACKBOTTOM. -STACK_GRAN should generally also be undefined and defined. -The cold end of the stack is determined by taking an address inside -GC_init's frame, and rounding it up to -the next multiple of STACK_GRAN. This works well if the stack base is -always aligned to a large power of two. -(STACK_GRAN is predefined to 0x1000000, which is -rarely optimal.) -
HEURISTIC2 -
-May be defined instead of STACKBOTTOM. -The cold end of the stack is determined by taking an address inside -GC_init's frame, incrementing it repeatedly -in small steps (decrement if STACK_GROWS_UP), and reading the value -at each location. We remember the value when the first -Segmentation violation or Bus error is signaled, round that -to the nearest plausible page boundary, and use that as the -stack base. -
DYNAMIC_LOADING -
-Should be defined if dyn_load.c has been updated for this -platform and tracing of dynamic library roots is supported. -
MPROTECT_VDB, PROC_VDB -
-May be defined if the corresponding "virtual dirty bit" -implementation in os_dep.c is usable on this platform. This -allows incremental/generational garbage collection. -MPROTECT_VDB identifies modified pages by -write protecting the heap and catching faults. -PROC_VDB uses the /proc primitives to read dirty bits. -
PREFETCH, GC_PREFETCH_FOR_WRITE -
-The collector uses PREFETCH(x) to preload the cache -with *x. -This defaults to a no-op. -
CLEAR_DOUBLE -
-If CLEAR_DOUBLE is defined, then -CLEAR_DOUBLE(x) is used as a fast way to -clear the two words at GC_malloc-aligned address x. By default, -word stores of 0 are used instead. -
HEAP_START -
-HEAP_START may be defined as the initial address hint for mmap-based -allocation. -
-

Additional requirements for a basic port

-In some cases, you may have to add additional platform-specific code -to other files. A likely candidate is the implementation of -GC_with_callee_saves_pushed in mach_dep.c. -This ensure that register contents that the collector must trace -from are copied to the stack. Typically this can be done portably, -but on some platforms it may require assembly code, or just -tweaking of conditional compilation tests. -

-For GC7, if your platform supports getcontext(), then defining -the macro UNIX_LIKE for your OS in gcconfig.h -(if it isn't defined there yet) is likely to solve the problem. -otherwise, if you are using gcc, _builtin_unwind_init() -will be used, and should work fine. If that is not applicable either, -the implementation will try to use setjmp(). This will work if your -setjmp implementation saves all possibly pointer-valued registers -into the buffer, as opposed to trying to unwind the stack at -longjmp time. The setjmp_test test tries to determine this, -but often doesn't get it right. -

-In GC6.x versions of the collector, tracing of registers -was more commonly handled -with assembly code. In GC7, this is generally to be avoided. -

-Most commonly os_dep.c will not require attention, but see below. -

Thread support

-Supporting threads requires that the collector be able to find and suspend -all threads potentially accessing the garbage-collected heap, and locate -any state associated with each thread that must be traced. -

-The functionality needed for thread support is generally implemented -in one or more files specific to the particular thread interface. -For example, somewhat portable pthread support is implemented -in pthread_support.c and pthread_stop_world.c. -The essential functionality consists of -

-
GC_stop_world() -
-Stops all threads which may access the garbage collected heap, other -than the caller. -
GC_start_world() -
-Restart other threads. -
GC_push_all_stacks() -
-Push the contents of all thread stacks (or at least of pointer-containing -regions in the thread stacks) onto the mark stack. -
-These very often require that the garbage collector maintain its -own data structures to track active threads. -

-In addition, LOCK and UNLOCK must be implemented -in gc_locks.h -

-The easiest case is probably a new pthreads platform -on which threads can be stopped -with signals. In this case, the changes involve: -

    -
  1. Introducing a suitable GC_X_THREADS macro, which should -be automatically defined by gc_config_macros.h in the right cases. -It should also result in a definition of GC_PTHREADS, as for the -existing cases. -
  2. For GC v7, ensuring that the atomic_ops package at least -minimally supports the platform. -If incremental GC is needed, or if pthread locks don't -perform adequately as the allocation lock, you will probably need to -ensure that a sufficient atomic_ops port -exists for the platform to provided an atomic test and set -operation. The latest GC code can use GCC atomic intrinsics instead of -atomic_ops package (see include/private/gc_atomic_ops.h). -
  3. Making any needed adjustments to pthread_stop_world.c and -pthread_support.c. Ideally none should be needed. In fact, -not all of this is as well standardized as one would like, and outright -bugs requiring workarounds are common. -
-Non-preemptive threads packages will probably require further work. Similarly -thread-local allocation and parallel marking requires further work -in pthread_support.c, and may require better atomic_ops -support. -

Dynamic library support

-So long as DATASTART and DATAEND are defined correctly, -the collector will trace memory reachable from file scope or static -variables defined as part of the main executable. This is sufficient -if either the program is statically linked, or if pointers to the -garbage-collected heap are never stored in non-stack variables -defined in dynamic libraries. -

-If dynamic library data sections must also be traced, then -

-

-Implementations that scan for writable data segments are error prone, particularly -in the presence of threads. They frequently result in race conditions -when threads exit and stacks disappear. They may also accidentally trace -large regions of graphics memory, or mapped files. On at least -one occasion they have been known to try to trace device memory that -could not safely be read in the manner the GC wanted to read it. -

-It is usually safer to walk the dynamic linker data structure, especially -if the linker exports an interface to do so. But beware of poorly documented -locking behavior in this case. -

Incremental GC support

-For incremental and generational collection to work, os_dep.c -must contain a suitable "virtual dirty bit" implementation, which -allows the collector to track which heap pages (assumed to be -a multiple of the collectors block size) have been written during -a certain time interval. The collector provides several -implementations, which might be adapted. The default -(DEFAULT_VDB) is a placeholder which treats all pages -as having been written. This ensures correctness, but renders -incremental and generational collection essentially useless. -

Stack traces for debug support

-If stack traces in objects are need for debug support, -GC_dave_callers and GC_print_callers must be -implemented. -

Disclaimer

-This is an initial pass at porting guidelines. Some things -have no doubt been overlooked. - - diff --git a/doc/porting.md b/doc/porting.md new file mode 100644 index 00000000..bc2b253c --- /dev/null +++ b/doc/porting.md @@ -0,0 +1,262 @@ +# Conservative Garbage Collector Porting Directions + +The collector is designed to be relatively easy to port, but is not portable +code per se. The collector inherently has to perform operations, such +as scanning the stack(s), that are not possible in portable C code. + +All of the following assumes that the collector is being ported to +a byte-addressable 32- or 64-bit machine. Currently all successful ports +to 64-bit machines involve LP64 targets. The code base includes some +provisions for P64 targets (notably Win64), but that has not been tested. You +are hereby discouraged from attempting a port to non-byte-addressable, +or 8-bit, or 16-bit machines. + +The difficulty of porting the collector varies greatly depending on the needed +functionality. In the simplest case, only some small additions are needed for +the `include/private/gcconfig.h` file. This is described in the following +section. Later sections discuss some of the optional features, which typically +involve more porting effort. + +Note that the collector makes heavy use of `ifdef`s. Unlike some other +software projects, we have concluded repeatedly that this is preferable +to system dependent files, with code duplicated between the files. However, +to keep this manageable, we do strongly believe in indenting `ifdef`s +correctly (for historical reasons usually without the leading sharp sign). +(Separate source files are of course fine if they do not result in code +duplication.) + +## Adding Platforms to gcconfig.h + +If neither thread support, nor tracing of dynamic library data is required, +these are often the only changes you will need to make. + +The `gcconfig.h` file consists of three sections: + + 1. A section that defines GC-internal macros that identify the architecture + (e.g. `IA64` or `I386`) and operating system (e.g. `LINUX` or `MSWIN32`). + This is usually done by testing predefined macros. By defining our own + macros instead of using the predefined ones directly, we can impose a bit + more consistency, and somewhat isolate ourselves from compiler differences. + It is relatively straightforward to add a new entry here. But please try + to be consistent with the existing code. In particular, 64-bit variants + of 32-bit architectures general are _not_ treated as a new architecture. + Instead we explicitly test for 64-bit-ness in the few places in which + it matters. (The notable exception here is `I386` and `X86_64`. This + is partially historical, and partially justified by the fact that there are + arguably more substantial architecture and ABI differences here than for + RISC variants.) On GNU-based systems, `cpp -dM empty_source_file.c` seems + to generate a set of predefined macros. On some other systems, the "verbose" + compiler option may do so, or the manual page may list them. + 2. A section that defines a small number of platform-specific macros, which + are then used directly by the collector. For simple ports, this is where + most of the effort is required. We describe the macros below. This section + contains a subsection for each architecture (enclosed in a suitable `ifdef`. + Each subsection usually contains some architecture-dependent defines, + followed by several sets of OS-dependent defines, again enclosed in + `ifdef`s. + + 3. A section that fills in defaults for some macros left undefined in the + preceding section, and defines some other macros that rarely need adjustment + for new platforms. You will typically not have to touch these. If you are + porting to an OS that was previously completely unsupported, it is likely + that you will need to add another clause to the definition of `GET_MEM`. + +The following macros must be defined correctly for each architecture and +operating system: + + * `MACH_TYPE` - Defined to a string that represents the machine + architecture. Usually just the macro name used to identify the architecture, + but enclosed in quotes. + * `OS_TYPE` - Defined to a string that represents the operating system name. + Usually just the macro name used to identify the operating system, but + enclosed in quotes. + * `CPP_WORDSZ` - The word size in bits as a constant suitable for + preprocessor tests, i.e. without casts or `sizeof` expressions. Currently + always defined as either 64 or 32. For platforms supporting both 32- and + 64-bit ABIs, this should be conditionally defined depending on the current + ABI. There is a default of 32. + * `ALIGNMENT` - Defined to be the largest _N_ such that all pointer + are guaranteed to be aligned on _N_-byte boundaries. Defining it to be _1_ + will always work, but perform poorly. For all modern 32-bit platforms, this + is 4. For all modern 64-bit platforms, this is 8. Whether or not X86 + qualifies as a modern architecture here is compiler- and OS-dependent. + * `DATASTART` - The beginning of the main data segment. The collector will + trace all memory between `DATASTART` and `DATAEND` for root pointers. + On some platforms, this can be defined to a constant address, though + experience has shown that to be risky. Ideally the linker will define + a symbol (e.g. `_data` whose address is the beginning of the data segment. + Sometimes the value can be computed using the `GC_SysVGetDataStart` + function. Not used if either the next macro is defined, or if dynamic + loading is supported, and the dynamic loading support defines a function + `GC_register_main_static_data` which returns false. + * `SEARCH_FOR_DATA_START` - If this is defined `DATASTART` will be defined + to a dynamically computed value which is obtained by starting with the + address of `_end` and walking backwards until non-addressable memory + is found. This often works on Posix-like platforms. It makes it harder + to debug client programs, since startup involves generating and catching + a segmentation fault, which tends to confuse users. + * `DATAEND` - Set to the end of the main data segment. Defaults to `end`, + where that is declared as an array. This works in some cases, since the + linker introduces a suitable symbol. + * `DATASTART2`, `DATAEND2` - Some platforms have two discontiguous main data + segments, e.g. for initialized and uninitialized data. If so, these two + macros should be defined to the limits of the second main data segment. + * `STACK_GROWS_UP` - Should be defined if the stack (or thread stacks) grow + towards higher addresses. (This appears to be true only on PA-RISC. If your + architecture has more than one stack per thread, and is not supported yet, + you will need to do more work. Grep for "IA64" in the source for an + example.) + * `STACKBOTTOM` - Defined to be the cool end of the stack, which is usually + the highest address in the stack. It must bound the region of the stack that + contains pointers into the GC heap. With thread support, this must be the + cold end of the main stack, which typically cannot be found in the same way + as the other thread stacks. If this is not defined and none of the following + three macros is defined, client code must explicitly set `GC_stackbottom` + to an appropriate value before calling `GC_INIT` or any other `GC_` routine. + * `LINUX_STACKBOTTOM` - May be defined instead of `STACKBOTTOM`. If defined, + then the cold end of the stack will be determined Currently we usually read + it from `/proc`. + * `HEURISTIC1` - May be defined instead of `STACKBOTTOM`. `STACK_GRAN` + should generally also be redefined. The cold end of the stack is determined + by taking an address inside `GC_init`s frame, and rounding it up to the next + multiple of `STACK_GRAN`. This works well if the stack base is always + aligned to a large power of two. (`STACK_GRAN` is predefined to 0x1000000, + which is rarely optimal.) + * `HEURISTIC2` - May be defined instead of `STACKBOTTOM`. The cold end + of the stack is determined by taking an address inside `GC_init`s frame, + incrementing it repeatedly in small steps (decrement if `STACK_GROWS_UP`), + and reading the value at each location. We remember the value when the first + Segmentation violation or Bus error is signaled, round that to the nearest + plausible page boundary, and use that as the stack base. + * `DYNAMIC_LOADING` - Should be defined if `dyn_load.c` has been updated for + this platform and tracing of dynamic library roots is supported. + * `MPROTECT_VDB`, `PROC_VDB` - May be defined if the corresponding + _virtual dirty bit_ implementation in `os_dep.c` is usable on this platform. + This allows incremental/generational garbage collection. `MPROTECT_VDB` + identifies modified pages by write protecting the heap and catching faults. + `PROC_VDB` uses the /proc primitives to read dirty bits. + * `PREFETCH`, `GC_PREFETCH_FOR_WRITE` - The collector uses `PREFETCH(x)` + to preload the cache with the data at _x_ address. This defaults to a no-op. + * `CLEAR_DOUBLE` - If `CLEAR_DOUBLE` is defined, then `CLEAR_DOUBLE(x)` + is used as a fast way to clear the two words at `GC_malloc`-aligned address + _x_. By default, word stores of 0 are used instead. + * `HEAP_START` - May be defined as the initial address hint for mmap-based + allocation. + +## Additional requirements for a basic port + +In some cases, you may have to add additional platform-specific code to other +files. A likely candidate is the implementation +of `GC_with_callee_saves_pushed` in `mach_dep.c`. This ensure that register +contents that the collector must trace from are copied to the stack. Typically +this can be done portably, but on some platforms it may require assembly code, +or just tweaking of conditional compilation tests. + +For GC v7, if your platform supports `getcontext`, then defining the macro +`UNIX_LIKE` for your OS in `gcconfig.h` (if it is not defined there yet) +is likely to solve the problem. otherwise, if you are using gcc, +`_builtin_unwind_init` will be used, and should work fine. If that is not +applicable either, the implementation will try to use `setjmp`. This will work +if your `setjmp` implementation saves all possibly pointer-valued registers +into the buffer, as opposed to trying to unwind the stack at `longjmp` time. +The `setjmp_test` test tries to determine this, but often does not get it +right. + +In GC v6.x versions of the collector, tracing of registers was more commonly +handled with assembly code. In GC v7, this is generally to be avoided. + +Most commonly `os_dep.c` will not require attention, but see below. + +## Thread support + +Supporting threads requires that the collector be able to find and suspend all +threads potentially accessing the garbage-collected heap, and locate any state +associated with each thread that must be traced. + +The functionality needed for thread support is generally implemented in one or +more files specific to the particular thread interface. For example, somewhat +portable pthread support is implemented in `pthread_support.c` and +`pthread_stop_world.c`. The essential functionality consists of: + + * `GC_stop_world` - Stops all threads which may access the garbage collected + heap, other than the caller; + * `GC_start_world` - Restart other threads; + * `GC_push_all_stacks` - Push the contents of all thread stacks (or, + at least, of pointer-containing regions in the thread stacks) onto the mark + stack. + +These very often require that the garbage collector maintain its own data +structures to track active threads. + +In addition, `LOCK` and `UNLOCK` must be implemented in `gc_locks.h`. + +The easiest case is probably a new pthreads platform on which threads can be +stopped with signals. In this case, the changes involve: + + 1. Introducing a suitable `GC_xxx_THREADS` macro, which should + be automatically defined by `gc_config_macros.h` in the right cases. + It should also result in a definition of `GC_PTHREADS`, as for the existing + cases. + 2. For GC v7, ensuring that the `atomic_ops` package at least minimally + supports the platform. If incremental GC is needed, or if pthread locks + do not perform adequately as the allocation lock, you will probably need + to ensure that a sufficient `atomic_ops` port exists for the platform + to provided an atomic test and set operation. The latest GC code can use + GCC atomic intrinsics instead of `atomic_ops` package (see + `include/private/gc_atomic_ops.h`). + 3. Making any needed adjustments to `pthread_stop_world.c` and + `pthread_support.c`. Ideally none should be needed. In fact, not all of this + is as well standardized as one would like, and outright bugs requiring + workarounds are common. Non-preemptive threads packages will probably + require further work. Similarly thread-local allocation and parallel marking + requires further work in `pthread_support.c`, and may require better + `atomic_ops` support. + +## Dynamic library support + +So long as `DATASTART` and `DATAEND` are defined correctly, the collector will +trace memory reachable from file scope or `static` variables defined as part +of the main executable. This is sufficient if either the program is statically +linked, or if pointers to the garbage-collected heap are never stored +in non-stack variables defined in dynamic libraries. + +If dynamic library data sections must also be traced, then: + + * `DYNAMIC_LOADING` must be defined in the appropriate section of + `gcconfig.h`. + * An appropriate versions of the functions `GC_register_dynamic_libraries` + should be defined in `dyn_load.c`. This function should invoke + `GC_cond_add_roots(_region_start, region_end_, TRUE)` on each dynamic + library data section. + +Implementations that scan for writable data segments are error prone, +particularly in the presence of threads. They frequently result in race +conditions when threads exit and stacks disappear. They may also accidentally +trace large regions of graphics memory, or mapped files. On at least one +occasion they have been known to try to trace device memory that could not +safely be read in the manner the GC wanted to read it. + +It is usually safer to walk the dynamic linker data structure, especially +if the linker exports an interface to do so. But beware of poorly documented +locking behavior in this case. + +## Incremental GC support + +For incremental and generational collection to work, `os_dep.c` must contain +a suitable _virtual dirty bit_ implementation, which allows the collector +to track which heap pages (assumed to be a multiple of the collectors block +size) have been written during a certain time interval. The collector provides +several implementations, which might be adapted. The default (`DEFAULT_VDB`) +is a placeholder which treats all pages as having been written. This ensures +correctness, but renders incremental and generational collection essentially +useless. + +## Stack traces for debug support + +If stack traces in objects are need for debug support, `GC_dave_callers` and +`GC_print_callers` must be implemented. + +## Disclaimer + +This is an initial pass at porting guidelines. Some things have no doubt been +overlooked. diff --git a/doc/scale.html b/doc/scale.html deleted file mode 100644 index 15c8c597..00000000 --- a/doc/scale.html +++ /dev/null @@ -1,199 +0,0 @@ - - -Garbage collector scalability - - -

Garbage collector scalability

-In its default configuration, the Boehm-Demers-Weiser garbage collector -is not thread-safe. It can be made thread-safe for a number of environments -by building the collector with -DGC_THREADS compilation -flag. This has primarily two effects: -
    -
  1. It causes the garbage collector to stop all other threads when -it needs to see a consistent memory state. -
  2. It causes the collector to acquire a lock around essentially all -allocation and garbage collection activity. -
-Since a single lock is used for all allocation-related activity, only one -thread can be allocating or collecting at one point. This inherently -limits performance of multi-threaded applications on multiprocessors. -

-On most platforms, the allocator/collector lock is implemented as a -spin lock with exponential back-off. Longer wait times are implemented -by yielding and/or sleeping. If a collection is in progress, the pure -spinning stage is skipped. This has the advantage that uncontested and -thus most uniprocessor lock acquisitions are very cheap. It has the -disadvantage that the application may sleep for small periods of time -even when there is work to be done. And threads may be unnecessarily -woken up for short periods. Nonetheless, this scheme empirically -outperforms native queue-based mutual exclusion implementations in most -cases, sometimes drastically so. -

Options for enhanced scalability

-Version 6.0 of the collector adds two facilities to enhance collector -scalability on multiprocessors. As of 6.0alpha1, these are supported -only under Linux on X86 and IA64 processors, though ports to other -otherwise supported Pthreads platforms should be straightforward. -They are intended to be used together. - -

The Parallel Marking Algorithm

-We use an algorithm similar to -that developed by -Endo, Taura, and Yonezawa at the University of Tokyo. -However, the data structures and implementation are different, -and represent a smaller change to the original collector source, -probably at the expense of extreme scalability. Some of -the refinements they suggest, e.g. splitting large -objects, were also incorporated into out approach. -

-The global mark stack is transformed into a global work queue. -Unlike the usual case, it never shrinks during a mark phase. -The mark threads remove objects from the queue by copying them to a -local mark stack and changing the global descriptor to zero, indicating -that there is no more work to be done for this entry. -This removal -is done with no synchronization. Thus it is possible for more than -one worker to remove the same entry, resulting in some work duplication. -

-The global work queue grows only if a marker thread decides to -return some of its local mark stack to the global one. This -is done if the global queue appears to be running low, or if -the local stack is in danger of overflowing. It does require -synchronization, but should be relatively rare. -

-The sequential marking code is reused to process local mark stacks. -Hence the amount of additional code required for parallel marking -is minimal. -

-It should be possible to use generational collection in the presence of the -parallel collector, by calling GC_enable_incremental(). -This does not result in fully incremental collection, since parallel mark -phases cannot currently be interrupted, and doing so may be too -expensive. -

-Gcj-style mark descriptors do not currently mix with the combination -of local allocation and incremental collection. They should work correctly -with one or the other, but not both. -

-The number of marker threads is set on startup to the number of -available processors (or to the value of the GC_NPROCS -environment variable). If only a single processor is detected, -parallel marking is disabled. -

-Note that setting GC_NPROCS to 1 also causes some lock acquisitions inside -the collector to immediately yield the processor instead of busy waiting -first. In the case of a multiprocessor and a client with multiple -simultaneously runnable threads, this may have disastrous performance -consequences (e.g. a factor of 10 slowdown). -

Performance

-We conducted some simple experiments with a version of -our GC benchmark -that was slightly modified to -run multiple concurrent client threads in the same address space. -Each client thread does the same work as the original benchmark, but they share -a heap. -This benchmark involves very little work outside of memory allocation. -This was run with GC 6.0alpha3 on a dual processor Pentium III/500 machine -under Linux 2.2.12. -

-Running with a thread-unsafe collector, the benchmark ran in 9 -seconds. With the simple thread-safe collector, -built with -DGC_THREADS, the execution time -increased to 10.3 seconds, or 23.5 elapsed seconds with two clients. -(The times for the malloc/free version -with glibc malloc -are 10.51 (standard library, pthreads not linked), -20.90 (one thread, pthreads linked), -and 24.55 seconds respectively. The benchmark favors a -garbage collector, since most objects are small.) -

-The following table gives execution times for the collector built -with parallel marking and thread-local allocation support -(-DGC_THREADS -DPARALLEL_MARK -DTHREAD_LOCAL_ALLOC). We tested -the client using either one or two marker threads, and running -one or two client threads. Note that the client uses thread local -allocation exclusively. With -DTHREAD_LOCAL_ALLOC the collector -switches to a locking strategy that is better tuned to less frequent -lock acquisition. The standard allocation primitives thus perform -slightly worse than without -DTHREAD_LOCAL_ALLOC, and should be -avoided in time-critical code. -

-(The results using pthread_mutex_lock -directly for allocation locking would have been worse still, at -least for older versions of linuxthreads. -With THREAD_LOCAL_ALLOC, we first repeatedly try to acquire the -lock with pthread_mutex_try_lock(), busy-waiting between attempts. -After a fixed number of attempts, we use pthread_mutex_lock().) -

-These measurements do not use incremental collection, nor was prefetching -enabled in the marker. We used the C version of the benchmark. -All measurements are in elapsed seconds on an unloaded machine. -

- - - - - -
Number of threads1 marker thread (secs.)2 marker threads (secs.)
1 client10.457.85
2 clients19.9512.3
- -The execution time for the single threaded case is slightly worse than with -simple locking. However, even the single-threaded benchmark runs faster than -even the thread-unsafe version if a second processor is available. -The execution time for two clients with thread local allocation time is -only 1.4 times the sequential execution time for a single thread in a -thread-unsafe environment, even though it involves twice the client work. -That represents close to a -factor of 2 improvement over the 2 client case with the old collector. -The old collector clearly -still suffered from some contention overhead, in spite of the fact that the -locking scheme had been fairly well tuned. -

-Full linear speedup (i.e. the same execution time for 1 client on one -processor as 2 clients on 2 processors) -is probably not achievable on this kind of -hardware even with such a small number of processors, -since the memory system is -a major constraint for the garbage collector, -the processors usually share a single memory bus, and thus -the aggregate memory bandwidth does not increase in -proportion to the number of processors. -

-These results are likely to be very sensitive to both hardware and OS -issues. Preliminary experiments with an older Pentium Pro machine running -an older kernel were far less encouraging. - - - diff --git a/doc/scale.md b/doc/scale.md new file mode 100644 index 00000000..f38de04b --- /dev/null +++ b/doc/scale.md @@ -0,0 +1,169 @@ +# Garbage collector scalability + +In its default configuration, the Boehm-Demers-Weiser garbage collector is not +thread-safe. It can be made thread-safe for a number of environments +by building the collector with `-DGC_THREADS` compilation flag. This has +primarily two effects: + + 1. It causes the garbage collector to stop all other threads when it needs + to see a consistent memory state. + 2. It causes the collector to acquire a lock around essentially all + allocation and garbage collection activity. Since a single lock is used for + all allocation-related activity, only one thread can be allocating + or collecting at one point. This inherently limits performance + of multi-threaded applications on multiprocessors. + +On most platforms, the allocator/collector lock is implemented as a spin lock +with exponential back-off. Longer wait times are implemented by yielding +and/or sleeping. If a collection is in progress, the pure spinning stage +is skipped. This has the advantage that uncontested and thus most uniprocessor +lock acquisitions are very cheap. It has the disadvantage that the application +may sleep for small periods of time even when there is work to be done. And +threads may be unnecessarily woken up for short periods. Nonetheless, this +scheme empirically outperforms native queue-based mutual exclusion +implementations in most cases, sometimes drastically so. + +## Options for enhanced scalability + +Version 6.0 of the collector adds two facilities to enhance collector +scalability on multiprocessors. As of 6.0alpha1, these are supported only +under Linux on X86 and IA64 processors, though ports to other otherwise +supported Pthreads platforms should be straightforward. They are intended +to be used together. + + * Building the collector with `-DPARALLEL_MARK` allows the collector to run + the mark phase in parallel in multiple threads, and thus on multiple + processors. The mark phase typically consumes the large majority of the + collection time. Thus this largely parallelizes the garbage collector + itself, though not the allocation process. Currently the marking + is performed by the thread that triggered the collection, together with + _N_ - 1 dedicated threads, where _N_ is the number of processors detected + by the collector. The dedicated threads are created once at initialization + time. A second effect of this flag is to switch to a more concurrent + implementation of `GC_malloc_many`, so that free lists can be built, and + memory can be cleared, by more than one thread concurrently. + * Building the collector with `-DTHREAD_LOCAL_ALLOC` adds support for + thread-local allocation. This causes `GC_malloc`, `GC_malloc_atomic`, and + `GC_gcj_malloc` to be redefined to perform thread-local allocation. + +Memory returned from thread-local allocators is completely interchangeable +with that returned by the standard allocators. It may be used by other +threads. The only difference is that, if the thread allocates enough memory +of a certain kind, it will build a thread-local free list for objects of that +kind, and allocate from that. This greatly reduces locking. The thread-local +free lists are refilled using `GC_malloc_many`. + +An important side effect of this flag is to replace the default +spin-then-sleep lock to be replace by a spin-then-queue based implementation. +This _reduces performance_ for the standard allocation functions, though +it usually improves performance when thread-local allocation is used heavily, +and thus the number of short-duration lock acquisitions is greatly reduced. + +## The Parallel Marking Algorithm + +We use an algorithm similar to +[that developed by Endo, Taura, and Yonezawa](http://www.yl.is.s.u-tokyo.ac.jp/gc/) +at the University of Tokyo. However, the data structures and implementation +are different, and represent a smaller change to the original collector +source, probably at the expense of extreme scalability. Some of the +refinements they suggest, e.g. splitting large objects, were also incorporated +into out approach. + +The global mark stack is transformed into a global work queue. Unlike the +usual case, it never shrinks during a mark phase. The mark threads remove +objects from the queue by copying them to a local mark stack and changing the +global descriptor to zero, indicating that there is no more work to be done +for this entry. This removal is done with no synchronization. Thus it is +possible for more than one worker to remove the same entry, resulting in some +work duplication. + +The global work queue grows only if a marker thread decides to return some +of its local mark stack to the global one. This is done if the global queue +appears to be running low, or if the local stack is in danger of overflowing. +It does require synchronization, but should be relatively rare. + +The sequential marking code is reused to process local mark stacks. Hence the +amount of additional code required for parallel marking is minimal. + +It should be possible to use generational collection in the presence of the +parallel collector, by calling `GC_enable_incremental`. This does not result +in fully incremental collection, since parallel mark phases cannot currently +be interrupted, and doing so may be too expensive. + +Gcj-style mark descriptors do not currently mix with the combination of local +allocation and incremental collection. They should work correctly with one or +the other, but not both. + +The number of marker threads is set on startup to the number of available +processors (or to the value of the `GC_NPROCS` environment variable). If only +a single processor is detected, parallel marking is disabled. + +Note that setting `GC_NPROCS` to 1 also causes some lock acquisitions inside +the collector to immediately yield the processor instead of busy waiting +first. In the case of a multiprocessor and a client with multiple +simultaneously runnable threads, this may have disastrous performance +consequences (e.g. a factor of 10 slowdown). + +## Performance + +We conducted some simple experiments with a version of +[our GC benchmark](http://www.hboehm.info/gc/gc_bench/) that was slightly +modified to run multiple concurrent client threads in the same address space. +Each client thread does the same work as the original benchmark, but they +share a heap. This benchmark involves very little work outside of memory +allocation. This was run with GC 6.0alpha3 on a dual processor Pentium III/500 +machine under Linux 2.2.12. + +Running with a thread-unsafe collector, the benchmark ran in 9 seconds. With +the simple thread-safe collector, built with `-DGC_THREADS`, the execution +time increased to 10.3 seconds, or 23.5 elapsed seconds with two clients. (The +times for the `malloc`/`free` version with glibc `malloc` are 10.51 (standard +library, pthreads not linked), 20.90 (one thread, pthreads linked), and 24.55 +seconds respectively. The benchmark favors a garbage collector, since most +objects are small.) + +The following table gives execution times for the collector built with +parallel marking and thread-local allocation support +(`-DGC_THREADS -DPARALLEL_MARK -DTHREAD_LOCAL_ALLOC`). We tested the client +using either one or two marker threads, and running one or two client threads. +Note that the client uses thread local allocation exclusively. With +`-DTHREAD_LOCAL_ALLOC` the collector switches to a locking strategy that +is better tuned to less frequent lock acquisition. The standard allocation +primitives thus perform slightly worse than without `-DTHREAD_LOCAL_ALLOC`, +and should be avoided in time-critical code. + +(The results using `pthread_mutex_lock` directly for allocation locking would +have been worse still, at least for older versions of linuxthreads. With +`-DTHREAD_LOCAL_ALLOC`, we first repeatedly try to acquire the lock with +`pthread_mutex_try_lock`, busy-waiting between attempts. After a fixed number +of attempts, we use `pthread_mutex_lock`.) + +These measurements do not use incremental collection, nor was prefetching +enabled in the marker. We used the C version of the benchmark. All +measurements are in elapsed seconds on an unloaded machine. + +Number of threads| 1 marker thread (secs.) | 2 marker threads (secs.) +---|---|--- +1 client| 10.45| 7.85 | 2 clients| 19.95| 12.3 + +The execution time for the single threaded case is slightly worse than with +simple locking. However, even the single-threaded benchmark runs faster than +even the thread-unsafe version if a second processor is available. The +execution time for two clients with thread local allocation time is only 1.4 +times the sequential execution time for a single thread in a thread-unsafe +environment, even though it involves twice the client work. That represents +close to a factor of 2 improvement over the 2 client case with the old +collector. The old collector clearly still suffered from some contention +overhead, in spite of the fact that the locking scheme had been fairly well +tuned. + +Full linear speedup (i.e. the same execution time for 1 client on one +processor as 2 clients on 2 processors) is probably not achievable on this +kind of hardware even with such a small number of processors, since the memory +system is a major constraint for the garbage collector, the processors usually +share a single memory bus, and thus the aggregate memory bandwidth does not +increase in proportion to the number of processors. + +These results are likely to be very sensitive to both hardware and OS issues. +Preliminary experiments with an older Pentium Pro machine running an older +kernel were far less encouraging. diff --git a/doc/simple_example.html b/doc/simple_example.html deleted file mode 100644 index b01d94a7..00000000 --- a/doc/simple_example.html +++ /dev/null @@ -1,212 +0,0 @@ - - - - -Using the Garbage Collector: A simple example - - -

Using the Garbage Collector: A simple example

-The following consists of step-by-step instructions for building and -using the collector. We'll assume a Linux/gcc platform and -a single-threaded application. The green -text contains information about other platforms or scenarios. -It can be skipped, especially on first reading. -

Building the collector

-If you have not so yet, unpack the collector and enter -the newly created directory with -
-tar xvfz gc<version>.tar.gz
-cd gc<version>
-
-

-You can configure, build, and install the collector in a private -directory, say /home/xyz/gc, with the following commands: -

-./configure --prefix=/home/xyz/gc --disable-threads
-make
-make check
-make install
-
-Here the "make check" command is optional, but highly recommended. -It runs a basic correctness test which usually takes well under a minute. -

Other platforms

- -On non-Unix, non-Linux platforms, the collector is usually built by copying -the appropriate makefile (see the platform-specific README in doc/README.xxx -in the distribution) to the file "Makefile", and then typing "make" -(or "nmake" or ...). This builds the library in the source tree. You may -want to move it and the files in the include directory to a more convenient -place. - -

- -If you use a makefile that does not require running a configure script, -you should first look at the makefile, and adjust any options that are -documented there. - -

- -If your platform provides a "make" utility, that is generally preferred -to platform- and compiler- dependent "project" files. (At least that is the -strong preference of the would-be maintainer of those project files.) - -

Threads

- -If you do not need thread support, configure the collector with - -
---disable-threads
-
- -Alternatively, if your target is a real old-fashioned uniprocessor (no -"hyperthreading", etc.), you may just want to turn off parallel marking with ---disable-parallel-mark. - -

C++

- -You will need to include the C++ support, which unfortunately tends to -be among the least portable parts of the collector, since it seems -to rely on some corner cases of the language. On Linux, it -suffices to add --enable-cplusplus to the configure options. - -

Writing the program

-You will need a -
-#include "gc.h"
-
-at the beginning of every file that allocates memory through the -garbage collector. Call GC_MALLOC wherever you would -have call malloc. This initializes memory to zero like -calloc; there is no need to explicitly clear the -result. -

-If you know that an object will not contain pointers to the -garbage-collected heap, and you don't need it to be initialized, -call GC_MALLOC_ATOMIC instead. -

-A function GC_FREE is provided but need not be called. -For very small objects, your program will probably perform better if -you do not call it, and let the collector do its job. -

-A GC_REALLOC function behaves like the C library realloc. -It allocates uninitialized pointer-free memory if the original -object was allocated that way. -

-The following program loop.c is a trivial example: -

-#include "gc.h"
-#include <assert.h>
-#include <stdio.h>
-
-int main()
-{
-  int i;
-
-  GC_INIT();
-  for (i = 0; i < 10000000; ++i)
-   {
-     int **p = (int **) GC_MALLOC(sizeof(int *));
-     int *q = (int *) GC_MALLOC_ATOMIC(sizeof(int));
-     assert(*p == 0);
-     *p = (int *) GC_REALLOC(q, 2 * sizeof(int));
-     if (i % 100000 == 0)
-       printf("Heap size = %d\n", GC_get_heap_size());
-   }
-  return 0;
-}
-
-

Interaction with the system malloc

- -It is usually best not to mix garbage-collected allocation with the system -malloc-free. If you do, you need to be careful not to store -pointers to the garbage-collected heap in memory allocated with the system -malloc. - - -

Other Platforms

- -On some other platforms it is necessary to call GC_INIT() from the main program, -which is presumed to be part of the main executable, not a dynamic library. -This can never hurt, and is thus generally good practice. - - -

Threads

- -For a multi-threaded program, some more rules apply: - - - -

C++

- -In the case of C++, you need to be especially careful not to store pointers -to the garbage-collected heap in areas that are not traced by the collector. -The collector includes some alternate interfaces -to make that easier. - - -

Debugging

- -Additional debug checks can be performed by defining GC_DEBUG before -including gc.h. Additional options are available if the collector -is also built with --enable-gc-debug (--enable-full-debug in -some older versions) and all allocations are -performed with GC_DEBUG defined. - - -

What if I can't rewrite/recompile my program?

- -You may be able to build the collector with --enable-redirect-malloc -and set the LD_PRELOAD environment variable to point to the resulting -library, thus replacing the standard malloc with its garbage-collected -counterpart. This is rather platform dependent. See the -leak detection documentation for some more details. - - -

Compiling and linking

- -The above application loop.c test program can be compiled and linked -with - -
-cc -I/home/xyz/gc/include loop.c /home/xyz/gc/lib/libgc.a -o loop
-
- -The -I option directs the compiler to the right include -directory. In this case, we list the static library -directly on the compile line; the dynamic library could have been -used instead, provided we arranged for the dynamic loader to find -it, e.g. by setting LD_LIBRARY_PATH. - -

Threads

- -On pthread platforms, you will of course also have to link with --lpthread, -and compile with any thread-safety options required by your compiler. -On some platforms, you may also need to link with -ldl -or -lrt. -Looking at tools/threadlibs.c should give you the appropriate -list if a plain -lpthread doesn't work. - - -

Running the executable

- -The executable can of course be run normally, e.g. by typing - -
-./loop
-
- -The operation of the collector is affected by a number of environment variables. -For example, setting GC_PRINT_STATS produces some -GC statistics on stdout. -See README.environment in the distribution for details. - - diff --git a/doc/simple_example.md b/doc/simple_example.md new file mode 100644 index 00000000..e19d5686 --- /dev/null +++ b/doc/simple_example.md @@ -0,0 +1,181 @@ +# Using the Garbage Collector: A simple example + +The following consists of step-by-step instructions for building and using the +collector. We'll assume a Linux/gcc platform and a single-threaded +application. The green text contains information about other platforms +or scenarios. It can be skipped, especially on first reading. + +## Building the collector + +If you have not so yet, unpack the collector and enter the newly created +directory with: + + + tar xvfz gc-.tar.gz + cd gc- + + +You can configure, build, and install the collector in a private directory, +say /home/xyz/gc, with the following commands: + + + ./configure --prefix=/home/xyz/gc --disable-threads + make + make check + make install + + +Here the `make check` command is optional, but highly recommended. It runs +a basic correctness test which usually takes well under a minute. + +### Other platforms + +On non-Unix, non-Linux platforms, the collector is usually built by copying +the appropriate makefile (see the platform-specific README in doc/README.xxx +in the distribution) to the file "Makefile", and then typing `make` (or +`nmake` or ...). This builds the library in the source tree. You may want +to move it and the files in the include directory to a more convenient place. + +If you use a makefile that does not require running a configure script, you +should first look at the makefile, and adjust any options that are documented +there. + +If your platform provides a `make` utility, that is generally preferred +to platform- and compiler- dependent "project" files. (At least that is the +strong preference of the would-be maintainer of those project files.) + +### Threads + +If you do not need thread support, configure the collector with: + + + --disable-threads + + +Alternatively, if your target is a real old-fashioned uniprocessor (no +"hyperthreading", etc.), you may just want to turn off parallel marking with +`--disable-parallel-mark`. + +### C++ + +You will need to include the C++ support, which unfortunately tends to be +among the least portable parts of the collector, since it seems to rely +on some corner cases of the language. On Linux, it suffices to add +`--enable-cplusplus` to the configure options. + +## Writing the program + +You will need to include "gc.h" at the beginning of every file that allocates +memory through the garbage collector. Call `GC_MALLOC` wherever you would have +call `malloc`. This initializes memory to zero like `calloc`; there is no need +to explicitly clear the result. + +If you know that an object will not contain pointers to the garbage-collected +heap, and you don't need it to be initialized, call `GC_MALLOC_ATOMIC` +instead. + +A function `GC_FREE` is provided but need not be called. For very small +objects, your program will probably perform better if you do not call it, and +let the collector do its job. + +A `GC_REALLOC` function behaves like the C library `realloc`. It allocates +uninitialized pointer-free memory if the original object was allocated that +way. + +The following program `loop.c` is a trivial example: + + + #include "gc.h" + #include + #include + + int main(void) { + int i; + + GC_INIT(); + for (i = 0; i < 10000000; ++i) { + int **p = (int **) GC_MALLOC(sizeof(int *)); + int *q = (int *) GC_MALLOC_ATOMIC(sizeof(int)); + assert(*p == 0); + *p = (int *) GC_REALLOC(q, 2 * sizeof(int)); + if (i % 100000 == 0) + printf("Heap size = %d\n", GC_get_heap_size()); + } + return 0; + } + + +### Interaction with the system malloc + +It is usually best not to mix garbage-collected allocation with the system +`malloc`-`free`. If you do, you need to be careful not to store pointers +to the garbage-collected heap in memory allocated with the system `malloc`. + +### Other Platforms + +On some other platforms it is necessary to call `GC_INIT` from the main +program, which is presumed to be part of the main executable, not a dynamic +library. This can never hurt, and is thus generally good practice. + +### Threads + +For a multi-threaded program, some more rules apply: + + * Files that either allocate through the GC _or make thread-related calls_ + should first define the macro `GC_THREADS`, and then include `gc.h`. On some + platforms this will redefine some threads primitives, e.g. to let the + collector keep track of thread creation. + +### C++ + +In the case of C++, you need to be especially careful not to store pointers +to the garbage-collected heap in areas that are not traced by the collector. +The collector includes some _alternate interfaces_ to make that easier. + +### Debugging + +Additional debug checks can be performed by defining `GC_DEBUG` before +including `gc.h`. Additional options are available if the collector is also +built with `--enable-gc-debug` (`--enable-full-debug` in some older versions) +and all allocations are performed with `GC_DEBUG` defined. + +### What if I can't rewrite/recompile my program? + +You may be able to build the collector with `--enable-redirect-malloc` and set +the `LD_PRELOAD` environment variable to point to the resulting library, thus +replacing the standard `malloc` with its garbage-collected counterpart. This +is rather platform dependent. See the _GC leak detection documentation_ for +some more details. + +## Compiling and linking + +The above application `loop.c` test program can be compiled and linked with: + + + cc -I/home/xyz/gc/include loop.c /home/xyz/gc/lib/libgc.a -o loop + + +The `-I` option directs the compiler to the right include directory. In this +case, we list the static library directly on the compile line; the dynamic +library could have been used instead, provided we arranged for the dynamic +loader to find it, e.g. by setting `LD_LIBRARY_PATH`. + +### Threads + +On pthread platforms, you will of course also have to link with `-lpthread`, +and compile with any thread-safety options required by your compiler. On some +platforms, you may also need to link with `-ldl` or `-lrt`. Looking +at `tools/threadlibs.c` should give you the appropriate list if a plain +`-lpthread` does not work. + +## Running the executable + +The executable can of course be run normally, e.g. by typing: + + + ./loop + + +The operation of the collector is affected by a number of environment +variables. For example, setting `GC_PRINT_STATS` produces some GC statistics +on stdout. See `README.environment` in the distribution for details. -- 2.40.0