From 6f8f39af54d73711d1226bb109ef2107a5e5b04d Mon Sep 17 00:00:00 2001
From: Ivan Maidanski
Date: Tue, 26 Mar 2019 08:37:11 +0300
Subject: [PATCH] Update the documentation to match the current GC implementation

* README.md: Update the documentation to match the current implementation
of the collector.
* doc/README.Mac: Likewise.
* doc/README.autoconf: Likewise.
* doc/README.darwin: Likewise.
* doc/README.hp: Likewise.
* doc/README.linux: Likewise.
* doc/README.macros: Likewise.
* doc/README.solaris2: Likewise.
* doc/README.win32: Likewise.
* doc/debugging.md: Likewise.
* doc/finalization.md: Likewise.
* doc/gc.man: Likewise.
* doc/gcdescr.md: Likewise.
* doc/gcinterface.md: Likewise.
* doc/leak.md: Likewise.
* doc/overview.md: Likewise.
* doc/porting.md: Likewise.
* doc/scale.md: Likewise.
---
 README.md           | 88 ++++++++++++++++++++-------------------------
 doc/README.Mac      |  7 ++--
 doc/README.autoconf |  5 +--
 doc/README.darwin   | 12 +++----
 doc/README.hp       |  2 +-
 doc/README.linux    |  8 ++---
 doc/README.macros   | 55 ++++++++++++++--------------
 doc/README.solaris2 | 18 +++++-----
 doc/README.win32    | 21 +++++------
 doc/debugging.md    |  7 +---
 doc/finalization.md | 15 ++++----
 doc/gc.man          | 16 +++++++--
 doc/gcdescr.md      | 49 ++++++++++++-------------
 doc/gcinterface.md  | 54 ++++++++++++++--------------
 doc/leak.md         | 48 +++++++++++++------------
 doc/overview.md     | 52 +++++++++++++------------
 doc/porting.md      | 18 +++++-----
 doc/scale.md        | 49 +++++++++++++------------
 18 files changed, 263 insertions(+), 261 deletions(-)

diff --git a/README.md b/README.md index 9a5a2e24..c50ecd53 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ Many of the ideas underlying the collector have previously been explored by others. Notably, some of the run-time systems developed at Xerox PARC in the early 1980s conservatively scanned thread stacks to locate possible pointers (cf.
Paul Rovner, "On Adding Garbage Collection and Runtime Types -to a Strongly-Typed Statically Checked, Concurrent Language" Xerox PARC +to a Strongly-Typed Statically Checked, Concurrent Language" Xerox PARC CSL 84-7). Doug McIlroy wrote a simpler fully conservative collector that was part of version 8 UNIX (tm), but appears to not have received widespread use. @@ -142,8 +142,8 @@ at least one call to `GC_is_visible` to ensure that those areas are visible to the collector. Note that the garbage collector does not need to be informed of shared -read-only data. However if the shared library mechanism can introduce -discontiguous data areas that may contain pointers, then the collector does +read-only data. However, if the shared library mechanism can introduce +discontiguous data areas that may contain pointers then the collector does need to be informed. Signal processing for most signals may be deferred during collection, @@ -166,7 +166,7 @@ stored on the thread's stack for the duration of their lifetime. ## Installation and Portability -As distributed, the collector operates silently +The collector operates silently in the default configuration. In the event of problems, this can usually be changed by defining the `GC_PRINT_STATS` or `GC_PRINT_VERBOSE_STATS` environment variables. This will result in a few lines of descriptive output for each collection. @@ -215,12 +215,6 @@ runs a number of tests. `make install` installs at least libgc, and libcord. Try `./configure --help` to see the configuration options. It is currently not possible to exercise all combinations of build options this way. -It is suggested that if you need to replace a piece of the collector -(e.g. GC_mark_rts.c) you simply list your version ahead of gc.a on the -ld command line, rather than replacing the one in gc.a. (This will -generate numerous warnings under some versions of AIX, but it still -works.) - All include files that need to be used by clients will be put in the include subdirectory. 
(Normally this is just gc.h. `make cords` adds "cord.h" and "ec.h".) @@ -228,8 +222,6 @@ include subdirectory. (Normally this is just gc.h. `make cords` adds The collector currently is designed to run essentially unmodified on machines that use a flat 32-bit or 64-bit address space. That includes the vast majority of Workstations and X86 (X >= 3) PCs. -(The list here was deleted because it was getting too long and constantly -out of date.) In a few cases (Amiga, OS/2, Win32, MacOS) a separate makefile or equivalent is supplied. Many of these have separate README.system @@ -299,11 +291,8 @@ all of the following, plus many others. a given size. Returns a pointer to the new object, which may, or may not, be the same as the pointer to the old object. The new object is taken to be atomic if and only if the old one was. If the new object is composite - and larger than the original object then the newly added bytes are cleared - (we hope). This is very likely to allocate a new object, unless - `MERGE_SIZES` is defined in gc_priv.h. Even then, it is likely to recycle - the old object only if the object is grown in small additive increments - (which, we claim, is generally bad coding practice). + and larger than the original object then the newly added bytes are cleared. + This is very likely to allocate a new object. 4. `GC_free(object)` - Explicitly deallocate an object returned by `GC_malloc` or `GC_malloc_atomic`, or friends. Not necessary, but can be @@ -363,7 +352,7 @@ friends. All externally visible names in the garbage collector start with `GC_`. To avoid name conflicts, client code should avoid this prefix, except when -accessing garbage collector routines or variables. +accessing garbage collector routines. There are provisions for allocation with explicit type information. This is rarely necessary. Details can be found in gc_typed.h. @@ -371,12 +360,12 @@ This is rarely necessary. Details can be found in gc_typed.h. 
## The C++ Interface to the Allocator -The Ellis-Hull C++ interface to the collector is included in -the collector distribution. If you intend to use this, type -`make c++` after the initial build of the collector is complete. -See gc_cpp.h for the definition of the interface. This interface -tries to approximate the Ellis-Detlefs C++ garbage collection -proposal without compiler changes. +The Ellis-Hull C++ interface to the collector is included in the collector +distribution. If you intend to use this, type +`./configure --enable-cplusplus; make` (or `make -f Makefile.direct c++`) +after the initial build of the collector is complete. See gc_cpp.h for the +definition of the interface. This interface tries to approximate the +Ellis-Detlefs C++ garbage collection proposal without compiler changes. Very often it will also be necessary to use gc_allocator.h and the allocator declared there to construct STL data structures. Otherwise @@ -389,26 +378,25 @@ allocator, and objects they refer to may be prematurely collected. The collector may be used to track down leaks in C programs that are intended to run with malloc/free (e.g. code with extreme real-time or portability constraints). To do so define `FIND_LEAK` in Makefile. -This will cause the collector to invoke the `report_leak` -routine defined near the top of reclaim.c whenever an inaccessible -object is found that has not been explicitly freed. Such objects will -also be automatically reclaimed. - -If all objects are allocated with `GC_DEBUG_MALLOC` (see next section), then -the default version of report_leak will report at least the source file and -line number at which the leaked object was allocated. This may sometimes be -sufficient. (On a few machines, it will also report a cryptic stack trace. -If this is not symbolic, it can sometimes be called into a symbolic stack -trace by invoking program "foo" with "tools/callprocs.sh foo". 
It is a short -shell script that invokes adb to expand program counter values to symbolic -addresses. It was largely supplied by Scott Schwartz.) +This will cause the collector to print a human-readable object description +whenever an inaccessible object is found that has not been explicitly freed. +Such objects will also be automatically reclaimed. + +If all objects are allocated with `GC_DEBUG_MALLOC` (see the next section) +then, by default, the human-readable object description will at least contain +the source file and the line number at which the leaked object was allocated. +This may sometimes be sufficient. (On a few machines, it will also report +a cryptic stack trace. If this is not symbolic, it can sometimes be called +into a symbolic stack trace by invoking program "foo" with +`tools/callprocs.sh foo`. It is a short shell script that invokes adb to +expand program counter values to symbolic addresses. It was largely supplied +by Scott Schwartz.) Note that the debugging facilities described in the next section can -sometimes be slightly LESS effective in leak finding mode, since in -leak finding mode, `GC_debug_free` actually results in reuse of the object. -(Otherwise the object is simply marked invalid.) Also note that the test -program is not designed to run meaningfully in `FIND_LEAK` mode. -Use "make gc.a" to build the collector. +sometimes be slightly LESS effective in leak finding mode, since in the latter +`GC_debug_free` actually results in reuse of the object. (Otherwise the +object is simply marked invalid.) Also, note that most GC tests are not +designed to run meaningfully in `FIND_LEAK` mode. ## Debugging Facilities @@ -428,8 +416,8 @@ object, so accidentally repeated calls to `GC_debug_free` will report the deallocation of an object without debugging information. Out of memory errors will be reported to stderr, in addition to returning `NULL`. 
-`GC_debug_malloc` checking during garbage collection is enabled -with the first call to `GC_debug_malloc`. This will result in some +`GC_debug_malloc` checking during garbage collection is enabled +with the first call to this function. This will result in some slowdown during collections. If frequent heap checks are desired, this can be achieved by explicitly invoking `GC_gcollect`, e.g. from the debugger. @@ -442,18 +430,18 @@ low probability that `GC_malloc` allocated objects may be misidentified as having been overwritten. This should happen with probability at most one in 2**32. This probability is zero if `GC_debug_malloc` is never called. -`GC_debug_malloc`, `GC_malloc_atomic`, and `GC_debug_realloc` take two +`GC_debug_malloc`, `GC_debug_malloc_atomic`, and `GC_debug_realloc` take two additional trailing arguments, a string and an integer. These are not interpreted by the allocator. They are stored in the object (the string is not copied). If an error involving the object is detected, they are printed. -The macros `GC_MALLOC`, `GC_MALLOC_ATOMIC`, `GC_REALLOC`, `GC_FREE`, and -`GC_REGISTER_FINALIZER` are also provided. These require the same arguments -as the corresponding (nondebugging) routines. If gc.h is included +The macros `GC_MALLOC`, `GC_MALLOC_ATOMIC`, `GC_REALLOC`, `GC_FREE`, +`GC_REGISTER_FINALIZER` and friends are also provided. These require the same +arguments as the corresponding (nondebugging) routines. If gc.h is included with `GC_DEBUG` defined, they call the debugging versions of these functions, passing the current file name and line number as the two extra arguments, where appropriate. If gc.h is included without `GC_DEBUG` -defined, then all these macros will instead be defined to their nondebugging +defined then all these macros will instead be defined to their nondebugging equivalents. 
(`GC_REGISTER_FINALIZER` is necessary, since pointers to objects with debugging information are really pointers to a displacement of 16 bytes from the object beginning, and some translation is necessary @@ -482,7 +470,7 @@ of information: 1. Information provided by the VM system. This may be provided in one of several forms. Under Solaris 2.X (and potentially under other similar systems) information on dirty pages can be read from the /proc file system. - Under other systems (currently SunOS4.X) it is possible to write-protect + Under other systems (e.g. SunOS4.X) it is possible to write-protect the heap, and catch the resulting faults. On these systems we require that system calls writing to the heap (other than read) be handled specially by client code. See `os_dep.c` for details. diff --git a/doc/README.Mac b/doc/README.Mac index f2206d48..b317a2a5 100644 --- a/doc/README.Mac +++ b/doc/README.Mac @@ -199,9 +199,10 @@ Files to build the GC libraries: checksums.c dbg_mlc.c finalize.c + fnlz_mlc.c headers.c mach_dep.c - MacOS.c -- contains MacOS code + extra/MacOS.c -- contains MacOS code malloc.c mallocx.c mark.c @@ -213,9 +214,7 @@ Files to build the GC libraries: ptr_chck.c reclaim.c typd_mlc.c - gc++.cc -- this is 'gc_cpp.cc' with less 'inline' and - -- throw std::bad_alloc when out of memory - -- gc_cpp.cc works just fine too + gc_cpp.cc == 2. Test that the library works with 'test.c' == diff --git a/doc/README.autoconf b/doc/README.autoconf index 7f9de22c..dd373402 100644 --- a/doc/README.autoconf +++ b/doc/README.autoconf @@ -51,8 +51,9 @@ Important options to configure: Unless --prefix is set (or --exec-prefix or one of the more obscure options), -make install will install libgc.a and libgc.so in /usr/local/bin, which -would typically require the "make install" to be run as root. +"make install" will install libgc.a and libgc.so in /usr/local/lib and +/usr/local/bin, respectively, which would typically require the "make install" +to be run as root. 
It is not recommended to turn off parallel marking for multiprocessors unless a poor support of the feature on the platform. diff --git a/doc/README.darwin b/doc/README.darwin index 2727d0b1..4bc6a538 100644 --- a/doc/README.darwin +++ b/doc/README.darwin @@ -11,18 +11,18 @@ CFLAGS="-arch ppc -arch i386 -arch x86_64" ./configure --disable-dependency-trac == Important Usage Notes == -GC_init() MUST be called before calling any other GC functions. This +GC_INIT() MUST be called before calling any other GC functions. This is necessary to properly register segments in dynamic libraries. This call is required even if you code does not use dynamic libraries as the dyld code handles registering all data segments. When your use of the garbage collector is confined to dylibs and you -cannot call GC_init() before your libraries' static initializers have +cannot call GC_INIT() before your libraries' static initializers have run and perhaps called GC_malloc(), create an initialization routine -for each library to call GC_init(): +for each library to call GC_INIT(), e.g.: #include "gc.h" -extern "C" void my_library_init() { GC_init(); } +extern "C" void my_library_init() { GC_INIT(); } Compile this code into a my_library_init.o, and link it into your dylib. When you link the dylib, pass the -init argument with @@ -31,10 +31,10 @@ my_library_init.o -init _my_library_init). This causes my_library_init() to be called before any static initializers, and will initialize the garbage collector properly. -Note: It doesn't hurt to call GC_init() more than once, so it's best, +Note: It doesn't hurt to call GC_INIT() more than once, so it's best, if you have an application or set of libraries that all use the garbage collector, to create an initialization routine for each of -them that calls GC_init(). Better safe than sorry. +them that calls GC_INIT(). Better safe than sorry. The incremental collector is still a bit flaky on darwin. 
It seems to work reliably with workarounds for a few possible bugs in place however diff --git a/doc/README.hp b/doc/README.hp index 83708ea0..cc31b18d 100644 --- a/doc/README.hp +++ b/doc/README.hp @@ -15,4 +15,4 @@ Define GC_THREADS macro for the build. Incremental collection still does not work in combination with it. The stack finding code can be confused by putenv calls before collector -initialization. Call GC_malloc or GC_init before any putenv calls. +initialization. Call GC_malloc() or GC_INIT() before any putenv() calls. diff --git a/doc/README.linux b/doc/README.linux index 7f23c2c0..29084fa0 100644 --- a/doc/README.linux +++ b/doc/README.linux @@ -1,7 +1,7 @@ See README.alpha for Linux on DEC AXP info. -This file applies mostly to Linux/Intel IA32. Ports to Linux on an M68K, -IA64, SPARC, MIPS, Alpha and PowerPC are integrated too. They should behave +This file applies mostly to Linux/Intel IA-32. Ports to Linux on an M68K, +IA-64, SPARC, MIPS, Alpha and PowerPC are integrated too. They should behave similarly, except that the PowerPC port lacks incremental GC support, and it is unknown to what extent the Linux threads code is functional. See below for M68K specific notes. @@ -29,8 +29,8 @@ To use threads, you need to abide by the following requirements: in the Makefile. 3a) Every file that makes thread calls should define GC_THREADS, and then - include gc.h. Gc.h redefines some of the pthread primitives as macros - which also provide the collector with information it requires. + include gc.h. The latter redefines some of the pthread primitives as + macros which also provide the collector with information it requires. 
3b) A new alternative to (3a) is to build the collector and compile GC clients with -DGC_USE_LD_WRAP, and to link the final program with diff --git a/doc/README.macros b/doc/README.macros index 94596e98..886bb608 100644 --- a/doc/README.macros +++ b/doc/README.macros @@ -83,12 +83,13 @@ GC_NOT_DLL User-settable macro that overrides _DLL, e.g. if runtime dynamic libraries are used, but the collector is in a static library. Tested by gc_config_macros.h. -GC_REQUIRE_WCSDUP Force GC to export GC_wcsdup() (the Unicode version - of GC_strdup); could be useful in the leak-finding mode. - These define arguments influence the collector configuration: +GC_REQUIRE_WCSDUP Force GC to export GC_wcsdup() (the Unicode version + of GC_strdup); could be useful in the leak-finding mode. Clients should + define it before including gc.h if the function is needed. + FIND_LEAK Causes GC_find_leak to be initially set. This causes the collector to assume that all inaccessible objects should have been explicitly deallocated, and reports exceptions. Finalization and the test @@ -111,13 +112,13 @@ SUNOS5SIGS Solaris-like signal handling. This is probably misnamed, PCR Set if the collector is being built as part of the Xerox Portable Common Runtime. -IMPORTANT: Any of the _THREADS options must normally also be defined in - the client before including gc.h. This redefines thread primitives to - invoke the GC_ versions instead. Alternatively, linker-based symbol - interception can be used on a few platforms. - GC_THREADS Should set the appropriate one of the below macros, except GC_WIN32_PTHREADS, which must be set explicitly. Tested by gc.h. + IMPORTANT: GC_THREADS macro (or the relevant platform-specific deprecated + one) must normally also be defined by the client before including gc.h. + This redefines thread primitives to invoke the GC_ wrappers instead. + Alternatively, linker-based symbol interception can be used on a few + platforms. 
GC_SOLARIS_THREADS Enables support for Solaris pthreads. Must also define _REENTRANT. Deprecated, use GC_THREADS instead. @@ -155,8 +156,7 @@ GC_DGUX386_THREADS Enables support for DB/UX on I386 threads. See README.DGUX386. (Probably has not been tested recently.) Deprecated, use GC_THREADS instead. -GC_WIN32_THREADS Enables support for Win32 threads. That makes sense - for Makefile (and Makefile.direct) only under Cygwin or MinGW. Deprecated, +GC_WIN32_THREADS Enables support for Win32 threads. Deprecated, use GC_THREADS instead. GC_WIN32_PTHREADS Enables support for pthreads-win32 (or other @@ -183,9 +183,9 @@ NO_CLOCK Do not use system clock. Disables some statistic printing. GC_DISABLE_INCREMENTAL Turn off the incremental collection support. -NO_INCREMENTAL Causes the gctest program to not invoke the incremental - collector. This has no impact on the generated library, only on the test - program. (This is often useful for debugging failures unrelated to +NO_INCREMENTAL Causes the GC test programs to not invoke the incremental mode + of the collector. This has no impact on the generated library, only on the + test programs. (This is often useful for debugging failures unrelated to incremental GC.) LARGE_CONFIG Tunes the collector for unusually large heaps. @@ -211,7 +211,7 @@ NO_EXECUTE_PERMISSION May cause some or all of the heap to not execute permission is required. GC_NO_OPERATOR_NEW_ARRAY Declares that the C++ compiler does not - support the new syntax "operator new[]" for allocating and deleting arrays. + support the new syntax "operator new[]" for allocating and deleting arrays. See gc_cpp.h for details. No effect on the C part of the collector. This is defined implicitly in a few environments. Must also be defined by clients that use gc_cpp.h. @@ -327,9 +327,9 @@ KEEP_BACK_PTRS Add code to save back pointers in debugging headers for debugging/profiling purposes. The gc_backptr.h interface is implemented only if this is defined. 
-GC_ASSERTIONS Enable some internal GC assertion checking. Currently - this facility is only used in a few places. It is intended primarily - for debugging of the garbage collector itself, but could also... +GC_ASSERTIONS Enable some internal GC assertion checking. It is intended + primarily for debugging of the garbage collector itself, but could also + help to identify cases of incorrect GC usage by a client. DBG_HDRS_ALL Make sure that all objects have debug headers. Increases the reliability (from 99.9999% to 100% mod. bugs) of some of the debugging @@ -350,10 +350,10 @@ SHORT_DBG_HDRS Assume that all objects have debug headers. Shorten SAVE_CALL_COUNT= Set the number of call frames saved with objects allocated through the debugging interface. Affects the amount of information generated in leak reports. Only matters on platforms - on which we can quickly generate call stacks, currently Linux/(X86 & SPARC) - and Solaris/SPARC and platforms that provide execinfo.h. - Default is zero. On X86, client - code should NOT be compiled with -fomit-frame-pointer. + on which we can quickly generate call stacks, currently Linux/X86, + Linux/SPARC, Solaris/SPARC, and platforms that provide execinfo.h. + Default is zero. On X86, client code should NOT be compiled with + -fomit-frame-pointer. SAVE_CALL_NARGS= Set the number of functions arguments to be saved with each call frame. Default is zero. Ignored if we don't know how to @@ -367,13 +367,11 @@ GC_GCJ_SUPPORT Includes support for gcj (and possibly other systems that include a pointer to a type descriptor in each allocated object). USE_I686_PREFETCH Causes the collector to issue Pentium III style - prefetch instructions. No effect except on X86 Linux platforms. - Assumes a very recent gcc-compatible compiler and assembler. - (Gas prefetcht0 support was added around May 1999.) + prefetch instructions. No effect except on Linux/X86 platforms. 
Empirically the code appears to still run correctly on Pentium II processors, though with no performance benefit. May not run on other - X86 processors? In some cases this improves performance by - 15% or so. + X86 processors probably. In some cases this improves performance by 15% + or so. USE_3DNOW_PREFETCH Causes the collector to issue AMD 3DNow style prefetch instructions. Same restrictions as USE_I686_PREFETCH. @@ -396,8 +394,7 @@ GC_USE_DLOPEN_WRAP Causes the collector to redefine malloc and THREAD_LOCAL_ALLOC Defines GC_malloc(), GC_malloc_atomic() and GC_gcj_malloc() to use a per-thread set of free-lists. These then allocate in a way that usually does not involve acquisition of a global lock. - Recommended for multiprocessors. Requires explicit GC_INIT() call, unless - REDIRECT_MALLOC is defined and GC_malloc is used first. + Recommended for multiprocessors. USE_COMPILER_TLS Causes thread local allocation to use the compiler-supported "__thread" thread-local variables. This is the @@ -535,7 +532,7 @@ DONT_USE_USER32_DLL (Win32 only) Don't use "user32" DLL import library GC_PREFER_MPROTECT_VDB Choose MPROTECT_VDB manually in case of multiple virtual dirty bit strategies are implemented (at present useful on Win32 and Solaris to force MPROTECT_VDB strategy instead of the default GWW_VDB or - PROC_VDB ones). + PROC_VDB ones, respectively). GC_IGNORE_GCJ_INFO Disable GCJ-style type information (useful for debugging on WinCE). diff --git a/doc/README.solaris2 b/doc/README.solaris2 index 7d8814fe..85f0ea22 100644 --- a/doc/README.solaris2 +++ b/doc/README.solaris2 @@ -10,7 +10,7 @@ the collector normally obtains memory through sbrk. There is some reason to expect that this is not safe if the client program also calls the system malloc, or especially realloc. 
The sbrk man page strongly suggests this is not safe: "Many library routines use malloc() internally, so use brk() -and sbrk() only when you know that malloc() definitely will not be used by +and sbrk() only when you know that malloc() definitely will not be used by any library routine." This doesn't make a lot of sense to me, since there seems to be no documentation as to which routines can transitively call malloc. Nonetheless, under Solaris2, the collector now allocates @@ -33,9 +33,9 @@ by configure --disable-parallel-mark option). It is also essential that gc.h be included in files that call pthread_create, pthread_join, pthread_detach, or dlopen. gc.h macro defines these to also do -GC bookkeeping, etc. gc.h must be included with one or both of these macros -defined, otherwise these replacements are not visible. A collector built in -this way way only be used by programs that are linked with the threads library. +GC bookkeeping, etc. gc.h must be included with GC_THREADS macro defined +first, otherwise these replacements are not visible. A collector built in +this way may only be used by programs that are linked with the threads library. Unless USE_PROC_FOR_LIBRARIES is defined, dlopen disables collection temporarily. In some unlikely cases, this can result in unpleasant heap @@ -46,11 +46,11 @@ GC_malloc, it is necessary to call GC_INIT explicitly before forking the first thread. (This avoids a deadlock arising from calling GC_thr_init with the allocation lock held.) -It appears that there is a problem in using gc_cpp.h in conjunction with -Solaris threads and Sun's C++ runtime. Apparently the overloaded new operator -is invoked by some iostream initialization code before threads are correctly -initialized. As a result, call to thr_self() in garbage collector -initialization SEGV faults. Currently the only known workaround is to not +There could be an issue when using gc_cpp.h in conjunction with Solaris +threads and Sun's C++ runtime. 
Apparently the overloaded new operator +may be invoked by some iostream initialization code before threads are +correctly initialized. This may cause a SIGSEGV during initialization +of the garbage collector. Currently the only known workaround is to not invoke the garbage collector from a user defined global operator new, or to have it invoke the garbage-collector's allocators only after main has started. (Note that the latter requires a moderately expensive test in operator diff --git a/doc/README.win32 b/doc/README.win32 index 19038c40..48a95fa2 100644 --- a/doc/README.win32 +++ b/doc/README.win32 @@ -1,8 +1,7 @@ -The collector has at various times been compiled under Windows 95 & later, NT, -and XP, with the original Microsoft SDK, with Visual C++ 2.0, 4.0, and 6, with -the GNU win32 tools, with Borland C++ Builder, with Watcom C, and -with the Digital Mars compiler. It is likely that some of these have been -broken in the meantime. Patches are appreciated. +The collector has at various times been compiled under Windows 95 and later, +NT, and XP, with the original Microsoft SDK, with Visual C++ 2.0, 4.0, and 6, +with the GNU win32 tools, with Borland C++ Builder, with Watcom C, with EMX, +and with the Digital Mars compiler (DMC). For historical reasons, the collector test program "gctest" is linked as a GUI application, @@ -11,8 +10,8 @@ but does not open any windows. Its output normally appears in the file cursor may appear as long as it's running. If it is started from the command line, it will usually run in the background. Wait a few minutes (a few seconds on a modern machine) before you check the output. -You should see either a failure indication or a "Collector appears to -work" message. +You should see either a failure indication or a "Collector appears to work" +message. A toy editor (cord/de.exe) based on cords (heavyweight strings represented as trees) has been ported and is included. 
@@ -40,6 +39,7 @@ since we now separate heap sections with an unused page.) Microsoft Tools --------------- + For Microsoft development tools, type "nmake -f NT_MAKEFILE cpu=i386 make_as_lib=1 nothreads=1 nodebug=1" to build the release variant of the collector as a static library without @@ -61,6 +61,7 @@ collector was built as a static library. GNU Tools --------- + The collector should be buildable under Cygwin with the "./configure; make check" machinery. @@ -78,6 +79,7 @@ Memory unmapping could be turned off by "--disable-munmap" option. Borland Tools ------------- + [Rarely tested.] For Borland tools, use BCC_MAKEFILE. Note that Borland's compiler defaults to 1 byte alignment in structures (-a1), @@ -132,7 +134,6 @@ If the gc is compiled as dll, the macro "GC_DLL" should be defined before including "gc.h" (for example, with -DGC_DLL compiler option). It's important, otherwise resulting programs will not run. - Special note for OpenWatcom users: the C (unlike the C++) compiler (of the latest stable release, not sure for older ones) doesn't force pointer global variables (i.e. not struct fields, not sure for locals) to be aligned unless @@ -141,9 +142,9 @@ pragma) only controls alignment for structs; I don't know whether it's a bug or a feature (see an old report of same kind - http://bugzilla.openwatcom.org/show_bug.cgi?id=664), so You are warned. - Incremental Collection ---------------------- + There is some support for incremental collection. By default, the collector chooses between explicit page protection, and GetWriteWatch-based write tracking automatically, depending on the platform. @@ -199,7 +200,7 @@ CMakeLists.txt). For the normal, non-dll-based thread tracking to work properly, threads should be created with GC_CreateThread or GC_beginthreadex, -and exit normally or call GC_endthreadex or GC_ExitThread. (For Cygwin, the +and exit normally, or call GC_endthreadex or GC_ExitThread. 
(For Cygwin, the standard pthread_create/exit calls could be used instead.) As in the pthread case, including gc.h will redefine CreateThread, _beginthreadex, _endthreadex, and ExitThread to call the GC_ versions instead. diff --git a/doc/debugging.md b/doc/debugging.md index 10727f27..c43c355c 100644 --- a/doc/debugging.md +++ b/doc/debugging.md @@ -43,13 +43,8 @@ currently uses SIGPWR and SIGXCPU by default. The garbage collector generates warning messages of the form: - Needed to allocate blacklisted block at 0x... - - -or - - Repeated allocation of very large block ... + May lead to memory leak and poor performance when it needs to allocate a block at a location that it knows to be referenced diff --git a/doc/finalization.md b/doc/finalization.md index 75428cd6..49aee982 100644 --- a/doc/finalization.md +++ b/doc/finalization.md @@ -37,19 +37,18 @@ In general the following guidelines should be followed: recently used logically open files. Any other needed files would be closed after saving their state. They would then be reopened on demand. Finalization would logically close the file, closing the real descriptor - only if it happened to be cached.) Note that most modern systems (e.g. Irix) - allow hundreds or thousands of open files, and this is typically not - an issue. + only if it happened to be cached.) Note that most modern systems allow + thousands of open files, and this is typically not an issue. * Finalization code may be run anyplace an allocation or other call to the collector takes place. In multi-threaded programs, finalizers have to obey the normal locking conventions to ensure safety. Code run directly from finalizers should not acquire locks that may be held during allocation. - This restriction can be easily circumvented by registering a finalizer which - enqueues the real action for execution in a separate thread. 
+ This restriction can be easily circumvented by calling + `GC_set_finalize_on_demand(1)` at program start and creating a separate + thread dedicated to periodic invocation of `GC_invoke_finalizers()`. -In single-threaded code, it is also often easiest to have finalizers queue -actions, which are then explicitly run during an explicit call by the user's -program. +In single-threaded code, it is also often easiest to have finalizers queued +and then to have them explicitly executed by `GC_invoke_finalizers()`. ## Topologically ordered finalization diff --git a/doc/gc.man b/doc/gc.man index 590bf544..dd3da022 100644 --- a/doc/gc.man +++ b/doc/gc.man @@ -1,4 +1,4 @@ -.TH BDWGC 3 "15 Aug 2018" +.TH BDWGC 3 "26 Mar 2019" .SH NAME GC_malloc, GC_malloc_atomic, GC_free, GC_realloc, GC_enable_incremental, GC_register_finalizer, GC_malloc_ignore_off_page, GC_malloc_atomic_ignore_off_page, GC_set_warn_proc \- Garbage collecting malloc replacement .SH SYNOPSIS @@ -6,10 +6,20 @@ GC_malloc, GC_malloc_atomic, GC_free, GC_realloc, GC_enable_incremental, GC_regi .br void * GC_malloc(size_t size); .br +void * GC_malloc_atomic(size_t size); +.br void GC_free(void *ptr); .br void * GC_realloc(void *ptr, size_t size); .br +void GC_enable_incremental(); +.br +void * GC_malloc_ignore_off_page(size_t size); +.br +void * GC_malloc_atomic_ignore_off_page(size_t size); +.br +void GC_set_warn_proc(void (*proc)(char *, GC_word)); +.br .sp cc ... -lgc .LP @@ -67,7 +77,7 @@ inform the collector that the client code will always maintain a pointer to near .LP It is also possible to use the collector to find storage leaks in programs destined to be run with standard malloc/free. The collector can be compiled for thread-safe operation. Unlike standard malloc, it is safe to call malloc after a previous malloc call was interrupted by a signal, provided the original malloc call is not resumed. .LP -The collector may, on rare occasion produce warning messages. On UNIX machines these appear on stderr.
Warning messages can be filtered, redirected, or ignored with +The collector may, on rare occasion, produce warning messages. On UNIX machines these appear on stderr. Warning messages can be filtered, redirected, or ignored with .I GC_set_warn_proc This is recommended for production code. See gc.h for details. @@ -75,7 +85,7 @@ This is recommended for production code. See gc.h for details. Fully portable code should call .I GC_INIT -from the main program before making any other GC calls. +from the primordial thread of the main program before making any other GC calls. On most platforms this does nothing and the collector is initialized on first use. On a few platforms explicit initialization is necessary. And it can never hurt. .LP diff --git a/doc/gcdescr.md b/doc/gcdescr.md index 923ccf82..daaf5203 100644 --- a/doc/gcdescr.md +++ b/doc/gcdescr.md @@ -58,17 +58,16 @@ of the garbage collector is stored inside the `_GC_arrays` structure. This allows the garbage collector to easily ignore the collectors own data structures when it searches for root pointers. Other allocator and collector internal data structures are allocated dynamically with `GC_scratch_alloc`. -`GC_scratch_alloc` does not allow for deallocation, and is therefore used only -for permanent data structures. +The latter does not allow for deallocation, and is therefore used only for +permanent data structures. -The allocator allocates objects of different _kinds_. Different kinds are +The allocator returns objects of different _kinds_. Different _kinds_ are handled somewhat differently by certain parts of the garbage collector. Certain kinds are scanned for pointers, others are not. Some may have per-object type descriptors that determine pointer locations. Or a specific kind may correspond to one specific object layout. Two built-in kinds are -uncollectible. -In spite of that, it is very likely that most C clients of the collector -currently use at most two kinds: `NORMAL` and `PTRFREE` objects. 
The +uncollectible. In spite of that, it is very likely that most C clients of the +collector currently use at most two kinds: `NORMAL` and `PTRFREE` objects. The [GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcj/) runtime also makes heavy use of a kind (allocated with `GC_gcj_malloc`) that stores type information at a known offset in method tables. @@ -181,7 +180,7 @@ variables are located, it scans the following _root segments_ for pointers: a length. (For other possibilities, see `gc_mark.h`.) At the beginning of the mark phase, all root segments (as described above) are -pushed on the stack by `GC_push_roots`. (Registers and eagerly processed stack +pushed on the stack by `GC_push_roots`. (Registers and eagerly scanned stack sections are processed by pushing the referenced objects instead of the stack section itself.) If `ALL_INTERIOR_POINTERS` is not defined, then stack roots require special treatment. In this case, the normal marking code ignores @@ -233,9 +232,9 @@ forward progress, even in case of repeated mark stack overflows. Every mark attempt results in additional marked objects. Each mark stack entry is processed by examining all candidate pointers in the -range described by the entry. If the region has no associated type -information, then this typically requires that each 4-byte aligned quantity -(8-byte aligned with 64-bit pointers) be considered a candidate pointer. +range described by the entry. If the region has no associated type information +then this typically requires that each 4-byte aligned quantity (8-byte aligned +if 64-bit pointers) be considered a candidate pointer. We determine whether a candidate pointer is actually the address of a heap block. This is done in the following steps: @@ -248,8 +247,10 @@ block. 
This is done in the following steps: * The candidate pointer is divided into two pieces; the most significant bits identify a `HBLKSIZE`-sized page in the address space, and the least significant bits specify an offset within that page. (A hardware page may - actually consist of multiple such pages. HBLKSIZE is usually the page size - divided by a small power of two.) + actually consist of multiple such pages. Normally, HBLKSIZE is the + page size divided by a small power of two. Alternatively, if the collector + is built with `-DLARGE_CONFIG`, such a page may consist of multiple hardware + pages.) * The page address part of the candidate pointer is looked up in a [table](tree.md). Each table entry contains either 0, indicating that the page is not part of the garbage collected heap, a small integer _n_, @@ -268,8 +269,8 @@ block. This is done in the following steps: operation in computing the object start address. * The mark bit for the target object is checked and set. If the object was previously unmarked, the object is pushed on the mark stack. The descriptor - is read from the page descriptor. (This is computed from information - `GC_obj_kinds` when the page is first allocated.) + is read from the page descriptor. (This is computed from information stored + in `GC_obj_kinds` when the page is first allocated.) At the end of the mark phase, mark bits for left-over free lists are cleared, in case a free list was accidentally marked due to a stray pointer. @@ -372,7 +373,7 @@ collector, and hence provoking unneeded heap growth. In incremental mode, the heap is always expanded when we encounter insufficient space for an allocation. Garbage collection is triggered whenever -we notice that more than `GC_heap_size`/2 * `GC_free_space_divisor` bytes +we notice that more than `GC_heap_size / 2 * GC_free_space_divisor` bytes of allocation have taken place. After `GC_full_freq` minor collections a major collection is started.
@@ -473,27 +474,27 @@ allocation in the next section. ## Thread-local allocation -If thread-local allocation is enabled, the collector keeps separate arrays -of free lists for each thread. Thread-local allocation is currently only -supported on a few platforms. +If thread-local allocation is enabled (which is true in the default +configuration for most supported platforms), the collector keeps separate +arrays of free lists for each thread. The free list arrays associated with each thread are only used to satisfy requests for objects that are both very small, and belong to one of a small -number of well-known kinds. These currently include _normal_ and pointer-free -objects. Depending on the configuration, _gcj_ objects may also be included. +number of well-known kinds. These include _normal_, pointer-free, _gcj_ and +_disclaim_ objects. Thread-local free list entries contain either a pointer to the first element of a free list, or they contain a counter of the number of allocation granules, corresponding to objects of this size, allocated so far. Initially they contain the value one, i.e. a small counter value. -Thread-local allocation allocates directly through the global allocator, -if the object is of a size or kind not covered by the local free lists. +Thread-local allocation goes directly through the global allocator if the +object is of a size or kind not covered by the local free lists. If there is an appropriate local free list, the allocator checks whether it contains a sufficiently small counter value. If so, the counter is simply -incremented by the counter value, and the global allocator is used. In this -way, the initial few allocations of a given size bypass the local allocator. +incremented by a value, and the global allocator is used. In this way, +the initial few allocations of a given size bypass the local allocator. A thread that only allocates a handful of objects of a given size will not build up its own free list for that size. 
This avoids wasting space for unpopular objects sizes or kinds. diff --git a/doc/gcinterface.md b/doc/gcinterface.md index 9fb1b849..cbce0983 100644 --- a/doc/gcinterface.md +++ b/doc/gcinterface.md @@ -15,7 +15,8 @@ on how the collector is built, this will be `gc.a` or `libgc.{a,so}`. The following describes the standard C interface to the garbage collector. It is not a complete definition of the interface. It describes only the most commonly used functionality, approximately in decreasing order of frequency -of use. The full interface is described in `gc.h` file. +of use. This somewhat duplicates the information in `gc.man` file. The full +interface is described in `gc.h` file. Clients should include `gc.h` (i.e., not `gc_config_macros.h`, `gc_pthread_redirects.h`, `gc_version.h`). In the case of multi-threaded code, @@ -27,11 +28,11 @@ to cooperate with the GC on many platforms. Thread users should also be aware that on many platforms objects reachable only from thread-local variables may be prematurely reclaimed. Thus objects pointed to by thread-local variables should also be pointed to by a globally -visible data structure. (This is viewed as a bug, but as one that -is exceedingly hard to fix without some `libc` hooks.) +visible data area, e.g. thread's stack. (This behavior is viewed as a bug, but +as one that is exceedingly hard to fix without some `libc` hooks.) -**void * `GC_MALLOC`(size_t _nbytes_)** - Allocates and clears _nbytes_ -of storage. Requires (amortized) time proportional to _nbytes_. The resulting +**void * `GC_MALLOC`(size_t _bytes_)** - Allocates and clears _bytes_ +of storage. Requires (amortized) time proportional to _bytes_. The resulting object will be automatically deallocated when unreferenced. References from objects allocated with the system malloc are usually not considered by the collector. (See `GC_MALLOC_UNCOLLECTABLE`, however. 
Building the collector @@ -40,33 +41,33 @@ with `-DREDIRECT_MALLOC=GC_malloc_uncollectable` is often a way around this.) is defined before `gc.h` is included, a debugging version that checks occasionally for overwrite errors, and the like. -**void * `GC_MALLOC_ATOMIC`(size_t _nbytes_)** - Allocates _nbytes_ -of storage. Requires (amortized) time proportional to _nbytes_. The resulting +**void * `GC_MALLOC_ATOMIC`(size_t _bytes_)** - Allocates _bytes_ +of storage. Requires (amortized) time proportional to _bytes_. The resulting object will be automatically deallocated when unreferenced. The client promises that the resulting object will never contain any pointers. The memory is not cleared. This is the preferred way to allocate strings, floating point arrays, bitmaps, etc. More precise information about pointer locations can be communicated to the collector using the interface in `gc_typed.h`. -**void * `GC_MALLOC_UNCOLLECTABLE`(size_t _nbytes_)** - Identical +**void * `GC_MALLOC_UNCOLLECTABLE`(size_t _bytes_)** - Identical to `GC_MALLOC`, except that the resulting object is not automatically deallocated. Unlike the system-provided `malloc`, the collector does scan the object for pointers to garbage-collectible memory, even if the block itself does not appear to be reachable. (Objects allocated in this way are effectively treated as roots by the collector.) -**void * `GC_REALLOC`(void * _old_, size_t _new_size_)** - Allocates a new -object of the indicated size and copy (a prefix of) the old object into the +**void * `GC_REALLOC`(void * _old_object_, size_t _new_bytes_)** - Allocates +a new object of the indicated size and copies the old object's content into the new object. The old object is reused in place if convenient. If the original object was allocated with `GC_MALLOC_ATOMIC`, the new object is subject to the same constraints. 
If it was allocated as an uncollectible object, then the new object is uncollectible, and the old object (if different) is deallocated.
-**void `GC_FREE`(void * _dead_)** - Explicitly deallocates an object. +**void `GC_FREE`(void * _object_)** - Explicitly deallocates an _object_. Typically not useful for small collectible objects. -**void * `GC_MALLOC_IGNORE_OFF_PAGE`(size_t _nbytes_)** and -**void * `GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE`(size_t _nbytes_)** - Analogous +**void * `GC_MALLOC_IGNORE_OFF_PAGE`(size_t _bytes_)** and +**void * `GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE`(size_t _bytes_)** - Analogous to `GC_MALLOC` and `GC_MALLOC_ATOMIC`, respectively, except that the client guarantees that as long as the resulting object is of use, a pointer is maintained to someplace inside the first 512 bytes of the object. This @@ -75,7 +76,9 @@ optimizations. (Other nonvolatile pointers to the object may exist as well.) This is the preferred way to allocate objects that are likely to be more than 100 KB in size. It greatly reduces the risk that such objects will be accidentally retained when they are no longer needed. Thus space usage may -be significantly reduced. +be significantly reduced. Another way is `GC_set_all_interior_pointers(0)` +called at program start (this, however, is generally not suitable for C++ code +because of multiple inheritance). **void `GC_INIT()`** - On some platforms, it is necessary to invoke this _from the main executable_, _not from a dynamic library_, before the initial @@ -89,9 +92,9 @@ as possible. to perform a small amount of work every few invocations of `GC_MALLOC` or the like, instead of performing an entire collection at once. This is likely to increase total running time. It will improve response on a platform that -either has suitable support in the garbage collector (Linux and most Unix -versions, Win32 if the collector was suitably built). On many platforms this -interacts poorly with system calls that write to the garbage collected heap. +has suitable support in the garbage collector (Linux and most Unix versions, +Win32 if the collector was suitably built).
On many platforms this interacts +poorly with system calls that write to the garbage collected heap. **void `GC_set_warn_proc`(GC_warn_proc)** - Replaces the default procedure used by the collector to print warnings. The collector may otherwise @@ -105,20 +108,17 @@ releasing system resources (e.g. closing files) when the object referencing them becomes inaccessible. It is not an acceptable method to perform actions that must be performed in a timely fashion. See `gc.h` for details of the interface. See also [here](finalization.md) for a more detailed discussion -of the design. - -Note that an object may become inaccessible before client code is done -operating on objects referenced by its fields. Suitable synchronization -is usually required. See -[here](http://portal.acm.org/citation.cfm?doid=604131.604153) -or [here](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for -details. +of the design. Note that an object may become inaccessible before client code +is done operating on objects referenced by its fields. Suitable +synchronization is usually required. See +[here](http://portal.acm.org/citation.cfm?doid=604131.604153) or +[here](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for details. If you are concerned with multiprocessor performance and scalability, you should consider enabling and using thread local allocation. -If your platform supports it, you should build the collector with parallel -marking support (`-DPARALLEL_MARK`); configure has it on by default. +If your platform supports it, you should also build the collector with +parallel marking support (`-DPARALLEL_MARK`); configure has it on by default. If the collector is used in an environment in which pointer location information for heap objects is easily available, this can be passed on to the diff --git a/doc/leak.md b/doc/leak.md index cf5992c4..9d5f32ba 100644 --- a/doc/leak.md +++ b/doc/leak.md @@ -19,26 +19,26 @@ of paging. 
The garbage collector provides leak detection support. This includes the following features: - 1. Leak detection mode can be initiated at run-time by setting - `GC_find_leak` instead of building the collector with `FIND_LEAK` defined. - This variable should be set to a nonzero value at program startup. + 1. Leak detection mode can be initiated at run-time by `GC_set_find_leak(1)` + call at program startup instead of building the collector with `FIND_LEAK` + macro defined. 2. Leaked objects should be reported and then correctly garbage collected. -To use the collector as a leak detector, follow the following steps: +To use the collector as a leak detector, take the following steps: - 1. Build the collector with `-DFIND_LEAK`. Otherwise use default build - options. + 1. Activate the leak detection mode as described above. 2. Change the program so that all allocation and deallocation goes through the garbage collector. - 3. Arrange to call `GC_gcollect` at appropriate points to check for leaks. - (For sufficiently long running programs, this will happen implicitly, but - probably not with sufficient frequency.) + 3. Arrange to call `GC_gcollect` (or `CHECK_LEAKS()`) at appropriate points + to check for leaks. (This happens implicitly but probably not with + a sufficient frequency for long-running programs.) The second step can usually be accomplished with the `-DREDIRECT_MALLOC=GC_malloc` option when the collector is built, or by -defining `malloc`, `calloc`, `realloc` and `free` to call the corresponding +defining `malloc`, `calloc`, `realloc`, `free` (as well as `strdup`, +`strndup`, `wcsdup`, `memalign`, `posix_memalign`) to call the corresponding garbage collector functions. But this, by itself, will not yield very -informative diagnostics, since the collector does not keep track of +informative diagnostics, since the collector does not keep track of the information about how objects were allocated. The error reports will include only object addresses.
@@ -57,7 +57,7 @@ The same is generally true of thread support. However, the correct leak reports should be generated with linuxthreads, at least. On a few platforms (currently Solaris/SPARC, Irix, and, with --DSAVE_CALL_CHAIN, Linux/X86), `GC_MALLOC` also causes some more information +`-DSAVE_CALL_CHAIN`, Linux/X86), `GC_MALLOC` also causes some more information about its call stack to be saved in the object. Such information is reproduced in the error reports in very non-symbolic form, but it can be very useful with the aid of a debugger. @@ -70,13 +70,14 @@ distribution. Assume the collector has been built with `-DFIND_LEAK` or `GC_set_find_leak(1)` exists as the first statement in `main`. -The program to be tested for leaks can then look like "leak_test.c" file -in the "tests" subdirectory of the distribution. +The program to be tested for leaks could look like `tests/leak_test.c` file +of the distribution. On an Intel X86 Linux system this produces on the stderr stream: - Leaked composite object at 0x806dff0 (leak_test.c:8, sz=4) + Found 1 leaked objects: + 0x806dff0 (tests/leak_test.c:19, sz=4, NORMAL) (On most unmentioned operating systems, the output is similar to this. If the @@ -87,7 +88,8 @@ not be compiled with `-fomit_frame_pointer`.) On Irix it reports: - Leaked composite object at 0x10040fe0 (leak_test.c:8, sz=4) + Found 1 leaked objects: + 0x10040fe0 (tests/leak_test.c:19, sz=4, NORMAL) Caller at allocation: ##PC##= 0x10004910 @@ -95,7 +97,8 @@ On Irix it reports: and on Solaris the error report is: - Leaked composite object at 0xef621fc8 (leak_test.c:8, sz=4) + Found 1 leaked objects: + 0xef621fc8 (tests/leak_test.c:19, sz=4, NORMAL) Call chain at allocation: args: 4 (0x4), 200656 (0x30FD0) ##PC##= 0x14ADC @@ -106,14 +109,13 @@ and on Solaris the error report is: In the latter two cases some additional information is given about how malloc was called when the leaked object was allocated. 
For Solaris, the first line specifies the arguments to `GC_debug_malloc` (the actual allocation routine), -The second the program counter inside main, the third the arguments to `main`, -and finally the program counter inside the caller to main (i.e. in the -C startup code). - -In the Irix case, only the address inside the caller to main is given. +the second one specifies the program counter inside `main`, the third one +specifies the arguments to `main`, and, finally, the program counter inside +the caller to `main` (i.e. in the C startup code). In the Irix case, only the +address inside the caller to `main` is given. In many cases, a debugger is needed to interpret the additional information. -On systems supporting the "adb" debugger, the `tools/callprocs.sh` script can +On systems supporting the `adb` debugger, the `tools/callprocs.sh` script can be used to replace program counter values with symbolic names. The collector tries to generate symbolic names for call stacks if it knows how to do so on the platform. This is true on Linux/X86, but not on most other platforms. diff --git a/doc/overview.md b/doc/overview.md index 2a28a0fa..3edccf1f 100644 --- a/doc/overview.md +++ b/doc/overview.md @@ -6,9 +6,9 @@ * Platforms * Some collector details * Further reading - * Local Links for this collector - * Local Background Links - * Contacts and Mailing List + * Information provided on the BDWGC site + * More background information + * Contacts and new release announcements [ This is an updated version of the page formerly at `www.hpl.hp.com/personal/Hans_Boehm/gc/`, before that at @@ -39,6 +39,8 @@ legacy. Usually you should use the one marked as the _latest stable_ release. Preview versions may contain additional features, platform support, but are likely to be less well tested. The list of changes for each version is specified on the [releases](https://github.com/ivmai/bdwgc/releases) page.
+The development version (snapshot) is available in the master branch of +[bdwgc git](https://github.com/ivmai/bdwgc) repository on GitHub. The arguments for and against conservative garbage collection in C and C++ are briefly discussed [here](http://www.hboehm.info/gc/issues.html). The @@ -48,28 +50,30 @@ beginnings of a frequently-asked-questions list are The garbage collector code is copyrighted by [Hans-J. Boehm](http://www.hboehm.info), Alan J. Demers, [Xerox Corporation](http://www.xerox.com/), -[Silicon Graphics](http://www.sgi.com/), and -[Hewlett-Packard Company](http://www.hp.com/). It may be used and copied -without payment of a fee under minimal restrictions. See the README.md file -in the distribution or the [license](http://www.hboehm.info/gc/license.txt) -for more details. **IT IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY -EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK**. +[Silicon Graphics](http://www.sgi.com/), +[Hewlett-Packard Company](http://www.hp.com/), +[Ivan Maidanski](https://github.com/ivmai), and partially by some others. +It may be used and copied without payment of a fee under minimal restrictions. +See the README.md file in the distribution or the +[license](http://www.hboehm.info/gc/license.txt) for more details. +**IT IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED OR IMPLIED. +ANY USE IS AT YOUR OWN RISK.** Empirically, this collector works with most unmodified C programs, simply -by replacing `malloc` with `GC_malloc` calls, replacing `realloc` with -`GC_realloc` calls, and removing free calls. Exceptions are discussed +by replacing `malloc` and `calloc` with `GC_malloc` calls, replacing `realloc` +with `GC_realloc` calls, and removing `free` calls. Exceptions are discussed [here](http://www.hboehm.info/gc/issues.html). ## Platforms The collector is not completely portable, but the distribution includes ports to most standard PC and UNIX/Linux platforms. 
The collector should work -on Linux, *BSD, recent Windows versions, MacOS X, HP/UX, Solaris, Tru64, Irix -and a few other operating systems. Some ports are more polished than others. +on Linux, Android, BSD variants, OS/2, Windows (Win32 and Win64), MacOS X, +iOS, HP/UX, Solaris, Tru64, Irix, Symbian and other operating systems. Some +platforms are more polished (better supported) than others. -Irix pthreads, Linux threads, Win32 threads, Solaris threads (pthreads only), -HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported in recent -versions. +Irix pthreads, Linux threads, Windows threads, Solaris threads (pthreads +only), HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported. ## Some Collector Details @@ -77,7 +81,7 @@ The collector uses a [mark-sweep](http://www.hboehm.info/gc/complexity.html) algorithm. It provides incremental and generational collection under operating systems which provide the right kind of virtual memory support. (Currently this includes SunOS[45], IRIX, OSF/1, Linux, and Windows, with varying -restrictions.) It allows [_finalization_](finalization.md) code to be invoked +restrictions.) It allows [finalization](finalization.md) code to be invoked when an object is collected. It can take advantage of type information to locate pointers if such information is provided, but it is usually used without such information. See the README and `gc.h` files in the distribution @@ -102,16 +106,18 @@ thread-local allocation, it may in some cases significantly outperform `malloc`/`free` allocation in time. We also expect that in many cases any additional overhead will be more than -compensated for by decreased copying etc. if programs are written and tuned +compensated for by e.g. decreased copying if programs are written and tuned for garbage collection. ## Further reading **The beginnings of a frequently asked questions list for this collector are -[here](http://www.hboehm.info/gc/faq.html)**. 
+[here](http://www.hboehm.info/gc/faq.html).** -**The following provide information on garbage collection in general**: Paul -Wilson's [garbage collection ftp archive](ftp://ftp.cs.utexas.edu/pub/garbage) +**The following provide information on garbage collection in general:** + +Paul Wilson's +[garbage collection ftp archive](ftp://ftp.cs.utexas.edu/pub/garbage) and [GC survey](ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps). The Ravenbrook @@ -124,7 +130,7 @@ Richard Jones' and his [book](http://www.cs.kent.ac.uk/people/staff/rej/gcbook/gcbook.html). **The following papers describe the collector algorithms we use and the -underlying design decisions at a higher level.** +underlying design decisions at a higher level:** (Some of the lower level details can be found [here](gcdescr.md).) @@ -181,7 +187,7 @@ version. Includes a discussion of a collector facility to much more reliably test for the potential of unbounded heap growth. **The following papers discuss language and compiler restrictions necessary -to guaranteed safety of conservative garbage collection.** +to guarantee safety of conservative garbage collection:** We thank John Levine and JCLT for allowing us to make the second paper available electronically, and providing PostScript for the final version. diff --git a/doc/porting.md b/doc/porting.md index e71346d0..de7e5036 100644 --- a/doc/porting.md +++ b/doc/porting.md @@ -6,8 +6,7 @@ as scanning the stack(s), that are not possible in portable C code. All of the following assumes that the collector is being ported to a byte-addressable 32- or 64-bit machine. Currently all successful ports -to 64-bit machines involve LP64 targets. The code base includes some -provisions for P64 targets (notably Win64), but that has not been tested. You +to 64-bit machines involve LP64 and LLP64 targets (notably Win64). You are hereby discouraged from attempting a port to non-byte-addressable, or 8-bit, or 16-bit machines. 
@@ -96,7 +95,7 @@ operating system: is found.
This often works on Posix-like platforms. It makes it harder to debug client programs, since startup involves generating and catching a segmentation fault, which tends to confuse users. - * `DATAEND` - Set to the end of the main data segment. Defaults to `end`, + * `DATAEND` - Set to the end of the main data segment. Defaults to `_end`, where that is declared as an array. This works in some cases, since the linker introduces a suitable symbol. * `DATASTART2`, `DATAEND2` - Some platforms have two discontiguous main data @@ -131,11 +130,12 @@ operating system: plausible page boundary, and use that as the stack base. * `DYNAMIC_LOADING` - Should be defined if `dyn_load.c` has been updated for this platform and tracing of dynamic library roots is supported. - * `MPROTECT_VDB`, `PROC_VDB` - May be defined if the corresponding - _virtual dirty bit_ implementation in `os_dep.c` is usable on this platform. - This allows incremental/generational garbage collection. `MPROTECT_VDB` - identifies modified pages by write protecting the heap and catching faults. - `PROC_VDB` uses the /proc primitives to read dirty bits. + * `GWW_VDB`, `MPROTECT_VDB`, `PROC_VDB` - May be defined if the + corresponding _virtual dirty bit_ implementation in `os_dep.c` is usable on + this platform. This allows incremental/generational garbage collection. + (`GWW_VDB` uses the Win32 `GetWriteWatch` function to read dirty bits; + `MPROTECT_VDB` identifies modified pages by write protecting the heap and + catching faults; `PROC_VDB` uses the /proc primitives to read dirty bits.) * `PREFETCH`, `GC_PREFETCH_FOR_WRITE` - The collector uses `PREFETCH(x)` to preload the cache with the data at _x_ address. This defaults to a no-op. * `CLEAR_DOUBLE` - If `CLEAR_DOUBLE` is defined, then `CLEAR_DOUBLE(x)` @@ -209,7 +209,7 @@ stopped with signals. In this case, the changes involve: workarounds are common. Non-preemptive threads packages will probably require further work. 
Similarly thread-local allocation and parallel marking requires further work in `pthread_support.c`, and may require better
Similarly thread-local allocation and parallel marking requires further work in `pthread_support.c`, and may require better - `atomic_ops` support. + `atomic_ops` support for the designed platform. ## Dynamic library support diff --git a/doc/scale.md b/doc/scale.md index c98edd9d..855e04eb 100644 --- a/doc/scale.md +++ b/doc/scale.md @@ -1,12 +1,14 @@ # Garbage collector scalability -In its default configuration, the Boehm-Demers-Weiser garbage collector is not -thread-safe. It can be made thread-safe for a number of environments -by building the collector with `-DGC_THREADS` compilation flag. This has -primarily two effects: +If Makefile.direct is used, in its default configuration the +Boehm-Demers-Weiser garbage collector is not thread-safe. Generally, it can be +made thread-safe by building the collector with `-DGC_THREADS` compilation +flag. This has primarily the following effects: 1. It causes the garbage collector to stop all other threads when it needs - to see a consistent memory state. + to see a consistent memory state. It intercepts thread creation and + termination events to maintain a list of client threads to be stopped when + needed. 2. It causes the collector to acquire a lock around essentially all allocation and garbage collection activity. Since a single lock is used for all allocation-related activity, only one thread can be allocating @@ -16,9 +18,9 @@ primarily two effects: On most platforms, the allocator/collector lock is implemented as a spin lock with exponential back-off. Longer wait times are implemented by yielding and/or sleeping. If a collection is in progress, the pure spinning stage -is skipped. This has the advantage that uncontested and thus most uniprocessor -lock acquisitions are very cheap. It has the disadvantage that the application -may sleep for small periods of time even when there is work to be done. And +is skipped. This has the uncontested advantage that most uniprocessor lock +acquisitions are very cheap. 
It has the disadvantage that the application may +sleep for small periods of time even when there is work to be done. And threads may be unnecessarily woken up for short periods. Nonetheless, this scheme empirically outperforms native queue-based mutual exclusion implementations in most cases, sometimes drastically so. @@ -31,18 +33,18 @@ to Makefile.direct again.) * Building the collector with `-DPARALLEL_MARK` allows the collector to run the mark phase in parallel in multiple threads, and thus on multiple - processors. The mark phase typically consumes the large majority of the - collection time. Thus this largely parallelizes the garbage collector - itself, though not the allocation process. Currently the marking + processors (or processor cores). The mark phase typically consumes the large + majority of the collection time. Thus, this largely parallelizes the garbage + collector itself, though not the allocation process. Currently the marking is performed by the thread that triggered the collection, together with - _N_ - 1 dedicated threads, where _N_ is the number of processors detected - by the collector. The dedicated threads are created once at initialization - time. A second effect of this flag is to switch to a more concurrent - implementation of `GC_malloc_many`, so that free lists can be built, and - memory can be cleared, by more than one thread concurrently. + _N_ - 1 dedicated threads, where _N_ is the number of processors (cores) + detected by the collector. The dedicated marker threads are created once at + initialization time. Another effect of this flag is to switch to a more + concurrent implementation of `GC_malloc_many`, so that free lists can be + built and memory can be cleared by more than one thread concurrently. * Building the collector with `-DTHREAD_LOCAL_ALLOC` adds support for - thread-local allocation. This causes `GC_malloc`, `GC_malloc_atomic`, and - `GC_gcj_malloc` to be redefined to perform thread-local allocation. 
+ thread-local allocation. This causes `GC_malloc` (actually `GC_malloc_kind`) + and `GC_gcj_malloc` to be redefined to perform thread-local allocation. Memory returned from thread-local allocators is completely interchangeable with that returned by the standard allocators. It may be used by other @@ -55,7 +57,7 @@ An important side effect of this flag is to replace the default spin-then-sleep lock to be replaced by a spin-then-queue based implementation. This _reduces performance_ for the standard allocation functions, though it usually improves performance when thread-local allocation is used heavily, -and thus the number of short-duration lock acquisitions is greatly reduced. +and, thus, the number of short-duration lock acquisitions is greatly reduced. ## The Parallel Marking Algorithm @@ -93,8 +95,9 @@ allocation and incremental collection. They should work correctly with one or the other, but not both. The number of marker threads is set on startup to the number of available -processors (or to the value of the `GC_NPROCS` environment variable). If only -a single processor is detected, parallel marking is disabled. +processor cores (or to the value of either the `GC_MARKERS` or `GC_NPROCS` +environment variable, if provided). If only a single processor is detected, +parallel marking is disabled. Note that setting `GC_NPROCS` to 1 also causes some lock acquisitions inside the collector to immediately yield the processor instead of busy waiting @@ -117,7 +120,7 @@ the simple thread-safe collector, built with `-DGC_THREADS`, the execution time increased to 10.3 seconds, or 23.5 elapsed seconds with two clients. (The times for the `malloc`/`free` version with glibc `malloc` are 10.51 (standard library, pthreads not linked), 20.90 (one thread, pthreads linked), and 24.55 -seconds respectively. The benchmark favors a garbage collector, since most +seconds, respectively. The benchmark favors a garbage collector, since most objects are small.)
The following table gives execution times for the collector built with @@ -161,7 +164,7 @@ processor as 2 clients on 2 processors) is probably not achievable on this kind of hardware even with such a small number of processors, since the memory system is a major constraint for the garbage collector, the processors usually share a single memory bus, and thus the aggregate memory bandwidth does not -increase in proportion to the number of processors. +increase in proportion to the number of processors (cores). These results are likely to be very sensitive to both hardware and OS issues. Preliminary experiments with an older Pentium Pro machine running an older -- 2.40.0
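The `doc/scale.md` changes above describe how the allocator/collector lock behaves. It first spins with exponential back-off and then handles longer waits by yielding and sleeping. While a collection is in progress, it skips the pure spinning stage. The short C11 sketch below illustrates that sequence. It is not the collector's own lock implementation. The type `backoff_lock`, the functions `backoff_trylock`, `backoff_lock_acquire` and `backoff_lock_release`, and all tuning constants are hypothetical. The sketch also assumes a POSIX system that provides `sched_yield` and `nanosleep`.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <sched.h>      /* sched_yield() */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>       /* nanosleep(), struct timespec */

/* Hypothetical spin-then-sleep lock, for illustration only; it is not the
   collector's own lock code. 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} backoff_lock;

#define BACKOFF_LOCK_INIT { 0 }

/* Arbitrary illustrative tuning constants (assumed, not the collector's). */
enum { MAX_SPIN_ROUNDS = 10, MAX_YIELDS = 10 };
#define MIN_SLEEP_NS 1000L      /* 1 microsecond */
#define MAX_SLEEP_NS 1000000L   /* sleeps stop doubling at about 1 ms */

/* Busy-wait for about n iterations without touching the lock word. */
static void spin_pause(unsigned n) {
    for (volatile unsigned i = 0; i < n; ++i) {
        /* a real implementation would issue a CPU pause hint here */
    }
}

/* Test-and-test-and-set: read first so waiters do not keep writing the
   shared cache line; exchange only when the lock looks free. */
static bool backoff_trylock(backoff_lock *l) {
    return atomic_load_explicit(&l->held, memory_order_relaxed) == 0
        && atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* skip_spin models "if a collection is in progress, the pure spinning
   stage is skipped". */
static void backoff_lock_acquire(backoff_lock *l, bool skip_spin) {
    if (backoff_trylock(l))
        return;                               /* uncontested fast path */
    if (!skip_spin) {
        unsigned pause = 1;
        for (int round = 0; round < MAX_SPIN_ROUNDS; ++round) {
            spin_pause(pause);
            if (backoff_trylock(l))
                return;
            pause *= 2;                       /* exponential back-off */
        }
    }
    for (int i = 0; i < MAX_YIELDS; ++i) {    /* longer waits: yield... */
        sched_yield();
        if (backoff_trylock(l))
            return;
    }
    struct timespec ts = { .tv_sec = 0, .tv_nsec = MIN_SLEEP_NS };
    while (!backoff_trylock(l)) {             /* ...then sleep, doubling */
        nanosleep(&ts, NULL);
        if (ts.tv_nsec < MAX_SLEEP_NS)
            ts.tv_nsec *= 2;
    }
}

static void backoff_lock_release(backoff_lock *l) {
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

In this sketch an uncontested acquisition costs one load and one atomic exchange, which is the cheap case the text refers to. The final sleeping stage shows how a waiter can stay asleep for a short time even after the lock has been released.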