* README.QUICK: Replace gcinterface.html with gcinterface.md.
* doc/doc.am (dist_doc_DATA): Likewise.
* doc/gc.man: Likewise.
* doc/overview.html: Likewise.
* doc/doc.am (dist_doc_DATA): Replace gcdescr.html with gcdescr.md.
* doc/overview.html: Likewise.
* doc/gcdescr.html: Change file suffix to .md; convert text format
from HTML to Markdown.
* doc/gcinterface.html: Likewise.
Define GC_DEBUG before including gc.h for additional checking.
More documentation on the collector interface can be found in README.md,
-doc/gcinterface.html, include/gc.h, and other files in the doc directory.
+doc/gcinterface.md, include/gc.h, and other files in the doc directory.
WARNINGS:
doc/debugging.md \
doc/finalization.md \
doc/gc.man \
- doc/gcdescr.html \
- doc/gcinterface.html \
+ doc/gcdescr.md \
+ doc/gcinterface.md \
doc/leak.md \
doc/overview.html \
doc/porting.md \
including gc.h.
.LP
See the documentation in the include files gc_cpp.h and gc_allocator.h,
-as well as the gcinterface.html file in the distribution,
+as well as the gcinterface.md file in the distribution,
for an alternate, C++ specific interface to the garbage collector.
Note that C++ programs generally
need to be careful to ensure that all allocated memory (whether via new,
+++ /dev/null
-<HTML>
-<HEAD>
- <TITLE> Conservative GC Algorithmic Overview </TITLE>
- <AUTHOR> Hans-J. Boehm, HP Labs (Some of this was written at SGI)</author>
-</HEAD>
-<BODY>
-<H1> <I>This is under construction, and may always be.</i> </h1>
-<H1> Conservative GC Algorithmic Overview </h1>
-<P>
-This is a description of the algorithms and data structures used in our
-conservative garbage collector. I expect the level of detail to increase
-with time. For a survey of GC algorithms, see for example
-<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
-excellent paper</a>. For an overview of the collector interface,
-see <A HREF="gcinterface.html">here</a>.
-<P>
-This description is targeted primarily at someone trying to understand the
-source code. It specifically refers to variable and function names.
-It may also be useful for understanding the algorithms at a higher level.
-<P>
-The description here assumes that the collector is used in default mode.
-In particular, we assume that it used as a garbage collector, and not just
-a leak detector. We initially assume that it is used in stop-the-world,
-non-incremental mode, though the presence of the incremental collector
-will be apparent in the design.
-We assume the default finalization model, but the code affected by that
-is very localized.
-<H2> Introduction </h2>
-The garbage collector uses a modified mark-sweep algorithm. Conceptually
-it operates roughly in four phases, which are performed occasionally
-as part of a memory allocation:
-
-<OL>
-
-<LI>
-<I>Preparation</i> Each object has an associated mark bit.
-Clear all mark bits, indicating that all objects
-are potentially unreachable.
-
-<LI>
-<I>Mark phase</i> Marks all objects that can be reachable via chains of
-pointers from variables. Often the collector has no real information
-about the location of pointer variables in the heap, so it
-views all static data areas, stacks and registers as potentially containing
-pointers. Any bit patterns that represent addresses inside
-heap objects managed by the collector are viewed as pointers.
-Unless the client program has made heap object layout information
-available to the collector, any heap objects found to be reachable from
-variables are again scanned similarly.
-
-<LI>
-<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
-objects, and returns them to an appropriate free list for reuse. This is
-not really a separate phase; even in non incremental mode this is operation
-is usually performed on demand during an allocation that discovers an empty
-free list. Thus the sweep phase is very unlikely to touch a page that
-would not have been touched shortly thereafter anyway.
-
-<LI>
-<I>Finalization phase</i> Unreachable objects which had been registered
-for finalization are enqueued for finalization outside the collector.
-
-</ol>
-
-<P>
-The remaining sections describe the memory allocation data structures,
-and then the last 3 collection phases in more detail. We conclude by
-outlining some of the additional features implemented in the collector.
-
-<H2>Allocation</h2>
-The collector includes its own memory allocator. The allocator obtains
-memory from the system in a platform-dependent way. Under UNIX, it
-uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
-<P>
-Most static data used by the allocator, as well as that needed by the
-rest of the garbage collector is stored inside the
-<TT>_GC_arrays</tt> structure.
-This allows the garbage collector to easily ignore the collectors own
-data structures when it searches for root pointers. Other allocator
-and collector internal data structures are allocated dynamically
-with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
-allow for deallocation, and is therefore used only for permanent data
-structures.
-<P>
-The allocator allocates objects of different <I>kinds</i>.
-Different kinds are handled somewhat differently by certain parts
-of the garbage collector. Certain kinds are scanned for pointers,
-others are not. Some may have per-object type descriptors that
-determine pointer locations. Or a specific kind may correspond
-to one specific object layout. Two built-in kinds are uncollectible.
-One (<TT>STUBBORN</tt>) is immutable without special precautions.
-In spite of that, it is very likely that most C clients of the
-collector currently
-use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
-The <A HREF="https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcj/">GCJ</a> runtime
-also makes heavy use of a kind (allocated with GC_gcj_malloc) that stores
-type information at a known offset in method tables.
-<P>
-The collector uses a two level allocator. A large block is defined to
-be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
-typically on the order of the page size.
-<P>
-Large block sizes are rounded up to
-the next multiple of <TT>HBLKSIZE</tt> and then allocated by
-<TT>GC_allochblk</tt>. Recent versions of the collector
-use an approximate best fit algorithm by keeping free lists for
-several large block sizes.
-The actual
-implementation of <TT>GC_allochblk</tt>
-is significantly complicated by black-listing issues
-(see below).
-<P>
-Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
-Each chunk is
-dedicated to only one object size and kind.
-<P>
-The allocator maintains
-separate free lists for each size and kind of object.
-Associated with each kind is an array of free list pointers,
-with entry <TT>freelist[</tt><I>i</i><TT>]</tt> pointing to
-a free list of size <I>i</i> objects.
-In recent versions of the
-collector, index <TT>i</tt> is expressed in granules, which are the
-minimum allocatable unit, typically 8 or 16 bytes.
-The free lists themselves are
-linked through the first word in each object (see <TT>obj_link()</tt>
-macro).
-<P>
-Once a large block is split for use in smaller objects, it can only
-be used for objects of that size, unless the collector discovers a completely
-empty chunk. Completely empty chunks are restored to the appropriate
-large block free list.
-<P>
-In order to avoid allocating blocks for too many distinct object sizes,
-the collector normally does not directly allocate objects of every possible
-request size. Instead request are rounded up to one of a smaller number
-of allocated sizes, for which free lists are maintained. The exact
-allocated sizes are computed on demand, but subject to the constraint
-that they increase roughly in geometric progression. Thus objects
-requested early in the execution are likely to be allocated with exactly
-the requested size, subject to alignment constraints.
-See <TT>GC_init_size_map</tt> for details.
-<P>
-The actual size rounding operation during small object allocation is
-implemented as a table lookup in <TT>GC_size_map</tt> which maps
-a requested allocation size in bytes to a number of granules.
-<P>
-Both collector initialization and computation of allocated sizes are
-handled carefully so that they do not slow down the small object fast
-allocation path. An attempt to allocate before the collector is initialized,
-or before the appropriate <TT>GC_size_map</tt> entry is computed,
-will take the same path as an allocation attempt with an empty free list.
-This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
-which performs the appropriate initialization checks.
-<P>
-In non-incremental mode, we make a decision about whether to garbage collect
-whenever an allocation would otherwise have failed with the current heap size.
-If the total amount of allocation since the last collection is less than
-the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
-expand the heap. Otherwise, we initiate a garbage collection. This ensures
-that the amount of garbage collection work per allocated byte remains
-constant.
-<P>
-The above is in fact an oversimplification of the real heap expansion
-and GC triggering heuristic, which adjusts slightly for root size
-and certain kinds of
-fragmentation. In particular:
-<UL>
-<LI> Programs with a large root set size and
-little live heap memory will expand the heap to amortize the cost of
-scanning the roots.
-<LI> Versions 5.x of the collector actually collect more frequently in
-nonincremental mode. The large block allocator usually refuses to split
-large heap blocks once the garbage collection threshold is
-reached. This often has the effect of collecting well before the
-heap fills up, thus reducing fragmentation and working set size at the
-expense of GC time. Versions 6.x choose an intermediate strategy depending
-on how much large object allocation has taken place in the past.
-(If the collector is configured to unmap unused pages, versions 6.x
-use the 5.x strategy.)
-<LI> In calculating the amount of allocation since the last collection we
-give partial credit for objects we expect to be explicitly deallocated.
-Even if all objects are explicitly managed, it is often desirable to collect
-on rare occasion, since that is our only mechanism for coalescing completely
-empty chunks.
-</ul>
-<P>
-It has been suggested that this should be adjusted so that we favor
-expansion if the resulting heap still fits into physical memory.
-In many cases, that would no doubt help. But it is tricky to do this
-in a way that remains robust if multiple application are contending
-for a single pool of physical memory.
-
-<H2>Mark phase</h2>
-
-At each collection, the collector marks all objects that are
-possibly reachable from pointer variables. Since it cannot generally
-tell where pointer variables are located, it scans the following
-<I>root segments</i> for pointers:
-<UL>
-<LI>The registers. Depending on the architecture, this may be done using
-assembly code, or by calling a <TT>setjmp</tt>-like function which saves
-register contents on the stack.
-<LI>The stack(s). In the case of a single-threaded application,
-on most platforms this
-is done by scanning the memory between (an approximation of) the current
-stack pointer and <TT>GC_stackbottom</tt>. (For Itanium, the register stack
-scanned separately.) The <TT>GC_stackbottom</tt> variable is set in
-a highly platform-specific way depending on the appropriate configuration
-information in <TT>gcconfig.h</tt>. Note that the currently active
-stack needs to be scanned carefully, since callee-save registers of
-client code may appear inside collector stack frames, which may
-change during the mark process. This is addressed by scanning
-some sections of the stack "eagerly", effectively capturing a snapshot
-at one point in time.
-<LI>Static data region(s). In the simplest case, this is the region
-between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
-<TT>gcconfig.h</tt>. However, in most cases, this will also involve
-static data regions associated with dynamic libraries. These are
-identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
-</ul>
-The marker maintains an explicit stack of memory regions that are known
-to be accessible, but that have not yet been searched for contained pointers.
-Each stack entry contains the starting address of the block to be scanned,
-as well as a descriptor of the block. If no layout information is
-available for the block, then the descriptor is simply a length.
-(For other possibilities, see <TT>gc_mark.h</tt>.)
-<P>
-At the beginning of the mark phase, all root segments
-(as described above) are pushed on the
-stack by <TT>GC_push_roots</tt>. (Registers and eagerly processed
-stack sections are processed by pushing the referenced objects instead
-of the stack section itself.) If <TT>ALL_INTERIOR_POINTERS</tt> is not
-defined, then stack roots require special treatment. In this case, the
-normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
-explicitly checks for interior pointers and pushes descriptors for target
-objects.
-<P>
-The marker is structured to allow incremental marking.
-Each call to <TT>GC_mark_some</tt> performs a small amount of
-work towards marking the heap.
-It maintains
-explicit state in the form of <TT>GC_mark_state</tt>, which
-identifies a particular sub-phase. Some other pieces of state, most
-notably the mark stack, identify how much work remains to be done
-in each sub-phase. The normal progression of mark states for
-a stop-the-world collection is:
-<OL>
-<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
-objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
-be false, so the mark state is advanced to
-<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
-uncollectible objects, roots, and then mark everything reachable from them.
-<TT>Scan_ptr</tt> is advanced through the heap until all uncollectible
-objects are pushed, and objects reachable from them are marked.
-At that point, the next call to <TT>GC_mark_some</tt> calls
-<TT>GC_push_roots</tt> to push the roots. It the advances the
-mark state to
-<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
-empty, all reachable objects are marked. Once in this state, we work
-only on emptying the mark stack. Once this is completed, the state
-changes to
-<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
-</ol>
-
-The core mark routine <TT>GC_mark_from</tt>, is called
-repeatedly by several of the sub-phases when the mark stack starts to fill
-up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
-to empty the mark stack.
-The routine is designed to only perform a limited amount of marking at
-each call, so that it can also be used by the incremental collector.
-It is fairly carefully tuned, since it usually consumes a large majority
-of the garbage collection time.
-<P>
-The fact that it performs only a small amount of work per call also
-allows it to be used as the core routine of the parallel marker. In that
-case it is normally invoked on thread-private mark stacks instead of the
-global mark stack. More details can be found <A HREF="scale.md">here</a>.
-<P>
-The marker correctly handles mark stack overflows. Whenever the mark stack
-overflows, the mark state is reset to <TT>MS_INVALID</tt>.
-Since there are already marked objects in the heap,
-this eventually forces a complete
-scan of the heap, searching for pointers, during which any unmarked objects
-referenced by marked objects are again pushed on the mark stack. This
-process is repeated until the mark phase completes without a stack overflow.
-Each time the stack overflows, an attempt is made to grow the mark stack.
-All pieces of the collector that push regions onto the mark stack have to be
-careful to ensure forward progress, even in case of repeated mark stack
-overflows. Every mark attempt results in additional marked objects.
-<P>
-Each mark stack entry is processed by examining all candidate pointers
-in the range described by the entry. If the region has no associated
-type information, then this typically requires that each 4-byte aligned
-quantity (8-byte aligned with 64-bit pointers) be considered a candidate
-pointer.
-<P>
-We determine whether a candidate pointer is actually the address of
-a heap block. This is done in the following steps:
-<NL>
-<LI> The candidate pointer is checked against rough heap bounds.
-These heap bounds are maintained such that all actual heap objects
-fall between them. In order to facilitate black-listing (see below)
-we also include address regions that the heap is likely to expand into.
-Most non-pointers fail this initial test.
-<LI> The candidate pointer is divided into two pieces; the most significant
-bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
-the least significant bits specify an offset within that page.
-(A hardware page may actually consist of multiple such pages.
-HBLKSIZE is usually the page size divided by a small power of two.)
-<LI>
-The page address part of the candidate pointer is looked up in a
-<A HREF="tree.html">table</a>.
-Each table entry contains either 0, indicating that the page is not part
-of the garbage collected heap, a small integer <I>n</i>, indicating
-that the page is part of large object, starting at least <I>n</i> pages
-back, or a pointer to a descriptor for the page. In the first case,
-the candidate pointer i not a true pointer and can be safely ignored.
-In the last two cases, we can obtain a descriptor for the page containing
-the beginning of the object.
-<LI>
-The starting address of the referenced object is computed.
-The page descriptor contains the size of the object(s)
-in that page, the object kind, and the necessary mark bits for those
-objects. The size information can be used to map the candidate pointer
-to the object starting address. To accelerate this process, the page header
-also contains a pointer to a precomputed map of page offsets to displacements
-from the beginning of an object. The use of this map avoids a
-potentially slow integer remainder operation in computing the object
-start address.
-<LI>
-The mark bit for the target object is checked and set. If the object
-was previously unmarked, the object is pushed on the mark stack.
-The descriptor is read from the page descriptor. (This is computed
-from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
-</nl>
-<P>
-At the end of the mark phase, mark bits for left-over free lists are cleared,
-in case a free list was accidentally marked due to a stray pointer.
-
-<H2>Sweep phase</h2>
-
-At the end of the mark phase, all blocks in the heap are examined.
-Unmarked large objects are immediately returned to the large object free list.
-Each small object page is checked to see if all mark bits are clear.
-If so, the entire page is returned to the large object free list.
-Small object pages containing some reachable object are queued for later
-sweeping, unless we determine that the page contains very little free
-space, in which case it is not examined further.
-<P>
-This initial sweep pass touches only block headers, not
-the blocks themselves. Thus it does not require significant paging, even
-if large sections of the heap are not in physical memory.
-<P>
-Nonempty small object pages are swept when an allocation attempt
-encounters an empty free list for that object size and kind.
-Pages for the correct size and kind are repeatedly swept until at
-least one empty block is found. Sweeping such a page involves
-scanning the mark bit array in the page header, and building a free
-list linked through the first words in the objects themselves.
-This does involve touching the appropriate data page, but in most cases
-it will be touched only just before it is used for allocation.
-Hence any paging is essentially unavoidable.
-<P>
-Except in the case of pointer-free objects, we maintain the invariant
-that any object in a small object free list is cleared (except possibly
-for the link field). Thus it becomes the burden of the small object
-sweep routine to clear objects. This has the advantage that we can
-easily recover from accidentally marking a free list, though that could
-also be handled by other means. The collector currently spends a fair
-amount of time clearing objects, and this approach should probably be
-revisited.
-<P>
-In most configurations, we use specialized sweep routines to handle common
-small object sizes. Since we allocate one mark bit per word, it becomes
-easier to examine the relevant mark bits if the object size divides
-the word length evenly. We also suitably unroll the inner sweep loop
-in each case. (It is conceivable that profile-based procedure cloning
-in the compiler could make this unnecessary and counterproductive. I
-know of no existing compiler to which this applies.)
-<P>
-The sweeping of small object pages could be avoided completely at the expense
-of examining mark bits directly in the allocator. This would probably
-be more expensive, since each allocation call would have to reload
-a large amount of state (e.g. next object address to be swept, position
-in mark bit table) before it could do its work. The current scheme
-keeps the allocator simple and allows useful optimizations in the sweeper.
-
-<H2>Finalization</h2>
-Both <TT>GC_register_disappearing_link</tt> and
-<TT>GC_register_finalizer</tt> add the request to a corresponding hash
-table. The hash table is allocated out of collected memory, but
-the reference to the finalizable object is hidden from the collector.
-Currently finalization requests are processed non-incrementally at the
-end of a mark cycle.
-<P>
-The collector makes an initial pass over the table of finalizable objects,
-pushing the contents of unmarked objects onto the mark stack.
-After pushing each object, the marker is invoked to mark all objects
-reachable from it. The object itself is not explicitly marked.
-This assures that objects on which a finalizer depends are neither
-collected nor finalized.
-<P>
-If in the process of marking from an object the
-object itself becomes marked, we have uncovered
-a cycle involving the object. This usually results in a warning from the
-collector. Such objects are not finalized, since it may be
-unsafe to do so. See the more detailed
-<A HREF="finalization.md"> discussion of finalization semantics</a>.
-<P>
-Any objects remaining unmarked at the end of this process are added to
-a queue of objects whose finalizers can be run. Depending on collector
-configuration, finalizers are dequeued and run either implicitly during
-allocation calls, or explicitly in response to a user request.
-(Note that the former is unfortunately both the default and not generally safe.
-If finalizers perform synchronization, it may result in deadlocks.
-Nontrivial finalizers generally need to perform synchronization, and
-thus require a different collector configuration.)
-<P>
-The collector provides a mechanism for replacing the procedure that is
-used to mark through objects. This is used both to provide support for
-Java-style unordered finalization, and to ignore certain kinds of cycles,
-<I>e.g.</i> those arising from C++ implementations of virtual inheritance.
-
-<H2>Generational Collection and Dirty Bits</h2>
-We basically use the concurrent and generational GC algorithm described in
-<A HREF="http://www.hboehm.info/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
-by Boehm, Demers, and Shenker.
-<P>
-The most significant modification is that
-the collector always starts running in the allocating thread.
-There is no separate garbage collector thread. (If parallel GC is
-enabled, helper threads may also be woken up.)
-If an allocation attempt either requests a large object, or encounters
-an empty small object free list, and notices that there is a collection
-in progress, it immediately performs a small amount of marking work
-as described above.
-<P>
-This change was made both because we wanted to easily accommodate
-single-threaded environments, and because a separate GC thread requires
-very careful control over the scheduler to prevent the mutator from
-out-running the collector, and hence provoking unneeded heap growth.
-<P>
-In incremental mode, the heap is always expanded when we encounter
-insufficient space for an allocation. Garbage collection is triggered
-whenever we notice that more than
-<TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
-bytes of allocation have taken place.
-After <TT>GC_full_freq</tt> minor collections a major collection
-is started.
-<P>
-All collections initially run uninterrupted until a predetermined
-amount of time (50 msecs by default) has expired. If this allows
-the collection to complete entirely, we can avoid correcting
-for data structure modifications during the collection. If it does
-not complete, we return control to the mutator, and perform small
-amounts of additional GC work during those later allocations that
-cannot be satisfied from small object free lists. When marking completes,
-the set of modified pages is retrieved, and we mark once again from
-marked objects on those pages, this time with the mutator stopped.
-<P>
-We keep track of modified pages using one of several distinct mechanisms:
-<OL>
-<LI>
-Through explicit mutator cooperation. Currently this requires
-the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
-<LI>
-(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
-catching write faults. This is
-implemented for many Unix-like systems and for win32. It is not possible
-in a few environments.
-<LI>
-(<TT>GWW_VDB</tt>) By using the Win32 GetWriteWatch function to read dirty
-bits.
-<LI>
-(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
-(Currently only Sun's
-Solaris supports this. Though this is considerably cleaner, performance
-may actually be better with mprotect and signals.)
-<LI>
-(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
-case the one in Xerox PCR.
-<LI>
-(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
-none of the other techniques is known to be usable, and
-<TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
-the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
-</ol>
-
-<H2>Black-listing</h2>
-
-The collector implements <I>black-listing</i> of pages, as described
-in
-<A HREF="http://dl.acm.org/citation.cfm?doid=155090.155109">
-Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
-<A HREF="https://www.cs.rice.edu/~javaplt/311/Readings/pldi93.pdf">here</a>.
-<P>
-During the mark phase, the collector tracks ``near misses'', i.e. attempts
-to follow a ``pointer'' to just outside the garbage-collected heap, or
-to a currently unallocated page inside the heap. Pages that have been
-the targets of such near misses are likely to be the targets of
-misidentified ``pointers'' in the future. To minimize the future
-damage caused by such misidentification, they will be allocated only to
-small pointer-free objects.
-<P>
-The collector understands two different kinds of black-listing. A
-page may be black listed for interior pointer references
-(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
-miss from a location that requires interior pointer recognition,
-<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
-is set. In this case, we also avoid allocating large blocks that include
-this page.
-<P>
-If the near miss came from a source that did not require interior
-pointer recognition, it is black-listed with
-<TT>GC_add_to_black_list_normal</tt>.
-A page black-listed in this way may appear inside a large object,
-so long as it is not the first page of a large object.
-<P>
-The <TT>GC_allochblk</tt> routine respects black-listing when assigning
-a block to a particular object kind and size. It occasionally
-drops (i.e. allocates and forgets) blocks that are completely black-listed
-in order to avoid excessively long large block free lists containing
-only unusable blocks. This would otherwise become an issue
-if there is low demand for small pointer-free objects.
-
-<H2>Thread support</h2>
-We support several different threading models. Unfortunately Pthreads,
-the only reasonably well standardized thread model, supports too narrow
-an interface for conservative garbage collection. There appears to be
-no completely portable way to allow the collector
-to coexist with various Pthreads
-implementations. Hence we currently support only the more
-common Pthreads implementations.
-<P>
-In particular, it is very difficult for the collector to stop all other
-threads in the system and examine the register contents. This is currently
-accomplished with very different mechanisms for some Pthreads
-implementations. For Linux/HPUX/OSF1, Solaris and Irix it sends signals to
-individual Pthreads and has them wait in the signal handler.
-<P>
-The Linux and Irix implementations use
-only documented Pthreads calls, but rely on extensions to their semantics.
-The Linux implementation <TT>pthread_stop_world.c</tt> relies on only very
-mild extensions to the pthreads semantics, and already supports a large number
-of other Unix-like pthreads implementations. Our goal is to make this the
-only pthread support in the collector.
-<P>
-All implementations must
-intercept thread creation and a few other thread-specific calls to allow
-enumeration of threads and location of thread stacks. This is current
-accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
-(really <TT>gc_pthread_redirects.h</tt>), or optionally
-by using ld's function call wrapping mechanism under Linux.
-<P>
-Recent versions of the collector support several facilities to enhance
-the processor-scalability and thread performance of the collector.
-These are discussed in more detail <A HREF="scale.md">here</a>.
-We briefly outline the data approach to thread-local allocation in the
-next section.
-<H2>Thread-local allocation</h2>
-If thread-local allocation is enabled, the collector keeps separate
-arrays of free lists for each thread. Thread-local allocation
-is currently only supported on a few platforms.
-<P>
-The free list arrays associated
-with each thread are only used to satisfy requests for objects that
-are both very small, and belong to one of a small number of well-known
-kinds. These currently include "normal" and pointer-free objects.
-Depending on the configuration, "gcj" objects may also be included.
-<P>
-Thread-local free list entries contain either a pointer to the first
-element of a free list, or they contain a counter of the number of
-allocation granules, corresponding to objects of this size,
-allocated so far. Initially they contain the
-value one, i.e. a small counter value.
-<P>
-Thread-local allocation allocates directly through the global
-allocator, if the object is of a size or kind not covered by the
-local free lists.
-<P>
-If there is an appropriate local free list, the allocator checks whether it
-contains a sufficiently small counter value. If so, the counter is simply
-incremented by the counter value, and the global allocator is used.
-In this way, the initial few allocations of a given size bypass the local
-allocator. A thread that only allocates a handful of objects of a given
-size will not build up its own free list for that size. This avoids
-wasting space for unpopular objects sizes or kinds.
-<P>
-Once the counter passes a threshold, <TT>GC_malloc_many</tt> is called
-to allocate roughly <TT>HBLKSIZE</tt> space and put it on the corresponding
-local free list. Further allocations of that size and kind then use
-this free list, and no longer need to acquire the allocation lock.
-The allocation procedure is otherwise similar to the global free lists.
-The local free lists are also linked using the first word in the object.
-In most cases this means they require considerably less time.
-<P>
-Local free lists are treated buy most of the rest of the collector
-as though they were in-use reachable data. This requires some care,
-since pointer-free objects are not normally traced, and hence a special
-tracing procedure is required to mark all objects on pointer-free and
-gcj local free lists.
-<P>
-On thread exit, any remaining thread-local free list entries are
-transferred back to the global free list.
-<P>
-Note that if the collector is configured for thread-local allocation,
-GC versions before 7 do not invoke the thread-local allocator by default.
-<TT>GC_malloc</tt> only uses thread-local allocation in version 7 and later.
-<P>
-For some more details see <A HREF="scale.md">here</a>, and the
-technical report entitled
-<A HREF="http://www.hpl.hp.com/techreports/2000/HPL-2000-165.html">
-"Fast Multiprocessor Memory Allocation and Garbage Collection"</a>
-</body>
-</html>
--- /dev/null
+# Conservative GC Algorithmic Overview
+
+This is a description of the algorithms and data structures used in our
+conservative garbage collector. I expect the level of detail to increase with
+time. For a survey of GC algorithms, see, for example, Paul Wilson's
+excellent paper
+["Uniprocessor Garbage Collection Techniques"](ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps).
+For an overview of the collector interface, see [here](gcinterface.md).
+
+This description is targeted primarily at someone trying to understand the
+source code. It specifically refers to variable and function names. It may
+also be useful for understanding the algorithms at a higher level.
+
+The description here assumes that the collector is used in default mode.
+In particular, we assume that it is used as a garbage collector, and not just
+a leak detector. We initially assume that it is used in stop-the-world,
+non-incremental mode, though the presence of the incremental collector will
+be apparent in the design. We assume the default finalization model, but the
+code affected by that is very localized.
+
+## Introduction
+
+The garbage collector uses a modified mark-sweep algorithm. Conceptually
+it operates roughly in four phases, which are performed occasionally as part
+of a memory allocation:
+
+ 1. _Preparation_ Each object has an associated mark bit. Clear all mark
+ bits, indicating that all objects are potentially unreachable.
+ 2. _Mark phase_ Marks all objects that may be reachable via chains
+ of pointers from variables. Often the collector has no real information
+ about the location of pointer variables in the heap, so it views all static
+ data areas, stacks and registers as potentially containing pointers. Any bit
+ patterns that represent addresses inside heap objects managed by the
+ collector are viewed as pointers. Unless the client program has made heap
+ object layout information available to the collector, any heap objects found
+ to be reachable from variables are again scanned similarly.
+ 3. _Sweep phase_ Scans the heap for inaccessible, and hence unmarked,
+ objects, and returns them to an appropriate free list for reuse. This is not
+ really a separate phase; even in non-incremental mode this operation
+ is usually performed on demand during an allocation that discovers an empty
+ free list. Thus the sweep phase is very unlikely to touch a page that would
+ not have been touched shortly thereafter anyway.
+ 4. _Finalization phase_ Unreachable objects which had been registered for
+ finalization are enqueued for finalization outside the collector.
+
+The remaining sections describe the memory allocation data structures, and
+then the last 3 collection phases in more detail. We conclude by outlining
+some of the additional features implemented in the collector.
+
+## Allocation
+
+The collector includes its own memory allocator. The allocator obtains memory
+from the system in a platform-dependent way. Under UNIX, it uses either
+`malloc`, `sbrk`, or `mmap`.
+
+Most static data used by the allocator, as well as that needed by the rest
+of the garbage collector is stored inside the `_GC_arrays` structure. This
+allows the garbage collector to easily ignore the collector's own data
+structures when it searches for root pointers. Other allocator and collector
+internal data structures are allocated dynamically with `GC_scratch_alloc`.
+`GC_scratch_alloc` does not allow for deallocation, and is therefore used only
+for permanent data structures.
+
+The allocator allocates objects of different _kinds_. Different kinds are
+handled somewhat differently by certain parts of the garbage collector.
+Certain kinds are scanned for pointers, others are not. Some may have
+per-object type descriptors that determine pointer locations. Or a specific
+kind may correspond to one specific object layout. Two built-in kinds are
+uncollectible. One (`STUBBORN`) is immutable without special precautions.
+In spite of that, it is very likely that most C clients of the collector
+currently use at most two kinds: `NORMAL` and `PTRFREE` objects. The
+[GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcj/) runtime also makes heavy
+use of a kind (allocated with `GC_gcj_malloc`) that stores type information
+at a known offset in method tables.
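+
+As an illustration of how a C client sees these kinds, the fragment below
+allocates a NORMAL (pointer-containing) object with `GC_malloc` and a PTRFREE
+one with `GC_malloc_atomic`. It is a minimal usage sketch, not part of the
+collector:
+
+```c
+#include <gc.h>
+
+struct node { struct node *next; int value; };
+
+int main(void)
+{
+    GC_INIT();
+    /* NORMAL kind: the collector scans the object for pointers. */
+    struct node *n = GC_malloc(sizeof *n);
+    n->value = 42;                  /* GC_malloc returns zeroed memory */
+    /* PTRFREE kind: never scanned for pointers, and not cleared. */
+    double *samples = GC_malloc_atomic(100 * sizeof *samples);
+    samples[0] = 1.0;
+    return 0;
+}
+```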
+
+The collector uses a two level allocator. A large block is defined to be one
+larger than half of `HBLKSIZE`, which is a power of 2, typically on the order
+of the page size.
+
+Large block sizes are rounded up to the next multiple of `HBLKSIZE` and then
+allocated by `GC_allochblk`. Recent versions of the collector use an
+approximate best fit algorithm by keeping free lists for several large block
+sizes. The actual implementation of `GC_allochblk` is significantly
+complicated by black-listing issues (see below).
+
+Small blocks are allocated in chunks of size `HBLKSIZE`. Each chunk
+is dedicated to only one object size and kind.
+
+The allocator maintains separate free lists for each size and kind of object.
+Associated with each kind is an array of free list pointers, with entry
+`freelist[i]` pointing to a free list of size `i` objects. In recent versions
+of the collector, index `i` is expressed in granules, which are the minimum
+allocatable unit, typically 8 or 16 bytes. The free lists themselves are
+linked through the first word in each object (see `obj_link` macro).
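+
+The following toy fragment pictures this representation. The array bound and
+the `fl_push`/`fl_pop` helpers are illustrative stand-ins, not the collector's
+code:
+
+```c
+#include <stddef.h>
+
+#define MAXOBJGRANULES 128          /* illustrative bound, not the GC's value */
+
+typedef void *ptr_t;
+
+/* Hypothetical per-kind free-list array, indexed by size in granules. */
+static ptr_t freelist[MAXOBJGRANULES + 1];
+
+/* The first word of a free object holds the link to the next one. */
+#define OBJ_LINK(p) (*(ptr_t *)(p))
+
+static void fl_push(size_t granules, ptr_t obj)
+{
+    OBJ_LINK(obj) = freelist[granules];
+    freelist[granules] = obj;
+}
+
+/* Returns NULL on an empty list, which in the real collector triggers
+ * sweeping or a call into the slow allocation path. */
+static ptr_t fl_pop(size_t granules)
+{
+    ptr_t obj = freelist[granules];
+    if (obj != NULL)
+        freelist[granules] = OBJ_LINK(obj);
+    return obj;
+}
+```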
+
+Once a large block is split for use in smaller objects, it can only be used
+for objects of that size, unless the collector discovers a completely empty
+chunk. Completely empty chunks are restored to the appropriate large block
+free list.
+
+In order to avoid allocating blocks for too many distinct object sizes, the
+collector normally does not directly allocate objects of every possible
+request size. Instead requests are rounded up to one of a smaller number
+of allocated sizes, for which free lists are maintained. The exact allocated
+sizes are computed on demand, but subject to the constraint that they increase
+roughly in geometric progression. Thus objects requested early in the
+execution are likely to be allocated with exactly the requested size, subject
+to alignment constraints. See `GC_init_size_map` for details.
+
+The actual size rounding operation during small object allocation
+is implemented as a table lookup in `GC_size_map` which maps a requested
+allocation size in bytes to a number of granules.
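+
+A toy version of that lookup is sketched below; the granule size and the
+small-object cutoff are assumptions made for the sketch, not the values of any
+particular configuration:
+
+```c
+#include <stddef.h>
+
+#define GRANULE_BYTES   16              /* assumed granule size            */
+#define MAX_SMALL_BYTES 256             /* assumed small-object cutoff     */
+
+static size_t size_map[MAX_SMALL_BYTES + 1];    /* bytes -> granules       */
+
+static void init_size_map(void)
+{
+    /* Round each request up to whole granules.  The real table is filled
+     * in lazily and also merges nearby sizes into a roughly geometric
+     * progression of allocated sizes. */
+    for (size_t b = 0; b <= MAX_SMALL_BYTES; ++b)
+        size_map[b] = (b + GRANULE_BYTES - 1) / GRANULE_BYTES;
+}
+
+static size_t request_to_granules(size_t bytes)
+{
+    return bytes <= MAX_SMALL_BYTES
+               ? size_map[bytes]
+               : (bytes + GRANULE_BYTES - 1) / GRANULE_BYTES;
+}
+```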
+
+Both collector initialization and computation of allocated sizes are handled
+carefully so that they do not slow down the small object fast allocation path.
+An attempt to allocate before the collector is initialized, or before the
+appropriate `GC_size_map` entry is computed, will take the same path as an
+allocation attempt with an empty free list. This results in a call to the slow
+path code (`GC_generic_malloc_inner`) which performs the appropriate
+initialization checks.
+
+In non-incremental mode, we make a decision about whether to garbage collect
+whenever an allocation would otherwise have failed with the current heap size.
+If the total amount of allocation since the last collection is less than the
+heap size divided by `GC_free_space_divisor`, we try to expand the heap.
+Otherwise, we initiate a garbage collection. This ensures that the amount
+of garbage collection work per allocated byte remains constant.
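+
+The core of that decision can be sketched as follows; the parameter names
+stand in for the collector's internal counters and are not its real
+identifiers:
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Collect if recent allocation is large relative to the heap; otherwise
+ * expand, so that GC work per allocated byte stays roughly constant. */
+static bool should_collect(size_t bytes_allocd_since_gc,
+                           size_t heap_size,
+                           size_t free_space_divisor)
+{
+    return bytes_allocd_since_gc >= heap_size / free_space_divisor;
+}
+```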
+
+The above is in fact an oversimplification of the real heap expansion and GC
+triggering heuristic, which adjusts slightly for root size and certain kinds
+of fragmentation. In particular:
+
+ * Programs with a large root set size and little live heap memory will
+ expand the heap to amortize the cost of scanning the roots.
+ * GC v5 actually collects more frequently in non-incremental mode. The large
+ block allocator usually refuses to split large heap blocks once the garbage
+ collection threshold is reached. This often has the effect of collecting
+ well before the heap fills up, thus reducing fragmentation and working set
+ size at the expense of GC time. GC v6 chooses an intermediate strategy
+ depending on how much large object allocation has taken place in the past.
+ (If the collector is configured to unmap unused pages, GC v6 uses the
+ strategy of GC v5.)
+ * In calculating the amount of allocation since the last collection we give
+ partial credit for objects we expect to be explicitly deallocated. Even
+ if all objects are explicitly managed, it is often desirable to collect
+ on rare occasion, since that is our only mechanism for coalescing completely
+ empty chunks.
+
+It has been suggested that this should be adjusted so that we favor expansion
+if the resulting heap still fits into physical memory. In many cases, that
+would no doubt help. But it is tricky to do this in a way that remains robust
+if multiple applications are contending for a single pool of physical memory.
+
+## Mark phase
+
+At each collection, the collector marks all objects that are possibly
+reachable from pointer variables. Since it cannot generally tell where pointer
+variables are located, it scans the following _root segments_ for pointers:
+
+ * The registers. Depending on the architecture, this may be done using
+ assembly code, or by calling a `setjmp`-like function which saves register
+ contents on the stack.
+ * The stack(s). In the case of a single-threaded application, on most
+ platforms this is done by scanning the memory between (an approximation of)
+ the current stack pointer and `GC_stackbottom`. (For Intel Itanium, the
+ register stack is scanned separately.) The `GC_stackbottom` variable is set in
+ a highly platform-specific way depending on the appropriate configuration
+ information in `gcconfig.h`. Note that the currently active stack needs
+ to be scanned carefully, since callee-save registers of client code may
+ appear inside collector stack frames, which may change during the mark
+ process. This is addressed by scanning some sections of the stack _eagerly_,
+ effectively capturing a snapshot at one point in time.
+ * Static data region(s). In the simplest case, this is the region between
+ `DATASTART` and `DATAEND`, as defined in `gcconfig.h`. However, in most
+ cases, this will also involve static data regions associated with dynamic
+ libraries. These are identified by the mostly platform-specific code
+ in `dyn_load.c`. The marker maintains an explicit stack of memory regions
+ that are known to be accessible, but that have not yet been searched for
+ contained pointers. Each stack entry contains the starting address of the
+ block to be scanned, as well as a descriptor of the block. If no layout
+ information is available for the block, then the descriptor is simply
+ a length. (For other possibilities, see `gc_mark.h`.)
+
+At the beginning of the mark phase, all root segments (as described above) are
+pushed on the stack by `GC_push_roots`. (Registers and eagerly processed stack
+sections are processed by pushing the referenced objects instead of the stack
+section itself.) If `ALL_INTERIOR_POINTERS` is not defined, then stack roots
+require special treatment. In this case, the normal marking code ignores
+interior pointers, but `GC_push_all_stack` explicitly checks for interior
+pointers and pushes descriptors for target objects.
+
+The marker is structured to allow incremental marking. Each call
+to `GC_mark_some` performs a small amount of work towards marking the heap.
+It maintains explicit state in the form of `GC_mark_state`, which identifies
+a particular sub-phase. Some other pieces of state, most notably the mark
+stack, identify how much work remains to be done in each sub-phase. The normal
+progression of mark states for a stop-the-world collection is:
+
+ 1. `MS_INVALID` indicating that there may be accessible unmarked objects.
+ In this case `GC_objects_are_marked` will simultaneously be false, so the
+ mark state is advanced to
+ 2. `MS_PUSH_UNCOLLECTABLE` indicating that it suffices to push uncollectible
+ objects, roots, and then mark everything reachable from them. `scan_ptr`
+ is advanced through the heap until all uncollectible objects are pushed, and
+ objects reachable from them are marked. At that point, the next call
+ to `GC_mark_some` calls `GC_push_roots` to push the roots. It then advances
+ the mark state to
+ 3. `MS_ROOTS_PUSHED` asserting that once the mark stack is empty, all
+ reachable objects are marked. Once in this state, we work only on emptying
+ the mark stack. Once this is completed, the state changes to
+ 4. `MS_NONE` indicating that reachable objects are marked.
+
+The core mark routine `GC_mark_from` is called repeatedly by several of the
+sub-phases when the mark stack starts to fill up. It is also called repeatedly
+in `MS_ROOTS_PUSHED` state to empty the mark stack. The routine is designed
+to only perform a limited amount of marking at each call, so that it can also
+be used by the incremental collector. It is fairly carefully tuned, since
+it usually consumes a large majority of the garbage collection time.
+
+The fact that it performs only a small amount of work per call also allows
+it to be used as the core routine of the parallel marker. In that case it is
+normally invoked on thread-private mark stacks instead of the global mark
+stack. More details can be found [here](scale.md).
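+
+The state progression can be pictured as a small state machine. The following
+is a toy model with stub helpers standing in for the collector's internals;
+it is not the actual `GC_mark_some`:
+
+```c
+typedef enum {
+    MS_INVALID, MS_PUSH_UNCOLLECTABLE, MS_ROOTS_PUSHED, MS_NONE
+} mark_state_t;
+
+static mark_state_t mark_state = MS_INVALID;
+
+/* Stubs standing in for the collector's internals. */
+static int  push_next_uncollectible_block(void) { return 0; }
+static void push_roots(void) {}
+static void drain_some_of_mark_stack(void) {}
+static int  mark_stack_empty(void) { return 1; }
+
+/* Perform one small increment of marking; returns nonzero when done. */
+static int mark_some(void)
+{
+    switch (mark_state) {
+    case MS_INVALID:              /* accessible unmarked objects may exist */
+        mark_state = MS_PUSH_UNCOLLECTABLE;
+        return 0;
+    case MS_PUSH_UNCOLLECTABLE:   /* push uncollectible blocks, then roots */
+        if (!push_next_uncollectible_block()) {
+            push_roots();
+            mark_state = MS_ROOTS_PUSHED;
+        }
+        return 0;
+    case MS_ROOTS_PUSHED:         /* drain the mark stack                  */
+        drain_some_of_mark_stack();
+        if (mark_stack_empty())
+            mark_state = MS_NONE;
+        return 0;
+    default:                      /* MS_NONE: marking is complete          */
+        return 1;
+    }
+}
+```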
+
+The marker correctly handles mark stack overflows. Whenever the mark stack
+overflows, the mark state is reset to `MS_INVALID`. Since there are already
+marked objects in the heap, this eventually forces a complete scan of the
+heap, searching for pointers, during which any unmarked objects referenced
+by marked objects are again pushed on the mark stack. This process is repeated
+until the mark phase completes without a stack overflow. Each time the stack
+overflows, an attempt is made to grow the mark stack. All pieces of the
+collector that push regions onto the mark stack have to be careful to ensure
+forward progress, even in case of repeated mark stack overflows. Every mark
+attempt results in additional marked objects.
+
+Each mark stack entry is processed by examining all candidate pointers in the
+range described by the entry. If the region has no associated type
+information, then this typically requires that each 4-byte aligned quantity
+(8-byte aligned with 64-bit pointers) be considered a candidate pointer.
+
+We determine whether a candidate pointer is actually the address of a heap
+block. This is done in the following steps (a simplified sketch in C follows
+the list):
+
+ * The candidate pointer is checked against rough heap bounds. These heap
+ bounds are maintained such that all actual heap objects fall between them.
+ In order to facilitate black-listing (see below) we also include address
+ regions that the heap is likely to expand into. Most non-pointers fail this
+ initial test.
+ * The candidate pointer is divided into two pieces; the most significant
+ bits identify a `HBLKSIZE`-sized page in the address space, and the least
+ significant bits specify an offset within that page. (A hardware page may
+ actually consist of multiple such pages. `HBLKSIZE` is usually the page size
+ divided by a small power of two.)
+ * The page address part of the candidate pointer is looked up in
+ a [table](tree.html). Each table entry contains either 0, indicating that
+ the page is not part of the garbage collected heap, a small integer _n_,
+ indicating that the page is part of a large object, starting at least _n_
+ pages back, or a pointer to a descriptor for the page. In the first case,
+ the candidate pointer is not a true pointer and can be safely ignored.
+ In the last two cases, we can obtain a descriptor for the page containing
+ the beginning of the object.
+ * The starting address of the referenced object is computed. The page
+ descriptor contains the size of the object(s) in that page, the object kind,
+ and the necessary mark bits for those objects. The size information can be
+ used to map the candidate pointer to the object starting address.
+ To accelerate this process, the page header also contains a pointer to
+ a precomputed map of page offsets to displacements from the beginning of an
+ object. The use of this map avoids a potentially slow integer remainder
+ operation in computing the object start address.
+ * The mark bit for the target object is checked and set. If the object was
+ previously unmarked, the object is pushed on the mark stack. The descriptor
+ is read from the page descriptor. (This is computed from information
+ in `GC_obj_kinds` when the page is first allocated.)
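+
+The sketch promised above covers the first three steps in a deliberately
+simplified form: it uses a flat page table in place of the collector's real
+multi-level header lookup, and the names and sizes are invented for
+illustration:
+
+```c
+#include <stdint.h>
+
+#define HBLK_BYTES 4096             /* assumed GC "page" size               */
+#define MAX_PAGES  (1u << 16)       /* toy slice of the address space       */
+
+struct hblkhdr;                     /* per-page descriptor, opaque here     */
+
+static struct hblkhdr *page_headers[MAX_PAGES]; /* NULL: not in the GC heap */
+static uintptr_t heap_lo, heap_hi;  /* rough bounds, incl. likely growth    */
+
+static struct hblkhdr *candidate_to_header(uintptr_t candidate)
+{
+    if (candidate < heap_lo || candidate >= heap_hi)
+        return NULL;                        /* most non-pointers stop here  */
+
+    uintptr_t page   = candidate / HBLK_BYTES;  /* most significant bits    */
+    uintptr_t offset = candidate % HBLK_BYTES;  /* offset within the page   */
+    (void)offset;  /* used later, with the header, to find the object start */
+
+    /* The real table can also encode "part of a large object starting
+     * n pages back"; this toy table stores only header pointers. */
+    return page < MAX_PAGES ? page_headers[page] : NULL;
+}
+```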
+
+At the end of the mark phase, mark bits for left-over free lists are cleared,
+in case a free list was accidentally marked due to a stray pointer.
+
+## Sweep phase
+
+At the end of the mark phase, all blocks in the heap are examined. Unmarked
+large objects are immediately returned to the large object free list. Each
+small object page is checked to see if all mark bits are clear. If so, the
+entire page is returned to the large object free list. Small object pages
+containing some reachable object are queued for later sweeping, unless
+we determine that the page contains very little free space, in which case
+it is not examined further.
+
+This initial sweep pass touches only block headers, not the blocks themselves.
+Thus it does not require significant paging, even if large sections of the
+heap are not in physical memory.
+
+Nonempty small object pages are swept when an allocation attempt encounters
+an empty free list for that object size and kind. Pages for the correct size
+and kind are repeatedly swept until at least one empty block is found.
+Sweeping such a page involves scanning the mark bit array in the page header,
+and building a free list linked through the first words in the objects
+themselves. This does involve touching the appropriate data page, but in most
+cases it will be touched only just before it is used for allocation. Hence any
+paging is essentially unavoidable.
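+
+A greatly reduced model of sweeping one such page is shown below. The header
+layout is invented for the sketch; the real collector keeps one mark bit per
+word and uses specialized sweep routines, as described next:
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+#define PAGE_BYTES 4096                 /* assumed block size               */
+
+struct page_header {                    /* invented layout, not hblkhdr     */
+    size_t obj_bytes;                   /* size of the objects in this page */
+    unsigned char marked[PAGE_BYTES];   /* one flag per object (simplified) */
+};
+
+/* Rebuild the free list for one page, threading unmarked objects through
+ * their first word; returns the new head of the free list. */
+static void *sweep_page(struct page_header *hdr, char *page, void *free_list)
+{
+    size_t count = PAGE_BYTES / hdr->obj_bytes;
+    for (size_t i = 0; i < count; ++i) {
+        if (!hdr->marked[i]) {
+            void *obj = page + i * hdr->obj_bytes;
+            memset(obj, 0, hdr->obj_bytes); /* keep free objects cleared    */
+            *(void **)obj = free_list;      /* link through the first word  */
+            free_list = obj;
+        }
+    }
+    return free_list;
+}
+```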
+
+Except in the case of pointer-free objects, we maintain the invariant that any
+object in a small object free list is cleared (except possibly for the link
+field). Thus it becomes the burden of the small object sweep routine to clear
+objects. This has the advantage that we can easily recover from accidentally
+marking a free list, though that could also be handled by other means. The
+collector currently spends a fair amount of time clearing objects, and this
+approach should probably be revisited.
+
+In most configurations, we use
+specialized sweep routines to handle common small object sizes. Since
+we allocate one mark bit per word, it becomes easier to examine the relevant
+mark bits if the object size divides the word length evenly. We also suitably
+unroll the inner sweep loop in each case. (It is conceivable that
+profile-based procedure cloning in the compiler could make this unnecessary
+and counterproductive. I know of no existing compiler to which this applies.)
+
+The sweeping of small object pages could be avoided completely at the expense
+of examining mark bits directly in the allocator. This would probably be more
+expensive, since each allocation call would have to reload a large amount
+of state (e.g. next object address to be swept, position in mark bit table)
+before it could do its work. The current scheme keeps the allocator simple and
+allows useful optimizations in the sweeper.
+
+## Finalization
+
+Both `GC_register_disappearing_link` and `GC_register_finalizer` add the
+request to a corresponding hash table. The hash table is allocated out of
+collected memory, but the reference to the finalizable object is hidden from
+the collector. Currently finalization requests are processed non-incrementally
+at the end of a mark cycle.
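+
+The pointer-hiding trick is simple enough to show in a couple of lines;
+`gc.h` exposes similar `GC_HIDE_POINTER`/`GC_REVEAL_POINTER` macros for
+client use, but the exact form below is just an illustration:
+
+```c
+#include <stdint.h>
+
+/* Storing the complement of an address keeps the conservative marker from
+ * recognizing the table entry as a pointer, so registration by itself does
+ * not keep the object alive. */
+typedef uintptr_t hidden_ptr;
+
+static hidden_ptr hide_pointer(const void *p)  { return ~(uintptr_t)p; }
+static void      *reveal_pointer(hidden_ptr h) { return (void *)~h;    }
+```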
+
+The collector makes an initial pass over the table of finalizable objects,
+pushing the contents of unmarked objects onto the mark stack. After pushing
+each object, the marker is invoked to mark all objects reachable from it. The
+object itself is not explicitly marked. This assures that objects on which
+a finalizer depends are neither collected nor finalized.
+
+If in the process of marking from an object the object itself becomes marked,
+we have uncovered a cycle involving the object. This usually results in
+a warning from the collector. Such objects are not finalized, since it may be
+unsafe to do so. See the more detailed discussion of
+[finalization semantics](finalization.md).
+
+Any objects remaining unmarked at the end of this process are added to a queue
+of objects whose finalizers can be run. Depending on collector configuration,
+finalizers are dequeued and run either implicitly during allocation calls,
+or explicitly in response to a user request. (Note that the former
+is unfortunately both the default and not generally safe. If finalizers
+perform synchronization, it may result in deadlocks. Nontrivial finalizers
+generally need to perform synchronization, and thus require a different
+collector configuration.)
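+
+For example, a client that wants finalizers run only at well-defined points
+can register them as usual and drain the queue explicitly. (With a
+conservative collector a stale copy of the pointer may keep the object alive,
+so this only illustrates the API, not a guaranteed finalization.)
+
+```c
+#include <gc.h>
+#include <stdio.h>
+
+static void report(void *obj, void *client_data)
+{
+    (void)obj;
+    printf("finalizing %s\n", (const char *)client_data);
+}
+
+int main(void)
+{
+    GC_INIT();
+    void *p = GC_malloc(64);
+    GC_register_finalizer(p, report, (void *)"p", 0, 0);
+    p = 0;                     /* drop our reference                       */
+    GC_gcollect();             /* a mark cycle may enqueue the finalizer   */
+    GC_invoke_finalizers();    /* run queued finalizers at a safe point    */
+    return 0;
+}
+```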
+
+The collector provides a mechanism for replacing the procedure that is used
+to mark through objects. This is used both to provide support for Java-style
+unordered finalization, and to ignore certain kinds of cycles, e.g. those
+arising from C++ implementations of virtual inheritance.
+
+## Generational Collection and Dirty Bits
+
+We basically use the concurrent and generational GC algorithm described in
+["Mostly Parallel Garbage Collection"](http://www.hboehm.info/gc/papers/pldi91.ps.Z),
+by Boehm, Demers, and Shenker.
+
+The most significant modification is that the collector always starts running
+in the allocating thread. There is no separate garbage collector thread. (If
+parallel GC is enabled, helper threads may also be woken up.) If an allocation
+attempt either requests a large object, or encounters an empty small object
+free list, and notices that there is a collection in progress, it immediately
+performs a small amount of marking work as described above.
+
+This change was made both because we wanted to easily accommodate
+single-threaded environments, and because a separate GC thread requires very
+careful control over the scheduler to prevent the mutator from out-running the
+collector, and hence provoking unneeded heap growth.
+
+In incremental mode, the heap is always expanded when we encounter
+insufficient space for an allocation. Garbage collection is triggered whenever
+we notice that more than `GC_heap_size / (2 * GC_free_space_divisor)` bytes
+of allocation have taken place. After `GC_full_freq` minor collections a major
+collection is started.
+
+All collections initially run uninterrupted until a predetermined amount
+of time (50 msecs by default) has expired. If this allows the collection
+to complete entirely, we can avoid correcting for data structure modifications
+during the collection. If it does not complete, we return control to the
+mutator, and perform small amounts of additional GC work during those later
+allocations that cannot be satisfied from small object free lists. When
+marking completes, the set of modified pages is retrieved, and we mark once
+again from marked objects on those pages, this time with the mutator stopped.
+
+We keep track of modified pages using one of several distinct mechanisms:
+
+ * Through explicit mutator cooperation. Currently this requires the use of
+ `GC_malloc_stubborn`, and is rarely used.
+ * (`MPROTECT_VDB`) By write-protecting physical pages and catching write
+ faults (a sketch of this approach follows the list). This is implemented for
+ many Unix-like systems and for Win32. It is not possible in a few
+ environments.
+ * (`GWW_VDB`) By using the Win32 `GetWriteWatch` function to read dirty
+ bits.
+ * (`PROC_VDB`) By retrieving dirty bit information from /proc. (Currently
+ only Sun's Solaris supports this. Though this is considerably cleaner,
+ performance may actually be better with `mprotect` and signals.)
+ * (`PCR_VDB`) By relying on an external dirty bit implementation, in this
+ case the one in Xerox PCR.
+ * (`DEFAULT_VDB`) By treating all pages as dirty. This is the default
+ if none of the other techniques is known to be usable, and
+ `GC_malloc_stubborn` is not used. (Practical only for testing, or if the
+ vast majority of objects use `GC_malloc_stubborn`.)
+
+## Black-listing
+
+The collector implements _black-listing_ of pages, as described in
+["Space Efficient Conservative Collection", PLDI'93](http://dl.acm.org/citation.cfm?doid=155090.155109)
+by Boehm, also available
+[here](https://www.cs.rice.edu/~javaplt/311/Readings/pldi93.pdf).
+
+During the mark phase, the collector tracks _near misses_, i.e. attempts
+to follow a _pointer_ to just outside the garbage-collected heap, or to
+a currently unallocated page inside the heap. Pages that have been the targets
+of such near misses are likely to be the targets of misidentified _pointers_
+in the future. To minimize the future damage caused by such misidentification,
+they will be allocated only to small pointer-free objects.
+
+The collector understands two different kinds of black-listing. A page may be
+black listed for interior pointer references (`GC_add_to_black_list_stack`),
+if it was the target of a near miss from a location that requires interior
+pointer recognition, e.g. the stack, or the heap if `GC_all_interior_pointers`
+is set. In this case, we also avoid allocating large blocks that include this
+page.
+
+If the near miss came from a source that did not require interior pointer
+recognition, it is black-listed with `GC_add_to_black_list_normal`. A page
+black-listed in this way may appear inside a large object, so long as it is
+not the first page of a large object.
+
+The `GC_allochblk` routine respects black-listing when assigning a block to
+a particular object kind and size. It occasionally drops (i.e. allocates and
+forgets) blocks that are completely black-listed in order to avoid excessively
+long large block free lists containing only unusable blocks. This would
+otherwise become an issue if there is low demand for small pointer-free
+objects.
+
+## Thread support
+
+We support several different threading models. Unfortunately Pthreads, the
+only reasonably well standardized thread model, supports too narrow
+an interface for conservative garbage collection. There appears to be no
+completely portable way to allow the collector to coexist with various
+Pthreads implementations. Hence we currently support only the more common
+Pthreads implementations.
+
+In particular, it is very difficult for the collector to stop all other
+threads in the system and examine the register contents. This is currently
+accomplished with very different mechanisms for some Pthreads implementations.
+For Linux/HPUX/OSF1, Solaris and Irix it sends signals to individual Pthreads
+and has them wait in the signal handler.
+
+The Linux and Irix implementations use only documented Pthreads calls, but
+rely on extensions to their semantics. The Linux implementation
+`pthread_stop_world.c` relies on only very mild extensions to the pthreads
+semantics, and already supports a large number of other Unix-like pthreads
+implementations. Our goal is to make this the only pthread support in the
+collector.
+
+All implementations must intercept thread creation and a few other
+thread-specific calls to allow enumeration of threads and location of thread
+stacks. This is currently accomplished with `#define`'s in `gc.h` (really
+`gc_pthread_redirects.h`), or optionally by using `ld`'s function call
+wrapping mechanism under Linux.
+
+Recent versions of the collector support several facilities to enhance the
+processor-scalability and thread performance of the collector. These are
+discussed in more detail [here](scale.md). We briefly outline the approach
+to thread-local allocation in the next section.
+
+## Thread-local allocation
+
+If thread-local allocation is enabled, the collector keeps separate arrays
+of free lists for each thread. Thread-local allocation is currently only
+supported on a few platforms.
+
+The free list arrays associated with each thread are only used to satisfy
+requests for objects that are both very small, and belong to one of a small
+number of well-known kinds. These currently include _normal_ and pointer-free
+objects. Depending on the configuration, _gcj_ objects may also be included.
+
+Thread-local free list entries contain either a pointer to the first element
+of a free list, or they contain a counter of the number of allocation
+granules, corresponding to objects of this size, allocated so far. Initially
+they contain the value one, i.e. a small counter value.
+
+Thread-local allocation allocates directly through the global allocator,
+if the object is of a size or kind not covered by the local free lists.
+
+If there is an appropriate local free list, the allocator checks whether
+it contains a sufficiently small counter value. If so, the counter is simply
+incremented, and the global allocator is used. In this
+way, the initial few allocations of a given size bypass the local allocator.
+A thread that only allocates a handful of objects of a given size will not
+build up its own free list for that size. This avoids wasting space for
+unpopular object sizes or kinds.
+
+Once the counter passes a threshold, `GC_malloc_many` is called to allocate
+roughly `HBLKSIZE` space and put it on the corresponding local free list.
+Further allocations of that size and kind then use this free list, and no
+longer need to acquire the allocation lock. The allocation procedure
+is otherwise similar to the global free lists. The local free lists are also
+linked using the first word in the object. In most cases this means that
+allocations from them require considerably less time.
+
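+Clients can use the same primitive to maintain their own object caches.
+The sketch below assumes the `GC_NEXT` macro from `gc.h` for walking the
+returned list; the batch pointer is kept in a static variable, so it is
+scanned as a root (a real client would keep it in thread-local storage):
+
+    #include "gc.h"
+
+    static void *batch = NULL;  /* objects linked through their first word */
+
+    void *get_small_object(void)
+    {
+        void *result;
+
+        if (batch == NULL) {
+            batch = GC_malloc_many(16);  /* refill from the collector */
+            if (batch == NULL) return NULL;
+        }
+        result = batch;
+        batch = GC_NEXT(result);
+        GC_NEXT(result) = NULL;  /* unlink before handing the object out */
+        return result;
+    }
+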
+Local free lists are treated by most of the rest of the collector as though
+they were in-use reachable data. This requires some care, since pointer-free
+objects are not normally traced, and hence a special tracing procedure
+is required to mark all objects on pointer-free and gcj local free lists.
+
+On thread exit, any remaining thread-local free list entries are transferred
+back to the global free list.
+
+Note that if the collector is configured for thread-local allocation,
+`GC_malloc` only uses thread-local allocation (starting from GC v7).
+
+For some more details see [here](scale.md), and the technical report entitled
+["Fast Multiprocessor Memory Allocation and Garbage Collection"](http://www.hpl.hp.com/techreports/2000/HPL-2000-165.html).
+++ /dev/null
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
-<html lang="en-us">
-<HEAD>
-<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII" >
-<TITLE>Garbage Collector Interface</TITLE>
-</HEAD>
-<BODY>
-<H1>C Interface</h1>
-On many platforms, a single-threaded garbage collector library can be built
-to act as a plug-in malloc replacement.
-(Build with <TT>-DREDIRECT_MALLOC=GC_malloc -DIGNORE_FREE</tt>.)
-This is often the best way to deal with third-party libraries
-which leak or prematurely free objects.
-<TT>-DREDIRECT_MALLOC=GC_malloc</tt> is intended
-primarily as an easy way to adapt old code, not for new development.
-<P>
-New code should use the interface discussed below.
-<P>
-Code must be linked against the GC library. On most UNIX platforms,
-depending on how the collector is built, this will be <TT>gc.a</tt>
-or <TT>libgc.{a,so}</tt>.
-<P>
-The following describes the standard C interface to the garbage collector.
-It is not a complete definition of the interface. It describes only the
-most commonly used functionality, approximately in decreasing order of
-frequency of use.
-The full interface is described in
-<A type="text/plain" HREF="../include/gc.h">gc.h</a>
-or <TT>gc.h</tt> in the distribution.
-<P>
-Clients should include <TT>gc.h</tt>.
-<P>
-In the case of multi-threaded code,
-<TT>gc.h</tt> should be included after the threads header file, and
-after defining <TT>GC_THREADS</tt> macro.
-The header file <TT>gc.h</tt> must be included
-in files that use either GC or threads primitives, since threads primitives
-will be redefined to cooperate with the GC on many platforms.
-<P>
-Thread users should also be aware that on many platforms objects reachable
-only from thread-local variables may be prematurely reclaimed.
-Thus objects pointed to by thread-local variables should also be pointed to
-by a globally visible data structure. (This is viewed as a bug, but as
-one that is exceedingly hard to fix without some libc hooks.)
-<DL>
-<DT> <B>void * GC_MALLOC(size_t <I>nbytes</i>)</b>
-<DD>
-Allocates and clears <I>nbytes</i> of storage.
-Requires (amortized) time proportional to <I>nbytes</i>.
-The resulting object will be automatically deallocated when unreferenced.
-References from objects allocated with the system malloc are usually not
-considered by the collector. (See <TT>GC_MALLOC_UNCOLLECTABLE</tt>, however.
-Building the collector with <TT>-DREDIRECT_MALLOC=GC_malloc_uncollectable
-is often a way around this.)
-<TT>GC_MALLOC</tt> is a macro which invokes <TT>GC_malloc</tt> by default or,
-if <TT>GC_DEBUG</tt>
-is defined before <TT>gc.h</tt> is included, a debugging version that checks
-occasionally for overwrite errors, and the like.
-<DT> <B>void * GC_MALLOC_ATOMIC(size_t <I>nbytes</i>)</b>
-<DD>
-Allocates <I>nbytes</i> of storage.
-Requires (amortized) time proportional to <I>nbytes</i>.
-The resulting object will be automatically deallocated when unreferenced.
-The client promises that the resulting object will never contain any pointers.
-The memory is not cleared.
-This is the preferred way to allocate strings, floating point arrays,
-bitmaps, etc.
-More precise information about pointer locations can be communicated to the
-collector using the interface in
-<A type="text/plain" HREF="../include/gc_typed.h">gc_typed.h</a> in the distribution.
-<DT> <B>void * GC_MALLOC_UNCOLLECTABLE(size_t <I>nbytes</i>)</b>
-<DD>
-Identical to <TT>GC_MALLOC</tt>,
-except that the resulting object is not automatically
-deallocated. Unlike the system-provided malloc, the collector does
-scan the object for pointers to garbage-collectible memory, even if the
-block itself does not appear to be reachable. (Objects allocated in this way
-are effectively treated as roots by the collector.)
-<DT> <B> void * GC_REALLOC(void *<I>old</i>, size_t <I>new_size</i>) </b>
-<DD>
-Allocate a new object of the indicated size and copy (a prefix of) the
-old object into the new object. The old object is reused in place if
-convenient. If the original object was allocated with
-<TT>GC_MALLOC_ATOMIC</tt>,
-the new object is subject to the same constraints. If it was allocated
-as an uncollectible object, then the new object is uncollectible, and
-the old object (if different) is deallocated.
-<DT> <B> void GC_FREE(void *<I>dead</i>) </b>
-<DD>
-Explicitly deallocate an object. Typically not useful for small
-collectible objects.
-<DT> <B> void * GC_MALLOC_IGNORE_OFF_PAGE(size_t <I>nbytes</i>) </b>
-<DD>
-<DT> <B> void * GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE(size_t <I>nbytes</i>) </b>
-<DD>
-Analogous to <TT>GC_MALLOC</tt> and <TT>GC_MALLOC_ATOMIC</tt>,
-except that the client
-guarantees that as long
-as the resulting object is of use, a pointer is maintained to someplace
-inside the first 512 bytes of the object. This pointer should be declared
-volatile to avoid interference from compiler optimizations.
-(Other nonvolatile pointers to the object may exist as well.)
-This is the
-preferred way to allocate objects that are likely to be > 100KBytes in size.
-It greatly reduces the risk that such objects will be accidentally retained
-when they are no longer needed. Thus space usage may be significantly reduced.
-<DT> <B> void GC_INIT(void) </b>
-<DD>
-On some platforms, it is necessary to invoke this
-<I>from the main executable, not from a dynamic library,</i> before
-the initial invocation of a GC routine. It is recommended that this be done
-in portable code, though we try to ensure that it expands to a no-op
-on as many platforms as possible. In GC 7.0, it was required if
-thread-local allocation is enabled in the collector build, and <TT>malloc</tt>
-is not redirected to <TT>GC_malloc</tt>.
-<DT> <B> void GC_gcollect(void) </b>
-<DD>
-Explicitly force a garbage collection.
-<DT> <B> void GC_enable_incremental(void) </b>
-<DD>
-Cause the garbage collector to perform a small amount of work
-every few invocations of <TT>GC_MALLOC</tt> or the like, instead of performing
-an entire collection at once. This is likely to increase total
-running time. It will improve response on a platform that either has
-suitable support in the garbage collector (Linux and most Unix
-versions, win32 if the collector was suitably built) or if "stubborn"
-allocation is used (see
-<A type="text/plain" HREF="../include/gc.h">gc.h</a>).
-On many platforms this interacts poorly with system calls
-that write to the garbage collected heap.
-<DT> <B> GC_warn_proc GC_set_warn_proc(GC_warn_proc <I>p</i>) </b>
-<DD>
-Replace the default procedure used by the collector to print warnings.
-The collector
-may otherwise write to stderr, most commonly because GC_malloc was used
-in a situation in which GC_malloc_ignore_off_page would have been more
-appropriate. See <A type="text/plain" HREF="../include/gc.h">gc.h</a> for details.
-<DT> <B> void GC_REGISTER_FINALIZER(...) </b>
-<DD>
-Register a function to be called when an object becomes inaccessible.
-This is often useful as a backup method for releasing system resources
-(<I>e.g.</i> closing files) when the object referencing them becomes
-inaccessible.
-It is not an acceptable method to perform actions that must be performed
-in a timely fashion.
-See <A type="text/plain" HREF="../include/gc.h">gc.h</a> for details of the interface.
-See <A HREF="finalization.md">here</a> for a more detailed discussion
-of the design.
-<P>
-Note that an object may become inaccessible before client code is done
-operating on objects referenced by its fields.
-Suitable synchronization is usually required.
-See <A HREF="http://portal.acm.org/citation.cfm?doid=604131.604153">here</a>
-or <A HREF="http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html">here</a>
-for details.
-</dl>
-<P>
-If you are concerned with multiprocessor performance and scalability,
-you should consider enabling and using thread local allocation.
-<P>
-If your platform supports it, you should build the collector with parallel
-marking support (<TT>-DPARALLEL_MARK</tt>); configure has it on by default.
-<P>
-If the collector is used in an environment in which pointer location
-information for heap objects is easily available, this can be passed on
-to the collector using the interfaces in either <TT>gc_typed.h</tt>
-or <TT>gc_gcj.h</tt>.
-<P>
-The collector distribution also includes a <B>string package</b> that takes
-advantage of the collector. For details see
-<A type="text/plain" HREF="../include/cord.h">cord.h</a>
-
-<H1>C++ Interface</h1>
-The C++ interface is implemented as a thin layer on the C interface.
-Unfortunately, this thin layer appears to be very sensitive to variations
-in C++ implementations, particularly since it tries to replace the global
-::new operator, something that appears to not be well-standardized.
-Your platform may need minor adjustments in this layer (gc_cpp.cc, gc_cpp.h,
-and possibly gc_allocator.h). Such changes do not require understanding
-of collector internals, though they may require a good understanding of
-your platform. (Patches enhancing portability are welcome.
-But it's easy to break one platform by fixing another.)
-<P>
-Usage of the collector from C++ is also complicated by the fact that there
-are many "standard" ways to allocate memory in C++. The default ::new
-operator, default malloc, and default STL allocators allocate memory
-that is not garbage collected, and is not normally "traced" by the
-collector. This means that any pointers in memory allocated by these
-default allocators will not be seen by the collector. Garbage-collectible
-memory referenced only by pointers stored in such default-allocated
-objects is likely to be reclaimed prematurely by the collector.
-<P>
-It is the programmers responsibility to ensure that garbage-collectible
-memory is referenced by pointers stored in one of
-<UL>
-<LI> Program variables
-<LI> Garbage-collected objects
-<LI> Uncollected but "traceable" objects
-</ul>
-"Traceable" objects are not necessarily reclaimed by the collector,
-but are scanned for pointers to collectible objects.
-They are usually allocated by <TT>GC_MALLOC_UNCOLLECTABLE</tt>, as described
-above, and through some interfaces described below.
-<P>
-(On most platforms, the collector may not trace correctly from in-flight
-exception objects. Thus objects thrown as exceptions should only
-point to otherwise reachable memory. This is another bug whose
-proper repair requires platform hooks.)
-<P>
-The easiest way to ensure that collectible objects are properly referenced
-is to allocate only collectible objects. This requires that every
-allocation go through one of the following interfaces, each one of
-which replaces a standard C++ allocation mechanism. Note that
-this requires that all STL containers be explicitly instantiated with
-<TT>gc_allocator</tt>.
-<DL>
-<DT> <B> STL allocators </b>
-<DD>
-<P>
-Recent versions of the collector include a hopefully standard-conforming
-allocator implementation in <TT>gc_allocator.h</tt>. It defines
-<UL>
-<LI> <TT>traceable_allocator</tt>
-<LI> <TT>gc_allocator</tt>
-</ul>
-which may be used either directly to allocate memory or to instantiate
-container templates.
-The former allocates uncollectible but traced memory.
-The latter allocates garbage-collected memory.
-<P>
-These should work with any fully standard-conforming C++ compiler.
-<P>
-Users of the <A HREF="http://www.sgi.com/tech/stl">SGI extended STL</a>
-or its derivatives (including most g++ versions)
-may instead be able to include <TT>new_gc_alloc.h</tt> before including
-STL header files. This is increasingly discouraged.
-<P>
-This defines SGI-style allocators
-<UL>
-<LI> <TT>alloc</tt>
-<LI> <TT>single_client_alloc</tt>
-<LI> <TT>gc_alloc</tt>
-<LI> <TT>single_client_gc_alloc</tt>
-</ul>
-The first two allocate uncollectible but traced
-memory, while the second two allocate collectible memory.
-The <TT>single_client</tt> versions are not safe for concurrent access by
-multiple threads, but are faster.
-<P>
-For an example, click <A HREF="http://www.hboehm.info/gc/gc_alloc_exC.txt">here</a>.
-<DT> <B> Class inheritance based interface for new-based allocation</b>
-<DD>
-Users may include gc_cpp.h and then cause members of classes to
-be allocated in garbage collectible memory by having those classes
-inherit from class gc.
-For details see <A type="text/plain" HREF="../include/gc_cpp.h">gc_cpp.h</a>.
-<P>
-Linking against libgccpp in addition to the gc library overrides
-::new (and friends) to allocate traceable memory but uncollectible
-memory, making it safe to refer to collectible objects from the resulting
-memory.
-<DT> <B> C interface </b>
-<DD>
-It is also possible to use the C interface from
-<A type="text/plain" HREF="../include/gc.h">gc.h</a> directly.
-On platforms which use malloc to implement ::new, it should usually be possible
-to use a version of the collector that has been compiled as a malloc
-replacement. It is also possible to replace ::new and other allocation
-functions suitably, as is done by libgccpp.
-<P>
-Note that user-implemented small-block allocation often works poorly with
-an underlying garbage-collected large block allocator, since the collector
-has to view all objects accessible from the user's free list as reachable.
-This is likely to cause problems if <TT>GC_MALLOC</tt>
-is used with something like
-the original HP version of STL.
-This approach works well with the SGI versions of the STL only if the
-<TT>malloc_alloc</tt> allocator is used.
-</dl>
-</body>
-</html>
--- /dev/null
+# C/C++ Interface
+
+On many platforms, a single-threaded garbage collector library can be built
+to act as a plug-in `malloc` replacement. (Build with
+`-DREDIRECT_MALLOC=GC_malloc -DIGNORE_FREE`.) This is often the best way to
+deal with third-party libraries which leak or prematurely free objects.
+`-DREDIRECT_MALLOC=GC_malloc` is intended primarily as an easy way to adapt
+old code, not for new development.
+
+New code should use the interface discussed below.
+
+Code must be linked against the GC library. On most UNIX platforms, depending
+on how the collector is built, this will be `gc.a` or `libgc.{a,so}`.
+
+The following describes the standard C interface to the garbage collector.
+It is not a complete definition of the interface. It describes only the most
+commonly used functionality, approximately in decreasing order of frequency
+of use. The full interface is described in the `gc.h` file.
+
+Clients should include `gc.h` (i.e., not `gc_config_macros.h`,
+`gc_pthread_redirects.h`, `gc_version.h`). In the case of multi-threaded code,
+`gc.h` should be included after the threads header file, and after defining
+`GC_THREADS` macro. The header file `gc.h` must be included in files that use
+either GC or threads primitives, since threads primitives will be redefined
+to cooperate with the GC on many platforms.
+
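+For example, a Pthreads client would typically begin each file that uses
+both thread calls and the collector along the following lines (a sketch;
+the exact threads header is platform-dependent):
+
+    #define GC_THREADS
+    #include <pthread.h>   /* threads header first */
+    #include "gc.h"        /* then gc.h, which redirects thread primitives */
+
+    static void *worker(void *arg)
+    {
+        (void)arg;
+        return GC_MALLOC(64);  /* this thread is known to the collector */
+    }
+
+    int main(void)
+    {
+        pthread_t t;
+
+        GC_INIT();
+        pthread_create(&t, NULL, worker, NULL);  /* wrapped by the GC */
+        pthread_join(t, NULL);
+        return 0;
+    }
+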
+Thread users should also be aware that on many platforms objects reachable
+only from thread-local variables may be prematurely reclaimed. Thus objects
+pointed to by thread-local variables should also be pointed to by a globally
+visible data structure. (This is viewed as a bug, but as one that
+is exceedingly hard to fix without some `libc` hooks.)
+
+**void * `GC_MALLOC`(size_t _nbytes_)** - Allocates and clears _nbytes_
+of storage. Requires (amortized) time proportional to _nbytes_. The resulting
+object will be automatically deallocated when unreferenced. References from
+objects allocated with the system malloc are usually not considered by the
+collector. (See `GC_MALLOC_UNCOLLECTABLE`, however. Building the collector
+with `-DREDIRECT_MALLOC=GC_malloc_uncollectable` is often a way around this.)
+`GC_MALLOC` is a macro which invokes `GC_malloc` by default or, if `GC_DEBUG`
+is defined before `gc.h` is included, a debugging version that checks
+occasionally for overwrite errors, and the like.
+
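+For instance, a small linked-list node (the `struct node` here is purely
+illustrative) can be allocated as follows; dropped nodes are reclaimed
+automatically and never passed to `free`:
+
+    #include "gc.h"
+
+    struct node { struct node *next; int value; };
+
+    int main(void)
+    {
+        struct node *head = NULL;
+        int i;
+
+        GC_INIT();
+        for (i = 0; i < 1000000; ++i) {
+            struct node *n = (struct node *)GC_MALLOC(sizeof *n);
+
+            n->value = i;
+            n->next = head;
+            head = n;
+        }
+        head = NULL;  /* the whole list becomes garbage */
+        return 0;
+    }
+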
+**void * `GC_MALLOC_ATOMIC`(size_t _nbytes_)** - Allocates _nbytes_
+of storage. Requires (amortized) time proportional to _nbytes_. The resulting
+object will be automatically deallocated when unreferenced. The client
+promises that the resulting object will never contain any pointers. The memory
+is not cleared. This is the preferred way to allocate strings, floating point
+arrays, bitmaps, etc. More precise information about pointer locations can be
+communicated to the collector using the interface in `gc_typed.h`.
+
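+For example, a copy of a C string can be placed in pointer-free storage
+as follows (a sketch; a real client might simply use `GC_STRDUP` if the
+build provides it):
+
+    #include <string.h>
+    #include "gc.h"
+
+    /* Copy a string into pointer-free, collectible storage. */
+    char *copy_string(const char *s)
+    {
+        size_t len = strlen(s) + 1;
+        char *result = (char *)GC_MALLOC_ATOMIC(len);
+
+        if (result != NULL)
+            memcpy(result, s, len);  /* the memory is not cleared for us */
+        return result;
+    }
+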
+**void * `GC_MALLOC_UNCOLLECTABLE`(size_t _nbytes_)** - Identical
+to `GC_MALLOC`, except that the resulting object is not automatically
+deallocated. Unlike the system-provided `malloc`, the collector does scan the
+object for pointers to garbage-collectible memory, even if the block itself
+does not appear to be reachable. (Objects allocated in this way are
+effectively treated as roots by the collector.)
+
+**void * `GC_REALLOC`(void * _old_, size_t _new_size_)** - Allocate a new
+object of the indicated size and copy (a prefix of) the old object into the
+new object. The old object is reused in place if convenient. If the original
+object was allocated with `GC_MALLOC_ATOMIC`, the new object is subject to the
+same constraints. If it was allocated as an uncollectible object, then the new
+object is uncollectible, and the old object (if different) is deallocated.
+
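+A growable pointer array might use it along these lines (an illustrative
+sketch; error handling is omitted):
+
+    #include <stddef.h>
+    #include "gc.h"
+
+    struct vec { void **items; size_t len, cap; };
+
+    /* Append an item, doubling the backing storage when it fills up. */
+    void vec_push(struct vec *v, void *item)
+    {
+        if (v->len == v->cap) {
+            size_t new_cap = v->cap != 0 ? 2 * v->cap : 8;
+            size_t new_bytes = new_cap * sizeof(void *);
+
+            if (v->items == NULL)
+                v->items = (void **)GC_MALLOC(new_bytes);
+            else
+                v->items = (void **)GC_REALLOC(v->items, new_bytes);
+            v->cap = new_cap;
+        }
+        v->items[v->len++] = item;
+    }
+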
+**void `GC_FREE`(void * _dead_)** - Explicitly deallocate an object. Typically
+not useful for small collectible objects.
+
+**void * `GC_MALLOC_IGNORE_OFF_PAGE`(size_t _nbytes_)** and
+**void * `GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE`(size_t _nbytes_)** - Analogous
+to `GC_MALLOC` and `GC_MALLOC_ATOMIC`, respectively, except that the client
+guarantees that as long as the resulting object is of use, a pointer
+is maintained to someplace inside the first 512 bytes of the object. This
+pointer should be declared volatile to avoid interference from compiler
+optimizations. (Other nonvolatile pointers to the object may exist as well.)
+This is the preferred way to allocate objects that are likely to be
+more than 100 KB in size. It greatly reduces the risk that such objects will
+be accidentally retained when they are no longer needed. Thus space usage may
+be significantly reduced.
+
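+For example, a large pointer-free buffer might be allocated as follows
+(a sketch; `big_buffer` is an illustrative name). The base pointer is kept
+in a volatile variable for as long as the buffer is needed:
+
+    #include "gc.h"
+
+    /* Points to the start of the object, as the interface requires. */
+    static char * volatile big_buffer;
+
+    void make_buffer(void)
+    {
+        big_buffer = (char *)GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE(10 << 20);
+    }
+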
+**void `GC_INIT()`** - On some platforms, it is necessary to invoke this _from
+the main executable_, _not from a dynamic library_, before the initial
+invocation of a GC routine. It is recommended that this be done in portable
+code, though we try to ensure that it expands to a no-op on as many platforms
+as possible. In GC v7.0, it was required if thread-local allocation is enabled
+in the collector build, and `malloc` is not redirected to `GC_malloc`.
+
+**void `GC_gcollect`(void)** - Explicitly force a garbage collection.
+
+**void `GC_enable_incremental`(void)** - Cause the garbage collector
+to perform a small amount of work every few invocations of `GC_MALLOC` or the
+like, instead of performing an entire collection at once. This is likely
+to increase total running time. It will improve response on a platform that
+either has suitable support in the garbage collector (Linux and most Unix
+versions, Win32 if the collector was suitably built) or if _stubborn_
+allocation is used (see `gc.h`). On many platforms this interacts poorly with
+system calls that write to the garbage collected heap.
+
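+Incremental mode is typically requested once, immediately after collector
+initialization (a minimal sketch):
+
+    #include "gc.h"
+
+    int main(void)
+    {
+        GC_INIT();
+        GC_enable_incremental();  /* trade throughput for shorter pauses */
+        /* ... allocate as usual ... */
+        return 0;
+    }
+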
+**`GC_warn_proc GC_set_warn_proc(GC_warn_proc)`** - Replace the default
+procedure used by the collector to print warnings. The collector may otherwise
+write to `stderr`, most commonly because `GC_malloc` was used in a situation
+in which `GC_malloc_ignore_off_page` would have been more appropriate. See
+`gc.h` for details.
+
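+For example, warnings can be routed through a client-supplied procedure,
+or suppressed entirely. The sketch assumes the `GC_warn_proc` typedef, the
+`GC_CALLBACK` calling-convention macro and the `GC_ignore_warn_proc` helper
+declared in `gc.h`; the message is a printf-style format string taking one
+`GC_word` argument:
+
+    #include <stdio.h>
+    #include "gc.h"
+
+    /* Prefix collector warnings so they are easy to spot in the log. */
+    static void GC_CALLBACK my_warn_proc(char *msg, GC_word arg)
+    {
+        fprintf(stderr, "my-app: GC warning: ");
+        fprintf(stderr, msg, arg);
+    }
+
+    void setup_gc_warnings(int quiet)
+    {
+        GC_set_warn_proc(quiet ? GC_ignore_warn_proc : my_warn_proc);
+    }
+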
+**void `GC_REGISTER_FINALIZER`(...)** - Register a function to be called when
+an object becomes inaccessible. This is often useful as a backup method for
+releasing system resources (e.g. closing files) when the object referencing
+them becomes inaccessible. It is not an acceptable method to perform actions
+that must be performed in a timely fashion. See `gc.h` for details of the
+interface. See also [here](finalization.md) for a more detailed discussion
+of the design.
+
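+A typical use is to close an operating-system resource if the owning object
+is dropped without an explicit close. In this sketch `struct gc_file` and
+`gc_fopen` are illustrative names; `GC_CALLBACK` is the calling-convention
+macro from `gc.h`:
+
+    #include <stdio.h>
+    #include "gc.h"
+
+    struct gc_file { FILE *fp; };
+
+    /* Called by the collector once a struct gc_file is unreachable. */
+    static void GC_CALLBACK close_file(void *obj, void *client_data)
+    {
+        struct gc_file *f = (struct gc_file *)obj;
+
+        (void)client_data;
+        if (f->fp != NULL) fclose(f->fp);
+    }
+
+    struct gc_file *gc_fopen(const char *path, const char *mode)
+    {
+        struct gc_file *f = (struct gc_file *)GC_MALLOC(sizeof *f);
+
+        f->fp = fopen(path, mode);
+        /* A backup only: do not rely on this for timely closing. */
+        GC_REGISTER_FINALIZER(f, close_file, NULL, NULL, NULL);
+        return f;
+    }
+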
+Note that an object may become inaccessible before client code is done
+operating on objects referenced by its fields. Suitable synchronization
+is usually required. See
+[here](http://portal.acm.org/citation.cfm?doid=604131.604153)
+or [here](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for
+details.
+
+If you are concerned with multiprocessor performance and scalability, you
+should consider enabling and using thread local allocation.
+
+If your platform supports it, you should build the collector with parallel
+marking support (`-DPARALLEL_MARK`); configure has it on by default.
+
+If the collector is used in an environment in which pointer location
+information for heap objects is easily available, this can be passed on to the
+collector using the interfaces in either `gc_typed.h` or `gc_gcj.h`.
+
+The collector distribution also includes a **string package** that takes
+advantage of the collector. For details see the `cord.h` file.
+
+# C++ Interface
+
+The C++ interface is implemented as a thin layer on the C interface.
+Unfortunately, this thin layer appears to be very sensitive to variations
+in C++ implementations, particularly since it tries to replace the global
+`::new` operator, something that appears to not be well-standardized. Your
+platform may need minor adjustments in this layer (`gc_cpp.cc`, `gc_cpp.h`,
+and possibly `gc_allocator.h`). Such changes do not require understanding
+of collector internals, though they may require a good understanding of your
+platform. (Patches enhancing portability are welcome. But it is easy to break
+one platform by fixing another.)
+
+Usage of the collector from C++ is also complicated by the fact that there are
+many _standard_ ways to allocate memory in C++. The default `::new` operator,
+default `malloc`, and default STL allocators allocate memory that is not
+garbage collected, and is not normally _traced_ by the collector. This means
+that any pointers in memory allocated by these default allocators will not be
+seen by the collector. Garbage-collectible memory referenced only by pointers
+stored in such default-allocated objects is likely to be reclaimed prematurely
+by the collector.
+
+It is the programmer's responsibility to ensure that garbage-collectible
+memory is referenced by pointers stored in one of
+
+ * Program variables
+ * Garbage-collected objects
+ * Uncollected but _traceable_ objects
+
+Traceable objects are not necessarily reclaimed by the collector, but are
+scanned for pointers to collectible objects. They are usually allocated
+by `GC_MALLOC_UNCOLLECTABLE`, as described above, and through some interfaces
+described below.
+
+On most platforms, the collector may not trace correctly from in-flight
+exception objects. Thus objects thrown as exceptions should only point
+to otherwise reachable memory. This is another bug whose proper repair
+requires platform hooks.
+
+The easiest way to ensure that collectible objects are properly referenced
+is to allocate only collectible objects. This requires that every allocation
+go through one of the following interfaces, each one of which replaces
+a standard C++ allocation mechanism. Note that this requires that all STL
+containers be explicitly instantiated with `gc_allocator`.
+
+**STL allocators**
+
+Recent versions of the collector include a hopefully standard-conforming
+allocator implementation in `gc_allocator.h`. It defines `traceable_allocator`
+and `gc_allocator` which may be used either directly to allocate memory or to
+instantiate container templates. The former allocates uncollectible but traced
+memory. The latter allocates garbage-collected memory.
+
+These should work with any fully standard-conforming C++ compiler.
+
+Users of the [SGI extended STL](http://www.sgi.com/tech/stl) or its
+derivatives (including most g++ versions) may instead be able to include
+`new_gc_alloc.h` before including STL header files. This is increasingly
+discouraged.
+
+This defines SGI-style allocators
+
+ * `alloc`
+ * `single_client_alloc`
+ * `gc_alloc`
+ * `single_client_gc_alloc`
+
+The first two allocate uncollectible but traced memory, while the second two
+allocate collectible memory. The `single_client_...` versions are not safe for
+concurrent access by multiple threads, but are faster.
+
+See sample code [here](http://www.hboehm.info/gc/gc_alloc_exC.txt).
+
+**Class inheritance based interface for new-based allocation**
+
+Users may include `gc_cpp.h` and then cause members of classes to be allocated
+in garbage collectible memory by having those classes inherit from class `gc`.
+For details see the `gc_cpp.h` file.
+
+Linking against `libgccpp` in addition to the `gc` library overrides `::new`
+(and friends) to allocate traceable but uncollectible memory, making
+it safe to refer to collectible objects from the resulting memory.
+
+**C interface**
+
+It is also possible to use the C interface from `gc.h` directly. On platforms
+which use `malloc` to implement `::new`, it should usually be possible to use
+a version of the collector that has been compiled as a `malloc` replacement.
+It is also possible to replace `::new` and other allocation functions
+suitably, as is done by `libgccpp`.
+
+Note that user-implemented small-block allocation often works poorly with
+an underlying garbage-collected large block allocator, since the collector has
+to view all objects accessible from the user's free list as reachable. This
+is likely to cause problems if `GC_MALLOC` is used with something like the
+original HP version of STL. This approach works well with the SGI versions
+of the STL only if the `malloc_alloc` allocator is used.
<body>
<table bgcolor="#f0f0ff" cellpadding="10%">
<tbody><tr>
- <td><a href="gcinterface.html">Interface Overview</a></td>
+ <td><a href="gcinterface.md">Interface Overview</a></td>
<td><a href="http://www.hboehm.info/gc/04tutorial.pdf">Tutorial Slides</a></td>
<td><a href="http://www.hboehm.info/gc/faq.html">FAQ</a></td>
<td><a href="simple_example.md">Example</a></td>
to facilitate easier interoperation with C libraries, or
just prefer the simple collector interface.
For a more detailed description of the interface, see
-<a href="gcinterface.html">here</a>.
+<a href="gcinterface.md">here</a>.
</p><p>
Alternatively, the garbage collector may be used as
a <a href="leak.md">leak detector</a>
See the README and
<tt>gc.h</tt> files in the distribution for more details.
<p>
-For an overview of the implementation, see <a href="gcdescr.html">here</a>.
+For an overview of the implementation, see <a href="gcdescr.md">here</a>.
</p><p>
The garbage collector distribution includes a C string
(<a type="text/plain" href="../include/cord.h"><i>cord</i></a>) package that provides
a higher level.</b>
</p><p>
(Some of the lower level details can be found
-<a href="gcdescr.html">here</a>.)
+<a href="gcdescr.md">here</a>.)
</p><p>
The first one is not available
electronically due to copyright considerations. Most of the others are
<a href="simple_example.md">A simple illustration of how to build and
use the collector</a>.
<p>
-<a href="gcinterface.html">Description of alternate interfaces to the
+<a href="gcinterface.md">Description of alternate interfaces to the
garbage collector.</a>
</p><p>
<a href="http://www.hboehm.info/gc/04tutorial.pdf">Slides from an ISMM 2004 tutorial about the GC</a>.
<a href="debugging.md">Some hints on debugging garbage collected
applications.</a>
</p><p>
-<a href="gcdescr.html">An overview of the implementation of the
+<a href="gcdescr.md">An overview of the implementation of the
garbage collector.</a>
</p><p>
<a href="tree.html">The data structure used for fast pointer lookups.</a>