--- /dev/null
+<!DOCTYPE HTML>
+<HEAD>
+<TITLE>Finalization in the Boehm-Demers-Weiser collector </TITLE>
+</HEAD>
+<BODY>
+<H1>Finalization</h1>
+Many garbage collectors provide a facility for executing user code
+just before an object is collected. This can be used to reclaim any
+system resources or non-garbage-collected memory associated with the
+object.
+Experience has shown that this can be a useful facility.
+It is indispensable in cases in which system resources are embedded
+in complex data structures (<I>e.g.</i> file descriptors
+in the <A type="text/plain" HREF="../include/cord.h">cord package</a>).
+<P>
+Our collector provides the necessary functionality through
+<TT>GC_register_finalizer</tt> in
+<A type="text/plain" HREF="../include/gc.h">gc.h</a>, or by
+inheriting from <TT>gc_cleanup</tt>
+in <A type="text/plain" HREF="../include/gc_cpp.h">gc_cpp.h</a>.
+<P>
+However, finalization should not be used in the same way as C++
+destructors. In well-written programs there will typically be
+very few uses of finalization. (Garbage collected programs that
+interact with explicitly memory-managed libraries may be an exception.)
+<P>
+In general the following guidelines should be followed:
+<UL>
+<LI>
+Actions that must be executed promptly do not belong in finalizers.
+They should be handled by explicit calls in the code (or C++
+destructors if you prefer). If you expect the action to occur at
+a specific point, this is probably not hard.
+<LI>
+Finalizers are intended for resource reclamation.
+<LI>
+Scarce system resources should be managed explicitly whenever
+convenient. Use finalizers only as a backup mechanism for the
+cases that would be hard to handle explicitly.
+<LI>
+If scarce resources are managed with finalization, the allocation
+routine for that resource (<I>e.g.</i> open for file handles) should force
+a garbage collection (two if that doesn't suffice) if it finds itself
+short of the resource.
+<LI>
+If extremely scarce resources are managed by finalization (<I>e.g.</i>
+file descriptors on systems which have a limit of 20 open files),
+it may be necessary to introduce a descriptor caching scheme to
+hide the resource limit.
+(<I>E.g.</i>, the program would keep real file descriptors
+for the 20 most recently used logically open files.
+Any other needed files would be closed after saving their state.
+They would then be reopened on demand.
+Finalization would logically close the file, closing the
+real descriptor only if it happened to be cached.)
+Note that most modern systems (<I>e.g.</i> Irix®) allow hundreds or
+thousands of open files, and this is typically not an issue.
+<LI>
+Finalization code may
+be run anyplace an allocation or other call to the collector
+takes place.
+In multi-threaded programs, finalizers have to obey the normal
+locking conventions to ensure safety.
+Code run directly from finalizers should not acquire locks that may
+be held during allocation. This restriction can be easily circumvented
+by registering a finalizer which enqueues the real action for execution
+in a separate thread.
+<P>
+In single-threaded code, it is also often easiest to have finalizers
+queue actions, which are then explicitly run during an
+explicit call by the user's program.
+</ul>
+<H1>Topologically Ordered Finalization</h1>
+Our <A HREF="overview.html">conservative garbage collector</a> supports
+a form of finalization
+(with <TT>GC_register_finalizer</tt>)
+in which objects are finalized in topological
+order. If <I>A</i> points to <I>B</i>, and both are registered for
+finalization, it is guaranteed the <I>A</i> will be finalized first.
+This usually guarantees that finalization procedures see only
+unfinalized objects.
+<P>
+This decision is often questioned, particularly since it has an obvious
+disadvantage. The current implementation finalizes long chains of
+finalizable objects one per collection. This is hard to avoid, since
+the first finalizer invoked may store a pointer to the rest of the chain
+in a global variable, making it accessible again. Or it may mutate the
+rest of the chain.
+<P>
+Cycles involving one or more finalizable objects are never finalized.
+<H1>
+Why topological ordering?
+</h1>
+It is important to keep in mind that the choice of finalization ordering
+matters only in relatively rare cases. In spite of the fact that it has
+received a lot of discussion, it is not one of the more important
+decisions in designing a system. Many, especially smaller, applications
+will never notice the difference. Nonetheless, we believe that topologically
+ordered finalization is the right choice.
+<P>
+To understand the justification, observe that if <I>A</i>s
+finalization procedure does not refer to <I>B</i>, we could fairly easily have
+avoided the dependency. We could have split <I>A</i> into <I>A'</i>
+and <I>A''</i> such that any references to <I>A</i> become references to
+<I>A'</i>, <I>A'</i> points to <I>A''</i> but not vice-versa, only fields
+needed for finalization are stored in <I>A''</i>, and <I>A''</i> is enabled
+for finalization. (<TT>GC_register_disappearing_link</tt> provides an
+alternative mechanism that does not require breaking up objects.)
+<P>
+Thus assume that <I>A</i> actually does need access to <I>B</i> during
+finalization. To make things concrete, assume that <I>B</i> is
+finalizable because it holds a pointer to a C object, which must be
+explicitly deallocated. (This is likely to be one of the most common
+uses of finalization.) If <I>B</i> happens to be finalized first,
+<I>A</i> will see a dangling pointer during its finalization. But a
+principal goal of garbage collection was to avoid dangling pointers.
+<P>
+Note that the client program could enforce topological ordering
+even if the system didn't. A pointer to <I>B</i> could be stored in
+some globally visible place, where it is cleared only by <I>A</i>s
+finalizer. But this puts the burden to ensure safety back on the
+programmer.
+<P>
+With topologically ordered finalization, the programmer
+can fail to split an object, thus leaving an accidental cycle. This
+results in a leak, which is arguably less dangerous than a dangling
+pointer. More importantly, it is <I>much</i> easier to diagnose,
+since the garbage collector would have to go out of its way not to
+notice finalization cycles. It can trivially report them.
+<P>
+Furthermore unordered finalization does not really solve the problem
+of cycles. Consider the above case in which <I>A</i>s
+finalization procedure depends on <I>B</i>, and thus a pointer to <I>B</i>
+is stored in a global data structure, to be cleared by <I>A</i>s finalizer.
+If there is an accidental pointer from <I>B</i> back to <I>A</i>, and
+thus a cycle, neither <I>B</i> nor <I>A</i> will become unreachable.
+The leak is there, just as in the topologically ordered case, but it is
+hidden from easy diagnosis.
+<P>
+A number of alternative finalization orderings have been proposed, e.g.
+based on statically assigned priorities. In our opinion, these are much
+more likely to require complex programming discipline to use in a large
+modular system. (Some of them, e.g. Guardians proposed by Dybvig,
+Bruggeman, and Eby, do avoid some problems which arise in combination
+with certain other collection algorithms.)
+<P>
+Fundamentally, a garbage collector assumes that objects reachable
+via pointer chains may be accessed, and thus should be preserved.
+Topologically ordered finalization simply extends this to object finalization;
+an finalizable object reachable from another finalizer via a pointer chain
+is presumed to be accessible by the finalizer, and thus should not be
+finalized.
+
+<H1>Programming with topological finalization</h1>
+Experience with Cedar has shown that cycles or long chains of finalizable
+objects are typically not a problem.
+Finalizable objects are typically rare.
+There are several ways to reduce spurious dependencies between finalizable
+objects. Splitting objects as discussed above is one technique.
+The collector also provides <TT>GC_register_disappearing_link</tt>, which
+explicitly nils a pointer before determining finalization ordering.
+<P>
+Some so-called "operating systems" fail to clean up some resources associated
+with a process. These resources must be deallocated at all cost before
+process exit whether or not they are still referenced. Probably the best
+way to deal with those is by not relying exclusively on finalization.
+They should be registered in a table of weak pointers (implemented as
+disguised pointers cleared by the finalization procedure that deallocates
+the resource). If any references are still left at process exit, they
+can be explicitly deallocated then.
+
+<H1>Getting around topological finalization ordering</h1>
+There are certain situations in which cycles between finalizable objects are
+genuinely unavoidable. Most notably, C++ compilers introduce self-cycles
+to represent inheritance. <TT>GC_register_finalizer_ignore_self</tt> tells the
+finalization part of the collector to ignore self cycles.
+This is used by the C++ interface.
+<P>
+Finalize.c actually contains an intentionally undocumented mechanism
+for registering a finalizable object with user-defined dependencies.
+The problem is that this dependency information is also used for memory
+reclamation, not just finalization ordering. Thus misuse can result in
+dangling pointers even if finalization doesn't create any.
+The risk of dangling pointers can be eliminated by building the collector
+with -DJAVA_FINALIZATION. This forces objects reachable from finalizers
+to be marked, even though this dependency is not considered for finalization
+ordering.
+
+</body>
+</html>