Complete GC documentation with 'finalization' topic

author Ivan Maidanski <ivmai@mail.ru>

Mon, 26 Dec 2011 07:47:35 +0000 (11:47 +0400)

committer Ivan Maidanski <ivmai@mail.ru>

Mon, 26 Dec 2011 07:47:35 +0000 (11:47 +0400)
diff --git a/Makefile.direct b/Makefile.direct
index a8168b50ffce45bda36f1cc60216f59c4ff16cd7..cfdecfb6069f1a66ee6a6fa76d4a9a04b168a5bc 100644
--- a/Makefile.direct
+++ b/Makefile.direct
@@ -112,7 +112,7 @@ SRCS= $(CSRCS) \
  
  DOC_FILES= README.QUICK TODO doc/README.Mac doc/README.OS2 \
    doc/README.amiga doc/README.cords doc/debugging.html \
-  doc/porting.html doc/overview.html \
+  doc/finalization.html doc/porting.html doc/overview.html \
    doc/README.dj doc/README.hp doc/README.linux doc/README.rs6000 \
    doc/README.sgi doc/README.solaris2 doc/README.uts \
    doc/README.win32 doc/barrett_diagram \
diff --git a/doc/doc.am b/doc/doc.am
index 36e410ba8bacf7e7d801b189c4d8d43869ae02c4..27c4691a64d6ab800b2740480473f055530042fc 100644
--- a/doc/doc.am
+++ b/doc/doc.am
@@ -38,6 +38,7 @@ dist_pkgdata_DATA = \
          doc/README.win64 \
          doc/barrett_diagram \
          doc/debugging.html \
+        doc/finalization.html \
          doc/gc.man \
          doc/gcdescr.html \
          doc/gcinterface.html \
diff --git a/doc/finalization.html b/doc/finalization.html
new file mode 100644
index 0000000..3733c0c
--- /dev/null
+++ b/doc/finalization.html
@@ -0,0 +1,190 @@
+<!DOCTYPE HTML>
+<HEAD>
+<TITLE>Finalization in the Boehm-Demers-Weiser collector </TITLE>
+</HEAD>
+<BODY>
+<H1>Finalization</h1>
+Many garbage collectors provide a facility for executing user code
+just before an object is collected.  This can be used to reclaim any
+system resources or non-garbage-collected memory associated with the
+object.
+Experience has shown that this can be a useful facility.
+It is indispensable in cases in which system resources are embedded
+in complex data structures (<I>e.g.</i> file descriptors
+in the <A type="text/plain" HREF="../include/cord.h">cord package</a>).
+<P>
+Our collector provides the necessary functionality through
+<TT>GC_register_finalizer</tt> in
+<A type="text/plain" HREF="../include/gc.h">gc.h</a>, or by
+inheriting from <TT>gc_cleanup</tt>
+in <A type="text/plain" HREF="../include/gc_cpp.h">gc_cpp.h</a>.
+<P>
+However, finalization should not be used in the same way as C++
+destructors.  In well-written programs there will typically be
+very few uses of finalization.  (Garbage collected programs that
+interact with explicitly memory-managed libraries may be an exception.)
+<P>
+In general the following guidelines should be followed:
+<UL>
+<LI>
+Actions that must be executed promptly do not belong in finalizers.
+They should be handled by explicit calls in the code (or C++
+destructors if you prefer).  If you expect the action to occur at
+a specific point, this is probably not hard.
+<LI>
+Finalizers are intended for resource reclamation.
+<LI>
+Scarce system resources should be managed explicitly whenever
+convenient.  Use finalizers only as a backup mechanism for the
+cases that would be hard to handle explicitly.
+<LI>
+If scarce resources are managed with finalization, the allocation
+routine for that resource (<I>e.g.</i> open for file handles) should force
+a garbage collection (two if that doesn't suffice) if it finds itself
+short of the resource.
+<LI>
+If extremely scarce resources are managed by finalization (<I>e.g.</i>
+file descriptors on systems which have a limit of 20 open files),
+it may be necessary to introduce a descriptor caching scheme to
+hide the resource limit.
+(<I>E.g.</i>, the program would keep real file descriptors
+for the 20 most recently used logically open files.
+Any other needed files would be closed after saving their state.
+They would then be reopened on demand.
+Finalization would logically close the file, closing the
+real descriptor only if it happened to be cached.)
+Note that most modern systems (<I>e.g.</i> Irix&#174;) allow hundreds or
+thousands of open files, and this is typically not an issue.
+<LI>
+Finalization code may
+be run anyplace an allocation or other call to the collector
+takes place.
+In multi-threaded programs, finalizers have to obey the normal
+locking conventions to ensure safety.
+Code run directly from finalizers should not acquire locks that may
+be held during allocation.  This restriction can be easily circumvented
+by registering a finalizer which enqueues the real action for execution
+in a separate thread.
+<P>
+In single-threaded code, it is also often easiest to have finalizers
+queue actions, which are then explicitly run during an
+explicit call by the user's program.
+</ul>
+<H1>Topologically Ordered Finalization</h1>
+Our <A HREF="overview.html">conservative garbage collector</a> supports
+a form of finalization
+(with <TT>GC_register_finalizer</tt>)
+in which objects are finalized in topological
+order.  If <I>A</i> points to <I>B</i>, and both are registered for
+finalization, it is guaranteed that <I>A</i> will be finalized first.
+This usually guarantees that finalization procedures see only
+unfinalized objects.
+<P>
+This decision is often questioned, particularly since it has an obvious
+disadvantage.  The current implementation finalizes long chains of
+finalizable objects one per collection.  This is hard to avoid, since
+the first finalizer invoked may store a pointer to the rest of the chain
+in a global variable, making it accessible again.  Or it may mutate the
+rest of the chain.
+<P>
+Cycles involving one or more finalizable objects are never finalized.
+<H1>
+Why topological ordering?
+</h1>
+It is important to keep in mind that the choice of finalization ordering
+matters only in relatively rare cases.  In spite of the fact that it has
+received a lot of discussion, it is not one of the more important
+decisions in designing a system.  Many, especially smaller, applications
+will never notice the difference.  Nonetheless, we believe that topologically
+ordered finalization is the right choice.
+<P>
+To understand the justification, observe that if <I>A</i>'s
+finalization procedure does not refer to <I>B</i>, we could fairly easily have
+avoided the dependency.  We could have split <I>A</i> into <I>A'</i>
+and <I>A''</i> such that any references to <I>A</i> become references to
+<I>A'</i>, <I>A'</i> points to <I>A''</i> but not vice-versa, only fields
+needed for finalization are stored in <I>A''</i>, and <I>A''</i> is enabled
+for finalization.  (<TT>GC_register_disappearing_link</tt> provides an
+alternative mechanism that does not require breaking up objects.)
+<P>
+Thus assume that <I>A</i> actually does need access to <I>B</i> during
+finalization.  To make things concrete, assume that <I>B</i> is
+finalizable because it holds a pointer to a C object, which must be
+explicitly deallocated.  (This is likely to be one of the most common
+uses of finalization.)  If <I>B</i> happens to be finalized first,
+<I>A</i> will see a dangling pointer during its finalization.  But a
+principal goal of garbage collection was to avoid dangling pointers.
+<P>
+Note that the client program could enforce topological ordering
+even if the system didn't.  A pointer to <I>B</i> could be stored in
+some globally visible place, where it is cleared only by <I>A</i>'s
+finalizer.  But this puts the burden to ensure safety back on the
+programmer.
+<P>
+With topologically ordered finalization, the programmer
+can fail to split an object, thus leaving an accidental cycle.  This
+results in a leak, which is arguably less dangerous than a dangling
+pointer.  More importantly, it is <I>much</i> easier to diagnose,
+since the garbage collector would have to go out of its way not to
+notice finalization cycles.  It can trivially report them.
+<P>
+Furthermore, unordered finalization does not really solve the problem
+of cycles.  Consider the above case in which <I>A</i>'s
+finalization procedure depends on <I>B</i>, and thus a pointer to <I>B</i>
+is stored in a global data structure, to be cleared by <I>A</i>'s finalizer.
+If there is an accidental pointer from <I>B</i> back to <I>A</i>, and
+thus a cycle, neither <I>B</i> nor <I>A</i> will become unreachable.
+The leak is there, just as in the topologically ordered case, but it is
+hidden from easy diagnosis.
+<P>
+A number of alternative finalization orderings have been proposed, e.g.
+based on statically assigned priorities.  In our opinion, these are much
+more likely to require complex programming discipline to use in a large
+modular system.  (Some of them, e.g. Guardians proposed by Dybvig,
+Bruggeman, and Eby, do avoid some problems which arise in combination
+with certain other collection algorithms.)
+<P>
+Fundamentally, a garbage collector assumes that objects reachable
+via pointer chains may be accessed, and thus should be preserved.
+Topologically ordered finalization simply extends this to object finalization;
+a finalizable object reachable from another finalizer via a pointer chain
+is presumed to be accessible by the finalizer, and thus should not be
+finalized.
+
+<H1>Programming with topological finalization</h1>
+Experience with Cedar has shown that cycles or long chains of finalizable
+objects are typically not a problem.
+Finalizable objects are typically rare.
+There are several ways to reduce spurious dependencies between finalizable
+objects.  Splitting objects as discussed above is one technique.
+The collector also provides <TT>GC_register_disappearing_link</tt>, which
+explicitly nils a pointer before determining finalization ordering.
+<P>
+Some so-called "operating systems" fail to clean up some resources associated
+with a process.  These resources must be deallocated at all cost before
+process exit whether or not they are still referenced.  Probably the best
+way to deal with those is by not relying exclusively on finalization.
+They should be registered in a table of weak pointers (implemented as
+disguised pointers cleared by the finalization procedure that deallocates
+the resource).  If any references are still left at process exit, they
+can be explicitly deallocated then.
+
+<H1>Getting around topological finalization ordering</h1>
+There are certain situations in which cycles between finalizable objects are
+genuinely unavoidable.  Most notably, C++ compilers introduce self-cycles
+to represent inheritance.  <TT>GC_register_finalizer_ignore_self</tt> tells the
+finalization part of the collector to ignore self cycles.
+This is used by the C++ interface.
+<P>
+<TT>finalize.c</tt> actually contains an intentionally undocumented mechanism
+for registering a finalizable object with user-defined dependencies.
+The problem is that this dependency information is also used for memory
+reclamation, not just finalization ordering.  Thus misuse can result in
+dangling pointers even if finalization doesn't create any.
+The risk of dangling pointers can be eliminated by building the collector
+with -DJAVA_FINALIZATION.  This forces objects reachable from finalizers
+to be marked, even though this dependency is not considered for finalization
+ordering.
+
+</body>
+</html>
diff --git a/doc/gcdescr.html b/doc/gcdescr.html
index 08ca2a8f3631238103d6a89854bb28fef66648ff..f56b1fcf0d777edb77954d58dee142332361ccec 100644
--- a/doc/gcdescr.html
+++ b/doc/gcdescr.html
@@ -407,7 +407,7 @@ object itself becomes marked, we have uncovered
  a cycle involving the object.  This usually results in a warning from the
  collector.  Such objects are not finalized, since it may be
  unsafe to do so.  See the more detailed
-<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
+<A HREF="finalization.html"> discussion of finalization semantics</a>.
  <P>
  Any objects remaining unmarked at the end of this process are added to
  a queue of objects whose finalizers can be run.  Depending on collector
diff --git a/doc/gcinterface.html b/doc/gcinterface.html
index 83574bb8df4084ab8137666f10fb2ee923310b16..eaa038c4a8a90fcd585df8e01dfe9e6c0ce20537 100644
--- a/doc/gcinterface.html
+++ b/doc/gcinterface.html
@@ -145,7 +145,7 @@ inaccessible.
  It is not an acceptable method to perform actions that must be performed
  in a timely fashion.
  See <A type="text/plain" HREF="../include/gc.h">gc.h</a> for details of the interface.
-See <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html">here</a> for a more detailed discussion
+See <A HREF="finalization.html">here</a> for a more detailed discussion
  of the design.
  <P>
  Note that an object may become inaccessible before client code is done
diff --git a/doc/overview.html b/doc/overview.html
index de589235c39e65691841b7c1774cd901a2059649..80d5f3122764b769f3faf220878be1d566fd7eaf 100644
--- a/doc/overview.html
+++ b/doc/overview.html
@@ -137,7 +137,7 @@ It provides incremental and generational
  collection under operating systems which provide the right kind of
  virtual memory support.  (Currently this includes SunOS[45], IRIX,
  OSF/1, Linux, and Windows, with varying restrictions.)
-It allows <a href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"><i>finalization</i></a> code
+It allows <a href="finalization.html"><i>finalization</i></a> code
  to be invoked when an object is collected.
  It can take advantage of type information to locate pointers if such
  information is provided, but it is usually used without such information.
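
The new doc/finalization.html recommends that finalizers only enqueue the real action, which the program then runs at an explicit, safe point. A minimal sketch of that pattern might look like the following. The names `file_obj`, `enqueue_finalizer`, `run_pending` and `demo` are hypothetical and not part of the collector. The finalizer has the two-argument shape of `GC_finalization_proc` from gc.h, but here it is called directly as a stand-in for the collector, so the sketch builds without libgc; in real use it would presumably be passed to `GC_register_finalizer` instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical finalizable object wrapping a scarce resource. */
struct file_obj {
    int fd;
};

/* One deferred cleanup action recorded by a finalizer. */
struct pending {
    int fd;
    struct pending *next;
};

static struct pending *queue_head = NULL;

/* Finalizer with the (object, client data) shape used by gc.h.
   It only records the work: it releases nothing itself.
   The queue node comes from malloc, not from the collector. */
static void enqueue_finalizer(void *obj, void *client_data)
{
    struct pending *p;

    (void)client_data;
    assert(obj != NULL);
    p = malloc(sizeof *p);
    if (p == NULL)
        return;                      /* drop the action if out of memory */
    p->fd = ((struct file_obj *)obj)->fd;
    p->next = queue_head;
    queue_head = p;
}

/* Run by the program at a point of its own choosing.
   Returns the number of actions performed. */
static int run_pending(void)
{
    int count = 0;

    while (queue_head != NULL) {
        struct pending *p = queue_head;
        queue_head = p->next;
        printf("releasing resource %d\n", p->fd);
        free(p);
        count++;
    }
    return count;
}

/* Assumption: these direct calls stand in for the collector
   invoking registered finalizers during some allocation. */
static int demo(void)
{
    struct file_obj a = { 3 };
    struct file_obj b = { 4 };

    enqueue_finalizer(&a, NULL);
    enqueue_finalizer(&b, NULL);
    return run_pending();
}
```

Because the finalizer itself does no more than link a node into a list, the potentially slow or lock-taking work happens only inside `run_pending`, which the program calls when it knows no allocation is in progress. A multi-threaded variant would need to protect the queue with a lock that is never held during allocation.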