granicus.if.org Git - graphviz/commit

fix gvpr corruption of dynamically allocated arguments to user-defined functions

Gvpr programs can define their own functions which can then be called within the
same program:

  void foo(string s) {
    print(s);
  }

  foo("bar");

This mostly worked. However in some cases the gvpr implementation was not
extending the lifetime of the memory allocated to store the passed in value
long enough. Enumerating the cases in which this occurred is complicated because
whether this (used-after-free) memory retained its intended content depended on
(1) the complexity of the expression of the passed in value and (2) what the
target function (`foo` in the above example) was itself doing. As a result, it
seems users mostly did not observe the problem (program/output corruption)
unless they were writing non-trivial functions and calling them with non-trivial
expressions.

Commit 8da53964edec8a665c3996d483df243eb150c2c4 compounded the above problem by
replacing the underlying allocator. While both before and after states use an
arena allocator,¹ the allocator after this change eagerly returns memory to the
backing system allocator (`malloc`) on `vmclear` while the allocator before this
change retained it within its own internal pool. The system allocator is used
much more pervasively in the Graphviz code base than the more tightly scoped
lib/vmalloc allocator, and it also typically does much more aggressive reuse of
recently-freed memory under the assumption that this is more likely to still be
cache-resident and thus faster to access. The net effect of this was that the
chance of the memory in question being reused and overwritten significantly
increased, making a number of latent cases of the problem described above now
user-visible.

The fix in this commit removes the freeing of expressions that are still
potentially in use. The contents of a subexpression in the above described
scenarios now remains intact up to the point it is accessed when evaluating its
parent containing expression.

The astute reader who has followed everything up to now may notice that the
subexpressions’ contents are actually maintained _beyond_ the point of
evaluation of the parent expression, and may be wondering, “didn’t you just turn
a use-after-free into a memory leak?” Unfortunately the answer is yes. However,
it is unclear how to determine when it is safe to free a subexpression without
introducing a more complex concept of call stacks and arbitrarily nested
expressions to lib/expr. Thus given the choice right now between use-after-free
or leaking memory, we are choosing to leak memory. Hopefully this can be
revisited in future.

Gitlab: fixes #2185

¹ https://en.wikipedia.org/wiki/Region-based_memory_management

author	Matthew Fernandez <matthew.fernandez@gmail.com>
	Sat, 26 Feb 2022 20:01:24 +0000 (12:01 -0800)
committer	Matthew Fernandez <matthew.fernandez@gmail.com>
	Sat, 5 Mar 2022 04:28:41 +0000 (20:28 -0800)
commit	734f29cc85f2b3292536c7ef5dcd6e6e00010563
tree	b079bece79bd88dcd36f967b0af8575099bb19c8	tree \| snapshot
parent	9141310423e8950883334371519fbedf0d5c6429	commit \| diff

CHANGELOG.md		diff \| blob \| history
lib/expr/exeval.c		diff \| blob \| history
rtest/test_regression.py		diff \| blob \| history