From 734f29cc85f2b3292536c7ef5dcd6e6e00010563 Mon Sep 17 00:00:00 2001 From: Matthew Fernandez Date: Sat, 26 Feb 2022 12:01:24 -0800 Subject: [PATCH] fix gvpr corruption of dynamically allocated arguments to user-defined functions MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Gvpr programs can define their own functions which can then be called within the same program: void foo(string s) { print(s); } foo("bar"); This mostly worked. However in some cases the gvpr implementation was not extending the lifetime of the memory allocated to store the passed in value long enough. Enumerating the cases in which this occurred is complicated because whether this (used-after-free) memory retained its intended content depended on (1) the complexity of the expression of the passed in value and (2) what the target function (`foo` in the above example) was itself doing. As a result, it seems users mostly did not observe the problem (program/output corruption) unless they were writing non-trivial functions and calling them with non-trivial expressions. Commit 8da53964edec8a665c3996d483df243eb150c2c4 compounded the above problem by replacing the underlying allocator. While both before and after states use an arena allocator,¹ the allocator after this change eagerly returns memory to the backing system allocator (`malloc`) on `vmclear` while the allocator before this change retained it within its own internal pool. The system allocator is used much more pervasively in the Graphviz code base than the more tightly scoped lib/vmalloc allocator, and it also typically does much more aggressive reuse of recently-freed memory under the assumption that this is more likely to still be cache-resident and thus faster to access. The net effect of this was that the chance of the memory in question being reused and overwritten significantly increased, making a number of latent cases of the problem described above now user-visible. The fix in this commit removes the freeing of expressions that are still potentially in use. The contents of a subexpression in the above described scenarios now remains intact up to the point it is accessed when evaluating its parent containing expression. The astute reader who has followed everything up to now may notice that the subexpressions’ contents are actually maintained _beyond_ the point of evaluation of the parent expression, and may be wondering, “didn’t you just turn a use-after-free into a memory leak?” Unfortunately the answer is yes. However, it is unclear how to determine when it is safe to free a subexpression without introducing a more complex concept of call stacks and arbitrarily nested expressions to lib/expr. Thus given the choice right now between use-after-free or leaking memory, we are choosing to leak memory. Hopefully this can be revisited in future. Gitlab: fixes #2185 ¹ https://en.wikipedia.org/wiki/Region-based_memory_management --- CHANGELOG.md | 4 ++++ lib/expr/exeval.c | 1 - rtest/test_regression.py | 9 --------- 3 files changed, 4 insertions(+), 10 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9d8c1c62d..ec3c3f614 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `agcanon`, `agcanonStr`, and `agwrite` now return error values on memory allocation failures instead of crashing or corrupting data +- `gvpr` programs can now pass dynamically allocated arguments to user-defined + functions without corrupting their content. Some cases of this were a + regression in Graphviz 2.46.0. Other cases have existed since the first + release of `gvpr`. #2185 ## [3.0.0] – 2022-02-26 diff --git a/lib/expr/exeval.c b/lib/expr/exeval.c index 77bc82bc5..943b6cc49 100644 --- a/lib/expr/exeval.c +++ b/lib/expr/exeval.c @@ -2056,7 +2056,6 @@ exeval(Expr_t* ex, Exnode_t* expr, void* env) { Extype_t v; - vmclear(ex->ve); if (expr->compiled.integer) { switch (expr->type) diff --git a/rtest/test_regression.py b/rtest/test_regression.py index 47cd0723b..a69fc73be 100644 --- a/rtest/test_regression.py +++ b/rtest/test_regression.py @@ -1376,9 +1376,6 @@ def test_2179(): assert "Warning: no hard-coded metrics for" not in stderr, \ "incorrect warning triggered" -# FIXME: this has been root caused to 84b2983edf458098bb6233368904265c92da4e65 -# but a fix has not yet been devised -@pytest.mark.xfail() def test_2185_1(): """ GVPR should deal with strings correctly @@ -1394,7 +1391,6 @@ def test_2185_1(): stdin=subprocess.DEVNULL, universal_newlines=True) -@pytest.mark.xfail() def test_2185_2(): """ GVPR should deal with strings correctly @@ -1419,9 +1415,6 @@ def test_2185_2(): # check the first line is as expected assert out.startswith(expected), "incorrect GVPR interpretation" -# FIXME: like `test_2185_1`, it is believed that -# 84b2983edf458098bb6233368904265c92da4e65 is the root cause of this -@pytest.mark.xfail() def test_2185_3(): """ GVPR should deal with strings correctly @@ -1446,7 +1439,6 @@ def test_2185_3(): # check the first two lines are as expected assert out.startswith(expected), "incorrect GVPR interpretation" -@pytest.mark.xfail() # FIXME def test_2185_4(): """ GVPR should deal with strings correctly @@ -1471,7 +1463,6 @@ def test_2185_4(): # check the first three lines are as expected assert out.startswith(expected), "incorrect GVPR interpretation" -@pytest.mark.xfail() # FIXME def test_2185_5(): """ GVPR should deal with strings correctly -- 2.40.0