Proposal: Backward-edge CFI for return statements (RCFI)

author Kostya Serebryany <kcc@google.com>

Mon, 20 Mar 2017 20:42:00 +0000 (20:42 +0000)

committer Kostya Serebryany <kcc@google.com>

Mon, 20 Mar 2017 20:42:00 +0000 (20:42 +0000)
diff --git a/docs/ControlFlowIntegrityDesign.rst b/docs/ControlFlowIntegrityDesign.rst

index 552e8919796e1130ddbe5bf409c65b42d059fd70..69b72f9ea5b2e33364c6f5bbaef5171d9b1e0133 100644
--- a/docs/ControlFlowIntegrityDesign.rst
+++ b/docs/ControlFlowIntegrityDesign.rst
@@ -498,12 +498,100 @@ In non-PIE executables the address of an external function (taken from
  the main executable) is the address of that function’s PLT record in
  the main executable. This would break the CFI checks.
  
+Backward-edge CFI for return statements (RCFI)
+==============================================
+
+This section is a proposal. As of March 2017 it is not implemented.
+
+Backward-edge control flow (`RET` instructions) can be hijacked
+by overwriting the return address (`RA`) on the stack.
+Various mitigation techniques (e.g. `SafeStack`_, `RFG`_, `Intel CET`_)
+try to detect or prevent `RA` corruption on the stack.
+
+RCFI enforces the expected control flow in several ways, described below.
+RCFI relies heavily on LTO, which gives it visibility into all call sites within the LTO unit.
+
+Leaf Functions
+--------------
+If `f()` is a leaf function (i.e. it makes no calls, except possibly to no-return functions),
+it can be called using a special calling convention
+that stores `RA` in a dedicated register `R` before the `CALL` instruction.
+`f()` does not spill `R` and does not use the `RET` instruction;
+instead, it uses the value in `R` to `JMP` to `RA`.
+
+This flavour of CFI is *precise*, i.e. the function is guaranteed to return
+to the point exactly following the call.
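The leaf-function convention can be simulated in C for illustration. This is a minimal sketch, not an implementation: it assumes the GNU C "labels as values" extension (`&&label`, `goto *ptr`), which GCC and Clang support, models the register `R` as a local variable, and models `f()` as a labeled block inside one C function. The name `leaf_convention_demo` is hypothetical. A real implementation would be a calling convention in the compiler backend.

```c
#include <assert.h>

/* Hypothetical simulation of the proposed leaf-function convention:
 * the "caller" stores the return address in R before transferring control,
 * and the leaf "callee" returns by jumping through R. No return address is
 * read back from memory that an attacker could overwrite. */
int leaf_convention_demo(int x) {
  void *R;        /* stands in for the dedicated register R */
  int arg;        /* argument of the simulated leaf function f() */
  int result = 0; /* return value of the simulated leaf function f() */
  int sum = 0;

  arg = x;
  R = &&after_call_1; /* store RA in R, then "call" f() */
  goto leaf_f;
after_call_1:
  sum += result;

  arg = x + 1;
  R = &&after_call_2; /* a second call site with a different RA */
  goto leaf_f;
after_call_2:
  sum += result;
  return sum;

leaf_f:               /* body of the leaf function f(): makes no calls */
  result = arg * 2;
  goto *R;            /* replaces RET: JMP to the address held in R */
}
```

For example, `leaf_convention_demo(3)` runs the simulated `f()` twice (with arguments 3 and 4) and returns 6 + 8 = 14; each "return" lands exactly after the call site that set `R`.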
+
+An alternative approach is to
+copy `RA` from the stack to `R` in the first instruction of `f()`
+and, on return, `JMP` to `R` instead of executing `RET`.
+This approach is simpler to implement (it does not require changing the caller)
+but weaker (there is a small window during which `RA` is stored on the stack).
+
+
+Functions called once
+---------------------
+Suppose `f()` is called from just one place in the program
+(assuming we can verify this in LTO mode).
+In this case we can replace the `RET` instruction with a direct `JMP`
+to the constant address `RA`.
+This *precisely* enforces the return control flow no matter what is stored on the stack.
+
+Another variant is to compare `RA` on the stack with the known constant and abort
+if they don't match, then `JMP` to the known constant address.
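Both variants for a function with a single call site can be sketched as follows. The sketch models machine addresses as `uintptr_t` values, and each function returns the address control would be transferred to instead of actually jumping. `F_ONLY_RA`, its value, and the function names are hypothetical; in a real build the constant would be resolved at link time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constant: the single return address of f(), known because
 * LTO proved that f() has exactly one call site. */
#define F_ONLY_RA ((uintptr_t)0x401234u)

/* Variant 1: RET becomes a direct JMP to the constant; the stack is not read. */
uintptr_t rcfi_ret_once(void) { return F_ONLY_RA; }

/* Variant 2: compare the RA loaded from the stack with the constant, abort
 * on mismatch, then JMP to the constant (never to the loaded value). */
uintptr_t rcfi_ret_once_checked(uintptr_t ra_on_stack) {
  if (ra_on_stack != F_ONLY_RA)
    abort(); /* RA on the stack was corrupted */
  return F_ONLY_RA;
}
```

Variant 2 costs an extra load and compare, but it turns silent redirection into a detectable crash when the stack slot was corrupted.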
+
+Functions called in a small number of call sites
+------------------------------------------------
+We may extend the above approach to cases where `f()`
+is called more than once (but still from a small number of call sites).
+With LTO we know all possible values of `RA`, so we check them
+one by one (or using binary search) against the value on the stack.
+If a match is found, we `JMP` to the matching constant address; otherwise we abort.
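The binary-search variant of this check can be sketched as follows. This is a minimal model: addresses are `uintptr_t` values, the final `JMP` is modeled by returning the target address, and 0 serves as a "not found" sentinel because it is never a valid return address. The table `f_valid_ras`, its contents, and the function names are hypothetical; the table must be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: every return address of f(), collected by LTO and sorted. */
static const uintptr_t f_valid_ras[] = {0x401010u, 0x4020a4u, 0x403f00u,
                                        0x405c18u};
#define F_NUM_RAS (sizeof f_valid_ras / sizeof f_valid_ras[0])

/* Binary search over the sorted return sites; returns the matching
 * constant, or 0 if `ra` is not a valid return address of f(). */
uintptr_t rcfi_lookup_ra(uintptr_t ra, const uintptr_t *sites, size_t n) {
  size_t lo = 0, hi = n; /* search the half-open range [lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (sites[mid] == ra)
      return sites[mid];
    if (sites[mid] < ra)
      lo = mid + 1;
    else
      hi = mid;
  }
  return 0;
}

/* Models the epilogue that replaces RET in f(): check the RA loaded from
 * the stack, abort on mismatch, otherwise "JMP" to the matching constant. */
uintptr_t rcfi_ret_few(uintptr_t ra_on_stack) {
  uintptr_t target = rcfi_lookup_ra(ra_on_stack, f_valid_ras, F_NUM_RAS);
  if (target == 0)
    abort();
  return target;
}
```

In generated code the search could instead be emitted as a tree of compare-and-branch instructions, each leaf ending in a direct `JMP` to a constant, so no data table would be needed at run time.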
+
+This protection is *near-precise*, i.e. it guarantees that the control flow will
+be transferred to one of the valid return addresses for this function,
+but not necessarily to the point following the most recent `CALL`.
+
+General case
+------------
+For functions called from many call sites, a *return jump table* is constructed
+in the same manner as the jump tables for indirect function calls (see above).
+The caller passes the correct jump table entry (or its index) to `f()`
+as an extra argument, and `f()` spills it to the stack.
+The `RET` instruction is replaced with a load of the jump table entry,
+a jump table range check, and a `JMP` to the jump table entry.
+
+This protection is also *near-precise*.
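The index variant of the return jump table can be sketched as follows. This is a minimal model: addresses are `uintptr_t` values and the final `JMP` is modeled by returning the table entry. The table contents, `rcfi_ret_table`, `f_instrumented`, and the choice to pass an index rather than an entry address are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical return jump table for f(): entry i holds the return address
 * of the i-th call site of f(), as laid out by LTO. */
static const uintptr_t f_ret_table[] = {0x401010u, 0x4020a4u, 0x403f00u};
#define F_RET_TABLE_SIZE (sizeof f_ret_table / sizeof f_ret_table[0])

/* Models the code that replaces RET: `spilled_idx` is the value reloaded
 * from the stack, so it may have been corrupted. */
uintptr_t rcfi_ret_table(size_t spilled_idx) {
  if (spilled_idx >= F_RET_TABLE_SIZE) /* jump table range check */
    abort();
  return f_ret_table[spilled_idx]; /* "JMP" to the jump table entry */
}

/* Hypothetical instrumented f(): the caller passes the index of its own
 * call site as an extra argument, and f() spills it to its stack frame. */
uintptr_t f_instrumented(int x, size_t ret_idx, int *out) {
  volatile size_t spilled = ret_idx; /* spill to the stack */
  *out = x + 1;                      /* the actual work of f() */
  return rcfi_ret_table(spilled);    /* load, range check, jump */
}
```

An attacker who overwrites the spilled index with another in-range value still reaches a valid return site of `f()`, only possibly the wrong one; this is why the scheme is near-precise rather than precise.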
+
+Returns from functions called indirectly
+----------------------------------------
+
+If a function is called indirectly, the return jump table is constructed for the
+equivalence class of functions instead of a single function.
+
+Cross-DSO calls
+---------------
+Consider two instrumented DSOs, `A` and `B`. `A` defines `f()` and `B` calls it.
+
+This case will be handled similarly to the forward-edge cross-DSO CFI scheme, using the slow path callback.
+
+Non-goals
+---------
+
+RCFI does not protect `RET` instructions:
+
+* in non-instrumented DSOs,
+* in instrumented DSOs for functions that are called from non-instrumented DSOs,
+* embedded into other instructions (e.g. the trailing byte `c3` of `0f4fc3 cmovg %ebx,%eax`,
+  which decodes as `RET` if execution starts at that byte).
+
+.. _SafeStack: https://clang.llvm.org/docs/SafeStack.html
+.. _RFG: http://xlab.tencent.com/en/2016/11/02/return-flow-guard
+.. _Intel CET: https://software.intel.com/en-us/blogs/2016/06/09/intel-release-new-technology-specifications-protect-rop-attacks
  
  Hardware support
  ================
  
  We believe that the above design can be efficiently implemented in hardware.
-A single new instruction added to an ISA would allow to perform the CFI check
+A single new instruction added to an ISA would make it possible to perform the forward-edge CFI check
  with fewer bytes per check (smaller code size overhead) and potentially more
  efficiently. The current software-only instrumentation requires at least
  32-bytes per check (on x86_64).