David Blaikie [Tue, 11 Dec 2018 00:09:06 +0000 (00:09 +0000)]
llvm-objcopy: Improve/simplify llvm::Error handling during notes iteration
Using an Error as an out parameter from an indirect operation like
iteration as described in the documentation (
http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
) seems to be a little fussy - so here's /one/ possible solution, though
I'm not sure it's the right one.
Alternatively such APIs may be better off being switched to a standard
algorithm style, where they take a lambda to do the iteration work that
is then called back into (eg: "Error e = obj.for_each_note([](const
Note& N) { ... });"). This would be safer than having an unwritten
assumption that the user of such an iteration cannot return early from
the inside of the function - and must always exit through the gift
shop... I mean error checking. (even though it's guaranteed that if
you're mid-way through processing an iteration, it's not in an error
state).
Alternatively we'd need some other (the super untrustworthy/thing we've
generally tried to avoid) error handling primitive that actually clears
the error state entirely so it's safe to ignore.
Fleshed this solution out a bit further during review - it now relies on
op==/op!= comparison as the equivalent to "if (Err)" testing the Error.
So just like an Error must be checked (even if it's in a success state),
the Error hiding in the iterator must be checked after each increment
(including by comparison with another iterator - perhaps this could be
constrained to only checking if the iterator is compared to the end
iterator? Not sure it's too important).
So now even just creating the iterator and not incrementing it at all
should still assert because the Error has not been checked.
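For reference, a rough sketch of the two styles discussed above, assuming a
hypothetical notes(Error&) fallible range and a hypothetical for_each_note
callback API (neither name is guaranteed to match the committed code):

```
#include "llvm/Support/Error.h"
using namespace llvm;

// Style 1: fallible iterator range, per the ProgrammersManual link above. The
// Error out-parameter (or the iterator comparison standing in for it) must be
// checked before it is destroyed, and the loop body must not return early.
template <typename ObjT, typename FnT>
Error visitNotesViaRange(ObjT &Obj, FnT Fn) {
  Error Err = Error::success();
  for (const auto &Note : Obj.notes(Err)) // hypothetical fallible range
    Fn(Note);
  return Err; // success, or the error that ended the iteration
}

// Style 2: the algorithm/callback alternative mentioned above.
template <typename ObjT, typename FnT>
Error visitNotesViaCallback(ObjT &Obj, FnT Fn) {
  return Obj.for_each_note(Fn); // hypothetical callback-style API
}
```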
David Blaikie [Mon, 10 Dec 2018 22:44:48 +0000 (22:44 +0000)]
debuginfo: Use symbol difference for CU length to simplify assembly reading/editing
Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's assembly output (GCC's does too, FWIW) includes a hardcoded length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer).
[Hexagon] Couple of fixes in optimize addressing mode
- Check if an operand is an immediate before calling getImm (see the sketch
after this list). Some operands that take constant values can actually have
global symbols or other constant expressions.
- When a load-constant instruction can be folded into users, make sure to
only delete it when all users have been successfully converted.
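A minimal sketch of the first point above, using the generic MachineOperand
API (the surrounding names are illustrative, not the actual Hexagon pass code):

```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include <cstdint>
using namespace llvm;

// Returns true and sets Val only when the operand really is an immediate; it
// could instead be a global address or some other constant expression.
static bool getPlainImm(const MachineInstr &MI, unsigned OpIdx, int64_t &Val) {
  const MachineOperand &MO = MI.getOperand(OpIdx);
  if (!MO.isImm())
    return false;
  Val = MO.getImm();
  return true;
}
```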
JF Bastien [Mon, 10 Dec 2018 19:27:38 +0000 (19:27 +0000)]
APFloat: allow 64-bit of payload
Summary: The APFloat and Constant APIs taking an APInt allow arbitrary payloads,
and that's great. There's a convenience API which takes an unsigned, and that's
silly because it then directly creates a 64-bit APInt. Just change it to 64-bits
directly.
At the same time, add ConstantFP NaN getters which match the APFloat ones (with
getQNaN / getSNaN and APInt parameters).
Improve the APFloat testing to set more payload bits.
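As a rough illustration of why the wider payload matters (a sketch, not the
test code from the patch):

```
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Build a quiet NaN whose payload needs more than 32 bits; the old
// unsigned-taking convenience API could not express this directly.
APFloat makeQNaNWithWidePayload() {
  APInt Payload(64, 0x0000123456789ABCULL); // fits in a double's payload bits
  return APFloat::getQNaN(APFloat::IEEEdouble(), /*Negative=*/false, &Payload);
}
```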
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit tests and I've only been able to expose this with the D55511 change, I'm committing this now.
Neil Henning [Mon, 10 Dec 2018 16:35:53 +0000 (16:35 +0000)]
[AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D.
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3D workloads to flush the entire l1 cache instead of just the
volatile lines.
Petr Pavlu [Mon, 10 Dec 2018 15:15:05 +0000 (15:15 +0000)]
[GlobalISel] Set stack protector index when translating Intrinsic::stackprotector
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector similarly as is done by SelectionDAG when
processing the same intrinsic.
Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.
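A hedged sketch of the state being recorded (an illustrative helper, not the
actual IRTranslator change):

```
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
using namespace llvm;

// Record which frame index holds the stack protector so Prologue/Epilogue
// Insertion knows protection is enabled and can lay the slot out before locals.
static void recordStackProtectorSlot(MachineFunction &MF, int FrameIdx) {
  MF.getFrameInfo().setStackProtectorIndex(FrameIdx);
}
```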
[mips][mc] Emit R_{MICRO}MIPS_JALR when expanding jal to jalr
When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR
for micromips). The linker might then be able to turn jalr into a direct
call.
Add '-mips-jalr-reloc' to enable/disable this feature (default is true).
Tim Corringham [Mon, 10 Dec 2018 12:06:10 +0000 (12:06 +0000)]
[AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.
Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.
The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.
Jeremy Morse [Mon, 10 Dec 2018 12:04:08 +0000 (12:04 +0000)]
[DebugInfo] Don't drop dbg.value's of nullptr
Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG --
apparently just because of an oversight when recognising Values that are
constant (see PR39787). This patch adds ConstantPointerNull to the list of
constants that can be turned into DBG_VALUEs.
The question of what bit-value a null pointer constant has in LLVM was raised
in this mailing list thread:
Jeremy Morse [Mon, 10 Dec 2018 11:20:47 +0000 (11:20 +0000)]
[DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.
The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
* The corresponding SDNode is now invalid
* This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
* The SDNode has been invalidated and we should emit "DBG_VALUE undef"
* The SDNode has been invalidated but the debug data was salvaged, don't
emit anything for this SDDbgValue
* This SDDbgValue has been emitted
This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.
The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
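A small stand-alone illustration of the difference (values chosen purely for
demonstration):

```
#include <cstdint>
#include <cstdlib>
#include <iostream>

int main() {
  // Displacements with differing signs, as described above.
  int64_t LdDisp1 = -8, LdDisp2 = 8;
  int64_t Old = std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)); // 0  -> bytes skipped
  int64_t New = LdDisp2 - LdDisp1;                               // 16 -> full size
  std::cout << "old=" << Old << " new=" << New << "\n";
  return 0;
}
```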
Craig Topper [Mon, 10 Dec 2018 06:58:58 +0000 (06:58 +0000)]
[CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register?
Craig Topper [Mon, 10 Dec 2018 06:07:50 +0000 (06:07 +0000)]
[X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic.
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time, so I've given the new merged intrinsic a slightly different name. I'm skipping autoupgrading from the earlier signature that was new in 8.0. I've also renamed the subborrow intrinsic for consistency.
Armando Montanez [Mon, 10 Dec 2018 02:36:33 +0000 (02:36 +0000)]
[TextAPI][elfabi] Make TBE handlers functions that return Errors
Since TBEHandler doesn't maintain state or otherwise have any need to be
a class right now, the read and write functions have been moved out and
turned into standalone functions. Additionally, the TBE read function
has been updated to return an Expected value for better error handling.
Tests have been updated to reflect these changes.
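A hedged sketch of how callers consume an Expected-returning reader; the type
and function names below are stand-ins, not necessarily the committed API:

```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"
#include <memory>
using namespace llvm;

struct ELFStub { /* parsed TBE contents would live here */ };

// Stand-in for the reader described above.
static Expected<std::unique_ptr<ELFStub>> readTBEFromBuffer(StringRef Buf) {
  if (Buf.empty())
    return make_error<StringError>("empty TBE buffer", inconvertibleErrorCode());
  return std::unique_ptr<ELFStub>(new ELFStub());
}

// Typical caller: either use the parsed stub or propagate the Error.
static Error useTBE(StringRef Buf) {
  Expected<std::unique_ptr<ELFStub>> StubOrErr = readTBEFromBuffer(Buf);
  if (!StubOrErr)
    return StubOrErr.takeError();
  ELFStub &Stub = **StubOrErr;
  (void)Stub; // ... inspect the stub here ...
  return Error::success();
}
```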
Brian Gesiak [Mon, 10 Dec 2018 00:56:13 +0000 (00:56 +0000)]
[bugpoint] Find 'opt', etc., in bugpoint directory
Summary:
When bugpoint attempts to find the other executables it needs to run,
such as `opt` or `clang`, it tries searching the user's PATH. However,
in many cases, the 'bugpoint' executable is part of an LLVM build, and
the 'opt' executable it's looking for is in that same directory.
Many LLVM tools handle this case by using the `Paths` parameter of
`llvm::sys::findProgramByName`, passing the parent path of the currently
running executable. Do this same thing for bugpoint. However, to
preserve the current behavior exactly, first search the user's PATH,
and then search for 'opt' in the directory containing 'bugpoint'.
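A hedged sketch of that lookup order (names and structure are illustrative,
not the exact bugpoint change):

```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/Program.h"
#include <string>
using namespace llvm;

// First honor the user's PATH, then fall back to the directory containing the
// currently running executable (i.e. the bugpoint binary itself).
static ErrorOr<std::string> findToolNearBugpoint(StringRef Tool,
                                                 const char *Argv0,
                                                 void *MainAddr) {
  if (ErrorOr<std::string> P = sys::findProgramByName(Tool))
    return P;
  std::string Exe = sys::fs::getMainExecutable(Argv0, MainAddr);
  StringRef Dir = sys::path::parent_path(Exe);
  return sys::findProgramByName(Tool, {Dir});
}
```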
Test Plan:
`check-llvm`. Many of the existing bugpoint tests no longer need to use the
`--opt-command` option as a result of these changes.
Brian Gesiak [Sun, 9 Dec 2018 21:56:50 +0000 (21:56 +0000)]
[AMDGPU] Fix discarded result of addAttribute
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.
However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.
Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.
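A hedged sketch of the write-back pattern (the real pass code is more
involved; these names are illustrative):

```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// AttributeList is immutable: addAttribute returns a new list, so the result
// has to be stored back on the function or it is silently dropped.
static void markSimplifiedLibcall(Function &F) {
  LLVMContext &Ctx = F.getContext();
  AttributeList AL = F.getAttributes();
  AL = AL.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::NoUnwind);
  AL = AL.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::ReadOnly);
  F.setAttributes(AL);
}
```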
Aaron Ballman [Sun, 9 Dec 2018 19:53:15 +0000 (19:53 +0000)]
Adding an STL-like type trait that is duplicated in multiple places in Clang.
This trait is used by several AST visitor classes to control whether the visitor visits const nodes or non-const nodes. These uses cannot easily be replaced with the STL traits directly because an unspecialized class template is required where a type would normally be expected (due to the template template parameter involved).
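A self-contained illustration of the general pattern (the trait and visitor
names here are illustrative and may not match the ones in the tree):

```
#include <type_traits>

// A small STL-like trait: a class template mapping T to 'const T *'. The STL
// building blocks (std::add_const, std::add_pointer) cannot be composed and
// handed over directly where a template template parameter is expected, so a
// named, unspecialized class template like this is used instead.
template <typename T> struct make_const_ptr {
  using type = typename std::add_pointer<typename std::add_const<T>::type>::type;
};

// A visitor-style consumer parameterized on how node pointer types are formed.
template <template <typename> class Ptr> struct VisitorBase {
  using NodePtr = typename Ptr<int>::type;
};

static_assert(std::is_same<VisitorBase<make_const_ptr>::NodePtr, const int *>::value,
              "expected a pointer to const");
```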
Craig Topper [Sun, 9 Dec 2018 18:02:40 +0000 (18:02 +0000)]
[X86] Add some comments about when some X86 intrinsic autoupgrade code was added.
Someday we'd like to remove old autoupgrade code, so it helps to annotate how long it's been there so we don't have to go digging through commit history.
Craig Topper [Sun, 9 Dec 2018 18:02:37 +0000 (18:02 +0000)]
[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB.
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother.
Sanjay Patel [Sun, 9 Dec 2018 14:40:37 +0000 (14:40 +0000)]
[x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA,
but it's incomplete because we will crash on the i16 test with the debug output shown below.
It's better to just give up instead. Really, GlobalISel should have folded these before we
could get into trouble.
# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected
Simon Pilgrim [Sun, 9 Dec 2018 13:45:15 +0000 (13:45 +0000)]
[X86] Extend pfm counter coverage for llvm-exegesis
Extension to rL348617: it turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name, so I've added a few more counters that I could find in the libpfm4 source code (and fixed a typo in the knl/knm retired_uops counter, which used 'all' instead of 'any').
Martin Storsjo [Sat, 8 Dec 2018 18:15:41 +0000 (18:15 +0000)]
[COFF] Map truncated .eh_frame section name
PE/COFF sections can have section names truncated to 8 chars, in order to
have the name available at runtime. (The string table, where long untruncated
names are stored, isn't loaded at runtime.)
This allows various llvm tools to dump the .eh_frame section from such
executables.
Sanjay Patel [Sat, 8 Dec 2018 16:07:38 +0000 (16:07 +0000)]
[DAGCombiner] re-enable truncation of binops
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.
The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)
Original commit message for r347917:
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.
Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.
As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.
Xing GUO [Sat, 8 Dec 2018 05:32:28 +0000 (05:32 +0000)]
[llvm-readobj] Little clean up inside `parseDynamicTable`
Summary:
This anonymous function actually has the same logic as `Obj->toMappedAddr`.
Besides, I have a question about resolving an illegal value. `gnu-readelf`, `gnu-objdump` and `llvm-objdump` can parse the test file 'test/tools/llvm-objdump/Inputs/private-headers-x86_64.elf', but `llvm-readobj` will fail when parsing the `DT_RELR` segment, because the value is 0x87654321, which is illegal. So, shall we do this clean up rather than remove the checking statements inside the anonymous function?
```
if (Delta >= Phdr.p_filesz)
return createError("Virtual address is not in any segment");
```
Craig Topper [Sat, 8 Dec 2018 00:27:34 +0000 (00:27 +0000)]
[SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in getNode. NFCI
These nodes should have two results: a real VT and a Glue. But this code would have returned Undef, which would only be a single result. And we're in the single-result version of getNode, so these opcodes should never be seen by this function anyway.
Craig Topper [Fri, 7 Dec 2018 22:16:40 +0000 (22:16 +0000)]
[X86] Remove the XFAILed test added in r348620
It seems to be unexpectedly passing on some bots, probably because it requires an asserts build in order to fail but doesn't say so. We already have a patch in review to make it not XFAIL, so I'd rather just focus on getting it passing than try to figure out an unexpected pass.
[ModuleSummary] use StringRefs to avoid a redundant copy; NFC
`Saver` is a StringSaver, which has a few overloads of `save` that all
ultimately just call `StringRef save(StringRef)`. Just take a StringRef
here instead of building up a std::string to convert it to a StringRef.
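A hedged sketch of the pattern (an illustrative helper, not the ModuleSummary
code itself):

```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/StringSaver.h"
using namespace llvm;

// Take StringRefs and let StringSaver copy the bytes once into its allocator,
// instead of first materializing an intermediate std::string.
static StringRef internQualifiedName(StringSaver &Saver, StringRef Module,
                                     StringRef Name) {
  return Saver.save(Twine(Module) + ":" + Name);
}

// Usage: BumpPtrAllocator Alloc; StringSaver Saver(Alloc);
//        StringRef S = internQualifiedName(Saver, "foo.o", "bar");
```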
Summary:
- LLVM clang-format style doesn't allow one-line ifs.
- LLVM clang-tidy style says method names should start with a lowercase
letter. But currently WebAssemblyAsmParser's parent class
MCTargetAsmParser is mixing lowercase and uppercase method names
itself so overridden methods cannot be renamed now.
- Changed else ifs after returns to ifs.
- Added some newlines for readability.
Nikita Popov [Fri, 7 Dec 2018 21:16:58 +0000 (21:16 +0000)]
[MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

  memset(a, byte, N);
  memcpy(b, a, M);

to

  memset(a, byte, N);
  memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.
This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.
This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.
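A hedged C++ illustration of the new case (M > N with an undef tail), roughly
corresponding to the IR the pass sees:

```
#include <cstring>

// Only the first 16 bytes of 'a' are initialized; the tail a[16..32) is never
// written before the memcpy, so it is undef and forwarding a memset over the
// full 32 bytes of 'b' is legal.
void forwardable(char *b) {
  char a[32];
  std::memset(a, 0, 16); // N = 16
  std::memcpy(b, a, 32); // M = 32 > N
  // After the optimization this becomes: std::memset(b, 0, 32);
}
```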
For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cacheable.
Vedant Kumar [Fri, 7 Dec 2018 20:24:04 +0000 (20:24 +0000)]
[HotColdSplitting] Refine definition of unlikelyExecuted
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.
- Do not treat noreturn calls as if they are cold unless they are actually
marked cold. This is motivated by functions like exit() and longjmp(), which
are not beneficial to outline.
- Do not treat inline asm as an outlining barrier. In practice asm("") is
frequently used to inhibit basic block merging; enabling outlining in this case
results in substantial memory savings.
- Treat invokes of cold functions as cold.
As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.
Vedant Kumar [Fri, 7 Dec 2018 20:23:52 +0000 (20:23 +0000)]
[HotColdSplitting] Outline more than once per function
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is not empty, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.
Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.
Zachary Turner [Fri, 7 Dec 2018 19:34:02 +0000 (19:34 +0000)]
[NativePDB] Reconstruct function declarations from debug info.
Previously we would create an lldb::Function object for each function
parsed, but we would not add these to the clang AST. This is a first
step towards getting local variable support working, as we first need an
AST decl so that when we create local variable entries, they have the
proper DeclContext.
Sam Clegg [Fri, 7 Dec 2018 19:29:00 +0000 (19:29 +0000)]
[llvm-tapi] Don't try to override SequenceTraits for std::string
For some reason this doesn't seem to work with an LLVM_LINK_LLVM_DYLIB
build.
See https://logs.chromium.org/logs/chromium/bb/client.wasm.llvm/linux/37764/+/recipes/steps/LLVM_regression_tests/0/stdout
What is more, it seems that overriding these traits for core types
(including std::string) is not supported/recommended by YAMLTraits.h.
See line 1918 which has the assertion:
"only use LLVM_YAML_IS_SEQUENCE_VECTOR for types you control"
Craig Topper [Fri, 7 Dec 2018 18:20:56 +0000 (18:20 +0000)]
[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.
So for example, take v8i32 on an SSE target. We were calculating the cost of a v8i32 op, which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count subvector extracts in the loop - there are no extracts, the types are just split across registers. For pairwise we need to use 2 two-source permute shuffles.
Craig Topper [Fri, 7 Dec 2018 18:10:34 +0000 (18:10 +0000)]
[X86] Initialize and Register X86CondBrFoldingPass
Initialize and register X86CondBrFoldingPass so that it can be run with the --run-pass option. This makes it possible to test an incorrect assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate.
Simon Pilgrim [Fri, 7 Dec 2018 17:48:40 +0000 (17:48 +0000)]
[X86] Improve pfm counter coverage for llvm-exegesis
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports.
Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code).
The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings.
These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs.
Sanjay Patel [Fri, 7 Dec 2018 15:47:52 +0000 (15:47 +0000)]
[DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively
reverting r347917 and its follow-up r348195 while we
investigate the bug.
Nikita Popov [Fri, 7 Dec 2018 15:38:13 +0000 (15:38 +0000)]
Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbitrary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)
The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.
Graham Sellers [Fri, 7 Dec 2018 15:33:21 +0000 (15:33 +0000)]
[AMDGPU] Shrink scalar AND, OR, XOR instructions
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.
It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x
In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7fffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).
Sanjay Patel [Fri, 7 Dec 2018 15:00:56 +0000 (15:00 +0000)]
[DAGCombiner] remove explicit calls to AddToWorkList; NFCI
As noted in the post-commit thread for rL347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html
...we don't need to repeat these calls because the combiner does it automatically.
This patch introduces a new intrinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.
We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons for that:
- `@llvm.experimental.guard` has a memory write side effect to model implicit control flow,
and this sometimes confuses passes and analyses that work with memory;
- Not all passes and analyses are aware of the semantics of guards. These passes treat them
as regular throwing calls and have no idea that the condition of a guard may be used to prove
something. One well-known place which has caused us trouble in the past is the explicit loop
iteration count calculation in SCEV. Another example is the new loop unswitching, which is not
aware of guards. Whenever a new pass appears, we potentially have this problem there.
Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that a guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.
This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:
deopt:
call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
The new intrinsic `@llvm.experimental.widenable.condition` has the semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to the `br` instruction, and it can still be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with an `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).
For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".
This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.
The naming discussion is still ongoing. Merging this to unblock further items. We can
later change the name of this intrinsic.
Tim Northover [Fri, 7 Dec 2018 13:43:55 +0000 (13:43 +0000)]
ARM: use correct offset from base pointer (r6) in call frame regions.
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.