Compute safety information in a much finer granularity.
Summary:
Instead of keeping a variable indicating whether there are early exits
in the loop. We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.
I am going to update compilation time as well soon.
InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}
Summary:
The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.
Nirav Dave [Mon, 24 Apr 2017 15:37:20 +0000 (15:37 +0000)]
[SDAG] Teach Chain Analysis about BaseIndexOffset addressing.
While we use BaseIndexOffset in FindBetterNeighborChains to
appropriately realize they're almost the same address and should be
improved concurrently we do not use it in isAlias using the non-index
understanding FindBaseOffset instead. Adding a BaseIndexOffset check
in isAlias like should allow indexed stores to be merged.
Philip Pfaffe [Mon, 24 Apr 2017 11:54:37 +0000 (11:54 +0000)]
[RegionInfo] Fix dangling references created by moving RegionInfo objects
Summary: Region objects capture the address of the creating RegionInfo instance. Because the RegionInfo class is movable, moving a RegionInfo object creates dangling references. This patch fixes these references by walking the Regions post-move, and updating references to the new parent.
Summary: SUSE's ARM triples end with -gnueabi even though they are hard-float. This requires special handling of SUSE ARM triples. Hence we need a way to differentiate the SUSE as vendor. This CL adds that.
George Rimar [Mon, 24 Apr 2017 10:19:45 +0000 (10:19 +0000)]
[DWARF] - Take relocations in account when extracting ranges from .debug_ranges
I found this when investigated "Bug 32319 - .gdb_index is broken/incomplete" for LLD.
When we have object file with .debug_ranges section it may be filled with zeroes.
Relocations are exist in file to relocate this zeroes into real values later, but until that
a pair of zeroes is treated as terminator. And DWARF parser thinks there is no ranges at all
when I am trying to collect address ranges for building .gdb_index.
Solution implemented in this patch is to take relocations in account when parsing ranges.
We have to widen the operands to 32 bits and then we can either use
hardware division if it is available or lower to a libcall otherwise.
At the moment it is not enough to set the Legalizer action to
WidenScalar, since for libcalls it won't know what to do (it won't be
able to find what size to widen to, because it will find Libcall and not
Legal for 32 bits). To hack around this limitation, we request Custom
lowering, and as part of that we widen first and then we run another
legalizeInstrStep on the widened DIV.
Instruction isb takes as an operand either 'sy' or an immediate value. This
improves the diagnostic when the string is not 'sy' and adds a test case for
this which was missing. This also adds tests to check invalid inputs for dsb
and dmb.
Add support for both targets with hardware division and without. For
hardware division we have to add support throughout the pipeline
(legalizer, reg bank select, instruction select). For targets without
hardware division, we only need to mark it as a libcall.
Treat them the same as the other binary operations that we have so far,
but on integers rather than floating point types. Extract the common
code into a helper.
[ARM] GlobalISel: Select G_CONSTANT with CImm operands
When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm
operand. We used to just leave the G_CONSTANT operand unchanged, which
works in some cases (such as the GEP offsets that we create when
referring to stack slots). However, in many other places the G_CONSTANTs
are created with CImm operands. This patch makes sure to handle those as
well, and to error out gracefully if in the end we don't end up with an
Imm operand.
Thanks to Oliver Stannard for reporting this issue.
[XRay] A tool for Comparing xray function call graphs
Summary:
This is a tool for comparing the function graphs produced by the
llvm-xray graph too. It takes the form of a new subcommand of the
llvm-xray tool 'graph-diff'.
This initial version of the patch is very rough, but it is close to
feature complete.
[APInt] Make behavior of ashr by BitWidth consistent between single and multi word.
Previously single word would always return 0 regardless of the original sign. Multi word would return all 0s or all 1s based on the original sign. Now single word takes into account the sign as well.
Refactor DynamicLibrary so searching for a symbol will have a defined order and
libraries are properly unloaded when llvm_shutdown is called.
Summary:
This was mostly affecting usage of the JIT, where storing the library handles in
a set made iteration unordered/undefined. This lead to disagreement between the
JIT and native code as to what the address and implementation of particularly on
Windows with stdlib functions:
JIT: putenv_s("TEST", "VALUE") // called msvcrt.dll, putenv_s
JIT: getenv("TEST") -> "VALUE" // called msvcrt.dll, getenv
Native: getenv("TEST") -> NULL // called ucrt.dll, getenv
Also fixed is the issue of DynamicLibrary::getPermanentLibrary(0,0) on Windows
not giving priority to the process' symbols as it did on Unix.
Sanjoy Das [Sun, 23 Apr 2017 23:04:45 +0000 (23:04 +0000)]
[SCEV] Move towards a verifier without false positives
This change reboots SCEV's current (off by default) verification logic
to avoid false failures. Instead of stringifying trip counts, it maps
old and new trip counts to the same ScalarEvolution "universe" and
asks ScalarEvolution to compute the difference between them. If the
difference comes out to be a non-zero constant, then (barring some
corner cases) we *know* we messed up.
I've not yet enabled this by default since it hits an exponential time
issue in SCEV, but once I fix that, I'll flip it on by default in
EXPENSIVE_CHECKS builds.
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.
Handling all of these patterns consistently is part of trying to
reinstate:
https://reviews.llvm.org/rL300977
This makes the WordBits calculation calculate a value between 1 and 64 for the number of bits in the last word. Previously if the BitWidth was a multiple of 64 bits the WordBits value was 0 and we had to bail out early to avoid an undefined shift. Now with a value of 64 we no longer have an undefined shift issue.
This shows a 15-16k reduction in the size of the opt binary on my local x86-64 build.
[APInt] In sext single word case, use SignExtend64 and let the APInt constructor mask off any excess bits.
The current code is trying to be clever with shifts to avoid needing to clear unused bits. But it looks like the compiler is unable to optimize out the unused bit handling in the APInt constructor. Given this its better to just use SignExtend64 and have more readable code.
[InstCombine] add pattern matches for commuted variants of xor-to-xor
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.
Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API."
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
[ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs
Summary:
D30400 has enabled tADC and tSBC instructions to be unglued, thereby allowing CPSR to remain live between Thumb1 scheduling units.
Most Thumb1 instructions have an OptionalDef for CPSR; but the scheduler ignored the OptionalDefs, and could unwittingly insert a flag-setting instruction in between an ADDS and the corresponding ADC.
Davide Italiano [Sun, 23 Apr 2017 04:49:34 +0000 (04:49 +0000)]
[ThinLTO/Summary] Rename anonymous globals as last action ...
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.
Adrian Prantl [Sat, 22 Apr 2017 20:54:06 +0000 (20:54 +0000)]
Use DW_OP_stack_value when reconstructing variable values with arithmetic.
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.
This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.
The later uses of dyn_castNotVal in this block are either
incomplete (doesn't handle vector constants) or overstepping
(shouldn't handle constants at all), but this first use is
just unnecessary. 'I' is obviously not a constant, and it
can't be a not-of-a-not because that would already be
instsimplified.
Daniel Sanders [Sat, 22 Apr 2017 15:11:04 +0000 (15:11 +0000)]
[globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility.
Summary:
Some targets need to be able to do more complex rendering than just adding an
operand or two to an instruction. For example, it may need to insert an
instruction to extract a subreg first, or it may need to perform an operation
on the operand.
In SelectionDAG, targets would create SDNode's to achieve the desired effect
during the complex pattern predicate. This worked because SelectionDAG had a
form of garbage collection that would take care of SDNode's that were created
but not used due to a later predicate rejecting a match. This doesn't translate
well to GlobalISel and the churn was wasteful.
The API changes in this patch enable GlobalISel to accomplish the same thing
without the waste. The API is now:
InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
where Root is the root of the match. The return value can be omitted to
indicate that the predicate failed to match, or a function with the signature
ComplexRendererFn can be returned. For example:
return OptionalComplexRendererFn(
[=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
adds two immediate operands to the rendered instruction. Immed and ShVal are
captured from the predicate function.
As an added bonus, this also reduces the amount of information we need to
provide to GIComplexOperandMatcher.
Daniel Sanders [Sat, 22 Apr 2017 14:31:28 +0000 (14:31 +0000)]
[globalisel][tablegen] Fix PR32733 by checking which instruction operands belong to.
canMutate() was returning true when the operands were all in the same order as
the matched instruction. However, it wasn't checking the operands were actually
on that instruction. This worked when we could only match a single instruction
but the addition of nested instruction matching led to cases where the operands
could be split across multiple instructions. canMutate() now returns false if
operands belong to instructions other than the root of the match.
David Blaikie [Sat, 22 Apr 2017 07:53:44 +0000 (07:53 +0000)]
Avoid using relocations for ref_addr in .dwo files
In dwo files the fixed offset can be used - if the dwos are linked into
a dwp, the dwo consumer must use the dwp tables to find out where the
original range of the debug_info was and resolve the "section relative"
value relative to that original range - effectively
avoiding/reimplementing the relocation handling.
Artur Pilipenko [Sat, 22 Apr 2017 07:24:52 +0000 (07:24 +0000)]
Fix for PR32740 - Invalid floating type, unreachable between r300969 and r301029
The bug was introduced by r301018 "[InstCombine] fadd double (sitofp x), y check that the promotion is valid". The patch didn't expect that fadd can be on vectors not necessarily scalars. Add vector support along with the test.
David Blaikie [Sat, 22 Apr 2017 02:18:00 +0000 (02:18 +0000)]
Remove the unnecessary virtual dtor from the DIEUnit hierarchy (in favor of protected dtor in the base, final derived classes with public non-virtual dtors)
These objects are never polymorphically owned/destroyed, so the virtual
dtor was unnecessary.
David Blaikie [Fri, 21 Apr 2017 23:35:26 +0000 (23:35 +0000)]
Move Split DWARF handling to an MC option/command line argument rather than using metadata
Since Split DWARF needs to name the actual .dwo file that is generated,
it can't be known at the time the llvm::Module is produced as it may be
merged with other Modules before the object is generated and that object
may be generated with any name.
By passing the Split DWARF file name when LLVM is producing object code
the .dwo file name in the object file can match correctly.
The support for Split DWARF for implicit modules remains the same -
using metadata to store the dwo name and dwo id so that potentially
multiple skeleton CUs referring to different dwo files can be generated
from one llvm::Module.
AArch64FrameLowering: Check if the ExtraCSSpill register is actually unused
The code assumed that when saving an additional CSR register
(ExtraCSSpill==true) we would have a free register throughout the
function. This was not true if this CSR register is also used to pass
values as in the swiftself case.
Kuba Mracek [Fri, 21 Apr 2017 22:38:24 +0000 (22:38 +0000)]
[libFuzzer] Always build libFuzzer
There are two reasons why users might want to build libfuzzer:
- To fuzz LLVM itself
- To get the libFuzzer.a archive file, so that they can attach it to their code
This change always builds libfuzzer, and supports the second use case if the specified flag is set.
The point of this patch is to have something that can potentially be shipped with the compiler, and this also ensures that the version of libFuzzer is correct to use with that compiler.
[APSInt] Use APInt::compare and APInt::compareSigned to implement APSInt::compareValue
APInt just got compare methods that return -1, 0, or 1 instead of just having ult/slt and eq.
This patch uses these methods to implement APSInt::compareValues so that we don't have to call do an equal comparison and then possibly a second less than comparison.
Hans Wennborg [Fri, 21 Apr 2017 21:48:41 +0000 (21:48 +0000)]
Re-commit r301040 "X86: Don't emit zero-byte functions on Windows"
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
Hans Wennborg [Fri, 21 Apr 2017 20:58:12 +0000 (20:58 +0000)]
X86: Don't emit zero-byte functions on Windows
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.