[DemandedBits] Fix computation of demanded bits for ICmps
The computation of ICmp demanded bits is independent of the individual operand being evaluated. We simply return a mask consisting of the minimum leading zeroes of both operands.
We were incorrectly passing "I" to ComputeKnownBits - this should be "UserI->getOperand(0)". In cases where we were evaluating the 1th operand, we were taking the minimum leading zeroes of it and itself.
This should fix PR26266.
------------------------------------------------------------------------
[GCOV] Avoid emitting profile arcs for module and skeleton CUs
Do not emit profile arc files and note files for module and skeleton
CU's.
Our users report seeing unexpected *.gcda and *.gcno files in their
projects when using gcov-style profiling with modules or frameworks.
The unwanted files come from these modules. This is not very helpful
for end-users. Further, we've seen reports of instrumented programs
crashing while writing these files out (due to I/O failures).
[LibCallSimplifier] don't get fooled by a fake fmin()
This is similar to the bug/fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
http://reviews.llvm.org/rL258325
The fmin() test case reveals another bug caused by sloppy
code duplication. It will crash without this patch because
fp128 is a valid floating-point type, but we would think
that we had matched a function that used doubles.
The new helper function can be used to replace similar
checks that are used in several other places in this file.
------------------------------------------------------------------------
dupRetToEnableTailCallOpts(BB) can invalidate BB. It must run *after* we iterate across BB!
------------------------------------------------------------------------
[LibCallSimplifier] don't get fooled by a fake sqrt()
The test case will crash without this patch because the subsequent call to
hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator
(ie, returns an FP type).
This part of the function signature check was omitted for the sqrt() case,
but seems to be in place for all other transforms.
Before:
http://reviews.llvm.org/rL257400
...we would have needlessly continued execution in optimizeSqrt(), but the
bug was harmless because we'd eventually fail some other check and return
without damage.
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
[SelectionDAG] CSE nodes with differing SDNodeFlags
In the optimizer (GVN etc.) when eliminating redundant nodes with different
flags, the flags are ignored for the purposes of testing for congruence, and
then intersected for the purposes of producing a result that supports the union
of all the uses. This commit makes SelectionDAG's CSE do the same thing,
allowing it to CSE nodes in more cases. This fixes PR26063.
Summary:
Funclet EH tables require that a given funclet have only one unwind
destination for exceptional exits. The verifier will therefore reject
e.g. two cleanuprets with different unwind dests for the same cleanup, or
two invokes exiting the same funclet but to different unwind dests.
Because catchswitch has no 'nounwind' variant, and because IR producers
are not *required* to annotate calls which will not unwind as 'nounwind',
it is legal to nest a call or an "unwind to caller" catchswitch within a
funclet pad that has an unwind destination other than caller; it is
undefined behavior for such a call or catchswitch to unwind.
Normally when inlining an invoke, calls in the inlined sequence are
rewritten to invokes that unwind to the callsite invoke's unwind
destination, and "unwind to caller" catchswitches in the inlined sequence
are rewritten to unwind to the callsite invoke's unwind destination.
However, if such a call or "unwind to caller" catchswitch is located in a
callee funclet that has another exceptional exit with an unwind
destination within the callee, applying the normal transformation would
give that callee funclet multiple unwind destinations for its exceptional
exits. There would be no way for EH table generation to determine which
is the "true" exit, and the verifier would reject the function
accordingly.
Add logic to the inliner to detect these cases and leave such calls and
"unwind to caller" catchswitches as calls and "unwind to caller"
catchswitches in the inlined sequence.
[X86] Do not run shrink-wrapping on function with split-stack attribute or HiPE
calling convention.
The implementation of the related callbacks in the x86 backend for such
functions are not ready to deal with a prologue block that is not the entry
block of the function.
This fixes PR26107, but the longer term solution would be to fix those callbacks.
There are several requirements that ended up with this design;
1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
2. Bitreversals and byteswaps are very related in their matching logic.
3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
4. Bswaps are best matched early in InstCombine.
The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.
We can then extend the matching logic in one place only.
------------------------------------------------------------------------
CXX_FAST_TLS calling convention: fix issue on X86-64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
CXX_FAST_TLS calling convention: fix issue on ARM.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
CXX_FAST_TLS calling convention: fix issue on x86-64.
%RBP can't be handled explicitly. We generate the following code:
pushq %rbp
movq %rsp, %rbp
...
movq %rbx, (%rbp) ## 8-byte Spill
where %rbp will be overwritten by the spilled value.
The fix is to let PEI handle %RBP.
PR26136
------------------------------------------------------------------------
Stop increasing alignment of externally-visible globals on ELF
platforms.
With ELF, the alignment of a global variable in a shared library will
get copied into an executables linked against it, if the executable even
accesss the variable. So, it's not possible to implicitly increase
alignment based on access patterns, or you'll break existing binaries.
This happened to affect libc++'s std::cout symbol, for example. See
thread: http://thread.gmane.org/gmane.comp.compilers.clang.devel/45311
(This is a re-commit of r257719, without the bug reported in
PR26144. I've tweaked the code to not assert-fail in
enforceKnownAlignment when computeKnownBits doesn't recurse far enough
to find the underlying Alloca/GlobalObject value.)
Hans Wennborg [Thu, 14 Jan 2016 23:24:17 +0000 (23:24 +0000)]
Merging r257791:
------------------------------------------------------------------------
r257791 | hans | 2016-01-14 11:21:14 -0800 (Thu, 14 Jan 2016) | 4 lines
Exclude test-suite from CMake builds in test-release.sh
It's broken. In 3.7 there wasn't a CMake build for test-suite at all,
so we're not losing something we had before.
------------------------------------------------------------------------
[X86] Don't alter HasOpaqueSPAdjustment after we've relied on it
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it. Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing. If it changes, invariants in the
backend will quietly break. This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.
This fixes PR26124.
------------------------------------------------------------------------
Dimitry Andric [Wed, 13 Jan 2016 19:37:51 +0000 (19:37 +0000)]
Merging r257645:
------------------------------------------------------------------------
r257645 | dim | 2016-01-13 19:29:46 +0100 (Wed, 13 Jan 2016) | 22 lines
Avoid undefined behavior in LinkAllPasses.h
The LinkAllPasses.h file is included in several main programs, to force
a large number of passes to be linked in. However, the ForcePassLinking
constructor uses undefined behavior, since it calls member functions on
`nullptr`, e.g.:
Marek Olsak [Wed, 13 Jan 2016 17:23:04 +0000 (17:23 +0000)]
AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).
This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.
Dan Liew [Wed, 13 Jan 2016 16:43:49 +0000 (16:43 +0000)]
[lit] Fix handling of per test timeout when the installed psutil version
is < ``2.0``.
Older versions of psutil (e.g. ``1.2.1`` which is the version shipped with
Ubuntu 14.04) use a different API for retrieving the child processes.
To handle this try the new API first and if that fails try the old API.
Ulrich Weigand [Wed, 13 Jan 2016 13:12:23 +0000 (13:12 +0000)]
[PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point. This is always true when using the medium or small
code model, but may not be the case when using the large code model.
This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:
ld r2,-8(r12)
add r2,r2,r12
Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.
These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.
Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html
Marek Olsak [Wed, 13 Jan 2016 11:45:36 +0000 (11:45 +0000)]
AMDGPU/SI: Add new target attribute InitialPSInputAddr
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.
Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.
v2: Make PSInputAddr private, because there is never enough silly getters
and setters for people to read.
Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
(in the same basic block) which calculates address differing only be a
displacement. Works only for -Oz.
Craig Topper [Wed, 13 Jan 2016 07:20:13 +0000 (07:20 +0000)]
[TableGen] Cleanup output formatting and add llvm_unreachables to the output the AsmMatcher uses when it overflows the 64-bit tables. No in tree targets use this code, but I tested it with an temporarily reduced table width.
Akira Hatanaka [Wed, 13 Jan 2016 06:02:45 +0000 (06:02 +0000)]
[Inliner] Merge the attributes of the caller and callee functions
This patch turns off the fast-math optimization attribute on the caller
if the callee's fast-math attribute is not turned on.
For example,
- before inlining
caller: "less-precise-fpmad"="true"
callee: "less-precise-fpmad"="false"
- after inlining
caller: "less-precise-fpmad"="false"
Alternatively, it's possible to block inlining if the caller's and
callee's attributes don't match. If this approach is preferable to the
one in this patch, we can discuss post-commit.
Fix PointerIntPair so that it can use an enum class as its integer template argument.
Summary:
The problem here is that an enum class can not be implicitly converted to an
integer. That assumption snuck back into PointerIntPair. This commit fixes the
issue and more importantly adds some unittests to make sure that we do not break
this again.
James Y Knight [Wed, 13 Jan 2016 04:44:14 +0000 (04:44 +0000)]
[SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition.
AnalyzeBranch on X86 (and, previously, SPARC, which implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.
The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.
This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.
Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.
E.g., it used to emit code like this:
bne .LBB1_2
nop
ba .LBB1_1
nop
.LBB1_2:
(Resubmit after fixing a typo that breaks test on big endian
machines)
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
Davide Italiano [Wed, 13 Jan 2016 04:11:36 +0000 (04:11 +0000)]
[llvm-readobj] Remove dead code. Add an assertion instead.
When we arrive at the end of the function, the validation of
the object has been done already. In theory, so, we should never
arrive here with something broken as the object isn't mutated.
Practice sometimes proves theory to be wrong, so leave an assertion
instead, as suggested by David Blaikie, to catch bugs.
Matthias Braun [Wed, 13 Jan 2016 01:18:13 +0000 (01:18 +0000)]
AsmPrinter: Fix wrong OS X versions being emitted for darwin triples
The version numbers of the darwin kernel are different from the version
numbers of OS X, so we need adjustments if we had "*-*-darwin" triples.
Use the existing utility functions in TargetTriple for this.
David Majnemer [Wed, 13 Jan 2016 01:05:23 +0000 (01:05 +0000)]
[CodeView] Mark our lines as statements, not expressions
The line tables for CodeView make a distinction between expressions and
statements. As it turns out, MSVC always emits them as statements and
we always emit them as expressions. Let's switch to statements to match
the CodeView that they emit.
David Majnemer [Wed, 13 Jan 2016 01:05:16 +0000 (01:05 +0000)]
[CodeView] Improve the line table dumper
This change has us print out fields we didn't previously understand. To
improve readability, we now group column information with it's
respective line.
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```
Kevin Enderby [Wed, 13 Jan 2016 00:25:36 +0000 (00:25 +0000)]
For llvm-objdump, add the option -private-header (without the trailing ’s’)
to only print the first private header.
Which for Mach-O files only prints the Mach header and not the subsequent load
commands. Which is used by scripts to match what the darwin otool(1) with the
-h flag does without the -l flag.
For non-Mach-O files it has the same functionality as -private-headers (with
the trailing ’s’).
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
Quentin Colombet [Wed, 13 Jan 2016 00:02:40 +0000 (00:02 +0000)]
[ARM] Mark VMOV with immediate: isAsCheapAsMove.
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
Matthias Braun [Tue, 12 Jan 2016 22:57:35 +0000 (22:57 +0000)]
RegisterPressure: Expose RegisterOperands API
Previously the RegisterOperands have only been used internally in
RegisterPressure.cpp. However this datastructure can be useful for other
tasks as well and allows refactoring of PDiff initialisation out of
RPTracker::recede().
This patch:
- Exposes RegisterOperands as public API
- Splits RPTracker::recede() into a part that skips DebugValues and
maintains the region borders, and the core that changes register
pressure when given a set of RegisterOperands.
- This allows to move the PDiff initialisation out recede() into a
method of the PressureDiffs class.
- The upcoming subregister scheduling code will also use
RegisterOperands to avoid pushing more unrelated functionality into
recede()/advance().
Keno Fischer [Tue, 12 Jan 2016 22:46:09 +0000 (22:46 +0000)]
[Utils] Insert DW_OP_bit_piece when only describing part of the variable
Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext
to find a value to describe the variable (in the expectation that those
zext/sext instruction will go away later). However, those values do not
cover the entire variable and thus need a DW_OP_bit_piece.
David Majnemer [Tue, 12 Jan 2016 21:58:20 +0000 (21:58 +0000)]
[CodeView] Initialize column-end to zero
CodeView, unlike DWARF, can associate code with a range of columns.
However, LLVM can only represent a single column position internally.
We used to claim that the end column and start column were the same
which yielded less than satisfactory results: we would stop printing at
the _beginning_ of the source expression! Instead, mark the column-end
as 'zero' to indicate that we don't have one (as per the documentation
for IDiaLineNumber::get_lineNumberEnd).
Dan Gohman [Tue, 12 Jan 2016 20:56:01 +0000 (20:56 +0000)]
[WebAssembly] Add a EM_WEBASSEMBLY value, and several bits of code that use it.
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is recieved, the value of EM_WEBASSEMBLY
will be updated.
Dan Gohman [Tue, 12 Jan 2016 19:14:46 +0000 (19:14 +0000)]
[WebAssembly] Make CFG stackification independent of basic-block labels.
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.
Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.
This will also fix problems where two constructs appear to have the same label,
because we no longer explicitly use labels, so consumers that need labels will
presumably create their own labels, and presumably they won't reuse labels
when they do.
This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.
- Handle simple cases of register copies (what current RDF CP allows).
- Hexagon-specific dead code elimination: handles dead address updates
in post-increment instructions.