Quentin Colombet [Thu, 12 Nov 2015 18:13:42 +0000 (18:13 +0000)]
[ShrinkWrap] Make sure we do not mess up with EH funclet lowering.
ShrinkWrapping does not understand exception handling constraints for now, so
make sure we do not mess with them by aborting on functions that use EH
funclets.
[llvm-profdata] Add check for text profile formats and improve error reporting
Summary:
This change addresses two possible instances of user error / confusion when
merging sampled profile data.
Previously any input that didn't match the raw or processed instrumented format
would automatically be interpreted as instrumented profile text format data.
No error would be reported during the merge.
Example:
If foo-sampled.profdata and bar-sampled.profdata are binary sampled profiles:
Old behavior:
$ llvm-profdata merge foo-sampled.profdata bar-sampled.profdata -output foobar-sampled.profdata
$ llvm-profdata show -sample foobar-sampled.profdata
error: foobar-sampled.profdata:1: Expected 'mangled_name:NUM:NUM', found lprofi
This change adds basic checks for valid input data when assuming text input.
It also makes error messages related to file format validity more specific about
the assumbed profile data type.
New behavior:
$ llvm-profdata merge foo-sampled.profdata bar-sampled.profdata -o foobar-sampled.profdata
error: foo.profdata: Unrecognized instrumentation profile encoding format
Perhaps you forgot to use the -sample option?
Dan Gohman [Thu, 12 Nov 2015 17:04:33 +0000 (17:04 +0000)]
[WebAssembly] Reapply r252858, with svn add for the new file.
Switch to MC for instruction printing.
This encompasses several changes which are all interconnected:
- Use the MC framework for printing almost all instructions.
- AsmStrings are now live.
- This introduces an indirection between LLVM vregs and WebAssembly registers,
and a new pass, WebAssemblyRegNumbering, for computing a basic the mapping.
This addresses some basic issues with argument registers and unused registers.
- The way ARGUMENT instructions are handled no longer generates redundant
get_local+set_local for every argument.
This also changes the assembly syntax somewhat; most notably, MC's printing
does not use sigils on label names, so those are no longer present, and
push/pop now have a sigil to keep them unambiguous.
The usage of set_local/get_local/$push/$pop will continue to evolve
significantly. This patch is just one step of a larger change.
[x86] translating "fp" (floating point) instructions from {fadd,fdiv,fmul,fsub,fsubr,fdivr} to {faddp,fdivp,fmulp,fsubp,fsubrp,fdivrp}
LLVM Missing the following instructions: fadd\fdiv\fmul\fsub\fsubr\fdivr.
GAS and MS supporting this instruction and lowering them in to a faddp\fdivp\fmulp\fsubp\fsubrp\fdivrp instructions.
Artyom Skrobov [Thu, 12 Nov 2015 15:51:41 +0000 (15:51 +0000)]
Cull non-standard variants of ARM architectures (NFC)
Summary:
This patch changes ARMV5, ARMV5E, ARMV6SM, ARMV6HL, ARMV7, ARMV7L,
ARMV7HL, ARMV7EM to be treated as aliases for the corresponding
standard architectures, instead of as actual architectures.
James Molloy [Thu, 12 Nov 2015 13:49:17 +0000 (13:49 +0000)]
[ARM] CMOV->BFI combining: handle both senses of CMPZ
I completely misunderstood what ARMISD::CMPZ means. It's not "compare equal to zero", it's "compare, only setting the zero/Z flag". It can either be equal-to-zero or not-equal-to-zero, and we weren't checking what sense it was.
If it's equal-to-zero, we can swap the operands around and pretend like it is not-equal-to-zero, which is both a bug fix and lets us handle more cases.
[mips] Use correct frame register for DWARF info when dynamically realigning the stack.
Summary:
This patch overrides TargetFrameLowering::getFrameIndexReference() in order to
specify the correct register when the function needs dynamic stack realignment.
The values returned from this function are used in order to create DW_AT_locations
for DWARF info. These locations would use the wrong registers as it's been
reported in PR25028.
James Molloy [Thu, 12 Nov 2015 12:39:41 +0000 (12:39 +0000)]
[InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x
There are plenty more instcombines we could probably do with bitreverse, but this seems like a very obvious and trivial starting point and was brought up by Hal in his review.
James Molloy [Thu, 12 Nov 2015 12:29:09 +0000 (12:29 +0000)]
[SDAG] Introduce a new BITREVERSE node along with a corresponding LLVM intrinsic
Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target.
This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support.
The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently).
Kuba Brecka [Thu, 12 Nov 2015 09:40:29 +0000 (09:40 +0000)]
[Object, MachO] Mark symbols from DATA and BSS sections as ST_Data
In `MachOObjectFile::getSymbolType` we currently always return `SymbolRef::ST_Function` for symbols from any section. In order for llvm-symbolizer to correctly symbolize Mach-O globals, symbols from data and BSS sections should return `SymbolRef::ST_Data`.
James Molloy [Thu, 12 Nov 2015 08:53:04 +0000 (08:53 +0000)]
[FunctionAttrs] Identify norecurse functions
A function can be marked as norecurse if:
* The SCC to which it belongs has cardinality 1; and either
a) It does not call any non-norecurse function. This includes self-recursion; or
b) It only has one callsite and the function that callsite is within is marked norecurse.
a) is best propagated bottom-up and b) is best propagated top-down.
We build up the norecurse attributes bottom-up using the existing SCC pass, and mark functions with no obvious recursion (but not provably norecurse) to sweep later, top-down.
David Blaikie [Thu, 12 Nov 2015 06:33:14 +0000 (06:33 +0000)]
Mostly revert 252842 due to failures on some buildbots.
I imagine there's some UB in here somewhere, though Valgrind doesn't
seem to have picked it up (not sure if I have a working asan build right
now to test there).
GDB bot seems to be crashing:
http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/26267/steps/check-all/logs/FAIL%3A%20LLVM%3A%3Adwarfdump-dwp.test
Hexagon ELF bot is, presumably, just getting different output:
http://lab.llvm.org:8011/builders/clang-hexagon-elf/builds/32927/steps/check-all/logs/FAIL%3A%20LLVM%3A%3Adwarfdump-dwp.test
Dan Gohman [Thu, 12 Nov 2015 06:10:03 +0000 (06:10 +0000)]
[WebAssembly] Switch to MC for instruction printing.
This encompasses several changes which are all interconnected:
- Use the MC framework for printing almost all instructions.
- AsmStrings are now live.
- This introduces an indirection between LLVM vregs and WebAssembly registers,
and a new pass, WebAssemblyRegNumbering, for computing a basic the mapping.
This addresses some basic issues with argument registers and unused registers.
- The way ARGUMENT instructions are handled no longer generates redundant
get_local+set_local for every argument.
This also changes the assembly syntax somewhat; most notably, MC's printing
use sigils on label names, so those are no longer present, and push/pop now
have a sigil to keep them unambiguous.
The usage of set_local/get_local/$push/$pop will continue to evolve
significantly. This patch is just one step of a larger change.
Mike Aizatsky [Thu, 12 Nov 2015 04:38:40 +0000 (04:38 +0000)]
output_csv libfuzzer option
Summary:
The option outputs statistics in CSV format preceded by 1 header line.
This is intended for machine processing of the output.
-verbosity=0 should likely be set.
Matthias Braun [Thu, 12 Nov 2015 01:02:47 +0000 (01:02 +0000)]
LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization
- Factor out code to query and modify the sign bit of a floatingpoint
value as an integer. This also works if none of the targets integer
types is big enough to hold all bits of the floatingpoint value.
- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
otherwise perform bit manipulation on the sign bit. The previous code
used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
takes 34 instructions on ARM Cortex-M4. With this patch we only
require 5:
vldr d0, LCPI0_0
vmov r2, r3, d0
lsrs r2, r3, #31
bfi r1, r2, #31, #1
bx lr
(This could be further improved if the compiler would recognize that
r2, r3 is zero).
- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
available otherwise perform bit manipulation on the sign bit.
- Perform the sign(x) test by masking out the sign bit and comparing
with 0 rather than shifting the sign bit to the highest position and
testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
testl $32768, %eax
rather than:
shlq $48, %rax
sets %al
testb %al, %al
Chad Rosier [Wed, 11 Nov 2015 23:00:59 +0000 (23:00 +0000)]
[LIR] General refactor to improve compile-time and simplify code.
First create a list of candidates, then transform. This simplifies the code in
that you have don't have to worry that you may be using an invalidated
iterator.
Previously, each time we created a memset/memcpy we would reevaluate the entire
loop potentially resulting in lots of redundant work for large basic blocks.
David Majnemer [Wed, 11 Nov 2015 21:57:16 +0000 (21:57 +0000)]
[IR] Add support for empty tokens
When working with tokens, it is often the case that one has instructions
which consume a token and produce a new token. Currently, we have no
mechanism to represent an initial token state.
Instead, we can create a notional "empty token" by inventing a new
constant which captures the semantics we would like. This new constant
is called ConstantTokenNone and is written textually as "token none".
Sanjoy Das [Wed, 11 Nov 2015 21:38:02 +0000 (21:38 +0000)]
Introduce deoptimization operand bundles
Summary:
This change introduces the notion of "deoptimization" operand bundles.
LLVM can recognize and optimize these in more precise ways than it can a
generic "unknown" operand bundles.
The current form of this special recognition / optimization is an enum
entry in LLVMContext, a LangRef blurb and a verifier rule. Over time we
will teach LLVM to do more aggressive optimization around deoptimization
operand bundles, exploiting known facts about kinds of state
deoptimization operand bundles are allowed to track.
Akira Hatanaka [Wed, 11 Nov 2015 20:35:42 +0000 (20:35 +0000)]
Move the enum attributes defined in Attributes.h to a table-gen file.
This is a step towards consolidating some of the information regarding
attributes in a single place.
This patch moves the enum attributes in Attributes.h to the table-gen
file. Additionally, it adds definitions of target independent string
attributes that will be used in follow-up commits by the inliner to
check attribute compatibility.
Yunzhong Gao [Wed, 11 Nov 2015 19:59:08 +0000 (19:59 +0000)]
Add a libLTO diagnostic handler that supports lto_get_error_message API
This is a follow-up from the previous discussion on the thread:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151019/307763.html
The LibLTO lto_get_error_message() API reads error messages from a std::string
sLastErrorString. Instead of passing this string around as an argument, this
patch creates a diagnostic handler and then sends this handler to the
constructor of LTOCodeGenerator.
Geoff Berry [Wed, 11 Nov 2015 19:42:52 +0000 (19:42 +0000)]
[DAGCombiner] Improve zextload optimization.
Summary:
Don't fold
(zext (and (load x), cst)) -> (and (zextload x), (zext cst))
if
(and (load x) cst)
will match as a zextload already and has additional users.
For example, the following IR:
%load = load i32, i32* %ptr, align 8
%load16 = and i32 %load, 65535
%load64 = zext i32 %load16 to i64
store i32 %load16, i32* %dst1, align 4
store i64 %load64, i64* %dst2, align 8
used to produce the following aarch64 code:
ldr w8, [x0]
and w9, w8, #0xffff
and x8, x8, #0xffff
str w9, [x1]
str x8, [x2]
but with this change produces the following aarch64 code:
Diego Novillo [Wed, 11 Nov 2015 17:54:37 +0000 (17:54 +0000)]
SamplePGO - Fix PR 25482 - Do not rely on llvm.dbg.cu for discriminators
The discriminators pass relied on the presence of llvm.dbg.cu to decide
whether to add discriminators, but this fails in the case where debug
info is only enabled partially when -fprofile-sample-use is active.
The reason llvm.dbg.cu is not present in these cases is to prevent
codegen from emitting debug info (as it is only used for the sample
profile pass).
This changes the discriminators pass to also emit discriminators even
when debug info is not being emitted.
Sanjay Patel [Wed, 11 Nov 2015 17:24:56 +0000 (17:24 +0000)]
[MIPS] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
MIPS32 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any MIPS32
implementation.
The net result of allowing this speculation for the regression tests in this patch is
that we get this code:
ctlz:
jr $ra
clz $2, $4
cttz:
addiu $1, $4, -1
not $2, $4
and $1, $2, $1
clz $1, $1
addiu $2, $zero, 32
jr $ra
subu $2, $2, $1
Diego Novillo [Wed, 11 Nov 2015 16:39:22 +0000 (16:39 +0000)]
Properly fix unused variable in disable-assert builds.
I missed the side-effects of ParseBFI in my previous attempt (r252748).
Thanks dblaikie for the suggestion of adding a void use of the unused
variable instead.
Chris Bieneman [Wed, 11 Nov 2015 16:14:03 +0000 (16:14 +0000)]
[CMake] Add support for building the llvm test-suite as part of an LLVM build using clang and lld
Summary:
This patch adds a new CMake module for working with ExternalProjects. This wrapper for ExternalProject supports using just-built tools and can hook up dependencies properly so that projects get cleared out.
The example usage here is for the llvm test-suite. In this example, the test-suite is setup as dependent on clang and lld if they are in-tree. If the clang or lld binaries change the test-suite is re-configured, cleaned, and rebuilt.
This cleanup and abstraction wrapping ExternalProject can be extended and applied to other runtime libraries like compiler-rt and libcxx.
James Molloy [Wed, 11 Nov 2015 15:40:40 +0000 (15:40 +0000)]
[ARM] Combine BFIs together
If we have a chain of BFIs, we may be able to combine several together into one merged BFI. We can do this if the "from" bits from one BFI OR'd with the "from" bits from the other BFI form a contiguous range, and the same with the "to" bits.
Charlie Turner [Wed, 11 Nov 2015 15:03:46 +0000 (15:03 +0000)]
[SLP] Enable -slp-vectorize-hor by default.
Measurements primarily on AArch64 have shown this feature does not
significantly effect compile-time. The are no significant perf changes in LNT,
but for AArch64 at least, there are wins in third party benchmarks.
As discussed on llvm-dev, we're going to try turning this on by default and see
how other targets react to the change.
ADT: Avoid relying on UB in ilist_node::getNextNode()
Re-implement `ilist_node::getNextNode()` and `getPrevNode()` without
relying on the sentinel having a "next" pointer. Instead, get access to
the owning list and compare against the `begin()` and `end()` iterators.
This only works when the node *can* get access to the owning list. The
new support is in `ilist_node_with_parent<>`, and any class `Ty`
inheriting from `ilist_node<NodeTy>` that wants `getNextNode()` and/or
`getPrevNode()` should inherit from
`ilist_node_with_parent<NodeTy, ParentTy>` instead. The requirements:
- `NodeTy` must have a `getParent()` function that returns the parent.
- `ParentTy` must have a `getSublistAccess()` static that, given a(n
ignored) `NodeTy*` (to determine which list), returns a member field
pointer to the appropriate `ilist<>`.
This isn't the cleanest way to get access to the owning list, but it
leverages the API already used in the IR hierarchy (see, e.g.,
`Instruction::getSublistAccess()`).
If anyone feels like ripping out the calls to `getNextNode()` and
`getPrevNode()` and replacing with direct iterator logic, they can also
remove the access function, etc., but as an incremental step, I'm
maintaining the API where it's currently used in tree.
If these requirements are *not* met, call sites with access to the ilist
can call `iplist<NodeTy>::getNextNode(NodeTy*)` directly, as in
ilistTest.cpp.
Why rewrite this?
The old code was broken, calling `getNext()` on a sentinel that possibly
didn't have a "next" pointer at all! The new code avoids that
particular flavour of UB (see the commit message for r252538 for more
details about the "lucky" memory layout that made this function so
interesting).
There's still some UB here: the end iterator gets downcast to `NodeTy*`,
even when it's a sentinel (which is typically
`ilist_half_node<NodeTy*>`). I'll tackle that in follow-up commits.
See this llvm-dev thread for more details:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
What's the danger?
There might be some code that relies on `getNextNode()` or
`getPrevNode()` *never* returning `nullptr` -- i.e., that relies on them
being broken when the sentinel is an `ilist_half_node<NodeTy>`. I tried
to root out those cases with the audits I did leading up to r252380, but
it's possible I missed one or two. I hope not.
(If (1) you have out-of-tree code, (2) you've reverted r252380
temporarily, and (3) you get some weird crashes with this commit, then I
recommend un-reverting r252380 and auditing the compile errors looking
for "strange" implicit conversions.)
Right now isTruePredicate is only ever called with Pred == ICMP_SLE or
ICMP_ULE, and the ICMP_SLT and ICMP_ULT cases are dead. This change
removes the untested dead code so that the function is not misleading.