David Blaikie [Mon, 23 Mar 2015 21:17:43 +0000 (21:17 +0000)]
Cleanup else-after-return and add an early-return to llvm-nm
The loop and error handling in checkMachOAndArchFlags didn't make sense
to me (a loop that only ever executes once? An error path that uses the
element the loop stopped at (which must always be a buffer overrun if
I'm reading that right?)... I'm confused) but I've made a guess at what
was intended.
Based on a patch by Richard Thomson to simplify boolean expressions.
Ahmed Bougacha [Mon, 23 Mar 2015 21:17:36 +0000 (21:17 +0000)]
[AArch64, ARM] Enable GlobalMerge with -O3 rather than -O1.
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function-static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
Chris Bieneman [Mon, 23 Mar 2015 20:04:00 +0000 (20:04 +0000)]
Re-land: Generate targets for each lit suite.
Summary:
This change makes CMake scan for lit suites and generate a target for each lit test suite. The targets follow the format check-<project>-<suite path>.
For example:
check-llvm-unit - Runs the LLVM unit tests
check-llvm-codegen-arm - Runs the ARM codegen tests
Note: These targets are not generated during multi-configuration generators (i.e. Xcode and Visual Studio) because target clutter impacts UI usability.
* Also fixed a minor issue that Duncan pointed out: I was passing the suite to lit twice.
David Blaikie [Mon, 23 Mar 2015 19:45:40 +0000 (19:45 +0000)]
Refactor: Simplify boolean expressions in llvm Support
Simplify boolean expressions using `true` and `false` with `clang-tidy`
Patch by Richard Thomson - I dropped the parens and != 0 test, for
consistency with other patches/tests like this, but I'm open to the
notion that we should add the explicit non-zero test in all these sorts
of cases (non-bool assigned to a bool).
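A minimal sketch of the kind of rewrite described above. The function names are hypothetical and are not taken from the patch; the last two functions contrast the implicit conversion this commit keeps with the explicit non-zero test it mentions as an alternative.

```cpp
#include <cassert>

// Hypothetical examples for illustration; not code from the patch.

// Before: the condition is already a bool, so the ternary is redundant.
bool isEmptyBefore(int Size) { return Size == 0 ? true : false; }

// After: return the condition directly.
bool isEmptyAfter(int Size) { return Size == 0; }

// Non-bool assigned to a bool, relying on the implicit conversion
// (the style kept in this commit).
bool hasFlagImplicit(unsigned Flags, unsigned Mask) {
  bool Set = Flags & Mask;
  return Set;
}

// The same test with the explicit non-zero comparison.
bool hasFlagExplicit(unsigned Flags, unsigned Mask) {
  bool Set = (Flags & Mask) != 0;
  return Set;
}
```

Both spellings in each pair behave identically; the difference is only in how visible the conversion to bool is to the reader.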
Matt Arsenault [Mon, 23 Mar 2015 18:45:30 +0000 (18:45 +0000)]
R600/SI: Allow commuting compares
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k - 1)
if this makes k no longer an inline immediate value.
David Blaikie [Mon, 23 Mar 2015 18:39:02 +0000 (18:39 +0000)]
Refactor: simplify boolean expressions in llvm-objdump
Simplify boolean expressions involving `true` and `false` with `clang-tidy`.
Actually upon inspection a bunch of these boolean variables could be
factored away entirely anyway - using find_if and then testing the
result before using it. This also helps reduce indentation in the code
anyway - and a bunch of other related simplification fell out nearby so
I just committed all of that.
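As an illustration of factoring a boolean flag away with find_if, here is a small sketch. The Section type and both functions are hypothetical stand-ins, not the llvm-objdump code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical data type for illustration only.
struct Section {
  std::string Name;
  unsigned Size;
};

// Before: a flag records whether the loop found a match.
unsigned sizeOfBefore(const std::vector<Section> &Sections,
                      const std::string &Name) {
  bool Found = false;
  unsigned Size = 0;
  for (const Section &S : Sections) {
    if (S.Name == Name) {
      Found = true;
      Size = S.Size;
      break;
    }
  }
  if (Found)
    return Size;
  return 0;
}

// After: find_if returns an iterator, which is tested before use.
// The flag disappears and the body loses a level of indentation.
unsigned sizeOfAfter(const std::vector<Section> &Sections,
                     const std::string &Name) {
  auto It = std::find_if(Sections.begin(), Sections.end(),
                         [&](const Section &S) { return S.Name == Name; });
  if (It == Sections.end())
    return 0;
  return It->Size;
}
```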
Bradley Smith [Mon, 23 Mar 2015 16:52:52 +0000 (16:52 +0000)]
Revert "[ARM] Add more pattern matching for f16 <-> f64 conversions"
This change is incorrect since it converts double rounding into single rounding,
which can produce different results. Instead this optimization will be done by
modifying Clang's codegen to not produce double rounding in the first place.
James Molloy [Mon, 23 Mar 2015 16:15:16 +0000 (16:15 +0000)]
[ARM] Remove target-specific ITOFP/FPTOI nodes
Anton tried this 5 years ago but it was reverted due to extra VMOVs
being emitted. This can be easily fixed with a liberal application
of patterns - matching loads/stores and extractelts.
Petar Jovanovic [Mon, 23 Mar 2015 12:28:13 +0000 (12:28 +0000)]
Fix sign extension for MIPS64 in makeLibCall function
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
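To make the difference concrete, the sketch below computes the 64-bit register image of a 32-bit argument under sign extension and under zero extension. The helper names are hypothetical; this is not the makeLibCall code, only the arithmetic it has to get right.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sign-extend a 32-bit value to 64 bits, using only
// well-defined unsigned arithmetic (flip the sign bit, then subtract it).
uint64_t signExtend32To64(uint32_t Arg) {
  uint64_t Wide = Arg;
  return (Wide ^ 0x80000000ULL) - 0x80000000ULL;
}

// Hypothetical helper: zero extension, which is what an unsigned 32-bit
// value would otherwise get.
uint64_t zeroExtend32To64(uint32_t Arg) { return Arg; }
```

The two agree whenever bit 31 is clear. For an argument such as 0x80000000 they differ in the upper 32 bits, which is the case the commit message says must be sign extended even for unsigned int.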
Daniel Sanders [Mon, 23 Mar 2015 11:33:15 +0000 (11:33 +0000)]
[aarch64] Distinguish the 'Q' and 'm' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Clang also has code for 'Ump', 'Utf', 'Usa', and 'Ush' but calls
llvm_unreachable() on this code path so they are not converted to a
constraint id at the moment.
Hal Finkel [Mon, 23 Mar 2015 08:22:43 +0000 (08:22 +0000)]
[SDAG] Don't widen VSETCC during type legalization for split operands
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
Benjamin Kramer [Sat, 21 Mar 2015 21:09:33 +0000 (21:09 +0000)]
[SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check.
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.
int foo(char C) { return strchr("123!", C) != nullptr; } now becomes
cmpl $64, %edi ## range check
sbbb %al, %al
movabsq $0xE000200000001, %rcx
btq %rdi, %rcx ## bit test
sbbb %cl, %cl
andb %al, %cl ## and the two conditions
andb $1, %cl
movzbl %cl, %eax ## returning an int
ret
(imho the backend should expand this into a series of branches, but
that's a different story)
The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting an i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
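The set-membership test can be sketched in plain C++ as follows. This is an illustration of the idea, not the SimplifyLibCalls implementation; buildMask and isOneOf123Bang are hypothetical names. The mask covers the four characters plus the terminating NUL, which reproduces the 0xE000200000001 constant in the assembly above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build a 64-bit mask with one bit set per byte of Str[0..Len).
// Assumption: every byte value is below 64, so each fits in the mask.
constexpr uint64_t buildMask(const char *Str, std::size_t Len) {
  uint64_t Mask = 0;
  for (std::size_t I = 0; I != Len; ++I)
    Mask |= uint64_t(1) << static_cast<unsigned char>(Str[I]);
  return Mask;
}

// "123!" plus its terminating NUL: bits 0, 33, 49, 50 and 51.
constexpr uint64_t Mask123Bang = buildMask("123!", 5);
static_assert(Mask123Bang == 0xE000200000001ULL,
              "same constant as in the generated code");

bool isOneOf123Bang(unsigned char C) {
  // Range check first: shifting a 64-bit value by 64 or more is undefined.
  return C < 64 && ((Mask123Bang >> C) & 1) != 0;
}
```

The range check corresponds to the cmpl $64 in the listing and the shift-and-mask to the btq; anything containing letters needs bits above 63, which is the limitation described above.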
Benjamin Kramer [Sat, 21 Mar 2015 16:42:35 +0000 (16:42 +0000)]
StringRef: Just forward StringRef::find to libc's memchr.
Modern libcs have an SSE version of memchr which is a lot faster than our
hand-rolled version. In the past I was reluctant to use it because Darwin's
memchr used a naive, ridiculously slow implementation, but that was fixed
some versions ago.
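A rough sketch of what forwarding a character search to memchr looks like. findChar and NotFound are hypothetical names operating on std::string; this is modelled on the idea, not copied from StringRef.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sentinel for "no match".
constexpr std::size_t NotFound = static_cast<std::size_t>(-1);

// Hypothetical free function: find the first occurrence of C at or after
// index From by handing the remaining bytes to the C library's memchr.
std::size_t findChar(const std::string &Str, char C, std::size_t From = 0) {
  if (From >= Str.size())
    return NotFound;
  const void *P = std::memchr(Str.data() + From, C, Str.size() - From);
  if (!P)
    return NotFound;
  return static_cast<std::size_t>(static_cast<const char *>(P) - Str.data());
}
```

The bounds check comes first so that memchr is never handed a pointer past the end of the buffer.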
Benjamin Kramer [Sat, 21 Mar 2015 15:36:06 +0000 (15:36 +0000)]
ValueTracking: Forward getConstantStringInfo's TrimAtNul param into recursive invocation
Currently this is only used to tweak the backend's memcpy inlining
heuristics, so testing that isn't very helpful. A real test case will
follow in the next commit, where this behavior would cause a real
miscompilation.
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
Eric Christopher [Sat, 21 Mar 2015 04:22:23 +0000 (04:22 +0000)]
Remove the target independent TargetMachine::getSubtarget and
TargetMachine::getSubtargetImpl routines.
This keeps the target independent code free of bare subtarget
calls while the remainder of the backends are migrated, or not
if they don't wish to support per-function subtargets as would
be needed for function multiversioning or LTO of disparate
cpu subarchitecture types, e.g.
Eric Christopher [Sat, 21 Mar 2015 04:04:50 +0000 (04:04 +0000)]
Remove the bare getSubtargetImpl call from the AArch64 port. As part
of this add a test that shows we can generate code for functions
that specifically enable a subtarget feature.
Eric Christopher [Sat, 21 Mar 2015 03:36:02 +0000 (03:36 +0000)]
Remove the bare getSubtargetImpl call from the PPC port. As part
of this add a test that shows we can generate code
for functions that differ by subtarget feature.
Eric Christopher [Sat, 21 Mar 2015 03:17:25 +0000 (03:17 +0000)]
Grab a subtarget off of an AMDGPUTargetMachine rather than a
bare target machine in preparation for the TargetMachine bare
getSubtarget/getSubtargetImpl calls going away.
Eric Christopher [Sat, 21 Mar 2015 03:13:10 +0000 (03:13 +0000)]
Cache the Function dependent subtarget on the MachineFunction.
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
Eric Christopher [Sat, 21 Mar 2015 03:13:05 +0000 (03:13 +0000)]
Grab a subtarget off of a MipsTargetMachine rather than a
bare target machine in preparation for the TargetMachine bare
getSubtarget/getSubtargetImpl calls going away.
Eric Christopher [Sat, 21 Mar 2015 03:13:01 +0000 (03:13 +0000)]
Change getISAEncoding to use the target triple to determine
thumb-ness similar to the rest of the Module level asm printing
infrastructure as debug info finalization happens after the function
may be missing.
Ahmed Bougacha [Sat, 21 Mar 2015 01:23:15 +0000 (01:23 +0000)]
[CodeGen][IfCvt] Don't re-ifcvt blocks with unanalyzable terminators.
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
As part of PR22777, switch from `dyn_cast_or_null<>` to `cast<>` in most
`DIDescriptor` accessors. These classes are lightweight wrappers around
pointers, so the users should check for valid pointers before using
them.
This survives a Darwin clang -g bootstrap (after fixing testcases), but
it's possible the bots will complain about other configurations. I'll
fix any fallout as quickly as I can! Once this bakes for a bit I'll
remove the macros.
Note that `DebugLoc` implicitly gets stricter with this change as well,
since it forwards to `DILocation`. Any code that's using `DebugLoc`
accessors should check `DebugLoc::isUnknown()` first. (BTW, I'm also
partway through a cleanup of the `DebugLoc` API to make it more obvious
what it is (a glorified pointer wrapper) and remove cruft from before
the Metadata/Value split. I'll commit soon.)
Rafael Espindola [Fri, 20 Mar 2015 20:00:01 +0000 (20:00 +0000)]
Don't declare all text sections at the start of the .s
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
Rafael Espindola [Fri, 20 Mar 2015 19:48:54 +0000 (19:48 +0000)]
Reorganize the x86 ELF relocation selection logic.
The main differences are:
* Split in 32 and 64 bit functions.
* First switch on the Modifier so that we have only one non fully covered
switch.
* Map the fixup kind first to a x86_64 (or i386) specific enum, to make
it easy to handle cases like X86::reloc_riprel_4byte_movq_load.
* Switch on IsPCRel last, which reduces code duplication.
Verifier: Check that !dbg attachments have the right type
A WIP patch makes `DIDescriptor` accessors more strict, which in turn
causes the `DebugInfoFinder` to crash on wrongly typed `!dbg`
attachments. Catch that error up front in
`Verifier::visitInstruction()`.
Also remove a test that we "handle" invalid `!dbg` attachments, added
back in r99938. We don't want to handle those anymore.
Note: I'm *not* recursing and verifying the debug info graph reachable
from this node; that work is already done by `verifyDebugInfo()`.
This test is supposed to be testing whether metadata attachments to
instructions work, but it was using invalid debug info to do so. (This
was causing assertion failures in the `DebugInfoFinder` with a WIP patch
to be more strict about `DIDescriptor` accessors.)
Rather than fix the debug info -- which is better tested elsewhere --
just test the IR feature directly.
Wei Mi [Fri, 20 Mar 2015 18:33:12 +0000 (18:33 +0000)]
Correctly estimate SROA savings for store operands in inline cost analysis.
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
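The operand mix-up can be shown with a toy model. The types below are hypothetical and are not LLVM's classes; they only encode the operand order that matters here: a load has the address as operand 0, while a store has the stored value as operand 0 and the address as operand 1.

```cpp
#include <cassert>
#include <vector>

// Toy model for illustration; operands are just integer ids.
enum class ToyOpcode { Load, Store };

struct ToyInst {
  ToyOpcode Op;
  std::vector<int> Operands; // load: {address}; store: {value, address}
};

// The buggy pattern: always treating operand 0 as the address.
int addressViaOperand0(const ToyInst &I) { return I.Operands[0]; }

// The fixed pattern: ask for the pointer operand by role, not by index.
int toyPointerOperand(const ToyInst &I) {
  return I.Op == ToyOpcode::Load ? I.Operands[0] : I.Operands[1];
}
```

For a store, the first function returns the stored value rather than the address, so a check for whether the address derives from an alloca would inspect the wrong operand.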
John Brawn [Fri, 20 Mar 2015 17:20:07 +0000 (17:20 +0000)]
[ARM] Fix handling of thumb1 out-of-range frame offsets
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
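A toy model of why the answer depends on the base register. The function and enum are hypothetical, not the ARMBaseRegisterInfo code; the ranges are those of the Thumb1 word load/store encodings, where an SP-relative access has an 8-bit immediate scaled by 4 and an access through another low register has a 5-bit immediate scaled by 4.

```cpp
#include <cassert>

// Hypothetical classification of the base register.
enum class BaseKind { SP, Other };

// Toy legality check for a Thumb1 word-sized load/store offset.
//   SP-relative:     multiples of 4 in 0..1020
//   other registers: multiples of 4 in 0..124
bool isWordOffsetLegal(BaseKind Base, int Offset) {
  if (Offset < 0 || Offset % 4 != 0)
    return false;
  int Max = Base == BaseKind::SP ? 1020 : 124;
  return Offset <= Max;
}
```

An offset such as 512 is legal relative to SP but not relative to another base register, which is exactly the case where reusing a base register produced an unencodable instruction.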
Eric Christopher [Fri, 20 Mar 2015 16:03:42 +0000 (16:03 +0000)]
Rewrite StackMap location handling to pre-compute the dwarf register
numbers before emission.
This removes a dependency on being able to access TRI at the module
level and is similar to the DwarfExpression handling. I've modified
the debug support into print/dump routines that do the same dumping
but are now callable anywhere; if TRI isn't available they will go ahead
and just print out raw register numbers.
Eric Christopher [Fri, 20 Mar 2015 16:03:39 +0000 (16:03 +0000)]
At the beginning of doFinalization set the MachineFunction to
nullptr so that users get an earlier dereferencing error and
so that we can use it to conditionalize access to MachineFunction
specific data.