John Brawn [Fri, 20 Mar 2015 17:20:07 +0000 (17:20 +0000)]
[ARM] Fix handling of thumb1 out-of-range frame offsets
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Eric Christopher [Fri, 20 Mar 2015 16:03:42 +0000 (16:03 +0000)]
Rewrite StackMap location handling to pre-compute the dwarf register
numbers before emission.
This removes a dependency on being able to access TRI at the module
level and is similar to the DwarfExpression handling. I've modified
the debug support into print/dump routines that'll do the same dumping
but is now callable anywhere and if TRI isn't available will go ahead
and just print out raw register numbers.
Eric Christopher [Fri, 20 Mar 2015 16:03:39 +0000 (16:03 +0000)]
At the beginning of doFinalization set the MachineFunction to
nullptr so that users get an earlier dereferencing error and
so that we can use it to conditionalize access to MachineFunction
specific data.
Daniel Jasper [Fri, 20 Mar 2015 10:00:37 +0000 (10:00 +0000)]
[MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.
Nick Lewycky [Fri, 20 Mar 2015 02:25:00 +0000 (02:25 +0000)]
When simplifying a SCEV truncate by distributing, consider it a simplification to replace a cast, even if we end up with a trunc around the term. Fixes PR22960!
Don't use `DebugLoc` accessors if we're pointing at null, which will be
a problem after a WIP patch to make the `DIDescriptor` accessors more
strict. Caught by Frontend/profile-sample-use-loc-tracking.c (in
clang).
Verifier: Remove the separate DebugInfoVerifier class
Remove the separate `DebugInfoVerifier` class, as a partial step toward
better integrating debug info verification with the `Verifier`.
Right now, verification of debug info is kind of a mess.
- There are `DIDescriptor::Verify()` checks live in `DebugInfo.cpp`.
These return `bool`, and there's no way to see (except by opening a
debugger) why they fail.
- We rely on `DebugInfoFinder` to traverse the debug info graph and
dig up nodes. However, the regular `Verifier` visits many of these
nodes when it calls into debug info intrinsic operands. Visiting
twice and running different checks is kind of absurd.
- Moreover, `DebugInfoFinder` asserts on failed type resolution -- the
verifier should never assert!
By integrating the two verifiers, I'm aiming at solving these problems
(work to be done, obviously). Verification can be localized to the
`Verifier`; we can use a naive `MDNode` operand traversal to find all
the nodes; we can verify type references instead of asserting on
failure.
There are `assert()`s sprinkled throughout the optimizer and dwarf
backend on `DIDescriptor::Verify()` checks. This is a hangover from
when the debug info verifier was off, so I plan to remove them as I go
(once I confirm that the checks are done at verification time).
Note: to keep the behaviour of only running the debug info verifier when
-verify succeeds, I've added an `EverBroken` flag. Once the
`DebugInfoFinder` assertions are gone and the two traversals have been
merged, I expect to be able to remove this.
This works in a similar way to the gold plugin tests. We search for a compatible
linker on $PATH and use it to run tests against our just-built libLTO. To start
with, test the just added opt level functionality.
Eric Christopher [Thu, 19 Mar 2015 23:27:42 +0000 (23:27 +0000)]
Use the cached subtarget on the MachineFunction when the AsmPrinter
will have a MachineFunction, i.e. in places other than the module
level doInitialize/doFinalize.
Owen Anderson [Thu, 19 Mar 2015 22:48:57 +0000 (22:48 +0000)]
Fix a nasty bug in DAGCombine of STORE nodes.
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
Eric Christopher [Thu, 19 Mar 2015 22:36:37 +0000 (22:36 +0000)]
Add an MCSubtargetInfo variable to the TargetMachine.
This enables us to remove calls to the subtarget from the TargetMachine
and with a small hack for backends that require global subtarget
information for module level code generation, e.g. mips abi flags, as
mentioned in a fixme in the code.
Sanjay Patel [Thu, 19 Mar 2015 22:29:40 +0000 (22:29 +0000)]
[X86, AVX] use blends instead of insert128 with index 0
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.
Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
libLTO, llvm-lto, gold: Introduce flag for controlling optimization level.
This change also introduces a link-time optimization level of 1. This
optimization level runs only the globaldce pass as well as cleanup passes for
passes that run at -O0, specifically simplifycfg which cleans up lowerbitsets.
PassManagerBuilder: Remove effectively dead 'StripDebug' option
`StripDebug` was only used by tools/opt/opt.cpp in
`AddStandardLinkPasses()`, but opt.cpp adds the same pass based on its
command-line flag before it calls `AddStandardLinkPasses()`. Stripping
debug info twice isn't very useful.
GlobalDCE: Improve performance for large modules containing comdats.
When we encounter a global with a comdat, rather than iterating over
every global in the module to find globals in the same comdat, store the
members in a multimap. This effectively lowers the complexity to O(N log N),
improving performance significantly for large modules such as might be
encountered during LTO.
It looks like we used to do something like this until r219191.
Artem Belevich [Thu, 19 Mar 2015 17:05:35 +0000 (17:05 +0000)]
Add support for __nvvm_reflect changes in libdevice in CUDA-7.0
Summary:
CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect
and that triggers an assertion in nvvm_reflect optimization pass. This
change allows nvvm_reflect pass to deal with both old and new ways to
pass an argument to __nvvm_reflect.
Chris Bieneman [Thu, 19 Mar 2015 16:49:44 +0000 (16:49 +0000)]
Fixing dependencies for native tablegen.
The dependencies for cross-built tablegen were a bit confused. This fixes that. The following dependencies are now enforced:
(1) Tablegen tasks depend on the native tablegen
(2) Native tablegen depends on the cross-compiled tablegen
Although the native tablegen doesn't actually require the cross tablegen, having this dependency forces the native tablegen to rebuild whenever the cross tablegen changes.
Greg Bedwell [Thu, 19 Mar 2015 16:32:47 +0000 (16:32 +0000)]
[CMake] Don't pass in MSVC warning flags as definitions.
NFC currently but required as a prerequisite for using
the Microsoft resource compiler in conjunction with
CMake's ninja generator, which knows how to filter flags
appropriately, but not definitions.
Daniel Jasper [Thu, 19 Mar 2015 11:05:08 +0000 (11:05 +0000)]
[InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.
However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).
The attached test case is derived from:
typedef unsigned uint32;
typedef unsigned char uint8;
inline uint8 *f(uint32 value, uint8 *target) {
while (value >= 0x80) {
value >>= 7;
++target;
}
++target;
return target;
}
uint8 *g(uint32 b, uint8 *target) {
target = f(b, f(42, target));
return target;
}
What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".
In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:
Justin Bogner [Thu, 19 Mar 2015 04:45:16 +0000 (04:45 +0000)]
llvm-cov: Rename -color={always|never} to -use-color[=0]
This is an ugly hack to fix the configure --enable-shared build. It
turns out that *every cl::opt in LLVM* shows up in *every tool* in
that configuration, which is hopelessly broken. This skirts around the
issue by not colliding with another option's name, for now.
I've also simplified the option implementation - the other "color"
option used cl::boolOrDefault and was much nicer than what I'd written
before.
Rafael Espindola [Thu, 19 Mar 2015 01:50:16 +0000 (01:50 +0000)]
Split the object streamer callback in one per file format.
There are two main advantages to doing this
* Targets that only need to handle one of the formats specially don't have
to worry about the others. For example, x86 now only registers a
constructor for the COFF streamer.
* Changes to the arguments passed to one format constructor will not impact
the other formats.
I don't understand why this error is happening, and I don't see it on
any other bots or on my own machine, so I'm kind of grasping at
straws. Try using an unscoped enum and specifying a cl::init to see if
they help.
Matthias Braun [Thu, 19 Mar 2015 00:21:58 +0000 (00:21 +0000)]
Do not track subregister liveness when it brings no benefits
Some subregisters are only to indicate different access sizes, while not
providing any way to actually divide the register up into multiple
disjunct parts. Avoid tracking subregister liveness in these cases as it
is not beneficial.
Rafael Espindola [Wed, 18 Mar 2015 22:19:16 +0000 (22:19 +0000)]
Teach getDefaultFormat that we only support ELF on some architectures.
This should bring the windows bots back.
It is a bit ugly, but it is better than what we had before: The triple would
say that the object format was COFF, but llc/llvm-mc would produce an ELF.
Chris Bieneman [Wed, 18 Mar 2015 21:19:06 +0000 (21:19 +0000)]
Generate targets for each lit suite.
Summary:
This change makes CMake scan for lit suites and generate a target for each lit test suite. The targets follow the format check-<project>-<suite path>.
For example:
check-llvm-unit - Runs the LLVM unit tests
check-llvm-codegen-arm - Runs the ARM codeine tests
Note: These targets are not generated during multi-configuration generators (i.e. Xcode and Visual Studio) because target clutter impacts UI usability.
Eric Christopher [Wed, 18 Mar 2015 20:37:30 +0000 (20:37 +0000)]
Revert "Migrate the AArch64 TargetRegisterInfo to its TargetMachine"
as we don't necessarily need to do this yet - though we could move
the base class to the TargetMachine as it isn't subtarget dependent.
Reid Kleckner [Wed, 18 Mar 2015 20:09:13 +0000 (20:09 +0000)]
CMake: Disable ENABLE_EXPORTS for executables with MSVC
The MSVC linker won't produce a .lib file for an executable that doesn't
export anything, and LLVM doesn't maintain dllexport annotations or .def
files listing all C++ symbols. It also doesn't support exporting all
symbols, like binutils ld.
CMake 3.2 changed the Ninja generator to list both the .exe and .lib
files as outputs of executable build targets. Ninja would always re-link
executables with ENABLE_EXPORTS because the .lib output file was not
present, and therefore the target was out of date.
Simon Pilgrim [Wed, 18 Mar 2015 19:35:31 +0000 (19:35 +0000)]
[X86][SSE] Avoid scalarization of v2i64 vector shifts
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.
This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).
Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.
Matthias Braun [Wed, 18 Mar 2015 17:56:09 +0000 (17:56 +0000)]
TableGen: Fix register class lane masks being too conservative.
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.
This fixes problems when coalescing towards a subclass with additional
subregisters available.
The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.
Sanjay Patel [Wed, 18 Mar 2015 16:07:10 +0000 (16:07 +0000)]
fixed to test features, not CPU model
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half,
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.
The lax checking in this file will be addressed in
another commit. There are bigger problems here.
Daniel Jasper [Wed, 18 Mar 2015 12:45:45 +0000 (12:45 +0000)]
Change test to accept an additional critical edge split.
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.
John Brawn [Wed, 18 Mar 2015 12:01:59 +0000 (12:01 +0000)]
[ARM] Align stack objects passed to memory intrinsics
Memcpy, and other memory intrinsics, typically tries to use LDM/STM if
the source and target addresses are 4-byte aligned. In CodeGenPrepare
look for calls to memory intrinsics and, if the object is on the
stack, 4-byte align it if it's large enough that we expect that memcpy
would want to use LDM/STM to copy it.
Yaron Keren [Wed, 18 Mar 2015 10:17:07 +0000 (10:17 +0000)]
Remove many superfluous SmallString::str() calls.
Now that SmallString is a first-class citizen, most SmallString::str()
calls are not required. This patch removes a whole bunch of them, yet
there are lots more.
There are two use cases where str() is really needed:
1) To use one of StringRef member functions which is not available in
SmallString.
2) To convert to std::string, as StringRef implicitly converts while
SmallString do not. We may wish to change this, but it may introduce
ambiguity.
Kai Nacke [Wed, 18 Mar 2015 06:28:38 +0000 (06:28 +0000)]
[mips] Add itineraries for ext and ins instructions.
Currently, there are no itineraries defined for ext and ins instructions.
This patch adds these itineraries and uses them in the instruction definitions.
Josh Magee [Wed, 18 Mar 2015 01:34:06 +0000 (01:34 +0000)]
Add testcases for BEXTR.
These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous. From a private patch equivalent to r189742/PR17028.