Sanjay Patel [Wed, 25 Mar 2015 17:34:11 +0000 (17:34 +0000)]
use update_llc_test_checks.py to tighten checking in these tests
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.
Justin Bogner [Wed, 25 Mar 2015 08:07:47 +0000 (08:07 +0000)]
test: Fix the dependencies for the check-llvm-* targets
In r233009 we gained specific check-llvm-* build targets for invoking
specific parts of the test suite, but they were copying the
dependencies for check-all, rather than just listing the dependencies
for check-llvm.
This moves the creation of these targets next to the check-llvm
target, and uses that target's configuration rather than the check-all
config.
Craig Topper [Wed, 25 Mar 2015 04:16:50 +0000 (04:16 +0000)]
[X86] Remove GetCpuIDAndInfo, GetCpuIDAndInfoEx and DetectFamilyModel functions from X86 MC layer. They haven't been used since CPU autodetection was removed from X86Subtarget.cpp.
Linker: Temporarily disable dwarfdump checks from r233164
At least one Linux bot [1] doesn't like my dwarfdump checks, so I've
disable those until I can investigate what's going on there. I'll
continue to track this in PR22792.
Linker: Drop function pointers for overridden subprograms
Instead of dropping subprograms that have been overridden, just set
their function pointers to `nullptr`. This is a minor adjustment to the
stop-gap fix for PR21910 committed in r224487, and fixes the crasher
from PR22792.
The problem that r224487 put a band-aid on: how do we find the canonical
subprogram for a `Function`? Since the backend currently relies on
`DebugInfoFinder` (which does a naive in-order traversal of compile
units and picks the first subprogram) for this, r224487 tried dropping
non-canonical subprograms.
Dropping subprograms fails because the backend *also* builds up a map
from subprogram to compile unit (`DwarfDebug::SPMap`) based on the
subprogram lists. A missing subprogram causes segfaults later when an
inlined reference (such as in this testcase) is created.
Instead, just drop the `Function` pointer to `nullptr`, which nicely
mirrors what happens when an already-inlined `Function` is optimized
out. We can't really be sure that it's the same definition anyway, as
the testcase demonstrates.
This still isn't completely satisfactory. Two flaws at least that I can
think of:
- I still haven't found a straightforward way to make this symmetric
in the IR. (Interestingly, the DWARF output is already symmetric,
and I've tested for that to be sure we don't regress.)
- Using `DebugInfoFinder` to find the canonical subprogram for a
function is kind of crazy. We should just attach metadata to the
function, like this:
Paul Robinson [Wed, 25 Mar 2015 00:10:24 +0000 (00:10 +0000)]
'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
Philip Reames [Tue, 24 Mar 2015 23:54:54 +0000 (23:54 +0000)]
!invariant.load semantics with potentially clobbering calls
A load from an invariant location is assumed to not alias any otherwise potentially aliasing stores. Our implementation only applied this rule to store instructions themselves whereas they it should apply for any memory accessing instruction. This results in both FRE and PRE becoming more effective at eliminating invariant loads.
Note that as a follow on change I will likely move this into AliasAnalysis itself. That's where the TBAA constant flag is handled and the semantics are essentially the same. I'd like to separate the semantic change from the refactoring and thus have extended the hack that's already in MemoryDependenceAnalysis for this change.
Reid Kleckner [Tue, 24 Mar 2015 23:46:01 +0000 (23:46 +0000)]
X86: Fix frameescape when not using an FP
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.
Justin Bogner [Tue, 24 Mar 2015 23:34:36 +0000 (23:34 +0000)]
llvm-cov: Require a subcommand when invoked as llvm-cov
A while ago llvm-cov gained support for clang's instrumentation based
profiling in addition to its gcov support, and subcommands were added
to choose which behaviour to use. When no subcommand was specified, we
fell back to gcov compatibility with a warning that a subcommand would
be required in the future. Now, we require the subcommand.
Note that if the basename of llvm-cov is gcov (via symlink or
hardlink, for example), we still use the gcov compatible behaviour
with no subcommand required.
David Blaikie [Tue, 24 Mar 2015 23:34:31 +0000 (23:34 +0000)]
Opaque Pointer Types: GEP API migrations to specify the gep type explicitly
The changes to InstCombine (& SCEV) do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
SCEV looks like it'll need some restructuring - we'll have to do a bit
more work for GEP canonicalization, since it'll depend on how it's used
if we can even manage to canonicalize it to a non-ugly GEP. I guess we
can do some fun stuff like voting (do 2 out of 3 load from the GEP with
a certain type that gives a pretty GEP? Does every typed use of the GEP
use either a specific type or a generic type (i8*, etc)?)
Frederic Riss [Tue, 24 Mar 2015 23:11:07 +0000 (23:11 +0000)]
[dsymutil] Temporarily disable some tests on windows.
It seems one windows bot fails since I added ilne table linking to
llvm-dsymutil (see r232333 commit thread).
Disable the affected tests until I can figure out what's happening.
Sanjay Patel [Tue, 24 Mar 2015 22:39:29 +0000 (22:39 +0000)]
optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.
This is an extension of r232852, r233100, and 233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110
David Blaikie [Tue, 24 Mar 2015 22:38:16 +0000 (22:38 +0000)]
Opaque Pointer Types: GEP API migrations to specify the gep type explicitly
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
Philip Reames [Tue, 24 Mar 2015 22:28:45 +0000 (22:28 +0000)]
Merge empty landing pads in SimplifyCFG
This patch tries to merge duplicate landing pads when they branch to a common shared target.
Given IR that looks like this:
lpad1:
%exn = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
lpad2:
%exn2 = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
shared_resume:
call void @fn()
ret void
}
We can rewrite the users of both landing pad blocks to use one of them. This will generally allow the shared_resume block to be merged with the common landing pad as well.
Without this change, tail duplication would likely kick in - creating N (2 in this case) copies of the shared_resume basic block.
AArch64: use a different means to determine whether to byte swap relocations.
This code depended on a bug in the FindAssociatedSection function that would
cause it to return the wrong result for certain absolute expressions. Instead,
use EvaluateAsRelocatable.
David Blaikie [Tue, 24 Mar 2015 21:31:31 +0000 (21:31 +0000)]
Remove an InstCombine that seems to have become redundant.
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.
Sanjay Patel [Tue, 24 Mar 2015 20:36:42 +0000 (20:36 +0000)]
[X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.
This is also a continuation of the transform that started in D8486.
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle.
Sanjoy Das [Tue, 24 Mar 2015 19:29:22 +0000 (19:29 +0000)]
[IRCE] Fix how IRCE checks for no-sign-overflow.
IRCE requires the induction variables it handles to not sign-overflow.
The current scheme of checking if sext({X,+,S}) == {sext(X),+,sext(S)}
fails when SCEV simplifies sext(X) too. After this change we //also//
check no-signed-wrap by looking at the flags set on the SCEVAddRecExpr.
DebugInfo: Reorder definitions of MDLocation and MDFile, NFC
Move definition of `MDLocation` after `MDLocalScope` so that the latter
is available for casts in the former. Similarly, move the definition of
`MDFile` as early as possible so that other classes can cast to it in
their definitions. (Follow-up commits will take advantage of this.)
DebugInfo: Add MDLocalScope, a legal scope for locals
Add a subclass of `MDScope` to explicitly categorize the legal scopes
for locals -- in particular, scopes that are legal for `MDLocation`,
`MDLexicalBlockBase`, and `MDLocalVariable`. This provides a convenient
`isa<>` target for the verifier, and eventually I'll be changing the
above classes' `getScope()` to specifically return it. Currently, its
subclasses are `MDSubprogram`, `MDLexicalBlock`, and
`MDLexicalBlockFile`.
I've gone with `MDLocalScope` for now -- a little ambiguous since it's a
scope *for* locals, not a scope that's local -- but I'm open to more
descriptive names if someone can think of something better. Regardless,
the code docs should make it clear enough.
It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time.
Daniel Sanders [Tue, 24 Mar 2015 11:26:34 +0000 (11:26 +0000)]
[mips] Distinguish 'R', 'ZC', and 'm' inline assembly memory constraint.
Summary:
Previous behaviour of 'R' and 'm' has been preserved for now. They will be
improved in subsequent commits.
The offset permitted by ZC varies according to the subtarget since it is
intended to match the restrictions of the pref, ll, and sc instructions.
The restrictions on these instructions are:
* For microMIPS: 12-bit signed offset.
* For Mips32r6/Mips64r6: 9-bit signed offset.
* Otherwise: 16-bit signed offset.
James Molloy [Tue, 24 Mar 2015 11:15:23 +0000 (11:15 +0000)]
"float2int": Add a new pass to demote from float to int where possible.
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.
This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.
This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.
To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first time this was committed (r229831), it caused several buildbot failures.
At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed.
Lang Hames [Tue, 24 Mar 2015 04:07:01 +0000 (04:07 +0000)]
[Orc] Use std::string to capture name by value.
This just updates the code to reflect the comment, but this bug actually hit the
out-of-tree lazy demo. I'm working on a patch to add the lazy-demo's
functionality to lli so that we can test this in-tree soon.
DebugInfo: Overload get() in DIDescriptor subclasses
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
David Blaikie [Mon, 23 Mar 2015 21:17:43 +0000 (21:17 +0000)]
Cleanup else-after-return and add an early-return to llvm-nm
The loop and error handling in checkMachOAndArchFlags didn't make sense
to me (a loop that only ever executes once? An error path that uses the
element the loop stopped at (which must always be a buffer overrun if
I'm reading that right?)... I'm confused) but I've made a guess at what
was intended.
Based on a patch by Richard Thomson to simplify boolean expressions.
Ahmed Bougacha [Mon, 23 Mar 2015 21:17:36 +0000 (21:17 +0000)]
[AArch64, ARM] Enable GlobalMerge with -O3 rather than -O1.
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
Chris Bieneman [Mon, 23 Mar 2015 20:04:00 +0000 (20:04 +0000)]
Re-land: Generate targets for each lit suite.
Summary:
This change makes CMake scan for lit suites and generate a target for each lit test suite. The targets follow the format check-<project>-<suite path>.
For example:
check-llvm-unit - Runs the LLVM unit tests
check-llvm-codegen-arm - Runs the ARM codeine tests
Note: These targets are not generated during multi-configuration generators (i.e. Xcode and Visual Studio) because target clutter impacts UI usability.
* Also fixed a minor issue that Duncan pointed out to me I was passing the suite to lit twice
David Blaikie [Mon, 23 Mar 2015 19:45:40 +0000 (19:45 +0000)]
Refactor: Simplify boolean expressions in llvm Support
Simplify boolean expressions using `true` and `false` with `clang-tidy`
Patch by Richard Thomson - I dropped the parens and != 0 test, for
consistency with other patches/tests like this, but I'm open to the
notion that we should add the explicit non-zero test in all these sort
of cases (non-bool assigned to a bool).
Matt Arsenault [Mon, 23 Mar 2015 18:45:30 +0000 (18:45 +0000)]
R600/SI: Allow commuting compares
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k + 1)
if this makes k no longer an inline immediate value.
David Blaikie [Mon, 23 Mar 2015 18:39:02 +0000 (18:39 +0000)]
Refactor: simplify boolean expressions in llvm-objdump
Simplify boolean expressions involving `true` and `false` with `clang-tidy`.
Actually upon inspection a bunch of these boolean variables could be
factored away entirely anyway - using find_if and then testing the
result before using it. This also helps reduce indentation in the code
anyway - and a bunch of other related simplification fell out nearby so
I just committed all of that.