Mehdi Amini [Tue, 6 Oct 2015 17:19:20 +0000 (17:19 +0000)]
This patch builds on top of D13378 to handle constant condition.
With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark):
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x++;
From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c
Daniel Sanders [Tue, 6 Oct 2015 15:17:25 +0000 (15:17 +0000)]
[mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet
Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...
There was problem with selecting sqrt instruction in LLVM backend.
To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.
[InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.
Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.
This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.
[bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
int pid;
char name[16];
};
extern void test1(char *);
int test() {
struct key_t key = {};
test1(key.name);
return 0;
}
For key.name, the llc/bpf may generate the below code:
R1 = R10 // R10 is the frame pointer
R1 += -24 // framepointer adjustment
R1 |= 4 // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.
This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
R1 = R10
R1 += -20
Craig Topper [Tue, 6 Oct 2015 02:50:24 +0000 (02:50 +0000)]
[X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.
Dan Gohman [Tue, 6 Oct 2015 00:27:55 +0000 (00:27 +0000)]
[WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.
Adrian Prantl [Mon, 5 Oct 2015 23:11:20 +0000 (23:11 +0000)]
dsymutil: Don't prune forward declarations inside of an imported TAG_module
if there exists not definition for the type.
For this to work, we need to clone the imported modules before building
the decl context chains of the DIEs in the non-skeleton CUs.
Diego Novillo [Mon, 5 Oct 2015 21:08:05 +0000 (21:08 +0000)]
Remove AutoFDO profile handling for GCC's LIPO. NFC.
Given the work we are doing on ThinLTO, we will never need to support
module groups and working sets in GCC's implementation of LIPO. These
are currently dead code, and will continue to be so.
David Majnemer [Mon, 5 Oct 2015 20:09:16 +0000 (20:09 +0000)]
[WinEH] Update CATCHRET's operand to match its successor
The CATCHRET operand did not match the MachineFunction's CFG. This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.
Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.
Tom Stellard [Mon, 5 Oct 2015 17:57:39 +0000 (17:57 +0000)]
AMDGPU/SI: Add a helper for creating aliases for the _e32 instructions
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.
These aliases allow for the automatic matching of instructions
with forced 32-bit encoding. Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.
Scott Douglass [Mon, 5 Oct 2015 14:49:54 +0000 (14:49 +0000)]
[ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.
Fixes PR9199 and PR23768.
[This is based on Peter Collingbourne's r238473 which was reverted.]
[MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.
Daniel Sanders [Mon, 5 Oct 2015 13:19:29 +0000 (13:19 +0000)]
[mips] Changed the way symbols are handled in dla and la instructions to allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
Teresa Johnson [Sun, 4 Oct 2015 14:33:43 +0000 (14:33 +0000)]
Support for function summary index bitcode sections and files.
Summary:
The bitcode format is described in this document:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
For more info on ThinLTO see:
https://sites.google.com/site/llvmthinlto
The first customer is ThinLTO, however the data structures are designed
and named more generally based on prior feedback. There are a few
comments regarding how certain interfaces are used by ThinLTO, and the
options added here to gold currently have ThinLTO-specific names as the
behavior they provoke is currently ThinLTO-specific.
This patch includes support for generating per-module function indexes,
the combined index file via the gold plugin, and several tests
(more are included with the associated clang patch D11908).
David Majnemer [Sun, 4 Oct 2015 02:22:52 +0000 (02:22 +0000)]
[WinEH] Permit branch folding in the face of funclets
Track which basic blocks belong to which funclets. Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.
visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them.
This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation.
No codegen intended and all existing tests remain the same.
Sanjoy Das [Sat, 3 Oct 2015 00:34:19 +0000 (00:34 +0000)]
Try to appease MSVC, NFCI.
This time by lifting the lambda's in `createNodeFromSelectLikePHI` to
the file scope. Looks like there are differences in capture rules
between clang and MSVC?
[libFuzzer] make LLVMFuzzerTestOneInput (the fuzzer target function) return int instead of void. The actual return value is not *yet* used (and expected to be 0). This change is API breaking, so the fuzzers will need to be updated.
Sanjoy Das [Fri, 2 Oct 2015 23:24:52 +0000 (23:24 +0000)]
Fix comment ASCII art to unbreak the gcc 4.9.1 build
The trailing backslashes in some ASCII art added in r248527 cause a
"error: multi-line comment [-Werror=comment]" when building with gcc
4.9.1 -Wall. Swallow (ASCII-)artistic integrity and use pipes instead.
Piotr Padlewski [Fri, 2 Oct 2015 22:12:22 +0000 (22:12 +0000)]
inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
Tim Northover [Fri, 2 Oct 2015 18:07:18 +0000 (18:07 +0000)]
ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.
Hal Finkel [Fri, 2 Oct 2015 17:50:28 +0000 (17:50 +0000)]
[lit] Raise the default soft process limit when possible
It is common to have a default soft process limit, at least on some families of
Linux distributions, of 1024. This is normally more than enough, but if you
have many cores, and you're running tests that create many threads, this can
become a problem. My POWER7 development machine has 48 cores, and when running
the lld regression tests, which often want to create up to 48 threads, I run
into problems. lit, by default, will want to run 48 tests in parallel, and
48*48 < 1024, and so many tests fail like this:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
or lit fails like this when launching a test:
lit can easily detect this situation and attempt to repair it before launching
tests (by raising the soft process limit to something that will allow ncpus^2
threads to be created), and should do so to prevent spurious test failures.
This is the follow-up to this thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/090942.html
Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Originally reviewed here: http://reviews.llvm.org/D13347
[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Richard Smith [Fri, 2 Oct 2015 00:46:33 +0000 (00:46 +0000)]
DenseMap: we're trying to call the reserved global placement allocation
function here; use "::new" to avoid accidentally picking up a class-specific
operator new.