David Peixotto [Thu, 17 Oct 2013 19:52:05 +0000 (19:52 +0000)]
17309 ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
David Peixotto [Thu, 17 Oct 2013 19:49:22 +0000 (19:49 +0000)]
Refactor lowering for COPY_STRUCT_BYVAL_I32
This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
pseudo-instruction in the ARM backend. We introduce a new helper
class that encapsulates all of the operations needed during the
lowering. The operations are implemented for each subtarget in
different subclasses. Currently only arm and thumb2 subtargets are
supported.
This refactoring was done to easily implement support for thumb1
subtargets. This initial patch does not add support for thumb1, but
is only a refactoring. A follow on patch will implement the support
for thumb1 subtargets.
All of the Core API functions have versions which accept explicit context, in
addition to ones which work on global context. This commit adds functions
which accept explicit context to the Target API for consistency.
Chad Rosier [Thu, 17 Oct 2013 18:12:29 +0000 (18:12 +0000)]
[AArch64] Add support for NEON scalar three register different instruction
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
Hans Wennborg [Thu, 17 Oct 2013 17:49:57 +0000 (17:49 +0000)]
CMake: set stack size to 2MB for MSVC builds
Compiling under Visual C++ 2012 with the default stack size of 1MB, the stack
overflows at a depth of 216 template instantiations, well before the 256
default limit. This patch modifies the default MSVC stack size to 2MB.
Daniel Sanders [Thu, 17 Oct 2013 12:16:03 +0000 (12:16 +0000)]
[mips][msa] Removed ldx.[bhwd] and stx.[bhwd].
These were present in a previous version of the MSA spec but are not
present in the published version. There is no hardware that uses these
instructions.
Andrea Di Biagio [Thu, 17 Oct 2013 11:02:58 +0000 (11:02 +0000)]
Fix edge condition in DAGCombiner to improve codegen of shift sequences.
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.
llvm-c: Return NULL from LLVMGetFirstTarget instead of asserting
If no targets are registered, LLVMGetFirstTarget currently fails with
an assertion. This patch makes it return NULL instead, similarly to
how LLVMGetNextTarget would.
Jim Grosbach [Thu, 17 Oct 2013 02:58:06 +0000 (02:58 +0000)]
x86: Move bitcasts outside concat_vector.
Consider the following:
typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
aligned(2)));
typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
typedef int int4 __attribute__((ext_vector_type(4)));
This generates the, not unreasonable, IR:
define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
%tmp = bitcast double %v.coerce to <4 x i16>
%tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32
%0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
ret <4 x i32> %tmp2
}
The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
_foo0:
movsd %xmm0, -8(%rsp) <== Spill to the stack.
movq -8(%rsp), %xmm0 <== Reload it right back out.
pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for.
pblendw $1, %xmm1, %xmm0 <== We don't need this at all
pmovzxwd %xmm0, %xmm0 <== We already did this
ret
The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).
To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
_foo0:
pmovzxwd %xmm0, %xmm0
ret
Eric Christopher [Thu, 17 Oct 2013 02:06:06 +0000 (02:06 +0000)]
According to the dwarf standard pubnames and pubtypes for languages
like C++ should be the fully qualified names for the type.
Add a routine that does a language specific context walk to build
up the qualified name and use it when we add types/names to the
tables. Expand the gnu pubnames testcase as it's the most complex
to make sure that qualified types are also being added.
Filip Pizlo [Thu, 17 Oct 2013 01:38:28 +0000 (01:38 +0000)]
Expose install_fatal_error_handler() through the C API.
I expose the API with some caveats:
- The C++ API involves a traditional void* opaque pointer for the fatal
error callback. The C API doesn’t do this. I don’t think that the void*
opaque pointer makes any sense since this is a global callback - there will
only be one of them. So if you need to pass some data to your callback,
just put it in a global variable.
- The bindings will ignore the gen_crash_diag boolean. I ignore it because
(1) I don’t know what it does, (2) it’s not documented AFAIK, and (3) I
couldn’t imagine any use for it. I made the gut call that it probably
wasn’t important enough to expose through the C API.
Hans Wennborg [Thu, 17 Oct 2013 01:13:02 +0000 (01:13 +0000)]
Re-commit r192758 - MC: quote tricky symbol names in asm output
The reason this got reverted was that the @feat.00 symbol which was emitted
for every TU became quoted, and on cygwin/mingw we use the gas assembler which
couldn't handle the quotes.
This commit fixes the problem by only emitting @feat.00 for win32, where we use
clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no
loss there.
With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since
it uses the Itanium ABI, which doesn't put these funny characters in symbols.
> Because of win32 mangling, we produce symbol and section names with
> funny characters in them, most notably @ characters.
>
> MC would choke on trying to parse its own assembly output. This patch addresses
> that by:
>
> - Making @ trigger quoting of symbol names
> - Also quote section names in the same way
> - Just parse section names like other identifiers (to allow for quotes)
> - Don't assume @ signifies a symbol variant if it is in a string.
David Blaikie [Wed, 16 Oct 2013 22:43:10 +0000 (22:43 +0000)]
Invert arguments to ASSERT_EQ to match gtest diagnostic printing
GTest assumes the left hand side of the assert is the expectation and
the right hand side is the test result. It's easier to read gtest
failures when these things are ordered correctly.
Rafael Espindola [Wed, 16 Oct 2013 20:21:39 +0000 (20:21 +0000)]
Allow repeated registration again.
Our use of -fvisibility-inlines-hidden means we cannot check function pointers
against non null values.
Unfortunately, we also cannot assert that the callbacks are initialized only
once. The problem is that lldb has multiple subsystems that need to call this
and they don't have a unique initialization order.
Yunzhong Gao [Wed, 16 Oct 2013 19:04:11 +0000 (19:04 +0000)]
Enabling 3DNow! prefetch instruction for a few AMD processors: bobcat, jaguar,
bulldozer and piledriver. Support for the instruction itself seems to have
already been added in r178040.
Tom Stellard [Wed, 16 Oct 2013 17:06:02 +0000 (17:06 +0000)]
R600: Fix a crash in the AMDILCFGStructurizer
We were calling llvm_unreachable() when failing to optimize the
branch into if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192813 91177308-0d34-0410-b5e6-96231b3b80d8
Rafael Espindola [Wed, 16 Oct 2013 16:21:40 +0000 (16:21 +0000)]
Assert on duplicate registration. Don't depend on function pointer equality.
Before this patch we would assert when building llvm as multiple shared
libraries (cmake's BUILD_SHARED_LIBS). The problem was the line
if (T.AsmStreamerCtorFn == Target::createDefaultAsmStreamer)
which returns false because of -fvisibility-inlines-hidden. It is easy
to fix just this one case, but I decided to try to also make the
registration more strict. It looks like the old logic for ignoring
followup registration was just a temporary hack that outlived its
usefulness.
This patch converts the ifs to asserts, fixes the few cases that were
registering twice and makes sure all the asserts compare with null.
Thanks for Joerg for reporting the problem and reviewing the patch.
Benjamin Kramer [Wed, 16 Oct 2013 14:16:19 +0000 (14:16 +0000)]
DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant.
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
[asan] Optimize accesses to global arrays with constant index
Summary:
Given a global array G[N], which is declared in this CU and has static initializer
avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N.
Also add a bit of stats.
This eliminates ~1% of instrumentations on SPEC2006
and also partially helps when asan is being run together with coverage.
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
Handle (shl (anyext (shr ...))) in SimpilfyDemandedBits
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.
The motivating examples are the first two in the testcase, which occur
in llvmpipe output.
Bill Wendling [Wed, 16 Oct 2013 08:59:57 +0000 (08:59 +0000)]
Add a 'deleteModule' method to the Linker class.
This deletes the Module ivar instead of having the LTO code generater do it. It
also sets the pointer to 'NULL', so that if it's used again it will abort
quickly.
Eric Christopher [Wed, 16 Oct 2013 01:37:49 +0000 (01:37 +0000)]
Fix a pair of bugs in the emission of pubname tables:
1) Make sure we emit static member variables by checking
at the end of createGlobalVariableDIE rather than piecemeal
in the function.
(As a note, createGlobalVariableDIE needs rewriting.)
2) Make sure we use the definition rather than declaration DIE
for two things: a) determining linkage for gnu pubnames, and b)
as the address of the DIE for global variables.
(As a note, createGlobalVariableDIE really needs rewriting.)
Adjust the testcase to make sure we're checking the correct DIEs.
Hans Wennborg [Wed, 16 Oct 2013 01:20:40 +0000 (01:20 +0000)]
MC: Better handling of tricky symbol and section names
Because of win32 mangling, we produce symbol and section names with
funny characters in them, most notably @ characters.
MC would choke on trying to parse its own assembly output. This patch addresses
that by:
- Making @ trigger quoting of symbol names
- Also quote section names in the same way
- Just parse section names like other identifiers (to allow for quotes)
- Don't assume @ signifies a symbol variant if it is in a string.
Andrew Trick [Tue, 15 Oct 2013 23:33:07 +0000 (23:33 +0000)]
Enable MI Sched for x86.
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.
Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.
On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.
Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.
The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.
Unit tests are updated so they don't depend on SD scheduling heuristics.
Eric Christopher [Tue, 15 Oct 2013 23:31:38 +0000 (23:31 +0000)]
Make sure we're not attempting to construct a subprogram DIE
twice and just look up the value. Fix the one case where
we were trying to create a subprogram DIE and we should already
have had one. Reflow formatting in collectDeadVariables while fixing.
Eric Christopher [Tue, 15 Oct 2013 23:31:36 +0000 (23:31 +0000)]
Add an assert that we have a scope that matters for methods
and remove a call to getNonCompileUnitScope as a method
shouldn't be in the compile unit scope.
Rui Ueyama [Tue, 15 Oct 2013 22:45:38 +0000 (22:45 +0000)]
Path: Recognize Windows compiled resource file.
Some background: One can pass compiled resource files (.res files) directly
to the linker on Windows. If a resource file is given, the linker will run
"cvtres" command in background to convert the resource file to a COFF file
to link it.
What I'm trying to do with this patch is to make the linker to recognize
the resource file by file magic, so that it can run cvtres command.
Michael Liao [Tue, 15 Oct 2013 17:51:58 +0000 (17:51 +0000)]
Fix PR17546
- Type of index used in extract_vector_elt or insert_vector_elt supposes
to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd
better to truncate (or zero-extend in case it's changed later) it to
mask element type to guarantee they are matching instead of asserting
that.