Bob Wilson [Tue, 22 Oct 2013 00:09:03 +0000 (00:09 +0000)]
Move the printing of llvm-cov information out from collectLineCounts().
collectLineCounts() should only organize the output data. This is done in
anticipation of subsequent changes which will pass in GCNO and GCDA filenames
into the print function where it is printed similar to the gcov output.
David Blaikie [Mon, 21 Oct 2013 18:59:40 +0000 (18:59 +0000)]
DWARF type hashing: Handle multiple (including recursive) references to the same type
This uses a map, keeping the type DIE numbering separate from the DIEs
themselves - alternatively we could do things the way GCC does if we
want to add an integer to the DIE type to record the numbering there.
Lang Hames [Mon, 21 Oct 2013 17:51:24 +0000 (17:51 +0000)]
X86 vector element shift-by-immediate instructions take i8 immediates. Make
the instruction defenitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Rafael Espindola [Mon, 21 Oct 2013 17:14:55 +0000 (17:14 +0000)]
Optimize more linkonce_odr values during LTO.
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.
This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.
David Blaikie [Mon, 21 Oct 2013 16:37:22 +0000 (16:37 +0000)]
DebugInfo: Hash DW_FORM_GNU_str_index as a string.
Found while adding type safety to the various DWARF enumerations (form,
attribute, tag, etc) that caused Clang to warn on an incompletely
covered switch. Converting the comment to a default/unreachable
uncovered this case of an unsupported form encoding. Seems we were
skipping fission strings entirely.
Matheus Almeida [Mon, 21 Oct 2013 12:26:50 +0000 (12:26 +0000)]
[mips][msa] Direct Object Emission support for CTCMSA and CFCMSA.
These instructions are logically related as they allow read/write of MSA control registers.
Currently MSA control registers are emitted by number but hopefully that will change as soon
as GAS starts accepting them by name as that would make the assembly easier to read.
Bill Wendling [Mon, 21 Oct 2013 04:09:17 +0000 (04:09 +0000)]
Don't eliminate a partially redundant load if it's in a landing pad.
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, it
will create a basic block that violates this constraint. It then leads to other
problems down the line if it tries to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.
Teach simplify-cfg how to correctly create covered lookup tables for switches on iN with N >= 3.
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:
1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.
The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.
If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.
This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.
This ensures that the prefix data is treated as part of the function for
the purpose of debug info. This provides a better debugging experience,
among other things by allowing a debug info client to correctly look up
a function in debug info given a function pointer.
Bill Wendling [Sat, 19 Oct 2013 11:27:12 +0000 (11:27 +0000)]
Perform an intelligent splice of the predecessor with the single successor.
If the predecessor's being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.
Andrew Trick [Fri, 18 Oct 2013 23:43:53 +0000 (23:43 +0000)]
SCEV should use NSW to get trip count for positive nonunit stride loops.
SCEV currently fails to compute loop counts for nonunit stride
loops. This comes up frequently. It prevents loop optimization and
forces vectorization to insert extra loop checks.
For example:
void foo(int n, int *x) {
for (int i = 0; i < n; i += 3) {
x[i] = i;
x[i+1] = i+1;
x[i+2] = i+2;
}
}
We need to properly handle the case in which limit > INT_MAX-stride. In
the above case: n > INT_MAX-3. In this case the loop counter will step
beyond the limit and overflow at the same time. However, knowing that
signed integer overlow in undefined, we can assume the loop test
behavior is arbitrary after overflow. This obeys both C undefined
behavior rules, and the more strict LLVM poison value rules.
I'm finally fixing this in response to Hal Finkel's persistence.
The most probable reason that we never optimized this before is that
we were being careful to handle case where the developer expected a
side-effect free infinite loop relying on overflow:
for (int i = 0; i < n; i += s) {
++j;
}
return j;
If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might
expect an infinite loop. However there are plenty of ways to achieve
this effect without relying on undefined behavior of signed overflow.
Manman Ren [Fri, 18 Oct 2013 21:14:19 +0000 (21:14 +0000)]
Debug Info: add a newly-created DIE to a parent in the same function.
With this commit, all DIEs created in CompileUnit will be added to parents
inside the same function. Also make getOrCreateTemplateType|Value functions
private.
Hans Wennborg [Fri, 18 Oct 2013 20:46:28 +0000 (20:46 +0000)]
MC asm parser: allow ?'s in symbol names, and handle @'s in names in MS asm
This is another (final?) stab at making us able to parse our own asm output
on Windows.
Symbols on Windows often contain @'s and ?'s in their names. Our asm parser
didn't like this. ?'s were not allowed, and @'s were intepreted as trying to
reference PLT/GOT/etc.
We can't just add quotes around the bad names, since e.g. for MinGW, we use gas
to assemble, and it doesn't like quotes in some places (notably in .def
directives).
This commit makes us allow ?'s in symbol names, and @'s in symbol names for MS
assembly.
Bill Schmidt [Fri, 18 Oct 2013 14:20:11 +0000 (14:20 +0000)]
[PATCH] Fix PR17168 (DAG scheduler inserts DBG_VALUE before PHI with fast-isel)
PR17168 describes a test case that fails when compiling for debug with
fast-isel. Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.
For this problem to occur requires the following:
* Compile for debug
* Compile with fast-isel
* In a block B, fast-isel must partially succeed before punting to DAG-isel
* B must start with a PHI
* The first unhandled node in the DAG must not generate a machine instruction
* A debug value with an order less than that of that first node exists
When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire. Currently it tests whether the block is empty,
or whether the last instruction generated is a phi. When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi. This patch adds that check, and adds the test case from the
PR as a regression test.
Silviu Baranga [Fri, 18 Oct 2013 10:18:40 +0000 (10:18 +0000)]
Add hardware division as a default feature on Cortex-A15. Also add test cases to check this, and change diagnostics for the hwdiv-arm feature to something useful.
David Peixotto [Thu, 17 Oct 2013 19:52:05 +0000 (19:52 +0000)]
17309 ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
David Peixotto [Thu, 17 Oct 2013 19:49:22 +0000 (19:49 +0000)]
Refactor lowering for COPY_STRUCT_BYVAL_I32
This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
pseudo-instruction in the ARM backend. We introduce a new helper
class that encapsulates all of the operations needed during the
lowering. The operations are implemented for each subtarget in
different subclasses. Currently only arm and thumb2 subtargets are
supported.
This refactoring was done to easily implement support for thumb1
subtargets. This initial patch does not add support for thumb1, but
is only a refactoring. A follow on patch will implement the support
for thumb1 subtargets.
All of the Core API functions have versions which accept explicit context, in
addition to ones which work on global context. This commit adds functions
which accept explicit context to the Target API for consistency.
Chad Rosier [Thu, 17 Oct 2013 18:12:29 +0000 (18:12 +0000)]
[AArch64] Add support for NEON scalar three register different instruction
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
Hans Wennborg [Thu, 17 Oct 2013 17:49:57 +0000 (17:49 +0000)]
CMake: set stack size to 2MB for MSVC builds
Compiling under Visual C++ 2012 with the default stack size of 1MB, the stack
overflows at a depth of 216 template instantiations, well before the 256
default limit. This patch modifies the default MSVC stack size to 2MB.