Reid Kleckner [Mon, 22 Dec 2014 22:10:08 +0000 (22:10 +0000)]
Fix Windows unwind info for functions in sections other than .text
Previously we assumed the section name had the form .text$foo, which is
what we used to do for inline functions. If the dollar wasn't present,
we'd put unwind data in the .pdata and .xdata sections for the main
.text section, which is incorrect.
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.
Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.
Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.
Matt Arsenault [Sun, 21 Dec 2014 16:48:42 +0000 (16:48 +0000)]
Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
The ARM ARM states:
LDM/LDMIA/LDMFD:
The SP can be in the list. However, ARM deprecates using these instructions
with SP in the list.
ARM deprecates using these instructions with both the LR and the PC in the
list.
LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED:
The SP can be in the list. However, instructions that include the SP in the
list are deprecated.
Instructions that include both the LR and the PC in the list are deprecated.
POP:
The SP can only be in the list before ARMv7. ARM deprecates any use of ARM
instructions that include the SP, and the value of the SP after such an
instruction is UNKNOWN.
ARM deprecates the use of this instruction with both the LR and the PC in
the list.
Attempt to diagnose use of deprecated forms of these instructions. This mirrors
the previous changes to diagnose use of the deprecated forms of STM in ARM mode.
David Majnemer [Sat, 20 Dec 2014 03:04:38 +0000 (03:04 +0000)]
InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X into X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX into X
(X & INT_MIN) ? X | INT_MIN : X into X
(X & INT_MIN) ? X : X | INT_MIN into X | INT_MIN
Chandler Carruth [Sat, 20 Dec 2014 02:39:18 +0000 (02:39 +0000)]
[SROA] Run clang-format over the entire SROA pass as I wrote it before
much of the glory of clang-format, and now any time I touch it I risk
introducing formatting changes as part of a functional commit.
Also, clang-format is *way* better at formatting my code than I am.
Most of this is a huge improvement although I reverted a couple of
places where I hit a clang-format bug with lambdas that has been filed
but not (fully) fixed.
Chandler Carruth [Sat, 20 Dec 2014 02:19:22 +0000 (02:19 +0000)]
[x86] Change the test added in r223774 to first check the spelling of
the error message for a bogus processor, and then look specifically for
that error message using FileCheck.
I actually tried to write the test this way at first, but drew a blank
on how to ensure the error message stayed in sync (oops). Now that I've
recalled how to do that, this is clearly better.
It also fixes an issue with a malloc implementation that actually prints
to stderr in all cases, which was causing problems for some builders it
seems.
Matthias Braun [Sat, 20 Dec 2014 01:54:50 +0000 (01:54 +0000)]
LiveIntervalAnalysis: No kill flags for partially undefined uses.
We must not add kill flags when reading a vreg with some undefined
subregisters, if subreg liveness tracking is enabled. This is because
the register allocator may reuse these undefined subregisters for other
values which are not killed.
Eric Fiselier [Fri, 19 Dec 2014 22:29:12 +0000 (22:29 +0000)]
[LIT] Add JSONMetricValue type to wrap types supported by the json encoder.
Summary:
The following types can be encoded and decoded by the json library:
`dict`, `list`, `tuple`, `str`, `unicode`, `int`, `long`, `float`, `bool`, `NoneType`.
`JSONMetricValue` can be constructed with any of these types, and used as part of Test.Result.
This patch also adds a toMetricValue function that converts a value into a MetricValue.
Reid Kleckner [Fri, 19 Dec 2014 22:19:48 +0000 (22:19 +0000)]
Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.
None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass. No functionality change for other targets.
Sanjay Patel [Fri, 19 Dec 2014 22:16:28 +0000 (22:16 +0000)]
Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed.
Sanjay Patel [Fri, 19 Dec 2014 20:23:41 +0000 (20:23 +0000)]
merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores()
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).
Frederic Riss [Fri, 19 Dec 2014 18:26:33 +0000 (18:26 +0000)]
[DebugInfo] Move all DWARF headers to the public include directory.
dsymutil needs access to DWARF specific inforamtion, the small DIContext
wrapper isn't sufficient. Other DWARF consumers might want to use it too
(I'm looking at you lldb).
Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.
Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.
[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.
Export symbols in libLTO.dylib for the local context-related functions
added in r221733 (`LTO_API_VERSION=11`)... and add the missing
definition for `lto_codegen_create_in_local_context()`.
Instead of reusing the name `MapValue()` when mapping `Metadata`, use
`MapMetadata()`. The old name doesn't make much sense after the
`Metadata`/`Value` split.
Matthias Braun [Fri, 19 Dec 2014 01:39:46 +0000 (01:39 +0000)]
RegisterCoalescer: rewrite eliminateUndefCopy().
This also fixes problems with undef copies of subregisters. I can't
attach a testcase for that as none of the targets in trunk has
subregister liveness tracking enabled.
The algorithm for sorting libraries in topological order, as
previously implemented, had a few issues:
* It didn't make any sense.
* It didn't actually sort libraries in topological order.
* It hung on some inputs, e.g. "LLVMipo".
This commit replaces the old algorithm with a straightforward port
from llvm-config.cpp.
Roman Divacky [Thu, 18 Dec 2014 23:12:34 +0000 (23:12 +0000)]
Instead of explicitely comparing both lowercase and uppercase variants.
.lower() the Name and compare only the lowecase. Removing 81 compares/lines of
code. This changes the accepted string to be mixed lower/upper case but it
should be ok.
Chris Bieneman [Thu, 18 Dec 2014 21:03:49 +0000 (21:03 +0000)]
Have llvm-c-test only use libLLVM if libLLVM has all the right components.
Summary: We should only have llvm-c-test use libLLVM if the library is built with the default set of components or if LLVM_DYLIB_COMPONENTS includes all the LLVM_LINK_COMPONENTS required for llvm-c-test. Making libLLVM always used causes build failures if libLLVM doesn't include all
Matthias Braun [Thu, 18 Dec 2014 19:58:52 +0000 (19:58 +0000)]
LiveIntervalAnalysis: Cleanup computeDeadValues
- This also fixes a bug introduced in r223880 where values were not
correctly marked as Dead anymore.
- Cleanup computeDeadValues(): split up SubRange code variant, simplify
arguments.
Robert Khasanov [Thu, 18 Dec 2014 12:28:22 +0000 (12:28 +0000)]
[AVX512] Enable FP arithmetic lowering for AVX512VL subsets.
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions.
Added lowering tests.
ARM: improve instruction validation for thumb mode
The ARM Architecture Reference Manual states the following:
LDM{,IA,DB}:
The SP cannot be in the list.
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
POP:
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
PUSH:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.
STM:{,IA,DB}:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.