Reid Kleckner [Tue, 28 Oct 2014 01:29:26 +0000 (01:29 +0000)]
X86: Implement the vectorcall calling convention
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.
Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.
On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.
Tim Northover [Tue, 28 Oct 2014 01:24:32 +0000 (01:24 +0000)]
AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.
If it is to be resurrected, it has to be fixed and we should probably have a
-preserve-source command line option in llvm-mc and run tests with and without
it.
Adam Nemet [Mon, 27 Oct 2014 23:08:37 +0000 (23:08 +0000)]
[AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo
No functionality change. No change in X86.td.expanded except that we only set
the CD8 attributes for the memory variants. (This shouldn't be used unless we
have a memory operand.)
Juergen Ributzka [Mon, 27 Oct 2014 19:58:36 +0000 (19:58 +0000)]
[FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.
Pete Cooper [Mon, 27 Oct 2014 19:40:35 +0000 (19:40 +0000)]
Stackmap shadows should consider call returns a branch target.
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.
A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.
Juergen Ributzka [Mon, 27 Oct 2014 19:38:05 +0000 (19:38 +0000)]
[FastISel][AArch64] Use 'cbz' also for null values (pointers).
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.
Juergen Ributzka [Mon, 27 Oct 2014 19:16:48 +0000 (19:16 +0000)]
[FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.
Juergen Ributzka [Mon, 27 Oct 2014 18:21:58 +0000 (18:21 +0000)]
[FastISel][AArch64] Fix load/store with frame indices.
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.
This fix extends the possible addressing modes for frame indices.
Lang Hames [Mon, 27 Oct 2014 17:44:25 +0000 (17:44 +0000)]
[PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these
sets as keys into a cache of interference matrice values in the Interference
constraint adder.
Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.
Oliver Stannard [Mon, 27 Oct 2014 09:23:02 +0000 (09:23 +0000)]
[ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
(a < b) ? a : b
(a > b) ? a : b
but not these expressions:
(a > b) ? b : a
(a < b) ? b : a
This patch allows all of these expressions to be matched.
Rui Ueyama [Mon, 27 Oct 2014 07:37:57 +0000 (07:37 +0000)]
Include stddef.h before including cxxabi.h
On FreeBSD 10.0, size_t needs to be defined before including cxxabi.h.
Currenty HAVE_CXXABI_H is not defined on FreeBSD because of that reason.
This patch teaches cmake and configure how to include it.
Add an option to the LTO code generator to disable vectorization during LTO
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
Lang Hames [Sun, 26 Oct 2014 20:57:16 +0000 (20:57 +0000)]
[PBQP] Re-commit r220642 with a workaround for quirky Visual Studio behavior.
Apparently unique_ptr'ifying NodeMetadata exposed an issue in VS where it
occasionally tries to synthesize copy constructors instead of moves. Hopefully
explicitly deleting the copy constructor and defining the move constructor will
fix this.
Hans Wennborg [Sun, 26 Oct 2014 19:50:13 +0000 (19:50 +0000)]
Revert "[PBQP] Unique-ptrify some PBQP Metadata structures. No functional change." (r220642)
It broke the Windows build:
[1/19] Building CXX object lib\CodeGen\CMakeFiles\LLVMCodeGen.dir\RegAllocPBQP.cpp.obj
C:\bb-win7\ninja-clang-i686-msc17-R\llvm-project\llvm\include\llvm/CodeGen/RegAllocPBQP.h(132) : error C2248: 'std::unique_ptr<_Ty>::unique_ptr' : cannot access private member declared in class 'std::unique_ptr<_Ty>'
with
[
_Ty=unsigned int []
]
D:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\memory(1600) : see declaration of 'std::unique_ptr<_Ty>::unique_ptr'
with
[
_Ty=unsigned int []
]
This diagnostic occurred in the compiler generated function 'llvm::PBQP::RegAlloc::NodeMetadata::NodeMetadata(const llvm::PBQP::RegAlloc::NodeMetadata &)'
Lang Hames [Sun, 26 Oct 2014 18:16:27 +0000 (18:16 +0000)]
[PBQP] Tidy up CostAllocator.h: fix variable case, rename CostPool to ValuePool.
No functional change. This just brings things more in-line with coding
standards, and makes ValuePool's functionality clearer (it's not tied to pooling
costs, and we may want to use it to hold other things in the future).
Andrew Trick [Sat, 25 Oct 2014 19:42:07 +0000 (19:42 +0000)]
Fix LSR compile time.
This is a simple fix that brings the compilation time from 5min to 5s
on a specific real-world example. It's a large chain of computation in
a crypto routine (always a problem for SCEV). A unit test is not
feasible and there would be no way to check it. The fix is just basic
good practice for dealing with SCEVs, there's no risk of regression.
First, return true on success, as it is the OCaml convention.
Second, also initialize the native assembly printer, which is,
despite the name, required for MCJIT operation.
Since this function did not initialize the assembly printer earlier
and no function to initialize native assembly printer was available
elsewhere, it is safe to break its interface: it means that it
simply could not be used successfully before.
Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).
David Majnemer [Sat, 25 Oct 2014 07:13:13 +0000 (07:13 +0000)]
InstCombine: Remove overzealous asserts
These asserts can trigger if the worklist iteration order is
sufficiently unlucky. Instead of adding special case logic to handle
these edge conditions, just bail out on trying to transform them:
InstSimplify will get them when it reaches them on the worklist.
This fixes PR21378.
N.B. No test case is included because any test would rely on the
fragile worklist iteration order.
Jingyue Wu [Sat, 25 Oct 2014 03:46:16 +0000 (03:46 +0000)]
[NVPTX] aligned byte-buffers for vector return types
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.
Kevin Enderby [Fri, 24 Oct 2014 22:39:40 +0000 (22:39 +0000)]
Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section. But when either symbol is undefined it
is an error.
The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.
Frederic Riss [Fri, 24 Oct 2014 21:31:09 +0000 (21:31 +0000)]
Sink DwarfUnit::constructImportedEntityDIE into DwarfCompileUnit.
So that it has access to getOrCreateGlobalVariableDIE. If we ever support
decsribing using directive in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.
Simon Pilgrim [Fri, 24 Oct 2014 21:04:41 +0000 (21:04 +0000)]
[X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.
An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.
Colin LeMahieu [Fri, 24 Oct 2014 19:00:32 +0000 (19:00 +0000)]
[Hexagon] Resubmission of 220427
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.
Rafael Espindola [Fri, 24 Oct 2014 18:13:04 +0000 (18:13 +0000)]
Don't ever call materializeAllPermanently during LTO.
To do this, change the representation of lazy loaded functions.
The previous representation cannot differentiate between a function whose body
has been removed and one whose body hasn't been read from the .bc file. That
means that in order to drop a function, the entire body had to be read.
David Blaikie [Fri, 24 Oct 2014 17:57:34 +0000 (17:57 +0000)]
DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFile
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)
Sanjay Patel [Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)]
Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Daniel Sanders [Fri, 24 Oct 2014 16:15:27 +0000 (16:15 +0000)]
[mips] Replace MipsABIEnum with a MipsABIInfo class.
Summary:
No functional change yet, it's just an object replacement for an enum.
It will allow us to gather ABI information in a single place so that we can
start testing for properties of the ABI's instead of the ABI itself.
For example we will eventually be able to use:
ABI.MinStackAlignmentInBytes()
instead of:
(isABI_N32() || isABI_N64()) ? 16 : 8
which is clearer and more maintainable.
Aaron Ballman [Fri, 24 Oct 2014 15:16:39 +0000 (15:16 +0000)]
These functions are not actually defined for NDEBUG or !LLVM_DUMP_ENABLED, so guarding the declarations as well. NFC, silences MSVC warnings in release builds.
Daniel Sanders [Fri, 24 Oct 2014 13:09:19 +0000 (13:09 +0000)]
[mips] For N32/N64, structs must be passed in the upper bits of a register.
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.
Oliver Stannard [Fri, 24 Oct 2014 09:54:41 +0000 (09:54 +0000)]
[AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.
Adam Nemet [Fri, 24 Oct 2014 00:03:00 +0000 (00:03 +0000)]
[AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.
For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.
For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant. The addition operand (op3) in both cases can come from
memory. Again the ony difference is which operand is destructed.
There could be a post-RA pass that would convert a 213 or 132 into a 231.
Adam Nemet [Fri, 24 Oct 2014 00:02:55 +0000 (00:02 +0000)]
[AVX512] Introduce fma3p_forms from AVX
This multiclass generates the different forms: 213, 231, 132 in AVX.
132 in AVX512 is a separate class but I am planning to use this same
multiclass to generate 231 relying on the nice the null_frag trick from AVX to
disable codegen pattern for 231.
No functionality change, no change in X86.td.expanded except for the different
instruction definition names.
Tim Northover [Thu, 23 Oct 2014 22:31:48 +0000 (22:31 +0000)]
ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.
The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.
David Blaikie [Thu, 23 Oct 2014 22:27:50 +0000 (22:27 +0000)]
DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.
The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.
(though, let's see what the buildbots have to say about this - *crosses
fingers*)
There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.