Chandler Carruth [Tue, 29 Dec 2015 09:52:41 +0000 (09:52 +0000)]
[ADT] Teach alignment helpers to work correctly for abstract classes.
This is necessary to use them as part of pointer traits and is generally
useful. I've added unit test coverage to isolate and ensure this works
correctly.
I'll watch the build bots to try to see if any compilers can't tolerate
this bit of magic (and much credit goes to Richard Smith for coming up
with this magical production!) but give a shout if you see issues.
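For illustration, a minimal sketch (hypothetical names, not the actual ADT implementation) of one way an alignment helper can tolerate abstract classes: an abstract type cannot appear as a data member, but it can still serve as a base class, so the helper can switch between the two forms.

    #include <cstddef>
    #include <type_traits>

    // Hypothetical abstract class used only to exercise the helper.
    struct Shape {
      virtual ~Shape() = default;
      virtual double area() const = 0;
    };

    // Concrete types can be measured by holding a member; abstract types cannot
    // be members, but a class may still derive from them, so switch on
    // std::is_abstract and inherit instead.
    template <typename T, bool = std::is_abstract<T>::value>
    struct AlignHelper {
      char C;
      T Value;
    };

    template <typename T>
    struct AlignHelper<T, /*IsAbstract=*/true> : T {
      char C;
    };

    // The helper derived from the abstract class still carries its alignment.
    static_assert(alignof(AlignHelper<Shape>) >= alignof(Shape),
                  "helper preserves the alignment of an abstract class");
    static_assert(alignof(AlignHelper<int>) >= alignof(int),
                  "and still works for concrete types");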
Chandler Carruth [Tue, 29 Dec 2015 09:32:18 +0000 (09:32 +0000)]
[ptr-traits] Provide a real MCFragment address for the sentinel instead
of casting the integer '4' to such a pointer. There is no reason to
expect '4' to be a portable or reliable pointer of this form. The only
reason this ever worked is because the PointerIntPair that this actually
gets used with has an artificially *low* presumed alignment that allowed
it to work. When the alignment of PointerIntPair is derived from the
actual type's alignment, the asserts start firing on this pointer. I'm
amazed we never managed to do anything that triggered the alignment
sanitizer with it, as this is just flat out UB.
If folks dislike this approach to providing a sentinel fragment address,
there are a myriad of other alternatives, suggestions welcome. But this
one has the distinct advantage of not requiring the friend dance of
ilist's sentinel (which I'll point out is *also* in play for
MCFragment!) and seems to be using a nicely provided facility in
MCFragment to establish just such dummy nodes.
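A minimal sketch (hypothetical type, not the MC code) of the difference: casting an arbitrary integer to a pointer yields a value that violates the type's alignment assumptions, while the address of a real dummy object is always a valid, correctly aligned pointer.

    #include <cstdint>

    struct Fragment { // stand-in for MCFragment; naturally 8-byte aligned here
      void *Parent;
      std::uint64_t Offset;
    };

    // Old pattern: no Fragment lives at address 4, and 4 is not even aligned for
    // the type, so alignment-derived tricks (and the alignment sanitizer) reject it.
    //   Fragment *Sentinel = reinterpret_cast<Fragment *>(4); // undefined behavior

    // Safer pattern: a real, never-otherwise-used object whose address is a valid
    // and correctly aligned Fragment pointer.
    Fragment SentinelStorage;                          // real storage for the sentinel
    Fragment *getSentinel() { return &SentinelStorage; }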
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
Chandler Carruth [Tue, 29 Dec 2015 09:24:42 +0000 (09:24 +0000)]
[ptr-traits] Sink several in-body method definitions to be out-of-line
inline definitions after the mutually recursive pair of types have been
defined. The two types mutually recurse specifically through
abstractions that require pointer traits which makes this kind of mutual
recursion especially tricky to get right in terms of ordering.
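A minimal sketch of the pattern (generic names, not the actual LLVM types): members that need the other type to be complete are only declared in-class and defined as out-of-line inline functions once both definitions exist.

    #include <cstddef>

    class B; // forward declaration: B is incomplete here

    class A {
    public:
      B *Sibling = nullptr;
      // Needs B to be complete (e.g. for alignment-derived pointer traits), so it
      // is only declared here and defined out of line below.
      std::size_t siblingAlign() const;
    };

    class B {
    public:
      A *Sibling = nullptr;
      std::size_t siblingAlign() const;
    };

    // Both types are complete from here on; the definitions stay 'inline' so the
    // header remains header-only.
    inline std::size_t A::siblingAlign() const { return alignof(B); }
    inline std::size_t B::siblingAlign() const { return alignof(A); }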
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
Chandler Carruth [Tue, 29 Dec 2015 09:24:39 +0000 (09:24 +0000)]
[ptr-traits] Sink a constructor definition to the .cpp file and add
missing includes so that the pointee types for DenseMap pointer keys and
such are complete prior to us querying the pointer traits for them.
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
Chandler Carruth [Tue, 29 Dec 2015 09:06:21 +0000 (09:06 +0000)]
[ptr-traits] Add a bunch of includes to provide complete types that are
used in pointer dense map key types or in other ways that require
pointer traits.
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
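To make the "low bits" claim concrete, here is a hedged sketch (illustrative names, not the actual PointerLikeTypeTraits) of how the count of guaranteed-zero low bits falls out of the pointee's alignment, which is only known once the pointee type is complete.

    #include <cstddef>

    // log2 of a power-of-two alignment: the number of low pointer bits that are
    // guaranteed to be zero for a pointee with that alignment.
    constexpr unsigned log2Align(std::size_t Align) {
      return Align <= 1 ? 0 : 1 + log2Align(Align / 2);
    }

    template <typename T> struct SimplePointerTraits {
      // T must be complete here: alignof requires a complete type, and checking
      // that up front is what keeps the low-bit count correct and reproducible.
      static constexpr unsigned NumLowBitsAvailable = log2Align(alignof(T));
    };

    struct alignas(8) Node { void *Next; }; // hypothetical pointee type
    static_assert(SimplePointerTraits<Node>::NumLowBitsAvailable == 3,
                  "an 8-byte-aligned pointee frees three low bits for tags");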
Chandler Carruth [Tue, 29 Dec 2015 09:06:16 +0000 (09:06 +0000)]
[ptr-traits] Split the MCFragment type hierarchy out of the MCAssembler
header to its own header, allowing users of fragments to have a narrower
header file, and avoid circular header dependencies when getting the
definition of MCSection prior to inspecting traits on MCSection
pointers.
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
Note that this doesn't in any way change the design of MC, it is just
moving code around to allow the *header files* to be more fine grained.
Without this, it is impossible to get a complete type for MCSection
where it is needed.
If anyone would prefer a different slicing of the header files, I'm
happy to oblige of course. =]
Previously, the code enforced non-decreasing alignment of each trailing
type. However, it's easy enough to allow for realignment as needed, and
thus spare the developer from having to think about the possibilities for
alignment requirements on all architectures.
(E.g. on Linux/x86, a struct with an int64 member is 4-byte aligned,
while on other 32-bit archs -- and even with other OSes on x86 -- it has
8-byte alignment. This sort of thing is irritating to have to manually
deal with.)
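A hedged sketch of the offset computation this change enables (illustrative types, not the actual implementation): rather than requiring non-decreasing alignment, round each trailing object's offset up to its own alignment.

    #include <cstddef>
    #include <cstdint>

    // Round Offset up to the next multiple of Align (Align must be a power of two).
    constexpr std::size_t alignTo(std::size_t Offset, std::size_t Align) {
      return (Offset + Align - 1) & ~(Align - 1);
    }

    struct Header { std::uint32_t Count; }; // hypothetical 4-byte-aligned header

    // Trailing 64-bit data may need realignment after the header: on an ABI where
    // uint64_t is 8-byte aligned the first element starts at offset 8, not 4, and
    // the computation below handles either case without the developer caring.
    constexpr std::size_t TrailingOffset =
        alignTo(sizeof(Header), alignof(std::uint64_t));
    static_assert(TrailingOffset % alignof(std::uint64_t) == 0,
                  "trailing objects are realigned as needed");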
Chandler Carruth [Tue, 29 Dec 2015 02:14:50 +0000 (02:14 +0000)]
[ptr-traits] Merge the MetadataTracking helpers into the Metadata
header.
This is part of a series of patches to allow LLVM to check for complete
pointee types when computing its pointer traits. This is absolutely
necessary to get correct (or reproducible) results for things like how
many low bits are guaranteed to be zero.
The MetadataTracking helpers aren't actually independent. They rely on
constructing a PointerUnion between Metadata and MetadataAsValue
pointers, which requires knowing the alignment of pointers to those
types, which in turn requires those types to be complete.
The .cpp file even defined a method declared in Metadata.h! These really
don't seem like something that is separable, and there is no real
layering problem with just placing them together.
Chandler Carruth [Tue, 29 Dec 2015 00:03:24 +0000 (00:03 +0000)]
[ADT] Use a nonce type with at least 4 byte alignment.
We didn't actually statically check this, and so it worked 25% of the
time for me. =/ Really sorry it took so long to fix, I shouldn't leave
the commit log editor window open without saving and landing the commit.
=[
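For illustration, a minimal sketch of the kind of static check the message alludes to, with a hypothetical type name:

    // Request the alignment explicitly and verify it at compile time, so the
    // 4-byte alignment no longer depends on luck.
    struct alignas(4) NonceType {};
    static_assert(alignof(NonceType) >= 4,
                  "nonce type must provide at least 4-byte alignment");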
Artyom Skrobov [Mon, 28 Dec 2015 21:40:45 +0000 (21:40 +0000)]
[Thumb] Fix assembler error 'cannot honor width suffix pop {lr}'
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+
Easwaran Raman [Mon, 28 Dec 2015 20:28:19 +0000 (20:28 +0000)]
Refactor inline costs analysis by removing the InlineCostAnalysis class
InlineCostAnalysis is an analysis pass without any real need to be one.
Once it stops being an analysis pass, it no longer maintains any useful
state, and its member functions can be made free functions. NFC.
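A minimal sketch of the refactoring pattern (hypothetical stand-in types, not the actual inliner code): a method on a stateless analysis object becomes a free function whose inputs are passed explicitly.

    // Hypothetical stand-in for the data a call-site cost query needs.
    struct CallSiteInfo { unsigned NumInstructions; };

    // Before (sketch): a member function on a stateless analysis class, e.g.
    //   struct InlineCostAnalysis { bool shouldInline(const CallSiteInfo &CS); };
    //
    // After (sketch): a free function; callers pass everything it needs.
    inline bool shouldInline(const CallSiteInfo &CS, unsigned Threshold) {
      unsigned Cost = CS.NumInstructions * 5; // illustrative weighting only
      return Cost <= Threshold;
    }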
Manuel Jacob [Mon, 28 Dec 2015 20:14:05 +0000 (20:14 +0000)]
[RS4GC] Fix rematerialization of bitcast of bitcast.
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint. See the
test case for an example.
Implemented cost model for masked gather and scatter operations
The cost is calculated for all X86 targets. When the gather/scatter
instruction is not supported, we calculate the cost of the equivalent scalar sequence.
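A rough sketch of that scalarization fallback, with illustrative cost parameters rather than the actual X86 cost tables:

    // Without a native gather/scatter, each vector element needs a mask test,
    // a scalar memory access, and an insert/extract into or out of the vector.
    unsigned scalarizedGatherScatterCost(unsigned NumElts, unsigned ScalarMemOpCost,
                                         unsigned InsertExtractCost,
                                         unsigned MaskTestCost) {
      return NumElts * (ScalarMemOpCost + InsertExtractCost + MaskTestCost);
    }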
Sanjay Patel [Mon, 28 Dec 2015 18:28:44 +0000 (18:28 +0000)]
Specify triple so 'make check' passes on darwin x86-64
The check lines were added with:
http://reviews.llvm.org/rL256458
http://reviews.llvm.org/rL256460
but on a darwin target, the output looks like:
## InlineAsm Start
rorq %rdi
## InlineAsm End
## InlineAsm Start
rorq %rsi
## InlineAsm End
leaq (%rsi,%rdi), %rax
retq
[X86] Better support for the MCU psABI (LLVM part)
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Asaf Badouh [Mon, 28 Dec 2015 08:26:26 +0000 (08:26 +0000)]
[X86][AVX512] Lower broadcast sub vector to vector intrinsics
Lower broadcast<type>x<vector> to shuffles. There are two cases:
1. src is 128 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 0.
2. src is 256 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 01000100b (0x44); that way we broadcast the 256-bit source, ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3], and then mask it with the passthru value (in the masked variants). See the decoding sketch below.
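As a small illustration of case 2's immediate (assuming the usual 2-bits-per-128-bit-lane encoding; this helper is illustrative, not the lowering code):

    #include <array>
    #include <cstdint>

    // Decode a 2-bits-per-128-bit-lane shuffle immediate into the four selected
    // source lanes. imm = 0x44 = 0b01000100 yields lanes {0, 1, 0, 1}: the
    // 256-bit source, which occupies lanes 0 and 1, is repeated across the
    // 512-bit destination.
    std::array<unsigned, 4> decodeLaneImm(std::uint8_t Imm) {
      return {static_cast<unsigned>(Imm & 0x3),
              static_cast<unsigned>((Imm >> 2) & 0x3),
              static_cast<unsigned>((Imm >> 4) & 0x3),
              static_cast<unsigned>((Imm >> 6) & 0x3)};
    }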
Chandler Carruth [Mon, 28 Dec 2015 01:54:18 +0000 (01:54 +0000)]
[lcg] Fix formatting errors found with clang-format, remove the now
optional '\brief' tag and reflow some comments based on the added
horizontal space. NFC.
Craig Topper [Sun, 27 Dec 2015 21:33:50 +0000 (21:33 +0000)]
[AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering.
Craig Topper [Sun, 27 Dec 2015 21:33:47 +0000 (21:33 +0000)]
[SelectionDAG] Teach LegalizeVectorOps to not unroll CTLZ_ZERO_UNDEF and CTTZ_ZERO_UNDEF if the non-ZERO_UNDEF form is legal or custom. Will be used to simplify X86 code in a follow on commit.
Craig Topper [Sun, 27 Dec 2015 19:45:21 +0000 (19:45 +0000)]
[AVX512] Remove alternate data type versions of VALIGND, VALIGNQ, VMOVSHDUP and VMOVSLDUP. They don't have any tests and I don't think they can be selected. If they are truly needed they should be implemented with patterns against the normal instructions and not separate instructions.
Dan Liew [Sun, 27 Dec 2015 14:03:49 +0000 (14:03 +0000)]
[lit] Implement support for per test timeouts in lit.
This should work with ShTest (executed externally or internally) and GTest
test formats.
To set the timeout, a new option ``--timeout=`` has
been added, which specifies the maximum run time of an individual test
in seconds. By default this is 0, which causes no timeout to be enforced.
The timeout can also be set from a lit configuration file by modifying
the ``lit_config.maxIndividualTestTime`` property.
To implement a timeout we now require the psutil Python module if a
timeout is requested. This dependency is confined to the newly added
``lit.util.killProcessAndChildren()``. A note has been added into the
TODO document describing how we can remove the dependency on the
``psutil`` module in the future. It would be nice to remove this
immediately but that is a lot more work and Daniel Dunbar believes it is
better that we get a working implementation first and then improve it.
To avoid breaking the existing behaviour the psutil module will not be
imported if no timeout is requested.
The included testcases are derived from test cases provided by
Jonathan Roelofs, which were part of a previous attempt to add a per test
timeout to lit (http://reviews.llvm.org/D6584). Thanks Jonathan!
Igor Breger [Sun, 27 Dec 2015 13:56:16 +0000 (13:56 +0000)]
AVX512: Change VPMOVB2M DAG lowering, use CVT2MASK node instead of TRUNCATE.
Fix TRUNCATE lowering of vector to vector of i1: use the LSB and not the MSB.
Implement the VPMOVB/W/D/Q2M intrinsics.
Chandler Carruth [Sun, 27 Dec 2015 08:41:34 +0000 (08:41 +0000)]
[attrs] Extract the pure inference of function attributes into
a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Chandler Carruth [Sun, 27 Dec 2015 08:13:45 +0000 (08:13 +0000)]
[attrs] Split off the forced attributes utility into its own pass that
is (by default) run much earlier than FunctionAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Craig Topper [Sun, 27 Dec 2015 06:55:08 +0000 (06:55 +0000)]
[AVX-512] Remove alternate integer forms for VPERMILPS and VPERMILPD. There are no tests for them and I don't see any way to select them anyway. If they are really needed they should be implemented as patterns and not full-fledged instructions.
David Majnemer [Sun, 27 Dec 2015 06:07:26 +0000 (06:07 +0000)]
[X86, Win64] Use a frame pointer if pushf is emitted
A frame pointer must be used if stack pointer is modified after the
prologue. LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.
There is a small twist: this sequence might exist in user code via
inline-assembly. For now, conservatively assume that such functions
require a frame pointer. For real world justification, please see
clang's implementation of __readeflags.
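For context, a hedged sketch of the kind of user inline assembly that triggers the conservative rule (GNU-style asm, similar in spirit to a __readeflags-style helper; the real intrinsic is implemented differently):

    #include <cstdint>

    // Saving and restoring FLAGS moves the stack pointer after the prologue,
    // which is exactly why the containing function needs a frame pointer.
    static inline std::uint64_t readFlagsViaInlineAsm() {
      std::uint64_t Flags;
      __asm__ __volatile__("pushfq\n\t" // pushes FLAGS, adjusting RSP
                           "popq %0"    // pops it back into a register
                           : "=r"(Flags));
      return Flags;
    }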
Chen Li [Sat, 26 Dec 2015 07:54:32 +0000 (07:54 +0000)]
[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes the gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which could cause further problems with gc relocations. The patch also changes the way gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Craig Topper [Sat, 26 Dec 2015 04:58:05 +0000 (04:58 +0000)]
Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct."
Craig Topper [Sat, 26 Dec 2015 04:50:07 +0000 (04:50 +0000)]
[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct.
Craig Topper [Fri, 25 Dec 2015 17:07:32 +0000 (17:07 +0000)]
[X86] getX86SubSuperRegisterOrZero shouldn't call getX86SubSuperRegister recursively. It should call itself instead. Otherwise it might fire an assertion when it was designed not to.
David Majnemer [Fri, 25 Dec 2015 09:37:26 +0000 (09:37 +0000)]
[CodeGen] Use generic printAsOperand machinery instead of hand rolling it
We already know how to properly print out basic blocks in
printAsOperand, we should not roll it ourselves in
AsmPrinter::EmitBasicBlockStart. No functionality change is intended.
Dan Gohman [Fri, 25 Dec 2015 00:31:02 +0000 (00:31 +0000)]
[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.
This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Dave Bartolomeo [Thu, 24 Dec 2015 18:12:38 +0000 (18:12 +0000)]
LLVM CodeView library
Summary: This diff is the initial implementation of the LLVM CodeView library. There is much more work to be done, namely a CodeView dumper and tests. This patch should help others make progress on the LLVM->CodeView debug info emission while I continue with the implementation of the dumper and tests.
This library implements support for emitting debug info in the CodeView format. This phase of the implementation only includes support for CodeView type records. Clients that need to emit type records will use a class derived from TypeTableBuilder. TypeTableBuilder provides member functions for writing each kind of type record; each of these functions eventually calls the writeRecord virtual function to emit the actual bits of the record. Derived classes override writeRecord to implement the folding of duplicate records and the actual emission to the appropriate destination. LLVMCodeView provides MemoryTypeTableBuilder, which creates the table in memory. In the future, other classes derived from TypeTableBuilder will write to other destinations, such as the type stream in a PDB.
The rest of the types in LLVMCodeView define the actual CodeView type records and all of the supporting enums and other types used in the type records. The TypeIndex class is of particular interest, because it is used by clients as a handle to a type in the type table.
The library provides a relatively low-level interface based on the actual on-disk format of CodeView. For example, type records refer to other type records by TypeIndex, rather than by an actual pointer to the referent record. This allows clients to emit type records one at a time, rather than having to keep the entire transitive closure of type records in memory until everything has been emitted. At some point, having a higher-level interface layered on top of this one may be useful for debuggers and other tools that want a more holistic view of the debug info. The lower-level interface should be sufficient for compilers and linkers to do the debug info manipulation that they need to do efficiently.
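A hedged structural sketch of the interface described above (names follow the description; the record encoding and signatures here are placeholders, not the actual library):

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // Handle to a type in the type table, as described above.
    class TypeIndex {
      std::uint32_t Index = 0;
    public:
      explicit TypeIndex(std::uint32_t I) : Index(I) {}
      std::uint32_t getIndex() const { return Index; }
    };

    class TypeTableBuilder {
    protected:
      // Derived classes fold duplicates and write the bytes to their destination.
      virtual TypeIndex writeRecord(const std::string &RecordBytes) = 0;
    public:
      virtual ~TypeTableBuilder() = default;
      // In the real library there is one member function per record kind, each
      // eventually calling writeRecord(); a single placeholder stands in here.
      TypeIndex writePointerRecord(TypeIndex Pointee) {
        return writeRecord("LF_POINTER:" + std::to_string(Pointee.getIndex()));
      }
    };

    // In-memory table with duplicate folding, mirroring the described role of
    // MemoryTypeTableBuilder.
    class MemoryTypeTableBuilder : public TypeTableBuilder {
      std::vector<std::string> Records;
      std::map<std::string, std::uint32_t> Dedup;
    protected:
      TypeIndex writeRecord(const std::string &RecordBytes) override {
        auto It = Dedup.find(RecordBytes);
        if (It != Dedup.end())
          return TypeIndex(It->second); // fold duplicate records
        Records.push_back(RecordBytes);
        std::uint32_t I = static_cast<std::uint32_t>(Records.size() - 1);
        Dedup.emplace(RecordBytes, I);
        return TypeIndex(I);
      }
    };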
AVX-512: Kreg set 0/1 optimization
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn (an optimization for AVX-512 targets).
KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.