Chen Li [Sat, 26 Dec 2015 07:54:32 +0000 (07:54 +0000)]
[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Craig Topper [Sat, 26 Dec 2015 04:58:05 +0000 (04:58 +0000)]
Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct."
Craig Topper [Sat, 26 Dec 2015 04:50:07 +0000 (04:50 +0000)]
[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct.
Craig Topper [Fri, 25 Dec 2015 17:07:32 +0000 (17:07 +0000)]
[X86] getX86SubSuperRegisterOrZero shouldn't call getX86SubSuperRegister recursively. It should call itself instead. Otherwise it might fire an assertion when it was designed not too.
David Majnemer [Fri, 25 Dec 2015 09:37:26 +0000 (09:37 +0000)]
[CodeGen] Use generic printAsOperand machinery instead of hand rolling it
We already know how to properly print out basic blocks in
printAsOperand, we should not roll it ourselves in
AsmPrinter::EmitBasicBlockStart. No functionality change is intended.
Dan Gohman [Fri, 25 Dec 2015 00:31:02 +0000 (00:31 +0000)]
[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.
This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Dave Bartolomeo [Thu, 24 Dec 2015 18:12:38 +0000 (18:12 +0000)]
LLVM CodeView library
Summary: This diff is the initial implementation of the LLVM CodeView library. There is much more work to be done, namely a CodeView dumper and tests. This patch should help others make progress on the LLVM->CodeView debug info emission while I continue with the implementation of the dumper and tests.
This library implements support for emitting debug info in the CodeView format. This phase of the implementation only includes support for CodeView type records. Clients that need to emit type records will use a class derived from TypeTableBuilder. TypeTableBuilder provides member functions for writing each kind of type record; each of these functions eventually calls the writeRecord virtual function to emit the actual bits of the record. Derived classes override writeRecord to implement the folding of duplicate records and the actual emission to the appropriate destination. LLVMCodeView provides MemoryTypeTableBuilder, which creates the table in memory. In the future, other classes derived from TypeTableBuilder will write to other destinations, such as the type stream in a PDB.
The rest of the types in LLVMCodeView define the actual CodeView type records and all of the supporting enums and other types used in the type records. The TypeIndex class is of particular interest, because it is used by clients as a handle to a type in the type table.
The library provides a relatively low-level interface based on the actual on-disk format of CodeView. For example, type records refer to other type records by TypeIndex, rather than by an actual pointer to the referent record. This allows clients to emit type records one at a time, rather than having to keep the entire transitive closure of type records in memory until everything has been emitted. At some point, having a higher-level interface layered on top of this one may be useful for debuggers and other tools that want a more holistic view of the debug info. The lower-level interface should be sufficient for compilers and linkers to do the debug info manipulation that they need to do efficiently.
AVX-512: Kreg set 0/1 optimization
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization.
KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.
JF Bastien [Wed, 23 Dec 2015 23:56:13 +0000 (23:56 +0000)]
WebAssembly: remove 'external' from test
Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new.
Philip Reames [Wed, 23 Dec 2015 23:44:28 +0000 (23:44 +0000)]
[Statepoints] Use Indirect operands for spill slots
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.
To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).
Adrian Prantl [Wed, 23 Dec 2015 21:51:13 +0000 (21:51 +0000)]
llvm-dwarfdump: Add support for dumping .dSYM bundles.
This replicates the logic of Darwin dwarfdump for manually opening up
.dSYM bundles without introducing any new dependencies.
<rdar://problem/20491670>
Philip Reames [Wed, 23 Dec 2015 19:16:04 +0000 (19:16 +0000)]
[MemOperands] Clarify code around dropping memory operands [NFC]
Clarify a comment about what it means to drop memory operands from an instruction. While I'm adding change the name of the method slightly to make it a bit more clear what's going on when reading calling code.
Keno Fischer [Wed, 23 Dec 2015 18:27:23 +0000 (18:27 +0000)]
[Function] Properly remove use when clearing personality
Summary:
We need to actually remove the use of the personality function,
otherwise we can run into trouble if we want to e.g. delete
the personality function because ther's no way to get rid of
its uses. Do this by resetting to ConstantPointerNull value
that the operands are set to when first allocated.
Sanjoy Das [Wed, 23 Dec 2015 17:48:14 +0000 (17:48 +0000)]
[SCEV] Fix getLoopBackedgeTakenCounts
The way `getLoopBackedgeTakenCounts` is written right now isn't
correct. It will try to compute and store the BE counts of a Loop
#{child loop} number of times (which may be zero).
Philip Reames [Wed, 23 Dec 2015 17:05:57 +0000 (17:05 +0000)]
[MachineLICM] Fix handling of memoperands
As far as I can tell, the correct interpretation of an empty memoperands list is that we didn't have sufficient room to store information about the MachineInstr, NOT that the MachineInstr doesn't access any particular bit of memory. This appears to be fairly consistent in a number of places, but I'm not 100% sure of this interpretation. I'd really appreciate someone more knowledgeable confirming my reading of the code.
This patch fixes two latent bugs in MachineLICM - given the above assumption - and adds comments to document the meaning and required handling. I don't have test cases; these were noticed by inspection.
Simon Pilgrim [Wed, 23 Dec 2015 13:10:07 +0000 (13:10 +0000)]
[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.
As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.
Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.
Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).
David Majnemer [Wed, 23 Dec 2015 03:59:04 +0000 (03:59 +0000)]
[WinEH] Don't visit the same catchswitch twice
We visited the same catchswitch twice because it was both the child of
another funclet and the predecessor of a cleanuppad.
Instead, change the numbering algorithm to only recurse if the unwind
destination of the inner funclet agrees with the unwind destination of
the catchswitch.
Nico Weber [Wed, 23 Dec 2015 02:38:31 +0000 (02:38 +0000)]
win: Pass /W4 in front of all the -wd flags.
This should fix many many -Wunused-parameter warnings in self-host builds on
Windows after r255382. cl.exe doesn't care about the order of /W4 and
/wd flags, but clang-cl currently does (just like -Wno-foo -Wall order
matters for clang). We might want to change how clang-cl behaves in
the future, but until then this change makes self-host builds much more
silent.
Philip Reames [Wed, 23 Dec 2015 01:42:15 +0000 (01:42 +0000)]
[GC] Make GCStrategy::isGCManagedPointer a type predicate not a value predicate [NFC]
Reasons:
1) The existing form was a form of false generality. None of the implemented GCStrategies use anything other than a type. Its becoming more and more clear we're going to need some type of strong GC pointer in the type system and we shouldn't pretend otherwise at this point.
2) The API was awkward when applied to vectors-of-pointers. The old one could have been made to work, but calling isGCManagedPointer(Ty->getScalarType()) is much cleaner than the Value alternatives.
3) The rewriting implementation effectively assumes the type based predicate as well. We should be consistent.
Akira Hatanaka [Tue, 22 Dec 2015 23:57:37 +0000 (23:57 +0000)]
Provide a way to specify inliner's attribute compatibility and merging.
This reapplies r256277 with two changes:
- In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
a use-after-free bug.
- Remove an unnecessary install-local target in lib/IR/Makefile.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
Dan Gohman [Tue, 22 Dec 2015 23:37:37 +0000 (23:37 +0000)]
Add an OperandNamespace field to Target.td's Operand.
For targets to add their own operand types as needed, as advertised in
Operand's comment, they need to be able to specify an alternate namespace
for OperandType names too. This matches the RegisterOperand class.
Changpeng Fang [Tue, 22 Dec 2015 20:55:23 +0000 (20:55 +0000)]
AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll
Akira Hatanaka [Tue, 22 Dec 2015 20:00:05 +0000 (20:00 +0000)]
Provide a way to specify inliner's attribute compatibility and merging.
This reapplies r252990 and r252949. I've added member function getKind
to the Attr classes which returns the enum or string of the attribute.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
Changpeng Fang [Tue, 22 Dec 2015 19:32:28 +0000 (19:32 +0000)]
AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
Cong Hou [Tue, 22 Dec 2015 18:56:14 +0000 (18:56 +0000)]
[BPI] Replace weights by probabilities in BPI.
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.
Manuel Jacob [Tue, 22 Dec 2015 16:50:44 +0000 (16:50 +0000)]
[RS4GC] Fix crash in the case that a live variable has a constant base.
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.
This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base. This would be unnecessary
anyway because anything with a constant base won't move.
Jun Bum Lim [Tue, 22 Dec 2015 16:36:16 +0000 (16:36 +0000)]
[AArch64] Promote loads from stored
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.
Original commit message:
This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
Keno Fischer [Tue, 22 Dec 2015 07:14:50 +0000 (07:14 +0000)]
[ASMPrinter] Fix missing handling of DW_OP_bit_piece
In r256077, I added printing for DIExpressions in DEBUG_VALUE comments,
but neglected to handle DW_OP_bit_piece operands. Thanks to
Mikael Holmen and Joerg Sonnenberger for spotting this.
David Majnemer [Tue, 22 Dec 2015 01:39:04 +0000 (01:39 +0000)]
[MC] Don't use the architecture to govern which object file format to use
InitMCObjectFileInfo was trying to override the triple in awkward ways.
For example, a triple specifying COFF but not Windows was forced as ELF.
This makes it easy for internal invariants to get violated, such as
those which triggered PR25912.
Easwaran Raman [Tue, 22 Dec 2015 00:32:35 +0000 (00:32 +0000)]
Determine callee's hotness and adjust threshold based on that. NFC.
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.