Matthias Braun [Tue, 21 Mar 2017 21:58:08 +0000 (21:58 +0000)]
SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
block.
Matt Arsenault [Tue, 21 Mar 2017 21:39:51 +0000 (21:39 +0000)]
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
Zachary Turner [Tue, 21 Mar 2017 20:27:36 +0000 (20:27 +0000)]
Improve StringMap iterator support.
StringMap's iterators did not support LLVM's
iterator_facade_base, which made it unusable in various
STL algorithms or with some of our range adapters.
This patch makes both StringMapConstIterator as well as
StringMapIterator support iterator_facade_base.
With this in place, it is easy to make an iterator adapter
that iterates over only keys, and whose value_type is
StringRef. So I add StringMapKeyIterator as well, and
provide the method StringMap::keys() that returns a
range that can be iterated.
Dehao Chen [Tue, 21 Mar 2017 19:55:36 +0000 (19:55 +0000)]
Do not inline hot callsites for samplepgo in thinlto compile phase.
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.
Coby Tayree [Tue, 21 Mar 2017 19:31:55 +0000 (19:31 +0000)]
[X86][MS-compatability][llvm] allow MS TYPE/SIZE/LENGTH operators as a part of a compound expression
This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions.
Currently - only supported as a stand alone Operand, e.g.:
"TYPE X"
now allowed :
"4 + TYPE X * 128"
Artyom Skrobov [Tue, 21 Mar 2017 18:39:41 +0000 (18:39 +0000)]
[ARM] Recommit the glueless lowering of addc/adde in Thumb1,
including the amended (no UB anymore) fix for adding/subtracting -2147483648.
This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""
Recommit r298282 with fixes for memory allocation/deallocation
[Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.
Reid Kleckner [Tue, 21 Mar 2017 16:57:19 +0000 (16:57 +0000)]
Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Adrian Prantl [Tue, 21 Mar 2017 16:37:39 +0000 (16:37 +0000)]
Don't compose DWARF expressions with multiple subregisters.
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.
Sanne Wouda [Tue, 21 Mar 2017 14:59:17 +0000 (14:59 +0000)]
[ARM] [Assembler] Support negative immediates for A32, T32 and T16
Summary:
To support negative immediates for certain arithmetic instructions, the
instruction is converted to the inverse instruction with a negated (or inverted)
immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD
instruction. However, "SUB r0, r1, #1" is equivalent.
These conversions are different from instruction aliases. An alias maps
several assembler instructions onto one encoding. A conversion, however, maps
an *invalid* instruction--e.g. with an immediate that cannot be represented in
the encoding--to a different (but equivalent) instruction.
Several instructions with negative immediates were being converted already, but
this was not systematically tested, nor did it cover all instructions.
This patch implements all possible substitutions for ARM, Thumb1 and
Thumb2 assembler and adds tests. It also adds a feature flag
(-mattr=+no-neg-immediates) to turn these substitutions off. This is
helpful for users who want their code to assemble to exactly what they
wrote.
Sanjay Patel [Tue, 21 Mar 2017 13:50:33 +0000 (13:50 +0000)]
[x86] use PMOVMSK for vector-sized equality comparisons
We could do better by splitting any oversized type into whatever vector size the target supports,
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.
Sam Kolton [Tue, 21 Mar 2017 12:51:34 +0000 (12:51 +0000)]
[ADMGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.
This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''
Pass structure:
1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
3. Iterate over all potential instructions and check if they can be converted into SDWA.
4. Convert instructions to SDWA.
This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
2. Introduce more SDWA patterns
3. Introduce mnemonics to limit when SDWA patterns should apply
Andrea Di Biagio [Tue, 21 Mar 2017 11:36:21 +0000 (11:36 +0000)]
[DebugInfo][X86] Teach Optimize LEAs pass to handle debug values
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.
For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.
David Green [Tue, 21 Mar 2017 10:17:39 +0000 (10:17 +0000)]
[ConstantFolding] Fix to prevent constant folding having to repeatedly scan operands. NFCI
After the loop unroll threshold was increased in r295538, very
large constant expressions can be created. This prevents them
from having to be recursively scanned, leading to a compile
time blow-up.
Serge Pavlov [Tue, 21 Mar 2017 04:03:24 +0000 (04:03 +0000)]
Fix evaluation of LLVM_DEFINITIONS
CMake variable LLVM_DEFINITIONS collects preprocessor definitions provided
for host compiler that builds llvm components. A function
add_llvm_definitions was introduced in AddLLVMDefinitions.cmake to keep
track of these definitions and was intended to be a replacement for CMake
command add_definitions. Actually in many cases add_definitions is still
used and the content of LLVM_DEFINITIONS is not actual now. On the other
hand the current version of CMake allows getting set of definitions in a
more convenient way. This fix implements evaluation of the variable by
reading corresponding cmake property.
Eli Friedman [Tue, 21 Mar 2017 00:26:39 +0000 (00:26 +0000)]
[ARM] Revert r297443 and r297820.
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs. It's not
clear when they will be fixed, so let's just take them out
of the tree for now.
Zachary Turner [Mon, 20 Mar 2017 23:33:18 +0000 (23:33 +0000)]
Add a function to MD5 a file's contents.
In doing so, clean up the MD5 interface a little. Most
existing users only care about the lower 8 bytes of an MD5,
but for some users that care about the upper and lower,
there wasn't a good interface. Furthermore, consumers
of the MD5 checksum were required to handle endianness
details on their own, so it seems reasonable to abstract
this into a nicer interface that just gives you the right
value.
Adrian Prantl [Mon, 20 Mar 2017 21:35:09 +0000 (21:35 +0000)]
Replace uses of DwarfExpression::addMachineReg* with addMachineRegExpression
and mark the methods as protected.
Besides reducing the surface area of DwarfExpression, this is in
preparation for an upcoming bugfix in the DwarfExpression
implementation, for which it will be necessary to defer emitting
register operations until the rest of the expression is known.
Eli Friedman [Mon, 20 Mar 2017 20:25:46 +0000 (20:25 +0000)]
[SCEV] Fix trip multiple calculation
If loop bound containing calculations like min(a,b), the Scalar
Evolution API getSmallConstantTripMultiple returns 4294967295 "-1"
as the trip multiple. The problem is that, SCEV use -1 * umax to
represent umin. The multiple constant -1 was returned, and the logic
of guarding against huge trip counts was skipped. Because -1 has 32
active bits.
The fix attempt to factor more general cases. First try to get the
greatest power of two divisor of trip count expression. In case
overflow happens, the trip count expression is still divisible by the
greatest power of two divisor returned. Returns 1 if not divisible by 2.
These are needed due to some obscure rules in the standard
about how std::vector selects between copy and move
constructors, which can cause a conforming implementation
to attempt to select the copy constructor of RuleMatcher,
which will fail since std::unique_ptr<> isn't copyable.
Kevin Enderby [Mon, 20 Mar 2017 19:46:55 +0000 (19:46 +0000)]
Add the rest of the error checking for Mach-O dyld compact bind entry errors
and test cases for each of the error checks.
To do this more plumbing was needed so that the segment indexes and
segment offsets can be checked. Basically what was done was the SegInfo
from llvm-objdump’s MachODump.cpp was moved into libObject for Mach-O
objects as BindRebaseSegInfo and it is only created when an iterator for
bind or rebase entries are created.
This commit really only adds the error checking and test cases for the
bind table entires and the checking for the lazy bind and weak bind entries
are still to be fully done as well as the rebase entires. Though some of
the plumbing for those are added with this commit. Those other error
checks and test cases will be added in follow on commits.
Note, the two llvm_unreachable() calls should now actually be unreachable
with the error checks in place and would take a logic bug in the error
checking code to be reached if the segment indexes and segment
offsets are used from a checked bind entry. Comments have been added
to the methods that require the arguments to have been checked
prior to calling.
[Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.
Reid Kleckner [Mon, 20 Mar 2017 17:45:59 +0000 (17:45 +0000)]
[WinEH] Adjust decision to emit SEH moves for leaf functions
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.
Support, LTO: When pruning a directory, ignore files matching a prefix.
This is a safeguard against data loss if the user specifies a directory
that is not a cache directory. Teach the existing cache pruning clients
to create files with appropriate names.
Craig Topper [Mon, 20 Mar 2017 16:31:14 +0000 (16:31 +0000)]
[InstCombine] Print a debug message when we constant fold an operand during worklist creation
InstCombine tries to constant fold instruction operands during worklist building, but we don't print that we're doing this.
We also set a change flag here that causes us to rebuild and rerun the worklist one more time even if processing the worklist itself created no additional changes. So in the log I saw two inst combine runs that visited all instructions without printing that anything was changed. I may be submitting another patch to remove the change flag unless I can find some reason why we should be doing that.
Daniel Berlin [Mon, 20 Mar 2017 16:08:29 +0000 (16:08 +0000)]
Templatize parts of VNCoercion, and add constant-only versions of the functions to be used in NewGVN.
NFCI.
Summary:
This is ground work for the changes to enable coercion in NewGVN.
GVN doesn't care if they end up constant because it eliminates as it goes.
NewGVN cares.
IRBuilder and ConstantFolder deliberately present the same interface,
so we use this to our advantage to templatize our functions to make
them either constant only or not.
Jessica Paquette [Mon, 20 Mar 2017 15:51:45 +0000 (15:51 +0000)]
[Outliner] Remove output for offset range check
Forgot to remove some output before committing last time. (Instruction fixups
don't actually overflow anywhere in the test suite so far, so I missed it).
To prevent the outliner from screaming "Overflow!" in the event that that
does happen, this commit removes that output.
Daniel Sanders [Mon, 20 Mar 2017 15:20:42 +0000 (15:20 +0000)]
[tablegen][globalisel] Capture instructions into locals and related infrastructure for multiple instructions matches.
Summary:
Prepare the way for nested instruction matching support by having actions
like CopyRenderer look up operands in the RuleMatcher rather than a
specific InstructionMatcher. This allows actions to reference any operand
from any matched instruction.
It works by checking the 'shape' of the match and capturing
each matched instruction to a local variable. If the shape is wrong
(not enough operands, leaf nodes where non-leafs are expected, etc.), then
the rule exits early without checking the predicates. Once we've captured
the instructions, we then test the predicates as before (except using the
local variables). If the match is successful, then we render the new
instruction as before using the local variables.
It's not noticable in this patch but by the time we support multiple
instruction matching, this patch will also cause a significant improvement
to readability of the emitted code since
MRI.getVRegDef(I->getOperand(0).getReg()) will simply be MI1 after
emitCxxCaptureStmts().
This isn't quite NFC because I've also fixed a bug that I'm surprised we
haven't encountered yet. It now checks there are at least the expected
number of operands before accessing them with getOperand().
Diana Picus [Mon, 20 Mar 2017 14:40:18 +0000 (14:40 +0000)]
[GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.
Maxim Ostapenko [Mon, 20 Mar 2017 14:06:04 +0000 (14:06 +0000)]
[sancov] Fix broken links and displaced coloring in coverage-report-server.py
This patch fixes two issues:
* Fixed relative links to source files
* Enumeration of lines in source files starts from 1 instead of 0 to
align with .symcov files generated by sancov -symbolize
Summary:
ConstantRange class currently has a method getSetSize, which is mostly used to
compare set sizes of two constant ranges (there is only one spot where it's used
in a slightly different scenario). This patch introduces setSizeSmallerThanOf
method, which does such comparison in a more efficient way. In the original
method we have to extend our types to (BitWidth+1), which can result it using
slow case of APInt, extra memory allocations, etc.
The change is supposed to not change any functionality, but it slightly improves
compile time. Here is compile time improvements that I observed on CTMark:
* tramp3d-v4 -2.02%
* pairlocalalign -1.82%
* lencod -1.67%
Xin Tong [Mon, 20 Mar 2017 00:30:19 +0000 (00:30 +0000)]
Remove unnecessary IDom check
Summary: This Idom check seems unnecessary. The immediate children of a node on the Dominator Tree should always be the IDom of its immediate children in this case.
Craig Topper [Sun, 19 Mar 2017 17:11:09 +0000 (17:11 +0000)]
[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel
Summary:
Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust.
This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely.