Zachary Turner [Tue, 21 Feb 2017 21:31:28 +0000 (21:31 +0000)]
Try to fix the buildbot on OSX.
Since I'm only seeing failures on OSX, and it's saying
permission denied, I'm suspecting this is due to the addition
of the MAP_RESILIENT_CODESIGN and/or MAP_RESILIENT_MEDIA flags.
Speculatively trying to remove those to get the bots working.
Rafael Espindola [Tue, 21 Feb 2017 20:40:54 +0000 (20:40 +0000)]
Don't modify archive members unless really needed.
For whatever reason ld64 requires that member headers (not the member
themselves) should be aligned. The only way to do that is to edit the
previous member so that it ends at an aligned boundary.
Since modifying data put in an archive is an undesirable property,
llvm-ar should only do it when it is absolutely necessary.
Sanjay Patel [Tue, 21 Feb 2017 19:33:53 +0000 (19:33 +0000)]
[InstCombine] canonicalize non-obivous forms of integer min/max
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use
matchSelectPattern or m_SMax and friends.
The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738
If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering.
Zachary Turner [Tue, 21 Feb 2017 19:29:56 +0000 (19:29 +0000)]
Remove svn:eol-style property from 2 files.
There are still over 3400 files remaining with this property set, but there are tens of thousands more with the property not set. Until we decide what to do on a global scale, this at least unblocks me temporarily.
Matt Arsenault [Tue, 21 Feb 2017 19:12:08 +0000 (19:12 +0000)]
AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
Adrian Prantl [Tue, 21 Feb 2017 19:03:15 +0000 (19:03 +0000)]
Teach the IR verifier to reject conflicting debug info for function arguments.
Conflicting debug info for function arguments causes hard-to-debug
assertions in the DWARF backend, so the Verifier should reject it.
For performance reasons this only checks function arguments from
non-inlined debug intrinsics for now.
Geoff Berry [Tue, 21 Feb 2017 18:53:14 +0000 (18:53 +0000)]
[CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.
The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.
This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).
John Brawn [Tue, 21 Feb 2017 16:41:29 +0000 (16:41 +0000)]
[ARM] Correct SP/PC handling in t2MOVr
PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.
The patch introduces new way of narrowing complex (>UINT16 variants) solutions.
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).
Summary:
The method is based on registers number mathematical expectation and should be
generally closer to optimal solution.
Please see details in comments to
"LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
(in lib/Transforms/Scalar/LoopStrengthReduce.cpp).
Craig Topper [Tue, 21 Feb 2017 06:39:13 +0000 (06:39 +0000)]
[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.
This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
Matthias Braun [Tue, 21 Feb 2017 01:27:33 +0000 (01:27 +0000)]
ScheduleDAG: Cleanup; NFC
- Fix doxygen comments (do not repeat documented name, remove definition
comment if there is already one at the declaration, add \p, ...)
- Add some const modifiers
- Use range based for
Matthias Braun [Tue, 21 Feb 2017 01:27:29 +0000 (01:27 +0000)]
SubtargetFeature: Cleanup; NFC
- Fix doxygen comments
- Remove duplicated comments
- Remove section comments (which became wrong over time)
- Use more `const` and references but less `auto`
Taewook Oh [Tue, 21 Feb 2017 00:12:38 +0000 (00:12 +0000)]
[BranchFolding] Update debug location along with the update of branch instruction.
Summary:
Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of
Daniel Sanders [Mon, 20 Feb 2017 15:30:43 +0000 (15:30 +0000)]
[globalisel] OperandPredicateMatcher's shouldn't need to generate the MachineOperand expr. NFC
Summary:
Each OperandPredicateMatcher shouldn't need to know how to generate the expression
to reference a MachineOperand. The OperandMatcher should provide it.
In addition to separating responsibilities, this also lays some groundwork for
decoupling source patterns from destination patterns to allow invented operands
or operands provided by GlobalISel's equivalent to the ComplexPattern<> class.
Diana Picus [Mon, 20 Feb 2017 14:45:58 +0000 (14:45 +0000)]
[ARM] GlobalISel: Don't select atomic loads
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.
In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.
Daniel Sanders [Mon, 20 Feb 2017 14:31:27 +0000 (14:31 +0000)]
[globalisel] Separate the SelectionDAG importer from the emitter. NFC
Summary:
In the near future the rules will be sorted between these two steps to
ensure that more important rules are not prevented by less important ones.
Igor Breger [Mon, 20 Feb 2017 14:16:29 +0000 (14:16 +0000)]
[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles.
Removing the VINSERT node, we don't need it any more.
Sjoerd Meijer [Mon, 20 Feb 2017 10:57:54 +0000 (10:57 +0000)]
AArch64AsmParser: tablegen the isBranchTarget helper functions
Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup
that removes almost identical functions that differ only in a few constants.
Ayman Musa [Mon, 20 Feb 2017 08:27:54 +0000 (08:27 +0000)]
[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0.
This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible).
The current ObjectLinkingLayer (now RTDyldObjectLinkingLayer) links objects
in-process using MCJIT's RuntimeDyld class. In the near future I hope to add new
object linking layers (e.g. a remote linking layer that links objects in the JIT
target process, rather than the client), so I'm renaming this class to be more
descriptive.