Daniel Sanders [Mon, 20 Feb 2017 15:30:43 +0000 (15:30 +0000)]
[globalisel] OperandPredicateMatcher's shouldn't need to generate the MachineOperand expr. NFC
Summary:
Each OperandPredicateMatcher shouldn't need to know how to generate the expression
to reference a MachineOperand. The OperandMatcher should provide it.
In addition to separating responsibilities, this also lays some groundwork for
decoupling source patterns from destination patterns to allow invented operands
or operands provided by GlobalISel's equivalent to the ComplexPattern<> class.
Diana Picus [Mon, 20 Feb 2017 14:45:58 +0000 (14:45 +0000)]
[ARM] GlobalISel: Don't select atomic loads
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.
In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.
Daniel Sanders [Mon, 20 Feb 2017 14:31:27 +0000 (14:31 +0000)]
[globalisel] Separate the SelectionDAG importer from the emitter. NFC
Summary:
In the near future the rules will be sorted between these two steps to
ensure that more important rules are not prevented by less important ones.
Igor Breger [Mon, 20 Feb 2017 14:16:29 +0000 (14:16 +0000)]
[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles.
Removing the VINSERT node, we don't need it any more.
Sjoerd Meijer [Mon, 20 Feb 2017 10:57:54 +0000 (10:57 +0000)]
AArch64AsmParser: tablegen the isBranchTarget helper functions
Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup
that removes almost identical functions that differ only in a few constants.
Ayman Musa [Mon, 20 Feb 2017 08:27:54 +0000 (08:27 +0000)]
[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0.
This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible).
The current ObjectLinkingLayer (now RTDyldObjectLinkingLayer) links objects
in-process using MCJIT's RuntimeDyld class. In the near future I hope to add new
object linking layers (e.g. a remote linking layer that links objects in the JIT
target process, rather than the client), so I'm renaming this class to be more
descriptive.
Simon Pilgrim [Sun, 19 Feb 2017 19:40:31 +0000 (19:40 +0000)]
[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements.
Replaces existing approach that could only search BUILD_VECTOR nodes.
Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements.
Simon Pilgrim [Sun, 19 Feb 2017 17:19:38 +0000 (17:19 +0000)]
[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths.
As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths.
We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles.
Craig Topper [Sun, 19 Feb 2017 08:03:23 +0000 (08:03 +0000)]
[AVX-512] Add patterns to show missed opportunities for folding vpternlog with broadcast loads. Also demonstrates a bug in the commuting of broadcast vpternlog instructions when we are able to select them.
Daniel Berlin [Sun, 19 Feb 2017 04:28:56 +0000 (04:28 +0000)]
Add initial support for debug counting
Summary:
We have support for bisection, and bugpoint can reduce testcases
often to a single pass. But that doesn't help reduce it to a single
transform by a single pass. Which debug counting lets us do.
Debug counting lets you instrument a pass so that it only executes a
certain thing (rwhatever you want) after skipping it a certain time of
times, and then only does a certain number of executions before saying
"skip" again.
To make it concrete, for predicateinfo, if i instrument use renaming,
i can make it so it skips renaming the first N uses, renames the next
N, and then skips the rest.
This lets you narrow down a miscompilation to, often, a single
transformation, and then also debug it (by using the same command line
parameters).
Craig Topper [Sun, 19 Feb 2017 02:08:48 +0000 (02:08 +0000)]
[X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type.
Craig Topper [Sat, 18 Feb 2017 22:53:43 +0000 (22:53 +0000)]
[X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.
Sanjay Patel [Sat, 18 Feb 2017 21:59:09 +0000 (21:59 +0000)]
[InstSimplify] add nsw/nuw (xor X, signbit), signbit --> X
The change to InstCombine in:
https://reviews.llvm.org/D29729
...exposes this missing fold in InstSimplify, so adding this
first to avoid a regression.
Craig Topper [Sat, 18 Feb 2017 19:51:25 +0000 (19:51 +0000)]
[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.
Easwaran Raman [Sat, 18 Feb 2017 17:22:52 +0000 (17:22 +0000)]
Refactor instruction simplification code in visitors. NFC.
Several visitors check if operands to the instruction are constants,
either as it is or after looking up SimplifiedValues, check if the
result is a constant and update the SimplifiedValues map. This
refactoring splits it into a common function that does the checking of
whether the operands are constants and updating of the SimplifiedValues
table, and an instruction specific part that is implemented by each
instruction visitor as a lambda and passed to the common function.
Brian Cain [Sat, 18 Feb 2017 15:13:58 +0000 (15:13 +0000)]
opt-viewer: Fix syntax highlighting
Syntax highlighting has been done line-at-a-time. Done this way, the lexer
resets the context at each line, distorting the formatting.
This change will render the whole file at once and feed the highlighted text
line-at-a-time to be wrapped by the SourceFileRenderer.
Leading/trailing newlines were being ignored by Pygments but since each line
was rendered in its own row, it didn't matter. This bug was masked by the
line-at-a-time algorithm. So now we need to add "stripnl=False" to the
CppLexer to change its behavior to match the expectation.
Davide Italiano [Sat, 18 Feb 2017 03:02:44 +0000 (03:02 +0000)]
[IR/Verifier] Don't visit DISubprograms more than needed.
Before this patch we happened to visit twice, one when scanning
MDNodes and the other one while visiting the function. Remove
the explicit call to visitDISubprogram there, so we don't emit
the same error twice in case the verifier fail and we save some
time when running it.
Thanks to Justin Bogner for the report and Adrian for the quick
review!
Justin Bogner [Sat, 18 Feb 2017 02:00:27 +0000 (02:00 +0000)]
OptDiag: Allow constructing DiagnosticLocation from DISubprograms
This avoids creating a DILocation just to represent a line number,
since creating Metadata is expensive. Creating a DiagnosticLocation
directly is much cheaper.
Justin Bogner [Sat, 18 Feb 2017 00:42:23 +0000 (00:42 +0000)]
OptDiag: Decouple backend diagnostics from debug info metadata
This creates and uses a DiagnosticLocation type rather than using
DebugLoc for this purpose in the backend diagnostics. This is NFC for
now, but will allow us to create locations for diagnostics without
having to create new metadata nodes when we don't have a DILocation.
Matthias Braun [Sat, 18 Feb 2017 00:41:16 +0000 (00:41 +0000)]
MachineRegionInfo: Fix pass initialization
- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in
the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
MachinePostDominatorTree and MachineDominanceFrontier instead of their
respective IR versions.
- Since there were no tests for this, add a X86 MIR test.
Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com>