Simon Dardis [Fri, 24 Nov 2017 16:45:28 +0000 (16:45 +0000)]
[CodeGenPrepare] Check that erased sunken address are not reused
CodeGenPrepare sinks address computations from one basic block to another
and attempts to reuse address computations that have already been sunk. If
the same address computation appears twice with the first instance as an
operand of a load whose result is an operand to a simplifable select,
CodeGenPrepare simplifies the select and recursively erases the now dead
instructions. CodeGenPrepare then attempts to use the erased address
computation for the second load.
Fix this by erasing the cached address value if it has zero uses before
looking for the address value in the sunken address map.
This partially resolves PR35209.
Thanks to Alexander Richardson for reporting the issue!
This fixed version relands r318032 which was reverted in r318049 due to
sanitizer buildbot failures.
John Brawn [Fri, 24 Nov 2017 14:10:45 +0000 (14:10 +0000)]
[CGP] Make optimizeMemoryInst able to combine more kinds of ExtAddrMode fields
This patch extends the recent work in optimizeMemoryInst to make it able to
combine more ExtAddrMode fields than just the BaseReg.
This fixes some benchmark regressions introduced by r309397, where GVN PRE is
hoisting a getelementptr such that it can no longer be combined into the
addressing mode of the load or store that uses it.
Craig Topper [Thu, 23 Nov 2017 19:25:45 +0000 (19:25 +0000)]
[X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr.
The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.
Diana Picus [Thu, 23 Nov 2017 13:26:07 +0000 (13:26 +0000)]
[ARM GlobalISel] Support G_FDIV for s32 and s64
TableGen already generates code for selecting a G_FDIV, so we only need
to add a test.
For the legalizer and reg bank select, we do the same thing as for the
other floating point binary operations: either mark as legal if we have
a FP unit or lower to a libcall, and map to the floating point
registers.
Ying Yi [Thu, 23 Nov 2017 12:48:41 +0000 (12:48 +0000)]
[lit] Implement non-pipelined ‘mkdir’, ‘diff’ and ‘rm’ commands internally
Summary:
The internal shell already supports 'cd', ‘export’ and ‘echo’ commands.
This patch adds implementation of non-pipelined ‘mkdir’, ‘diff’ and ‘rm’
commands as the internal shell builtins.
Diana Picus [Thu, 23 Nov 2017 12:44:20 +0000 (12:44 +0000)]
[ARM GlobalISel] Support G_FMUL for s32 and s64
TableGen already generates code for selecting a G_FMUL, so we only need
to add a test for that part.
For the legalizer and reg bank select, we do the same thing as the other
floating point binary operators: either mark as legal if we have a FP
unit or lower to a libcall, and map to the floating point registers.
Simon Dardis [Thu, 23 Nov 2017 12:38:04 +0000 (12:38 +0000)]
[mips] Use the delay slot filler to convert branches for microMIPSR6.
The MIPS delay slot filler converts delay slot branches into compact
forms for the MIPS ISAs which support them. For branches that compare
(in)equality with with zero, it converts them into branches with implict
zero register operands. These branches have a slightly greater range
than normal two register operands branches.
Changing the branches at this point in the pipeline offers the long
branch pass the ability to mark better judgements if a long branch
sequence is required.
[MSan] Move the access address check before the shadow access for that address
MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in the userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.
This patch moves the address check in front of the shadow access.
Max Kazantsev [Thu, 23 Nov 2017 06:14:39 +0000 (06:14 +0000)]
[IRCE][NFC] Add no wrap flags to no-wrapping SCEV calculation
In a lambda where we expect to have result within bounds, add respective `nsw/nuw` flags to
help SCEV just in case if it fails to figure them out on its own.
Yaxun Liu [Thu, 23 Nov 2017 03:08:51 +0000 (03:08 +0000)]
[NFC] CodeGen: Handle shift amount type in DAGTypeLegalizer::SplitInteger
This patch reverts change to X86TargetLowering::getScalarShiftAmountTy in
rL318727 and move the logic to DAGTypeLegalizer::SplitInteger.
The reason is that getScalarShiftAmountTy returns a shift amount type that
is suitable for common use cases in CodeGen. DAGTypeLegalizer::SplitInteger
is a rare situation which requires a shift amount type larger than what
getScalarShiftAmountTy. In this case, it is more reasonable to do special
handling of shift amount type in DAGTypeLegalizer::SplitInteger only. If
similar situations arises the logic may be moved to a separate function.
Fedor Sergeev [Wed, 22 Nov 2017 20:59:53 +0000 (20:59 +0000)]
IR printing improvement for loop passes
Summary:
Loop-pass printing is somewhat deficient since it does not provide the
context around the loop (e.g. preheader). This context information becomes
pretty essential when analyzing transformations that move stuff out of the loop.
Extending printLoop to cover preheader and exit blocks (if any).
[Hexagon] Implement buildVector32 and buildVector64 as utility functions
Change LowerBUILD_VECTOR to use those functions. This commit will tempora-
rily affect constant vector generation (it will generate constant-extended
values instead of non-extended combines), but the code for the general case
should be better. The constant selection part will be fixed later.
Rafael Espindola [Wed, 22 Nov 2017 19:59:05 +0000 (19:59 +0000)]
Allow TempFile::discard to be called twice.
We already allowed keep+discard. It is important to be able to discard
a temporary if a rename fail. It is also convenient as it allows the
use of RAII for discarding.
CachePruning: Allow limiting the number of files in the cache directory.
The default limit is 1000000 but it can be configured with a cache
policy. The motivation is that some filesystems (notably ext4) have
a limit on the number of files that can be contained in a directory
(separate from the inode limit).
Paul Robinson [Wed, 22 Nov 2017 15:33:17 +0000 (15:33 +0000)]
[DWARFv5] Support DW_FORM_strp in the .debug_line.dwo header.
As a side effect, the .debug_line section will be dumped in physical
order, rather than in the order that compile units refer to their
associated portions of the .debug_line section. These are probably
always the same order anyway, and no tests noticed the difference.
Nicolai Haehnle [Wed, 22 Nov 2017 12:25:21 +0000 (12:25 +0000)]
AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the peephole optimizer.
However, equivalent situations arise with buffer loads and stores
as well, so this fixes regressions since r317751 ("AMDGPU: Merge
S_BUFFER_LOAD_DWORD_IMM into x2, x4").
Fixes at least:
KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
KHR-GL45.cull_distance.functional
piglit tes-input-gl_ClipDistance.shader_test
... and probably more
George Rimar [Wed, 22 Nov 2017 07:53:48 +0000 (07:53 +0000)]
[llvm-tblgen] - Stop using std::string in RecordKeeper.
RecordKeeper::getDef() is a hot place, it shows up in profiling
and it creates std::string instance for each search in RecordMap
though RecordKeeper::RecordMap can use StringRef as a key
instead to avoid that. Patch do that change.
Craig Topper [Wed, 22 Nov 2017 07:11:03 +0000 (07:11 +0000)]
[X86] Lower all ISD::MGATHER nodes to X86ISD:MGATHER.
Now we consistently represent the mask result without relying on isel ignoring it.
We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both both vXi1 and XMM/YMM mask types with a single set of constraints.
Max Kazantsev [Wed, 22 Nov 2017 06:21:39 +0000 (06:21 +0000)]
[SCEV] Strengthen variance condition in calculateLoopDisposition
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for exaple, to attempt of incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR1)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Davide Italiano [Wed, 22 Nov 2017 03:04:55 +0000 (03:04 +0000)]
[SCCP] Pick the right lattice value for constants.
After the dataflow algorithm proves that an argument is constant,
it replaces it value with the integer constant and drops the lattice
value associated to the DEF.
e.g. in the example we have @f() that's called twice:
call @f(undef, ...)
call @f(2, ...)
`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.
Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.
The fix is that of checking whether we replaced the value already and
get a temporary lattice value for the constant.
Craig Topper [Tue, 21 Nov 2017 23:36:42 +0000 (23:36 +0000)]
[X86] Move the information about the feature bits used by compiler-rt and shared by Host.cpp to a .def file and TargetParser.h so clang can make use of it.
Since we keep Host.cpp and compiler-rt relatively in sync, clang can use this information as a proxy.
Change the representation of COFF comdats so that a COFF linker
is able to accurately resolve comdats between IR and native object
files. Specifically, apply name mangling to comdat names consistently
with native object files, and do not export comdats with an internal
leader because they do not affect symbol resolution.
Chad Rosier [Tue, 21 Nov 2017 18:08:34 +0000 (18:08 +0000)]
[AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as *having* side effects.
This partially reverts r298851. The the underlying issue is that we don't
currently model the dependency between mrs (read system register) and
msr (write system register) instructions.
Something like the below should never be reordered:
Oliver Stannard [Tue, 21 Nov 2017 16:20:25 +0000 (16:20 +0000)]
[ARM] Remove pre-UAL FLDM/FSTM aliases
These are pre-UAL syntax, and we don't support any other pre-UAL instructions,
with the exception of FLDMX/FSTMX, which don't have a UAL equivalent. Therefore
there's no reason to keep them or their AsmParser hacks around.
With the AsmParser hacks removed, the FLDMX and FSTMX instructions get the same
operand diagnostics as the UAL instructions.
Alina Sbirlea [Tue, 21 Nov 2017 15:45:46 +0000 (15:45 +0000)]
Add MemorySSA as loop dependency, disabled by default [NFC].
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Oliver Stannard [Tue, 21 Nov 2017 15:16:50 +0000 (15:16 +0000)]
[Asm] Improve "too few operands" errors
- We can still emit this error if the actual instruction has two or more
operands missing compared to the expected one.
- We should only emit this error once per instruction.
Oliver Stannard [Tue, 21 Nov 2017 15:12:05 +0000 (15:12 +0000)]
[Asm] Finish matching once end of formal and actual lists reached (NFC)
This is NFC, as the matcher would continue looping up to the maximum
number of operands with no effect, but this should improve performance a
bit, and makes the debug trace clearer.
Sander de Smalen [Tue, 21 Nov 2017 12:26:06 +0000 (12:26 +0000)]
[TableGen] AsmMatcher: Fix bug with reported diagnostic for operand.
Summary:
The generated diagnostic by the AsmMatcher isn't always applicable to the AsmOperand.
This is because the code will only update the diagnostic if it is more specific than the previous diagnostic. However, when having validated operands and 'moved on' to a next operand (for some instruction/alias for which all previous operands are valid), if the diagnostic is InvalidOperand, than that should be set as the diagnostic, not the more specific message about a previous operand for some other instruction/alias candidate.
Alex Bradbury [Tue, 21 Nov 2017 12:00:19 +0000 (12:00 +0000)]
[RISCV][NFC] Clean up RISCVDAGToDAGISel::Select
As pointed out in post-commit review of r318738, `return ReplaceNode(..)` when
both ReplaceNode and the current function return void is confusing. This patch
moves to using a more obvious early return, and moves to just using an if to
catch the one case we currently care about. A future patch that adds further
custom instruction selection can introduce a switch.
properlyDominates() shouldn't be used as sort key. It causes different output between stdlibc++ and libc++.
Instead, I introduced RPOT. In most cases, it works for CSE.