Craig Topper [Sat, 12 Aug 2017 20:19:44 +0000 (20:19 +0000)]
[X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction
Summary:
Previously we were creating the flag result with MVT::Other which is interpretted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result.
Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types.
Florian Hahn [Sat, 12 Aug 2017 17:40:18 +0000 (17:40 +0000)]
[Triple] Add isThumb and isARM functions.
Summary:
isThumb returns true for Thumb triples (little and big endian), isARM
returns true for ARM triples (little and big endian).
There are a few more checks using arm/thumb that are not covered by
those functions, e.g. that the architecture is either ARM or Thumb
(little endian) or ARM/Thumb little endian only.
Sanjay Patel [Sat, 12 Aug 2017 16:41:08 +0000 (16:41 +0000)]
[BDCE] clear poison generators after turning a value into zero (PR33695, PR34037)
nsw, nuw, and exact carry implicit assumptions about their operands, so we need
to clear those after trivializing a value. We decided there was no danger for
llvm.assume or metadata, so there's just a comment about that.
This fixes miscompiles as shown in:
https://bugs.llvm.org/show_bug.cgi?id=33695
https://bugs.llvm.org/show_bug.cgi?id=34037
Sanjay Patel [Fri, 11 Aug 2017 22:38:40 +0000 (22:38 +0000)]
[x86] add tests for rotate left/right with masked shifter; NFC
As noted in the test comment, instcombine now produces the masked
shift value even when it's not included in the source, so we should
handle this.
Although the AMD/Intel docs don't say it explicitly, over-rotating
the narrow ops produces the same results. An existence proof that
this works as expected on all x86 comes from gcc 4.9 or later:
https://godbolt.org/g/K6rc1A
Eli Friedman [Fri, 11 Aug 2017 21:12:04 +0000 (21:12 +0000)]
[OptDiag] Updating Remarks in SampleProfile
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.
Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.
Zachary Turner [Fri, 11 Aug 2017 20:46:28 +0000 (20:46 +0000)]
Output S_SECTION symbols to the Linker module.
PDBs need to contain 1 module for each object file/compiland,
and a special one synthesized by the linker. This one contains
a symbol record for each output section in the executable with
its address information. This patch adds such symbols to the
linker module. Note that we also are supposed to add an
S_COFFGROUP symbol for what appears to be each input section that
contributes to each output section, but it's not entirely clear
how to generate these yet, so I'm leaving that for a separate
patch.
Daniel Sanders [Fri, 11 Aug 2017 19:19:21 +0000 (19:19 +0000)]
Revert r310716 (and r310735): [globalisel][tablegen] Support zero-instruction emission.
Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir
which should not have been affected by the change. Reverting while I investigate.
Also reverted r310735 because it builds on r310716.
Zachary Turner [Fri, 11 Aug 2017 19:00:03 +0000 (19:00 +0000)]
[LLD/PDB] Write actual records to the globals stream.
Previously we were writing an empty globals stream. Windows
tools interpret this as "private symbols are not present in
this PDB", even when they are, so we need to fix this. Regardless,
without it we don't have information about global variables, so
we need to fix it anyway. This patch does that.
With this patch, the "lm" command in WinDbg correctly reports
that we have private symbols available, but the "dv" command
still refuses to display local variables.
Brian Gesiak [Fri, 11 Aug 2017 18:05:26 +0000 (18:05 +0000)]
[opt-viewer] Decode HTML bytes for Python 3
Summary:
When using Python 3, `pygments.highlight()` returns a `bytes` object, not
a `str`, causing the call to `str.replace` on the following line to fail
with a runtime exception:
`TypeError: 'str' does not support the buffer interface`. Decode the
bytes into a string in order to fix the exception.
Test Plan:
Run `opt-viewer.py` with Python 3.4, and confirm no runtime error occurs
when calling `str.replace`.
Brian Gesiak [Fri, 11 Aug 2017 17:56:57 +0000 (17:56 +0000)]
[opt-viewer] Use Python 3-compatible `intern()`
Summary:
In Python 2, `intern()` is a builtin function available to all programs.
In Python 3, it was moved into the `sys` module, available as
`sys.intern`. Import it such that, within `optrecord.py`, `intern()` is
available whether run using Python 2 or 3.
Test Plan:
Run `opt-viewer.py` using Python 3, confirm it no longer
encounters a runtime error when `intern()` is called.
The pass does simplifications of well known AMD library calls.
If given -amdgpu-prelink option it works in a pre-link mode which
allows to reference new library functions which will be linked in
later.
In addition it also used to process traditional AMD option
-fuse-native which allows to replace some of the functions with
their fast native implementations from the library.
The necessary glue to pass the prelink option and translate
-fuse-native is to be added to the driver.
Craig Topper [Fri, 11 Aug 2017 16:20:05 +0000 (16:20 +0000)]
[x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512
Summary:
This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size.
We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not.
I believe this is step towards being able to handle D36454 without a special case.
From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner.
Sanjay Patel [Fri, 11 Aug 2017 15:44:14 +0000 (15:44 +0000)]
[x86] use more shift or LEA for select-of-constants (2nd try)
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version add a check
for that and more test were added in r310490.
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the
push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a
post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Daniel Sanders [Fri, 11 Aug 2017 15:40:32 +0000 (15:40 +0000)]
[globalisel][tablegen] Support zero-instruction emission.
Summary:
Support the case where an operand of a pattern is also the whole of the
result pattern. In this case the original result and all its uses must be
replaced by the operand. However, register class restrictions can require
a COPY. This patch handles both cases by always emitting the copy and
leaving it for the register allocator to optimize.
Simon Dardis [Fri, 11 Aug 2017 14:36:05 +0000 (14:36 +0000)]
[mips] Lift the assertion on the types that can be used with MipsGPRel
Post commit review of rL308619 highlighted the need for handling N64
with -fno-pic. Testing reveale a stale assert when generating a GP
relative addressing mode.
This patch removes that assert and adds the necessary patterns for
MIPS64 to perform gp relative addressing with -fno-pic
(and the implicit -mno-abicalls + -mgpopt).
Michal Gorny [Fri, 11 Aug 2017 13:25:20 +0000 (13:25 +0000)]
[cmake] Expose the dependencies of ExecutionEngine as PUBLIC
Expose the dependencies of LLVMExecutionEngine library as PUBLIC rather
than PRIVATE when building a shared library. This is necessary because
the library is not contained but exposes API of other LLVM libraries via
its headers.
This causes other libraries to fail to link if the linker verifies for
correctness of -l flags (i.e. fails on indirect dependencies). This e.g.
happens when building LLDB against shared LLVM:
lib64/liblldbExpression.a(IRExecutionUnit.cpp.o):(.data.rel.ro._ZTIN4llvm18MCJITMemoryManagerE[_ZTIN4llvm18MCJITMemoryManagerE]+0x10): undefined reference to `typeinfo for llvm::RuntimeDyld::MemoryManager'
lib64/liblldbExpression.a(IRExecutionUnit.cpp.o):(.data.rel.ro._ZTVN4llvm18MCJITMemoryManagerE[_ZTVN4llvm18MCJITMemoryManagerE]+0x60): undefined reference to `llvm::RuntimeDyld::MemoryManager::anchor()'
lib64/liblldbExpression.a(IRExecutionUnit.cpp.o):(.data.rel.ro._ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE[_ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE]+0x48): undefined reference to `llvm::RTDyldMemoryManager::deregisterEHFrames()'
lib64/liblldbExpression.a(IRExecutionUnit.cpp.o):(.data.rel.ro._ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE[_ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE]+0x60): undefined reference to `llvm::RuntimeDyld::MemoryManager::anchor()'
lib64/liblldbExpression.a(IRExecutionUnit.cpp.o):(.data.rel.ro._ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE[_ZTVN12lldb_private15IRExecutionUnit13MemoryManagerE]+0xd0): undefined reference to `llvm::JITSymbolResolver::anchor()'
collect2: error: ld returned 1 exit status
Declaring the dependencies as PUBLIC guarantees that any package using
the ExecutionEngine library will also get explicit -l flags for
the dependent libraries guaranteeing that the symbols exposed in headers
could be resolved.
Mikael Holmen [Fri, 11 Aug 2017 06:57:08 +0000 (06:57 +0000)]
[IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert*
Summary:
This fixes PR32721 in IfConvertTriangle and possible similar problems in
IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond.
In PR32721 we had a triangle
EBB
| \
| |
| TBB
| /
FBB
where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were be merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).
The edge updating code and branch probability updating code is now pushed
into MergeBlocks() which allows us to share the same update logic between
more callsites. This lets us remove several dependencies on analyzeBranch
and completely eliminate RemoveExtraEdges.
One thing that showed up with this patch was that IfConversion sometimes
left a successor with 0% probability even if there was no branch or
fallthrough to the successor.
One such example from the test case ifcvt_bad_zero_prob_succ.mir. The
indirect branch tBRIND can only jump to bb.1, but without the patch we
got:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000), %bb.2(0x00000000)
tBRIND %r1, 1, %cpsr
B %bb.1
bb.2:
There is no way to jump from bb.1 to bb2, but still there is a 0% edge
from bb.1 to bb.2.
With the patch applied we instead get the expected:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000)
tBRIND %r1, 1, %cpsr
B %bb.1
Since bb.2 had no predecessor at all, it was removed.
Several testcases had to be updated due to this since the removed
successor made the "Branch Probability Basic Block Placement" pass
sometimes place blocks in a different order.
Finally added a couple of new test cases:
* PR32721_ifcvt_triangle_unanalyzable.mir:
Regression test for the original problem dexcribed in PR 32721.
* ifcvt_triangleWoCvtToNextEdge.mir:
Regression test for problem that caused a revert of my first attempt
to solve PR 32721.
* ifcvt_simple_bad_zero_prob_succ.mir:
Test case showing the problem where a wrong successor with 0% probability
was previously left.
* ifcvt_[diamond|forked_diamond|simple]_unanalyzable.mir
Very simple test cases for the simple and (forked) diamond cases
involving unanalyzable branches that can be nice to have as a base if
wanting to write more complicated tests.
Chandler Carruth [Fri, 11 Aug 2017 05:47:13 +0000 (05:47 +0000)]
[PM] Switch the CGSCC debug messages to use the standard LLVM debug
printing techniques with a DEBUG_TYPE controlling them.
It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).
Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.
Jessica Paquette [Thu, 10 Aug 2017 23:11:24 +0000 (23:11 +0000)]
[MachineOutliner] Add RegState::Define to LDRXpost in insertOutlinedCall
This fixes a MachineVerifier failure in machine-outliner.mir. Not explicitly
adding RegState::Define to the LR argument makes it unhappy because an explicit
definition is marked as a use.
Reid Kleckner [Thu, 10 Aug 2017 21:14:07 +0000 (21:14 +0000)]
Disable some IR death tests when SEH is available
They hang for me locally. I suspect that there is a use-after-free when
attempting to destroy an LLVMContext after asserting from the middle of
metadata tracking. It doesn't seem worth debugging it further.
Craig Topper [Thu, 10 Aug 2017 20:35:34 +0000 (20:35 +0000)]
[InstCombine] Make (X|C1)^C2 -> X^(C1^C2) iff X&~C1 == 0 work for splat vectors
This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2).
Nirav Dave [Thu, 10 Aug 2017 19:52:45 +0000 (19:52 +0000)]
[DAG] Relax type restriction for store merge
Summary: Allow stores of bitcastable types to be merged by peeking through BITCAST nodes and recasting stored values constant and vector extract nodes as necessary.
Taewook Oh [Thu, 10 Aug 2017 18:17:11 +0000 (18:17 +0000)]
Make .file directive to have basename only
Summary:
Currently LLVM puts directory along with the filename in .file directive, but this behavior doesn't match gcc. There's a no clear description about which one is right (https://sourceware.org/binutils/docs/as/File.html#File), but one document (https://sourceware.org/gdb/current/onlinedocs/stabs/ELF-Linker-Relocation.html) suggests that STT_FILE symbol in elf file is expected to have basename only, which should have a same sting file .file directive according to (https://docs.oracle.com/cd/E26502_01/html/E28388/eoiyg.html).
This also affects badly on the build system that uses hashing, as the directory info could be differnt from developer to developer even when they're working on same file.
Craig Topper [Thu, 10 Aug 2017 17:48:14 +0000 (17:48 +0000)]
[InstCombine] Fix a crash in getSelectCondition if we happen to have two inverse vectors of i1 constants.
We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.
Craig Topper [Thu, 10 Aug 2017 17:48:12 +0000 (17:48 +0000)]
[InstCombine] Add a DEBUG_COUNTER to InstCombine to limit how many instructions are visited for debug
Sometimes it would be nice to stop InstCombine mid way through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process.
This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped.
Benjamin Kramer [Thu, 10 Aug 2017 17:38:41 +0000 (17:38 +0000)]
[gold-plugin] Avoid race condition when creating temporary files.
This is both a potential security issue and a potential functionality
issue because we create temporary files from multiple threads. Use
the safe version of createTemporaryFile instead.
Simon Pilgrim [Thu, 10 Aug 2017 17:27:20 +0000 (17:27 +0000)]
[CostModel][X86] Improve single src shuffle costs
Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles
Marek Sokolowski [Thu, 10 Aug 2017 16:21:44 +0000 (16:21 +0000)]
Add .rc scripts tokenizer.
This extends the shell of llvm-rc tool with the ability of tokenization
of the input files. Currently, ASCII and ASCII-compatible UTF-8 files
are supported.
Thanks to Nico Weber (thakis) for his original work in this area.
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.
[sanitizer-coverage] Change cmp instrumentation to distinguish const operands
This implementation of SanitizerCoverage instrumentation inserts different
callbacks depending on constantness of operands:
1. If both operands are non-const, then a usual
__sanitizer_cov_trace_cmp[1248] call is inserted.
2. If exactly one operand is const, then a
__sanitizer_cov_trace_const_cmp[1248] call is inserted. The first
argument of the call is always the constant one.
3. If both operands are const, then no callback is inserted.
This separation comes useful in fuzzing when tasks like "find one operand
of the comparison in input arguments and replace it with the other one"
have to be done. The new instrumentation allows us to not waste time on
searching the constant operands in the input.
[libFuzzer] Update LibFuzzer w.r.t. the new comparisons instrumentation API
Added the _sanitizer_cov_trace_const_cmp[1248] callbacks.
For now they are implemented the same way as _sanitizer_cov_trace_cmp[1248].
For more details, please see https://reviews.llvm.org/D36465.
Oleg Ranevskyy [Thu, 10 Aug 2017 13:37:58 +0000 (13:37 +0000)]
[CMake][LLVM] Remove duplicated library mask. Broken clang linking against clangShared
Summary:
The `LLVM${c}Info` mask is listed twice in LLVM-Config.cmake. This results in the libraries such as LLVMARMInfo, LLVMAArch4Info, etc appearing twice in `extract_symbols.py` command line while building `clangShared`. `Extract_symbols.py` does not work well in such a case and completely ignores the symbols from the duplicated libraries. Thus, the LLVM(...)Info symbols do not get exported from `clangShared` and linking clang against it fails with unresolved dependencies.
Seems to be a mere copy-paste mistake.
Reviewers: beanz, chapuni
Reviewed By: chapuni
Subscribers: chapuni, aemerson, mgorny, kristof.beyls, llvm-commits, asl
Nikolai Bozhenov [Thu, 10 Aug 2017 11:24:57 +0000 (11:24 +0000)]
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022). But was disabled by
default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Sam Parker [Thu, 10 Aug 2017 09:41:00 +0000 (09:41 +0000)]
[ARM][AArch64] ARMv8.3-A enablement
The beta ARMv8.3 ISA specifications have been released for AArch64
and AArch32, these can be found at:
https://developer.arm.com/products/architecture/a-profile/exploration-tools
An introduction to this architecture update can be found at:
https://community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
This patch is the first in a series which will add ARM v8.3-A support
in LLVM and Clang. It adds the necessary changes that create targets
for both the ARM and AArch64 backends.
Elad Cohen [Thu, 10 Aug 2017 07:44:23 +0000 (07:44 +0000)]
[SelectionDAG] When scalarizing vselect, don't assert on
a legal cond operand.
When scalarizing the result of a vselect, the legalizer currently expects
to already have scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is
illegal to hit an assertion during the scalarization.
The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - We do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant dag nodes which will be
combined in a separate soon to come patch.
Dehao Chen [Thu, 10 Aug 2017 05:10:32 +0000 (05:10 +0000)]
Revert part of r310296 to make it really NFC for instrumentation PGO.
Summary: Part of r310296 will disable PGOIndirectCallPromotion in ThinLTO backend if PGOOpt is None. However, as PGOOpt is not passed down to ThinLTO backend for instrumentation based PGO, that change would actually disable ICP entirely in ThinLTO backend, making it behave differently in instrumentation PGO mode. This change reverts that change, and only disable ICP there when it is SamplePGO.
Chandler Carruth [Thu, 10 Aug 2017 03:05:21 +0000 (03:05 +0000)]
[LCG] Fix an assert in a on-scope-exit lambda that checked the contents
of the returned value.
Checking the returned value from inside of a scoped exit isn't actually
valid. It happens to work when NRVO fires and the stars align, which
they reliably do with Clang but don't, for example, on MSVC builds.
Hiroshi Yamauchi [Thu, 10 Aug 2017 02:23:14 +0000 (02:23 +0000)]
[LVI] Fix LVI compile time regression around constantFoldUser()
Summary:
Avoid checking each operand and calling getValueFromCondition() before calling
constantFoldUser() when the instruction type isn't supported by
constantFoldUser().
This fixes a large compile time regression in an internal build.
Linker: Create a function declaration when moving a non-prevailing alias of function type.
We were previously creating a global variable of function type,
which is invalid IR. This issue was exposed by r304690, in which we
started asserting that global variables were of a valid type.
CFI code generation for users (not just callers) of a function depends
on whether this function has a jumptable entry or not. This
information needs to be encoded in of thinlto cache key.
We filter the jumptable list against functions that are actually
referenced in the current module.