Tom Stellard [Wed, 30 May 2018 19:04:40 +0000 (19:04 +0000)]
Merging r326358:
------------------------------------------------------------------------
r326358 | dim | 2018-02-28 12:04:21 -0800 (Wed, 28 Feb 2018) | 29 lines
Fix llvm-config --system-libs output on FreeBSD and NetBSD
Summary:
For various reasons, CMake's detection mechanism for `backtrace()`
returns an absolute path `/usr/lib/libexecinfo.so` on FreeBSD and
NetBSD.
Since `tools/llvm-config/CMakeLists.txt` only checks if system
libraries start with `-`, this causes `llvm-config --system-libs` to
produce the following incorrect output:
[JumpThreading] Don't select an edge that we know we can't thread
In r312664 (D36404), JumpThreading stopped threading edges into
loop headers. Unfortunately, I observed a significant performance
regression as a result of this change. Upon further investigation,
the problematic pattern looked something like this (after
many high level optimizations):
  while (true) {
      bool cond = ...;
      if (!cond) {
          <body>
      }
      if (cond)
          break;
  }
Now, naturally we want jump threading to essentially eliminate the
second if check and hook up the edges appropriately. However, the
above-mentioned change prevented it from doing this because it would
have to thread an edge into the loop header.
Upon further investigation, what is happening is that since both branches
are threadable, JumpThreading picks one of them arbitrarily. In my
case, because of the way that the IR ended up, it tended to pick
the one to the loop header, bailing out immediately after. However,
if it had picked the one to the exit block, everything would have
worked out fine (because the only remaining branch would then be folded,
not threaded, which is acceptable).
Thus, to fix this problem, we can simply eliminate loop headers from
consideration as possible threading targets earlier, to make sure that
if there are multiple eligible branches, we can still thread one of
the ones that don't target a loop header.
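The selection change can be illustrated with a minimal sketch. It assumes a hypothetical `Candidate` record (successor block ID plus a "threadable" flag) and a set of loop-header IDs; neither is LLVM's actual JumpThreading API. Loop headers are filtered out before a candidate is chosen, so an eligible exit edge is never hidden behind an ineligible header edge.

```cpp
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for a branch successor that JumpThreading might thread to.
struct Candidate {
  int targetBlock;  // ID of the successor block
  bool threadable;  // whether the edge could be threaded at all
};

// Pick the first threadable edge whose target is NOT a loop header.
// Filtering headers up front means that when both edges are threadable,
// the edge into the loop header can no longer be picked first and cause
// the whole transform to bail out.
std::optional<int> pickEdgeToThread(const std::vector<Candidate> &cands,
                                    const std::set<int> &loopHeaders) {
  for (const Candidate &c : cands) {
    if (!c.threadable)
      continue;
    if (loopHeaders.count(c.targetBlock))
      continue;  // never thread into a loop header
    return c.targetBlock;
  }
  return std::nullopt;  // nothing eligible remains
}
```

With edges to blocks 1 (a loop header) and 2 (the exit) both threadable, this returns 2 regardless of their order in the list.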
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42260
------------------------------------------------------------------------
[InstCombine] fix crash due to ignored addrspacecast
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.
This caused julia issue https://github.com/JuliaLang/julia/issues/27055
[BlockPlacement] Disable block placement tail duplication in structured CFG.
Summary:
Tail duplication easily breaks the structure of CFG, e.g. duplicating on
a region entry. If the structure is intended to be preserved, then we
may want to configure tail duplication, or disable it for structured
CFG. Our benchmark results show that disabling it doesn't cause a performance
regression.
Notice that this currently affects AMDGPU backend. In the next patch, I
also plan to turn on requiresStructuredCFG for NVPTX.
[X86DomainReassignment] Don't compare stack-allocated values by address
Summary:
The Closure allocated in the main loop is allocated on the stack. However,
later in the code its address is taken (and used for comparisons). This
obviously doesn't work. In fact, the Closure will get the same stack address
during every loop iteration, rendering the check that intended to identify
Closure conflicts entirely ineffective. Fix this bug by giving every Closure
a unique ID and using that for comparison. Alternatively, we could
heap-allocate the closure object.
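A minimal sketch of the ID-based fix follows. The `Closure` class and `buildClosureIDs()` helper are hypothetical simplifications, not the pass's real code. The point is that identity comes from a counter that advances every iteration, not from the address of a loop-local object, whose stack slot may be reused each time around.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified Closure; the real pass stores far more state.
class Closure {
public:
  explicit Closure(unsigned id) : ID(id) {}
  unsigned getID() const { return ID; }
  // Compare closures by their stable ID, not by their (possibly reused) address.
  bool operator==(const Closure &Other) const { return ID == Other.ID; }

private:
  unsigned ID;
};

// Build one closure per loop iteration. A loop-local Closure may occupy the
// same stack slot every iteration, so &C is useless as an identity; the
// counter gives each closure a distinct ID instead.
std::vector<unsigned> buildClosureIDs(std::size_t count) {
  std::vector<unsigned> ids;
  unsigned nextID = 0;
  for (std::size_t i = 0; i < count; ++i) {
    Closure C(nextID++);
    ids.push_back(C.getID());
  }
  return ids;
}
```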
The Darwin build bot failed with:
```
llc -mcpu=skylake-avx512 -mtriple=x86_64-unknown-linux-gnu domain-reassignment-test.ll -o - | llvm-mc
--
Exit Code: 134
Command Output (stderr):
--
Assertion failed: (MAI->hasSingleParameterDotFile()), function EmitFileDirective, file lib/MC/MCAsmStreamer.cpp, line 1087.
```
Looks like this is because the `llvm-mc` command was missing a triple
directive and defaulting to MachO. Add the triple option.
------------------------------------------------------------------------
Summary:
We cannot simply delete IMPLICIT_DEF nodes. They may be used
later (e.g. by a PHI) and deleting them will cause later passes (e.g.
LiveVariables) to crash. However, it seems fine to ignore them for
purposes of the domain reassignment (as we do with PHI).
Craig Topper [Tue, 22 May 2018 04:22:44 +0000 (04:22 +0000)]
[X86] Add hasSideEffects=0 back to ADOX instructions. Partial cherrypick from r328952.
This flag was present before the cherrypick of 328945. This matches what happened on trunk. I've left out the scheduling changes from r328952 to minimize changes from 6.0.1.
Chandler Carruth [Tue, 22 May 2018 03:03:11 +0000 (03:03 +0000)]
Merge r330269 to fix egregiously bad code generation in the new EFLAGS lowering
that was deferred to a follow-up commit because I did not understand how part of
the x86 backend worked.
Chandler Carruth [Tue, 22 May 2018 02:46:36 +0000 (02:46 +0000)]
Merge r329657.
This is the main patch that introduced the new EFLAGS lowering infrastructure.
All the source merges were clean, but the tests required help to merge.
Specifically, I had to regenerate the CHECK lines in the tests (using the trunk
update_llc_test_checks.py script) because trunk has copy propagation that the
branch doesn't. I also had to update the MIR test to use the old MIR syntax for
physical registers (%name instead of $name).
Chandler Carruth [Mon, 21 May 2018 21:36:11 +0000 (21:36 +0000)]
Merge a series of test updates r329055-329057.
These required skipping the updates to the update test scripts. Note that to
regenerate these tests you'll need to use the test update script close to trunk
rather than on the branch. =/
Previously, the MIPS backend would always break down constant multiplications
into a series of shifts, adds, and subs. This patch changes that so the cost of
doing so is estimated.
The cost is estimated against worst case constant materialization and retrieving
the results from the HI/LO registers.
For cases where the value type of the multiplication is not legal, the cost of
legalization is estimated and is accounted for before performing the
optimization of breaking down the constant.
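The cost comparison can be sketched roughly as below. This is a deliberately naive model, not the MIPS backend's actual cost function: it assumes one shift per set bit plus one add per extra term (ignoring subtraction-based decompositions), and a hypothetical fixed worst-case cost for the hardware path of materializing a 32-bit constant (`lui` + `ori`), issuing `mult`, and reading the result back with `mflo`.

```cpp
#include <bitset>
#include <cstdint>

// Naive count of instructions needed to multiply by C using only shifts and
// adds: one shift per set bit (over-counting the bit-0 term) plus one add to
// combine each additional term.
unsigned shiftAddCost(std::uint32_t C) {
  unsigned Bits = static_cast<unsigned>(std::bitset<32>(C).count());
  if (Bits == 0)
    return 0;
  return Bits + (Bits - 1);
}

// Hypothetical worst-case cost of the hardware path:
// materialize the constant (lui + ori), mult, then mflo.
constexpr unsigned kMultPathCost = 2 + 1 + 1;

// Expand into shifts/adds only when that is no more expensive.
bool shouldExpandMul(std::uint32_t C) {
  return shiftAddCost(C) <= kMultPathCost;
}
```

Under this model, multiplying by 10 (two set bits) is expanded, while multiplying by 0xFFFF (sixteen set bits) keeps the `mult`.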
[mips] Fix 'l' constraint handling for types smaller than 32 bits
When the 'l' constraint is used correctly, LLVM now generates valid
code; otherwise, it shows an error message. Previously, these cases triggered an
assertion.
This commit is the same as r324869, but with the test's file name fixed.
InlineAsm is only uniqued if the FunctionTypes are exactly the
same, while cmpTypes() for example considers all pointer types
in the default address space to be the same. For this reason
the end of cmpInlineAsm() can be reached.
This patch replaces the unreachable assertion with a check that
the function types are not identical.
[MergeFunctions] Fix merging of small weak functions
When two interposable functions are merged, we cannot replace
uses and have to emit calls to a common internal function. However,
writeThunk() will not actually emit a thunk if the function is too
small. This leaves us in a broken state where mergeTwoFunctions
already rewired the functions, but writeThunk doesn't do anything.
This patch changes the implementation so that:
* writeThunk() does just that.
* The direct replacement of calls is moved into mergeTwoFunctions(),
in the non-interposable case only.
* isThunkProfitable() is extracted and will always be called for
the non-interposable case, and in the interposable case
only if uses are still left after replacement.
This issue has been introduced in https://reviews.llvm.org/D34806,
where the code for checking thunk profitability has been moved.
[AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported
We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled.
I've not added any tests, because the problem was visible in:
test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll,
which I had to change: I don't think Cyclone has FullFP16 enabled
by default, so it shouldn't be using this v8.2a instruction.
I've also removed these rdar tags; please shout if there are any objections.
I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.
[cmake] Don't build Native llvm-config when cross compiling if passed by user.
Summary:
Rename LLVM_CONFIG_EXE to LLVM_CONFIG_PATH, and avoid building it if
passed in by user. This is the same way CLANG_TABLEGEN and
LLVM_TABLEGEN are handled, e.g., when -DLLVM_OPTIMIZED_TABLEGEN=ON is
passed.
These are very simple flag-setting instructions that appear to be only a single uop. They're unlikely to need this separation.
------------------------------------------------------------------------
[x86] Expose more of the condition conversion routines in the public API
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
------------------------------------------------------------------------
Tom Stellard [Mon, 14 May 2018 17:17:14 +0000 (17:17 +0000)]
Merging r332176:
------------------------------------------------------------------------
r332176 | dim | 2018-05-12 12:59:54 -0700 (Sat, 12 May 2018) | 20 lines
Clear converters map after X86 Domain Reassignment to avoid crashes
Summary:
As reported in PR37264, in some cases the X86 Domain Reassignment
`runOnMachineFunction()` is called twice. Because it only deletes the
`.second` members of its `InstrConverterBaseMap`, and does not clean up
the map itself, this can lead to double frees and crashes.
Use `DeleteContainerSeconds()` instead, so the `Converters` map can
safely be reinitialized and its members re-deleted for each X86 Domain
Reassignment pass.
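The difference between "delete the values" and "delete the values and clear the map" can be sketched as follows. The helper below re-creates, under a hypothetical name, the delete-then-clear behavior the commit relies on; the `Converter` type and `runPassOnce()` are illustrative stand-ins, not the pass's real code.

```cpp
#include <map>

// Hypothetical converter type; counts live instances so that leaks or
// double deletes would be observable.
struct Converter {
  static int Live;
  Converter() { ++Live; }
  ~Converter() { --Live; }
};
int Converter::Live = 0;

// Delete every mapped value AND clear the map, so a second run starts from
// an empty map instead of holding dangling pointers from the first run.
template <typename MapT> void deleteContainerSecondsSketch(MapT &M) {
  for (auto &Entry : M)
    delete Entry.second;
  M.clear();
}

// Simulate one pass invocation that (re)populates the converter map.
void runPassOnce(std::map<unsigned, Converter *> &Converters) {
  Converters[1] = new Converter();
  Converters[2] = new Converter();
  // ... the pass would use the converters here ...
  deleteContainerSecondsSketch(Converters);
}
```

Calling `runPassOnce()` twice on the same map leaves it empty with no live converters, which is the property that makes repeated `runOnMachineFunction()` calls safe.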
[AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction. The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:
- an assert when lowering the LD1LANE ISD node, since it assumes a
constant operand
- an assert in isel if the lane index value depends on the
post-incremented base register
Both of these issues are avoided by simply checking that the lane index
is a constant.
[DivRemPairs] Fix non-determinism in use list order.
Summary:
Use a MapVector instead of a DenseMap for RemMap since it is iterated
over and the order of iteration can affect the order in which new
instructions are created. This can in turn affect the use list order of
div/rem input values if multiple new instructions are created that share
any input values.
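Why a MapVector fixes the non-determinism can be shown with a minimal insertion-ordered map in the same spirit: a hash map from key to index plus a vector that preserves insertion order. `OrderedMap` is a hypothetical illustration, not `llvm::MapVector` itself. Iterating the vector yields the same order on every run, whereas a hash-ordered map keyed by pointers can iterate in an order that depends on allocation addresses.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map: lookups go through the hash index,
// iteration goes through the vector, so iteration order == insertion order.
template <typename K, typename V> class OrderedMap {
public:
  V &operator[](const K &Key) {
    auto [It, Inserted] = Index.try_emplace(Key, Entries.size());
    if (Inserted)
      Entries.emplace_back(Key, V());
    return Entries[It->second].second;
  }
  auto begin() const { return Entries.begin(); }
  auto end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }

private:
  std::unordered_map<K, std::size_t> Index;
  std::vector<std::pair<K, V>> Entries;
};
```

Re-assigning an existing key updates its value in place without moving it, so keys inserted as 30, 10, 20 always iterate as 30, 10, 20.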
[X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto-generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.
This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a quirk of x86: shift amounts are always masked to 5 bits (except for 64-bit operands). So if the masked value is still out of bounds, the result will be 0.
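The resulting semantics for a left shift can be modeled in a few lines. This is an illustrative model of the hardware behavior described above, not the fast-isel code itself, and it covers SHL only (an arithmetic right shift would fill with the sign bit instead of producing 0).

```cpp
#include <cstdint>

// Model of an x86 8-bit SHL: the hardware masks the count to 5 bits, so only
// counts 0..31 matter, and any masked count of 8 or more shifts every bit
// out of the 8-bit operand, yielding 0.
std::uint8_t x86Shl8(std::uint8_t Value, unsigned Amount) {
  unsigned Masked = Amount & 0x1f;  // hardware count masking
  if (Masked >= 8)
    return 0;  // still out of bounds for an 8-bit operand
  return static_cast<std::uint8_t>(Value << Masked);
}
```

For example, a shift by 33 behaves like a shift by 1, while a shift by 9 produces 0.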
Duplicating this intrinsic is not generally valid because it has the side-effect
of decrementing the CTR. Any passes that duplicate it would need to be taught to
keep the regions formed completely disjoint.
This patch should be NFC for typical uses as CTRLoops runs after the remaining
loop passes. It only affects situations where the loop passes are scheduled on
the IR after the codegen passes (as is the case with some JIT pipelines).
[RuntimeDyld][PowerPC] Use global entry points for calls between sections.
Functions in different objects may use different TOCs, so calls between such
functions should use the global entry point of the callee which updates the
TOC pointer.
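The entry-point choice can be sketched as below, assuming the PowerPC64 ELFv2 layout in which the local entry point sits a small offset past the global entry point, skipping the prologue that sets up the TOC pointer (r2). `PPCFunctionSym` and `callTarget()` are hypothetical names, not RuntimeDyld's API, and the sketch simplifies by treating "same section" as "same TOC".

```cpp
#include <cstdint>

// Hypothetical description of a PowerPC64 ELFv2 function symbol.
struct PPCFunctionSym {
  std::uint64_t GlobalEntry;       // entry that (re)computes r2
  std::uint64_t LocalEntryOffset;  // distance to the entry that assumes r2 is set
  unsigned SectionID;
};

// Choose the call target. In this sketch, a call within one section assumes a
// shared TOC and uses the local entry; a call across sections may switch TOCs,
// so it goes through the global entry, which updates the TOC pointer.
std::uint64_t callTarget(unsigned CallerSectionID, const PPCFunctionSym &Callee) {
  if (CallerSectionID == Callee.SectionID)
    return Callee.GlobalEntry + Callee.LocalEntryOffset;
  return Callee.GlobalEntry;
}
```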
This should fix a bug that the Numba developers encountered (see
https://github.com/numba/numba/issues/2451).
Patch by Olexa Bilaniuk. Thanks Olexa!
No RuntimeDyld checker test case yet as I am not familiar enough with how
RuntimeDyldELF fixes up call-sites, but I do not want to hold up landing
this. I will continue to work on it and see if I can rope some powerpc
experts in.
------------------------------------------------------------------------
[RuntimeDyld][PowerPC] Add a test case for r329335.
Checks that calls to different sections go to the function's global entry point,
rather than the local one.
------------------------------------------------------------------------
Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
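The range limitation comes down to whether the distance from the FDE to its target fits in a signed 32-bit PC-relative field. A minimal check, using a hypothetical helper name, looks like this; when it fails, a wider absolute encoding is needed, which is what the patch permits for the large code models.

```cpp
#include <cstdint>
#include <limits>

// Returns true when Target is reachable from Site with a signed 32-bit
// PC-relative offset (roughly +/-2GB); false means a 32-bit pcrel value
// would be truncated.
bool fitsInPCRel32(std::uint64_t Site, std::uint64_t Target) {
  std::int64_t Delta = static_cast<std::int64_t>(Target - Site);
  return Delta >= std::numeric_limits<std::int32_t>::min() &&
         Delta <= std::numeric_limits<std::int32_t>::max();
}
```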
Patch based on one by Olexa Bilaniuk!
------------------------------------------------------------------------
[AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.
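The fix amounts to adding the offset register to the live set before choosing a scratch register. The sketch below uses hypothetical types and plain register numbers rather than the pass's real data structures.

```cpp
#include <optional>
#include <set>
#include <vector>

// Hypothetical model of a load's register operands: a base register plus an
// optional register offset (as in a register-offset addressing mode).
struct LoadOperands {
  unsigned BaseReg;
  std::optional<unsigned> OffsetReg;
};

// Pick a scratch register for the inserted MOV. The offset register is
// treated as live too; otherwise it could be chosen as scratch and its value
// clobbered before the load reads it.
std::optional<unsigned> pickScratch(const std::vector<unsigned> &Candidates,
                                    const LoadOperands &Load,
                                    std::set<unsigned> Live) {
  Live.insert(Load.BaseReg);
  if (Load.OffsetReg)
    Live.insert(*Load.OffsetReg);
  for (unsigned R : Candidates)
    if (!Live.count(R))
      return R;
  return std::nullopt;  // no safe scratch register available
}
```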
[PowerPC] Don't miscompile rotate+mask into an ANDIo if it can't recreate the immediate
I'm not even sure if this transform is ever worth it, but this at least
stops the bleeding.
------------------------------------------------------------------------
[PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLVM part of
-mindirect-jump=hazard. It is _not_ enabled by default for the P5600.
The mitigation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.
These instructions impede the execution of the instruction stream until
architecturally defined hazards (changes to the instruction stream,
privileged registers which may affect execution) are cleared. In MIPS'
designs, these instructions are not speculated past.
These instructions are used with the attribute +use-indirect-jump-hazard
when branching indirectly and for indirect function calls.
These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.
Performance benchmarking of this option with -fpic and lld using
-z hazardplt shows an overall time increase of roughly 10%
for the LLVM testsuite. Certain benchmarks such as methcall show a
substantially larger increase in time due to their nature.
The original author was Fedor Indutny <fedor@indutny.com>.
`musttail` calls can't be naively split. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.
Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.
Eli pointed out that variadic functions are totally a thing, so this
assert is incorrect.
No test-case is provided, since the only way this assert fires is if a
specific DenseMap falls back to doing `isEqual` checks, and that seems
fairly brittle (and requires a pyramid of growing
`call void (i8, ...) @varargs(i8 0)`).
[DebugInfo] Discard invalid DBG_VALUE instructions in LiveDebugVariables
Summary:
This is a workaround for pr36417
https://bugs.llvm.org/show_bug.cgi?id=36417
LiveDebugVariables will now verify that the DBG_VALUE instructions
are sane (prior to register allocation) by asking LIS if a virtual
register used in the DBG_VALUE is live (or dead def) in the slot
index before the DBG_VALUE. If it isn't sane the DBG_VALUE is
discarded.
One pass that was identified as introducing non-sane DBG_VALUE
instructions, when analysing pr36417, was the DAG->DAG Instruction
Selection. It sometimes inserts DBG_VALUE instructions referring to
a virtual register that is defined later in the same basic block.
So it is a use before def kind of problem. The DBG_VALUE is
typically inserted in the beginning of a basic block when this
happens. The problem can be seen in the test case
test/DebugInfo/X86/dbg-value-inlined-parameter.ll
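The discard rule can be sketched with a deliberately cruder condition than the real one. The actual check asks LiveIntervals whether the virtual register is live (or a dead def) at the slot index before the DBG_VALUE; the sketch below only checks that the register has a definition at an earlier index, using a hypothetical map of first-definition indices. `DbgValue` and `dropInvalidDbgValues()` are illustrative names.

```cpp
#include <map>
#include <vector>

// Hypothetical, simplified DBG_VALUE: the vreg it refers to and its slot index.
struct DbgValue {
  unsigned VReg;
  unsigned SlotIndex;
};

// Keep only DBG_VALUEs whose register is defined before them; a use before
// the def (as ISel sometimes produces) is discarded.
std::vector<DbgValue>
dropInvalidDbgValues(const std::vector<DbgValue> &Dbgs,
                     const std::map<unsigned, unsigned> &DefIndex) {
  std::vector<DbgValue> Kept;
  for (const DbgValue &D : Dbgs) {
    auto It = DefIndex.find(D.VReg);
    if (It != DefIndex.end() && It->second < D.SlotIndex)
      Kept.push_back(D);  // defined earlier: considered sane
    // otherwise: use before def (or no def at all), so drop it
  }
  return Kept;
}
```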
[MemorySSA] Consider callsite args for hashing and equality.
We use a `DenseMap<MemoryLocOrCall, MemlocStackInfo>` to keep track of
prior work when optimizing uses in MemorySSA. Because we weren't
accounting for callsite arguments in either the hash code or equality
tests for `MemoryLocOrCall`s, we optimized uses too aggressively in
some rare cases.
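The shape of the fix can be sketched with a hypothetical call key: both equality and the hash must cover the call's arguments, so two calls to the same function with different arguments never share a cached entry. `CallKey` and `CallKeyHash` are illustrative, not MemorySSA's `MemoryLocOrCall` or its `DenseMapInfo`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical key describing a call site: callee plus argument values.
struct CallKey {
  const void *Callee;
  std::vector<const void *> Args;

  // Equality must compare the arguments, not just the callee.
  bool operator==(const CallKey &Other) const {
    return Callee == Other.Callee && Args == Other.Args;
  }
};

// The hash must mix in the same fields that equality compares.
struct CallKeyHash {
  std::size_t operator()(const CallKey &K) const {
    std::size_t H = std::hash<const void *>()(K.Callee);
    for (const void *A : K.Args)
      H = H * 31 + std::hash<const void *>()(A);  // mix in every argument
    return H;
  }
};
```

Such a key can be used directly as `std::unordered_map<CallKey, Info, CallKeyHash>`, where `Info` stands for whatever per-call state is cached.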
PR35402 triggered this case. It byte-swaps and stores a 48-bit value, and the current STBRX optimization transforms this into an STBRX. Unfortunately, 48 bits is not a simple MVT: there is no PPC instruction that supports it, and LLVM cannot expand it automatically, so this caused a crash.
This patch detects the non-simple MVT and returns early.
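The early exit can be sketched as a width guard. The real patch tests whether the value type is a simple MVT; this sketch instead lists, as an assumption, the widths for which PowerPC has byte-reversed stores (`sthbrx`, `stwbrx`, `stdbrx`), which excludes 48 bits in the same way. The function name is hypothetical.

```cpp
// Returns true only for store widths that have a byte-reversed store
// instruction; anything else (e.g. a 48-bit value) makes the combine bail out.
bool isSupportedByteSwapStoreWidth(unsigned Bits) {
  switch (Bits) {
  case 16:
  case 32:
  case 64:
    return true;
  default:
    return false;
  }
}
```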
[GlobalOpt] don't change CC of musttail calle(e|r)
When a function has a musttail call, its CC is fixed to be equal to the
CC of the musttail callee. In such a case (and in the case of the musttail
callee), GlobalOpt should not change the CC to fastcc, as that would break
the invariant.
The PeepholeOptimizer would fail for vregs without a definition. If this
was caused by an undef operand, abort to keep the code simple (so we
don't need to add logic everywhere to replicate the undef flag).
[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode
This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode.
In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode,
adjustBBOffsetsAfter() does not calculate postOffset correctly because it
does not properly account for the padding that is required for the constant pool
that immediately follows the jump table branch instruction.
[GlobalsAA] Fix a pretty terrible bug that has been in GlobalsAA for
a long time.
The key thing is that we need to create value handles for every function
that we create a `FunctionInfo` object around. Without this, when that
function is deleted we can end up creating a new function that collides
with its address and look up a stale AA result. With that AA result we
can in turn miscompile code in ways that break.
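The failure mode and the fix can be sketched with a hypothetical address-keyed cache. If an object is freed and a new one reuses its address, a plain address lookup returns the stale entry; registering a deletion callback, which is the role LLVM's value handles play here, evicts the entry first. `Function` and `InfoCache` below are illustrative stand-ins, not LLVM classes, and the sketch assumes the cache outlives every registered function.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical object that notifies a listener when it is destroyed.
struct Function {
  std::function<void(Function *)> OnDelete;
  ~Function() {
    if (OnDelete)
      OnDelete(this);
  }
};

// Address-keyed cache that evicts entries when their key object dies, so a
// new object allocated at the same address cannot see a stale result.
class InfoCache {
public:
  void record(Function *F, std::string Info) {
    Cache[F] = std::move(Info);
    F->OnDelete = [this](Function *Dead) { Cache.erase(Dead); };
  }
  const std::string *lookup(Function *F) const {
    auto It = Cache.find(F);
    return It == Cache.end() ? nullptr : &It->second;
  }

private:
  std::map<Function *, std::string> Cache;
};
```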
This is seriously one of the most absurd miscompiles I've seen. It only
reproduced for us recently and only when building a very large server
with both ThinLTO and PGO.
A *HUGE* shout out to Wei Mi who tracked all of this down and came up
with this patch. I'm just landing it because I happened to still be at
a computer.
He or I can work on crafting a test case to hit this (now that we know
what to target) but it'll take a while, and we've been chasing this for
a long time and need it fixed Right Now.
------------------------------------------------------------------------
[X86] Fix a typo in Host.cpp that causes us to misidentify KNL, Silvermont, Goldmont and probably other CPUs for -march=native
I think most of the Intel Core CPUs and recent AMD CPUs are unaffected. All the CPUs that have a "subtype" should work. The ones that were broken are the ones that are a "type" with no subtypes.
Because of CVE-2018-6574, some compiler options and linker options are restricted to prevent arbitrary code execution.
https://github.com/golang/go/issues/23672
As a result of this change, building Go code that uses the LLVM Go bindings fails with the following compilation error:
go build llvm.org/llvm/bindings/go/llvm: invalid flag in #cgo LDFLAGS: -Wl,-headerpad_max_install_names
The llvm-go tool generates the cgo LDFLAGS directive from `llvm-config --ldflags`, whose output contains -Wl,option flags. However, -Wl,option is banned by default. To avoid this problem, set the $CGO_LDFLAGS_ALLOW environment variable to tell the compiler that these flags should be allowed.
Go 1.10 and Go 1.9.5 should accept these options by default; however, if you run into the error, it is useful to have this documented.
Patch by Ryuichi Hayashida
------------------------------------------------------------------------
[PowerPC] Do not produce invalid CTR loop with an FRem
An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.
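The eligibility rule can be sketched over a hypothetical opcode list. A CTR loop cannot contain a call, because CTR is not preserved across calls; since FRem is always lowered to a library call on PowerPC, its presence disqualifies the loop. The `Op` enum and `mayBeCTRLoop()` are illustrative, not the actual PPCCTRLoops code.

```cpp
#include <vector>

// Hypothetical opcode set for a loop body.
enum class Op { Add, Mul, Load, Store, FRem };

// FRem has no native PowerPC instruction and becomes a library call, which
// may clobber CTR, so any FRem in the body makes the loop ineligible.
bool mayBeCTRLoop(const std::vector<Op> &Body) {
  for (Op O : Body)
    if (O == Op::FRem)
      return false;
  return true;
}
```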
bitcode support change for fast flags compatibility
Summary: As discussed, each vendor needs a way to keep the old fast flags and the new fast flags in the auto-upgrade path of the IR upgrader. This revision addresses that issue.
[AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.
[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.
[LICM] update BlockColors after splitting predecessors
Update BlockColors after splitting predecessors. Do not allow splitting
an EHPad for sinking when BlockColors is not empty, so we can
simply assign the predecessor's color to the new block.
[Dominators] Always recalculate postdominators when update yields different roots
Summary:
This patch makes postdominators always recalculate the tree when an update changes the tree roots.
As @dmgreen noticed in [[ https://reviews.llvm.org/D41298 | D41298 ]], the previous implementation was not conservative enough, and it was possible to end up with a PostDomTree that was different from a freshly computed one.
The patch also compares postdominators with a freshly computed tree at the end of full verification to make sure we don't hit similar issues in the future.
This should ideally also be backported to 6.0 before the release, although I don't have any reports of this causing an observable error. It should be safe to do so even late in the release cycle, as the change only makes the current behavior more conservative.