To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
[PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.
[PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.
The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.
[PowerPC] Don't assume ADDISdtprelHA's source is r3
Even through ADDISdtprelHA generally has r3 as its source register, it is
possible for the instruction scheduler to move things around such that some
other register is the source. We need to print the actual source register, not
always r3. Fixes PR24394.
The test case will come in a follow-up commit because it depends on MIR
target-flags parsing.
[PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
[mips] Check the register class before replacing materializations of zero with $zero in microMIPS.
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:
[d]addiu, $dst, $zero, 0
with the $zero register, without checking for membership in the register
class of the target machine operand.
[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
[bpf] Do not expand UNDEF SDNode during insn selection lowering
o Before this patch, BPF backend will expand UNDEF node
to i64 constant 0.
o For second pass of dag combiner, legalizer will run through
each to-be-processed dag node.
o If any new SDNode is generated and has an undef operand,
dag combiner will put undef node, newly-generated constant-0 node,
and any node which uses these nodes in the working list.
o During this process, it is possible undef operand is
generated again, and this will form an infinite loop
for dag combiner pass2.
o This patch allows UNDEF to be a legal type.
Signed-off-by: Yonghong Song <yhs@plumgrid.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
------------------------------------------------------------------------
[bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
int pid;
char name[16];
};
extern void test1(char *);
int test() {
struct key_t key = {};
test1(key.name);
return 0;
}
For key.name, the llc/bpf may generate the below code:
R1 = R10 // R10 is the frame pointer
R1 += -24 // framepointer adjustment
R1 |= 4 // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.
This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
R1 = R10
R1 += -20
Patch by Yonghong Song <yhs@plumgrid.com>
------------------------------------------------------------------------
Fix vector splitting for extract_vector_elt and vector elements of <8-bits.
Summary:
One of the vector splitting paths for extract_vector_elt tries to lower:
define i1 @via_stack_bug(i8 signext %idx) {
%1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx
ret i1 %1
}
to:
define i1 @via_stack_bug(i8 signext %idx) {
%base = alloca <2 x i1>
store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base
%2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx
%3 = load i1, i1* %2
ret i1 %3
}
However, the elements of <2 x i1> are not byte-addressible. The result of this
is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies
to '%base + %idx * 0', and then simply '%base' causing all values of %idx to
extract element zero.
This commit fixes this by promoting the vector elements of <8-bits to i8 before
splitting the vector.
Fix embarrassing bugs I introduced to the `SlotTracker` in or around
r235785. I had us iterating through every instruction in a function
(and hitting a map in the LLVMContext) for every basic block in the
function.
While there, completely avoid the call to
`SlotTracker::processFunctionMetadata()` from
`SlotTracker::processFunction()` if we've speculatively done this
already in `SlotTracker::processModule()` by checking
`ShouldInitializeAllMetadata` (this wasn't an algorithmic problem, but
it's touching the same line of code).
Michael Wong [Mon, 24 Aug 2015 14:50:26 +0000 (14:50 +0000)]
Update CREDITS.TXT with Clang OpenMP implementation + test suite contributors from AMD, Argonne National Lab., IBM, Intel, Texas Instruments, University of Houston and many others.
Renato Golin [Thu, 20 Aug 2015 16:41:22 +0000 (16:41 +0000)]
Merge r245577 into branch_37
[ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's
just possible that a SETCC has a legal result type but an illegal operand
type. If this happens, bail out before we create unselectable nodes.
Fixes PR24292. I tried to create a testcase but in 99% of cases we can't
trigger this - not surprising that this bug has been latent since 2009.
[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
------------------------------------------------------------------------
Hans Wennborg [Thu, 20 Aug 2015 15:56:38 +0000 (15:56 +0000)]
Merging r245365 and r245369:
------------------------------------------------------------------------
r245365 | majnemer | 2015-08-18 15:07:25 -0700 (Tue, 18 Aug 2015) | 4 lines
[InstSimplify] Don't assume getAggregateElement will succeed
It isn't always possible to get a value from getAggregateElement.
This fixes PR24488.
------------------------------------------------------------------------
Renato Golin [Thu, 20 Aug 2015 15:05:48 +0000 (15:05 +0000)]
Revert "[SimplifyCFG] Be more aggressive" on branch_37
This reverts commit r229099 in branch 37 only, because it caused PR24292.
I'll continue investigating and will fix on trunk, but being an optimization
change, we can let the rest of the release go without this one.
Prevent the scalarizer from caching incorrect entries
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
This change adds RTTI and Exception flags to llvm-config's cxxflags. This solution is a minimal patch to solve the issue, and is recommended for the 3.7 release branch. Tom Stellard's outstanding work is the longer term solution.
Patch By: David Wiberg
------------------------------------------------------------------------
Daniel Sanders [Fri, 14 Aug 2015 10:18:40 +0000 (10:18 +0000)]
[mips] Drop the claim that ubsan works since r243384 and r244646 are not yet merged.
The timezone difference between myself, the code-owner, and release manager means
it's sensible to update the release notes on the assumption that they won't be
merged. If we do merge them, then we can revert this release notes change.
[SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
{ <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.
This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.
However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.
This patch fixes the crash by implementing CanLowerReturn. As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant. Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.
------------------------------------------------------------------------
[PHITransAddr] Don't assume that instruction operands are translatable
We can only PHI translate instructions. In our attempt to PHI translate
a bitcast, we attempt to translate its operand; however, the operand
might be an argument or a global instead of an instruction. Benignly
bail out when this happens.
Kai Nacke [Thu, 6 Aug 2015 19:44:38 +0000 (19:44 +0000)]
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 96461e5..f669a1f 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -206,6 +206,21 @@ Jade project is hosted as part of the Open RVC-CAL Compiler
(`Orcc <http://orcc.sf.net>`_) and requires it to translate the RVC-CAL standard
library of video coding tools into an LLVM assembly code.
LDC - the LLVM-based D compiler
-------------------------------
`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
pragmatically combines efficiency, control, and modeling power, with safety and
programmer productivity. D supports powerful concepts like Compile-Time Function
Execution (CTFE) and Template Meta-Programming, provides an innovative approach
to concurrency and offers many classical paradigms.
`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
combined with LLVM as backend to produce efficient native code. LDC targets
x86/x86_64 systems like Linux, OS X and Windows and also PowerPC (32/64 bit).
Ports to other architectures like ARM, AArch64 and MIPS64 are underway.
Hans Wennborg [Thu, 6 Aug 2015 16:08:01 +0000 (16:08 +0000)]
Merging r244123:
------------------------------------------------------------------------
r244123 | pete | 2015-08-05 13:55:53 -0700 (Wed, 05 Aug 2015) | 7 lines
Update GettingStarted docs list of LLVM_TARGETS_TO_BUILD to match cmake.
Since the docs were written, we've added the BPF backend to the list.
Updating the docs to take this in to account. Also sorted them to
match cmake while I was changing these lines.
Reviewed by Chris B.
------------------------------------------------------------------------
Hans Wennborg [Thu, 6 Aug 2015 16:02:17 +0000 (16:02 +0000)]
Merging r243927, r243932, and r243934:
------------------------------------------------------------------------
r243927 | chandlerc | 2015-08-03 17:44:07 -0700 (Mon, 03 Aug 2015) | 11 lines
[UB] Fix a nasty place where we would pass null pointers to memcpy.
This happens to work, but is not guaranteed to work. Indeed, most memcpy
interfaces in Linux-land annotate these arguments as nonnull, and GCC
and LLVM both can and do optimized based upon that. When they do so,
they might legitimately have miscompiled code calling this routine with
two valid iterators, 'nullptr' and 'nullptr'. There was even code doing
precisely this because StringRef().begin() and StringRef().end() both
produce null pointers.
This was found by UBSan.
------------------------------------------------------------------------
While checking for the existence of the clang-tools-extra directory,
the script was not checking for its destination name, "extra", and
the script was failing when re-running without checking out new
sources.
------------------------------------------------------------------------
Revert r229675 - [mips] Avoid redundant sign extension of the result of binary bitwise instructions.
It introduced two regressions on 64-bit big-endian targets running under N32
(MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4, and
MultiSource/Applications/kimwitu++/kc) The issue is that on 64-bit targets
comparisons such as BEQ compare the whole GPR64 but incorrectly tell the
instruction selector that they operate on GPR32's. This leads to the
elimination of i32->i64 extensions that are actually required by
comparisons to work correctly.
There's currently a patch under review that fixes this problem.
------------------------------------------------------------------------
[mips][FastISel] Disable code generation for unsupported targets through FastISel.
Summary:
Previously, we would check whether the target is supported or not, only in
fastSelectInstruction(). This means that 64-bit targets could use FastISel too.
We fix this by checking every overridden method of the FastISel class and
by falling back to SelectionDAG if the target isn't supported. This change
should have been committed along with r243638, but somehow I missed it.
[regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards.
Summary:
This prevents vreg260 and D7 from being merged in:
%vreg260<def> = LDC1 ...
JAL <ga:@sin>, <regmask ... list not containing D7 ...>
%D7<def> = COPY %vreg260; ...
Doing so is not valid because the JAL clobbers the D7.
This fixes the almabench regression in the LLVM 3.7.0 release branch.
Reviewers: MatzeB
Subscribers: MatzeB, qcolombet, hans, llvm-commits
fix crash in machine trace metrics due to processing dbg_value instructions (PR24199)
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
[MCJIT] Fix PR20656 by teaching MCJIT to honor ExecutionEngine's global mapping.
This is important for users of the C API who can't supply custom symbol
resolvers yet.
------------------------------------------------------------------------
Summary:
This hidden option would disable code generation through FastISel by
default. It was removed from the available options and from the
Fast-ISel tests that required it in order to run the tests.
[mips] Fix out-of-date debug information in test file.
Update the debug info in the check-lines because the change in r243638
introduced a constant initialization before the prologue's end as part
of a register spill.
------------------------------------------------------------------------
[mips][FastISel] Apply only zero-extension to constants prior to their materialization.
Summary:
Previously, we would sign-extend non-boolean negative constants and
zero-extend otherwise. This was problematic for PHI instructions with
negative values that had a type with bitwidth less than that of the
register used for materialization.
More specifically, ComputePHILiveOutRegInfo() assumes the constants
present in a PHI node are zero extended in their container and
afterwards deduces the known bits.
For example, previously we would materialize an i16 -4 with the
following instruction:
addiu $r, $zero, -4
The register would end-up with the 32-bit 2's complement representation
of -4. However, ComputePHILiveOutRegInfo() would generate a constant
with the upper 16-bits set to zero. The SelectionDAG builder would use
that information to generate an AssertZero node that would remove any
subsequent trunc & zero_extend nodes.
In theory, we should modify ComputePHILiveOutRegInfo() to consult
target-specific hooks about the way they prefer to materialize the
given constants. However, git-blame reports that this specific code
has not been touched since 2011 and it seems to be working well for every
target so far.
[mips][FastISel] Fix call lowering by bailing out on "fastcc" calls.
Summary:
Currently, we support only the MIPS O32 ABI calling convention for call
lowering. With this change we avoid using the O32 calling convetion for
lowering calls marked as using the fast calling convention.
[mips][FastISel] Fix generated code for IR's select instruction.
Summary:
Generate correct code for the select instruction by zero-extending
it's boolean/condition operand to GPR-width. This is necessary because
the conditional-move instructions operate on the whole register.
test-release.sh: Add option for building the OpenMP run-time
This isn't part of the official release process, but provides a convenient way
to build binaries for those who want to experiment with it. Hopefully the run-
time can be part of the regular build and release process for 3.8.
[PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register. The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary. This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
Patch and test case by Tyler Kenney, cleaned up a bit by me.
This is a simple bug fix that would be good to incorporate into 3.7.
Hans Wennborg [Wed, 29 Jul 2015 15:38:37 +0000 (15:38 +0000)]
Merging r243500: (conflicts resolved manually since the branch doesn't have r243293)
------------------------------------------------------------------------
r243500 | spatel | 2015-07-28 16:28:22 -0700 (Tue, 28 Jul 2015) | 16 lines
ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141)
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.
After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.
The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.
fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
this should be applicable for llvm 3.7 as well.
------------------------------------------------------------------------
This makes the script run to the end and produce tarballs even on test
failures, and then highlights any errors afterwards.
(I first tried just storing the errors in a global variable, but that
didn't work as the "test_llvmCore" function invocation is actually
running as a sub-shell.)
Prior to CMAKE 2.8.4 that was covered by the WIN32 conditional but
from 2.8.4 CMAKE no longer defined WIN32 when running under Cygwin
and it needs its own test.