The code wasn't yelling at the user when there's a reference
from a DIGlobalVariableExpression. Thanks to Adrian for the
reduced testcase. Fixes PR34672.
[x86] reduce 64-bit mask constant to 32-bits by right shifting
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.
From what I see, we're not likely to encounter this pattern in the
DAG because the upstream setcc combines from this don't (usually?)
produce this pattern. If we fix that, then this will become more
relevant. Since the cost of handling this case is just loosening
the predicate of the existing fold, we might as well do it now.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
[X86] [MC] fixed non optimal encoding of instruction memory operand (PR24038).
Fixed suboptimal encoding of instruction memory operand when assembler is used to select 32 bit fixup rather than 8 bit immediate for encoding memory offset value.
Differential Revision: https://reviews.llvm.org/D38117
[x86] swap order of srl (and X, C1), C2 when it saves size
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81.
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.
[InstCombine] Move the call to isSignBitCheck into getDemandedBitsLHSMask instead of calling it outside and passing its result through a flag. NFCI
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.
[InstCombine] Simplify check for RHS being a splat constant in foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.
Tim Shen [Fri, 22 Sep 2017 18:30:02 +0000 (18:30 +0000)]
[XRay] support conditional return on PPC.
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
[TargetTransformInfo] Handle intrinsic call in getInstructionLatency()
Usually an intrinsic is a simple target instruction, it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency().
Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Daniel Neilson [Fri, 22 Sep 2017 15:47:57 +0000 (15:47 +0000)]
[SCEV] Generalize folding of trunc(x)+n*trunc(y) into folding m*trunc(x)+n*trunc(y)
Summary:
A SCEV such as:
{%v2,+,((-1 * (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop>
can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr()
will not try to apply the simplification m*trunc(x)+n*trunc(y) -> trunc(trunc(m)*x+trunc(n)*y)
because it only keys off having a non-multiplied trunc as the first term in the simplification.
This patch generalizes this code to try to do a more generic fold of these trunc
expressions.
Artur Pilipenko [Fri, 22 Sep 2017 13:13:57 +0000 (13:13 +0000)]
Rework loop predication pass
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.
For now the transformation is limited to the following case:
* The loop has a single latch with either ult or slt icmp condition.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is ult.
See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.
This patch re-commits the patch that was pulled out due to a
problem it caused, but with a fix for the problem. The fix
was reviewed separately by Eric Christopher and Hal Finkel.
Yonghong Song [Fri, 22 Sep 2017 04:36:36 +0000 (04:36 +0000)]
bpf: initial 32-bit ALU encoding support in assembler
This patch adds instruction patterns for operations in BPF_ALU. After this,
assembler could recognize some 32-bit ALU statement. For example, those listed
int the unit test file.
Separate MOV patterns are unnecessary as MOV is ALU operation that could reuse
ALU encoding infrastructure, this patch removed those redundant patterns.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313961 91177308-0d34-0410-b5e6-96231b3b80d8
Yonghong Song [Fri, 22 Sep 2017 04:36:35 +0000 (04:36 +0000)]
bpf: add 32bit register set
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313960 91177308-0d34-0410-b5e6-96231b3b80d8
Yonghong Song [Fri, 22 Sep 2017 04:36:34 +0000 (04:36 +0000)]
bpf: refactor inst patterns with better inheritance
Arithmetic and jump instructions, load and store instructions are sharing
the same 8-bit code field encoding,
A better instruction pattern implemention could be the following inheritance
relationships, and each layer only encoding those fields which start to
diverse from that layer. This avoids some redundant code.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313959 91177308-0d34-0410-b5e6-96231b3b80d8
Yonghong Song [Fri, 22 Sep 2017 04:36:32 +0000 (04:36 +0000)]
bpf: refactor inst patterns with more mnemonics
Currently, eBPF backend is using some constant directly in instruction patterns,
This patch replace them with mnemonics and removed some unnecessary temparary
variables.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313958 91177308-0d34-0410-b5e6-96231b3b80d8
The previous SwiftCC support for AAPCS64 was partially correct. It
setup swiftself parameters in the proper register but failed to setup
swifterror in the correct register. This would break compilation of
swift code for non-Darwin AAPCS64 conforming environments.
Resubmit "[lit] Refactor out some more common lit configuration code."
There were two issues, one Python 3 specific related to Unicode,
and another which is that the tool substitution for lld no longer
rejected matches where a / preceded the tool name.
Enable the reuse of values computed in a previous loop iteration.
This patch adds a pass that removes the computation of provably redundant
expressions that have been computed earlier in a previous iteration. It
relies on the use of PHIs to identify loop carried dependences.
Kevin Enderby [Thu, 21 Sep 2017 21:45:02 +0000 (21:45 +0000)]
Fix a bug in llvm-objdump when disassembling using the wrong default CPU
in the second slice of a Mach-O universal file.
The code in llvm-objdump in in DisassembleMachO() was getting the default
CPU then incorrectly setting into the global variable used for the -mcpu option
if that was not set. This caused a second call to DisassembleMachO() to use
the wrong default CPU when disassembling the next slice in a Mach-O universal
file. And would result in bad disassembly and an error message about an
recognized processor for the target:
% llvm-objdump -d -m -arch all fat.macho-armv7s-arm64
fat.macho-armv7s-arm64 (architecture armv7s):
(__TEXT,__text) section
armv7:
0: 60 47 bx r12
fat.macho-armv7s-arm64 (architecture arm64):
'cortex-a7' is not a recognized processor for this target (ignoring processor)
'cortex-a7' is not a recognized processor for this target (ignoring processor)
(__TEXT,__text) section
___multc3:
0: .long 0x1e620810
[lit] Refactor out some more common lit configuration code.
debuginfo-tests has need to reuse a lot of common configuration
from clang and lld, and in general it seems like all of the
projects which are tightly coupled (e.g. lld, clang, llvm, lldb,
etc) can benefit from knowing about one other. For example,
lldb needs to know various things about how to run clang in its
test suite. Since there's a lot of common substitutions and
operations that need to be shared among projects, sinking this
up into LLVM makes sense.
In addition, this patch introduces a function add_tool_substitution
which handles all the dirty intricacies of matching tool names
which was previously copied around the various config files. This
is now a simple straightforward interface which is hard to mess
up.
[lit] Actually do normalize the case of files in the config map.
This has gone back and forth, but it seems this is necessary
after all. realpath is not sufficient because if you have a
file named 'C:\foo.txt', then both realpath('c:\foo.txt') and
realpath(C:\foo.txt') return the string that was passed to them
exactly as is, meaning the case of the drive-letter won't match.
The problem before was not that we were normalizing the case of
items going into the config map, but rather that we were
normalizing the case of something we needed to print. The value
that is used to key on the config map should never be printed.
[AArch64] Fix bug in store of vector 0 DAGCombine.
Summary:
Avoid using XZR/WZR directly as operands to split stores of zero
vectors. Doing so can lead to the XZR/WZR being used by an instruction
that doesn't allow it (e.g. add).
[dwarfdump] Add verbose output for .debug-line section
This patch adds dumping of line table instructions as well as the final
state at each specified pc value in verbose mode. This is essentially
the same as the default in Darwin's dwarfdump. Dumping the actual line
table opcodes can be particularly useful for something like debugging a
bad `.debug_line` section.
[DAGCombiner] Slightly simplify some code by using APInt::isMask() and countTrailingOnes instead of getting active bits and checking if all the bits below that make a mask.
At least for the 64-bit and less case, we should be able to determine if we even have a mask without counting any bits. This also removes the need to explicitly check for 0 active bits, isMask will return false for 0.
Re-land r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
int x[2];
useit(x);
}
void f() {
inlineme();
inlineme();
}
[SelectionDAG] Pick correct frame index in LowerArguments
Summary:
SelectionDAGISel::LowerArguments is associating arguments
with frame indices (FuncInfo->setArgumentFrameIndex). That
information is later on used by EmitFuncArgumentDbgValue to
create DBG_VALUE instructions that denotes that a variable
can be found on the stack.
I discovered that for our (big endian) out-of-tree target
the association created by SelectionDAGISel::LowerArguments
sometimes is wrong. I've seen this happen when a 64-bit value
is passed on the stack. The argument will occupy two stack
slots (frame index X, and frame index X+1). The fault is
that a call to setArgumentFrameIndex is associating the
64-bit argument with frame index X+1. The effect is that the
debug information (DBG_VALUE) will point at the least significant
part of the arguement on the stack. When printing the
argument in a debugger I will get the wrong value.
I managed to create a test case for PowerPC that seems to
show the same kind of problem.
The bugfix will look at the datalayout, taking endianness into
account when examining a BUILD_PAIR node, assuming that the
least significant part is in the first operand of the BUILD_PAIR.
For big endian targets we should use the frame index from
the second operand, as the most significant part will be stored
at the lower address (using the highest frame index).
[lit] Don't norm case when inserting into the config map.
This makes all paths lowercase on Windows, which seemed like a
good idea at the time, but it means that tests can't properly
use FileCheck to match expected path names.
Config map is not exposed through the command line, so testing this
is somewhat tricky. But basically we need a test that if a custom
driver builds a config map and passes it to main, it gets respected.
A config map allows config files in the source tree to be mapped
to alternate config files in the build tree. This particular test
works by having two config files in separate directories, and
setting up a config map to have that redirects A/lit.site.cfg
to B/altconfig. Then, we print a message in A/lit.site.cfg
and B/altconfig and check that we do see the output from B
but don't see the output from A. Additionally we test that
the test suite specified by A's config map is properly discovered.
[Power9] Spill gprs to vector registers rather than stack
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Simon Atanasyan [Thu, 21 Sep 2017 14:04:53 +0000 (14:04 +0000)]
[mips] Implement generation of relocations "chains" used by N32 ABI
In case of using a "nested" relocation expressions like this
`%hi(%neg(%gp_rel()))`, N32 ABI requires generation of three consecutive
relocations. That differs from the N64 ABI case where all relocations
are packed into the single relocation record.
Simon Atanasyan [Thu, 21 Sep 2017 14:04:47 +0000 (14:04 +0000)]
[mips] Do not pass redundant IsN64 flag to MCELFObjectTargetWriter. NFC
Now we pass the 'Is64_' flag to the MCELFObjectTargetWriter ctor iif
when we make deal with N64 ABI. So it is redundant to pass additional
'IsN64' flag.