I'm trying to refactor some shared code for integer div/rem,
but I keep having to scroll through fdiv. The FP ops have
nothing in common with the integer ops, so I'm moving FP
below everything else.
While here, improve a couple of comments and fix some formatting.
Gadi Haber [Mon, 11 Sep 2017 11:26:20 +0000 (11:26 +0000)]
[X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag
NFC.
Updated 3 CodeGen regression tests to use the -mattr flag instead of the -mcpu flag, as follows:
Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
Instead of -mcpu=knl use -mattr=+avx512f
This is a preparatory step for D34515, and is also being recommitted as its
first version caused PR34045.
This change:
- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
using (_, C) ← (ARMISD::ADDC R, -1), then converting it back to an integer value
using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY's second result is actually a
borrow, we need to invert the value of the second operand and result before
and after using ARMISD::SUBE. We need to invert the carry result of
ARMISD::SUBE to preserve the semantics (see the sketch after this list).
- given that the generic combiner may lower ISD::ADDCARRY and
ISD::SUBCARRY into ISD::UADDO and ISD::USUBO, we need to update their lowering
as well; otherwise i64 operations would now require branches. This implies
updating the corresponding test for unsigned.
- add a new combiner to remove the redundant conversions from/to carry flags
to/from boolean values: (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
- fixes PR34045
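A standalone sketch of the carry semantics involved, in plain C++ rather than the actual ISel code (function names are illustrative only):

#include <cstdint>
#include <utility>

// (Result, CarryOut) of an ISD::ADDCARRY-style i32 addition.
std::pair<uint32_t, bool> addcarry(uint32_t A, uint32_t B, bool CarryIn) {
  uint64_t Wide = uint64_t(A) + uint64_t(B) + (CarryIn ? 1 : 0);
  return {uint32_t(Wide), (Wide >> 32) != 0};
}

// ISD::SUBCARRY's second result is a borrow, while ARM's carry flag is an
// inverted borrow for subtraction, hence the inversions around ARMISD::SUBE.
std::pair<uint32_t, bool> subcarry(uint32_t A, uint32_t B, bool BorrowIn) {
  uint64_t RHS = uint64_t(B) + (BorrowIn ? 1 : 0);
  return {uint32_t(uint64_t(A) - RHS), uint64_t(A) < RHS};
}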
Fixed a bug in splitting the Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined: Lo goes before Hi. Otherwise the semantics of Scatter (from LSB to MSB) are broken.
I'm chaining the 2 nodes to prevent reordering.
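For reference, a scalar model of the ordering requirement (illustrative only, not the legalizer code):

// Scatter applies elements from LSB to MSB, so on an index collision the
// higher (Hi) element must overwrite the lower (Lo) one; chaining Hi after
// Lo preserves exactly this order.
void scatterReference(int *Base, const int *Idx, const int *Val, int N) {
  for (int I = 0; I < N; ++I) // first half = Lo, second half = Hi
    Base[Idx[I]] = Val[I];
}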
[InstSimplify] refactor udiv/urem code and add tests; NFCI
This removes some duplicated code and makes it easier to support signed div/rem
in a similar way if we want to do that. Note that the existing comments were not
accurate - we don't need a constant divisor to simplify; icmp simplification does
more than that. But as the added tests show, it could go even further.
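One example of a fold that needs no constant divisor (a sketch, not one of the added tests):

// (X % Y) / Y simplifies to 0 for any Y: the remainder is always u< Y
// (and division by zero is poison anyway), which icmp-style reasoning
// can establish without a constant divisor.
unsigned foldToZero(unsigned X, unsigned Y) {
  return (X % Y) / Y; // simplifies to 0
}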
Merge isKnownNonNull into isKnownNonZero
It now knows the tricks of both functions.
Also, fix a bug that considered allocas in a non-zero address space to always be non-null.
MinSeong Kim [Sat, 9 Sep 2017 14:17:52 +0000 (14:17 +0000)]
[CMake] Update GetSVN.cmake to handle repo
Summary:
When repo is used with git, 'clang --version' does not display
the correct revision information (i.e. the git hash on TOP), as the following shows:
clang version 6.0.0 --->
clang version 6.0.0 (clang version) (llvm version)
This is because repo also creates a .git/svn folder, as git-svn does, and
this makes repo with git use the "git svn info" command, which is only for
git-svn, to retrieve its revision information, yielding null info.
To correctly distinguish between git-svn and repo with git, the folder
hierarchy to specify for git-svn should be .git/svn/refs, as the "git svn
info" command depends on the revision data in .git/svn/refs. This patch
in turn makes repo with git pass through to the third macro,
get_source_info_git, in the get_source_info function, correctly
retrieving the revision information for repo with git using the "git log ..."
command.
This patch is tested with git, svn, git-svn, and repo with git.
[DivRemPairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
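The shape of code the pass targets, as a small sketch (hypothetical example, not taken from PR31028):

// A quotient and remainder of the same operands: hardware often produces
// both in one instruction, and the remainder can also be decomposed as
// A - (A / B) * B to reuse the quotient instead of dividing twice.
void divrem(int A, int B, int *Quot, int *Rem) {
  *Quot = A / B;
  *Rem = A - (A / B) * B; // equivalent to A % B
}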
[X86] Call removeDeadNode when we're done doing custom isel for mul, div and test
Summary:
Once we've done our custom isel for these nodes, I think we should be calling removeDeadNode to prune them out of the DAG. Table-driven isel ultimately either calls morphNodeTo, which modifies a node in place and doesn't leave dead nodes, or it emits new nodes and then calls removeDeadNode as part of Opc_CompleteMatch.
If you run a simple multiply test case like this through llc with -debug, you'll see a umul_lohi node printed as part of the dump for 'Instruction Selection ends'.
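A hypothetical example of such a multiply test case (not the one referenced above; assumes a 64-bit target with __uint128_t support):

#include <cstdint>

// Returning the high half of a full 64x64 product goes through
// ISD::UMUL_LOHI; running this through llc with -debug shows the node
// in the DAG dumps.
uint64_t mulHigh(uint64_t A, uint64_t B) {
  return uint64_t(((__uint128_t)A * B) >> 64);
}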
- Use range based for
- Variable names should start with upper case
- Add `const`
- Change class name to match filename
- Fix doxygen comments
- Use MCPhysReg instead of unsigned
- Use references instead of pointers where things cannot be nullptr
- Misc coding style improvements
PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered, these transform to r(1|31) + slot + offset.
If the slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16-byte or greater alignment in order to select lxv/stxv for stack slots.
Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.
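The arithmetic behind the failure, as a sketch:

// lxv/stxv use a DQ-form displacement: the effective address must be
// 0 % 16. If the slot is only 8-byte aligned, e.g. SlotAddress % 16 == 8,
// then even with Offset % 16 == 0 we get (SlotAddress + Offset) % 16 == 8,
// which the instructions cannot encode.
bool isDQFormEncodable(long long SlotAddress, long long Offset) {
  return (SlotAddress + Offset) % 16 == 0;
}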
[TargetTransformInfo] Remove the extra "default" in a switch whose enum values have all been covered.
In function TargetTransformInfo::getInstructionCost, all enum values in the switch statement have been covered, so the default is unnecessary and may cause an error with -Werror,-Wcovered-switch-default; remove it.
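A minimal sketch of the pattern (reusing the TargetCostKind enum from the patch below):

#include <cstdlib>

enum TargetCostKind { TCK_RecipThroughput, TCK_Latency, TCK_CodeSize };

int pick(TargetCostKind Kind) {
  switch (Kind) {
  case TCK_RecipThroughput: return 0;
  case TCK_Latency:         return 1;
  case TCK_CodeSize:        return 2;
  }
  // Every enumerator is handled above; a default label here would trigger
  // -Wcovered-switch-default. abort() quiets "control reaches end of
  // non-void function" without adding a covered default.
  abort();
}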
Yonghong Song [Fri, 8 Sep 2017 23:32:38 +0000 (23:32 +0000)]
bpf: proper print imm64 expression in inst printer
Fixed an issue in printImm64Operand so that if the value is
an expression, the expression is printed properly. Previously,
it would print
r1 = <MCOperand Expr:(tx_port)>ll
With the patch, the printout will be
r1 = tx_port
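A hedged sketch of the fix's logic against the MC layer (the real BPF printer may differ in details):

#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

static void printImm64Operand(const MCOperand &Op, raw_ostream &O) {
  if (Op.isImm())
    O << Op.getImm();                // plain immediate: print the value
  else if (Op.isExpr())
    Op.getExpr()->print(O, nullptr); // print "tx_port", not
                                     // "<MCOperand Expr:(tx_port)>"
}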
Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[TargetTransformInfo] Add a new public interface getInstructionCost
The current TargetTransformInfo supports a throughput cost model and a code size model, but sometimes we also need an instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different costs of an instruction, so I proposed the following interface:
enum TargetCostKind {
  TCK_RecipThroughput, ///< Reciprocal throughput.
  TCK_Latency,         ///< The latency of instruction.
  TCK_CodeSize         ///< Instruction code size.
};
int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;
All clients should mainly use this function to query the cost of an instruction; the <kind> parameter specifies the desired cost model.
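For example, a client could sum a block's latency like this (a sketch against the proposed interface; blockLatency is hypothetical):

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

int blockLatency(const BasicBlock &BB, const TargetTransformInfo &TTI) {
  int Total = 0;
  for (const Instruction &I : BB) // query the latency model per instruction
    Total += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_Latency);
  return Total;
}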
This patch also provides a simple default implementation of getInstructionLatency.
The default getInstructionLatency provides latency numbers for only a small number of instruction classes, and those latency numbers are only reasonable for modern out-of-order processors. It can be extended in the following ways:
- Add more detail to this function.
- Add a getXXXLatency function and call it from here.
- Implement a target-specific getInstructionLatency function.
Petr Hosek [Fri, 8 Sep 2017 22:26:50 +0000 (22:26 +0000)]
[CMake][runtimes] Use the same configuration for non-target and "default" target
The default host target for builtins and runtimes has special behavior
on some platforms, e.g. on Linux both i386 and x86_64 targets are being
built. Specifying "default" as a target name should lead to the same
behavior, which wasn't the case in the past. This patch unifies the
configuration between the non-target and "default" target to produce the
same behavior by moving the default configuration into a function that
can be used from both paths.
On a Windows bot, I see a FileCheck error where the source being matched
over no longer exists, i.e. it seems like it's FileCheck'ing some stale
output:
Matt Arsenault [Fri, 8 Sep 2017 19:09:13 +0000 (19:09 +0000)]
AMDGPU: Start using !con operator
We have a lot of operand definition work essentially producing
every valid permutation of operands to work around building
operand lists based on the instruction features. Apparently tablegen
already has a mostly undocumented operator to concatenate dags, which
simplifies this.
Convert one simple place to use this. The BUF instruction definitions
have much more complicated logic that can be totally rewritten now.
[llvm-cov] Disable name-compression in a test binary
This should fix the lld bot:
The Buildbot has detected a new failure on builder llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast while building cfe.
Full details are available at:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/16993
A coverage segment contains a starting line and column, an execution
count, and some other metadata. Clients of the coverage library use
segments to prepare line-oriented reports.
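Roughly, a segment carries the following (field names assumed from the description above; see CoverageMapping.h for the real definition):

#include <cstdint>

struct CoverageSegment {
  unsigned Line;      // starting line
  unsigned Col;       // starting column
  uint64_t Count;     // execution count
  bool HasCount;      // whether Count is meaningful at this point
  bool IsRegionEntry; // metadata: does a region begin here?
};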
Users of the coverage library depend on segments being unique and sorted
in source order. Currently this is not guaranteed (this is why the clang
change which introduced deferred regions was reverted).
This commit documents the "unique and sorted" condition and asserts that
it holds. It also fixes the SegmentBuilder so that it produces correct
output in some edge cases.
Testing: I've added unit tests for some edge cases. I've also checked
that the new SegmentBuilder implementation is fully covered. Apart from
running check-profile and the llvm-cov tests, I've successfully used a
stage1 llvm-cov to prepare a coverage report for an instrumented clang
binary.
[Coverage] Report errors when reading malformed source regions
Each source region has a start and end location. Report an error when
the end location precedes the start location.
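A sketch of the validity check (names hypothetical):

struct RegionLoc { unsigned Line, Col; };

// A region is malformed if its end comes before its start.
bool isMalformedRegion(RegionLoc Start, RegionLoc End) {
  return End.Line < Start.Line ||
         (End.Line == Start.Line && End.Col < Start.Col);
}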
The old lineExecutionCounts.covmapping test actually had a buggy source
region in it. This commit introduces a regenerated copy of the coverage
and moves the old copy to malformedRegions.covmapping, for a test.
[llvm-cov] Unify region marker placement between text/html modes
Make sure that the text and html emitters always emit the same set of
region markers, and avoid emitting redundant markers for line segments
which don't end on the line they start on.
Wei Mi [Fri, 8 Sep 2017 16:44:52 +0000 (16:44 +0000)]
Fix a bug for rL312641.
rL312641 allowed llvm.memcpy/memset/memmove to be tail calls when the parent
function returns the intrinsic's first argument. However, on the arm-none-eabi
platform, llvm.memcpy will be expanded to __aeabi_memcpy, which doesn't
have a return value. The fix is to check the libcall name after expansion
and match "memcpy/memset/memmove" before allowing those intrinsics to be
tail calls.
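The essence of the check, sketched (the helper name is hypothetical):

#include <cstring>

// Only the exact C library entry points return their first argument;
// __aeabi_memcpy and friends return void, so tail-calling them would
// drop the value the caller is expected to return.
bool libcallMayBeTailCall(const char *Name) {
  return std::strcmp(Name, "memcpy") == 0 ||
         std::strcmp(Name, "memset") == 0 ||
         std::strcmp(Name, "memmove") == 0;
}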
The SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. This patch adds support for horizontal min/max reductions.
Function getReductionCost() is split into getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
The patch fixes PR26956.
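The kind of straight-line pattern this enables, sketched in C++:

// A horizontal min over 4 elements written as a reduction tree; SLP can
// now vectorize this and cost it via getMinMaxReductionCost().
int hmin4(const int *A) {
  int M01 = A[0] < A[1] ? A[0] : A[1];
  int M23 = A[2] < A[3] ? A[2] : A[3];
  return M01 < M23 ? M01 : M23;
}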
This patch adds prologue verification, which is already present in
Apple's dwarfdump. It checks for invalid directory indices and warns
about duplicate file paths.
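A sketch of the two checks (illustrative, not dwarfdump's actual code):

#include <set>
#include <string>
#include <utility>
#include <vector>

// FileEntry: (name, 1-based index into the prologue's include directories,
// where 0 means the compilation directory).
using FileEntry = std::pair<std::string, unsigned>;

bool verifyPrologue(const std::vector<std::string> &IncludeDirs,
                    const std::vector<FileEntry> &Files) {
  bool Valid = true;
  std::set<std::string> SeenNames;
  for (const FileEntry &F : Files) {
    if (F.second > IncludeDirs.size())
      Valid = false; // invalid directory index
    if (!SeenNames.insert(F.first).second)
      Valid = false; // duplicate file path: warn
  }
  return Valid;
}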
- // `ICI` can either be a comparison against IV or a comparison of IV.next.
- // Depending on the interpretation, we calculate the start value differently.
+ // `ICI` is interpreted as taking the backedge if the *next* value of the
+ // induction variable satisfies some constraint.
- // Pair {IndVarBase; IsIndVarNext} semantically designates whether the latch
- // comparisons happens against the IV before or after its value is
- // incremented. Two valid combinations for them are:
- //
- // 1) { phi [ iv.start, preheader ], [ iv.next, latch ]; false },
- // 2) { iv.next; true }.
- //
- // The latch comparison happens against IndVarBase which can be either current
- // or next value of the induction variable.
const SCEVAddRecExpr *IndVarBase = cast<SCEVAddRecExpr>(LeftSCEV);
bool IsIncreasing = false;
bool IsSignedPredicate = true;
- bool IsIndVarNext = false;
ConstantInt *StepCI;
if (!IsInductionVar(IndVarBase, IsIncreasing, StepCI)) {
FailureReason = "LHS in icmp not induction variable";
return None;
}
- const SCEV *IndVarStart = nullptr;
- // TODO: Currently we only handle comparison against IV, but we can extend
- // this analysis to be able to deal with comparison against sext(iv) and such.
- if (isa<PHINode>(LeftValue) &&
- cast<PHINode>(LeftValue)->getParent() == Header)
- // The comparison is made against current IV value.
- IndVarStart = IndVarBase->getStart();
- else {
- // Assume that the comparison is made against next IV value.
- const SCEV *StartNext = IndVarBase->getStart();
- const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
- IndVarStart = SE.getAddExpr(StartNext, Addend);
- IsIndVarNext = true;
- }
+ const SCEV *StartNext = IndVarBase->getStart();
+ const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
+ const SCEV *IndVarStart = SE.getAddExpr(StartNext, Addend);
const SCEV *Step = SE.getSCEV(StepCI);
[XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests, so we add them in this change.
What happened was the following:
- For tail exits, we used to unconditionally prepend the returns/exits
with a pseudo-instruction that gets lowered to the instrumentation
sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
suddenly also emit the tail exit pseudo-instruction, since the check
for whether a return instruction was also a call instruction meant it
was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
tests, or the tests being disabled/removed for continuous breakage.
This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.
[x86] Flesh out the custom ISel for RMW arithmetic ops with used flags to
cover the bitwise operators.
Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.
Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.
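The source shape this matches, sketched:

// A bitwise RMW on memory whose flag result feeds a branch; this can now
// select to e.g. `and dword ptr [P], Mask` with the branch reusing ZF,
// instead of a separate load, and, store, and test.
void rmwAndBranch(unsigned *P, unsigned Mask, void (*OnZero)()) {
  *P &= Mask;
  if (*P == 0)
    OnZero();
}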
[x86] Extend the manual ISel of `add` and `sub` with both RMW memory
operands and used flags to support matching immediate operands.
This is a bit trickier than register operands, and we still want to fall
back on register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to do
that, we in turn need to scan to make sure that CF isn't used, as it will
get inverted.
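To make the immediate-width point concrete (a sketch; actual selection depends on the surrounding code):

// x86's sign-extended imm8 covers -128..127. Adding 128 doesn't fit, but
// subtracting -128 does, so `add [P], 128` can be emitted as the shorter
// `sub dword ptr [P], -128`; the swap inverts CF, hence the scan for CF uses.
void addBig(int *P) { *P += 128; }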
The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.
A follow-up patch will further expand this to other operations with RMW
memory operands. But handling `add` and `sub` is a useful starting point
to flesh out the machinery and make sure interesting and complex cases
can be handled.
Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!
Petr Hosek [Thu, 7 Sep 2017 23:02:50 +0000 (23:02 +0000)]
[llvm-objcopy] Add support for special section indexes in symbol table greater than SHN_LORESERVE
As is, indexes above SHN_LORESERVE are not handled correctly because
they're treated as indexes of sections rather than special values
that should just be copied. This change adds support for copying them.
Lang Hames [Thu, 7 Sep 2017 21:04:00 +0000 (21:04 +0000)]
[ORC] Add ErrorSuccess and void specializations to AsyncHandlerTraits.
This will allow async handlers to be added that return void or Error::success().
Such handlers are expected to be common, since one of the primary uses of
addAsyncHandler is to run the body of the handler in a detached thread, in which
case the main handler returns immediately and does not need to provide an Error
value.
Petr Hosek [Thu, 7 Sep 2017 20:44:16 +0000 (20:44 +0000)]
[yaml2obj][ELF] Add support for symbol indexes greater than SHN_LORESERVE
Right now Symbols must be either undefined or defined in a specific
section. Some symbols have section indexes like SHN_ABS however. This
change adds support for outputting symbols that have such section
indexes.
COFF: PDB: Allow multiple modules with the same name.
It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").
Keith Wyss [Thu, 7 Sep 2017 18:07:48 +0000 (18:07 +0000)]
[XRay][tools] Function call stack based analysis tooling for XRay traces
Second try after fixing a code san problem with iterator reference types.
This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.
The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relationship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.
When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
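A condensed sketch of the accounting scheme (types and names are illustrative, not the tool's actual classes; ownership is elided for brevity):

#include <cstdint>
#include <map>
#include <vector>

struct TrieNode {
  int32_t FuncId = 0;
  uint64_t TotalTime = 0;                // accumulated latency for this stack
  std::map<int32_t, TrieNode *> Callees; // "calls" edges to deeper stacks
};

struct StackFrame { TrieNode *Node; uint64_t EnterTSC; };

// Function-enter record: extend the trie below the current top of stack
// and remember the entry timestamp (TSC).
void onEnter(std::vector<StackFrame> &Stack, TrieNode &Root,
             int32_t FuncId, uint64_t TSC) {
  TrieNode *Parent = Stack.empty() ? &Root : Stack.back().Node;
  TrieNode *&Child = Parent->Callees[FuncId];
  if (!Child)
    Child = new TrieNode{FuncId};
  Stack.push_back({Child, TSC});
}

// Function-exit record: attribute the elapsed time to the stack being left.
void onExit(std::vector<StackFrame> &Stack, uint64_t TSC) {
  if (Stack.empty())
    return;
  Stack.back().Node->TotalTime += TSC - Stack.back().EnterTSC;
  Stack.pop_back();
}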
The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that have the
most frequency. In the future we can provide more sophisticated query
mechanisms and potentially an export to database feature to make offline
analysis of the stack traces possible with existing tools.
Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.
This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar-looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.
Benjamin Kramer [Thu, 7 Sep 2017 14:52:26 +0000 (14:52 +0000)]
[ARM] Remove redundant vcvt patterns.
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.