Implement splitting of aggregate structures into simpler types in splitToValueTypes.
splitToValueTypes is used for return values.
According to MipsABIInfo from clang/lib/CodeGen/TargetInfo.cpp,
aggregate structure arguments for O32 always get simplified and thus
will remain unsupported by MIPS GlobalISel for the time being.
For O32, aggregate structures can be encountered only for complex number
returns, e.g. 'complex float' or 'complex double' from <complex.h>.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<MCConstantExpr> directly; if not, the assert will fire for us.
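A sketch of the pattern (illustrative names, not the exact code the commit touched):
```
// Illustrative sketch: cast<> asserts that the operand really is an
// MCConstantExpr, so the separate null check that dyn_cast<> required
// (and that the analyzer flagged) goes away.
#include "llvm/MC/MCExpr.h"
using namespace llvm;

int64_t getConstant(const MCExpr *E) {
  // Before: auto *CE = dyn_cast<MCConstantExpr>(E); // may be null
  // After: an assert fires inside cast<> if E has the wrong type.
  return cast<MCConstantExpr>(E)->getValue();
}
```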
[X86] Remove isCodeGenOnly from (V)ROUND.*_Int and put it on the non _Int form instead.
This matches what's done for VRNDSCALE and most other instructions.
This mainly determines which instruction will be preferred by
disassembler and assembly parser. The printing and encoding
information is the same.
We prefer the _Int form since it uses the VR128 class due to its
intrinsic interface. For some EVEX features, like embedded
rounding, we only select from intrinsics today, so there is
only a VR128 version. Making the VR128 version the preferred
one is thus consistent overall.
Mikael Holmen [Thu, 26 Sep 2019 06:35:55 +0000 (06:35 +0000)]
[IfConversion] Disallow TBB == FBB for valid triangles
Summary:
Previously the case
EBB
| \_
| |
| TBB
| /
FBB
was treated as a valid triangle even when TBB and FBB were the same basic
block. This could then lead to an invalid CFG when we removed the edge
from EBB to TBB, since that meant we would also remove the edge from EBB
to FBB.
Since TBB == FBB is quite a degenerate case of a triangle, we no longer
treat it as a valid triangle, and thus we avoid the trouble of updating
the CFG.
Previously we might attempt to use a BitCast to turn bits into vectors of pointers,
but that requires an inttoptr cast to be legal. Add an assertion to detect the
formation of illegal bitcast attempts early (in the tests, we often constant-fold
away the result before reaching this assertion check), while still handling the
early-return conditions without adding extra complexity in the result.
Patch by Jameson Nash <jameson@juliacomputing.com>.
Nick Lewycky [Thu, 26 Sep 2019 00:58:55 +0000 (00:58 +0000)]
Improve C API support for atomicrmw and cmpxchg.
atomicrmw and cmpxchg have a volatile flag, so allow it to be get and set
with LLVM{Get,Set}Volatile. atomicrmw and fence have orderings, so allow
them to be get and set with LLVM{Get,Set}Ordering. Add the missing
LLVMAtomicRMWBinOpFAdd and LLVMAtomicRMWBinOpFSub enum constants.
AtomicCmpXchg also has a weak flag; add a getter/setter for that too.
Also add a getter/setter for the binary op of an atomicrmw.
Add LLVMIsA## for CatchSwitchInst, CallBrInst and FenceInst, as well as AtomicCmpXchgInst and AtomicRMWInst.
Update llvm-c-test to include atomicrmw and fence, and to copy volatile for the four applicable instructions.
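A minimal sketch of the extended C API (the builder, pointer, and value are assumed to be created elsewhere; the atomicrmw getter/setter names follow the description above):
```
#include "llvm-c/Core.h"

void demo(LLVMBuilderRef B, LLVMValueRef Ptr, LLVMValueRef Val) {
  LLVMValueRef RMW =
      LLVMBuildAtomicRMW(B, LLVMAtomicRMWBinOpAdd, Ptr, Val,
                         LLVMAtomicOrderingSequentiallyConsistent,
                         /*SingleThread=*/0);
  LLVMSetVolatile(RMW, 1);                       // now legal on atomicrmw
  LLVMSetOrdering(RMW, LLVMAtomicOrderingMonotonic);
  if (LLVMIsAAtomicRMWInst(RMW))                 // new LLVMIsA## helper
    LLVMSetAtomicRMWBinOp(RMW, LLVMAtomicRMWBinOpSub);
}
```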
Thomas Raoux [Thu, 26 Sep 2019 00:16:01 +0000 (00:16 +0000)]
[TargetLowering] Make the allowsMemoryAccess method virtual.
Rename the old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess calls the alignment-checking function by default
and can be overridden by a target to indicate whether the memory access is
legal or not.
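A hedged sketch of a target override (the parameter list is abbreviated from the TargetLowering.h of that era, and the address-space rule is invented purely for illustration):
```
bool MyTargetLowering::allowsMemoryAccess(
    LLVMContext &Ctx, const DataLayout &DL, EVT VT, unsigned AddrSpace,
    unsigned Alignment, MachineMemOperand::Flags Flags, bool *Fast) const {
  if (AddrSpace == 7) // hypothetical: accesses to AS 7 are never legal here
    return false;
  // Otherwise defer to the default, alignment-only check.
  return TargetLowering::allowsMemoryAccess(Ctx, DL, VT, AddrSpace, Alignment,
                                            Flags, Fast);
}
```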
[X86] Use VR512_0_15RegClass instead of VR512RegClass in X86VZeroUpper.
This pass is only concerned with ZMM0-15 and YMM0-15. For YMM
we use VR256 which only contains YMM0-15, but for ZMM we were
using VR512 which contains ZMM0-31. Using VR512_0_15 is more
correct.
Given that the ABI and register allocator will use registers in
order, it's unlikely that registers 16-31 would be used
without also using 0-15. So this probably doesn't matter
functionally.
[MemorySSA] Avoid adding Phis in the presence of unreachable blocks.
Summary:
If a block has all incoming values with the same MemoryAccess (ignoring
incoming values from unreachable blocks), then use that incoming
MemoryAccess and do not create a Phi in the first place.
Revert IDF work-around added in rL372673; it should not be required unless
the Def inserted is the first in its block.
The patch also cleans up a series of tests, added during the many
iterations on insertDef.
The patch also fixes PR43438.
The same issue that occurs in insertDef with "adding phis, hence the IDF of
Phis is needed" can also occur in fixupDefs: the `getPreviousRecursive`
call only adds Phis walking on the predecessor edges, which means
a Phi may be added while walking the CFG "backwards", and that
triggers the need for an additional Phi in successor blocks.
Such Phis are added during fixupDefs only in the presence of unreachable
blocks.
Hence this highlights the need to avoid adding Phis in blocks with
unreachable predecessors in the first place.
Nick Desaulniers [Wed, 25 Sep 2019 22:28:27 +0000 (22:28 +0000)]
[Verifier] add invariant check for callbr
Summary:
The list of indirect labels should ALWAYS have their blockaddresses as
argument operands to the callbr (but not necessarily the other way
around). Add an invariant that checks this.
The verifier catches a bad test case that was added recently in r368478.
I think that was a simple mistake, and the test was made less strict in
regards to the precise addresses (as those weren't specifically the
point of the test).
This invariant will be used to find a reported bug.
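A rough sketch of the invariant (not the verifier's exact code):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include <cassert>
using namespace llvm;

void checkCallBr(const CallBrInst &CBI) {
  // Every indirect destination must also appear as a blockaddress among
  // the callbr's argument operands (but not necessarily vice versa).
  for (unsigned i = 0, e = CBI.getNumIndirectDests(); i != e; ++i) {
    const BasicBlock *Dest = CBI.getIndirectDest(i);
    bool Found = false;
    for (const Use &U : CBI.args())
      if (auto *BA = dyn_cast<BlockAddress>(U.get()))
        Found |= BA->getBasicBlock() == Dest;
    assert(Found && "indirect dest must be an argument operand");
  }
}
```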
[InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul
Because we do not constant fold multiplications in SimplifyFMAMul,
we match 1.0 and 0.0 for both operands, as multiplying by them
is guaranteed to produce an exact result (if it is allowed to do so).
Note that it is not enough to just swap the operands to ensure a
constant is on the RHS, as we want to also cover the case with
2 constants.
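The shape of the check, as a rough sketch (not the exact InstructionSimplify code):
```
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static Value *simplifyFMAFMulSketch(Value *Op0, Value *Op1,
                                    FastMathFlags FMF) {
  // X * 1.0 -> X is always exact, so no rounding concern; check both sides
  // so a constant LHS and the two-constant case are covered too.
  if (match(Op1, m_FPOne())) return Op0;
  if (match(Op0, m_FPOne())) return Op1;
  // X * 0.0 -> 0.0 needs no-nans and no-signed-zeros to be exact.
  if (FMF.noNaNs() && FMF.noSignedZeros()) {
    if (match(Op1, m_PosZeroFP())) return Op1;
    if (match(Op0, m_PosZeroFP())) return Op0;
  }
  return nullptr; // no exact simplification found
}
```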
Roman Lebedev [Wed, 25 Sep 2019 19:06:40 +0000 (19:06 +0000)]
[InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0
https://rise4fun.com/Alive/KtL
This also shows that the fold added in D67412 / r372257
was too specific; the new fold allows those test cases
to be handled more generically, so I deleted the now-dead code.
This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"
Roman Lebedev [Wed, 25 Sep 2019 19:06:26 +0000 (19:06 +0000)]
[NFC][InstCombine] Add tests for (X - Y) < X --> Y <= X iff Y != 0
https://rise4fun.com/Alive/KtL
This should go into InstCombiner::foldICmpBinOp(), next to
"Convert sub-with-unsigned-overflow comparisons into a comparison of args."
[MSP430] Allow msp430_intrcc functions to not have interrupt attribute.
Summary:
Useful in case you want to have control over interrupt vector generation.
For example, in Rust we have an arrangement where all unhandled
ISR vectors get mapped to a single default handler function, which is
hard to implement when LLVM tries to generate the vectors on its own.
Bob Haarman [Wed, 25 Sep 2019 18:16:02 +0000 (18:16 +0000)]
[emacs] simplify and improve keyword highlighting in tablegen-mode.el
Summary:
The keyword and type keyword matchers in tablegen-mode.el checked
for space, newline, tab, or open paren after the regular expression
that matches keywords (or type keywords, respectively). This is
unnecessary, because those regular expressions already include word
boundaries. This change removes the extra check. This also causes
"def" in "def:" to be highlighted as a keyword, which was missed
before.
[InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.
This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifyFMulInst as SimplifyFMAFMul.
[AArch64][GlobalISel] Choose CCAssignFns per-argument for tail call lowering
When checking for tail call eligibility, we should use the correct CCAssignFn
for each argument, rather than just checking if the caller/callee is varargs or
not.
This is important for tail call lowering with varargs. If we don't check it,
then basically any varargs callee with parameters cannot be tail called on
Darwin, for one thing. If the parameters are all guaranteed to be in registers,
this should be entirely safe.
On top of that, not checking for this could potentially give us wrong
stack offsets when checking for tail call eligibility.
Also refactor some of the CCAssignFnForCall logic and pull it out into a
helper function.
Update call-translator-tail-call.ll to show that we can now correctly tail call
on Darwin. Also add two extra tail call checks. The first verifies that we still
respect the caller's stack size, and the second verifies that we still don't
tail call when a varargs function has a memory argument.
[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address. Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.
This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table. Thus, it is now renamed to `-max-jump-table-targets`.
Before, when `-max-jump-table-size` was specified, cluster jump tables
could contain targets used repeatedly, but each one was counted and
typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit; each table has the same number of unique targets, except for the
last one, which contains the balance of the targets.
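To illustrate the distinction between entries and targets, consider a hypothetical switch like the one below: it produces eight jump-table entries but only two unique targets, so it is now limited by the target count rather than the entry count:
```
// Hypothetical example: 8 case values (jump-table entries) but only
// 2 unique targets, because seven cases share one destination block.
int classify(int x) {
  switch (x) {
  case 0: case 1: case 2: case 3:
  case 4: case 5: case 6:
    return 1; // one target shared by seven entries
  case 7:
    return 2; // second target
  default:
    return 0;
  }
}
```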
[DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with the code we had: a scalar op that is
redundant with a vector op.
[PatternMatch] Make m_Br more flexible, add matchers for BB values.
Currently m_Br only takes references to BasicBlock*, which limits its
flexibility. For example, you have to declare a variable even if you
ignore the result, and you need additional checks to make sure the
matched BB matches the expected one.
This patch adds m_BasicBlock and m_SpecificBB matchers, which can be
used like the existing matchers for constants or values.
I also had a look at the existing uses and updated a few. IMO it makes
the code a bit more explicit.
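A hedged sketch of how the new matchers read (matcher names per the patch description):
```
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

bool branchesFalseTo(BranchInst *BI, BasicBlock *Expected) {
  Value *Cond;
  BasicBlock *TrueBB;
  // Bind the true successor and require the false successor to be exactly
  // Expected -- no dummy variable needed for a block we would ignore.
  return match(BI, m_Br(m_Value(Cond), m_BasicBlock(TrueBB),
                        m_SpecificBB(Expected)));
}
```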
Simon Pilgrim [Wed, 25 Sep 2019 14:55:57 +0000 (14:55 +0000)]
[TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr&
Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in, and there is no reason why they would in the future.
[IR] allow fast-math-flags on phi of FP values (2nd try)
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
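As a hedged sketch of what the plumbing now permits (not code from the patch):
```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

void makeFastPhi(IRBuilder<> &Builder) {
  // An FP phi is now an FPMathOperator, so the FMF setters apply to it.
  PHINode *P = Builder.CreatePHI(Builder.getFloatTy(), /*NumReservedValues=*/2);
  P->setFast(true); // would have asserted before this change
}
```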
Jakub Kuderski [Wed, 25 Sep 2019 14:04:36 +0000 (14:04 +0000)]
[Dominators][AMDGPU] Don't use virtual exit node in findNearestCommonDominator. Cleanup MachinePostDominators.
Summary:
This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTree::findNearestCommonDominator`, which resulted in assertion failures inside its callee. It also applies a small cleanup to the class.
The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly.
Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots.
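Usage of the new overload, per the description above (a sketch; the virtual root is handled internally instead of tripping the assertion):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/MachinePostDominators.h"
using namespace llvm;

MachineBasicBlock *
nearestCommonPostDom(MachinePostDominatorTree &MPDT,
                     ArrayRef<MachineBasicBlock *> Blocks) {
  return MPDT.findNearestCommonDominator(Blocks);
}
```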
The --bytes option uses the phrase "printable ASCII characters", but the
description section used simply "printable characters". To avoid any
confusion about locale impacts, etc., this change adopts the former's
phrasing in both places. It also fixes a minor grammar issue in the
description.
James Henderson [Wed, 25 Sep 2019 13:09:12 +0000 (13:09 +0000)]
[docs][llvm-strip] Update llvm-strip doc to better match llvm-objcopy's
Main changes are mostly wording of some options, but this change also
fixes a switch reference so that a link is created and moves
--strip-sections into the ELF-specific area since it is only supported
for ELF currently.
George Rimar [Wed, 25 Sep 2019 12:09:30 +0000 (12:09 +0000)]
[yaml2elf] - Support describing .stack_sizes sections using unique suffixes.
Currently we can't use unique suffixes in section names to describe
stack sizes sections. E.g. '.stack_sizes [1]' will be treated as a regular section.
This happens because we recognize stack sizes sections by name and
do not yet drop the suffix before the check.
George Rimar [Wed, 25 Sep 2019 11:40:11 +0000 (11:40 +0000)]
[yaml2obj] - Add a Size field for StackSizesSection.
This is a follow-up requested in a review comment
on D67757. It allows using Content + Size, or just Size,
when describing .stack_sizes sections in a YAML document.
David Green [Wed, 25 Sep 2019 10:16:48 +0000 (10:16 +0000)]
[ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.
George Rimar [Wed, 25 Sep 2019 10:14:50 +0000 (10:14 +0000)]
[llvm-readobj] - Don't crash when dumping .stack_sizes and unable to find a relocation resolver.
The crash might happen when we have either a broken or an unsupported object
and try to resolve relocations when dumping the .stack_sizes section.
For the test case I used a 32-bit ELF header and a 64-bit relocation.
In this case the code returns a null pointer instead of a relocation
resolver function, and then we crash.
Thomas Lively [Wed, 25 Sep 2019 00:15:59 +0000 (00:15 +0000)]
[WebAssembly][NFC] Remove duplicate SIMD instructions and predicates
Summary:
Instead of having different v128.load and v128.store instructions for
each MVT, just have one of each that is reused in all the
patterns. Also removes the HasSIMD128 predicate where accompanied by
HasUnimplementedSIMD128, since the latter implies the former.
Yonghong Song [Tue, 24 Sep 2019 22:38:43 +0000 (22:38 +0000)]
[BPF] Generate array dimension size properly for zero-size elements
Currently, if an array element type size is 0, the number of
array elements will be set to 0, regardless of what the user
specified. This implementation dates from the beginning, when
BTF was mostly used to calculate member offsets.
For example,
  struct s {};
  struct s1 {
    int b;
    struct s a[2];
  };
  struct s1 s1;
The BTF will have struct "s1" member "a" with element count 0.
Now BTF types are used for compile-once and run-everywhere
relocations, and we need a more precise type representation
for type comparison. Andrii reported the issue, as there
are differences between the original structure and the
BTF-generated structure.
This patch makes the change to correctly assign "2"
as the number of elements of member "a".
Some dead code related to ElemSize computation is also removed.
Adding support for overriding LLVM_ENABLE_RUNTIMES for runtimes builds.
Second attempt: Now with ';' -> '|' replacement.
On some platforms, certain runtimes are not supported. For runtimes builds on
those platforms it would be nice if we could disable certain runtimes
(e.g. libunwind on Windows).
Philip Reames [Tue, 24 Sep 2019 17:24:16 +0000 (17:24 +0000)]
[GCRelocate] Add a peephole to canonicalize base pointer relocation
If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.
Roman Lebedev [Tue, 24 Sep 2019 16:10:50 +0000 (16:10 +0000)]
[InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
For
```
#include <cassert>
char* test(char& base, signed long offset) {
  __builtin_assume(offset < 0);
  return &base + offset;
}
```
We produce
https://godbolt.org/z/r40U47
and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
%adjusted = add i8 %base, C
%not_null = icmp ne i8 %adjusted, 0
%no_underflow = icmp ult i8 %adjusted, %base
%r = and i1 %not_null, %no_underflow
=>
%neg_offset = sub i8 0, C
%r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/ALap
https://rise4fun.com/Alive/slnN
There are 3 other variants of this pattern;
I believe they will all go into InstSimplify.
Roman Lebedev [Tue, 24 Sep 2019 16:10:38 +0000 (16:10 +0000)]
[InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.
Regex: Make "match" and "sub" const member functions
Summary:
The Regex "match" and "sub" member functions were previously not "const"
because they wrote to the "error" member variable. This commit removes
those assignments, and instead assumes that the validity of the regex
is already known after the initial compilation of the regular
expression. As a result, these member functions were possible to make
"const". This makes it easier to do things like pre-compile Regexes
up-front, and makes "match" and "sub" thread-safe. The error status is
now returned as an optional output, which also makes the API of "match"
and "sub" more consistent with each other.
Also, some uses of Regex that could be refactored to be const were made const.
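A hedged usage sketch of the effect: a Regex can now be compiled once and shared as a constant, with matching callable concurrently:
```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Regex.h"
using namespace llvm;

// Compiled once up-front; match() is now const, so this can be shared.
static const Regex HexRE("^0x[0-9a-fA-F]+$");

bool isHexToken(StringRef Tok) {
  return HexRE.match(Tok); // no mutation of the Regex: thread-safe
}
```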
David Bolvansky [Tue, 24 Sep 2019 14:01:14 +0000 (14:01 +0000)]
[Compiler] Fix LLVM_NODISCARD for GCC
Summary:
This branch is currently dead since we don't use C++17:
  #if __cplusplus > 201402L && LLVM_HAS_CPP_ATTRIBUTE(nodiscard)
  #define LLVM_NODISCARD [[nodiscard]]
This branch is Clang-only:
  #elif LLVM_HAS_CPP_ATTRIBUTE(clang::warn_unused_result)
  #define LLVM_NODISCARD [[clang::warn_unused_result]]
While we could use the GNU variant [[gnu::warn_unused_result]], it is not
ideal because it works only on functions:
  /home/xbolva00/LLVM/llvm/include/llvm/ADT/ArrayRef.h:41:24: warning: ‘warn_unused_result’ attribute only applies to function types [-Wattributes]
GCC (checked 5, 6, 7, 8) seems to enable [[nodiscard]] even in C++14 mode
and does not warn that nodiscard is a C++17 feature, but Clang does warn;
however, Clang never reaches this branch due to the code above. So the
change affects only GCC and does what we want.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly; if not, the assert will fire for us.
Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See email thread of original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly; if not, the assert will fire for us.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly; if not, the assert will fire for us.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly; if not, the assert will fire for us.