Eric Christopher [Mon, 14 Oct 2019 22:56:07 +0000 (22:56 +0000)]
In the new pass manager use PTO.LoopUnrolling to determine when and how
we will unroll loops. Also comment a few occasions where we need to
know whether or not we're forcing the unroller.
The default before and after this patch is for LoopUnroll to be enabled,
and for it to use a cost model to determine whether to unroll the loop
(`OnlyWhenForced = false`). Before this patch, disabling loop unrolling
would not run the LoopUnroll pass at all. After this patch, the LoopUnroll pass
is being run, but it restricts unrolling to only the loops marked by a
pragma (`OnlyWhenForced = true`).
In addition, this patch disables the UnrollAndJam pass when disabling unrolling.
The testcase is in clang because it controls how the loop optimizer
is set up, and there's no other way to trigger the behavior.
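For illustration, a minimal IR sketch (hand-written, not from the patch) of a loop carrying the metadata that "#pragma clang loop unroll(enable)" lowers to, which `OnlyWhenForced = true` still unrolls:
```
define void @f(i32* %p) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %q = getelementptr inbounds i32, i32* %p, i32 %i
  store i32 0, i32* %q
  %i.next = add nuw nsw i32 %i, 1
  %done = icmp eq i32 %i.next, 16
  ; !llvm.loop !0 is what the unroll pragma becomes in IR.
  br i1 %done, label %exit, label %loop, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.unroll.enable"}
```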
Jian Cai [Mon, 14 Oct 2019 22:22:26 +0000 (22:22 +0000)]
[ARM][AsmParser] handles offset expression in parentheses
Summary:
The integrated assembler does not accept offset expressions surrounded by
parentheses. Handle this case for GAS compatibility.
https://bugs.llvm.org/show_bug.cgi?id=43631
David Blaikie [Mon, 14 Oct 2019 22:12:45 +0000 (22:12 +0000)]
DebugInfo: Remove unnecessary/mistaken inclusion of Bitcode/BitcodeAnalyzer.h
The inclusion was introduced in r374582; Michael Spencer pointed out
that it broke the modules build due to a missing tblgen dependency on
llvm/IR/Attributes.inc.
Michael fixed the dependency in r374827.
So this removes the inclusion and the new dependency (effectively
reverting r374827 and including the alternative fix of removing rather
than supporting the new dependency).
A previous commit made libLLVMDebugInfoDWARF depend on the LLVM_Bitcode module, which depends on the LLVM_intrinsic_gen module, which depends on "llvm/IR/Attributes.inc", a generated header not depended on by libLLVMDebugInfo. Add that dependency.
Joel E. Denny [Mon, 14 Oct 2019 19:59:30 +0000 (19:59 +0000)]
[lit] Extend internal diff to support -U
When using lit's internal shell, RUN lines like the following
accidentally execute an external `diff` instead of lit's internal
`diff`:
```
# RUN: program | diff -U1 file -
```
Such cases exist now, in `clang/test/Analysis` for example. We are
preparing patches to ensure lit's internal `diff` is called in such
cases, which will then fail because lit's internal `diff` doesn't
recognize `-U` as a command-line option. This patch adds `-U NUM`
support, which sets the number of lines of unified-diff context.
Philip Reames [Mon, 14 Oct 2019 19:49:40 +0000 (19:49 +0000)]
[Tests] Add a test demonstrating a miscompile in the off-by-default loop-pred transform
Credit goes to Evgeny Brevnov for figuring out the problematic case.
Fuzzing probably also found it (lots of failures), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand-reduced it from a benchmark.
Roman Lebedev [Mon, 14 Oct 2019 19:46:34 +0000 (19:46 +0000)]
[LoopIdiom] BCmp: loop exit count must not be wider than size_t that `bcmp` takes
As reported by Joerg Sonnenberger in IRC, for 32-bit systems,
where pointers and size_t are 32-bit, if you use a 64-bit-wide variable
in the loop, you could end up with a loop exit count of a type
wider than size_t. Now, I'm not sure if we can produce `bcmp`
from that (just truncate?), but we certainly should not assert/miscompile.
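A reduced sketch of the hazard (illustrative, not the reported case): with 32-bit pointers and size_t, the i64 induction variable below yields an i64 exit count, wider than the i32 that a `bcmp` call could take as its size argument:
```
define i1 @all_equal(i8* %a, i8* %b, i64 %n) {
entry:
  ; Assumes %n > 0 for brevity.
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %latch ]
  %pa = getelementptr inbounds i8, i8* %a, i64 %i
  %pb = getelementptr inbounds i8, i8* %b, i64 %i
  %va = load i8, i8* %pa
  %vb = load i8, i8* %pb
  %eq = icmp eq i8 %va, %vb
  br i1 %eq, label %latch, label %exit

latch:
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ; true only if every byte compared equal.
  %res = phi i1 [ false, %loop ], [ true, %latch ]
  ret i1 %res
}
```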
Jordan Rupprecht [Mon, 14 Oct 2019 17:47:17 +0000 (17:47 +0000)]
[llvm-objdump] Adjust spacing and field width for --section-headers
Summary:
- Expand the "Name" column past 13 characters when any of the section names are longer. Current behavior is a staggard output instead of a nice table if a single name is longer.
- Only print the required number of hex chars for addresses (i.e. 8 characters for 32-bit, 16 characters for 64-bit)
- Fix trailing spaces
Vedant Kumar [Mon, 14 Oct 2019 17:20:22 +0000 (17:20 +0000)]
[llvm-profdata] Weaken "malformed-ptr-to-counter-array.test" to appease arm bots
There are a number of ARM bots failing after r374617 landed, and I'm not
sure why. It looks a bit like the error message llvm-profdata is
expected to print to stderr isn't flushed.
Weaken the test in an attempt to appease the arm bots: if this doesn't
work, that means that llvm-profdata is actually *not failing*, and that
will be a clear indication that some logic error is actually happening.
Simon Pilgrim [Mon, 14 Oct 2019 16:30:17 +0000 (16:30 +0000)]
[CostModel][X86] Add CTLZ scalar costs
Add specific scalar costs for CTLZ instructions. We can't discriminate between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets, we might still be underestimating it.
For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
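For reference, the IR form being costed; the `i1 false` flag requests a defined result for a zero input (CTLZ rather than CTLZ_ZERO_UNDEF), which is the expensive case on pre-LZCNT targets because BSR leaves its destination undefined for zero:
```
declare i32 @llvm.ctlz.i32(i32, i1)

define i32 @leading_zeros(i32 %x) {
  ; false = the result must be defined even when %x is 0;
  ; true would permit the cheaper CTLZ_ZERO_UNDEF form.
  %r = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
  ret i32 %r
}
```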
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and the fallbacks in SDAG/GlobalISel and FastISel. The latter now assert
on the intrinsics.
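A minimal sketch of the two intrinsics the pass lowers when earlier folding leaves them behind (the branch shape and constants are illustrative):
```
declare i1 @llvm.is.constant.i32(i32)
declare i64 @llvm.objectsize.i64.p0i8(i8*, i1, i1, i1)

define i64 @example(i32 %x, i8* %p) {
  ; Lowered to false if %x was never proven constant; the pass then
  ; prunes the dead branch even at -O0.
  %c = call i1 @llvm.is.constant.i32(i32 %x)
  br i1 %c, label %known, label %unknown

known:
  ret i64 1

unknown:
  ; Arguments after the pointer: min, nullunknown, dynamic.
  %sz = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false, i1 true, i1 false)
  ret i64 %sz
}
```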
David Green [Mon, 14 Oct 2019 15:19:33 +0000 (15:19 +0000)]
[ARM] Selection for MVE VMOVN
This adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.
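For example, with <8 x i16> halves (N = 8) the first mask form looks like this (a hand-written illustration):
```
define <8 x i16> @vmovn_example(<8 x i16> %a, <8 x i16> %b) {
  ; Mask 0, N, 2, N+2, ... with N = 8:
  %r = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
  ret <8 x i16> %r
}
```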
Makes Object/macho-invalid.test fail everywhere, e.g. here:
http://lab.llvm.org:8011/builders/llvm-hexagon-elf/builds/23669/steps/test-llvm/logs/FAIL%3A%20LLVM%3A%3Amacho-invalid.test
[Alignment][NFC] Move and type functions from MathExtras to Alignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
[AMDGPU] Reland the patch 'Assign register class for cross block values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 was submitted, several issues were discovered.
Changes in common code were preserved, but the AMDGPU-specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Craig Topper [Mon, 14 Oct 2019 06:47:56 +0000 (06:47 +0000)]
[X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel.
This prevents isel from emitting a TEST instruction that
optimizeCompareInstr will need to remove later.
In some of the modified tests, the SUB gets duplicated due to
the flags being needed in two places and being clobbered in
between. optimizeCompareInstr was able to optimize away the TEST
that was using the result of one of them, but optimizeCompareInstr
doesn't know to turn SUB into CMP after removing the TEST. It
only knows how to turn SUB into CMP if the result was already
dead.
With this change the TEST never exists, so optimizeCompareInstr
doesn't have to remove it. Then it can just turn the SUB into
CMP immediately.
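An illustrative IR shape (not one of the modified tests): both the overflow bit and the zero test of the difference can be served by the flags of a single SUB once the TEST is gone:
```
declare { i32, i1 } @llvm.ssub.with.overflow.i32(i32, i32)

define i1 @sub_overflow_or_zero(i32 %a, i32 %b) {
  %s = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
  %val = extractvalue { i32, i1 } %s, 0
  %ovf = extractvalue { i32, i1 } %s, 1
  ; The Z flag set by the subtract answers this compare directly.
  %zero = icmp eq i32 %val, 0
  %r = or i1 %ovf, %zero
  ret i1 %r
}
```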
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and the fallbacks in SDAG/GlobalISel and FastISel. The latter now assert
on the intrinsics.
[Attributor] Shortcut no-return through will-return
No-return and will-return are mutually exclusive. Assuming the latter is
more prominent, we can avoid updates of the former unless will-return is
not known for sure.
[Attributor][MemBehavior] Fallback to the function state for arguments
Even if an argument is captured, it cannot have an effect the function
itself does not have. This is fine except for the special case of
`inalloca`, as it does not play by the rules.
TODO: Maybe the special rule for `inalloca` is wrong after all.
Simon Pilgrim [Sun, 13 Oct 2019 19:35:35 +0000 (19:35 +0000)]
[X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero.
We were already controlling whether the KnownZero elements were being written to the target mask; this extends that to the KnownUndef elements as well, so we can prevent the target shuffle mask from being manipulated at all.
Craig Topper [Sun, 13 Oct 2019 19:07:28 +0000 (19:07 +0000)]
[X86] Enable use of avx512 saturating truncate instructions in more cases.
This enables use of the saturating truncate instructions when the
result type is less than 128 bits. It also enables the use of
saturating truncate instructions on KNL when the input is less
than 512 bits. We can do this by widening the input and then
extracting the result.
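A sketch of the clamp-then-truncate pattern involved, with a <4 x i8> (32-bit) result, i.e. narrower than 128 bits (an illustration, not a test from the patch):
```
define <4 x i8> @trunc_ssat(<4 x i32> %x) {
  ; Clamp to [-128, 127], then truncate: the signed-saturating form.
  %lo.cmp = icmp sgt <4 x i32> %x, <i32 -128, i32 -128, i32 -128, i32 -128>
  %lo = select <4 x i1> %lo.cmp, <4 x i32> %x, <4 x i32> <i32 -128, i32 -128, i32 -128, i32 -128>
  %hi.cmp = icmp slt <4 x i32> %lo, <i32 127, i32 127, i32 127, i32 127>
  %hi = select <4 x i1> %hi.cmp, <4 x i32> %lo, <4 x i32> <i32 127, i32 127, i32 127, i32 127>
  %t = trunc <4 x i32> %hi to <4 x i8>
  ret <4 x i8> %t
}
```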
Roman Lebedev [Sun, 13 Oct 2019 17:11:16 +0000 (17:11 +0000)]
[NFC][InstCombine] More tests for "sign bit test via shifts" pattern (PR43595)
While that pattern is indirectly handled via
reassociateShiftAmtsOfTwoSameDirectionShifts(),
that incurs a one-use restriction on the truncation,
which is pointless since we know that we'll produce a single instruction.
Additionally, *if* we are only looking for the sign bit,
the shifts don't need to be identical,
which isn't the case in general,
and that is the blocker for me in the bug in question:
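A hypothetical example in this spirit (not the exact PR43595 reproducer): since `ashr` preserves the sign bit, a pure sign-bit test does not care about the second shift amount at all:
```
define i1 @signbit_via_shifts(i32 %x) {
  ; ashr keeps the sign bit of %a, so for a sign-only test the second
  ; shift amount (5) need not match the first (8): this tests bit 23.
  %a = shl i32 %x, 8
  %b = ashr i32 %a, 5
  %r = icmp slt i32 %b, 0
  ret i1 %r
}
```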
The CmpInst::getType() calls can be replaced by just using the User::getType() of the value it was dyn_cast from; we then need to assert that any default predicate cases came from the CmpInst.
Craig Topper [Sun, 13 Oct 2019 06:48:05 +0000 (06:48 +0000)]
[X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect.
This seems to improve std::midpoint code where we have a min and
a max with the same condition. If we split the setcc, we can end
up with two compares if one of the operands is a constant, since
we aggressively canonicalize compares with constants.
For non-constants it can interfere with our ability to share
control flow if we need to expand cmovs into control flow.
I'm also not sure I understand this min/max canonicalization code.
The motivating case talks about comparing with 0. But we don't
check for 0 explicitly.
Removes one instruction from the codegen for PR43658.
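An illustrative shape of such code (hand-written, not lifted from std::midpoint): one compare feeds both a min-style and a max-style select, so splitting the setcc would duplicate the compare:
```
define i32 @range(i32 %a, i32 %b) {
  %c = icmp slt i32 %a, %b
  %min = select i1 %c, i32 %a, i32 %b
  %max = select i1 %c, i32 %b, i32 %a
  %d = sub i32 %max, %min
  ret i32 %d
}
```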
[Attributor][FIX] Ensure h2s doesn't trigger on escaped pointers
We perform h2s not because we know something is free'd, but because we
know the pointer does not escape. Storing the pointer
allows it to escape, so we have to prevent that.
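A minimal illustration of such an escape (names made up):
```
declare i8* @malloc(i64)

define void @f(i8** %out) {
  %p = call i8* @malloc(i64 64)
  ; The store lets %p escape, so heap-to-stack must not turn this
  ; malloc into an alloca.
  store i8* %p, i8** %out
  ret void
}
```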
[Attributor][FIX] Do not apply h2s for arbitrary mallocs
H2S did apply to mallocs of non-constant sizes if the uses were OK. This
is now forbidden through reordering of the "good" and "bad" cases in the
conditional.
The check for naked/optnone was insufficient for different reasons. We
now check before we initialize an abstract attribute and we do it for
all abstract attributes.
[SROA] Reuse existing lifetime markers if possible
Summary:
If the underlying alloca did not change, we do not necessarily need new
lifetime markers. This patch adds a check and reuses the old ones if
possible.
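For reference, a sketch of the markers in question; if SROA leaves the alloca itself unchanged, markers like these can simply be kept:
```
declare void @llvm.lifetime.start.p0i8(i64, i8*)
declare void @llvm.lifetime.end.p0i8(i64, i8*)

define void @g() {
  %buf = alloca [8 x i8]
  %p = bitcast [8 x i8]* %buf to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %p)
  ; ... uses of %buf ...
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %p)
  ret void
}
```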