Lang Hames [Tue, 7 May 2019 22:56:40 +0000 (22:56 +0000)]
Reapply r360194 "[JITLink] Add support for MachO .alt_entry atoms." with fixes.
This patch modifies MachOAtomGraphBuilder to use setLayoutNext rather than
addEdge, and fixes a bug in the section layout algorithm that could result in
atoms appearing more than once in the section ordering (which resulted in those
atoms being assigned invalid addresses during layout).
Austin Kerbow [Tue, 7 May 2019 22:12:15 +0000 (22:12 +0000)]
[AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.
Lang Hames [Tue, 7 May 2019 21:35:14 +0000 (21:35 +0000)]
[JITLink] Add support for MachO .alt_entry atoms.
The MachO .alt_entry directive is applied to a symbol to indicate that it is
locked (in terms of address layout and liveness) to its predecessor atom. I.e.
it is an alternate entry point, at a fixed offset, for the previous atom.
This patch updates MachOAtomGraphBuilder to check for the .alt_entry flag on
symbols and add a corresponding LayoutNext edge to the atom-graph. It also
updates MachOAtomGraphBuilder_x86_64 to generalize handling of the
X86_64_RELOC_SUBTRACTOR relocation: previously either the minuend or
subtrahend of the subtraction had to be the same as the atom being fixed up,
now it is only necessary for the minuend or subtrahend to be locked (via any
chain of alt_entry directives) to the atom being fixed up.
Revert "[OpenMP][Clang] Support for target math functions"
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.
CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
LLVM_ENABLE_MODULES is not supported by this compiler
Compute results in more direct ways, avoid subset intersect
operations. Extract the core code for computing mul nowrap ranges
into separate static functions, so they can be reused.
Make sure that the DAG combiner doesn't merge stores that we explicitly
asked not be greater than preferred vector width for the vectorizer.
Test for both 128 and 256 with a skylake architecture.
Sanjay Patel [Tue, 7 May 2019 18:58:07 +0000 (18:58 +0000)]
[InstCombine] allow sinking fneg operands through an FP min/max
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.
Don Hinton [Tue, 7 May 2019 18:57:01 +0000 (18:57 +0000)]
[CommandLine] Allow Options to specify multiple OptionCategory's.
Summary:
It's not uncommon for separate components to share common
Options, e.g., it's common for related Passes to share Options in
addition to the Pass specific ones.
With this change, components can use OptionCategory's to simply help
output even if some of the options are shared.
Adrian Prantl [Tue, 7 May 2019 17:42:38 +0000 (17:42 +0000)]
Debug Info: Support address space attributes on rvalue references.
DWARF5, 2.12 20ff says that
Any debugging information entry representing a pointer or reference
type [may have a DW_AT_address_class attribute].
The existing code (https://reviews.llvm.org/D29670) seems to take a
quite literal interpretation of that wording. I don't see a reason why
an rvalue reference isn't a reference type in the spirit of that
paragraph. This patch allows rvalue references to also have address
spaces.
Jinsong Ji [Tue, 7 May 2019 17:29:44 +0000 (17:29 +0000)]
[PowerPC][NFC] Update build-vector-tests.ll using utils/update_llc_test_checks.py
build-vector-tests.ll is a huge testcase, it is hard to maintain: eg:
any fundamental changes might need to update hundreds of lines. We should
leverage the script to maintain it.
This patch simply run utils/update_llc_test_checks.py on it. There
should be no missing test points.
Florian Hahn [Tue, 7 May 2019 16:47:27 +0000 (16:47 +0000)]
[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.
This patch adds some limits to the number of nodes explored for the
cases mentioned above.
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
Keno Fischer [Tue, 7 May 2019 15:28:47 +0000 (15:28 +0000)]
[SCEV] Add explicit representations of umin/smin
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express is
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.
[PowerPC] Use the two-constant NR algorithm for refining estimates
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.
George Rimar [Tue, 7 May 2019 13:14:18 +0000 (13:14 +0000)]
[llvm-objdump] - Print relocation record in a GNU format.
This fixes the https://bugs.llvm.org/show_bug.cgi?id=41355.
Previously with -r we printed relocation section name instead of the target section name.
It was like this: "RELOCATION RECORDS FOR [.rel.text]"
Now it is: "RELOCATION RECORDS FOR [.text]"
Also when relocation target section has more than one relocation section,
we did not combine the output. Now we do.
Roman Lebedev [Tue, 7 May 2019 12:28:08 +0000 (12:28 +0000)]
[llvm-exegesis] BenchmarkRunner::runConfiguration(): write small snippet to memory
It was previously writing this temporary snippet to file,
then reading it back, but leaving the tmp file in place.
This is both unefficient, and results in huge garbage pileup
in /tmp.
One would have thought it would have been caught during D60317..
George Rimar [Tue, 7 May 2019 12:10:51 +0000 (12:10 +0000)]
[yaml2obj] - Allow setting st_value explicitly for Symbol.
In some cases it is useful to explicitly set symbol's st_name value.
For example, I am using it in a patch for LLD to remove the broken
binary from a test case and replace it with a YAML test.
The revisioin causes llvm-tblgen to hang while generating info for
RISCV.td. The root cause might be in the RISCV.td definition but I don't
know enough about this to investigate further.
Command that starts hangning after r360106:
`llvm-build/bin/llvm-tblgen -I llvm/include -I llvm/tools/clang/include -I llvm/lib/Target/RISCV -gen-instr-info llvm/lib/Target/RISCV/RISCV.td`
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.
Roman Lebedev [Tue, 7 May 2019 09:21:13 +0000 (09:21 +0000)]
[llvm-exegesis] InstructionBenchmark::writeYamlTo(): don't forget to flush()
This *APPEARS* to fix a *very* infuriating issue of Yaml's being corrupted,
partially written, truncated. Or at least i'm not seeing the issue
on a new benchmark sweep.
The issue is somewhat rare, happens maybe once in 1000 benchmarks.
Which means there are up to hundreds of broken benchmarks
for a full x86 sweep in a single mode.
Craig Topper [Tue, 7 May 2019 04:25:24 +0000 (04:25 +0000)]
[FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.
I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.
Fangrui Song [Tue, 7 May 2019 02:06:37 +0000 (02:06 +0000)]
[DebugInfo] Delete TypedDINodeRef
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.
Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.
Fangrui Song [Tue, 7 May 2019 01:39:37 +0000 (01:39 +0000)]
[SanitizerCoverage] Use different module ctor names for trace-pc-guard and inline-8bit-counters
Fixes the main issue in PR41693
When both modes are used, two functions are created:
`sancov.module_ctor`, `sancov.module_ctor.$LastUnique`, where
$LastUnique is the current LastUnique counter that may be different in
another module.
`sancov.module_ctor.$LastUnique` belongs to the comdat group of the same
name (due to the non-null third field of the ctor in llvm.global_ctors).
# 2 problems:
# 1) If sancov.module_ctor in this module is discarded, this group
# has a relocation to a discarded section. ld.bfd and gold will
# error. (Another issue: it is silently accepted by lld)
# 2) The comdat group has an unstable name that may be different in
# another translation unit. Even if the linker allows the dangling relocation
# (with --noinhibit-exec), there will be many undesired .init_array entries
COMDAT group section [ 25] `.group' [sancov.module_ctor.6] contains 2 sections:
[Index] Name
[ 26] .init_array.2
[ 27] .rela.init_array.2
By using different module ctor names, the associated comdat group names
will also be different and thus stable across modules.
The UnaryOperator class was originally placed in llvm/IR/Instructions.h, with the other UnaryInstructions. However, I'm now thinking that it makes more sense for it to live in llvm/IR/InstrTypes.h, with BinaryOperator. It is more similar to BinaryOperator than any of the other UnaryInstructions.
Craig Topper [Mon, 6 May 2019 23:57:42 +0000 (23:57 +0000)]
[X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched.
The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch.
If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller.
To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32.
Amy Huang [Mon, 6 May 2019 23:37:03 +0000 (23:37 +0000)]
Fix bug in getCompleteTypeIndex in codeview debug info
Summary:
When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because
the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map.
Eli Friedman [Mon, 6 May 2019 23:21:59 +0000 (23:21 +0000)]
[ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.
Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.
I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.
Philip Reames [Mon, 6 May 2019 22:09:31 +0000 (22:09 +0000)]
Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent it's creation.
Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.
Craig Topper [Mon, 6 May 2019 22:04:26 +0000 (22:04 +0000)]
[X86] Add more test cases for fast-isel handling of fneg.
The fneg double case is falling back to a subsd in 32-bit mode if you write a test that doesn't trigger a fast-isel abort on the return value.
The subsd lowering has different behavior with respect to nans than using an xor. This is inconsisent with what we would do in SelectionDAG
and can lead to differences between -O0 and -O2.
Craig Topper [Mon, 6 May 2019 21:39:51 +0000 (21:39 +0000)]
[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.
Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.
After this patch we should support the d/q for parsing, but not print it when its unneeded.
Sanjay Patel [Mon, 6 May 2019 20:34:05 +0000 (20:34 +0000)]
[InstCombine] sink FP negation of operands through select
We don't always get this:
Cond ? -X : -Y --> -(Cond ? X : Y)
...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.
Amara Emerson [Mon, 6 May 2019 19:41:01 +0000 (19:41 +0000)]
[GlobalISel] Handle <1 x T> vector return types properly.
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.
Craig Topper [Mon, 6 May 2019 19:29:24 +0000 (19:29 +0000)]
Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.
Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.
This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
Xing Xue [Mon, 6 May 2019 17:45:21 +0000 (17:45 +0000)]
Add libc++ to link XRay test cases if libc++ is used to build CLANG
Summary: When libc++ is used to build CLANG, its XRay libraries libclang_rt.xray-*.a have dependencies on libc++. Therefore, libc++ is needed to link and run XRay test cases. For Linux -rpath is also needed to specify where to load libc++. This change sets macro LLVM_LIBCXX_USED to 1 if libc++ is actually used in the build. XRay tests then check the flag and add -L<llvm_shlib_dir> -lc++ and -Wl,-rpath=<llvm_shlib_dir> if needed.
Nikita Popov [Mon, 6 May 2019 16:59:37 +0000 (16:59 +0000)]
[ConstantRange] Add srem() support
Add support for srem() to ConstantRange so we can use it in LVI. For
srem the sign of the result matches the sign of the LHS. For the RHS
only the absolute value is important. Apart from that the logic is
like urem.
Just like for urem this is only an approximate implementation. The tests
check a few specific cases and run an exhaustive test for conservative
correctness (but not exactness).
Nikita Popov [Mon, 6 May 2019 16:17:17 +0000 (16:17 +0000)]
[SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).
[PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.
[llvm-c-test] Make include-all.c do what its name says it does
The purpose of this file is to make sure that all includes in llvm-c
works when included from a C source file (i.e no C++isms sneaked in).
To do this it must actually include all the include files.