Craig Topper [Mon, 4 Feb 2019 04:15:02 +0000 (04:15 +0000)]
Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more, this was overly aggressive. If the register is encoded in the instruction we should print %st(0); if it's implicit we should print %st.
I'll be making a more directed change in a future patch.
Compute the correct symbol size in llvm-nm even without --print-size
In llvm-nm, the symbol size was being computed only with --print-size option,
even though it was being printed in other cases, such as with --format=posix.
This patch simply removes the guard, so that the size is computed
independently of the later decision to print it or not.
Sanjay Patel [Sun, 3 Feb 2019 17:53:09 +0000 (17:53 +0000)]
[CGP] adjust target constraints for forming uaddo
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
Sanjay Patel [Sun, 3 Feb 2019 16:16:48 +0000 (16:16 +0000)]
[PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.
That patch also corrects the inc vs. add regressions seen with the previous attempt at this.
Still, we want to make this matcher complete, so we can potentially canonicalize the pattern
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4
commuted variants of uaddo.
There's also a test with a crazy type to show that the existing CGP transform based on this
matcher is not limited by target legality checks.
I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.
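As a rough illustration of the arithmetic behind these patterns (a standalone sketch, not the PatternMatch code; the function names `uaddo`, `uaddoInc` and `checkIncAgrees` are hypothetical), an unsigned add overflows exactly when the wrapped sum is smaller than an operand, and an increment by one overflows exactly when the result wraps to zero:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Generic unsigned add-with-overflow: returns {wrapped sum, overflow flag}.
// Unsigned arithmetic wraps modulo 2^32, so overflow shows up as sum < a.
std::pair<uint32_t, bool> uaddo(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return {sum, sum < a};
}

// Increment special case: x + 1 overflows only when the result wraps to 0.
std::pair<uint32_t, bool> uaddoInc(uint32_t x) {
  uint32_t sum = x + 1u;
  return {sum, sum == 0u};
}

// The special-case form must agree with the generic form when b == 1.
void checkIncAgrees(uint32_t x) {
  assert(uaddoInc(x) == uaddo(x, 1u));
}
```

The compare against zero is the shape that differs from the generic `sum < a` check, which is why an increment needs its own match.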
Sanjay Patel [Sun, 3 Feb 2019 13:48:03 +0000 (13:48 +0000)]
[CGP] refactor optimizeCmpExpression (NFCI)
This is not truly NFC because we are bailing out without
a TLI now. That should not be a real concern though because
there should be a TLI in any real-world scenario.
That seems better than passing around a pointer and then
checking it for null-ness all over the place.
The motivation is to fix what appears to be an unintended
restriction on the uaddo transform -
hasMultipleConditionRegisters() shouldn't be reason to limit
the transform.
Craig Topper [Sun, 3 Feb 2019 07:53:39 +0000 (07:53 +0000)]
[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found when I attempted to make emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.
This also matches what objdump disassembly prints. It's also what is printed by gcc -S.
Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles.
This exposes a couple of minor issues that will be fixed shortly:
Missed broadcast folds - we have a mixture of vzext_load lengths that need cleaning up
combine-sdiv.ll - AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts).
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts an i128 -1 constant or something and the whole thing asserts.
I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.
Florian Hahn [Sat, 2 Feb 2019 14:42:27 +0000 (14:42 +0000)]
[LCSSA] Add expensive verification of LCSSA form for sub-loops.
This assertion makes sure all sub-loops are in LCSSA form before
bringing their parent in LCSSA form. This precondition was added to
formLCSSA in D56848.
Yonghong Song [Sat, 2 Feb 2019 05:54:59 +0000 (05:54 +0000)]
[BPF] [BTF] Process FileName with absolute path correctly
In IR, sometimes the following attributes for DIFile may be
generated:
filename: /home/yhs/test.c
directory: /tmp
The /tmp may represent the working directory of the compilation
process.
In such cases, since filename is with absolute path,
the directory should be ignored by BTF. The filename alone is
enough to get the source.
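The rule above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: `sourcePath` is a hypothetical name, paths are assumed to be POSIX-style (absolute means a leading `/`), and the actual BTF code may be structured differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: choose the path used to locate the source file
// from a DIFile-like (directory, filename) pair.
std::string sourcePath(const std::string &directory,
                       const std::string &filename) {
  assert(!filename.empty() && "this sketch expects a non-empty filename");
  // An absolute filename is already enough; the directory is ignored.
  if (filename.front() == '/')
    return filename;
  // A relative filename with no directory is returned unchanged.
  if (directory.empty())
    return filename;
  // Otherwise the filename is interpreted relative to the directory.
  return directory + "/" + filename;
}
```

With the attributes from the message, `sourcePath("/tmp", "/home/yhs/test.c")` yields `/home/yhs/test.c`, so `/tmp` plays no part in finding the source.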
[llvm-objcopy] Temporarily limit one test to darwin
Some triples in llvm-mc appear to be unavailable on some buildbots.
To please those buildbots we temporarily limit the test to darwin
(where the required triple is guaranteed to be available)
until we find the right solution.
Julian Lettner [Sat, 2 Feb 2019 02:05:16 +0000 (02:05 +0000)]
[ASan] Do not instrument other runtime functions with `__asan_handle_no_return`
Summary:
Currently, ASan inserts a call to `__asan_handle_no_return` before every
`noreturn` function call/invoke. This is unnecessary for calls to other
runtime functions. This patch changes ASan to skip instrumentation for
function calls marked with `!nosanitize` metadata.
Yonghong Song [Fri, 1 Feb 2019 23:23:17 +0000 (23:23 +0000)]
[BPF] [BTF] Process FileName with absolute path correctly
In IR, sometimes the following attributes for DIFile may be
generated:
filename: /home/yhs/test.c
directory: /tmp
The /tmp may represent the working directory of the compilation
process.
In such cases, since filename is with absolute path,
the directory should be ignored by BTF. The filename alone is
enough to get the source.
Philip Reames [Fri, 1 Feb 2019 22:58:52 +0000 (22:58 +0000)]
[CodeGen] Be as conservative about atomic accesses as for volatile
Background: At the moment, we record the AtomicOrdering of an access in the MMO, but also mark any atomic access as volatile in SelectionDAG. I'm working towards separating that. See https://reviews.llvm.org/D57601 for context.
Update all usages of isVolatile in lib/CodeGen to preserve behaviour once atomic MMOs stop being also volatile. This is NFC in its current form, but is essential for correctness once we make that final change.
It is useful to keep in mind that AtomicSDNode is not a parent of LoadSDNode, StoreSDNode, or LSBaseSDNode. As a result, any call to isVolatile on one of those static types doesn't need a companion isAtomic check. We should probably adjust that class hierarchy long term, but for now, that separation is useful.
I'm deliberately being conservative about handling. I want the change to stop adding volatile to be NFC itself, and then will work through places where we can be less conservative for atomics one by one in separate changes w/tests.
[DebugInfo] Don't use realpath when looking up debug binary locations.
Summary:
Using realpath makes assumptions about build systems that do not always hold true. The debug binary referred to from the .gnu_debuglink should exist in the same directory (or in a .debug directory, etc.), but the files may only exist as symlinks to differently named files elsewhere, and using realpath causes that lookup to fail.
This was added in r189250, and this is basically a revert + regression test case.
gn build: Create regular archives for the sanitizer runtimes.
We'll need to do this eventually if we create an installable package.
For now, this lets me use the archives to build Android, whose build
system wants to copy the archives to another location.
Philip Reames [Fri, 1 Feb 2019 19:08:59 +0000 (19:08 +0000)]
Fix a bug in the definition of isUnordered on MachineMemOperand
Background: At the moment, we record the AtomicOrdering of an access in the MMO, but also mark any atomic access as volatile in SelectionDAG. GlobalISEL keeps the two separate, but currently doesn't know how to lower an atomic G_LOAD at all. See https://reviews.llvm.org/D57601 for context.
The definition used for unordered was only checking volatility, not atomicity. As noted above, all atomic MMOs are currently also volatile, so this is a latent bug only. Copy the definition used in IR, after auditing the two (2) uses of the function to be sure the desired semantics are the same.
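A minimal sketch of the difference between the two definitions, using simplified stand-in types (`Ordering`, `MemOperand`, `isUnorderedOld` and `demo` are hypothetical names, not the LLVM classes). The second predicate follows the IR-style rule that an access is unordered only if it is non-volatile and its ordering is no stronger than unordered:

```cpp
#include <cassert>

// Simplified stand-in for the atomic ordering recorded on a memory operand.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release };

struct MemOperand {
  Ordering ordering;
  bool isVolatile;

  MemOperand(Ordering o, bool v) : ordering(o), isVolatile(v) {}

  // Old shape: looks at volatility only.
  bool isUnorderedOld() const { return !isVolatile; }

  // IR-style shape: checks both the ordering and volatility.
  bool isUnordered() const {
    return (ordering == Ordering::NotAtomic ||
            ordering == Ordering::Unordered) &&
           !isVolatile;
  }
};

void demo() {
  // While every atomic operand is also volatile, both definitions agree.
  MemOperand atomicToday(Ordering::Acquire, /*isVolatile=*/true);
  assert(atomicToday.isUnorderedOld() == atomicToday.isUnordered());

  // Once atomics stop being volatile, only the new definition is right.
  MemOperand atomicLater(Ordering::Acquire, /*isVolatile=*/false);
  assert(atomicLater.isUnorderedOld());
  assert(!atomicLater.isUnordered());
}
```

The first case in `demo` is why the bug is only latent today; the second is the situation the fix prepares for.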
Matt Davis [Fri, 1 Feb 2019 18:51:10 +0000 (18:51 +0000)]
[llvm-readobj] Add a flag to dump just the section-to-segment mapping.
Summary:
The following patch introduces a new function `printSectionMapping` which is responsible for dumping just the section-to-segment mapping.
This patch also introduces an option `-section-mapping` that outputs that mapping without the program headers.
Previously, this functionality was controlled by `printProgramHeaders`, and the output from `-program-headers` has not been changed. I am happy to change the option name; I copied the name that was displayed when outputting the mapping table.
Nico Weber [Fri, 1 Feb 2019 18:17:19 +0000 (18:17 +0000)]
gn build: Add a missing dependency from llvm/test to llvm-lit
check-llvm already listed llvm-lit as a script, which counts as a dep, so running
check-llvm worked fine, but `ninja -C out/gn llvm/test` didn't build llvm-lit
before if it wasn't already there.
Matt Davis [Fri, 1 Feb 2019 17:38:08 +0000 (17:38 +0000)]
[llvm-nm] Report '.comment' ELF sections as 'n' instead of '?'
Summary:
The previous implementation reported `.comment` sections as '?'.
GNU uses 'n' which means "The symbol is a debugging symbol." `.note` sections are represented as 'n' too.
The test related to this change was updated to CHECK-NEXT to ensure
order and that we did not miss any symbols in the dump.
[llvm-objcopy][NFC] More error propagation (executeObjcopyOnArchive)
Summary:
Replace some reportError() calls with error propagation that was missed from rL352625.
Note this also adds an error check during Archive iteration that was being hidden by a different error check before:
```
for (const Archive::Child &Child : Ar.children(Err)) {
  Expected<std::unique_ptr<Binary>> ChildOrErr = Child.getAsBinary();
  if (!ChildOrErr)
    // This aborts, so Err is never checked
    reportError(Ar.getFileName(), ChildOrErr.takeError());
```
Err is being checked after the loop, so during happy runs, everything is fine. But when reportError is changed to return the error instead of aborting, the fact that Err is never checked is now noticed in tests that trigger an error during the loop.
Tim Corringham [Fri, 1 Feb 2019 16:51:09 +0000 (16:51 +0000)]
[AMDGPU] Fix for vector element insertion
Summary:
Incorrect code was generated when lowering insertelement operations
for vectors with 8 or 16 bit elements. The value being inserted was
not adjusted for the position of the element within the 32 bit word
and so only the low element within each 32 bit word could receive
the intended value.
Fixed by simply replicating the value to each element of a
congruent vector before the mask and or operation used to
update the intended element.
A number of affected LIT tests have been updated appropriately.
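The fix can be modelled on a single 32-bit word holding two 16-bit elements. This is a scalar sketch of the idea only, not the AMDGPU lowering code; `insert16` and `insert16Unshifted` are hypothetical names, and the second function is one plausible reading of the faulty behaviour described above.

```cpp
#include <cassert>
#include <cstdint>

// Insert a 16-bit value into lane idx (0 = low half, 1 = high half) of a
// 32-bit word by replicating the value into every lane, then mask-and-or.
uint32_t insert16(uint32_t word, uint16_t value, unsigned idx) {
  assert(idx < 2 && "a 32-bit word holds two 16-bit lanes");
  uint32_t splat = (uint32_t(value) << 16) | value; // value in both lanes
  uint32_t mask = 0xFFFFu << (16 * idx);            // selects the target lane
  return (word & ~mask) | (splat & mask);
}

// Faulty shape: the value is not positioned for the target lane, so the
// mask discards it whenever idx != 0.
uint32_t insert16Unshifted(uint32_t word, uint16_t value, unsigned idx) {
  assert(idx < 2 && "a 32-bit word holds two 16-bit lanes");
  uint32_t mask = 0xFFFFu << (16 * idx);
  return (word & ~mask) | (uint32_t(value) & mask);
}
```

Because the replicated value is present in every lane, the same mask-and-or works for any element index without a variable shift of the inserted value.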
Sanjay Patel [Fri, 1 Feb 2019 16:06:53 +0000 (16:06 +0000)]
[SDAG] improve variable names; NFC
The version of FoldConstantArithmetic() that takes arbitrary nodes
was confusingly naming those nodes as constants when they might
not be; also "Cst" reads like "Cast".
Simon Pilgrim [Fri, 1 Feb 2019 16:02:12 +0000 (16:02 +0000)]
[X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section.
For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts).
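To see why two or three whole-vector byte shifts are enough, here is a portable model of the two instructions on a 16-byte array. It is a sketch only: `shiftLeftBytes`, `shiftRightBytes` and `zeroEnds` are hypothetical names, and the shift order shown is one sequence that works, not necessarily the one the lowering emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<uint8_t, 16>;

// Model of PSLLDQ: move bytes n positions towards the high end and fill
// the vacated low bytes with zero.
Vec shiftLeftBytes(const Vec &v, std::size_t n) {
  assert(n <= 16);
  Vec r{}; // zero-initialised
  for (std::size_t i = n; i < 16; ++i)
    r[i] = v[i - n];
  return r;
}

// Model of PSRLDQ: move bytes n positions towards the low end and fill
// the vacated high bytes with zero.
Vec shiftRightBytes(const Vec &v, std::size_t n) {
  assert(n <= 16);
  Vec r{};
  for (std::size_t i = 0; i + n < 16; ++i)
    r[i] = v[i + n];
  return r;
}

// Zero the lowest `lo` and highest `hi` bytes, leaving the inner bytes in
// place. With lo == 0 the first shift is a no-op, so two shifts suffice.
Vec zeroEnds(const Vec &v, std::size_t lo, std::size_t hi) {
  assert(lo + hi <= 16);
  Vec t = shiftRightBytes(v, lo);  // discard the low bytes
  t = shiftLeftBytes(t, lo + hi);  // discard the high bytes
  return shiftRightBytes(t, hi);   // move the inner section back in place
}
```

Each shift pushes unwanted bytes off one end and brings zeros in at the other, so no mask constant has to be loaded.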
Sanjay Patel [Fri, 1 Feb 2019 15:35:12 +0000 (15:35 +0000)]
[TargetLowering] try harder to determine undef elements of vector binops
This might be the start of tracking all vector element constants generally if we take it to its
logical conclusion, but let's stop here and make sure this is correct/beneficial so far.
The affected tests require a convoluted path before they get simplified currently because we
don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands
directly in SimplifyDemandedVectorElts().
That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So
something like this is happening:
1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that
originates from a shuffle.
2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes.
3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again.
4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing
those to become full undef constants.
5. Simplify the binop now that it has a full undef operand.
As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution.
We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to
track NaN for FP ops too (assuming we don't have fast-math-flags set).
Sanjay Patel [Fri, 1 Feb 2019 14:37:49 +0000 (14:37 +0000)]
[InstCombine] reduce duplicate code; NFC
An unused variable problem was introduced with rL352870
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather
than just the assert.
Sanjay Patel [Fri, 1 Feb 2019 14:14:47 +0000 (14:14 +0000)]
[InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
If we can reduce the x86-specific intrinsic to the generic op, it allows existing
simplifications and value tracking folds. AFAICT, this always results in identical
x86 codegen in the non-reduced case...which should be true because we semi-generically
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must
already combine/lower this intrinsic as expected.
This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.
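The equivalence being exploited can be checked with a portable model of add-with-carry. This is a sketch, not the intrinsic itself: `addcarry64` is a hypothetical stand-in that mirrors the argument order of `_addcarry_u64`, and `uaddo64` is a hypothetical stand-in for the generic overflow op.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 64-bit add-with-carry: writes the wrapped sum to *out and
// returns the carry-out (0 or 1). Any non-zero carryIn counts as 1.
uint8_t addcarry64(uint8_t carryIn, uint64_t a, uint64_t b, uint64_t *out) {
  assert(out != nullptr);
  uint64_t partial = a + b;
  bool carry1 = partial < a;            // carry from a + b
  uint64_t sum = partial + (carryIn ? 1u : 0u);
  bool carry2 = sum < partial;          // carry from adding the carry-in
  *out = sum;
  return static_cast<uint8_t>(carry1 || carry2);
}

// Generic unsigned add-with-overflow.
bool uaddo64(uint64_t a, uint64_t b, uint64_t *out) {
  assert(out != nullptr);
  *out = a + b;
  return *out < a;
}
```

With a carry-in of zero the two functions compute the same sum and flag, and with a zero addend as well the sum is just the input and the carry is always clear, which matches the bug-report case folding away.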
Stefan Granitz [Fri, 1 Feb 2019 13:08:09 +0000 (13:08 +0000)]
[CMake] Add install targets for utilities to LLVM exports if LLVM_INSTALL_UTILS=ON
Summary: D56606 was only appending target names to the `LLVM_EXPORTS`/`LLVM_EXPORTS_BUILDTREE_ONLY` properties. Targets showed up correctly in the build-tree `LLVMExports.cmake`, but they were missing in the installed one (as we found in https://bugs.llvm.org/show_bug.cgi?id=40443), because install did not register them explicitly.
This patch changes isFPImmLegal to also return true if the value can be encoded
as the immediate operand of a logical instruction, in addition to checking
whether it fits the immediate field of fmov.
This optimizes some floating-point materialization, including values
used in isinf lowering.
Ilya Biryukov [Fri, 1 Feb 2019 11:20:13 +0000 (11:20 +0000)]
Disable tidy checks with too many hits
Summary:
Some tidy checks have too many hits in the codebase, making it hard to spot
other results from clang-tidy, therefore rendering the tool less useful.
Two checks were disabled:
- misc-non-private-member-variable-in-classes in the whole LLVM monorepo,
it is very common to have those in LLVM and the style guide does not forbid
them.
- readability-identifier-naming in the clang subtree. There are thousands of
violations in 'Sema.h' alone.
Before the change, 'Sema.h' had >1000 tidy warnings, after the change the number
dropped to 3 warnings (unterminated namespace comments).
Roman Lebedev [Fri, 1 Feb 2019 11:15:13 +0000 (11:15 +0000)]
[X86][BdVer2] Transfer delays from the integer to the floating point unit.
Summary:
I'm unable to find this number in the "AMD SOG for family 15h".
llvm-exegesis measures the latencies of these instructions as `2`,
which matches the latencies specified in "AMD SOG for family 15h".
However if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver,
Steamroller and Excavator pipeline", "Data delay between different execution
domains", the int->ivec transfer is listed as `8`..`10`cy of additional latency.
Also, Agner's "Instruction tables", for Piledriver, lists their latencies as `12`,
which is consistent with `2cy` from exegesis / AMD SOG + `10cy` transfer delay.
Additional data point comes from the fact that Agner's "Instruction tables",
for Jaguar, lists their latencies as `8`; and "AMD SOG for family 16h" does
state the `+6cy` int->ivec delay, which is consistent with instr latency of `1` or `2`.
Yevgeny Rouban [Fri, 1 Feb 2019 10:44:43 +0000 (10:44 +0000)]
Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows more specific messages to be reported for cases where callsites
are not viable for inlining.
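The general shape of such a change can be sketched with a small result type that still converts to bool but carries a reason on failure. Everything here is hypothetical and simplified (`Result`, `Callee`, `isViable`, `describe`, and the two example reasons are illustrative names, not the InlineCost API):

```cpp
#include <cassert>
#include <string>

// A bool-like result that also records why a check failed.
class Result {
  const char *Reason; // nullptr means success
public:
  Result(bool Ok) : Reason(Ok ? nullptr : "(unknown reason)") {}
  Result(const char *Why) : Reason(Why) {
    assert(Why != nullptr && "a failure needs a reason");
  }
  explicit operator bool() const { return Reason == nullptr; }
  const char *reason() const { return Reason; }
};

// Hypothetical properties of a callee that could block inlining.
struct Callee {
  bool HasIndirectBranch;
  bool ReturnsTwice;
};

// Returns success, or the first reason the callee cannot be inlined.
Result isViable(const Callee &F) {
  if (F.HasIndirectBranch)
    return "contains an indirect branch";
  if (F.ReturnsTwice)
    return "calls a returns-twice function";
  return true;
}

// Callers that used to see only `false` can now report the reason.
std::string describe(const Callee &F) {
  Result R = isViable(F);
  if (R)
    return "viable";
  return std::string("not viable: ") + R.reason();
}
```

Existing callers that only test the result in a condition keep working, while diagnostics can call `reason()` to say which check failed.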
James Henderson [Fri, 1 Feb 2019 10:24:55 +0000 (10:24 +0000)]
[llvm-symbolizer][test] Rename and tweak tests using llvm-symbolizer
Prior to this change, there are a few tests called llvm-symbolizer* in
the DebugInfo test area. These really were testing either the DebugInfo
or Symbolizer library, rather than the llvm-symbolizer tool itself, so
this patch renames them to be clearer that they aren't explicitly tests
for llvm-symbolizer (such tests belong in test/tools/llvm-symbolizer).
This patch also reinstates the copying of a DWO file, removed previously
in r352752. The test needs this so that it could possibly fail.
Finally, some of the tests have been simplified slightly by removing
unnecessary switches and/or unused check-prefixes.
James Henderson [Fri, 1 Feb 2019 10:02:42 +0000 (10:02 +0000)]
[doc]Update String Error documentation in Programmer Manual
A while back, createStringError was added to provide easier construction
of StringError instances, especially with formatting options. Prior to
this patch, the documentation only mentioned the standard method of
using it. Since createStringError is slightly shorter to type, and also
provides the formatting options, this patch updates the Programmer's
Manual to use the new function in its examples, and to mention the
printf formatting options. It also fixes a small typo in one of the
examples and removes the unnecessary make_error_code call.
Oliver Stannard [Fri, 1 Feb 2019 09:23:51 +0000 (09:23 +0000)]
[CodeGen] Don't scavenge non-saved regs in exception throwing functions
Previously, LiveRegUnits was assuming that if a block has no successors
and does not return, then no registers are live at the end of it
(because the end of the block is unreachable). This was causing the
register scavenger to use callee-saved registers to materialise stack
frame addresses without saving them in the prologue. This would normally
be fine, because the end of the block is unreachable, but this is not
legal if the block ends by throwing a C++ exception. If this happens,
the scratch register will be modified, but its previous value won't be
preserved, so it doesn't get restored by the exception unwinder.