Jeremy Morse [Mon, 19 Aug 2019 09:53:07 +0000 (09:53 +0000)]
[DebugInfo] Make postra sinking of DBG_VALUEs subregister-safe
Currently the machine instruction sinker identifies DBG_VALUE insts that
also need to sink by comparing register numbers. Unfortunately this isn't
safe, because (after register allocation) a DBG_VALUE may read a register
that aliases what's being sunk. To fix this, identify the DBG_VALUEs that
need to sink by recording & examining their register units. Register units
give us the following guarantee:
"Two registers overlap if and only if they have a common register unit"
[MCRegisterInfo.h]
Thus we can always identify aliasing DBG_VALUEs if the set of register
units read by the DBG_VALUE, and the register units of the instruction
being sunk, intersect. (MachineSink already uses classes like
"LiveRegUnits" for determining sinking validity anyway).
The test added checks for super and subregister DBG_VALUE reads of a sunk
copy being sunk as well.
Jeremy Morse [Mon, 19 Aug 2019 09:02:18 +0000 (09:02 +0000)]
[DebugInfo] Test for variable range un-coalescing
LiveDebugVariables can coalesce ranges of variable locations across
multiple basic blocks. However when it recreates DBG_VALUE instructions,
it has to recreate one DBG_VALUE per block, otherwise it doesn't
represent the pre-regalloc layout and variable assignments can go missing.
This feature works -- however while mucking around with LiveDebugVariables,
I commented the relevant code out and no tests failed. Thus, here's a
test that checks LiveDebugVariables preserves DBG_VALUEs across block
boundaries.
Fangrui Song [Mon, 19 Aug 2019 06:17:30 +0000 (06:17 +0000)]
[MC] Don't emit .symver redirected symbols to the symbol table
GNU as keeps the original symbol in the symbol table for defined @ and
@@, but suppresses it in other cases (@@@ or undefined). The original
symbol is usually undesired:
In a shared object, the original symbol can be localized with a version
script, but in an archive it is hard to remove/localize; doing so requires either:
1) a post-processing step that removes the undesired original symbol, or
2) building the consumers (executables) of the archive with the
version script.
Moreover, it can cause linker issues like binutils PR/18703 if the
original symbol name and the base name of the versioned symbol are the
same (both ld.bfd and gold have some code to work around defined @ and
@@). In lld, if it sees f and f@v1:
--version-script =(printf 'v1 {};') => f and f@v1
--version-script =(printf 'v1 { f; };') => f@v1 and f@@v1
It can be argued that @@@, added on 2000-11-13, corrected the @ and @@ mistake.
This patch catches some more multiple version errors (defined @ and @@),
and consistently suppresses the original symbol. This addresses all the
problems listed above.
If the user wants other aliases to the versioned symbol, they can copy
the original symbol to other symbol names with the .set directive, e.g.
.symver f, f@v1 # emit f@v1 but not f into .symtab
.set f_impl, f # emit f_impl into .symtab
Craig Topper [Mon, 19 Aug 2019 05:45:39 +0000 (05:45 +0000)]
[X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT.
We can support these by widening to a supported type,
then shifting all the way to the left and then
back to the right to ensure that we shift in zeroes.
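The trick is easy to illustrate on a plain 8-bit integer (this is not the lowering code itself; the helper below is hypothetical): shift the widened mask all the way left, then logically back to the right, and zeroes are guaranteed to fill in.

  #include <cassert>
  #include <cstdint>

  // Right-shift a 4-bit mask by N (N <= 4) while shifting in zeroes, using an
  // 8-bit "widened" value: move the valid bits to the top, then shift back.
  uint8_t shiftRightWithZeroFill(uint8_t Mask4, unsigned N) {
    uint8_t Widened = Mask4;               // analogue of widening v4i1 to v8i1
    uint8_t Left = uint8_t(Widened << 4);  // valid bits now occupy bits 4..7
    return uint8_t(Left >> (4 + N));       // logical shift fills with zeroes
  }

  int main() {
    assert(shiftRightWithZeroFill(0b1011, 1) == 0b0101);
    return 0;
  }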
Seiya Nuta [Mon, 19 Aug 2019 05:41:33 +0000 (05:41 +0000)]
[llvm-objcopy][MachO] Implement a layout algorithm for executables
Summary: The layout algorithms for relocatable objects and for executables are somewhat different. This patch implements the latter, based on the algorithm in LLD (MachOFileLayout).
Seiya Nuta [Mon, 19 Aug 2019 05:37:38 +0000 (05:37 +0000)]
[llvm-objcopy][MachO] Support load commands used in executables/shared libraries
Summary:
This patch implements copying some load commands that appear in executables/shared libraries such as the indirect symbol table.
I intentionally did not add tests because this patch is incomplete: we need a layout algorithm for executables/shared libraries. I'll submit that as a separate patch with tests.
Craig Topper [Mon, 19 Aug 2019 04:08:44 +0000 (04:08 +0000)]
[X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node.
Not sure how to test this: we have tests that exercise this code,
but nothing failed when the types didn't match. Since all the k-registers
use equivalent register classes, everything just ends up working.
Craig Topper [Mon, 19 Aug 2019 04:08:40 +0000 (04:08 +0000)]
[X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef.
This allows us to widen the type when the KSHIFTR instruction
doesn't exist for the type. If we need to shift in zeroes into
the upper elements we would need more work to guarantee zeroes
when widening.
Craig Topper [Mon, 19 Aug 2019 00:39:18 +0000 (00:39 +0000)]
[X86] Add test case for missed opportunity to recognize a vXi1 shuffle as an insert into a zero vector.
We are currently missing this because shuffle canonicalization
puts the zero vector as V1 and the subvector as V2. Our current
code doesn't recognize this case.
Craig Topper [Sun, 18 Aug 2019 23:30:11 +0000 (23:30 +0000)]
[X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors.
For such a case we should only need a KSHIFTL, but we were
previously generating a KSHIFTL followed by a KSHIFTR because
we mistakenly believed we needed to zero the undef elements.
Hubert Tong [Sun, 18 Aug 2019 22:02:24 +0000 (22:02 +0000)]
[cmake] Move blocks out of redundant else( MSVC ); NFC
Address post-commit comment on D66256 regarding the `else( MSVC )` block
containing only blocks guarded with `LLVM_COMPILER_IS_GCC_COMPATIBLE`,
which would imply `NOT MSVC`.
Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead.
As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so this commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises.
It also enables support in matchVectorShuffleWithPACK.
Craig Topper [Sun, 18 Aug 2019 06:28:06 +0000 (06:28 +0000)]
[TargetLowering] Teach computeRegisterProperties to only widen v3i16/v3f16 vectors to the next power of 2 type if that's legal.
These were recently made simple types. This restores their
behavior back to something like their EVT legalization.
We might be able to fix the code in type legalization where the
assert was failing, but I didn't investigate too much as I had
already looked at the computeRegisterProperties code during the
review for v3i16/v3f16.
Most of the test changes restore the X86 codegen back to what
it looked like before the recent change. The test case in
vec_setcc.ll is a reduced version of the reproducer from
the fuzzer.
Matt Arsenault [Sun, 18 Aug 2019 00:20:43 +0000 (00:20 +0000)]
AMDGPU: Disambiguate v3f16 format in load/store tables
Currently the searchable tables report the number of dwords. These
round to the same number for 3 and 4 component d16
instructions. Change this to report the number of elements so this
isn't ambiguous.
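The ambiguity is easy to see with a little arithmetic (plain C++, not the TableGen change itself):

  #include <cstdio>

  int main() {
    // d16 formats use 16-bit elements; dwords are 32 bits.
    for (unsigned Elts = 1; Elts <= 4; ++Elts) {
      unsigned Dwords = (Elts * 16 + 31) / 32;  // round up to whole dwords
      std::printf("d16 x%u -> %u dword(s)\n", Elts, Dwords);
    }
    return 0;  // x3 and x4 both report 2 dwords, hence the ambiguity
  }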
Craig Topper [Sat, 17 Aug 2019 22:46:15 +0000 (22:46 +0000)]
[X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore.
This prevents us from emitting a separate truncate and a truncating
store instruction.
Yonghong Song [Sat, 17 Aug 2019 22:12:00 +0000 (22:12 +0000)]
[BPF] Fix bpf llvm-objdump issues.
Commit https://reviews.llvm.org/D57939 ("[DWARF] Refactor
RelocVisitor and fix computation of SHT_RELA-typed relocation entries")
made a change for relocation resolution when operating
on an object file.
The change unfortunately broke BPF: given SymbolValue (S) and
Addend (A), the relocation was previously resolved to
S + A
and after the change, it is resolved to
S
This patch fixes the issue by resolving the relocation correctly.
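For reference, a minimal sketch of the SHT_RELA-style resolution being restored (simplified stand-in types, not the actual RelocationResolver code):

  #include <cstdint>

  struct RelaEntry {
    uint64_t Offset;  // where the relocation applies
    int64_t Addend;   // A, stored in the relocation entry itself
  };

  uint64_t resolveRela(uint64_t SymbolValue /* S */, const RelaEntry &R) {
    return SymbolValue + uint64_t(R.Addend);  // must be S + A, not just S
  }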
It looks like not all relocation resolution reaches here, and I did not
trace down exactly when. But I did find that if the object file includes
code in two ELF sections other than the default ".text",
the above bug is triggered.
This patch includes a trivial two-function source file to
demonstrate the issue. The relocation for .debug_loc is resolved
incorrectly because of this, and llvm-objdump cannot display
source-annotated assembly.
Roman Lebedev [Sat, 17 Aug 2019 21:35:33 +0000 (21:35 +0000)]
[NFC][InstCombine] Some tests for 'shift amount reassoc in bit test - trunc-of-lshr' (PR42399)
Finally, the fold I was looking forward to :)
The legality check is muddy; I doubt I've grokked the full generalization,
but it handles all the cases I care about, and can come up with:
https://rise4fun.com/Alive/26j
George Rimar [Sat, 17 Aug 2019 16:07:18 +0000 (16:07 +0000)]
Recommit r369190 "[llvm-readobj/llvm-readelf] - Improve/cleanup the error reporting API."
Fix: add back a `consumeError` call in 'printStackSize' that was removed by mistake;
this should fix the "Expected<T> must be checked before access or destruction." error reported by the following bot:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/9743/steps/stage%201%20check/logs/stdio
Original commit message:
Currently we have the following functions for error reporting:
George Rimar [Sat, 17 Aug 2019 15:24:16 +0000 (15:24 +0000)]
[llvm-readobj] - An attempt to fix BB after r369191.
A few build bots failed with the following error:
Command Output (stderr):
--
/home/buildbots/ppc64be-clang-lnt-test/clang-ppc64be-lnt/llvm/test/tools/llvm-readobj/stack-sizes.test:263:19: error: BADSECTION-OUT: expected string not found in input
# BADSECTION-OUT: 8 ?
^
<stdin>:4:1: note: scanning from here
^
It doesn't reproduce on the Ubuntu/Windows machines I have. Also, it seems many
of the bots are happy too.
This slightly reorders the code to make the fouts().flush() call happen earlier,
like it did before r369191.
Kang Zhang [Sat, 17 Aug 2019 14:37:05 +0000 (14:37 +0000)]
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:
Fix a bug with predecessors.
The `block-placement` pass creates some patterns of unconditional branches for which we can do the simple early return.
But the `early-ret` pass runs before `block-placement`, and we don't want to run it again.
This patch does the simple early return to optimize the blocks at the end of `block-placement`.
Troy A. Johnson [Sat, 17 Aug 2019 14:20:41 +0000 (14:20 +0000)]
[circular_raw_ostream] Delegate is_displayed to contained stream
raw_ostream has an is_displayed() member function that determines if the stream
is connected to a console for display or is connected to a file/pipe. By
default, is_displayed() returns false, and derived classes like raw_fd_ostream
override it. Because circular_raw_ostream wraps another stream, its result for
is_displayed() should be the same as that of the stream it wraps.
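A simplified sketch of the delegation (stand-in classes, not the exact raw_ostream hierarchy):

  struct Stream {
    virtual ~Stream() = default;
    virtual bool is_displayed() const { return false; }  // raw_ostream default
  };

  struct ConsoleStream : Stream {
    bool is_displayed() const override { return true; }  // e.g. an fd on a tty
  };

  struct CircularStream : Stream {
    Stream *Contained;
    explicit CircularStream(Stream *S) : Contained(S) {}
    // Delegate: report whatever the wrapped stream reports.
    bool is_displayed() const override { return Contained->is_displayed(); }
  };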
Alina Sbirlea [Sat, 17 Aug 2019 01:02:12 +0000 (01:02 +0000)]
[MemorySSA] Loop passes should mark MSSA preserved when available.
This patch applies only to the new pass manager.
Currently, when the MSSA analysis is available and passed to each loop pass, it will be preserved by that loop pass.
Hence, mark the analysis preserved based on that condition, rather than on the current `EnableMSSALoopDependency`. This leaves the global flag to affect only the entry point into the loop pass manager (in FunctionToLoopPassAdaptor).
Petr Hosek [Sat, 17 Aug 2019 00:07:26 +0000 (00:07 +0000)]
[llvm-readobj] Unwrap the value first to avoid the error
This addresses the issue introduced in r369169, we need to unwrap
the value first before we can check whether it's empty. This also
swaps the two branches to put the common path first which should
be NFC.
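The shape of the fix, sketched with std::optional and a hypothetical helper as stand-ins for the Expected-style wrapper used in llvm-readobj:

  #include <optional>
  #include <string>

  std::string sectionName(const std::optional<std::string> &Name) {
    // Common path first: unwrap, then ask whether the contained value is empty.
    if (Name && !Name->empty())
      return *Name;
    return "<empty>";
  }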
As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.
We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).
Sanjay Patel [Fri, 16 Aug 2019 23:10:34 +0000 (23:10 +0000)]
[CodeGenPrepare] Fix use-after-free
If OptimizeExtractBits() encountered a shift instruction with no users at all,
it would erase the instruction, but still return false.
This previously didn’t matter because its caller would always return after
processing the instruction, but https://reviews.llvm.org/D63233 changed the
function’s caller to fall through if it returned false, which would then cause
a use-after-free detectable by ASAN.
This change makes OptimizeExtractBits return true if it removes a shift
instruction with no users, terminating processing of the instruction.
Eli Friedman [Fri, 16 Aug 2019 22:20:14 +0000 (22:20 +0000)]
[ARM] Preserve liveness in ARMConstantIslands.
We currently don't use liveness information after this point, but it can
be useful to catch bugs using -verify-machineinstrs, and optimizations
could potentially use this information in the future.
[Attributor] Fix: Do not partially resolve returned calls.
By partially resolving returned calls, we did not record that they were
not fully resolved, which caused odd behavior down the line. We could
also end up with some, but not all, returned values of the callee in the
returned values map of the caller, another odd behavior we want to
avoid.
[CaptureTracking] Allow null to be in either icmp operand
Summary:
Before, we required the comparison against null to be "canonical", i.e.,
null had to be operand #1. This patch allows null to be in either operand,
similar to the handling of loaded globals that follows.
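A rough sketch of the symmetry being added (simplified stand-in types, not the CaptureTracking API):

  struct Value {
    bool IsNullConstant = false;
  };

  struct ICmp {
    const Value *Op0;
    const Value *Op1;
  };

  // Accept the null constant in either operand instead of only one fixed slot.
  bool comparesAgainstNull(const ICmp &I) {
    return I.Op0->IsNullConstant || I.Op1->IsNullConstant;
  }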
[Attributor] Add all missing attribute definitions/symbols
As preparation for "on-demand" abstract attribute generation, we need
implementations for all attributes (as they can be queried and then
created on-demand where we now fail to find one).
[Attributor] Towards a more structured deduction pattern
Summary:
This is the first commit aiming to structure the attribute deduction.
The base idea is that we have default propagation patterns as listed
below on top of which we can add specific, e.g., context sensitive,
logic.
Deduction patterns used in this patch:
- argument states are determined from call site argument states,
see AAAlignArgument and AAArgumentFromCallSiteArguments.
- call site argument states are determined as if they were floating
values, see AAAlignCallSiteArgument and AAAlignFloating.
- floating value states are determined by traversing the def-use chain
and combining the states determined for the leaves, see
AAAlignFloating and genericValueTraversal.
- call site return states are determined from function return states,
see AAAlignCallSiteReturned and AACallSiteReturnedFromReturned.
- function return states are determined from returned value states,
see AAAlignReturned and AAReturnedFromReturnedValues.
Through this strategy all logic for alignment is concentrated in the
AAAlignFloating::updateImpl method.
Note: This commit works on its own but is part of a larger change that
involves "on-demand" creation of abstract attributes that will
participate in the fixpoint iteration. Without this part, we sometimes
do not have an AAAlign abstract attribute to query, losing information
we determined before. All tests have appropriate FIXMEs, and the
information will be recovered once all parts have been added.
[Attributor][NFC] Introduce aliases for call site attributes
Until we have call site specific liveness and/or value information there
is no need to do call site specific deduction. However, we need the
symbols in follow-up patches that make Attributor::getAAFor return a
reference.
[Attributor] Introduce initialize calls and move code to keep attributes concise
Summary:
This patch should not change the behavior except that the added
initialize methods might indicate an optimistic fixpoint earlier. The
code movement is done to keep the attribute definitions in a single
block where it makes sense. No functional changes intended there.
Sanjay Patel [Fri, 16 Aug 2019 18:51:30 +0000 (18:51 +0000)]
[InstCombine] canonicalize a scalar-select-of-vectors to vector select
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.
For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).
For example, if the condition of the select is a scalar, we end up with something like this on x86:
Guanzhong Chen [Fri, 16 Aug 2019 18:21:08 +0000 (18:21 +0000)]
[WebAssembly] Forbid use of EM_ASM with setjmp/longjmp
Summary:
We tried to support EM_ASM with setjmp/longjmp in binaryen. But with dynamic
linking thrown into the mix, the code is no longer understandable and cannot
be maintained. We also discovered more bugs in the EM_ASM handling code.
To ensure maintainability and correctness of the binaryen code, EM_ASM will
no longer be supported with setjmp/longjmp. This is probably fine since the
support was added recently and hasn't been published.
Amara Emerson [Fri, 16 Aug 2019 18:06:53 +0000 (18:06 +0000)]
[AArch64][GlobalISel] Lower G_SHUFFLE_VECTOR with 1 elt src and 1 elt mask.
Again, it's weird that these are allowed. Since lowering support was added in
r368709 we started crashing on compiling the neon intrinsics test in the test
suite. This fixes the lowering to fold the 1 elt src/mask case into copies.
Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit).
Paul Walker [Fri, 16 Aug 2019 17:29:53 +0000 (17:29 +0000)]
[AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
Recommit with fixes for mac builders.
Summary:
AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta
instructions (e.g. CFI_INSTRUCTION) as normal instructions and
giving them a size of 4.
This results in branch relaxation calculating block sizes wrong.
Branch relaxation also considers alignment and thus a single
mistake can result in later blocks being incorrectly sized even
when they themselves do not contain meta instructions.
The net result is we might not relax a branch whose destination is
not within range.
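The rule in miniature (a hypothetical helper, not AArch64InstrInfo itself):

  // Meta instructions emit no machine code, so size queries must report 0 for
  // them; real AArch64 instructions encode to 4 bytes.
  unsigned instSizeInBytes(bool IsMetaInstruction) {
    if (IsMetaInstruction)  // e.g. CFI_INSTRUCTION, DBG_VALUE, IMPLICIT_DEF
      return 0;
    return 4;
  }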
[SLPVectorizer] Make the scheduler aware of the TreeEntry operands.
Summary:
The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.
In short, this patch:
- Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
- Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
- Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.
All uses of llvm::make_unique should have been replaced with
std::make_unique. This patch represents the last part of the migration
and removes the utility from LLVM.
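Call sites after the migration look like this (illustrative types only):

  #include <memory>

  struct Pass { int ID; explicit Pass(int ID) : ID(ID) {} };

  std::unique_ptr<Pass> makePass() {
    return std::make_unique<Pass>(42);  // was: llvm::make_unique<Pass>(42)
  }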
Jordan Rose [Fri, 16 Aug 2019 17:17:45 +0000 (17:17 +0000)]
Fix llvm-config support for CMake build-mode-style builds
At some point we and/or CMake changed our build-mode-style builds from
$LLVM_OBJ_ROOT/bin/$CMAKE_CFG_INTDIR/
to
$LLVM_OBJ_ROOT/$CMAKE_CFG_INTDIR/bin/
which is way easier to use. But no one updated llvm-config.
Guozhi Wei [Fri, 16 Aug 2019 16:26:12 +0000 (16:26 +0000)]
[CodeGen/Analysis] Intrinsic llvm.assume should not block tail call optimization
In Analysis.cpp:isInTailCallPosition, the instructions between a call and the ret are checked to see if they block tail call optimization. If an instruction is an intrinsic call, only llvm.lifetime_end is allowed; other intrinsics block the tail call. When compiling tcmalloc, we found an llvm.assume between a hot function call and the ret, and it blocked the optimization. But llvm.assume doesn't generate any instructions, so it should not block the tail call.
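A sketch of the intent (hypothetical helper, not the Analysis.cpp code): intrinsics that expand to no machine code are harmless between the call and the ret, and llvm.assume now joins llvm.lifetime.end in that set.

  #include <string>

  bool intrinsicBlocksTailCall(const std::string &IntrinsicName) {
    if (IntrinsicName == "llvm.lifetime.end" || IntrinsicName == "llvm.assume")
      return false;  // generates no instructions between call and ret
    return true;     // conservatively assume anything else blocks the TCO
  }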
Cyndy Ishida [Fri, 16 Aug 2019 15:30:48 +0000 (15:30 +0000)]
[TextAPI] Update reader to be supported by lib/Object
Summary:
To be able to use the TextAPI/Reader for tbd file consumption (by libObject),
it gets passed a MemoryBufferRef, which isn't castable to MemoryBuffer.
Updated the tests to expect that input as well.
Roman Lebedev [Fri, 16 Aug 2019 15:10:41 +0000 (15:10 +0000)]
[InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399)
Summary:
This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399
I thought the naive pattern would solve my issue, but no: it involved truncation,
thus more folds are needed. This isn't really the fold I'm interested in
(I need trunc-of-lshr), but I've decided to start with `shl` because it's simpler.
In this case, no extra legality checks are needed:
https://rise4fun.com/Alive/CAb
We should be careful about not increasing the instruction count,
since we need to produce a `zext` because the `and` is done in a wider type.