Jinsong Ji [Tue, 20 Aug 2019 20:45:16 +0000 (20:45 +0000)]
[llvm-extract] Update the help message for group extraction feature
Summary:
https://reviews.llvm.org/D60973 exposed the group extraction feature of
the BlockExtractor to llvm-extract.
However, the help message was not updated, so users might not be able to
know how to use this feature without looking into history/commits.
This patch just update the help message to show how to use this group
extraction feature.
Craig Topper [Tue, 20 Aug 2019 20:20:04 +0000 (20:20 +0000)]
[X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target
Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))).
Craig Topper [Tue, 20 Aug 2019 19:43:48 +0000 (19:43 +0000)]
[X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the impliciting zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.
The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.
Sam Clegg [Tue, 20 Aug 2019 18:39:24 +0000 (18:39 +0000)]
[WebAssembly][lld] Fix crash when applying relocations to debug sections
Debug sections are special in that they can contain relocations against
symbols that are not present in the final output (i.e. not live).
However it is also possible to have R_WASM_TABLE_INDEX relocations
against symbols that don't have a table index assigned (since they are
not address taken by actual code.
Thomas Raoux [Tue, 20 Aug 2019 15:54:59 +0000 (15:54 +0000)]
[CodeGen] Add a pass to do block predication on SSA machine IR.
For targets requiring aggressive scheduling and/or software pipeline we need to
apply predication before preRA scheduling. This adds a pass re-using the early
if-cvt infrastructure but generating predicated instructions instead of
speculatively executing instructions. It allows doing if conversion on blocks
containing instructions with side-effects. The pass re-use the target hook from
postRA if-conversion to let the target decide on the heuristic to apply.
Sanjay Patel [Tue, 20 Aug 2019 14:56:44 +0000 (14:56 +0000)]
[InstCombine] improve readability for icmp with cast folds; NFC
1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.
Jinsong Ji [Tue, 20 Aug 2019 14:46:02 +0000 (14:46 +0000)]
[BlockExtractor] Avoid assert with wrong line format
Summary:
When the line format is wrong, we may end up accessing out of bound
memory. eg: the test with invalide line will cause assert.
Assertion `idx < size()' failed
The fix is to report fatal when we found mismatched line format.
Sanjay Patel [Tue, 20 Aug 2019 13:39:17 +0000 (13:39 +0000)]
[InstCombine] simplify min/max of min/max with same operands (PR35607)
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607
As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.
There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.
George Rimar [Tue, 20 Aug 2019 13:19:16 +0000 (13:19 +0000)]
[llvm-objdump] - Remove one of `report_error` functions and improve the error reporting.
One of the report_error functions was taking object::Archive::Child as an
argument. It feels excessive, this patch removes it and introduce a helper
function instead. Also I fixed a "TODO" in this patch what improved the message printed.
Igor Kudrin [Tue, 20 Aug 2019 12:52:32 +0000 (12:52 +0000)]
[DWARF] Fix reading 64-bit DWARF type units.
The type_offset field is 8 bytes long in DWARF64. The patch extends
TypeOffset to uint64_t and fixes its reading. The patch also fixes
checking of TypeOffset bounds as it was inaccurate in DWARF64 case.
Alex Bradbury [Tue, 20 Aug 2019 12:32:31 +0000 (12:32 +0000)]
[RISCV] Implement getExprForFDESymbol to ensure RISCV_32_PCREL is used for the FDE location
Follow binutils in using RISCV_32_PCREL for the FDE initial location. As
explained in the relevant binutils commit
<https://github.com/riscv/riscv-binutils-gdb/commit/a6cbf936e3dce68114d28cdf60d510a3f78a6d40>,
the ADD/SUB pair of relocations is problematic in the presence of linker
relaxation.
This patch has the same end goal as D64715 but includes test changes and
avoids adding a new global VariantKind to MCExpr.h (preferring
RISCVMCExpr VKs like the rest of the RISC-V backend).
Pavel Labath [Tue, 20 Aug 2019 12:08:52 +0000 (12:08 +0000)]
Recommit "MemoryBuffer: Add a missing error-check to getOpenFileImpl"
This recommits r368977, which was reverted in r369027 due to test
failures in lldb. The cause of this was different behavior of
readNativeFileSlice on windows and unix. These have been addressed in
r369269.
The original commit message was:
In case the function was called with a desired read size *and* the file
was not an "mmap()" candidate, the function was falling back to a
"pread()", but it was failing to check the result of that system call.
This meant that the function would return "success" even though the read
operation failed, and it returned a buffer full of uninitialized memory.
Simon Pilgrim [Tue, 20 Aug 2019 11:20:05 +0000 (11:20 +0000)]
[CMake] Update C4324 MSVC warning comment to explain its still broken at VS2019
As promised, I've updated the comment for the C4324 MSVC warning that was re-disabled at rL367409 / rG8f823e63e3edf87ab029ba32b68f3eb5d2f392b5 to put it in terms of currently supported VS versions
Andrea Di Biagio [Tue, 20 Aug 2019 10:23:55 +0000 (10:23 +0000)]
[X86][Btver2] Fix latency and throughput of CMPXCHG instructions.
On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC.
Throughput is superiorly limited to 0.33 because of the implicit in/out
dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the
same memory location, store-to-load forwarding occurs and values for sequent
loads are quickly forwarded from the store buffer.
Interestingly, the functionality in LLVM that computes the reciprocal throughput
doesn't seem to know about RMW instructions. That functionality only looks at
the "consumed resource cycles" for the throughput computation. It should be
fixed/improved by a future patch. In particular, for RMW instructions, that
logic should also take into account for the write latency of in/out register
operands.
An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to
~17cy/inst due to cache locking, which prevents other memory uOPs to start
executing before the "lock releasing" store uOP.
CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less
macro opcode. Their latency tend to be the same as the other RR/RM variants. RR
variants are relatively fast 3cy (but still microcoded - 5 macro opcodes).
CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load
forwarding. That means, throughput is clearly limited by the in/out dependency
on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs
for the Integer pipes). I have reused the same mix of consumed resource from the
other CMPXCHG instructions for CMPXCHG8B too.
LOCK CMPXCHG8B is instead 18cycles.
CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to
the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38)
cycles dependeing on whether the LOCK prefix is specified or not.
I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x
microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to
2x the resource cycles used for CMPXCHG8B.
The two new hasLockPrefix() functions are used by the btver2 scheduling model
check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have
been encoded in predicates of variant scheduling classes that describe lat/thr
of CMPXCHG.
George Rimar [Tue, 20 Aug 2019 08:23:57 +0000 (08:23 +0000)]
[test/Object] - Move/rewrite 2 more test cases.
This patch makes a change for test/Object tests responsible
for relocations.
* 2 tests were moved to llvm-readobj/llvm-objdump folders:
Object/elf-reloc-no-sym.test -> tools/llvm-readobj/elf-reloc-no-sym.test
Object/objdump-reloc-shared.test -> tools/llvm-objdump/relocations-in-nonreloc.test
* A prerecompiled binary was removed and these tests were refactored.
Fangrui Song [Tue, 20 Aug 2019 07:42:04 +0000 (07:42 +0000)]
[MC] Delete an overload of MCExpr::evaluateKnownAbsolute and its associated hack
The hack dated back to 2010 (r121076) and was documented by r122144:
// FIXME: The use if InSet = Addrs is a hack. Setting InSet causes us
// absolutize differences across sections and that is what the MachO writer
// uses Addrs for.
Craig Topper [Tue, 20 Aug 2019 06:58:00 +0000 (06:58 +0000)]
[X86] Add back the -x86-experimental-vector-widening-legalization comand line flag and all associated code, but leave it enabled by default
Google is reporting performance issues with the new default behavior
and have asked for a way to switch back to the old behavior while we
investigate and make fixes.
I've restored all of the code that had since been removed and added
additional checks of the command flag onto code paths that are
not otherwise guarded by a check of getTypeAction.
I've also modified the cost model tables to hopefully get us back
to the previous costs.
Hopefully we won't need to support this for very long since we
have no test coverage of the old behavior so we can very easily
break it.
Before, we create the set of abstract attributes initially and then
dealt with the fact hat a lookup could fail, e.g., return a nullptr.
This patch will ensure we always return a valid object from a lookup,
allowing us not only to remove the nullptr checks but also to grow the
set of abstract attributes "in-flight" on-demand.
One can now start from those that have the best chance of improving
performance without the need to specify all they might depend on.
While this introduces some boilerplate, the usage of attributes is much
easier and cleaner now.
[Attributor] Use structured deduction for AADereferenceable
Summary:
This is analogous to D66128 but for AADereferenceable. We have the logic
concentrated in the floating value updateImpl and we use the combiner
helper classes for arguments and return values.
The regressions will go away with "on-demand" attribute creation.
Improvements are already visible in the existing tests.
[Attributor] Use structured deduction for AANonNull
Summary:
What D66126 did for AAAlign, this patch does for AANonNull. Agian, the
logic becomes more concise and localized. Again, returned poiners are
not annotated properly but that will not be an issue if this lands with
the "on-demand" generation of attributes. First improvements due to the
genericValueTraversal are already visible.
The clamp operator should not take the known of the given state as the
known is potentially based on assumed information. This also adds TODOs
to guide improvements.
WebAssembly doesn't support PC relative relocation or relocation
expressions that can't be reduced to single symbol.
The only support for we have for fixups involving two symbols are when
both symbols are defined and withing the same section. In this case
evaluateFixup will already have evaluated to the expression before
calling recordRelocation.
Hubert Tong [Mon, 19 Aug 2019 23:12:48 +0000 (23:12 +0000)]
[cmake] Link in LLVMPasses due to dependency by LLVMOrcJIT; NFC
Summary:
rL367756 (f5c40cb) increases the dependency of LLVMOrcJIT on LLVMPasses.
In particular, symbols defined in LLVMPasses that are referenced by the
destructor of `PassBuilder` are now referenced by LLVMOrcJIT through
`Speculation.cpp.o`.
We believe that referencing symbols defined in LLVMPasses in the
destructor of `PassBuilder` is valid, and that adding to the set of such
symbols is legitimate. To support such cases, this patch adds LLVMPasses
to the set of libraries being linked when linking in LLVMOrcJIT causes
such symbols from LLVMPasses to be referenced.
Joel E. Denny [Mon, 19 Aug 2019 22:59:37 +0000 (22:59 +0000)]
[lit] Check for accidental external command calls
This patch extends lit's test suite to check that lit's internal shell
doesn't accidentally execute internal commands as external commands.
It does so by putting fake failing versions of those commands in
`PATH` while the entire lit test suite is running. Without the fixes
in D65697 but with its tests, this approach catches accidental
external `env` calls.
Anton Afanasyev [Mon, 19 Aug 2019 22:58:26 +0000 (22:58 +0000)]
[Support][Time profiler] Make FE codegen blocks to be inside frontend blocks
Summary:
Add `Frontend` time trace entry to `HandleTranslationUnit()` function.
Add test to check all codegen blocks are inside frontend blocks.
Also, change `--time-trace-granularity` option a bit to make sure very small
time blocks are outputed to json-file when using `--time-trace-granularity=0`.
Matthias Gehre [Mon, 19 Aug 2019 21:59:44 +0000 (21:59 +0000)]
[ORC] fix use-after-free detected by -Wreturn-stack-address
Summary:
llvm/lib/ExecutionEngine/Orc/Layer.cpp:53:12: warning: returning address of local temporary object [-Wreturn-stack-address]
In
```
StringRef IRMaterializationUnit::getName() const {
[...]
return TSM.withModuleDo(
[](const Module &M) { return M.getModuleIdentifier(); });
```
`getModuleIdentifier()` returns a `const std::string &`, but the implicit return type
of the lambda is `std::string` by value, and thus the returned `StringRef` refers
to a temporary `std::string`.
Detect by annotating `llvm::StringRef` with `[[gsl::Pointer]]`.
[CaptureTracker] Let subclasses provide dereferenceability information
Summary:
CaptureTracker subclasses might have better dereferenceability
information which allows null pointer checks to be no-capturing.
The first user will be D59922.
Seiya Nuta [Mon, 19 Aug 2019 21:12:02 +0000 (21:12 +0000)]
Recommit "[llvm-objcopy][MachO] Implement a layout algorithm for executables"
Summary: The layout algorithm for relocatable objects and for executable are somewhat different. This patch implements the latter one based on the algorithm in LLD (MachOFileLayout).
Seiya Nuta [Mon, 19 Aug 2019 21:05:31 +0000 (21:05 +0000)]
Recommit "[llvm-objcopy][MachO] Support load commands used in executables/shared libraries"
Summary:
This patch implements copying some load commands that appear in executables/shared libraries such as the indirect symbol table.
I don't add tests intentionally because this patch is incomplete: we need a layout algorithm for executables/shared libraries. I'll submit it as a separate patch with tests.
Evgeniy Stepanov [Mon, 19 Aug 2019 20:47:09 +0000 (20:47 +0000)]
MemTag: stack initializer merging.
Summary:
MTE provides instructions to update memory tags and data at the same
time. This change makes use of those to generate more compact code for
stack variable tagging + initialization.
We collect memory store and memset instructions following an alloca or a
lifetime.start call, and replace them with the corresponding MTE
intrinsics. Since the intrinsics work on 16-byte aligned chunks, the
stored values are combined as necessary.
Alina Sbirlea [Mon, 19 Aug 2019 18:57:40 +0000 (18:57 +0000)]
[MemorySSA] Rename uses when inserting memory uses.
Summary:
When inserting uses from outside the MemorySSA creation, we don't
normally need to rename uses, based on the assumption that there will be
no inserted Phis (if Def existed that required a Phi, that Phi already
exists). However, when dealing with unreachable blocks, MemorySSA will
optimize away Phis whose incoming blocks are unreachable, and these Phis end
up being re-added when inserting a Use.
There are two potential solutions here:
1. Analyze the inserted Phis and clean them up if they are unneeded
(current method for cleaning up trivial phis does not cover this)
2. Leave the Phi in place and rename uses, the same way as whe inserting
defs.
This patch use approach 2.
Andrea Di Biagio [Mon, 19 Aug 2019 18:20:30 +0000 (18:20 +0000)]
[X86] Move scheduling tests for CMPXCHG to the corresponding resources-x86_64.s files. NFC
In D66424 it has been requested to move all the new tests added by r369278 into
resources-x86_64.s. That is because only the 8b/16 ops should be tested by
resources-cmpxchg.s. This partially reverts r369278.
Craig Topper [Mon, 19 Aug 2019 18:15:50 +0000 (18:15 +0000)]
[X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.
The motivating case are the changes in vector-reduce-add.ll where
we were doing extra work in the scalar domain instead of shuffling.
There may be some one use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.
Craig Topper [Mon, 19 Aug 2019 18:02:24 +0000 (18:02 +0000)]
[CGP] Remove ModifiedDT from the makeBitReverse loop
I don't think anything in this loop modifies the control flow and we don't restart any iteration after setting the flag.
This code was added in http://reviews.llvm.org/D16893 but looking at the test case added there the code that caused the dominator tree to change was merging blocks with their predecessor not the bitreverse optimization.
Pavel Labath [Mon, 19 Aug 2019 15:40:49 +0000 (15:40 +0000)]
Filesystem/Windows: fix inconsistency in readNativeFileSlice API
Summary:
The windows version implementation of readNativeFileSlice, was trying to
match the POSIX behavior of not treating EOF as an error, but it was
only handling the case of reading from a pipe. Attempting to read past
the end of a regular file returns a slightly different error code, which
needs to be handled too. This patch adds ERROR_HANDLE_EOF to the list of
error codes to be treated as an end of file, and adds some unit tests
for the API.
This issue was found while attempting to land D66224, which caused a bunch of
lldb tests to start failing on windows.
Roman Lebedev [Mon, 19 Aug 2019 15:01:42 +0000 (15:01 +0000)]
[TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.
But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
=>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup
And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.
There's interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.
Alex Bradbury [Mon, 19 Aug 2019 13:23:02 +0000 (13:23 +0000)]
[RISCV] Don't force absolute FK_Data_X fixups to relocs
The current behavior of shouldForceRelocation forces relocations for the
majority of fixups when relaxation is enabled. This makes sense for
fixups which incorporate symbols but is unnecessary for simple data
fixups where the fixup target is already resolved to an absolute value.
Differential Revision: https://reviews.llvm.org/D63404
Patch by Edward Jones.
Jeremy Morse [Mon, 19 Aug 2019 09:53:07 +0000 (09:53 +0000)]
[DebugInfo] Make postra sinking of DBG_VALUEs subregister-safe
Currently the machine instruction sinker identifies DBG_VALUE insts that
also need to sink by comparing register numbers. Unfortunately this isn't
safe, because (after register allocation) a DBG_VALUE may read a register
that aliases what's being sunk. To fix this, identify the DBG_VALUEs that
need to sink by recording & examining their register units. Register units
gives us the following guarantee:
"Two registers overlap if and only if they have a common register unit"
[MCRegisterInfo.h]
Thus we can always identify aliasing DBG_VALUEs if the set of register
units read by the DBG_VALUE, and the register units of the instruction
being sunk, intersect. (MachineSink already uses classes like
"LiveRegUnits" for determining sinking validity anyway).
The test added checks for super and subregister DBG_VALUE reads of a sunk
copy being sunk as well.
Jeremy Morse [Mon, 19 Aug 2019 09:02:18 +0000 (09:02 +0000)]
[DebugInfo] Test for variable range un-coalescing
LiveDebugVariables can coalesce ranges of variable locations across
multiple basic blocks. However when it recreates DBG_VALUE instructions,
it has to recreate one DBG_VALUE per block, otherwise it doesn't
represent the pre-regalloc layout and variable assignments can go missing.
This feature works -- however while mucking around with LiveDebugVariables,
I commented the relevant code it out and no tests failed. Thus, here's a
test that checks LiveDebugVariables preserves DBG_VALUEs across block
boundaries.
Fangrui Song [Mon, 19 Aug 2019 06:17:30 +0000 (06:17 +0000)]
[MC] Don't emit .symver redirected symbols to the symbol table
GNU as keeps the original symbol in the symbol table for defined @ and
@@, but suppresses it in other cases (@@@ or undefined). The original
symbol is usually undesired:
In a shared object, the original symbol can be localized with a version
script, but it is hard to remove/localize in an archive:
1) a post-processing step removes the undesired original symbol
2) consumers (executable) of the archive are built with the
version script
Moreover, it can cause linker issues like binutils PR/18703 if the
original symbol name and the base name of the versioned symbol is the
same (both ld.bfd and gold have some code to work around defined @ and
@@). In lld, if it sees f and f@v1:
--version-script =(printf 'v1 {};') => f and f@v1
--version-script =(printf 'v1 { f; };') => f@v1 and f@@v1
It can be argued that @@@ added on 2000-11-13 corrected the @ and @@ mistake.
This patch catches some more multiple version errors (defined @ and @@),
and consistently suppress the original symbol. This addresses all the
problems listed above.
If the user wants other aliases to the versioned symbol, they can copy
the original symbol to other symbol names with .set directive, e.g.
.symver f, f@v1 # emit f@v1 but not f into .symtab
.set f_impl, f # emit f_impl into .symtab
Craig Topper [Mon, 19 Aug 2019 05:45:39 +0000 (05:45 +0000)]
[X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT.
We can support these by widening to a supported type,
then shifting all the way to the left and then
back to the right to ensure that we shift in zeroes.
Seiya Nuta [Mon, 19 Aug 2019 05:41:33 +0000 (05:41 +0000)]
[llvm-objcopy][MachO] Implement a layout algorithm for executables
Summary: The layout algorithm for relocatable objects and for executable are somewhat different. This patch implements the latter one based on the algorithm in LLD (MachOFileLayout).
Seiya Nuta [Mon, 19 Aug 2019 05:37:38 +0000 (05:37 +0000)]
[llvm-objcopy][MachO] Support load commands used in executables/shared libraries
Summary:
This patch implements copying some load commands that appear in executables/shared libraries such as the indirect symbol table.
I don't add tests intentionally because this patch is incomplete: we need a layout algorithm for executables/shared libraries. I'll submit it as a separate patch with tests.
Craig Topper [Mon, 19 Aug 2019 04:08:44 +0000 (04:08 +0000)]
[X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node.
Not sure how to test this as we have tests that exercise this code,
but nothing failed for the types not matching. Since all the k-registers
use equivalent register classes everything just ends up working.