Simon Pilgrim [Sat, 5 Oct 2019 20:49:34 +0000 (20:49 +0000)]
[X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.
Sanjay Patel [Sat, 5 Oct 2019 18:03:58 +0000 (18:03 +0000)]
[SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146
We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).
Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.
In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:
movbe rax, qword ptr [rdi]
or:
mov rax, qword ptr [rdi]
Not some (half) vector monstrosity as we currently do using SLP:
James Molloy [Sat, 5 Oct 2019 08:57:17 +0000 (08:57 +0000)]
[UnitTests] Try and pacify gcc-5
This looks like a defect in gcc-5 where it chooses a constexpr
constructor from the initializer-list that it considers to be explicit.
I've tried to reproduce but I can't install anything prior to gcc-6 easily
on my system, and that doesn't have the error. So this is speculative
pacification.
Mehdi Amini [Sat, 5 Oct 2019 01:37:04 +0000 (01:37 +0000)]
Expose ProvidePositionalOption as a public API
The motivation is to reuse the key value parsing logic here to
parse instance specific pass options within the context of MLIR.
The primary functionality exposed is the "," splitting for
arrays and the logic for properly handling duplicate definitions
of a single flag.
Philip Reames [Sat, 5 Oct 2019 00:32:10 +0000 (00:32 +0000)]
Fix a *nasty* miscompile in experimental unordered atomic lowering
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
https://rise4fun.com/Alive/kPg
In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.
Julian Lettner [Fri, 4 Oct 2019 21:40:20 +0000 (21:40 +0000)]
[lit] Use better name for "test in parallel" concept
In the past, lit used threads to run tests in parallel. Today we use
`multiprocessing.Pool`, which uses processes. Let's stay more abstract
and use "worker" everywhere.
[MachineOutliner] Disable outlining from noreturn functions
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.
Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.
Eli Friedman [Fri, 4 Oct 2019 19:51:40 +0000 (19:51 +0000)]
[ScheduleDAG] When a node is cloned, add an edge between the nodes.
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node. Make sure this assumption actually holds.
Fixes a "Node emitted out of order - early" assertion on the testcase.
This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.
Craig Topper [Fri, 4 Oct 2019 18:02:46 +0000 (18:02 +0000)]
[X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.
This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.
Siva Chandra [Fri, 4 Oct 2019 17:30:54 +0000 (17:30 +0000)]
Add few docs and implementation of strcpy and strcat.
Summary:
This patch illustrates some of the features like modularity we want
in the new libc. Few other ideas like different kinds of testing, redirectors
etc are not yet present.
Mikhail Maltsev [Fri, 4 Oct 2019 16:44:18 +0000 (16:44 +0000)]
[utils] Fix incompatibility of bisect[-skip-count] with Python 3
Summary:
This change replaces the print statements with print function calls
and also replaces the '/' operator (which is integer division in Py2,
but becomes floating point division in Py3) with the '//' operator
which has the same semantics in Py2 and Py3.
[NFC] [FileCheck] Reapply fix init of objects in unit tests
Summary:
Fix initialization style of objects allocated on the stack and member
objects in unit test to use the "Type Var(init list)" and
"Type Member{init list}" convention. The latter fixes the buildbot
breakage.
Tim Northover [Fri, 4 Oct 2019 12:29:32 +0000 (12:29 +0000)]
ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.
Simon Atanasyan [Fri, 4 Oct 2019 11:59:16 +0000 (11:59 +0000)]
[llvm-readobj][mips] Implement GNU-style printing of .MIPS.abiflags section
In this patch `llvm-readobj` prints ASEs flags on a single line
separated by a comma. GNU `readelf` prints each ASEs flag on
a separate line. It will be fixed later.
Simon Atanasyan [Fri, 4 Oct 2019 11:59:06 +0000 (11:59 +0000)]
[llvm-readobj] Replace arch-specific ObjDumper methods by the single `printArchSpecificInfo`
Initially llvm-readobj supports multiple command line options like
`--arm-attributes` and `--mips-plt-got` for display ELF arch-specific
information. Now all these options are superseded by the
`--arch-specific` one. It makes sense to have a single `printArchSpecificInfo`
method in the base `ObjDumper`, and hide all ELF/target specific details
in the `ELFDumper::printArchSpecificInfo` override.
Jeremy Morse [Fri, 4 Oct 2019 10:53:47 +0000 (10:53 +0000)]
[DebugInfo] LiveDebugValues: move DBG_VALUE creation into VarLoc class
Rather than having a mixture of location-state shared between DBG_VALUEs
and VarLoc objects in LiveDebugValues, this patch makes VarLoc the
master record of variable locations. The refactoring means that the
transfer of locations from one place to another is always a performed by
an operation on an existing VarLoc, that produces another transferred
VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once
all locations are known. As a plus, there is now only one method where
DBG_VALUEs can be created.
The test case added covers a circumstance that is now impossible to
express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt,
previously it would have been restored-from-spill as a direct DBG_VALUE.
We now don't lose this information along the way, as VarLocs always
refer back to the "original" non-transfer DBG_VALUE, and we can always
work out whether a location was "originally" indirect.
Jeremy Morse [Fri, 4 Oct 2019 09:38:05 +0000 (09:38 +0000)]
[DebugInfo] LiveDebugValues: defer DBG_VALUE creation during analysis
When transfering variable locations from one place to another,
LiveDebugValues immediately creates a DBG_VALUE representing that
transfer. This causes trouble if the variable location should
subsequently be invalidated by a loop back-edge, such as in the added
test case: the transfer DBG_VALUE from a now-invalid location is used
as proof that the variable location is correct. This is effectively a
self-fulfilling prophesy.
To avoid this, defer the insertion of transfer DBG_VALUEs until after
analysis has completed. Some of those transfers are still sketchy, but
we don't propagate them into other blocks now.
James Molloy [Fri, 4 Oct 2019 09:03:36 +0000 (09:03 +0000)]
[TableGen] Introduce a generic automaton (DFA) backend
Summary:
This patch introduces -gen-automata, a backend for generating deterministic finite-state automata.
DFAs are already generated by the -gen-dfa-packetizer backend. This backend is more generic and will
hopefully be used to implement the DFA generation (and determinization) for the packetizer in the
future.
This backend allows not only generation of a DFA from an NFA (nondeterministic finite-state
automaton), it also emits sidetables that allow a path through the DFA under a sequence of inputs to
be analyzed, and the equivalent set of all possible NFA transitions extracted.
This allows a user to not just answer "can my problem be solved?" but also "what is the
solution?". Clearly this analysis is more expensive than just playing a DFA forwards so is
opt-in. The DFAPacketizer has this behaviour already but this is a more compact and generic
representation.
Examples are bundled in unittests/TableGen/Automata.td. Some are trivial, but the BinPacking example
is a stripped-down version of the original target problem I set out to solve, where we pack values
(actually immediates) into bins (an immediate pool in a VLIW bundle) subject to a set of esoteric
constraints.
Summary:
llvm-ar's mri-utf8.test test relies on the en_US.UTF-8 locale to be
installed for its last RUN line to work. If not installed, the unicode
string gets encoded (interpreted) as ascii which fails since the most
significant byte is non zero. This commit changes the call to open to
use a binary literal of the UTF-8 encoding for the pound sign instead,
thus bypassing the encoding step.
Note that the echo to create the <pound sign>.txt file will work
regardless of the locale because both the shell and the echo (in case
it's not a builtin of the shell concerned) only care about ascii
character to operate. Indeed, the mri-utf8.test file (and in particular
the pound sign) is encoded in UTF-8 and UTF-8 guarantees only ascii
characters can create bytes that can be interpreted as ascii characters
(i.e. bytes with the most significant bit null).
So the process to break down the filename in the line goes something
along:
- find an ascii chevron '>'
- find beginning of the filename by removing ascii space-like characters
- find ascii newline character indicating the end of the redirection (no
semicolon ';', closing curly bracket '}' or parenthesis ')' or the
like
- create a file whose name is made of all the bytes in between beginning
and end of filename *without interpretting them*
Lang Hames [Fri, 4 Oct 2019 03:55:26 +0000 (03:55 +0000)]
[JITLink] Switch from an atom-based model to a "blocks and symbols" model.
In the Atom model the symbols, content and relocations of a relocatable object
file are represented as a graph of atoms, where each Atom represents a
contiguous block of content with a single name (or no name at all if the
content is anonymous), and where edges between Atoms represent relocations.
If more than one symbol is associated with a contiguous block of content then
the content is broken into multiple atoms and layout constraints (represented by
edges) are introduced to ensure that the content remains effectively contiguous.
These layout constraints must be kept in mind when examining the content
associated with a symbol (it may be spread over multiple atoms) or when applying
certain relocation types (e.g. MachO subtractors).
This patch replaces the Atom model in JITLink with a blocks-and-symbols model.
The blocks-and-symbols model represents relocatable object files as bipartite
graphs, with one set of nodes representing contiguous content (Blocks) and
another representing named or anonymous locations (Symbols) within a Block.
Relocations are represented as edges from Blocks to Symbols. This scheme
removes layout constraints (simplifying handling of MachO alt-entry symbols,
and hopefully ELF sections at some point in the future) and simplifies some
relocation logic.
Shiva Chen [Fri, 4 Oct 2019 02:00:57 +0000 (02:00 +0000)]
[RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore
We would like to split the SP adjustment to reduce the instructions in
prologue and epilogue as the following case. In this way, the offset of
the callee saved register could fit in a single store.
The lambda is taking the stack-allocated Verify boolean by reference and
it would go out of scope on the next iteration. Moving it out of the
loop should fix the issue.
[llvm-objdump] Further rearrange llvm-objdump sections for compatability
Summary:
rL371826 rearranged some output from llvm-objdump for GNU objdump compatability, but there still seem to be some more.
I think this rearrangement is a little closer. Overview of the ordering which matches GNU objdump:
* Archive headers
* File headers
* Section headers
* Symbol table
* Dwarf debugging
* Relocations (if `--disassemble` is not used)
* Section contents
* Disassembly
Sanjay Patel [Thu, 3 Oct 2019 21:34:04 +0000 (21:34 +0000)]
[DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.
In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.
Nico Weber [Thu, 3 Oct 2019 21:22:28 +0000 (21:22 +0000)]
Reland r349624: Let TableGen write output only if it changed, instead of doing so in cmake
Move the write-if-changed logic behind a flag and don't pass it
with the MSVC generator. msbuild doesn't have a restat optimization,
so not doing write-if-change there doesn't have a cost, and it
should fix whatever causes PR43385.
Nico Weber [Thu, 3 Oct 2019 20:41:57 +0000 (20:41 +0000)]
gn build: (manually) merge 373651 better
The reland uses a static library, not an object library.
Doesn't really matter for the gn build, but it's probalby
nice to have the same semantics for the target type.
Philip Reames [Thu, 3 Oct 2019 20:24:18 +0000 (20:24 +0000)]
[Test] Fix inconsistency in alignment in test case
The IR was using a fixed 8 byte alignment, but the MIR portion was using native alignment. Since the test doesn't appear to be deliberately testing overalignment, just make the IR match the MIR.
Jinsong Ji [Thu, 3 Oct 2019 19:36:42 +0000 (19:36 +0000)]
[PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is follow up patch of https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.
The testcase update also show that we can save some unecessary fmr as
well.
Daniel Sanders [Thu, 3 Oct 2019 19:13:39 +0000 (19:13 +0000)]
[gicombiner] Add a CodeExpander to handle C++ fragments with variable expansion
Summary:
This will handle expansion of C++ fragments in the declarative combiner
including custom predicates, and escapes into C++ to aid the migration
effort.
Fixed the -DLLVM_LINK_LLVM_DYLIB=ON using DISABLE_LLVM_LINK_LLVM_DYLIB when
creating the library. Apparently it automatically links to libLLVM.dylib
and we don't want that from tablegen.
Craig Topper [Thu, 3 Oct 2019 18:34:42 +0000 (18:34 +0000)]
[X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.
With VLX we can use two VTRUNCs, unpckldq, and a insert_subvector.
Simon Pilgrim [Thu, 3 Oct 2019 18:13:50 +0000 (18:13 +0000)]
[X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI.
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
Matt Arsenault [Thu, 3 Oct 2019 17:55:27 +0000 (17:55 +0000)]
AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.
Matt Arsenault [Thu, 3 Oct 2019 17:50:32 +0000 (17:50 +0000)]
AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.
Tom Stellard [Thu, 3 Oct 2019 17:11:47 +0000 (17:11 +0000)]
AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.
In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.
The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.
In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.
This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.
James Molloy [Thu, 3 Oct 2019 17:10:32 +0000 (17:10 +0000)]
[ModuloSchedule] removeBranch() *before* creating the trip count condition
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.
This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.
This patch reimplements command line option parsing in dsymutil with
Tablegen and libOption. The main motivation for this change is to
prevent clashes with other cl::opt options defined in llvm. Although
it's a bit more heavyweight, it has some nice advantages such as no
global static initializers and better separation between the code and
the option definitions.
I also used this opportunity to improve how dsymutil deals with
incompatible options. Instead of having checks spread across the code,
everything is now grouped together in verifyOptions. The fact that the
options are no longer global means that we need to pass them around a
bit more, but I think it's worth the trade-off.