Nikolai Bozhenov [Thu, 24 Nov 2016 13:23:35 +0000 (13:23 +0000)]
[x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.
It fixes PR28755.
Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
Nikolai Bozhenov [Thu, 24 Nov 2016 13:05:43 +0000 (13:05 +0000)]
[x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
make sure we actually check for that, instead of a check for
an immediate value
Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
Simon Pilgrim [Thu, 24 Nov 2016 12:13:46 +0000 (12:13 +0000)]
[X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.
This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.
Object: Simplify the IRObjectFile symbol iterator implementation.
Change the IRObjectFile symbol iterator to be a pointer into a vector of
PointerUnions representing either IR symbols or asm symbols.
This change is in preparation for a future change for supporting multiple
modules in an IRObjectFile. Although it causes an increase in memory
consumption, we can deal with that issue separately by introducing a bitcode
symbol table.
Matt Arsenault [Thu, 24 Nov 2016 00:26:47 +0000 (00:26 +0000)]
TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.
It might be better to just always use the scavenger and stop
creating virtual registers.
Greg Clayton [Wed, 23 Nov 2016 23:30:37 +0000 (23:30 +0000)]
Rely on a single DWARF version instead of having two copies
This patch makes AsmPrinter less reliant on DwarfDebug by relying on the DWARF version in the AsmPrinter's MCStreamer's MCContext. This allows us to remove the redundant DWARF version from DwarfDebug. It also lets us change code that used to access the AsmPrinter's DwarfDebug just to get to the DWARF version by changing the DWARF version accessor on AsmPrinter so that it grabs the version from its MCStreamer's MCContext.
Matt Arsenault [Wed, 23 Nov 2016 20:52:53 +0000 (20:52 +0000)]
AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.
Meador Inge [Wed, 23 Nov 2016 20:17:15 +0000 (20:17 +0000)]
llvm-nm: Don't print value or size for undefined or weak symbols
Undefined and weak symbols don't have a meaningful size or value.
As such, nothing should be printed for those attributes (this is
already done for the address with 'U') with the BSD format. This
matches what GNU nm does.
Note that for the POSIX.2 format [1] zero values are still
printed for the size and value. This seems in spirit with
the format strings in that specification, but is debatable.
Daniel Berlin [Wed, 23 Nov 2016 19:03:54 +0000 (19:03 +0000)]
Revert "[Triple] Add Facebook vendor"
This reverts commit r287684
Objections on the review thread had not been addressed to
prior to commit. I asked the committer to revert, but i expect they
are gone for the US holiday or something.
[X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.
This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.
Hemant Kulkarni [Wed, 23 Nov 2016 18:04:23 +0000 (18:04 +0000)]
llvm-readobj: Use hash tables to print dynamic symbols.
-symbols prints both .symtab and .dynsym symbols for GNU style in ELF.
-dyn-symbols prints symbols looking up through hash tables. This helps validate hash tables.
Chandler Carruth [Wed, 23 Nov 2016 17:53:26 +0000 (17:53 +0000)]
[PM] Change the static object whose address is used to uniquely identify
analyses to have a common type which is enforced rather than using
a char object and a `void *` type when used as an identifier.
This has a number of advantages. First, it at least helps some of the
confusion raised in Justin Lebar's code review of why `void *` was being
used everywhere by having a stronger type that connects to documentation
about this.
However, perhaps more importantly, it addresses a serious issue where
the alignment of these pointer-like identifiers was unknown. This made
it hard to use them in pointer-like data structures. We were already
dodging this in dangerous ways to create the "all analyses" entry. In
a subsequent patch I attempted to use these with TinyPtrVector and
things fell apart in a very bad way.
And it isn't just a compile time or type system issue. Worse than that,
the actual alignment of these pointer-like opaque identifiers wasn't
guaranteed to be a useful alignment as they were just characters.
This change introduces a type to use as the "key" object whose address
forms the opaque identifier. This both forces the objects to have proper
alignment, and provides type checking that we get it right everywhere.
It also makes the types somewhat less mysterious than `void *`.
We could go one step further and introduce a truly opaque pointer-like
type to return from the `ID()` static function rather than returning
`AnalysisKey *`, but that didn't seem to be a clear win so this is just
the initial change to get to a reliably typed and aligned object serving
is a key for all the analyses.
Thanks to Richard Smith and Justin Lebar for helping pick plausible
names and avoid making this refactoring many times. =] And thanks to
Sean for the super fast review!
While here, I've tried to move away from the "PassID" nomenclature
entirely as it wasn't really helping and is overloaded with old pass
manager constructs. Now we have IDs for analyses, and key objects whose
address can be used as IDs. Where possible and clear I've shortened this
to just "ID". In a few places I kept "AnalysisID" to make it clear what
was being identified.
Alina Sbirlea [Wed, 23 Nov 2016 17:43:15 +0000 (17:43 +0000)]
[LoadStoreVectorizer] Enable vectorization of stores in the presence of an aliasing load
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.
John Brawn [Wed, 23 Nov 2016 16:05:51 +0000 (16:05 +0000)]
[DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
(mul (shl X, c1), c2) -> (mul X, c2 << c1)
(mul (shl X, C), Y) -> (shl (mul X, Y), C)
(shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.
Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.
Nemanja Ivanovic [Wed, 23 Nov 2016 15:51:52 +0000 (15:51 +0000)]
[PowerPC] Remove InstAlias definitions that cause incorrect assembly
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.
Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment.
More type legalization work will come in the next patches.
Davide Italiano [Wed, 23 Nov 2016 01:42:39 +0000 (01:42 +0000)]
[SCCP] Add a test for switches on undef.
Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefs in and everything
pass. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.
Justin Lebar [Tue, 22 Nov 2016 23:14:11 +0000 (23:14 +0000)]
[StructurizeCFG] Refactor OrderNodes.
Summary:
No need to copy the RPOT vector before using it. Switch from std::map
to SmallDenseMap. Get rid of an unused variable (TempVisited). Get rid
of a typedef, RNVector, which is now used only once.
Justin Lebar [Tue, 22 Nov 2016 23:14:07 +0000 (23:14 +0000)]
[StructurizeCFG] Add whitespace in getAnalysisUsage.
Summary:
"addRequired" and "addPreserved" look very similar when squished up next
to each other -- without the newline this code looked to me like it was
addRequired'ing DominatorTreeWrapperPass twice.
Davide Italiano [Tue, 22 Nov 2016 22:11:25 +0000 (22:11 +0000)]
[SCCP] Remove code in visitBinaryOperator (and add tests).
We visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown' just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here add tests for missing cases.
Matthias Braun [Tue, 22 Nov 2016 22:09:03 +0000 (22:09 +0000)]
TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC
TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays
(getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the
tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(),
enableRALocalReassignment(), ... also do not seem to be universal enough
to make sense outside of CodeGen.
Sanjay Patel [Tue, 22 Nov 2016 22:05:48 +0000 (22:05 +0000)]
[InstCombine] change bitwise logic type to eliminate bitcasts
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Chandler Carruth [Tue, 22 Nov 2016 20:35:32 +0000 (20:35 +0000)]
[LCG] Start using SCC relationship predicates in the unittest.
This mostly gives us nice unittesting of the predicates themselves. I'll
start using them further in subsequent commits to help test the actual
operations performed on the graph.
Rui Ueyama [Tue, 22 Nov 2016 20:32:22 +0000 (20:32 +0000)]
Remove PDBFileBuilder::build() and related functions.
PDBFileBuilder supports two different ways to create files.
One is PDBFileBuilder::commit. That function takes a filename
and write a result to the file. The other is PDBFileBuilder::build.
That returns a new PDBFile object.
This patch removes the latter because no one is using it and
in a real life situation we are very unlikely to need it.
Even if you need it, it'd be easy to write a new PDB to a memory
buffer and read it back.
Removing PDBFileBuilder::build enables us to remove other classes
build transitively.
Chandler Carruth [Tue, 22 Nov 2016 19:23:31 +0000 (19:23 +0000)]
[LCG] Add utilities to compute parent and ascestor relationships between
SCCs.
These will be fairly expensive routines to call and might be abused in
real code, but are quite useful when debugging or in asserts and are
reasonable and well formed properties to query.
I've used one of them in an assert that was requested in a code review
here. In subsequent commits I'll start using these routines more
heavily, for example in unittests etc. But this at least gets the
groundwork in place.
Simon Pilgrim [Tue, 22 Nov 2016 17:50:06 +0000 (17:50 +0000)]
[X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering.
We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle
[mips] Add support for unaligned load/store macros.
Add missing unaligned store macros (ush/usw) and fix the exisiting
implementation of the unaligned load macros in order to generate
identical expansions with the GNU assembler.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.
Coby Tayree [Tue, 22 Nov 2016 09:30:29 +0000 (09:30 +0000)]
[AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").
GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.
This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.
This commit aligns LLVM to GCC’s behavior.
This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587
Craig Topper [Tue, 22 Nov 2016 07:00:06 +0000 (07:00 +0000)]
[TableGen][ISel] When factoring ScopeMatcher, if the child of the ScopeMatcher we're working on is also a ScopeMatcher, merge all its children into the one we're working on.
There were several cases in X86 where we were unable to fully factor a ScopeMatcher but created nested ScopeMatchers for some portions of it. Then we created a SwitchType that split it up and further factored it so that we ended up with something like this:
SwitchType
Scope
Scope
Sequence of matchers
Some other sequence of matchers
EndScope
Another sequence of matchers
EndScope
...Next type
This change turns it into this:
SwitchType
Scope
Sequence of matchers
Some other sequence of matchers
Another sequence of matchers
EndScope
...Next type
Several other in-tree targets had similar nested scopes like this. Overall this doesn't save many bytes, but makes the isel output a little more regular.
Craig Topper [Tue, 22 Nov 2016 05:31:43 +0000 (05:31 +0000)]
[X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.
There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.
Craig Topper [Tue, 22 Nov 2016 04:57:34 +0000 (04:57 +0000)]
[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.
We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.
MC: ensure that we have a section before accessing it
We would attempt to access the symbol section without ensuring that the symbol
was not absolute. When the assembler referenced relocation is not evaluated to
the absolute, but when we record the relocation, we would query the section.
Because the symbol is absolute, it does not have a section associated with it,
triggering an assertion. Just be more careful about the access of the section.