Craig Topper [Sat, 23 Feb 2019 21:41:44 +0000 (21:41 +0000)]
[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.
X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.
Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.
Craig Topper [Sat, 23 Feb 2019 19:51:32 +0000 (19:51 +0000)]
Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
Craig Topper [Sat, 23 Feb 2019 08:34:10 +0000 (08:34 +0000)]
[X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.
This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.
The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.
Michael Trent [Sat, 23 Feb 2019 06:19:56 +0000 (06:19 +0000)]
objdump fails to parse Mach-O binaries with n_desc bearing stabs
Summary:
The objdump Mach-O parser uses MachOObjectFile::checkSymbolTable() to
verify the symbol table is in a legal state before dereferencing the
offsets in the table. This routine missed a test for N_STAB symbols
when validating the two-level name space library ordinal for undefined
symbols. If the binary in question contained a value in the n_desc high
byte that is larger than the list of loaded dylibs, checkSymbolTable()
will flag the library ordinal as being out of range. Most of the time
the n_desc field is set to 0 or to small values, but old final linked
binaries exist with N_STAB symbols bearing non-trivial n_desc fields.
The change here is simply to verify a symbol is not an N_STAB symbol
before consulting the values of n_other or n_desc.
Craig Topper [Sat, 23 Feb 2019 00:35:02 +0000 (00:35 +0000)]
[X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits.
If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.
Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.
Craig Topper [Sat, 23 Feb 2019 00:34:58 +0000 (00:34 +0000)]
[X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits.
If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg.
If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half.
We record the type of the symbol (event/function/data/global) in the
MCWasmSymbol and so it should always be clear how to handle a relocation
based on the symbol itself.
The exception is a function which still needs the special @TYPEINDEX
then the relocation contains the signature rather than the address
of the functions.
David Greene [Fri, 22 Feb 2019 21:19:48 +0000 (21:19 +0000)]
[CMake] Honor LLVM_EXTERNAL_<proj>_SOURCE_DIR
When LLVM_ENABLE_PROJECTS is set, CMake assumes the project
directories are all side-by-side. This is not always the case and
there's no reason to expect it if LLVM_EXTERNAL_<proj>_SOURCE_DIR is
set. Honor that setting if it exists and allow the build configuration
to continue.
Daniel Sanders [Fri, 22 Feb 2019 20:59:07 +0000 (20:59 +0000)]
Restore ability for C++ API users to Enable IPRA.
Summary:
Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying
the TargetOptions::EnableIPRA. This no longer works on current trunk since the
useIPRA() hook overrides any values that are set in advance. This patch adjusts
the behaviour of the hook so that API users and useIPRA() can both enable it
but useIPRA() cannot disable it if the API user already enabled it.
Sanjay Patel [Fri, 22 Feb 2019 20:20:24 +0000 (20:20 +0000)]
[CGP] move overflow intrinsic insertion to common location; NFCI
We need to enhance the uaddo matching to handle special-cases
as seen in PR40486 and PR31754. That means we won't necessarily
have a def-use pattern, so we'll need to check dominance to
determine where to place the intrinsic (as we already do for
usubo). This preliminary patch is just rearranging the code,
so the planned follow-up to improve uaddo will be more clear.
Matt Arsenault [Fri, 22 Feb 2019 19:30:38 +0000 (19:30 +0000)]
MIR: Preserve incoming frame index numbers
Don't skip incrementing the frame index number
if the object is dead. Instructions can still be
referencing the old frame index number, and this
doesn't attempt to remap those. The resulting
MIR then fails to load because the use instructions
use a higher frame index number than recorded
list of stack objects.
I'm not sure it's possible to craft a testcase
with the existing set of passes. It requires
selectively marking some stack objects
dead in an essentially random order.
StackSlotColoring condenses towards
the low indexes. This avoids a regression in a
future AMDGPU commit when some frame indexes
are lowered separately from PEI.
Guozhi Wei [Fri, 22 Feb 2019 18:04:37 +0000 (18:04 +0000)]
[MBP] Factor out function hasViableTopFallthrough and enhancement
This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect.
Petar Jovanovic [Fri, 22 Feb 2019 14:53:58 +0000 (14:53 +0000)]
[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.
Roman Tereshin [Fri, 22 Feb 2019 14:33:46 +0000 (14:33 +0000)]
[LowerSwitch][AMDGPU] Do not handle impossible values
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.
Chijun Sima [Fri, 22 Feb 2019 13:48:38 +0000 (13:48 +0000)]
[DTU] Refine the interface and logic of applyUpdates
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.
Logic changes:
Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.
Interface changes:
The second semantic of `applyUpdates` is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.
Diana Picus [Fri, 22 Feb 2019 09:54:54 +0000 (09:54 +0000)]
[ARM GlobalISel] Support floating point for Thumb2
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.
For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.
George Rimar [Fri, 22 Feb 2019 08:45:21 +0000 (08:45 +0000)]
[obj2yaml] - Do not miss section index for special symbols.
This fixes https://bugs.llvm.org/show_bug.cgi?id=40786
("obj2yaml symbol output missing section index for SHN_ABS and SHN_COMMON symbols")
Since SHN_ABS and SHN_COMMON symbols are special, we should preserve
the st_shndx for them. The patch does this for them and the other special symbols.
The test case is based on the test provided by James Henderson at the bug page!
Alina Sbirlea [Fri, 22 Feb 2019 07:18:37 +0000 (07:18 +0000)]
[MemorySSA & LoopPassManager] Resolve PR40038.
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.
The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.
Craig Topper [Fri, 22 Feb 2019 07:03:25 +0000 (07:03 +0000)]
[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads.
When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits.
We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results.
So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR.
Craig Topper [Fri, 22 Feb 2019 01:49:50 +0000 (01:49 +0000)]
[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract.
Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those.
But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results.
By promoting the input at the same time we can create only a single any_extend or truncate.
There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch.
Lang Hames [Fri, 22 Feb 2019 01:19:53 +0000 (01:19 +0000)]
Disable Kaleidoscope tests on Win32 -- looks like they're still failing there.
The Kaleidoscope tests were re-enabled in r354630, but are still failing on
Windows. This patch disables them on that platform until the failure can be
investigated.
Lang Hames [Thu, 21 Feb 2019 22:24:53 +0000 (22:24 +0000)]
[Kaleidoscope] Re-enable Kaleidoscope tests.
These were disabled in r246267 (back in 2015). I suspect that the Win32 issues
that caused them to be disabled at the time have been resovlved, but if not
we can disable them again while we sort those out.
Craig Topper [Thu, 21 Feb 2019 22:00:15 +0000 (22:00 +0000)]
[X86] Remove hasSideEffects=1 from the X87 pseudos with folded load.
This was done in r321424 to prevent scheduling from reordering things. But now that we model FPCW as a dependency, I don't think the same scheduling we were trying to prevent can occur.
Alina Sbirlea [Thu, 21 Feb 2019 19:54:05 +0000 (19:54 +0000)]
[LoopSimplifyCFG] Update MemorySSA after r353911.
Summary:
MemorySSA is not properly updated in LoopSimplifyCFG after recent changes. Use SplitBlock utility to resolve that and clear all updates once handleDeadExits is finished.
All updates that follow are removal of edges which are safe to handle via the removeEdge() API.
Also, deleting dead blocks is done correctly as is, i.e. delete from MemorySSA before updating the CFG and DT.
Mark Searles [Thu, 21 Feb 2019 18:19:54 +0000 (18:19 +0000)]
[AMDGPU] remove unused AssemblerPredicates
An internal build is hitting asserts complaining about too many subtarget
features:
llvm/utils/TableGen/Types.cpp:42:
const char* llvm::getMinimalTypeForEnumBitfield(uint64_t):
Assertion `MaxIndex <= 64 && "Too many bits"' failed.
The short-term solution is to remove a few unused AssemblerPredicates to get
under the limit.
The long-term solution seems to be to revisit these asserts. E.g., rather than
hardcoded '64', use the standard sized std::bitset like the other places that
track subtarget features.
Jordan Rupprecht [Thu, 21 Feb 2019 17:05:19 +0000 (17:05 +0000)]
[llvm-objcopy][NFC] More error cleanup
Summary:
This removes calls to `error()`/`reportError()` in the main driver (llvm-objcopy.cpp) as well as the associated argv-parsing (CopyConfig.cpp). `logAllUnhandledErrors()` is now the main way to print errors.
There are still a few uses from within the per-arch drivers, so we can't delete them yet... but almost!
Sam Clegg [Thu, 21 Feb 2019 17:05:19 +0000 (17:05 +0000)]
[WebAssembly] Don't create MSSymbolWasm object for non-symbols
`__linear_memory` and `__indirect_function_table` are both generated
as imports in wasm object files but are actually symbols and don't
appear in any symbols table or relocation entry. Indeed we
don't have any symbol type to meaningfully represent either of them.
Jordan Rupprecht [Thu, 21 Feb 2019 16:45:42 +0000 (16:45 +0000)]
[llvm-objcopy] Make removeSectionReferences batched
Summary:
Removing a large number of sections from a file with a lot of symbols can have abysmal (i.e. O(n^2)) performance, e.g. when running `--only-section` to extract one section out of a large file.
This comes from iterating over all symbols in the symbol table each time we remove a section, to remove references to the section we just removed.
Instead, do just one pass of symbol removal by passing a hash set of all the sections we'd like to remove references to.
This fixes a regression when running llvm-objcopy -j <one section> on an object file with many sections and symbols -- on my machine, running `objcopy -j .keep_me huge-input.o /tmp/foo.o` takes .3s with GNU objcopy, 1.3s with an updated llvm-objcopy, and 7+ minutes with llvm-objcopy prior to this patch.
Sanjay Patel [Thu, 21 Feb 2019 16:01:48 +0000 (16:01 +0000)]
[DAGCombiner] prevent infinite looping by truncating 'and' (PR40793)
This fold can occur during legalization, so it can fight with promotion
to the larger type. It apparently takes a special sequence and subtarget
to avoid more basic simplifications that would hide the problem.
But there's a bigger question raised here: why does distributeTruncateThroughAnd()
even exist? It duplicates functionality from a more minimal pattern that we
already have. But getting rid of this function requires some preliminary steps.
Matt Arsenault [Thu, 21 Feb 2019 15:48:13 +0000 (15:48 +0000)]
RegBankSelect: Allow targets to introduce control flow for mapping
For AMDGPU, if an operand requires an SGPR but is only available as a
VGPR, a loop needs to be introduced to execute the instruction with
each unique combination of values across all lanes. The rest of the
instructions in the block will be moved to a new block following the
loop. Check if the next instruction's parent changed, and update the
iterators and insertion block if this happened.
James Henderson [Thu, 21 Feb 2019 12:47:10 +0000 (12:47 +0000)]
[llvm-readobj]Add testing for ELF symbol and section table printing for a wider range of values
The existing ELF symbol and section table testing doesn't test many of
the corner-cases or valid values for various ELF properties, including
things like binding, visibility, section type and so on. This patch adds
a series of tests that test these and other related edge-cases.
Simon Pilgrim [Thu, 21 Feb 2019 12:24:49 +0000 (12:24 +0000)]
[X86][SSE] combineX86ShufflesRecursively - moved to generic op input index lookup. NFCI.
We currently bail if the target shuffle decodes to more than 2 input vectors, this change alters the input index to work for any number of inputs for when we drop that requirement.