Ahmed Bougacha [Wed, 15 Mar 2017 18:22:37 +0000 (18:22 +0000)]
[GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch comparison blocks.
Ahmed Bougacha [Wed, 15 Mar 2017 18:22:33 +0000 (18:22 +0000)]
[GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable; the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.
getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything. Rename it and extract the MBB creation logic out of it.
A couple tests were sensitive to the order. Update them.
Zachary Turner [Wed, 15 Mar 2017 17:47:39 +0000 (17:47 +0000)]
[YAML] When outputting, provide the ability to write default values.
Previously, if you attempted to write a key/value pair and the
value was equal to the key's default value, we would not output
the value. Sometimes it is useful to be able to see this value
in the output anyway.
Reid Kleckner [Wed, 15 Mar 2017 17:43:40 +0000 (17:43 +0000)]
Move some LAST_* enum sentinels out of their enums
These are not valid values of the enum, so this will improve clang
-Wcovered-switch-default diagnostics. It also fixes some
-Wbitfield-enum-conversion warnings.
Craig Topper [Wed, 15 Mar 2017 16:53:53 +0000 (16:53 +0000)]
[CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places
This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom.
In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent.
CodeGen: Use the source filename as the argument to .file, rather than the module ID.
Using the module ID here is wrong for a couple of reasons:
1) The module ID is not persisted, so we can end up with different
object file contents given the same input file (for example if the same
file is accessed via different paths).
2) With ThinLTO the module ID field may contain the path to a bitcode file,
which is incorrect, as the .file argument is supposed to contain the path to
a source file.
Simon Pilgrim [Wed, 15 Mar 2017 15:40:34 +0000 (15:40 +0000)]
[SelectionDAG][AArch64] Add test case showing incorrect SelectionDAG::ComputeNumSignBits BUILD_VECTOR handling
Reduced from a mixture of PR32273 and David Green's test cases showing SelectionDAG::ComputeNumSignBits not correctly handling BUILD_VECTOR implicit truncation of inputs.
Petar Jovanovic [Wed, 15 Mar 2017 13:10:08 +0000 (13:10 +0000)]
[Mips] Add support to match more patterns for DEXT and CINS
This patch adds support for recognizing more patterns to match to DEXT and
CINS instructions.
It finds cases where multiple instructions could be replaced with a single
DEXT or CINS instruction.
Sam Parker [Wed, 15 Mar 2017 10:21:23 +0000 (10:21 +0000)]
[ARM] Fix for branch label disassembly for Thumb
Different MCInstrAnalysis classes for arm and thumb mode, each with
their own evaluateBranch implementation. I added a test case and
fixed the coff-relocations test to use '<label>:' rather than
'<label>' in the CHECK-LABEL entries, since the ones without the
colon would match branch targets. Might be worth noticing that
llvm-objdump does not lookup the relocation and thus assigns it a
target depending on the encoded immediate which #0, so it thinks it
branches to the next instruction.
Eric Liu [Wed, 15 Mar 2017 08:41:00 +0000 (08:41 +0000)]
[Support][CommandLine] Make it possible to get error messages from ParseCommandLineOptions when ignoring errors.
Summary:
Previously, ParseCommandLineOptions returns false and ignores error messages
when IgnoreErrors. It would be useful to also return error messages if users
decide to check parsing result instead of having the program exit on error.
Sam Parker [Wed, 15 Mar 2017 08:27:11 +0000 (08:27 +0000)]
[ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.
Michal Gorny [Wed, 15 Mar 2017 05:57:29 +0000 (05:57 +0000)]
[llvm-config] Add minimal sanity tests for path options
Add minimal tests that check whether path options do not fail and output
directories looking like expected. Requested in
https://reviews.llvm.org/rL291218.
Taewook Oh [Wed, 15 Mar 2017 05:44:59 +0000 (05:44 +0000)]
[BranchFolding] Merge debug locations from common tail instead of removing
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical insturctions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assign it to the instruction in the common tail, so that the debug locations are maintained if they are same across identical instructions.
Ensure that prefix data is preserved with subsections-via-symbols
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data it’s own symbol, and mark the actual function
entry an .alt_entry.
Daniel Sanders [Tue, 14 Mar 2017 21:32:08 +0000 (21:32 +0000)]
[globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.
Simon Pilgrim [Tue, 14 Mar 2017 21:26:58 +0000 (21:26 +0000)]
[SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.
ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.
At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.
Derek Schuff [Tue, 14 Mar 2017 20:23:22 +0000 (20:23 +0000)]
[WebAssembly] Use LEB encoding for value types
Previously we were using the encoded LEB hex values
for the value types. This change uses the decoded
negative value and the LEB encoder to write them out.
Rafael Espindola [Tue, 14 Mar 2017 19:57:13 +0000 (19:57 +0000)]
Archives require a symbol table on Solaris, even if empty.
On Solaris ld (and some other tools that use the underlying utility
libraries, such as elfdump) chokes on an archive library that has no
symbol table. The Solaris tools always create one, even if it's empty.
That bug has been fixed in the latest development line, and can
probably be backported to a supported release, but it would be nice if
LLVM's archiver could emit the empty symbol table, too.
Sanjay Patel [Tue, 14 Mar 2017 18:06:28 +0000 (18:06 +0000)]
[DAG] vector div/rem with any zero element in divisor is undef
This is the backend counterpart to:
https://reviews.llvm.org/rL297390
https://reviews.llvm.org/rL297409
and follow-up to:
https://reviews.llvm.org/rL297384
It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic,
but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those
someday.
Dehao Chen [Tue, 14 Mar 2017 17:33:01 +0000 (17:33 +0000)]
SamplePGO ThinLTO ICP fix for local functions.
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:
1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName
This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:
1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.
This patch refactors the PHisToFix loop as follows:
- The loop itself now resides in its own method.
- The new method iterates on scalar-loop's header; the PHIsToFix map formerly
propagated as an output parameter and filled during phi widening is removed.
- The code handling reductions is moved into its own method, similar to the
existing fixFirstOrderRecurrence().
Oliver Stannard [Tue, 14 Mar 2017 13:50:10 +0000 (13:50 +0000)]
[ARM] Diagnose ARM MOVT without :lower16: or :upper16: expression
This instruction was missing from the list of opcodes that we check, so we were
hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT
instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT
instructions.
Refactoring Cost Model's selectVectorizationFactor() so that it handles only the
selection of the best VF from a pre-computed range of candidate VF's, extracting
early-exit criteria and the computation of a MaxVF upper-bound to other methods,
all driven by a newly introduced LoopVectorizationPlanner.
Daniel Berlin [Tue, 14 Mar 2017 11:25:45 +0000 (11:25 +0000)]
Make PredIteratorCache size() logically const. Do not require copying predecessors to get size.
Summary:
Every single benchmark i can run, on large and small cfgs, fully
connected, etc, across 3 different platforms (x86, arm., and PPC) says
that the current pred iterator cache is a losing proposition.
I can't find a case where it's faster than just walking preds, and in some cases, it's 5-10% slower.
This is due to copying the preds.
It also degrades into copying the entire cfg.
The one operation that is occasionally faster is the cached size.
This makes that operation faster by not relying on having the copies available.
I'm not even sure that is faster enough to be worth it. I, again, have
trouble finding cases where this takes long enough in a pass to be
worth caching compared to a million other things they could cache or
improve.
My suggestion:
We next remove the get() interface.
We do stronger benchmarking of size().
We probably end up killing this entire cache.
/
Oliver Stannard [Tue, 14 Mar 2017 10:13:17 +0000 (10:13 +0000)]
[ValueTracking] Out of range shifts might be undef
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.
In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.
Sam Parker [Tue, 14 Mar 2017 09:13:22 +0000 (09:13 +0000)]
[ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.
Oren Ben Simhon [Tue, 14 Mar 2017 09:09:26 +0000 (09:09 +0000)]
Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
Nirav Dave [Tue, 14 Mar 2017 01:42:23 +0000 (01:42 +0000)]
Recommitting Craig Topper's patch now that r296476 has been recommitted.
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
Nirav Dave [Tue, 14 Mar 2017 00:34:14 +0000 (00:34 +0000)]
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Zachary Turner [Mon, 13 Mar 2017 23:28:25 +0000 (23:28 +0000)]
Add the beginning of PDB diffing support.
For now this only diffs the stream directory and the MSF
Superblock. Future patches will drill down into individual
streams to find out where the differences lie.
Adrian Prantl [Mon, 13 Mar 2017 22:56:14 +0000 (22:56 +0000)]
Revert "Debug Info: Add basic support for external types references."
This reverts commit r242302. External type refs of this form were
never used by any LLVM frontend so this is effectively dead code.
(They were introduced to support clang module debug info, but in the
end we came up with a better design that doesn't use this feature at
all.)
Rui Ueyama [Mon, 13 Mar 2017 22:19:05 +0000 (22:19 +0000)]
Make FileOutputBuffer fail early if you pass a directory.
Previously, it created a temporary directory and then failed when
FileOutputBuffer tried to rename that file to the destination file
(which is actually a directory name).
[IPRA] Change algorithm for RegUsageInfoCollector.
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.
Juergen Ributzka [Mon, 13 Mar 2017 21:34:07 +0000 (21:34 +0000)]
[Support] Test directory iterators and recursive directory iterators with broken symlinks.
This commit adds a unit test to the file system tests to verify the behavior of
the directory iterator and recursive directory iterator with broken symlinks.
Simon Pilgrim [Mon, 13 Mar 2017 21:23:29 +0000 (21:23 +0000)]
[X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.
This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.