This fold hit the trifecta:
1. It was untested.
2. It oversteps (multiuse is not checked, so increases instruction count).
3. It is incomplete (doesn't work for vectors).
Revert r308078 (and subsequent tweak in r308079) which introduces a test
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.
r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
[PM/LCG] Teach the LazyCallGraph to maintain reference edges from every
function to every defined function known to LLVM as a library function.
LLVM can introduce calls to these functions either by replacing other
library calls or by recognizing patterns (such as memset_pattern or
vector math patterns) and replacing those with calls. When these library
functions are actually defined in the module, we need to have reference
edges to them initially so that we visit them during the CGSCC walk in
the right order and can effectively rebuild the call graph afterward.
This was discovered when building code with Fortify enabled as that is
a common case of both inline definitions of library calls and
simplifications of code into calling them.
This can in extreme cases of LTO-ing with libc introduce *many* more
reference edges. I discussed a bunch of different options with folks but
all of them are unsatisfying. They either make the graph operations
substantially more complex even when there are *no* defined libfuncs, or
they introduce some other complexity into the callgraph. So this patch
goes with the simplest possible solution of actual synthetic reference
edges. If this proves to be a memory problem, I'm happy to implement one
of the clever techniques to save memory here.
Simon Atanasyan [Sat, 15 Jul 2017 07:14:25 +0000 (07:14 +0000)]
[mips] Handle the `long-calls` feature flags in the MIPS backend
If the `long-calls` feature flags is enabled, disable use of the `jal`
instruction. Instead of that call a function by by first loading its
address into a register, and then using the contents of that register.
Matt Arsenault [Sat, 15 Jul 2017 05:52:59 +0000 (05:52 +0000)]
AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.
Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.
Yonghong Song [Sat, 15 Jul 2017 05:41:42 +0000 (05:41 +0000)]
bpf: generate better lowering code for certain select/setcc instructions
Currently, for code like below,
===
inner_map = bpf_map_lookup_elem(outer_map, &port_key);
if (!inner_map) {
inner_map = &fallback_map;
}
===
the compiler generates (pseudo) code like the below:
===
I1: r1 = bpf_map_lookup_elem(outer_map, &port_key);
I2: r2 = 0
I3: if (r1 == r2)
I4: r6 = &fallback_map
I5: ...
===
During kernel verification process, After I1, r1 holds a state
map_ptr_or_null. If I3 condition is not taken
(path [I1, I2, I3, I5]), supposedly r1 should become map_ptr.
Unfortunately, kernel does not recognize this pattern
and r1 remains map_ptr_or_null at insn I5. This will cause
verificaiton failure later on.
Kernel, however, is able to recognize pattern "if (r1 == 0)"
properly and give a map_ptr state to r1 in the above case.
LLVM here generates suboptimal code which causes kernel verification
failure. This patch fixes the issue by changing BPF insn pattern
matching and lowering to generate proper codes if the righthand
parameter of the above condition is a constant. A test case
is also added.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308080 91177308-0d34-0410-b5e6-96231b3b80d8
Teresa Johnson [Sat, 15 Jul 2017 04:53:05 +0000 (04:53 +0000)]
[ThinLTO] Ensure we always select the same function copy to import
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Jakub Kuderski [Sat, 15 Jul 2017 01:27:16 +0000 (01:27 +0000)]
[Dominators] Fix reachable visitation and reenable a unit test
This fixes a minor bug in insertion to a reachable node that caused
DominatorTree.InsertDeleteExhaustive flakiness. The patch also adds
a new testcase for this exact failure.
Jakub Kuderski [Fri, 14 Jul 2017 23:49:12 +0000 (23:49 +0000)]
[Dominators] Temporarily disable a flaky unit test
The DominatorTree.InsertDeleteExhaustive uses a RNG with a
constant seed to generate different sequences of updates. The test
fails on some buildbots and this patch disables it for now.
[libFuzzer] Allow non-fuzzer args after -ignore_remaining_args=1
With this change, libFuzzer will ignore any arguments after a sigil
argument, but it will preserve these arguments at the end of the
command line when launching subprocesses. Using this, its possible to
handle positional and single-dash arguments to the program under test
by discarding everything up to -ignore_remaining_args=1 in
LLVMFuzzerInitialize.
Jakub Kuderski [Fri, 14 Jul 2017 21:58:53 +0000 (21:58 +0000)]
[Dominators] Implement incremental deletions
Summary:
This patch implements incremental edge deletions.
It also makes DominatorTreeBase store a pointer to the parent function. The parent function is needed to perform full rebuilts during some deletions, but it is also used to verify that inserted and deleted edges come from the same function.
[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1)
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.
The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.
[TableGen][MC] Fix a few places where we didn't hide the underlying type of LaneBitmask very well.
One place compared with 32, which I've replaced with LaneBitmask::BitWidth.
The other places are shifts of a constant 1 by a lane number. But if LaneBitmask were to be a larger type than 32-bits like 64-bits, the 1 would need to be 1ULL to do a 64-bit shift. To hide this I've added a LanebitMask::getLane that hides the shift and make sures the 1 is casted to correct type first.
Jakub Kuderski [Fri, 14 Jul 2017 18:26:09 +0000 (18:26 +0000)]
[Dominators] Make IsPostDominator a template parameter
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
Alfred Huang [Fri, 14 Jul 2017 17:56:55 +0000 (17:56 +0000)]
[AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Jakub Kuderski [Fri, 14 Jul 2017 16:56:35 +0000 (16:56 +0000)]
[Dominators] Simplify block and node printing
Summary:
This patch adds `BlockPrinter`-- a small wrapper for printing CFG nodes and DomTree nodes to `raw_ostream`. It is meant to be only used internally, for debugging and printing errors.
[Hexagon] Add intrinsics for data cache operations
This is the LLVM part, adding definitions for
void @llvm.hexagon.Y2.dccleana(i8*)
void @llvm.hexagon.Y2.dccleaninva(i8*)
void @llvm.hexagon.Y2.dcinva(i8*)
void @llvm.hexagon.Y2.dczeroa(i8*)
void @llvm.hexagon.Y4.l2fetch(i8*, i32)
void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.
Nirav Dave [Fri, 14 Jul 2017 13:56:21 +0000 (13:56 +0000)]
Improve Aliasing of operations to static alloca
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Simon Dardis [Fri, 14 Jul 2017 13:44:12 +0000 (13:44 +0000)]
Reland "[mips][mt][6/7] Add support for mftr, mttr instructions.""
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
The last version of this patch broke one of the expensive checks buildbots,
this version changes the failing test/MC/Mips/mt/invalid.s and other invalid
tests to write the errors to a file and run FileCheck on that, rather than
relying on the 'not llvm-mc ... <%s 2>&1 | Filecheck %s' idiom.
[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since
that caused some sanitizer tests to fail (so the previous patch
Differential Revision: https://reviews.llvm.org/D34511
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).
Sam Parker [Fri, 14 Jul 2017 08:23:56 +0000 (08:23 +0000)]
[ARM] Allow rematerialization of ARM Thumb literal pool loads
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.
Max Kazantsev [Fri, 14 Jul 2017 06:35:03 +0000 (06:35 +0000)]
[IRCE] Fix corner case with Start = INT_MAX
When iterating through loop
for (int i = INT_MAX; i > 0; i--)
We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.
In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.
Also added a test that ensures that a pre-loop is generated for such loops.
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.
Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.
[DWARF] Introduce verification for the unit header chain in .debug_info section to llvm-dwarfdump.
This patch adds verification checks for the unit header chain in the .debug_info section.
Specifically, for each unit in the .debug_info section, the verifier checks that:
The unit length is valid (i.e. the unit can actually fit in the .debug_info section)
The dwarf version of the unit is valid
The address size is valid (4 or 8)
The unit type (if the unit is in dwarf5) is valid
The debug_abbrev_offset is valid
[PDB] Fix quadratic behavior when writing a BinaryItemStream
Binary streams are an abstraction over a discontiguous buffer. To write
a discontiguous buffer, we want to copy each contiguous chunk
individually. Currently BinaryStreams do not expose a way to iterate
over the chunks, so the code repeatedly calls
readLongestContiguousChunk() with an increasing offset. In order to
lookup the chunk by offset, we would iterate the items list to figure
out which chunk the offset is within. This is obviously O(n^2).
Instead, pre-compute a table of offsets and do a binary search to figure
out which chunk to use. This is still only an O(n^2) to O(n log n)
improvement, but it's a very local fix that seems worth doing.
This improves self-linking lld.exe with PDBs from 90s to 10s.
Jakub Kuderski [Thu, 13 Jul 2017 21:16:01 +0000 (21:16 +0000)]
[Dominators] Add CFGBuilder testing utility
Summary:
This patch introduces a new testing utility for building and modifying CFG -- CFGBuilder. The primary use case for the utility is testing the upcoming incremental dominator tree update API.
The current design provides a simple mechanism of constructing arbitrary graphs and then applying series of updates to them. CFGBuilder takes care of creating empty functions, connecting and disconnecting basic blocks. Under the hood it uses SwitchInst and UnreachableInst.
It will be also possible to create a thin wrapper over CFGBuilder for parsing string input and to hook it up to other textual tools (e.g. opt used with FileCheck).
Jakub Kuderski [Thu, 13 Jul 2017 20:45:32 +0000 (20:45 +0000)]
[Dominators] Simplify templates
Summary: DominatorTreeBase and related classes used overcomplicated template machinery. This patch simplifies them and gets rid of DominatorTreeBaseTraits and DominatorTreeBaseByTraits, which weren't actually used outside the DomTree construction.
Jakub Kuderski [Thu, 13 Jul 2017 20:35:01 +0000 (20:35 +0000)]
[Dominators] Split SemiNCA into smaller functions
Summary:
This patch splits the SemiNCA algorithm into smaller functions. It also adds a new debug macro.
In order to perform incremental updates, we need to be able to refire SemiNCA on a subset of CFG nodes (determined by a DFS walk results). We also need to skip nodes that are not deep enough in a DomTree.
Summary:
This fixes type indices for SDK or CRT static archives. Previously we'd
try to look next to the archive object file path, which would not exist
on the local machine.
Also error out if we can't resolve a type server record. Hypothetically
we can recover from this error by discarding debug info for this object,
but that is not yet implemented.
George Karpenkov [Thu, 13 Jul 2017 19:26:27 +0000 (19:26 +0000)]
[lit] add a -vv option to echo all executed commands.
Debugging LIT scripts can be rather painful, as LIT directly does not
specify which line has failed.
Rather, FileCheck is expected to report the failing location, but it can
be often ambiguous if multiple commands are tested against the same
prefix. This change adds a -vv option, which echoes all output.
Then detecting the error becomes straightforward: last printed line is
the failing one.
Of course, it could be desired to try to get failing line number
directly from bash, but it involves excessive hacks on older bash
versions (cf.
https://stackoverflow.com/questions/24398691/how-to-get-the-real-line-number-of-a-failing-bash-command)
Jakub Kuderski [Thu, 13 Jul 2017 18:55:52 +0000 (18:55 +0000)]
[Dominators] Improve reachability verification
Summary:
This patch improves verification by making `verifyReachablility` look for CFG not found in the DomTree.
It also makes the verification work with postdominators by handling virtual root.
[PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Martin Storsjo [Thu, 13 Jul 2017 17:03:12 +0000 (17:03 +0000)]
[AArch64] Implement support for windows style vararg functions
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
Anna Thomas [Thu, 13 Jul 2017 13:21:23 +0000 (13:21 +0000)]
[RuntimeUnrolling] Update DomTree correctly when exit blocks have successors
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.