This change returns empty PSet list for M0 register. Otherwise its
PSet as defined by tablegen is SReg_32. This results in incorrect
register pressure calculation every time an instruction uses M0.
Such uses count as SReg_32 PSet and inadequately increase pressure
on SGPRs.
Eric Fiselier [Fri, 10 Feb 2017 01:59:20 +0000 (01:59 +0000)]
[CMake] Fix pthread handling for out-of-tree builds
LLVM defines `PTHREAD_LIB` which is used by AddLLVM.cmake and various projects
to correctly link the threading library when needed. Unfortunately
`PTHREAD_LIB` is defined by LLVM's `config-ix.cmake` file which isn't installed
and therefore can't be used when configuring out-of-tree builds. This causes
such builds to fail since `pthread` isn't being correctly linked.
This patch attempts to fix that problem by renaming and exporting
`LLVM_PTHREAD_LIB` as part of`LLVMConfig.cmake`. I renamed `PTHREAD_LIB`
because It seemed likely to cause collisions with downstream users of
`LLVMConfig.cmake`.
Marcos Pividori [Fri, 10 Feb 2017 01:40:28 +0000 (01:40 +0000)]
[libFuzzer] Export external functions on tests.
We need to export external functions so they are found when calling
GetProcAddress() on Windows. But we can't use `__declspec(dllexport)` because
we want the targets to be completely independent from the fuzz engines and don't
depend on other header files. Also, we don't want to include platform specific
code managed with conditional macros.
So, the solution is to add the exported symbols with linker flags in cmake.
Marcos Pividori [Fri, 10 Feb 2017 01:35:46 +0000 (01:35 +0000)]
[libFuzzer] Use dynamic loading for External Functions on Windows.
Replace weak aliases with dynamic loading.
Weak aliases were generating some problems when linking for MT on Windows. For
MT, compiler-rt's libraries are statically linked to the main executable the
same than libFuzzer, so if we use weak aliases, we are providing two different
default implementations for the same weak function and the linker fails.
In this diff I re implement ExternalFunctions() using dynamic loading, so it
works in both cases (MD and MT). Also, dynamic loading is simpler, since we are
not defining any auxiliary external function, and we don't need to deal with
weak aliases.
This is equivalent to the implementation using dlsym(RTLD_DEFAULT, FnName) for
Posix.
Dan Gohman [Fri, 10 Feb 2017 00:02:58 +0000 (00:02 +0000)]
[Support] Extend SLEB128 encoding support.
Add support for padded SLEB128 values, and support for writing SLEB128
values to buffers rather than to ostreams, similar to the existing
ULEB128 support.
The pygments syntax highlighting package used by sphinx fails to parse
newer LLVM constructs or valid (at least to me) gas constructs like
`.secrel32 _function_name + 0`.
Disable this particular warning so the build doesn't abort as fixing
pygments doesn't seem a workable option here.
This needs explicit requires of the optimization remark emission before
loop pass pipelines containing LICM as we no longer get it from the
inliner -- Argument Promotion may invalidate it. Technically the inliner
could also have broken this, but it never came up in testing.
[PM] Port ArgumentPromotion to the new pass manager.
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.
The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.
[PM/LCG] Teach LCG to support spurious reference edges.
Somewhat amazingly, this only requires teaching it to clean them up when
deleting a dead function from the graph. And we already have exactly the
necessary data structures to do that in the parent RefSCCs.
This allows ArgPromote to work in a much simpler way be merely letting
reference edges linger in the graph after the causing IR is deleted. We
will clean up these edges when we run any function pass over the IR, but
don't remove them eagerly.
This avoids all of the quadratic update issues both in the current pass
manager and in my previous attempt with the new pass manager.
[PM/LCG] Teach the LazyCallGraph how to replace a function without
disturbing the graph or having to update edges.
This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.
LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
from `Function*` edges to `Node*` edges.
All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!
Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.
The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.
The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols.
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.
Adrian McCarthy [Thu, 9 Feb 2017 21:51:19 +0000 (21:51 +0000)]
Introduce NativeRawSymbol for PDB reading.
This is a stub for a new concrete implementation of IPDBRawSymbol.
Nothing uses this uses this implementation yet. My plan is to
locally switch lldb-pdbdump from the DIA reader to the Native one
and flesh out the implementations of these method stubs in the order
they're needed.
Daniel Berlin [Thu, 9 Feb 2017 20:37:24 +0000 (20:37 +0000)]
GraphTraits: Add range versions of graph traits functions (graph_nodes, graph_children, inverse_graph_nodes, inverse_graph_children).
Summary:
Convert all obvious node_begin/node_end and child_begin/child_end
pairs to range based for.
Sending for review in case someone has a good idea how to make
graph_children able to be inferred. It looks like it would require
changing GraphTraits to be two argument or something. I presume
inference does not happen because it would have to check every
GraphTraits in the world to see if the noderef types matched.
Note: This change was 3-staged with clang as well, which uses
Dominators/etc from LLVM.
Frederic Riss [Thu, 9 Feb 2017 19:41:55 +0000 (19:41 +0000)]
[dsymutil] Fix handling of empty CUs in LTO links.
r288399 introduced the DIEUnit class, and in the process broke
the corner case where dsymutil generates an empty CU during an
LTO link. This restores the logic and adds a test for the corner
case.
Sanjoy Das [Thu, 9 Feb 2017 19:40:22 +0000 (19:40 +0000)]
[JumpThreading] Thread through guards
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:
if (cond) { do something } else {deoptimize}
Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:
if (cond1) {
// code A
} else {
// code B
}
// code C
guard(cond2)
// code D
If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:
if (cond1) {
// code A
// code C
} else {
// code B
// code C
guard(cond2)
}
// code D
Thus, removing the guard from one of execution branches.
Vedant Kumar [Thu, 9 Feb 2017 19:37:18 +0000 (19:37 +0000)]
[utils] coverage: Add help text about the --restrict flag (NFC)
Passing the --restrict flag to the coverage prep script before other
positional arguments is wrong, because it prevents the argparse module
from telling apart arguments to --restrict versus positional arguments.
ld64 requires its archive members to be 8-byte aligned for 64-bit
content and 4-byte aligned for 32-bit content. Opt for the larger
alignment requirement. This ensures that ld64 can consume archives
generated by llvm-ar.
Thanks to Kevin Enderby for the hint about the ld64/cctools behaviours!
Geoff Berry [Thu, 9 Feb 2017 18:28:17 +0000 (18:28 +0000)]
[SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen. Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.
Kevin Enderby [Thu, 9 Feb 2017 17:56:26 +0000 (17:56 +0000)]
Tweak the implementation of llvm-objdump’s -objc-meta-data option so
that it works when the ObjC metadata sections end up in the
__DATA_CONST or __DATA_DIRTY segments.
Summary:
Documentation update to reflect the changes that occured in the allocator:
- additional architectures support;
- modification of the header;
- options default values for 32 & 64-bit.
Artur Pilipenko [Thu, 9 Feb 2017 15:13:40 +0000 (15:13 +0000)]
Add DAGCombiner load combine tests for partially available values
If some of the trailing or leading bytes of a load combine pattern are zeroes we can combine the pattern to a load + zext and shift. Currently we don't support it, so the tests check the current codegen without load combine. This change will make the patch to support this kind of combine a bit more clear.
David Bozier [Thu, 9 Feb 2017 15:08:40 +0000 (15:08 +0000)]
[Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.
This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled.
David Bozier [Thu, 9 Feb 2017 14:12:30 +0000 (14:12 +0000)]
[docs] cleanup documentation on lit substitutions
1. Added missing substitutions to the documentation in docs/TestingGuide.rst
2. Modified docs/CommandGuide/lit.rst to only document the "base" set of substitutions and to refer the reader to docs/TestingGuide.rst for more detailed info on substitutions.
Simon Pilgrim [Thu, 9 Feb 2017 11:50:19 +0000 (11:50 +0000)]
[X86][SSE] Attempt to break register dependencies during lowerBuildVector
LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.
This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.
On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.)
Craig Topper [Thu, 9 Feb 2017 06:51:02 +0000 (06:51 +0000)]
[X86] Remove the HLE feature flag.
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
Craig Topper [Thu, 9 Feb 2017 04:27:34 +0000 (04:27 +0000)]
[X86] Clzero intrinsic and its addition under znver1
This patch does the following.
1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.
cctools would pad the string table to a sizeof(int32_t) (explicitly
printed out by cctools rather than 4). This adjusts the string table to
make it more compatible with cctools, but is insufficient to make ld64
happy.
SwiftCC: swifterror register cannot be as the base register
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be
the same register. Use a different callee save register for swifterror instead:
LowerTypeTests: Change a few vtable globals in tests to constants.
It turns out that some of our negative tests were not in fact providing the
test coverage we expected: they were passing because the vtables were failing
an early check that they were constant. Fix this by changing the globals in
these tests to constants.
Tim Northover [Wed, 8 Feb 2017 23:23:32 +0000 (23:23 +0000)]
GlobalISel: translate @llvm.pow intrinsic to G_FPOW.
It'll usually be immediately legalized back to a libcall, but occasionally
something can be done with it so we'd just as well enable that flexibility from
the start.
[ARM/AArch ISel] SwiftCC: First parameters that are marked swiftself are not 'this returns'
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implict def of x8
Tim Northover [Wed, 8 Feb 2017 21:22:15 +0000 (21:22 +0000)]
GlobalISel: expand mul-with-overflow into mul-hi on AArch64.
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.
This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.
Note, these are used not only by scheduling, but in places like
MachineLICM.
Hans Wennborg [Wed, 8 Feb 2017 20:58:33 +0000 (20:58 +0000)]
build_llvm_package.bat: Build teh clang-format plugin separately
In r293373 we switched the build to linking dynamically against the
Universal CRT and include the redistributables in the installer.
However, clang-format.exe is copied into the vsix and needs to be
statically linked. This commit makes us build the plugin in a separate
step that uses static linking.
ThinLTOBitcodeWriter: Strip debug info from merged module.
This module will contain nothing but vtable definitions and (soon)
available_externally function definitions, so there is no point in keeping
debug info in the module.
[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction.
Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction.
The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution.
This patch includes the following changes:
- Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision.
- Arrays of Uniform and Scalar values are moved from Legality to Cost Model.
- Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form.
- Vectorization of memory instruction is performed according to the CM decision.