Davide Italiano [Fri, 21 Oct 2016 01:37:02 +0000 (01:37 +0000)]
Revert "[GVN/PRE] Hoist global values outside of loops."
There's no agreement about this patch. I personally find the
PRE machinery of the current GVN hard enough to reason about
that I'm not sure I'll try to land this again, instead of working
on the rewrite).
Keno Fischer [Thu, 20 Oct 2016 22:15:56 +0000 (22:15 +0000)]
Fix cross-endianness RuntimeDyld relocation for ARM
rL284780 fixed the PREL31 relocation and added a test for it. Being
the first such test for ARM relocations, it exposed incorrect endianness
assumptions (causing buildbot failures on big-endian hosts). Fix that by
using the same helpers used for the x86 case.
Daniel Berlin [Thu, 20 Oct 2016 20:13:45 +0000 (20:13 +0000)]
[MSSA] Avoid unnecessary use walks when calling getClobberingMemoryAccess
Summary:
This allows us to mark when uses have been optimized.
This lets us avoid rewalking (IE when people call getClobberingAccess on everything), and also
enables us to later relax the requirement of use optimization during updates with less cost.
Kevin Enderby [Thu, 20 Oct 2016 20:10:30 +0000 (20:10 +0000)]
Another additional error check for invalid Mach-O files for the
load commands that use the MachO::twolevel_hints_command type
which includes only the LC_TWOLEVEL_HINTS load command.
This is not used in llvm libObject code or in llvm tool code. But
does appear in one of the binary test files. While this load command is
obsolete it is easier to add code for it in libObject than edit or change
the binary test case.
Zachary Turner [Thu, 20 Oct 2016 18:31:19 +0000 (18:31 +0000)]
[CodeView] Refactor serialization to use StreamInterface.
This was all using ArrayRef<>s before which presents a problem
when you want to serialize to or deserialize from an actual
PDB stream. An ArrayRef<> is really just a special case of
what can be handled with StreamInterface though (e.g. by using
a ByteStream), so changing this to use StreamInterface allows
us to plug in a PDB stream and get all the record serialization
and deserialization for free on a MappedBlockStream.
Subsequent patches will try to remove TypeTableBuilder and
TypeRecordBuilder in favor of class that operate on
Streams as well, which should allow us to completely merge
the reading and writing codepaths for both types and symbols.
Dehao Chen [Thu, 20 Oct 2016 18:06:52 +0000 (18:06 +0000)]
Using branch probability to guide critical edge splitting.
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)
Updated the promote-vec3.ll test after this change. The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Benjamin Kramer [Thu, 20 Oct 2016 15:36:38 +0000 (15:36 +0000)]
[Support] Remove llvm::alignOf now that all uses are gone.
Also clean up the legacy hacks for AlignedCharArray. I'm keeping
LLVM_ALIGNAS alive for a bit longer because GCC 4.8.0 (which we still
support apparently) shipped a buggy alignas(). All other supported
compilers have a working alignas.
Benjamin Kramer [Thu, 20 Oct 2016 12:20:28 +0000 (12:20 +0000)]
Do a sweep over move ctors and remove those that are identical to the default.
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
Pavel Labath [Thu, 20 Oct 2016 12:05:50 +0000 (12:05 +0000)]
Reapply "Add Chrono.h - std::chrono support header"
This is a resubmission of r284590. The mingw build should be fixed now. The
problem was we were matching time_t with _localtime_64s, which was incorrect on
_USE_32BIT_TIME_T systems. Instead I use localtime_s, which should always
evaluate to the correct function.
Victor Leschuk [Thu, 20 Oct 2016 00:13:12 +0000 (00:13 +0000)]
DebugInfo: preparation to implement DW_AT_alignment
- Add alignment attribute to DIVariable family
- Modify bitcode format to match new DIVariable representation
- Update tests to match these changes (also add bitcode upgrade test)
- Expect that frontend passes non-zero align value only when it is not default
(was forcibly aligned by alignas()/_Alignas()/__atribute__(aligned())
Kevin Enderby [Wed, 19 Oct 2016 23:44:34 +0000 (23:44 +0000)]
Next set of additional error checks for invalid Mach-O files for the
load commands that use the MachO::thread_command type
but are not used in llvm libObject code but used in llvm tool code.
This includes the LC_UNIXTHREAD and LC_THREAD
load commands.
A quick note about the philosophy of the error checking in
libObject for Mach-O files, the idea behind the checking is
that we never will return a Mach-O file out of libObject that
contains unknown things in the load commands.
To do this the 32-bit ARM and PPC general tread states
needed to be defined as two test case binaries contained
them. If other thread states for other CPUs need to be
added we will do that as needed.
Going forward the LC_MAIN load command is used to
set the entry point in Mach-O executables these days
instead of an LC_UNIXTHREAD as was done in the past.
So today only in core files are LC_THREAD load commands
and thread states usually found.
Other thread states have not yet been defined in
include/Support/MachO.h at this time. But that can be
added as needed with their corresponding checking also
added.
Rong Xu [Wed, 19 Oct 2016 22:51:17 +0000 (22:51 +0000)]
[PGO] Fix bogus warning for merging empty llvm profile file
Profile runtime can generate an empty raw profile (when there is no function in
the shared library). This empty profile is treated as a text format profile. A
test format profile without the flag of "#IR" is thought to be a clang
generated profile. So in llvm profile merging, we will get a bogus warning of
"Merge IR generated profile with Clang generated profile."
The fix here is to skip the empty profile (when the buffer size is 0) for
profile merge.
Lang Hames [Wed, 19 Oct 2016 22:41:03 +0000 (22:41 +0000)]
[BuildingAJIT] Use the remote target triple to construct the TargetMachine in
Chapter 5.
Chapter 5 demonstrates remote JITing: code is executed on the remote, not the
machine running the REPL, so it's the remote's triple (and TargetMachine) that
we need.
Lang Hames [Wed, 19 Oct 2016 22:19:38 +0000 (22:19 +0000)]
Remove the JIT EH/small code model tests for now.
These tests rely on two sections being allocated with a limited displacement
from one to the other to work. We've never guaranteed this, and consequently
these tests usually fail. That led to them being XFAILed, but now they XPASS
whenever the sections do happen to be allocated nearby in memory. So I'm
removing these for now to get rid of the noise. We can re-instate them if/when
we take the time to implement a displacement-respecting allocator.
Chris Bieneman [Wed, 19 Oct 2016 21:50:25 +0000 (21:50 +0000)]
[CMake] Make the runtimes directory work with bootstrap builds
This patch builds on clang r284648, and allows the runtime directory to make the bootstrap builds depend on the builtin libraries.
This patch also make the bootstrap build depend on configuring the other runtimes because the libcxx headers are copied during configuration. I have left a TODO in the code to remove that once I come up with a better solution.
Sanjay Patel [Wed, 19 Oct 2016 21:23:45 +0000 (21:23 +0000)]
[InstSimplify] fold negation of sign-bit
0 - X --> X, if X is 0 or the minimum signed value
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
I noticed this pattern might be created in the backend after the change from D25485,
so we'll want to add a similar fold for the DAG.
The use of computeKnownBits in InstSimplify may be something to investigate if the
compile time of InstSimplify is noticeable. We could replace computeKnownBits with
specific pattern matchers or limit the recursion.
Reid Kleckner [Wed, 19 Oct 2016 19:56:22 +0000 (19:56 +0000)]
[GlobalMerge] Handle non-landingpad EH pads
This code crashed on funclet-style EH instructions such as catchpad,
catchswitch, and cleanuppad. Just treat all EH pad instructions
equivalently and avoid merging the globals they reference through any
use.
Matthew Simpson [Wed, 19 Oct 2016 19:22:02 +0000 (19:22 +0000)]
[LV] Avoid emitting trivially dead instructions
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.
Artur Pilipenko [Wed, 19 Oct 2016 18:59:03 +0000 (18:59 +0000)]
[IndVarSimplify] Use control-dependent range information to prove non-negativity
This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example.
There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments.
Mehdi Amini [Wed, 19 Oct 2016 18:02:21 +0000 (18:02 +0000)]
[ADT] Zip range adapter
This augments the STLExtras toolset with a zip iterator and range
adapter. Zip comes in two varieties: `zip`, which will zip to the
shortest of the input ranges, and `zip_first`, which limits its
`begin() == end()` checks to just the first range.
Recommit r284035 after MSVC2013 support has been dropped.
Vedant Kumar [Wed, 19 Oct 2016 17:55:44 +0000 (17:55 +0000)]
[llvm-cov] Don't spawn a thread unless ThreadCount > 1
Initializing a ThreadPool with ThreadCount = 1 spawns a thread even
though we don't need to. This is at least slower than it needs to be,
and at worst may somehow be exacerbating PR30735 (llvm-cov times out
on ARM bots).
As a follow-up, I'll try to add logic to llvm::ThreadPool to avoid
spawning a thread when ThreadCount = 1.
Teresa Johnson [Wed, 19 Oct 2016 17:35:01 +0000 (17:35 +0000)]
[ThinLTO] Default backend threads to heavyweight_hardware_concurrency
Summary:
Changes default backend parallelism from thread::hardware_concurrency to
the new llvm::heavyweight_hardware_concurrency, which for X86 Linux
defaults to the number of physical cores (and will fall back to
thread::hardware_concurrency otherwise). This avoid oversubscribing
the physical cores using hyperthreading.
Pavel Labath [Wed, 19 Oct 2016 17:17:53 +0000 (17:17 +0000)]
Revert "Add Chrono.h - std::chrono support header"
This reverts commit r284590 as it fails on the mingw buildbot. I think I know the
fix, but I cannot test it right now. Will reapply when I verify it works ok.
Sanjay Patel [Wed, 19 Oct 2016 16:58:59 +0000 (16:58 +0000)]
[DAG] optimize negation of bool
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.
This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395
There may be others, so please let me know if you see any regressions.
[RDF] Switch RefMap in liveness calculation to use lane masks
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.
Pavel Labath [Wed, 19 Oct 2016 13:58:55 +0000 (13:58 +0000)]
Add Chrono.h - std::chrono support header
Summary:
std::chrono mostly covers the functionality of llvm::sys::TimeValue and
lldb_private::TimeValue. This header adds a bit of utility functions and
typedefs, which make the usage of the library and porting code from TimeValues
easier.
Rationale:
- TimePoint typedef - precision of system_clock is implementation defined -
using a well-defined precision helps maintain consistency between platforms,
makes it interact better with existing TimeValue classes, and avoids cases
there a time point is implicitly convertible to a specific precision on some
platforms but not on others.
- system_clock::to_time_t only accepts time_points with the default system
precision (even though time_t has only second precision on all platforms we
support). To avoid the need for explicit casts, I have added a toTimeT()
wrapper function. toTimePoint(time_t) was not strictly necessary, but I have
added it for symmetry.
Ulrich Weigand [Wed, 19 Oct 2016 13:03:18 +0000 (13:03 +0000)]
[SystemZ] Add missing vector instructions for the assembler
Most z13 vector instructions have a base form where the data type of
the operation (whether to consider the vector to be 16 bytes, 8
halfwords, 4 words, or 2 doublewords) is encoded into a mask field,
and then a set of extended mnemonics where the mask field is not
present but the data type is encoded into the mnemonic name.
Currently, LLVM only supports the type-specific forms (since those
are really the ones needed for code generation), but not the base
type-generic forms.
To complete the assembler support and make it fully compatible with
the GNU assembler, this commit adds assembler aliases for all the
base forms of the various vector instructions.
It also adds two more alias forms that are documented in the PoP:
VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc.
VNOT -- special variant of VNO
Ulrich Weigand [Wed, 19 Oct 2016 12:57:46 +0000 (12:57 +0000)]
[SystemZ] Add optional argument to some vector string instructions
The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are
documented in the Principles of Operation to have an optional last
operand to encode arbitrary values in a mask field.
This commit adds support for those optional operands, and cleans up
the patterns to generate vector string instruction as bit. No change
to code generation intended.
Michal Gorny [Wed, 19 Oct 2016 12:18:34 +0000 (12:18 +0000)]
[cmake] Declare LLVM_CMAKE_PATH for use in subprojects
Declare the LLVM_CMAKE_PATH to the source directory location of CMake
files, in order to make it possible to easily use them in subprojects.
Such a variable is already declared in most of LLVM projects
(and inconsistently mixed with direct source tree references), including
Clang, LLDB, compiler-rt, libcxx... Declaring it inside main LLVM tree
makes it possible to avoid having to declare fallback values or use
conditionals in those projects.
It should be noted that in some of the subprojects LLVM_CMAKE_PATH is
used to reference generated LLVMConfig.cmake file. However, these
references are conditional to stand-alone builds and explicitly
including this file is unnecessary in combined builds.
James Molloy [Wed, 19 Oct 2016 12:06:49 +0000 (12:06 +0000)]
[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
Sjoerd Meijer [Wed, 19 Oct 2016 07:25:06 +0000 (07:25 +0000)]
Checking FP function attribute values and adding more build attribute tests.
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).
NAKAMURA Takumi [Wed, 19 Oct 2016 05:43:17 +0000 (05:43 +0000)]
DenseSet: Appease msc18 to define derived constructors explicitly.
msc18 doesn't recognize "using BaseT::BaseT;"
llvm\include\llvm/ADT/DenseSet.h(213) : error C2875: using-declaration causes a multiple declaration of 'BaseT'
llvm\include\llvm/ADT/DenseSet.h(214) : see reference to class template instantiation 'llvm::DenseSet<ValueT,ValueInfoT>' being compiled
llvm\include\llvm/ADT/DenseSet.h(231) : error C2875: using-declaration causes a multiple declaration of 'BaseT'
llvm\include\llvm/ADT/DenseSet.h(232) : see reference to class template instantiation 'llvm::SmallDenseSet<ValueT,InlineBuckets,ValueInfoT>' being compiled
Craig Topper [Wed, 19 Oct 2016 04:44:17 +0000 (04:44 +0000)]
[AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast.
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.
New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.
There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.
The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.
[libFuzzer] extend -print_coverage to also print uncovered lines, functions, and files.
Example of output:
COVERAGE:
COVERED: in DSO2(int) /pathto/DSO2.cpp:6
COVERED: in DSO2(int) /pathto/DSO2.cpp:8
COVERED: in DSO1(int) /pathto/DSO1.cpp:6
COVERED: in DSO1(int) /pathto/DSO1.cpp:8
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:16
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:19
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:25
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:26
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO1.so
UNCOVERED_LINE: in DSO1(int) /pathto/DSO1.cpp:9
UNCOVERED_FUNC: in Uncovered1()
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO2.so
UNCOVERED_LINE: in DSO2(int) /pathto/DSO2.cpp:9
UNCOVERED_FUNC: in Uncovered2()
MODULE_WITH_COVERAGE: /pathto/LLVMFuzzer-DSOTest
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:21
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:27
UNCOVERED_FILE: /pathto/DSOTestExtra.cpp
Several things are not perfect here:
* we are using objdump+awk instead of sancov because sancov does not support DSOs yet.
* this breaks in the presence of ASAN_OPTIONS=strip_path_prefix=...
(need to implement another API to get the module name by PC)