Alexey Samsonov [Wed, 2 Dec 2015 21:25:28 +0000 (21:25 +0000)]
[PowerPC] Remove wild call to RegScavenger::initRegState().
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).
Alexey Samsonov [Wed, 2 Dec 2015 21:13:43 +0000 (21:13 +0000)]
[Hexagon] Remove std::hex in favor of format().
std::hex is not used anywhere in LLVM code base except for this place,
and it has a known undefined behavior (at least in libstdc++ 4.9.3):
https://llvm.org/bugs/show_bug.cgi?id=18156, which fires in UBSan
bootstrap of LLVM.
Kyle Butt [Wed, 2 Dec 2015 18:58:51 +0000 (18:58 +0000)]
[CodeGen]: Fix bad interaction with AntiDep breaking and inline asm.
AggressiveAntiDepBreaker was renaming registers specified by the user
for inline assembly. While this will work for compiler-specified
registers, it won't work for user-specified registers, and at the time
this runs, I don't currently see a way to distinguish them.
Fiona Glaser [Wed, 2 Dec 2015 18:32:59 +0000 (18:32 +0000)]
Scheduler / Regalloc: use unique_ptr[] instead of std::vector
vector.resize() is significantly slower than memset in many STLs
and the cost of initializing these vectors is significant on targets
with many registers. Since we don't need the overhead of a vector,
use a simple unique_ptr instead.
[llvm-profdata] Change instr prof counter overflow to saturate rather than discard
Summary: This changes overflow handling during instrumentation profile merge. Rathar than throwing away records that would result in counter overflow, merged counts are instead clamped to the maximum representable value. A warning about counter overflow is still surfaced to the user as before.
Tim Northover [Wed, 2 Dec 2015 18:12:57 +0000 (18:12 +0000)]
AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.
|9B DD /7| FSTSW m2byte| Valid Valid Store FPU status word at m2byteafter checking for pending unmasked floating-point exceptions.|
|9B DF E0| FSTSW AX| Valid Valid Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7 |FNSTSW *m2byte| Valid Valid Store FPU status word at m2bytewithout checking for pending unmasked floating-point exceptions.|
|DF E0 |FNSTSW *AX| Valid Valid Store FPU status word in AX register without checking for pending unmasked floating-point exceptions|
m2byte is word register, and therefor instruction operand need to be change from f32mem to i16mem.
Andy Gibbs [Wed, 2 Dec 2015 14:22:18 +0000 (14:22 +0000)]
Fix buildbots broken by r254508
g++ 4.7 does not allow an inline defaulted virtual destructor to be overridden,
giving the error "looser throw specifier for ... overridding ~SCEVPredicate()
noexcept (true)" (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53613).
The work-around given in the bug report above has been utilised here.
AVX-512: Updated cost of FP/SINT/UINT conversion operations
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).
Akira Hatanaka [Wed, 2 Dec 2015 06:58:49 +0000 (06:58 +0000)]
[AttributeSet] Overload AttributeSet::addAttribute to reduce compile
time.
The new overloaded function is used when an attribute is added to a
large number of slots of an AttributeSet (for example, to function
parameters). This is much faster than calling AttributeSet::addAttribute
once per slot, because AttributeSet::getImpl (which calls
FoldingSet::FIndNodeOrInsertPos) is called only once per function
instead of once per slot.
With this commit, clang compiles a file which used to take over 22
minutes in just 13 seconds.
Craig Topper [Wed, 2 Dec 2015 06:39:19 +0000 (06:39 +0000)]
[X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-it shuffle combines on legal vector types.
David Blaikie [Wed, 2 Dec 2015 06:21:34 +0000 (06:21 +0000)]
[llvm-dwp] Emit a rather fictional debug_cu_index
This is very rudimentary support for debug_cu_index, but it is enough to
allow llvm-dwarfdump to find the offsets for contributions and
correctly dump debug_info.
It will need to actually find the real signature of the unit and build
the real hash table with the right number of buckets, as per the DWP
specification.
It will also need to be expanded to cover the tu_index as well.
[sanitizer coverage] when adding a bb trace instrumentation, do it instead, not in addition to, regular coverage. Do the regular coverage in the run-time instead
Mehdi Amini [Wed, 2 Dec 2015 02:00:29 +0000 (02:00 +0000)]
Modify FunctionImport to take a callback to load modules
When linking static archive, there is no individual module files to
load. Instead they can be mmap'ed and could be initialized from a
buffer directly. The callback provide flexibility to override the
scheme for loading module from the summary.
Tim Northover [Wed, 2 Dec 2015 00:33:54 +0000 (00:33 +0000)]
AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).
The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
For the struct with trailing objects, define
a member operator delete. Without this, the program
will fail when -fsized-deallocation option is used
where the wrong size will be passed to the global
delete operator.
Cong Hou [Tue, 1 Dec 2015 21:50:20 +0000 (21:50 +0000)]
Fix a bug in IfConversion.cpp.
The bug is introduced in r254377 which failed some tests on ARM, where a new
probability is assigned to a successor but the provided BB may not be a
successor.
Justin Bogner [Tue, 1 Dec 2015 20:20:49 +0000 (20:20 +0000)]
IR: Clean up some duplicated code in ConstantDataSequential creation. NFC
ConstantDataArray::getImpl and ConstantDataVector::getImpl had a lot
of copy pasta in how they handled sequences of constants. Break that
out into a couple of simple functions.
Matt Arsenault [Tue, 1 Dec 2015 19:57:17 +0000 (19:57 +0000)]
AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.
This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
Matt Arsenault [Tue, 1 Dec 2015 19:08:39 +0000 (19:08 +0000)]
AMDGPU: Report extractelement as free in cost model
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.
It was only used from LTO for a debug feature, and LTO can just create
another linker.
It is pretty odd to have a method to reset the module in the middle of a
link. It would make IdentifiedStructTypes inconsistent with the Module
for example.
David Blaikie [Tue, 1 Dec 2015 18:07:07 +0000 (18:07 +0000)]
[llvm-dwp] Correctly update debug_str_offsets.dwo when linking dwo files
This doesn't deduplicate strings in the debug_str section, nor does it
properly wire up the index so that debug_info can /find/ these strings,
but it does correct the str_offsets specifically.
Follow up patches to address those related/next issues.
Make appending var linking less of a special case.
It has to be a bit special because:
* materializeInitFor is not really supposed to call replaceAllUsesWith.
The caller has a plain variable with Dst and expects just the
initializer to be set, not for it to be removed.
* Calling mutateType as we used to do before gets some type
inconsistency which breaks the bitcode writer.
* If linkAppendingVarProto create a dest decl with the correct type to
avoid the above problems, it needs to put the original dst init in
some side table for materializeInitFor to use.
In the end the simplest solution seems to be to just have
linkAppendingVarProto do all the work and set ValueMap[SrcGV to avoid
recursion.
The difference is that now we don't error on out-of-comdat access to
internal global values. We copy them instead. This seems to match the
expectation of COFF linkers (see pr25686).
Original message:
Start deciding earlier what to link.
A traditional linker is roughly split in symbol resolution and
"copying
stuff".
The two tasks are badly mixed in lib/Linker.
This starts splitting them apart.
With this patch there are no direct call to linkGlobalValueBody or
linkGlobalValueProto. Everything is linked via WapValue.
This also includes a few fixes:
* A GV goes undefined if the comdat is dropped (comdat11.ll).
* We error if an internal GV goes undefined (comdat13.ll).
* We don't link an unused comdat.
The first two match the behavior of an ELF linker. The second one is
equivalent to running globaldce on the input.