Matthias Braun [Fri, 4 Dec 2015 01:51:19 +0000 (01:51 +0000)]
ScheduleDAGInstrs: Rework schedule graph builder.
Re-comitting with a change that avoids undefined uses getting put into
the VRegUses list.
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
(not used for now)
The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.
See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.
This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.
[PGO] Unify VP data format between raw and indexed profile (Reader)
With the latest refactoring and code sharing patches landed,
it is possible to unify the value profile implementation between
raw and indexed profile. This is the patch in raw profile reader
that uses the common interface.
Cong Hou [Fri, 4 Dec 2015 00:36:58 +0000 (00:36 +0000)]
Don't punish vectorized arithmetic instruction whose type will be split to multiple registers
Currently in LLVM's cost model, a vectorized arithmetic instruction will have
high cost if its type is split into multiple registers. However, this
punishment is too heavy and unnecessary. The overhead of the split should not
be on arithmetic instructions but instructions that implement the split. Note
that during vectorization we have calculated the register pressure, and we
only choose proper interleaving factor (and also vectorization factor) so
that we don't use more registers than the maximum number.
Here is a very simple example: if a vadd has the cost 1, and if we double VF
so that we need two registers to perform it, then its cost will become 4 with
the current implementation, which will prevent us to use larger VF.
[llvm-profdata] Add support for weighted merge of profile data
This change adds support for an optional weight when merging profile data with the llvm-profdata tool.
Weights are specified by adding an option ':<weight>' suffix to the input file names.
Adding support for arbitrary weighting of input profile data allows for relative importance to be placed on the
input data from multiple training runs.
Both sampled and instrumented profiles are supported.
JF Bastien [Thu, 3 Dec 2015 23:43:56 +0000 (23:43 +0000)]
CodeGen peephole: fold redundant phys reg copies
Code generation often exposes redundant physical register copies through
virtual registers such as:
%vreg = COPY %PHYSREG
...
%PHYSREG = COPY %vreg
There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.
This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two tests live
at the same time, and after machine sinking LLVM has confused itself
enough and things spilling EFLAGS is a great idea even though it's
never restored and the comparison results are both live.
Before this patch we have:
DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
%vreg1<def> = COPY %EFLAGS; GR64:%vreg1
%EFLAGS<def> = COPY %vreg1; GR64:%vreg1
JNE_1 <BB#1>, %EFLAGS<imp-use>
Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.
dec is especially confusing to LLVM when compared with sub.
I wrote this patch to treat all physical registers generically, but only
remove redundant copies of non-allocatable physical registers because
the allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyways.
The following tests used to failed when the patch also replaced allocatable
registers:
CodeGen/X86/StackColoring.ll
CodeGen/X86/avx512-calling-conv.ll
CodeGen/X86/copy-propagation.ll
CodeGen/X86/inline-asm-fpstack.ll
CodeGen/X86/musttail-varargs.ll
CodeGen/X86/pop-stack-cleanup.ll
CodeGen/X86/preserve_mostcc64.ll
CodeGen/X86/tailcallstack64.ll
CodeGen/X86/this-return-64.ll
This happens because COPY has other special meaning for e.g. dependency
breakage and x87 FP stack.
Dan Gohman [Thu, 3 Dec 2015 23:07:03 +0000 (23:07 +0000)]
[WebAssembly] Fix dominance check for PHIs in the StoreResult pass
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.
Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.
David Majnemer [Thu, 3 Dec 2015 22:45:19 +0000 (22:45 +0000)]
[Analysis] Become aware of MSVC's new/delete functions
The compiler can take advantage of the allocation/deallocation
function's properties. We knew how to do this for Itanium but had no
support for MSVC-style functions.
Chih-Hung Hsieh [Thu, 3 Dec 2015 22:02:40 +0000 (22:02 +0000)]
[X86] Part 1 to fix x86-64 fp128 calling convention.
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.
* Relax type legalization checks to accept new f128 type configuration,
whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified
simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.
Keno Fischer [Thu, 3 Dec 2015 21:27:59 +0000 (21:27 +0000)]
[RuntimeDyld] DenseMap -> std::unordered_map
DenseMap is most applicable when both keys and values are small.
In this case, the value violates that assumption, causing quite
significant memory overhead. A std::unordered_map is more appropriate
in this case (or at least fixed the memory problems I was seeing).
Easwaran Raman [Thu, 3 Dec 2015 20:57:37 +0000 (20:57 +0000)]
Interface to attach maximum function count from PGO to module as module flags.
This provides interface to get and set maximum function counts to Module. This
would allow things like determination of function hotness. The actual setting
of this max function count will have to be done in the frontend.
Chris Bieneman [Thu, 3 Dec 2015 18:45:39 +0000 (18:45 +0000)]
[CMake] Add option LLVM_EXTERNALIZE_DEBUGINFO
Summary: This adds support for generating dSYM files and stripping debug info from executables and dylibs. It also supports passing -object_path_lto to the linker to generate dSYMs for LTO builds.
Teresa Johnson [Thu, 3 Dec 2015 18:20:05 +0000 (18:20 +0000)]
[ThinLTO] Appending linkage fixes
Summary:
Fix import from module with appending var, which cannot be imported. The
first fix is to remove an overly-aggressive error check.
The second fix is to deal with restructuring introduced to the module
linker yesterday in r254418 (actually, this fix was included already
in r254559, just added some additional cleanup).
Colin LeMahieu [Thu, 3 Dec 2015 16:37:21 +0000 (16:37 +0000)]
[Hexagon] NFC Using canonicalizePacket to compound/duplex/pad packets rather than doing it separately. This also ensures the integrated assembler path matches the assembly parser path.
Marina Yatsina [Thu, 3 Dec 2015 12:17:03 +0000 (12:17 +0000)]
[X86] MS inline asm: produce error when encountering "<type> ptr <reg name>"
Currently "<type> ptr <reg name>" treated as <reg name> in MS inline asm, ignoring the "<type> ptr" completely and possibly ignoring the intention of the user.
Fixed llvm to produce an error when encountering "<type> ptr <reg name>" operands.
For example: andpd xmm1,xmmword ptr xmm1 --> andpd xmm1, xmm1
though andpd has 2 possible matching formats - andpd xmm, xmm/m128
Andy Gibbs [Thu, 3 Dec 2015 08:20:20 +0000 (08:20 +0000)]
Fix class SCEVPredicate has virtual functions and accessible non-virtual destructor.
It is not enough to simply make the destructor virtual since there is a g++ 4.7
issue (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53613) that throws the
error "looser throw specifier for ... overridding ~SCEVPredicate() noexcept".
Craig Topper [Thu, 3 Dec 2015 05:57:37 +0000 (05:57 +0000)]
[TableGen] Remove an assumption about the order of encodings in the MVT::SimpleValueType enum. Instead of assuming the types are sorted by size, scan the typeset arrays to find the smallest/largest type. NFC
This works mostly fine but breaks some stage 1 builders when compiling
compiler-rt on i386. Revert for further investigation as I can't see an
obvious cause/fix.
Mehdi Amini [Thu, 3 Dec 2015 02:37:23 +0000 (02:37 +0000)]
Remove "ExportingModule" from ThinLTO Index (NFC)
There is no real reason the index has to have the concept of an
exporting Module. We should be able to have one single unique
instance of the Index, and it should be read-only after creation
for the whole ThinLTO processing.
The linker plugin should be able to process multiple modules (in
parallel or in sequence) with the same index.
The only reason the ExportingModule was present seems to be to
implement hasExportedFunctions() that is used by the Module linker
to decide what to do with the current Module.
For now I replaced it with a query to the map of Modules path to
see if this module was declared in the Index and consider that if
it is the case then it is probably exporting function.
On the long term the Linker interface needs to evolve and this
call should not be needed anymore.
Matthias Braun [Thu, 3 Dec 2015 02:05:27 +0000 (02:05 +0000)]
ScheduleDAGInstrs: Rework schedule graph builder.
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
(not used for now)
The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.
Alexey Samsonov [Wed, 2 Dec 2015 21:25:28 +0000 (21:25 +0000)]
[PowerPC] Remove wild call to RegScavenger::initRegState().
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).
Alexey Samsonov [Wed, 2 Dec 2015 21:13:43 +0000 (21:13 +0000)]
[Hexagon] Remove std::hex in favor of format().
std::hex is not used anywhere in LLVM code base except for this place,
and it has a known undefined behavior (at least in libstdc++ 4.9.3):
https://llvm.org/bugs/show_bug.cgi?id=18156, which fires in UBSan
bootstrap of LLVM.
Kyle Butt [Wed, 2 Dec 2015 18:58:51 +0000 (18:58 +0000)]
[CodeGen]: Fix bad interaction with AntiDep breaking and inline asm.
AggressiveAntiDepBreaker was renaming registers specified by the user
for inline assembly. While this will work for compiler-specified
registers, it won't work for user-specified registers, and at the time
this runs, I don't currently see a way to distinguish them.
Fiona Glaser [Wed, 2 Dec 2015 18:32:59 +0000 (18:32 +0000)]
Scheduler / Regalloc: use unique_ptr[] instead of std::vector
vector.resize() is significantly slower than memset in many STLs
and the cost of initializing these vectors is significant on targets
with many registers. Since we don't need the overhead of a vector,
use a simple unique_ptr instead.
[llvm-profdata] Change instr prof counter overflow to saturate rather than discard
Summary: This changes overflow handling during instrumentation profile merge. Rathar than throwing away records that would result in counter overflow, merged counts are instead clamped to the maximum representable value. A warning about counter overflow is still surfaced to the user as before.
Tim Northover [Wed, 2 Dec 2015 18:12:57 +0000 (18:12 +0000)]
AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.
|9B DD /7| FSTSW m2byte| Valid Valid Store FPU status word at m2byteafter checking for pending unmasked floating-point exceptions.|
|9B DF E0| FSTSW AX| Valid Valid Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7 |FNSTSW *m2byte| Valid Valid Store FPU status word at m2bytewithout checking for pending unmasked floating-point exceptions.|
|DF E0 |FNSTSW *AX| Valid Valid Store FPU status word in AX register without checking for pending unmasked floating-point exceptions|
m2byte is word register, and therefor instruction operand need to be change from f32mem to i16mem.
Andy Gibbs [Wed, 2 Dec 2015 14:22:18 +0000 (14:22 +0000)]
Fix buildbots broken by r254508
g++ 4.7 does not allow an inline defaulted virtual destructor to be overridden,
giving the error "looser throw specifier for ... overridding ~SCEVPredicate()
noexcept (true)" (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53613).
The work-around given in the bug report above has been utilised here.