Daniel Sanders [Thu, 26 Nov 2015 11:23:03 +0000 (11:23 +0000)]
[mips][ias] Explicitly disable IAS on tests that depend on not assembling.
Summary:
no-odd-spreg-msa.ll: This test deliberately uses an odd-numbered register
in inline assembly and expects the compiler to insert a move to an
even-numbered register.
inlineasm-operand-code.ll and inlineasm_constraint.ll:
Checks for IAS's output will be added once a matcher bug is resolved. This bug
causes the canonical output emitted by IAS to be incorrect for uimm16 constants
with the MSB set. We will still need the non-IAS checks at this point since
these tests primarily test formatting of operands.
X86-FMA3: Improved/enabled the memory folding optimization for scalar loads
generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated
for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}().
Reviewer: David Kreitzer
Differential Revision: http://reviews.llvm.org/D14762
Teach LLVM optimize to more precisely in the presence of "deopt" operand
bundles. "deopt" operand bundles imply that the call they're attached
to is at least `readonly` (i.e. they don't imply clobber semantics), and
they don't capture their bundle operands.
[PGO] Implement ValueProfiling Closure interfaces for runtime value profile data
This is one of the many steps to commonize value profiling support between profile
runtime and compiler/llvm tools.
After this change, profiler runtime now can share the same C APIs to do VP
serialization/deseriazation with LLVM host tools (and produces value data
in identical format between indexed and raw profile).
It is not yet enabled in profiler runtime yet.
Also added a unit test case to test runtime profile data serialization/deserialization
interfaces implemented using common closure code.
Richard Diamond [Wed, 25 Nov 2015 22:49:48 +0000 (22:49 +0000)]
Fix a use-after-free in `llvm-config`.
Summary:
This could happen if `GetComponentNames` is true, because `Name` from
`VisitComponent` would reference a stack instance of `std::string` in
`ComputeLibsForComponents`.
[Hexagon] Treat transfers of FP immediates are pseudo instructions
This is a temporary fix to address ICE on 2005-10-21-longlonggtu.ll.
The proper fix will be to use A2_tfrsi, but it will need more work to
teach all users of A2_tfrsi to also expect a floating-point operand.
Marek Olsak [Wed, 25 Nov 2015 21:22:45 +0000 (21:22 +0000)]
AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function
It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.
Dan Gohman [Wed, 25 Nov 2015 21:13:02 +0000 (21:13 +0000)]
[WebAssembly] Fix WebAssembly register numbering for registers added late.
If virtual registers are created late, mappings to WebAssembly
registers need to be added explicitly. This patch adds a function
to do so and teaches WebAssemblyPeephole to use it. This fixes
an out-of-bounds access on the WARegs vector.
Artyom Skrobov [Wed, 25 Nov 2015 19:41:11 +0000 (19:41 +0000)]
Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Dan Gohman [Wed, 25 Nov 2015 19:36:19 +0000 (19:36 +0000)]
[WebAssembly] Use a physical register to describe ARGUMENT liveness.
Instead of trying to move ARGUMENT instructions back up to the top after
they've been scheduled or sunk down, use a fake physical register to
create a liveness constraint that prevents ARGUMENT instructions from
moving down in the first place. This is still not entirely ideal, however
it is more robust than letting them move and moving them back.
Eric Christopher [Wed, 25 Nov 2015 09:11:53 +0000 (09:11 +0000)]
Fix some places where we were assuming that memory type had been legalized
to a simple type when lowering a truncating store of a vector type. In this
case for an EVT we'll return Expand as we should in all of the cases anyhow.
The testcase triggered at the one in VectorLegalizer::LegalizeOp, inspection
found the rest.
[PGO] Convert InstrProfRecord based serialization methods to use common C methods
1. Convert serialization methods using InstrProfRecord as source into C (impl)
interfaces using Closure.
2. Reimplement InstrProfRecord serialization method to use new C interface
as dummy wrapper.
Now it is ready to implement wrapper for runtime value profile data.
(The new code need better source location -- but not changed in this patch to
minimize diffs. )
Matthias Braun [Wed, 25 Nov 2015 00:50:47 +0000 (00:50 +0000)]
Doxygen: Use mathjax to create formulas.
The main motivation is to not require a latex installation when building
the documentation. I would also expect a better image quality and the
ability to copy&paste from formulas with a javascript based solution for
displaying the math.
Teresa Johnson [Tue, 24 Nov 2015 22:55:46 +0000 (22:55 +0000)]
[ThinLTO] Add option to limit importing based on instruction count
Add a simple initial heuristic to control importing based on the number
of instructions recorded in the function's summary. Add option to
control the limit, and test using option.
Rong Xu [Tue, 24 Nov 2015 22:50:34 +0000 (22:50 +0000)]
[PGO] Relax test cases in PGO instrumentation
Fix buildbot failure for clang-x86_64-linux-selfhost-modules.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/8866
The failing test cases are newly added from r254021. It seems the IR has a
different order in this platform. In this patch, I temporarily relax the test
case to make the build green. I'll have a complete fix (more robust way to test)
soon.
Diego Novillo [Tue, 24 Nov 2015 22:38:37 +0000 (22:38 +0000)]
SamplePGO - Add test for hot/cold inlined functions.
When the original binary is executed and sampled, the resulting profile
contains information on the original inline stack. We currently follow
the original inline plan if we notice that the inlined callsite has more
than 0 samples to it.
A better way is to determine whether the callsite is actually worth
inlining. If the callsite accumulates a small fraction of the samples
spent in the parent function, then we don't want to bother inlining it
(as it means that the callsite is actually cold).
This patch introduces a threshold expressed in percentage of samples
in relation to the parent function. If the callsite uses less than N%
of the total samples used by its parent, the original inline decision is
not re-applied.
I've set the threshold to the very arbitrary value of 5%. I'm yet to do
any actual experiments to see what's a good value. I wanted to separate
the basic mechanism from the tuning.
Evgeniy Stepanov [Tue, 24 Nov 2015 21:44:16 +0000 (21:44 +0000)]
[msan] Relax origin-alignment test.
Change origin-alignment test to test only the alignment of the origin
store, and not the exact instruction sequence used to compute the
address. This makes the test less fragile and, in particular, lets it
pass both with the old and new MSan ABIs.
Rong Xu [Tue, 24 Nov 2015 21:31:25 +0000 (21:31 +0000)]
[PGO] MST based PGO instrumentation infrastructure
This patch implements a minimum spanning tree (MST) based instrumentation for
PGO. The use of MST guarantees minimum number of CFG edges getting
instrumented. An addition optimization is to instrument the less executed
edges to further reduce the instrumentation overhead. The patch contains both the
instrumentation and the use of the profile to set the branch weights.
Sanjoy Das [Tue, 24 Nov 2015 20:37:01 +0000 (20:37 +0000)]
[RuntimeDyld] Fix a class of arithmetic errors introduced in r253918
r253918 had refactored expressions like "A - B.Address + C" to "A -
B.getAddressWithOffset(C)". This is incorrect, since the latter really
computes "A - B.Address - C".
None of the tests I can run locally on x86 broke due to this bug, but it
is the current suspect for breakage on the AArch64 buildbots.
Simon Pilgrim [Tue, 24 Nov 2015 20:31:46 +0000 (20:31 +0000)]
[X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.
Teresa Johnson [Tue, 24 Nov 2015 19:55:04 +0000 (19:55 +0000)]
[ThinLTO] Enable iterative importing in FunctionImport pass
Analyze imported function bodies and add any new external calls to
the worklist for importing. Currently no controls on the importing
so this will end up importing everything possible in the call tree
below the importing module. Basic profitability checks coming next.
Update test to check for iteratively inlined functions.
Cong Hou [Tue, 24 Nov 2015 19:51:26 +0000 (19:51 +0000)]
[X86] Fix several issues related to X86's psadbw instruction.
This patch fixes the following issues:
1. Fix the return type of X86psadbw: it should not be the same type of inputs.
For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.
Teresa Johnson [Tue, 24 Nov 2015 19:46:58 +0000 (19:46 +0000)]
[ThinLTO] Handle previously imported and promoted locals in module linker
The new function import pass exposed an issue when we import references
to local values on multiple importing passes. They are renamed on each
import pass, and we need to ensure that the already promoted and renamed
references existing in the dest module are correctly identified and
updated so that they aren't spuriously renamed again (due to a perceived
conflict with the newly linked reference).
The closure is designed to abstact away two types of value profile
data:
- InstrProfRecord which is the primary data structure used to
represent profile data in host tools (reader, writer, and profile-use)
- value profile runtime data structure suitable to be used by C
runtime library.
Both sources of data need to serialize to disk/memory-buffer in common
format: ValueProfData.
The abstraction allows compiler-rt's raw profiler writer to share
the same code with indexed profile writer.
As shown in the test case and PR25554:
https://llvm.org/bugs/show_bug.cgi?id=25554
This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction.
This patch deletes one pair of these defs.
Sadly, this won't fix the original test case in the bug report. Something else is still broken.
Matt Arsenault [Tue, 24 Nov 2015 12:05:03 +0000 (12:05 +0000)]
AMDGPU: Split x8 and x16 vector loads instead of scalarize
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.
Pavel Labath [Tue, 24 Nov 2015 09:46:01 +0000 (09:46 +0000)]
Fix non-PIC build after 253959
CMAKE_EXE_LINKER_FLAGS is a string. Appending a flag using list(APPEND) introduces an extra
semicolon which breaks stuff. Change this to append the value in the same way that everyone else
seems to be doing.
Cong Hou [Tue, 24 Nov 2015 08:51:23 +0000 (08:51 +0000)]
Let SelectionDAG start to use probability-based interface to add successors.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Chris Bieneman [Tue, 24 Nov 2015 08:04:59 +0000 (08:04 +0000)]
[CMake] When disabling PIC, also pass -fno-pie when linking if it is supported.
Building clang with -fno-pie generates slightly faster code. In my not-very-rigorous testing I saw about a 4% speed up using the clang test-suite sources.