Justin Bogner [Tue, 8 Dec 2015 02:37:48 +0000 (02:37 +0000)]
AsmPrinter: Use emitGlobalConstantFP to emit elements of constant data
It's strange to duplicate the logic for emitting FP values into
emitGlobalConstantDataSequential, and it's even stranger that we end
up printing the verbose assembly comments differently between the two
paths. Just call into emitGlobalConstantFP rather than crudely
duplicating its logic.
Philip Reames [Tue, 8 Dec 2015 00:10:56 +0000 (00:10 +0000)]
[PassManager] Tuning Memory Usage of AnalysisUsage
We were using unneccessarily large initial sizes for these SmallVectors. This was wasting around 50kb of memory for the O3 pipeline, even after the uniquing changes. We're still using around 20kb which is a bit much, but it's definitely better. This is about a 6% improvement in total O3 memory usage.
Note: The raw data on structure size which were used to pick these thresholds can be found in the review thread.
Philip Reames [Mon, 7 Dec 2015 22:41:23 +0000 (22:41 +0000)]
Reapply 254950 w/fix
254950 ended up being not NFC. The previous code was overriding the flags for whether an instruction read or wrote memory using the target specific flags returned via TTI. I'd missed this in my refactoring. Since I mistakenly built only x86 and didn't notice the number of unsupported tests, I didn't catch that before the original checkin.
This raises an interesting issue though. Given we have function attributes (i.e. readonly, readnone, argmemonly) which describe the aliasing of intrinsics, why does TTI have this information overriding the instruction definition at all? I see no reason for this, but decided to preserve existing behavior for the moment. The root issue might be that we don't have a "writeonly" attribute.
Original commit message:
[EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]
Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.
The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
Mehdi Amini [Mon, 7 Dec 2015 22:27:19 +0000 (22:27 +0000)]
Remove useless hack that avoids calling LLVMLinkInInterpreter()
This is supposed to force-link the Interpreter, by inserting a dead
call to LLVMLinkInInterpreter().
Since it is actually an empty function, there is no reason for the
call to be dead.
Philip Reames [Mon, 7 Dec 2015 21:27:15 +0000 (21:27 +0000)]
[EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]
Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.
The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
Easwaran Raman [Mon, 7 Dec 2015 21:21:20 +0000 (21:21 +0000)]
Use updated threshold for indirect call bonus
When considering foo->bar inlining, if there is an indirect call in foo which gets resolved to a direct call (say baz), then we try to inline baz into bar with a threshold T and subtract max(T - Cost(bar->baz), 0) from Cost(foo->bar). This patch uses max(Threshold(bar->baz) - Cost(bar->baz)) instead, where Thresheld(bar->baz) could be different from T due to bonuses or subtractions. Threshold(bar->baz) - Cost(bar->baz) better represents the desirability of inlining baz into bar.
Teresa Johnson [Mon, 7 Dec 2015 19:21:11 +0000 (19:21 +0000)]
[ThinLTO] Support for specifying function index from pass manager
Summary:
Add a field on the PassManagerBuilder that clang or gold can use to pass
down a pointer to the function index in memory to use for importing when
the ThinLTO backend is triggered. Add support to supply this to the
function import pass.
Sanjay Patel [Mon, 7 Dec 2015 17:39:48 +0000 (17:39 +0000)]
Tighten checks so we can see existing codegen
The 2-element vector case shows a surprising bug: we failed to
eliminate ops on undefs, so there are 4 fmax calls even though
there can only be 2 valid elements in the inputs.
Aaron Ballman [Mon, 7 Dec 2015 15:44:34 +0000 (15:44 +0000)]
Silence all C4592 warnings with MSVC 2015 Update 1. This warning produces false positives that Microsoft says will be fixed in Update 2. Until this produces reliable diagnostics, it is safe to disable the diagnostic -- the compiler is not doing anything different than it previously did aside from issuing the diagnostic.
(Note, this silences at least one false positive in LLVM with FeatureBitset uses.)
Teresa Johnson [Mon, 7 Dec 2015 15:05:44 +0000 (15:05 +0000)]
[ThinLTO] Support cloning of temporary DILocation metadata
This is needed to support linking of module-level metadata as a
postpass after function importing, where we will be leaving temporary
metadata on imported instructions until the postpass metadata import.
VX-512: Fixed a bug in FP logic operation lowering
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.
Daniel Sanders [Mon, 7 Dec 2015 14:12:44 +0000 (14:12 +0000)]
[mips][ias] Removed DSP/DSPr2 instructions from base architecture valid-xfail.s's.
Summary:
valid-xfail.s is for instructions that should be valid in the given ISA but
incorrectly fail. DSP/DSPr2 instructions are correct to fail since DSP/DSPr2 is
not enabled.
AVX-512: Fixed masked load / store instruction selection for KNL.
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.
This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.
All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.
Keno Fischer [Sun, 6 Dec 2015 23:05:38 +0000 (23:05 +0000)]
[Verifier] Fix !dbg validation if Scope is the Subprogram
Summary:
We are inserting both Scope and SP into the Seen map and check whether
it was already there in which case we skip the validation (the idea
being that we already checked this Subprogram before). However,
if (Scope == SP) as MDNodes, then inserting the Scope, will trigger
the Seen check causing us to incorrectly not validate this !dbg
attachment. Fix this by not performing the SP Seen check if Scope == SP
Sanjay Patel [Sun, 6 Dec 2015 18:05:12 +0000 (18:05 +0000)]
[x86] add missing maxnum/minnum tests for 256-bit vectors
Also, switch to x86-64 because once we can lower these to something
more reasonable, there will be less noise in the checks. And add
AVX runs because those will be different than SSE.
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.
Craig Topper [Sat, 5 Dec 2015 17:34:07 +0000 (17:34 +0000)]
[Hexagon] Don't call getNumImplicitDefs and then iterate over the count. getNumImplicitDefs contains a loop so its better to just loop over the null terminated implicit def list. NFC
Keno Fischer [Sat, 5 Dec 2015 14:42:34 +0000 (14:42 +0000)]
[ASAN] Add doFinalization to reset state
Summary: If the same pass manager is used for multiple modules ASAN
complains about GlobalsMD being initialized twice. Fix this by
resetting GlobalsMD in a new doFinalization method to allow this
use case.
Different version of indexed format may use different
name uniquing schemes for static functions. Pass the
version info to the name interface so that different
schmes can be picked (for profile lookup).
David Blaikie [Sat, 5 Dec 2015 03:41:53 +0000 (03:41 +0000)]
[llvm-dwp] Add coverage for both the presence and absence of type units, and fix/remove the emission of a broken tu_index when no type units are present
Dan Gohman [Sat, 5 Dec 2015 03:03:35 +0000 (03:03 +0000)]
[WebAssembly] Implement ReverseBranchCondition, and re-enable MachineBlockPlacement
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.