Craig Topper [Mon, 16 Jan 2017 00:55:58 +0000 (00:55 +0000)]
[AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD
with ZMM index. Similar for SCATTER and the prefetch gather and scatter
instructions.
Justin Lebar [Sun, 15 Jan 2017 16:54:35 +0000 (16:54 +0000)]
[NVPTX] Let there be One True Way to set NVVMReflect params.
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:
* An LLVM command-line option
* Parameters to the NVVMReflect constructor
* Metadata on the module itself.
This change removes the first two, leaving only the third.
The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information. Ideally we'd have a target-generic
piece of metadata on the module. This change moves us in that
direction.
Daniel Jasper [Sun, 15 Jan 2017 16:42:36 +0000 (16:42 +0000)]
Fix un-initialized error introduced by r291959.
This is uncovered when running tools/dsymutil/X86/empty_range.s.test
with ASAN. Haven't investigate yet, whether that means there is an ODR
violation in that test.
Chandler Carruth [Sun, 15 Jan 2017 09:29:27 +0000 (09:29 +0000)]
[PM] Clean up the testing for IVUsers, especially with the new PM.
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.
No functionality changed, but it makes subsequent changes cleaner.
Chandler Carruth [Sun, 15 Jan 2017 08:20:50 +0000 (08:20 +0000)]
[PM] Teach the optimization remarks emitter to handle invalidation
events.
This pass sometimes has a pointer to BlockFrequencyInfo so it needs
custom invalidation logic. It is also otherwise immutable so we can
reduce the number of invalidations that happen substantially.
Lang Hames [Sun, 15 Jan 2017 06:34:25 +0000 (06:34 +0000)]
[Orc][RPC] Add an RPCFunctionNotSupported error type and return it from
negotiateFunction where appropriate.
Replacing the old ECError with a custom type allows us to attach the name of
the function that could not be negotiated, enabling better diagnostics for
negotiation failures.
Chandler Carruth [Sun, 15 Jan 2017 06:32:49 +0000 (06:32 +0000)]
[PM] Introduce an analysis set used to preserve all analyses over
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Craig Topper [Sun, 15 Jan 2017 05:21:29 +0000 (05:21 +0000)]
[X86] Remove untested MOVDDUP patterns.
These all involve bitcasts around the memory operands. This isn't
something we normally do for isel patterns. I suspect DAG combine should
convert the load type making this unnecessary.
Mehdi Amini [Sun, 15 Jan 2017 03:21:30 +0000 (03:21 +0000)]
Add a LLVM_USE_LINKER that defines the linker to use when building LLVM
Summary:
This string parameter is passed to -fuse-ld when linking. It can be
an absolute path to your custom linker, otherwise clang will look for
`ld.{name}`.
Rui Ueyama [Sun, 15 Jan 2017 00:36:02 +0000 (00:36 +0000)]
PDB: Add a class to create the /names stream contents.
This patch adds a new class NameHashTableBuilder which creates /names streams.
This patch contains a test to confirm that a stream created by
NameHashTableBuilder can be read by NameHashTable reader class.
Chandler Carruth [Sun, 15 Jan 2017 00:26:18 +0000 (00:26 +0000)]
[PM] The assumption cache is fundamentally designed to be self-updating,
mark it as never invalidated in the new PM.
The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.
This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.
Chandler Carruth [Sat, 14 Jan 2017 23:25:22 +0000 (23:25 +0000)]
[PM] Fix instcombine's analysis preservation in the new pass manager to
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.
To make these survive, we also need to preserve the assumption cache.
Added a test that verifies the important bits of this preservation.
David Majnemer [Sat, 14 Jan 2017 21:54:58 +0000 (21:54 +0000)]
Adding const overloads of operator* and operator-> for DenseSet iterators
This fixes some problems when building ClangDiagnostics.cpp on Visual Studio 2017 RC. As far as I understand, there was a change in the implementation of the constructor for std::vector with two iterator parameters, which in our case causes an attempt to dereference const Iterator objects. Since there was no overload for a const Iterator, the compile would fail.
Simon Pilgrim [Sat, 14 Jan 2017 13:07:22 +0000 (13:07 +0000)]
[X86][XOP] Add tests for integer fused multiply add
Tests showing missed opportunities to use XOP's integer fma instructions
Some of these are pretty awkward to match as they often have implicit sext/trunc stages but many just ignore overflow bits which makes things pretty straightforward.
Craig Topper [Sat, 14 Jan 2017 07:50:52 +0000 (07:50 +0000)]
[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial.
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions.
This also picks up cases where the masked move was created due to a masked load intrinsic.
Craig Topper [Sat, 14 Jan 2017 07:29:24 +0000 (07:29 +0000)]
[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible.
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
This makes it possible for the register allocator to have all 32 registers available to work with.
This fallthrough if other cases are added between fabs and default
could cause fabs to fall to the next case resulting in a bug.
Better getting rid of it immediately just to be sure.
Craig Topper [Sat, 14 Jan 2017 04:19:35 +0000 (04:19 +0000)]
[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there.
This fixes a undefined sanitizer failure from r291888.
Easwaran Raman [Sat, 14 Jan 2017 00:32:37 +0000 (00:32 +0000)]
Compute summary before calling extractProfTotalWeight
extractProfTotalWeight checks if the profile type is sample profile, but
before that we have to ensure that summary is available. Also expanded
the unittest to test the case where there is no summar
Justin Bogner [Fri, 13 Jan 2017 23:46:11 +0000 (23:46 +0000)]
GlobalISel: Abort in ResetMachineFunctionPass if fallback isn't enabled
When GlobalISel is configured to abort rather than fallback the only
thing that resetting the machine function does is make things harder
to debug. If we ever get to this point in the abort configuration it
indicates that we've already hit a bug, so this changes the behaviour
to abort instead.
Previously, only signed comparisons were being handled.
Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.
Tim Northover [Fri, 13 Jan 2017 23:11:37 +0000 (23:11 +0000)]
[GlobalISel] track predecessor mapping during switch lowering.
Correctly populating Machine PHIs relies on knowing exactly how the IR level
CFG was lowered to MachineIR. This needs to be tracked by any translation
phases that meddle (currently only SwitchInst handling).
Daniel Berlin [Fri, 13 Jan 2017 22:40:01 +0000 (22:40 +0000)]
NewGVN: Move leaders around properly to ensure we have a canonical dominating leader. Fixes PR 31613.
Summary:
This is a testcase where phi node cycling happens, and because we do
not order the leaders by domination or anything similar, the leader
keeps changing.
Using std::set for the members is too expensive, and we actually don't
need them sorted all the time, only at leader changes.
We could keep both a set and a vector, and keep them mostly sorted and
resort as necessary, or use a set and a fibheap, but all of this seems
premature.
After running some statistics, we are able to avoid the vast majority
of sorting by keeping a "next leader" field. Most congruence classes only have
leader changes once or twice during GVN.
Greg Clayton [Fri, 13 Jan 2017 21:08:18 +0000 (21:08 +0000)]
Cleanup how DWARFDie attributes are accessed and decoded.
Removed all DWARFDie::getAttributeValueAs*() calls.
Renamed:
Optional<DWARFFormValue> DWARFDie::getAttributeValue(dwarf::Attribute);
To:
Optional<DWARFFormValue> DWARFDie::find(dwarf::Attribute);
Added:
Optional<DWARFFormValue> DWARFDie::findRecursively(dwarf::Attribute);
All decoding of Optional<DWARFFormValue> values are now done using the dwarf::to*() functions from DWARFFormValue.h:
Old code:
auto DeclLine = DWARFDie.getAttributeValueAsSignedConstant(DW_AT_decl_line).getValueOr(0);
New code:
auto DeclLine = toUnsigned(DWARFDie.find(DW_AT_decl_line), 0);
This composition helps us since we can now easily do:
auto DeclLine = toUnsigned(DWARFDie.findRecursively(DW_AT_decl_line), 0);
This allows us to easily find attribute values in the current DIE only (the first new code above) or in any DW_AT_abstract_origin or DW_AT_specification Dies using the line above. Note that the code line length is shorter and more concise.
David L. Jones [Fri, 13 Jan 2017 21:02:41 +0000 (21:02 +0000)]
"Use" lambda captures which are otherwise only used in asserts. NFC
Summary:
The LLVM coding standards recommend "using" values that are only
needed by asserts:
http://llvm.org/docs/CodingStandards.html#assert-liberally
Without this change, LLVM cannot bootstrap with -Werror as the second
stage fails with this new warning:
https://reviews.llvm.org/rL291905
See also the previous fixes:
https://reviews.llvm.org/rL291916
https://reviews.llvm.org/rL291939
https://reviews.llvm.org/rL291940
https://reviews.llvm.org/rL291941
Artem Belevich [Fri, 13 Jan 2017 20:56:17 +0000 (20:56 +0000)]
[NVPTX] Added support for half-precision floating point.
Only scalar half-precision operations are supported at the moment.
- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
(can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
fp16 operations are promoted to fp32 and results are converted back
to fp16 for storage.
Michael Liao [Fri, 13 Jan 2017 18:28:30 +0000 (18:28 +0000)]
[SCEV] Limit recursion depth of constant evolving.
- For a loop body with VERY complicated exit condition evaluation, constant
evolving may run out of stack on platforms such as Windows. Need to limit the
recursion depth.
Ivan Krasin [Fri, 13 Jan 2017 17:30:10 +0000 (17:30 +0000)]
Fix UBSan bots by blacklisting bits/stl_tree.h.
Summary:
libstdc++ has some undefined behavior in bits/stl_tree.h that
has recently became excercised by some of the LLVM code.
Given that fixing libstdc++ will take years, adding the file
into a blacklist to fix bots seems like a necessity.
Ivan Krasin [Fri, 13 Jan 2017 16:45:15 +0000 (16:45 +0000)]
Revert r291903 and r291898. Reason: they break check-lld on the bots.
Summary:
Revert [ARM] Fix ubig32_t read in ARMAttributeParser
Now using support functions to read data instead of trying to
perform casts.
===========================================================
Revert [ARM] Enable objdump to construct triple for ARM
Now that The ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.