Ahmed Bougacha [Tue, 7 Mar 2017 20:34:20 +0000 (20:34 +0000)]
[GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values.
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
SjLjEHPrepare: Fix the pass for swifterror arguments
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
[cmake] Include openmp with add_llvm_external_project
Summary:
Include projects/openmp into the build using add_llvm_external_project
instead of add_subdirectory. This creates an option
LLVM_TOOL_OPENMP_BUILD that selects whether this project gets included
in an in-tree build.
Daniel Berlin [Tue, 7 Mar 2017 18:47:50 +0000 (18:47 +0000)]
Add PointerLikeTypeTraits for const things, as long as there is one for the non-const version. Clang and other users have a number of types they use as pointers, and this avoids having to define both const and non-const versions of PointerLikeTraits.
Daniel Berlin [Tue, 7 Mar 2017 18:47:48 +0000 (18:47 +0000)]
Make SmallPtrSet count and find able to take const PtrType's
Summary:
For our set/map types, count/find normally take const references.
This works well for non-pointer types, but can suck for pointer
types.
DenseSet<int *> foo;
const int *b = nullptr;
foo.count(b) does not work
but the equivalent reference version does work
(patch to fix DenseSet/DenseMap coming up)
For SmallPtrSet, you have no such option.
The following will not work right now:
SmallPtrSet<int *> foo;
const int *b = nullptr;
foo.count(b);
This makes const correctness hard in some cases.
Example:
SmallPtrSet<Instruction *> InstructionsToErase;
You can't make this SmallPtrSet<const Instruction *> because then you
can't erase the instruction. If I want to see if something is in the
set, I may only have a const Instruction *. Given that count and find
are non-mutating, this should just work.
The places in our code base that do this resort to const_cast :(.
This patch makes count and find able to be used with const Instruction
* in the above SmallPtrSet examples.
This is a bit annoying because of where C++ applies the const, so we
have to remove the pointer type from the passed-in-type and rebuild it
with const.
Matthew Simpson [Tue, 7 Mar 2017 18:47:30 +0000 (18:47 +0000)]
[LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.
Sanjoy Das [Tue, 7 Mar 2017 18:47:22 +0000 (18:47 +0000)]
[X86] Add option to specify preferable loop alignment
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.
Daniel Sanders [Tue, 7 Mar 2017 18:32:25 +0000 (18:32 +0000)]
[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Teresa Johnson [Tue, 7 Mar 2017 18:15:13 +0000 (18:15 +0000)]
Fix test and add missing return for llvm-lto2 error case
Summary:
This test was missing the target triple.
Once I fixed that, the case with the invalid character error stopped
returning 1 from llvm-lto2 and the test reported a failure. Fixed by
adding the missing return from llvm-lto2. Apparently we were failing
when we eventually tried to get the target.
Adrian Prantl [Tue, 7 Mar 2017 17:28:57 +0000 (17:28 +0000)]
Revert "Strip debug info when inlining into a nodebug function."
This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
Sanjay Patel [Tue, 7 Mar 2017 16:10:36 +0000 (16:10 +0000)]
[InstCombine] shrink truncated splat shuffle
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.
John Brawn [Tue, 7 Mar 2017 14:42:03 +0000 (14:42 +0000)]
[ARM] Correct handling of LSL #0 in an IT block
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.
Ranjeet Singh [Tue, 7 Mar 2017 11:17:53 +0000 (11:17 +0000)]
[ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Ayman Musa [Tue, 7 Mar 2017 08:11:19 +0000 (08:11 +0000)]
[X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables.
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.
Craig Topper [Tue, 7 Mar 2017 05:36:19 +0000 (05:36 +0000)]
[APInt] Add rvalue reference support to and, or, xor operations to allow their memory allocation to be reused when possible
This extends an earlier change that did similar for add and sub operations.
With this first patch we lose the fastpath for the single word case as operator&= and friends don't support it. This can be added there if we think that's important.
I had to change some functions in the APInt class since the operator overloads were moved out of the class and can't be used inside the class now. The getBitsSet change collides with another outstanding patch to implement it with setBits. But I didn't want to make this patch dependent on that series.
I've also removed the Or, And, Xor functions which were rarely or never used. I already commited two changes to remove the only uses of Or that existed.
Zachary Turner [Tue, 7 Mar 2017 03:43:17 +0000 (03:43 +0000)]
Use LLVM for all stat-related functionality.
This deletes LLDB's FileType enumeration and replaces all
users, and all calls to functions that check whether a file
exists etc with corresponding calls to LLVM.
Craig Topper [Tue, 7 Mar 2017 02:58:36 +0000 (02:58 +0000)]
[APInt] Add getBitsSetFrom and setBitsFrom to set upper bits starting at a bit
We currently have methods to set a specified number of low bits, a specified number of high bits, or a range of bits. But looking at some existing code it seems sometimes we want to set the high bits starting from a certain bit. Currently we do this with something like getHighBits(BitWidth, BitWidth - StartBit). Or once we start switching to setHighBits, setHighBits(BitWidth - StartBit) or setHighBits(getBitWidth() - StartBit).
Particularly for the latter case it would be better to have a convenience method like setBitsFrom(StartBit) so we don't need to mention the bit width that's already known to the APInt object.
I considered just making setBits have a default value of UINT_MAX for the hiBit argument and we would internally MIN it with the bit width. So if it wasn't specified it would be treated as bit width. This would require removing the assertion we currently have on the value of hiBit and may not be as readable.
Craig Topper [Tue, 7 Mar 2017 02:19:45 +0000 (02:19 +0000)]
[APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits
This patch implements getLowBitsSet/getHighBitsSet/getBitsSet in terms of the new setLowBits/setHighBits/setBits methods by making an all 0s APInt and then calling the appropriate set method.
This also adds support to setBits to allow loBits/hiBits to be in the other order to match with getBitsSet behavior.
For BitWidths larger than 64 this creates a short lived APInt with malloced storage. I think it might even call malloc twice. Its better to just provide methods that can set the necessary bits without the temporary APInt.
I'll update usages that benefit in a separate patch.
Reviewers: majnemer, MatzeB, davide, RKSimon, hans
Bob Wilson [Tue, 7 Mar 2017 00:51:07 +0000 (00:51 +0000)]
remove Cmake option for LLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING
This is a follow-up to my change in r295090, which added support for
disabling these checks selectively based on setting the preprocessor
macro without relying on the Cmake setting. Swift has moved over to use
that approach, so we can clean up here and remove the Cmake setting.
Sanjay Patel [Mon, 6 Mar 2017 22:32:40 +0000 (22:32 +0000)]
[DAG] refactor related div/rem folds; NFCI
This is known incomplete and not called in the right order relative to
other folds, but that's the current behavior. I'm just trying to clean
this up before making actual functional changes to make the patch smaller.
The logic here should mimic the IR equivalents that are in InstSimplify's
simplifyDivRem().
Paul Robinson [Mon, 6 Mar 2017 22:20:03 +0000 (22:20 +0000)]
[DWARFv5] Update definitions to match published spec.
Some late additions to DWARF v5 were not in Dwarf.def; also one form
was redefined. Add the new cases to relevant switches in different
parts of LLVM. Replace DW_FORM_ref_sup with DW_FORM_ref_sup[4,8].
I did not add support for DW_FORM_strx3/addrx3 other that defining the
constants. We don't have any infrastructure to support these.
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
Chad Rosier [Mon, 6 Mar 2017 21:20:00 +0000 (21:20 +0000)]
[AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.
Summary: This patch refactors the DWARFYAML code for dumping compile units to use a visitor pattern. Using this design will, in the future, enable the DWARF YAML code to perform analysis and mutations of the DWARF DIEs. An example of such mutations would be calculating the length of a compile unit and updating the CU's Length field before writing the DIE. This support will make it easier to craft or modify DWARF tests by hand.
Daniel Berlin [Mon, 6 Mar 2017 20:01:31 +0000 (20:01 +0000)]
NewGVN: We were not really failing this testcase, because the instructions it was looking for are unused. GVN value numbers unused instructions, NewGVN does not. Fix the instructions to be used, so we eliminate the redundancies it's checking for, and un-XFAIL it
[IfConversion] Only renormalize probabilities if branches are analyzable
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.
So now we detect these degenerate cases and emit real casts instead.
Tim Northover [Mon, 6 Mar 2017 18:23:04 +0000 (18:23 +0000)]
GlobalISel: refactor legalization of G_INSERT.
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
Dehao Chen [Mon, 6 Mar 2017 17:49:59 +0000 (17:49 +0000)]
Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.
[Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.
For example:
loop: loop:
// loop body // loop body
if (...) jump exit --> // more body
more: if (...) jump exit
// more body jump loop
jump loop
[Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.
Sanjay Patel [Mon, 6 Mar 2017 16:36:42 +0000 (16:36 +0000)]
[DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
Sanjay Patel [Mon, 6 Mar 2017 15:50:07 +0000 (15:50 +0000)]
[x86] add tests to show missing div/rem simplifications; NFC
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.
Michael Kruse [Mon, 6 Mar 2017 15:33:05 +0000 (15:33 +0000)]
[BasicBlockUtils] Check for nullptr before updating LoopInfo.
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.
Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).
All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.
Tom Stellard [Mon, 6 Mar 2017 14:26:50 +0000 (14:26 +0000)]
CMake: Add a build target for generating a source RPM
Summary:
'make srpm' or 'ninja srpm' will tar up the current source code and then
build a source RPM package.
By default it will use the llvm.spec file to generate the source RPM,
but you can specify your own custom spec file with the
LLVM_SRPM_USER_BINARY_SPECFILE option. CMake will perform variable
substitution on your custom specfile, so you can reference CMake
variables in it. For example:
Version: @LLVM_RPM_SPEC_VERSION@
Note that everything in the source directory will be included in the
tarball so if you have a clang check out in tools/clang, then all
the clang source will end up in the tarball to. It is recommended
to only use this build target with a clean source tree.
[XRay] Allow logging the first argument of a function call.
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.