Andrew Ng [Mon, 1 Jul 2019 10:58:20 +0000 (10:58 +0000)]
[benchmark] Disable CMake get_git_version
Disabled CMake get_git_version as it is meaningless for this in-tree
build, and hardcoded a null version.
Not using get_git_version avoids a refresh of the git index that is
executed by get_git_version. Refreshing the index can take a
considerable amount of time if the index needs to be refreshed
(particularly with the mono repo). This situation can arise when
building shared source on a host in VMs.
Jeremy Morse [Mon, 1 Jul 2019 09:38:23 +0000 (09:38 +0000)]
[DebugInfo] Avoid adding too much indirection to pointer-valued variables
This patch addresses PR41675, where a stack-pointer variable is dereferenced
too many times by its location expression, presenting a value on the stack as
the pointer to the stack.
The difference between a stack *pointer* DBG_VALUE and one that refers to a
value on the stack, is currently the indirect flag. However the DWARF backend
will also try to guess whether something is a memory location or not, based
on whether there is any computation in the location expression. By simply
prepending the stack offset to existing expressions, we can accidentally
convert a register location into a memory location, which introduces a
suprise (and unintended) dereference.
The solution is to add DW_OP_stack_value whenever we add a DIExpression
computation to a stack *pointer*. It's an implicit location computed on the
expression stack, thus needs to be flagged as a stack_value.
For the edge case where the offset is zero and the location could be a register
location, DIExpression::prepend will still generate opcodes, and thus
DW_OP_stack_value must still be added.
Sam Parker [Mon, 1 Jul 2019 08:21:28 +0000 (08:21 +0000)]
[ARM] WLS/LE Code Generation
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
to generate intrinsics that guard the loop entry, as well as setting
the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
instruction.
[X86] Improve the type checking fast-isel handling of vector bitcasts.
We had a bunch of vector size legality checks for the source type
based on feature flags, but we didn't check the destination type at
all beyond ensuring that it was a "simple" type. But this allowed
the destination to be i128 which isn't legal.
This commit changes the code to use TLI's isTypeLegal logic in
place of the all the subtarget checks. Then additionally checks
that the source and dest are vectors.
[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload.
But only when the load isn't volatile.
This improves load folding during isel where we only have vzload
and scalar_to_vector+load patterns. We can't have full vector load
isel patterns for the same volatile load issue.
Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.
Mike Spertus [Sun, 30 Jun 2019 21:54:34 +0000 (21:54 +0000)]
Clean up MSVC visualization of LLVM pointer types
Create separate natvis ptr and int views for PointerIntPair.
These are convenient in watch Windows and will be used by
Clang visualizers to be checked in shortly
Also, removed deref views as the MSVC na format has
done the same thing natively since MSVC2013.
Sanjay Patel [Sun, 30 Jun 2019 13:40:31 +0000 (13:40 +0000)]
[InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics
This is the opposite direction of D62158 (we have to choose 1 form or the other).
Now that we have FMF on the select, this becomes more palatable. And the benefits
of having a single IR instruction for this operation (less chances of missing folds
based on extra uses, etc) overcome my previous comments about the potential advantage
of larger pattern matching/analysis.
Craig Topper [Sun, 30 Jun 2019 06:46:37 +0000 (06:46 +0000)]
[X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel.
AVX masked loads only support 0 as the value for masked off elements.
So we need an extra blend to support other values. Previously we
expanded the masked load to two instructions with isel patterns.
With this patch we now insert the vselect during lowering and it
will be separately selected as a blend.
Nikita Popov [Sat, 29 Jun 2019 15:12:59 +0000 (15:12 +0000)]
[LFTR] Rephrase getLoopTest into "based-on" check; NFCI
What we want to know here is whether we're already using this value
for the loop condition, so make the query about that. We can extend
this to a more general "based-on" relationship, rather than a direct
icmp use later.
Sanjay Patel [Sat, 29 Jun 2019 14:28:54 +0000 (14:28 +0000)]
[InstCombine] canonicalize fmin/fmax to LLVM intrinsics minnum/maxnum
This transform came up in D62414, but we should deal with it first.
We have LLVM intrinsics that correspond exactly to libm calls (unlike
most libm calls, these libm calls never set errno).
This holds without any fast-math-flags, so we should always canonicalize
to those intrinsics directly for better optimization.
Currently, we convert to fcmp+select only when we have FMF (nnan) because
fcmp+select does not preserve the semantics of the call in the general case.
Nikita Popov [Sat, 29 Jun 2019 12:41:02 +0000 (12:41 +0000)]
[LFTR] Remove unnecessary latch check; NFCI
The whole indvars pass works on loops in simplified form, so there
is always a unique latch. Convert the condition into an assertion
in needsLFTR (though we also assert this in later LFTR functions).
Additionally update the comment on getLoopTest() now that we are
dealing with multiple exits.
Summary:
Given pattern:
`(x shiftopcode Q) shiftopcode K`
we should rewrite it as
`x shiftopcode (Q+K)` iff `(Q+K) u< bitwidth(x)`
This is valid for any shift, but they must be identical.
* https://rise4fun.com/Alive/9E2
* exact on both lshr => exact https://rise4fun.com/Alive/plHk
* exact on both ashr => exact https://rise4fun.com/Alive/QDAA
* nuw on both shl => nuw https://rise4fun.com/Alive/5Uk
* nsw on both shl => nsw https://rise4fun.com/Alive/0plg
Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42391 | PR42391]].
Nikita Popov [Sat, 29 Jun 2019 09:24:12 +0000 (09:24 +0000)]
[LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we
have a truncated exit count we'll truncate the IV when comparing
against the limit, in which case exit count overflow in post-inc
form doesn't matter. However, for pointer IVs we don't do that, so
we have to be careful about incrementing the IV in the wide type.
I'm fixing this by removing the IVCount variable (which was
ExitCount or ExitCount+1) and replacing it with a UsePostInc flag,
and then moving the actual limit adjustment to the individual cases
(which are: pointer IV where we add to the wide type, integer IV
where we add to the narrow type, and constant integer IV where we
add to the wide type).
Sanjay Patel [Fri, 28 Jun 2019 20:45:32 +0000 (20:45 +0000)]
[Lanai] auto-generate complete test checks; NFC
This file will fail with a common codegen transform that
I'm looking at, and I can't tell if that's an improvement
or regression based on the sparse checking.
Rainer Orth [Fri, 28 Jun 2019 18:29:18 +0000 (18:29 +0000)]
[unittests][Support] Fix LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions on Solaris
LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions currently
FAILs on Solaris:
FAIL: LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions (2940 of 51555)
******************** TEST 'LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions' FAILED ********************
Note: Google Test filter = FileSystemTest.permissions
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from FileSystemTest
[ RUN ] FileSystemTest.permissions
/opt/llvm-buildbot/obj/llvm/llvm.src/unittests/Support/Path.cpp:1705: Failure
Value of: CheckPermissions(fs::sticky_bit)
Actual: false
Expected: true
/opt/llvm-buildbot/obj/llvm/llvm.src/unittests/Support/Path.cpp:1712: Failure
Value of: CheckPermissions(fs::set_uid_on_exe | fs::set_gid_on_exe | fs::sticky_bit)
Actual: false
Expected: true
/opt/llvm-buildbot/obj/llvm/llvm.src/unittests/Support/Path.cpp:1719: Failure
Value of: CheckPermissions(fs::all_read | fs::set_uid_on_exe | fs::set_gid_on_exe | fs::sticky_bit)
Actual: false
Expected: true
/opt/llvm-buildbot/obj/llvm/llvm.src/unittests/Support/Path.cpp:1722: Failure
Value of: CheckPermissions(fs::all_perms)
Actual: false
Expected: true
[ FAILED ] FileSystemTest.permissions (0 ms)
[----------] 1 test from FileSystemTest (0 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] FileSystemTest.permissions
1 FAILED TEST
Checking with truss reveals that this is the same issue as on AIX and
documented in chmod(2):
If the process is not a privileged process and the file is not a direc-
tory, mode bit 01000 (S_ISVTX, the sticky bit) is cleared.
The following patch fixes this in the same way. Tested on amd64-pc-solaris2.11.
Sam Tebbs [Fri, 28 Jun 2019 15:43:31 +0000 (15:43 +0000)]
[ARM] Add support for the MVE long shift instructions
MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers.
The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl.
test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions.
Roman Lebedev [Fri, 28 Jun 2019 15:32:52 +0000 (15:32 +0000)]
[NFC][InstCombine] Shift amount reassociation: add flag preservation test
As discussed in https://reviews.llvm.org/D63812#inline-569870
* exact on both lshr => exact https://rise4fun.com/Alive/plHk
* exact on both ashr => exact https://rise4fun.com/Alive/QDAA
* nuw on both shl => nuw https://rise4fun.com/Alive/5Uk
* nsw on both shl => nsw https://rise4fun.com/Alive/0plg
So basically if the same flag is set on both original shifts -> set it on new shift.
Don't think we can do anything with non-matching flags on shl.
Fangrui Song [Fri, 28 Jun 2019 10:06:11 +0000 (10:06 +0000)]
[DebugInfo] Simplify GSYM::AddressRange and GSYM::AddressRanges
Delete unnecessary getters of AddressRange.
Simplify AddressRange::size(): Start <= End check should be checked in an upper layer.
Delete isContiguousWith() that doesn't make sense.
Simplify AddressRanges::insert. Delete commented code. Fix it when more than 1 ranges are to be deleted.
Delete trailing newline.
David Green [Fri, 28 Jun 2019 09:47:55 +0000 (09:47 +0000)]
[ARM] Widening loads and narrowing stores
MVE has instructions to widen as it loads, and narrow as it stores. This adds
the required patterns and legalisation to make them work including specifying
that they are legal, patterns to select them and test changes.
David Green [Fri, 28 Jun 2019 08:41:40 +0000 (08:41 +0000)]
[ARM] MVE loads and stores
This fills in the gaps for basic MVE loads and stores, allowing unaligned
access and adding far too many tests. These will become important as
narrowing/expanding and pre/post inc are added. Big endian might still not be
handled very well, because we have not yet added bitcasts (and I'm not sure how
we want it to work yet). I've included the alignment code anyway which maps
with our current patterns. We plan to return to that later.
Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev.
David Green [Fri, 28 Jun 2019 07:08:42 +0000 (07:08 +0000)]
[ARM] MVE vector shuffles
This patch adds necessary shuffle vector and buildvector support for ARM MVE.
It essentially adds support for VDUP, VREVs and some VMOVs, which are often
required by other code (like upcoming patches).
This mostly uses the same code from Neon that already generated
NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and
moved to ARMInstrInfo as they are common to both architectures. Most of the
selection code seems to be applicable to both, but NEON does have some more
instructions making some parts specific.
Mikael Holmen [Fri, 28 Jun 2019 06:45:20 +0000 (06:45 +0000)]
Silence gcc warning in testcase [NFC]
Without the fix gcc (7.4.0) complains with
../unittests/ADT/APIntTest.cpp: In member function 'virtual void {anonymous}::APIntTest_MultiplicativeInverseExaustive_Test::TestBody()':
../unittests/ADT/APIntTest.cpp:2510:36: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
for (unsigned Value = 0; Value < (1 << BitWidth); ++Value) {
~~~~~~^~~~~~~~~~~~~~~~~
Alex Brachet [Fri, 28 Jun 2019 03:21:00 +0000 (03:21 +0000)]
[Support] Add fs::getUmask() function and change fs::setPermissions
Summary: This patch changes fs::setPermissions to optionally set permissions while respecting the umask. It also adds the function fs::getUmask() which returns the current umask.
Amara Emerson [Thu, 27 Jun 2019 23:56:34 +0000 (23:56 +0000)]
[GlobalISel][IRTranslator] Fix some PHI bugs related to jump tables when optimizations are used.
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
Fedor Sergeev [Thu, 27 Jun 2019 23:41:03 +0000 (23:41 +0000)]
[InlineCost] make InlineCost assignable
Summary:
Current InlineCost is not assignable because of const members Cost and Threshold.
I dont see practical benefits from having them const (access to these members is
private and internal interactions are rather simple). On other hand that makes
it hard to use as a member in some other data structure where assignability is necessary.
I'm going to use InlineCost in a downstream inliner that maintains a complex queue
of candidate call-sites and thus keeping and recalculating InlineCost is necessary.
This patch just removes 'const' from both members, making InlineCost assignable.
Roman Lebedev [Thu, 27 Jun 2019 21:52:10 +0000 (21:52 +0000)]
[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.