Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.
This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.
It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.
It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
George Rimar [Wed, 24 Jul 2019 12:20:42 +0000 (12:20 +0000)]
[Object/llvm-readobj] - Cleanup testing of the dynamic objects.
This patch touches a few test cases:
It removes dtflags.elf-x86-64 binary and elf-dtflags.test.
elf-dtflags.test is excessive because we have the
elf-dynamic-tags.test which test all non-machine specific tags.
It removes testing of --dynamic-table from test\Object\readobj-shared-object.test
(we have the elf-dynamic-tags.test for that), and simplifies this test case.
It moves testing of the headers from readobj-shared-object.test
to elf-file-headers.test.
Adds test/tools/llvm-readobj/elf-file-types.test and test/tools/llvm-readobj/elf-loadname.test.
It opens road for removing the readobj-shared-object.test completely soon.
George Rimar [Wed, 24 Jul 2019 12:16:22 +0000 (12:16 +0000)]
[yaml2obj] - Allow custom fields for the SHT_UNDEF sections.
This is a follow-up refactoring patch for recently
introduced functionality which which reduces the code duplication
and also makes possible to redefine all possible fields of
the first SHT_NULL section (previously it was only possible to set
sh_link and sh_size).
David Green [Wed, 24 Jul 2019 11:51:36 +0000 (11:51 +0000)]
[ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.
Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.
David Green [Wed, 24 Jul 2019 11:08:14 +0000 (11:08 +0000)]
[ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).
An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).
VPSEL is also added here, simply selecting on the vselect.
Sam Parker [Wed, 24 Jul 2019 09:38:39 +0000 (09:38 +0000)]
[ARM][ParallelDSP] Fix pointer operand reordering
While combining two loads into a single load, we often need to
reorder the pointer operands for the new load. This reordering was
broken in the cases where there was a chain of values that built up
the pointer.
Petr Hosek [Wed, 24 Jul 2019 00:16:23 +0000 (00:16 +0000)]
[SafeStack] Insert the deref before remaining elements
This is a follow up to D64971. While we need to insert the deref after
the offset, it needs to come before the remaining elements in the
original expression since the deref needs to happen before the LLVM
fragment if present.
Copying them is expensive. This allows the tables to be moved around at
lower cost, and allows a remarks::StringTable to be constructed from
a remarks::ParsedStringTable.
Summary:
A number of EXPECT statements in FileCheck's unit tests are dependent
from results of other values being tested. This commit changes those
earlier test to use ASSERT instead of EXPECT to avoid cascade errors
when they are all related to the same issue.
Summary:
Testing of caret location in diagnostic message is currently made with
CHECK directive with the following general format:
CHECK: {{^ \^$}}
James Henderson suggested the following would be more readable:
CHECK: {{^}} ^{{$}}
and when whole lines can be matched (as is the case for command-line
testing where error messages do not include path):
CHECK: ^
using the option --match-full-lines.
This commit implements these 2 changes on all existing caret position
tests. It also aligns the caret to the character it is trying to match
in the above line.
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.
This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.
It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.
It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
[AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs.
We need to be able to load and store s128 for memcpy inlining, where we want to
generate Q register mem ops. Making these legal also requires that we add some
support in other instructions. Regbankselect should also know about these since
they have no GPR register class that can hold them, so need special handling to
live on the FPR bank.
[X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2.
LegalizeDAG tries to legal the DAG by legalizing nodes before
their operands.
If we create a new node, we end up legalizing it after its operands.
This prevents some of the optimizations that can be done when the
operand is a build_vector since the build_vector will have been
legalized to something else.
Philip Reames [Tue, 23 Jul 2019 17:45:11 +0000 (17:45 +0000)]
[IndVars] Fix a subtle bug in optimizeLoopExits
The original code failed to account for the fact that one exit can have a pointer exit count without all of them having pointer exit counts. This could cause two separate bugs:
1) We might exit the loop early, and leave optimizations undone. This is what triggered the assertion failure in the reported test case.
2) We might optimize one exit, then exit without indicating a change. This could result in an analysis invalidaton bug if no other transform is done by the rest of indvars.
Note that the pointer exit counts are a really fragile concept. They show up only when we have a pointer IV w/o a datalayout to provide their size. It's really questionable to me whether the complexity implied is worth it.
Ryan Taylor [Tue, 23 Jul 2019 17:19:56 +0000 (17:19 +0000)]
[IR][Verifier] Allow IntToPtrInst to be !dereferenceable
Summary:
Allow IntToPtrInst to carry !dereferenceable metadata tag.
This is valid since !dereferenceable can be only be applied to
pointer type values.
[GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes
When we select the XRO variants of loads, we can pull in very specific shifts
(of the size of an element). E.g.
```
ldr x1, [x2, x3, lsl #3]
```
This teaches GISel to handle these when they're coming from shifts
specifically.
This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`
which recognizes this pattern.
This also packs this up with `selectAddrModeRegisterOffset` into
`selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO`
in AArch64ISelDAGtoDAG.
Also update load-addressing-modes to show that all of the cases here work.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.
The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.
[llvm-ar] Fix support for archives with members larger than 4GB
llvm-ar outputs a strange error message when handling archives with
members larger than 4GB due to not checking file size when passing the
value as an unsigned 32 bit integer. This overflow issue caused
malformed archives to be created.:
https://bugs.llvm.org/show_bug.cgi?id=38058
This change allows for members above 4GB and will error in a case that
is over the formats size limit, a 10 digit decimal integer.
Sam Parker [Tue, 23 Jul 2019 14:08:46 +0000 (14:08 +0000)]
[ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.
Roman Lebedev [Tue, 23 Jul 2019 12:42:49 +0000 (12:42 +0000)]
[InstSimplify][NFC] Tests for skipping 'div-by-0' checks before inverted @llvm.umul.with.overflow
It would be already handled by the non-inverted case if we were hoisting
the `not` in InstCombine, but we don't (granted, we don't sink it
in this case either), so this is a separate case.
Roman Lebedev [Tue, 23 Jul 2019 12:42:41 +0000 (12:42 +0000)]
[NFC][PhaseOredering][SimplifyCFG] Add more runlines to umul.with.overflow tests
This way it will be more obvious that the problem is both
in cost threshold and in hardcoded benefit check,
plus will show how the instsimplify cleans this all in the end.
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
We do see a couple of regressions that need to be addressed:
AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
The code owners have confirmed its ok for these cases to fixed up in future patches.
Sam Elliott [Tue, 23 Jul 2019 11:40:55 +0000 (11:40 +0000)]
[RISCV] Re-enable rv32i-aliases-invalid.s test
We were getting test failures on some builders, which pointed to @LINE
being an undefined variable. I think that these failures should have
been fixed by https://reviews.llvm.org/rL366434, so I'm re-enabling the
test.
George Rimar [Tue, 23 Jul 2019 11:03:37 +0000 (11:03 +0000)]
[yaml2obj] - Add a support for defining null sections in YAMLs.
ELF spec shows (Figure 4-10: Section Header Table Entry:Index 0,
http://www.sco.com/developers/gabi/latest/ch4.sheader.html)
that section header at index 0 (null section) can have sh_size and
sh_link fields set to non-zero values.
It says (https://docs.oracle.com/cd/E19683-01/817-3677/6mj8mbtc9/index.html):
"If the number of sections is greater than or equal to SHN_LORESERVE (0xff00),
this member has the value zero and the actual number of section header table
entries is contained in the sh_size field of the section header at index 0.
Otherwise, the sh_size member of the initial entry contains 0."
and:
"If the section name string table section index is greater than or equal to SHN_LORESERVE
(0xff00), this member has the value SHN_XINDEX (0xffff) and the actual index of the section
name string table section is contained in the sh_link field of the section header at index 0.
Otherwise, the sh_link member of the initial entry contains 0."
At this moment it is not possible to create custom section headers at index 0 using yaml2obj.
Simon Pilgrim [Tue, 23 Jul 2019 10:51:43 +0000 (10:51 +0000)]
[SLPVectorizer] Remove null-pointer test. NFCI.
cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers
George Rimar [Tue, 23 Jul 2019 07:38:44 +0000 (07:38 +0000)]
[yaml2elf] - Treat the SHN_UNDEF section as kind of regular section.
We have a logic that adds a few sections implicitly.
Though the SHT_NULL section with section number 0
is an exception.
In D64913 I want to teach yaml2obj to redefine the null section.
And in this patch I add it to the sections list,
to make it kind of a regular section.
[DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.
Robert Widmann [Tue, 23 Jul 2019 04:56:44 +0000 (04:56 +0000)]
[LLVM-C] Improve Bindings to The Internalize Pass
Summary: Adds a binding to the internalize pass that allows the caller to pass a function pointer that acts as the visibility-preservation predicate. Previously, one could only pass an unsigned value (not LLVMBool?) that directed the pass to consider "main" or not.
Zi Xuan Wu [Tue, 23 Jul 2019 03:34:40 +0000 (03:34 +0000)]
[PowerPC] Replace float load/store pair with integer load/store pair when it's only used in load/store
Replace float load/store pair with integer load/store pair when it's only used in load/store,
because float load/store instructions cost more cycles then integer load/store.
A typical scenario is when there is a call with more than 13 float arguments passing, we need pass them by stack.
So we need a load/store pair to do such memory operation if the variable is global variable.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Philip Reames [Mon, 22 Jul 2019 23:33:18 +0000 (23:33 +0000)]
[Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack
We were silently using the ABI alignment for all of the stores generated for deopt and gc values. We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to a unaligned location.
The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help. So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.
Note that there's a separate performance possibility here. Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths. But that's a later patch, if at all. :)
[DWARF] Add more error handling to debug line parser.
This patch exnteds the error handling in the debug line parser to get
rid of the existing MD5 assertion. I want to reuse the debug line parser
from LLVM in LLDB where we cannot crash on invalid input.
Analysis: Don't look through aliases when simplifying GEPs.
It is not safe in general to replace an alias in a GEP with its aliasee
if the alias can be replaced with another definition (i.e. via strong/weak
resolution (linkonce_odr) or via symbol interposition (default visibility
in ELF)) while the aliasee cannot. An example of how this can go wrong is
in the included test case.
I was concerned that this might be a load-bearing misoptimization (it's
possible for us to use aliases to share vtables between base and derived
classes, and on Windows, vtable symbols will always be aliases in RTTI
mode, so this change could theoretically inhibit trivial devirtualization
in some cases), so I built Chromium for Linux and Windows with and without
this change. The file sizes of the resulting binaries were identical, so it
doesn't look like this is going to be a problem.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Roman Lebedev [Mon, 22 Jul 2019 22:08:55 +0000 (22:08 +0000)]
[SimplifyCFG][NFC] Test that we fail to flatten CFG after forming @llvm.umul.with.overflow
Even if we formed @llvm.umul.with.overflow, we are still stuck
with that guard against div-by-zero, which is no longer needed,
because we didn't flatten the CFG.
Roman Lebedev [Mon, 22 Jul 2019 22:08:35 +0000 (22:08 +0000)]
[NFC][PhaseOrdering] Add tests showcasing the problems of unsigned multiply overflow check
While we can form the @llvm.mul.with.overflow easily,
we are still left with that check that was guarding against div-by-0.
And in the second case we won't even flatten the CFG.
Matt Arsenault [Mon, 22 Jul 2019 21:38:11 +0000 (21:38 +0000)]
AMDGPU: Don't use SDNodeXForm for DS offset output
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.
This allows global isel to import the simple cases once the complex
pattern is implemented.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.
While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.
Jason Liu [Mon, 22 Jul 2019 19:55:33 +0000 (19:55 +0000)]
[NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention
Summary:
Since we are planning to add ADDIStocHA for 32bit in later patch, we decided
to change 64bit one first to follow naming convention with 8 behind opcode.
Sean Fertile [Mon, 22 Jul 2019 19:15:29 +0000 (19:15 +0000)]
Stubs out TLOF for AIX and add support for common vars in assembly output.
Stubs out a TargetLoweringObjectFileXCOFF class, implementing only
SelectSectionForGlobal for common symbols. Also adds an override of
EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors
and adds support for emitting common globals.
The order of operations in the DWARF symbols is incorrect in this case.
Specifically, the deref is incorrect; this appears to be incorrectly
re-inserted in repalceOneDbgValueForAlloca.
With this change which inserts the deref after the offset instead of
before it, LLVM produces correct DWARF:
WholeProgramDevirt: Teach the pass to respect the global's alignment.
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.
This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.
There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.
LowerTypeTests: Teach the pass to respect global alignments.
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.
Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.