David Stenberg [Wed, 10 Apr 2019 11:28:28 +0000 (11:28 +0000)]
[DebugInfo] Track multiple registers in DbgEntityHistoryCalculator
Summary:
When calculating the debug value history, DbgEntityHistoryCalculator
would only keep track of register clobbering for the latest debug value
per inlined entity. This meant that preceding register-described debug
value fragments would live on until the next overlapping debug value,
ignoring any potential clobbering. This patch amends
DbgEntityHistoryCalculator so that it keeps track of all registers that
a inlined entity's currently live debug values are described by.
The DebugInfo/COFF/pieces.ll test case has had to be changed since
previously a register-described fragment would incorrectly outlive its
basic block.
The parent patch D59941 is expected to increase the coverage slightly,
as it makes sure that location list entries are inserted after clobbered
fragments, and this patch is expected to decrease it, as it stops
preceding register-described from living longer than they should. All in
all, this patch and the preceding patch has a negligible effect on the
output from `llvm-dwarfdump -statistics' for a clang-3.4 binary built
using the RelWithDebInfo build profile. "Scope bytes covered" increases
by 0.5%, and "variables with location" increases from 2212083 to 2212088, but it should improve the accuracy quite a bit.
David Stenberg [Wed, 10 Apr 2019 11:28:20 +0000 (11:28 +0000)]
[DebugInfo] Improve handling of clobbered fragments
Summary:
Currently the DbgValueHistorymap only keeps track of clobbered registers
for the last debug value that it has encountered. This could lead to
preceding register-described debug values living on longer in the
location lists than they should. See PR40283 for an example. This
patch does not introduce tracking of multiple registers, but changes
the DbgValueHistoryMap structure to allow for that in a follow-up
patch. This patch is not NFC, as it at least fixes two bugs in
DwarfDebug (both are covered in the new clobbered-fragments.mir test):
* If a debug value was clobbered (its End pointer set), the value would
still be added to OpenRanges, meaning that the succeeding location list
entries could potentially contain stale values.
* If a debug value was clobbered, and there were non-overlapping
fragments that were still live after the clobbering, DwarfDebug would
not create a location list entry starting directly after the
clobbering instruction. This meant that the location list could have
a gap until the next debug value for the variable was encountered.
Before this patch, the history map was represented by <Begin, End>
pairs, where a new pair was created for each new debug value. When
dealing with partially overlapping register-described debug values, such
as in the following example:
the history map would then contain the entries `[<DV1, insn1>, [<DV2, insn2>]`.
This would leave it up to the users of the map to be aware of
the relative order of the instructions, which e.g. could make
DwarfDebug::buildLocationList() needlessly complex. Instead, this patch
makes the history map structure monotonically increasing by dropping the
End pointer, and replacing that with explicit clobbering entries in the
vector. Each debug value has an "end index", which if set, points to the
entry in the vector that ends the debug value. The ending entry can
either be an overlapping debug value, or an instruction which clobbers
the register that the debug value is described by. The ending entry's
instruction can thus either be excluded or included in the debug value's
range. If the end index is not set, the debug value that the entry
introduces is valid until the end of the function.
Changes to test cases:
* DebugInfo/X86/pieces-3.ll: The range of the first DBG_VALUE, which
describes that the fragment (0, 64) is located in RDI, was
incorrectly ended by the clobbering of RAX, which the second
(non-overlapping) DBG_VALUE was described by. With this patch we
get a second entry that only describes RDI after that clobbering.
* DebugInfo/ARM/partial-subreg.ll: This test seems to indiciate a bug
in LiveDebugValues that is caused by it not being aware of fragments.
I have added some comments in the test case about that. Also, before
this patch DwarfDebug would incorrectly include a register-described
debug value from a preceding block in a location list entry.
Make it possible to TableGen code for FCONSTS and FCONSTD.
We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.
There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.
Summary:
In an upcoming commit the history map will be changed so that it
contains explicit entries for instructions that clobber preceding debug
values, rather than Begin- End range pairs, so generalize the name to
"Entry".
Also, prefix the iterator variable names in buildLocationList() with
"E". In an upcoming commit the entry will have query functions such as
"isD(e)b(u)gValue", which could at a glance make one confuse it for
iterations over MachineInstrs, so make the iterator names a bit more
distinct to avoid that.
David Stenberg [Wed, 10 Apr 2019 09:07:32 +0000 (09:07 +0000)]
[DebugInfo] Make InstrRange into a class, NFC
Summary:
Replace use of std::pair by creating a class for the debug value
instruction ranges instead. This is a preparatory refactoring for
improving handling of clobbered fragments.
In an upcoming commit the Begin pointer will become a PointerIntPair, so
it will be cleaner to have a getter for that.
[VPLAN] Minor improvement to testing and debug messages.
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Fangrui Song [Wed, 10 Apr 2019 07:44:23 +0000 (07:44 +0000)]
[DWARF] Simplify LineTable::findRowInSeq
We want the last row whose address is less than or equal to Address.
This can be computed as upper_bound - 1, which is simpler than
lower_bound followed by skipping equal rows in a loop.
Since FirstRow (LowPC) does not satisfy the predicate (OrderByAddress)
while LastRow-1 (HighPC) satisfies the predicate. We can decrease the
search range by two, i.e.
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).
[ObjC][ARC] Convert the retainRV marker that is passed as a named
metadata into a module flag in the auto-upgrader and make the ARC
contract pass read the marker as a module flag.
This is needed to fix a bug where ARC contract wasn't inserting the
retainRV marker when LTO was enabled, which caused objects returned
from a function to be auto-released.
[X86] Move the 2 byte VEX optimization for MOV instructions back to the X86AsmParser::processInstruction where it used to be. Block when {vex3} prefix is present.
Years ago I moved this to an InstAlias using VR128H/VR128L. But now that we support {vex3} pseudo prefix, we need to block the optimization when it is set to match gas behavior.
Jim Lin [Wed, 10 Apr 2019 01:56:32 +0000 (01:56 +0000)]
[Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.
before fixed:
std %f0, [%g1+80]
sethi 4, %g1 <<<
add %g1, %sp, %g1 <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]
[X86] Support the EVEX versions vcvt(t)ss2si and vcvt(t)sd2si with the {evex} pseudo prefix in the assembler.
The EVEX versions are ambiguous with the VEX versions based on operands alone so we had explicitly dropped
them from the AsmMatcher table. Unfortunately, when we add them they incorrectly show in the table before
their VEX counterparts. This is different how the prioritization normally works.
To fix this we have to explicitly reject the instructions unless the {evex} prefix has been seen.
Robert Widmann [Tue, 9 Apr 2019 22:31:56 +0000 (22:31 +0000)]
[LLVM-C] Correct The Current Debug Location Accessors
Summary: Deprecate the existing accessors for the "current debug location" of an IRBuilder. The setter could not handle being reset to NULL, and the getter would create bogus metadata if the NULL location was returned. Provide direct metadata-based accessors instead.
Robert Widmann [Tue, 9 Apr 2019 22:27:51 +0000 (22:27 +0000)]
[LLVM-C] Add Bindings to Access an Instruction's DebugLoc
Summary: Provide direct accessors to supplement LLVMSetInstDebugLocation. In addition, properly accept and return the NULL location. The old accessors provided no way to do this, so the current debug location cannot currently be cleared.
Robert Widmann [Tue, 9 Apr 2019 21:53:31 +0000 (21:53 +0000)]
[LLVM-C] Add Section and Symbol Iterator Accessors for Object File Binaries
Summary: This brings us to full feature parity with the old API, so I've deprecated it and updated the tests. I'll do a follow-up patch to do some more cleanup and documentation work in this header.
[X86] Fix a dangling StringRef issue introduced in r358029.
I was attempting to convert mnemonics to lower case after processing a pseudo prefix. But the ParseOperands just hold a StringRef for tokens so there is no where to allocate the memory.
Add FIXMEs for the lower case issue which also exists in the prefix parsing code.
[AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.
For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.
[GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.
The uadd and umul cases are currently handled, the usub, sadd, ssub
and smul cases are not. usub, sadd and ssub already have the
necessary ValueTracking support, smul doesn't.
[X86] Add support for {vex2}, {vex3}, and {evex} to the assembler to match gas. Use {evex} to improve the one our 32-bit AVX512 tests.
These can be used to force the encoding used for instructions.
{vex2} will fail if the instruction is not VEX encoded, but otherwise won't do anything since we prefer vex2 when possible. Might need to skip use of the _REV MOV instructions for this too, but I haven't done that yet.
{vex3} will force the instruction to use the 3 byte VEX encoding or fail if there is no VEX form.
{evex} will force the instruction to use the EVEX version or fail if there is no EVEX version.
[DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.
sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.
Remove the unit at a time option
Removes the code from opt and the pass manager builder.
The code was unused - even by the C library code that was supposed to set
it and had been removed previously.
[ValueTracking] Use computeConstantRange() for signed sub overflow determination
This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allows to detect more no/always overflow conditions.
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.
[ValueTracking] Use computeConstantRange() in signed add overflow determination
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).
This (finally...) covers the motivating case from D59071.
[InstCombine] prevent possible miscompile with sdiv+negate of vector op
Similar to:
rL358005
Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.
[InstCombine] prevent possible miscompile with negate+sdiv of vector op
// 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.
NFC: Refactor library-specific mappings of scalar maths functions to their vector counterparts
This patch factors out mappings of scalar maths functions to their vector
counterparts from TargetLibraryInfo.cpp to a separate VecFuncs.def file. Such
mappings are currently available for Accelerate framework, and SVML library.
This is in support of the follow-up: https://reviews.llvm.org/D59881
Simon Pilgrim [Tue, 9 Apr 2019 10:27:59 +0000 (10:27 +0000)]
[TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.
David Stenberg [Tue, 9 Apr 2019 10:08:26 +0000 (10:08 +0000)]
[DebugInfo] Pass all values in DebugLocEntry's constructor, NFC
Summary:
With MergeValues() removed, amend DebugLocEntry's constructor so that it
takes multiple values rather than a single, and keep non-fragment values
in OpenRanges, as this allows some cleanup of the code in
buildLocationList().
[CMake] Move configuration of LLVM_CXX_STD to HandleLLVMOptions.cmake
Standalone builds of projects other than llvm itself (lldb, libcxx,
etc) include HandleLLVMOptions but not the top level llvm CMakeLists,
so we need to set this variable here to ensure that it always has a
value.
This should fix the build issues some folks have been seeing.
Summary:
The MergeValues() function would try to merge two entries if they shared
the same beginning label. Having the same beginning label means that the
former entry's range would be empty; however, after D55919 we no longer
create entries for empty ranges, so we can no longer land in a situation
where that check in MergeValues would succeed. Instead, the "merging" is
done by keeping the live values from the preceding empty ranges in
OpenRanges, and adding them to the first non-empty range.
[X86] Have EVEX2VEX tablegenerator use HasVEX_L and HasEVEX_L2 fields instead of the composite EVEX_LL field. Remove the EVEX_LL field. NFCI
The composite existed to simplify some other tablegen code and not really in an
important way. Remove the combined field and just calculate the vector size
using two ifs.
[X86] Use VEX_WIG for VPINSRB/W and VPEXTRB/W to match what is done for EVEX.
The instruction's document this as W0 for the VEX encoding. But there's a
footnote mentioning that VEX.W is ignored in 64-bit mode. And the main VEX
encoding description says the VEX.W bit is ignored for instructions that are
equivalent to a legacy SSE instruction that uses REX.W to select a GPR which
would apply here.
By making this match EVEX we can remove a special case of allowing EVEX2VEX to
turn an EVEX.WIG instruction into VEX.W0.
Switch part of the computeOverflowForSignedAdd() implementation to
use Range.isAllNegative() rather than KnownBits.isNegative() and
similar. They do the same thing, but using the ConstantRange methods
allows dropping the KnownBits variables more easily in D60420.
[X86] Derive ssmem and sdmem from X86MemOperand. NFCI
This changes the operand type from v4f32/v2f64 to iPTR which seems more correct. But that doesn't seem to do anything other than change the comments in X86GenDAGISel.inc. Probably because we use a ComplexPattern to do the matching so there's no autogenerated code to change.
[InstCombine] peek through fdiv to find a squared sqrt
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)
(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)
llvm-undname: Fix more crashes and asserts on invalid inputs
For functions whose callers don't check that enough input is present,
add checks at the start of the function that enough input is there and
set Error otherwise.
For functions that return AST objects, return nullptr instead of
incomplete AST objects with nullptr fields if an error occurred during
the function.
Introduce a new function demangleDeclarator() for the sequence
demangleFullyQualifiedSymbolName(); demangleEncodedSymbol() and
use it in the two places that had this sequence. Let this new function
check that ConversionOperatorIdentifiers have a valid TargetType.
Some of the bad inputs found by oss-fuzz, others by inspection.
[X86] Fix a couple lowering functions that called ReplaceAllUsesOfValueWith for the newly created code and then return SDValue(). Use MERGE_VALUES instead.
Returning SDValue() makes the caller think custom lowering was unsuccessful and then it will fall back to trying to expand the original node. This expanded code will end up with no users and end up being pruned later. But it was useless unnecessary work to create it.
Instead return a MERGE_VALUES with all the results so the caller knows something changed. The caller can handle the replacements.
For one of the cases I had to use UNDEF has a dummy value for a result we know is unused. This should get pruned later.
Adrian Prantl [Mon, 8 Apr 2019 19:13:55 +0000 (19:13 +0000)]
Add LLVM IR debug info support for Fortran COMMON blocks
COMMON blocks are a feature of Fortran that has no direct analog in C languages, but they are similar to data sections in assembly language programming. A COMMON block is a named area of memory that holds a collection of variables. Fortran subprograms may map the COMMON block memory area to their own, possibly distinct, non-empty list of variables. A Fortran COMMON block might look like the following example.
COMMON /ALPHA/ I, J
For this construct, the compiler generates a new scope-like DI construct (!DICommonBlock) into which variables (see I, J above) can be placed. As the common block implies a range of storage with global lifetime, the !DICommonBlock refers to a !DIGlobalVariable. The Fortran variable that comprise the COMMON block are also linked via metadata to offsets within the global variable that stands for the entire common block.
Steven Wu [Mon, 8 Apr 2019 18:24:10 +0000 (18:24 +0000)]
[ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols
Summary:
ThinLTOCodeGenerator currently does not preserve llvm.used symbols and
it can internalize them. In order to pass the necessary information to the
legacy ThinLTOCodeGenerator, the input to the code generator is
rewritten to be based on lto::InputFile.
There is potential for miscompiled code emitted from JumpThreading when
analyzing a block with one or more indirectbr or callbr predecessors. The
ProcessThreadableEdges() function incorrectly folds conditional branches
into an unconditional branch.
This patch prevents incorrect branch folding without fully pessimizing
other potential threading opportunities through the same basic block.
This IR shape was manually fed in via opt and is unclear if clang and the
full pass pipeline will ever emit similar code shapes.
Thanks to Matthias Liedtke for the bug report and simplified IR example.
[llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.