Adrian Prantl [Mon, 2 Oct 2017 18:31:29 +0000 (18:31 +0000)]
Move the stripping of invalid debug info from the Verifier to AutoUpgrade.
This came out of a recent discussion on llvm-dev
(https://reviews.llvm.org/D38042). Currently the Verifier will strip
the debug info metadata from a module if it finds the debug info to be
malformed. This feature is very valuable since it allows us to improve
the Verifier by making it stricter without breaking backward compatibility,
but arguably the Verifier pass should not be modifying the IR. This
patch moves the stripping of broken debug info into AutoUpgrade
(UpgradeDebugInfo to be precise), which is a much better location for
this since the stripping of malformed debug info (i.e., produced by older,
buggy versions of Clang) is a (harsh) form of AutoUpgrade.
This change is mostly NFC in nature; the one big difference is the
behavior when LLVM module passes are introducing malformed debug
info. Prior to this patch, a NoAsserts build would have printed a
warning and stripped the debug info; after this patch the Verifier
will report a fatal error. I believe this behavior is actually more
desirable anyway.
Dehao Chen [Mon, 2 Oct 2017 18:13:14 +0000 (18:13 +0000)]
Update getMergedLocation to check the instruction type and merge properly.
Summary: If the merged instruction is a call instruction, we need to set the scope to the closest common scope between the two locations, otherwise it will cause trouble when the call is getting inlined.
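The "closest common scope" lookup can be pictured as a nearest-common-ancestor walk over scope parent links. The following is a minimal Python sketch, not the LLVM implementation: scopes are modeled as plain strings, and `parent` and `PARENT` are hypothetical names for a child-to-parent map introduced only for illustration.

```python
def closest_common_scope(scope_a, scope_b, parent):
    """Return the innermost scope that encloses both scope_a and scope_b.

    `parent` maps each scope to its enclosing scope; the outermost scope
    maps to None. This is a hypothetical model, not LLVM's DIScope API.
    """
    # Collect scope_a and every scope enclosing it.
    ancestors = set()
    scope = scope_a
    while scope is not None:
        ancestors.add(scope)
        scope = parent.get(scope)

    # Walk outward from scope_b until we reach one of those scopes.
    scope = scope_b
    while scope is not None:
        if scope in ancestors:
            return scope
        scope = parent.get(scope)
    return None


# Example scope tree: function "f" contains two sibling lexical blocks,
# and "block1" contains a nested block "inner".
PARENT = {"f": None, "block1": "f", "block2": "f", "inner": "block1"}
```

With this tree, two locations in "inner" and "block2" would share the function scope "f", while locations in "inner" and "block1" would share "block1".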
[Debug info] Handle endianness when moving debug info for split integer values
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch ensures that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
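As a rough illustration of the offset rule, the Python sketch below computes where the low and high halves of a split integer would sit in memory order. The function name and the dictionary layout are assumptions made for this example; they do not mirror the code in DAGTypeLegalizer::SetExpandedInteger.

```python
def split_integer_fragments(total_bits, big_endian):
    """Return {part: (offset_in_bits, size_in_bits)} for the lo/hi halves.

    Hypothetical helper: models only the ordering rule described in the
    commit message, not the actual DIExpression fragment handling.
    """
    half = total_bits // 2
    if big_endian:
        # The high half comes first in memory on big-endian targets.
        return {"hi": (0, half), "lo": (half, half)}
    # The low half comes first in memory on little-endian targets.
    return {"lo": (0, half), "hi": (half, half)}
```

For a 64-bit value on a big-endian target such as PPC32, the high part lands at offset 0 and the low part at offset 32; on a little-endian target the two are swapped.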
Hiroshi Inoue [Mon, 2 Oct 2017 09:24:00 +0000 (09:24 +0000)]
[PowerPC] support ZERO_EXTEND in tryBitPermutation
This patch adds support for ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.
Such cases often happen in array accesses with a logical AND operation in the index, e.g. array[i & 0xFF];
[X86][LLVM] Expanding support for lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
This patch continues adding support for interleaved accesses with different VFs in this pass.
It adds VF64 stride-3 support for both load and store.
It also adds support for the stride-4 store.
Craig Topper [Mon, 2 Oct 2017 00:44:50 +0000 (00:44 +0000)]
[X86] Use _NOREX MOVZX instructions for some patterns even in 32-bit mode.
This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode satisfy this constraint.
Ron Lieberman [Mon, 2 Oct 2017 00:34:07 +0000 (00:34 +0000)]
[Hexagon] Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Craig Topper [Sun, 1 Oct 2017 23:53:53 +0000 (23:53 +0000)]
[X86] Change register&memory TEST instructions from MRMSrcMem to MRMDestMem
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supersedes D38025 which was trying to switch the register&register form back to pre-PR22995.
Daniel Jasper [Sun, 1 Oct 2017 09:53:53 +0000 (09:53 +0000)]
Revert r314579: "Recommit r314561 after fixing over-debug assertion".
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
Michal Gorny [Sun, 1 Oct 2017 07:13:25 +0000 (07:13 +0000)]
[lit] Fix running lit tests in unconfigured source dir
Fix llvm_tools_dir attribute access not to fail when the variable is not
present. This directory is not really necessary to run lit tests,
and the code already accounts for it being None.
The reference was added in r313407, and it breaks the stand-alone lit
package in Gentoo.
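A tolerant attribute lookup of this kind is commonly written with `getattr` and a default value. The sketch below assumes a stand-in config object built from `SimpleNamespace`; `get_tools_dir`, `configured`, and `unconfigured` are hypothetical names for this example, and the actual change in lit may be shaped differently.

```python
from types import SimpleNamespace


def get_tools_dir(config):
    """Return config.llvm_tools_dir, or None when it was never set."""
    # getattr with a default avoids AttributeError when the attribute
    # is missing, e.g. in an unconfigured source directory.
    return getattr(config, "llvm_tools_dir", None)


# Stand-in config objects (not lit's real config class).
configured = SimpleNamespace(llvm_tools_dir="/path/to/bin")
unconfigured = SimpleNamespace()
```

Callers that already handle a value of None then work unchanged for both objects.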
Dehao Chen [Sun, 1 Oct 2017 05:24:51 +0000 (05:24 +0000)]
Separate the logic when handling indirect calls in SamplePGO ThinLTO compile phase and other phases.
Summary: In the SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracts that logic and makes it clearer before profile annotation. In the meantime, function importing needs to process both inlined callsites and indirect callsites that were not promoted.
Gadi Haber [Sat, 30 Sep 2017 14:30:23 +0000 (14:30 +0000)]
[X86][SKX] Added codegen regression test for avx512 instructions scheduling.NFC.
NFC.
Added codegen regression tests for avx512 instruction scheduling, called avx512-schedule.ll and
avx512-shuffle-schedule.ll.
This patch is in preparation for a larger patch adding all SKX instruction scheduling, and therefore
the scheduling for the avx512 instructions is still missing.
Marek Sokolowski [Sat, 30 Sep 2017 00:38:52 +0000 (00:38 +0000)]
[llvm-rc] Serialize DIALOG(EX) to .res files (serialization, pt 4).
This is now able to serialize DIALOG and DIALOGEX resources to .res
files. It still can't parse the dialog-specific CAPTION, FONT, and STYLE
optional statements; these will be added in the following patch.
A limited set of controls is included. However, more can be easily added
by extending the SupportedCtls map defined in ResourceScriptStmt.cpp.
[AMDGPU] Set fast-math flags on functions given the options
We have a single library build without relaxation options.
When inlined, library functions remove fast-math attributes
from the functions they are integrated into.
This patch sets relaxation attributes on the functions after
linking, provided the corresponding relaxation options are given.
Math instructions inside the inlined functions still have
no fast-math flags, but inlining no longer prevents fast-math
transformations of the surrounding caller code.
Yaxun Liu [Fri, 29 Sep 2017 23:31:14 +0000 (23:31 +0000)]
CodeGen: Fix pointer info in expandUnalignedLoad/Store
Currently expandUnalignedLoad/Store uses placeholder pointer info for the temporary memory operand
on the stack, which does not have the correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for the amdgcn target.
This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.
Eliminate PHI (int typed) which has only one use by inttoptr
This patch will eliminate redundant inttoptr/ptrtoint casts that pessimize
analyses such as SCEV and AA, and will make optimization passes such
as auto-vectorization more powerful.
Brian Gesiak [Fri, 29 Sep 2017 19:34:57 +0000 (19:34 +0000)]
[CMake] Remove `CMAKE_.*_OUTPUT_DIRECTORY` (NFCI)
Summary:
Three `CMAKE_.*_OUTPUT_DIRECTORY` variables used to be set in CMake and
referenced in various other parts of the project. However, in r198205
chapuni added a note to "don't set them anymore", and any remaining
references to them were subsequently removed in r198316 and r199592.
Now that the variables are no longer used anywhere, remove them, along
with the comments advising against using them any longer.
Test Plan:
I ran `check-all` and confirmed the tests built and passed.
Matthew Simpson [Fri, 29 Sep 2017 18:07:39 +0000 (18:07 +0000)]
[LV] Use correct insertion point when type shrinking reductions
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Marek Sokolowski [Fri, 29 Sep 2017 17:46:32 +0000 (17:46 +0000)]
[llvm-rc] Refactoring needed for ACCELERATORS and MENU resources.
This is a part of the llvm-rc serialization patch set (serialization, pt 1.5).
This:
* Unifies the internal representation of flags in ACCELERATORS and MENU
with the corresponding representation in .res files (noticed in
https://reviews.llvm.org/D37828#inline-329828).
* Creates an RCResource subclass, OptStatementsRCResource, describing
resource statements that can declare resource-local optional statements
(proposed in https://reviews.llvm.org/D37824#inline-329775).
These modifications don't fit into any of the current patches, so I'm
submitting them as a separate patch.
Marek Sokolowski [Fri, 29 Sep 2017 17:14:09 +0000 (17:14 +0000)]
[llvm-rc] Serialize HTML resources to .res files (serialization, pt 1).
This allows processing HTML resources defined in .rc scripts and outputting
them to the resulting .res files. Additionally, some infrastructure for
outputting these files is created.
This is the first resource type we can operate on.
Thanks to Nico Weber for his original work in this area.
Adam Nemet [Fri, 29 Sep 2017 16:56:54 +0000 (16:56 +0000)]
Display relative hotness with two decimal digits after the decimal point
I've seen cases where tiny inlined functions have such a high execution count
that most everything would show up with a relative hotness of 0%. Since
the inlined functions effectively disappear you need to tune in the lower
range, thus we need more precision.
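The effect of the extra precision can be seen with a small formatting sketch. `relative_hotness` is a hypothetical helper written for this example, and it assumes that relative hotness is a count expressed as a percentage of the maximum count; the actual tool code may compute it differently.

```python
def relative_hotness(count, max_count, digits=2):
    """Format `count` as a percentage of `max_count`.

    Hypothetical helper: `digits` is the number of digits shown after
    the decimal point.
    """
    return f"{100.0 * count / max_count:.{digits}f}%"
```

A count of 4 against a maximum of 10000 renders as "0%" with no fractional digits, but as "0.04%" with two, which is the distinction the commit is after.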
The test attempts to use -1 as carry-in for v_addc_*.
Before writing r314522, I did actually test this on real hardware,
and found that it doesn't work. So r314522 is correct in restricting
the carry-in operand: just remove those tests to make things pass
again.
Teresa Johnson [Fri, 29 Sep 2017 15:55:42 +0000 (15:55 +0000)]
[ThinLTO] Use decimal suffix for promoted values to match demanglers
Summary:
Demanglers such as libiberty know how to strip suffixes of the form
\.[a-zA-Z]+\.\d+, but our current promoted value suffixes are
.llvm.${modulehash}, where the module hash is in hex. Change the
module hash to decimal to allow demanglers to handle this.
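The difference between the two suffix styles can be checked against the quoted pattern directly. This Python sketch uses the regular expression from the commit message, anchored at the end of the name; `strip_suffix` and `promoted_name` are hypothetical helpers for illustration and are not how libiberty or LLVM implement this.

```python
import re

# Suffix shape quoted in the commit message, anchored at the end of the name.
STRIPPABLE_SUFFIX = re.compile(r"\.[a-zA-Z]+\.\d+$")


def strip_suffix(symbol):
    """Remove a trailing .<letters>.<decimal digits> suffix, if present."""
    return STRIPPABLE_SUFFIX.sub("", symbol)


def promoted_name(name, module_hash, decimal=True):
    """Build a promoted name in the .llvm.<modulehash> style (illustrative)."""
    digits = str(module_hash) if decimal else format(module_hash, "x")
    return f"{name}.llvm.{digits}"
```

A hash rendered in hex, such as `deadbeef`, contains letters and so does not match `\d+`; the same hash rendered in decimal does, and the suffix is stripped.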
[dwarfdump][NFC] Consistent printing of address ranges
This implements the insertion operator for DWARF address ranges so they
are consistently printed as [LowPC, HighPC).
While a dump method might have felt more consistent, it is used
exclusively for printing error messages in the verifier and never used
for actual dumping. Hence this approach is more intuitive and creates
less clutter at the call sites.
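The patch itself adds a C++ insertion operator; a rough Python analogue is a `__str__` method, so that every call site formats a range the same way. The class below is a hypothetical stand-in, and the 16-digit hex width is an assumption chosen for this example rather than something stated in the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    """Hypothetical stand-in for a DWARF address range."""

    low_pc: int
    high_pc: int

    def __str__(self):
        # Half-open interval: low_pc is included, high_pc is not.
        return f"[0x{self.low_pc:016x}, 0x{self.high_pc:016x})"
```

An error message can then embed the range with `f"invalid range {r}"` instead of formatting the two bounds by hand at each call site.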
Use the basic cost if a GEP is not used as addressing mode
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check the GEP's actual users, it will return FREE even in cases
where the GEP cannot be folded away as part of an actual addressing mode.
For example, if a user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
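The decision described above can be modeled in a few lines. This is a simplified Python sketch under stated assumptions, not the code in the patch: users are represented by hypothetical kind strings, only "load" and "store" are assumed able to fold the GEP into an addressing mode, and the two cost constants are local stand-ins.

```python
# Local stand-ins for the cost constants named in the commit message.
TCC_FREE = 0
TCC_BASIC = 1

# Assumption for this sketch: only these user kinds can fold a GEP
# into their addressing mode.
MEMORY_ACCESS_USERS = {"load", "store"}


def gep_cost(is_legal_addressing_mode, user_kinds):
    """Return TCC_FREE only if every user can fold the GEP away."""
    if is_legal_addressing_mode and all(
        kind in MEMORY_ACCESS_USERS for kind in user_kinds
    ):
        return TCC_FREE
    return TCC_BASIC
```

Under this model a GEP passed to a call is charged the basic cost even when it is a legal addressing mode. Note that a GEP with no users at all is treated as free here, which is a simplification of the sketch.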
Jonas Paulsson [Fri, 29 Sep 2017 14:31:39 +0000 (14:31 +0000)]
[SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Sam Parker [Fri, 29 Sep 2017 13:11:33 +0000 (13:11 +0000)]
[ARM] v8.3-a complex number support
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
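The packing convention can be illustrated with a flat list standing in for a vector register. In this Python sketch, lower list indices are assumed to correspond to less significant elements, and `pack_complex` and `unpack_complex` are hypothetical helpers, not part of any ARM or LLVM API.

```python
def pack_complex(values):
    """Pack complex numbers into a flat list of lanes as (real, imag) pairs."""
    lanes = []
    for z in values:
        lanes.append(z.real)  # less significant element of the pair
        lanes.append(z.imag)  # more significant element of the pair
    return lanes


def unpack_complex(lanes):
    """Inverse of pack_complex: rebuild complex numbers from lane pairs."""
    return [complex(lanes[i], lanes[i + 1]) for i in range(0, len(lanes), 2)]
```

Two complex numbers therefore occupy four lanes, with real parts in the even lanes and imaginary parts in the odd lanes.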
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain the test that fails
when the compiler is not built in debug mode.