Marek Sokolowski [Sat, 30 Sep 2017 00:38:52 +0000 (00:38 +0000)]
[llvm-rc] Serialize DIALOG(EX) to .res files (serialization, pt 4).
This is now able to serialize DIALOG and DIALOGEX resources to .res
files. It still can't parse dialog-specific CAPTION, FONT, and STYLE
optional statement - these will be added in the following patch.
A limited set of controls is included. However, more can be easily added
by extending SupportedCtls map defined in ResourceScriptStmt.cpp.
[AMDGPU] Set fast-math flags on functions given the options
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.
This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.
Yaxun Liu [Fri, 29 Sep 2017 23:31:14 +0000 (23:31 +0000)]
CodeGen: Fix pointer info in expandUnalignedLoad/Store
Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand
in stack, which does not have correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for amdgcn target.
This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.
Eliminate PHI (int typed) which has only one use by intptr
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.
Brian Gesiak [Fri, 29 Sep 2017 19:34:57 +0000 (19:34 +0000)]
[CMake] Remove `CMAKE_.*_OUTPUT_DIRECTORY` (NFCI)
Summary:
Three `CMAKE_.*_OUTPUT_DIRECTORY` variables used to be set in CMake and
referenced in various other parts of the project. However, in r198205
chapuni added a note to "don't set them anymore", and any remaining
references to them were subsequently removed in r198316 and r199592.
Now that the variables are no longer used anywhere, remove them, along
with the comments advising against using them any longer.
Test Plan:
I ran `check-all` and confirmed the tests built and passed.
Matthew Simpson [Fri, 29 Sep 2017 18:07:39 +0000 (18:07 +0000)]
[LV] Use correct insertion point when type shrinking reductions
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Marek Sokolowski [Fri, 29 Sep 2017 17:46:32 +0000 (17:46 +0000)]
[llvm-rc] Refactoring needed for ACCELERATORS and MENU resources.
This is a part of llvm-rc serialization patch set (serialization, pt 1.5).
This:
* Unifies the internal representation of flags in ACCELERATORS and MENU
with the corresponding representation in .res files (noticed in
https://reviews.llvm.org/D37828#inline-329828).
* Creates an RCResource subclass, OptStatementsRCResource, describing
resource statements that can declare resource-local optional statements
(proposed in https://reviews.llvm.org/D37824#inline-329775).
These modifications don't fit to any of the current patches, so I'm
submitting them as a separate patch.
Marek Sokolowski [Fri, 29 Sep 2017 17:14:09 +0000 (17:14 +0000)]
[llvm-rc] Serialize HTML resources to .res files (serialization, pt 1).
This allows to process HTML resources defined in .rc scripts and output
them to resulting .res files. Additionally, some infrastructure allowing
to output these files is created.
This is the first resource type we can operate on.
Thanks to Nico Weber for his original work in this area.
Adam Nemet [Fri, 29 Sep 2017 16:56:54 +0000 (16:56 +0000)]
Display relative hotness with two decimal digits after the decimal point
I've seen cases where tiny inlined functions have such a high execution count
that most everything would show up with a relative of hotness of 0%. Since
the inlined functions effectively disappear you need to tune in the lower
range, thus we need more precision.
The test attempts to use -1 as carry-in for v_addc_*.
Before writing r314522, I did actually test this on real hardware,
and found that it doesn't work. So r314522 is correct in restricting
the carry-in operand: just remove those tests to make things pass
again.
Teresa Johnson [Fri, 29 Sep 2017 15:55:42 +0000 (15:55 +0000)]
[ThinLTO] Use decimal suffix for promoted values to match demanglers
Summary:
Demanglers such as libiberty know how to strip suffixes of the form
\.[a-zA-Z]+\.\d+, but our current promoted value suffixes are
.llvm.${modulehash}, where the module hash is in hex. Change the
module hash to decimal to allow demanglers to handle this.
[dwarfdump][NFC] Consistent printing of address ranges
This implement the insertion operator for DWARF address ranges so they
are consistently printed as [LowPC, HighPC).
While a dump method might have felt more consistent, it is used
exclusively for printing error messages in the verifier and never used
for actual dumping. Hence this approach is more intuitive and creates
less clutter at the call sites.
Use the basic cost if a GEP is not used as addressing mode
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Jonas Paulsson [Fri, 29 Sep 2017 14:31:39 +0000 (14:31 +0000)]
[SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Sam Parker [Fri, 29 Sep 2017 13:11:33 +0000 (13:11 +0000)]
[ARM] v8.3-a complex number support
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain test which fails
when compiler is not build in debug mode.
Add missing license information to MicroMipsInstrFPU.td and
fix most of the formatting errors present. Others will be
addressed in a follow up commits.
Simon Pilgrim [Fri, 29 Sep 2017 10:02:01 +0000 (10:02 +0000)]
[X86][SSE] Added more tests for vector multiplications as utility for D37896
Added additional tests for vector multiplications with multipliers that are:
* powers of 2 displaced by 1,
* product of a power of 2 displaced by one with another power of 2.
Tim Renouf [Fri, 29 Sep 2017 09:51:22 +0000 (09:51 +0000)]
[AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.
Tim Renouf [Fri, 29 Sep 2017 09:49:35 +0000 (09:49 +0000)]
[AMDGPU] AMDPAL scratch buffer support
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.
With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.
Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.
The documentation for the AMDPAL ABI will be added in a later commit.
Tim Renouf [Fri, 29 Sep 2017 09:48:12 +0000 (09:48 +0000)]
[Triple] Add AMDPAL operating system type
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.
Currently it generates the same code as not specifying an OS at all.
That will change in future commits.
[dwarfdump][NFC] Consistent errors and warnings with --verify
This patch introduces 3 helper functions: error(), warn() and note() to
make printing during verification more consistent. When supported, the
respective prefixes are printed in color using the same color scheme as
clang.
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Coby Tayree [Fri, 29 Sep 2017 07:02:46 +0000 (07:02 +0000)]
[X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774
Marek Sokolowski [Fri, 29 Sep 2017 00:33:57 +0000 (00:33 +0000)]
[llvm-rc] Import all make_unique invocations from llvm namespace.
Previous patch fixed one of LLVM buildbots (lld-x86_64-win7).
However, some others have already been failing because of make_unique
compilation error (llvm-clang-x86_64-expensive-checks-win).
This allows llvm-rc to parse user-defined resources (ref:
msdn.microsoft.com/en-us/library/windows/desktop/aa381054.aspx).
These statements either import files, or put the specified raw data in
the resulting resource file.
Thanks to Nico Weber for his original work in this area.
This allows the ints to be written as integer expressions evaluating to
unsigned 16-bit/32-bit integers.
All the expressions may use the following operators: + - & | ~, and
parentheses. Minus token - can be also unary. There is no precedence of
the operators other than the unary operators binding stronger than their
binary counterparts.
[MachineOutliner][NFC] Simplify logic in pruneCandidates
This commit yanks out the repeated sections of code in pruneCandidates into
two lambdas: ShouldSkipCandidate and Prune. This simplifies the logic in
pruneCandidates significantly, and reduces the chance of introducing bugs by
folding all of the shared logic into one place.
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
ARM: Fix cases where CSI Restored bit is not cleared
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.
This fixes a number of cases where this wasn't handled correctly yet.
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to. This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.
[X86] Make use of vpmovwb when possible in LowerMULH
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
/code/llvm-project/llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp:260:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
[this](decltype(ObjLayer)::ObjHandleT,
Martin Storsjo [Thu, 28 Sep 2017 19:04:30 +0000 (19:04 +0000)]
[ARM] Restore the right frame pointer register in Int_eh_sjlj_longjmp
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.
On windows, the frame pointer should be restored into r11 instead of r7.
On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.
On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.
[X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes were possible in LowerMULH
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.
Adrian Prantl [Thu, 28 Sep 2017 18:10:52 +0000 (18:10 +0000)]
llvm-dwarfdump: implement --find for .apple_names
This patch implements the dwarfdump option --find=<name>. This option
looks for a DIE in the accelerator tables and dumps it if found. This
initial patch only adds support for .apple_names to keep the review
small, adding the other sections and pubnames support should be
trivial though.
[JumpThreading] Preserve DT and LVI across the pass
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
[X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.
Simon Dardis [Thu, 28 Sep 2017 15:24:07 +0000 (15:24 +0000)]
[mips] Remove codegen support for branch likely instructions.
This patch disables codegen support for branch likely instructions to
address a potential bug. These branches were unselectable as
they had the same patterns as the normal branches but came after them
when ISel was concerned.
The branch likely instructions were marked as having no delay
slots when they have annulling delay slots. The delay slot filler
does not currently handle annulling delay slot branches, so this
would lead to wrong codegen if these branches were generated.
[DebugInfo] Do not extend range for physreg in LiveDebugVariables
Summary:
A DBG_VALUE that is referring to a physical register is
valid up until the next def of the register, or the end
of the basic block that it belongs to.
LiveDebugVariables is computing live intervals (slot index
ranges) for DBG_VALUE instructions, before regalloc, in order
to be able to re-insert DBG_VALUE instructions again after
regalloc. When the DBG_VALUE is mapping a variable to a
physical register we do not need to compute the range. We
should simply re-insert the DBG_VALUE at the start position.
The problem that was found, resulting in this patch, was a
situation when the DBG_VALUE was the last real use of the
physical register. The computeIntervals/extendDef methods
extended the range to cover the whole basic block, even though
the physical register very well could be allocated to some
virtual register inside the basic block. So the extended
range could not be trusted.
This patch is a preparation for https://reviews.llvm.org/D38229,
where the goal is to insert DBG_VALUE after each new definition
of a variable, even if the virtual registers that the variable
was connected to has been coalesced into using the same physical
register (e.g. due to two address instructions). For more info
see https://bugs.llvm.org/show_bug.cgi?id=34545
Benjamin Kramer [Thu, 28 Sep 2017 12:53:20 +0000 (12:53 +0000)]
[LoopInfo] Don't poison random memory regions.
The second argument for Allocator::Deallocate is the number of elements,
not the size of a single element. In asan mode specifying a large number
of elements poisoned random memory regions, leading to crashes
everywhere.
Coby Tayree [Thu, 28 Sep 2017 11:04:08 +0000 (11:04 +0000)]
[x86][AsmParser] Allow some more MS size directives
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190
Sean Eveson [Thu, 28 Sep 2017 10:07:30 +0000 (10:07 +0000)]
[llvm-cov] Create directory structure when filtering using -name*= options
Before this change using any of the -name*= command line options with an output
directory would result in a single file (functions.txt/functions.html)
containing the coverage for those specific functions. Now you get the same
directory structure as when not using any -name*= options.
Alex Bradbury [Thu, 28 Sep 2017 09:31:46 +0000 (09:31 +0000)]
Teach TargetInstrInfo::getInlineAsmLength to parse .space directives with integer arguments
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.
This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }
This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
if (a == 0) dummy1();
a = func(a);
} while (a > 0);
Alex Bradbury [Thu, 28 Sep 2017 08:26:24 +0000 (08:26 +0000)]
[RISCV] Add common fixups and relocations
%lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to
ensure the appropriate fixups and relocations are generated. I've added an
instruction format field which is used in RISCVMCCodeEmitter to, for
instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup
(RISC-V has two 12-bit immediate encodings depending on the instruction
type).
Mikael Holmen [Thu, 28 Sep 2017 08:22:35 +0000 (08:22 +0000)]
[RegAllocGreedy]: Allow recoloring of done register if it's non-tied
Summary:
If we have a non-allocated register, we allow us to try recoloring of an
already allocated and "Done" register, even if they are of the same
register class, if the non-allocated register has at least one tied def
and the allocated one has none.
It should be easier to recolor the non-tied register than the tied one, so
it might be an improvement even if they use the same regclasses.