2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
Teresa Johnson [Thu, 3 Nov 2016 01:07:16 +0000 (01:07 +0000)]
[ThinLTO] Handle distributed backend case when doing renaming
Summary:
The recent change I made to consult the summary when deciding whether to
rename (to handle inline asm) in r285513 broke the distributed build
case. In a distributed backend we will only have a portion of the
combined index, specifically for imported modules we only have the
summaries for any imported definitions. When renaming on import we were
asserting because no summary entry was found for a local reference being
linked in (def wasn't imported).
We only need to consult the summary for a renaming decision for the
exporting module. For imports, we would have prevented importing any
references to NoRename values already.
Kevin Enderby [Wed, 2 Nov 2016 21:08:39 +0000 (21:08 +0000)]
Add the rest of the additional error checks for invalid Mach-O files when
the offsets and sizes of an element of the Mach-O file overlaps with
another element in the Mach-O file.
Some other tests for malformed Mach-O files now run into these
checks so their tests were also adjusted.
Zachary Turner [Wed, 2 Nov 2016 17:05:19 +0000 (17:05 +0000)]
Add CodeViewRecordIO for reading and writing.
Using a pattern similar to that of YamlIO, this allows
us to have a single codepath for translating codeview
records to and from serialized byte streams. The
current patch only hooks this up to the reading of
CodeView type records. A subsequent patch will hook
it up for writing of CodeView type records, and then a
third patch will hook up the reading and writing of
CodeView symbols.
Nicolai Haehnle [Wed, 2 Nov 2016 17:03:11 +0000 (17:03 +0000)]
AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.
Create the virtual register for the global base in the intersection of
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
[Reassociate] Skip analysis of dead code to avoid infinite loop.
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).
The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Shoaib Meenai [Wed, 2 Nov 2016 06:10:03 +0000 (06:10 +0000)]
[CMake] Set default build type correctly
At least with cmake 3.6.1, the default build type setting was having no
effect; the generated CMakeCache.txt still had an empty CMAKE_BUILD_TYPE.
Force the variable to be set to achieve the desired behavior.
Bitcode: Change reader interface to take memory buffers.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html
This change also fixes an API oddity where BitstreamCursor::Read() would
return zero for the first read past the end of the bitstream, but would
report_fatal_error for subsequent reads. Now we always report_fatal_error
for all reads past the end. Updated clients to check for the end of the
bitstream before reading from it.
I also needed to add padding to the invalid bitcode tests in
test/Bitcode/. This is because the streaming interface was not checking that
the file size is a multiple of 4.
Alex Bradbury [Tue, 1 Nov 2016 23:47:30 +0000 (23:47 +0000)]
[RISCV] Add bare-bones RISC-V MCTargetDesc
This is enough to compile and link but doesn't yet do anything particularly
useful. Once an ASM parser and printer are added in the next two patches, the
whole thing can be usefully tested.
For now, only add instruction definitions for basic ALU operations. Our
initial target is a working MC layer rather than codegen, so appropriate
SelectionDAG patterns will come later.
Matt Arsenault [Tue, 1 Nov 2016 22:55:07 +0000 (22:55 +0000)]
AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
Sam McCall [Tue, 1 Nov 2016 22:02:14 +0000 (22:02 +0000)]
Fix uninitialized access in MachineBlockPlacement.
Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.
MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio
This is a naive fix to get things green again. iteratee: you may have a better fix.
This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.
---------------------------
If the section name string table section index is greater than or
equal to SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX
(0xffff) and the actual index of the section name string table section
is contained in the sh_link field of the section header at index 0.
---------------------------
So we only have to check for it being SHN_XINDEX. Also, sh_link is
always 32 bits, so don't return an uintX_t.
Matt Arsenault [Tue, 1 Nov 2016 20:42:24 +0000 (20:42 +0000)]
AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.
[Hexagon] Rename operand/predicate names for unshifted integers
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.
Matt Arsenault [Tue, 1 Nov 2016 18:34:00 +0000 (18:34 +0000)]
BranchRelaxation: Expand unconditional branches first
It's likely if a conditional branch needs to be expanded, the following
unconditional branch will also need expansion. By expanding the
unconditional branch first, the conditional branch can be simply
inverted to jump over the inserted indirect branch block. If the
conditional branch is expanded first, it results in an additional
branch.
Chris Bieneman [Tue, 1 Nov 2016 17:44:58 +0000 (17:44 +0000)]
[CMake] Fix rpath construction for out-of-tree builds
This patch was produced in conjunction with Michał Górny. It should resolve the issues that were trying to be solved by D25304.
This moves rpath handling into `llvm_add_library` and `add_llvm_executable` so that it is available to all projects using AddLLVM whether built in-tree or out-of-tree.
Alex Bradbury [Tue, 1 Nov 2016 17:27:54 +0000 (17:27 +0000)]
[RISCV] Add stub backend
This contains just enough for lib/Target/RISCV to compile. Notably a basic
RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc
-march=riscv32 myinput.ll and will find it fails due to the lack of
MCAsmInfo.
See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for
further discussion
Alex Bradbury [Tue, 1 Nov 2016 16:59:37 +0000 (16:59 +0000)]
[RISCV] Add RISC-V ELF defines
Add the necessary definitions for RISC-V ELF files, including relocs. Also
make necessary trivial change to ELFYaml, llvm-objdump, and llvm-readobj in
order to work with RISC-V ELFs.
Alex Bradbury [Tue, 1 Nov 2016 16:47:54 +0000 (16:47 +0000)]
[RISCV] Recognise riscv32 and riscv64 in triple parsing code
This is the first in a series of 10 initial patches that incrementally add an
MC layer for RISC-V to LLVM. See
<http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html> for more
discussion.
Alex Bradbury [Tue, 1 Nov 2016 16:32:05 +0000 (16:32 +0000)]
[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.
Tom Stellard [Tue, 1 Nov 2016 16:31:48 +0000 (16:31 +0000)]
AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.
The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot. I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.
Sjoerd Meijer [Tue, 1 Nov 2016 15:59:37 +0000 (15:59 +0000)]
This is a 1 character fix for an ARM build attribute test (r284571): the
purpose of the test was to have 2 different function attribute sets, but due
to a typo there was only one both with number #0.
Chris Dewhurst [Tue, 1 Nov 2016 14:23:37 +0000 (14:23 +0000)]
[Sparc][LEON] Test for FixFDIVSQRT erratum fix.
Note: Test is per differential review, but the other changed code in the review was for an optimisation that din't quite work. Nevertheless, the test is valid for the unoptimised version of the fix.
James Molloy [Tue, 1 Nov 2016 13:37:41 +0000 (13:37 +0000)]
[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
Serge Pavlov [Tue, 1 Nov 2016 06:53:29 +0000 (06:53 +0000)]
Allow resolving response file names relative to including file
If a response file included by construct @file itself includes a response file
and that file is specified by relative file name, current behavior is to resolve
the name relative to the current working directory. The change adds additional
flag to ExpandResponseFiles that may be used to resolve nested response file
names relative to including file. With the new mode a set of related response
files may be kept together and reference each other with short position
independent names.
Sanjoy Das [Tue, 1 Nov 2016 02:58:30 +0000 (02:58 +0000)]
[TBAA] Use wrapper objects instead of raw getOperand s; NFC
This is intended to make the semantic intent clearer.
The wrapper objects are now generic to avoid `const_cast` s. Since
`const` ness is part of the API of `MDNode::getMostGenericTBAA` (and
therefore I can't make things `const` all the way through without some
code churn outside TypeBasedAliasAnalysis.cpp), this seemed like the
cleanest solution.
Sanjay Patel [Mon, 31 Oct 2016 23:28:45 +0000 (23:28 +0000)]
[DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844
The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841