This condition is trivially always true prior to the change. The comment
at the call site makes it clear that we expect *all* of these to be '=',
'S', or 'I' so fix the code.
We have a bug I will update to track the fact that Clang doesn't warn on
this: http://llvm.org/PR13101
A large number of tests in the LLVM tree use the default (CHECK) prefix
to indicate checked expressions via FileCheck. Highlight it as a
special comment. Although this wont get all the instances of the
checked patters, it is strictly better than the current state.
[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Nicolai Haehnle [Thu, 3 Nov 2016 14:25:04 +0000 (14:25 +0000)]
DAGCombiner: fix use-after-free when merging consecutive stores
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.
John Brawn [Thu, 3 Nov 2016 13:55:04 +0000 (13:55 +0000)]
[CMake] Make CMAKE_INSTALL_RPATH work again
r285714 made it so that when CMAKE_INSTALL_RPATH is set _install_rpath is not
set, but that means INSTALL_RPATH gets set to an empty string which isn't what
we want. Fix this by setting INSTALL_RPATH only when _install_rpath is set.
James Molloy [Thu, 3 Nov 2016 10:18:20 +0000 (10:18 +0000)]
[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
Teresa Johnson [Thu, 3 Nov 2016 01:07:16 +0000 (01:07 +0000)]
[ThinLTO] Handle distributed backend case when doing renaming
Summary:
The recent change I made to consult the summary when deciding whether to
rename (to handle inline asm) in r285513 broke the distributed build
case. In a distributed backend we will only have a portion of the
combined index, specifically for imported modules we only have the
summaries for any imported definitions. When renaming on import we were
asserting because no summary entry was found for a local reference being
linked in (def wasn't imported).
We only need to consult the summary for a renaming decision for the
exporting module. For imports, we would have prevented importing any
references to NoRename values already.
Kevin Enderby [Wed, 2 Nov 2016 21:08:39 +0000 (21:08 +0000)]
Add the rest of the additional error checks for invalid Mach-O files when
the offsets and sizes of an element of the Mach-O file overlaps with
another element in the Mach-O file.
Some other tests for malformed Mach-O files now run into these
checks so their tests were also adjusted.
Zachary Turner [Wed, 2 Nov 2016 17:05:19 +0000 (17:05 +0000)]
Add CodeViewRecordIO for reading and writing.
Using a pattern similar to that of YamlIO, this allows
us to have a single codepath for translating codeview
records to and from serialized byte streams. The
current patch only hooks this up to the reading of
CodeView type records. A subsequent patch will hook
it up for writing of CodeView type records, and then a
third patch will hook up the reading and writing of
CodeView symbols.
Nicolai Haehnle [Wed, 2 Nov 2016 17:03:11 +0000 (17:03 +0000)]
AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.
Create the virtual register for the global base in the intersection of
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
[Reassociate] Skip analysis of dead code to avoid infinite loop.
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).
The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Shoaib Meenai [Wed, 2 Nov 2016 06:10:03 +0000 (06:10 +0000)]
[CMake] Set default build type correctly
At least with cmake 3.6.1, the default build type setting was having no
effect; the generated CMakeCache.txt still had an empty CMAKE_BUILD_TYPE.
Force the variable to be set to achieve the desired behavior.
Bitcode: Change reader interface to take memory buffers.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html
This change also fixes an API oddity where BitstreamCursor::Read() would
return zero for the first read past the end of the bitstream, but would
report_fatal_error for subsequent reads. Now we always report_fatal_error
for all reads past the end. Updated clients to check for the end of the
bitstream before reading from it.
I also needed to add padding to the invalid bitcode tests in
test/Bitcode/. This is because the streaming interface was not checking that
the file size is a multiple of 4.
Alex Bradbury [Tue, 1 Nov 2016 23:47:30 +0000 (23:47 +0000)]
[RISCV] Add bare-bones RISC-V MCTargetDesc
This is enough to compile and link but doesn't yet do anything particularly
useful. Once an ASM parser and printer are added in the next two patches, the
whole thing can be usefully tested.
For now, only add instruction definitions for basic ALU operations. Our
initial target is a working MC layer rather than codegen, so appropriate
SelectionDAG patterns will come later.
Matt Arsenault [Tue, 1 Nov 2016 22:55:07 +0000 (22:55 +0000)]
AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
Sam McCall [Tue, 1 Nov 2016 22:02:14 +0000 (22:02 +0000)]
Fix uninitialized access in MachineBlockPlacement.
Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.
MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio
This is a naive fix to get things green again. iteratee: you may have a better fix.
This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.
---------------------------
If the section name string table section index is greater than or
equal to SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX
(0xffff) and the actual index of the section name string table section
is contained in the sh_link field of the section header at index 0.
---------------------------
So we only have to check for it being SHN_XINDEX. Also, sh_link is
always 32 bits, so don't return an uintX_t.
Matt Arsenault [Tue, 1 Nov 2016 20:42:24 +0000 (20:42 +0000)]
AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.
[Hexagon] Rename operand/predicate names for unshifted integers
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.
Matt Arsenault [Tue, 1 Nov 2016 18:34:00 +0000 (18:34 +0000)]
BranchRelaxation: Expand unconditional branches first
It's likely if a conditional branch needs to be expanded, the following
unconditional branch will also need expansion. By expanding the
unconditional branch first, the conditional branch can be simply
inverted to jump over the inserted indirect branch block. If the
conditional branch is expanded first, it results in an additional
branch.