Add a missing return to the move assignment operator for
SequenceNumberManager.
Sadly, we don't have any unittests for this class because it is
a private class. Since it seems to have a nice isolated and testable
interface, it'd be great to extract it to a detail namespace and write
unit tests for it as then we could catch issues. I'll probably pester
Lang about that or some alternative refactoring.
Remove dead code trying to handle when the amount of data read is
insufficient to populate the expected struct. Prior to this we already
bailed out of the routine when this situation comes up, so none of this
code had any effect.
If someone wants to bring it back to handle these cases, fixing the
earlier conditions and adding the necessary test cases that actually
exercises it, they can always revert this and go from there.
Both of these were noticed by PVS-Studio due to the identical (dead)
condition.
Only log the visit of a return instruction if we in fact found a return
instruction.
This avoids dereferencing null in the debug logging if the instruction
was not in fact a return instruction. This potential bug was found by
PVS-Studio.
This actually fixes the last of the "dereferenced a pointer before
checking it for null" reports in the recent PVS-Studio run. However,
there are quite a few reports of this nature that I did not do anything
to fix because they are pretty glaring false positives. They usually
took the form of quite clear correlated checks or a check made in
a separate function. I've even added asserts anywhere this correlation
wasn't pretty obvious and fundamental to the code.
Hoist check for TLI above all of the attempts to use it (including one
of which that is hidden inside a separate function call) and helpfully
before building expensive transaction infrastructure. This will avoid
crashing when running CGP in a generic mode if we ever managed to hit
this case.
Note that I spent some time looking at alternatives. CGP is actually
used without a TM or TLI in order to do some target-independent testing.
Further, all of the neighboring optimization techniques actually have
some paths that are effective even in the absence of TLI so this seemed
the correct scope at which to check and bypass logic. It still isn't
clear that long-term support for missing TM/TLI is the right
cost/benefit tradeoff for CGP -- we seem to get relatively little for it
and the code is just littered with checks (and assumptions which
I suspect are still missing some checks).
This at least fixes the potential bug in this code spotted by
PVS-Studio, so we've got that going for us. ;]
Brian Gesiak [Thu, 3 Nov 2016 23:41:49 +0000 (23:41 +0000)]
[lit] Remove TODO
Summary:
Instead of keeping track of TODOs for lit in a file checked into source
control, use LLVM's bug tracker. The TODOs have been migrated to the
following bugs:
Sink all of the code relying on the MachO MachineModuleInfo to live
behind the test that the MachineModuleInfo analysis was
actually available and can be used.
While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.
This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.
Weiming Zhao [Thu, 3 Nov 2016 21:49:08 +0000 (21:49 +0000)]
[Cortex-M0] Atomic lowering
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.
Kevin Enderby [Thu, 3 Nov 2016 20:51:28 +0000 (20:51 +0000)]
Add support for the ARM_THREAD_STATE64 and
in llvm-objdump for Mach-O files add the printing of the
ARM_THREAD_STATE64 in the same format as
otool-classic(1) on darwin.
To do this the 64-bit ARM general tread state
needed to be defined in include/llvm/Support/MachO.h .
All error checking now happens when the information is needed. The
only thing left is the minimum size of the buffer and that can be just
a precondition. I will add an ErrorOr create method in a followup
commit.
Also don't store a pointer to the Header, since it is just a trivial
cast.
Adrian Prantl [Thu, 3 Nov 2016 19:42:02 +0000 (19:42 +0000)]
Add DWARF debug info support for C++11 inline namespaces.
This implements the DWARF 5 DW_AT_export_symbols feature:
http://dwarfstd.org/ShowIssue.php?issue=141212.1
Michael LeMay [Thu, 3 Nov 2016 19:14:46 +0000 (19:14 +0000)]
[ADT] IntervalMap: fix setStart and setStop
Summary:
These functions currently require that the new closed interval has a length of
at least 2. They also currently permit empty half-open intervals. This patch
defines nonEmpty in each traits structure and uses it to correct the
implementations of setStart and setStop.
Tom Stellard [Thu, 3 Nov 2016 17:56:46 +0000 (17:56 +0000)]
AMDGPU/SI: Re add VIInstructions.td to unbreak bots
This file is unused as of r285939, but we need to keep it around
for bots that don't do full rebuilds. We should be able to delete this
again in a few days.
This condition is trivially always true prior to the change. The comment
at the call site makes it clear that we expect *all* of these to be '=',
'S', or 'I' so fix the code.
We have a bug I will update to track the fact that Clang doesn't warn on
this: http://llvm.org/PR13101
A large number of tests in the LLVM tree use the default (CHECK) prefix
to indicate checked expressions via FileCheck. Highlight it as a
special comment. Although this wont get all the instances of the
checked patters, it is strictly better than the current state.
[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Nicolai Haehnle [Thu, 3 Nov 2016 14:25:04 +0000 (14:25 +0000)]
DAGCombiner: fix use-after-free when merging consecutive stores
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.
John Brawn [Thu, 3 Nov 2016 13:55:04 +0000 (13:55 +0000)]
[CMake] Make CMAKE_INSTALL_RPATH work again
r285714 made it so that when CMAKE_INSTALL_RPATH is set _install_rpath is not
set, but that means INSTALL_RPATH gets set to an empty string which isn't what
we want. Fix this by setting INSTALL_RPATH only when _install_rpath is set.
James Molloy [Thu, 3 Nov 2016 10:18:20 +0000 (10:18 +0000)]
[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
Teresa Johnson [Thu, 3 Nov 2016 01:07:16 +0000 (01:07 +0000)]
[ThinLTO] Handle distributed backend case when doing renaming
Summary:
The recent change I made to consult the summary when deciding whether to
rename (to handle inline asm) in r285513 broke the distributed build
case. In a distributed backend we will only have a portion of the
combined index, specifically for imported modules we only have the
summaries for any imported definitions. When renaming on import we were
asserting because no summary entry was found for a local reference being
linked in (def wasn't imported).
We only need to consult the summary for a renaming decision for the
exporting module. For imports, we would have prevented importing any
references to NoRename values already.
Kevin Enderby [Wed, 2 Nov 2016 21:08:39 +0000 (21:08 +0000)]
Add the rest of the additional error checks for invalid Mach-O files when
the offsets and sizes of an element of the Mach-O file overlaps with
another element in the Mach-O file.
Some other tests for malformed Mach-O files now run into these
checks so their tests were also adjusted.
Zachary Turner [Wed, 2 Nov 2016 17:05:19 +0000 (17:05 +0000)]
Add CodeViewRecordIO for reading and writing.
Using a pattern similar to that of YamlIO, this allows
us to have a single codepath for translating codeview
records to and from serialized byte streams. The
current patch only hooks this up to the reading of
CodeView type records. A subsequent patch will hook
it up for writing of CodeView type records, and then a
third patch will hook up the reading and writing of
CodeView symbols.
Nicolai Haehnle [Wed, 2 Nov 2016 17:03:11 +0000 (17:03 +0000)]
AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.
Create the virtual register for the global base in the intersection of
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
[Reassociate] Skip analysis of dead code to avoid infinite loop.
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).
The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Shoaib Meenai [Wed, 2 Nov 2016 06:10:03 +0000 (06:10 +0000)]
[CMake] Set default build type correctly
At least with cmake 3.6.1, the default build type setting was having no
effect; the generated CMakeCache.txt still had an empty CMAKE_BUILD_TYPE.
Force the variable to be set to achieve the desired behavior.
Bitcode: Change reader interface to take memory buffers.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html
This change also fixes an API oddity where BitstreamCursor::Read() would
return zero for the first read past the end of the bitstream, but would
report_fatal_error for subsequent reads. Now we always report_fatal_error
for all reads past the end. Updated clients to check for the end of the
bitstream before reading from it.
I also needed to add padding to the invalid bitcode tests in
test/Bitcode/. This is because the streaming interface was not checking that
the file size is a multiple of 4.
Alex Bradbury [Tue, 1 Nov 2016 23:47:30 +0000 (23:47 +0000)]
[RISCV] Add bare-bones RISC-V MCTargetDesc
This is enough to compile and link but doesn't yet do anything particularly
useful. Once an ASM parser and printer are added in the next two patches, the
whole thing can be usefully tested.
For now, only add instruction definitions for basic ALU operations. Our
initial target is a working MC layer rather than codegen, so appropriate
SelectionDAG patterns will come later.