[LoopUnrollAnalyzer] Fix a bug in UnrolledInstAnalyzer::visitLoad.
When simplifying a load we need to make sure that the type of the
simplified value matches the type of the instruction we're processing.
In theory, we can handle casts here as we deal with constant data, but
since it's not implemented at the moment, we at least need to bail out.
Hal Finkel [Thu, 23 Jun 2016 13:46:39 +0000 (13:46 +0000)]
Allow DeadStoreElimination to track combinations of partial later wrties
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. In the map forms an interval covering the the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
the problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value in
the backend.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Daniel Sanders [Thu, 23 Jun 2016 12:42:53 +0000 (12:42 +0000)]
[mips] Don't derive the default ABI from the CPU in the backend.
Summary:
The backend has no reason to behave like a driver and should generally do
as it's told (and error out if it can't) instead of trying to figure out
what the API user meant. The default ABI is still derived from the arch
component as a concession to backwards compatibility.
API-users that previously passed an explicit CPU and a triple that was
inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a
different ABI to what they got before. However, it's expected that there
are no such users on the basis that CodeGen has been asserting that the
triple is consistent with the selected ABI for several releases. API-users
that were consistent or passed '' or 'generic' as the CPU will see no
difference.
Daniel Sanders [Thu, 23 Jun 2016 10:54:09 +0000 (10:54 +0000)]
[mips][ias] Integers are not registers.
Summary:
When parseAnyRegister() encounters a symbol alias, it parses integers and adds
a corresponding expression to the operand list. This is clearly wrong since the
only operands that parseAnyRegister() should be accepting are registers.
It's not clear why this code was added and there are no test cases that cover
it. I think it might be leftover from when searchSymbolAlias() was more widely
used.
Diana Picus [Thu, 23 Jun 2016 09:49:56 +0000 (09:49 +0000)]
[llc] Remove exit-on-error flag (PR27759)
This flag was introduced in r269655 with the new diagnostic handler for llc. Its
purpose was to keep the old behavior for some of the tests that didn't recover
well after an error. Those tests have been fixed, so now it's safe to remove the
flag entirely.
Simon Dardis [Thu, 23 Jun 2016 09:22:11 +0000 (09:22 +0000)]
[misched] Extend scheduler to handle unsupported features
Currently isComplete = 1 requires that every instruction must
be described, declared unsupported or marked as having no
scheduling information for a processor.
For some backends such as MIPS, this requirement entails
long regex lists of instructions that are unsupported.
This patch teaches Tablegen to skip over instructions that
are associated with unsupported feature when checking if the
scheduling model is complete.
Diana Picus [Thu, 23 Jun 2016 09:19:16 +0000 (09:19 +0000)]
[AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.
We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.
Jonas Paulsson [Thu, 23 Jun 2016 08:13:20 +0000 (08:13 +0000)]
[IfConversion] Bugfix: Don't use undef flag while adding use operands.
IfConversion used to always add the undef flag when adding a use operand
on a newly predicated instruction. This would be an operand for the register
being conditionally redefined. Due to the undef flag, the liveness of this
register prior to the predicated instruction would get lost.
This patch changes this so that such use operands are added only when the
register is live, without the undef flag.
Reviewed by Quentin Colombet.
http://reviews.llvm.org/D209077
Diana Picus [Thu, 23 Jun 2016 07:47:35 +0000 (07:47 +0000)]
[ARM] Do not test for CPUs, use SubtargetFeatures (Part 1). NFCI
This is a cleanup commit similar to r271555, but for ARM.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
Since the ARM backend seems to have quite a lot of calls to these methods, I
intend to submit 5-6 subtarget features at a time, instead of one big lump.
Sagar Thakur [Thu, 23 Jun 2016 06:39:35 +0000 (06:39 +0000)]
[LLVM][MIPS] Introduce 64 bit atomic check in CheckAtomic.cmake
Patch by Nitesh Jain.
Summary: On some target like MIPS32 we need to explicitly link atomic library for 64 bit atomic operations. This module then can be used in LLDB (http://reviews.llvm.org/D20464) or Libcxx (http://reviews.llvm.org/D16613) for explicitly link to atomic library.
Reid Kleckner [Wed, 22 Jun 2016 23:23:08 +0000 (23:23 +0000)]
Prune some includes from headers and sink some inline functions
MCSymbol.h shouldn't pull in MCAssembler.h, just MCFragment.h.
MCLinkerOptimizationHint.h shouldn't need MCMachObjectWriter.h. The
rest is fixing the fallout.
Chris Bieneman [Wed, 22 Jun 2016 22:19:08 +0000 (22:19 +0000)]
[MachO] Finish moving fat header swap functions to MachO.h
This is a follow-up to r273479. At the time I wrote r273479 I didn't connect the dots that the functions I was adding had to exist somewhere. Turns out, they do. This finishes moving the functions to MachO.h.
Existing MachO fat header tests like test/tools/llvm-readobj/Inputs/macho-universal-archive.x86_64.i386 execute this code.
Sanjoy Das [Wed, 22 Jun 2016 22:16:51 +0000 (22:16 +0000)]
[ImplicitNullChecks] Hoist trivial depdendencies if possible
When trying to convert a loading instruction into a FAULTING_LOAD, we
sometimes face code like this:
if %R10 is not null:
%R9<def> = MOV32ri Immediate
%R9<def, tied> = AND32rm %R9, 0x20(%R10)
else:
goto TRAP
In these cases we would like to use the AND32rm instruction as the
faulting operation by hoisting the "depedency" def-ing %R9 also above
the control flow, transforming the program into:
Pawel Bylica [Wed, 22 Jun 2016 21:15:51 +0000 (21:15 +0000)]
Do not require __STDC_LIMIT_MACROS and others
Summary: Do not require __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS macros to be defined globally. They are not needed for C++11 compliant standard headers.
Chris Bieneman [Wed, 22 Jun 2016 21:01:19 +0000 (21:01 +0000)]
[CMake] LLVM_BINARY_DIR was not being properly set in LLVMConfig.cmake
LLVMConfig.cmake needs to set LLVM_BINARY_DIR differently based on whether or not it is the build directory or the install directory. The build directory just needs to set the value from the configuration, the install directory needs to set it to the install prefix.
Matt Arsenault [Wed, 22 Jun 2016 20:15:28 +0000 (20:15 +0000)]
AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.
[Hexagon] Add SDAG preprocessing step to expose shifted addressing modes
Transform: (store ch addr (add x (add (shl y c) e)))
to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.
Davide Italiano [Wed, 22 Jun 2016 19:50:42 +0000 (19:50 +0000)]
[UpdateCompilerUsed] API rename and cleanup, suggested by Rafaael.
* UpdateCompilerUsed() -> updateCompilerUsed()
* ThinLTO doesn't use the API so we can remove the include
* Clean up unused #include <functional> from the header
* Rename #ifdef guard comment to be correct.
-view-machine-block-freq-propagation-dags currently
support integer and fraction as the suboptions. This
patch adds the 'count' suboption to display actual
profile count if available.
Sanjay Patel [Wed, 22 Jun 2016 19:20:59 +0000 (19:20 +0000)]
[ValueTracking] improve ComputeNumSignBits for vector constants
This is similar to the computeKnownBits improvement in rL268479.
There's probably more we can do for vector logic instructions, but
this should let us see non-splat constant masking ops that can
become vector selects instead of and/andn/or sequences.
Reid Kleckner [Wed, 22 Jun 2016 18:31:14 +0000 (18:31 +0000)]
[codeview] Add IntroducingVirtual debug info flag
CodeView needs to know if a virtual method was introduced in the current
class, and base classes may not have complete type information, so we
need to thread this bit through from the frontend.
Reid Kleckner [Wed, 22 Jun 2016 17:32:59 +0000 (17:32 +0000)]
[codeview] Fix trivial bug in OneMethodRecord::isIntroducingVirtual
These should be equality comparisons. Fixes assertions while
self-hosting clang with codeview debug info.
Ultimately this is going to be covered by real tests for virtual method
emission, so I'm not adding a "don't crash on this input" test that I'll
remove soon afterwards.
Vedant Kumar [Wed, 22 Jun 2016 17:30:58 +0000 (17:30 +0000)]
[asan] Do not instrument accesses to profiling globals
It's only useful to asan-itize profiling globals while debugging llvm's
profiling instrumentation passes. Enabling asan along with instrprof or
gcov instrumentation shouldn't incur extra overhead.
This patch is in the same spirit as r264805 and r273202, which disabled
tsan instrumentation of instrprof/gcov globals.
Reid Kleckner [Wed, 22 Jun 2016 17:15:28 +0000 (17:15 +0000)]
[codeview] Defer emission of all referenced complete records
This is the motivating example:
struct B { int b; };
struct A { B *b; };
int f(A *p) { return p->b->b; }
Clang emits complete types for both A and B because they are required to
be complete, but our CodeView emission would only emit forward
declarations of A and B. This was a consequence of the fact that the A*
type must reference the forward declaration of A, which doesn't
reference B at all.
We can't eagerly emit complete definitions of A and B when we request
the forward declaration's type index because of recursive types like
linked lists. If we did that, our stack usage could get out of hand, and
it would be possible to lower a type while attempting to lower a type,
and we would need to double check if our type is already present in the
TypeIndexMap after all recursive getTypeIndex calls.
Instead, defer complete type emission until after all type lowering has
completed. This ensures that all referenced complete types are emitted,
and that type lowering is not re-entrant.
Artur Pilipenko [Wed, 22 Jun 2016 15:16:06 +0000 (15:16 +0000)]
Upgrade old memset/memcpy signatures (without isVolatile argument) in tests
We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect.
Artur Pilipenko [Wed, 22 Jun 2016 14:56:33 +0000 (14:56 +0000)]
NFC. Move Verifier::verifyIntrinsicType to Intrinsics.h
Move Verifier::verifyIntrinsicType to Intrinsics::matchIntrinsicsType. Will be used to accumulate overloaded types of a given intrinsic by the upcoming patch to fix intrinsics names when overloaded types are renamed.
George Rimar [Wed, 22 Jun 2016 13:43:38 +0000 (13:43 +0000)]
[llvm-readobj] - Teach llvm-readobj to print dependencies of SHT_GNU_verdef and refactor dumping method.
This patch changes single method of llvm-readobj.
It teaches SHT_GNU_verdef dumper to print version dependencies,
also it removes few fields from output that can be dumped with other keys
and slightly refactors code.
Testcase was also modified to match the changes.
Change is required for testcases of upcoming lld patches.
[SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Try to be more clever about selecting the default format. When an existing
archive is used, use the type of the archive to determine the format. When
existing members are present, use the first member's format to determine the
format to use. If we are creating an empty archive (MRI mode) or are adding
non-object members, default to the current behaviour of using the host type due
to the lack of a better alternative. This aids in cross-compilation on Darwin
to non-Darwin platforms which rely on GNU format archives.
Anna Zaks [Wed, 22 Jun 2016 00:15:52 +0000 (00:15 +0000)]
[asan] Do not instrument pointers with address space attributes
Do not instrument pointers with address space attributes since we cannot track
them anyway. Instrumenting them results in false positives in ASan and a
compiler crash in TSan. (The compiler should not crash in any case, but that's
a different problem.)
IR: Allow metadata attachments on declarations, and fix lazy loaded metadata issue with globals.
This change is motivated by an upcoming change to the metadata representation
used for CFI. The indirect function call checker needs type information for
external function declarations in order to correctly generate jump table
entries for such declarations. We currently associate such type information
with declarations using a global metadata node, but I plan [1] to move all
such metadata to global object attachments.
In bitcode, metadata attachments for function declarations appear in the
global metadata block. This seems reasonable to me because I expect metadata
attachments on declarations to be uncommon. In the long term I'd also expect
this to be the case for CFI, because we'd want to use some specialized bitcode
format for this metadata that could be read as part of the ThinLTO thin-link
phase, which would mean that it would not appear in the global metadata block.
To solve the lazy loaded metadata issue I was seeing with D20147, I use the
same bitcode representation for metadata attachments for global variables as I
do for function declarations. Since there's a use case for metadata attachments
in the global metadata block, we might as well use that representation for
global variables as well, at least until we have a mechanism for lazy loading
global variables.
In the assembly format, the metadata attachments appear after the "declare"
keyword in order to avoid a parsing ambiguity.
Vedant Kumar [Tue, 21 Jun 2016 22:22:33 +0000 (22:22 +0000)]
[Coverage] Clarify ownership of a MemoryBuffer in the reader (NFC)
Pass a `MemoryBuffer &` to BinaryCoverageReader::create() instead of a
`std::unique_ptr<MemoryBuffer> &`. This makes it easier to reason about
the ownership of the buffer at a glance.