Chad Rosier [Tue, 9 Jun 2015 20:59:41 +0000 (20:59 +0000)]
[AArch64] Remove an overly conservative check when generating store pairs.
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.
Previously, the read of w1 (see below) prevented the formation of a stp.
Akira Hatanaka [Tue, 9 Jun 2015 19:07:19 +0000 (19:07 +0000)]
Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
Pete Cooper [Tue, 9 Jun 2015 18:36:13 +0000 (18:36 +0000)]
Allocate space for MCSymbol::Name only if required.
Similarly to User which allocates a number of Use's prior to the this pointer,
allocate space for the Name* for MCSymbol only when we need a name.
Given that an MCSymbol is 48-bytes on 64-bit systems, this saves a decent % of space.
Given the verify_uselistorder test case with debug info and llc, 50k symbols have names
out of 700k so this optimises for the common case of temporary unnamed symbols.
Samuel Antao [Tue, 9 Jun 2015 16:29:34 +0000 (16:29 +0000)]
The constant initialization for globals in NVPTX is generated as an
array of bytes. The generation of this byte arrays was expecting
the host to be little endian, which prevents big endian hosts to be
used in the generation of the PTX code. This patch fixes the
problem by changing the way the bytes are extracted so that it
works for either little and big endian.
Rui Ueyama [Tue, 9 Jun 2015 15:20:42 +0000 (15:20 +0000)]
Remove object_error::success and use std::error_code() instead
make_error_code(object_error) is slow because object::object_category()
uses a ManagedStatic variable. But the real problem is that the function is
called too frequently. This patch uses std::error_code() instead of
object_error::success. In most cases, we return "success", so this patch
reduces number of function calls to that function.
Toma Tabacu [Tue, 9 Jun 2015 10:34:31 +0000 (10:34 +0000)]
[mips] [IAS] Add support for BNE and BEQ with an immediate operand.
Summary:
For some branches, GAS accepts an immediate instead of the 2nd register operand.
We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed.
Daniel Sanders [Tue, 9 Jun 2015 09:47:46 +0000 (09:47 +0000)]
[ADT] Assert that SmallVectorBase::grow_pod() successfully reallocates memory.
Summary:
If malloc/realloc fails then the SmallVector becomes unusable since begin() and
end() will return NULL. This is unlikely to occur but was the cause of recent
bugpoint test failures on my machine.
It is not clear whether not checking for malloc/realloc failure is a deliberate
decision and adding checks has the potential to impact compiler performance.
Therefore, this patch only adds the check to builds with assertions enabled for
the moment.
Keno Fischer [Tue, 9 Jun 2015 01:53:59 +0000 (01:53 +0000)]
[DWARF] Fix a few corner cases in expression emission
Summary: I noticed an object file with `DW_OP_reg4 DW_OP_breg4 0` as a DWARF expression,
which I traced to a missing break (and `++I`) in this code snippet.
While I was at it, I also added support for a few other corner cases
along the same lines that I could think of.
Test Plan: Hand-crafted test case to exercises these cases is included.
Anna Zaks [Tue, 9 Jun 2015 00:58:08 +0000 (00:58 +0000)]
[asan] Prevent __attribute__((annotate)) triggering errors on Darwin
The following code triggers a fatal error in the compiler instrumentation
of ASan on Darwin because we place the attribute into llvm.metadata section,
which does not have the proper MachO section name.
This commit reorders the checks so that we skip everything in llvm.metadata
first. It also removes the hard failure in case the section name does not
parse. That check will be done lower in the compilation pipeline anyway.
MergeFunctions: Impose a total order on the replacement of functions
We don't want to replace function A by Function B in one module and Function B
by Function A in another module.
If these functions are marked with linkonce_odr we would end up with a function
stub calling B in one module and a function stub calling A in another module. If
the linker decides to pick these two we will have two stubs calling each other.
Reid Kleckner [Mon, 8 Jun 2015 22:12:44 +0000 (22:12 +0000)]
[MC] Use unsigned for the Kind bitfield in MCSymbol
Fixes most of the test suite on Windows with clang-cl.
I'm not sure why the test suite was passing with MSVC 2013. Maybe they
changed their behavior and we are emulating their old sign extension
behavior. I think this deserves more investigation, but I want to green
the bot first.
Keno Fischer [Mon, 8 Jun 2015 20:09:58 +0000 (20:09 +0000)]
[InstrInfo] Refactor foldOperandImpl to thread through InsertPt. NFC
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.
Akira Hatanaka [Mon, 8 Jun 2015 18:50:43 +0000 (18:50 +0000)]
[ARM] Pass a callback to FunctionPass constructors to enable skipping execution
on a per-function basis.
Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.
Pete Cooper [Mon, 8 Jun 2015 17:17:16 +0000 (17:17 +0000)]
Move COFF Type in to the MCSymbolCOFF class.
The flags field in MCSymbol only needs to be 16-bits on ELF and MachO.
This moves the 16-bit Type out of there so that it can be reduced in size in a future commit.
Matthias Braun [Mon, 8 Jun 2015 16:56:23 +0000 (16:56 +0000)]
X86: Reject register operands with obvious type mismatches.
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.
Oliver Stannard [Mon, 8 Jun 2015 16:55:31 +0000 (16:55 +0000)]
Fix assertion failure in global-merge with unused ConstantExpr
The global-merge pass was crashing because it assumes that all ConstantExprs
(reached via the global variables that they use) have at least one user.
I haven't worked out a way to test this, as an unused ConstantExpr cannot be
represented by serialised IR, and global-merge can only be run in llc, which
does not run any passes which can make a ConstantExpr dead.
This (reduced to the point of silliness) C code triggers this bug when compiled
for arm-none-eabi at -O1:
static a = 7;
static volatile b[10] = {&a};
c;
main() {
c = 0;
for (; c < 10;)
printf(b[c]);
}
Colin LeMahieu [Mon, 8 Jun 2015 16:34:47 +0000 (16:34 +0000)]
[Hexagon] Adding functionality for searching for compound instruction pairs. Compound instructions reduce slot resource requirements freeing those packet slots up for more instructions.
Javed Absar [Mon, 8 Jun 2015 15:01:11 +0000 (15:01 +0000)]
ARM]: Add support for MMFR4_EL1 in assembler
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.
Igor Breger [Mon, 8 Jun 2015 14:03:17 +0000 (14:03 +0000)]
AVX-512: Implemented 256/128bit VALIGND/Q instructions for SKX and KNL
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.
Artur Pilipenko [Mon, 8 Jun 2015 11:58:13 +0000 (11:58 +0000)]
Minor refactoring of GEP handling in isDereferenceablePointer
For GEP instructions isDereferenceablePointer checks that all indices are constant and within bounds. Replace this index calculation logic to a call to accumulateConstantOffset. Separated from the http://reviews.llvm.org/D9791
Silviu Baranga [Mon, 8 Jun 2015 10:27:06 +0000 (10:27 +0000)]
[LAA] Fix estimation of number of memchecks
Summary:
We need to add a runtime memcheck for pair of accesses (x,y) where at least one of x and y
are writes.
Assuming we have w writes and r reads, currently this number is estimated as being
w* (w+r-1). This estimation will count (write,write) pairs twice and will overestimate
the number of checks required.
This change adds a getNumberOfChecks method to RuntimePointerCheck, which
will count the number of runtime checks needed (similar in implementation to
needsAnyChecking) and uses it to produce the correct number of runtime checks.
Test Plan:
llvm test suite
spec2k
spec2k6
Performance results: no changes observed (not surprising since the formula for 1 writer is basically the same, which would covers most cases - at least with the current check limit).
Hao Liu [Mon, 8 Jun 2015 06:39:56 +0000 (06:39 +0000)]
[LoopVectorize] Teach Loop Vectorizor about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'.
Remove SCEVCache and FindConstantPointers from complete loop unrolling heuristic.
Summary:
Using some SCEV functionality helped to entirely remove SCEVCache class and FindConstantPointers SCEV visitor.
Also, this makes the code more universal - I'll take advandate of it in next patches where I start handling additional types of instructions.
Test Plan: Tests would be submitted in subsequent patches.
Colin LeMahieu [Sun, 7 Jun 2015 01:46:24 +0000 (01:46 +0000)]
Teaching llvm-mc how to understand the defsym command line option. This allows integer-constant symbols to be defined on the command line and used during assembly.
David Majnemer [Sat, 6 Jun 2015 22:40:21 +0000 (22:40 +0000)]
[InstCombine, InstSimplify] Move xforms from Combine to Simplify
There were several SelectInst combines that always returned an existing
instruction instead of modifying an old one or creating a new one.
These are prime candidates for moving to InstSimplify.
Colin LeMahieu [Sat, 6 Jun 2015 20:12:40 +0000 (20:12 +0000)]
[MC] Common symbols weren't being checked for redeclaration which allowed an assembly file to generate an assertion in setCommon(): !isCommon(). This change allows redeclaration as long as the size and alignment match exactly, otherwise report a fatal error.
Sanjoy Das [Sat, 6 Jun 2015 05:24:10 +0000 (05:24 +0000)]
[LoopUnroll] Fix truncation bug in canUnrollCompletely.
Summary:
canUnrollCompletely takes `unsigned` values for `UnrolledCost` and
`RolledDynamicCost` but is passed in `uint64_t`s that are silently
truncated. Because of this, when `UnrolledSize` is a large integer
that has a small remainder with UINT32_MAX, LLVM tries to completely
unroll loops with high trip counts.