Adam Nemet [Wed, 18 Feb 2015 03:42:15 +0000 (03:42 +0000)]
[LoopAccesses] Make raw_string_ostream local in VectorizationReport
Since VectorizationReport will be part of the result of the analysis it
will be stored in a container. However, one of its members is a
raw_string_ostream which cannot be copy-constructed.
This makes the raw_string_ostream local to the << operator.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
Akira Hatanaka [Wed, 18 Feb 2015 03:30:11 +0000 (03:30 +0000)]
[InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.
Hal Finkel [Wed, 18 Feb 2015 03:12:28 +0000 (03:12 +0000)]
[BDCE] Don't forget uses of root instructions seen before the instruction itself
When visiting the initial list of "root" instructions (those which must always
be alive), for those that are integer-valued (such as invokes returning an
integer), we mark their bits as (initially) all dead (we might, obviously, find
uses of those bits later, but all bits are assumed dead until proven
otherwise). Don't do so, however, if we're already seen a use of those bits by
another root instruction (such as a store).
Fixes a miscompile of the sanitizer unit tests on x86_64.
Also, add a debug line for visiting the root instructions, and remove a debug
line which tried to print instructions being removed (printing dead
instructions is dangerous, and can sometimes crash).
Justin Bogner [Wed, 18 Feb 2015 01:58:17 +0000 (01:58 +0000)]
Re-apply "InstrProf: Add unit tests for the profile reader and writer"
Have the InstrProfWriter return a MemoryBuffer instead of a
std::string. This fixes the alignment issues the reader would hit, and
it's a more appropriate type for this anyway.
I've also removed an ugly helper function that's not needed since
we're allowing initializer lists now, and updated some error code
checks based on MSVC's issues with r229473.
Eric Christopher [Wed, 18 Feb 2015 01:01:57 +0000 (01:01 +0000)]
Make the Mips AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate EmitStartOfAsmFile to either use calls on the
TargetMachine or get information from the subtarget we'd use
for assembling.
The top-level-ness of the MIPS attribute output for assembly is,
by nature, contrary to how we'd want to do this for an LTO
situation where we have multiple cpu architectures so this
solution is good enough for now.
Sanjoy Das [Wed, 18 Feb 2015 00:43:19 +0000 (00:43 +0000)]
Bugfix: SCEV incorrectly marks certain expressions as nsw
I could not come up with a test case for this one; but I don't think
`getPreStartForSignExtend` can assume `AR` is `nsw` -- there is one
place in scalar evolution that calls `getSignExtendAddRecStart(AR,
...)` without proving that `AR` is `nsw`
(line 1564)
OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getZeroExtendExpr(Step, WideTy)));
if (SAdd == OperandExtendedAdd) {
// If AR wraps around then
//
// abs(Step) * MaxBECount > unsigned-max(AR->getType())
// => SAdd != OperandExtendedAdd
//
// Thus (AR is not NW => SAdd != OperandExtendedAdd) <=>
// (SAdd == OperandExtendedAdd => AR is NW)
// Return the expression with the addrec on the outside.
return getAddRecExpr(getSignExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty),
L, AR->getNoWrapFlags());
}
Andrea Di Biagio [Tue, 17 Feb 2015 23:40:58 +0000 (23:40 +0000)]
[X86][FastIsel] Teach how to select scalar integer to float/double conversions.
This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to
float conversion, and how to select a (V)CVTSI2SDrr for an integer to double
conversion.
We require MSVC 2013 Update 4 due to previous versions miscompiling ASTMatchers
Previous versions of MSVC 2013 would miscompile ASTMatchers (and/or their
tests). Bump up the requirement and make sure we know about the minor
revision.
Minimum required version found by Michael Edwards!
Add missing specialized node overloads for `MDNode::clone()` (they were
on most of the node types already, but missing from the others).
`MDNode::clone()` returns `TempMDNode` (`std::unique_ptr<MDNode,...>`),
while `TempMDSubrange::clone()` (for example) returns the more
convenient `TempMDSubrange` (`std::unique_ptr<TempMDSubrange,...>`).
AsmPrinter: Take range in DwarfExpression::AddExpression(), NFC
Previously `DwarfExpression::AddExpression()` relied on
default-constructing the end iterators for `DIExpression` -- once the
operands are represented explicitly via `MDExpression` (instead of via
the strange `StringRef` navigator in `DIHeaderIterator`) this won't
work. Explicitly take an iterator for the end of the range.
Justin Bogner [Tue, 17 Feb 2015 21:33:43 +0000 (21:33 +0000)]
Re-apply "InstrProf: Use a test fixture in the coverage mapping tests"
This time we use a helper to format the assertion so we can just use
ASSERT_TRUE instead of relying on ASSERT_EQ being able to deal with
conversions between enum types.
Rafael Espindola [Tue, 17 Feb 2015 20:48:01 +0000 (20:48 +0000)]
Add r228980 back.
Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.
Sanjay Patel [Tue, 17 Feb 2015 20:08:21 +0000 (20:08 +0000)]
prevent folding a scalar FP load into a packed logical FP instruction (PR22371)
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors.
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.
Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.
Reid Kleckner [Tue, 17 Feb 2015 20:02:34 +0000 (20:02 +0000)]
Expose LLVM_VERSION_PATCH in llvm-config.h
There was no reason to keep this private in config.h, and users
requested that it be available in PR22615.
Also fix a bug where patch versions of '0' would cause the macro to
remain undefined. The "#cmakedefine" command only creates a macro if the
named variable would be considered true in the context of an if().
Eric Christopher [Tue, 17 Feb 2015 20:02:32 +0000 (20:02 +0000)]
Make the ARM AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate Emit{Start|End}OfAsmFile to either use attributes on the
TargetMachine or get information from the subtarget we'd use
for assembling. One bit (getISAEncoding) touched the general
AsmPrinter and the debug output. Handle this one by passing
the function for the subprogram down and updating all callers
and users.
The top-level-ness of the ARM attribute output for assembly is,
by nature, contrary to how we'd want to do this for an LTO
situation where we have multiple cpu architectures so this
solution is good enough for now.
Simon Atanasyan [Tue, 17 Feb 2015 18:54:22 +0000 (18:54 +0000)]
[Object] Support reading 64-bit MIPS ELF archives
The 64-bit MIPS ELF archive file format is used by MIPS64 targets.
The main difference from a regular archive file is the symbol table format:
1. ar_name is equal to "/SYM64/"
2. number of symbols and offsets are 64-bit integers
The patch allows reading of such archive files by llvm-nm, llvm-objdump
and other tools. But it does not support archive files with number of symbols
and/or offsets exceed 2^32. I think it is a rather rare case requires more
significant modification of `Archive` class code.
Aaron Ballman [Tue, 17 Feb 2015 17:44:07 +0000 (17:44 +0000)]
Correcting the ArrayRef test to not cause use-after-free bugs with initializer lists. Should also silence a -Wsign-compare warning accidentally introduced.
Sanjay Patel [Tue, 17 Feb 2015 16:54:32 +0000 (16:54 +0000)]
Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093
That patch canonicalized constant splats as build_vectors,
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.
This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283
The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...
This improves an existing x86 AVX test and changes codegen in an ARM test.
Aaron Ballman [Tue, 17 Feb 2015 15:37:53 +0000 (15:37 +0000)]
I believe we no longer require LLVM_HAS_INITIALIZER_LISTS; it's supported in MSVC 2013 and GCC. Added a trivial test to ensure the ArrayRef initializer list constructor is called and behaves as expected.
If any of the bots complain (perhaps due to an antiquated version of an STL implementation), I will revert.
Andrea Di Biagio [Tue, 17 Feb 2015 12:25:49 +0000 (12:25 +0000)]
[X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll.
Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.
Andrea Di Biagio [Tue, 17 Feb 2015 11:20:11 +0000 (11:20 +0000)]
[X86] Silence -Wsign-compare warnings.
GCC 4.8 reported two new warnings due to comparisons
between signed and unsigned integer expressions. The new warnings were
accidentally introduced by revision 229480.
Added explicit casts to silence the warnings. No functional change intended.
Justin Bogner [Tue, 17 Feb 2015 09:21:43 +0000 (09:21 +0000)]
Revert "InstrProf: Add unit tests for the profile reader and writer"
This added API to the InstrProfWriter to write to a string so I could
write unittests without using temp files. This doesn't really work,
since the format has tighter alignment requirements than a char.
AVX-512: changes in intel_ocl_bi calling conventions
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions
[X86] Combine vector anyext + and into a vector zext
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.
Eric Christopher [Tue, 17 Feb 2015 07:21:21 +0000 (07:21 +0000)]
Make the PowerPC AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate EmitStartOfAsmFile to either use attributes on the
TargetMachine or get information from all of the various
subtargets.
Eric Christopher [Tue, 17 Feb 2015 06:45:15 +0000 (06:45 +0000)]
Move ABI handling and 64-bitness to the PowerPC target machine.
This required changing how the computation of the ABI is handled
and how some of the checks for ABI/target are done.
Lang Hames [Tue, 17 Feb 2015 05:40:42 +0000 (05:40 +0000)]
[Orc][Kaleidoscope] Add an example of extreme-laziness in Orc.
The version of the tutorial uses the new compile callbacks API to inject stubs
that trigger IRGen & Codegen of their respective function bodies when they are
first called.
Chandler Carruth [Tue, 17 Feb 2015 02:12:24 +0000 (02:12 +0000)]
[x86] Teach the unpack lowering to try wider element unpacks.
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.
I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.
Hal Finkel [Tue, 17 Feb 2015 01:36:59 +0000 (01:36 +0000)]
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
Lang Hames [Tue, 17 Feb 2015 01:18:38 +0000 (01:18 +0000)]
[Orc] Update the Orc indirection utils and refactor the CompileOnDemand layer.
This patch replaces most of the Orc indirection utils API with a new class:
JITCompileCallbackManager, which creates and manages JIT callbacks.
Exposing this functionality directly allows the user to create callbacks that
are associated with user supplied compilation actions. For example, you can
create a callback to lazyily IR-gen something from an AST. (A kaleidoscope
example demonstrating this will be committed shortly).
This patch also refactors the CompileOnDemand layer to use the
JITCompileCallbackManager API.
Hal Finkel [Tue, 17 Feb 2015 00:11:19 +0000 (00:11 +0000)]
Specify arch in test/CodeGen/X86/float-conv-elim.ll
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.
While looking at a heap profile of a clang LTO bootstrap with -g, I
noticed that 2.2% of memory in an `llvm-lto` of clang is from calling
`DebugLoc::get()` in `collectVariableInfo()` (accounting for ~40% of
memory used for `MDLocation`s).
I suspect this was introduced by r226736, whose goal was to prevent
uniquing of `DebugLoc`s (goal achieved, if so).
There's no reason we need a `DebugLoc` here at all -- it was just being
used for (in)convenient API -- so the fix is to pass the scope and
inlined-at directly to `LexicalScopes::findInlinedScope()`.
Hal Finkel [Mon, 16 Feb 2015 23:46:30 +0000 (23:46 +0000)]
[PowerPC] Support non-direct-sub/superclass VSX copies
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).
Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.
Lang Hames [Mon, 16 Feb 2015 22:36:25 +0000 (22:36 +0000)]
[Orc] Add an emitAndFinalize method to the ObjectLinkingLayer, IRCompileLayer
and LazyEmittingLayer of Orc.
This method allows you to immediately emit and finalize a module. It is required
by an upcoming refactor of the indirection utils and the compile-on-demand
layer.
I've filed http://llvm.org/PR22608 to write unit tests for this and other Orc
APIs.
Matthias Braun [Mon, 16 Feb 2015 22:05:12 +0000 (22:05 +0000)]
RegisterCoalescer: Do not look for regclass of IMPLICIT_DEF.
IMPLICIT_DEF is a generic instruction and has no (fixed) output register
class defined. The rematerialization code of the register coalescer
should not scan the instruction description for a register class.
This fixes a problem showing up in
test/CodeGen/R600/subreg-coalescer-crash.ll with subregister liveness
enabled.