CodeGen: Add an explicit BuildMI overload for MachineInstr&
Add an explicit overload to BuildMI for MachineInstr& to deal with
insertions inside of instruction bundles.
- Use it to re-implement MachineInstr* to give it coverage.
- Document how the overload for MachineBasicBlock::instr_iterator
differs from that for MachineBasicBlock::iterator (the previous
(implicit) overload for MachineInstr&).
- Add a comment explaining why the MachineInstr& and MachineInstr*
overloads don't universally forward to the
MachineBasicBlock::instr_iterator overload.
Thanks to Justin for noticing the API quirk. While this doesn't fix any
known bugs -- all uses of BuildMI with a MachineInstr& were previously
using MachineBasicBlock::iterator -- it protects against future bugs.
CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
Object: Replace NewArchiveIterator with a simpler NewArchiveMember class. NFCI.
The NewArchiveIterator class has a problem: it requires too much context. Any
memory buffers added to the archive must be stored within an Archive::Member,
which must have an associated Archive. This makes it harder than necessary
to create new archive members (or new archives entirely) from scratch using
memory buffers.
This patch replaces NewArchiveIterator with a NewArchiveMember class that
stores just the memory buffer and the information that goes into the archive
member header.
Where command1 and command2 contain completely different sets of
valid options. This is backwards compatible with previous uses
of llvm cl which did not support subcommands, as any option
which specifies no optional subcommand (e.g. all existing
code) goes into a special "top level" subcommand that expects
dashed options to appear immediately after the program name.
For example, code which is subcommand unaware would generate
a command line such as the following, where no subcommand
is specified:
llvm-foo.exe -q1 -q2
The top level subcommand can co-exist with actual subcommands,
as it is implemented as an actual subcommand which is searched
if no explicit subcommand is specified. So llvm-foo.exe as
specified above could be written so as to support all three
aforementioned command lines simultaneously.
There is one additional "special" subcommand called AllSubCommands,
which can be used to inject an option into every subcommand.
This is useful to support things like help, so that commands
such as:
All work and display the help for the selected subcommand
without having to explicitly go and write code to handle each
one separately.
This patch is submitted without an example of anything actually
using subcommands, but a followup patch will convert the
llvm-pdbdump tool to use subcommands.
Evgeniy Stepanov [Wed, 29 Jun 2016 20:37:43 +0000 (20:37 +0000)]
StackColoring for SafeStack.
This is a fix for PR27842.
An IR-level implementation of stack coloring tailored to work with
SafeStack. It is a bit weaker than the MI implementation in that it
does not the "lifetime start at first access" logic. This can be
improved in the future.
This patch also replaces the naive implementation of stack frame
layout with a greedy algorithm that can split existing stack slots
and even fit small objects inside the alignment padding of other
objects.
Tim Shen [Wed, 29 Jun 2016 20:10:17 +0000 (20:10 +0000)]
[InstCombine] Simplify and correct folding fcmps with the same children
Summary: Take advantage of FCmpInst::Predicate's bit pattern and handle (fcmp *, x, y) | (fcmp *, x, y) and (fcmp *, x, y) & (fcmp *, x, y) more consistently. Also fold more FCmpInst::FCMP_FALSE and FCmpInst::FCMP_TRUE to constants.
Currently InstCombine wrongly folds (fcmp ogt, x, y) | (fcmp ord, x, y) to (fcmp ogt, x, y); this patch also fixes that.
Nirav Dave [Wed, 29 Jun 2016 19:54:27 +0000 (19:54 +0000)]
Permit memory operands in ins/outs instructions
[x86] (PR15455) While (ins|outs)[bwld] instructions do not take %dx as a
memory operand, various unofficial references do and objdump
disassembles to this format. Extend special treatment of
similar (in|out)[bwld] operations.
Rafael Espindola [Wed, 29 Jun 2016 18:31:48 +0000 (18:31 +0000)]
Don't verify inputs to the Linker if ODR merging.
This fixes pr28072.
The point, as Duncan pointed out, is that the file is already
partially linked by just reading it.
Long term I think the solution is to make metadata owned by the module
and then the linker will lazily read it and be in charge of all the
linking. Running a verifier in each input will defeat the lazy
loading, but will be legal.
Right now we are at the unfortunate position that to support odr
merging we cannot verify the inputs, which mildly annoying (see test
update).
Vedant Kumar [Wed, 29 Jun 2016 17:47:08 +0000 (17:47 +0000)]
[llvm-cov] Change some FileCheck prefixes to make tests reusable (NFC)
I'm planning on extending these two tests with checks that validate
html coverage reports. Make it easier to extend them by not using a
prefix called "CHECK".
Vedant Kumar [Wed, 29 Jun 2016 16:22:12 +0000 (16:22 +0000)]
[llvm-cov] Do not allow ".." to escape the coverage sub-directory
In -output-dir mode, file reports are placed into a "coverage"
directory. If filenames in the coverage mapping contain "..", they might
escape out of this directory.
Fix the problem by removing ".." from source filenames (expand the path
component).
Benjamin Kramer [Wed, 29 Jun 2016 15:04:07 +0000 (15:04 +0000)]
[ManagedStatic] Reimplement double-checked locking with std::atomic.
This gets rid of the memory fence in the hot path (dereferencing the
ManagedStatic), trading for an extra mutex lock in the cold path (when
the ManagedStatic was uninitialized). Since this only happens on the
first accesses it shouldn't matter much. On strict architectures like
x86 this removes any atomic instructions from the hot path.
Also remove the tsan annotations, tsan knows how standard atomics work
so they should be unnecessary now.
Craig Topper [Wed, 29 Jun 2016 04:57:00 +0000 (04:57 +0000)]
Revert "[ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition."
This is breaking an optimizaton remark test in clang. I've identified a couple fixes for that, but want to understand it better before I commit to anything.
Adam Nemet [Wed, 29 Jun 2016 04:55:19 +0000 (04:55 +0000)]
[Diag] Add getter shouldAlwaysPrint. NFC
For the new hotness attribute, the API will take the pass rather than
the pass name so we can no longer play the trick of AlwaysPrint being a
special pass name. This adds a getter to help the transition.
Craig Topper [Wed, 29 Jun 2016 03:46:47 +0000 (03:46 +0000)]
[ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.
If a operation for a recurrence is an addition with no signed wrap and both input sign bits are 0, then the result sign bit must also be 0. Similar for the negative case.
I found this deficiency while playing around with a loop in the x86 backend that contained a signed division that could be optimized into an unsigned division if we could prove both inputs were positive. One of them being the loop induction variable. With this patch we can perform the conversion for this case. One of the test cases here is a contrived variation of the loop I was looking at.
Craig Topper [Wed, 29 Jun 2016 03:29:12 +0000 (03:29 +0000)]
[DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.
Craig Topper [Wed, 29 Jun 2016 03:29:09 +0000 (03:29 +0000)]
[DAGCombine] Add test cases to show that DAG combining an OR of two shuffles with zero vectors doesn't work if the zero vector is the first operand of the shuffle. Fix coming in a follow up patch.
Eric Christopher [Wed, 29 Jun 2016 03:05:58 +0000 (03:05 +0000)]
Revert "[InstCombine] Avoid combining the bitcast of a var that is used as both address and result of load instructions"
Revert "[InstCombine] Combine A->B->A BitCast"
as this appears to cause PR27996 and as discussed in http://reviews.llvm.org/D20847
Davide Italiano [Wed, 29 Jun 2016 01:56:27 +0000 (01:56 +0000)]
[Triple] Add isLittleEndian().
This allows us to query about the endianness without having to
look at DataLayout. The API will be used (and tested) in lld,
in order to find out the endianness of BitcodeFiles.
Kevin Enderby [Tue, 28 Jun 2016 23:16:13 +0000 (23:16 +0000)]
Finish cleaning up most of the error handling in libObject’s MachOUniversalBinary
and its clients to use the new llvm::Error model for error handling.
Changed getAsArchive() from ErrorOr<...> to Expected<...> so now all
interfaces there use the new llvm::Error model for return values.
In the two places it had if (!Parent) this is actually a program error so changed
from returning errorCodeToError(object_error::parse_failed) to calling
report_fatal_error() with a message.
In getObjectForArch() added error messages to its two llvm::Error return values
instead of returning errorCodeToError(object_error::arch_not_found) with no
error message.
For the llvm-obdump, llvm-nm and llvm-size clients since the only binary files in
Mach-O Universal Binaries that are supported are Mach-O files or archives with
Mach-O objects, updated their logic to generate an error when a slice contains
something like an ELF binary instead of ignoring it. And added a test case for
that.
The last error stuff to be cleaned up for libObject’s MachOUniversalBinary is
the use of errorOrToExpected(Archive::create(ObjBuffer)) which needs
Archive::create() to be changed from ErrorOr<...> to Expected<...> first,
which I’ll work on next.
Weiming Zhao [Tue, 28 Jun 2016 22:30:45 +0000 (22:30 +0000)]
[ARM] Fix 28282: cost computation for constant hoisting
Summary:
This fixes bug: https://llvm.org/bugs/show_bug.cgi?id=28282
Currently the cost model of constant hoisting checks the bit width of the data type of the constants.
However, the actual immediate value is small enough and not need to be hoisted.
This patch checks for the actual bit width needed for the constant.
Dehao Chen [Tue, 28 Jun 2016 21:19:34 +0000 (21:19 +0000)]
Relax the clearance calculating for breaking partial register dependency.
Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that is necessary to break partial register dependency.
Chris Bieneman [Tue, 28 Jun 2016 21:10:26 +0000 (21:10 +0000)]
[YAML] Fix YAML tags appearing before the start of sequence elements
Our existing yaml::Output code writes tags immediately when mapTag is called, without any state handling. This results in tags on sequence elements being written before the element itself. For example, we see this:
Our reader handles reading properly, so this bug only impacts writing yaml sequences with tagged elements.
As a test for this I've modified the Mach-O yaml encoding to allways apply the !mach-o tag when encoding MachOYAML::Object entries. This results in the !mach-o tag appearing as expected in dumped fat files.
Zhan Jun Liau [Tue, 28 Jun 2016 21:03:19 +0000 (21:03 +0000)]
[SystemZ] Use NILL instruction instead of NILF where possible
Summary: SystemZ shift instructions only use the last 6 bits of the shift
amount. When the result of an AND operation is used as a shift amount, this
means that we can use the NILL instruction (which operates on the last 16 bits)
rather than NILF (which operates on the last 32 bits) for a 16-bit savings in
instruction size.
Rafael Espindola [Tue, 28 Jun 2016 20:13:36 +0000 (20:13 +0000)]
Use isPositionIndependent in a few more places.
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".
Where command1 and command2 contain completely different sets of
valid options. This is backwards compatible with previous uses
of llvm cl which did not support subcommands, as any option
which specifies no optional subcommand (e.g. all existing
code) goes into a special "top level" subcommand that expects
dashed options to appear immediately after the program name.
For example, code which is subcommand unaware would generate
a command line such as the following, where no subcommand
is specified:
llvm-foo.exe -q1 -q2
The top level subcommand can co-exist with actual subcommands,
as it is implemented as an actual subcommand which is searched
if no explicit subcommand is specified. So llvm-foo.exe as
specified above could be written so as to support all three
aforementioned command lines simultaneously.
There is one additional "special" subcommand called AllSubCommands,
which can be used to inject an option into every subcommand.
This is useful to support things like help, so that commands
such as:
All work and display the help for the selected subcommand
without having to explicitly go and write code to handle each
one separately.
This patch is submitted without an example of anything actually
using subcommands, but a followup patch will convert the
llvm-pdbdump tool to use subcommands.
Artur Pilipenko [Tue, 28 Jun 2016 18:27:25 +0000 (18:27 +0000)]
Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
David Majnemer [Tue, 28 Jun 2016 16:04:46 +0000 (16:04 +0000)]
[X86] Make WRPKRU/RDPKRU pass -verify-machineinstrs
The original implementation attempted to zero registers using
XOR %foo, %foo. This is problematic because it constitutes a
read-modify-write of a register which might not be defined.
Instead, use MOV32r0 to avoid these problems; expandPostRAPseudo does
the right thing here.
Simon Pilgrim [Tue, 28 Jun 2016 13:24:05 +0000 (13:24 +0000)]
[X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied)
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted
Vassil Vassilev [Tue, 28 Jun 2016 12:27:09 +0000 (12:27 +0000)]
[modules] Separate intrinsic_gen dependent headers from module LLVM_IR.
Some headers in IR depend on tablegen generated code. Modules builds triggered
generation of the LLVM_IR module (including headers dependant on intrinsic_gen),
imposing a unnecessary build dependency.
Simon Pilgrim [Tue, 28 Jun 2016 08:08:15 +0000 (08:08 +0000)]
[X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes
This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we current do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments.
Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but its proven tricky to create tests cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication.
This patch enhances dot graph viewer to show hot regions
with hot bbs/edges displayed in red. The ratio of the bb
freq to the max freq of the function needs to be no less
than the value specified by view-hot-freq-percent option.
The default value is 10 (i.e. 10%).