Justin Bogner [Tue, 12 Aug 2014 03:24:59 +0000 (03:24 +0000)]
IR: Print a newline when dumping Types
Type::dump() doesn't print a newline, which makes for a poor
experience in a debugger. This looks like it was an ommission
considering Value::dump() two lines above, so I've changed Type to add
a newline as well.
Of the two in-tree callers, one added a newline anyway, and I've
updated the other one to use Type::print instead.
Adrian Prantl [Tue, 12 Aug 2014 01:07:53 +0000 (01:07 +0000)]
DebugLocEntry: Restore the comparison predicate from before the
refactoring in 215384. This way it can unique multiple entries describing
the same piece even if they don't have the exact same location.
(The same piece may get merged in and be added from OpenRanges).
There ought to be a more elegant solution for this, though.
Reid Kleckner [Tue, 12 Aug 2014 00:12:43 +0000 (00:12 +0000)]
msan: Handle musttail calls
First, avoid calling setTailCall(false) on musttail calls. The funciton
prototypes should be "congruent", so the shadow layout should be exactly
the same.
Second, avoid inserting instrumentation after a musttail call to
propagate the return value shadow. We don't need to propagate the
result of a tail call, it should already be in the right place.
Quentin Colombet [Mon, 11 Aug 2014 23:52:01 +0000 (23:52 +0000)]
[MachineSink] Improve the compile time by preserving the dominance information
as long as possible.
** Context **
Each time the dominance information is modified, the dominator tree analysis
switches in a slow query mode. After a few queries without any modification on
the dominator tree, it performs an expensive update of its internal structure to
provide fast queries again.
** Problem **
Prior to this patch, the MachineSink pass was splitting the critical edges on
demand while relying heavy on the dominator tree information. In some cases,
this leads to pathological behavior where:
- We end up in the slow query mode right after splitting an edge.
- We update the dominance information.
- We break the dominance information again, thus ending up in the slow query
mode and so on.
** Proposed Solution **
To mitigate this effect, this patch postpones all the splitting of the edges at
the end of each iteration of the main loop.
The benefits are:
- The dominance information is valid for the life time of an iteration.
- This simplifies the code as we do not have to special treat instructions that
are sunk on critical edges. Indeed, the related block will be available
through the next iteration.
The downside is that when edges splitting is required, this incurs an additional
iteration of the main loop compared to the previous scheme.
** Performance **
Thanks to this patch, the motivating example compiles in 6+ minutes instead of
10+ minutes. No test case added as the motivating example as nothing special but
being huge!
I have measured only noise for both the compile time and the runtime on the llvm
test-suite + SPECs with Os and O3.
Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also
uses the dominance information and therefore, hits this problem. A subsequent
patch will address that.
Tom Stellard [Mon, 11 Aug 2014 22:18:11 +0000 (22:18 +0000)]
R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.
Quentin Colombet [Mon, 11 Aug 2014 22:17:14 +0000 (22:17 +0000)]
Add isRegSequence property.
This patch adds a new property: isRegSequence and the related target hooks:
TargetIntrInfo::getRegSequenceInputs and
TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific
instruction is a (kind of) REG_SEQUENCE.
Adrian Prantl [Mon, 11 Aug 2014 21:05:55 +0000 (21:05 +0000)]
Debug Info: Move the sorting and uniqueing of pieces from emitLocPieces()
into buildLocationList(). By keeping the list of Values sorted,
DebugLocEntry::Merge can also merge multi-piece entries.
ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block. We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances. The current list is formed by manual
inspection of the ARMv6M ARM.
The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable). This
results in code gen changes.
Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the use of the v6M ARM would be much easier to use
than the v7 or v8!
Oliver Stannard [Mon, 11 Aug 2014 09:12:32 +0000 (09:12 +0000)]
ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.
Jiangning Liu [Mon, 11 Aug 2014 05:17:19 +0000 (05:17 +0000)]
In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.
Jiangning Liu [Mon, 11 Aug 2014 05:02:04 +0000 (05:02 +0000)]
In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".
Hans Wennborg [Mon, 11 Aug 2014 02:34:52 +0000 (02:34 +0000)]
Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)
That broke the build:
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
^
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode. Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.
The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block. However, because we only check for IT handling on
Thumb2 functions, we may miss some cases. Even then, it only validates that the
CPSR is not *live* rather than it is not accessed. This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.
This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable. Addresses PR20555.
@l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.
If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.
Remove the MinGW32 and Cygwin types from the OSType enumeration. These values
are represented via environments of Windows. It is a source of confusion and
needlessly clutters the code. The cost of doing this is that we must sink the
check for them into the normalization code path along with the spelling.
This removes the duplicate definition of GetXDataSection. This function is
available as a static method and is identical to the previous implementation.
This just cleans up the unnecessary duplication.
Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.
Tom Stellard [Sat, 9 Aug 2014 01:06:53 +0000 (01:06 +0000)]
R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.
With this change I was able to uncover one broken test which will
be fixed in a future commit.
Lang Hames [Fri, 8 Aug 2014 23:12:22 +0000 (23:12 +0000)]
[MCJIT] Simplify immediate decoding code in the RuntimeDyldMachO hierarchy.
Cleanup only: no functional change.
This patch makes RuntimeDyldMachO targets directly responsible for decoding
immediates, rather than letting them implement catch a callback from generic
code. Since this is a very target specific operation, it makes sense to let the
target-specific code drive it.
I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.
Tim Northover [Fri, 8 Aug 2014 17:31:52 +0000 (17:31 +0000)]
AArch64: avoid deleting the current iterator in a loop.
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.
David Blaikie [Fri, 8 Aug 2014 17:12:35 +0000 (17:12 +0000)]
DebugInfo: Recommit (reverted in r215217, originally committed in r215157) the assertion that no argument variable is overwritten by subsequent argument variables.
This turned up a bug in clang where arguments were emitted with
duplicate argument numbers (see r215227).
Pedro Artigas [Fri, 8 Aug 2014 16:46:53 +0000 (16:46 +0000)]
Added a TLI hook to signal that the target does not have or does not care about
floating point exceptions, added use of flag to fold potentially exception
raising floating point math in selection DAG. No functionality change, as
targets have to explicitly ask for this behavior and none does today.