Tom Stellard [Mon, 11 Aug 2014 22:18:11 +0000 (22:18 +0000)]
R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.
Quentin Colombet [Mon, 11 Aug 2014 22:17:14 +0000 (22:17 +0000)]
Add isRegSequence property.
This patch adds a new property: isRegSequence and the related target hooks:
TargetIntrInfo::getRegSequenceInputs and
TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific
instruction is a (kind of) REG_SEQUENCE.
Adrian Prantl [Mon, 11 Aug 2014 21:05:55 +0000 (21:05 +0000)]
Debug Info: Move the sorting and uniqueing of pieces from emitLocPieces()
into buildLocationList(). By keeping the list of Values sorted,
DebugLocEntry::Merge can also merge multi-piece entries.
ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block. We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances. The current list is formed by manual
inspection of the ARMv6M ARM.
The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable). This
results in code gen changes.
Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the use of the v6M ARM would be much easier to use
than the v7 or v8!
Oliver Stannard [Mon, 11 Aug 2014 09:12:32 +0000 (09:12 +0000)]
ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.
Jiangning Liu [Mon, 11 Aug 2014 05:17:19 +0000 (05:17 +0000)]
In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.
Jiangning Liu [Mon, 11 Aug 2014 05:02:04 +0000 (05:02 +0000)]
In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".
Hans Wennborg [Mon, 11 Aug 2014 02:34:52 +0000 (02:34 +0000)]
Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)
That broke the build:
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
^
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode. Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.
The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block. However, because we only check for IT handling on
Thumb2 functions, we may miss some cases. Even then, it only validates that the
CPSR is not *live* rather than it is not accessed. This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.
This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable. Addresses PR20555.
@l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.
If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.
Remove the MinGW32 and Cygwin types from the OSType enumeration. These values
are represented via environments of Windows. It is a source of confusion and
needlessly clutters the code. The cost of doing this is that we must sink the
check for them into the normalization code path along with the spelling.
This removes the duplicate definition of GetXDataSection. This function is
available as a static method and is identical to the previous implementation.
This just cleans up the unnecessary duplication.
Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.
Tom Stellard [Sat, 9 Aug 2014 01:06:53 +0000 (01:06 +0000)]
R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.
With this change I was able to uncover one broken test which will
be fixed in a future commit.
Lang Hames [Fri, 8 Aug 2014 23:12:22 +0000 (23:12 +0000)]
[MCJIT] Simplify immediate decoding code in the RuntimeDyldMachO hierarchy.
Cleanup only: no functional change.
This patch makes RuntimeDyldMachO targets directly responsible for decoding
immediates, rather than letting them implement catch a callback from generic
code. Since this is a very target specific operation, it makes sense to let the
target-specific code drive it.
I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.
Tim Northover [Fri, 8 Aug 2014 17:31:52 +0000 (17:31 +0000)]
AArch64: avoid deleting the current iterator in a loop.
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.
David Blaikie [Fri, 8 Aug 2014 17:12:35 +0000 (17:12 +0000)]
DebugInfo: Recommit (reverted in r215217, originally committed in r215157) the assertion that no argument variable is overwritten by subsequent argument variables.
This turned up a bug in clang where arguments were emitted with
duplicate argument numbers (see r215227).
Pedro Artigas [Fri, 8 Aug 2014 16:46:53 +0000 (16:46 +0000)]
Added a TLI hook to signal that the target does not have or does not care about
floating point exceptions, added use of flag to fold potentially exception
raising floating point math in selection DAG. No functionality change, as
targets have to explicitly ask for this behavior and none does today.
Daniel Sanders [Fri, 8 Aug 2014 15:47:17 +0000 (15:47 +0000)]
[mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect.
Also added the testcase that should have been in r215194.
This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:
if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;
so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to
true'. In this case, (and the similar -modd-spreg case) I'd like the code to be
James Molloy [Fri, 8 Aug 2014 12:33:21 +0000 (12:33 +0000)]
[AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.
This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.
Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).
This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.
Tim Northover [Fri, 8 Aug 2014 12:00:09 +0000 (12:00 +0000)]
llvm-objdump: use portable format specifiers for info.
ARM bots (& others, I think, now that I look) were failing because we
were using incorrect printf-style format specifiers. They were wrong
on almost any platform, actually, just mostly harmlessly so.