Fix a problem where the TwoAddressInstructionPass which generate redundant register moves in a loop.
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
total = total + i / 2;
}
}
This is the kernel loop:
.LBB0_2: # %for.body
=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".
The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
Predecessors according to CFG: BB#1 BB#2
%vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
%vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
%vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
%vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
%vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
%vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
%vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
%vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
%vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.
So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.
David Blaikie [Tue, 3 Mar 2015 21:56:11 +0000 (21:56 +0000)]
Fix the build broken in r231142
I removed the copy ctor, thinking that'd be the end of it - these
iterators should be perfectly assignable even from disjoint ranges (as
any iterator would be) - exkcept that the member was const.
David Blaikie [Tue, 3 Mar 2015 21:49:07 +0000 (21:49 +0000)]
RewriteStatepointsForGC::PhiState: Remove explicit copy ctor in favor of the Rule of Zero
The assertion was just checking a class invariant that's pretty easy to
verify by inspection (no mutating operations, and the two non-copy ctors
already ensure the state is maintained) so remove the explicit copy ctor
in favor of the default, thus allowing the use of the default copy
assignment operator without hitting the C++11 deprecation here.
David Blaikie [Tue, 3 Mar 2015 21:45:54 +0000 (21:45 +0000)]
CFG::SuccessorIterator: Remove explicit copy assignment, as the default is fine
There's no reason to disallow assigning an iterator from one range to an
iterator that previously iterated over a disjoint range. This then
follows the Rule of Zero, allowing implicit copy construction to be used
without hitting the case that's deprecated in C++11.
David Blaikie [Tue, 3 Mar 2015 21:44:06 +0000 (21:44 +0000)]
CFG::SuccessorIterator::SuccessorProxy:: Expliictly default copy construction as it is deprecated in C++11 in the presence of explicit copy assignment.
See r231099 for similar issues & details in [Small]BitVector.
David Blaikie [Tue, 3 Mar 2015 21:26:17 +0000 (21:26 +0000)]
Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
David Blaikie [Tue, 3 Mar 2015 21:17:08 +0000 (21:17 +0000)]
Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
David Blaikie [Tue, 3 Mar 2015 21:17:00 +0000 (21:17 +0000)]
Remove the explicit SUnitIterator::operator= as the default is just fine
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
David Blaikie [Tue, 3 Mar 2015 21:16:56 +0000 (21:16 +0000)]
Remove LatencyPriorityQueue::dump because it relies on an implicit copy ctor which is deprecated in C++11 (due to the presence of a user-declare dtor in the base class)
This type could be made copyable (= default a protected copy ctor in the
base class, and preferably make the derived class final to avoid risks
of providing a slicing copy operation to further derived classes) but it
seemed easier to avoid that complexity for a dump function that I assume
(by symmetry with ResourcePriorityQueue's dump, which was actively
buggy) not often used.
Paul Robinson [Tue, 3 Mar 2015 21:01:27 +0000 (21:01 +0000)]
[X86][ELF] Correct relocation for DWARF TLS references
Previously we had only Linux using DTPOFF for these; all X86 ELF
targets should. Fixes a side issue mentioned in PR21077.
Sanjoy Das [Tue, 3 Mar 2015 20:46:45 +0000 (20:46 +0000)]
[ADT] fail-fast iterators for DenseMap
This patch was landed in r231035 and reverted because it was buggy.
This is fixed version of the same change.
Summary:
This patch is an attempt at making `DenseMapIterator`s "fail-fast".
Fail-fast iterators that have been invalidated due to insertion into
the host `DenseMap` deterministically trip an assert (in debug mode)
on access, instead of non-deterministically hitting memory corruption
issues.
David Blaikie [Tue, 3 Mar 2015 19:53:04 +0000 (19:53 +0000)]
DeltaAlgorithm: Provide protected default copy ctor for use by test derived class.
Without this, use of this copy ctor is deprecated in C++11 due to the
presence of a user-declared dtor.
Marking the class final is just a little extra security that there are
no further derived classes that may then end up using the intermediate
base class's copy assignment operator and cause slicing to occur.
I didn't bother marking the other (non-test) base class final, since it
has reference members so it won't have any implicit assignment operators
anyway. Open to ideas on that, though.
We probably want a warning about use of a slicing assignment operator,
then I wouldn't worry so much about marking the class as final.
David Blaikie [Tue, 3 Mar 2015 19:20:13 +0000 (19:20 +0000)]
Remove some explicit copy assignment operators is favor of implicit ones, as their presence makes the use of the implicit copy ctor deprecated in C++11
Dario Domizioli [Tue, 3 Mar 2015 18:40:53 +0000 (18:40 +0000)]
Fix PR22750: non-determinism causes assertion failure in DWARF generation
The cause of the issue is the interaction of two factors:
1) When generating a DW_TAG_imported_declaration DIE which imports another
imported declaration, the code in AsmPrinter/DwarfCompileUnit.cpp
asserts that the second imported declaration must already have a DIE.
2) There is a non-determinism in the order in which imported declarations
within the same scope are processed.
Because of the non-determinism (2), it is possible that an imported
declaration is processed before another one it depends on, breaking the
assumption in (1).
The source of the non-determinism is that the imported declaration
DIDescriptors are sorted by scope in DwarfDebug::beginModule(); however that
sort is not a stable_sort, therefore the order of the declarations within
the same scope is not preserved. The attached patch changes the std::sort to
a std::stable_sort and it fixes the problem.
Test omitted due to it being non-deterministic and depending on the
implementation of std::sort.
David Blaikie [Tue, 3 Mar 2015 18:39:00 +0000 (18:39 +0000)]
[Small]BitVector::reference: Explicitly default copy construction as it is deprecated in C++11 in the presence of explicit copy assignment.
I tried making these private & friended to the BitVector, but that
didn't work - there's one use of BitVector::reference in Clang that
actually copies it into a local variable & uses it from there, rather
than just using the result of op[] in a temporary expression.
Whether or not this is desired is debatable (we could just fix that one
use in Clang) & it's not clear which way the C++ standard falls on this
for std::bitset's reference type (it has the same bug at least in
libstdc++, but Clang's -Wdeprecated doesn't flag it, because it's in a
standard header)
While it was only BitVector::reference's copy ctor that was referenced
by user code, I made SmallBitVector::reference's copy ctor public too,
for consistency.
Reid Kleckner [Tue, 3 Mar 2015 17:41:09 +0000 (17:41 +0000)]
Make llvm.eh.begincatch use an outparam
Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a
table, and it will take responsibility for copying the exception object
into that slot. Modelling the exception object as an SSA value returned
by begincatch isn't going to work in general, so make it use an output
parameter.
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
Add the final bits of API that `DIBuilder` needs before the new nodes
can be moved into place.
- Add `MDType::clone()` and `MDType::setFlags()` to support
`DIBuilder::createTypeWithFlags()`.
- Add `MDBasicType::get()` overload that just requires a tag and a
name, as a convenience for `DIBuilder::createUnspecifiedType()`.
- Add `MDLocalVariable::withInline()` and
`MDLocalVariable::withoutInline()` to support
`llvm::createInlinedVariable()` and
`llvm::cleanseInlinedVariable()`.
(Somehow these got lost inside the "move into place" patch I'm about to
commit -- better to commit separately!)
Daniel Jasper [Tue, 3 Mar 2015 10:23:11 +0000 (10:23 +0000)]
During PHI elimination, split critical edges that move copies out of loops.
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.
This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
* Assertion in:
CodeGen/Hexagon/hwloop-cleanup.ll
* Worse code in:
CodeGen/X86/coalescer-commute4.ll
CodeGen/X86/phys_subreg_coalesce-2.ll
CodeGen/X86/zlib-longest-match.ll
The root cause here seems to be that the heuristic that determines
the visitation order in RegisterCoalescer gets less lucky.
Ahmed Bougacha [Tue, 3 Mar 2015 01:21:16 +0000 (01:21 +0000)]
[X86] Special-case 2x CMOV when custom-inserting.
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!
When we can't use the CMOV instruction, it might increase branch
mispredicts. When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.
These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).
LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets.
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.
Saves ~450KB of instructions in a recent build of Chromium.
Benjamin Kramer [Tue, 3 Mar 2015 00:17:09 +0000 (00:17 +0000)]
LoopIdiom: Give globals for memset_pattern16 private linkage.
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.
Sanjoy Das [Mon, 2 Mar 2015 23:29:37 +0000 (23:29 +0000)]
[ADT] fail-fast iterators for DenseMap
Summary:
This patch is an attempt at making `DenseMapIterator`s "fail-fast".
Fail-fast iterators that have been invalidated due to insertion into
the host `DenseMap` deterministically trip an assert (in debug mode)
on access, instead of non-deterministically hitting memory corruption
issues.
Adrian Prantl [Mon, 2 Mar 2015 22:02:33 +0000 (22:02 +0000)]
Refactor DebugLocDWARFExpression so it doesn't require access to the
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 without the assertion in DebugLocEntry::finalize()
because not all Machine registers can be lowered into DWARF register
numbers and floating point constants cannot be expressed.
Sanjoy Das [Mon, 2 Mar 2015 21:41:07 +0000 (21:41 +0000)]
Revert some changes that were made to fix PR20680.
This re-lands change r230921. r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
Reid Kleckner [Mon, 2 Mar 2015 21:33:18 +0000 (21:33 +0000)]
lit: Add 'cd' support to the internal shell and port some tests
The internal shell was already threading around a 'cwd' parameter. We
just have to make it mutable so that we can update it as the test script
executes.
If the shell ever grows support for environment variable substitution,
we could also implement support for export.
Restore LLVMLinkModules C API until it is properly deprecated.
Add the enum "LLVMLinkerMode" back for backwards-compatibility and add the
linker mode parameter back to the "LLVMLinkModules" function. The paramter is
ignored and has no effect.
Patch provided by: Filip Pizlo
Reviewed by: Rafael and Sean
Justin Bogner [Mon, 2 Mar 2015 17:26:43 +0000 (17:26 +0000)]
Detect malformed YAML sequence in yaml::Input::beginSequence()
When reading a yaml::SequenceTraits object, YAMLIO does not report an
error if the yaml item is not a sequence. Instead, YAMLIO reads an
empty sequence. For example:
---
seq:
foo: 1
bar: 2
...
If `seq` is a SequenceTraits object, then reading the above yaml will
yield `seq` as an empty sequence.
Fix this to report an error for the above mapping ("not a sequence")
Adrian Prantl [Mon, 2 Mar 2015 17:21:06 +0000 (17:21 +0000)]
Refactor DebugLocDWARFExpression so it doesn't require access to the
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize()
that allows for empty DWARF expressions for constant FP values.