Reapply r215966, r215965, r215964, r215963, r215960, r215959, r215958, and r215957
This reverts commit r215981, which reverted the above commits because
MSVC std::equal asserts on nullptr iterators, and thes commits
introduced an `ArrayRef::equals()` on empty ArrayRefs.
ArrayRef was changed not to use std::equal in r215986.
MSVC's STL has a bug in `std::equal()`: it asserts on nullptr iterators,
causing a block revert in r215981. This works around that by re-writing
`ArrayRef::equals()` to do the work itself.
Aaron Ballman [Tue, 19 Aug 2014 14:59:02 +0000 (14:59 +0000)]
Reverting r215966, r215965, r215964, r215963, r215960, r215959, r215958, and r215957 (these commits all rely on previous commits) due to build breakage. These commits cause failed assertions when testing Clang using MSVC 2013. The asserts are triggered from the std::equal call within ArrayRef::equals due to being passed invalid input (ArrayRef.begin() is returning a nullptr which is problematic).
Toma Tabacu [Tue, 19 Aug 2014 14:22:52 +0000 (14:22 +0000)]
[mips] Add assembler support for .set arch=x directive.
Summary:
This directive is similar to ".set mipsX".
It is used to change the CPU target of the assembler, enabling it to accept instructions for a specific CPU.
This patch only implements the r4000 CPU (which is treated internally as generic mips3) and the generic ISAs.
Craig Topper [Tue, 19 Aug 2014 06:57:14 +0000 (06:57 +0000)]
Prevent use of the implicit copy constructor on SmallPtrSetImpl. An accidental copy caused my SmallPtrSet->SmallPtrSetImpl conversion commit to fail the other day.
Previously, `ConstantArray::replaceUsesOfWithOnConstant()` neglected to
check whether it becomes a `ConstantDataArray`. Call
`ConstantArray::getImpl()` to check for that.
Introduce `getImpl()` that tries the simplification logic from `get()`
and then gives up. This allows the logic to be reused elsewhere in a
follow-up commit.
Avoid RAUW-ing `ConstantExpr` when an operand changes unless the new
`ConstantExpr` already has users. This prevents the RAUW from rippling
up the expression tree unnecessarily.
This commit indirectly adds test coverage for r215953 (this is how I
came across the bug).
Rewrite `ConstantUniqueMap` to be more similar to
`ConstantAggrUniqueMap`.
- Use a `DenseMap` with custom MapInfo instead of a `std::map` with
linear lookups and deletion.
- Don't waste memory explicitly storing (heavyweight) keys.
Only `ConstantExpr` and `InlineAsm` actually use this data structure, so
I also updated them to use it.
This code cleanup is a precursor to reducing RAUW traffic on
`ConstantExpr` -- I felt badly adding a new (linear) call to
`ConstantUniqueMap::FindExistingKey`, so this designs away the concern.
A follow-up commit will transition the users of `ConstantAggrUniqueMap`
over.
This code had a homemade RAUW that was incorrect when a user was a
constant: instead of calling `replaceUsersWithOnConstant()` it would
incorrectly update the operand in-place, invalidating
`LLVMContextImpl::ExprConstants`. RAUW does the job better.
The ValueHandle that `GVMap` is holding onto needs to be removed first,
so this commit also removes each variable from the map on-the-fly.
Since deletions from `ExprConstants` use a linear search that compares
directly on the pointer value (instead of using the key), there isn't an
obvious way to expose this with a testcase.
Previously all `blockaddress()` constants were treated as forward
references. They were resolved twice: once at the end of the function
in question, and again at the end of the module. Furthermore, if the
same blockaddress was referenced N times, the parser created N distinct
`GlobalVariable`s (one for each reference).
Instead, resolve all block addresses at the beginning of the function,
creating the standard `BasicBlock` forward references used for all other
basic block references. After the function, all references can be
resolved immediately. To check for the condition of parsing block
addresses from within the same function, I created a reference to the
current per-function-state in `BlockAddressPFS`.
Also, create only one forward-reference per basic block. Because
forward references to block addresses are rare, the data structure here
shouldn't matter. If somehow it does someday, this can be pretty easily
changed to a `DenseMap<std::pair<ValID, ValID>, GV>`.
verify-uselistorder: Call verifyModule() and improve output
Call `verifyModule()` after parsing and after every transformation.
Also convert some `DEBUG(dbgs())` to `errs()` to increase visibility
into what's going on.
Robin Morisset [Mon, 18 Aug 2014 22:18:14 +0000 (22:18 +0000)]
Answer to Philip Reames comments
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered).
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll
Kevin Enderby [Mon, 18 Aug 2014 20:21:02 +0000 (20:21 +0000)]
Make llvm-objdump handle both arm and thumb disassembly from the same Mach-O
file with -macho, the Mach-O specific object file parser option.
After some discussion I chose to do this implementation contained in the logic
of llvm-objdump’s MachODump.cpp using a second disassembler for thumb when
needed and with updates mostly contained in the MachOObjectFile class.
Aaron Ballman [Mon, 18 Aug 2014 14:54:22 +0000 (14:54 +0000)]
Disabling an MSVC warning ('var' : definition from the for loop is ignored; the definition from the enclosing scope is used) which will trigger false positives more than true positives.
Oliver Stannard [Mon, 18 Aug 2014 12:42:15 +0000 (12:42 +0000)]
[ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.
Tim Northover [Mon, 18 Aug 2014 11:49:42 +0000 (11:49 +0000)]
TableGen: allow use of uint64_t for available features mask.
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".
Mostly just refactoring at present, and there's probably no way to test.
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI. It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.
The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.
The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).
Daniel Sanders [Sun, 17 Aug 2014 19:47:47 +0000 (19:47 +0000)]
Revert: r215698 - Current implementation of c.cond.fmt instructions only accept default cc0 register...
It causes a number of regressions when -fintegrated-as is enabled. This happens
because there are codegen-only instructions that incorrectly uses the first
operand as the encoding for the $fcc register. The regressions do not occur when
-via-file-asm is also given.
This was a thinko. The intent was to flip the explicit bits that need toggling
rather than all bits. This would result in incorrect behaviour (which now is
tested).
We already handle the no-slabs case when checking whether the current slab
is large enough: if no slabs have been allocated, CurPtr and End are both 0.
alignPtr(0), will still be 0, and so "if (Ptr + Size <= End)" fails.
Added a table for intrinsics on X86.
It should remove dosens of lines in handling instrinsics (in a huge switch) and give an easy way to add new intrinsics.
I did not completed to move al intrnsics to the table, I'll do this in the upcomming commits.
Owen Anderson [Sun, 17 Aug 2014 03:51:29 +0000 (03:51 +0000)]
Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the
new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.
Chandler Carruth [Sat, 16 Aug 2014 09:42:15 +0000 (09:42 +0000)]
[x86] Teach lots of the new vector shuffle lowering to use UNPCK
instructions for blend operations at 128 bits. This was a serious hole
in our prior blend lowering.
David Majnemer [Sat, 16 Aug 2014 09:23:42 +0000 (09:23 +0000)]
InstCombine: Fix a potential bug in 0 - (X sdiv C) -> (X sdiv -C)
While *most* (X sdiv 1) operations will get caught by InstSimplify, it
is still possible for a sdiv to appear in the worklist which hasn't been
simplified yet.
This means that it is possible for 0 - (X sdiv 1) to get transformed
into (X sdiv -1); dividing by -1 can make the transform produce undef
values instead of the proper result.
Sorry for the lack of testcase, it's a bit problematic because it relies
on the exact order of operations in the worklist.
Nico Weber [Sat, 16 Aug 2014 05:37:51 +0000 (05:37 +0000)]
arm asm: Let .fpu enable instructions, PR20447.
I'm not very happy with duplicating the fpu->feature mapping in ARMAsmParser.cpp
and in clang's driver. See the bug for a patch that doesn't do that, and the
review thread [1] for why this duplication exists.
Eric Fiselier [Sat, 16 Aug 2014 02:16:25 +0000 (02:16 +0000)]
[LIT] Move display of unsupported and xfail tests to summary.
Summary:
This patch changes the way xfail and unsupported tests are displayed.
This output is only displayed when the --show-unsupported/--show-xfail flags are passed to lit.
Currently xfail/unsupported tests are printed during the run of the test-suite. I think its better to display this information during the summary instead.
This patch removes the printing of these tests from when they are run to the summary.
BitcodeReader: Only create one basic block for each blockaddress
Block address forward-references are implemented by creating a
`BasicBlock` ahead of time that gets inserted in the `Function` when
it's eventually encountered.
However, if the same blockaddress was used in two separate functions
that were parsed *before* the referenced function (and the blockaddress
was never used at global scope), two separate basic blocks would get
created, one of which would be forgotten creating invalid IR.
This commit changes the forward-reference logic to create only one basic
block (and always return the same blockaddress).
This is an off-by-one bug I found by inspection, which would only
trigger if the bitcode writer sees more uses of a `Value` than the
reader. Since this is only relevant when an instruction gets upgraded
somehow, there unfortunately isn't a reasonable way to add test
coverage.
IR: Don't add inbounds to GEPs of extern_weak variables
Global variables that have `extern_weak` linkage may be null, so it's
incorrect to add `inbounds` when constant folding.
This also fixes a bug when parsing global aliases, whose forward
reference placeholders are global variables with `extern_weak` linkage.
If GEPs to these aliases are encountered before the alias itself, the
GEPs would incorrectly gain the `inbounds` keyword as well.
Andrea Di Biagio [Sat, 16 Aug 2014 00:29:44 +0000 (00:29 +0000)]
[DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.
The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.
Example:
%1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
%2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>
Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
shufps $-123, %xmm1, $xmm0
pshufd $-46, %xmm0, %xmm0
With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.
Added new test cases to test 'combine-vec-shuffle-5.ll'.
Hal Finkel [Sat, 16 Aug 2014 00:17:05 +0000 (00:17 +0000)]
[PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).
Hal Finkel [Sat, 16 Aug 2014 00:17:02 +0000 (00:17 +0000)]
Make isAliased property for fixed-offset stack objects adjustable
We used to assume that any fixed-offset stack object was not aliased. This
meant that no IR value could point to the memory contained in such an object.
This is a reasonable default, but is not a universally-correct
target-independent fact. For example, on PowerPC (both Darwin and non-Darwin),
some byval arguments are allocated at fixed offsets by the ABI. These, however,
certainly can be pointed to by IR values. This change moves the 'isAliased'
logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased
property to be overridden for fixed-offset objects.
This will be used by an upcoming commit to the PowerPC backend to fix PR20280.
No functionality change intended (the behavior of
FixedStackPseudoSourceValue::isAliased has been made more conservative for
callers that don't pass an MFI object, but I don't see any in-tree callers that
do that).