Tim Northover [Thu, 22 Oct 2015 17:20:48 +0000 (17:20 +0000)]
CodeGen: increase bits allocated for LegalizeActions
The array handling CondCodes only allocated 2 bits to describe the
desired action for each type. The new addition of a "LibCall" option
overflowed this and caused corruption for Custom actions.
No in-tree targets have a Custom CondCodeAction, so unfortunately it
can't be tested.
Zia Ansari [Thu, 22 Oct 2015 16:14:45 +0000 (16:14 +0000)]
[X86] - Catch extra combine opportunities for redundant imuls.
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.
I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).
Bill Schmidt [Thu, 22 Oct 2015 15:53:44 +0000 (15:53 +0000)]
[PPC] Fix PR24686 by failing assembly for an invalid relocation
PR24686 identifies a problem where a relocation expression is invalid
when not all of the symbols in the expression can be locally
resolved. This causes the compiler to request a PC-relative half16ds
relocation, which is nonsensical for PowerPC. This patch recognizes
this situation and ensures we fail the assembly cleanly.
Summary:
Hyphens were missing from the triple, causing it to be parsed
incorrectly. This patch updates the triple and makes necessary
changes to the expected output.
James Molloy [Thu, 22 Oct 2015 13:44:26 +0000 (13:44 +0000)]
[GlobalsAA] Loosen an overly conservative bailout
Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global.
When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing.
If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured.
James Molloy [Thu, 22 Oct 2015 13:18:42 +0000 (13:18 +0000)]
[ValueTracking] Add a new predicate: isKnownNonEqual()
isKnownNonEqual(A, B) returns true if it can be determined that A != B.
At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value:
A + B != A [where B != 0, addition is nsw or nuw]
and that contradictory known bits imply two values are not equal.
This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!)
Manuel Klimek [Thu, 22 Oct 2015 08:31:46 +0000 (08:31 +0000)]
Fix add_llvm_external_project.
r250835 unintentionally discarded the optional parameter to the
add_llvm_external_project() macro that may point to a path when the said
path is different from ${name}. This should fix it by passing ${ARGN} on
to add_llvm_subdirectory(). The problem manifests itself with e.g.
add_llvm_external_project(clang-tools-extra extra) from
clang/tools/CMakeLists.txt
Partially reverted changes from r250686
Clang runtime failure was reported.
Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT
I'll need to add a proper handling for PointerType in masked load/store intrinsics.
Sanjoy Das [Thu, 22 Oct 2015 03:12:22 +0000 (03:12 +0000)]
[OperandBundles] Make function attributes conservatively correct
Summary:
This makes attribute accessors on `CallInst` and `InvokeInst` do the
(conservatively) right thing. This essentially involves, in some
cases, *not* falling back querying the attributes on the called
`llvm::Function` when operand bundles are present.
Attributes locally present on the `CallInst` or `InvokeInst` will still
override operand bundle semantics. The LangRef has been amended to
reflect this. Note: this change does not do anything prevent
`-function-attrs` from inferring `CallSite` local attributes after
inspecting the called function -- that will be done as a separate
change.
I've used `-adce` and `-early-cse` to test these changes. There is
nothing special about these passes (and they did not require any
changes) except that they seemed be the easiest way to write the tests.
This change does not add deal with `argmemonly`. That's a later change
because alias analysis requires a related fix before `argmemonly` can be
tested.
Pete Cooper [Thu, 22 Oct 2015 01:48:57 +0000 (01:48 +0000)]
Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs. Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.
While looking at this code, there was a stale comment that these
instructions were only used for disassembly. This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.
This reapplies r242300 which was reverted in r242428 due to bot failures.
Ultimately those failures were spurious and completely unrelated to this commit. I reverted this
at the time because it was thought to be at fault.
Matt Arsenault [Wed, 21 Oct 2015 22:37:50 +0000 (22:37 +0000)]
AMDGPU: Fix verifier error in SIFoldOperands
There may be other use operands that also need their kill flags cleared.
This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.
PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>
to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0
Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.
Davide Italiano [Wed, 21 Oct 2015 22:12:03 +0000 (22:12 +0000)]
[JIT] Towards a working small memory model.
This commit introduces an option, --preallocate, so that we can get memory
upfront and use it in small memory model tests (in order to get
reliable results).
Matt Arsenault [Wed, 21 Oct 2015 21:51:02 +0000 (21:51 +0000)]
AMDGPU: Simplify VOP3 operand legalization.
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.
We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.
For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).
The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand. However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.
Keno Fischer [Wed, 21 Oct 2015 20:22:04 +0000 (20:22 +0000)]
[RuntimeDyld] Ignore ST_FILE symbols when constructing GlobalSymbolTable
Summary: ELF's STT_File symbols may overlap with regular globals in
other files, so we should ignore them here in order to avoid having
bogus entries in the symbol table that confuse us when resolving relocations.
Drop assert that a call with struct return goes to a function with sret
attribute. Clang incorrectly misses it on __muldc3 and friends and the
type system doesn't include it properly either.
Kevin Enderby [Wed, 21 Oct 2015 16:59:24 +0000 (16:59 +0000)]
This removes the eating of the error in Archive::Child::getSize() when the characters
in the size field in the archive header for the member is not a number. To do this we
have all of the needed methods return ErrorOr to push them up until we get out of lib.
Then the tools and can handle the error in whatever way is appropriate for that tool.
So the solution is to plumb all the ErrorOr stuff through everything that touches archives.
This include its iterators as one can create an Archive object but the first or any other
Child object may fail to be created due to a bad size field in its header.
Thanks to Lang Hames on the changes making child_iterator contain an
ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add
operator overloading for * and -> .
We don’t want to use llvm_unreachable() as it calls abort() and is produces a “crash”
and using report_fatal_error() to move the error checking will cause the program to
stop, neither of which are really correct in library code. There are still some uses of
these that should be cleaned up in this library code for other than the size field.
Also corrected the code where the size gets us to the “at the end of the archive”
which is OK but past the end of the archive will return object_error::parse_failed now.
The test cases use archives with text files so one can see the non-digit character,
in this case a ‘%’, in the size field.
Vedant Kumar [Wed, 21 Oct 2015 16:03:32 +0000 (16:03 +0000)]
[llvm-cov] Adjust column widths for function and file reports
Previously, we only expanded function and filename column widths when
rendering file reports. This commit makes the change for function
reports as well.
Daniel Sanders [Wed, 21 Oct 2015 12:44:14 +0000 (12:44 +0000)]
[mips][mips16] Re-work the inline assembly stubs to work with IAS. NFC.
Summary:
Previously, we were inserting an InlineAsm statement for each line of the
inline assembly. This works for GAS but it triggers prologue/epilogue
emission when IAS is in use. This caused:
.set noreorder
.cpload $25
to be emitted as:
.set push
.set reorder
.set noreorder
.set pop
.set push
.set reorder
.cpload $25
.set pop
which led to assembler errors and caused the test to fail.
The whitespace-after-comma changes included in this patch are necessary to
match the output when IAS is in use.
Chandler Carruth [Wed, 21 Oct 2015 12:15:19 +0000 (12:15 +0000)]
[AA] Enhance the new AliasAnalysis infrastructure with an optional
"external" AA wrapper pass.
This is a generic hook that can be used to thread custom code into the
primary AAResultsWrapperPass for the legacy pass manager in order to
allow it to merge external AA results into the AA results it is
building. It does this by threading in a raw callback and so it is
*very* powerful and should serve almost any use case I have come up with
for extending the set of alias analyses used. The only thing not well
supported here is using a *different order* of alias analyses. That form
of extension *is* supportable with the new pass manager, and I can make
the callback structure here more elaborate to support it in the legacy
pass manager if this is a critical use case that people are already
depending on, but the only use cases I have heard of thus far should be
reasonably satisfied by this simpler extension mechanism.
It is hard to test this using normal facilities (the built-in AAs don't
use this for obvious reasons) so I've written a fairly extensive set of
custom passes in the alias analysis unit test that should be an
excellent test case because it models the out-of-tree users: it adds
a totally custom AA to the system. This should also serve as
a reasonably good example and guide for out-of-tree users to follow in
order to rig up their existing alias analyses.
No support in opt for commandline control is provided here however. I'm
really unhappy with the kind of contortions that would be required to
support that. It would fully re-introduce the analysis group
self-recursion kind of patterns. =/
I've heard from out-of-tree users that this will unblock their use cases
with extending AAs on top of the new infrastructure and let us retain
the new analysis-group-free-world.
Masked Load/Store optimization for scalar code
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.
Daniel Sanders [Wed, 21 Oct 2015 09:58:54 +0000 (09:58 +0000)]
[mips][msa] Remove copy_u.d and move copy_u.w to MSA64.
Summary:
The forwards compatibility strategy employed by MIPS is to consider registers
to be infinitely sign-extended. Then on ISA's with a wider register, the result
of existing instructions are sign-extended to register width and zero-extended
counterparts are added. copy_u.w on MSA32 and copy_u.w on MSA64 violate this
strategy and we have therefore corrected the MSA specs to fix this.
We still keep track of sign/zero-extension during legalization but we now
match copy_s.[wd] where required.
No change required to clang since __builtin_msa_copy_u_[wd] will map to
copy_s.[wd] where appropriate for the target.
Jonas Paulsson [Wed, 21 Oct 2015 07:39:47 +0000 (07:39 +0000)]
Let MachineVerifier be aware of mem-to-mem instructions.
A mem-to-mem instruction (that both loads and stores), which store to an
FI, cannot pass the verifier since it thinks it is loading from the FI.
For the mem-to-mem instruction, do a looser check in visitMachineOperand()
and only check liveness at the reg-slot while analyzing a frame index operand.
Needed to make CodeGen/SystemZ/xor-01.ll pass with -verify-machineinstrs,
which now runs with this flag.
Mehdi Amini [Wed, 21 Oct 2015 06:10:55 +0000 (06:10 +0000)]
Revert "Add missing #include, found by modules build."
This reverts commit r250239.
It seems unwanted changes got committed here, and part of
the patch does not seem correct.
For instance RoundUpToAlignment() is called without its returned
value actually used.
JF Bastien [Wed, 21 Oct 2015 02:23:09 +0000 (02:23 +0000)]
WebAssembly: support imports
C/C++ code can declare an extern function, which will show up as an import in WebAssembly's output. It's expected that the linker will resolve these, and mark unresolved imports as call_import (I have a patch which does this in wasmate).
Dehao Chen [Wed, 21 Oct 2015 01:22:27 +0000 (01:22 +0000)]
Tolerate negative offset when matching sample profile.
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.
Igor Laevsky [Tue, 20 Oct 2015 21:33:30 +0000 (21:33 +0000)]
[MemorySanitizer] NFC. Do not use GET_INTRINSIC_MODREF_BEHAVIOR table.
It is now possible to infer intrinsic modref behaviour purely from intrinsic attributes.
This change will allow to completely remove GET_INTRINSIC_MODREF_BEHAVIOR table.
This is the last of the implicit ilist iterator conversions in LLVM.
Still up for debate whether we let these bitrot back:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091617.html
Chris Bieneman [Tue, 20 Oct 2015 18:16:37 +0000 (18:16 +0000)]
[CMake] All the checks for if LLVM_VERSION_* variables are set need to be if(DEFINED ...)
This is because if you set one of the variables to 0, if(NOT ...) is true, which isn't what you actually want. Should have thought that through better the first time.
Artyom Skrobov [Tue, 20 Oct 2015 13:14:52 +0000 (13:14 +0000)]
Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:
- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall
This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.
The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.
The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.
Artyom Skrobov [Tue, 20 Oct 2015 13:06:02 +0000 (13:06 +0000)]
Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into DAGCombiner.
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.
The motivation is to "prepare terrain" for D13862: when we set DIV and REM
to be legalized to libcalls, instead of the DIVREM, we otherwise lose the
ability to combine them together. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.
The mask value type for maskload/maskstore GCC builtins is never a vector of
packed floats/doubles.
This patch fixes the following issues:
1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd
should be of type llvm_v2i64_ty and not llvm_v2f64_ty.
2. The mask argument for builtin_ia32_maskloadpd256 and
builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not
llvm_v4f64_ty.
3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps
should be of type llvm_v4i32_ty and not llvm_v4f32_ty.
4. The mask argument for builtin_ia32_maskloadps256 and
builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not
llvm_v8f32_ty.
Keno Fischer [Tue, 20 Oct 2015 10:13:55 +0000 (10:13 +0000)]
Fix missing INITIALIZE_PASS_DEPENDENCY for AddressSanitizer
Summary: In r231241, TargetLibraryInfoWrapperPass was added to
`getAnalysisUsage` for `AddressSanitizer`, but the corresponding
`INITIALIZE_PASS_DEPENDENCY` was not added.
Matt Arsenault [Tue, 20 Oct 2015 03:59:58 +0000 (03:59 +0000)]
AMDGPU: Stop reserving v[254:255]
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.
This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.