Matthias Braun [Fri, 18 Nov 2016 19:43:18 +0000 (19:43 +0000)]
Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.
Hans Wennborg [Fri, 18 Nov 2016 17:33:05 +0000 (17:33 +0000)]
IRMover: Avoid accidentally mapping types from the destination module (PR30799)
During Module linking, it's possible for SrcM->getIdentifiedStructTypes();
to return types that are actually defined in the destination module
(DstM). Depending on how the bitcode file was read,
getIdentifiedStructTypes() might do a walk over all values, including
metadata nodes, looking for types. In my case, a debug info metadata
node was shared between the two modules, and it referred to a type
defined in the destination module (see test case).
Mehdi Amini [Fri, 18 Nov 2016 17:28:10 +0000 (17:28 +0000)]
Add link-time detection of LLVM_ABI_BREAKING_CHECKS mismatch
Summary:
LLVM will define a symbol, either EnableABIBreakingChecks or
DisableABIBreakingChecks depending on the configuration setting for
LLVM_ABI_BREAKING_CHECKS.
The llvm-config.h header will add weak references to these symbols in
every clients that includes this header. This should ensure that
a mismatch triggers a link failure (or a load time failure for DSO).
On MSVC, the pragma "detect_mismatch" is used instead.
Simon Dardis [Fri, 18 Nov 2016 16:17:44 +0000 (16:17 +0000)]
[mips][msa] Implement f16 support
The MIPS MSA ASE provides instructions to convert to and from half precision
floating point. This patch teaches the MIPS backend to treat f16 as a legal
type and how to promote such values to f32 for the usual set of operations.
As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during
type legalization.
Florian Hahn [Fri, 18 Nov 2016 13:12:07 +0000 (13:12 +0000)]
[simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations.
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.
llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.
Nicolai Haehnle [Fri, 18 Nov 2016 11:55:52 +0000 (11:55 +0000)]
AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects
e.g. shaders that access buffer textures.
Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention. If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.
Ehsan Amiri [Fri, 18 Nov 2016 10:41:44 +0000 (10:41 +0000)]
[PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.
Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.
Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something
similar.
Craig Topper [Fri, 18 Nov 2016 06:04:33 +0000 (06:04 +0000)]
[InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
convert bpf assembler to look like kernel verifier output
since bpf instruction set was introduced people learned to
read and understand kernel verifier output whereas llvm asm
output stayed obscure and unknown. Convert llvm to emit
assembler text similar to kernel to avoid this discrepancy
Yichao Yu [Fri, 18 Nov 2016 01:25:49 +0000 (01:25 +0000)]
Add an option to disable libedit
Summary: This should provide the function similar to `--disable-libedit` with the autotools build system, which seems to be missing from the commit (r200595) that adds this.
Anna Zaks [Thu, 17 Nov 2016 16:55:40 +0000 (16:55 +0000)]
[asan] Turn on Mach-O global metadata liveness tracking by default
This patch turns on the metadata liveness tracking since all known issues
have been resolved. The future has been implemented in
https://reviews.llvm.org/D16737 and enables support of dead code stripping
option on Mach-O platforms.
As part of enabling the feature, I also plan on reverting the following
patch to compiler-rt:
Simon Pilgrim [Thu, 17 Nov 2016 12:14:49 +0000 (12:14 +0000)]
[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.
If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).
This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).
Pavel Labath [Thu, 17 Nov 2016 11:22:23 +0000 (11:22 +0000)]
[cmake] Move LLVM_BUILD_STATIC check to an earlier point
Summary:
The motivation for this is to enable correct detection of dlopen() on Android.
Android does not provide a static version of libdl, so if we add the -static flag
after performing the check, it will succeed even though subsequent link steps
will fail. With this change we correctly detect the absence of libdl in a
LLVM_BUILD_STATIC build on Android.
The link itself still does not succeed because the code does not check the result
of this check properly, but I plan to fix that in a separate change.
Oren Ben Simhon [Thu, 17 Nov 2016 09:59:40 +0000 (09:59 +0000)]
[X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).
Sanjoy Das [Thu, 17 Nov 2016 07:29:40 +0000 (07:29 +0000)]
[ImplicitNullCheck] Fix an edge case where we were hoisting incorrectly
ImplicitNullCheck keeps track of one instruction that the memory
operation depends on that it also hoists with the memory operation.
When hoisting this dependency, it would sometimes clobber a live-in
value to the basic block we were hoisting the two things out of. Fix
this by explicitly looking for such dependencies.
I also noticed two redundant checks on `MO.isDef()` in IsMIOperandSafe.
They're redundant since register MachineOperands are either Defs or Uses
-- there is no third kind. I'll change the checks to asserts in a later
commit.
Chris Bieneman [Thu, 17 Nov 2016 04:36:59 +0000 (04:36 +0000)]
[CMake] [Darwin] Add support for debugging tablegen dependencies
This patch adds an option to the build system LLVM_DEPENDENCY_DEBUGGING. Over time I plan to extend this to do more complex verifications, but the initial patch causes compile errors wherever there is missing a dependency on intrinsics_gen.
Because intrinsics_gen is a compile-time dependency not a link-time dependency, everything that relies on the headers generated in intrinsics_gen needs an explicit dependency.
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
Dehao Chen [Thu, 17 Nov 2016 01:17:02 +0000 (01:17 +0000)]
Use profile info to adjust loop unroll threshold.
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Sanjay Patel [Wed, 16 Nov 2016 22:34:05 +0000 (22:34 +0000)]
[x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in
MachineCombiner.
This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137
The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.
Ahmed Bougacha [Wed, 16 Nov 2016 22:24:56 +0000 (22:24 +0000)]
[CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC.
They're not SelectionDAG- or FunctionLoweringInfo-specific. They
are, however, specific to building MMI from IR.
We could make them members, but it's nice having MMI be a "simple" data
structure and this logic kept separate.
Kevin Enderby [Wed, 16 Nov 2016 22:17:38 +0000 (22:17 +0000)]
General clean up of error handling in llvm-objdump to remove its use of report_fatal_error().
No real functional change with this commit.
The problem with report_fatal_error() is it does not include the tool name
and the file name the for which the error message was generated.
Uses of report_fatal_error() were change to report_error() or error()
to get a better error and to make the code smaller and cleaner.
Also changed things like error(errorToErrorCode(SOrErr.takeError())) to
use report_error() with a file name and the llvm::Error (as well as the
ArchitectureName if available) so the error message is printed.
Dylan McKay [Wed, 16 Nov 2016 21:58:04 +0000 (21:58 +0000)]
[AVR] Add the pseudo instruction expansion pass
Summary:
A lot of the pseudo instructions are required because LLVM assumes that
all integers of the same size as the pointer size are legal. This means
that it will not currently expand 16-bit instructions to their 8-bit
variants because it thinks 16-bit types are legal for the operations.
This also adds all of the CodeGen tests that required the pass to run.
We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol,
TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress
nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and
invert the other.
Also update the documentation comment for X86ISD::Wrapper.
Sanjoy Das [Wed, 16 Nov 2016 21:45:22 +0000 (21:45 +0000)]
[ImplicitNullChecks] Do not not handle call MachineInstrs
We don't track callee clobbered registers correctly, so avoid hoisting
across calls.
Note: for this bug to trigger we need a `readonly` call target, since we
already have logic to not hoist across potentially storing instructions
either.
Tim Northover [Wed, 16 Nov 2016 20:54:28 +0000 (20:54 +0000)]
ARM: fix CodeGen for 64-bit shifts.
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.
Rong Xu [Wed, 16 Nov 2016 20:50:06 +0000 (20:50 +0000)]
Make block placement deterministic
We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.
This patch fixes this issue by changing to use SmallSetVector.
Geoff Berry [Wed, 16 Nov 2016 19:35:19 +0000 (19:35 +0000)]
[AArch64] Handle vector types in replaceZeroVectorStore.
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.
Tom Stellard [Wed, 16 Nov 2016 18:42:17 +0000 (18:42 +0000)]
AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
instructions.
The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.
This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.