Jun Bum Lim [Thu, 2 Feb 2017 15:12:34 +0000 (15:12 +0000)]
[JumpThread] Enhance finding partially redundant loads by continuing to scan through a single predecessor
Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.
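As a hedged, source-level illustration only (the function and names below are made up, not taken from the patch), this is the shape of a partially redundant load that becomes available once the scan continues through single-predecessor blocks:
  // Illustrative sketch: *p is loaded on one path into the merge block, so the
  // load at the bottom is only partially redundant; scanning back through
  // single-predecessor blocks can find the already-loaded value.
  int sketch(int *p, bool c) {
    int v = 0;
    if (c)
      v = *p;      // load available on this predecessor path
    // ... intervening single-predecessor block(s) ...
    return v + *p; // partially redundant load
  }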
Nirav Dave [Thu, 2 Feb 2017 14:39:42 +0000 (14:39 +0000)]
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which requires
more expensive constant generation).
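For orientation, here is a minimal source-level sketch (illustrative, not from the patch) of the kind of consecutive stores the merge-candidate search is after:
  // Four adjacent byte stores that the DAG combiner can merge into one wider
  // store once the chain analysis shows no interfering memory operations.
  void store4(unsigned char *p) {
    p[0] = 1;
    p[1] = 2;
    p[2] = 3;
    p[3] = 4;
  }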
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes, as these are captured by the data dependence.
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit sizes, as
in some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Nirav Dave [Thu, 2 Feb 2017 14:39:26 +0000 (14:39 +0000)]
[X86,ISEL] Fix X86 increment chain dependence calculation
Merging a load-add-store pattern into an increment op previously
dropped the load's chain from the instruction's dependences if the
store was chained to a TokenFactor.
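A minimal sketch of the load-add-store pattern in question (illustrative C++, not from the patch); X86 can fold it into a memory increment, and the fix keeps the load's chain dependence when the store hangs off a TokenFactor:
  // Load-add-store on the same address; folds to something like incl (%rdi).
  void bump(int *counter) {
    *counter = *counter + 1;
  }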
Diana Picus [Thu, 2 Feb 2017 14:01:00 +0000 (14:01 +0000)]
[ARM] GlobalISel: Lower pointer args and returns
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
Diana Picus [Thu, 2 Feb 2017 14:00:54 +0000 (14:00 +0000)]
[ARM] GlobalISel: Error out instead of asserting
Allow unknown types in TLI.getValueType; otherwise we get asserts for
certain types that we do not support yet, instead of reporting that we
don't support them and falling through the normal error path.
Anna Thomas [Thu, 2 Feb 2017 13:22:03 +0000 (13:22 +0000)]
[LICM] Hoist loads that are dominated by invariant.start intrinsic, and are invariant in the loop.
Summary:
We can hoist loads that are dominated by invariant.start out to the
preheader. We conservatively assume the load is variant if we see a
corresponding use of invariant.start (it could be an invariant.end or
an escaping call).
Simon Pilgrim [Thu, 2 Feb 2017 11:52:33 +0000 (11:52 +0000)]
[X86][SSE] Use MOVMSK for all_of/any_of reduction patterns
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none-bits source vector (i.e. a comparison result), but we should be able to expand on this in conjunction with improvements to 'bool vector' handling, both in the x86 backend and in the vectorizers.
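As a rough illustration of the MOVMSK form being targeted, here is a hand-written intrinsics equivalent (not code from the patch):
  #include <emmintrin.h>

  // any_of over a comparison result: one compare plus MOVMSK replaces an
  // or + shuffle reduction chain.
  int any_negative(__m128i v) {
    __m128i mask = _mm_cmplt_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(mask) != 0;
  }

  // all_of: every byte of the comparison mask must be set.
  int all_negative(__m128i v) {
    __m128i mask = _mm_cmplt_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(mask) == 0xFFFF;
  }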
LTO: Link non-prevailing weak_odr or linkonce_odr globals into the combined module with available_externally linkage.
These linkages mean that the ultimately prevailing symbol will have the same
semantics as any non-prevailing copy of the symbol, so we are free to ignore
the linker's resolution.
Matt Arsenault [Thu, 2 Feb 2017 02:27:04 +0000 (02:27 +0000)]
AMDGPU: Use source modifiers with f16->f32 conversions
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
Paul Robinson [Wed, 1 Feb 2017 23:51:56 +0000 (23:51 +0000)]
Remove an assertion that doesn't hold when mixing -g and -gmlt through
LTO. Replace it with a related assertion, ensuring that abstract
variables appear only in abstract scopes.
Part of PR31437.
[AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers have to be adjusted for bigger workgroups.
For example, a workgroup of size 256 already occupies 4 waves just by
itself. Given that all LDS use numbers in the compiler are per
workgroup, the occupancy shall be multiplied by 4 in this case. Each
group of 64 workitems is still limited by the same number, but 4
subgroups of 64 workitems each can afford 4 times more LDS for the same
occupancy.
In addition, the change initializes the LDS size in the subtarget to a
real value for SI+ targets. This is required since the LDS size is a
variable in these calculations.
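A hedged sketch of the arithmetic described above; the function and parameter names are illustrative, not the actual AMDGPU subtarget API:
  // Occupancy numbers computed for a 64-workitem workgroup have to be scaled
  // by the number of 64-wide waves a real workgroup occupies (e.g. 256
  // workitems -> 4 waves), since LDS use is accounted per workgroup.
  unsigned scaleOccupancy(unsigned OccPerWave64, unsigned WorkGroupSize) {
    unsigned WavesPerGroup = (WorkGroupSize + 63) / 64; // 256 -> 4
    return OccPerWave64 * WavesPerGroup;
  }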
Marcos Pividori [Wed, 1 Feb 2017 22:40:34 +0000 (22:40 +0000)]
[libFuzzer] Add features `windows` and `posix` for lit tests.
Add two features: posix and windows.
Sometimes we want a test to run only on POSIX systems, and for that we use:
REQUIRES: posix
Sometimes we want a test to run only on Windows, and for that we use:
REQUIRES: windows
Marcos Pividori [Wed, 1 Feb 2017 22:39:55 +0000 (22:39 +0000)]
[libFuzzer] Fix test because cmd prompt does not expand wildcard.
On Windows, commands are expected to expand wildcards themselves; the
cmd prompt doesn't do it. Because of that, sancov was not finding the
needed file. To deal with this, we use ls and xargs from the GnuWin
utils.
Rui Ueyama [Wed, 1 Feb 2017 22:09:34 +0000 (22:09 +0000)]
Return Error instead of bool from mergeTypeStreams().
Previously, mergeTypeStreams returned only true or false, so it was
impossible to know the reason when it failed. This patch changes the
function signature so that it returns an Error object.
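A minimal sketch of the bool-to-Error style of change, assuming the usual llvm::Error helpers; the signature below is illustrative, not the actual mergeTypeStreams declaration:
  #include "llvm/Support/Error.h"

  // Before: returning bool loses the failure reason.
  // After: returning llvm::Error lets callers inspect or report it.
  llvm::Error mergeStreamsSketch(bool Succeeded) {
    if (!Succeeded)
      return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                     "type stream merge failed");
    return llvm::Error::success();
  }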
Sanjay Patel [Wed, 1 Feb 2017 21:31:34 +0000 (21:31 +0000)]
[InstCombine] move folds for shift-shift pairs; NFCI
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test
coverage for those folds.
It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.
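Purely as an illustration of the pairs the new tests cover (not code from the patch):
  // shl-shl and lshr-lshr pairs with constant amounts fold to a single shift.
  unsigned shl_shl(unsigned x)   { return (x << 3) << 4; } // -> x << 7
  unsigned lshr_lshr(unsigned x) { return (x >> 3) >> 4; } // -> x >> 7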
Davide Italiano [Wed, 1 Feb 2017 18:52:20 +0000 (18:52 +0000)]
[IPSCCP] Don't propagate return values of functions marked as noinline.
This tries to address what Hal described (in the post-commit review of
r293727) as a long-standing problem with noinline, where we end up
de facto inlining trivial functions, e.g.:
__attribute__((noinline)) int patatino(void) { return 5; }
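To make the "de facto inlining" concrete, here is an illustrative caller (not part of the commit): with return-value propagation, a call site like this gets folded to a constant even though patatino is marked noinline.
  int caller(void) {
    return patatino(); // previously folded to 'return 5;' by IPSCCP
  }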
Simon Dardis [Wed, 1 Feb 2017 18:50:24 +0000 (18:50 +0000)]
[mips] Parse the 'bopt' and 'nobopt' directives in IAS.
The GAS assembler supports the ".set bopt" directive, but according
to the sources it doesn't do anything. It's supposed to optimize
branches by filling the delay slot of a branch with its target.
This patch teaches the MIPS asm parser to accept both directives and,
in the case of 'bopt', warn that the directive is unsupported.
Zachary Turner [Wed, 1 Feb 2017 18:30:22 +0000 (18:30 +0000)]
[pdb] Add a new command for analyzing hash collisions.
This introduces the `analyze` subcommand. For now there is only
one option, to analyze hash collisions in the type streams. In
the future, however, we could add many more things here, such
as performing size analyses, compaction, and statistics about
the kinds of records, etc.
Sanjoy Das [Wed, 1 Feb 2017 17:50:40 +0000 (17:50 +0000)]
[ImplicitNullChecks] NFC Fix the implicit-null-checks.mir test
Summary:
Currently the implicit-null-checks.mir test crashes if we run llc with
the -enable-implicit-null-checks -start-before implicit-null-checks
options. This change fixes the RET instruction that was causing the
crash.
Matthew Simpson [Wed, 1 Feb 2017 17:45:46 +0000 (17:45 +0000)]
[LV] Move interleaved access helper functions to VectorUtils (NFC)
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.
Sanjoy Das [Wed, 1 Feb 2017 16:34:55 +0000 (16:34 +0000)]
[InstCombine] Allow InstCombine to merge adjacent guards
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and fold its condition into the condition of the
other. This patch allows InstCombine to merge adjacent guards in this
way.
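A hedged sketch of the merge in source-like form; guard() below is just a stand-in for the llvm.experimental.guard intrinsic, and the code is illustrative only:
  extern void guard(bool cond); // stand-in for llvm.experimental.guard

  void before(bool a, bool b) {
    guard(a);
    guard(b);
  }

  void after(bool a, bool b) {
    guard(a && b); // the two adjacent guards merged into one condition
  }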
Sanjay Patel [Wed, 1 Feb 2017 15:41:32 +0000 (15:41 +0000)]
[ValueTracking] avoid crashing from bad assumptions (PR31809)
A program may contain llvm.assume info that disagrees with other analyses.
This may be caused by UB in the program, so we must not crash because of that.
As noted in the code comments:
https://llvm.org/bugs/show_bug.cgi?id=31809
...we can do better, but this at least avoids the assert/crash in the bug report.
Simon Dardis [Wed, 1 Feb 2017 15:39:23 +0000 (15:39 +0000)]
[mips] Fix an initialization issue with MipsABIInfo in MipsTargetELFStreamer
DebugInfoDWARFTests is the only user so far which initializes the
MCObjectStreamer without initializing the ASMParser. The MIPS backend
relies on the ASMParser to initialize the MipsABIInfo object and to
update the target streamer with it. This should turn the mips buildbots
green.
Kit Barton [Wed, 1 Feb 2017 14:33:57 +0000 (14:33 +0000)]
[PowerPC] Fix sjlj pseudo instructions to use the G8RC_NOX0 register class
The following instructions:
- LD/LWZ (expanded from sjlj pseudo-instructions)
- LXVL/LXVLL vector loads
- STXVL/STXVLL vector stores
all require G8RC_NOX0 class registers for RA.
Florian Hahn [Wed, 1 Feb 2017 13:01:33 +0000 (13:01 +0000)]
[legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.
This patch fixes all failing fp16 test cases with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll,
CodeGen/X86/soft-fp.ll)
Javed Absar [Wed, 1 Feb 2017 11:55:03 +0000 (11:55 +0000)]
[ARM] Enable Cortex-M23 and Cortex-M33 support.
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers: rengolin, olista01.
Florian Hahn [Wed, 1 Feb 2017 10:39:35 +0000 (10:39 +0000)]
[LoopUnroll] Use addClonedBlockToLoopInfo to add loop header to LI (NFC).
Summary:
I have a similar patch up for review already (D29173). If you prefer, I
can squash them both together.
Also, I think there is more potential for code sharing between
LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for
that would be worthwhile?
Craig Topper [Wed, 1 Feb 2017 07:17:16 +0000 (07:17 +0000)]
[X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types.
For SSE we use the FP moves because of the smaller encoding, but that doesn't apply to AVX, so just do the natural thing so we don't have to explain why we aren't doing it. We can't do this for 256-bit loads/stores, since integer loads and stores aren't available in AVX1, so we need fallback patterns because the integer types are legal.
This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.
Summary:
The isSuitableMemoryOp method is responsible for verifying that an
instruction is a candidate for use in an implicit null check.
Additionally, it checks that the base register is not re-defined
earlier. If the base has been re-defined, it just returns false and the
lookup continues, even though no later instruction can pass this check
either, which results in redundant work. Now, when we find that the
base register has been re-defined, we simply stop.
SplitEditor::defFromParent() can create a register copy.
If the register is a tuple of other registers and not all lanes are
used, a copy will still be done on the full tuple. Later, the register
unit for an unused lane will be considered free, and another
overlapping register tuple can be assigned to a different value even
though the first register is live at that point. That is because
interference checking only looks at liveness info, while a full
register copy clobbers all lanes, even unused ones.
Summary:
This change implements the instrumentation map loading library which can
understand both YAML-defined instrumentation maps, and ELF 64-bit object
files that have the XRay instrumentation map section. We break it out
into a library on its own to allow for other applications to deal with
the XRay instrumentation map defined in XRay-instrumented binaries.
This type provides both raw access to the logical representation of the
instrumentation map entries as well as higher level functions for
converting a function ID into a function address.
At this point we only support ELF64 binaries and YAML-defined XRay
instrumentation maps. Future changes should extend this to support
32-bit ELF binaries, as well as other binary formats (like MachO).
As part of this change we also migrate all uses of the extraction logic
that used to be defined in tools/llvm-xray/ to use this new type and
interface for loading from files. We also remove the flag from the
`llvm-xray` tool that required users to specify the type of the
instrumentation map file being provided to instead make the library
auto-detect the file type.
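A hedged usage sketch of the new library; loadInstrumentationMap, getFunctionAddr, and the header path are assumptions based on the description above rather than verified signatures from this revision:
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/Error.h"
  #include "llvm/XRay/InstrumentationMap.h"

  // Load an instrumentation map (ELF64 or YAML, auto-detected) and resolve a
  // function ID to an address. API names here are assumed; see the note above.
  llvm::Expected<uint64_t> lookupAddr(llvm::StringRef File, int32_t FuncId) {
    auto MapOrErr = llvm::xray::loadInstrumentationMap(File);
    if (!MapOrErr)
      return MapOrErr.takeError();
    if (auto Addr = MapOrErr->getFunctionAddr(FuncId))
      return *Addr;
    return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                   "unknown function id");
  }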
Kyle Butt [Tue, 31 Jan 2017 23:48:32 +0000 (23:48 +0000)]
CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have
preferred a block that preserves the CFG unless there is a strong
probability of branching the other direction. For small blocks that can
be duplicated we now skip that requirement as well, subject to some
simple frequency calculations.