Simon Pilgrim [Thu, 21 Feb 2019 12:24:49 +0000 (12:24 +0000)]
[X86][SSE] combineX86ShufflesRecursively - moved to generic op input index lookup. NFCI.
We currently bail if the target shuffle decodes to more than 2 input vectors, this change alters the input index to work for any number of inputs for when we drop that requirement.
Fangrui Song [Thu, 21 Feb 2019 11:35:41 +0000 (11:35 +0000)]
[llvm-readobj] Change "SHT_MIPS_DWARF" to "MIPS_DWARF"
Summary:
This is to be consistent with the display of other MIPS section types.
This string is also used by binutils-gdb/binutils/readelf.c:get_mips_section_type_name
Since we are here, reorder the two enum constatns because SHT_MIPS_DWARF < SHT_MIPS_ABIFLAGS.
James Henderson [Thu, 21 Feb 2019 11:00:29 +0000 (11:00 +0000)]
[llvm-readobj]Test basic command-line handling
There was no real testing for llvm-readobj/llvm-readelf's behaviour
under various bad inputs and command-line switches. This patch adds some
testing of this, along with basic testing of --version and --help.
James Henderson [Thu, 21 Feb 2019 10:57:15 +0000 (10:57 +0000)]
[yaml2obj]Allow symbol Index field to take values lower than SHN_LORESERVE
In order to test tool handling of invalid section indexes, I need to
create an object containing such an invalid section index. I could
create a hex-edited binary, but having the ability to use yaml2obj is
preferable. Prior to this change, yaml2obj would reject any explicit
section indexes less than SHN_LORESERVE. This patch changes it to allow
any value.
I had to change the test to use llvm-readelf instead of llvm-readobj,
because llvm-readobj does not like invalid section indexes. I've also
expanded the test to show that the most common SHN_* values are accepted
(SHN_UNDEF, SHN_ABS, SHN_COMMON).
David Spickett [Thu, 21 Feb 2019 10:42:49 +0000 (10:42 +0000)]
[AArch64] Print instruction before atomic semantic annotations
Commit r353303 added annotations when acquire semantics
were dropped from an instruction.
printAnnotation was called before printInstruction.
So if you didn't set a separate comment output stream
you got <comment><instr> instead of <instr><comment>
as expected.
To fix this move the new printAnnotation to after
the instruction is printed.
Sam Parker [Thu, 21 Feb 2019 09:33:18 +0000 (09:33 +0000)]
[ARM] Negative constants mishandled in ARM CGP
During type promotion, sometimes we convert negative an add with a
negative constant into a sub with a positive constant. The loop that
performs this transformation has two issues:
- it iterates over a set, causing non-determinism.
- it breaks, instead of continuing, when it finds the first
non-negative operand.
Markus Lavin [Thu, 21 Feb 2019 08:20:24 +0000 (08:20 +0000)]
[DebugInfo] Prep llvm-dwarfdump for typed DW5 ops.
Adds llvm-dwarfdump support for pretty printing Dwarf5 expressions ops
that reference a base type (right now only DW_OP_convert is added).
Includes verification to verify that the ops operand is actually a
DW_TAG_base_type DIE.
Wei Mi [Thu, 21 Feb 2019 02:57:52 +0000 (02:57 +0000)]
[Inliner] Pass nullptr for the ORE param of getInlineCost if RemarkEnabled
is false.
Right now for inliner and partial inliner, we always pass the address of a
valid ORE object to getInlineCost even if RemarkEnabled is false because of
no -Rpass is specified. Since ComputeFullInlineCost will be set to true if
ORE is non-null in getInlineCost, this introduces the problem that in
getInlineCost we cannot return early even if we already know the cost is
definitely higher than the threshold. It is a general problem for compile
time.
This patch fixes that by pass nullptr as the ORE argument if RemarkEnabled is
false.
Amara Emerson [Wed, 20 Feb 2019 23:22:15 +0000 (23:22 +0000)]
[GlobalISel] Add -O0 to some tests to see if it fixes them. I can't reproduce the failures locally,
and greendragon also passes, but some other bots fail for reasons I don't understand.
The only difference I can see between these tests is it's missing an -O0
If this doesn't work I'll revert and continue investigating.
Petr Hosek [Wed, 20 Feb 2019 23:06:10 +0000 (23:06 +0000)]
[CMake][runtimes] Set clang-header dependency for builtins
compiler-rt builtins depend on clang headers, but that dependency
wasn't explicitly stated in the build system and we were relying
on the transitive depenendecy via clang. However, when we're
cross-compiling clang, we'll be using host compiler instead and
that depenendecy is missing, breaking the build.
Sam Clegg [Wed, 20 Feb 2019 22:40:57 +0000 (22:40 +0000)]
[WebAssembly] Don't error on conflicting uses of prototype-less functions
When we can't determine with certainty the signature of a function
import we pick the fist signature we find rather than error'ing out.
The resulting program might not do what is expected since we might pick
the wrong signature. However since undefined behavior in C to use the
same function with different signatures this seems better than refusing
to compile such programs.
Amara Emerson [Wed, 20 Feb 2019 22:11:39 +0000 (22:11 +0000)]
[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.
For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.
Currently supports <2 x s64> and <4 x s32> result types.
Craig Topper [Wed, 20 Feb 2019 21:35:05 +0000 (21:35 +0000)]
[X86] Add test cases to show missed opportunities to remove AND mask from BTC/BTS/BTR instructions when LHS of AND has known zeros.
We can currently remove the mask if the immediate has all ones in the LSBs, but if the LHS of the AND is known zero, then the immediate might have had bits removed.
A similar issue also occurs with shifts and rotates. I'm preparing a common fix for all of them.
Craig Topper [Wed, 20 Feb 2019 20:52:26 +0000 (20:52 +0000)]
[SelectionDAG] Teach GetDemandedBits to look at the known zeros of the LHS when handling ISD::AND
If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits.
This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask.
Tom Stellard [Wed, 20 Feb 2019 18:43:45 +0000 (18:43 +0000)]
AArch64/test: Add check for function name to machine-outliner-bad-adrp.mir
Summary:
This test was failing in one of our setups because the generated ModuleID
had the full path of the test file and that path contained the string
BL.
Daniel Sanders [Wed, 20 Feb 2019 18:08:48 +0000 (18:08 +0000)]
Add partial implementation of std::to_address() as llvm::to_address()
Summary:
Following on from the review for D58088, this patch provides the
prerequisite to_address() implementation that's needed to have
pointer_iterator support unique_ptr.
The late bound return should be removed once we move to C++14 to better
align with the C++20 declaration. Also, this implementation can be removed
once we move to C++20 where it's defined as std::to_addres()
The std::pointer_traits<>::to_address(p) variations of these overloads has
not been implemented.
Andrea Di Biagio [Wed, 20 Feb 2019 18:01:49 +0000 (18:01 +0000)]
[MCA][Scheduler] Collect resource pressure and memory dependency bottlenecks.
Every cycle, the Scheduler checks if instructions in the ReadySet can be issued
to the underlying pipelines. If an instruction cannot be issued because one or
more pipeline resources are unavailable, then field
Instruction::CriticalResourceMask is updated with the resource identifier of the
unavailable resources.
If an instruction cannot be promoted from the PendingSet to the ReadySet because
of a memory dependency, then field Instruction::CriticalMemDep is updated with
the identifier of the dependending memory instruction.
Bottleneck information is collected after every cycle for instructions that are
waiting to execute. The idea is to help identify causes of bottlenecks; this
information can be used in future to implement a bottleneck analysis.
Simon Pilgrim [Wed, 20 Feb 2019 17:58:29 +0000 (17:58 +0000)]
[X86][SSE] combineX86ShufflesRecursively - begin generalizing the number of shuffle inputs. NFCI.
We currently bail if the target shuffle decodes to more than 2 input vectors, this is some initial cleanup that still has the limit but generalizes the opindices to an array that will be necessary when we drop the limit.
James Henderson [Wed, 20 Feb 2019 17:21:38 +0000 (17:21 +0000)]
[llvm-readelf]Test a couple of corner-cases for --section-mapping
This patch adds two new tests for edge-case behaviour for --section-
mapping, namely when there are no program headers, and when there are no
section headers.
James Henderson [Wed, 20 Feb 2019 15:13:44 +0000 (15:13 +0000)]
[obj2yaml][yaml2obj]Locate all .yaml and .test tests
A number of the obj2yaml tests end in .yaml, but .yaml is not a default
file type picked up by lit, so these tests weren't being run when
running the testsuite as a whole (they could be run explicitly still).
This change adds a lit local config file to specify the known file types
for obj2yaml tests (.yaml and .test). Additionally, it fixes the
yaml2obj config file to allow both .test and .yaml suffixed tests
(previously, the two tests ending in '.test' were not being run).
Hans Wennborg [Wed, 20 Feb 2019 14:56:31 +0000 (14:56 +0000)]
Speculative buildfix for Mac
Our builds were failing with
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/ARMBuildAttrs.cpp.o
[..]
In file included from /b/c/b/ToTMac/src/third_party/llvm/lib/Support/ARMBuildAttrs.cpp:9:
In file included from /b/c/b/ToTMac/src/third_party/llvm/include/llvm/ADT/StringRef.h:12:
In file included from /b/c/b/ToTMac/src/third_party/llvm/include/llvm/ADT/STLExtras.h:19:
/b/c/b/ToTMac/src/third_party/llvm/include/llvm/ADT/Optional.h:88:25: error: no member named 'addressof' in namespace 'std'
::new ((void *)std::addressof(value)) T(std::forward<Args>(args)...);
~~~~~^
Andrea Di Biagio [Wed, 20 Feb 2019 14:53:18 +0000 (14:53 +0000)]
[MCA][ResourceManager] Add a table that maps processor resource indices to processor resource identifiers.
This patch adds a lookup table to speed up resource queries in the ResourceManager.
This patch also moves helper function 'getResourceStateIndex()' from
ResourceManager.cpp to Support.h, so that we can reuse that logic in the
SummaryView (and potentially other views in llvm-mca).
No functional change intended.
Sanjay Patel [Wed, 20 Feb 2019 14:34:00 +0000 (14:34 +0000)]
[InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201
The previous attempt at this in rL354406 had a logic bug that
actually triggered a regression test failure, but I failed to
notice it the first time.
Petar Avramovic [Wed, 20 Feb 2019 13:42:44 +0000 (13:42 +0000)]
[MIPS MSA] Avoid some DAG combines for vector shifts
DAG combiner combines two shifts into shift + and with bitmask.
Avoid such combines for vectors since leaving two vector shifts
as they are produces better end results.
David Green [Wed, 20 Feb 2019 10:22:18 +0000 (10:22 +0000)]
[Codegen] Remove dead flags on Physical Defs in machine cse
We may leave behind incorrect dead flags on instructions that are CSE'd. Make
sure we remove the dead flags on physical registers to prevent other incorrect
code motion.
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.
We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.
The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)
Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.
patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.
patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)
Mikael Holmen [Wed, 20 Feb 2019 07:14:39 +0000 (07:14 +0000)]
[RegAllocGreedy] Take last chance recoloring into account in split and assign
Summary:
This is a follow-up to r353988 where tryEvict was extended to take last
chance recoloring into account. Now we do the same thing for trySplit and
tryAssign.
Now we always pass a "FixedRegisters" argument to canEvictInterference and
tryEvict so it doesn't need to have a default value anymore.
The need for this was found long ago in an out-of-tree target.
Unfortunately I don't have a reproducer for an in-tree target.
Chijun Sima [Wed, 20 Feb 2019 05:49:01 +0000 (05:49 +0000)]
[DTU] Refine the document of mutation APIs [NFC] (PR40528)
Summary:
It was pointed out in [[ https://bugs.llvm.org/show_bug.cgi?id=40528 | Bug 40528 ]] that it is not clear whether insert/deleteEdge can be used to perform multiple updates and [[ https://reviews.llvm.org/D57316#1388344 | a comment in D57316 ]] reveals that the difference between several ways to update the DominatorTree is confusing.
Craig Topper [Wed, 20 Feb 2019 05:39:11 +0000 (05:39 +0000)]
[X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUs
Summary:
Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag.
Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead.
Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU.
Eric Christopher [Wed, 20 Feb 2019 04:42:07 +0000 (04:42 +0000)]
Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.
Fangrui Song [Wed, 20 Feb 2019 04:39:42 +0000 (04:39 +0000)]
[Dominators] Simplify and optimize path compression used in link-eval forest.
Summary:
* NodeToInfo[*] have been allocated so the addresses are stable. We can store them instead of NodePtr to save NumToNode lookups.
* Nodes are traversed twice. Using `Visited` to check the traversal number is expensive and obscure. Just split the two traversals into two loops explicitly.
* The check `VInInfo.DFSNum < LastLinked` is redundant as it is implied by `VInInfo->Parent < LastLinked`
* VLabelInfo PLabelInfo are used to save a NodeToInfo lookup in the second traversal.
Also add some comments explaining eval().
This shows a ~4.5% improvement (9.8444s -> 9.3996s) on
Kito Cheng [Wed, 20 Feb 2019 03:31:32 +0000 (03:31 +0000)]
[RISCV] Implement pseudo instructions for load/store from a symbol address.
Summary:
Those pseudo-instructions are making load/store instructions able to
load/store from/to a symbol, and its always using PC-relative addressing
to generating a symbol address.
Fangrui Song [Wed, 20 Feb 2019 02:35:24 +0000 (02:35 +0000)]
[Dominators] Delete UpdateLevelsAfterInsertion in edge insertion of depth-based search for release builds
Summary:
After insertion of (From, To), v is affected iff
depth(NCD)+1 < depth(v) && path P from To to v exists where every w on P s.t. depth(v) <= depth(w)
All affected vertices change their idom to NCD.
If a vertex u has changed its depth, it must be a descendant of an
affected vertex v. Its depth must have been updated by UpdateLevel()
called by setIDom() of the first affected ancestor.
So UpdateLevelsAfterInsertion and its bookkeeping variable VisitedNotAffectedQueue are redundant.
Run them only in debug builds as a sanity check.
Summary:
Changes from using a total ordering of known sections to using a
dependency graph approach. This allows our tools to accept and process
binaries that are compliant with the spec and tool conventions that
would have been previously rejected. It also means our own tools can
do less work to enforce an artificially imposed ordering. Using a
general mechanism means fewer special cases and exceptions in the
ordering logic.
Summary:
- Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and
make other operations inherit from them
- Factor the common opcode prefix '0xfe' out from the opcodes into the
common class
- Reorder instructions in the order of increasing opcodes
Heejin Ahn [Wed, 20 Feb 2019 01:14:36 +0000 (01:14 +0000)]
[WebAssembly] Fix load/store name detection for atomic instructions
Summary:
Fixed a bug in the routine in AsmParser that determines whether the
current instruction is a load or a store. Atomic instructions' prefixes
are not `atomic_` but `atomic.`, and all atomic instructions are also
memory instructions. Also fixed the printing format of atomic
instructions to match other memory instructions and added encoding tests
for atomic instructions.
Tom Stellard [Wed, 20 Feb 2019 01:11:05 +0000 (01:11 +0000)]
CMake: Fix stand-alone clang builds since r353268
Summary:
Handle the case where LLVM_MAIN_SRC_DIR is not set and also use
LLVM_CMAKE_DIR for locating installed cmake files rather than
LLVM_CMAKE_PATH.
Philip Reames [Wed, 20 Feb 2019 00:31:28 +0000 (00:31 +0000)]
[GVN] Small tweaks to comments, style, and missed vector handling
Noticed these while doing a final sweep of the code to make sure I hadn't missed anything in my last couple of patches. The (minor) missed optimization was noticed because of the stylistic fix to avoid an overly specific cast.
Bob Haarman [Wed, 20 Feb 2019 00:26:01 +0000 (00:26 +0000)]
[lld-link] preserve @llvm.used symbols in LTO
Summary:
We translate @llvm.used to COFF by generating /include directives
in the .drectve section. However, in LTO links, this happens after
directives have already been processed, so the new directives do
not take effect. This change marks @llvm.used symbols as GCRoots
so that they are preserved as intended.
Yonghong Song [Wed, 20 Feb 2019 00:22:19 +0000 (00:22 +0000)]
[BPF] make test case reloc-btf.ll tolerable for old compilers
The test case reloc-btf.ll is generated with an IR containing
spFlags introduced by https://reviews.llvm.org/rL347806.
In the case of BTF backporting, the old compiler may not
have this patch, so this test will fail during
validation.
This patch removed spFlags from IR in the test case
and used the old way for various flags.
Philip Reames [Wed, 20 Feb 2019 00:15:54 +0000 (00:15 +0000)]
[GVN] Fix last crasher w/non-integral pointers
Same case as for memset and memcpy, but this time for clobbering stores and loads. We still can't allow coercion to or from non-integrals, regardless of the transform.
Now that I'm done the whole little sequence, it seems apparent that we'd entirely missed reasoning about clobbers in the original GVN support for non-integral pointers.
My appologies, I thought we'd upstreamed all of this, but it turns out we were still carrying a downstream hack which hid all of these issues. My chanks to Cherry Zhang for helping debug.
Sanjay Patel [Wed, 20 Feb 2019 00:09:50 +0000 (00:09 +0000)]
[InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201
Philip Reames [Tue, 19 Feb 2019 23:49:38 +0000 (23:49 +0000)]
[GVN] Fix a crash bug w/non-integral pointers and memtransfers
Problem is very similiar to the one fixed for memsets in r354399, we try to coerce a value to non-integral type, and then crash while try to do so. Since we shouldn't be doing such coercions to start with, easy fix. From inspection, I see two other cases which look to be similiar and will follow up with most test cases and fixes if confirmed.
Philip Reames [Tue, 19 Feb 2019 23:19:51 +0000 (23:19 +0000)]
[GVN] Fix a non-integral pointer bug w/vector types
GVN generally doesn't forward structs or array types, but it *will* forward vector types to non-vectors and vice versa. As demonstrated in tests, we need to inhibit the same set of transforms for vector of non-integral pointers as for non-integral pointers themselves.
Philip Reames [Tue, 19 Feb 2019 23:07:15 +0000 (23:07 +0000)]
[GVN] Fix a crash bug around non-integral pointers
If we encountered a location where we tried to forward the value of a memset to a load of a non-integral pointer, we crashed. Such a forward is not legal in general, but we can forward null pointers. Test for both cases are included.
Thomas Lively [Tue, 19 Feb 2019 22:56:19 +0000 (22:56 +0000)]
[WebAssembly] Update MC for bulk memory
Summary:
Rename MemoryIndex to InitFlags and implement logic for determining
data segment layout in ObjectYAML and MC. Also adds a "passive" flag
for the .section assembler directive although this cannot be assembled
yet because the assembler does not support data sections.
Craig Topper [Tue, 19 Feb 2019 22:37:00 +0000 (22:37 +0000)]
[X86] Mark FP32_TO_INT16_IN_MEM/FP32_TO_INT32_IN_MEM/FP32_TO_INT64_IN_MEM as clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR.
After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags.
Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first.
We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this.
Sanjay Patel [Tue, 19 Feb 2019 22:14:21 +0000 (22:14 +0000)]
[InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613