Sanjay Patel [Thu, 28 Mar 2019 14:16:13 +0000 (14:16 +0000)]
[x86] avoid cmov in movmsk reduction
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.
We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.
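As a scalar sketch (illustrative only, not the actual codegen), the
compare+negate idiom materializes an all-ones/all-zeros mask without a cmov:

  #include <cassert>
  #include <cstdint>

  // -(x != 0) is 0 when x == 0 and 0xFFFFFFFF otherwise: a 'set' of the
  // compare result followed by a negate, with no conditional move.
  uint32_t boolToMask(uint32_t x) { return -static_cast<uint32_t>(x != 0); }

  int main() {
    assert(boolToMask(0) == 0);
    assert(boolToMask(42) == 0xFFFFFFFFu);
  }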
There seems to be a missing fold for sext of a bool bit here:
Roman Lebedev [Thu, 28 Mar 2019 13:40:34 +0000 (13:40 +0000)]
[X86] AMD Piledriver (BdVer2): fine-tune some latencies
Based on llvm-exegesis measurements.
Now that llvm-exegesis is ~2 orders of magnitude faster, and is a bit smarter,
it is possible to continue cleanup of the scheduler model.
With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)
Pierre Gousseau [Thu, 28 Mar 2019 10:51:24 +0000 (10:51 +0000)]
[asan] Add -asan-detect-invalid-pointer-cmp and -asan-detect-invalid-pointer-sub options.
This is in preparation for a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.
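For illustration, a hypothetical example (not from the patch) of the kind
of code these options are meant to flag:

  #include <cstddef>

  int main() {
    int a[4], b[4];
    // Relational comparison of pointers into unrelated objects:
    // the target of -asan-detect-invalid-pointer-cmp.
    bool before = a < b;
    // Subtraction of pointers into unrelated objects:
    // the target of -asan-detect-invalid-pointer-sub.
    std::ptrdiff_t dist = (a + 1) - b;
    return before && dist != 0;
  }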
Diana Picus [Thu, 28 Mar 2019 09:09:27 +0000 (09:09 +0000)]
[ARM GlobalISel] Fix selection of G_SELECT
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
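A scalar sketch of the difference (illustrative C++, not the generated
ARM code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // The condition is a 1-bit value in a 32-bit register; everything
    // above bit 0 is undefined and must be ignored.
    uint32_t cond = 0x1110;            // bit 0 is clear => logically false
    bool cmpri = (cond != 0);          // CMPri against 0: true (wrong)
    bool tstri = (cond & 0x1) != 0;    // TSTri against 0x1: false (correct)
    assert(cmpri && !tstri);
  }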
Roman Lebedev [Thu, 28 Mar 2019 08:55:01 +0000 (08:55 +0000)]
[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, and add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of the dbscan clustering algorithm,
but it is rather horribly broken for the purpose of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter what
the LLVM values are, this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into a different sched cluster.
But how do I do that? Out of the 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go into the `.yaml` and look it up manually.
The trivial solution is, when creating clusters, not to use the full dbscan algorithm,
but instead to "pick some unclustered point, pick all unclustered points that are its neighbors,
put them all into a new cluster, repeat". As it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analysis mode to operate in non-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
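A minimal sketch of that scheme (invented names, not the actual
llvm-exegesis code):

  #include <cmath>
  #include <cstddef>
  #include <map>
  #include <vector>

  struct Point {
    unsigned Opcode;
    double Measurement;
  };

  // One cluster per opcode; an opcode is stable only if every point in
  // its cluster is a neighbor of every other point in the cluster.
  std::map<unsigned, bool> naiveCluster(const std::vector<Point> &Points,
                                        double Epsilon) {
    std::map<unsigned, std::vector<double>> Clusters;
    for (const Point &P : Points)
      Clusters[P.Opcode].push_back(P.Measurement);

    std::map<unsigned, bool> Stable;
    for (const auto &[Opcode, Vals] : Clusters) {
      bool OK = true;
      for (std::size_t I = 0; I < Vals.size() && OK; ++I)
        for (std::size_t J = I + 1; J < Vals.size() && OK; ++J)
          OK = std::abs(Vals[I] - Vals[J]) <= Epsilon;
      Stable[Opcode] = OK;
    }
    return Stable;
  }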
This is //yet another// step to bring me closer to being able to continue cleanup of the bdver2 sched model.
Piotr Sobczak [Thu, 28 Mar 2019 07:06:26 +0000 (07:06 +0000)]
[SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
* one test containing a buffer load, where part of the offset
computation is placed in the predecessor of the load
* similar test, but containing two buffer loads and shared
computations
Please note that the behaviour being tested will be updated in
a subsequent commit.
This commit was extracted from https://reviews.llvm.org/D59535.
Chandler Carruth [Thu, 28 Mar 2019 00:51:36 +0000 (00:51 +0000)]
[NewPM] Fix a nasty bug with analysis invalidation in the new PM.
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.
However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly while it keeps coming off the worklist. It's unclear how
important this use case really is, but I wanted to call it out.
Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.
The motivating test case now works cleanly and is added for
ArgumentPromotion.
Sanjay Patel [Wed, 27 Mar 2019 22:42:11 +0000 (22:42 +0000)]
[x86] improve AVX lowering of vector zext
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.
I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.
From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.
Sanjay Patel [Wed, 27 Mar 2019 22:24:03 +0000 (22:24 +0000)]
[x86] look through bitcast operand of MOVMSK
This is not exactly NFC because it should make further combines
of MOVMSK easier to match, but there should be no outward differences
because we have isel patterns in place specifically to allow this. See:
// Also support integer VTs to avoid a int->fp bitcast in the DAG.
Craig Topper [Wed, 27 Mar 2019 21:05:07 +0000 (21:05 +0000)]
[X86ISelDAGToDAG] Move initialization of OptForSize and OptForMinSize from PreprocessISelDAG to runOnMachineFunction. NFCI
This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overridden when these cached values were originally created.
Justin Bogner [Wed, 27 Mar 2019 20:35:56 +0000 (20:35 +0000)]
[LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Nikita Popov [Wed, 27 Mar 2019 20:18:51 +0000 (20:18 +0000)]
[ConstantRangeTest] Add exhaustive intersectWith() test
Add a test that checks the intersectWith() implementation against
all 4-bit range pairs. The test uses a more explicit way of
calculating the possible intersections, and checks that the right
one is picked out according to the smallest set heuristic.
This is in preparation for introducing intersectWith() variants that
use different heuristics to pick an intersection range, if there are
multiple possibilities.
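For a concrete example of the operation under test (values assumed for
illustration), using 4-bit ranges:

  #include "llvm/IR/ConstantRange.h"
  using namespace llvm;

  ConstantRange example() {
    // [4, 12) holds {4..11}; [8, 2) wraps and holds {8..15, 0, 1}.
    // The exact intersection here is {8..11}, i.e. [8, 12); when the
    // exact result is not representable as a single range, the
    // smallest-set heuristic decides which candidate is returned.
    ConstantRange A(APInt(4, 4), APInt(4, 12));
    ConstantRange B(APInt(4, 8), APInt(4, 2));
    return A.intersectWith(B); // [8, 12)
  }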
Evgeniy Stepanov [Wed, 27 Mar 2019 20:15:08 +0000 (20:15 +0000)]
Fix llvm-rc tests.
Summary:
Follow-up for D56743.
* Add more "--" in llvm-rc invocations.
* Add llvm-rc to the tools list. This substitutes the full path to llvm-rc in test
RUN lines (as shown by llvm-lit -v), making them copy-pasteable.
Nikita Popov [Wed, 27 Mar 2019 19:12:09 +0000 (19:12 +0000)]
[ConstantRange] Add isWrappedSet() and isUpperSignWrapped()
Split off from D59749. This adds isWrappedSet() and
isUpperSignWrapped(), with the same behavior as isSignWrappedSet()
and isUpperWrapped() respectively, but for the other domain.
The methods isWrappedSet() and isSignWrappedSet() will not consider
ranges of the form [X, Max] == [X, 0) and [X, SignedMax] == [X, SignedMin)
to be wrapping, while isUpperWrapped() and isUpperSignWrapped() will.
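For instance, with 8-bit ranges (a sketch, assuming these semantics):

  #include "llvm/IR/ConstantRange.h"
  using namespace llvm;

  void wrappedExample() {
    // [250, 0) is the half-open form of [250, 255]: the set {250..255}
    // does not actually wrap past zero.
    ConstantRange CR(APInt(8, 250), APInt(8, 0));
    bool UW = CR.isUpperWrapped(); // true: Upper is u< Lower
    bool W = CR.isWrappedSet();    // false: the set itself does not wrap
    (void)UW; (void)W;
  }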
Also replace the checks in getUnsignedMin() and friends with method
calls that implement the same logic.
Teresa Johnson [Wed, 27 Mar 2019 18:44:25 +0000 (18:44 +0000)]
[CGP] Reset DT when optimizing select instructions
Summary:
A recent fix (r355751) caused a compile time regression because setting
the ModifiedDT flag in optimizeSelectInst means that each time a select
instruction is optimized the function walk in runOnFunction stops and
restarts again (which was needed to build a new DT before we started
building it lazily in r356937). Now that the DT is built lazily, a
simple fix is to just reset the DT at this point, rather than restarting
the whole function walk.
In the future other places that set ModifiedDT may want to switch to
just resetting the DT directly. But that will require an evaluation to
ensure that they don't otherwise need to restart the function walk.
Eli Friedman [Wed, 27 Mar 2019 18:33:30 +0000 (18:33 +0000)]
[ARM] Don't confuse the scheduler for very large VLDMDIA etc.
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations. This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.
Nikita Popov [Wed, 27 Mar 2019 18:19:33 +0000 (18:19 +0000)]
[ConstantRange] Rename isWrappedSet() to isUpperWrapped()
Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().
This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
Jessica Paquette [Wed, 27 Mar 2019 18:14:32 +0000 (18:14 +0000)]
[opt-viewer] Make filter_=None by default in get_remarks and gather_results
Right now, if you try to use optdiff.py on any opt records, it will fail because
its calls to gather_results weren't updated to support filtering.
Since filters are supposed to be optional, this makes them None by default in
get_remarks and in gather_results. This allows other tools that don't support
filtering to still use the functions as is.
Nikita Popov [Wed, 27 Mar 2019 17:56:15 +0000 (17:56 +0000)]
[InstCombine] Use uadd.sat and usub.sat for canonicalization
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Amara Emerson [Wed, 27 Mar 2019 17:47:42 +0000 (17:47 +0000)]
[GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto a list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).
This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.
As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.
Matt Arsenault [Wed, 27 Mar 2019 17:31:29 +0000 (17:31 +0000)]
Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
Craig Topper [Wed, 27 Mar 2019 17:29:34 +0000 (17:29 +0000)]
[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the registers are not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cycle latency and 0.5 cycle reciprocal throughput for any SHLD.
When the FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.
This patch adds a pseudo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.
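For reference, a scalar model of why same-source SHLD is a rotate
(illustrative, with imm assumed in [1, 63]):

  #include <cassert>
  #include <cstdint>

  // shld dst, src, imm computes (dst << imm) | (src >> (64 - imm));
  // with dst == src this is exactly a rotate-left by imm.
  uint64_t shld(uint64_t dst, uint64_t src, unsigned imm) {
    return (dst << imm) | (src >> (64 - imm));
  }

  int main() {
    uint64_t x = 0x0123456789ABCDEFull;
    assert(shld(x, x, 8) == ((x << 8) | (x >> 56))); // rotl(x, 8)
  }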
Quentin Colombet [Wed, 27 Mar 2019 17:27:56 +0000 (17:27 +0000)]
[PeepholeOpt] Don't stop simplifying copies on sequence of subregs
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intent of this check was to not track values whenever
we have to compose subregisters, but what the check was actually
doing was bailing any time we saw a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of the first subreg).
Sander de Smalen [Wed, 27 Mar 2019 17:23:38 +0000 (17:23 +0000)]
[AArch64][SVE] Asm: error on unexpected SVE vector register type suffix
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.
The following are examples of what was previously valid:
Currently this is called before the frame size is set on the
function. For AMDGPU, the scavenger is used for large frames where
part of the offset needs to be materialized in a register, so
estimating the frame size is useful for knowing whether the scavenger
is useful.
Matt Arsenault [Wed, 27 Mar 2019 16:12:26 +0000 (16:12 +0000)]
MIR: Freeze reserved regs after parsing everything
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.
Some tests were relying on the default reserved set for the assumed
default calling convention.
Yonghong Song [Wed, 27 Mar 2019 15:45:27 +0000 (15:45 +0000)]
[BPF] use std::map to ensure consistent output
The .BTF.ext FuncInfoTable and LineInfoTable contain
information organized per ELF section. Current definition
of FuncInfoTable/LineInfoTable is:
  std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable
  std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable
where the key is the section name offset in the string table.
The unordered_map may cause the order of the section output to
differ across platforms.
The same holds for the unordered_map definition of
  std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries
where BTF_KIND_DATASEC entries may have different ordering
for different platforms.
This patch fixes the issue by using std::map.
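Illustration of the difference (invented data, not the actual BPF code):

  #include <cstdint>
  #include <cstdio>
  #include <map>
  #include <string>

  int main() {
    // std::map iterates keys in sorted order, so the emission order no
    // longer depends on the platform's hash/bucket implementation.
    std::map<uint32_t, std::string> FuncInfoTable = {
        {24, "func info for section at name offset 24"},
        {7, "func info for section at name offset 7"}};
    for (const auto &[SecNameOff, Info] : FuncInfoTable)
      std::printf("%u: %s\n", SecNameOff, Info.c_str()); // 7 always first
  }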
Test static-var-derived-type.ll is modified to generate two
DataSec's which will ensure the ordering is the same for all
supported platforms.
Signed-off-by: Yonghong Song <yhs@fb.com>
Andrea Di Biagio [Wed, 27 Mar 2019 15:41:53 +0000 (15:41 +0000)]
[MCA][Pipeline] Don't visit stages in reverse order when calling method cycleEnd(). NFCI
There is no reason why stages should be visited in reverse order.
This patch allows the definition of stages that push instructions forward from
their cycleEnd() routine.
Hans Wennborg [Wed, 27 Mar 2019 14:10:11 +0000 (14:10 +0000)]
Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default"
Original commit by Ayonam Ray.
This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.
This addresses the missing optimization in PR41242.
> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
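A small hypothetical example of code that benefits:

  // When the default is unreachable, the bounds check normally emitted
  // before indexing into the jump table can be dropped.
  int dispatch(int x) {
    switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    default: __builtin_unreachable(); // caller guarantees x is in [0, 3]
    }
  }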
Kevin P. Neal [Wed, 27 Mar 2019 13:30:57 +0000 (13:30 +0000)]
The IR verifier currently supports the constrained floating point intrinsics,
but the implementation is hard to extend. It doesn't currently have an
easy way to support intrinsics that, for example, lack a rounding mode.
This will be needed for impending new constrained intrinsics.
This code is split out of D55897 <https://reviews.llvm.org/D55897>, which
itself was split out of D43515 <https://reviews.llvm.org/D43515>.
Sander de Smalen [Wed, 27 Mar 2019 13:16:19 +0000 (13:16 +0000)]
[AArch64] NFC: Cleanup isAArch64FrameOffsetLegal
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.
Andrew Ng [Wed, 27 Mar 2019 10:26:21 +0000 (10:26 +0000)]
[Support] MemoryBlock size should reflect the requested size
This patch mirrors the change made to the Unix equivalent in
r351916. This in turn fixes bugs related to the use of FileOutputBuffer
to output to "-", i.e. stdout, on Windows.
Simon Pilgrim [Wed, 27 Mar 2019 10:25:02 +0000 (10:25 +0000)]
Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.
This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
........
Causes PR41249
Jonas Paulsson [Wed, 27 Mar 2019 08:41:46 +0000 (08:41 +0000)]
[DAGCombiner] Don't allow addcarry if the carry producer is illegal.
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.
Fangrui Song [Wed, 27 Mar 2019 08:19:36 +0000 (08:19 +0000)]
[llvm-dwarfdump] Simplify -o handling
ToolOutputFile handles '-' so no need to specialize here.
Also, we neither reassign the variable nor pass it around, thus no need
to use std::unique_ptr<ToolOutputFile>.
exit(1) -> return 1; to call the destructor of the underlying raw_fd_ostream
Craig Topper [Wed, 27 Mar 2019 06:07:05 +0000 (06:07 +0000)]
[X86] Add test cases for missed opportunities in (x << C1) op C2 to (x op (C2>>C1)) << C1 transform.
We handle the case where C2 does not fit in a signed 32-bit immediate, but
(C2>>C1) does. But there are also some 64-bit opportunities when C2 is not an unsigned
32-bit immediate, but (C2>>C1) is. For OR/XOR this allows us to load the
immediate with MOV32ri instead of a movabsq. For AND it allows us to use a
32-bit AND and fold the immediate.
Craig Topper [Wed, 27 Mar 2019 04:45:58 +0000 (04:45 +0000)]
[X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << C1, go through the isel table instead of manually selecting.
Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables.
Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table.
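The rewrite rests on a simple identity; a quick check (sketch, not the
isel code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // (x << C1) op C2 == (x op (C2 >> C1)) << C1, unconditionally for
    // AND, and for OR/XOR when the low C1 bits of C2 are zero.
    // E.g. C1 = 8, C2 = 0xFF00000000: C2 needs a movabsq, but
    // C2 >> C1 = 0xFF000000 fits in a MOV32ri.
    uint64_t x = 0x123456789ABCDEFull;
    assert(((x << 8) & 0xFF00000000u) == ((x & 0xFF000000u) << 8));
    assert(((x << 8) | 0xAB00u) == ((x | 0xABu) << 8));
  }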
QingShan Zhang [Wed, 27 Mar 2019 03:50:16 +0000 (03:50 +0000)]
[NFC][PowerPC] Custom PowerPC specific machine-scheduler
This patch lays the groundwork for extending the generic machine scheduler by providing a PPC-specific implementation.
There are no functional changes as this is an incremental patch that simply provides the necessary overrides which just
encapsulate the behavior of the generic scheduler. Subsequent patches will add specific behavior.
Craig Topper [Wed, 27 Mar 2019 02:08:03 +0000 (02:08 +0000)]
[X86] Simplify some code in matchBitExtract by using ANY_EXTEND.
We were manually outputting the code we would get from selecting ANY_EXTEND. We
can save some code by just letting an ANY_EXTEND go through isel on its own.
[Remarks] Emit a section containing remark diagnostics metadata
A section containing metadata on remark diagnostics will be emitted if
the flag (-mllvm) -remarks-section is present.
For now, the metadata is:
* a magic number for remarks: "REMARKS\0"
* the version number: a little-endian uint64_t
* the absolute file path to the serialized remark diagnostics: a
null-terminated string.
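A sketch of a reader for this layout (assumed helper, not part of the
patch):

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>

  // Parse the metadata: 8-byte magic, little-endian uint64_t version,
  // then a null-terminated absolute path.
  bool parseRemarksMeta(const char *Buf, std::size_t Size,
                        uint64_t &Version, std::string &Path) {
    static const char Magic[8] = {'R', 'E', 'M', 'A', 'R', 'K', 'S', '\0'};
    if (Size < sizeof(Magic) + sizeof(uint64_t) + 1 ||
        std::memcmp(Buf, Magic, sizeof(Magic)) != 0)
      return false;
    Version = 0;
    for (unsigned I = 0; I < 8; ++I) // decode little-endian
      Version |= uint64_t(uint8_t(Buf[sizeof(Magic) + I])) << (8 * I);
    // Assumes the path is null-terminated within the Size-byte buffer.
    Path.assign(Buf + sizeof(Magic) + sizeof(uint64_t));
    return true;
  }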
Nikita Popov [Tue, 26 Mar 2019 22:37:26 +0000 (22:37 +0000)]
[ConstantRange] Exclude full set from isSignWrappedSet()
Split off from D59749. This uses a simpler and more efficient
implementation of isSignWrappedSet(), and considers full sets
as non-wrapped, to be consistent with isWrappedSet(). Otherwise
the behavior is unchanged.
There are currently only two users of this function and both already
check for isFullSet() || isSignWrappedSet(), so this is not going to
cause a change in overall behavior.
Shoaib Meenai [Tue, 26 Mar 2019 22:16:53 +0000 (22:16 +0000)]
[cmake] Reset variable before using it
A bunch of macros use the same variable name, and since CMake macros
don't get their own scope, the value persists across macro invocations,
and we can end up exporting targets which shouldn't be exported. Clear
the variable before each use to avoid this.
Converting these macros to functions would also help, since it would
avoid the variable leaking into its parent scope, and that's something I
plan to follow up with. It won't fully address the problem, however,
since functions still inherit variables from their parent scopes, so if
someone in the parent scope just happened to use the same variable name
we'd still have the same issue.
Quentin Colombet [Tue, 26 Mar 2019 21:27:15 +0000 (21:27 +0000)]
[LiveRange] Reset the VNIs when splitting subranges
When splitting a subrange we end up with two different subranges covering
two different, non-overlapping lanes.
As part of this splitting the VNIs of the original live-range need
to be dispatched to the subranges according to which lanes they are
actually defining.
Prior to this patch we were assuming that all values were defining
all lanes. This was wrong as demonstrated by llvm.org/PR40835.
Sanjay Patel [Tue, 26 Mar 2019 20:54:15 +0000 (20:54 +0000)]
[SDAG] add simplifications for FP at node creation time
We have the folds for fadd/fsub/fmul already in DAGCombiner,
so it may be possible to remove that code if we can guarantee that
these ops are zapped before they can exist.
Ali Tamur [Tue, 26 Mar 2019 20:05:27 +0000 (20:05 +0000)]
Revert "[llvm] Reapply "Prevent duplicate files in debug line header in dwarf 5.""
This reverts commit rL357020.
The commit broke the test llvm/test/tools/llvm-objdump/embedded-source.test
on some builds including clang-ppc64be-linux-multistage,
clang-s390x-linux, clang-with-lto-ubuntu, clang-x64-windows-msvc,
llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast (and others).
Sam Clegg [Tue, 26 Mar 2019 19:46:15 +0000 (19:46 +0000)]
[WebAssembly] Initial implementation of PIC code generation
This change implements lowering of references to global symbols in PIC
mode, using a
new @GOT reference type. @GOT references can be used with function or
data symbol names combined with the get_global instruction. In this case
the linker will insert the wasm global that stores the address of the
symbol (either in memory for data symbols or in the wasm table for
function symbols).
For now I'm continuing to use the R_WASM_GLOBAL_INDEX_LEB relocation
type for this type of reference which means that this relocation type
can refer to either a global or a function or data symbol. We could
choose to introduce specific relocation types for GOT entries in the
future. See the current dynamic linking proposal: