[Nios2] final infrastructure to provide compilation of a return from a function
This patch includes all the missing functionality needed for the first
compilation of a simple program that just returns from a function.
I've added a test case that checks that a "ret" instruction is printed in
the assembly output.
Patch by Andrei Grischenko (andrei.l.grischenko@intel.com)
Differential revision: https://reviews.llvm.org/D39688
[dsymutil] Add -verify option to run DWARF verifier after linking.
This patch adds support for running the DWARF verifier on the linked
debug info files. If the -verify option is specified and verification
fails, dsymutil exits with a non-zero exit code. This behavior
is *not* enabled by default.
Pavel Labath [Thu, 7 Dec 2017 10:54:23 +0000 (10:54 +0000)]
[Testing/Support] Make matchers work with Expected<T&>
Summary:
This did not work because the ExpectedHolder was trying to hold the
value in an Optional<T*>. Instead of trying to mimic the behavior of
Expected and make ExpectedHolder work with both references and
non-references, I simply store the reference to the Expected object in
the holder.
I also add a bunch of tests for these matchers, which have helped me
flush out some problems in my initial implementation of this patch, and
uncovered the fact that we are not consistent in quoting our values in
the matcher output (which I also fix).
Alex Bradbury [Thu, 7 Dec 2017 10:46:23 +0000 (10:46 +0000)]
[RISCV] MC layer support for the standard RV32D instruction set extension
As the FPR32 and FPR64 registers have the same names, use
validateTargetOperandClass in RISCVAsmParser to coerce a parsed FPR32 to an
FPR64 when necessary. The rest of this patch is very similar to the RV32F
patch.
[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.
For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.
Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).
Alex Bradbury [Thu, 7 Dec 2017 10:26:05 +0000 (10:26 +0000)]
[RISCV] MC layer support for the standard RV32F instruction set extension
The most interesting part of this patch is probably the handling of
rounding mode arguments. Sadly, the RISC-V assembler handles floating point
rounding modes as a special "argument" when it would be more consistent to
handle them as opcode suffixes, like the atomics. This patch supports parsing
this optional parameter, using InstAlias to allow parsing these floating point
instructions when no rounding mode is specified.
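The optional rounding-mode operand can be pictured with a small stand-alone
sketch. This is not the RISCVAsmParser code: both function names are
hypothetical, and falling back to "dyn" when the operand is omitted is an
assumption made for illustration. The encodings are the frm values from the
RISC-V F extension.

```cpp
#include <cassert>
#include <cstring>

// Returns the 3-bit frm encoding for a rounding-mode mnemonic,
// or -1 if the mnemonic is not a known rounding mode.
inline int encodeRoundingMode(const char *Name) {
  struct Entry {
    const char *Name;
    int Encoding;
  };
  static const Entry Table[] = {{"rne", 0}, {"rtz", 1}, {"rdn", 2},
                                {"rup", 3}, {"rmm", 4}, {"dyn", 7}};
  for (const Entry &E : Table)
    if (std::strcmp(Name, E.Name) == 0)
      return E.Encoding;
  return -1;
}

// Hypothetical helper: Operand == nullptr models an instruction written
// without a rounding-mode operand. Assumption: the default is "dyn".
inline int roundingModeOrDefault(const char *Operand) {
  return encodeRoundingMode(Operand ? Operand : "dyn");
}
```

In this toy model the alias simply fills in the missing operand, so the rest
of the encoder only ever sees the full form.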
Alex Bradbury [Thu, 7 Dec 2017 09:51:55 +0000 (09:51 +0000)]
[TableGen] Give the option of tolerating duplicate register names
A number of architectures re-use the same register names (e.g. for both 32-bit
FPRs and 64-bit FPRs). They are currently unable to use the tablegen'erated
MatchRegisterName and MatchRegisterAltName, as tablegen (when built with
asserts enabled) will fail.
When AllowDuplicateRegisterNames is set in AsmParser, duplicated register
names will be tolerated. A backend can then coerce registers to the desired
register class by (for instance) implementing validateTargetOperandClass.
At least the in-tree Sparc backend could benefit from this, as does RISC-V
(single and double precision floating point registers).
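A toy model of the coercion idea, assuming a name such as "f5" first resolves
to the 32-bit class and is then widened when the instruction expects the
64-bit class. The types and both functions here are hypothetical stand-ins,
not the tablegen'erated matcher or the real validateTargetOperandClass hook.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class RegClass { FPR32, FPR64 };

struct Reg {
  RegClass Class;
  unsigned Index;
};

// Hypothetical name matcher: "f0".."f31" always resolve to FPR32 first,
// even though the same names are also valid for FPR64.
inline bool matchRegisterName(const std::string &Name, Reg &Out) {
  if (Name.size() < 2 || Name[0] != 'f')
    return false;
  unsigned Index = 0;
  for (std::size_t I = 1; I < Name.size(); ++I) {
    if (Name[I] < '0' || Name[I] > '9')
      return false;
    Index = Index * 10 + static_cast<unsigned>(Name[I] - '0');
    if (Index > 31)
      return false;
  }
  Out = {RegClass::FPR32, Index};
  return true;
}

// Hypothetical coercion step: accept the operand if it already has the
// expected class, or widen an FPR32 to the FPR64 with the same index.
inline bool coerceToClass(Reg &R, RegClass Expected) {
  if (R.Class == Expected)
    return true;
  if (R.Class == RegClass::FPR32 && Expected == RegClass::FPR64) {
    R.Class = RegClass::FPR64;
    return true;
  }
  return false;
}
```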
Gadi Haber [Thu, 7 Dec 2017 09:16:34 +0000 (09:16 +0000)]
[X86][FMA][FMA4]: Adding full coverage of MC encoding for the FMA, FMA4 isa sets.<NFC>
NFC.
Adding MC regression tests to cover the FMA and FMA4 ISA sets.
This patch is part of a larger task to cover MC encoding of all X86 ISA sets, starting with revision https://reviews.llvm.org/D39952
Gadi Haber [Thu, 7 Dec 2017 09:00:19 +0000 (09:00 +0000)]
[X86][X87]: Adding full coverage of MC encoding for all X87 ISA Sets.<NFC>
NFC.
Currently, not all the X86 ISA sets are covered by the MC regression tests for X86.
Full coverage needs to be added for each ISA set and for both 32-bit and 64-bit instructions and registers.
This patch includes MC assembly tests for X87 in 32-bit and 64-bit modes.
Sam Clegg [Thu, 7 Dec 2017 02:55:51 +0000 (02:55 +0000)]
[WebAssembly] section kind can be code
Currently, when creating a named section, the Wasm
frontend forces it to use `SectionKind::Data`, whereas
in fact C++ does generate code sections with custom
names.
Dan Gohman [Wed, 6 Dec 2017 23:57:11 +0000 (23:57 +0000)]
[WebAssembly] Import the linear memory and function table.
Instead of having .o files contain linear-memory and function table
definitions, use imports. This is more consistent with the stack pointer
being imported, and it's consistent with the linker being the one to
decide whether linear memory and function table are imported or defined
in the linked output. This implements tool-conventions #23.
Florian Hahn [Wed, 6 Dec 2017 22:48:36 +0000 (22:48 +0000)]
[AArch64] Add patterns to replace fsub fmul with fma fneg.
Summary:
This patch adds MachineCombiner patterns for transforming
(fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower
latency on microarchitectures where fneg is cheap.
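The algebraic identity behind the pattern, written as plain C++ rather than
as MachineCombiner code (the two function names are hypothetical). Note that
the fused form rounds only once, so for inputs whose product is not exactly
representable the two results may differ in the last bit.

```cpp
#include <cassert>
#include <cmath>

// (fsub (fmul x y) z): multiply, then subtract.
inline double fsubOfFmul(double X, double Y, double Z) { return X * Y - Z; }

// (fma x y (fneg z)): negate z, then a single fused multiply-add.
inline double fmaWithFneg(double X, double Y, double Z) {
  return std::fma(X, Y, -Z);
}
```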
Matthew Simpson [Wed, 6 Dec 2017 21:22:54 +0000 (21:22 +0000)]
[PGO] Make indirect call promotion a utility
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Florian Hahn [Wed, 6 Dec 2017 20:27:33 +0000 (20:27 +0000)]
[MachineCombiner] Add up latencies of all instructions in new pattern.
Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.
Note that we could be slightly more precise than just summing up
all latencies. For example, consider a pattern like
r1 = INS1 ..
r2 = INS2 ..
r3 = INS3 r1, r2
I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
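The two estimates discussed in this message, as a stand-alone sketch. The
function names are hypothetical and the latencies are plain integers here;
the real code queries the scheduling model.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Estimate described in this patch: add up the latencies of all newly
// inserted instructions.
inline unsigned summedLatency(const std::vector<unsigned> &Latencies) {
  return std::accumulate(Latencies.begin(), Latencies.end(), 0u);
}

// Critical-path estimate for the three-instruction example: INS1 and INS2
// are independent, and INS3 consumes both of their results.
inline unsigned criticalPathLatency(unsigned Lat1, unsigned Lat2,
                                    unsigned Lat3) {
  return Lat3 + std::max(Lat1, Lat2);
}
```

With latencies 2, 3 and 4 for INS1, INS2 and INS3, the sum is 9 while the
critical path is 7, which is the gap the note above refers to.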
Alina Sbirlea [Wed, 6 Dec 2017 19:56:37 +0000 (19:56 +0000)]
[ModRefInfo] Do not use ModRefInfo result in if conditions as this makes
assumptions about the values in the enum. Replace with wrapper returning
bool [NFC].
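A minimal sketch of the wrapper idea, using a hypothetical stand-in enum
rather than the real ModRefInfo definition. Callers ask a named predicate
instead of testing the enum value directly in an if condition, so the
numeric values can change later without touching the call sites.

```cpp
#include <cassert>

// Hypothetical stand-in for the enum; the numeric values are an assumption.
enum class ModRefKind : unsigned { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };

inline bool isModSet(ModRefKind K) {
  return (static_cast<unsigned>(K) & static_cast<unsigned>(ModRefKind::Mod)) != 0;
}

inline bool isRefSet(ModRefKind K) {
  return (static_cast<unsigned>(K) & static_cast<unsigned>(ModRefKind::Ref)) != 0;
}

inline bool isModOrRefSet(ModRefKind K) { return isModSet(K) || isRefSet(K); }
```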
Rui Ueyama [Wed, 6 Dec 2017 19:18:24 +0000 (19:18 +0000)]
[COFF] Ignore semicolons in module definition identifiers
Patch by David Major.
The NSS project's .def files make heavy use of semicolons in a
frightening attempt at portability:
https://hg.mozilla.org/projects/nss/raw-file/tip/lib/ckfw/capi/nsscapi.def
lld-link was treating the semicolon as part of the export name,
resulting in unresolved symbols. This patch includes ';' in the list of
characters to split on.
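A rough illustration of the effect, assuming a tokenizer that reads an
identifier up to the next delimiter. The delimiter set and both function
names are hypothetical; only the addition of ';' reflects the change
described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical delimiter set; ';' is the character this patch adds.
inline bool isDelimiter(char C) {
  return C == ' ' || C == '\t' || C == '\r' || C == '\n' || C == '=' ||
         C == ',' || C == ';';
}

// Reads one identifier starting at Pos and advances Pos past it.
inline std::string readIdentifier(const std::string &Line, std::size_t &Pos) {
  std::size_t Start = Pos;
  while (Pos < Line.size() && !isDelimiter(Line[Pos]))
    ++Pos;
  return Line.substr(Start, Pos - Start);
}
```

With ';' in the set, an input such as "exported_name;" yields the identifier
"exported_name" rather than a name with a trailing semicolon.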
Craig Topper [Wed, 6 Dec 2017 18:40:46 +0000 (18:40 +0000)]
[X86] Simplify the TTI code for getInterleavedMemoryOpCost around AVX512BW. NFCI
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.
This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.
Shoaib Meenai [Wed, 6 Dec 2017 18:33:07 +0000 (18:33 +0000)]
[cmake] Remove unnecessary header include in atomics check
The header include was required to work around PR19898, as noted in that
comment. That PR has since been marked resolved fixed, and the
configuration check passes without the header inclusion both when
compiling on Windows with cl and when cross-compiling on Linux using
clang-cl.
I noticed this because the inclusion was cased incorrectly (Intrin.h
instead of intrin.h), which when cross-compiling on a case sensitive
file system would cause the intrin.h from the Windows SDK to be included
(which LLVM can't handle) instead of the one from clang's resource
directory, making the check fail. This is the same issue as r309980.
Correcting the case of the inclusion makes the check pass when cross
compiling, but it seems better to get rid of the inclusion entirely,
since it appears to be unnecessary now.
Adam Nemet [Wed, 6 Dec 2017 16:50:50 +0000 (16:50 +0000)]
[opt-viewer] Suppress noisy Swift remarks
Most likely, this is not how we want to handle this in the long term. This
code should probably be in the Swift repo and somehow plugged into the
opt-viewer. However, this is still very experimental, so I don't want to
over-engineer it at this point.
Nirav Dave [Wed, 6 Dec 2017 15:30:13 +0000 (15:30 +0000)]
[ARM][AArch64][DAG] Reenable post-legalize store merge
Reenable post-legalize stores with constant merging computation and
corresponding test case.
* Properly truncate store merge constants
* Disable merging of truncated floating-point stores
* Ensure merges of constant stores into a single vector are
constructed from legal elements.
Jonas Paulsson [Wed, 6 Dec 2017 13:53:24 +0000 (13:53 +0000)]
[SystemZ] Bugfix in expandRxSBG()
Csmith discovered a program that caused wrong code generation with -O0:
When handling a SIGN_EXTEND in expandRxSBG(), RxSBG.BitSize may be less than
the Input width (if a truncate was previously traversed), so maskMatters()
should be called with a mask based on the width of the sign extend result
instead.
Max Kazantsev [Wed, 6 Dec 2017 12:44:56 +0000 (12:44 +0000)]
[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs
Lexicographical comparison of SCEV trees is potentially expensive for big
expression trees. We can define ordering between them for AddRecs and
N-ary operations by SCEV NoWrap flags to make the non-equality check
cheaper.
This change does not prevent grouping equivalent SCEVs together and is
not supposed to have any meaningful impact on behavior of any transforms.
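The ordering trick in miniature, with a hypothetical Expr type standing in
for a SCEV node. The cheap flag comparison runs first, and the potentially
expensive lexicographical comparison only runs when the flags are equal. The
DeepComparisons counter exists only to make that visible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an expression node.
struct Expr {
  unsigned NoWrapFlags;
  std::vector<int> Operands;
};

// Returns a negative value, zero, or a positive value, like a three-way
// comparison. DeepComparisons is incremented each time the expensive
// operand-by-operand comparison is actually needed.
inline int compareExprs(const Expr &L, const Expr &R,
                        unsigned &DeepComparisons) {
  // Cheap check first: differing flags already define an ordering.
  if (L.NoWrapFlags != R.NoWrapFlags)
    return L.NoWrapFlags < R.NoWrapFlags ? -1 : 1;
  // Fall back to the lexicographical comparison of the operands.
  ++DeepComparisons;
  if (L.Operands < R.Operands)
    return -1;
  if (R.Operands < L.Operands)
    return 1;
  return 0;
}
```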
Max Kazantsev [Wed, 6 Dec 2017 08:58:16 +0000 (08:58 +0000)]
[SCEV][NFC] Share value cache between SCEVs in GroupByComplexity
Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s:
every time it sees one, it creates a new value cache and tries to prove equality of two values using it.
This cache reallocates and gets lost from SCEV to SCEV.
This patch changes this behavior: now we create one cache for all values and share it between SCEVs.
Craig Topper [Wed, 6 Dec 2017 07:37:20 +0000 (07:37 +0000)]
[X86] Split 512-bit vector extends from types other than vXi1 out of LowerZERO_EXTEND_AVX512/LowerSIGN_EXTEND_AVX512. NFCI
Most of the code in these routines is for handling extends from vXi1 types. The 512-bit handling for other extends is very much like the AVX2 code. So make the special routines just do vXi1 types and move the other 512-bit handling to the place that handles AVX2.
Hans Wennborg [Wed, 6 Dec 2017 01:47:55 +0000 (01:47 +0000)]
Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
Zachary Turner [Wed, 6 Dec 2017 00:58:12 +0000 (00:58 +0000)]
Regex out the local hash comparison test.
Since the local hash is a different number of bytes depending
on host architecture, we don't have a consistent value. I
will need to re-do this test for both x86 and x64. For now
it accepts any value for the local hash.
Craig Topper [Tue, 5 Dec 2017 23:08:30 +0000 (23:08 +0000)]
[SelectionDAG] Don't promote mask operand when widening mstore and mscatter.
If the mask needs to be promoted, that should happen because the legalizer detects that the mask operand needs promotion, not as a side effect of another action.