Reid Kleckner [Sat, 19 Jan 2019 00:33:02 +0000 (00:33 +0000)]
[X86] Deduplicate static calling convention helpers for code size, NFC
Summary:
Right now we include ${TGT}GenCallingConv.inc once for each instruction
selection method implemented by ${TGT}:
- ${TGT}ISelLowering.cpp
- ${TGT}CallLowering.cpp
- ${TGT}FastISel.cpp
Instead, add a mechanism to tablegen for marking a particular convention
as "External", which causes tablegen to emit into the ::llvm namespace,
instead of as a static helper. This allows us to provide a header to
forward declare it, so we can simply call the function from all the
places it is referenced. Typically the calling convention analyzer is
called indirectly, so it doesn't benefit from inlining.
This saves a bit of final binary size, but mostly just saves object file
size:
  before    after    diff  artifact
  12852K   12492K   -360K  X86ISelLowering.cpp.obj
   4640K    4280K   -360K  X86FastISel.cpp.obj
   1704K    2092K   +388K  X86CallingConv.cpp.obj
  52448K   52336K   -112K  llc.exe
I didn't collect before numbers for X86CallLowering.cpp.obj, which is
for GlobalISel, but we should save 360K there as well.
This patch applies the strategy to the X86 backend, but there is no
reason it couldn't be applied to the other backends that implement
multiple ISel strategies, like AArch64.
Nico Weber [Sat, 19 Jan 2019 00:10:54 +0000 (00:10 +0000)]
Use llvm_canonicalize_cmake_booleans for LLVM_LIBXML2_ENABLED [llvm]
r291284 added a nice mechanism to consistently pass CMake on/off toggles to
lit. This change uses it for LLVM_LIBXML2_ENABLED too (which was added around
the same time and doesn't use the new system yet).
Also alphabetically sort the list passed to llvm_canonicalize_cmake_booleans()
in llvm/test/CMakeLists.txt.
Sanjay Patel [Fri, 18 Jan 2019 20:42:12 +0000 (20:42 +0000)]
[x86] add more movmsk tests; NFC
The existing tests already show a sub-optimal transform,
but this should make it clear that we can't just match
an 'and' op when creating movmsk instructions.
Craig Topper [Fri, 18 Jan 2019 20:14:46 +0000 (20:14 +0000)]
[X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode.
This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
Roman Tereshin [Fri, 18 Jan 2019 20:13:42 +0000 (20:13 +0000)]
[CGP] Check for existing inttoptr before creating new one
Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.
This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.
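The "CSE on the fly" idea can be sketched as a small cache keyed by the source value. This is only an illustration under stated assumptions: CastInst and CastCache are hypothetical names and integer ids stand in for IR values; none of this is CodeGenPrepare code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for an inttoptr instruction; not an LLVM type.
struct CastInst {
  int SourceValueId; // id of the integer value being cast
};

// Hypothetical helper that hands out at most one cast per source value.
class CastCache {
  std::map<int, std::size_t> IndexBySource; // source id -> index into Casts
  std::vector<CastInst> Casts;              // every cast created so far

public:
  // Returns the index of the cast for SourceValueId, creating it on first use.
  std::size_t getOrCreate(int SourceValueId) {
    assert(SourceValueId >= 0 && "ids are non-negative in this sketch");
    auto It = IndexBySource.find(SourceValueId);
    if (It != IndexBySource.end())
      return It->second; // reuse the existing cast instead of emitting another
    Casts.push_back(CastInst{SourceValueId});
    IndexBySource[SourceValueId] = Casts.size() - 1;
    return Casts.size() - 1;
  }

  std::size_t numCasts() const { return Casts.size(); }
};
```

Asking twice for the same source id yields the same cast, so later analyses see one pointer rather than two unrelated ones.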
Bjorn Pettersson [Fri, 18 Jan 2019 20:06:13 +0000 (20:06 +0000)]
[SelectionDAG] Updates for -dag-dump-verbose
Summary:
This patch makes some changes related to -dag-dump-verbose.
Main use case has been when debugging how SelectionDAG is
dealing with debug info (SDDbgValue nodes).
1) We now print the number of DbgValues that are mapped to each
SDNode.
2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
printed also when not using -dag-dump-verbose).
3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
new SDDbgValue::dump that will start a new line after calling
print.
4) SDDbgValue::print now prints "Order", and it also prints
some additional information when kind is CONST/FRAMEIX/VREG.
5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s).
Invalidated nodes are not printed.
6) Prohibit inline printing of SDNode operands that have SDDbgValue
nodes associated with them.
Daniel Sanders [Fri, 18 Jan 2019 18:40:35 +0000 (18:40 +0000)]
[adt] Twine(nullptr) derefs the nullptr. Add a deleted Twine(std::nullptr_t)
Summary:
nullptr can implicitly convert to Twine as Twine(nullptr) in which case it
resolves to Twine(const char *). This constructor derefs the pointer and
therefore doesn't work. Add a Twine(std::nullptr_t) = delete to make it a
compile time error.
It turns out that in-tree usage of Twine(nullptr) is confined to a single
private method in IRBuilder where foldConstant(... const Twine &Name = nullptr)
and this method is only ever called with an explicit Name argument as making it
a mandatory argument doesn't cause compile-time or run-time errors.
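The deleted-overload technique can be shown on a toy class. This is a sketch only: Label is a hypothetical stand-in, not llvm::Twine, and the static_asserts are added here purely to demonstrate the effect.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical string-like class; this is not llvm::Twine.
class Label {
  const char *Str;
  std::size_t Len;

public:
  // Reads through the pointer, so a null argument would be undefined behaviour.
  Label(const char *S) : Str(S), Len(std::strlen(S)) {}

  // nullptr would otherwise select the const char * overload above;
  // deleting this overload turns that into a compile-time error.
  Label(std::nullptr_t) = delete;

  std::size_t size() const { return Len; }
  const char *data() const { return Str; }
};

// Construction from a C string still works; construction from nullptr does not.
static_assert(std::is_constructible<Label, const char *>::value,
              "C strings are accepted");
static_assert(!std::is_constructible<Label, std::nullptr_t>::value,
              "nullptr is rejected at compile time");
```

Because overload resolution prefers the exact std::nullptr_t match, a default argument such as `const Label &Name = nullptr` no longer compiles.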
Florian Hahn [Fri, 18 Jan 2019 18:37:38 +0000 (18:37 +0000)]
[SelectionDAG] Split very large token factors for chained stores to 64k chunks.
Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.
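The chunking idea can be sketched independently of SelectionDAG: join the values in groups no larger than the operand limit, then join the group results, until one value remains. JoinNode and joinInChunks are hypothetical names, integer ids stand in for chain values, and the real combiner may structure the split differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a node that joins several chain values.
struct JoinNode {
  int Id;                    // id of the value this node produces
  std::vector<int> Operands; // ids of the values it joins
};

// Joins Values into a single value id without ever giving one node more than
// Limit operands. Created nodes are appended to Nodes; fresh ids come from NextId.
int joinInChunks(std::vector<int> Values, std::size_t Limit,
                 std::vector<JoinNode> &Nodes, int &NextId) {
  assert(Limit >= 2 && !Values.empty());
  while (Values.size() > 1) {
    std::vector<int> NextLevel;
    for (std::size_t I = 0; I < Values.size(); I += Limit) {
      std::size_t End = std::min(Values.size(), I + Limit);
      JoinNode N;
      N.Id = NextId++;
      N.Operands.assign(Values.begin() + I, Values.begin() + End);
      NextLevel.push_back(N.Id);
      Nodes.push_back(N);
    }
    Values = NextLevel; // the chunk results become the inputs of the next round
  }
  return Values.front();
}
```

With the limit set to the maximum operand count, no single node ever exceeds what the node representation can hold.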
Craig Topper [Fri, 18 Jan 2019 18:22:26 +0000 (18:22 +0000)]
[X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode.
This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
Florian Hahn [Fri, 18 Jan 2019 17:36:22 +0000 (17:36 +0000)]
[LCSSA] Skip blocks in sub-loops when scanning for uses.
Summary:
Scanning blocks in sub-loops for uses is unnecessary, as they were
already handled while dealing with the containing sub-loop.
This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
halves the time spent in LCSSA. In cases where we won't be able to skip
any blocks, the additional lookup should be negligible.
Time-passes without this patch for test case from PR37202:
Total Execution Time: 48.5505 seconds (48.5511 wall clock)
Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.
Pavel Labath [Fri, 18 Jan 2019 12:52:03 +0000 (12:52 +0000)]
[ADT] Add streaming operators for llvm::Optional
Summary:
The operators simply print the underlying value or "None".
The trickier part of this patch is making sure the streaming operators
work even in unit tests (which was my primary motivation, though I can
also see them being useful elsewhere). Since the stream operator was a
template, implicit conversions did not kick in, and our gtest glue code
was explicitly introducing an implicit conversion to make sure other
implicit conversions do not kick in :P. I resolve that by specializing
llvm_gtest::StreamSwitch for llvm::Optional<T>.
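The "print the value or None" behaviour can be sketched with a minimal optional-like type. Maybe<T> and render are hypothetical names used so the example stays self-contained; the commit itself targets llvm::Optional and the gtest glue, which are not reproduced here.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal optional type; a stand-in for llvm::Optional<T>.
template <typename T> struct Maybe {
  bool HasValue;
  T Value;
};

// Streams the contained value, or the text "None" when there is no value.
template <typename T>
std::ostream &operator<<(std::ostream &OS, const Maybe<T> &M) {
  if (M.HasValue)
    OS << M.Value;
  else
    OS << "None";
  return OS;
}

// Convenience helper for the sketch: renders a Maybe<T> to a string.
template <typename T> std::string render(const Maybe<T> &M) {
  std::ostringstream OS;
  OS << M;
  return OS.str();
}
```

Because the operator is a template, the argument must already be a Maybe<T> for deduction to succeed; a type that merely converts to Maybe<T> would not match, which is the kind of interaction with implicit conversions the message describes.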
George Rimar [Fri, 18 Jan 2019 11:33:26 +0000 (11:33 +0000)]
[llvm-objdump] - Move getRelocationValueString and dependencies out of llvm-objdump.cpp
getRelocationValueString is a dispatcher function that calls the
corresponding ELF/COFF/Wasm/MachO implementations
that currently live in the llvm-objdump.cpp file.
These implementations should be moved to ELFDump.cpp,
COFFDump.cpp and other corresponding files, to separate the
platform-specific implementation from the common logic.
The patch does that. Also, I had to move ToolSectionFilter helper
and SectionFilterIterator, SectionFilter to a header to make them
available across the objdump code.
Dylan McKay [Fri, 18 Jan 2019 11:27:38 +0000 (11:27 +0000)]
[AVR] Fix codegen bug in 16-bit loads
Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
instructions of this pattern:
ld $GPR8, [PTR:XYZ]+
ld $GPR8, [PTR]+1
This has a problem; the [PTR] is incremented in-place once, but never
decremented.
Future uses of the same pointer will use the now clobbered value,
leading to the pointer being incorrect by an offset of one.
This patch modifies the expansion code of the LDWRdPtr pseudo
instruction so that the pointer variable is not silently clobbered in
future uses in the same live range.
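The clobbering can be reproduced on a toy machine model. Everything below is hypothetical (Machine, loadWordPostInc, loadWordDisplacement), and the displacement-based variant is only one possible clobber-free expansion, assumed for illustration rather than taken from the patch.

```cpp
#include <array>
#include <cstdint>

// Toy machine state: one pointer register and a small memory.
// Hypothetical model; this is not AVR backend code.
struct Machine {
  std::array<std::uint8_t, 16> Mem{};
  std::uint16_t Ptr = 0;
};

// Expansion in the style the message describes as buggy: the first load
// post-increments the pointer register and nothing restores it afterwards.
std::uint16_t loadWordPostInc(Machine &M) {
  std::uint8_t Lo = M.Mem[M.Ptr++]; // low byte, pointer is incremented in place
  std::uint8_t Hi = M.Mem[M.Ptr];   // high byte from the now-advanced pointer
  return static_cast<std::uint16_t>(Lo | (Hi << 8));
}

// Assumed clobber-free alternative: read the second byte through a
// displacement so the pointer register keeps its original value.
std::uint16_t loadWordDisplacement(Machine &M) {
  std::uint8_t Lo = M.Mem[M.Ptr];
  std::uint8_t Hi = M.Mem[M.Ptr + 1];
  return static_cast<std::uint16_t>(Lo | (Hi << 8));
}
```

Both functions return the same 16-bit value, but only the second leaves Ptr usable for a later access through the same register.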
Florian Hahn [Fri, 18 Jan 2019 10:00:38 +0000 (10:00 +0000)]
[SelectionDAG] Add static getMaxNumOperands function to SDNode.
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places where we currently crash
because we use more operands than possible.
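A helper of this kind can be sketched by deriving the limit from the type of the counter field itself. ToyNode and fitsInOneNode are hypothetical names, and the 16-bit counter is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical node type; only the operand counter matters here.
struct ToyNode {
  std::uint16_t NumOperands = 0;

  // Largest operand count the counter field can represent. Deriving it from
  // the field's type keeps the limit correct if the field is ever widened.
  static constexpr std::size_t getMaxNumOperands() {
    return std::numeric_limits<decltype(NumOperands)>::max();
  }
};

// Callers ask the type for the limit instead of hard-coding a constant.
inline bool fitsInOneNode(std::size_t Count) {
  return Count <= ToyNode::getMaxNumOperands();
}
```

Every site that needs the limit then agrees on one value by construction.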
Shiva Chen [Fri, 18 Jan 2019 08:36:06 +0000 (08:36 +0000)]
[ScheduleDAGRRList] Do not preschedule a node that has an ADJCALLSTACKDOWN parent
We should not preschedule a node that has an ADJCALLSTACKDOWN parent;
otherwise, during bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold CallResource too long and prevent other
calls from being scheduled. If there is no other available node
to schedule, the scheduler will try to rename the register by
creating a copy to avoid the conflict, which will fail because
CallResource is not a real physical register.
Dylan McKay [Fri, 18 Jan 2019 06:10:41 +0000 (06:10 +0000)]
[AVR] Expand 8/16-bit multiplication to libcalls on MCUs that don't have hardware MUL
This change modifies the LLVM ISel lowering settings so that
8-bit/16-bit multiplication is expanded to calls into the compiler
runtime library if the MCU being targeted does not support
multiplication in hardware.
Before this, MUL instructions would be generated on CPUs like the
ATtiny85, triggering a CPU reset due to an illegal instruction at
runtime.
First raised in https://github.com/avr-rust/rust/issues/124.
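For readers unfamiliar with what a multiplication routine does when there is no MUL instruction, here is a shift-and-add sketch. softMul8 is a hypothetical name; the actual compiler runtime routine and its implementation are not shown in the message and may differ.

```cpp
#include <cstdint>

// Shift-and-add multiplication using only shifts, adds, and bit tests.
// Hypothetical illustration of a software multiply; the result wraps
// modulo 256, as an 8-bit multiply does.
std::uint8_t softMul8(std::uint8_t A, std::uint8_t B) {
  std::uint8_t Result = 0;
  while (B != 0) {
    if (B & 1) // this bit of B contributes the current shifted copy of A
      Result = static_cast<std::uint8_t>(Result + A);
    A = static_cast<std::uint8_t>(A << 1);
    B = static_cast<std::uint8_t>(B >> 1);
  }
  return Result;
}
```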
Nico Weber [Fri, 18 Jan 2019 04:09:30 +0000 (04:09 +0000)]
gn build: unbreak mac (and maybe win) after r351258, r351277
The check-hwasan build files assert that current_os == "linux" || current_os ==
"android", so pull it in only there.
ar is unused on mac, so don't set it in the stage2 toolchain. (It'd be nicer to
use llvm-libtool on mac instead of host libtool, but llvm-libtool doesn't seem
to understand the -no_warning_for_no_symbols flag.)
Nico Weber [Fri, 18 Jan 2019 03:36:04 +0000 (03:36 +0000)]
mac: Correctly disable tools/lto tests when building with LLVM_ENABLE_PIC=OFF
llvm/tools sets LLVM_TOOL_LTO_BUILD to Off if LLVM_ENABLE_PIC=OFF, but that's
not visible in llvm/test.
r289662 added the llvm_tool_lto_build lit parameter; there the intent was to
use it with an explicit -DLLVM_TOOL_LTO_BUILD=OFF, which is visible globally.
On the review for that (D27739), a mild preference was expressed for using a
lit parameter over checking the existence of libLTO.dylib. Since that works
with the LLVM_ENABLE_PIC=OFF case too and since it matches what we do for the
gold plugin, switch to that approach.
Vedant Kumar [Thu, 17 Jan 2019 22:36:05 +0000 (22:36 +0000)]
[HotColdSplit] Allow outlining with live outputs
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.
Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).
[mips] Emit .reloc R_{MICRO}MIPS_JALR along with j(al)r(c) $25
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and then used by asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.
Vedant Kumar [Thu, 17 Jan 2019 21:29:34 +0000 (21:29 +0000)]
[HotColdSplit] Simplify tests by lowering their splitting thresholds
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.
Wei Mi [Thu, 17 Jan 2019 20:48:34 +0000 (20:48 +0000)]
[SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink
If the sample profile has no inlining hierarchy information included, we say
the sample profile is flattened. For a flattened profile, in the ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort to read in the profile and run
the sample profile loader pass. It is helpful for reducing compile time when
the flattened profile is huge.
Reid Kleckner [Thu, 17 Jan 2019 20:46:53 +0000 (20:46 +0000)]
[InstCombine] Don't sink dynamic allocas
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.
Erik Pilkington [Thu, 17 Jan 2019 20:37:51 +0000 (20:37 +0000)]
NFC: Make the copies of the demangler byte-for-byte identical
With this patch, the copies of the files ItaniumDemangle.h,
StringView.h, and Utility.h are kept byte-for-byte in sync between
libcxxabi and llvm. All differences (namespaces, fallthrough, and
unreachable macros) are defined in each copies' DemanglerConfig.h.
This patch also adds a script to copy changes from libcxxabi
(cp-to-llvm.sh), and a README.txt explaining the situation.
[WebAssembly] Fixed objdump not parsing function headers.
Summary:
objdump was interpreting the function header containing the locals
declaration as instructions. To parse these without injecting target
specific code in objdump, MCDisassembler::onSymbolStart was added to
be implemented by the WebAssembly implementation.
WasmObjectFile now returns a code offset for the "address" of a symbol,
rather than the index. This is also more in line with what other
targets do.
Also ensured that the AsmParser correctly puts each function
in its own segment to enable this test case.
Teresa Johnson [Thu, 17 Jan 2019 15:49:03 +0000 (15:49 +0000)]
[ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).
Also added are the necessary bitcode records, and the corresponding
assembly support.
James Henderson [Thu, 17 Jan 2019 15:34:12 +0000 (15:34 +0000)]
[llvm-readobj][ELF]Add demangling support
This change adds demangling support to the ELF side of llvm-readobj,
under the switch --demangle/-C.
The following places are demangled: symbol table dumps (static and
dynamic), relocation dumps (static and dynamic), addrsig dumps, call
graph profile dumps, and group section signature symbols.
Although GNU readelf doesn't support demangling, it is still a useful
feature to have, and brings it on a par with llvm-objdump's
capabilities.
This fixes https://bugs.llvm.org/show_bug.cgi?id=40054.
James Henderson [Thu, 17 Jan 2019 15:18:44 +0000 (15:18 +0000)]
Move demangling function from llvm-objdump to Demangle library
This allows it to be used in an upcoming llvm-readobj change.
A small change in internal behaviour of the function is to always call
the microsoftDemangle function if the string does not have an itanium
encoding prefix, rather than only if it starts with '?'. This is
harmless because the microsoftDemangle function does the same check
already.
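The argument that the change is harmless can be sketched with two dispatch functions. All names are hypothetical, the Itanium check is simplified to the "_Z" prefix, and looksMicrosoft merely stands in for the check the message says microsoftDemangle performs internally.

```cpp
#include <string>

enum class Scheme { Itanium, Microsoft, None };

// Simplified prefix test for Itanium-style names (assumption: "_Z" only).
bool hasItaniumPrefix(const std::string &Name) {
  return Name.compare(0, 2, "_Z") == 0;
}

// Stand-in for the check done inside the Microsoft demangler itself.
bool looksMicrosoft(const std::string &Name) {
  return !Name.empty() && Name[0] == '?';
}

// Old shape: the caller tests for '?' before trying the Microsoft demangler.
Scheme dispatchOld(const std::string &Name) {
  if (hasItaniumPrefix(Name))
    return Scheme::Itanium;
  if (!Name.empty() && Name[0] == '?' && looksMicrosoft(Name))
    return Scheme::Microsoft;
  return Scheme::None;
}

// New shape: always defer to the Microsoft demangler's own check.
Scheme dispatchNew(const std::string &Name) {
  if (hasItaniumPrefix(Name))
    return Scheme::Itanium;
  return looksMicrosoft(Name) ? Scheme::Microsoft : Scheme::None;
}
```

Since the callee repeats the caller's test, both versions classify every input identically.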
Max Kazantsev [Thu, 17 Jan 2019 12:51:10 +0000 (12:51 +0000)]
[LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.
Max Kazantsev [Thu, 17 Jan 2019 12:25:40 +0000 (12:25 +0000)]
[LoopSimplifyCFG] Fix order of deletion of complex dead subloops
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfill this requirement and
fail an assertion.
This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.
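Why a batch deletion copes with cycles can be sketched on a toy graph: drop every outgoing edge of the dead set first, after which each dead block trivially has no remaining predecessors. ToyCFG and its members are hypothetical, and the sketch assumes the dead blocks have no predecessors outside the dead set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy CFG: block id -> ids of its successors.
struct ToyCFG {
  std::map<int, std::set<int>> Succs;

  // Single-block deletion: only legal when no other block still branches here.
  void deleteBlock(int Block) {
    for (const auto &Entry : Succs)
      assert(Entry.first == Block || Entry.second.count(Block) == 0);
    Succs.erase(Block);
  }

  // Batch deletion: sever all edges leaving the dead set, then delete each block.
  // Cycles among the dead blocks disappear in the first loop.
  void deleteBlocks(const std::vector<int> &Dead) {
    for (int Block : Dead)
      Succs[Block].clear();
    for (int Block : Dead)
      deleteBlock(Block);
  }
};
```

Calling deleteBlock directly on one block of a two-block dead cycle would trip the assertion, because the other dead block still branches to it.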
Diana Picus [Thu, 17 Jan 2019 10:11:55 +0000 (10:11 +0000)]
[ARM GlobalISel] Allow calls to varargs functions
Allow varargs functions to be called, both in arm and thumb mode. This
boils down to choosing the correct calling convention, which we can
easily test by making sure arm_aapcscc is used instead of
arm_aapcs_vfpcc when the callee is variadic.
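The selection rule reduces to a small decision function. CallConv and chooseCallConv are hypothetical names; the enumerators only mirror the IR spellings arm_aapcscc and arm_aapcs_vfpcc mentioned above.

```cpp
// Hypothetical enum mirroring the two conventions named in the message.
enum class CallConv {
  AAPCS,    // corresponds to arm_aapcscc
  AAPCS_VFP // corresponds to arm_aapcs_vfpcc
};

// Variadic callees get the base convention even on a hard-float target.
CallConv chooseCallConv(bool HardFloatABI, bool IsVarArg) {
  if (IsVarArg)
    return CallConv::AAPCS;
  return HardFloatABI ? CallConv::AAPCS_VFP : CallConv::AAPCS;
}
```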
Alex Bradbury [Thu, 17 Jan 2019 10:04:39 +0000 (10:04 +0000)]
[RISCV] Add codegen support for RV64A
In order to support codegen for RV64A, this patch:
* Introduces masked atomics intrinsics for atomicrmw operations and cmpxchg
that use the i64 type. These are ultimately lowered to masked operations
using lr.w/sc.w, but we need to use these alternate intrinsics for RV64
because i32 is not legal
* Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and
PseudoCmpXchg64
* Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as
needed for RV64 and to select the i64 intrinsic IDs when necessary
* Adds appropriate patterns to RISCVInstrInfoA.td
* Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support
This ends up being a fairly mechanical change, as the logic for RV32A is
effectively reused.
Sanjin Sijaric [Thu, 17 Jan 2019 09:45:17 +0000 (09:45 +0000)]
[ARM64][Windows] Share unwind codes between epilogues
There are cases where we have multiple epilogues that have the exact same unwind
code sequence. In that case, the epilogues can share the same unwind codes in
the .xdata section. This should get us past the assert "SEH unwind data
splitting not yet implemented" in many cases.
We still need to add support for generating multiple .pdata/.xdata sections for
those functions that need to be split into fragments.
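Sharing identical sequences can be sketched as deduplication keyed by the sequence contents. layoutEpilogues and UnwindCodes are hypothetical names, and the byte values used with it are arbitrary placeholders rather than real ARM64 unwind opcodes or the actual .xdata layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using UnwindCodes = std::vector<std::uint8_t>;

// Appends each distinct epilogue sequence to Out once and returns, for every
// epilogue, the offset in Out where its codes start. Hypothetical sketch.
std::vector<std::size_t>
layoutEpilogues(const std::vector<UnwindCodes> &Epilogues, UnwindCodes &Out) {
  std::map<UnwindCodes, std::size_t> OffsetBySequence;
  std::vector<std::size_t> Offsets;
  for (const UnwindCodes &Seq : Epilogues) {
    auto It = OffsetBySequence.find(Seq);
    if (It == OffsetBySequence.end()) {
      // First time this exact sequence is seen: emit it and remember where.
      It = OffsetBySequence.emplace(Seq, Out.size()).first;
      Out.insert(Out.end(), Seq.begin(), Seq.end());
    }
    Offsets.push_back(It->second);
  }
  return Offsets;
}
```

Two epilogues with the same codes end up pointing at the same offset, so the emitted data grows only with the number of distinct sequences.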
George Rimar [Thu, 17 Jan 2019 09:13:17 +0000 (09:13 +0000)]
[llvm-objdump] - Simplify the getRelocationValueString. NFCI.
This refactors the getRelocationValueString method.
It is somewhat overcomplicated, and it seems possible to simplify it
without losing functionality.
Vedant Kumar [Thu, 17 Jan 2019 02:15:05 +0000 (02:15 +0000)]
[MergeFunc] Prevent silent miscompile of vararg functions
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.