Tom Stellard [Mon, 24 Jul 2017 19:28:30 +0000 (19:28 +0000)]
test-release.sh: Fix phase2 and phase3 binary comparison
Summary:
scudo_utils.cpp.o from compiler-rt has one of the host compiler's builtin
include paths stored in the .debug_line section. So we need to do
sed 's,Phase1,Phase2,g' on the Phase2 object file so it matches Phase3.
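The normalization step above can be sketched in Python as follows. This is only an illustration of the idea, not the logic in test-release.sh; the function names and the sample byte strings are hypothetical.

```python
def normalize_phase2(data: bytes) -> bytes:
    """Rewrite embedded Phase1 paths to Phase2, like sed 's,Phase1,Phase2,g'."""
    # Both strings have the same length, so offsets in the object are preserved.
    return data.replace(b"Phase1", b"Phase2")


def objects_match(phase2_obj: bytes, phase3_obj: bytes) -> bool:
    """Compare a Phase2 object with its Phase3 counterpart after normalization."""
    return normalize_phase2(phase2_obj) == phase3_obj


# Hypothetical object contents: only the embedded include path differs.
phase2 = b"\x00/build/Phase1/lib/clang/include\x00code"
phase3 = b"\x00/build/Phase2/lib/clang/include\x00code"
```

With these sample inputs the raw bytes differ, but the normalized comparison succeeds.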
Leo Li [Mon, 24 Jul 2017 17:26:28 +0000 (17:26 +0000)]
[CMake] Remove redundant logic in runtimes/CMakeLists.txt
Summary:
`SUB_CHECK_TARGETS` contains all test targets in `SUB_COMPONENTS` when
we load `Components.cmake`. We don't need to add those targets
again and having duplicate targets will break the cmake policy CMP0002.
Benjamin Kramer [Mon, 24 Jul 2017 16:18:09 +0000 (16:18 +0000)]
[CodeGenPrepare] Cut off FindAllMemoryUses if there are too many uses.
This avoids excessive compile time. The case I'm looking at is
Function.cpp from an old version of LLVM that still had the giant memcmp
string matcher in it. Before r308322 this compiled in about 2 minutes,
after it, clang takes infinite* time to compile it. With this patch
we're at 5 min, which is still bad but this is a pathological case.
The cut off at 20 uses was chosen by looking at other cut-offs in LLVM
for user scanning. It's probably too high, but does the job and is very
unlikely to regress anything.
Fixes PR33900.
* I'm impatient and aborted after 15 minutes, on the bug report it was
killed after 2h.
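A bounded use scan of this kind might look like the sketch below. It is a generic worklist traversal written for illustration, not the code in CodeGenPrepare; the callback parameters and the choice to return None on bail-out are assumptions.

```python
MAX_MEMORY_USES_TO_SCAN = 20  # the cut-off named in the commit message


def find_all_memory_uses(root, users_of, is_memory_use,
                         limit=MAX_MEMORY_USES_TO_SCAN):
    """Collect memory uses reachable from root.

    Returns None once more than `limit` uses have been scanned, so the
    caller can give up instead of spending unbounded compile time.
    """
    seen = set()
    worklist = [root]
    memory_uses = []
    scanned = 0
    while worklist:
        value = worklist.pop()
        for user in users_of(value):
            scanned += 1
            if scanned > limit:
                return None  # too many uses: bail out
            if user in seen:
                continue
            seen.add(user)
            if is_memory_use(user):
                memory_uses.append(user)
            else:
                worklist.append(user)  # keep following non-memory users
    return memory_uses
```

The point of the design is that the cost of the scan is capped by a constant, whatever the shape of the use graph.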
Propagates the GraphT template argument to the default value of
the AnalysisGraphTraitsT template argument. This allows specializing
DefaultAnalysisGraphTraits<AnalysisT,GraphT> for an analysis whose
graph type differs from the analysis type, and the specialization is
picked up automatically.
Note: This was probably the intended purpose and should not result in any
functional change.
Chad Rosier [Sun, 23 Jul 2017 16:38:08 +0000 (16:38 +0000)]
[AArch64] Redundant Copy Elimination - remove more zero copies.
This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne
and we know the result of the compare instruction is zero. For example,
Max Kazantsev [Sun, 23 Jul 2017 15:40:19 +0000 (15:40 +0000)]
[SCEV] Limit max size of AddRecExpr during evolving
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As a result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go forth. This leads
to long compilation and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
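The size arithmetic can be checked with a small Python sketch. It models only operand counts, not SCEV itself; the function names are hypothetical, and it assumes the elided continuation of the example keeps multiplying the two most recent values.

```python
MAX_ADD_REC_SIZE = 16  # default limit stated in the commit message


def product_size(s1: int, s2: int) -> int:
    """Operand count of the product of two AddRecs of sizes s1 and s2."""
    return s1 + s2 - 1


def try_combine(s1: int, s2: int, limit: int = MAX_ADD_REC_SIZE):
    """Return the combined size, or None when it would exceed the limit."""
    size = product_size(s1, s2)
    return size if size <= limit else None


def chain_sizes(count: int):
    """Sizes of x1, x2, ... when no limit is applied."""
    sizes = [2, product_size(2, 2)]  # x1 = {a,+,b}; x2 = x1 * x1
    while len(sizes) < count:
        # x3 = x2 * x1, x4 = x3 * x2: multiply the two latest values.
        sizes.append(product_size(sizes[-1], sizes[-2]))
    return sizes[:count]
```

Under these assumptions the first seven sizes are 2, 3, 4, 6, 9, 14 and 22, so the seventh product is the first one the default limit would refuse to combine.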
[Modules] Rework r274270. Let Clang targets depend on intrinsics_gen.
This gets rid of almost all LLVM targets unconditionally depending on intrinsics_gen.
Clang's modules still have odd dependencies, and it is hard to remove intrinsics_gen in a better way.
For now, it is better to make all Clang targets depend on intrinsics_gen.
[X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions.
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
Petr Hosek [Sat, 22 Jul 2017 02:33:45 +0000 (02:33 +0000)]
Reland "[LLVM][llvm-objcopy] Added basic plumbing to get things started"
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
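The supported input described above can be expressed as a pair of checks. The sketch below is not part of llvm-objcopy; the helper names are hypothetical, while the numeric constants are the standard values from the ELF specification.

```python
ELFCLASS64 = 2   # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA] value for little-endian objects

SHT_NULL, SHT_PROGBITS, SHT_STRTAB, SHT_NOBITS = 0, 1, 3, 8
SUPPORTED_SECTION_TYPES = {SHT_NULL, SHT_PROGBITS, SHT_STRTAB, SHT_NOBITS}


def is_elf64_le(ident: bytes) -> bool:
    """Check the magic, class and data-encoding bytes of e_ident."""
    return (len(ident) >= 6
            and ident[:4] == b"\x7fELF"
            and ident[4] == ELFCLASS64
            and ident[5] == ELFDATA2LSB)


def unsupported_section_types(section_types):
    """Return the section types that fall outside the supported set."""
    return [t for t in section_types if t not in SUPPORTED_SECTION_TYPES]
```

For example, a file that contains an SHT_SYMTAB section (type 2) would be reported as unsupported by this sketch.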
[libFuzzer] reimplement experimental_len_control=1: bump the temporary max_len every time we failed to find new coverage during the last 1000 runs and 1 second. Also fix FileToVector to not load unfinished files
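The stall rule can be sketched as below. This is not libFuzzer code: the class and field names are hypothetical, and doubling the temporary limit is an assumption, since the commit message does not say by how much max_len is bumped.

```python
class LenControl:
    """Bump a temporary max length when coverage has stalled."""

    def __init__(self, start_len, max_len, now):
        self.tmp_max_len = start_len
        self.max_len = max_len
        self.runs = 0
        self.last_progress_run = 0
        self.last_progress_time = now

    def on_run(self, found_new_coverage, now):
        self.runs += 1
        if found_new_coverage:
            self._mark_progress(now)
            return
        stalled_runs = self.runs - self.last_progress_run
        stalled_secs = now - self.last_progress_time
        # Both conditions must hold: 1000 runs AND 1 second without coverage.
        if (stalled_runs >= 1000 and stalled_secs >= 1.0
                and self.tmp_max_len < self.max_len):
            # Growth factor of 2 is an assumption made for this sketch.
            self.tmp_max_len = min(self.max_len, self.tmp_max_len * 2)
            self._mark_progress(now)

    def _mark_progress(self, now):
        self.last_progress_run = self.runs
        self.last_progress_time = now
```

The clock is passed in explicitly so the behaviour can be exercised deterministically.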
Petr Hosek [Fri, 21 Jul 2017 23:27:40 +0000 (23:27 +0000)]
Reland "[LLVM][llvm-objcopy] Added basic plumbing to get things started"
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
David Blaikie [Fri, 21 Jul 2017 21:41:15 +0000 (21:41 +0000)]
[ProfData] Detect if zlib is available
As discussed on [1], if the profile is compressed and llvm-profdata is not built with zlib support, the error message is not informative. Give a better error message if zlib is not available.
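The general pattern, detecting the optional dependency and failing with a specific message, can be sketched in Python. This is an analogy rather than the llvm-profdata change; the function name and the error text are hypothetical.

```python
try:
    import zlib
except ImportError:  # interpreter built without zlib
    zlib = None


def read_profile_section(data: bytes, compressed: bool, zlib_module=zlib) -> bytes:
    """Return the section payload, decompressing it when required."""
    if not compressed:
        return data
    if zlib_module is None:
        # Say exactly what is missing instead of failing with a generic error.
        raise RuntimeError(
            "profile is zlib-compressed, but this tool was built without zlib support")
    return zlib_module.decompress(data)
```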
MIR SRADI uses instruction template XSForm_1rc which declares Defs = [CARRY]. But MIR SRADI_32 uses instruction template XSForm_1, and it doesn't declare such implicit definition. With patch D33720 it causes wrong code generation for perl.
This includes the hash table, the address map, and the thunk table and
section offset table. The last two are only used for incremental
linking, which LLD doesn't support, so they are less interesting. The
hash table is particularly important to get right, since it is one
of the streams that debuggers use to translate addresses to symbols.
Haojie Wang [Fri, 21 Jul 2017 17:25:20 +0000 (17:25 +0000)]
ThinLTO Minimized Bitcode File Size Reduction
Summary: Currently the ThinLTO minimized bitcode file only strips the debug info, but the minimized bitcode file still contains a lot of information that the thin linker will not use. In this patch, most of the extra information is stripped to reduce the size of the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. The minimized bitcode file size is now reduced to 15%-30% of the debug-info-stripped bitcode file size.
Simon Dardis [Fri, 21 Jul 2017 17:19:00 +0000 (17:19 +0000)]
[mips] Support -membedded-data and fix a related bug
-membedded-data changes the location of constant data from the .sdata to
the .rodata section. Previously it was (incorrectly) always located in the
.rodata section.
Anna Thomas [Fri, 21 Jul 2017 16:30:38 +0000 (16:30 +0000)]
[RuntimeUnroll] NFC: Add a profitability function for multiexit loop
Separated out the profitability from the safety analysis for multiexit
loop unrolling. Currently, this is an NFC because profitability is true
only if the unroll-runtime-multi-exit is set to true (off-by-default).
This is to ease adding the profitability heuristic up for review at
D35380.
Jonas Paulsson [Fri, 21 Jul 2017 11:59:37 +0000 (11:59 +0000)]
[SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Adrian Prantl [Fri, 21 Jul 2017 02:07:33 +0000 (02:07 +0000)]
dsymutil: strip unused types from imported DW_TAG_modules
This patch teaches dsymutil to strip types from the imported
DW_TAG_module inside of an object file (not inside the PCM) if they
can be resolved to the full definition inside the PCM. This reduces
the size of the .dSYM from WebCore from webkit.org by almost 2/3.
George Karpenkov [Thu, 20 Jul 2017 23:46:46 +0000 (23:46 +0000)]
Generate a compile_commands.json DB for external projects.
A compile_commands.json file is very useful both for tooling and for
reproducible builds.
For files built through a recursive CMake invocation, this information was
not previously generated.
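For readers unfamiliar with the format, the sketch below shows how a tool might look up the command for one source file. The sample entry and paths are made up; the field names (directory, file, command, arguments) are those of the JSON compilation database format.

```python
import json

# Hypothetical database content for a single translation unit.
SAMPLE_DB = """[
  {"directory": "/build/runtimes",
   "command": "clang++ -c /src/foo.cpp -o foo.o",
   "file": "/src/foo.cpp"}
]"""


def commands_for(db_text: str, source_file: str):
    """Return the compile commands recorded for source_file."""
    commands = []
    for entry in json.loads(db_text):
        if entry["file"] != source_file:
            continue
        # An entry carries either a "command" string or an "arguments" list.
        commands.append(entry.get("command") or " ".join(entry["arguments"]))
    return commands
```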
Kevin Enderby [Thu, 20 Jul 2017 23:08:41 +0000 (23:08 +0000)]
Add error handling to the dyld compact export entries in libObject.
lld needs a matching change for this, which will be my next commit.
Expect the lld build to fail until that matching commit is picked up by the bots.
Like the changes in r296527 for dyld bind entries and the changes in
r298883 for lazy bind, weak bind and rebase entries the export
entries are the last of the dyld compact info to have error handling added.
This follows the model of iterators that can fail that Lang Hames
designed when fixing the problem for bad archives in r275316 (or r275361).
So that iterating through the exports now terminates if there is an error
and returns an llvm::Error with an error message in all cases for malformed
input.
This change provides the plumbing for the error handling, all the needed
testing of error conditions and test cases for all of the unique error messages.
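The "iteration stops at the first error and reports it" model can be illustrated with a small Python sketch. It is not the libObject implementation; it assumes, as a simplification, that the input is just a sequence of ULEB128-encoded integers, and the function names and error text are hypothetical.

```python
def read_uleb128(data: bytes, offset: int):
    """Decode one ULEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("malformed uleb128: extends past end at offset %d" % offset)
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def decode_all(data: bytes):
    """Decode values until the end or the first error.

    Returns (values, error); error is None on success, otherwise the
    message describing why iteration terminated early.
    """
    values = []
    offset = 0
    while offset < len(data):
        try:
            value, offset = read_uleb128(data, offset)
        except ValueError as err:
            return values, str(err)  # stop iterating, surface the error
        values.append(value)
    return values, None
```

The caller always gets either a complete result or the values decoded so far together with a message, never a silent truncation.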
Tim Northover [Thu, 20 Jul 2017 22:58:26 +0000 (22:58 +0000)]
GlobalISel: stop localizer putting constants before EH_LABELs
If the localizer pass puts one of its constants before the label that tells the
unwinder "jump here to handle your exception" then control-flow will skip it,
leaving uninitialized registers at runtime. That's bad.
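The placement rule can be sketched as follows. This is a toy model over opcode names, not the Localizer pass; the helper names and the exact set of opcodes treated as a block prefix are assumptions.

```python
# Opcodes that must stay ahead of any localized constant (assumed set).
PREFIX_OPCODES = {"PHI", "EH_LABEL"}


def insertion_index(block):
    """Index of the first instruction that is not a PHI or EH_LABEL."""
    index = 0
    while index < len(block) and block[index] in PREFIX_OPCODES:
        index += 1
    return index


def localize_constant(block, constant):
    """Return a new block with `constant` placed after the leading prefix."""
    at = insertion_index(block)
    return block[:at] + [constant] + block[at:]
```

Placing the constant after the label means that control flow arriving from the unwinder still executes it.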
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Matt Arsenault [Thu, 20 Jul 2017 21:06:04 +0000 (21:06 +0000)]
AMDGPU: Rename _RTN atomic instructions
Move the _RTN to the end of the name. It reads
better if the other addressing mode components
line up with the non-RTN version. It is also
more convenient to define saddr variants of
FLAT atomics to have the RTN last, and it is
good to have a consistent naming scheme.
Matt Arsenault [Thu, 20 Jul 2017 21:03:45 +0000 (21:03 +0000)]
Add an ID field to StackObjects
On AMDGPU, SGPR spills are really spilled to another register.
The spiller creates each spill as a new frame index object,
which is used as a placeholder.
The placeholder will eventually be replaced with a reference to a
position in a VGPR to write to, and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
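The compatibility check enabled by the ID field can be sketched like this. It is an illustration only, not the MachineFrameInfo or StackSlotColoring code; the class, the field name and the specific ID values are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_STACK_ID = 0     # ordinary stack slot (assumed value)
SGPR_SPILL_STACK_ID = 1  # placeholder for an SGPR spill (assumed value)


@dataclass
class StackObject:
    size: int
    stack_id: int = DEFAULT_STACK_ID


def can_share_slot(a: StackObject, b: StackObject) -> bool:
    """Slots may be merged only when both objects have the same kind of storage."""
    return a.stack_id == b.stack_id


vgpr_spill = StackObject(size=4)
sgpr_spill = StackObject(size=4, stack_id=SGPR_SPILL_STACK_ID)
```

With the ID in place, the coloring logic can refuse to merge vgpr_spill with sgpr_spill even though their sizes match.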