Adam Nemet [Fri, 1 Dec 2017 17:02:04 +0000 (17:02 +0000)]
[opt-remarks] If hotness threshold is set, ignore remarks without hotness
These are blocks that have not been executed during training. For large
projects this could make a significant difference. For the project I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
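The filtering rule described above can be sketched in a few lines. This is only an illustration of the idea, not the LLVM implementation: the function name `keep_remark`, the use of `None` for "no hotness", and treating a threshold of 0 as "not set" are all assumptions made for the example.

```python
from typing import Optional


def keep_remark(hotness: Optional[int], threshold: int) -> bool:
    """Return True if a remark should be emitted (hypothetical helper)."""
    if threshold == 0:
        # Assumption: 0 means no threshold was requested, so keep everything.
        return True
    if hotness is None:
        # A threshold is set and the remark has no hotness: drop it.
        return False
    return hotness >= threshold


# (pass name, hotness) pairs; None models a block never executed in training.
remarks = [("inline", 120), ("licm", None), ("gvn", 3)]
kept = [name for name, hot in remarks if keep_remark(hot, threshold=10)]
```

With a threshold of 10, only the remark with hotness 120 survives, which is where the reduction in YAML size would come from.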
Jatin Bhateja [Fri, 1 Dec 2017 14:07:38 +0000 (14:07 +0000)]
[X86] Improvement in CodeGen instruction selection for LEAs.
Summary:
1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
accommodate similar operand appearing in the DAG e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will now look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
then complex LEAs (having 3 operands) could be factored out e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.
4/ Simplify LEA converts (lea (BASE,1,INDEX,0)) --> add (BASE, INDEX), which offers better throughput.
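The folding in item 1/ and the factoring in item 2/ can be checked with a little arithmetic. The sketch below is a toy model, not the X86 backend code: `fold_address` and `lea` are hypothetical helpers, registers are plain strings, and 32/64-bit wraparound is ignored.

```python
from collections import Counter


def fold_address(terms):
    """Fold a flat sum of registers (str) and constants (int) into
    base + index * scale + disp, or return None if it does not fit."""
    disp = sum(t for t in terms if isinstance(t, int))
    counts = Counter(t for t in terms if isinstance(t, str))
    if len(counts) != 2:
        return None
    # most_common() sorts by count, so the repeated register comes first.
    (index, scale), (base, base_count) = counts.most_common()
    if base_count != 1 or scale not in (1, 2, 4, 8):
        return None
    return {"base": base, "index": index, "scale": scale, "disp": disp}


def lea(base, index, scale, disp):
    """Value computed by an LEA with the given operands."""
    return base + index * scale + disp


# T3 = ((A + B) + 10) + A, flattened into its terms.
mode = fold_address(["A", "B", 10, "A"])

# Item 2/: the second LEA equals the first LEA plus one more index.
rax, rcx = 1000, 7
first = lea(rax, rcx, 1, 1)
second = lea(rax, rcx, 2, 1)
```

Here `mode` comes out as Base = B, Index = A, Scale = 2, Disp = 10, and `second == first + rcx`, which is the identity the factoring relies on.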
Mikael Holmen [Fri, 1 Dec 2017 12:30:49 +0000 (12:30 +0000)]
Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
Pavel Labath [Fri, 1 Dec 2017 11:41:07 +0000 (11:41 +0000)]
[cmake] Enable zlib support on windows
Summary:
zlib support was hard-wired to off for (non-cygwin) windows targets.
This disables some features, such as reading debug info from compressed
dwarf sections.
This has been this way since zlib support was added in 2013 (r180083),
but there is no obvious reason for that. Zlib is perfectly capable of
being compiled for windows (it even has a cmake file that works out of
the box).
This enables one to turn on zlib support on windows, if one has zlib
available.
[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
This allows vectorization even if the very first operation is not a binary add, but just a load.
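One way to read the 'copyable' idea is that the plain copy in the first lane behaves like an add of zero, so all four lanes share one operation. The commit message does not spell out how the vectorizer models this, so the sketch below is an assumption for illustration only; both function names are hypothetical.

```python
def add1_scalar(src):
    """Lane-by-lane version, mirroring the C function above."""
    return [src[0], src[1] + 1, src[2] + 2, src[3] + 3]


def add1_vector(src):
    """Single 'vector add': the copy in lane 0 is modelled as + 0."""
    offsets = [0, 1, 2, 3]
    return [s + o for s, o in zip(src[:4], offsets)]
```

Both functions produce the same four values for any input, which is what makes the uniform vector form legal.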
Ying Yi [Fri, 1 Dec 2017 09:54:27 +0000 (09:54 +0000)]
[lit] Implement non-pipelined ‘mkdir’, ‘diff’ and ‘rm’ commands internally
Summary:
The internal shell already supports ‘cd’, ‘export’ and ‘echo’ commands.
This patch adds implementation of non-pipelined ‘mkdir’, ‘diff’ and ‘rm’
commands as the internal shell builtins.
Volkan Keles [Fri, 1 Dec 2017 08:19:10 +0000 (08:19 +0000)]
GlobalISel: Enable the legalization of G_MERGE_VALUES and G_UNMERGE_VALUES
Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES.
Craig Topper [Fri, 1 Dec 2017 06:02:02 +0000 (06:02 +0000)]
[X86] Custom legalize v2i32 gathers via widening rather than promoting.
The default legalization for v2i32 is promotion to v2i64. This results in a gather that reads 64-bit elements rather than 32. If one of the elements is near a page boundary this can cause an illegal access that can fault.
We also miscalculate the scale for the gather which is an even worse problem, but we probably could have found a separate way to fix that.
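The page-boundary hazard can be shown with simple address arithmetic. This is a sketch under assumptions: a 4096-byte page size is assumed, and `crosses_page` is a hypothetical helper, not anything from the backend.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def crosses_page(addr, elem_bytes):
    """True if reading elem_bytes starting at addr touches two pages."""
    first_page = addr // PAGE_SIZE
    last_page = (addr + elem_bytes - 1) // PAGE_SIZE
    return first_page != last_page


# Address of the last 32-bit element that fits entirely in its page.
last_i32 = 2 * PAGE_SIZE - 4

ok_as_i32 = crosses_page(last_i32, 4)   # 4-byte read stays in the page
bad_as_i64 = crosses_page(last_i32, 8)  # 8-byte read spills into the next page
```

If the following page is unmapped, the 8-byte read produced by promotion can fault even though the 4-byte read the program asked for is valid.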
Craig Topper [Fri, 1 Dec 2017 06:02:00 +0000 (06:02 +0000)]
[X86][SelectionDAG] Make sure we explicitly sign extend the index when type promoting the index of scatter and gather.
Type promotion makes no guarantee about the contents of the promoted bits. Since the gather/scatter instruction will use the bits to calculate addresses, we need to ensure they aren't garbage.
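A small model of why the explicit sign extension matters: after promotion, the upper 32 bits of a 64-bit index may hold anything. The helper name and the example bit patterns below are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_32_to_64(promoted):
    """Keep the low 32 bits of a promoted value and sign extend them.
    The result is returned as a signed Python integer."""
    low = promoted & MASK32
    if low & 0x80000000:
        low -= 1 << 32
    return low


# A 32-bit index of -1 whose promoted upper bits contain garbage.
garbage_index = 0xDEADBEEFFFFFFFFF
clean_index = sign_extend_32_to_64(garbage_index)

base, scale = 0x10000, 4
address = base + clean_index * scale
```

Without the extension, the garbage upper bits would feed directly into `base + index * scale`; with it, the index is -1 and the address lands 4 bytes below `base`.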
Jake Ehrlich [Fri, 1 Dec 2017 00:54:28 +0000 (00:54 +0000)]
Add flag to ArchiveWriter to test GNU64 format more efficiently
Even with the sparse file optimizations the SYM64 test can still be painfully
slow. This unnecessarily slows down devs. It's critical that we test that the
switch to the SYM64 format occurs at 4GB but there isn't any better way to
fake the size of the file than sparse files. This change introduces a flag that
allows the cutoff to be arbitrarily set to whatever power of two is desired.
The flag is hidden as it really isn't meant to be used outside this one test.
This is unfortunate but appears necessary, at least until the average hard
drive is much faster.
The changes to the test require some explanation. Prior to this change we knew
that the SYM64 format was being used because the file was simply too large to
have validly handled this case if the SYM64 format were not used. To ensure
that the SYM64 format is still being used I am grepping the file for "SYM64".
Without changing the filename however this would be pointless because "SYM64"
would occur in the file either way. So the filename of the test is also changed
in order to avoid this issue.
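The effect of a configurable cutoff can be sketched as follows. This is a simplified model, not the ArchiveWriter code: the function name, the `cutoff_bits` parameter, the "GNU"/"SYM64" labels, and the exact boundary comparison are assumptions for the example.

```python
def symbol_table_kind(max_member_offset, cutoff_bits=32):
    """Pick the symbol table format from the largest member offset.
    cutoff_bits=32 models the real 4GB switch point; a test can pass a
    smaller power of two to trigger the switch on a tiny archive."""
    cutoff = 1 << cutoff_bits
    return "SYM64" if max_member_offset >= cutoff else "GNU"
```

With the default, an offset of 5000 stays in the regular format, while `symbol_table_kind(5000, cutoff_bits=10)` already selects SYM64, so the test no longer needs a multi-gigabyte sparse file.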
Zachary Turner [Fri, 1 Dec 2017 00:53:10 +0000 (00:53 +0000)]
Mark all library options as hidden.
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module.
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.
This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.
Zachary Turner [Thu, 30 Nov 2017 23:00:30 +0000 (23:00 +0000)]
Simplify the DenseSet used for hashing CodeView records.
This was storing the hash alongside the key so that the hash
doesn't need to be re-computed every time, but in doing so it
was allocating a structure to keep the key size small in the
DenseMap. This is a noble goal, but it also leads to a pointer
indirection on every probe, and the cost of this pointer
indirection ends up being higher than the cost of having a
slightly larger entry in the hash table. Removing this not only
simplifies the code, but yields a small but noticeable
performance improvement in the type merging algorithm.
Dan Gohman [Thu, 30 Nov 2017 22:10:53 +0000 (22:10 +0000)]
[memcpyopt] Teach memcpyopt to optimize across basic blocks
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Shoaib Meenai [Thu, 30 Nov 2017 21:48:26 +0000 (21:48 +0000)]
[llvm] Add stripped installation targets
CMake's generated installation scripts support `CMAKE_INSTALL_DO_STRIP`
to enable stripping the installed binaries. LLVM's build system doesn't
expose this option to the `install-` targets, but it's useful in
conjunction with `install-distribution`.
Add a new function to create the install targets, which creates both the
regular install target and a second install target that strips during
installation. Change the creation of all installation targets to use
this new function. Stripping doesn't make a whole lot of sense for some
installation targets (e.g. the LLVM headers), but consistency doesn't
hurt.
I'll make other repositories (e.g. clang, compiler-rt) use this in a
follow-up, and then add an `install-distribution-stripped` target to
actually accomplish the end goal of creating a stripped distribution. I
don't want to do that step yet because the creation of that target would
depend on the presence of the `install-*-stripped` target for each
distribution component, and the distribution components from other
repositories will be missing that target right now.
Jake Ehrlich [Thu, 30 Nov 2017 20:14:53 +0000 (20:14 +0000)]
[llvm-objcopy] Add support for --only-keep/-j and --keep
This change adds support for the --only-keep option and the -j alias as well.
A common use case for these being used together is to dump a specific section's
data. Additionally the --keep option is added (GNU objcopy doesn't have this)
to avoid removing a bunch of things. This allows people to err on the side of
stripping aggressively and then to keep the specific bits that they need for
their application.
Michal Gorny [Thu, 30 Nov 2017 19:09:22 +0000 (19:09 +0000)]
[cmake] Include project name in Sphinx doctree dir to fix race conditions
Modify add_sphinx_target() to include the project name alongside builder
in Sphinx doctree directory. This aims to avoid crashes due to race
conditions between multiple Sphinx instances running in parallel that
attempt to create or read that directory simultaneously.
This problem was originally addressed in r283188. However, that
commit presumed that there would be only one target per builder being
run. r314863 then introduced a second manpage target, reintroducing
the race condition.
Igor Laevsky [Thu, 30 Nov 2017 15:41:58 +0000 (15:41 +0000)]
[FuzzMutate] Bailout from injecting into empty basic blocks.
In rare cases we can receive a request to inject into a completely empty basic block. In the normal case
all basic blocks contain at least a terminator instruction, but it is possible that the only instruction is a
catchpad instruction, which is not part of the instruction iterator. This case seems rare enough to not care
about it.
Submitting without review, since it seems almost NFC. I couldn't come up with any reasonable way to test this.
Sanjay Patel [Thu, 30 Nov 2017 14:59:03 +0000 (14:59 +0000)]
[LangRef] clarify semantics of the frem instruction
As noted in D40594, the frem instruction corresponds to fmod() except that it can't set errno.
I modified the text that we currently use for intrinsics that map to libm functions and applied
it to frem.
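For readers less familiar with fmod(), the key property is that the result takes the sign of the dividend. Python's `math.fmod` wraps the C library function, so it can illustrate this; note that Python raises `ValueError` for some inputs (such as a zero divisor) where C's fmod() would return NaN, so the examples stick to ordinary finite values.

```python
import math

# fmod keeps the sign of the dividend (first operand).
neg_dividend = math.fmod(-7.0, 3.0)   # -1.0
neg_divisor = math.fmod(7.0, -3.0)    #  1.0

# Python's % operator instead follows the sign of the divisor,
# so it is not a model for frem.
python_mod = -7.0 % 3.0               #  2.0
```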
Nemanja Ivanovic [Thu, 30 Nov 2017 13:39:10 +0000 (13:39 +0000)]
[PowerPC] Recommit r314244 with refactoring and off by default
This re-commits everything that was pulled in r314244. The transformation
is off by default (patch to enable it to follow). The code is refactored
to have a single entry-point and provide fine-grained control over patterns
that it selects. This patch also fixes the bugs in the original code.
Everything that failed with the original patch has been re-tested with this
patch (with the transformation turned on). So the patch to turn this on is
soon to follow.
Sean Eveson [Thu, 30 Nov 2017 13:05:14 +0000 (13:05 +0000)]
[MC] Function stack size section.
Reapplying after fixing issues in the diff; sorry for any painful conflicts/merges!
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html
This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).
The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.
There is a follow up change to add an option to clang.
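To make the section layout concrete, here is a rough sketch of reading such a section: each entry is an 8-byte address followed by an unsigned LEB128 stack size. The helper names are hypothetical, little-endian byte order is assumed, and the sample addresses are invented; this is not the tooling from the patch.

```python
import struct


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, pos):
    """Decode one ULEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_stack_sizes(section):
    """Return a list of (function_address, stack_size) pairs."""
    entries = []
    pos = 0
    while pos < len(section):
        (addr,) = struct.unpack_from("<Q", section, pos)  # assumed little-endian
        pos += 8
        size, pos = decode_uleb128(section, pos)
        entries.append((addr, size))
    return entries


# Build a two-entry sample section with made-up addresses and sizes.
sample = (struct.pack("<Q", 0x401000) + encode_uleb128(24)
          + struct.pack("<Q", 0x401040) + encode_uleb128(300))
parsed = parse_stack_sizes(sample)
```

Because the sizes are variable-length, entries have to be walked sequentially; comparing the parsed pairs from two builds is one way to measure stack size changes as the message describes.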
[CodeGen] Print "%vreg0" as "%0" in both MIR and debug output
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
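The two sed substitutions above correspond to the following regular-expression rewrites, shown here on in-memory strings. The function name and the sample lines are made up for illustration; the patterns are the ones from the commands.

```python
import re


def strip_vreg(text):
    """Apply both renames: '%vregN' -> '%N', then ' vregN' -> ' %N'."""
    text = re.sub(r"%vreg([0-9]+)", r"%\1", text)
    text = re.sub(r" vreg([0-9]+)", r" %\1", text)
    return text


mir_line = strip_vreg("%vreg0<def> = COPY %vreg12")
debug_line = strip_vreg("kill vreg3")
```

As with the shell version, anything that does not match these two shapes still needs the manual grep-and-fix pass.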
Sean Eveson [Thu, 30 Nov 2017 12:01:16 +0000 (12:01 +0000)]
[MC] Function stack size section.
Summary:
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html
I wasn't sure who to put as reviewers, so please add/remove people as appropriate.
This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).
The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.
There is a follow up change to add an option to clang.
Sam Parker [Thu, 30 Nov 2017 11:49:11 +0000 (11:49 +0000)]
[DAGCombine] Refactor ReduceLoadWidth
visitAND attempts to narrow the width of extending loads that are
then masked off. ReduceLoadWidth already exists for a similar purpose
and handles shifts, so I've moved the code to handle AND nodes there.
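The equivalence behind narrowing a masked load can be checked on raw bytes. This sketch assumes little-endian memory, where the low byte sits at the load address; `load_le` is a hypothetical helper and not part of the DAG combiner.

```python
def load_le(memory, offset, nbytes):
    """Zero-extending little-endian load of nbytes from memory."""
    return int.from_bytes(memory[offset:offset + nbytes], "little")


memory = bytes([0x78, 0x56, 0x34, 0x12])

# A 32-bit load followed by an AND with 0xFF ...
wide_then_mask = load_le(memory, 0, 4) & 0xFF
# ... yields the same value as a zero-extending 8-bit load.
narrow_zext = load_le(memory, 0, 1)
```

On a big-endian target the narrow load would need an adjusted offset, which is one of the details a real implementation has to handle.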
This patch implements `getBundleInfo`, which uses CoreFoundation to
obtain information about the CFBundle. This information is needed to
populate the Plist in the dSYM bundle.
This change only applies to darwin and is an NFC as far as other
platforms are concerned.
Jonas Paulsson [Thu, 30 Nov 2017 08:18:50 +0000 (08:18 +0000)]
[SystemZ] Bugfix in adjustSubwordCmp.
Csmith generated a program where a store after load to the same address did
not get chained after the new load created during DAG legalizing, and so
performed an illegal overwrite of the expected value.
When the new zero-extending load is created, the chain users of the original
load must be updated, which was not done previously.
A similar case was also found and handled in lowerBITCAST.
Hiroshi Inoue [Thu, 30 Nov 2017 07:44:46 +0000 (07:44 +0000)]
[SROA] enable splitting for non-whole-alloca loads and stores
Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.
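The relaxed condition can be expressed as a check over byte ranges. The sketch below is a simplified model of the rule as stated in this message, not SROA's code: accesses are half-open `(begin, end)` byte ranges within the alloca, and `is_splittable` is a hypothetical name.

```python
def is_splittable(current, others):
    """A load/store is splittable if every other access is either
    disjoint from it or fully contained in it."""
    cur_begin, cur_end = current
    for begin, end in others:
        disjoint = end <= cur_begin or begin >= cur_end
        contained = cur_begin <= begin and end <= cur_end
        if not (disjoint or contained):
            return False
    return True


# Layout of a 16-byte struct: a at [0, 8), b at [8, 12), c at [12, 16).
field_accesses = [(0, 8), (8, 12), (12, 16)]

# A 64-bit access covering b and c together is splittable ...
packed_ok = is_splittable((8, 16), field_accesses)
# ... but not if some other access straddles its boundary.
straddled = is_splittable((8, 16), field_accesses + [(4, 12)])
```

A whole-alloca access such as `(0, 16)` trivially contains every other access, so it still passes the check.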
Here is a simplified motivating example.
struct record {
long long a;
int b;
int c;
};
int func(struct record r) {
for (int i = 0; i < r.c; i++)
r.b++;
return r.b;
}
When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.
With this patch, the above example is compiled into only a few instructions without a loop.
Without the patch, an unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.
Craig Topper [Thu, 30 Nov 2017 07:01:40 +0000 (07:01 +0000)]
[X86] Optimize avx2 vgatherqps for v2f32 with v2i64 index type.
Normal type legalization will widen everything. This requires forcing 0s into the mask register. We can instead choose the form that only reads 2 elements without zeroing the mask.
[XRay][docs] Update documentation on new default for xray_naive_log=
We've recently changed the default for `xray_naive_log=` to be `false`
instead of `true` to make it more consistent with the FDR mode logging
implementation. This means we will now ask users to explicitly choose
which version of the XRay logging is being used.
Graham Yiu [Thu, 30 Nov 2017 02:41:36 +0000 (02:41 +0000)]
With PGO information, we can do more aggressive outlining of cold regions in the inline candidate function. This contrasts with the scheme of keeping only the 'early return' portion of the inline candidate and outlining the rest of the function as a single function call.
Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates are limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).
Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.
Matt Arsenault [Thu, 30 Nov 2017 00:52:40 +0000 (00:52 +0000)]
AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.
Rafael Espindola [Thu, 30 Nov 2017 00:44:22 +0000 (00:44 +0000)]
Check alignment in getSectionContentsAsArray.
While the ArrayRef can technically have unaligned data, it would be
extremely surprising if iterating over it caused undefined behavior
when a reference to the underlying type was bound.
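The kind of check being added can be modelled as follows. In the real code the concern is the alignment of the data pointer relative to the element type; this sketch uses the byte offset within a buffer as a stand-in, which is an assumption, and `section_as_array` is a hypothetical helper.

```python
import struct


def section_as_array(data, offset, size, fmt="<I"):
    """Return section contents as a list of fixed-size entries, refusing
    ranges whose start or length is not a multiple of the entry size."""
    entry_size = struct.calcsize(fmt)
    if offset % entry_size != 0:
        raise ValueError("section start is not aligned to the entry size")
    if size % entry_size != 0:
        raise ValueError("section size is not a multiple of the entry size")
    chunk = data[offset:offset + size]
    return [value for (value,) in struct.iter_unpack(fmt, chunk)]


data = struct.pack("<4I", 1, 2, 3, 4)
aligned = section_as_array(data, 4, 8)
```

Rejecting the misaligned case up front turns a potential undefined-behavior read into an ordinary, reportable error.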
Vedant Kumar [Thu, 30 Nov 2017 00:28:23 +0000 (00:28 +0000)]
[Coverage] Use the most-recent completed region count (PR35437)
This is a fix for the coverage segment builder.
If multiple regions must be popped off the active stack at once, and
more than one of them end at the same location, emit a segment using the
count from the most-recent completed region.
First step towards more human-friendly PPC assembler output:
- add -ppc-reg-with-percent-prefix option to use %r3 etc as register
names
- split off logic for Darwinish verbose conditional codes into a helper
function
- be explicit about Darwin vs AIX vs GNUish assembler flavors
Zachary Turner [Wed, 29 Nov 2017 22:41:56 +0000 (22:41 +0000)]
[CodeView] Factor some code out of TypeTableBuilder.
This class had some code that would automatically remap type
indices before hashing and serializing. The only caller of
this method was the TypeStreamMerger anyway, and the method
doesn't make general sense, and prevents making certain future
improvements to the class. So this patch factors it up one level
into the TypeStreamMerger where it belongs.