[libFuzzer] change the default max_len from 64 to 4096. This will affect cases where libFuzzer is run w/o initial corpus or with a corpus of very small items.
Matthias Braun [Thu, 15 Jun 2017 22:14:55 +0000 (22:14 +0000)]
RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Craig Topper [Thu, 15 Jun 2017 21:38:44 +0000 (21:38 +0000)]
[InstCombine] Add test cases to demonstrate instcombine increasing instruction count when trying to fold (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y) when the pieces have multiple uses.
Zachary Turner [Thu, 15 Jun 2017 19:34:41 +0000 (19:34 +0000)]
[llvm-pdbutil] rewrite the "raw" output style.
After some internal discussions, we agreed that the raw output style had
outlived its usefulness. It was originally created before we had even
thought of dumping to YAML, and it was intended to give us some insight
into the internals of a PDB file. Now we have YAML mode which does
almost exactly this but is more powerful in that it can round-trip back
to a PDB, which the raw mode could not do. So the raw mode had become
purely a maintenance burden.
One option was to just delete it. However, its original goal was to be
as readable as possible while staying close to the "metal" - i.e.
presenting the output in a way that maps directly to the underlying file
format. We don't actually need that last requirement anymore since it's
covered by the yaml mode, so we could repurpose "raw" mode to actually
just be as readable as possible.
This patch implements about 80% of the functionality previously in raw
mode, but in a completely different style that is more akin to what
cvdump outputs. Records are very compressed, often times appearing on
just one line. One nice thing about this is that it makes full record
matching easier, because you can grep for indices, names, and leaf types
on a single line often.
See the tests for some examples of what the new output looks like.
Note that this patch actually regresses the functionality of raw mode in
a few areas, but only because the patch was already unreasonably large
and going 100% would have been even worse. Specifically, this patch is
missing:
The ability to dump module debug subsections (checksums, lines, etc)
The ability to dump section headers
Aside from that everything is here. While goign through the tests fixing
them all up, I found many duplicate tests. They've been deleted. In
subsequent patches I will go through and re-add the missing
functionality.
Craig Topper [Thu, 15 Jun 2017 19:09:51 +0000 (19:09 +0000)]
[InstCombine] Make the context instruction parameter of foldOrOfICmps a reference to discourage passing nullptr and to remove the '&' from all of the call sites. NFC
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.
On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2. The ABI reserves this register in any
functions that have calls or access global variables.
A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.
The code assumed that we process instructions in basic block order. FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.
Apply summary-based dead stripping to regular LTO modules with summaries.
If a regular LTO module has a summary index, then instead of linking
it into the combined regular LTO module right away, add it to the
combined summary index and associate it with a special module that
represents the combined regular LTO module.
Any such modules are linked during LTO::run(), at which time we use
the results of summary-based dead stripping to control whether to
link prevailing symbols.
Craig Topper [Thu, 15 Jun 2017 17:16:56 +0000 (17:16 +0000)]
[BasicAA] Don't call isKnownNonEqual if we might be have gone through a PHINode.
This is a fix for the test case in PR32314.
Basic Alias Analysis can ask if two nodes are known non-equal after looking through a phi node to find a GEP. isAddOfNonZero saw an add of a constant from the same phi and said that its output couldn't be equal. But Basic Alias Analysis was really asking about the value from the previous loop iteration.
This patch at least makes that case not happen anymore, I'm not sure if there were still other ways this can fail. As was discussed in the bug, it looks like fixing BasicAA would be difficult so this patch seemed like a possible workaround
Nirav Dave [Thu, 15 Jun 2017 15:05:48 +0000 (15:05 +0000)]
[DAG] Defer Pre/Post IndexStore merge to after mergestore. NFCI.
In preparation for doing storemerge post-legalization, reorder
visitSTORE passes to move pre/post-index combining after store
merge. Reordered passes other than store merge are unaffected.
Nirav Dave [Thu, 15 Jun 2017 14:04:07 +0000 (14:04 +0000)]
[DAG] Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.
When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.
Max Kazantsev [Thu, 15 Jun 2017 11:48:21 +0000 (11:48 +0000)]
[ScalarEvolution] Apply Depth limit to getMulExpr
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Diana Picus [Thu, 15 Jun 2017 10:53:31 +0000 (10:53 +0000)]
[ARM] GlobalISel: Add support for i32 modulo
Add support for modulo for targets that have hardware division and for
those that don't. When hardware division is not available, we have to
choose the correct libcall to use. This is generally straightforward,
except for AEABI.
The AEABI variant is trickier than the other libcalls because it
returns { quotient, remainder }, instead of just one value like the
other libcalls that we've seen so far. Therefore, we need to use custom
lowering for it. However, we don't want to have too much special code,
so we refactor the target-independent code in the legalizer by adding a
helper for replacing an instruction with a libcall. This helper is used
by the legalizer itself when dealing with simple calls, and also by the
custom ARM legalization for the more complicated AEABI divmod calls.
Diana Picus [Thu, 15 Jun 2017 09:42:02 +0000 (09:42 +0000)]
[ARM] GlobalISel: Lower only homogeneous struct args
Lowering mixed struct args, params and returns used G_INSERT, which is a
bit more convoluted to support through the entire pipeline. Since they
don't occur that often in practice, it's probably wiser to leave them
out until later.
Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which
has good support in the legalizer. These occur e.g. as the return of
__aeabi_idivmod, so it's nice to be able to support them.
Florian Hahn [Thu, 15 Jun 2017 09:31:23 +0000 (09:31 +0000)]
[AArch64] Enable FeatureFuseAES for the generic processor model.
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.
This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.
Zoran Jovanovic [Thu, 15 Jun 2017 09:14:33 +0000 (09:14 +0000)]
[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Differential Revision: https://reviews.llvm.org/D33887
Zachary Turner [Thu, 15 Jun 2017 03:06:38 +0000 (03:06 +0000)]
[formatv] Add the ability to specify a fill character when aligning.
Previously if you used fmt_align(7, Center) you would get the
output ' 7 '. It may be desirable for the user to specify
the fill character though, for example producing '---7---'. This
patch adds that.
IR: Tweak the API around adding modules to the summary index.
The current name (addModulePath) and return value
(ModulePathStringTableTy::iterator) is a little confusing. This
API adds a module, not just a path. And the iterator is basically
just an implementation detail of the summary index. Address
both of those issues by renaming to addModule and introducing a
ModuleSummaryIndex::ModuleInfo type that the function returns.
Zachary Turner [Wed, 14 Jun 2017 22:33:43 +0000 (22:33 +0000)]
Don't include TestingSupport in LLVM_LINK_COMPONENTS.
Instead use target_link_libraries directly. Thanks to
Juergen Ributzka for the suggestion, which fixes an issue
when llvm is configured with no targets.
Daniel Berlin [Wed, 14 Jun 2017 21:19:28 +0000 (21:19 +0000)]
NewGVN: This is wrong by inspection, it will not cause an issue currently due to other limitations, i believe. This also means i can't make a test for it.
Sanjay Patel [Wed, 14 Jun 2017 20:37:11 +0000 (20:37 +0000)]
[x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()
This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398.
We mentioned replacing the multiplies with shifts, but the real win seems to be in
bypassing the extra ops in the common case when the RootRatio and OpRatio are one.
This gives us another 1-2% overall win for the test in PR32037:
https://bugs.llvm.org/show_bug.cgi?id=32037
David Callahan [Wed, 14 Jun 2017 20:35:33 +0000 (20:35 +0000)]
Allow -profile-guided-section-prefix more than once
Summary:
At present, `-profile-guided-section-prefix` is a `cl::Optional` option, which means it demands to be passed exactly zero or one times. Our build system makes it pretty tricky to guarantee this. We often accidentally pass the flag more than once (but always with the same "false" value) which results in an error, after which compilation fails:
```
clang (LLVM option parsing): for the -profile-guided-section-prefix option: may only occur zero or one times!
```
While we work on improving our build system, it also seems reasonable just to allow `-profile-guided-section-prefix` to be passed more than once, by to `cl::ZeroOrMore`. Quoting [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | the documentation ]]:
> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.
Davide Italiano [Wed, 14 Jun 2017 19:29:53 +0000 (19:29 +0000)]
[EarlyCSE] Make PhiToCheck in removeMSSA() a set.
This way we end up not looking at PHI args already removed.
MemSSA now goes through the updater so we can prune
it to avoid having redundant MemoryPHI arguments, but that
doesn't quite work for the general case.
Craig Topper [Wed, 14 Jun 2017 17:04:59 +0000 (17:04 +0000)]
[ValueTracking] Correct early out in computeKnownBitsFromOperator to work with non power of 2 bit widths
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth which creates a mask in the lower bits for power of 2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2 this creates a non-sensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
Sanjay Patel [Wed, 14 Jun 2017 17:00:57 +0000 (17:00 +0000)]
[x86] replace div/rem with shift/mask for better shuffle combining perf
We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that,
so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.
This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 )
about 9% faster overall.
Zachary Turner [Wed, 14 Jun 2017 16:41:50 +0000 (16:41 +0000)]
[gtest] Create a shared include directory for gtest utilities.
Many times unit tests for different libraries would like to use
the same helper functions for checking common types of errors.
This patch adds a common library with helpers for testing things
in Support, and introduces helpers in here for integrating the
llvm::Error and llvm::Expected<T> classes with gtest and gmock.
Normally, we would just be able to write:
EXPECT_THAT(someFunction(), succeeded());
but due to some quirks in llvm::Error's move semantics, gmock
doesn't make this easy, so two macros EXPECT_THAT_ERROR() and
EXPECT_THAT_EXPECTED() are introduced to gloss over the difficulties.
Consider this an exception, and possibly only temporary as we
look for ways to improve this.
Zachary Turner [Wed, 14 Jun 2017 15:59:27 +0000 (15:59 +0000)]
Resubmit "[codeview] Make obj2yaml/yaml2obj support .debug$S..."
This was originally reverted because of some non-deterministic
failures on certain buildbots. Luckily ASAN eventually caught
this as a stack-use-after-scope, so the fix is included in
this patch.
I've upset a buildbot which runs the address sanitizer:
ERROR: AddressSanitizer: stack-use-after-scope
lib/Target/ARM/ARMISelLowering.cpp:2690
That Twine variable is used illegally.
Simon Dardis [Wed, 14 Jun 2017 14:46:30 +0000 (14:46 +0000)]
[mips] Fix multiprecision arithmetic.
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.
This revolves PR32713 and PR33424.
Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
[ARM] Support constant pools in data when generating execute-only code.
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.
Florian Hahn [Wed, 14 Jun 2017 13:14:38 +0000 (13:14 +0000)]
Align definition of DW_OP_plus with DWARF spec [3/3]
Summary:
This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things.
The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
This is done in three stages:
• The first patch (LLVM) adds support for DW_OP_plus_uconst.
• The second patch (Clang) contains changes all its uses from DW_OP_plus to DW_OP_plus_uconst.
• The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with its DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.
Simon Dardis [Wed, 14 Jun 2017 12:16:47 +0000 (12:16 +0000)]
[mips] Fix machine verifier errors in the long branch pass
This patch fixes two systemic machine verifier errors in the long
branch pass. The first is the incorrect basic block successors
and the second was the incorrect construction of several jump
instructions.
This partially resolves PR27458 and the associated PR32146.
Zachary Turner [Wed, 14 Jun 2017 06:24:24 +0000 (06:24 +0000)]
Revert "[codeview] Make obj2yaml/yaml2obj support .debug$S..."
This is causing failures on linux bots with an invalid stream
read. It doesn't repro in any configuration on Windows, so
reverting until I have a chance to investigate on Linux.
Zachary Turner [Wed, 14 Jun 2017 05:31:00 +0000 (05:31 +0000)]
[codeview] Make obj2yaml/yaml2obj support .debug$S/T sections.
This allows us to use yaml2obj and obj2yaml to round-trip CodeView
symbol and type information without having to manually specify the bytes
of the section. This makes for much easier to maintain tests. See the
tests under lld/COFF in this patch for example. Before they just said
SectionData: <blob> whereas now we can use meaningful record
descriptions. Note that it still supports the SectionData yaml field,
which could be useful for initializing a section to invalid bytes for
testing, for example.
Daniel Sanders [Tue, 13 Jun 2017 23:42:32 +0000 (23:42 +0000)]
[globalisel][legalizer] G_LOAD/G_STORE NarrowScalar should not emit G_GEP x, 0.
Summary:
When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting
%0 = G_CONSTANT ty 0
%1 = G_GEP %x, %0
since it's cheaper to not emit the redundant instructions than it is to fold them
away later.
Craig Topper [Tue, 13 Jun 2017 23:30:41 +0000 (23:30 +0000)]
[InstCombine] Add test cases demonstrating failure to handle (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) when the icmp portion gets turned into a truncate and a signed compare with 0.
InstCombine has an optimization that recognizes an and with the sign bit of legal type size and turns it into a truncate and compare that checks the sign bit. But the select handling code doesn't recognize this idiom.