LowerTypeTests: Change a few vtable globals in tests to constants.
It turns out that some of our negative tests were not in fact providing the
test coverage we expected: they were passing because the vtables were failing
an early check that they were constant. Fix this by changing the globals in
these tests to constants.
Tim Northover [Wed, 8 Feb 2017 23:23:32 +0000 (23:23 +0000)]
GlobalISel: translate @llvm.pow intrinsic to G_FPOW.
It'll usually be immediately legalized back to a libcall, but occasionally
something can be done with it so we'd just as well enable that flexibility from
the start.
[ARM/AArch ISel] SwiftCC: First parameters that are marked swiftself are not 'this returns'
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implict def of x8
Tim Northover [Wed, 8 Feb 2017 21:22:15 +0000 (21:22 +0000)]
GlobalISel: expand mul-with-overflow into mul-hi on AArch64.
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.
This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.
Note, these are used not only by scheduling, but in places like
MachineLICM.
Hans Wennborg [Wed, 8 Feb 2017 20:58:33 +0000 (20:58 +0000)]
build_llvm_package.bat: Build teh clang-format plugin separately
In r293373 we switched the build to linking dynamically against the
Universal CRT and include the redistributables in the installer.
However, clang-format.exe is copied into the vsix and needs to be
statically linked. This commit makes us build the plugin in a separate
step that uses static linking.
ThinLTOBitcodeWriter: Strip debug info from merged module.
This module will contain nothing but vtable definitions and (soon)
available_externally function definitions, so there is no point in keeping
debug info in the module.
[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction.
Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction.
The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution.
This patch includes the following changes:
- Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision.
- Arrays of Uniform and Scalar values are moved from Legality to Cost Model.
- Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form.
- Vectorization of memory instruction is performed according to the CM decision.
Adrian Prantl [Wed, 8 Feb 2017 17:44:43 +0000 (17:44 +0000)]
Fix bitcode upgrade for DIGlobalVariables with a var: field.
This is a follow-up to https://reviews.llvm.org/D29349. It turns out
that NeedUpgradeToDIGlobalVariableExpression is always necessary when
we encountered a version==0 record because it may always be referenced
via a list of globals in a DICompileUnit. My tests weren't good enough
to catch this though. To trigger this case, we need much older bitcode
produced by LLVM around version 3.7.
Daniel Berlin [Wed, 8 Feb 2017 15:22:52 +0000 (15:22 +0000)]
LVI: Add a per-value worklist limit to LazyValueInfo.
Summary:
LVI is now depth first, which is optimal for iteration strategy in
terms of work per call. However, the way the results get cached means
it can still go very badly N^2 or worse right now. The overdefined
cache is per-block, because LVI wants to try to get different results
for the same name in different blocks (IE solve the problem
PredicateInfo solves). This means even if we discover a value is
overdefined after going very deep, it doesn't cache this information,
causing it to end up trying to rediscover it again and again. The
same is true for values along the way. In practice, overdefined
anywhere should mean overdefined everywhere (this is how, for example,
SCCP works).
Until we get around to reworking the overdefined cache, we need to
limit the worklist size we process. Note that permanently reverting
the DFS strategy exploration seems the wrong strategy (temporarily
seems fine if we really want). BFS is clearly the wrong approach, it
just gets luckier on some testcases. It's also very hard to design
an effective throttle for BFS. For DFS, the throttle is directly related
to the depth of the CFG. So really deep CFGs will get cutoff, smaller
ones will not. As the CFG simplifies, you get better results.
In BFS, the limit is it's related to the fan-out times average block size,
which is harder to reason about or make good choices for.
Bug being filed about the overdefined cache, but it will require major
surgery to fix it (plumbing predicateinfo through CVP or LVI).
Note: I did not make this number configurable because i'm not sure
anyone really needs to tweak this knob. We run CVP 3 times. On the
testcases i have the slow ones happen in the middle, where CVP is
doing cleanup work other things are effective at. Over the course of
3 runs, we don't see to have any real loss of performance.
I haven't gotten a minimized testcase yet, but just imagine in your
head a testcase where, going *up* the CFG, you have branches, one of
which leads 50000 blocks deep, and the other, to something where the
answer is overdefined immediately. BFS would discover the overdefined
faster than DFS, but do more work to do so. In practice, the right
answer is "once DFS discovers overdefined for a value, stop trying to
get more info about that value" (and so, DFS would normally cache the
overdefined results for every value it passed through in those 50k
blocks, and never do that work again. But it don't, because of the
naming problem)
Sanne Wouda [Wed, 8 Feb 2017 14:48:05 +0000 (14:48 +0000)]
[Assembler] Enable nicer diagnostics for inline assembly.
Fixed test.
Summary:
Enables source location in diagnostic messages from the backend. This
is after parsing, during finalization. This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.
This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter. MCContext gets a pointer to this SourceMgr. Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.
The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr. This ensures that DiagHandlers won't print
garbage. (Clang emits a "note: instantiated into assembly here", which
refers to this string.)
The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.
Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently. Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.
Diana Picus [Wed, 8 Feb 2017 14:23:30 +0000 (14:23 +0000)]
Fix test to work on swift/cyclone too
I forgot to remove the neonfp target feature from the test, which means we'd
have trouble selecting VADDS on targets that have neonfp enabled by default.
NAKAMURA Takumi [Wed, 8 Feb 2017 13:49:28 +0000 (13:49 +0000)]
Revert r294356, "DebugInfo: Track spilled variables in LiveDebugValues"
It caused undefined behavior in VarLoc. As far as I investigated,
- VarLoc::VarLoc() treats negative offset value as InvalidKind.
Consider the case that (int64_t)MI.getOperand(1).getImm() is negative and whether it satisfies ((uint64_t)Offset < (1ULL << 32)).
- Comparison operators in VarLoc behave undefined since VarLoc::Loc.Hash is uninitialized in case of InvalidKind.
I guess Offset (in VarLoc) could be made aware of signed, but I am not sure.
So I have reverted it for now.
A virtual destructor is needed, since the derived classes are stored in
`iplist<PredicateBase> AllInfos;` and, apparently, ilist_node doesn't have a
virtual destructor.
Diana Picus [Wed, 8 Feb 2017 13:23:04 +0000 (13:23 +0000)]
[ARM] GlobalISel: Add FPR reg bank
Add a register bank for floating point values and select simple instructions
using them (add, copies from GPR).
This assumes that the hardware can cope with a single precision add (VADDS)
instruction, so the legalizer will treat G_FADD as legal and the instruction
selector will refuse to select if the hardware doesn't support it. In the future
we'll want to be more careful about this, and legalize to libcalls if we have to
use soft float.
Amara Emerson [Wed, 8 Feb 2017 11:28:08 +0000 (11:28 +0000)]
[AArch64][TableGen] Skip tied result operands for InstAlias
This patch checks the number of operands in the resulting
instruction instead of just the alias, then skips over
tied operands when generating the printing method.
This allows us to generate the preferred assembly syntax
for the AArch64 'ins' instruction, which should always be
displayed as 'mov' according to the ARMARM.
Several unit tests have changed as a result, but only to
reflect the preferred disassembly.
Some other InstAlias patterns (movk/bic/orr) needed a
slight adjustment to stop them becoming the default
and breaking other unit tests.
Sanne Wouda [Wed, 8 Feb 2017 10:20:07 +0000 (10:20 +0000)]
[Assembler] Enable nicer diagnostics for inline assembly.
Summary:
Enables source location in diagnostic messages from the backend. This
is after parsing, during finalization. This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.
This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter. MCContext gets a pointer to this SourceMgr. Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.
The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr. This ensures that DiagHandlers won't print
garbage. (Clang emits a "note: instantiated into assembly here", which
refers to this string.)
The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.
Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently. Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.
Sam Parker [Wed, 8 Feb 2017 09:44:18 +0000 (09:44 +0000)]
Use dynamic symbols for ELF disassembly
Disassembly currently begins from addresses obtained from the objects
symbol table. For ELF, add the dynamic symbols to the list if no
static symbols are available so that we can more successfully
disassemble stripped binaries.
[ArgPromote] Delete a test that makes no sense (any more).
This test is under 'ArgumentPromotion' but there are no arguments that
get promoted in the test case, so there seems to be no point. Also,
there are no assertions about the output at all, so this seems like
something we should just delete given the low value.
[ArgPromote] Clean up a crash test case by rinsing it through opt,
renaming things to at least have somewhat spelled out names, and even
have meaningful names where I could guess at what they should be.
Also add FileCheck assertions that we're actually doing what we set out
to do for some of the tests, for example not promoting a type that would
result in infinite promotion.
[ArgPromote] Actually add FileCheck to a test that I actually updated to
have nice CHECK patterns instead of relying on a coarse 'not grep'
check. Sorry that I missed this the first time through.
Craig Topper [Wed, 8 Feb 2017 02:54:12 +0000 (02:54 +0000)]
Move mnemonicIsValid to Mips target.
Summary:
The Mips target is the only user of mnemonicIsValid. This patch
moves this method from AsmMatcherEmitter.cpp to MipsAsmParser.cpp,
getting rid of the method in all other targets where it generated
warnings about an unused function.
Daniel Berlin [Wed, 8 Feb 2017 02:35:07 +0000 (02:35 +0000)]
CVP: Make CVP iterate in an order that maximizes reuse of LVI cache
Summary:
After the DFS order change for LVI, i have a few testcases that now
take forever.
The TL;DR - This is mainly due to the overdefined cache, but that
requires predicateinfo to fix[1]
In order to maximize reuse of the LVI cache for now, change the order
we iterate in.
This reduces my testcase from 5 minutes to 4 seconds.
I have verified cases like gmic do not get slower.
I am playing with whether the order should be postorder or idf.
[1] In practice, overdefined anywhere should be overdefined
everywhere, so this cache should be global. That also fixes this bug.
The problem, however, is that LVI relies on this cache being filled in
per-block because it wants different values in different blocks due to
precisely the naming issue that predicateinfo fixes. With
predicateinfo, making the cache global works fine on individual
passes, and also resolves this issue.
Marcos Pividori [Wed, 8 Feb 2017 00:03:31 +0000 (00:03 +0000)]
[libFuzzer] Use long long to ensure 64 bits.
We should always use unsigned long long to ensure 64 bits. On Windows, unsigned
long is 4 bytes. This was the reason why value-profile-cmp4.test was failing on
Windows.
Marcos Pividori [Wed, 8 Feb 2017 00:03:26 +0000 (00:03 +0000)]
[libFuzzer] Use custom target instead of list of binaries for tests.
Update cmake to use a custom target TestBinaries instead of a list of targets.
This simplifies cmake, and fix some errors. This way, we don't have to propagate
the values into parents directories. We only need to use add_dependencies.
Marcos Pividori [Wed, 8 Feb 2017 00:03:18 +0000 (00:03 +0000)]
[libFuzzer] Properly use Handle instead of FD on Windows.
For Windows, sanitizers work with Handles, not with posix file descriptors,
because they use the windows-specific API. So we need to convert the fds to
handles before passing them to the sanitizer library.
After this change, close_fd_mask is fixed for Windows (this fix some tests too).
Marcos Pividori [Wed, 8 Feb 2017 00:02:41 +0000 (00:02 +0000)]
[libFuzzer] Properly configure tests for Windows.
This configuration is necessary, and is included in all tests suites.
We need to execute: `config.test_format = lit.formats.ShTest(False)`
Otherwise, lit will try to use bash, which generates many problems.
Marcos Pividori [Wed, 8 Feb 2017 00:02:36 +0000 (00:02 +0000)]
[libFuzzer] Simplify dump_coverage test.
Environment variables are handled differently on Windows. In this case it is not
necessary to use environment variables. So, I simplify the test to work on
Windows.
Marcos Pividori [Wed, 8 Feb 2017 00:02:32 +0000 (00:02 +0000)]
[libFuzzer] Update Load test to work on 32 bits.
We should ensure the size of the variable `a` is 8 bytes. Otherwise, this
generates a stack buffer overflow inside the memcpy call in 32 bits machines.
(We write more bytes than the size of a, when it is 4 bytes)
Sanjoy Das [Tue, 7 Feb 2017 23:59:07 +0000 (23:59 +0000)]
[IRCE] Add a missing invariant check
Currently IRCE relies on the loops it transforms to be (semantically) of
the form:
for (i = START; i < END; i++)
...
or
for (i = START; i > END; i--)
...
However, we were not verifying the presence of the START < END entry
check (i.e. check before the first iteration). We were only verifying
that the backedge was guarded by (i + 1) < END.
Usually this would work "fine" since (especially in Java) most loops do
actually have the START < END check, but of course that is not
guaranteed.
Eric Fiselier [Tue, 7 Feb 2017 22:48:20 +0000 (22:48 +0000)]
[CMake] Fix USE_LLVM_SANITIZER configuration for out-of-tree builds.
Summary:
r291918 changed `HandleLLVMOptions.cmake` to add `-fsanitize-blacklist=<llvm-file>` when `LLVM_USE_SANITIZER=Undefined` is specified. This breaks out-of-tree users of `LLVM_USE_SANITIZER` since that file is not present.
This patch fixes the issue by checking if the file exists first.