Summary:
isFusion returns true if the subtarget supports any kind of instruction
fusion, similar to ARMSubtarget::isFusion. This was suggested in D34142.
This changes the current behavior slightly, because the macro fusion mutation
is now added to the PostRA MachineScheduler in case the subtarget supports
any kind of fusion. I think that makes sense because if the PostRA
MachineScheduler is run, there is potential that instructions scheduled back to
back are re-scheduled.
Simon Dardis [Wed, 12 Jul 2017 19:47:45 +0000 (19:47 +0000)]
[mips][mt][6/7] Add support for mftr, mttr instructions.
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
Adrian McCarthy [Wed, 12 Jul 2017 19:38:11 +0000 (19:38 +0000)]
[PDB] Enable NativeSession to create symbols for built-in types on demand
Summary:
There is a reserved range of type indexes for built-in types (like integers).
This will create a symbol for a built-in type if the caller askes for one by
type index. This is also plumbing for being able to recall symbols by type
index in general, but user-defined types will come in subsequent patches.
Daniel Neilson [Wed, 12 Jul 2017 19:24:07 +0000 (19:24 +0000)]
Fix to web assembly lib call list
Summary:
Revision 307796 caused an internal build break in WebAssembly bots in the form of a
crash. ex:
Here's the crash dump from one of the failing tests:
Command Output (stderr):
--
Stack dump:
0. Program arguments: build/default/./bin/llc -asm-verbose=false -disable-wasm-fallthrough-return-opt -disable-wasm-explicit-locals
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'WebAssembly Assembly Printer' on function '@call_memcpy'
FileCheck error: '-' is empty.
FileCheck command line: build/default/./bin/FileCheck src/test/CodeGen/WebAssembly/global.ll
The problem is in lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp. There’s an array declared:
545 static const char *
Fix to web assembly lib call list
Summary:
Revision 307796 caused an internal build break in WebAssembly bots in the form of a
crash. ex:
Here's the crash dump from one of the failing tests:
Command Output (stderr):
--
Stack dump:
0. Program arguments: build/default/./bin/llc -asm-verbose=false -disable-wasm-fallthrough-return-opt -disable-wasm-explicit-locals
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'WebAssembly Assembly Printer' on function '@call_memcpy'
FileCheck error: '-' is empty.
FileCheck command line: build/default/./bin/FileCheck src/test/CodeGen/WebAssembly/global.ll
The problem is in lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp. There’s an array declared:
static const char *
RuntimeLibcallNames[RTLIB::UNKNOWN_LIBCALL] = {
That is defining a runtime lib call name for each entry in the enum RTLIB:Libcall from include/llvm/CodeGen/RuntimeLibcalls.h.
Revision 307796 added entries to the enum, but didn’t add entries to the RuntimeLibcallNames array, which caused a crash when attempting
to access past the end of the array.
This patch fixes the issue by adding the element atomic memmove to the WebAssembly arrays.
Use std::mutex to avoid memory allocation after OOM
ManagedStatic<sys::Mutex> would lazilly allocate a sys::Mutex to lock
when reporting an OOM, which is a bad idea.
The three STL implementations that I know of use pthread_mutex_lock and
EnterCriticalSection to implement std::mutex. I'm pretty sure that
neither of those allocate heap memory.
It seems that we unconditionally use std::mutex without testing
LLVM_ENABLE_THREADS elsewhere in the codebase, so this should be
portable.
George Karpenkov [Wed, 12 Jul 2017 18:16:09 +0000 (18:16 +0000)]
[libFuzzer] NFC Declare LIBFUZZER_FLAGS_BASE outside of an if-block
The current code relies on the assumption that tests are included only
if LLVM_USE_SANITIZE_COVERAGE is enabled.
This commit makes it easier to relax the assumption in the future, as
the variable LIBFUZZER_FLAGS_BASE is used further in libFuzzer tests.
[x86] improve SBB optimizations for SETB/SETA with subtract
This is another step towards removing a combine that turns sext
into select of constants and preparing the backend for an IR
future where select is the canonical form.
Earlier commits in this area:
https://reviews.llvm.org/rL306040
https://reviews.llvm.org/rL306072
https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652)
https://reviews.llvm.org/rL307471
GlobalISel: Handle selection of G_IMPLICIT_DEF in AArch64
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the note to a regular IMPLICIT_DEF.
Uses -codegenprepare instead of -instcombine since we hit the same
buggy path anyway, and CGP lets us keep this test really simple
(instcombine likes turning the alloca T, N into alloca [N x T], which
hides the bug this is testing for).
Daniel Neilson [Wed, 12 Jul 2017 15:25:26 +0000 (15:25 +0000)]
Add element atomic memmove intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
John Brawn [Wed, 12 Jul 2017 13:23:10 +0000 (13:23 +0000)]
[ARM] Adjust ifcvt heuristic for the diamond ifcvt case
When we have a diamond ifcvt the fallthough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.
[Linker] Add directives to support mixing ARM/Thumb module-level inline asm.
Summary:
By prepending `.text .thumb .balign 2` to the module-level inline
assembly from a Thumb module, the assembler will generate the assembly
from that module as Thumb, even if the destination module uses an ARM
triple. Similar directives are used for module-level inline assembly in
ARM modules.
The alignment and instruction set are reset based on the target triple
before emitting the first function label.
Reviewers: olista01, tejohnson, echristo, t.p.northover, rafael
Refactor CmpHelper into something simpler. It was overkill to use
templates for this - instead, use a simple CmpConstants structure to
hold the opcodes and other constants that are different when selecting
int / float / double comparisons. Also, extract some of the helpers that
were in CmpHelper into ARMInstructionSelector and make use of some of
them when selecting other things than just compares.
[PM] Fix a silly bug in my recent update to the CG update logic.
I used the wrong variable to update. This was even covered by a unittest
I wrote, and the comments for the unittest were correct (if confusing)
but the test itself just matched the buggy behavior. =[
[X86] Synchronize the ProcessorFeatures enum used by getHostCPUName with the enum in libgcc and soon compiler-rt.
This adds all the feature bits libgcc has. They will soon be added to compiler-rt as well. This adds a second 32 bit feature variable to hold the bits that are needed by getHostCPUName that are not in libgcc. libgcc had already used 31 of the 32 bits in the existing variable and we needed 3 bits so at minimum 2 bits would spill over. I chose to move all 3.
Mikael Holmen [Wed, 12 Jul 2017 06:19:10 +0000 (06:19 +0000)]
[MemoryBuiltins] Allow truncation in visitAllocaInst()
Summary:
Solves PR33689.
If the pointer size is less than the size of the type used for the array
size in an alloca (the <ty> type below) then we could trigger the assert in
the PR. In that example we have pointer size i16 and <ty> is i32.
LowerTypeTests: When importing functions skip definitions where the summary contains a decl.
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.
Sam Clegg [Wed, 12 Jul 2017 00:24:54 +0000 (00:24 +0000)]
[WebAssembly] Expose the offset of each data segment
Summary:
This allows tools like lld that process relocations
to apply data relocation correctly. This information
is required because relocation are stored as section
offset.
Avoid duplicating DictScope with hand-written names everywhere. Print
the S_-prefixed symbol kind for every record. This should make it easier
to search for certain kinds of records when debugging PDB linking.
Petr Hosek [Tue, 11 Jul 2017 23:41:15 +0000 (23:41 +0000)]
[CMake] Support multi-target runtimes build
This changes adds support for building runtimes for multiple
different targets using LLVM runtimes directory.
The implementation follow the model used already by the builtins
build which already supports this option. To specify the runtimes
targets to be built, use the LLVM_RUNTIME_TARGETS variable, where
the valuae is the list of targets to build runtimes for. To pass
a per target variable to the runtimes build, you can set
RUNTIMES_<target>_<variable> where <variable> will be passed to the
runtimes build for <target>.
Each runtime target (except for the default one) will be installed
into lib/<target> subdirectory. Build targets will be suffixed with
the target name.
Davide Italiano [Tue, 11 Jul 2017 23:10:17 +0000 (23:10 +0000)]
[IPO] Temporarily rollback r307215.
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
Jakub Kuderski [Tue, 11 Jul 2017 22:55:04 +0000 (22:55 +0000)]
[Dominators] Use a custom DFS implementation
Summary:
Custom DFS implementation allows us to skip over certain nodes without adding them to the visited map, which is not easily doable with llvm's dfs iterators. What's more, caching predecessors becomes easy.
This patch implements a single DFS function (template) for both forward and reverse DFS, which should be easier to maintain then separate two ones.
Skipping over nodes based on a predicate will be necessary later to implement incremental updates.
There also seems to be a very slight performance improved when bootstrapping clang with this patch on my machine (3:28s -> 3:26s) .
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Martin Storsjo [Tue, 11 Jul 2017 21:07:10 +0000 (21:07 +0000)]
[ARM, ELF] Don't shift movt relocation offsets
For ELF, a movw+movt pair is handled as two separate relocations.
If an offset should be applied to the symbol address, this offset is
stored as an immediate in the instruction (as opposed to stored as an
offset in the relocation itself).
Even though the actual value stored in the movt immediate after linking
is the top half of the value, we need to store the unshifted offset
prior to linking. When the relocation is made during linking, the offset
gets added to the target symbol value, and the upper half of the value
is stored in the instruction.
This makes sure that movw+movt with offset symbols get properly
handled, in case the offset addition in the lower half should be
carried over to the upper half.
This makes the output from the additions to the test case match
the output from GNU binutils.
For COFF and MachO, the movw/movt relocations are handled as a pair,
and the overflow from the lower half gets carried over to the movt,
so they should keep the shifted offset just as before.
Dan Liew [Tue, 11 Jul 2017 18:27:48 +0000 (18:27 +0000)]
[LibFuzzer] Fix `-Wpedantic` warning reported by Eric Christopher.
The warning is reproducible with GCC 4.8. Thanks to David Blaikie for
the suggested fix.
The reported warning was
```
/usr/local/google/home/echristo/sources/llvm/lib/Fuzzer/FuzzerExtFunctions.def:29:10: warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Wpedantic]
EXT_FUNC(__lsan_enable, void, (), false);
^
/usr/local/google/home/echristo/sources/llvm/lib/Fuzzer/FuzzerExtFunctionsWeak.cpp:44:24: note: in definition of macro ‘EXT_FUNC’
CheckFnPtr((void *)::NAME, #NAME, WARN);
^
```
[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
Base test for avx512
adding new base test to trunk befor commit change on the test
Anna Thomas [Tue, 11 Jul 2017 17:16:33 +0000 (17:16 +0000)]
[LoopUnrollRuntime] Avoid multi-exit nested loop with epilog generation
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
[Support] - Add bad alloc error handler for handling allocation malfunctions
Summary:
Patch by Klaus Kretzschmar
We would like to introduce a new type of llvm error handler for handling
bad alloc fault situations. LLVM already provides a fatal error handler
for serious non-recoverable error situations which by default writes
some error information to stderr and calls exit(1) at the end (functions
are marked as 'noreturn').
For long running processes (e.g. a server application), exiting the
process is not an acceptable option, especially not when the system is
in a temporary resource bottleneck with a good chance to recover from
this fault situation. In such a situation you would rather throw an
exception to stop the current compilation and try to overcome the
resource bottleneck. The user should be aware of the problem of throwing
an exception in bad alloc situations, e.g. you must not do any
allocations in the unwind chain. This is especially true when adding
exceptions in existing unfamiliar code (as already stated in the comment
of the current fatal error handler)
So the new handler can also be used to distinguish from general fatal
error situations where recovering is no option. It should be used in
cases where a clean unwind after the allocation is guaranteed.
This patch contains:
- A report_bad_alloc function which calls a user defined bad alloc
error handler. If no user handler is registered the
report_fatal_error function is called. This function is not marked as
'noreturn'.
- A install/restore_bad_alloc_error_handler to install/restore the bad
alloc handler.
- An example (in Mutex.cpp) where the report_bad_alloc function is
called in case of a malloc returns a nullptr.
If this patch gets accepted we would create similar patches to fix
corresponding malloc/calloc usages in the llvm code.
Tony Jiang [Tue, 11 Jul 2017 16:42:20 +0000 (16:42 +0000)]
[PPC] Fix two bugs in frame lowering.
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Summary:
This speeds up the LLD test suite on Windows by 3x. Most of the time is
spent on lld/test/ELF/linkerscript/diagnostics.s, which repeatedly
constructs linker scripts with appending echo commands.
Philip Pfaffe [Tue, 11 Jul 2017 11:17:44 +0000 (11:17 +0000)]
[PM] Another post-commit fix in NewPMDriver
There were two errors in the parsing of opt's command line options for
extension point pipelines. The EP callbacks are not supposed to return a
value. To check the pipeline text for correctness, I now try to parse it
into a temporary PM object, and print a message on failure. This solves
the compile time error for the lambda return type, as well as correctly
handles unparsable pipelines now.
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).
Daniel Sanders [Tue, 11 Jul 2017 10:40:18 +0000 (10:40 +0000)]
[globalisel][tablegen] Fix an multi-insn match bug where ComplexPattern is used on multiple insns.
In each rule, each use of ComplexPattern is assigned an element in the Renderers
array. The matcher then collects renderer functions in this array and they are
used to render instructions. This works well for a single instruction but a
bug in the allocation mechanism causes the elements to be assigned on a
per-instruction basis rather than a per-rule basis.
So in the case of:
(set GPR32:$dst, (Op complex:$src1, complex:$src2))
tablegen currently assigns elements 0 and 1 to $src1 and $src2 respectively,
but for:
(set GPR32:$dst, (Op complex:$src1, (Op complex:$src2)))
it currently assigned both $src1 and $src2 the same element (0). This results in
one complex operand being rendered twice and the other being forgotten.
This patch corrects the allocation such that $src1 and $src2 are still allocated
different elements in this case.
Peter Smith [Tue, 11 Jul 2017 09:47:12 +0000 (09:47 +0000)]
[ARM] ldr pc,=expression should be allowed in Thumb2
This change allows the pc to be used as a destination register for the
pseudo instruction LDR pc,=expression . The pseudo instruction must not be
transformed into a MOV, but it can use the Thumb2 LDR (literal) instruction
to a constant pool entry. See (A7.7.43 from ARMv7M ARM ARM).
Daniel Sanders [Tue, 11 Jul 2017 08:57:29 +0000 (08:57 +0000)]
[globalisel][tablegen] Correct matching of intrinsic ID's.
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
Revert Revert [MBP] do not rotate loop if it creates extra branch
This is a second attempt to land this patch.
The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
Best Exit has next element in chain as successor.
Last element in chain is not a predecessor of first element of chain.
[CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.
However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.