Nirav Dave [Mon, 19 Jun 2017 15:32:28 +0000 (15:32 +0000)]
Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
Relanding after fixing selection between truncated and normal store.
Anna Thomas [Mon, 19 Jun 2017 15:23:33 +0000 (15:23 +0000)]
[JumpThreading][LVI] Invalidate LVI information after blocks are merged
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block, when LVI is not provably true for
all of instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.
Use llvm::make_unique to avoid ambiguity with MSVC.
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Florian Hahn [Mon, 19 Jun 2017 10:51:38 +0000 (10:51 +0000)]
[CodeGen] Add generic MacroFusion pass.
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Diana Picus [Mon, 19 Jun 2017 09:40:51 +0000 (09:40 +0000)]
[ARM] GlobalISel: Support G_ICMP for i32 and pointers
Add support throughout the pipeline:
- mark as legal for s32 and pointers
- map to GPRs
- lower to a sequence of instructions, which moves 0 or 1 into the
result register based on the flags set by a CMPrr
We have copied from FastISel a helper function which maps CmpInst
predicates into ARMCC codes. Ideally, we should be able to move it
somewhere that both FastISel and GlobalISel can use.
Max Kazantsev [Mon, 19 Jun 2017 06:24:53 +0000 (06:24 +0000)]
[SCEV] Teach SCEVExpander to expand BinPow
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And this loop have been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
David Blaikie [Mon, 19 Jun 2017 05:34:21 +0000 (05:34 +0000)]
[Doc] Fix getelementptr description about arguments
Section "Arguments" of `getelementptr` [1] says the first argument is a
type, the second argument is a pointer or a vector of pointers, and is
the base address to start from. Update `getelementptr` FAQ [2]
accordingly, based on discussion with David on the mailing list [3].
Craig Topper [Sun, 18 Jun 2017 18:15:41 +0000 (18:15 +0000)]
[APFloat] Move the integerPartWidth constant into APFloatBase. Remove integerPart typedef at file scope and just use the one in APFloatBase everywhere. NFC
Kamil Rytarowski [Sun, 18 Jun 2017 16:52:32 +0000 (16:52 +0000)]
Implement AllocateRWX and ReleaseRWX for NetBSD
Summary:
NetBSD ships with PaX MPROTECT disallowing RWX mappings.
There is a solution to bypass this restriction with double mapping
RX (code) and RW (data) using mremap(2) MAP_REMAPDUP.
The initial mapping must be mmap(2)ed with protection:
PROT_MPROTECT(PROT_EXEC).
This functionality to bypass PaX MPROTECT appeared in NetBSD-7.99.72.
NAKAMURA Takumi [Sat, 17 Jun 2017 03:19:08 +0000 (03:19 +0000)]
[CMake] Introduce LLVM_TARGET_TRIPLE_ENV as an option to override LLVM_DEFAULT_TARGET_TRIPLE at runtime.
No behavior is changed if LLVM_TARGET_TRIPLE_ENV is blank or undefined.
If LLVM_TARGET_TRIPLE_ENV is "TEST_TARGET_TRIPLE" and $TEST_TARGET_TRIPLE is not blank,
llvm::sys::getDefaultTargetTriple() returns $TEST_TARGET_TRIPLE.
Lit resets config.target_triple and config.environment[LLVM_TARGET_TRIPLE_ENV] to change the default target.
Without changing LLVM_DEFAULT_TARGET_TRIPLE nor rebuilding, lit can be run;
Matthias Braun [Sat, 17 Jun 2017 02:08:18 +0000 (02:08 +0000)]
RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Sam Clegg [Fri, 16 Jun 2017 23:59:10 +0000 (23:59 +0000)]
[WebAssembly] Use __stack_pointer global when writing wasm binary
This ensures that symbolic relocations are generated for stack
pointer manipulations.
These relocations are of type R_WEBASSEMBLY_GLOBAL_INDEX_LEB.
This change also adds support for reading relocations of this
type in WasmObjectFile.cpp.
Since its a globally imported symbol this does mean that
the get_global/set_global instruction won't be valid until
the objects are linked that global used in no longer an
imported global.
Zachary Turner [Fri, 16 Jun 2017 23:42:44 +0000 (23:42 +0000)]
[CodeView] Fix random access of type names.
Suppose we had a type index offsets array with a boundary at type index
N. Then you request the name of the type with index N+1, and that name
requires the name of index N-1 (think a parameter list, for example). We
didn't handle this, and we would print something like (<unknown UDT>,
<unknown UDT>).
The fix for this is not entirely trivial, and speaks to a larger
problem. I think we need to kill TypeDatabase, or at the very least kill
TypeDatabaseVisitor. We need a thing that doesn't do any caching
whatsoever, just given a type index it can compute the type name "the
slow way". The reason for the bug is that we don't have anything like
that. Everything goes through the type database, and if we've visited a
record, then we're "done". It doesn't know how to do the expensive thing
of re-visiting dependent records if they've not yet been visited.
What I've done here is more or less copied the code (albeit greatly
simplified) from TypeDatabaseVisitor, but wrapped it in an interface
that just returns a std::string. The logic of caching the name is now in
LazyRandomTypeCollection. Eventually I'd like to move the record
database here as well and the visited record bitfield here as well, at
which point we can actually just delete TypeDatabase. I don't see any
reason for it if a "sequential" collection is just a special case of a
random access collection with an empty partial offsets array.
Yonghong Song [Fri, 16 Jun 2017 23:28:04 +0000 (23:28 +0000)]
bpf: fix a strict-aliasing issue
Davide Italiano reported the following issue if llvm
is compiled with gcc -Wstrict-aliasing -Werror:
.....
lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFISelDAGToDAG.cpp.o
../lib/Target/BPF/BPFISelDAGToDAG.cpp: In member function ‘virtual
void {anonymous}::BPFDAGToDAGISel::PreprocessISelDAG()’:
../lib/Target/BPF/BPFISelDAGToDAG.cpp:264:26: warning: dereferencing
type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
val = *(uint16_t *)new_val;
.....
The error is caused by my previous commit (revision 305560).
This patch fixed the issue by introducing an union to avoid
type casting.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305608 91177308-0d34-0410-b5e6-96231b3b80d8
Craig Topper [Fri, 16 Jun 2017 23:26:23 +0000 (23:26 +0000)]
[ConstantRange] Implement getSignedMin/Max in a less complicated and faster way
Summary: As far as I can tell we should be able to implement these almost the same way we do unsigned, but using signed comparisons and checks for min signed value instead of min unsigned value.
Adrian Prantl [Fri, 16 Jun 2017 22:40:04 +0000 (22:40 +0000)]
Improve the accuracy of variable ranges .debug_loc location lists.
For the following motivating example
bool c();
void f();
bool start() {
bool result = c();
if (!c()) {
result = false;
goto exit;
}
f();
result = true;
exit:
return result;
}
we would previously generate a single DW_AT_const_value(1) because
only the DBG_VALUE in the second-to-last basic block survived
codegen. This patch improves the heuristic used to determine when a
DBG_VALUE is available at the beginning of its variable's enclosing
lexical scope:
- Stop giving singular constants blanket permission to take over the
entire scope. There is still a special case for constants in the
function prologue that we also miight want to retire later.
- Use the lexical scope information to determine available-at-entry
instead of proximity to the function prologue.
After this patch we generate a location list with a more accurate
narrower availability for the constant true value. As a pleasant side
effect, we also generate inline locations instead of location lists
where a loacation covers the entire range of the enclosing lexical
scope.
Measured on compiling llc with four targets this doesn't have an
effect on compile time and reduces the size of the debug info for llc
by ~600K.
[DWARF] Corrected behavior for when no .apple_names section is present in the object.
The verifier should not output any message in such a case.
Added test case with no .apple_name section in the file to verify new functionality.
Made existing test case more specific.
Eric Beckmann [Fri, 16 Jun 2017 21:13:24 +0000 (21:13 +0000)]
Switch external cvtres.exe for llvm's own resource library.
In this patch, I flip the switch in DriverUtils from using the external
cvtres.exe tool to using the Windows Resource library in llvm.
I also fixed a bug where .rsrc sections were marked as discardable
memory and therefore were placed in the wrong order in the final PE.
Furthermore, I modified WindowsResource to write the coff directly to a
memory buffer instead of to file, also had it use the machine types
already declared in COFF.h instead creating my own enum.
Finally, I flipped the switch to allow all unit tests that had
previously run only on windows due to a winres dependency to run
cross-platform.
Anna Thomas [Fri, 16 Jun 2017 21:08:37 +0000 (21:08 +0000)]
[InstCombine] Set correct insertion point for selects generated while folding phis
Summary:
When we fold vector constants that are operands of phi's that feed into select,
we need to set the correct insertion point for the *new* selects that get generated.
The correct insertion point is the incoming block for the phi.
Such cases can occur with patch r298845, which fixed folding of
vector constants, but the new selects could be inserted incorrectly (as the added
test case shows).
Wei Mi [Fri, 16 Jun 2017 20:21:01 +0000 (20:21 +0000)]
[GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit fixes two bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Yonghong Song [Fri, 16 Jun 2017 15:41:16 +0000 (15:41 +0000)]
bpf: avoid load from read-only sections
If users tried to have a structure decl/init code like below
struct test_t t = { .memeber1 = 45 };
It is very likely that compiler will generate a readonly section
to hold up the init values for variable t. Later load of t members,
e.g., t.member1 will result in a read from readonly section.
BPF program cannot handle relocation. This will force users to
write:
struct test_t t = {};
t.member1 = 45;
This is just inconvenient and unintuitive.
This patch addresses this issue by implementing BPF PreprocessISelDAG.
For any load from a global constant structure or an global array of
constant struct, it attempts to
translate it into a constant directly. The traversal of the
constant struct and other constant data structures are similar
to where the assembler emits read-only sections.
Four different unit test cases are also added to cover
different scenarios.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305560 91177308-0d34-0410-b5e6-96231b3b80d8
Yonghong Song [Fri, 16 Jun 2017 15:30:55 +0000 (15:30 +0000)]
bpf: set missing types in insn tablegen file
o This is discovered during my study of 32-bit subregister
support.
o This is no impact on current functionality since we
only support 64-bit registers.
o Searching the web, looks like the issue has been discovered
before, so fix it now.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305559 91177308-0d34-0410-b5e6-96231b3b80d8
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
[TableGen] Do not assume that the first variant is the original pattern
The variant generation for commutative/associative patterns would simply
delete the first output from the list assuming that it was identical to
the original pattern. This does not have to be the case, and a legitimate
variant could actually be removed that way.
[Hexagon] Don't kill live registers when creating mux out of tfr
The second part of r305300: when placing the mux at the later location,
make sure that it won't use any register that was killed between the
two original instructions. Remove any such kills and transfer them to
the mux.
Hiroshi Inoue [Fri, 16 Jun 2017 12:23:04 +0000 (12:23 +0000)]
[MachineBlockPlacement] trivial fix in comments, NFC
- Topologocal is abbreviated as "topo" in comments, but "top" is used in only one comment. Modify it for consistency.
- Capitalize "succ" and "pred" for consistency in one figure.
- Other trivial fixes.
Rui Ueyama [Fri, 16 Jun 2017 02:17:35 +0000 (02:17 +0000)]
Fix msan buildbot.
This patch should fix sanitizer-x86_64-linux-fast bot.
The problem was that the contents of this stream are aligned to 4 byte,
and the paddings were created just by incrementing `Offset`, so paddings
had undefined values. When the entire stream is written to an output,
it triggered msan.
Evgeniy Stepanov [Fri, 16 Jun 2017 00:18:29 +0000 (00:18 +0000)]
[cfi] CFI-ICall for ThinLTO.
Implement ControlFlowIntegrity for indirect function calls in ThinLTO.
Design follows the RFC in llvm-dev, see
https://groups.google.com/d/msg/llvm-dev/MgUlaphu4Qc/kywu0AqjAQAJ
[libFuzzer] change the default max_len from 64 to 4096. This will affect cases where libFuzzer is run w/o initial corpus or with a corpus of very small items.