[PM] Defend against getting slightly wrong template arguments passed
into CRTP base classes.
This can sometimes happen and not cause an immediate failure when the
derived class is, itself, a template. You can end up essentially calling
methods on the wrong derived type, but one where many things will
appear to "work".
To fail fast and with a clear error message we can use a static_assert,
but we have to stash that static_assert inside a method body or nested
type that won't need to be completed while building the base class. I've
tried to pick a reasonably small number of places that seemed like they
would definitely get triggered on use.
This is the last of the patch series defending against this that I have
planned; so far, no bugs other than the original have been found.
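In sketch form, the pattern looks like this (illustrative names, not the
committed code): the static_assert sits inside a method body, so it is only
checked once the method is actually instantiated.
```
#include <type_traits>

// Minimal sketch of the defensive pattern; names are illustrative.
template <typename DerivedT> struct CRTPBase {
  DerivedT &derived() {
    // Checked only when derived() is instantiated, so DerivedT may be
    // incomplete while the base class itself is being built.
    static_assert(std::is_base_of<CRTPBase, DerivedT>::value,
                  "Must pass the derived type to this template!");
    return static_cast<DerivedT &>(*this);
  }
};

// A templated derived class: passing a slightly wrong template argument
// here can otherwise appear to "work" until the wrong method is called.
template <typename T> struct Widget : CRTPBase<Widget<T>> {};
```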
[IR/Analysis] Defend against getting slightly wrong template arguments
passed into CRTP base classes.
This can sometimes happen and not cause an immediate failure when the
derived class is, itself, a template. You can end up essentially calling
methods on the wrong derived type, but one where many things will
appear to "work".
To fail fast and with a clear error message we can use a static_assert,
but we have to stash that static_assert inside a method body or nested
type that won't need to be completed while building the base class. I've
tried to pick a reasonably small number of places where this seems
likely to be reliably instantiated.
[ADT] Defend against getting slightly wrong template arguments passed
into CRTP base classes.
This can sometimes happen and not cause an immediate failure when the
derived class is, itself, a template. You can end up essentially calling
methods on the wrong derived type, but one where many things will
appear to "work".
To fail fast and with a clear error message we can use a static_assert,
but we have to stash that static_assert inside a method body or nested
type that won't need to be completed while building the base class. I've
tried to pick a reasonably small number of places where this seems
likely to be reliably instantiated.
Revert r293017 and fix the actual underlying issue.
The patch committed in r293017, as discussed on the list, doesn't really
make sense but was causing an actual issue to go away.
The issue turns out to be that in one place the extra template arguments
were dropped from the OuterAnalysisManagerProxy. This in turn caused the
types used in one set of places to access the key to be completely
different from the types used in another set of places for both Loop and
CGSCC cases where there are extra arguments.
I have literally no idea how anything seemed to work with this bug in
place. It blows my mind. But it did, except for mingw64 in a DLL build.
I've added a really handy static assert that helps ensure we don't break
this in the future. It immediately diagnoses the issue with a compile
failure and a very clear error message. Much better than staring at
backtraces on a build bot. =]
Yaxun Liu [Tue, 7 Feb 2017 00:43:21 +0000 (00:43 +0000)]
[AMDGPU] Lower null pointers in static variable initializer
For amdgcn target Clang generates addrspacecast to represent null pointers in private and local address spaces.
In LLVM codegen, the static variable initializer is lowered by the virtual function AsmPrinter::lowerConstant, which is target generic. Since addrspacecast is target specific, AsmPrinter::lowerConstant is unable to handle it.
This patch overrides AsmPrinter::lowerConstant with AMDGPUAsmPrinter::lowerConstant, which is able to lower the target-specific addrspacecast in the null pointer representation so that -1 is correctly emitted.
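A hedged sketch of the shape of such an override, shown as a free function with illustrative names (the committed code additionally checks the specific source and destination address spaces):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"

using namespace llvm;

// Illustrative helper: recognize an addrspacecast of a null pointer and
// emit all-ones; return null to fall back to AsmPrinter::lowerConstant.
static const MCExpr *lowerNullPtrCast(const Constant *CV, MCContext &Ctx) {
  if (const auto *CE = dyn_cast<ConstantExpr>(CV))
    if (CE->getOpcode() == Instruction::AddrSpaceCast &&
        CE->getOperand(0)->isNullValue())
      return MCConstantExpr::create(-1, Ctx);
  return nullptr;
}
```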
Philip Reames [Tue, 7 Feb 2017 00:25:24 +0000 (00:25 +0000)]
[LVI] Switch from BFS to DFS exploration order
This patch changes the order in which LVI explores previously unexplored paths.
Previously, the code used a BFS strategy where each unexplored input was added to the search queue before any of them were explored. This has the effect of causing all inputs to be explored before returning to re-evaluate the merge point (non-local or phi node). This has the unfortunate property of doing redundant work if one of the inputs to the merge is found to be overdefined (i.e. unanalysable). If any input is overdefined, the result of the merge will be too, regardless of the values of the other inputs.
The new code uses a DFS strategy where we re-evaluate the merge after evaluating each input. If we discover an overdefined input, we immediately return without exploring other inputs.
We have reports of large (4-10x) improvements of compile time with this patch and some reports of more precise analysis results as well. See the review discussion for details. The original motivating case was pr10584.
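A toy sketch of the DFS-style evaluation order, using an illustrative three-point lattice rather than LVI's actual types: re-check the merge after each input and stop as soon as one is overdefined.
```
#include "llvm/ADT/ArrayRef.h"

// Illustrative lattice standing in for LVI's lattice values.
enum class LatticeVal { Unknown, Constant, Overdefined };

LatticeVal mergeInputs(llvm::ArrayRef<LatticeVal> Inputs) {
  LatticeVal Result = LatticeVal::Unknown;
  for (LatticeVal In : Inputs) {
    // Re-evaluate the merge after each input; an overdefined input
    // forces an overdefined result, so skip the remaining inputs.
    if (In == LatticeVal::Overdefined)
      return LatticeVal::Overdefined;
    if (In > Result)
      Result = In; // simplistic join, purely for illustration
  }
  return Result;
}
```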
Dehao Chen [Mon, 6 Feb 2017 23:33:15 +0000 (23:33 +0000)]
Fix the samplepgo indirect call promotion bug: we should not promote a direct call.
Summary: Checking CS.getCalledFunction() == nullptr does not necessarily indicate an indirect call. We also need to check that CS.getCalledValue() is not a constant.
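The corrected predicate amounts to something like this sketch (hypothetical helper name):
```
#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constants.h"

using namespace llvm;

// A call is only an indirect-call promotion candidate when there is no
// statically known callee and the called value is not a constant
// (e.g. a bitcast of a known function).
static bool isPromotableIndirectCall(CallSite CS) {
  return CS.getCalledFunction() == nullptr &&
         !isa<Constant>(CS.getCalledValue());
}
```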
Taewook Oh [Mon, 6 Feb 2017 22:05:04 +0000 (22:05 +0000)]
[GVNHoist] Merge DebugLoc metadata on hoisted instructions
Summary:
When instructions are hoisted, the current implementation keeps the DebugLoc metadata of the instruction chosen as Repl (and of its GEP operand if Repl is a load or a store). However, the DebugLoc metadata should be updated to the 'merged' location across all hoisted instructions. See the following example code:
```
1: typedef struct {
2: int a[10];
3: } S1;
4:
5: extern S1 *s1[10];
6:
7: void foo(int x, int y, int i) {
8: if (y)
9: s1[i]->a[i] = x + y;
10: else
11: s1[i]->a[i] = x;
12: }
```
Below is the LLVM IR representation of the program before gvn-hoist:
```
%struct.S1 = type { [10 x i32] }
@s1 = external local_unnamed_addr global [10 x %struct.S1*], align 16
```
As you can see, the loads and their GEPs have been hoisted from the if.then/if.else blocks to the entry block. However, the DebugLoc metadata of these new instructions is still the same as that of the instructions in the if.then block, as they were moved/cloned from the if.then block. This may result in incorrect stepping and imprecise sample profile results.
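A minimal sketch of the merging step, assuming DILocation::getMergedLocation (which LLVM provides for exactly this purpose); the helper name is illustrative:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Fold every hoisted instruction's location into Repl's, yielding a
// location that is conservatively correct for all of them (possibly an
// unknown location when they disagree).
static void applyMergedDebugLoc(Instruction *Repl,
                                ArrayRef<Instruction *> Hoisted) {
  const DILocation *Merged = Repl->getDebugLoc();
  for (Instruction *I : Hoisted)
    Merged = DILocation::getMergedLocation(Merged, I->getDebugLoc());
  Repl->setDebugLoc(Merged);
}
```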
Tim Northover [Mon, 6 Feb 2017 21:57:06 +0000 (21:57 +0000)]
GlobalISel: fall back gracefully when we can't map an operand's size.
AArch64 was asserting when it was asked to provide a register-bank of a size it
couldn't deal with (in this case an s128 IMPLICIT_DEF). But we want a robust
fallback path so this isn't allowed.
[SCEV] Scale back the test added in r294181 as it goes quadratic in
SCEV.
This test was immediately the slowest test in 'check-llvm' even in an
optimized build and was driving up the total test time by 50% for me.
Sanjoy has filed a PR about the quadratic behavior in SCEV but it is
also concerning that the test still passes given that r294181 added
a threshold at 32 to SCEV. I've followed up on the original patch to
figure out how this test should work long-term, but for now I want to
get check-llvm to be fast again.
IR: Consider two DISubprograms to be odr-equal if they have the same template parameters.
In ValueMapper we create new operands for MDNodes and
rely on MDNode::replaceWithUniqued to create a new MDNode
with the specified operands. However this doesn't always
actually happen correctly for DISubprograms because when we
uniquify the new node, we only odr-compare it with existing nodes
(MDNodeSubsetEqualImpl<DISubprogram>::isDeclarationOfODRMember). Although
the TemplateParameters field can refer to a distinct DICompileUnit via
DITemplateTypeParameter::type -> DICompositeType::scope -> DISubprogram::unit,
it is not currently included in the odr comparison. As a result, we can end
up getting our original DISubprogram back, which means we will have a cloned
module referring to the DICompileUnit in the original module, which causes
a verification error.
The fix I implemented was to consider TemplateParameters to be one of the
odr-equal properties. But I'm a little uncomfortable with this. In general it
seems unsound to rely on distinct MDNodes never being reachable from nodes
which we only check odr-equality of. My only long term suggestion would be
to separate odr-uniquing from full uniquing.
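In sketch form, the change makes the ODR comparison consider the template parameters as well (hypothetical shape, using the raw operand accessors; the real check lives in MDNodeSubsetEqualImpl<DISubprogram>):
```
#include "llvm/IR/DebugInfoMetadata.h"

using namespace llvm;

// Illustrative ODR-equality check: two subprogram declarations are only
// the same ODR member if their scope, linkage name, and now their
// template parameters match (raw operands compare by uniqued pointer
// identity).
static bool isSameODRMember(const DISubprogram *LHS, const DISubprogram *RHS) {
  return LHS->getRawScope() == RHS->getRawScope() &&
         LHS->getRawLinkageName() == RHS->getRawLinkageName() &&
         LHS->getRawTemplateParams() == RHS->getRawTemplateParams();
}
```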
Kevin Enderby [Mon, 6 Feb 2017 21:01:08 +0000 (21:01 +0000)]
Fix a bug in llvm-objdump(1) with the -macho and -info-plist options
which caused it to print more than the (__TEXT,__info_plist) section if that
section did not end with a null byte.
David Blaikie [Mon, 6 Feb 2017 20:19:02 +0000 (20:19 +0000)]
Get function start line number from DWARF info
DWARF info contains the line number at which a function starts (DW_AT_decl_line).
This patch creates a function to look up the start line number for a function, and returns it in
DILineInfo when looking up debug info for a particular address.
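A hedged sketch of reading the attribute off a subprogram DIE (illustrative helper; the committed change surfaces the value through DILineInfo):
```
#include "llvm/DebugInfo/DWARF/DWARFDie.h"

// Illustrative helper: pull DW_AT_decl_line from a DW_TAG_subprogram DIE,
// returning 0 when the attribute is absent.
static unsigned getFunctionStartLine(const llvm::DWARFDie &SubprogramDie) {
  if (auto FormValue = SubprogramDie.find(llvm::dwarf::DW_AT_decl_line))
    if (auto Line = FormValue->getAsUnsignedConstant())
      return static_cast<unsigned>(*Line);
  return 0;
}
```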
Refactor a helper function, FactorNodes, to search for a push node in constant space. This resolves a problem in a not-yet-upstreamed backend where a recursive pattern blew the call stack (at a depth of 255) under a debug build of tablegen. No functional change so no new test coverage. The change is minimal to avoid disturbing existing behaviour.
[PM/LCG] Remove the lazy RefSCC formation from the LazyCallGraph during
iteration.
The lazy formation of RefSCCs isn't really the most important part of
the laziness here -- that has to do with walking the functions
themselves -- and isn't essential to maintain. Originally, there were
incremental update algorithms that relied on updates happening
predominantly near the most recent RefSCC formed, but those have been
replaced with ones that have much tighter general case bounds at this
point. We do still perform asserts that only scale well due to this
incrementality, but those are easy to place behind EXPENSIVE_CHECKS.
Removing this simplifies the entire analysis by having a single up-front
step that builds all of the RefSCCs in a direct Tarjan walk. We can even
easily replace this with other or better algorithms at will and with
much less confusion now that there is no iterator-based incremental
logic involved. This removes a lot of complexity from LCG.
Another advantage of moving in this direction is that it simplifies
testing the system substantially as we no longer have to worry about
observing and mutating the graph half-way through the RefSCC formation.
We still need a somewhat special iterator for RefSCCs because we want
the iterator to remain stable in the face of graph updates. However,
this now merely involves relative indexing to the current RefSCC's
position in the sequence which isn't too hard.
Changes include:
- Updates to the instruction descriptor flags.
- Improvements to the packet shuffler and checker.
- Updates to the handling of certain relocations.
- Better handling of duplex instructions.
Kevin Enderby [Mon, 6 Feb 2017 18:43:18 +0000 (18:43 +0000)]
Fix a bug in llvm-objdump(1) with the -macho and -disassemble options
which caused it to not disassemble the bytes at the start of the section if
the section had symbols and the first symbol was not at the start of the
section.
Zachary Turner [Mon, 6 Feb 2017 18:31:21 +0000 (18:31 +0000)]
[Support] Add support for runtime endian values.
Endian functions only support reading and writing when the
endianness is known at compile time. This patch adds overloads
where the endianness is a runtime value, and then delegates the
compile-time versions to the runtime versions.
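In sketch form, a runtime-endian overload can dispatch to the existing compile-time readers (the committed patch layers it the other way around, with the compile-time versions delegating to the runtime ones):
```
#include "llvm/Support/Endian.h"

using namespace llvm::support;

// Illustrative runtime-endian read on top of the compile-time readers.
template <typename T> T readValue(const void *Memory, endianness Endian) {
  return Endian == little ? endian::read<T, little, unaligned>(Memory)
                          : endian::read<T, big, unaligned>(Memory);
}
```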
Dehao Chen [Mon, 6 Feb 2017 18:10:36 +0000 (18:10 +0000)]
Fix the bug of samplepgo indirect call promotion when type casting of the return value is needed.
Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changes it to return the direct call instruction instead.
John Brawn [Mon, 6 Feb 2017 18:07:20 +0000 (18:07 +0000)]
[AArch64] Fix incorrect MachinePointerInfo in splitStoreSplat
When splitting up one store into several in splitStoreSplat we have to
make sure we get the MachinePointerInfo right, otherwise alias
analysis thinks they all store to the same location. This can then
cause invalid scheduling later on.
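The shape of the fix, as a hypothetical helper: derive each narrow store's pointer info from the original with the piece's offset, instead of reusing the base info for every piece.
```
#include "llvm/CodeGen/MachineMemOperand.h"

// Illustrative: offset the original MachinePointerInfo per split piece so
// alias analysis sees distinct, correctly offset locations.
static llvm::MachinePointerInfo pieceInfo(const llvm::MachinePointerInfo &Base,
                                          int64_t PieceOffset) {
  return Base.getWithOffset(PieceOffset);
}
```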
Amaury Sechet [Mon, 6 Feb 2017 16:21:41 +0000 (16:21 +0000)]
Commit full codegen for mul-i256.ll. NFC
The full codegen is committed for the larger multiply so that it won't make the test suite more fragile. However, it will help expose the effects of various DAG combines.
Simon Pilgrim [Mon, 6 Feb 2017 13:44:45 +0000 (13:44 +0000)]
[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.
We don't take into account that in some cases we will already have combined all of the users while recursively calling up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all of their users are in that list.
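The bookkeeping amounts to something like this sketch (hypothetical helper name):
```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// A multi-use shuffle may still be combined if every one of its users
// has already been folded away by this recursive combine.
static bool canCombineShuffle(SDNode *N,
                              const SmallPtrSetImpl<SDNode *> &CombinedNodes) {
  if (N->hasOneUse())
    return true;
  for (SDNode *User : N->uses())
    if (!CombinedNodes.count(User))
      return false;
  return true;
}
```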
Simon Dardis [Mon, 6 Feb 2017 12:43:46 +0000 (12:43 +0000)]
[mips] dla expansion without the at register
Previously only the superscalar scheduled expansion of the dla macro for
MIPS64 was implemented. If the assembler temporary register is not available
and the optional source register is not the destination register, synthesize
the address using the naive solution of adds and shifts.
Daniil Fukalov [Mon, 6 Feb 2017 12:38:06 +0000 (12:38 +0000)]
[SCEV] limit recursion depth and operands number in getAddExpr
For quite a big function with source like
%add = add nsw i32 %mul, %conv
%mul1 = mul nsw i32 %add, %conv
%add2 = add nsw i32 %mul1, %add
%mul3 = mul nsw i32 %add2, %add
; repeat a couple of thousand times
that can be produced by loop unrolling, getAddExpr() tries to recursively construct the SCEV and runs for an almost unbounded time.
Added a recursion depth restriction (with a new parameter to set it).
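The shape of the change, sketched with hypothetical names (the committed patch threads an analogous depth parameter through ScalarEvolution::getAddExpr and caps it with a cl::opt; the flag name and default here are assumptions):
```
#include "llvm/Support/CommandLine.h"
#include <cstdint>

// Hypothetical flag; the committed parameter and default may differ.
static llvm::cl::opt<unsigned>
    MaxDepthSketch("sketch-addexpr-depth", llvm::cl::init(32),
                   llvm::cl::desc("Recursion cap for the sketch below"));

// Depth-limited recursion: every recursive call passes Depth + 1, and
// hitting the cap returns a conservative, unsimplified answer instead
// of recursing further.
static uint64_t simplifyDepthLimited(uint64_t Value, unsigned Depth = 0) {
  if (Depth > MaxDepthSketch)
    return Value; // bail out: no further folding
  if (Value != 0 && Value % 2 == 0)
    return simplifyDepthLimited(Value / 2, Depth + 1);
  return Value;
}
```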
Igor Breger [Mon, 6 Feb 2017 08:37:41 +0000 (08:37 +0000)]
[X86][GlobalISel] Add limited ret lowering support to the IRTranslator.
Summary:
Support return lowering for i8/i16/i32/i64/float/double; vector types are supported on 64-bit platforms only.
Support argument lowering for float/double types.
Simon Pilgrim [Sun, 5 Feb 2017 22:50:29 +0000 (22:50 +0000)]
[X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so as to avoid an unnecessary GPR value and insertion into a vector.
Revamp llvm::once_flag to be closer to std::once_flag
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.
This makes porting LLDB to NetBSD easier as there was no portable way in the original approach to specify a non-static once_flag. With this change, translating std::once_flag to llvm::once_flag is mechanical.
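Usage mirrors the standard interface; in particular, the flag no longer has to be a function-local static, so it can live in a member (a minimal sketch with illustrative names):
```
#include "llvm/Support/Threading.h"

// Illustrative type: a lazily initialized cache with a non-static flag.
struct LazyCache {
  llvm::once_flag InitFlag; // usable like std::once_flag
  void ensureInitialized() {
    llvm::call_once(InitFlag, [this] { populate(); });
  }
  void populate();
};
```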
Craig Topper [Sun, 5 Feb 2017 18:33:14 +0000 (18:33 +0000)]
[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later.
Similar was already done for several other shuffles in this function.
The test changes are because the old code used explicit zeroing for elements that could have been undef.
While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.
Geoff Berry [Sun, 5 Feb 2017 18:28:14 +0000 (18:28 +0000)]
[SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.
Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.
Amaury Sechet [Sun, 5 Feb 2017 14:22:20 +0000 (14:22 +0000)]
[DAGCombiner] Leverage add's commutativity
Summary: This avoids the need to duplicate all patterns and actually ends up exposing some opportunities to optimize existing patterns that did not exist in both directions on an existing test case.
Dylan McKay [Sun, 5 Feb 2017 09:53:45 +0000 (09:53 +0000)]
[AVR] Support zero-sized arguments in defined methods
It is sufficient to skip emission of these arguments as we have nothing
to actually pass through the function call.
The AVR-GCC reference has nothing to say about zero-sized arguments,
presumably because C/C++ doesn't support them. This means we don't have
to worry about ABI differences.
Craig Topper [Sat, 4 Feb 2017 23:26:34 +0000 (23:26 +0000)]
[DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.