Sanjoy Das [Sat, 19 Dec 2015 22:40:28 +0000 (22:40 +0000)]
Nonnull elements in OperandBundleCallSites are not all Instructions
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses). This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard for this situation in eventuality
in `llvm::InlineFunction`.
Keno Fischer [Sat, 19 Dec 2015 03:32:23 +0000 (03:32 +0000)]
Hopefully fix debug-info-blocks.ll test on win32 bot
llc_dwarf adds an mtriple, which forces this to use COFF, causing
the test to fail. Hopefully using regular llc without the triple
will work fine everywhere
Tom Stellard [Sat, 19 Dec 2015 02:54:15 +0000 (02:54 +0000)]
AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shaders
Summary:
The analysis of shader inputs was completely wrong. We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.
Philip Reames [Sat, 19 Dec 2015 02:38:22 +0000 (02:38 +0000)]
[RS4GC] Remove an overly strong assertion
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.
Keno Fischer [Sat, 19 Dec 2015 02:02:44 +0000 (02:02 +0000)]
Clean up the processing of dbg.value in various places
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Matt Arsenault [Sat, 19 Dec 2015 01:46:41 +0000 (01:46 +0000)]
AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
Nicolai Haehnle [Sat, 19 Dec 2015 01:16:06 +0000 (01:16 +0000)]
AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
James Y Knight [Sat, 19 Dec 2015 00:53:22 +0000 (00:53 +0000)]
Possibly fix MSVC compilation after r256054.
I don't have any way to test MSVC compilation, but maybe this will fix
the error:
llvm/Support/TrailingObjects.h(286) : error C3210: 'TrailingObjectsBase' : access declaration can only be applied to a base class member
llvm/Support/TrailingObjects.h(337) : see reference to class template instantiation 'llvm::TrailingObjects<BaseTy,TrailingTys...>' being compiled
llvm/Support/TrailingObjects.h(286) : error C2602: 'llvm::trailing_objects_internal::TrailingObjectsBase::OverloadToken' is not a member of a base class of 'llvm::TrailingObjects<BaseTy,TrailingTys...>'
llvm/Support/TrailingObjects.h(91) : see declaration of 'llvm::trailing_objects_internal::TrailingObjectsBase::OverloadToken'
Teresa Johnson [Fri, 18 Dec 2015 22:59:35 +0000 (22:59 +0000)]
Remove possibility of failures to due race in ThreadPool unittest
Remove all checks that required main thread to run faster than tasks in
ThreadPool, and yields which are now unnecessary. This should fix some
bot failures.
Alexey Samsonov [Fri, 18 Dec 2015 22:02:14 +0000 (22:02 +0000)]
[Symbolize] Improve the ownership of parsed objects.
This code changes the way Symbolize handles parsed binaries: now
parsed OwningBinary<Binary> is not broken into (binary, memory buffer)
pair, and is just stored as-is in a cache. ObjectFile components
of Mach-O universal binaries are also stored explicitly in a
separate cache.
Additionally, this change:
* simplifies the code that parses/caches binaries: it's now done
in a single place, not three different functions.
* makes flush() method behave as expected, and actually clear
the cached parsed binaries and objects.
* fixes a dangling pointer issue described in
http://reviews.llvm.org/D15638
Cong Hou [Fri, 18 Dec 2015 21:53:24 +0000 (21:53 +0000)]
Use getEdgeProbability() instead of getEdgeWeight() in BFI and remove getEdgeWeight() interfaces from MBPI.
This patch removes all getEdgeWeight() interfaces from CodeGen directory. As
getEdgeProbability() is a little more expensive than getEdgeWeight(), I will
compose a patch soon in which BPI only stores probabilities instead of edge
weights so that getEdgeProbability() will have O(1) time.
Jun Bum Lim [Fri, 18 Dec 2015 20:53:47 +0000 (20:53 +0000)]
Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
Pete Cooper [Fri, 18 Dec 2015 18:51:08 +0000 (18:51 +0000)]
Improve DWARFDebugFrame::parse to also handle __eh_frame.
LLVM MC has single methods which can handle the output of EH frame and DWARF CIE's and FDE's.
This code improves DWARFDebugFrame::parse to do the same for parsing.
This also allows llvm-objdump to support the --dwarf=frames option which objdump supports. This
option dumps the .eh_frame section using the new code in DWARFDebugFrame::parse.
Jun Bum Lim [Fri, 18 Dec 2015 18:08:30 +0000 (18:08 +0000)]
[AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
Teresa Johnson [Fri, 18 Dec 2015 17:51:37 +0000 (17:51 +0000)]
[ThinLTO/LTO] Don't link in unneeded metadata
Summary:
Third patch split out from http://reviews.llvm.org/D14752.
Only map in needed DISubroutine metadata (imported or otherwise linked
in functions and other DISubroutine referenced by inlined instructions).
This is supported for ThinLTO, LTO and llvm-link --only-needed, with
associated tests for each one.
GlobalsAA: Take advantage of ArgMemOnly, InaccessibleMemOnly and InaccessibleMemOrArgMemOnly attributes
Summary:
1. Modify AnalyzeCallGraph() to retain function info for external functions
if the function has [InaccessibleMemOr]ArgMemOnly flags.
2. When analyzing the use of a global is function parameter at a call site,
mark the callee also as modifying the global appropriately.
3. Add additional test cases.
Philip Reames [Fri, 18 Dec 2015 03:53:28 +0000 (03:53 +0000)]
[RS4GC] Use an value handle to help isolate errors quickly
Inspired by the bug reported in 25846. Whatever we end up doing about that one, the value handle change is a generally good one since it will help catch this type of mistake more quickly.
Eric Christopher [Fri, 18 Dec 2015 01:46:52 +0000 (01:46 +0000)]
Reorganize the C API headers to improve build times.
Type specific declarations have been moved to Type.h and error handling
routines have been moved to ErrorHandling.h. Both are included in Core.h
so nothing should change for projects directly including the headers,
but transitive dependencies may be affected.
Hans Wennborg [Thu, 17 Dec 2015 23:18:39 +0000 (23:18 +0000)]
[X86] Use push-pop for materializing small constants under 'minsize'
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.
Cong Hou [Thu, 17 Dec 2015 22:27:07 +0000 (22:27 +0000)]
[BranchProbability] Remove the restriction that known and unknown probabilities cannot coexist when being normalized.
The current BranchProbability::normalizeProbabilities() forbids known and
unknown probabilities to coexist in the list. This was once used to help
capture probability exceptions but has caused some reported build
failures (https://llvm.org/bugs/show_bug.cgi?id=25838).
This patch removes this restriction by evenly distributing the complement
of the sum of all known probabilities to unknown ones. We could still
treat this as an abnormal behavior, but it is better to emit warnings in
our future profile validator.
Philip Reames [Thu, 17 Dec 2015 22:19:27 +0000 (22:19 +0000)]
[InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.
Key points:
* We only remove unordered or simple stores.
* The loads producing values consumed by dead stores don't influence whether the store is dead.
JF Bastien [Thu, 17 Dec 2015 22:09:19 +0000 (22:09 +0000)]
Polish atomic pointers
Summary:
I didn't realize that we already allowed atomic load/store of pointers,
it was added in 2012 by r162146. This patch updates the documentation
and tightens the verifier by using DataLayout to make sure that the
stored size is byte-sized and power-of-two. DataLayout is also used for
integers, and while I'm here I updated the corresponding code for
cmpxchg and rmw.
See the following discussion for context and upcoming changes to
add floating-point and vector atomics:
https://groups.google.com/forum/#!topic/llvm-dev/Nh0P_E3CRoo/discussion
Philip Reames [Thu, 17 Dec 2015 18:50:50 +0000 (18:50 +0000)]
[EarlyCSE] DSE of atomic unordered stores
The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible.
For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work.
Teresa Johnson [Thu, 17 Dec 2015 17:14:09 +0000 (17:14 +0000)]
[ThinLTO] Metadata linking for imported functions
Summary:
Second patch split out from http://reviews.llvm.org/D14752.
Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.
This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.
Nicolai Haehnle [Thu, 17 Dec 2015 16:46:42 +0000 (16:46 +0000)]
AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Andy Gibbs [Thu, 17 Dec 2015 16:43:53 +0000 (16:43 +0000)]
Revert r254592 (virtual dtor in SCEVPredicate).
Clang has better diagnostics in this case. It is not necessary therefore
to change the destructor to avoid what is effectively an invalid warning
in gcc. Instead, better handle the warning flags given to the compiler.
Rafael Espindola [Thu, 17 Dec 2015 16:22:06 +0000 (16:22 +0000)]
Avoid explicit relocation sorting most of the time.
These days relocations are created and stored in a deterministic way.
The order they are created is also suitable for the .o file, so we don't
need an explicit sort.