Quentin Colombet [Thu, 15 Oct 2015 00:41:26 +0000 (00:41 +0000)]
[ARM] Make sure we do not dereference the end iterator when accessing debug
information.
Although the problem was always here, it would only be exposed when
shrink-wrapping is enable.
Davide Italiano [Thu, 15 Oct 2015 00:05:32 +0000 (00:05 +0000)]
[JIT] TrivialMemoryManager: Fail if we can't allocate memory.
TrivialMemoryManager currently doesn't check the return type of AllocateRWX --
and returns a 'null' MemoryBlock to its caller. As pointed out by Lang,
this exposes some serious issues with the MemoryManager interface. There's,
in fact, no way to report back an error to clients rather than aborting in
case memory can't be allocated. Eventually the interface will grow to support
this, but for now, fail sooner rather than later.
Akira Hatanaka [Wed, 14 Oct 2015 23:48:10 +0000 (23:48 +0000)]
[MachO] Stop generating *coal* sections.
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
Cong Hou [Wed, 14 Oct 2015 23:14:17 +0000 (23:14 +0000)]
Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Philip Reames [Wed, 14 Oct 2015 22:46:19 +0000 (22:46 +0000)]
[SimplifyCFG] Speculatively flatten CFG based on profiling metadata
If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.
The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.) I plan on extending this transform over the next few days to handle alternate sequences.
Akira Hatanaka [Wed, 14 Oct 2015 22:45:36 +0000 (22:45 +0000)]
[MachO] Stop generating *coal* sections.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
Philip Reames [Wed, 14 Oct 2015 22:42:12 +0000 (22:42 +0000)]
Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
Chris Bieneman [Wed, 14 Oct 2015 21:50:09 +0000 (21:50 +0000)]
[CMake] Make LLVM_VERSION_* variables user definable
CMake's set command overwrites existing values. Package maintainers may want or need to set the version variables manually, so we need to only set them if they are not already defined. Note I use the "if(NOT DEFINED ...)" syntax deliberately in the last case because empty string is a valid value for the suffx, but not the other variables.
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
Daniel Berlin [Wed, 14 Oct 2015 19:54:24 +0000 (19:54 +0000)]
[IDFCalculator] Use DominatorTreeBase instead of DominatorTree
Summary:
IDFCalculator used a DominatorTree instance for its calculations. Since the PostDominatorTree struct is not a subclass of DominatorTree, it wasn't possible to use PDT in IDFCalculator to compute post-dominance frontiers.
This patch makes IDFCalculator work with a DominatorTreeBase<BasicBlock> instead, which enables PDTs to be utilized.
Andrea Di Biagio [Wed, 14 Oct 2015 10:03:13 +0000 (10:03 +0000)]
[x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Chris Bieneman [Wed, 14 Oct 2015 07:37:00 +0000 (07:37 +0000)]
[CMake] Set Policy CMP0048 to NEW
CMake 3.0 introduced the VERSION option for the project() command. If you don't specify the VERSION in the function it will clear out variables matching ${PROJECT_NAME}_VERSION_${MAJOR|MINOR|PATCH|TWEAK}.
This makes overriding LLVM_VERSION_* not work properly with newer versions of CMake. To make this work properly we need to:
(1) Optionally set the policy to NEW
(2) Move default versions and setting PACKAGE_VERSION to before the call to project()
(3) If the policy is set, pass the VERSION and LANGUAGES options in the new format
This change should have no behavioral change for CMake versions before 3.0, and it makes the behavior of later versions match the earlier versions.
Craig Topper [Wed, 14 Oct 2015 05:37:42 +0000 (05:37 +0000)]
[X86] Update CPU detection to only enable XSAVE features if the OS has enabled them and the saving of YMM state. This seems to be consistent with gcc behavior.
Richard Smith [Wed, 14 Oct 2015 00:04:19 +0000 (00:04 +0000)]
Rename one of our two llvm::GCOVOptions classes to llvm::GCOV::Options. We used
to get away with this because llvm/Support/GCOV.h was an implementation detail
of the llvm-gcov tool, but it's now being used by FDO.
Diego Novillo [Tue, 13 Oct 2015 22:48:46 +0000 (22:48 +0000)]
Sample profiles - Add a name table to the binary encoding.
Binary encoded profiles used to encode all function names inline at
every reference. This is clearly suboptimal in terms of space. This
patch fixes this by adding a name table to the header of the file.
Cong Hou [Tue, 13 Oct 2015 22:27:41 +0000 (22:27 +0000)]
Update MachineBranchProbabilityInfo::normalizeEdgeWeights to make sure there is no zero weight in the output, and also add a missing test for JumpThreading.
The test is for the patch in http://reviews.llvm.org/D10979 but was missing when committing that patch.
We forgot to append the terminatepad's arguments which resulted in us
treating the old terminatepad as an argument to the new terminatepad
causing us to crash immediately. Instead, add the old terminatepad's
arguments to the new terminatepad.
Kevin Enderby [Tue, 13 Oct 2015 20:48:04 +0000 (20:48 +0000)]
Tweak to r250117 and change to use ErrorOr and drop isSizeValid for
ArchiveMemberHeader, suggestion by Rafael Espíndola.
Also The clang-x86-win2008-selfhost bot still does not like the
malformed-machos 00000031.a test, so removing it for now. All
the other bots are fine with it however.
Joseph Tremoulet [Tue, 13 Oct 2015 20:18:27 +0000 (20:18 +0000)]
[WinEH] Add CoreCLR EH table emission
Summary:
Emit the handler and clause locations immediately after the standard
xdata.
Clauses are emitted in the same order and format used to communiate them
to the CLR Execution Engine.
Add a lit test to verify correct table generation on a small but
interesting example function.
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one. Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.
I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators). Otherwise I would have just
added the conversion.
Even with that, no there should be functionality change here.
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.
This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770. This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`. `Function::front()` started to assert, since the function
was empty. Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before. (I added the missing check for
`Function::isDeclaration()`.)
Cong Hou [Tue, 13 Oct 2015 18:43:10 +0000 (18:43 +0000)]
Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
[PGO]: Eliminate calls to __llvm_profile_register_function for Linux.
On Linux, the profile runtime can use __start_SECTNAME and __stop_SECTNAME
symbols defined by the linker to locate the start and end location of
a named section (with C name). This eliminates the need for instrumented
binary to call __llvm_profile_register_function during start-up time.
Kevin Enderby [Tue, 13 Oct 2015 17:06:34 +0000 (17:06 +0000)]
The issue with the malformed-machos 00000031.a test is that it needed ‘-arch x86_64’
flag as it was a Mach-O universal file.
The default as to which architecture slice that is dumped without an -arch flag
depends on the host architecture and the contents of the universal file. The
malformed archive 00000031.a file has both an x86_64 and i386 slice. So for
for x86_64 hosts only that slice is dumped, for non-x86_64 hosts, which is many
of the bots both slices are dumped.
The test is intended to only check that the malformation of the x86_64 which
has a non-decimal characters in the size field of the archive header so it no
longer crashes.
The problem turned out that the i388 slice of the malformed archive had a
different malformation which was causing the non-x86_64 bots to get this error:
llvm-objdump -macho -disassemble -arch i386 00000031.a
Archive : .00000031.a 00000031.a(c_start.o):
LLVM ERROR: Symbol name entry points before beginning or past end of file.
and causing the test as it was written to fail. So by adding ‘-arch x86_64’ it
should correct the test and the malformation on the i388 slice will not be
dumped.
Also the removal of the malformed-machos mem-crup-0261.macho was not causing
the issue so that is put back in.
Joseph Tremoulet [Tue, 13 Oct 2015 16:44:30 +0000 (16:44 +0000)]
[WinEH] Iterate state changes instead of invokes
Summary:
Add an iterator that can walk across blocks and which visits the state
transitions rather than state ranges, with explicit transitions to -1
indicating the presence of top-level calls that may throw and cause the
current function to unwind to caller. This will simplify code that needs
to identify nested try regions.
Refactor SEH and C++EH table generation to use the new
InvokeStateChangeIterator, and remove the InvokeLabelIterator they were
using.
James Molloy [Tue, 13 Oct 2015 10:43:33 +0000 (10:43 +0000)]
[GlobalsAA] Don't assume anything about functions that may be overridden
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.
BitcodeWriter: Stop using implicit ilist iterator conversion, NFC
Now LLVMBitWriter compiles without implicit ilist iterator conversions.
In these cases, the cleanest thing was to switch to range-based for
loops. Since there wasn't much noise I converted sub-loops and parent
loops as a drive-by.
Sanjoy Das [Tue, 13 Oct 2015 02:53:27 +0000 (02:53 +0000)]
[SCEV] Put some utilites in the ScalarEvolution class
In a later commit, `SplitBinaryAdd` will be used outside `IsConstDiff`,
so lift that out. And lift out `IsConstDiff` as
`computeConstantDifference` to keep things clean and to avoid playing
C++ access specifier games.
Continuing the work from last week to remove implicit ilist iterator
conversions. First related commit was probably r249767, with some more
motivation in r249925. This edition gets LLVMTransformUtils compiling
without the implicit conversions.
Matt Arsenault [Tue, 13 Oct 2015 00:49:00 +0000 (00:49 +0000)]
DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.
t4 = store t0:1, t0:1
t5 = store t4, t1:0
t6 = store t5, t2:0
t7 = store t6, t3:0
We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.
JF Bastien [Tue, 13 Oct 2015 00:28:47 +0000 (00:28 +0000)]
x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Kevin Enderby [Tue, 13 Oct 2015 00:05:17 +0000 (00:05 +0000)]
Remove the correct unstable malformed-machos test mem-crup-0261.macho and
restore the malformed-machos 00000031.a test. Hopefully this will get all the
build bots happy again. I’ll again keep an eye on them.
Matt Arsenault [Mon, 12 Oct 2015 23:59:50 +0000 (23:59 +0000)]
DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
Cong Hou [Mon, 12 Oct 2015 23:02:58 +0000 (23:02 +0000)]
Assign correct edge weights to unwind destinations when lowering invoke statement.
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.
Simon Pilgrim [Mon, 12 Oct 2015 23:00:11 +0000 (23:00 +0000)]
[SelectionDAG] Add common vector constant folding helper function
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).
This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.
After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().
Kevin Enderby [Mon, 12 Oct 2015 22:04:54 +0000 (22:04 +0000)]
Fixed bugs in llvm-obdump while parsing Mach-O files from malformed archives
that caused aborts. This was because of the characters of the ‘Size’ field in
the archive header did not contain decimal characters.
Cong Hou [Mon, 12 Oct 2015 19:44:08 +0000 (19:44 +0000)]
Update the branch weight metadata in JumpThreading pass.
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Reid Kleckner [Mon, 12 Oct 2015 19:43:34 +0000 (19:43 +0000)]
Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.