Petr Hosek [Fri, 4 Aug 2017 03:17:37 +0000 (03:17 +0000)]
Reland "[llvm][llvm-objcopy] Added support for outputting to binary in llvm-objcopy"
This change adds the "-O binary" flag which directs llvm-objcopy to
output the object file to the same format as GNU objcopy does when given
the flag "-O binary". This was done by splitting the Object class into
two subclasses ObjectELF and ObjectBinary which each output a different
format but rely on the same code to read in the Object in Object.
Reid Kleckner [Fri, 4 Aug 2017 01:39:23 +0000 (01:39 +0000)]
[Support] Update comments about stdout, raw_fd_ostream, and outs()
The full story is in the comments:
// Do not attempt to close stdout or stderr. We used to try to maintain the
// property that tools that support writing file to stdout should not also
// write informational output to stdout, but in practice we were never able to
// maintain this invariant. Many features have been added to LLVM and clang
// (-fdump-record-layouts, optimization remarks, etc) that print to stdout, so
// users must simply be aware that mixed output and remarks is a possibility.
NFC, I am just updating comments to reflect reality.
Vedant Kumar [Fri, 4 Aug 2017 00:36:24 +0000 (00:36 +0000)]
[llvm-cov] Ignore unclosed line segments when setting line counts
This patch makes a slight change to the way llvm-cov determines line
execution counts. If there are multiple line segments on a line, the
line count is the max count among the regions which start *and* end on
the line. This avoids an issue posed by deferred regions which start on
the same line as a terminated region, e.g:
if (false)
return; //< The line count should be 0, even though a new region
//< starts at the semi-colon.
foo();
Another change is that counts from line segments which don't correspond
to region entries are considered. This enables the first change, and
corrects an outstanding issue (see the showLineExecutionCounts.cpp test
change).
Teresa Johnson [Thu, 3 Aug 2017 23:42:58 +0000 (23:42 +0000)]
Use profile summary to disable peeling for huge working sets
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
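A rough sketch of the check described above, assuming the profile summary is reduced to a plain list of counter values. The 99% hot percentile and the function names are illustrative assumptions; only the 15K threshold comes from the commit message.

```python
HUGE_WORKING_SET_THRESHOLD = 15_000  # threshold quoted in the commit message
HOT_PERCENTILE = 0.99                # assumption: not stated in the commit


def counts_to_reach_percentile(counts, percentile):
    """Return how many of the largest counters are needed to cover
    `percentile` of the total profile count."""
    total = sum(counts)
    if total == 0:
        return 0
    target = total * percentile
    running = 0
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target:
            return n
    return len(counts)


def should_disable_peeling(counts):
    """Treat the working set as huge when too many counters are hot."""
    needed = counts_to_reach_percentile(counts, HOT_PERCENTILE)
    return needed > HUGE_WORKING_SET_THRESHOLD
```

A flat profile with many equally warm counters trips the check, while a profile dominated by a few counters does not.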
Zachary Turner [Thu, 3 Aug 2017 23:11:52 +0000 (23:11 +0000)]
[llvm-pdbutil] Add an option to only dump specific module indices.
Often something interesting (like a symbol) is in a particular
module, and you don't want to dump symbols from all other 300
modules to see the one you want. This adds a -modi option so that
we only dump the specified module.
Easwaran Raman [Thu, 3 Aug 2017 22:23:33 +0000 (22:23 +0000)]
[Inliner] Increase threshold for hot callsites without PGO.
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.
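As a minimal sketch of the hotness test described above: a callsite counts as hot when its block frequency is large relative to the caller's entry block frequency. The multiplier and both threshold values below are placeholders chosen for illustration, not the values used by the patch.

```python
HOT_CALLSITE_REL_FREQ = 60  # hypothetical multiplier, not from the commit


def is_hot_callsite(callsite_freq, entry_freq, rel_freq=HOT_CALLSITE_REL_FREQ):
    """A callsite is hot if its block runs at least `rel_freq` times as
    often as the caller's entry block."""
    return callsite_freq >= entry_freq * rel_freq


def inline_threshold(callsite_freq, entry_freq, base=225, hot=525):
    # `base` and `hot` are placeholder thresholds for illustration only.
    return hot if is_hot_callsite(callsite_freq, entry_freq) else base
```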
This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.
In terms of text size, LLVM test-suite shows a 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increase by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% size increase) and tramp3d (20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%.
[GlobalISel] Make GlobalISel a non-optional library.
With this change, the GlobalISel library always gets built. In
particular, it is no longer possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable.
Reid Kleckner [Thu, 3 Aug 2017 21:15:09 +0000 (21:15 +0000)]
[PDB] Fix section contributions
Summary:
PDB section contributions are supposed to use output section indices and
offsets, not input section indices and offsets.
This allows the debugger to look up the index of the module that it
should look up in the modules stream for symbol information. With this
change, windbg can now find line tables, but it still cannot print local
variables.
[LVI] Constant-propagate a zero extension of the switch condition value through case edges
Summary:
(This is a second attempt as https://reviews.llvm.org/D34822 was reverted.)
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds a small piece of logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Taewook Oh [Thu, 3 Aug 2017 21:07:12 +0000 (21:07 +0000)]
Move unit test to the proper location
Summary: Move test/CodeGen/AArch64/reg-bank-128bit.mir to test/CodeGen/AArch64/GlobalISel/reg-bank-128bit.mir so that the test is executed only when global-isel is enabled. lit.local.cfg under test/CodeGen/AArch64/GlobalISel checks if 'global-isel' is in the available_features while the same file under test/CodeGen/AArch64 doesn't.
Zachary Turner [Thu, 3 Aug 2017 20:30:09 +0000 (20:30 +0000)]
[llvm-pdbutil] Allow diff to force module equivalencies.
Sometimes the normal module equivalence detection algorithm doesn't
quite work. For example, you might build the same program with
MSVC and clang-cl, outputting to different object files, exes, and
PDBs, then compare them. If the object files have different names
though, then they won't be treated as equivalent. This way we
can force specific module indices to be treated as equivalent.
Nico Weber [Thu, 3 Aug 2017 20:10:47 +0000 (20:10 +0000)]
Fix llvm-for-windows-on-linux build after LLVM r272701.
The file is called "intrin.h". When building targeting Windows on a Linux
system, with the SDK mounted in a case-insensitive file system, "Intrin.h" will
miss clang's intrin.h header (because that's not in a case-insensitive file
system) but then find intrin.h in the Microsoft SDK. clang can't handle the
SDK's intrin.h.
Greg Bedwell [Thu, 3 Aug 2017 17:55:54 +0000 (17:55 +0000)]
Fix check-lit compatibility with multi-config CMake generators
Multi-configuration CMake generators such as those for Visual Studio or Xcode do not
specify a build config at configure time, but let the user choose at build
time. In these cases binaries go into build/${Configuration}/bin rather than
build/bin. Prior to this commit, check-lit would fail when using multi-configuration
generators as it did not know how to resolve ${Configuration} in order
to find tools such as FileCheck. This commit teaches it to resolve
llvm_tools_dir within lit using the value specified with --param
build_mode.
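The idea can be sketched as a simple substitution done at test time. The function name, the literal "${Configuration}" placeholder handling, and the "." fallback are assumptions for illustration; the actual lit configuration may express this differently.

```python
def resolve_tools_dir(tools_dir_template, params):
    """Fill in the build configuration chosen at build time.

    `params` stands in for the key/value pairs passed via --param.
    """
    build_mode = params.get("build_mode", ".")  # hypothetical fallback
    return tools_dir_template.replace("${Configuration}", build_mode)
```

With `{"build_mode": "Release"}`, the template `build/${Configuration}/bin` becomes `build/Release/bin`.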
Teresa Johnson [Thu, 3 Aug 2017 17:52:38 +0000 (17:52 +0000)]
Disable loop peeling during full unrolling pass.
Summary:
Peeling should not occur during the full unrolling invocation early
in the pipeline, but rather later with partial and runtime loop
unrolling. The later loop unrolling invocation will also eventually
utilize profile summary and branch frequency information, which
we would like to use to control peeling. And for ThinLTO we want
to delay peeling until the backend (post thin link) phase, just as
we do for most types of unrolling.
Ensure peeling doesn't occur during the full unrolling invocation
by adding a parameter to the shared implementation function, similar
to the way partial and runtime loop unrolling are disabled.
Performance results for ThinLTO suggest this has a neutral to positive
effect on some internal benchmarks.
Dehao Chen [Thu, 3 Aug 2017 17:11:41 +0000 (17:11 +0000)]
Do not want to use BFI to get profile count for sample pgo
Summary: For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate.
Simon Pilgrim [Thu, 3 Aug 2017 17:04:59 +0000 (17:04 +0000)]
[X86] Adding a test for vector shuffle extractions.
This covers cases where both vector inputs of the shuffle vector are the same vector, or where the shuffle mask accesses elements from only one operand vector (as in the PR33758 test already present).
My suggestion was wrong because it left the MachineOperands tied which
confused the verifier. Since there's no easy way to untie operands, the
original BuildMI solution is probably best.
Nirav Dave [Thu, 3 Aug 2017 15:40:21 +0000 (15:40 +0000)]
[TableGen] AsmMatcher: fix OpIdx computation when HasOptionalOperands is true
Consider the following instruction: "inst.eq $dst, $src" where ".eq"
is an optional flag operand. The $src and $dst operands are
registers. If we parse the instruction "inst r0, r1", the flag is not
present and it will be marked in the "OptionalOperandsMask" variable.
After the matching is complete we call the "convertToMCInst" method.
The current implementation works only if the optional operands are at
the end of the array. The "Operands" array looks like [token:"inst",
reg:r0, reg:r1]. The first operand that must be added to the MCInst
is the destination, the r0 register. The "OpIdx" (in the Operands
array) for this register is 2. However, since the flag is not present
in the Operands, the actual index for r0 should be 1. The flag is not
present since we rely on the default value.
This patch removes the "NumDefaults" variable and replaces it with an
array (DefaultsOffset). This array contains an index for each operand
(excluding the mnemonic). At each index, the array contains the
number of optional operands that should be subtracted. For the
previous example, this array looks like this: [0, 1, 1]. When we need
to access the r0 register, we compute its index as 2 -
DefaultsOffset[1] = 1.
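The index arithmetic from the example above can be reproduced with a small model. This is only a sketch of the idea, not the generated matcher code; the function names are hypothetical.

```python
def compute_defaults_offset(optional_missing):
    """optional_missing[i] is True when formal operand i (mnemonic excluded)
    is an optional operand that was absent from the parsed input.
    Returns, per formal operand, how many absent operands precede it."""
    offsets = []
    missing_so_far = 0
    for is_missing in optional_missing:
        offsets.append(missing_so_far)
        if is_missing:
            missing_so_far += 1
    return offsets


def parsed_index(op_idx, formal_pos, offsets):
    """Map an OpIdx that assumes every operand is present to the real
    position in the parsed Operands array."""
    return op_idx - offsets[formal_pos]


# "inst.eq $dst, $src" parsed from "inst r0, r1": the ".eq" flag is absent.
operands = ["inst", "r0", "r1"]
offsets = compute_defaults_offset([True, False, False])
dst = operands[parsed_index(2, 1, offsets)]
src = operands[parsed_index(3, 2, offsets)]
```

Here `offsets` matches the [0, 1, 1] array from the text, and the destination lookup resolves to index 2 - 1 = 1.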
Robert Lougher [Thu, 3 Aug 2017 11:54:02 +0000 (11:54 +0000)]
[LiveDebugVariables] Use lexical scope to trim debug value live intervals
The debug value live intervals computed by Live Debug Variables may extend
beyond the range of the debug location's lexical scope. In this case,
splitting of an interval can result in an interval outside of the scope being
created, causing extra unnecessary DBG_VALUEs to be emitted. To prevent this,
trim the intervals to the lexical scope.
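Conceptually, the trimming is an interval intersection. The sketch below assumes intervals and scope ranges are half-open pairs of plain integers, which is a simplification of the slot indexes the pass actually works with.

```python
def trim_to_scope(interval, scope_ranges):
    """Clip a half-open [start, end) interval to the given scope ranges,
    returning the surviving pieces in order."""
    start, end = interval
    pieces = []
    for scope_start, scope_end in sorted(scope_ranges):
        lo = max(start, scope_start)
        hi = min(end, scope_end)
        if lo < hi:  # keep only non-empty overlaps
            pieces.append((lo, hi))
    return pieces
```

An interval lying entirely outside the scope yields no pieces, so no DBG_VALUE would be needed for it in this model.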
Simon Dardis [Thu, 3 Aug 2017 09:38:46 +0000 (09:38 +0000)]
[SelectionDAG] Resolve PR33978.
rL306209 taught SelectionDAG how to add the dereferenceable flag when
expanding memcpy and memmove. The fix however contained a nit where
the offset + size was constructed as an APInt of PointerSize rather
than PointerSizeInBits.
This led to isDereferenceableAndAlignedPointer() getting truncated values or
values which would be sign extended within that function, leading to
incorrect results.
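To illustrate why the bit width matters, the sketch below models a fixed-width integer with plain masking. It assumes a 64-bit target (pointer size of 8 bytes, 64 bits) and an arbitrary example value; neither is taken from the bug report.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits, as a fixed-width integer would."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a signed number."""
    value = truncate(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


POINTER_SIZE = 8           # bytes, assuming a 64-bit target
POINTER_SIZE_IN_BITS = 64

offset_plus_size = 4096 + 200

# Width given in bytes by mistake: the value is cut down to 8 bits.
wrong = truncate(offset_plus_size, POINTER_SIZE)
wrong_signed = sign_extend(offset_plus_size, POINTER_SIZE)

# Width given in bits: the value survives intact.
right = truncate(offset_plus_size, POINTER_SIZE_IN_BITS)
```

With the byte count used as the width, the value is both truncated and, once sign extended, negative.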
Ewan Crawford [Thu, 3 Aug 2017 09:23:03 +0000 (09:23 +0000)]
[Cloning] Move distinct GlobalVariable debug info metadata in CloneModule
Duplicating the distinct Subprogram and CU metadata nodes seems like the incorrect thing to do in CloneModule for GlobalVariable debug info, as it results in the scope of the GlobalVariable DI no longer being consistent with the rest of the module, and the new CU is absent from llvm.dbg.cu.
Fixed by adding RF_MoveDistinctMDs to MapMetadata flags for GlobalVariables.
Current unit test IR after clone:
```
@gv = global i32 1, comdat($comdat), !dbg !0, !type !5
```
Add support in the instruction selector for G_GLOBAL_VALUE for ELF and
MachO for the static relocation model. We don't handle Windows yet
because that's Thumb-only, and we don't handle Thumb in general at the
moment.
Support for PIC, ROPI, RWPI and TLS will be added in subsequent commits.
Max Kazantsev [Thu, 3 Aug 2017 08:41:30 +0000 (08:41 +0000)]
[SCEV] Re-enable "Cache results of computeExitLimit"
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value
using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
borrow, we need to invert the value of the second operand and result before
and after using ARMISD::SUBE. We need to invert the carry result of
ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering
as well otherwise i64 operations now would require branches. This implies
updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C
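The addition part of the scheme above can be checked with a small model of 32-bit arithmetic. The helpers below only mimic the flag behaviour of the ADDC and ADDE nodes as described in the list; they are an illustration, not the lowering code.

```python
MASK32 = 0xFFFFFFFF


def addc(a, b):
    """32-bit add returning (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adde(a, b, carry_in):
    """32-bit add with carry-in returning (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def bool_to_carry(r):
    # (_, C) <- ADDC R, -1: the carry is set exactly when R is non-zero.
    _, carry = addc(r, MASK32)
    return carry


def carry_to_bool(carry):
    # (R, _) <- ADDE 0, 0, C: materialize the flag as 0 or 1.
    result, _ = adde(0, 0, carry)
    return result


def add_with_carry(a, b, carry_bool):
    """Model of ISD::ADDCARRY on i32 using the conversions above."""
    carry = bool_to_carry(carry_bool)
    result, carry_out = adde(a, b, carry)
    return result, carry_to_bool(carry_out)
```

Converting a carry to a boolean and straight back gives the original carry, which is the redundancy the new combine removes.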
Tobias Grosser [Thu, 3 Aug 2017 04:17:58 +0000 (04:17 +0000)]
[unittest] Remove TODO comment which caused concern
Remove the second part of the TODO comment that highlighted an issue with
possibly connecting all nodes to the exit of the CFG. This caused concerns
with Jakub Kuderski regarding its feasibility, hence we remove it. Such
points are better discussed outside of CFG. Whether connecting all nodes makes
sense, and what the impact would be, is currently part of an active review discussion.
Sameer AbuAsal [Thu, 3 Aug 2017 02:41:17 +0000 (02:41 +0000)]
[RegisterCoalescer] Add wrapper for Erasing Instructions
Summary:
To delete an instruction the coalescer needs to call eraseFromParent()
on the MachineInstr, insert it in the ErasedInstrs list and update the
Live Ranges structure. This patch re-factors the code to do all that in
one function. This will also fix cases where previous code wasn't
inserting deleted instructions in the ErasedList.
IMHO it is an antipattern to have an enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if it has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
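The design point can be illustrated outside of C++: an explicit "unspecified" state forces the mapping to a concrete value to happen in one visible place. The function name and signature below are hypothetical, and the enum is a stand-in rather than LLVM's definition.

```python
from enum import Enum
from typing import Optional


class CodeModel(Enum):
    # Stand-in enum with no "Default" member.
    SMALL = "small"
    KERNEL = "kernel"
    MEDIUM = "medium"
    LARGE = "large"


def effective_code_model(requested: Optional[CodeModel],
                         target_default: CodeModel) -> CodeModel:
    """Only the target knows its default, so it resolves None here."""
    return target_default if requested is None else requested
```

Code that receives a plain `CodeModel` can then rely on it being concrete, while `Optional[CodeModel]` marks the places where it may still be unspecified.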
Javed Absar [Thu, 3 Aug 2017 01:24:12 +0000 (01:24 +0000)]
[ARM] Tidy up banked registers encoding
Moves encoding (SYSm) information of banked registers to ARMSystemRegister.td,
where it rightly belongs and forms a single point of reference in the code.
Vedant Kumar [Wed, 2 Aug 2017 23:35:25 +0000 (23:35 +0000)]
[Coverage] Add an API to retrieve all instantiations of a function (NFC)
The CoverageMapping::getInstantiations() API retrieved all function
records corresponding to functions with more than one instantiation (e.g
template functions with multiple specializations). However, there was no
simple way to determine *which* function a given record was an
instantiation of. This was an oversight, since it's useful to aggregate
coverage information over all instantiations of a function.
llvm-cov works around this by building a mapping of source locations to
instantiation sets, but this duplicates logic that libCoverage already
has (see FunctionInstantiationSetCollector).
This change adds a new API, CoverageMapping::getInstantiationGroups(),
which returns a list of InstantiationGroups. A group contains records
for each instantiation of some particular function, and also provides
utilities to get the total execution count within the group, the source
location of the common definition, etc.
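The grouping idea can be sketched as follows, assuming each function record is reduced to a (name, definition location, execution count) tuple. These shapes and function names are illustrative and do not mirror the real CoverageMapping types.

```python
from collections import defaultdict


def group_instantiations(records):
    """Group records by the source location of their common definition.

    records: iterable of (function_name, definition_location, execution_count).
    """
    groups = defaultdict(list)
    for name, location, count in records:
        groups[location].append((name, count))
    return dict(groups)


def total_execution_count(group):
    """Sum the execution counts over every instantiation in a group."""
    return sum(count for _, count in group)
```

Two specializations defined at the same location end up in one group, so their counts can be aggregated directly.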
This lets us remove some hacky logic in llvm-cov by reusing
FunctionInstantiationSetCollector and makes the CoverageMapping API
friendlier for other clients.
Zachary Turner [Wed, 2 Aug 2017 22:31:39 +0000 (22:31 +0000)]
[pdb/lld] Write a valid FPM.
The PDB reserves certain blocks for the FPM that describe which
blocks in the file are allocated and which are free. We weren't
filling that out at all, and in some cases we were even stomping
it with incorrect data. This patch writes a correct FPM.
Zachary Turner [Wed, 2 Aug 2017 22:25:52 +0000 (22:25 +0000)]
[pdbutil] Add a command to dump the FPM.
Recently problems have been discovered in the way we write the FPM
(free page map). In order to fix this, we first need to establish
a baseline about what a correct FPM looks like using an MSVC
generated PDB, so that we can then make our own generated PDBs
match. And in order to do this, the dumper needs a mode where it
can dump an FPM so that we can write tests for it.
This patch adds a command to dump the FPM, as well as a test against
a known-good PDB.
Teresa Johnson [Wed, 2 Aug 2017 20:35:29 +0000 (20:35 +0000)]
[PM] Split LoopUnrollPass and make partial unroller a function pass
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.
I was surprised to see the code model being passed to MC. After all,
it assembles code, it doesn't create it.
The one place it is used is in the expansion of .cfi directives to
handle .eh_frame being more than 2GB away from the code.
As far as I can tell, gnu assembler doesn't even have an option to
enable this. Compiling a c file with gcc -mcmodel=large produces a
regular looking .eh_frame. This is probably because in practice linkers
parse and recreate .eh_frames.
In llvm this is used because the JIT can place the code and .eh_frame
very far apart. Ideally we would fix the jit and delete this
option. This is hard.
Apart from confusion another problem with the current interface is
that most callers pass CodeModel::Default, which is bad since MC has
no way to map it to the target default if it actually needed to.
This patch then replaces the argument with a boolean with a default
value. The vast majority of users don't ever need to look at it. In
fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more
testing.
David Blaikie [Wed, 2 Aug 2017 20:16:22 +0000 (20:16 +0000)]
DebugInfo: Test & handle (differently) non-zero DW_AT_ranges_base
Followup to r309570, fixing it slightly differently (ranges_base and
addr_base should never be read from a DWO file - so there shouldn't be
any issue with 'overriding' the values - conditionalize the code and
assert that the values aren't being overridden).
Jakub Kuderski [Wed, 2 Aug 2017 18:17:52 +0000 (18:17 +0000)]
[Dominators] Teach LoopDeletion to use the new incremental API
Summary:
This patch makes LoopDeletion use the incremental DominatorTree API.
We modify LoopDeletion to perform the deletion in 5 steps:
1. Create a new dummy edge from the preheader to the exit, by adding a conditional branch.
2. Inform the DomTree about the new edge.
3. Remove the conditional branch and replace it with an unconditional edge to the exit. This removes the edge to the loop header, making it unreachable.
4. Inform the DomTree about the deleted edge.
5. Remove the unreachable block from the function.
Creating the dummy conditional branch is necessary to perform incremental DomTree update.
We should consider using the batch updater when it's ready.
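The five steps can be walked through on a toy CFG. In this sketch the CFG is a dict of successor sets and the DomTree notifications are just recorded in a list; both are stand-ins for the real data structures, and the function names are hypothetical.

```python
def reachable(cfg, entry):
    """Return the set of blocks reachable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block not in seen:
            seen.add(block)
            stack.extend(cfg.get(block, ()))
    return seen


def delete_loop(cfg, entry, preheader, header, exit_block):
    """Delete a loop in the order described above; return the logged
    DomTree updates."""
    updates = []
    # 1. Dummy edge preheader -> exit (models the conditional branch).
    cfg[preheader].add(exit_block)
    # 2. Inform the (stand-in) DomTree about the new edge.
    updates.append(("insert", preheader, exit_block))
    # 3. Make the branch unconditional: drop the edge to the header.
    cfg[preheader].discard(header)
    # 4. Inform the (stand-in) DomTree about the deleted edge.
    updates.append(("delete", preheader, header))
    # 5. Remove the blocks that are now unreachable.
    live = reachable(cfg, entry)
    for block in list(cfg):
        if block not in live:
            del cfg[block]
    return updates
```

Each structural change is paired with exactly one update, so the tree is never asked to absorb two edge changes at once.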
Hiroshi Inoue [Wed, 2 Aug 2017 18:16:32 +0000 (18:16 +0000)]
[StackColoring] Update AliasAnalysis information in stack coloring pass (part 2)
This patch is an update to the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.
The stack coloring pass needs to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But TBAA has already been enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintain AliasAnalysis information when merging multiple stack slots.
Nirav Dave [Wed, 2 Aug 2017 16:35:58 +0000 (16:35 +0000)]
[DAG] Improve candidate pruning in store merge failure case. NFCI
During store merge we construct a sorted list of consecutive store
candidates and consider subsequences for merging into a single
store. For each subsequence we check whether the stored value type is
legal, whether the merged store would be valid and fast, and whether the
constructed value to be stored is valid. The only properties that affect
this check between subsequences are the size of the subsequence, the
alignment of the first store, the alignment of the stored load value
(when merging stores-of-loads), and whether the merged value is a
constant zero.
If we do not find a viable mergeable subsequence starting from the
first store of length N, we know that a subsequence starting at a
later store of length N will also fail unless the new store's
alignment or the new load's alignment (if we're merging stores-of-loads)
differs, or we've dropped stores of nonzero value and could construct a
merged store of zero (for merging constants).
As a result if we fail to find a valid subsequence starting from the
first store we can safely skip considering subsequences that start
with subsequent stores unless one of the above properties is
true. This significantly (2x) improves compile time in some
pathological cases.