Zachary Turner [Mon, 17 Oct 2016 22:49:24 +0000 (22:49 +0000)]
Resubmit "Add support for advanced number formatting."
This resubmits commits r284425 and r284428, which were reverted
in r284429 due to some infinite recursion caused by an incorrect
selection of function overloads. Reproduced the failure on Linux
using GCC 4.8.4, and confirmed that with the new patch the tests
pass on GCC as well as MSVC. So hopefully this fixes everything.
Kevin Enderby [Mon, 17 Oct 2016 22:09:25 +0000 (22:09 +0000)]
Next set of additional error checks for invalid Mach-O files, covering the
load commands that use the MachO::sub_framework_command,
MachO::sub_umbrella_command, MachO::sub_library_command
and MachO::sub_client_command types. These types are not used in llvm
libObject code but are used in llvm tool code.
This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA,
LC_SUB_LIBRARY and LC_SUB_CLIENT load commands.
Zachary Turner [Mon, 17 Oct 2016 20:57:45 +0000 (20:57 +0000)]
[Support] Add support for "advanced" number formatting.
raw_ostream has not afforded a lot of flexibility in terms of
how to format numbers when outputting. Wrap this all up into
a set of low level helper functions that can be used to output
numbers with arbitrary precision, alignment, format, etc and
then update raw_ostream to use these functions.
This will be useful for upcoming improvements to llvm's string
formatting libraries, but is still useful independently.
Sanjay Patel [Mon, 17 Oct 2016 20:26:46 +0000 (20:26 +0000)]
[DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC
As noted in:
https://reviews.llvm.org/D25685
This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch.
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
Davide Italiano [Mon, 17 Oct 2016 20:05:35 +0000 (20:05 +0000)]
[opt] Strip coverage if debug info is not present.
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The stripSymbol pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD)
around. The verifier runs, and finds out that there's a CU not listed
in !llvm.dbg.cu (as it was previously dropped) -> crash.
So, when we strip debug info, check if there's coverage data
and strip it as well, in order to avoid leaving dangling metadata around.
Dehao Chen [Mon, 17 Oct 2016 19:28:44 +0000 (19:28 +0000)]
Ignore debug info when making optimization decisions in SimplifyCFG.
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code is the same with or without debug info.
Walter Erquinigo [Mon, 17 Oct 2016 18:56:18 +0000 (18:56 +0000)]
Handle relocations to thumb functions when dynamic linking COFF modules
Summary:
This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker.
The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function.
Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better.
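A minimal sketch of the ISA selection bit described above, not the code from this patch. The helper name applyISASelectionBit is hypothetical; the only fact relied on is that bit 0 of an interworking branch target selects Thumb (1) or ARM (0).

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: returns the address an
// interworking branch such as blx should target. Bit 0 of the target
// selects the instruction set (1 = Thumb, 0 = ARM).
constexpr uint64_t applyISASelectionBit(uint64_t SymbolAddress, bool IsThumb) {
  return IsThumb ? (SymbolAddress | 1u) : SymbolAddress;
}
```

In this sketch the IsThumb decision is passed in as a flag, mirroring the note above that it is determined when the relocation entry is created rather than when it is resolved.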
Teresa Johnson [Mon, 17 Oct 2016 14:56:53 +0000 (14:56 +0000)]
Rename interface for querying physical hardware concurrency
Based on post-commit review for D25585/r284180, rename
hardware_physical_concurrency to heavyweight_hardware_concurrency,
to better reflect what type of tasks it should be used for and
to enable other systems to map this to something other than the
number of physical cores.
James Molloy [Mon, 17 Oct 2016 12:54:07 +0000 (12:54 +0000)]
[SDAG] Use ABI type alignment for constant pools when optimizing for size
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
Oliver Stannard [Mon, 17 Oct 2016 12:00:24 +0000 (12:00 +0000)]
[SimplifyCFG] Don't lower complex ConstantExprs to lookup tables
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Tobias Grosser [Mon, 17 Oct 2016 11:56:26 +0000 (11:56 +0000)]
[SCEV] Consider delinearization pattern with extension with identity factor
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. an identity factor. We lose cases where the size is a char type, for which there will be no multiply factor.
Andrea Di Biagio [Mon, 17 Oct 2016 11:32:26 +0000 (11:32 +0000)]
[CodeGenPrepare] When moving a zext near to its associated load, do not retain the original debug location.
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separate instructions.
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
George Rimar [Mon, 17 Oct 2016 10:58:02 +0000 (10:58 +0000)]
Recommit r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is."
With fix: hex edited the precompiled inputs from other test cases to pass the new checks.
Original commit message:
[Object/ELF] - Check that e_shnum is null when e_shoff is.
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the cause of a crash in lld on an incorrect input file.
Binary reduced using afl-min.
George Rimar [Mon, 17 Oct 2016 10:06:44 +0000 (10:06 +0000)]
[Object/ELF] - Check that e_shnum is null when e_shoff is.
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the cause of a crash in lld on an incorrect input file.
Binary reduced using afl-min.
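The consistency rule quoted from the spec can be sketched as follows. This is an illustration under assumptions, not the check added to libObject: the struct SectionHeaderInfo and the function hasConsistentSectionCount are hypothetical names, and only the two header fields involved are modeled.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the two ELF header fields involved.
struct SectionHeaderInfo {
  uint64_t e_shoff; // File offset of the section header table, 0 if none.
  uint16_t e_shnum; // Number of entries in the section header table.
};

// Returns true when the fields agree with the quoted wording: a file with
// no section header table (e_shoff == 0) must have e_shnum == 0.
constexpr bool hasConsistentSectionCount(const SectionHeaderInfo &H) {
  return H.e_shoff != 0 || H.e_shnum == 0;
}
```

A reader that rejects inputs failing this predicate never tries to walk a section header table that does not exist.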
Justin Bogner [Mon, 17 Oct 2016 07:37:11 +0000 (07:37 +0000)]
Support: Drop LLVM_ATTRIBUTE_UNUSED_RESULT
Uses of this have all been updated to use LLVM_NODISCARD, which
matches the C++17 [[nodiscard]] semantics rather than those of GCC's
__attribute__((warn_unused_result)).
Craig Topper [Mon, 17 Oct 2016 06:41:18 +0000 (06:41 +0000)]
[X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number.
Justin Bogner [Sun, 16 Oct 2016 22:09:24 +0000 (22:09 +0000)]
unittests: Explicitly ignore some return values in crash tests
Ideally these would actually check that the results are reasonable,
but given that we're looping over so many different kinds of path that
isn't really practical.
[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Davide Italiano [Sat, 15 Oct 2016 21:35:23 +0000 (21:35 +0000)]
[GVN/PRE] Hoist global values outside of loops.
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
Benjamin Kramer [Sat, 15 Oct 2016 13:15:05 +0000 (13:15 +0000)]
[SimplifyCFG] Use the error checking provided by getPrevNode.
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replaced naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Tim Northover [Fri, 14 Oct 2016 22:18:18 +0000 (22:18 +0000)]
GlobalISel: rename legalizer components to match others.
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine" prefix). This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
Justin Bogner [Fri, 14 Oct 2016 22:04:17 +0000 (22:04 +0000)]
Support: Add LLVM_NODISCARD with C++17's [[nodiscard]] semantics
This is essentially a more powerful version of our current
LLVM_ATTRIBUTE_UNUSED_RESULT, in that it can also be applied to types
and generate warnings whenever an object of that type is returned by
value and the value is discarded.
I'll replace uses of LLVM_ATTRIBUTE_UNUSED_RESULT and remove that
macro in follow up commits.
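To illustrate the semantics described above, here is a sketch that uses the standard C++17 attribute directly instead of the LLVM macro. The names Status, doWork and computeSize are made up for the example.

```cpp
// Applying the attribute to a type: any function returning Status by value
// gets a warning at call sites that discard the result.
struct [[nodiscard]] Status {
  bool Ok;
};

// No attribute is needed here; it is inherited from the return type.
Status doWork(int Input) { return Status{Input >= 0}; }

// Applying the attribute to a single function, which is the only use the
// older warn_unused_result style covers.
[[nodiscard]] int computeSize(int N) { return N * 2; }
```

With a C++17 compiler, a bare `doWork(1);` or `computeSize(3);` statement is expected to produce a warning, while uses that consume the result compile cleanly.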
Guozhi Wei [Fri, 14 Oct 2016 20:41:50 +0000 (20:41 +0000)]
[PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into the high word of the same register.
This optimization caused the test case bperm.ll to fail because of a suboptimal heuristic in the function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and ORs them together. This optimization lowers the cost of loading the 64bit constant mask used in the AND method, which leads to a different code sequence. But the ROTATE method is actually better in this test case, because with the ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into the target register directly. So this patch also enhances SelectAndParts64 to prefer the ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together.
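The condition this commit targets can be sketched in plain C++. This models only the arithmetic, not the PPC instruction selection code; hasSameHiLoWords and replicateLowWord are hypothetical names.

```cpp
#include <cstdint>

// True when the upper and lower 32-bit words of Imm are identical.
constexpr bool hasSameHiLoWords(uint64_t Imm) {
  return static_cast<uint32_t>(Imm >> 32) == static_cast<uint32_t>(Imm);
}

// Models the result of materializing the low word once and then inserting
// a copy of it into the high word of the same register.
constexpr uint64_t replicateLowWord(uint32_t Lo) {
  return (static_cast<uint64_t>(Lo) << 32) | Lo;
}
```

For any constant where hasSameHiLoWords holds, replicateLowWord applied to its low word reproduces the full 64-bit value, which is why only one 32-bit word has to be materialized.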
[libFuzzer] add -trace_cmp=1 (guiding mutations based on the observed CMP instructions). This is a reincarnation of the previously deleted -use_traces, but using a different approach for collecting traces. Still a toy, but at least it scales well. Also fix -merge in trace-pc-guard mode
Sanjay Patel [Fri, 14 Oct 2016 19:46:31 +0000 (19:46 +0000)]
[DAG] avoid creating illegal node when transforming negated shifted sign bit
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
Tom Stellard [Fri, 14 Oct 2016 19:14:29 +0000 (19:14 +0000)]
AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Tom Stellard [Fri, 14 Oct 2016 19:14:26 +0000 (19:14 +0000)]
TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
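Why the mask is redundant can be checked with a small model. This is a sketch assuming an unsigned 24-bit multiply that reads only the low 24 bits of each operand; the function names are hypothetical and this is not the TargetLowering code.

```cpp
#include <cstdint>

// Hypothetical model of an unsigned mul24: only the low 24 bits of each
// operand are read, and the result is truncated to 32 bits.
inline uint32_t mul24(uint32_t A, uint32_t B) {
  return (A & 0xffffffu) * (B & 0xffffffu);
}

// The original sequence: mask first, then multiply the masked value.
inline uint32_t squareWithMask(uint32_t V0) {
  uint32_t V1 = V0 & 0xffffffu; // and v1, v0, 0xffffff
  return mul24(V1, V1);         // mul24 v2, v1, v1
}

// The optimized sequence: feed the unmasked value straight to mul24.
inline uint32_t squareWithoutMask(uint32_t V0) {
  return mul24(V0, V0);         // mul24 v2, v0, v0
}
```

Both functions return the same value for every input, because mul24 never demands the bits that the `and` clears.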
David L Kreitzer [Fri, 14 Oct 2016 18:20:41 +0000 (18:20 +0000)]
Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Vedant Kumar [Fri, 14 Oct 2016 17:16:53 +0000 (17:16 +0000)]
[Coverage] Support loading multiple binaries into a CoverageMapping
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Pierre Gousseau [Fri, 14 Oct 2016 16:41:38 +0000 (16:41 +0000)]
[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Mehdi Amini [Fri, 14 Oct 2016 16:23:09 +0000 (16:23 +0000)]
[docs] Update some obsolete information in BitCodeFormat docs.
Summary:
* Describe new (3.3) parameter attribute group encoding, leaving old encoding there with a note about legacy
* Bring TYPE_BLOCK docs up to date
* Remove docs about obsolete (pre 3.0) TYPE_SYMTAB_BLOCK, TST_CODE_ENTRY
* Fix a couple of incorrect comments and remove one unused enum definition along the way
This addresses https://llvm.org/bugs/show_bug.cgi?id=28941.