George Rimar [Mon, 17 Oct 2016 10:58:02 +0000 (10:58 +0000)]
Recommit r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is."
With fix: hex edited the precompiled inputs from another testcases to pass new checks.
Original commit message:
[Object/ELF] - Check that e_shnum is null when e_shoff is.
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the reason of crash in lld on incorrect input file.
Binary reduced using afl-min.
George Rimar [Mon, 17 Oct 2016 10:06:44 +0000 (10:06 +0000)]
[Object/ELF] - Check that e_shnum is null when e_shoff is.
Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) :
e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.
Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540
That was the reason of crash in lld on incorrect input file.
Binary reduced using afl-min.
Justin Bogner [Mon, 17 Oct 2016 07:37:11 +0000 (07:37 +0000)]
Support: Drop LLVM_ATTRIBUTE_UNUSED_RESULT
Uses of this have all been updated to use LLVM_NODISCARD, which
matches the C++17 [[nodiscard]] semantics rather than those of GCC's
__attribute__((warn_unused_result)).
Craig Topper [Mon, 17 Oct 2016 06:41:18 +0000 (06:41 +0000)]
[X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number.
Justin Bogner [Sun, 16 Oct 2016 22:09:24 +0000 (22:09 +0000)]
unittests: Explicitly ignore some return values in crash tests
Ideally these would actually check that the results are reasonable,
but given that we're looping over so many different kinds of path that
isn't really practical.
[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Davide Italiano [Sat, 15 Oct 2016 21:35:23 +0000 (21:35 +0000)]
[GVN/PRE] Hoist global values outside of loops.
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
Benjamin Kramer [Sat, 15 Oct 2016 13:15:05 +0000 (13:15 +0000)]
[SimplifyCFG] Use the error checking provided by getPrevNode.
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Tim Northover [Fri, 14 Oct 2016 22:18:18 +0000 (22:18 +0000)]
GlobalISel: rename legalizer components to match others.
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
Justin Bogner [Fri, 14 Oct 2016 22:04:17 +0000 (22:04 +0000)]
Support: Add LLVM_NODISCARD with C++17's [[nodiscard]] semantics
This is essentially a more powerful version of our current
LLVM_ATTRIBUTE_UNUSED_RESULT, in that it can also be applied to types
and generate warnings whenever an object of that type is returned by
value and the value is discarded.
I'll replace uses of LLVM_ATTRIBUTE_UNUSED_RESULT and remove that
macro in follow up commits.
Guozhi Wei [Fri, 14 Oct 2016 20:41:50 +0000 (20:41 +0000)]
[PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
[libFuzzer] add -trace_cmp=1 (guiding mutations based on the observed CMP instructions). This is a reincarnation of the previously deleted -use_traces, but using a different approach for collecting traces. Still a toy, but at least it scales well. Also fix -merge in trace-pc-guard mode
Sanjay Patel [Fri, 14 Oct 2016 19:46:31 +0000 (19:46 +0000)]
[DAG] avoid creating illegal node when transforming negated shifted sign bit
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
Tom Stellard [Fri, 14 Oct 2016 19:14:29 +0000 (19:14 +0000)]
AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Tom Stellard [Fri, 14 Oct 2016 19:14:26 +0000 (19:14 +0000)]
TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
David L Kreitzer [Fri, 14 Oct 2016 18:20:41 +0000 (18:20 +0000)]
Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Vedant Kumar [Fri, 14 Oct 2016 17:16:53 +0000 (17:16 +0000)]
[Coverage] Support loading multiple binaries into a CoverageMapping
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Pierre Gousseau [Fri, 14 Oct 2016 16:41:38 +0000 (16:41 +0000)]
[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Mehdi Amini [Fri, 14 Oct 2016 16:23:09 +0000 (16:23 +0000)]
[docs] Update some obsolete information in BitCodeFormat docs.
Summary:
* Describe new (3.3) parameter attribute group encoding, leaving old encoding there with a note about legacy
* Bring TYPE_BLOCK docs up to date
* Remove docs about obsolete (pre 3.0) TYPE_SYMTAB_BLOCK, TST_CODE_ENTRY
* Fix a couple of incorrect comments and remove one unused enum definition along the way
This addresses https://llvm.org/bugs/show_bug.cgi?id=28941.
Sanjay Patel [Fri, 14 Oct 2016 15:36:28 +0000 (15:36 +0000)]
[InstCombine] remove redundant test
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
Diana Picus [Fri, 14 Oct 2016 10:19:40 +0000 (10:19 +0000)]
[GlobalISel] Get the AArch64 tests to work on Linux
Mostly this just means changing the triple from aarch64-apple-ios to the generic
aarch64--. Only one test needs more significant changes, but GlobalISel already
does the right thing so it's ok to just change the checks.
Simon Dardis [Fri, 14 Oct 2016 09:31:42 +0000 (09:31 +0000)]
[mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
[x86][ms-inline-asm] use of "jmp short" in asm is not supported
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Craig Topper [Fri, 14 Oct 2016 06:00:42 +0000 (06:00 +0000)]
[DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
Eric Christopher [Fri, 14 Oct 2016 05:47:41 +0000 (05:47 +0000)]
In preparation for removing getNameWithPrefix off of TargetMachine,
sink the current behavior into the callers and sink
TargetMachine::getNameWithPrefix into TargetMachine::getSymbol.
AMDGPU isa only supports 32-bit immediates. In order to access 64-bit address we need to generate 32-bit lo/hi relocations, and do the right math (separate patch). Currently we only generate one 32 bit relocation for lower bits for each access, losing higher bits. Hence we need relocations listed above.
Teresa Johnson [Fri, 14 Oct 2016 00:13:59 +0000 (00:13 +0000)]
Add interface for querying physical hardware concurrency
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Tom Stellard [Thu, 13 Oct 2016 21:03:49 +0000 (21:03 +0000)]
LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Sriraman Tallam [Thu, 13 Oct 2016 20:54:39 +0000 (20:54 +0000)]
New llc option pie-copy-relocations to optimize access to extern globals.
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Quentin Colombet [Thu, 13 Oct 2016 19:27:48 +0000 (19:27 +0000)]
[RAGreedy] Empty live-ranges always succeed in last chance recoloring.
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Nirav Dave [Thu, 13 Oct 2016 19:20:16 +0000 (19:20 +0000)]
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Teresa Johnson [Thu, 13 Oct 2016 17:43:20 +0000 (17:43 +0000)]
Add interface to compute number of physical cores on host system
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.