Kuba Brecka [Tue, 15 Nov 2016 21:07:03 +0000 (21:07 +0000)]
Fix llvm-symbolizer to correctly sort a symbol array and calculate symbol sizes
Sometimes, llvm-symbolizer gives wrong results due to incorrect sizes of some symbols. The reason for that was an incorrectly sorted array in computeSymbolSizes. The comparison function used subtraction of unsigned types, which is incorrect. Let's change this to return explicit -1 or 1.
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.
Sanjay Patel [Tue, 15 Nov 2016 18:44:53 +0000 (18:44 +0000)]
[x86] auto-generate checks; NFC
Also, fix the test params to use an attribute rather than a CPU model
and remove the AVX run because that does nothing but check for a 'v'
prefix in all of these tests.
Wei Mi [Tue, 15 Nov 2016 18:35:53 +0000 (18:35 +0000)]
[LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Wei Mi [Tue, 15 Nov 2016 17:34:52 +0000 (17:34 +0000)]
[IndVars] Change the order to compute WidenAddRec in widenIVUse.
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.
Simon Pilgrim [Tue, 15 Nov 2016 16:24:40 +0000 (16:24 +0000)]
[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.
This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.
Diana Picus [Tue, 15 Nov 2016 15:38:15 +0000 (15:38 +0000)]
[ARM] Make sure GlobalISel is only initialized once. NFCI
Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.
Robert Lougher [Tue, 15 Nov 2016 14:27:33 +0000 (14:27 +0000)]
[LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).
The algorithm first calculates the usages within the loop. It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).
The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals. Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.
The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index. In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.
This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.
Tony Jiang [Tue, 15 Nov 2016 14:25:56 +0000 (14:25 +0000)]
[PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
Diana Picus [Tue, 15 Nov 2016 14:11:11 +0000 (14:11 +0000)]
Get GlobalISel to build on Linux after r286407
r286407 has introduced calls to llvm::AddLandingPadInfo, which lives in the
SelectionDAG component. Add it to LLVMBuild to avoid linker failures on Linux.
Zvi Rackover [Tue, 15 Nov 2016 13:29:23 +0000 (13:29 +0000)]
[X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.
Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases. As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.
Daniel Sanders [Tue, 15 Nov 2016 09:51:02 +0000 (09:51 +0000)]
[tablegen] Extract portions of AsmMatcherEmitter for re-use by another generator. NFC.
Summary:
This change is preparation for a change that will allow targets to verify that the instructions
they emit meet the predicates they specify. This is useful to ensure that C++
legalization/lowering/instruction-selection doesn't incorrectly select code for a different
subtarget than intended. Such cases are not caught by the integrated assembler when emitting
instructions directly to an object file.
Adam Nemet [Tue, 15 Nov 2016 08:40:51 +0000 (08:40 +0000)]
[opt-viewer] Add support for libYAML for faster parsing
This results in a speed-up of over 6x on sqlite3.
Before:
$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
real 415.07
user 410.00
sys 4.66
After with libYAML:
$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
real 63.96
user 60.03
sys 3.67
I followed these steps to get libYAML working with PyYAML: http://rmcgibbo.github.io/blog/2013/05/23/faster-yaml-parsing-with-libyaml/
Zvi Rackover [Tue, 15 Nov 2016 06:34:33 +0000 (06:34 +0000)]
[X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.
Greg Clayton [Tue, 15 Nov 2016 01:23:06 +0000 (01:23 +0000)]
Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration
This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse ULEB128 abbrev code, and then add constant size to the offset.
This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to lookup the fixed for size.
Notable improvements:
- DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value
- Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const;
- DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize
- bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const;
- A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster:
- Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const;
- Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit
Rui Ueyama [Tue, 15 Nov 2016 00:54:54 +0000 (00:54 +0000)]
Add a file magic for CL.exe's object file created with /GL.
This patch makes it possible to identify object files created by CL.exe
with /GL option. Such file contains Microsoft proprietary intermediate
code instead of target machine code to do LTO.
I need this to print out user-friendly error message from LLD.
Permit specifying the match length (the `-n` or `--bytes` option). The
deprecated `-[length]` form is not supported as an option. This allows the
strings tool to display only the specified length strings rather than the
hardcoded default length of >= 4.
Evandro Menezes [Mon, 14 Nov 2016 23:29:01 +0000 (23:29 +0000)]
[AArch64] Compute the Newton series for reciprocals natively
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.
Kuba Brecka [Mon, 14 Nov 2016 21:41:13 +0000 (21:41 +0000)]
[tsan] Add support for C++ exceptions into TSan (call __tsan_func_exit during unwinding), LLVM part
This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exitted via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed.
Kevin Enderby [Mon, 14 Nov 2016 20:57:04 +0000 (20:57 +0000)]
Add a checkSymbolTable() method to the MachOObjectFile class.
The philosophy of the error checking in libObject for Mach-O files
is that the constructor will check the load commands so for their
tables the offsets and sizes are properly contained in the file.
But there is no checking of the entries of any of the tables.
For the contents of the tables themselves the methods accessing
the contents of the entries return errors as needed. In some
cases this however makes it difficult or cumbersome to produce
a good error message which would include the tool name, file name,
archive member, and name of the architecture of a slice of a universal file
the error occurred in.
So idea is that there will be a method to check a table which can
be called up front before using it allowing a good error message
to be produced before a table is used. And if only verification of
the Mach-O file and its tables are wanted a new possible method
checkAllTables() could be added to call all of the methods to
check all the tables at some time when such methods exist.
The checkSymbolTable() is the first of such methods to check
one of the Mach-O file tables. This method initially will used in
llvm-objdump’s DisassembleMachO() routine before it gets the
section and symbol information. As if there are problems with
the symbol table currently the error is first encountered by the
bool operator() in the SymbolSorter() struct which passed to
std::sort(). In this case there is no context as to the file name
the symbol which results a poor error message:
LLVM ERROR: truncated or malformed object (bad string index: 22 for symbol at index 1)
with the added call to the checkSymbolTable() method the
error message includes the tool name and file name:
llvm-objdump: 'macho-invalid-symbol-strx': truncated or malformed object (bad string table index: 22 past the end of string table, for symbol at index 1)
Tim Northover [Mon, 14 Nov 2016 20:28:24 +0000 (20:28 +0000)]
Recommit: ARM: sort register lists by encoding in push/pop instructions.
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.
Geoff Berry [Mon, 14 Nov 2016 19:39:04 +0000 (19:39 +0000)]
[AArch64] Split 0 vector stores into scalar store pairs.
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.
Teresa Johnson [Mon, 14 Nov 2016 19:21:41 +0000 (19:21 +0000)]
[ThinLTO] Only promote exported locals as marked in index
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.
Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.
This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.
I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.
Tim Northover [Mon, 14 Nov 2016 19:02:17 +0000 (19:02 +0000)]
ARM: sort register lists by encoding in push/pop instructions.
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
Changpeng Fang [Mon, 14 Nov 2016 18:33:18 +0000 (18:33 +0000)]
AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
Extend image intrinsics to support data types of V1F32 and V2F32.
TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
even though such case should be very rare.
Bob Wilson [Mon, 14 Nov 2016 17:56:18 +0000 (17:56 +0000)]
Use _Unwind_Backtrace on Apple platforms.
Darwin's backtrace() function does not work with sigaltstack (which was
enabled when available with r270395) — it does a sanity check to make
sure that the current frame pointer is within the expected stack area
(which it is not when using an alternate stack) and gives up otherwise.
The alternative of _Unwind_Backtrace seems to work fine on macOS, so use
that when backtrace() fails. Note that we then use backtrace_symbols_fd()
with the addresses from _Unwind_Backtrace, but I’ve tested that and it
also seems to work fine. rdar://problem/28646552
Teresa Johnson [Mon, 14 Nov 2016 17:12:32 +0000 (17:12 +0000)]
Restore "[ThinLTO] Prevent exporting of locals used/defined in module level asm"
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).
Original description of this part of patch follows:
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
[Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0
In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.