Simon Pilgrim [Sat, 11 Mar 2017 20:42:31 +0000 (20:42 +0000)]
[X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.
This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?
Matt Arsenault [Sat, 11 Mar 2017 05:40:40 +0000 (05:40 +0000)]
AMDGPU: Keep track of modifiers when converting v_mac to v_mad
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Fiselier [Sat, 11 Mar 2017 02:24:13 +0000 (02:24 +0000)]
Revert r297516 - Respect CMAKE_INSTALL_MANDIR for sphinx generated manpages
When CMAKE_INSTALL_MANDIR isn't defined it ends up attempting to install
the man pages under "/man1" and we really don't want to accidentally install
stuff at the filesystem root.
Daniel Berlin [Sat, 11 Mar 2017 01:41:03 +0000 (01:41 +0000)]
Remove opt-bisect support for "cases" in favor of debug counters
Summary:
Ths "cases" support was not quite finished, is unused, and is really just debug counters.
(well, almost, debug counters are slightly more powerful, in that they can skip things at the start, too).
Note, opt-bisect itself could also be implemented as a wrapper around
debug counters, but not sure it's worth it ATM.
Jordan Rose [Sat, 11 Mar 2017 01:24:56 +0000 (01:24 +0000)]
[unittest] Explicitly specify alignment when using BumpPtrAllocator.
r297310 began inserting red zones around allocations under ASan, which
perturbs the alignment of subsequent allocations. Deliberately specify
this in two places where it matters.
Fixes failures when these tests are run under ASan and UBSan together.
Reviewed by Duncan Exon Smith.
Sanjoy Das [Sat, 11 Mar 2017 01:15:48 +0000 (01:15 +0000)]
Use a WeakVH for UnknownInstructions in AliasSetTracker
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.
AST was not correctly tracking and deleting UnknownInstructions via
handles. The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.
There are two other ways to solve this problem:
- Use the `PointerRec` scheme for both known and unknown instructions.
- Use a `CallbackVH` that erases the offending Instruction from the
UnknownInstruction list.
Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove artificial
ranking.
Quentin Colombet [Sat, 11 Mar 2017 00:28:33 +0000 (00:28 +0000)]
[IRTranslator] Simplify error handling for translating constants. NFC.
We don't need to check whether the fallback path is enabled to return
false. Just do that all the time on error cases, the caller knows (or
at least should know!) how to handle the failing case.
The problem can occur in presence of subregs. If we are swapping two
instructions defining different subregs of the same register we will
get a new liveout from a block. We need to preserve value number for
block's liveout for successor block's livein to match.
This function will find the closest ref node aliased to Reg that is
in an instruction preceding Inst. This could be used to identify the
hypothetical reaching def of Reg, if Reg was a member of Inst.
Evandro Menezes [Fri, 10 Mar 2017 20:20:04 +0000 (20:20 +0000)]
[AArch64, X86] Additional debug information for MacroFusion
In order to make it easier to parse information about the performance of
MacroFusion, this patch adds the function and the instruction names to the
debug output of this pass.
[SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.
Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.
Volkan Keles [Fri, 10 Mar 2017 18:34:57 +0000 (18:34 +0000)]
[GlobalISel] Make LegalizerInfo accessible in LegalizerHelper
Summary:
We don’t actually use LegalizerInfo in Legalizer pass, it’s just passed
as an argument.
In order to check if an instruction is legal or not, we need to get LegalizerInfo
by calling `MI.getParent()->getParent()->getSubtarget().getLegalizerInfo()`.
Instead, make LegalizerInfo accessible in LegalizerHelper.
Zachary Turner [Fri, 10 Mar 2017 18:33:41 +0000 (18:33 +0000)]
[Support] Don't return an error if realPath fails.
In openFileForRead, we would not previously return an error
if real_path resolution failed. After a recent patch, we
started propagating this error up. This caused a failure
in clang when trying to call openFileForRead("nul"). This
patch restores the previous behavior of not propagating this
error up.
Simon Pilgrim [Fri, 10 Mar 2017 18:01:53 +0000 (18:01 +0000)]
[X86][SSE] Added tests showing missed truncations for sitofp conversion
SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller.
Zachary Turner [Fri, 10 Mar 2017 17:39:21 +0000 (17:39 +0000)]
Add llvm::sys::fs::real_path.
LLVM already has real_path like functionality, but it is
cumbersome to use and involves clean up after (e.g. you have
to call openFileForRead, then close the resulting FD).
Furthermore, on Windows it doesn't work for directories since
opening a directory and opening a file require slightly
different flags.
So I add a simple function `real_path` which works for all
paths on all platforms and has a simple to use interface.
In doing so, I add the ability to opt in to resolving tilde
expressions (e.g. ~/foo), which are normally handled by
the shell.
Simon Pilgrim [Fri, 10 Mar 2017 17:23:55 +0000 (17:23 +0000)]
[X86][MMX] Add tests showing missed opportunities to use MMX sitofp conversions
If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc.
Simon Pilgrim [Fri, 10 Mar 2017 16:59:43 +0000 (16:59 +0000)]
[X86][MMX] Add tests showing missed opportunities to use MMX fptosi conversions
If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) with affecting rounding/expections etc.
Simon Pilgrim [Fri, 10 Mar 2017 13:44:32 +0000 (13:44 +0000)]
[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
Simon Dardis [Fri, 10 Mar 2017 13:27:14 +0000 (13:27 +0000)]
[mips][msa] Accept more values for constant splats
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.
As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.
Sanne Wouda [Fri, 10 Mar 2017 13:08:20 +0000 (13:08 +0000)]
[Assembler] Add location info to unary expressions.
Summary:
This is a continuation of D28861. Add an SMLoc to MCUnaryExpr such that
a better diagnostic can be given in case of an error in later stages of
assembling.
Artyom Skrobov [Fri, 10 Mar 2017 12:41:33 +0000 (12:41 +0000)]
Refactor the multiply-accumulate combines to act on
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].
Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.
George Rimar [Fri, 10 Mar 2017 10:31:56 +0000 (10:31 +0000)]
WholeProgramDevirt: Fixed compilation error under MSVS2015.
It was introduced in:
r296945
WholeProgramDevirt: Implement exporting for single-impl devirtualization.
---------------------
r296939
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
---------------------
Microsoft Visual Studio Community 2015
Version 14.0.23107.0 D14REL
Does not compile that code without additional brackets, showing multiple error like below:
WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'
Simon Atanasyan [Fri, 10 Mar 2017 08:22:20 +0000 (08:22 +0000)]
[MC] Set SHT_MIPS_DWARF section type for all .debug_* sections on MIPS
All MIPS .debug_* sections should be marked with ELF type SHT_MIPS_DWARF
accordingly the specification [1]. Also the same section type is assigned
to these sections by GNU tools.
Simon Atanasyan [Fri, 10 Mar 2017 08:22:13 +0000 (08:22 +0000)]
[MC] Accept a numeric value as an ELF section header's type
GAS supports specification of section header's type using a numeric
value [1]. This patch brings the same functionality to LLVM. That allows
to setup some target-specific section types belong to the SHT_LOPROC -
SHT_HIPROC range. If we attempt to print unknown section type, MCSectionELF
class shows an error message. It's better than print sole '@' sign
without any section type name.
In case of MIPS, example of such section's type is SHT_MIPS_DWARF.
Without the patch we will have to implement some workarounds
in probably not-MIPS-specific part of code base to convert SHT_MIPS_DWARF
to the @progbits while printing assembly and to assign SHT_MIPS_DWARF for
@progbits sections named .debug_* if we encounter such section in
an input assembly.
Daniel Berlin [Fri, 10 Mar 2017 04:54:10 +0000 (04:54 +0000)]
Move memory coercion functions from GVN.cpp to VNCoercion.cpp so they can be shared between GVN and NewGVN.
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
could be reused.
Tim Northover [Thu, 9 Mar 2017 21:12:06 +0000 (21:12 +0000)]
GlobalISel: put debug info for static allocas in the MachineFunction.
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.
The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken. And it still gets it wrong if the variable is at SP
directly.