Craig Topper [Sun, 12 Jun 2016 04:48:00 +0000 (04:48 +0000)]
[X86] Pre-allocate some of the shuffle mask SmallVectors in the auto upgrade code instead of calling push_back in a loop. This removes the need to check if the vector needs to grow on each iteration.
Craig Topper [Sun, 12 Jun 2016 03:10:47 +0000 (03:10 +0000)]
[X86] Greatly simplify the llvm.x86.avx.vpermil.* auto-upgrade code. We can fully derive everything using types of the intrinsic arguments rather than writing separate loops for each intrinsic. NFC
Craig Topper [Sun, 12 Jun 2016 01:05:59 +0000 (01:05 +0000)]
[X86,IR] Make use of the CreateShuffleVector form that takes an ArrayRef<uint32_t> to avoid the need to manually create a bunch of Constants and a ConstantVector. NFC
Craig Topper [Sun, 12 Jun 2016 00:41:19 +0000 (00:41 +0000)]
[IR] Require ArrayRef of 'uint32_t' instead of 'int' for the mask argument for one of the signatures of CreateShuffleVector. This better emphasises that you can't use it for the -1 as undef behavior.
Eli Friedman [Sat, 11 Jun 2016 21:48:25 +0000 (21:48 +0000)]
[LICM] Make isGuaranteedToExecute more accurate.
Summary:
Make isGuaranteedToExecute use the
isGuaranteedToTransferExecutionToSuccessor helper, and make that helper
a bit more accurate.
There's a potential performance impact here from assuming that arbitrary
calls might not return. This probably has little impact on loads and
stores to a pointer because most things alias analysis can reason about
are dereferenceable anyway. The other impacts, like less aggressive
hoisting of sdiv by a variable and less aggressive hoisting around
volatile memory operations, are unlikely to matter for real code.
This also impacts SCEV, which uses the same helper. It's a minor
improvement there because we can tell that, for example, memcpy always
returns normally. Strictly speaking, it's also introducing
a bug, but it's not any worse than everywhere else we assume readonly
functions terminate.
Simon Pilgrim [Sat, 11 Jun 2016 20:39:21 +0000 (20:39 +0000)]
[X86] Updated test checks script to generalise LCPI symbol refs
The script now replace '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file
This is one of the patches to clean up the code so that
it is in a better form to make future enhancements easier.
In htis patch, the logic to collect viable successors are
extrated as a helper to unclutter the caller which gets very
large recenty. Also cleaned up BP adjustment code.
Vikram TV [Sat, 11 Jun 2016 16:41:10 +0000 (16:41 +0000)]
Delay dominator updation while cloning loop.
Summary:
Dominator updation fails for a loop inserted with a new basicblock.
A block required by DT to set the IDom might not have been cloned yet. This is because there is no predefined ordering of loop blocks (except for the header block which should be the first block in the list).
The patch first creates DT nodes for the cloned blocks and then separately updates the DT in a follow-on loop.
Simon Pilgrim [Sat, 11 Jun 2016 15:44:13 +0000 (15:44 +0000)]
[X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.
Simon Pilgrim [Sat, 11 Jun 2016 12:54:37 +0000 (12:54 +0000)]
[X86][SSE] Use vXi8 return type for PSLLDQ/PSRLDQ instructions
These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements
Chandler Carruth [Sat, 11 Jun 2016 08:19:59 +0000 (08:19 +0000)]
Use a two-level cast through an intptr_t, and make them C-style casts.
This shouldn't have any functional difference, but it appears to be the
pattern used for other methods on DynamicLibrary, and it should avoid
the -Wpedantic warning on one of the build bots about the direct
reinterpret_cast.
Lang Hames [Sat, 11 Jun 2016 05:47:04 +0000 (05:47 +0000)]
[MCJIT] Update MCJIT and get the fibonacci example working again.
MCJIT will now set the DataLayout on a module when it is added to the JIT,
rather than waiting until it is codegen'd, and the runFunction method will
finalize the module containing the function to be run before running it.
The fibonacci example has been updated to include and link against MCJIT.
Craig Topper [Sat, 11 Jun 2016 03:27:37 +0000 (03:27 +0000)]
[AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled.
Matthias Braun [Sat, 11 Jun 2016 00:31:28 +0000 (00:31 +0000)]
LiveIntervalAnalysis: findLastUseBefore() must ignore undef uses.
undef uses are no real uses of a register and must be ignored by
findLastUseBefore() so that handleMove() does not produce invalid live
intervals in some cases.
Don't try to rotate a loop more than once - we never do this anyway.
Summary:
I can't find a case where we can rotate a loop more than once, and it looks
like we never do this. To rotate a loop following conditions should be met:
1) its header should be exiting
2) its latch shouldn't be exiting
But after the first rotation the header becomes the new latch, so this
condition can never be true any longer.
Tested on with an assert on LNT testsuite and make check.
Zachary Turner [Fri, 10 Jun 2016 21:47:26 +0000 (21:47 +0000)]
[pdb] Fix issues with pdb writing.
This fixes an alignment issue by forcing all cached allocations
to be 8 byte aligned, and also fixes an issue arising on big
endian systems by writing ulittle32_t's instead of uint32_t's
in the test.
Sebastian Pop [Fri, 10 Jun 2016 21:36:41 +0000 (21:36 +0000)]
MemorySSA: fix memory access local dominance function for live on entry
A memory access defined on function entry cannot be locally dominated by another memory access.
The patch was split from http://reviews.llvm.org/D19338 which exposes the problem.
Etienne Bergeron [Fri, 10 Jun 2016 20:24:38 +0000 (20:24 +0000)]
[CodeGen] Fix PrologEpilogInserter to avoid duplicate allocation of SEH structs
Summary:
When stack-protection is activated and WinEH exceptions is used,
the EHRegNode (exception handling registration) is allocated twice on the stack.
This was not breaking anything except loosing space on the stack.
```
D:\src\llvm\examples>llc exc2.ll -debug-only=pei
alloc FI(0) at SP[-24]
alloc FI(1) at SP[-48] <<-- Allocated
alloc FI(1) at SP[-72] <<-- Allocated twice!?
alloc FI(2) at SP[-76]
alloc FI(4) at SP[-80]
alloc FI(3) at SP[-84]
```
Evgeniy Stepanov [Fri, 10 Jun 2016 20:03:20 +0000 (20:03 +0000)]
Disable MSan-hostile loop unswitching.
Loop unswitching may cause MSan false positive when the unswitch
condition is not guaranteed to execute.
This is very similar to ASan and TSan special case in
llvm::isSafeToSpeculativelyExecute (they don't like speculative loads
and stores), but for branch instructions.
Zhan Jun Liau [Fri, 10 Jun 2016 19:58:10 +0000 (19:58 +0000)]
[SystemZ] Support Compare and Traps
Support and generate Compare and Traps like CRT, CIT, etc.
Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.
Tom Stellard [Fri, 10 Jun 2016 19:26:38 +0000 (19:26 +0000)]
AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Mehdi Amini [Fri, 10 Jun 2016 18:37:21 +0000 (18:37 +0000)]
Interprocedural Register Allocation (IPRA): add a Transformation Pass
Adds a MachineFunctionPass that scans the body to find calls, and
update the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Sanjay Patel [Fri, 10 Jun 2016 15:17:54 +0000 (15:17 +0000)]
[x86] add missing tests for fcmp ueq/one
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).
There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.
Matthew Simpson [Fri, 10 Jun 2016 14:33:30 +0000 (14:33 +0000)]
Reapply "[TTI] Refine default cost for interleaved load groups with gaps"
This reapplies commit r272385 with a fix. The build was failing when compiled
with gcc, but not with clang. With the fix, we now get the data layout from the
current TTI implementation, which will hopefully solve the issue.
Matthew Simpson [Fri, 10 Jun 2016 11:27:51 +0000 (11:27 +0000)]
[TTI] Refine default cost for interleaved load groups with gaps
This patch refines the default cost for interleaved load groups having gaps. If
a load group has gaps, the legalized instructions corresponding to the unused
elements will be dead. Thus, we don't need to account for them in the cost
model. Instead, we only need to account for the fraction of legalized loads
that will actually be used.
Sam Kolton [Fri, 10 Jun 2016 09:57:59 +0000 (09:57 +0000)]
[AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand.
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).
Bug fix remove another illegal char from prof symbol name
End-end test with no integrated assembly should be added
at some point (not done now because some bots are not properly configured to
support -no-integrated-as)
The issue appears to be mixing non-ASan-ified code (LibFuzzer) and
ASan-ified code (the unittest) as the tests would pass fine if
everything was built with ASan enabled.
I believe the issue is that different implementations of std::vector<>
are being used in LibFuzzer and outside LibFuzzer (in the unittests).
For Libcxx (I've not seen the issue manifest for libstdc++) we can disable
the ASanified std::vector<> by definining the ``_LIBCPP_HAS_NO_ASAN`` macro.
Doing this fixes the tests on OSX.
Zachary Turner [Fri, 10 Jun 2016 05:10:19 +0000 (05:10 +0000)]
Make PDBFile take a StreamInterface instead of a MemBuffer.
This is the next step towards being able to write PDBs.
MemoryBuffer is immutable, and StreamInterface is our replacement
which can be any combination of read-only, read-write, or write-only
depending on the particular implementation.
The one place where we were creating a PDBFile (in RawSession) is
updated to subclass ByteStream with a simple adapter that holds
a MemoryBuffer, and initializes the superclass with the buffer's
array, so that all the functionality of ByteStream works
transparently.
Zachary Turner [Fri, 10 Jun 2016 05:09:12 +0000 (05:09 +0000)]
Add support for writing through StreamInterface.
This adds method and tests for writing to a PDB stream. With
this, even a PDB stream which is discontiguous can be treated
as a sequential stream of bytes for the purposes of writing.
Craig Topper [Fri, 10 Jun 2016 04:48:05 +0000 (04:48 +0000)]
[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names.
Daniel Dunbar [Fri, 10 Jun 2016 04:17:30 +0000 (04:17 +0000)]
[lit] Only gather redirected files for command failures.
- The intended use of this was just in diagnostics, so we shouldn't pay the
cost of reading these all the time.
- This will avoid including the full output of each command in tests which
fail, but the most important use case for this was to gather the output of
the specific command which failed.
Quentin Colombet [Fri, 10 Jun 2016 01:57:48 +0000 (01:57 +0000)]
[LiveRangeEdit] Add a test case for r272314.
The test case is not great espicially because it is still cumbersome to
run the regalloc pass with run-pass. (We miss a bunch of initiliazier to
be properly implemented.)
Quentin Colombet [Fri, 10 Jun 2016 00:52:10 +0000 (00:52 +0000)]
[llc] Add support for several run-pass options.
Previously we could run only one machine pass with the run-pass option.
With that patch, we can now specify several passes with several run-pass
options (or just one option with a list of comma separated passes) and
llc will build the related pipeline.
This is great to test the interaction of two passes that are not
necessarily next to each other in the pipeline, or play with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.
Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!
Tom Stellard [Fri, 10 Jun 2016 00:31:13 +0000 (00:31 +0000)]
docs: Add AMDGPU relocation information
Summary:
This documents the various relocation types that are supported by the
Radeon Open Compute (ROC) runtime (which is essentially the dynamic
linker for AMDGPU).
Only R_AMDGPU_32 is not currently supported by the ROC runtime, but
it will usually be resolved at link time by lld.
Tom Stellard [Fri, 10 Jun 2016 00:01:04 +0000 (00:01 +0000)]
AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.