Lang Hames [Sat, 11 Jun 2016 05:47:04 +0000 (05:47 +0000)]
[MCJIT] Update MCJIT and get the fibonacci example working again.
MCJIT will now set the DataLayout on a module when it is added to the JIT,
rather than waiting until it is codegen'd, and the runFunction method will
finalize the module containing the function to be run before running it.
The fibonacci example has been updated to include and link against MCJIT.
Craig Topper [Sat, 11 Jun 2016 03:27:37 +0000 (03:27 +0000)]
[AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled.
Matthias Braun [Sat, 11 Jun 2016 00:31:28 +0000 (00:31 +0000)]
LiveIntervalAnalysis: findLastUseBefore() must ignore undef uses.
undef uses are no real uses of a register and must be ignored by
findLastUseBefore() so that handleMove() does not produce invalid live
intervals in some cases.
Don't try to rotate a loop more than once - we never do this anyway.
Summary:
I can't find a case where we can rotate a loop more than once, and it looks
like we never do this. To rotate a loop following conditions should be met:
1) its header should be exiting
2) its latch shouldn't be exiting
But after the first rotation the header becomes the new latch, so this
condition can never be true any longer.
Tested on with an assert on LNT testsuite and make check.
Zachary Turner [Fri, 10 Jun 2016 21:47:26 +0000 (21:47 +0000)]
[pdb] Fix issues with pdb writing.
This fixes an alignment issue by forcing all cached allocations
to be 8 byte aligned, and also fixes an issue arising on big
endian systems by writing ulittle32_t's instead of uint32_t's
in the test.
Sebastian Pop [Fri, 10 Jun 2016 21:36:41 +0000 (21:36 +0000)]
MemorySSA: fix memory access local dominance function for live on entry
A memory access defined on function entry cannot be locally dominated by another memory access.
The patch was split from http://reviews.llvm.org/D19338 which exposes the problem.
Etienne Bergeron [Fri, 10 Jun 2016 20:24:38 +0000 (20:24 +0000)]
[CodeGen] Fix PrologEpilogInserter to avoid duplicate allocation of SEH structs
Summary:
When stack-protection is activated and WinEH exceptions is used,
the EHRegNode (exception handling registration) is allocated twice on the stack.
This was not breaking anything except loosing space on the stack.
```
D:\src\llvm\examples>llc exc2.ll -debug-only=pei
alloc FI(0) at SP[-24]
alloc FI(1) at SP[-48] <<-- Allocated
alloc FI(1) at SP[-72] <<-- Allocated twice!?
alloc FI(2) at SP[-76]
alloc FI(4) at SP[-80]
alloc FI(3) at SP[-84]
```
Evgeniy Stepanov [Fri, 10 Jun 2016 20:03:20 +0000 (20:03 +0000)]
Disable MSan-hostile loop unswitching.
Loop unswitching may cause MSan false positive when the unswitch
condition is not guaranteed to execute.
This is very similar to ASan and TSan special case in
llvm::isSafeToSpeculativelyExecute (they don't like speculative loads
and stores), but for branch instructions.
Zhan Jun Liau [Fri, 10 Jun 2016 19:58:10 +0000 (19:58 +0000)]
[SystemZ] Support Compare and Traps
Support and generate Compare and Traps like CRT, CIT, etc.
Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.
Tom Stellard [Fri, 10 Jun 2016 19:26:38 +0000 (19:26 +0000)]
AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Mehdi Amini [Fri, 10 Jun 2016 18:37:21 +0000 (18:37 +0000)]
Interprocedural Register Allocation (IPRA): add a Transformation Pass
Adds a MachineFunctionPass that scans the body to find calls, and
update the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Sanjay Patel [Fri, 10 Jun 2016 15:17:54 +0000 (15:17 +0000)]
[x86] add missing tests for fcmp ueq/one
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).
There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.
Matthew Simpson [Fri, 10 Jun 2016 14:33:30 +0000 (14:33 +0000)]
Reapply "[TTI] Refine default cost for interleaved load groups with gaps"
This reapplies commit r272385 with a fix. The build was failing when compiled
with gcc, but not with clang. With the fix, we now get the data layout from the
current TTI implementation, which will hopefully solve the issue.
Matthew Simpson [Fri, 10 Jun 2016 11:27:51 +0000 (11:27 +0000)]
[TTI] Refine default cost for interleaved load groups with gaps
This patch refines the default cost for interleaved load groups having gaps. If
a load group has gaps, the legalized instructions corresponding to the unused
elements will be dead. Thus, we don't need to account for them in the cost
model. Instead, we only need to account for the fraction of legalized loads
that will actually be used.
Sam Kolton [Fri, 10 Jun 2016 09:57:59 +0000 (09:57 +0000)]
[AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand.
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).
Bug fix remove another illegal char from prof symbol name
End-end test with no integrated assembly should be added
at some point (not done now because some bots are not properly configured to
support -no-integrated-as)
The issue appears to be mixing non-ASan-ified code (LibFuzzer) and
ASan-ified code (the unittest) as the tests would pass fine if
everything was built with ASan enabled.
I believe the issue is that different implementations of std::vector<>
are being used in LibFuzzer and outside LibFuzzer (in the unittests).
For Libcxx (I've not seen the issue manifest for libstdc++) we can disable
the ASanified std::vector<> by definining the ``_LIBCPP_HAS_NO_ASAN`` macro.
Doing this fixes the tests on OSX.
Zachary Turner [Fri, 10 Jun 2016 05:10:19 +0000 (05:10 +0000)]
Make PDBFile take a StreamInterface instead of a MemBuffer.
This is the next step towards being able to write PDBs.
MemoryBuffer is immutable, and StreamInterface is our replacement
which can be any combination of read-only, read-write, or write-only
depending on the particular implementation.
The one place where we were creating a PDBFile (in RawSession) is
updated to subclass ByteStream with a simple adapter that holds
a MemoryBuffer, and initializes the superclass with the buffer's
array, so that all the functionality of ByteStream works
transparently.
Zachary Turner [Fri, 10 Jun 2016 05:09:12 +0000 (05:09 +0000)]
Add support for writing through StreamInterface.
This adds method and tests for writing to a PDB stream. With
this, even a PDB stream which is discontiguous can be treated
as a sequential stream of bytes for the purposes of writing.
Craig Topper [Fri, 10 Jun 2016 04:48:05 +0000 (04:48 +0000)]
[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names.
Daniel Dunbar [Fri, 10 Jun 2016 04:17:30 +0000 (04:17 +0000)]
[lit] Only gather redirected files for command failures.
- The intended use of this was just in diagnostics, so we shouldn't pay the
cost of reading these all the time.
- This will avoid including the full output of each command in tests which
fail, but the most important use case for this was to gather the output of
the specific command which failed.
Quentin Colombet [Fri, 10 Jun 2016 01:57:48 +0000 (01:57 +0000)]
[LiveRangeEdit] Add a test case for r272314.
The test case is not great espicially because it is still cumbersome to
run the regalloc pass with run-pass. (We miss a bunch of initiliazier to
be properly implemented.)
Quentin Colombet [Fri, 10 Jun 2016 00:52:10 +0000 (00:52 +0000)]
[llc] Add support for several run-pass options.
Previously we could run only one machine pass with the run-pass option.
With that patch, we can now specify several passes with several run-pass
options (or just one option with a list of comma separated passes) and
llc will build the related pipeline.
This is great to test the interaction of two passes that are not
necessarily next to each other in the pipeline, or play with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.
Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!
Tom Stellard [Fri, 10 Jun 2016 00:31:13 +0000 (00:31 +0000)]
docs: Add AMDGPU relocation information
Summary:
This documents the various relocation types that are supported by the
Radeon Open Compute (ROC) runtime (which is essentially the dynamic
linker for AMDGPU).
Only R_AMDGPU_32 is not currently supported by the ROC runtime, but
it will usually be resolved at link time by lld.
Tom Stellard [Fri, 10 Jun 2016 00:01:04 +0000 (00:01 +0000)]
AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.
Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.
Alina Sbirlea [Thu, 9 Jun 2016 23:04:15 +0000 (23:04 +0000)]
Reapply 272328 and 272329 as a single patch.
[cpu-detection] [amdfam10] Return barcelona, and amdfam10 for all other
subtypes. Address Bug 28067.
Along with the refactoring of Host.cpp, getHostCPUName() was modified to
return more precise types for CPUs in amdfam10.
However, callers of getHostCPUName() do string matching on type, so this
cannot be modified.
Currently there is support in the x86 backend for barcelona.
For all other subtypes the assumed return value is amdfam10.
Fix: getHostCPUName() returns barcelona subtype and amdfam10 for all
others. This can be extended further when support for the other subtypes
is added.
Easwaran Raman [Thu, 9 Jun 2016 22:23:21 +0000 (22:23 +0000)]
Use ProfileSummaryInfo in inline cost analysis.
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
When we delete a live-range, we check if that live-range is the origin of others
to keep it around for rematerialization. For that we check that the instruction
we are about to remove is the same as the definition of the VNI of the original
live-range.
If this is the case, we just shrink the live-range to an empty one.
Now, when we try to delete one of the children of such live-range (product of
splitting), we do the same check.
However, now the original live-range is empty and there is no way we can
access the VNI to check its definition, and we crash.
When we cannot get the VNI for the original live-range, that means we are not in
the presence of the original definition. Thus, this check does not need to happen
in that case and the crash is sloved!
This bug was introduced in r266162 | wmi | 2016-04-12 20:08:27. It affects every
target that uses the greedy register allocator.
To happen, we need to delete both a the original instruction and its split
products, in that order. This is likely to happen when rematerialization comes
into play.
Trying to produce a more robust test case. Will follow in a coming commit.
BitcodeReader: Use std:::piecewise_construct when upgrading type refs
r267296 used std::piecewise_construct without using
std::forward_as_tuple, and r267298 hacked it out (using an emplace_back
followed by a couple of reset() calls) because of a problem on a bot.
I'm finally circling back to call forward_as_tuple as I should have to
begin with (thanks to David Blaikie for pointing out the missing piece).
Note that this code uses emplace_back() instead of
push_back(make_pair()) because the move constructor for TrackingMDRef is
expensive (cheaper than a copy, but still expensive).
Justin Lebar [Thu, 9 Jun 2016 20:04:08 +0000 (20:04 +0000)]
[NVPTX] Add intrinsics for shfl instructions.
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.