Tom Stellard [Tue, 24 Jun 2014 23:33:07 +0000 (23:33 +0000)]
R600/SI: Use a ComplexPattern for MUBUF stores
Now that non-leaf ComplexPatterns are allowed we can fold all the MUBUF
store patterns into the instruction definition. We will also be able to
reuse this new ComplexPattern for MUBUF loads and atomic operations.
Rafael Espindola [Tue, 24 Jun 2014 22:45:16 +0000 (22:45 +0000)]
Print a=b as an assignment.
In assembly the expression a=b is parsed as an assignment, so it should be
printed as one.
This remove a truly horrible hack for producing a label with "a=.". It would
be used by codegen but would never be reached by the asm parser. Sorry I
missed this when it was first committed.
Matt Arsenault [Tue, 24 Jun 2014 22:13:39 +0000 (22:13 +0000)]
R600: Fix inconsistency in rsq instructions.
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
David Blaikie [Tue, 24 Jun 2014 20:10:27 +0000 (20:10 +0000)]
Fix up scoping in a few tests (and delete one that validates unnecessary behavior).
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.
test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.
Bill Schmidt [Tue, 24 Jun 2014 20:05:18 +0000 (20:05 +0000)]
[PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer. The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.
The intent is for fast-isel to punt to DAG selection when this instruction is
not available. This patch implements that change. For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation. Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated. I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.
Diego Novillo [Tue, 24 Jun 2014 17:02:03 +0000 (17:02 +0000)]
Add new debug kind LocTrackingOnly.
Summary:
This new debug emission kind supports emitting line location
information in all instructions, but stops code generation
from emitting debug info to the final output.
This mode is useful when the backend wants to track source
locations during code generation, but it does not want to
produce debug info. This is currently used by optimization
remarks (-pass-remarks, -pass-remarks-missed and
-pass-remarks-analysis).
To prevent debug info emission, DIBuilder never inserts the
annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled.
Daniel Sanders [Tue, 24 Jun 2014 13:53:56 +0000 (13:53 +0000)]
Revert: r211588 - [mips] Use __clear_cache builtin instead of cacheflush() in Unix Memory::InvalidateInstructionCache()
Buildbot reports a test failure on the llvm-mips-linux builder and blames r211588.
Although it doesn't appear in the blamelist, it seems it could also be r211587
(because it's committed to compiler-rt?) since they were tested together.
Reverting the most likely suspect (r211588) to confirm one way or the other.
Daniel Sanders [Tue, 24 Jun 2014 13:00:32 +0000 (13:00 +0000)]
[mips] Added support for assembling sdbbp.
Summary:
This instruction is re-encoded in MIPS32r6/MIPS64r6 without changing the
restrictions. We hadn't implemented it for earlier ISA's so it has been added to those too.
ScaledNumber has been cleaned up enough to pull out of BFI now. Still
work to do there (tests for shifting, bloated printing code, etc.), but
it seems clean enough for its new home.
Rafael Espindola [Mon, 23 Jun 2014 22:00:37 +0000 (22:00 +0000)]
Pass a std::unique_ptr& to the create??? methods is lib/Object.
This makes the buffer ownership on error conditions very natural. The buffer
is only moved out of the argument if an object is constructed that now
owns the buffer.
Juergen Ributzka [Mon, 23 Jun 2014 21:55:44 +0000 (21:55 +0000)]
[FastISel][X86] Lower unsupported selects to control-flow.
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.
Juergen Ributzka [Mon, 23 Jun 2014 21:55:40 +0000 (21:55 +0000)]
[FastISel][X86] Add support for floating-point select.
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.
Rafael Espindola [Mon, 23 Jun 2014 21:53:12 +0000 (21:53 +0000)]
Make ObjectFile and BitcodeReader always own the MemoryBuffer.
This allows us to just use a std::unique_ptr to store the pointer to the buffer.
The flip side is that they have to support releasing the buffer back to the
caller.
Overall this looks like a more efficient and less brittle api.
Weiming Zhao [Mon, 23 Jun 2014 20:44:16 +0000 (20:44 +0000)]
Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64
This patch is based on the changes from ARM target [1,2]
Based on ARM doc [3], if the literal value can be loaded with a valid MOV,
it can emit that instruction. This is implemented in this patch.
[1] Fix PR18345: ldr= pseudo instruction produces incorrect code when using in inline assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit b92cca222898d87bbc764fa22e805adb04ef7f13 (r200777)
[2] Implement the ldr-pseudo opcode for ARM assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit 0fa193b08627927ccaa0804a34d80480894614b8 (r197708)
[3] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/CJAHAIBC.html
David Blaikie [Mon, 23 Jun 2014 18:28:53 +0000 (18:28 +0000)]
Recommit 211309 (StringMap::insert), reverted in 211328 due to issues with private, but non-deleted, move members.
Certain versions of GCC (~4.7) couldn't handle the SFINAE on access
control, but with "= delete" (hidden behind a macro for portability)
this issue is worked around/addressed.
Rafael Espindola [Mon, 23 Jun 2014 15:13:23 +0000 (15:13 +0000)]
Stop producing func.eh symbols on Darwin.
According Nick Kledzik (http://llvm.org/bugs/show_bug.cgi?id=19430#c2):
"... mach-o no longer needs names in the __eh_frame section (and has not for
years)."
Iain Sandoe confirms it is also unnecessary for their old darwin support.
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall). This is actually a
good thing, since the details of the parameter save area are ABI specific.
With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.
In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers. Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.
Ulrich Weigand [Mon, 23 Jun 2014 13:47:52 +0000 (13:47 +0000)]
[PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).
This is actually not necessary. There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area. In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.
While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.
Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().
This means that determineFrameLayout simply does not need to care about
the parameter save area. (It still needs to ensure that every frame
provides the linkage area.) This is implemented by this patch.
Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.
A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.
Ulrich Weigand [Mon, 23 Jun 2014 13:21:43 +0000 (13:21 +0000)]
[PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".
(Some of those were made explicit by the above refactoring patch, others
have been there all along.)
This patch fixes those places to pass "false" for IsDarwin.
Ulrich Weigand [Mon, 23 Jun 2014 13:08:27 +0000 (13:08 +0000)]
[PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.
However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported. Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.
(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4. This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ... I'll fix this in a follow-up
commit shortly.)
To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.
A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.
Ulrich Weigand [Mon, 23 Jun 2014 12:36:34 +0000 (12:36 +0000)]
[PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing. This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.
The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
(and the associated GPR, if any), if necessary.
This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)
Tim Northover [Mon, 23 Jun 2014 09:20:02 +0000 (09:20 +0000)]
ARM: mark UBFX as not allowing PC.
Strictly, it's unpredictable. But we don't quite model that yet and an error is
better than ignoring the issue. This one somehow got left out before though.
Correct the section flags for code built for Windows on ARM with
`-ffunction-sections`. Windows on ARM uses solely Thumb-2 instructions, and
indicates that the function is thumb by placing it in a text section that has
IMAGE_SCN_MEM_16BIT flag set.
When we encounter a .section directive, a new section is constructed. This may
be a text segment. In order to identify that we need the additional flag,
expose the target triple through the ObjectFileInfo as this information is lost
otherwise.
Since any modern ARM targeting environment on Windows would be Thumb-2 (Windows
ARM NT or Windows Embedded Compact), introducing a new flag to indicate the
section attribute seems to be a bit overkill. Simply depend on the target
triple. Since there is one location that this information is currently needed,
creating a target specific assembly parser and delegating the parsing of section
switches also feels a bit heavy handed. If it turns out that this information
ends up changing additional behaviour, then it may be worth considering that
alternative.
Jan Vesely [Sun, 22 Jun 2014 21:43:00 +0000 (21:43 +0000)]
R600: Implement custom SDIVREM.
Instead of separate SDIV/SREM. SDIV used UDIV which in turn used UDIVREM anyway.
SREM used SDIV(UDIV->UDIVREM)+MUL+SUB, using UDIVREM directly is more efficient.
v2: Don't use all caps names
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211477 91177308-0d34-0410-b5e6-96231b3b80d8