Colin LeMahieu [Fri, 24 Oct 2014 19:00:32 +0000 (19:00 +0000)]
[Hexagon] Resubmission of 220427
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.
Rafael Espindola [Fri, 24 Oct 2014 18:13:04 +0000 (18:13 +0000)]
Don't ever call materializeAllPermanently during LTO.
To do this, change the representation of lazy loaded functions.
The previous representation cannot differentiate between a function whose body
has been removed and one whose body hasn't been read from the .bc file. That
means that in order to drop a function, the entire body had to be read.
David Blaikie [Fri, 24 Oct 2014 17:57:34 +0000 (17:57 +0000)]
DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFile
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)
Sanjay Patel [Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)]
Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Daniel Sanders [Fri, 24 Oct 2014 16:15:27 +0000 (16:15 +0000)]
[mips] Replace MipsABIEnum with a MipsABIInfo class.
Summary:
No functional change yet, it's just an object replacement for an enum.
It will allow us to gather ABI information in a single place so that we can
start testing for properties of the ABI's instead of the ABI itself.
For example we will eventually be able to use:
ABI.MinStackAlignmentInBytes()
instead of:
(isABI_N32() || isABI_N64()) ? 16 : 8
which is clearer and more maintainable.
Aaron Ballman [Fri, 24 Oct 2014 15:16:39 +0000 (15:16 +0000)]
These functions are not actually defined for NDEBUG or !LLVM_DUMP_ENABLED, so guarding the declarations as well. NFC, silences MSVC warnings in release builds.
Daniel Sanders [Fri, 24 Oct 2014 13:09:19 +0000 (13:09 +0000)]
[mips] For N32/N64, structs must be passed in the upper bits of a register.
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.
Oliver Stannard [Fri, 24 Oct 2014 09:54:41 +0000 (09:54 +0000)]
[AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.
Adam Nemet [Fri, 24 Oct 2014 00:03:00 +0000 (00:03 +0000)]
[AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.
For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.
For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant. The addition operand (op3) in both cases can come from
memory. Again the ony difference is which operand is destructed.
There could be a post-RA pass that would convert a 213 or 132 into a 231.
Adam Nemet [Fri, 24 Oct 2014 00:02:55 +0000 (00:02 +0000)]
[AVX512] Introduce fma3p_forms from AVX
This multiclass generates the different forms: 213, 231, 132 in AVX.
132 in AVX512 is a separate class but I am planning to use this same
multiclass to generate 231 relying on the nice the null_frag trick from AVX to
disable codegen pattern for 231.
No functionality change, no change in X86.td.expanded except for the different
instruction definition names.
Tim Northover [Thu, 23 Oct 2014 22:31:48 +0000 (22:31 +0000)]
ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.
The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.
David Blaikie [Thu, 23 Oct 2014 22:27:50 +0000 (22:27 +0000)]
DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.
The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.
(though, let's see what the buildbots have to say about this - *crosses
fingers*)
There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.
Ahmed Bougacha [Thu, 23 Oct 2014 21:55:31 +0000 (21:55 +0000)]
[X86] Improve mul w/ overflow codegen, to MUL8+SETO.
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.
This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.
i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.
Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.
A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):
// FIXME: Used for 8-bit mul, ignore result upper 8 bits.
// This probably ought to be moved to a def : Pat<> if the
// syntax can be accepted.
[(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]
Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.
Sanjay Patel [Thu, 23 Oct 2014 21:52:45 +0000 (21:52 +0000)]
Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.
No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.
I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.
David Blaikie [Thu, 23 Oct 2014 19:12:43 +0000 (19:12 +0000)]
DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.
It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which tehy are written
(eg: namespace n { class foo { static int i; } int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).
Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.
Chris Bieneman [Thu, 23 Oct 2014 17:22:14 +0000 (17:22 +0000)]
Adding llvm-shlib to CMake build system with a few new bells and whistles
Summary:
This patch adds a new CMake build setting LLVM_BUILD_LLVM_DYLIB, which defaults to OFF. When set to ON, this will generate a shared library containing most of LLVM. The contents of the shared library can be overriden by specifying LLVM_DYLIB_COMPONENTS. LLVM_DYLIB_COMPONENTS can be set to a semi-colon delimited list of any LLVM components that you llvm-config can resolve.
On Windows, unless you are using Cygwin, you must specify an explicit symbol export file using LLVM_EXPORTED_SYMBOL_FILE. On Cygwin and all unix-like platforms if you do not specify LLVM_EXPORTED_SYMBOL_FILE, an export file containing only the LLVM C API will be auto-generated from the list of LLVM components specified in LLVM_DYLIB_COMPONENTS.
Renato Golin [Thu, 23 Oct 2014 15:31:50 +0000 (15:31 +0000)]
Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:
Oliver Stannard [Thu, 23 Oct 2014 08:52:58 +0000 (08:52 +0000)]
[Thumb2] Improve disassembly of memory hints
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.
Frederic Riss [Thu, 23 Oct 2014 04:08:42 +0000 (04:08 +0000)]
Assert that ValueHandleBase::ValueIsRAUWd doesn't change the tracked Value type.
This invariant is enforced in Value::replaceAllUsesWith, thus it seems
logical to apply it also to ValueHandles. This commit fixes InstCombine
to not trigger the assertion during the removal of constant bitcasts in
call instructions.
Frederic Riss [Thu, 23 Oct 2014 04:08:38 +0000 (04:08 +0000)]
Modernize doxygen comments in Support/Dwarf.h
In post-commit review of r219442, Rafael pointed out that the comment style
of the newly introduced helper didn't follow LLVM's coding standard.
Modernize the whole file to the new standards.
David Blaikie [Thu, 23 Oct 2014 00:16:05 +0000 (00:16 +0000)]
[DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile.
Variable handling will be sunk into DwarfFile so that abstract variables
and the like can be shared across multiple CUs (to handle cross-CU
inlining, for example).
David Blaikie [Thu, 23 Oct 2014 00:06:27 +0000 (00:06 +0000)]
[DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of LexicalScopes::getCurrentFunctionScope
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.
Lang Hames [Wed, 22 Oct 2014 23:18:42 +0000 (23:18 +0000)]
[MCJIT] Make repeat calls to MCJIT::getPointerToFunction for declarations safe.
MCJIT::getPointerForFunction adds the resulting address to the global mapping.
This should be done via updateGlobalMapping rather than addGlobalMapping, since
the latter asserts if a mapping already exists.
MCJIT::getPointerToFunction is actually deprecated - hopefully we can remove it
(or more likely re-task it) entirely soon. In the mean time it should at least
work as advertised.
Derek Schuff [Wed, 22 Oct 2014 22:38:06 +0000 (22:38 +0000)]
[MC] Attach labels to existing fragments instead of using a separate fragment
Summary:
Currently when emitting a label, a new data fragment is created for it if the
current fragment isn't a data fragment.
This change instead enqueues the label and attaches it to the next fragment
(e.g. created for the next instruction) if possible.
When bundle alignment is not enabled, this has no functionality change (it
just results in fewer extra fragments being created). For bundle alignment,
previously labels would point to the beginning of the bundle padding instead
of the beginning of the emitted instruction. This was not only less efficient
(e.g. jumping to the nops instead of past them) but also led to miscalculation
of the address of the GOT (since MC uses a label difference rather than
emitting a "." symbol).
This would cause the flag to appear in the output of "llvm-config --cppflags",
which should contain only preprocessor flags. The -gsplit-dwarf flag in
particular can cause problems with certain downstream users such as cgo.
Justin Bogner [Wed, 22 Oct 2014 18:18:54 +0000 (18:18 +0000)]
test: Make this test runnable in directories with @ in their names
Jenkins likes to use directories with names involving the '@'
character, which breaks the sed expression in this test. Switch to use
'|' on the assumption that it's less likely to show up in a path.
Bill Schmidt [Wed, 22 Oct 2014 16:58:20 +0000 (16:58 +0000)]
[PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases. This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled. The changes are analogous to those in
the previous patch. I've added a new variant to vsx.ll to test the
code generation.
(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)
Colin LeMahieu [Wed, 22 Oct 2014 16:49:14 +0000 (16:49 +0000)]
[Hexagon] Adding basic disassembler.
Marking all instructions as CodeGenOnly since encoding bits are not set yet.
http://reviews.llvm.org/D5829?vs=on&id=15023&whitespace=ignore-all#toc
Philip Reames [Wed, 22 Oct 2014 16:37:13 +0000 (16:37 +0000)]
Preserving 'nonnull' metadata in SimplifyCFG
When we hoist two loads above an if, we can preserve the nonnull metadata. We could also do the same for sinking them, but we appear to not handle metadata at all in that case.
Sanjay Patel [Wed, 22 Oct 2014 15:29:23 +0000 (15:29 +0000)]
Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics
(via function attribute for now because there is no IR-level FMF on calls),
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.
We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.
I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.
Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.
This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850
Diego Novillo [Wed, 22 Oct 2014 13:36:35 +0000 (13:36 +0000)]
Change error to warning when a profile cannot be found.
When the profile for a function cannot be applied, we use to emit an
error. This seems extreme. The compiler can continue, it's just that the
optimization opportunities won't include profile information.
Bill Schmidt [Wed, 22 Oct 2014 13:13:40 +0000 (13:13 +0000)]
[PowerPC] Support select-cc for VSX
The tests test/CodeGen/Generic/select-cc.ll and
test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The
problem is that the lowering logic for the SELECT and SELECT_CC
operations doesn't currently support the VSX registers. This patch
fixes that.
In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
for other register classes. Similar pseudos are added in
PPCInstrVSX.td (they must be there, because the "vsrc" register class
definition appears there) for the VSRC register class. The
SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.
The rest of the patch just adds logic for SELECT_VSRC wherever similar
logic appears for SELECT_VRRC.
There are no new test cases because the existing tests above test
this, along with a variant in test/CodeGen/PowerPC/vsx.ll.
After discussion with Hal, a future patch will add similar _VSFRC
variants to override f64 type handling (currently using F8RC).
Aaron Ballman [Wed, 22 Oct 2014 13:09:43 +0000 (13:09 +0000)]
Fixing a -Wsign-compare warning; NFC.
I think it might make sense to make COFF::MaxNumberOfSections16 be a uint32_t, however, that may have wider-reaching implications in other projects, which is why I did not change that declaration.
Diego Novillo [Wed, 22 Oct 2014 12:59:00 +0000 (12:59 +0000)]
Support using sample profiles with partial debug info.
Summary:
When using a profile, we used to require the use -gmlt so that we could
get access to the line locations. This is used to match line numbers in
the input profile to the line numbers in the function's IR.
But this is actually not necessary. The driver can provide source
location tracking without the emission of debug information. In these
cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the
actual line location annotations are still present.
This patch adds a new way of looking for the start of the current
function. Instead of looking through the compile units in llvm.dbg.cu,
we can walk up the scope for the first instruction in the function with
a debug loc. If that describes the function, we use it. Otherwise, we
keep looking until we find one.
If no such instruction is found, we then give up and produce an error.