[PowerPC] fix register alignment for long double type
This patch fixes register alignment for long double type in
soft float mode. Before this patch alignment was 8 and this
patch changes it to 4.
Differential Revision: http://reviews.llvm.org/D18034
Chris Dewhurst [Mon, 9 May 2016 11:55:15 +0000 (11:55 +0000)]
[Sparc][LEON] Add UMAC and SMAC instruction support for Sparc LEON subtargets
This change adds SMAC (signed multiply-accumulate) and UMAC (unsigned multiply-accumulate) for LEON subtargets of the Sparc processor.
The new files LeonFeatures.td and leon-instructions.ll will both be expanded in future, so I want to leave them separate as small files for this review, to be expanded in future check-ins.
Note: The functions are provided only for inline-assembly provision. No DAG selection is provided.
Silviu Baranga [Mon, 9 May 2016 11:10:44 +0000 (11:10 +0000)]
[AArch64] Implement lowering of the X constraint on AArch64
Summary:
This implements the lowering of the X constraint on
AArch64.
The default behaviour of the X constraint lowering is to
restrict it to "f". This is a problem because the "f"
constraint is not implemented on AArch64 and would be too
restrictive anyway. Therefore, the AArch64 hook will
lower this to "w" (if the operand is a floating point or
vector) or "r" otherwise.
The implementation is similar with the one added for
ARM (r267411).
This is the AArch64 side of the fix for http://llvm.org/PR26493
Daniel Sanders [Mon, 9 May 2016 10:21:14 +0000 (10:21 +0000)]
[mips][ias] R_MIPS_(GOT|HI|LO|PC)16 and R_MIPS_GPREL32 do not need symbols.
Summary:
In theory, care must be taken to ensure that pairs of R_MIPS_(GOT|HI|LO)16
make the same decision on both relocs in the reloc pair but in practice
this isn't as hard as it sounds and only limits the complexity of the
predicate used. We handle all three with the same code to ensure their
decisions always agree with each other.
Frederic Riss [Mon, 9 May 2016 06:01:12 +0000 (06:01 +0000)]
[dsymutil] Fix -arch option for thumb variants.
r267249 removed the dual ARM/Thumb interface from MachOObjectFile,
simplifying llvm-dsymutil's code. This unfortunately also regressed
llvm-dsymutil's ability to select thumb slices, because the simplified
code was also dealing with the discrepency between the slice arch
(eg. armv7m) and the triple arch name (eg. thumbv7m).
Craig Topper [Mon, 9 May 2016 05:34:12 +0000 (05:34 +0000)]
[AVX512] Fix up types for arguments of int_x86_avx512_mask_cvtsd2ss_round and int_x86_avx512_mask_cvtss2sd_round. Only the argument being converted should be a different type. The other 2 argument should have the same type as the result.
Craig Topper [Sun, 8 May 2016 23:08:45 +0000 (23:08 +0000)]
[AVX512] Add missing patterns for non-temporal stores of 128/256-bit vXi8/vXi16/vXi32 when VLX is enabled. The equivalent AVX1/2 patterns are disabled by VLX.
[Bitcode] Fix an unsigned integer overflow while parsing bitcode wrapper header
Specially crafted bitcode wrapper headers can cause unsigned interger
overflow and lead to crashes when wrapping around. Fix the offset check
and avoid such scenarios.
Writing a testcase for this would involve editing the binary to generate
values that trigger the overflow, since this would never happen while
generating the bitcode in regular compilation flows, so there's
currently no feasible way add one.
Craig Topper [Sun, 8 May 2016 20:10:20 +0000 (20:10 +0000)]
[X86] Remove extra patterns that check for BUILD_VECTOR of all 0s. These are always canonicalized to v4i32/v8i32/v16i32 except for in SSE1 only when only v4f32 is supported.
David Majnemer [Sun, 8 May 2016 08:15:50 +0000 (08:15 +0000)]
[X86] Promote several single precision FP libcalls on Windows
A number of libcalls don't exist in any particular lib but are, instead,
defined in math.h as inline functions (even in C mode!). Don't rely on
their existence when lowering @llvm.{cos,sin,floor,..}.f32, promote them
instead.
N.B. We had logic to handle FREM but were missing out on a number of
others. This change generalizes the FREM handling.
Craig Topper [Sun, 8 May 2016 07:10:50 +0000 (07:10 +0000)]
[X86] Add patterns for 256-bit non-temporal stores when only AVX1 is supported. While there, add a predicate to the SSE2 patterns to avoid an ordering dependency.
Craig Topper [Sun, 8 May 2016 07:10:47 +0000 (07:10 +0000)]
[X86] No need to avoid selecting AVX_SET0 for 256-bit integer types when only AVX1 is supported. AVX_SET0 just expands to 256-bit VXORPS which is legal in AVX1.
Weiming Zhao [Sun, 8 May 2016 05:11:54 +0000 (05:11 +0000)]
[ARM] Fix Scavenger assert due to underestimated stack size
(re-apply r268810 as it exposed an uninitialized variable in ARM MFI.
Patch 268868 should fix that.)
Summary:
Currently, when checking if a stack is "BigStack" or not, it doesn't count into spills and arguments. Therefore, LLVM won't reserve spill slot for this actually "BigStack". This may cause scavenger failure.
Sanjay Patel [Sat, 7 May 2016 15:03:40 +0000 (15:03 +0000)]
[x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons
For the sake of minimalism, this patch is x86 only, but I think that at least
PPC, ARM, AArch64, and Sparc probably want to do this too.
We might want to generalize the hook and pattern recognition for a target like
PPC that has a full assortment of negated logic ops (orc, nand).
Note that http://reviews.llvm.org/D18842 will cause this transform to trigger
more often.
For reference, this relates to:
https://llvm.org/bugs/show_bug.cgi?id=27105
https://llvm.org/bugs/show_bug.cgi?id=27202
https://llvm.org/bugs/show_bug.cgi?id=27203
https://llvm.org/bugs/show_bug.cgi?id=27328
Mehdi Amini [Sat, 7 May 2016 04:10:52 +0000 (04:10 +0000)]
Refactor stripDebugInfo(Function) to handle intrinsic
This moves the code that handles stripping debug info intrinsic from
StripDebugInfo(Module) to StripDebugInfo(Function). The latter is
already walking every instructions so it makes sense to do it at the
same time.
This makes also stripDebugInfo(Function) as an API more useful: it
is really dropping every debug info in the Function.
Finally the existing code is trigerring an assertion when the Module
is not fully materialized.
Mehdi Amini [Sat, 7 May 2016 01:42:36 +0000 (01:42 +0000)]
Refactor stripDebugInfo(Function) to handle intrinsic
This moves the code that handles stripping debug info intrinsic from
StripDebugInfo(Module) to StripDebugInfo(Function). The latter is
already walking every instructions so it makes sense to do it at the
same time.
This makes also stripDebugInfo(Function) as an API more useful: it
is really dropping every debug info in the Function.
Finally the existing code is trigerring an assertion when the Module
is not fully materialized.
Ahmed Bougacha [Sat, 7 May 2016 01:11:17 +0000 (01:11 +0000)]
[X86] Teach X86FixupBWInsts to promote MOV8rr/MOV16rr to MOV32rr.
This re-applies r268760, reverted in r268794.
Fixes http://llvm.org/PR27670
The original imp-defs assertion was way overzealous: forward all
implicit operands, except imp-defs of the new super-reg def (r268787
for GR64, but also possible for GR16->GR32), or imp-uses of the new
super-reg use.
While there, mark the source use as Undef, and add an imp-use of the
old source reg: that should cover any case of dead super-regs.
At the stage the pass runs, flags are unlikely to matter anyway;
still, let's be as correct as possible.
Also add MIR tests for the various interesting cases.
Original commit message:
Codesize is less (16) or equal (8), and we avoid partial
dependencies.
Rong Xu [Fri, 6 May 2016 23:20:58 +0000 (23:20 +0000)]
[PGO] Use rsplit to parse value-data line in text profile file.
The value-data line is <PGOFuncName>:<Count_Value>. PGOFuncName might contain
':' for the internal linkage functions. We therefore need to use rsplit,
rather split, to extract the data from the line. This fixes the error when
merging a text profile file to an indexed profile file.
Adrian Prantl [Fri, 6 May 2016 22:53:06 +0000 (22:53 +0000)]
Implement a safer bitcode upgrade for DISubprogram.
The bitcode upgrade I added for DISubprogram in r266446 was based on the
assumption that the CU node for the subprogram was already materialized by the
time the DISubprogram is visited. This assumption may not hold true as future
versions of LLVM may decide to write out bitcode in a different order. This
patch corrects this by introducing a versioning bit next to the distinct flag to
unambiguously differentiate the new from the old record layouts.
Note for people stabilizing LLVM out-of-tree: This patch introduces a bitcode
incompatibility with llvm trunk revisions from r266446 — this commit. (But
D19987 will ensure that it degrades gracefully).
Matthias Braun [Fri, 6 May 2016 22:43:50 +0000 (22:43 +0000)]
DetectDeadLanes: Increase precision when detecting undef inputs
In case of COPY-like instruction we may be able to deduce that a certain
input is unused, based on the used lanes of the register defined by the
instruction.
This even works accross otherwise incompatible copies (no need to have
compatible lanemasks, completely unused operands are still completely
unused). It even makes sense to redo the analysis in this case since we
gained information for a case we previously stopped at because of the
incompatible masks.
Weiming Zhao [Fri, 6 May 2016 22:20:13 +0000 (22:20 +0000)]
[ARM] Fix Scavenger assert due to underestimated stack size
(this is resubmit of r268529 with minor refactoring. r268529 was reverted
at r268536 due a memory sanitizer failure. I have not been able to
reproduce that failure and I checked all the variable used in my change
but I could not spot an issue. I did some refactoring and see if it will
give a clearer hint)
Summary:
Currently, when checking if a stack is "BigStack" or not, it doesn't count into spills and arguments. Therefore, LLVM won't reserve spill slot for this actually "BigStack". This may cause scavenger failure.
Philip Reames [Fri, 6 May 2016 22:17:01 +0000 (22:17 +0000)]
Reapply 267210 with fix for PR27490
Original Commit Message
Extend load/store type canonicalization to handle unordered operations
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
Note that the concern about lowering is now much less likely. PR27490 proved that we already *were* mucking with the types of ordered atomics and volatiles. As a result, this change doesn't introduce as much new behavior as originally thought.
Justin Bogner [Fri, 6 May 2016 21:57:30 +0000 (21:57 +0000)]
CMake: generate check targets for lit suites without their own lit.cfgs
Currently our cmake generates targets like check-llvm-unit and
check-llvm-transforms-loopunroll-x86, but not check-llvm-transforms or
check-llvm-transforms-adce. This is because the search for test suites
only lists the ones with a custom lit.cfg or lit.local.cfg.
Instead, we can do something a little smarter - any directory under
test that isn't called Inputs or inside a directory called Inputs is a
test suite.
Philip Reames [Fri, 6 May 2016 21:43:51 +0000 (21:43 +0000)]
[GVN] PRE of unordered loads
Again, fairly simple. Only change is ensuring that we actually copy the property of the load correctly. The aliasing legality constraints were already handled by the FRE patches. There's nothing special about unorder atomics from the perspective of the PRE algorithm itself.
[X86] Add a new LOW32_ADDR_ACCESS_RBP register class.
ABIs like NaCl uses 32-bit addresses but have 64-bit frame.
The new register class reflects those constraints when choosing a
register class for a address access.
[X86] Rename the X32_ADDR_ACCESS register class into LOW32_ADDR_ACCESS.
This register class may be used by any ABIs that uses x86_64 ISA while
using 32-bit addresses, not just in X32 cases. Make sure the name
reflects that.
Kevin Enderby [Fri, 6 May 2016 20:16:28 +0000 (20:16 +0000)]
Change GenericBinaryError to no longer include a FileName, which is then not
part of the error message.
As the caller is the one that needs to add the name of where the "object file"
comes from to the error message as the object file could be in an archive, or
coming from a slice of a Mach-O universal file or a buffer created by a JIT.
In the cases of a Mach-O universal file the architecture name may or may not
also need to be printed which is up to the tool code. For example if the tool
code is only selecting the host architecture slice then that architecture name
is never printed.
This patch is the change to the libObject code and there will be follow on
commits for changes to the code for each tool.
Adrian Prantl [Fri, 6 May 2016 19:26:47 +0000 (19:26 +0000)]
Refactor the Verifier so it can diagnose IR validation errors and debug
info metadata errors separately. (NFC)
This patch refactors the Verifier so it can diagnose IR validation errors
and debug info metadata errors separately.
The motivation behind this change is that broken (or outdated) debug info
can be "recovered" from by stripping the debug info.
The problem I'm trying to solve with this sequence of patches is that
historically we've done a really bad job at verifying debug info.
We want to be able to make the verifier stricter without having to worry
about breaking bitcode compatibility with existing producers. For example,
we don't necessarily want IR produced by an older version of clang to be
rejected by an LTO link just because of malformed debug info, and rather
provide an option to strip it. Note that merely outdated (but well-formed)
debug info would continue to be auto-upgraded in this scenario.
[Hexagon] Be careful about anti-dependencies with a call in packetizer
In a case like
J2_callr <ga:@foo>, %R0<imp-use>, ...
R0<def> = ...
the anti-dependency on R0 cannot be ignored and the two instructions
cannot be packetized together, since if they were, the assignment to
R0 would take place before the call.
Philip Reames [Fri, 6 May 2016 18:46:45 +0000 (18:46 +0000)]
[GVN] Handle unordered atomics in cross block FRE
You'll note there are essentially no code changes here. Cross block FRE heavily reuses code from the block local FRE. All of the tricky parts were done as part of the previous patch and the refactoring that removed the original code duplication.
Philip Reames [Fri, 6 May 2016 18:17:13 +0000 (18:17 +0000)]
[GVN] Do local FRE for unordered atomic loads
This patch is the first in a small series teaching GVN to optimize unordered loads aggressively. This change just handles block local FRE because that's the simplest thing which lets me test MDA, and the AvailableValue pieces. Somewhat suprisingly, MDA appears fine and only a couple of small changes are needed in GVN.
Once this is in, I'll tackle non-local FRE and PRE. The former looks like a natural extension of this, the later will require a couple of minor changes.
Mehdi Amini [Fri, 6 May 2016 18:17:03 +0000 (18:17 +0000)]
Tweak the ThinLTO pass pipeline
Summary:
The original ThinLTO pipeline was derived from some
work I did tuning FullLTO on the test suite and SPEC. This
patch reduces the amount of work done in the "linker phase" of
the build, and extend the function simplifications passes
performed during the "compile phase". This helps the build time
by reducing the IR as much as possible during the compile phase
and limiting the work to be performed during the "link phase",
while keeping the performance "on par" with the existing pipeline.
Sanjay Patel [Fri, 6 May 2016 18:07:46 +0000 (18:07 +0000)]
[SimplifyCFG] propagate branch metadata when creating select (retry r268550 / r268751 with possible fix)
Retrying r268550/r268751 which were reverted at r268577/r268765 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken another guess at fixing
the problem in this version of the patch and will watch for another failure.
Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.
Artem Tamazov [Fri, 6 May 2016 17:48:48 +0000 (17:48 +0000)]
[AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.
sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, raw literal (without "sendmsg") still can be used,
and that allows for any 16-bit value.
Sanjay Patel [Fri, 6 May 2016 17:07:47 +0000 (17:07 +0000)]
[SimplifyCFG] propagate branch metadata when creating select (retry r268550 with possible fix)
Retrying r268550 which was reverted at r268577 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken a guess at fixing
the problem in this version of the patch and will watch for another failure.
Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.