Logan Chien [Mon, 28 Oct 2013 17:51:12 +0000 (17:51 +0000)]
[arm] Implement eabi_attribute, cpu, and fpu directives.
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
In addition, this commit changes Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of the .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already covers the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has been changed from 1 to 2,
which is more precise.
useAA significantly improves the handling of vector code that has TBAA
information attached. It also helps other cases, as shown by the testsuite
changes here. The only real downside I've seen is that it interferes with
MergeConsecutiveStores. The problem is that that optimization works top
down, starting at the first store in the chain, and looks for cases where
the chain result is only used by a single related store. These related
stores don't alias, so useAA will have rewritten all the later stores to
use a different chain input (typically the same one as the first store).
I think the advantages outweigh the disadvantages though, so for now I've
just disabled alias analysis for the unaligned-01.ll test.
[DAGCombiner] Respect volatility when checking for aliases
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses. This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.
Keep TBAA info when rewriting SelectionDAG loads and stores
Most SelectionDAG code drops the TBAA info when creating a new form of a
load or store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
Benjamin Kramer [Mon, 28 Oct 2013 07:30:06 +0000 (07:30 +0000)]
SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive.
We can't do this in the general case, because claiming that a GEP with a
negative index has no unsigned wrap isn't valid, for example:
%gep = getelementptr inbounds i32* %p, i64 -1
But an inbounds GEP cannot run past the end of the address space. So we check for
the very common case of a positive index and make GEPs derived from that NUW.
Together with Andy's recent non-unit stride work this lets us analyze loops
like
void foo3(int *a, int *b) {
for (; a < b; a++) {}
}
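To see why only positive indices qualify, it helps to view the final add of the GEP as unsigned 64-bit arithmetic. The minimal C++ sketch below is an illustration under that assumption, not SCEV code: addWrapsUnsigned and gepI32ByteOffset are hypothetical helpers, and addresses are modeled as plain integers. A byte offset of -4 (index -1 on an i32 pointer) becomes 2^64 - 4 when reinterpreted as unsigned, so the addition wraps for any base of at least 4, whereas a positive offset on an inbounds GEP cannot run past the end of the address space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset of `getelementptr inbounds i32* %p, i64 idx`
// (each i32 element is 4 bytes).
std::int64_t gepI32ByteOffset(std::int64_t idx) { return idx * 4; }

// Hypothetical helper: does base + offset wrap when both are treated as
// unsigned 64-bit integers? Addresses are modeled as plain integers here.
bool addWrapsUnsigned(std::uint64_t base, std::int64_t offset) {
  // Two's complement reinterpretation: -4 becomes 2^64 - 4.
  std::uint64_t uoff = static_cast<std::uint64_t>(offset);
  // Unsigned addition wrapped iff the result is smaller than an operand.
  return base + uoff < base;
}
```

For example, with a base of 0x1000, index -1 reports a wrap and index 3 does not, which is why the nuw flag is only attached when the index is known to be positive.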
Reed Kotler [Sun, 27 Oct 2013 21:57:36 +0000 (21:57 +0000)]
Make first substantial checkin of my port of ARM constant islands code to Mips.
Previously, I had only ported the shell of the pass. I've tried to keep everything
nearly identical to the ARM version. I think it will be very easy to eventually
merge these two and create a new more general pass that other targets can
use. I have some improvements I would like to make to allow pools to
be shared across functions and some other things. When I'm all done we
can think about making a more general pass. There is more to be ported, but the
basic mechanism now works almost as well as gcc does for mips16.
NAKAMURA Takumi [Sun, 27 Oct 2013 10:22:52 +0000 (10:22 +0000)]
MCJIT-remote: __main should be resolved in child context.
- Mark tests as XFAIL:cygming in test/ExecutionEngine/MCJIT/remote.
Rather than suppressing them, I'd like to leave them running as XFAIL.
- Revert r193472. RecordMemoryManager no longer resolves __main on cygming.
There are a couple of issues.
- X86 Codegen emits "call __main" in @main when targeting cygming.
It is useless in the JIT. FYI, the tests pass when emitting __main is disabled.
- The current remote JIT does not resolve any symbols in the child context.
FIXME: __main should be disabled, or remote JIT should resolve __main.
self.path may be empty or otherwise miss the normal system directories,
so try PATH next. Assume PATH is sane enough to cover the usual system
bash locations too; the old list is not good enough for NetBSD.
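The lookup order can be illustrated with a small sketch. lit itself is written in Python; this C++ version is only an illustration of the search order, not lit's code. findBash and its exists predicate are hypothetical: the predicate stands in for a filesystem check so the order can be tested without touching the disk. Configured directories are tried first, then each non-empty entry of a colon-separated PATH string.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a filesystem existence check (hypothetical).
using ExistsFn = std::function<bool(const std::string &)>;

// Split a PATH-style string into its non-empty directory entries.
std::vector<std::string> splitPath(const std::string &pathVar, char sep = ':') {
  std::vector<std::string> dirs;
  std::stringstream ss(pathVar);
  std::string dir;
  while (std::getline(ss, dir, sep))
    if (!dir.empty())
      dirs.push_back(dir);
  return dirs;
}

// Search the configured directories first, then fall back to PATH.
std::optional<std::string> findBash(const std::vector<std::string> &configured,
                                    const std::string &pathVar,
                                    const ExistsFn &exists) {
  std::vector<std::string> candidates = configured;
  for (const std::string &d : splitPath(pathVar))
    candidates.push_back(d);
  for (const std::string &dir : candidates) {
    std::string full = dir + "/bash";
    if (exists(full))
      return full;
  }
  return std::nullopt; // bash not found anywhere
}
```

With an empty configured list, a bash that lives only in a PATH entry (e.g. a pkgsrc install under /usr/pkg/bin) is still found.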
Alp Toker [Sat, 26 Oct 2013 09:29:58 +0000 (09:29 +0000)]
lit: Issue a note when multiprocessing fails to load
If multiprocessing was requested, detected as available, and subsequently failed
to initialize, it's worth letting the user know about it before falling back to
threads.
This condition can arise in certain OpenBSD / FreeBSD Python versions.
Wan Xiaofei [Sat, 26 Oct 2013 03:08:02 +0000 (03:08 +0000)]
Quick look-up for block in loop.
This patch implements quick look-up for a block in a loop by maintaining a hash set of blocks.
It significantly improves the efficiency of loop analysis; the biggest improvement could be 5-6% (458.sjeng).
Below are the compilation times for our benchmark in llc before & after the patch.
Reviewer : Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver : Andrew Trick <atrick@apple.com>
Test : Pass make check-all & llvm test-suite
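A minimal sketch of the idea, assuming a simplified Loop that owns a list of hypothetical BasicBlock pointers: the ordered vector is kept for iteration, and a std::unordered_set mirrors it so that contains() becomes an average O(1) hash lookup instead of a linear scan. LLVM's own containers differ; std::unordered_set is used here only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a basic block; only its identity matters here.
struct BasicBlock { int id; };

// Simplified loop that keeps its blocks both in order and in a hash set.
class Loop {
  std::vector<BasicBlock *> blocks_;                // ordered, for iteration
  std::unordered_set<const BasicBlock *> blockSet_; // for fast membership
public:
  void addBlock(BasicBlock *bb) {
    if (blockSet_.insert(bb).second) // only append blocks not already present
      blocks_.push_back(bb);
  }
  void removeBlock(BasicBlock *bb) {
    if (blockSet_.erase(bb)) // keep both containers in sync
      blocks_.erase(std::find(blocks_.begin(), blocks_.end(), bb));
  }
  // A hash lookup instead of a linear scan over blocks_.
  bool contains(const BasicBlock *bb) const { return blockSet_.count(bb) != 0; }
  std::size_t size() const { return blocks_.size(); }
};
```

The cost of this design is keeping two containers in sync on every insertion and removal, which is why both updates go through addBlock and removeBlock.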
Alp Toker [Sat, 26 Oct 2013 02:43:08 +0000 (02:43 +0000)]
Attempt to fix the FreeBSD build, disable multiprocessing
Speculative quick fix based on clang-X86_64-freebsd output:
File "/usr/local/lib/python2.6/multiprocessing/synchronize.py", line 33, in <module>
" function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.
Andrew Trick [Fri, 25 Oct 2013 21:35:56 +0000 (21:35 +0000)]
Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop.
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop, it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would have to
guarantee that the chained AddRecs are in a perfectly reduced form.
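The problem with scaling by the stride can be checked numerically. For the chained recurrence {A,+,B,+,C}, the value after n iterations is A + B*C(n,1) + C*C(n,2), while scaling by the stride gives only A + B*n. The C++ sketch below uses hypothetical helper names and assumes 64-bit signed arithmetic does not overflow; it compares a direct simulation against both forms.

```cpp
#include <cassert>
#include <cstdint>

// Simulate {A,+,B,+,C}: x starts at A and is bumped by y each iteration,
// while y starts at B and is bumped by C.
std::int64_t simulate(std::int64_t A, std::int64_t B, std::int64_t C,
                      std::int64_t n) {
  std::int64_t x = A, y = B;
  for (std::int64_t i = 0; i < n; ++i) {
    x += y;
    y += C;
  }
  return x;
}

// Naive "start + stride * n": only correct for an affine {A,+,B}.
std::int64_t scaleByStride(std::int64_t A, std::int64_t B, std::int64_t n) {
  return A + B * n;
}

// Binomial form: A*C(n,0) + B*C(n,1) + C*C(n,2).
std::int64_t binomialForm(std::int64_t A, std::int64_t B, std::int64_t C,
                          std::int64_t n) {
  return A + B * n + C * (n * (n - 1) / 2);
}
```

With A=1, B=2, C=3 and n=4, the simulation and the binomial form both yield 27, while scaling by the stride yields 9.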
Rafael Espindola [Fri, 25 Oct 2013 21:29:52 +0000 (21:29 +0000)]
Handle calls and invokes in GlobalStatus.
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.
With this change, internalize can handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in an LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.
Hal Finkel [Fri, 25 Oct 2013 20:40:15 +0000 (20:40 +0000)]
LoopVectorizer: Don't attempt to vectorize extractelement instructions
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value, which for extractelement is a scalar. As a result, vectorization would proceed,
producing illegal instructions like this:
Daniel Sanders [Fri, 25 Oct 2013 17:41:41 +0000 (17:41 +0000)]
[bugpoint] Increase the default memory limit for subprocesses to 300MB.
Summary:
Currently shared library builds (BUILD_SHARED_LIBS=ON in cmake) fail three
bugpoint tests (BugPoint/remove_arguments_test.ll,
BugPoint/crash-narrowfunctiontest.ll, and BugPoint/metadata.ll).
If I run the bugpoint commands that llvm-lit runs, but without -silence-passes,
I see errors such as this:
opt: error while loading shared libraries: libLLVMSystemZInfo.so: failed to
map segment from shared object: Cannot allocate memory
It seems that the increased size of the binaries in a shared library build is
causing the subprocess to exceed the 100MB memory limit. This patch therefore
increases the default limit to a level at which these tests pass.
Reviewers: dsanders
Reviewed By: dsanders
CC: llvm-commits, rafael
Differential Revision: http://llvm-reviews.chandlerc.com/D2013
Tim Northover [Fri, 25 Oct 2013 12:49:50 +0000 (12:49 +0000)]
ARM: allow .thumb_func to be separated from symbol definition
When assembling, a .thumb_func directive is supposed to be applicable to the
next symbol definition, even if there are intervening directives. We were
racing ahead to try and find it, and this commit should fix the issue.
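One way to model the intended behaviour is to record a pending flag when .thumb_func is seen and consume it at the next label, letting any directives in between pass through untouched. The ThumbFuncTracker class below is a hypothetical C++ sketch of that state machine, not the ARM asm parser's actual code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: .thumb_func applies to the next symbol definition,
// even if other directives appear in between.
class ThumbFuncTracker {
  bool pendingThumbFunc_ = false;
  std::set<std::string> thumbFuncs_;
public:
  void onThumbFuncDirective() { pendingThumbFunc_ = true; }
  // Directives such as .globl or .type leave the pending flag alone.
  void onOtherDirective() {}
  void onLabel(const std::string &name) {
    if (pendingThumbFunc_) {
      thumbFuncs_.insert(name); // mark this symbol as a Thumb function
      pendingThumbFunc_ = false; // the directive is consumed
    }
  }
  bool isThumbFunc(const std::string &name) const {
    return thumbFuncs_.count(name) != 0;
  }
};
```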
Tim Northover [Fri, 25 Oct 2013 09:30:24 +0000 (09:30 +0000)]
ARM: don't expand atomicrmw inline on Cortex-M0
There's a barrier instruction, so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either a nop if single-threaded, or OS support otherwise).
Tim Northover [Fri, 25 Oct 2013 09:30:20 +0000 (09:30 +0000)]
LegalizeDAG: allow libcalls for max/min atomic operations
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 always does), but in the configurations where this matters
code size tends to be paramount, so the libcall is more desirable.
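For comparison, the cmpxchg-based expansion mentioned above has roughly the following shape, written here with C++ std::atomic rather than LLVM IR: load the current value, then retry a compare-and-swap until either the stored value is already at least as large or the swap succeeds. atomicFetchMax is a hypothetical name; like atomicrmw max, it returns the previous value.

```cpp
#include <atomic>
#include <cassert>

// Atomic signed max as a compare-and-swap loop; returns the previous value.
int atomicFetchMax(std::atomic<int> &obj, int val) {
  int old = obj.load();
  // Stop once the stored value is already >= val, or once our CAS wins.
  // On failure, compare_exchange_weak reloads the current value into `old`.
  while (old < val && !obj.compare_exchange_weak(old, val)) {
  }
  return old;
}
```

The loop is what makes this form larger than a single libcall, which matches the code-size argument above.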
Nadav Rotem [Fri, 25 Oct 2013 06:41:18 +0000 (06:41 +0000)]
Optimize concat_vectors(X, undef) -> scalar_to_vector(X).
This optimization is not SSE-specific, so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.
Yuchen Wu [Fri, 25 Oct 2013 02:22:21 +0000 (02:22 +0000)]
Support for reading program counts in llvm-cov.
llvm-cov will now be able to read program counts from the GCDA file and
output them in the same format as gcov. The program summary tag was
identified from gcov-io.h as "\0\0\0\xa3".
There is currently a bug in GCOVProfiling.cpp which does not generate
the run- or program-counting IR, so this change was tested manually by
modifying the GCDA file and comparing the gcov and llvm-cov outputs.
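The tag check can be sketched as follows, assuming a little-endian host so that GCOV_TAG_PROGRAM_SUMMARY (0xa3000000 in gcov-io.h) appears on disk as the bytes 00 00 00 a3. readTag and isProgramSummaryTag are hypothetical helpers operating on an in-memory buffer, not llvm-cov's actual reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// GCOV_TAG_PROGRAM_SUMMARY as defined in gcov-io.h.
constexpr std::uint32_t kProgramSummaryTag = 0xa3000000u;

// Decode a 4-byte little-endian tag at `pos`; false if the buffer is too short.
bool readTag(const std::string &buf, std::size_t pos, std::uint32_t &tag) {
  if (pos + 4 > buf.size())
    return false;
  tag = 0;
  for (int i = 3; i >= 0; --i) // most significant byte is stored last
    tag = (tag << 8) | static_cast<unsigned char>(buf[pos + i]);
  return true;
}

bool isProgramSummaryTag(const std::string &buf, std::size_t pos) {
  std::uint32_t tag;
  return readTag(buf, pos, tag) && tag == kProgramSummaryTag;
}
```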
Reid Kleckner [Thu, 24 Oct 2013 22:26:04 +0000 (22:26 +0000)]
lto.h: Use lto_bool_t instead of int to restore the ABI
This reverts commit r193255 and instead creates an lto_bool_t typedef
that resolves to bool, _Bool, or unsigned char, depending on what is
available. Only recent versions of MSVC provide a stdbool.h header.
David Peixotto [Thu, 24 Oct 2013 16:39:36 +0000 (16:39 +0000)]
Remove class abstraction from ARM struct byval lowering
This commit changes the struct byval lowering for ARM to use inline
checks for the subtarget instead of a class abstraction to represent
the differences. The class abstraction was judged to be too much
code for this task.
Renato Golin [Thu, 24 Oct 2013 14:50:51 +0000 (14:50 +0000)]
Mark vector loops as already vectorized
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.
Tim Northover [Thu, 24 Oct 2013 12:48:05 +0000 (12:48 +0000)]
ARM: add a couple more NEON predicates.
The fused multiply instructions were added in VFPv4 but are still NEON
instructions; in particular, they shouldn't be available on a Cortex-M4 no
matter how floaty it is.
Tim Northover [Thu, 24 Oct 2013 12:22:58 +0000 (12:22 +0000)]
ARM: mark various aliases with their architecture requirements.
If an alias inherits directly from InstAlias then it doesn't get any default
"Requires" values, so llvm-mc will allow it even on architectures that don't
support the underlying instruction.
This tidies up the obvious VFP and NEON cases I found.
Tim Northover [Thu, 24 Oct 2013 10:37:09 +0000 (10:37 +0000)]
ARM: Use non-VFP softcalls on embedded Darwinish targets
The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
code to make use of VFP instructions by switching back to ARM mode; they make
no sense for M-class processors, which don't even have an ARM mode.
Given that justification, in practice this is a platform ABI decision so the
actual check is based on that rather than CPU features.
Chandler Carruth [Thu, 24 Oct 2013 09:52:56 +0000 (09:52 +0000)]
Revert part of r193291, restoring the deletion of loaded objects.
Without this, customers of the MCJIT were leaking memory like crazy.
It's not really clear what the *right* memory management is here, so I'm
not trying to add lots of tests or other logic, just trying to get us
back to a better baseline. I'll follow up on the original commit to
figure out the right path forward.
Tim Northover [Thu, 24 Oct 2013 09:37:18 +0000 (09:37 +0000)]
ARM: fix assert on unpredictable POP instruction.
POP instructions are aliased to the ARM LDM variants but have different syntax.
This caused two problems: we tried to access a non-existent operand to annotate
the '!', and the error message didn't make much sense.
With some vigorous hand-waving in the error message both problems can be
fixed.
Nuno Lopes [Thu, 24 Oct 2013 09:17:24 +0000 (09:17 +0000)]
fix PR17635: false positive with packed structures
LLVM optimizers may widen accesses to packed structures so that they overflow the structure itself; such accesses should still be considered in bounds up to the alignment of the object.
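A minimal sketch of the relaxed check, assuming the object's size and alignment are known and the alignment is a power of two: an access is accepted if it ends no later than the object size rounded up to the object's alignment. alignTo and accessInBounds are illustrative names, not the checker's real API.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to a multiple of `align` (align must be a power of two).
std::uint64_t alignTo(std::uint64_t size, std::uint64_t align) {
  return (size + align - 1) & ~(align - 1);
}

// Accept an access of `accessSize` bytes at `offset` if it ends within the
// object's size rounded up to its alignment (i.e. inside the padding).
bool accessInBounds(std::uint64_t offset, std::uint64_t accessSize,
                    std::uint64_t objSize, std::uint64_t objAlign) {
  return offset + accessSize <= alignTo(objSize, objAlign);
}
```

For instance, an 8-byte load at offset 0 of a 6-byte packed object aligned to 8 is accepted, while the same load at offset 4 is still flagged.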
Amara Emerson [Thu, 24 Oct 2013 08:28:24 +0000 (08:28 +0000)]
[AArch64] Fix NZCV reg live-in bug in F128CSEL codegen.
When generating the IfTrue basic block during the F128CSEL pseudo-instruction
handling, the NZCV live-in for the newly created BB wasn't being added. This
caused a fault during MI-sched/live range calculation when the predecessor
for the fall-through BB didn't have a live-in for phys-reg as expected.
Craig Topper [Thu, 24 Oct 2013 06:45:13 +0000 (06:45 +0000)]
Add tests for SSE intrinsics in non-avx mode by copying from the AVX test cases. Some of these may have been tested by other tests, but most weren't. Patch by Cameron McInally.
Yuchen Wu [Thu, 24 Oct 2013 01:51:04 +0000 (01:51 +0000)]
Fixed llvm-cov to count edges instead of blocks.
This was a fundamental flaw in llvm-cov where it treated the values in
the GCDA files as block counts instead of edge counts. This created
incorrect line counts when branching was present. Instead, the edge
counts should be summed to obtain the correct block count.
The fix was tested using custom test files as well as single source
files from the test-suite directory. The behaviour can be verified by
reading the GCOV documentation that describes the GCDA spec ("ARC_COUNTS
gives the counter values for those arcs that are instrumented") and the
header description provided by GCOVProfiling.cpp ("instruments the code
that runs to records (sic) the edges between blocks that run and emit a
complementary "gcda" file on exit").
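The corrected interpretation can be sketched as follows, under two simplifying assumptions: every edge is instrumented, and a block's count is the sum of the counts on its outgoing edges (so a synthetic exit block with no outgoing edges gets 0). Edge and blockCountsFromEdges are hypothetical names, not llvm-cov's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// An instrumented edge (arc) between two blocks and its execution count.
struct Edge {
  int src, dst;
  std::uint64_t count;
};

// Derive block counts by summing the counts of each block's outgoing edges.
std::vector<std::uint64_t> blockCountsFromEdges(int numBlocks,
                                                const std::vector<Edge> &edges) {
  std::vector<std::uint64_t> counts(numBlocks, 0);
  for (const Edge &e : edges)
    counts[e.src] += e.count;
  return counts;
}
```

In a diamond where block 0 branches to blocks 1 and 2 with counts 3 and 7, both rejoining at block 3, which then flows to an exit block 4, block 0 gets 10, blocks 1 and 2 get 3 and 7, and block 3 gets 10.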
Yaron Keren [Wed, 23 Oct 2013 23:37:01 +0000 (23:37 +0000)]
(this is a corrected patch)
Calling _chkstk is required on ELF as well as COFF on Windows. Without
_chkstk, functions requiring a large stack crash in initialization code.
The previous code tested for the COFF format but not Mach-O; this patch modifies
the code to test for the Windows OS (both the Windows target and the MinGW target)
but not the Mach-O object format. It looks like the Mach-O environment was used to
build some EFI code.
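The new condition can be summarized as a small predicate. The sketch below is hypothetical: OS, ObjFormat and needsChkstk are illustrative stand-ins for the target triple queries, which this log does not name.

```cpp
#include <cassert>

// Illustrative stand-ins for the target's OS and object format.
enum class OS { Linux, Windows, MinGW32, Darwin };
enum class ObjFormat { ELF, COFF, MachO };

// Probe the stack via _chkstk on any Windows OS (MSVC-style or MinGW),
// whatever the object format, except when targeting Mach-O.
bool needsChkstk(OS os, ObjFormat fmt) {
  bool isWindows = os == OS::Windows || os == OS::MinGW32;
  return isWindows && fmt != ObjFormat::MachO;
}
```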
Manman Ren [Wed, 23 Oct 2013 23:05:28 +0000 (23:05 +0000)]
Debug Info: code clean up.
Since we never insert the DIE for a DITemplateTypeParameter into a map, there is no need
to call getDIE in getOrCreateTemplateTypeParameterDIE. It is also renamed to
constructTemplateTypeParameterDIE to match the other construct functions
in CompileUnit.
The same applies to getOrCreateTemplateValueParameterDIE.