Solaris doesn't have an endian.h header, but SPARC is the only
big-endian architecture that runs Solaris, so just use that to detect
endianness at compile time.
Filip Pizlo [Tue, 21 May 2013 20:24:07 +0000 (20:24 +0000)]
Put RTDyldMemoryManager into its own file, and make it linked into
libExecutionEngine. Move method implementations that aren't specific to
allocation out of SectionMemoryManager and into RTDyldMemoryManager.
This is in preparation for exposing RTDyldMemoryManager through the C
API.
This is a fixed version of r182407 and r182411. That first revision
broke builds because I forgot to move the conditional includes of
various POSIX headers from SectionMemoryManager into
RTDyldMemoryManager. Those includes are necessary because of how
getPointerToNamedFunction works around the glibc libc_nonshared.a thing.
The latter revision still broke things because I forgot to include
llvm/Config/config.h.
Filip Pizlo [Tue, 21 May 2013 20:07:12 +0000 (20:07 +0000)]
Put RTDyldMemoryManager into its own file, and make it linked into
libExecutionEngine. Move method implementations that aren't specific to
allocation out of SectionMemoryManager and into RTDyldMemoryManager.
This is in preparation for exposing RTDyldMemoryManager through the C
API.
This is a fixed version of r182407. That revision broke builds because I
forgot to move the conditional includes of various POSIX headers from
SectionMemoryManager into RTDyldMemoryManager. Those includes are
necessary because of how getPointerToNamedFunction works around the
glibc libc_nonshared.a thing.
Filip Pizlo [Tue, 21 May 2013 20:00:56 +0000 (20:00 +0000)]
Expose the RTDyldMemoryManager through the C API. This allows clients of
the C API to provide their own way of allocating JIT memory (both code
and data) and finalizing memory permissions (page protections, cache
flush).
Filip Pizlo [Tue, 21 May 2013 19:56:00 +0000 (19:56 +0000)]
Put RTDyldMemoryManager into its own file, and make it linked into
libExecutionEngine. Move method implementations that aren't specific to
allocation out of SectionMemoryManager and into RTDyldMemoryManager.
This is in preparation for exposing RTDyldMemoryManager through the C
API.
Hal Finkel [Tue, 21 May 2013 14:21:09 +0000 (14:21 +0000)]
Fix PPC branch selection for counter-based branches
Although I had added some support for the BDZ/BDNZ branches into the selector
(in r158204), I had not correctly adjusted the condition at the top of the
loop. As a result, these branches were still essentially unsupported.
This fixes PR16086. Unfortunately, any test case would be very large (because
it would need to force the loop backedge to exceed the range of the 16-bit
immediate).
Ulrich Weigand [Tue, 21 May 2013 10:30:59 +0000 (10:30 +0000)]
Alternative fix for problem addressed in r182233
Revision r182233 partially reverted the change in r181200 to simplify
JIT unif test #ifdefs, because that change caused a link error on some
host operating systems where the export list requires the following
symbols to be defined:
As discussed on the list, the commit reverts r182233 (and re-installs
the full r181200 change), and instead fixes the link problem by moving
those two symbols to the top of the file and unconditionally defining
them.
Manman Ren [Tue, 21 May 2013 00:57:22 +0000 (00:57 +0000)]
Dwarf: use a single line table to generate assembly when .loc is used.
This is to fix PR15408 where an undefined symbol Lline_table_start1 is used.
Since we do not generate the debug_line section when .loc is used,
Lline_table_start1 is not emitted and we can't refer to it when calculating
at_stmt_list for a compile unit.
Reed Kotler [Tue, 21 May 2013 00:50:30 +0000 (00:50 +0000)]
Add some additional functions to the list of helper functions for
pic calls. These need to be there so we don't try and use helper
functions when we call those.
As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.
Richard Smith [Mon, 20 May 2013 23:55:41 +0000 (23:55 +0000)]
Comment update: these things are called "configuration names" these days, not
"triples". Also remove the implication that they're only used for specifying a
target.
David Blaikie [Mon, 20 May 2013 22:50:35 +0000 (22:50 +0000)]
PR14606: Debug Info for namespace aliases/DW_TAG_imported_module
This resolves the last of the PR14606 failures in the GDB 7.5 test
suite by implementing an optional name field for
DW_TAG_imported_modules/DIImportedEntities and using that to implement
C++ namespace aliases (eg: "namespace X = Y;").
Hal Finkel [Mon, 20 May 2013 16:47:07 +0000 (16:47 +0000)]
Expose InsertPreheaderForLoop from LoopSimplify to other passes
Other passes, PPC counter-loop formation for example, also need to add loop
preheaders outside of the regular loop simplification pass. This makes
InsertPreheaderForLoop a global function so that it can be used by other
passes.
Hal Finkel [Mon, 20 May 2013 16:08:37 +0000 (16:08 +0000)]
Rename PPC MTCTRse to MTCTRloop
As the pairing of this instruction form with the bdnz/bdz branches is now
enforced by the verification pass, make it clear from the name that these
are used only for counter-based loops.
Hal Finkel [Mon, 20 May 2013 16:08:17 +0000 (16:08 +0000)]
Add a PPCCTRLoops verification pass
When asserts are enabled, this adds a verification pass for PPC counter-loop
formation. Unfortunately, without sacrificing code quality, there is no better
way of forming counter-based loops except at the (late) IR level. This means
that we need to recognize, at the IR level, anything which might turn into a
function call (or indirect branch). Because this is currently a finite set of
things, and because SelectionDAG lowering is basic-block local, this can be
done. Nevertheless, it is fragile, and failure results in a miscompile. This
verification pass checks that all (reachable) counter-based branches are
dominated by a loop mtctr instruction, and that no instructions in between
clobber the counter register. If these conditions are not satisfied, then an
ICE will be triggered.
In short, this is to help us sleep better at night.
Mihai Popa [Mon, 20 May 2013 14:57:05 +0000 (14:57 +0000)]
VSTn instructions have a number of encoding constraints which are not implemented. I have added these using wrapper methods around the original custom decoder (incidentally - this is a huge poorly written method that should be cleaned up. I have left it as is since the changes would be much to hard to review).
Mihai Popa [Mon, 20 May 2013 14:42:43 +0000 (14:42 +0000)]
Q registers are encoded in fields of the same length as D registers. As Q registers are half as many, the ARM reference manual mandates the least significant bit to be zeroed out. Failure to do so should result in an undefined instruction. With this change test/MC/Disassembler/ARM/invalid-VQADD-arm.txt is passing (removed XFAIL).
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
[NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:
FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.
We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.
This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.
Bob Wilson [Mon, 20 May 2013 06:13:09 +0000 (06:13 +0000)]
Partially revert change in r181200 that tried to simplify JIT unit test #ifdefs.
The export list for this test requires the following symbols to be available:
JITTest_AvailableExternallyFunction
JITTest_AvailableExternallyGlobal
The change in r181200 commented them out, which caused the test to fail to
link, at least on Darwin. I have only reverted the change for arm, since I
can't test the other targets and since it sounds like that change was fixing
real problems for those other targets. It should be possible to rearrange the
code to keep those definitions outside the #ifdefs, but that should be done by
someone who can reproduce the problems that r181200 was trying to fix.
Bob Wilson [Sun, 19 May 2013 20:33:51 +0000 (20:33 +0000)]
Remove declaration of __clear_cache for __APPLE__. <rdar://problem/13924072>
This fixes a bootstrapping problem with builds for Apple ARM targets.
Clang had the wrong prototype for __clear_cache with ARM targets. Rafael
fixed that in clang svn r181784 and r181810, but without those changes,
we can't build this code for ARM because clang reports an error about the
declaration in Memory.inc not matching the builtin declaration. Some of our
buildbots need to use an older compiler that doesn't have the clang fix.
Since __clear_cache is never used here when __APPLE__ is defined, I'm just
conditionalizing the declaration to match that. I also moved the declaration
of sys_icache_invalidate inside the conditional for __APPLE__ while I was at
it.
Tim Northover [Sun, 19 May 2013 15:39:03 +0000 (15:39 +0000)]
AArch64: make RuntimeDyld relocations idempotent
AArch64 ELF uses .rela relocations so there's no need to actually make
use of the bits we're setting in the destination However, we should
make sure all bits are cleared properly since multiple runs of
resolveRelocations are possible and these could combine to produce
invalid results if stale versions remain in the code.
Tim Northover [Sun, 19 May 2013 15:28:16 +0000 (15:28 +0000)]
Invalidate instruction cache when setting memory to be executable.
lli's remote MCJIT code calls setExecutable just prior to running
code. In line with Darwin behaviour this seems to be the place to
invalidate any caches needed so that relocations can take effect
properly.
Tim Northover [Sun, 19 May 2013 09:55:06 +0000 (09:55 +0000)]
Print uint64_t -debug text correctly on 32-bit hosts
On 32-bit hosts %p can print garbage when given a uint64_t, we should
use %llx instead. This only affects the output of the debugging text
produced by lli.
Hal Finkel [Sat, 18 May 2013 09:20:39 +0000 (09:20 +0000)]
Check InlineAsm clobbers in PPCCTRLoops
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.
David Majnemer [Sat, 18 May 2013 01:02:03 +0000 (01:02 +0000)]
X86: Bad peephole interaction between adc, MOV32r0
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
JF Bastien [Fri, 17 May 2013 23:49:01 +0000 (23:49 +0000)]
Support unaligned load/store on more ARM targets
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.