David Majnemer [Mon, 14 Jul 2014 22:57:27 +0000 (22:57 +0000)]
CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
Andrea Di Biagio [Mon, 14 Jul 2014 22:46:26 +0000 (22:46 +0000)]
[DAGCombiner] Add more rules to combine shuffle vector dag nodes.
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
1. shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2)
2. shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3)
The new rules would only trigger if the resulting shuffle has legal type and
legal mask.
Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.
David Majnemer [Mon, 14 Jul 2014 21:56:54 +0000 (21:56 +0000)]
ADT: Surface LowerCase argument for utohexstr
The underlying function. utohex_buffer, already supports an argument for
deciding if the hex characters should be upper or lower case. Expose an
identical argument for utohexstr.
Support: Fix option handling when using cl::Required with aliasopt
Until now, attempting to create an alias of a required option would
complain if the user supplied the alias, because the required option
didn't have a value. Similarly, if you said the alias was required,
then using the base option would complain that the alias wasn't
supplied. Lastly, if you put required on both, *neither* option would
work.
By changning alias to overload addOccurrence and setting cl::Required
on the original option, we can get this to behave in a more useful
way. I've also added a test and updated a user that was getting this
wrong.
David Majnemer [Mon, 14 Jul 2014 20:38:45 +0000 (20:38 +0000)]
InstSimplify: Correct sdiv x / -1
Determining the bounds of x/ -1 would start off with us dividing it by
INT_MIN. Suffice to say, this would not work very well.
Instead, handle it upfront by checking for -1 and mapping it to the
range: [INT_MIN + 1, INT_MAX. This means that the result of our
division can be any value other than INT_MIN.
David Majnemer [Mon, 14 Jul 2014 19:49:57 +0000 (19:49 +0000)]
InstSimplify: The upper bound of X / C was missing a rounding step
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]
However, flooring the result would make us wrongly come to the
conclusion that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
Found during windows unwinding work. This header is indirectly included through
a chain leading through Support/Win64EH.h. Explicitly include the header. NFC.
Tim Northover [Mon, 14 Jul 2014 15:31:13 +0000 (15:31 +0000)]
X86: remove temporary atomicrmw used during lowering.
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.
Daniel Sanders [Mon, 14 Jul 2014 14:02:14 +0000 (14:02 +0000)]
[mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags
Summary:
.bss, .text, and .data are at least 16-byte aligned.
.reginfo is 4-byte aligned and has a 24-byte EntrySize.
.MIPS.abiflags has an 24-byte EntrySize.
.MIPS.options is 8-byte aligned and has 1-byte EntrySize.
Using a 1-byte EntrySize for .MIPS.options seems strange because the
records are neither 1-byte long nor fixed-length but this matches the value
that GAS emits.
Daniel Sanders [Mon, 14 Jul 2014 13:08:14 +0000 (13:08 +0000)]
[mips] For the FP64A ABI, odd-numbered double-precision moves must not use mtc1/mfc1.
Summary:
This is because the FP64A the hardware will redirect 32-bit reads/writes
from/to odd-numbered registers to the upper 32-bits of the corresponding
even register. In effect, simulating FR=0 mode when FR=0 mode is not
available.
Unfortunately, we have to make the decision to avoid mfc1/mtc1 before
register allocation so we currently do this for even registers too.
FPXX has a similar requirement on 32-bit architectures that lack
mfhc1/mthc1 so this patch also handles the affected moves from the FPU for
FPXX too. Moves to the FPU were supported by an earlier commit.
[CMake][Win32.DLL] Let llvm_add_library(SHARED) link dependent libraries as PRIVATE.
For example, c-index-test.exe requires just libclang.dll (its import library).
When libraries in libclang were not PRIVATE but PUBLIC, c-index-test required libraries transitive by libclang.
Note, on mingw with BUILD_SHARED_LIBS, library dependencies would become more strict.
In principle, required libraries should be "required in its source file".
Sasa Stankovic [Mon, 14 Jul 2014 09:40:29 +0000 (09:40 +0000)]
[mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is
enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1)
This prevents the upper 32-bits of a double precision value from being moved to
the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure
that the code generated executes correctly regardless of the current FPU mode.
MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue
to use dmtc1.
Bill Wendling [Mon, 14 Jul 2014 06:22:36 +0000 (06:22 +0000)]
Support lowering of empty aggregates.
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.
Andrea Di Biagio [Sun, 13 Jul 2014 21:02:14 +0000 (21:02 +0000)]
[DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles.
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
(shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)
The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.
Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.
This is the first of a number of changes designed to generalise
MCWin64EHInstruction to support different target architectures. An ordered set
(vector) of these instructions is saved per frame to permit the emission of
information for Windows NT style unwinding. The only bit of information which
is actually target specific here is the Opcode for the unwinding bytecode. The
remainder of the information is simply generic information that is relevant to
the Windows NT unwinding model.
Remove the accessors for the fields, making them const and public instead. Sink
the knowledge of the alias'ed name into the single source and sink a single-use
check method into the use.
MC: make DWARF and Windows unwinding handling more similar
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management. Rename the Windows ones as well and make the naming and
handling similar across the two. No functional change intended.
AArch64: add support for llvm.aarch64.hint intrinsic
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.
Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection. The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.
Due to the fact that the windows unwinding has the concept of chained frames, we
maintain a current frame info pointer that is adjusted on any push and pop of a
unwinding context. This just removes an unnecessary variable that was used to
mirror the DWARF unwinding code.
This structure contains information related to the call frame used to generate
unwinding information. Rename this to reflect the future use to represent the
shared state between various architectures for WinCFI information.
Owen Anderson [Sat, 12 Jul 2014 07:12:47 +0000 (07:12 +0000)]
Fix an issue with the MergeBasicBlockIntoOnlyPred() helper function where it did
not properly handle the case where the predecessor block was the entry block to
the function. The only in-tree client of this is JumpThreading, which worked
around the issue in its own code. This patch moves the solution into the helper
so that JumpThreading (and other clients) do not have to replicate the same fix
everywhere.
[ASan] Collect unmangled names of global variables in Clang to print them in error reports.
Currently ASan instrumentation pass creates a string with global name
for each instrumented global (to include global names in the error report). Global
name is already mangled at this point, and we may not be able to demangle it
at runtime (e.g. there is no __cxa_demangle on Android).
Instead, create a string with fully qualified global name in Clang, and pass it
to ASan instrumentation pass in llvm.asan.globals metadata. If there is no metadata
for some global, ASan will use the original algorithm.
This fixes https://code.google.com/p/address-sanitizer/issues/detail?id=264.
Implementation is small now -- the interesting logic was moved to
`BranchProbability` a while ago. Move it into `bfi_detail` and get rid
of the related TODOs.
I was originally planning to define it within `BlockFrequencyInfoImpl`
(or `BFIIBase`), but it seems cleaner in a namespace. Besides,
`isPodLike` needs to be specialized before `BlockMass` can be used in
some of the other data structures, and there isn't a clear way to do
that.
This implements the target-independent lowering for the patchpoint
intrinsic. Targets have to implement the FastLowerCall
hook to support this intrinsic.
[FastISel] Add basic infrastructure to support a target-independent call lowering hook in FastISel. WIP
The infrastructure mimics the call lowering we have already in place for
SelectionDAG, but with limitations. For example structure return demotion and
non-simple types are not supported (yet).
Currently every backend has its own implementation and duplicated code for call
lowering. There is also no specified interface that could be called from
target-independent code. The target-hook is opt-in and doesn't affect current
implementations.
[FastISel] Breakout intrinsic lowering into a separate function and add a target-hook.
Create a separate helper function for target-independent intrinsic lowering. Also
add an target-hook that allows to directly call into a target-sepcific intrinsic
lowering method. Currently the implementation is opt-in and doesn't affect
existing target implementations.
Kevin Enderby [Fri, 11 Jul 2014 20:30:00 +0000 (20:30 +0000)]
Add the "-s" flag to llvm-nm for Mach-O files that prints symbols only in
the specified section. This is same functionality as darwin’s nm(1) "-s" flag.
There is one FIXME in the code and I’m all ears to anyone that can help me
with that. This option takes exactly two strings and should be allowed
anywhere on the command line. Such that "llvm-nm -s __TEXT __text foo.o"
would work. But that does not as the CommandLine Library does not have a
way to make this work as far as I can tell. For now the "-s __TEXT __text"
has to be last on the command line.
[PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
%vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
%G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)
Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement. However, on PowerPC the routine
did not take the already present instruction offset into account.
This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.
Marek Olsak [Fri, 11 Jul 2014 17:11:46 +0000 (17:11 +0000)]
R600/SI: add sample and image intrinsics exposing all instruction fields
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.
When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.
Alp Toker [Fri, 11 Jul 2014 14:02:04 +0000 (14:02 +0000)]
raw_svector_ostream: grow and reserve atomically
Including the scratch buffer size in the initial reservation eliminates the
subsequent malloc+move operation and offers a healthier constant growth with
less memory wastage.
When doing this, take care to avoid invalidating the source buffer.