Benjamin Kramer [Fri, 4 Mar 2011 19:49:30 +0000 (19:49 +0000)]
raw_ostream: while it is generally desirable to do larger writes, it can lead to
inefficient file system buffering if the writes are not a multiple of the desired
buffer size. Avoid this by limiting the large write to a multiple of the buffer
size and copying the remainder into the buffer.
Initially, slot indexes are quad-spaced. There is room for inserting up to 3
new instructions between the original instructions.
When we run out of indexes between two instructions, renumber locally using
double-spaced indexes. The original quad-spacing means that we catch up quickly,
and we only have to renumber a handful of instructions to get a monotonic
sequence. This is much faster than renumbering the whole function as we did
before.
Duncan Sands [Fri, 4 Mar 2011 14:28:59 +0000 (14:28 +0000)]
Revert commit 126684 "Use the correct shift amount type". It is only the correct
type after type legalization has completed. Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).
Kalle Raiskila [Fri, 4 Mar 2011 13:19:18 +0000 (13:19 +0000)]
Allow vector shifts (shl,lshr,ashr) on SPU.
There was a previous implementation with patterns that would
have matched e.g.
shl <v4i32> <i32>,
but this is not valid LLVM IR so they never were selected.
Nick Lewycky [Fri, 4 Mar 2011 10:06:52 +0000 (10:06 +0000)]
Fold "icmp pred (srem X, Y), Y" like we do for urem. Handle signed comparisons
in the urem case, though not the other way around. This is enough to get #3 from
PR9343!
Nick Lewycky [Fri, 4 Mar 2011 07:00:57 +0000 (07:00 +0000)]
Teach instruction simplify to use constant ranges to solve problems of the form
"icmp pred %X, CI" and a number of examples where "%X = binop %Y, CI2".
Some of these cases (div and rem) used to make it through opt -O2, but the
others are probably now making code elsewhere redundant (probably instcombine).
Andrew Trick [Fri, 4 Mar 2011 02:03:45 +0000 (02:03 +0000)]
Minor pre-RA-sched fixes and cleanup.
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
Bill Wendling [Thu, 3 Mar 2011 23:14:05 +0000 (23:14 +0000)]
There are times when the landing pad won't have a call to 'eh.selector' in
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.
Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.
Devang Patel [Thu, 3 Mar 2011 20:02:02 +0000 (20:02 +0000)]
llvm::Function argument count is not a good indicator of how many arugments does the function have at source level. If we need more space, just resize vector conservatively. This vector is only used once per function.
Richard Osborne [Thu, 3 Mar 2011 13:17:51 +0000 (13:17 +0000)]
Optimize printf -> iprintf if there are no floating point arguments
and iprintf is available on the target. Currently iprintf is only
marked as being available on the XCore.
Eli Friedman [Thu, 3 Mar 2011 07:24:36 +0000 (07:24 +0000)]
PR9352: Always emit a relocation for weak symbols. Not emitting relocations
for calls to weak symbols with a definition has the appearance of working
with LLVM-generated code because weak symbol definitions are put in their
own sections.
Renumber slot indexes uniformly instead of spacing according to the number of defs.
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
Represent sentinel slot indexes with a null pointer.
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
Avoid comparing invalid slot indexes, and assert that it doesn't happen.
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.
Bob Wilson [Wed, 2 Mar 2011 23:38:06 +0000 (23:38 +0000)]
Avoid exponential blow-up when printing DAGs.
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG. Unfortunately
this blows up exponentially in some cases. The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes. Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.
Transfer simply defined values directly without recomputing liveness and SSA.
Values that map to a single new value in a new interval after splitting don't
need new PHIDefs, and if the parent value was never rematerialized the live
range will be the same.
Add a special streamer to libLTO that just records symbols definitions and
uses.
The result produced by the streamer is used to give the linker more accurate
information and to add to llvm.compiler.used. The second improvement removes
the need for the user to add __attribute__((used)) to functions only used in
inline asm. The first one lets us build firefox with LTO on Darwin :-)
Che-Liang Chiou [Wed, 2 Mar 2011 03:20:28 +0000 (03:20 +0000)]
Extend initial support for primitive types in PTX backend
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
Move the value map from LiveIntervalMap to SplitEditor.
The value map is currently not used, all values are 'complex mapped' and
LiveIntervalMap::mapValue is used to dig them out.
This is the first step in a series changes leading to the removal of
LiveIntervalMap. Its data structures can be shared among all the live intervals
created by a split, so it is wasteful to create a copy for each.
Devang Patel [Tue, 1 Mar 2011 22:58:13 +0000 (22:58 +0000)]
Today, the language front ends produces llvm.dbg.* intrinsics, used to encode arguments' debug info, in order any way, most of the times. However, if a front end mix-n-matches llvm.dbg.declare and llvm.dbg.value intrinsics to encode debug info for arguments then code generator needs a way to find argument order.
Use 8 bits from line number field to keep track of argument ordering while encoding debug info for an argument. That leaves 24 bit for line no, DebugLoc also allocates 24 bit for line numbers. If a function has more than 255 arguments then rest of the arguments will be ordered by llvm.dbg.* intrinsics' ordering in IR.
Oscar Fuentes [Tue, 1 Mar 2011 22:31:19 +0000 (22:31 +0000)]
Cmake fix for option defaults not being set correctly on first run
On the first cmake run before the caches has been updated with the
default options, options defined after HandleLLVMOptions are always
treated as off inside HandleLLVMOptions.