Nick Lewycky [Sun, 6 Mar 2011 03:36:19 +0000 (03:36 +0000)]
ConstantInt has some getters which return ConstantInt's or ConstantVector's of
the value splatted into every element. Extend this to getTrue and getFalse which
by providing new overloads that take Types that are either i1 or <N x i1>. Use
it in InstCombine to add vector support to some code, fixing PR8469!
In Thumb1 mode the constant might be materialized via the load from constpool. Emit unwinding information in case when this load from constpool is used to change the stack pointer in the prologue.
Implement frame unwinding information emission for Thumb1. Not finished yet because there is no way given the constpool index to examine the actual entry: the reason is clones inserted by constant island pass, which are not tracked at all! The only connection is done during asmprinting time via magic label names which is really gross and needs to be eventually fixed.
Preliminary support for ARM frame save directives emission via MI flags.
This is just very first approximation how the stuff should be done
(e.g. ARM-only for now). More to follow.
NAKAMURA Takumi [Sat, 5 Mar 2011 09:46:36 +0000 (09:46 +0000)]
On Windows hosts, Python scripts in test/Scripts did not accept binary files from stdin. The environment variable "PYTHONUNBUFFERED" makes stdin as binary. Thanks to Danil Malyshev!
Cameron Zwarich [Sat, 5 Mar 2011 08:12:26 +0000 (08:12 +0000)]
Fix PR9398 - 10% of llc compile time is spent in Value::getNumUses. This reduces
the percentage of time spent in CodeGenPrepare when llcing 403.gcc from 12.6% to
1.8% of total llc time.
Andrew Trick [Sat, 5 Mar 2011 08:00:22 +0000 (08:00 +0000)]
Increased the register pressure limit on x86_64 from 8 to 12
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
Nick Lewycky [Sat, 5 Mar 2011 05:19:11 +0000 (05:19 +0000)]
Thread comparisons over udiv/sdiv/ashr/lshr exact and lshr nuw/nsw whenever
possible. This goes into instcombine and instsimplify because instsimplify
doesn't need to check hasOneUse since it returns (almost exclusively) constants.
David Greene [Fri, 4 Mar 2011 23:02:52 +0000 (23:02 +0000)]
Fix the case where the number of jobs is less than the
number of threads. In that case make the number of threads
equal to the number of jobs and launch one jobs on each
thread. This makes things work like make -j.
Dan Gohman [Fri, 4 Mar 2011 20:46:46 +0000 (20:46 +0000)]
When decling to reuse existing expressions that involve casts, ignore
bitcasts, which are really no-ops here. This fixes slowdowns on
MultiSource/Applications/aha and others.
Benjamin Kramer [Fri, 4 Mar 2011 19:49:30 +0000 (19:49 +0000)]
raw_ostream: while it is generally desirable to do larger writes, it can lead to
inefficient file system buffering if the writes are not a multiple of the desired
buffer size. Avoid this by limiting the large write to a multiple of the buffer
size and copying the remainder into the buffer.
Initially, slot indexes are quad-spaced. There is room for inserting up to 3
new instructions between the original instructions.
When we run out of indexes between two instructions, renumber locally using
double-spaced indexes. The original quad-spacing means that we catch up quickly,
and we only have to renumber a handful of instructions to get a monotonic
sequence. This is much faster than renumbering the whole function as we did
before.
Duncan Sands [Fri, 4 Mar 2011 14:28:59 +0000 (14:28 +0000)]
Revert commit 126684 "Use the correct shift amount type". It is only the correct
type after type legalization has completed. Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).
Kalle Raiskila [Fri, 4 Mar 2011 13:19:18 +0000 (13:19 +0000)]
Allow vector shifts (shl,lshr,ashr) on SPU.
There was a previous implementation with patterns that would
have matched e.g.
shl <v4i32> <i32>,
but this is not valid LLVM IR so they never were selected.
Nick Lewycky [Fri, 4 Mar 2011 10:06:52 +0000 (10:06 +0000)]
Fold "icmp pred (srem X, Y), Y" like we do for urem. Handle signed comparisons
in the urem case, though not the other way around. This is enough to get #3 from
PR9343!
Nick Lewycky [Fri, 4 Mar 2011 07:00:57 +0000 (07:00 +0000)]
Teach instruction simplify to use constant ranges to solve problems of the form
"icmp pred %X, CI" and a number of examples where "%X = binop %Y, CI2".
Some of these cases (div and rem) used to make it through opt -O2, but the
others are probably now making code elsewhere redundant (probably instcombine).
Andrew Trick [Fri, 4 Mar 2011 02:03:45 +0000 (02:03 +0000)]
Minor pre-RA-sched fixes and cleanup.
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
Bill Wendling [Thu, 3 Mar 2011 23:14:05 +0000 (23:14 +0000)]
There are times when the landing pad won't have a call to 'eh.selector' in
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.
Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.
Devang Patel [Thu, 3 Mar 2011 20:02:02 +0000 (20:02 +0000)]
llvm::Function argument count is not a good indicator of how many arugments does the function have at source level. If we need more space, just resize vector conservatively. This vector is only used once per function.
Richard Osborne [Thu, 3 Mar 2011 13:17:51 +0000 (13:17 +0000)]
Optimize printf -> iprintf if there are no floating point arguments
and iprintf is available on the target. Currently iprintf is only
marked as being available on the XCore.
Eli Friedman [Thu, 3 Mar 2011 07:24:36 +0000 (07:24 +0000)]
PR9352: Always emit a relocation for weak symbols. Not emitting relocations
for calls to weak symbols with a definition has the appearance of working
with LLVM-generated code because weak symbol definitions are put in their
own sections.
Renumber slot indexes uniformly instead of spacing according to the number of defs.
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
Represent sentinel slot indexes with a null pointer.
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
Avoid comparing invalid slot indexes, and assert that it doesn't happen.
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.