Rename "fpaccuracy" metadata to the more generic "fpmath". That's because I'm
thinking of generalizing it to be able to specify other freedoms beyond accuracy
(such as that NaN's don't have to be respected). I'd like the 3.1 release (the
first one with this metadata) to have the more generic name already rather than
having to auto-upgrade it in 3.2.
Hal Finkel [Sat, 14 Apr 2012 07:32:50 +0000 (07:32 +0000)]
Fix an error in BBVectorize important for vectorizing pointer types.
When vectorizing pointer types it is important to realize that potential
pairs cannot be connected via the address pointer argument of a load or store.
This is because even after vectorization, the address is still a scalar because
the address of the higher half of the pair is implicit from the address of the
lower half (it need not be, and should not be, explicitly computed).
Andrew Trick [Fri, 13 Apr 2012 23:29:54 +0000 (23:29 +0000)]
misched: Added CanHandleTerminators.
This is a special flag for targets that really want their block
terminators in the DAG. The default scheduler cannot handle this
correctly, so it becomes the specialized scheduler's responsibility to
schedule terminators.
Benjamin Kramer [Fri, 13 Apr 2012 20:06:17 +0000 (20:06 +0000)]
Reduce malloc traffic in DwarfAccelTable
- Don't copy offsets into HashData, the underlying vector won't change once the table is finalized.
- Allocate HashData and HashDataContents in a BumpPtrAllocator.
- Allocate string map entries in the same allocator.
- Random cleanups.
Kevin Enderby [Fri, 13 Apr 2012 18:46:37 +0000 (18:46 +0000)]
For ARM disassembly only print 32 unsigned bits for the address of branch
targets so if the branch target has the high bit set it does not get printed as:
beq 0xffffffff8008c404
Dan Gohman [Fri, 13 Apr 2012 18:28:58 +0000 (18:28 +0000)]
Consider ObjC runtime calls objc_storeWeak and others which make a copy of
their argument as "escape" points for objc_retainBlock optimization.
This fixes rdar://11229925.
Hal Finkel [Fri, 13 Apr 2012 17:15:33 +0000 (17:15 +0000)]
By default, use Early-CSE instead of GVN for vectorization cleanup.
As has been suggested by Duncan and others, Early-CSE and GVN should
do similar redundancy elimination, but Early-CSE is much less expensive.
Most of my autovectorization benchmarks show a performance regresion, but
all of these are < 0.1%, and so I think that it is still worth using
the less expensive pass.
Catch the Python exception when subprocess.Popen is failing.
For example, if llc cannot be found, the full python stacktrace is displayed
and no interesting information are provided.
+ fail the process when an exception occurs
Silence various build warnings from Hexagon backend that show up in release builds. Mostly converting 'assert(0)' to 'llvm_unreachable' to silence warnings about missing returns. Also fold some variable declarations into asserts to prevent the variables from being unused in release builds.
Fix target specific intrinsic handling to adjust intrinsic number before doing attribute table lookup. Also fix attribute table lookup to handle 'invalid' intrinsic correctly. Fixes PR12542
Remove getElfArchType from ELF.h. It's only used in ELFObjectFile.cpp and there's already a copy there. ELF.h was hiding the one there and causing an unused function warning.
Dan Gohman [Fri, 13 Apr 2012 01:08:28 +0000 (01:08 +0000)]
Use the new Use-aware dominates method to apply the objc runtime
library return value optimization for phi uses. Even when the
phi itself is not dominated, the specific use may be dominated.
Bill Wendling [Fri, 13 Apr 2012 01:06:27 +0000 (01:06 +0000)]
Code-gen may inject code into the IR before it emits the ASM. The linker
obviously cannot know that this code is present, let alone used. So prevent the
internalize pass from internalizing those global values which code-gen may
insert.
Dan Gohman [Fri, 13 Apr 2012 00:59:57 +0000 (00:59 +0000)]
Don't move objc_autorelease calls past autorelease pool boundaries when
optimizing autorelease calls on phi nodes with null operands.
This fixes rdar://11207070.
Dan Gohman [Thu, 12 Apr 2012 23:31:46 +0000 (23:31 +0000)]
Add forms of dominates and isReachableFromEntry that accept a Use
directly instead of a user Instruction. This allows them to test
whether a def dominates a particular operand if the user instruction
is a PHI.
There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA).
This assert needs to addressed for post RA scheduler. Until that assert is addressed,
any passes that uses post ra scheduler will fail. So, I am temporarily disabling the
hexagon tests until that fix is in.
The assert is as follows:
assert(!MI->isTerminator() && !MI->isLabel() &&
"Cannot schedule terminators or labels!");
This patch improves the MCJIT runtime dynamic loader by adding new handling
of zero-initialized sections, virtual sections and common symbols
and preventing the loading of sections which are not required for
execution such as debug information.
Jim Grosbach [Wed, 11 Apr 2012 17:40:18 +0000 (17:40 +0000)]
ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VUZP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.
Jim Grosbach [Wed, 11 Apr 2012 16:53:25 +0000 (16:53 +0000)]
ARM 'vzip.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VZIP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.
Benjamin Kramer [Wed, 11 Apr 2012 14:06:54 +0000 (14:06 +0000)]
Cache the hash value of the operands in the MDNode.
FoldingSet is implemented as a chained hash table. When there is a hash
collision during insertion, which is common as we fill the table until a
load factor of 2.0 is hit, we walk the chained elements, comparing every
operand with the new element's operands. This can be very expensive if the
MDNode has many operands.
We sacrifice a word of space in MDNode to cache the full hash value, reducing
compares on collision to a minimum. MDNode grows from 28 to 32 bytes + operands
on x86. On x86_64 the new bits fit nicely into existing padding, not growing
the struct at all.
The actual speedup depends a lot on the test case and is typically between
1% and 2% for C++ code with clang -c -O0 -g.
Add a C binding to the Target and TargetMachine classes to allow for emitting
binary and assembly. Patch by Carlo Kok. Emitting was inspired by but not based
on the D llvm bindings.
Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is ture for SSE/AVX/NEON but not
for all targets.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.
Clean up ARM fused multiply + add/sub support some more: rename some isel
predicates.
Also remove NEON2 since it's not really useful and it is confusing. If
NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it
really mean?
Fix a number of problems with ARM fused multiply add/subtract instructions.
1. The new instruction itinerary entries are not properly described.
2. The asm parser can't handle vfms and vfnms.
3. There were no assembler, disassembler test cases.
4. HasNEON2 has the wrong assembler predicate.
rdar://10139676
Tweak MachineLICM heuristics for cheap instructions.
Allow cheap instructions to be hoisted if they are register pressure
neutral or better. This happens if the instruction is the last loop use
of another virtual register.
Only expensive instructions are allowed to increase loop register
pressure.
Owen Anderson [Tue, 10 Apr 2012 22:46:53 +0000 (22:46 +0000)]
Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point.
Zap a testcase that this allows us to completely fold away.
ConstantFP::get(Type*, double) is unreliably host-specific:
it can't handle a type like PPC128 on an x86 host. It even
has a comment to that effect: "This should only be used for
simple constant values like 2.0/1.0 etc, that are
known-valid both as host double and as the target format."
Instead, use APFloat. While we're at it, randomize the floating
point value more thoroughly; it was previously limited
to the range 0 to 2**19 - 1.
[tsan] two more compile-time optimizations:
- don't isntrument reads from constant globals.
Saves ~1.5% of instrumented instructions on CPU2006
(counting static instructions, not their execution).
- don't insrument reads from vtable (which is a global constant too).
Saves ~5%.
I did not measure the run-time impact of this,
but it is certainly non-negative.
Bill Wendling [Tue, 10 Apr 2012 20:12:16 +0000 (20:12 +0000)]
The MDString class stored a StringRef to the string which was already in a
StringMap. This was redundant and unnecessarily bloated the MDString class.
Because the MDString class is a "Value" and will never have a "name", and
because the Name field in the Value class is a pointer to a StringMap entry, we
repurpose the Name field for an MDString. It stores the StringMap entry in the
Name field, and uses the normal methods to get the string (name) back.
[tsan] compile-time instrumentation: do not instrument a read if
a write to the same temp follows in the same BB.
Also add stats printing.
On Spec CPU2006 this optimization saves roughly 4% of instrumented reads
(which is 3% of all instrumented accesses):
Writes : 161216
Reads : 446458
Reads-before-write: 18295
Eric Christopher [Tue, 10 Apr 2012 18:18:10 +0000 (18:18 +0000)]
To ensure that we have more accurate line information for a block
don't elide the branch instruction if it's the only one in the block,
otherwise it's ok.