Wei Mi [Tue, 21 Apr 2015 23:02:15 +0000 (23:02 +0000)]
Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Wei Mi [Tue, 21 Apr 2015 22:37:09 +0000 (22:37 +0000)]
Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.
In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.
DebugInfo: Assert dbg.declare/value insts are valid
Remove early returns for when `getVariable()` is null, and just assert
that it never happens. The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken. I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.
[mips] Optimize code generation for 64-bit variable shift instructions.
Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.
Simon Pilgrim [Tue, 21 Apr 2015 08:05:43 +0000 (08:05 +0000)]
CONCAT_VECTOR of BUILD_VECTOR - minor fix
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Fix generic shift expansion when shift amount is 0
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.
This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
Patch by Keno Fischer!
Test Plan: regression test from http://reviews.llvm.org/D7752
This brings the utils/vim folder into a more vim-like format by moving
the syntax hightlighting files into a syntax subdirectory. It adds
some minimal settings that everyone should agree on to ftdetect/ftplugin and
features a new indentation plugin for .ll files.
X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.
Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296
[MC] When using bundle aligment, align sections to bundle size
Summary:
Bundle aligment requires that the functions always start at an aligned address.
Usually this is ensured by the compiler, but assembly code does not always
begin with a .align directive.
This change ensures that sections get the correct alignment if they contain
any instructions and bundling is enabled. (It also makes LLVM match the
behavior of GNU as).
Summary:
In the f16-promote test, make the checks for native conversion instructions
similar to the libcall checks:
- Remove hard coded register names
- Do not check exact instruction sequences.
This fixes test flakiness due to non-determinism in instruction
scheduling and register allocation. I also fixed a few minor things in
the CHECK-LIBCALL checks.
I'll try to find a way to check that unnecessary loads, stores, or
conversions don't happen.
Summary:
This patch adds two flags to `bugpoint`: "-replace-funcs-with-null" and "-disable-pass-list-reduction".
When "-replace-funcs-with-null" is specified, bugpoint will, instead of simply deleting function bodies, replace all uses of functions and then will delete functions completely from the test module, correctly handling aliasing and @llvm.used && @llvm.compiler.used. This part was conceived while trying to debug the PNaCl IR simplification passes, which don't allow undefined functions (ie no declarations).
With "-disable-pass-list-reduction", bugpoint won't try to reduce the set of passes causing the "crash". This is needed in cases where one is trying to debug an issue inside the PNaCl IR simplification passes which is causing an PNaCl ABI verification error, for example.
Delete subclasses of (the already deleted) `DIType` in favour of
directly using pointers from the `Metadata` hierarchy.
While `DICompositeType` wraps `MDCompositeTypeBase` and `DIDerivedType`
wraps `MDDerivedTypeBase`, most uses of each really meant the more
specific `MDCompositeType` and `MDDerivedType`.
DwarfUnit: Split MDSubroutineType version of constructTypeDIE()
The version of `constructTypeDIE()` for `MDSubroutineType` is unrelated
to (and has different callers than) the `MDCompositeType`. Split the
two in half.
This simplifies an upcoming patch to delete `DICompositeType`. There
shouldn't be any real functionality change here. `createTypeDIE()` is
`cast<>`'ing where it didn't need to before, but that function in turn
is only called for true `MDCompositeType`s.
- Drop duplicated comments at definition, and update the comments at
the declaration where the definition comments looked newer or more
complete.
- Drop the `functionName -` prefix.
- Add `\brief` in a few places.
- Remove a few comments entirely that weren't adding value (just
turned the function name and arguments into a sentence).
Summary:
Set operation action for FP16 conversion opcodes, so the Op legalizer
can choose the gnu_* libcalls for Mips.
Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to
prevent (fpext (load )) and (store (fptrunc)) from getting combined into
unsupported operations.
Added test cases to test that these operations are handled correctly
for f16 scalars and vectors. This patch depends on
http://reviews.llvm.org/D8755.
This is the last major parent class, so I'll probably start deleting
classes in batches now. Looks like many of the references to the DI*
hierarchy were updated organically along the way.
Replace uses of `DIScope` with `MDScope*`. There was one spot where
I've left an `MDScope*` uninitialized (where `DIScope` would have been
default-initialized to `nullptr`) -- this is intentional, since the
if/else that follows should unconditional assign it to a value.
Pete Cooper [Mon, 20 Apr 2015 18:22:05 +0000 (18:22 +0000)]
Add targets to cmake for specific target components.
This adds the following targets to cmake. These can be used to build and link only specific parts of a backend, instead of having to link the whole backend.
n/1 generates a quotient equal to n and a remainder of 0.
If this case is not recognized, then the SCEV divide() function
can return a remainder that is greater than or equal to the
denominator, which means the delinearized subscripts for the
test case will be incorrect.
Rafael Espindola [Mon, 20 Apr 2015 13:04:30 +0000 (13:04 +0000)]
Don't allow pwrite to resize a stream.
The current implementations could exhibit some behavior differences:
raw_fd_ostream: Whatever the underlying fd does with seek+write. In a normal
file, the write position would be back to the old offset.
raw_svector_ostream: The write position is always the end of the stream, so
after pwrite the write position would be the new end. This matches what OS_X
(all BSD?) do with a pwrite in a O_APPEND fd.
Given that we don't need that feature and don't use O_APPEND a lot in LLVM,
just disallow it.
I am open to suggestions on renaming pwrite to something else, but this fixes
the issue for now.
Andrea Di Biagio [Mon, 20 Apr 2015 11:56:59 +0000 (11:56 +0000)]
[X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.
Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.
So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.
Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.
[NFC] Refactor identification of reductions as common utility function.
This patch refactors reduction identification code out of LoopVectorizer and
exposes them as common utilities.
No functional change.
Review: http://reviews.llvm.org/D9046
Hal Finkel [Mon, 20 Apr 2015 00:01:30 +0000 (00:01 +0000)]
[InlineAsm] Remove EarlyClobber on registers that are also inputs
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:
Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).
Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.
Ahmed Bougacha [Sat, 18 Apr 2015 23:06:04 +0000 (23:06 +0000)]
[MemCpyOpt] Don't force i64 when promoting memset/memcpy sizes.
Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Ahmed Bougacha [Sat, 18 Apr 2015 17:57:41 +0000 (17:57 +0000)]
[MemCpyOpt] Promote both memset/memcpy sizes if differently typed.
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
Ahmed Bougacha [Sat, 18 Apr 2015 01:21:58 +0000 (01:21 +0000)]
[GlobalMerge] Look at uses to create smaller global sets.
Instead of merging everything together, look at the users of
GlobalVariables, and try to group them by function, to create
sets of globals used "together".
Using that information, a less-aggressive alternative is to keep merging
everything together *except* globals that are only ever used alone, that
is, those for which it's clearly non-profitable to merge with others.
In my testing, grouping by Function is too aggressive, but grouping by
BasicBlock is too conservative. Anything in-between isn't trivially
available, so stick with Function grouping for now.
cl::opts are added for testing; both enabled by default.
A few of the testcases aren't testing the merging proper, but just
various edge cases when merging does occur. Update them to use the
previous grouping behavior. Also, one of the tests is unrelated to
GlobalMerge; change it accordingly.
While there, switch to r234666' flags rather than the brutal -O3.
This has been bit-rotting, so fix it up. I'll have to edit this again
once the MD* classes have been renamed to DI* -- I'll try to remember to
do that with the commit that renames them.
Ahmed Bougacha [Fri, 17 Apr 2015 23:43:33 +0000 (23:43 +0000)]
[AArch64] Don't force MVT::Untyped when selecting LD1LANEpost.
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1. Don't force Untyped.
Instead, just use the type of the reg sequence.
This mirrors the behavior of createTuple, which feeds the LD1*_POST.
The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.
The only case where it does run on V64 vectors is if the vector ops
legalizer ran. So, tickle the code with a ctpop.