Sean Silva [Fri, 17 Apr 2015 21:58:55 +0000 (21:58 +0000)]
[LangRef] Remove redundant and inconsistent condition.
Just above, 'op2' is stated to be unsigned, so 'negative' doesn't make
sense (and is handled by "larger than" anyway). The descriptions for
lshr and ashr don't say 'negative or' either.
This commit removes `DebugLocList` and replaces it with
`DebugLocStream`.
- `DebugLocEntry` no longer contains its byte/comment streams.
- The `DebugLocEntry` list for a variable/inlined-at pair is allocated
on the stack, and released right after `DebugLocEntry::finalize()`
(possible because of the refactoring in r231023). Now, only one
list is in memory at a time.
- There's a single unified stream for the `.debug_loc` section that
persists, stored in the new `DebugLocStream` data structure.
The last point is important: this collapses the nested `SmallVector<>`s
from `DebugLocList` into unified streams. Previously there was a tree of
nested vectors: an outer vector with one `DebugLocList` per variable/inlined-at
pair, each holding a vector of `DebugLocEntry`s, each of which held its own
byte and comment vectors.
A `SmallVector` can avoid allocations, but is statically fairly large
for a vector: three pointers plus the size of the small storage (the number
of elements in small mode times the element size).
Nesting these is expensive, since an inner vector's size contributes to
the element size of an outer one. (Nesting any vector is expensive...)
In the old data structure, the outer vector's *element* size was 632B,
excluding allocation costs for when the middle and inner vectors
exceeded their small sizes. 312B of this was for the "three" pointers
in the vector-tree beneath it. If you assume 1M functions with an
average of 10 variable/inlined-at pairs each (in an LTO scenario),
that's almost 6GB (besides inner allocations), with almost 3GB for the
"three" pointers.
This came up in a heap profile a little while ago of a `clang -flto -g`
bootstrap, with `DwarfDebug::collectVariableInfo()` using something like
10-15% of the total memory.
Offsets into these unified streams are used to create `ArrayRef` slices of
the adjacent `SmallVector`s. This reduces the number of vectors to four
(unrelated to the number of variable/inlined-at pairs), and caps the number
of allocations at the same number.
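A minimal sketch of the offsets-into-one-stream idea (this is not the actual
`DebugLocStream` interface; the names and types below are made up for
illustration):

  #include <cstddef>
  #include <string>
  #include <string_view>
  #include <vector>

  // One persistent byte stream for the whole .debug_loc section, plus offsets
  // recorded per entry, instead of a separate small vector per entry.
  struct LocStream {
    std::string Bytes;                      // unified byte stream
    std::vector<std::size_t> EntryOffsets;  // where each entry's bytes begin

    void startEntry() { EntryOffsets.push_back(Bytes.size()); }
    void append(std::string_view Data) { Bytes.append(Data); }

    // Slice for entry I: from its offset to the next entry's offset (or the
    // end of the stream), analogous to handing out an ArrayRef.
    std::string_view entry(std::size_t I) const {
      std::size_t Begin = EntryOffsets[I];
      std::size_t End =
          I + 1 < EntryOffsets.size() ? EntryOffsets[I + 1] : Bytes.size();
      return std::string_view(Bytes).substr(Begin, End - Begin);
    }
  };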
Besides saving memory and limiting allocations, this is NFC.
I don't know my way around this code very well yet, but I wonder if we
could go further: why stream to a side-table, instead of directly to the
output stream?
Rafael Espindola [Fri, 17 Apr 2015 21:15:17 +0000 (21:15 +0000)]
Compute A-B when A or B is weak.
Similar to r235222, but for the weak symbol case.
In an "ideal" assembler/object format an expression would always refer to the
final value and A-B would only be computed from a section in the same
comdat as A and B with A and B strong.
Unfortunately that is not the case with debug info on ELF, so we need an
heuristic. Since we need an heuristic, we may as well use the same one as
gas:
* call weak_sym: produces a relocation, even if in the same section.
* A - weak_sym and weak_sym - A: don't produce a relocation if we can
  compute it.
This fixes pr23272 and changes the fix of pr22815 to match what gas does.
David Majnemer [Fri, 17 Apr 2015 20:12:09 +0000 (20:12 +0000)]
[WinEH] Reusing HandlerType entries leads to small CatchHigh values
CatchHigh may be smaller than TryHigh if we reuse an outlined catch
handler for two different invokes with different EH states. We have no
evidence which shows that CatchHigh must be greater than TryHigh or
TryLow. We can revisit this if we turn out to be wrong.
Rafael Espindola [Fri, 17 Apr 2015 20:05:17 +0000 (20:05 +0000)]
Compute A-B if both A and B are in the same comdat section.
Part of pr23272.
A small annoyance with the assembly syntax we implement is that, given an
expression, there is no way to know whether what is desired is the value of
that expression for the symbols in this file or for the final values of those
symbols in a link.
The first case is useful in sections that get discarded or ignored
if the section they are describing is discarded.
For example, consider A-B where A and B are in the same comdat section.
We can compute the value of the difference in the section that is present in
the current .o, and if that section survives to the final DSO the value will
still be correct.
But the section is in a comdat, so another section from another object file
might be used instead. We know that that section will define A and B, but
we have no idea what the value of A-B might be.
In practice we have to assume that the intention is to compute the value
in the current section, since otherwise there is no way to create something
like the debug aranges section.
David Blaikie [Fri, 17 Apr 2015 19:56:21 +0000 (19:56 +0000)]
[opaque pointer types] Use the pointee type loaded from bitcode when constructing a LoadInst
Now (with a few carefully placed suppressions relating to general type
serialization, etc) we can round trip a simple load through bitcode and
textual IR without calling getElementType on a PointerType.
Suppressing the C4324 warnings generated by MSVC. This is the only declarative instance that would generate the warning, but it accounted for 525+ warnings due to template instantiations. This is a marginal-value warning which we may decide to disable more broadly, but since this header is in Support and may be used out of tree, it's a low burden for us to be warning-free in this case.
David Majnemer [Fri, 17 Apr 2015 17:20:30 +0000 (17:20 +0000)]
[WinEH] Allow CatchHigh to be equal to TryHigh
Catch blocks which are empty may be in the same state as their try
blocks. It is not meaningful to give the catch block its own state
number in this case because it can't do anything exceptional.
When debugging LTO issues with ld64, we use -save-temps to save the merged
optimized bitcode file, then invoke ld64 again on the single bitcode file.
The saved bitcode file is already internalized, so we can call
lto_codegen_set_should_internalize and skip running internalization again.
AsmPrinter: Stop storing MDLocalVariable in DebugLocEntry
Stop storing the `MDLocalVariable` in the `DebugLocEntry::Value`s. We
generate the list of `DebugLocEntry`s separately for each
variable/inlined-at pair, so the variable never actually changes here.
This is effectively NFC (aside from saving some memory and CPU time).
AsmPrinter: Calculate type upfront for location lists, NFC
We can calculate the variable type up front before calling
`DebugLocEntry::finalize()`. In fact, since we only care about the type
if it's an `MDBasicType`, don't even bother resolving it using the type
identifier map.
Add support for v1i128 type.
The v1i128 type is needed for the quadword add/subtract instructions introduced
in POWER8. Furthermore, the PowerPC ABI specifies that parameters of type v1i128
are to be passed in a single vector register, while parameters of type i128 are
passed in pairs of GPRs. Thus, it is necessary to be able to differentiate
between v1i128 and i128 in LLVM.
Add the i128 builtin type to LLVM.
The i128 type is needed as a builtin type in order to support the v1i128 vector
type. The PowerPC ABI requires that the i128 and v1i128 types are handled
differently when passed as parameters to functions (i128 is passed in pairs of
GPRs, v1i128 is passed in a single vector register).
Revert r235177 as the Handle is used to fail GetExitCodeProcess on purpose.
Avoid double closing of the handle by testing GetLastError for
ERROR_INVALID_HANDLE and, in that case, not calling CloseHandle(PI.ProcessHandle).
[mips] Teach the delay slot filler to remove needless KILL instructions.
Summary:
Previously, the presence of KILL instructions would block valid candidates
from filling a specific delay slot. With the elimination of the KILL
instructions in the appropriate range, we are able to fill more slots and
keep the information from future def/use analysis consistent.
Eliminate superfluous CloseHandle(PI.ProcessHandle).
This handle will always be closed a few lines later, resulting in
an error for the second CloseHandle.
Daniel Sanders [Fri, 17 Apr 2015 09:50:21 +0000 (09:50 +0000)]
[mips] Move ABI-dependent register selections to MipsABIInfo. NFC.
Summary:
For example, a common idiom was 'isN64 ? Mips::SP_64 : Mips::SP'. This has
been moved to MipsABIInfo and replaced with 'ABI.GetStackPtr()'.
There are others that should also be moved. This patch sticks to the ones that
are obviously non-functional. The others have minor mistakes that need fixing
at the same time, mostly involving checks for 64-bit GPRs instead of checks
for 64-bit pointers.
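A hypothetical sketch of the pattern being centralized (only `GetStackPtr()`
and the `isN64 ? Mips::SP_64 : Mips::SP` idiom come from the patch; the enum
and constructor below are invented for illustration):

  // Stand-ins for the real register enumerators.
  enum MipsReg { SP, SP_64 };

  class MipsABIInfo {
    bool IsN64;

  public:
    explicit MipsABIInfo(bool IsN64) : IsN64(IsN64) {}
    // The ABI-dependent register choice lives in one place instead of being
    // repeated as 'isN64 ? Mips::SP_64 : Mips::SP' at every use site.
    MipsReg GetStackPtr() const { return IsN64 ? SP_64 : SP; }
  };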
This now emits simple, unoptimized xdata tables for __C_specific_handler
based on the handlers listed in @llvm.eh.actions calls produced by
WinEHPrepare.
This adds support for running __finally blocks when exceptions are
thrown, and removes the old landingpad fan-in codepath.
I ran some manual execution tests on small basic test cases with and
without optimization, as well as on Chrome base_unittests, which uses a
small amount of SEH. I'm sure there are bugs, and we may need to
revert.
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversal order so that we couldn't look up SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.
Along with two minor improvements:
1. preserves ScalarEvolution by forgetting replaced instructions
2. removes dead code locally, avoiding the need to run DCE afterwards
Test Plan: add to slsr-add.ll a test that requires multiple iterations
Ahmed Bougacha [Thu, 16 Apr 2015 23:57:07 +0000 (23:57 +0000)]
[AArch64] Don't assert on f16 in DUP PerfectShuffle generator.
Found by code inspection, but breaking i16 at least breaks other tests.
They aren't checking this in particular though, so also add some
explicit tests for the already working types.
David Blaikie [Thu, 16 Apr 2015 23:24:18 +0000 (23:24 +0000)]
[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.
Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.
When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.
This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.
This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).
No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.
This leaves /only/ the varargs case where the explicit type is required.
Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in r230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.
About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.
import fileinput
import sys
import re
pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")
def conv(match, line):
  # Leave the line unchanged if nothing matched, if the matched callee type
  # ends in an addrspace(N)* pointer, or if it doesn't end in a function
  # pointer's '*'.
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  # Otherwise drop the trailing '*' (and any whitespace before it) from the
  # explicit callee type.
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))
DebugInfo: Fix UserValue::match() in LiveDebugVariables after r235050
r235050 dropped the inlined-at field from `MDLocalVariable`, deferring
to the `!dbg` attachments. Fix `UserValue` to take the `!dbg` into
account when differentiating between variables.
Sanjoy Das [Thu, 16 Apr 2015 20:29:50 +0000 (20:29 +0000)]
[IR] Introduce a dereferenceable_or_null(N) attribute.
Summary:
If a pointer is marked as dereferenceable_or_null(N), LLVM assumes it
is either `null` or `dereferenceable(N)` or both. This change only
introduces the attribute and adds a token test case for the `llvm-as`
/ `llvm-dis`. It does not hook up other parts of the optimizer to
actually exploit the attribute -- those changes will come later.
For pointers in address space 0, `dereferenceable(N)` is now exactly
equivalent to `dereferenceable_or_null(N)` && `nonnull`. For other
address spaces, `dereferenceable(N)` is potentially weaker than
`dereferenceable_or_null(N)` && `nonnull` (since we could have a null
`dereferenceable(N)` pointer).
The motivating case for this change is Java (and other managed
languages), where pointers are either `null` or dereferenceable up to
some usually known-at-compile-time constant offset.
Summary:
This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).
Test Plan: a new test in nary-add.ll that exercises this optimization.
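A self-contained sketch of the candidate-stack idea on a toy dominator tree
(not the actual pass code; the data structures below are invented for
illustration). Because a subtree is contiguous in a pre-order walk, a
candidate that stops dominating the current node can never dominate a later
one, so each candidate is pushed and popped at most once, giving the O(n)
bound:

  #include <cstdio>
  #include <vector>

  // Toy dominator tree; dominance is tested via DFS entry/exit intervals.
  struct DomTree {
    std::vector<std::vector<int>> Children;
    std::vector<int> In, Out;
    int Clock = 0;

    explicit DomTree(int N) : Children(N), In(N), Out(N) {}
    void number(int Node) {
      In[Node] = Clock++;
      for (int Child : Children[Node])
        number(Child);
      Out[Node] = Clock++;
    }
    // A dominates B iff B lies inside A's subtree.
    bool dominates(int A, int B) const {
      return In[A] <= In[B] && Out[B] <= Out[A];
    }
  };

  // Pre-order walk: pop stale candidates, then treat the node as a candidate.
  void walk(const DomTree &DT, int Node, std::vector<int> &Candidates) {
    while (!Candidates.empty() && !DT.dominates(Candidates.back(), Node))
      Candidates.pop_back();
    std::printf("node %d sees %zu dominating candidates\n", Node,
                Candidates.size());
    Candidates.push_back(Node); // every node doubles as a candidate here
    for (int Child : DT.Children[Node])
      walk(DT, Child, Candidates);
  }

  int main() {
    DomTree DT(5);
    DT.Children[0] = {1, 4}; // 0 dominates 1 and 4
    DT.Children[1] = {2, 3}; // 1 dominates 2 and 3
    DT.number(0);
    std::vector<int> Candidates;
    walk(DT, 0, Candidates);
    return 0;
  }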
[X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.
I've left all but the full zero case of the zero mask variants out of this patch.
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.
Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.
This means it can only take one parameter from the set of RUNTIME, LIBRARY, or ARCHIVE. If you set more than one of these it seems to gobble up the extra arguments and ignore the COMPONENT argument.
This adds a check to only set LIBRARY or ARCHIVE based on whether or not the library being built is shared.
DebugInfo: Remove DIDescriptor from the DIBuilder API
As a step toward killing `DIDescriptor` and its subclasses, remove it
from the `DIBuilder` API. Replace the subclasses with appropriate
pointers from the new debug info hierarchy. There are a couple of
possible surprises in type choices for out-of-tree frontends:
- Subroutine types: `MDSubroutineType`, not `MDCompositeTypeBase`.
- Composite types: `MDCompositeType`, not `MDCompositeTypeBase`.
- Scopes: `MDScope`, not `MDNode`.
- Generic debug info nodes: `DebugNode`, not `MDNode`.
Hans Wennborg [Thu, 16 Apr 2015 14:49:23 +0000 (14:49 +0000)]
Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tree, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.
By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.
This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.
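A much-simplified sketch of the new ordering (this is not the SelectionDAG
code; the structures below are invented for illustration): classify the sorted
cases into clusters first, then build the search tree by always pivoting on
the middle cluster, so the tree over clusters stays balanced:

  #include <cstddef>
  #include <memory>
  #include <vector>

  // A cluster of adjacent cases that has already been classified for lowering
  // (plain range, jump table, or bit test).
  struct Cluster {
    long Low, High;
  };

  struct TreeNode {
    const Cluster *C = nullptr;
    std::unique_ptr<TreeNode> Left, Right;
  };

  // Pivot on the middle cluster of the sorted list so both halves hold
  // (nearly) the same number of clusters; call with
  // (Clusters, 0, Clusters.size() - 1) for a non-empty list.
  std::unique_ptr<TreeNode> buildBalanced(const std::vector<Cluster> &Clusters,
                                          std::size_t Lo, std::size_t Hi) {
    std::size_t Mid = Lo + (Hi - Lo) / 2;
    auto N = std::make_unique<TreeNode>();
    N->C = &Clusters[Mid];
    if (Mid > Lo)
      N->Left = buildBalanced(Clusters, Lo, Mid - 1);
    if (Mid < Hi)
      N->Right = buildBalanced(Clusters, Mid + 1, Hi);
    return N;
  }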
This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.
For example, this will enable us to balance the tree based on profile info
in the future.
The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst case. This
doesn't seem to be a major problem in practice; e.g., compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
does turn out to be a problem, we could limit the search space of the algorithm.
This commit also disables all optimizations during switch lowering in -O0.
Simon Pilgrim [Thu, 16 Apr 2015 08:21:09 +0000 (08:21 +0000)]
TRUNCATE constant folding - minor fix for rL233224
Fix for a test case found by James Molloy - TRUNCATE of constant build vectors can be achieved more simply by replacing with a new build vector node with the truncated value type - no need to touch the scalar operands at all.
Delete `DIRef<>`, and replace the remaining uses of it with
`TypedDebugNodeRef<>`. To minimize code churn, I've added typedefs from
`MDTypeRef` to `DITypeRef` (etc.).
PR23080 is almost finished. With this commit, there's no consequential
API in `DIDescriptor` and its subclasses. What's left?
- Default-constructed to `nullptr`.
- Handy `const_cast<>` (constructed from `const`, but accessors are
non-`const`).
I think the safe way to catch those is to delete the classes and fix
compile errors. That'll be my next step, after I delete the `DITypeRef`
(etc.) wrapper around `MDTypeRef`.