This means it can only take one parameter from the set of RUNTIME, LIBRARY, or ARCHIVE. If you set more than one of these it seems to gobble up the extra arguments and ignore the COMPONENT argument.
This adds a check to only set LIBRARY or ARCHIVE based on whether or not the library being built is shared.
DebugInfo: Remove DIDescriptor from the DIBuilder API
As a step toward killing `DIDescriptor` and its subclasses, remove it
from the `DIBuilder` API. Replace the subclasses with appropriate
pointers from the new debug info hierarchy. There are a couple of
possible surprises in type choices for out-of-tree frontends:
- Subroutine types: `MDSubroutineType`, not `MDCompositeTypeBase`.
- Composite types: `MDCompositeType`, not `MDCompositeTypeBase`.
- Scopes: `MDScope`, not `MDNode`.
- Generic debug info nodes: `DebugNode`, not `MDNode`.
Hans Wennborg [Thu, 16 Apr 2015 14:49:23 +0000 (14:49 +0000)]
Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tre, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.
By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.
This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.
This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.
For example, this will enable us to balance the tree based on profile info
in the future.
The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst-case. This
doesn't seem to be major problem in practice, e.g. compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
does turn out to be a problem, we could limit the search space of the algorithm.
This commit also disables all optimizations during switch lowering in -O0.
Simon Pilgrim [Thu, 16 Apr 2015 08:21:09 +0000 (08:21 +0000)]
TRUNCATE constant folding - minor fix for rL233224
Fix for test case found by James Molloy - TRUNCATE of constant build vectors can be more simply achieved by simply replacing with a new build vector node with the truncated value type - no need to touch the scalar operands at all.
Delete `DIRef<>`, and replace the remaining uses of it with
`TypedDebugNodeRef<>`. To minimize code churn, I've added typedefs from
`MDTypeRef` to `DITypeRef` (etc.).
PR23080 is almost finished. With this commit, there's no consequential
API in `DIDescriptor` and its subclasses. What's left?
- Default-constructed to `nullptr`.
- Handy `const_cast<>` (constructed from `const`, but accessors are
non-`const`).
I think the safe way to catch those is to delete the classes and fix
compile errors. That'll be my next step, after I delete the `DITypeRef`
(etc.) wrapper around `MDTypeRef`.
This allows us to get rid of the original unrelocated object file after
we're done processing relocations (but before applying them).
MachO and COFF already do not require this (currently we have temporary hacks
to prevent ownership from being released, but those are brittle and should be
removed soon).
The placeholder mechanism allowed the relocation resolver to look at original
object file to obtain more information that are required to apply the
relocations. This is usually necessary in two cases:
- For relocations targetting sub-word memory locations, there may be pieces
of the instruction at the target address which we should not override.
- Some relocations on some platforms allow an extra addend to be encoded in
their immediate fields.
The problem is that in the second case the information cannot be recovered
after the relocations have been applied once because they will have been
overridden. In the first case we also need to be careful to not use any bits
that aren't fixed and may have been overriden by applying a first relocation.
In the past both have been fixed by just looking at original object file. This
patch attempts to recover the information from the first by looking at the
relocated object file, while the extra addend in the second case is read
upon relocation processing and addend to the regular addend.
I have tested this on X86. Other platforms represent my best understanding
of how those relocations should work, but I may have missed something because
I do not have access to those platforms.
We will keep the ugly workarounds in place for a couple of days, so this commit
can be reverted if it breaks the bots.
DebugInfo: Remove unnecessary API from DIDerivedType and DIType
Remove the accessors of `DIDerivedType` that downcast to
`MDDerivedType`, shifting the `cast<MDDerivedType>` into the callers.
Also remove `DIType::isValid()`, which is really just a check against
`nullptr` at this point.
DebugInfo: Remove 'inlinedAt:' field from MDLocalVariable
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
Verifier: Check that @llvm.dbg.* intrinsics have a !dbg attachment
Before we start to rely on valid `!dbg` attachments, add a check to the
verifier that `@llvm.dbg.*` intrinsics always have one. Also check that
the `scope:` fields point at the same `MDSubprogram`.
This is in the context of PR22778. The check that the `inlinedAt:`
fields agree has baked for a while (since r234021), so I'll kill [1] the
`MDLocalVariable::getInlinedAt()` field soon.
Unfortunately, that means it's impossible to keep the current `Verifier`
checks, which rely on comparing `inlinedAt:` fields. We'll be able to
keep the checks I'm adding here.
If this breaks your out-of-tree testcases, the upgrade script
(add-dbg-to-intrinsics.sh) attached to PR22778 that I used for r235040
might fix them for you.
DebugInfo: Require a DebugLoc in DIBuilder::insertDeclare()
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic. Assert at creation time that the `scope:` field's
subprogram matches the variable's. There's a matching `clang` commit to
use the API.
The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic. The best way to ensure we always
have a `!dbg` attachment is to require one at creation time. I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.
Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`. The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.
DebugInfo: Add missing !dbg attachments to intrinsics
Add missing `!dbg` attachments to `@llvm.dbg.*` intrinsics. I updated
these using a script (add-dbg-to-intrinsics.sh) that I'll attach to
PR22778 for posterity.
[WinEH] Try to make the MachineFunction CFG more accurate
This avoids emitting code for unreachable landingpad blocks that contain
calls to llvm.eh.actions and indirectbr.
It's also a first step towards unifying the SEH and WinEH lowering
codepaths. I'm keeping the old fan-in lowering of SEH around until the
preparation version works well enough that we can switch over without
breaking existing users.
Charlie Turner [Wed, 15 Apr 2015 17:28:23 +0000 (17:28 +0000)]
Fix BXJ is undefined in AArch32.
BXJ was incorrectly said to be unsupported in ARMv8-A. It is not
supported in the A64 instruction set, but it is supported in the T32
and A32 instruction sets, because it's listed as an instruction in the
ARM ARM section F7.1.28.
Using SP as an operand to BXJ changed from UNPREDICTABLE to
PREDICTABLE in v8-A. This patch reflects that update as well.
A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.
[X86] add an exedepfix entry for movq == movlps == movlpd
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.
It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier.
So that's the worst case for this transform (no latency win),
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.
[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]
[mips] [IAS] Refactor the function which checks for the availability of AT. NFC.
Summary:
Refactor MipsAsmParser::getATReg to return an internal register number instead of a register index.
Also change all the int's to unsigned, seeing as the current AT register index is stored as an unsigned in MipsAssemblerOptions.
So, in this case, the result of a bitwise and is compared against 0,
but nevertheless, we should not assume to have probability
information.
CodeGen/ARM/2013-10-11-select-stalls.ll started failing because the
changed probabilities changed the results of
ARMBaseInstrInfo::isProfitableToIfCvt() and led to an Ifcvt of the
diamond in the test. AFAICT, the test was never meant to test this and
thus changing the test input slightly to not change the probabilities
seems like the best way to preserve the meaning of the test.
Remove all the global bits to do with preserving use-list order by
moving the `cl::opt`s to the individual tools that want them. There's a
minor functionality change to `libLTO`, in that you can't send in
`-preserve-bc-uselistorder=false`, but making that bit settable (if it's
worth doing) should be through explicit LTO API.
As a drive-by fix, I removed some includes of `UseListOrder.h` that were
made unnecessary by recent commits.
uselistorder: Pull the assembly bit up out of the printer
Pull the `-preserve-ll-uselistorder` bit up through all the callers of
`Module::print()`. I converted callers of `operator<<` to
`Module::print()` where necessary to pull the bit through.
uselistorder: Start pulling out -preserve-ll-uselistorder
For consistency, start pulling out `-preserve-ll-uselistorder`. I'll
drop the global state for both eventually. This pulls it up to
`Module::print()` (but not past there).
uselistorder: Pull the bit through WriteToBitcodFile()
Change the callers of `WriteToBitcodeFile()` to pass `true` or
`shouldPreserveBitcodeUseListOrder()` explicitly. I left the callers
that want to send `false` alone.
I'll keep pushing the bit higher until hopefully I can delete the global
`cl::opt` entirely.
Canonicalize access to whether to preserve use-list order in bitcode on
a `bool` stored in `ValueEnumerator`. Next step, expose this as a
`bool` through `WriteBitcodeToFile()`.
Rafael Espindola [Tue, 14 Apr 2015 22:54:16 +0000 (22:54 +0000)]
Use the ability to pwrite to simplify the ELF writer.
Now we don't have to do 2 synchronized passes to compute offsets and then
write the file.
This also includes a fix for the corner case of seeking in /dev/null. It
is not an error, but on some systems (Linux) the returned offset is
always 0. An error is signaled by returning -1. This is checked by
the existing tests now that "clang -o /dev/null ..." seeks.
[Inliner] Don't inline functions with frameescape calls
Inlining such intrinsics is very difficult, since you need to
simultaneously transform many calls to llvm.framerecover and potentially
duplicate the functions containing them. Normally this intrinsic isn't
added until EH preparation, which is part of the backend pass pipeline
after inlining. However, if it were to get fed through the inliner,
this change will ensure that it doesn't break the code.
Daniel Berlin [Tue, 14 Apr 2015 19:09:16 +0000 (19:09 +0000)]
Make updateDFSNumbers API public
Summary:
There are a number of passes that could be sped up by using dominator tree DFS numbers to order or compare things across multiple bbs
(MemorySSA, MergedLoadStoreMotion, EarlyCSE, Sinking, GVN, NewGVN, for starters :P).
For example, GVN/CSE elimination can be done with a simple stack/etc (instead of full-on scoped hash table or repeated leader set walks)
if the DFS pair is stored next to leaders.
The dominator tree keeps them, and the DOM tree nodes expose them as public, but you have no guarantee they are up to date (and in fact,
if you split blocks or whatever during your pass, they definitely won't be)
This means passes either have to compute their own versions[1], or make 32 queries, or ....
Rather than try to hide this, i just made the API public, and make it do nothing if the numbers are already valid.
[1] Which we want as a non-recursive walk, which is not pretty, sadly,
because it cannot use the depth first iterators since you don't get called on the way back up. So you either have to do one walk with po_iterator
and one with df_iterator, or write your own non-recursive walk that looks identical to the one in updateDFSNumbers.
verify-uselistorder: More outs() and errs(), less dbgs()
Change all the normally relevant output in `verify-uselistorder` from
using `dbgs()` to using `outs()` and `errs()`. Now you don't need
`-debug=uselistorder` to figure out what's going on (or at what stage
verification failed, or to get the paths of the left-behind temporary
files). This is a debugging tool, so I put the logging messages on
`outs()` and the error messages on `errs()`.
I also adjusted the output to be less ***loud***. Not sure why I was so
`*`-happy when I first wrote this.
David Blaikie [Tue, 14 Apr 2015 17:17:04 +0000 (17:17 +0000)]
Update test case to include the original source code & account for some changes in clang's order of emission
I'd added some stuff to this test case without adding the original
source, which makes updating/adding further stuff rather difficult. So
update it first (& it seems in the interim Clang's changed its output
order a bit, so adjust the CHECK lines to account for that - rather than
hand hacking the IR order which just makes it harder to maintain/change
next time)
Certain versions of CMake specify /W3 as part of CMAKE_CXX_FLAGS
by default, before you do anything. Appending /W4 to the end of
this and using the Ninja generator results in
cl : Command line warning D9025 : overriding '/W3' with '/W4'.
It is not possible to suppress this since it is a command line
warning and not a compiler warning, so we must fix the command
line to contain only one value for /Wn.
DebugInfo: Add implicit conversion from DISubprogram to DIScope
As a follow-up to r234850, add an implicit conversion from
`DISubprogram` to `DIScope` to support Kaleidoscope Ch. 8. This also
reverts that band-aid from r234890.
(/me learns *again* to build Kaleidoscope before commit...)