David Majnemer [Wed, 4 May 2016 00:22:23 +0000 (00:22 +0000)]
[X86] Lower zext i1 arguments
i1 is now a legal type for X86 with AVX512.
There were some paths in X86FastISel which were not quite ready to see
an i1 value: they were not quite sure how to deal with sign/zero extends
for call arguments.
DTRT by extending to i8 for zeroext and bailing out of FastISel for
signext.
Vedant Kumar [Tue, 3 May 2016 23:32:31 +0000 (23:32 +0000)]
[Support] Add a free toString function for Error
toString() consumes an Error and returns a string representation of its
contents. This commit also adds a message() method to ErrorInfoBase for
convenience.
Kevin Enderby [Tue, 3 May 2016 23:13:50 +0000 (23:13 +0000)]
Produce another specific error message for a malformed Mach-O file when a load
command has a size less than 8 bytes.
I think the existing test case in test/Object/macho-invalid.test for
macho64-invalid-too-small-load-command was trying to test for this but that
test case triggered a different error given how it was constructed. So I
constructed a new test case that would trigger this specific error.
I also changed the error message to be consistent with the other malformed Mach-O
file error messages. I also removed object_error::macho_small_load_command from
Object/Error.h as it is not needed and can just use object_error::parse_failed
and let the error message string distinguish the error.
Zachary Turner [Tue, 3 May 2016 22:18:17 +0000 (22:18 +0000)]
Move CodeViewTypeStream to DebugInfo/CodeView
Ability to parse codeview type streams is also needed by
DebugInfoPDB for parsing PDBs, so moving this into a library
gives us this option. Since DebugInfoPDB had already hand
rolled some code to do this, that code is now convereted over
to using this common abstraction.
Justin Bogner [Tue, 3 May 2016 21:35:08 +0000 (21:35 +0000)]
PM: Check that loop passes preserve a basic set of analyses
A loop pass that didn't preserve this entire set of passes wouldn't
play well with other loop passes, since these are generally a basic
requirement to do any interesting transformations to a loop.
Adds a helper to get the set of analyses a loop pass should preserve,
and checks that any loop pass we run satisfies the requirement.
Reid Kleckner [Tue, 3 May 2016 20:53:20 +0000 (20:53 +0000)]
[ADT] Add drop_front method to ArrayRef
We have it for StringRef but not ArrayRef, and ArrayRef has drop_back,
so I see no reason it shouldn't have drop_front. Splitting this out of a
change that I have that will use this funcitonality.
Jack Liu [Tue, 3 May 2016 19:30:48 +0000 (19:30 +0000)]
[SROA] Function canConvertValue needs to check whether both NewTy and OldTy pointers are
pointing to the same addr space. This can prevent SROA from creating a bitcast
between pointers with different addr spaces.
Sanjoy Das [Tue, 3 May 2016 17:50:11 +0000 (17:50 +0000)]
[LICM] Kill SCEV loop dispositions if needed
SCEV caches whether SCEV expressions are loop invariant, variant or
computable. LICM breaks this cache, almost by definition; so clear the
SCEV disposition cache if LICM changed anything.
Sanjoy Das [Tue, 3 May 2016 17:50:02 +0000 (17:50 +0000)]
[LoopDeletion] Clear SCEV loop dispositions
`Loop::makeLoopInvariant` can hoist instructions out of loops, so loop
dispositions for the loop it operated on may need to be cleared. We can
be smarter here (especially around how `forgetLoopDispositions` is
implemented), but let's be correct first.
Sanjoy Das [Tue, 3 May 2016 17:49:57 +0000 (17:49 +0000)]
[SCEV] Tweak the output format and content of -analyze
In the "LoopDispositions:" section:
- Instead of printing out a list, print out a "dictionary" to make it
obvious by inspection which disposition is for which loop. This is
just a cosmetic change.
- Print dispositions for parent _and_ sibling loops. I will use this
to write a test case.
Kevin Enderby [Tue, 3 May 2016 17:16:08 +0000 (17:16 +0000)]
Produce another specific error message for a malformed Mach-O file when a load
command other than the first one is past the end of the load commands.
This is like the test case in test/Object/macho-invalid.test for
macho64-invalid-incomplete-load-command but it is the second load command
that is past the end of all the load commands instead of the first.
The code in the constructor for MachOObjectFile that loops over the load
commands used getNextLoadCommandInfo() which was not producing
a good error message. So that was fixed and a test case was added.
Mehdi Amini [Tue, 3 May 2016 15:46:00 +0000 (15:46 +0000)]
Move "Eliminate Available Externally" immediately after the inliner
This pass is supposed to reduce the size of the IR for compile time
purpose. We should run it ASAP, except when we prepare for LTO or
ThinLTO, and we want to keep them available for link-time inline.
Anna Thomas [Tue, 3 May 2016 14:58:21 +0000 (14:58 +0000)]
Fold compares irrespective of whether allocation can be elided
Summary
When a non-escaping pointer is compared to a global value, the
comparison can be folded even if the corresponding malloc/allocation
call cannot be elided.
We need to make sure the global value is not null, since comparisons to
null cannot be folded.
In future, we should also handle cases when the the comparison
instruction dominates the pointer escape.
Daniel Sanders [Tue, 3 May 2016 13:35:44 +0000 (13:35 +0000)]
[mips] Use MipsMCExpr instead of MCSymbolRefExpr for all relocations.
Summary:
This is much closer to the way MIPS relocation expressions work
(%hi(foo + 2) rather than %hi(foo) + 2) and removes the need for the
various bodges in MipsAsmParser::evaluateRelocExpr().
Removing those bodges ensures that the constant stored in MCValue is the
full 32 or 64-bit (depending on ABI) offset from the symbol. This will be used
to correct the %hi/%lo matching needed to sort the relocation table correctly.
As part of this:
* Gave MCExpr::print() the ability to omit parenthesis when emitting a
symbol reference inside a MipsMCExpr operator like %hi(X). Without this
we print things like %lo(($L1)).
* %hi(%neg(%gprel(X))) is now three MipsMCExpr's instead of one. Most of
the related special cases have been removed or moved to MipsMCExpr. We
can remove the rest as we gain support for the less common relocations
when they are not part of this specific combination.
* Renamed MipsMCExpr::VariantKind and the enum prefix ('VK_') to avoid confusion
with MCSymbolRefExpr::VariantKind and its prefix (also 'VK_').
* fixup_Mips_GOT_Local and fixup_Mips_GOT_Global were found to be identical
and merged into fixup_Mips_GOT.
* MO_GOT16 and MO_GOT turned out to be identical and have been merged into
MO_GOT.
* VK_Mips_GOT and VK_Mips_GOT16 turned out to be the same thing so they
have been merged into MEK_GOT
Igor Breger [Tue, 3 May 2016 11:51:45 +0000 (11:51 +0000)]
[AVX512] Add support for commutative MAX/MIN . In general VMAX{PS,PD} and VMIN{PS,PD} instruction are not commutative . In combine pass only if UnsafeFPMath are used VMAX/VMAX are converted to commutative nodes VMAXC/VMAXC.
Kristof Beyls [Tue, 3 May 2016 08:33:26 +0000 (08:33 +0000)]
Mark that SpeculativeExecution preserves Globals Alias Analysis.
A few benchmarks with lots of accesses to global variables in the hot
loops regressed a lot since r266399, which added the
SpeculativeExecution pass to the default pipeline. The problem is that
this pass doesn't mark Globals Alias Analysis as preserved. Globals
Alias Analysis is computed in a module pass, whereas
SpeculativeExecution is a function pass, and a lot of passes dependent
on the Globals Alias Analysis to optimize these benchmarks are also
function passes. As such, the Globals Alias Analysis information cannot
be recomputed between SpeculativeExecution and the following function
passes needing that information.
SpeculativeExecution doesn't invalidate Globals Alias Analysis, so mark
it as such to fix those performance regressions.
Craig Topper [Tue, 3 May 2016 05:54:13 +0000 (05:54 +0000)]
[CodeGen] Add some space optimized forms of EmitNode and MorphNodeTo that implicitly indicate the number of result VTs. This shaves about 16K off the X86 matching table taking it down to about 470K.
Overall this reduces the llc binary size with all in-tree targets by about 40K.
David Majnemer [Tue, 3 May 2016 03:57:40 +0000 (03:57 +0000)]
[LoopUnroll] Unroll loops which have exit blocks to EH pads
We were overly cautious in our analysis of loops which have invokes
which unwind to EH pads. The loop unroll transform is safe because it
only clones blocks in the loop body, it does not try to split critical
edges involving EH pads. Instead, move the necessary safety check to
LoopUnswitch.
N.B. The safety check for loop unswitch is covered by an existing test
which fails without it.
Zachary Turner [Tue, 3 May 2016 00:28:21 +0000 (00:28 +0000)]
Parse the TPI (type information) stream of PDB files.
This parses the TPI stream (stream 2) from the PDB file. This stream
contains some header information followed by a series of codeview records.
There is some additional complexity here in that alongside this stream of
codeview records is a serialized hash table in order to efficiently query
the types. We parse the necessary bookkeeping information to allow us to
reconstruct the hash table, but we do not actually construct it yet as
there are still a few things that need to be understood first.
Zachary Turner [Tue, 3 May 2016 00:28:04 +0000 (00:28 +0000)]
Move llvm-readobj/StreamWriter to Support.
We wish to re-use this from llvm-pdbdump, and it provides a nice
way to print structured data in scoped format that could prove
useful for many other dumping tools as well. Moving to support
and changing name to ScopedPrinter to better reflect its purpose.
Mehdi Amini [Tue, 3 May 2016 00:27:28 +0000 (00:27 +0000)]
ThinLTO: do not import function whose linkage prevents inlining.
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
Matthias Braun [Tue, 3 May 2016 00:08:46 +0000 (00:08 +0000)]
LivePhysRegs: Automatically determine presence of pristine regs.
Remove the AddPristinesAndCSRs parameters from
addLiveIns()/addLiveOuts().
We need to respect pristine registers after prologue epilogue insertion,
Seeing that we got this wrong in at least two commits already, we should
rather pay the small price to query MachineFrameInfo for it.
There are three cases that did not set AddPristineAndCSRs to true even
after register allocation:
- ExecutionDepsFix: live-out registers are used as a hint that the
register is used soon. This is not true for pristine registers so
use the new addLiveOutsNoPristines() to maintain this behaviour.
- SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like
a bug, should do the right thing automatically now.
- StackMapLivenessAnalysis: Not adding pristine registers looks like a
bug to me. Added a FIXME comment but maintain the current behaviour
as a change may need to get coordinated with GC runtimes.
Adrian McCarthy [Mon, 2 May 2016 23:45:03 +0000 (23:45 +0000)]
NFC: An iterator for stepping through CodeView type stream in llvm-readobj
This is a small refactoring step toward moving CodeView type stream logic from llvm-readobj to a library. It abstracts the logic of stepping through the stream into an iterator class and updates llvm-readobj to use that iterator. This has no functional change; llvm-readobj produces identical output.
The next step is to abstract the parsing of the different leaf types and then move that and the iterator into a library.
Since this is my first contrib outside LLDB, please let me know if I'm messing up on any of the LLVM style guidelines, idioms, or patterns.
Reid Kleckner [Mon, 2 May 2016 23:22:18 +0000 (23:22 +0000)]
[MC] Create unique .pdata sections for every .text section
Summary:
This adds a unique ID to the COFF section uniquing map, similar to the
one we have for ELF. The unique id is not currently exposed via the
assembler because we don't have a use case for it yet. Users generally
create .pdata with the .seh_* family of directives, and the assembler
internally needs to produce .pdata and .xdata sections corresponding to
the code section.
The association between .text sections and the assembler-created .xdata
and .pdata sections is maintained as an ID field of MCSectionCOFF. The
CFI-related sections are created with the given unique ID, so if more
code is added to the same text section, we can find and reuse the CFI
sections that were already created.
[MachineBlockPlacement] Let the target optimize the branches at the end.
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminators sequence, while still
understanding a few of them.
In such situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.
[X86] Model FAULTING_LOAD_OP as a terminator and branch.
This operation may branch to the handler block and we do not want it
to happen anywhere within the basic block.
Moreover, by marking it "terminator and branch" the machine verifier
does not wrongly assume (because of AnalyzeBranch not knowing better)
the branch is analyzable. Indeed, the target was seeing only the
unconditional branch and not the faulting load op and thought it was
a simple unconditional block.
The machine verifier was complaining because of that and moreover,
other optimizations could have done wrong transformation!
In the process, simplify the representation of the handler block in
the faulting load op. Now, we directly reference the handler block
instead of using a label. This has the benefits of:
1. MC knows how to issue a label for a BB, so leave that to it.
2. Accessing the target BB from its label is painful, whereas it is
direct from a MBB operand.
Note: The 2 bytes offset in implicit-null-check.ll comes from the
fact the unconditional jumps are not removed anymore, as the whole
terminator sequence is not analyzable anymore.
Wolfgang Pieb [Mon, 2 May 2016 22:50:51 +0000 (22:50 +0000)]
DebugInfo: Avoid propagating incorrect debug locations in SelectionDAG via CSE.
Summary:
When SelectionDAG performs CSE it is possible that the context's source
location is different from that of the selected node. This can lead to
incorrect line number records. We update the debug location to the
one that occurs earlier in the instruction sequence.
Mehdi Amini [Mon, 2 May 2016 22:11:27 +0000 (22:11 +0000)]
ThinLTO: do not import function whose linkage prevents inlining.
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
Frederic Riss [Mon, 2 May 2016 21:06:14 +0000 (21:06 +0000)]
[dsymutil] Create the temporary files in the system temp directory.
llvm-dsymutil used to create the temporary files in the output directory.
This works fine except when the output directory contains a '%' char, which
is then replaced by llvm::sys::fs::createUniqueFile() generating an invalid
path.
Just use the default temp dir for those files.
Kevin Enderby [Mon, 2 May 2016 20:28:12 +0000 (20:28 +0000)]
Thread Expected<...> up from libObject’s getType() for symbols to allow llvm-objdump to produce a good error message.
Produce another specific error message for a malformed Mach-O file when a symbol’s
section index is more than the number of sections. The existing test case in test/Object/macho-invalid.test
for macho-invalid-section-index-getSectionRawName now reports the error with the message indicating
that a symbol at a specific index has a bad section index and that bad section index value.
Again converting interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. Where the existing code reported the error with a
string message or an error code it was converted to do the same.
Also there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comment:
"// TODO: Actually report errors helpfully" and a call something like
consumeError(NameOrErr.takeError()) so the buggy code will not crash
since needed to deal with the Error.
Matt Arsenault [Mon, 2 May 2016 20:07:26 +0000 (20:07 +0000)]
AMDGPU: Make i64 loads/stores promote to v2i32
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.
This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
can be folded with the base pointer arithmetic.
John Regehr [Mon, 2 May 2016 19:58:00 +0000 (19:58 +0000)]
[LVI] Add an API to LazyValueInfo so that it can export ConstantRanges
that it computes. Currently this is used for testing and precision
tuning, but it might be used by optimizations later.
Tom Stellard [Mon, 2 May 2016 19:37:56 +0000 (19:37 +0000)]
AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch
Summary:
When we restore an SGPR value from scratch, we first load it into a
temporary VGPR and then use v_readlane_b32 to copy the value from the
VGPR back into an SGPR.
We weren't setting the kill flag on the VGPR in the v_readlane_b32
instruction, so the register scavenger wasn't able to re-use this
temp value later.