Vedant Kumar [Thu, 1 Dec 2016 19:38:50 +0000 (19:38 +0000)]
[tablegen] Delete duplicates from a vector without skipping elements
Tablegen's -gen-instr-info pass has a bug in its emitEnums() routine.
The function intends for values in a vector to be deduplicated, but it
accidentally skips over elements after performing a deletion.
I think there are smarter ways of doing this deduplication, but we can
do that in a follow-up commit if there's interest. See the thread:
[PATCH] TableGen InstrMapping Bug fix.
Matthias Braun [Thu, 1 Dec 2016 19:32:15 +0000 (19:32 +0000)]
Move most EH from MachineModuleInfo to MachineFunction
Recommitting r288293 with some extra fixes for GlobalISel code.
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Mehdi Amini [Thu, 1 Dec 2016 19:08:38 +0000 (19:08 +0000)]
Fix module map to create a module for the configured header Config/abi-breaking.h
A client of a header that relies on ABI breaking should get the macro
exported there.
Before this, the unittest for Support/Error including Support/Error.h
didn't get the macro exported by the Support module, because the
latter only re-export its submodules and included module, not
textual headers.
Hopefully, it'll also fix the build with local submodule visibility,
since the LLVM_Utils contains two submodules: ADT and Support. They
both include abi-breaking.h that defines a symbol. The textual
inclusion lead to a double definition of the symbol which broke
the parent module.
Greg Clayton [Thu, 1 Dec 2016 18:56:29 +0000 (18:56 +0000)]
This change removes the dependency on DwarfDebug that was used for DW_FORM_ref_addr by making a new DIEUnit class in DIE.cpp.
The DIEUnit class represents a compile or type unit and it owns the unit DIE as an instance variable. This allows anyone with a DIE, to get the unit DIE, and then get back to its DIEUnit without adding any new ivars to the DIE class. Why was this needed? The DIE class has an Offset that is always the CU relative DIE offset, not the "offset in debug info section" as was commented in the header file (the comment has been corrected). This is great for performance because most DIE references are compile unit relative and this means most code that accessed the DIE's offset didn't need to make it into a compile unit relative offset because it already was. When we needed to emit a DW_FORM_ref_addr though, we needed to find the absolute offset of the DIE by finding the DIE's compile/type unit. This class did have the absolute debug info/type offset and could be added to the CU relative offset to compute the absolute offset. With this change we can easily get back to a DIE's DIEUnit which will have this needed offset. Prior to this is required having a DwarfDebug and required calling:
DwarfCompileUnit *DwarfDebug::lookupUnit(const DIE *CU) const;
Now we can use the DIEUnit class to do so without needing DwarfDebug. All clients now use DIEUnit objects (the DwarfDebug stack and the DwarfLinker). A follow on patch for the DWARF generator will also take advantage of this.
Alexey Bataev [Thu, 1 Dec 2016 18:42:42 +0000 (18:42 +0000)]
[SLP] Fixed cost model for horizontal reduction.
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.
Kuba Mracek [Thu, 1 Dec 2016 17:45:22 +0000 (17:45 +0000)]
Recommit r287403 (reverted in r287804): [lit] When setting SDKROOT on Darwin, use '--sdk macosx' to find the right SDK path.
This shouls now be safe and not break any more bots. It's strictly better to use '--sdk macosx', otherwise xcrun can return weird things for example when you have Command Line Tools or the SDK installed into '/'.
Adam Nemet [Thu, 1 Dec 2016 17:34:50 +0000 (17:34 +0000)]
[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination
[recommitting after the fix in r288307]
This includes the intervening store and the load/store that we're trying
to forward from in the optimization remark for the missed load
elimination.
This is hooked up under a new mode in ORE that allows for compile-time
budget for a bit more analysis to print more insightful messages. This
mode is currently enabled for -fsave-optimization-record (-Rpass is
trickier since it is controlled in the front-end).
With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
Adam Nemet [Thu, 1 Dec 2016 17:34:44 +0000 (17:34 +0000)]
[GVN, OptDiag] Include the value that is forwarded in load elimination
[recommitting after the fix in r288307]
This requires some changes to the opt-diag API. Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.
Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output. (Note,
how in the test the content of the YAML file changes but the remarks on
the compiler output don't.)
This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"
Ulrich Weigand [Thu, 1 Dec 2016 17:10:27 +0000 (17:10 +0000)]
[SystemZ] Fix applyFixup for 12-bit fixups
Now that we have fixups that only fill parts of a byte, it turns
out we have to mask off the bits outside the fixup area when
applying them. Failing to do so caused invalid object code to
be emitted for bprp with a negative 12-bit displacement.
Adam Nemet [Thu, 1 Dec 2016 16:40:32 +0000 (16:40 +0000)]
[GVN] Basic optimization remark support
[recommitting after the fix in r288307]
Follow-on patches will add more interesting cases.
The goal of this patch-set is to get the GVN messages printed in
opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This
is the optimization view for the function (the last remark in the
function has a bug which is fixed in this series):
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430
Object: Extract a ModuleSymbolTable class from IRObjectFile.
This class represents a symbol table built from in-memory IR. It provides
access to GlobalValues and should only be used if such access is required
(e.g. in the LTO implementation). We will eventually change IRObjectFile
to read from a bitcode symbol table rather than using ModuleSymbolTable,
so it would not be able to expose the module.
Bitcode: Correctly handle Fixed and VBR arrays in BitstreamCursor::skipRecord().
The assertions were wrong; we need to call getEncodingData() on the element,
not the array. While here, simplify the skipRecord() implementation for Fixed
and Char6 arrays. This is tested by the code I added to llvm-bcanalyzer
which makes sure that we can skip any record.
Adam Nemet [Thu, 1 Dec 2016 03:56:43 +0000 (03:56 +0000)]
[GVN] When merging blocks update LoopInfo if it's available
If LoopInfo is available during GVN, BasicAA will use it. However
MergeBlockIntoPredecessor does not update LI as it merges blocks.
This didn't use to cause problems because LI was freed before
GVN/BasicAA. Now with OptimizationRemarkEmitter, the lifetime of LI is
extended so LI needs to be kept up-to-date during GVN.
Ivan Krasin [Thu, 1 Dec 2016 02:54:54 +0000 (02:54 +0000)]
Use trigrams to speed up SpecialCaseList.
Summary:
it's often the case when the rules in the SpecialCaseList
are of the form hel.o*bar. That gives us a chance to build
trigram index to quickly discard 99% of inputs without
running a full regex. A similar idea was used in Google Code Search
as described in the blog post:
https://swtch.com/~rsc/regexp/regexp4.html
The check is defeated, if there's at least one regex
more complicated than that. In this case, all inputs
will go through the regex. That said, the real-world
rules are often simple or can be simplied. That considerably
speeds up compiling Chromium with CFI and UBSan.
As measured on Chromium's content_message_generator.cc:
before, CFI: 44 s
after, CFI: 23 s
after, CFI, no blacklist: 23 s (~1% slower, but 3 runs were unable to show the difference)
after, regular compilation to bitcode: 23 s
Support a new assembler directive, .import_global, to declare imported
global variables (i.e. those with external linkage and no
initializer). The linker turns these into wasm imports.
Matthias Braun [Wed, 30 Nov 2016 23:49:01 +0000 (23:49 +0000)]
Move most EH from MachineModuleInfo to MachineFunction
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Matthias Braun [Wed, 30 Nov 2016 23:48:26 +0000 (23:48 +0000)]
MCStreamer: Use "cfi" for CFI related temp labels.
Choosing a "cfi" name makes the intend a bit clearer in an assembly dump
and more importantly the assembly dumps are slightly more stable as the
numbers don't move around anymore when unrelated code calls
createTempSymbol() more or less often.
As they are temp labels the name doesn't influence the generated object
code.
Maintain the command line resolutions as a map to a list of resolutions
rather than a single resolution, and apply the resolutions in the order
observed. This is not only simpler but allows us to test the scenario where
the two symbols have different resolutions.
Paul Robinson [Wed, 30 Nov 2016 22:49:55 +0000 (22:49 +0000)]
Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions.
The LLDB tests are now ready for this patch.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table. Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).
David Callahan [Wed, 30 Nov 2016 22:32:58 +0000 (22:32 +0000)]
Only computeRelativePath() on new members
Summary:
When using thin archives, and processing the same archive multiple times, we were mangling existing entries. The root cause is that we were calling computeRelativePath() more than once. Here, we only call it when adding new members to an archive.
Note that D27218 changes the way thin archives are printed, and will break the new unit test included here. Depending on which one lands first, the other will need to be slightly modified.
Joel Jones [Wed, 30 Nov 2016 22:25:24 +0000 (22:25 +0000)]
[AArch64] Refactor LSE support as feature separate from V8.1a support.
Summary:
This is preparation for ThunderX processors that have Large
System Extension (LSE) atomic instructions, but not the
other instructions introduced by V8.1a.
This will mimic changes to GCC as described here:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00388.html
Zachary Turner [Wed, 30 Nov 2016 21:44:26 +0000 (21:44 +0000)]
[LibFuzzer] Add Windows implementations of some IO functions.
This patch moves some posix specific file i/o code into a new
file, FuzzerIOPosix.cpp, and provides implementations for these
functions on Windows in FuzzerIOWindows.cpp. This is another
incremental step towards getting libfuzzer working on Windows,
although it still should not be expected to be fully working.
Patch by Marcos Pividori
Differential Revision: https://reviews.llvm.org/D27233
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
Mehdi Amini [Wed, 30 Nov 2016 19:12:53 +0000 (19:12 +0000)]
[git-llvm] Use --force-interactive when commiting to enable SVN to prompt password
When svn does not know the password and it has to prompt, it needs to query.
However it won't when invoked from the Python script and instead fails with:
svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
Zachary Turner [Wed, 30 Nov 2016 19:06:14 +0000 (19:06 +0000)]
[LibFuzzer] Split up some functions among different headers.
In an effort to get libfuzzer working on Windows, we need to make
a distinction between what functions require platform specific
code (e.g. different code on Windows vs Linux) and what code
doesn't. IO functions, for example, tend to be platform
specific.
This patch separates out some of the functions which will need
to have platform specific implementations into different headers,
so that we can then provide different implementations for each
platform.
Aside from that, this patch contains no functional change. It
is purely a re-organization.
Patch by Marcos Pividori
Differential Revision: https://reviews.llvm.org/D27230
Silviu Baranga [Wed, 30 Nov 2016 17:04:22 +0000 (17:04 +0000)]
[AArch64] Fix useful bits detection for BFM instructions
Summary:
When computing useful bits for a BFM instruction, we need
to take into consideration the case where both operands
of the BFM are equal and provide data that we need to track.
Simon Pilgrim [Wed, 30 Nov 2016 16:33:46 +0000 (16:33 +0000)]
[X86][SSE] Add support for target shuffle constant folding
Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future.
I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction.
Pavel Labath [Wed, 30 Nov 2016 15:34:29 +0000 (15:34 +0000)]
[Support] Use HAVE_DLOPEN to guard dlopen(3) usage
Summary:
The usage was previously guarded by HAVE_DLFCN. This breaks on Android with
LLVM_BUILD_STATIC as the platform does not provide a static version of libdl.
Using HAVE_DLOPEN fixes it as the code will only get used if we are actually able
to link an executable using dlopen.
Nemanja Ivanovic [Tue, 29 Nov 2016 23:36:03 +0000 (23:36 +0000)]
[PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D25980
This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.
Jacques Pienaar [Tue, 29 Nov 2016 23:01:09 +0000 (23:01 +0000)]
[lanai] Manually match 0/-1 with R0/R1.
Summary: Previously 0 and -1 was matched via tablegen rules. But this could cause problems where a physical register was being used where a virtual register was expected (seen in optimizeSelect and TwoAddressInstructionPass). Instead follow AArch64 and match in DAGToDAGISel.
Paul Robinson [Tue, 29 Nov 2016 22:41:16 +0000 (22:41 +0000)]
Emit 'no line' information for interesting 'orphan' instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table. Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).