Erik Eckstein [Fri, 11 Nov 2016 21:15:13 +0000 (21:15 +0000)]
Make the FunctionComparator of the MergeFunctions pass a stand-alone utility.
This is pure refactoring. NFC.
This change moves the FunctionComparator (together with the GlobalNumberState
utility) in to a separate file so that it can be used by other passes.
For example, the SwiftMergeFunctions pass in the Swift compiler:
https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp
Details of the change:
*) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
*) Make FunctionComparator member functions protected (instead of private)
so that a derived comparator class can use them.
Following refactoring helps to share code between the base FunctionComparator
class and a derived class:
*) Add a beginCompare() function
*) Move some basic function property comparisons into a separate function compareSignature()
*) Do the GEP comparison inside cmpOperations() which now has a new
needToCmpOperands reference parameter
Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26543
Bitcode: Clean up error handling for certain bitcode query functions.
The functions getBitcodeTargetTriple(), isBitcodeContainingObjCCategory(),
getBitcodeProducerString() and hasGlobalValueSummary() now return errors
via their return value rather than via the diagnostic handler.
To make this work, re-implement these functions using non-member functions
so that they can be used without the LLVMContext required by BitcodeReader.
Lang Hames [Fri, 11 Nov 2016 19:42:44 +0000 (19:42 +0000)]
[ORC] Refactor the ORC RPC utilities to add some new features.
(1) Add support for function key negotiation.
The previous version of the RPC required both sides to maintain the same
enumeration for functions in the API. This means that any version skew between
the client and server would result in communication failure.
With this version of the patch functions (and serializable types) are defined
with string names, and the derived function signature strings are used to
negotiate the actual function keys (which are used for efficient call
serialization). This allows clients to connect to any server that supports a
superset of the API (based on the function signatures it supports).
(2) Add a callAsync primitive.
The callAsync primitive can be used to install a return value handler that will
run as soon as the RPC function's return value is sent back from the remote.
(3) Launch policies for RPC function handlers.
The new addHandler method, which installs handlers for RPC functions, takes two
arguments: (1) the handler itself, and (2) an optional "launch policy". When the
RPC function is called, the launch policy (if present) is invoked to actually
launch the handler. This allows the handler to be spawned on a background
thread, or added to a work list. If no launch policy is used, the handler is run
on the server thread itself. This should only be used for short-running
handlers, or entirely synchronous RPC APIs.
(4) Zero cost cross type serialization.
You can now define serialization from any type to a different "wire" type. For
example, this allows you to call an RPC function that's defined to take a
std::string while passing a StringRef argument. If a serializer from StringRef
to std::string has been defined for the channel type this will be used to
serialize the argument without having to construct a std::string instance.
This allows buffer reference types to be used as arguments to RPC calls without
requiring a copy of the buffer to be made.
Evgeniy Stepanov [Fri, 11 Nov 2016 18:49:09 +0000 (18:49 +0000)]
[cfi] Implement cfi-icall using inline assembly.
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.
This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.
Adrian Prantl [Fri, 11 Nov 2016 17:50:09 +0000 (17:50 +0000)]
Revert "Use private linkage for MergedGlobals variables" on Darwin.
This is a partial revert of r244615 (http://reviews.llvm.org/D11942),
which caused a major regression in debug info quality.
Turning the artificial __MergedGlobal symbols into private symbols
(l__MergedGlobal) means that the linker will not include them in the
symbol table of the final executable. Without a symbol table entry
dsymutil is not be able to process the debug info for any of the
merged globals and thus drops the debug info for all of them.
This patch is enabling the old behavior for all MachO targets while
leaving all other targets unaffected.
Greg Clayton [Fri, 11 Nov 2016 16:55:31 +0000 (16:55 +0000)]
Fix windows buildbot where warnings are errors. We had a switch statement where all enumerations were handled, but some compilers don't recognize this. Simplify the logic so that all compilers will know a return value is returned in all cases.
Greg Clayton [Fri, 11 Nov 2016 16:21:37 +0000 (16:21 +0000)]
Clean up DWARFFormValue by reducing duplicated code and removing DWARFFormValue::getFixedFormSizes()
In preparation for a follow on patch that improves DWARF parsing speed, clean up DWARFFormValue so that we have can get the fixed byte size of a form value given a DWARFUnit or given the version, address byte size and dwarf32/64.
This patch cleans up code so that everyone is using one of the new DWARFFormValue functions:
This patch changes DWARFFormValue::skipValue() to rely on the output of DWARFFormValue::getFixedByteSize(...) instead of duplicating the code in each function. This will reduce the number of changes we need to make to DWARF to fewer places in DWARFFormValue when we add support for new form.
This patch also starts to support DWARF64 so that we can get correct byte sizes for forms that vary according the DWARF 32/64.
To reduce the code duplication a new FormSizeHelper pure virtual class was created that can be created as a FormSizeHelperDWARFUnit when you have a DWARFUnit, or FormSizeHelperManual where you manually specify the DWARF version, address byte size and DWARF32/DWARF64. There is now a single implementation of a function that gets the fixed byte size (instead of two where one took a DWARFUnit and one took the DWARF version, address byte size and DWARFFormat enum) and one function to skip the form values.
Nemanja Ivanovic [Fri, 11 Nov 2016 14:41:19 +0000 (14:41 +0000)]
[PowerPC] Add vector conversion builtins to altivec.h - LLVM portion
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
Ulrich Weigand [Fri, 11 Nov 2016 12:48:26 +0000 (12:48 +0000)]
[SystemZ] Support CL(G)T instructions
This adds support for the compare logical and trap (memory)
instructions that were added as part of the miscellaneous
instruction extensions feature with zEC12.
Ulrich Weigand [Fri, 11 Nov 2016 12:43:51 +0000 (12:43 +0000)]
[SystemZ] Use LLGT(R) instructions
This adds support for the 31-to-64-bit zero extension instructions
LLGT and LLGTR and uses them for code generation where appropriate.
Since this operation can also be performed via RISBG, we have to
update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT
over RISBG in case both are possible. The patch includes some
simplification to the tryRISBGZero code; this is not intended
to cause any (further) functional change in codegen.
Teresa Johnson [Fri, 11 Nov 2016 05:34:58 +0000 (05:34 +0000)]
Split Bitcode/ReaderWriter.h into separate reader and writer headers
Summary:
Split ReaderWriter.h which contains the APIs into both the BitReader and
BitWriter libraries into BitcodeReader.h and BitcodeWriter.h.
This is to address Chandler's concern about sharing the same API header
between multiple libraries (BitReader and BitWriter). That concern is
why we create a single bitcode library in our downstream build of clang,
which led to r286297 being reverted as it added a dependency that
created a cycle only when there is a single bitcode library (not two as
in upstream).
This is a replacement to binutils' string tool. It prints strings found in a
binary (object file, executable, or archive library). It is rather bare and
not functionally equivalent, however, it lays the groundwork necessary for the
strings tool, enabling iterative development of features to reach feature
parity.
Matthias Braun [Fri, 11 Nov 2016 01:34:21 +0000 (01:34 +0000)]
ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.
Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.
Adam Nemet [Fri, 11 Nov 2016 01:25:04 +0000 (01:25 +0000)]
[opt-viewer] Display inlining context
When a function is inlined, each instance is optimized in their own
inlining context. This can produce different remarks all pointing to
the same source line.
This adds a new column on the source view to display the inlining
context.
The NamedRegionTimer initializer without a group name puts the Timer
into the "Misc" group and is (nearly) unused. Remove it.
The only user of this constructor appears to be the HexagonGenInsert pass,
which creates a counter without group to count the complete execution
time of that pass, however since every pass gets a counter by the
PassManager anyway this should be unnecessary. Also removed the
pointless TimerGroup there.
Evandro Menezes [Thu, 10 Nov 2016 23:31:06 +0000 (23:31 +0000)]
[DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
IR: Introduce inrange attribute on getelementptr indices.
If the inrange keyword is present before any index, loading from or
storing to any pointer derived from the getelementptr has undefined
behavior if the load or store would access memory outside of the bounds of
the element selected by the index marked as inrange.
This can be used, e.g. for alias analysis or to split globals at element
boundaries where beneficial.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html
Yaxun Liu [Thu, 10 Nov 2016 21:18:49 +0000 (21:18 +0000)]
AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Zachary Turner [Thu, 10 Nov 2016 20:16:45 +0000 (20:16 +0000)]
[Support] Improve flexibility of binary blob formatter.
This makes it possible to indent a binary blob by a certain
number of bytes, and also makes some things more idiomatic.
Finally, it integrates this binary blob formatter into ScopedPrinter
which used to have its own implementation of this algorithm.
Teresa Johnson [Thu, 10 Nov 2016 16:57:32 +0000 (16:57 +0000)]
Restore part of "[ThinLTO] Prevent exporting of locals used/defined in module level asm"
This restores the part of r286297 that didn't require adding a
dependency from the Analysis to Object library. There are two parts
to the original fix, and this will address the handling for the case
where locals are used in module level asm.
The part that requires functionality in libObject handles local defs
in module level asm, and was reverted because our downstream build
of clang builds lib/Bitcode into a single library, and this new
dependency introduced a cycle there. I am trying to get that fixed
(see D26502), so for now that change isn't being restored
Sanjay Patel [Thu, 10 Nov 2016 14:58:17 +0000 (14:58 +0000)]
[InstCombine] auto-generate better checks; NFC
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.