Daniel Sanders [Thu, 20 Jul 2017 09:25:44 +0000 (09:25 +0000)]
[globalisel][tablegen] Add control-flow to the MatchTable.
Summary:
This will allow us to merge the various sub-tables into a single table. This is a
compile-time saving at this point. However, this will also enable the optimization
of a table so that similar instructions can be tested together, reducing the time
spent on the matching the code.
The bulk of this patch is a mechanical conversion to the new MatchTable object
which is responsible for tracking label definitions and filling in the index of
the jump targets. It is also responsible for nicely formatting the table.
This was necessary to support the new GIM_Try opcode which takes the index to
jump to if the match should fail. This value is unknown during table
construction and is filled in during emission. To support nesting try-blocks
(although we currently don't emit tables with nested try-blocks), GIM_Reject
has been re-introduced to explicitly exit a try-block or fail the overall match
if there are no active try-blocks.
Introduced FSELECT node necesary when lowering ISD::SELECT
which has i32, f64, f64 as its operands.
SEL_D instruction required that its output and first operand
of a SELECT node, which it used, have matching types.
MTC1_D64 node introduced to aid FSELECT lowering.
This fixes machine verifier errors on following tests:
CodeGen/Mips/llvm-ir/select-dbl.ll
CodeGen/Mips/llvm-ir/select-flt.ll
CodeGen/Mips/select.ll
Matt Arsenault [Thu, 20 Jul 2017 05:17:54 +0000 (05:17 +0000)]
AMDGPU: Correct encoding for global instructions
The soffset field needs to be be set to 0x7f to disable it,
not 0. 0 is interpreted as an SGPR offset.
This should be enough to get basic usage of the global instructions
working. Technically it is possible to use an SGPR_32 offset,
but I'm not sure if it's correct with 64-bit pointers, but
that is not handled now. This should also be cleaned up
to be more similar to how different MUBUF modes are handled,
and to have InstrMappings between the different types.
[DWARF] Added check that verifies that no abbreviation declaration has more than one attribute with the same name.
SUMMARY
This patch adds a verification check on the abbreviation declarations in the .debug_abbrev section.
The check makes sure that no abbreviation declaration has more than one attributes with the same name.
Support, IR, ADT: Check nullptr after allocation with malloc/realloc or calloc
As a follow up of the bad alloc handler patch, this patch introduces nullptr checks on pointers returned from the
malloc/realloc/calloc functions. In addition some memory size assignments are moved behind the allocation
of the corresponding memory to fulfill exception safe memory management (RAII).
[libFuzzer] add DeepRecursionTest, inspired by https://guidovranken.wordpress.com/2017/07/08/libfuzzer-gv-new-techniques-for-dramatically-faster-fuzzing/ (Stack-depth-guided fuzzing). libFuzzer does not solve it yet.
Petr Hosek [Wed, 19 Jul 2017 23:51:13 +0000 (23:51 +0000)]
[LLVM][llvm-objcopy] Added basic plumbing to get things started
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark, but more additions wrt
shrink-wrapping may be added.
The optional external function callbacks have to be exported in order
for them to be called. The test was failing because libFuzzer wasn't
calling LLVMFuzzerInitialize.
We can reconsider if this is the best way to mark these optional
callbacks exported later.
Adam Nemet [Wed, 19 Jul 2017 22:04:59 +0000 (22:04 +0000)]
[opt-viewer] Reduce memory consumption by another 20-25%
The Args field of the remark which consists of a list of mappings in YAML is
translated into a list of (small) dicts on Python. An empty dict is 280 bytes
on my system so we can save memory by using a tuple of tuples instead.
Making a tuple of tuples rather than a list of tuples allows Args to be shared
with the key of the remark. This is actually an even greater saving. (Keys
are alive throughout the entire run in all_remarks.)
Here are a few opt-stats runs with different input sizes while measuring heap
usage with heapy. Avg remark size is simply estimated as
heap-size / # of remarks:
Adam Nemet [Wed, 19 Jul 2017 22:04:56 +0000 (22:04 +0000)]
[opt-viewer] Reduce memory consumption
The observation is that we have a lot of similar remarks with lots of
identical strings (e.g. file paths, text from the remark). Storing a copy of
each of those strings in memory is wasteful. This makes all the strings in
the remark interned which maintains a single immutable instance that is
referenced everywhere.
I get an average 20% heap size reduction with this but it's possible that this
varies with the typical length of the file paths used. (I used heapy to
report the heap size.) Runtime is same or a tiny bit better.
Petr Hosek [Wed, 19 Jul 2017 20:38:46 +0000 (20:38 +0000)]
[yaml2obj][ELF] Add support for program headers
This change adds basic support for program headers.
I need to do some testing which requires generating program headers but
I can't use ld.lld or clang to produce programs that have headers. I'd
also like to test some strange things that those programs may never
produce.
Martin Storsjo [Wed, 19 Jul 2017 20:14:32 +0000 (20:14 +0000)]
[AArch64] Force relocations for all ADRP instructions
This generalizes an existing fix from ELF to MachO and COFF.
Test that an ADRP to a local symbol whose offset is known at assembly
time still produces relocations, both for MachO and COFF. Test that
an ADRP without a @page modifier on MachO fails (previously it
didn't).
Brian Gesiak [Wed, 19 Jul 2017 18:37:02 +0000 (18:37 +0000)]
[cmake] GetSVN.cmake takes a list of arguments
Summary:
GetSVN.cmake currently takes one or two pairs of <source directory path,
name>, then attempts to get the remote repository URL and source control
revision of the repositories at those one or two paths.
It takes two pairs in order for Clang to get the revision of both
itself, and its dependency LLVM.
For projects that rely upon both LLVM and Clang (Apple's Swift is one
example, but there are others), GetSVN.cmake is used to fetch *three*
revisions: Swift, Clang, and LLVM.
To support this use case, change GetSVN.cmake: instead of taking one or
two pairs (specified via `FIRST_SOURCE_DIR`/`FIRST_NAME`, have it take a list
of pairs (`SOURCE_DIRS`/`NAMES`).
In order to allow Clang to migrate, have GetSVN.cmake support both sets
of arguments for now. The old arguments can be removed once Clang begins
using the new arguments, and Swift can follow when it updates its
copy of LLVM.
Test Plan:
1. Perform a clean build of Clang, verify that `clang --version` still
prints the correct LLVM and Clang revision information.
2. Modify Clang's CMake to use the new arguments, then perform another
clean build. `clang --version` should still print the correct
revision information.
LTO: Export functions referenced by the CFI jump table.
If the LowerTypeTests pass decides to add a function to a jump
table for CFI, it will add its name to the set cfiFunctionDefs,
which among other things will cause the function to be renamed in
the ThinLTO backend.
One other thing that we must do with such functions is to not
internalize them, because the jump table in the full LTO object will
contain a reference to the actual function body in the ThinLTO object.
This patch handles that by ensuring that we export any functions
whose names appear in the cfiFunctionDefs set.
[Solaris] enable --whole-archive for shared-library build, disable --version-script for Solaris-ld
Shared-library build on Solaris requires --whole-archive to be specified (option accepted by all available linkers).
At the same time, --version-script can not be handled by Solaris-ld, so it should be skipped.
-M is of no use here, since there is no syntax in Solaris-ld mapfiles that allows to version all global symbols,
not just the named ones (at least this is my impression from digging deep into the docs).
Hans Wennborg [Wed, 19 Jul 2017 15:03:38 +0000 (15:03 +0000)]
Defeat a GCC -Wunused-result warning
It was warning like:
../llvm-project/llvm/lib/Support/ErrorHandling.cpp:172:51: warning:
ignoring return value of ‘ssize_t write(int, const void*, size_t)’,
declared with attribute warn_unused_result [-Wunused-result]
(void)::write(2, OOMMessage, strlen(OOMMessage));
Work around the warning by storing the return value in a variable and
casting that to void instead. We already did this for the other write()
call in this file.
Joel Jones [Wed, 19 Jul 2017 14:10:42 +0000 (14:10 +0000)]
[docs] Document how to debug instruction scheduling model generation
Describe:
+ Exact tablegen command and how to get it
+ tablegen command debug option for subtarget generation
+ Use of schedcover.py on the debug output to determine coverage
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers e.g. BASEPRI_MAX_NS which do not exist were being allowed!
Ref: ARMv6/7/8M architecture reference manual.
[LoopUtils] Add an extra parameter OpValue to propagateIRFlags function,
If OpValue is non-null, we only consider operations similar to OpValue
when intersecting.
[SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
[LV] Test once if vector trip count is zero, instead of twice
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.
This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.
This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".
Original observation and initial patch by Evgeny Stupachenko.
[PM/LCG] Follow-up fix to r308088 to handle deletion of library
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instructions types
b. Registers used
5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Preserve the actual library name as provided by the user. This is
required to properly replicate link's behaviour about the module import
name handling. This requires an associated change to lld for updating
the tests for the proper behaviour for the import library module name
handling in various cases.
Adrian Prantl [Wed, 19 Jul 2017 00:09:54 +0000 (00:09 +0000)]
Debug Info: Add a file: field to DIImportedEntity.
DIImportedEntity has a line number, but not a file field. To determine
the decl_line/decl_file we combine the line number from the
DIImportedEntity with the file from the DIImportedEntity's scope. This
does not work correctly when the parent scope is a DINamespace or a
DIModule, both of which do not have a source file.
This patch adds a file field to DIImportedEntity to unambiguously
identify the source location of the using/import declaration. Most
testcase updates are mechanical, the interesting one is the removal of
the FIXME in test/DebugInfo/Generic/namespace.ll.
This fixes PR33822. See https://bugs.llvm.org/show_bug.cgi?id=33822
for more context.
Accept and ignore --wide/-W. In GNU readelf this switch is
necessary to get the output format that's consistent between
32-bit and 64-bit targets. llvm-readobj always produces that
output format.
Petr Hosek [Tue, 18 Jul 2017 23:35:22 +0000 (23:35 +0000)]
[llvm-readobj] Accept -S as an alias for --sections
In GNU readelf, the short option for --sections is upper-case -S.
Note that GNU uses lower-case -s to mean --symbols, while LLVM
uses -s to mean --sections and -t to mean --symbols (-t has yet a
different meaning in GNU). So command-line uses with -S can now
be compatible, but uses with -s or -t are still incompatible.
[asan] Copy arguments passed by value into explicit allocas for ASan
Summary:
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Submitted on behalf of @morehouse (Matt Morehouse)
Object: rename parameter from DLLName to ImportName
When I originally wrote this code, I neglected the fact that the import
library may be created for executables. This name is not the name of
the DLL, but rather the name for the imported module. It will be
embedded into the IAT/ILT reference. Rename it to make it more obvious.
NFC.
When given an extension as part of the `library` directive in a def
file, the extension is preserved/honoured by link/lib. Behave similarly
when parsing the def file. This requires checking if a native extension
is provided as a keyword parameter. If no extension is present, append
a standard `.dll` or `.exe` extension.
This is best tested via lld, and I will add tests there as a follow up.
Jakub Kuderski [Tue, 18 Jul 2017 20:19:52 +0000 (20:19 +0000)]
[Dominators] Improve error checking in deleteEdge
Summary: This patch improves error detection in deleteEdge. It asserts that the edge doesn't exist in the CFG and that DomTree knew about this edge before.
Nirav Dave [Tue, 18 Jul 2017 20:06:24 +0000 (20:06 +0000)]
[DAG] Improve Aliasing of operations to static alloca
Re-recommiting after landing DAG extension-crash fix.
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Nirav Dave [Tue, 18 Jul 2017 19:49:20 +0000 (19:49 +0000)]
[DAG] Reverse node replacement in extension operation. NFCI.
Reorder replacements to be user first in preparation for multi-level
folding to premptively avoid inadvertantly deleting later nodes from
sharing found from replacement.
Brian Gesiak [Tue, 18 Jul 2017 19:25:34 +0000 (19:25 +0000)]
[opt-viewer] Handle file names that contain '#'
Summary:
When using opt-viewer.py with files with '#' in their name, such as
'foo#bar.cpp', opt-viewer.py would generate links such as
'/path/to/foo#bar.cpp.opt.yaml#L42'. In this case, the link is
interpreted by browsers as a link to the file '/path/to/foo', and to the
section within that file with ID 'bar.cpp.opt.yaml#L42'.
To work around this issue, replace '#' with '_' in file names and links
in opt-viewer.py.
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.
Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500