Reid Kleckner [Thu, 16 Mar 2017 22:59:15 +0000 (22:59 +0000)]
Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reid Kleckner [Thu, 16 Mar 2017 22:58:56 +0000 (22:58 +0000)]
Remove dead F parameter from Argument constructor
When Function creates its argument list, it does the ilist push_back
itself. No other caller passes in a parent function, so this is dead,
and it uses the soon-to-be-deleted getArgumentList accessor.
Adrian McCarthy [Thu, 16 Mar 2017 22:28:39 +0000 (22:28 +0000)]
Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]
This moves exe symbol-specific method implementations out of NativeRawSymbol
into a concrete subclass. Also adds implementations for hasCTypes and
hasPrivateSymbols and a simple test to ensure the native reader can access
the summary information for the executable from the PDB.
Zachary Turner [Thu, 16 Mar 2017 22:28:04 +0000 (22:28 +0000)]
[Support] Support both Windows and Posix paths on both platforms.
Previously which path syntax we supported dependend on what
platform we were compiling LLVM on. While this is normally
desirable, there are situations where we need to be able to
handle a path that we know was generated on a remote host.
Remote debugging, for example, or parsing debug info.
99% of the code in LLVM for handling paths was platform
agnostic and literally just a few branches were gated behind
pre-processor checks, so this changes those sites to use
runtime checks instead, and adds a flag to every path
API that allows one to override the host native syntax.
Reid Kleckner [Thu, 16 Mar 2017 22:25:45 +0000 (22:25 +0000)]
Make Argument::getArgNo() constant time, not O(#args)
getArgNo is actually hot in LLVM, because its how we check for
attributes on arguments:
bool Argument::hasNonNullAttr() const {
if (!getType()->isPointerTy()) return false;
if (getParent()->getAttributes().
hasAttribute(getArgNo()+1, Attribute::NonNull))
return true;
It actually shows up as the 23rd hottest leaf function in a 13s sample
of LTO of llc.
This grows Argument by four bytes, but I have another pending patch to
shrink it by removing its ilist_node base.
Kyle Butt [Thu, 16 Mar 2017 21:33:29 +0000 (21:33 +0000)]
CodeGen: BlockPlacement: Adjust test case so it covers rL297925. NFC
I had ajusted the test case before when testing a chain of length 2, and then
reverted it with rL296845 when I switched to 3 triangles. After running
benchmarks and examining generated code at length 2 I forgot to put the test
back.
Adrian Prantl [Thu, 16 Mar 2017 21:14:09 +0000 (21:14 +0000)]
Salvage debug info from instructions about to be deleted
[Reapplies r297971 and punting on finding a better API for findDbgValues()]
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
[SCEV] Compute affine range in another way to avoid bitwidth extending.
Summary:
This approach has two major advantages over the existing one:
1. We don't need to extend bitwidth in our computations. Extending
bitwidth is a big issue for compile time as we often end up working with
APInts wider than 64bit, which is a slow case for APInt.
2. When we zero extend a wrapped range, we lose some information (we
replace the range with [0, 1 << src bit width)). Thus, avoiding such
extensions better preserves information.
Correctness testing:
I ran 'ninja check' with assertions that the new implementation of
getRangeForAffineAR gives the same results as the old one (this
functionality is not present in this patch). There were several failures
- I inspected them manually and found out that they all are caused by
the fact that we're returning more accurate results now (see bullet (2)
above).
Without such assertions 'ninja check' works just fine, as well as
SPEC2006.
Sanjay Patel [Thu, 16 Mar 2017 20:42:45 +0000 (20:42 +0000)]
[InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306)
As the related tests show, we're not canonicalizing to this form for scalars or vectors yet,
but this solves the immediate problem in:
https://bugs.llvm.org/show_bug.cgi?id=32306
Zachary Turner [Thu, 16 Mar 2017 20:19:11 +0000 (20:19 +0000)]
[PDB] Add support for parsing Flags from PDB Stream.
This was discovered when running `llvm-pdbdump diff` against
two files, the second of which was generated by running the
first one through pdb2yaml and then yaml2pdb.
The second one was missing some bytes from the PDB Stream, and
tracking this down showed that at the end of the PDB Stream were
some additional bytes that we were ignoring. Looking back
to the reference code, these seem to specify some additional
flags that indicate whether the PDB supports various optional
features.
This patch adds support for reading, writing, and round-tripping
these flags through YAML and the raw dumper, and updates the
tests accordingly.
Zachary Turner [Thu, 16 Mar 2017 20:18:41 +0000 (20:18 +0000)]
[llvm-pdbdump] Add support for diffing the PDB Stream.
In doing so I discovered that we completely ignore some bytes
of the PDB Stream after we "finish" loading it. These bytes
seem to specify some additional information about what kind
of data is present in the PDB. A subsequent patch will add
code to read in those fields and store their values.
Adrian Prantl [Thu, 16 Mar 2017 18:22:52 +0000 (18:22 +0000)]
Salvage debug info from instructions about to be deleted
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
LTO: Create temporary cache files in the cache directory instead of $TMPDIR.
This fixes a race condition where another linker process can observe a
partially written file if we copy it from another file system, and allows
the link to be independent of the amount of free disk space in $TMPDIR.
Adrian Prantl [Thu, 16 Mar 2017 17:14:56 +0000 (17:14 +0000)]
PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288
The DWARF generated by LLVM includes this location:
0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
to assume the DWARF consumer knows which part of a register
logically holds the value (low bytes, high bytes, how many bytes,
etc) for a primitive value like an integer.
This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.
(This reapplies r297960 with two additional testcase updates).
Reid Kleckner [Thu, 16 Mar 2017 17:05:16 +0000 (17:05 +0000)]
[cmake] Refactor warning flag logic to use Unix warnings with clang-cl
Summary:
clang-cl understands the GCC-style -W[no-]foo flags, and for the most
part ignores MSVC -wd flags. So, let's pass the curated set of warning
flags we use on Unix on Windows. We can also stop passing /W4 -wd*,
which for the most part corresponds to -Wall -Wextra with a bunch of
flags that we mostly ignore.
I had to disable -Wnon-virtual-dtor on Windows, because it fires on
every COM class ever. I filed PR32286 to fix this.
So far I've only found two instances of -Wstring-conversion in the
WinASan code, which I'll fix. Other than that we seem clean.
Reid Kleckner [Thu, 16 Mar 2017 16:57:31 +0000 (16:57 +0000)]
[IR] Inline some Function accessors
I checked that all of these out-of-line methods previously compiled to
simple loads and bittests, so they are pretty good candidates for
inlining. In particular, arg_size() and arg_empty() are popular and are
just two loads, so they seem worth inlining.
Adrian Prantl [Thu, 16 Mar 2017 16:34:14 +0000 (16:34 +0000)]
PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288
The DWARF generated by LLVM includes this location:
0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
to assume the DWARF consumer knows which part of a register
logically holds the value (low bytes, high bytes, how many bytes,
etc) for a primitive value like an integer.
This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.
We can mark functions to always inline early in the opt. Since we do not have
call support this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.
Bjorn Pettersson [Thu, 16 Mar 2017 13:22:01 +0000 (13:22 +0000)]
[InstCombine] Liberate assert in InstCombiner::visitZExt
Summary:
The call to canEvaluateZExtd in InstCombiner::visitZExt may
return with BitsToClear == SrcTy->getScalarSizeInBits(), but
there is an assert that BitsToClear should be smaller than
SrcTy->getScalarSizeInBits().
I have a test case that triggers the assert, but it only happens
for my downstream target. I've not been able to trigger it for
any upstream target.
The assert triggered for a piece of code such as this
%shr1 = lshr i16 undef, 15
...
%shr2 = lshr i16 %shr1, 1
%conv = zext i16 %shr2 to i32
Normally the lshr instructions are constant folded before we
visit the zext (that is why it is so hard to reproduce).
The original pattern, before instcombine, is of course a lot more
complicated in my test case. The shift count in the second lshr
is for example determined by the outcome of a PHI instruction.
It seems like other rewrites by instcombine leads up to
the pattern above. And then the zext is pulled from the
worklist, and visited (hitting the assert), before we detect
that the lshr instrucions can be constant folded.
Anyway, since the canEvaluateZExtd may return with BitsToClear
equal to SrcTy->getScalarSizeInBits(), and since the rewrite
that converts the expression type to avoid a zero extend works
also for the case where SrcBitsKept ends up being zero, then
it should be OK to liberate the assert to
assert(BitsToClear <= SrcTy->getScalarSizeInBits() &&
"Unreasonable BitsToClear");
James Henderson [Thu, 16 Mar 2017 11:22:09 +0000 (11:22 +0000)]
[Support] Add support for getting file system permissions on Windows and implement sys::fs::set/getPermissions to work with them
This change adds support for functions to set and get file permissions, in a similar manner to the C++17 permissions() function in <filesystem>. The setter uses chmod on Unix systems and SetFileAttributes on Windows, setting the permissions as passed in. The getter simply uses the existing status() function.
Prior to this change, status() would always return an unknown value for the permissions on a Windows file, making it impossible to test the new function on Windows. I have therefore added support for this as well. On Linux, prior to this change, the permissions included the file type, which should actually be accessed via a different member of the file_status class.
Note that on Windows, only the *_write permission bits have any affect - if any are set, the file is writable, and if not, the file is read-only. This is in common with what MSDN describes for their behaviour of std::filesystem::permissions(), and also what boost::filesystem does.
The motivation behind this change is so that we can easily test behaviour on read-only files in LLVM unit tests, but I am sure that others may find it useful in some situations.
Chandler Carruth [Thu, 16 Mar 2017 10:45:42 +0000 (10:45 +0000)]
[PM/Inliner] Fix a bug in r297374 where we would leave stale calls in
the work queue and crash when trying to visit them after deleting the
function containing those calls.
Chandler Carruth [Thu, 16 Mar 2017 10:13:55 +0000 (10:13 +0000)]
[PM/Inliner] Add a test case that encapsulates the core issue addressed
in r297374.
I've extracted a small version of this from the C++ metaprogram Richard
came up with to exercise these kinds of issues and written comments to
describe both how to reproduce a fresh version of the test case and what
likely failure modes are.
The test case is still a bit brittle as it depends on the particular
inline cost modeling and SCC visitation order, but it definitely would
have caught the bug right away when developing things so it seems
a really valuable test case to have.
Jonas Paulsson [Thu, 16 Mar 2017 07:17:12 +0000 (07:17 +0000)]
[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.
(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.
The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.
Tobias Grosser [Thu, 16 Mar 2017 03:59:23 +0000 (03:59 +0000)]
[ADCE] Remove redundent code [NFC]
Summary:
In commit r289548 ([ADCE] Add code to remove dead branches) a redundant loop
nest was accidentally introduced, which implements exactly the same
functionality as has already been available right after. This redundancy has
been found when inspecting the ADCE code in the context of our recent
discussions on post-dominator modeling. This redundant code was also eliminated
by r296535 (which sparked the discussion), but only as part of a larger semantic
change of the post-dominance modeling. As this redundency in [ADCE] is really
just an oversight completely independent of the post-dominance changes under
discussion, we remove this redundancy independently.
Kyle Butt [Thu, 16 Mar 2017 01:32:29 +0000 (01:32 +0000)]
CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%
Craig Topper [Wed, 15 Mar 2017 22:40:26 +0000 (22:40 +0000)]
[StackColoring] Remove unused header file for post-order traversal. Update comment that indicated we were using it when we really use a depth-first search. NFC
Zachary Turner [Wed, 15 Mar 2017 22:18:53 +0000 (22:18 +0000)]
[pdb] Write the module info and symbol record streams.
Previously we did not have support for writing detailed
module information for each module, as well as the symbol
records. This patch adds support for this, and in doing
so enables the ability to construct minimal PDBs from
just a few lines of YAML. A test is added to illustrate
this functionality.
Rong Xu [Wed, 15 Mar 2017 21:05:24 +0000 (21:05 +0000)]
[PGO] Minor cleanup for count instruction in SelectInstVisitor.
Summary:
NSIs can be double-counted by different operations in
SelectInstVisitor. Sink the the update to VM_counting mode only.
Also reset the value for each counting operation.
Ahmed Bougacha [Wed, 15 Mar 2017 19:21:11 +0000 (19:21 +0000)]
[GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS.
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.
When we're emitting synthetic constants, we do need to get Constants from
the context. One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless. We currently do this for extractvalue and all.
For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.
Matt Arsenault [Wed, 15 Mar 2017 19:04:26 +0000 (19:04 +0000)]
AMDGPU: Fix unnecessary ands when packing f16 vectors
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.
Tim Northover [Wed, 15 Mar 2017 18:38:13 +0000 (18:38 +0000)]
ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.
Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.
Ahmed Bougacha [Wed, 15 Mar 2017 18:22:37 +0000 (18:22 +0000)]
[GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch comparison blocks.
Ahmed Bougacha [Wed, 15 Mar 2017 18:22:33 +0000 (18:22 +0000)]
[GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable; the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.
getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything. Rename it and extract the MBB creation logic out of it.
A couple tests were sensitive to the order. Update them.