Zachary Turner [Fri, 13 Feb 2015 01:23:51 +0000 (01:23 +0000)]
Improve llvm-pdbdump output display.
This patch adds a number of improvements to llvm-pdbdump.
1) Dumping of the entire global scope, and not only those
symbols that live in individual compilands.
2) Prepend class name to member functions and data
3) Improved display of bitfields.
4) Support for dumping more kinds of data symbols.
Chandler Carruth [Fri, 13 Feb 2015 00:29:39 +0000 (00:29 +0000)]
[unroll] Don't use a map from pointer to bool. Use a set.
This is much more efficient. In particular, the query with the user
instruction has to insert a false for every missing instruction into the
set. This is just a cleanup a long the way to fixing the underlying
algorithm problems here.
When we try to estimate number of potentially removed instructions in
loop unroller, we analyze first N iterations and then scale the
computed number by TripCount/N. We should bail out early if N is 0.
Chandler Carruth [Fri, 13 Feb 2015 00:00:24 +0000 (00:00 +0000)]
[unroll] Update the new analysis logic from r228265 to use modern coding
conventions for function names consistently. Some were already using
this but not all.
Rafael Espindola [Thu, 12 Feb 2015 23:29:51 +0000 (23:29 +0000)]
Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.
David Majnemer [Thu, 12 Feb 2015 23:26:26 +0000 (23:26 +0000)]
X86: Don't crash if we can't decode the pshufb mask
Constant pool entries are uniqued by their contents regardless of their
type. This means that a pshufb can have a shuffle mask which isn't a
simple array of bytes.
The code path which attempts to decode the mask didn't check for
failure, causing PR22559.
David Blaikie [Thu, 12 Feb 2015 22:45:25 +0000 (22:45 +0000)]
Remove typedef of a pointer type used in a gep to simplify migration of geps to a typeless-pointer future.
I'd modify my migration tool to account for this, but this is the only
instance of a typedef'd pointer type to a gep I found in the whole test
suite, so it didn't seem worthwhile.
Hal Finkel [Thu, 12 Feb 2015 22:43:52 +0000 (22:43 +0000)]
[SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.
IR: Stop abusing DW_TAG_base_type for compile unit arrays
The sub-arrays for compile units have for a long time been initialized
to distinct temporary nodes with the `DW_TAG_base_type` tag, with no
other operands. These invalid `DIBasicType`s are later replaced with
appropriate arrays.
This seems like a poor man's assertion that the arrays do eventually get
replaced. These days, temporaries in the graph will cause assertions
when writing bitcode or assembly, so this isn't necessary. Use
temporary empty tuples instead.
Note that the whole idea of using temporaries and then replacing them
later is wasteful here. We never actually want to merge compile units
by uniquing based on content. Compile units should use `getDistinct()`
instead of `get()`, and then their operands can be freely replaced later
on.
Zachary Turner [Thu, 12 Feb 2015 21:09:24 +0000 (21:09 +0000)]
Add concrete type overloads to PDBSymbol::findChildren().
Frequently you only want to iterate over children of a specific
type (e.g. functions). Previously you would get back a generic
interface that allowed iteration over the base symbol type,
which you would have to dyn_cast<> each one of. With this patch,
we allow the user to specify the concrete type as a template
parameter, and it will return an iterator which returns instances
of the concrete type directly.
Bjorn Steinbrink [Thu, 12 Feb 2015 21:04:22 +0000 (21:04 +0000)]
Fix a crash in the assumption cache when inlining indirect function calls
Summary:
Instances of the AssumptionCache are per function, so we can't re-use
the same AssumptionCache instance when recursing in the CallAnalyzer to
analyze a different function. Instead we have to pass the
AssumptionCacheTracker to the CallAnalyzer so it can get the right
AssumptionCache on demand.
James Molloy [Thu, 12 Feb 2015 15:54:14 +0000 (15:54 +0000)]
[LoopRerolling] Be more forgiving with instruction order.
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.
Tim Northover [Thu, 12 Feb 2015 15:12:13 +0000 (15:12 +0000)]
Triple: refactor redundant code.
Should be no functional change, since most of the logic removed was
completely pointless (after some previous refactoring) and the rest
duplicated elsewhere.
Andrea Di Biagio [Thu, 12 Feb 2015 14:17:24 +0000 (14:17 +0000)]
[TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target.
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.
Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.
This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.
Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.
Benjamin Kramer [Thu, 12 Feb 2015 13:47:29 +0000 (13:47 +0000)]
MathExtras: Parametrize count(Trailing|Leading)Zeros on the type size.
Otherwise we will always select the generic version for e.g. unsigned
long if uint64_t is typedef'd to 'unsigned long long'. Also remove
enable_if hacks in favor of static_assert.
Asiri Rathnayake [Thu, 12 Feb 2015 13:37:28 +0000 (13:37 +0000)]
ARM: Fix another regression introduced in r223113
The changes in r223113 (ARM modified-immediate syntax) have broken
instructions like:
mov r0, #~0xffffff00
The problem is that I've added a spurious range check on the immediate
operand to ensure that it lies between INT32_MIN and UINT32_MAX. While
this range check is correct in theory, it causes problems because the
operand is stored in an int64_t (by MC). So valid 32-bit constants like
\#~0xffffff00 become out of range. The solution is to simply remove this
range check. It is not possible to validate the range of the immediate
operand with the current setup because: 1) The operand is stored in an
int64_t by MC, 2) The immediate can be of the forms #imm, #-imm, #~imm
or even #((~imm)) etc. So we just chop the value to 32 bits and use it.
Also noted that the original range check was note tested by any of the
unit tests. I've added a new test to cover #~imm kind of operands.
Dmitry Vyukov [Thu, 12 Feb 2015 09:55:28 +0000 (09:55 +0000)]
tsan: do not instrument not captured values
I've built some tests in WebRTC with and without this change. With this change number of __tsan_read/write calls is reduced by 20-40%, binary size decreases by 5-10% and execution time drops by ~5%. For example:
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.
There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.
[X86] A heuristic to estimate the size impact for converting stack-relative parameter movs to pushes
This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size.
We go over all calls in the MachineFunction and compute:
a) For each callsite that can not use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).
Ahmed Bougacha [Thu, 12 Feb 2015 06:15:29 +0000 (06:15 +0000)]
[CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.
We would crash if we couldn't locate a Function that either Location's
Value belonged to. Now we just print out a debug message and return
conservatively.
Chandler Carruth [Thu, 12 Feb 2015 02:30:56 +0000 (02:30 +0000)]
[slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out.
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.
The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types whas checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.
Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.
I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonesense.
I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.
Hal Finkel [Thu, 12 Feb 2015 01:02:52 +0000 (01:02 +0000)]
[PowerPC] Mark jumps as expensive (using using CR bits)
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".
Zachary Turner [Thu, 12 Feb 2015 00:05:49 +0000 (00:05 +0000)]
Revert "Change Path::filename_pos() to skip the drive letter."
This reverts commit 228874. For some reason users reported
seeing Clang taking up 25+GB of memory and bringing down
machines with this change. Reverting until we figure it out.
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.
Zachary Turner [Wed, 11 Feb 2015 21:16:35 +0000 (21:16 +0000)]
Change Path::filename_pos() to skip the drive letter.
For Windows, filename_pos() tries to find the filename by
searching for separators after the last :. Instead, it should
really check for the only location that a : is valid, which is
in the second character, and search for separators after that.
Mehdi Amini [Wed, 11 Feb 2015 19:54:44 +0000 (19:54 +0000)]
Reassociate: cannot negate a INT_MIN value
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.
Adrian Prantl [Wed, 11 Feb 2015 17:45:05 +0000 (17:45 +0000)]
Generalize DIBuilder's createReplaceableForwardDecl() to a more flexible
createReplaceableCompositeType() that allows to create non-forward-declared
temporary nodes.
Jan Wen Voung [Wed, 11 Feb 2015 16:12:50 +0000 (16:12 +0000)]
Gold-plugin: Broaden scope of get/release_input_file to scope of Module.
Summary:
Move calls to get_input_file and release_input_file out of
getModuleForFile(). Otherwise release_input_file may end up
unmapping a view of the file while the view is still being
used by the Module (on 32-bit hosts).
Jonas Paulsson [Wed, 11 Feb 2015 16:10:31 +0000 (16:10 +0000)]
Fix SelectionDAG compile time issue with alias analysis.
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.
Marek Olsak [Wed, 11 Feb 2015 14:26:46 +0000 (14:26 +0000)]
R600/SI: Enable a lot of existing tests for VI (squashed commits)
This is a union of these commits:
* R600/SI: Enable more tests for VI which need no changes
* R600/SI: Enable V_BCNT tests for VI
Differences:
- v_bcnt_..._e32 -> _e64
- s_load_dword* inline offset is in bytes instead of dwords
* R600/SI: Enable all tests for VI which use S_LOAD_DWORD
The inline offset is changed from dwords to bytes.
* R600/SI: Enable LDS tests for VI
Differences:
- the s_load_dword inline offset changed from dwords to bytes
- the tests checked very little on CI, so they have been fixed to check all
instructions that "SI" checked
* R600/SI: Enable lshr tests for VI
* R600/SI: Fix divrem64 tests
- "v_lshl_64" was missing "b" before "64"
- added VI-NOT checks
* R600/SI: Enable the SI.tid test for VI
* R600/SI: Enable the frem test for VI
Also, the frem_f64 checking is added for CI-VI.