Sanjay Patel [Fri, 8 Mar 2019 21:54:41 +0000 (21:54 +0000)]
[x86] scalarize extract element 0 of FP cmp
An extension of D58282 noted in PR39665:
https://bugs.llvm.org/show_bug.cgi?id=39665
This doesn't answer the request to use movmsk, but that's an
independent problem. We need this and probably still need
scalarization of FP selects because we can't do that as a
target-independent transform (although it seems likely that
targets besides x86 should have this transform).
Alexey Bataev [Fri, 8 Mar 2019 21:29:17 +0000 (21:29 +0000)]
[NVPTX][DEBUGINFO]Temp workaround for crash of ptxas: disable packed bytes in debug sections.
Summary:
This patch works around the bug in the ptxas tool with the processing of bytes
separated by the comma symbol. The emission of the packed string is
temporarily disabled.
Mitch Phillips [Fri, 8 Mar 2019 21:22:35 +0000 (21:22 +0000)]
[HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.
In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.
The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:
... and prints after the dump of memory tags around the buggy address.
Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.
Shoaib Meenai [Fri, 8 Mar 2019 21:10:22 +0000 (21:10 +0000)]
[cmake] Remove llvm from LLVM_ALL_PROJECTS
LLVM is always built; including it in LLVM_ENABLE_PROJECTS has no
effect, but since it's in LLVM_ALL_PROJECTS, we produce a confusing
message about it being disabled. Drop it from LLVM_ALL_PROJECTS to avoid
this. Pointed out by David Greene on the mailing list [1].
Matt Arsenault [Fri, 8 Mar 2019 20:58:11 +0000 (20:58 +0000)]
AMDGPU: Move d16 load matching to preprocess step
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.
I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.
Matt Arsenault [Fri, 8 Mar 2019 20:46:15 +0000 (20:46 +0000)]
DAG: Don't try to cluster loads with tied inputs
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
Matt Arsenault [Fri, 8 Mar 2019 20:30:50 +0000 (20:30 +0000)]
AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.
Alexey Bataev [Fri, 8 Mar 2019 20:08:04 +0000 (20:08 +0000)]
[DEBUG_INFO][NVPTX]Emit empty .debug_loc section in presence of the debug option.
Summary:
If the LLVM module shows that it has debug info, but the file is
actually empty and the real debug info is not emitted, the ptxas tool
emits error 'Debug information not found in presence of .target debug'.
We need at leas one empty debug section to silence this message. Section
`.debug_loc` is not emitted for PTX and we can emit empty `.debug_loc`
section if `debug` option was emitted.
Wei Mi [Fri, 8 Mar 2019 19:25:32 +0000 (19:25 +0000)]
[RegisterCoalescer] Limit the number of joins for large live interval with
many valnos.
Recently we found compile time out problem in several cases when
SpeculativeLoadHardening was enabled. The significant compile time was spent
in register coalescing pass, where register coalescer tried to join many other
live intervals with some very large live intervals with many valnos.
Specifically, every time JoinVals::mapValues is called, computeAssignment will
be called by getNumValNums() times of the target live interval. If the large
live interval has N valnos and has N copies associated with it, trying to
coalescing those copies will at least cost N^2 complexity.
The patch adds some limit to the effort trying to join those very large live
intervals with others. By default, for live interval with > 100 valnos, and
when it has been coalesced with other live interval by more than 100 times,
we will stop coalescing for the live interval anymore. That put a compile
time cap for the N^2 algorithm and effectively solves the compile time
problem we saw.
The indexed variant of vfmal.f16 and vfmsl.f16
instructions use the uppser bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).
This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.
James Henderson [Fri, 8 Mar 2019 13:22:05 +0000 (13:22 +0000)]
[llvm-readelf]Don't lose negative-ness of negative addends for no symbol relocations
llvm-readelf prints relocation addends as:
<symbol value>[+-]<absolute addend>
where [+-] is determined from whether addend is less than zero or not.
However, it does not print the +/- if there is no symbol, which meant
that negative addends became their positive value with no indication
that this had happened. This patch stops the absolute conversion when
addends are negative and there is no associated symbol.
Nico Weber [Fri, 8 Mar 2019 13:01:58 +0000 (13:01 +0000)]
gn build: Unbreak finding a working `gn` on $PATH on Unix after r355645
From the Python subprocess docs:
If shell is True, it is recommended to pass args as a string rather than as
a sequence.
[...]
If args is a sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell itself.
Prior to this change, the `--version` would be passed to the shell, not to
a potential gn binary on $PATH, and running `gn` without any arguments makes
it exit with an exit code != 0, so the script would think that there wasn't
a working gn binary on $PATH.
Fix this by following the documentation's recommendation of using a string
now that we pass shell=True. I tested this on macOS and Windows, each with
the three cases of
- no gn on PATH (should run gn downloaded by get.py if present,
else suggest running get.py)
- broken gn wrapper on PATH (should behave like the previous item)
- working gn on PATH (should use gn on PATH)
Nico Weber [Fri, 8 Mar 2019 12:45:50 +0000 (12:45 +0000)]
gn build: Unbreak get.py and gn.py on Windows
`os.uname()` doesn't exist on Windows, so use `platform.machine()` which
returns `os.uname()[4]` on non-Win and (on 64-bit systems) "AMD64" on Windows.
Also use `sys.platform` instead of `platform` to check for Windows-ness for the
file extension in gn.py (get.py got this right).
Clement Courbet [Fri, 8 Mar 2019 09:07:45 +0000 (09:07 +0000)]
[SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.
This is sub-optimal because memcmp has to compute much more than
equality.
This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.
`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
Craig Topper [Fri, 8 Mar 2019 07:33:43 +0000 (07:33 +0000)]
[X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather.
We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.
For FP we now explicitly check for only double and float.
For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly.
We only check bitwidth after checking that the type is an integer.
Petr Hosek [Fri, 8 Mar 2019 05:35:22 +0000 (05:35 +0000)]
[runtimes] Move libunwind, libc++abi and libc++ to lib/ and include/
This change is a consequence of the discussion in "RFC: Place libs in
Clang-dedicated directories", specifically the suggestion that
libunwind, libc++abi and libc++ shouldn't be using Clang resource
directory. Tools like clangd make this assumption, but this is
currently not true for the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR build.
This change addresses that by moving the output of these libraries to
lib/<target> and include/ directories, leaving resource directory only
for compiler-rt runtimes and Clang builtin headers.
Steven Wu [Fri, 8 Mar 2019 05:27:53 +0000 (05:27 +0000)]
[Bitcode] Fix bitcode compatibility issue with clang.arc.use intrinsic
Summary:
In r349534, objc arc implementation is switched to use intrinsics and at
the same time, clang.arc.use is renamed to llvm.objc.clang.arc.use to
make the naming more consistent. The side-effect of that is llvm no
longer recognize it as intrinsics and codegen external references to
it instead.
Rather than upgrade the old intrinsics name to the new one and wait for
the arc-contract pass to remove it, simply remove it in the bitcode
upgrader.
Mitch Phillips [Thu, 7 Mar 2019 22:20:36 +0000 (22:20 +0000)]
[GN] Locate prebuilt binaries correctly.
Use the system shell to see if we can find a 'gn' binary on $PATH. This solves the error wherein subprocess.call fails ungracefully if the binary doesn't exist.
Hubert Tong [Thu, 7 Mar 2019 21:28:33 +0000 (21:28 +0000)]
Add secondary libstdc++ 4.8 and 5.1 detection mechanisms
Summary:
The date-based approach to detecting unsupported versions of libstdc++
does not handle bug fix releases of older versions. As an example, the
`__GLIBCXX__` value associated with version 5.1, `20150422`, is less
than the values associated with versions 4.8.5 and 4.9.3.
This patch adds secondary checks based on certain properties in
sufficiently new versions of libstdc++.
Craig Topper [Thu, 7 Mar 2019 21:22:56 +0000 (21:22 +0000)]
[X86] Correct scheduler information for rotate by constant for Haswell, Broadwell, and Skylake.
Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not.
The implicit rotate by 1 instruction is 2 uops. But the flags are merged after the rotate uop so the data result does not see the flag dependency. But I don't think we have any way of modeling that.
RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd.
Craig Topper [Thu, 7 Mar 2019 21:22:51 +0000 (21:22 +0000)]
[X86] Model ADC/SBB with immediate 0 more accurately in the Haswell scheduler model
Haswell and possibly Sandybridge have an optimization for ADC/SBB with immediate 0 to use a single uop flow. This only applies GR16/GR32/GR64 with an 8-bit immediate. It does not apply to GR8. It also does not apply to the implicit AX/EAX/RAX forms.
Brian Gesiak [Thu, 7 Mar 2019 20:40:55 +0000 (20:40 +0000)]
[CodeGen] Reuse BlockUtils for -unreachableblockelim pass (NFC)
Summary:
The logic in the -unreachableblockelim pass does the following:
1. It traverses the function it's given in depth-first order and
creates a set of basic blocks that are unreachable from the
function's entry node.
2. It iterates over each of those unreachable blocks and (1) removes any
successors' references to the dead block, and (2) replaces any uses of
instructions from the dead block with null.
The logic in (2) above is identical to what the `llvm::DeleteDeadBlocks`
function from `BasicBlockUtils.h` does. The only difference is that
`llvm::DeleteDeadBlocks` replaces uses of instructions from dead blocks
not with null, but with undef.
Replace the duplicate logic in the -unreachableblockelim pass with a
call to `llvm::DeleteDeadBlocks`. This results in less code but no
functional change (NFC).
Matt Davis [Thu, 7 Mar 2019 19:34:44 +0000 (19:34 +0000)]
[llvm-mca] Emit a message when no bottlenecks are identified.
Summary:
Since bottleneck hints are enabled via user request, it can be
confusing if no bottleneck information is presented. Such is the
case when no bottlenecks are identified. This patch emits a message
in that case.
Summary:
ShadowCallStack on x86_64 suffered from the same racy security issues as
Return Flow Guard and had performance overhead as high as 13% depending
on the benchmark. x86_64 ShadowCallStack was always an experimental
feature and never shipped a runtime required to support it, as such
there are no expected downstream users.
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105
#1 0x16936bc in llvm::User::operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/User.cpp:151:19
#2 0x7c3fe9 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:12
#3 0x7c3fe9 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136
#4 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#5 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474
#6 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
#7 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
#8 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
#9 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#10 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257
#11 0x1a6b656 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
#12 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50
#13 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
Indirect leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105
#1 0x151be6b in make_unique<llvm::ValueSymbolTable> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/ADT/STLExtras.h:1349:29
#2 0x151be6b in llvm::Function::Function(llvm::FunctionType*, llvm::GlobalValue::LinkageTypes, unsigned int, llvm::Twine const&, llvm::Module*) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/Function.cpp:241
#3 0x7c4006 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:16
#4 0x7c4006 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136
#5 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#6 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474
#7 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
#8 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
#9 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
#10 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#11 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257
#12 0x1a6b656 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
#13 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50
#14 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
SUMMARY: AddressSanitizer: 168 byte(s) leaked in 2 allocation(s).
```
See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/11358/steps/check-llvm%20asan/logs/stdio for more information.
Also introduces use-of-uninitialized-value in ConstantsTest.FoldGlobalVariablePtr:
```
==7070==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x14e703c in User /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5
#1 0x14e703c in Constant /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/Constant.h:44
#2 0x14e703c in llvm::GlobalValue::GlobalValue(llvm::Type*, llvm::Value::ValueTy, llvm::Use*, unsigned int, llvm::GlobalValue::LinkageTypes, llvm::Twine const&, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalValue.h:78
#3 0x14e5467 in GlobalObject /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalObject.h:34:9
#4 0x14e5467 in llvm::GlobalVariable::GlobalVariable(llvm::Type*, bool, llvm::GlobalValue::LinkageTypes, llvm::Constant*, llvm::Twine const&, llvm::GlobalValue::ThreadLocalMode, unsigned int, bool) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/Globals.cpp:314
#5 0x6938f1 in llvm::(anonymous namespace)::ConstantsTest_FoldGlobalVariablePtr_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm/unittests/IR/ConstantsTest.cpp:565:18
#6 0x1a240a1 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc
#7 0x1a240a1 in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2474
#8 0x1a26d26 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
#9 0x1a2815f in testing::TestCase::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
#10 0x1a43de8 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
#11 0x1a42c47 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc
#12 0x1a42c47 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4257
#13 0x1a0dfba in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
#14 0x1a0dfba in main /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50
#15 0x7f2081c412e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#16 0x4dff49 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/IR/IRTests+0x4dff49)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5 in User
```
See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30222/steps/check-llvm%20msan/logs/stdio for more information.
Florian Hahn [Thu, 7 Mar 2019 17:50:16 +0000 (17:50 +0000)]
[InterleavedAccessAnalysis] Fix integer overflow in insertMember.
Without checking for integer overflow, invalid members can be added
e.g. if the calculated key overflows, becomes positive and the largest key.
This fixes
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229
Petar Jovanovic [Thu, 7 Mar 2019 16:31:08 +0000 (16:31 +0000)]
[DebugInfo] Fix the type of the formated variable
Change the format type of *Personality and *LSDAAddress to PRIx64 since
they are of type uint64_t.
The problem was detected on mips builds, where it was printing junk values
and causing test failure.
Nico Weber [Thu, 7 Mar 2019 15:44:59 +0000 (15:44 +0000)]
gn build: Port r342002
I had hoped we could remove the dependency on shell32.lib from lib/Support
(there isn't much depending on it), but looks like this will take a while. So
for now, port this over.
David Green [Thu, 7 Mar 2019 13:44:40 +0000 (13:44 +0000)]
[LSR] Attempt to increase the accuracy of LSR's setup cost
In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.
This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.
Petar Avramovic [Thu, 7 Mar 2019 13:28:29 +0000 (13:28 +0000)]
[MIPS GlobalISel] Fix mul operands
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.
Fangrui Song [Thu, 7 Mar 2019 11:42:59 +0000 (11:42 +0000)]
[IDF] Delete a redundant J-edge test
In the DJ-graph based computation of iterated dominance frontiers,
SuccNode->getIDom() == Node is one of the tests to check if (Node,Succ)
is a J-edge. If it is true, since Node is dominated by Root,
SuccLevel = level(Node)+1 > RootLevel
which means the next test SuccLevel > RootLevel will also be true. test
the check is redundant and can be deleted as it also involves one
indirection and provides no speed-up.
Kristof Beyls [Thu, 7 Mar 2019 10:14:38 +0000 (10:14 +0000)]
Add newline to interpreter debugging output
When running lli --debug --force-interpreter=true the executed instructions are
printed but are missing newlines. This commit adds the missing newlines.
Aakanksha Patil [Thu, 7 Mar 2019 00:54:04 +0000 (00:54 +0000)]
AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.
Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.
Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.
Simon Atanasyan [Wed, 6 Mar 2019 22:40:28 +0000 (22:40 +0000)]
[mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR`
MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current
frame only. It's better to show an error message then crash on assertion
if `__builtin_return_address` is invoked with non-zero argument.
Philip Reames [Wed, 6 Mar 2019 19:27:13 +0000 (19:27 +0000)]
[AtomicExpand] Allow libcall expansion for non-zero address spaces (try 2)
Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage.
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
This reverts rL355522 (https://reviews.llvm.org/D57335).
Kills buildbots that use '-Werror' with the following error:
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm/lib/IR/Value.cpp:657:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
See buildbots http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30200/steps/check-llvm%20asan/logs/stdio for more information.
Guozhi Wei [Wed, 6 Mar 2019 18:22:22 +0000 (18:22 +0000)]
[PPC] Adjust the computed branch offset for the possible shorter distance
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.
actual estimated
address address
...
bne Far 100 10c
.p2align 4
Near: 110 110
...
Far: 8108 8108
Ryan Taylor [Wed, 6 Mar 2019 17:02:06 +0000 (17:02 +0000)]
[AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled
Reland "[Remarks] Refactor remark diagnostic emission in a RemarkStreamer"
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Teresa Johnson [Wed, 6 Mar 2019 14:57:40 +0000 (14:57 +0000)]
[CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC)
Summary:
In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.
We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.
I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.
From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s
[Remarks] Refactor remark diagnostic emission in a RemarkStreamer
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
George Rimar [Wed, 6 Mar 2019 14:12:18 +0000 (14:12 +0000)]
[llvm-objcopy] - Remove dead code. NFCI.
DecompressedSection can only be created if --decompress-debug-sections is specified.
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/ELF/ELFObjcopy.cpp#L492
If it is specified when !zlib::isAvailable(), we error out early when parsing the options:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/CopyConfig.cpp#L657
What means the code I am removing in this patch is dead.