]> granicus.if.org Git - llvm/log
llvm
5 years agoFix cppcheck shadow variable name warnings. NFCI.
Simon Pilgrim [Sat, 12 Oct 2019 16:36:52 +0000 (16:36 +0000)]
Fix cppcheck shadow variable name warnings. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374668 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.
Simon Pilgrim [Sat, 12 Oct 2019 16:36:44 +0000 (16:36 +0000)]
[X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374667 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Adjust error handling for decode introduced by r374665
Joel E. Denny [Sat, 12 Oct 2019 16:25:46 +0000 (16:25 +0000)]
[lit] Adjust error handling for decode introduced by r374665

On that decode, Windows bots fail with:

```
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
```

That's the same error as before r374665 except it's now at the decode
before the write to stdout.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374666 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Try yet again to fix new tests that fail on Windows bots
Joel E. Denny [Sat, 12 Oct 2019 16:00:35 +0000 (16:00 +0000)]
[lit] Try yet again to fix new tests that fail on Windows bots

I seem to have misread the bot logs on my last attempt.  When lit's
internal diff runs on Windows under Python 2.7, it's text diffs not
binary diffs that need decoding to avoid this error when writing the
diff to stdout:

```
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
```

There is no `decode` attribute in this case under Python 3.6.8 under
Ubuntu, so this patch checks for the `decode` attribute before using
it here.  Hopefully nothing else is needed when `decode` isn't
available.

It might take a couple more attempts to figure out what error
handling, if any, is needed for this decoding.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374665 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoRevert r374657: "[lit] Try again to fix new tests that fail on Windows bots"
Joel E. Denny [Sat, 12 Oct 2019 16:00:25 +0000 (16:00 +0000)]
Revert r374657: "[lit] Try again to fix new tests that fail on Windows bots"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374664 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition
Roman Lebedev [Sat, 12 Oct 2019 15:35:32 +0000 (15:35 +0000)]
[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition

Summary:
This is a recommit, this originally landed in rL370454 but was
subsequently reverted in  rL370788 due to
https://bugs.llvm.org/show_bug.cgi?id=43206
The reduced testcase was added to bcmp-negative-tests.ll
as @pr43206_different_loops - we must ensure that the SCEV's
we got are both for the same loop we are currently investigating.

Original commit message:

@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374662 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206.
Roman Lebedev [Sat, 12 Oct 2019 15:35:16 +0000 (15:35 +0000)]
[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206.

The transform forgot to check SCEV loop scopes.

https://bugs.llvm.org/show_bug.cgi?id=43206

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374661 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[NFC][LoopIdiom] Move one bcmp test into the proper place
Roman Lebedev [Sat, 12 Oct 2019 15:35:09 +0000 (15:35 +0000)]
[NFC][LoopIdiom] Move one bcmp test into the proper place

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374660 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction
Simon Pilgrim [Sat, 12 Oct 2019 15:19:13 +0000 (15:19 +0000)]
[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction

This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374658 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Try again to fix new tests that fail on Windows bots
Joel E. Denny [Sat, 12 Oct 2019 14:58:43 +0000 (14:58 +0000)]
[lit] Try again to fix new tests that fail on Windows bots

Based on the bot logs, when lit's internal diff runs on Windows, it
looks like binary diffs must be decoded also for Python 2.7.
Otherwise, writing the diff to stdout fails with:

```
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
```

I did not need to decode using Python 2.7.15 under Ubuntu.  When I do
it anyway in that case, `errors="backslashreplace"` fails for me:

```
TypeError: don't know how to handle UnicodeDecodeError in error callback
```

However, `errors="ignore"` works, so this patch uses that, hoping
it'll work on Windows as well.

This patch leaves `errors="backslashreplace"` for Python >= 3.5 as
there's no evidence yet that doesn't work and it produces more
informative binary diffs.  This patch also adjusts some lit tests to
succeed for either error handler.

This patch adjusts changes introduced by D68664.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374657 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoRevert r374654: "[lit] Try to fix new tests that fail on Windows bots"
Joel E. Denny [Sat, 12 Oct 2019 14:58:30 +0000 (14:58 +0000)]
Revert r374654: "[lit] Try to fix new tests that fail on Windows bots"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374656 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[CostModel][X86] Improve sum reduction costs.
Simon Pilgrim [Sat, 12 Oct 2019 13:21:50 +0000 (13:21 +0000)]
[CostModel][X86] Improve sum reduction costs.

I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.

I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374655 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Try to fix new tests that fail on Windows bots
Joel E. Denny [Sat, 12 Oct 2019 13:08:21 +0000 (13:08 +0000)]
[lit] Try to fix new tests that fail on Windows bots

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374654 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Fix a few oversights in r374651 that broke some bots
Joel E. Denny [Sat, 12 Oct 2019 12:32:00 +0000 (12:32 +0000)]
[lit] Fix a few oversights in r374651 that broke some bots

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374653 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Fix internal diff's --strip-trailing-cr and use it
Joel E. Denny [Sat, 12 Oct 2019 11:58:30 +0000 (11:58 +0000)]
[lit] Fix internal diff's --strip-trailing-cr and use it

Using GNU diff, `--strip-trailing-cr` removes a `\r` appearing before
a `\n` at the end of a line.  Without this patch, lit's internal diff
only removes `\r` if it appears as the last character.  That seems
useless.  This patch fixes that.

This patch also adds `--strip-trailing-cr` to some tests that fail on
Windows bots when D68664 is applied.  Based on what I see in the bot
logs, I think the following is happening.  In each test there, lit
diff is comparing a file with `\r\n` line endings to a file with `\n`
line endings.  Without D68664, lit diff reads those files with
Python's universal newlines support activated, causing `\r` to be
dropped.  However, with D68664, lit diff reads the files in binary
mode instead and thus reports that every line is different, just as
GNU diff does (at least under Ubuntu).  Adding `--strip-trailing-cr`
to those tests restores the previous behavior while permitting the
behavior of lit diff to be more like GNU diff.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D68839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374652 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoReland r374392: [lit] Extend internal diff to support -U
Joel E. Denny [Sat, 12 Oct 2019 11:58:03 +0000 (11:58 +0000)]
Reland r374392: [lit] Extend internal diff to support -U

To avoid breaking some tests, D66574, D68664, D67643, and D68668
landed together.  However, D68664 introduced an issue now addressed by
D68839, with which these are now all relanding.

Differential Revision: https://reviews.llvm.org/D68668

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374651 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoReland r374390: [lit] Extend internal diff to support `-` argument
Joel E. Denny [Sat, 12 Oct 2019 11:57:41 +0000 (11:57 +0000)]
Reland r374390: [lit] Extend internal diff to support `-` argument

To avoid breaking some tests, D66574, D68664, D67643, and D68668
landed together.  However, D68664 introduced an issue now addressed by
D68839, with which these are now all relanding.

Differential Revision: https://reviews.llvm.org/D67643

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374650 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoReland r374389: [lit] Clean up internal diff's encoding handling
Joel E. Denny [Sat, 12 Oct 2019 11:57:20 +0000 (11:57 +0000)]
Reland r374389: [lit] Clean up internal diff's encoding handling

To avoid breaking some tests, D66574, D68664, D67643, and D68668
landed together.  However, D68664 introduced an issue now addressed by
D68839, with which these are now all relanding.

Differential Revision: https://reviews.llvm.org/D68664

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374649 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoReland r374388: [lit] Make internal diff work in pipelines
Joel E. Denny [Sat, 12 Oct 2019 11:56:57 +0000 (11:56 +0000)]
Reland r374388: [lit] Make internal diff work in pipelines

To avoid breaking some tests, D66574, D68664, D67643, and D68668
landed together.  However, D68664 introduced an issue now addressed by
D68839, with which these are now all relanding.

Differential Revision: https://reviews.llvm.org/D66574

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374648 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Attributor] Extend anonymous namespace. NFC.
Benjamin Kramer [Sat, 12 Oct 2019 11:01:52 +0000 (11:01 +0000)]
[Attributor] Extend anonymous namespace. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374647 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[LV] Merge LLVM_DEBUG blocks.
Benjamin Kramer [Sat, 12 Oct 2019 10:57:22 +0000 (10:57 +0000)]
[LV] Merge LLVM_DEBUG blocks.

Avoids unused variable warnings about the range-based for loops in
there. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374646 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the...
Craig Topper [Sat, 12 Oct 2019 07:59:29 +0000 (07:59 +0000)]
[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits.

Since the input type is larger than 256-bits we'll need to some
concatenating to reassemble the results. The pack instructions
ability to concatenate while packing make this a shorter/faster
sequence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374643 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Test SKX cpu in the vector-trunc-packus/ssat/usat.ll tests instad of min-legal...
Craig Topper [Sat, 12 Oct 2019 07:59:24 +0000 (07:59 +0000)]
[X86] Test SKX cpu in the vector-trunc-packus/ssat/usat.ll tests instad of min-legal-vector-width.ll

This adds "min-legal-vector-width"="256" function attributes to
all the tests for a larger than 256-bit input. Also switch any
larger than 512-bit inputs to use a load. This makes the
arguments consistent with min-legal-vector-width attribute which
should usually be at least as large as the arguments.

The SKX configuration will avoid using zmm registers on the
modified test cases. For many of them we should use something
closer to the AVX2 codegen with pack instructions instead of
the avx512 saturating truncates.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374642 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Rely on GPR size not ABI when select instruction to load value into register
Simon Atanasyan [Sat, 12 Oct 2019 07:42:51 +0000 (07:42 +0000)]
[mips] Rely on GPR size not ABI when select instruction to load value into register

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374641 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Fix `loadImmediate` calls when load non-address values.
Simon Atanasyan [Sat, 12 Oct 2019 07:42:44 +0000 (07:42 +0000)]
[mips] Fix `loadImmediate` calls when load non-address values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374640 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Remove setting of the target-windows feature
Martin Storsjo [Sat, 12 Oct 2019 06:40:24 +0000 (06:40 +0000)]
[lit] Remove setting of the target-windows feature

No other OSes use a target-<os> feature, and no tests depend on it
any lomger.

Differential Revision: https://reviews.llvm.org/D68450

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374639 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[llvm-lipo] Pass ArrayRef by value.
Alexander Shaposhnikov [Sat, 12 Oct 2019 06:14:02 +0000 (06:14 +0000)]
[llvm-lipo] Pass ArrayRef by value.

Pass ArrayRef by value, fix formatting. NFC.

Test plan: make check-all

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374637 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoRevert 374629 "[sancov] Accommodate sancov and coverage report server for use under...
Vitaly Buka [Sat, 12 Oct 2019 05:23:43 +0000 (05:23 +0000)]
Revert 374629 "[sancov] Accommodate sancov and coverage report server for use under Windows"

http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/27650/steps/ninja%20check%201/logs/stdio
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/31759
http://lab.llvm.org:8011/builders/clang-s390x-linux-lnt/builds/15095
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/21075
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/31759

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374636 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoNFC: clang-format rL374420 and adjust comment wording
Hubert Tong [Sat, 12 Oct 2019 04:08:31 +0000 (04:08 +0000)]
NFC: clang-format rL374420 and adjust comment wording

The commit of rL374420 had various formatting issues, including lines
that exceed 80 columns. This patch applies `git clang-format` on the
changes from commit 13bd3ef40d8b1586f26a022e01b21e56c91e05bd.

It further adjusts a comment to clarify the domain of inputs upon which
a newly added function is meant to operate. The adjustment to the
comment was suggested in a post-commit comment on D68721 and discussed
off-list with @sfertile.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374635 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agorecommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separatel...
Zi Xuan Wu [Sat, 12 Oct 2019 02:53:04 +0000 (02:53 +0000)]
recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize

In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374634 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[sancov] Accommodate sancov and coverage report server for use under Windows
Vitaly Buka [Sat, 12 Oct 2019 02:29:26 +0000 (02:29 +0000)]
[sancov] Accommodate sancov and coverage report server for use under Windows

Summary:
This patch makes the following changes to SanCov and its complementary Python script in order to resolve issues pertaining to non-UNIX file paths in JSON symbolization information:
* Convert all paths to use forward slash.
* Update `coverage-report-server.py` to correctly handle paths to sources which contain spaces.
* Remove Linux platform restriction for all SanCov unit tests. All SanCov tests passed when ran on my local Windows machine.

Patch by Douglas Gliner.

Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman

Reviewed By: vitalybuka

Subscribers: vsk, Dor1s, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D51018

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374629 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[sancov] Use LLVM Support library JSON writer in favor of individual implementation
Vitaly Buka [Sat, 12 Oct 2019 02:29:24 +0000 (02:29 +0000)]
[sancov] Use LLVM Support library JSON writer in favor of individual implementation

Summary:
In this diff, I've replaced the individual implementation of `JSONWriter` with `json::OStream` provided by `llvm/Support/JSON.h`.

Important Note: The output format of the JSON is considerably different compared to the original implementation. Important differences include:
* New line for each entry in an array (should make diffs cleaner)
* No space between keys and colon in attributed object entries.
* Attributes with empty strings will now print the attribute name and a quote pair rather than excluding the attribute altogether

Examples of these differences can be seen in the changes to the sancov tests which compare the JSON output.

Patch by Douglas Gliner.

Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman

Subscribers: mehdi_amini, dexonsmith, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D68752

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374628 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[asan] Return true from instrumentModule
Vitaly Buka [Sat, 12 Oct 2019 01:50:36 +0000 (01:50 +0000)]
[asan] Return true from instrumentModule

createSanitizerCtorAndInitFunctions always change the module.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374623 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoDebugInfo: Fix msan use-of-uninitialized exposed by r374600
David Blaikie [Sat, 12 Oct 2019 00:27:12 +0000 (00:27 +0000)]
DebugInfo: Fix msan use-of-uninitialized exposed by r374600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374619 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[llvm-profdata] Make "malformed-ptr-to-counter-array.test" textual
Vedant Kumar [Sat, 12 Oct 2019 00:23:15 +0000 (00:23 +0000)]
[llvm-profdata] Make "malformed-ptr-to-counter-array.test" textual

As pointed out in https://reviews.llvm.org/D66979 post-commit, making
this test textual would make it more maintainable.

Differential Revision: https://reviews.llvm.org/D68718

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374617 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store.
Craig Topper [Sat, 12 Oct 2019 00:01:08 +0000 (00:01 +0000)]
[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store.

We already did this for VTRUNCUS with a specific combination of
types. This extends this to VTRUNCS and handles any types where
a truncating store is legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374615 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Add test case showing missing opportunity to fold vmovsdb into a store after...
Craig Topper [Sat, 12 Oct 2019 00:00:59 +0000 (00:00 +0000)]
[X86] Add test case showing missing opportunity to fold vmovsdb into a store after type legalization. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374614 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoDebugInfo: Reduce the scope of some variables related to debug_ranges emission
David Blaikie [Fri, 11 Oct 2019 23:51:24 +0000 (23:51 +0000)]
DebugInfo: Reduce the scope of some variables related to debug_ranges emission

Minor tidy up/NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374613 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: (manually) merge r374606 better
Nico Weber [Fri, 11 Oct 2019 23:22:36 +0000 (23:22 +0000)]
gn build: (manually) merge r374606 better

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374611 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: Merge r235758
GN Sync Bot [Fri, 11 Oct 2019 23:12:04 +0000 (23:12 +0000)]
gn build: Merge r235758

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374610 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: Cmanually) merge r374590
Nico Weber [Fri, 11 Oct 2019 23:05:24 +0000 (23:05 +0000)]
gn build: Cmanually) merge r374590

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374608 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.
Stanislav Mekhanoshin [Fri, 11 Oct 2019 22:28:04 +0000 (22:28 +0000)]
[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374607 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU] link dpp pseudos and real instructions on gfx10
Stanislav Mekhanoshin [Fri, 11 Oct 2019 22:03:36 +0000 (22:03 +0000)]
[AMDGPU] link dpp pseudos and real instructions on gfx10

This defaults to zero fi operand, but we do not expose it
anyway. Should we expose it later it needs to be added to
the pseudo.

This enables dpp combining on gfx10.

Differential Revision: https://reviews.llvm.org/D68888

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374604 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Small cleanups in main.py
Julian Lettner [Fri, 11 Oct 2019 21:57:09 +0000 (21:57 +0000)]
[lit] Small cleanups in main.py

* Extract separate function for running tests from main
* Push single-usage imports to point of usage
* Remove unnecessary sys.exit(0) calls

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D68836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374602 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[lit] Change regex filter to ignore case
Julian Lettner [Fri, 11 Oct 2019 21:57:06 +0000 (21:57 +0000)]
[lit] Change regex filter to ignore case

Make regex filter `--filter=REGEX` option more lenient via
`re.IGNORECASE`.

Reviewed By: yln

Differential Revision: https://reviews.llvm.org/D68834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374601 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoDebugInfo: Use base address selection entries for debug_loc
David Blaikie [Fri, 11 Oct 2019 21:52:41 +0000 (21:52 +0000)]
DebugInfo: Use base address selection entries for debug_loc

Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists.

Differential Revision: https://reviews.llvm.org/D68620

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374600 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Remove unused local variables. NFC
Simon Atanasyan [Fri, 11 Oct 2019 21:51:39 +0000 (21:51 +0000)]
[mips] Remove unused local variables. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374599 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Store 64-bit `li.d' operand as a single 8-byte value
Simon Atanasyan [Fri, 11 Oct 2019 21:51:33 +0000 (21:51 +0000)]
[mips] Store 64-bit `li.d' operand as a single 8-byte value

Now assembler generates two consecutive `.4byte` directives to store
64-bit `li.d' operand. The first directive stores high 4-byte of the
value. The second directive stores low 4-byte of the value. But on
64-bit system we load this value at once and get wrong result if the
system is little-endian.

This patch fixes the bug. It stores the `li.d' operand as a single
8-byte value.

Differential Revision: https://reviews.llvm.org/D68778

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374598 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Use less instruction to load zero into FPR by li.s / li.d pseudos
Simon Atanasyan [Fri, 11 Oct 2019 21:51:23 +0000 (21:51 +0000)]
[mips] Use less instruction to load zero into FPR by li.s / li.d pseudos

If `li.s` or `li.d` loads zero into a FPR, it's not necessary to load
zero into `at` GPR register and then move its value into a floating
point register. We can use as a source register the `zero / $0` one.

Differential Revision: https://reviews.llvm.org/D68777

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374597 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[GISel][UnitTest] Fix a bunch of tests that were not doing anything
Quentin Colombet [Fri, 11 Oct 2019 20:58:26 +0000 (20:58 +0000)]
[GISel][UnitTest] Fix a bunch of tests that were not doing anything

After r368065, all the tests using GISelMITest must call setUp() before
doing anything, otherwise the TargetMachine is not going to be set up.
A few tests added after that commit were not doing that and ended up
testing effectively nothing.

Fix the setup of all the tests and fix the failing tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374595 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoRevert 374373: [Codegen] Alter the default promotion for saturating adds and subs
David Green [Fri, 11 Oct 2019 20:33:03 +0000 (20:33 +0000)]
Revert 374373: [Codegen] Alter the default promotion for saturating adds and subs

This commit is not extending the promoted integers as it should. Reverting
whilst I look into the details.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374592 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Mips][llvm-exegesis] Add a Mips target
Simon Atanasyan [Fri, 11 Oct 2019 20:26:08 +0000 (20:26 +0000)]
[Mips][llvm-exegesis] Add a Mips target

The target does just enough to be able to run llvm-exegesis in latency
mode for at least some opcodes.

Patch by MiloÅ¡ Stojanović.

Differential Revision: https://reviews.llvm.org/D68649

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374590 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[GISel][CallLowering] Enable vector support in argument lowering
Quentin Colombet [Fri, 11 Oct 2019 20:22:57 +0000 (20:22 +0000)]
[GISel][CallLowering] Enable vector support in argument lowering

The exciting code is actually already enough to handle the splitting
of vector arguments but we were lacking a test case.

This commit adds a test case for vector argument lowering involving
splitting and enable the related support in call lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374589 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[MachineIRBuilder] Fix an assertion failure with buildMerge
Quentin Colombet [Fri, 11 Oct 2019 20:22:47 +0000 (20:22 +0000)]
[MachineIRBuilder] Fix an assertion failure with buildMerge

Teach buildMerge how to deal with scalar to vector kind of requests.

Prior to this patch, buildMerge would issue either a G_MERGE_VALUES
when all the vregs are scalars or a G_CONCAT_VECTORS when the destination
vreg is a vector.
G_CONCAT_VECTORS was actually not the proper instruction when the source
vregs were scalars and the compiler would assert that the sources must
be vectors. Instead we want is to issue a G_BUILD_VECTOR when we are
in this situation.

This patch fixes that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374588 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agollvm-dwarfdump: Add verbose printing for debug_loclists
David Blaikie [Fri, 11 Oct 2019 19:06:35 +0000 (19:06 +0000)]
llvm-dwarfdump: Add verbose printing for debug_loclists

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374582 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86][SSE] Add support for v4i8 add reduction
Simon Pilgrim [Fri, 11 Oct 2019 17:54:15 +0000 (17:54 +0000)]
[X86][SSE] Add support for v4i8 add reduction

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374579 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: (manually) merge r374110
Nico Weber [Fri, 11 Oct 2019 17:42:24 +0000 (17:42 +0000)]
gn build: (manually) merge r374110

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374575 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AArch64] add tests for (v)select-of-constants; NFC
Sanjay Patel [Fri, 11 Oct 2019 16:10:23 +0000 (16:10 +0000)]
[AArch64] add tests for (v)select-of-constants; NFC

These are copied from existing test files in x86/PPC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374568 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AArch64][SVE] Implement sdot and udot (lane) intrinsics
Kerry McLaughlin [Fri, 11 Oct 2019 15:53:41 +0000 (15:53 +0000)]
[AArch64][SVE] Implement sdot and udot (lane) intrinsics

Summary:
Implements the following arithmetic intrinsics:
  - int_aarch64_sve_sdot
  - int_aarch64_sve_sdot_lane
  - int_aarch64_sve_udot
  - int_aarch64_sve_udot_lane

This patch includes tests for the Subdivide4Argument type added by D67549

Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D67551

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374566 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[VPlan] Add moveAfter to VPRecipeBase.
Florian Hahn [Fri, 11 Oct 2019 15:36:55 +0000 (15:36 +0000)]
[VPlan] Add moveAfter to VPRecipeBase.

This patch adds a moveAfter method to VPRecipeBase, which can be used to
move elements after other elements, across VPBasicBlocks, if necessary.

Reviewers: dcaballe, hsaito, rengolin, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374565 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AIX] Use .space instead of .zero in assembly
David Tenty [Fri, 11 Oct 2019 15:07:28 +0000 (15:07 +0000)]
[AIX] Use .space instead of .zero in assembly

Summary:
The AIX system assembler does not understand .zero, so we should prefer
emitting .space.

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68815

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374564 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32
Dmitry Preobrazhensky [Fri, 11 Oct 2019 14:53:26 +0000 (14:53 +0000)]
[AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32

See https://bugs.llvm.org/show_bug.cgi?id=37941

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68787

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374561 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: Merge r374558
GN Sync Bot [Fri, 11 Oct 2019 14:48:31 +0000 (14:48 +0000)]
gn build: Merge r374558

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374560 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]*
Dmitry Preobrazhensky [Fri, 11 Oct 2019 14:44:51 +0000 (14:44 +0000)]
[AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]*

See https://bugs.llvm.org/show_bug.cgi?id=28232

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68788

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374559 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands
Dmitry Preobrazhensky [Fri, 11 Oct 2019 14:35:11 +0000 (14:35 +0000)]
[AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands

See https://bugs.llvm.org/show_bug.cgi?id=43524

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68785

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374557 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[llvm] [ocaml] Support linking against dylib
Michal Gorny [Fri, 11 Oct 2019 14:32:43 +0000 (14:32 +0000)]
[llvm] [ocaml] Support linking against dylib

Support linking OCaml modules against LLVM dylib when requested,
rather than against static libs that might not be installed at all.

Differential Revision: https://reviews.llvm.org/D68452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374556 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[DAGCombiner] fold vselect-of-constants to shift
Sanjay Patel [Fri, 11 Oct 2019 14:17:56 +0000 (14:17 +0000)]
[DAGCombiner] fold vselect-of-constants to shift

The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374555 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoFix compilation warnings. NFC.
Michael Liao [Fri, 11 Oct 2019 14:09:44 +0000 (14:09 +0000)]
Fix compilation warnings. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374554 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[AMDGPU][MC] Corrected parsing of optional operands
Dmitry Preobrazhensky [Fri, 11 Oct 2019 14:05:09 +0000 (14:05 +0000)]
[AMDGPU][MC] Corrected parsing of optional operands

See https://bugs.llvm.org/show_bug.cgi?id=43486

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D68350

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374553 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Follow-up to r374544. Fix test case.
Simon Atanasyan [Fri, 11 Oct 2019 12:58:37 +0000 (12:58 +0000)]
[mips] Follow-up to r374544. Fix test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374548 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj).
Kai Nacke [Fri, 11 Oct 2019 12:50:57 +0000 (12:50 +0000)]
[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj).

The command `od -t x` is used to dump data in hex format.
The LIT tests assumes that the hex characters are in lowercase.
However, there are also platforms which use uppercase letter.

To solve this issue the tests are updated to use the new
`--ignore-case` option of FileCheck.

Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson

Differential Revision: https://reviews.llvm.org/D68693

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374547 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[mips] Fix loading "double" immediate into a GPR and FPR
Simon Atanasyan [Fri, 11 Oct 2019 12:33:12 +0000 (12:33 +0000)]
[mips] Fix loading "double" immediate into a GPR and FPR

If a "double" (64-bit) value has zero low 32-bits, it's possible to load
such value into a GP/FP registers as an instruction immediate. But now
assembler loads only high 32-bits of the value.

For example, if a target register is GPR the `li.d $4, 1.0` instruction
converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000`
in the register. While a correct representation of the `1.0` value is
`0x3FF0000000000000`. The patch fixes that.

Differential Revision: https://reviews.llvm.org/D68776

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374544 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[llvm-readobj] - Remove excessive fields when dumping "Version symbols".
George Rimar [Fri, 11 Oct 2019 12:27:11 +0000 (12:27 +0000)]
[llvm-readobj] - Remove excessive fields when dumping "Version symbols".

This removes a few fields that are not useful:
"Section Name", "Address", "Offset" and "Link"
(they duplicated the information available under
the "Sections [" tag).

Differential revision: https://reviews.llvm.org/D68704

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374541 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoDead Virtual Function Elimination
Oliver Stannard [Fri, 11 Oct 2019 11:59:55 +0000 (11:59 +0000)]
Dead Virtual Function Elimination

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374539 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[FileCheck] Implement --ignore-case option.
Kai Nacke [Fri, 11 Oct 2019 11:59:14 +0000 (11:59 +0000)]
[FileCheck] Implement --ignore-case option.

The FileCheck utility is enhanced to support a `--ignore-case`
option. This is useful in cases where the output of Unix tools
differs in case (e.g. case not specified by Posix).

Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D68146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374538 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[SCEV] Add stricter verification option.
Florian Hahn [Fri, 11 Oct 2019 11:46:40 +0000 (11:46 +0000)]
[SCEV] Add stricter verification option.

Currently -verify-scev only fails if there is a constant difference
between two BE counts. This misses a lot of cases.

This patch adds a -verify-scev-strict options, which fails for any
non-zero differences, if used together with -verify-scev.

With the stricter checking, some unit tests fail because
of mis-matches, especially around IndVarSimplify.

If there is no reason I am missing for just checking constant deltas, I
am planning on looking into the various failures.

Reviewers: efriedma, sanjoy.google, reames, atrick

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D68592

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374535 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] isFNEG - add recursion depth limit
Simon Pilgrim [Fri, 11 Oct 2019 11:34:18 +0000 (11:34 +0000)]
[X86] isFNEG - add recursion depth limit

Now that its used by isNegatibleForFree we should try to avoid costly deep recursion

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374534 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[llvm-exegesis] Show noise cluster in analysis output.
Clement Courbet [Fri, 11 Oct 2019 11:33:18 +0000 (11:33 +0000)]
[llvm-exegesis] Show noise cluster in analysis output.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68780

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374533 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Windows] Use information from the PE32 exceptions directory to construct unwind...
Aleksandr Urakov [Fri, 11 Oct 2019 09:03:29 +0000 (09:03 +0000)]
[Windows] Use information from the PE32 exceptions directory to construct unwind plans

This patch adds an implementation of unwinding using PE EH info. It allows to
get almost ideal call stacks on 64-bit Windows systems (except some epilogue
cases, but I believe that they can be fixed with unwind plan disassembly
augmentation in the future).

To achieve the goal the CallFrameInfo abstraction was made. It is based on the
DWARFCallFrameInfo class interface with a few changes to make it less
DWARF-specific.

To implement the new interface for PECOFF object files the class PECallFrameInfo
was written. It uses the next helper classes:

- UnwindCodesIterator helps to iterate through UnwindCode structures (and
  processes chained infos transparently);
- EHProgramBuilder with the use of UnwindCodesIterator constructs EHProgram;
- EHProgram is, by fact, a vector of EHInstructions. It creates an abstraction
  over the low-level unwind codes and simplifies work with them. It contains
  only the information that is relevant to unwinding in the unified form. Also
  the required unwind codes are read from the object file only once with it;
- EHProgramRange allows to take a range of EHProgram and to build an unwind row
  for it.

So, PECallFrameInfo builds the EHProgram with EHProgramBuilder, takes the ranges
corresponding to every offset in prologue and builds the rows of the resulted
unwind plan. The resulted plan covers the whole range of the function except the
epilogue.

Reviewers: jasonmolenda, asmith, amccarth, clayborg, JDevlieghere, stella.stamenova, labath, espindola

Reviewed By: jasonmolenda

Subscribers: leonid.mashinskiy, emaste, mgorny, aprantl, arichardson, MaskRay, lldb-commits, llvm-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D67347

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374528 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoInsert module constructors in a module pass
Vitaly Buka [Fri, 11 Oct 2019 08:47:03 +0000 (08:47 +0000)]
Insert module constructors in a module pass

Summary:
If we insert them from function pass some analysis may be missing or invalid.
Fixes PR42877.

Reviewers: eugenis, leonardchan

Reviewed By: leonardchan

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68832

llvm-svn: 374481
Signed-off-by: Vitaly Buka <vitalybuka@google.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374527 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[TableGen] Fix a bug that MCSchedClassDesc is interfered between different SchedModel
QingShan Zhang [Fri, 11 Oct 2019 08:36:54 +0000 (08:36 +0000)]
[TableGen] Fix a bug that MCSchedClassDesc is interfered between different SchedModel

Assume that, ModelA has scheduling resource for InstA and ModelB has scheduling resource for InstB. This is what the llvm::MCSchedClassDesc looks like:

llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, -1,...
};

llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, -1,...
InstB, 0,...
};
The -1 means invalid num of macro ops, while it is valid if it is >=0. This is what we look like now:

llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, 0,...
};

llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, 0,...
InstB, 0,...
};
And compiler hit the assertion here because the SCDesc is valid now for both InstA and InstB.

Differential Revision: https://reviews.llvm.org/D67950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374524 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Add v8i64->v8i8 ssat/usat/packus truncate tests to min-legal-vector-width.ll
Craig Topper [Fri, 11 Oct 2019 07:24:36 +0000 (07:24 +0000)]
[X86] Add v8i64->v8i8 ssat/usat/packus truncate tests to min-legal-vector-width.ll

I wonder if we should split the v8i8 stores in order to form
two v4i8 saturating truncating stores. This would remove the
unpckl needed to concatenated the v4i8 results to make a
single store.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374519 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[ADT][Statistics] Fix test after rL374490
Kadir Cetinkaya [Fri, 11 Oct 2019 07:19:54 +0000 (07:19 +0000)]
[ADT][Statistics] Fix test after rL374490

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374518 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoFix modules build for r374337
Pavel Labath [Fri, 11 Oct 2019 07:16:19 +0000 (07:16 +0000)]
Fix modules build for r374337

A modules build failed with the following error:
  call to function 'operator&' that is neither visible in the template definition nor found by argument-dependent lookup

Fix that by declaring the appropriate operators in the llvm::minidump
namespace.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374517 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[PowerPC] Remove assertion "Shouldn't overwrite a register before it is killed"
Yi-Hong Lyu [Fri, 11 Oct 2019 05:32:29 +0000 (05:32 +0000)]
[PowerPC] Remove assertion "Shouldn't overwrite a register before it is killed"

The assertion is everzealous and fail tests like:

  renamable $x3 = LI8 0
  STD renamable $x3, 16, $x1
  renamable $x3 = LI8 0

Remove the assertion since killed flag of $x3 is not mandentory.

Differential Revision: https://reviews.llvm.org/D68344

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374515 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[NFC] run specific pass instead of whole -O3 pipeline for popcount recoginzation...
Chen Zheng [Fri, 11 Oct 2019 05:30:18 +0000 (05:30 +0000)]
[NFC] run specific pass instead of whole -O3 pipeline for popcount recoginzation testcase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374514 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[InstCombine] recognize popcount.
Chen Zheng [Fri, 11 Oct 2019 05:13:56 +0000 (05:13 +0000)]
[InstCombine] recognize popcount.

  This patch recognizes popcount intrinsic according to algorithm from website
  http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

Differential Revision: https://reviews.llvm.org/D68189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374512 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating trunca...
Craig Topper [Fri, 11 Oct 2019 04:16:49 +0000 (04:16 +0000)]
[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating truncating store.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374509 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Add test case for trunc_packus_v16i32_v16i8_store to min-legal-vector-width.ll
Craig Topper [Fri, 11 Oct 2019 04:02:04 +0000 (04:02 +0000)]
[X86] Add test case for trunc_packus_v16i32_v16i8_store to min-legal-vector-width.ll

We aren't folding the vpmovuswb into the store.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374507 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[CVP] Remove a masking operation if range information implies it's a noop
Philip Reames [Fri, 11 Oct 2019 03:48:56 +0000 (03:48 +0000)]
[CVP] Remove a masking operation if range information implies it's a noop

This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP.

InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle.

Differential Revision: https://reviews.llvm.org/D68811

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374506 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Add more packus/ssat/usat truncate tests from legal vectors to less than 128...
Craig Topper [Fri, 11 Oct 2019 03:46:39 +0000 (03:46 +0000)]
[X86] Add more packus/ssat/usat truncate tests from legal vectors to less than 128-bit vectors.

Some of these have sub-optimal codegen for avx512 relative to avx2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374505 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoRevert 374481 "[tsan,msan] Insert module constructors in a module pass"
Nico Weber [Fri, 11 Oct 2019 02:44:20 +0000 (02:44 +0000)]
Revert 374481 "[tsan,msan] Insert module constructors in a module pass"

CodeGen/sanitizer-module-constructor.c fails on mac and windows, see e.g.
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/11424

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374503 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[JITLink] Disable the MachO/AArch64 testcase while investigating bot failures.
Lang Hames [Fri, 11 Oct 2019 01:58:12 +0000 (01:58 +0000)]
[JITLink] Disable the MachO/AArch64 testcase while investigating bot failures.

The windows bots are failing due to a memory layout error. Temporarily disabling
while I investigate whether this can be worked around, or whether the test
should be disabled on Windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374500 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[JITLink] Fix MachO/arm64 GOTPAGEOFF encoding.
Lang Hames [Fri, 11 Oct 2019 01:50:31 +0000 (01:50 +0000)]
[JITLink] Fix MachO/arm64 GOTPAGEOFF encoding.

The original implementation failed to shift the immediate down.

This should fix some of the bot failures due to r374476.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374499 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Attributor][FIX] Do not replace musstail calls with constant
Johannes Doerfert [Fri, 11 Oct 2019 01:45:32 +0000 (01:45 +0000)]
[Attributor][FIX] Do not replace musstail calls with constant

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374498 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agoAMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG
Matt Arsenault [Fri, 11 Oct 2019 01:28:27 +0000 (01:28 +0000)]
AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374495 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[Stats] Add ALWAYS_ENABLED_STATISTIC enabled regardless of LLVM_ENABLE_STATS.
Volodymyr Sapsai [Fri, 11 Oct 2019 00:57:41 +0000 (00:57 +0000)]
[Stats] Add ALWAYS_ENABLED_STATISTIC enabled regardless of LLVM_ENABLE_STATS.

The intended usage is to measure relatively expensive operations. So the
cost of the statistic is negligible compared to the cost of a measured
operation and can be enabled all the time without impairing the
compilation time.

rdar://problem/55715134

Reviewers: dsanders, bogner, rtereshin

Reviewed By: dsanders

Subscribers: hiraditya, jkorous, dexonsmith, ributzka, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68252

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374490 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions...
Craig Topper [Fri, 11 Oct 2019 00:38:51 +0000 (00:38 +0000)]
[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions in more situations.

If we don't have VLX we won't end up selecting a saturating
truncate for 256-bit or smaller vectors so we should just use
the pack lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374487 91177308-0d34-0410-b5e6-96231b3b80d8

5 years ago[X86] Update trunc_packus_v32i32_v32i8 test in min-legal-vector-width.ll to use a...
Craig Topper [Fri, 11 Oct 2019 00:38:41 +0000 (00:38 +0000)]
[X86] Update trunc_packus_v32i32_v32i8 test in min-legal-vector-width.ll to use a load for the large type and add the min-legal-vector-width attribute.

The attribute is needed to avoid zmm registers. Using memory
avoids argument splitting for large vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374486 91177308-0d34-0410-b5e6-96231b3b80d8

5 years agogn build: Merge r374476
GN Sync Bot [Thu, 10 Oct 2019 23:49:59 +0000 (23:49 +0000)]
gn build: Merge r374476

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374482 91177308-0d34-0410-b5e6-96231b3b80d8