Alexandre Ganea [Thu, 28 Feb 2019 03:03:07 +0000 (03:03 +0000)]
Fix non-Windows platforms build break introduced by r355065. Fixes:
In file included from /home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm/lib/Support/Memory.cpp:14:
/home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm/include/llvm/Support/Memory.h:38:14: error: private field 'Flags' is not used [-Werror,-Wunused-private-field]
unsigned Flags = 0;
^
1 error generated.
Alexandre Ganea [Thu, 28 Feb 2019 02:47:34 +0000 (02:47 +0000)]
[Memory] Add basic support for large/huge memory pages
This patch introduces Memory::MF_HUGE_HINT which indicates that allocateMappedMemory() shall return a pointer to a large memory page.
However the flag is a hint because we're not guaranteed in any way that we will get back a large memory page. There are several restrictions:
- Large/huge memory pages aren't enabled by default on modern OSes (Windows 10 and Linux at least), and should be manually enabled/reserved.
- Once enabled, it should be kept in mind that large pages are physical only, they can't be swapped.
- Memory fragmentation can affect the availability of large pages, especially after running the OS for a long time and/or running along many other applications.
Memory::allocateMappedMemory() will fallback to 4KB pages if it can't allocate 2MB large pages (if Memory::MF_HUGE_HINT is provided)
Currently, Memory::MF_HUGE_HINT only works on Windows. The hint will be ignored on Linux, 4KB pages will always be returned.
Eric Christopher [Thu, 28 Feb 2019 01:11:12 +0000 (01:11 +0000)]
Temporarily revert "ArgumentPromotion should copy all metadata to new Function" and the dependent patch "Refine ArgPromotion metadata handling" as they're causing segfaults in argument promotion.
Reid Kleckner [Wed, 27 Feb 2019 23:38:44 +0000 (23:38 +0000)]
[InstrProf] Use separate comdat group for data and counters
Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.
In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.
Alina Sbirlea [Wed, 27 Feb 2019 22:20:22 +0000 (22:20 +0000)]
[MemorySSA] Make insertDef insert corresponding phi nodes.
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.
However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.
But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.
Matt Davis [Wed, 27 Feb 2019 21:39:11 +0000 (21:39 +0000)]
[llvm-cxxfilt] Re-enable split and demangle stdin input on certain non-alphanumerics.
This restores the patch that splits demangled stdin input on
non-alphanumerics. I had reverted this patch earlier because it broke
Windows build-bots. I have updated the test so that it passes on
Windows.
I was running the test from powershell and never saw the issue until I
switched to the mingw shell.
Philip Reames [Wed, 27 Feb 2019 20:20:08 +0000 (20:20 +0000)]
Seperate volatility and atomicity/ordering in SelectionDAG
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
This test failure seems to be a quoting issue between my test and
FileCheck on Windows. I'm reverting this patch until I can replicate
and fix in my Windows environment.
Simon Pilgrim [Wed, 27 Feb 2019 18:46:32 +0000 (18:46 +0000)]
[X86][AVX] Pull out some INSERT_SUBVECTOR combines into a combineConcatVectorOps helper. NFCI
A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node.
I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids the issue of handling a number of pre-legalization issues that I've encountered.
Matt Davis [Wed, 27 Feb 2019 18:39:17 +0000 (18:39 +0000)]
[llvm-readobj] Print section type values for unknown sections.
Summary:
This patch displays a hexadecimal section value (Elf_Shdr::sh_type) or section-relative offset when printing unknown sections.
Here is a subset of the output (ignoring the fields following "Type" when dumping an ELF's GNU `--section-headers` table).
Section Headers:
```
[Nr] Name Type
[16] android_rel LOOS+0x1
[17] android_rela LOOS+0x2
[27] unknown 0x1000: <unknown>
[28] loos LOOS+0
[30] hios VERSYM
[31] loproc LOPROC+0
[33] hiproc LOPROC+0xFFFFFFF
[34] louser LOUSER+0
[36] hiuser LOUSER+0x7FFFFFFF
```
As a comparison, the previous output looked something like the above, but with a blank "Type" field:
```
[Nr] Name Type
[27] unknown
[28] loos
[30] hios VERSYM
[31] loproc
[33] hiproc
[34] louser
[36] hiuser
```
Matt Davis [Wed, 27 Feb 2019 18:04:21 +0000 (18:04 +0000)]
[llvm-cxxfilt] Re-enable the delimiters test on Windows.
The original intent was to enable this test for Windows; however, that
initial patch broke one of the build-bots. I temporarily disabled this
test on Windows until that issue was resolved. It was resolved in my
previous patch (cfd1d9742ee2d1b8dd6b7), and now I am re-enabling this
test.
Matt Davis [Wed, 27 Feb 2019 17:39:36 +0000 (17:39 +0000)]
Clean up the delimiters test.
Ideally this is a NFCI, used single quotes in most cases. Hopefully
this will make the Windows bot happy.
I've marked this unsupported on windows, until I get my windows box
setup with this patch to test. I'll remove that constraint after I'm
confident this will pass on windows. I just want to silence the
buildbots for now.
James Henderson [Wed, 27 Feb 2019 16:41:59 +0000 (16:41 +0000)]
[llvm-readobj]Add additional testing for various ELF features
This patch adds testing of areas of the code that are not fully tested,
in particular dynamic table printing, ELF type printing, handling of
edge cases where things are missing/empty (relocations/program header
tables/section header table), and the --string-dump switch.
Only a small problem here, I have no idea on choosing test case. I see there's a test
file(test/tools/llvm-objdump/private-headers-dynamic-section.test). But it has no DT_RPATH and DT_RUNPATH tags. Shall I replace the ELF file in the
Inputs dir by a new one?
Matt Davis [Wed, 27 Feb 2019 16:29:50 +0000 (16:29 +0000)]
[llvm-cxxfilt] Split and demangle stdin input on certain non-alphanumerics.
Summary:
This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling.
Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled.
From what I have tested, GNU c++filt also splits input on any character that does not make
up the mangled name (notably commas, but also a large set of non-alphanumeric characters).
This patch splits stdin input on any character that does not belong to the Itanium mangling
format (since Itanium is currently the only supported format in llvm-cxxfilt).
Eugene Leviant [Wed, 27 Feb 2019 14:46:59 +0000 (14:46 +0000)]
[DebugInfo] Apply subprogram attributes on behalf of owner CU
When using full LTO it is possible that template function definition DIE
is bound to one compilation unit and it's declaration to another. We should
add function declaration attributes on behalf of its owner CU otherwise
we may end up with malformed file identifier in function declaration
DW_AT_decl_file attribute.
Alexey Lapshin [Wed, 27 Feb 2019 13:17:36 +0000 (13:17 +0000)]
[DebugInfo] add SectionedAddress to DebugInfo interfaces.
That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703
"wrong line number info for obj file compiled with -ffunction-sections"
bug. The problem happened with only .o files. If object file contains
several .text sections then line number information showed incorrectly.
The reason for this is that DwarfLineTable could not detect section which
corresponds to specified address(because address is the local to the
section). And as the result it could not select proper sequence in the
line table. The fix is to pass SectionIndex with the address. So that it
would be possible to differentiate addresses from various sections. With
this fix llvm-objdump shows correct line numbers for disassembled code.
James Henderson [Wed, 27 Feb 2019 11:07:08 +0000 (11:07 +0000)]
[llvm-readobj]Fix error messages for bad archive members and add testing for archive handling
llvm-readobj's error messages were broken for bad archive members. This
patch fixes them, and also adds testing for archive and thin archive
handling within llvm-readobj.
Yonghong Song [Wed, 27 Feb 2019 05:36:15 +0000 (05:36 +0000)]
[BPF] Don't fail for static variables
Currently, the LLVM will print an error like
Unsupported relocation: try to compile with -O2 or above,
or check your static variable usage
if user defines more than one static variables in a single
ELF section (e.g., .bss or .data).
There is ongoing effort to support static and global
variables in libbpf and kernel. This patch removed the
assertion so user programs with static variables won't
fail compilation.
The static variable in-section offset is written to
the "imm" field of the corresponding to-be-relocated
bpf instruction. Below is an example to show how the
application (e.g., libbpf) can relate variable to relocations.
-bash-4.4$ cat g1.c
static volatile long a = 2;
static volatile int b = 3;
int test() { return a + b; }
-bash-4.4$ clang -target bpf -O2 -c g1.c
-bash-4.4$ llvm-readelf -r g1.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c
2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a
3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test
-bash-4.4$ llvm-objdump -d g1.o
. from symbol table, static variable "a" is in section #4, offset 0.
. from symbol table, static variable "b" is in section #4, offset 8.
. the first relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 0 (see llvm-objdump result)
. the second relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 8 (see llvm-objdump result)
. therefore, the first relocation is for variable "a", and
the second relocation is for variable "b".
Heejin Ahn [Wed, 27 Feb 2019 01:35:14 +0000 (01:35 +0000)]
[WebAssembly] Fix ScopeTops info in CFGStackify for EH pads
Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only `end_try` -> `try` mapping but also `catch` ->
`try` mapping as well. If this is not created, `block` and `end_block`
markers later added may span across an existing `catch`, resulting in
the incorrect code like:
```
try
block --| (X)
catch |
end_block --|
end_try
```
DWARFFormValues can be created from a data extractor or by passing its
value directly. Until now this was done by member functions that
modified an existing object's internal state. This patch replaces a
subset of these methods with static method that return a new
DWARFFormValue.
Heejin Ahn [Wed, 27 Feb 2019 00:50:53 +0000 (00:50 +0000)]
[WebAssembly] Remove unnecessary instructions after TRY marker placement
Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
deleted.
Jonas Paulsson [Wed, 27 Feb 2019 00:18:28 +0000 (00:18 +0000)]
[SystemZ] Pass regalloc hints to help Load-and-Test transformations.
Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).
This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().
Rong Xu [Tue, 26 Feb 2019 22:37:46 +0000 (22:37 +0000)]
[PGO] Context sensitive PGO (part 1)
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.
Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.
Kristina Brooks [Tue, 26 Feb 2019 18:53:13 +0000 (18:53 +0000)]
Update docs of memcpy/move/set wrt. align and len
Fix https://bugs.llvm.org/show_bug.cgi?id=38583: Describe
how memcpy/memmove/memset behave when len=0. Also fix
some fallout from when the alignment parameter was
replaced by an attribute.
Andrew Ng [Tue, 26 Feb 2019 18:50:49 +0000 (18:50 +0000)]
[TableGen] Make OpcodeMappings sort comparator deterministic NFCI
The previous sort comparator was not deterministic, i.e. in some
situations it would be possible for lhs < rhs && rhs < lhs. This was
discovered by an STL assertion in a Windows debug build of llvm-tblgen.
Sanjay Patel [Tue, 26 Feb 2019 18:26:56 +0000 (18:26 +0000)]
[InstSimplify] remove zero-shift-guard fold for general funnel shift
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html
We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).
That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.
The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.
1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.
Jonas Paulsson [Tue, 26 Feb 2019 16:47:59 +0000 (16:47 +0000)]
[SystemZ] Wait with selection of legal vector/FP constants until Select().
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.
Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.
A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().
Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.
Sanjay Patel [Tue, 26 Feb 2019 16:44:08 +0000 (16:44 +0000)]
[InstSimplify] add tests for rotate; NFC
Rotate is a special-case of funnel shift that has different
poison constraints than the general case. That's not visible
yet in the existing tests, but it needs to be corrected.
Sanjay Patel [Tue, 26 Feb 2019 15:18:49 +0000 (15:18 +0000)]
[InstCombine] canonicalize more unsigned saturated add with 'not'
Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613
There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:
Andrea Di Biagio [Tue, 26 Feb 2019 14:19:00 +0000 (14:19 +0000)]
[MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls.
Dispatch stall cycles may be associated to multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Dan Gohman [Tue, 26 Feb 2019 05:20:19 +0000 (05:20 +0000)]
[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.
This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.
Philip Reames [Tue, 26 Feb 2019 04:30:33 +0000 (04:30 +0000)]
[ARM] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601> https://reviews.llvm.org/D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Heejin Ahn [Tue, 26 Feb 2019 03:29:59 +0000 (03:29 +0000)]
[WebAssembly] Improve readability of EH tests
Summary:
- Indent check lines to easily figure out try-catch-end structure
- Add the original C++ code the tests were genereated from
- Add a few more lines to make the structure more readable
- Rename a couple function / structures
- Add label and branch annotations to cfg-stackify-eh.ll
- Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll`
because it will be updated in a later CL soon and there's no point of
making it look better here
Reid Kleckner [Tue, 26 Feb 2019 02:30:00 +0000 (02:30 +0000)]
[llvm-cov] Fix llvm-cov on Windows and un-XFAIL test
Summary:
The llvm-cov tool needs to be able to find coverage names in the
executable, so the .lprfn and .lcovmap sections cannot be merged into
.rdata.
Also, the linker merges .lprfn$M into .lprfn, so llvm-cov needs to
handle that when looking up sections. It has to support running on both
relocatable object files and linked PE files.
Lastly, when loading .lprfn from a PE file, llvm-cov needs to skip the
leading zero byte added by the profile runtime.
Reid Kleckner [Tue, 26 Feb 2019 02:11:25 +0000 (02:11 +0000)]
[X86] Fix bug in x86_intrcc with arg copy elision
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.