Supported targets:
- debian amd64
- debian aarch64
- windows 32 bit
- windows 64 bit
- macos 64bit
The tests are ran on all supported targets (via wine on windows).
The release jobs are only available on master/stable branches in
videolan/x264 repository, and must be ran manually when a developer
wishes to upload the artifacts.
Henrik Gramner [Sat, 16 Feb 2019 16:57:21 +0000 (17:57 +0100)]
x86inc: Improve warnings for use of unsupported instructions
Warn when the following are used without the appropriate cpuflag:
* YMM and ZMM registers
* 'pextrw' with a memory operand
* GPR instruction set extensions
Henrik Gramner [Sun, 12 Aug 2018 15:00:13 +0000 (17:00 +0200)]
x86: Always use PIC in x86-64 asm
Most x86-64 operating systems nowadays doesn't even allow .text relocations
in object files any more, and there is no measurable overall performance
difference from using RIP-relative addressing in x264 asm.
Enforcing PIC reduces complexity and simplifies testing.
Virtually zero increase in compression efficiency compared to 4:2:0 with empty
chroma planes. Performance is better though, especially with fast settings.
Henrik Gramner [Sun, 22 Apr 2018 20:49:15 +0000 (22:49 +0200)]
x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros
Use register numbers instead of copying the full register names. This makes it
possible to change register widths in the middle of a function and keep the
mmreg permutations intact which can be useful for code that only needs larger
vectors for parts of the function in combination with macros etc.
Also change the LOAD_MM_PERMUTATION macro to use the same default name as the
SAVE macro. This simplifies swapping from ymm to xmm registers or vice versa:
Henrik Gramner [Sat, 31 Mar 2018 11:49:56 +0000 (13:49 +0200)]
x86inc: Optimize VEX instruction encoding
Most VEX-encoded instructions require an additional byte to encode when src2
is a high register (e.g. x|ymm8..15). If the instruction is commutative we
can swap src1 and src2 when doing so reduces the instruction length, e.g.
Henrik Gramner [Sat, 2 Jun 2018 18:35:10 +0000 (20:35 +0200)]
Fix clang stack alignment issues
Clang emits aligned AVX stores for things like zeroing stack-allocated
variables when using -mavx even with -fno-tree-vectorize set which can
result in crashes if this occurs before we've realigned the stack.
Previously we only ensured that the stack was realigned before calling
assembly functions that accesses stack-allocated buffers but this is
not sufficient. Fix the issue by changing the stack realignment to
instead occur immediately in all CLI, API and thread entry points.
Martin Storsjö [Fri, 30 Mar 2018 21:10:14 +0000 (00:10 +0300)]
configure: Only use gas-preprocessor with armasm for compiler=CL
This picks the right assembler automatically for arm and aarch64
llvm-mingw targets.
This doesn't get the right assembler for clang setups when clang
acts like MSVC and uses MSVC headers though (where it perhaps
should use armasm as before), but that's probably an even more
obscure setup.
Henrik Gramner [Sun, 22 Oct 2017 07:59:28 +0000 (09:59 +0200)]
input: Add a workaround for swscale overread bugs
swscale can read past the end of the input buffer, which may result in
crashes if such a read crosses a page boundary into an invalid page.
Work around this by adding some padding space at the end of the buffer when
using memory-mapped input frames. This may sometimes require copying the
last frame into a new buffer on Windows since the Microsoft memory-mapping
implementation has very limited capabilities compared to POSIX systems.
Martin Storsjö [Wed, 18 Oct 2017 07:40:02 +0000 (10:40 +0300)]
aarch64: Use ldurb/sturb for loads/stores with negative offsets
The assembler (both gas and clang/llvm) automatically fixes this,
armasm64 doesn't. We can fix it in gas-preprocessor, but we should
also be using the right instruction form.
Martin Storsjö [Mon, 16 Oct 2017 19:50:26 +0000 (22:50 +0300)]
arm: Check for __ELF__ instead of !__APPLE__, for using .arch/.fpu
For windows, when building with armasm, we already filtered these out
with gas-preprocessor.
By filtering them out already in the source, we can also build directly
with clang for windows (which also require wrapping the assembler in
gas-preprocessor for converting instructions to thumb form, but
gas-preprocessor doesn't and shouldn't filter out them in the clang
configuration).
Henrik Gramner [Sat, 14 Oct 2017 12:11:26 +0000 (14:11 +0200)]
Shrink the i4x4_mode cost_table array
Only 17 elements are actually used. It was originally padded to 64 bytes to
avoid cache line splits in the x86 assembly, but those haven't really been
an issue on x86 CPU:s made in the past decade or so.
Benchmarking shows no performance impact from dropping the padding, so
might as well remove it and save some cache.
Henrik Gramner [Wed, 11 Oct 2017 16:02:26 +0000 (18:02 +0200)]
x86: Remove some legacy CPU detection hacks
Some ancient Pentium-M and Core 1 CPU:s had slow SSE units, and using MMX
was preferable. Nowadays many assembly functions in x264 completely lack MMX
implementations and falling back to C code will likely make things worse.
Some misconfigured virtualized systems could sometimes also trigger this code
path and cause assertions.
* Use the codec parameters API instead of the AVStream codec field.
* Use av_packet_unref() instead of av_free_packet().
* Use the AVFrame pts field instead of pkt_pts.