Henrik Gramner [Wed, 29 Mar 2017 14:43:57 +0000 (16:43 +0200)]
x86inc: Fix call with memory operands
We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.
Henrik Gramner [Fri, 24 Mar 2017 23:02:11 +0000 (00:02 +0100)]
checkasm: Fix load_deinterleave_chroma_fdec test
The function only writes to parts of the destination buffer but the test
verifies the content of the entire buffer. The problem is that some earlier
IDCT functions clobber the same part of the buffer with garbage when
benchmarked, which would incorrectly cause test failures.
Fix this by explicitly zeroing the buffers beforehand.
Henrik Gramner [Fri, 24 Mar 2017 21:27:42 +0000 (22:27 +0100)]
checkasm: Fix compilation on hardened x86-64 ELF systems
Normal PC-relative relocations cannot be used for resolving the address of
external symbols on systems where ASLR results in the offset being larger
than 32 bits. We are required to go through the PLT instead.
Martin Storsjö [Thu, 23 Mar 2017 13:05:37 +0000 (15:05 +0200)]
configure: Always enable PIC in aarch64 assembly for apple platforms
This is similar to what we do for 32-bit ARM assembly as well.
Fixes linker errors such as `ld: Absolute addressing not allowed in
arm64 code but used in '_x264_cabac_encode_terminal_asm' referencing
'_x264_cabac_range_lps' for architecture arm64`.
Martin Storsjö [Mon, 26 Dec 2016 22:22:48 +0000 (00:22 +0200)]
arm: Load mb_y properly in mbtree_propagate_list_internal_neon
The previous version, which attempted to load two stack parameters at once,
would only have worked if they were interpreted and loaded as 32-bit
elements, not when loading them as 16-bit elements.
Martin Storsjö [Wed, 16 Nov 2016 08:57:31 +0000 (10:57 +0200)]
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.
Henrik Gramner [Sat, 8 Oct 2016 15:20:18 +0000 (17:20 +0200)]
x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we choose to use another register for this purpose, we should not pick
eax/rax since it can be overwritten as a return value.
Martin Storsjö [Mon, 14 Nov 2016 21:54:51 +0000 (23:54 +0200)]
checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 6 (for aarch64) parameters
are passed on the stack to checkasm_checked_call, we actually only
need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64)
parameters on the stack when calling the tested function.
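The arithmetic above can be sketched in C as follows. This is only an illustration: the value of MAX_ARGS and the assumption that the wrapper takes two leading arguments of its own (which is what the difference between the two counts suggests) are not stated in the commit message, and the helper names are hypothetical.

```c
#include <assert.h>

enum { MAX_ARGS = 15 };             /* assumed value, for illustration only */
enum { WRAPPER_LEADING_ARGS = 2 };  /* assumed: wrapper-only arguments passed first */

/* Number of arguments that spill to the stack when `reg_slots`
 * arguments can be passed in registers. */
int stack_args(int total_args, int reg_slots)
{
    assert(total_args >= 0 && reg_slots >= 0);
    return total_args > reg_slots ? total_args - reg_slots : 0;
}

/* Stack arguments received by the wrapper itself. */
int wrapper_stack_args(int reg_slots)
{
    return stack_args(MAX_ARGS + WRAPPER_LEADING_ARGS, reg_slots);
}

/* Stack arguments that must be stored for the tested function. */
int callee_stack_args(int reg_slots)
{
    return stack_args(MAX_ARGS, reg_slots);
}
```

With 4 register slots (arm) this gives MAX_ARGS - 2 and MAX_ARGS - 4, and with 8 register slots (aarch64) it gives MAX_ARGS - 6 and MAX_ARGS - 8, matching the counts in the message.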
Janne Grunau [Mon, 14 Nov 2016 21:54:50 +0000 (23:54 +0200)]
checkasm: arm: preserve the stack alignment in x264_checkasm_checked_call
The stack used by x264_checkasm_checked_call_neon was only aligned to a
multiple of 4 when the checked function is called. AAPCS requires a double
word (8 byte) aligned stack at public interfaces. Since both calls are
public interfaces, the stack is misaligned when the checked function is called.
This can cause issues if code called within this (which includes
the C implementations) relies on the stack alignment.
Martin Storsjö [Wed, 16 Nov 2016 08:56:14 +0000 (10:56 +0200)]
arm: Don't use vcmp.f64 for testing for an all-zeros register
On iOS, vcmp.f64 can behave as if the register was zero if the
register (interpreted as an f64) holds a denormal number.
The vcmp.f64 (and other VFP instructions) will trap to the kernel
(which is supposed to implement the FP operation, which it apparently
doesn't do properly on iOS) if the value is a denormal. If this happens,
the whole comparison ends up way more costly.
Anton Mitrofanov [Wed, 21 Sep 2016 21:17:48 +0000 (00:17 +0300)]
Correctly signal max_dec_frame_buffering with --keyint 1
According to E.2.1, max_dec_frame_buffering is inferred to be equal to 0 only
if profile_idc is equal to 44, 86, 100, 110, 122, or 244 and
constraint_set3_flag is equal to 1.
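The inference rule quoted above can be written as a small predicate. This is a minimal sketch of the condition as stated in the message, not the encoder's actual code; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when an absent max_dec_frame_buffering would be inferred
 * to be 0 under the rule quoted in the commit message. */
bool infers_zero_frame_buffering(int profile_idc, int constraint_set3_flag)
{
    assert(constraint_set3_flag == 0 || constraint_set3_flag == 1);
    switch (profile_idc) {
    case 44: case 86: case 100: case 110: case 122: case 244:
        return constraint_set3_flag == 1;
    default:
        return false;
    }
}
```

For any other combination, for example profile_idc 77 (Main), the value is not inferred to be 0, so an intra-only stream has to signal it explicitly.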
Henrik Gramner [Thu, 28 Jul 2016 19:58:40 +0000 (21:58 +0200)]
cli: Prefetch yuv/y4m input frames on Windows 8 and newer
Use PrefetchVirtualMemory() (if available) on memory-mapped input frames.
Significantly improves performance when the source file is not already
present in the OS page cache by asking the OS to bring in those pages from
disk using large, concurrent I/O requests.
Most beneficial on fast encoding settings. Up to 40% faster overall with
--preset ultrafast, and up to 20% faster overall with --preset veryfast.
This API was introduced in Windows 8, so call it conditionally. On older
Windows systems the previous behavior remains unchanged.
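One way to call an API conditionally is to look it up at run time, roughly as sketched below. This is an assumption-laden illustration rather than the code from this commit: the helper name prefetch_range and the locally declared range struct (mirroring the layout of WIN32_MEMORY_RANGE_ENTRY) are hypothetical, and on non-Windows builds the helper simply does nothing.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Local mirror of WIN32_MEMORY_RANGE_ENTRY so that older SDK headers suffice. */
typedef struct { void *addr; SIZE_T bytes; } mem_range_entry;
typedef BOOL (WINAPI *prefetch_fn)(HANDLE, ULONG_PTR, void *, ULONG);
#endif

/* Ask the OS to page in [addr, addr + bytes).
 * Returns 1 if a prefetch was issued, 0 if the API is unavailable or failed. */
int prefetch_range(void *addr, size_t bytes)
{
    assert(addr != NULL);
#ifdef _WIN32
    HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
    prefetch_fn fn = k32
        ? (prefetch_fn)GetProcAddress(k32, "PrefetchVirtualMemory")
        : NULL;
    if (!fn)
        return 0; /* older than Windows 8: keep the previous behavior */
    mem_range_entry entry = { addr, bytes };
    return fn(GetCurrentProcess(), 1, &entry, 0) ? 1 : 0;
#else
    (void)bytes;
    return 0;     /* not Windows: nothing to do */
#endif
}
```

Prefetching is only a hint, so a caller can ignore the return value; reading the mapped frame works the same way whether or not the call succeeded.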
Henrik Gramner [Thu, 28 Jul 2016 17:34:04 +0000 (19:34 +0200)]
Adjust --preset slow
* Swap --me umh for --trellis 2. They have a similar effect on performance
but the latter gives slightly better results in most cases.
* Change --b-adapt from 2 to 1. Negligible difference in quality since the
b-adapt 1 improvements, but it's significantly faster.
Also remove a redundant assignment from veryfast (--me hex is set by default).
Janne Grunau [Fri, 26 Aug 2016 17:26:55 +0000 (20:26 +0300)]
arm/aarch64: use plane_copy wrapper macros
Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.
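The kind of overread described above, and one way a wrapper can avoid it, is sketched below. The wrapper macros themselves are not shown in this log, so this is a simplified assumption: copy_core16 is a hypothetical stand-in for an assembly kernel that always copies whole 16-byte blocks, and the stride check is a conservative choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a kernel that only handles widths that are multiples of 16. */
static void copy_core16(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    assert(w % 16 == 0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x += 16)
            memcpy(dst + y * dst_stride + x, src + y * src_stride + x, 16);
}

/* Copy a w x h plane without reading past the end of the last source row. */
void plane_copy_safe(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    int padded = (w + 15) & ~15; /* width rounded up to a multiple of 16 */

    if (h <= 0 || w <= 0)
        return;
    if (padded == w) {
        copy_core16(dst, dst_stride, src, src_stride, w, h);
        return;
    }
    if (dst_stride >= padded && src_stride >= padded) {
        /* Rows before the last: the extra bytes stay inside the row's stride. */
        if (h > 1)
            copy_core16(dst, dst_stride, src, src_stride, padded, h - 1);
    } else {
        /* Strides too tight for the wide kernel: copy exact widths instead. */
        for (int y = 0; y < h - 1; y++)
            memcpy(dst + y * dst_stride, src + y * src_stride, (size_t)w);
    }
    /* Last row: exact-width copy, so nothing beyond the buffer is touched. */
    memcpy(dst + (h - 1) * dst_stride, src + (h - 1) * src_stride, (size_t)w);
}
```

The last row is the critical one because a user-supplied buffer may end right after its final valid pixel, so rounding the width up there would read memory the caller never allocated.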
Henrik Gramner [Sun, 24 Apr 2016 12:10:22 +0000 (14:10 +0200)]
configure: Add link-time optimization support
Enabled by using the --enable-lto configuration option.
May give a slight performance improvement in some cases, but it can
also reduce performance in other cases (largely compiler-dependent)
so don't enable it by default. It also makes compilation (and linking
in particular) a fair bit slower.
Note that some older versions of GNU binutils will incorrectly warn
about "memset used with constant zero length parameter" when linking
using LTO. This is due to a bug in binutils and can safely be ignored.
Henrik Gramner [Wed, 13 Apr 2016 15:53:49 +0000 (17:53 +0200)]
Eliminate some compiler warnings on BSD
Include <strings.h> in addition to <string.h>. According to the POSIX
specification the prototypes for strcasecmp() and strncasecmp() are
declared in <strings.h>. On some systems they are also declared in
<string.h> for compatibility reasons but we shouldn't rely on that.
Define _POSIX_C_SOURCE only when it's required to do so. Some BSD
variants don't declare certain function prototypes otherwise.
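As a small illustration of the include fix described above, a source file that uses the case-insensitive comparisons would pull in both headers. The helper names here are hypothetical and only serve to exercise the two functions.

```c
#include <assert.h>
#include <string.h>   /* strlen() and the other ISO C string functions */
#include <strings.h>  /* strcasecmp(), strncasecmp() (POSIX) */

/* Case-insensitive comparison of an option value against a known name. */
int option_matches(const char *value, const char *name)
{
    assert(value && name);
    return strcasecmp(value, name) == 0;
}

/* Case-insensitive check that `value` starts with `prefix`. */
int option_has_prefix(const char *value, const char *prefix)
{
    assert(value && prefix);
    return strncasecmp(value, prefix, strlen(prefix)) == 0;
}
```

Without <strings.h>, a compiler on a system that does not mirror these prototypes in <string.h> would warn about an implicit function declaration.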
Henrik Gramner [Fri, 4 Mar 2016 16:53:08 +0000 (17:53 +0100)]
x86: Add asm for mbtree fixed point conversion
The QP offsets of each macroblock are stored as floats internally and
converted to big-endian Q8.8 fixed point numbers when written to the 2-pass
stats file, and converted back to floats when read from the stats file.
Add SSSE3 and AVX2 implementations for conversions in both directions.
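A scalar sketch of the two conversions described above is given below for reference. It is an illustration under stated assumptions, not the project's C or assembly code: the function names are hypothetical, values are truncated rather than rounded, and inputs are assumed to lie within the Q8.8 range of [-128, 128).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack floats into big-endian Q8.8: scale by 256, truncate to 16 bits,
 * then store the most significant byte first. */
void fix8_pack(uint8_t *dst, const float *src, size_t count)
{
    assert(dst && src);
    for (size_t i = 0; i < count; i++) {
        uint16_t v = (uint16_t)(int16_t)(src[i] * 256.0f);
        dst[2 * i]     = (uint8_t)(v >> 8);
        dst[2 * i + 1] = (uint8_t)(v & 0xff);
    }
}

/* Unpack big-endian Q8.8 back into floats. */
void fix8_unpack(float *dst, const uint8_t *src, size_t count)
{
    assert(dst && src);
    for (size_t i = 0; i < count; i++) {
        uint16_t v = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
        dst[i] = (int16_t)v * (1.0f / 256.0f);
    }
}
```

For example, 1.5 is stored as the bytes 0x01 0x80 and -2.25 as 0xFD 0xC0. Writing the bytes explicitly keeps the sketch independent of host endianness.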
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.