Martin Storsjö [Mon, 26 Dec 2016 22:22:48 +0000 (00:22 +0200)]
arm: Load mb_y properly in mbtree_propagate_list_internal_neon
The previous version, which attempted to load two stack parameters at once,
would only have worked if they were interpreted and loaded as 32-bit
elements, not when loading them as 16-bit elements.
Martin Storsjö [Wed, 16 Nov 2016 08:57:31 +0000 (10:57 +0200)]
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.
Henrik Gramner [Sat, 8 Oct 2016 15:20:18 +0000 (17:20 +0200)]
x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we choose to use another register for this purpose, we should not pick
eax/rax since it can be overwritten as a return value.
Martin Storsjö [Mon, 14 Nov 2016 21:54:51 +0000 (23:54 +0200)]
checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 6 (for aarch64) parameters
are passed on the stack to checkasm_checked_call, we actually only
need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64)
parameters on the stack when calling the tested function.
Janne Grunau [Mon, 14 Nov 2016 21:54:50 +0000 (23:54 +0200)]
checkasm: arm: preserve the stack alignment in x264_checkasm_checked_call
The stack used by x264_checkasm_checked_call_neon was only aligned to a
multiple of 4 when the checked function was called. AAPCS requires a
double-word (8-byte) aligned stack at public interfaces. Since both calls are
public interfaces, the stack is misaligned when the checked function is called.
This can cause issues if code called within this (which includes
the C implementations) relies on the stack alignment.
Martin Storsjö [Wed, 16 Nov 2016 08:56:14 +0000 (10:56 +0200)]
arm: Don't use vcmp.f64 for testing for an all-zeros register
On iOS, vcmp.f64 can behave as if the register were zero if the
register (interpreted as an f64) holds a denormal number.
The vcmp.f64 (and other VFP instructions) will trap to the kernel
(which is supposed to implement the FP operation, which it apparently
doesn't do properly on iOS) if the value is a denormal. If this happens,
the whole comparison ends up way more costly.
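As a rough C-level illustration of the idea (the actual change is in ARM
assembly, which is not reproduced here), an "all bits zero" test can be done
on the raw 64-bit pattern instead of through a floating-point comparison. The
helper name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reports whether the 64-bit pattern held in a
 * double-sized register is exactly zero, without using any FP compare.
 * Denormals (and -0.0) have non-zero bit patterns, so they are reported
 * as non-zero here regardless of how the FPU would treat them. */
static bool bits_all_zero(double reg)
{
    uint64_t bits;
    memcpy(&bits, &reg, sizeof bits); /* reinterpret the bits safely */
    return bits == 0;
}
```

Because only integer operations are involved, the result does not depend on
how denormals are handled by the FPU or the kernel.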
Anton Mitrofanov [Wed, 21 Sep 2016 21:17:48 +0000 (00:17 +0300)]
Correctly signal max_dec_frame_buffering with --keyint 1
According to E.2.1 it is inferred to be equal to 0 only if profile_idc is equal
to 44, 86, 100, 110, 122, or 244 and constraint_set3_flag is equal to 1.
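The inference condition quoted above can be written as a small predicate.
This is only a sketch of the rule as stated in this message; the function and
parameter names are hypothetical and are not taken from the x264 source.

```c
#include <stdbool.h>

/* Hypothetical helper: true when, per the rule quoted above, a missing
 * max_dec_frame_buffering would be inferred to be 0. */
static bool infers_zero_dec_frame_buffering(int profile_idc,
                                            bool constraint_set3_flag)
{
    switch (profile_idc) {
    case 44: case 86: case 100: case 110: case 122: case 244:
        return constraint_set3_flag;
    default:
        return false; /* any other profile: no inference to 0 */
    }
}
```

When the predicate is false, the value cannot be left to inference and has to
be signalled explicitly.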
Henrik Gramner [Thu, 28 Jul 2016 19:58:40 +0000 (21:58 +0200)]
cli: Prefetch yuv/y4m input frames on Windows 8 and newer
Use PrefetchVirtualMemory() (if available) on memory-mapped input frames.
Significantly improves performance when the source file is not already
present in the OS page cache by asking the OS to bring in those pages from
disk using large, concurrent I/O requests.
Most beneficial on fast encoding settings. Up to 40% faster overall with
--preset ultrafast, and up to 20% faster overall with --preset veryfast.
This API was introduced in Windows 8, so call it conditionally. On older
Windows systems the previous behavior remains unchanged.
Henrik Gramner [Thu, 28 Jul 2016 17:34:04 +0000 (19:34 +0200)]
Adjust --preset slow
* Swap --me umh for --trellis 2. They have a similar effect on performance
but the latter gives slightly better results in most cases.
* Change --b-adapt from 2 to 1. Negligible difference in quality since the
b-adapt 1 improvements, but it's significantly faster.
Also remove a redundant assignment from veryfast (--me hex is set by default).
Janne Grunau [Fri, 26 Aug 2016 17:26:55 +0000 (20:26 +0300)]
arm/aarch64: use plane_copy wrapper macros
Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.
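A minimal sketch of the kind of wrapper that avoids such overreads, assuming a
block-based copy routine that always works in units of 16 bytes. This is not
the macro from common/mc.h; both function names are hypothetical, and the
"fast" routine is a plain C stand-in for an assembly implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a SIMD routine: always copies whole 16-byte blocks, so it
 * must only be called with a width that is a multiple of 16. */
static void copy_rows_blocks16(uint8_t *dst, ptrdiff_t dst_stride,
                               const uint8_t *src, ptrdiff_t src_stride,
                               int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x += 16)
            memcpy(dst + x, src + x, 16);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Wrapper: the block routine handles the part of each row that is a
 * multiple of 16; the remaining 0..15 bytes are copied exactly. */
static void plane_copy_safe(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int width, int height)
{
    int aligned = width & ~15;
    int tail = width - aligned;

    if (aligned > 0)
        copy_rows_blocks16(dst, dst_stride, src, src_stride, aligned, height);
    if (tail > 0)
        for (int y = 0; y < height; y++)
            memcpy(dst + y * dst_stride + aligned,
                   src + y * src_stride + aligned, (size_t)tail);
}
```

With this split, no byte beyond `width` is read from or written to any row,
even when the user-supplied buffer has no padding.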
Henrik Gramner [Sun, 24 Apr 2016 12:10:22 +0000 (14:10 +0200)]
configure: Add link-time optimization support
Enabled by using the --enable-lto configuration option.
May give a slight performance improvement in some cases, but it can
also reduce performance in other cases (largely compiler-dependent)
so don't enable it by default. It also makes compilation (and linking
in particular) a fair bit slower.
Note that some older versions of GNU binutils will incorrectly warn
about "memset used with constant zero length parameter" when linking
using LTO. This is due to a bug in binutils and can safely be ignored.
Henrik Gramner [Wed, 13 Apr 2016 15:53:49 +0000 (17:53 +0200)]
Eliminate some compiler warnings on BSD
Include <strings.h> in addition to <string.h>. According to the POSIX
specification the prototypes for strcasecmp() and strncasecmp() are
declared in <strings.h>. On some systems they are also declared in
<string.h> for compatibility reasons but we shouldn't rely on that.
Define _POSIX_C_SOURCE only when it's required to do so. Some BSD
variants don't declare certain function prototypes otherwise.
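For illustration, the include pattern described in the first paragraph looks
like this. The helper is hypothetical and only exists to show a
strcasecmp() call with the POSIX-specified header in place.

```c
#include <string.h>   /* strlen(), memcpy(), ... */
#include <strings.h>  /* strcasecmp(), strncasecmp() per POSIX */

/* Hypothetical helper: case-insensitive comparison of an option name. */
static int is_option(const char *arg, const char *name)
{
    return strcasecmp(arg, name) == 0;
}
```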
Henrik Gramner [Fri, 4 Mar 2016 16:53:08 +0000 (17:53 +0100)]
x86: Add asm for mbtree fixed point conversion
The QP offsets of each macroblock are stored as floats internally and
converted to big-endian Q8.8 fixed point numbers when written to the 2-pass
stats file, and converted back to floats when read from the stats file.
Add SSSE3 and AVX2 implementations for conversions in both directions.
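A scalar sketch of the conversion in both directions, assuming the value is
truncated toward zero and fits the Q8.8 range [-128, 128). The function names
are hypothetical, and the byte order is handled explicitly so that the sketch
behaves the same on any host.

```c
#include <stdint.h>

/* float -> big-endian Q8.8 (8 integer bits, 8 fractional bits).
 * Assumes qp_offset * 256 fits in int16_t. */
static void q88_pack_be(uint8_t out[2], float qp_offset)
{
    int16_t fixed = (int16_t)(qp_offset * 256.0f);
    uint16_t bits = (uint16_t)fixed;   /* two's complement bit pattern */
    out[0] = (uint8_t)(bits >> 8);     /* most significant byte first */
    out[1] = (uint8_t)(bits & 0xFF);
}

/* big-endian Q8.8 -> float */
static float q88_unpack_be(const uint8_t in[2])
{
    int bits = (in[0] << 8) | in[1];
    int fixed = bits >= 0x8000 ? bits - 0x10000 : bits; /* sign-extend */
    return (float)fixed * (1.0f / 256.0f);
}
```

For example, 1.5 is stored as the bytes 0x01 0x80 and -2.25 as 0xFD 0xC0; both
round-trip exactly because they are multiples of 1/256.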
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
Henrik Gramner [Sun, 31 Jan 2016 20:50:52 +0000 (21:50 +0100)]
ffms: Various improvements
* Drop the MinGW Unicode workarounds. Those were required at the time
Windows Unicode support was added to x264 but the underlying problem
has since been fixed in FFMS.
* Use FFMS_IndexBelongsToFile() as an additional sanity check when reading
an index file to ensure that it belongs to the current source video.
* Upgrade to the new API to prevent deprecation warnings when compiling.
* Fix a resource leak that would occur if FFMS_GetFirstTrackOfType() or
FFMS_CreateVideoSource() failed.
* Minor string handling adjustments related to progress reporting.
This increases the FFMS version requirement from 2.16.2 to 2.21.0.
Henrik Gramner [Sun, 24 Jan 2016 00:48:18 +0000 (01:48 +0100)]
msvs: WinRT support
To compile x264 for WinRT the following additional steps have to be performed:
* Ensure that the necessary SDK is installed.
* Set the correct environment variables in the VS command prompt as shown at
https://trac.ffmpeg.org/wiki/CompilationGuide/WinRT
* Add one of the following to --extra-cflags depending on the target OS:
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0A00" (Windows 10)
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0603" (Windows 8.1)
Anton Mitrofanov [Sun, 10 Apr 2016 17:13:59 +0000 (20:13 +0300)]
Use the correct default B-ref placement with B-pyramid
The cost analysis functions expect the B-ref in a sequence of an even number
of B-frames to be placed towards the beginning, while the actual placement
was towards the end.
Change the placement to be consistent with the analyse expectations, e.g.
PbbBbP -> PbBbbP.
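The placement can be illustrated with a small sketch. The index formula
(n - 1) / 2 is an assumption chosen to match the example above (it picks the
middle frame for odd counts and the earlier of the two middle frames for even
counts); the function names are hypothetical and not taken from the x264
source.

```c
#include <stddef.h>

/* 0-based index of the referenced B-frame in a run of n B-frames,
 * or -1 when the run is too short to contain a B-ref. */
static int bref_index(int n)
{
    return n > 1 ? (n - 1) / 2 : -1;
}

/* Writes a pattern such as "PbBbbP" into out (needs n + 3 bytes). */
static void gop_pattern(char *out, int n)
{
    size_t pos = 0;
    int bref = bref_index(n);

    out[pos++] = 'P';
    for (int i = 0; i < n; i++)
        out[pos++] = (i == bref) ? 'B' : 'b';
    out[pos++] = 'P';
    out[pos] = '\0';
}
```

With n = 4 this yields PbBbbP, the new placement; replacing (n - 1) / 2 with
n / 2 would reproduce the old PbbBbP.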
Alexey Samsonov [Tue, 26 Jan 2016 00:05:25 +0000 (16:05 -0800)]
Fix float-cast-overflow in x264_ratecontrol_end function
According to the C standard, it is undefined behavior to cast a negative
floating point number to an unsigned integer. Float-cast-overflow in
general is known to produce different results on different architectures.
Building x264 code with Clang and -fsanitize=float-cast-overflow
(http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#available-checks)
and running it on some real-life examples occasionally produces errors
of the form:
encoder/ratecontrol.c:1892: runtime error: value -5011.14 is outside the
range of representable values of type 'unsigned short'
Fix these errors by explicitly coding the de-facto x86 behavior: casting
float to uint16_t through int16_t.
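A minimal sketch of the cast described above; the function name is
hypothetical. The float is first converted to int16_t, which is defined as
long as the truncated value fits that type, and the int16_t is then converted
to uint16_t, which is always defined (reduction modulo 65536).

```c
#include <stdint.h>

/* Hypothetical helper: float -> uint16_t via int16_t.
 * Still undefined if value truncates outside [-32768, 32767]. */
static uint16_t float_to_u16_wrapping(float value)
{
    return (uint16_t)(int16_t)value;
}
```

For the value from the error message above, -5011.14 truncates to -5011, which
wraps to 60525 as a uint16_t.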