granicus.if.org Git - libx264/log
libx264
7 years agox86: AVX2 plane_copy_deinterleave
Henrik Gramner [Tue, 17 Jan 2017 20:59:47 +0000 (21:59 +0100)]
x86: AVX2 plane_copy_deinterleave

50% faster than SSSE3 in 8-bit.
25% faster than AVX in high bit-depth.

Also drop the MMX versions of deinterleave functions in favor of SSE2.

7 years agox86: AVX2 plane_copy_deinterleave_rgb
Henrik Gramner [Thu, 12 Jan 2017 21:16:53 +0000 (22:16 +0100)]
x86: AVX2 plane_copy_deinterleave_rgb

Around 15% faster than SSSE3.

7 years agox86: Faster plane_copy_deinterleave_rgb_sse2
Henrik Gramner [Thu, 12 Jan 2017 20:36:28 +0000 (21:36 +0100)]
x86: Faster plane_copy_deinterleave_rgb_sse2

50% faster than the previous SSE2 function.

7 years agox86util: Reduce code size of high bit-depth AVX LOAD_DIFF
Henrik Gramner [Sun, 15 Jan 2017 13:52:29 +0000 (14:52 +0100)]
x86util: Reduce code size of high bit-depth AVX LOAD_DIFF

AVX supports unaligned memory operands which makes the SATD code a bit denser.

7 years agoBump dates to 2017
Henrik Gramner [Sun, 1 Jan 2017 18:10:10 +0000 (19:10 +0100)]
Bump dates to 2017

7 years agoppc: Fix the pre-VSX vec_vsx_st() fallback macro
Alexandra Hájková [Sat, 21 Jan 2017 12:34:49 +0000 (12:34 +0000)]
ppc: Fix the pre-VSX vec_vsx_st() fallback macro

It would previously only work correctly with 8-bit data types.

Fixes compilation with --disable-vsx.

7 years agoFix plane_copy_deinterleave_v210 on big-endian
Alexandra Hájková [Wed, 18 Jan 2017 09:13:39 +0000 (09:13 +0000)]
Fix plane_copy_deinterleave_v210 on big-endian

7 years agoppc: Avoid instantiating unused plane_copy functions
Alexandra Hájková [Wed, 21 Dec 2016 13:13:43 +0000 (13:13 +0000)]
ppc: Avoid instantiating unused plane_copy functions

Those functions are currently only used in 8-bit mode and result in
warnings in other bit depths.

8 years agoarm: Load mb_y properly in mbtree_propagate_list_internal_neon
Martin Storsjö [Mon, 26 Dec 2016 22:22:48 +0000 (00:22 +0200)]
arm: Load mb_y properly in mbtree_propagate_list_internal_neon

The previous version, which attempted to load two stack parameters at once,
would only have worked if they were interpreted and loaded as 32-bit
elements, not when loading them as 16-bit elements.

8 years agoanalyse: Fix lambda table values
Anton Mitrofanov [Mon, 31 Oct 2016 11:39:52 +0000 (14:39 +0300)]
analyse: Fix lambda table values

8 years agoCosmetics
Anton Mitrofanov [Sat, 26 Nov 2016 12:30:58 +0000 (15:30 +0300)]
Cosmetics

Also make x264_weighted_reference_duplicate() static.

8 years agoppc: AltiVec store_interleave_chroma
Alexandra Hájková [Mon, 28 Nov 2016 14:04:10 +0000 (14:04 +0000)]
ppc: AltiVec store_interleave_chroma

8 years agoppc: AltiVec plane_copy_interleave
Alexandra Hájková [Mon, 28 Nov 2016 10:51:54 +0000 (10:51 +0000)]
ppc: AltiVec plane_copy_interleave

8 years agoppc: AltiVec plane_copy_swap
Alexandra Hájková [Sat, 26 Nov 2016 20:03:34 +0000 (20:03 +0000)]
ppc: AltiVec plane_copy_swap

8 years agoppc: AltiVec zigzag_interleave_8x8_cavlc
Alexandra Hájková [Wed, 23 Nov 2016 19:53:51 +0000 (20:53 +0100)]
ppc: AltiVec zigzag_interleave_8x8_cavlc

8 years agoppc: AltiVec zigzag_scan_8x8_frame
Alexandra Hájková [Wed, 23 Nov 2016 19:53:50 +0000 (20:53 +0100)]
ppc: AltiVec zigzag_scan_8x8_frame

8 years agoppc: AltiVec sub8x8_dct_dc
Alexandra Hájková [Mon, 14 Nov 2016 14:06:06 +0000 (15:06 +0100)]
ppc: AltiVec sub8x8_dct_dc

8 years agoppc: AltiVec add8x8_idct_dc
Alexandra Hájková [Mon, 14 Nov 2016 14:06:05 +0000 (15:06 +0100)]
ppc: AltiVec add8x8_idct_dc

8 years agocheckasm: aarch64: Add filler args to make sure all parameters are passed on the...
Martin Storsjö [Wed, 16 Nov 2016 08:57:31 +0000 (10:57 +0200)]
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack

This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.

8 years agocheckasm: aarch64: Clobber the stack before calling functions
Martin Storsjö [Wed, 16 Nov 2016 08:57:30 +0000 (10:57 +0200)]
checkasm: aarch64: Clobber the stack before calling functions

8 years agoppc: Use vec_vsx_ld instead of VEC_LOAD/STORE macros
Alexandra Hájková [Tue, 1 Nov 2016 22:16:17 +0000 (23:16 +0100)]
ppc: Use vec_vsx_ld instead of VEC_LOAD/STORE macros

Remove VEC_LOAD*, some of VEC_STORE* macros, some PREP* macros and
VEC_DIFF_H_OFFSET macro.

Make sure the functions do not use deprecated primitives.

8 years agoppc: Provide fallbacks for older architectures
Luca Barbato [Tue, 1 Nov 2016 22:16:16 +0000 (23:16 +0100)]
ppc: Provide fallbacks for older architectures

8 years agoppc: Add VSX support to configure
Luca Barbato [Tue, 1 Nov 2016 22:16:14 +0000 (23:16 +0100)]
ppc: Add VSX support to configure

8 years agoppc: Manually unroll the horizontal prediction loop
Luca Barbato [Tue, 1 Nov 2016 22:16:13 +0000 (23:16 +0100)]
ppc: Manually unroll the horizontal prediction loop

Doubles the speedup of the function (from being slower than C to being
over twice as fast as C).

8 years agox86inc: Avoid using eax/rax for storing the stack pointer
Henrik Gramner [Sat, 8 Oct 2016 15:20:18 +0000 (17:20 +0200)]
x86inc: Avoid using eax/rax for storing the stack pointer

When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.

If we choose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.

8 years agoShow the correct settings for --preset slow in --fullhelp
Henrik Gramner [Thu, 1 Dec 2016 15:05:16 +0000 (16:05 +0100)]
Show the correct settings for --preset slow in --fullhelp

The slow preset was recently adjusted but we forgot to update the
corresponding --fullhelp message to reflect the change.

8 years agocheckasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Martin Storsjö [Mon, 14 Nov 2016 21:54:51 +0000 (23:54 +0200)]
checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters

Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 6 (for aarch64) parameters
are passed on the stack to checkasm_checked_call, we actually only
need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64)
parameters on the stack when calling the tested function.

8 years agocheckasm: arm: preserve the stack alignment in x264_checkasm_checked_call
Janne Grunau [Mon, 14 Nov 2016 21:54:50 +0000 (23:54 +0200)]
checkasm: arm: preserve the stack alignment in x264_checkasm_checked_call

The stack used by x264_checkasm_checked_call_neon was a multiple of 4
when the checked function was called. AAPCS requires a double word (8 byte)
aligned stack at public interfaces. Since both calls are public interfaces,
the stack was misaligned when the checked function was called.

This can cause issues if code called within this (which includes
the C implementations) relies on the stack alignment.

8 years agoarm: Don't use vcmp.f64 for testing for an all-zeros register
Martin Storsjö [Wed, 16 Nov 2016 08:56:14 +0000 (10:56 +0200)]
arm: Don't use vcmp.f64 for testing for an all-zeros register

On iOS, vcmp.f64 can behave as if the register was zero if the
register (interpreted as an f64) holds a denormal number.

The vcmp.f64 (and other VFP instructions) will trap to the kernel
(which is supposed to implement the FP operation, which it apparently
doesn't do properly on iOS) if the value is a denormal. If this happens,
the whole comparison ends up way more costly.

8 years agoaarch64: Clear the upper half of int parameters in x264_plane_copy_core_neon
Janne Grunau [Wed, 16 Nov 2016 08:49:14 +0000 (10:49 +0200)]
aarch64: Clear the upper half of int parameters in x264_plane_copy_core_neon

8 years agoppc: Fix hadamard for little-endian
Luca Barbato [Tue, 1 Nov 2016 22:16:18 +0000 (23:16 +0100)]
ppc: Fix hadamard for little-endian

Extending to 16-bit works with flipped bytes.

8 years agoCorrectly signal max_dec_frame_buffering with --keyint 1
Anton Mitrofanov [Wed, 21 Sep 2016 21:17:48 +0000 (00:17 +0300)]
Correctly signal max_dec_frame_buffering with --keyint 1

According to E.2.1 it is inferred to be equal to 0 only if profile_idc is equal
to 44, 86, 100, 110, 122, or 244 and constraint_set3_flag is equal to 1.
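
The condition quoted above can be written as a small predicate. This is only a sketch: the function name is hypothetical and not taken from the x264 source, and the profile list is copied verbatim from the commit message rather than re-derived from the specification.

```c
#include <assert.h>   /* only needed if the checks in the note below are used */
#include <stdbool.h>

/* Hypothetical helper: returns true when max_dec_frame_buffering may be
 * inferred to be 0 instead of being signalled explicitly, following the
 * condition quoted in the commit message (E.2.1). */
static bool max_dec_frame_buffering_inferred_zero( int profile_idc,
                                                   bool constraint_set3_flag )
{
    if( !constraint_set3_flag )
        return false;
    switch( profile_idc )
    {
        case 44: case 86: case 100: case 110: case 122: case 244:
            return true;
        default:
            return false;
    }
}
```

For any other combination, such as an all-intra stream (--keyint 1) in a profile outside that list, the value presumably has to be signalled explicitly.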

8 years agox86: Faster pixel_ssim_4x4x2_core
Henrik Gramner [Sat, 17 Sep 2016 19:41:52 +0000 (21:41 +0200)]
x86: Faster pixel_ssim_4x4x2_core

8 years agox86: Deduplicate a constant in hpel_filter_c
Henrik Gramner [Sat, 17 Sep 2016 19:14:35 +0000 (21:14 +0200)]
x86: Deduplicate a constant in hpel_filter_c

8 years agox86: Faster pixel_ssd_nv12
Henrik Gramner [Sat, 17 Sep 2016 12:45:08 +0000 (14:45 +0200)]
x86: Faster pixel_ssd_nv12

Also drop the MMX2 version to simplify things.

8 years agox86: SSE zigzag_scan_4x4_field
Henrik Gramner [Sun, 11 Sep 2016 13:32:54 +0000 (15:32 +0200)]
x86: SSE zigzag_scan_4x4_field

Replaces the MMX2 version, one cycle faster.

Also change the checkasm test to use the correct alignment macro.

8 years agox86: AVX2 mbtree_propagate_list
Henrik Gramner [Wed, 7 Sep 2016 17:27:31 +0000 (19:27 +0200)]
x86: AVX2 mbtree_propagate_list

SIMD part is around 25% faster than AVX on Haswell, around 7%
faster when including the runtime of the scalar C wrapper.

8 years agox86: Move predict_16x16_dc_left calculations to asm
Henrik Gramner [Wed, 7 Sep 2016 17:26:42 +0000 (19:26 +0200)]
x86: Move predict_16x16_dc_left calculations to asm

1-2 cycles faster and avoids some code duplication to decrease code size.

Also drop the MMX2 implementation in favor of SSE2 to simplify things.

8 years agoavs: support for AviSynth+ high bit-depth pixel formats
Anton Mitrofanov [Thu, 18 Aug 2016 16:00:48 +0000 (19:00 +0300)]
avs: support for AviSynth+ high bit-depth pixel formats

8 years agoaarch64: implement x264_plane_copy_swap_neon
Janne Grunau [Fri, 26 Aug 2016 17:26:56 +0000 (20:26 +0300)]
aarch64: implement x264_plane_copy_swap_neon

plane_copy_swap_c: 27054
plane_copy_swap_neon: 4152

8 years agoVarious cosmetics of semicolon use
Anton Mitrofanov [Thu, 18 Aug 2016 19:14:22 +0000 (22:14 +0300)]
Various cosmetics of semicolon use

8 years agocli: Prefetch yuv/y4m input frames on Windows 8 and newer
Henrik Gramner [Thu, 28 Jul 2016 19:58:40 +0000 (21:58 +0200)]
cli: Prefetch yuv/y4m input frames on Windows 8 and newer

Use PrefetchVirtualMemory() (if available) on memory-mapped input frames.

Significantly improves performance when the source file is not already
present in the OS page cache by asking the OS to bring in those pages from
disk using large, concurrent I/O requests.

Most beneficial on fast encoding settings. Up to 40% faster overall with
--preset ultrafast, and up to 20% faster overall with --preset veryfast.

This API was introduced in Windows 8, so call it conditionally. On older
Windows systems the previous behavior remains unchanged.

8 years agoAdjust --preset slow
Henrik Gramner [Thu, 28 Jul 2016 17:34:04 +0000 (19:34 +0200)]
Adjust --preset slow

 * Swap --me umh for --trellis 2. They have a similar effect on performance
   but the latter gives slightly better results in most cases.
 * Change --b-adapt from 2 to 1. Negligible difference in quality since the
   b-adapt 1 improvements, but it's significantly faster.

Also remove a redundant assignment from veryfast (--me hex is set by default).

8 years agoratecontrol_new: Simplify an expression in HRD timescale calculation
Henrik Gramner [Thu, 28 Jul 2016 17:33:57 +0000 (19:33 +0200)]
ratecontrol_new: Simplify an expression in HRD timescale calculation

Also gets rid of a false positive static analyser integer division warning.

8 years agogcc: Enable __sync_fetch_and_add() on x86-64
Henrik Gramner [Thu, 28 Jul 2016 17:33:44 +0000 (19:33 +0200)]
gcc: Enable __sync_fetch_and_add() on x86-64

It was previously only enabled on 32-bit x86 for no reason, so 64-bit
systems had to use a mutex instead of a simple `lock xadd` instruction.

Note that this code is only used in some very specific configurations
involving sliced threads.
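
As a minimal illustration of the builtin in question (the wrapper name is hypothetical and this is not the x264 code path itself), an atomic fetch-and-add with GCC or Clang can look like this:

```c
#include <assert.h>

/* Atomically adds `delta` to *counter and returns the value *counter held
 * before the addition. __sync_fetch_and_add() is a GCC/Clang builtin; on
 * x86 and x86-64 it is typically compiled to a `lock xadd` instruction. */
static int fetch_and_add( int *counter, int delta )
{
    assert( counter );
    return __sync_fetch_and_add( counter, delta );
}
```

No mutex is needed around the call, which is the saving the commit message refers to.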

8 years agomips: Fix high bit-depth compilation
Anton Mitrofanov [Tue, 20 Sep 2016 15:48:22 +0000 (18:48 +0300)]
mips: Fix high bit-depth compilation

8 years agocheckasm: Fix compilation on Windows with --disable-thread
Henrik Gramner [Sat, 17 Sep 2016 13:53:59 +0000 (15:53 +0200)]
checkasm: Fix compilation on Windows with --disable-thread

8 years agoarm/aarch64: use plane_copy wrapper macros
Janne Grunau [Fri, 26 Aug 2016 17:26:55 +0000 (20:26 +0300)]
arm/aarch64: use plane_copy wrapper macros

Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.

Reported-by: Kirill Batuzov <batuzovk@ispras.ru>
8 years agoconfigure: Support specifying a custom pkg-config
Henrik Gramner [Sun, 3 Apr 2016 15:28:33 +0000 (17:28 +0200)]
configure: Support specifying a custom pkg-config

8 years agoAdd support for new VUI parameters
Anton Mitrofanov [Wed, 8 Jun 2016 19:46:17 +0000 (22:46 +0300)]
Add support for new VUI parameters

Support the new color primaries, transfer characteristics, and matrix
coefficients defined in the 2016-02 edition of the H.264 specification.

8 years agoconfigure: Add link-time optimization support
Henrik Gramner [Sun, 24 Apr 2016 12:10:22 +0000 (14:10 +0200)]
configure: Add link-time optimization support

Enabled by using the --enable-lto configuration option.

May give a slight performance improvement in some cases, but it can
also reduce performance in other cases (largely compiler-dependent)
so don't enable it by default. It also makes compilation (and linking
in particular) a fair bit slower.

Note that some older versions of GNU binutils will incorrectly warn
about "memset used with constant zero length parameter" when linking
using LTO. This is due to a bug in binutils and can safely be ignored.

8 years agoconfigure: Fix clang detection with versioned binaries
Henrik Gramner [Sun, 24 Apr 2016 11:32:43 +0000 (13:32 +0200)]
configure: Fix clang detection with versioned binaries

Correctly detect clang binaries that have the version number appended
as a suffix to the file name, e.g. `clang38`.

8 years agoarm: Add asm for mbtree fixed point conversion
Janne Grunau [Sun, 24 Apr 2016 12:38:56 +0000 (14:38 +0200)]
arm: Add asm for mbtree fixed point conversion

7-8 times faster on a cortex-a53 vs. gcc-5.3.

mbtree_fix8_pack_c: 44114
mbtree_fix8_pack_neon: 5805
mbtree_fix8_unpack_c: 38924
mbtree_fix8_unpack_neon: 4870

8 years agoaarch64: Add asm for mbtree fixed point conversion
Janne Grunau [Sun, 24 Apr 2016 12:38:55 +0000 (14:38 +0200)]
aarch64: Add asm for mbtree fixed point conversion

pack is ~7 times faster and unpack is ~9 times faster on a cortex-a53
compared to gcc-5.3.

mbtree_fix8_pack_c: 41534
mbtree_fix8_pack_neon: 5766
mbtree_fix8_unpack_c: 44102
mbtree_fix8_unpack_neon: 4868

8 years agoFix p4x4 analyse for 4:4:4 encoding with chroma ME
Anton Mitrofanov [Sun, 22 May 2016 19:33:58 +0000 (22:33 +0300)]
Fix p4x4 analyse for 4:4:4 encoding with chroma ME

8 years agoFix 4:4:4 encoding with CQM
Anton Mitrofanov [Sun, 22 May 2016 19:18:34 +0000 (22:18 +0300)]
Fix 4:4:4 encoding with CQM

8 years agoFix p4x4 RDO with CAVLC
Anton Mitrofanov [Sun, 22 May 2016 16:36:05 +0000 (19:36 +0300)]
Fix p4x4 RDO with CAVLC

8 years agoApply zone options a little bit earlier
Anton Mitrofanov [Sat, 23 Apr 2016 20:10:03 +0000 (23:10 +0300)]
Apply zone options a little bit earlier

This way things like SAR changes will have full effect from the start frame.

8 years agoFix corruption when using encoder_reconfig() with some parameters
Anton Mitrofanov [Sat, 23 Apr 2016 19:45:44 +0000 (22:45 +0300)]
Fix corruption when using encoder_reconfig() with some parameters

Changing parameters that affect the SPS, like --ref for example, wasn't
behaving correctly previously.

Probably a regression in r2373.

8 years agoClean up header includes
Anton Mitrofanov [Wed, 13 Apr 2016 18:54:25 +0000 (21:54 +0300)]
Clean up header includes

8 years agoEliminate some compiler warnings on BSD
Henrik Gramner [Wed, 13 Apr 2016 15:53:49 +0000 (17:53 +0200)]
Eliminate some compiler warnings on BSD

Include <strings.h> in addition to <string.h>. According to the POSIX
specification the prototypes for strcasecmp() and strncasecmp() are
declared in <strings.h>. On some systems they are also declared in
<string.h> for compatibility reasons but we shouldn't rely on that.

Define _POSIX_C_SOURCE only when it's required to do so. Some BSD
variants don't declare certain function prototypes otherwise.
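
A small sketch of the include pattern described above. The helper name is hypothetical; only the header choice is the point.

```c
#include <assert.h>
#include <string.h>   /* strlen(), strcmp(), ... */
#include <strings.h>  /* POSIX home of strcasecmp() and strncasecmp() */

/* Case-insensitive comparison of a command-line value against a known
 * option name. Relies on <strings.h> rather than on <string.h> happening
 * to declare strcasecmp() as an extension. */
static int option_matches( const char *arg, const char *name )
{
    assert( arg && name );
    return strcasecmp( arg, name ) == 0;
}
```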

8 years agoosx: Add -D_DARWIN_C_SOURCE to CFLAGS
Henrik Gramner [Tue, 12 Apr 2016 19:33:54 +0000 (21:33 +0200)]
osx: Add -D_DARWIN_C_SOURCE to CFLAGS

OSX doesn't like _POSIX_C_SOURCE being defined when _DARWIN_C_SOURCE isn't.

8 years agoRemove an unused parameter from x264_slicetype_frame_cost()
Anton Mitrofanov [Tue, 12 Apr 2016 17:33:42 +0000 (20:33 +0300)]
Remove an unused parameter from x264_slicetype_frame_cost()

The b_intra_penalty parameter is no longer used anywhere after the
improvements to the --b-adapt 1 algorithm.

8 years agoImprove the --b-adapt 1 algorithm
Anton Mitrofanov [Sun, 10 Apr 2016 17:17:32 +0000 (20:17 +0300)]
Improve the --b-adapt 1 algorithm

Roughly the same speed as before but with significantly better results,
comparable to --b-adapt 2.

8 years agoanalyse: i_sub_partition write combining
Henrik Gramner [Sun, 3 Apr 2016 13:49:26 +0000 (15:49 +0200)]
analyse: i_sub_partition write combining

8 years agox86: Use one less register in mbtree_propagate_cost_avx2
Henrik Gramner [Tue, 15 Mar 2016 19:16:45 +0000 (20:16 +0100)]
x86: Use one less register in mbtree_propagate_cost_avx2

Avoids the need to save and restore xmm6 on 64-bit Windows.

8 years agox86: Add asm for mbtree fixed point conversion
Henrik Gramner [Fri, 4 Mar 2016 16:53:08 +0000 (17:53 +0100)]
x86: Add asm for mbtree fixed point conversion

The QP offsets of each macroblock are stored as floats internally and
converted to big-endian Q8.8 fixed point numbers when written to the 2-pass
stats file, and converted back to floats when read from the stats file.

Add SSSE3 and AVX2 implementations for conversions in both directions.

About 8x faster than C on Haswell.
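
For reference, the conversion described above can be sketched in scalar C as follows. This is an illustration under assumptions, not the x264 implementation: the function names are hypothetical, the output is written byte by byte so the sketch is independent of host endianness, and inputs are assumed to lie in [-128, 128) so the scaled value fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack floats into big-endian Q8.8: scale by 256, truncate toward zero,
 * store the high byte first. */
static void fix8_pack( uint8_t *dst, const float *src, int count )
{
    assert( count >= 0 );
    for( int i = 0; i < count; i++ )
    {
        int16_t q = (int16_t)(src[i] * 256.0f);
        dst[2*i+0] = (uint8_t)((uint16_t)q >> 8);
        dst[2*i+1] = (uint8_t)((uint16_t)q & 0xff);
    }
}

/* Unpack big-endian Q8.8 back into floats. The uint16_t -> int16_t
 * conversion assumes the usual two's complement behaviour. */
static void fix8_unpack( float *dst, const uint8_t *src, int count )
{
    assert( count >= 0 );
    for( int i = 0; i < count; i++ )
    {
        int16_t q = (int16_t)(uint16_t)((src[2*i] << 8) | src[2*i+1]);
        dst[i] = q * (1.0f / 256.0f);
    }
}
```

Values that are exact multiples of 1/256 survive the round trip unchanged; anything finer is truncated during packing.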

8 years agox86inc: Enable AVX emulation in additional cases
Anton Mitrofanov [Thu, 7 Apr 2016 10:09:03 +0000 (13:09 +0300)]
x86inc: Enable AVX emulation in additional cases

Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.

8 years agox86inc: Improve handling of %ifid with multi-token parameters
Anton Mitrofanov [Thu, 7 Apr 2016 09:48:29 +0000 (12:48 +0300)]
x86inc: Improve handling of %ifid with multi-token parameters

The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.

8 years agox86inc: Fix AVX emulation of some instructions
Anton Mitrofanov [Mon, 28 Mar 2016 15:35:38 +0000 (18:35 +0300)]
x86inc: Fix AVX emulation of some instructions

8 years agox86inc: Fix AVX emulation of scalar float instructions
Henrik Gramner [Fri, 4 Mar 2016 16:51:41 +0000 (17:51 +0100)]
x86inc: Fix AVX emulation of scalar float instructions

Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.

8 years agox86: dct2x4dc asm
Henrik Gramner [Sat, 27 Feb 2016 19:34:39 +0000 (20:34 +0100)]
x86: dct2x4dc asm

Only used in 4:2:2. MMX2 version implemented for 8-bit, SSE2 and AVX
versions implemented for high bit-depth.

2.5x faster on 32-bit and 1.6x faster on 64-bit compared to C on Ivy Bridge.

8 years agox86: SSE2/AVX idct_dequant_2x4_(dc|dconly)
Henrik Gramner [Sat, 20 Feb 2016 19:31:22 +0000 (20:31 +0100)]
x86: SSE2/AVX idct_dequant_2x4_(dc|dconly)

Only used in 4:2:2. Both 8-bit and high bit-depth implemented.

Approximate performance improvement compared to C on Ivy Bridge:

                         x86-32  x86-64
idct_dequant_2x4_dc      2.1x    1.7x
idct_dequant_2x4_dconly  2.7x    2.0x

Helps more on 32-bit due to the C versions being register starved.

8 years agocheckasm: Fix idct_dequant_2x4_(dc|dconly) tests
Henrik Gramner [Sat, 20 Feb 2016 15:53:35 +0000 (16:53 +0100)]
checkasm: Fix idct_dequant_2x4_(dc|dconly) tests

They used the wrong qp values and the dconly test had the wrong name. This
went undetected before because there weren't any assembly implementations.

8 years agocheckasm: Disable Windows Error Reporting
Henrik Gramner [Sun, 7 Feb 2016 13:55:26 +0000 (14:55 +0100)]
checkasm: Disable Windows Error Reporting

When developing new assembly code it's expected that checkasm may crash,
and the error reporting dialog popup can be somewhat annoying.

8 years agowindows: Flag debug builds in the resource file
Henrik Gramner [Sat, 6 Feb 2016 17:49:46 +0000 (18:49 +0100)]
windows: Flag debug builds in the resource file

8 years agocli: Refactor filter option parsing
Henrik Gramner [Thu, 4 Feb 2016 19:06:57 +0000 (20:06 +0100)]
cli: Refactor filter option parsing

The old code contained a whole bunch of memory leaks, unchecked mallocs,
sections of dead code, etc. and was generally overly complex.

Also consolidate some memory allocations into a single one.

8 years agoffms: Various improvements
Henrik Gramner [Sun, 31 Jan 2016 20:50:52 +0000 (21:50 +0100)]
ffms: Various improvements

 * Drop the MinGW Unicode workarounds. Those were required at the time
   Windows Unicode support was added to x264 but the underlying problem
   has since been fixed in FFMS.

 * Use FFMS_IndexBelongsToFile() as an additional sanity check when reading
   an index file to ensure that it belongs to the current source video.

 * Upgrade to the new API to prevent deprecation warnings when compiling.

 * Fix a resource leak that would occur if FFMS_GetFirstTrackOfType() or
   FFMS_CreateVideoSource() failed.

 * Minor string handling adjustments related to progress reporting.

This increases the FFMS version requirement from 2.16.2 to 2.21.0.

8 years agomsvc: Add snprintf/vsnprintf replacements
Henrik Gramner [Mon, 11 Apr 2016 14:59:46 +0000 (16:59 +0200)]
msvc: Add snprintf/vsnprintf replacements

MSVC pre-VS2015 has broken snprintf/vsnprintf implementations which are
incompatible with C99 and may lead to buffer overflows.
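
The C99 behaviour that callers rely on can be sketched as below. This is not the replacement added by the commit; the helper name is hypothetical and the code simply shows the two C99 guarantees: the output is always NUL-terminated when the size is non-zero, and the return value is the length the full output would have had.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats into buf and reports whether the result fit.
 * Returns 1 if the whole string fit, 0 if it was truncated, -1 on error. */
static int format_checked( char *buf, size_t size, const char *fmt, ... )
{
    va_list ap;
    va_start( ap, fmt );
    int n = vsnprintf( buf, size, fmt, ap );
    va_end( ap );
    if( n < 0 )
        return -1;
    return (size_t)n < size;
}
```

With a non-conforming implementation the truncated case could leave buf without a terminator, which is how later string operations end up reading past the buffer.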

8 years agoconfigure: Define feature test macros for --std=gnu99
Henrik Gramner [Sun, 31 Jan 2016 19:21:01 +0000 (20:21 +0100)]
configure: Define feature test macros for --std=gnu99

Makes the printf() family functions on MinGW use the correct C99 POSIX
versions instead of the broken pre-VS2015 Microsoft ones.

Also allows us to get rid of some _GNU_SOURCE and _ISOC99_SOURCE defines.

8 years agomingw: Enable high-entropy ASLR on 64-bit Windows
Henrik Gramner [Thu, 28 Jan 2016 17:37:37 +0000 (18:37 +0100)]
mingw: Enable high-entropy ASLR on 64-bit Windows

To fully utilize HEASLR the image base address must also be set above
4 GiB. For consistency use the same address as MSVC uses by default.

This requires binutils 2.25 which isn't available on all common
distributions, so only enable it after checking that it's supported.

8 years agomsvs: WinRT support
Henrik Gramner [Sun, 24 Jan 2016 00:48:18 +0000 (01:48 +0100)]
msvs: WinRT support

To compile x264 for WinRT the following additional steps have to be performed:

 * Ensure that the necessary SDK is installed.

 * Set the correct environment variables in the VS command prompt as shown at
   https://trac.ffmpeg.org/wiki/CompilationGuide/WinRT

 * Add one of the following to --extra-cflags depending on the target OS:
   "-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0A00" (Windows 10)
   "-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0603" (Windows 8.1)

8 years agoconfigure: Disable CLI libraries when CLI is disabled
Henrik Gramner [Sun, 24 Jan 2016 22:58:40 +0000 (23:58 +0100)]
configure: Disable CLI libraries when CLI is disabled

8 years agomatroska: mk_close: Check fseek() return value
Henrik Gramner [Fri, 5 Feb 2016 17:46:13 +0000 (18:46 +0100)]
matroska: mk_close: Check fseek() return value

8 years agoparse_qpfile: Check ftell() and fseek() return values
Henrik Gramner [Fri, 5 Feb 2016 17:46:02 +0000 (18:46 +0100)]
parse_qpfile: Check ftell() and fseek() return values

8 years agoUse the correct default B-ref placement with B-pyramid
Anton Mitrofanov [Sun, 10 Apr 2016 17:13:59 +0000 (20:13 +0300)]
Use the correct default B-ref placement with B-pyramid

The cost analysis functions expect the B-ref in a sequence of an even
number of B-frames to be located towards the beginning, while the actual
placement was towards the end.

Change the placement to be consistent with the analyse expectations, e.g.
PbbBbP -> PbBbbP.
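
The change in placement can be illustrated with a toy pattern generator. The two index formulas are assumptions chosen to reproduce the example in the commit message (0-based position among the B-frames); they are not copied from the x264 source, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed old placement: rounds towards the end for an even count. */
static int bref_index_old( int bframes ) { return bframes / 2; }

/* Assumed new placement: rounds towards the beginning for an even count. */
static int bref_index_new( int bframes ) { return (bframes - 1) / 2; }

/* Writes a display-order pattern such as "PbBbbP" into out, which must
 * hold at least bframes + 3 bytes. The B-ref is shown as 'B'. */
static void gop_pattern( char *out, int bframes, int bref )
{
    assert( bframes >= 0 );
    int pos = 0;
    out[pos++] = 'P';
    for( int i = 0; i < bframes; i++ )
        out[pos++] = i == bref ? 'B' : 'b';
    out[pos++] = 'P';
    out[pos] = '\0';
}
```

For an odd number of B-frames both formulas pick the middle frame, so only even-length sequences are affected.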

8 years agoparse_zones: Fix memory leak
Henrik Gramner [Fri, 5 Feb 2016 17:45:47 +0000 (18:45 +0100)]
parse_zones: Fix memory leak

8 years agoFix float-cast-overflow in x264_ratecontrol_end function
Alexey Samsonov [Tue, 26 Jan 2016 00:05:25 +0000 (16:05 -0800)]
Fix float-cast-overflow in x264_ratecontrol_end function

According to the C standard, it is undefined behavior to cast a negative
floating point number to an unsigned integer. Float-cast-overflow in
general is known to produce different results on different architectures.

Building x264 code with Clang and -fsanitize=float-cast-overflow
(http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#availablle-checks)
and running it on some real-life examples occasionally produces errors
of the form:

encoder/ratecontrol.c:1892: runtime error: value -5011.14 is outside the
range of representable values of type 'unsigned short'

Fix these errors by explicitly coding the de-facto x86 behavior: casting
float to uint16_t through int16_t.
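
A minimal sketch of the cast described above. The function name is hypothetical, and the sketch assumes the input, once truncated, fits in int16_t; values outside that range would still be undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a float to uint16_t by way of int16_t. The float -> int16_t
 * step is defined for any value whose truncation fits in int16_t, and the
 * int16_t -> uint16_t step is a well-defined modulo-65536 conversion, so
 * negative inputs wrap around instead of invoking undefined behavior. */
static uint16_t float_to_u16_wrap( float v )
{
    return (uint16_t)(int16_t)v;
}
```

For the value in the sanitizer report, -5011.14 truncates to -5011 and wraps to 65536 - 5011 = 60525.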

8 years agoFix AVC-Intra padding for non-Annex B encoding
Sebastian Dröge [Sun, 20 Dec 2015 20:49:35 +0000 (23:49 +0300)]
Fix AVC-Intra padding for non-Annex B encoding

8 years agoppc: Only perform AltiVec detection if compiled with AltiVec enabled
Anton Mitrofanov [Mon, 11 Jan 2016 18:39:22 +0000 (21:39 +0300)]
ppc: Only perform AltiVec detection if compiled with AltiVec enabled

9 years ago2-pass: Take into account possible frame reordering
Anton Mitrofanov [Tue, 13 Oct 2015 12:30:16 +0000 (15:30 +0300)]
2-pass: Take into account possible frame reordering

9 years agoRevise the 2-pass algorithm
Anton Mitrofanov [Tue, 13 Oct 2015 09:54:05 +0000 (12:54 +0300)]
Revise the 2-pass algorithm

9 years agoRevise the row VBV algorithm (part 2)
Anton Mitrofanov [Mon, 4 Jan 2016 23:41:43 +0000 (02:41 +0300)]
Revise the row VBV algorithm (part 2)

Should fix rare cases of VBV emergency mode activation caused by placing
too much trust in the row predictors.

9 years agoBump dates to 2016
Henrik Gramner [Fri, 1 Jan 2016 11:44:31 +0000 (12:44 +0100)]
Bump dates to 2016

9 years agocli: Use memory-mapped input frames for yuv and y4m
Henrik Gramner [Mon, 26 Oct 2015 18:54:20 +0000 (19:54 +0100)]
cli: Use memory-mapped input frames for yuv and y4m

Improves performance by avoiding extraneous memory copying.
Most beneficial on fast settings.

On average around 5-10% faster overall on ultrafast but the
performance improvement can be even larger in some cases.
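
The general technique can be sketched with POSIX mmap() as below. This is not the x264 CLI code: the type and function names are hypothetical, error handling is minimal, and Windows would need a different API. The one detail worth showing is that file offsets passed to mmap() must be page-aligned, so a frame starting at an arbitrary offset is mapped from the preceding page boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct
{
    void          *base;    /* start of the mapping (page-aligned) */
    size_t         map_len; /* length of the mapping */
    const uint8_t *data;    /* first byte of the requested frame */
} mapped_frame_t;

/* Maps `size` bytes starting at file offset `offset` read-only, without
 * copying them into a separate buffer. Returns 0 on success, -1 on error. */
static int map_frame( mapped_frame_t *m, int fd, off_t offset, size_t size )
{
    long page = sysconf( _SC_PAGESIZE );
    off_t aligned = offset - (offset % page);
    size_t padding = (size_t)(offset - aligned);
    void *base = mmap( NULL, size + padding, PROT_READ, MAP_PRIVATE, fd, aligned );
    if( base == MAP_FAILED )
    {
        perror( "mmap" );
        return -1;
    }
    m->base = base;
    m->map_len = size + padding;
    m->data = (const uint8_t *)base + padding;
    return 0;
}

static void unmap_frame( mapped_frame_t *m )
{
    munmap( m->base, m->map_len );
}
```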

9 years agoy4m: Support extended frame headers when seeking
Henrik Gramner [Thu, 7 Jan 2016 00:59:24 +0000 (01:59 +0100)]
y4m: Support extended frame headers when seeking

Use the actual length of the frame header of the first frame instead of
assuming a header without extensions when calculating the frame size.

Also makes the frame counter more accurate with extended frame headers.
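
The arithmetic involved can be sketched as follows. The struct and function names are hypothetical and the sketch assumes, as the commit message implies, that every frame header has the same length as the first one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    int64_t seq_header_len;   /* length of the "YUV4MPEG2 ..." stream header line */
    int64_t frame_header_len; /* measured on the first frame, newline included */
    int64_t pixel_bytes;      /* raw picture data per frame */
} y4m_layout_t;

/* Length of a "FRAME...\n" header found at buf, or -1 if it is not one.
 * A plain header is 6 bytes; extended headers carry extra parameters. */
static int64_t y4m_frame_header_len( const char *buf, size_t avail )
{
    if( avail < 5 || memcmp( buf, "FRAME", 5 ) )
        return -1;
    const char *nl = memchr( buf, '\n', avail );
    return nl ? (int64_t)(nl - buf) + 1 : -1;
}

/* File offset of the header of frame number `frame` (0-based). */
static int64_t y4m_frame_offset( const y4m_layout_t *l, int64_t frame )
{
    return l->seq_header_len + frame * (l->frame_header_len + l->pixel_bytes);
}

/* Number of complete frames in a file of the given size. */
static int64_t y4m_frame_count( const y4m_layout_t *l, int64_t file_size )
{
    assert( l->frame_header_len + l->pixel_bytes > 0 );
    return (file_size - l->seq_header_len) / (l->frame_header_len + l->pixel_bytes);
}
```

Assuming a fixed 6-byte header instead would make every seek target drift by the extension length times the frame number.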

9 years agoconfigure: Simplify cygwin/mingw/msys code
Henrik Gramner [Tue, 3 Nov 2015 16:55:08 +0000 (17:55 +0100)]
configure: Simplify cygwin/mingw/msys code

Avoids some code duplication.

Also drop the -mno-cygwin check since that option was removed back in 2008.

9 years agoy4m: Avoid some redundant strlen() calls
Henrik Gramner [Mon, 26 Oct 2015 17:52:46 +0000 (18:52 +0100)]
y4m: Avoid some redundant strlen() calls

9 years agoSimplify threadpool_wait
Henrik Gramner [Sun, 25 Oct 2015 16:15:10 +0000 (17:15 +0100)]
Simplify threadpool_wait

9 years agowindows: Use native threads by default
Henrik Gramner [Fri, 16 Oct 2015 17:05:34 +0000 (19:05 +0200)]
windows: Use native threads by default

--disable-win32thread can be passed as an argument to configure to compile
with pthreads, which was the old default behavior.