granicus.if.org Git - libx264/log
libx264
Henrik Gramner [Sat, 8 Aug 2015 10:21:54 +0000 (12:21 +0200)]
Simplify version.sh

Also remove some non-POSIX syntax and improve robustness.

As a bonus the script now runs about 2-3 times faster.

`git rev-list --count` could be used to simplify things even further,
but that functionality was added in git 1.7.2, so keep `wc -l` for now
to maintain compatibility with older git versions.

장영훈 [Fri, 7 Aug 2015 05:43:24 +0000 (14:43 +0900)]
msvs: Fix cl detection in non-English environments

Henrik Gramner [Mon, 3 Aug 2015 19:05:11 +0000 (21:05 +0200)]
x86inc: Sync minor changes from ffmpeg/libav

Henrik Gramner [Wed, 29 Jul 2015 17:30:52 +0000 (19:30 +0200)]
matroska: Add comments for the remaining element names

Henrik Gramner [Wed, 29 Jul 2015 17:30:41 +0000 (19:30 +0200)]
Silence various static analyzer warnings

Those are false positives, but it doesn't hurt to get rid of them.

Henrik Gramner [Sun, 26 Jul 2015 21:13:29 +0000 (23:13 +0200)]
mingw: Enable the tsaware linker flag

Avoids an irrelevant compatibility layer in Terminal Services environments.

https://msdn.microsoft.com/en-us/library/cc834995.aspx

Henrik Gramner [Sun, 26 Jul 2015 21:13:26 +0000 (23:13 +0200)]
msvs: Don't redefine snprintf for VS2015

Visual Studio 2015 has a proper snprintf implementation.
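A minimal sketch of the kind of guard involved, assuming the older fallback maps `snprintf` to Microsoft's `_snprintf` (x264's actual compatibility code may differ). `_MSC_VER` is 1900 for Visual Studio 2015, so the macro only applies to earlier compilers. `_snprintf` does not NUL-terminate on truncation, which is one reason to prefer the real implementation where it exists. `format_int()` is a hypothetical helper for illustration.

```c
#include <stdio.h>

/* Assumed fallback for pre-2015 MSVC only; _MSC_VER is 1900 for VS2015. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* Hypothetical helper: format an integer into buf. A conforming snprintf
 * always NUL-terminates (when size > 0) and returns the length the full
 * output would have had, even if it was truncated. */
static int format_int(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%d", value);
}
```

With a conforming `snprintf`, formatting 123456 into a 4-byte buffer stores "123" plus the terminator and still reports 6.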

Henrik Gramner [Sun, 26 Jul 2015 21:13:19 +0000 (23:13 +0200)]
msvs: Prefer link.exe from the same directory as cl.exe

/usr/bin/link from coreutils may be located before the MSVS linker in $PATH,
which causes linking to fail because the wrong binary is used.

Henrik Gramner [Sun, 26 Jul 2015 22:10:00 +0000 (00:10 +0200)]
frame_dump: check fseek() return value
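An unchecked `fseek()` failure means the following write lands at the wrong position in the file. A minimal sketch of the checked pattern, using a hypothetical `write_at()` helper (not x264's code) that also checks the `fwrite()` result:

```c
#include <stdio.h>

/* Hypothetical helper: write size bytes at byte offset `offset` of fp.
 * Returns 0 on success and -1 if either the seek or the write fails,
 * instead of silently writing at the wrong position. */
static int write_at(FILE *fp, long offset, const void *data, size_t size)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(data, 1, size, fp) != size)
        return -1;
    return 0;
}
```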

Henrik Gramner [Sun, 26 Jul 2015 22:08:38 +0000 (00:08 +0200)]
x264_vfprintf: use va_copy

It's undefined behavior to use the same va_list twice.

This most likely didn't cause any issues in practice since the string would
have to be larger than 4 KiB to trigger the fallback path.

Use a workaround for ICL since it doesn't define va_copy even in C99 mode.
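A minimal sketch of the pattern described above, not x264's actual `x264_vfprintf`. The hypothetical `format_string()` formats into a 4 KiB stack buffer and, only when the output is longer, formats again into a heap buffer using a copy of the argument list taken with `va_copy` before the first pass. Each `va_list` is traversed once and ended with `va_end`. It assumes a C99 compiler that provides `va_copy`; the ICL workaround is not shown.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated formatted string (caller
 * frees) or NULL on error. */
static char *format_string(const char *fmt, ...)
{
    char stack_buf[4096];
    va_list args, args_copy;
    va_start(args, fmt);
    va_copy(args_copy, args);          /* the fallback pass needs its own list */
    int len = vsnprintf(stack_buf, sizeof(stack_buf), fmt, args);
    va_end(args);

    char *result = NULL;
    if (len >= 0) {
        result = malloc((size_t)len + 1);
        if (result) {
            if ((size_t)len < sizeof(stack_buf))
                memcpy(result, stack_buf, (size_t)len + 1);
            else                       /* fallback path for long output */
                vsnprintf(result, (size_t)len + 1, fmt, args_copy);
        }
    }
    va_end(args_copy);
    return result;
}
```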

Henrik Gramner [Sun, 26 Jul 2015 22:08:31 +0000 (00:08 +0200)]
param_parse: Fix framerate rounding issues

Marcin Juszkiewicz [Mon, 1 Jun 2015 09:24:45 +0000 (11:24 +0200)]
aarch64: Remove broken CFLAGS in configure

GCC doesn't have an "-arch" switch, and the build works when that entire line is removed.

Rong Yan [Mon, 20 Jul 2015 08:34:20 +0000 (03:34 -0500)]
ppc: Add little-endian PowerPC support

Rishikesh More [Thu, 18 Jun 2015 12:18:46 +0000 (17:48 +0530)]
mips: MSA quant optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:45 +0000 (17:48 +0530)]
mips: MSA predict optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:44 +0000 (17:48 +0530)]
mips: MSA pixel optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:43 +0000 (17:48 +0530)]
mips: MSA deblock optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:42 +0000 (17:48 +0530)]
mips: MSA dct optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:40 +0000 (17:48 +0530)]
mips: MSA mc optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Thu, 18 Jun 2015 12:18:38 +0000 (17:48 +0530)]
mips: Common MSA macros

Add macros for load/store, slide, shift, transpose and basic arithmetic
operations required by subsequent patches.

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Rishikesh More [Tue, 12 May 2015 14:08:09 +0000 (19:38 +0530)]
mips: Add MSA support to checkasm

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

Kaustubh Raste [Fri, 17 Apr 2015 12:08:58 +0000 (17:38 +0530)]
mips: Initial MSA support

MSA is the MIPS SIMD Architecture.

Add X264_CPU_MSA define.
Update configure to detect MIPS platform and set flags.
CPU-specific gcc options are expected through --extra-cflags.

Sample command line for mips32r5:
    ./configure --host=mipsel-linux-gnu --cross-prefix=<TOOLCHAIN>/mips-mti-linux-gnu- \
    --extra-cflags="-EL -mips32r5 -msched-weight -mload-store-pairs"

Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>

Anton Mitrofanov [Thu, 16 Jul 2015 21:22:29 +0000 (00:22 +0300)]
Limit the auto-detected number of threads according to the source height

Anton Mitrofanov [Thu, 16 Jul 2015 16:04:59 +0000 (19:04 +0300)]
Fine-tune frame size predictors at ratecontrol start

This is an attempt to improve VBV at the start of a video when a lot of
threads are used, since they delay feedback to the predictors.

Anton Mitrofanov [Thu, 16 Jul 2015 13:15:56 +0000 (16:15 +0300)]
Use forced frame types in slicetype analysis

This should improve MBTree and VBV when a lot of forced frame types are used.

Henrik Gramner [Mon, 1 Dec 2014 21:05:42 +0000 (22:05 +0100)]
x86: SSSE3 and AVX2 implementations of plane_copy_swap

For NV21 input.
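For reference, the operation these SIMD versions accelerate can be sketched in plain C. This assumes 8-bit samples and that `plane_copy_swap` copies an interleaved chroma plane while exchanging the two bytes of each pair (NV21 stores V before U, NV12 stores U before V). The name `plane_copy_swap_ref` and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: copy an interleaved chroma plane while
 * swapping each byte pair, turning VU order into UV order.
 * w is the number of chroma pairs per row; strides are in bytes. */
static void plane_copy_swap_ref(uint8_t *dst, ptrdiff_t i_dst,
                                const uint8_t *src, ptrdiff_t i_src,
                                int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            dst[2*x]   = src[2*x+1];
            dst[2*x+1] = src[2*x];
        }
        dst += i_dst;
        src += i_src;
    }
}
```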

Yu Xiaolei [Fri, 6 Jun 2014 08:05:27 +0000 (16:05 +0800)]
NV21 input support

Eliminates an extra copy when encoding Android camera preview images.

Checkasm test by Janne Grunau.
ARM assembly with improvements from Janne Grunau.

Henrik Gramner [Tue, 23 Jun 2015 15:00:47 +0000 (17:00 +0200)]
deblock: Write combining

Henrik Gramner [Tue, 23 Jun 2015 12:59:59 +0000 (14:59 +0200)]
Get rid of some tabs and trailing whitespace

Henrik Gramner [Sat, 23 May 2015 17:44:16 +0000 (19:44 +0200)]
x86: Experimental nasm support

Enables the use of nasm as an alternative to yasm.

Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
support [symbol-$$] addressing, which is used extensively by x264's PIC code.
PIC is enabled in all 64-bit Windows and 64-bit OS X builds, even non-shared ones.

For this reason nasm is intentionally not auto-detected at the moment; instead,
the assembler must be explicitly specified using "AS=nasm ./configure".

Also drop -O2 from ASFLAGS since it's simply ignored anyway.

Timothy Gu [Tue, 26 May 2015 17:12:42 +0000 (19:12 +0200)]
x86inc: Prevent warnings when using `struc` and `endstruc`

struc and endstruc attempt to revert to the previous section state set by
the SECTION macro.

Use the primitive [SECTION] directive instead of the SECTION macro for the
.note.GNU-stack section to prevent it from being emitted again during endstruc.

Henrik Gramner [Wed, 27 May 2015 19:38:14 +0000 (21:38 +0200)]
x86inc: Drop SECTION_TEXT macro

The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.

Henrik Gramner [Sat, 23 May 2015 11:38:05 +0000 (13:38 +0200)]
x86inc: Disable vpbroadcastq workaround in newer yasm versions

The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

Henrik Gramner [Sun, 24 May 2015 20:57:00 +0000 (22:57 +0200)]
Prefer Unicode versions of Windows API calls

This is just for consistency and doesn't affect behavior.

Henrik Gramner [Sun, 24 May 2015 21:21:20 +0000 (23:21 +0200)]
Get rid of fPIC warnings when compiling a shared library on Windows

PIC is always enabled when compiling for Windows, so gcc warns when -fPIC is
used since the flag has no effect.

Henrik Gramner [Sat, 25 Jul 2015 20:42:59 +0000 (22:42 +0200)]
matroska: Write the correct DocTypeVersion when using frame-packing

The StereoMode element is only valid with DocTypeVersion 3 or higher.

Anton Mitrofanov [Fri, 24 Jul 2015 21:21:52 +0000 (00:21 +0300)]
dump_yuv: Fix file handle leak

Anton Mitrofanov [Fri, 24 Jul 2015 21:20:47 +0000 (00:20 +0300)]
mp4: Fix file handle leak

Henrik Gramner [Tue, 23 Jun 2015 22:40:45 +0000 (00:40 +0200)]
flv: Check fseek() and fwrite() return values

Henrik Gramner [Tue, 23 Jun 2015 22:22:56 +0000 (00:22 +0200)]
flv: Fix memory and file handle leaks
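Leaks like these typically come from early returns on error paths that skip cleanup of resources acquired earlier. One common C idiom that avoids them is a single cleanup label. The sketch uses a hypothetical `output_t` type with `output_open()`/`output_close()` functions for illustration; it is not x264's muxer code.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {            /* hypothetical output handle */
    FILE *fp;
    unsigned char *buf;
} output_t;

/* Open an output; if a later step fails, release everything acquired so far. */
static output_t *output_open(const char *path, size_t buf_size)
{
    output_t *out = calloc(1, sizeof(*out));   /* fp and buf start as NULL */
    if (!out)
        return NULL;
    out->fp = fopen(path, "wb");
    if (!out->fp)
        goto fail;
    out->buf = malloc(buf_size);
    if (!out->buf)
        goto fail;
    return out;
fail:
    if (out->fp)
        fclose(out->fp);    /* without this the file handle would leak */
    free(out->buf);
    free(out);
    return NULL;
}

static void output_close(output_t *out)
{
    if (!out)
        return;
    if (out->fp)
        fclose(out->fp);
    free(out->buf);
    free(out);
}
```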

Henrik Gramner [Tue, 23 Jun 2015 23:23:35 +0000 (01:23 +0200)]
avs: Fix file handle leak

Henrik Gramner [Tue, 23 Jun 2015 11:38:02 +0000 (13:38 +0200)]
matroska: Fix memory leak

Henrik Gramner [Tue, 23 Jun 2015 11:24:29 +0000 (13:24 +0200)]
rdo: Fix potential CAVLC overflow issues

Henrik Gramner [Tue, 23 Jun 2015 20:08:35 +0000 (22:08 +0200)]
slurp_file: Various minor bug fixes

 * Fix unsigned <= 0 check.
 * Add additional size sanity check on 32-bit systems.
 * Don't read uninitialized data if fread() fails.
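A minimal sketch of a whole-file reader with the three kinds of fixes listed above applied. The name `slurp()` and its behavior (returning NULL for empty or unreadable files) are assumptions for illustration, not x264's `slurp_file()`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: read a whole file into a NUL-terminated buffer. */
static char *slurp(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;
    char *buf = NULL;
    if (fseek(fp, 0, SEEK_END) != 0)
        goto done;
    long size = ftell(fp);
    /* size is signed so ftell()'s -1 error value is caught here; with an
     * unsigned type, "<= 0" would only ever catch 0. */
    if (size <= 0)
        goto done;
    if ((unsigned long)size >= SIZE_MAX)   /* relevant where size_t is 32-bit */
        goto done;
    if (fseek(fp, 0, SEEK_SET) != 0)
        goto done;
    buf = malloc((size_t)size + 1);
    if (!buf)
        goto done;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);                         /* never return partially read data */
        buf = NULL;
        goto done;
    }
    buf[size] = '\0';
done:
    fclose(fp);
    return buf;
}
```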

Henrik Gramner [Tue, 23 Jun 2015 20:47:53 +0000 (22:47 +0200)]
param_parse: Check strdup() return value

Henrik Gramner [Tue, 23 Jun 2015 13:38:16 +0000 (15:38 +0200)]
param_parse: Fix memory leak

Anton Mitrofanov [Fri, 19 Jun 2015 13:01:12 +0000 (16:01 +0300)]
Add FreeBSD's stdint.h header guard to allowed list

Patch written by Koop Mast <kwm@FreeBSD.org>

Henrik Gramner [Fri, 22 May 2015 17:23:33 +0000 (19:23 +0200)]
x86: Prevent overread of src in plane_copy_interleave

Could only occur in 4:2:2 with height == 1.

Also enable asm for inputs with different U/V strides as long as the strides
have identical signs.

Anton Mitrofanov [Wed, 20 May 2015 20:10:20 +0000 (23:10 +0300)]
checkasm: Fix incorrect memcmp size for ARM architecture

Anton Mitrofanov [Sun, 26 Apr 2015 17:51:05 +0000 (20:51 +0300)]
Fix possible use of uninitialized MVs in lookahead analysis for B-frames

Anton Mitrofanov [Tue, 21 Apr 2015 20:08:19 +0000 (23:08 +0300)]
Catch incorrect usage of the libx264 API when flushing delayed frames

Anton Mitrofanov [Sat, 7 Mar 2015 20:00:09 +0000 (23:00 +0300)]
Fix detection of system libx264 configuration

Anton Mitrofanov [Mon, 23 Feb 2015 11:23:18 +0000 (14:23 +0300)]
Cosmetic changes

Anton Mitrofanov [Tue, 30 Dec 2014 23:15:05 +0000 (02:15 +0300)]
Update configure for auto detection of system libx264 configuration

Anton Mitrofanov [Tue, 3 Feb 2015 11:51:28 +0000 (14:51 +0300)]
Add tile format frame packing value

Defined in 2014-02 edition.

Anton Mitrofanov [Tue, 3 Feb 2015 10:39:14 +0000 (13:39 +0300)]
Stricter validation of crop-rect values

Vittorio Giovara [Tue, 20 Jan 2015 16:15:56 +0000 (16:15 +0000)]
Add mono frame packing value

Defined in 2013-04 edition.

Vittorio Giovara [Tue, 20 Jan 2015 15:57:41 +0000 (15:57 +0000)]
Validate frame packing value instead of clipping
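The difference between clipping and validating can be sketched as two small functions. This assumes a valid range of 0 through 7 and that "no frame packing" is signaled separately; the constant and function names are hypothetical, not x264's.

```c
/* Hypothetical upper bound, assuming valid values 0..7. */
#define FRAME_PACKING_MAX 7

/* Clipping: out-of-range input is silently turned into some other value. */
static int clip_frame_packing(int v)
{
    return v < 0 ? 0 : v > FRAME_PACKING_MAX ? FRAME_PACKING_MAX : v;
}

/* Validation: out-of-range input is rejected (0 = valid, -1 = invalid). */
static int validate_frame_packing(int v)
{
    return (v >= 0 && v <= FRAME_PACKING_MAX) ? 0 : -1;
}
```

Rejecting the value surfaces the mistake to the caller instead of signaling a frame packing arrangement the user never asked for.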

Christophe Gisquet [Tue, 3 Feb 2015 19:40:41 +0000 (20:40 +0100)]
x86inc: Correctly warn on use of SSE2 instructions in SSE functions

SSE2 instructions that are XMM implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle this
by also checking the register type when such instructions are used.

Christophe Gisquet [Tue, 3 Feb 2015 17:02:30 +0000 (18:02 +0100)]
x86inc: Fix instantiation of YMM registers

Vittorio Giovara [Tue, 20 Jan 2015 16:28:54 +0000 (16:28 +0000)]
matroska: Correctly write display width and height in stereo mode

According to the specification, when stereo mode is set, these values
represent the size of a single view.

Kieran Kunhya [Tue, 20 Jan 2015 15:38:00 +0000 (09:38 -0600)]
Use POC type 0 for AVC-Intra

Based on a patch from Capella Systems

Anton Mitrofanov [Sat, 3 Jan 2015 12:46:19 +0000 (15:46 +0300)]
Fix ARCH variable name conflict with BSD ports (bsd.port.mk) read-only variable

Anton Mitrofanov [Sat, 27 Dec 2014 17:35:39 +0000 (20:35 +0300)]
Fix negative percentages in final stats output

They were caused by integer overflow when encoding long UHD video.
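To see how such counters can overflow, take illustrative numbers: a 3840x2160 frame contains 240 × 135 = 32,400 macroblocks, so ten minutes at 60 fps (36,000 frames) accumulates 1,166,400,000 macroblocks. That still fits in a signed 32-bit integer (maximum 2,147,483,647), but multiplying it by 100 to form a percentage does not. The sketch below, with a hypothetical `percent()` helper rather than x264's statistics code, widens the operands before multiplying.

```c
#include <stdint.h>

/* Hypothetical helper: part as a percentage of total. Operands are 64-bit
 * and the arithmetic is done in double, so large counts from long UHD
 * encodes cannot wrap around to negative values. */
static double percent(int64_t part, int64_t total)
{
    return total > 0 ? 100.0 * (double)part / (double)total : 0.0;
}
```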

Anton Mitrofanov [Sat, 3 Jan 2015 20:35:23 +0000 (23:35 +0300)]
Bump dates to 2015

Anton Mitrofanov [Mon, 15 Dec 2014 15:49:23 +0000 (18:49 +0300)]
x86: Update Intel compiler CPU dispatcher override for new versions of ICC/ICL

Anton Mitrofanov [Tue, 6 Sep 2011 17:53:29 +0000 (21:53 +0400)]
New AQ mode: auto-variance AQ with bias to dark scenes

Also known as --aq-mode 3 or auto-variance AQ modification.

Anton Mitrofanov [Tue, 28 Aug 2012 23:02:27 +0000 (03:02 +0400)]
Improve HRD conformance

Henrik Gramner [Fri, 28 Nov 2014 22:24:56 +0000 (23:24 +0100)]
x86: SSE and AVX implementations of plane_copy

Also remove the MMX2 implementation and fix src overread for height == 1.

Anton Mitrofanov [Mon, 29 Sep 2014 19:26:19 +0000 (23:26 +0400)]
Update to the latest version of gas-preprocessor.pl from http://git.libav.org/?p=gas-preprocessor.git

Contributions by Janne Grunau, Martin Storsjo, Mans Rullgard, David Conrad, Martin Aumuller and others

Janne Grunau [Tue, 18 Nov 2014 23:33:55 +0000 (00:33 +0100)]
aarch64: cabac_encode_{decision,bypass,terminal}_asm

Benchmarks on a Nexus 9 (NVIDIA Denver):
101.3 cycles in x264_cabac_encode_decision_c,   67105369 runs, 3495 skips
 97.3 cycles in x264_cabac_encode_decision_asm, 67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c,    1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm,  1048424 runs, 152 skips
 92.4 cycles in x264_cabac_encode_bypass_c,     16776192 runs, 1024 skips
 89.6 cycles in x264_cabac_encode_bypass_asm,   16776453 runs, 763 skips

Cycle counts are not as stable as one would like. The dynamic code
optimisation seems to produce different results for small changes in a
binary. Repeated runs with the same binary produce stable results,
though (ignoring the first run).

Janne Grunau [Thu, 6 Nov 2014 08:20:17 +0000 (09:20 +0100)]
checkasm: add cycle counter read for aarch64

Needs kernel support since user-space access to the cycle counter is not
allowed on any of the available AArch64 systems (Android 5 and iOS).

Janne Grunau [Wed, 5 Nov 2014 10:35:13 +0000 (11:35 +0100)]
aarch64: nal_escape_neon

3-4 times faster.
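`nal_escape` performs H.264 emulation prevention: inside a NAL unit, whenever two consecutive zero bytes are followed by a byte in the range 0x00 to 0x03, a 0x03 byte is inserted so the payload can never mimic a start code. A scalar sketch of that rule, using a hypothetical `nal_escape_ref()` (not x264's C or NEON code) and ignoring any special handling at the end of the NAL unit:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference. dst must hold up to size + size / 2 bytes.
 * Returns the number of bytes written. */
static size_t nal_escape_ref(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t n = 0;
    int zeros = 0;                  /* consecutive zero bytes emitted so far */
    for (size_t i = 0; i < size; i++) {
        if (zeros >= 2 && src[i] <= 0x03) {
            dst[n++] = 0x03;        /* emulation prevention byte */
            zeros = 0;
        }
        dst[n++] = src[i];
        zeros = (src[i] == 0) ? zeros + 1 : 0;
    }
    return n;
}
```

For example, the input bytes 00 00 01 65 are written as 00 00 03 01 65.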

Janne Grunau [Fri, 31 Oct 2014 13:49:04 +0000 (14:49 +0100)]
aarch64: {plane_copy,memcpy_aligned,memzero_aligned}_neon

2-3 times faster than C.

Janne Grunau [Wed, 29 Oct 2014 17:17:48 +0000 (18:17 +0100)]
aarch64: x264_mbtree_propagate_{cost,list}_neon

x264_mbtree_propagate_cost_neon is ~7 times faster.
x264_mbtree_propagate_list_neon is 33% faster.

Janne Grunau [Tue, 21 Oct 2014 13:18:49 +0000 (15:18 +0200)]
aarch64: x264_denoise_dct_neon

3.5 times faster.
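The commit only reports the speedup. As an assumption about what such a routine computes, DCT-domain denoising of this kind can be sketched as follows: add each coefficient's magnitude to a running sum for later statistics, subtract a per-position offset from the magnitude, clamp at zero, and restore the sign. The names, types, and layout below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of DCT-domain noise reduction. */
static void denoise_dct_ref(int16_t *dct, uint32_t *sum,
                            const uint16_t *offset, int size)
{
    for (int i = 0; i < size; i++) {
        int level = dct[i];
        int mag = abs(level);
        sum[i] += (uint32_t)mag;   /* statistics for adapting the offsets */
        mag -= offset[i];          /* shrink the magnitude toward zero */
        if (mag < 0)
            mag = 0;
        dct[i] = (int16_t)(level < 0 ? -mag : mag);
    }
}
```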

Janne Grunau [Mon, 20 Oct 2014 11:12:14 +0000 (13:12 +0200)]
aarch64: x264_coeff_level_run{4,8,15,16}

All functions ~33% faster.

Janne Grunau [Tue, 14 Oct 2014 17:20:52 +0000 (19:20 +0200)]
aarch64: NEON asm for intra luma deblocking

deblock_luma_intra[0]_neon is 2 times faster,
deblock_luma_intra[1]_neon is ~4 times faster.

Janne Grunau [Mon, 13 Oct 2014 15:29:22 +0000 (17:29 +0200)]
aarch64: x264_deblock_h_chroma_422_neon

deblock_h_chroma_422 is 2.5 times faster.

Janne Grunau [Mon, 13 Oct 2014 10:43:50 +0000 (12:43 +0200)]
aarch64: x264_deblock_h_chroma_mbaff_neon

deblock_chroma_420_mbaff_neon is 2 times faster.

Janne Grunau [Fri, 10 Oct 2014 08:29:15 +0000 (10:29 +0200)]
aarch64: NEON asm for intra chroma deblocking

deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and
x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster.
deblock_chroma_intra[1] is ~4 times faster than C.

Janne Grunau [Tue, 2 Sep 2014 08:27:22 +0000 (10:27 +0200)]
aarch64: add myself as author to aarch64/mc.h

Janne Grunau [Thu, 14 Aug 2014 13:22:50 +0000 (14:22 +0100)]
aarch64: NEON asm for integral init

integral_init4h_neon and integral_init8h_neon are 3-4 times faster than
C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10
times faster.

Janne Grunau [Wed, 13 Aug 2014 12:30:53 +0000 (13:30 +0100)]
aarch64: NEON asm for 8x16c intra prediction

Between 10% and 40% faster than C.

Janne Grunau [Tue, 12 Aug 2014 15:26:10 +0000 (17:26 +0200)]
aarch64: NEON asm for decimate_score

decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times
faster than C.

Janne Grunau [Fri, 8 Aug 2014 10:19:35 +0000 (11:19 +0100)]
aarch64: implement x264_sub8x16_dct_dc_neon

4 times faster than C.

Janne Grunau [Thu, 7 Aug 2014 17:46:07 +0000 (19:46 +0200)]
aarch64: implement x264_pixel_asd8_neon

7 times faster than C.

Janne Grunau [Thu, 7 Aug 2014 14:49:12 +0000 (16:49 +0200)]
aarch64: NEON asm for 4x16 sad, satd and ssd

pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon:  4 times faster
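SAD (sum of absolute differences) and SSD (sum of squared differences) are simple block-matching costs. Reference C versions for a block 4 pixels wide and 16 rows tall, with 8-bit pixels, can be sketched as follows. SATD, which applies a Hadamard transform before summing, is omitted. The function names and signatures are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of absolute differences over a 4x16 block. */
static int sad_4x16(const uint8_t *a, ptrdiff_t sa,
                    const uint8_t *b, ptrdiff_t sb)
{
    int sum = 0;
    for (int y = 0; y < 16; y++, a += sa, b += sb)
        for (int x = 0; x < 4; x++)
            sum += abs(a[x] - b[x]);
    return sum;
}

/* Hypothetical reference: sum of squared differences over a 4x16 block. */
static int ssd_4x16(const uint8_t *a, ptrdiff_t sa,
                    const uint8_t *b, ptrdiff_t sb)
{
    int sum = 0;
    for (int y = 0; y < 16; y++, a += sa, b += sb)
        for (int x = 0; x < 4; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
    return sum;
}
```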

Janne Grunau [Wed, 30 Jul 2014 14:48:25 +0000 (15:48 +0100)]
aarch64: implement x264_pixel_ssd_nv12_core_neon

13 times faster than C.
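NV12 stores chroma as interleaved U,V byte pairs, so an SSD over such a plane can be accumulated separately for the even (U) and odd (V) bytes. A minimal sketch under that assumption; the name, signature, and 64-bit accumulators are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: SSD of two interleaved UV planes, accumulated
 * separately for U (even bytes) and V (odd bytes). width counts UV pairs. */
static void ssd_nv12_ref(const uint8_t *a, ptrdiff_t sa,
                         const uint8_t *b, ptrdiff_t sb,
                         int width, int height,
                         uint64_t *ssd_u, uint64_t *ssd_v)
{
    uint64_t su = 0, sv = 0;
    for (int y = 0; y < height; y++, a += sa, b += sb)
        for (int x = 0; x < width; x++) {
            int du = a[2*x]   - b[2*x];
            int dv = a[2*x+1] - b[2*x+1];
            su += (uint64_t)(du * du);
            sv += (uint64_t)(dv * dv);
        }
    *ssd_u = su;
    *ssd_v = sv;
}
```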

Janne Grunau [Tue, 29 Jul 2014 17:26:11 +0000 (18:26 +0100)]
aarch64: implement x264_pixel_vsad_neon

35 times faster than C.

Janne Grunau [Tue, 29 Jul 2014 10:06:24 +0000 (11:06 +0100)]
aarch64: NEON asm for missing x264_zigzag_* functions

zigzag_scan_4x4_field_neon, zigzag_sub_4x4_field_neon,
zigzag_sub_4x4ac_field_neon, zigzag_sub_4x4_frame_neon and
zigzag_sub_4x4ac_frame_neon are more than 2 times faster.

zigzag_scan_8x8_frame_neon, zigzag_scan_8x8_field_neon,
zigzag_sub_8x8_field_neon and zigzag_sub_8x8_frame_neon are 4-5 times faster.

zigzag_interleave_8x8_cavlc_neon is 6 times faster.
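A zigzag scan reorders a block's transform coefficients so that low frequencies come first. For 4x4 frame (progressive) blocks, H.264 uses the order in the table below, given as raster indices (row * 4 + column). The sketch assumes coefficients are stored row by row, which may not match x264's internal layout; the names are hypothetical.

```c
#include <stdint.h>

/* H.264 4x4 frame zigzag order as raster indices (row * 4 + column). */
static const uint8_t zigzag_4x4_frame[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Hypothetical reference: level[i] receives the i-th coefficient in scan order. */
static void zigzag_scan_4x4_frame_ref(int16_t level[16], const int16_t dct[16])
{
    for (int i = 0; i < 16; i++)
        level[i] = dct[zigzag_4x4_frame[i]];
}
```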

Janne Grunau [Fri, 25 Jul 2014 10:53:17 +0000 (11:53 +0100)]
aarch64: implement x264_pixel_sa8d_satd_16x16_neon

~20% faster than calling pixel_sa8d_16x16 and pixel_satd_16x16
separately.

Janne Grunau [Thu, 14 Aug 2014 21:13:27 +0000 (23:13 +0200)]
aarch64: optimize x264_predict_8x8c_dc_left_neon

25% faster than the previous version.

Henrik Gramner [Sat, 2 Aug 2014 16:26:18 +0000 (18:26 +0200)]
x86: Make AVX2 also imply FMA3

All CPUs with AVX2 support FMA3 (but not the other way around).
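In terms of CPU capability flags, this kind of implication amounts to setting one bit whenever another is set. A sketch with hypothetical flag names and bit values (x264's real `X264_CPU_*` constants may differ):

```c
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CPU_FMA3 (1u << 0)
#define CPU_AVX2 (1u << 1)

/* AVX2 implies FMA3; the reverse does not hold, so FMA3 alone
 * leaves AVX2 unset. */
static uint32_t normalize_cpu_flags(uint32_t flags)
{
    if (flags & CPU_AVX2)
        flags |= CPU_FMA3;
    return flags;
}
```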

Anton Mitrofanov [Thu, 13 Nov 2014 19:52:00 +0000 (22:52 +0300)]
Simplify libx264 API usage example

Henrik Gramner [Fri, 21 Nov 2014 22:47:20 +0000 (23:47 +0100)]
AvxSynth: Remove a bunch of unused cruft

Anton Mitrofanov [Wed, 3 Dec 2014 19:36:12 +0000 (22:36 +0300)]
Fix bugs/typos in motion compensation and cache_load

This didn't affect output, since the incorrect values were either not used in
that code path or produced the same results as the correct values.

Also deduplicate hpel_ref arrays.

Anton Mitrofanov [Sun, 30 Nov 2014 20:39:28 +0000 (23:39 +0300)]
checkasm: Fix undefined behavior warnings

Henrik Gramner [Sat, 29 Nov 2014 17:47:52 +0000 (18:47 +0100)]
checkasm: Fix V210 reporting

It would previously report FAILED if any of the earlier plane_copy tests failed.

Anton Mitrofanov [Sun, 12 Oct 2014 17:01:53 +0000 (21:01 +0400)]
Safety check against malicious high bit-depth input which could cause a crash