granicus.if.org Git - libx264/log

Janne Grunau [Mon, 13 Oct 2014 10:43:50 +0000 (12:43 +0200)]

aarch64: x264_deblock_h_chroma_mbaff_neon

deblock_chroma_420_mbaff_neon 2 times faster

Janne Grunau [Fri, 10 Oct 2014 08:29:15 +0000 (10:29 +0200)]

aarch64: NEON asm for intra chroma deblocking

deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and
x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster.
deblock_chroma_intra[1] is ~4 times faster than C.

Janne Grunau [Tue, 2 Sep 2014 08:27:22 +0000 (10:27 +0200)]

aarch64: add myself as author to aarch64/mc.h

Janne Grunau [Thu, 14 Aug 2014 13:22:50 +0000 (14:22 +0100)]

aarch64: NEON asm for integral init

integral_init4h_neon and integral_init8h_neon are 3-4 times faster than
C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10
times faster.

Janne Grunau [Wed, 13 Aug 2014 12:30:53 +0000 (13:30 +0100)]

aarch64: NEON asm for 8x16c intra prediction

Between 10% and 40% faster than C.

Janne Grunau [Tue, 12 Aug 2014 15:26:10 +0000 (17:26 +0200)]

aarch64: NEON asm for decimate_score

decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times
faster than C.

Janne Grunau [Fri, 8 Aug 2014 10:19:35 +0000 (11:19 +0100)]

aarch64: implement x264_sub8x16_dct_dc_neon

4 times faster than C.

Janne Grunau [Thu, 7 Aug 2014 17:46:07 +0000 (19:46 +0200)]

aarch64: implement x264_pixel_asd8_neon

7 times faster than C.

Janne Grunau [Thu, 7 Aug 2014 14:49:12 +0000 (16:49 +0200)]

aarch64: NEON asm for 4x16 sad, satd and ssd

pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon: 4 times faster
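For readers unfamiliar with the metrics, here is a minimal scalar sketch of what SAD and SSD compute over a 4x16 block. The function names, the 8-bit pixel type, and the stride parameters are assumptions made for illustration; this is not the x264 C reference or the NEON code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences over a 4x16 block. */
static int sad_4x16_ref(const uint8_t *a, ptrdiff_t stride_a,
                        const uint8_t *b, ptrdiff_t stride_b)
{
    int sum = 0;
    assert(a != NULL && b != NULL);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 4; x++)
            sum += abs(a[x] - b[x]);
        a += stride_a;  /* advance one row in each block */
        b += stride_b;
    }
    return sum;
}

/* Hypothetical helper: sum of squared differences over a 4x16 block. */
static int ssd_4x16_ref(const uint8_t *a, ptrdiff_t stride_a,
                        const uint8_t *b, ptrdiff_t stride_b)
{
    int sum = 0;
    assert(a != NULL && b != NULL);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 4; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}
```

SATD is omitted here because it additionally applies a Hadamard transform to the differences before summing.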

Janne Grunau [Wed, 30 Jul 2014 14:48:25 +0000 (15:48 +0100)]

aarch64: implement x264_pixel_ssd_nv12_core_neon

13 times faster than C.

Janne Grunau [Tue, 29 Jul 2014 17:26:11 +0000 (18:26 +0100)]

aarch64: implement x264_pixel_vsad_neon

35 times faster than C.

Janne Grunau [Tue, 29 Jul 2014 10:06:24 +0000 (11:06 +0100)]

aarch64: NEON asm for missing x264_zigzag_* functions

zigzag_scan_4x4_field_neon, zigzag_sub_4x4_field_neon,
zigzag_sub_4x4ac_field_neon, zigzag_sub_4x4_frame_neon,
zigzag_sub_4x4ac_frame_neon more than 2 times faster

zigzag_scan_8x8_frame_neon, zigzag_scan_8x8_field_neon,
zigzag_sub_8x8_field_neon, zigzag_sub_8x8_frame_neon 4-5 times faster

zigzag_interleave_8x8_cavlc_neon 6 times faster

Janne Grunau [Fri, 25 Jul 2014 10:53:17 +0000 (11:53 +0100)]

aarch64: implement x264_pixel_sa8d_satd_16x16_neon

~20% faster than calling pixel_sa8d_16x16 and pixel_satd_16x16
separately.

Janne Grunau [Thu, 14 Aug 2014 21:13:27 +0000 (23:13 +0200)]

aarch64: optimize x264_predict_8x8c_dc_left_neon

25% faster than the previous version.

Henrik Gramner [Sat, 2 Aug 2014 16:26:18 +0000 (18:26 +0200)]

x86: Make AVX2 also imply FMA3

All CPUs with AVX2 support FMA3 (but not the other way around).

Anton Mitrofanov [Thu, 13 Nov 2014 19:52:00 +0000 (22:52 +0300)]

Simplify libx264 API usage example

Henrik Gramner [Fri, 21 Nov 2014 22:47:20 +0000 (23:47 +0100)]

AvxSynth: Remove a bunch of unused cruft

Anton Mitrofanov [Wed, 3 Dec 2014 19:36:12 +0000 (22:36 +0300)]

Fix bugs/typos in motion compensation and cache_load

Didn't affect output due to the incorrect values either not being used in the
code path or producing equal results compared to the correct values.

Also deduplicate hpel_ref arrays.

Anton Mitrofanov [Sun, 30 Nov 2014 20:39:28 +0000 (23:39 +0300)]

checkasm: Fix undefined behavior warnings

Henrik Gramner [Sat, 29 Nov 2014 17:47:52 +0000 (18:47 +0100)]

checkasm: Fix V210 reporting

It would previously report FAILED if any of the earlier plane_copy tests failed.

Anton Mitrofanov [Sun, 12 Oct 2014 17:01:53 +0000 (21:01 +0400)]

Safety check against malicious high bit-depth input which could cause crash

Anton Mitrofanov [Sun, 12 Oct 2014 16:45:40 +0000 (20:45 +0400)]

libx264 API usage example

Henrik Gramner [Fri, 17 Oct 2014 19:35:42 +0000 (21:35 +0200)]

x86: AVX2 high bit-depth var_16x16

40->27 cycles on Haswell.

Henrik Gramner [Wed, 8 Oct 2014 20:25:35 +0000 (22:25 +0200)]

checkasm: Serialize read_time() calls on x86

Improves the accuracy of benchmarks, especially in short functions.

To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."

RDTSCP would accomplish the same task, but it's only available since Nehalem.

This change makes SSE2 a requirement to run checkasm.
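The LFENCE;RDTSC sequence quoted above can be sketched with compiler intrinsics. This is an illustration only, not the checkasm implementation: the function name `read_time_serialized` and the macro `HAVE_SERIALIZED_TSC` are hypothetical, and the sketch assumes GCC or Clang with `<x86intrin.h>` available. On other targets it falls back to a stub that returns 0.

```c
#include <stdint.h>

#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define HAVE_SERIALIZED_TSC 1

/* Hypothetical helper: read the time-stamp counter only after all
 * earlier instructions have completed locally. */
static inline uint64_t read_time_serialized(void)
{
    _mm_lfence();     /* LFENCE: SSE2 instruction, hence the SSE2 requirement */
    return __rdtsc(); /* RDTSC: read the time-stamp counter */
}
#else
#define HAVE_SERIALIZED_TSC 0

/* Stub for non-x86 targets so the sketch still compiles. */
static inline uint64_t read_time_serialized(void)
{
    return 0;
}
#endif
```

A benchmark would call the helper before and after the function under test and subtract the two readings.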

Vittorio Giovara [Mon, 29 Sep 2014 17:51:30 +0000 (18:51 +0100)]

Support case-independent string options
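As a rough illustration of case-independent option matching, the sketch below looks up a user-supplied string in a NULL-terminated list of names while ignoring ASCII case. Both helper names and the example option values are hypothetical; the commit's actual code is not shown in this log.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a and b are equal ignoring ASCII case. */
static int names_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical helper: index of arg in a NULL-terminated name list, or -1. */
static int lookup_option_value(const char *arg, const char *const *names)
{
    assert(arg != NULL && names != NULL);
    for (int i = 0; names[i] != NULL; i++)
        if (names_equal_nocase(arg, names[i]))
            return i;
    return -1;
}
```

With a list such as `{ "off", "fast", "slow", NULL }`, the inputs `fast`, `Fast` and `FAST` all resolve to index 1.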

Anton Mitrofanov [Sat, 6 Sep 2014 16:44:49 +0000 (20:44 +0400)]

Shut up gcc -Wuninitialized warnings

Anton Mitrofanov [Fri, 5 Sep 2014 15:43:52 +0000 (19:43 +0400)]

Shut up clang -Wuninitialized warning

Anton Mitrofanov [Fri, 5 Sep 2014 15:30:47 +0000 (19:30 +0400)]

Fix a few clang -Wunused-* warnings

Anton Mitrofanov [Thu, 28 Aug 2014 16:13:13 +0000 (20:13 +0400)]

Fix inappropriate instruction use

Anton Mitrofanov [Thu, 28 Aug 2014 14:38:53 +0000 (18:38 +0400)]

x264asm: warn when inappropriate instruction used in function with specified cpuflags

Anton Mitrofanov [Mon, 1 Sep 2014 21:48:00 +0000 (01:48 +0400)]

Fix VBV with true VFR streams

Anton Mitrofanov [Mon, 1 Sep 2014 18:45:00 +0000 (22:45 +0400)]

Fix VBV

Anton Mitrofanov [Tue, 29 Jul 2014 23:03:32 +0000 (03:03 +0400)]

Update to the current lavf API and fix memory leak when using --seek

Henrik Gramner [Mon, 4 Aug 2014 23:42:55 +0000 (01:42 +0200)]

x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags

Previously there was a limit of two cpuflags.

Henrik Gramner [Mon, 4 Aug 2014 23:42:51 +0000 (01:42 +0200)]

x86: Minor pixel_ssim_end4 improvements

Reduce the number of vector registers used from 7 to 5.
Eliminate some moves in the AVX implementation.
Avoid bypass delays for transitioning between int and float domains.

Henrik Gramner [Mon, 4 Aug 2014 23:42:47 +0000 (01:42 +0200)]

x86: Faster quant_4x4x4

Also drop the MMX version instead of doing a bunch of ifdeffery to support it after this change.

Anton Mitrofanov [Sun, 10 Aug 2014 18:46:12 +0000 (22:46 +0400)]

configure: improve cc_check for clang and ICL to not ignore unknown options

Henrik Gramner [Mon, 4 Aug 2014 23:42:44 +0000 (01:42 +0200)]

checkasm: Only call x264_cpu_detect() once

Janne Grunau [Fri, 18 Jul 2014 13:49:10 +0000 (14:49 +0100)]

aarch64: deblocking NEON asm

Deblock chroma/luma are based on libav's h264 aarch64 NEON deblocking
filter which was ported by me from the existing ARM NEON asm. No
additional persons to ask for a relicense.

Janne Grunau [Fri, 18 Jul 2014 08:29:35 +0000 (09:29 +0100)]

aarch64: intra prediction NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Thu, 17 Jul 2014 14:58:44 +0000 (15:58 +0100)]

aarch64: motion compensation NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Wed, 16 Jul 2014 09:03:52 +0000 (10:03 +0100)]

aarch64: transform and zigzag NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Tue, 15 Jul 2014 11:57:03 +0000 (12:57 +0100)]

aarch64: quantization and level-run NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Wed, 19 Mar 2014 12:48:21 +0000 (13:48 +0100)]

aarch64: pixel metrics NEON asm

Ported from the ARM NEON asm.

Janne Grunau [Fri, 18 Jul 2014 15:44:57 +0000 (17:44 +0200)]

aarch64: add utility functions for asm

Janne Grunau [Wed, 19 Mar 2014 12:45:17 +0000 (13:45 +0100)]

aarch64: add armv8 and neon cpu flags and test them

Janne Grunau [Tue, 18 Mar 2014 21:10:24 +0000 (22:10 +0100)]

aarch64: initial build support

Janne Grunau [Tue, 22 Jul 2014 17:28:27 +0000 (19:28 +0200)]

checkasm: test zigzag_sub_8x8_{frame,field}

Janne Grunau [Sun, 20 Jul 2014 16:29:01 +0000 (18:29 +0200)]

arm: use long multiplication in mc_weight_w*_neon

9-19% faster on a cortex-a9.

Janne Grunau [Sun, 20 Jul 2014 16:24:57 +0000 (18:24 +0200)]

arm: do not use aligned stores in mc_weight_w4_*neon

mc_weight_w4_*neon is also used for width 2 which does not guarantee
4-byte aligned destination. Fixes crashes caused by random memory
corruption.

Janne Grunau [Wed, 2 Apr 2014 14:31:28 +0000 (16:31 +0200)]

checkasm: add memory clobber to read_time inline asm

The memory clobber acts as a compiler barrier, preventing aggressive
reordering of read_time calls. gcc 4.8 reorders some of the initial
read_time calls after the second one when targeting arm.
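A minimal sketch of the idea, assuming GCC or Clang inline asm syntax: an empty asm statement with a "memory" clobber emits no instructions but stops the compiler from moving memory accesses across it. The helper name `read_time_barrier` is hypothetical, and the portable `clock()` call stands in for the cycle-counter read that checkasm performs in its own inline asm.

```c
#include <time.h>

/* Hypothetical helper: a timer read fenced by compiler barriers. */
static inline clock_t read_time_barrier(void)
{
    clock_t t;
    /* Compiler-only barrier: no CPU instruction is emitted. */
    __asm__ volatile("" ::: "memory");
    t = clock(); /* stand-in for the actual counter read */
    __asm__ volatile("" ::: "memory");
    return t;
}
```

In the commit the clobber is attached to the asm statement that reads the counter itself, so no separate barrier statement is needed there.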

Janne Grunau [Sun, 20 Jul 2014 11:32:10 +0000 (13:32 +0200)]

arm: check if the assembler supports the '.func' directive

The integrated assembler in llvm trunk (to be released as 3.5) is
otherwise capable enough to assemble the arm asm correctly.

Janne Grunau [Sun, 20 Jul 2014 11:40:28 +0000 (13:40 +0200)]

arm/ppc: use $CC as default assembler

Janne Grunau [Sun, 20 Jul 2014 11:34:27 +0000 (13:34 +0200)]

arm: move instructions after '.rept' to separate line

The gas manual states "Repeat the sequence of lines between the .rept
directive and the next .endr directive ...". GNU as seems to support
instructions on the same line as .rept anyway but the integrated
assembler in llvm trunk (to be released as 3.5 in August 2014) does not.

Janne Grunau [Sun, 20 Jul 2014 11:08:17 +0000 (13:08 +0200)]

arm: set .arch/.fpu from asm.S

Janne Grunau [Sun, 20 Jul 2014 10:55:53 +0000 (12:55 +0200)]

arm: do not append CFLAGS to ASFLAGS

Tristan Matthews [Thu, 17 Jul 2014 04:03:50 +0000 (00:03 -0400)]

filters: fix sizeof mismatch

Anton Mitrofanov [Thu, 31 Jul 2014 12:17:32 +0000 (16:17 +0400)]

Fix memory leak when using select_every filter

Tsukasa OMOTO [Sun, 20 Jul 2014 13:17:11 +0000 (22:17 +0900)]

Fix cltostr.sh on OS X

Fiona Glaser [Wed, 9 Jul 2014 19:21:33 +0000 (12:21 -0700)]

Check pf_log is set in validate_parameters

Help remind people to call x264_param_default in case they didn't read the
documentation.

Anton Mitrofanov [Wed, 9 Jul 2014 13:17:04 +0000 (17:17 +0400)]

Check malloc during frame dumping

Yusuke Nakamura [Wed, 18 Jun 2014 20:21:29 +0000 (05:21 +0900)]

mp4_lsmash: Use new I/O API instead of deprecated one.

Anton Mitrofanov [Sun, 8 Jun 2014 18:19:46 +0000 (22:19 +0400)]

Remove meaningless use of abs()

Steven Walters [Sat, 31 May 2014 14:31:16 +0000 (10:31 -0400)]

MSVS 2013 Update 2 support

The first MSVS compiler C99 compliant enough to build x264.
Use `CC=cl ./configure` to compile with it.

Diego Biurrun [Tue, 15 Apr 2014 20:54:08 +0000 (22:54 +0200)]

configure: Add -Wno-maybe-uninitialized to CFLAGS

The warnings generated by -Wmaybe-uninitialized are mostly spurious.

Diego Biurrun [Wed, 7 May 2014 11:20:43 +0000 (13:20 +0200)]

build: Replace cltostr.pl by a shell script

This avoids a dependency on Perl to build OpenCL support.

Diego Biurrun [Tue, 15 Apr 2014 21:02:39 +0000 (23:02 +0200)]

build: Simplify phony target declaration with wildcards

Also add etags to list of phony targets.

Diego Biurrun [Wed, 7 May 2014 10:47:37 +0000 (12:47 +0200)]

configure: Drop workaround for obsolete gcc 4.2 on ARM

Diego Biurrun [Wed, 7 May 2014 19:43:15 +0000 (21:43 +0200)]

build: Add dependencies on x86inc.asm/x86util.asm for all .asm files

This is a little bit overzealous, but errs on the side of caution.
Generating full dependency information is also possible, but slightly
slows down the build as YASM cannot do it as a side effect of compilation.

Diego Biurrun [Sun, 27 Apr 2014 19:09:54 +0000 (21:09 +0200)]

Delete all SPARC optimizations

SPARC has been obsolete for a long time and makes little sense as an
H.264 encoding platform.

Also update authors file.

Diego Biurrun [Wed, 7 May 2014 10:46:42 +0000 (12:46 +0200)]

configure: Don't check for libavcore

libavcore was a never-released bad idea with a short lifespan.

Diego Biurrun [Sun, 27 Apr 2014 21:19:04 +0000 (23:19 +0200)]

build: Set all ASFLAGS from within configure

This is how all other toolchain flags are handled.

Diego Biurrun [Sun, 27 Apr 2014 21:23:49 +0000 (23:23 +0200)]

opencl: Check return value of fread()

common/opencl.c:138:10: warning: ignoring return value of 'fread', declared with attribute warn_unused_result [-Wunused-result]
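The warning is typically addressed by comparing the number of items `fread()` reports against the number requested. The sketch below shows that pattern with a hypothetical helper, `read_exact`; it is not the change made in common/opencl.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read exactly size bytes from f into buf.
 * Returns 0 on success, -1 on a short read or I/O error. */
static int read_exact(FILE *f, void *buf, size_t size)
{
    assert(f != NULL && buf != NULL);
    if (size == 0)
        return 0;
    /* fread returns the number of complete items read; with an item
     * size of 1 this is the byte count. */
    return fread(buf, 1, size, f) == size ? 0 : -1;
}
```

A caller can then free its buffer and bail out when the helper returns -1 instead of continuing with partially filled data.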

Fiona Glaser [Sun, 20 Jul 2014 03:34:22 +0000 (20:34 -0700)]

Disable i8x8 in lossless

x264's implementation was slightly incorrect due to a vague spec, so some
decoders decoded video incorrectly.

Minimal impact on compression.

Thomas Mundt [Fri, 27 Jun 2014 18:12:06 +0000 (11:12 -0700)]

AVC-Intra: fix compatibility with Avid Transfermanager

Henrik Gramner [Tue, 8 Jul 2014 19:15:32 +0000 (21:15 +0200)]

x86: Fix SIGILL in high bit-depth intra_sad_x3_4x4_sse2

An SSE3 instruction was used in an SSE2 function.

Anton Mitrofanov [Wed, 9 Jul 2014 13:01:54 +0000 (17:01 +0400)]

Fix incorrect row predictor addressing

Somehow managed to not cause things to explode, but was clearly incorrect.
Might improve VBV in some cases to have this working right.

Anton Mitrofanov [Sat, 21 Jun 2014 19:52:39 +0000 (23:52 +0400)]

Fix b-pyramid MMCO remove for frame-packing==5

Tal Aloni [Tue, 17 Jun 2014 22:10:56 +0000 (15:10 -0700)]

Fix frame-packing==5 with some decoders

The spec mandates that frame-packing==5 requires the SEI on every frame that
begins a view sequence (i.e. the input frames L0-R0-L1-R1 have 4 view sequences,
but if reordered by the encoder to L0-L1-R0-R1 there are now 2 view sequences).
For simplicity, we write the SEI on every frame.

This fixes frame-packing==5 3D playback on some decoders (PlayStation 3, Sony
W8 series, possibly others).
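The view-sequence counts in the message can be reproduced with a small sketch. It assumes that a view sequence is a maximal run of consecutive frames belonging to the same view, which is an interpretation consistent with the two counts quoted above rather than the spec's wording. The function name and the 'L'/'R' encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: views[i] is 'L' or 'R' for the i-th frame in
 * coded order. Counts maximal runs of frames with the same view. */
static size_t count_view_sequences(const char *views, size_t n)
{
    size_t count = 0;
    assert(views != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (i == 0 || views[i] != views[i - 1])
            count++; /* a new run starts wherever the view changes */
    return count;
}
```

Under that assumption the order L0-R0-L1-R1 ("LRLR") yields 4 and the reordered L0-L1-R0-R1 ("LLRR") yields 2. Writing the SEI on every frame, as the commit does, avoids having to track these boundaries at all.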

Anton Mitrofanov [Thu, 22 May 2014 09:27:00 +0000 (13:27 +0400)]

Fix pixel_ssim_end4 asm function for x86_64 systems

James Almer [Wed, 9 Apr 2014 06:33:06 +0000 (03:33 -0300)]

x86: XOP pixel_sad_{x3, x4} high bit-depth

James Almer [Wed, 9 Apr 2014 06:33:05 +0000 (03:33 -0300)]

x86: XOP pixel_ssd_nv12_core

James Almer [Wed, 9 Apr 2014 06:33:04 +0000 (03:33 -0300)]

x86util: XOP optimized HADDD

James Almer [Wed, 9 Apr 2014 06:33:03 +0000 (03:33 -0300)]

x86: add missing initialization for high bit-depth sa8d_satd

James Almer [Sun, 6 Apr 2014 02:46:31 +0000 (23:46 -0300)]

x86: add missing initializations for high bit-depth variance

Janne Grunau [Tue, 1 Apr 2014 20:11:45 +0000 (22:11 +0200)]

arm: use the weight_fn_t typedef for mc weight function arrays

Janne Grunau [Tue, 1 Apr 2014 20:11:44 +0000 (22:11 +0200)]

arm: correct x264_mc_chroma_neon function declaration

Janne Grunau [Tue, 1 Apr 2014 20:11:43 +0000 (22:11 +0200)]

arm: do not export every asm function

Based on Libav's libavutil/arm/asm.S. Also prevents having the same
label twice for every function on systems not defining EXTERN_ASM.
Clang's integrated assembler does not like it.

Janne Grunau [Tue, 1 Apr 2014 20:11:42 +0000 (22:11 +0200)]

arm: move all .macro/.endm to column 0

William Grant [Sun, 23 Mar 2014 16:21:52 +0000 (09:21 -0700)]

aarch64: require PIC in shared mode

Janne Grunau [Sun, 16 Mar 2014 16:21:58 +0000 (17:21 +0100)]

arm: x264_coeff_last8_arm

checkasm --bench on a cortex-a9:
coeff_last8_c: 173
coeff_last8_armv6: 151

60 instead of 73 cycles in ~130k runs on the same cpu while encoding.

Janne Grunau [Sat, 15 Mar 2014 19:09:18 +0000 (20:09 +0100)]

arm: x264_store_interleave_chroma_neon

store_interleave_chroma_c: 4036
store_interleave_chroma_neon: 1043

Janne Grunau [Sat, 15 Mar 2014 18:55:50 +0000 (19:55 +0100)]

arm: x264_plane_copy_interleave_neon

plane_copy_interleave_c: 40285
plane_copy_interleave_neon: 10137

Janne Grunau [Sat, 15 Mar 2014 18:21:12 +0000 (19:21 +0100)]

arm: x264_plane_copy_deinterleave_rgb_neon

plane_copy_deinterleave_rgb_c: 31543
plane_copy_deinterleave_rgb_neon: 8312

Janne Grunau [Sat, 15 Mar 2014 17:22:49 +0000 (18:22 +0100)]

arm: load_deinterleave_chroma_f{dec,enc}_neon

load_deinterleave_chroma_fdec_c: 4055
load_deinterleave_chroma_fdec_neon: 995
load_deinterleave_chroma_fenc_c: 4071
load_deinterleave_chroma_fenc_neon: 992

Janne Grunau [Sat, 15 Mar 2014 16:22:08 +0000 (17:22 +0100)]

arm: x264_plane_copy_deinterleave_neon

plane_copy_deinterleave_c: 42988
plane_copy_deinterleave_neon: 10184

Janne Grunau [Sat, 15 Mar 2014 12:29:41 +0000 (13:29 +0100)]

arm: implement deblock_strength_neon

Based on deblock_strength_avx.

checkasm --bench on a cortex-a9:
deblock_strength_c: 14611
deblock_strength_neon: 1848

Janne Grunau [Sat, 15 Mar 2014 09:51:11 +0000 (10:51 +0100)]

arm: add missing macro instantiation for x264_pixel_avg_4x16_neon

checkasm --bench on a cortex-a9:
avg_4x16_c: 8910
avg_4x16_neon: 2091

Janne Grunau [Thu, 13 Mar 2014 00:02:13 +0000 (01:02 +0100)]

arm: implement x264_predict_4x4_v_armv6

Alone probably not worth it but allows use of predict_4x4_dc|h_armv6
in intra_sad|satd_x3_4x4_neon.

Roland Stigge [Sun, 23 Mar 2014 16:29:37 +0000 (09:29 -0700)]

ppc: fix build on certain PowerPC variants without Altivec
