granicus.if.org Git - libx264/log
Henrik Gramner [Tue, 23 Jun 2015 11:24:29 +0000 (13:24 +0200)]
rdo: Fix potential CAVLC overflow issues
Henrik Gramner [Tue, 23 Jun 2015 20:08:35 +0000 (22:08 +0200)]
slurp_file: Various minor bug fixes
* Fix unsigned <= 0 check.
* Add additional size sanity check on 32-bit systems.
* Don't read uninitialized data if fread() fails.
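The three fixes listed above can be illustrated with a minimal C sketch. This is not x264's slurp_file: read_whole_file is a hypothetical helper, and the checks in the actual patch may differ. It only shows the general pattern of testing the sign of the length before converting it to an unsigned type, rejecting sizes that do not fit in size_t, and discarding the buffer when fread() returns a short count.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not x264's slurp_file): returns a NUL-terminated
 * heap copy of the file, or NULL on any error. */
char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *fh = fopen(path, "rb");
    if (!fh)
        return NULL;

    char *buf = NULL;
    long len = -1;
    if (fseek(fh, 0, SEEK_END) == 0)
        len = ftell(fh);                 /* signed: -1 signals failure */

    /* Check the sign before converting; an unsigned "<= 0" test would
     * only ever catch 0. Also keep room for the terminator, which
     * matters when the offset type is wider than size_t. */
    if (len > 0 && (uintmax_t)len < SIZE_MAX && fseek(fh, 0, SEEK_SET) == 0) {
        size_t size = (size_t)len;
        buf = malloc(size + 1);
        if (buf && fread(buf, 1, size, fh) != size) {
            free(buf);                   /* short read: do not expose garbage */
            buf = NULL;
        }
        if (buf) {
            buf[size] = '\0';
            if (out_size)
                *out_size = size;
        }
    }
    fclose(fh);
    return buf;
}
```

The caller owns the returned buffer and releases it with free(). Empty files are treated as an error in this sketch, which is an assumption rather than something the log states.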
Henrik Gramner [Tue, 23 Jun 2015 20:47:53 +0000 (22:47 +0200)]
param_parse: Check strdup() return value
Henrik Gramner [Tue, 23 Jun 2015 13:38:16 +0000 (15:38 +0200)]
param_parse: Fix memory leak
Anton Mitrofanov [Fri, 19 Jun 2015 13:01:12 +0000 (16:01 +0300)]
Add FreeBSD's stdint.h header guard to allowed list
Patch written by Koop Mast <kwm@FreeBSD.org>
Henrik Gramner [Fri, 22 May 2015 17:23:33 +0000 (19:23 +0200)]
x86: Prevent overread of src in plane_copy_interleave
Could only occur in 4:2:2 with height == 1.
Also enable asm for inputs with different U/V strides as long as the strides
have identical signs.
Anton Mitrofanov [Wed, 20 May 2015 20:10:20 +0000 (23:10 +0300)]
checkasm: Fix incorrect memcmp size for ARM architecture
Anton Mitrofanov [Sun, 26 Apr 2015 17:51:05 +0000 (20:51 +0300)]
Fix possible use of uninitialized MVs in lookahead analysis for B-frames
Anton Mitrofanov [Tue, 21 Apr 2015 20:08:19 +0000 (23:08 +0300)]
Catch incorrect usage of libx264 API for delayed frames flushing
Anton Mitrofanov [Sat, 7 Mar 2015 20:00:09 +0000 (23:00 +0300)]
Fix detection of system libx264 configuration
Anton Mitrofanov [Mon, 23 Feb 2015 11:23:18 +0000 (14:23 +0300)]
Cosmetic changes
Anton Mitrofanov [Tue, 30 Dec 2014 23:15:05 +0000 (02:15 +0300)]
Update configure for auto detection of system libx264 configuration
Anton Mitrofanov [Tue, 3 Feb 2015 11:51:28 +0000 (14:51 +0300)]
Add tile format frame packing value
Defined in 2014-02 edition.
Anton Mitrofanov [Tue, 3 Feb 2015 10:39:14 +0000 (13:39 +0300)]
Stricter validation of crop-rect values
Vittorio Giovara [Tue, 20 Jan 2015 16:15:56 +0000 (16:15 +0000)]
Add mono frame packing value
Defined in 2013-04 edition.
Vittorio Giovara [Tue, 20 Jan 2015 15:57:41 +0000 (15:57 +0000)]
Validate frame packing value instead of clipping
Christophe Gisquet [Tue, 3 Feb 2015 19:40:41 +0000 (20:40 +0100)]
x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Christophe Gisquet [Tue, 3 Feb 2015 17:02:30 +0000 (18:02 +0100)]
x86inc: Fix instantiation of YMM registers
Vittorio Giovara [Tue, 20 Jan 2015 16:28:54 +0000 (16:28 +0000)]
matroska: Correctly write display width and height in stereo mode
According to the specifications, when stereo mode is set, these values
represent the single view size.
Kieran Kunhya [Tue, 20 Jan 2015 15:38:00 +0000 (09:38 -0600)]
Use POC type 0 for AVC-Intra
Based on a patch from Capella Systems
Anton Mitrofanov [Sat, 3 Jan 2015 12:46:19 +0000 (15:46 +0300)]
Fix ARCH variable name conflict with BSD ports (bsd.port.mk) read-only variable
Anton Mitrofanov [Sat, 27 Dec 2014 17:35:39 +0000 (20:35 +0300)]
Fix negative percentages in final stats output
They were caused by integer overflow when encoding long UHD video.
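The log does not say which counter overflowed, so the following C sketch only illustrates the general failure mode. percent_of is a hypothetical helper: multiplying a 32-bit count by 100 before dividing wraps once the count exceeds INT_MAX / 100 (about 21.4 million), and widening the operands before the multiply avoids that.

```c
#include <stdint.h>

/* Hypothetical helper: share of `part` in `total`, in percent.
 * With 32-bit int, `part * 100 / total` overflows once part exceeds
 * INT_MAX / 100, which can yield a negative result. Taking 64-bit
 * inputs and multiplying in double keeps the value in range. */
double percent_of(int64_t part, int64_t total)
{
    if (total <= 0)
        return 0.0;                      /* avoid division by zero */
    return 100.0 * (double)part / (double)total;
}
```

For example, percent_of(30000000, 60000000) yields 50.0, whereas a 32-bit `30000000 * 100` would already be out of range.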
Anton Mitrofanov [Sat, 3 Jan 2015 20:35:23 +0000 (23:35 +0300)]
Bump dates to 2015
Anton Mitrofanov [Mon, 15 Dec 2014 15:49:23 +0000 (18:49 +0300)]
x86: Update intel compiler cpu dispatcher override for new versions of ICC/ICL
Anton Mitrofanov [Tue, 6 Sep 2011 17:53:29 +0000 (21:53 +0400)]
New AQ mode: auto-variance AQ with bias to dark scenes
Also known as --aq-mode 3 or auto-variance AQ modification.
Anton Mitrofanov [Tue, 28 Aug 2012 23:02:27 +0000 (03:02 +0400)]
Improve HRD conformance
Henrik Gramner [Fri, 28 Nov 2014 22:24:56 +0000 (23:24 +0100)]
x86: SSE and AVX implementations of plane_copy
Also remove the MMX2 implementation and fix src overread for height == 1.
Anton Mitrofanov [Mon, 29 Sep 2014 19:26:19 +0000 (23:26 +0400)]
Update to the latest version of gas-preprocessor.pl from http://git.libav.org/?p=gas-preprocessor.git
Contributions by Janne Grunau, Martin Storsjo, Mans Rullgard, David Conrad, Martin Aumuller and others
Janne Grunau [Tue, 18 Nov 2014 23:33:55 +0000 (00:33 +0100)]
aarch64: cabac_encode_{decision,bypass,terminal}_asm
benchmarks on a Nexus 9 (nvidia denver):
101.3 cycles in x264_cabac_encode_decision_c,
67105369 runs, 3495 skips
97.3 cycles in x264_cabac_encode_decision_asm,
67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c,
1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm,
1048424 runs, 152 skips
92.4 cycles in x264_cabac_encode_bypass_c,
16776192 runs, 1024 skips
89.6 cycles in x264_cabac_encode_bypass_asm,
16776453 runs, 763 skips
Cycle counts are not as stable as one would like. The dynamic code
optimisation seems to produce different results for small changes in a
binary. Repeated runs with the same binary produce stable results
though (ignoring the first run).
Janne Grunau [Thu, 6 Nov 2014 08:20:17 +0000 (09:20 +0100)]
checkasm: add cycle counter read for aarch64
Needs kernel support since user space access to the cycle counter is not
allowed on all available AArch64 systems (Android 5 and iOS).
Janne Grunau [Wed, 5 Nov 2014 10:35:13 +0000 (11:35 +0100)]
aarch64: nal_escape_neon
3-4 times faster.
Janne Grunau [Fri, 31 Oct 2014 13:49:04 +0000 (14:49 +0100)]
aarch64: {plane_copy,memcpy_aligned,memzero_aligned}_neon
2-3 times faster than C.
Janne Grunau [Wed, 29 Oct 2014 17:17:48 +0000 (18:17 +0100)]
aarch64: x264_mbtree_propagate_{cost,list}_neon
x264_mbtree_propagate_cost_neon is ~7 times faster.
x264_mbtree_propagate_list_neon is 33% faster.
Janne Grunau [Tue, 21 Oct 2014 13:18:49 +0000 (15:18 +0200)]
aarch64: x264_denoise_dct_neon
3.5 times faster.
Janne Grunau [Mon, 20 Oct 2014 11:12:14 +0000 (13:12 +0200)]
aarch64: x264_coeff_level_run{4,8,15,16}
All functions ~33% faster.
Janne Grunau [Tue, 14 Oct 2014 17:20:52 +0000 (19:20 +0200)]
aarch64: NEON asm for intra luma deblocking
deblock_luma_intra[0]_neon is 2 times faster,
deblock_luma_intra[1]_neon is ~4 times faster.
Janne Grunau [Mon, 13 Oct 2014 15:29:22 +0000 (17:29 +0200)]
aarch64: x264_deblock_h_chroma_422_neon
deblock_h_chroma_422 2.5 times faster
Janne Grunau [Mon, 13 Oct 2014 10:43:50 +0000 (12:43 +0200)]
aarch64: x264_deblock_h_chroma_mbaff_neon
deblock_chroma_420_mbaff_neon 2 times faster
Janne Grunau [Fri, 10 Oct 2014 08:29:15 +0000 (10:29 +0200)]
aarch64: NEON asm for intra chroma deblocking
deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and
x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster.
deblock_chroma_intra[1] is ~4 times faster than C.
Janne Grunau [Tue, 2 Sep 2014 08:27:22 +0000 (10:27 +0200)]
aarch64: add myself as author to aarch64/mc.h
Janne Grunau [Thu, 14 Aug 2014 13:22:50 +0000 (14:22 +0100)]
aarch64: NEON asm for integral init
integral_init4h_neon and integral_init8h_neon are 3-4 times faster than
C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10
times faster.
Janne Grunau [Wed, 13 Aug 2014 12:30:53 +0000 (13:30 +0100)]
aarch64: NEON asm for 8x16c intra prediction
Between 10% and 40% faster than C.
Janne Grunau [Tue, 12 Aug 2014 15:26:10 +0000 (17:26 +0200)]
aarch64: NEON asm for decimate_score
decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times
faster than C.
Janne Grunau [Fri, 8 Aug 2014 10:19:35 +0000 (11:19 +0100)]
aarch64: implement x264_sub8x16_dct_dc_neon
4 times faster than C.
Janne Grunau [Thu, 7 Aug 2014 17:46:07 +0000 (19:46 +0200)]
aarch64: implement x264_pixel_asd8_neon
7 times faster than C.
Janne Grunau [Thu, 7 Aug 2014 14:49:12 +0000 (16:49 +0200)]
aarch64: NEON asm for 4x16 sad, satd and ssd
pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon: 4 times faster
Janne Grunau [Wed, 30 Jul 2014 14:48:25 +0000 (15:48 +0100)]
aarch64: implement x264_pixel_ssd_nv12_core_neon
13 times faster than C.
Janne Grunau [Tue, 29 Jul 2014 17:26:11 +0000 (18:26 +0100)]
aarch64: implement x264_pixel_vsad_neon
35 times faster than C.
Janne Grunau [Tue, 29 Jul 2014 10:06:24 +0000 (11:06 +0100)]
aarch64: NEON asm for missing x264_zigzag_* functions
zigzag_scan_4x4_field_neon, zigzag_sub_4x4_field_neon,
zigzag_sub_4x4ac_field_neon, zigzag_sub_4x4_frame_neon,
zigzag_sub_4x4ac_frame_neon more than 2 times faster
zigzag_scan_8x8_frame_neon, zigzag_scan_8x8_field_neon,
zigzag_sub_8x8_field_neon, zigzag_sub_8x8_frame_neon 4-5 times faster
zigzag_interleave_8x8_cavlc_neon 6 times faster
Janne Grunau [Fri, 25 Jul 2014 10:53:17 +0000 (11:53 +0100)]
aarch64: implement x264_pixel_sa8d_satd_16x16_neon
~20% faster than calling pixel_sa8d_16x16 and pixel_satd_16x16
separately.
Janne Grunau [Thu, 14 Aug 2014 21:13:27 +0000 (23:13 +0200)]
aarch64: optimize x264_predict_8x8c_dc_left_neon
25% faster than the previous version.
Henrik Gramner [Sat, 2 Aug 2014 16:26:18 +0000 (18:26 +0200)]
x86: Make AVX2 also imply FMA3
All CPUs with AVX2 support FMA3 (but not the other way around).
Anton Mitrofanov [Thu, 13 Nov 2014 19:52:00 +0000 (22:52 +0300)]
Simplify libx264 API usage example
Henrik Gramner [Fri, 21 Nov 2014 22:47:20 +0000 (23:47 +0100)]
AvxSynth: Remove a bunch of unused cruft
Anton Mitrofanov [Wed, 3 Dec 2014 19:36:12 +0000 (22:36 +0300)]
Fix bugs/typos in motion compensation and cache_load
Didn't affect output due to the incorrect values either not being used in the
code path or producing equal results compared to the correct values.
Also deduplicate hpel_ref arrays.
Anton Mitrofanov [Sun, 30 Nov 2014 20:39:28 +0000 (23:39 +0300)]
checkasm: Fix undefined behavior warnings
Henrik Gramner [Sat, 29 Nov 2014 17:47:52 +0000 (18:47 +0100)]
checkasm: Fix V210 reporting
It would previously report FAILED if any of the earlier plane_copy tests failed.
Anton Mitrofanov [Sun, 12 Oct 2014 17:01:53 +0000 (21:01 +0400)]
Safety check against malicious high bit-depth input which could cause crash
Anton Mitrofanov [Sun, 12 Oct 2014 16:45:40 +0000 (20:45 +0400)]
libx264 API usage example
Henrik Gramner [Fri, 17 Oct 2014 19:35:42 +0000 (21:35 +0200)]
x86: AVX2 high bit-depth var_16x16
40->27 cycles on Haswell.
Henrik Gramner [Wed, 8 Oct 2014 20:25:35 +0000 (22:25 +0200)]
checkasm: Serialize read_time() calls on x86
Improves the accuracy of benchmarks, especially in short functions.
To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."
RDTSCP would accomplish the same task, but it's only available since Nehalem.
This change makes SSE2 a requirement to run checkasm.
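The LFENCE;RDTSC sequence quoted above might look roughly like this in C. The sketch uses the GCC/Clang intrinsics _mm_lfence() and __rdtsc() rather than whatever inline asm checkasm itself uses, and read_time_serialized is a hypothetical name. The non-x86 branch is only there so the example builds elsewhere; it is not a cycle counter.

```c
#include <stdint.h>
#include <time.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE2__)
#include <emmintrin.h>   /* _mm_lfence (SSE2) */
#include <x86intrin.h>   /* __rdtsc (GCC/Clang) */

/* LFENCE waits for earlier instructions to complete locally before
 * RDTSC reads the counter, as the quoted manual text describes. */
static inline uint64_t read_time_serialized(void)
{
    _mm_lfence();
    return __rdtsc();
}
#else
/* Fallback for other targets so the sketch still builds; this is
 * clock(), not a cycle counter. */
static inline uint64_t read_time_serialized(void)
{
    return (uint64_t)clock();
}
#endif
```

A benchmark would call read_time_serialized() before and after the function under test and subtract the two values. LFENCE is an SSE2 instruction, which is consistent with the SSE2 requirement noted above.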
Vittorio Giovara [Mon, 29 Sep 2014 17:51:30 +0000 (18:51 +0100)]
Support case-independent string options
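A minimal sketch of case-independent option matching, assuming a NULL-terminated list of accepted names and the POSIX strcasecmp() function. find_option and the example names are hypothetical and do not reflect x264's actual parser.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical lookup: index of `arg` in a NULL-terminated name list,
 * ignoring ASCII case, or -1 if absent. */
int find_option(const char *arg, const char *const *names)
{
    for (int i = 0; names[i]; i++)
        if (strcasecmp(arg, names[i]) == 0)
            return i;
    return -1;
}
```

With names = { "baseline", "main", "high", NULL }, both "High" and "high" map to index 2.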
Anton Mitrofanov [Sat, 6 Sep 2014 16:44:49 +0000 (20:44 +0400)]
Shut up gcc -Wuninitialized warnings
Anton Mitrofanov [Fri, 5 Sep 2014 15:43:52 +0000 (19:43 +0400)]
Shut up clang -Wuninitialized warning
Anton Mitrofanov [Fri, 5 Sep 2014 15:30:47 +0000 (19:30 +0400)]
Fix a few clang -Wunused-* warnings
Anton Mitrofanov [Thu, 28 Aug 2014 16:13:13 +0000 (20:13 +0400)]
Fix inappropriate instruction use
Anton Mitrofanov [Thu, 28 Aug 2014 14:38:53 +0000 (18:38 +0400)]
x264asm: warn when inappropriate instruction used in function with specified cpuflags
Anton Mitrofanov [Mon, 1 Sep 2014 21:48:00 +0000 (01:48 +0400)]
Fix VBV with true VFR streams
Anton Mitrofanov [Mon, 1 Sep 2014 18:45:00 +0000 (22:45 +0400)]
Fix VBV
Anton Mitrofanov [Tue, 29 Jul 2014 23:03:32 +0000 (03:03 +0400)]
Update to the current lavf API and fix memory leak when using --seek
Henrik Gramner [Mon, 4 Aug 2014 23:42:55 +0000 (01:42 +0200)]
x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.
Henrik Gramner [Mon, 4 Aug 2014 23:42:51 +0000 (01:42 +0200)]
x86: Minor pixel_ssim_end4 improvements
Reduce the number of vector registers used from 7 to 5.
Eliminate some moves in the AVX implementation.
Avoid bypass delays for transitioning between int and float domains.
Henrik Gramner [Mon, 4 Aug 2014 23:42:47 +0000 (01:42 +0200)]
x86: Faster quant_4x4x4
Also drop the MMX version instead of doing a bunch of ifdeffery to support it after this change.
Anton Mitrofanov [Sun, 10 Aug 2014 18:46:12 +0000 (22:46 +0400)]
configure: improve cc_check for clang and ICL to not ignore unknown options
Henrik Gramner [Mon, 4 Aug 2014 23:42:44 +0000 (01:42 +0200)]
checkasm: Only call x264_cpu_detect() once
Janne Grunau [Fri, 18 Jul 2014 13:49:10 +0000 (14:49 +0100)]
aarch64: deblocking NEON asm
Deblock chroma/luma are based on libav's h264 aarch64 NEON deblocking
filter which was ported by me from the existing ARM NEON asm. No
additional persons to ask for a relicense.
Janne Grunau [Fri, 18 Jul 2014 08:29:35 +0000 (09:29 +0100)]
aarch64: intra prediction NEON asm
Ported from the ARM NEON asm.
Janne Grunau [Thu, 17 Jul 2014 14:58:44 +0000 (15:58 +0100)]
aarch64: motion compensation NEON asm
Ported from the ARM NEON asm.
Janne Grunau [Wed, 16 Jul 2014 09:03:52 +0000 (10:03 +0100)]
aarch64: transform and zigzag NEON asm
Ported from the ARM NEON asm.
Janne Grunau [Tue, 15 Jul 2014 11:57:03 +0000 (12:57 +0100)]
aarch64: quantization and level-run NEON asm
Ported from the ARM NEON asm.
Janne Grunau [Wed, 19 Mar 2014 12:48:21 +0000 (13:48 +0100)]
aarch64: pixel metrics NEON asm
Ported from the ARM NEON asm.
Janne Grunau [Fri, 18 Jul 2014 15:44:57 +0000 (17:44 +0200)]
aarch64: add utility functions for asm
Janne Grunau [Wed, 19 Mar 2014 12:45:17 +0000 (13:45 +0100)]
aarch64: add armv8 and neon cpu flags and test them
Janne Grunau [Tue, 18 Mar 2014 21:10:24 +0000 (22:10 +0100)]
aarch64: initial build support
Janne Grunau [Tue, 22 Jul 2014 17:28:27 +0000 (19:28 +0200)]
checkasm: test zigzag_sub_8x8_{frame,field}
Janne Grunau [Sun, 20 Jul 2014 16:29:01 +0000 (18:29 +0200)]
arm: use long multiplication in mc_weight_w*_neon
9-19% faster on a cortex-a9.
Janne Grunau [Sun, 20 Jul 2014 16:24:57 +0000 (18:24 +0200)]
arm: do not use aligned stores in mc_weight_w4_*neon
mc_weight_w4_*neon is also used for width 2 which does not guarantee
4-byte aligned destination. Fixes crashes caused by random memory
corruption.
Janne Grunau [Wed, 2 Apr 2014 14:31:28 +0000 (16:31 +0200)]
checkasm: add memory clobber to read_time inline asm
The memory clobber acts as a compiler barrier, preventing aggressive reordering
of read_time calls. gcc 4.8 reorders some of the initial read_time calls
after the second when targeting arm.
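The effect of a memory clobber can be sketched in C with GCC/Clang extended asm. This is not checkasm's read_time: compiler_barrier, time_call, sample_work and work_done are hypothetical names, and clock() stands in for the cycle counter. An empty asm statement with "memory" in its clobber list emits no instruction but tells the compiler that memory may have changed, so it may not move loads or stores across it.

```c
#include <time.h>

/* Empty asm with a "memory" clobber: a compiler-only barrier.
 * It is not a CPU fence. GCC/Clang syntax. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Hypothetical workload used to demonstrate the wrapper. */
int work_done = 0;
void sample_work(void)
{
    work_done++;
}

/* Hypothetical timing wrapper; clock() stands in for a cycle counter.
 * The barriers keep the compiler from moving fn's memory accesses
 * outside the timed region. */
clock_t time_call(void (*fn)(void))
{
    compiler_barrier();
    clock_t start = clock();
    compiler_barrier();
    fn();
    compiler_barrier();
    clock_t stop = clock();
    compiler_barrier();
    return stop - start;
}
```

Calling time_call(sample_work) runs the workload once and returns the elapsed clock ticks. In an inline-asm counter read, the same effect comes from adding "memory" to that statement's own clobber list instead of using separate barrier calls.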
Janne Grunau [Sun, 20 Jul 2014 11:32:10 +0000 (13:32 +0200)]
arm: check if the assembler supports the '.func' directive
The integrated assembler in llvm trunk (to be released as 3.5) is
otherwise capable enough to assemble the arm asm correctly.
Janne Grunau [Sun, 20 Jul 2014 11:40:28 +0000 (13:40 +0200)]
arm/ppc: use $CC as default assembler
Janne Grunau [Sun, 20 Jul 2014 11:34:27 +0000 (13:34 +0200)]
arm: move instructions after '.rept' to separate line
The gas manual states "Repeat the sequence of lines between the .rept
directive and the next .endr directive ...". GNU as seems to support
instructions on the same line as .rept anyway but the integrated
assembler in llvm trunk (to be released as 3.5 in August 2014) does not.
Janne Grunau [Sun, 20 Jul 2014 11:08:17 +0000 (13:08 +0200)]
arm: set .arch/.fpu from asm.S
Janne Grunau [Sun, 20 Jul 2014 10:55:53 +0000 (12:55 +0200)]
arm: do not append CFLAGS to ASFLAGS
Tristan Matthews [Thu, 17 Jul 2014 04:03:50 +0000 (00:03 -0400)]
filters: fix sizeof mismatch
Anton Mitrofanov [Thu, 31 Jul 2014 12:17:32 +0000 (16:17 +0400)]
Fix memory leak when using select_every filter
Tsukasa OMOTO [Sun, 20 Jul 2014 13:17:11 +0000 (22:17 +0900)]
Fix cltostr.sh on OS X
Fiona Glaser [Wed, 9 Jul 2014 19:21:33 +0000 (12:21 -0700)]
Check pf_log is set in validate_parameters
Help remind people to call x264_param_default in case they didn't read the
documentation.
Anton Mitrofanov [Wed, 9 Jul 2014 13:17:04 +0000 (17:17 +0400)]
Check malloc during frame dumping
Yusuke Nakamura [Wed, 18 Jun 2014 20:21:29 +0000 (05:21 +0900)]
mp4_lsmash: Use new I/O API instead of deprecated one.
Anton Mitrofanov [Sun, 8 Jun 2014 18:19:46 +0000 (22:19 +0400)]
Remove meaningless use of abs()