granicus.if.org Git - libx264/log
libx264
Loren Merritt [Thu, 22 Dec 2011 17:56:06 +0000 (17:56 +0000)]
CABAC trellis opts part 2: C optimizations

Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to look up zigzag[] anyway.

60% faster on x86_64.
25k->18k codesize.
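
The TRELLIS_SCORE_MAX change above can be illustrated with a minimal sketch. This is not the x264 code: the names `node_t`, `SCORE_INVALID`, `node_is_live` and `best_live_node` are hypothetical, and it assumes valid scores are always non-negative. Under that assumption any negative value can mark a dead node, so the liveness check becomes a sign test rather than a comparison against one specific 64-bit constant.

```c
#include <stdint.h>

/* Hypothetical sentinel: any negative score marks a dead node. */
#define SCORE_INVALID (-1LL)

typedef struct {
    int64_t score; /* valid scores are assumed to be >= 0 */
} node_t;

/* Liveness is a plain sign test on the score. */
static int node_is_live(const node_t *n)
{
    return n->score >= 0;
}

/* Return the index of the live node with the lowest score, or -1 if none. */
static int best_live_node(const node_t *nodes, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (!node_is_live(&nodes[i]))
            continue;
        if (best < 0 || nodes[i].score < nodes[best].score)
            best = i;
    }
    return best;
}
```

Because the test only looks at the sign, any negative value works as the marker, which is the freedom the commit message describes.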

Loren Merritt [Thu, 22 Dec 2011 17:55:06 +0000 (17:55 +0000)]
CABAC trellis opts part 1: minor change in output
Due to different tie-break order.

Henrik Gramner [Sun, 8 Jan 2012 03:14:10 +0000 (04:14 +0100)]
x86inc improvements for 64-bit

Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Ilia Valiakhmetov [Sun, 15 Jan 2012 10:47:58 +0000 (04:47 -0600)]
High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.

Edward Wang [Wed, 4 Jan 2012 23:35:54 +0000 (15:35 -0800)]
MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.

Fiona Glaser [Thu, 22 Dec 2011 22:03:15 +0000 (14:03 -0800)]
XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes.  About a dozen cycles faster for 16x16.

Cristian Militaru [Wed, 4 Jan 2012 20:38:08 +0000 (12:38 -0800)]
High bit depth intra_sad_x3_4x4
From Google Code-In.

Fiona Glaser [Thu, 8 Dec 2011 21:45:41 +0000 (13:45 -0800)]
Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing.
Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).

Matt Habel [Sat, 17 Dec 2011 07:16:09 +0000 (23:16 -0800)]
High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.

Shitiz Garg [Sat, 3 Dec 2011 23:34:57 +0000 (15:34 -0800)]
MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.

Aaron Schmitz [Wed, 30 Nov 2011 06:15:45 +0000 (00:15 -0600)]
Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions.
From Google Code-In.

George Stephanos [Fri, 2 Dec 2011 00:53:45 +0000 (16:53 -0800)]
More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu.
From Google Code-In.

Ilia [Mon, 28 Nov 2011 13:20:09 +0000 (05:20 -0800)]
More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422.
Regular and high bit depth versions of deblock_h_chroma_intra_422.
High bit depth pixel_vsad.
SSE2 high bit depth and MMX 8-bit predict_8x8_vl.
Our first GCI patch this year!

Henrik Gramner [Thu, 8 Dec 2011 15:14:35 +0000 (16:14 +0100)]
SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc

Steven Walters [Mon, 5 Dec 2011 13:46:34 +0000 (08:46 -0500)]
Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format.
Fix deprecated use of av_set_int.
Now requires libavutil >= 51.19.0

Oka Motofumi [Thu, 5 Jan 2012 22:23:50 +0000 (14:23 -0800)]
Add out-of-tree build support

Anton Mitrofanov [Fri, 16 Dec 2011 14:17:00 +0000 (18:17 +0400)]
Limit SSIM to 100dB
Avoids floating point error for infinite SSIM (lossless).

Reynaldo H. Verdejo Pinochet [Wed, 4 Jan 2012 16:16:12 +0000 (13:16 -0300)]
Fix wrong conditional inclusion of inttypes.h
inttypes.h is required by encoder/ratecontrol.c for SCNxxx macros, and HAVE_STDINT_H does not imply having inttypes.h.
stdint.h is a subset of inttypes.h, but this isn't enough for x264.
This change fixes building x264 with Android's toolchain.
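
To illustrate why stdint.h alone is not sufficient: the fixed-width types come from stdint.h, but the matching scanf/printf format macros (SCNu64, PRIu64 and so on) are defined only in inttypes.h. The helper `parse_u64` below is a hypothetical example and is not code from encoder/ratecontrol.c; which SCN macros that file actually uses is not stated here.

```c
#include <inttypes.h> /* SCNu64; also provides the stdint.h types */
#include <stdio.h>

/* Parse a decimal 64-bit unsigned value.
 * Returns 1 on success, 0 if no number could be read. */
static int parse_u64(const char *s, uint64_t *out)
{
    return sscanf(s, "%" SCNu64, out) == 1;
}
```

Including only stdint.h would leave `SCNu64` undefined and the format string would fail to compile.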

Anton Mitrofanov [Wed, 21 Dec 2011 07:08:56 +0000 (11:08 +0400)]
Fix crash with sliced threads and input height <= 112

Phillip Blucas [Mon, 19 Dec 2011 23:43:41 +0000 (17:43 -0600)]
Fix loading custom 8x8 chroma quant matrices in 4:4:4

Anton Mitrofanov [Thu, 15 Dec 2011 21:48:07 +0000 (01:48 +0400)]
Fix PCM cost overflow

Anton Mitrofanov [Thu, 8 Dec 2011 21:54:22 +0000 (01:54 +0400)]
Fix overflow in 8-bit x86 vsad asm function

Anton Mitrofanov [Wed, 7 Dec 2011 15:14:52 +0000 (19:14 +0400)]
Fix crash in --fullhelp when compiled against recent ffmpeg
Don't assume all pixel formats have a description.

Fiona Glaser [Tue, 6 Dec 2011 22:39:21 +0000 (14:39 -0800)]
Fix regression in r2118
Broke trellis with i16x16 macroblocks.

Fiona Glaser [Wed, 30 Nov 2011 21:02:12 +0000 (13:02 -0800)]
Modify MBAFF chroma deblock functions to handle U/V at the same time
Allows for more convenient asm implementations.

Fiona Glaser [Fri, 11 Nov 2011 00:16:13 +0000 (16:16 -0800)]
CABAC trellis optimizations: use SIMD quant
Significant speed increase, minor change in output due to rounding.

Steven Walters [Sun, 6 Nov 2011 17:48:30 +0000 (09:48 -0800)]
YUV range detection and support for x264CLI
Two new options: --input-range and --range.
--input-range forces the range of the input in case of misdetection; auto by default.
--range sets the range of the output; x264cli will convert if necessary, TV by default.
--fullrange is now removed as a CLI option (but the libx264 API is unchanged).
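
For context, "TV" range conventionally means limited range (16..235 for 8-bit luma) and "PC" range means full range (0..255). The sketch below shows that mapping for a single 8-bit luma sample. It is not x264cli's conversion code, which the commit message does not describe; the function names are hypothetical and chroma (16..240) is left out.

```c
#include <stdint.h>

/* Full range (0..255) to limited range (16..235), rounded to nearest. */
static uint8_t luma_pc_to_tv(uint8_t y)
{
    return (uint8_t)(16 + (y * 219 + 127) / 255);
}

/* Limited range (16..235) to full range (0..255).
 * Inputs outside 16..235 are clamped to the valid output range. */
static uint8_t luma_tv_to_pc(uint8_t y)
{
    int v = ((y - 16) * 255 + 109) / 219;
    if (v < 0)
        v = 0;
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```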

Kieran Kunhya [Fri, 4 Nov 2011 20:09:13 +0000 (20:09 +0000)]
Pass through user data

Fiona Glaser [Thu, 27 Oct 2011 21:05:56 +0000 (14:05 -0700)]
Remove unpredictable branch in CABAC dqp

Loren Merritt [Sun, 23 Oct 2011 23:15:11 +0000 (23:15 +0000)]
x86inc: AVX symmetry optimization
3-arg AVX ops with a memory arg can only have it in src2,
whereas SSE emulation of 3-arg prefers to have it in src1 (i.e. the move).
So, if the op is symmetric and the wrong one is memory, swap them.
Eliminates redundant moves in some cases when using 3-operand without AVX with memory arguments.
Also fix movss and movsd in some cases, and flag shufps correctly as float.

Anton Mitrofanov [Tue, 29 Nov 2011 21:45:13 +0000 (13:45 -0800)]
checkasm: shut up gcc warnings, fix some naming of functions in results

Mans Rullgard [Tue, 29 Nov 2011 00:29:12 +0000 (16:29 -0800)]
checkasm: fix build on ARM
Because of how ALIGNED_ARRAY_16 is defined on ARM, array initialisers cannot be used here.  Use memset() instead.

Anton Mitrofanov [Fri, 11 Nov 2011 21:31:49 +0000 (01:31 +0400)]
Improve makefile rules
Remove the need for "make clean" after most reconfigures.

Anton Mitrofanov [Fri, 11 Nov 2011 20:47:48 +0000 (00:47 +0400)]
Mark some local functions as static, cosmetics

Anton Mitrofanov [Fri, 11 Nov 2011 19:19:02 +0000 (23:19 +0400)]
Fix crash if timecode file opening fails

Fabian Greffrath [Fri, 11 Nov 2011 21:25:43 +0000 (13:25 -0800)]
Configure: force PIC for shared build on PARISC and MIPS

Anton Mitrofanov [Sat, 22 Oct 2011 15:41:07 +0000 (19:41 +0400)]
Improve yasm version check
Previous check allowed certain earlier versions that weren't fully compatible.

Fiona Glaser [Tue, 18 Oct 2011 21:30:26 +0000 (14:30 -0700)]
Add fenc prefetching to adaptive quant
Many fewer cache misses, faster adaptive quant.

Fiona Glaser [Tue, 18 Oct 2011 21:14:03 +0000 (14:14 -0700)]
Split prefetch_fenc between colorspaces
Add 4:2:2 version.

Fiona Glaser [Wed, 12 Oct 2011 00:04:32 +0000 (17:04 -0700)]
Some more 4:2:2 x86 asm
coeff_last8, coeff_level_run8, var2_8x16, predict_8x16c_dc, satd_4x16, intra_mbcmp_8x16c_x3, deblock_h_chroma_422

Loren Merritt [Tue, 11 Oct 2011 18:12:43 +0000 (18:12 +0000)]
Remove obsolete versions of intra_mbcmp_x3
intra_mbcmp_x3 is unnecessary if x9 exists (SSSE3 and onwards).

Loren Merritt [Mon, 10 Oct 2011 05:42:36 +0000 (05:42 +0000)]
SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sa8d_x9)
x86_64 only for now, due to register requirements (like sa8d_x3).

i8x8 analysis cycles (per partition):
penryn    sandybridge  bulldozer
616->600  482->374     418->356   preset=faster
892->632  725->387     598->373   preset=medium
948->650  789->409     673->383   preset=slower

Fiona Glaser [Sat, 1 Oct 2011 02:09:19 +0000 (19:09 -0700)]
SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sad_x9)
~3 times faster than current analysis, plus (like intra_sad_x9_4x4) analyzes all modes without shortcuts.

Loren Merritt [Wed, 5 Oct 2011 20:29:21 +0000 (13:29 -0700)]
Merge i4x4 prediction with intra_mbcmp_x9_4x4
Avoids a redundant prediction after analysis.

Fiona Glaser [Wed, 5 Oct 2011 20:17:31 +0000 (13:17 -0700)]
Inline i4x4/i8x8 encode into intra analysis
Larger code size, but faster.

Fiona Glaser [Thu, 22 Sep 2011 00:12:10 +0000 (17:12 -0700)]
Initial XOP and FMA4 support on AMD Bulldozer
~10% faster Hadamard functions (SATD/SA8D/hadamard_ac) plus other improvements.

Mans Rullgard [Tue, 27 Sep 2011 17:14:14 +0000 (21:14 +0400)]
ARM: update NEON chroma deblock functions to NV12 pixel format

Sean McGovern [Mon, 17 Oct 2011 19:45:15 +0000 (12:45 -0700)]
Add /usr/lib/{64/}values-xpg6.o to $LDFLAGS on Solaris
This is required for POSIX.1-2001 compliance.

Sean McGovern [Mon, 17 Oct 2011 19:44:03 +0000 (12:44 -0700)]
Fix linker test for -Bsymbolic
The Solaris linker only accepts -Bsymbolic for objects compiled in dynamic mode (i.e. shared objects), so pass -shared to gcc.
Additionally, for x86_32 unresolved textrels cause a linker error so mark the .text section as 'impure'.

Sean McGovern [Mon, 17 Oct 2011 19:43:28 +0000 (12:43 -0700)]
Add $SOFLAGS to exported SOFLAGS make variable

Henrik Gramner [Sat, 24 Sep 2011 13:56:08 +0000 (15:56 +0200)]
Allow setting a chroma format at compile time
Gives a slight speed increase and significant binary size reduction when only one chroma format is needed.

Harfe Leier [Fri, 30 Sep 2011 19:49:33 +0000 (12:49 -0700)]
Improve profile help
List high422/high444 profiles, and don't show non-high-bit-depth profiles in high bit depth builds.

Yusuke Nakamura [Wed, 19 Oct 2011 18:09:51 +0000 (03:09 +0900)]
Fix infinite loop parsing TDecimate Mode 3 timecode v1 files

Fiona Glaser [Tue, 11 Oct 2011 00:44:31 +0000 (17:44 -0700)]
Fix some integer overflows/signedness errors found by IOC
The only real bug here is in slicetype.c, which may or may not affect real encodes.

Fiona Glaser [Wed, 12 Oct 2011 16:16:32 +0000 (09:16 -0700)]
Fix pixel_var2 with 4:2:2 encoding
Might have caused artifacts or suboptimal chroma compression.

Anton Mitrofanov [Sun, 9 Oct 2011 15:14:16 +0000 (19:14 +0400)]
Fix chroma intra analysis in 4:4:4 lossless mode

Anton Mitrofanov [Sat, 8 Oct 2011 21:13:29 +0000 (01:13 +0400)]
Fix use of uninitialized MVs in sub8x8 RDO

Fabian Greffrath [Sat, 8 Oct 2011 02:04:17 +0000 (19:04 -0700)]
Fix detection of Alpha CPU arch on alphaev67

Fiona Glaser [Wed, 14 Sep 2011 21:53:04 +0000 (14:53 -0700)]
Optimize x86 asm for Intel macro-op fusion
That is, place all loop counter tests right before their conditional jumps.

Fiona Glaser [Mon, 12 Sep 2011 18:51:23 +0000 (11:51 -0700)]
CAVLC: clean up and restructure
Somewhat faster CAVLC and RD bit-counting.

Fiona Glaser [Fri, 9 Sep 2011 00:27:02 +0000 (17:27 -0700)]
CABAC: clean up and restructure
Somewhat faster CABAC and RD bit-counting.

Fiona Glaser [Sun, 4 Sep 2011 09:31:29 +0000 (11:31 +0200)]
Some initial 4:2:2 x86 asm

Henrik Gramner [Fri, 26 Aug 2011 13:57:04 +0000 (15:57 +0200)]
4:2:2 encoding support

Loren Merritt [Mon, 15 Aug 2011 18:18:55 +0000 (18:18 +0000)]
SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)

i4x4 analysis cycles (per partition):
penryn   sandybridge
184-> 75  157-> 54  preset=superfast (sad)
281->165  225->124  preset=faster    (satd with early termination)
332->165  263->124  preset=medium
379->165  297->124  preset=slower    (satd without early termination)

This is the first code in x264 that intentionally produces different behavior
on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
directions, whereas the old code (on fast presets) may early terminate after
checking only some of them. There is no systematic difference on slow presets,
though they still occasionally disagree about tiebreaks.

For ease of debugging, add an option "--cpu-independent" to disable satd_x9
and any analogous future code.

Loren Merritt [Mon, 15 Aug 2011 17:43:42 +0000 (17:43 +0000)]
Faster intra_mbcmp_x3 for versions without dedicated asm
Select asm subroutines more intelligently in the wrapper functions.

Loren Merritt [Sat, 13 Aug 2011 19:01:22 +0000 (19:01 +0000)]
Optimize x86 intra_predict_4x4 and 8x8

High bit depth Penryn, Sandybridge cycles:
4x4_ddl: 11->10,  9-> 8
4x4_ddr: 15->13, 12->11
4x4_hd:        , 15->12
4x4_hu:        , 14->13
4x4_vr:  15->14, 14->12
8x8_ddl: 32->19, 19->14
8x8_ddr: 42->19, 21->14
8x8_hd:        , 15->13
8x8_hu:  21->17, 16->12
8x8_vr:  33->19,

8-bit Penryn, Sandybridge cycles:
4x4_ddr: 24->15,
4x4_hd:  24->16,
4x4_hu:  23->15,
4x4_vr:  23->16,
4x4_vl:  10-> 9,
8x8_ddl: 23->15,
8x8_hd:        , 17->14
8x8_hu:        , 15->14
8x8_vr:  20->16, 17->13

Loren Merritt [Sat, 13 Aug 2011 06:44:28 +0000 (06:44 +0000)]
Use realistic alignment for intra pred benchmarks in checkasm

Yusuke Nakamura [Tue, 20 Sep 2011 16:15:38 +0000 (01:15 +0900)]
Fix frame packing SEI with --frame-packing 0
According to the spec, when frame_packing_arrangement_type is equal to 0, quincunx_sampling_flag shall be equal to 1.
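
The rule quoted above can be expressed as a one-line helper. This is only a sketch: the function name is hypothetical, and returning 0 for every non-zero arrangement type is a simplifying assumption, since the commit message only states the requirement for type 0.

```c
/* Hypothetical helper: pick quincunx_sampling_flag for a frame packing
 * arrangement SEI. Type 0 requires the flag to be 1; for all other
 * types this sketch assumes 0. */
static int quincunx_sampling_flag(int frame_packing_arrangement_type)
{
    return frame_packing_arrangement_type == 0;
}
```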

Oka Motofumi [Mon, 5 Sep 2011 02:50:37 +0000 (11:50 +0900)]
Fix install/uninstall shared libs if SYS is WINDOWS/CYGWIN

Reinhard Tartler [Wed, 10 Aug 2011 07:16:46 +0000 (00:16 -0700)]
Add Hurd support to configure

Loren Merritt [Sat, 13 Aug 2011 00:39:35 +0000 (00:39 +0000)]
Optimize x86 intra_satd_x3_*
~7% faster.

Loren Merritt [Fri, 12 Aug 2011 19:13:07 +0000 (19:13 +0000)]
Optimize x86 intra_sa8d_x3_8x8
~40% faster.
Also some other minor asm cosmetics.

Loren Merritt [Fri, 12 Aug 2011 02:15:46 +0000 (02:15 +0000)]
Scale interlaced refs/mvs for mvr predictors
Slightly improves compression and fixes a Valgrind error.

Loren Merritt [Thu, 11 Aug 2011 15:03:12 +0000 (15:03 +0000)]
Optimize predict_8x8_filter and incidentally remove a valgrind false-positive

Anton Mitrofanov [Mon, 15 Aug 2011 08:22:18 +0000 (12:22 +0400)]
Don't override flat SSE2 dequant functions with non-flat AVX ones
Slightly faster.

Loren Merritt [Mon, 8 Aug 2011 13:40:53 +0000 (13:40 +0000)]
Shut up some valgrind false-positives

Fiona Glaser [Tue, 16 Aug 2011 20:02:24 +0000 (13:02 -0700)]
Avoid some unnecessary allocations with B-frames/CABAC off

Fiona Glaser [Tue, 23 Aug 2011 00:07:03 +0000 (17:07 -0700)]
Fix typo in p8x8 RD analysis
Passed wrong idx to trellis.

Anton Mitrofanov [Sat, 20 Aug 2011 22:44:45 +0000 (02:44 +0400)]
Fix invalid memory accesses in x86 lowres_init when width <= 16

Anton Mitrofanov [Mon, 15 Aug 2011 08:03:09 +0000 (12:03 +0400)]
Fix intermediate conversion for YUVJ* pixfmts with 4:4:4 encoding

Henrik Gramner [Sun, 14 Aug 2011 11:39:29 +0000 (13:39 +0200)]
Fix pic_out returned by x264_encoder_encode with 4:4:4

Loren Merritt [Thu, 11 Aug 2011 22:12:26 +0000 (22:12 +0000)]
Fix zeroing of mvr predictors in bskip blocks

Loren Merritt [Thu, 11 Aug 2011 01:33:13 +0000 (01:33 +0000)]
Fix: chroma planes for weightp analysis were not initted if U early-terminates and V doesn't.

Henrik Gramner [Wed, 10 Aug 2011 18:25:07 +0000 (20:25 +0200)]
Expand borders before chroma weightp analysis
Prevents mc from using uninitialized source pixels.

Henrik Gramner [Wed, 10 Aug 2011 17:29:14 +0000 (19:29 +0200)]
Another 4:4:4 chroma weightp bug fix

Fiona Glaser [Wed, 10 Aug 2011 07:17:26 +0000 (00:17 -0700)]
Fix typo in help

Fiona Glaser [Sat, 6 Aug 2011 17:45:47 +0000 (10:45 -0700)]
Improve support for varying resolution between passes
Should give much better quality, but still doesn't support MB-tree yet.
Also check for the same interlaced options between passes.
Various minor ratecontrol cosmetics.

Loren Merritt [Sun, 7 Aug 2011 22:57:27 +0000 (22:57 +0000)]
asm cosmetics: base-4 constants for shuffles
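
Why base 4 reads well here: the immediate of a shuffle such as pshufd is an 8-bit value made of four 2-bit source selectors, so each base-4 digit corresponds to exactly one destination lane. The sketch below packs four digits into that immediate; the function name `shuffle_imm` is hypothetical and the code is an illustration in C, not part of the asm change.

```c
#include <stdint.h>

/* Pack four 2-bit lane selectors into an 8-bit shuffle immediate.
 * d3 is the most significant base-4 digit (source for lane 3),
 * d0 the least significant (source for lane 0). */
static uint8_t shuffle_imm(unsigned d3, unsigned d2, unsigned d1, unsigned d0)
{
    return (uint8_t)(((d3 & 3) << 6) | ((d2 & 3) << 4) |
                     ((d1 & 3) << 2) | (d0 & 3));
}
```

The identity shuffle with digits 3,2,1,0 gives 0xE4 and the full reversal 0,1,2,3 gives 0x1B. Those hex values are much harder to read at a glance than the digit sequences themselves.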

Loren Merritt [Wed, 3 Aug 2011 14:58:50 +0000 (14:58 +0000)]
Enable some existing asm functions that were missing function pointers
pixel_ads1_avx, predict_8x8_hd_avx
High bit depth mc_copy_w8_sse2, denoise_dct_avx, prefetch_fenc/ref, and several pixel*sse4.

Loren Merritt [Wed, 3 Aug 2011 14:57:06 +0000 (14:57 +0000)]
Remove some unused, broken, and/or useless functions
Unused frame_sort.
Unused x86_64 dequant_4x4dc_mmx2, predict_8x8_vr_mmx2.
Unused and broken high_depth integral_init*h_sse4, optimize_chroma_*, dequant_flat_*, sub8x8_dct_dc_*, zigzag_sub_*.
Useless high_depth dequant_sse4, dequant_dc_sse4.

Loren Merritt [Wed, 3 Aug 2011 14:56:27 +0000 (14:56 +0000)]
asm cosmetics: merge all the variants of ABS macros

Loren Merritt [Wed, 3 Aug 2011 14:53:29 +0000 (14:53 +0000)]
asm cosmetics part 2
These changes were split out of the cpuflags commit because they change the output executable.

Loren Merritt [Wed, 3 Aug 2011 14:46:41 +0000 (14:46 +0000)]
asm cosmetics: INIT_MMX/XMM/YMM now support a cpuflags argument

Reduces the number of macro args that need to be passed around.
Allows multiple implementations of a given macro (e.g. PALIGNR) to check
cpuflags at the location where the macro is defined, instead of having
to select implementations by %define at toplevel.
Remove INIT_AVX, as it's replaced by "INIT_XMM avx".

This commit does not change the stripped executable.

Loren Merritt [Wed, 3 Aug 2011 14:43:34 +0000 (14:43 +0000)]
Import x86inc.asm patches from libav

Loren Merritt [Wed, 3 Aug 2011 14:42:12 +0000 (14:42 +0000)]
Cosmetics: s/mmxext/mmx2/

Henrik Gramner [Sun, 7 Aug 2011 09:58:36 +0000 (11:58 +0200)]
Fix two bugs in 4:4:4 chroma weightp analysis
Caused slightly worse compression.

Loren Merritt [Wed, 3 Aug 2011 14:40:01 +0000 (14:40 +0000)]
Fix "--asm avx"
Previously required "--asm sse2fast,fastshuffle,sse4.2,avx".

Anton Mitrofanov [Fri, 5 Aug 2011 11:59:20 +0000 (15:59 +0400)]
Re-add support for glibc <2.6, which doesn't have CPU_COUNT
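
One way to do without CPU_COUNT is to count the set bits of the affinity mask by hand. The sketch below does this over an arbitrary byte buffer so that it stays portable; the function name `count_set_bits` is hypothetical, and passing it a `cpu_set_t` filled in by `sched_getaffinity` is an assumed usage, not something the commit message spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits set in a mask of `size` bytes.
 * Assumed usage on Linux: count_set_bits(&set, sizeof(set)) with a
 * cpu_set_t obtained from sched_getaffinity(). */
static int count_set_bits(const void *mask, size_t size)
{
    const uint8_t *bytes = mask;
    int n = 0;
    for (size_t bit = 0; bit < 8 * size; bit++)
        n += (bytes[bit / 8] >> (bit % 8)) & 1;
    return n;
}
```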

Yasuhiro Ikeda [Mon, 1 Aug 2011 23:59:15 +0000 (08:59 +0900)]
Avoid using deprecated libavformat functions
Replace av_find_stream_info with avformat_find_stream_info.
Now requires libavformat 53.3.0 or newer.

Henrik Gramner [Wed, 27 Jul 2011 00:23:12 +0000 (02:23 +0200)]
Use assembly versions of some deblocking functions in MBAFF