granicus.if.org Git - libx264/log
libx264
12 years agoAdd support for the ffmpeg/vapoursynth high bit depth y4m extensions
Jan Ekström [Sun, 7 Oct 2012 18:12:05 +0000 (21:12 +0300)]
Add support for the ffmpeg/vapoursynth high bit depth y4m extensions

12 years agox86inc: Rename 3dnow2 to 3dnowext
Diego Biurrun [Tue, 6 Nov 2012 13:48:56 +0000 (14:48 +0100)]
x86inc: Rename 3dnow2 to 3dnowext
The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.

12 years agox86inc: only define program_name if the macro is unset.
Diego Biurrun [Wed, 31 Oct 2012 19:23:54 +0000 (12:23 -0700)]
x86inc: only define program_name if the macro is unset.
This allows overriding the value from outside the file.
This can be useful if x86inc.asm is used outside of x264.

12 years agoDisable ARM NEON MRC CPU test for Apple devices
David Wolstencroft [Mon, 29 Oct 2012 16:07:39 +0000 (09:07 -0700)]
Disable ARM NEON MRC CPU test for Apple devices
The Apple A6 CPU doesn't support performance counters, so this test caused a crash.

12 years agoFix crash with no-scenecut + mbtree
Fiona Glaser [Tue, 6 Nov 2012 20:03:20 +0000 (12:03 -0800)]
Fix crash with no-scenecut + mbtree

12 years agoFix reconfiguring to crf=0
Anton Mitrofanov [Fri, 12 Oct 2012 19:43:40 +0000 (23:43 +0400)]
Fix reconfiguring to crf=0
Lossless mode can't currently be enabled mid-stream.

12 years agoFix ALIGNED_ARRAY_EMU macros on ICL
Derek Buitenhuis [Mon, 17 Sep 2012 18:09:20 +0000 (11:09 -0700)]
Fix ALIGNED_ARRAY_EMU macros on ICL
ICL's preprocessor doesn't handle it correctly.
This fix is similar to libav's fix in 0db2d9.

12 years agoFix use of deprecated av_close_input_file call
Jason Martens [Thu, 13 Sep 2012 18:20:40 +0000 (11:20 -0700)]
Fix use of deprecated av_close_input_file call

12 years agoFix pkg-config for dynamic vs static linking
Brad Smith [Wed, 26 Sep 2012 21:13:27 +0000 (14:13 -0700)]
Fix pkg-config for dynamic vs static linking

12 years agoSet libm in the configure script if the OS has libm
Brad Smith [Tue, 11 Sep 2012 00:52:04 +0000 (17:52 -0700)]
Set libm in the configure script if the OS has libm
Prerequisite for another configure patch after this.
Idea copied from libpthread.

12 years agoEnhance mb_info: add mb_info_update
Fiona Glaser [Thu, 16 Aug 2012 20:40:32 +0000 (13:40 -0700)]
Enhance mb_info: add mb_info_update
This feature lets the callee know which decoded macroblocks have changed.

12 years agoFix mb_info_free with sliced threads
Fiona Glaser [Thu, 16 Aug 2012 20:01:17 +0000 (13:01 -0700)]
Fix mb_info_free with sliced threads
x264 would free mb_info before it was completely done using it.

12 years agoEnhance nalu_process
Fiona Glaser [Tue, 7 Aug 2012 19:43:26 +0000 (12:43 -0700)]
Enhance nalu_process
Add the input frame opaque pointer to the arguments.
This makes it easier to use with multiple simultaneous x264 encodes.

12 years agoImprove mb_info constant mb optimization
Fiona Glaser [Mon, 6 Aug 2012 21:55:35 +0000 (14:55 -0700)]
Improve mb_info constant mb optimization
Allow fast skipping even if the pskip MV isn't zero.

12 years agoExport the average effective CRF of each frame
Fiona Glaser [Mon, 30 Jul 2012 19:58:34 +0000 (12:58 -0700)]
Export the average effective CRF of each frame
Useful to judge the resulting quality of a frame when VBV is enabled.

12 years agoRemove special-casing for OpenBSD pthread handling
Brad Smith [Tue, 21 Aug 2012 06:58:19 +0000 (23:58 -0700)]
Remove special-casing for OpenBSD pthread handling
Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread.
It's been libpthread anyway, and policy has changed to stop using -pthread.

12 years agox86inc: automatically insert vzeroupper for YMM functions
Ronald S. Bultje [Fri, 27 Jul 2012 01:01:49 +0000 (18:01 -0700)]
x86inc: automatically insert vzeroupper for YMM functions
Backported from libav.

12 years agoFree user supplied data when deleting a frame
Kieran Kunhya [Tue, 24 Jul 2012 15:47:45 +0000 (08:47 -0700)]
Free user supplied data when deleting a frame
This eliminates a memory leak when calling x264_encoder_close.

12 years agoRevert r2204
Fiona Glaser [Wed, 18 Jul 2012 15:33:41 +0000 (08:33 -0700)]
Revert r2204
People don't seem to like this so I'm just going to get rid of it.

12 years agoFaster predictor checking with subme<3
Fiona Glaser [Tue, 10 Jul 2012 21:10:44 +0000 (14:10 -0700)]
Faster predictor checking with subme<3
Fix a typo that made an early-skip less effective.
Avoid a relatively unpredictable branch.
Slightly changed output due to the typo-fix.
~50 cycles faster on Core i7.

12 years agoTry 8x8 transform analysis even when sub8x8 partitions are present
Fiona Glaser [Tue, 26 Jun 2012 01:01:29 +0000 (18:01 -0700)]
Try 8x8 transform analysis even when sub8x8 partitions are present
Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help.
Small compression improvement with p4x4 on (~0.1-0.5%).
Also update related comments.

12 years agoSupport changing resolutions between passes with macroblock-tree
Fiona Glaser [Sat, 9 Jun 2012 01:19:59 +0000 (18:19 -0700)]
Support changing resolutions between passes with macroblock-tree
Implement a basic separable bilinear filter to rescale the quantizer offsets.
Structure inspired by swscale, but floating-point instead of fixed-point.
Not as optimized as it could be, but it's quite fast already.

Example compression penalties on a 720p video game recording:
First pass with 720p and second as 480p: ~-1.5% (vs. same res)
First pass with 480p and second as 720p: ~-3% (vs. same res)
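
The separable bilinear rescale described above can be sketched as follows. This is a minimal illustration in floating point, not the code from this commit: the function names, the centre-aligned sample mapping, and the caller-supplied temporary buffer are all assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: linearly interpolate one row or column of `src`
 * (n_src samples, src_stride floats apart) into `dst` (n_dst samples). */
static void resample_line(const float *src, int n_src, ptrdiff_t src_stride,
                          float *dst, int n_dst, ptrdiff_t dst_stride)
{
    for (int i = 0; i < n_dst; i++) {
        /* Map the centre of output sample i into source coordinates. */
        float pos = (i + 0.5f) * n_src / n_dst - 0.5f;
        if (pos < 0.0f)
            pos = 0.0f;
        if (pos > n_src - 1)
            pos = (float)(n_src - 1);
        int i0 = (int)pos;
        int i1 = i0 + 1 < n_src ? i0 + 1 : i0;
        float frac = pos - i0;
        dst[i * dst_stride] = src[i0 * src_stride] * (1.0f - frac)
                            + src[i1 * src_stride] * frac;
    }
}

/* Rescale a src_w x src_h grid of offsets to dst_w x dst_h.
 * `tmp` must hold dst_w * src_h floats: the horizontal pass writes into it,
 * then the vertical pass reads it column by column. */
static void rescale_offsets(const float *src, int src_w, int src_h,
                            float *tmp, float *dst, int dst_w, int dst_h)
{
    for (int y = 0; y < src_h; y++)
        resample_line(src + y * src_w, src_w, 1, tmp + y * dst_w, dst_w, 1);
    for (int x = 0; x < dst_w; x++)
        resample_line(tmp + x, src_h, dst_w, dst + x, dst_h, dst_w);
}
```

Splitting the filter into a horizontal and a vertical pass keeps each pass a simple 1-D interpolation, which is the sense in which the filter is "separable".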

12 years agoPrint elapsed time in encoding progress indicator
Alexander Prikhodko [Tue, 12 Jun 2012 17:21:35 +0000 (20:21 +0300)]
Print elapsed time in encoding progress indicator

12 years agoCap ratecontrol predictor parameters
Anton Mitrofanov [Sat, 2 Jun 2012 17:27:50 +0000 (21:27 +0400)]
Cap ratecontrol predictor parameters
Limits VBV mispredictions after long periods of relatively constant video.

12 years agox86inc: import patches from libav
Loren Merritt [Tue, 3 Jul 2012 21:38:04 +0000 (14:38 -0700)]
x86inc: import patches from libav
Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX
SSE version of mova is movaps rather than movdqa.
YMM version of movnta.
Add mp size for named arguments.
Fix DEFINE_ARGS when used outside of a cglobal.
Define a few more cpuflags.
3-argument wrappers for a few more instructions.

12 years agoFix crash with --fps 0
Anton Mitrofanov [Fri, 22 Jun 2012 18:02:24 +0000 (22:02 +0400)]
Fix crash with --fps 0
Fix some integer overflows and check input parameters better.
Also fix incorrect type specifiers for demuxer info printing.

12 years agoThreaded lookahead
Fiona Glaser [Tue, 8 May 2012 22:42:56 +0000 (15:42 -0700)]
Threaded lookahead

Split each lookahead frame analysis call into multiple threads.  Has a small
impact on quality, but does not seem to be consistently any worse.

This helps alleviate bottlenecks with many cores and frame threads. In many
cases, this massively increases performance on many-core systems.  For example,
over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
Realtime 1080p30 at --preset slow should now be feasible on real systems.

For sliced-threads, this patch should be faster regardless of settings (~10%).

By default, lookahead threads are 1/6 of regular threads.  This isn't exacting,
but it seems to work well for all presets on real systems.  With sliced-threads,
it's the same as the number of encoding threads.
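
The default described above can be sketched like this. The helper name is hypothetical and the lower bound of one thread is an assumption; the commit message only states the 1/6 ratio and the sliced-threads rule.

```c
/* Hypothetical helper, not an x264 function. */
static int default_lookahead_threads(int threads, int sliced_threads)
{
    /* With sliced-threads: same as the number of encoding threads. */
    if (sliced_threads)
        return threads;
    /* Otherwise roughly 1/6 of the regular threads.
     * Assumption: never fewer than one lookahead thread. */
    int n = threads / 6;
    return n < 1 ? 1 : n;
}
```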

12 years agoAdd support for RGB formats in bit-depth conversion filter
Anton Mitrofanov [Fri, 4 May 2012 13:18:12 +0000 (17:18 +0400)]
Add support for RGB formats in bit-depth conversion filter

12 years agoFix some bugs in mb_info code
Anton Mitrofanov [Sat, 12 May 2012 09:57:49 +0000 (13:57 +0400)]
Fix some bugs in mb_info code

12 years agoAdd mb_info API for signalling constant macroblocks
Fiona Glaser [Thu, 29 Mar 2012 21:14:07 +0000 (14:14 -0700)]
Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame.
Sometimes, the caller knows which areas these are, and can tell x264.
This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
This is really only suitable without B-frames.
An example use-case would be using x264 for VNC.
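
A caller-side sketch of the idea, assuming a VNC-style server that already tracks dirty rectangles. Everything here is hypothetical (the struct, the flag value, the function name); it only shows how a per-macroblock "constant" map could be derived, not how it is handed to x264.

```c
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16
#define SKETCH_MB_CONSTANT 1 /* hypothetical flag value, not taken from x264.h */

typedef struct { int x, y, w, h; } dirty_rect; /* hypothetical, in pixels */

/* Mark every macroblock constant, then clear the flag for each macroblock
 * touched by a dirty rectangle. mb_flags must hold mb_w * mb_h bytes. */
static void build_constant_map(uint8_t *mb_flags, int width, int height,
                               const dirty_rect *rects, int n_rects)
{
    int mb_w = (width + MB_SIZE - 1) / MB_SIZE;
    int mb_h = (height + MB_SIZE - 1) / MB_SIZE;
    memset(mb_flags, SKETCH_MB_CONSTANT, (size_t)mb_w * mb_h);

    for (int r = 0; r < n_rects; r++) {
        if (rects[r].w <= 0 || rects[r].h <= 0)
            continue;
        /* Convert the pixel rectangle to an inclusive macroblock range. */
        int x0 = rects[r].x / MB_SIZE;
        int y0 = rects[r].y / MB_SIZE;
        int x1 = (rects[r].x + rects[r].w - 1) / MB_SIZE;
        int y1 = (rects[r].y + rects[r].h - 1) / MB_SIZE;
        if (x0 < 0) x0 = 0;
        if (y0 < 0) y0 = 0;
        if (x1 > mb_w - 1) x1 = mb_w - 1;
        if (y1 > mb_h - 1) y1 = mb_h - 1;
        for (int y = y0; y <= y1; y++)
            for (int x = x0; x <= x1; x++)
                mb_flags[y * mb_w + x] = 0;
    }
}
```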

12 years agoFaster chroma weight cost calculation
Henrik Gramner [Fri, 6 Apr 2012 22:40:09 +0000 (00:40 +0200)]
Faster chroma weight cost calculation

New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.

12 years agoAdd Level 5.2 support
Lucien [Sat, 31 Mar 2012 12:42:49 +0000 (13:42 +0100)]
Add Level 5.2 support

12 years agoEradicate all mention of Extended Profile
Henrik Gramner [Thu, 12 Apr 2012 17:14:43 +0000 (19:14 +0200)]
Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.

12 years agoFix disabling of mbtree when using 2pass encoding and zones
Anton Mitrofanov [Tue, 3 Apr 2012 17:46:52 +0000 (21:46 +0400)]
Fix disabling of mbtree when using 2pass encoding and zones

12 years agoconfigure: force select -mXX gcc option for i386/x86-64
Alexander Prikhodko [Sat, 31 Mar 2012 09:06:21 +0000 (12:06 +0300)]
configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.

12 years agoUpdate config.guess and config.sub
Rafaël Carré [Mon, 16 Apr 2012 01:20:14 +0000 (21:20 -0400)]
Update config.guess and config.sub
Adds support for a bunch of targets, including:
aarch64 (armv8)
arm-linux-androideabi

12 years agoconfigure: correct use of RC variable and add --extra-rcflags
Alexander Prikhodko [Sat, 31 Mar 2012 08:33:41 +0000 (11:33 +0300)]
configure: correct use of RC variable and add --extra-rcflags

12 years agoICL/MSVS: Fix shared library generation and usage
Steven Walters [Thu, 29 Mar 2012 01:15:04 +0000 (21:15 -0400)]
ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport.
This does not fix x264 cli being unable to use a shared library built by ICL however.

12 years agoFix intra-refresh + hrd
Kieran Kunhya [Tue, 27 Mar 2012 16:38:56 +0000 (17:38 +0100)]
Fix intra-refresh + hrd

12 years agoFix frame input colorspace check
Anton Mitrofanov [Sun, 25 Mar 2012 13:34:24 +0000 (17:34 +0400)]
Fix frame input colorspace check

12 years agoFix comment in deblock.c
Fiona Glaser [Thu, 22 Mar 2012 20:56:50 +0000 (13:56 -0700)]
Fix comment in deblock.c
The code does, in fact, handle CAVLC+8x8dct correctly already.

12 years agoFix sliced-threads ratecontrol bug
Fiona Glaser [Tue, 13 Mar 2012 21:37:26 +0000 (14:37 -0700)]
Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NaNs (not to mention less accurate results).

12 years agoFix clobbering of mutex/cvs
Anton Mitrofanov [Mon, 12 Mar 2012 06:08:18 +0000 (23:08 -0700)]
Fix clobbering of mutex/cvs
Regression in r2183.
Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.

12 years agoSliced-threads: do hpel and deblock after returning
Fiona Glaser [Fri, 24 Feb 2012 21:34:39 +0000 (13:34 -0800)]
Sliced-threads: do hpel and deblock after returning
Lowers encoding latency around 14% in sliced threads mode with preset superfast.
Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (singlethreaded) lookahead.
For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.

12 years agoAdd full-recon API option
Fiona Glaser [Fri, 24 Feb 2012 21:16:52 +0000 (13:16 -0800)]
Add full-recon API option
Fully reconstruct frames even without dump-yuv.

12 years agox86inc: switch to amdnops
Fiona Glaser [Wed, 22 Feb 2012 21:33:36 +0000 (13:33 -0800)]
x86inc: switch to amdnops
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes).
Won't affect much, since we don't use ALIGN much.

12 years agoBMI1 decimate functions
Fiona Glaser [Wed, 15 Feb 2012 00:54:03 +0000 (16:54 -0800)]
BMI1 decimate functions
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible.
This means we don't actually have to add new functions to make it work.

12 years agoMinor asm changes
Fiona Glaser [Tue, 14 Feb 2012 23:07:10 +0000 (15:07 -0800)]
Minor asm changes

12 years agoAdd row-reencoding support to VBV for improved accuracy
Fiona Glaser [Thu, 9 Feb 2012 22:23:52 +0000 (14:23 -0800)]
Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs).
Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs).
Still inaccurate with sliced threads, but better than before.

12 years agoAbstract bitstream backup/restore functions
Fiona Glaser [Thu, 9 Feb 2012 20:38:44 +0000 (12:38 -0800)]
Abstract bitstream backup/restore functions
Required for row re-encoding.

12 years agoAdd a small per-MB cost penalty for lowres
Anton Mitrofanov [Thu, 9 Feb 2012 23:27:53 +0000 (15:27 -0800)]
Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs.
One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch), no cost penalty gets factored in for this, because anything times zero is zero.

12 years agoRemove explicit run calculation from coeff_level_run
Fiona Glaser [Tue, 14 Feb 2012 02:31:51 +0000 (18:31 -0800)]
Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.

12 years agoExport PSNR/SSIM in x264 API
Fiona Glaser [Mon, 13 Feb 2012 21:20:06 +0000 (13:20 -0800)]
Export PSNR/SSIM in x264 API

12 years agox86inc: support yasm -f win64
Ronald S. Bultje [Wed, 8 Feb 2012 21:10:31 +0000 (13:10 -0800)]
x86inc: support yasm -f win64
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.

12 years agoFix incorrect zero-extension assumptions in x86_64 asm
Henrik Gramner [Wed, 1 Feb 2012 22:52:48 +0000 (23:52 +0100)]
Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.

12 years agoFix possible alignment crash when linking from MSVC
Fiona Glaser [Thu, 23 Feb 2012 17:11:23 +0000 (09:11 -0800)]
Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.

12 years agoFix rare overflow in 10-bit intra_satd_x3_16x16 asm
Anton Mitrofanov [Tue, 21 Feb 2012 20:58:22 +0000 (12:58 -0800)]
Fix rare overflow in 10-bit intra_satd_x3_16x16 asm

12 years agoICL: fix out of tree building and resource file usage on Windows
Steven Walters [Sun, 12 Feb 2012 03:56:43 +0000 (22:56 -0500)]
ICL: fix out of tree building and resource file usage on Windows

12 years agoAdd error handling for out-of-tree build
Oka Motofumi [Sun, 5 Feb 2012 21:07:34 +0000 (06:07 +0900)]
Add error handling for out-of-tree build

12 years agoFix RGB colorspace input
Anton Mitrofanov [Tue, 6 Mar 2012 13:34:02 +0000 (17:34 +0400)]
Fix RGB colorspace input
BGR/BGRA input was correct.

12 years agoFix interlaced + extremal slice-max-size
Fiona Glaser [Tue, 14 Feb 2012 00:40:32 +0000 (16:40 -0800)]
Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.

12 years agoFix regression in r2141
Henrik Gramner [Sun, 5 Feb 2012 19:43:09 +0000 (20:43 +0100)]
Fix regression in r2141
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv.
Did not cause any problems.

12 years agoTBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
Fiona Glaser [Thu, 19 Jan 2012 22:56:54 +0000 (14:56 -0800)]
TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver.
The others (and BMI1) will probably appear in Intel's upcoming Haswell.
Also update x86inc with AVX2 stuff.

12 years agox86inc: add TAIL_CALL macro to abstract a common asm idiom
Loren Merritt [Fri, 3 Feb 2012 06:27:18 +0000 (06:27 +0000)]
x86inc: add TAIL_CALL macro to abstract a common asm idiom

12 years agoMinor asm optimizations/cleanup
Fiona Glaser [Thu, 26 Jan 2012 00:44:38 +0000 (16:44 -0800)]
Minor asm optimizations/cleanup

12 years agoClean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Fiona Glaser [Wed, 25 Jan 2012 03:03:58 +0000 (19:03 -0800)]
Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.

12 years agoXOP frame_init_lowres
Fiona Glaser [Tue, 24 Jan 2012 02:57:58 +0000 (18:57 -0800)]
XOP frame_init_lowres
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.

12 years agoXOP 8x8 zigzags
Fiona Glaser [Tue, 17 Jan 2012 23:25:10 +0000 (15:25 -0800)]
XOP 8x8 zigzags
Field: 35(mmx) ->16(xop) cycles
Frame: 32(ssse3)->20(xop) cycles

12 years agoAVX 32-bit hpel_filter_h
Fiona Glaser [Mon, 23 Jan 2012 23:09:38 +0000 (15:09 -0800)]
AVX 32-bit hpel_filter_h
Faster on Sandy Bridge.
Also add details on unsuccessful optimizations in these functions.

12 years agox86inc: add high halfword register support
Fiona Glaser [Sat, 28 Jan 2012 00:29:30 +0000 (16:29 -0800)]
x86inc: add high halfword register support
Might be useful in a few cases.

12 years agoChange %ifdef directives to %if directives in *.asm files
Ronald S. Bultje [Wed, 25 Jan 2012 05:53:59 +0000 (13:53 +0800)]
Change %ifdef directives to %if directives in *.asm files
This allows combining multiple conditionals in a single statement.

12 years agoUse TV range algorithm for bit-depth conversions
Anton Mitrofanov [Sun, 22 Jan 2012 18:13:52 +0000 (22:13 +0400)]
Use TV range algorithm for bit-depth conversions
Such sources are more common, so better to be correct for the common case.
This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
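
To illustrate the difference between the two ranges when increasing bit depth: TV (limited) range levels scale by a power of two (8-bit 16..235 becomes 10-bit 64..940), whereas full range must map the maximum code to the maximum code (255 to 1023). The sketch below shows only up-conversion and uses illustrative function names; it is not the filter's actual code.

```c
/* TV range: a plain left shift maps 16 -> 64 and 235 -> 940 for 8 -> 10 bit. */
static unsigned tv_range_up(unsigned v, int in_bits, int out_bits)
{
    return v << (out_bits - in_bits);
}

/* Full range: rescale so that the maximum input code maps to the maximum
 * output code, with rounding to nearest. */
static unsigned full_range_up(unsigned v, int in_bits, int out_bits)
{
    unsigned in_max = (1u << in_bits) - 1;
    unsigned out_max = (1u << out_bits) - 1;
    return (v * out_max + in_max / 2) / in_max;
}
```

Applying the shift to full-range input sends 255 to 1020 rather than 1023, which is the kind of small error the commit message accepts for the less common case.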

12 years agoBump dates to 2012
Hii [Wed, 25 Jan 2012 08:29:22 +0000 (16:29 +0800)]
Bump dates to 2012

12 years agoAdd Windows resource file
Henrik Gramner [Sat, 28 Jan 2012 20:38:27 +0000 (21:38 +0100)]
Add Windows resource file
Displays version info in Windows Explorer.

12 years agoFix win32 pthread_cond_signal
Sergey Radionov [Mon, 16 Jan 2012 21:22:44 +0000 (13:22 -0800)]
Fix win32 pthread_cond_signal
Isn't used by x264 currently, so didn't cause a problem.
Fix backported from libav.

12 years agoARM: align asm functions to 4 bytes.
Mans Rullgard [Wed, 1 Feb 2012 23:55:25 +0000 (15:55 -0800)]
ARM: align asm functions to 4 bytes.
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.

12 years agoFix normalization of colorspace when input is packed YUV 4:2:2
Anton Mitrofanov [Sun, 22 Jan 2012 09:00:23 +0000 (13:00 +0400)]
Fix normalization of colorspace when input is packed YUV 4:2:2

12 years agoForce keyint-min 1 with Blu-ray
Fiona Glaser [Sat, 21 Jan 2012 20:54:40 +0000 (12:54 -0800)]
Force keyint-min 1 with Blu-ray
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.

12 years agoFix crash in --demuxer y4m with unsupported colorspace
Oka Motofumi [Sun, 29 Jan 2012 11:34:41 +0000 (20:34 +0900)]
Fix crash in --demuxer y4m with unsupported colorspace

12 years agoFix overread/possible crash with intra refresh + VBV
Anton Mitrofanov [Mon, 16 Jan 2012 22:02:53 +0000 (14:02 -0800)]
Fix overread/possible crash with intra refresh + VBV

12 years agoFix trellis 2 + subme >= 8
Loren Merritt [Wed, 18 Jan 2012 23:47:07 +0000 (15:47 -0800)]
Fix trellis 2 + subme >= 8
Trellis didn't return a boolean value as it was supposed to.
Regression in r2143-5.

12 years agoCABAC trellis opts part 4: x86_64 asm
Loren Merritt [Fri, 6 Jan 2012 15:53:29 +0000 (15:53 +0000)]
CABAC trellis opts part 4: x86_64 asm
Another 20% faster.
18k->12k codesize.

This patch series may have a large impact on encoding speed.
For example, 24% faster at --preset slower --crf 23 with 720p parkjoy.
Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).

12 years agoCABAC trellis opts part 3: make some arrays non-static
Loren Merritt [Fri, 6 Jan 2012 15:53:04 +0000 (15:53 +0000)]
CABAC trellis opts part 3: make some arrays non-static

12 years agoCABAC trellis opts part 2: C optimizations
Loren Merritt [Thu, 22 Dec 2011 17:56:06 +0000 (17:56 +0000)]
CABAC trellis opts part 2: C optimizations

Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to lookup zigzag[] anyway.

60% faster on x86_64.
25k->18k codesize.

12 years agoCABAC trellis opts part 1: minor change in output
Loren Merritt [Thu, 22 Dec 2011 17:55:06 +0000 (17:55 +0000)]
CABAC trellis opts part 1: minor change in output
Due to different tie-break order.

12 years agox86inc improvements for 64-bit
Henrik Gramner [Sun, 8 Jan 2012 03:14:10 +0000 (04:14 +0100)]
x86inc improvements for 64-bit

Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

12 years agoHigh bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
Ilia Valiakhmetov [Sun, 15 Jan 2012 10:47:58 +0000 (04:47 -0600)]
High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.

12 years agoMMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
Edward Wang [Wed, 4 Jan 2012 23:35:54 +0000 (15:35 -0800)]
MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.

12 years agoXOP 8-bit fDCT
Fiona Glaser [Thu, 22 Dec 2011 22:03:15 +0000 (14:03 -0800)]
XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes.  About a dozen cycles faster for 16x16.

12 years agoHigh bit depth intra_sad_x3_4x4
Cristian Militaru [Wed, 4 Jan 2012 20:38:08 +0000 (12:38 -0800)]
High bit depth intra_sad_x3_4x4
From Google Code-In.

12 years agoUse a large LUT for CAVLC zero-run bit codes
Fiona Glaser [Thu, 8 Dec 2011 21:45:41 +0000 (13:45 -0800)]
Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing.
Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).

12 years agoHigh bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Matt Habel [Sat, 17 Dec 2011 07:16:09 +0000 (23:16 -0800)]
High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.

12 years agoMMX 10-bit predict_8x8c_h and predict_8x16c_h
Shitiz Garg [Sat, 3 Dec 2011 23:34:57 +0000 (15:34 -0800)]
MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.

12 years agoSome MBAFF x86 assembly functions.
Aaron Schmitz [Wed, 30 Nov 2011 06:15:45 +0000 (00:15 -0600)]
Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions.
From Google Code-In.

12 years agoMore ARM NEON assembly functions
George Stephanos [Fri, 2 Dec 2011 00:53:45 +0000 (16:53 -0800)]
More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu.
From Google Code-In.

12 years agoMore 4:2:2 asm functions
Ilia [Mon, 28 Nov 2011 13:20:09 +0000 (05:20 -0800)]
More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422.
Regular and high bit depth versions of deblock_h_chroma_intra_422.
High bit depth pixel_vsad.
SSE2 high bit depth and MMX 8-bit predict_8x8_vl.
Our first GCI patch this year!

12 years agoSSE2 and SSSE3 versions of sub8x16_dct_dc
Henrik Gramner [Thu, 8 Dec 2011 15:14:35 +0000 (16:14 +0100)]
SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc

12 years agoResize filter updates
Steven Walters [Mon, 5 Dec 2011 13:46:34 +0000 (08:46 -0500)]
Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format.
Fix deprecated use of av_set_int.
Now requires libavutil >= 51.19.0

13 years agoAdd out-of-tree build support
Oka Motofumi [Thu, 5 Jan 2012 22:23:50 +0000 (14:23 -0800)]
Add out-of-tree build support

13 years agoLimit SSIM to 100dB
Anton Mitrofanov [Fri, 16 Dec 2011 14:17:00 +0000 (18:17 +0400)]
Limit SSIM to 100dB
Avoids floating point error for infinite SSIM (lossless).