Steven Walters [Thu, 16 Sep 2010 00:42:08 +0000 (20:42 -0400)]
Add full chroma input flag to swscale
Improves quality of colorspace conversions involving RGB(A).
James Darnley [Fri, 17 Sep 2010 11:06:59 +0000 (04:06 -0700)]
Add --disable-gpl option to configure
Used for commercially-licensed versions of x264.
Doesn't currently change anything, but may be used to disable GPL-only CLI tools, such as video filters, in the future.
Also print the x264 license and libavformat license in version info.
Fiona Glaser [Fri, 17 Sep 2010 11:03:27 +0000 (04:03 -0700)]
Update source file headers
Update dates, improve file descriptions, make things more consistent.
Also add information about commercial licensing.
Fiona Glaser [Wed, 15 Sep 2010 19:06:47 +0000 (12:06 -0700)]
Fix intra refresh to not exceed max recovery_frame_cnt
The spec constrains recovery_frame_cnt to [0, MaxFrameNum-1].
So make MaxFrameNum bigger in the case of intra refresh.
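A minimal sketch of the constraint above, assuming MaxFrameNum = 2^log2_max_frame_num with log2_max_frame_num limited to 4..16 as in the H.264 spec. The helper name is hypothetical and this is not the code from the commit.

```c
/* Hypothetical helper: smallest log2_max_frame_num (spec range 4..16)
 * for which recovery_frame_cnt fits in [0, MaxFrameNum-1]. */
static int min_log2_max_frame_num( int recovery_frame_cnt )
{
    int log2_max = 4;
    /* MaxFrameNum - 1 is the largest legal recovery_frame_cnt. */
    while( log2_max < 16 && (1 << log2_max) - 1 < recovery_frame_cnt )
        log2_max++;
    return log2_max;
}
```

A longer intra refresh period therefore needs a larger MaxFrameNum, which costs a few extra frame_num bits per slice header.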
Fiona Glaser [Thu, 16 Sep 2010 10:36:17 +0000 (03:36 -0700)]
Make intra refresh finish one frame faster
In some cases, the last frame of intra refresh was redundant.
Saves a few bits.
Fiona Glaser [Tue, 14 Sep 2010 19:20:00 +0000 (12:20 -0700)]
Fix intra refresh to not predict from invalid pixels
The blocks on the right side of the intra refresh column should not predict from top-right.
Steven Walters [Mon, 13 Sep 2010 22:47:33 +0000 (18:47 -0400)]
Add configure check for mingw64 prefixing
This compensates for the inconsistent prefixing seen in different versions of the compiler.
Manuel Rommel [Sun, 5 Sep 2010 02:31:53 +0000 (19:31 -0700)]
Update some Altivec function prototypes
Silences a lot of warnings.
Takashi Hirata [Mon, 30 Aug 2010 09:13:49 +0000 (18:13 +0900)]
Add support for level 1b
This level is a stupid hack in the H.264 spec, so it's a stupid hack in x264 too.
Since level is an integer, calling applications need to set level_idc=9 to use it.
String-based option handling will accept "1b" just fine though, so CLI users don't have to worry.
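A rough sketch of how a string-based level option could map to the integer level_idc, with "1b" special-cased to 9 as described above. The function name and the exact parsing rules are assumptions for illustration, not the parser from the commit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "1b" -> 9, "3.1" -> 31, "40" -> 40. */
static int parse_level_idc( const char *s )
{
    if( !strcmp( s, "1b" ) )
        return 9;                       /* level 1b has no decimal form */
    double v = atof( s );
    if( v < 6 )
        return (int)(10 * v + 0.5);     /* "4.1"-style input */
    return atoi( s );                   /* already scaled by 10, e.g. "41" */
}
```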
Fiona Glaser [Thu, 2 Sep 2010 22:29:29 +0000 (15:29 -0700)]
Use smaller values for idr_pic_id
Saves a few bits and fixes problems on certain fantastically terrible decoders,
such as the Apple iPad.
Fiona Glaser [Mon, 30 Aug 2010 19:32:31 +0000 (12:32 -0700)]
Use POC type 2 for streams with no B-frames
Saves a few bits per slice header.
Fiona Glaser [Mon, 30 Aug 2010 05:18:07 +0000 (22:18 -0700)]
Faster cabac_encode_ue_bypass
Use CLZ + a lut instead of a loop.
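To illustrate the CLZ idea (the lut part is not shown): the unary prefix length of an order-k Exp-Golomb bypass code can be computed in one step instead of a loop. This is a sketch with hypothetical function names, assuming a 32-bit unsigned int and the GCC/Clang builtin __builtin_clz; it is not the code from the commit.

```c
/* Reference: count the prefix "1" bits of an order-k Exp-Golomb code with a loop. */
static int ue_prefix_len_loop( unsigned val, int k )
{
    int n = 0;
    while( val >= (1u << k) )
    {
        val -= 1u << k;
        k++;
        n++;
    }
    return n;
}

/* Same result from a single CLZ: n = floor(log2(val + 2^k)) - k. */
static int ue_prefix_len_clz( unsigned val, int k )
{
    unsigned v = val + (1u << k);   /* never zero, so CLZ is well defined */
    return (31 - __builtin_clz( v )) - k;
}
```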
Henrik Gramner [Tue, 31 Aug 2010 22:53:42 +0000 (00:53 +0200)]
Faster nal_escape asm
Anton Mitrofanov [Tue, 31 Aug 2010 15:45:22 +0000 (08:45 -0700)]
Allow --demuxer forcing with known extensions
Anton Mitrofanov [Fri, 3 Sep 2010 20:33:44 +0000 (13:33 -0700)]
Minor fixes/cosmetics in command-line parsing
Anton Mitrofanov [Fri, 3 Sep 2010 15:39:48 +0000 (08:39 -0700)]
Fix overflow in stats printing
Anton Mitrofanov [Sun, 29 Aug 2010 12:35:32 +0000 (16:35 +0400)]
Fix bug in 2pass if the first P-frames are all skip
last_qscale_for was read before being initialized in this case, resulting
in the value from the previous iteration being used instead.
Fiona Glaser [Thu, 26 Aug 2010 13:12:01 +0000 (09:12 -0400)]
Don't do deblock-aware RD if deblocking is off
Fiona Glaser [Sat, 21 Aug 2010 07:15:53 +0000 (00:15 -0700)]
CAVLC "trellis"
~3-10% improved compression with CAVLC.
--trellis is now a valid option with CAVLC.
Perhaps more importantly, this means psy-trellis now works with CAVLC.
This isn't a real trellis; it's actually just a simplified QNS.
But it takes enough shortcuts that it's still roughly as fast as a trellis; just not quite optimal.
Thus the name is a bit of a misnomer, but we're reusing the option name because it does the same thing.
A real trellis would be better, but CAVLC is much harder to trellis than CABAC.
I'm not aware of any published polynomial-time solutions that are significantly close to optimal.
Fiona Glaser [Sat, 21 Aug 2010 21:51:39 +0000 (16:51 -0500)]
Add global #define for maximum reference count
This should make it easier to play around with reference frame counts that exceed the spec maximum.
Fiona Glaser [Tue, 17 Aug 2010 00:47:11 +0000 (17:47 -0700)]
Simplify addressing logic for interlaced-related arrays
In progressive mode, just make [0] and [1] point to the same place.
Fiona Glaser [Mon, 23 Aug 2010 22:59:35 +0000 (18:59 -0400)]
Add missing emms to x264_nal_encode
Only matters for applications using the low-latency callback feature.
Fiona Glaser [Tue, 17 Aug 2010 21:38:41 +0000 (14:38 -0700)]
Fix 2 bugs with slice-max-size
Macroblock re-encoding didn't restore mv/tex bit counters (slightly inaccurate 2-pass).
Bitstream buffer check didn't work correctly (insanely large frames could break encoding).
Manuel Rommel [Thu, 12 Aug 2010 19:54:00 +0000 (12:54 -0700)]
NV12 version of Altivec chroma MC
Fiona Glaser [Tue, 10 Aug 2010 23:55:05 +0000 (16:55 -0700)]
Deblock-aware RD
Small quality gain (~0.5%) at lower bitrates, potentially larger with QPRD.
May help more with psy, maybe not.
Enabled at subme >= 9. Small speed cost (a few %).
Brad Smith [Sun, 8 Aug 2010 22:13:32 +0000 (18:13 -0400)]
Correct X header path usage in configure
Don't unconditionally set the header path for OpenBSD but do so if the
--enable-visualize flag is specified.
golgol7777 [Sun, 8 Aug 2010 06:01:46 +0000 (23:01 -0700)]
Fix lavf input with delayed frames
Alexander Strange [Sun, 8 Aug 2010 05:29:12 +0000 (22:29 -0700)]
Slightly improve the filtering section of x264 --help
Fiona Glaser [Sun, 8 Aug 2010 05:32:06 +0000 (22:32 -0700)]
Fix debug message typo with DTS compression
Yasuhiro Ikeda [Tue, 3 Aug 2010 13:10:15 +0000 (22:10 +0900)]
Try to guess input length for lavf input
Allows printing of progress indicator when using lavf input.
Yasuhiro Ikeda [Tue, 3 Aug 2010 13:07:36 +0000 (22:07 +0900)]
Workaround bug in fps/timestamp handling with lavf input
reordered_opaque in lavf doesn't work correctly in the identity case (no reordering).
Fixes incorrect output for some file types (e.g. raw in mov).
Mike Matsnev [Sun, 1 Aug 2010 19:08:20 +0000 (12:08 -0700)]
Fix aspect ratio writing in the MKV muxer
The braindead Matroska spec dictates aspect ratio to be measured in pixels instead of, well, an actual aspect ratio.
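A sketch of what "aspect ratio in pixels" means in practice: the sample aspect ratio (SAR) is turned into display dimensions by stretching one axis. The type and function names are hypothetical and the rounding choice is an assumption; this is not the muxer code from the commit.

```c
#include <stdint.h>

typedef struct { int w, h; } display_size_t;

/* Hypothetical helper: derive display width/height in pixels from the
 * coded size and the sample aspect ratio sar_w:sar_h. */
static display_size_t mkv_display_size( int width, int height, int sar_w, int sar_h )
{
    display_size_t d = { width, height };
    if( sar_w > 0 && sar_h > 0 && sar_w != sar_h )
    {
        if( sar_w > sar_h )     /* wide pixels: stretch horizontally */
            d.w = (int)(((int64_t)width * sar_w + sar_h / 2) / sar_h);
        else                    /* tall pixels: stretch vertically */
            d.h = (int)(((int64_t)height * sar_h + sar_w / 2) / sar_w);
    }
    return d;
}
```

For example, 720x576 with SAR 16:15 gives a display size of 768x576.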
Anton Mitrofanov [Thu, 29 Jul 2010 16:23:55 +0000 (20:23 +0400)]
Add libavcore check in configure
Fiona Glaser [Mon, 26 Jul 2010 22:38:13 +0000 (15:38 -0700)]
Improve quantizer distribution with sliced-threads+VBV
Should help avoid cases of very uneven quantizer choice between slices.
Fiona Glaser [Wed, 28 Jul 2010 18:42:06 +0000 (11:42 -0700)]
Remove dead code in slicetype.c
golgol7777 [Tue, 27 Jul 2010 15:54:38 +0000 (00:54 +0900)]
Fix incorrect duration/framerate/bitrate in flv header
Fiona Glaser [Wed, 28 Jul 2010 21:23:53 +0000 (14:23 -0700)]
invalidate_reference fixes
invalidate_reference didn't actually invalidate the immediate previous frame, only frames that came before that.
Make sure that reordering is forced when invalidate_reference is used, so that the reference list is correct decoder-side.
Steven Walters [Sun, 25 Jul 2010 23:45:27 +0000 (19:45 -0400)]
Filtering system-related fixes
Fix configure to check for outdated libavutil in resize filter support.
Do not print an explicit error message in ffms when requesting a frame beyond the number of frames in the source.
Mention in --*help that filtering options can be specified as name=value.
Fix the shadowing warning in the resize filter on posix systems.
Fiona Glaser [Thu, 22 Jul 2010 00:40:14 +0000 (17:40 -0700)]
Improve reference_invalid support
Reference invalidation can now be used to invalidate multiple frames at a time, rather than being limited to one per encoder_encode call.
Loren Merritt [Thu, 22 Jul 2010 06:40:12 +0000 (06:40 +0000)]
Eradicate all mention of SI/SP-frames
Fiona Glaser [Wed, 21 Jul 2010 18:25:11 +0000 (11:25 -0700)]
Fix stack alignment with MB-tree
Broke 2-pass with MB-tree when calling from compilers with broken stack alignment (e.g. MSVC).
Steven Walters [Sat, 17 Jul 2010 21:43:37 +0000 (17:43 -0400)]
Avisynth 2.6 colorspace support
Use a customized avisynth_c.h to detect the new planar colorspaces.
Loren Merritt [Fri, 16 Jul 2010 06:49:03 +0000 (23:49 -0700)]
Prevent some cases of cache aliasing.
Avoid cases where image strides were a large power of 2.
Core 2: +3% speed at widths 898..960, +6% at widths 1922..1984, most other resolutions unaffected.
Nehalem and AMD: similar amount of speedup, but fewer resolutions affected.
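One way to avoid strides that are a large power of 2 is to round the stride up to the required alignment and then add one more alignment unit if the result lands on a multiple of the "bad" power of 2. The sketch below assumes 64-byte alignment and a 1024-byte disalignment value; both numbers and the function name are illustrative assumptions.

```c
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Round x up to a multiple of align, then nudge it off any multiple of
 * disalign (both must be powers of 2). */
static int align_stride( int x, int align, int disalign )
{
    x = ALIGN( x, align );
    if( !(x & (disalign - 1)) )
        x += align;
    return x;
}
```

With 32 pixels of padding on each side, a width of 960 would otherwise produce a stride of exactly 1024, which matches the upper end of the first affected range quoted above.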
Fiona Glaser [Fri, 16 Jul 2010 02:35:52 +0000 (19:35 -0700)]
Fix stack alignment for adaptive quant
Broke calls from compilers with broken stack alignment (e.g. MSVC).
David Conrad [Thu, 15 Jul 2010 22:58:28 +0000 (18:58 -0400)]
Fix compilation with shared ffmpeg libs
lavf input uses libavutil functions, so it must request flags for libavutil from pkg-config.
Fiona Glaser [Thu, 15 Jul 2010 20:20:50 +0000 (13:20 -0700)]
Fix another PCM bug
CABAC assumes that NNZ is 0 or 1, not the number of actual nonzero coefficients.
Didn't actually break the output; only had a tiny effect on RD.
Oskar Arvidsson [Thu, 15 Jul 2010 12:01:36 +0000 (14:01 +0200)]
Fix regression in r1666
Broke encoding of PCM macroblocks.
Oskar Arvidsson [Thu, 15 Jul 2010 06:04:47 +0000 (08:04 +0200)]
Fix build with bit_depth > 8
Definition of x264_cli_plane_copy was inconsistent with declaration.
Loren Merritt [Thu, 8 Jul 2010 19:24:16 +0000 (12:24 -0700)]
Convert x264 to use NV12 pixel format internally
~1% faster overall on Conroe, mostly due to improved cache locality.
Also allows improved SIMD on some chroma functions (e.g. deblock).
This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
Steven Walters [Mon, 5 Jul 2010 21:37:47 +0000 (17:37 -0400)]
Add video filtering system to x264cli
Similar to mplayer's -vf system.
Supports some basic operations like resizing and cropping. Will support more in the future.
See the help for more details.
Fiona Glaser [Tue, 6 Jul 2010 20:39:44 +0000 (13:39 -0700)]
Eliminate edge cases for MV predictors
Saves a few clocks in mv pred.
Fiona Glaser [Thu, 8 Jul 2010 19:45:25 +0000 (12:45 -0700)]
Improve scenecut detection a bit
Put a minimum value on the scenecut threshold; makes x264 more likely to catch successive scenecuts (but might increase the odds of false detection).
This also fixes scenecut detection with keyint=infinite.
Also print keyint=infinite in the x264 SEI and statsfile correctly.
Fiona Glaser [Thu, 15 Jul 2010 01:47:14 +0000 (18:47 -0700)]
Fix 8x8dct+slices+no sliced threads+cavlc+deblock
Deblocking was done slightly incorrectly.
Regression in r1612.
Fiona Glaser [Thu, 8 Jul 2010 23:20:48 +0000 (16:20 -0700)]
Fix off-by-one error in slice VBV predictor updates
Anton Mitrofanov [Mon, 5 Jul 2010 13:44:15 +0000 (17:44 +0400)]
Fix disabling of progress with --log-level
Oskar Arvidsson [Fri, 2 Jul 2010 02:06:08 +0000 (04:06 +0200)]
Support for 9 and 10-bit encoding
Output bit depth is specified at compile time via --bit-depth.
There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
Input is still 8-bit only; this will change in the future.
Note that very few H.264 decoders support >8 bit depth currently.
Also note that the quantizer scale differs for higher bit depth. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
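The quantizer range follows from the H.264 rule that each extra bit of depth adds 6 to the QP range (QpBdOffset = 6 * (bit_depth - 8)). A small sketch, with a hypothetical helper name:

```c
/* Hypothetical helper: largest quantizer value for a given bit depth. */
static int qp_max_for_bit_depth( int bit_depth )
{
    return 51 + 6 * (bit_depth - 8);
}
```

This gives 51 for 8-bit, 57 for 9-bit and 63 for 10-bit.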
Fiona Glaser [Wed, 30 Jun 2010 20:55:46 +0000 (13:55 -0700)]
Support infinite keyint (--keyint infinite).
This just means x264 won't insert non-scenecut keyframes.
Useful for streaming when using interactive error recovery or some other mechanism that makes keyframes unnecessary.
Also change POC logic to limit POC/framenum LSB size (to save bits per slice).
Also fix a bug in the CPB underflow detection code (didn't affect the bitstream, just resulted in the failure to print certain warning messages).
Fiona Glaser [Wed, 30 Jun 2010 20:06:22 +0000 (13:06 -0700)]
Don't check i16x16 planar mode unless previous modes were useful
Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7).
Negligible effect on compression.
Also make a few more arrays static.
Steven Walters [Sat, 26 Jun 2010 20:28:49 +0000 (16:28 -0400)]
Centralize logging within x264cli
x264cli messages will now respect the log level they pertain to.
Slightly reduces binary size.
Lamont Alston [Tue, 29 Jun 2010 17:11:42 +0000 (10:11 -0700)]
Make open-GOP Blu-ray compatible
Blu-ray is even more braindamaged than we thought.
Accordingly, open-gop options are now "normal" and "bluray", as opposed to display and coded.
Normal should be used in all cases besides Blu-ray authoring.
Fiona Glaser [Mon, 28 Jun 2010 22:02:33 +0000 (15:02 -0700)]
Callback feature for low-latency per-slice output
Add a callback to allow the calling application to send slices immediately after being encoded.
Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered.
Full documentation is in x264.h.
Loren Merritt [Sun, 27 Jun 2010 03:55:59 +0000 (20:55 -0700)]
Simplify pixel_ads
Fiona Glaser [Thu, 24 Jun 2010 00:29:34 +0000 (17:29 -0700)]
Interactive encoder control: error resilience
In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients.
One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any frames that reference it) as lost.
This allows quick recovery from errors with minimal expense bit-wise.
The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames.
This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames.
If no recovery is possible even with the available buffer, x264 will force a keyframe.
This initial version does not support B-frames or intra refresh.
Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery.
Full documentation is in x264.h.
Move DTS/PTS calculation to before encoding each frame instead of after.
Improve documentation of x264_encoder_intra_refresh.
Fiona Glaser [Thu, 17 Jun 2010 21:50:07 +0000 (14:50 -0700)]
Lookaheadless MB-tree support
Uses past motion information instead of future data from the lookahead.
Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available.
Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes.
Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used.
Also slightly modify encoding presets: disable rc-lookahead in the fastest presets.
Enable MB-tree in "veryfast", albeit with a very short lookahead.
Lamont Alston [Wed, 16 Jun 2010 17:05:17 +0000 (10:05 -0700)]
Open-GOP support
Allows B-frames immediately prior to keyframes (in display order).
This helps reduce keyframe popping and improve compression with short keyframe intervals.
Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available.
The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance.
The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used.
Steven Walters [Wed, 9 Jun 2010 22:14:52 +0000 (18:14 -0400)]
Use threadpools to avoid unnecessary thread creation
Tiny performance improvement with fast settings and lots of threads.
May help more on some OSs with slow thread creation, like OS X.
Unify inconsistent abbreviations of "synchronized" to "sync".
Fiona Glaser [Sat, 19 Jun 2010 08:41:07 +0000 (01:41 -0700)]
Improve 2-pass bitrate prediction
Adapt based on distance to the end in bits, not in frames.
Helps in videos with absurdly simple end sections, e.g. black frames.
Fiona Glaser [Fri, 18 Jun 2010 20:58:11 +0000 (13:58 -0700)]
SSE4 and SSSE3 versions of some intra_sad functions
Primarily Nehalem-optimized.
Fiona Glaser [Sat, 19 Jun 2010 10:27:33 +0000 (03:27 -0700)]
Improve HRD accuracy
In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision.
Accordingly, convert buffer management to work in units of timescale.
These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams.
Fiona Glaser [Tue, 22 Jun 2010 21:20:46 +0000 (14:20 -0700)]
Use -fno-tree-vectorize to avoid miscompilation
Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border.
This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely.
It's not like it ever did anything useful to begin with.
Anton Mitrofanov [Fri, 18 Jun 2010 21:44:56 +0000 (01:44 +0400)]
Fix SIGPIPEs caused by is_regular_file checks
Check to see if input file is a pipe without opening it.
Fiona Glaser [Tue, 15 Jun 2010 12:15:42 +0000 (05:15 -0700)]
Fix compilation on ARM w/ Apple ABI
Holger Lubitz [Wed, 9 Jun 2010 11:59:06 +0000 (13:59 +0200)]
Faster mbtree_propagate asm
Replace fp division by multiply with the reciprocal.
Only ~12% faster on penryn, but over 80% faster on amd k8.
Also make checkasm slightly more tolerant to rounding error.
Diogo Franco [Mon, 14 Jun 2010 00:57:32 +0000 (21:57 -0300)]
Convert the OPT_ defines in x264.c to an enum
Anton Mitrofanov [Sun, 13 Jun 2010 19:14:15 +0000 (23:14 +0400)]
Don't allow baseline profile streams with fake-interlaced
Indicate use of --fake-interlaced in encoding options SEI.
Havoc Pennington [Thu, 10 Jun 2010 20:28:52 +0000 (16:28 -0400)]
Allocate space for null terminator in param_apply_tune
Anton Mitrofanov [Thu, 10 Jun 2010 17:33:46 +0000 (21:33 +0400)]
Fix regression in r1501.
Could cause slightly incorrect analysis in rare cases, but no serious encoding issues.
Also shut up gcc warning about pels_v.
Anton Mitrofanov [Wed, 9 Jun 2010 18:53:08 +0000 (22:53 +0400)]
Fix crash with --subme 0 + --weightp > 0. Regression in r1535
Henrik Gramner [Tue, 8 Jun 2010 14:29:16 +0000 (16:29 +0200)]
Replace some divisions with shifts
Anton Mitrofanov [Mon, 7 Jun 2010 22:43:37 +0000 (02:43 +0400)]
Warn about shadowed variable declarations
Also get rid of a few instances of variable shadowing.
Fiona Glaser [Mon, 7 Jun 2010 21:26:05 +0000 (14:26 -0700)]
Template load_pic_pointers based on interlaced
Significantly speeds up cache_load in the non-interlaced case.
Also various other minor optimizations in cache_load and cache_save.
Fiona Glaser [Mon, 7 Jun 2010 21:15:33 +0000 (14:15 -0700)]
Remove double-dereferences for MB width/height data
Store it in x264_t instead of going through the SPS.
Steven Walters [Sun, 23 May 2010 00:54:35 +0000 (20:54 -0400)]
Exempt Win x86_64 from memalign hack
The API mandates all mallocs are 16 byte aligned.
Remove unused int that stores sizeof malloc in memalign hack.
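For context, a memalign hack of this kind typically over-allocates, rounds the pointer up to a 16-byte boundary, and stashes the original malloc pointer just below the aligned block so it can be freed later. The sketch below uses hypothetical function names and is not the code from the commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 16-byte-aligned allocator built on plain malloc. */
static void *malloc_aligned16( size_t size )
{
    /* Room for the payload, worst-case alignment slack, and the saved pointer. */
    uint8_t *raw = malloc( size + 15 + sizeof(void *) );
    if( !raw )
        return NULL;
    uint8_t *aligned = raw + 15 + sizeof(void *);
    aligned -= (uintptr_t)aligned & 15;     /* round down to a 16-byte boundary */
    ((void **)aligned)[-1] = raw;           /* remember what to pass to free() */
    return aligned;
}

static void free_aligned16( void *p )
{
    if( p )
        free( ((void **)p)[-1] );
}
```

On platforms whose malloc already guarantees 16-byte alignment, none of this bookkeeping is needed, which is the point of the exemption.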
Steven Walters [Fri, 4 Jun 2010 20:44:55 +0000 (13:44 -0700)]
Preprocessing cosmetics
Unify input/output defines to HAVE_* format.
Define values as 1 to simplify conditionals.
Fiona Glaser [Fri, 4 Jun 2010 04:31:10 +0000 (21:31 -0700)]
Take more shortcuts in i4x4/i8x8 analysis
Based on the scores of the H and V modes, rule out modes which are unlikely.
Small compression loss (0.1-0.5%) and large speed gain (10-30% faster intra analysis).
Not enabled in slower encoding modes.
Also make C versions of the merged SATD functions in order to eliminate branches based on their availability.
Fiona Glaser [Wed, 2 Jun 2010 22:47:26 +0000 (15:47 -0700)]
Display SSIM measurement in dB as well
Anton Mitrofanov [Mon, 7 Jun 2010 21:03:03 +0000 (01:03 +0400)]
Make version.sh indicate "M" for local commits too
Alex Jurkiewicz [Sun, 6 Jun 2010 07:21:12 +0000 (15:21 +0800)]
Add error message for invalid [de]muxer selection
Nathan Caldwell [Sun, 6 Jun 2010 20:19:41 +0000 (14:19 -0600)]
Deduplicate the ALIGN macro, move it to common.h
David Conrad [Thu, 3 Jun 2010 23:02:24 +0000 (19:02 -0400)]
Fix a use of ALIGNED_ARRAY_16 on ARM
Fiona Glaser [Tue, 8 Jun 2010 22:41:17 +0000 (15:41 -0700)]
Add missing emms after nal_encode
Caused random, bizarre failures with some calling applications.
Fiona Glaser [Tue, 8 Jun 2010 22:38:32 +0000 (15:38 -0700)]
Fix crash in fake-interlaced at some resolutions
Yusuke Nakamura [Wed, 2 Jun 2010 13:27:57 +0000 (22:27 +0900)]
Fix no-mbtree + aq-mode=0
Regression in r1618.
Fiona Glaser [Wed, 2 Jun 2010 08:07:44 +0000 (01:07 -0700)]
Add API function to fix x264_picture_t initialization
Calling applications that do not use x264_picture_alloc need to use x264_picture_init to initialize x264_picture_t structures.
Previously, if the calling application didn't zero x264_picture_t, Bad Things could happen.
Yusuke Nakamura [Wed, 2 Jun 2010 08:02:31 +0000 (17:02 +0900)]
Fix Avisynth input
Regression in r1624. A more permanent solution to the problem will be committed later.
Oskar Arvidsson [Wed, 2 Jun 2010 00:08:45 +0000 (02:08 +0200)]
Convert to a unified "dctcoeff" type for DCT data
Necessary for future high bit-depth support.
Oskar Arvidsson [Tue, 1 Jun 2010 23:35:38 +0000 (01:35 +0200)]
Convert to a unified "pixel" type for pixel data
Necessary for future high bit-depth support.
Various macros and extra types have been introduced to make operations on variable-size pixels more convenient.
Fiona Glaser [Fri, 28 May 2010 21:27:22 +0000 (14:27 -0700)]
Add API tool to apply arbitrary quantizer offsets
The calling application can now pass a "map" of quantizer offsets to apply to each frame.
An optional callback to free the map can also be included.
This allows all kinds of flexible region-of-interest coding and similar.
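As an illustration of region-of-interest coding with such a map, the sketch below builds a per-macroblock offset array that lowers the quantizer inside a rectangle. The helper name and the layout (one float per 16x16 macroblock, raster order) are assumptions made for this example; the authoritative description of the map is in x264.h.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a zeroed map with `offset` applied to every
 * macroblock touched by the pixel rectangle (x0,y0)-(x1,y1), inclusive.
 * Coordinates are assumed non-negative. The caller frees the map with free(). */
static float *make_roi_offsets( int width, int height,
                                int x0, int y0, int x1, int y1, float offset )
{
    int mb_w = (width + 15) >> 4;
    int mb_h = (height + 15) >> 4;
    float *map = calloc( (size_t)mb_w * mb_h, sizeof(float) );
    if( !map )
        return NULL;
    for( int my = y0 >> 4; my <= (y1 >> 4) && my < mb_h; my++ )
        for( int mx = x0 >> 4; mx <= (x1 >> 4) && mx < mb_w; mx++ )
            map[my * mb_w + mx] = offset;   /* negative = better quality */
    return map;
}
```

Since the map here comes from calloc, plain free would be a natural choice for the optional free callback.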
Fiona Glaser [Thu, 27 May 2010 21:27:32 +0000 (14:27 -0700)]
x86 assembly code for NAL escaping
Up to ~10x faster than C depending on CPU.
Helps the most at very high bitrates (e.g. lossless).
Also make the C code faster and simpler.
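For reference, NAL escaping inserts an emulation prevention byte (0x03) whenever two zero bytes are followed by a byte in the range 0x00..0x03, so that start codes cannot appear inside the payload. The following is a plain C sketch with a hypothetical function name, not the code from the commit; it does not handle the special case of a payload ending in a zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy src to dst with emulation prevention bytes.
 * dst must have room for at least size + size/2 bytes.
 * Returns the number of bytes written. */
static size_t nal_escape( uint8_t *dst, const uint8_t *src, size_t size )
{
    size_t out = 0;
    int zeros = 0;                      /* consecutive zero bytes written so far */
    for( size_t i = 0; i < size; i++ )
    {
        if( zeros == 2 && src[i] <= 0x03 )
        {
            dst[out++] = 0x03;          /* break up 00 00 0x */
            zeros = 0;
        }
        dst[out++] = src[i];
        zeros = src[i] ? 0 : zeros + 1;
    }
    return out;
}
```

Most bytes pass through unchanged, so the cost is dominated by the scan for zero runs, which is why this matters most at very high bitrates.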
Fiona Glaser [Fri, 28 May 2010 21:30:07 +0000 (14:30 -0700)]
Re-enable i8x8 merged SATD
Accidentally got disabled when intra_sad_x3 was added.