granicus.if.org Git - libx264/log

James Darnley [Thu, 9 Jul 2009 18:25:55 +0000 (11:25 -0700)]

Fix bug in reference frame autoadjustment
For some types of input file, x264 did the adjustment before width/height were known.

Fiona Glaser [Tue, 7 Jul 2009 18:13:39 +0000 (11:13 -0700)]

Fix fprofile settings to match changes in defaults
Also add b-adapt 2 to fprofile.

Fiona Glaser [Fri, 3 Jul 2009 09:33:44 +0000 (02:33 -0700)]

Slightly faster dequant_flat assembly
Eliminate some redundant shifts.

Fiona Glaser [Thu, 2 Jul 2009 04:14:57 +0000 (21:14 -0700)]

Totally new preset system for x264.c (not libx264), new defaults
Other new features include "tune" and "profile" settings; see --help for more details.
Unlike most other settings, "preset" and "tune" act before all other options.
However, "profile" acts afterwards, overriding all other options.
Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
Users will hopefully find these changes to greatly improve usability.

Fiona Glaser [Wed, 1 Jul 2009 23:33:12 +0000 (16:33 -0700)]

Update Gabriel's email address in AUTHORS

Fiona Glaser [Tue, 30 Jun 2009 22:20:32 +0000 (15:20 -0700)]

Early termination for chroma encoding
Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8. mmx/sse2/ssse3 versions of each.
Early termination is disabled at very low QPs due to it not being useful there.
Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
Increase is greater with lower bitrates.
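
The DC-only observation can be made concrete. In the H.264 4x4 core transform, the DC coefficient is just the sum of the 16 residual samples, so a cheap DC pass is possible without running a full DCT. The following is a minimal sketch of that idea, assuming a helper in the spirit of the dct_dc_8x8 mentioned above; the function name, signature, and stride parameters are hypothetical and are not taken from x264.

```c
#include <stdint.h>

/* Hypothetical helper: DC term of each 4x4 quadrant of an 8x8 residual.
 * For the H.264 4x4 core transform, the DC coefficient is the plain sum
 * of the 16 residual samples (source minus prediction). */
void dc_of_8x8_sketch(int16_t dc[4],
                      const uint8_t *src, int src_stride,
                      const uint8_t *pred, int pred_stride)
{
    for (int blk = 0; blk < 4; blk++) {
        int x0 = (blk & 1) * 4;   /* quadrant column offset */
        int y0 = (blk >> 1) * 4;  /* quadrant row offset */
        int sum = 0;
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++)
                sum += src[(y0 + y) * src_stride + x0 + x]
                     - pred[(y0 + y) * pred_stride + x0 + x];
        dc[blk] = (int16_t)sum;   /* |sum| <= 16 * 255, fits in int16_t */
    }
}
```

An encoder could compare these four values (plus some measure of residual energy, such as a variance) against a threshold before deciding whether the full transform is worth running; the actual heuristic used by the commit is not described in the message.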

David Conrad [Fri, 26 Jun 2009 20:09:44 +0000 (13:09 -0700)]

Fix bug in checkasm
frame_init_lowres_core check didn't check the C plane.
However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.

Fiona Glaser [Wed, 24 Jun 2009 21:39:15 +0000 (14:39 -0700)]

Add subpartition cost for sub-8x8 blocks
Improves sub-p8x8 mode decision.

Fiona Glaser [Wed, 24 Jun 2009 20:24:18 +0000 (13:24 -0700)]

Yet more CABAC and CAVLC optimizations
Also clean up a lot of pointless code duplication in CAVLC MV coding.

Fiona Glaser [Sat, 20 Jun 2009 01:49:55 +0000 (18:49 -0700)]

Various CABAC optimizations and cleanups
Faster CABAC CBF context calculation for inter blocks.
Add x264_constant_p(), will probably be useful in the future as well.
Simpler subpartition functions.
Clean up and optimize mvd_cpn a bit more.
Various other minor optimizations.

David Wolstencroft [Sat, 20 Jun 2009 19:42:55 +0000 (21:42 +0200)]

AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.

Fiona Glaser [Fri, 19 Jun 2009 23:03:18 +0000 (16:03 -0700)]

MMX CABAC mvd sum calculation
Faster CABAC mvd coding.

Fiona Glaser [Fri, 19 Jun 2009 23:02:39 +0000 (16:02 -0700)]

Faster MV prediction
Smaller code size, plus I get to use goto.

Fiona Glaser [Wed, 10 Jun 2009 17:37:01 +0000 (10:37 -0700)]

Fix potential crash in checkasm
ssim_end4_sse2 requires aligned sums

Fiona Glaser [Wed, 10 Jun 2009 17:11:00 +0000 (10:11 -0700)]

SSSE3, faster SSE2/MMX integral_init4v
The real reason I wrote this was an excuse to use shufpd.

Mike Frysinger [Thu, 11 Jun 2009 08:29:27 +0000 (08:29 +0000)]

configure check for uclinux

Loren Merritt [Thu, 11 Jun 2009 08:27:46 +0000 (08:27 +0000)]

fix a crash on frame width <= 48 pixels

Loren Merritt [Wed, 27 May 2009 20:47:18 +0000 (20:47 +0000)]

configure check for cc, rather than reporting lack of compiler as an asm error.
configure check for -mno-cygwin, since it's removed from gcc4.

Loren Merritt [Sun, 24 May 2009 05:01:26 +0000 (05:01 +0000)]

a better way to keep track of mv candidates.
2-4% faster dia, hex, and umh.

Loren Merritt [Sun, 24 May 2009 05:01:19 +0000 (05:01 +0000)]

reorder some motion estimation patterns.
this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.

Loren Merritt [Mon, 25 May 2009 23:16:05 +0000 (19:16 -0400)]

Fix VBV warning broken in r915
x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.

Loren Merritt [Mon, 25 May 2009 07:03:10 +0000 (07:03 +0000)]

configure check for ssse3-capable binutils

Fiona Glaser [Sun, 24 May 2009 20:58:08 +0000 (16:58 -0400)]

Fix 10L in r1155
Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.

Fiona Glaser [Sat, 23 May 2009 04:28:15 +0000 (21:28 -0700)]

Fix bug where satd was incorrectly used with subme<=1
Faster subme<=1 with i4x4 enabled.

Fiona Glaser [Sat, 23 May 2009 03:40:27 +0000 (20:40 -0700)]

Remove some pointless error handling code in cabac/cavlc

Fiona Glaser [Sat, 23 May 2009 01:40:12 +0000 (18:40 -0700)]

Save some memory on mv cost arrays
Have quantizers that use the same lambda share the same cost array.

Fiona Glaser [Fri, 22 May 2009 23:57:33 +0000 (16:57 -0700)]

Various CABAC and CAVLC optimizations
Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)

Loren Merritt [Tue, 19 May 2009 02:47:15 +0000 (02:47 +0000)]

fix a race condition at the end of thread_input

Fiona Glaser [Tue, 19 May 2009 02:40:45 +0000 (22:40 -0400)]

Various trellis speed optimizations

Fiona Glaser [Sat, 16 May 2009 19:16:34 +0000 (12:16 -0700)]

Make i686 the default arch on x86_32
Disabling asm will default to a generic arch.
Also fix configure for gcc 4.4.

Fiona Glaser [Sat, 16 May 2009 03:07:59 +0000 (20:07 -0700)]

Faster signed golomb coding
3% faster CAVLC RDO and bitstream writing.
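
For readers unfamiliar with the coding being optimized: H.264's signed Exp-Golomb code se(v) maps a signed value to an unsigned code number and then writes that as an unsigned Exp-Golomb codeword. The sketch below shows the standard mapping and the resulting codeword length in plain C. It is a straightforward reference form, not the optimized routine from this commit, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Map a signed value to its unsigned Exp-Golomb code number, as H.264's
 * se(v) does: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ...
 * Assumes |v| is well below 2^30 so nothing wraps. */
uint32_t se_to_code_num(int32_t v)
{
    return v > 0 ? 2u * (uint32_t)v - 1u
                 : 2u * (uint32_t)(-(int64_t)v);
}

/* Length in bits of the Exp-Golomb codeword for code_num:
 * 2 * floor(log2(code_num + 1)) + 1. */
int ue_size(uint32_t code_num)
{
    uint32_t x = code_num + 1;
    int bits = 0;
    while (x >>= 1)
        bits++;
    return 2 * bits + 1;
}

/* Length in bits of the signed codeword se(v). */
int se_size(int32_t v)
{
    return ue_size(se_to_code_num(v));
}
```

A size function like se_size is what rate-distortion optimization needs (it only counts bits), while bitstream writing needs the code number itself; speeding up the shared mapping helps both paths.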

Fiona Glaser [Thu, 14 May 2009 11:11:15 +0000 (04:11 -0700)]

Faster spatial direct MV prediction
unroll/tweak col_zero_flag

Fiona Glaser [Mon, 4 May 2009 11:19:28 +0000 (04:19 -0700)]

More CABAC and CAVLC optimizations
Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.

Fiona Glaser [Thu, 30 Apr 2009 05:54:52 +0000 (22:54 -0700)]

Various optimizations in frametype lookahead

Fiona Glaser [Mon, 27 Apr 2009 05:13:17 +0000 (22:13 -0700)]

Some cosmetics/cleanup
Move some macros to x86util.asm that should have been there to begin with.
Fix a typo that didn't cause any issues.

Guillaume Poirier [Tue, 21 Apr 2009 21:18:44 +0000 (21:18 +0000)]

fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler versions)

Guillaume Poirier [Tue, 21 Apr 2009 15:32:21 +0000 (17:32 +0200)]

fix "conversions between vectors with differing element types or numbers of subparts" errors

Fiona Glaser [Sat, 18 Apr 2009 23:07:53 +0000 (16:07 -0700)]

Add "coded blocks" stat to output information.
This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
"y,uvDC,uvAC" refers to luma, chroma DC, and chroma AC blocks.
Note that skip blocks are included in this stat.

Fiona Glaser [Sat, 18 Apr 2009 06:38:29 +0000 (23:38 -0700)]

Enable asm predict_8x8_filter
I'm not entirely sure how this snuck its way out of holger's intra pred patch.

Fiona Glaser [Fri, 17 Apr 2009 13:00:39 +0000 (06:00 -0700)]

Remove various bits of dead code found by CLANG.

Fiona Glaser [Tue, 14 Apr 2009 21:47:02 +0000 (14:47 -0700)]

Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
shufps is the most underrated SSE instruction on x86.

Fiona Glaser [Thu, 9 Apr 2009 09:14:41 +0000 (02:14 -0700)]

Various CABAC optimizations
Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.

Fiona Glaser [Wed, 8 Apr 2009 12:45:03 +0000 (05:45 -0700)]

CAVLC optimizations
faster bs_write_te, port CABAC context selection optimization to CAVLC.

Fiona Glaser [Sun, 5 Apr 2009 20:01:42 +0000 (13:01 -0700)]

Faster CABAC RDO
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding,
it's faster to use a branch than a cmov.

Fiona Glaser [Tue, 31 Mar 2009 17:36:57 +0000 (10:36 -0700)]

Activate intra_sad_x3_8x8c in lookahead

Fiona Glaser [Tue, 31 Mar 2009 17:34:35 +0000 (10:34 -0700)]

MBAFF interlaced coding is not allowed in baseline profile

Fiona Glaser [Tue, 31 Mar 2009 02:30:59 +0000 (19:30 -0700)]

intra_sad_x3_8x8 assembly

Fiona Glaser [Mon, 30 Mar 2009 23:37:46 +0000 (16:37 -0700)]

intra_sad_x3_4x4 assembly

Fiona Glaser [Mon, 30 Mar 2009 11:07:50 +0000 (04:07 -0700)]

intra_sad_x3_8x8c assembly
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)

Fiona Glaser [Mon, 30 Mar 2009 01:27:32 +0000 (18:27 -0700)]

Shave one instruction off CABAC encode_decision
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
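
The identity holds because subtracting 4 from a value in 4..7 just clears bit 2, which is exactly what masking with 3 does. A small exhaustive check in C, where the function name is invented for this example and r stands in for any range value whose top bits satisfy the stated condition:

```c
#include <stdint.h>

/* Returns 1 if (r >> 6) - 4 == (r >> 6) & 3 for every r with r >> 6 in
 * 4..7 (that is, r in 256..511), and 0 otherwise. */
int lps_index_identity_holds(void)
{
    for (uint32_t r = 256; r < 512; r++)
        if ((r >> 6) - 4 != ((r >> 6) & 3))
            return 0;
    return 1;
}
```

The masked form can save an instruction because the AND may fold into surrounding address or shift arithmetic, whereas the subtraction is a separate step.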

Fiona Glaser [Fri, 27 Mar 2009 05:22:23 +0000 (22:22 -0700)]

Faster probe_skip
Add a second chroma threshold after the DC transform.

Fiona Glaser [Thu, 19 Mar 2009 19:28:21 +0000 (12:28 -0700)]

Add missing "static" qualifier to two arrays
Should slightly improve performance.

Fiona Glaser [Tue, 17 Mar 2009 18:01:57 +0000 (11:01 -0700)]

SSE2 zigzag_interleave
Replace PHADD with FastShuffle (more accurate naming).
This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.

Fiona Glaser [Tue, 10 Mar 2009 06:37:53 +0000 (23:37 -0700)]

Faster integral_init
palignr to avoid unaligned loads is worth it in inith, but not initv.

Holger Lubitz [Mon, 9 Mar 2009 21:05:16 +0000 (14:05 -0700)]

Faster SSSE3 hpel_filter_v
~10% faster hpel_filter on 64-bit Penryn.
32-bit version by Fiona Glaser.

Fiona Glaser [Sun, 8 Mar 2009 00:43:09 +0000 (16:43 -0800)]

Faster SSE2 pixel_var
Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.

Fiona Glaser [Sat, 7 Mar 2009 08:27:27 +0000 (00:27 -0800)]

SSSE3 hpel_filter_v
Optimized using the same method as in r1122. Patch partially by Holger.
~8% faster hpel filter on 64-bit Nehalem

Fiona Glaser [Sat, 7 Mar 2009 02:57:15 +0000 (18:57 -0800)]

Update some asm copyright headers

Holger Lubitz [Sat, 7 Mar 2009 02:16:30 +0000 (18:16 -0800)]

Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.

Fiona Glaser [Fri, 6 Mar 2009 23:28:47 +0000 (15:28 -0800)]

Update x264 copyright date

Fiona Glaser [Wed, 4 Mar 2009 11:16:06 +0000 (03:16 -0800)]

Remove pre-scenecut from fprofile commands as well
Also add psy-trellis to fprofile

Fiona Glaser [Wed, 4 Mar 2009 00:21:52 +0000 (16:21 -0800)]

Slightly faster 8x16 SAD on Penryn Core 2
Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case.
Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.

Fiona Glaser [Fri, 27 Feb 2009 03:50:09 +0000 (19:50 -0800)]

Fix scenecut and VBV with videos of width/height <= 32
Also remove an unused variable

Fiona Glaser [Thu, 26 Feb 2009 22:29:50 +0000 (14:29 -0800)]

Remove non-pre scenecut
Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
Simplify pre-scenecut code.

Guillaume Poirier [Tue, 3 Mar 2009 15:44:18 +0000 (07:44 -0800)]

Add AltiVec version of hadamard_ac. 2.4x faster than the C version.
Note that this implementation is pretty naive and should be improved
by implementing what's discussed in this ML thread:
date: Mon, Feb 2, 2009 at 6:58 PM
subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines

Fiona Glaser [Thu, 26 Feb 2009 20:07:56 +0000 (12:07 -0800)]

Fix regression in r1085
Deblocking was very slightly incorrect with partitions=all.
Bug found by BugMaster.

Fiona Glaser [Mon, 16 Feb 2009 13:56:12 +0000 (05:56 -0800)]

Optimize neighbor CBP calculation and fix related regression
r1105 introduced array overflow in cbp handling

Tal Aloni [Sat, 14 Feb 2009 00:30:14 +0000 (16:30 -0800)]

Show FPS when importing a raw YUV file

Anton Mitrofanov [Wed, 11 Feb 2009 18:38:56 +0000 (10:38 -0800)]

Windows 64-bit support
A "make distclean" is probably required after updating to this revision.

Fiona Glaser [Wed, 11 Feb 2009 18:35:56 +0000 (10:35 -0800)]

Minor fixes and cosmetics
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.

Manuel Rommel [Tue, 10 Feb 2009 20:06:47 +0000 (12:06 -0800)]

fix 10l in 75b495f2723fcb77f
Original thread:
date: Mon, Feb 9, 2009 at 9:37 PM
subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors . (Guillaume Poirier )

Guillaume Poirier [Mon, 9 Feb 2009 20:17:33 +0000 (21:17 +0100)]

Spare a vec_perm and a vec_mergeh by using a LUT of permutation vectors.

Guillaume Poirier [Mon, 9 Feb 2009 20:12:23 +0000 (21:12 +0100)]

Promote chroma planes to 16 byte alignment.
This will allow simplifying vector loads that can only load 16-byte
aligned data (such as AltiVec).

Fiona Glaser [Mon, 9 Feb 2009 19:30:54 +0000 (11:30 -0800)]

Fix 10L in intra pred
Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).

Fiona Glaser [Mon, 9 Feb 2009 07:36:40 +0000 (23:36 -0800)]

Add decimation in i16x16 blocks
Up to +0.04db with CAVLC, generally a lot less with CABAC.

Fiona Glaser [Sat, 7 Feb 2009 10:27:16 +0000 (02:27 -0800)]

Much faster CABAC residual context selection
Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
Up to 7% faster overall in extreme cases.

commit | commitdiff | tree

Fiona Glaser [Sat, 7 Feb 2009 09:57:43 +0000 (01:57 -0800)]

Faster coeff_last64 on 32-bit

Fiona Glaser [Fri, 6 Feb 2009 10:59:36 +0000 (02:59 -0800)]

More intra pred asm optimizations
SSSE3 version of predict_8x8_hu
SSE2 version of predict_8x8c_p
SSSE3 versions of both planar prediction functions
Optimizations to predict_16x16_p_sse2
Some unnecessary REP_RETs -> RETs.
SSE2 version of predict_8x8_vr by Holger.
SSE2 version of predict_8x8_hd.
Don't compile MMX versions of some of the pred functions on x86_64.
Remove now-useless x86_64 C versions of 4x4 pred functions.
Rewrite some of the x86_64-only C functions in asm.

Manuel Rommel [Sun, 8 Feb 2009 20:35:51 +0000 (21:35 +0100)]

Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling.
Also put width == 2 variant in its own scalar function because it's faster
than a vectorized one.

Holger Lubitz [Wed, 4 Feb 2009 20:46:17 +0000 (12:46 -0800)]

Merging Holger's GSOC branch part 2: intra prediction
Assembly versions of most remaining 4x4 and 8x8 intra pred functions.
Assembly version of predict_8x8_filter.
A few other optimizations.
Primarily Core 2-optimized.

Guillaume Poirier [Wed, 4 Feb 2009 10:04:55 +0000 (10:04 +0000)]

10l: fix compilation with GCC 4.3+

Fiona Glaser [Sat, 31 Jan 2009 13:00:39 +0000 (05:00 -0800)]

Faster 8x8dct+CAVLC interleave
Integrate array_non_zero with the CAVLC 8x8dct interleave function.
Roughly 1.5-2x faster than the original separate array_non_zero method.

Fiona Glaser [Sat, 31 Jan 2009 09:00:26 +0000 (01:00 -0800)]

Measure CBP cost in i8x8 RD refinement
~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise.
Allows a small optimization in i8x8 encoding.

Guillaume Poirier [Sun, 1 Feb 2009 19:58:00 +0000 (20:58 +0100)]

Take advantage of saturated signed horizontal sum instructions in
the variance computation epilogue, since the sums there cannot overflow
and saturation is never triggered.
Suggested by Loren Merritt

Fiona Glaser [Fri, 30 Jan 2009 11:40:54 +0000 (03:40 -0800)]

Massive overhaul of nnz/cbp calculation
Modify quantization to also calculate array_non_zero.
PPC assembly changes by gpoirier.
New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
Also add new i16x16 DC-only iDCT with asm.
Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
Overall performance increase 0-6% depending on encoding settings.
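
The core idea, having quantization report whether anything nonzero survived, can be sketched in a few lines of C. This is an illustrative scalar version only: the function name, the 0.16 fixed-point multiplier table, and the bias table are assumptions made for the example, and the real assembly versions (including the SSE4 ptest variant) are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical sketch: quantize 16 coefficients in place and return 1 if
 * any quantized coefficient is nonzero, 0 otherwise.
 * mf[i] is a 0.16 fixed-point multiplier; bias[i] is a rounding offset
 * added to the magnitude before scaling. Inputs are assumed small enough
 * that every result fits in int16_t. */
int quant_4x4_nz_sketch(int16_t coef[16], const uint16_t mf[16],
                        const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int c = coef[i];
        if (c > 0)
            c = (int)(((uint64_t)(bias[i] + c) * mf[i]) >> 16);
        else
            c = -(int)(((uint64_t)(bias[i] - c) * mf[i]) >> 16);
        coef[i] = (int16_t)c;
        nz |= c;              /* accumulate "any bit set" across the block */
    }
    return nz != 0;
}
```

With the flag available at quantization time, a caller can skip dequant, zigzag, and decimation for blocks that quantize to all zeros, instead of scanning the block again afterwards.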

Guillaume Poirier [Thu, 29 Jan 2009 09:28:12 +0000 (01:28 -0800)]

Add PowerPC support for "checkasm --bench", reading the time base register.
This isn't ideal since the `time base' register is running at a fraction
of the processor cycle speed, so the measurement isn't as precise as x86's
rdtsc.
It's better than nothing though...

Brad Smith [Thu, 29 Jan 2009 04:35:34 +0000 (04:35 +0000)]

fix detection of pthread and isfinite on OpenBSD

Loren Merritt [Tue, 27 Jan 2009 05:42:51 +0000 (05:42 +0000)]

remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`.
remove auto-reconfigure on svn update, which has done nothing since we stopped using svn.
fix $AS on sparc (was disabled by mmx check).
fix --extra-asflags (was ignored).
mark bash scripts as bash, not sh

patch partly by Greg Robinson and Jugdish.

Loren Merritt [Mon, 26 Jan 2009 14:28:48 +0000 (14:28 +0000)]

1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd.
60KB smaller binary.
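
The message says only "pseudo-simd", so the following is a sketch of one way such a trick can work rather than a copy of the commit: two signed 16-bit values are packed into one 32-bit integer so that a single scalar operation processes both. The names pack2_sketch and abs2_sketch are invented here, and lane values are assumed to stay within -32767..32767.

```c
#include <stdint.h>

/* Pack two signed 16-bit lanes into one 32-bit word as lo + (hi << 16).
 * A negative lo borrows 1 from the high half; abs2_sketch() copes with it. */
uint32_t pack2_sketch(int lo, int hi)
{
    return (uint32_t)lo + ((uint32_t)hi << 16);
}

/* Absolute value of both lanes with one add and one xor. */
uint32_t abs2_sketch(uint32_t a)
{
    /* s holds 0xFFFF in each 16-bit half whose top bit is set. */
    uint32_t s = ((a >> 15) & 0x10001u) * 0xFFFFu;
    /* (a + s) ^ s negates the flagged halves; the carry out of the low
     * half undoes the borrow a negative low lane took from the high half. */
    return (a + s) ^ s;
}
```

Additions and subtractions of packed words work lane-wise in the same way (as long as no lane overflows 16 bits), so a Hadamard butterfly followed by abs2_sketch handles two columns per operation on plain integer registers.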

Fiona Glaser [Wed, 28 Jan 2009 07:27:56 +0000 (23:27 -0800)]

Hack around a potential failure point in VBV
pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads.
This isn't a final fix, but should resolve the problem in most cases in the meantime.

Fiona Glaser [Tue, 27 Jan 2009 07:43:25 +0000 (23:43 -0800)]

Much faster chroma encoding and other opts
~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
Small optimization in cache_save (skip_bp)
Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
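
To see why a special-case idct_dc pays off: when a 4x4 block has only a DC coefficient, the H.264 inverse transform degenerates to adding one rounded constant to every pixel. A minimal C sketch of that reduction follows; the function names and signature are hypothetical and chosen for this example, not taken from the x264 sources.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical DC-only inverse transform for one 4x4 block: with only
 * the DC coefficient nonzero, both butterfly passes just replicate it,
 * leaving the final rounding shift. Assumes >> on a negative int is an
 * arithmetic shift, as on all common compilers. */
void add4x4_idct_dc_sketch(uint8_t *dst, int stride, int dc)
{
    int delta = (dc + 32) >> 6;   /* same rounding as the full inverse transform */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + delta);
}
```

This replaces two passes of butterflies with a single add-and-clip per pixel, which is cheap in C and even cheaper with saturating SIMD adds.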

Guillaume Poirier [Mon, 26 Jan 2009 14:28:23 +0000 (06:28 -0800)]

add AltiVec implementation of x264_mc_copy_w16_aligned

Guillaume Poirier [Fri, 23 Jan 2009 21:53:06 +0000 (13:53 -0800)]

add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8

Guillaume Poirier [Fri, 23 Jan 2009 09:11:20 +0000 (01:11 -0800)]

add AltiVec 16 <-> 32-bit conversion macros

Guillaume Poirier [Mon, 19 Jan 2009 20:29:27 +0000 (21:29 +0100)]

Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add.
Suggested by Loren.

Fiona Glaser [Mon, 19 Jan 2009 23:17:53 +0000 (15:17 -0800)]

Eliminate support for direct_8x8_inference=0
The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding.
As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
Remove some unused mc code related to sub-8x8 partitions.
Small deblocking speedup when p4x4 is used.
Also remove unused x264_nal_decode prototype from x264.h.

Brad Smith [Mon, 19 Jan 2009 13:14:53 +0000 (05:14 -0800)]

Add AltiVec and CPU count detection on OpenBSD.

Guillaume Poirier [Sun, 18 Jan 2009 21:44:14 +0000 (22:44 +0100)]

Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.

Fiona Glaser [Sat, 17 Jan 2009 20:16:37 +0000 (15:16 -0500)]

Warn if direct auto wasn't set on the first pass
And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames.
Also a small tweak to coeff_level_run asm.

Brad Smith [Sat, 17 Jan 2009 12:52:28 +0000 (12:52 +0000)]

Changes the PowerPC ppccommon.h header so it no longer checks for a particular
OS such as Linux but instead looks for HAVE_ALTIVEC_H being set.
Fixes all *BSD/PowerPC builds.
