Fiona Glaser [Thu, 12 Jun 2008 14:09:22 +0000 (08:09 -0600)]
More tweaks to me.c
Added inline MMX version of UMH's predictor difference test
Various cosmetics throughout me.c
Removed a C99-ism introduced in r878.
Fiona Glaser [Thu, 12 Jun 2008 00:23:00 +0000 (18:23 -0600)]
Fix regression in r736
r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo.
In that case intra RD isn't run, so intra chroma analysis may never have been run either: update_cache was never called for an intra block, and chroma ME is not required even at subme=7.
r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all.
Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash.
This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case.
Fiona Glaser [Fri, 6 Jun 2008 20:59:10 +0000 (14:59 -0600)]
Partially inline trellis quantization
Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation.
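The effect can be illustrated with a minimal sketch. This is not the trellis code itself: sum_abs_generic and its wrappers are hypothetical names. Once the generic routine is inlined into each wrapper, the block size is a compile-time constant, so the compiler can specialize or unroll the loop per size.

```c
#include <stdlib.h>

/* Hypothetical generic routine: `count` is an ordinary parameter here. */
static inline int sum_abs_generic(const int *coef, int count)
{
    int i, sum = 0;
    for (i = 0; i < count; i++)
        sum += abs(coef[i]);
    return sum;
}

/* Wrappers pass a literal size, so after inlining the loop bound is a
 * known constant in each wrapper (constant propagation). */
int sum_abs_4x4(const int *coef) { return sum_abs_generic(coef, 16); }
int sum_abs_8x8(const int *coef) { return sum_abs_generic(coef, 64); }
```

Whether the compiler actually inlines and unrolls depends on optimization flags; the sketch only shows the code shape that makes it possible.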
Loren Merritt [Sat, 7 Jun 2008 05:31:22 +0000 (23:31 -0600)]
many changes to which asm functions are enabled on which cpus.
with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
some ssse3 instructions didn't become useful until Penryn, so yet another flag.
disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).
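A rough sketch of how speed-hint flags like these can drive function selection. The flag names, bit values and picker functions below are assumptions for illustration, not the definitions used in the tree.

```c
/* Hypothetical capability and speed-hint bits (illustrative values). */
enum {
    CPU_MMX       = 1 << 0,
    CPU_SSE2      = 1 << 1,
    CPU_SSE2_SLOW = 1 << 2, /* sse2 present but slower than mmx */
    CPU_SSE2_FAST = 1 << 3  /* sse2 fast enough for the costlier functions */
};

/* Functions that are worthwhile on any cpu where sse2 is not slow. */
const char *pick_general(unsigned cpu)
{
    if ((cpu & CPU_SSE2) && !(cpu & CPU_SSE2_SLOW))
        return "sse2";
    if (cpu & CPU_MMX)
        return "mmx";
    return "c";
}

/* Functions that only pay off where sse2 is explicitly marked fast. */
const char *pick_fast_only(unsigned cpu)
{
    if ((cpu & CPU_SSE2) && (cpu & CPU_SSE2_FAST))
        return "sse2";
    if (cpu & CPU_MMX)
        return "mmx";
    return "c";
}
```

Keeping "slow" and "fast" as separate hints gives three tiers (slow, neutral, fast) without tying them to an unrelated capability bit such as 3dnow.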
Fiona Glaser [Thu, 5 Jun 2008 03:28:48 +0000 (21:28 -0600)]
Use a gaussian window for cplxblur
Cplxblur was originally intended to use a gaussian window, but in its current form did not. This change provides a tiny improvement to 2pass ratecontrol.
2-pass VBV support and improved VBV handling
Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempt to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding.
1-pass VBV code partially by Fiona Glaser.
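The idea of finding underflows ahead of time can be sketched with a simple buffer simulation over per-frame sizes. The function name, parameters and the order of drain/refill below are assumptions for illustration, not the code added by this change.

```c
/* Simulate a VBV-style buffer over planned frame sizes (in bits).
 * Each frame drains its size from the buffer, then the buffer refills by
 * bits_per_frame, capped at buffer_size (the cap models overflow).
 * Returns the index of the first frame that underflows, or -1 if none. */
int first_vbv_underflow(const double *frame_bits, int n,
                        double buffer_size, double initial_fill,
                        double bits_per_frame)
{
    double fill = initial_fill;
    int i;
    for (i = 0; i < n; i++) {
        fill -= frame_bits[i];
        if (fill < 0.0)
            return i;              /* frame i does not fit in the buffer */
        fill += bits_per_frame;
        if (fill > buffer_size)
            fill = buffer_size;    /* excess capacity is lost */
    }
    return -1;
}
```

A 2-pass fixer could then shrink the frames leading up to the reported index and re-run the simulation until it returns -1.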
Fix noise reduction in threaded mode.
Previously enabling noise reduction with threads had no effect.
Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode).
Loren Merritt [Mon, 5 May 2008 22:28:24 +0000 (16:28 -0600)]
don't pretend to support win64. remove all related code.
it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.
Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED"
based on a patch by Diego Biurrun
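For context, identifiers beginning with a double underscore are reserved for the implementation, so a project-level macro should avoid that prefix. A minimal sketch of such a macro follows; the exact definition in common/osdep.h may differ, and double_value is a hypothetical example function.

```c
#include <stddef.h>

/* Non-reserved name; expands to a GCC-style attribute where available. */
#if defined(__GNUC__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

/* Hypothetical callback whose second parameter is intentionally ignored. */
int double_value(int value, UNUSED void *opaque)
{
    return value * 2;
}
```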
add "SECTION_RODATA" before "SECTION .text" to set up the fakegot label used in Mach-O binaries.
This fixes compilation with --enable-pic
Requires Yasm 0.7.0 or newer
Patch by Dave Lee % davelee P com A gmail P com %