John Koleszar [Wed, 28 Mar 2012 23:41:16 +0000 (16:41 -0700)]
FTFY: support wordwrapping commit messages
It's common for commit messages to be wrapped at odd places. git-gui
is often to blame. Adds support for automatically fixing up these
messages if running ftfy --amend, and adds a new option --msg-only for
fixing only the commit message.
John Koleszar [Tue, 27 Mar 2012 23:28:41 +0000 (16:28 -0700)]
FTFY: an automated style corrector
This is a utility for applying a limited amount of style correction on
a change-by-change basis. Rather than a big-bang reformatting, this
tool attempts to only correct the style in diff hunks that you touch.
This should make the cosmetic changes small enough that we can mix them
with functional changes without destroying the diffs, and there's an
escape hatch for separating the reformatting to a second commit for
purists and cases where it hurts readability.
At this time, the script requires a clean working tree, so run it after
you've commited your changes. Run without arguments, the style
corrections will be applied and left unstaged in your working copy. It
also supports the --amend option, which will automatically amend your
HEAD with the corrected style, and --commit, which will create a new
change dependent on your HEAD that contains only the whitespace changes.
There are a number of ways this could be applied in an automated manner
if this proves to be useful, either on a project-wide or per-user
basis. This doesn't buy anything in terms of real code quality, the
intent here would be to keep formatting nits out of review comments in
favor of more meaningful ones and help people whose habitual style
doesn't match the baseline.
Requires astyle[1] 1.24 or newer.
[1]: http://astyle.sourceforge.net/
Johann [Tue, 6 Mar 2012 00:36:23 +0000 (16:36 -0800)]
include CHANGELOG in CODEC_SRCS
build/make/version.sh requires CHANGELOG to generate vpx_version.h
The file is already included when building the documentation. However,
documentation is not build if doxygen/php are not present.
This is necessary when using '--enable-install-srcs --enable-codec-srcs'
and 'make dist'
Johann [Fri, 2 Mar 2012 00:12:53 +0000 (16:12 -0800)]
Fix encoder debug setting
Propagate debug setting to the EBML struct. When writing the application
name, this allows us to strip the version code and keep the output
metadata static.
Yunqing Wang [Wed, 29 Feb 2012 13:24:53 +0000 (08:24 -0500)]
vpxenc: fix time and fps calculation in 2-pass encoding
When we do 2-pass encoding, elapsed time is accumulated through
whole 2-pass process, which gives incorrect time and fps results
for second pass. This change fixed that by resetting the time
accumulator for second pass.
Attila Nagy [Thu, 9 Feb 2012 10:37:03 +0000 (12:37 +0200)]
Packing bitstream on-the-fly with delayed context updates
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previoud frame's counters. Optimally encoder outputs partitions in
separate buffers. For frame based output, partitions are concatenated
internally.
Limitations:
- enabled just in combination with realtime-only mode
- number of encoding threads has to be equal or less than the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) just in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
dependents on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used than the effect of this change is negligible.
Scott LaVarnway [Tue, 28 Feb 2012 19:12:30 +0000 (14:12 -0500)]
Eliminated reconintra_mt.c
Reworked the code to use vp8_build_intra_predictors_mby_s,
vp8_intra_prediction_down_copy, and vp8_intra4x4_predict_d_c
functions instead. vp8_intra4x4_predict_d_c is a decoder-only
version of vp8_intra4x4_predict. Future commits will fix this
code duplication.
Yunqing Wang [Mon, 27 Feb 2012 22:38:53 +0000 (17:38 -0500)]
Only do uv intra-mode evaluation when intra mode is checked
When we encode slide-show clips, for the majority of the time,
only ZEROMV mode is checked, and all other modes are skipped.
This change delayed uv intra-mode evaluation until intra mode is
actually checked. This gave big performance gain for slide-show
video encoding (2nd pass gain: 18% to 28%). But, this change
doesn't help other types of videos.
Also, zbin_mode_boost is adjusted in mode-checking loop, which
causes bitstream mismatch before/after this change when --best
or --good with --cpu-used=0 are used.
Marco Paniconi [Wed, 22 Feb 2012 22:39:17 +0000 (14:39 -0800)]
Remove the frame rate factor for key frame size.
When temporal layers is used (i.e., number_of_layers > 1),
we don't use the frame rate boost for setting the key
frame target size. The factor was forcing the target size to be
always at its minimum (2* per_frame_bandwidth) for low frame rates
(i.e., base layer frame rate).
Generally we should modify or remove this frame rate factor;
for now we turn if off for number_of_layers > 1.
Yunqing Wang [Fri, 17 Feb 2012 14:15:08 +0000 (09:15 -0500)]
Fix incorrect use of uv eobs in intra modes
In vp8_rd_pick_inter_mode(), if total of eobs is zero, rate needs
to be adjusted since there are no non-zero coefficients for
transmission. The uv intra eobs calculated in
rd_pick_intra_mbuv_mode() need to be saved before they are
overwritten by inter-mode eobs.
Attila Nagy [Fri, 17 Feb 2012 09:50:33 +0000 (11:50 +0200)]
Update encoder mb_skip_coeff and prob_skip_false calculation
mode_info_context->mbmi.mb_skip_coeff has to always reflect the
existence or not of coeffs for a certain MB. The loopfilter needs this
info.
mb_skip_coeff is either set by the vp8_tokenize_mb or has to be set to
1 when the MB is skipped by mode selection. This has to be done
regardless of the mb_no_coeff_skip value.
prob_skip_false is needed just when mb_no_coeff_skip is 1. No need to
keep count of both skip_false and skip_true as they are complementary
(skip_true+skip_false = total_mbs)
John Koleszar [Wed, 15 Feb 2012 20:39:38 +0000 (12:39 -0800)]
vpxenc: initial implementation of multistream support
Add the ability to specify multiple output streams on the command line.
Streams are delimited by --, and most parameters inherit from previous
streams.
In this implementation, resizing streams is still not supported. It
does not make use of the new multistream support in the encoder either.
Two pass support runs all streams independently, though it's
theoretically possible that we could combine firstpass runs in the
future. The logic required for this is too tricky to do as part of this
initial implementation. This is mostly an effort to get the parameter
passing and independent streams working from the application's
perspective, and a later commit will add the rescaling and
multiresolution support.
The text of the bug is somewhat misleading as I initially read it to
imply the bug was present in v0.9.7-p1 (Cayuga), but note the text
"master", which indicates this was something subsequent. This issue
bisects back to v0.9.7-p1-84-ga99c20c, so unfortunately it was broken
during the Duclair release.
Thanks to Alexei Leonenko for investigating the root cause.
Attila Nagy [Fri, 16 Sep 2011 10:54:06 +0000 (13:54 +0300)]
Multithreaded encoder, late sync loopfilter
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
imediatly so we cannot delay the sync. Same has to be done when
internal frame is previewed.
Johann [Thu, 9 Feb 2012 20:38:31 +0000 (12:38 -0800)]
Fix variance overflow
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
John Koleszar [Tue, 7 Feb 2012 18:26:48 +0000 (10:26 -0800)]
Align internal mfqe framebuffer dimensions
MFQE postproc crashed with stream dimensions not a multiple of 16.
The buffer was memset unconditionally, so if the buffer allocation
fails we end up trying to write to NULL.
This patch traps an allocation failure with vpx_internal_error(),
and aligns the buffer dimensions to what vp8_yv12_alloc_frame_buffer()
expects.
Yunqing Wang [Thu, 2 Feb 2012 19:27:04 +0000 (14:27 -0500)]
Allow to skip highest-resolution encoding in multi-resolution encoder
Sometimes, a user doesn't have enough bandwidth to send high-resolution
(i.e. HD) video even though the camera catches HD video. This change
allowed users to skip highest-resolution encoding by setting that level's
target bit rate to 0.
To test it, modify the following line in vp8_multi_resolution_encoder.c.
unsigned int target_bitrate[NUM_ENCODERS]={1400, 500, 100};
To skip the highest-resolution level, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 500, 100};
To skip the first and second highest resolution levels, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 0, 100};
This change also fixed a small problem in mapping, which slightly helped
quality and performance.
John Koleszar [Fri, 13 Jan 2012 00:55:44 +0000 (16:55 -0800)]
RTCD: finalize removal of old RTCD system
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.
John Koleszar [Fri, 19 Aug 2011 18:06:00 +0000 (14:06 -0400)]
New RTCD implementation
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.