Paul Wilkins [Fri, 3 Jan 2014 14:14:04 +0000 (14:14 +0000)]
No arf right before real scene cut.
To reduce pulsing we now allow an arf just before forced key frames
and at the end of a clip or section (which may be stitched to
another clip or section). However, this does not make sense for
key frames arising from real scene cuts.
Change from original patch reflects other recent changes in regard
to alignment of gf/arf and kf groups.
Dmitry Kovalev [Sat, 11 Jan 2014 00:09:56 +0000 (16:09 -0800)]
Cleaning up and fixing psnr calculation code.
Introducing calc_psnr(), which calculates PSNR between two yv12 buffers.
Previously the number of samples was incorrectly calculated from
width/height instead of crop_width/crop_height; this is now fixed.
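The effect of the sample-count fix can be illustrated with a minimal sketch. It is not the calc_psnr() code from this commit: the function name plane_psnr, the list-of-rows plane layout and the 8-bit peak value are assumptions made for illustration. Only the visible crop_width x crop_height region is compared and counted, so padding in the aligned storage cannot skew the result.

```python
import math


def plane_psnr(a, b, crop_width, crop_height, peak=255.0):
    """PSNR in dB between two planes stored as lists of rows (hypothetical helper).

    Rows may be wider or taller than the visible picture because of
    alignment padding; only the cropped region is compared and counted.
    """
    sse = 0
    for y in range(crop_height):
        for x in range(crop_width):
            d = a[y][x] - b[y][x]
            sse += d * d
    # The number of samples comes from the cropped size, not the padded size.
    samples = crop_width * crop_height
    if sse == 0:
        return float("inf")  # identical planes; a real encoder would likely cap this
    return 10.0 * math.log10(peak * peak * samples / sse)
```

Using the padded width/height for the sample count would inflate the reported PSNR whenever the buffer is larger than the visible picture.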
Jingning Han [Fri, 10 Jan 2014 20:48:04 +0000 (12:48 -0800)]
Declare setup_buffer_inter in vp9_rdopt.h
This function initializes buffer pointers and first stage motion vector
prediction. It will be needed by both the regular rate-distortion
optimization loop and the non-RD mode decision. Hence move its
declaration to vp9_rdopt.h.
Jingning Han [Fri, 10 Jan 2014 02:01:30 +0000 (18:01 -0800)]
Enable skipping reference frame check in rd loop
This commit allows the encoder to compare the SAD cost associated with
the best motion vector predictor, per reference frame. If this cost for
one reference frame is more than 4 times the best SAD cost given by the
other reference frames, skip the NEARESTMV, NEARMV, ZEROMV mode check of
this reference frame.
This setting is turned on in speed 2 and above. Compression quality
change in speed 2:
derf -0.014%
yt -0.097%
hd -0.023%
stdhd 0.046%
It reduces the speed 2 runtime of test sequences:
pedestrian_area_1080p 4000 kbps 310763 ms -> 303595 ms
bluesky_1080p 6000 kbps 259852 ms -> 251920 ms
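The skip rule described in this commit can be sketched roughly as follows. This is an illustration only, not the encoder code: the function name refs_to_skip and the dictionary of per-reference SAD costs are hypothetical, and the factor of 4 is taken from the commit message.

```python
def refs_to_skip(pred_mv_sad, factor=4):
    """Return reference frames whose NEARESTMV/NEARMV/ZEROMV checks are skipped.

    pred_mv_sad maps a reference frame name to the SAD cost of its best
    motion vector predictor (hypothetical data layout).
    """
    skip = set()
    for ref, sad in pred_mv_sad.items():
        others = [s for r, s in pred_mv_sad.items() if r != ref]
        # Skip only when this frame is more than `factor` times worse
        # than the best of the other reference frames.
        if others and sad > factor * min(others):
            skip.add(ref)
    return skip
```

For example, with costs of 100, 450 and 380 for three reference frames, only the frame with cost 450 is skipped, since 450 > 4 * 100 while 380 is not.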
Marco Paniconi [Thu, 9 Jan 2014 22:17:00 +0000 (14:17 -0800)]
Keep buffer clipped to maximum in change_config.
Under a configuration change where the bitrate suddenly decreases,
the buffer level may be larger than the maximum allowed (for the first
frame to be encoded after change_config).
This change keeps it clipped to its maximum level.
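A minimal sketch of the clipping step, under stated assumptions: the function name apply_bitrate_change, the state dictionary, and the idea that the maximum buffer size is the bitrate multiplied by a buffer duration in milliseconds are all illustrative and not taken from the commit.

```python
def apply_bitrate_change(state, new_bitrate_bps, max_buffer_ms):
    """Recompute the maximum buffer size and clip the current level to it.

    Assumption: the maximum buffer size scales with the target bitrate,
    so a sudden bitrate drop can leave the old level above the new maximum.
    """
    state["maximum_buffer_size"] = new_bitrate_bps * max_buffer_ms // 1000
    # Keep the level clipped so the first frame after the change does not
    # see a buffer that is fuller than the new maximum allows.
    state["buffer_level"] = min(state["buffer_level"],
                                state["maximum_buffer_size"])
    return state
```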
Jingning Han [Thu, 9 Jan 2014 20:43:40 +0000 (12:43 -0800)]
Optimize inv 16x16 DCT with 10 non-zero coeffs - P2
This commit further optimizes SSE2 operations in the second 1-D
inverse 16x16 DCT, with (<10) non-zero coefficients. The average
runtime of this module goes down from 779 cycles -> 725 cycles.
levytamar82 [Thu, 21 Nov 2013 22:49:29 +0000 (15:49 -0700)]
SSSE3 convolution optimization
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unnecessary loads.
This optimization gives a user level gain of 2.4% for 480p input
and 1.6% for 720p.
This optimization is done only for 64-bit.
Jingning Han [Tue, 7 Jan 2014 22:35:02 +0000 (14:35 -0800)]
Optimize inv 16x16 DCT with 10 non-zero coeffs - P1
This commit is the first patch optimizing the SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
transformation. It exploits the fact that only the top-left 4x4 block contains
non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).
For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.
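Why the top-left 4x4 property saves work can be shown with a small sketch. It uses a floating-point orthonormal DCT, not the integer transform or the SSE2 code from this commit, and the names dct_basis and idct2d are hypothetical. When only the top-left 4x4 coefficients are non-zero, the row pass needs to process just 4 rows with 4 inputs each, and the column pass only sums over those 4 intermediate rows.

```python
import math

N = 16


def dct_basis(n=N):
    """Orthonormal DCT-II basis: basis[k][i] = scale(k) * cos(pi*(2i+1)*k / (2n))."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis


BASIS = dct_basis()


def idct2d(coeffs, rows=N, cols=N):
    """2-D inverse DCT reading only coeffs[0:rows][0:cols]; the rest is assumed zero."""
    # First 1-D pass (rows): coefficient rows beyond `rows` are all zero, skip them.
    tmp = [[sum(coeffs[r][c] * BASIS[c][i] for c in range(cols))
            for i in range(N)]
           for r in range(rows)]
    # Second 1-D pass (columns): only the first `rows` intermediate rows contribute.
    return [[sum(BASIS[r][j] * tmp[r][i] for r in range(rows))
             for i in range(N)]
            for j in range(N)]
```

Calling idct2d(coeffs, 4, 4) gives the same 16x16 output as idct2d(coeffs) whenever everything outside the top-left 4x4 block is zero, while doing a fraction of the multiplications.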
levytamar82 [Sun, 29 Dec 2013 08:23:50 +0000 (01:23 -0700)]
AVX2 Variance Optimization
Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32,
vp9_mse16x16 by migrating to AVX2
Some of the functions were optimized by processing 32 elements instead of 16.
Some of the functions were optimized by processing 2 loop strides of 16
elements in a single 256 bit register.
This optimization gives a user level performance gain of 2.4% - 2.7%
and a function level gain of 42%.
Jingning Han [Tue, 7 Jan 2014 17:53:38 +0000 (09:53 -0800)]
Fix an issue in motion vector prediction stage
The previous implementation stopped the motion vector prediction test when
the zero motion vector appeared for the second time. This commit fixes
it by simply skipping the second check of the zero mv and continuing
on to the next mv candidate.
It slightly improves stdhd in speed 2 by 0.06% on average. Most static
sequences are not affected. A few hard ones, like jet, ped, and riverbed
were improved by 0.1 - 0.2%.
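The difference between stopping and skipping can be sketched as below. This is an illustrative loop, not the encoder's prediction code: the function name best_mv_predictor, the tuple representation of motion vectors and the sad_of callback are assumptions.

```python
def best_mv_predictor(candidates, sad_of):
    """Pick the candidate with the lowest SAD, testing the zero vector at most once.

    candidates is a list of (row, col) motion vectors; sad_of is a
    caller-supplied cost function (both hypothetical).
    """
    zero_seen = False
    best_mv, best_sad = None, None
    for mv in candidates:
        if mv == (0, 0):
            if zero_seen:
                # Skip the repeated zero mv and keep going; breaking out of
                # the loop here would drop every later candidate.
                continue
            zero_seen = True
        sad = sad_of(mv)
        if best_sad is None or sad < best_sad:
            best_mv, best_sad = mv, sad
    return best_mv, best_sad
```

With candidates (0, 0), (0, 0), (3, -1), a loop that stops at the second zero vector never evaluates (3, -1), even if it has the lowest cost.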