Jingning Han [Thu, 23 Oct 2014 23:54:45 +0000 (16:54 -0700)]
Tile based adaptive mode search in RD loop
Make the spatially adaptive mode search in rate-distortion
optimization loop inter tile independent. Experiments suggest that
this does not significantly change the coding staticstics.
Single tile, speed 3:
pedestrian_area 1080p 1500 kbps
59192 b/f, 40.611 dB, 101689 ms
blue_sky 1080p 1500 kbps
58505 b/f, 36.347 dB, 62458 ms
mobile_cal 720p 1000 kbps
13335 b/f, 35.646 dB, 45655 ms
as compared to 4 column tiles, speed 3:
pedestrian_area 1080p 1500 kbps
59329 b/f, 40.597 dB, 101917 ms
blue_sky 1080p 1500 kbps
58712 b/f, 36.320 dB, 62693 ms
mobile_cal 720p 1000 kbps
13191 b/f, 35.485 dB, 45319 ms
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Yaowu Xu [Wed, 22 Oct 2014 18:57:09 +0000 (11:57 -0700)]
Fix a subtle issue in re-use inter_pred
The initialization of this_mode_pred does not work when the ref_frame
loop ever goes beyond LAST_FRAME. This commit fixes the subtle issue
and allows potentially expanding the loop to test GOLDEN_FRAME.
Yaowu Xu [Mon, 20 Oct 2014 21:41:20 +0000 (14:41 -0700)]
Change speed features for good quality(cpu-used=5)
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4, this commit adjust the speed features to
produce relatively resonable results to be within 3%-5% of cpu-used=4.
Jingning Han [Fri, 17 Oct 2014 15:58:28 +0000 (08:58 -0700)]
Hybrid partition search for rtc coding mode
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
levytamar82 [Thu, 2 Oct 2014 06:47:31 +0000 (23:47 -0700)]
SAD32xh and SAD64xh for AVX2
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Yunqing Wang [Thu, 16 Oct 2014 19:29:48 +0000 (12:29 -0700)]
Remove the dependency in token storing locations
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Paul Wilkins [Thu, 16 Oct 2014 15:24:25 +0000 (16:24 +0100)]
Alter adjustment of two pass GF/ARF boost with Q.
Delete gfboost_qadjust() and move Q based adjustment
into calc_frame_boost(). Also remove clamping. Making
the adjustment here means that it influences not just the
boost level but also the selection of the GF/ARF interval.
This change gives a small average gain in PSNR but
larger gains in SSIM, especially for harder std-hd set (1.5%)
Paul Wilkins [Thu, 16 Oct 2014 11:39:14 +0000 (12:39 +0100)]
Change initialization of static_scene_max_gf_interval.
This removes an unnecessary restriction that causes
a problem (noticed by AWG) when the forced key frame
interval is set to a very small value, such as 10. In this case
we were being forced to code minimal length GF groups.
Minghai Shang [Tue, 14 Oct 2014 23:25:03 +0000 (16:25 -0700)]
[spatial svc]Another workaround to avoid using prev_mi
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
This change has introduced a significant quality regression on content
with forced key frames. (e.g. the YT and yt-hd set). It is most
noticeable in static content where the kf bits dominate. Here, despite
key frames being apparently coded at the same Q, there is a drop in all
metrics of ~20% (e.g clXR and BFa0).
Jingning Han [Wed, 15 Oct 2014 19:18:48 +0000 (12:18 -0700)]
Use rate/distortion thresholds to control non-RD partition search
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.
vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
Jingning Han [Wed, 15 Oct 2014 18:37:20 +0000 (11:37 -0700)]
Replace copy_partitioning use case with choose_partitioning
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsamped pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.
James Zern [Wed, 15 Oct 2014 14:37:35 +0000 (16:37 +0200)]
fix CONFIG_SPATIAL_SVC warning
this change checks that CONFIG_SPATIAL_SVC is defined and adds a TODO to
ensure this is changed in the future as the release headers can't
depend on vpx_config.h.
vpx/vpx_encoder.h:164:5: warning: "CONFIG_SPATIAL_SVC" is not defined
[-Wundef]
Minghai Shang [Tue, 14 Oct 2014 23:25:03 +0000 (16:25 -0700)]
[spatial svc]Another workaround to avoid using prev_mi
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
Adrian Grange [Thu, 2 Oct 2014 22:26:42 +0000 (15:26 -0700)]
Move input frame scaling into the recode loop
Move the point at which input frames are scaled
into the recode loop. This will allow us to change
the coded frame size dynamically in response
to previous attempts to encode the frame at a
higher resolution.
A following patch will implement a scheme for
resizing the frame in the recode loop.