granicus.if.org Git - libvpx/log
Yaowu Xu [Sat, 23 Nov 2013 01:01:07 +0000 (17:01 -0800)]
Merge "Fix bug in extend_frame chroma extended too far"
Adrian Grange [Tue, 19 Nov 2013 22:01:44 +0000 (14:01 -0800)]
Fix decoder to handle display size correctly
The decoder ignored the display width & height
specified in the frame header.
This patch adds a control, VP9D_GET_DISPLAY_SIZE, to
allow the application to obtain the display width and
height from the frame header.
vpxdec has been modified to scale the output frame to
this size.
Should the request for the display size fail, vpxdec will
use the native width and height of the raw decoded
frame instead.
Change-Id: I25db04407426dac730263720c75a7dd6400af68a
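The fallback described in this commit can be sketched as a small decision function. This is not the vpxdec code. `choose_output_size` and its parameters are hypothetical, and the result of the display-size query is modeled as a success flag plus a two-element width/height array (the exact shape of what VP9D_GET_DISPLAY_SIZE returns is an assumption here). The extra check for positive dimensions is also an added assumption.

```c
/* Hypothetical helper: pick the size to scale a decoded frame to.
 * display_ok is nonzero when the display-size query succeeded, in which
 * case display[0] / display[1] hold the width / height from the frame
 * header. Otherwise (or if the values look unusable) fall back to the
 * native size of the raw decoded frame. */
static void choose_output_size(int display_ok, const int display[2],
                               int native_w, int native_h,
                               int *out_w, int *out_h) {
  if (display_ok && display[0] > 0 && display[1] > 0) {
    *out_w = display[0];
    *out_h = display[1];
  } else {
    *out_w = native_w;
    *out_h = native_h;
  }
}
```

For example, a 640x360 decoded frame whose header signals a 1280x720 display size would be scaled to 1280x720; if the query fails, it is written out at 640x360.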
Dmitry Kovalev [Fri, 22 Nov 2013 18:52:40 +0000 (10:52 -0800)]
Merge "Cleaning up entropy probability update in encoder."
Dmitry Kovalev [Fri, 22 Nov 2013 18:51:38 +0000 (10:51 -0800)]
Merge "Removing txfrm_block_to_raster_xy() call from extend_for_intra()."
Yunqing Wang [Fri, 22 Nov 2013 18:39:55 +0000 (10:39 -0800)]
Merge "Improve vp9_fdct4x4_sse2 (x1.2)"
Yaowu Xu [Fri, 22 Nov 2013 18:03:51 +0000 (10:03 -0800)]
Merge "Fix the cpuid macro for x86_64 non-gcc build"
Adrian Grange [Fri, 22 Nov 2013 01:19:04 +0000 (17:19 -0800)]
Fix bug in extend_frame chroma extended too far
This fixes issue 667.
In the case where the frame was an odd number of pixels
wide or high, the border was being extended by one col
or row too far.
The calculation of color plane dimensions was modified
to use those already computed at the time the frame
buffer was allocated.
Also freed the temporary scaling buffer in vpxdec to
prevent a memory leak.
Change-Id: Ied04bdcdfd77469731408c05da205db1a6f89bf5
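To make the odd-size issue concrete, here is a generic edge-replication sketch, assuming 4:2:0 subsampling. It is not libvpx's extend_frame code; `chroma_dim` and `extend_plane` are hypothetical names. The point it illustrates is that the border starts immediately after the last valid column or row, so the plane width and height passed in must be the ones computed once at allocation time (rounded up for odd luma sizes), not recomputed differently later.

```c
#include <stdint.h>
#include <string.h>

/* Rounded-up chroma dimension for a luma dimension and subsampling
 * shift ss (1 for 4:2:0): 5 luma columns -> 3 chroma columns. */
static int chroma_dim(int luma_dim, int ss) {
  return (luma_dim + ss) >> ss;
}

/* Replicate edge pixels of a w x h plane into a border of `border`
 * pixels on every side. buf points at the top-left visible pixel and the
 * allocation must already include the border around the plane. */
static void extend_plane(uint8_t *buf, int stride, int w, int h, int border) {
  int r;
  /* Left and right borders: copy the first / last visible pixel. */
  for (r = 0; r < h; ++r) {
    uint8_t *row = buf + r * stride;
    memset(row - border, row[0], border);
    memset(row + w, row[w - 1], border);
  }
  /* Top and bottom borders: copy the first / last (already widened) row. */
  for (r = 1; r <= border; ++r) {
    memcpy(buf - r * stride - border, buf - border, w + 2 * border);
    memcpy(buf + (h - 1 + r) * stride - border,
           buf + (h - 1) * stride - border, w + 2 * border);
  }
}
```

If the width passed to such a routine were one column larger than the real plane, the right border would start one column too late, which is the kind of off-by-one this commit describes for odd frame sizes.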
Jim Bankoski [Fri, 22 Nov 2013 16:16:17 +0000 (08:16 -0800)]
Merge changes Id1698a35,Idcabd0b9
* changes:
detokenization speedups
Don't write 0's to token_cache
Deb Mukherjee [Fri, 22 Nov 2013 16:06:48 +0000 (08:06 -0800)]
Merge "Refactoring of rate control - part 1"
Deb Mukherjee [Wed, 6 Nov 2013 21:13:59 +0000 (13:13 -0800)]
Refactoring of rate control - part 1
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
Dmitry Kovalev [Fri, 22 Nov 2013 03:30:58 +0000 (19:30 -0800)]
Removing txfrm_block_to_raster_xy() call from extend_for_intra().
Change-Id: I6a48d1f35ed5fe7a2c7499675b339994c9c3bdf2
Yaowu Xu [Fri, 22 Nov 2013 01:39:33 +0000 (17:39 -0800)]
Fix the cpuid macro for x86_64 non-gcc build
Change-Id: I0c44800db10db8d74c1ddfe89abecfd1c53d0f8d
Tom Finegan [Fri, 22 Nov 2013 01:56:26 +0000 (17:56 -0800)]
Merge "vpxenc: Add vpxenc.h and move/rename the global_config struct"
Jim Bankoski [Fri, 22 Nov 2013 00:55:22 +0000 (16:55 -0800)]
detokenization speedups
Removed unnecessary ifs and branches.
Change-Id: Id1698a35292659388f48926791024d1400f2cea9
Dmitry Kovalev [Fri, 22 Nov 2013 00:48:34 +0000 (16:48 -0800)]
Merge "Using num_4x4_blocks_* instead of b_{width, height}_log2."
Tom Finegan [Fri, 22 Nov 2013 00:46:40 +0000 (16:46 -0800)]
vpxenc: Add vpxenc.h and move/rename the global_config struct
- Rename the struct to VpxEncoderConfig.
- The idea behind this is to enable checking the global settings against
stream specific settings in source files other than vpxenc.c.
Change-Id: Ic736cbb714845b9466acb34671780d65b83ad1a8
Dmitry Kovalev [Fri, 22 Nov 2013 00:37:27 +0000 (16:37 -0800)]
Merge "Removing plane_block_{width, height} functions."
Dmitry Kovalev [Fri, 22 Nov 2013 00:24:22 +0000 (16:24 -0800)]
Merge "Using txfrm_block_to_raster_xy() in encoder."
Dmitry Kovalev [Thu, 21 Nov 2013 23:53:06 +0000 (15:53 -0800)]
Using num_4x4_blocks_* instead of b_{width, height}_log2.
Change-Id: I9ea3946c17b19f511565cd771037abe7db8b3ddb
Joshua Litt [Thu, 21 Nov 2013 23:06:51 +0000 (15:06 -0800)]
Merge "Removing PARAMS macro for consistency"
Frank Galligan [Thu, 21 Nov 2013 23:06:17 +0000 (15:06 -0800)]
Merge "Revert "Add 16 wide neon horz loopfilter.""
Frank Galligan [Thu, 21 Nov 2013 22:01:33 +0000 (14:01 -0800)]
Revert "Add 16 wide neon horz loopfilter."
The change caused mismatches with some test vectors on neon.
Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/
Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
Jim Bankoski [Thu, 21 Nov 2013 20:52:15 +0000 (12:52 -0800)]
Don't write 0's to token_cache
This code only updates the token_cache if the result is non-zero.
Change-Id: Idcabd0b993a926fea9c29dbec134b9c5c4859b40
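A minimal sketch of why skipping zero writes can be safe, under an explicit assumption that the commit message does not state: the cache entries that will later be read hold 0 before decoding starts. Under that assumption a write of 0 is a no-op and can be dropped. `record_token` is a hypothetical helper, not the libvpx detokenizer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. ASSUMPTION: token_cache was zero-filled before
 * this block was decoded, so storing 0 would not change anything and
 * the store (and its memory traffic) can be skipped. */
static void record_token(uint8_t *token_cache, int pos, uint8_t energy) {
  if (energy != 0)
    token_cache[pos] = energy;
}
```

If that assumption did not hold, stale non-zero values could survive, so the optimization depends on how the cache is initialized or read.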
Dmitry Kovalev [Thu, 21 Nov 2013 20:36:02 +0000 (12:36 -0800)]
Syncing update_coef_probs() implementation with decoder.
Using a for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().
Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
Abo Talib Mahfoodh [Thu, 21 Nov 2013 20:00:20 +0000 (15:00 -0500)]
Improve vp9_fdct4x4_sse2 (x1.2)
Modifications are done to reduce the total clock cycles.
Speedup: 1.2
Tested with: park_joy_420_720p50.y4m
Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
Yunqing Wang [Thu, 21 Nov 2013 19:25:55 +0000 (11:25 -0800)]
Merge "Add filter_selectively_vert_row2 to enable parallel loopfiltering"
hkuang [Thu, 21 Nov 2013 19:24:02 +0000 (11:24 -0800)]
Merge "Remove unnecessary eob checking."
Frank Galligan [Thu, 21 Nov 2013 18:29:30 +0000 (10:29 -0800)]
Merge "Add 16 wide neon horz loopfilter."
Yunqing Wang [Fri, 15 Nov 2013 19:04:09 +0000 (11:04 -0800)]
Add filter_selectively_vert_row2 to enable parallel loopfiltering
Added filter_selectively_vert_row2 to prepare for parallel
loopfiltering in the vertical direction. This change filters two rows
at a time. If two vertically adjacent 8x8 blocks use the same
type of filtering, they can be filtered as 16 pixels in parallel.
Next, we need to provide 16-pixel loopfiltering functions in C
and optimized versions for codec speedup.
Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
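The pairing idea can be sketched as follows. This is a simplified illustration, not filter_selectively_vert_row2 itself: the real function works on per-row filter masks, whereas here each 8x8 row has a single hypothetical `filter_type`, and `filter_8` / `filter_16` are stand-ins that only count calls.

```c
/* Hypothetical stand-ins for 8-pixel and 16-pixel filters; they only
 * count how often each path is taken. */
static int calls_8, calls_16;
static void filter_8(int row) { (void)row; ++calls_8; }
static void filter_16(int row) { (void)row; ++calls_16; }

/* Walk rows of 8x8 blocks two at a time. When both rows of a pair use
 * the same filter type, a single 16-pixel call covers them; otherwise
 * each row gets its own 8-pixel call. */
static void filter_rows_in_pairs(const int *filter_type, int rows) {
  int r;
  for (r = 0; r + 1 < rows; r += 2) {
    if (filter_type[r] == filter_type[r + 1]) {
      filter_16(r);
    } else {
      filter_8(r);
      filter_8(r + 1);
    }
  }
  if (r < rows) filter_8(r); /* odd leftover row */
}
```

With row types {1, 1, 2, 3, 3}, the first pair is merged into one 16-pixel call, the second pair needs two 8-pixel calls, and the leftover fifth row needs one more.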
Yunqing Wang [Thu, 21 Nov 2013 17:40:02 +0000 (09:40 -0800)]
Merge "Correct ssse3 8/16-pixel wide sub-pixel filter calculation"
Frank Galligan [Tue, 19 Nov 2013 18:33:32 +0000 (10:33 -0800)]
Add 16 wide neon horz loopfilter.
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about a 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
Tom Finegan [Thu, 21 Nov 2013 01:18:28 +0000 (17:18 -0800)]
vpxenc: Warn users about incorrect quantizer settings.
Also, clean up stylistically questionable code near my changes.
Change-Id: I92c96a274cb339b7b74174a608f94ae86aba8354
Dmitry Kovalev [Wed, 20 Nov 2013 22:43:03 +0000 (14:43 -0800)]
Merge "Removing old code."
Dmitry Kovalev [Wed, 20 Nov 2013 22:39:58 +0000 (14:39 -0800)]
Merge "Adding MV_FP_SIZE constant."
Dmitry Kovalev [Wed, 20 Nov 2013 22:39:50 +0000 (14:39 -0800)]
Merge "Using is_inter_block() and has_second_ref() functions."
Dmitry Kovalev [Wed, 20 Nov 2013 22:05:21 +0000 (14:05 -0800)]
Removing old code.
Change-Id: I67d1681c7b17661deb792c5e6a9e2014a73ff9b7
Dmitry Kovalev [Wed, 20 Nov 2013 21:58:21 +0000 (13:58 -0800)]
Using txfrm_block_to_raster_xy() in encoder.
Change-Id: Ibe847000467fe46bf8ce87d8f1ef8f2d5ad1eaf4
Yunqing Wang [Wed, 20 Nov 2013 20:52:56 +0000 (12:52 -0800)]
Correct ssse3 8/16-pixel wide sub-pixel filter calculation
Although no mismatch was reported for the 8/16-wide sub-pixel filters
in issue 661, they had similar problems that could potentially cause
mismatches. This patch fixed the calculations in HORIZx8/16
and VERTx8/16.
Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
Dmitry Kovalev [Wed, 20 Nov 2013 20:39:29 +0000 (12:39 -0800)]
Removing plane_block_{width, height} functions.
Change-Id: I29c0dfcf41a1253d5e2a0d2ff740c0c38ebaa5a2
Jim Bankoski [Wed, 20 Nov 2013 20:30:03 +0000 (12:30 -0800)]
Merge "Clean up removal of vp9_pareto8 table."
Dmitry Kovalev [Wed, 20 Nov 2013 04:25:55 +0000 (20:25 -0800)]
Using is_inter_block() and has_second_ref() functions.
Change-Id: Iadd771a33c8874f3b774923bca4da3c8fe5429ee
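Helpers like these replace repeated open-coded comparisons on reference frame indices. The sketch below is simplified and hypothetical: `ModeInfo` and the enum values are assumptions made for illustration, not copied from libvpx. It shows the kind of test each helper performs, namely whether the first and second reference slots name an inter reference frame.

```c
/* ASSUMED reference-frame numbering for this sketch: values at or
 * below INTRA_FRAME mean "no inter reference". */
enum { NONE_FRAME = -1, INTRA_FRAME = 0, LAST_FRAME = 1,
       GOLDEN_FRAME = 2, ALTREF_FRAME = 3 };

/* Hypothetical, stripped-down per-block mode info. */
typedef struct {
  int ref_frame[2];
} ModeInfo;

/* An inter block predicts from at least one reference frame. */
static int is_inter_block(const ModeInfo *mi) {
  return mi->ref_frame[0] > INTRA_FRAME;
}

/* Compound prediction uses a second reference frame. */
static int has_second_ref(const ModeInfo *mi) {
  return mi->ref_frame[1] > INTRA_FRAME;
}
```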
Dmitry Kovalev [Wed, 20 Nov 2013 04:18:01 +0000 (20:18 -0800)]
Adding MV_FP_SIZE constant.
Change-Id: I98d750ee92ff51fb714980418ea28be3b1d0f3c6
Yunqing Wang [Wed, 20 Nov 2013 20:01:39 +0000 (12:01 -0800)]
Merge "Support for extended feature flags enumeration leaf in CPUID instruction"
hkuang [Wed, 20 Nov 2013 19:22:00 +0000 (11:22 -0800)]
Remove unnecessary eob checking.
Change-Id: Ia568f70bddc1a2b62141a0197459119ca74c22b5
Jim Bankoski [Wed, 20 Nov 2013 19:34:30 +0000 (11:34 -0800)]
Merge "remove the model and copy in pack_mb_tokens"
Jim Bankoski [Wed, 20 Nov 2013 19:17:26 +0000 (11:17 -0800)]
Clean up removal of vp9_pareto8 table.
Change-Id: I5556e8d1fc150be8a3e93af21900829b59a500dc
Erik Niemeyer [Wed, 20 Nov 2013 04:11:57 +0000 (21:11 -0700)]
Support for extended feature flags enumeration leaf in CPUID instruction
This CL fixes an oversight in the previously merged AVX2 support CL
(Change-Id: Idc03f3fca4bf2d0afd33631ea1d3caf8fc34ec29) that
prevented runtime execution of AVX2 code in WebM.
Background:
Starting with the Sandybridge processor, the CPUID instruction was
enhanced to add various extended feature flag enumeration leaves.
Reading these leaves requires an additional input value for the CPUID
instruction which is stored in ECX. This change adds this second input
value for all ARCH_X86 and ARCH_X86_64 targets to the CPUID macros,
allowing checks of EBX bit 5 for AVX2 support. This capability will be
required moving forward to check for future processor features.
Change-Id: Ie9d872bc9ff68dad4b6578e4544e4dfd0ae26c36
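The leaf-7 query described above can be sketched with the `<cpuid.h>` helpers that GCC and Clang provide (`__get_cpuid_max` and `__cpuid_count`). This does not use libvpx's own cpuid macro, whose definition is not shown here; `cpu_reports_avx2` and `ebx_has_avx2` are hypothetical names. Note that this only reads the CPUID feature bit.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* CPUID leaf 7, sub-leaf 0 (ECX = 0) reports AVX2 in EBX bit 5. */
static int ebx_has_avx2(unsigned int ebx) {
  return (int)((ebx >> 5) & 1u);
}

/* Returns 1 if the CPU advertises AVX2, 0 otherwise (and always 0 on
 * non-x86 or non-GCC-compatible builds). Caveat: before running AVX2
 * code, production code must also confirm that the OS saves YMM state
 * (OSXSAVE plus XGETBV); that check is omitted from this sketch. */
static int cpu_reports_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_max(0, 0) < 7) return 0; /* leaf 7 not supported */
  __cpuid_count(7, 0, eax, ebx, ecx, edx); /* ECX = 0 selects sub-leaf 0 */
  (void)eax; (void)ecx; (void)edx;
  return ebx_has_avx2(ebx);
#else
  return 0;
#endif
}
```

Without the second input in ECX, leaf 7 would be queried with whatever value happened to be in that register, which is why the macro needs the extra operand.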
Jingning Han [Wed, 20 Nov 2013 18:55:27 +0000 (10:55 -0800)]
Merge "Take out assertion from inverse transforms"
Jim Bankoski [Wed, 20 Nov 2013 18:06:04 +0000 (10:06 -0800)]
remove the model and copy in pack_mb_tokens
Change-Id: I00a5203c8ed76c184d936fccf93d76e7c06773d3
Yunqing Wang [Wed, 20 Nov 2013 17:42:44 +0000 (09:42 -0800)]
Fix stack pointer in sub-pixel filters
In commit 3d50da5397d20abc932d81453b26cde758293a40, the stack
pointer was modified while aligning the stack, and it needed to
be restored (popped) at the end.
Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
Guillaume Martres [Wed, 20 Nov 2013 16:13:28 +0000 (08:13 -0800)]
Merge "vpxenc: add --aq-mode flag to control adaptive quantization"
Dmitry Kovalev [Wed, 20 Nov 2013 03:49:56 +0000 (19:49 -0800)]
Cleaning up entropy probability update in encoder.
Change-Id: I94cb9e3d910dff74bf90906dd96e3a4e06ebdbe6
Marco Paniconi [Wed, 20 Nov 2013 01:10:57 +0000 (17:10 -0800)]
Undo the vp8 change in "Reduce loop filter in..."
Patch in https://gerrit.chromium.org/gerrit/#/c/41176/
was merged into repository by mistake.
Change-Id: I235c71af26bb2d72698c8aac2301e5a7e9c5f960
Jim Bankoski [Wed, 20 Nov 2013 00:22:48 +0000 (16:22 -0800)]
Merge "scan order table lookup same for encoder and decoder"
Yunqing Wang [Wed, 20 Nov 2013 00:19:32 +0000 (16:19 -0800)]
Merge "Fix decoder mismatch with ssse3 enabled"
Jingning Han [Wed, 20 Nov 2013 00:19:04 +0000 (16:19 -0800)]
Merge "Use restore_dst_buf in handle_inter_mode"
Dmitry Kovalev [Wed, 20 Nov 2013 00:08:16 +0000 (16:08 -0800)]
Merge "Cleaning up probability/cost functions."
Yaowu Xu [Wed, 20 Nov 2013 00:01:19 +0000 (16:01 -0800)]
Merge "Move vp9_setup_interp_filter() to encoder"
Jingning Han [Tue, 19 Nov 2013 23:29:22 +0000 (15:29 -0800)]
Use restore_dst_buf in handle_inter_mode
There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.
Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
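A sketch of the kind of inline helper this commit describes: it restores each plane's destination pointer and stride from saved copies. The structs, `NUM_PLANES`, and the exact parameter types are simplified assumptions for illustration, not the libvpx definitions.

```c
#include <stdint.h>

#define NUM_PLANES 3 /* hypothetical: Y, U, V */

/* Simplified stand-ins for the per-plane destination buffer state. */
typedef struct { uint8_t *buf; int stride; } Buf2D;
typedef struct { Buf2D dst; } PlaneState;
typedef struct { PlaneState plane[NUM_PLANES]; } BlockState;

/* Put back the destination pointers and strides that were saved before
 * a buffer swap or an early exit from the RD search. */
static inline void restore_dst_buf(BlockState *xd,
                                   uint8_t *const orig_dst[NUM_PLANES],
                                   const int orig_dst_stride[NUM_PLANES]) {
  int i;
  for (i = 0; i < NUM_PLANES; ++i) {
    xd->plane[i].dst.buf = orig_dst[i];
    xd->plane[i].dst.stride = orig_dst_stride[i];
  }
}
```

Each early-return path can then call the helper once instead of repeating a per-plane loop.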
Jim Bankoski [Tue, 19 Nov 2013 23:31:43 +0000 (15:31 -0800)]
scan order table lookup same for encoder and decoder
Change-Id: I473947b5ca70b7a81151926284bff86f8555492a
Tom Finegan [Tue, 19 Nov 2013 23:30:41 +0000 (15:30 -0800)]
Merge "vpxdec: Relocate WebM input support."
Yunqing Wang [Tue, 19 Nov 2013 22:29:25 +0000 (14:29 -0800)]
Fix decoder mismatch with ssse3 enabled
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these products had to
be arranged carefully to prevent incorrect overflow.
Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
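Why the summation order matters with 16-bit saturating adds can be shown in scalar C. The four partial sums below are hypothetical values chosen to trigger saturation, and the "outer terms first, then the smaller middle term, then the larger" order is one illustrative strategy. It is not necessarily the exact ordering used in this patch, and it does not prevent saturation for every possible input.

```c
#include <stdint.h>

/* Emulate one lane of a 16-bit saturating add (like SSE2 paddsw). */
static int16_t adds16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

/* x[0..3]: partial sums of tap pairs, ordered outer, middle, middle, outer. */
static int16_t sum_in_order(const int16_t x[4]) {
  int16_t s = adds16(x[0], x[1]);
  s = adds16(s, x[2]); /* can clip here even if the final sum would fit */
  return adds16(s, x[3]);
}

/* Add the (typically negative) outer terms first, then the smaller and
 * finally the larger of the two middle terms. */
static int16_t sum_reordered(const int16_t x[4]) {
  int16_t lo = x[1] < x[2] ? x[1] : x[2];
  int16_t hi = x[1] < x[2] ? x[2] : x[1];
  int16_t s = adds16(x[0], x[3]);
  s = adds16(s, lo);
  return adds16(s, hi);
}
```

With x = {-1000, 30000, 10000, -8000}, the true sum is 31000. Adding left to right clips at 32767 after the third term and ends at 24767, while the reordered sum reaches 31000. A SIMD path and a plain C path that disagree in this way produce exactly the kind of decoder mismatch described above.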
Dmitry Kovalev [Tue, 19 Nov 2013 23:09:01 +0000 (15:09 -0800)]
Merge "Simplifying partition context calculation."
Dmitry Kovalev [Tue, 19 Nov 2013 23:05:46 +0000 (15:05 -0800)]
Merge "Calculating dst pointer only once per transform block."
Dmitry Kovalev [Tue, 19 Nov 2013 22:59:12 +0000 (14:59 -0800)]
Cleaning up probability/cost functions.
Change-Id: Ifad4b0e6355ce49fcc6f470becc080e8069452ee
Jim Bankoski [Tue, 19 Nov 2013 22:58:44 +0000 (14:58 -0800)]
Merge "entropy code speedup"
Yaowu Xu [Tue, 19 Nov 2013 22:57:58 +0000 (14:57 -0800)]
Move vp9_setup_interp_filter() to encoder
It is used only in the encoder.
Change-Id: I5f2a8abbe72bb18cbf6ce36a3dc7e132aeae8ec2
Jim Bankoski [Tue, 19 Nov 2013 22:42:32 +0000 (14:42 -0800)]
Merge "Reduce loop filter in cyclic refresh."
Yaowu Xu [Tue, 19 Nov 2013 22:41:33 +0000 (14:41 -0800)]
Merge "Move vp9_sadmxn.h from common to encoder"
Jim Bankoski [Tue, 19 Nov 2013 22:31:38 +0000 (14:31 -0800)]
entropy code speedup
Change-Id: Ic316d3374ff9a2b43897272260947d56765a0fdd
Jim Bankoski [Tue, 19 Nov 2013 20:50:48 +0000 (12:50 -0800)]
scan order / neighbors converted to lookup
Change-Id: I64b189dfeee1cf3e90134a1a93497072f3361e5e
Yaowu Xu [Tue, 19 Nov 2013 20:46:08 +0000 (12:46 -0800)]
Move vp9_sadmxn.h from common to encoder
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
Yaowu Xu [Tue, 19 Nov 2013 19:26:02 +0000 (11:26 -0800)]
Merge "Fix a bug in vpxenc reading raw input frame"
Dmitry Kovalev [Tue, 19 Nov 2013 19:17:30 +0000 (11:17 -0800)]
Simplifying partition context calculation.
Reversing the bit order of partition_context_lookup, and modifying
update_partition_context() and partition_plane_context() accordingly.
Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5
Johann [Tue, 19 Nov 2013 19:17:19 +0000 (11:17 -0800)]
Merge "Disable avx/avx2 for Visual Studio 2010"
Yunqing Wang [Tue, 19 Nov 2013 19:11:47 +0000 (11:11 -0800)]
Merge "Improve vp9_iht4x4_16_add_sse2 (x1.341)"
Yaowu Xu [Tue, 19 Nov 2013 18:17:04 +0000 (10:17 -0800)]
Fix a bug in vpxenc reading raw input frame
The bug was introduced in 00a35aab. The reading of raw YUV
input frames was off by 4 bytes.
Change-Id: I6923ea5528aa529a47a06b64adca8f94847f19a6
Tom Finegan [Mon, 18 Nov 2013 22:39:51 +0000 (14:39 -0800)]
vpxdec: Relocate WebM input support.
- Move it to webmdec.c and webmdec.h.
- Also, tidy up obvious style nits in the vicinity of code I was
already touching.
Change-Id: Ie2898d06e73c1e9030d9c8d465b73ee7edc3c02a
Joshua Litt [Tue, 19 Nov 2013 01:07:55 +0000 (17:07 -0800)]
Removing PARAMS macro for consistency
Change-Id: I23ed873a6c47b15491a2ffbcdd4f0fdeef1207a0
Dmitry Kovalev [Tue, 19 Nov 2013 03:00:49 +0000 (19:00 -0800)]
Removing raster_block_offset_uint8() function.
There is no need to use that function; it is much clearer to pass the
offset directly to the buffer.
Change-Id: I9026cb0c5094c46f97df5d7f7daeb952f2843b24
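Passing the offset directly means computing a transform block's position once and indexing the buffer with it. The sketch below reconstructs that arithmetic under stated assumptions: sizes are log2 in 4x4 units, transform blocks are numbered in raster order, and each transform block advances the block index by `1 << (2 * tx_size)`. `block_to_raster_xy` and `block_offset` are hypothetical names, not the libvpx functions.

```c
/* ASSUMPTIONS: bwl = log2(plane block width / 4); tx_size = 0 (4x4),
 * 1 (8x8), 2 (16x16), 3 (32x32); `block` counts 4x4 units, so each
 * transform block spans 1 << (2 * tx_size) indices. */
static void block_to_raster_xy(int bwl, int tx_size, int block,
                               int *x, int *y) {
  const int tx_cols_log2 = bwl - tx_size;
  const int tx_cols = 1 << tx_cols_log2;
  const int raster_mb = block >> (tx_size << 1);  /* transform block index */
  *x = (raster_mb & (tx_cols - 1)) << tx_size;    /* column, 4x4 units */
  *y = (raster_mb >> tx_cols_log2) << tx_size;    /* row, 4x4 units */
}

/* Byte offset of the transform block in a buffer with the given stride,
 * computed once so the caller can use buf + offset directly. */
static int block_offset(int bwl, int tx_size, int block, int stride) {
  int x, y;
  block_to_raster_xy(bwl, tx_size, block, &x, &y);
  return 4 * y * stride + 4 * x;
}
```

For a 16-pixel-wide block (bwl = 2) with 8x8 transforms and a 64-byte stride, block index 4 is the second transform block in the top row (offset 8), and index 8 starts the second row (offset 512).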
Dmitry Kovalev [Tue, 19 Nov 2013 02:43:16 +0000 (18:43 -0800)]
Merge "Finally removing txfrm_block_to_raster_block() function."
Dmitry Kovalev [Tue, 19 Nov 2013 02:37:53 +0000 (18:37 -0800)]
Calculating dst pointer only once per transform block.
Change-Id: I23fea0a2e85be8373600e3e2dae98d36acde389c
Dmitry Kovalev [Tue, 19 Nov 2013 02:04:56 +0000 (18:04 -0800)]
Merge "Cleaning up vp9_entropy.c file."
Abo Talib Mahfoodh [Tue, 19 Nov 2013 01:51:20 +0000 (20:51 -0500)]
Improve vp9_iht4x4_16_add_sse2 (x1.341)
This rebase is a better implementation of the previous ones.
Modifications are done to reduce the total clock cycles.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m
Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
Dmitry Kovalev [Tue, 19 Nov 2013 01:18:14 +0000 (17:18 -0800)]
Cleaning up vp9_entropy.c file.
Change-Id: I568f5e2d4ef2f2affe013ba1691ffb546f1fe8c6
Joshua Litt [Fri, 15 Nov 2013 20:29:26 +0000 (12:29 -0800)]
Decoder performance test added to unit tests
Change-Id: Id578a5fe2039631cefd82dc2ef98cc62683194c3
Tom Finegan [Tue, 19 Nov 2013 00:23:20 +0000 (16:23 -0800)]
Merge "vpxdec: Include frame number when decode fails."
Tom Finegan [Mon, 18 Nov 2013 23:50:58 +0000 (15:50 -0800)]
vpxdec: Include frame number when decode fails.
Change-Id: I6ea460af884d522319735e4416a2dd66c2f35d27
Yaowu Xu [Mon, 18 Nov 2013 23:43:41 +0000 (15:43 -0800)]
Merge "Fixed a bug in commit a4a5a210"
Yaowu Xu [Mon, 18 Nov 2013 23:43:32 +0000 (15:43 -0800)]
Merge "Move vp9_extend.{h,c} from common to encoder"
Yaowu Xu [Mon, 18 Nov 2013 22:44:38 +0000 (14:44 -0800)]
Fixed a bug in commit a4a5a210
Commit a4a5a210 enabled lossless coding, but it incorrectly
disabled the use of skip in the encoder even when skip should be used.
This commit makes sure that skip is enabled even in lossless mode.
Change-Id: I276954f952c6ac68f17a316ebc72f09001228a08
Johann [Mon, 18 Nov 2013 21:30:19 +0000 (13:30 -0800)]
Disable avx/avx2 for Visual Studio 2010
VS2010 only supports avx. There is currently no avx code
in libvpx so don't create a special case for it.
Change-Id: Iacb10ea4762155412e04f23904b4324d01451fbd
Yaowu Xu [Mon, 18 Nov 2013 20:36:55 +0000 (12:36 -0800)]
Move vp9_extend.{h,c} from common to encoder
They are used only in the encoder. This commit also re-orders includes
for the files that include vp9_extend.h.
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
Jingning Han [Mon, 18 Nov 2013 20:35:34 +0000 (12:35 -0800)]
Merge "Constrain encoder motion search range"
Jingning Han [Sat, 16 Nov 2013 04:32:03 +0000 (20:32 -0800)]
Constrain encoder motion search range
Explicitly cap the motion search range (in units of full pixels)
at [-1023, +1023]. This is intended to control the effective
motion search range for 4K sequences.
Change-Id: I645539c70885eec0f155781f439d97d333336e88
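A minimal sketch of capping a full-pixel search window at +/-1023. `SearchLimits`, `MAX_FULL_PEL_MV`, and `constrain_search_range` are hypothetical. The sketch assumes the limits are already expressed as offsets from the current block, and it ignores the frame-border clamping a real encoder also applies.

```c
#define MAX_FULL_PEL_MV 1023 /* hypothetical name; bound taken from the commit */

/* Search window as full-pixel offsets from the current block (assumed). */
typedef struct {
  int col_min, col_max, row_min, row_max;
} SearchLimits;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Cap every edge of the window to [-1023, +1023]. */
static void constrain_search_range(SearchLimits *lim) {
  lim->col_min = clamp_int(lim->col_min, -MAX_FULL_PEL_MV, MAX_FULL_PEL_MV);
  lim->col_max = clamp_int(lim->col_max, -MAX_FULL_PEL_MV, MAX_FULL_PEL_MV);
  lim->row_min = clamp_int(lim->row_min, -MAX_FULL_PEL_MV, MAX_FULL_PEL_MV);
  lim->row_max = clamp_int(lim->row_max, -MAX_FULL_PEL_MV, MAX_FULL_PEL_MV);
}
```

On a 3840-pixel-wide 4K frame, the frame-derived window alone could span thousands of pixels, so the explicit cap is what bounds the search.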
Yunqing Wang [Mon, 18 Nov 2013 18:03:41 +0000 (10:03 -0800)]
Merge "Do horizontal loopfiltering in parallel"
Yaowu Xu [Mon, 18 Nov 2013 17:32:19 +0000 (09:32 -0800)]
Merge "Add support for VC++2013"
Jim Bankoski [Sun, 17 Nov 2013 14:58:08 +0000 (06:58 -0800)]
partition context update speedup
This removes many operations when setting the partition context.
Change-Id: I365e6f5607ece85190cb21443988816dfa510ce3
Tom Finegan [Sat, 16 Nov 2013 16:31:20 +0000 (08:31 -0800)]
vpxdec: Restore IVF support.
Refactored IVF frame reading code out into ivf_read_frame(). Forgot
to actually make the function call in read_frame().
Change-Id: Ie9f6917e70bd26d0352a761932465c60a29a1f81
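For reference, each frame in an IVF file (after the 32-byte file header) starts with a 12-byte header: a 4-byte little-endian frame size followed by an 8-byte little-endian timestamp. The parser below is a hypothetical sketch of reading that header from memory. It is not vpxdec's ivf_read_frame(), which reads from a file.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_le64(const uint8_t *p) {
  return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Parse one 12-byte IVF frame header. Returns 0 on success, or -1 if
 * fewer than 12 bytes are available. The frame payload (frame_size
 * bytes) follows immediately after the header. */
static int parse_ivf_frame_header(const uint8_t *buf, size_t len,
                                  uint32_t *frame_size, uint64_t *pts) {
  if (len < 12) return -1;
  *frame_size = read_le32(buf);
  *pts = read_le64(buf + 4);
  return 0;
}
```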
Yunqing Wang [Wed, 13 Nov 2013 00:51:15 +0000 (16:51 -0800)]
Do horizontal loopfiltering in parallel
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35