granicus.if.org Git - libx264/log

Anton Mitrofanov [Mon, 19 Feb 2018 16:53:38 +0000 (19:53 +0300)]

Fix --qpmax default value in fullhelp

Henrik Gramner [Fri, 30 Mar 2018 23:31:57 +0000 (01:31 +0200)]

x86: Correctly use v-prefix for instructions with opmasks

This was always required, but accidentally happened to work correctly
in a few cases.

Martin Storsjö [Fri, 30 Mar 2018 21:10:14 +0000 (00:10 +0300)]

configure: Only use gas-preprocessor with armasm for compiler=CL

This picks the right assembler automatically for arm and aarch64
llvm-mingw targets.

This doesn't get the right assembler for clang setups when clang
acts like MSVC and uses MSVC headers though (where it perhaps
should use armasm as before), but that's probably an even more
obscure setup.

Anton Mitrofanov [Wed, 17 Jan 2018 19:03:06 +0000 (22:03 +0300)]

Remove ARRAY_SIZE macro which is identical to ARRAY_ELEMS

Henrik Gramner [Sat, 6 Jan 2018 16:47:42 +0000 (17:47 +0100)]

x86inc: Correctly set mmreg variables

Diego Biurrun [Sun, 5 Feb 2017 08:02:49 +0000 (09:02 +0100)]

.gitignore: Ignore TAGS file

Diego Biurrun [Sun, 5 Feb 2017 08:02:51 +0000 (09:02 +0100)]

Minor configure improvements

* Drop empty addition of GPLed filters

* Replace backticks with $()

Henrik Gramner [Mon, 1 Jan 2018 14:05:48 +0000 (15:05 +0100)]

Bump dates to 2018

Henrik Gramner [Tue, 16 Jan 2018 16:43:24 +0000 (17:43 +0100)]

Merge zero buffers

Improves cache efficiency.

Anton Mitrofanov [Wed, 17 Jan 2018 15:19:44 +0000 (18:19 +0300)]

rdo: Use ALIGNED_ARRAY for stack arrays

Henrik Gramner [Mon, 15 Jan 2018 20:42:59 +0000 (21:42 +0100)]

Correctly align buffers for AVX and AVX-512

Fixes segfaults on Windows where the stack is only 16-byte aligned.
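
A minimal sketch of the general technique, not the code from this commit: when the stack is only guaranteed to be 16-byte aligned, a wider alignment can be obtained by over-allocating a buffer and rounding the pointer up. The helper name `align_up` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round p up to the next multiple of `alignment`, which must be a power of
 * two. The caller is expected to over-allocate by alignment - 1 bytes. */
static uint8_t *align_up(uint8_t *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + alignment - 1) & ~(alignment - 1);
    return p + (aligned - addr);
}
```

For a 64-byte aligned scratch area of `SIZE` bytes, one would declare `uint8_t raw[SIZE + 63];` on the stack and work through `align_up(raw, 64)`.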

Anton Mitrofanov [Sun, 24 Dec 2017 19:59:09 +0000 (22:59 +0300)]

Cosmetics

Alexandra Hájková [Sun, 21 May 2017 17:40:45 +0000 (17:40 +0000)]

ppc: Add load_deinterleave_chroma_fenc_altivec

5x speed up vs C code.

Martin Storsjö [Thu, 26 Oct 2017 10:09:46 +0000 (13:09 +0300)]

Update to the latest upstream version of gas-preprocessor

This version supports converting aarch64 assembly for MS armasm64.exe.

Henrik Gramner [Sun, 22 Oct 2017 07:59:28 +0000 (09:59 +0200)]

input: Add a workaround for swscale overread bugs

swscale can read past the end of the input buffer, which may result in
crashes if such a read crosses a page boundary into an invalid page.

Work around this by adding some padding space at the end of the buffer when
using memory-mapped input frames. This may sometimes require copying the
last frame into a new buffer on Windows since the Microsoft memory-mapping
implementation has very limited capabilities compared to POSIX systems.
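
The copy fallback described above could look roughly like the sketch below. It is an illustration only: `copy_with_padding` and its parameters are hypothetical names, and the amount of padding swscale actually needs is not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a frame into a fresh buffer with `padding` zeroed bytes after it, so
 * that a reader overshooting the end by up to `padding` bytes stays inside
 * valid memory. Returns NULL on overflow or allocation failure. */
static uint8_t *copy_with_padding(const uint8_t *frame, size_t size, size_t padding)
{
    assert(frame != NULL);
    if (size > SIZE_MAX - padding)
        return NULL;
    uint8_t *buf = malloc(size + padding);
    if (!buf)
        return NULL;
    memcpy(buf, frame, size);
    memset(buf + size, 0, padding);
    return buf;
}
```

The caller owns the returned buffer and releases it with `free()`.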

Henrik Gramner [Sun, 22 Oct 2017 08:50:46 +0000 (10:50 +0200)]

filters/resize: Upgrade to a newer libavutil API

Use the AVComponentDescriptor depth field instead of depth_minus1.

Martin Storsjö [Wed, 18 Oct 2017 07:40:02 +0000 (10:40 +0300)]

aarch64: Use ldurb/sturb for loads/stores with negative offsets

The assembler (both gas and clang/llvm) automatically fixes this,
armasm64 doesn't. We can fix it in gas-preprocessor, but we should
also be using the right instruction form.

Martin Storsjö [Mon, 16 Oct 2017 19:50:27 +0000 (22:50 +0300)]

configure: Add support for building with MSVC/armasm for ARM64

Martin Storsjö [Mon, 16 Oct 2017 19:50:26 +0000 (22:50 +0300)]

arm: Check for __ELF__ instead of !__APPLE__, for using .arch/.fpu

For Windows, when building with armasm, we already filtered these out
with gas-preprocessor.

By filtering them out already in the source, we can also build directly
with clang for Windows (which also requires wrapping the assembler in
gas-preprocessor for converting instructions to thumb form, but
gas-preprocessor doesn't and shouldn't filter them out in the clang
configuration).

Martin Storsjö [Mon, 16 Oct 2017 19:50:25 +0000 (22:50 +0300)]

aarch64: Don't .set a symbol named st2

This confuses gas-preprocessor, which tries to replace actual
st2 instructions by the integer 1 or 2.

Henrik Gramner [Sat, 14 Oct 2017 12:11:26 +0000 (14:11 +0200)]

Shrink the i4x4_mode cost_table array

Only 17 elements are actually used. It was originally padded to 64 bytes to
avoid cache line splits in the x86 assembly, but those haven't really been
an issue on x86 CPUs made in the past decade or so.

Benchmarking shows no performance impact from dropping the padding, so
might as well remove it and save some cache.

Henrik Gramner [Wed, 11 Oct 2017 16:02:26 +0000 (18:02 +0200)]

x86: Remove some legacy CPU detection hacks

Some ancient Pentium-M and Core 1 CPUs had slow SSE units, and using MMX
was preferable. Nowadays many assembly functions in x264 completely lack MMX
implementations and falling back to C code will likely make things worse.

Some misconfigured virtualized systems could sometimes also trigger this code
path and cause assertions.

Henrik Gramner [Wed, 11 Oct 2017 15:58:36 +0000 (17:58 +0200)]

lavf: Upgrade to the new core decoding API

Vittorio Giovara [Mon, 9 Oct 2017 16:04:22 +0000 (12:04 -0400)]

lavf: Upgrade to some newer APIs

* Use the codec parameters API instead of the AVStream codec field.
* Use av_packet_unref() instead of av_free_packet().
* Use the AVFrame pts field instead of pkt_pts.

Henrik Gramner [Sun, 8 Oct 2017 19:41:16 +0000 (21:41 +0200)]

x86: AVX-512 load_deinterleave_chroma_fdec

Henrik Gramner [Sun, 8 Oct 2017 19:23:12 +0000 (21:23 +0200)]

x86: AVX-512 load_deinterleave_chroma_fenc

Henrik Gramner [Sat, 7 Oct 2017 10:06:51 +0000 (12:06 +0200)]

x86: AVX-512 mbtree_fix8_pack and mbtree_fix8_unpack

Takes advantage of opmasks to avoid having to use scalar code for the tail.

Also make some slight improvements to the checkasm test.

Henrik Gramner [Sat, 7 Oct 2017 09:34:16 +0000 (11:34 +0200)]

x86: Faster mbtree_fix8_unpack

Use a different multiplier in order to eliminate some shifts.

About 25% faster than before.

Anton Mitrofanov [Fri, 22 Sep 2017 14:28:18 +0000 (17:28 +0300)]

Don't force fast-intra for subme < 3

It caused a significant quality hit without any meaningful (if any) speedup.

Anton Mitrofanov [Fri, 22 Sep 2017 14:18:55 +0000 (17:18 +0300)]

Make ref and i4x4_mode costs global instead of static

Fixes some thread safety doubts and makes code cleaner.
Downside: slightly higher memory usage when calling multiple encoders from the same application.

Anton Mitrofanov [Fri, 22 Sep 2017 14:05:06 +0000 (17:05 +0300)]

Fix thread safety of x264_threading_init() and use of X264_PTHREAD_MUTEX_INITIALIZER with win32thread

Anton Mitrofanov [Fri, 22 Sep 2017 13:59:13 +0000 (16:59 +0300)]

configure: Improvements

Log result of pkg-config checks to config.log.
Fix lavf support detection for pkg-config fallback case.
Fix detection of linking dependencies errors for lavf/lsmash/gpac.
Cosmetics.

Anton Mitrofanov [Thu, 17 Aug 2017 20:51:14 +0000 (23:51 +0300)]

flv: Fix one frame video total duration

Anton Mitrofanov [Thu, 17 Aug 2017 20:46:23 +0000 (23:46 +0300)]

flv: Split FrameType and CodecID values

Vittorio Giovara [Tue, 8 Aug 2017 13:40:45 +0000 (15:40 +0200)]

Support writing the alternative transfer SEI message

Vittorio Giovara [Tue, 8 Aug 2017 12:56:43 +0000 (14:56 +0200)]

Support 04/2017 color matrix and transfer values

Vittorio Giovara [Fri, 6 Jan 2017 14:23:38 +0000 (15:23 +0100)]

Unify 8-bit and 10-bit CLI and libraries

Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If an application
relies on this symbol, this will make it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.

The depth and cache CLI filters heavily depend on bit depth size, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.

Unfortunately the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead duplicate
the module and select the appropriate one at run time.

Each bitdepth needs different checkasm compilation rules, so split the main
checkasm target into two executables.

Vittorio Giovara [Fri, 6 Jan 2017 16:50:40 +0000 (17:50 +0100)]

Change default QP parameters initialization

qp is modified to require a valid value before use, while qp_max is set
to the maximum allowable value (and clipped later on).

This is needed so that param functions do not depend on bit depth size.

Vittorio Giovara [Tue, 17 Jan 2017 16:07:42 +0000 (17:07 +0100)]

aarch64: Set the function symbol prefix in a single location

Vittorio Giovara [Tue, 17 Jan 2017 16:04:19 +0000 (17:04 +0100)]

arm: Set the function symbol prefix in a single location

Vittorio Giovara [Fri, 27 Jan 2017 10:58:33 +0000 (11:58 +0100)]

Drop the x264 prefix from static functions and variables

Anton Mitrofanov [Thu, 17 Aug 2017 20:25:31 +0000 (23:25 +0300)]

configure: Check for strtok_r compiler support

Henrik Gramner [Sun, 6 Aug 2017 15:17:55 +0000 (17:17 +0200)]

cabac: Make the cabac_contexts array static

Also drop the x264 prefix from all static cabac arrays.

Henrik Gramner [Thu, 17 Aug 2017 16:04:13 +0000 (18:04 +0200)]

x86: AVX-512 pixel_satd_x3 and pixel_satd_x4

Henrik Gramner [Mon, 14 Aug 2017 21:13:44 +0000 (23:13 +0200)]

x86: Shrink the x86-64 cabac coeff_last tables

Use dword instead of qword entries. Cuts the size of the tables in half
which allows each table to fit inside a single cache line.

When PIC is disabled dwords are enough to store absolute addresses.

When PIC is enabled we can store dword offsets relative to the start of
the table and simply add the address of the table to the offset in order
to calculate the full address. This approach also has the advantage of
eliminating a whole bunch of run-time .data relocations.
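
The base-plus-offset idea can be illustrated in C with a data table. This is a sketch under assumptions, not the x264 assembly: `pool`, `offsets`, and `lookup` are hypothetical names, and the entries point at strings rather than functions so that the example stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical targets packed into one object. */
static const char pool[] = "dc\0ac\0luma4x4\0luma8x8";

enum { N_ENTRIES = 4 };

/* 32-bit offsets relative to the start of `pool`. Unlike a table of
 * pointers, this needs no load-time relocations and uses 4 bytes per
 * entry instead of 8 on a 64-bit target. */
static const uint32_t offsets[N_ENTRIES] = { 0, 3, 6, 14 };

/* Full address = base address of the pool + stored offset. */
static const char *lookup(int idx)
{
    assert(idx >= 0 && idx < N_ENTRIES);
    return pool + offsets[idx];
}
```

Here `lookup(3)` yields a pointer to `"luma8x8"`, and `sizeof(offsets)` is 16 bytes, where four 64-bit pointers would take 32.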

Henrik Gramner [Wed, 16 Aug 2017 13:59:16 +0000 (15:59 +0200)]

x86inc: Support creating global symbols from local labels

On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.

Henrik Gramner [Tue, 15 Aug 2017 14:11:32 +0000 (16:11 +0200)]

x86inc: Use .rdata instead of .rodata on Windows

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.

Henrik Gramner [Fri, 4 Aug 2017 22:43:26 +0000 (00:43 +0200)]

x86inc: Set the correct cpuflag for AES-NI instructions

Henrik Gramner [Fri, 4 Aug 2017 22:09:52 +0000 (00:09 +0200)]

x86inc: Enable AVX emulation for floating-point pseudo-instructions

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.

Henrik Gramner [Fri, 4 Aug 2017 21:09:00 +0000 (23:09 +0200)]

configure: Increase x86 stack alignment on clang

Anton Mitrofanov [Sun, 22 Oct 2017 17:18:39 +0000 (20:18 +0300)]

x86: Fix stack alignment for x264_cabac_encode_ue_bypass call

Fix MSVS fprofiled build for win64

Anton Mitrofanov [Sun, 22 Oct 2017 13:18:29 +0000 (16:18 +0300)]

mips: Fix incorrect pointers to msa optimized functions

Henrik Gramner [Fri, 11 Aug 2017 14:41:31 +0000 (16:41 +0200)]

Fix cpu capabilities listing on older x86 operating systems

Some cpuflags would previously be displayed incorrectly when running older
operating systems without AVX support on modern CPUs.

Henrik Gramner [Sat, 24 Jun 2017 13:12:57 +0000 (15:12 +0200)]

x86: AVX-512 pixel_avg_weight_w8

Henrik Gramner [Sat, 24 Jun 2017 12:26:25 +0000 (14:26 +0200)]

x86: AVX-512 pixel_avg_weight_w16

Henrik Gramner [Thu, 22 Jun 2017 17:51:28 +0000 (19:51 +0200)]

x86: AVX-512 sub8x16_dct_dc

Henrik Gramner [Thu, 22 Jun 2017 09:26:21 +0000 (11:26 +0200)]

x86: AVX-512 sub8x8_dct_dc

Henrik Gramner [Thu, 1 Jun 2017 20:13:19 +0000 (22:13 +0200)]

x86: AVX-512 add8x8_idct

Henrik Gramner [Sat, 10 Jun 2017 14:01:53 +0000 (16:01 +0200)]

x86: AVX-512 sub16x16_dct

Henrik Gramner [Wed, 7 Jun 2017 14:55:48 +0000 (16:55 +0200)]

x86: AVX-512 sub8x8_dct

Henrik Gramner [Thu, 8 Jun 2017 19:14:08 +0000 (21:14 +0200)]

x86: AVX-512 sub4x4_dct

Henrik Gramner [Sun, 28 May 2017 14:12:33 +0000 (16:12 +0200)]

x86: AVX-512 mbtree_propagate_list

Uses gathers and scatters in combination with conflict detections to
vectorize the scalar part.

Also improve the checkasm test to try different mb_y values and check
for out-of-bounds writes.

James Darnley [Fri, 9 Jun 2017 12:08:16 +0000 (14:08 +0200)]

x86inc: Add aesni cpuflag define

Upstreaming this from FFmpeg. Unused in x264.

Martin Storsjö [Mon, 29 May 2017 09:13:03 +0000 (12:13 +0300)]

aarch64: Update the var2 functions to the new signature

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

var2_8x8_c:      4110
var2_8x8_neon:   1505
var2_8x16_c:     8019
var2_8x16_neon:  2545

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:   1205
var2_8x16_neon:  2327

Martin Storsjö [Mon, 29 May 2017 09:13:02 +0000 (12:13 +0300)]

arm: Update the var2 functions to the new signature

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

             Cortex A7     A8     A9   A53
var2_8x8_c:       7302   5342   5050  4400
var2_8x8_neon:    2645   1612   1932  1715
var2_8x16_c:     14300  10528  10020  8637
var2_8x16_neon:   5127   2695   3217  2651

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:    2312   1190   1389  1300
var2_8x16_neon:   4862   2130   2293  2422

Henrik Gramner [Wed, 15 Feb 2017 21:00:25 +0000 (22:00 +0100)]

Add support for levels 6, 6.1, and 6.2

These levels were added in the 2016-10 revision of the H.264 specification and
improve support for content with high resolutions and/or high frame rates.

Level 6.2 supports 8K resolution at 120 fps.

Also shrink the x264_levels array by using smaller data types.
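
The 8K at 120 fps claim can be checked with a little arithmetic. The sketch below assumes "8K" means 8192x4320 and uses the MaxMBPS and MaxFS limits for levels 6 to 6.2 as recalled from Table A-1 of the H.264 specification; they should be verified against the spec. The struct and function names are hypothetical, and only frame size and macroblock rate are checked (not DPB size or bitrate).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t max_mbps; /* macroblocks per second */
    uint32_t max_fs;   /* macroblocks per frame */
} level_limits;

/* Assumed limits for the three new levels. */
static const level_limits level_6_x[3] = {
    { "6",    4177920, 139264 },
    { "6.1",  8355840, 139264 },
    { "6.2", 16711680, 139264 },
};

/* Nonzero if a width x height stream at `fps` fits the level's frame-size
 * and macroblock-rate limits. Macroblocks are 16x16 pixels. */
static int level_allows(const level_limits *lvl, uint32_t width,
                        uint32_t height, uint32_t fps)
{
    assert(lvl && width > 0 && height > 0);
    uint64_t mbs = (uint64_t)((width + 15) / 16) * ((height + 15) / 16);
    return mbs <= lvl->max_fs && mbs * fps <= lvl->max_mbps;
}
```

An 8192x4320 frame has 512 * 270 = 138240 macroblocks; at 120 fps that is 16588800 macroblocks per second, which fits under the level 6.2 limit but not under the level 6.1 limit.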

Henrik Gramner [Thu, 23 Mar 2017 16:51:09 +0000 (17:51 +0100)]

Use a larger integer type for the slice_table array

Makes it possible to use slicing with resolutions larger than 2^24 pixels.
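
The 2^24 figure follows from the macroblock size if one assumes that the table previously used 16-bit entries and that each entry must be able to hold a macroblock index; both are assumptions, since the commit message does not spell them out. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 macroblocks needed to cover a frame. */
static uint32_t mb_count(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Nonzero if every macroblock index 0..n-1 fits in a 16-bit entry. */
static int index_fits_u16(uint32_t n)
{
    return n <= (uint32_t)UINT16_MAX + 1;
}
```

A 4096x4096 frame is exactly 2^24 pixels and 2^16 macroblocks, so its indices still fit in 16 bits; anything larger, such as 8192x4320 with 138240 macroblocks, does not.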

Henrik Gramner [Sun, 19 Feb 2017 09:48:33 +0000 (10:48 +0100)]

analyse: Reduce the size of the cost_mv arrays

Use a dynamic size depending on the MV range. Reduces memory consumption by
up to a few megabytes.

Drop a related old miscompilation check since it may otherwise cause an
out-of-bounds memory access.

Also remove an unused extern variable declaration.

Anton Mitrofanov [Tue, 30 May 2017 23:52:16 +0000 (02:52 +0300)]

Fix CABAC+8x8dct in 4:4:4

Use the correct ctxIdxInc calculation for coded_block_flag.

Anton Mitrofanov [Mon, 5 Jun 2017 23:07:21 +0000 (02:07 +0300)]

Fix 8x8dct in lossless encoding

Change V and H intra prediction in lossless (TransformBypassModeFlag == 1)
macroblocks to correctly adhere to the specification. Affects lossless
encoding with 8x8dct or mix of lossless with normal macroblocks.

8x8dct has already been disabled in lossless mode for some time due to
being out-of-spec, but this will allow us to re-enable it.

Anton Mitrofanov [Thu, 8 Jun 2017 15:35:21 +0000 (18:35 +0300)]

mbtree: Fix buffer overflow

Could occur on the 1st pass in combination with --fake-interlaced and
some input heights due to allocating a too small buffer.

Henrik Gramner [Tue, 23 May 2017 14:40:26 +0000 (16:40 +0200)]

x86: Avoid self-relative expressions on macho64

Functions that use self-relative expressions in the form of [foo-$$]
appear to cause issues on 64-bit Mach-O systems when assembled with nasm.
Temporarily disable those functions on macho64 until we've figured out
the root cause.

Anton Mitrofanov [Mon, 22 May 2017 20:59:32 +0000 (23:59 +0300)]

configure: Don't try to detect clang by $CC

Only check if option -Werror=unknown-warning-option is supported before adding it.

Martin Storsjö [Mon, 22 May 2017 10:10:46 +0000 (13:10 +0300)]

checkasm: Use the right variable in a loop condition

Prior to this, the loop never ran at all. The condition has been
the same since it was introduced in 5b0cb86f.

This issue was pointed out by a clang warning.

Anton Mitrofanov [Mon, 22 May 2017 19:02:34 +0000 (22:02 +0300)]

x86: Fix linking with 8-bit depth shared libx264

Henrik Gramner [Sun, 14 May 2017 22:18:36 +0000 (00:18 +0200)]

x86: Only enable AVX-512 in 8-bit mode

Henrik Gramner [Thu, 11 May 2017 22:43:43 +0000 (00:43 +0200)]

x86: AVX-512 cabac_block_residual

Henrik Gramner [Wed, 10 May 2017 16:36:59 +0000 (18:36 +0200)]

x86: AVX-512 pixel_sad_x3 and pixel_sad_x4

Covers all variants: 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, and 16x16.

Henrik Gramner [Sun, 7 May 2017 21:35:49 +0000 (23:35 +0200)]

x86: AVX-512 pixel_sad

Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.

Henrik Gramner [Thu, 4 May 2017 19:53:28 +0000 (21:53 +0200)]

x86: AVX-512 decimate_score

Also drop the MMX versions and improve the SSE2, SSSE3 and AVX2 versions.

Henrik Gramner [Mon, 1 May 2017 12:55:45 +0000 (14:55 +0200)]

x86: AVX-512 pixel_var2_8x8 and 8x16

Henrik Gramner [Mon, 1 May 2017 12:54:32 +0000 (14:54 +0200)]

Rework pixel_var2

The functions are only ever called with pointers to fenc and fdec and the
strides are always constant so there's no point in having them as parameters.

Cover both the U and V planes in a single function call. This is more
efficient with SIMD, especially with the wider vectors provided by AVX2 and
AVX-512, even when accounting for losing the possibility of early termination.

Drop the MMX and XOP implementations, update the rest of the x86 assembly
to match the new behavior. Also enable high bit-depth in the AVX2 version.

Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
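
A plain C sketch of what a combined U+V function of this shape might look like for 8-bit pixels. It is an illustration under assumptions rather than the code from the commit: the name `var2_8x8`, the stride constants, and the layout (U samples at the start of each row, V samples half a stride later) are assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed strides for the encode and decode buffers. */
enum { ENC_STRIDE = 16, DEC_STRIDE = 32 };

/* Returns var2(U) + var2(V) for an 8x8 chroma block and stores the raw
 * sums of squared differences for U and V in ssd[0] and ssd[1]. */
static int var2_8x8(const uint8_t *fenc, const uint8_t *fdec, int ssd[2])
{
    assert(fenc && fdec && ssd);
    int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int du = fenc[x] - fdec[x];
            int dv = fenc[x + ENC_STRIDE / 2] - fdec[x + DEC_STRIDE / 2];
            sum_u += du;
            sum_v += dv;
            sqr_u += du * du;
            sqr_v += dv * dv;
        }
        fenc += ENC_STRIDE;
        fdec += DEC_STRIDE;
    }
    ssd[0] = sqr_u;
    ssd[1] = sqr_v;
    /* 64 pixels per plane, so dividing by the pixel count is a shift by 6. */
    return sqr_u - (sum_u * sum_u >> 6) + sqr_v - (sum_v * sum_v >> 6);
}
```

Both planes are always processed in full, which is the trade-off the message mentions: there is no early exit after the first plane, but a SIMD version can handle U and V in the same vector.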

Henrik Gramner [Sat, 29 Apr 2017 12:26:40 +0000 (14:26 +0200)]

x86: AVX-512 pixel_var_8x8, 8x16, and 16x16

Make the SSE2, AVX, and AVX2 versions a bit faster.

Drop the MMX and XOP versions.

Henrik Gramner [Fri, 28 Apr 2017 19:35:25 +0000 (21:35 +0200)]

x86: AVX-512 pixel_sa8d_8x8

Henrik Gramner [Thu, 13 Apr 2017 21:56:04 +0000 (23:56 +0200)]

x86: AVX-512 pixel_satd

Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.

Henrik Gramner [Wed, 19 Apr 2017 14:39:48 +0000 (16:39 +0200)]

x86: AVX-512 deblock_strength

Also drop the MMX version and make some slight improvements to the SSE2,
SSSE3, AVX, and AVX2 versions.

Henrik Gramner [Wed, 12 Apr 2017 14:21:09 +0000 (16:21 +0200)]

x86: AVX-512 plane_copy_deinterleave_v210

Henrik Gramner [Sun, 9 Apr 2017 18:34:28 +0000 (20:34 +0200)]

x86: AVX-512 memzero_aligned

Reorder some elements in the x264_t.mb.pic struct to reduce the amount
of padding required.

Also drop the MMX implementation in favor of SSE.

Henrik Gramner [Fri, 7 Apr 2017 19:34:40 +0000 (21:34 +0200)]

x86: AVX and AVX-512 memcpy_aligned

Reorder some elements in the x264_mb_analysis_list_t struct to reduce the
amount of padding required.

Also drop the MMX implementation in favor of SSE.

Henrik Gramner [Thu, 6 Apr 2017 14:06:34 +0000 (16:06 +0200)]

x86: AVX-512 dequant_8x8_flat16

Henrik Gramner [Tue, 4 Apr 2017 18:54:12 +0000 (20:54 +0200)]

x86: AVX-512 dequant_8x8

Henrik Gramner [Tue, 4 Apr 2017 18:01:26 +0000 (20:01 +0200)]

x86: AVX-512 dequant_4x4

Henrik Gramner [Tue, 28 Mar 2017 20:59:56 +0000 (22:59 +0200)]

x86: AVX-512 mbtree_propagate_cost

Also make the AVX and AVX2 implementations slightly faster.

Henrik Gramner [Mon, 27 Mar 2017 16:19:53 +0000 (18:19 +0200)]

x86: AVX-512 coeff_last

Henrik Gramner [Sun, 26 Mar 2017 16:29:37 +0000 (18:29 +0200)]

x86: AVX-512 zigzag_interleave_8x8_cavlc

Henrik Gramner [Sun, 26 Mar 2017 09:34:18 +0000 (11:34 +0200)]

x86: AVX-512 zigzag_scan_8x8_field

Henrik Gramner [Sat, 25 Mar 2017 21:13:22 +0000 (22:13 +0100)]

x86: AVX-512 zigzag_scan_4x4_field

Henrik Gramner [Sat, 25 Mar 2017 18:14:28 +0000 (19:14 +0100)]

x86: AVX-512 zigzag_scan_8x8_frame

The vperm* instructions ignore unused bits, so we can pack the permutation
indices together to save cache and just use a shift to get the right values.

Henrik Gramner [Sat, 25 Mar 2017 18:14:22 +0000 (19:14 +0100)]

x86: AVX-512 zigzag_scan_4x4_frame

Henrik Gramner [Thu, 11 May 2017 22:03:10 +0000 (00:03 +0200)]

checkasm: x86: More accurate ymm/zmm measurements

YMM and ZMM registers on x86 are turned off to save power when they haven't
been used for some period of time. When they are used there will be a
"warmup" period during which performance will be reduced and inconsistent
which is problematic when trying to benchmark individual functions.

Periodically issue "dummy" instructions that use those registers to
prevent them from being powered down. The end result is more consistent
benchmark results.
