granicus.if.org Git - libx264/log
libx264
9 years ago - x86: Enable high bit-depth x264_coeff_last64_avx2_lzcnt
Henrik Gramner [Sun, 11 Oct 2015 20:32:03 +0000 (22:32 +0200)]
x86: Enable high bit-depth x264_coeff_last64_avx2_lzcnt

The function existed but was never enabled.

9 years ago - x86inc: Add debug symbols indicating sizes of compiled functions
Geza Lore [Mon, 12 Oct 2015 12:13:42 +0000 (13:13 +0100)]
x86inc: Add debug symbols indicating sizes of compiled functions

Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. For example, this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.

9 years ago - x86inc: Avoid creating unnecessary local labels
Henrik Gramner [Fri, 16 Oct 2015 19:28:49 +0000 (21:28 +0200)]
x86inc: Avoid creating unnecessary local labels

The REP_RET workaround is only needed on old AMD CPUs, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such CPUs.

This patch doesn't modify any emitted instructions, and doesn't actually affect
x264 at all. It's only for other projects that use x86inc.asm without an
appropriate `strip` command in their buildsystem.

Note that when using nasm instead of yasm, EQU simply creates a local label.
This is probably a bug, but at least it doesn't break anything.

9 years ago - x86inc: Simplify AUTO_REP_RET
Henrik Gramner [Thu, 15 Oct 2015 15:42:49 +0000 (17:42 +0200)]
x86inc: Simplify AUTO_REP_RET

cpuflags is never undefined anymore; it's set to 0 instead.

Also fix an incorrect comment.

9 years ago - x86inc: Use more consistent indentation
Henrik Gramner [Mon, 12 Oct 2015 19:55:11 +0000 (21:55 +0200)]
x86inc: Use more consistent indentation

9 years ago - x86inc: Preserve arguments when allocating stack space
Henrik Gramner [Mon, 12 Oct 2015 18:15:18 +0000 (20:15 +0200)]
x86inc: Preserve arguments when allocating stack space

When allocating stack space with a larger alignment than the known stack
alignment, a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
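As a rough illustration, the logic can be modeled in C on a simulated, downward-growing stack. `alloc_aligned_stack()` is a hypothetical helper, not x86inc code; the constraint that the saved-pointer register must not be an argument register cannot be expressed in C and is only noted in a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: reserve `size` bytes aligned to `align` (a power of two)
 * on a simulated stack that grows downward. The caller's stack pointer is
 * returned through `saved_sp`, mirroring how the asm keeps the old stack
 * pointer in a spare register so it can be restored on return. In the real
 * macro that spare register must not be one that still holds an argument. */
static uintptr_t alloc_aligned_stack(uintptr_t sp, size_t size, size_t align,
                                     uintptr_t *saved_sp)
{
    *saved_sp = sp;                                  /* keep the original value */
    return (sp - size) & ~(uintptr_t)(align - 1);    /* round down to the alignment */
}
```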

9 years ago - x86inc: Improve FMA instruction handling
Henrik Gramner [Sat, 16 Jan 2016 23:25:47 +0000 (00:25 +0100)]
x86inc: Improve FMA instruction handling

 * Correctly handle FMA instructions with memory operands.
 * Print a warning if FMA instructions are used without the correct cpuflag.
 * Simplify the instantiation code.
 * Clarify documentation.

Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
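A minimal C sketch of that reordering decision follows. The `fma_operand` type and `fma3_reorder()` are hypothetical; they model only the case where the memory operand is one of the multiplicands, and assume the second multiplicand slot is the one that maps to FMA3's last operand.

```c
#include <stdbool.h>

/* Hypothetical descriptor for one operand of an FMA4-style a*b + c. */
typedef struct {
    const char *name;
    bool is_mem;    /* true if the operand is a memory reference */
} fma_operand;

/* Reorder the multiplicands so that a memory operand, if present, sits in
 * the slot that maps to FMA3's last (memory-capable) operand. Because
 * a*b == b*a, swapping the multiplicands never changes the result.
 * Returns false when the expression cannot be encoded this way, i.e. when
 * both multiplicands are memory operands. A memory addend is out of scope
 * for this sketch. */
static bool fma3_reorder(fma_operand *a, fma_operand *b)
{
    if (a->is_mem && b->is_mem)
        return false;
    if (a->is_mem) {
        fma_operand tmp = *a;
        *a = *b;
        *b = tmp;
    }
    return true;
}
```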

9 years ago - x86inc: Be more verbose in assertion failures
Henrik Gramner [Sun, 11 Oct 2015 20:31:53 +0000 (22:31 +0200)]
x86inc: Be more verbose in assertion failures

9 years ago - x86inc: Make cpuflag() and notcpuflag() return 0 or 1
Henrik Gramner [Wed, 30 Sep 2015 21:17:00 +0000 (23:17 +0200)]
x86inc: Make cpuflag() and notcpuflag() return 0 or 1

Makes it possible to use them in arithmetic expressions.
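A C analogue of why 0-or-1 results matter, with made-up flag names (`CPU_SSE2`, `CPU_AVX2`, `CPUFLAG`) standing in for x86inc's assembler-level macros:

```c
/* Hypothetical CPU feature bits, for illustration only. */
enum { CPU_SSE2 = 1 << 0, CPU_AVX = 1 << 1, CPU_AVX2 = 1 << 2 };

/* Normalise a flag test to exactly 0 or 1 so it can appear in arithmetic. */
#define CPUFLAG(flags, f)    ((((flags) & (f)) != 0))
#define NOTCPUFLAG(flags, f) (!CPUFLAG(flags, f))

/* Vector width in bytes: 16, doubled when AVX2 is available. Shifting by the
 * raw mask instead (value 4 when AVX2 is set) would give 256. */
static int vector_bytes(unsigned flags)
{
    return 16 << CPUFLAG(flags, CPU_AVX2);
}
```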

9 years ago - encoder_open: Fix memory leak
Henrik Gramner [Fri, 30 Oct 2015 15:55:49 +0000 (16:55 +0100)]
encoder_open: Fix memory leak

Furthermore, the x264_analyse_prepare_costs() and x264_analyse_init_costs()
functions were only used in x264_encoder_open(), so move that entire section
of code to analyse.c as well to simplify things.

9 years ago - arm: do not fill mc_weight*_neon tabs for HIGH_BIT_DEPTH
Janne Grunau [Wed, 18 Nov 2015 10:08:22 +0000 (11:08 +0100)]
arm: do not fill mc_weight*_neon tabs for HIGH_BIT_DEPTH

The asm is only for 8-bit and function prototypes reflect that. Avoids
numerous warnings with --bit-depth=9/10.

9 years ago - arm: Eliminate text relocations in asm
Janne Grunau [Tue, 13 Oct 2015 21:50:11 +0000 (23:50 +0200)]
arm: Eliminate text relocations in asm

Android 6 does not link shared libraries with text relocations.

Make the movrel macro position independent and add movrelx for indirect
loads of external symbols.

Move the function pointer table for the aligned memcpy variants to the
data.rel.ro section on Linux/Android.

9 years ago - arm: Don't assume alignment in mbtree_propagate_list_internal where it isn't provided
Martin Storsjö [Thu, 15 Oct 2015 08:50:33 +0000 (11:50 +0300)]
arm: Don't assume alignment in mbtree_propagate_list_internal where it isn't provided

9 years ago - arm: Fix checkasm register clobber check on iOS
Janne Grunau [Tue, 13 Oct 2015 21:50:12 +0000 (23:50 +0200)]
arm: Fix checkasm register clobber check on iOS

r9 is a volatile register in the iOS ABI and will therefore not be
preserved by compiled functions like the luma motion compensation.

Add the symbol prefix to the puts() call and use blx, since a switch
between ARM and Thumb mode might be required.

9 years ago - ppc: Add detection of AltiVec support for FreeBSD
Anton Mitrofanov [Wed, 30 Sep 2015 22:02:16 +0000 (01:02 +0300)]
ppc: Add detection of AltiVec support for FreeBSD

Patch from FreeBSD ports.

9 years ago - Don't assume 16-byte stack alignment by default on x86-32
Anton Mitrofanov [Mon, 28 Sep 2015 18:07:55 +0000 (21:07 +0300)]
Don't assume 16-byte stack alignment by default on x86-32

Depending on the target OS, some compilers use 4-byte stack alignment by default.
Explicitly check known good compilers and specific options for stack alignment.

9 years ago - Fix a few static analyzer performance hints
Anton Mitrofanov [Tue, 22 Sep 2015 18:33:07 +0000 (21:33 +0300)]
Fix a few static analyzer performance hints

9 years ago - Revise the row VBV algorithm
Anton Mitrofanov [Tue, 22 Sep 2015 17:19:23 +0000 (20:19 +0300)]
Revise the row VBV algorithm

9 years ago - Fix high bit depth lookahead cost compensation algorithm
Anton Mitrofanov [Tue, 22 Sep 2015 16:26:25 +0000 (19:26 +0300)]
Fix high bit depth lookahead cost compensation algorithm

High bit depth VBV should now behave more like 8-bit VBV.

9 years ago - Correctly update the intra row predictor in B-frames
Anton Mitrofanov [Tue, 22 Sep 2015 16:05:52 +0000 (19:05 +0300)]
Correctly update the intra row predictor in B-frames

It was previously used but never updated from its initial value.

9 years ago - Change the predictors update algorithm
Anton Mitrofanov [Tue, 22 Sep 2015 15:58:24 +0000 (18:58 +0300)]
Change the predictors update algorithm

Keep predictor offsets more stable. This should fix VBV misprediction in frames
with a large difference in complexity between the top and bottom parts.

9 years ago - arm: Implement x264_mbtree_propagate_{cost, list}_neon
Martin Storsjö [Thu, 3 Sep 2015 06:30:44 +0000 (09:30 +0300)]
arm: Implement x264_mbtree_propagate_{cost, list}_neon

The cost function could be simplified to avoid having to clobber
q4/q5, but this requires reordering instructions in a way that
increases the total runtime.

checkasm timing             Cortex-A7  A8      A9
mbtree_propagate_cost_c     63702      155835  62829
mbtree_propagate_cost_neon  17199      10454   11106

mbtree_propagate_list_c     104203     108949  84532
mbtree_propagate_list_neon  82035      78348   60410

9 years ago - x86: Share the mbtree_propagate_list macro with aarch64
Martin Storsjö [Thu, 3 Sep 2015 06:30:43 +0000 (09:30 +0300)]
x86: Share the mbtree_propagate_list macro with aarch64

This avoids having to duplicate the same code for all architectures
that implement only the internal part of this function in assembler.

9 years ago - arm: Implement luma intra deblocking
Martin Storsjö [Wed, 2 Sep 2015 19:39:51 +0000 (22:39 +0300)]
arm: Implement luma intra deblocking

checkasm timing             Cortex-A7  A8      A9
deblock_luma_intra[0]_c     5988       4653    4316
deblock_luma_intra[0]_neon  3103       2170    2128
deblock_luma_intra[1]_c     7119       5905    5347
deblock_luma_intra[1]_neon  2068       1381    1412

This includes extra optimizations by Janne Grunau.

Timings from a separate build, on Exynos 5422:

                            Cortex-A7  A15
deblock_luma_intra[0]_c     6627       3300
deblock_luma_intra[0]_neon  3059       1128
deblock_luma_intra[1]_c     7314       4128
deblock_luma_intra[1]_neon  2038       720

9 years ago - arm: Implement some neon 8x16c intra predict functions
Martin Storsjö [Mon, 31 Aug 2015 19:40:31 +0000 (22:40 +0300)]
arm: Implement some neon 8x16c intra predict functions

checkasm timing               Cortex-A7  A8      A9
intra_predict_8x16c_dct_c     862        540     590
intra_predict_8x16c_dct_neon  608        511     657
intra_predict_8x16c_h_c       972        707     719
intra_predict_8x16c_h_neon    722        656     672
intra_predict_8x16c_p_c       10183      9819    8655
intra_predict_8x16c_p_neon    2622       1972    1983

9 years ago - arm: Implement x264_plane_copy_neon
Martin Storsjö [Thu, 27 Aug 2015 21:15:01 +0000 (00:15 +0300)]
arm: Implement x264_plane_copy_neon

checkasm timing  Cortex-A7  A8      A9
plane_copy_c     13124      10925   9106
plane_copy_neon  7349       5103    8945

9 years ago - checkasm: arm: Check register clobbering
Martin Storsjö [Fri, 28 Aug 2015 06:40:24 +0000 (09:40 +0300)]
checkasm: arm: Check register clobbering

Cast the function pointer to a different type signature, to
be able to use uint64_t as return type (instead of intptr_t) for
those calls that require it.

Use two separate functions, depending on whether neon is available.

9 years ago - checkasm: Try different widths for ssd_nv12
Martin Storsjö [Thu, 13 Aug 2015 21:00:57 +0000 (00:00 +0300)]
checkasm: Try different widths for ssd_nv12

To test all codepaths in the aarch64 neon implementation, one needs
to test at least widths 8, 16, 24 and 32.
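A sketch of why those particular widths matter, assuming a SIMD-style layout that processes 16 UV pairs per iteration plus an 8-pair tail (an assumption, not a description of the actual aarch64 code). `ssd_nv12_ref()` and `ssd_nv12_blocked()` are hypothetical names, loosely modeled on the ssd_nv12 idea of summing squared U and V differences separately over interleaved chroma:

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate squared U and V differences for one interleaved UV pair. */
static void acc_pair(const uint8_t *a, const uint8_t *b, int x, uint64_t *u, uint64_t *v)
{
    int du = a[2 * x] - b[2 * x];
    int dv = a[2 * x + 1] - b[2 * x + 1];
    *u += (uint64_t)(du * du);
    *v += (uint64_t)(dv * dv);
}

/* Straightforward reference: width counts UV pairs per row. */
static void ssd_nv12_ref(const uint8_t *a, ptrdiff_t sa, const uint8_t *b, ptrdiff_t sb,
                         int width, int height, uint64_t *ssd_u, uint64_t *ssd_v)
{
    uint64_t u = 0, v = 0;
    for (int y = 0; y < height; y++, a += sa, b += sb)
        for (int x = 0; x < width; x++)
            acc_pair(a, b, x, &u, &v);
    *ssd_u = u;
    *ssd_v = v;
}

/* Blocked variant mimicking a SIMD layout: 16 pairs per main-loop step, then
 * the remaining pairs (8 when width is a multiple of 8 but not of 16).
 * Widths 8, 16, 24 and 32 exercise tail-only, main-only, main+tail and two
 * main iterations respectively. */
static void ssd_nv12_blocked(const uint8_t *a, ptrdiff_t sa, const uint8_t *b, ptrdiff_t sb,
                             int width, int height, uint64_t *ssd_u, uint64_t *ssd_v)
{
    uint64_t u = 0, v = 0;
    for (int y = 0; y < height; y++, a += sa, b += sb) {
        int x = 0;
        for (; x + 16 <= width; x += 16)
            for (int k = x; k < x + 16; k++)
                acc_pair(a, b, k, &u, &v);
        for (; x < width; x++)
            acc_pair(a, b, x, &u, &v);
    }
    *ssd_u = u;
    *ssd_v = v;
}
```

A test harness would run both versions over each width and compare the results, which is the role checkasm plays for the real assembly.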

9 years ago - Haiku support
Jerome Duval [Fri, 13 Jun 2014 19:56:27 +0000 (19:56 +0000)]
Haiku support

Add Haiku as a supported platform in configure.
Haiku has no nice() function; use the platform-specific substitute instead.

9 years ago - checkasm: aarch64: Check register clobbering
Martin Storsjö [Tue, 25 Aug 2015 11:38:20 +0000 (14:38 +0300)]
checkasm: aarch64: Check register clobbering

Disable this on iOS, since it has a slightly different ABI
for vararg parameters.

9 years ago - arm: Implement x264_decimate_score15/16/64_neon
Martin Storsjö [Tue, 25 Aug 2015 20:36:45 +0000 (23:36 +0300)]
arm: Implement x264_decimate_score15/16/64_neon

checkasm timing        Cortex-A7  A8      A9
decimate_score15_c     764        736     535
decimate_score15_neon  487        494     453
decimate_score16_c     782        727     553
decimate_score16_neon  487        494     521
decimate_score64_c     2361       2597    2011
decimate_score64_neon  1017       802     785

9 years ago - arm: Implement chroma intra deblock
Martin Storsjö [Tue, 25 Aug 2015 20:36:44 +0000 (23:36 +0300)]
arm: Implement chroma intra deblock

checkasm timing                      Cortex-A7  A8      A9
deblock_chroma_420_intra_mbaff_c     1469       1276    1181
deblock_chroma_420_intra_mbaff_neon  981        717     644
deblock_chroma_intra[1]_c            2954       2402    2321
deblock_chroma_intra[1]_neon         947        581     575
deblock_h_chroma_420_intra_c         2859       2509    2264
deblock_h_chroma_420_intra_neon      1480       1119    1028
deblock_h_chroma_422_intra_c         6211       5030    4792
deblock_h_chroma_422_intra_neon      2894       1990    2077

9 years ago - arm: Implement x264_pixel_sa8d_satd_16x16_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:17 +0000 (14:38 +0300)]
arm: Implement x264_pixel_sa8d_satd_16x16_neon

This requires spilling some registers to the stack,
unlike the aarch64 version.

checkasm timing                Cortex-A7  A8      A9
sa8d_satd_16x16_neon           12936      6365    7492
sa8d_satd_16x16_separate_neon  14841      6605    8324

9 years ago - arm: Implement x264_deblock_h_chroma_mbaff_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:16 +0000 (14:38 +0300)]
arm: Implement x264_deblock_h_chroma_mbaff_neon

checkasm timing                Cortex-A7  A8      A9
deblock_chroma_420_mbaff_c     1944       1706    1526
deblock_chroma_420_mbaff_neon  1210       873     865

9 years ago - arm: Implement x264_deblock_h_chroma_422_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:15 +0000 (14:38 +0300)]
arm: Implement x264_deblock_h_chroma_422_neon

checkasm timing            Cortex-A7  A8      A9
deblock_h_chroma_422_c     6953       6269    5145
deblock_h_chroma_422_neon  3905       2569    2551

9 years ago - arm: Implement integral_init4/8h/v_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:14 +0000 (14:38 +0300)]
arm: Implement integral_init4/8h/v_neon

checkasm timing       Cortex-A7  A8      A9
integral_init4h_c     10466      8590    6161
integral_init4h_neon  3021       1494    1800
integral_init4v_c     16250      13590   13628
integral_init4v_neon  3473       2073    3291
integral_init8h_c     10100      8275    5705
integral_init8h_neon  4403       2344    2751
integral_init8v_c     6403       4632    4999
integral_init8v_neon  1184       783     1306

9 years ago - arm: Implement x264_denoise_dct_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:13 +0000 (14:38 +0300)]
arm: Implement x264_denoise_dct_neon

checkasm timing   Cortex-A7  A8      A9
denoise_dct_c     6604       5510    5858
denoise_dct_neon  1774       1139    1614

9 years ago - arm: Add x264_nal_escape_neon
Martin Storsjö [Tue, 25 Aug 2015 11:38:12 +0000 (14:38 +0300)]
arm: Add x264_nal_escape_neon

checkasm timing  Cortex-A7  A8      A9
nal_escape_c     852758     879566  655497
nal_escape_neon  376831     450678  371673
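For context, nal_escape implements H.264 emulation prevention: whenever two zero bytes would be followed by a byte of 0x03 or less, an extra 0x03 is inserted so the payload cannot mimic a start code. A minimal C sketch of that byte-level rule, with an illustrative name and signature:

```c
#include <stdint.h>

/* Copy src..end to dst, inserting 0x03 after any two zero bytes that would
 * otherwise be followed by a byte <= 0x03. dst needs room for the worst case
 * (roughly 1.5x the input). Returns the end of the written output. */
static uint8_t *nal_escape_sketch(uint8_t *dst, const uint8_t *src, const uint8_t *end)
{
    if (src < end) *dst++ = *src++;
    if (src < end) *dst++ = *src++;
    while (src < end) {
        /* Check the output, so an inserted 0x03 breaks the zero run. */
        if (src[0] <= 0x03 && !dst[-2] && !dst[-1])
            *dst++ = 0x03;                 /* emulation prevention byte */
        *dst++ = *src++;
    }
    return dst;
}
```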

9 years ago - arm: Add neon versions of vsad, asd8 and ssd_nv12_core
Martin Storsjö [Tue, 25 Aug 2015 11:38:11 +0000 (14:38 +0300)]
arm: Add neon versions of vsad, asd8 and ssd_nv12_core

These are straight translations of the aarch64 versions.

checkasm timing  Cortex-A7  A8      A9
vsad_c           16234      10984   9850
vsad_neon        2132       1020    789

asd8_c           5859       3561    3543
asd8_neon        1407       1279    1250

ssd_nv12_c       608096     591072  426285
ssd_nv12_neon    72752      33549   41347

9 years ago - checkasm: Check the right output range for integral_initXh
Martin Storsjö [Tue, 25 Aug 2015 11:38:10 +0000 (14:38 +0300)]
checkasm: Check the right output range for integral_initXh

These functions write their output into sum+stride, while we previously
only checked [0..stride-8] within the sum array.

This catches the previously broken aarch64 version of these functions.

Also check up to stride-4 elements for init4h.
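To make the output range concrete, here is a hedged sketch of a horizontal 4-wide integral-image step. `integral_init4h_sketch()` is a hypothetical name and may differ in detail from x264's own C reference; it assumes the caller passes `sum + stride`, so the previous row is read at `sum[x - stride]` and `stride - 4` outputs are written starting at `sum + stride`:

```c
#include <stddef.h>
#include <stdint.h>

/* sum points at the row being written; sum[x - stride] is the row above.
 * Each output is the sum of 4 adjacent pixels plus the value directly above,
 * and stride - 4 outputs are produced. */
static void integral_init4h_sketch(uint16_t *sum, const uint8_t *pix, ptrdiff_t stride)
{
    int v = pix[0] + pix[1] + pix[2] + pix[3];   /* running 4-pixel window */
    for (ptrdiff_t x = 0; x < stride - 4; x++) {
        sum[x] = (uint16_t)(v + sum[x - stride]);
        v += pix[x + 4] - pix[x];                /* slide the window right by one */
    }
}
```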

9 years ago - aarch64: Skip deblocking in x264_deblock_h_chroma_422_neon
Janne Grunau [Thu, 20 Aug 2015 11:55:54 +0000 (13:55 +0200)]
aarch64: Skip deblocking in x264_deblock_h_chroma_422_neon

If the parameters (alpha, beta, tc0[]) indicated that the deblocking
should be skipped, every second chroma line would have been deblocked
anyway.

deblock_h_chroma_422_neon: 2259 (before)
deblock_h_chroma_422_neon: 2192 (after)
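For reference, a sketch of one line of the H.264 normal chroma edge filter, with the skip test applied per line. The function name and the "tc0 < 0 means skip" convention are assumptions for this sketch; the filter arithmetic follows the H.264 chroma rule (tC = tC0 + 1):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int clip3(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* One line of the H.264 normal-strength (bS < 4) chroma edge filter.
 * pix points at q0; xstride steps across the edge (1 for a vertical edge).
 * Assumed convention for this sketch: tc0 < 0 means "do not filter". */
static void deblock_chroma_line(uint8_t *pix, ptrdiff_t xstride, int alpha, int beta, int tc0)
{
    if (tc0 < 0)
        return;                                  /* the parameters say: skip */
    int p1 = pix[-2 * xstride], p0 = pix[-xstride];
    int q0 = pix[0], q1 = pix[xstride];
    if (abs(p0 - q0) < alpha && abs(p1 - p0) < beta && abs(q1 - q0) < beta) {
        int tc = tc0 + 1;                        /* chroma uses tC = tC0 + 1 */
        int delta = clip3(((q0 - p0) * 4 + (p1 - q1) + 4) >> 3, -tc, tc);
        pix[-xstride] = (uint8_t)clip3(p0 + delta, 0, 255);
        pix[0]        = (uint8_t)clip3(q0 - delta, 0, 255);
    }
}
```

The skip test has to gate every line; the bug described above amounts to half of the lines bypassing it.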

9 years ago - aarch64: Optimize various intra_predict asm functions
Janne Grunau [Mon, 17 Aug 2015 14:39:20 +0000 (16:39 +0200)]
aarch64: Optimize various intra_predict asm functions

Make them at least as fast as the compiled C version (tested on
Cortex-A53 vs. gcc 4.9.2).

                        C     NEON (before)   NEON (after)
intra_predict_4x4_dc:   260   335             260
intra_predict_4x4_dct:  210   265             200
intra_predict_8x8c_dc:  497   548             493
intra_predict_8x8c_v:   232   309             179 (arm64)
intra_predict_8x16c_dc: 795   830             790

9 years ago - aarch64: Faster intra_predict_4x4_h
Janne Grunau [Tue, 18 Aug 2015 08:25:10 +0000 (10:25 +0200)]
aarch64: Faster intra_predict_4x4_h

Use multiplication with 0x01010101 for splats.

On a Cortex-A53:
                     gcc 4.9.2   llvm 3.6   neon (before)   neon (after)
intra_predict_4x4_h: 162         147        160/155         139/135
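The splat trick works because multiplying a byte value by 0x01010101 copies it into all four bytes of a 32-bit word. A hypothetical C version of 4x4 horizontal prediction built on it (the `left[]` array layout is an assumption):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill each row of a 4x4 block with its left neighbour. left[y] * 0x01010101
 * replicates the byte into all four bytes of a 32-bit word, so a whole row is
 * written with one 32-bit store (byte order is irrelevant: all bytes match). */
static void predict_4x4_h_sketch(uint8_t *dst, ptrdiff_t stride, const uint8_t left[4])
{
    for (int y = 0; y < 4; y++) {
        uint32_t splat = left[y] * 0x01010101u;
        memcpy(dst + y * stride, &splat, sizeof(splat));
    }
}
```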

9 years ago - aarch64: Fix coeff_level_run* macros with LLVM's assembler
Janne Grunau [Tue, 18 Aug 2015 08:25:09 +0000 (10:25 +0200)]
aarch64: Fix coeff_level_run* macros with LLVM's assembler

LLVM's integrated assembler does not treat symbols as integer constants.

9 years ago - aarch64: Remove commas LLVM's assembler complains about
Janne Grunau [Tue, 18 Aug 2015 08:25:08 +0000 (10:25 +0200)]
aarch64: Remove commas LLVM's assembler complains about

9 years ago - arm: Implement x264_sub8x16_dct_dc_neon
Martin Storsjö [Thu, 13 Aug 2015 20:59:31 +0000 (23:59 +0300)]
arm: Implement x264_sub8x16_dct_dc_neon

checkasm timing      Cortex-A7  A8      A9
sub8x16_dct_dc_c     6386       3901    4080
sub8x16_dct_dc_neon  1491       698     917

9 years ago - arm: Optimize x264_deblock_h_chroma_neon
Martin Storsjö [Thu, 13 Aug 2015 20:59:28 +0000 (23:59 +0300)]
arm: Optimize x264_deblock_h_chroma_neon

Shuffle both chroma components together as a 16-bit unit, and
don't write the unchanged columns (like in x264_deblock_h_luma_neon
and in the aarch64 version of the function).

This causes a minor slowdown for x264_deblock_v_chroma_neon, but
it is negligible compared to the speedup.

checkasm timing            Cortex-A7  A8      A9
deblock_chroma[1]_c        4817       4057    3601
deblock_chroma[1]_neon     1249       716     817     (before)
deblock_chroma[1]_neon     1249       766     845     (after)

deblock_h_chroma_420_c     3699       3275    2830
deblock_h_chroma_420_neon  2068       1414    1400    (before)
deblock_h_chroma_420_neon  1838       1355    1291    (after)

9 years ago - aarch64: Remove leftover commented out code
Martin Storsjö [Thu, 13 Aug 2015 20:59:27 +0000 (23:59 +0300)]
aarch64: Remove leftover commented out code

9 years ago - aarch64: Simplify the decimate_score functions
Martin Storsjö [Thu, 13 Aug 2015 20:59:26 +0000 (23:59 +0300)]
aarch64: Simplify the decimate_score functions

After doing a left shift by the number of bits returned by clz,
only bits set to zero can be shifted out, so if the register
was nonzero to start with (which is checked), it can't become
zero here.
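The argument can be checked with a small C sketch. `clz32()` is a portable stand-in for the hardware clz instruction and `normalize_top_bit()` is a hypothetical name:

```c
#include <stdint.h>

/* Portable count of leading zero bits; x must be nonzero. */
static int clz32(uint32_t x)
{
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shifting left by clz(x) discards exactly the leading zero bits, so the
 * result has its top bit set and can never be zero when x != 0. */
static uint32_t normalize_top_bit(uint32_t x)
{
    return x << clz32(x);
}
```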

9 years ago - arm: Use aligned loads in x264_coeff_last15_neon
Martin Storsjö [Thu, 13 Aug 2015 20:59:25 +0000 (23:59 +0300)]
arm: Use aligned loads in x264_coeff_last15_neon

After subtracting 2, the pointer will be aligned.

checkasm timing    Cortex-A7  A8      A9
coeff_last15_c     423        375     230
coeff_last15_neon  350        420     404     (before)
coeff_last15_neon  350        400     394     (after)
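A hedged C sketch of the two ideas involved: finding the last nonzero coefficient, and why the pointer becomes aligned after stepping back 2 bytes. It assumes, as the text implies, that the 15 coefficients start one int16_t past a 16-byte aligned block; `coeff_last_sketch()` and `aligned_base()` are hypothetical names:

```c
#include <stdint.h>

/* Index of the last nonzero coefficient among the first n, or -1 if all are zero. */
static int coeff_last_sketch(const int16_t *l, int n)
{
    int i = n - 1;
    while (i >= 0 && l[i] == 0)
        i--;
    return i;
}

/* The 15 AC coefficients start 2 bytes (one int16_t) past a 16-byte aligned
 * block, so stepping back 2 bytes yields an address suitable for aligned
 * vector loads covering the whole block. */
static const int16_t *aligned_base(const int16_t *ac15)
{
    return (const int16_t *)((const char *)ac15 - 2);
}
```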

9 years ago - arm: Simplify x264_predict_8x8c_p_neon
Martin Storsjö [Thu, 13 Aug 2015 20:59:24 +0000 (23:59 +0300)]
arm: Simplify x264_predict_8x8c_p_neon

This gets rid of a few unnecessary (and confusing) steps in
calculating the increment to i00.

checkasm timing            Cortex-A7  A8      A9
intra_predict_8x8c_p_c     5525       4732    4755
intra_predict_8x8c_p_neon  1719       1140    1262    (before)
intra_predict_8x8c_p_neon  1663       1142    1255    (after)
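For reference, a C sketch of H.264 8x8 chroma plane prediction, written with the same i00 starting value and per-pixel increments (b across, c down). The function name and the `top[]`/`left[]` layout are assumptions; it relies on `>>` being an arithmetic shift for negative values, as it is with common compilers:

```c
#include <stdint.h>

static uint8_t clip_pixel(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* H.264 plane prediction for an 8x8 (4:2:0) chroma block.
 * Hypothetical layout: top[0] is the top-left corner pixel, top[1..8] the row
 * above; left[0..7] the column to the left. */
static void predict_8x8c_p_sketch(uint8_t dst[8][8], const uint8_t top[9], const uint8_t left[8])
{
    int H = 0, V = 0;
    for (int i = 0; i < 4; i++) {
        H += (i + 1) * (top[5 + i] - top[3 - i]);
        /* For i == 3 the left-column index would be -1: that is the corner. */
        V += (i + 1) * (left[4 + i] - (i < 3 ? left[2 - i] : top[0]));
    }
    int a = 16 * (left[7] + top[8]);
    int b = (34 * H + 32) >> 6;
    int c = (34 * V + 32) >> 6;
    int i00 = a - 3 * b - 3 * c + 16;   /* value for pixel (0,0) before the >> 5 */
    for (int y = 0; y < 8; y++, i00 += c) {
        int pix = i00;
        for (int x = 0; x < 8; x++, pix += b)
            dst[y][x] = clip_pixel(pix >> 5);
    }
}
```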

9 years ago - lavf: Use the prefixed name for pixel format enum
Vittorio Giovara [Tue, 15 Sep 2015 13:40:14 +0000 (15:40 +0200)]
lavf: Use the prefixed name for pixel format enum

9 years ago - aarch64: fix x264_mbtree_propagate_cost_neon
Janne Grunau [Wed, 2 Sep 2015 22:21:58 +0000 (00:21 +0200)]
aarch64: fix x264_mbtree_propagate_cost_neon

The branch condition caused the loop to execute one more time than
intended. This was detected through memory corruption on arm with the
1:1 port of the function.

9 years ago - aarch64: Fix integral_init4/8h_neon
Martin Storsjö [Thu, 13 Aug 2015 20:59:22 +0000 (23:59 +0300)]
aarch64: Fix integral_init4/8h_neon

The stride is the number of uint16_t elements and thus needs
to be shifted.

This issue had gone unnoticed since checkasm didn't actually
verify the output of these functions.
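The element-versus-byte distinction can be illustrated in C, where pointer arithmetic hides the scaling that hand-written assembly has to do explicitly. Both helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* In C, adding an element stride to a uint16_t* is scaled by
 * sizeof(uint16_t) automatically. */
static const uint16_t *next_row_elems(const uint16_t *sum, ptrdiff_t stride)
{
    return sum + stride;
}

/* Assembly addresses memory in bytes, so the same element stride must first
 * be shifted left by 1 (multiplied by 2). Without the shift the address lands
 * half a row short. */
static const uint16_t *next_row_bytes(const uint16_t *sum, ptrdiff_t stride)
{
    return (const uint16_t *)((const char *)sum + (stride << 1));
}
```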

9 years ago - x86: Fix integral_init4/8h_avx2
Henrik Gramner [Thu, 27 Aug 2015 17:53:00 +0000 (19:53 +0200)]
x86: Fix integral_init4/8h_avx2

The AVX2 implementation was using the wrong offsets. It went undetected due to
the checkasm test being incorrect.

9 years ago - Simplify inclusion of x264.h in C++ projects
Mark Webster [Wed, 5 Aug 2015 03:28:17 +0000 (04:28 +0100)]
Simplify inclusion of x264.h in C++ projects

Name all structs to support forward declarations.
Add a conditional extern "C" wrapper in x264.h itself instead of having to
specify it in every location where it's included.
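A minimal sketch of the header pattern described above, using hypothetical `demo_*` names rather than x264's own declarations:

```c
/* ---- hypothetical header demo.h ---- */
#ifdef __cplusplus
extern "C" {
#endif

/* Naming the struct tag (not just the typedef) lets other code forward-declare
 * it as `struct demo_param_t;` without including this header. */
typedef struct demo_param_t {
    int i_width;
    int i_height;
} demo_param_t;

int demo_param_area(const demo_param_t *param);

#ifdef __cplusplus
}
#endif

/* ---- implementation (demo.c) ---- */
int demo_param_area(const demo_param_t *param)
{
    return param->i_width * param->i_height;
}
```

A C++ file can then include the header directly; the conditional block gives the declarations C linkage without a wrapper at each include site.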

9 years ago - checkasm: Properly save rdx/edx in checkasm_call() on x86
Henrik Gramner [Sun, 16 Aug 2015 19:59:26 +0000 (21:59 +0200)]
checkasm: Properly save rdx/edx in checkasm_call() on x86

If the return value doesn't fit in a single register, rdx/edx can in some
cases be used in addition to rax/eax.

Doesn't affect any of the existing checkasm tests but it's more correct
behavior and it might be useful in the future.

9 years ago - x86: Enable SSE2 by default on x86-32
Henrik Gramner [Tue, 11 Aug 2015 15:19:35 +0000 (17:19 +0200)]
x86: Enable SSE2 by default on x86-32

It makes more sense to tune the defaults to benefit the vast majority of users.

Anyone still using a Pentium III for video encoding is of course free to
explicitly set different flags when compiling.

9 years ago - msvs/icl: Improve default CFLAGS
Henrik Gramner [Mon, 10 Aug 2015 20:30:21 +0000 (22:30 +0200)]
msvs/icl: Improve default CFLAGS

Use -fp:fast as a substitute for -ffast-math.
Increase warning level from -W0 to -W1 (the default setting).
Disable -GS (stack cookies) on MSVS. It's disabled by default on ICL.

9 years ago - Use a relative $SRCPATH for out-of-tree builds
Henrik Gramner [Wed, 12 Aug 2015 20:23:31 +0000 (22:23 +0200)]
Use a relative $SRCPATH for out-of-tree builds

Fixes out-of-tree MSVS builds on Cygwin.

9 years ago - cygwin: Enable MSVS support
Henrik Gramner [Sat, 8 Aug 2015 20:26:38 +0000 (22:26 +0200)]
cygwin: Enable MSVS support

`cl -showIncludes` creates absolute Windows paths for some files; attempt
to convert those to Unix paths.

Use relative paths for dependencies located in or below the working directory
in order to mimic the behavior of gcc and to make the paths more readable.

Make the dependency generation script a bit more robust in general.

9 years ago - cltostr.sh: Minor fixes
Henrik Gramner [Sat, 8 Aug 2015 16:34:21 +0000 (18:34 +0200)]
cltostr.sh: Minor fixes

9 years ago - Simplify version.sh
Henrik Gramner [Sat, 8 Aug 2015 10:21:54 +0000 (12:21 +0200)]
Simplify version.sh

Also remove some non-POSIX syntax and improve robustness.

As a bonus the script now runs about 2-3 times faster.

`git rev-list --count` could be used to simplify things even further,
but that functionality was added in git 1.7.2, so keep `wc -l` for now
to maintain compatibility with older git versions.

9 years ago - msvs: Fix cl detection in non-English environments
장영훈 [Fri, 7 Aug 2015 05:43:24 +0000 (14:43 +0900)]
msvs: Fix cl detection in non-English environments

9 years ago - x86inc: Sync minor changes from ffmpeg/libav
Henrik Gramner [Mon, 3 Aug 2015 19:05:11 +0000 (21:05 +0200)]
x86inc: Sync minor changes from ffmpeg/libav

9 years ago - matroska: Add comments for the remaining element names
Henrik Gramner [Wed, 29 Jul 2015 17:30:52 +0000 (19:30 +0200)]
matroska: Add comments for the remaining element names

9 years ago - Silence various static analyzer warnings
Henrik Gramner [Wed, 29 Jul 2015 17:30:41 +0000 (19:30 +0200)]
Silence various static analyzer warnings

Those are false positives, but it doesn't hurt to get rid of them.

9 years ago - mingw: Enable the tsaware linker flag
Henrik Gramner [Sun, 26 Jul 2015 21:13:29 +0000 (23:13 +0200)]
mingw: Enable the tsaware linker flag

Avoids an irrelevant compatibility layer in Terminal Services environments.

https://msdn.microsoft.com/en-us/library/cc834995.aspx

9 years ago - msvs: Don't redefine snprintf for VS2015
Henrik Gramner [Sun, 26 Jul 2015 21:13:26 +0000 (23:13 +0200)]
msvs: Don't redefine snprintf for VS2015

Visual Studio 2015 has a proper snprintf implementation.
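A hedged sketch of the version check. `_MSC_VER` 1900 corresponds to Visual Studio 2015; `format_number()` is a hypothetical helper that shows the conforming behavior callers can rely on:

```c
#include <stdio.h>

/* Visual Studio versions before 2015 (_MSC_VER 1900) lacked a C99-conforming
 * snprintf, and code often mapped it to _snprintf, which does not guarantee
 * null termination on truncation. Restrict that mapping to older compilers. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* With a conforming snprintf the result is always terminated and the return
 * value is the length the full output would have had. */
static int format_number(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%d", value);
}
```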

9 years ago - msvs: Prefer link.exe from the same directory as cl.exe
Henrik Gramner [Sun, 26 Jul 2015 21:13:19 +0000 (23:13 +0200)]
msvs: Prefer link.exe from the same directory as cl.exe

/usr/bin/link from coreutils may be located before the MSVS linker in $PATH,
which causes linking to fail due to using the wrong binary.

9 years ago - frame_dump: check fseek() return value
Henrik Gramner [Sun, 26 Jul 2015 22:10:00 +0000 (00:10 +0200)]
frame_dump: check fseek() return value

9 years ago - x264_vfprintf: use va_copy
Henrik Gramner [Sun, 26 Jul 2015 22:08:38 +0000 (00:08 +0200)]
x264_vfprintf: use va_copy

It's undefined behavior to use the same va_list twice.

This most likely didn't cause any issues in practice since the string would
have to be larger than 4 KiB to trigger the fallback path.

Use a workaround for ICL, as it doesn't define va_copy even in C99 mode.
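A sketch of the va_copy pattern the text refers to. `vformat_alloc()` and `format_alloc()` are hypothetical helpers, not x264's x264_vfprintf, and the ICL workaround is not shown:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a heap buffer. The first vsnprintf pass consumes `arg`, so a
 * copy made with va_copy is kept for the (rare) second pass that is needed
 * when the output does not fit the 4 KiB stack buffer. Reusing `arg` itself
 * there would be undefined behavior. */
static char *vformat_alloc(const char *fmt, va_list arg)
{
    char buf[4096];
    char *out = NULL;
    va_list arg2;
    va_copy(arg2, arg);
    int len = vsnprintf(buf, sizeof(buf), fmt, arg);
    if (len >= 0 && (out = malloc((size_t)len + 1)) != NULL) {
        if ((size_t)len < sizeof(buf))
            memcpy(out, buf, (size_t)len + 1);          /* fast path: already formatted */
        else
            vsnprintf(out, (size_t)len + 1, fmt, arg2); /* fallback uses the copy */
    }
    va_end(arg2);
    return out;
}

static char *format_alloc(const char *fmt, ...)
{
    va_list arg;
    va_start(arg, fmt);
    char *s = vformat_alloc(fmt, arg);
    va_end(arg);
    return s;
}
```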

9 years ago - param_parse: Fix framerate rounding issues
Henrik Gramner [Sun, 26 Jul 2015 22:08:31 +0000 (00:08 +0200)]
param_parse: Fix framerate rounding issues

9 years ago - aarch64: Remove broken CFLAGS in configure
Marcin Juszkiewicz [Mon, 1 Jun 2015 09:24:45 +0000 (11:24 +0200)]
aarch64: Remove broken CFLAGS in configure

GCC doesn't have an "-arch" switch, but works when that entire line is removed.

9 years ago - ppc: Add little-endian PowerPC support
Rong Yan [Mon, 20 Jul 2015 08:34:20 +0000 (03:34 -0500)]
ppc: Add little-endian PowerPC support

9 years ago - mips: MSA quant optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:46 +0000 (17:48 +0530)]
mips: MSA quant optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: MSA predict optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:45 +0000 (17:48 +0530)]
mips: MSA predict optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: MSA pixel optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:44 +0000 (17:48 +0530)]
mips: MSA pixel optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: MSA deblock optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:43 +0000 (17:48 +0530)]
mips: MSA deblock optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: MSA dct optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:42 +0000 (17:48 +0530)]
mips: MSA dct optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: MSA mc optimizations
Rishikesh More [Thu, 18 Jun 2015 12:18:40 +0000 (17:48 +0530)]
mips: MSA mc optimizations

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: Common MSA macros
Rishikesh More [Thu, 18 Jun 2015 12:18:38 +0000 (17:48 +0530)]
mips: Common MSA macros

Add macros for load/store, slide, shift, transpose and basic arithmetic
operations required by subsequent patches.

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: Add MSA support to checkasm
Rishikesh More [Tue, 12 May 2015 14:08:09 +0000 (19:38 +0530)]
mips: Add MSA support to checkasm

Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>

9 years ago - mips: Initial MSA support
Kaustubh Raste [Fri, 17 Apr 2015 12:08:58 +0000 (17:38 +0530)]
mips: Initial MSA support

MSA is the MIPS SIMD Architecture.

Add X264_CPU_MSA define.
Update configure to detect MIPS platform and set flags.
CPU-specific gcc options are expected through --extra-cflags.

Sample command line for mips32r5:
    ./configure --host=mipsel-linux-gnu --cross-prefix=<TOOLCHAIN>/mips-mti-linux-gnu- \
    --extra-cflags="-EL -mips32r5 -msched-weight -mload-store-pairs"

Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>

9 years ago - Limit autodetection of threads number according to the source height
Anton Mitrofanov [Thu, 16 Jul 2015 21:22:29 +0000 (00:22 +0300)]
Limit autodetection of threads number according to the source height

9 years ago - Fine-tune of frame's size predictors at ratecontrol start
Anton Mitrofanov [Thu, 16 Jul 2015 16:04:59 +0000 (19:04 +0300)]
Fine-tune of frame's size predictors at ratecontrol start

This is an attempt to improve VBV at the start of a video encoded with many
threads, which delay feedback to the predictors.

9 years ago - Use forced frame types in slicetype analysis
Anton Mitrofanov [Thu, 16 Jul 2015 13:15:56 +0000 (16:15 +0300)]
Use forced frame types in slicetype analysis

This should improve MBTree and VBV when a lot of forced frame types are used.

9 years ago - x86: SSSE3 and AVX2 implementations of plane_copy_swap
Henrik Gramner [Mon, 1 Dec 2014 21:05:42 +0000 (22:05 +0100)]
x86: SSSE3 and AVX2 implementations of plane_copy_swap

For NV21 input.
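For illustration, a C sketch of the operation being accelerated: copying an interleaved chroma plane while swapping each byte pair, which turns NV21's VU order into NV12's UV order. The name and signature are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Copy h rows of w interleaved byte pairs, swapping the two bytes of each
 * pair (e.g. NV21 VUVU... to NV12 UVUV...). Strides are in bytes. */
static void plane_copy_swap_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                                   const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    for (int y = 0; y < h; y++, dst += dst_stride, src += src_stride)
        for (int x = 0; x < w; x++) {
            dst[2 * x]     = src[2 * x + 1];
            dst[2 * x + 1] = src[2 * x];
        }
}
```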

9 years ago - NV21 input support
Yu Xiaolei [Fri, 6 Jun 2014 08:05:27 +0000 (16:05 +0800)]
NV21 input support

Eliminates an extra copy when encoding Android camera preview images.

Checkasm test by Janne Grunau.
ARM assembly with improvements from Janne Grunau.

9 years ago - deblock: Write combining
Henrik Gramner [Tue, 23 Jun 2015 15:00:47 +0000 (17:00 +0200)]
deblock: Write combining

9 years ago - Get rid of some tabs and trailing whitespaces
Henrik Gramner [Tue, 23 Jun 2015 12:59:59 +0000 (14:59 +0200)]
Get rid of some tabs and trailing whitespaces

9 years ago - x86: Experimental nasm support
Henrik Gramner [Sat, 23 May 2015 17:44:16 +0000 (19:44 +0200)]
x86: Experimental nasm support

Enables the use of nasm as an alternative to yasm.

Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
support [symbol-$$] addressing which is used extensively by x264's PIC code.
This includes all 64-bit Windows and 64-bit OS X builds, even non-shared.

For the above reason nasm is intentionally not auto-detected for now; instead,
the assembler must be explicitly specified using "AS=nasm ./configure".

Also drop -O2 from ASFLAGS since it's simply ignored anyway.

9 years ago - x86inc: Prevent warnings when using `struc` and `endstruc`
Timothy Gu [Tue, 26 May 2015 17:12:42 +0000 (19:12 +0200)]
x86inc: Prevent warnings when using `struc` and `endstruc`

struc and endstruc attempt to revert to the previous section state set by
the SECTION macro.

Use the primitive [SECTION] directive instead of the SECTION macro for the
.note.GNU-stack section to prevent it from being emitted again during endstruc.

9 years ago - x86inc: Drop SECTION_TEXT macro
Henrik Gramner [Wed, 27 May 2015 19:38:14 +0000 (21:38 +0200)]
x86inc: Drop SECTION_TEXT macro

The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.

9 years ago - x86inc: Disable vpbroadcastq workaround in newer yasm versions
Henrik Gramner [Sat, 23 May 2015 11:38:05 +0000 (13:38 +0200)]
x86inc: Disable vpbroadcastq workaround in newer yasm versions

The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

9 years ago - Prefer Unicode versions of Windows API calls
Henrik Gramner [Sun, 24 May 2015 20:57:00 +0000 (22:57 +0200)]
Prefer Unicode versions of Windows API calls

This is just for consistency; it doesn't affect behavior.

9 years ago - Get rid of fPIC warnings when compiling a shared library on Windows
Henrik Gramner [Sun, 24 May 2015 21:21:20 +0000 (23:21 +0200)]
Get rid of fPIC warnings when compiling a shared library on Windows

PIC is always enabled when compiling for Windows, so gcc warns when
-fPIC is used, since the flag has no effect.

9 years ago - matroska: Write the correct DocTypeVersion when using frame-packing
Henrik Gramner [Sat, 25 Jul 2015 20:42:59 +0000 (22:42 +0200)]
matroska: Write the correct DocTypeVersion when using frame-packing

The StereoMode element is only valid with DocTypeVersion 3 or higher.

9 years ago - dump_yuv: Fix file handle leak
Anton Mitrofanov [Fri, 24 Jul 2015 21:21:52 +0000 (00:21 +0300)]
dump_yuv: Fix file handle leak

9 years ago - mp4: Fix file handle leak
Anton Mitrofanov [Fri, 24 Jul 2015 21:20:47 +0000 (00:20 +0300)]
mp4: Fix file handle leak