Initial CTS (composition offset) was essentially getting added twice to
the computed PTS
Fixes https://github.com/HandBrake/HandBrake/issues/568
Here's a description of how mp4 timestamps work and what is going wrong
for the curious.
Terminology:
pts = presentation timestamp, when a frame is displayed
dts = decode timestamp, when a frame is decoded
cts = composition offset, pts - dts
empty edit = the edit list entry that defines the pts of the first frame in an mp4 track
mp4 timestamps are computed from 3 primary values stored in the mp4
stream:
An "empty edit" in the track edit list
per-frame duration
per-frame cts
Here's where things get messy. How do you compute pts(N) and dts(N) for
some frame N from only the above 3 values in the mp4 file?
empty edit == pts(0) and is read from the mp4 file (EDTS table)
duration(N) is read from the mp4 file (STTS table)
cts(N) is read from the mp4 file (CTTS table)
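Neither STTS nor CTTS stores one value per frame. In the ISO base media file format both tables are run-length encoded as (sample count, value) pairs, so duration(N) and cts(N) come from expanding those runs. libav appears to keep CTTS entries in the same two-field struct it uses for STTS, which is why the patch below reads cts(0) as `sc->ctts_data[0].duration`. A minimal sketch of the expansion, assuming a hypothetical `run_entry` type and `expand_runs` helper (not libav's names) and signed 32-bit values for simplicity:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one STTS or CTTS entry: `count` consecutive
 * samples share the same `value` (a duration for STTS, a composition
 * offset for CTTS). Illustrative names only, not libav's structs. */
typedef struct {
    uint32_t count;
    int32_t value;
} run_entry;

/* Expand a run-length table into one value per frame.
 * Writes at most `max_frames` values into `out` and returns how many
 * values were written. */
size_t expand_runs(const run_entry *runs, size_t n_runs,
                   int64_t *out, size_t max_frames)
{
    size_t n = 0;
    for (size_t i = 0; i < n_runs; i++) {
        for (uint32_t k = 0; k < runs[i].count && n < max_frames; k++) {
            out[n++] = runs[i].value;
        }
    }
    return n;
}
```

For example, an STTS table of {(2, 1024), (1, 512)} expands to the per-frame durations 1024, 1024, 512.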
We know cts(0) = pts(0) - dts(0) by definition of cts
And cts(0) and pts(0) are both known, since they can be read from the mp4 file
Therefore we can compute dts(0) = pts(0) - cts(0).
This is the step libav gets wrong!
libav computes dts(0) = pts(0), which shifts all frames by cts(0)
After that dts(N) = dts(0) + duration(0) + ... + duration(N-1)
And finally pts(N) = dts(N) + cts(N)
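The derivation above can be written as a short C sketch. `compute_timestamps` and its parameters are hypothetical names chosen for illustration; the `buggy` flag reproduces the dts(0) = pts(0) shortcut attributed to libav so the two results can be compared side by side.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute dts(N) and pts(N) for n frames from the
 * three mp4 inputs described above:
 *   pts0     : pts(0), taken from the empty edit (EDTS)
 *   duration : per-frame durations (STTS), duration[N] for frame N
 *   cts      : per-frame composition offsets (CTTS), cts[N] for frame N
 * If `buggy` is nonzero, dts(0) is set to pts(0) (the libav shortcut);
 * otherwise dts(0) = pts(0) - cts(0). */
void compute_timestamps(int64_t pts0, const int64_t *duration,
                        const int64_t *cts, size_t n, int buggy,
                        int64_t *dts, int64_t *pts)
{
    if (n == 0)
        return;

    dts[0] = buggy ? pts0 : pts0 - cts[0];

    /* dts(N) = dts(0) + duration(0) + ... + duration(N-1) */
    for (size_t i = 1; i < n; i++)
        dts[i] = dts[i - 1] + duration[i - 1];

    /* pts(N) = dts(N) + cts(N) */
    for (size_t i = 0; i < n; i++)
        pts[i] = dts[i] + cts[i];
}
```

With a four-frame I P B B example (frame interval 10, pts(0) = 10, cts = {10, 30, 0, 0}), the corrected dts(0) gives pts = {10, 40, 20, 30}, while the shortcut gives {20, 50, 30, 40}: every frame is late by exactly cts(0). Algebraically, the shortcut yields pts(N) = pts(0) + duration(0) + ... + duration(N-1) + cts(N), which exceeds the correct value by cts(0) for every N. In this example pts(0) happens to equal cts(0), so the first frame lands at 2 * cts(0), which matches the "added twice" description.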
(cherry picked from commit 88343d5a0ee9969071bb8a263dab0e0a66c4c8ff)
--- /dev/null
+diff --git a/libavformat/mov.c b/libavformat/mov.c
+index 2810960..71c37c2 100644
+--- a/libavformat/mov.c
++++ b/libavformat/mov.c
+@@ -2321,6 +2321,9 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
+         if (sc->time_offset < 0)
+             sc->time_offset = av_rescale(sc->time_offset, sc->time_scale, mov->time_scale);
+         current_dts = -sc->time_offset;
++        if (sc->ctts_data && sc->ctts_count) {
++            current_dts -= sc->ctts_data[0].duration;
++        }
+     }
+
+     /* only use old uncompressed audio chunk demuxing when stts specifies it */