This matters if the buffer overflows, in which case the count is
used to patch the buffer back together. Such an overflow happens if
the stream starts with multiple video packets with zero timestamps
(before any audio packets), enough of them to fill the buffer.