if two or more threads accessed tls in a dso that was loaded after
the threads were created, then __tls_get_new could make an
out-of-bounds memory access (leading to a segfault).
a byte count was accidentally used instead of an element count when
the new dtv pointer was computed. (dso->new_dtv is void **, so pointer
arithmetic on it already scales by sizeof(void *).)
because it is rare for the same dso to provide the dtv for several
threads, the crash was not observed in practice, but it is possible
to trigger.
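
the arithmetic can be illustrated with a minimal sketch. it is not the
actual dynlink code: the helper names, the dtv_len parameter and the
in_bounds check are hypothetical, and the sketch only assumes that each
thread gets a dtv_len-element slice of a shared void ** pool. offsets
are computed as plain integers so nothing is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical helpers: offsets are in elements of a void ** pool */

/* buggy form: multiplies by sizeof(void *) although the offset is
 * later added to a void **, which scales by the element size again */
static size_t dtv_offset_buggy(size_t dtv_len, size_t idx)
{
	return dtv_len * sizeof(void *) * idx;
}

/* fixed form: plain element count */
static size_t dtv_offset_fixed(size_t dtv_len, size_t idx)
{
	return dtv_len * idx;
}

/* nonzero if a dtv_len-element slice starting at off fits inside a
 * pool sized for nthreads slices */
static int in_bounds(size_t off, size_t dtv_len, size_t nthreads)
{
	return off + dtv_len <= dtv_len * nthreads;
}
```

for idx 0 both forms give offset 0, which matches the bug needing at
least two threads: only the second and later slices land outside the
pool.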