at the end of successful pthread_once, there was a race window during
which another thread calling pthread_once would momentarily change the
state back from 2 (finished) to 1 (in-progress). in this case, the
status was immediately changed back, but with no wake call, meaning
that waiters which arrived during this short window could block
forever. there are two possible fixes. one would be adding the wake to
the code path where it was missing. but it's better just to avoid
reverting the status at all, by using compare-and-swap instead of
swap.
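
a minimal sketch of the compare-and-swap approach, assuming C11 atomics and the state values named above (0 = not started, 1 = in progress, 2 = finished). the names once_demo, demo_init and demo_counter are hypothetical, and the futex wait/wake of a real implementation is replaced here by a simple yield loop:

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical state values mirroring the description above. */
enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

/* Sketch only: a real implementation would sleep on a futex and wake
 * waiters; here a waiter just yields and re-checks the state. */
static int once_demo(atomic_int *control, void (*init)(void))
{
    assert(control && init);
    for (;;) {
        int expected = ONCE_NEW;
        /* The CAS only ever moves 0 -> 1, so a state of 2 (finished)
         * is never overwritten by a late caller. */
        if (atomic_compare_exchange_strong(control, &expected, ONCE_RUNNING)) {
            init();
            atomic_store(control, ONCE_DONE);
            return 0;
        }
        if (expected == ONCE_DONE)
            return 0;
        sched_yield(); /* another thread is still running init */
    }
}

static int demo_counter;
static void demo_init(void) { demo_counter++; }
```

with an unconditional swap, a late caller would write 1 over an existing 2 before noticing the old value; the failed CAS above leaves the state alone, so no wake is needed on that path.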
Szabolcs Nagy [Mon, 14 Apr 2014 15:42:49 +0000 (17:42 +0200)]
fix RLIMIT_ constants for mips
The mips arch is special in that it uses different RLIMIT_
numbers than other archs, so allow bits/resource.h to override
the default RLIMIT_ numbers (empty on all archs except mips).
Reported by orc.
it will be needed to implement some things in sysconf, and the syscall
can't easily be used directly because the x32 syscall uses the wrong
structure layout. the l (uncreative, for "linux") prefix is used since
the symbol name __sysinfo is already taken for AT_SYSINFO from the aux
vector.
the way the x32 override of this function works is also changed to be
simpler and avoid the useless jump instruction.
in sysconf, use getrlimit function rather than raw syscall for rlimits
the syscall is deprecated (replaced by prlimit64) and does not work
correctly on x32. this change mildly increases size, but is likely
needed anyway for newer archs that might omit deprecated syscalls.
avoid linear-time if/else special cases in sysconf
the previous handling of cases that could not fit in the 16-bit table
or which required non-constant results was extremely ugly and could
not scale. the new code remaps these keys into a contiguous range
that's efficient for a switch statement.
aside from potentially offering better performance, this change is
needed since the old coprocessor-based approach to barriers is
deprecated in arm v7, and some compilers/assemblers issue errors when
using the deprecated instruction for v7 targets.
use hidden visibility rather than protected for syscall internals
the use of visibility at all is purely an optimization to avoid the
need for the caller to load the GOT register or similar to prepare for
a call via the PLT. there is no reason for these symbols to be
externally visible, so hidden works just as well as protected, and
using protected visibility is undesirable due to toolchain bugs and
the lack of testing it receives.
in particular, GCC's microblaze target is known to generate symbolic
relocations in the GOT for functions with protected visibility. this
in turn results in a dynamic linker which crashes under any nontrivial
usage that requires making a syscall before symbolic relocations are
processed.
Szabolcs Nagy [Fri, 11 Apr 2014 15:57:30 +0000 (17:57 +0200)]
math: fix aliasing violation in long double wrappers
modfl and sincosl were passing long double* instead of double*
to the wrapped double precision functions (on archs where long
double and double have the same size).
This is fixed now by using temporaries (this is not optimized
to a single branch so the generated code is a bit bigger).
Found by Morten Welinder.
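
A minimal sketch of the temporary-based wrapper described above. The name modfl_via_double is hypothetical, and the code is only equivalent to a real modfl on targets where long double and double share a representation; elsewhere it still runs but loses precision:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical wrapper name. Intended for targets where long double and
 * double have the same size, as in the scenario described above. */
static long double modfl_via_double(long double x, long double *iptr)
{
    double whole;                 /* temporary of the type modf expects */
    assert(iptr);
    long double frac = modf((double)x, &whole);
    *iptr = whole;                /* convert the value instead of aliasing */
    return frac;
}
```

Passing `(double *)iptr` directly would access a long double object through a double lvalue, which is the aliasing violation the commit removes.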
Timo Teräs [Thu, 10 Apr 2014 01:06:17 +0000 (21:06 -0400)]
fix search past the end of haystack in memmem
to optimize the search, memchr is used to find the first occurrence of
the first character of the needle in the haystack before switching to
a search for the full needle. however, the number of characters
skipped by this first step was not subtracted from the haystack
length, causing memmem to search past the end of the haystack.
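
a minimal sketch of the length adjustment, assuming a naive byte-by-byte search in place of the optimized algorithm a real libc uses. memmem_demo is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical naive search, used only to show the length bookkeeping. */
static void *memmem_demo(const void *h0, size_t hlen, const void *n0, size_t nlen)
{
    const unsigned char *h = h0, *n = n0;
    assert(h && n);
    if (!nlen) return (void *)h;
    if (hlen < nlen) return NULL;

    /* Skip ahead to the first occurrence of the needle's first byte. */
    const unsigned char *start = memchr(h, n[0], hlen);
    if (!start) return NULL;

    /* The step that was missing: shrink the remaining length by the
     * number of bytes skipped, so the search stops at the real end. */
    hlen -= (size_t)(start - h);
    h = start;

    for (; hlen >= nlen; h++, hlen--)
        if (!memcmp(h, n, nlen)) return (void *)h;
    return NULL;
}
```

if the subtraction is left out, searching the first 4 bytes of "xxabZZab" for "bZ" would report a match that extends past the end of the haystack.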
fix printf rounding with %g for some corner case midpoints
the subsequent rounding code assumes the end pointer (z) accurately
reflects the end of significance in the decimal expansion, but for
certain large integers, spurious trailing zero slots were left behind
when applying the binary exponent.
issue reported by Morten Welinder; the analysis of the cause was
performed by nsz, who also proposed this change.
the "m" constraint could give a memory reference with an offset that's
not compatible with ldrex/strex, so the arm-specific "Q" constraint is
needed instead.
use inline atomics and thread pointer on arm models supporting them
this is perhaps not the optimal implementation; a_cas still compiles
to nested loops due to the different interface contracts of the kuser
helper cas function (whose contract this patch implements) and the
a_cas function (whose contract mimics the x86 cmpxchg). fixing this
may be possible, but it's more complicated and thus deferred until a
later time.
aside from improving performance and code size, this patch also
provides a means of producing binaries which can run on hardened
kernels where the kuser helpers have been disabled. however, at
present this requires producing binaries for armv6k or later, which
will not run on older cpus. a real solution to the problem of kernels
that omit the kuser helpers would be runtime detection, so that
universal binaries which run on all arm cpu models can also be
compatible with all kernel hardening profiles. robust detection
however is a much harder problem, and will be addressed at a later
time.
in a sense this implementation is incomplete since it doesn't provide
the HWCAP_* macros for use with AT_HWCAP, which is perhaps the most
important intended usage case for getauxval. they will be added at a
later time.
fix failure of printf %g to strip trailing zeros in some cases
the code to strip trailing zeros was only looking in the last slot for
up to 9 zeros, assuming that the rounding code had already removed
fully-zero slots from the end. however, this ignored cases where the
rounding code did not run at all, which occur when the value being
printed is exactly representable in the requested precision.
the simplest solution is to move the code that strips trailing zero
slots to run unconditionally, immediately after rounding, rather than
as the last step of rounding.
fix carry into uninitialized slots during printf floating point rounding
in cases where rounding caused a carry, the slot into which the carry
was taking place was unconditionally treated as valid, despite the
possibility that it could be a new slot prior to the beginning of the
existing non-rounded number. in theory this could lead to unbounded
runaway carry, but in order for that to happen, the whole
uninitialized buffer would need to have been pre-filled with 32-bit
integer values greater than or equal to 999999999.
patch based on proposed fix by Morten Welinder, who also discovered
and reported the bug.
remove cruft left behind when lazy thread pointer init was removed
the function itself was static, but the weak alias provided an
externally visible reference and thus prevented the dead code from
being omitted from the output. so this change actually reduces bloat
in mandatory static-linked code.
sin [Thu, 27 Mar 2014 11:20:17 +0000 (11:20 +0000)]
remove struct elem entirely from hsearch.c
There are two changes here, both of which make sense to be done in a
single patch:
- Remove hash from struct elem and compute it at runtime wherever
necessary.
- Eliminate struct elem and use ENTRY directly.
As a result we cut down on the memory usage as each element in the
hash table now contains only an ENTRY not an ENTRY + size_t for the
hash. The downside is that the hash needs to be computed at runtime.
sin [Tue, 25 Mar 2014 16:37:51 +0000 (16:37 +0000)]
implement hcreate_r, hdestroy_r and hsearch_r
the size and alignment of struct hsearch_data are matched to the glibc
definition for binary compatibility. the members of the structure do
not match, which should not be a problem as long as applications
correctly treat the structure as opaque.
unlike the glibc implementation, this version of hcreate_r does not
require the caller to zero-fill the structure before use.
avoid malloc failure for small requests when brk can't be extended
this issue mainly affects PIE binaries and execution of programs via
direct invocation of the dynamic linker binary: depending on kernel
behavior, in these cases the initial brk may be placed at a location
where it cannot be extended, due to conflicting adjacent maps.
when brk fails, mmap is used instead to expand the heap. in order to
avoid expensive bookkeeping for managing fragmentation by merging
these new heap regions, the minimum size for new heap regions
increases exponentially in the number of regions. this limits the
number of regions, and thereby the number of fixed fragmentation
points, to a quantity which is logarithmic with respect to the size of
virtual address space and thus negligible. the exponential growth is
tuned so as to avoid expanding the heap by more than approximately 50%
of its current total size.
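
a rough sketch of an exponentially growing minimum region size. the page size, the growth rate (doubling every second region) and all demo_ names are assumptions chosen for illustration, not values taken from the text above:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE ((size_t)4096) /* assumed page size */

/* Hypothetical policy: the minimum size doubles every second region,
 * so it grows geometrically with the number of regions created. */
static size_t demo_min_region_size(unsigned region_index)
{
    assert(region_index / 2 < sizeof(size_t) * 8 - 12);
    return DEMO_PAGE_SIZE << (region_index / 2);
}

/* Round the request up to a whole page and apply the per-region
 * minimum. Overflow checks on the request are omitted in this sketch. */
static size_t demo_region_size(size_t request, unsigned region_index)
{
    size_t min = demo_min_region_size(region_index);
    size_t rounded = (request + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
    return rounded < min ? min : rounded;
}
```

because the minimum grows geometrically, only a logarithmic number of regions can be created before the minimum alone exceeds the address space.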
the kernel entry point for syscalls on microblaze nominally saves and
restores all registers, and testing on qemu always worked since qemu
behaves this way too. however, the real kernel treats r3:r4 as a
potential 64-bit return value from the syscall function, and copies
both over top of the saved registers before returning to userspace.
thus, we need to treat r4 as always-clobbered.
Timo Teräs [Tue, 25 Mar 2014 19:50:15 +0000 (21:50 +0200)]
remove lazy ssp initialization
now that thread pointer is initialized always, ssp canary
initialization can be done unconditionally. this simplifies
the ldso as it does not try to detect ssp usage, and the
init function itself as it is always called exactly once.
this also merges ssp init path for shared and static linking.
Timo Teräs [Tue, 25 Mar 2014 18:59:50 +0000 (20:59 +0200)]
clean up internal dynamic linker functions enumerating phdrs
record phentsize in struct dso, so the phdrs can be easily
enumerated via it. simplify all functions enumerating phdrs
to require only struct dso. also merge find_map_range and
find_dso to kernel_mapped_dso function that does both tasks
during single phdr enumeration.
Rich Felker [Mon, 24 Mar 2014 20:57:11 +0000 (16:57 -0400)]
always initialize thread pointer at program start
this is the first step in an overhaul aimed at greatly simplifying and
optimizing everything dealing with thread-local state.
previously, the thread pointer was initialized lazily on first access,
or at program startup if stack protector was in use, or at certain
random places where inconsistent state could be reached if it were not
initialized early. while believed to be fully correct, the logic was
fragile and non-obvious.
in the first phase of the thread pointer overhaul, support is retained
(and in some cases improved) for systems/situations where loading the
thread pointer fails, e.g. old kernels.
some notes on specific changes:
- the confusing use of libc.main_thread as an indicator that the
thread pointer is initialized is eliminated in favor of an explicit
has_thread_pointer predicate.
- sigaction no longer needs to ensure that the thread pointer is
initialized before installing a signal handler (this was needed to
prevent a situation where the signal handler caused the thread
pointer to be initialized and the subsequent sigreturn cleared it
again) but it still needs to ensure that implementation-internal
thread-related signals are not blocked.
- pthread tsd initialization for the main thread is deferred in a new
manner to minimize bloat in the static-linked __init_tp code.
- pthread_setcancelstate no longer needs special handling for the
situation before the thread pointer is initialized. it simply fails
on systems that cannot support a thread pointer, which are
non-conforming anyway.
- pthread_cleanup_push/pop now check for missing thread pointer and
nop themselves out in this case, so stdio no longer needs to avoid
the cancellable path when the thread pointer is not available.
a number of cases remain where certain interfaces may crash if the
system does not support a thread pointer. at this point, these should
be limited to pthread interfaces, and the number of such cases should
be fewer than before.
Rich Felker [Mon, 24 Mar 2014 03:19:30 +0000 (23:19 -0400)]
reduce static linking overhead from TLS support by inlining mmap syscall
the external mmap function is heavy because it has to handle error
reporting that the kernel cannot do, and has to do some locking for
arcane race-condition-avoidance purposes. for allocating initial TLS,
we do not need any of that; the raw syscall suffices.
on i386, this change shaves off 13% of the size of .text for the empty
program.
Rich Felker [Mon, 24 Mar 2014 00:42:05 +0000 (20:42 -0400)]
include header that declares __syscall_ret where it's defined
in general, we aim to always include the header that's declaring a
function before defining it so that the compiler can check that
prototypes match.
additionally, the internal syscall.h declares __syscall_ret with a
visibility attribute to improve code generation for shared libc (to
prevent gratuitous GOT-register loads). this declaration should be
visible at the point where __syscall_ret is defined, too, or the
inconsistency could theoretically lead to problems at link-time.
Rich Felker [Thu, 20 Mar 2014 08:15:47 +0000 (04:15 -0400)]
remove claim of XSI coverage from README
in addition to the dbm functions (which we don't intend to implement
anyway), fmtmsg is still missing too. rather than adding exceptions I
think it's best just to avoid making the claim.
Rich Felker [Thu, 20 Mar 2014 04:55:28 +0000 (00:55 -0400)]
update INSTALL file with new information and better advice
the text covering an ill-advised procedure for 'bootstrapping' a new
musl-based system in-place is removed. new information on targets and
compilers is added. formatting improved. the remaining text is
adjusted to cover both usage with musl-gcc on a non-musl-based system
and upgrading a musl-based system or toolchain.
Rich Felker [Wed, 19 Mar 2014 01:52:24 +0000 (21:52 -0400)]
fix size of mips jmp_buf
the excess space was unused and unintentional. this change does not
affect the ABI between applications and libc. while it does
theoretically affect linkage between third-party translation units
using jmp_buf as part of a structure, we've already changed jmp_buf at
least once on all archs, and problems were never observed, likely
because such usage would be very unusual. in any case it's best to get
things right now rather than making changes sometime during the 1.0.x
series or later.
Rich Felker [Tue, 18 Mar 2014 21:08:15 +0000 (17:08 -0400)]
use syscall_arg_t for arguments in public syscall() function
on x32, this change allows programs which use syscall() with pointers
or 64-bit values as arguments to work correctly, i.e. without
truncation or incorrect sign extension. on all other supported archs,
syscall_arg_t is defined as long, so this change is a no-op.
Rich Felker [Mon, 17 Mar 2014 21:38:22 +0000 (17:38 -0400)]
make configure accept alternate gcc tuples for x32
the previous pattern required "x32" to be used as the second field of
the gcc tuple, which is usually reserved for vendor use and not
appropriate as an ABI specifier. with this change, putting "x32" at
the end of the tuple, the way ABI specifiers are normally done, is
also permitted.
Rich Felker [Mon, 17 Mar 2014 04:25:23 +0000 (00:25 -0400)]
fix negated error codes from ptsname_r
the incorrect error codes also made their way into errno when
__ptsname_r was called by plain ptsname, which reports errors via
errno rather than a return value.
Bobby Bingham [Sun, 16 Mar 2014 21:17:28 +0000 (16:17 -0500)]
superh: fix dynamic linking of __fpscr_values
Applications ended up with copy relocations for this array, which
resulted in libc's references to this array pointing to the
application's copy. The dynamic linker, however, can require this array
before the application is relocated, and therefore before the
application's copy of this array is initialized. This resulted in
garbage being loaded into FPSCR before executing main, which violated
the ABI.
We fix this by putting the array in crt1 and making the libc copy
private. This prevents libc's reference to the array from pointing to
an uninitialized copy in the application.
rofl0r [Thu, 13 Mar 2014 19:27:55 +0000 (20:27 +0100)]
semctl: fix UB causing crashes on powerpc
it's UB to fetch variadic args when none are passed, and this caused
real crashes on ppc due to its calling convention, which defines that
for variadic functions aggregate types be passed as pointers.
the assignment caused that pointer to get dereferenced, resulting in
a crash.
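
a minimal sketch of the safe pattern, assuming a made-up control function with two commands. demo_ctl, union demo_arg and the DEMO_CMD_ constants are hypothetical stand-ins for semctl and its commands:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical commands and argument type, standing in for semctl's. */
enum { DEMO_CMD_NOARG = 0, DEMO_CMD_SETVAL = 1 };
union demo_arg { int val; void *buf; };

static int demo_ctl(int cmd, ...)
{
    union demo_arg arg = { 0 };
    assert(cmd == DEMO_CMD_NOARG || cmd == DEMO_CMD_SETVAL);

    /* Only read the variadic slot for commands documented to take an
     * argument; reading it when none was passed is undefined. */
    if (cmd == DEMO_CMD_SETVAL) {
        va_list ap;
        va_start(ap, cmd);
        arg = va_arg(ap, union demo_arg);
        va_end(ap);
    }
    return arg.val;
}
```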
Szabolcs Nagy [Wed, 12 Mar 2014 14:59:09 +0000 (15:59 +0100)]
fix statfs struct on mips
The mips statfs struct layout is different than on other archs, so the
statfs, fstatfs, statvfs and fstatvfs APIs were broken on mips.
Now the ordering is fixed, and the types are kept consistent with other archs.
Rich Felker [Tue, 11 Mar 2014 21:01:34 +0000 (17:01 -0400)]
fix sysvipc structures on powerpc
these have been wrong for a long time and were never detected or
corrected. powerpc needs some gratuitous extra padding/reserved slots
in ipc_perm, big-endian ordering for the padding of time_t slots that
was intended by the kernel folks to allow a transition to 64-bit
time_t, and some minor gratuitous reordering of struct members.
Rich Felker [Tue, 11 Mar 2014 19:27:13 +0000 (15:27 -0400)]
move struct semid_ds from shared sys/sem.h to bits
the definition was found to be incorrect at least for powerpc, and
fixing this cleanly requires making the definition arch-specific. this
will allow cleaning up the definition for other archs to make it more
specific, and reversing some of the ugliness (time_t hacks) introduced
with the x32 port.
this first commit simply copies the existing definition to each arch
without any changes. this is intentional, to make it easier to review
changes made on a per-arch basis.
Rich Felker [Sun, 9 Mar 2014 07:09:49 +0000 (03:09 -0400)]
fix incorrect rounding in printf floating point corner cases
the printf floating point formatting code contains an optimization to
avoid computing digits that will be thrown away by rounding at the
specified (or default) precision. while it was correctly retaining all
places up to the last decimal place to be printed, it was not
retaining enough precision to see the next nonzero decimal place in
all cases. this could cause incorrect rounding down in round-to-even
(default) rounding mode, for example, when printing 0.5+DBL_EPSILON
with "%.0f".
in the fix, LDBL_MANT_DIG/3 is a lazy (non-sharp) upper bound on the
number of zeros between any two nonzero decimal digits.
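
a small check of the case mentioned above, assuming the default round-to-nearest mode and a libc whose printf rounds correctly. demo_formats_as is a hypothetical helper:

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does "%.0f" applied to x produce `expected`? */
static int demo_formats_as(double x, const char *expected)
{
    char buf[32];
    assert(expected);
    snprintf(buf, sizeof buf, "%.0f", x);
    return strcmp(buf, expected) == 0;
}
```

0.5 is an exact tie and rounds to the even neighbor 0, while 0.5+DBL_EPSILON lies strictly above the midpoint and should print as 1. the formatter can only tell the two apart if it keeps enough digits to see the next nonzero decimal place.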
Rich Felker [Sun, 9 Mar 2014 06:38:52 +0000 (01:38 -0500)]
fix buffer overflow in printf formatting of denormals with low bit set
empirically the overflow was an off-by-one, and it did not seem to be
overwriting meaningful data. rather than simply increasing the buffer
size by one, however, I have attempted to make the size obviously
correct in terms of bounds on the number of iterations for the loops
that fill the buffer. this still results in no more than a negligible
size increase of the buffer on the stack (6-7 32-bit slots) and is a
"safer" fix unless/until somebody wants to do the proof that a smaller
buffer would suffice.
Rich Felker [Sat, 8 Mar 2014 05:50:19 +0000 (00:50 -0500)]
in sys/procfs.h, avoid using __WORDSIZE macro
this was problematic because several archs don't define __WORDSIZE. we
could add it, but I would rather phase this macro out in the long
term. in our version of the headers, UINTPTR_MAX is available here, so
just use it instead.
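
a minimal sketch of selecting on UINTPTR_MAX in the preprocessor. DEMO_PTR_BITS and demo_ptr_bits are hypothetical names, and the sketch assumes 8-bit bytes:

```c
#include <assert.h>
#include <stdint.h>

/* DEMO_PTR_BITS is a hypothetical name, not one used by a real header. */
#if UINTPTR_MAX == 0xffffffffu
#define DEMO_PTR_BITS 32
#else
#define DEMO_PTR_BITS 64
#endif

static int demo_ptr_bits(void)
{
    /* Sanity check that the preprocessor choice matches the type. */
    assert(sizeof(uintptr_t) * 8 == DEMO_PTR_BITS);
    return DEMO_PTR_BITS;
}
```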
Rich Felker [Sat, 8 Mar 2014 04:56:48 +0000 (23:56 -0500)]
in fcntl, use unsigned long instead of long for variadic argument type
neither is correct; different commands take different argument types,
and some take no arguments at all. I have a much larger overhaul of
fcntl prepared to address this, but it's not appropriate to commit
during freeze.
the immediate problem being addressed affects forward-compatibility on
x32: if new commands are added and they take pointers, but the
libc-level fcntl function is not aware of them, using long would
sign-extend the pointer to 64 bits and give the kernel an invalid
pointer. on the kernel side, the argument to fcntl is always treated
as unsigned long, so no harm is done by treating possibly-signed
integer arguments as unsigned. for every command that takes an integer
argument except for F_SETOWN, large integer arguments and negative
arguments are handled identically anyway. in the case of F_SETOWN, the
kernel is responsible for converting the argument which it received as
unsigned long to int, so the sign of negative arguments is recovered.
the other problem that will be addressed later is that the type passed
to va_arg does not match the type in the caller of fcntl. an advanced
compiler doing cross-translation-unit analysis could potentially see
this mismatch and issue warnings or otherwise make trouble.
on i386, this patch was confirmed not to alter the code generated by
gcc 4.7.3. in principle the generated code should not be affected on
any arch except x32.
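
a minimal model of the sign-extension problem, assuming an x32-like ABI with a 32-bit long and 64-bit syscall argument registers. the demo_ names are hypothetical, and the conversion of an out-of-range value to int32_t is implementation-defined in C (it wraps on the usual compilers):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the ABI: 32-bit long in userspace, 64-bit argument slots. */
typedef int32_t  demo_long;
typedef uint32_t demo_ulong;
static_assert(sizeof(demo_long) == 4, "models a 32-bit long");

/* Widening through the signed type copies bit 31 into the upper half. */
static uint64_t demo_widen_signed(uint32_t bits)
{
    return (uint64_t)(int64_t)(demo_long)bits;
}

/* Widening through the unsigned type fills the upper half with zeros. */
static uint64_t demo_widen_unsigned(uint32_t bits)
{
    return (uint64_t)(demo_ulong)bits;
}
```

a pointer with bit 31 set, such as 0x80001000, only survives intact on the unsigned path; values below 0x80000000 are the same either way.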
rofl0r [Wed, 5 Mar 2014 23:26:03 +0000 (00:26 +0100)]
x32: fix sysinfo()
the kernel uses long longs in the struct, but the documentation
says they're long. so we need to fixup the mismatch between the
userspace and kernelspace structs.
since the struct offers a mem_unit member, we can avoid truncation
by adjusting that value.
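
a rough sketch of the mem_unit adjustment, assuming reduced structures with only two counters. the struct and function names are hypothetical and the real structures carry more fields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced structs; the real ones have many more fields. */
struct demo_kernel_info { uint64_t totalram, freeram; uint32_t mem_unit; };
struct demo_user_info   { uint32_t totalram, freeram; uint32_t mem_unit; };

static void demo_fixup(const struct demo_kernel_info *k, struct demo_user_info *u)
{
    assert(k && u && k->mem_unit);
    uint64_t total = k->totalram, free_ = k->freeram;
    uint32_t unit = k->mem_unit;

    /* Halve the counts and double the unit until both counts fit in
     * 32 bits. Bits shifted out are lost, so counts round down. */
    while ((total | free_) > UINT32_MAX) {
        assert(unit <= UINT32_MAX / 2);
        total >>= 1;
        free_ >>= 1;
        unit <<= 1;
    }
    u->totalram = (uint32_t)total;
    u->freeram  = (uint32_t)free_;
    u->mem_unit = unit;
}
```

callers that multiply each counter by mem_unit, as the interface intends, still obtain the original byte counts up to that rounding.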
Rich Felker [Wed, 5 Mar 2014 21:08:56 +0000 (16:08 -0500)]
fix strerror on mips: one error code is out of the 8-bit table range
if we ever encounter other targets where error codes don't fit in the
8-bit range, the table should probably just be bumped to 16-bit, but
for now I don't want to increase the table size on all archs just
because of a bug in the mips abi.
Rich Felker [Fri, 28 Feb 2014 18:12:40 +0000 (13:12 -0500)]
improve configure's target arch matching
most notably, it was failing to match sh4-*, etc., but in general the
explicit matching of hyphens for some archs was problematic because it
failed to accept simply the musl-style arch name (without a gcc-style
tuple) as an input. the original motivation of matching hyphens was to
prevent incorrectly identifying a 64-bit arch as the corresponding
32-bit arch (e.g. mips* matching mips64) but this is easily fixed by
simply checking (and for now, rejecting as unsupported) the relevant
64-bit archs.
Rich Felker [Fri, 28 Feb 2014 03:03:25 +0000 (22:03 -0500)]
rename superh port to "sh" for consistency
linux, gcc, etc. all use "sh" as the name for the superh arch. there
was already some inconsistency internally in musl: the dynamic linker
was searching for "ld-musl-sh.path" as its path file despite its own
name being "ld-musl-superh.so.1". there was some sentiment in both
directions as to how to resolve the inconsistency, but overall "sh"
was favored.
Rich Felker [Tue, 25 Feb 2014 18:05:38 +0000 (13:05 -0500)]
fix readdir not to set ENOENT when directory is removed while reading
per POSIX, ENOENT is reserved for invalid stream position; it is an
optional error and would only happen if the application performs
invalid seeks on the underlying file descriptor. however, linux's
getdents syscall also returns ENOENT if the directory was removed
between the time it was opened and the time of the read. we need to
catch this case and remap it to simple end-of-file condition (null
pointer return value like an error, but no change to errno). this
issue reportedly affects GNU make in certain corner cases.
rather than backing up and restoring errno, I've just changed the
syscall to be made in a way that doesn't affect errno (via an inline
syscall rather than a call to the __getdents function). the latter
still exists for the purpose of providing the public getdents alias
which sets errno.
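
a minimal sketch of the remapping, assuming the raw syscall result is available as a byte count or a negated errno value. demo_classify and the DEMO_ result codes are hypothetical:

```c
#include <assert.h>
#include <errno.h>

enum demo_result { DEMO_ENTRIES, DEMO_EOF, DEMO_ERROR };

/* raw is a hypothetical raw syscall result: a byte count on success or
 * a negated errno value on failure, so errno is untouched so far. */
static enum demo_result demo_classify(long raw)
{
    assert(raw >= -4095); /* Linux error returns lie in [-4095, -1] */
    if (raw > 0) return DEMO_ENTRIES;
    /* 0 is ordinary end-of-directory; -ENOENT means the directory was
     * removed while open. Both become end-of-file, errno unchanged. */
    if (raw == 0 || raw == -ENOENT) return DEMO_EOF;
    errno = (int)-raw; /* genuine failure: only now is errno set */
    return DEMO_ERROR;
}
```

working from the raw result avoids having to save and restore errno around a wrapper that sets it unconditionally.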
Szabolcs Nagy [Mon, 24 Feb 2014 22:16:29 +0000 (23:16 +0100)]
mips: add mips-sf subarch support (soft-float)
Userspace emulated floating-point (gcc -msoft-float) is not compatible
with the default mips abi (which assumes an FPU or in-kernel emulation of it).
Soft and hard float abi should not be mixed; __mips_soft_float is checked
in musl's configure script and there is no runtime check. The -sf subarch
does not save/restore floating-point registers in setjmp/longjmp and only
provides dummy fenv implementation.
rofl0r [Mon, 24 Feb 2014 21:49:42 +0000 (22:49 +0100)]
fixup general __syscall breakage introduced in x32 port
the reordering of headers caused some risc archs to not see
the __syscall declaration anymore.
this caused build errors on mips with any compiler,
and on arm and microblaze with clang.
we now declare it locally just like the powerpc port does.
rofl0r [Sun, 23 Feb 2014 15:36:43 +0000 (16:36 +0100)]
fix some issues in x32 syscall_cp_fixup
- the nanosleep fixup "fixed" the second timespec* argument erroneously.
- the futex fixup was missing the check for FUTEX_WAIT.
- general cleanup using a macro.
rofl0r [Tue, 7 Jan 2014 22:30:30 +0000 (23:30 +0100)]
configure: recognize x86_64-x32 and x32
x32 is the internal arch name, but glibc uses x86_64-x32.
there doesn't exist a specific triple for x32 in gcc and binutils.
you're supposed to build your compiler for x86_64 and configure
it with multilib support for "mx32".
however it turns out that using a triple of x86_64-x32 makes
gcc and binutils pick up the right arch (they detect it as x86_64)
and allows us to have a unique triple for cross-compiler toolchains.
rofl0r [Tue, 7 Jan 2014 15:49:23 +0000 (16:49 +0100)]
internal/syscall.h: add syscall_arg_t macro
some 32-on-64 archs require that the actual syscall args be long long.
in that case syscall_arch.h can define syscall_arg_t to whatever it needs
and syscall.h picks it up.
all other archs just use long as usual.
rofl0r [Tue, 7 Jan 2014 02:31:34 +0000 (03:31 +0100)]
internal/syscall.h: use a macro for the syscall args casts
this allows syscall_arch.h to define the macro __scc if special
casting is needed, as is the case for x32, where the actual syscall
arguments are 64bit, but, in case of pointers, would get sign-extended
and thus become invalid.
Rich Felker [Sat, 22 Feb 2014 03:25:26 +0000 (22:25 -0500)]
add fallback emulation for accept4 on old kernels
the other atomic FD_CLOEXEC interfaces (dup3, pipe2, socket) already
had such emulation in place. the justification for doing the emulation
here is the same as for the other functions: it allows applications to
simply use accept4 rather than having to have their own fallback code
for ENOSYS/EINVAL (which one you get is arch-specific!) and there is
no reasonable way an application could benefit from knowing the
operation is emulated/non-atomic since there is no workaround at the
application level for non-atomicity (that is the whole reason these
interfaces were added).
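
a rough sketch of what such an emulation can look like, assuming a Linux-like system that defines SOCK_CLOEXEC and SOCK_NONBLOCK. demo_accept4 is a hypothetical name, and this version is called directly rather than only after the real syscall reports ENOSYS/EINVAL:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical fallback: plain accept followed by fcntl. The flags are
 * applied after the descriptor exists, so this is not atomic. */
static int demo_accept4(int fd, struct sockaddr *addr, socklen_t *len, int flags)
{
    if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) {
        errno = EINVAL;
        return -1;
    }
    int conn = accept(fd, addr, len);
    if (conn < 0) return -1;
    if (flags & SOCK_CLOEXEC)
        fcntl(conn, F_SETFD, FD_CLOEXEC);
    /* F_SETFL replaces the status flags; acceptable for a new socket. */
    if (flags & SOCK_NONBLOCK)
        fcntl(conn, F_SETFL, O_NONBLOCK);
    return conn;
}
```

between accept and fcntl another thread may fork and exec, leaking the descriptor; that window is the non-atomicity the text refers to, and an application has no way to close it on its own.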
Rich Felker [Thu, 13 Feb 2014 17:24:40 +0000 (12:24 -0500)]
fix typo in table for getprotoent that caused out-of-bound reads
this was unlikely to lead to any crash or dangerous behavior, but
caused adjacent string constants to be treated as part of the
protocols table, possibly returning nonsensical results for unknown
protocol names/numbers or when getprotoent was called in a loop to
enumerate all protocols.