Rich Felker [Fri, 24 Jun 2011 02:04:06 +0000 (22:04 -0400)]
prepare support for LD_LIBRARY_PATH (checking suid/sgid safety)
the use of this test will be much stricter than glibc and other
typical implementations; the environment will not be honored
whatsoever unless the program is confirmed non-suid/sgid by the aux
vector the kernel passed in. no fallback to slow syscall-based
checking is used if the kernel fails to provide the information; we
simply assume the worst (suid) in this case and refuse to honor
environment.
Rich Felker [Sat, 18 Jun 2011 23:48:42 +0000 (19:48 -0400)]
experimental dynamic linker!
some notes:
- library search path is hard coded
- x86_64 code is untested and may not work
- dlopen/dlsym is not yet implemented
- relocations in read-only memory won't work
Rich Felker [Tue, 14 Jun 2011 05:35:51 +0000 (01:35 -0400)]
fix race condition in pthread_kill
if thread id was reused by the kernel between the time pthread_kill
read it from the userspace pthread_t object and the time of the tgkill
syscall, a signal could be sent to the wrong thread. the tgkill
syscall was supposed to prevent this race (versus the old tkill
syscall) but it can't; it can only help in the case where the tid is
reused in a different process, but not when the tid is reused in the
same process.
the only solution i can see is an extra lock to prevent threads from
exiting while another thread is trying to pthread_kill them. it should
be very very cheap in the non-contended case.
Rich Felker [Sun, 12 Jun 2011 19:58:15 +0000 (15:58 -0400)]
floating point environment, untested
at present the i386 code does not support sse floating point, which is
not part of the standard i386 abi. while it may be desirable to
support it later, doing so will reduce performance and require some
tricks to probe if sse support is present.
this first commit is i386-only, but it should be trivial to port the
asm to x86_64.
Rich Felker [Sun, 12 Jun 2011 14:53:42 +0000 (10:53 -0400)]
malloc: cast size down to int in bin_index functions
even if size_t was 32-bit already, the fact that the value was
unsigned and that gcc is too stupid to figure out it would be positive
as a signed quantity (due to the immediately-prior arithmetic and
conditionals) results in gcc compiling the integer-to-float conversion
as zero extension to 64 bits followed by an "fildll" (64 bit)
instruction rather than a simple "fildl" (32 bit) instruction on x86.
reportedly fildll is very slow on certain p4-class machines; even if
not, the new code is slightly smaller.
Rich Felker [Tue, 7 Jun 2011 15:26:42 +0000 (11:26 -0400)]
use __WCHAR_TYPE__ on i386 if it is defined
unfortunately traditional i386 practice was to use "long" rather than
"int" for wchar_t, despite the latter being much more natural and
logical. we followed this practice, but it seems some compilers (clang
and maybe certain gcc builds or others too..?) have switched to using
int, resulting in spurious pointer type mismatches when L"..." wide
strings are used. the best solution I could find is to use the
compiler's definition of wchar_t if it exists, and otherwise fallback
to the traditional definition.
there's no point in duplicating this approach on 64-bit archs, as
their only 32-bit type is int.
Rich Felker [Mon, 6 Jun 2011 22:04:28 +0000 (18:04 -0400)]
fix handling of d_name in struct dirent
basically there are 3 choices for how to implement this variable-size
string member:
1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
2. old way: length-1 string: generates array bounds warnings in caller.
3. new way: length-NAME_MAX string. no problems, simplifies all code.
of course the usable part in the pointer returned by readdir might be
shorter than NAME_MAX+1 bytes, but that is allowed by the standard and
doesn't hurt anything.
Rich Felker [Sun, 5 Jun 2011 23:29:52 +0000 (19:29 -0400)]
safety fix for glob's vla usage: disallow patterns longer than PATH_MAX
this actually inadvertently disallows some valid patterns with
redundant / or * characters, but it's better than allowing unbounded
vla allocation.
eventually i'll write code to move the pattern to the stack and
eliminate redundancy to ensure that it fits in PATH_MAX at the
beginning of glob. this would also allow it to be modified in place
for passing to fnmatch rather than copied at each level of recursion.
Rich Felker [Mon, 30 May 2011 15:31:07 +0000 (11:31 -0400)]
implement pthread_[sg]etconcurrency.
there is a resource limit of 0 bits to store the concurrency level
requested. thus any positive level exceeds a resource limit, resulting
in EAGAIN. :-)
Rich Felker [Wed, 11 May 2011 23:58:03 +0000 (19:58 -0400)]
fix the last known rounding bug in floating point printing
the observed symptom was that the code was incorrectly rounding up
1.0625 to 1.063 despite the rounding mode being round-to-nearest with
ties broken by rounding to even last place. however, the code was just
not right in many respects, and i'm surprised it worked as well as it
did. this time i tested the values that end up in the variables round,
small, and the expression round+small, and all look good.
Rich Felker [Sun, 8 May 2011 03:23:58 +0000 (23:23 -0400)]
overhaul implementation-internal signal protections
the new approach relies on the fact that the only ways to create
sigset_t objects without invoking UB are to use the sig*set()
functions, or from the masks returned by sigprocmask, sigaction, etc.
or in the ucontext_t argument to a signal handler. thus, as long as
sigfillset and sigaddset avoid adding the "protected" signals, there
is no way the application will ever obtain a sigset_t including these
bits, and thus no need to add the overhead of checking/clearing them
when sigprocmask or sigaction is called.
note that the old code actually *failed* to remove the bits from
sa_mask when sigaction was called.
the new implementations are also significantly smaller, simpler, and
faster due to ignoring the useless "GNU HURD signals" 65-1024, which
are not used and, if there's any sanity in the world, never will be
used.
Rich Felker [Sat, 7 May 2011 01:45:48 +0000 (21:45 -0400)]
reduce some ridiculously large spin counts
these should be tweaked according to testing. offhand i know 1000 is
too low and 5000 is likely to be sufficiently high. consider trying to
add futexes to file locking, too...
Rich Felker [Sat, 7 May 2011 00:00:59 +0000 (20:00 -0400)]
completely new barrier implementation, addressing major correctness issues
the previous implementation had at least 2 problems:
1. the case where additional threads reached the barrier before the
first wave was finished leaving the barrier was untested and seemed
not to be working.
2. threads leaving the barrier continued to access memory within the
barrier object after other threads had successfully returned from
pthread_barrier_wait. this could lead to memory corruption or crashes
if the barrier object had automatic storage in one of the waiting
threads and went out of scope before all threads finished returning,
or if one thread unmapped the memory in which the barrier object
lived.
the new implementation avoids both problems by making the barrier
state essentially local to the first thread which enters the barrier
wait, and forces that thread to be the last to return.
Rich Felker [Mon, 2 May 2011 13:18:03 +0000 (09:18 -0400)]
fix fclose return status logic, again
the previous fix was incorrect, as it would prevent f->close(f) from
being called if fflush(f) failed. i believe this was the original
motivation for using | rather than ||. so now let's just use a second
statement to constrain the order of function calls, and to back to
using |.
Rich Felker [Mon, 2 May 2011 02:16:04 +0000 (22:16 -0400)]
workaround for preprocessor bug in pcc
with this patch, musl compiles and mostly works with pcc 1.0.0. a few
tests are still failing and i'm uncertain whether they are due to
portability problems in musl, or bugs in pcc, but i suspect the
latter.
use compiler builtins for variadic macros when available
this slightly cuts down on the degree musl "fights with" gcc, but more
importantly, it fixes a critical bug when gcc inlines a variadic
function and optimizes out the variadic arguments due to noticing that
they were "not used" (by __builtin_va_arg).
we leave the old code in place if __GNUC__ >= 3 is false; it seems
like it might be necessary at least for tinycc support and perhaps if
anyone ever gets around to fixing gcc 2.95.3 enough to make it work..
replace heap sort with smoothsort implementation by Valentin Ochs
Smoothsort is an adaptive variant of heapsort. This version was
written by Valentin Ochs (apo) specifically for inclusion in musl. I
worked with him to get it working in O(1) memory usage even with giant
array element widths, and to optimize it heavily for size and speed.
It's still roughly 4 times as large as the old heap sort
implementation, but roughly 20 times faster given an almost-sorted
array of 1M elements (20 being the base-2 log of 1M), i.e. it really
does reduce O(n log n) to O(n) in the mostly-sorted case. It's still
somewhat slower than glibc's Introsort for random input, but now
considerably faster than glibc when the input is already sorted, or
mostly sorted.
strictly speaking this and a few other ops should be factored into
asm.h or the file should just be renamed to asm.h, but whatever. clean
it up someday.
1. failed match of literal chars from the format string would always
return matching failure rather than input failure at eof, leading to
infinite loops in some programs.
2. unread of eof would wrongly adjust the character counts reported by
%n, yielding an off-by-one error.
fix minor bugs due to incorrect threaded-predicate semantics
some functions that should have been testing whether pthread_self()
had been called and initialized the thread pointer were instead
testing whether pthread_create() had been called and actually made the
program "threaded". while it's unlikely any mismatch would occur in
real-world problems, this could have introduced subtle bugs. now, we
store the address of the main thread's thread descriptor in the libc
structure and use its presence as a flag that the thread register is
initialized. note that after fork, the calling thread (not necessarily
the original main thread) is the new main thread.
the linux documentation for dup2 says it can fail with EBUSY due to a
race condition with open and dup in the kernel. shield applications
(and the rest of libc) from this nonsense by looping until it succeeds