this race is fundamentally due to linux's bogus requirement that
userspace, rather than kernelspace, fill in the siginfo structure. an
intervening signal handler that calls fork could cause both the parent
and child process to send signals claiming to be from the parent,
which could in turn have harmful effects depending on what the
recipient does with the signal. we simply block all signals for the
interval between getuid and sigqueue syscalls (much like what raise()
does already) to prevent the race and make the getuid/sigqueue pair
atomic.
this will be a non-issue if linux is fixed to validate the siginfo
structure or fill it in from kernelspace.
fix some bugs in setxid and update setrlimit to use __synccall
setrlimit is supposed to be per-process, not per-thread, but again
linux gets it wrong. work around this in userspace. not only is it
needed for correctness; setxid also depends on the resource limits for
all threads being the same to avoid situations where temporarily
unlimiting the limit succeeds in some threads but fails in others.
previously, stdio used spinlocks, which would be unacceptable if we
ever add support for thread priorities, and which yielded
pathologically bad performance if an application attempted to use
flockfile on a key file as a major/primary locking mechanism.
i had held off on making this change for fear that it would hurt
performance in the non-threaded case, but actually support for
recursive locking had already inflicted that cost. by having the
internal locking functions store a flag indicating whether they need
to perform unlocking, rather than using the actual recursive lock
counter, i was able to combine the conditionals at unlock time,
eliminating any additional cost, and also avoid a nasty corner case
where a huge number of calls to ftrylockfile could cause deadlock
later at the point of internal locking.
this commit also fixes some issues with usage of pthread_self
conflicting with __attribute__((const)) which resulted in crashes with
some compiler versions/optimizations, mainly in flockfile prior to
pthread_create.
changing credentials in a multi-threaded program is extremely
difficult on linux because it requires synchronizing the change
between all threads, which have their own thread-local credentials on
the kernel side. this is further complicated by the fact that changing
the real uid can fail due to exceeding RLIMIT_NPROC, making it
possible that the syscall will succeed in some threads but fail in
others.
the old __rsyscall approach being replaced was robust in that it would
report failure if any one thread failed, but in this case, the program
would be left in an inconsistent state where individual threads might
have different uid. (this was not as bad as glibc, which would
sometimes even fail to report the failure entirely!)
the new approach being committed refuses to change real user id when
it cannot temporarily set the rlimit to infinity. this is completely
POSIX conformant since POSIX does not require an implementation to
allow real-user-id changes for non-privileged processes whatsoever.
still, setting the real uid can fail due to memory allocation in the
kernel, but this can only happen if there is not already a cached
object for the target user. thus, we forcibly serialize the syscalls
attempts, and fail the entire operation on the first failure. this
*should* lead to an all-or-nothing success/failure result, but it's
still fragile and highly dependent on kernel developers not breaking
things worse than they're already broken.
ideally linux will eventually add a CLONE_USERCRED flag that would
give POSIX conformant credential changes without any hacks from
userspace, and all of this code would become redundant and could be
removed ~10 years down the line when everyone has abandoned the old
broken kernels. i'm not holding my breath...
instead of creating temp dso objects on the stack and moving them to
the heap if dlopen/dlsym are used, use static objects to begin with,
and just donate them to malloc if we no longer need them.
these changes also make it so clock_gettime(CLOCK_REALTIME, &ts) works
even on pre-2.6 kernels, emulated via the gettimeofday syscall. there
is no cost for the fallback check, as it falls under the error case
that already must be checked for storing the error code in errno, but
which would normally be hidden inside __syscall_ret.
we cannot report failure after forking, so the idea is to ensure prior
to fork that fd 0,1,2 exist. this will prevent dup2 from possibly
hitting a resource limit and failing in the child process. fcntl
rather than dup2 is used prior to forking to avoid race conditions.
fread was calling f->read without checking that the file was in
reading mode. this could:
1. crash, if f->read was a null pointer
2. cause unwanted blocking on a terminal already at eof
3. allow reading on a write-only file
stopping without letting the parser see a stop character prevented
getting a result. so treat all high chars as the null character and
pass them into the parser.
also eliminated ugly tmp var using compound literals.
this fixes a number of bugs in integer parsing due to lazy haphazard
wrapping, as well as some misinterpretations of the standard. the new
parser is able to work character-at-a-time or on whole strings, making
it easy to support the wide functions without unbounded space for
conversion. it will also be possible to update scanf to use the new
parser.
STREAMS are utterly useless as far as I can tell, but some software
was apparently broken by the presence of stropts.h but lack of macros
it's supposed to define...
Rich Felker [Thu, 30 Jun 2011 20:31:43 +0000 (16:31 -0400)]
catch invalid ld80 bit patterns and treat them as nan
this should not be necessary - the invalid bit patterns cannot be
created except through type punning. however, some broken gnu software
is passing them to printf and triggering dangerous stack-smashing, so
let's catch them anyway...
Rich Felker [Thu, 30 Jun 2011 15:42:33 +0000 (11:42 -0400)]
implement the nonstandard GNU function fpurge
this is a really ugly and backwards function, but its presence will
prevent lots of broken gnulib software from trying to define its own
version of fpurge and thereby failing to build or worse.
Rich Felker [Wed, 29 Jun 2011 04:42:42 +0000 (00:42 -0400)]
work around linux bug in mprotect
per POSIX: The mprotect() function shall change the access protections
to be that specified by prot for those whole pages containing any part
of the address space of the process starting at address addr and
continuing for len bytes.
on the other hand, linux mprotect fails with EINVAL if the base
address and/or length is not page-aligned, so we have to align them
before making the syscall.
Rich Felker [Tue, 28 Jun 2011 18:20:41 +0000 (14:20 -0400)]
use load address from elf header if possible
this is mostly useless for shared libs (though it could help for
prelink-like purposes); the intended use case is for adding support
for calling the dynamic linker directly to run a program, as in:
./libc.so ./a.out foo
Rich Felker [Mon, 27 Jun 2011 01:21:04 +0000 (21:21 -0400)]
fix resolving symbols in objects loaded in RTLD_LOCAL mode
basically we temporarily make the library and all its dependencies
part of the global namespace but only for the duration of performing
relocations, then return them to their former state.
Rich Felker [Sun, 26 Jun 2011 21:39:17 +0000 (17:39 -0400)]
error handling in dynamic linking
some of the code is not yet used, and is in preparation for dlopen
which needs to be able to handle failure loading libraries without
terminating the program.
Rich Felker [Fri, 24 Jun 2011 02:04:06 +0000 (22:04 -0400)]
prepare support for LD_LIBRARY_PATH (checking suid/sgid safety)
the use of this test will be much stricter than glibc and other
typical implementations; the environment will not be honored
whatsoever unless the program is confirmed non-suid/sgid by the aux
vector the kernel passed in. no fallback to slow syscall-based
checking is used if the kernel fails to provide the information; we
simply assume the worst (suid) in this case and refuse to honor
environment.
Rich Felker [Sat, 18 Jun 2011 23:48:42 +0000 (19:48 -0400)]
experimental dynamic linker!
some notes:
- library search path is hard coded
- x86_64 code is untested and may not work
- dlopen/dlsym is not yet implemented
- relocations in read-only memory won't work