clean and simple, but fails when the caller does not have permissions
to open the file for reading or when /proc is not available. i may
replace this with a full implementation later, possibly leaving this
version as an optimization to use when it works.
debloat: use __syscall instead of syscall where possible
don't waste time (and significant code size due to function call
overhead!) setting errno when the result of a syscall does not matter
or when it can't fail.
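for illustration, the extra work an errno-setting wrapper does on every call looks roughly like this. it is only a sketch: the name ret_to_errno is hypothetical, and it assumes the linux convention that raw syscall returns in the range -4095..-1 are negated error codes.

```c
#include <errno.h>

/* hypothetical helper: turn a raw kernel return value into the usual
 * -1/errno convention. assumes linux, where raw returns in the range
 * [-4095, -1] are negated error codes. */
static long ret_to_errno(unsigned long r)
{
	if (r > -4096UL) {
		errno = (int)-r;
		return -1;
	}
	return (long)r;
}
```

a caller that ignores the result, or that makes a syscall which cannot fail, gains nothing from this step and can skip it.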
x86_64 was just plain wrong in the cancel-flag-already-set path, and
was crashing.
the more subtle error was not clearing the saved stack pointer before
returning to c code. this could result in the signal handler
misidentifying c code as the pre-syscall part of the asm and acting
on cancellation at the wrong time, leading to resource-leak race
conditions.
also, now __cancel (in the c code) is responsible for clearing the
saved sp in the already-cancelled branch. this means we have to use
call rather than jmp to ensure the stack pointer in the c code will
never match what the asm saved.
the goal is to be able to use pthread_setcancelstate internally in
the implementation, whenever a function might want to use functions
which are cancellation points but avoid becoming a cancellation point
itself. i could have just used a separate internal function for
temporarily inhibiting cancellation, but the solution in this commit
is better because (1) it's one less implementation-specific detail in
functions that need to use it, and (2) application code can also get
the same benefit.
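a minimal sketch of the usage pattern described above, using only the standard pthread_setcancelstate interface. the wrapper name is hypothetical and write() is just an example of a cancellation point.

```c
#include <pthread.h>
#include <unistd.h>

/* hypothetical helper: write() is a cancellation point, but this
 * wrapper is not, because cancellation is disabled around the call. */
static ssize_t write_nocancel_sketch(int fd, const void *buf, size_t len)
{
	int old, ignored;
	ssize_t r;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
	r = write(fd, buf, len);
	/* restore whatever state the caller had, enabled or not */
	pthread_setcancelstate(old, &ignored);
	return r;
}
```

restoring the saved state rather than unconditionally re-enabling keeps the wrapper safe to call from code that has already disabled cancellation.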
previously, pthread_setcancelstate depended on pthread_self, which
would pull in unwanted thread setup overhead for non-threaded
programs. now, it temporarily stores the state in the global libc
struct if threads have not been initialized, and later moves it if
needed. this way we can instead use __pthread_self, which has no
dependencies and assumes that the thread register is already valid.
this patch improves the correctness, simplicity, and size of
cancellation-related code. modulo any small errors, it should now be
completely conformant, safe, and resource-leak free.
the notion of entering and exiting cancellation-point context has been
completely eliminated and replaced with alternative syscall assembly
code for cancellable syscalls. the assembly is responsible for setting
up execution context information (stack pointer and address of the
syscall instruction) which the cancellation signal handler can use to
determine whether the interrupted code was in a cancellable state.
these changes eliminate race conditions in the previous generation of
cancellation handling code (whereby a cancellation request received
just prior to the syscall would not be processed, leaving the syscall
to block, potentially indefinitely), and remedy an issue where
non-cancellable syscalls made from signal handlers became cancellable
if the signal handler interrupted a cancellation point.
x86_64 asm is untested and may need a second try to get it right.
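as a rough picture of the handler-side check described above: the struct and field names below are hypothetical, and the exact comparison is an assumption. the text only states that the saved stack pointer and the address of the syscall instruction are what the handler consults.

```c
#include <stdint.h>

/* hypothetical per-thread record filled in by the syscall asm */
struct cp_ctx {
	uintptr_t saved_sp;  /* stack pointer inside the asm, 0 if none */
	uintptr_t saved_ip;  /* address of the syscall instruction */
};

/* returns 1 if a cancellation signal that interrupted the thread at
 * (sp, ip) arrived while it was in the cancellable asm, at or before
 * the syscall instruction. assumption: an equal stack pointer is what
 * identifies "inside the asm". */
static int interrupted_in_cancellable_state(const struct cp_ctx *c,
                                            uintptr_t sp, uintptr_t ip)
{
	if (!c->saved_sp || sp != c->saved_sp) return 0;
	return ip <= c->saved_ip;
}
```

under this model a signal handler that itself makes a non-cancellable syscall runs on a different stack pointer, so the check fails and the request is not acted on there.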
setting errno here is completely valid, but some programs, notably
busybox printf, assume that errno will not be set during output and
treat this as an error condition. in any case, skipping it slightly
reduces code size and saves time.
use a separate signal from SIGCANCEL for SIGEV_THREAD timers
otherwise we cannot support an application's desire to use
asynchronous cancellation within the callback function. this change
also slightly debloats pthread_create.c.
we take advantage of the fact that unless self->cancelpt is 1,
cancellation cannot happen. so just increment it by 2 to temporarily
block cancellation. this drops pthread_create.o well under 1k.
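the counter trick can be sketched as follows. the struct and function names are hypothetical stand-ins; only the "cancellation requires cancelpt to be exactly 1" rule and the increment by 2 come from the description above.

```c
/* hypothetical stand-in for the per-thread state */
struct thread_sketch {
	int cancelpt;  /* 1 means "at a cancellation point" */
};

/* cancellation is only acted on when cancelpt is exactly 1 */
static int can_cancel(const struct thread_sketch *t)
{
	return t->cancelpt == 1;
}

/* adding 2 moves the value away from 1 whether it started at 0 or 1;
 * subtracting 2 afterwards restores the original value. */
static void inhibit(struct thread_sketch *t) { t->cancelpt += 2; }
static void resume(struct thread_sketch *t)  { t->cancelpt -= 2; }
```

because the adjustment is a plain add and subtract, nothing needs to be saved and restored by the caller.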
with datagram sockets, depending on fprintf not to flush the output
early was very fragile; the new version simply uses a small fixed-size
buffer. it could be updated to dynamically allocate large buffers if
needed, but i can't envision any admin being happy about finding
64kb-long lines in their syslog...
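a sketch of the fixed-buffer approach, not the actual code: the helper name and the 256-byte size are assumptions (the text only says "small"). the point is that the whole message is formatted first, so it can be handed to the socket in one call.

```c
#include <stdarg.h>
#include <stdio.h>

#define LOG_BUF_SKETCH 256  /* assumed size; the text only says "small" */

/* hypothetical helper: format one log line into a fixed-size buffer of
 * LOG_BUF_SKETCH bytes. returns the number of bytes to send; messages
 * that do not fit are truncated. */
static int format_log_line(char *buf, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, LOG_BUF_SKETCH, fmt, ap);
	va_end(ap);

	if (n < 0) return 0;
	if (n >= LOG_BUF_SKETCH) n = LOG_BUF_SKETCH - 1;
	return n;
}
```

the returned length would then be passed to a single send() on the datagram socket, so each log message becomes exactly one datagram regardless of stdio flushing.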
it should be noted that flock does not mix well with standard fcntl
locking, but nonetheless some applications will attempt to use flock
instead of fcntl if both exist. options to configure or small patches
may be needed. debian maintainers have plenty of experience with this
unfortunate situation...
after fork, we have a new process and the pid is equal to the tid of
the new main thread. there is no need to make two separate syscalls to
obtain the same number.
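the equality can be checked with a small linux-only sketch. the function name is hypothetical; SYS_gettid is invoked through syscall() so the example does not depend on a gettid wrapper being available.

```c
#define _GNU_SOURCE  /* for the syscall() declaration */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* forks, and in the child compares the process id with the thread id
 * of its only thread. returns 1 if they match, 0 if they differ, and
 * -1 on error. */
static int child_pid_matches_tid(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0) return -1;
	if (pid == 0) {
		long tid = syscall(SYS_gettid);
		_exit(getpid() == (pid_t)tid ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid) return -1;
	if (!WIFEXITED(status)) return -1;
	return WEXITSTATUS(status) == 0;
}
```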
we can do this without violating the namespace now that they are
macros/inline functions rather than extern functions. the motivation
is that gcc was generating giant, slow, horrible code for the old
functions, and now generates a single byte-swapping instruction.
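for illustration, a shift-and-mask byte swap written as static inline functions. the names are hypothetical, and whether a given compiler reduces this particular form to one instruction depends on the compiler and flags; the claim above is about the author's own version.

```c
#include <stdint.h>

/* hypothetical names. static inline, so no external symbol is
 * introduced into the namespace. */
static inline uint32_t bswap32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0xff00)
	     | ((x << 8) & 0xff0000) | (x << 24);
}

static inline uint16_t bswap16_sketch(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}
```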
the basic idea is that the only things in alltypes.h should be types
that either vary from system to system (in practice, not just in
theoretical la-la land - this is the implementation so we choose what
constraints we want to impose on ports) or which are needed by
multiple system headers.
1. saved errno was not being restored, illegally clearing errno to 0.
2. no need to back up and restore errno around free; it will not touch
errno except perhaps when the program has already invoked UB...
actually FLT_ROUNDS needs to expand to a static inline function that
obtains the current rounding mode and returns it, but that will be
added later with fenv.h stuff.
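a sketch of what such a function might look like once fenv.h exists. the helper name is hypothetical; the return values follow the C standard's definition of FLT_ROUNDS (0 toward zero, 1 to nearest, 2 toward positive infinity, 3 toward negative infinity, -1 indeterminable).

```c
#include <fenv.h>

/* hypothetical helper: map a <fenv.h> rounding mode, as returned by
 * fegetround(), to the corresponding FLT_ROUNDS value. returns -1
 * (indeterminable) for anything unrecognised. */
static inline int flt_rounds_from_mode(int mode)
{
	switch (mode) {
#ifdef FE_TOWARDZERO
	case FE_TOWARDZERO: return 0;
#endif
#ifdef FE_TONEAREST
	case FE_TONEAREST: return 1;
#endif
#ifdef FE_UPWARD
	case FE_UPWARD: return 2;
#endif
#ifdef FE_DOWNWARD
	case FE_DOWNWARD: return 3;
#endif
	default: return -1;
	}
}
```

FLT_ROUNDS could then expand to something like flt_rounds_from_mode(fegetround()), which makes it a runtime value rather than a constant.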
according to posix, readv "shall be equivalent to read(), except..."
that it places the data into the buffers specified by the iov array.
however on linux, when reading from a terminal, each iov element
behaves almost like a separate read. this means that if the first iov
exactly satisfied the request (e.g. a length-one read of '\n') and the
second iov is nonzero length, the syscall will block again after
getting the blank line from the terminal until another line is read.
simply put, entering a single blank line becomes impossible.
the solution, fortunately, is simple. whenever the buffer size is
nonzero, reduce the length of the requested read by one byte and let
the last byte go through the buffer. this way, readv will already be
in the second (and last) iov, and won't re-block on the second iov.
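the workaround can be sketched like this. all names are hypothetical and the buffer struct is a simplified stand-in for a stdio FILE; the sketch also assumes the buffer is empty when the function is called.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* hypothetical stand-in for a stdio read buffer */
struct rbuf_sketch {
	unsigned char *buf;
	size_t size;      /* capacity of buf, 0 if unbuffered */
	size_t pos, end;  /* unread data is buf[pos..end) */
};

static ssize_t read_sketch(int fd, struct rbuf_sketch *b,
                           unsigned char *dst, size_t len)
{
	struct iovec iov[2];
	size_t direct;
	ssize_t cnt;

	if (!len) return 0;
	/* request one byte less directly whenever a buffer exists */
	direct = len - (b->size != 0);
	iov[0].iov_base = dst;    iov[0].iov_len = direct;
	iov[1].iov_base = b->buf; iov[1].iov_len = b->size;

	cnt = readv(fd, iov, 2);
	if (cnt <= 0 || (size_t)cnt <= direct) return cnt;

	/* data reached the buffer: its first byte is the caller's last */
	b->pos = 0;
	b->end = (size_t)cnt - direct;
	dst[len - 1] = b->buf[b->pos++];
	return (ssize_t)len;
}

/* usage check on a pipe: returns 1 if a 5-byte read of "hello world"
 * yields "hello" and leaves " world" (6 bytes) in the buffer. */
static int read_sketch_demo(void)
{
	int fds[2], ok;
	unsigned char store[16], dst[5];
	struct rbuf_sketch b = { store, sizeof store, 0, 0 };

	if (pipe(fds)) return -1;
	if (write(fds[1], "hello world", 11) != 11) return -1;
	ok = read_sketch(fds[0], &b, dst, 5) == 5
	  && dst[0] == 'h' && dst[4] == 'o'
	  && b.end - b.pos == 6 && b.buf[b.pos] == ' ';
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```

for a length-one request the first iov has length zero, so the single byte arrives through the buffer and the call never waits on a second element.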
POSIX clearly specifies the types of msg_iovlen and msg_controllen, but
Linux ignores this and makes them both size_t instead. to work around
this we add padding (instead of just using the wrong types like glibc
does), but we also need to patch-up the struct before passing it to
the kernel in case the caller did not zero-fill it.
if i could trust the kernel to just ignore the upper 32 bits, this
would not be necessary, but i don't think it will ignore them...
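a simplified picture of the patch-up step. the struct below is a hypothetical layout for a 64-bit little-endian target, not the real struct msghdr, and copying rather than modifying in place is a choice made for this sketch.

```c
#include <sys/socket.h>

/* hypothetical, simplified layout: each POSIX-typed 32-bit field is
 * followed by explicit padding, so that together they occupy the
 * 64-bit slot the kernel reads as a size_t. */
struct msg_sketch {
	void *iov;
	int iovlen;
	int pad1;
	void *control;
	socklen_t controllen;
	int pad2;
};

/* copy the caller's struct and zero the padding, so the kernel never
 * sees garbage in the upper halves. the caller's copy is untouched. */
static struct msg_sketch msg_for_kernel(const struct msg_sketch *user)
{
	struct msg_sketch k = *user;

	k.pad1 = 0;
	k.pad2 = 0;
	return k;
}
```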
return the requested string as the "canonical name" for numeric addresses
previously NULL was returned in ai_canonname, resulting in crashes in
some callers. this behavior was incorrect. note however that the new
behavior differs from glibc, which performs reverse DNS lookups. POSIX
is very clear that a reverse DNS lookup must not be performed for
numeric addresses.
this is something of a tradeoff, as now set*id() functions, rather
than pthread_create, are what pull in the code overhead for dealing
with linux's refusal to implement proper POSIX thread-vs-process
semantics. my motivations are:
1. it's cleaner this way, especially cleaner to optimize out the
rsyscall locking overhead from pthread_create when it's not needed.
2. it's expected that only a tiny number of core system programs will
ever use set*id() functions, whereas many programs may want to use
threads, and making thread overhead tiny is an incentive for "light"
programs to try threads.
fix signal-based timers with null sigevent argument
since timer_create is no longer allocating a structure for the timer_t
and simply using the kernel timer id, it was impossible to specify the
timer_t as the argument to the signal handler. the solution is to pass
the null sigevent pointer on to the kernel, rather than filling it in
userspace, so that the kernel does the right thing. however, that
precludes the clever timerid-versus-threadid encoding we were doing.
instead, just assume timerids are below 1M and thread pointers are
above 1M. (in perspective: timerids are sequentially allocated and
seem limited to 32k, and thread pointers are at roughly 3G.)
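the resulting test is a single comparison. in this sketch the helper name is hypothetical and "1M" is taken to mean 1<<20, which is an assumption.

```c
#include <stdint.h>

#define TIMER_ID_LIMIT_SKETCH 0x100000UL  /* "1M", taken here as 1<<20 */

/* hypothetical helper: a timer_t value below the limit is treated as a
 * kernel timer id, anything at or above it as a thread pointer. */
static int is_kernel_timerid(const void *t)
{
	return (uintptr_t)t < TIMER_ID_LIMIT_SKETCH;
}
```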
new framework to inhibit thread cancellation when needed
with these small changes, libc functions which need to call functions
which are cancellation points, but which themselves must not be
cancellation points, can use the CANCELPT_INHIBIT and CANCELPT_RESUME
macros to temporarily inhibit all cancellation.