granicus.if.org Git

fcntl.h: add AT_RECURSIVE from linux v5.2

apply open_tree with OPEN_TREE_CLONE call to the entire subtree, see

linux commit a07b20004793d8926f78d63eb5980559f7813404
vfs: syscall: Add open_tree(2) to reference or clone a mount

fcntl.h: add AT_STATX_ statx sync flag definitions

see

linux commit a528d35e8bfcc521d7cb70aaf03e1bd296c8493f
statx: Add a system call to make enhanced file info available

these are linux specific and not reserved names for fcntl.h so they
are under _BSD_SOURCE|_GNU_SOURCE.

sched.h: add CLONE_PIDFD from linux v5.2

when set a pidfd is stored in parent_tidptr, see

linux commit b3e5838252665ee4cfa76b82bdf1198dca81e5be
clone: add CLONE_PIDFD

netinet/if_ether.h: add ETH_P_DSA_8021Q from linux v5.2

ethertype for fake VLAN header for DSA, see

linux commit bf5bc3ce8a8f32a0d45b6820ede8f9fc3e9c23df
ether: Add dedicated Ethertype for pseudo-802.1Q DSA tagging

honor __WCHAR_TYPE__ on archs with legacy long definition of wchar_t

historically, a number of 32-bit archs used long rather than int for
wchar_t, for no good reason. GCC still uses the historical types, but
clang replaced them all with int, and it seems PCC uses int too.
mismatching the compiler's type for wchar_t is not an option due to
wide string literals.

note that the mismatch does not affect C++ ABI since wchar_t is its
own builtin type/keyword in C++, distinct from both int and long, not
a typedef.

i386 already worked around this by honoring __WCHAR_TYPE__ if defined
by the compiler, and only using the official legacy ABI type if not.
add the same to the other affected archs.

it might make sense at some point to switch to using int as the
default if __WCHAR_TYPE__ is not defined, if the expectations is that
new compilers will treat int as the correct choice, but it's unlikely
that the case where __WCHAR_TYPE__ is undefined will ever be used
anyway. I actually wanted to move the definition of wchar_t to the
top-level shared alltypes.h.in, using __WCHAR_TYPE__ and falling back
to int if not defined, but that can't be done without assuming all
compilers define __WCHAR_TYPE__ thanks to some pathological archs
where the ABI has wchar_t as an unsigned type.

synchronously clean up pthread_create failure due to scheduling errors

previously, when pthread_create failed due to inability to set
explicit scheduling according to the requested attributes, the nascent
thread was detached and made responsible for its own cleanup via the
standard pthread_exit code path. this left it consuming resources
potentially well after pthread_create returned, in a way that the
application could not see or mitigate, and unnecessarily exposed its
existence to the rest of the implementation via the global thread
list.

instead, attempt explicit scheduling early and reuse the failure path
for __clone failure if it fails. the nascent thread's exit futex is
not needed for unlocking the thread list, since the thread calling
pthread_create holds the thread list lock the whole time, so it can be
repurposed to ensure the thread has finished exiting. no pthread_exit
is needed, and freeing the stack, if needed, can happen just as it
would if __clone failed.

set explicit scheduling for new thread from calling thread, not self

if setting scheduling properties succeeds, the new thread may end up
with lower priority than the caller, and may be unable to continue
running due to another intermediate-priority thread. this produces a
priority inversion situation for the thread calling pthread_create,
since it cannot return until the new thread reports success.

originally, the parent was responsible for setting the new thread's
priority; commits b8742f32602add243ee2ce74d804015463726899 and
40bae2d32fd6f3ffea437fa745ad38a1fe77b27e changed it as part of
trimming down the pthread structure. since then, commit
04335d9260c076cf4d9264bd93dd3b06c237a639 partly reversed the changes,
but did not switch responsibilities back. do that now.

fix unsynchronized decrement of thread count on pthread_create error

commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af wrongly documented
that all changes to libc.threads_minus_1 were guarded by the thread
list lock, but the decrement for failed SYS_clone took place after the
thread list lock was released.

add public declaration for optreset under appropriate feature profiles

commit 030e52639248ac8417a4934298caa78c21a228d1 added optreset, a BSD
extension to getopt duplicating the functionality (also an extension)
of setting optind to 0, but failed to provide a public declaration for
it. according to the BSD documentation and headers, the application is
not supposed to need to provide its own declaration.

add posix_spawn [f]chdir file actions

these are presently extensions, thus named with _np to match glibc and
other implementations that provide them; however they are likely to be
standardized in the future without the _np suffix as a result of
Austin Group issue 1208. if so, both names will be kept as aliases.

add copy_file_range system call wrapper

fix clash between sys/user.h and kernel ptrace.h on powerpc[64], sh

due to historical accident/sloppiness in glibc, the powerpc,
powerpc64, and sh versions of struct user, defined by sys/user.h, used
struct pt_regs from the kernel asm/ptrace.h for their regs member.
this made it impossible to define the type in an API-compatible manner
without either including asm/ptrace.h like glibc does (contrary to our
policy of not depending on kernel headers), or clashing with
asm/ptrace.h's definition of struct pt_regs if both headers are
included (which is almost always the case in software using
sys/user.h).

for a long time I viewed this problem as having no reasonable fix. I
even explored the possibility of having the powerpc[64] and sh
versions of user.h just include the kernel header (breaking with
policy), but that looked like it might introduce new clashes with
sys/ptrace.h. and it would also bring in a lot of additional cruft
that makes no sense for sys/user.h to expose. glibc goes out of its
way to suppress some of that with #undef, possibly leading to
different problems. this is a rabbit-hole that should be explored no
further.

as it turns out, however, nothing actually uses struct user
sufficiently to care about the type of the regs member; most software
including sys/user.h does not even use struct user at all. so, the
problem can be fixed just by doing away with the insistence on strict
glibc API compatibility for the struct tag of the regs member.

rather than renaming the tag, which might lead to the new name
entering use as API, simply use an untagged structure inside struct
user with the same members/layout as struct pt_regs.

for sh, struct pt_dspregs is just removed entirely since it was not
used.

fix external dummy_lock symbol inadvertently introduced in sigaction

commit 9b14ad541068d4f7d0be9bcd1ff4c70090d868d3 introduced this
namespace violation.

remove sporadic server members from struct sched_param

these members are associated with an unsupported option group. with
time_t changing size on 32-bit archs, all interfaces taking struct
sched_param arguments would need redirection and compat shims in order
to be able to continue offering these members, for no benefit. just
convert them to reserved space instead.

re-add ELF gregs and fpregs types to riscv64 user.h

d493206de7df4db07ad34f24701539ba0a6ed38c deleted all the content of
user.h, but sys/procfs.h expects this from sys/user.h
threfore we retain the non conflicting parts

fix regression whereby main thread didn't get TLS relocations

commit ffab43602b5900c86b7040abdda8ccf6cdec95f5 broke this by moving
relocations after not only the allocation of storage for the main
thread's static TLS, but after the copying of the TLS image. thus,
relocation results were not reflected in the main thread's copy. this
could be fixed by calling __reset_tls after relocations, but instead
split the allocation and installation before/after relocations so that
there's not a redundant copy.

due to commit 71af5309874269bcc9e4b84ea716fab33d888c1d, updating of
static_tls_cnt needs to be kept with allocation of static TLS, before
relocations, rather than after installation.

fix accidentlly-external cmp symbol introduced with catgets

commit 7590203c486d9002522019045d34ee3dee0a66f5 omitted static here.

make relocation time symbol lookup and dlsym consistent

Using common code path for all symbol lookups fixes three dlsym issues:

- st_shndx of STT_TLS symbols were not checked and thus an undefined
  tls symbol reference could be incorrectly treated as a definition
  (the sysv hash lookup returns undefined symbols, gnu does not, so should
  be rare in practice).

- symbol binding was not checked so a hidden symbol may be returned
  (in principle STB_LOCAL symbols may appear in the dynamic symbol table
  for hidden symbols, but linkers most likely don't produce it).

- mips specific behaviour was not applied (ARCH_SYM_REJECT_UND) so
  undefined symbols may be returned on mips.

always_inline is used to avoid relocation performance regression, the
code generation for find_sym should not be affected.

ldso: correct condition for local symbol handling in do_relocs

commit 7a9669e977e5f750cf72ccbd2614f8b72ce02c4c added use of the
symbol reference as the definition, in place of performing a lookup,
for STT_SECTION symbol references that were first found used in FDPIC.
such references may happen in certain other cases, such as
local-dynamic TLS and with relocation types that require a symbol but
that are being used for non-symbolic purposes, like the powerpc
unaligned address relocations.

in all such cases I'm aware of, the symbol referenced is a section
symbol (STT_SECTION); however, the important semantic property is not
its being a section, but rather its binding local (STB_LOCAL). check
the latter instead of the former for greater generality and semantic
correctness.

add support for powerpc/powerpc64 unaligned relocations

R_PPC_UADDR32 (R_PPC64_UADDR64) has the same meaning as R_PPC_ADDR32
(R_PPC64_ADDR64), except that its address need not be aligned. For
powerpc64, BFD ld(1) will automatically convert between ADDR<->UADDR
relocations when the address is/isn't at its native alignment. This
will happen if, for example, there is a pointer in a packed struct.

gold and lld do not currently generate R_PPC64_UADDR64, but pass
through misaligned R_PPC64_ADDR64 relocations from object files,
possibly relaxing them to misaligned R_PPC64_RELATIVE. In both cases
(relaxed or not) this violates the PSABI, which defines the relevant
field type as "a 64-bit field occupying 8 bytes, the alignment of
which is 8 bytes unless otherwise specified."

All three linkers violate the PSABI on 32-bit powerpc, where the only
difference is that the field is 32 bits wide, aligned to 4 bytes.

Currently musl fails to load executables linked by BFD ld containing
R_PPC64_UADDR64, with the error "unsupported relocation type 43".
This change provides compatibility with BFD ld on powerpc64, and any
static linker on either architecture that starts following the PSABI
more closely.

ldso: remove redundant runtime checks in static TLS logic

as a result of commit ffab43602b5900c86b7040abdda8ccf6cdec95f5,
static_tls_cnt is now valid during relocations at program startup, so
it's no longer necessary to condition the check against static_tls_cnt
on this being a runtime (dlopen) relocation.

ldso: fix calloc misuse allocating initial tls

this is analogous to commit 2f1f51ae7b2d78247568e7fdb8462f3c19e469a4,
and should have been caught at the same time since it was right next
to the code moved in that commit. between final stage 3 reloc_all and
the jump to the main program's entry point, it is not valid to call
any functions which may be interposed by the application; doing so
results in execution of application code before ctors have run, and on
fdpic archs, before the main program's fdpic self-fixups have taken
place, which will produce runaway wrong execution.

add secure_getenv function

This function is a GNU extension introduced in glibc 2.17.

in clock_getres, check for null pointer before storing result

POSIX allows a null pointer, in which case the function only checks
the validity of the clock id argument.

remove spurious null check in clock_settime

at the point of this check, the pointer has already been dereferenced.
clock_settime is not defined for null pointer arguments.

fix regression in recvmmsg with no timeout

somewhat analogous to commit d0b547dfb5f7678cab6bc39dd736ed6454357ca4,
but here the omission of the null timeout check was in the time64
syscall code path. this code is not yet used except on x32.

add non-stub implementation of catgets localization functions

these accept the netbsd/openbsd message catalog file format,
consisting of a sorted list of set headers and a sorted list of
message headers for each set, admitting trivial binary search for
lookups.

the gnu format was not chosen because it's unusably bad. it does not
admit efficient (log time or better) lookups; rather, it requires
linear search or hash table lookups, and the hash function is awful:
it's literally set_id*msg_id.

fix regression in select with no timeout

commit 722a1ae3351a03ab25010dbebd492eced664853b inadvertently passed a
copy of {s,us} to the syscall even if the timeout argument tv was
null, thereby causing immediate timeout (polling) in place of
unlimited timeout. only archs using SYS_select were affected.

fix failure of glob to match broken symlinks under some conditions

when the pattern ended with one or more literal path components, or
when the GLOB_MARK flag was passed to request that glob flag directory
results and the type obtained by readdir was unknown or inconclusive
(symlink), the stat function was called to evaluate existence and/or
determine type. however, stat fails with ENOENT for broken symlinks,
and this caused the match to be omitted from the results.

instead, use stat only for the unknown/inconclusive cases with
GLOB_MARK, and otherwise, or if stat fails, use lstat existence still
needs to be determined. this minimizes the number of costly syscalls,
performing both only in the case where GLOB_MARK is in use and there
is a final literal path component which is a broken symlink.

based on/simplified from patch by James Y Knight.

remove riscv64 bits/user.h contents

the contents conflicted with asm/ptrace.h. glibc does not provide
anything in user.h for riscv, so software cannot be depending on it.

simplified from patch submitted by Baruch Siach.

fix risc64 conflict with kernel headers

Rename user registers struct definitions to avoid conflict with the
asm/ptrace.h kernel header that defines the same structs. Use the
__riscv_mc prefix as glibc does.

in arm cancellation point asm, don't unnecessarily preserve link register

The only reason we needed to preserve the link register was because we
were using a branch-link instruction to branch to __cp_cancel.
Replacing this with a branch means we can avoid the save/restore as
the link register is no longer modified.

glob: implement GLOB_TILDE and GLOB_TILDE_CHECK

use setitimer function rather than syscall to implement alarm

otherwise alarm will break on 32-bit archs when time_t is changed to
64-bit. a second itimerval object is introduced for retrieving the old
value, since the setitimer function has restrict-qualified arguments.

fix build regression in i386 asm for atan2, atan2f

commit f3ed8bfe8a82af1870ddc8696ed4cc1d5aa6b441 inadvertently removed
labels that were still needed.

fix x87 stack imbalance in corner cases of i386 math asm

commit 31c5fb80b9eae86f801be4f46025bc6532a554c5 introduced underflow
code paths for the i386 math asm, along with checks on the fpu status
word to skip the underflow-generation instructions if the underflow
flag was already raised. unfortunately, at least one such path, in
log1p, returned with 2 items on the x87 stack rather than just 1 item
for the return value. this is a violation of the ABI's calling
convention, and could cause subsequent floating point code to produce
NANs due to x87 stack overflow. if floating point results are used in
flow control, this can lead to runaway wrong code execution.

rather than reviewing each "underflow already raised" code path for
correctness, remove them all. they're likely slower than just
performing the underflow code unconditionally, and significantly more
complex.

all of this code should be ripped out and replaced by C source files
with inline asm. doing so would preclude this kind of error by having
the compiler perform all x87 stack register allocation and stack
manipulation, and would produce comparable or better code. however
such a change is a much larger project.

fix regression in clock_gettime on 32-bit archs without vdso

commit 72f50245d018af0c31b38dec83c557a4e5dd1ea8 broke this by creating
a code path where r is uninitialized.

update riscv64 syscall numbers to linux v5.1

commit f3f96f2daa4d00f0e38489fb465cd0244b531abe added these for the
rest of the archs, but the patch it corresponded to missed riscv64
since riscv64 was not yet upstream at the time. this caused commit
dfc81828f7ab41da08f744c44117a1bb20a05749 to break riscv64 build, due
to a wrong assumption that SYS_statx was unconditionally defined.

clock_gettime: add support for 32-bit vdso with 64-bit time_t

this fixes a major upcoming performance regression introduced by
commit 72f50245d018af0c31b38dec83c557a4e5dd1ea8, whereby 32-bit archs
would lose vdso clock_gettime after switching to 64-bit time_t, unless
the kernel supports time64 and provides a time64 version of the vdso
function. this would incur not just one but two syscalls: first, the
failed time64 syscall, then the fallback time32 one.

overflow of the 32-bit result is detected and triggers a revert to
syscalls. normally, on a system that's not Y2038-ready, this would
still overflow, but if the process has been migrated to a
time64-capable kernel or if the kernel has been hot-patched to add
time64 syscalls, it may conceivably work.

move IPC_STAT definition to a new bits/ipcstat.h file

otherwise, 32-bit archs that could otherwise share the generic
bits/ipc.h would need to duplicate the struct ipc_perm definition,
obscuring the fact that it's the same. sysvipc is not widely used and
these headers are not commonly included, so there is no performance
gain to be had by limiting the number of indirectly included files
here.

files with the existing time32 definition of IPC_STAT are added to all
current 32-bit archs now, so that when it's changed the change will
show up as a change rather than addition of a new file where it's less
obvious that the value is changing vs the generic one that was used
before.

fix missing declarations for pthread_join extensions in source file

per policy, define the feature test macro to get declarations for the
pthread_tryjoin_np and pthread_timedjoin_np functions. in the past
this has been only for checking; with 32-bit archs getting 64-bit
time_t it will also be necessary for symbols to get redirected
correctly.

allow archs to define IPC_STAT, propagate time64 bit to other macros

to make use of {sem,shm,msg}ctl IPC_STAT functionality to provide
64-bit time_t on 32-bit archs, IPC_STAT and related macros must be
defined with bit 8 (0x100) set. allow archs to define IPC_STAT in
bits/ipc.h, and define the other macros in terms of it so that they
all get the same value of the time64 bit.

clock_gettime: add time64 syscall support, decouple 32-bit time_t

the time64 syscall has to be used if time_t is 64-bit, since there's
no way of knowing before making a syscall whether the result will fit
in 32 bits, and the 32-bit syscalls do not report overflow as an
error.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the result is now read from the kernel
through long[2] array, then copied into the timespec, to remove the
assumption that time_t is the same as long.

vdso clock_gettime is still used in place of a syscall if available.
32-bit archs with 64-bit time_t must use the time64 version of the
vdso function; if it's not available, performance will significantly
suffer. support for both vdso functions could be added, but would
break the ability to move a long-lived process from a pre-time64
kernel to one that can outlast Y2038 with checkpoint/resume, at least
without added hacks to identify that the 32-bit function is no longer
usable and stop using it (e.g. by seeing negative tv_sec). this
possibility may be explored in future work on the function.

clock_adjtime: add time64 support, decouple 32-bit time_t, fix x32

the 64-bit/time64 version of the syscall is not API-compatible with
the userspace timex structure definition; fields specified as long
have type long long. so when using the time64 syscall, we have to
convert the entire structure. this was always the case for x32 as
well, but went unnoticed, meaning that clock_adjtime just passed junk
to the kernel on x32. it should be fixed now.

for the fallback case, we avoid encoding any assumptions about the new
location of the time member or naming of the legacy slots by accessing
them through a union of the kernel type and the new userspace type.
the only assumption is that the non-time members live at the same
offsets as in the (non-time64, long-based) kernel timex struct. this
property saves us from having to convert the whole thing, and avoids a
lot of additional work in compat shims.

the new code is statically unreachable for now except on x32, where it
fixes major brokenness. it is permanently unreachable on 64-bit.

ioctl: add fallback for new time64 SIOCGSTAMP[NS]

without this, the SIOCGSTAMP and SIOCGSTAMPNS ioctl commands, for
obtaining timestamps, would stop working on pre-5.1 kernels after
time_t is switched to 64-bit and their values are changed to the new
time64 versions.

new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.

get/setsockopt: add fallback for new time64 SO_RCVTIMEO/SO_SNDTIMEO

without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.

new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.

make __socketcall analogous to __syscall, error-returning

the __socketcall and __socketcall_cp macros are remnants from a really
old version of the syscall-mechanism infrastructure, and don't follow
the pattern that the "__" version of the macro returns the raw negated
error number rather than setting errno and returning -1.

for time64 purposes, some socket syscalls will need to operate on the
error value rather than returning immediately, so fix this up so they
can use it.

sysvipc: overhaul {sem,shm,msg}ctl for time64

being "ctl" functions that take command numbers, these will be handled
like ioctl/sockopt/etc., using new command numbers for the time64
variants with an "IPC_TIME64" bit added to their values. to obtain
such a reserved bit, we reuse the IPC_64 bit, 0x100, which served only
as part of the libc-to-kernel interface, not as a public interface of
the libc functions.

using new command numbers avoids the need for compat shims (in ABIs
doing time64 through symbol redirection and compat shims) and, by
virtue of having a fixed time64 bit for all commands, we can ensure
that libc can perform the appropriate translations, even if the
application is using new commands from a newer version of the libc
headers than the libc available at runtime.

for the vast majority of 32-bit archs, the kernel {sem,shm,msq}id64_ds
definitions left padding space intended for expanding their time_t
fields to 64 bits in-place, and it would have been really nice to be
able to do time64 support that way. however the padding was almost
always in little-endian order (except on powerpc, and for msqid_ds
only on mips, where it matched the arch's byte order), and more
importantly, the alignment was overlooked. in semid_ds and msqid_ds,
the time_t members were not suitably aligned to be expanded to 64-bit,
due to the ipc_perm header consisting of 9 32-bit words -- except on
powerpc where ipc_perm contains an extra padding word. in shmid_ds,
the time_t members were suitably aligned, except that mips
(accidentally?) omitted the padding for them alltogether.

as a result, we're stuck with adding new time_t fields on the end of
the structures, and assembling the 32-bit lo/hi parts (or 16-bit hi
parts, for mips shmid_ds, which lacked sufficient reserved space for
full 32-bit hi parts) to fill them in.

all of the functional changes here are conditional on the IPC_TIME64
macro having a nonzero definition, which will only happen when
IPC_STAT is redefined for 32-bit archs, and on time_t being larger
than long, so for now the new code is all dead code.

fix semctl with SEM_STAT_ANY

due to the variadic signature, semctl needs to be made aware of any
new commands that take arguments. this was overlooked when commit
af55070eae5438476f921d827b7ae49e8141c3fe added SEM_STAT_ANY.

remove gratuitously-different arch-specific bits/ipc.h files

these differ from generic only in using endian-matched padding with a
short __ipc_perm_seq field in place of the int field in generic. this
is not a documented public interface anyway, and the original intent
was to use int here. some ports just inadvertently slipped in the
kernel short+padding form.

remove arch-specific bits/ipc.h that are identical to generic

previously these differed from generic because they needed their own
definitions of IPC_64. now that it's no longer in public header,
they're identical.

move IPC_64 from public bits/ipc.h to syscall_arch.h

the definition of the IPC_64 macro controls the interface between libc
and the kernel through syscalls; it's not a public API. the meaning is
rather obscure. long ago, Linux's sysvipc *id_ds structures used
16-bit uids/gids and wrong types for a few other fields. this was in
the libc5 era, before glibc. the IPC_64 flag (64 is a misnomer; it's
more like 32) tells the kernel to use the modern[-ish] versions of the
structures.

the definition of IPC_64 has nothing to do with whether the arch is
32- or 64-bit. rather, due to either historical accident or
intentional obnoxiousness, the kernel only accepts and masks off the
0x100 IPC_64 flag conditional on CONFIG_ARCH_WANT_IPC_PARSE_VERSION,
i.e. for archs that want to provide, or that accidentally provided,
both. for archs which don't define this option, no masking is
performed and commands with the 0x100 bit set will fail as invalid. so
ultimately, the definition is just a matter of matching an arbitrary
switch defined per-arch in the kernel.

select: overhaul for time64

major changes are made alongside adding time64 syscall support to
account for issues found during research. select historically accepts
non-normalized (tv_usec not restricted to less than 1000000) timeouts,
and the kernel normalizes them, but the normalization code is buggy
and subject to integer overflows. since normalization is needed anyway
when using SYS_pselect6 or SYS_pselect6_time64 as the backend, simply
do it up-front to eliminate both code path complexity and the
possibility of kernel bugs.

as a side effect, select no longer updates the caller's timeout
timeval with the remaining time. previously, archs that used
SYS_select updated it and archs that used SYS_pselect6 didn't. this
change may turn out to be controversial and may need revisiting, but
in any case the old behavior was not strictly conforming.

POSIX allows modification of the timeout "upon successful completion",
but the Linux syscall modifies it upon unsuccessful completion (EINTR)
as well (and presumably each time the syscall stops and restarts
before it's known whether completion will be successful). it's
possible that this language does not reflect the actual intent of the
standard, since other historical implementations probably behaved like
Linux, but that should be clarified if there's a desire to bring the
old behavior back. regardless, programs that are depending on this are
not correct and are already broken on some archs we support.

recvmmsg: add time64 syscall support, decouple 32-bit time_t

the time64 syscall is used only if the timeout does not fit in 32
bits. after preprocessing, the code is unchanged on 64-bit archs. for
32-bit archs, the timeout now goes through an intermediate copy,
meaning that the caller does not get back the updated timeout. this is
based on my reading of the documentation, which does not document the
updating as a contract you can rely on, and mentions that the whole
recvmmsg timeout mechanism is buggy and unlikely to be useful. if it
turns out that there's interest in making the remaining time
officially available to callers, such functionality could be added
back later.

setitimer, getitimer: decouple time_t from long

these functions have no new time64 syscall, so the existence of a
time64 syscall cannot be used as the condition for the new code.
instead, assume the syscall takes timevals as longs, which is true
everywhere but x32, and interface with the kernel through long[4]
objects.

rather than adding new hacks to special-case x32 here, just add
x32-specific source files since a trivial syscall wrapper suffices
there.

the new code paths added in this commit are statically unreachable on
all current archs, but will become reachable when 32-bit archs get
64-bit time_t.

remove duplicates of new generic bits/msg.h

use 64-bit msqid_ds layout in the generic version of bits/msg.h

this layout is more common already than the old generic, and should
become even more common in the future with new archs added and with
64-bit time_t on 32-bit archs.

duplicate generic bits/msg.h for each arch using it, in prep to change

remove duplicates of new generic bits/sem.h

some of these were not exact duplicates, but had gratuitously
different naming for padding, or omitted the endian checks because the
arch is fixed-endian.

use 64-bit semid_ds layout in the generic version of bits/sem.h

this layout is slightly less common than the old generic one, but only
because x86_64 and x32 wrongly (according to comments in the kernel
headers) copied the i386 padding. for future archs, and with 64-bit
time_t on 32-bit archs, the new layout here will become the most
common, and it makes sense to treat it as the generic.

collapse out byte order conditions in bits/sem.h for fixed-endian archs

having preprocessor conditionals on byte order in the bits headers for
fixed-endian archs is confusing at best. remove them.

duplicate generic bits/sem.h for each arch using it, in prep to change

extricate bits/sem.h from x32 time_t hack

various padding fields in the generic bits/sem.h were defined in terms
of time_t as a cheap hack standing in for "kernel long", to allow x32
to use the generic version of the file. this was a really bad idea, as
it ended up getting copied into lots of arch-specific versions of the
bits file, and is a blocker to changing time_t to 64-bit on 32-bit
archs.

this commit adds an x32-specific version of the header, and changes
padding type back from time_t to long (currently the same type on all
archs but x32) in the generic header and all the others the hack got
copied into.

remove trailing newlines from various versions of bits/shm.h

remove duplicates of new generic bits/shm.h

use 64-bit shmid_ds layout in the generic version of bits/shm.h

this layout is more common already than the old generic, and should
become even more common in the future with new archs added and with
64-bit time_t on 32-bit archs.

the duplicate arch-specific copies are not removed yet in this commit,
so as to assist git tooling in copy/rename tracking.

duplicate generic bits/shm.h for each arch using it, in prep to change

there are more archs sharing the generic 64-bit version of the struct,
which is uniform and much more reasonable, than sharing the current
"generic" one, and depending on how time64 sysvipc is done for 32-bit
archs, even more may be sharing the "64-bit version" in the future.

so, duplicate the current generic to all archs using it (arm, i386,
m68k, microblaze, or1k) so that the generic can be changed freely.

this is recorded as its own commit mainly as a hint to git tooling, to
assist in copy/move tracking.

timerfd: add time64 syscall support, decouple 32-bit time_t

the changes here are semantically and structurally identical to those
made to timer_settime and timer_gettime for time64 support.

sched_rr_get_interval: don't assume time_t is 32-bit on 32-bit archs

as with clock_getres, the time64 syscall for this is not necessary or
useful, this time since scheduling timeslices are not on the order 68
years. if there's a 32-bit syscall, use it and expand the result into
timespec; otherwise there is only one syscall and it does the right
thing to store to timespec directly.

on 64-bit archs, there is no change to the code after preprocessing.

clock_getres: don't assume time_t is 32-bit on 32-bit archs

the time64 syscall for this is not necessary or useful, since clock
resolution is generally better than 68-year granularity. if there's a
32-bit syscall, use it and expand the result into timespec; otherwise
there is only one syscall and it does the right thing to store to
timespec directly.

on 64-bit archs, there is no change to the code after preprocessing.

timer_gettime: add time64 syscall support, decouple 32-bit time_t

the time64 syscall has to be used if time_t is 64-bit, since there's
no way of knowing before making a syscall whether the result will fit
in 32 bits, and the 32-bit syscalls do not report overflow as an
error.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the result is now read from the kernel
through long[4] array, then copied into the timespec, to remove the
assumption that time_t is the same as long.

remove x32 syscall timespec fixup hacks

the x32 syscall interfaces treat timespec's tv_nsec member as 64-bit
despite the API type being long and long being 32-bit in the ABI. this
is no problem for syscalls that store timespecs to userspace as
results, but caused uninitialized padding to be misinterpreted as the
high bits in syscalls that take timespecs as input.

since the beginning of the port, we've dealt with this situation with
hacks in syscall_arch.h, and injected between __syscall_cp_c and
__syscall_cp_asm, to special-case the syscall numbers that involve
timespecs as inputs and copy them to a form suitable to pass to the
kernel.

commit 40aa18d55ab763e69ad16d0cf1cebea708ffde47 set the stage for
removal of these hacks by letting us treat the "normal" x32 syscalls
dealing with timespec as if they're x32's "time64" syscalls,
effectively making x32 ax "time64-only 32-bit arch" like riscv32 will
be when it's added. since then, all users of syscalls that x32's
syscall_arch.h had hacks for have been updated to use time64 syscalls,
so the hacks can be removed.

there are still at least a few other timespec-related syscalls broken
on x32, which were overlooked when the x32 hacks were done or added
later. these include at least recvmmsg, adjtimex/clock_adjtime, and
timerfd_settime, and they will be fixed independently later on.

utimensat: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if either of the requested times does not fit in 32 bits. care is
taken to normalize the inputs to account for UTIME_NOW or UTIME_OMIT
in tv_nsec, in which case tv_sec should be ignored. this is needed not
only to avoid spurious time64 syscalls that might waste time failing
with ENOSYS, but also to accurately decide whether fallback is
possible.

if the requested time cannot be represented, the function fails with
ENOTSUP, defined in general as "The implementation does not support
the requested feature or value". neither the time64 syscall, nor this
error, can happen on current 32-bit archs where time_t is a 32-bit
type, and both are statically unreachable.

on 64-bit archs, there are only superficial changes to the
SYS_futimesat fallback path, which has been modified to pass long[4]
instead of struct timeval[2] to the kernel, making it suitable for use
on 32-bit archs even once time_t is changed to 64-bit. for 32-bit
archs, the call to SYS_utimensat has also been changed to copy the
timespecs through an array of long[4] rather than passing the
timespec[2] in-place.

clock_settime: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested time does not fit in 32 bits. on current 32-bit
archs where time_t is a 32-bit type, this makes it statically
unreachable.

if the time64 syscall is needed because the requested time does not
fit in 32 bits, we define this as an error ENOTSUP, for "The
implementation does not support the requested feature or value".

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the time is moved through an intermediate
copy to remove the assumption that time_t is a 32-bit type.

timer_settime: add support for time64 syscall, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
if either component of the itimerspec does not fit in 32 bits, or if
time_t is 64-bit and the caller requested the old value, in which case
there's a possibility that the old value might not fit in 32 bits. on
current 32-bit archs where time_t is a 32-bit type, this makes it
statically unreachable.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the time is moved through an intermediate
copy to remove the assumption that time_t is a 32-bit type.

pselect, ppoll: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested timeout length does not fit in 32 bits. on current
32-bit archs where time_t is a 32-bit type, this makes it statically
unreachable.

on 64-bit archs, there are only superficial changes to the code after
preprocessing. both before and after these changes, these functions
copied their timeout arguments to avoid letting the kernel clobber the
caller's copies. now, the copying also serves to change the type from
userspace timespec to a pair of longs, which makes a difference only
in the 32-bit fallback case, not on 64-bit.

futex wait operations: add time64 syscall support, decouple 32-bit time_t

thanks to the original factorization using the __timedwait function,
there are no FUTEX_WAIT calls anywhere else, giving us a single point
of change to make nearly all the timed thread primitives time64-ready.
the one exception is the FUTEX_LOCK_PI command for PI mutex timedlock.
I haven't tried to make these two points share code, since they have
different fallbacks (no non-private fallback needed for PI since PI
was added later) and FUTEX_LOCK_PI isn't a cancellation point (thus
allowing the whole code path to inline into pthread_mutex_timedlock).

as for other changes in this series, the time64 syscall is used only
if it's the only one defined for the arch, or if the requested timeout
does not fit in 32 bits. on current 32-bit archs where time_t is a
32-bit type, this makes it statically unreachable.

on 64-bit archs, there are only superficial changes to the code after
preprocessing. on current 32-bit archs, the time is passed via an
intermediate copy to remove the assumption that time_t is a 32-bit
type.

semtimedop: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested timeout does not fit in 32 bits. on current 32-bit
archs where time_t is a 32-bit type, this makes it statically
unreachable.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the time is passed via an intermediate copy
to remove the assumption that time_t is a 32-bit type.

to avoid duplicating SYS_ipc/SYS_semtimedop choice logic, the code for
32-bit archs "falls through" after updating the timeout argument ts to
point to a [compound literal] array of longs. in preparation for
"time64-only" 32-bit archs, an extra case is added for neither SYS_ipc
nor the non-time64 SYS_semtimedop existing; the ENOSYS failure path
here should never be reachable, and is added just in case a compiler
can't see that it's not reachable, to avoid spurious static analysis
complaints.

sigtimedwait: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested timeout length does not fit in 32 bits. on current
32-bit archs where time_t is a 32-bit type, this makes it statically
unreachable.

on 64-bit archs, there are only superficial changes to the code after
preprocessing. on current 32-bit archs, the timeout is passed via an
intermediate copy to remove the assumption that time_t is a 32-bit
type.

mq_timedsend, mq_timedreceive: add time64, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested absolute timeout does not fit in 32 bits. on
current 32-bit archs where time_t is a 32-bit type, this makes it
statically unreachable.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the timeout is passed via an intermediate
copy to remove the assumption that time_t is a 32-bit type.

clock_nanosleep: add time64 syscall support, decouple 32-bit time_t

time64 syscall is used only if it's the only one defined for the arch,
or if the requested time does not fit in 32 bits. on current 32-bit
archs where time_t is a 32-bit type, this makes it statically
unreachable.

on 64-bit archs, there is no change to the code after preprocessing.
on current 32-bit archs, the time is moved through an intermediate
copy to remove the assumption that time_t is a 32-bit type.

implement settimeofday in terms of clock_settime, not old syscall

this is yet another place where special handling of time syscalls can
and should be avoided by implementing legacy functions in terms of
their modern replacements. in theory a fallback to SYS_settimeofday
could be added to clock_settime, but SYS_clock_settime has been
available since Linux 2.6.0 or earlier, i.e. all the way back to the
minimum supported version.

internally, define plain syscalls, if missing, as their time64 variants

this commit has no effect whatsoever right now, but is in preparation
for a future riscv32 port and other future 32-bit archs that will be
"time64-only" from the start on the kernel side.

together with the previous x32 changes, this commit ensures that
syscall call points that don't care about time (passing null timeouts,
etc.) can continue to do so without having to special-case time64-only
archs, and allows code using the time64 syscalls to uniformly test for
the need to fallback with SYS_foo != SYS_foo_time64, rather than
needing to check defined(SYS_foo) && SYS_foo != SYS_foo_time64.

internally, define time64 syscalls on x32 as the existing syscalls

x32 is odd in that it's the only ILP32 arch/ABI we have where time_t
is 64-bit rather than (32-bit) long, and this has always been
problematic in that it results in struct timespec having unused
padding space, since tv_nsec has type long, which the kernel insists
be zero- or sign-extended (due to negative tv_nsec being invalid, it
doesn't matter which) to match the x86_64 type.

up til now, we've had really ugly hacks in x32/syscall_arch.h to patch
up the timespecs passed to the kernel. but the same requirement to
zero- or sign-extend tv_nsec also applies to all the new time64
syscalls on true 32-bit archs. so let's take advantage of this to
clean things up.

this patch defines all of the time64 syscalls for x32 as aliases for
the existing syscalls by the same name. this establishes the following
invariants:

- if the time64 form is defined, it takes time arguments as 64-bit
  objects, and tv_nsec inputs must be zero-/sign-extended to 64-bit.

- if the time64 form is not defined, or if the time64 form is defined
  and is not equal to the "plain" form, the plain form takes time
  arguments as longs.

this will avoid the need for protocols for archs to define appropriate
types for each family of syscalls, and for the reader of the code to
have to be aware of such type definitions.

in some sense it might be simpler if the plain syscall form were
undefined for x32, so that it would always take longs if defined.
however, a number of these syscalls are used in contexts with a null
time argument, or (e.g. futex) for commands that don't involve time at
all, and having to introduce time64-specific logic to all those call
points does not make sense. thus, while the "plain" forms are kept now
just because they're needed until the affected code is converted over,
they'll also almost surely be kept in the future as well.

don't use futimesat syscall as utimensat fallback on x32

kernel support for x32 was added long after the utimensat syscall was
already available, so having a fallback is just wasted code size.

also, for changes related to time64 support on 32-bit archs, I want to
be able to assume the old futimesat syscall always works with longs,
which is true except for x32. by ensuring that it's not used on x32,
the needed invariant is established.

fix and simplify futimesat fallback in utimensat

previously the fallback wrongly failed with EINVAL rather than ENOSYS
when UTIME_NOW was used with one component but not both. commit
dd5f50da6f6c3df5647e922e47f8568a8896a752 introduced this behavior when
initially adding the fallback support.

instead, detect the case where both are UTIME_NOW early and replace
with a null times pointer; this may improve performance slightly (less
copy from user), and removes the complex logic from the fallback case.
it also makes things slightly simpler for adding time64 code paths.

refactor thrd_sleep and nanosleep in terms of clock_nanosleep

for namespace-safety with thrd_sleep, this requires an alias, which is
also added. this eliminates all but one direct call point for
nanosleep syscalls, and arranges that 64-bit time_t conversion logic
will only need to exist in one file rather than three.

as a bonus, clock_nanosleep with CLOCK_REALTIME and empty flags is now
implemented as SYS_nanosleep, thereby working on older kernels that
may lack POSIX clocks functionality.

use the correct stat structure in the fstat path

commit 01ae3fc6d48f4a45535189b7a6db286535af08ca modified fstatat to
translate the kernel's struct stat ("kstat") into the libc struct stat.
To do this, it created a local kstat object, and copied its contents
into the user-provided object.

However, the commit neglected to update the fstat compatibility path and
its fallbacks. They continued to pass the user-supplied object to the
kernel, later overwiting it with the uninitialized memory in the local
temporary.

refactor adjtime function using adjtimex function instead of syscall

this removes the assumption that userspace struct timex matches the
syscall type and sets the stage for 64-bit time_t on 32-bit archs.

refactor adjtimex in terms of clock_adjtime

this sets the stage for having the conversion logic for 64-bit time_t
all in one file, and as a bonus makes clock_adjtime for CLOCK_REALTIME
work even on kernels too old to have the clock_adjtime syscall.

fix inadvertent introduction of extern object stx

commit dfc81828f7ab41da08f744c44117a1bb20a05749 accidentally defined
an instance of struct statx along with the struct declaration.

implement fstatat with SYS_statx, conditional on undersized kstat time

this commit adds a new backend for fstatat (and thereby the whole stat
family) using the SYS_statx syscall, but conditions the new code on
the kernel stat structure's time fields being smaller than time_t. in
principle that should make it all dead code at present, but mips64 has
a broken stat structure with 32-bit time fields despite having 64-bit
time_t elsewhere, so on mips64 it is a functional change that makes
post-Y2038 filesystem timestamps accessible.

whenever the 32-bit archs end up getting 64-bit time_t, regardless of
how that happens, the changes in this commit will automatically take
effect for them too.

cleanup includes now that stat, lstat no longer make direct syscalls

restore property that fstat(AT_FDCWD) fails with EBADF

AT_FDCWD is not a valid file descriptor, so POSIX requires fstat to
fail with EBADF. if passed to fstatat, the call would spuriously
succeed and return results for the working directory.

remove mips/n32/64 stat struct hacks from syscall machinery

now that we have a kstat structure decoupled from the public struct
stat, we can just use the broken kernel structures directly and let
the code in fstatat do the translation.

decouple struct stat from kernel type

presently, all archs/ABIs have struct stat matching the kernel
stat[64] type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.

for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.

refactor all stat functions in terms of fstatat

equivalent logic for fstat+O_PATH fallback and direct use of
stat/lstat syscalls where appropriate is kept, now in the fstatat
function. this change both improves functionality (now, fstatat forms
equivalent to fstat/lstat/stat will work even on kernels too old to
have the at functions) and localizes direct interfacing with the
kernel stat structure to one file.

remove utterly wrong includes from mips64/n32 bits/stat.h

these were overlooked during review. bits headers are not allowed to
pull in additional headers (note: that rule is currently broken in
other places but just for endian.h). string.h has no place here
anyway, and including bits/alltypes.h without defining macros to
request types from it is a nop.

use register constraint instead of memory operand for riscv64 atomics

the "A" constraint is simply for an address expression that's a single
register, but it's not yet supported by clang, and has no advantage
here over just using a register operand for the address. the latter is
actually preferable in the a_cas_p case because it avoids aliasing an
lvalue onto the memory.

fix riscv64 atomic asm constraints

most egregious problem was the lack of memory clobber and lack of
volatile asm; this made the atomics memory barriers but not compiler
barriers. use of "+r" rather than "=r" for a clobbered temp was also
wrong, since the initial value is indeterminate.