Denys Vlasenko [Sat, 21 Jan 2012 03:01:56 +0000 (04:01 +0100)]
Improve code readability (logic is unchanged)
* util.c (umoven): Move assignment out of function call. Make assignment
to a flag variable later, closer to the place where it will be used.
(umovestr): Likewise.
(uload): Likewise.
Denys Vlasenko [Fri, 20 Jan 2012 10:56:00 +0000 (11:56 +0100)]
Change umovestr API: return > 0 instead of 0 if NUL was seen
* pathtrace.c (upathmatch): Adjust umovestr return value check for new API.
* util.c (printpathn): Use umovestr() > 0 return value for more efficient
(and robust - we don't depend on "no overwrote past NUL" behavior anymore)
handling of terminating NUL.
(printstr): Remove useless NUL placement before umovestr() call.
Allocate 1 byte more to outstr[] array - for NUL.
(umovestr): Change to return 1 if NUL was seen.
Denys Vlasenko [Fri, 20 Jan 2012 10:04:04 +0000 (11:04 +0100)]
Eliminate code duplication in time printing, reduce a few static buffers
text data bss dec hex filename
238454 664 28772 267890 41672 strace.before
238106 664 28676 267446 414b6 strace
* defs.h: Add TIMESPEC_TEXT_BUFSIZE and TIMEVAL_TEXT_BUFSIZE defines.
Add 'int special' parameter to sprinttv().
* time.c (sprinttv): Add 'int special' parameter, and use it
similarly to 'int special' parameter of printtv_bitness().
(printtv_bitness): Use sprinttv() instead of duplicating its code.
(print_timespec): Use sprint_timespec() instead of duplicating
its code.
* desc.c (decode_select): Use TIMEVAL_TEXT_BUFSIZE instead of 128
when checking remaining buffer size.
* net.c (sys_recvmsg): Use TIMESPEC_TEXT_BUFSIZE instead of 128
for static buffer size.
* stream.c (decode_poll): Use TIMESPEC_TEXT_BUFSIZE instead of 128
when checking remaining buffer size.
Denys Vlasenko [Thu, 19 Jan 2012 16:20:23 +0000 (17:20 +0100)]
Reduce bss usage and speed up string printing
text data bss dec hex filename
237913 660 49284 287857 46471 strace.before
237973 660 28772 267405 4148d strace
This reduces L1 D-cache pressure a bit: instead of dirtying
20k of bss, we will reuse already dirty stack area.
* util.c (printpathn): Use on-stack buffers instead of static ones.
Saves 5*MAXPATHLEN in bss.
(printstr): Use tprints() instead of tprintf("%s") when printing
formatted string. May be a bit faster, depending on libc.
Denys Vlasenko [Wed, 18 Jan 2012 15:30:47 +0000 (16:30 +0100)]
Get rid of TCB_SIGTRAPPED
On attempts to block or set SIGTRAP handler,
for example, using sigaction syscall, we generate
an additional SIGSTOP.
This change gets rid of this SIGSTOP sending/ignoring.
It appears to work just fine.
It also works if I force strace to not use PTRACE_O_TRACESYSGOOD,
which means strace stops will be marked with SIGTRAP,
not (SIGTRAP | 0x80) - I wondered maybe that's when
this hack is needed.
So, why we even have TCB_SIGTRAPPED? No one knows. It predates
version control: this code was present in the initial commit,
in 1999. No adequate comments, either.
Moreover, TCB_SIGTRAPPED is not set in sys_rt_sigaction
and sys_sigprocmask syscalls - the ones which are most usually
used to implement signal blocking, it is only set in obsolete
sys_signal, sys_sigaction, sys_sigsetmask, and in some dead
non-Linux code.
I think whatever bug it was fixing is gone long ago -
at least as long as sys_rt_sigaction is used by glibc.
Again, since glibc (and uclibc) uses sys_rt_sigaction
and sys_sigprocmask, modified code paths are not used
by most programs anyway.
* defs.h: Remove definition of TCB_SIGTRAPPED.
* signal.c (sys_sigvec): Don't set TCB_SIGTRAPPED and don't send SIGSTOP.
(sys_sigsetmask): Likewise.
(sys_sigaction): Likewise.
(sys_signal): Likewise.
* strace.c (trace): Remove code which executes if TCB_SIGTRAPPED is set.
Denys Vlasenko [Thu, 12 Jan 2012 10:26:34 +0000 (11:26 +0100)]
Make ERESTARTxyz messages more descriptive
There is widespread confusion about exact meaning
of ERESTARTxyz codes. Before this change, we were showing
all four of them the same: as "(To be restarted)".
This change prints better explanations for these codes,
and contains verbose comments which explain *why* we display
codes that way - or else someone confused
is bound to come later and mangle them again.
New messages are:
ERESTARTSYS (To be restarted if SA_RESTART is set)
ERESTARTNOINTR (To be restarted)
ERESTARTNOHAND (Interrupted by signal)
ERESTART_RESTARTBLOCK (Interrupted by signal)
* syscall.c (trace_syscall_exiting): Make ERESTARTxyz messages
more descriptive.
Denys Vlasenko [Tue, 10 Jan 2012 15:40:35 +0000 (16:40 +0100)]
Display mask on enter to sigreturn, not on exit
sys_sigreturn() performs ugly manipulations in order to show
signal mask which is restored by this syscall: on syscall entry,
fetches it from the stack, saves it in tcp->u_arg[]
(where it used to overflow this array - fixed sometime ago),
then retrieves the mask and displays it on syscall exit.
Apparently, the motivation is to make it slightly more obvious
to user that signal mask is restored only when this syscall returns.
IMO, this hardly justifies the necessary hacks. It is much easier
to display the mask at the point when we fetch it - on syscall entry.
While at it, I made it so that we do display returned value/errno.
I see no point in hiding it and showing uninformative "= ?" instead.
Example of pause() being interrupted by ALRM which has installed handler
which re-arms ALRM:
* defs.h: Declare struct pt_regs i386_regs and struct pt_regs x86_64_regs.
* syscall.c: Remove "static" keywork from these structures' definitions.
* signal.c (sys_sigreturn): Display mask on enter, not on exit.
Denys Vlasenko [Wed, 4 Jan 2012 14:15:26 +0000 (15:15 +0100)]
Do not detach from tracee which experienced ptrace error.
Before this patch, if a thread got nuked by exit in another thread
and we happened to poke it at the same time, we print "????(" thingy
and detach the thread. Since we removed "detach before death" logic,
this no longer matches the behavior of other threads.
Before patch:
[pid 1780] exit_group(1) = ?
[pid 1778] ????( <unfinished ...>
Process 1778 detached
[pid 5860] +++ exited with 1 +++
After:
[pid 17765] exit_group(1) = ?
[pid 21680] ????( <unfinished ...>
[pid 17791] +++ exited with 1 +++
[pid 21680] +++ exited with 1 +++
* strace (trace): Do not detach from tracee which experienced ptrace error.
Dmitry V. Levin [Mon, 26 Dec 2011 20:12:02 +0000 (20:12 +0000)]
Enhance decoding for personalities with small wordsize
* util.c (umoven, umovestr) [SUPPORTED_PERSONALITIES > 1]: If current
personality's wordsize is less than sizeof(long), use only significant
bits of the given address.
Dmitry V. Levin [Fri, 23 Dec 2011 00:50:49 +0000 (00:50 +0000)]
Enhance personality switching
On syscall entry, save current personality in the tcb structure
along with scno.
On syscall exit, restore current personality from the tcb structure.
* defs.h (struct tcb) [SUPPORTED_PERSONALITIES > 1]: Add currpers
field.
* strace.c (alloc_tcb) [SUPPORTED_PERSONALITIES > 1]: Initialize
tcp->currpers.
* syscall.c (update_personality) [SUPPORTED_PERSONALITIES > 1]: New
function.
(get_scno, trace_syscall_exiting): Use it.
Reported-by: Michael A Fetterman <mafetter@nvidia.com>
Heiko Carstens [Wed, 30 Nov 2011 12:16:29 +0000 (13:16 +0100)]
Fix sys_ipc/sys_semtimedop decoding on s390
The s390 kernel sys_ipc system call only takes five arguments instead of
six arguments which the common code sys_ipc implementation takes.
One of the arguments of the sys_semtimedop subcall is therefore passed in
a different register than in the common code implementation.
This leads to broken decoding of the timespec argument:
Dmitry V. Levin [Wed, 12 Oct 2011 19:03:29 +0000 (19:03 +0000)]
Add names for dummy parsers. No code changes
* linux/dummy.h: Add aliases to printargs() for those of dummy parsers
that had no own names before.
* linux/*/syscallent.h: Use these new names instead of printargs.
Dmitry V. Levin [Tue, 11 Oct 2011 17:07:05 +0000 (17:07 +0000)]
Implement decoding of splice, tee and vmsplice(2) syscalls
* io.c (print_loff_t): New function.
(sys_sendfile64): Use it.
(splice_flags): New xlat structure.
(sys_tee, sys_splice, sys_vmsplice): New functions.
* linux/syscall.h (sys_tee, sys_splice, sys_vmsplice): Declare them.
* linux/*/syscallent.h: Use them.
Do post-attach initialization earlier; fix "we ignore SIGSTOP on NOMMU" bug
We set ptrace options when we see post-attach SIGSTOP.
This is wrong: it's better to set them right away on the very first
stop (whichever it will be). It also will make adding SEIZE support easier,
since SEIZE has no post-attach SIGSTOP.
We do it by adding a new bit, TCB_IGNORE_ONE_SIGSTOP, and treating
TCB_STARTUP and TCB_IGNORE_ONE_SIGSTOP as two slightly different things.
* defs.h: Add a new flag bit, TCB_IGNORE_ONE_SIGSTOP.
* process.c (internal_fork): Set TCB_IGNORE_ONE_SIGSTOP on a newly added child.
* strace.c (startup_attach): Set TCB_IGNORE_ONE_SIGSTOP after attach.
Fix a case when "strace -p PID" found PID dead but sone other of its threads
still alive.
(startup_child): Set TCB_IGNORE_ONE_SIGSTOP after attach, _if needed_.
This fixes a bogus case where we can ignore a _real_ SIGSTOP on NOMMU.
(detach): Perform anti-SIGSTOP dance only if TCB_IGNORE_ONE_SIGSTOP is set,
not if TCB_STARTUP is set.
(trace): Set TCB_IGNORE_ONE_SIGSTOP after attach.
Clear TCB_STARTUP and initialize tracee on the very first tracee stop.
Clear TCB_IGNORE_ONE_SIGSTOP when SIGSTOP is seen.
* defs.h: Remove TCB_ATTACH_DONE constant.
* strace.c (startup_attach): Use TCB_STARTUP instead of TCB_ATTACH_DONE
to distinquish attached from not-yet-attached threads.
This fixes logic in detach() which thinks that TCB_STARTUP
means that we are already attached, but did not see SIGSTOP yet.
This also allows to get rid of TCB_ATTACH_DONE flag.
* process.c (internal_fork): Set TCB_STARTUP after attach.
* strace.c (startup_attach): Likewise.
(startup_child): Likewise.
(alloc_tcb): Do not set TCB_STARTUP on tcb allocation - we are
not attached yet.
(trace): Set TCB_STARTUP when we detech an auto-attached child.
After recent change, select(2^31-1, NULL, NULL, NULL)
would make strace exit. This change caps fdsize so that
it is always in [0, 1025*1024], IOW: we will try to allocate at most
1 megabyte, which in practice will almost always work,
unlike malloc(2Gig).
* desc.c (decode_select): Cap fdsize to 1024*1024.
* pathtrace.c (pathtrace_match): Cap fdsize to 1024*1024.
* file.c (sys_getdents): Cap len to 1024*1024.
(sys_getdents64): Cap len to 1024*1024.
* util.c (dumpiov): Refuse to process iov with more than 1024*1024
elements. Don't die on malloc failure.
(dumpstr): Don't die on malloc failure.
Minor tweaks in startup_child(). Logic isn't changed (but code is)
* strace.c (startup_attach): Tweak comment.
(startup_child): Move common code out of ifdef.
Indent nested ifdefs. Tweak comments. Remove two
unnecessary calls to getpid().
Denys Vlasenko [Wed, 31 Aug 2011 13:58:06 +0000 (15:58 +0200)]
Add README-linux-ptrace file
I tried to push this doc to Michael Kerrisk <mtk.manpages@gmail.com>,
but got no reply. To avoid losing the document, let it live
in strace tree for now.
Denys Vlasenko [Wed, 31 Aug 2011 10:26:03 +0000 (12:26 +0200)]
Optimization: eliminate all remaining usages of strcat()
After this change, we don't use strcat() anywhere.
* defs.h: Change sprinttv() return type to char *.
* time.c (sprinttv): Return pointer past last stored char.
* desc.c (decode_select): Change printing logic in order to eliminate
usage of strcat() - use stpcpy(), *outptr++ = ch, sprintf() instead.
Also reduce usage of strlen().
* stream.c (decode_poll): Likewise.
Denys Vlasenko [Wed, 31 Aug 2011 10:22:56 +0000 (12:22 +0200)]
Optimize string_quote() for speed
* util.c (string_quote): Speed up check for terminating NUL.
Replace strintf() with open-coded binary to hex/oct conversions -
we potentially do them for every single byte, need to be fast.
Denys Vlasenko [Tue, 30 Aug 2011 16:53:49 +0000 (18:53 +0200)]
On X86_64 and I386, use PTRACE_GETREGS to fetch all registers
Before this change, registers were read with PTRACE_PEEKUSER
ptrace operation, one per register. This is slower than
fetching them all in one ptrace operation.
* defs.h: include asm/ptrace.h on X86_64 and I386.
* syscall.c: New static variables i386_regs and x86_64_regs.
Remove static eax/rax variables.
(get_scno): Fetch all registers with single PTRACE_GETREGS operation.
(get_syscall_result): Likewise.
(syscall_fixup_on_sysenter): Use PTRACE_GETREGS results in i386/x86_64_regs.
(syscall_enter): Set tcp->u_arg[i] from PTRACE_GETREGS results.
(get_error): Set tcp->u_rval, tcp->u_error from PTRACE_GETREGS results.
Denys Vlasenko [Thu, 25 Aug 2011 08:25:35 +0000 (10:25 +0200)]
Simplify syscall_fixup[_on_sysexit]
* syscall.c (syscall_fixup): Remove checks for entering(tcp).
Remove code which executes if exiting(tcp).
(syscall_fixup_on_sysexit): Remove code which executes
if entering(tcp). Remove checks for exiting(tcp).
Denys Vlasenko [Thu, 25 Aug 2011 08:23:00 +0000 (10:23 +0200)]
Split syscall_fixup into enter/exit pair of functions
* syscall.c: Create syscall_fixup_on_sysexit() which is a copy of
syscall_fixup().
(trace_syscall_exiting): Call syscall_fixup_on_sysexit() instead of
syscall_fixup().
Denys Vlasenko [Wed, 24 Aug 2011 23:27:59 +0000 (01:27 +0200)]
Optimize tabto()
tabto is used in many lines of strace output.
On glibc, tprintf("%*s", col - curcol, "") is noticeably slow
compared to tprintf(" "). Use the latter.
Observed ~15% reduction of time spent in userspace.
* defs.h: Drop extern declaration of acolumn. Make tabto()
take no parameters.
* process.c (sys_exit): Call tabto() with no parameters.
* syscall.c (trace_syscall_exiting): Call tabto() with no parameters.
* strace.c: Make acolumn static, add static char *acolumn_spaces.
(main): Allocate acolumn_spaces as a string of spaces.
(printleader): Call tabto() with no parameters.
(tabto): Use simpler method to print lots of spaces.
Denys Vlasenko [Wed, 24 Aug 2011 23:13:43 +0000 (01:13 +0200)]
Opotimize "scno >= 0 && scno < nsyscalls" check
gcc can't figure out on its own that this check can be done with
single compare, and does two compares. We can help it by casting
scno to unsigned long: ((unsigned long)(scno) < nsyscalls)
* defs.h: New macro SCNO_IN_RANGE(long_var).
* count.c (count_syscall): Use SCNO_IN_RANGE() instead of open-coded check.
* syscall.c (getrval2): Use SCNO_IN_RANGE() instead of open-coded check.
This fixes a bug: missing check for scno < 0 and scno > nsyscalls
instead of scno >= nsyscalls.
(get_scno): Use SCNO_IN_RANGE() instead of open-coded check.
This fixes a bug: scno > nsyscalls instead of scno >= nsyscalls.
(known_scno): Use SCNO_IN_RANGE() instead of open-coded check.
(internal_syscall): Likewise.
(syscall_enter): Likewise.
(trace_syscall_entering): Likewise.
(get_error): Likewise.
(trace_syscall_exiting): Likewise.
Denys Vlasenko [Wed, 24 Aug 2011 15:25:32 +0000 (17:25 +0200)]
Unify per-architecture post-execve SIGTRAP check.
Move post-execve SIGTRAP check from get_scno_on_sysenter
(multitude of places on many architectures) to a single location
in trace_syscall_entering. This loosens the logic for some arches,
since many of them had additional checks such as scno == 0.
However, on non-ancient Linux kernels we should never have post-execve
SIGTRAP in the first place, by virtue of using PTRACE_O_TRACEEXEC.
Denys Vlasenko [Wed, 24 Aug 2011 14:59:23 +0000 (16:59 +0200)]
Speed up x86 by avoiding EAX read on syscall entry
on x86, EAX read on syscall entry is not necessary if we know
that post-execve SIGTRAP is disabled by PTRACE_O_TRACEEXEC ptrace option.
This patch (a) moves EAX retrieval from syscall_fixup
to get_scno_on_sysexit, and (b) perform EAX retrieval in syscall_fixup
only if we are in syscall entry and PTRACE_O_TRACEEXEC option is not on.
* syscall.c (get_scno_on_sysexit): On I386 and X86_64, read eax/rax
which contain syscall return value.
(syscall_fixup): On I386 and X86_64, read eax/rax only on syscall enter
and only if PTRACE_O_TRACEEXEC is not in effect.
Denys Vlasenko [Wed, 24 Aug 2011 14:56:03 +0000 (16:56 +0200)]
Do not read syscall no in get_scno_on_sysexit
* syscall.c (get_scno_on_sysexit): Remove scno retrieval code, since
we don't save it anyway. This is the first real logic change
which should make strace faster: for example, on x64 ORIG_EAX
is no longer read in each syscall exit.
Denys Vlasenko [Wed, 24 Aug 2011 14:47:32 +0000 (16:47 +0200)]
get_scno is an unholy mess, make it less horrible
Currently, get_scno does *much* more than "get syscall no".
It checks for post-execve SIGTRAP. It checks for changes
in personality. It retrieves params on entry and registers on exit.
Worse still, it is different in different architectures: for example,
for AVR32 regs are fetched in get_scno(), while for e.g. I386
it is done in syscall_enter().
Another problem is that get_scno() is called on both syscall entry and
syscall exit, which is stupid: we don't need to know scno on syscall
exit, it is already known from last syscall entry and stored in
tcp->scno! In essence, get_scno() does two completely different things
on syscall entry and on exit, they are just mixed into one bottle, like
shampoo and conditioner.
The following patches will try to improve this situation.
This change duplicates get_scno into identical get_scno_on_sysenter,
get_scno_on_sysexit functions. Call them in syscall enter and syscall
exit, correspondingly.
* defs.h: Rename get_scno to get_scno_on_sysenter; declare it only
if USE_PROCFS.
* strace.c (proc_open): Call get_scno_on_sysenter instead of get_scno.
* syscall.c (get_scno): Split into two (so far identical) functions
get_scno_on_sysenter and get_scno_on_sysexit.
(trace_syscall_entering): Call get_scno_on_sysenter instead of get_scno.
(trace_syscall_exiting): Call get_scno_on_sysexit instead of get_scno.
Dmitry V. Levin [Tue, 23 Aug 2011 16:24:20 +0000 (16:24 +0000)]
Reduce code redundancy in syscall_enter()
* syscall.c [LINUX] (syscall_enter): Move tcp->u_nargs initialization
from arch-specific ifdefs to common code. Always cache tcp->u_nargs in
a local variable and use it in for() loops.
[IA64, AVR32] Rewrite tcp->u_arg[] initialization using a loop.
Denys Vlasenko [Tue, 23 Aug 2011 16:04:25 +0000 (18:04 +0200)]
Define MAX_ARGS to 6 for all Linux arches
* defs.h: Define MAX_ARGS to 6 for all Linux arches.
* linux/ia64/syscallent.h: Change all 8-argument printargs
to MA (MAX_ARGS).
linux/mips/syscallent.h: Change all two 7-argument printargs
to MA (MAX_ARGS).
Denys Vlasenko [Tue, 23 Aug 2011 11:32:38 +0000 (13:32 +0200)]
Cache tcp->u_nargs in a local variable for for() loops
Loops of the form "for (i = 0; i < tcp->u_nargs; i++) ..."
need to fetch tcp->u_nargs from memory on every iteration
if "..." part has a function call (gcc doesn't know that
tcp->u_nargs won't change). This can be sped up
by putting tcp->u_nargs in a local variable, which might
go into a CPU register.
* syscall.c (decode_subcall): Cache tcp->u_nargs in a local variable
as for() loop limit value.
(syscall_enter): Likewise.