Dmitry V. Levin [Mon, 20 Feb 2012 21:17:58 +0000 (21:17 +0000)]
Remove initialization of native_scno field
* linux/i386/syscallent.h: Remove native_scno initialization for clone,
fork and vfork.
* linux/ia64/syscallent.h (sys_fork, sys_vfork): Remove redirections
to printargs.
* linux/syscall.h [IA64]: Do not define SYS_fork and SYS_vfork.
* util.c (printcall) [IA64]: Likewise.
(setbpt): Use sys_func to check for clone, fork and vfork syscalls.
Dmitry V. Levin [Fri, 10 Feb 2012 22:33:36 +0000 (22:33 +0000)]
Remove initialization of native_scno field for most of syscalls
The native_scno field is not so much used in the code than before.
In many cases sys_func is checked instead, and for most of syscall
entries there is no need to initialize native_scno.
* linux/i386/syscallent.h: Remove native_scno initialization for
_exit, read, write, waitpid, execve, wait4, sysfs, readv, writev,
pread64, pwrite64, exit_group, waitid, send, recv, sendto and
recvfrom syscall entries.
* linux/syscall.h: Do not define no longer used SYS_waitid and
SYS_sub_* constants.
[IA64]: Do not define SYS_waitpid and SYS32_* constants.
* defs.h: Do not define no longer used __NR_exit_group constant.
* strace.c [USE_PROCFS] (proc_open): Use sys_func to check for execve.
H.J. Lu [Fri, 3 Feb 2012 18:19:55 +0000 (10:19 -0800)]
Skip the syscall entry if the sys_func field is NULL
Avoid NULL dereference when there are holes in sysent tables.
It can happen with syscall (number, ...) and number is in those holes.
There are no targets with holey systent tables so far, but at least
one such a target, x32, is already on the horizon.
* defs.h (SCNO_IN_RANGE): Also check the sys_func field.
H.J. Lu [Fri, 3 Feb 2012 18:10:30 +0000 (10:10 -0800)]
Define old stat functions only if needed
When HAVE_LONG_LONG_OFF_T is defined, those old stat functions aren't
used and strace won't link since they use realprintstat which isn't
defined when HAVE_LONG_LONG_OFF_T is defined.
* file.c (convertoldstat, sys_oldstat, sys_oldfstat, sys_oldlstat):
Define only if HAVE_LONG_LONG_OFF_T isn't defined.
Dmitry V. Levin [Sat, 4 Feb 2012 15:17:43 +0000 (15:17 +0000)]
Remove unused sys_pread64 and sys_pwrite64 parsers on Linux
* io.c [HAVE_LONG_LONG_OFF_T]: Remove sys_pread64 and sys_pwrite64
aliases.
(sys_pread64, sys_pwrite64): Define these functions only on
[SVR4 && _LFS64_LARGEFILE] platform.
* linux/mips/syscallent.h: Use sys_pread and sys_pwrite to handle
appropriate syscalls.
* linux/syscall.h (sys_pread64, sys_pwrite64): Remove.
* syscall.c (dumpio): Check sys_pread64 and sys_pwrite64 only on
[SVR4 && _LFS64_LARGEFILE] platform.
Denys Vlasenko [Sun, 29 Jan 2012 21:38:35 +0000 (22:38 +0100)]
Simple optimizations
text data bss dec hex filename
239474 672 20484 260630 3fa16 strace.before
239234 668 19044 258946 3f382 strace
* file.c (sprint_open_modes): Reduce static buffer size.
Simplify separator printing.
* signal.c (sprintsigmask): Reduce static buffer size.
Simplify separator printing and printing of almost full masks.
Use stpcpy instead of sprintf and strcpy+strlen.
* strace.c (startup_child): Don't strchr() for ':' twice in a row.
* util.c (sprintflags): Exit loop early if possible.
Denys Vlasenko [Sun, 29 Jan 2012 15:53:03 +0000 (16:53 +0100)]
Make interactive-ness directly controllable via command line option
Defaults are often ok, but when they are not, people get confused.
"Why can't I kill strace?" and "Why strace dies on ^C when I want
to _tracee_ to die instead?" are typical complaints.
* strace.c: Replace 'interactive' variable with 'opt_intr' variable.
Define INTR_foo constants for its possible values.
Define 'interactive' as a macro.
(usage): Document -I n option.
(main): Parse -I n option, modify signal handling to accomidate new
-I 1 and -I 4 modes.
Denys Vlasenko [Sun, 29 Jan 2012 01:01:44 +0000 (02:01 +0100)]
Add experimental code to use PTRACE_SEIZE, disabled by default
All new code is predicated on "ifdef USE_SEIZE". If it is not defined,
behavior is not changed.
If USE_SEIZE is enabled and run-time check shows that PTRACE_SEIZE works, then:
- All attaching is done with PTRACE_SEIZE + PTRACE_INTERRUPT.
This means that we no longer generate (and possibly race with) SIGSTOP.
- PTRACE_EVENT_STOP will be generated if tracee is group-stopped.
When we detect it, we issue PTRACE_LISTEN instead of PTRACE_SYSCALL.
This leaves tracee stopped. This fixes the inability to SIGSTOP or ^Z
a straced process.
* defs.h: Add commented-out "define USE_SEIZE 1" and define PTRACE_SEIZE
and related constants.
* strace.c: New variable post_attach_sigstop shows whether we age going
to expect SIGSTOP on attach (IOW: are we going to use PTRACE_SEIZE).
(ptrace_attach_or_seize): New function. Uses PTRACE_ATTACH or
PTRACE_SEIZE + PTRACE_INTERRUPT to attach to given pid.
(startup_attach): Use ptrace_attach_or_seize() instead of ptrace(PTRACE_ATTACH).
(startup_child): Conditionally use alternative attach method using PTRACE_SEIZE.
(test_ptrace_setoptions_followfork): More robust parameters to PTRACE_TRACEME.
(test_ptrace_seize): New function to test whether PTRACE_SEIZE works.
(main): Call test_ptrace_seize() while initializing.
(trace): If PTRACE_EVENT_STOP is seen, restart using PTRACE_LISTEN in order
to not let tracee run.
* process.c: Decode PTRACE_SEIZE, PTRACE_INTERRUPT, PTRACE_LISTEN.
* util.c (ptrace_restart): Add "LISTEN" to a possible error message.
Denys Vlasenko [Sat, 28 Jan 2012 00:46:33 +0000 (01:46 +0100)]
Use process_vm_readv instead of PTRACE_PEEKDATA to read data blocks
Currently, we use PTRACE_PEEKDATA to read things like filenames and
data passed by I/O syscalls.
PTRACE_PEEKDATA gets one word per syscall. This is VERY expensive.
For example, in order to print fstat syscall, we need to perform
more than twenty trips into kernel to fetch one struct stat!
Kernel 3.2 got a new syscall, process_vm_readv(), which can be used to
copy data blocks out of process' address space.
This change uses it in umoven() and umovestr() functions if possible,
with fallback to old method if process_vm_readv() fails.
If it returns ENOSYS, we don't try to use it anymore, eliminating
overhead of trying it on older kernels.
Result of "time strace -oLOG ls -l /usr/lib >/dev/null":
before patch: 0.372s
After patch: 0.262s
* util.c (process_vm_readv): Wrapper to call process_vm_readv syscall.
(umoven): Use process_vm_readv for block reads of tracee memory.
(umovestr): Likewise.
* linux/syscall.h: Declare new function sys_process_vm_readv.
* process.c (sys_process_vm_readv): Decoder for new syscall.
* linux/i386/syscallent.h: Add process_vm_readv, process_vm_writev syscalls.
* linux/x86_64/syscallent.h: Likewise.
* linux/powerpc/syscallent.h: Likewise.
Denys Vlasenko [Sat, 28 Jan 2012 00:25:03 +0000 (01:25 +0100)]
Fix a case of broken output if last seen syscall was exit
* defs.h: Rename tcp_last to printing_tcp. Explain what it means.
Remove printtrailer() function.
* process.c (sys_exit): Convert printtrailer() call to "printing_tcp = NULL".
* strace.c: Add new variable printing_tcp.
(cleanup): Convert printtrailer() call to "printing_tcp = NULL".
(trace): Likewise.
(trace): Fix checks for incomplete line - it was working wrongly if last syscall was exit.
(printleader): Set printing_tcp.
(printtrailer): Remove this function.
* syscall.c: Remove tcp_last variable.
(trace_syscall_entering): Don't set printing_tcp, printleader call now does it.
(trace_syscall_exiting): Convert printtrailer() call to "printing_tcp = NULL".
Denys Vlasenko [Sat, 28 Jan 2012 00:16:02 +0000 (01:16 +0100)]
Fix handling of test/threaded_execve.c testcase
Since 3.0, Linux has a way to identify which thread execve'ed.
This patch makes use of it in order to properly dispose
of disappeared ("superseded") thread leader,
and replace it with execve'ed thread.
Before this patch, strace was "leaking" thread which exec'ed.
It was thinking that it still runs. It would look like this:
18460 pause( <unfinished ...> <=== thread leader
18466 execve("/proc/self/exe", ["exe", "exe"], [/* 47 vars */] <unfinished ...>
18465 +++ exited with 0 +++ <=== exits from other threads
18460 <... pause resumed> ) = 0
The last line is wrong: it's not pause resumed, it's execve resumed.
If thread leader would do exit instead of pause, it is much worse:
strace panics because it thinks it sees return from exit syscall!
And strace isn't aware 18466 (exec'ed thread) is gone.
It still thinks it's executes execve syscall.
* strace.c: New variable "static char *os_release".
(get_os_release): New static function.
(main): Call get_os_release to retrieve Linux version.
(trace): If we see PTRACE_EVENT_EXEC, retrieve old pid, and if it
differs from new one, free one of tcbs and print correct messages.
In code before commit 44d0532, single fprintf was used on purpose:
we want to send entire message as one write() call. Since stderr
is unbuffered, separate fprintf's to it always result in separate
writes, they are not coalesced. If we aren't the only program
which writes to this particular stderr, this may result
in interleaved messages.
Since this function is not performance critical, I guess
it's ok to make it less efficient.
* strace.c (verror_msg): Attempt to print the message in single
write operation. Use separate fprintfs as a fallback if malloc fails.
Denys Vlasenko [Sat, 21 Jan 2012 03:01:56 +0000 (04:01 +0100)]
Improve code readability (logic is unchanged)
* util.c (umoven): Move assignment out of function call. Make assignment
to a flag variable later, closer to the place where it will be used.
(umovestr): Likewise.
(uload): Likewise.
Denys Vlasenko [Fri, 20 Jan 2012 10:56:00 +0000 (11:56 +0100)]
Change umovestr API: return > 0 instead of 0 if NUL was seen
* pathtrace.c (upathmatch): Adjust umovestr return value check for new API.
* util.c (printpathn): Use umovestr() > 0 return value for more efficient
(and robust - we don't depend on "no overwrote past NUL" behavior anymore)
handling of terminating NUL.
(printstr): Remove useless NUL placement before umovestr() call.
Allocate 1 byte more to outstr[] array - for NUL.
(umovestr): Change to return 1 if NUL was seen.
Denys Vlasenko [Fri, 20 Jan 2012 10:04:04 +0000 (11:04 +0100)]
Eliminate code duplication in time printing, reduce a few static buffers
text data bss dec hex filename
238454 664 28772 267890 41672 strace.before
238106 664 28676 267446 414b6 strace
* defs.h: Add TIMESPEC_TEXT_BUFSIZE and TIMEVAL_TEXT_BUFSIZE defines.
Add 'int special' parameter to sprinttv().
* time.c (sprinttv): Add 'int special' parameter, and use it
similarly to 'int special' parameter of printtv_bitness().
(printtv_bitness): Use sprinttv() instead of duplicating its code.
(print_timespec): Use sprint_timespec() instead of duplicating
its code.
* desc.c (decode_select): Use TIMEVAL_TEXT_BUFSIZE instead of 128
when checking remaining buffer size.
* net.c (sys_recvmsg): Use TIMESPEC_TEXT_BUFSIZE instead of 128
for static buffer size.
* stream.c (decode_poll): Use TIMESPEC_TEXT_BUFSIZE instead of 128
when checking remaining buffer size.
Denys Vlasenko [Thu, 19 Jan 2012 16:20:23 +0000 (17:20 +0100)]
Reduce bss usage and speed up string printing
text data bss dec hex filename
237913 660 49284 287857 46471 strace.before
237973 660 28772 267405 4148d strace
This reduces L1 D-cache pressure a bit: instead of dirtying
20k of bss, we will reuse already dirty stack area.
* util.c (printpathn): Use on-stack buffers instead of static ones.
Saves 5*MAXPATHLEN in bss.
(printstr): Use tprints() instead of tprintf("%s") when printing
formatted string. May be a bit faster, depending on libc.
Denys Vlasenko [Wed, 18 Jan 2012 15:30:47 +0000 (16:30 +0100)]
Get rid of TCB_SIGTRAPPED
On attempts to block or set SIGTRAP handler,
for example, using sigaction syscall, we generate
an additional SIGSTOP.
This change gets rid of this SIGSTOP sending/ignoring.
It appears to work just fine.
It also works if I force strace to not use PTRACE_O_TRACESYSGOOD,
which means strace stops will be marked with SIGTRAP,
not (SIGTRAP | 0x80) - I wondered maybe that's when
this hack is needed.
So, why we even have TCB_SIGTRAPPED? No one knows. It predates
version control: this code was present in the initial commit,
in 1999. No adequate comments, either.
Moreover, TCB_SIGTRAPPED is not set in sys_rt_sigaction
and sys_sigprocmask syscalls - the ones which are most usually
used to implement signal blocking, it is only set in obsolete
sys_signal, sys_sigaction, sys_sigsetmask, and in some dead
non-Linux code.
I think whatever bug it was fixing is gone long ago -
at least as long as sys_rt_sigaction is used by glibc.
Again, since glibc (and uclibc) uses sys_rt_sigaction
and sys_sigprocmask, modified code paths are not used
by most programs anyway.
* defs.h: Remove definition of TCB_SIGTRAPPED.
* signal.c (sys_sigvec): Don't set TCB_SIGTRAPPED and don't send SIGSTOP.
(sys_sigsetmask): Likewise.
(sys_sigaction): Likewise.
(sys_signal): Likewise.
* strace.c (trace): Remove code which executes if TCB_SIGTRAPPED is set.
Denys Vlasenko [Thu, 12 Jan 2012 10:26:34 +0000 (11:26 +0100)]
Make ERESTARTxyz messages more descriptive
There is widespread confusion about exact meaning
of ERESTARTxyz codes. Before this change, we were showing
all four of them the same: as "(To be restarted)".
This change prints better explanations for these codes,
and contains verbose comments which explain *why* we display
codes that way - or else someone confused
is bound to come later and mangle them again.
New messages are:
ERESTARTSYS (To be restarted if SA_RESTART is set)
ERESTARTNOINTR (To be restarted)
ERESTARTNOHAND (Interrupted by signal)
ERESTART_RESTARTBLOCK (Interrupted by signal)
* syscall.c (trace_syscall_exiting): Make ERESTARTxyz messages
more descriptive.
Denys Vlasenko [Tue, 10 Jan 2012 15:40:35 +0000 (16:40 +0100)]
Display mask on enter to sigreturn, not on exit
sys_sigreturn() performs ugly manipulations in order to show
signal mask which is restored by this syscall: on syscall entry,
fetches it from the stack, saves it in tcp->u_arg[]
(where it used to overflow this array - fixed sometime ago),
then retrieves the mask and displays it on syscall exit.
Apparently, the motivation is to make it slightly more obvious
to user that signal mask is restored only when this syscall returns.
IMO, this hardly justifies the necessary hacks. It is much easier
to display the mask at the point when we fetch it - on syscall entry.
While at it, I made it so that we do display returned value/errno.
I see no point in hiding it and showing uninformative "= ?" instead.
Example of pause() being interrupted by ALRM which has installed handler
which re-arms ALRM:
* defs.h: Declare struct pt_regs i386_regs and struct pt_regs x86_64_regs.
* syscall.c: Remove "static" keywork from these structures' definitions.
* signal.c (sys_sigreturn): Display mask on enter, not on exit.
Denys Vlasenko [Wed, 4 Jan 2012 14:15:26 +0000 (15:15 +0100)]
Do not detach from tracee which experienced ptrace error.
Before this patch, if a thread got nuked by exit in another thread
and we happened to poke it at the same time, we print "????(" thingy
and detach the thread. Since we removed "detach before death" logic,
this no longer matches the behavior of other threads.
Before patch:
[pid 1780] exit_group(1) = ?
[pid 1778] ????( <unfinished ...>
Process 1778 detached
[pid 5860] +++ exited with 1 +++
After:
[pid 17765] exit_group(1) = ?
[pid 21680] ????( <unfinished ...>
[pid 17791] +++ exited with 1 +++
[pid 21680] +++ exited with 1 +++
* strace (trace): Do not detach from tracee which experienced ptrace error.
Dmitry V. Levin [Mon, 26 Dec 2011 20:12:02 +0000 (20:12 +0000)]
Enhance decoding for personalities with small wordsize
* util.c (umoven, umovestr) [SUPPORTED_PERSONALITIES > 1]: If current
personality's wordsize is less than sizeof(long), use only significant
bits of the given address.
Dmitry V. Levin [Fri, 23 Dec 2011 00:50:49 +0000 (00:50 +0000)]
Enhance personality switching
On syscall entry, save current personality in the tcb structure
along with scno.
On syscall exit, restore current personality from the tcb structure.
* defs.h (struct tcb) [SUPPORTED_PERSONALITIES > 1]: Add currpers
field.
* strace.c (alloc_tcb) [SUPPORTED_PERSONALITIES > 1]: Initialize
tcp->currpers.
* syscall.c (update_personality) [SUPPORTED_PERSONALITIES > 1]: New
function.
(get_scno, trace_syscall_exiting): Use it.
Reported-by: Michael A Fetterman <mafetter@nvidia.com>
Heiko Carstens [Wed, 30 Nov 2011 12:16:29 +0000 (13:16 +0100)]
Fix sys_ipc/sys_semtimedop decoding on s390
The s390 kernel sys_ipc system call only takes five arguments instead of
six arguments which the common code sys_ipc implementation takes.
One of the arguments of the sys_semtimedop subcall is therefore passed in
a different register than in the common code implementation.
This leads to broken decoding of the timespec argument:
Dmitry V. Levin [Wed, 12 Oct 2011 19:03:29 +0000 (19:03 +0000)]
Add names for dummy parsers. No code changes
* linux/dummy.h: Add aliases to printargs() for those of dummy parsers
that had no own names before.
* linux/*/syscallent.h: Use these new names instead of printargs.
Dmitry V. Levin [Tue, 11 Oct 2011 17:07:05 +0000 (17:07 +0000)]
Implement decoding of splice, tee and vmsplice(2) syscalls
* io.c (print_loff_t): New function.
(sys_sendfile64): Use it.
(splice_flags): New xlat structure.
(sys_tee, sys_splice, sys_vmsplice): New functions.
* linux/syscall.h (sys_tee, sys_splice, sys_vmsplice): Declare them.
* linux/*/syscallent.h: Use them.
Do post-attach initialization earlier; fix "we ignore SIGSTOP on NOMMU" bug
We set ptrace options when we see post-attach SIGSTOP.
This is wrong: it's better to set them right away on the very first
stop (whichever it will be). It also will make adding SEIZE support easier,
since SEIZE has no post-attach SIGSTOP.
We do it by adding a new bit, TCB_IGNORE_ONE_SIGSTOP, and treating
TCB_STARTUP and TCB_IGNORE_ONE_SIGSTOP as two slightly different things.
* defs.h: Add a new flag bit, TCB_IGNORE_ONE_SIGSTOP.
* process.c (internal_fork): Set TCB_IGNORE_ONE_SIGSTOP on a newly added child.
* strace.c (startup_attach): Set TCB_IGNORE_ONE_SIGSTOP after attach.
Fix a case when "strace -p PID" found PID dead but sone other of its threads
still alive.
(startup_child): Set TCB_IGNORE_ONE_SIGSTOP after attach, _if needed_.
This fixes a bogus case where we can ignore a _real_ SIGSTOP on NOMMU.
(detach): Perform anti-SIGSTOP dance only if TCB_IGNORE_ONE_SIGSTOP is set,
not if TCB_STARTUP is set.
(trace): Set TCB_IGNORE_ONE_SIGSTOP after attach.
Clear TCB_STARTUP and initialize tracee on the very first tracee stop.
Clear TCB_IGNORE_ONE_SIGSTOP when SIGSTOP is seen.
* defs.h: Remove TCB_ATTACH_DONE constant.
* strace.c (startup_attach): Use TCB_STARTUP instead of TCB_ATTACH_DONE
to distinquish attached from not-yet-attached threads.
This fixes logic in detach() which thinks that TCB_STARTUP
means that we are already attached, but did not see SIGSTOP yet.
This also allows to get rid of TCB_ATTACH_DONE flag.
* process.c (internal_fork): Set TCB_STARTUP after attach.
* strace.c (startup_attach): Likewise.
(startup_child): Likewise.
(alloc_tcb): Do not set TCB_STARTUP on tcb allocation - we are
not attached yet.
(trace): Set TCB_STARTUP when we detech an auto-attached child.
After recent change, select(2^31-1, NULL, NULL, NULL)
would make strace exit. This change caps fdsize so that
it is always in [0, 1025*1024], IOW: we will try to allocate at most
1 megabyte, which in practice will almost always work,
unlike malloc(2Gig).
* desc.c (decode_select): Cap fdsize to 1024*1024.
* pathtrace.c (pathtrace_match): Cap fdsize to 1024*1024.
* file.c (sys_getdents): Cap len to 1024*1024.
(sys_getdents64): Cap len to 1024*1024.
* util.c (dumpiov): Refuse to process iov with more than 1024*1024
elements. Don't die on malloc failure.
(dumpstr): Don't die on malloc failure.