Dmitry V. Levin [Wed, 20 Mar 2013 21:40:15 +0000 (21:40 +0000)]
Use 64-bit versions of stat, readdir and setrlimit functions when available
strace already has a mechanism to use fopen64 for output when the 64-bit
version of fopen is available on 32-bit architectures. Apply this
mechanism for other three functions to make strace fully adopted for
64-bit types.
* strace.c (struct_stat, stat_file, struct_dirent, read_dir,
struct_rlimit, set_rlimit): New macros.
(startup_attach): Use read_dir.
(startup_child): Use struct_stat and stat_file.
(main): Use struct_rlimit and set_rlimit.
Dmitry V. Levin [Wed, 20 Mar 2013 12:45:05 +0000 (12:45 +0000)]
Do not use struct dirent in readdir decoding
struct dirent from libc should not be used for umove'ing into because it
contains fixed size d_name.
* file.c (printdir): Rename to print_old_dirent.
[SH64]: Decode using struct kernel_dirent.
[!SH64]: Decode using an open-coded struct with 32-bit d_ino and d_off.
(sys_readdir): Update.
Dmitry V. Levin [Mon, 18 Mar 2013 23:28:29 +0000 (23:28 +0000)]
Fix build with older versions of libaio.h
* configure.ac: When libaio.h is available, check for
struct iocb.u.c.flags, IO_CMD_PWRITE and IO_CMD_PWRITEV.
* desc.c (print_common_flags): Check for HAVE_STRUCT_IOCB_U_C_FLAGS.
(sys_io_submit): Check for HAVE_DECL_IO_CMD_PWRITE and
HAVE_DECL_IO_CMD_PWRITEV.
Dmitry V. Levin [Mon, 18 Mar 2013 10:17:14 +0000 (10:17 +0000)]
Reorganize get_regs code, hopefully without functional changes
* syscall.c [I386 || ARM || OR1K || METAG] (ARCH_REGS_FOR_GETREGSET):
New macro.
(get_regset): Implement for AARCH64, METAG, OR1K and X32.
(get_regs) [AARCH64 || METAG || OR1K || X32]: Use it.
Denys Vlasenko [Thu, 7 Mar 2013 11:27:40 +0000 (12:27 +0100)]
Tweaks for -c: fixed setitimer/getitimer hack; optimized call_summary_pers()
count_syscall() was calling setitimer/getitimer once in order to find
smallest "tick" OS uses in time accounting, in order to use it
for syscalls which apparently spent less than that time in syscall.
The code assumed that this "tick" is not zero... but it is zero
on linux-3.6.11. Which means that this hack doesn't work...
At least this change prevents this measurement from being done
_repeatedly_, by initializing one_tick to -1, not 0.
While at it, added comments in count_syscall() explaining what we are doing.
Optimized call_summary_pers() a bit, by eliminating redundant tv -> float
conversions, and prevented 0.0/0.0 which was resulting in "% time"
being shown as "-nan" if total CPU time spent was 0.000000
(try "strace -c /bin/true").
The code seems to seriously underestimate CPU usage:
"strace -c ls -lR /usr/share >/dev/null" shows total time spent
in syscalls to be only ~10..20% of what "time ls -lR /usr/share >/dev/null"
shows.
It might be useful to have a mode where we show wall clock time
spent in syscalls, not CPU time. It might also be more accurate.
text data bss dec hex filename
245019 676 5708 251403 3d60b strace_old
244923 684 5676 251283 3d593 strace
Denys Vlasenko [Wed, 6 Mar 2013 22:44:23 +0000 (23:44 +0100)]
Open-code isprint(c) and isspace(c)
We don't call setlocale, thus we always use C locale.
But libc supports various other locales, and therefore
its ctype interface is general and at times inefficient.
For example, in glibc these macros result in function call,
whereas for e.g. isprint(c) just c >= ' ' && c <= 0x7e
suffices.
By open-coding ctype checks (we have only 4 of them)
we avoid function calls, we get smaller code:
text data bss dec hex filename
245127 680 5708 251515 3d67b strace_old
245019 676 5708 251403 3d60b strace
and we don't link in ctype tables (beneficial for static builds).
Denys Vlasenko [Tue, 5 Mar 2013 12:59:45 +0000 (13:59 +0100)]
Assorted fixes to syscallent.h
or1k was missing TM on many memory-related syscalls
sys_lookup_dcookie is 3-arg on 64-bit arches, and isn't TF
sys_recvmsg is 3-arg on all arches
sys_nfsservctl is 3-arg on all arches
sys_timerfd_create is 2-arg on all arches
sys_[f]truncate64 is 4-arg or 3-arg, never 5-arg
truncate64 is TF
sys_[l]lseek is TD
fstat[64] is TD
James Hogan [Fri, 22 Feb 2013 14:44:10 +0000 (14:44 +0000)]
Add support for Imagination Technologies Meta
Add support for Imagination Technologies Meta architecture (the
architecture/ABI is usually referred to as metag in code). The Meta
Linux kernel port is in the process of being upstreamed for v3.9 so it
uses generic system call numbers.
sys_lookup_dcookie writes a filename to buffer argument, so I've set
TF flag.
nfsservctl appears to be set to sys_ni_syscall in asm-generic/unistd.h
so I've left it blank.
truncate64/ftruncate64/pread64/pwrite64/readahead have unaligned 64bit
args which are packed tightly on metag, so less arguments on metag.
fchdir/llseek takes a file descriptor so s/TF/TD/
sync_file_range has 2 64bit args so uses 6 args, so s/4/6/
timerfd_create/msgget/msgctl/msgrcv/semget/segtimedop/semop/shmget/
shmctl/shmat/shmdt/recvmsg/migrate_pages have different number of args.
oldgetrlimit is just getrlimit for metag.
add TM flag to various memory syscalls.
metag doesn't directly use sys_mmap_pgoff for mmap2.
prlimit64/process_vm_readv/process_vm_writev take a pid so add TP flag.
fanotify_init doesn't appear to take a file descriptor so remove TD.
Add kcmp syscall.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Christian Svensson <blue@cmd.nu> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Denys Vlasenko [Wed, 27 Feb 2013 11:15:19 +0000 (12:15 +0100)]
Make -b take SYSCALL param, document it in --help and in manpage.
To not waste an option letter for just one trick,
extend -b to take a parameter:
"on which syscalls do you want to detach?".
Currently supports only execve.
While at it, fixed (by removing non-Linux and stale info)
and extended manpage text about -f.
Dmitry V. Levin [Tue, 26 Feb 2013 21:16:22 +0000 (21:16 +0000)]
Cleanup umoven and umovestr
Cleanup sloppy error handling.
First, EFAULT kind of errors from process_vm_readv by itself is not
something unusual, so a warning message will not be issued unless a
short read is detected.
Second, clients of umoven and umovestr are not prepared to detect and
handle short reads that can happen in these functions. The most safe
way to handle them is to return an error code.
* util.c (umoven, umovestr): Cleanup handling of errors coming from
process_vm_readv and PTRACE_PEEKDATA.
Ben Noordhuis [Tue, 26 Feb 2013 11:24:25 +0000 (12:24 +0100)]
Make umoven report success as 0, not >=0, stop returning success on partial reads
umoven() uses process_vm_readv() when available but it returns the
return value of that syscall, which is the number of bytes copied,
while its callers expect it to simply return zero on success.
It was causing syscalls that take a user-space argument to print
the abbreviated version, e.g.:
Denys Vlasenko [Fri, 22 Feb 2013 13:47:39 +0000 (14:47 +0100)]
Fix a bug in dumpstr (no null termination). Essentially rewrote dumpstr
This is a 14 year old bug (!).
It wasn't biting us merely because outstr[80] was static, thus ended up
in bss and whatever was after it "accidentally" provided the NUL byte.
When dumpstr was changed to use on-stack buffer, the bug reared its ugly head.
This is a rewrite which is smaller and should be significantly faster
for _long_ strings.
text data bss dec hex filename
244627 680 10860 256167 3e8a7 strace.t9/strace
244563 680 10860 256103 3e867 strace.ta/strace
* util.c (dumpstr): Rewrite to be faster and smaller.
Denys Vlasenko [Fri, 22 Feb 2013 12:37:36 +0000 (13:37 +0100)]
Eliminate MAX_QUALS, make qualifier array dynamic
MAX_QUALS was 2048, even though most arches used less than 500 entries
in it. MAX_QUALS had to be maintained by hand to be higher than syscall
count. It also limited the highest possible fd to track.
This change makes qual_flagsN[] arrays start sized to the required minimum
(number of syscalls) and grow dynamically if user requested
-e read=BIGNUM. As a precaution, BIGNUM should be < 2^15, but this limit
can be raised with no cost for normal strace invocations.
qual_flags is now a define to qual_vec[current_personality].
As a bonus, this patch aliases sysent, errnoent, signalent, ioctlent
names in one-personality arches to their corresponding <foo>0 arrays,
removing one indirection level.
text data bss dec hex filename
244471 700 12928 258099 3f033 strace.t7/strace
244627 680 10860 256167 3e8a7 strace.t8/strace
Denys Vlasenko [Fri, 22 Feb 2013 12:26:10 +0000 (13:26 +0100)]
Create and use struct_sysent and struct_ioctlent typedefs.
This is a preparatory mass replace patch with no code changes.
The future change will need to typedef sysent to sysent0,
which results in compile failures when "struct sysent" string
gets mangled into "struct sysent0".
Denys Vlasenko [Thu, 21 Feb 2013 15:13:47 +0000 (16:13 +0100)]
Eliminate many SCNO_IS_VALID checks
By adding tcp->s_ent pointer tot syscall table entry,
we can replace sysent[tcp->scno] references by tcp->s_ent.
More importantly, we may ensure that tcp->s_ent is always valid,
regardless of tcp->scno value. This allows us to drop
SCNO_IS_VALID(tcp->scno) checks before we access syscall
table entry.
We can optimize (qual_flags[tcp->scno] & QUAL_foo) checks
with a similar technique.
Resulting code shrink:
text data bss dec hex filename
245975 700 19072 265747 40e13 strace.t3/strace
245703 700 19072 265475 40d03 strace.t4/strace
* count.c (count_syscall): Use cheaper SCNO_IN_RANGE() check.
* defs.h: Add "int qual_flg" and "const struct sysent *s_ent"
to struct tcb. Remove "int u_nargs" from it.
Add UNDEFINED_SCNO constant which will mark undefined scnos
in tcp->qual_flg.
* pathtrace.c (pathtrace_match): Drop SCNO_IS_VALID check.
Use tcp->s_ent instead of sysent[tcp->scno].
* process.c (sys_prctl): Use tcp->s_ent->nargs instead of tcp->u_nargs.
(sys_waitid): Likewise.
* strace.c (init): Add compile-time check that DEFAULT_QUAL_FLAGS
constant is consistent with init code.
* syscall.c (decode_socket_subcall): Use tcp->s_ent->nargs
instead of tcp->u_nargs. Set tcp->qual_flg and tcp->s_ent.
(decode_ipc_subcall): Likewise.
(printargs): Use tcp->s_ent->nargs instead of tcp->u_nargs.
(printargs_lu): Likewise.
(printargs_ld): Likewise.
(get_scno): [MIPS,ALPHA] Use cheaper SCNO_IN_RANGE() check.
If !SCNO_IS_VALID, set tcp->s_ent and tcp->qual_flg to default values.
(internal_fork): Use tcp->s_ent instead of sysent[tcp->scno].
(syscall_fixup_for_fork_exec): Remove SCNO_IS_VALID check.
Use tcp->s_ent instead of sysent[tcp->scno].
(get_syscall_args): Likewise.
(get_error): Drop SCNO_IS_VALID check where it is redundant.
(dumpio): Drop SCNO_IS_VALID check where it is redundant.
Use tcp->s_ent instead of sysent[tcp->scno].
(trace_syscall_entering): Use (tcp->qual_flg & UNDEFINED_SCNO) instead
of SCNO_IS_VALID check. Use tcp->s_ent instead of sysent[tcp->scno].
Drop SCNO_IS_VALID check where it is redundant.
Print undefined syscall name with undefined_scno_name(tcp).
(trace_syscall_exiting): Likewise.
* util.c (setbpt): Use tcp->s_ent instead of sysent[tcp->scno].
Denys Vlasenko [Thu, 21 Feb 2013 14:46:34 +0000 (15:46 +0100)]
ARM: make it one-personality arch
ARM in fact _is_ one personality.
We had two personalities for it because it has a handful of
syscalls with huge scnos (0x000f00xx).
Extending syscall table to have [0x000f0005] index is of course
not a good idea.
Someone decided to handle that by having a separate personality
just for these syscalls.
But multi-personality arch does a bit more work in other parts.
This patch is another alternative: "move" 0x000f00nn syscalls
down to the entries just above last ordinary syscall,
by manipulating scno if it falls into the 0x000f00xx range.
In order to not worsen genuine undefined scnos' printing,
the code remaps scno back to actual value before printing
"syscall_NNN" string.
* defs.h: Remove multi-reprsonality defines from ARM.
* syscall.c (shuffle_scno): New function.
(undefined_scno_name): New function.
(get_scno): [ARM] Replace personality setting with scno shuffling.
(trace_syscall_entering): Print unknown syscall name using
undefined_scno_name().
(trace_syscall_exiting): Likewise.
* linux/arm/syscallent.h: Add ARM specific syscalls at the end.
* linux/arm/errnoent1.h: Deleted.
* linux/arm/ioctlent1.h: Deleted.
* linux/arm/signalent1.h: Deleted.
* linux/arm/syscallent1.h: Deleted.
Denys Vlasenko [Tue, 19 Feb 2013 15:30:31 +0000 (16:30 +0100)]
Fix NOMMU + daemonized tracer SEGV
pathname[] was getting destroyed, execve of garbage pathname
failing, and to top it off, the tracer's stack was also
smashed and trecer segfaulted.
* strace.c (exec_or_die): New function.
(startup_child): Don't use pathname[] contents after vfork,
make a malloced copy instead. Explain "NOMMU + -D bug"
and how we work around it.
Denys Vlasenko [Tue, 19 Feb 2013 10:28:20 +0000 (11:28 +0100)]
Clean up mmap decoding
Previous code merges too many similar, but different ways
of decoding mmap. For example, sys_old_mmap is "params in memory"
API... except SH[64], where it is "params in regs",
i.e. what sys_mmap ("new mmap") function does on other arches!
It's much simpler when every mmap handler has same API regardless
of arch. Where API means whether params are in regs or in memory,
and whether offset is in bytes, pages, or 4k blocks.
Then we just insert correct function pointers into
arch syscall tables.
It turns out there are four common mmap APIs over
all architectures which exist in Linux kernel,
and one outlier for S390.
A number of mmap decoders were plain wrong in arch tables.
For example, BFIN has no old_mmap. It returns ENOSYS.
I checked kernel sources for all arches nad fixed the tables.
There was dead code for x86_64 for old_mmap:
x86_64 has no old_mmap.
* mem.c: Refactor mmap functions so that we have five mmap syscall
handlers, each with the fixed API (not varying by arch).
* pathtrace.c (pathtrace_match): Adjust sys_func == mmap_func checks.
* linux/syscall.h: Declare new mmap syscall handler functions.
* linux/arm/syscallent.h: mmap2 is sys_mmap_pgoff.
* linux/avr32/syscallent.h: mmap is sys_mmap_pgoff.
* linux/bfin/syscallent.h: old_mmap is ENOSYS, mmap2 is sys_mmap_pgoff.
* linux/hppa/syscallent.h: mmap2 is sys_mmap_4koff.
* linux/i386/syscallent.h: mmap2 is sys_mmap_pgoff.
* linux/ia64/syscallent.h: mmap2 is sys_mmap_pgoff.
* linux/m68k/syscallent.h: mmap2 is sys_mmap_pgoff.
* linux/microblaze/syscallent.h: old_mmap is sys_mmap, mmap2 is sys_mmap_pgoff.
* linux/mips/syscallent.h: mmap is sys_mmap_4kgoff.
* linux/or1k/syscallent.h: mmap2 is sys_mmap_pgoff.
* linux/powerpc/syscallent.h: mmap2 is sys_mmap_4kgoff.
* linux/s390/syscallent.h: mmap2 is sys_old_mmap_pgoff.
* linux/s390x/syscallent.h: mmap is sys_old_mmap and thus has 1 arg.
* linux/sh/syscallent.h: old_mmap2 is sys_mmap, mmap2 is sys_mmap_4koff.
* linux/sh64/syscallent.h: Likewise.
* linux/sparc/syscallent1.h: mmap is TD|TM.
* linux/tile/syscallent1.h: mmap2 is sys_mmap_4koff.
Denys Vlasenko [Mon, 18 Feb 2013 14:47:57 +0000 (15:47 +0100)]
Remove code which supports systems with long long off_t.
While looking at mmap mess, did experimenting in order
to figure out what gets used when.
Tried building armv4tl, armv5l, armv6l, mips, mipsel, i686,
x86_64 and none of they have long long off_t,
which isn't suprprising: we aren't using glibc defines
which enable that.
Moreover, we SHOULD NOT use off_t in syscall decode!
Its size depends on libc, not on arch! I.e. it is essentially
unpredictable and can even in theory vary on the same arch
with different libc.
We should use longs or long longs, in a way which matches
architectural ABI for the given syscall. There are usually
*at most* two permutations, no need to add yet another variable
(sizeof(off_t)) to the mix.
This change removes almost all HAVE_LONG_LONG_OFF_T conditionals,
which will reveal further possible simplifications.
* mem.c: Remove code conditional on HAVE_LONG_LONG_OFF_T.
As a result, never remap sys_mmap64 to sys_mmap.
(print_mmap): Compile unconditionally.
(sys_old_mmap): Compile unconditionally.
(sys_mmap): Compile unconditionally.
* io.c (sys_sendfile): Add a FIXME comment.
* file.c: Remove code conditional on HAVE_LONG_LONG_OFF_T.
As a result, never remap sys_*stat64 to sys_*stat etc.
(sys_truncate): Compile unconditionally.
(realprintstat): Likewise.
(sys_stat): Likewise.
(sys_fstat): Likewise.
(sys_lstat): Likewise.
* desc.c (printflock): Likewise.
Denys Vlasenko [Mon, 18 Feb 2013 02:13:07 +0000 (03:13 +0100)]
Fixes in "new" mmap
* mem.c (sys_mmap): Ensure unsigned expansion of tcp->u_arg[5].
Add page shift of offset for I386.
Use tcp->ext_arg[5] as offset for X32.
(sys_old_mmap): [X32] Remove this function, X32 doesn't use is.
Denys Vlasenko [Mon, 18 Feb 2013 01:36:36 +0000 (02:36 +0100)]
Preliminary simplifications in mmap functions
* mem.c: Move "define sys_mmap64 sys_mmap" from the top
to the only place it affects.
(print_mmap): Make offset argument unsigned, for safer implicit conversions.
(sys_old_mmap): [IA64] use unsigned narrow_arg[].
Cast u_arg[5] (offset param) to unsigned long, to prevent erroneous signed
expansion.
Denys Vlasenko [Sun, 17 Feb 2013 21:41:33 +0000 (22:41 +0100)]
Remove broken HAVE_LONG_LONG conditionals
We use printllval without HAVE_LONG_LONG guards in many places,
but define it only if HAVE_LONG_LONG. This means that
on !HAVE_LONG_LONG systems we won't build for some time now.
* defs.h: Remove HAVE_LONG_LONG guard around LONG_LONG() macro
and printllval() function declaration.
* util.c: Remove HAVE_LONG_LONG guard around printllval()
function definition.
(printllval): Add compile-time error check for using wrong
if branch. Explain places where we deliberately use mismatched
types for printf formats.
Denys Vlasenko [Sun, 17 Feb 2013 00:38:14 +0000 (01:38 +0100)]
Comment inner workings of sys_[l]lseek
The code doesn't look fully correct to me, but I need to experiment
on actual x32 machine before I start "fixing" things.
For now, add comments, and optimize out one tprints() call...
* file.c (sys_lseek): Rename '_whence' as 'whence'.
Merge printing of ", " into subsequent tprintf.
(sys_lseek32): Likewise.
(sys_llseek): Likewise.
Denys Vlasenko [Fri, 15 Feb 2013 14:25:37 +0000 (15:25 +0100)]
Fix build error on Tile
* syscall.c (get_scno): [TILE] Remove TCB_WAITEXECVE check,
it is never true on Tile, and stopped compiling when
TCB_WAITEXECVE define was removed for Tile.
Denys Vlasenko [Fri, 15 Feb 2013 14:01:38 +0000 (15:01 +0100)]
x86: zero-extend 32-bit args in syscall entry instead of sign-extension
Zero-extension is slightly more common that sign-extension:
all pointers are zero-extended, and some other params are unsigned.
Whereas signed ones (fds, pids, etc) are often treated as
_32-bit ints_ even by kernel, so just unconditionally casting
such tcp->u_arg[N] to int works.
* syscall.c (get_syscall_args): [X86] Zero-extend 32-bit args
instead of sign-extension.
Denys Vlasenko [Fri, 15 Feb 2013 13:58:52 +0000 (14:58 +0100)]
Macroize conditional signed widening operation
* defs.h: Define widen_to_long() macro.
* signal.c (sys_kill): Use it instead of open-coding it.
(sys_tgkill): Use widen_to_long() on pids.
* resource.c (decode_rlimit): Formatting fix.
Denys Vlasenko [Fri, 15 Feb 2013 13:55:14 +0000 (14:55 +0100)]
A better handling of current_wordsize
On x86_64:
text data bss dec hex filename
435661 26628 47424 509713 7c711 strace_old
435501 26612 47440 509553 7c671 strace_new_clever_wordsize
On x32 and arm it should be even better, current_wordsize becomes
a constant there.
* defs.h: Declare current_wordsize as a variable if needed,
else declare as a constant define.
Remove declatation of personality_wordsize[].
* syscall.c: Make personality_wordsize[] static.
Declare current_wordsize as a variable if needed.
(set_personality): Set current_wordsize only if non-constant.
Denys Vlasenko [Thu, 14 Feb 2013 02:29:48 +0000 (03:29 +0100)]
[X86] Use ptrace(PTRACE_GETREGSET, NT_PRSTATUS) to get registers.
Unlike PTRACE_GETREGS, this new method detects 32-bit processes
reliably, without checking segment register values which
are undocumented and aren't part of any sort of API.
While at it, also fixed x32 detection to use __X32_SYSCALL_BIT,
as it should have been from the beginning.
* defs.h: Declare os_release and KERNEL_VERSION.
* strace.c: Make os_release non-static, remove KERNEL_VERSION define.
* syscall.c: New struct i386_user_regs_struct,
static union x86_regs_union and struct iovec x86_io.
(printcall): Use i386_regs or x86_64_regs depending on x86_io.iov_len.
(get_regs): On x86 and kernels 2.6.30+, use PTRACE_GETREGSET,
on earlier kernels fall back to old method.
(get_scno): [X86] Determine personality based on regset size
on scno & __X32_SYSCALL_BIT.
(syscall_fixup_on_sysenter): Use i386_regs or x86_64_regs depending
on x86_io.iov_len.
(get_syscall_args): Likewise.
(get_error): Likewise.
Denys Vlasenko [Wed, 13 Feb 2013 15:31:32 +0000 (16:31 +0100)]
Factor out code to check addr, fetch and print siginfo
* defs.h: Declare new function printsiginfo_at(tcp, addr).
* process.c (sys_waitid): Use printsiginfo_at().
(sys_ptrace): Likewise.
* signal.c: (printsiginfo_at): Implement this new function.
(sys_rt_sigsuspend): Use printsiginfo_at().
(sys_rt_sigtimedwait): Likewise.