Fanda Uchytil [Sat, 5 Oct 2019 22:55:13 +0000 (00:55 +0200)]
strace: expand -D option
As of now, despite the statement that the -D option runs strace as
a "detached" grandchild (and the option being named after "daemon"),
strace still runs in the same process group and session, so it is not
"detached" in the common sense and remains subject to process group kill
and session termination kill. Quoting[1]:
I stumble upon unexpected behavior: if strace is used with option '-D'
(tracer as a detached grandchild) and process (leader) kills whole
process group, it will kill strace too.
It can be easily reproduced by `timeout` from "coreutils":
Here is "strace log" of `strace` inside `timeout`:
strace: Process 30603 attached
wait4(-1, <unfinished ...>) = ?
+++ killed by SIGKILL +++
I think that detached `strace` should not be killed like that -- it
should not be part of former grandparents' "job pipeline".
While this behaviour is not exactly intuitive, it has been implemented
this way for quite some time, so it might be relied upon by some strace
users. In order to address this issue, two new levels of
"daemonisation" are added, that put strace in a separate process group
and session, respectively.
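
The two additional levels can be sketched as follows. This is only an
illustration under stated assumptions: the enum and function names are
hypothetical and not taken from strace, and the demo helper exists only to
show the effect in a forked child. The sketch relies on the POSIX calls
setpgid(2) and setsid(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names; strace's own identifiers may differ. */
enum detach_level {
	DETACH_GRANDCHILD,	/* classic -D: same process group and session */
	DETACH_PGRP,		/* additionally move to a new process group */
	DETACH_SESSION		/* additionally start a new session */
};

/*
 * Apply the requested level in the calling (tracer) process.
 * Returns 0 on success, -1 on error with errno set.
 */
static int
detach_tracer(enum detach_level level)
{
	switch (level) {
	case DETACH_GRANDCHILD:
		return 0;
	case DETACH_PGRP:
		return setpgid(0, 0);
	case DETACH_SESSION:
		/* setsid() also makes the caller a process group leader. */
		return setsid() == (pid_t) -1 ? -1 : 0;
	}
	return -1;
}

/*
 * Demo helper: fork a child, detach it, and report whether it ended up
 * leading its own process group (and its own session for DETACH_SESSION).
 */
static int
detached_child_ok(enum detach_level level)
{
	pid_t pid = fork();
	if (pid < 0)
		return 0;
	if (pid == 0) {
		int ok = detach_tracer(level) == 0
			 && getpgrp() == getpid()
			 && (level != DETACH_SESSION || getsid(0) == getpid());
		_exit(ok ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) != pid)
		return 0;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A process in its own process group is not hit by a kill aimed at the
original group, and a process in its own session does not go down with
the original session.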
Dmitry V. Levin [Fri, 4 Oct 2019 13:16:33 +0000 (13:16 +0000)]
filter_seccomp: fix build for no-MMU targets
Avoid unsupported fork() call on no-MMU Linux systems to fix
the following link error:
ld: strace-filter_seccomp.o: in function `check_seccomp_filter':
filter_seccomp.c:(.text+0x39a): undefined reference to `fork'
collect2: error: ld returned 1 exit status
* filter_seccomp.c (__gcov_flush, check_seccomp_order_do_child,
check_seccomp_order_tracer): Move under HAVE_FORK guard.
(check_seccomp_order): Move fork code under HAVE_FORK guard.
(check_seccomp_filter_properties): Do not check for NOMMU_SYSTEM.
* NEWS: Mention this fix.
Dmitry V. Levin [Wed, 2 Oct 2019 09:32:26 +0000 (09:32 +0000)]
tests/sigaction: workaround odd libcs on alpha and mips
Apparently, some libcs define SA_RESTORER on alpha and mips
despite the absence of the sa_restorer field. Work around this
to match the logic implemented in decode_old_sigaction().
* tests/sigaction.c (main) [ALPHA || MIPS]: Do not check decoding
of sa_restorer field.
Dmitry V. Levin [Tue, 1 Oct 2019 09:10:46 +0000 (09:10 +0000)]
tests: fix -a argument in stat and lstat tests
* tests/trace_lstat.in (lstat): Change -a argument from 32 to 31.
* tests/trace_stat.in (stat): Change -a argument from 32 to 30.
* tests/gen_tests.in (lstat): Change -a argument from 32 to 31.
(stat): Change -a argument from 32 to 30.
Commits v5.3~74 and v5.3~73 have introduced an extended syntax
for time interval sizes specification, but the relevant description
was lacking. Fix it by adding the relevant section to the man page
and reference to it in the descriptions of the respective options.
* strace.1.in (.SH OPTIONS): Rewrite descriptions of -O,
-e inject=delay_enter, and -e inject=delay_exit values, refer to section
"Time specification format description".
(.SS "Time specification format description"): New section.
Complements: v5.3~74 "delay: use parse_ts for parsing delay value"
Complements: v5.3~73 "count: use parse_ts for parsing overhead value"
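
A parser for such time values could look like the sketch below. The
grammar is an assumption here, since this log does not spell it out:
a decimal number accepted by strtod(3), optionally followed by one of
the suffixes s, ms, us, or ns, with microseconds as the default unit.
The function name is hypothetical and this is not strace's parse_ts;
strace(1) remains the authoritative description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical sketch: parse "<number>[s|ms|us|ns]" into a timespec.
 * ASSUMPTION: a missing suffix means microseconds.
 * Returns false on malformed or negative input.
 */
static bool
parse_time_spec(const char *s, struct timespec *ts)
{
	char *end;
	double scale_ns;

	errno = 0;
	double val = strtod(s, &end);
	if (end == s || errno || !(val >= 0))
		return false;

	if (*end == '\0' || !strcmp(end, "us"))
		scale_ns = 1e3;
	else if (!strcmp(end, "s"))
		scale_ns = 1e9;
	else if (!strcmp(end, "ms"))
		scale_ns = 1e6;
	else if (!strcmp(end, "ns"))
		scale_ns = 1.0;
	else
		return false;

	double ns = val * scale_ns;
	if (ns > 9e18)	/* refuse values a 64-bit counter cannot hold */
		return false;

	ts->tv_sec = (time_t) (ns / 1e9);
	ts->tv_nsec = (long) (ns - (double) ts->tv_sec * 1e9);
	return true;
}
```

Under these assumptions, "1.5ms" and "1500" describe the same interval.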
Chen Jingpiao [Mon, 6 Aug 2018 13:58:43 +0000 (21:58 +0800)]
tests: check seccomp-assisted syscall filtering
Test filter_seccomp-perf checks whether seccomp-filter is actually
enabled by comparing the number of syscalls performed in a time interval
when seccomp-filter is enabled vs. disabled. The number of syscalls
should be at least one order of magnitude higher when seccomp-filter
is enabled.
Test filter_seccomp-flag ensures the audit_arch_vec[].flag constants do
not conflict with syscall numbers. If this test fails, then the number
of syscalls grew high enough that the code for seccomp-filter needs to
be updated.
* tests/init.sh (test_prog_set): New function.
* tests/status-none-f.c: New file.
* tests/filter_seccomp.in: Likewise.
* tests/filter_seccomp.sh: Likewise.
* tests/filter_seccomp-perf.c: Likewise.
* tests/filter_seccomp-flag.c: Likewise.
* tests/filter_seccomp-perf.test: New test.
* tests/Makefile.am (EXTRA_DIST): Add filter_seccomp.in and
filter_seccomp.sh.
(MISC_TESTS): Add filter_seccomp-perf.test.
(check_PROGRAMS): Add filter_seccomp-perf and filter_seccomp-flag.
* tests/pure_executables.list: Add status-none-f.
* tests/.gitignore: Add status-none-f, filter_seccomp-perf, and
filter_seccomp-flag.
* tests/gen_tests.in (filter_seccomp, filter_seccomp-flag): New entries.
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Co-Authored-by: Dmitry V. Levin <ldv@altlinux.org>
Paul Chaignon [Mon, 1 Jul 2019 19:14:15 +0000 (21:14 +0200)]
filter_seccomp: skip seccomp setup when there's nothing to filter
If the trace_set set is complete (no syscalls are filtered), seccomp
filtering is disabled. This patch adds a new is_complete_set_array
function to check whether all sets of a set array are complete.
* number_set.c (is_complete_set_array): New function.
* number_set.h (is_complete_set_array): New prototype.
* filter_seccomp.c (check_seccomp_filter): Skip seccomp setup if there is
nothing to filter.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
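
The completeness check can be pictured with the sketch below. It uses
a simplified stand-in for strace's number sets (a plain array of
membership flags), so the structure and function names are hypothetical;
only the idea of checking every set in the array comes from the commit
message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a number set: member[i] is true if i is in the set. */
struct flag_set {
	const bool *member;
	unsigned int nmemb;
};

/* A set is complete if it contains every number below max_numbers. */
static bool
set_is_complete(const struct flag_set *set, unsigned int max_numbers)
{
	if (set->nmemb < max_numbers)
		return false;
	for (unsigned int i = 0; i < max_numbers; ++i) {
		if (!set->member[i])
			return false;
	}
	return true;
}

/*
 * An array of sets (for example, one per personality) is complete
 * only if each of its sets is complete.
 */
static bool
set_array_is_complete(const struct flag_set *sets,
		      const unsigned int *max_numbers, size_t nmemb)
{
	for (size_t i = 0; i < nmemb; ++i) {
		if (!set_is_complete(&sets[i], max_numbers[i]))
			return false;
	}
	return true;
}
```

When the check succeeds, every syscall is of interest and a seccomp
filter would not save any stops, so installing it can be skipped.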
Chen Jingpiao [Thu, 3 May 2018 13:00:38 +0000 (21:00 +0800)]
Introduce seccomp-assisted syscall filtering
With this patch, strace can rely on seccomp to only be stopped at syscalls
of interest, instead of stopping at all syscalls. The seccomp filtering
of syscalls is opt-in only; it must be enabled with the --seccomp-bpf
option. Kernel support is first checked with check_seccomp_filter(),
which also ensures the BPF program derived from the syscalls to filter
is not larger than the kernel's limit.
The --seccomp-bpf option implies -f, but a warning is emitted if -f is not
explicitly specified. Since a task's children inherit its seccomp
filters, we want to ensure all children are also traced to avoid their
syscalls failing with ENOSYS (cf. SECCOMP_RET_TRACE in seccomp man page).
Fork/vfork/clone children of traced processes are marked as not having a
seccomp filter until we receive a first seccomp-stop. They are therefore
stopped at every syscall entry and exit until that first seccomp-stop.
The current BPF program implements a simple linear match of the syscall
numbers. Contiguous sequences of syscall numbers are however matched as
an interval, with two instructions only. The algorithm can be improved
or replaced in the future without impacting user-observed behavior.
The behavior of SECCOMP_RET_TRACE changed between Linux 4.7 and 4.8
(cf. PTRACE_EVENT_SECCOMP in ptrace man page). This patch supports both
behaviors by checking the kernel's actual behavior before installing the
seccomp filter.
* filter_seccomp.c: New file.
* filter_seccomp.h: New file.
* Makefile.am (strace_SOURCES): Add filter_seccomp.c and
filter_seccomp.h.
* linux/aarch64/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Define for aarch64.
* linux/powerpc64/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Likewise for powerpc64.
* linux/s390x/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Likewise for s390x.
* linux/sparc64/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Likewise for sparc64.
* linux/tile/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Likewise for tile.
* linux/x32/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH): Likewise for x32.
* linux/x86_64/arch_defs_.h (PERSONALITY0_AUDIT_ARCH,
PERSONALITY1_AUDIT_ARCH, PERSONALITY2_AUDIT_ARCH): Likewise for x86_64.
* linux/ia64/arch_defs_.h (PERSONALITY0_AUDIT_ARCH): Likewise for IA64.
* strace.c (usage): Document --seccomp-bpf option.
(startup_child): Mark process as having seccomp filter.
(exec_or_die): Initialize seccomp filtering if requested.
(init): Handle --seccomp-bpf option and check that seccomp can be
enabled.
(print_debug_info): Handle PTRACE_EVENT_SECCOMP.
(next_event): Capture PTRACE_EVENT_SECCOMP event.
(dispatch_event): Handle PTRACE_EVENT_SECCOMP event.
* trace_event.h (trace_event): New enumeration entity.
* strace.1.in: Document new --seccomp-bpf option.
* NEWS: Mention this change.
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Co-Authored-by: Dmitry V. Levin <ldv@altlinux.org>
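
The linear match with two-instruction intervals described in this commit
can be sketched as follows. This is a minimal illustration, not the code
from filter_seccomp.c: the function names are hypothetical, the
architecture check that a real filter needs is omitted, and the small
interpreter exists only so the generated program can be exercised without
installing it. The macros and structures come from <linux/filter.h> and
<linux/seccomp.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: emit a classic BPF program returning
 * SECCOMP_RET_TRACE for the syscall numbers in nrs[] (sorted, unique)
 * and SECCOMP_RET_ALLOW otherwise.  Returns the program length, or 0
 * if it does not fit into cap or an 8-bit jump offset.
 */
static size_t
build_filter(const uint32_t *nrs, size_t n, struct sock_filter *prog,
	     size_t cap)
{
	size_t len = 3;	/* load nr + ret ALLOW + ret TRACE */
	for (size_t i = 0, j; i < n; i = j + 1) {
		for (j = i; j + 1 < n && nrs[j + 1] == nrs[j] + 1; ++j)
			;
		len += (j > i) ? 2 : 1;
	}
	if (len > cap || len - 3 > 255)
		return 0;

	size_t pc = 0;
	prog[pc++] = (struct sock_filter) BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		offsetof(struct seccomp_data, nr));
	for (size_t i = 0, j; i < n; i = j + 1) {
		for (j = i; j + 1 < n && nrs[j + 1] == nrs[j] + 1; ++j)
			;
		if (j > i) {
			/* nr < lo: skip the second instruction of the pair. */
			prog[pc] = (struct sock_filter) BPF_JUMP(
				BPF_JMP | BPF_JGE | BPF_K, nrs[i], 0, 1);
			++pc;
			/* nr > hi: fall through; otherwise jump to TRACE. */
			prog[pc] = (struct sock_filter) BPF_JUMP(
				BPF_JMP | BPF_JGT | BPF_K, nrs[j], 0,
				(uint8_t) (len - 2 - pc));
			++pc;
		} else {
			prog[pc] = (struct sock_filter) BPF_JUMP(
				BPF_JMP | BPF_JEQ | BPF_K, nrs[i],
				(uint8_t) (len - 2 - pc), 0);
			++pc;
		}
	}
	prog[pc++] = (struct sock_filter) BPF_STMT(BPF_RET | BPF_K,
		SECCOMP_RET_ALLOW);
	prog[pc++] = (struct sock_filter) BPF_STMT(BPF_RET | BPF_K,
		SECCOMP_RET_TRACE);
	return pc;
}

/* Tiny interpreter for the instruction subset used above (demo only). */
static uint32_t
run_filter(const struct sock_filter *prog, size_t len, uint32_t nr)
{
	uint32_t acc = 0;
	for (size_t pc = 0; pc < len; ++pc) {
		const struct sock_filter *f = &prog[pc];
		switch (f->code) {
		case BPF_LD | BPF_W | BPF_ABS:
			acc = nr;
			break;
		case BPF_JMP | BPF_JEQ | BPF_K:
			pc += (acc == f->k) ? f->jt : f->jf;
			break;
		case BPF_JMP | BPF_JGE | BPF_K:
			pc += (acc >= f->k) ? f->jt : f->jf;
			break;
		case BPF_JMP | BPF_JGT | BPF_K:
			pc += (acc > f->k) ? f->jt : f->jf;
			break;
		case BPF_RET | BPF_K:
			return f->k;
		}
	}
	return SECCOMP_RET_ALLOW;
}
```

For the set {0, 1, 2, 5, 59} the sketch emits seven instructions: one
load, two for the interval 0..2, one each for 5 and 59, and the two
return statements.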
Recognize --help and --version options as aliases to -h and -V options,
respectively.
* strace.c: Include <getopt.h>.
(init): Move short options to optstring, add longopts array, use
getopt_long instead of getopt.
(usage): Document --help and --version options.
* strace.1.in: Likewise.
* tests/strace-V.test: Check that "strace --version" output is the same
as "strace -V" output.
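
The aliasing can be illustrated with getopt_long(3) as below. The
wrapper function is hypothetical and handles only the two options named
above; strace's real optstring and longopts array contain many more
entries.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: return the first option found in argv.
 * Long options are mapped onto the short option characters, so the
 * caller's switch statement needs no extra cases.
 */
static int
first_option(int argc, char **argv)
{
	/* A leading '+' stops parsing at the first non-option argument. */
	static const char optstring[] = "+hV";
	static const struct option longopts[] = {
		{ "help",	no_argument, NULL, 'h' },
		{ "version",	no_argument, NULL, 'V' },
		{ NULL,		0,	     NULL, 0 }
	};

	optind = 1;	/* restart scanning so the wrapper can be reused */
	opterr = 0;	/* no diagnostics on stderr in this sketch */
	return getopt_long(argc, argv, optstring, longopts, NULL);
}
```

With this table, "--version" yields 'V' and "--help" yields 'h', exactly
as the corresponding short options do.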
Figure out whether the ioctl code is decoded inside a comment and adjust
printflags/printxval calls accordingly.
* ioctl.c (ioctl_print_code, evdev_decode_number): Add abbrev variable,
set it to true if xlat style is not XLAT_STYLE_VERBOSE, do not provide
dflt and set xlat style to XLAT_STYLE_ABBREV in printflags/printxval
calls (that are now changed to printflags_ex/printxval_ex to accommodate
the change).
PAF_ARRAY_TRUNCATED allows marking an array as truncated regardless
of its contents, which is useful for arrays in local memory that are
known to be truncated.
Add support for printing local arrays to print_array
* defs.h (print_array_ex): Describe parameters.
(print_local_array): A wrapper for printing arrays in local memory
via print_array_ex.
* util.c (print_array_ex): Handle the case of NULL tfetch_mem_func by
printing elements of the array in local memory pointed to by the
start_addr parameter.
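
The NULL fetch function convention can be sketched as follows. The
signatures are deliberately simplified and hypothetical: the real
print_array_ex takes more parameters and prints to strace's output
rather than into a buffer, and the "truncated" flag here merely stands
in for the idea behind PAF_ARRAY_TRUNCATED.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fetches len bytes at a tracee address; NULL means "local memory". */
typedef bool (*fetch_fn)(uint64_t addr, void *buf, size_t len);

/* Append s to out if it fits, keeping out NUL-terminated. */
static void
emit(char *out, size_t size, size_t *pos, const char *s)
{
	size_t n = strlen(s);
	if (*pos + n < size) {
		memcpy(out + *pos, s, n + 1);
		*pos += n;
	}
}

/* Hypothetical, simplified analogue of print_array_ex for uint32_t. */
static bool
print_u32_array(char *out, size_t size, uint64_t start_addr, size_t nmemb,
		fetch_fn fetch, bool truncated)
{
	size_t pos = 0;

	if (size)
		out[0] = '\0';
	emit(out, size, &pos, "[");
	for (size_t i = 0; i < nmemb; ++i) {
		uint32_t elem;
		uint64_t addr = start_addr + i * sizeof(elem);

		if (fetch) {
			if (!fetch(addr, &elem, sizeof(elem)))
				return false;
		} else {
			/* No fetch function: start_addr is a local pointer. */
			memcpy(&elem, (const void *) (uintptr_t) addr,
			       sizeof(elem));
		}

		char tmp[16];
		snprintf(tmp, sizeof(tmp), "%s%u", i ? ", " : "",
			 (unsigned int) elem);
		emit(out, size, &pos, tmp);
	}
	if (truncated)
		emit(out, size, &pos, nmemb ? ", ..." : "...");
	emit(out, size, &pos, "]");
	return true;
}

/* Wrapper in the spirit of print_local_array. */
static bool
print_local_u32_array(char *out, size_t size, const uint32_t *arr,
		      size_t nmemb, bool truncated)
{
	return print_u32_array(out, size, (uintptr_t) arr, nmemb, NULL,
			       truncated);
}
```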
tests: implement ioctl_evdev-success-v.test via ioctl_evdev-success.test
* tests/ioctl_evdev-success-v.test: Remove.
* tests/Makefile.am (DECODER_TESTS): Remove ioctl_evdev-success-v.test.
* tests/gen_tests.in: Add ioctl_evdev-success-v as a wrapper for
ioctl_evdev-success.test.
* tests/ioctl_evdev-success.test: Save "$args" to $prog, increase -a
parameter value to 26 columns, inject "$@" into run_strace arguments,
call $prog instead of explicit program name.
Paul Chaignon [Sat, 21 Sep 2019 13:00:51 +0000 (15:00 +0200)]
tests: fix format warnings on x32
The type of __X32_SYSCALL_BIT changed from int to unsigned long by Linux
kernel commit v5.3-rc1-1-g45e29d119e9923ff14dfb840e3482bef1667bbfb.
Consequently, __NR_* macros are now defined to values of an unsigned long
integer type on x32.
* tests/prctl-seccomp-filter-v.c (PRINT_ALLOW_SYSCALL, PRINT_DENY_SYSCALL):
Fix format warning.
* tests/seccomp-filter-v.c (PRINT_ALLOW_SYSCALL, PRINT_DENY_SYSCALL):
Likewise.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
When the data to be fetched by vm_read_mem resides in a single memory
page, fetch the whole page and cache it. This implementation caches
up to two memory pages.
* defs.h (invalidate_umove_cache): New prototype.
* strace.c (next_event): Call invalidate_umove_cache.
* ucopy.c (cached_idx, cached_raddr): New static variables.
(process_read_mem): New function.
(vm_read_mem): Use them. Implement fetched page caching.
* tests/umovestr_cached.test: New test.
* tests/Makefile.am (MISC_TESTS): Add umovestr_cached.test.
* tests/umovestr_cached.c: New file.
* tests/pure_executables.list: Add umovestr_cached.
* tests/.gitignore: Likewise.
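
The two-page cache can be pictured with the sketch below. All names are
hypothetical, the page size is fixed at 4096 for simplicity, and
fake_fetch is a stand-in for the real page read from the tracee, so this
shows the caching idea only, not the code in ucopy.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096u	/* ASSUMPTION: fixed page size for the sketch */

struct page_cache {
	uint64_t addr[2];
	bool valid[2];
	unsigned char data[2][PAGE_SZ];
	unsigned int next;	/* slot to overwrite on the next miss */
};

typedef bool (*page_fetch_fn)(uint64_t page_addr, void *buf);

/* Drop cached pages, e.g. whenever the tracee may have run. */
static void
cache_invalidate(struct page_cache *c)
{
	c->valid[0] = c->valid[1] = false;
}

/*
 * Serve a read from the cache, fetching the whole page on a miss.
 * Returns false if the range crosses a page boundary or the fetch
 * fails; the caller would then fall back to an uncached read.
 */
static bool
cached_read(struct page_cache *c, page_fetch_fn fetch, uint64_t addr,
	    void *dst, size_t len)
{
	uint64_t page = addr & ~(uint64_t) (PAGE_SZ - 1);
	size_t off = (size_t) (addr - page);

	if (len == 0 || len > PAGE_SZ - off)
		return false;

	for (unsigned int i = 0; i < 2; ++i) {
		if (c->valid[i] && c->addr[i] == page) {
			memcpy(dst, c->data[i] + off, len);
			return true;
		}
	}

	unsigned int slot = c->next;
	c->next ^= 1;
	c->valid[slot] = fetch(page, c->data[slot]);
	if (!c->valid[slot])
		return false;
	c->addr[slot] = page;
	memcpy(dst, c->data[slot] + off, len);
	return true;
}

/* Demo backend: counts fetches and fills the page with a pattern. */
static unsigned int fetch_count;

static bool
fake_fetch(uint64_t page_addr, void *buf)
{
	++fetch_count;
	memset(buf, (int) ((page_addr / PAGE_SZ) & 0xff), PAGE_SZ);
	return true;
}
```

Invalidating the cache on every new tracee event keeps stale data from
being served after the traced process has had a chance to modify memory.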
Rewrite printnum_{slong,ulong,ptr,kptr} using dispatch_{word,klong}size
* defs.h (printnum_long_int, printnum_addr_long_int, printnum_addr_klong_int):
Remove declaration.
(printnum_slong, printnum_ulong): Implement unconditionally using
dispatch_wordsize and opt_wordsize for the last argument.
(printnum_ptr): Implement unconditionally using dispatch_wordsize.
(printnum_kptr): Implement unconditionally using dispatch_klongsize.
* util.c (printnum_long_int, printnum_addr_long_int, printnum_addr_klong_int):
Remove.
* defs.h (set_personality, current_personality, current_wordsize,
current_klongsize, max_addr, max_kaddr): Move upwards.
(opt_wordsize): New macro, calls the first or the second argument
depending on the word size.
(dispatch_wordsize): New macro, calls the first or the second function
with the rest of macro parameters as arguments depending on the word
size.
(opt_klongsize): New macro, calls the first or the second argument
depending on the kernel long size.
(dispatch_klongsize): New macro, calls the first or the second function
with the rest of macro parameters as arguments depending on the kernel
long size.
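
A minimal sketch of the word-size variants of these macros is shown
below. It assumes a plain global current_wordsize and ignores the
compile-time conditions under which strace defines the macros, so treat
the definitions as an approximation; read_u64 and read_u32 are
hypothetical demo functions.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for strace's per-tracee word size (in bytes). */
static unsigned int current_wordsize = sizeof(long);

/* Evaluates to the first or the second argument depending on the word size. */
#define opt_wordsize(opt_64_, opt_32_) \
	((current_wordsize > sizeof(uint32_t)) ? (opt_64_) : (opt_32_))

/* Calls the first or the second function with the remaining arguments. */
#define dispatch_wordsize(call_64_, call_32_, ...) \
	((current_wordsize > sizeof(uint32_t)) \
	 ? (call_64_)(__VA_ARGS__) : (call_32_)(__VA_ARGS__))

/* Demo callees: read one word of the given size from a buffer. */
static uint64_t
read_u64(const void *p)
{
	uint64_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}

static uint64_t
read_u32(const void *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}
```

A caller can then write dispatch_wordsize(read_u64, read_u32, buf)
once instead of repeating the word size test at every call site.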
In the conversion of PRINT_UNKNOWN_TAIL into PRINT_UNKNOWN_TAIL_EX,
the usage of sizeof(*(hdr_)) was not replaced with (hdr_size_)
in all places. Offset calculation also had to be changed.
* s390.c (PRINT_UNKNOWN_TAIL_EX): Fix addr and len arguments
in is_filled and print_quoted_string calls.
When strace is built for (32-bit) x86, it has HAVE_STRUCT_USER_DESC
and SUPPORTED_PERSONALITIES == 1, which led to execution of both
branches. Simplify the logic by including SUPPORTED_PERSONALITIES
in the condition.
* clone.c (print_tls_arg): Include SUPPORTED_PERSONALITIES into the "if"
condition.
* s390.c (struct sthyi_machine): Add fields reserved_1__, infmplnm;
update comment for the infmval1 field; update the related static_assert.
(struct sthyi_partition): Update infpflg1 comment; update infpval1
comment; add infpplnm field; update the related static_assert.
(struct sthyi_hypervisor): Update infyflg1 field comment; add
infyinsf and infyautf fields; update the related static_assert.
(CHECK_SIZE_EX): Rename from CHECK_SIZE; add min_size_ argument, check size_
against it.
(CHECK_SIZE): New macro, a wrapper for CHECK_SIZE_EX.
(PRINT_UNKNOWN_TAIL_EX): Rename from PRINT_UNKNOWN_TAIL, add hdr_size_
argument.
(PRINT_UNKNOWN_TAIL): New macro, a wrapper for PRINT_UNKNOWN_TAIL_EX.
(print_sthyi_machine): New local variable last_decoded; use
CHECK_SIZE_EX instead of CHECK_SIZE to check against the initial value
of last_decoded; decode reserved_1__ and infmplnm fields if the returned
size indicates that they are present; use PRINT_UNKNOWN_TAIL_EX for
printing structure's tail.
(print_sthyi_partition): New local variable last_decoded; use
CHECK_SIZE_EX instead of CHECK_SIZE to check against the initial value
of last_decoded; decode infpplnm field if the returned size indicates
that it is present; use PRINT_UNKNOWN_TAIL_EX for printing structure's
tail.
(print_funcs): New function.
(print_sthyi_hypervisor): New local variable last_decoded; use
CHECK_SIZE_EX instead of CHECK_SIZE to check against the initial value
of last_decoded; update infyflg1 field decoding; decode infyinsf
and infyautf fields if the returned size indicates that they
are present; use PRINT_UNKNOWN_TAIL_EX for printing structure's tail.
(s390_sthyi): Update specification URL.
* tests/s390_sthyi.c: Update expected output.
strace.spec: lower CentOS version requirement for pkgconfig(bluez)
bluez-libs-devel provides pkgconfig(bluez) and the actual headers both
in RHEL 6 and RHEL 7, so the version condition for enablement
of pkgconfig(bluez) in spec file can be lowered. However, the package
in question is in the "optional" repository in RHEL, and there seems to be
no easy way to enable it in OBS (where this spec file is mainly used)
so only %centos check is actually changed for now.
* strace.spec.in: Change "0%{?centos} >= 8" to "0%{?centos} >= 6"
for "BuildRequires: pkgconfig(bluez)" enablement.
strace.1.in: try to be more clear with -e trace=class deprecation notice
It was reported that the current way of labelling the percent-less
-e trace=class syntax variant may be confusing, as it can be read
as deprecation of the whole option rather than of the specific syntax;
try to be more clear by moving the deprecation notices into the option
descriptions.
* strace.1.in (.SS Filtering): Move the deprecation notice
of -e trace={file,process,network,signal,ipc,desc,memory} syntax
to the descriptions of the respective options.
Dmitry V. Levin [Thu, 15 Aug 2019 20:23:19 +0000 (20:23 +0000)]
Fix syscall tampering when PTRACE_GET_SYSCALL_INFO is in use on some architectures
When PTRACE_GET_SYSCALL_INFO is in use on those architectures
that invoke set_regs in arch_set_scno, get_regs is not called,
so it has to be invoked explicitly before tampering.
Dmitry V. Levin [Thu, 15 Aug 2019 20:23:19 +0000 (20:23 +0000)]
sparc, sparc64: fix redundant get_regs invocation
An explicit get_regs invocation was added to arch_set_error and
arch_set_success on sparc/sparc64 by commit v5.2~27 in an attempt to fix
syscall tampering on these architectures when PTRACE_GET_SYSCALL_INFO
is in use.
That change, however, did not fix the bug because set_error and
set_success already invoke get_regs on all architectures where
ptrace_setregset_or_setregs is defined; this includes sparc and sparc64.
* linux/sparc/set_error.c (sparc_set_o0_psr): Do not invoke get_regs.
* linux/sparc64/set_error.c (sparc64_set_o0_tstate): Likewise.
* NEWS (5.2): Remove the statement about syscall tampering fix
on sparc and sparc64 when PTRACE_GET_SYSCALL_INFO is in use.
Replace direct usage of err_name/errnoent with print_err
Introduce print_err function that prints error number respecting current
xlat verbosity settings, and switch err_name/errnoent callers to use
this new function instead.
* defs.h (err_name): Remove.
(print_err): New declaration.
* print_fields.h (PRINT_FIELD_ERR_D, PRINT_FIELD_ERR_U): New macros.
* syscall.c (err_name): Add static qualifier, change argument type
to uint64_t.
(print_err): New function.
* keyctl.c (keyctl_reject_key): Use print_err for printing error
argument.
* net.c (print_get_error): Use print_err for printing err.
* numa.c (print_status): Use print_err for printing errno.
* netlink.c: Include "print_fields.h".
(decode_nlmsgerr): Use PRINT_FIELD_ERR_D for printing errno field.
* printsiginfo.c: Include "print_fields.h".
(print_si_info): Use PRINT_FIELD_ERR_U for printing si_errno field.
* ptrace_syscall_info.c (print_ptrace_syscall_info): Use
PRINT_FIELD_ERR_D for printing info.exit.rval.
* tests/pidfd_send_signal.c (main): Update expected output.
* tests/ptrace.c (main): Likewise.
Co-Authored-by: Dmitry V. Levin <ldv@altlinux.org>
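
How an error number might be rendered under the different xlat
verbosity settings is sketched below. The style names, the function
format_err, and the short name table are hypothetical, and the sketch
writes into a buffer instead of printing; only the idea of honouring
the xlat verbosity comes from the commit message.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counterparts of strace's raw/abbrev/verbose xlat styles. */
enum err_style { STYLE_RAW, STYLE_ABBREV, STYLE_VERBOSE };

/* Tiny demo table; a real table covers every errno value. */
static const char *
err_name(uint64_t err)
{
	switch (err) {
	case EPERM:  return "EPERM";
	case ENOENT: return "ENOENT";
	case EINTR:  return "EINTR";
	case EINVAL: return "EINVAL";
	}
	return NULL;
}

/*
 * Format err according to style: the bare number, the symbolic name,
 * or the number followed by the name in a comment.  Unknown values
 * are always printed as numbers.  Returns snprintf's result.
 */
static int
format_err(char *buf, size_t size, int64_t err, enum err_style style)
{
	const char *name = err < 0 ? NULL : err_name((uint64_t) err);

	if (style == STYLE_RAW || !name)
		return snprintf(buf, size, "%lld", (long long) err);
	if (style == STYLE_ABBREV)
		return snprintf(buf, size, "%s", name);
	return snprintf(buf, size, "%lld /* %s */", (long long) err, name);
}
```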
Dmitry V. Levin [Tue, 13 Aug 2019 11:06:57 +0000 (11:06 +0000)]
xlat: update *_MAGIC constants
* xlat/fsmagic.in (Z3FOLD_MAGIC): New constant introduced
by Linux kernel commit v5.3-rc1~31^2~30.
(DMA_BUF_MAGIC): New constant introduced by Linux kernel commit
v5.3-rc1~22^2~20^2~42.
* NEWS: Mention this.
Dmitry V. Levin [Tue, 13 Aug 2019 11:06:57 +0000 (11:06 +0000)]
xlat: update KEYCTL_* constants
* xlat/keyctl_commands.in (KEYCTL_PKEY_QUERY, KEYCTL_PKEY_ENCRYPT,
KEYCTL_PKEY_DECRYPT, KEYCTL_PKEY_SIGN, KEYCTL_PKEY_VERIFY): New
constants introduced by Linux kernel commit v4.20-rc1~29^2~20.
(KEYCTL_MOVE): New constant introduced by Linux kernel commit
v5.3-rc1~189^2~3.
(KEYCTL_CAPABILITIES): New constant introduced by Linux kernel commit
v5.3-rc1~189^2.
* NEWS: Mention this.
Dmitry V. Levin [Sun, 11 Aug 2019 13:11:10 +0000 (13:11 +0000)]
xlat: update SO_* constants
* xlat/sock_options.in (SO_BINDTOIFINDEX): New constant introduced
by Linux commit v5.1-rc1~178^2~508.
(SO_RCVTIMEO_NEW, SO_SNDTIMEO_NEW): New constants introduced by Linux
commit v5.1-rc1~178^2~363^2.
(SO_DETACH_REUSEPORT_BPF): New constant introduced by Linux commit
v5.3-rc1~140^2~179^2~12.
syscall: track syscall system time a bit more explicitly
Before, it relied on the implicit assumption that the syscall-exit event
is the very next one after syscall-enter. Also, there's some additional
debugging output that might help someone someday.
* count.c (count_syscall): Calculate system time as difference of tcb's
stime and ltime.
* defs.h (struct tcb): Add ltime, atime fields, remove dtime.
* strace.c (droptcb): Print total system time spent by a tcb.
(startup_tcb): Store initial system time in atime.
(next_event): Update stime directly.
* syscall.c (syscall_entering_finish): Store current system time in
tcb's ltime field.
(syscall_exiting_finish): Likewise.
* count.c (zero_ts): New variable.
(count_syscall): Calculate the spent time in the wts variable, then add
it to cc->time.
(call_summary_pers): Do not perform overhead correction.
* count.c (set_overhead): Change argument type to const char *, call
parse_ts to parse it and set to overhead.
* defs.h (set_overhead): Update declaration.
* strace.c (init) <case 'O'>: Do not parse argument, pass optarg to
set_overhead call.
* tests/count.test (GENERIC, WALLCLOCK, WALLCLOCK1, HALFCLOCK): New
variables with expected patterns.
Add checks for the new -O syntax.