Sebastien GODARD [Fri, 22 Dec 2017 07:49:56 +0000 (08:49 +0100)]
sadc/sar: New format (part 7): Allocate structures dynamically
The goal of this patch is to make sar and sadc able to deal with any
number of devices which may be registered by the system, even if this
number is much higher than the initial one when the datafile was created.
Sebastien GODARD [Sat, 16 Dec 2017 09:41:10 +0000 (10:41 +0100)]
Use ULLONG_MAX/2 to check if network device counters have overflown
The counter type in the stats_net_dev structure is "unsigned long long"
(it used to be "unsigned long" before). So use ULLONG_MAX instead of
ULONG_MAX to try to guess if counters have overflowed.
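A minimal sketch of this kind of check, assuming a counter is considered
to have wrapped when it goes backwards while its previous value was in
the upper half of the "unsigned long long" range. The helper name and the
exact condition are illustrative assumptions, not the code of the patch.

```c
#include <limits.h>

/*
 * Hypothetical helper: guess whether a 64-bit counter wrapped around
 * between two samples. Assumption: a wrap is only plausible if the
 * previous value was already in the upper half of the range.
 */
static int counter_has_wrapped(unsigned long long prev,
			       unsigned long long curr)
{
	return (curr < prev) && (prev > ULLONG_MAX / 2);
}
```

With this reading, a counter that goes backwards from a small previous
value is treated as having been reset rather than as having wrapped.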
Sebastien GODARD [Sat, 16 Dec 2017 08:29:21 +0000 (09:29 +0100)]
Workaround for offline CPU coming back online
I noticed that with recent kernels, when a CPU comes back online, some
fields (user, nice, system) restart from their previous value, whereas
others (idle, iowait) restart from zero. For values restarting from
zero, we need to set their previous value to zero to avoid getting an
interval value < 0.
We also try to guess whether the counters have overflowed: If the previous
value was greater than ULLONG_MAX & 0x80000, then the CPU was probably
not set offline but the counter overflowed (the value 0x80000 is an
arbitrary limit that I decided to use).
In the example above, idle value for CPU#3 is 95477 before CPU is set
offline, then 677 when it has come back online, and iowait value goes
from 25 to 0.
On the other hand, user value for example goes from 1208 to 1213...
(We can also notice that values given for global CPU usage on the first
line have a really strange behavior...)
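The workaround can be sketched as follows. This is only an illustration:
the helper name is hypothetical, and the limit is written here as
ULLONG_MAX - 0x7ffff, which is one possible reading of the arbitrary
0x80000 limit mentioned above and should be treated as an assumption.

```c
#include <limits.h>

/* Assumed reading of the arbitrary limit mentioned in the message */
#define NEAR_OVERFLOW	(ULLONG_MAX - 0x7ffffULL)

/*
 * Hypothetical helper: compute the interval for one CPU field between
 * the previous and the current sample.
 */
static unsigned long long cpu_field_interval(unsigned long long prev,
					     unsigned long long curr)
{
	if ((curr < prev) && (prev < NEAR_OVERFLOW)) {
		/* Counter restarted from zero: CPU came back online */
		prev = 0;
	}
	/* Unsigned arithmetic also yields the right delta after a wrap */
	return curr - prev;
}
```

With the figures quoted above, the idle interval becomes 677 (instead of
a negative value) and the user interval stays at 1213 - 1208 = 5.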
Sebastien GODARD [Sun, 10 Dec 2017 08:35:16 +0000 (09:35 +0100)]
sar/sadc: Make use of __nr_t type
__nr_t type (which defaults to int) is used to define a number of items
(devices, network interfaces, etc.) So use it instead of int in various
places.
sar: No need to init previous iteration structure when a device has been
registered again
Simplify code a bit: Don't reset previous iteration's statistics when a
known device has been registered again, since this is already done when
stats are displayed.
Sebastien GODARD [Fri, 17 Nov 2017 13:37:09 +0000 (14:37 +0100)]
sar/sadc: Get rid of volatile activities
So called "volatile activities" were activities for which the number of
items could vary and corresponding structures were reallocated
accordingly.
In fact only the A_CPU activity (CPU statistics) was volatile, to take
into account a possible restart of a virtual machine with more CPUs than
when the datafile was created.
The goal is now to allocate dynamically all the structures used by
activities, so we no longer have to distinguish between activities which
are volatile or not.
Sebastien GODARD [Fri, 17 Nov 2017 08:33:50 +0000 (09:33 +0100)]
sar/sadc: Don't assume CPU statistics are always recorded in file
CPU statistics were always collected and saved when a saDD datafile was
created. Those statistics were in particular used to compute time
interval in jiffies, and to give the number of processors of the machine
(which was displayed in the report header).
We no longer use CPU jiffies to compute time interval (which is now
expressed in 1/100th of a second and calculated using the /proc/uptime
file). So CPU statistics are no longer compulsory: They may not be
collected and available in file.
Since we always need a way to know the number of CPUs of the machine
where the data file has been created, we add a new field (sa_cpu_nr) in
the file_header structure.
Sebastien GODARD [Sun, 12 Nov 2017 09:14:20 +0000 (10:14 +0100)]
sar/sadc: Increase devices upper limits
We found that the previous upper limit values were sometimes too low
(e.g. 512 for the maximum number of network interfaces). So increase them.
(Upper limits exist to make sure we don't allocate an unreasonably large
amount of memory to save device statistics, in particular when we read
data from a possibly corrupted datafile).
Sebastien GODARD [Sun, 12 Nov 2017 08:17:55 +0000 (09:17 +0100)]
sar/sadc: Remove preallocation constants
Next sysstat version will dynamically allocate records for new devices
when they are registered by the system. So we no longer need to
preallocate those records.
This patch removes the constants and files that were used for this
purpose.
Sebastien GODARD [Fri, 13 Oct 2017 12:20:30 +0000 (14:20 +0200)]
Don't compute global system uptime when reading CPU stats
Don't compute "global system uptime" (i.e. total number of jiffies spent
by all processors) in read_stat_cpu() function so that we don't have to
pass the result from one function to another.
Global system uptime is used only when displaying CPU statistics, so
this is now done locally in the functions displaying CPU statistics.
Also as a consequence, field uptime has been removed from the
record_header structure associated with every sample statistics in
sar/sadc.
Sebastien GODARD [Fri, 13 Oct 2017 07:29:19 +0000 (09:29 +0200)]
No longer use /proc/stat to calculate system uptime
sysstat commands used the /proc/stat file to compute system uptime based
on the number of jiffies spent by CPU#0 (which cannot be set offline).
Now only use the /proc/uptime file which we assume always exists on a
Linux machine.
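A minimal sketch of reading the uptime from /proc/uptime, expressed in
hundredths of a second. The function names are hypothetical, and the
parsing assumes the first field always has two digits after the decimal
point (as in "350735.47").

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert the first field of a /proc/uptime line
 * ("seconds.hundredths") into hundredths of a second.
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
static int parse_uptime(const char *line, unsigned long long *uptime_cs)
{
	unsigned long long sec, cs;

	if (sscanf(line, "%llu.%2llu", &sec, &cs) != 2)
		return -1;
	*uptime_cs = sec * 100 + cs;
	return 0;
}

/* Read /proc/uptime. Returns 0 on success, -1 on error. */
static int read_uptime_cs(unsigned long long *uptime_cs)
{
	char line[128];
	FILE *fp = fopen("/proc/uptime", "r");
	int rc = -1;

	if (fp == NULL)
		return -1;
	if (fgets(line, sizeof(line), fp) != NULL)
		rc = parse_uptime(line, uptime_cs);
	fclose(fp);
	return rc;
}
```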
activity structure's f_count2() member (used to count the number of
sub-items) was tested in sadf.c but never initialized (its value is
initialized only when sadc.c is compiled, not sadf.c). So use a specific
flag to indicate if activity has sub-items instead of testing f_count2()
value.
Add extra checks on *_types_nr[] values read from a file to cope with
possible corrupted files.
*_types_nr[] values give the number of fields of each type (long long,
long, int) composing a structure.
sadc/sar: New format (part 4): Update file_magic structure
Use integer type for upgraded field instead of unsigned char in
file_magic structure. Thus we are no longer limited to 15 for the
patchlevel and sublevel version numbers.
sadc/sar: New format (part 3): Use 64-bit time fields
Now use unsigned long long type for time fields (instead of
unsigned long, which is only 32 bits wide on 32-bit systems).
This is to avoid any future problems (like year 2038 integer
overflow).
Also code the year number as an integer instead of a char in datafile's
header.
This function will be used to map two structures from two
different sysstat versions.
It will thus be possible to add new fields, e.g., in structures
containing statistics and still be possible for another sysstat version
to read its contents properly.
This patch aims at making sar and sadf able to read and process files
created on machines with a different endianness. It will now be
possible, e.g., to read a file with sar on a little-endian machine even
if it has been created on a big-endian machine.
Yet it will still not be allowed to append (write) data with sar to a
file with a different endianness type.
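Reading a file written with the other endianness comes down to reversing
the byte order of each multi-byte field after it has been read. A minimal
sketch of such byte swapping, with hypothetical helper names (the patch
may well use different functions or macros):

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffU) << 24) | ((v & 0x0000ff00U) << 8) |
	       ((v & 0x00ff0000U) >> 8)  | ((v & 0xff000000U) >> 24);
}

/* Reverse the byte order of a 64-bit value */
static uint64_t swap64(uint64_t v)
{
	return ((uint64_t) swap32((uint32_t) (v & 0xffffffffU)) << 32) |
	       (uint64_t) swap32((uint32_t) (v >> 32));
}
```

Swapping a value twice gives back the original value, which is why the
same helpers can be used whatever the endianness of the reading machine.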
sa.h: Add new gtypes_nr field in activity structure
This field describes the corresponding structure containing statistics
(as defined in rd_stats.h or rd_sensors.h): Number of fields of type
"long long", number of fields of type "long", number of fields of type
"int".
This field will be used for various purposes, e.g., big/little endian
conversion.
sadf: Update SVG output for new binary datafile format
This is a small fix for I/O and transfer rate SVG graphs.
Structure stats_io members were considered as unsigned long integers
(since they were packed in the structure) but they actually are unsigned
long long integers.
sadc/sar: New format for binary datafiles (part 1)
This is the first patch aimed at redesigning sar binary datafiles (see
also issue #135).
This patch changes alignment for some structures, in particular:
- long long integers don't need to be aligned on addresses that are
  multiples of 16 bytes,
- only long integers must be aligned on addresses that are multiples of
  8 bytes so that the size is sufficient for both 32 and 64 bit machines.
The consequence is a binary datafile which is much smaller in size than
its predecessor (on my machine, sa datafiles are now on average 33%
smaller than before).
But of course this new format is no longer compatible with the previous one
and as a consequence, older sysstat versions won't be able to read it (I
still plan to update "sadf -c" so that old binary datafiles can be
converted to the up-to-date format).
Make SCCSID strings optional to allow reproducible build.
SCCSID strings are no longer included in executable files by default.
Use './configure CFLAGS="-D USE_SCCSID" ; make' to include them.
Sebastien GODARD [Sat, 26 Aug 2017 12:42:28 +0000 (14:42 +0200)]
Fix #162: sadc crashes on a mtab file with really long lines
A segmentation fault may happen with "sadc -S DISK..." or
"sadc -S XDISK..." when lines longer than 512 bytes are read from
/etc/mtab.
Such lines are possible for instance when overlay2 filesystem
with docker is used. In such a case a single mtab entry can look
like this (note that new line characters were added for readability,
the original entry contained only one '\n' at the end):
The crash occurs in the get_filesystem_nr() and read_filesystem()
functions which call strchr(line, ' ') but fail to check if the result
is not NULL.
This patch adds this check, and when a single mtab entry requires more
than one call to fgets() (i.e. the entry is longer than 512 bytes), it
ignores the outcome of the second and following calls.
Bugs-Debian: https://bugs.debian.org/872926
Signed-off-by: Robert Luberda <robert@debian.org>
Signed-off-by: Sebastien GODARD <sysstat@users.noreply.github.com>
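The two safeguards described in this message (checking the result of
strchr() and ignoring the continuation chunks of an over-long entry) can
be sketched as follows. The function below is a hypothetical, simplified
stand-in for get_filesystem_nr(): it only counts the entries whose first
field starts with '/'.

```c
#include <stdio.h>
#include <string.h>

#define LINE_LEN	512

/* Hypothetical helper: count entries whose device name starts with '/' */
static int count_fs_entries(FILE *fp)
{
	char line[LINE_LEN];
	char *sep;
	int nr = 0;
	int skip_next = 0;	/* Set while inside an over-long entry */

	while (fgets(line, sizeof(line), fp) != NULL) {
		int skip = skip_next;

		/* No newline: the entry continues in the next fgets() call */
		skip_next = (strchr(line, '\n') == NULL);
		if (skip)
			continue;	/* Continuation chunk: ignore it */
		if (line[0] != '/')
			continue;
		sep = strchr(line, ' ');
		if (sep == NULL)
			continue;	/* No separator found: don't dereference */
		*sep = '\0';		/* line now holds the device name */
		nr++;
	}
	return nr;
}
```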
Add new option "-e" to pidstat, which can be used to pass a program
to execute and make pidstat monitor it.
When the program terminates, pidstat will catch the SIGCHLD signal and
display the average statistics.
pidstat: Add new option to display timestamps in seconds since the Epoch
Previously, option -h, used to display all activities horizontally on a
single line, also displayed timestamps in seconds since the Epoch. Though
this was intended to be used by scripts, it was not necessarily a
desired feature (see #155).
This patch changes option -h so that time format remains unmodified (all
activities are displayed on a single line with no average statistics)
and adds a new option (-H) that tells pidstat to display time in seconds
since the Epoch. This option may be used separately from option -h.
Revert "Fix #148: ARM: sadc crashes because of unaligned memory accesses"
This reverts commit 569378eb1a3be23cdb45ac5d39e354683a7748f8.
Proposed fix for #148 "sadc crashes because of unaligned memory access"
doesn't work properly:
Copying old pointer contents to new pointer destination uses the size of
the memory pointed to by the newly allocated pointer... which is not the
size of the memory the old pointer was pointing to. The result is you
may get a segmentation fault in some cases.
So for now, revert the corresponding patch, awaiting a better solution.
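For illustration, a copy that avoids the problem described above only
reads as many bytes as the old buffer actually holds. This is a sketch
with a hypothetical helper, not the solution eventually adopted:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: move a buffer to a new allocation.
 * Only min(old_size, new_size) bytes are copied, so the old buffer
 * is never read past its end. The rest of the new buffer is zeroed.
 */
static void *move_buffer(void *old, size_t old_size, size_t new_size)
{
	void *new_buf = calloc(1, new_size);

	if (new_buf == NULL)
		return NULL;	/* Old buffer is left untouched */
	if (old != NULL) {
		memcpy(new_buf, old,
		       old_size < new_size ? old_size : new_size);
		free(old);
	}
	return new_buf;
}
```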
Sebastien GODARD [Sun, 25 Jun 2017 07:31:35 +0000 (09:31 +0200)]
SVG: Fix graphs for swap space utilization statistics
Commit 8b71682 added a new metric to sar's memory report (available free
memory). This new metric caused a shift in the array containing data to
be drawn that wasn't properly taken into account (see commit f90adb6).
As a consequence graphs for swap space utilization statistics (those
corresponding to sar -S output) were wrong.
This patch fixes the problem by using the right position in data array.
Sebastien GODARD [Fri, 23 Jun 2017 10:00:01 +0000 (12:00 +0200)]
Fix #154: pidstat: Don't stop if /proc/#/schedstat files not found
Commit a41b24a added %wait field to pidstat CPU statistics. This field
is calculated for each task using the contents of /proc/#/schedstat
file. Yet pidstat should not stop if this file cannot be found: It
should display the other CPU statistics as usual and display 0.00 for
this particular field. This is what this patch does.
Sebastien GODARD [Fri, 16 Jun 2017 13:17:33 +0000 (15:17 +0200)]
Start collect and summary systemd services after sysstat.service
When booting a system, systemd could start sysstat.service and
sysstat-collect.service at the exact same time, causing sysstat.service
to fail with the "flock: Resource temporarily unavailable" error
message.
To avoid the failure, ensure that sysstat.service is started before the
collect and summary services and timers, by adding
"After=sysstat.service" ordering dependencies.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: Sebastien GODARD <sysstat@users.noreply.github.com>
Replace "rd_sec/s" and "wr_sec/s" fields (expressed in sectors) with
"rkB/s" and "wkB/s" (expressed in kilobytes).
Replace "avgrq-sz" field (expressed in sectors) with "areq-sz" (expressed
in kilobytes).
Rename "avgqu-sz" field to "aqu-sz".
Original field names are still present in sadf's XML and JSON output to
keep backward compatibility.
Replace "rd_sec/s" and "wr_sec/s" fields with "rkB/s" and "wkB/s". These
fields are now expressed in kilobytes instead of sectors. This also makes
them consistent with iostat's output.
Replace "avgrq-sz" field with "areq-sz". This field is now expressed in
kilobytes instead of sectors, which makes it consistent with iostat's output.
Rename "avgqu-sz" field to "aqu-sz" to make it consistent with iostat's
output.
Notes:
1) None of these changes break the format of sar's binary data files.
The values for rkB/s, wkB/s and areq-sz fields are still saved as a
number of sectors. Only the output displayed onto the screen changes.
2) I plan to keep the original field names (in addition to the new ones)
in sadf's XML and JSON output to keep backward compatibility. This means
you will still get fields named rd_sec, wr_sec and avgrq-sz (expressed
in sectors) in XML and JSON output in addition to the new fields rkB,
wkB and areq-sz expressed in kB.
Field avgqu-sz will still exist too.
Make JSON output take into account the options used with iostat.
This fixes a problem where some fields had a name like rkB/s or wkB/s
even when data were expressed in MB. In this example, the name will now
be rMB/s or wMB/s when data are expressed in MB.
This patch renames several fields, breaking backward compatibility that
I first hoped to keep. However all fields are now consistent with
iostat's standard report.
iostat: Express requests average size in kB, not sectors
Since field "avgrq-sz" was renamed to "areq-sz", also change its unit:
This is now a number of kilobytes, and not sectors.
JSON output keeps both fields for backward compatibility: "avgrq-sz"
expressed in sectors, and "areq-sz" expressed in kilobytes.
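The conversion itself is simple, assuming the usual 512-byte sector used
for Linux block device statistics. The helper name below is hypothetical:

```c
/*
 * Hypothetical helper: convert a size expressed in sectors into
 * kilobytes. Assumption: one sector is 512 bytes, so two sectors
 * make one kilobyte.
 */
static double sectors_to_kb(double sectors)
{
	return sectors / 2.0;
}
```

For example, an average request size of 16 sectors is reported as 8 kB.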
iostat: Add new metrics in extended statistics output
Add the following metrics to iostat -x output:
%rrqm: percentage of read requests merged
%wrqm: percentage of write requests merged
rareq-sz: average size (in sectors) of the read requests
wareq-sz: average size (in sectors) of the write requests
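As an illustration, the percentage of merged requests can be computed
along these lines. The formula (merged requests divided by the sum of
merged and issued requests over the interval) and the helper name are
assumptions made for this sketch, not taken from the patch:

```c
/*
 * Hypothetical helper: percentage of requests merged over an interval.
 * merged: number of requests merged during the interval
 * issued: number of requests issued to the device during the interval
 */
static double pct_merged(unsigned long long merged,
			 unsigned long long issued)
{
	unsigned long long total = merged + issued;

	/* Avoid a division by zero when the device was idle */
	return total ? (double) merged * 100.0 / (double) total : 0.0;
}
```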
The metric previously known as "avgqu-sz" has been renamed to "aqu-sz"
(except in the JSON output where the name remains unchanged).
The metric previously known as "avgrq-sz" has been renamed to "areq-sz"
(except in the JSON output where the name remains unchanged).
The "await" metric is no longer displayed in the default output (you
have separate values for reads and writes: "r_await", "w_await"),
except in the short output version where it replaces "r_await" and
"w_await".
This patch casts some variables to target type before they are used.
Without this patch, problems may happen (like issue #150) notably on 32
bit architectures where sizeof(long) is different from sizeof(long
long).
Sebastien GODARD [Sat, 27 May 2017 13:58:54 +0000 (15:58 +0200)]
SVG: Define a max number of horizontal lines for the background grid
When a graph for percentage values is displayed, a background grid with
horizontal lines for 25%, 50%, 75% and 100% is drawn.
This works well except when sadf gets a bogus value for the maximum
value reached by the metric. For example, if sadf thinks that the max
value is 123456789% then a high number of horizontal lines will be
drawn, resulting in a huge SVG file that can weigh hundreds of megabytes
or more (NB: There are metrics which are percentage values that can go
above 100%, e.g., %commit).
This patch fixes that by setting a limit for the number of horizontal
lines that can be drawn.
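A minimal sketch of such a limit, assuming one line every 25% and a
hypothetical maximum of 10 lines (the names and values below are
illustrative, not those used by sadf):

```c
#define MAX_GRID_LINES	10	/* Hypothetical upper limit */
#define GRID_STEP_PCT	25.0	/* One horizontal line every 25% */

/*
 * Hypothetical helper: number of horizontal lines to draw for a
 * percentage graph whose highest value is max_pct.
 */
static int grid_lines_nr(double max_pct)
{
	double n;

	if (!(max_pct > 0.0))
		return 0;	/* Also rejects NaN */
	n = max_pct / GRID_STEP_PCT;
	if (n > MAX_GRID_LINES)
		return MAX_GRID_LINES;	/* Bogus or huge value: clamp */
	return (int) n;
}
```

A maximum value of 100% still gives the usual four lines, whereas a bogus
maximum such as 123456789% is clamped to the limit.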
Sebastien GODARD [Wed, 24 May 2017 09:23:03 +0000 (11:23 +0200)]
Fix #153: sar program buffer overflow when options -s or -e specified
When a short time format is used with sar's options -s or -e (e.g.,
sar -s 04:00), 5 characters are copied by strncpy() in parse_timestamp()
to the timestamp variable. Unfortunately these 5 characters do not
include the terminating null byte, so the following strcat() appends
after the next "random" null byte, writing beyond the end of timestamp.
This patch prevents this by explicitly terminating the string.
Debian bug #863197.
Reported-by: Robert Luberda
Signed-off-by: Bernhard Ubelacker <bernhardu@mailbox.org>
Signed-off-by: Sebastien GODARD <sysstat@users.noreply.github.com>
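The explicit termination can be sketched as follows. The function below
is a hypothetical, simplified stand-in for parse_timestamp(): it only
turns "HH:MM" into "HH:MM:00" and copies "HH:MM:SS" unchanged.

```c
#include <string.h>

#define TIMESTAMP_LEN	9	/* "HH:MM:SS" plus terminating null byte */

/* Hypothetical helper. Returns 0 on success, -1 on a bad length. */
static int normalize_timestamp(const char *arg,
			       char timestamp[TIMESTAMP_LEN])
{
	switch (strlen(arg)) {
	case 5:
		strncpy(timestamp, arg, 5);
		/* strncpy() copied no terminator: add it before strcat() */
		timestamp[5] = '\0';
		strcat(timestamp, ":00");
		return 0;
	case 8:
		strncpy(timestamp, arg, 8);
		timestamp[8] = '\0';
		return 0;
	default:
		return -1;
	}
}
```

Without the explicit timestamp[5] = '\0', strcat() would look for the end
of the string in whatever the buffer contained before the call.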