Sebastien GODARD [Sat, 24 Mar 2018 09:33:20 +0000 (10:33 +0100)]
sar: Softnet stats: Improve support for offline/online CPUs
When a CPU goes offline, the corresponding line in the
/proc/net/softnet_stat file disappears. The problem is that there is no
immediate solution to know which line goes with which CPU.
To fix this, we now use the /proc/stat file to know which CPUs are online
and which ones are not.
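A minimal sketch of the idea, not the actual sar code: it assumes that offline CPUs have no "cpuN" line in /proc/stat and that the remaining lines of /proc/net/softnet_stat follow the online CPUs in ascending order. The function names and sample data are made up for illustration.

```python
import re

def online_cpus(proc_stat_text):
    """Return the sorted CPU numbers that have a 'cpuN' line in /proc/stat text."""
    cpus = []
    for line in proc_stat_text.splitlines():
        # Matches "cpu0 ...", "cpu12 ..." but not the global "cpu  ..." line.
        m = re.match(r"cpu(\d+)\s", line)
        if m:
            cpus.append(int(m.group(1)))
    return sorted(cpus)

def map_softnet_lines(proc_stat_text, softnet_text):
    """Pair each softnet_stat line with an online CPU, in order (assumption)."""
    cpus = online_cpus(proc_stat_text)
    lines = [l for l in softnet_text.splitlines() if l.strip()]
    return dict(zip(cpus, lines))

# Made-up sample: CPU 1 is offline, so it has no line in either file.
SAMPLE_STAT = (
    "cpu  30 0 30 300 0 0 0 0 0 0\n"
    "cpu0 10 0 10 100 0 0 0 0 0 0\n"
    "cpu2 20 0 20 200 0 0 0 0 0 0\n"
    "intr 12345\n"
)
SAMPLE_SOFTNET = "0000000a 00000000 00000001\n0000000b 00000000 00000002\n"

mapping = map_softnet_lines(SAMPLE_STAT, SAMPLE_SOFTNET)
```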
Moreover, when a CPU comes back online, its counters get their original
values, which makes sar think they have just jumped from 0:
This is what sar displayed before (in this example, CPU 5 goes offline):
Sebastien GODARD [Fri, 23 Mar 2018 11:25:44 +0000 (12:25 +0100)]
mpstat: Remove option "-P ON"
mpstat no longer displays offline CPUs, so option "-P ON", which told
mpstat to display only online CPUs, is no longer needed.
This is the same behavior as sar.
Sebastien GODARD [Fri, 23 Mar 2018 11:14:49 +0000 (12:14 +0100)]
mpstat: Compute stats for node "all" as the sum of individual CPUs
Don't use global CPU stats from /proc/stat file for node "all".
Compute stats for node "all" as the sum of individual ones.
Also better handle CPUs that go offline or back online.
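As a rough illustration (not the mpstat code itself), the "all" entry can be built by summing each counter over the CPUs that are present. The field names and values below are made up.

```python
def sum_cpu_stats(per_cpu):
    """Sum per-CPU counters field by field to build the "all" entry."""
    total = {}
    for counters in per_cpu.values():
        for field, value in counters.items():
            total[field] = total.get(field, 0) + value
    return total

# Hypothetical jiffies counters; CPU 1 is offline and therefore absent.
per_cpu = {
    0: {"user": 100, "system": 40, "idle": 860},
    2: {"user": 50, "system": 10, "idle": 940},
}
all_stats = sum_cpu_stats(per_cpu)
```

Because only the CPUs actually read contribute to the sum, the "all" values cannot disagree with the individual lines.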
Sebastien GODARD [Fri, 16 Mar 2018 09:58:37 +0000 (10:58 +0100)]
sar: Compute global CPU stats as the sum of individual ones
sar used to get statistics for CPU "all" from the first line of the
/proc/stat file giving global CPU utilization.
There are several problems with this:
1) With recent kernels (problem detected on a 4.4.14 kernel), the number
of jiffies spent in idle and iowait modes given by this file for global
CPU utilization goes crazy when a CPU is set offline or comes back
online. These counters may not even be monotonic, resulting in wrong
results being displayed by sar.
E.g.:
2) The updating of the /proc/stat global and individual values is not
done atomically. As a result there can be skew between the global and
individual values reported by sar.
E.g.:
sar: Better assess size of buffers that need to be reallocated
When a buffer needs to be reallocated, doubling its size may not be
enough to contain all the additional items.
Assess the needed size based on a value giving the minimum number of
items the buffer should be able to contain.
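A small sketch of this sizing rule, under the assumption that the buffer still grows by doubling but keeps doing so until the minimum number of items fits. The function name is hypothetical, and the guard against a zero starting size is an addition of this sketch.

```python
def new_buffer_size(current, needed):
    """Return a size >= needed, growing from `current` by repeated doubling."""
    if needed <= current:
        return current
    # Assumption: start from 1 if the buffer was empty, otherwise doubling
    # would never make progress.
    size = max(current, 1)
    while size < needed:
        size *= 2
    return size
```

For instance, a buffer of 8 items that must hold 20 grows to 32 rather than stopping at 16.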
sar: Test for zero value when reallocating all the buffers
When sar reads the contents of a file and meets a LINUX RESTART, it
may have to reallocate the buffers used to save CPU statistics to match
the new number of CPUs. If the number of CPUs has increased, it doubles the
size of its buffers.
There is a problem though if CPU activity was not included in the file (or
if its format was unknown to current sysstat version). In this case the
buffer size was zero. So test this before reallocating buffers.
Below is a sample output before the patch was applied.
The file contains only A_PCSW activity then a LINUX RESTART message
(which cannot be displayed):
$ sar -f data0 -w
Linux 4.4.14-200.fc22.x86_64 (test.home) 03/04/18 _x86_64_ (8 CPU)
Sebastien GODARD [Sun, 18 Feb 2018 15:18:58 +0000 (16:18 +0100)]
sar: Add new -z option
Add new option to sar (-z). This option tells sar to omit output for any
devices for which there was no activity during the sample period.
This option applies to network interfaces, interrupts, block devices,
serial lines, filesystems, CPU (in softnet statistics).
Note: This option already existed for iostat.
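The filtering can be pictured as follows. This is only a sketch with made-up device names and values; it assumes "no activity" means that every displayed value for the device is zero over the sample period.

```python
def drop_idle(rows):
    """Keep only the devices that have at least one non-zero value."""
    return {name: vals for name, vals in rows.items() if any(v != 0 for v in vals)}

# Hypothetical per-device statistics for one sample period.
sample = {"eth0": [12.0, 3.5], "lo": [0.0, 0.0], "sda": [0.0, 7.1]}
active = drop_idle(sample)
```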
Sebastien GODARD [Mon, 12 Feb 2018 14:54:01 +0000 (15:54 +0100)]
sysstat-11.7.2
sysstat version 11.7.2 final packaging.
lsm and spec files updated.
Changelog added.
Year updated in (C) message.
After one additional month of hard work, sysstat version 11.7.2 seems to
be quite stable on my machine. It should now be possible to use it
safely on production systems.
BTW I told you that sar's data files new binary format would be at least
25% smaller than with previous versions. I could have even said 30% to
45% smaller!
Version 11.7.2 also includes a rewritten function that will enable you
to convert your old data files (from versions 9.1.6 and later) to the
up-to-date format (11.7.2).
Please upload, test and tell me if anything goes wrong.
Sebastien GODARD [Sun, 11 Feb 2018 10:43:20 +0000 (11:43 +0100)]
sar/sadf: Add checks on file's header size read from file
sa_open_read_magic() reads file's magic structure and in particular the
size of the header structure to come.
header_size field in file_magic structure exists only with versions
10.3.1 and later. So checking bounds for header_size is done only for
those versions, based on the values of sysstat_version and
sysstat_patchlevel.
With a corrupted datafile (i.e. a file having the right FORMAT_MAGIC
value but values corresponding to older sysstat versions in
sysstat_version and sysstat_patchlevel), the test is not done.
So do it again in check_file_actlst() function.
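The sketch below illustrates why the second check is needed. It is not the sysstat code: the bounds are hypothetical, the two functions only stand in for the checks made in sa_open_read_magic() and check_file_actlst(), and the version is modelled as a simple tuple.

```python
FIRST_VERSION_WITH_HEADER_SIZE = (10, 3, 1)  # from the commit message
MIN_HEADER_SIZE = 1      # hypothetical lower bound
MAX_HEADER_SIZE = 8192   # hypothetical upper bound

def size_in_bounds(header_size):
    return MIN_HEADER_SIZE <= header_size <= MAX_HEADER_SIZE

def check_at_open(version, header_size):
    """First check: bounds are tested only if the version says the field exists."""
    if version >= FIRST_VERSION_WITH_HEADER_SIZE:
        return size_in_bounds(header_size)
    return True  # field assumed absent, nothing to test

def check_in_actlst(header_size):
    """Second check: bounds are tested unconditionally."""
    return size_in_bounds(header_size)

# A corrupted file: current format magic, but old version fields and a bogus size.
corrupted_version, corrupted_size = (9, 1, 6), 10**9
```

Here check_at_open() lets the corrupted file through because of its version fields, while check_in_actlst() rejects it.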
Offline CPUs should be ignored only when graphs are drawn and if they
have been offline for the whole period.
When graphs are created, offline CPUs are useful: Their frequency value
(0) is used to make the graph go through 0.
Don't display a graph for CPUs which have been offline for the whole
period.
A CPU whose state alternates between offline and online still gets a
graph.
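A minimal sketch of this selection rule, assuming each CPU comes with one online/offline flag per sample. The data layout and function name are made up for illustration.

```python
def cpus_to_graph(samples):
    """Return the CPUs that were online in at least one sample."""
    return sorted(cpu for cpu, states in samples.items() if any(states))

# Hypothetical flags: True = online, False = offline, one flag per sample.
states = {
    0: [True, True, True],
    1: [False, False, False],   # offline for the whole period: no graph
    2: [False, True, False],    # alternates: still gets a graph
}
selected = cpus_to_graph(states)
```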
For very small files with just a few samples taken a few seconds apart
(e.g. 15 samples taken at a 1-second interval) the background grid was not
drawn across the whole width of the graph.
Fix this.
SVG: Make sure that "No data" is displayed when no graphs are created
If no graphs are detected (e.g. the file contains only a LINUX RESTART
message) then the SVG file is empty and an error is displayed by the browser.
Make sure that "No data" is displayed in the browser window instead.
Add several tests for continuous integration.
In particular some tests, which previously failed because of endianness
mismatch (see #145), are restored (sar should now be able to process
both endian formats).
@nr_ini shouldn't be modified by upgrade_stats_serial() function because
it is used to know how many structures have to be read from the
original file.
sadf: Properly convert structures with compatible changes
Some activity structures may have gained additional fields while keeping
their original magic number. This is because these changes were seen as
compatible for sar.
Anyway these structures have to be handled in a specific way when they
are converted since these additional fields didn't exist for all the
versions since 9.1.6.
Update sadf function used to convert an old datafile (from version 9.1.6
and later) to the up-to-date format (11.7.2). This function corresponds
to sadf option -c ("sadf -c old_datafile > new_datafile").
It should work on both little endian and big endian data files.
The original endianness of the file is preserved.
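To illustrate what preserving the endianness means, here is a sketch using Python's struct module. It is not how sadf is implemented: the magic value is a placeholder, and widening a 32-bit field to 64 bits is just an assumed example of a format upgrade.

```python
import struct

PLACEHOLDER_MAGIC = 0xD596  # placeholder value used only for this sketch

def detect_byte_order(first_two_bytes, magic=PLACEHOLDER_MAGIC):
    """Return '<' (little endian) or '>' (big endian) from a 16-bit magic number."""
    if struct.unpack("<H", first_two_bytes)[0] == magic:
        return "<"
    if struct.unpack(">H", first_two_bytes)[0] == magic:
        return ">"
    raise ValueError("not a recognised data file")

def widen_counter(raw, order):
    """Convert a 32-bit counter to 64 bits, keeping the file's byte order."""
    (value,) = struct.unpack(order + "I", raw)
    return struct.pack(order + "Q", value)

# A big-endian file stays big endian after conversion.
order = detect_byte_order(struct.pack(">H", PLACEHOLDER_MAGIC))
converted = widen_counter(struct.pack(">I", 258), order)
```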
Sebastien GODARD [Sat, 27 Jan 2018 06:47:55 +0000 (07:47 +0100)]
Save HZ as an unsigned long integer
The number of clock ticks per second is given as a long integer by the
sysconf() function. It is also saved as an unsigned long integer in sar's
data files (sa_hz).
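For reference, the same value can be read from Python on a Unix system. This only shows where HZ comes from; it says nothing about how sar stores it.

```python
import os

# SC_CLK_TCK is the sysconf() name for the number of clock ticks per second.
hz = os.sysconf("SC_CLK_TCK")
print(f"HZ = {hz}")
```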
Sebastien GODARD [Mon, 22 Jan 2018 16:06:21 +0000 (17:06 +0100)]
iostat: Refresh device list properly
When running iostat to monitor disk activity,
disconnecting a USB drive from the system then reconnecting another one
didn't make the new one appear on the list.
This patch fixes the problem.
Reported-by: Robert Hoffmann <robert@noreply.servermasters.com>
Signed-off-by: Sebastien GODARD <sysstat@users.noreply.github.com>
Sebastien GODARD [Sat, 20 Jan 2018 19:54:41 +0000 (20:54 +0100)]
RAW: Check return code from check_*_reg functions
Distinguish between new devices and devices that have been unregistered
then registered again (this indication is displayed by "sadf -r" with
showhints option).
Sebastien GODARD [Fri, 19 Jan 2018 15:52:15 +0000 (16:52 +0100)]
sar: Don't read statistics twice when displaying average since system startup
Entering something like "sar 0" to display statistics (here CPU) since
system startup made sadc read the statistics twice. One reading is
enough.
Before this patch, sar called sadc with interval=1 and count=-1.
Now sadc is called with interval=1 and count=1.
Sebastien GODARD [Fri, 19 Jan 2018 15:10:14 +0000 (16:10 +0100)]
sar: Display all items for USB and filesystems activities
The summary displayed by sar for USB and filesystems activities didn't
include all the items. In particular, filesystems or USB devices that
had been unmounted were not displayed in the summary ending the report.
This patch fixes the problem.
Sebastien GODARD [Wed, 17 Jan 2018 20:26:56 +0000 (21:26 +0100)]
sadc: Select activities by name
This patch enables the user to select exactly which activities will be
collected by sadc and saved into the binary data file.
Selected activities are entered following sadc's option -S using their
formal report name (these names are displayed by "sar --help").
Sebastien GODARD [Wed, 17 Jan 2018 19:49:49 +0000 (20:49 +0100)]
iostat: Display device name at the end of line when option -h used
iostat's option -h was intended to make the output more easily readable
by a human, especially when some devices had a long name that would make
the whole output shift around. In fact it added a '\n' to split the line
and the result was not particularly easier to read.
This patch moves the device name to the end of the line when option -h
is used, without inserting more lines.
Option -h still sets option --human to automatically select the right
unit to display.
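The effect on column alignment can be sketched like this. The column widths and the layout are made up and do not reproduce iostat's real output.

```python
def format_row(name, values, name_last=False):
    """Format one device line, with the name either first or last."""
    cols = "".join(f"{v:10.2f}" for v in values)
    if name_last:
        return f"{cols} {name}"      # a long name cannot shift the columns
    return f"{name:<8}{cols}"        # a name longer than 8 chars shifts them

short = format_row("sda", [1.5, 2.0], name_last=True)
long_ = format_row("a-very-long-device-mapper-name", [1.5, 2.0], name_last=True)
```

With the name printed last, the numeric columns start at the same offsets whatever the length of the name.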
Sebastien GODARD [Mon, 15 Jan 2018 16:38:27 +0000 (17:38 +0100)]
sadf: RAW: Display activity name when showhints option used
"sadf -r -O showhints..." now displays the name for each activity.
Also move the code displaying the number of structures used and allocated
from raw_stats.c to sadf.c.
Sebastien GODARD [Fri, 12 Jan 2018 10:56:29 +0000 (11:56 +0100)]
sysstat-11.7.1
sysstat version 11.7.1 final packaging.
lsm and spec files updated.
Changelog added.
NOTE: Stable versions (11.2.14 [EOL], 11.4.8 and 11.6.2) also exist
and will be available for download from my web site:
http://pagesperso-orange.fr/sebastien.godard/
The stable versions include only the bug fixes added in sysstat 11.7.1
but not the new features.
Version 11.7.1 includes major changes concerning sar's binary data
file format and how sar works. Among them:
* A binary data file should take much less space on disk thanks
to new optimizations on how values are stored in memory. The space saved
is estimated to be at least 25% compared to the previous format.
* Structures are no longer statically allocated, meaning that
the system can now register as many new devices (disks, network
interfaces, etc.) as needed and you will find all of them saved in your
saXX data file (provided that you have selected the corresponding
activities to collect). Previous versions could lead to some devices
being ignored and going unnoticed if no free structures were left.
* Sar (and sadf) will now be able to read a binary data file
whatever its endianness is: Both big-endian and little-endian files can
be read by the same sar or sadf executable.
* More flexibility has been added to sar's binary data file
format. I cannot promise that the format won't change again in the
future but it should now be possible to add new metrics or activities
without making the format unreadable by older sar versions (starting
with version 11.7.1). Older versions will be able to read newer formats
but will display only the metrics they currently know.
Version 11.7.1 is a development version. As such, it is still NOT ready
for use in production. Yet I would really like to have it tested by the
maximum number of users and get feedback! :-)
NOTE: The function provided by sadf (option -c) to convert an old
data file to the up-to-date format has been temporarily inhibited. This
function will be working again in next sysstat version (11.7.2).
Sebastien GODARD [Fri, 12 Jan 2018 09:15:40 +0000 (10:15 +0100)]
sar: Add new option -h
Add option -h to make sar's output easier to read by a human.
This option tells sar to display {devices, network interfaces} names at
the end of the line to make sure the output won't shift. This option
also enables implicitly --human and -p (pretty-print).
SVG: Put queue and load average metrics back in order
Commit 4a15362 changed the order of metrics in stats_queue structure.
This new order must also be taken into account by the corresponding
rendering function in svg_stats.c.
File activity.c should depend on rd_stats.h and rd_sensors.h
for sadf and sar in addition to sadc.
Structures size (STATS_*_SIZE) is defined in these include files and
used in activity.c for all commands.
sadf: Update functions to display unsigned long long values
Some metrics' type is now unsigned long long instead of unsigned long.
So update functions used by sadf accordingly. Other metrics whose type
remains unsigned long can still be displayed.
sar: Update functions to be consistent with new stats structures
Update functions used to read or write statistics to remain consistent
with structures' new types.
Some work still has to be done in rndr_stats.c and svg_stats.c.
Sebastien GODARD [Sun, 31 Dec 2017 14:35:00 +0000 (15:35 +0100)]
sar: Fix code trying to guess when a header line needs to be displayed
sar should repeat the header line for each activity at regular intervals
when displaying stats from file.
The code previously used needs to be updated as the number of items may
now vary with each sample.
Sebastien GODARD [Sun, 31 Dec 2017 08:32:33 +0000 (09:32 +0100)]
sar: Print items' name on the right of the report
The names of some items (devices, network interfaces) may be long enough
to shift the statistics displayed by sar out of their original
columns.
So display them at the end of the line on the right.