Reduced IOPS when all vdevs are in the zfs_mg_fragmentation_threshold

author Serapheim Dimitropoulos <serapheim@delphix.com>

Thu, 6 Jun 2019 20:08:41 +0000 (13:08 -0700)

committer Tony Hutter <hutter2@llnl.gov>

Wed, 25 Sep 2019 18:27:46 +0000 (11:27 -0700)
author Serapheim Dimitropoulos <serapheim@delphix.com>
Thu, 6 Jun 2019 20:08:41 +0000 (13:08 -0700)
committer Tony Hutter <hutter2@llnl.gov>
Wed, 25 Sep 2019 18:27:46 +0000 (11:27 -0700)
diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5

index 5bca12e06ea2bc26118e2701de3f9fe0f7fd21e7..282563f137239a5590b47ab0be512545de7c0977 100644 (file)
--- a/man/man5/zfs-module-parameters.5
+++ b/man/man5/zfs-module-parameters.5
@@ -1817,7 +1817,7 @@ this value. If a metaslab group exceeds this threshold then it will be
  skipped unless all metaslab groups within the metaslab class have also
  crossed this threshold.
  .sp
-Default value: \fB85\fR.
+Default value: \fB95\fR.
  .RE
  
  .sp
diff --git a/module/zfs/metaslab.c b/module/zfs/metaslab.c

index ec89810b48abeed5f31355537803f2d9634e03ee..d1d5a243f403d8be3c2fd1638bc546b78cf01c9c 100644 (file)
--- a/module/zfs/metaslab.c
+++ b/module/zfs/metaslab.c
@@ -103,12 +103,27 @@ int zfs_mg_noalloc_threshold = 0;
  
  /*
   * Metaslab groups are considered eligible for allocations if their
- * fragmenation metric (measured as a percentage) is less than or equal to
- * zfs_mg_fragmentation_threshold. If a metaslab group exceeds this threshold
- * then it will be skipped unless all metaslab groups within the metaslab
- * class have also crossed this threshold.
+ * fragmenation metric (measured as a percentage) is less than or
+ * equal to zfs_mg_fragmentation_threshold. If a metaslab group
+ * exceeds this threshold then it will be skipped unless all metaslab
+ * groups within the metaslab class have also crossed this threshold.
+ *
+ * This tunable was introduced to avoid edge cases where we continue
+ * allocating from very fragmented disks in our pool while other, less
+ * fragmented disks, exists. On the other hand, if all disks in the
+ * pool are uniformly approaching the threshold, the threshold can
+ * be a speed bump in performance, where we keep switching the disks
+ * that we allocate from (e.g. we allocate some segments from disk A
+ * making it bypassing the threshold while freeing segments from disk
+ * B getting its fragmentation below the threshold).
+ *
+ * Empirically, we've seen that our vdev selection for allocations is
+ * good enough that fragmentation increases uniformly across all vdevs
+ * the majority of the time. Thus we set the threshold percentage high
+ * enough to avoid hitting the speed bump on pools that are being pushed
+ * to the edge.
   */
-int zfs_mg_fragmentation_threshold = 85;
+int zfs_mg_fragmentation_threshold = 95;
  
  /*
   * Allow metaslabs to keep their active state as long as their fragmentation
author	Serapheim Dimitropoulos <serapheim@delphix.com>
	Thu, 6 Jun 2019 20:08:41 +0000 (13:08 -0700)
committer	Tony Hutter <hutter2@llnl.gov>
	Wed, 25 Sep 2019 18:27:46 +0000 (11:27 -0700)
man/man5/zfs-module-parameters.5		patch \| blob \| history
module/zfs/metaslab.c		patch \| blob \| history