In order to validate the gang block code ztest is configured to
artificially force a fraction of large blocks to be written as
gang blocks. The default setting chosen for this was to
write 25% of all blocks 32k or larger using gang blocks.
The confluence of an unrealistically large number of gang blocks,
the aggressive fault injection done by ztest, and the split
segment reconstruction logic introduced by device removal has
resulted in the following type of failure:
zdb -bccsv -G -d ... exit code 3
Specifically, zdb was unable to open the pool because it was
unable to reconstruct a damaged block. Manual investigation
of multiple failures clearly showed that the block could be
reconstructed. However, due to the large number of damaged
segments (>35) it could not be done in the allotted time.
Furthermore, the large number of gang blocks was determined
to be the reason for the unrealistically large number of
damaged segments. In order to make this situation less
likely, this change both increases the forced gang block
size to 64k and reduces the frequency to 3% of blocks.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8080
.zo_init = 1,
.zo_time = 300, /* 5 minutes */
.zo_maxloops = 50, /* max loops during spa_freeze() */
- .zo_metaslab_force_ganging = 32 << 10,
+ .zo_metaslab_force_ganging = 64 << 10,
.zo_special_vdevs = ZTEST_VDEV_CLASS_RND,
};
/*
* For testing, make some blocks above a certain size be gang blocks.
- * This will also test spilling from special to normal.
+ * This will result in more split blocks when using device removal,
+ * and a large number of split blocks coupled with ztest-induced
+ * damage can result in extremely long reconstruction times. This
+ * will also test spilling from special to normal.
*/
- if (psize >= metaslab_force_ganging && (ddi_get_lbolt() & 3) == 0) {
+ if (psize >= metaslab_force_ganging && (spa_get_random(100) < 3)) {
metaslab_trace_add(zal, NULL, NULL, psize, d, TRACE_FORCE_GANG,
allocator);
return (SET_ERROR(ENOSPC));