The issue is caused by a small discrepancy in how userland creates the
partition layout and the kernel estimates available space:
* zpool command: subtract 9M from the usable device size, then align
to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M
EFI "reserved" partition.
* kernel module: subtract 10M from the device size. 10M is the sum of
1M "start" partition alignment + 1m "end" partition alignment + 8M
EFI "reserved" partition.
For devices where the number of sectors is not a multiple of the
alignment size the zpool command will create a partition layout which
reserves less than 1M after the 8M EFI "reserved" partition:
Disk /dev/sda: 1024 MiB,
1073739776 bytes,
2097148 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier:
49811D40-16F4-4E41-84A9-
387703950D7F
Device Start End Sectors Size Type
/dev/sda1 2048
2078719 2076672 1014M Solaris /usr & Apple ZFS
/dev/sda9
2078720 2095103 16384 8M Solaris reserved 1
When the kernel module vdev_open() the device its max_asize ends up
being slightly smaller than asize: this results in a huge number (16E)
reported by metaslab_class_expandable_space().
This change prevents bdev_max_capacity() from returing a size smaller
than bdev_capacity().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1468
Closes #8391
error = vd->vdev_ops->vdev_op_open(vd, &osize, &max_osize, &ashift);
+ /*
+ * Physical volume size should never be larger than its max size, unless
+ * the disk has shrunk while we were reading it or the device is buggy
+ * or damaged: either way it's not safe for use, bail out of the open.
+ */
+ if (osize > max_osize) {
+ vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
+ VDEV_AUX_OPEN_FAILED);
+ return (SET_ERROR(ENXIO));
+ }
+
/*
* Reset the vdev_reopening flag so that we actually close
* the vdev on error.
* case, and updating the partition table if appropriate. Once the partition
* size has been increased the additional capacity will be visible using
* bdev_capacity().
+ *
+ * The returned maximum expansion capacity is always expected to be larger, or
+ * at the very least equal, to its usable capacity to prevent overestimating
+ * the pool expandsize.
*/
static uint64_t
bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk)
* alignment restrictions. Over reporting this value isn't
* harmful and would only result in slightly less capacity
* than expected post expansion.
+ * The estimated available space may be slightly smaller than
+ * bdev_capacity() for devices where the number of sectors is
+ * not a multiple of the alignment size and the partition layout
+ * is keeping less than PARTITION_END_ALIGNMENT bytes after the
+ * "reserved" EFI partition: in such cases return the device
+ * usable capacity.
*/
available = i_size_read(bdev->bd_contains->bd_inode) -
((EFI_MIN_RESV_SIZE + NEW_START_BLOCK +
PARTITION_END_ALIGNMENT) << SECTOR_BITS);
- if (available > 0)
- psize = available;
- else
- psize = bdev_capacity(bdev);
+ psize = MAX(available, bdev_capacity(bdev));
} else {
psize = bdev_capacity(bdev);
}