str_and: [nfc] pre-compute and allocate the result string
This change does not alter the behavior of this function, but it has two
advantages:
1. The temporary scratch buffer `ex->tmp` is no longer used. Though it is not
obvious without auditing a lot of surrounding code, the data written into
this buffer does not need to be retained beyond the lifetime of this
function. Removing its use not only eliminates a code path through sfio, but
also decouples this code from other code using `ex->tmp`, making it easier to
understand. Related to #1873, #1998.
2. The prior code used an sfio temporary buffer to construct the result string
and then duplicated it into a vmalloc-allocated buffer. This is reasonable
as vmalloc has no support for incrementally constructing dynamically
allocated strings. However, we can avoid the intermediate sfio buffer by
pre-computing the size of the final vmalloc allocation that will be needed.
This change does exactly that and writes the result once into its final
destination instead of copying through an intermediate buffer. This
should not only (slightly) decrease transient heap pressure, but also
(again slightly) speed up this function.
Both of these changes also simplify the function from the compiler's point of
view. That is, an optimizing compiler should now better comprehend the intent
of this function and be able to more aggressively specialize and inline it
where relevant.
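As an illustration of the approach in point 2, here is a minimal sketch of the
count-then-write pattern. It is not the code from this change: `common_chars`
is a hypothetical function, assumed here to return the characters of `l` that
also appear in `r` with duplicates dropped, and plain `malloc` stands in for
the vmalloc allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, not the real str_and: returns a newly allocated
 * string of the characters in `l` that also occur in `r`, each emitted once
 * (at its last occurrence in `l`). The caller frees the result. */
static char *common_chars(const char *l, const char *r) {
  /* pass 1: compute the exact size of the result */
  size_t len = 1; /* 1 for the NUL terminator */
  for (const char *p = l; *p != '\0'; ++p) {
    if (strchr(r, *p) != NULL && strchr(p + 1, *p) == NULL) {
      ++len;
    }
  }

  /* a single allocation of the final size (malloc is an assumption here) */
  char *result = malloc(len);
  if (result == NULL) {
    return NULL;
  }

  /* pass 2: write the result once, directly into its final destination */
  size_t i = 0;
  for (const char *p = l; *p != '\0'; ++p) {
    if (strchr(r, *p) != NULL && strchr(p + 1, *p) == NULL) {
      assert(i + 1 < len && "result buffer too small");
      result[i++] = *p;
    }
  }
  result[i] = '\0';
  return result;
}
```

Both passes apply the same predicate, so the size computed in the first pass
matches what the second pass writes. No scratch buffer is involved, at the cost
of scanning the input twice.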