[x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.
I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226
91177308-0d34-0410-b5e6-
96231b3b80d8