x86: detect Bobcat, improve Atom optimizations, reorganize flags
The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
and apply the appropriate flags.
It also has an extremely slow palignr instruction; create a flag for this to
avoid massive penalties on palignr-heavy functions.
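
A minimal sketch of how such a flag can gate function selection. The flag
names, bit values, and impl_t enum here are hypothetical, chosen for
illustration only; they are not x264's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not x264's actual values. */
#define CPU_SSE2         (1u << 0)
#define CPU_SSSE3        (1u << 1)
#define CPU_SLOW_PALIGNR (1u << 2)

typedef enum { IMPL_C, IMPL_SSE2, IMPL_SSSE3 } impl_t;

/* Choose an implementation for a palignr-heavy function.
 * The SSSE3 version is taken only when SSSE3 is present AND palignr
 * is not flagged as slow; otherwise fall back to SSE2 or plain C. */
static impl_t select_palignr_heavy_impl(uint32_t cpu)
{
    impl_t impl = IMPL_C;
    if (cpu & CPU_SSE2)
        impl = IMPL_SSE2;
    if ((cpu & CPU_SSSE3) && !(cpu & CPU_SLOW_PALIGNR))
        impl = IMPL_SSSE3;
    return impl;
}
```

With these assumed flags, a CPU reporting SSE2|SSSE3|SLOW_PALIGNR keeps the
SSE2 version, while one reporting only SSE2|SSSE3 gets the SSSE3 version.
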
Improve Atom function selection and document exactly what the SLOW_ATOM flag
covers.
Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
optimizations with the sse2 algorithm to avoid pmaddubsw, which, like other
SIMD multiplies, is slow on Atom.
Drop TBM detection; it'll probably never be useful for x264.
Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
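
A rough sketch of the CMOV check, assuming the caller has already read EDX
from CPUID leaf 1 (where CMOV is reported in bit 15). The CPU_* flag names
and the asm_usable() policy are hypothetical and only illustrate one way to
fail gracefully; they are not necessarily what this commit does.

```c
#include <stdint.h>

/* CPUID leaf 1 reports CMOV support in EDX bit 15. */
#define CPUID1_EDX_CMOV (1u << 15)

/* Hypothetical flag bits for illustration; not x264's actual values. */
#define CPU_MMX2 (1u << 0)
#define CPU_CMOV (1u << 1)

/* Set the CMOV flag when the CPUID leaf 1 EDX value reports it. */
static uint32_t add_cmov_flag(uint32_t cpu, uint32_t cpuid1_edx)
{
    if (cpuid1_edx & CPUID1_EDX_CMOV)
        cpu |= CPU_CMOV;
    return cpu;
}

/* Assumed policy: asm paths are usable only with both MMX2 and CMOV.
 * A caller could log an error or fall back to C when this returns 0. */
static int asm_usable(uint32_t cpu)
{
    return (cpu & CPU_MMX2) && (cpu & CPU_CMOV);
}
```

Under this assumed policy, a chip with MMX2 but no CMOV is rejected up front
rather than crashing on the first cmov instruction in the asm.
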