granicus.if.org Git - llvm/commit

author	Sanjay Patel <spatel@rotateright.com>
	Tue, 23 Apr 2019 15:20:17 +0000 (15:20 +0000)
committer	Sanjay Patel <spatel@rotateright.com>
	Tue, 23 Apr 2019 15:20:17 +0000 (15:20 +0000)
commit	fcd06adc8e2b4cccc9c4f9c7cdc7ae6872f3b6fb
tree	e57afceb029a97825465960b37191980b6cf41dc	tree \| snapshot
parent	9d3d80ccad494d31c8b290fc9e78860282f454b4	commit \| diff

[x86] use psubus for more vsetcc lowering (PR39859)

Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1

...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'.
This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.

  $ cat before.s
movdqa -16(%rip), %xmm2    ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
pxor %xmm0, %xmm2
pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
pand %xmm2, %xmm0
pandn %xmm1, %xmm2
por %xmm2, %xmm0

  $ cat after.s
movdqa -16(%rip), %xmm2    ## xmm2 = [256,256,256,256,256,256,256,256]
psubusw %xmm0, %xmm2
pxor %xmm3, %xmm3
pcmpeqw %xmm2, %xmm3
pand %xmm3, %xmm0
pandn %xmm1, %xmm3
por %xmm3, %xmm0

  $ llvm-mca before.s -mcpu=haswell
  Iterations:        100
  Instructions:      600
  Total Cycles:      909
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    0.77
  IPC:               0.66
  Block RThroughput: 1.8

  $ llvm-mca after.s -mcpu=haswell
  Iterations:        100
  Instructions:      700
  Total Cycles:      409
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    1.71
  IPC:               1.71
  Block RThroughput: 1.8

Differential Revision: https://reviews.llvm.org/D60838

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358999 91177308-0d34-0410-b5e6-96231b3b80d8

lib/Target/X86/X86ISelLowering.cpp		diff \| blob \| history
test/CodeGen/X86/vec_setcc-2.ll		diff \| blob \| history