On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum
throughput for XCHG is 1 IPC. The byte exchange has a worse latency (2cy) and
decodes to one extra uOP; maximum observed throughput is 0.5 IPC.
```
xchgb %cl, %dl # Latency: 2cy - uOPs: 3 - 2 ALU
xchgw %cx, %dx # Latency: 1cy - uOPs: 2 - 2 ALU
xchgl %ecx, %edx # Latency: 1cy - uOPs: 2 - 2 ALU
xchgq %rcx, %rdx # Latency: 1cy - uOPs: 2 - 2 ALU
```
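As a cross-check (not part of the patch), the RThroughput values that llvm-mca reports for the reg-reg forms can be reproduced from the ResourceCycles in the new model. The sketch assumes that an instruction's reciprocal throughput is the maximum, over the resources it uses, of its resource cycles divided by the number of units in that resource. The `UNITS` table and the `rthroughput` helper are illustrative names, not LLVM APIs.

```python
# Sketch (not llvm-mca code): per-instruction reciprocal throughput derived
# from ResourceCycles, assuming RThroughput = max(cycles / units) over the
# resources an instruction consumes.

# Units per resource; JALU01 is the group made of JALU0 and JALU1.
UNITS = {"JALU01": 2, "JLAGU": 1, "JSAGU": 1}


def rthroughput(resource_cycles):
    """resource_cycles maps a resource name to its ResourceCycles value."""
    return max(cycles / UNITS[res] for res, cycles in resource_cycles.items())


# XCHG16/32/64rr: WriteXCHG is now [JALU01] with ResourceCycles [2].
xchg_rr = rthroughput({"JALU01": 2})
# XCHG8rr (and XADDrr): JWriteXCHG8rr_XADDrr uses ResourceCycles [3].
xchg8_rr = rthroughput({"JALU01": 3})

print(f"xchgw/l/q reg-reg RThroughput: {xchg_rr:.2f}")   # 1.00
print(f"xchgb reg-reg RThroughput:     {xchg8_rr:.2f}")  # 1.50
```

Under the same assumption, charging JLAGU and JSAGU 16 cycles each gives 16.0 for the reg-mem XCHG forms, which is the value in the updated CHECK lines.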
The reg-mem forms of XCHG are atomic operations (they are implicitly locked)
with an observed latency of 16cy. The resource usage is similar to the XCHGrr
variants. The biggest difference is obviously the bus-locking, which prevents
the LS unit from issuing other memory uOPs in parallel until the unlocking store
uOP has executed.
```
xchgb %cl, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgw %cx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgl %ecx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgq %rcx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
```
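A simplified view of how the model approximates the bus-locking: the patch charges JLAGU and JSAGU 16 cycles each for a reg-mem XCHG (`JWriteXCHGrm_LdSt_Part`), so independent locked operations can only start once every 16 cycles. The sketch below assumes a single unit for each AGU and ignores ALU pressure and dispatch width; `issue_cycles` is a hypothetical helper.

```python
# Sketch: independent atomic reg-mem XCHGs serialized by the load/store AGUs,
# assuming each one keeps JLAGU and JSAGU busy for 16 cycles
# (ResourceCycles = [16, 16] in JWriteXCHGrm_LdSt_Part).

LOCKED_AGU_CYCLES = 16


def issue_cycles(num_ops, busy_cycles=LOCKED_AGU_CYCLES, first_cycle=0):
    """Return the start cycle of each op, when every op must wait for the
    previous one to release the single JLAGU/JSAGU pair."""
    starts = []
    agu_free_at = first_cycle
    for _ in range(num_ops):
        start = agu_free_at              # wait until the AGU pair is released
        starts.append(start)
        agu_free_at = start + busy_cycles
    return starts


print(issue_cycles(4))         # [0, 16, 32, 48]
print(1 / LOCKED_AGU_CYCLES)   # sustained throughput in IPC: 0.0625
```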
The exchanged in/out register operand becomes available 11cy after the start of
execution. Added test xchg.s to verify that the register write is committed
after 11cy (and not 16cy).
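The first iteration of xchg.s can be reasoned about by hand. Assuming the XCHG starts executing at cycle 1 (as in the timeline view) and ignoring dispatch and port contention, the sketch below computes when each ECX consumer can start. With the 11cy register write, the first addl starts at cycle 12, consistent with the [0,1] row of the xchg.s timeline. `chain_start_cycles` is a hypothetical helper; the add and imul latencies are the btver2 values from the Instruction Info table.

```python
# Sketch of the ECX dependency chain in xchg.s:
#   xchg %ecx, (%rsp); add; add; imul; imul
# Each consumer starts as soon as its ECX input is ready; dispatch and
# port contention are ignored.

ADD_LATENCY = 1   # addl on btver2
IMUL_LATENCY = 3  # imull on btver2


def chain_start_cycles(ecx_write_latency, xchg_start=1):
    """Return the start cycle of each instruction that depends on ECX."""
    ready = xchg_start + ecx_write_latency  # ECX produced by the XCHG
    starts = []
    for latency in (ADD_LATENCY, ADD_LATENCY, IMUL_LATENCY, IMUL_LATENCY):
        starts.append(ready)
        ready += latency
    return starts


print(chain_start_cycles(11))  # [12, 13, 14, 17] with the 11cy ECX write
print(chain_start_cycles(16))  # [17, 18, 19, 22] if ECX waited the full 16cy
```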
Reg-reg XADD instructions have the same latency/throughput as the byte
exchange (register-register variant).
```
xaddb %cl, %dl # latency: 2cy - uOPs: 3 - 3 ALU
xaddw %cx, %dx # latency: 2cy - uOPs: 3 - 3 ALU
xaddl %ecx, %edx # latency: 2cy - uOPs: 3 - 3 ALU
xaddq %rcx, %rdx # latency: 2cy - uOPs: 3 - 3 ALU
```
The non-atomic RM variants have a latency of 11cy and decode to 4
macro-opcodes. They still consume 2 ALU pipes, and the exchanged in/out register
operand becomes available after 3cy (this matches the load-to-use latency).
```
xaddb %cl, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddw %cx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddl %ecx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddq %rcx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
```
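Why the model needs two writes per XADDrm can be sketched as follows, assuming (as the comment on `JWriteXADDrm_XCHG_Part` in the patch states) that the instruction latency is the maximum over all of its write latencies, while the register operand takes the latency of its own write. The `Write` dataclass is a hypothetical stand-in for a `SchedWriteRes` definition.

```python
# Sketch: combining the two writes attached to a non-atomic XADDrm, assuming
# instruction latency = max(write latencies) and #uOps = sum over writes.
from dataclasses import dataclass


@dataclass
class Write:
    name: str
    latency: int
    num_micro_ops: int


XADDRM_WRITES = [
    Write("JWriteXADDrm_XCHG_Part", latency=3, num_micro_ops=3),   # in/out reg
    Write("JWriteXADDrm_LdSt_Part", latency=11, num_micro_ops=1),  # load/store
]

instr_latency = max(w.latency for w in XADDRM_WRITES)
ecx_latency = XADDRM_WRITES[0].latency  # first write describes the register
total_uops = sum(w.num_micro_ops for w in XADDRM_WRITES)

print(instr_latency, ecx_latency, total_uops)  # 11 3 4
```

A single write could not express both numbers: giving it latency 11 would delay ECX consumers, and giving it latency 3 would understate the instruction latency.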
The atomic XADD variants have a latency of 16cy. The in/out register operand
is available 11cy after the start of execution.
```
lock xaddb %cl, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddw %cx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddl %ecx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddq %rcx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
```
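In the model, the choice between the atomic and non-atomic writes is made by a `SchedWriteVariant` using the `CheckLockPrefix` predicate. The Python function below is a hypothetical restatement of that selection using the latency and resource-cycle values from the patch; it reproduces the 11cy/16cy instruction latencies and the 1.00/16.00 JLAGU and JSAGU pressure in the updated CHECK lines.

```python
# Sketch of the XADDrm variant selection: the LOCK prefix picks the atomic
# writes, otherwise the non-atomic ones. Values come from the patch; the
# function itself is hypothetical.

def xadd_rm_sched(has_lock_prefix):
    """Return (ecx_latency, instr_latency, agu_cycles) for an XADDrm."""
    if has_lock_prefix:
        part1 = 11            # JWriteLOCK_XADDrm_XCHG_Part
        part2, agu = 16, 16   # JWriteXCHGrm_LdSt_Part: Latency 16, [16, 16]
    else:
        part1 = 3             # JWriteXADDrm_XCHG_Part (load-to-use latency)
        part2, agu = 11, 1    # JWriteXADDrm_LdSt_Part: Latency 11, [1, 1]
    return part1, max(part1, part2), agu


for locked in (False, True):
    ecx, lat, agu = xadd_rm_sched(locked)
    prefix = "lock " if locked else ""
    print(f"{prefix}xaddl: ECX {ecx}cy, latency {lat}cy, JLAGU/JSAGU {agu}cy")
```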
Added test xadd.s to verify those latencies as well as read-advance values.
Differential Revision: https://reviews.llvm.org/D66535
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369642 91177308-0d34-0410-b5e6-96231b3b80d8
defm : X86WriteRes<WriteBSWAP64, [JALU01], 1, [1], 1>;
defm : X86WriteRes<WriteCMPXCHG, [JALU01], 3, [3], 5>;
defm : X86WriteRes<WriteCMPXCHGRMW, [JALU01, JSAGU, JLAGU], 11, [3, 1, 1], 6>;
-defm : X86WriteRes<WriteXCHG, [JALU01], 1, [1], 1>;
+defm : X86WriteRes<WriteXCHG, [JALU01], 1, [2], 2>;
defm : JWriteResIntPair<WriteIMul8, [JALU1, JMul], 3, [1, 1], 2>;
defm : JWriteResIntPair<WriteIMul16, [JALU1, JMul], 3, [1, 1], 2>;
NOT8m, NOT16m, NOT32m, NOT64m,
NEG8m, NEG16m, NEG32m, NEG64m)>;
+def JWriteXCHG8rr_XADDrr : SchedWriteRes<[JALU01]> {
+ let Latency = 2;
+ let ResourceCycles = [3];
+ let NumMicroOps = 3;
+}
+def : InstRW<[JWriteXCHG8rr_XADDrr], (instrs XCHG8rr, XADD8rr, XADD16rr,
+ XADD32rr, XADD64rr)>;
+
+// This write defines the latency of the in/out register operand of a non-atomic
+// XADDrm. This is the first of a pair of writes that model non-atomic
+// XADDrm instructions (the second write definition is JWriteXADDrm_LdSt_Part).
+//
+// We need two writes because the instruction latency differs from the output
+// register operand latency. In particular, the first write describes the first
+// (and only) output register operand of the instruction. However, the
+// instruction latency is set to the MAX of all the write latencies. That's why
+// a second write is needed in this case (see example below).
+//
+// Example:
+// XADD %ecx, (%rsp) ## Instruction latency: 11cy
+// ## ECX write Latency: 3cy
+//
+// Register ECX becomes available in 3 cycles. That is because the value of ECX
+// is exchanged with the value loaded from memory at (%rsp), and the
+// load-to-use latency is assumed to be 3cy.
+def JWriteXADDrm_XCHG_Part : SchedWriteRes<[JALU01]> {
+ let Latency = 3; // load-to-use latency
+ let ResourceCycles = [3];
+ let NumMicroOps = 3;
+}
+
+// This write defines the latency of the in/out register operand of an atomic
+// XADDrm. This is the first of a sequence of two writes used to model atomic
+// XADD instructions. The second write of the sequence is JWriteXCHGrm_LdSt_Part.
+//
+//
+// Example:
+// LOCK XADD %ecx, (%rsp) ## Instruction Latency: 16cy
+// ## ECX write Latency: 11cy
+//
+// The value of ECX becomes available only after 11cy from the start of
+// execution. This write is used to specifically set that operand latency.
+def JWriteLOCK_XADDrm_XCHG_Part : SchedWriteRes<[JALU01]> {
+ let Latency = 11;
+ let ResourceCycles = [3];
+ let NumMicroOps = 3;
+}
+
+// This write defines the latency of the in/out register operand of an atomic
+// XCHGrm. This write is the first of a sequence of two writes that describe
+// atomic XCHG operations. We need two writes because the instruction latency
+// differs from the output register write latency. We want to make sure that
+// the output register operand becomes visible after 11cy. However, we want to
+// set the instruction latency to 16cy.
+def JWriteXCHGrm_XCHG_Part : SchedWriteRes<[JALU01]> {
+ let Latency = 11;
+ let ResourceCycles = [2];
+ let NumMicroOps = 2;
+}
+
+def JWriteXADDrm_LdSt_Part : SchedWriteRes<[JLAGU, JSAGU]> {
+ let Latency = 11;
+ let ResourceCycles = [1, 1];
+ let NumMicroOps = 1;
+}
+
+def JWriteXCHGrm_LdSt_Part : SchedWriteRes<[JLAGU, JSAGU]> {
+ let Latency = 16;
+ let ResourceCycles = [16, 16];
+ let NumMicroOps = 1;
+}
+
+def JWriteXADDrm_Part1 : SchedWriteVariant<[
+ SchedVar<MCSchedPredicate<CheckLockPrefix>, [JWriteLOCK_XADDrm_XCHG_Part]>,
+ SchedVar<NoSchedPred, [JWriteXADDrm_XCHG_Part]>
+]>;
+
+def JWriteXADDrm_Part2 : SchedWriteVariant<[
+ SchedVar<MCSchedPredicate<CheckLockPrefix>, [JWriteXCHGrm_LdSt_Part]>,
+ SchedVar<NoSchedPred, [JWriteXADDrm_LdSt_Part]>
+]>;
+
+def : InstRW<[JWriteXADDrm_Part1, JWriteXADDrm_Part2, ReadAfterLd],
+ (instrs XADD8rm, XADD16rm, XADD32rm, XADD64rm,
+ LXADD8, LXADD16, LXADD32, LXADD64)>;
+
+def : InstRW<[JWriteXCHGrm_XCHG_Part, JWriteXCHGrm_LdSt_Part, ReadAfterLd],
+ (instrs XCHG8rm, XCHG16rm, XCHG32rm, XCHG64rm)>;
+
+
////////////////////////////////////////////////////////////////////////////////
// Floating point. This covers both scalar and vector operations.
////////////////////////////////////////////////////////////////////////////////
# CHECK-NEXT: 1 4 1.00 * testq %rsi, (%rax)
# CHECK-NEXT: 1 100 0.50 * U ud2
# CHECK-NEXT: 1 100 0.50 U wrmsr
-# CHECK-NEXT: 1 1 0.50 xaddb %bl, %cl
-# CHECK-NEXT: 1 4 1.00 * * xaddb %bl, (%rcx)
-# CHECK-NEXT: 1 4 1.00 * * lock xaddb %bl, (%rcx)
-# CHECK-NEXT: 1 1 0.50 xaddw %bx, %cx
-# CHECK-NEXT: 1 4 1.00 * * xaddw %ax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xaddw %ax, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xaddl %ebx, %ecx
-# CHECK-NEXT: 1 4 1.00 * * xaddl %eax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xaddl %eax, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xaddq %rbx, %rcx
-# CHECK-NEXT: 1 4 1.00 * * xaddq %rax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xaddq %rax, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xchgb %bl, %cl
-# CHECK-NEXT: 1 4 1.00 * * xchgb %bl, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xchgb %bl, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xchgw %bx, %ax
-# CHECK-NEXT: 1 1 0.50 xchgw %bx, %cx
-# CHECK-NEXT: 1 4 1.00 * * xchgw %ax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xchgw %ax, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xchgl %ebx, %eax
-# CHECK-NEXT: 1 1 0.50 xchgl %ebx, %ecx
-# CHECK-NEXT: 1 4 1.00 * * xchgl %eax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xchgl %eax, (%rbx)
-# CHECK-NEXT: 1 1 0.50 xchgq %rbx, %rax
-# CHECK-NEXT: 1 1 0.50 xchgq %rbx, %rcx
-# CHECK-NEXT: 1 4 1.00 * * xchgq %rax, (%rbx)
-# CHECK-NEXT: 1 4 1.00 * * lock xchgq %rax, (%rbx)
+# CHECK-NEXT: 3 2 1.50 xaddb %bl, %cl
+# CHECK-NEXT: 4 11 1.50 * * xaddb %bl, (%rcx)
+# CHECK-NEXT: 4 16 16.00 * * lock xaddb %bl, (%rcx)
+# CHECK-NEXT: 3 2 1.50 xaddw %bx, %cx
+# CHECK-NEXT: 4 11 1.50 * * xaddw %ax, (%rbx)
+# CHECK-NEXT: 4 16 16.00 * * lock xaddw %ax, (%rbx)
+# CHECK-NEXT: 3 2 1.50 xaddl %ebx, %ecx
+# CHECK-NEXT: 4 11 1.50 * * xaddl %eax, (%rbx)
+# CHECK-NEXT: 4 16 16.00 * * lock xaddl %eax, (%rbx)
+# CHECK-NEXT: 3 2 1.50 xaddq %rbx, %rcx
+# CHECK-NEXT: 4 11 1.50 * * xaddq %rax, (%rbx)
+# CHECK-NEXT: 4 16 16.00 * * lock xaddq %rax, (%rbx)
+# CHECK-NEXT: 3 2 1.50 xchgb %bl, %cl
+# CHECK-NEXT: 3 16 16.00 * * xchgb %bl, (%rbx)
+# CHECK-NEXT: 3 16 16.00 * * lock xchgb %bl, (%rbx)
+# CHECK-NEXT: 2 1 1.00 xchgw %bx, %ax
+# CHECK-NEXT: 2 1 1.00 xchgw %bx, %cx
+# CHECK-NEXT: 3 16 16.00 * * xchgw %ax, (%rbx)
+# CHECK-NEXT: 3 16 16.00 * * lock xchgw %ax, (%rbx)
+# CHECK-NEXT: 2 1 1.00 xchgl %ebx, %eax
+# CHECK-NEXT: 2 1 1.00 xchgl %ebx, %ecx
+# CHECK-NEXT: 3 16 16.00 * * xchgl %eax, (%rbx)
+# CHECK-NEXT: 3 16 16.00 * * lock xchgl %eax, (%rbx)
+# CHECK-NEXT: 2 1 1.00 xchgq %rbx, %rax
+# CHECK-NEXT: 2 1 1.00 xchgq %rbx, %rcx
+# CHECK-NEXT: 3 16 16.00 * * xchgq %rax, (%rbx)
+# CHECK-NEXT: 3 16 16.00 * * lock xchgq %rax, (%rbx)
# CHECK-NEXT: 1 3 1.00 * xlatb
# CHECK-NEXT: 1 1 0.50 xorb $7, %al
# CHECK-NEXT: 1 1 0.50 xorb $7, %dil
# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
-# CHECK-NEXT: 702.50 752.50 380.00 - - - - 812.00 64.00 713.00 - - - -
+# CHECK-NEXT: 722.50 772.50 380.00 - - - - 992.00 64.00 893.00 - - - -
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - testq %rsi, (%rax)
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - ud2
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - wrmsr
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xaddb %bl, %cl
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xaddb %bl, (%rcx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xaddb %bl, (%rcx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xaddw %bx, %cx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xaddw %ax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xaddw %ax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xaddl %ebx, %ecx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xaddl %eax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xaddl %eax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xaddq %rbx, %rcx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xaddq %rax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xaddq %rax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgb %bl, %cl
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xchgb %bl, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xchgb %bl, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgw %bx, %ax
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgw %bx, %cx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xchgw %ax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xchgw %ax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgl %ebx, %eax
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgl %ebx, %ecx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xchgl %eax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xchgl %eax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgq %rbx, %rax
-# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xchgq %rbx, %rcx
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - xchgq %rax, (%rbx)
-# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - 1.00 - - - - lock xchgq %rax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - - xaddb %bl, %cl
+# CHECK-NEXT: 1.50 1.50 - - - - - 1.00 - 1.00 - - - - xaddb %bl, (%rcx)
+# CHECK-NEXT: 1.50 1.50 - - - - - 16.00 - 16.00 - - - - lock xaddb %bl, (%rcx)
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - - xaddw %bx, %cx
+# CHECK-NEXT: 1.50 1.50 - - - - - 1.00 - 1.00 - - - - xaddw %ax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - 16.00 - 16.00 - - - - lock xaddw %ax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - - xaddl %ebx, %ecx
+# CHECK-NEXT: 1.50 1.50 - - - - - 1.00 - 1.00 - - - - xaddl %eax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - 16.00 - 16.00 - - - - lock xaddl %eax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - - xaddq %rbx, %rcx
+# CHECK-NEXT: 1.50 1.50 - - - - - 1.00 - 1.00 - - - - xaddq %rax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - 16.00 - 16.00 - - - - lock xaddq %rax, (%rbx)
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - - xchgb %bl, %cl
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - xchgb %bl, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - lock xchgb %bl, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgw %bx, %ax
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgw %bx, %cx
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - xchgw %ax, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - lock xchgw %ax, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgl %ebx, %eax
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgl %ebx, %ecx
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - xchgl %eax, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - lock xchgl %eax, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgq %rbx, %rax
+# CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - - xchgq %rbx, %rcx
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - xchgq %rax, (%rbx)
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - lock xchgq %rax, (%rbx)
# CHECK-NEXT: - - - - - - - 1.00 - - - - - - xlatb
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xorb $7, %al
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xorb $7, %dil
--- /dev/null
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=2 -timeline < %s | FileCheck %s
+
+# LLVM-MCA-BEGIN
+xadd %ecx, (%rsp)
+add %ecx, %ecx
+add %ecx, %ecx
+imul %ecx, %ecx
+imul %ecx, %ecx
+# LLVM-MCA-END
+
+# LLVM-MCA-BEGIN
+lock xadd %ecx, (%rsp)
+add %ecx, %ecx
+add %ecx, %ecx
+imul %ecx, %ecx
+imul %ecx, %ecx
+# LLVM-MCA-END
+
+# CHECK: [0] Code Region
+
+# CHECK: Iterations: 2
+# CHECK-NEXT: Instructions: 10
+# CHECK-NEXT: Total Cycles: 27
+# CHECK-NEXT: Total uOps: 20
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.74
+# CHECK-NEXT: IPC: 0.37
+# CHECK-NEXT: Block RThroughput: 5.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 4 11 1.50 * * xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 2.50 4.50 - - - - - 1.00 2.00 1.00 - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: 1.50 1.50 - - - - - 1.00 - 1.00 - - - - xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+
+# CHECK: Timeline view:
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
+
+# CHECK: [0,0] DeeeeeeeeeeeER . . .. xaddl %ecx, (%rsp)
+# CHECK-NEXT: [0,1] . D=eE-------R . . .. addl %ecx, %ecx
+# CHECK-NEXT: [0,2] . D==eE-------R. . .. addl %ecx, %ecx
+# CHECK-NEXT: [0,3] . D==eeeE----R. . .. imull %ecx, %ecx
+# CHECK-NEXT: [0,4] . D====eeeE--R . .. imull %ecx, %ecx
+# CHECK-NEXT: [1,0] . D======eeeeeeeeeeeER.. xaddl %ecx, (%rsp)
+# CHECK-NEXT: [1,1] . . D=======eE-------R.. addl %ecx, %ecx
+# CHECK-NEXT: [1,2] . . D========eE-------R. addl %ecx, %ecx
+# CHECK-NEXT: [1,3] . . D========eeeE----R. imull %ecx, %ecx
+# CHECK-NEXT: [1,4] . . D==========eeeE--R imull %ecx, %ecx
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 2 4.0 0.5 0.0 xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1. 2 5.0 0.0 7.0 addl %ecx, %ecx
+# CHECK-NEXT: 2. 2 6.0 0.0 7.0 addl %ecx, %ecx
+# CHECK-NEXT: 3. 2 6.0 0.0 4.0 imull %ecx, %ecx
+# CHECK-NEXT: 4. 2 8.0 0.0 2.0 imull %ecx, %ecx
+
+# CHECK: [1] Code Region
+
+# CHECK: Iterations: 2
+# CHECK-NEXT: Instructions: 10
+# CHECK-NEXT: Total Cycles: 38
+# CHECK-NEXT: Total uOps: 20
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.53
+# CHECK-NEXT: IPC: 0.26
+# CHECK-NEXT: Block RThroughput: 16.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 4 16 16.00 * * lock xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 2.50 4.50 - - - - - 16.00 2.00 16.00 - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: 1.50 1.50 - - - - - 16.00 - 16.00 - - - - lock xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+
+# CHECK: Timeline view:
+# CHECK-NEXT: 0123456789 01234567
+# CHECK-NEXT: Index 0123456789 0123456789
+
+# CHECK: [0,0] DeeeeeeeeeeeeeeeeER . . . . . lock xaddl %ecx, (%rsp)
+# CHECK-NEXT: [0,1] . D=========eE----R . . . . . addl %ecx, %ecx
+# CHECK-NEXT: [0,2] . D==========eE----R. . . . . addl %ecx, %ecx
+# CHECK-NEXT: [0,3] . D==========eeeE-R. . . . . imull %ecx, %ecx
+# CHECK-NEXT: [0,4] . D============eeeER . . . . imull %ecx, %ecx
+# CHECK-NEXT: [1,0] . D===========eeeeeeeeeeeeeeeeER. . lock xaddl %ecx, (%rsp)
+# CHECK-NEXT: [1,1] . . D====================eE----R. . addl %ecx, %ecx
+# CHECK-NEXT: [1,2] . . D=====================eE----R . addl %ecx, %ecx
+# CHECK-NEXT: [1,3] . . D=====================eeeE-R . imull %ecx, %ecx
+# CHECK-NEXT: [1,4] . . D=======================eeeER imull %ecx, %ecx
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 2 6.5 0.5 0.0 lock xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1. 2 15.5 0.0 4.0 addl %ecx, %ecx
+# CHECK-NEXT: 2. 2 16.5 0.0 4.0 addl %ecx, %ecx
+# CHECK-NEXT: 3. 2 16.5 0.0 1.0 imull %ecx, %ecx
+# CHECK-NEXT: 4. 2 18.5 0.0 0.0 imull %ecx, %ecx
--- /dev/null
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=2 -timeline < %s | FileCheck %s
+
+xchg %ecx, (%rsp)
+add %ecx, %ecx
+add %ecx, %ecx
+imul %ecx, %ecx
+imul %ecx, %ecx
+
+# CHECK: Iterations: 2
+# CHECK-NEXT: Instructions: 10
+# CHECK-NEXT: Total Cycles: 38
+# CHECK-NEXT: Total uOps: 18
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.47
+# CHECK-NEXT: IPC: 0.26
+# CHECK-NEXT: Block RThroughput: 16.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 3 16 16.00 * * xchgl %ecx, (%rsp)
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+# CHECK-NEXT: 2 3 1.00 imull %ecx, %ecx
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 2.00 4.00 - - - - - 16.00 2.00 16.00 - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: 1.00 1.00 - - - - - 16.00 - 16.00 - - - - xchgl %ecx, (%rsp)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %ecx, %ecx
+
+# CHECK: Timeline view:
+# CHECK-NEXT: 0123456789 01234567
+# CHECK-NEXT: Index 0123456789 0123456789
+
+# CHECK: [0,0] DeeeeeeeeeeeeeeeeER . . . . . xchgl %ecx, (%rsp)
+# CHECK-NEXT: [0,1] .D==========eE----R . . . . . addl %ecx, %ecx
+# CHECK-NEXT: [0,2] . D==========eE----R. . . . . addl %ecx, %ecx
+# CHECK-NEXT: [0,3] . D==========eeeE-R. . . . . imull %ecx, %ecx
+# CHECK-NEXT: [0,4] . D============eeeER . . . . imull %ecx, %ecx
+# CHECK-NEXT: [1,0] . D===========eeeeeeeeeeeeeeeeER. . xchgl %ecx, (%rsp)
+# CHECK-NEXT: [1,1] . .D=====================eE----R. . addl %ecx, %ecx
+# CHECK-NEXT: [1,2] . . D=====================eE----R . addl %ecx, %ecx
+# CHECK-NEXT: [1,3] . . D=====================eeeE-R . imull %ecx, %ecx
+# CHECK-NEXT: [1,4] . . D=======================eeeER imull %ecx, %ecx
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 2 6.5 0.5 0.0 xchgl %ecx, (%rsp)
+# CHECK-NEXT: 1. 2 16.5 0.0 4.0 addl %ecx, %ecx
+# CHECK-NEXT: 2. 2 16.5 0.0 4.0 addl %ecx, %ecx
+# CHECK-NEXT: 3. 2 16.5 0.0 1.0 imull %ecx, %ecx
+# CHECK-NEXT: 4. 2 18.5 0.0 0.0 imull %ecx, %ecx