[Atomics] Rename and change prototype for atomic memcpy intrinsic

author Daniel Neilson <dneilson@azul.com>

Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)

committer Daniel Neilson <dneilson@azul.com>

Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)
author Daniel Neilson <dneilson@azul.com>
Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)
committer Daniel Neilson <dneilson@azul.com>
Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)
diff --git a/docs/LangRef.rst b/docs/LangRef.rst

index 093d29a5a8b1364cf276fbb50fb0bca947f9cccd..68aa500150ae326e0c1b0b81e6200ae2ad24ff44 100644 (file)
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -14068,62 +14068,66 @@ Element Wise Atomic Memory Intrinsics
  These intrinsics are similar to the standard library memory intrinsics except
  that they perform memory transfer as a sequence of atomic memory accesses.
  
-.. _int_memcpy_element_atomic:
+.. _int_memcpy_element_unordered_atomic:
  
-'``llvm.memcpy.element.atomic``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.memcpy.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
  Syntax:
  """""""
  
-This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on
+This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
  any integer bit width and for different address spaces. Not all targets
  support all bit widths however.
  
  ::
  
-      declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,
-                                              i64 <num_elements>, i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i32 <len>,
+                                                                       i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i64 <len>,
+                                                                       i32 <element_size>)
  
  Overview:
  """""""""
  
-The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of 
-memory from the source location to the destination location as a sequence of
-unordered atomic memory accesses where each access is a multiple of
-``element_size`` bytes wide and aligned at an element size boundary. For example
-each element is accessed atomically in source and destination buffers.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
+'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
+as arrays with elements that are exactly ``element_size`` bytes, and the copy between
+buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
+that are a positive integer multiple of the ``element_size`` in size.
  
  Arguments:
  """"""""""
  
-The first argument is a pointer to the destination, the second is a
-pointer to the source. The third argument is an integer argument
-specifying the number of elements to copy, the fourth argument is size of
-the single element in bytes.
+The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
+intrinsic, with the added constraint that ``len`` is required to be a positive integer
+multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
+``element_size``, then the behaviour of the intrinsic is undefined.
  
-``element_size`` should be a power of two, greater than zero and less than
-a target-specific atomic access size limit.
+``element_size`` must be a compile-time constant positive power of two no greater than
+target-specific atomic access size limit.
  
-For each of the input pointers ``align`` parameter attribute must be specified.
-It must be a power of two and greater than or equal to the ``element_size``.
-Caller guarantees that both the source and destination pointers are aligned to
-that boundary.
+For each of the input pointers ``align`` parameter attribute must be specified. It
+must be a power of two no less than the ``element_size``. Caller guarantees that
+both the source and destination pointers are aligned to that boundary.
  
  Semantics:
  """"""""""
  
-The '``llvm.memcpy.element.atomic.*``' intrinsic copies
-'``num_elements`` * ``element_size``' bytes of memory from the source location to
-the destination location. These locations are not allowed to overlap. Memory copy
-is performed as a sequence of unordered atomic memory accesses where each access
-is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an
-element size boundary.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
+memory from the source location to the destination location. These locations are not
+allowed to overlap. The memory copy is performed as a sequence of load/store operations
+where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
+aligned at an ``element_size`` boundary. 
  
  The order of the copy is unspecified. The same value may be read from the source
  buffer many times, but only one write is issued to the destination buffer per
-element. It is well defined to have concurrent reads and writes to both source
-and destination provided those reads and writes are at least unordered atomic.
+element. It is well defined to have concurrent reads and writes to both source and
+destination provided those reads and writes are unordered atomic when specified.
  
  This intrinsic does not provide any additional ordering guarantees over those
  provided by a set of unordered loads from the source location and stores to the
@@ -14132,8 +14136,8 @@ destination.
  Lowering:
  """""""""
  
-In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered
-to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced
-with an actual element size.
+In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is
+lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``. Where '*'
+is replaced with an actual element size.
  
-Optimizer is allowed to inline memory copy when it's profitable to do so.
+The optimizer is allowed to inline the memory copy when it's profitable to do so.
diff --git a/include/llvm/CodeGen/RuntimeLibcalls.h b/include/llvm/CodeGen/RuntimeLibcalls.h

index ddfabb0c44d633072d8c8c8cecdfcdd097ffdd2e..8c3aacaa8efc19017d18088d3192fbc5c7453bbf 100644 (file)
--- a/include/llvm/CodeGen/RuntimeLibcalls.h
+++ b/include/llvm/CodeGen/RuntimeLibcalls.h
@@ -333,12 +333,12 @@ namespace RTLIB {
      MEMSET,
      MEMMOVE,
  
-    // ELEMENT-WISE ATOMIC MEMORY
-    MEMCPY_ELEMENT_ATOMIC_1,
-    MEMCPY_ELEMENT_ATOMIC_2,
-    MEMCPY_ELEMENT_ATOMIC_4,
-    MEMCPY_ELEMENT_ATOMIC_8,
-    MEMCPY_ELEMENT_ATOMIC_16,
+    // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,
  
      // EXCEPTION HANDLING
      UNWIND_RESUME,
@@ -511,9 +511,10 @@ namespace RTLIB {
    /// UNKNOWN_LIBCALL if there is none.
    Libcall getSYNC(unsigned Opc, MVT VT);
  
-  /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the
-  /// given element size or UNKNOW_LIBCALL if there is none.
-  Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);
+  /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return
+  /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
+  /// UNKNOW_LIBCALL if there is none.
+  Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
  }
  }
  
diff --git a/include/llvm/IR/IRBuilder.h b/include/llvm/IR/IRBuilder.h

index 5ddaf2b1733be85ce6c271c4210862e5007cfdcf..ec33f82f70224ee9ff11a5094c6ef7dc15cc71d5 100644 (file)
--- a/include/llvm/IR/IRBuilder.h
+++ b/include/llvm/IR/IRBuilder.h
@@ -435,27 +435,25 @@ public:
                           MDNode *ScopeTag = nullptr,
                           MDNode *NoAliasTag = nullptr);
  
-  /// \brief Create and insert an atomic memcpy between the specified
-  /// pointers.
+  /// \brief Create and insert an element unordered-atomic memcpy between the
+  /// specified pointers.
    ///
    /// If the pointers aren't i8*, they will be converted.  If a TBAA tag is
    /// specified, it will be added to the instruction. Likewise with alias.scope
    /// and noalias tags.
-  CallInst *CreateElementAtomicMemCpy(
-      Value *Dst, Value *Src, uint64_t NumElements, uint32_t ElementSize,
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, uint64_t Size, uint32_t ElementSize,
        MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
        MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) {
-    return CreateElementAtomicMemCpy(Dst, Src, getInt64(NumElements),
-                                     ElementSize, TBAATag, TBAAStructTag,
-                                     ScopeTag, NoAliasTag);
+    return CreateElementUnorderedAtomicMemCpy(
+        Dst, Src, getInt64(Size), ElementSize, TBAATag, TBAAStructTag, ScopeTag,
+        NoAliasTag);
    }
  
-  CallInst *CreateElementAtomicMemCpy(Value *Dst, Value *Src,
-                                      Value *NumElements, uint32_t ElementSize,
-                                      MDNode *TBAATag = nullptr,
-                                      MDNode *TBAAStructTag = nullptr,
-                                      MDNode *ScopeTag = nullptr,
-                                      MDNode *NoAliasTag = nullptr);
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, Value *Size, uint32_t ElementSize,
+      MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
+      MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr);
  
    /// \brief Create and insert a memmove between the specified
    /// pointers.
diff --git a/include/llvm/IR/IntrinsicInst.h b/include/llvm/IR/IntrinsicInst.h

index 2ae98d9e35b0ff910b32647dade6f6263383e9d2..e0dd3ca7d01e39f7ea4fed33b49a259ac3bc4f3b 100644 (file)
--- a/include/llvm/IR/IntrinsicInst.h
+++ b/include/llvm/IR/IntrinsicInst.h
@@ -205,25 +205,91 @@ namespace llvm {
    };
  
    /// This class represents atomic memcpy intrinsic
-  /// TODO: Integrate this class into MemIntrinsic hierarchy.
-  class ElementAtomicMemCpyInst : public IntrinsicInst {
+  /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
+  /// C&P of all methods from that hierarchy
+  class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
+  private:
+    enum { ARG_DEST = 0, ARG_SOURCE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 };
+
    public:
-    Value *getRawDest() const { return getArgOperand(0); }
-    Value *getRawSource() const { return getArgOperand(1); }
+    Value *getRawDest() const {
+      return const_cast<Value *>(getArgOperand(ARG_DEST));
+    }
+    const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
+    Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }
+
+    /// Return the arguments to the instruction.
+    Value *getRawSource() const {
+      return const_cast<Value *>(getArgOperand(ARG_SOURCE));
+    }
+    const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); }
+    Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); }
+
+    Value *getLength() const {
+      return const_cast<Value *>(getArgOperand(ARG_LENGTH));
+    }
+    const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
+    Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }
+
+    bool isVolatile() const { return false; }
+
+    Value *getRawElementSizeInBytes() const {
+      return const_cast<Value *>(getArgOperand(ARG_ELEMENTSIZE));
+    }
+
+    ConstantInt *getElementSizeInBytesCst() const {
+      return cast<ConstantInt>(getRawElementSizeInBytes());
+    }
+
+    uint32_t getElementSizeInBytes() const {
+      return getElementSizeInBytesCst()->getZExtValue();
+    }
+
+    /// This is just like getRawDest, but it strips off any cast
+    /// instructions that feed it, giving the original input.  The returned
+    /// value is guaranteed to be a pointer.
+    Value *getDest() const { return getRawDest()->stripPointerCasts(); }
+
+    /// This is just like getRawSource, but it strips off any cast
+    /// instructions that feed it, giving the original input.  The returned
+    /// value is guaranteed to be a pointer.
+    Value *getSource() const { return getRawSource()->stripPointerCasts(); }
+
+    unsigned getDestAddressSpace() const {
+      return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
+    }
  
-    Value *getNumElements() const { return getArgOperand(2); }
-    void setNumElements(Value *V) { setArgOperand(2, V); }
+    unsigned getSourceAddressSpace() const {
+      return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
+    }
  
-    uint64_t getSrcAlignment() const { return getParamAlignment(0); }
-    uint64_t getDstAlignment() const { return getParamAlignment(1); }
+    /// Set the specified arguments of the instruction.
+    void setDest(Value *Ptr) {
+      assert(getRawDest()->getType() == Ptr->getType() &&
+             "setDest called with pointer of wrong type!");
+      setArgOperand(ARG_DEST, Ptr);
+    }
+
+    void setSource(Value *Ptr) {
+      assert(getRawSource()->getType() == Ptr->getType() &&
+             "setSource called with pointer of wrong type!");
+      setArgOperand(ARG_SOURCE, Ptr);
+    }
+
+    void setLength(Value *L) {
+      assert(getLength()->getType() == L->getType() &&
+             "setLength called with value of wrong type!");
+      setArgOperand(ARG_LENGTH, L);
+    }
  
-    uint64_t getElementSizeInBytes() const {
-      Value *Arg = getArgOperand(3);
-      return cast<ConstantInt>(Arg)->getZExtValue();
+    void setElementSizeInBytes(Constant *V) {
+      assert(V->getType() == Type::getInt8Ty(getContext()) &&
+             "setElementSizeInBytes called with value of wrong type!");
+      setArgOperand(ARG_ELEMENTSIZE, V);
      }
  
      static inline bool classof(const IntrinsicInst *I) {
-      return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;
+      return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
      }
      static inline bool classof(const Value *V) {
        return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td

index 291d16fb0d9b79119d880b44d2eb9ba7a475ca86..45936a6e9b66d5cf43d6eec07c4c8b991008b781 100644 (file)
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -862,11 +862,16 @@ def int_xray_customevent : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
  //===------ Memory intrinsics with element-wise atomicity guarantees ------===//
  //
  
-def int_memcpy_element_atomic  : Intrinsic<[],
-                                           [llvm_anyptr_ty, llvm_anyptr_ty,
-                                            llvm_i64_ty, llvm_i32_ty],
-                                 [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
-                                  WriteOnly<0>, ReadOnly<1>]>;
+// @llvm.memcpy.element.unordered.atomic.*(dest, src, length, elementsize)
+def int_memcpy_element_unordered_atomic
+    : Intrinsic<[],
+                [
+                  llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty
+                ],
+                [
+                  IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
+                  ReadOnly<1>
+                ]>;
  
  //===------------------------ Reduction Intrinsics ------------------------===//
  //
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

index 3d42990667e8b70df8a49974e52a588a78fd821d..f9f431db55be3f3a8d3c6976597290bab5eda2ec 100644 (file)
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -4943,11 +4943,12 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
      updateDAGForMaybeTailCall(MM);
      return nullptr;
    }
-  case Intrinsic::memcpy_element_atomic: {
-    SDValue Dst = getValue(I.getArgOperand(0));
-    SDValue Src = getValue(I.getArgOperand(1));
-    SDValue NumElements = getValue(I.getArgOperand(2));
-    SDValue ElementSize = getValue(I.getArgOperand(3));
+  case Intrinsic::memcpy_element_unordered_atomic: {
+    const ElementUnorderedAtomicMemCpyInst &MI =
+        cast<ElementUnorderedAtomicMemCpyInst>(I);
+    SDValue Dst = getValue(MI.getRawDest());
+    SDValue Src = getValue(MI.getRawSource());
+    SDValue Length = getValue(MI.getLength());
  
      // Emit a library call.
      TargetLowering::ArgListTy Args;
@@ -4959,18 +4960,13 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
      Entry.Node = Src;
      Args.push_back(Entry);
  
-    Entry.Ty = I.getArgOperand(2)->getType();
-    Entry.Node = NumElements;
+    Entry.Ty = MI.getLength()->getType();
+    Entry.Node = Length;
      Args.push_back(Entry);
  
-    Entry.Ty = Type::getInt32Ty(*DAG.getContext());
-    Entry.Node = ElementSize;
-    Args.push_back(Entry);
-
-    uint64_t ElementSizeConstant =
-        cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();
+    uint64_t ElementSizeConstant = MI.getElementSizeInBytes();
      RTLIB::Libcall LibraryCall =
-        RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);
+        RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
      if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
        report_fatal_error("Unsupported element size");
  
diff --git a/lib/CodeGen/TargetLoweringBase.cpp b/lib/CodeGen/TargetLoweringBase.cpp

index 581cfaf60755d349bfccbd54e2c2277358d44667..e9d38c10c86014b63cf3a8a76f19ec785ca80910 100644 (file)
--- a/lib/CodeGen/TargetLoweringBase.cpp
+++ b/lib/CodeGen/TargetLoweringBase.cpp
@@ -374,11 +374,16 @@ static void InitLibcallNames(const char **Names, const Triple &TT) {
    Names[RTLIB::MEMCPY] = "memcpy";
    Names[RTLIB::MEMMOVE] = "memmove";
    Names[RTLIB::MEMSET] = "memset";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] =
+      "__llvm_memcpy_element_unordered_atomic_1";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] =
+      "__llvm_memcpy_element_unordered_atomic_2";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] =
+      "__llvm_memcpy_element_unordered_atomic_4";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] =
+      "__llvm_memcpy_element_unordered_atomic_8";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] =
+      "__llvm_memcpy_element_unordered_atomic_16";
    Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
    Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
    Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
@@ -781,22 +786,21 @@ RTLIB::Libcall RTLIB::getSYNC(unsigned Opc, MVT VT) {
    return UNKNOWN_LIBCALL;
  }
  
-RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {
+RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
    switch (ElementSize) {
    case 1:
-    return MEMCPY_ELEMENT_ATOMIC_1;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
    case 2:
-    return MEMCPY_ELEMENT_ATOMIC_2;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
    case 4:
-    return MEMCPY_ELEMENT_ATOMIC_4;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
    case 8:
-    return MEMCPY_ELEMENT_ATOMIC_8;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
    case 16:
-    return MEMCPY_ELEMENT_ATOMIC_16;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
    default:
      return UNKNOWN_LIBCALL;
    }
-
  }
  
  /// InitCmpLibcallCCs - Set default comparison libcall CC.
diff --git a/lib/IR/IRBuilder.cpp b/lib/IR/IRBuilder.cpp

index 81b02946e1d507feb9893267cb38e512474455cf..b7fa07c6ffac74c06d8ffbd60545decf26ccb42b 100644 (file)
--- a/lib/IR/IRBuilder.cpp
+++ b/lib/IR/IRBuilder.cpp
@@ -134,18 +134,17 @@ CreateMemCpy(Value *Dst, Value *Src, Value *Size, unsigned Align,
    return CI;  
  }
  
-CallInst *IRBuilderBase::CreateElementAtomicMemCpy(
-    Value *Dst, Value *Src, Value *NumElements, uint32_t ElementSize,
-    MDNode *TBAATag, MDNode *TBAAStructTag, MDNode *ScopeTag,
-    MDNode *NoAliasTag) {
+CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
+    Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, MDNode *TBAATag,
+    MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) {
    Dst = getCastedInt8PtrValue(Dst);
    Src = getCastedInt8PtrValue(Src);
  
-  Value *Ops[] = {Dst, Src, NumElements, getInt32(ElementSize)};
-  Type *Tys[] = {Dst->getType(), Src->getType()};
+  Value *Ops[] = {Dst, Src, Size, getInt32(ElementSize)};
+  Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
    Module *M = BB->getParent()->getParent();
-  Value *TheFn =
-      Intrinsic::getDeclaration(M, Intrinsic::memcpy_element_atomic, Tys);
+  Value *TheFn = Intrinsic::getDeclaration(
+      M, Intrinsic::memcpy_element_unordered_atomic, Tys);
  
    CallInst *CI = createCallHelper(TheFn, Ops, this);
  
diff --git a/lib/IR/Verifier.cpp b/lib/IR/Verifier.cpp

index cf54cc3d6aee4dd325208e43a23ee60dde9b01a3..819f63520c7441f46068e0438151ed72fde6f99a 100644 (file)
--- a/lib/IR/Verifier.cpp
+++ b/lib/IR/Verifier.cpp
@@ -4012,10 +4012,16 @@ void Verifier::visitIntrinsicCallSite(Intrinsic::ID ID, CallSite CS) {
             CS);
      break;
    }
-  case Intrinsic::memcpy_element_atomic: {
-    ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
-    Assert(ElementSizeCI, "element size of the element-wise atomic memory "
-                          "intrinsic must be a constant int",
+  case Intrinsic::memcpy_element_unordered_atomic: {
+    const ElementUnorderedAtomicMemCpyInst *MI =
+        cast<ElementUnorderedAtomicMemCpyInst>(CS.getInstruction());
+    ;
+
+    ConstantInt *ElementSizeCI =
+        dyn_cast<ConstantInt>(MI->getRawElementSizeInBytes());
+    Assert(ElementSizeCI,
+           "element size of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
             CS);
      const APInt &ElementSizeVal = ElementSizeCI->getValue();
      Assert(ElementSizeVal.isPowerOf2(),
@@ -4023,19 +4029,24 @@ void Verifier::visitIntrinsicCallSite(Intrinsic::ID ID, CallSite CS) {
             "must be a power of 2",
             CS);
  
+    if (auto *LengthCI = dyn_cast<ConstantInt>(MI->getLength())) {
+      uint64_t Length = LengthCI->getZExtValue();
+      uint64_t ElementSize = MI->getElementSizeInBytes();
+      Assert((Length % ElementSize) == 0,
+             "constant length must be a multiple of the element size in the "
+             "element-wise atomic memory intrinsic",
+             CS);
+    }
+
      auto IsValidAlignment = [&](uint64_t Alignment) {
        return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
      };
-
      uint64_t DstAlignment = CS.getParamAlignment(0),
               SrcAlignment = CS.getParamAlignment(1);
-
      Assert(IsValidAlignment(DstAlignment),
-           "incorrect alignment of the destination argument",
-           CS);
+           "incorrect alignment of the destination argument", CS);
      Assert(IsValidAlignment(SrcAlignment),
-           "incorrect alignment of the source argument",
-           CS);
+           "incorrect alignment of the source argument", CS);
      break;
    }
    case Intrinsic::gcroot:
diff --git a/lib/Transforms/InstCombine/InstCombineCalls.cpp b/lib/Transforms/InstCombine/InstCombineCalls.cpp

index d29ed49eca0b0d31438e92f8ea318c3617d300cd..c0830a5d211248114049501cd1b809aa738f7395 100644 (file)
--- a/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -94,75 +94,80 @@ static Constant *getNegativeIsTrueBoolVec(ConstantDataVector *V) {
    return ConstantVector::get(BoolVec);
  }
  
-Instruction *
-InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {
+Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy(
+    ElementUnorderedAtomicMemCpyInst *AMI) {
    // Try to unfold this intrinsic into sequence of explicit atomic loads and
    // stores.
    // First check that number of elements is compile time constant.
-  auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());
-  if (!NumElementsCI)
+  auto *LengthCI = dyn_cast<ConstantInt>(AMI->getLength());
+  if (!LengthCI)
      return nullptr;
  
    // Check that there are not too many elements.
-  uint64_t NumElements = NumElementsCI->getZExtValue();
+  uint64_t LengthInBytes = LengthCI->getZExtValue();
+  uint32_t ElementSizeInBytes = AMI->getElementSizeInBytes();
+  uint64_t NumElements = LengthInBytes / ElementSizeInBytes;
    if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)
      return nullptr;
  
-  // Don't unfold into illegal integers
-  uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;
-  if (!getDataLayout().isLegalInteger(ElementSizeInBytes))
-    return nullptr;
+  // Only expand if there are elements to copy.
+  if (NumElements > 0) {
+    // Don't unfold into illegal integers
+    uint64_t ElementSizeInBits = ElementSizeInBytes * 8;
+    if (!getDataLayout().isLegalInteger(ElementSizeInBits))
+      return nullptr;
  
-  // Cast source and destination to the correct type. Intrinsic input arguments
-  // are usually represented as i8*.
-  // Often operands will be explicitly casted to i8* and we can just strip
-  // those casts instead of inserting new ones. However it's easier to rely on
-  // other InstCombine rules which will cover trivial cases anyway.
-  Value *Src = AMI->getRawSource();
-  Value *Dst = AMI->getRawDest();
-  Type *ElementPointerType = Type::getIntNPtrTy(
-      AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace());
-
-  Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
-                                                "memcpy_unfold.src_casted");
-  Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
-                                                "memcpy_unfold.dst_casted");
-
-  for (uint64_t i = 0; i < NumElements; ++i) {
-    // Get current element addresses
-    ConstantInt *ElementIdxCI =
-        ConstantInt::get(AMI->getContext(), APInt(64, i));
-    Value *SrcElementAddr =
-        Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
-    Value *DstElementAddr =
-        Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
-
-    // Load from the source. Transfer alignment information and mark load as
-    // unordered atomic.
-    LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
-    Load->setOrdering(AtomicOrdering::Unordered);
-    // We know alignment of the first element. It is also guaranteed by the
-    // verifier that element size is less or equal than first element alignment
-    // and both of this values are powers of two.
-    // This means that all subsequent accesses are at least element size
-    // aligned.
-    // TODO: We can infer better alignment but there is no evidence that this
-    // will matter.
-    Load->setAlignment(i == 0 ? AMI->getSrcAlignment()
-                              : AMI->getElementSizeInBytes());
-    Load->setDebugLoc(AMI->getDebugLoc());
-
-    // Store loaded value via unordered atomic store.
-    StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
-    Store->setOrdering(AtomicOrdering::Unordered);
-    Store->setAlignment(i == 0 ? AMI->getDstAlignment()
-                               : AMI->getElementSizeInBytes());
-    Store->setDebugLoc(AMI->getDebugLoc());
+    // Cast source and destination to the correct type. Intrinsic input
+    // arguments are usually represented as i8*. Often operands will be
+    // explicitly casted to i8* and we can just strip those casts instead of
+    // inserting new ones. However it's easier to rely on other InstCombine
+    // rules which will cover trivial cases anyway.
+    Value *Src = AMI->getRawSource();
+    Value *Dst = AMI->getRawDest();
+    Type *ElementPointerType =
+        Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits,
+                           Src->getType()->getPointerAddressSpace());
+
+    Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
+                                                  "memcpy_unfold.src_casted");
+    Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
+                                                  "memcpy_unfold.dst_casted");
+
+    for (uint64_t i = 0; i < NumElements; ++i) {
+      // Get current element addresses
+      ConstantInt *ElementIdxCI =
+          ConstantInt::get(AMI->getContext(), APInt(64, i));
+      Value *SrcElementAddr =
+          Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
+      Value *DstElementAddr =
+          Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
+
+      // Load from the source. Transfer alignment information and mark load as
+      // unordered atomic.
+      LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
+      Load->setOrdering(AtomicOrdering::Unordered);
+      // We know alignment of the first element. It is also guaranteed by the
+      // verifier that element size is less or equal than first element
+      // alignment and both of this values are powers of two. This means that
+      // all subsequent accesses are at least element size aligned.
+      // TODO: We can infer better alignment but there is no evidence that this
+      // will matter.
+      Load->setAlignment(i == 0 ? AMI->getParamAlignment(1)
+                                : ElementSizeInBytes);
+      Load->setDebugLoc(AMI->getDebugLoc());
+
+      // Store loaded value via unordered atomic store.
+      StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
+      Store->setOrdering(AtomicOrdering::Unordered);
+      Store->setAlignment(i == 0 ? AMI->getParamAlignment(0)
+                                 : ElementSizeInBytes);
+      Store->setDebugLoc(AMI->getDebugLoc());
+    }
    }
  
    // Set the number of elements of the copy to 0, it will be deleted on the
    // next iteration.
-  AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));
+  AMI->setLength(Constant::getNullValue(LengthCI->getType()));
    return AMI;
  }
  
@@ -1888,12 +1893,12 @@ Instruction *InstCombiner::visitCallInst(CallInst &CI) {
      if (Changed) return II;
    }
  
-  if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {
-    if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))
+  if (auto *AMI = dyn_cast<ElementUnorderedAtomicMemCpyInst>(II)) {
+    if (Constant *C = dyn_cast<Constant>(AMI->getLength()))
        if (C->isNullValue())
          return eraseInstFromFunction(*AMI);
  
-    if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))
+    if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI))
        return I;
    }
  
diff --git a/lib/Transforms/InstCombine/InstCombineInternal.h b/lib/Transforms/InstCombine/InstCombineInternal.h

index 8c5aeb2a0cbe8a6c2632612f4cebb2745dd96b92..1a7db146df426e2e5a9327c3b05ad23e6fa6808e 100644 (file)
--- a/lib/Transforms/InstCombine/InstCombineInternal.h
+++ b/lib/Transforms/InstCombine/InstCombineInternal.h
@@ -726,7 +726,8 @@ private:
    Instruction *MatchBSwap(BinaryOperator &I);
    bool SimplifyStoreAtEndOfBlock(StoreInst &SI);
  
-  Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);
+  Instruction *
+  SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI);
    Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
    Instruction *SimplifyMemSet(MemSetInst *MI);
  
diff --git a/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

index b706152f30c819a2c93b77108eaab804b40faff9..8b435050ac769b3051cce5051d53d4b4f6900bc1 100644 (file)
--- a/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -983,21 +983,21 @@ bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,
    const SCEV *NumBytesS =
        SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
  
+  if (StoreSize != 1)
+    NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
+                               SCEV::FlagNUW);
+
+  Value *NumBytes =
+      Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+
    unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
    CallInst *NewCall = nullptr;
    // Check whether to generate an unordered atomic memcpy:
    //  If the load or store are atomic, then they must neccessarily be unordered
    //  by previous checks.
-  if (!SI->isAtomic() && !LI->isAtomic()) {
-    if (StoreSize != 1)
-      NumBytesS = SE->getMulExpr(
-          NumBytesS, SE->getConstant(IntPtrTy, StoreSize), SCEV::FlagNUW);
-
-    Value *NumBytes =
-        Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
-
+  if (!SI->isAtomic() && !LI->isAtomic())
      NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
-  } else {
+  else {
      // We cannot allow unaligned ops for unordered load/store, so reject
      // anything where the alignment isn't at least the element size.
      if (Align < StoreSize)
@@ -1010,11 +1010,9 @@ bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,
      if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize())
        return false;
  
-    Value *NumElements =
-        Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+    NewCall = Builder.CreateElementUnorderedAtomicMemCpy(
+        StoreBasePtr, LoadBasePtr, NumBytes, StoreSize);
  
-    NewCall = Builder.CreateElementAtomicMemCpy(StoreBasePtr, LoadBasePtr,
-                                                NumElements, StoreSize);
      // Propagate alignment info onto the pointer args. Note that unordered
      // atomic loads/stores are *required* by the spec to have an alignment
      // but non-atomic loads/stores may not.
diff --git a/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll b/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll

index 4dc5b1ba03398dd26321441798119cf62c26f2a2..9dd184c8ab316ba796a7d3e1fcacb4c31c8c045f 100644 (file)
--- a/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
+++ b/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
@@ -2,47 +2,47 @@
  
  define i8* @test_memcpy1(i8* %P, i8* %Q) {
    ; CHECK: test_memcpy
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 1)
    ret i8* %P
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $1, %edx
-  ; CHECK-DAG: movl $1, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_1
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_1
  }
  
  define i8* @test_memcpy2(i8* %P, i8* %Q) {
    ; CHECK: test_memcpy2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 2)
    ret i8* %P
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $2, %edx
-  ; CHECK-DAG: movl $2, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_2
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_2
  }
  
  define i8* @test_memcpy4(i8* %P, i8* %Q) {
    ; CHECK: test_memcpy4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4)
    ret i8* %P
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $4, %edx
-  ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
  }
  
  define i8* @test_memcpy8(i8* %P, i8* %Q) {
    ; CHECK: test_memcpy8
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8)
    ret i8* %P
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $8, %edx
-  ; CHECK-DAG: movl $8, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_8
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_8
  }
  
  define i8* @test_memcpy16(i8* %P, i8* %Q) {
    ; CHECK: test_memcpy16
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16)
    ret i8* %P
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $16, %edx
-  ; CHECK-DAG: movl $16, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_16
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_16
  }
  
  define void @test_memcpy_args(i8** %Storage) {
@@ -51,18 +51,15 @@ define void @test_memcpy_args(i8** %Storage) {
    %Src.addr = getelementptr i8*, i8** %Storage, i64 1
    %Src = load i8*, i8** %Src.addr
  
-  ; First argument
+  ; 1st arg (%rdi)
    ; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
    ; CHECK-DAG: movq [[REG1]], %rdi
-  ; Second argument
+  ; 2nd arg (%rsi)
    ; CHECK-DAG: movq 8(%rdi), %rsi
-  ; Third argument
+  ; 3rd arg (%edx) -- length
    ; CHECK-DAG: movl $4, %edx
-  ; Fourth argument
-  ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)
-  ret void
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4)  ret void
  }
  
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
diff --git a/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll b/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll

index 107440f10a5a22623fb206ca17d6c91815c5a839..230ac1796671fda15f527b7bb399959da809e7f0 100644 (file)
--- a/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
+++ b/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
@@ -1,10 +1,11 @@
  ; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s
+; Temporarily an expected failure until inst combine is updated in the next patch
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  
-; Test basic unfolding
-define void @test1(i8* %Src, i8* %Dst) {
-; CHECK-LABEL: test1
-; CHECK-NOT: llvm.memcpy.element.atomic
+; Test basic unfolding -- unordered load & store
+define void @test1a(i8* %Src, i8* %Dst) {
+; CHECK-LABEL: test1a
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
  
  ; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
  ; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*
@@ -21,7 +22,7 @@ define void @test1(i8* %Src, i8* %Dst) {
  ; CHECK-DAG: [[VAL4:%[^\s]+]] =  load atomic i32, i32* %{{[^\s]+}} unordered, align 4
  ; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4
  entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i32 4)
    ret void
  }
  
@@ -31,9 +32,9 @@ define void @test2(i8* %Src, i8* %Dst) {
  
  ; CHECK-NOT: load
  ; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
  entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i32 4)
    ret void
  }
  
@@ -43,16 +44,16 @@ define void @test3(i8* %Src, i8* %Dst) {
  
  ; CHECK-NOT: load
  ; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
  entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i32 64)
    ret void
  }
  
  ; Test that we will eliminate redundant bitcasts
  define void @test4(i64* %Src, i64* %Dst) {
  ; CHECK-LABEL: test4
-; CHECK-NOT: llvm.memcpy.element.atomic
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
  
  ; CHECK-NOT: bitcast
  
@@ -76,17 +77,18 @@ define void @test4(i64* %Src, i64* %Dst) {
  entry:
    %Src.casted = bitcast i64* %Src to i8*
    %Dst.casted = bitcast i64* %Dst to i8*
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i32 8)
    ret void
  }
  
+; Test that 0-length unordered atomic memcpy gets removed.
  define void @test5(i8* %Src, i8* %Dst) {
  ; CHECK-LABEL: test5
  
-; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
  entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
    ret void
  }
  
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32)
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
diff --git a/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll b/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll

index ec93847178b587d4e3deb2b8cded693b6f9ae8c7..d52378b864ff9f1b656315881c0fabb79bd630db 100644 (file)
--- a/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll
+++ b/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll
@@ -5,7 +5,7 @@ target triple = "x86_64-unknown-linux-gnu"
  ;; memcpy.atomic formation (atomic load & store)
  define void @test1(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test1(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -30,7 +30,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation (atomic store, normal load)
  define void @test2(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test2(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -55,7 +55,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic store, normal load w/ no align)
  define void @test2b(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test2b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -80,7 +80,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align)
  define void @test2c(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test2c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -105,7 +105,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load)
  define void @test2d(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test2d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -131,7 +131,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation (normal store, atomic load)
  define void @test3(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test3(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -156,7 +156,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (normal store w/ no align, atomic load)
  define void @test3b(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test3b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -181,7 +181,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align)
  define void @test3c(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test3c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -206,7 +206,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load)
  define void @test3d(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test3d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -232,7 +232,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic load, ordered-atomic store)
  define void @test4(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test4(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -257,7 +257,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store)
  define void @test5(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test5(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
@@ -282,7 +282,8 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation (atomic load & store) -- element size 2
  define void @test6(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test6(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 %Size, i32 2)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 1
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 [[Sz]], i32 2)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -307,7 +308,8 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation (atomic load & store) -- element size 4
  define void @test7(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test7(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 %Size, i32 4)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 2
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 [[Sz]], i32 4)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -332,7 +334,8 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation (atomic load & store) -- element size 8
  define void @test8(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test8(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 %Size, i32 8)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 3
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 [[Sz]], i32 8)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -357,7 +360,8 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic load & store) -- element size 16
  define void @test9(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test9(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 %Size, i32 16)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 4
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 [[Sz]], i32 16)
  ; CHECK-NOT: store
  ; CHECK: ret void
  bb.nph:
@@ -382,7 +386,7 @@ for.end:                                          ; preds = %for.body, %entry
  ;; memcpy.atomic formation rejection (atomic load & store) -- element size 32
  define void @test10(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test10(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
diff --git a/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll b/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll

index b2528f1c24577e768681ca86e1a9dde7bebb3c8e..341a7a0baebf05dae13db26a22b3dd25ba62351b 100644 (file)
--- a/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll
+++ b/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll
@@ -5,7 +5,7 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3
  ;;  Will not create call due to a max element size of 0
  define void @test1(i64 %Size) nounwind ssp {
  ; CHECK-LABEL: @test1(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
  ; CHECK: store
  ; CHECK: ret void
  bb.nph:
diff --git a/test/Verifier/element-wise-atomic-memory-intrinsics.ll b/test/Verifier/element-wise-atomic-memory-intrinsics.ll

index 5690cd721407518d1d7a0264b205ae24a465deef..470c861c50573d3f19b601b99fa9adb8dc58bd12 100644 (file)
--- a/test/Verifier/element-wise-atomic-memory-intrinsics.ll
+++ b/test/Verifier/element-wise-atomic-memory-intrinsics.ll
@@ -1,17 +1,25 @@
  ; RUN: not opt -verify < %s 2>&1 | FileCheck %s
  
-define void @test_memcpy(i8* %P, i8* %Q) {
+define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i32 %E) {
+  ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %E)
    ; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 3)
  
+  ; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 7, i32 4)
+
+  ; CHECK: incorrect alignment of the destination argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 1)
    ; CHECK: incorrect alignment of the destination argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4)
  
    ; CHECK: incorrect alignment of the source argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1)
+  ; CHECK: incorrect alignment of the source argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4)
  
    ret void
  }
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
-
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
  ; CHECK: input module is broken!
author	Daniel Neilson <dneilson@azul.com>
	Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)
committer	Daniel Neilson <dneilson@azul.com>
	Fri, 16 Jun 2017 14:43:59 +0000 (14:43 +0000)
docs/LangRef.rst		patch \| blob \| history
include/llvm/CodeGen/RuntimeLibcalls.h		patch \| blob \| history
include/llvm/IR/IRBuilder.h		patch \| blob \| history
include/llvm/IR/IntrinsicInst.h		patch \| blob \| history
include/llvm/IR/Intrinsics.td		patch \| blob \| history
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp		patch \| blob \| history
lib/CodeGen/TargetLoweringBase.cpp		patch \| blob \| history
lib/IR/IRBuilder.cpp		patch \| blob \| history
lib/IR/Verifier.cpp		patch \| blob \| history
lib/Transforms/InstCombine/InstCombineCalls.cpp		patch \| blob \| history
lib/Transforms/InstCombine/InstCombineInternal.h		patch \| blob \| history
lib/Transforms/Scalar/LoopIdiomRecognize.cpp		patch \| blob \| history
test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll		patch \| blob \| history
test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll		patch \| blob \| history
test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll		patch \| blob \| history
test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll		patch \| blob \| history
test/Verifier/element-wise-atomic-memory-intrinsics.ll		patch \| blob \| history