These intrinsics are similar to the standard library memory intrinsics except
that they perform memory transfer as a sequence of atomic memory accesses.
-.. _int_memcpy_element_atomic:
+.. _int_memcpy_element_unordered_atomic:
-'``llvm.memcpy.element.atomic``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.memcpy.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
-This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on
+This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
any integer bit width and for different address spaces. Not all targets
support all bit widths, however.
::
- declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,
- i64 <num_elements>, i32 <element_size>)
+ declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
+ i8* <src>,
+ i32 <len>,
+ i32 <element_size>)
+ declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
+ i8* <src>,
+ i64 <len>,
+ i32 <element_size>)
Overview:
"""""""""
-The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of
-memory from the source location to the destination location as a sequence of
-unordered atomic memory accesses where each access is a multiple of
-``element_size`` bytes wide and aligned at an element size boundary. For example
-each element is accessed atomically in source and destination buffers.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
+'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
+as arrays with elements that are exactly ``element_size`` bytes, and the copy between
+buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
+that are a positive integer multiple of the ``element_size`` in size.
Arguments:
""""""""""
-The first argument is a pointer to the destination, the second is a
-pointer to the source. The third argument is an integer argument
-specifying the number of elements to copy, the fourth argument is size of
-the single element in bytes.
+The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
+intrinsic, with the added constraint that ``len`` is required to be a positive integer
+multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
+``element_size``, then the behaviour of the intrinsic is undefined.
-``element_size`` should be a power of two, greater than zero and less than
-a target-specific atomic access size limit.
+``element_size`` must be a compile-time constant positive power of two no greater than
+the target-specific atomic access size limit.
-For each of the input pointers ``align`` parameter attribute must be specified.
-It must be a power of two and greater than or equal to the ``element_size``.
-Caller guarantees that both the source and destination pointers are aligned to
-that boundary.
+For each of the input pointers, the ``align`` parameter attribute must be specified. It
+must be a power of two no less than the ``element_size``. The caller guarantees that
+both the source and destination pointers are aligned to that boundary.
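+For illustration only (the value names ``%dst`` and ``%src`` are assumed ``i8*``
+values and are not part of the definition), a call that copies 16 bytes as four
+4-byte elements, with both pointers 8-byte aligned, could look like::
+      ; len (16) is a positive multiple of element_size (4); align (8) is >= element_size
+      call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %dst, i8* align 8 %src, i32 16, i32 4)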
Semantics:
""""""""""
-The '``llvm.memcpy.element.atomic.*``' intrinsic copies
-'``num_elements`` * ``element_size``' bytes of memory from the source location to
-the destination location. These locations are not allowed to overlap. Memory copy
-is performed as a sequence of unordered atomic memory accesses where each access
-is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an
-element size boundary.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
+memory from the source location to the destination location. These locations are not
+allowed to overlap. The memory copy is performed as a sequence of load/store operations
+where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
+aligned at an ``element_size`` boundary.
The order of the copy is unspecified. The same value may be read from the source
buffer many times, but only one write is issued to the destination buffer per
-element. It is well defined to have concurrent reads and writes to both source
-and destination provided those reads and writes are at least unordered atomic.
+element. It is well defined to have concurrent reads and writes to both source and
+destination provided those reads and writes are unordered atomic when specified.
This intrinsic does not provide any additional ordering guarantees over those
provided by a set of unordered loads from the source location and stores to the destination.
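+As a sketch of that equivalence (assuming a copy of 8 bytes with ``element_size`` 4,
+and assuming ``%s`` and ``%d`` are ``i32*`` views of the source and destination
+buffers), the guarantees are no stronger than those of a sequence such as::
+      %v0 = load atomic i32, i32* %s unordered, align 4
+      store atomic i32 %v0, i32* %d unordered, align 4
+      %s1 = getelementptr i32, i32* %s, i64 1
+      %d1 = getelementptr i32, i32* %d, i64 1
+      %v1 = load atomic i32, i32* %s1 unordered, align 4
+      store atomic i32 %v1, i32* %d1 unordered, align 4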
Lowering:
"""""""""
-In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered
-to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced
-with an actual element size.
+In the most general case, a call to the '``llvm.memcpy.element.unordered.atomic.*``'
+intrinsic is lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``,
+where '*' is replaced with the actual element size.
-Optimizer is allowed to inline memory copy when it's profitable to do so.
+The optimizer is allowed to inline the memory copy when it's profitable to do so.
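+As a sketch only (the exact prototype of the runtime symbol is an assumption based on
+the destination, source and length arguments the lowering passes), a call with
+``element_size`` 4 and an ``i64`` length would lower to something along the lines of::
+      declare void @__llvm_memcpy_element_unordered_atomic_4(i8*, i8*, i64)
+      call void @__llvm_memcpy_element_unordered_atomic_4(i8* %dst, i8* %src, i64 %len)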
MEMSET,
MEMMOVE,
- // ELEMENT-WISE ATOMIC MEMORY
- MEMCPY_ELEMENT_ATOMIC_1,
- MEMCPY_ELEMENT_ATOMIC_2,
- MEMCPY_ELEMENT_ATOMIC_4,
- MEMCPY_ELEMENT_ATOMIC_8,
- MEMCPY_ELEMENT_ATOMIC_16,
+ // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
+ MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
+ MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
+ MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
+ MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
+ MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,
// EXCEPTION HANDLING
UNWIND_RESUME,
/// UNKNOWN_LIBCALL if there is none.
Libcall getSYNC(unsigned Opc, MVT VT);
- /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the
- /// given element size or UNKNOW_LIBCALL if there is none.
- Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);
+ /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return
+ /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
+ /// UNKNOWN_LIBCALL if there is none.
+ Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
}
}
MDNode *ScopeTag = nullptr,
MDNode *NoAliasTag = nullptr);
- /// \brief Create and insert an atomic memcpy between the specified
- /// pointers.
+ /// \brief Create and insert an element unordered-atomic memcpy between the
+ /// specified pointers.
///
/// If the pointers aren't i8*, they will be converted. If a TBAA tag is
/// specified, it will be added to the instruction. Likewise with alias.scope
/// and noalias tags.
- CallInst *CreateElementAtomicMemCpy(
- Value *Dst, Value *Src, uint64_t NumElements, uint32_t ElementSize,
+ CallInst *CreateElementUnorderedAtomicMemCpy(
+ Value *Dst, Value *Src, uint64_t Size, uint32_t ElementSize,
MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) {
- return CreateElementAtomicMemCpy(Dst, Src, getInt64(NumElements),
- ElementSize, TBAATag, TBAAStructTag,
- ScopeTag, NoAliasTag);
+ return CreateElementUnorderedAtomicMemCpy(
+ Dst, Src, getInt64(Size), ElementSize, TBAATag, TBAAStructTag, ScopeTag,
+ NoAliasTag);
}
- CallInst *CreateElementAtomicMemCpy(Value *Dst, Value *Src,
- Value *NumElements, uint32_t ElementSize,
- MDNode *TBAATag = nullptr,
- MDNode *TBAAStructTag = nullptr,
- MDNode *ScopeTag = nullptr,
- MDNode *NoAliasTag = nullptr);
+ CallInst *CreateElementUnorderedAtomicMemCpy(
+ Value *Dst, Value *Src, Value *Size, uint32_t ElementSize,
+ MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
+ MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr);
/// \brief Create and insert a memmove between the specified
/// pointers.
};
/// This class represents atomic memcpy intrinsic
- /// TODO: Integrate this class into MemIntrinsic hierarchy.
- class ElementAtomicMemCpyInst : public IntrinsicInst {
+ /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
+ /// C&P of all methods from that hierarchy
+ class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
+ private:
+ enum { ARG_DEST = 0, ARG_SOURCE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 };
+
public:
- Value *getRawDest() const { return getArgOperand(0); }
- Value *getRawSource() const { return getArgOperand(1); }
+ Value *getRawDest() const {
+ return const_cast<Value *>(getArgOperand(ARG_DEST));
+ }
+ const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
+ Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }
+
+ /// Return the arguments to the instruction.
+ Value *getRawSource() const {
+ return const_cast<Value *>(getArgOperand(ARG_SOURCE));
+ }
+ const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); }
+ Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); }
+
+ Value *getLength() const {
+ return const_cast<Value *>(getArgOperand(ARG_LENGTH));
+ }
+ const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
+ Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }
+
+ bool isVolatile() const { return false; }
+
+ Value *getRawElementSizeInBytes() const {
+ return const_cast<Value *>(getArgOperand(ARG_ELEMENTSIZE));
+ }
+
+ ConstantInt *getElementSizeInBytesCst() const {
+ return cast<ConstantInt>(getRawElementSizeInBytes());
+ }
+
+ uint32_t getElementSizeInBytes() const {
+ return getElementSizeInBytesCst()->getZExtValue();
+ }
+
+ /// This is just like getRawDest, but it strips off any cast
+ /// instructions that feed it, giving the original input. The returned
+ /// value is guaranteed to be a pointer.
+ Value *getDest() const { return getRawDest()->stripPointerCasts(); }
+
+ /// This is just like getRawSource, but it strips off any cast
+ /// instructions that feed it, giving the original input. The returned
+ /// value is guaranteed to be a pointer.
+ Value *getSource() const { return getRawSource()->stripPointerCasts(); }
+
+ unsigned getDestAddressSpace() const {
+ return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
+ }
- Value *getNumElements() const { return getArgOperand(2); }
- void setNumElements(Value *V) { setArgOperand(2, V); }
+ unsigned getSourceAddressSpace() const {
+ return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
+ }
- uint64_t getSrcAlignment() const { return getParamAlignment(0); }
- uint64_t getDstAlignment() const { return getParamAlignment(1); }
+ /// Set the specified arguments of the instruction.
+ void setDest(Value *Ptr) {
+ assert(getRawDest()->getType() == Ptr->getType() &&
+ "setDest called with pointer of wrong type!");
+ setArgOperand(ARG_DEST, Ptr);
+ }
+
+ void setSource(Value *Ptr) {
+ assert(getRawSource()->getType() == Ptr->getType() &&
+ "setSource called with pointer of wrong type!");
+ setArgOperand(ARG_SOURCE, Ptr);
+ }
+
+ void setLength(Value *L) {
+ assert(getLength()->getType() == L->getType() &&
+ "setLength called with value of wrong type!");
+ setArgOperand(ARG_LENGTH, L);
+ }
- uint64_t getElementSizeInBytes() const {
- Value *Arg = getArgOperand(3);
- return cast<ConstantInt>(Arg)->getZExtValue();
+ void setElementSizeInBytes(Constant *V) {
+ assert(V->getType() == Type::getInt32Ty(getContext()) &&
+ "setElementSizeInBytes called with value of wrong type!");
+ setArgOperand(ARG_ELEMENTSIZE, V);
}
static inline bool classof(const IntrinsicInst *I) {
- return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;
+ return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
}
static inline bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
//===------ Memory intrinsics with element-wise atomicity guarantees ------===//
//
-def int_memcpy_element_atomic : Intrinsic<[],
- [llvm_anyptr_ty, llvm_anyptr_ty,
- llvm_i64_ty, llvm_i32_ty],
- [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
- WriteOnly<0>, ReadOnly<1>]>;
+// @llvm.memcpy.element.unordered.atomic.*(dest, src, length, elementsize)
+def int_memcpy_element_unordered_atomic
+ : Intrinsic<[],
+ [
+ llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty
+ ],
+ [
+ IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
+ ReadOnly<1>
+ ]>;
//===------------------------ Reduction Intrinsics ------------------------===//
//
updateDAGForMaybeTailCall(MM);
return nullptr;
}
- case Intrinsic::memcpy_element_atomic: {
- SDValue Dst = getValue(I.getArgOperand(0));
- SDValue Src = getValue(I.getArgOperand(1));
- SDValue NumElements = getValue(I.getArgOperand(2));
- SDValue ElementSize = getValue(I.getArgOperand(3));
+ case Intrinsic::memcpy_element_unordered_atomic: {
+ const ElementUnorderedAtomicMemCpyInst &MI =
+ cast<ElementUnorderedAtomicMemCpyInst>(I);
+ SDValue Dst = getValue(MI.getRawDest());
+ SDValue Src = getValue(MI.getRawSource());
+ SDValue Length = getValue(MI.getLength());
// Emit a library call.
TargetLowering::ArgListTy Args;
Entry.Node = Src;
Args.push_back(Entry);
- Entry.Ty = I.getArgOperand(2)->getType();
- Entry.Node = NumElements;
+ Entry.Ty = MI.getLength()->getType();
+ Entry.Node = Length;
Args.push_back(Entry);
- Entry.Ty = Type::getInt32Ty(*DAG.getContext());
- Entry.Node = ElementSize;
- Args.push_back(Entry);
-
- uint64_t ElementSizeConstant =
- cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();
+ uint64_t ElementSizeConstant = MI.getElementSizeInBytes();
RTLIB::Libcall LibraryCall =
- RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);
+ RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
report_fatal_error("Unsupported element size");
Names[RTLIB::MEMCPY] = "memcpy";
Names[RTLIB::MEMMOVE] = "memmove";
Names[RTLIB::MEMSET] = "memset";
- Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";
- Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";
- Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";
- Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";
- Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";
+ Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] =
+ "__llvm_memcpy_element_unordered_atomic_1";
+ Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] =
+ "__llvm_memcpy_element_unordered_atomic_2";
+ Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] =
+ "__llvm_memcpy_element_unordered_atomic_4";
+ Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] =
+ "__llvm_memcpy_element_unordered_atomic_8";
+ Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] =
+ "__llvm_memcpy_element_unordered_atomic_16";
Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
return UNKNOWN_LIBCALL;
}
-RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {
+RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
switch (ElementSize) {
case 1:
- return MEMCPY_ELEMENT_ATOMIC_1;
+ return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
case 2:
- return MEMCPY_ELEMENT_ATOMIC_2;
+ return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
case 4:
- return MEMCPY_ELEMENT_ATOMIC_4;
+ return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
case 8:
- return MEMCPY_ELEMENT_ATOMIC_8;
+ return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
case 16:
- return MEMCPY_ELEMENT_ATOMIC_16;
+ return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
default:
return UNKNOWN_LIBCALL;
}
-
}
/// InitCmpLibcallCCs - Set default comparison libcall CC.
return CI;
}
-CallInst *IRBuilderBase::CreateElementAtomicMemCpy(
- Value *Dst, Value *Src, Value *NumElements, uint32_t ElementSize,
- MDNode *TBAATag, MDNode *TBAAStructTag, MDNode *ScopeTag,
- MDNode *NoAliasTag) {
+CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
+ Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, MDNode *TBAATag,
+ MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) {
Dst = getCastedInt8PtrValue(Dst);
Src = getCastedInt8PtrValue(Src);
- Value *Ops[] = {Dst, Src, NumElements, getInt32(ElementSize)};
- Type *Tys[] = {Dst->getType(), Src->getType()};
+ Value *Ops[] = {Dst, Src, Size, getInt32(ElementSize)};
+ Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
Module *M = BB->getParent()->getParent();
- Value *TheFn =
- Intrinsic::getDeclaration(M, Intrinsic::memcpy_element_atomic, Tys);
+ Value *TheFn = Intrinsic::getDeclaration(
+ M, Intrinsic::memcpy_element_unordered_atomic, Tys);
CallInst *CI = createCallHelper(TheFn, Ops, this);
CS);
break;
}
- case Intrinsic::memcpy_element_atomic: {
- ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
- Assert(ElementSizeCI, "element size of the element-wise atomic memory "
- "intrinsic must be a constant int",
+ case Intrinsic::memcpy_element_unordered_atomic: {
+ const ElementUnorderedAtomicMemCpyInst *MI =
+ cast<ElementUnorderedAtomicMemCpyInst>(CS.getInstruction());
+
+ ConstantInt *ElementSizeCI =
+ dyn_cast<ConstantInt>(MI->getRawElementSizeInBytes());
+ Assert(ElementSizeCI,
+ "element size of the element-wise unordered atomic memory "
+ "intrinsic must be a constant int",
CS);
const APInt &ElementSizeVal = ElementSizeCI->getValue();
Assert(ElementSizeVal.isPowerOf2(),
"must be a power of 2",
CS);
+ if (auto *LengthCI = dyn_cast<ConstantInt>(MI->getLength())) {
+ uint64_t Length = LengthCI->getZExtValue();
+ uint64_t ElementSize = MI->getElementSizeInBytes();
+ Assert((Length % ElementSize) == 0,
+ "constant length must be a multiple of the element size in the "
+ "element-wise atomic memory intrinsic",
+ CS);
+ }
+
auto IsValidAlignment = [&](uint64_t Alignment) {
return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
};
-
uint64_t DstAlignment = CS.getParamAlignment(0),
SrcAlignment = CS.getParamAlignment(1);
-
Assert(IsValidAlignment(DstAlignment),
- "incorrect alignment of the destination argument",
- CS);
+ "incorrect alignment of the destination argument", CS);
Assert(IsValidAlignment(SrcAlignment),
- "incorrect alignment of the source argument",
- CS);
+ "incorrect alignment of the source argument", CS);
break;
}
case Intrinsic::gcroot:
return ConstantVector::get(BoolVec);
}
-Instruction *
-InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {
+Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy(
+ ElementUnorderedAtomicMemCpyInst *AMI) {
// Try to unfold this intrinsic into sequence of explicit atomic loads and
// stores.
// First check that number of elements is compile time constant.
- auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());
- if (!NumElementsCI)
+ auto *LengthCI = dyn_cast<ConstantInt>(AMI->getLength());
+ if (!LengthCI)
return nullptr;
// Check that there are not too many elements.
- uint64_t NumElements = NumElementsCI->getZExtValue();
+ uint64_t LengthInBytes = LengthCI->getZExtValue();
+ uint32_t ElementSizeInBytes = AMI->getElementSizeInBytes();
+ uint64_t NumElements = LengthInBytes / ElementSizeInBytes;
if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)
return nullptr;
- // Don't unfold into illegal integers
- uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;
- if (!getDataLayout().isLegalInteger(ElementSizeInBytes))
- return nullptr;
+ // Only expand if there are elements to copy.
+ if (NumElements > 0) {
+ // Don't unfold into illegal integers
+ uint64_t ElementSizeInBits = ElementSizeInBytes * 8;
+ if (!getDataLayout().isLegalInteger(ElementSizeInBits))
+ return nullptr;
- // Cast source and destination to the correct type. Intrinsic input arguments
- // are usually represented as i8*.
- // Often operands will be explicitly casted to i8* and we can just strip
- // those casts instead of inserting new ones. However it's easier to rely on
- // other InstCombine rules which will cover trivial cases anyway.
- Value *Src = AMI->getRawSource();
- Value *Dst = AMI->getRawDest();
- Type *ElementPointerType = Type::getIntNPtrTy(
- AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace());
-
- Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
- "memcpy_unfold.src_casted");
- Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
- "memcpy_unfold.dst_casted");
-
- for (uint64_t i = 0; i < NumElements; ++i) {
- // Get current element addresses
- ConstantInt *ElementIdxCI =
- ConstantInt::get(AMI->getContext(), APInt(64, i));
- Value *SrcElementAddr =
- Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
- Value *DstElementAddr =
- Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
-
- // Load from the source. Transfer alignment information and mark load as
- // unordered atomic.
- LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
- Load->setOrdering(AtomicOrdering::Unordered);
- // We know alignment of the first element. It is also guaranteed by the
- // verifier that element size is less or equal than first element alignment
- // and both of this values are powers of two.
- // This means that all subsequent accesses are at least element size
- // aligned.
- // TODO: We can infer better alignment but there is no evidence that this
- // will matter.
- Load->setAlignment(i == 0 ? AMI->getSrcAlignment()
- : AMI->getElementSizeInBytes());
- Load->setDebugLoc(AMI->getDebugLoc());
-
- // Store loaded value via unordered atomic store.
- StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
- Store->setOrdering(AtomicOrdering::Unordered);
- Store->setAlignment(i == 0 ? AMI->getDstAlignment()
- : AMI->getElementSizeInBytes());
- Store->setDebugLoc(AMI->getDebugLoc());
+ // Cast source and destination to the correct type. Intrinsic input
+ // arguments are usually represented as i8*. Often operands will be
+ // explicitly casted to i8* and we can just strip those casts instead of
+ // inserting new ones. However it's easier to rely on other InstCombine
+ // rules which will cover trivial cases anyway.
+ Value *Src = AMI->getRawSource();
+ Value *Dst = AMI->getRawDest();
+ Type *ElementPointerType =
+ Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits,
+ Src->getType()->getPointerAddressSpace());
+
+ Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
+ "memcpy_unfold.src_casted");
+ Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
+ "memcpy_unfold.dst_casted");
+
+ for (uint64_t i = 0; i < NumElements; ++i) {
+ // Get current element addresses
+ ConstantInt *ElementIdxCI =
+ ConstantInt::get(AMI->getContext(), APInt(64, i));
+ Value *SrcElementAddr =
+ Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
+ Value *DstElementAddr =
+ Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
+
+ // Load from the source. Transfer alignment information and mark load as
+ // unordered atomic.
+ LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
+ Load->setOrdering(AtomicOrdering::Unordered);
+ // We know the alignment of the first element. It is also guaranteed by the
+ // verifier that the element size is less than or equal to the first element's
+ // alignment and that both of these values are powers of two. This means that
+ // all subsequent accesses are at least element size aligned.
+ // TODO: We can infer better alignment but there is no evidence that this
+ // will matter.
+ Load->setAlignment(i == 0 ? AMI->getParamAlignment(1)
+ : ElementSizeInBytes);
+ Load->setDebugLoc(AMI->getDebugLoc());
+
+ // Store loaded value via unordered atomic store.
+ StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
+ Store->setOrdering(AtomicOrdering::Unordered);
+ Store->setAlignment(i == 0 ? AMI->getParamAlignment(0)
+ : ElementSizeInBytes);
+ Store->setDebugLoc(AMI->getDebugLoc());
+ }
}
// Set the number of elements of the copy to 0, it will be deleted on the
// next iteration.
- AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));
+ AMI->setLength(Constant::getNullValue(LengthCI->getType()));
return AMI;
}
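For reference, a rough sketch of the IR shape this unfolding produces for an 8-byte copy
with 4-byte elements (value names follow the memcpy_unfold.* prefixes used above; the
exact GEP and alignment details may differ):
  %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
  %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*
  %memcpy_unfold.val = load atomic i32, i32* %memcpy_unfold.src_casted unordered, align 4
  store atomic i32 %memcpy_unfold.val, i32* %memcpy_unfold.dst_casted unordered, align 4
  %memcpy_unfold.src_addr = getelementptr i32, i32* %memcpy_unfold.src_casted, i64 1
  %memcpy_unfold.dst_addr = getelementptr i32, i32* %memcpy_unfold.dst_casted, i64 1
  %memcpy_unfold.val1 = load atomic i32, i32* %memcpy_unfold.src_addr unordered, align 4
  store atomic i32 %memcpy_unfold.val1, i32* %memcpy_unfold.dst_addr unordered, align 4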
if (Changed) return II;
}
- if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {
- if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))
+ if (auto *AMI = dyn_cast<ElementUnorderedAtomicMemCpyInst>(II)) {
+ if (Constant *C = dyn_cast<Constant>(AMI->getLength()))
if (C->isNullValue())
return eraseInstFromFunction(*AMI);
- if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))
+ if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI))
return I;
}
Instruction *MatchBSwap(BinaryOperator &I);
bool SimplifyStoreAtEndOfBlock(StoreInst &SI);
- Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);
+ Instruction *
+ SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI);
Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
Instruction *SimplifyMemSet(MemSetInst *MI);
const SCEV *NumBytesS =
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
+ if (StoreSize != 1)
+ NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
+ SCEV::FlagNUW);
+
+ Value *NumBytes =
+ Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+
unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
CallInst *NewCall = nullptr;
// Check whether to generate an unordered atomic memcpy:
// If the load or store are atomic, then they must necessarily be unordered
// by previous checks.
- if (!SI->isAtomic() && !LI->isAtomic()) {
- if (StoreSize != 1)
- NumBytesS = SE->getMulExpr(
- NumBytesS, SE->getConstant(IntPtrTy, StoreSize), SCEV::FlagNUW);
-
- Value *NumBytes =
- Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
-
+ if (!SI->isAtomic() && !LI->isAtomic())
NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
- } else {
+ else {
// We cannot allow unaligned ops for unordered load/store, so reject
// anything where the alignment isn't at least the element size.
if (Align < StoreSize)
if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize())
return false;
- Value *NumElements =
- Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+ NewCall = Builder.CreateElementUnorderedAtomicMemCpy(
+ StoreBasePtr, LoadBasePtr, NumBytes, StoreSize);
- NewCall = Builder.CreateElementAtomicMemCpy(StoreBasePtr, LoadBasePtr,
- NumElements, StoreSize);
// Propagate alignment info onto the pointer args. Note that unordered
// atomic loads/stores are *required* by the spec to have an alignment
// but non-atomic loads/stores may not.
define i8* @test_memcpy1(i8* %P, i8* %Q) {
; CHECK: test_memcpy
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 1)
ret i8* %P
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $1, %edx
- ; CHECK-DAG: movl $1, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_1
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_1
}
define i8* @test_memcpy2(i8* %P, i8* %Q) {
; CHECK: test_memcpy2
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 2)
ret i8* %P
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $2, %edx
- ; CHECK-DAG: movl $2, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_2
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_2
}
define i8* @test_memcpy4(i8* %P, i8* %Q) {
; CHECK: test_memcpy4
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4)
ret i8* %P
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $4, %edx
- ; CHECK-DAG: movl $4, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_4
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_4
}
define i8* @test_memcpy8(i8* %P, i8* %Q) {
; CHECK: test_memcpy8
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8)
ret i8* %P
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $8, %edx
- ; CHECK-DAG: movl $8, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_8
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_8
}
define i8* @test_memcpy16(i8* %P, i8* %Q) {
; CHECK: test_memcpy16
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16)
ret i8* %P
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $16, %edx
- ; CHECK-DAG: movl $16, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_16
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_16
}
define void @test_memcpy_args(i8** %Storage) {
%Src.addr = getelementptr i8*, i8** %Storage, i64 1
%Src = load i8*, i8** %Src.addr
- ; First argument
+ ; 1st arg (%rdi)
; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
; CHECK-DAG: movq [[REG1]], %rdi
- ; Second argument
+ ; 2nd arg (%rsi)
; CHECK-DAG: movq 8(%rdi), %rsi
- ; Third argument
+ ; 3rd arg (%edx) -- length
; CHECK-DAG: movl $4, %edx
- ; Fourth argument
- ; CHECK-DAG: movl $4, %ecx
- ; CHECK: __llvm_memcpy_element_atomic_4
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)
- ret void
+ ; CHECK: __llvm_memcpy_element_unordered_atomic_4
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4)
+ ret void
}
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s
+; Temporarily an expected failure until inst combine is updated in the next patch
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
-; Test basic unfolding
-define void @test1(i8* %Src, i8* %Dst) {
-; CHECK-LABEL: test1
-; CHECK-NOT: llvm.memcpy.element.atomic
+; Test basic unfolding -- unordered load & store
+define void @test1a(i8* %Src, i8* %Dst) {
+; CHECK-LABEL: test1a
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*
; CHECK-DAG: [[VAL4:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4
; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4
entry:
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i32 4)
ret void
}
; CHECK-NOT: load
; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
entry:
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i32 4)
ret void
}
; CHECK-NOT: load
; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
entry:
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i32 64)
ret void
}
; Test that we will eliminate redundant bitcasts
define void @test4(i64* %Src, i64* %Dst) {
; CHECK-LABEL: test4
-; CHECK-NOT: llvm.memcpy.element.atomic
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
; CHECK-NOT: bitcast
entry:
%Src.casted = bitcast i64* %Src to i8*
%Dst.casted = bitcast i64* %Dst to i8*
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i32 8)
ret void
}
+; Test that 0-length unordered atomic memcpy gets removed.
define void @test5(i8* %Src, i8* %Dst) {
; CHECK-LABEL: test5
-; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
entry:
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
ret void
}
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32)
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
;; memcpy.atomic formation (atomic load & store)
define void @test1(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test1(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation (atomic store, normal load)
define void @test2(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test2(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic store, normal load w/ no align)
define void @test2b(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test2b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align)
define void @test2c(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test2c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load)
define void @test2d(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test2d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation (normal store, atomic load)
define void @test3(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test3(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (normal store w/ no align, atomic load)
define void @test3b(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test3b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align)
define void @test3c(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test3c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load)
define void @test3d(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test3d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic load, ordered-atomic store)
define void @test4(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test4(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store)
define void @test5(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test5(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation (atomic load & store) -- element size 2
define void @test6(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test6(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 %Size, i32 2)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 1
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 [[Sz]], i32 2)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation (atomic load & store) -- element size 4
define void @test7(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test7(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 %Size, i32 4)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 2
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 [[Sz]], i32 4)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation (atomic load & store) -- element size 8
define void @test8(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test8(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 %Size, i32 8)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 3
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 [[Sz]], i32 8)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic load & store) -- element size 16
define void @test9(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test9(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 %Size, i32 16)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 4
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 [[Sz]], i32 16)
; CHECK-NOT: store
; CHECK: ret void
bb.nph:
;; memcpy.atomic formation rejection (atomic load & store) -- element size 32
define void @test10(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test10(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
;; Will not create call due to a max element size of 0
define void @test1(i64 %Size) nounwind ssp {
; CHECK-LABEL: @test1(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
; CHECK: store
; CHECK: ret void
bb.nph:
; RUN: not opt -verify < %s 2>&1 | FileCheck %s
-define void @test_memcpy(i8* %P, i8* %Q) {
+define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i32 %E) {
+ ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %E)
; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 3)
+ ; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 7, i32 4)
+
+ ; CHECK: incorrect alignment of the destination argument
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 1)
; CHECK: incorrect alignment of the destination argument
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4)
; CHECK: incorrect alignment of the source argument
- call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1)
+ ; CHECK: incorrect alignment of the source argument
+ call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4)
ret void
}
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
-
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
; CHECK: input module is broken!