From a5bb497257b1e112cc94e6608beeb619d9576259 Mon Sep 17 00:00:00 2001
From: Tyler Nowicki
Date: Wed, 18 Jun 2014 00:51:32 +0000
Subject: [PATCH] Documentation for #pragma clang loop directive and options vectorize and interleave.

Reviewed by: Aaron Ballman and Dmitri Gribenko

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@211135 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/LanguageExtensions.rst     | 65 +++++++++++++++++++++++++++++++++
 docs/ReleaseNotes.rst           |  8 ++++
 include/clang/Basic/Attr.td     |  2 +-
 include/clang/Basic/AttrDocs.td | 11 ++++++
 4 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/docs/LanguageExtensions.rst b/docs/LanguageExtensions.rst
index 2de18ce885..a9ba9079c0 100644
--- a/docs/LanguageExtensions.rst
+++ b/docs/LanguageExtensions.rst
@@ -1764,3 +1764,68 @@ The ``container`` function is also in the region and will not be optimized, but
 it causes the instantiation of ``twice`` and ``thrice`` with an ``int`` type; of
 these two instantiations, ``twice`` will be optimized (because its definition
 was outside the region) and ``thrice`` will not be optimized.
+
+Extensions for loop hint optimizations
+======================================
+
+The ``#pragma clang loop`` directive is used to specify hints for optimizing the
+subsequent for, while, do-while, or C++11 range-based for loop. The directive
+provides options for vectorization and interleaving. Loop hints can be specified
+before any loop and will be ignored if the optimization is not safe to apply.
+
+A vectorized loop performs multiple iterations of the original loop
+in parallel using vector instructions. The instruction set of the target
+processor determines which vector instructions are available and their vector
+widths. This restricts the types of loops that can be vectorized. The vectorizer
+automatically determines if the loop is safe and profitable to vectorize. A
+vector instruction cost model is used to select the vector width.
+
+Interleaving multiple loop iterations allows modern processors to further
+improve instruction-level parallelism (ILP) using advanced hardware features,
+such as multiple execution units and out-of-order execution. The vectorizer uses
+a cost model that depends on the register pressure and generated code size to
+select the interleaving count.
+
+Vectorization is enabled by ``vectorize(enable)`` and interleaving is enabled
+by ``interleave(enable)``. This is useful when compiling with ``-Os`` to
+manually enable vectorization or interleaving.
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize(enable)
+  #pragma clang loop interleave(enable)
+  for(...) {
+    ...
+  }
+
+The vector width is specified by ``vectorize_width(_value_)`` and the interleave
+count is specified by ``interleave_count(_value_)``, where
+_value_ is a positive integer. This is useful for specifying the optimal
+width/count for the set of target architectures supported by your application.
+
+.. code-block:: c++
+
+
+  #pragma clang loop vectorize_width(2)
+  #pragma clang loop interleave_count(2)
+  for(...) {
+    ...
+  }
+
+Specifying a width/count of 1 disables the optimization, and is equivalent to
+``vectorize(disable)`` or ``interleave(disable)``.
+
+For convenience, multiple loop hints can be specified on a single line.
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize_width(4) interleave_count(8)
+  for(...) {
+    ...
+  }
+
+If an optimization cannot be applied, any hints that apply to it are ignored.
+For example, the hint ``vectorize_width(4)`` is ignored if the loop is not
+proven safe to vectorize. To identify and diagnose optimization issues, use the
+``-Rpass``, ``-Rpass-missed``, and ``-Rpass-analysis`` command line options. See
+the user guide for details.
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 1311e19639..a7bbbb5fdb 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -97,6 +97,14 @@ passes via three new flags: `-Rpass`, `-Rpass-missed` and `-Rpass-analysis`.
 These flags take a POSIX regular expression which indicates the name of the
 pass (or passes) that should emit optimization remarks.
 
+New Pragmas in Clang
+--------------------
+
+Loop optimization hints can be specified using the new `#pragma clang loop`
+directive just prior to the desired loop. The directive allows vectorization
+and interleaving to be enabled or disabled, and the vector width and interleave
+count to be manually specified. See language extensions for details.
+
 C Language Changes in Clang
 ---------------------------
 
diff --git a/include/clang/Basic/Attr.td b/include/clang/Basic/Attr.td
index ab83db206e..df4b38d331 100644
--- a/include/clang/Basic/Attr.td
+++ b/include/clang/Basic/Attr.td
@@ -1812,5 +1812,5 @@ def LoopHint : Attr {
     }
   }];
 
-  let Documentation = [Undocumented];
+  let Documentation = [LoopHintDocs];
 }
diff --git a/include/clang/Basic/AttrDocs.td b/include/clang/Basic/AttrDocs.td
index 7441fe5cfe..6c8c9a37ae 100644
--- a/include/clang/Basic/AttrDocs.td
+++ b/include/clang/Basic/AttrDocs.td
@@ -1012,3 +1012,14 @@ This attribute is incompatible with the ``always_inline`` attribute.
   }];
 }
 
+def LoopHintDocs : Documentation {
+  let Category = DocCatStmt;
+  let Content = [{
+The ``#pragma clang loop`` directive allows loop optimization hints to be
+specified for the subsequent loop. The directive allows vectorization
+and interleaving to be enabled or disabled, and the vector width and interleave
+count to be manually specified. See the "Extensions for loop hint
+optimizations" section of the language extensions documentation
+for details.
+  }];
+}
-- 
2.40.0
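
Illustrative note, not part of the patch: the documentation examples above
use ``for(...)`` placeholders, so a concrete loop may help when trying the
directive. The sketch below is only an assumption about typical usage. The
function name ``saxpy`` and the hint values 4 and 2 are hypothetical and were
chosen for illustration; the compiler may still ignore the hints if the loop
is not proven safe to vectorize.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): computes y[i] += a * x[i]
// over the common prefix of x and y.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  const std::size_t n = x.size() < y.size() ? x.size() : y.size();
  // Both hints on one line, as the documentation allows. The values are
  // arbitrary; they are requests to the optimizer, not guarantees.
#pragma clang loop vectorize_width(4) interleave_count(2)
  for (std::size_t i = 0; i < n; ++i)
    y[i] += a * x[i];
}
```

Building this with something like ``clang++ -O2 -Rpass=loop-vectorize`` should
report whether the loop was vectorized. Compilers that do not know the pragma
typically ignore it, possibly with a warning, and the loop's result is the
same either way.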