From 62fd278ff94d1df43652ec30a48fe02bb598e68e Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Sat, 22 Nov 2008 21:41:31 +0000
Subject: [PATCH] start documenting Diagnostics. Sebastian, I'd appreciate it if you can fill in the section for %plural.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@59883 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/InternalsManual.html | 213 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 210 insertions(+), 3 deletions(-)

diff --git a/docs/InternalsManual.html b/docs/InternalsManual.html
index 2e44640c98..6ecc5d6149 100644
--- a/docs/InternalsManual.html
+++ b/docs/InternalsManual.html
@@ -17,6 +17,7 @@
  • LLVM System and Support Libraries
  • The clang 'Basic' Library
@@ -84,6 +85,204 @@
    classes somewhere else, or introduce some other solution.

    We describe the roles of these classes in order of their dependencies.

    + + +

    The Diagnostics Subsystem

    + + +

    The Clang Diagnostics subsystem is an important part of how the compiler communicates with the human. Diagnostics are the warnings and errors produced when the code is incorrect or dubious. In Clang, each diagnostic produced has (at the minimum) a unique ID, a SourceLocation to "put the caret", an English translation associated with it, and a severity (e.g. WARNING or ERROR). They can also optionally include a number of arguments to the diagnostic (which fill in "%0"'s in the string) as well as a number of source ranges that relate to the diagnostic.

    + +

    In this section, we'll be giving examples produced by the clang command line driver, but diagnostics can be rendered in many different ways depending on how the DiagnosticClient interface is implemented. A representative example of a diagnostic is:

    + +
    +t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
    +   P = (P-42) + Gamma*4;
    +       ~~~~~~ ^ ~~~~~~~
    +
    + +

    In this example, you can see the English translation, the severity (error), the source location (the caret ("^") and file/line/column info), the source ranges "~~~~", and the arguments to the diagnostic ("int *" and "_Complex float"). You'll have to believe me that there is a unique ID backing the diagnostic :).
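To make these pieces concrete, here is a minimal C++ sketch of how a client might turn them into the three lines of output shown above. The `SketchDiagnostic` struct, `ColumnRange`, and `renderDiagnostic` are hypothetical names invented for illustration and are not Clang's actual classes. Columns are assumed to be 1-based and ranges inclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical 1-based, inclusive column range on the diagnosed line.
struct ColumnRange {
  unsigned first;
  unsigned last;
};

// Hypothetical record holding the pieces described above. Clang's real
// classes differ; this only models what ends up on screen.
struct SketchDiagnostic {
  std::string file;
  unsigned line;
  unsigned column;                  // where to "put the caret" (>= 1)
  std::string severity;             // e.g. "error" or "warning"
  std::string message;              // format string with arguments filled in
  std::string sourceLine;           // text of the diagnosed line
  std::vector<ColumnRange> ranges;  // source ranges related to the diagnostic
};

// Assumes column >= 1 and 1 <= first <= last for every range.
inline std::string renderDiagnostic(const SketchDiagnostic &d) {
  // The marker line must be wide enough for the caret and every range.
  std::size_t width = d.column;
  for (const ColumnRange &r : d.ranges)
    width = std::max<std::size_t>(width, r.last);

  std::string markers(width, ' ');
  for (const ColumnRange &r : d.ranges)
    for (unsigned c = r.first; c <= r.last; ++c)
      markers[c - 1] = '~';
  markers[d.column - 1] = '^';  // the caret wins over any range marker

  return d.file + ":" + std::to_string(d.line) + ":" +
         std::to_string(d.column) + ": " + d.severity + ": " + d.message +
         "\n" + d.sourceLine + "\n" + markers + "\n";
}
```

Keeping the data separate from the rendering function mirrors the point made earlier: a different client could take the same record and emit HTML or buffer it instead of printing text.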

    + +

    Getting all of this to happen has several steps and involves many moving pieces. This section describes them and talks about best practices when adding a new diagnostic.

    + + +

    The DiagnosticKinds.def file

    + + +

    Diagnostics are created by adding an entry to the DiagnosticKinds.def file. This file encodes the unique ID of the diagnostic (as an enum, the first argument), the severity of the diagnostic (second argument) and the English translation + format string.
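A ".def" file of this kind is typically consumed with the X-macro technique: the same list of entries is expanded several times under different macro definitions. The sketch below is a self-contained illustration under stated assumptions. The macro name `SKETCH_DIAGNOSTIC_KINDS`, the three diagnostic IDs, and the `DiagInfo` table are hypothetical, and the entries are kept in a macro rather than a separate included file so the example fits in one translation unit.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a few DiagnosticKinds.def entries. Each entry is
// (unique ID, severity, English format string). The IDs are made up.
#define SKETCH_DIAGNOSTIC_KINDS(DIAG)                                        \
  DIAG(ext_binary_literal, EXTENSION,                                        \
       "binary integer literals are an extension")                           \
  DIAG(warn_too_many_conversions, WARNING,                                   \
       "more '%%' conversions than data arguments")                          \
  DIAG(err_invalid_operands, ERROR,                                          \
       "invalid operands to binary expression ('%0' and '%1')")

enum Severity { NOTE, WARNING, EXTENSION, EXTWARN, ERROR };

// First expansion: the first argument of each entry becomes an enumerator.
enum DiagID {
#define AS_ENUM(ID, SEVERITY, TEXT) ID,
  SKETCH_DIAGNOSTIC_KINDS(AS_ENUM)
#undef AS_ENUM
  NUM_DIAGNOSTICS
};

struct DiagInfo {
  Severity severity;
  const char *format;
};

// Second expansion: severity and format string land in a table that is
// indexed by the enum generated above.
static const DiagInfo DiagTable[] = {
#define AS_INFO(ID, SEVERITY, TEXT) {SEVERITY, TEXT},
    SKETCH_DIAGNOSTIC_KINDS(AS_INFO)
#undef AS_INFO
};

inline Severity getSeverity(DiagID id) { return DiagTable[id].severity; }
inline const char *getFormat(DiagID id) { return DiagTable[id].format; }
```

Because both expansions walk the same list, the enum and the table cannot drift out of sync when an entry is added.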

    + +

    There is little sanity with the naming of the unique IDs right now. Some start with err_, warn_, or ext_ to encode the severity into the name. Since the enum is referenced in the C++ code that produces the diagnostic, it is somewhat useful for it to be reasonably short.

    + +

    The severity of the diagnostic comes from the set {NOTE, WARNING, EXTENSION, EXTWARN, ERROR}. The ERROR severity is used for diagnostics indicating the program is never acceptable under any circumstances. When an error is emitted, the AST for the input code may not be fully built. The EXTENSION and EXTWARN severities are used for extensions to the language that Clang accepts. This means that Clang fully understands and can represent them in the AST, but we produce diagnostics to tell the user their code is non-portable. The difference is that the former are ignored by default, and the latter warn by default. The WARNING severity is used for constructs that are valid in the currently selected source language but that are dubious in some way. The NOTE level is used to staple more information onto a previous diagnostic.

    + +

    These severities are mapped into a smaller set (the Diagnostic::Level enum, {Ignored, Note, Warning, Error}) of output levels by the diagnostics subsystem based on various configuration options. For example, if the user specifies -pedantic, EXTENSION maps to Warning; if they specify -pedantic-errors, it turns into Error. Clang also internally supports a fine-grained mapping mechanism that allows you to map any diagnostic that doesn't have ERROR severity to any output level that you want. This is used to implement options like -Wunused-macros, -Wundef, etc.
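The default mapping described above can be sketched as a small function. This is an illustration only: `MappingOptions` and `mapSeverity` are hypothetical names, the fine-grained per-diagnostic override mechanism is left out, and treating EXTWARN as an Error under -pedantic-errors is an assumption that the text above does not state.

```cpp
#include <cassert>

enum Severity { NOTE, WARNING, EXTENSION, EXTWARN, ERROR };
enum Level { Ignored, Note, Warning, Error };

// Hypothetical bundle of the two command line options mentioned above.
struct MappingOptions {
  bool pedantic = false;        // models -pedantic
  bool pedanticErrors = false;  // models -pedantic-errors
};

// Maps a severity from the .def file to an output level.
inline Level mapSeverity(Severity s, const MappingOptions &opts) {
  switch (s) {
  case NOTE:
    return Note;
  case WARNING:
    return Warning;
  case ERROR:
    return Error;  // ERROR severity can never be mapped to anything else
  case EXTENSION:
    if (opts.pedanticErrors)
      return Error;
    return opts.pedantic ? Warning : Ignored;  // ignored by default
  case EXTWARN:
    // Assumption: -pedantic-errors promotes EXTWARN as well.
    return opts.pedanticErrors ? Error : Warning;  // warns by default
  }
  return Error;  // not reached for valid enum values
}
```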

    + + +

    The Format String

    + + +

    The format string for the diagnostic is very simple, but it has some power. It takes the form of a string in English with markers that indicate where and how arguments to the diagnostic are inserted and formatted. For example, here are some simple format strings:

    + +
    +  "binary integer literals are an extension"
    +  "format string contains '\\0' within the string body"
    +  "more '%%' conversions than data arguments"
    +  "invalid operands to binary expression ('%0' and '%1')"
    +  "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator"
    +       " (has %1 parameter%s1)"
    +
    + +

    These examples show some important points of format strings. You can use any plain ASCII character in the diagnostic string except "%" without a problem, but these are C strings, so you have to use and be aware of all the C escape sequences (as in the second example). If you want to produce a "%" in the output, use the "%%" escape sequence, like the third diagnostic. Finally, clang uses the "%...[digit]" sequences to specify where and how arguments to the diagnostic are formatted.

    + +

    Arguments to the diagnostic are numbered according to how they are specified by the C++ code that produces them, and are referenced by %0 .. %9. If you have more than 10 arguments to your diagnostic, you are doing something wrong :). Unlike printf, there is no requirement that arguments to the diagnostic end up in the output in the same order as they are specified; you could have a format string with "%1 %0" that swaps them, for example. The text between the percent and the digit is a set of formatting instructions. If there are no instructions, the argument is just turned into a string and substituted in.
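The substitution rules above can be sketched in a few lines of C++. The helper below is hypothetical (`formatDiagnostic` is not a Clang function) and deliberately handles only the "%%" escape and plain "%0" .. "%9" markers with no formatting instructions. Arguments are assumed to have already been turned into strings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "%%" and plain "%<digit>" markers only.
// Formatting instructions between '%' and the digit are not handled here.
inline std::string formatDiagnostic(const std::string &format,
                                    const std::vector<std::string> &args) {
  std::string out;
  for (std::size_t i = 0; i < format.size(); ++i) {
    char c = format[i];
    if (c != '%' || i + 1 == format.size()) {
      out += c;  // ordinary character, or a lone '%' at the very end
      continue;
    }
    char next = format[++i];
    if (next == '%') {
      out += '%';  // "%%" produces a literal percent sign
    } else if (std::isdigit(static_cast<unsigned char>(next)) &&
               static_cast<std::size_t>(next - '0') < args.size()) {
      out += args[next - '0'];  // "%N" is replaced by argument N
    } else {
      // Unknown marker or missing argument: copy it through unchanged.
      out += '%';
      out += next;
    }
  }
  return out;
}
```

Since each marker names its argument by index, the format string is free to reorder arguments, which is what makes a "%1 %0" string work.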

    + +

    Here are some "best practices" for writing the English format string:

    + + + +

    Diagnostics should never take random English strings as arguments: you shouldn't use "you have a problem with %0" and pass in things like "your argument" or "your return value" as arguments. Doing this prevents translating the Clang diagnostics to other languages (because they'll get random English words in their otherwise localized diagnostic). The exceptions to this are C/C++ language keywords (e.g. auto, const, mutable, etc) and C/C++ operators (/=). Note that things like "pointer" and "reference" are not keywords. On the other hand, you can include anything that comes from the user's source code, including variable names, types, labels, etc.

    + + +

    Formatting a Diagnostic Argument

    + + +

    Arguments to diagnostics are fully typed internally, and come from a few different classes: integers, types, names, and random strings. Depending on the class of the argument, it can be optionally formatted in different ways. This gives the DiagnosticClient information about what the argument means without requiring it to use a specific presentation (consider this MVC for Clang :).

    + +

    Here are the different diagnostic argument formats currently supported by Clang:

    + + + + + + + + + + + + + + + + + + +
    "s" format
    Example: "requires %1 parameter%s1"
    Classes: Integers
    Description: This is a simple formatter for integers that is useful when producing English diagnostics. When the integer is 1, it prints as nothing. When the integer is not 1, it prints as "s". This allows some simple grammar to be handled correctly, and eliminates the need to use gross things like "rewrite %1 parameter(s)".
    "select" format
    Example: "must be a %select{unary|binary|unary or binary}2 operator"
    Classes: Integers
    Description: ...
    "plural" format
    Example: ".."
    Classes: Integers
    Description: ...
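As a rough illustration of the integer formatters, here is a small C++ sketch. `pluralS`, `selectAlternative`, and `describeOperator` are hypothetical helpers, not Clang functions. The "s" behavior follows the description above. The "select" behavior is an assumption on my part, since its description is not filled in yet: the integer is taken as a 0-based index into the '|' separated alternatives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "%s" formatter as described above: nothing for 1, "s" otherwise.
inline std::string pluralS(int n) { return n == 1 ? "" : "s"; }

// "%select" formatter under the assumed semantics: pick the alternative at
// the given 0-based index from a '|' separated list. Returns an empty string
// when the index is out of range.
inline std::string selectAlternative(const std::string &alternatives,
                                     unsigned index) {
  std::size_t begin = 0;
  for (unsigned i = 0; i < index; ++i) {
    std::size_t bar = alternatives.find('|', begin);
    if (bar == std::string::npos)
      return "";
    begin = bar + 1;
  }
  std::size_t end = alternatives.find('|', begin);
  return alternatives.substr(
      begin, end == std::string::npos ? std::string::npos : end - begin);
}

// Hand expansion of the format string from the earlier examples:
//   "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator"
//   " (has %1 parameter%s1)"
inline std::string describeOperator(const std::string &name, int numParams,
                                    unsigned kind) {
  return "overloaded '" + name + "' must be a " +
         selectAlternative("unary|binary|unary or binary", kind) +
         " operator (has " + std::to_string(numParams) + " parameter" +
         pluralS(numParams) + ")";
}
```

Passing integers rather than preformatted English keeps the wording decisions in the format string, in line with the best practice of never passing English text as an argument.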
    + + + + + +

    Producing the Diagnostic

    + + +

    SemaExpr.cpp example

    + + + +

    The DiagnosticClient Interface

    + + +

    Clang command line, buffering, HTMLizing, etc.

    + + +

    Adding Translations to Clang

    + + +

    Not possible yet!

    + +

    The SourceLocation and SourceManager classes

    @@ -367,7 +566,9 @@
    efficient way to query whether two types are structurally identical to each other, ignoring typedefs. The solution to both of these problems is the idea of canonical types.

    +

    Canonical Types

    +

    Every instance of the Type class contains a canonical type pointer. For simple types with no typedefs involved (e.g. "int", "int*",
    @@ -565,7 +766,9 @@
    useful for performing flow- or path-sensitive program analyses on a given function.

    +

    Basic Blocks

    +

    Concretely, an instance of CFG is a collection of basic blocks. Each basic block is an instance of CFGBlock, which
    @@ -587,7 +790,9 @@
    should be made on how CFGBlocks are numbered other than their numbers are unique and that they are numbered from 0..N-1 (where N is the number of basic blocks in the CFG).

    +

    Entry and Exit Blocks

    Each instance of CFG contains two special blocks: an entry block (accessible via CFG::getEntry()), which
    @@ -598,7 +803,9 @@
    clear entrance and exit for a body of code such as a function body. The presence of these empty blocks greatly simplifies the implementation of many analyses built on top of CFGs.

    Conditional Control-Flow

    +

    Conditional control-flow (such as those induced by if-statements and loops) is represented as edges between CFGBlocks.
    @@ -716,9 +923,9 @@
    block B4 (i.e., B4.2). In this manner, conditions for control-flow (which also includes conditions for loops and switch statements) are hoisted into the actual basic block.

    - + + +