From 62fd278ff94d1df43652ec30a48fe02bb598e68e Mon Sep 17 00:00:00 2001
From: Chris Lattner
We describe the roles of these classes in order of their dependencies.
+ + +The Clang Diagnostics subsystem is an important part of how the compiler +communicates with the human. Diagnostics are the warnings and errors produced +when the code is incorrect or dubious. In Clang, each diagnostic produced has +(at the minimum) a unique ID, a SourceLocation to +"put the caret", an English translation associated with it, and a severity (e.g. +WARNING or ERROR). They can also optionally include a number +of arguments to the diagnostic (which fill in "%0"'s in the string) as well as a +number of source ranges that relate to the diagnostic.
+ +In this section, we'll be giving examples produced by the clang command line +driver, but diagnostics can be rendered in many +different ways depending on how the DiagnosticClient interface is +implemented. A representative example of a diagnostic is:
+ ++t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float') + P = (P-42) + Gamma*4; + ~~~~~~ ^ ~~~~~~~ ++ +
In this example, you can see the English translation, the severity (error), +the source location (the caret ("^") and file/line/column info), +the source ranges ("~~~~"), and the arguments to the diagnostic ('int *' and '_Complex +float'). You'll have to believe me that there is a unique ID backing the +diagnostic :).
+ +Getting all of this to happen has several steps and involves many moving +pieces. This section describes them and talks about best practices when adding +a new diagnostic.
+ + +Diagnostics are created by adding an entry to the DiagnosticKinds.def file. This file encodes the unique ID of the +diagnostic (as an enum, the first argument), the severity of the diagnostic +(second argument) and the English translation + format string.
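The three-part entry described above (unique ID, severity, format string) fits the usual "X-macro" style of .def file. The following is only a minimal sketch of that idea, not the actual contents or macro of DiagnosticKinds.def: the DIAG macro name, the EXAMPLE_DIAGNOSTICS list, the two entry names, and the getSeverity/getFormatString helpers are all hypothetical and chosen for illustration. The severity names are the ones listed in this section.

```cpp
#include <cassert>
#include <string>

// Severity classes named in this section of the document.
enum Severity { NOTE, WARNING, EXTENSION, EXTWARN, ERROR };

// Stand-in for the contents of a .def file: one DIAG(id, severity, text)
// entry per diagnostic. The entry names are illustrative assumptions.
#define EXAMPLE_DIAGNOSTICS(DIAG)                                  \
  DIAG(ext_binary_literal, EXTENSION,                              \
       "binary integer literals are an extension")                 \
  DIAG(err_invalid_operands, ERROR,                                \
       "invalid operands to binary expression ('%0' and '%1')")

// First expansion: the first argument of each entry becomes an enumerator.
enum DiagID {
#define DIAG(ID, SEV, TEXT) ID,
  EXAMPLE_DIAGNOSTICS(DIAG)
#undef DIAG
  NUM_DIAGNOSTICS
};

// Second expansion: severity and format string go into a parallel table.
struct DiagInfo {
  Severity Sev;
  const char *Text;
};

static const DiagInfo DiagTable[] = {
#define DIAG(ID, SEV, TEXT) {SEV, TEXT},
  EXAMPLE_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Hypothetical lookup helpers indexed by the generated enum.
inline Severity getSeverity(DiagID ID) {
  assert(ID < NUM_DIAGNOSTICS && "invalid diagnostic ID");
  return DiagTable[ID].Sev;
}

inline std::string getFormatString(DiagID ID) {
  assert(ID < NUM_DIAGNOSTICS && "invalid diagnostic ID");
  return DiagTable[ID].Text;
}
```

Keeping all three pieces in one entry means the enum and the table cannot drift out of sync, which is the main appeal of this layout.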
+ +There is little sanity in the naming of the unique IDs right now. Some +start with err_, warn_, ext_ to encode the severity into the name. Since the +enum is referenced in the C++ code that produces the diagnostic, it is somewhat +useful for it to be reasonably short.
+ +The severity of the diagnostic comes from the set {NOTE, +WARNING, EXTENSION, EXTWARN, ERROR}. The +ERROR severity is used for diagnostics indicating the program is never +acceptable under any circumstances. When an error is emitted, the AST for the +input code may not be fully built. The EXTENSION and EXTWARN +severities are used for extensions to the language that Clang accepts. This +means that Clang fully understands and can represent them in the AST, but we +produce diagnostics to tell the user their code is non-portable. The difference +is that the former are ignored by default, and the latter warn by default. The +WARNING severity is used for constructs that are valid in the currently +selected source language but that are dubious in some way. The NOTE +level is used to staple more information onto a previous diagnostic.
+ +These severities are mapped into a smaller set (the +Diagnostic::Level enum, {Ignored, Note, Warning, +Error }) of output levels by the diagnostics subsystem based +on various configuration options. For example, if the user specifies +-pedantic, EXTENSION maps to Warning; if they specify +-pedantic-errors, it turns into Error. Clang also internally +supports a fully fine-grained mapping mechanism that allows you to map any +diagnostic that doesn't have ERROR severity to any output level that +you want. This is used to implement options like -Wunused_macros, +-Wundef etc.
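The default mapping from severity to output level described above can be sketched as a single function. This is a simplified illustration, not Clang's implementation: the MappingOptions struct and mapSeverity function are hypothetical names, the per-diagnostic fine-grained mapping is left out, and the handling of EXTWARN under -pedantic-errors is an assumption, since the text only states what happens to EXTENSION.

```cpp
#include <cassert>

// Severity classes and output levels as named in the text.
enum class Severity { NOTE, WARNING, EXTENSION, EXTWARN, ERROR };
enum class Level { Ignored, Note, Warning, Error };

// Hypothetical stand-in for the relevant command line options.
struct MappingOptions {
  bool Pedantic = false;        // -pedantic
  bool PedanticErrors = false;  // -pedantic-errors
};

// Maps a diagnostic's severity class to the level it is reported at.
inline Level mapSeverity(Severity S, const MappingOptions &Opts) {
  switch (S) {
  case Severity::NOTE:
    return Level::Note;
  case Severity::WARNING:
    return Level::Warning;
  case Severity::ERROR:
    return Level::Error;
  case Severity::EXTENSION:
    // Ignored by default; -pedantic warns, -pedantic-errors errors.
    if (Opts.PedanticErrors)
      return Level::Error;
    return Opts.Pedantic ? Level::Warning : Level::Ignored;
  case Severity::EXTWARN:
    // Warns by default. Assumption (not stated in the text): it is also
    // promoted to an error under -pedantic-errors.
    return Opts.PedanticErrors ? Level::Error : Level::Warning;
  }
  assert(false && "unknown severity");
  return Level::Error;
}
```

Note that ERROR has no configurable branch here, matching the statement that only diagnostics without ERROR severity can be remapped.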
+ + +The format string for the diagnostic is very simple, but it has some power. +It takes the form of a string in English with markers that indicate where and +how arguments to the diagnostic are inserted and formatted. For example, here +are some simple format strings:
+ ++ "binary integer literals are an extension" + "format string contains '\\0' within the string body" + "more '%%' conversions than data arguments" + "invalid operands to binary expression ('%0' and '%1')" + "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator" + " (has %1 parameter%s1)" ++ +
These examples show some important points of format strings. You can use any + plain ASCII character in the diagnostic string except "%" without a problem, + but these are C strings, so you have to use and be aware of all the C escape + sequences (as in the second example). If you want to produce a "%" in the + output, use the "%%" escape sequence, like the third diagnostic. Finally, + clang uses the "%...[digit]" sequences to specify where and how arguments to + the diagnostic are formatted.
+ +Arguments to the diagnostic are numbered according to how they are specified + by the C++ code that produces them, and are + referenced by %0 .. %9. If you have more than 10 arguments + to your diagnostic, you are doing something wrong :). Unlike printf, there + is no requirement that arguments to the diagnostic end up in the output in + the same order as they are specified; you could have a format string with + "%1 %0" that swaps them, for example. The text in between the + percent and digit is the formatting instructions. If there are no instructions, + the argument is just turned into a string and substituted in.
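To make the substitution rules concrete, here is a minimal sketch of an expander that handles only "%%" and the plain "%0" .. "%9" markers. The function name formatDiagnostic and its signature are hypothetical, arguments are assumed to be already converted to strings, and markers with formatting instructions between the percent and the digit are copied through unchanged rather than interpreted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "%%" and "%0".."%9" in Format using Args.
inline std::string formatDiagnostic(const std::string &Format,
                                    const std::vector<std::string> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Format.size(); ++I) {
    char C = Format[I];
    // Ordinary characters (and a trailing lone '%') are copied as-is.
    if (C != '%' || I + 1 == Format.size()) {
      Out += C;
      continue;
    }
    char Next = Format[++I];
    if (Next == '%') {
      // "%%" produces a single literal percent sign.
      Out += '%';
    } else if (std::isdigit(static_cast<unsigned char>(Next))) {
      // "%N" is replaced by argument N, whatever its position in the text.
      std::size_t Index = static_cast<std::size_t>(Next - '0');
      assert(Index < Args.size() && "format string references a missing argument");
      Out += Args[Index];
    } else {
      // Formatting instructions are out of scope for this sketch.
      Out += '%';
      Out += Next;
    }
  }
  return Out;
}
```

Because the arguments are looked up by index, a format string such as "%1 %0" emits the second argument before the first without any change to the calling code.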
+ +Here are some "best practices" for writing the English format string:
+ +Diagnostics should never take random English strings as arguments: you +shouldn't use "you have a problem with %0" and pass in things like +"your argument" or "your return value" as arguments. Doing +this prevents translating the Clang diagnostics to +other languages (because they'll get random English words in their otherwise +localized diagnostic). The exceptions to this are C/C++ language keywords +(e.g. auto, const, mutable, etc) and C/C++ operators (/=). Note +that things like "pointer" and "reference" are not keywords. On the other +hand, you can include anything that comes from the user's source code, +including variable names, types, labels, etc.
+ + +Arguments to diagnostics are fully typed internally, and come from a couple +different classes: integers, types, names, and random strings. Depending on +the class of the argument, it can be optionally formatted in different ways. +This gives the DiagnosticClient information about what the argument means +without requiring it to use a specific presentation (consider this MVC for +Clang :).
+ +Here are the different diagnostic argument formats currently supported by +Clang:
+ +"s" format | |
Example: | "requires %1 parameter%s1" |
Classes: | Integers |
Description: | This is a simple formatter for integers that is + useful when producing English diagnostics. When the integer is 1, it prints + as nothing. When the integer is not 1, it prints as "s". This allows some + simple grammar to be handled correctly, and eliminates the need to use + gross things like "rewrite %1 parameter(s)". |
"select" format | |
Example: | "must be a %select{unary|binary|unary or binary}2 + operator" |
Classes: | Integers |
Description: | ... |
"plural" format | |
Example: | ".." |
Classes: | Integers |
Description: | ... |
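The "s" format in the table above is simple enough to sketch in a few lines. The helper below is a hypothetical illustration, not Clang's formatter: formatWithPluralS is an invented name, all arguments are assumed to be integers, and only "%%", "%N" and "%sN" are recognized. The "select" and "plural" formats are deliberately not implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "%N" to the integer argument N and "%sN" to
// "s" unless argument N equals 1, in which case it expands to nothing.
inline std::string formatWithPluralS(const std::string &Format,
                                     const std::vector<int> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Format.size(); ++I) {
    if (Format[I] != '%') {
      Out += Format[I];
      continue;
    }
    std::size_t J = I + 1;
    if (J < Format.size() && Format[J] == '%') {
      Out += '%';  // "%%" escape
      I = J;
      continue;
    }
    // An optional 's' between the percent and the digit selects the "s" format.
    bool PluralS = J < Format.size() && Format[J] == 's';
    if (PluralS)
      ++J;
    if (J < Format.size() && std::isdigit(static_cast<unsigned char>(Format[J]))) {
      std::size_t Index = static_cast<std::size_t>(Format[J] - '0');
      assert(Index < Args.size() && "format string references a missing argument");
      if (PluralS) {
        if (Args[Index] != 1)
          Out += 's';
      } else {
        Out += std::to_string(Args[Index]);
      }
      I = J;
    } else {
      Out += '%';  // unrecognized marker: keep the percent literally
    }
  }
  return Out;
}
```

With the table's example string "requires %1 parameter%s1", the same argument drives both the number and the suffix, so the two can never disagree.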
SemaExpr.cpp example
+ + + +Clang command line, buffering, HTMLizing, etc.
+ + +Not possible yet!
+ +Every instance of the Type class contains a canonical type pointer. For simple types with no typedefs involved (e.g. "int", "int*", @@ -565,7 +766,9 @@ useful for performing flow- or path-sensitive program analyses on a given function.
+Concretely, an instance of CFG is a collection of basic blocks. Each basic block is an instance of CFGBlock, which @@ -587,7 +790,9 @@ should be made on how CFGBlocks are numbered other than their numbers are unique and that they are numbered from 0..N-1 (where N is the number of basic blocks in the CFG).
+Conditional control-flow (such as those induced by if-statements and loops) is represented as edges between CFGBlocks. @@ -716,9 +923,9 @@ block B4 (i.e., B4.2). In this manner, conditions for control-flow (which also includes conditions for loops and switch statements) are hoisted into the actual basic block.
- + + +