   1 //===----------------------------------------------------------------------===//
   2 // C Language Family Front-end
   3 //===----------------------------------------------------------------------===//
   4                                                              Chris Lattner
   5
   6 I. Introduction:
   7
   8  clang: noun
   9     1. A loud, resonant, metallic sound.
  10     2. The strident call of a crane or goose.
  11     3. C-language family front-end toolkit.
  12
  13  The world needs better compiler tools, tools which are built as libraries. This
  14  design point allows reuse of the tools in new and novel ways. However, building
  15  the tools as libraries isn't enough: they must have clean APIs, be as
  16  decoupled from each other as possible, and be easy to modify/extend.  This
  17  requires clean layering, decent design, and avoiding tying the libraries to a
  18  specific use.  Oh yeah, did I mention that we want the resultant libraries to
  19  be as fast as possible? :)
  20
  21  This front-end is built as a component of the LLVM toolkit that can be used
  22  with the LLVM backend or independently of it.  In this spirit, the API has been
  23  carefully designed as the following components:
  24
  25    libsupport  - Basic support library, reused from LLVM.
  26    libsystem   - System abstraction library, reused from LLVM.
  27
  28    libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
  29                  file system caching for input source files.  This depends on
  30                  libsupport and libsystem.
  31    libast      - Provides classes to represent the C AST, the C type system,
  32                  builtin functions, and various helpers for analyzing and
  33                  manipulating the AST (visitors, pretty printers, etc).  This
  34                  library depends on libbasic.
  35
  36    liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
  37                  pragma handling, tokens, and macros.  This depends on libbasic.
  38    libparse    - C (for now) parsing and local semantic analysis. This library
  39                  invokes coarse-grained 'Actions' provided by the client to do
  40                  stuff (e.g. libsema builds ASTs).  This depends on liblex.
  41    libsema     - Provides a set of parser actions to build a standardized AST
  42                  for programs.  ASTs are 'streamed' out a top-level declaration
  43                  at a time, allowing clients to use decl-at-a-time processing,
  44                  build up entire translation units, or even build 'whole
  45                  program' ASTs depending on how they use the APIs.  This depends
  46                  on libast and libparse.
  47
  48    libcodegen  - Lower the AST to LLVM IR for optimization & codegen.  Depends
  49                  on libast.
  50    clang       - An example driver, client of the libraries at various levels.
  51                  This depends on all these libraries, and on LLVM VMCore.
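
The libparse entry above says the parser invokes coarse-grained 'Actions'
supplied by the client. A minimal sketch of that callback pattern follows,
assuming a toy grammar in which every ';'-terminated chunk is one top-level
declaration. All names here (Actions, parse_translation_unit, and so on) are
hypothetical and are not taken from the real libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table: the parser reports each top-level
   declaration to the client and builds nothing itself. */
typedef struct Actions {
    void *state;  /* client-owned data, passed back on every callback */
    void (*on_top_level_decl)(void *state, const char *text, size_t len);
} Actions;

/* Toy "parser": treats every ';'-terminated chunk as one top-level
   declaration and hands it to the client, one declaration at a time. */
void parse_translation_unit(const char *src, const Actions *actions) {
    assert(src != NULL && actions != NULL);
    const char *start = src;
    for (const char *p = src; *p != '\0'; ++p) {
        if (*p == ';') {
            if (actions->on_top_level_decl != NULL)
                actions->on_top_level_decl(actions->state, start,
                                           (size_t)(p - start + 1));
            start = p + 1;
        }
    }
}

/* Client 1: does nothing, so only the cost of parsing remains. */
void noop_decl(void *state, const char *text, size_t len) {
    (void)state; (void)text; (void)len;
}

/* Client 2: counts declarations, standing in for a client that
   would build an AST node for each one. */
void count_decl(void *state, const char *text, size_t len) {
    (void)text; (void)len;
    ++*(int *)state;
}

/* Convenience wrapper: runs the toy parser with the counting client. */
int count_top_level_decls(const char *src) {
    int count = 0;
    Actions actions = { &count, count_decl };
    parse_translation_unit(src, &actions);
    return count;
}
```

Swapping count_decl for noop_decl changes what the client does without
touching the parser, which is the kind of decoupling the list describes.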
  52
  53  This front-end has been intentionally built as a DAG of libraries, making it
  54  easy to reuse individual parts or replace pieces if desired. For example, to
  55  build a preprocessor, you take the Basic and Lexer libraries. If you want an
  56  indexer, you take those plus the Parser library and provide some actions for
  57  indexing.  If you want a refactoring, static analysis, or source-to-source
  58  compiler tool, it makes sense to take those plus the AST building and semantic
  59  analyzer library.  Finally, if you want to use this with the LLVM backend,
  60  you'd take these components plus the AST to LLVM lowering code.
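
The dependency notes in the component list can be written down as a small
graph, and the set of libraries a tool must link is then the transitive
closure of its starting point. The sketch below encodes only the dependencies
stated in the list above. The enum names, the bitmask encoding, and link_set
are assumptions made for illustration, and the example driver and LLVM VMCore
are left out.

```c
#include <assert.h>

/* One bit per library; the order is arbitrary. */
enum Lib { SUPPORT, SYSTEM, BASIC, AST, LEX, PARSE, SEMA, CODEGEN, NUM_LIBS };

#define BIT(lib) (1u << (lib))

/* Direct dependencies, as stated in the component list. */
static const unsigned direct_deps[NUM_LIBS] = {
    [SUPPORT] = 0,
    [SYSTEM]  = 0,
    [BASIC]   = BIT(SUPPORT) | BIT(SYSTEM),
    [AST]     = BIT(BASIC),
    [LEX]     = BIT(BASIC),
    [PARSE]   = BIT(LEX),
    [SEMA]    = BIT(AST) | BIT(PARSE),
    [CODEGEN] = BIT(AST),
};

/* Returns the library itself plus everything it depends on,
   directly or indirectly. The recursion terminates because the
   dependency graph is acyclic. */
unsigned link_set(enum Lib lib) {
    assert(lib < NUM_LIBS);
    unsigned set = BIT(lib);
    for (int d = 0; d < NUM_LIBS; ++d)
        if (direct_deps[lib] & BIT(d))
            set |= link_set((enum Lib)d);
    return set;
}
```

With this encoding, link_set(LEX) covers the preprocessor case from the
paragraph above, and link_set(SEMA) pulls in every library except the LLVM
lowering code.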
  61
  62  In the future I hope this toolkit will grow to include new and interesting
  63  components, including a C++ front-end, ObjC support, and a whole lot of other
  64  things.
  65
  66  Finally, it should be pointed out that the goal here is to build something that
  67  is high-quality and industrial-strength: all the obnoxious features of the C
  68  family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
  69  prototypes, GCC/MS extensions, etc).  It cannot be used if it is not 'real'.
  70
  71
  72 II. Usage of clang driver:
  73
  74  * Basic Command-Line Options:
  75    - Help: clang --help
  76    - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
  77    - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
  78    - Enable metric printing: -stats
  79
  80  * -fsyntax-only is currently the default mode.
  81
  82  * -E mode works the same way as GCC.
  83
  84  * -Eonly mode does all preprocessing, but does not print the output,
  85      useful for timing the preprocessor.
  86
  87  * -fsyntax-only is currently partially implemented, lacking some
  88      semantic analysis (some errors and warnings are not produced).
  89
  90  * -parse-noop parses code without building an AST.  This is useful
  91      for timing the cost of the parser without including AST building
  92      time.
  93
  94  * -parse-ast builds ASTs, but doesn't print them.  This is most
  95      useful for timing AST building vs -parse-noop.
  96
  97  * -parse-ast-print pretty-prints most expression and statement nodes.
  98
  99  * -parse-ast-check checks that diagnostic messages that are expected
 100      are reported and that those which are reported are expected.
 101
 102  * -dump-cfg builds ASTs and then CFGs.  CFGs are then pretty-printed.
 103
 104  * -view-cfg builds ASTs and then CFGs.  CFGs are then visualized by
 105      invoking Graphviz.
 106
 107      For more information on getting Graphviz to work with clang/LLVM,
 108      see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
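
The -parse-ast-check bullet above describes a comparison that runs in both
directions: every expected diagnostic must be reported, and every reported
diagnostic must be expected. The sketch below shows only that comparison over
two lists of message strings. How the driver actually collects the two lists
is not described here, so the function names and the exact-match rule are
assumptions, and repeated messages are not counted separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many strings in a[] have no exact match in b[]. */
size_t count_missing(const char *const *a, size_t na,
                     const char *const *b, size_t nb) {
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);
    size_t missing = 0;
    for (size_t i = 0; i < na; ++i) {
        int found = 0;
        for (size_t j = 0; j < nb && !found; ++j)
            found = strcmp(a[i], b[j]) == 0;
        if (!found)
            ++missing;
    }
    return missing;
}

/* Nonzero only when both directions are clean: nothing expected is
   missing from the report, and nothing reported is unexpected. */
int diagnostics_match(const char *const *expected, size_t num_expected,
                      const char *const *reported, size_t num_reported) {
    return count_missing(expected, num_expected, reported, num_reported) == 0 &&
           count_missing(reported, num_reported, expected, num_expected) == 0;
}
```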
 109
 110
 111 III. Current advantages over GCC:
 112
 113  * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
 114  * All diagnostics have column numbers and include 'caret diagnostics', which
 115    highlight regions of interesting code (e.g. the LHS and RHS of a binop).
 116  * Full diagnostic customization by client (can format diagnostics however they
 117    like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
 118  * Built as a framework, can be reused by multiple tools.
 119  * All languages supported linked into same library (no cc1, cc1obj, ...).
 120  * mmaps code read-only; unlike GCC, does not dirty the pages (mem footprint).
 121  * LLVM License, can be linked into non-GPL projects.
 122  * Full diagnostic control, per diagnostic.  Diagnostics are identified by ID.
 123  * Significantly faster than GCC at semantic analysis, parsing, preprocessing
 124    and lexing.
 125  * Defers exposing platform-specific stuff to as late as possible, tracks use of
 126    platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
 127  * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
 128    does not categorize identifiers as types or variables -- this is up to the
 129    parser to decide.
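
The last bullet is easier to see with a concrete case. In C, the token
sequence "(T)*y" is a cast of a dereference when T names a type and a
multiplication when T names a variable, so identifiers can only be classified
with scope information. The functions below are an illustration written for
this note, not code from the front-end.

```c
#include <assert.h>

typedef int T;

/* Here T names the typedef, so "(T)*p" is a cast applied to "*p". */
int cast_of_deref(void) {
    int x = 3;
    int *p = &x;
    return (T)*p;
}

/* Here a local variable T shadows the typedef, so the same token
   shape "(T)*y" multiplies T by y. */
int product(void) {
    int T = 4;
    int y = 5;
    return (T)*y;
}

/* Both readings, side by side. */
void check_both_readings(void) {
    assert(cast_of_deref() == 3);
    assert(product() == 20);
}
```

A lexer with no notion of scope can hand both occurrences of T to the parser
as plain identifiers and leave the choice to the parser, as the bullet says.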
 130
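The caret diagnostics bullet in the list above mentions highlighting regions
such as the LHS and RHS of a binary operator. A rough sketch of how a client
might render one follows, given a source line, a caret column, and a set of
column ranges. The Range type, the '~' and '^' markers, and the function name
are assumptions, and the layout is not claimed to match the driver's output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical half-open column range [begin, end), 0-based. */
typedef struct Range { size_t begin, end; } Range;

/* Writes "<line>\n<markers>\n" into out. The marker row has '~' under
   each highlighted range and '^' at the caret column, with trailing
   spaces trimmed. Returns 0 on success, or -1 if out is too small or
   a column lies outside the line. */
int format_caret_diag(const char *line, size_t caret,
                      const Range *ranges, size_t num_ranges,
                      char *out, size_t out_size) {
    assert(line != NULL && out != NULL);
    size_t len = strlen(line);
    if (caret >= len || out_size < 2 * len + 3)
        return -1;
    for (size_t i = 0; i < num_ranges; ++i)
        if (ranges[i].begin > ranges[i].end || ranges[i].end > len)
            return -1;

    char *markers = out + len + 1;  /* marker row starts after "line\n" */
    memcpy(out, line, len);
    out[len] = '\n';
    memset(markers, ' ', len);
    for (size_t i = 0; i < num_ranges; ++i)
        memset(markers + ranges[i].begin, '~',
               ranges[i].end - ranges[i].begin);
    markers[caret] = '^';

    size_t mlen = len;              /* trim trailing spaces */
    while (mlen > 0 && markers[mlen - 1] == ' ')
        --mlen;
    markers[mlen] = '\n';
    markers[mlen + 1] = '\0';
    return 0;
}
```

For the line "  return a + b;" with the caret on the '+' (column 11) and
ranges covering a and b, the marker row puts '~' under each operand and '^'
under the operator.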
 131 Potential Future Features:
 132
 133  * Fine-grained diag control within the source (#pragma enable/disable warning).
 134  * Better token tracking within macros?  (Token came from this line, which is
 135    a macro argument instantiated here, recursively instantiated here).
 136  * Fast #import with a module system.
 137  * Dependency tracking: a change to a header file doesn't recompile every
 138    function that textually depends on it, only those functions that need it.
 139    This is aka 'incremental parsing'.
 140
 141
 142 IV. Missing Functionality / Improvements
 143
 144 clang driver:
 145  * Include search paths are hard-coded into the driver.  Doh.
 146
 147 File Manager:
 148  * Reduce syscalls for reduced compile time, see NOTES.txt.
 149
 150 Lexer:
 151  * Source character mapping.  GCC supports ASCII and UTF-8.
 152    See GCC options: -ftarget-charset and -ftarget-wide-charset.
 153  * Universal character support.  Experimental in GCC, enabled with
 154    -fextended-identifiers.
 155  * -fpreprocessed mode.
 156
 157 Preprocessor:
 158  * Know about Apple header maps.
 159  * #assert/#unassert
 160  * #line / #file directives (currently accepted and ignored).
 161  * MSExtension: "L#param" stringizes to a wide string literal.
 162  * Charize extension: "#define F(o) #@o  F(a)"  -> 'a'.
 163  * Consider merging the parser's expression parser into the preprocessor to
 164    eliminate duplicate code.
 165  * Add support for -M*
 166
 167 Traditional Preprocessor:
 168  * Currently, we have none. :)
 169
 170 Parser:
 171  * C90/K&R modes are only partially implemented.
 172  * __extension__ is currently just skipped and ignored.
 173
 174 Semantic Analysis:
 175  * Perhaps 85% done.
 176
 177 LLVM Code Gen:
 178  * Most of the easy stuff is done, probably 64.9% done so far.
 179