Convert PDB docs to unix line endings. No other changes.

author Nico Weber <nicolasweber@gmx.de>

Wed, 1 May 2019 19:15:05 +0000 (19:15 +0000)

committer Nico Weber <nicolasweber@gmx.de>

Wed, 1 May 2019 19:15:05 +0000 (19:15 +0000)
diff --git a/docs/PDB/GlobalStream.rst b/docs/PDB/GlobalStream.rst

index 314b9f01ffa8bddcb80b6846a5dea520bd2f91d3..dcc99ae3a0e2368b399de6136ffc92ed9fac7774 100644 (file)
--- a/docs/PDB/GlobalStream.rst
+++ b/docs/PDB/GlobalStream.rst
@@ -1,3 +1,3 @@
-=====================================\r
-The PDB Global Symbol Stream\r
-=====================================\r
+=====================================
+The PDB Global Symbol Stream
+=====================================
diff --git a/docs/PDB/HashTable.rst b/docs/PDB/HashTable.rst

index c296f3efadb71200ba01a971f3f11aa4558d508d..eb7abbd51075673a07f81cbf637af2fc10c41bf3 100644 (file)
--- a/docs/PDB/HashTable.rst
+++ b/docs/PDB/HashTable.rst
@@ -1,103 +1,103 @@
-The PDB Serialized Hash Table Format\r
-====================================\r
-\r
-.. contents::\r
-   :local:\r
-\r
-.. _hash_intro:\r
-\r
-Introduction\r
-============\r
-\r
-One of the design goals of the PDB format is to provide accelerated access to\r
-debug information, and for this reason there are several occasions where hash\r
-tables are serialized and embedded directly to the file, rather than requiring\r
-a consumer to read a list of values and reconstruct the hash table on the fly.\r
-\r
-The serialization format supports hash tables of arbitrarily large size and\r
-capacity, as well as value types and hash functions.  The only supported key\r
-value type is a uint32.  The only requirement is that the producer and consumer\r
-agree on the hash function.  As such, the hash function can is not discussed\r
-further in this document, it is assumed that for a particular instance of a PDB\r
-file hash table, the appropriate hash function is being used.\r
-\r
-On-Disk Format\r
-==============\r
-\r
-.. code-block:: none\r
-\r
-  .--------------------.-- +0\r
-  |        Size        |\r
-  .--------------------.-- +4\r
-  |      Capacity      |\r
-  .--------------------.-- +8\r
-  | Present Bit Vector |\r
-  .--------------------.-- +N\r
-  | Deleted Bit Vector |\r
-  .--------------------.-- +M                  ─╮\r
-  |        Key         |                        │\r
-  .--------------------.-- +M+4                 │\r
-  |       Value        |                        │\r
-  .--------------------.-- +M+4+sizeof(Value)   │\r
-           ...                                  ├─ |Capacity| Bucket entries\r
-  .--------------------.                        │\r
-  |        Key         |                        │\r
-  .--------------------.                        │\r
-  |       Value        |                        │\r
-  .--------------------.                       ─╯\r
-\r
-- **Size** - The number of values contained in the hash table.\r
-  \r
-- **Capacity** - The number of buckets in the hash table.  Producers should\r
-  maintain a load factor of no greater than ``2/3*Capacity+1``.\r
-  \r
-- **Present Bit Vector** - A serialized bit vector which contains information\r
-  about which buckets have valid values.  If the bucket has a value, the\r
-  corresponding bit will be set, and if the bucket doesn't have a value (either\r
-  because the bucket is empty or because the value is a tombstone value) the bit\r
-  will be unset.\r
-  \r
-- **Deleted Bit Vector** - A serialized bit vector which contains information\r
-  about which buckets have tombstone values.  If the entry in this bucket is\r
-  deleted, the bit will be set, otherwise it will be unset.\r
-\r
-- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first\r
-  entry is the key (always a uint32), and the second entry is the value.  The\r
-  state of each bucket (valid, empty, deleted) can be determined by examining\r
-  the present and deleted bit vectors.\r
-\r
-\r
-.. _hash_bit_vectors:\r
-\r
-Present and Deleted Bit Vectors\r
-===============================\r
-\r
-The bit vectors indicating the status of each bucket are serialized as follows:\r
-\r
-.. code-block:: none\r
-\r
-  .--------------------.-- +0\r
-  |     Word Count     |\r
-  .--------------------.-- +4\r
-  |        Word_0      |        ─╮\r
-  .--------------------.-- +8    │\r
-  |        Word_1      |         │\r
-  .--------------------.-- +12   ├─ |Word Count| values\r
-           ...                   │\r
-  .--------------------.         │\r
-  |       Word_N       |         │\r
-  .--------------------.        ─╯\r
-\r
-The words, when viewed as a contiguous block of bytes, represent a bit vector with\r
-the following layout:\r
-\r
-.. code-block:: none\r
-\r
-    .------------.         .------------.------------.\r
-    |   Word_N   |   ...   |   Word_1   |   Word_0   |\r
-    .------------.         .------------.------------.\r
-    |            |         |            |            |\r
-  +N*32      +(N-1)*32    +64          +32          +0\r
-\r
-where the k'th bit of this bit vector represents the status of the k'th bucket\r
-in the hash table.\r
+The PDB Serialized Hash Table Format
+====================================
+
+.. contents::
+   :local:
+
+.. _hash_intro:
+
+Introduction
+============
+
+One of the design goals of the PDB format is to provide accelerated access to
+debug information, and for this reason there are several occasions where hash
+tables are serialized and embedded directly to the file, rather than requiring
+a consumer to read a list of values and reconstruct the hash table on the fly.
+
+The serialization format supports hash tables of arbitrarily large size and
+capacity, as well as value types and hash functions.  The only supported key
+value type is a uint32.  The only requirement is that the producer and consumer
+agree on the hash function.  As such, the hash function can is not discussed
+further in this document, it is assumed that for a particular instance of a PDB
+file hash table, the appropriate hash function is being used.
+
+On-Disk Format
+==============
+
+.. code-block:: none
+
+  .--------------------.-- +0
+  |        Size        |
+  .--------------------.-- +4
+  |      Capacity      |
+  .--------------------.-- +8
+  | Present Bit Vector |
+  .--------------------.-- +N
+  | Deleted Bit Vector |
+  .--------------------.-- +M                  ─╮
+  |        Key         |                        │
+  .--------------------.-- +M+4                 │
+  |       Value        |                        │
+  .--------------------.-- +M+4+sizeof(Value)   │
+           ...                                  ├─ |Capacity| Bucket entries
+  .--------------------.                        │
+  |        Key         |                        │
+  .--------------------.                        │
+  |       Value        |                        │
+  .--------------------.                       ─╯
+
+- **Size** - The number of values contained in the hash table.
+  
+- **Capacity** - The number of buckets in the hash table.  Producers should
+  maintain a load factor of no greater than ``2/3*Capacity+1``.
+  
+- **Present Bit Vector** - A serialized bit vector which contains information
+  about which buckets have valid values.  If the bucket has a value, the
+  corresponding bit will be set, and if the bucket doesn't have a value (either
+  because the bucket is empty or because the value is a tombstone value) the bit
+  will be unset.
+  
+- **Deleted Bit Vector** - A serialized bit vector which contains information
+  about which buckets have tombstone values.  If the entry in this bucket is
+  deleted, the bit will be set, otherwise it will be unset.
+
+- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
+  entry is the key (always a uint32), and the second entry is the value.  The
+  state of each bucket (valid, empty, deleted) can be determined by examining
+  the present and deleted bit vectors.
+
+
+.. _hash_bit_vectors:
+
+Present and Deleted Bit Vectors
+===============================
+
+The bit vectors indicating the status of each bucket are serialized as follows:
+
+.. code-block:: none
+
+  .--------------------.-- +0
+  |     Word Count     |
+  .--------------------.-- +4
+  |        Word_0      |        ─╮
+  .--------------------.-- +8    │
+  |        Word_1      |         │
+  .--------------------.-- +12   ├─ |Word Count| values
+           ...                   │
+  .--------------------.         │
+  |       Word_N       |         │
+  .--------------------.        ─╯
+
+The words, when viewed as a contiguous block of bytes, represent a bit vector with
+the following layout:
+
+.. code-block:: none
+
+    .------------.         .------------.------------.
+    |   Word_N   |   ...   |   Word_1   |   Word_0   |
+    .------------.         .------------.------------.
+    |            |         |            |            |
+  +N*32      +(N-1)*32    +64          +32          +0
+
+where the k'th bit of this bit vector represents the status of the k'th bucket
+in the hash table.
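
Note on HashTable.rst: the bucket-state rule described in the document above can be sketched as follows. This is a minimal illustration, not LLVM's implementation. The names ``testBit``, ``bucketState`` and ``BucketState`` are hypothetical, the words are assumed to be already decoded into host-order ``uint32_t`` values, and treating bits past the end of a vector as unset is an assumption the document does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not taken from LLVM.
enum class BucketState { Empty, Valid, Deleted };

// Returns true if bit K is set in a bit vector stored as 32-bit words, where
// bit 0 is the lowest bit of Word_0. Bits beyond the stored words are treated
// as unset (an assumption of this sketch).
inline bool testBit(const std::vector<uint32_t> &Words, size_t K) {
  size_t WordIndex = K / 32;
  if (WordIndex >= Words.size())
    return false;
  return ((Words[WordIndex] >> (K % 32)) & 1u) != 0;
}

// Classifies bucket K from the present and deleted bit vectors: a present bit
// means a valid value, a deleted bit means a tombstone, otherwise empty.
inline BucketState bucketState(const std::vector<uint32_t> &Present,
                               const std::vector<uint32_t> &Deleted,
                               size_t K) {
  if (testBit(Present, K))
    return BucketState::Valid;
  if (testBit(Deleted, K))
    return BucketState::Deleted;
  return BucketState::Empty;
}
```

A reader would call ``bucketState(Present, Deleted, K)`` for each ``K`` below ``Capacity`` and only interpret the key/value pair of buckets reported as valid.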
diff --git a/docs/PDB/ModiStream.rst b/docs/PDB/ModiStream.rst

index 7e500bd921c6fec934e0610e52ea35f58e6efaed..8104c5308b7bde47ca02d213059f5eb6b046837e 100644 (file)
--- a/docs/PDB/ModiStream.rst
+++ b/docs/PDB/ModiStream.rst
@@ -1,80 +1,80 @@
-=====================================\r
-The Module Information Stream\r
-=====================================\r
-\r
-.. contents::\r
-   :local:\r
-\r
-.. _modi_stream_intro:\r
-\r
-Introduction\r
-============\r
-\r
-The Module Info Stream (henceforth referred to as the Modi stream) contains\r
-information about a single module (object file, import library, etc that\r
-contributes to the binary this PDB contains debug information about.  There\r
-is one modi stream for each module, and the mapping between modi stream index\r
-and module is contained in the :doc:`DBI Stream <DbiStream>`.  The modi stream\r
-for a single module contains line information for the compiland, as well as\r
-all CodeView information for the symbols defined in the compiland.  Finally,\r
-there is a "global refs" substream which is not well understood.\r
-\r
-.. _modi_stream_layout:\r
-\r
-Stream Layout\r
-=============\r
-\r
-A modi stream is laid out as follows:\r
-\r
-\r
-.. code-block:: c++\r
-\r
-  struct ModiStream {\r
-    uint32_t Signature;\r
-    uint8_t Symbols[SymbolSize-4];\r
-    uint8_t C11LineInfo[C11Size];\r
-    uint8_t C13LineInfo[C13Size];\r
-    \r
-    uint32_t GlobalRefsSize;\r
-    uint8_t GlobalRefs[GlobalRefsSize];\r
-  };\r
-\r
-- **Signature** - Unknown.  In practice only the value of ``4`` has been\r
-  observed.  It is hypothesized that this value corresponds to the set of\r
-  ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``\r
-  meaning that this module has C13 line information (as opposed to C11 line\r
-  information).  A corollary of this is that we expect to only ever see\r
-  C13 line info, and that we do not understand the format of C11 line info.\r
-  \r
-- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.\r
-  ``SymbolSize`` is equal to the value of ``SymByteSize`` for the\r
-  corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`\r
-  of the :doc:`DBI Stream <DbiStream>`.\r
-\r
-- **C11LineInfo** - A block containing CodeView line information in C11\r
-  format.  ``C11Size`` is equal to the value of ``C11ByteSize`` from the\r
-  :ref:`Module Info Substream <dbi_mod_info_substream>` of the\r
-  :doc:`DBI Stream <DbiStream>`.  If this value is ``0``, then C11 line\r
-  information is not present.  As mentioned previously, the format of\r
-  C11 line info is not understood and we assume all line in modern PDBs\r
-  to be in C13 format.\r
-  \r
-- **C13LineInfo** - A block containing CodeView line information in C13\r
-  format.  ``C13Size`` is equal to the value of ``C13ByteSize`` from the\r
-  :ref:`Module Info Substream <dbi_mod_info_substream>` of the\r
-  :doc:`DBI Stream <DbiStream>`.  If this value is ``0``, then C13 line\r
-  information is not present.\r
-  \r
-- **GlobalRefs** - The meaning of this substream is not understood.\r
-\r
-.. _modi_symbol_substream:\r
-\r
-The CodeView Symbol Substream\r
-=============================\r
-\r
-The CodeView Symbol Substream.  This is an array of variable length\r
-records describing the functions, variables, inlining information,\r
-and other symbols defined in the compiland.  The entire array consumes\r
-``SymbolSize-4`` bytes.  The format of a CodeView Symbol Record (and\r
-thusly, an array of CodeView Symbol Records) is described in\r
-:doc:`CodeViewSymbols`.\r
+=====================================
+The Module Information Stream
+=====================================
+
+.. contents::
+   :local:
+
+.. _modi_stream_intro:
+
+Introduction
+============
+
+The Module Info Stream (henceforth referred to as the Modi stream) contains
+information about a single module (object file, import library, etc that
+contributes to the binary this PDB contains debug information about.  There
+is one modi stream for each module, and the mapping between modi stream index
+and module is contained in the :doc:`DBI Stream <DbiStream>`.  The modi stream
+for a single module contains line information for the compiland, as well as
+all CodeView information for the symbols defined in the compiland.  Finally,
+there is a "global refs" substream which is not well understood.
+
+.. _modi_stream_layout:
+
+Stream Layout
+=============
+
+A modi stream is laid out as follows:
+
+
+.. code-block:: c++
+
+  struct ModiStream {
+    uint32_t Signature;
+    uint8_t Symbols[SymbolSize-4];
+    uint8_t C11LineInfo[C11Size];
+    uint8_t C13LineInfo[C13Size];
+    
+    uint32_t GlobalRefsSize;
+    uint8_t GlobalRefs[GlobalRefsSize];
+  };
+
+- **Signature** - Unknown.  In practice only the value of ``4`` has been
+  observed.  It is hypothesized that this value corresponds to the set of
+  ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
+  meaning that this module has C13 line information (as opposed to C11 line
+  information).  A corollary of this is that we expect to only ever see
+  C13 line info, and that we do not understand the format of C11 line info.
+  
+- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
+  ``SymbolSize`` is equal to the value of ``SymByteSize`` for the
+  corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`
+  of the :doc:`DBI Stream <DbiStream>`.
+
+- **C11LineInfo** - A block containing CodeView line information in C11
+  format.  ``C11Size`` is equal to the value of ``C11ByteSize`` from the
+  :ref:`Module Info Substream <dbi_mod_info_substream>` of the
+  :doc:`DBI Stream <DbiStream>`.  If this value is ``0``, then C11 line
+  information is not present.  As mentioned previously, the format of
+  C11 line info is not understood and we assume all line in modern PDBs
+  to be in C13 format.
+  
+- **C13LineInfo** - A block containing CodeView line information in C13
+  format.  ``C13Size`` is equal to the value of ``C13ByteSize`` from the
+  :ref:`Module Info Substream <dbi_mod_info_substream>` of the
+  :doc:`DBI Stream <DbiStream>`.  If this value is ``0``, then C13 line
+  information is not present.
+  
+- **GlobalRefs** - The meaning of this substream is not understood.
+
+.. _modi_symbol_substream:
+
+The CodeView Symbol Substream
+=============================
+
+The CodeView Symbol Substream.  This is an array of variable length
+records describing the functions, variables, inlining information,
+and other symbols defined in the compiland.  The entire array consumes
+``SymbolSize-4`` bytes.  The format of a CodeView Symbol Record (and
+thusly, an array of CodeView Symbol Records) is described in
+:doc:`CodeViewSymbols`.
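
Note on ModiStream.rst: the ``ModiStream`` struct above implies fixed offsets for each substream once the three byte sizes from the DBI stream are known. The sketch below only does that arithmetic. ``ModiLayout`` and ``computeModiLayout`` are hypothetical names, and rejecting a ``SymByteSize`` smaller than 4 is an assumption of this sketch (the document does not say how such a value should be handled).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper type; offsets are relative to the start of the modi
// stream and use 64 bits so the sums cannot overflow.
struct ModiLayout {
  uint64_t SymbolsOffset;        // first byte after the 4-byte Signature
  uint64_t SymbolsSize;          // SymbolSize - 4
  uint64_t C11Offset;            // start of C11LineInfo
  uint64_t C13Offset;            // start of C13LineInfo
  uint64_t GlobalRefsSizeOffset; // the uint32_t GlobalRefsSize field
  uint64_t GlobalRefsOffset;     // start of the GlobalRefs bytes
};

// Computes substream offsets from SymByteSize, C11ByteSize and C13ByteSize.
// Returns std::nullopt when SymByteSize cannot hold the signature.
inline std::optional<ModiLayout> computeModiLayout(uint32_t SymByteSize,
                                                   uint32_t C11ByteSize,
                                                   uint32_t C13ByteSize) {
  if (SymByteSize < 4)
    return std::nullopt;
  ModiLayout L;
  L.SymbolsOffset = 4;
  L.SymbolsSize = static_cast<uint64_t>(SymByteSize) - 4;
  L.C11Offset = SymByteSize;
  L.C13Offset = L.C11Offset + C11ByteSize;
  L.GlobalRefsSizeOffset = L.C13Offset + C13ByteSize;
  L.GlobalRefsOffset = L.GlobalRefsSizeOffset + 4;
  return L;
}
```

With ``C11ByteSize`` equal to ``0``, ``C11Offset`` and ``C13Offset`` coincide, which matches the statement that C11 line information is then not present.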
diff --git a/docs/PDB/MsfFile.rst b/docs/PDB/MsfFile.rst

index dfbbf9ded7fbee90b06604666d7b300a4bb716eb..a53ebe3e884f461e089fcada4197a96fb7e78a7f 100644 (file)
--- a/docs/PDB/MsfFile.rst
+++ b/docs/PDB/MsfFile.rst
@@ -1,179 +1,179 @@
-=====================================\r
-The MSF File Format\r
-=====================================\r
-\r
-.. contents::\r
-   :local:\r
-\r
-.. _msf_layout:\r
-\r
-File Layout\r
-===========\r
-\r
-The MSF file format consists of the following components:\r
-\r
-1. :ref:`msf_superblock`\r
-2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)\r
-3. Data\r
-\r
-Each component is stored as an indexed block, the length of which is specified\r
-in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the\r
-following pattern (sometimes referred to as an "interval"):\r
-\r
-1. 1 block of data\r
-2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)\r
-3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)\r
-4. ``SuperBlock::BlockSize - 3`` blocks of data\r
-\r
-In the first interval, the first data block is used to store\r
-:ref:`msf_superblock`.\r
-\r
-The following diagram demonstrates the general layout of the file (\| denotes\r
-the end of an interval, and is for visualization purposes only):\r
-\r
-+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+\r
-| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |\r
-+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+\r
-| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |\r
-+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+\r
-\r
-The file may end after any block, including immediately after a FPM1.\r
-\r
-.. note::\r
-  LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"\r
-  variant), so the rest of this document will assume a block size of 4096.\r
-\r
-.. _msf_superblock:\r
-\r
-The Superblock\r
-==============\r
-At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as\r
-follows:\r
-\r
-.. code-block:: c++\r
-\r
-  struct SuperBlock {\r
-    char FileMagic[sizeof(Magic)];\r
-    ulittle32_t BlockSize;\r
-    ulittle32_t FreeBlockMapBlock;\r
-    ulittle32_t NumBlocks;\r
-    ulittle32_t NumDirectoryBytes;\r
-    ulittle32_t Unknown;\r
-    ulittle32_t BlockMapAddr;\r
-  };\r
-\r
-- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``\r
-  followed by the bytes ``1A 44 53 00 00 00``.\r
-- **BlockSize** - The block size of the internal file system.  Valid values are\r
-  512, 1024, 2048, and 4096 bytes.  Certain aspects of the MSF file layout vary\r
-  depending on the block sizes.  For the purposes of LLVM, we handle only block\r
-  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.\r
-- **FreeBlockMapBlock** - The index of a block within the file, at which begins\r
-  a bitfield representing the set of all blocks within the file which are "free"\r
-  (i.e. the data within that block is not used).  See :ref:`msf_freeblockmap` for\r
-  more information.\r
-  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!\r
-- **NumBlocks** - The total number of blocks in the file.  ``NumBlocks * BlockSize``\r
-  should equal the size of the file on disk.\r
-- **NumDirectoryBytes** - The size of the stream directory, in bytes.  The stream\r
-  directory contains information about each stream's size and the set of blocks\r
-  that it occupies.  It will be described in more detail later.\r
-- **BlockMapAddr** - The index of a block within the MSF file.  At this block is\r
-  an array of ``ulittle32_t``'s listing the blocks that the stream directory\r
-  resides on.  For large MSF files, the stream directory (which describes the\r
-  block layout of each stream) may not fit entirely on a single block.  As a\r
-  result, this extra layer of indirection is introduced, whereby this block\r
-  contains the list of blocks that the stream directory occupies, and the stream\r
-  directory itself can be stitched together accordingly.  The number of\r
-  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.\r
-\r
-.. _msf_freeblockmap:\r
-\r
-The Free Block Map\r
-==================\r
-\r
-The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a\r
-series of blocks which contains a bit flag for every block in the file. The\r
-flag will be set to 0 if the block is in use, and 1 if the block is unused.\r
-\r
-Each file contains two FPMs, one of which is active at any given time. This\r
-feature is designed to support incremental and atomic updates of the underlying\r
-MSF file. While writing to an MSF file, if the active FPM is FPM1, you can\r
-write your new modified bitfield to FPM2, and vice versa. Only when you commit\r
-the file to disk do you need to swap the value in the SuperBlock to point to\r
-the new ``FreeBlockMapBlock``.\r
-\r
-The Free Block Maps are stored as a series of single blocks thoughout the file\r
-at intervals of BlockSize. Because each FPM block is of size ``BlockSize``\r
-bytes, it contains 8 times as many bits as an interval has blocks. This means\r
-that the first block of each FPM refers to the first 8 intervals of the file\r
-(the first 32768 blocks), the second block of each FPM refers to the next 8\r
-blocks, and so on. This results in far more FPM blocks being present than are\r
-required, but in order to maintain backwards compatibility the format must stay\r
-this way.\r
-\r
-The Stream Directory\r
-====================\r
-The Stream Directory is the root of all access to the other streams in an MSF\r
-file.  Beginning at byte 0 of the stream directory is the following structure:\r
-\r
-.. code-block:: c++\r
-\r
-  struct StreamDirectory {\r
-    ulittle32_t NumStreams;\r
-    ulittle32_t StreamSizes[NumStreams];\r
-    ulittle32_t StreamBlocks[NumStreams][];\r
-  };\r
-\r
-And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.\r
-Note that each of the last two arrays is of variable length, and in particular\r
-that the second array is jagged.\r
-\r
-**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4\r
-streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.\r
-\r
-Stream 0: ceil(1000 / 4096) = 1 block\r
-\r
-Stream 1: ceil(8000 / 4096) = 2 blocks\r
-\r
-Stream 2: ceil(16000 / 4096) = 4 blocks\r
-\r
-Stream 3: ceil(9000 / 4096) = 3 blocks\r
-\r
-In total, 10 blocks are used.  Let's see what the stream directory might look\r
-like:\r
-\r
-.. code-block:: c++\r
-\r
-  struct StreamDirectory {\r
-    ulittle32_t NumStreams = 4;\r
-    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};\r
-    ulittle32_t StreamBlocks[][] = {\r
-      {4},\r
-      {5, 6},\r
-      {11, 9, 7, 8},\r
-      {10, 15, 12}\r
-    };\r
-  };\r
-\r
-In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``\r
-would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one\r
-``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.\r
-\r
-Note also that the streams are discontiguous, and that part of stream 3 is in the\r
-middle of part of stream 2.  You cannot assume anything about the layout of the\r
-blocks!\r
-\r
-Alignment and Block Boundaries\r
-==============================\r
-As may be clear by now, it is possible for a single field (whether it be a high\r
-level record, a long string field, or even a single ``uint16``) to begin and\r
-end in separate blocks.  For example, if the block size is 4096 bytes, and a\r
-``uint16`` field begins at the last byte of the current block, then it would\r
-need to end on the first byte of the next block.  Since blocks are not\r
-necessarily contiguously laid out in the file, this means that both the consumer\r
-and the producer of an MSF file must be prepared to split data apart\r
-accordingly.  In the aforementioned example, the high byte of the ``uint16``\r
-would be written to the last byte of block N, and the low byte would be written\r
-to the first byte of block N+1, which could be tens of thousands of bytes later\r
-(or even earlier!) in the file, depending on what the stream directory says.\r
+=====================================
+The MSF File Format
+=====================================
+
+.. contents::
+   :local:
+
+.. _msf_layout:
+
+File Layout
+===========
+
+The MSF file format consists of the following components:
+
+1. :ref:`msf_superblock`
+2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
+3. Data
+
+Each component is stored as an indexed block, the length of which is specified
+in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
+following pattern (sometimes referred to as an "interval"):
+
+1. 1 block of data
+2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
+3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
+4. ``SuperBlock::BlockSize - 3`` blocks of data
+
+In the first interval, the first data block is used to store
+:ref:`msf_superblock`.
+
+The following diagram demonstrates the general layout of the file (\| denotes
+the end of an interval, and is for visualization purposes only):
+
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
++=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
+| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+
+The file may end after any block, including immediately after a FPM1.
+
+.. note::
+  LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
+  variant), so the rest of this document will assume a block size of 4096.
+
+.. _msf_superblock:
+
+The Superblock
+==============
+At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
+follows:
+
+.. code-block:: c++
+
+  struct SuperBlock {
+    char FileMagic[sizeof(Magic)];
+    ulittle32_t BlockSize;
+    ulittle32_t FreeBlockMapBlock;
+    ulittle32_t NumBlocks;
+    ulittle32_t NumDirectoryBytes;
+    ulittle32_t Unknown;
+    ulittle32_t BlockMapAddr;
+  };
+
+- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
+  followed by the bytes ``1A 44 53 00 00 00``.
+- **BlockSize** - The block size of the internal file system.  Valid values are
+  512, 1024, 2048, and 4096 bytes.  Certain aspects of the MSF file layout vary
+  depending on the block sizes.  For the purposes of LLVM, we handle only block
+  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
+- **FreeBlockMapBlock** - The index of a block within the file, at which begins
+  a bitfield representing the set of all blocks within the file which are "free"
+  (i.e. the data within that block is not used).  See :ref:`msf_freeblockmap` for
+  more information.
+  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
+- **NumBlocks** - The total number of blocks in the file.  ``NumBlocks * BlockSize``
+  should equal the size of the file on disk.
+- **NumDirectoryBytes** - The size of the stream directory, in bytes.  The stream
+  directory contains information about each stream's size and the set of blocks
+  that it occupies.  It will be described in more detail later.
+- **BlockMapAddr** - The index of a block within the MSF file.  At this block is
+  an array of ``ulittle32_t``'s listing the blocks that the stream directory
+  resides on.  For large MSF files, the stream directory (which describes the
+  block layout of each stream) may not fit entirely on a single block.  As a
+  result, this extra layer of indirection is introduced, whereby this block
+  contains the list of blocks that the stream directory occupies, and the stream
+  directory itself can be stitched together accordingly.  The number of
+  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
+
+.. _msf_freeblockmap:
+
+The Free Block Map
+==================
+
+The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
+series of blocks which contains a bit flag for every block in the file. The
+flag will be set to 0 if the block is in use, and 1 if the block is unused.
+
+Each file contains two FPMs, one of which is active at any given time. This
+feature is designed to support incremental and atomic updates of the underlying
+MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
+write your new modified bitfield to FPM2, and vice versa. Only when you commit
+the file to disk do you need to swap the value in the SuperBlock to point to
+the new ``FreeBlockMapBlock``.
+
+The Free Block Maps are stored as a series of single blocks thoughout the file
+at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
+bytes, it contains 8 times as many bits as an interval has blocks. This means
+that the first block of each FPM refers to the first 8 intervals of the file
+(the first 32768 blocks), the second block of each FPM refers to the next 8
+blocks, and so on. This results in far more FPM blocks being present than are
+required, but in order to maintain backwards compatibility the format must stay
+this way.
+
+The Stream Directory
+====================
+The Stream Directory is the root of all access to the other streams in an MSF
+file.  Beginning at byte 0 of the stream directory is the following structure:
+
+.. code-block:: c++
+
+  struct StreamDirectory {
+    ulittle32_t NumStreams;
+    ulittle32_t StreamSizes[NumStreams];
+    ulittle32_t StreamBlocks[NumStreams][];
+  };
+
+And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
+Note that each of the last two arrays is of variable length, and in particular
+that the second array is jagged.
+
+**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
+streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
+
+Stream 0: ceil(1000 / 4096) = 1 block
+
+Stream 1: ceil(8000 / 4096) = 2 blocks
+
+Stream 2: ceil(16000 / 4096) = 4 blocks
+
+Stream 3: ceil(9000 / 4096) = 3 blocks
+
+In total, 10 blocks are used.  Let's see what the stream directory might look
+like:
+
+.. code-block:: c++
+
+  struct StreamDirectory {
+    ulittle32_t NumStreams = 4;
+    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
+    ulittle32_t StreamBlocks[][] = {
+      {4},
+      {5, 6},
+      {11, 9, 7, 8},
+      {10, 15, 12}
+    };
+  };
+
+In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
+would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
+``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
+
+Note also that the streams are discontiguous, and that part of stream 3 is in the
+middle of part of stream 2.  You cannot assume anything about the layout of the
+blocks!
+
+Alignment and Block Boundaries
+==============================
+As may be clear by now, it is possible for a single field (whether it be a high
+level record, a long string field, or even a single ``uint16``) to begin and
+end in separate blocks.  For example, if the block size is 4096 bytes, and a
+``uint16`` field begins at the last byte of the current block, then it would
+need to end on the first byte of the next block.  Since blocks are not
+necessarily contiguously laid out in the file, this means that both the consumer
+and the producer of an MSF file must be prepared to split data apart
+accordingly.  In the aforementioned example, the low byte of the ``uint16``
+would be written to the last byte of block N, and the high byte would be written
+to the first byte of block N+1, which could be tens of thousands of bytes later
+(or even earlier!) in the file, depending on what the stream directory says.
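Annotation on the section above: a minimal C++ sketch of how a consumer might read a field that straddles two blocks. The names ``readStreamBytes`` and ``readU16`` are hypothetical helpers, not LLVM APIs. The sketch assumes the whole MSF file is already in memory and that the stream's block list has been read from the stream directory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `size` bytes starting at `offset` within a stream
// into `out`, following the stream's block list one block at a time.
// `file` is the entire MSF file; `blocks` holds the stream's block indices in
// stream order, as listed in the stream directory.
inline void readStreamBytes(const std::vector<uint8_t> &file,
                            const std::vector<uint32_t> &blocks,
                            uint32_t blockSize, uint64_t offset,
                            uint8_t *out, size_t size) {
  while (size > 0) {
    size_t blockInStream = static_cast<size_t>(offset / blockSize);
    size_t offsetInBlock = static_cast<size_t>(offset % blockSize);
    assert(blockInStream < blocks.size() && "read past end of stream");

    // Never copy past the end of the current block; the next block may live
    // anywhere in the file.
    size_t chunk = std::min<size_t>(size, blockSize - offsetInBlock);
    size_t fileOffset =
        static_cast<size_t>(blocks[blockInStream]) * blockSize + offsetInBlock;
    assert(fileOffset + chunk <= file.size() && "block lies outside the file");

    std::memcpy(out, file.data() + fileOffset, chunk);
    out += chunk;
    offset += chunk;
    size -= chunk;
  }
}

// Hypothetical helper: read a little-endian uint16 at a stream offset, even if
// its two bytes sit in different blocks.
inline uint16_t readU16(const std::vector<uint8_t> &file,
                        const std::vector<uint32_t> &blocks,
                        uint32_t blockSize, uint64_t offset) {
  uint8_t bytes[2];
  readStreamBytes(file, blocks, blockSize, offset, bytes, sizeof(bytes));
  return static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
}
```

For instance, with a toy block size of 4 and a stream whose block list is {2, 0}, reading a ``uint16`` at stream offset 3 takes its first byte from the end of file block 2 and its second byte from the start of file block 0, which is earlier in the file.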
diff --git a/docs/PDB/PublicStream.rst b/docs/PDB/PublicStream.rst

index 5b413cfb886590a516b2f68239d6c7061def0e34..7c860c266eeab45b310a1ffb0acc5b51b45b4c95 100644 (file)
--- a/docs/PDB/PublicStream.rst
+++ b/docs/PDB/PublicStream.rst
@@ -1,3 +1,3 @@
-=====================================\r
-The PDB Public Symbol Stream\r
-=====================================\r
+=====================================
+The PDB Public Symbol Stream
+=====================================
diff --git a/docs/PDB/TpiStream.rst b/docs/PDB/TpiStream.rst

index 91429b8b90a4b22f62349ac60335fe20f00816e3..314f688d108cabd493cce73d529eeab8d11f2bfe 100644 (file)
--- a/docs/PDB/TpiStream.rst
+++ b/docs/PDB/TpiStream.rst
@@ -1,312 +1,312 @@
-=====================================\r
-The PDB TPI and IPI Streams\r
-=====================================\r
-\r
-.. contents::\r
-   :local:\r
-\r
-.. _tpi_intro:\r
-\r
-Introduction\r
-============\r
-\r
-The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about\r
-all types used in the program.  Each is organized as a :ref:`header <tpi_header>`\r
-followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`.  Types are\r
-referenced from various streams and records throughout the PDB by their\r
-:ref:`type index <type_indices>`.  In general, the sequence of type records\r
-following the :ref:`header <tpi_header>` forms a topologically sorted DAG\r
-(directed acyclic graph), which means that a type record B can only refer to\r
-the type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where\r
-this property will not hold (particularly when dealing with object files\r
-compiled with MASM), an implementation should try very hard to make this\r
-property hold, as it means the entire type graph can be constructed in a single\r
-pass.\r
-\r
-.. important::\r
-   Type records form a topologically sorted DAG (directed acyclic graph).\r
-   \r
-.. _tpi_ipi:\r
-\r
-TPI vs IPI Stream\r
-=================\r
-\r
-Recent versions of the PDB format (aka all versions covered by this document)\r
-have 2 streams with identical layout, henceforth referred to as the TPI stream\r
-and IPI stream.  Subsequent contents of this document describing the on-disk\r
-format apply equally whether it is for the TPI Stream or the IPI Stream.  The\r
-only difference between the two is in *which* CodeView records are allowed to\r
-appear in each one, summarized by the following table:\r
-\r
-+----------------------+---------------------+\r
-|    TPI Stream        |    IPI Stream       |\r
-+======================+=====================+\r
-|  LF_POINTER          | LF_FUNC_ID          |\r
-+----------------------+---------------------+\r
-|  LF_MODIFIER         | LF_MFUNC_ID         |\r
-+----------------------+---------------------+\r
-|  LF_PROCEDURE        | LF_BUILDINFO        |\r
-+----------------------+---------------------+\r
-|  LF_MFUNCTION        | LF_SUBSTR_LIST      |\r
-+----------------------+---------------------+\r
-|  LF_LABEL            | LF_STRING_ID        |\r
-+----------------------+---------------------+\r
-|  LF_ARGLIST          | LF_UDT_SRC_LINE     |\r
-+----------------------+---------------------+\r
-|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |\r
-+----------------------+---------------------+\r
-|  LF_ARRAY            |                     |\r
-+----------------------+---------------------+\r
-|  LF_CLASS            |                     |\r
-+----------------------+---------------------+\r
-|  LF_STRUCTURE        |                     |\r
-+----------------------+---------------------+\r
-|  LF_INTERFACE        |                     |\r
-+----------------------+---------------------+\r
-|  LF_UNION            |                     |\r
-+----------------------+---------------------+\r
-|  LF_ENUM             |                     |\r
-+----------------------+---------------------+\r
-|  LF_TYPESERVER2      |                     |\r
-+----------------------+---------------------+\r
-|  LF_VFTABLE          |                     |\r
-+----------------------+---------------------+\r
-|  LF_VTSHAPE          |                     |\r
-+----------------------+---------------------+\r
-|  LF_BITFIELD         |                     |\r
-+----------------------+---------------------+\r
-|  LF_METHODLIST       |                     |\r
-+----------------------+---------------------+\r
-|  LF_PRECOMP          |                     |\r
-+----------------------+---------------------+\r
-|  LF_ENDPRECOMP       |                     |\r
-+----------------------+---------------------+\r
-\r
-The usage of these records is described in more detail in\r
-:doc:`CodeView Type Records <CodeViewTypes>`.\r
-\r
-.. _type_indices:\r
-\r
-Type Indices\r
-============\r
-\r
-A type index is a 32-bit integer that uniquely identifies a type inside of an\r
-object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The\r
-value of the type index for the first type record from the TPI stream is given\r
-by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`\r
-although in practice this value is always equal to 0x1000 (4096).\r
-\r
-Any type index with a high bit set is considered to come from the IPI stream,\r
-although this appears to be more of a hack, and LLVM does not generate type\r
-indices of this nature.  They can, however, be observed in Microsoft PDBs\r
-occasionally, so one should be prepared to handle them.  Note that having the\r
-high bit set is not a necessary condition for a type index to come from the IPI\r
-stream; it is only a sufficient one.\r
-\r
-Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed\r
-to come from the appropriate stream, and any type index less than this is a\r
-bitmask which can be decomposed as follows:\r
-\r
-.. code-block:: none\r
-\r
-  .---------------------------.------.----------.\r
-  |           Unused          | Mode |   Kind   |\r
-  '---------------------------'------'----------'\r
-  |+32                        |+12   |+8        |+0\r
-\r
-\r
-- **Kind** - A value from the following enum:\r
-\r
-.. code-block:: c++\r
-\r
-  enum class SimpleTypeKind : uint32_t {\r
-    None = 0x0000,          // uncharacterized type (no type)\r
-    Void = 0x0003,          // void\r
-    NotTranslated = 0x0007, // type not translated by cvpack\r
-    HResult = 0x0008,       // OLE/COM HRESULT\r
-\r
-    SignedCharacter = 0x0010,   // 8 bit signed\r
-    UnsignedCharacter = 0x0020, // 8 bit unsigned\r
-    NarrowCharacter = 0x0070,   // really a char\r
-    WideCharacter = 0x0071,     // wide char\r
-    Character16 = 0x007a,       // char16_t\r
-    Character32 = 0x007b,       // char32_t\r
-\r
-    SByte = 0x0068,       // 8 bit signed int\r
-    Byte = 0x0069,        // 8 bit unsigned int\r
-    Int16Short = 0x0011,  // 16 bit signed\r
-    UInt16Short = 0x0021, // 16 bit unsigned\r
-    Int16 = 0x0072,       // 16 bit signed int\r
-    UInt16 = 0x0073,      // 16 bit unsigned int\r
-    Int32Long = 0x0012,   // 32 bit signed\r
-    UInt32Long = 0x0022,  // 32 bit unsigned\r
-    Int32 = 0x0074,       // 32 bit signed int\r
-    UInt32 = 0x0075,      // 32 bit unsigned int\r
-    Int64Quad = 0x0013,   // 64 bit signed\r
-    UInt64Quad = 0x0023,  // 64 bit unsigned\r
-    Int64 = 0x0076,       // 64 bit signed int\r
-    UInt64 = 0x0077,      // 64 bit unsigned int\r
-    Int128Oct = 0x0014,   // 128 bit signed int\r
-    UInt128Oct = 0x0024,  // 128 bit unsigned int\r
-    Int128 = 0x0078,      // 128 bit signed int\r
-    UInt128 = 0x0079,     // 128 bit unsigned int\r
-\r
-    Float16 = 0x0046,                 // 16 bit real\r
-    Float32 = 0x0040,                 // 32 bit real\r
-    Float32PartialPrecision = 0x0045, // 32 bit PP real\r
-    Float48 = 0x0044,                 // 48 bit real\r
-    Float64 = 0x0041,                 // 64 bit real\r
-    Float80 = 0x0042,                 // 80 bit real\r
-    Float128 = 0x0043,                // 128 bit real\r
-\r
-    Complex16 = 0x0056,                 // 16 bit complex\r
-    Complex32 = 0x0050,                 // 32 bit complex\r
-    Complex32PartialPrecision = 0x0055, // 32 bit PP complex\r
-    Complex48 = 0x0054,                 // 48 bit complex\r
-    Complex64 = 0x0051,                 // 64 bit complex\r
-    Complex80 = 0x0052,                 // 80 bit complex\r
-    Complex128 = 0x0053,                // 128 bit complex\r
-\r
-    Boolean8 = 0x0030,   // 8 bit boolean\r
-    Boolean16 = 0x0031,  // 16 bit boolean\r
-    Boolean32 = 0x0032,  // 32 bit boolean\r
-    Boolean64 = 0x0033,  // 64 bit boolean\r
-    Boolean128 = 0x0034, // 128 bit boolean\r
-  };\r
-\r
-- **Mode** - A value from the following enum:\r
-\r
-.. code-block:: c++\r
-\r
-  enum class SimpleTypeMode : uint32_t {\r
-    Direct = 0,        // Not a pointer\r
-    NearPointer = 1,   // Near pointer\r
-    FarPointer = 2,    // Far pointer\r
-    HugePointer = 3,   // Huge pointer\r
-    NearPointer32 = 4, // 32 bit near pointer\r
-    FarPointer32 = 5,  // 32 bit far pointer\r
-    NearPointer64 = 6, // 64 bit near pointer\r
-    NearPointer128 = 7 // 128 bit near pointer\r
-  };\r
-  \r
-Note that for pointers, the bitness is represented in the mode.  So a ``void*``\r
-would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits\r
-but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.\r
-\r
-By convention, the type index for ``std::nullptr_t`` is constructed the same way\r
-as the type index for ``void*``, but using the bitless enumeration value\r
-``NearPointer``.\r
-\r
-\r
-\r
-.. _tpi_header:\r
-\r
-Stream Header\r
-=============\r
-At offset 0 of the TPI Stream is a header with the following layout:\r
-\r
-\r
-.. code-block:: c++\r
-\r
-  struct TpiStreamHeader {\r
-    uint32_t Version;\r
-    uint32_t HeaderSize;\r
-    uint32_t TypeIndexBegin;\r
-    uint32_t TypeIndexEnd;\r
-    uint32_t TypeRecordBytes;\r
-\r
-    uint16_t HashStreamIndex;\r
-    uint16_t HashAuxStreamIndex;\r
-    uint32_t HashKeySize;\r
-    uint32_t NumHashBuckets;\r
-\r
-    int32_t HashValueBufferOffset;\r
-    uint32_t HashValueBufferLength;\r
-    \r
-    int32_t IndexOffsetBufferOffset;\r
-    uint32_t IndexOffsetBufferLength;\r
-\r
-    int32_t HashAdjBufferOffset;\r
-    uint32_t HashAdjBufferLength;\r
-  };\r
-  \r
-- **Version** - A value from the following enum.\r
-\r
-.. code-block:: c++\r
-\r
-  enum class TpiStreamVersion : uint32_t {\r
-    V40 = 19950410,\r
-    V41 = 19951122,\r
-    V50 = 19961031,\r
-    V70 = 19990903,\r
-    V80 = 20040203,\r
-  };\r
-\r
-Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be\r
-``V80``, and no other values have been observed.  It is assumed that should\r
-another value be observed, the layout described by this document may not be\r
-accurate.\r
-\r
-- **HeaderSize** - ``sizeof(TpiStreamHeader)``\r
-  \r
-- **TypeIndexBegin** - The numeric value of the type index representing the\r
-  first type record in the TPI stream.  This is usually the value 0x1000 as type\r
-  indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for\r
-  a discussion of reserved type indices).\r
-  \r
-- **TypeIndexEnd** - One greater than the numeric value of the type index\r
-  representing the last type record in the TPI stream.  The total number of type\r
-  records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.\r
-  \r
-- **TypeRecordBytes** - The number of bytes of type record data following the header.\r
-  \r
-- **HashStreamIndex** - The index of a stream which contains a list of hashes for\r
-  every type record.  This value may be -1, indicating that hash information is not\r
-  present.  In practice a valid stream index is always observed, so any producer\r
-  implementation should be prepared to emit this stream to ensure compatibility with\r
-  tools which may expect it to be present.\r
-  \r
-- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate\r
-  hash table, although this has not been observed in practice and it's unclear what it\r
-  might be used for.\r
-  \r
-- **HashKeySize** - The size of a hash value (usually 4 bytes).\r
-\r
-- **NumHashBuckets** - The number of buckets used to generate the hash values in the\r
-  aforementioned hash streams.\r
-\r
-- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within\r
-  the TPI Hash Stream of the list of hash values.  It should be assumed that there\r
-  are either 0 hash values, or a number equal to the number of type records in the\r
-  TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if ``HashValueBufferLength``\r
-  is not equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize`` we can consider the\r
-  PDB malformed.\r
-\r
-- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size\r
-  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list of\r
-  pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`\r
-  and the second value is the offset in the type record data of the type with this\r
-  index.  This can be used to do a binary search followed by a linear search to\r
-  get amortized O(log n) lookup by type index.\r
-\r
-- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within\r
-  the TPI hash stream of a serialized hash table whose keys are the hash values\r
-  in the hash value buffer and whose values are type indices.  This appears to\r
-  be useful in incremental linking scenarios, so that if a type is modified an\r
-  entry can be created mapping the old hash value to the new type index so that\r
-  a PDB file consumer can always have the most up to date version of the type\r
-  without forcing the incremental linker to garbage collect and update\r
-  references that point to the old version to now point to the new version.\r
-  The layout of this hash table is described in :doc:`HashTable`.\r
-\r
-.. _tpi_records:\r
-\r
-CodeView Type Record List\r
-=========================\r
-Following the header, there are ``TypeRecordBytes`` bytes of data that represent a\r
-variable length array of :doc:`CodeView type records <CodeViewTypes>`.  The number\r
-of such records (i.e. the length of the array) can be determined by computing the\r
-value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.\r
-\r
-log(n) random access is provided by way of the Type Index Offsets array (if present)\r
-described previously.
-\ No newline at end of file
+=====================================
+The PDB TPI and IPI Streams
+=====================================
+
+.. contents::
+   :local:
+
+.. _tpi_intro:
+
+Introduction
+============
+
+The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
+all types used in the program.  Each is organized as a :ref:`header <tpi_header>`
+followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`.  Types are
+referenced from various streams and records throughout the PDB by their
+:ref:`type index <type_indices>`.  In general, the sequence of type records
+following the :ref:`header <tpi_header>` forms a topologically sorted DAG
+(directed acyclic graph), which means that a type record B can only refer to
+the type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where
+this property will not hold (particularly when dealing with object files
+compiled with MASM), an implementation should try very hard to make this
+property hold, as it means the entire type graph can be constructed in a single
+pass.
+
+.. important::
+   Type records form a topologically sorted DAG (directed acyclic graph).
+   
+.. _tpi_ipi:
+
+TPI vs IPI Stream
+=================
+
+Recent versions of the PDB format (aka all versions covered by this document)
+have 2 streams with identical layout, henceforth referred to as the TPI stream
+and IPI stream.  Subsequent contents of this document describing the on-disk
+format apply equally whether it is for the TPI Stream or the IPI Stream.  The
+only difference between the two is in *which* CodeView records are allowed to
+appear in each one, summarized by the following table:
+
++----------------------+---------------------+
+|    TPI Stream        |    IPI Stream       |
++======================+=====================+
+|  LF_POINTER          | LF_FUNC_ID          |
++----------------------+---------------------+
+|  LF_MODIFIER         | LF_MFUNC_ID         |
++----------------------+---------------------+
+|  LF_PROCEDURE        | LF_BUILDINFO        |
++----------------------+---------------------+
+|  LF_MFUNCTION        | LF_SUBSTR_LIST      |
++----------------------+---------------------+
+|  LF_LABEL            | LF_STRING_ID        |
++----------------------+---------------------+
+|  LF_ARGLIST          | LF_UDT_SRC_LINE     |
++----------------------+---------------------+
+|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |
++----------------------+---------------------+
+|  LF_ARRAY            |                     |
++----------------------+---------------------+
+|  LF_CLASS            |                     |
++----------------------+---------------------+
+|  LF_STRUCTURE        |                     |
++----------------------+---------------------+
+|  LF_INTERFACE        |                     |
++----------------------+---------------------+
+|  LF_UNION            |                     |
++----------------------+---------------------+
+|  LF_ENUM             |                     |
++----------------------+---------------------+
+|  LF_TYPESERVER2      |                     |
++----------------------+---------------------+
+|  LF_VFTABLE          |                     |
++----------------------+---------------------+
+|  LF_VTSHAPE          |                     |
++----------------------+---------------------+
+|  LF_BITFIELD         |                     |
++----------------------+---------------------+
+|  LF_METHODLIST       |                     |
++----------------------+---------------------+
+|  LF_PRECOMP          |                     |
++----------------------+---------------------+
+|  LF_ENDPRECOMP       |                     |
++----------------------+---------------------+
+
+The usage of these records is described in more detail in
+:doc:`CodeView Type Records <CodeViewTypes>`.
+
+.. _type_indices:
+
+Type Indices
+============
+
+A type index is a 32-bit integer that uniquely identifies a type inside of an
+object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The
+value of the type index for the first type record from the TPI stream is given
+by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
+although in practice this value is always equal to 0x1000 (4096).
+
+Any type index with a high bit set is considered to come from the IPI stream,
+although this appears to be more of a hack, and LLVM does not generate type
+indices of this nature.  They can, however, be observed in Microsoft PDBs
+occasionally, so one should be prepared to handle them.  Note that having the
+high bit set is not a necessary condition for a type index to come from the IPI
+stream; it is only a sufficient one.
+
+Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
+to come from the appropriate stream, and any type index less than this is a
+bitmask which can be decomposed as follows:
+
+.. code-block:: none
+
+  .---------------------------.------.----------.
+  |           Unused          | Mode |   Kind   |
+  '---------------------------'------'----------'
+  |+32                        |+12   |+8        |+0
+
+
+- **Kind** - A value from the following enum:
+
+.. code-block:: c++
+
+  enum class SimpleTypeKind : uint32_t {
+    None = 0x0000,          // uncharacterized type (no type)
+    Void = 0x0003,          // void
+    NotTranslated = 0x0007, // type not translated by cvpack
+    HResult = 0x0008,       // OLE/COM HRESULT
+
+    SignedCharacter = 0x0010,   // 8 bit signed
+    UnsignedCharacter = 0x0020, // 8 bit unsigned
+    NarrowCharacter = 0x0070,   // really a char
+    WideCharacter = 0x0071,     // wide char
+    Character16 = 0x007a,       // char16_t
+    Character32 = 0x007b,       // char32_t
+
+    SByte = 0x0068,       // 8 bit signed int
+    Byte = 0x0069,        // 8 bit unsigned int
+    Int16Short = 0x0011,  // 16 bit signed
+    UInt16Short = 0x0021, // 16 bit unsigned
+    Int16 = 0x0072,       // 16 bit signed int
+    UInt16 = 0x0073,      // 16 bit unsigned int
+    Int32Long = 0x0012,   // 32 bit signed
+    UInt32Long = 0x0022,  // 32 bit unsigned
+    Int32 = 0x0074,       // 32 bit signed int
+    UInt32 = 0x0075,      // 32 bit unsigned int
+    Int64Quad = 0x0013,   // 64 bit signed
+    UInt64Quad = 0x0023,  // 64 bit unsigned
+    Int64 = 0x0076,       // 64 bit signed int
+    UInt64 = 0x0077,      // 64 bit unsigned int
+    Int128Oct = 0x0014,   // 128 bit signed int
+    UInt128Oct = 0x0024,  // 128 bit unsigned int
+    Int128 = 0x0078,      // 128 bit signed int
+    UInt128 = 0x0079,     // 128 bit unsigned int
+
+    Float16 = 0x0046,                 // 16 bit real
+    Float32 = 0x0040,                 // 32 bit real
+    Float32PartialPrecision = 0x0045, // 32 bit PP real
+    Float48 = 0x0044,                 // 48 bit real
+    Float64 = 0x0041,                 // 64 bit real
+    Float80 = 0x0042,                 // 80 bit real
+    Float128 = 0x0043,                // 128 bit real
+
+    Complex16 = 0x0056,                 // 16 bit complex
+    Complex32 = 0x0050,                 // 32 bit complex
+    Complex32PartialPrecision = 0x0055, // 32 bit PP complex
+    Complex48 = 0x0054,                 // 48 bit complex
+    Complex64 = 0x0051,                 // 64 bit complex
+    Complex80 = 0x0052,                 // 80 bit complex
+    Complex128 = 0x0053,                // 128 bit complex
+
+    Boolean8 = 0x0030,   // 8 bit boolean
+    Boolean16 = 0x0031,  // 16 bit boolean
+    Boolean32 = 0x0032,  // 32 bit boolean
+    Boolean64 = 0x0033,  // 64 bit boolean
+    Boolean128 = 0x0034, // 128 bit boolean
+  };
+
+- **Mode** - A value from the following enum:
+
+.. code-block:: c++
+
+  enum class SimpleTypeMode : uint32_t {
+    Direct = 0,        // Not a pointer
+    NearPointer = 1,   // Near pointer
+    FarPointer = 2,    // Far pointer
+    HugePointer = 3,   // Huge pointer
+    NearPointer32 = 4, // 32 bit near pointer
+    FarPointer32 = 5,  // 32 bit far pointer
+    NearPointer64 = 6, // 64 bit near pointer
+    NearPointer128 = 7 // 128 bit near pointer
+  };
+  
+Note that for pointers, the bitness is represented in the mode.  So a ``void*``
+would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
+but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
+
+By convention, the type index for ``std::nullptr_t`` is constructed the same way
+as the type index for ``void*``, but using the bitless enumeration value
+``NearPointer``.
+
+
+
+.. _tpi_header:
+
+Stream Header
+=============
+At offset 0 of the TPI Stream is a header with the following layout:
+
+
+.. code-block:: c++
+
+  struct TpiStreamHeader {
+    uint32_t Version;
+    uint32_t HeaderSize;
+    uint32_t TypeIndexBegin;
+    uint32_t TypeIndexEnd;
+    uint32_t TypeRecordBytes;
+
+    uint16_t HashStreamIndex;
+    uint16_t HashAuxStreamIndex;
+    uint32_t HashKeySize;
+    uint32_t NumHashBuckets;
+
+    int32_t HashValueBufferOffset;
+    uint32_t HashValueBufferLength;
+    
+    int32_t IndexOffsetBufferOffset;
+    uint32_t IndexOffsetBufferLength;
+
+    int32_t HashAdjBufferOffset;
+    uint32_t HashAdjBufferLength;
+  };
+  
+- **Version** - A value from the following enum.
+
+.. code-block:: c++
+
+  enum class TpiStreamVersion : uint32_t {
+    V40 = 19950410,
+    V41 = 19951122,
+    V50 = 19961031,
+    V70 = 19990903,
+    V80 = 20040203,
+  };
+
+Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
+``V80``, and no other values have been observed.  It is assumed that should
+another value be observed, the layout described by this document may not be
+accurate.
+
+- **HeaderSize** - ``sizeof(TpiStreamHeader)``
+  
+- **TypeIndexBegin** - The numeric value of the type index representing the
+  first type record in the TPI stream.  This is usually the value 0x1000 as type
+  indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
+  a discussion of reserved type indices).
+  
+- **TypeIndexEnd** - One greater than the numeric value of the type index
+  representing the last type record in the TPI stream.  The total number of type
+  records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
+  
+- **TypeRecordBytes** - The number of bytes of type record data following the header.
+  
+- **HashStreamIndex** - The index of a stream which contains a list of hashes for
+  every type record.  This value may be -1, indicating that hash information is not
+  present.  In practice a valid stream index is always observed, so any producer
+  implementation should be prepared to emit this stream to ensure compatibility with
+  tools which may expect it to be present.
+  
+- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
+  hash table, although this has not been observed in practice and it's unclear what it
+  might be used for.
+  
+- **HashKeySize** - The size of a hash value (usually 4 bytes).
+
+- **NumHashBuckets** - The number of buckets used to generate the hash values in the
+  aforementioned hash streams.
+
+- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
+  the TPI Hash Stream of the list of hash values.  It should be assumed that there
+  are either 0 hash values, or a number equal to the number of type records in the
+  TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if ``HashValueBufferLength``
+  is not equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize`` we can consider the
+  PDB malformed.
+
+- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
+  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list of
+  pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
+  and the second value is the offset in the type record data of the type with this
+  index.  This can be used to do a binary search followed by a linear search to
+  get amortized O(log n) lookup by type index.
+
+- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
+  the TPI hash stream of a serialized hash table whose keys are the hash values
+  in the hash value buffer and whose values are type indices.  This appears to
+  be useful in incremental linking scenarios, so that if a type is modified an
+  entry can be created mapping the old hash value to the new type index so that
+  a PDB file consumer can always have the most up to date version of the type
+  without forcing the incremental linker to garbage collect and update
+  references that point to the old version to now point to the new version.
+  The layout of this hash table is described in :doc:`HashTable`.
+
+.. _tpi_records:
+
+CodeView Type Record List
+=========================
+Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
+variable length array of :doc:`CodeView type records <CodeViewTypes>`.  The number
+of such records (i.e. the length of the array) can be determined by computing the
+value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
+
+log(n) random access is provided by way of the Type Index Offsets array (if present)
+described previously.
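Annotation on the section above: a rough C++ sketch of the lookup that the Type Index Offsets buffer makes possible, namely a binary search over the (type index, offset) pairs followed by a short linear walk through the type records. ``TypeIndexOffset`` and ``findTypeRecord`` are hypothetical names. The sketch assumes the pairs are sorted by type index, and that every type record begins with a little-endian ``uint16`` length counting the bytes that follow the length field. That second assumption comes from the CodeView record layout and is not stated in this section.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the Type Index Offsets buffer (hypothetical struct name).
struct TypeIndexOffset {
  uint32_t TypeIndex; // type index of some record
  uint32_t Offset;    // offset of that record within the type record data
};

// Hypothetical helper: return the offset, within `records`, of the type record
// whose type index is `target`, or -1 if it cannot be found.
inline int64_t findTypeRecord(const std::vector<TypeIndexOffset> &offsets,
                              const std::vector<uint8_t> &records,
                              uint32_t target) {
  // Binary search: first entry whose TypeIndex is greater than the target.
  auto it = std::upper_bound(
      offsets.begin(), offsets.end(), target,
      [](uint32_t t, const TypeIndexOffset &e) { return t < e.TypeIndex; });
  if (it == offsets.begin())
    return -1; // target precedes every indexed record
  --it;        // last entry with TypeIndex <= target

  // Linear walk: step record by record until the target index is reached.
  uint32_t index = it->TypeIndex;
  size_t pos = it->Offset;
  while (pos + 2 <= records.size()) {
    if (index == target)
      return static_cast<int64_t>(pos);
    // Assumed record prefix: uint16 length that excludes the length field.
    uint16_t len =
        static_cast<uint16_t>(records[pos] | (records[pos + 1] << 8));
    pos += 2 + static_cast<size_t>(len);
    ++index;
  }
  return -1; // ran off the end of the record data
}
```

The sparser the offsets buffer, the longer the linear walk, so a producer trades buffer size against lookup cost when choosing how many pairs to emit.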
diff --git a/docs/PDB/index.rst b/docs/PDB/index.rst

index 0662e9d9e58f36d83ba1c6182004c6db760c2e48..88e6015642abb6f830d268c403d583dc0ea7dcb3 100644 (file)
--- a/docs/PDB/index.rst
+++ b/docs/PDB/index.rst
@@ -1,168 +1,168 @@
-=====================================\r
-The PDB File Format\r
-=====================================\r
-\r
-.. contents::\r
-   :local:\r
-\r
-.. _pdb_intro:\r
-\r
-Introduction\r
-============\r
-\r
-PDB (Program Database) is a file format invented by Microsoft and which contains\r
-debug information that can be consumed by debuggers and other tools.  Since\r
-officially supported APIs exist on Windows for querying debug information from\r
-PDBs even without the user understanding the internals of the file format, a\r
-large ecosystem of tools has been built for Windows to consume this format.  In\r
-order for Clang to be able to generate programs that can interoperate with these\r
-tools, it is necessary for us to generate PDB files ourselves.\r
-\r
-At the same time, LLVM has a long history of being able to cross-compile from\r
-any platform to any platform, and we wish for the same to be true here.  So it\r
-is necessary for us to understand the PDB file format at the byte-level so that\r
-we can generate PDB files entirely on our own.\r
-\r
-This manual describes what we know about the PDB file format today.  The layout\r
-of the file, the various streams contained within, the format of individual\r
-records within, and more.\r
-\r
-We would like to extend our heartfelt gratitude to Microsoft, without whom we\r
-would not be where we are today.  Much of the knowledge contained within this\r
-manual was learned through reading code published by Microsoft on their `GitHub\r
-repo <https://github.com/Microsoft/microsoft-pdb>`__.\r
-\r
-.. _pdb_layout:\r
-\r
-File Layout\r
-===========\r
-\r
-.. important::\r
-   Unless otherwise specified, all numeric values are encoded in little endian.\r
-   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always\r
-   assume it is little endian!\r
-\r
-.. toctree::\r
-   :hidden:\r
-   \r
-   MsfFile\r
-   PdbStream\r
-   TpiStream\r
-   DbiStream\r
-   ModiStream\r
-   PublicStream\r
-   GlobalStream\r
-   HashTable\r
-   CodeViewSymbols\r
-   CodeViewTypes\r
-\r
-.. _msf:\r
-\r
-The MSF Container\r
------------------\r
-A PDB file is really just a special case of an MSF (Multi-Stream Format) file.\r
-An MSF file is actually a miniature "file system within a file".  It contains\r
-multiple streams (aka files) which can represent arbitrary data, and these\r
-streams are divided into blocks which may not necessarily be contiguously\r
-laid out within the file (aka fragmented).  Additionally, the MSF contains a\r
-stream directory (aka MFT) which describes how the streams (files) are laid\r
-out within the MSF.\r
-\r
-For more information about the MSF container format, stream directory, and\r
-block layout, see :doc:`MsfFile`.\r
-\r
-.. _streams:\r
-\r
-Streams\r
--------\r
-The PDB format contains a number of streams which describe various information\r
-such as the types, symbols, source files, and compilands (e.g. object files)\r
-of a program, as well as some additional streams containing hash tables that are\r
-used by debuggers and other tools to provide fast lookup of records and types\r
-by name, and various other information about how the program was compiled such\r
-as the specific toolchain used, and more.  A summary of streams contained in a\r
-PDB file is as follows:\r
-\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| Name               | Stream Index                 | Contents                                  |\r
-+====================+==============================+===========================================+\r
-| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |\r
-|                    |                              | - Fields to match EXE to this PDB         |\r
-|                    |                              | - Map of named streams to stream indices  |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |\r
-|                    |                              | - Index of TPI Hash Stream                |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |\r
-|                    |                              | - Indices of individual module streams    |\r
-|                    |                              | - Indices of public / global streams      |\r
-|                    |                              | - Section Contribution Information        |\r
-|                    |                              | - Source File Information                 |\r
-|                    |                              | - References to streams containing        |\r
-|                    |                              |   FPO / PGO Data                          |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |\r
-|                    |                              | - Index of IPI Hash Stream                |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |\r
-|                    |   Named Stream map           |                                           |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |\r
-|                    |   Named Stream map           |   (e.g. natvis files)                     |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |\r
-|                    |   Named Stream map           |   string de-duplication                   |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |\r
-|                    | - One for each compiland     | - Line Number Information                 |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |\r
-|                    |                              | - Index of Public Hash Stream             |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |\r
-|                    |                              | - Index of Global Hash Stream             |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |\r
-|                    |                              |   by name                                 |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |\r
-|                    |                              |   by name                                 |\r
-+--------------------+------------------------------+-------------------------------------------+\r
-\r
-More information about the structure of each of these can be found on the\r
-following pages:\r
-   \r
-:doc:`PdbStream`\r
-   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.\r
-\r
-:doc:`TpiStream`\r
-   Information about the TPI stream and the CodeView records contained within.\r
-\r
-:doc:`DbiStream`\r
-   Information about the DBI stream and relevant substreams including the Module Substreams,\r
-   source file information, and CodeView symbol records contained within.\r
-\r
-:doc:`ModiStream`\r
-   Information about the Module Information Stream, of which there is one for each compilation\r
-   unit and the format of symbols contained within.\r
-\r
-:doc:`PublicStream`\r
-   Information about the Public Symbol Stream.\r
-\r
-:doc:`GlobalStream`\r
-   Information about the Global Symbol Stream.\r
-\r
-:doc:`HashTable`\r
-   Information about the serialized hash table format used internally to represent things such\r
-   as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.\r
-\r
-CodeView\r
-========\r
-CodeView is another format which comes into the picture.  While MSF defines\r
-the structure of the overall file, and PDB defines the set of streams that\r
-appear within the MSF file and the format of those streams, CodeView defines\r
-the format of **symbol and type records** that appear within specific streams.\r
-Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for\r
-more information about the CodeView format.\r
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+   :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft and which contains
+debug information that can be consumed by debuggers and other tools.  Since
+officially supported APIs exist on Windows for querying debug information from
+PDBs even without the user understanding the internals of the file format, a
+large ecosystem of tools has been built for Windows to consume this format.  In
+order for Clang to be able to generate programs that can interoperate with these
+tools, it is necessary for us to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we wish for the same to be true here.  So it
+is necessary for us to understand the PDB file format at the byte-level so that
+we can generate PDB files entirely on our own.
+
+This manual describes what we know about the PDB file format today.  The layout
+of the file, the various streams contained within, the format of individual
+records within, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today.  Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. important::
+   Unless otherwise specified, all numeric values are encoded in little endian.
+   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
+   assume it is little endian!
+
+.. toctree::
+   :hidden:
+   
+   MsfFile
+   PdbStream
+   TpiStream
+   DbiStream
+   ModiStream
+   PublicStream
+   GlobalStream
+   HashTable
+   CodeViewSymbols
+   CodeViewTypes
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
+An MSF file is actually a miniature "file system within a file".  It contains
+multiple streams (aka files) which can represent arbitrary data, and these
+streams are divided into blocks which may not necessarily be contiguously
+laid out within the file (aka fragmented).  Additionally, the MSF contains a
+stream directory (aka MFT) which describes how the streams (files) are laid
+out within the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams which describe various information
+such as the types, symbols, source files, and compilands (e.g. object files)
+of a program, as well as some additional streams containing hash tables that are
+used by debuggers and other tools to provide fast lookup of records and types
+by name, and various other information about how the program was compiled such
+as the specific toolchain used, and more.  A summary of streams contained in a
+PDB file is as follows:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name               | Stream Index                 | Contents                                  |
++====================+==============================+===========================================+
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
+|                    |                              | - Fields to match EXE to this PDB         |
+|                    |                              | - Map of named streams to stream indices  |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
+|                    |                              | - Index of TPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
+|                    |                              | - Indices of individual module streams    |
+|                    |                              | - Indices of public / global streams      |
+|                    |                              | - Section Contribution Information        |
+|                    |                              | - Source File Information                 |
+|                    |                              | - References to streams containing        |
+|                    |                              |   FPO / PGO Data                          |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
+|                    |                              | - Index of IPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
+|                    |   Named Stream map           |   (e.g. natvis files)                     |
++--------------------+------------------------------+-------------------------------------------+
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
+|                    |   Named Stream map           |   string de-duplication                   |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
+|                    | - One for each compiland     | - Line Number Information                 |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
+|                    |                              | - Index of Public Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
+|                    |                              | - Index of Global Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+
+More information about the structure of each of these can be found on the
+following pages:
+   
+:doc:`PdbStream`
+   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+   Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+   Information about the DBI stream and relevant substreams including the Module Substreams,
+   source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+   Information about the Module Information Stream, of which there is one for each compilation
+   unit and the format of symbols contained within.
+
+:doc:`PublicStream`
+   Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+   Information about the Global Symbol Stream.
+
+:doc:`HashTable`
+   Information about the serialized hash table format used internally to represent things such
+   as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.
+
+CodeView
+========
+CodeView is another format which comes into the picture.  While MSF defines
+the structure of the overall file, and PDB defines the set of streams that
+appear within the MSF file and the format of those streams, CodeView defines
+the format of **symbol and type records** that appear within specific streams.
+Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
+more information about the CodeView format.