--- /dev/null
+=====================================\r
+The PDB DBI (Debug Info) Stream\r
+=====================================\r
--- /dev/null
+=====================================\r
+The PDB Global Symbol Stream\r
+=====================================\r
--- /dev/null
+=====================================\r
+The TPI & IPI Hash Streams\r
+=====================================\r
--- /dev/null
+=====================================\r
+The Module Information Stream\r
+=====================================\r
--- /dev/null
+=====================================\r
+The MSF File Format\r
+=====================================\r
+\r
+.. contents::\r
+ :local:\r
+\r
+.. _msf_superblock:\r
+\r
+The Superblock\r
+==============\r
+At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as\r
+follows:\r
+\r
+.. code-block:: c++\r
+\r
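+  // ulittle32_t is an unaligned little-endian uint32_t (llvm::support::ulittle32_t).\r
+  struct SuperBlock {\r
+    char FileMagic[sizeof(Magic)]; // Magic: the 32-byte signature described below.\r
+    ulittle32_t BlockSize;\r
+    ulittle32_t FreeBlockMapBlock;\r
+    ulittle32_t NumBlocks;\r
+    ulittle32_t NumDirectoryBytes;\r
+    ulittle32_t Unknown;\r
+    ulittle32_t BlockMapAddr;\r
+  };\r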
+\r
+- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``\r
+  followed by the bytes ``1A 44 53 00 00 00``.\r
+- **BlockSize** - The block size of the internal file system. Valid values are\r
+  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary\r
+  depending on the block size. For the purposes of LLVM, we handle only block\r
+  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.\r
+- **FreeBlockMapBlock** - The index of a block within the file, at which begins\r
+  a bitfield representing the set of all blocks within the file which are "free"\r
+  (i.e., the data within that block is not used). This bitfield is spread across\r
+  the MSF file at ``BlockSize`` intervals.\r
+  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field\r
+  is designed to support incremental and atomic updates of the underlying MSF\r
+  file. While writing to an MSF file, if the value of this field is ``1``, you\r
+  can write your new modified bitfield to block 2, and vice versa. Only when\r
+  you commit the file to disk do you need to swap the value in the SuperBlock\r
+  to point to the new ``FreeBlockMapBlock``.\r
+- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``\r
+  should equal the size of the file on disk.\r
+- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream\r
+  directory contains information about each stream's size and the set of blocks\r
+  that it occupies. It is described in more detail in the next section.\r
+- **BlockMapAddr** - The index of a block within the MSF file. At this block is\r
+  an array of ``ulittle32_t``'s listing the blocks that the stream directory\r
+  resides on. For large MSF files, the stream directory (which describes the\r
+  block layout of each stream) may not fit entirely on a single block. As a\r
+  result, this extra layer of indirection is introduced, whereby this block\r
+  contains the list of blocks that the stream directory occupies, and the stream\r
+  directory itself can be stitched together accordingly. The number of\r
+  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.\r
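The field rules above can be checked mechanically when a file is opened. The following C++ sketch decodes the SuperBlock from an in-memory copy of the file and rejects it if the magic, ``BlockSize``, ``FreeBlockMapBlock``, or ``NumBlocks * BlockSize`` checks fail. The names ``readU32LE``, ``MsfMagic``, ``SuperBlockFields``, ``parseSuperBlock``, and ``numDirectoryBlocks`` are hypothetical and are not LLVM's API; treating a file-size mismatch as a hard error is a choice of this sketch, and real readers may be more lenient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: decode a little-endian uint32_t starting at Offset.
static std::uint32_t readU32LE(const std::vector<std::uint8_t> &Buf,
                               std::size_t Offset) {
  return std::uint32_t(Buf[Offset]) | (std::uint32_t(Buf[Offset + 1]) << 8) |
         (std::uint32_t(Buf[Offset + 2]) << 16) |
         (std::uint32_t(Buf[Offset + 3]) << 24);
}

// "Microsoft C/C++ MSF 7.00\r\n" followed by 1A 44 53 00 00 00 (32 bytes).
static const char MsfMagic[32] = {
    'M',  'i',  'c',    'r', 'o', 's',  'o',  'f',
    't',  ' ',  'C',    '/', 'C', '+',  '+',  ' ',
    'M',  'S',  'F',    ' ', '7', '.',  '0',  '0',
    '\r', '\n', '\x1a', 'D', 'S', '\0', '\0', '\0'};

// SuperBlock values this sketch keeps (FileMagic and Unknown are omitted).
struct SuperBlockFields {
  std::uint32_t BlockSize;
  std::uint32_t FreeBlockMapBlock;
  std::uint32_t NumBlocks;
  std::uint32_t NumDirectoryBytes;
  std::uint32_t BlockMapAddr;
};

// Returns std::nullopt if any of the field rules above is violated.
std::optional<SuperBlockFields>
parseSuperBlock(const std::vector<std::uint8_t> &File) {
  // 32 bytes of magic followed by six ulittle32_t fields = 56 bytes.
  if (File.size() < 56 ||
      std::memcmp(File.data(), MsfMagic, sizeof(MsfMagic)) != 0)
    return std::nullopt;
  SuperBlockFields SB;
  SB.BlockSize = readU32LE(File, 32);
  SB.FreeBlockMapBlock = readU32LE(File, 36);
  SB.NumBlocks = readU32LE(File, 40);
  SB.NumDirectoryBytes = readU32LE(File, 44);
  // Offset 48 holds the Unknown field; it is not interpreted here.
  SB.BlockMapAddr = readU32LE(File, 52);

  if (SB.BlockSize != 512 && SB.BlockSize != 1024 && SB.BlockSize != 2048 &&
      SB.BlockSize != 4096)
    return std::nullopt;
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return std::nullopt;
  // NumBlocks * BlockSize should equal the file size (strict in this sketch).
  if (std::uint64_t(SB.NumBlocks) * SB.BlockSize != File.size())
    return std::nullopt;
  return SB;
}

// Number of ulittle32_t entries in the array stored at BlockMapAddr.
std::uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
  assert(SB.BlockSize != 0 && "validate the SuperBlock first");
  return std::uint32_t((std::uint64_t(SB.NumDirectoryBytes) + SB.BlockSize - 1) /
                       SB.BlockSize);
}
```

``numDirectoryBlocks`` gives the length of the array found at ``BlockMapAddr``; the stream directory is then obtained by concatenating the listed blocks and keeping the first ``NumDirectoryBytes`` bytes.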
+ \r
+The Stream Directory\r
+====================\r
+The Stream Directory is the root of all access to the other streams in an MSF\r
+file. Beginning at byte 0 of the stream directory is the following structure:\r
+\r
+.. code-block:: c++\r
+\r
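+  // Pseudocode: both arrays are variable-length, so this is not valid C++.\r
+  struct StreamDirectory {\r
+    ulittle32_t NumStreams;\r
+    ulittle32_t StreamSizes[NumStreams];\r
+    // StreamBlocks[i] has ceil(StreamSizes[i] / BlockSize) entries.\r
+    ulittle32_t StreamBlocks[NumStreams][];\r
+  };\r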
+ \r
+This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.\r
+Note that each of the last two arrays is of variable length, and in particular\r
+that the second array is jagged: each stream has its own number of blocks.\r
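Because the length of each ``StreamBlocks`` row is implied by the matching ``StreamSizes`` entry, a reader has to walk the directory sequentially. A minimal C++ sketch, assuming the directory bytes have already been stitched together from the blocks listed at ``BlockMapAddr`` and that stream ``i`` lists ``ceil(StreamSizes[i] / BlockSize)`` blocks; ``StreamDirectoryInfo`` and ``parseStreamDirectory`` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Decoded form of the directory; StreamBlocks is jagged.
struct StreamDirectoryInfo {
  std::vector<std::uint32_t> StreamSizes;
  std::vector<std::vector<std::uint32_t>> StreamBlocks;
};

// Dir must hold exactly NumDirectoryBytes bytes of the stream directory.
bool parseStreamDirectory(const std::vector<std::uint8_t> &Dir,
                          std::uint32_t BlockSize, StreamDirectoryInfo &Out) {
  assert(BlockSize != 0);
  std::size_t Pos = 0;
  // Reads the next little-endian ulittle32_t; fails if Dir is too short.
  auto Next = [&](std::uint32_t &V) {
    if (Dir.size() - Pos < 4)
      return false;
    V = std::uint32_t(Dir[Pos]) | (std::uint32_t(Dir[Pos + 1]) << 8) |
        (std::uint32_t(Dir[Pos + 2]) << 16) |
        (std::uint32_t(Dir[Pos + 3]) << 24);
    Pos += 4;
    return true;
  };

  std::uint32_t NumStreams = 0;
  // Refuse counts that cannot fit in the remaining bytes before allocating.
  if (!Next(NumStreams) || NumStreams > (Dir.size() - Pos) / 4)
    return false;
  Out.StreamSizes.assign(NumStreams, 0);
  for (std::uint32_t &Size : Out.StreamSizes)
    if (!Next(Size))
      return false;

  Out.StreamBlocks.clear();
  for (std::uint32_t Size : Out.StreamSizes) {
    // Each stream lists ceil(Size / BlockSize) block indices.
    std::uint64_t Count = (std::uint64_t(Size) + BlockSize - 1) / BlockSize;
    std::vector<std::uint32_t> Blocks;
    for (std::uint64_t I = 0; I < Count; ++I) {
      std::uint32_t Block = 0;
      if (!Next(Block))
        return false;
      Blocks.push_back(Block);
    }
    Out.StreamBlocks.push_back(std::move(Blocks));
  }
  // The structure occupies exactly NumDirectoryBytes bytes.
  return Pos == Dir.size();
}
```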
+\r
+**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4\r
+streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.\r
+\r
+Stream 0: ceil(1000 / 4096) = 1 block\r
+\r
+Stream 1: ceil(8000 / 4096) = 2 blocks\r
+\r
+Stream 2: ceil(16000 / 4096) = 4 blocks\r
+\r
+Stream 3: ceil(9000 / 4096) = 3 blocks\r
+\r
+In total, 10 blocks are used. Let's see what the stream directory might look\r
+like:\r
+\r
+.. code-block:: c++\r
+\r
+  struct StreamDirectory {\r
+    ulittle32_t NumStreams = 4;\r
+    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};\r
+    ulittle32_t StreamBlocks[][] = {\r
+      {4},\r
+      {5, 6},\r
+      {11, 9, 7, 8},\r
+      {10, 15, 12}\r
+    };\r
+  };\r
+ \r
+In total, this occupies ``15 * 4 = 60`` bytes (one ``NumStreams`` value, four\r
+stream sizes, and ten block indices), so ``SuperBlock->NumDirectoryBytes`` would\r
+equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an array\r
+of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.\r
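A writer has to run the same arithmetic to fill in ``NumDirectoryBytes``. This sketch, with hypothetical helpers ``blocksFor`` and ``directoryBytes``, counts one ``ulittle32_t`` for ``NumStreams``, one per stream size, and one per block of every stream:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of blocks needed to hold Size bytes, i.e. ceil(Size / BlockSize).
std::uint64_t blocksFor(std::uint64_t Size, std::uint32_t BlockSize) {
  assert(BlockSize != 0);
  return (Size + BlockSize - 1) / BlockSize;
}

// Bytes occupied by the stream directory for the given stream sizes.
std::uint64_t directoryBytes(const std::vector<std::uint32_t> &StreamSizes,
                             std::uint32_t BlockSize) {
  std::uint64_t Words = 1 + StreamSizes.size(); // NumStreams + StreamSizes[]
  for (std::uint32_t Size : StreamSizes)
    Words += blocksFor(Size, BlockSize);        // StreamBlocks[i][]
  return Words * sizeof(std::uint32_t);
}
```

For the example streams this yields 60 bytes, and ``blocksFor(60, 4096)`` is 1, the number of entries at ``BlockMapAddr``.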
+\r
+Note also that the streams are discontiguous, and that part of stream 3 (block\r
+10) lies in the middle of stream 2 (between its blocks 9 and 11). You cannot\r
+assume anything about the layout of the blocks!\r
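Since a stream's blocks can sit anywhere in the file, every read goes through that stream's ``StreamBlocks`` list. A sketch of the translation from a stream offset to a file offset, using a hypothetical ``streamToFileOffset`` helper and assuming the block list has already been decoded from the directory:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Translate a byte offset within a stream to an absolute file offset, given
// that stream's StreamBlocks row. Hypothetical helper; it does not check
// StreamOffset against the stream's size, only against its block list.
std::optional<std::uint64_t>
streamToFileOffset(const std::vector<std::uint32_t> &StreamBlocks,
                   std::uint32_t BlockSize, std::uint64_t StreamOffset) {
  assert(BlockSize != 0);
  std::uint64_t Index = StreamOffset / BlockSize; // which block of the stream
  if (Index >= StreamBlocks.size())
    return std::nullopt;
  return std::uint64_t(StreamBlocks[Index]) * BlockSize +
         StreamOffset % BlockSize;
}
```

For stream 2 in the example (blocks ``{11, 9, 7, 8}``), stream offset 5000 falls in the stream's second block, so it maps to file offset ``9 * 4096 + 904 = 37768``, which is earlier in the file than the stream's first block.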
+\r
+Alignment and Block Boundaries\r
+==============================\r
+As may be clear by now, it is possible for a single field (whether it be a\r
+high-level record, a long string field, or even a single ``uint16``) to begin\r
+and end in separate blocks. For example, if the block size is 4096 bytes, and a\r
+``uint16`` field begins at the last byte of the current block, then it would\r
+need to end on the first byte of the stream's next block. Since blocks are not\r
+necessarily laid out contiguously in the file, this means that both the consumer\r
+and the producer of an MSF file must be prepared to split data apart\r
+accordingly. In the aforementioned example, because MSF data is little-endian,\r
+the low byte of the ``uint16`` would be written to the last byte of the\r
+stream's Nth block, and the high byte would be written to the first byte of the\r
+stream's (N+1)th block, which could be tens of thousands of bytes later (or\r
+even earlier!) in the file, depending on what the stream directory says.\r
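One simple way to cope with values that straddle a block boundary is to read them one byte at a time through the stream-to-file mapping and assemble them in little-endian order. The sketch below does this for a 16-bit value; ``readStreamByte`` and ``readStreamU16`` are hypothetical helpers, and a real reader would more likely copy whole contiguous runs of bytes for efficiency.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Read the byte at StreamOffset of a stream whose blocks are listed in
// StreamBlocks. File is the whole MSF image. std::vector::at throws
// std::out_of_range if the offset falls outside the stream or the file.
std::uint8_t readStreamByte(const std::vector<std::uint8_t> &File,
                            const std::vector<std::uint32_t> &StreamBlocks,
                            std::uint32_t BlockSize,
                            std::uint64_t StreamOffset) {
  assert(BlockSize != 0);
  std::uint64_t Block = StreamBlocks.at(StreamOffset / BlockSize);
  return File.at(Block * BlockSize + StreamOffset % BlockSize);
}

// Read a little-endian 16-bit value byte by byte, so it is correct even when
// its two bytes live in different, non-adjacent blocks.
std::uint16_t readStreamU16(const std::vector<std::uint8_t> &File,
                            const std::vector<std::uint32_t> &StreamBlocks,
                            std::uint32_t BlockSize,
                            std::uint64_t StreamOffset) {
  std::uint16_t Lo =
      readStreamByte(File, StreamBlocks, BlockSize, StreamOffset);
  std::uint16_t Hi =
      readStreamByte(File, StreamBlocks, BlockSize, StreamOffset + 1);
  return std::uint16_t(Lo | (Hi << 8));
}
```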
--- /dev/null
+========================================\r
+The PDB Info Stream (aka the PDB Stream)\r
+========================================\r
--- /dev/null
+=====================================\r
+The PDB Public Symbol Stream\r
+=====================================\r
--- /dev/null
+=====================================\r
+The PDB TPI Stream\r
+=====================================\r
--- /dev/null
+=====================================\r
+The PDB File Format\r
+=====================================\r
+\r
+.. contents::\r
+ :local:\r
+\r
+.. _pdb_intro:\r
+\r
+Introduction\r
+============\r
+\r
+PDB (Program Database) is a file format invented by Microsoft which contains\r
+debug information that can be consumed by debuggers and other tools. Since\r
+officially supported APIs exist on Windows for querying debug information from\r
+PDBs even without the user understanding the internals of the file format, a\r
+large ecosystem of tools has been built for Windows to consume this format. In\r
+order for Clang to be able to generate programs that can interoperate with these\r
+tools, it is necessary for us to generate PDB files ourselves.\r
+\r
+At the same time, LLVM has a long history of being able to cross-compile from\r
+any platform to any platform, and we wish for the same to be true here. So it\r
+is necessary for us to understand the PDB file format at the byte level so that\r
+we can generate PDB files entirely on our own.\r
+\r
+This manual describes what we know about the PDB file format today: the layout\r
+of the file, the various streams contained within, the format of individual\r
+records within, and more.\r
+\r
+We would like to extend our heartfelt gratitude to Microsoft, without whom we\r
+would not be where we are today. Much of the knowledge contained within this\r
+manual was learned through reading code published by Microsoft on their `GitHub\r
+repo <https://github.com/Microsoft/microsoft-pdb>`__.\r
+\r
+.. _pdb_layout:\r
+\r
+File Layout\r
+===========\r
+\r
+.. toctree::\r
+ :hidden:\r
+ \r
+ MsfFile\r
+ PdbStream\r
+ TpiStream\r
+ DbiStream\r
+ ModiStream\r
+ PublicStream\r
+ GlobalStream\r
+ HashStream\r
+\r
+.. _msf:\r
+\r
+The MSF Container\r
+-----------------\r
+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.\r
+An MSF file is actually a miniature "file system within a file". It contains\r
+multiple streams (aka files) which can represent arbitrary data, and these\r
+streams are divided into blocks which are not necessarily laid out\r
+contiguously within the file (i.e., they may be fragmented). Additionally, the\r
+MSF contains a stream directory (analogous to the MFT, or Master File Table, of\r
+an NTFS file system) which describes how the streams (files) are laid out\r
+within the MSF.\r
+\r
+For more information about the MSF container format, stream directory, and\r
+block layout, see :doc:`MsfFile`.\r
+\r
+.. _streams:\r
+\r
+Streams\r
+-------\r
+The PDB format contains a number of streams which describe various information\r
+such as the types, symbols, source files, and compilands (e.g. object files)\r
+of a program. Additional streams contain hash tables that debuggers and other\r
+tools use for fast lookup of records and types by name, and others record\r
+information about how the program was compiled, such as the specific toolchain\r
+used. A summary of the streams contained in a PDB file is as follows:\r
+\r
++--------------------+------------------------------+-------------------------------------------+\r
+| Name               | Stream Index                 | Contents                                  |\r
++====================+==============================+===========================================+\r
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |\r
+|                    |                              | - Fields to match EXE to this PDB         |\r
+|                    |                              | - Map of named streams to stream indices  |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |\r
+|                    |                              | - Index of TPI Hash Stream                |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |\r
+|                    |                              | - Indices of individual module streams    |\r
+|                    |                              | - Indices of public / global streams      |\r
+|                    |                              | - Section Contribution Information        |\r
+|                    |                              | - Source File Information                 |\r
+|                    |                              | - FPO / PGO Data                          |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |\r
+|                    |                              | - Index of IPI Hash Stream                |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |\r
+|                    |   Named Stream map           |                                           |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |\r
+|                    |   Named Stream map           |                                           |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |\r
+|                    |   Named Stream map           |   string de-duplication                   |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |\r
+|                    | - One for each compiland     | - Line Number Information                 |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |\r
+|                    |                              | - Index of Public Hash Stream             |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |\r
+|                    |                              | - Index of Global Hash Stream             |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |\r
+|                    |                              |   by name                                 |\r
++--------------------+------------------------------+-------------------------------------------+\r
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |\r
+|                    |                              |   by name                                 |\r
++--------------------+------------------------------+-------------------------------------------+\r
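Only the first five streams in the table have fixed indices; the others are found indirectly through the PDB Stream's named stream map, the DBI Stream, or the TPI / IPI Streams. A sketch of constants a reader might define for them; the enumerator names and ``isFixedStream`` are hypothetical, and only the numeric values come from the table above:

```cpp
#include <cstdint>

// Hypothetical names for the fixed stream indices listed in the table.
enum FixedStreamIndex : std::uint32_t {
  OldDirectoryStream = 0,
  PdbInfoStream = 1,
  TpiStream = 2,
  DbiStream = 3,
  IpiStream = 4,
};

// True for the five streams whose index is fixed. The remaining streams in
// the table must be located through the stream that contains their index.
constexpr bool isFixedStream(std::uint32_t Index) { return Index <= IpiStream; }
```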
+\r
+More information about the structure of each of these can be found on the\r
+following pages:\r
+ \r
+:doc:`PdbStream`\r
+ Information about the PDB Info Stream and how it is used to match PDBs to EXEs.\r
+\r
+:doc:`TpiStream`\r
+ Information about the TPI stream and the CodeView records contained within.\r
+\r
+:doc:`DbiStream`\r
+ Information about the DBI stream and relevant substreams including the Module Substreams,\r
+ source file information, and CodeView symbol records contained within.\r
+\r
+:doc:`ModiStream`\r
+ Information about the Module Information Stream, of which there is one for each\r
+ compilation unit, and the format of the symbols contained within.\r
+\r
+:doc:`PublicStream`\r
+ Information about the Public Symbol Stream.\r
+\r
+:doc:`GlobalStream`\r
+ Information about the Global Symbol Stream.\r
+\r
+:doc:`HashStream`\r
+ Information about the TPI and IPI hash streams, and how they can be used to\r
+ quickly look up records by name.\r
+\r
+CodeView\r
+========\r
+CodeView is another format which comes into the picture. While MSF defines\r
+the structure of the overall file, and PDB defines the set of streams that\r
+appear within the MSF file and the format of those streams, CodeView defines\r
+the format of **symbol and type records** that appear within specific streams.\r
+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for\r
+more information about the CodeView format.\r
Coroutines
GlobalISel
XRay
+ PDB/index
:doc:`WritingAnLLVMPass`
Information on how to write LLVM transformations and analyses.
:doc:`XRay`
High-level documentation of how to use XRay in LLVM.
+:doc:`The Microsoft PDB File Format <PDB/index>`
+ A detailed description of the Microsoft PDB (Program Database) file format.
+
Development Process Documentation
=================================