Add documentation for PDB TPI/IPI Stream.

author Zachary Turner <zturner@google.com>

Fri, 5 Apr 2019 16:43:42 +0000 (16:43 +0000)

committer Zachary Turner <zturner@google.com>

Fri, 5 Apr 2019 16:43:42 +0000 (16:43 +0000)
diff --git a/docs/PDB/TpiStream.rst b/docs/PDB/TpiStream.rst

index 1e3297ebdc74df3235a649991e6f0edfcb8b9278..74f2c37b0839c56d7a2704246c27d0a5a93fa979 100644
--- a/docs/PDB/TpiStream.rst
+++ b/docs/PDB/TpiStream.rst
@@ -1,3 +1,304 @@
  =====================================\r
-The PDB TPI Stream\r
+The PDB TPI and IPI Streams\r
  =====================================\r
+\r
+.. contents::\r
+   :local:\r
+\r
+.. _tpi_intro:\r
+\r
+Introduction\r
+============\r
+\r
+The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about\r
+all types used in the program.  Each is organized as a :ref:`header <tpi_header>`\r
+followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`.  Types are\r
+referenced from various streams and records throughout the PDB by their\r
+:ref:`type index <type_indices>`.  In general, the sequence of type records\r
+following the :ref:`header <tpi_header>` forms a topologically sorted DAG\r
+(directed acyclic graph), which means that a type record B can only refer to\r
+the type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where\r
+this property will not hold (particularly when dealing with object files\r
+compiled with MASM), an implementation should try very hard to make this\r
+property hold, as it means the entire type graph can be constructed in a single\r
+pass.\r
+\r
+.. important::\r
+   Type records form a topologically sorted DAG (directed acyclic graph).\r
+   \r
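The ordering property can be checked mechanically. The following C++ sketch is illustrative only. It assumes the records have already been decoded into a hypothetical ``TypeRecordRefs`` structure that lists the type indices each record refers to within the same stream. Cross-stream references, such as an IPI record pointing at a TPI type, and high-bit indices are out of scope. It also assumes that record ``I`` has type index ``TypeIndexBegin + I``. The name ``isTopologicallySorted`` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of one type record: only the type indices it
// references (within the same stream) are kept.
struct TypeRecordRefs {
  std::vector<uint32_t> Referenced;
};

// Returns true if every record refers only to simple (reserved) type indices
// or to records that appear earlier in the stream, i.e. the list is a
// topologically sorted DAG. Record I is assumed to have index Begin + I.
bool isTopologicallySorted(const std::vector<TypeRecordRefs> &Records,
                           uint32_t Begin = 0x1000) {
  for (std::size_t I = 0; I < Records.size(); ++I) {
    uint32_t Self = Begin + static_cast<uint32_t>(I);
    for (uint32_t Ref : Records[I].Referenced) {
      if (Ref < Begin)
        continue; // Simple type index: not a record in this stream.
      if (Ref >= Self)
        return false; // Forward or self reference: needs more than one pass.
    }
  }
  return true;
}
```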
+.. _tpi_ipi:\r
+\r
+TPI vs IPI Stream\r
+=================\r
+\r
+Recent versions of the PDB format (i.e. all versions covered by this document)\r
+have two streams with identical layout, henceforth referred to as the TPI stream\r
+and IPI stream.  The on-disk format described in the rest of this document\r
+applies equally to the TPI Stream and the IPI Stream.  The only difference\r
+between the two is *which* CodeView records are allowed to appear in each one,\r
+as summarized by the following table:\r
+\r
++----------------------+---------------------+\r
+|    TPI Stream        |    IPI Stream       |\r
++======================+=====================+\r
+|  LF_POINTER          | LF_FUNC_ID          |\r
++----------------------+---------------------+\r
+|  LF_MODIFIER         | LF_MFUNC_ID         |\r
++----------------------+---------------------+\r
+|  LF_PROCEDURE        | LF_BUILDINFO        |\r
++----------------------+---------------------+\r
+|  LF_MFUNCTION        | LF_SUBSTR_LIST      |\r
++----------------------+---------------------+\r
+|  LF_LABEL            | LF_STRING_ID        |\r
++----------------------+---------------------+\r
+|  LF_ARGLIST          | LF_UDT_SRC_LINE     |\r
++----------------------+---------------------+\r
+|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |\r
++----------------------+---------------------+\r
+|  LF_ARRAY            |                     |\r
++----------------------+---------------------+\r
+|  LF_CLASS            |                     |\r
++----------------------+---------------------+\r
+|  LF_STRUCTURE        |                     |\r
++----------------------+---------------------+\r
+|  LF_INTERFACE        |                     |\r
++----------------------+---------------------+\r
+|  LF_UNION            |                     |\r
++----------------------+---------------------+\r
+|  LF_ENUM             |                     |\r
++----------------------+---------------------+\r
+|  LF_TYPESERVER2      |                     |\r
++----------------------+---------------------+\r
+|  LF_VFTABLE          |                     |\r
++----------------------+---------------------+\r
+|  LF_VTSHAPE          |                     |\r
++----------------------+---------------------+\r
+|  LF_BITFIELD         |                     |\r
++----------------------+---------------------+\r
+|  LF_METHODLIST       |                     |\r
++----------------------+---------------------+\r
+|  LF_PRECOMP          |                     |\r
++----------------------+---------------------+\r
+|  LF_ENDPRECOMP       |                     |\r
++----------------------+---------------------+\r
+\r
+The usage of these records is described in more detail in\r
+:doc:`CodeView Type Records <CodeViewTypes>`.\r
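A producer, such as a linker writing a PDB, has to route each record to one of the two streams. The sketch below encodes the IPI column of the table above. The numeric leaf values are the ones used in Microsoft's ``cvinfo.h`` and LLVM's ``CodeViewTypes.def``, and are worth double-checking against those headers. The names ``IpiLeafKind`` and ``belongsInIpiStream`` are hypothetical.

```cpp
#include <cstdint>

// Leaf kinds of the records that belong in the IPI stream (values as in
// cvinfo.h). Every other record kind in the table goes to the TPI stream.
enum IpiLeafKind : uint16_t {
  LF_FUNC_ID = 0x1601,
  LF_MFUNC_ID = 0x1602,
  LF_BUILDINFO = 0x1603,
  LF_SUBSTR_LIST = 0x1604,
  LF_STRING_ID = 0x1605,
  LF_UDT_SRC_LINE = 0x1606,
  LF_UDT_MOD_SRC_LINE = 0x1607,
};

// Returns true if a record with the given leaf kind should be written to the
// IPI stream rather than the TPI stream.
bool belongsInIpiStream(uint16_t Kind) {
  switch (Kind) {
  case LF_FUNC_ID:
  case LF_MFUNC_ID:
  case LF_BUILDINFO:
  case LF_SUBSTR_LIST:
  case LF_STRING_ID:
  case LF_UDT_SRC_LINE:
  case LF_UDT_MOD_SRC_LINE:
    return true;
  default:
    return false;
  }
}
```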
+\r
+.. _type_indices:\r
+\r
+Type Indices\r
+============\r
+\r
+A type index is a 32-bit integer that uniquely identifies a type inside of an\r
+object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The\r
+value of the type index for the first type record from the TPI stream is given\r
+by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,\r
+although in practice this value is always equal to 0x1000 (4096).\r
+\r
+Any type index with the high bit (bit 31) set is considered to come from the IPI\r
+stream, although this appears to be more of a hack, and LLVM does not generate\r
+type indices of this nature.  They can, however, be observed in Microsoft PDBs\r
+occasionally, so one should be prepared to handle them.  Note that having the\r
+high bit set is a sufficient, but not a necessary, condition for a type index\r
+to come from the IPI stream.\r
+\r
+Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed\r
+to come from the appropriate stream, and any type index less than this is a\r
+bitmask which can be decomposed as follows:\r
+\r
+.. code-block:: none\r
+\r
+  .---------------------------.------.----------.\r
+  |           Unused          | Mode |   Kind   |\r
+  '---------------------------'------'----------'\r
+  |+32                        |+12   |+8        |+0\r
+\r
+\r
+- **Kind** - A value from the following enum:\r
+\r
+.. code-block:: c++\r
+\r
+  enum class SimpleTypeKind : uint32_t {\r
+    None = 0x0000,          // uncharacterized type (no type)\r
+    Void = 0x0003,          // void\r
+    NotTranslated = 0x0007, // type not translated by cvpack\r
+    HResult = 0x0008,       // OLE/COM HRESULT\r
+\r
+    SignedCharacter = 0x0010,   // 8 bit signed\r
+    UnsignedCharacter = 0x0020, // 8 bit unsigned\r
+    NarrowCharacter = 0x0070,   // really a char\r
+    WideCharacter = 0x0071,     // wide char\r
+    Character16 = 0x007a,       // char16_t\r
+    Character32 = 0x007b,       // char32_t\r
+\r
+    SByte = 0x0068,       // 8 bit signed int\r
+    Byte = 0x0069,        // 8 bit unsigned int\r
+    Int16Short = 0x0011,  // 16 bit signed\r
+    UInt16Short = 0x0021, // 16 bit unsigned\r
+    Int16 = 0x0072,       // 16 bit signed int\r
+    UInt16 = 0x0073,      // 16 bit unsigned int\r
+    Int32Long = 0x0012,   // 32 bit signed\r
+    UInt32Long = 0x0022,  // 32 bit unsigned\r
+    Int32 = 0x0074,       // 32 bit signed int\r
+    UInt32 = 0x0075,      // 32 bit unsigned int\r
+    Int64Quad = 0x0013,   // 64 bit signed\r
+    UInt64Quad = 0x0023,  // 64 bit unsigned\r
+    Int64 = 0x0076,       // 64 bit signed int\r
+    UInt64 = 0x0077,      // 64 bit unsigned int\r
+    Int128Oct = 0x0014,   // 128 bit signed int\r
+    UInt128Oct = 0x0024,  // 128 bit unsigned int\r
+    Int128 = 0x0078,      // 128 bit signed int\r
+    UInt128 = 0x0079,     // 128 bit unsigned int\r
+\r
+    Float16 = 0x0046,                 // 16 bit real\r
+    Float32 = 0x0040,                 // 32 bit real\r
+    Float32PartialPrecision = 0x0045, // 32 bit PP real\r
+    Float48 = 0x0044,                 // 48 bit real\r
+    Float64 = 0x0041,                 // 64 bit real\r
+    Float80 = 0x0042,                 // 80 bit real\r
+    Float128 = 0x0043,                // 128 bit real\r
+\r
+    Complex16 = 0x0056,                 // 16 bit complex\r
+    Complex32 = 0x0050,                 // 32 bit complex\r
+    Complex32PartialPrecision = 0x0055, // 32 bit PP complex\r
+    Complex48 = 0x0054,                 // 48 bit complex\r
+    Complex64 = 0x0051,                 // 64 bit complex\r
+    Complex80 = 0x0052,                 // 80 bit complex\r
+    Complex128 = 0x0053,                // 128 bit complex\r
+\r
+    Boolean8 = 0x0030,   // 8 bit boolean\r
+    Boolean16 = 0x0031,  // 16 bit boolean\r
+    Boolean32 = 0x0032,  // 32 bit boolean\r
+    Boolean64 = 0x0033,  // 64 bit boolean\r
+    Boolean128 = 0x0034, // 128 bit boolean\r
+  };\r
+\r
+- **Mode** - A value from the following enum:\r
+\r
+.. code-block:: c++\r
+\r
+  enum class SimpleTypeMode : uint32_t {\r
+    Direct = 0,        // Not a pointer\r
+    NearPointer = 1,   // Near pointer\r
+    FarPointer = 2,    // Far pointer\r
+    HugePointer = 3,   // Huge pointer\r
+    NearPointer32 = 4, // 32 bit near pointer\r
+    FarPointer32 = 5,  // 32 bit far pointer\r
+    NearPointer64 = 6, // 64 bit near pointer\r
+    NearPointer128 = 7 // 128 bit near pointer\r
+  };\r
+  \r
+Note that for pointers, the bitness is represented in the mode.  So a ``void*``\r
+would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32 bits\r
+but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64 bits.\r
+\r
+By convention, the type index for ``std::nullptr_t`` is constructed the same way\r
+as the type index for ``void*``, but using the bitless enumeration value\r
+``NearPointer``.\r
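Taken together, these rules let a type index be classified with a few bit operations. The following sketch assumes the field positions shown in the diagram above, with Kind in bits 0-7 and Mode in bits 8-11. It also assumes a default ``TypeIndexBegin`` of 0x1000. ``DecodedTypeIndex``, ``decodeTypeIndex`` and ``makeSimpleTypeIndex`` are hypothetical names, not an existing API.

```cpp
#include <cstdint>

// Hypothetical decoded view of a 32-bit type index.
struct DecodedTypeIndex {
  bool HighBitSet; // Sufficient (not necessary) signal for an IPI index.
  bool IsSimple;   // True if the index is a Mode/Kind bitmask.
  uint32_t Index;  // The index with the high bit cleared.
  uint32_t Mode;   // SimpleTypeMode (bits 8-11); valid only if IsSimple.
  uint32_t Kind;   // SimpleTypeKind (bits 0-7); valid only if IsSimple.
};

DecodedTypeIndex decodeTypeIndex(uint32_t TI, uint32_t TypeIndexBegin = 0x1000) {
  DecodedTypeIndex D{};
  D.HighBitSet = (TI & 0x80000000u) != 0;
  D.Index = TI & 0x7FFFFFFFu; // Clear the high bit before classifying.
  D.IsSimple = D.Index < TypeIndexBegin;
  if (D.IsSimple) {
    D.Kind = D.Index & 0xFFu;
    D.Mode = (D.Index >> 8) & 0xFu;
  }
  return D;
}

// Builds a simple (reserved) type index from its Mode and Kind components.
constexpr uint32_t makeSimpleTypeIndex(uint32_t Mode, uint32_t Kind) {
  return (Mode << 8) | Kind;
}
```

For example, ``makeSimpleTypeIndex(6, 0x0003)`` (``NearPointer64``, ``Void``) yields 0x0603, the 64-bit ``void*``. Similarly, ``makeSimpleTypeIndex(1, 0x0003)`` yields 0x0103, which is the ``std::nullptr_t`` index under the convention described above.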
+\r
+\r
+\r
+.. _tpi_header:\r
+\r
+Stream Header\r
+=============\r
+At offset 0 of the TPI Stream is a header with the following layout:\r
+\r
+\r
+.. code-block:: c++\r
+\r
+  struct TpiStreamHeader {\r
+    uint32_t Version;\r
+    uint32_t HeaderSize;\r
+    uint32_t TypeIndexBegin;\r
+    uint32_t TypeIndexEnd;\r
+    uint32_t TypeRecordBytes;\r
+\r
+    uint16_t HashStreamIndex;\r
+    uint16_t HashAuxStreamIndex;\r
+    uint32_t HashKeySize;\r
+    uint32_t NumHashBuckets;\r
+\r
+    int32_t HashValueBufferOffset;\r
+    uint32_t HashValueBufferLength;\r
+    \r
+    int32_t IndexOffsetBufferOffset;\r
+    uint32_t IndexOffsetBufferLength;\r
+\r
+    int32_t HashAdjBufferOffset;\r
+    uint32_t HashAdjBufferLength;\r
+  };\r
+  \r
+- **Version** - A value from the following enum.\r
+\r
+.. code-block:: c++\r
+\r
+  enum class TpiStreamVersion : uint32_t {\r
+    V40 = 19950410,\r
+    V41 = 19951122,\r
+    V50 = 19961031,\r
+    V70 = 19990903,\r
+    V80 = 20040203,\r
+  };\r
+\r
+Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be\r
+``V80``, and no other values have been observed.  It is assumed that should\r
+another value be observed, the layout described by this document may not be\r
+accurate.\r
+\r
+- **HeaderSize** - ``sizeof(TpiStreamHeader)``\r
+  \r
+- **TypeIndexBegin** - The numeric value of the type index representing the\r
+  first type record in the TPI stream.  This is usually 0x1000, because type\r
+  indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for\r
+  a discussion of reserved type indices).\r
+  \r
+- **TypeIndexEnd** - One greater than the numeric value of the type index\r
+  representing the last type record in the TPI stream.  The total number of type\r
+  records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.\r
+  \r
+- **TypeRecordBytes** - The number of bytes of type record data following the header.\r
+  \r
+- **HashStreamIndex** - The index of a stream which contains a list of hashes for\r
+  every type record.  This value may be -1 (``0xFFFF``), indicating that hash\r
+  information is not present.  In practice a valid stream index is always observed,\r
+  so producers should emit this stream to ensure compatibility with tools which\r
+  may expect it to be present.\r
+  \r
+- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate\r
+  hash table, although this has not been observed in practice and it's unclear what it\r
+  might be used for.\r
+  \r
+- **HashKeySize** - The size of a hash value (usually 4 bytes).\r
+\r
+- **NumHashBuckets** - The number of buckets used to generate the hash values in the\r
+  aforementioned hash streams.\r
+\r
+- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within\r
+  the TPI Hash Stream of the list of hash values.  It should be assumed that there\r
+  are either 0 hash values, or a number equal to the number of type records in the\r
+  TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if ``HashValueBufferLength``\r
+  is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, the\r
+  PDB can be considered malformed.\r
+\r
+- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size\r
+  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list of\r
+  pairs of ``uint32_t`` values where the first value is a :ref:`Type Index <type_indices>`\r
+  and the second value is the offset in the type record data of the type with this\r
+  index.  A lookup can binary-search these pairs and then scan forward linearly\r
+  from the matching record, giving roughly O(log n) lookup by type index.\r
+\r
+- **HashAdjBufferOffset / HashAdjBufferLength** - \r
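The checks described for these fields can be combined into a small validation routine. This is a minimal sketch with three assumptions. First, it assumes a little-endian host, so the header can be copied directly with ``memcpy``. Second, it accepts only ``V80``. Third, it does not open the TPI Hash Stream, so it checks only the hash value buffer length, not the buffer's contents. ``readTpiHeader`` and ``TpiVersionV80`` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the on-disk layout documented above (all fields little-endian).
struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;
  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;
  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;
  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;
  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};
static_assert(sizeof(TpiStreamHeader) == 56, "unexpected padding");

constexpr uint32_t TpiVersionV80 = 20040203;

// Copies the header from the start of a TPI/IPI stream and applies the
// consistency checks described above. Assumes a little-endian host.
bool readTpiHeader(const unsigned char *Data, std::size_t Size,
                   TpiStreamHeader &H) {
  if (Size < sizeof(TpiStreamHeader))
    return false;
  std::memcpy(&H, Data, sizeof(H));
  if (H.Version != TpiVersionV80 || H.HeaderSize != sizeof(TpiStreamHeader))
    return false;
  if (H.TypeIndexEnd < H.TypeIndexBegin)
    return false;
  // TypeRecordBytes of record data must follow the header.
  if (Size - sizeof(TpiStreamHeader) < H.TypeRecordBytes)
    return false;
  uint32_t NumRecords = H.TypeIndexEnd - H.TypeIndexBegin;
  // Either no hash values, or exactly one HashKeySize-sized value per record.
  uint64_t Expected = uint64_t(NumRecords) * H.HashKeySize;
  return H.HashValueBufferLength == 0 || H.HashValueBufferLength == Expected;
}
```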
+\r
+.. _tpi_records:\r
+\r
+CodeView Type Record List\r
+=========================\r
+Following the header, there are ``TypeRecordBytes`` bytes of data that represent a\r
+variable-length array of :doc:`CodeView type records <CodeViewTypes>`.  The number\r
+of such records (i.e. the length of the array) can be determined by computing the\r
+value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.\r
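Walking the record list requires only each record's length prefix. The sketch below assumes the usual CodeView record prefix: a little-endian ``uint16_t`` length, which does not count the length field itself, followed by a ``uint16_t`` leaf kind. See :doc:`CodeView Type Records <CodeViewTypes>` for the authoritative layout. ``RecordInfo`` and ``walkTypeRecords`` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one record found while walking the list.
struct RecordInfo {
  uint32_t TypeIndex; // Index assigned by position in the list.
  uint32_t Offset;    // Offset of the record within the record data.
  uint16_t Kind;      // Leaf kind of the record.
};

// Walks Bytes bytes of record data, assuming each record starts with a
// little-endian uint16 length (excluding the length field itself) followed
// by a uint16 leaf kind. Returns an empty vector if the data is malformed.
std::vector<RecordInfo> walkTypeRecords(const uint8_t *Data, uint32_t Bytes,
                                        uint32_t TypeIndexBegin) {
  std::vector<RecordInfo> Out;
  uint32_t Off = 0;
  while (Off < Bytes) {
    if (Bytes - Off < 4)
      return {}; // Not enough room for a length and a kind.
    uint16_t Len = uint16_t(Data[Off] | (Data[Off + 1] << 8));
    uint16_t Kind = uint16_t(Data[Off + 2] | (Data[Off + 3] << 8));
    if (Len < 2 || Bytes - Off - 2 < Len)
      return {}; // Length too small to hold the kind, or runs past the end.
    Out.push_back({TypeIndexBegin + uint32_t(Out.size()), Off, Kind});
    Off += 2 + Len;
  }
  return Out;
}
```

A reader can then compare the number of records returned against ``Header.TypeIndexEnd - Header.TypeIndexBegin`` as an additional consistency check.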
+\r
+O(log n) random access is provided by way of the Type Index Offsets Buffer (if\r
+present) described previously.
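The lookup described above works in two steps. First, binary-search the Type Index Offsets Buffer for the last entry whose type index does not exceed the requested one. Then step forward record by record from that entry's offset. The cost is O(log m) for the search over m entries, plus a linear scan bounded by the number of records between adjacent entries. This sketch assumes the 2-byte CodeView length prefix, which does not count itself. ``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One entry of the Type Index Offsets Buffer.
struct TypeIndexOffset {
  uint32_t Type;   // Type index of the record at Offset.
  uint32_t Offset; // Offset of that record within the type record data.
};

// Finds the offset of record TI within the type record data (Data, Bytes).
// Index must be sorted by Type. Returns false if TI cannot be located.
bool findRecordOffset(const std::vector<TypeIndexOffset> &Index,
                      const uint8_t *Data, uint32_t Bytes, uint32_t TI,
                      uint32_t &OffsetOut) {
  // Binary search: first entry with Type > TI, then step back one.
  auto It = std::upper_bound(
      Index.begin(), Index.end(), TI,
      [](uint32_t V, const TypeIndexOffset &E) { return V < E.Type; });
  if (It == Index.begin())
    return false; // TI precedes every indexed record.
  --It;

  // Linear scan: skip whole records until reaching TI.
  uint32_t Cur = It->Type;
  uint32_t Off = It->Offset;
  while (Cur < TI) {
    if (Off > Bytes || Bytes - Off < 4)
      return false;
    uint16_t Len = uint16_t(Data[Off] | (Data[Off + 1] << 8));
    Off += 2 + Len; // The length prefix does not count itself.
    ++Cur;
  }
  if (Off > Bytes || Bytes - Off < 4)
    return false; // TI lies past the end of the record data.
  OffsetOut = Off;
  return true;
}
```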
+\ No newline at end of file
diff --git a/docs/PDB/index.rst b/docs/PDB/index.rst

index 5300588b1d8aee8f622f4f980aa7c95fd5a2cca9..d1b4379dbf8d1e64b5bd7dc9090921fa86329633 100644
--- a/docs/PDB/index.rst
+++ b/docs/PDB/index.rst
@@ -100,7 +100,8 @@ PDB file is as follows:
  |                    |                              | - Indices of public / global streams      |\r
  |                    |                              | - Section Contribution Information        |\r
  |                    |                              | - Source File Information                 |\r
-|                    |                              | - FPO / PGO Data                          |\r
+|                    |                              | - References to streams containing        |\r
+|                    |                              |   FPO / PGO Data                          |\r
  +--------------------+------------------------------+-------------------------------------------+\r
  | IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |\r
  |                    |                              | - Index of IPI Hash Stream                |\r
@@ -108,8 +109,8 @@ PDB file is as follows:
  | /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |\r
  |                    |   Named Stream map           |                                           |\r
  +--------------------+------------------------------+-------------------------------------------+\r
-| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |\r
-|                    |   Named Stream map           |                                           |\r
+| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |\r
+|                    |   Named Stream map           |   (e.g. natvis files)                     |\r
  +--------------------+------------------------------+-------------------------------------------+\r
  | /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |\r
  |                    |   Named Stream map           |   string de-duplication                   |\r
@@ -120,7 +121,7 @@ PDB file is as follows:
  | Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |\r
  |                    |                              | - Index of Public Hash Stream             |\r
  +--------------------+------------------------------+-------------------------------------------+\r
-| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |\r
+| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |\r
  |                    |                              | - Index of Global Hash Stream             |\r
  +--------------------+------------------------------+-------------------------------------------+\r
  | TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |\r
@@ -129,6 +130,10 @@ PDB file is as follows:
  | IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |\r
  |                    |                              |   by name                                 |\r
  +--------------------+------------------------------+-------------------------------------------+\r
+| * LINKER* Stream   | - Last Stream in PDB File    | - Executable section information          |\r
+|                    |                              | - Incremental linking thunks              |\r
+|                    |                              | - Linker version information              |\r
++--------------------+------------------------------+-------------------------------------------+\r
  \r
  More information about the structure of each of these can be found on the\r
  following pages:\r