A four-byte value (on most systems) in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
+.IP regex
+A regular expression match in extended POSIX regular expression syntax
+(much like egrep).
+The type specification can be optionally followed by
+.B /c
+for case-insensitive matches.
+The regular expression is always
+tested against the first
+.B N
+lines, where
+.B N
+is the given offset, thus it
+is only useful for (single-byte encoded) text.
+.B ^
+and
+.B $
+will match the beginning and end of individual lines, respectively,
+not beginning and end of file.
+.IP search
+A literal string search starting at the given offset. It must be followed by
+.B /<number>
+which specifies how many matches shall be attempted (the range).
+This is suitable for searching larger binary expressions with variable
+offsets, using
+.B \e
+escapes for special characters.
.RE
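+.PP
+As an illustrative sketch of these two types (the patterns, offsets and
+messages are made up for this example and are not taken from any shipped
+magic file):
+.sp
+.nf
+    # case-insensitive match confined to the first line of the file
+    1    regex/c       ^#!.*python    hypothetical Python script text
+    # literal search from offset 0, attempting up to 256 matches
+    0    search/256    MARKER\e0      hypothetical data with MARKER tag
+.fi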
.PP
The numeric types may optionally be followed by
to specify that the value from the file must be greater than the specified
value,
.BR & ,
-to specify that the value from the file must have set all of the bits
+to specify that the value from the file must have set all of the bits
that are set in the specified value,
.BR ^ ,
-to specify that the value from the file must have clear any of the bits
+to specify that the value from the file must have clear any of the bits
that are set in the specified value, or
.BR x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.BR = .
+For all tests except
+.B string
+and
+.BR regex ,
+operation
+.BR !
+specifies that the line matches if the test does
+.B not
+succeed.
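+.IP
+As a hedged illustration of these operators (the signature, offsets and
+messages below are invented for this example and do not describe a real
+file format):
+.sp
+.nf
+    0    string    XYZ      hypothetical XYZ data
+    # matches only if bits 0 and 1 are both set in the byte at offset 4
+    >4   byte      &0x03    \eb, both flags set
+    # matches if the little-endian short at offset 6 is anything but 0
+    >6   leshort   !0       \eb, non-empty
+    # always matches; the value read is printed through the format string
+    >5   byte      x        \eb, version %d
+.fi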
.IP
Numeric values are specified in C form; e.g.
.B 13
is decimal,
.B 013
is octal, and
.B 0x13
is hexadecimal.
.IP
For string values, the byte string from the
-file must match the specified byte string.
+file must match the specified byte string.
The operators
.BR = ,
.B <
performed) is printed using the message as the format string.
.PP
Some file formats contain additional information which is to be printed
-along with the file type.
-A line which begins with the character
+along with the file type or need additional tests to determine the true
+file type.
+These additional tests are introduced by one or more
.B >
-indicates additional tests and messages to be printed.
+characters preceding the offset.
The number of
.B >
on the line indicates the level of the test; a line with no
.B >
at the beginning is considered to be at level 0.
-Each line at level
-.IB n \(pl1
-is under the control of the line at level
+Tests are arranged in a tree-like hierarchy:
+If the test on a line at level
.IB n
-most closely preceding it in the magic file.
-If the test on a line at level
-.I n
-succeeds, the tests specified in all the subsequent lines at level
-.IB n \(pl1
-are performed, and the messages printed if the tests succeed.
-The next line at level
-.I n
-terminates this.
+succeeds, all following tests at level
+.IB n+1
+are performed, and the messages printed if the tests succeed, until a line
+with level
+.IB n
+(or less) appears.
+For more complex files, one can use empty messages to get just the
+"if/then" effect, in the following way:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort <0x40 MS-DOS executable
+ >0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
+.fi
+.PP
+Offsets do not need to be constant, but can also be read from the file
+being examined.
If the first character following the last
.B >
is a
in the file.
Indirect offsets are of the form:
.BI (( x [.[bslBSL]][+\-][ y ]).
-The value of
+The value of
.I x
is used as an offset in the file. A byte, short or long is read at that offset
-depending on the
-.B [bslBSL]
+depending on the
+.B [bslBSL]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
.PP
-Sometimes you do not know the exact offset as this depends on the length of
-preceding fields.
-You can specify an offset relative to the end of the
-last uplevel field (of course this may only be done for sublevel tests, i.e.
-test beginning with
-.B >
-).
-Such a relative offset is specified using
-.B &
-as a prefix to the offset.
+That way variable length structures can be examined:
+.sp
+.nf
+ # MS Windows executables are also valid MS-DOS executables
+ 0 string MZ
+ >0x18 leshort <0x40 MZ executable (MS-DOS)
+ # skip the whole block below if it is not an extended executable
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ >>(0x3c.l) string LX\e0\e0 LX executable (OS/2)
+.fi
+.PP
+This strategy of examining has one drawback: you must make sure that
+you eventually print something, or users may get empty output (for example,
+when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
+.PP
+If this indirect offset cannot be used as-is, simple calculations are
+possible: appending
+.BI [+-*/%&|^]<number>
+inside parentheses allows one to modify
+the value read from the file before it is used as an offset:
+.sp
+.nf
+ # MS Windows executables are also valid MS-DOS executables
+ 0 string MZ
+ # sometimes, the value at 0x18 is less than 0x40 but there's still an
+ # extended executable, simply appended to the file
+ >0x18 leshort <0x40
+ >>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
+ >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+.fi
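+.PP
+As a worked example of this arithmetic, assume (purely for illustration)
+that the little-endian short stored at offset 4 holds the value 3:
+.sp
+.nf
+    (4.s*512)  ->  3 * 512 = 1536 = 0x600
+    # the leshort test then reads the two bytes at 0x600 and 0x601
+.fi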
+.PP
+Sometimes you do not know the exact offset as this depends on the length or
+position (when indirection was used before) of preceding fields. You can
+specify an offset relative to the end of the last uplevel field using
+.BI &
+as a prefix to the offset:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ # immediately following the PE signature is the CPU type
+ >>>&0 leshort 0x14c for Intel 80386
+ >>>&0 leshort 0x184 for DEC Alpha
+.fi
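+.PP
+To make the relative offset concrete, assume (purely for illustration)
+that the long stored at 0x3c holds 0x80:
+.sp
+.nf
+    (0x3c.l)   ->  0x80           start of the PE\e0\e0 signature
+    0x80 + 4   ->  0x84           end of the four-byte match
+    &0         ->  0x84 + 0       where the CPU type leshort is read
+.fi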
+.PP
+Indirect and relative offsets can be combined:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort <0x40
+ >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+ # if it's not COFF, go back 512 bytes and add the offset taken
+ # from byte 2/3, which is yet another way of finding the start
+ # of the extended executable
+ >>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
+.fi
+.PP
+Or the other way around:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+ # at offset 0x80 (-4, since relative offsets start at the end
+ # of the uplevel match) inside the LE header, we find the absolute
+ # offset to the code area, where we look for a specific signature
+ >>>(&0x7c.l+0x26) string UPX \eb, UPX compressed
+.fi
+.PP
+Or even both!
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+ # at offset 0x58 inside the LE header, we find the relative offset
+ # to a data area where we look for a specific signature
+ >>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
+.fi
+.PP
+Finally, if you have to deal with offset/length pairs in your file, even the
+second value in a parenthesized expression can be taken from the file itself,
+using another set of parentheses. Note that this additional indirect offset
+is always relative to the start of the main indirect offset.
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ # search for the PE section called ".idata"...
+ >>>&0xf4 search/0x140 .idata
+ # ...and go to the end of it, calculated from start+length;
+ # these are located 14 and 10 bytes after the section name
+ >>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
+.fi
.SH BUGS
-The formats
+The formats
.IR long ,
.IR belong ,
.IR lelong ,
and
.I ledate
are system-dependent; perhaps they should be specified as a number
-of bytes (2B, 4B, etc),
+of bytes (2B, 4B, etc),
since the files being recognized typically come from
a system on which the lengths are invariant.
.PP
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
-.\"
+.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.
-.\" @(#)$Id: magic.man,v 1.27 2003/09/12 19:43:30 christos Exp $
+.\" @(#)$Id: magic.man,v 1.28 2005/03/17 17:34:15 christos Exp $