.SM ASCII ,
.I file
attempts to guess its language.
-The language tests look for particular strings that can appear
-anywhere in the first few blocks of the file.
+The language tests look for particular strings (cf. \fInames.h\fP)
+that can appear anywhere in the first few blocks of a file.
For example, the keyword
-.I dimension
-indicates that the file is most likely a \s-1FORTRAN\s0 program,
+.I .br
+indicates that the file is most likely a troff input file,
just as the keyword
.I struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
-The keywords used in the language tests are stored in the
-source file
-.I names.h .
-The routine that handles the language tests also tests for some miscellany
+The language test routines also test for some miscellany
(such as
.I tar
-archives) and determines whether an unknown file should be
+archives) and determine whether an unknown file should be
labelled as `ascii text' or `data'.
.PP
Use
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
+.PP
+The one significant difference is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.br
+>10 string language impress (imPRESS data)
+.br
+in an existing magic file would have to be changed to
+.br
+>10 string language\e impress (imPRESS data)
.SH HISTORY
There has been a
.I file
(man page dated January, 1975).
The System V version introduced one significant major change:
the external list of magic number types.
-This slowed the program down slightly but made it significantly
-more flexible.
+This slowed the program down slightly but made it a lot more flexible.
.PP
This program, based on the System V version,
was written by Ian Darwin without looking at anybody else's source code
(I looked at one later, and was glad I hadn't!).
+.PP
+John Gilmore revised the code extensively, making it better than
+the first version.
+The program has undergone continued evolution since.
.SH NOTICE
-This program is copyright \(co 1986, Ian Darwin.
-.B "Unmodified copies
-of the source may be freely copied for any purpose
-provided this notice is maintained.
-.B "Redistribution of modified source copies is prohibited;
-redistribute the original source with your changes as diff listings.
-Send the author your changes; if I like 'em I'll incorporate them.
-.PP
-Author's addresses:
-Postal: Box 603, Station F, Toronto, CANADA M4Y 2L8;
-uUCp: {utzoo|ihnp4}!darwin!ian;
-InterNet: ian@sq.com.
-.PP
-A few files (such as
-.I is_tar.c ,
-.I strtok.c ,
-and
-.I strtol.c )
-are not covered by the above restrictions.
-.SH BUGS
-You can't use `-' (or a null argument list) to determine the file type
-of the standard input, since the program insists on doing a
-.I stat (2)
-call to glean some information about the file.
+Copyright (c) Ian F. Darwin, 1986 and 1987.
+Written by Ian F. Darwin, {utzoo|ihnp4}!darwin!ian.
+Postal: P.O. Box 603, Station F, Toronto, CANADA M4Y 2L8;
+.I Strtok.c
+written by and copyright by Henry Spencer, utzoo!henry.
+.PP
+This software is not subject to any license of the American Telephone
+and Telegraph Company or of the Regents of the University of California.
+.PP
+Permission is granted to anyone to use this software for any purpose on
+any computer system, and to alter it and redistribute it freely, subject
+to the following restrictions:
+.PP
+1. The author is not responsible for the consequences of use of this
+software, no matter how awful, even if they arise from flaws in it.
+.PP
+2. The origin of this software must not be misrepresented, either by
+explicit claim or by omission. Since few users ever read sources,
+credits must appear in the documentation.
+.PP
+3. Altered versions must be plainly marked as such, and must not be
+misrepresented as being the original software. Since few users
+ever read sources, credits must appear in the documentation.
+.PP
+4. This notice may not be removed or altered.
+.PP
+The file
+.I is_tar.c
+was written by John Gilmore from his public-domain
+.I tar
+program, and is not covered by the above restrictions.
+.SH MAGIC DIRECTORY
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order in which
+they are assembled may be incorrect.
+Keep your old magic file around for comparison purposes
+(rename it to /etc/magic.old).
+.PP
+The author of this program will be glad to receive your additional
+or corrected magic file entries.
.PP
+There must be a way to automate the construction of the Magic
+file from all the glop in magdir. What is it?
+.PP
+A consolidation of magic file entries will be distributed periodically.
+.SH BUGS
.I File
uses several algorithms that favor speed over accuracy,
thus it is often misled about the contents of ASCII files.
The support for ASCII files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.PP
+Should there be an ``else'' clause to follow a series of
+continuation lines?
+.PP
+It might be worthwhile to implement recursive file inspection,
+so that compressed files, uuencoded files, etc., can be reported as
+``compressed ascii text'', ``compressed executable'',
+``compressed tar archive'' or whatever.
+.PP
+The magic file and keywords should have regular expression support.
+.PP
+It might be advisable to allow upper-case letters in keywords
+to distinguish, e.g., troff commands from man page macros.
+Regular expression support would make this easy.
+.PP
+The program doesn't grok Fortran.
+It should be able to recognize Fortran by seeing keywords that
+appear indented at the start of a line.
+Regular expression support would make this easy.
+.PP
+The list of keywords in
+.I ascmagic
+probably belongs in the Magic file.
+This could be done simply by using some keyword like `*'
+for the offset value.
+.PP
+The program should malloc the magic file structures,
+rather than using an array as at present.
+.PP
The magic file should be compiled into a binary
(or better yet, fixed-length ASCII strings
for use in heterogenous network environments)
But then there would have to be yet another magic number for the
.I magic.out
file.
+.PP
+Another optimisation would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc., once we
+have fetched it. Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
+.PP
+The program should provide a way to give an estimate of
+``how good'' a guess is.
+We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
+they are not as good as other guesses (e.g. ``Newsgroups:'' versus
+``Return-Path:''). Still, if the others don't pan out, it should be
+possible to use the first guess.