From 69d81304fd322d440b25ef307df6d5ec1a89917d Mon Sep 17 00:00:00 2001
From: Ian Darwin
Date: Fri, 11 Sep 1987 14:35:01 +0000
Subject: [PATCH] Consolidate TODO file in BUGS section here. Use much newer legalnotice. Credit Henry for strtok.c. Add section for MAGIC directory.

---
 doc/file.man | 143 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 108 insertions(+), 35 deletions(-)

diff --git a/doc/file.man b/doc/file.man
index fe82be8d..91bc0b51 100644
--- a/doc/file.man
+++ b/doc/file.man
@@ -77,23 +77,20 @@ If an argument appears to be
 .SM ASCII ,
 .I file
 attempts to guess its language.
-The language tests look for particular strings that can appear
-anywhere in the first few blocks of the file.
+The language tests look for particular strings (cf. \fInames.h\fP)
+that can appear anywhere in the first few blocks of a file.
 For example, the keyword
-.I dimension
-indicates that the file is most likely a \s-1FORTRAN\s0 program,
+.I .br
+indicates that the file is most likely a troff input file,
 just as the keyword
 .I struct
 indicates a C program.
 These tests are less reliable than the previous two groups,
 so they are performed last.
-The keywords used in the language tests are stored in the
-source file
-.I names.h .
-The routine that handles the language tests also tests for some miscellany
+The language test routines also test for some miscellany
 (such as
 .I tar
-archives) and determines whether an unknown file should be
+archives) and determine whether an unknown file should be
 labelled as `ascii text' or `data'.
 .PP
 Use
@@ -129,6 +126,16 @@ contained therein.
 Its behaviour is mostly compatible with the System V program of the same name.
 This version knows more magic, however, so it will produce
 different (albeit more accurate) output in many cases.
+.PP
+The one significant difference is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.br
+>10 string language impress (imPRESS data)
+.br
+in an existing magic file would have to be changed to
+.br
+>10 string language\e impress (imPRESS data)
 .SH HISTORY
 There has been a
 .I file
@@ -136,38 +143,62 @@ command in every UNIX since at least Research Version 6
 (man page dated January, 1975).
 The System V version introduced one significant major change:
 the external list of magic number types.
-This slowed the program down slightly but made it significantly
-more flexible.
+This slowed the program down slightly but made it a lot more flexible.
 .PP
 This program, based on the System V version,
 was written by Ian Darwin without looking at anybody else's source code
 (I looked at one later, and was glad I hadn't!).
+.PP
+John Gilmore revised the code extensively, making it better than
+the first version.
+The program has undergone continued evolution since.
 .SH NOTICE
-This program is copyright \(co 1986, Ian Darwin.
-.B "Unmodified copies
-of the source may be freely copied for any purpose
-provided this notice is maintained.
-.B "Redistribution of modified source copies is prohibited;
-redistribute the original source with your changes as diff listings.
-Send the author your changes; if I like 'em I'll incorporate them.
-.PP
-Author's addresses:
-Postal: Box 603, Station F, Toronto, CANADA M4Y 2L8;
-uUCp: {utzoo|ihnp4}!darwin!ian;
-InterNet: ian@sq.com.
-.PP
-A few files (such as
-.I is_tar.c ,
-.I strtok.c ,
-and
-.I strtol.c )
-are not covered by the above restrictions.
-.SH BUGS
-You can't use `-' (or a null argument list) to determine the file type
-of the standard input, since the program insists on doing a
-.I stat (2)
-call to glean some information about the file.
+Copyright (c) Ian F. Darwin, 1986 and 1987.
+Written by Ian F. Darwin, {utzoo|ihnp4}!darwin!ian.
+Postal: P.O. Box 603, Station F, Toronto, CANADA M4Y 2L8.
+.I strtok.c
+was written by and is copyright by Henry Spencer, utzoo!henry.
+.PP
+This software is not subject to any license of the American Telephone
+and Telegraph Company or of the Regents of the University of California.
+.PP
+Permission is granted to anyone to use this software for any purpose on
+any computer system, and to alter it and redistribute it freely, subject
+to the following restrictions:
+.PP
+1. The author is not responsible for the consequences of use of this
+software, no matter how awful, even if they arise from flaws in it.
+.PP
+2. The origin of this software must not be misrepresented, either by
+explicit claim or by omission. Since few users ever read sources,
+credits must appear in the documentation.
+.PP
+3. Altered versions must be plainly marked as such, and must not be
+misrepresented as being the original software. Since few users
+ever read sources, credits must appear in the documentation.
+.PP
+4. This notice may not be removed or altered.
+.PP
+The file
+.I is_tar.c
+was written by John Gilmore from his public-domain
+.I tar
+program, and is not covered by the above restrictions.
+.SH MAGIC DIRECTORY
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order that
+they are put together may be incorrect.
+Keep your old magic file around for comparison purposes
+(rename it to /etc/magic.old).
+.PP
+The author of this program will be glad to receive your additional
+or corrected magic file entries.
 .PP
+There must be a way to automate the construction of the Magic
+file from all the glop in magdir. What is it?
+.PP
+A consolidation of magic file entries will be distributed periodically.
+.SH BUGS
 .I File
 uses several algorithms that favor speed over accuracy,
 thus it is often misled about the contents of ASCII files.
@@ -175,6 +206,34 @@ thus it is often misled about the contents of ASCII files.
 The support for ASCII files (primarily for programming languages)
 is simplistic, inefficient and requires recompilation to update.
 .PP
+Should there be an ``else'' clause to follow a series of
+continuation lines?
+.PP
+It might be worthwhile to implement recursive file inspection,
+so that compressed files, uuencoded files, etc., can say ``compressed
+ascii text'' or ``compressed executable'' or ``compressed tar archive''
+or whatever.
+.PP
+The magic file and keywords should have regular expression support.
+.PP
+It might be advisable to allow upper-case letters in keywords,
+e.g., to tell troff commands from man page macros.
+Regular expression support would make this easy.
+.PP
+The program doesn't grok Fortran.
+It should be able to recognize Fortran by seeing some keywords which
+appear indented at the start of a line.
+Regular expression support would make this easy.
+.PP
+The list of keywords in
+.I ascmagic
+probably belongs in the Magic file.
+This could be done simply by using some keyword like `*'
+for the offset value.
+.PP
+The program should malloc the magic file structures,
+rather than using an array as at present.
+.PP
 The magic file should be compiled into a binary
 (or better yet, fixed-length ASCII strings
 for use in heterogenous network environments)
@@ -184,3 +243,17 @@ with the flexibility of the System V version.
 But then there would have to be yet another magic number for the
 .I magic.out
 file.
+.PP
+Another optimisation would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc., once we
+have fetched it. Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
+.PP
+The program should provide a way to give an estimate of
+``how good'' a guess is.
+We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
+they are not as good as other guesses (e.g. ``Newsgroups:'' versus
+``Return-Path:''). Still, if the others don't pan out, it should be
+possible to use the first guess.
-- 
2.40.0
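The man page change in this patch says that any white space now delimits magic-file fields, so a space inside a pattern string must be escaped (in the troff source, \e prints as a backslash, so the corrected entry reads >10 string language\ impress (imPRESS data) in the magic file itself). A minimal C sketch of that splitting rule follows. It is not the parser from the file sources, which this patch does not show: the struct magic_line type, the next_field and parse_magic_line functions, and the 128-byte field size are all hypothetical, and treating a backslash as quoting whatever character follows it is a simplification of real magic-file escape handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed field size; longer fields are silently truncated. */
#define FIELD_MAX 128

/* Hypothetical representation of one magic line: offset (including any
 * leading '>' continuation marker), type, test value, and the message
 * that makes up the rest of the line. */
struct magic_line {
    char offset[FIELD_MAX];
    char type[FIELD_MAX];
    char value[FIELD_MAX];
    char message[FIELD_MAX];
};

/* Copy one whitespace-delimited field starting at *pp into out and advance
 * *pp past it. A backslash quotes the next character (a simplification),
 * so "\ " stores a literal space instead of ending the field.
 * Returns 0 if only white space remains. */
static int next_field(const char **pp, char *out, size_t outsz)
{
    const char *p = *pp;
    size_t n = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (*p == '\\' && p[1] != '\0')
            p++;                /* skip the backslash, keep what it quotes */
        if (n + 1 < outsz)
            out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    *pp = p;
    return 1;
}

/* Split a magic line into its four parts. Returns 0 on success, -1 if
 * fewer than three fields are present. */
static int parse_magic_line(const char *line, struct magic_line *m)
{
    const char *p = line;

    assert(line != NULL && m != NULL);
    if (!next_field(&p, m->offset, sizeof m->offset) ||
        !next_field(&p, m->type, sizeof m->type) ||
        !next_field(&p, m->value, sizeof m->value))
        return -1;
    while (isspace((unsigned char)*p))
        p++;
    /* The message is everything left on the line, spaces included. */
    snprintf(m->message, sizeof m->message, "%s", p);
    return 0;
}
```

Applied to the unescaped entry, the sketch yields the test value language and the message impress (imPRESS data); with the escaped space it yields the value language impress and the message (imPRESS data), which is why existing entries have to be edited.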