Not all files in directory sample/dbase are identified correctly (see
output files dbase-5.13-old.txt and dbase-mime-5.13-old.txt).
HELP.CA1 is misidentified as "DBase 3 index file" and HELP.CA3 is
misidentified as "DBase 3 data file" because the magic lines for DBase 3
inside msdos magic are too weak. In reality these files are the split parts
of HELP.zip, a zip file with 12 extra bytes of Turbo C.
For the example z-machine.bin I got, besides the right message
"Infocom (Z-machine 3, Release 9 / Serial 123456)", also a wrong
identification as "DBase 3 data file (no records)".
Also all Android files (all *.xml except sybase-ianywhere-cdx.trid.xml) are
misidentified as "DBase 3 data file".
The two files umlaut-test-v2.dbf and umlaut-test-v4.dbf are not recognized
as DBase, because the magic lines test only for version 3 of dBase, so
other versions are not found.
The first programs to create DBase files run under DOS, but nowadays such
xBase files can be created by Libre Office for example running under Linux.
So I removed the test lines in msdos (see file-5.13-msdos-xBase.diff) and
added correct test lines to database magic.
Using information from
http://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm
http://www.clicketyclick.dk/databases/xbase/format/dbf.html
http://home.f1.htw-berlin.de/scheibl/db/intern/dBase.htm
I inspect the 4-byte sequence VVYYMMDD at offset zero, where the month (MM)
must be in the range 1 to 12 and the day (DD) in the range 1 to 31. The
lowest DBase version (VV) is two. These tests skip the Infocom Z-machine
game and the Android *.xml files.
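
The plausibility check on VVYYMMDD can be sketched in Python as below. This
is only an illustration of the rule described above, not the magic
implementation; the function name is hypothetical and the sample bytes in
the usage lines are made up for illustration.

```python
def looks_like_xbase_header(data: bytes) -> bool:
    """Apply the VVYYMMDD plausibility rule to the first 4 bytes."""
    if len(data) < 4:
        return False
    version, _year, month, day = data[0], data[1], data[2], data[3]
    # Lowest known version is 2; month and day must be calendar-plausible.
    return version >= 2 and 1 <= month <= 12 and 1 <= day <= 31


# Made-up samples: a version 3 header dated (19)113-02-03, and a header
# whose month and day bytes are both zero.
print(looks_like_xbase_header(bytes([0x03, 113, 2, 3])))
print(looks_like_xbase_header(bytes([0x03, 0x00, 0x00, 0x09])))
```

A header whose month or day byte is zero is rejected even when the first
byte looks like a valid version number.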
Then I tried to test some reserved fields, which should be 0. But this
does not work for offset 30, where I found 0x3901 for T4.DBF and 0x710 for
T5.DBF and T6.DBF. So I use the byte at offset 27, which was 0 for all
inspected dBase files. This byte is reserved for multi-user dBASE, so I do
not know if my test is reliable for such exotic cases. After this test
HELP.CA3 is skipped.
Furthermore, the long at offset 24 is zero for .DBF files, whereas for
.MDX files it is a low positive number, because it holds the production
flag, the number of tags (<=0x30), the tag length (<=0x20) and a reserved
byte (NULL). After some additional tests I display the version info (VV
byte) for MDX and DBF files with the subroutine xbase-type.
For my examples VV is in the set (2,3,4,0x30,0x83). So I do not know if all
exotic xBase variants are described correctly, especially those listed in
http://msdn.microsoft.com/en-US/library/st4a0s68(v=vs.80).aspx
Then I use the subroutine xbase-date to test and print the
creation-date and update-date (YY-DD-MM). The date information is stored
only as bytes. Values below 100 should mean 19YY, but I found samples where
the correct interpretation is 20YY (for example
umlaut-LibreOffice3.6.3.dbf), which does not correspond to the
documentation. Values equal to or greater than one hundred should be
interpreted as 1900+YY, but at the moment they are displayed in an ugly way
as 1yy. So maybe somebody can fix this problem. I also display the number
and sizes of records or tags and the name of the first item.
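
The intended year interpretation can be sketched as follows. This is a
minimal sketch of the documented 1900+YY rule only; the function name is
hypothetical, and the sketch does not handle samples such as
umlaut-LibreOffice3.6.3.dbf, where a value below 100 apparently means 20YY.

```python
def xbase_year(yy: int) -> int:
    """Map the one-byte year field to a calendar year (1900 + YY)."""
    if not 0 <= yy <= 255:
        raise ValueError("year field must fit in one byte")
    return 1900 + yy


print(xbase_year(95))   # value below 100
print(xbase_year(113))  # value of 100 or more
```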
No xBase memo file (*.dbt *.DBT *.FPT) is identified correctly (see output
dbt-5.13-old.txt). These files have few characteristic sequences (0 or 3
at offset 16). So I test the values of some reserved bytes for 0 or low
values. For my inspected examples this works, but there may be samples
which contain garbage values at those positions. Furthermore I had to use
some "if-else-if" constructs to match all memo file variants, for which
the type and additional information are displayed by the subroutine
xbase-memo-print.
After these tests all memo files are identified correctly (see output
dbt-5.13-new.txt) except test-FoxPro_Enc.FPT, because it is an encrypted
file.
The dBase memo file adressen.dbt is not only misidentified as
"DBase 3 data file (no records)", but also wrongly characterised as
"VMS Alpha executable". According to comments in the vms magic file,
real VMS files should look like the examples vms-test6.bin or vms-test7.bin.
So I put the following additional magic test line inside the vms file.
>8 ubelong 0xec020000
After applying file-5.13-vms_dbt.diff only vms-test*.bin are identified
correctly as "VMS Alpha executable". Because I have no knowledge about VMS,
an expert has to check my diff file for correctness.
The xBase index file t3-CHAR.NDX is misidentified as "X11 SNF font data,
LSB first". According to information found at
http://computer-programming-forum.com/51-perl/8f22fb96d2e34bab.htm
the same 4-byte version info found at offset zero should also occur at
offset 104. So I added the additional test line
>104 lelong 00000004 X11 SNF font data, LSB first
to fonts magic.
Because I found no Server Natural Font (.SNF) file and had no bdftosnf
tool, I could not verify the fixes made by my patch
file-5.13-fonts-X11-SNF_NDX.diff
The 2 dBase memo files T5.DBT and T6.DBT are misidentified
(see output file pcx-5.13-old.txt) as "PCX ver. 2.5 image data" without
geometry information, because these files with 0xa000000 at offset 0 match
this magic line:
0 beshort 0x0a00 PCX ver. 2.5 image data
So I patched the images magic file.
According to http://de.wikipedia.org/wiki/PCX and
http://web.archive.org/web/20100206055706/http://www.qzx.com/pc-gpe/pcx.txt
I test for the byte 0x0a, the version byte (0,2,3,4,5), the compression
flag byte (0,1) and the bit depth with the line
0 ubelong&0xffF8fe00 0x0a000000
For real PCX files the bit depth at offset 3 is greater than 0. So the
sample DBT files are excluded by the additional test line
>3 ubyte >0
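
How the masked comparison and the bit depth test work together can be
sketched in Python. The mask and value are taken from the two magic lines
above; the function name and the sample bytes are hypothetical. Note that
the mask 0xF8 on the version byte accepts every value from 0 to 7, which
is slightly wider than the listed versions.

```python
import struct

PCX_MASK = 0xFFF8FE00   # 0x0a exactly, version 0..7, compression 0..1
PCX_VALUE = 0x0A000000


def looks_like_pcx(data: bytes) -> bool:
    """Mimic '0 ubelong&0xffF8fe00 0x0a000000' followed by '>3 ubyte >0'."""
    if len(data) < 4:
        return False
    (word,) = struct.unpack(">I", data[:4])  # big-endian unsigned long
    if (word & PCX_MASK) != PCX_VALUE:
        return False
    return data[3] > 0  # bit depth must be positive for a real PCX file


# Made-up PCX-like header (version 5, RLE, 8 bit) and a memo-file-like
# header that starts with 0x0a000000.
print(looks_like_pcx(bytes([0x0A, 0x05, 0x01, 0x08])))
print(looks_like_pcx(bytes([0x0A, 0x00, 0x00, 0x00])))
```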
On further inspection I saw that signed values are used, especially for
dpi, planes and coordinates, which should always be positive.
Converting 65432x10-xab.png (correctly identified as
PNG image data, 65432 x 10) with xnview to 65432x10-xab.pcx
gives the wrong "PCX ver. 3.0 image data bounding box [0, 0] - [-105, 9], 3
planes each of 8-bit colour, 72 x 72 dpi, uncompressed".
So I replaced the signed patterns by unsigned ones for such values.
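
The -105 in the bounding box is what a signed 16-bit read produces for the
right edge of a 65432 pixel wide image (maximum x coordinate 65431). A
short Python sketch, assuming the coordinate is stored as a little-endian
16-bit value as in the PCX header:

```python
import struct

raw = struct.pack("<H", 65431)             # right edge of a 65432 px image
as_signed = struct.unpack("<h", raw)[0]    # what a signed test prints
as_unsigned = struct.unpack("<H", raw)[0]  # what an unsigned test prints

print(as_signed)    # -105
print(as_unsigned)  # 65431
```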
Furthermore 60x20-v4-graytext.pcx is identified correctly as "PCX for
Windows image data", but no geometry information is displayed, because the
original magic does this only for PCX files with version "ver. 3.0 image
data". After applying file-5.13-images_pcx.diff all test files for PCX are
identified correctly (see output file pcx-5.13-new.txt).
Unfortunately SYLLABI2.CDX and SYLLABUS.CDX are wrongly identified as
"Applesoft BASIC program data" because the magic test in the apple magic
file was too general:
0 belong&0xff00ff 0x80000 Applesoft BASIC program data
If I understood the magic fragment correctly, the first line number is
stored at offset 2 as leshort. For both files this value is zero, whereas
for real Applesoft BASIC that value should be positive in my opinion.
So with my patch file file-5.13-apple-basic.diff, sent to the mailing list
on 3 February 2013, these 2 examples are no longer misidentified.
But an expert for Applesoft BASIC should double check my diff file.
In reality these files are indices of the dBase databases with the same
base name and the extension dbf.
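
The idea behind the Applesoft BASIC fix can be sketched as below. The mask
and value come from the magic line quoted above; the positive line number
test follows my description and is an assumption about what the diff does,
not a copy of it. The function name and sample bytes are hypothetical.

```python
def looks_like_applesoft(data: bytes) -> bool:
    """Mimic '0 belong&0xff00ff 0x80000' plus a positive first line number."""
    if len(data) < 4:
        return False
    word = int.from_bytes(data[:4], "big")
    if (word & 0x00FF00FF) != 0x00080000:
        return False
    # First line number, read as a little-endian short at offset 2.
    line_number = int.from_bytes(data[2:4], "little")
    return line_number > 0


print(looks_like_applesoft(bytes([0x01, 0x08, 0x0A, 0x00])))  # line 10
print(looks_like_applesoft(bytes([0x00, 0x08, 0x00, 0x00])))  # line 0
```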
The FoxPro memo file NG.FPT is misidentified as "MPEG sequence, v4"
because this file starts with the byte sequence
"000001b0 00000100 00000000"
and in animation the 2 following test lines are too general:
0 belong&0xFFFFFF00 0x00000100
>3 byte 0xB0 MPEG sequence, v4
It seems to me that the non-zero byte at offset 4 describes the variant,
for example "simple @ L1". If this is true, NG.FPT could be distinguished
from MPEG by the following additional line
>>4 byte !0 MPEG sequence, v4
Because I have no knowledge of MPEG files I added this line only as a
comment line (file-5.13-animation-NG_FPT.diff). So an expert has to review
this.
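
The proposed extra test can be sketched in Python as follows. It mirrors
the two existing magic lines plus the commented-out line at offset 4; the
function name is hypothetical, the second sample is the start of NG.FPT as
quoted above, and the first sample is made up.

```python
def looks_like_mpeg4_sequence(data: bytes) -> bool:
    """Start code 00 00 01, byte 0xB0 at offset 3, non-zero byte at 4."""
    if len(data) < 5:
        return False
    word = int.from_bytes(data[:4], "big")
    if (word & 0xFFFFFF00) != 0x00000100:
        return False
    return data[3] == 0xB0 and data[4] != 0


print(looks_like_mpeg4_sequence(bytes.fromhex("000001b001000001")))
print(looks_like_mpeg4_sequence(bytes.fromhex("000001b00000010000000000")))
```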
The dBASE memo file biblio.dbt is misidentified as "Dyalog APL"
because "0 byte 0xaa" is too weak a magic pattern.
Also Dyalog-test3.bin is characterised as "Dyalog APL workspace type 0
subtype 0 32-bit classic big-endian Dyalog APL workspace version 0 .0\012-
Dyalog APL". Three "Dyalog APL" entries do not make sense in my opinion.
Also the version format "0 .0" should appear as "n.m". Unfortunately I do
not have any knowledge about Dyalog APL. So an expert has to review the
dyadic magic file and fix this very erroneous file.
After changing database magic with file-5.13-database-xBase.diff
and applying the other patches I finally got a much more correct
classification (see output dbase-5.13-new.txt and
dbase-mime-5.13-new.txt).
There are still some files that are not identified. Still to do are the
DBASE index file *.NDX, the DBASE Compound Index file *.CDX and the dBASE IV
Printer Driver *.PRF. Maybe I will do it in the future.
All diffs, output and sample files are stored under
http://mitglied.multimania.de/jenderek/file/
keep track of white-space printed so that we can recurse properly.
handle NOSPACE in indirect and use magic.
Today's fixes.
add the initial offset so that recursive "use" invocations work.
- avoid 0 offset causing an infinite loop.
- XXX: should limit indirect nesting.
add magic.h.in to the dist files.
fix previous, reading section name.
fix incorrect offset.
fix all non-ascii characters.
bump jpeg.
Check sizeof long long from Werner Fink.
PR/233: Magic contains embedded space.
re-factor gnome.
mention fsmagic fix.
Add a space if we printed some magic.
- Warn about continuation levels which are not contiguous when increasing.
- Fix broken magic files discovered by that test.
the continuation error is a magic error.
PR/229: Fix non-portable pointer comparison code.
Need to pass the returnval that the child match determined in the use case.
This broke the elf mime printing, where softmagic returned a non-match although
the child match() actually printed something.
PR/224: Add geospatial designs recognition. Guess on little endian.