--- /dev/null
+<h2> ZZIP Future </h2> What is yet to come.
+
+<!--border--> <date> 15. July 2002 </date>
+
+<h3> ZIP-Write </h3>
+
+<P>
+ Is there anybody out there who wants to program the write-support
+ for zziplib? I simply do not have the time to do it, and no real
+ need either, but I guess it would be nice to have, for example
+ to spit out savegame files in zip format. The actual
+ programming path is almost obvious - start off with the zziplib
+ as it is, and let it open an existing zip-file. This will parse
+ the central directory into memory - including the file-offsets
+ for each file. Then, truncate the zip-realfile at the place where
+ the central-dir was found (identical to the end of the last
+ file). If a datafile is opened for writing, either add a new
+ entry or change the start-offset of the existing entry to the
+ end of the zip-realfile - the old data simply becomes junk. Then
+ init zlib to deflate the data and append it to the
+ current zip-realfile. When the zipdir-handle is closed
+ from write-mode, the zip's central-directory needs to be appended
+ to the file on disk. The same path covers creating a new zip-file:
+ it starts with an empty central-directory that can be spit out to disk.
+ During development, do not bother with creating temp-files to
+ guard against corruption from partial writes - the usual application
+ will use the zziplib to create zip savegames in one go, so no
+ "update"-operation like the one in the standalone zip command
+ utilities needs to be implemented.
+</P>
+
+<h3> readdir for subdir inside zip magicdir </h3>
+
+<P>
+ See the notes in the first paragraphs of the <a href="zzip-api.html">
+ ZZIP Programmers Interface</a> description. This would add some
+ complexity for something I have never needed so far. The question
+ came up when someone wanted to use zziplib as the backend of a
+ dynamic webserver that stores its content in compressed form,
+ possibly in the form of a php module - along with some scripted
+ functionality that walks all directories to index the hosted files.
+ I'm not going to implement that myself but perhaps someone else
+ wants to do it and send me patches for free.
+</P>
+
+<h3> obfuscation example project </h3>
+
+<P>
+ A subproject that shows <b>all</b> the steps from a dat-tree
+ to a dat-zip to an obfuscated-dat, along with build-files and
+ source-files for all helper tools needed to obfuscate and
+ deobfuscate, plus a sample program that opens the obfuscated
+ dat-file and makes some use of it. Including some extra
+ documentation, this is about 20..40 hours. Don't underestimate the
+ amount of work for it! (Otherwise it is a great student project.)
+</P>
+
+<h3> zip/unzip tool </h3>
+
+<P>
+ The infozip tools implement a full set of zip/unzip routines
+ based on internal code to access the zip-format. The zziplib
+ has its own set of zip-format routines. Still, it should be
+ possible to write a frontend to the library that implements
+ parts (if not all) of the options of the infozip zip/unzip
+ tools. Even without write-support in zziplib it would be
+ interesting to see a normal unzip-tool that does not use
+ the magic-wrappers and therefore only works on plain zip-files.
+ On the upside, such a tool would be smaller than the infozip
+ tools since it can use the library routines that are shared
+ with other tools as well. Again - don't underestimate the
+ amount of work for it, I guess 40..80 hours as there is a lot
+ of fine-tuning needed to match the infozip model.
+</P>
--- /dev/null
+<h2> The Library </h2> Overview
+
+<!--border-->
+<date> created 1.Jun.2000, last updated 09.Feb.2003 </date>
+
+<!-- 1. section of zzip-zip.html -->
+
+<P>
 The <a href="zziplib.html">zziplib library</a> is intentionally
 lightweight. It offers the ability to easily extract data from
 files archived in a single zip file, so applications can bundle
 their files into a single zip archive and access them from there.
 The implementation supports only the (free) subset of
 compression methods built on the <a href="http://www.gzip.org/zlib">
 zlib algorithm</a>, which is what the <tt>zip/unzip</tt> tools actually use.
+</P>
+
+<p align="center"><span align="center">
 The library allows reading zip archives in a number of ways:
+</span></p><dl>
+<dt>archive mode:</dt>
<dd> Read the zip directory and extract files from it.
 This is the traditional mode as seen with unzip utilities.
 Some extra unzip utilities for transparent/magic mode are
 shipped as well.
+</dd>
+<dt>replacement mode:</dt>
<dd> Use ZZIP_FILE / ZZIP_DIR pointers provided by zziplib and
 put them to work with routines originally developed to
 work with real directories and file handles. The API calls
 follow the traditional synopsis of posix/stdio.
+</dd>
+<dt>transparent mode:</dt>
+<dd> Use replacement handles and allow the open()-calls to
+ automatically detect when a file is contained in a zip
+ archive or when it is a real file in the file system.
+ A filepath can be partly in a real filesystem and partly
+ within the zip archive when one is seen.
+</dd>
<dt>ext magic:</dt>
<dd> Use the same filepath to access either a zipped or a real
 file - the library looks for a real file, and if there is none
 then every subdirectory of the path is checked with a ".zip"
 extension appended, and the zipped file is transparently
 opened. This can speed up dat-file development
 dramatically.
+</dd>
<dt>io/xor magic:</dt>
<dd> Access to the filesystem can be hooked - examples
 are given for xor obfuscation, which is great for game
 artwork and AI data. A small intro to SDLrwops usage is
 given as well.
+</dd>
+</dl>
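
<P>
 The ext magic lookup can be pictured as splitting the path at each
 slash in turn. The following is a minimal sketch in C and not the
 library's own code: the helper name zip_candidate is hypothetical,
 and the order of the candidates (longest directory prefix first) is
 an assumption made for illustration.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of zziplib: split `path` at the n-th
 * slash counted from the end. The part before the slash plus ".zip"
 * goes to `zip`, the part after it goes to `member`. Returns 1 on
 * success, 0 if there is no such slash or a buffer is too small. */
int zip_candidate(const char *path, int n,
                  char *zip, size_t zipsize,
                  char *member, size_t membersize)
{
    assert(path != NULL && zip != NULL && member != NULL);
    const char *slash = path + strlen(path);
    for (int i = 0; i <= n; i++) {
        /* walk backwards to the previous '/' */
        do {
            if (slash == path)
                return 0;            /* ran out of path components */
            slash--;
        } while (*slash != '/');
    }
    size_t prefix = (size_t)(slash - path);
    if (prefix == 0)
        return 0;                    /* a leading '/' names no directory */
    if (prefix + sizeof(".zip") > zipsize ||
        strlen(slash + 1) + 1 > membersize)
        return 0;
    memcpy(zip, path, prefix);
    strcpy(zip + prefix, ".zip");
    strcpy(member, slash + 1);
    return 1;
}
```

<P>
 For "data/levels/map1.dat" this yields "data/levels.zip" with member
 "map1.dat" first, then "data.zip" with member "levels/map1.dat". A
 caller would try each candidate only after the real file was not found.
</P>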
+
--- /dev/null
+<h2> ZIP Format </h2> About Zip Parsing Internals...
+
+<!--border--> <date> 17. December 2002 </date>
+
+<h3> ZIP Trailer Block </h3>
+
+<P>
 The general ZIP file format is written sequentially - each file
 being added gets a local file header followed by its (usually
 deflated) data. When all files are written, a central directory
 is written - and this central directory may even span multiple
 disks. Each disk gets a descriptor block that contains a pointer
 to the start of the central directory. This descriptor is always
 written last and therefore we call it the "ZIP File Trailer Block".
+</P>
+<P>
 Okay, so we know that this ZIP Trailer is always at the end of a zip
 file and that it has a fixed length, with a magic four-byte value at
 its block start. That should make it easy to detect zip files, but in
 the real world it is not that easy - it is allowed to add a zip
 archive comment text <em>after</em> the Trailer block. It's rarely
 used these days, but it means that a zip reader must be ready
 to search for the Trailer block starting at the end of the file
 and scanning backwards for the Trailer magic (it's "PK\5\6" btw).
+</P>
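<P>
 The backward search can be sketched as follows. This is an
 illustration only, not the code of the library: the function name
 find_trailer is hypothetical, it works on a buffer that already
 holds the tail of the file, and it relies on the fixed part of the
 trailer being 22 bytes long.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRAILER_SIZE 22   /* fixed part of the trailer, without comment */

/* Hypothetical helper: scan a buffer holding the tail of a zip file
 * backwards for the magic "PK\5\6". Returns the offset of the trailer
 * inside buf, or -1 if no trailer magic is found. */
long find_trailer(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 'P', 'K', 5, 6 };
    assert(buf != NULL);
    if (len < TRAILER_SIZE)
        return -1;                       /* too short to hold a trailer */
    /* the last position where a complete trailer still fits comes first */
    for (size_t pos = len - TRAILER_SIZE + 1; pos-- > 0; ) {
        if (memcmp(buf + pos, magic, sizeof magic) == 0)
            return (long)pos;
    }
    return -1;
}
```

<P>
 A more careful reader would also check that the comment length stored
 in the trailer matches the number of bytes that follow it, since the
 magic bytes may appear inside a comment by chance.
</P>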
+<P>
 Now that's what the internal function __zip_find_disk_trailer is
 used for. It's somewhat optimized as we try to use the mmap features
 of the underlying operating system. The returned structure is
 called zzip_disk_trailer in the library source code, and we actually
 need only two of its values: u_rootseek and u_rootsize. The first of
 these can be used to lseek to the place of the central directory
 and the second value tells us the byte size of the central directory.
+</P>
+
+<h3> ZIP Central Directory </h3>
+
+<P>
 So here we are at the central directory. The disk trailer also
 told us how many entries there are, but it is not that easy to read
 them. Each directory entry (zzip_root_dirent type) again has a
 magic value up front followed by a few items, but they are all in
 some dos format - consider the timestamps - and at least the size/seek
 values are in intel byteorder. So we might want to parse them into a
 format that is easier to handle in internal code.
+</P>
+<P>
 That is also needed for another reason - three items in that
 directory entry are the size values of three variadic (that is,
 variable-length) fields following right after the directory entry.
 That's right, three of these. The first
 variadic field is the filename of this directory entry. In other
 words, the root directory entry does not contain a seek value for
 where the filename starts; the start of the filename is
 implicitly given by the end address of the directory entry.
+</P>
+<P>
 The size value for the filename simply says how long the
 filename is - however, and more importantly, it allows us to
 compute the start of the next variadic field, called the extra
 info field. Well, we do not need any value from that extra info
 block (it has unix filemode bits when packed under unix) but we
 can be quite sure that this field is not empty either. And that
 was the second variadic field.
+</P>
+<P>
 There is a third variadic field however - it's the comment field.
 That was pretty heavily used in the good old DOS days. We are not
 used to it anymore since filenames are generally self-descriptive
 today, but in the DOS days a filename was 8+3 chars maximum - and
 it was the comment field that told users what was in there. It
 turned out that many software archives used the zip format as their
 primary distribution format for just that reason - for being
 able to attach a comment line to each entry.
+</P>
+<P>
 Now, each of these three variadic fields has an entry in the
 directory entry header telling its size. And after these
 three variadic fields the next directory entry follows immediately.
 Yes, again there is no seek value here - we have to take the sum
 of the three field sizes and add that to the end address of the
 directory entry - just to be able to get to the next entry.
+</P>
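<P>
 Hopping from entry to entry might look like the sketch below. It
 assumes the on-disk layout of the zip format: the fixed part of an
 entry is 46 bytes and starts with "PK\1\2", and the three size values
 are 16-bit fields at byte offsets 28 (filename), 30 (extra info) and
 32 (comment). The function name count_entries is hypothetical.
</P>

```c
#include <assert.h>
#include <stddef.h>

#define ROOT_DIRENT_SIZE 46   /* fixed part of a central directory entry */

/* read a 16-bit value in intel byteorder */
static unsigned le16(const unsigned char *p)
{
    return p[0] | (p[1] << 8);
}

/* Hypothetical helper: walk a central directory image of `size` bytes
 * and count its entries. Returns -1 if an entry is malformed. */
int count_entries(const unsigned char *dir, size_t size)
{
    assert(dir != NULL);
    size_t pos = 0;
    int count = 0;
    while (pos < size) {
        const unsigned char *e = dir + pos;
        if (size - pos < ROOT_DIRENT_SIZE)
            return -1;                      /* truncated entry header */
        if (e[0] != 'P' || e[1] != 'K' || e[2] != 1 || e[3] != 2)
            return -1;                      /* wrong magic value */
        size_t namlen  = le16(e + 28);      /* 1st variadic field */
        size_t extras  = le16(e + 30);      /* 2nd variadic field */
        size_t comment = le16(e + 32);      /* 3rd variadic field */
        size_t reclen  = ROOT_DIRENT_SIZE + namlen + extras + comment;
        if (reclen > size - pos)
            return -1;                      /* fields run past the end */
        pos += reclen;                      /* the next entry follows */
        count++;
    }
    return count;
}
```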
+
+<h3> Internal Directory </h3>
+
+<P>
+ Now, the external ZIP format is too complicated. We cut it down
+ to the bare minimum we actually need. The fields in the entry
+ are parsed into a format directly usable, and from the variadic
+ fields we only keep the filename. Oh, and we ensure that the
+ filename gets a trailing null byte, so it can surely be passed
+ down into libc routines.
+</P>
+<P>
 There is another trick by the way - we use the u_rootsize value
 to malloc a block for the internal directory. That ensures the
 internal root directory entries are in nearby locations,
 including the filenames themselves, which we put in between the
 dirent entries. That's not only similar to the external directory
 format, but when calling readdir or looking for the matching
 filename of a zzip_open call, it also ensures that memory is
 fetched in a linear fashion. Modern cpu architectures are able
 to burst through it.
+</P>
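<P>
 To illustrate the idea of one packed block with the filenames in
 between the entries, here is a small sketch. The record layout and
 the names dir_hdr, dir_append and dir_find are assumptions for this
 example only and differ from the real structures in the library
 source. The block passed in must come from malloc so that it is
 suitably aligned.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record: the filename directly follows the
 * fixed fields, and d_reclen is the distance to the next record. */
struct dir_hdr {
    uint32_t d_off;      /* seek value of the local file header */
    uint16_t d_reclen;   /* bytes from this record to the next one */
    uint16_t d_namlen;   /* filename length without the null byte */
    char     d_name[];   /* filename with trailing null byte */
};

/* Append a record at `pos` in a malloc'ed block of `cap` bytes.
 * Returns the position for the next record, or 0 if it does not fit. */
size_t dir_append(void *block, size_t cap, size_t pos,
                  const char *name, uint32_t off)
{
    assert(block != NULL && name != NULL && pos <= cap && pos % 4 == 0);
    size_t namlen = strlen(name);
    size_t reclen = offsetof(struct dir_hdr, d_name) + namlen + 1;
    reclen = (reclen + 3) & ~(size_t)3;   /* keep next record aligned */
    if (reclen > cap - pos || reclen > UINT16_MAX)
        return 0;
    struct dir_hdr *hdr = (struct dir_hdr *)((char *)block + pos);
    hdr->d_off = off;
    hdr->d_reclen = (uint16_t)reclen;
    hdr->d_namlen = (uint16_t)namlen;
    memcpy(hdr->d_name, name, namlen + 1);
    return pos + reclen;
}

/* Linear scan through the first `used` bytes of the block */
const struct dir_hdr *dir_find(const void *block, size_t used,
                               const char *name)
{
    assert(block != NULL && name != NULL);
    size_t pos = 0;
    while (pos < used) {
        const struct dir_hdr *hdr =
            (const struct dir_hdr *)((const char *)block + pos);
        if (strcmp(hdr->d_name, name) == 0)
            return hdr;
        pos += hdr->d_reclen;
    }
    return NULL;
}
```

<P>
 Since each record here is smaller than the external entry it was
 parsed from, a block of the external directory's size is always large
 enough to hold all of them.
</P>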
+<P>
 One might think of using a more complicated internal directory
 format - like hash tables or something. However, they all suffer
 from the fact that memory access patterns will be somewhat random,
 which eats a lot of speed. It is hard to predict under what
 circumstances that would give us a benefit, but the problem is certainly
 not far-fetched: there are zzip archives with 13k+ entries. In a real
+ filesystem people will not put 13k files into one directory, of
+ course - but for the zip central directory all entries are listed
+ in parallel with their subdirectory paths attached. So, if the
+ original subtree had a number of directories, they'll end up in
+ parallel in the zip's central directory.
+</P>
+
+<h3> File Entry </h3>
+
+<P>
+ The zip directory entry has one value that is called z_off in the
+ zziplib sources - it's the seek value to the start of the actual
+ file data, or more correctly it points to the "local file header".
 Each file data block is preceded and followed by a little frame.
+ There is not much interesting information in these framing blocks,
+ the values are duplicates of the ones found in the zip central
+ directory - however, we must skip the local file header (and a
+ possible duplicate of filename and extrainfo) to arrive at the
+ actual file data.
+</P>
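<P>
 Skipping the local file header can be sketched like this. It assumes
 the on-disk layout of the zip format: the fixed part of the local file
 header is 30 bytes and starts with "PK\3\4", with the filename length
 at byte offset 26 and the extrainfo length at offset 28. The function
 name local_data_offset is hypothetical.
</P>

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_HEADER_SIZE 30   /* fixed part of the local file header */

/* Hypothetical helper: given the bytes at the seek value of an entry,
 * return the distance from there to the actual file data, or 0 if the
 * bytes do not look like a local file header. */
size_t local_data_offset(const unsigned char *hdr, size_t avail)
{
    assert(hdr != NULL);
    if (avail < LOCAL_HEADER_SIZE)
        return 0;
    if (hdr[0] != 'P' || hdr[1] != 'K' || hdr[2] != 3 || hdr[3] != 4)
        return 0;
    /* both lengths are 16-bit values in intel byteorder */
    size_t namlen = (size_t)hdr[26] | ((size_t)hdr[27] << 8);
    size_t extras = (size_t)hdr[28] | ((size_t)hdr[29] << 8);
    return LOCAL_HEADER_SIZE + namlen + extras;
}
```

<P>
 The lengths are taken from the local header itself because its
 extrainfo field does not have to be as long as the one in the
 central directory.
</P>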
+<P>
 Once we are at the start of the actual file data, we can finally read
 it. The zziplib library only knows about two choices, defined by
 the value in the z_compr field - a value of "0" means "stored":
 the data has been stored in uncompressed format, so that we can
 just copy it out of the file to the application buffer.
+</P>
+<P>
 A value of "8" means "deflated", and here we initialize zlib
 and all file data is decompressed before copying it to the
 application buffer. Care must be taken here since the zlib input data
 and the decompressed data may differ significantly in size. The zlib
 compression does not even obey byte boundaries - a few bits may
 expand to hundreds of bytes. That's why each ZZIP_FILE has a
 decompression buffer attached.
+</P>
+<P>
 All the other z_compr values are only of historical interest -
 the infozip unix tools will only create stored or deflated content,
 and the same applies to pkzip 2.x tools. If there is any other
 value than "0" or "8" then zziplib cannot decompress it, simple
 as that.
+</P>
+
+<h3> ZZIP_DIR / ZZIP_FILE </h3>
+
+<P>
 The internal ZZIP_DIR structure stores a posix handle to the
 zip file, and a pointer to the parsed central directory block.
 One can use readdir/rewinddir to walk each entry in the central
 directory and compare with the filenames attached. And that's
 what is done in a zzip_open call to find the file entry.
+</P>
+<P>
 There are a few more fields in the ZZIP_DIR structure, and
 most of these are related to the use of this struct as a
 shared resource. You can use zzip_file_open to walk the
 preparsed central directory and return a new ZZIP_FILE handle
 for that entry.
+</P>
+<P>
 That ZZIP_FILE handle contains a back pointer to the ZZIP_DIR
 that it was made from - and the back pointer also serves as a flag
 that the ZZIP_FILE handle points to a file within a ZIP file as
 opposed to wrapping a real file in the real directory tree.
 Each ZZIP_FILE increments a shared counter, so that the
 next dir_close is deferred until all ZZIP_FILE handles have been
 destroyed.
+</P>
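<P>
 The shared counter can be pictured with the following sketch. It uses
 stand-in types my_dir and my_file rather than the real ZZIP_DIR and
 ZZIP_FILE, and all function names are hypothetical. It also assumes
 that the handle held by the application counts as one reference.
</P>

```c
#include <assert.h>
#include <stdlib.h>

struct my_dir  { int refcount; };        /* stands in for ZZIP_DIR */
struct my_file { struct my_dir *dir; };  /* stands in for ZZIP_FILE;
                                            dir == NULL means real file */

/* The handle given to the application counts as the first reference */
struct my_dir *my_dir_open(void)
{
    struct my_dir *dir = malloc(sizeof *dir);
    if (dir != NULL)
        dir->refcount = 1;
    return dir;
}

/* Drop one reference. Returns 1 if this was the last one and the
 * directory was really freed, 0 if the close has been deferred. */
int my_dir_release(struct my_dir *dir)
{
    assert(dir != NULL && dir->refcount > 0);
    if (--dir->refcount > 0)
        return 0;
    free(dir);
    return 1;
}

/* Each file made from the directory increments the shared counter */
struct my_file *my_file_open(struct my_dir *dir)
{
    assert(dir != NULL);
    struct my_file *file = malloc(sizeof *file);
    if (file == NULL)
        return NULL;
    file->dir = dir;
    dir->refcount++;
    return file;
}

/* Closing a file gives its reference back through the back pointer */
int my_file_close(struct my_file *file)
{
    assert(file != NULL);
    struct my_dir *dir = file->dir;
    free(file);
    return dir != NULL ? my_dir_release(dir) : 0;
}
```

<P>
 With two open files, the application may close its directory handle
 first - the directory is only freed when the second file is closed.
</P>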
+<P>
 Another optimization is the cache pointer in the ZZIP_DIR. It is
 quite common to read data entries sequentially, for example when the
 zip directory is scanned for files matching a specific pattern
 and each matching file is opened. However, each
 ZZIP_FILE needs a decompression buffer, so we keep a cache of
 the last one freed so that it can be picked up right away for the
 next zzip_file_open.
+</P>
+<P>
 Note that when zzip_open() is called directly multiple times, each
 call will open and parse a zip directory of its own. That's bloat both
 in terms of memory consumption and execution speed. One should try
 to take advantage of the feature that multiple ZZIP_FILEs can
 share a common ZZIP_DIR with a common preparsed copy of the
 zip's central directory. That can be done directly by using
 zzip_file_open to use a ZZIP_DIR as a factory for ZZIP_FILE,
 but zzip_freopen can also be used to reuse the old internal
 central directory instead of parsing it again.
+</P>
+<P>
 And while zzip_freopen releases the old ZZIP_FILE handle,
 only reusing the ZZIP_DIR attached, one can directly use another
 routine called zzip_open_shared that will create a ZZIP_FILE
 from an existing ZZIP_FILE. Oh, and there is no need to worry about
 problems when a filepath given to zzip_freopen() happens to
 be in another place, another directory, another zip archive.
 In that case, the old zzip's internal directory is freed and
 the other's directory is read - the preparsed central directory
 is only reused if that is actually possible.
+</P>
+