Add a preliminary EBCDIC porting paper (derived from README.EBCDIC)

author Martin Kraemer <martin@apache.org>

Thu, 26 Mar 1998 15:48:49 +0000 (15:48 +0000)

committer Martin Kraemer <martin@apache.org>

Thu, 26 Mar 1998 15:48:49 +0000 (15:48 +0000)
diff --git a/docs/manual/ebcdic.html b/docs/manual/ebcdic.html
new file mode 100644
index 0000000..5471ef4
--- /dev/null
+++ b/docs/manual/ebcdic.html
@@ -0,0 +1,201 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
+<HTML>
+<HEAD>
+<TITLE>The Apache EBCDIC Port</TITLE>
+</HEAD>
+
+<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
+<BODY
+ BGCOLOR="#FFFFFF"
+ TEXT="#000000"
+ LINK="#0000FF"
+ VLINK="#000080"
+ ALINK="#FF0000"
+>
+<!--#include virtual="header.html" -->
+<H1 ALIGN="CENTER">Overview of Apache EBCDIC Port</H1>
+
+ <P>
+  Version 1.3 of the Apache HTTP Server is the first version which
+  includes a port to a (non-ASCII) mainframe machine which uses
+  the EBCDIC character set as its native codeset.<BR>
+  (It is the SIEMENS NIXDORF family of mainframes running the
+  <A HREF="http://www.sni.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
+  operating system</A>. This mainframe OS nowadays features a
+  SVR4-derived POSIX subsystem).
+ </P>
+
+ <P>
+ The port was started initially to
+  <UL>
+  <LI> prove the feasibility of porting
+       <A HREF="http://dev.apache.org/">the Apache HTTP server</A>
+       to this platform
+  <LI> find a "worthy and capable" successor for the venerable
+       <A HREF="http://www.w3.org/Daemon/">CERN-3.0</A> daemon
+       (which was ported a couple of years ago), and to
+  <LI> prove that Apache's preforking process model can on this platform
+       easily outperform the accept-fork-serve model used by CERN by a
+       factor of 5 or more.
+  </UL>
+ </P>
+
+ <P>
+  This document serves as a rationale to describe some of the design
+  decisions of the port to this machine.
+ </P>
+
+ <P>
+  One objective of the EBCDIC port was to maintain enough backwards
+  compatibility with the (EBCDIC) CERN server to make the transition to
+  the new server attractive and easy. This required the addition of
+  a configurable method to define whether an HTML document was stored
+  in ASCII (the only format accepted by the old server) or in EBCDIC
+  (the native document format in the POSIX subsystem, and therefore
+  the only realistic format in which the other POSIX tools like grep
+  or sed could operate on the documents). The current solution to
+  this is a "pseudo-MIME-format" which is intercepted and
+  interpreted by the Apache server (see below). Future versions
+  might solve the problem by defining an "ebcdic-handler" for all
+  documents which must be converted.
+ </P>
+
+ <P>
+  Since all Apache input and output is based upon the BUFF data type
+  and its methods, the easiest solution was to add the conversion to
+  the BUFF handling routines. The conversion must be settable at any
+  time, so a BUFF flag was added which defines whether conversion is
+  currently enabled for a BUFF object. This flag is modified at
+  several points in the HTTP protocol:
+  <UL>
+   <LI><STRONG>set</STRONG> before a request is received (because the
+       request and the request header lines are always in ASCII
+       format)
+
+   <LI><STRONG>set/unset</STRONG> when the request body is
+       received - depending on the content type of the request body
+       (because the request body may contain ASCII text or a binary file)
+
+   <LI><STRONG>set</STRONG> before a reply header is sent (because the
+       response header lines are always in ASCII format)
+
+   <LI><STRONG>set/unset</STRONG> when the response body is
+       sent - depending on the content type of the response body
+       (because the response body may contain text or a binary file)
+  </UL>
+ </P>
+
+<H1 ALIGN="CENTER">Porting Notes</H1>
+ <P>
+  <OL>
+   <LI>
+   The relevant changes in the source are #ifdef'ed into two
+   categories:
+   <DL>
+    <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE>
+    <DD>Code which is needed for any EBCDIC based machine. This
+       includes character translations, differences in the
+       contiguity of the two character sets, flags which
+       indicate which part of the HTTP protocol has to be
+       converted and which part does not, etc.
+    <DT><CODE><STRONG>#ifdef _OSD_POSIX</STRONG></CODE>
+    <DD>Code which is needed for the BS2000 SIEMENS NIXDORF
+       mainframe platform only. This deals with include file
+       differences and socket implementation topics which are
+       only required on the BS2000/OSD platform.
+   </DL>
+   </LI><BR>
+
+   <LI>
+    The possibility to translate between ASCII and EBCDIC at the
+    socket level (on BS2000 POSIX, there is a socket option which
+    supports this) was intentionally <EM>not</EM> chosen, because
+    the byte stream at the HTTP protocol level consists of a
+    mixture of protocol related strings and non-protocol related
+    raw file data. HTTP protocol strings are always encoded in
+    ASCII (the GET request, any Header: lines, the chunking
+    information etc.) whereas the file transfer parts (i.e., GIF
+    images, CGI output etc.) should usually be just "passed through"
+    by the server. This separation between "protocol string" and
+    "raw data" is reflected in the server code by functions like
+    bgets() or rvputs() for strings, and functions like bwrite()
+    for binary data. A global translation of everything would
+    therefore be inadequate.<BR>
+    (In the case of text files of course, provisions must be made so
+    that the documents are always served in ASCII format)
+   </LI><BR>
+
+   <LI>
+    This port therefore features a built-in protocol level conversion
+    for the server-internal strings (which the compiler translated to
+    EBCDIC strings) and thus for all server-generated documents.
+    The hard coded ASCII escapes \012 and \015 which are
+    ubiquitous in the server code are an exception: they are
+    already the binary encoding of the ASCII \n and \r and must
+    not be converted to ASCII a second time. This exception is
+    only relevant for server-generated strings; <EM>external</EM>
+    EBCDIC documents always go through a bijective EBCDIC &lt;-&gt; ASCII
+    translation table.
+   </LI><BR>
+
+   <LI>
+    By examining the call hierarchy for the BUFF management
+    routines, I added an "ebcdic/ascii conversion layer" which
+    would be crossed on every puts/write/get/gets, and a
+    conversion flag which allowed enabling/disabling the
+    conversions on-the-fly. It is now possible to read the header
+    lines of a CGI-script output in EBCDIC format, and then find
+    out that the remainder of the script's output is in ASCII
+    (like in the case of the output of a WWW Counter program: the
+    document body contains a GIF image). Likewise, the server
+    always generates its header lines in EBCDIC (and with ASCII
+    conversion enabled) and determines, based on the type of
+    document being served, whether the document body (except for
+    the chunking information, of course) is in ASCII already or
+    is converted from EBCDIC.
+   </LI><BR>
+
+   <LI>
+    For text documents (MIME types text/plain, text/html etc.),
+    an implicit translation to ASCII can be used, or (if the
+    users prefer to store some documents in raw ASCII form for
+    faster serving, or because the files reside on an NFS-mounted
+    directory tree) the documents can be served without conversion.
+    <BR>
+    <STRONG>Example:</STRONG><BLOCKQUOTE>
+       to serve files with the suffix .ahtml as a raw ASCII text/html
+       document without implicit conversion (and suffix .ascii
+       as ASCII text/plain), use the directives:<PRE>
+      AddType  text/x-ascii-html  .ahtml
+      AddType  text/x-ascii-plain .ascii
+      </PRE></BLOCKQUOTE>
+    Similarly, any text/XXXX MIME type can be served as "raw ASCII" by
+    configuring a MIME type "text/x-ascii-XXXX" for it using AddType.
+   </LI><BR>
+
+   <LI>
+    Non-text documents are always served "binary" without conversion.
+    This seems to be the most sensible choice for, e.g., GIF/ZIP/AU
+    file types. This of course requires the user to copy them to the
+    mainframe host using the "rcp -b" binary switch.
+   </LI><BR>
+
+   <LI>
+    Server parsed files are always assumed to be in native (i.e.,
+    EBCDIC) format as used on the machine, and are converted after
+    processing.
+   </LI><BR>
+
+   <LI>
+    For CGI output, the CGI script determines whether a conversion is
+    needed or not: by setting the appropriate Content-Type, text files
+    can be converted, or GIF output can be passed through unmodified.
+    An example of the latter case is the wwwcount program which we ported
+    as well.
+   </LI><BR>
+  </OL>
+ </P>
+
+<!--#include virtual="footer.html" -->
+</BODY>
+</HTML>
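
Note on the content-type rules described in the new document: the decision whether a response body goes through the EBCDIC-to-ASCII layer can be sketched in a few lines of C. This is only an illustration of the rules as the document states them (text/x-ascii-XXXX is served as raw ASCII text/XXXX, other text/* types are converted, non-text types are passed through unchanged). The function name ebcdic_needs_conversion() and its signature are hypothetical and are not taken from the Apache sources.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an Apache API): decide from a configured MIME
 * type whether the response body needs EBCDIC -> ASCII conversion, and
 * write the MIME type that would be announced to the client into 'sent'.
 *
 * Returns 1 if the body should be converted, 0 if it is passed through.
 */
static int ebcdic_needs_conversion(const char *type, char *sent, size_t sentlen)
{
    static const char raw_prefix[] = "text/x-ascii-";
    const size_t raw_len = sizeof(raw_prefix) - 1;

    /* "text/x-ascii-XXXX": the file is already ASCII, serve it as text/XXXX. */
    if (strncmp(type, raw_prefix, raw_len) == 0) {
        snprintf(sent, sentlen, "text/%s", type + raw_len);
        return 0;
    }

    /* Every other type is announced unchanged. */
    snprintf(sent, sentlen, "%s", type);

    /* Native text documents are EBCDIC and must be converted;
     * non-text documents (GIF, ZIP, AU, ...) are served as binary. */
    return strncmp(type, "text/", 5) == 0;
}
```

With the AddType lines from the example in the document, a .ahtml file would map to "text/x-ascii-html", so this sketch would announce "text/html" and leave the conversion flag unset, while an ordinary "text/html" document would have the flag set.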