--- /dev/null
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
+<HTML>
+<HEAD>
+<TITLE>The Apache EBCDIC Port</TITLE>
+</HEAD>
+
+<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
+<BODY
+ BGCOLOR="#FFFFFF"
+ TEXT="#000000"
+ LINK="#0000FF"
+ VLINK="#000080"
+ ALINK="#FF0000"
+>
+<!--#include virtual="header.html" -->
+<H1 ALIGN="CENTER">Overview of Apache EBCDIC Port</H1>
+
+ <P>
+ Version 1.3 of the Apache HTTP Server is the first version which
+ includes a port to a (non-ASCII) mainframe machine which uses
+ the EBCDIC character set as its native codeset.<BR>
+ (It is the SIEMENS NIXDORF family of mainframes running the
+ <A HREF="http://www.sni.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
+ operating system</A>. This mainframe OS nowadays features a
+ SVR4-derived POSIX subsystem).
+ </P>
+
+ <P>
+ The port was started initially to
+ <UL>
+ <LI> prove the feasibility of porting
+ <A HREF="http://dev.apache.org/">the Apache HTTP server</A>
+ to this platform
+ <LI> find a "worthy and capable" successor for the venerable
+ <A HREF="http://www.w3.org/Daemon/">CERN-3.0</A> daemon
+ (which was ported a couple of years ago), and to
+ <LI> prove that, on this platform, Apache's preforking process model
+ can easily outperform the accept-fork-serve model used by CERN by a
+ factor of 5 or more.
+ </UL>
+ </P>
+
+ <P>
+ This document describes the rationale behind some of the design
+ decisions of the port to this machine.
+ </P>
+
+ <P>
+ One objective of the EBCDIC port was to maintain enough backwards
+ compatibility with the (EBCDIC) CERN server to make the transition to
+ the new server attractive and easy. This required the addition of
+ a configurable method to define whether an HTML document was stored
+ in ASCII (the only format accepted by the old server) or in EBCDIC
+ (the native document format in the POSIX subsystem, and therefore
+ the only realistic format in which the other POSIX tools like grep
+ or sed could operate on the documents). The current solution to
+ this is a "pseudo-MIME-format" which is intercepted and
+ interpreted by the Apache server (see below). Future versions
+ might solve the problem by defining an "ebcdic-handler" for all
+ documents which must be converted.
+ </P>
+
+ <P>
+ Since all Apache input and output is based upon the BUFF data type
+ and its methods, the easiest solution was to add the conversion to
+ the BUFF handling routines. The conversion must be settable at any
+ time, so a BUFF flag was added which defines whether conversion
+ is currently enabled for a BUFF object. This flag is modified at
+ several points in the HTTP protocol:
+ <UL>
+ <LI><STRONG>set</STRONG> before a request is received (because the
+ request and the request header lines are always in ASCII
+ format)
+
+ <LI><STRONG>set/unset</STRONG> when the request body is
+ received - depending on the content type of the request body
+ (because the request body may contain ASCII text or a binary file)
+
+ <LI><STRONG>set</STRONG> before a reply header is sent (because the
+ response header lines are always in ASCII format)
+
+ <LI><STRONG>set/unset</STRONG> when the response body is
+ sent - depending on the content type of the response body
+ (because the response body may contain text or a binary file)
+ </UL>
+ </P>
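+
+ <P>
+ As a rough illustration of such a per-buffer conversion flag, the
+ following C sketch toggles translation around writes. It is not the
+ server's code: <CODE>sketch_buff</CODE>, <CODE>sketch_set_convert()</CODE>,
+ <CODE>sketch_write()</CODE> and <CODE>ebcdic_to_ascii()</CODE> are
+ hypothetical names, and the translation is assumed to cover only
+ letters, digits and the space character (a real port needs a complete
+ 256-entry table).
+ </P>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Apache's BUFF; only what the sketch needs. */
typedef struct {
    int convert;             /* nonzero: translate EBCDIC -> ASCII on output */
    unsigned char out[256];  /* bytes as they would appear on the wire */
    size_t len;              /* number of bytes stored in out[] */
} sketch_buff;

/* Partial EBCDIC -> ASCII mapping (letters, digits, space only).
 * ASCII values are written as hex so the code means the same thing
 * whatever the compiler's native character set is.
 * Note that the EBCDIC alphabet is not contiguous: it is split into
 * three ranges per case. All other bytes pass through unchanged,
 * which is a simplification of this sketch. */
static unsigned char ebcdic_to_ascii(unsigned char c)
{
    if (c == 0x40) return 0x20;                                        /* space */
    if (c >= 0xF0 && c <= 0xF9) return (unsigned char)(0x30 + (c - 0xF0)); /* 0-9 */
    if (c >= 0xC1 && c <= 0xC9) return (unsigned char)(0x41 + (c - 0xC1)); /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return (unsigned char)(0x4A + (c - 0xD1)); /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return (unsigned char)(0x53 + (c - 0xE2)); /* S-Z */
    if (c >= 0x81 && c <= 0x89) return (unsigned char)(0x61 + (c - 0x81)); /* a-i */
    if (c >= 0x91 && c <= 0x99) return (unsigned char)(0x6A + (c - 0x91)); /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return (unsigned char)(0x73 + (c - 0xA2)); /* s-z */
    return c;
}

/* Enable or disable conversion for all subsequent writes. */
static void sketch_set_convert(sketch_buff *b, int on)
{
    b->convert = on;
}

/* Append n bytes, translating each one only while the flag is set.
 * Returns the number of bytes actually stored. */
static size_t sketch_write(sketch_buff *b, const void *data, size_t n)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < n && b->len < sizeof b->out; i++) {
        b->out[b->len++] = b->convert ? ebcdic_to_ascii(p[i]) : p[i];
    }
    return i;
}
```

+ <P>
+ With the flag set, the EBCDIC bytes C7 C5 E3 are stored as the ASCII
+ bytes 47 45 54 ("GET"); with the flag cleared, the same call stores its
+ input unchanged, which is what a binary body requires.
+ </P>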
+
+<H1 ALIGN="CENTER">Porting Notes</H1>
+ <P>
+ <OL>
+ <LI>
+ The relevant changes in the source are #ifdef'ed into two
+ categories:
+ <DL>
+ <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE>
+ <DD>Code which is needed for any EBCDIC based machine. This
+ includes character translations, differences in the
+ contiguity of the two character sets, flags which
+ indicate which parts of the HTTP protocol have to be
+ converted and which do not, etc.
+ <DT><CODE><STRONG>#ifdef _OSD_POSIX</STRONG></CODE>
+ <DD>Code which is needed for the BS2000 SIEMENS NIXDORF
+ mainframe platform only. This deals with include file
+ differences and socket implementation topics which are
+ only required on the BS2000/OSD platform.
+ </DL>
+ </LI><BR>
+
+ <LI>
+ The possibility to translate between ASCII and EBCDIC at the
+ socket level (on BS2000 POSIX, there is a socket option which
+ supports this) was intentionally <EM>not</EM> chosen, because
+ the byte stream at the HTTP protocol level consists of a
+ mixture of protocol related strings and non-protocol related
+ raw file data. HTTP protocol strings are always encoded in
+ ASCII (the GET request, any Header: lines, the chunking
+ information etc.) whereas the file transfer parts (i.e., GIF
+ images, CGI output etc.) should usually be just "passed through"
+ by the server. This separation between "protocol string" and
+ "raw data" is reflected in the server code by functions like
+ bgets() or rvputs() for strings, and functions like bwrite()
+ for binary data. A global translation of everything would
+ therefore be inadequate.<BR>
+ (In the case of text files of course, provisions must be made so
+ that the documents are always served in ASCII format)
+ </LI><BR>
+
+ <LI>
+ This port therefore features a built-in protocol level conversion
+ for the server-internal strings (which the compiler translated to
+ EBCDIC strings) and thus for all server-generated documents.
+ The hard coded ASCII escapes \012 and \015 which are
+ ubiquitous in the server code are an exception: they are
+ already the binary encoding of the ASCII \n and \r and must
+ not be converted to ASCII a second time. This exception is
+ only relevant for server-generated strings; <EM>external</EM>
+ EBCDIC documents always go through a bijective EBCDIC &lt;-&gt; ASCII
+ translation table.
+ </LI><BR>
+
+ <LI>
+ By examining the call hierarchy for the BUFF management
+ routines, I added an "ebcdic/ascii conversion layer" which
+ would be crossed on every puts/write/get/gets, and a
+ conversion flag which allowed enabling/disabling the
+ conversions on-the-fly. It is now possible to read the header
+ lines of a CGI-script output in EBCDIC format, and then find
+ out that the remainder of the script's output is in ASCII
+ (as in the case of the output of a WWW Counter program: the
+ document body contains a GIF image). Likewise, the server
+ always generates its header lines in EBCDIC (and with ASCII
+ conversion enabled) and determines, based on the type of
+ document being served, whether the document body (except for
+ the chunking information, of course) is in ASCII already or
+ is converted from EBCDIC.
+ </LI><BR>
+
+ <LI>
+ Text documents (MIME types text/plain, text/html etc.) can
+ either be translated to ASCII implicitly or (if the
+ users prefer to store some documents in raw ASCII form for
+ faster serving, or because the files reside on an NFS-mounted
+ directory tree) be served without conversion.
+ <BR>
+ <STRONG>Example:</STRONG><BLOCKQUOTE>
+ To serve files with the suffix .ahtml as a raw ASCII text/html
+ document without implicit conversion (and suffix .ascii
+ as ASCII text/plain), use the directives:<PRE>
+ AddType text/x-ascii-html .ahtml
+ AddType text/x-ascii-plain .ascii
+ </PRE></BLOCKQUOTE>
+ Similarly, any text/XXXX MIME type can be served as "raw ASCII" by
+ configuring a MIME type "text/x-ascii-XXXX" for it using AddType.
+ </LI><BR>
+
+ <LI>
+ Non-text documents are always served "binary" without conversion.
+ This seems to be the most sensible choice for, e.g., GIF/ZIP/AU
+ file types. This of course requires the user to copy them to the
+ mainframe host using the "rcp -b" binary switch.
+ </LI><BR>
+
+ <LI>
+ Server parsed files are always assumed to be in native (i.e.,
+ EBCDIC) format as used on the machine, and are converted after
+ processing.
+ </LI><BR>
+
+ <LI>
+ For CGI output, the CGI script determines whether a conversion is
+ needed or not: by setting the appropriate Content-Type, text files
+ can be converted, or GIF output can be passed through unmodified.
+ An example for the latter case is the wwwcount program which we ported
+ as well.
+ </LI><BR>
+ </OL>
+ </P>
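+
+ <P>
+ The content-type rules in the notes above can be summarized in a
+ short C sketch. This is an illustration only, not the server's
+ implementation: the function name <CODE>body_needs_conversion()</CODE>
+ is hypothetical, the comparison is case-sensitive for brevity, and it
+ is assumed that a "text/x-ascii-XXXX" pseudo type is announced to the
+ client as "text/XXXX".
+ </P>

```c
#include <stdio.h>
#include <string.h>

/* Decide how a response body is handled, based on its configured type.
 *
 *   text/x-ascii-XXXX  -> stored as raw ASCII: no conversion,
 *                         type sent to the client is text/XXXX (assumption)
 *   text/XXXX          -> stored in native EBCDIC: convert to ASCII
 *   anything else      -> binary: no conversion
 *
 * Writes the type to announce into `sent` and returns 1 if the body
 * must be converted from EBCDIC to ASCII, 0 otherwise. */
static int body_needs_conversion(const char *type, char *sent, size_t sentsz)
{
    static const char raw_prefix[] = "text/x-ascii-";
    const size_t plen = sizeof raw_prefix - 1;

    if (strncmp(type, raw_prefix, plen) == 0) {
        /* Pseudo MIME type: strip the "x-ascii-" marker, pass bytes through. */
        snprintf(sent, sentsz, "text/%s", type + plen);
        return 0;
    }

    /* All other types are announced unchanged. */
    snprintf(sent, sentsz, "%s", type);

    /* Native text is converted; non-text documents are served binary. */
    return strncmp(type, "text/", 5) == 0;
}
```

+ <P>
+ Under these assumptions, "text/html" is converted, "text/x-ascii-html"
+ is sent unconverted as "text/html", and "image/gif" is passed through
+ unmodified.
+ </P>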
+
+<!--#include virtual="footer.html" -->
+</BODY>
+</HTML>