From fc1127bbf9137ce992c1c485bc65c1b46ab081ce Mon Sep 17 00:00:00 2001
From: Martin Kraemer
Date: Thu, 26 Mar 1998 15:48:49 +0000
Subject: [PATCH] Add a preliminary EBCDIC porting paper (derived from
README.EBCDIC)
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@80689 13f79535-47bb-0310-9956-ffa450edef68
---
docs/manual/ebcdic.html | 201 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 201 insertions(+)
create mode 100644 docs/manual/ebcdic.html
diff --git a/docs/manual/ebcdic.html b/docs/manual/ebcdic.html
new file mode 100644
index 0000000000..5471ef4dbc
--- /dev/null
+++ b/docs/manual/ebcdic.html
@@ -0,0 +1,201 @@
+
+
+
+The Apache EBCDIC Port
+
+
+
+
+
+Overview of the Apache EBCDIC Port
+
+
+ Version 1.3 of the Apache HTTP Server is the first version that
+ includes a port to a (non-ASCII) mainframe machine which uses
+ the EBCDIC character set as its native codeset.
+ (The target is the SIEMENS NIXDORF family of mainframes running the
+ BS2000/OSD
+ operating system. This mainframe OS nowadays features an
+ SVR4-derived POSIX subsystem.)
+
+
+
+ The port was started initially to
+
+ - prove the feasibility of porting
+ the Apache HTTP server
+ to this platform
+
+ - find a "worthy and capable" successor for the venerable
+ CERN-3.0 daemon
+ (which was ported a couple of years ago), and to
+
+ - prove that Apache's preforking process model can, on this platform,
+ easily outperform the accept-fork-serve model used by CERN by a
+ factor of 5 or more.
+
+
+
+
+ This document describes the rationale behind some of the design
+ decisions of the port to this machine.
+
+
+
+ One objective of the EBCDIC port was to maintain enough backwards
+ compatibility with the (EBCDIC) CERN server to make the transition to
+ the new server attractive and easy. This required the addition of
+ a configurable method to define whether an HTML document was stored
+ in ASCII (the only format accepted by the old server) or in EBCDIC
+ (the native document format in the POSIX subsystem, and therefore
+ the only realistic format in which the other POSIX tools like grep
+ or sed could operate on the documents). The current solution to
+ this is a "pseudo-MIME-format" which is intercepted and
+ interpreted by the Apache server (see below). Future versions
+ might solve the problem by defining an "ebcdic-handler" for all
+ documents which must be converted.
+
+
+
+ Since all Apache input and output is based upon the BUFF data type
+ and its methods, the easiest solution was to add the conversion to
+ the BUFF handling routines. The conversion must be settable at any
+ time, so a BUFF flag was added which defines whether a BUFF object
+ currently has conversion enabled or not. This flag is modified at
+ several points in the HTTP protocol:
+
+ - set before a request is received (because the
+ request and the request header lines are always in ASCII
+ format)
+
+
+ - set/unset when the request body is
+ received - depending on the content type of the request body
+ (because the request body may contain ASCII text or a binary file)
+
+
+ - set before a reply header is sent (because the
+ response header lines are always in ASCII format)
+
+
+ - set/unset when the response body is
+ sent - depending on the content type of the response body
+ (because the response body may contain text or a binary file)
+
+
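+ The four points above can be sketched in C as follows. This is only an
+ illustration: demo_buff, demo_phase and demo_set_conversion are
+ hypothetical names invented for this example, not the actual BUFF
+ structure or its methods, and the body_is_text argument stands in for
+ whatever content-type check the server performs.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BUFF object: only the conversion flag
 * is modelled here. */
typedef struct {
    int convert;    /* nonzero: translate between EBCDIC and ASCII */
} demo_buff;

/* The four points in the HTTP exchange at which the flag is touched. */
typedef enum {
    PHASE_REQUEST_HEADER,
    PHASE_REQUEST_BODY,
    PHASE_RESPONSE_HEADER,
    PHASE_RESPONSE_BODY
} demo_phase;

/* Set or clear the conversion flag for the given phase.
 * Header lines are always ASCII on the wire, so conversion is always on.
 * For bodies, conversion depends on whether the content is text. */
static void demo_set_conversion(demo_buff *b, demo_phase phase,
                                int body_is_text)
{
    assert(b != NULL);

    switch (phase) {
    case PHASE_REQUEST_HEADER:
    case PHASE_RESPONSE_HEADER:
        b->convert = 1;
        break;
    case PHASE_REQUEST_BODY:
    case PHASE_RESPONSE_BODY:
        b->convert = body_is_text ? 1 : 0;
        break;
    }
}
```
+
+ In this sketch a caller would invoke demo_set_conversion() once per
+ protocol phase, before reading from or writing to the buffer.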
+
+Porting Notes
+
+
+ -
+ The relevant changes in the source are #ifdef'ed into two
+ categories:
+
+ #ifdef CHARSET_EBCDIC
+ - Code which is needed for any EBCDIC based machine. This
+ includes character translations, differences in the
+ contiguity of the two character sets, flags which
+ indicate which part of the HTTP protocol has to be
+ converted and which part doesn't, etc.
+
+ #ifdef _OSD_POSIX
+ - Code which is needed for the BS2000 SIEMENS NIXDORF
+ mainframe platform only. This deals with include file
+ differences and socket implementation topics which are
+ only required on the BS2000/OSD platform.
+
+
+
+ -
+ The possibility to translate between ASCII and EBCDIC at the
+ socket level (on BS2000 POSIX, there is a socket option which
+ supports this) was intentionally not chosen, because
+ the byte stream at the HTTP protocol level consists of a
+ mixture of protocol related strings and non-protocol related
+ raw file data. HTTP protocol strings are always encoded in
+ ASCII (the GET request, any Header: lines, the chunking
+ information etc.) whereas the file transfer parts (i.e., GIF
+ images, CGI output etc.) should usually be just "passed thru"
+ by the server. This separation between "protocol string" and
+ "raw data" is reflected in the server code by functions like
+ bgets() or rvputs() for strings, and functions like bwrite()
+ for binary data. A global translation of everything would
+ therefore be inadequate.
+ (In the case of text files, of course, provisions must be made so
+ that the documents are always served in ASCII format.)
+
+
+ -
+ This port therefore features a built-in protocol level conversion
+ for the server-internal strings (which the compiler translated to
+ EBCDIC strings) and thus for all server-generated documents.
+ The hard coded ASCII escapes \012 and \015 which are
+ ubiquitous in the server code are an exception: they are
+ already the binary encoding of the ASCII \n and \r and must
+ not be converted to ASCII a second time. This exception is
+ only relevant for server-generated strings; external
+ EBCDIC documents always go through a bijective EBCDIC <-> ASCII
+ translation table.
+
+
+ -
+ By examining the call hierarchy for the BUFF management
+ routines, I added an "ebcdic/ascii conversion layer" which
+ would be crossed on every puts/write/get/gets, and a
+ conversion flag which allowed enabling/disabling the
+ conversions on-the-fly. It is now possible to read the header
+ lines of a CGI-script output in EBCDIC format, and then find
+ out that the remainder of the script's output is in ASCII
+ (like in the case of the output of a WWW Counter program: the
+ document body contains a GIF image). Likewise, the server
+ always generates its header lines in EBCDIC (with conversion
+ to ASCII enabled) and determines, based on the type of
+ document being served, whether the document body (except for
+ the chunking information, of course) is in ASCII already or
+ must be converted from EBCDIC.
+
+
+ -
+ Text documents (MIME types text/plain, text/html etc.) can
+ either be translated implicitly to ASCII or (if the
+ users prefer to store some documents in raw ASCII form for
+ faster serving, or because the files reside on an NFS-mounted
+ directory tree) be served without conversion.
+
+ Example:
+ to serve files with the suffix .ahtml as a raw ASCII text/html
+ document without implicit conversion (and suffix .ascii
+ as ASCII text/plain), use the directives:
+ AddType text/x-ascii-html .ahtml
+ AddType text/x-ascii-plain .ascii
+
+ Similarly, any text/XXXX MIME type can be served as "raw ASCII" by
+ configuring a MIME type "text/x-ascii-XXXX" for it using AddType.
+
+
+ -
+ Non-text documents are always served "binary" without conversion.
+ This seems to be the most sensible choice for, e.g., GIF/ZIP/AU
+ file types. This of course requires the user to copy them to the
+ mainframe host using the "rcp -b" binary switch.
+
+
+ -
+ Server parsed files are always assumed to be in native (i.e.,
+ EBCDIC) format as used on the machine, and are converted after
+ processing.
+
+
+ -
+ For CGI output, the CGI script determines whether a conversion is
+ needed or not: by setting the appropriate Content-Type, text files
+ can be converted, or GIF output can be passed through unmodified.
+ An example of the latter case is the wwwcount program which we ported
+ as well.
+
+
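+ The separation between "protocol strings" and "raw data" described in
+ the porting notes can be illustrated with a small C sketch. All names
+ here (demo_e2a, demo_put_string, demo_put_raw) are hypothetical and are
+ not the server's bgets()/rvputs()/bwrite() routines. The translation
+ covers only letters, digits and three punctuation characters; a real
+ port would presumably use a full 256-entry table. The three separate
+ letter ranges also show why the two character sets differ in
+ contiguity.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Translate one EBCDIC byte to ASCII. ASCII values are written as hex
 * constants so the result does not depend on the compiler's native
 * character set. Bytes outside the covered subset map to ASCII '?'. */
static unsigned char demo_e2a(unsigned char c)
{
    if (c >= 0xC1 && c <= 0xC9) return (unsigned char)(0x41 + (c - 0xC1)); /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return (unsigned char)(0x4A + (c - 0xD1)); /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return (unsigned char)(0x53 + (c - 0xE2)); /* S-Z */
    if (c >= 0xF0 && c <= 0xF9) return (unsigned char)(0x30 + (c - 0xF0)); /* 0-9 */
    if (c == 0x40) return 0x20;  /* space */
    if (c == 0x61) return 0x2F;  /* '/'   */
    if (c == 0x4B) return 0x2E;  /* '.'   */
    return 0x3F;                 /* '?'   */
}

/* Protocol string path: every byte is translated on the way out. */
static size_t demo_put_string(unsigned char *out,
                              const unsigned char *s, size_t n)
{
    size_t i;

    assert(out != NULL && s != NULL);
    for (i = 0; i < n; i++)
        out[i] = demo_e2a(s[i]);
    return n;
}

/* Raw data path: bytes are copied through unchanged. */
static size_t demo_put_raw(unsigned char *out,
                           const unsigned char *data, size_t n)
{
    assert(out != NULL && data != NULL);
    memcpy(out, data, n);
    return n;
}
```
+
+ A translation at the socket level would have to treat both paths the
+ same way, which is why the port keeps them apart.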
+
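+ The "text/x-ascii-XXXX" convention from the porting notes could be
+ interpreted along the following lines. This is a sketch under stated
+ assumptions, not the server's actual code: demo_check_conversion is a
+ hypothetical helper, the comparison is case-sensitive for simplicity,
+ and rewriting the pseudo type back to "text/XXXX" before sending is
+ inferred from the description above.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decide how a document of the configured MIME type is served.
 * Writes the Content-Type to send into out and returns
 *   1 if the body must be converted from EBCDIC to ASCII,
 *   0 if the body is passed through unchanged. */
static int demo_check_conversion(const char *type, char *out, size_t outlen)
{
    static const char raw_prefix[] = "text/x-ascii-";
    static const char text_prefix[] = "text/";
    const size_t raw_len = sizeof(raw_prefix) - 1;

    assert(type != NULL && out != NULL && outlen > 0);

    /* Pseudo MIME type: already ASCII, serve as text/XXXX unconverted. */
    if (strncmp(type, raw_prefix, raw_len) == 0) {
        snprintf(out, outlen, "text/%s", type + raw_len);
        return 0;
    }

    /* Ordinary text types are stored in EBCDIC and need conversion;
     * everything else (GIF, ZIP, ...) is served as binary. */
    snprintf(out, outlen, "%s", type);
    return strncmp(type, text_prefix, sizeof(text_prefix) - 1) == 0;
}
```
+
+ With the AddType directives from the example, a file ending in .ahtml
+ would be served as text/html without conversion in this sketch, while
+ a text/plain file would be converted.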
+
+
+
+
--
2.50.1