From fc1127bbf9137ce992c1c485bc65c1b46ab081ce Mon Sep 17 00:00:00 2001 From: Martin Kraemer Date: Thu, 26 Mar 1998 15:48:49 +0000 Subject: [PATCH] Add a preliminary EBCDIC porting paper (derived from README.EBCDIC) git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@80689 13f79535-47bb-0310-9956-ffa450edef68 --- docs/manual/ebcdic.html | 201 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 201 insertions(+) create mode 100644 docs/manual/ebcdic.html diff --git a/docs/manual/ebcdic.html b/docs/manual/ebcdic.html new file mode 100644 index 0000000000..5471ef4dbc --- /dev/null +++ b/docs/manual/ebcdic.html @@ -0,0 +1,201 @@ + + + +The Apache EBCDIC Port + + + + + +

Overview of Apache EBCDIC Port

+ +

+ Version 1.3 of the Apache HTTP Server is the first version which + includes a port to a (non-ASCII) mainframe machine which uses + the EBCDIC character set as its native codeset.
+ (The target is the SIEMENS NIXDORF family of mainframes running the + BS2000/OSD + operating system. This mainframe OS nowadays features an + SVR4-derived POSIX subsystem.) +

+ +

+ The port was started initially to +

prove the feasibility of porting + the Apache HTTP server + to this platform +
find a "worthy and capable" successor for the venerable + CERN-3.0 daemon + (which was ported a couple of years ago), and to +
prove that, on this platform, Apache's preforking process model can + easily outperform the accept-fork-serve model used by CERN by a + factor of 5 or more. +

+ +

+ This document explains the rationale behind some of the design + decisions of the port to this machine. +

+ +

+ One objective of the EBCDIC port was to maintain enough backwards + compatibility with the (EBCDIC) CERN server to make the transition to + the new server attractive and easy. This required the addition of + a configurable method to define whether an HTML document was stored + in ASCII (the only format accepted by the old server) or in EBCDIC + (the native document format in the POSIX subsystem, and therefore + the only realistic format in which the other POSIX tools like grep + or sed could operate on the documents). The current solution to + this is a "pseudo-MIME-format" which is intercepted and + interpreted by the Apache server (see below). Future versions + might solve the problem by defining an "ebcdic-handler" for all + documents which must be converted. +

+ +

+ Since all Apache input and output is based upon the BUFF data type + and its methods, the easiest solution was to add the conversion to + the BUFF handling routines. The conversion must be settable at any + time, so a BUFF flag was added which defines whether conversion is + currently enabled for a BUFF object. This flag is modified at + several points in the HTTP protocol: +

set before a request is received (because the + request and the request header lines are always in ASCII + format) + +
set/unset when the request body is + received - depending on the content type of the request body + (because the request body may contain ASCII text or a binary file) + +
set before a reply header is sent (because the + response header lines are always in ASCII format) + +
set/unset when the response body is + sent - depending on the content type of the response body + (because the response body may contain text or a binary file) +
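
The flag handling listed above can be illustrated with a minimal sketch. It is not the port's C code: the `Buff` class, the `send_response` helper, and the use of Python's `cp037` codec as a stand-in for the host's EBCDIC code page are all assumptions made for illustration (the BS2000 code page differs from `cp037` in detail).

```python
import io

# Assumption: cp037 stands in for the host's EBCDIC code page.
EBCDIC = "cp037"


class Buff:
    """Hypothetical stand-in for a BUFF: an output sink plus a conversion flag."""

    def __init__(self, sink):
        self.sink = sink
        self.convert = True  # protocol lines always go out in ASCII

    def write(self, data: bytes) -> None:
        if self.convert:
            # Host bytes are EBCDIC; re-encode them for the wire.
            data = data.decode(EBCDIC).encode("latin-1")
        self.sink.write(data)


def send_response(buff: Buff, header_text: str, body: bytes, body_is_ebcdic_text: bool) -> None:
    """Send EBCDIC header lines with conversion, then set the flag for the body."""
    buff.convert = True                      # set before the reply header
    buff.write(header_text.encode(EBCDIC))   # server strings live in EBCDIC
    buff.convert = body_is_ebcdic_text       # set/unset depending on the body
    buff.write(body)
```

With the flag unset for the body, binary data such as a GIF image reaches the client byte for byte, while the header lines are still converted.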

+ +

Porting Notes

+ The relevant changes in the source are #ifdef'ed into two + categories: +
+
#ifdef CHARSET_EBCDIC +
Code which is needed for any EBCDIC based machine. This + includes character translations, differences in + contiguity of the two character sets, flags which + indicate which part of the HTTP protocol has to be + converted and which part does not, etc. +
#ifdef _OSD_POSIX +
Code which is needed for the BS2000 SIEMENS NIXDORF + mainframe platform only. This deals with include file + differences and socket implementation topics which are + only relevant on the BS2000/OSD platform. +
+

+ The option of translating between ASCII and EBCDIC at the + socket level (on BS2000 POSIX, there is a socket option which + supports this) was intentionally not chosen, because + the byte stream at the HTTP protocol level consists of a + mixture of protocol related strings and non-protocol related + raw file data. HTTP protocol strings are always encoded in + ASCII (the GET request, any Header: lines, the chunking + information etc.) whereas the file transfer parts (i.e., GIF + images, CGI output etc.) should usually be just "passed thru" + by the server. This separation between "protocol string" and + "raw data" is reflected in the server code by functions like + bgets() or rvputs() for strings, and functions like bwrite() + for binary data. A global translation of everything would + therefore be inadequate.
+ (In the case of text files, of course, provisions must be made so + that the documents are always served in ASCII format.) +
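
A small sketch of why a global translation is inadequate: translating every byte turns EBCDIC protocol strings into correct ASCII, but also rewrites raw file data that was already in wire format. The helper name `socket_level_translate` is hypothetical, and `cp037` is only assumed as a stand-in for the host's EBCDIC code page.

```python
# Assumption: cp037 stands in for the host's EBCDIC code page.
EBCDIC = "cp037"


def socket_level_translate(host_bytes: bytes) -> bytes:
    """Translate every byte from EBCDIC to ASCII, as a blanket socket option would."""
    return host_bytes.decode(EBCDIC).encode("latin-1")


# A protocol string, held in EBCDIC on the host.
status_line = "HTTP/1.0 200 OK\r\n".encode(EBCDIC)

# Raw file data: the first bytes of a GIF image, already in wire format.
gif_magic = b"GIF89a"

translated_status = socket_level_translate(status_line)  # correct ASCII
translated_gif = socket_level_translate(gif_magic)        # no longer a GIF header
```

The protocol string survives the translation, while the image signature does not, which is the reason the conversion has to be applied selectively.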

+ This port therefore features a built-in protocol level conversion + for the server-internal strings (which the compiler translated to + EBCDIC strings) and thus for all server-generated documents. + The hard coded ASCII escapes \012 and \015 which are + ubiquitous in the server code are an exception: they are + already the binary encoding of the ASCII \n and \r and must + not be converted to ASCII a second time. This exception is + only relevant for server-generated strings; external + EBCDIC documents always go through a bijective EBCDIC <-> ASCII + translation table. +
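
The exception for hard coded escapes can be shown with a short sketch. It assumes `cp037` as a stand-in for the host's EBCDIC code page, and the helper `to_wire` is hypothetical. A newline in the native encoding converts to the ASCII line feed, whereas a byte that already holds the ASCII value 012 (octal) is damaged if it is converted again.

```python
# Assumption: cp037 stands in for the host's EBCDIC code page.
EBCDIC = "cp037"


def to_wire(host_bytes: bytes) -> bytes:
    """Apply the EBCDIC -> ASCII conversion to a byte string."""
    return host_bytes.decode(EBCDIC).encode("latin-1")


# The newline as stored in the native (EBCDIC) encoding.
native_newline = "\n".encode(EBCDIC)

# The hard coded escape: already the binary value of the ASCII line feed.
hard_coded_lf = b"\012"

converted_native = to_wire(native_newline)      # becomes the ASCII line feed
converted_hard_coded = to_wire(hard_coded_lf)   # converted a second time: wrong byte
```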

+ By examining the call hierarchy for the BUFF management + routines, I added an "ebcdic/ascii conversion layer" which + would be crossed on every puts/write/get/gets, and a + conversion flag which allowed enabling/disabling the + conversions on-the-fly. It is now possible to read the header + lines of a CGI-script output in EBCDIC format, and then find + out that the remainder of the script's output is in ASCII + (as in the case of the output of a WWW Counter program: the + document body contains a GIF image). Likewise, the server + always generates its header lines in EBCDIC (and with ASCII + conversion enabled) and determines, based on the type of + document being served, whether the document body (except for + the chunking information, of course) is in ASCII already or + is converted from EBCDIC. +
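
As a rough illustration of the on-the-fly switch, the following sketch reads CGI output whose header lines are in EBCDIC and decides from the Content-Type whether the body is converted or passed through. The function `relay_cgi_output` is hypothetical, the header parsing is deliberately simplistic, and `cp037` is assumed as a stand-in for the host's EBCDIC code page.

```python
# Assumption: cp037 stands in for the host's EBCDIC code page.
EBCDIC = "cp037"


def relay_cgi_output(raw: bytes) -> bytes:
    """Convert EBCDIC header lines; convert the body only for text types."""
    blank_line = "\n\n".encode(EBCDIC)        # end of the header block
    head, _, body = raw.partition(blank_line)
    header_text = head.decode(EBCDIC)         # conversion enabled for headers

    content_type = ""
    for line in header_text.split("\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            content_type = value.strip().lower()

    if content_type.startswith("text/"):
        # Text body in EBCDIC: keep conversion enabled.
        body = body.decode(EBCDIC).encode("latin-1")
    # Otherwise conversion is switched off and the body is passed through.

    wire_head = header_text.replace("\n", "\r\n") + "\r\n\r\n"
    return wire_head.encode("latin-1") + body
```

A counter-style script that emits an EBCDIC `Content-Type: image/gif` header followed by raw GIF bytes would have only its header converted in this sketch.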

+ Text documents (MIME types text/plain, text/html etc.) + can be translated to ASCII implicitly, or (if the + users prefer to store some documents in raw ASCII form for + faster serving, or because the files reside on an NFS-mounted + directory tree) can be served without conversion. +
+ Example:
+ to serve files with the suffix .ahtml as a raw ASCII text/html + document without implicit conversion (and suffix .ascii + as ASCII text/plain), use the directives:
```
+      AddType  text/x-ascii-html  .ahtml
+      AddType  text/x-ascii-plain .ascii
+      
```
+ Similarly, any text/XXXX MIME type can be served as "raw ASCII" by + configuring a MIME type "text/x-ascii-XXXX" for it using AddType. +
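
The pseudo-MIME convention can be sketched as a small lookup. The function name `resolve_pseudo_mime` is hypothetical, and the rule that the client sees the plain `text/XXXX` type is an assumption drawn from the description above, not from the server source.

```python
def resolve_pseudo_mime(content_type: str):
    """Return (type presented to the client, whether EBCDIC -> ASCII conversion applies)."""
    prefix = "text/x-ascii-"
    if content_type.startswith(prefix):
        # Stored as raw ASCII: strip the marker and skip the conversion.
        return "text/" + content_type[len(prefix):], False
    if content_type.startswith("text/"):
        # Native EBCDIC text: convert implicitly.
        return content_type, True
    # Non-text documents are served without conversion.
    return content_type, False
```

Under these assumptions, `resolve_pseudo_mime("text/x-ascii-html")` yields `("text/html", False)`, while `resolve_pseudo_mime("text/html")` yields `("text/html", True)`.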

+ Non-text documents are always served "binary" without conversion. + This seems to be the most sensible choice for, e.g., GIF/ZIP/AU + file types. This of course requires the user to copy them to the + mainframe host using the "rcp -b" binary switch. +

+ Server parsed files are always assumed to be in native (i.e., + EBCDIC) format as used on the machine, and are converted after + processing. +

+ For CGI output, the CGI script determines whether a conversion is + needed or not: by setting the appropriate Content-Type, text files + can be converted, or GIF output can be passed through unmodified. + An example of the latter case is the wwwcount program which we ported + as well. +

+ + + + -- 2.50.1