--- /dev/null
+<?xml version='1.0' encoding='iso-8859-1'?>
+<!DOCTYPE article PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN' 'http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd'>
+<article>
+<articleinfo>
+<title>Profiling DocBook documents</title>
+<subtitle>An easy way to personalize your content for several target audiences</subtitle>
+<author>
+<firstname>Jirka</firstname>
+<surname>Kosek</surname>
+<affiliation>
+<address>E-mail: <email>jirka@kosek.cz</email>
+<otheraddr>Web: <ulink
+url="http://www.kosek.cz">http://www.kosek.cz</ulink>
+</otheraddr>
+</address>
+</affiliation>
+</author>
+<copyright>
+<year>2001</year>
+<holder>Jiří Kosek</holder>
+</copyright>
+<releaseinfo role="meta">
+$Id$
+</releaseinfo>
+</articleinfo>
+
+<section>
+<title>Introduction</title>
+
+<para>There are many situations when we need to generate several
+versions of document with slightly different content from the single
+source. User guide for program with both Windows and Linux port will
+differ only in several topics related to installation and
+configuration. It would be futile to create and maintain two different
+documents in sync. Another real world example – in addition to
+standard documentation we can have guide enriched with problem
+solutions from help-desk. It also may be better to store these
+information in one document in order to make them absolutely
+synchronized.</para>
+<para>Several high-end editing tools have built in support for
+profiling. User can easily add target audiences for any part of
+document in a simple to use dialog box. User can select desired target
+audience before printing or generation of other output formats.
+Software will automatically filter out excess parts of document and
+pass rest of it to rendering engine. However, if your budget is
+limited you can not use commercial solutions. In the following text I
+will show you simple but flexible profiling solution based on freely
+available technologies.</para>
+
+</section>
+
+<section>
+<title>$0 solution</title>
+
+<para>In the document we mark parts targeted for particular platform
+or user group. When generating final output version of document we
+must do profiling i.e. personalization for particular target audience.
+Only some parts of document are processed. DocBook has built in
+support for marking document parts – on almost every element you
+can use attributes <sgmltag class="attribute">os</sgmltag>, <sgmltag
+class="attribute">userlevel</sgmltag> and <sgmltag
+class="attribute">arch</sgmltag>. We can store identifier of operating
+system, user group or hardware architecture here. You can also store
+profiling information into some general use attribute like <sgmltag
+class="attribute">role</sgmltag>. <xref
+linkend="ex:doc"/> shows how document with profile information might
+look.</para>
+
+<example id="ex:doc">
+<title>Sample DocBook document with profiling information</title>
+<programlisting><![CDATA[<?xml version='1.0' encoding='iso-8859-1'?>
+<!DOCTYPE chapter PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN'
+ 'http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd'>
+<chapter>
+<title>How to setup SGML catalogs</title>
+
+<para>Many existing SGML tools are able to map public identifiers to
+files on your local file system. Mapping is specified in so called
+catalog file. List of catalog files to use is stored in environment
+variable <envar>SGML_CATALOG_FILES</envar>.</para>
+
+<para os="unix">On Unix systems you can set this variable by invoking
+command <command>export SGML_CATALOG_FILES=/usr/lib/catalog</command>
+on command line. If you want maintain value of the variable between
+sessions, place this command into startup file,
+e.g. <filename>.profile</filename>.</para>
+
+<para os="win">In Windows NT/2000 you can set environment variable by
+issuing command <menuchoice><guimenu>Start</guimenu>
+<guisubmenu>Settings</guisubmenu> <guisubmenu>Control
+Pannel</guisubmenu>
+<guimenuitem>System</guimenuitem></menuchoice>. Then select
+<guilabel>Advanced</guilabel> card in the dialog box and click on the
+<guibutton>Environment Variables...</guibutton> button. Using the
+<guibutton>New</guibutton> button you can add new environment variable
+into your system.</para>
+
+</chapter>]]></programlisting>
+</example>
+
+<para>DocBook documents are often processed by freely available DSSSL
+and XSL stylesheets. Most DocBook users who want profiling starts with
+creation of customization layer which filters out some parts of
+document. This approach has several serious disadvantages. First, you
+must create profiling customization for all output formats as they are
+using different stylesheets. This mean that you must maintain same
+code on several places or do some dirty tricks with importing several
+stylesheets into one stylesheet.</para>
+<para>Second drawback is more serious. If you override templates to
+filter out documents, you can get almost correct output in a single
+run of stylesheet. If you will closely look on generated output, you
+will recognize that in the table of contents there are entries for
+items which should be completely removed by profiling. Similar
+problems are in several other places – e.g. skipped auto
+generated numbers for tables, pictures and so on. To correct this one
+should change all stylesheet code which generates ToC,
+cross-references and so on to ignore filtered content. This is very
+complicated task and will disallow you to easily upgrade to new
+versions of stylesheets.</para>
+<para>Thus we must use different approach. Profiling should be totally
+separate step which will filter out some parts of original document
+and will create new correct DocBook document. When processed with any
+DocBook tool or stylesheet you will get always correct output from the
+new standalone document now. Big advantage of this method is
+compatibility with all DocBook tools. Filtered document is normal
+DocBook document and it does not require any special processing. Of
+course, there is also one disadvantage – formating is now two
+stage process – first you must filter source document and in
+second step you could apply normal stylesheets on result of filtering.
+This may be little bit inconvenient for many users, but whole task can
+be very easily automated by set of shell scripts or batch files or
+whatever else.</para>
+
+<figure>
+<title>Profiling stream</title>
+<mediaobject>
+<imageobject>
+<imagedata fileref="profile-chain.png" format="PNG"/>
+</imageobject>
+</mediaobject>
+</figure>
+
+<para>When implementing filter, you can use many different approaches
+and tools. I decided to use XSLT stylesheet. Writing necessary filter
+is very easy in XSLT and many users have XSLT processor already
+installed. Profiling stylesheet is part of standard XSL stylesheets
+distribution and can be found in file
+<filename>tools/profile.xsl</filename>.</para>
+</section>
+
+<section>
+<title>Usage</title>
+
+<para>If you want to generate Unix specific guide from our sample
+document (<xref linkend="ex:doc"/>) you can do it in the following
+way. (We assume, that command <command>saxon</command> is able to run
+XSLT processor on your machine. You can use your preffered XSLT
+processor instead.)</para>
+
+<screen><command>saxon</command> <option>-o</option> unixsample.xml sample.xml profile.xsl "os=unix"</screen>
+
+<para>We are processing source document
+<filename>sample.xml</filename> with profiling stylesheet
+<filename>profile.xsl</filename>. Result of transformation is stored
+in file <filename>unixsample.xml</filename>. By setting parameter
+<parameter>os</parameter> to value <literal>unix</literal>, we tell
+that only general and Unix specific parts of document should be copied
+to the result document. If you will look at generated result, you will
+notice that this is correct DocBook document:</para>
+
+<programlisting><![CDATA[<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE chapter
+ PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">
+<chapter>
+<title>How to setup SGML catalogs</title>
+
+<para>Many existing SGML tools are able to map public identifiers to
+files on your local file system. Mapping is specified in so called
+catalog file. List of catalog files to use is stored in environment
+variable <envar>SGML_CATALOG_FILES</envar>.</para>
+
+<para os="unix">On Unix systems you can set this variable by invoking
+command <command moreinfo="none">export SGML_CATALOG_FILES=/usr/lib/catalog</command>
+on command line. If you want maintain value of the variable between
+sessions, place this command into startup file,
+e.g. <filename moreinfo="none">.profile</filename>.</para>
+
+</chapter>]]></programlisting>
+
+<para>It is same as the input document, only Windows specific
+paragraph is missing. Same procedure can be used to get Windows
+specific version of document. The result generated by profiling
+stylesheet have correct document type declaration (DOCTYPE). Without
+it some tools would not be able to process them further. On the result
+of filtering you can run common tools – for example DSSSL or XSL
+stylesheets.</para>
+
+<para>Stylesheet support several attributes for specifying profiling
+values. They are summarized in the following list.</para>
+
+<variablelist>
+<varlistentry>
+<term><parameter>os</parameter></term>
+<listitem>
+<para>This parameter is used for specifying operating system (<sgmltag
+class="attribute">os</sgmltag> attribute) for which you want get
+profiled version of document.</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><parameter>ul</parameter></term>
+<listitem>
+<para>This parameter is used for specifying user level (<sgmltag
+class="attribute">userlevel</sgmltag> attribute) for which you want get
+profiled version of document.</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><parameter>arch</parameter></term>
+<listitem>
+<para>This parameter is used for specifying hardware architecture (<sgmltag
+class="attribute">arch</sgmltag> attribute) for which you want get
+profiled version of document.</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><parameter>attr</parameter></term>
+<listitem>
+<para>Name of attribute on which profiling should be based. It can be
+used if profiling information is stored in other attributes then
+<sgmltag class="attribute">os</sgmltag>, <sgmltag
+class="attribute">userlevel</sgmltag> and <sgmltag class="attribute">arch</sgmltag>.</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><parameter>val</parameter></term>
+<listitem>
+<para>This parameter is used for specifying value for attribute
+selected by <parameter>attr</parameter> parameter.</para>
+<para>E.g. setting <literal>attr=os</literal> and
+<literal>val=unix</literal> is same as setting
+<literal>os=unix</literal>.</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><parameter>sep</parameter></term>
+<listitem>
+<para>Separator for multiple target audience identifiers. Default is
+<literal>;</literal>.</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+
+<para>Current implementation is able to handle multiple profiling
+targets in one attribute. In that case you must separate identifiers
+by semicolon:</para>
+
+<programlisting><![CDATA[<para os="unix;mac;win">...</para>]]></programlisting>
+
+<para>It is possible to use different separator than semicolon by
+setting <parameter>sep</parameter> parameter. There cann't be spaces
+between separator and target names.</para>
+
+<para>You can also perform profiling based on several profiling
+attributes in a single step as stylesheet can handle all parameters
+simultaneously. For example to get hypothetical guide for Windows
+beginners, you can run profiling like this:</para>
+
+<screen><command>saxon</command> <option>-o</option> xsample.xml sample.xml profile.xsl "os=win" "ul=beginner"</screen>
+
+<para>As you can see above described profiling process can be used to
+substitute SGML marked sections mechanism which is missing in XML.</para>
+
+</section>
+
+<section>
+<title>Conclusion</title>
+
+<para>Profiling is necessary in many larger DocBook applications. It
+can be quite easily implemented by simple XSLT stylesheet which is
+presented here. This mechanism can also be used to simulate behavior
+of marked sections known from SGML.</para>
+
+</section>
+
+</article>
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-shorttag:t
+End:
+-->