From a0648ce844f3bb8ea1188aa8b0583b1b6c06c86c Mon Sep 17 00:00:00 2001
From: Bob Stayton <bobs@sagehill.net>
Date: Fri, 5 Nov 2004 23:29:15 +0000
Subject: [PATCH] Initial checkin of draft DocBook-WordML specifications.

---
 xsl/wordml/specifications.xml | 426 ++++++++++++++++++++++++++++++++++
 1 file changed, 426 insertions(+)
 create mode 100755 xsl/wordml/specifications.xml

diff --git a/xsl/wordml/specifications.xml b/xsl/wordml/specifications.xml
new file mode 100755
index 000000000..320d77130
--- /dev/null
+++ b/xsl/wordml/specifications.xml
@@ -0,0 +1,426 @@
+<?xml version="1.0"?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "docbook4.dtd">
+<article>
+<title>DocBook-WordML Conversion Specifications</title> 
+<articleinfo><author><firstname>Bob</firstname><surname>Stayton</surname><affiliation><orgname>Sagehill
+Enterprises</orgname></affiliation></author><releaseinfo>Draft Version 1.0,
+dated 5 November, 2004</releaseinfo> 
+</articleinfo> 
+<abstract>
+<para>This document specifies how DocBook elements can be mapped to Microsoft Word styles. The specifications will be used to write conversions between DocBook XML and Microsoft's WordprocessingML (WordML).</para>
+</abstract> 
+<section>
+<title>Introduction</title>
+<para>Microsoft Word 2003 introduced WordprocessingML (WordML), an XML vocabulary for Word documents. Because a Word document can be saved as XML, it becomes possible to convert a Word document to DocBook and vice versa using XSL transformations. Such conversions would enable the following:</para>
+<itemizedlist>
+<listitem>
+<para>DocBook content creators could write in Word, a familiar word processing application, rather than learning a new XML editing application.</para>
+</listitem>
+<listitem>
+<para>DocBook XML documents could be styled for output using the typesetting features of Word.</para>
+</listitem>
+</itemizedlist>
+<para>This specification describes how DocBook elements could map to a set of Word paragraph and character styles. It defines a specific set of style names for which a Word style template can be created. The style names would also be used in XSLT template match patterns for conversion. </para>
+</section>
+<section>
+<title>Project goals</title>
+<para>The goal of this project is to enable Microsoft Word to be used with DocBook files.  The specific goals include:</para>
+<itemizedlist>
+<listitem>
+<para>Enable authoring of basic DocBook documents in Word.</para>
+</listitem>
+<listitem>
+<para>Enable importing of basic DocBook XML documents into Word.</para>
+</listitem>
+</itemizedlist>
+<para>To meet these goals, the project will produce a toolkit that can be immediately put to use.  The kit will include:</para>
+<itemizedlist>
+<listitem>
+<para>A Word template with formatting styles attached to the style names.</para>
+</listitem>
+<listitem>
+<para>A wordml-to-docbook XSLT stylesheet, which can convert a Word document that is authored with the Word template into a DocBook XML file.</para>
+</listitem>
+<listitem>
+<para>A docbook-to-wordml XSLT stylesheet, which can convert a DocBook document into a WordML document that can be opened in Word with the attached Word template.</para>
+</listitem>
+</itemizedlist>
+<section>
+<title>Why basic DocBook?</title>
+<para>It isn't clear that this project will ever be able to support all DocBook elements and structure. The project will initially focus on a basic set of commonly used DocBook elements to demonstrate the feasibility and usefulness of using Word with DocBook. </para>
+<para>One problem facing this conversion project is the sheer number of DocBook elements: over 400 in DocBook 4.3. To support DocBook structural models, several of the elements will require more than one Word style. This could lead to a very long and unwieldy list of styles in the Word interface, which would make authoring less efficient and discourage users.</para>
+<para>So this project assumes that authors who need the full set of DocBook elements will use an XML authoring tool that better supports them. This project will enable authors to write basic DocBook documents using Word. Because Word is so widespread, this project will help a lot of new DocBook users get started with familiar tools.  They can then graduate to more advanced tools as their needs develop.</para>
+</section>
+</section>
+<section>
+<title>Mapping elements to styles</title> 
+<para>Although WordML and DocBook are both XML, there are several challenges when trying to convert between them.</para>
+<para>The basic problem in mapping Word styles to DocBook elements is that Word documents support far less structure than DocBook.  DocBook permits nesting of elements within other elements, providing multiple levels of context for each element.  </para>
+<para>Word's only structural feature is its outlining mode. In Word outlining, certain paragraph styles are assigned outline levels.  When a user applies those styles, they effectively create logical structure in the Word document.  When such a document is saved to WordML, the outline levels are rendered as nested <literal>wx:sub-section</literal> elements. The outlining feature will be used for components and sections in this specification.</para>
+<para>Nesting of block elements is another commonly used feature of DocBook.  It is not possible to use Word's outline mode for blocks if it is being used for components and sections.  In this specification, nesting of block elements is therefore indicated by adding a number suffix to a style. For example, a Word paragraph with style <literal>orderedlist2</literal> is considered to be contained within a preceding paragraph with style <literal>orderedlist</literal>. In Word, paragraph indent levels will be used to visually indicate nesting of blocks. </para>
+<para>Nesting of inline DocBook elements is particularly difficult to support because  Word does not nest character styles. That means a nested inline would require a separate Word style to indicate the parent-child relationship. Given the large number of combinations possible, a prohibitively large number of character styles would have to be created. In this project, nesting of character styles will not be supported in the first release. Nested inlines being imported from DocBook will be converted to a sequence of single-name  Word character styles.</para>
+<para>In many cases, DocBook structure can be derived from the flat Word sequence of paragraphs based on sibling relationships. For example, when a paragraph styled as <literal>para</literal> is followed by a paragraph styled as  <literal>itemizedlist</literal>,   the conversion to DocBook will output a <sgmltag class="element">para</sgmltag> element and then start an <sgmltag class="element">itemizedlist</sgmltag> element, with the second paragraph as its first <sgmltag class="element">listitem</sgmltag>. All <literal>itemizedlist</literal> paragraphs that follow without interruption are put in the same <sgmltag class="element">itemizedlist</sgmltag> element.</para>
+<para>Here are the design principles used in this project for selecting Word style names:</para>
+<itemizedlist>
+<listitem>
+<para>Word paragraph and character style names will match DocBook element names as much as possible. This will enable authors to learn DocBook element names, and help debug problems with conversion.</para>
+</listitem>
+<listitem>
+<para>Some style names will indicate a parent-child relationship.  For example, <literal>chapter-title</literal> indicates that the paragraph is a title whose DocBook parent is a chapter.</para>
+</listitem>
+<listitem>
+<para>Some style names are simplified to make them easier to use in Word. For example, a paragraph in an orderedlist requires three elements in DocBook: <sgmltag class="element">orderedlist</sgmltag>, <sgmltag class="element">listitem</sgmltag>, and <sgmltag class="element">para</sgmltag>. The paragraph style name in Word is shortened from <literal>orderedlist-listitem-para</literal> to just <literal>orderedlist</literal>. </para>
+</listitem>
+<listitem>
+<para>Style names with a number suffix indicate a nesting level, as described above.</para>
+</listitem>
+<listitem>
+<para>Style names with <literal>continue</literal> indicate that the paragraph is part of the preceding element. For example, a <literal>note</literal> paragraph is used for a single-paragraph <sgmltag class="element">note</sgmltag> element. But if a note is to contain more than one paragraph, then the subsequent paragraphs in Word would get a <literal>note continue</literal> style. If the <literal>note</literal> style were used for them, they would be taken as separate <sgmltag class="element">note</sgmltag> elements in the conversion to DocBook.</para>
+</listitem>
+<listitem>
+<para>The first paragraph style in the Word document is used to define the root element of the DocBook document. For example, if a Word document starts with <literal>book-title</literal>, then the DocBook document will have <literal>book</literal> as its root element.  All the rest of the document content will be contained in that root element.</para>
+</listitem>
+</itemizedlist>
+<para>Attributes are a feature of DocBook XML that has no direct counterpart in Word. One approach is to use Word Bookmarks for attributes. For example, a Word Bookmark named <literal>att_role_foobar</literal> could be inserted into a paragraph. When converted to DocBook XML, this would become a <sgmltag class="attribute">role="foobar"</sgmltag> attribute on the element derived from the paragraph containing the Bookmark.</para>
+<table>
+<title>DocBook to WordML styles</title>
+<tgroup cols="3"><colspec colnum="1" colname="col1"
+colwidth="1.00*"/>
+<colspec colnum="2" colname="col2" colwidth="1.89*"/>
+<colspec colnum="3" colname="col3" colwidth="1.97*"/>
+<thead>
+<row>
+<entry colname="col1">DocBook element</entry>
+<entry colname="col2">WordML styles</entry>
+<entry colname="col3">Comments</entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry namest="col1" nameend="col3"><emphasis role="bold">Components and sections</emphasis></entry>
+</row>
+<row>
+<entry colname="col1">book</entry>
+<entry colname="col2">book-title</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">chapter</entry>
+<entry colname="col2">chapter-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">appendix</entry>
+<entry colname="col2">appendix-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">preface</entry>
+<entry colname="col2">preface-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">article</entry>
+<entry colname="col2">article-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">bibliography</entry>
+<entry colname="col2">bibliography-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">glossary</entry>
+<entry colname="col2">glossary-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">index</entry>
+<entry colname="col2">index-title</entry>
+<entry colname="col3">Assigned Word outline level 1.</entry>
+</row>
+<row>
+<entry colname="col1">sect1</entry>
+<entry colname="col2">sect1-title</entry>
+<entry colname="col3">Assigned Word outline level 2.</entry>
+</row>
+<row>
+<entry colname="col1">sect2</entry>
+<entry colname="col2">sect2-title</entry>
+<entry colname="col3">Assigned Word outline level 3.</entry>
+</row>
+<row>
+<entry colname="col1">sect3</entry>
+<entry colname="col2">sect3-title</entry>
+<entry colname="col3">Assigned Word outline level 4.</entry>
+</row>
+<row>
+<entry colname="col1">sect4</entry>
+<entry colname="col2">sect4-title</entry>
+<entry colname="col3">Assigned Word outline level 5.</entry>
+</row>
+<row>
+<entry colname="col1">sect5</entry>
+<entry colname="col2">sect5-title</entry>
+<entry colname="col3">Assigned Word outline level 6.</entry>
+</row>
+<row>
+<entry namest="col1" nameend="col3"><emphasis role="bold">Block-level elements</emphasis></entry>
+</row>
+<row>
+<entry colname="col1">para</entry>
+<entry colname="col2">para</entry>
+<entry colname="col3">Any Word paragraph with style <literal>Normal</literal> will also be converted to a <sgmltag class="element">para</sgmltag> element.</entry>
+</row>
+<row>
+<entry colname="col1">note/para</entry>
+<entry colname="col2"><literallayout>note
+note continue</literallayout></entry>
+<entry colname="col3">Any paragraphs after the first in a note must use <literal>note continue</literal> to be treated as part of the same <sgmltag class="element">note</sgmltag> element.</entry>
+</row>
+<row>
+<entry colname="col1">note/title</entry>
+<entry colname="col2">note-title</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">caution/para</entry>
+<entry colname="col2"><literallayout>caution
+caution continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">warning/para</entry>
+<entry colname="col2"><literallayout>warning
+warning continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">important/para</entry>
+<entry colname="col2"><literallayout>important
+important continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">tip/para</entry>
+<entry colname="col2"><literallayout>tip
+tip continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">itemizedlist/listitem/para</entry>
+<entry colname="col2"><literallayout class="monospaced">itemizedlist
+itemizedlist continue
+itemizedlist2
+itemizedlist2 continue
+itemizedlist3
+itemizedlist3 continue
+itemizedlist4
+itemizedlist4 continue</literallayout></entry>
+<entry colname="col3">A <literal>continue</literal> suffix indicates a paragraph is part of the same listitem as the preceding paragraph. A number suffix indicates a nesting level within other lists.</entry>
+</row>
+<row>
+<entry colname="col1">orderedlist/listitem/para</entry>
+<entry colname="col2"><literallayout class="monospaced">orderedlist
+orderedlist continue
+orderedlist2
+orderedlist2 continue
+orderedlist3
+orderedlist3 continue
+orderedlist4
+orderedlist4 continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">variablelist/varlistentry/term</entry>
+<entry colname="col2"><literallayout class="monospaced">variablelist-term
+variablelist-term2
+variablelist-term3
+variablelist-term4</literallayout></entry>
+<entry colname="col3">A <sgmltag class="element">variablelist</sgmltag> in Word should be a sequence of alternating paragraphs styled as <literal>variablelist-term</literal> and <literal>variablelist</literal>.</entry>
+</row>
+<row>
+<entry colname="col1">variablelist/varlistentry/listitem/para</entry>
+<entry colname="col2"><literallayout class="monospaced">variablelist
+variablelist continue
+variablelist2
+variablelist2 continue
+variablelist3
+variablelist3 continue
+variablelist4
+variablelist4 continue</literallayout></entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">example with title and programlisting children</entry>
+<entry colname="col2">example-title followed by programlisting</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">example with title and literallayout children</entry>
+<entry colname="col2">example-title followed by literallayout</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">example with title and mediaobject children</entry>
+<entry colname="col2">example-title followed by image styled with example style</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">figure with title and programlisting children</entry>
+<entry colname="col2">figure-title followed by programlisting</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">figure with title and literallayout children</entry>
+<entry colname="col2">figure-title followed by literallayout</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">figure with title and mediaobject children</entry>
+<entry colname="col2">figure-title followed by image styled with figure style</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">informalfigure</entry>
+<entry colname="col2">image tagged as figure style</entry>
+<entry colname="col3">An image with no <literal>figure-title</literal> paragraph above or below it.</entry>
+</row>
+<row>
+<entry colname="col1">table</entry>
+<entry colname="col2">Word table</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">table/title</entry>
+<entry colname="col2">table-title</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">informaltable</entry>
+<entry colname="col2">Word table</entry>
+<entry colname="col3">A Word table with no <literal>table-title</literal> paragraph above or below it.</entry>
+</row>
+<row>
+<entry colname="col1">literallayout</entry>
+<entry colname="col2">literallayout</entry>
+<entry colname="col3">Inside a <literal>literallayout</literal> paragraph in Word, lines should be separated by a line break (Shift-Enter) rather than a paragraph break (Enter).</entry>
+</row>
+<row>
+<entry colname="col1">programlisting</entry>
+<entry colname="col2">programlisting</entry>
+<entry colname="col3">Inside a <literal>programlisting</literal> paragraph in Word, lines should be separated by a line break (Shift-Enter) rather than a paragraph break (Enter). Tabs are not supported.</entry>
+</row>
+<row>
+<entry colname="col1">blockquote/para</entry>
+<entry colname="col2">blockquote</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">blockquote/title</entry>
+<entry colname="col2">blockquote-title</entry>
+<entry colname="col3">Should immediately precede a <literal>blockquote</literal> paragraph in Word.</entry>
+</row>
+<row>
+<entry colname="col1">blockquote/attribution</entry>
+<entry colname="col2">blockquote-attribution</entry>
+<entry colname="col3">Should immediately follow a <literal>blockquote</literal> paragraph in Word.</entry>
+</row>
+<row>
+<entry namest="col1" nameend="col3"><emphasis role="bold">Inline elements</emphasis></entry>
+</row>
+<row>
+<entry colname="col1">emphasis</entry>
+<entry colname="col2">emphasis</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">emphasis with @role="bold"</entry>
+<entry colname="col2">emphasis-bold</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">footnote</entry>
+<entry colname="col2">Word footnote</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">link</entry>
+<entry colname="col2">link</entry>
+<entry colname="col3">In Word, hyperlink properties identify the DocBook linkend.</entry>
+</row>
+<row>
+<entry colname="col1">xref</entry>
+<entry colname="col2">xref</entry>
+<entry colname="col3">In Word, hyperlink properties identify the DocBook linkend. Placeholder text can be used in Word, but it will be discarded on export to DocBook, where xref is an empty element.</entry>
+</row>
+<row>
+<entry colname="col1">olink</entry>
+<entry colname="col2">olink</entry>
+<entry colname="col3">In Word, hyperlink properties identify the DocBook targetdoc and targetptr.</entry>
+</row>
+<row>
+<entry colname="col1">ulink</entry>
+<entry colname="col2">ulink</entry>
+<entry colname="col3">In Word, hyperlink properties identify the URL.</entry>
+</row>
+<row>
+<entry colname="col1">glossterm</entry>
+<entry colname="col2">glossterm</entry>
+<entry colname="col3">In Word, hyperlink properties identify the DocBook linkend.</entry>
+</row>
+<row>
+<entry colname="col1">firstterm</entry>
+<entry colname="col2">firstterm</entry>
+<entry colname="col3">In Word, hyperlink properties identify the DocBook linkend.</entry>
+</row>
+<row>
+<entry colname="col1">computeroutput</entry>
+<entry colname="col2">computeroutput</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">literal</entry>
+<entry colname="col2">literal</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">replaceable</entry>
+<entry colname="col2">replaceable</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">userinput</entry>
+<entry colname="col2">userinput</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">command</entry>
+<entry colname="col2">command</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">filename</entry>
+<entry colname="col2">filename</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">option</entry>
+<entry colname="col2">option</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">parameter</entry>
+<entry colname="col2">parameter</entry>
+<entry colname="col3"></entry>
+</row>
+<row>
+<entry colname="col1">systemitem</entry>
+<entry colname="col2">systemitem</entry>
+<entry colname="col3"></entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+</section> 
+</article>
-- 
2.40.0