From a0648ce844f3bb8ea1188aa8b0583b1b6c06c86c Mon Sep 17 00:00:00 2001 From: Bob Stayton Date: Fri, 5 Nov 2004 23:29:15 +0000 Subject: [PATCH] Initial checkin of draft DocBook-WordML specifications. --- xsl/wordml/specifications.xml | 426 ++++++++++++++++++++++++++++++++++ 1 file changed, 426 insertions(+) create mode 100755 xsl/wordml/specifications.xml diff --git a/xsl/wordml/specifications.xml b/xsl/wordml/specifications.xml new file mode 100755 index 000000000..320d77130 --- /dev/null +++ b/xsl/wordml/specifications.xml @@ -0,0 +1,426 @@ + + +
+DocBook-WordML Conversion Specifications +Bob Stayton, Sagehill +Enterprises. Draft Version 1.0, +dated 5 November 2004 + + +This document specifies how DocBook elements can be mapped to Microsoft Word styles. The specifications will be used to write conversions between DocBook XML and Microsoft's WordProcessingML (WordML). + +
+Introduction +Microsoft Word 2003 introduced WordProcessingML (WordML), an XML vocabulary for Word documents. Because a Word document can be saved as XML, it becomes possible to convert it to DocBook and vice versa using XSL transformations. Such conversions would enable the following: + + +DocBook content creators could write in Word, a familiar word processing application, rather than learning a new XML editing application. + + +DocBook XML documents could be styled for output using the typesetting features of Word. + + +This specification describes how DocBook elements could map to a set of Word paragraph and character styles. It defines a specific set of style names for which a Word style template can be created. The style names would also be used in XSLT template match patterns for conversion. +
+
+Project goals +The goal of this project is to enable Microsoft Word to be used with DocBook files. The specific goals include: + + +Enable authoring of basic DocBook documents in Word. + + +Enable importing of basic DocBook XML documents into Word. + + +To meet these goals, the project will produce a toolkit that can be immediately put to use. The kit will include: + + +A Word template with formatting styles attached to the style names. + + +A wordml-to-docbook XSLT stylesheet, which can convert a Word document that is authored with the Word template into a DocBook XML file. + + +A docbook-to-wordml XSLT stylesheet, which can convert a DocBook document into a WordML document that can be opened in Word with the attached Word template. + + +
+Why basic DocBook? +It isn't clear that this project will ever be able to support all DocBook elements and structure. The project will initially focus on a basic set of commonly used DocBook elements to demonstrate the feasibility and usefulness of using Word with DocBook. +One problem facing this conversion project is the sheer number of DocBook elements, over 400 in DocBook 4.3. To support DocBook structural models, several of the elements will require more than one Word style. This could lead to a very long and unwieldy list of styles in the Word interface. That would make authoring less efficient and discourage users. +So this project assumes that authors who need the full set of DocBook elements will use an XML authoring tool that better supports them. This project will enable authors to write basic DocBook documents using Word. Because Word is so widespread, this project will help a lot of new DocBook users get started with familiar tools. They can then graduate to more advanced tools as their needs develop. +
+
+
+Mapping elements to styles +Although WordML and DocBook are both XML, there are several challenges when trying to convert between them. +The basic problem in mapping Word styles to DocBook elements is that Word documents support far less structure than DocBook. DocBook permits nesting of elements within other elements, providing multiple levels of context for each element. +Word's only structural feature is the outlining mode. In Word outlining, certain paragraph styles are assigned outline levels. When a user applies those styles, they effectively create logical structure in the Word document. When such a document is saved to WordML, the outline levels are rendered as nested wx:sub-section elements. The outlining feature will be used for components and sections in this specification. +Nesting of block elements is another commonly used feature of DocBook. It is not possible to use Word's outline mode for blocks if it is being used for components and sections. In this specification, therefore, nesting of block elements is indicated by adding a number suffix to a style name. For example, a Word paragraph with style orderedlist2 is considered to be contained within a preceding paragraph with style orderedlist. In Word, paragraph indent levels will be used to visually indicate nesting of blocks. +Nesting of inline DocBook elements is particularly difficult to support because Word does not nest character styles. That means a nested inline would require a separate Word style to indicate the parent-child relationship. Given the large number of possible combinations, a prohibitively large number of character styles would have to be created. In this project, nesting of character styles will not be supported in the first release. Nested inlines being imported from DocBook will be converted to a sequence of single-name Word character styles. +In many cases, DocBook structure can be derived from the flat Word sequence of paragraphs based on sibling relationships. 
For example, when a paragraph styled as para is followed by a paragraph styled as itemizedlist, the conversion to DocBook will output a para element and then start an itemizedlist element, with the second paragraph as its first listitem. All itemizedlist paragraphs that follow without interruption are put in the same itemizedlist element. +Here are the design principles used in this project for selecting Word style names: + + +Word paragraph and character style names will match DocBook element names as much as possible. This will help authors learn DocBook element names and debug conversion problems. + + +Some style names will indicate a parent-child relationship. For example, chapter-title indicates that the paragraph is a title whose DocBook parent is a chapter. + + +Some style names are simplified to make them easier to use in Word. For example, a paragraph in an orderedlist requires three elements in DocBook: orderedlist, listitem, and para. The paragraph style name in Word is shortened from orderedlist-listitem-para to just orderedlist. + + +Style names with a number suffix indicate a nesting level, as described above. + + +Style names with continue indicate that the paragraph is part of the preceding element. For example, a note paragraph is used for a single-paragraph note element. But if a note is to contain more than one paragraph, then the subsequent paragraphs in Word would get a note continue style. If the note style were used, then they would be taken as separate note elements in the conversion to DocBook. + + +The first paragraph style in the Word document is used to define the root element of the DocBook document. For example, if a Word document starts with book-title, then the DocBook document will have book as its root element. All the rest of the document content will be contained in that root element. + + +Attributes are a feature of DocBook XML that have no direct counterpart in Word. 
One approach is to use Word Bookmarks for attributes. For example, a Word Bookmark named att_role_foobar could be inserted into a paragraph. When converted to DocBook XML, this would become a role="foobar" attribute on the element derived from the paragraph containing the Bookmark. + +DocBook to WordML styles + + + + + +DocBook element +WordML styles +Comments + + + + +Components and sections + + +book +book-title + + + +chapter +chapter-title +Assigned Word outline level 1. + + +appendix +appendix-title +Assigned Word outline level 1. + + +preface +preface-title +Assigned Word outline level 1. + + +article +article-title +Assigned Word outline level 1. + + +bibliography +bibliography-title +Assigned Word outline level 1. + + +glossary +glossary-title +Assigned Word outline level 1. + + +index +index-title +Assigned Word outline level 1. + + +sect1 +sect1-title +Assigned Word outline level 2. + + +sect2 +sect2-title +Assigned Word outline level 3. + + +sect3 +sect3-title +Assigned Word outline level 4. + + +sect4 +sect4-title +Assigned Word outline level 5. + + + sect5 +sect5-title +Assigned Word outline level 6. + + +Block-level elements + + +para +para +Any Word paragraph with style Normal will also be converted to a para element. + + +note/para +note +note continue +Any paragraphs after the first in a note must use note continue to be treated as part of the same note element. + + +note/title +note-title + + + +caution/para +caution +caution continue + + + +warning/para +warning +warning continue + + + +important/para +important +important continue + + + +tip/para +tip +tip continue + + + +itemizedlist/listitem/para +itemizedlist +itemizedlist continue +itemizedlist2 +itemizedlist2 continue +itemizedlist3 +itemizedlist3 continue +itemizedlist4 +itemizedlist4 continue +A continue suffix indicates a paragraph is part of the same listitem as the preceding paragraph. A number suffix indicates a nesting level within other lists. 
+ + +orderedlist/listitem/para +orderedlist +orderedlist continue +orderedlist2 +orderedlist2 continue +orderedlist3 +orderedlist3 continue +orderedlist4 +orderedlist4 continue + + + +variablelist/varlistentry/term +variablelist-term +variablelist-term2 +variablelist-term3 +variablelist-term4 +A variablelist in Word should be a sequence of alternating paragraphs styled as variablelist-term and variablelist. + + +variablelist/varlistentry/listitem/para +variablelist +variablelist continue +variablelist2 +variablelist2 continue +variablelist3 +variablelist3 continue +variablelist4 +variablelist4 continue + + + +example with title and programlisting children +example-title followed by programlisting + + + +example with title and literallayout children +example-title followed by literallayout + + + +example with title and mediaobject children +example-title followed by image styled with example style + + + +figure with title and programlisting children +figure-title followed by programlisting + + + +figure with title and literallayout children +figure-title followed by literallayout + + + +figure with title and mediaobject children +figure-title followed by image styled with figure style + + + +informalfigure +image tagged as figure style +with no figure-title above or below + + +table +Word table + + + +table/title +table-title + + + +informaltable +Word table +with no table-title above or below + + +literallayout +literallayout +Inside a literallayout paragraph in Word, lines should be separated by a line break (Shift-Enter) rather than a paragraph break (Enter). + + +programlisting +programlisting +Inside a programlisting paragraph in Word, lines should be separated by a line break (Shift-Enter) rather than a paragraph break (Enter). Tabs are not supported. + + +blockquote/para +blockquote + + + +blockquote/title +blockquote-title +Should immediately precede a blockquote paragraph in Word. 
+ + +blockquote/attribution +blockquote-attribution +Should immediately follow a blockquote paragraph in Word. + + +Inline elements + + +emphasis +emphasis + + + +emphasis with @role="bold" +emphasis-bold + + + +footnote +Word footnote + + + +link +link +In Word, hyperlink properties identify the DocBook linkend. + + +xref +xref +In Word, hyperlink properties identify the DocBook linkend. Some placeholder text can be used in Word, but it will be discarded when exported to DocBook, where xref is an empty element. + + +olink +olink +In Word, hyperlink properties identify the DocBook targetdoc and targetptr. + + +ulink +ulink +In Word, hyperlink properties identify the URL. + + +glossterm +glossterm +In Word, hyperlink properties identify the DocBook linkend. + + +firstterm +firstterm +In Word, hyperlink properties identify the DocBook linkend. + + +computeroutput +computeroutput + + + +literal +literal + + + +replaceable +replaceable + + + +userinput +userinput + + + +command +command + + + +filename +filename + + + +option +option + + + +parameter +parameter + + + +systemitem +systemitem + + + + +
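The naming conventions in this section (number suffix, continue suffix, sibling grouping of list paragraphs, and att_ Bookmarks) can be illustrated with a short sketch. The toolkit itself is specified as XSLT stylesheets; the Python below is only an illustration, not the project's implementation. The names Style, parse_style, parse_attribute_bookmark, and paragraphs_to_docbook are hypothetical, the att_name_value Bookmark pattern is generalized from the single att_role_foobar example, and the grouping handles only first-level itemizedlist and orderedlist paragraphs.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, NamedTuple, Tuple


class Style(NamedTuple):
    base: str          # style name without suffixes, e.g. "orderedlist"
    level: int         # nesting level from the number suffix; 1 if absent
    is_continue: bool  # True when the name ends in " continue"


# Lazy base name, optional trailing digits, optional " continue" suffix.
# Digits inside a name (as in "sect1-title") stay part of the base.
_STYLE_RE = re.compile(r"(?P<base>.*?[A-Za-z])(?P<level>\d+)?(?P<cont> continue)?")


def parse_style(name: str) -> Style:
    """Split a Word style name into base name, nesting level, and continue flag."""
    m = _STYLE_RE.fullmatch(name)
    if m is None:
        return Style(name, 1, False)
    return Style(m.group("base"), int(m.group("level") or 1), m.group("cont") is not None)


def parse_attribute_bookmark(bookmark: str) -> Tuple[str, str]:
    """Assumed pattern att_<name>_<value>, e.g. att_role_foobar -> ("role", "foobar")."""
    prefix, attr_name, attr_value = bookmark.split("_", 2)
    if prefix != "att":
        raise ValueError(f"not an attribute bookmark: {bookmark!r}")
    return attr_name, attr_value


LIST_STYLES = {"itemizedlist", "orderedlist"}


def paragraphs_to_docbook(paragraphs: Iterable[Tuple[str, str]],
                          root_name: str = "section") -> ET.Element:
    """Group a flat sequence of (style name, text) pairs by sibling relationship.

    Simplification for this sketch: only first-level list styles are grouped;
    every other style is emitted as a plain para.
    """
    root = ET.Element(root_name)
    current_list = None  # the list element currently open, if any
    current_item = None  # the listitem most recently opened in that list
    for style_name, text in paragraphs:
        style = parse_style(style_name)
        if style.base in LIST_STYLES and style.level == 1:
            same_list = current_list is not None and current_list.tag == style.base
            if style.is_continue and same_list:
                # Additional paragraph in the preceding listitem.
                ET.SubElement(current_item, "para").text = text
                continue
            if not same_list:
                current_list = ET.SubElement(root, style.base)
            current_item = ET.SubElement(current_list, "listitem")
            ET.SubElement(current_item, "para").text = text
        else:
            # Any other paragraph interrupts the list.
            current_list = current_item = None
            ET.SubElement(root, "para").text = text
    return root


if __name__ == "__main__":
    doc = paragraphs_to_docbook([
        ("para", "Intro"),
        ("itemizedlist", "First item"),
        ("itemizedlist continue", "More on the first item"),
        ("itemizedlist", "Second item"),
    ])
    print(ET.tostring(doc, encoding="unicode"))
```

A full converter would also need a stack of open lists to place itemizedlist2 through itemizedlist4 paragraphs inside the preceding item, which this sketch leaves out.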
+
+
-- 2.50.1