From 058f7710bcee7685864ea9e407f441f383bb3b59 Mon Sep 17 00:00:00 2001 From: Steve Ball Date: Sun, 7 Nov 2004 06:57:02 +0000 Subject: [PATCH] added my thoughts to mapping spec --- xsl/wordml/docbook.xsl | 27 +++++------ xsl/wordml/specifications.xml | 90 ++++++++++++++++++++--------------- 2 files changed, 65 insertions(+), 52 deletions(-) diff --git a/xsl/wordml/docbook.xsl b/xsl/wordml/docbook.xsl index 6346a0485..24e3a6443 100755 --- a/xsl/wordml/docbook.xsl +++ b/xsl/wordml/docbook.xsl @@ -168,7 +168,9 @@ - + + + @@ -180,9 +182,9 @@ - + - + @@ -191,6 +193,13 @@ + + sect + + - + + - @@ -204,19 +213,9 @@ + - - - - - - - - - - - diff --git a/xsl/wordml/specifications.xml b/xsl/wordml/specifications.xml index 320d77130..d9a90ba29 100755 --- a/xsl/wordml/specifications.xml +++ b/xsl/wordml/specifications.xml @@ -3,7 +3,9 @@
DocBook-WordML Conversion Specifications BobStaytonSagehill -EnterprisesDraft Version 1.0, +Enterprises +SteveBallZveno +Draft Version 1.0, dated 5 November, 2004 @@ -52,12 +54,28 @@ dated 5 November, 2004 So this project assumes that authors who need the full set of DocBook elements will use an XML authoring tool that better supports them. This project will enable authors to write basic DocBook documents using Word. Because Word is so widespread, this project will help a lot of new DocBook users get started with familiar tools. They can then graduate to more advanced tools as their needs develop. +
+ Project Non-Goals + The following goals do not form part of the scope of the project: + + + Support of versions of Word that do not feature reading/writing WordML (XML). That is, all versions prior to Word 11 (Office 2003). + + + Supporting user-defined style names. However, this system should not prevent, or make difficult, adding such support via a customisation layer. + + + Support of arbitrarily defined styles. This system may expect certain styles to be defined in a particular fashion (in particular, those defining the title of components and divisions). + + +
Mapping elements to styles Although WordML and DocBook are both XML, there are several challenges when trying to convert between them. The basic problem in mapping Word styles to DocBook elements is that Word documents support far less structure than DocBook. DocBook permits nesting of elements within other elements, providing multiple levels of context for each element. -Word's only structural feature is the outlining mode. In Word outlining, certain paragraph styles are assigned outline levels. When a user applies those styles, they effectively create logical structure in the Word document. When such a document is saved to WordML, the outline levels are rendered as nested wx-sub-section elements. The outlining feature will be used for components and sections in this specification. -Nesting of block elements is another commonly used feature of DocBook. It is not possible to use Word's outline mode for blocks if it is being used for components and sections. So in this specification, nesting of block elements is indicated by adding a number suffix to a style. So a Word paragraph with style orderedlist2 is considered to be contained within a preceding paragraph with style orderedlist. In Word, paragraph indent levels will be used to visually indicate nesting of blocks. +Word's only structural feature is the outlining mode. In Word outlining, certain paragraph styles are assigned outline levels. When a user applies those styles, they effectively create logical structure in the Word document. When such a document is saved to WordML, the outline levels are rendered as nested wx:sub-section elements. The outlining feature will be used for components and sections in this specification. +This system applies certain heuristics to build the DocBook element structure from the (relatively flat) word processing structure. 
Titles and other features are used to mark the beginning of a structure, and all paragraphs following that are included in that structure until the beginning of the next structure is found. Problems may arise when a structure should end, but there is no Word feature that marks the endpoint. +Nesting of block elements is another commonly used feature of DocBook. It is not possible to use Word's outline mode for blocks if it is being used for components and sections. So in this specification, nesting of block elements is indicated by adding a number suffix to a style. So a Word paragraph with style orderedlist2 is considered to be contained within a preceding paragraph with style listitem. In Word, paragraph indent levels will be used to visually indicate nesting of blocks. Nesting of inline DocBook elements is particularly difficult to support because Word does not nest character styles. That means a nested inline would require a separate Word style to indicate the parent-child relationship. Given the large number of combinations possible, a prohibitively large number of character styles would have to be created. In this project, nesting of character styles will not be supported in the first release. Nested inlines being imported from DocBook will be converted to a sequence of single-name Word character styles. In many cases, DocBook structure can be derived from the flat Word sequence of paragraphs based on sibling relationships. For example, when a paragraph styled as para is followed by a paragraph styled as itemizedlist, the conversion to DocBook will output a para element and then start an itemizedlist element, with the second paragraph as its first listitem. All itemizedlist paragraphs that follow without interruption are put in the same itemizedlist element. Here are the design principles used in this project for selecting Word style names: @@ -69,19 +87,23 @@ dated 5 November, 2004 Some style names will indicate a parent-child relationship. 
For example, chapter-title indicates that the paragraph is a title whose DocBook parent is a chapter. -Some style names are simplified to make them easier to use in Word. For example, a paragraph in an orderedlist requires three elements in DocBook: orderedlist, listitem, and para. The paragraph style name in Word is shortened from orderedlist-listitem-para to just orderedlist. +Some style names are simplified to make them easier to use in Word. For example, a paragraph in an orderedlist requires three elements in DocBook: orderedlist, listitem, and para. The paragraph style name in Word is shortened from orderedlist-listitem-para to just orderedlist. NB. in the case of lists (see below), the list level is appended so this example becomes orderedlist1 Style names with a number suffix indicate a nesting level, as described above. -Style names with continue indicate that the paragraph is part of the preceding element. For example, a note paragraph is used for a single paragraph note element. But if a Note is to contain more than one paragraph, then the subsequent paragraphs in Word would get a note continue style. If the note style were used, then they would be taken as separate note elements in the conversion to DocBook. +Style names with continue indicate that the paragraph is part of the preceding element. For example, a para paragraph is used for a single paragraph para element. This would cause any preceding list to be closed. If a list item in the preceding list is to contain more than one paragraph, then the subsequent paragraphs in Word would get a para-continue style. + + + Empty paragraph and character styles are ignored. The first paragraph style in the Word document is used to define the root element of the DocBook document. For example, if a Word document starts with book-title, then the DocBook document will have book as its root element. All the rest of the document content will be contained in that root element. 
Attributes are a feature of DocBook XML that have no direct counterpart in Word. One approach is to use Word Bookmarks for attributes. For example, a Word Bookmark named att_role_foobar could be inserted into a paragraph. When converted to DocBook XML, this would become a role="foobar" attribute on the element derived from the paragraph containing the Bookmark. +[Alternatively, we could use hidden text for attributes.] DocBook to WordML styles Assigned Word outline level 5. - sect5 +sect5 sect5-title Assigned Word outline level 6. @@ -174,9 +196,8 @@ colwidth="1.00*"/> note/para -note -note continue -Any paragraphs after the first in a note just use note continue to be treated as part of the same note element. +note +Consecutive paragraphs with style "note" after the first note are to be treated as part of the same note element. That is, consecutive notes are coalesced. note/title @@ -185,73 +206,66 @@ note continue caution/para -caution -caution continue - +caution +Consecutive cautions are coalesced. warning/para -warning -warning continue - +warning +Consecutive warnings are coalesced. important/para -important -important continue - +important +Consecutive importants are coalesced. tip/para -tip -tip continue - +tip +Consecutive tips are coalesced. itemizedlist/listitem/para itemizedlist -itemizedlist continue +itemizedlist1 itemizedlist2 -itemizedlist2 continue itemizedlist3 -itemizedlist3 continue -itemizedlist4 -itemizedlist4 continue -A continue suffix indicates a paragraph is part of the same listitem as the preceding paragraph. A number suffix indicates a nesting level within other lists. +itemizedlist4 +A number suffix indicates a nesting level within other lists. 
orderedlist/listitem/para orderedlist -orderedlist continue +orderedlist1 orderedlist2 -orderedlist2 continue orderedlist3 -orderedlist3 continue -orderedlist4 -orderedlist4 continue +orderedlist4 variablelist/varlistentry/term variablelist-term +variablelist-term1 variablelist-term2 variablelist-term3 variablelist-term4 -A variblelist in Word should be a sequence of alternating paragraphs styled as variablelist-term and variablelist. +A variablelist in Word should be a sequence of alternating paragraphs styled as variablelist-term and variablelist. variablelist/varlistentry/listitem/para variablelist -variablelist continue +variablelist1 variablelist2 -variablelist2 continue variablelist3 -variablelist3 continue -variablelist4 -variablelist4 continue +variablelist4 +listitem/para[position() != 1] +para-continue +This paragraph is included in the immediately preceding listitem. + + example with title and programlisting children example-title followed by programlisting -- 2.40.0