Creating DITA topics using reStructuredText

To a writer whose only tool is a flat text editor, formatted text is the next best thing to a real DITA editor.

In fact, the process is one of the more reliable alternate authoring methods among those I’ve been writing about. Even the use of cell phone photos, while described facetiously for April Fools, has a place whenever the source format is so utterly obtuse that scanning the printout is the best way of filtering out the gorpy markup.

About reStructuredText

reStructuredText is a well-documented example of the family of intuitive markup styles known as lightweight markup (which includes various wikitext dialects).

Popular examples of lightweight markup include the MediaWiki, Eclipse-based Mylyn, and Creole wikitext styles and their particular tools. DITA converters exist for some of these, notably for the Mylyn project.

reStructuredText (also called rst or sometimes reST, which conflicts with ‘RESTful’ API usage) is popular in the Python programming community and has gained a recent boost in visibility by being recognized as a strategic markup strategy for materials in Project Gutenberg. The part that appealed most to me about rst as a format is that the Python docutils toolkit includes a parser that generates an XML representation of the fully normalized document structure. Having an XML representation of a lightweight markup makes the subsequent migration to DITA vastly simpler and more reliable.
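A minimal sketch of that parsing step, assuming the third-party docutils package is installed. It calls `docutils.core.publish_string` with the XML writer, the same writer that the rst2xml.py front end selects. The function name `rst_to_docutils_xml` and the sample text are illustrative only and do not come from the original workflow.

```python
try:
    from docutils.core import publish_string  # third-party: pip install docutils
except ImportError:
    publish_string = None  # lets the rest of the file load without docutils

# Two top-level sections, so docutils keeps both as <section> elements
# instead of promoting a lone section title to the document title.
SAMPLE_RST = """\
About
=====

First paragraph.

Process
=======

Second paragraph.
"""


def rst_to_docutils_xml(rst_text):
    """Parse rST text and return the docutils XML representation as a string."""
    if publish_string is None:
        raise RuntimeError("docutils is required for this step")
    output = publish_string(source=rst_text, writer_name="xml")
    # publish_string normally returns encoded bytes; decode for convenience.
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    return output


if __name__ == "__main__" and publish_string is not None:
    print(rst_to_docutils_xml(SAMPLE_RST))
```

The result is a `<document>` tree with nested `<section>`, `<title>`, and `<paragraph>` elements, which is the structure a later transform can rely on.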

Process

I composed today’s post entirely in a flat text editor, except that I used a clever, “live-rendering” view that gave me some visual feedback and integrated help as I typed (see Online reStructuredText editor).

Then I copied and pasted my “source” into a file, rstPost.txt, and used the rst2xml.py Python tool to generate an intermediate XML representation. I converted that representation into a DITA map and related subtopics using a set of XSLT transforms that I’m preparing for contribution to the DITA Open Toolkit. Finally, I used the expeDITA live map rendering feature to create the HTML view that I will paste into my WordPress blog in a few minutes.
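The conversion step above was done with XSLT. Purely as an illustration of the idea, and not a reproduction of those transforms, here is a rough Python stand-in that turns each top-level docutils `<section>` into a generic DITA `<topic>` and collects the topics in a `<map>`. The sample XML is hand-written in the shape of docutils output, and the function names and the `.dita` file naming are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of docutils XML (an assumption for
# illustration; real output carries more attributes and a DOCTYPE).
DOCUTILS_XML = """\
<document source="rstPost.txt">
  <section ids="about" names="about">
    <title>About</title>
    <paragraph>Lightweight markup is easy to type.</paragraph>
  </section>
  <section ids="process" names="process">
    <title>Process</title>
    <paragraph>Parse first, then transform.</paragraph>
  </section>
</document>
"""


def section_to_topic(section):
    """Map one docutils <section> to a generic DITA <topic> element."""
    ids = section.get("ids", "").split()
    topic = ET.Element("topic", id=ids[0] if ids else "topic")
    ET.SubElement(topic, "title").text = section.findtext("title", default="")
    body = ET.SubElement(topic, "body")
    for para in section.findall("paragraph"):
        # itertext() flattens inline markup; a fuller transform would map it.
        ET.SubElement(body, "p").text = "".join(para.itertext())
    return topic


def docutils_to_dita(xml_text):
    """Return (map_element, {filename: topic_element}) for top-level sections."""
    root = ET.fromstring(xml_text)
    ditamap = ET.Element("map")
    topics = {}
    for section in root.findall("section"):
        topic = section_to_topic(section)
        filename = topic.get("id") + ".dita"  # hypothetical naming scheme
        ET.SubElement(ditamap, "topicref", href=filename)
        topics[filename] = topic
    return ditamap, topics


ditamap, topics = docutils_to_dita(DOCUTILS_XML)
print(ET.tostring(ditamap, encoding="unicode"))
for name, topic in topics.items():
    print(name, ET.tostring(topic, encoding="unicode"))
```

The section `ids` attribute generated by the parser is what makes stable file names possible here. Nested sections, lists, and inline markup are left out to keep the sketch short.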

Thoughts

If you already know reStructuredText conventions, then you can appreciate how useful this type of conversion process can be. In effect, this process now defines a fairly reliable migration path for existing rst content that you might desire to convert into DITA. For me, the most useful application is for editing “DITA” content on a cell phone, where text editing remains primitive, and structured editing support has tremendous usability issues. Just open a textarea field, type in your text equivalent of your desired structure, and then call back-end services to parse and transform that content into your ideal DITA format on the server.
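To make the textarea idea concrete, here is a small sketch of the server side as a plain WSGI application that reads a form field named `text` and returns XML. The field name, the `app` and `text_to_dita` names, and the placeholder conversion are all hypothetical. In a real setup the placeholder would be replaced by the docutils parse and the transform into DITA.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def text_to_dita(text):
    """Placeholder converter (hypothetical): wrap each non-blank line in <p>."""
    paras = "".join(
        "<p>%s</p>" % escape(line) for line in text.splitlines() if line.strip()
    )
    return (
        '<topic id="posted"><title>Posted text</title>'
        "<body>%s</body></topic>" % paras
    )


def app(environ, start_response):
    """WSGI app: read the 'text' field of a form POST and return DITA XML."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    raw = environ["wsgi.input"].read(length).decode("utf-8")
    form = parse_qs(raw)  # decodes application/x-www-form-urlencoded bodies
    body = text_to_dita(form.get("text", [""])[0]).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

For local experiments the app could be served with `wsgiref.simple_server.make_server` from the standard library. Any device that can submit a form with a textarea, including a phone browser, could then act as the authoring client.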

The downside is that generating the XML representation is possible only with the Python tools, which means that the process is not portable to servers based on popular Java or PHP stacks without using remote calls or web services. However, the reliability of the XML-based process makes the pain worth enduring: it gives us at least one wikitext-like format that supports reasonably reliable migration into DITA wherever text-only content entry is your starting point.

 

rstPost original text version

my_result_rstPost PDF from generated DITA source using DITA Open Toolkit


8 Responses to Creating DITA topics using reStructuredText

  1. Ben Allums says:

    reStructuredText is one of my absolute favorite markup languages, Don. Glad to see you’re giving it a go.

  2. Don says:

    Any visual flaws that you see in this post, such as ‘missingspaces’ next to links, are the result of copying and pasting from the browser, not a processing artifact. I literally copied and pasted today’s post content from the DITA-OT processing results of the DITA map and topics that were generated from the original rst text file, which I’ll upload next. In a fully automated workflow, I could have created that content (including links, code samples, highlighting, all forms of lists, and simpletables) on any device that supports textarea fields in forms, and have generated the result into simple DITA for final form and into HTML for viewing.

  3. Don says:

    More production comments:

    Instead of using the expeDITA renderer, I just browsed DITA-OT’s “web” output and copied each page’s content into WordPress to get the composite view. Same difference, and all based on DITA-OT. I added a link to a DITA-OT generated PDF for comparison.

    Also, since the post was intended to demo the process of flat editing, I am resisting the urge to go in and change wording like “strategic markup strategy.” The point is not about the content being in WordPress where obviously I could correct it if I wanted to, but about the process whereby I could have been in Antarctica or on the International Space Station and still have submitted my text file directly into an XML DITA publishing workflow with the same result.

    I’d add this function into the expeDITA project in a heartbeat, but the Python integration puts me off. Collaborative DITA updates from a smart phone or tablet? A lynx browser? Absolutely possible!

    Can anyone advise whether reStructuredText authoring has been evaluated for usability by visually handicapped authors? Since the markup conventions are similar to what seems to be good screen reader layout, the idea of text-based DITA authoring seems feasible. Assistive tools such as a macro to insert title underscores would be handy, but would not be a deal breaker.

  4. I’m a fan of reStructuredText, having spent several years in a Python environment, where it is a de facto standard. rST is gaining traction among programmers working in other languages, including C++, Ruby, and JavaScript. For example, Read The Docs is a site that hosts documentation for any open source project that uses rST. rST’s popularity among programmers lies in its unobtrusive syntax and its extensibility through “directives”. Programmers love having a plain-text documentation format that is human-readable in source form, that can be checked into version control and easily diffed, and that they can extend if they need additional capabilities.

    One tool that uses that extension mechanism is Sphinx, which extends rST to support cross-references, indexes, multi-file hierarchical documents, semantic markup for programming concepts, and other book-oriented features. About the only failing of Sphinx with respect to industrial-strength technical documentation is that it expects all documentation source files for a given project to reside under a single file directory. This makes content reuse somewhat awkward.

    What do you think about the possibility of converting Sphinx TOC trees into DITA maps? That could support a documentation lifecycle where documents started by programmers using Sphinx/rST could be handed off to technical writers using DITA at a certain stage of development. (The question of whether such a workflow is wise is separate from the technical feasibility of it.)
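As a rough sketch of the toctree idea, the following reads the entries of a Sphinx `toctree` directive from rST source and writes a flat DITA map. It assumes the directive starts at the left margin, ignores nesting across files, and assumes each listed document becomes a topic file with a hypothetical `.dita` extension. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE_INDEX = """\
Project docs
============

.. toctree::
   :maxdepth: 2

   intro
   usage/install
   API reference <api>
"""


def toctree_entries(rst_text):
    """Yield the document names listed in toctree directives of rst_text."""
    in_toctree = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
        elif in_toctree:
            if not stripped:
                continue  # blank lines separate options from entries
            if not line.startswith((" ", "\t")):
                in_toctree = False  # a dedented line ends the directive
            elif not stripped.startswith(":"):  # skip options like :maxdepth:
                # "Title <target>" entries keep only the target name.
                if stripped.endswith(">") and "<" in stripped:
                    stripped = stripped[stripped.rindex("<") + 1:-1]
                yield stripped


def toctree_to_map(rst_text, extension=".dita"):
    """Build a flat DITA <map> with one <topicref> per toctree entry."""
    ditamap = ET.Element("map")
    for name in toctree_entries(rst_text):
        ET.SubElement(ditamap, "topicref", href=name + extension)
    return ditamap


print(ET.tostring(toctree_to_map(SAMPLE_INDEX), encoding="unicode"))
```

A fuller version would follow each entry into its own file and nest the `topicref` elements to mirror the Sphinx hierarchy.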

  5. Don says:

    Janet, we’ll have to define some conventions in mapping from rST into DITA, but the idea is very feasible. However, several issues could impede how easily the idea can be used by others:

    1. The XML DOM representation created by the rst2xml.py docutils tool conforms well to the docutils DTD, which is goodness: there IS a reliable way to parse rST content in a way that others can repeat exactly. After all, that is part of the interoperability value proposition of XML-based markups, so if any two people can parse an rST document identically, then Game On!

    (How good is this parser? It generates “section” scope that exactly matches the rules of DITA sections, so you can create unambiguous DITA groupings. It generates metadata that you can use as filenames in the exploded DITA tree. It generates exactly the same model for definition lists as the dlentry-wrapped version for DITA. Those are literally a 1 to 1 mapping that can be “specialized” to generate simpletables as well (two-column simpletable == DITA definition list). The option list directive is directly transformable into property tables in DITA. Out of all the so-called lightweight markup languages, only this rST parser defines this rich structure as a basis for migration!)

    2. However, the parser is supported only in Python. This is a huge hindrance to portability because it requires anyone who wants to convert rST into DITA to get and learn the Python tool set. For servers based on Java, PHP, or .NET, it means that application programmers must install special server-side support for calling Python from those native execution environments. There IS a pretty decent Java parser, but it does not generate the XML DOM, so it is totally unsuited for reliable migration into DITA. Basically, the ability to generate a full DOM is only possible with Python tools. Python has scant and largely unhelpful docs for noob users on Windows, sorry to say.

    3. As you pointed out, the directives feature in rST is a direct mapping of Python syntax and execution into the document model, which violates the XML principle that the schema fully validates what can be in the document. If I process a document that has a .. codeblock:: directive in it, but I have not installed the co-requisite Python library, then the XML parser cannot fully validate the structure. So I get a lot of System Messages (which by the way are directly equivalent to DITA’s required-cleanup!). To some extent, rST’s directive feature prevents anyone from ever writing a truly conforming parser in another language. Even if someone succeeded at a point in time, the work would always be subject to ongoing code creep on the Python side.

    4. I wasn’t aware of the documentation source tree issue–that might explain why I have trouble parsing some sample documents from the DITA-OT shell environment where I prefer to run the final XSLT conversion into DITA.

    So it’s a capability that I love to hate. Good enough to do many things with, if only it were able to be used outside of the primary environment that spawned that good idea, and without some of these portability killer characteristics. I hope this “kind diatribe” makes sense and doesn’t totally show off my Python newbiness. Let me know if you want to try out the process and transforms I’ve documented thus far.
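Two of the mappings mentioned in this comment can be sketched in a few lines of Python: docutils definition lists to DITA `<dl>`, and docutils `<system_message>` elements to DITA `<required-cleanup>`. The sample fragment is hand-written in the shape of docutils XML, and the function names and the message wording are assumptions. This is not the XSLT from the actual transforms.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment shaped like docutils XML (illustrative only).
DOCUTILS_FRAGMENT = """\
<section ids="terms" names="terms">
  <title>Terms</title>
  <definition_list>
    <definition_list_item>
      <term>rST</term>
      <definition><paragraph>reStructuredText</paragraph></definition>
    </definition_list_item>
  </definition_list>
  <system_message level="3" line="9" source="rstPost.txt" type="ERROR">
    <paragraph>Unknown directive type "codeblock".</paragraph>
  </system_message>
</section>
"""


def text_of(element):
    """Return the element's text content with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def definition_list_to_dl(deflist):
    """Map <definition_list> one-to-one onto DITA <dl>/<dlentry>/<dt>/<dd>."""
    dl = ET.Element("dl")
    for item in deflist.findall("definition_list_item"):
        entry = ET.SubElement(dl, "dlentry")
        ET.SubElement(entry, "dt").text = text_of(item.find("term"))
        for definition in item.findall("definition"):
            ET.SubElement(entry, "dd").text = text_of(definition)
    return dl


def system_message_to_cleanup(message):
    """Preserve a parser complaint as a DITA <required-cleanup> element."""
    cleanup = ET.Element("required-cleanup", remap="system_message")
    cleanup.text = "%s: %s" % (message.get("type", "UNKNOWN"), text_of(message))
    return cleanup


section = ET.fromstring(DOCUTILS_FRAGMENT)
print(ET.tostring(definition_list_to_dl(section.find("definition_list")),
                  encoding="unicode"))
print(ET.tostring(system_message_to_cleanup(section.find("system_message")),
                  encoding="unicode"))
```

Keeping the system message in the output, rather than dropping it, leaves a visible marker in the DITA source for a writer to resolve later.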

  6. Pingback: Creating DITA topics with a blog or wiki | Learning by Wrote

  7. Elena says:

    Hi Don,
    I’m currently searching for a way to transform rST sources into DITA topics. In case you’re still willing to spend time with this subject, I’d love to try out the transformations you documented. Where can I find them?
    Thank you,
    Elena

  8. Don says:

    Hi, Elena. I apologize for the long delay in approving your comment. I have recently made the materials available at my GitHub account at https://github.com/donrday/rst2dita. They require an installed Python environment with the docutils package to enable the initial parse from rST into docutils XML. The DITA Open Toolkit provides the Saxon environment for running the final conversion from the docutils XML file into a DITA map and associated DITA topics. I hope it provides a useful starting point for you!
