Mimsy Were the Borogoves

Nisus HTML conversion

Jerry Stratton, December 20, 2008

I’ve become a huge fan of Nisus because it is so easy to use. I’ve started using it for everything I once would have used AppleWorks for: it has all of the features I need for most documents (and most of the features I need for all documents), and it rarely gets in the way when I’m writing. I now use it for all of my fiction, and I hope to use it for my non-fiction tutorials and game texts in the future.

It’s a rare bonus when software that is this easy to use also has a useful scripting language. Nisus not only meets my needs when I’m writing; it’s a great tool when I’m hacking, too. In Nisus Clean HTML, I wrote a rough script that converted a Nisus document into HTML by going through each paragraph, converting the paragraph to RTF, and searching the RTF for style names. This only worked for paragraph-level styles; I didn’t know RTF well enough to go any deeper (and still don’t).

Nisus 1.1 introduced some very useful features to the Nisus scripting language: we can now get the paragraph-level style name as well as the chunks of character-level styles within a paragraph.

Take a look at FireBlade Coffeehouse Texts for an example of what the script produces.

Getting paragraph styles

Here’s a simple script that collects all of the paragraph style names used in the document.

  $thisDocument = Document.active
  $paragraphStyles = Array.new()
  # put the insertion point at the start of the document
  Select Paragraph 1
  Select Start
  # loop until there are no more paragraphs to select
  While Select Next Paragraph
    $currentSelection = $thisDocument.textSelection
    $style = $currentSelection.typingAttributes
    $styleName = $style.paragraphStyleName
    # record each style name only once
    If $paragraphStyles.indexOfValue($styleName) == -1
      $paragraphStyles.appendValue $styleName
    End
  End
  $paragraphStyles.sort
  $paragraphStyles = $paragraphStyles.join("\n")
  $styles = "PARAGRAPH STYLES:\n$paragraphStyles"
  # show the list, then copy it to the clipboard
  Prompt $styles
  Write Clipboard $styles

Save this in your Nisus macro folder (you can find it from the Macro menu in Nisus). When you run it, it displays all of the paragraph-level styles in use and copies them to the clipboard for pasting.

  1. Document.textSelection provides the current selection, as a TextSelection object.
  2. TextSelection.typingAttributes provides an Attributes object describing the attributes of the current selection; this is where the style information comes from.
  3. Attributes.paragraphStyleName gives the paragraph-level style in those attributes; since the script selects one paragraph at a time, there will only ever be one paragraph-level style to worry about.
  4. Array.indexOfValue($styleName) checks whether that style name is already in the array of style names; if indexOfValue returns -1, the style name is not in the list yet.
  5. Array.appendValue($styleName) adds that style name to the list.
  6. Array.sort sorts the array alphabetically.
  7. Array.join("\n") turns the array into a string, with each style name on its own line.
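To make the collect-once, sort, and join logic concrete outside of Nisus, here is a minimal Python sketch. It assumes the document has already been reduced to a list of hypothetical (style name, text) pairs, one per paragraph; `collect_paragraph_styles` is an illustrative name, not part of Nisus.

```python
def collect_paragraph_styles(paragraphs):
    """Return a report of the distinct paragraph style names, sorted.

    `paragraphs` is a hypothetical stand-in for the document: a list of
    (style_name, text) pairs, one per paragraph.
    """
    styles = []
    for style_name, _text in paragraphs:
        # Same test as indexOfValue(...) == -1: only add names not seen yet.
        if style_name not in styles:
            styles.append(style_name)
    styles.sort()  # alphabetical, like Array.sort
    return "PARAGRAPH STYLES:\n" + "\n".join(styles)


sample = [
    ("Normal", "It was a dark and stormy night."),
    ("Heading 1", "Chapter One"),
    ("Normal", "The rain fell."),
    ("Block Quote", "Nevermore."),
]
print(collect_paragraph_styles(sample))
```

Printing the sample gives the header line followed by Block Quote, Heading 1, and Normal. A plain list mirrors the macro’s indexOfValue check; a set would be faster for membership tests, but a document rarely uses enough styles for that to matter.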

Character styles

Getting character-level styles is very similar, but we need to loop through the chunks of the paragraph that have the character-level style.

  $thisDocument = Document.active
  $paragraphStyles = Array.new()
  $characterStyles = Array.new()
  # put the insertion point at the start of the document
  Select Paragraph 1
  Select Start
  While Select Next Paragraph
    $currentSelection = $thisDocument.textSelection
    $style = $currentSelection.typingAttributes
    $styleName = $style.paragraphStyleName
    If $paragraphStyles.indexOfValue($styleName) == -1
      $paragraphStyles.appendValue $styleName
    End
    # are there any character styles?
    $currentParagraph = $currentSelection.subtext
    $currentIndex = 0
    While $currentIndex < $currentParagraph.length
      # find the run of identically styled characters at this index
      $currentStyleRange = $currentParagraph.rangeOfAttributesAtIndex $currentIndex
      $subText = $currentParagraph.subtextInRange $currentStyleRange
      $style = $subText.attributesAtIndex 0
      $styleName = $style.characterStyleName
      # runs without a character style have no name; skip them
      If $styleName
        If $characterStyles.indexOfValue($styleName) == -1
          $characterStyles.appendValue $styleName
        End
      End
      # move on to the first character after this run
      $currentIndex = $currentIndex + $currentStyleRange.length
    End
  End
  $paragraphStyles.sort
  $characterStyles.sort
  $paragraphStyles = $paragraphStyles.join("\n")
  $characterStyles = $characterStyles.join("\n")
  $styles = "PARAGRAPH STYLES:\n$paragraphStyles\n\nCHARACTER STYLES:\n$characterStyles"
  Prompt $styles
  Write Clipboard $styles

The new pieces are:

  1. TextSelection.subtext provides a Text object of the selected text, which in this case is a single paragraph.
  2. Text.length is the total length of that paragraph.
  3. Text.rangeOfAttributesAtIndex provides a Range object describing the run of characters around the given index that share the same attributes. The index starts at zero and increases as the script loops through the text.
  4. Text.subtextInRange $currentStyleRange provides a Text object of that run of similarly styled text.
  5. Text.attributesAtIndex 0 provides an Attributes object for the start of that run, which includes the character-level style name.
  6. Attributes.characterStyleName provides the character-level style name, if the run has one.
  7. After that, the script does the same thing as above to collect an array of character-level style names.

This might be useful if you need to see at a glance which styles you’re actually using, as opposed to all of the styles in the document, which can include styles you aren’t using.
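The walk through attribute runs can be sketched the same way. This Python version is only an analogy: it assumes a paragraph is modeled as a list holding one character-style name (or None for unstyled text) per character, and `range_of_attributes_at_index` is a hypothetical stand-in for Nisus’s Text.rangeOfAttributesAtIndex, not its actual implementation.

```python
def range_of_attributes_at_index(char_styles, index):
    """Return (start, length) of the run of identical styles containing index.

    A rough stand-in for Text.rangeOfAttributesAtIndex, assuming a paragraph
    is modeled as one character-style name (or None) per character.
    """
    style = char_styles[index]
    start = index
    while start > 0 and char_styles[start - 1] == style:
        start -= 1
    end = index
    while end < len(char_styles) and char_styles[end] == style:
        end += 1
    return start, end - start


def collect_character_styles(paragraphs):
    """Collect the distinct character style names used in any paragraph."""
    found = []
    for char_styles in paragraphs:
        index = 0
        while index < len(char_styles):
            start, length = range_of_attributes_at_index(char_styles, index)
            name = char_styles[start]  # style of the whole run
            # Unstyled runs have no name (None), so they are skipped.
            if name and name not in found:
                found.append(name)
            index = start + length  # jump to the first character after the run
    found.sort()
    return found


# "I said Hello" with "Hello" in an Emphasis style, then a second paragraph.
para1 = [None] * 7 + ["Emphasis"] * 5
para2 = ["Code"] * 3 + [None] * 2 + ["Emphasis"] * 2
print(collect_character_styles([para1, para2]))
```

The sample prints ['Code', 'Emphasis']. Advancing to start + length rather than index + length is a small defensive choice; when the loop always lands on the start of a run, as it does here, the two are the same.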

Using the HTML conversion script

Other than these new ways of getting style names, the basic logic of the HTML conversion script is the same as the earlier one. I’ve modified it a bit so that it’s easier to use and easier to distribute.

Put the entire “Publish” folder of the Nisus HTML conversion scripts into your Nisus macro folder (you can find it from the Macro menu when Nisus is open). You can name the folder whatever you want; it doesn’t need to be called “Publish”.

TOP.html and BOTTOM.html bookend the HTML that the script creates. TOP.html is probably the more useful of the two, since it lets you add your own style sheet. Inside TOP.html, __TITLE__ and __FILENAME__ are replaced by the document title (determined from the file name) and the document’s file name (without the extension), respectively.
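Placeholder substitution of this kind is easy to sketch. The Python below is an illustration, not the script’s code: `fill_template` is a hypothetical name, the sample template and its `FireBlade.pdf` link are made up, and HTML-escaping the values is this sketch’s own choice. How the real script derives the title from the file name isn’t spelled out here, so the sketch simply takes the title as an argument.

```python
import html
import re


def fill_template(template, title, filename):
    """Replace __TITLE__ and __FILENAME__ in a TOP.html-style template.

    Both placeholders are replaced in a single pass, so a title that happens
    to contain "__FILENAME__" is not substituted a second time. Values are
    HTML-escaped; that is this sketch's choice, not necessarily the script's.
    """
    values = {
        "__TITLE__": html.escape(title),
        "__FILENAME__": html.escape(filename),
    }
    return re.sub(r"__(?:TITLE|FILENAME)__", lambda m: values[m.group(0)], template)


top = '<title>__TITLE__</title>\n<a href="__FILENAME__.pdf">PDF</a>'
print(fill_template(top, "Coffee & Dice", "FireBlade"))
```

A single regular-expression pass avoids the ordering problem that two chained str.replace calls would have.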

The script automatically creates a folder with the same name as the document (without the extension). Inside that folder, it creates a PDF version of the file, a gzipped copy of the RTF version of the file, and the HTML file. If there are images, it creates an images folder and puts the images in that folder.

If a folder with the same name as one of the script’s output folders already exists, the script erases its existing content. And it won’t warn you, because that’s annoying.
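Here is a minimal Python sketch of that folder handling, assuming the output folder sits next to the document and takes the document’s name minus its extension. The function names are hypothetical, and this version takes the blunt approach of deleting the whole existing folder rather than only same-named files.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def prepare_output_folder(document_path):
    """Create an empty folder named after the document, next to it.

    No confirmation is asked for: an existing folder with the same name is
    deleted along with everything in it.
    """
    document_path = Path(document_path)
    folder = document_path.parent / document_path.stem  # name minus extension
    if folder.is_dir():
        shutil.rmtree(folder)
    folder.mkdir()
    return folder


def gzip_copy(source_path, folder):
    """Write a gzipped copy of source_path into folder as <name>.gz."""
    source_path = Path(source_path)
    target = Path(folder) / (source_path.name + ".gz")
    with open(source_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        rtf = Path(workdir) / "Texts.rtf"
        rtf.write_bytes(b"{\\rtf1 Hello}")
        out = prepare_output_folder(rtf)
        print(gzip_copy(rtf, out).name)
```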

I’ve also included a few other scripts that might be useful, such as a script for extracting all images from the document, and the example script above for collecting the styles that are used in the document.

I’ll update the archive whenever I add new features, such as table and list conversion, which I expect to do eventually.

Notes

This script was really made for me; you are likely to find it less useful than I do, and will probably have to modify it.

  • This currently relies on the name of the style to determine headline levels, so you’ll need to use the built-in “heading x” styles to set headline levels. Styles based on those won’t get picked up, and neither will styles that have had navigation levels applied to them. Hopefully, future versions will make the navigation level available from the Attributes object.
  • This script only converts styles. It does not convert attributes. That’s not because the Nisus scripting language can’t do it; it’s because that’s what I want. I have no desire to make my web documents look like my print documents. I’m assuming that any attributes not part of a style are meant only for paper and shouldn’t carry over into the HTML.
  • It still uses RTF for grabbing images.
  • It does not do lists or tables yet. I had them working in the previous version but haven’t needed them in this version yet.
  • It would be a bit more reliable if I could use $thisDocument.text; however, that drops any text inside of tables. There is no Document.paragraphs that returns the same things we get from Select Next Paragraph. And using Select Next Paragraph means that you can’t switch documents while this script is running, because the selection will follow the active document.
  • Because the resulting HTML is very regular, it is easy to parse it (as I did for the FireBlade Texts site) into separate pages for each top-level or sub-level headline. I’ve included the Python script I use for that as an example.
  • The way that the Nisus scripting language seamlessly shifts between Nisus and Perl is a joy to work with. Perl is usually my first choice for filtering text, and obviously in a word processor filtering text is a common task.

I love it when a Perl comes together.
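The headline-level rule and the page splitting mentioned in the notes can be sketched together. This is not the Python script included in the archive; it is a hypothetical illustration that assumes one HTML element per line, treats only exact “Heading 1” through “Heading 6” names as headlines, invents a class-naming scheme for other styles, and splits only at top-level headlines.

```python
import html
import re

# Only the built-in style names count: "Heading 1" through "Heading 6".
HEADING_STYLE = re.compile(r"heading ([1-6])", re.IGNORECASE)


def paragraph_to_html(style_name, text):
    """Convert one styled paragraph to one line of HTML.

    A style whose name is exactly "Heading N" becomes <hN>. Anything else,
    including styles merely based on a heading style, becomes a <p> whose
    class is derived from the style name (the class naming is an assumption).
    """
    body = html.escape(text)
    match = HEADING_STYLE.fullmatch(style_name)
    if match:
        level = match.group(1)
        return f"<h{level}>{body}</h{level}>"
    css_class = re.sub(r"\W+", "-", style_name.lower()).strip("-")
    return f'<p class="{css_class}">{body}</p>'


def split_at_top_headings(html_lines):
    """Group HTML lines into pages, starting a new page at each <h1>."""
    pages = []
    for line in html_lines:
        if line.startswith("<h1>") or not pages:
            pages.append([])
        pages[-1].append(line)
    return pages


doc = [
    ("Heading 1", "Menu"),
    ("Normal", "Coffee < tea"),
    ("Heading 2", "Specials"),
    ("Heading 1", "Hours"),
    ("Block Quote", "Open late."),
]
lines = [paragraph_to_html(style, text) for style, text in doc]
for page in split_at_top_headings(lines):
    print(page)
```

Because “Heading 1 Custom” fails the exact match, it comes out as an ordinary paragraph, which is the same limitation described in the first note.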

September 28, 2013: Nisus HTML script now handles floating content

My Nisus simple HTML publish script now handles floating images and floating text boxes. This means that floating images are no longer lost, and neither are tables or other text inside of floating text boxes.

This was made possible by several new features in Nisus’s macro language. For images, the script no longer has to extract the image from RTF. Selections have enclosedInlineImages and enclosedFloatingContents methods, which make it simple to grab any images in a selection:

  $currentSelection = $thisDocument.textSelection
  # are there any images inline with the text?
  $images = $currentSelection.enclosedInlineImages
  If $images
    ForEach $image in $images
      SaveImage($image, $imageFolder)
    End
  End
  # floating content can be images or text boxes; only save the images here
  $floats = $currentSelection.enclosedFloatingContents
  If $floats
    ForEach $float in $floats
      If $float.isImage
        SaveImage($float.image, $imageFolder)
      End
    End
  End

And images themselves are now objects with properties and methods, including the ability to write themselves out to a file:

  Define Command SaveImage($image, $imageFolder)
    # make sure the images folder exists
    File.createFolderWithPath $imageFolder
    # write the image in a web-friendly format and get back its path
    $imagePath = $image.writeWebImageToFolderAtPath $imageFolder
    $imageFileName = $imagePath.lastFilePathComponent
    # keep the readable name for the alt text
    $imageName = $imageFileName
    # percent-encode the file name so it is safe in the src URL
    Begin Perl
      use URI::Escape;
      $imageFileName = uri_escape($imageFileName);
    End
    $imageHTML = "<img src=\"images/$imageFileName\" alt=\"$imageName\" />"
    return $imageHTML
  End

The writeWebImageToFolderAtPath method ensures that the image is written in a standard web format, regardless of its format in the document. This means that embedded PDF and EPS images are converted to a more standard format, currently JPEG.
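The tag-building half of SaveImage translates directly to Python. This sketch is an analogue rather than a port of the macro: urllib.parse.quote with safe="" stands in for Perl’s uri_escape, and `image_tag` is a hypothetical name. Unlike the macro, it also HTML-escapes the alt text.

```python
import html
from urllib.parse import quote


def image_tag(image_file_name, folder="images"):
    """Build an <img> tag for an image saved in the images folder.

    quote(..., safe="") percent-encodes everything except letters, digits,
    and "_.-~", much like Perl's uri_escape. The alt text keeps the readable
    name but is HTML-escaped so quotes or ampersands can't break the tag.
    """
    src = f"{folder}/{quote(image_file_name, safe='')}"
    alt = html.escape(image_file_name)
    return f'<img src="{src}" alt="{alt}" />'


print(image_tag("cup of joe.jpeg"))
```

Keeping the escaped name for the URL and the readable name for the alt attribute follows the same split as $imageFileName and $imageName in the macro.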

The ability to define commands contributed heavily to the main improvement of this version. It allowed me to separate out the code that handles text conversions, which in turn allowed me to call it both for text in the normal document flow and on text in floating text boxes.
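The benefit of a shared command is that one routine handles both kinds of text. Here is a hypothetical Python sketch of that structure: `convert_text` plays the part of the defined command, `FloatingTextBox` stands in for Nisus’s floating content, and the class names and the wrapping `<div class="float">` are invented for illustration. For simplicity it appends floating text after the body instead of placing it where the float is anchored.

```python
import html
from dataclasses import dataclass, field
from typing import List, Tuple

Paragraph = Tuple[str, str]  # (style name, text)


@dataclass
class FloatingTextBox:
    """Hypothetical stand-in for a floating text box's contents."""
    paragraphs: List[Paragraph] = field(default_factory=list)


def convert_text(paragraphs: List[Paragraph]) -> List[str]:
    """The one shared conversion routine, playing the part of a Define Command."""
    return [
        f'<p class="{html.escape(style.lower().replace(" ", "-"))}">{html.escape(text)}</p>'
        for style, text in paragraphs
    ]


def convert_document(body: List[Paragraph], boxes: List[FloatingTextBox]) -> List[str]:
    lines = convert_text(body)
    for box in boxes:
        # Text outside the main flow goes through the very same routine.
        lines.append('<div class="float">')
        lines.extend(convert_text(box.paragraphs))
        lines.append("</div>")
    return lines


print(convert_document([("Normal", "Hi")], [FloatingTextBox([("Caption", "Fig. 1")])]))
```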

January 18, 2010: Nisus HTML script now handles tables

I finally got around to copying the table code from my old publish script to the new one. Note that it is currently written for me, and I never have tables with more than one line in them, so I transfer paragraph-level styles from the paragraph to the table cell. You may find this less than useful.
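Here is a minimal Python sketch of moving the paragraph style onto the cell, assuming, as the note above implies, that every cell holds a single paragraph. Cells are modeled as hypothetical (style name, text) pairs, and the class naming is an assumption rather than what the script actually emits.

```python
import html


def table_to_html(rows):
    """Render a table whose cells each hold a single styled paragraph.

    Each cell is a (style name, text) pair. Because a cell holds only one
    paragraph, that paragraph's style can move onto the <td> as a class
    and the <p> can be dropped (class naming is an assumption).
    """
    out = ["<table>"]
    for row in rows:
        out.append("<tr>")
        for style, text in row:
            css_class = html.escape(style.lower().replace(" ", "-"))
            out.append(f'<td class="{css_class}">{html.escape(text)}</td>')
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)


print(table_to_html([
    [("Table Head", "Drink"), ("Table Head", "Price")],
    [("Normal", "Espresso"), ("Number", "$2")],
]))
```

If a cell could hold several paragraphs, this shortcut breaks down: each paragraph would need its own element and class inside the <td>.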
