Nisus HTML conversion

Jerry Stratton, December 20, 2008

I’ve become a huge fan of Nisus just because of its ease of use. I’ve started using it for everything I once would have used AppleWorks for. It has all of the features I need for most documents (and most of the features I need for all documents), and it rarely gets in the way when I’m writing. I now use it for all of my fiction, and I hope to use it for my non-fiction tutorials and game texts in the future.

It’s a rare bonus when software this easy to use also has a useful scripting language. Nisus not only meets my needs when I’m writing but is also a great tool when I’m hacking. In Nisus “clean HTML” macro, I wrote a rough script that converted a Nisus document into HTML by going through each paragraph, converting the paragraph to RTF, and searching the RTF for style names. This only worked for paragraph-level styles; I didn’t know RTF well enough to go any deeper (and still don’t).

Nisus 1.1 introduced some very useful features to the Nisus scripting language: we can now get the paragraph-level style name as well as the chunks of character-level styles within a paragraph.

Take a look at FireBlade Coffeehouse Texts for an example of what the script produces.

Getting paragraph styles

For example, here’s a simple script for collecting all of the paragraph style names used by the document.

$thisDocument = Document.active
$paragraphStyles = Array.new()
# Put the insertion point at the start of the document, then
# select one paragraph at a time until there are none left
Select Paragraph 1
Select Start
While Select Next Paragraph
    $currentSelection = $thisDocument.textSelection
    $style = $currentSelection.typingAttributes
    $styleName = $style.paragraphStyleName
    # Only add style names that aren't already in the list
    If $paragraphStyles.indexOfValue($styleName) == -1
        $paragraphStyles.appendValue $styleName
    End
End
$paragraphStyles.sort
$paragraphStyles = $paragraphStyles.join("\n")
$styles = "PARAGRAPH STYLES:\n$paragraphStyles"
Prompt $styles
Write Clipboard $styles

Save this in your Nisus macro folder (find it in the Macro menu of Nisus) and it will display all paragraph-level styles in use, and copy them to the clipboard for pasting.

Document.textSelection provides the current selection, as a TextSelection object.
TextSelection.typingAttributes provides an Attributes object describing the attributes of the current selection; this is what gives us the style information.
Attributes.paragraphStyleName gives the paragraph-level style in those attributes; since the script selects one paragraph at a time, there will only ever be one paragraph-level style to worry about.
Array.indexOfValue($styleName) checks to see if that style name is already in the array of style names; if indexOfValue comes back with -1, that style name is not in the list.
Array.appendValue($styleName) adds that style name to the list.
Array.sort sorts the array alphabetically.
Array.join("\n") turns the array into a string of text, with each style name on its own line.
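
The collect, de-duplicate, sort, and join steps above are easy to try outside Nisus. What follows is a rough Python sketch of that logic, not Nisus macro code. The list of (style name, text) pairs is a hypothetical stand-in for the document. Python’s sort is case-sensitive, so it may order style names differently than Nisus does.

```python
def collect_paragraph_styles(paragraphs):
    """Return the sorted, de-duplicated paragraph style names as one string.

    `paragraphs` is a hypothetical stand-in for the document: a list of
    (style_name, text) pairs, one per paragraph.
    """
    styles = []
    for style_name, _text in paragraphs:
        # Same test as indexOfValue(...) == -1: only add names not yet seen
        if style_name not in styles:
            styles.append(style_name)
    styles.sort()
    return "PARAGRAPH STYLES:\n" + "\n".join(styles)


doc = [
    ("Heading 1", "Nisus HTML conversion"),
    ("Normal", "I've become a huge fan of Nisus..."),
    ("Heading 2", "Getting paragraph styles"),
    ("Normal", "Here's a simple script..."),
]
print(collect_paragraph_styles(doc))
```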

Character styles

Getting character-level styles is very similar, but we also need to loop through each paragraph in chunks of identically-attributed text, checking each chunk for a character-level style.

$thisDocument = Document.active
$paragraphStyles = Array.new()
$characterStyles = Array.new()
Select Paragraph 1
Select Start
While Select Next Paragraph
    $currentSelection = $thisDocument.textSelection
    $style = $currentSelection.typingAttributes
    $styleName = $style.paragraphStyleName
    If $paragraphStyles.indexOfValue($styleName) == -1
        $paragraphStyles.appendValue $styleName
    End
    # Are there any character styles? Walk the paragraph one chunk of
    # identically-attributed text at a time.
    $currentParagraph = $currentSelection.subtext
    $currentIndex = 0
    While $currentIndex < $currentParagraph.length
        $currentStyleRange = $currentParagraph.rangeOfAttributesAtIndex $currentIndex
        $subText = $currentParagraph.subtextInRange $currentStyleRange
        $style = $subText.attributesAtIndex 0
        $styleName = $style.characterStyleName
        # Chunks without a character style have no style name
        If $styleName
            If $characterStyles.indexOfValue($styleName) == -1
                $characterStyles.appendValue $styleName
            End
        End
        # Skip ahead to the first character after this chunk
        $currentIndex = $currentIndex + $currentStyleRange.length
    End
End
$paragraphStyles.sort
$characterStyles.sort
$paragraphStyles = $paragraphStyles.join("\n")
$characterStyles = $characterStyles.join("\n")
$styles = "PARAGRAPH STYLES:\n$paragraphStyles\n\nCHARACTER STYLES:\n$characterStyles"
Prompt $styles
Write Clipboard $styles

TextSelection.subtext provides a Text object of the selected text, which in this case will be a paragraph.
Text.length is the total length of that paragraph.
Text.rangeOfAttributesAtIndex provides a Range object describing the range of characters that have the same attributes around the index number the script gives it (which starts at zero and increases as the script loops through the text).
Text.subtextInRange $currentStyleRange provides a Text object of that range of similarly-styled text.
Text.attributesAtIndex 0 provides an Attributes object, which will contain the character-level style name.
Attributes.characterStyleName provides the character-level style name.
After that, it does the same thing as above, to collect an array of character-level style names.
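
The chunk-walking loop is the part most worth understanding. Here is a minimal Python sketch of it, assuming a paragraph is modeled as one (character style, bold) tuple per character. The function `range_of_attributes_at_index` is a hypothetical stand-in for Text.rangeOfAttributesAtIndex, not a real API. The bold flag shows why de-duplication matters: a chunk ends whenever any attribute changes, so the same character style can turn up in adjacent chunks.

```python
def range_of_attributes_at_index(attrs, index):
    """Return (start, length) of the run of identical attributes around index.

    Hypothetical stand-in for Text.rangeOfAttributesAtIndex; `attrs` holds
    one (character_style, bold) tuple per character, None meaning no style.
    """
    start = index
    while start > 0 and attrs[start - 1] == attrs[index]:
        start -= 1
    end = index
    while end < len(attrs) and attrs[end] == attrs[index]:
        end += 1
    return start, end - start


def collect_character_styles(attrs):
    found = []
    index = 0
    while index < len(attrs):
        start, length = range_of_attributes_at_index(attrs, index)
        style_name = attrs[start][0]
        # Unstyled chunks have no name; a name can also repeat in adjacent
        # chunks when another attribute (here, bold) changes, so skip repeats.
        if style_name and style_name not in found:
            found.append(style_name)
        # Jump to the first character after this chunk.
        index = start + length
    return sorted(found)


paragraph = (
    [(None, False)] * 4
    + [("Code", False)] * 4
    + [("Code", True)] * 3
    + [(None, False)] * 2
    + [("Emphasis", False)] * 5
)
print(collect_character_styles(paragraph))
```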

This might be useful if you need to see at a glance what styles you’re using (as opposed to all the styles in the document, which can contain styles you’re not using).

Using the HTML conversion script

Other than using these new features, the basic logic of this script is the same as the earlier one. I’ve modified it a bit so that it’s easier to use and easier to distribute.

Put the entire “Publish” folder of the Nisus HTML conversion scripts into your Nisus Macros folder (you can find it in the Macros menu when Nisus is open). You can name it whatever you want; it doesn’t need to be called “Publish”.

TOP.html and BOTTOM.html are used to bookend the HTML that the script creates. TOP.html is probably most useful, since it lets you add your own style sheet. Inside TOP.html, __TITLE__ and __FILENAME__ will be replaced by the document title (determined from the file name) and the document filename (without the extension), respectively.
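
The bookending step amounts to plain string substitution. The following Python sketch shows it under some assumptions. The `bookend` function and the sample template are hypothetical. The title is taken to be the bare file name, whereas the real script derives it from the file name by its own rules.

```python
from pathlib import PurePath


def bookend(body_html, top, bottom, document_path):
    """Wrap converted HTML in the contents of TOP.html and BOTTOM.html."""
    # __FILENAME__ is the document's file name without its extension.
    filename = PurePath(document_path).stem
    # Assumption: the title is just the bare file name; the actual script
    # may build the title from the file name differently.
    title = filename
    header = top.replace("__TITLE__", title).replace("__FILENAME__", filename)
    return header + body_html + bottom


top = "<html>\n<head><title>__TITLE__</title></head>\n<body>\n<!-- from __FILENAME__ -->\n"
bottom = "</body>\n</html>\n"
page = bookend("<p>Hello</p>\n", top, bottom, "/Users/me/Stories/Dragon Tales.rtf")
print(page)
```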

The script automatically creates a folder with the same name as the document (without the extension). Inside that folder, it creates a PDF version of the file, a gzipped copy of the RTF version of the file, and the HTML file. If there are images, it creates an images folder and puts the images in that folder.

The script will erase any existing content in those folders that has the same name as what the script produces. And it won’t warn you, because that’s annoying.
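
The following rough Python sketch shows the output layout and the silent overwrite. It is not the macro’s code, and several details are assumptions. The output folder is placed beside the document. The HTML and PDF file names are hypothetical and are only computed here, not written. Only the gzipped RTF copy and the optional images folder are actually created.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def publish_layout(document, has_images):
    """Create the output folder and a gzipped copy of the RTF document."""
    # Folder named after the document, minus its extension.
    # Assumption: it is created beside the document.
    folder = document.with_suffix("")
    folder.mkdir(exist_ok=True)
    paths = {
        # Hypothetical names for files the real script writes (not created here).
        "html": folder / (folder.name + ".html"),
        "pdf": folder / (folder.name + ".pdf"),
        "rtf_gz": folder / (document.name + ".gz"),
    }
    if has_images:
        paths["images"] = folder / "images"
        paths["images"].mkdir(exist_ok=True)
    # Mode "wb" silently replaces an earlier file of the same name: no warning.
    with open(document, "rb") as src, gzip.open(paths["rtf_gz"], "wb") as dst:
        shutil.copyfileobj(src, dst)
    return paths


# Usage: publish a throwaway RTF file twice in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "Story.rtf"
    doc.write_bytes(b"{\\rtf1 Hello}")
    first = publish_layout(doc, has_images=True)
    second = publish_layout(doc, has_images=False)
    restored = gzip.decompress(second["rtf_gz"].read_bytes())
print(restored)
```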

I’ve also included a few other scripts that might be useful, such as a script for extracting all images from the document, and the example script above for collecting the styles that are used in the document.

I’ll update the archive whenever I add new features, such as table and list conversion, which I expect to do eventually.

Notes

This script was really made for me; you are likely to find it less useful than I do, and will probably have to modify it.

This currently relies on the name of the style to determine headline levels, so you’ll need to use the built-in “heading x” styles to set headline levels. Styles based on those won’t get picked up, and neither will styles that have had navigation levels applied to them. Hopefully, future versions will make the navigation level available from the Attributes object.
This script only converts styles. It does not convert attributes. That’s not because the Nisus scripting language can’t do it; it’s because that’s what I want. I have no desire to make my web documents look like my print documents. I’m assuming that any attributes not part of a style are meant only for paper and shouldn’t carry over into the HTML.
It still uses RTF for grabbing images.
It does not do lists or tables yet. I had this working in the previous version, but haven’t needed it yet in this version.
It would be a bit more reliable if I could use $thisDocument.text; however, that drops any text inside of tables. There is no Document.paragraphs that returns the same things we get from Select Next Paragraph. But using Select Next Paragraph does mean that you can’t switch documents while this script is running: the selection will follow the active document.
Because the resulting HTML is very regular, it is easy to parse it (as I did for the FireBlade Texts site) into separate pages for each top-level or sub-level headline. I’ve included the Python script I use for that as an example.
The way that the Nisus scripting language seamlessly shifts between Nisus and Perl is a joy to work with. Perl is usually my first choice for filtering text, and obviously in a word processor filtering text is a common task.
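
Two of the notes above, headline levels taken from style names and splitting the regular HTML into pages, can be sketched together in Python. This is not the Python script included in the archive. It is a hypothetical illustration that assumes the document is a list of (style name, text) pairs. Only the exact built-in “Heading 1” through “Heading 6” names produce headlines, so a style such as “My Heading” falls through to a plain paragraph, just as the first note warns. The split relies on re.split accepting a zero-width pattern, which requires Python 3.7 or later.

```python
import html
import re

# Built-in style names only: "Heading 1" through "Heading 6", any letter case.
HEADING_STYLE = re.compile(r"heading ([1-6])", re.IGNORECASE)


def heading_level(style_name):
    """Return the headline level for a built-in heading style, else None."""
    match = HEADING_STYLE.fullmatch(style_name)
    return int(match.group(1)) if match else None


def to_html(paragraphs):
    """Convert (style_name, text) pairs; non-heading styles become plain <p>."""
    lines = []
    for style_name, text in paragraphs:
        level = heading_level(style_name)
        tag = f"h{level}" if level else "p"
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def split_pages(document_html):
    """Split the HTML into one chunk per top-level <h1> section."""
    # Zero-width split just before each <h1>, so each chunk keeps its heading.
    chunks = re.split(r"(?=<h1>)", document_html)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


doc = [
    ("Heading 1", "Dragons"),
    ("Normal", "Old & wise."),
    ("Heading 2", "Fire"),
    ("Heading 1", "Knights"),
    ("My Heading", "Not a built-in heading style"),
]
for page in split_pages(to_html(doc)):
    print(page, end="\n---\n")
```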

I love it when a Perl comes together.

September 28, 2013: Nisus HTML script now handles floating content

Download Zip file (14.7 KB)

My Nisus simple HTML publish script now handles floating images and floating text boxes. This means that floating images are no longer lost, and neither are tables or other text inside of floating text boxes.

This was made possible by several new features in Nisus’s macro language. For images, the script no longer has to extract the image from RTF. Selections have enclosedInlineImages and enclosedFloatingContents methods, making it simple to grab any images in a selection:

$currentSelection = $thisDocument.textSelection
# $imageFolder is assumed to hold the path of the images folder
# Are there any inline images?
$images = $currentSelection.enclosedInlineImages
If $images
    ForEach $image in $images
        SaveImage($image, $imageFolder)
    End
End
# Floating contents can be images or text boxes; only images are saved here
$floats = $currentSelection.enclosedFloatingContents
If $floats
    ForEach $float in $floats
        If $float.isImage
            SaveImage($float.image, $imageFolder)
        End
    End
End

And images themselves are now objects with properties and methods, including the ability to write themselves out to a file:

Define Command SaveImage($image, $imageFolder)
    File.createFolderWithPath $imageFolder
    # Write the image out in a standard web format and get back its path
    $imagePath = $image.writeWebImageToFolderAtPath $imageFolder
    $imageFileName = $imagePath.lastFilePathComponent
    $imageName = $imageFileName
    # URL-escape the file name for the src attribute; alt keeps the plain name
    Begin Perl
        use URI::Escape;
        $imageFileName = uri_escape($imageFileName);
    End
    $imageHTML = "<img src=\"images/$imageFileName\" alt=\"$imageName\" />"
    return $imageHTML
End

The writeWebImageToFolderAtPath method ensures that the image is in a standard web format, regardless of its format in the document. This means that embedded PDFs and EPS images will be written to a more standard format, currently JPEG.
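
The tag-building half of SaveImage can be sketched in Python using only the standard library. Here `urllib.parse.quote(name, safe="")` plays the role of Perl’s uri_escape: both leave only letters, digits, and `_.-~` unescaped by default. The `image_tag` function is a hypothetical name. Unlike the macro, this sketch also HTML-escapes the alt text.

```python
import html
from pathlib import PurePath
from urllib.parse import quote


def image_tag(image_path):
    """Build an <img> tag for an image already written to the images folder."""
    file_name = PurePath(image_path).name
    # With safe="", everything except letters, digits and "_.-~" is escaped,
    # matching the default behavior of Perl's URI::Escape uri_escape().
    src = "images/" + quote(file_name, safe="")
    # Deviation from the macro: HTML-escape the alt text as well.
    return f'<img src="{src}" alt="{html.escape(file_name)}" />'


print(image_tag("/tmp/out/images/Dragon & Knight.jpg"))
```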

The ability to define commands contributed heavily to the main improvement of this version. It allowed me to separate out the code that handles text conversions, which in turn allowed me to call it both for text in the normal document flow and on text in floating text boxes.
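
The structural point here is one shared text converter called from two places. The following Python sketch illustrates that idea. The `Floating` class is a hypothetical stand-in for an item returned by enclosedFloatingContents. The converter is deliberately trivial, turning each non-blank line into a paragraph, and the image branch only emits a placeholder tag.

```python
import html
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Floating:
    """Hypothetical stand-in for one item of enclosedFloatingContents."""
    is_image: bool
    image_name: Optional[str] = None
    text: str = ""


def convert_paragraphs(text: str) -> List[str]:
    """The one text converter, shared by body text and floating text boxes."""
    return [f"<p>{html.escape(line)}</p>" for line in text.splitlines() if line.strip()]


def convert_selection(body_text: str, floats: List[Floating]) -> List[str]:
    output = convert_paragraphs(body_text)
    for item in floats:
        if item.is_image:
            # Images take their own path (SaveImage, in the macro).
            output.append(f'<img src="images/{item.image_name}" />')
        else:
            # A floating text box reuses the same converter as the body text.
            output.extend(convert_paragraphs(item.text))
    return output


result = convert_selection(
    "First paragraph.\nSecond & last.",
    [Floating(is_image=True, image_name="map.jpg"), Floating(is_image=False, text="Sidebar")],
)
print("\n".join(result))
```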

January 18, 2010: Nisus HTML script now handles tables

I finally got around to copying the table code from my old publish script to the new one. Note that it is currently written for me, and I never have table cells with more than one line in them, so I transfer paragraph-level styles from the paragraph to the table cell. You may find this less than useful.
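
Moving a paragraph’s style onto its cell only makes sense if each cell holds a single paragraph. Here is a hypothetical Python sketch under exactly that assumption. Representing the style as a CSS class, and the `css_class` name mapping, are illustrative guesses, not how the actual script necessarily does it.

```python
import html
import re
from typing import List, Tuple

# Assumption: each cell holds one paragraph, as (paragraph style, text).
Cell = Tuple[str, str]


def css_class(style_name: str) -> str:
    """Hypothetical mapping from a style name to a CSS class name."""
    return re.sub(r"[^a-z0-9]+", "-", style_name.lower()).strip("-")


def table_html(rows: List[List[Cell]]) -> str:
    lines = ["<table>"]
    for row in rows:
        # Move each paragraph's style onto its <td> as a class.
        cells = "".join(
            f'<td class="{css_class(style)}">{html.escape(text)}</td>'
            for style, text in row
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


rows = [
    [("Table Heading", "Name"), ("Table Heading", "Speed")],
    [("Normal", "Horse"), ("Normal", "Fast & loud")],
]
print(table_html(rows))
```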

FireBlade Coffeehouse Texts: Texts available in RTF, PDF, and HTML format.
Nisus HTML Conversion scripts (Zip file, 14.7 KB): This script will convert a simple Nisus document (no tables or lists) into very clean HTML.

Contents of Negative Space™ as a whole Copyright © 1994-2024 Jerry Stratton. Individual copyrights remain held by their respective authors unless they specify otherwise. Site titles, such as Negative Space, Strange Bedfellows, Biblyon Broadsheet, Highland Games, and FireBlade Coffeehouse are trademarks of Jerry Stratton.

Code and code snippets, to the extent that they are copyrightable, may be re-distributed under the terms of the GNU General Public License 3.

Nisus HTML conversion last modified September 28th, 2013.
