Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Nisus HTML script now handles floating content

Jerry Stratton, September 28, 2013

My Nisus simple HTML publish script now handles floating images and floating text boxes. This means that floating images are no longer lost, and neither are tables or other text inside of floating text boxes.

This was made possible by several new features in Nisus’s macro language. For images, the script no longer has to extract the image from RTF. Selections have an enclosedInlineImages and an enclosedFloatingContents method, making it simple enough to grab any images in a selection:

  • $currentSelection = $thisDocument.textSelection
  • #inline images: save each one ($imageFolder is assumed to be set earlier in the script)
  • $images = $currentSelection.enclosedInlineImages
  • If $images
    • ForEach $image in $images
      • SaveImage($image, $imageFolder)
    • End
  • End
  • #floating content: save only the floats that are images
  • $floats = $currentSelection.enclosedFloatingContents
  • If $floats
    • ForEach $float in $floats
      • If $float.isImage
        • SaveImage($float.image, $imageFolder)
      • End
    • End
  • End

And images themselves are now objects with properties and methods, including the ability to write themselves out to a file:

  • Define Command SaveImage($image, $imageFolder)
    • #make sure the image folder exists, then write the image in a web format
    • File.createFolderWithPath $imageFolder
    • $imagePath = $image.writeWebImageToFolderAtPath $imageFolder
    • $imageFileName = $imagePath.lastFilePathComponent
    • #keep the unescaped file name for the alt text
    • $imageName = $imageFileName
    • Begin Perl
      • #percent-encode the file name for use in the src URL
      • use URI::Escape;
      • $imageFileName = uri_escape($imageFileName);
    • End
    • $imageHTML = "<img src=\"images/$imageFileName\" alt=\"$imageName\" />"
    • return $imageHTML
  • End

The writeWebImageToFolderAtPath command ensures that the image is written in a standard web format, regardless of its format in the document. Embedded PDFs and EPS images, for example, are converted to a more standard format, currently JPEG.
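
As a rough illustration outside Nisus, the file-name handling in SaveImage can be sketched in Python. This is not part of the script: the function name image_html and the sample file name are hypothetical, and the HTML-escaping of attribute values is an extra precaution that the macro above does not take.

```python
from html import escape
from urllib.parse import quote


def image_html(image_file_name: str, folder: str = "images") -> str:
    """Build an <img> tag for a saved image (hypothetical helper)."""
    # Percent-encode the file name for the URL, much as uri_escape does in
    # the Perl block; safe="" means "/" is encoded as well.
    src = f"{folder}/{quote(image_file_name, safe='')}"
    # Assumption beyond the original macro: escape attribute values so a
    # quote or ampersand in the file name cannot break the tag.
    alt = escape(image_file_name, quote=True)
    return f'<img src="{escape(src, quote=True)}" alt="{alt}" />'


print(image_html("My Photo #1.jpg"))
# <img src="images/My%20Photo%20%231.jpg" alt="My Photo #1.jpg" />
```

The unescaped name is kept for the alt text and only the src is percent-encoded, which mirrors the split between $imageName and $imageFileName in the macro.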

The ability to define commands contributed heavily to the main improvement in this version. It let me separate out the code that handles text conversion, so that I can call it both on text in the normal document flow and on text in floating text boxes.

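  • Define Command FloatingExtractor($float)
    • #copy the text box's text into a temporary document and walk its paragraphs
    • $tempDocument = Document.newWithText $float.text
    • $tempDocument.show
    • While Select Next Paragraph
      • $currentSelection = $tempDocument.textSelection
      • paragraphExtractor($currentSelection, $tempDocument)
    • End
    • $tempDocument.close True
  • End
  • #main loop: convert each paragraph, then any text boxes floating in it
  • While Select Next Paragraph
    • $currentSelection = $thisDocument.textSelection
    • $floats = $currentSelection.enclosedFloatingContents
    • paragraphExtractor($currentSelection, $thisDocument)
    • If $floats
      • ForEach $float in $floats
        • If $float.isTextBox
          • FloatingExtractor($float)
          • #bring the main document back to the front
          • $thisDocument.show
        • End
      • End
    • End
  • End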

When the script detects floating text content, it extracts that content into a new temporary document, then selects each paragraph in order, just as it does for the main document.
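
The shape of that loop, with one converter shared by the main flow and the text boxes, can be sketched in Python. Everything here is a hypothetical stand-in rather than the script's own code: the Paragraph and TextBox classes, the convert_paragraph placeholder, and the wrapping div are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextBox:
    """Hypothetical stand-in for a floating text box."""
    paragraphs: List[str]


@dataclass
class Paragraph:
    """Hypothetical stand-in for a paragraph plus the text boxes anchored to it."""
    text: str
    text_boxes: List[TextBox] = field(default_factory=list)


def convert_paragraph(text: str) -> str:
    """Placeholder for the shared text conversion; here it only wraps in <p>."""
    return f"<p>{text}</p>"


def convert_document(paragraphs: List[Paragraph]) -> List[str]:
    """Convert the main flow, descending into each text box as it is met."""
    html: List[str] = []
    for paragraph in paragraphs:
        html.append(convert_paragraph(paragraph.text))
        for box in paragraph.text_boxes:
            # Same converter, applied to the box's paragraphs in order.
            # The wrapping div is an assumption, not the script's output.
            html.append('<div class="float">')
            html.extend(convert_paragraph(p) for p in box.paragraphs)
            html.append("</div>")
    return html


doc = [Paragraph("Intro", [TextBox(["Side note", "More"])]), Paragraph("Next")]
print("\n".join(convert_document(doc)))
```

Because the conversion lives in one function, any fix to it applies to both the document flow and the floating text.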

Nisus’s new macro features, including hash arrays, also removed the need for a Perl include file; if you look at the code in the zip file, you’ll see that the include file is gone. The script still uses Perl extensively for its text-munging capabilities, but it no longer needs separate global functions and variables; those can all be handled in Nisus.

A somewhat lesser new feature is that the script knows when a paragraph style inherits from a headline style, and uses the headline style’s textual level (that is, heading 1 is level 1, heading 2 is level 2, and so on). It does not use the table of contents level from the style, because paragraphs can have multiple table of contents levels.
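
One way to picture the inheritance lookup is the Python sketch below. It is only an illustration under stated assumptions: the style table, the style names, and the heading_level function are hypothetical, and how Nisus actually exposes a style's parent is not shown here.

```python
import re
from typing import Dict, Optional

# Hypothetical style table: style name -> the style it is based on (or None).
STYLE_PARENTS: Dict[str, Optional[str]] = {
    "Heading 1": None,
    "Heading 2": None,
    "Chapter Title": "Heading 1",
    "Sidebar Title": "Heading 2",
    "Normal": None,
    "Body": "Normal",
}

# The level is read from the style's name, not from a table of contents level.
HEADING_NAME = re.compile(r"^Heading (\d+)$")


def heading_level(style: str, parents: Dict[str, Optional[str]]) -> Optional[int]:
    """Walk up the inheritance chain until a 'Heading N' style is found."""
    seen = set()
    current: Optional[str] = style
    while current is not None and current not in seen:
        seen.add(current)  # guard against a circular style chain
        match = HEADING_NAME.match(current)
        if match:
            return int(match.group(1))
        current = parents.get(current)
    return None  # not a heading, and does not inherit from one


print(heading_level("Chapter Title", STYLE_PARENTS))  # 1
print(heading_level("Body", STYLE_PARENTS))           # None
```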

In response to Nisus HTML conversion: New features in Nisus’s scripting language make HTML conversion almost a breeze.