Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Adding links to PDF in Python

Jerry Stratton, June 15, 2007

In Multi-column PDFs I went over how to add frames to a reportlab toolkit PDF document using Python. I did this for my San Diego Dining After Midnight web page. While that PDF is meant for printing, PDFs can contain hyperlinks, and with the growth of PDAs with quality screens, PDFs will probably become even more useful on the computer than on paper. So why not add links to the restaurants and to the map to get to the restaurant?

Adding a linked area to a PDF

There are two ways to make a link in reportlab. The first is to place it over a rectangle on the page. For example, at the top of each page I have a header that contains the hostname for the Dining After Midnight web site. I create it using a function that draws the hostname in the upper right corner of the page.

  • def addHeader(canvas, document):
    • canvas.saveState()
    • hostname = "DiningAfterMidnight.com"
    • hostlink = "http://www." + hostname + "/"
    • fontsize = 12
    • fontname = 'Times-Roman'
    • headerBottom = document.bottomMargin+document.height-document.topMargin
    • bottomLine = headerBottom - fontsize/4
    • topLine = headerBottom + fontsize
    • lineLength = document.width+document.leftMargin
    • canvas.setFont(fontname,fontsize)
    • canvas.drawRightString(lineLength, headerBottom, hostname)
    • # stringWidth measures the text in the font set by setFont above
    • hostnamewidth = canvas.stringWidth(hostname)
    • # the link rectangle runs from the right edge of the text back to its left edge
    • linkRect = (lineLength, bottomLine, lineLength-hostnamewidth, topLine)
    • canvas.linkURL(hostlink, linkRect)
    • # balance the saveState() call at the top of the function
    • canvas.restoreState()

The lines that set hostnamewidth and linkRect and then call linkURL are the new code; they create a link to the web site over the hostname text. If you’re familiar with HTML, this is not like HTML at all. In HTML, we specify what text the link “belongs to”. In PDF, we specify the rectangular portion of the page that the link occupies. If we want that rectangular portion to correspond to some specific text, we need to determine where that text is and how big it is.

Since the hostname is drawn on the page using drawRightString, we already know the lower right hand corner of the text. The width is available from the stringWidth method, which measures the string in the canvas’s current font.
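The rectangle arithmetic can be pulled out into a small function of its own. This is only a sketch: the function name and its parameters are hypothetical, it returns the corners in left-to-right, bottom-to-top order rather than the right-to-left order used in addHeader, and it keeps the same allowances as the header code (a quarter of the font size below the baseline, the full font size above).

```python
def link_rect_right_aligned(right_x, baseline_y, text_width, font_size):
    """Return (x1, y1, x2, y2) around text drawn right-aligned at right_x.

    Hypothetical helper: text_width is assumed to come from
    canvas.stringWidth() for the same font that drew the text.
    """
    left_x = right_x - text_width
    bottom = baseline_y - font_size / 4  # a little room for descenders
    top = baseline_y + font_size         # roughly the height of the text
    return (left_x, bottom, right_x, top)


rect = link_rect_right_aligned(540, 720, 130, 12)
print(rect)
# The tuple could then be handed to canvas.linkURL(hostlink, rect).
```

Keeping the arithmetic separate from the drawing makes it possible to check the rectangle without generating a PDF.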

Linking text in a PDF

The platypus extension to reportlab makes link creation sort of easier. Sort of, because there’s a bit of a gotcha if the text you are linking contains other markup.

The way to mark some text in platypus as being linked is to surround it with the link tag. If you are familiar with HTML, the link tag is very similar to the a tag in HTML:

  • <link href="http://www.hoboes.com/Mimsy/">Mimsy Were the Borogoves</link>

This, for example, modifies the Dining After Midnight PDF code to generate linked restaurant names and linked street addresses:

  • name = restaurant.name
  • name = name.replace('’', '<unichar name="RIGHT SINGLE QUOTATION MARK"/>')
  • if restaurant.url:
    • link = restaurant.url
    • if link.live:
      • name = '<link href="' + link.get_absolute_url() + '">' + name + '</link>'
  • address = restaurant.address()
  • address = address.replace('’', '<unichar name="RIGHT SINGLE QUOTATION MARK"/>')
  • address = '<link href="' + restaurant.mapref() + '">' + address + '</link>'
  • items.append(platypus.Paragraph(name, nameStyle))
  • items.append(platypus.Paragraph(address, infoStyle))

It’s simple enough: it’s all just text manipulation. The “restaurant” object contains its name, its address, and a link object. And it knows how to generate a map link (mapref) to its address.
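The same text manipulation can be gathered into a pair of helpers. This is a sketch rather than the code I actually use: the function names are hypothetical, and the escaping of &, < and > with xml.sax.saxutils is an assumption on my part, based on the paragraph markup being XML-like, so that a name such as “Ham & Eggs” does not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr

UNICHAR_APOSTROPHE = '<unichar name="RIGHT SINGLE QUOTATION MARK"/>'


def platypus_text(text):
    """Escape &, < and >, then swap curly apostrophes for unichar markup."""
    return escape(text).replace('\u2019', UNICHAR_APOSTROPHE)


def linked(text, href=None):
    """Wrap text in a link tag if there is an href (hypothetical helper)."""
    marked_up = platypus_text(text)
    if not href:
        return marked_up
    # quoteattr adds the surrounding quotes and escapes & in the URL
    return '<link href=' + quoteattr(href) + '>' + marked_up + '</link>'


print(linked('Joe\u2019s Diner', 'http://www.example.com/joes'))
```

The result is an ordinary string, ready to pass to platypus.Paragraph in place of the hand-built name and address strings.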

The problem is that if the address or name already contains markup or special characters, those linked texts, and only those, will be underlined. To the reader, some of the links are underlined and some aren’t, which makes it look as if the text that isn’t underlined isn’t linked when it really is.

The reason is that platypus automatically underlines links if other markup appears in the link text, but not if it doesn’t. This includes adding Unicode characters using platypus’s unichar markup. As far as I can tell, this doesn’t look like a bug. I have no idea why it’s there, though. I can’t find any mention in the documentation of why it happens, or even that it happens.

I “fixed” it by commenting out the line “tx._canvas.line(t_off+x1, y, t_off+x2, y)” on line 464 of reportlab/platypus/paragraph.py:

  • for x1,x2,link in xs.links:
    • #tx._canvas.line(t_off+x1, y, t_off+x2, y)
    • _doLink(tx, link, (t_off+x1, y, t_off+x2, yl))
  • xs.links = []

This may end up disabling some underlining capabilities in platypus, but since I don’t use underlining it’s not a problem for me. Your mileage may vary.

February 6, 2016: Test classes and objects in python

One of the critical advances in computer programming since I began programming in the eighties is the object. An object is a thing that can include both functions and variables. In Python, an object is an instance of a class. For example, Django relies heavily on classes for its models. In Adding links to PDF in Python the main class used is for a restaurant. Each object is an instance of the restaurant class.

But one of the great things about object-oriented programming is that the things that access your objects don’t care what class they belong to. They care only whether the objects have the appropriate functions and variables. When they are on an object, a function is called a method and a variable is called a property (Python’s own documentation usually says attribute).

Any object can masquerade as another object by providing the same methods and properties. This means that you can easily make test classes that allow you to create objects for use in the PDF example.
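Here is a small sketch of that masquerade. Both classes and the function are made up for the illustration; the point is only that address_line never checks what class it has been handed.

```python
class Diner:
    def address(self):
        return "1060 West Addison, Chicago, IL"


class FoodTruck:
    def address(self):
        return "Corner of 5th and Main"


def address_line(place):
    # works with any object that provides an address() method
    return "Find us at " + place.address()


for place in (Diner(), FoodTruck()):
    print(address_line(place))
```

Neither class inherits from the other or from a shared base; having the right method is enough.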

In Adding links to PDF in Python, I had a Django model for the Restaurants and a Django model for the Links that were each restaurant’s web page. But because Django models are nothing more than (very useful) classes, you can make a fake Restaurant and fake Link to impersonate what the code snippet expects.

  • # in real life, the Link class would probably pull information from a database of links
  • # and live would be whether it is currently a valid link,
  • # and get_absolute_url would be the actual URL for that link
  • class Link():
    • def __init__(self, title):
      • self.title = title
      • # the PDF snippet tests link.live without calling it, so live is a plain property
      • self.live = True
    • def get_absolute_url(self):
      • return "http://www.example.com/" + self.title.replace(" ", "_")
  • # in real life, the Restaurant class would probably be a table of restaurants
  • # and would store the name of each restaurant, an id for the restaurant's web site
  • # and the restaurant's address
  • class Restaurant():
    • def __init__(self):
      • # the PDF snippet reads restaurant.name and restaurant.url without calling them
      • self.name = "The Green Goblin"
      • self.url = Link(self.name)
    • def address(self):
      • return "1060 West Addison, Chicago, IL"
    • def mapref(self):
      • return "https://www.google.com/maps/place/" + self.address().replace(" ", "+")

Save that as restaurant.py.

Objects are created from classes using:
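A minimal sketch of the usual syntax: call the class as if it were a function, passing whatever arguments its __init__ method expects. The two classes below are cut-down stand-ins for the ones in restaurant.py, kept here only so that the example runs on its own.

```python
class Link:
    def __init__(self, title):
        self.title = title

    def get_absolute_url(self):
        return "http://www.example.com/" + self.title.replace(" ", "_")


class Restaurant:
    def address(self):
        return "1060 West Addison, Chicago, IL"

    def mapref(self):
        return "https://www.google.com/maps/place/" + self.address().replace(" ", "+")


# calling the class creates a new object (an instance) of that class
link = Link("The Green Goblin")  # the argument is passed on to __init__
restaurant = Restaurant()        # no arguments needed

print(link.get_absolute_url())
print(restaurant.mapref())
```

Each call creates a separate object, so Restaurant() can be called as many times as the test needs.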
