Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Multiple column PDF generation in Python

Jerry Stratton, May 27, 2007

My first step was to generate PDF in Snakelets. It wasn’t particularly useful PDF, but it showed that PDF generation was reasonably easy using the ReportLab Toolkit.

The second step was to embed Python in Django by combining Mako and Django templates.

The final step is to generate multi-column PDF with common content on each page. I’m going to step back into my Snakelets testbed to get this working, and then use what I learn to create the Django version.

The process of creating multiple column layouts in Platypus is conceptually quite simple. First, you create the page type (such as letter size, or landscape-oriented A4). Second, you place frames on the page. And finally, you pour some text into the page. Platypus will flow it into each frame in order, creating new pages with the same layout when the current page fills up.

Along the way, we can give the page more than one layout and control which one is used; and we can tell the page to let us know whenever a new page is created, so that we can add common elements to each page, such as a page number.

Multiple columns

Multiple columns turned out to be easier than I expected. Instead of using SimpleDocTemplate as in the previous examples, we need to use BaseDocTemplate and create our own frames. Platypus will flow the text into each frame as necessary.

I completely rewrote the makePDF method on the PDF class of my Snakelets blog. The first thing, of course, is to import the appropriate libraries.

  • import reportlab.platypus as platypus
  • import reportlab.lib.enums
  • import reportlab.lib.pagesizes
  • import reportlab.lib.styles
  • import reportlab.lib.units

The PDF class is now very similar to the previous one, except that it uses BaseDocTemplate instead of SimpleDocTemplate.

  • #creates a PDF version of the site
  • class PDF(Blog):
    • def serve(self, request, response):
      • response.setContentType("application/pdf")
      • #display it normally in the browser if possible, but if they save it give it a useful name
      • response.setHeader("Content-Disposition", 'inline; filename="Second Chances.pdf"')
      • out=response.getOutput()
      • self.makePDF(out)

I’ve added a Content-Disposition header. The usual value there is “attachment”, but I’ve specified “inline” so that the PDF is displayed in the browser. Browsers that can’t display PDF files inline will still offer it as a download. The header also gives the file a name, so visitors who do save the PDF should get that name as the default.
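A filename containing a space isn’t a valid bare token in this header, because RFC 6266 expects either a token or a quoted string. It’s therefore safer to send the filename quoted.

Here’s a minimal Python 3 sketch of building the header value. content_disposition is a hypothetical helper, not part of Snakelets, and it assumes an ASCII filename.

```python
def content_disposition(filename, inline=True):
    """Build a Content-Disposition value with a quoted filename (hypothetical helper)."""
    disposition = "inline" if inline else "attachment"
    # filename= takes an HTTP quoted-string: escape backslashes first, then double quotes
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return '%s; filename="%s"' % (disposition, escaped)


print(content_disposition("Second Chances.pdf"))  # inline; filename="Second Chances.pdf"
```

In the serve method, the result would be passed straight to response.setHeader("Content-Disposition", ...).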

    • def makePDF(self, destination):
      • posts = []
      • #let's set up some styles
      • inch = reportlab.lib.units.inch
      • style = reportlab.lib.styles.getSampleStyleSheet()
      • #article styles
      • headlineStyle = style["Heading2"]
      • paraStyle = style["Normal"]
      • paraStyle.spaceAfter = inch*.04
      • paraStyle.alignment=reportlab.lib.enums.TA_JUSTIFY
      • #go through each blog post and store its title and content
      • for postKey in self.app.posts.posts:
        • #get the post
        • post = self.app.posts.posts[postKey]
        • body = post.body
        • title = post.title
        • #the parts of each post will need to be kept together
        • items = []
        • headline = platypus.Paragraph(title, headlineStyle)
        • items.append(headline)
        • for paragraph in body.split("\n"):
          • #I have never understood encode/decode
          • #and have no idea why this is the correct decoding on my Mac
          • para = paragraph.decode("cp1252")
          • para = platypus.Paragraph(para, paraStyle)
          • items.append(para)
        • item = platypus.KeepTogether(items)
        • posts.append(item)

All the above code does is loop through each post and add it to the list of items (flowables) that need to go on the page. Each post is wrapped in a KeepTogether flowable. That keeps a post from breaking across page (or, in this case, frame) boundaries, unless the post is too long to fit in a single frame.
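Paragraph treats its text as a small XML-like markup language, which is how tags like <b> and <i> work. A stray < or & in a post can therefore confuse the parser.

Here’s a minimal Python 3 sketch of preparing each line of a post before wrapping it in a Paragraph. It assumes the post bodies are already Unicode strings, so there is no cp1252 decoding step. It also assumes they contain plain text rather than markup you want Platypus to interpret. paragraph_texts is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def paragraph_texts(body):
    """Split a post body into escaped strings ready for Paragraph() (hypothetical helper)."""
    texts = []
    for line in body.split("\n"):
        line = line.strip()
        if line:  # skip blank lines instead of creating empty paragraphs
            texts.append(escape(line))  # escapes &, < and >
    return texts
```

Each returned string would then go into platypus.Paragraph(text, paraStyle), exactly as in the loop above.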

      • #create the basic page and frames
      • document = platypus.BaseDocTemplate(destination, pagesize=reportlab.lib.pagesizes.letter)
      • frameCount = 2
      • frameWidth = document.width/frameCount
      • frameHeight = document.height-.05*inch
      • frames = []
      • #construct the column frames
      • for frame in range(frameCount):
        • leftMargin = document.leftMargin + frame*frameWidth
        • column = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, frameHeight)
        • frames.append(column)
      • template = platypus.PageTemplate(frames=frames)
      • document.addPageTemplates(template)
      • document.build(posts)
      • return

I’ve made the number of frames flexible so that I can also create three or four columns by changing one variable. This will be useful when I switch to the restaurant listing. However many frames we want, the code loops through and sets the left margin to march across the page; the bottom margin, width, and height are the same for each frame.
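The frame arithmetic can be checked without rendering anything. This sketch returns the (x, y, width, height) tuples that would be passed to platypus.Frame, using the same formulas as the loop above. column_frames is a hypothetical helper.

With US letter (612 × 792 points) and ReportLab’s default one-inch (72-point) margins, the content area is 468 × 648 points. Two columns are therefore 234 points wide each.

```python
def column_frames(left, bottom, width, height, count, header_space=0):
    """Return (x, y, width, height) for `count` side-by-side columns (hypothetical helper).

    Every column shares the same bottom edge and height; x steps across the
    page by one column width at a time.
    """
    column_width = width / count
    column_height = height - header_space
    return [(left + i * column_width, bottom, column_width, column_height)
            for i in range(count)]


print(column_frames(72, 72, 468, 648, 2))  # [(72.0, 72, 234.0, 648), (306.0, 72, 234.0, 648)]
```

Whatever the count, the last column’s right edge lands on left + width. That is why changing frameCount alone is enough to switch to three or four columns.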

The frames need to be combined into a template using PageTemplate, and the template then needs to be added to the document using addPageTemplates. Finally, the code builds the document as normal. Here’s what that code creates:

A different first page

You’ll notice that the function to add a template to a document is plural. We can add as many templates as we want, and choose the appropriate one for the next page.

When we have more than one template, we need to give each template an ID so that we can refer to it elsewhere.

For the blog, wouldn’t it be nice to have a title page that contains the title and subtitle of the blog, with only short columns below the title?

First we’ll need to set up some styles for the title page. Above the “article styles” section, add:

  • #title page styles
  • titleStyle = style["Title"]
  • titleStyle.fontSize=40
  • titleStyle.leading = titleStyle.fontSize*1.1
  • #need to copy the object or style changes will apply to any incarnation of "Normal"
  • subTitleStyle = copy.copy(style["Normal"])
  • subTitleStyle.alignment=reportlab.lib.enums.TA_CENTER
  • subTitleStyle.fontName="Times-Italic"

Also, import the copy module:

  • import copy

We need to copy the Normal style, because these styles are all objects. In Python, assigning an object with “=” gives you another reference to that same object, not a copy. But we’re already using the Normal style for our standard paragraphs. Without copying the object, any changes we make to subTitleStyle would also apply to paraStyle, and vice versa.
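The difference between assignment and copying is easy to see with a stand-in object. Style here is a hypothetical class playing the role of a ReportLab ParagraphStyle, which “=” shares rather than duplicates, like any Python object.

```python
import copy


class Style:
    """Hypothetical stand-in for a ParagraphStyle: just a bag of attributes."""

    def __init__(self, fontName, alignment):
        self.fontName = fontName
        self.alignment = alignment


def restyle_by_assignment(normal):
    # "=" binds a second name to the SAME object, so the original changes too
    subTitleStyle = normal
    subTitleStyle.fontName = "Times-Italic"
    return subTitleStyle


def restyle_by_copy(normal):
    # copy.copy makes a new object with the same attribute values
    subTitleStyle = copy.copy(normal)
    subTitleStyle.fontName = "Times-Italic"
    return subTitleStyle
```

ReportLab also lets you derive a new style from an existing one with ParagraphStyle("SubTitle", parent=style["Normal"]) from reportlab.lib.styles. That approach avoids touching Normal at all.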

Now we need to add the title and subtitle to the list of items that need to appear in our document. Above the for loop that loops through each post, add:

  • #now create the title page
  • title = self.getTitle()
  • description = self.getDescription()
  • posts.append(platypus.Paragraph(title, titleStyle))
  • posts.append(platypus.Paragraph(description, subTitleStyle))
  • #done with the title info, move to the next frame and queue up the later page template
  • posts.append(platypus.FrameBreak())
  • posts.append(platypus.NextPageTemplate("laterPages"))

This creates paragraphs for the title and subtitle using the styles we’ve set up, and appends them to the “posts” list. The trick here is that we’re going to have two different templates: one for the first page (the title page), and one for all of the remaining pages. We’ll call that second template “laterPages”.

The first page template will have an extra frame at the top for the blog’s title. The FrameBreak line adds a frame break immediately after the subtitle, so that only the title and subtitle go into that first frame. NextPageTemplate tells Platypus that the next time it starts a new page, it should use the template called “laterPages”.

We set up the first page template right along with the standard template. Above the “construct the column frames” for loop, add:

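  • #title page frames
  • firstPageHeight = 3.5*inch
  • #height left over for the columns below the title
  • firstPageBottom = frameHeight-firstPageHeight
  • framesFirstPage = []
  • #Frame takes an absolute y coordinate, so the title frame starts above the bottom margin plus the columns
  • titleFrame = platypus.Frame(document.leftMargin, document.bottomMargin+firstPageBottom, document.width, firstPageHeight)
  • framesFirstPage.append(titleFrame)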

This calculates how much space the title frame will use, and how much space is left for the columns, on the title page. Inside the “construct the column frames” for loop, add:

  • #columns for the first page
  • firstPageColumn = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, firstPageBottom)
  • framesFirstPage.append(firstPageColumn)

So now, in addition to the frames in the “frames” list, which span the full height of the page’s content area, we’re also creating a set of frames in the “framesFirstPage” list. Those span only from the bottom of the title frame down to the bottom margin.
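One detail is worth checking here: a Frame’s y argument is an absolute position measured up from the bottom of the page, while its height is a distance. The title frame’s y therefore needs to include the bottom margin. Otherwise the title frame sits one bottom margin lower than intended and overlaps the columns.

This sketch computes the first-page frames as (x, y, width, height) tuples so the layout can be checked numerically. first_page_frames is a hypothetical helper, and the numbers in the example assume letter size with one-inch margins.

```python
def first_page_frames(left, bottom, width, height, count, title_height):
    """Title frame across the top plus `count` shorter columns below it (hypothetical helper).

    y is measured up from the bottom of the page, as in ReportLab.
    """
    column_height = height - title_height
    # the title frame starts where the columns end: bottom margin + column height
    title = (left, bottom + column_height, width, title_height)
    column_width = width / count
    columns = [(left + i * column_width, bottom, column_width, column_height)
               for i in range(count)]
    return [title] + columns


frames = first_page_frames(72, 72, 468, 648, 2, 3.5 * 72)
print(frames[0])  # (72, 468.0, 468, 252.0): its top is at 720, the top margin
```

Each column’s top edge (y + height) equals the title frame’s y, so the frames touch without overlapping.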

Finally, we’re now using more than one template, so we have to set up that list. Replace the “template=” line and the addPageTemplates line with:

  • templates = []
  • templates.append(platypus.PageTemplate(frames=framesFirstPage, id="firstPage"))
  • templates.append(platypus.PageTemplate(frames=frames, id="laterPages"))
  • document.addPageTemplates(templates)

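Instead of giving addPageTemplates a single template, we’re giving it a list of templates. The first template is made of the frames in the framesFirstPage list, and the second is the same template we made previously. Each template has an ID. Platypus starts the document with the first template in the list, which is why the title page gets the firstPage layout. The NextPageTemplate we queued up switches to laterPages from then on.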

This should produce the following PDF:

It’s beginning to look like a perfectly good PDF document.

A common header

Most long documents have page numbers, or some kind of header or footer. This can’t be built into the template’s frames, however, because it changes from page to page. Instead, we can tell a template to call one of our functions every time Platypus creates a new page using that template.

To the laterPages template, add “onPage=self.addHeader”:

  • templates.append(platypus.PageTemplate(frames=frames, id="laterPages", onPage=self.addHeader))

This tells Platypus to call the addHeader method whenever it creates a new page from the laterPages template. Now add the addHeader method itself to the PDF class:

  • #display the title of the blog and the current page
  • def addHeader(self, canvas, document):
    • canvas.saveState()
    • title = self.getTitle()
    • fontsize = 12
    • fontname = 'Times-Roman'
    • headerBottom = document.bottomMargin+document.height+document.topMargin/2
    • bottomLine = headerBottom - fontsize/4
    • topLine = headerBottom + fontsize
    • lineLength = document.width+document.leftMargin
    • canvas.setFont(fontname,fontsize)
    • if document.page % 2:
      • #odd page: put the page number on the right and align right
      • title += "-" + str(document.page)
      • canvas.drawRightString(lineLength, headerBottom, title)
    • else:
      • #even page: put the page number on the left and align left
      • title = str(document.page) + "-" + title
      • canvas.drawString(document.leftMargin, headerBottom, title)
    • #draw some lines to make it look cool
    • canvas.setLineWidth(1)
    • canvas.line(document.leftMargin, bottomLine, lineLength, bottomLine)
    • canvas.line(document.leftMargin, topLine, lineLength, topLine)
    • canvas.restoreState()

When Platypus calls this method, it passes us the canvas for the page and the document being built. Judging from the sample code out there, you’ll want to always begin your method by saving the canvas’s state, and end it by restoring it. saveState and restoreState push and pop the canvas’s graphics state (font name, font size, line width, colors, and so on). That way, the changes we make here don’t leak into whatever is drawn on the canvas afterward.

Drawing in this method works differently from laying out the page’s content. For the content we used what ReportLab calls flowables, which let the text flow dynamically to wherever it needs to go. Here, we’re specifying exactly where the text will be drawn.

On odd pages, we use the drawRightString method to align the header to the right. On even pages, we use drawString to align to the left.

In both cases we get the current page number from document.page.

On all pages, we draw a line across the top of the header and across the bottom of the header, just because it looks cool (and because I needed to know I could do it for my restaurants listing). You can also draw rectangles, circles, and curves. The user guide on the ReportLab Toolkit site explains how they work.
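The header arithmetic in addHeader can be pulled out into a plain function and checked without drawing anything. header_layout is a hypothetical helper, not part of ReportLab. It uses the same formulas as addHeader and returns:

- which side to align;
- the x and y of the text baseline;
- the header string;
- the y positions of the two rules.

```python
def header_layout(title, page, left, bottom, width, height, top_margin, fontsize=12):
    """Return (align, x, y, text, (bottom_rule_y, top_rule_y)) for a running header.

    Hypothetical helper: the baseline sits halfway into the top margin; odd
    pages align right with the number after the title, even pages align
    left with the number before it.
    """
    baseline = bottom + height + top_margin / 2
    rules = (baseline - fontsize / 4, baseline + fontsize)
    if page % 2:
        return ("right", left + width, baseline, "%s-%d" % (title, page), rules)
    return ("left", left, baseline, "%d-%s" % (page, title), rules)


print(header_layout("My Blog", 3, 72, 72, 468, 648, 72))
# ('right', 540, 756.0, 'My Blog-3', (753.0, 768.0))
```

Inside addHeader you would then call canvas.drawRightString(x, y, text) when align is "right", and canvas.drawString(x, y, text) otherwise. The rules would be drawn with canvas.line from left to left + width at each returned y.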

This is what the final PDF, with headers, looks like:

The After-midnight version

Using this in Django for my restaurant database was simple enough. The only problem I ran into is that, in my Django databases, I convert every special character into its HTML entity for flexibility. Platypus appears to understand only a few entities, and it throws out any entity it doesn’t understand.

The only fix I could come up with was to hand-edit ReportLab itself. One option is reportlab/platypus/paraparser.py, adding the entities I needed to its greek dict. The other, which is what I ended up doing, is reportlab/lib/xmllib.py, adding them to ENTITYDEFS.

For example, some restaurant names currently contain right single quotes and accented e’s. Until I edited xmllib.py, those characters disappeared from the final PDF.

  • ENTITYDEFS = {
    • 'lt': '<',
    • 'gt': '>',
    • 'amp': '&',
    • 'quot': '"',
    • 'apos': '\'',
    • 'rsquo': '\xe2\x80\x99',
    • 'eacute': '\xc3\xa9',
    • }

There’s probably a better way of getting Platypus to recognize these entities, but I couldn’t find any mention of this in the documentation.
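One alternative to patching ReportLab’s files is to convert the entities into real characters before the text ever reaches Platypus. You then escape only the three characters that Paragraph markup cares about.

This is a Python 3 sketch, and it makes three assumptions:

- the stored text is plain text with entities, not markup you want Platypus to interpret;
- the font in use has glyphs for the resulting characters;
- entities_to_paragraph_text is a hypothetical helper name.

```python
import html
from xml.sax.saxutils import escape


def entities_to_paragraph_text(text):
    """Decode HTML entities to characters, then re-escape &, < and > for Paragraph."""
    return escape(html.unescape(text))


print(entities_to_paragraph_text("Caf&eacute; &amp; Bar&rsquo;s"))  # Café &amp; Bar’s
```

A side benefit is that nothing has to be re-patched after upgrading ReportLab.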

  • {% mako %}
  • <%
  • import reportlab.platypus as platypus
  • import reportlab.lib.pagesizes
  • import reportlab.lib.styles
  • import reportlab.lib.units
  • import StringIO
  • destination = StringIO.StringIO()
  • parts = []
  • #downsize the default styles
  • def adjustStyles(style):
    • reduction = .7
    • style.fontSize=style.fontSize*reduction
    • style.leading = style.leading*reduction
    • style.spaceAfter = style.spaceAfter*reduction
    • style.spaceBefore = style.spaceBefore*reduction
    • style.leftIndent = style.leftIndent*reduction
  • #display the title and hostname of the site
  • def addHeader(canvas, document):
    • canvas.saveState()
    • title = "San Diego After Midnight"
    • hostname = "DiningAfterMidnight.com"
    • fontsize = 12
    • fontname = 'Times-Roman'
    • headerBottom = document.bottomMargin+document.height-document.topMargin
    • bottomLine = headerBottom - fontsize/4
    • topLine = headerBottom + fontsize
    • lineLength = document.width+document.leftMargin
    • canvas.setFont(fontname,fontsize)
    • canvas.drawString(document.leftMargin, headerBottom, title)
    • canvas.drawRightString(lineLength, headerBottom, hostname)
    • #draw some lines to make it look cool
    • canvas.line(document.leftMargin, bottomLine, lineLength, bottomLine)
    • canvas.line(document.leftMargin, topLine, lineLength, topLine)
    • canvas.restoreState()
  • #let's set up some styles
  • inch = reportlab.lib.units.inch
  • style = reportlab.lib.styles.getSampleStyleSheet()
  • #styles
  • areaStyle = style["Heading1"]
  • areaStyle.spaceAfter = 0
  • areaStyle.spaceBefore = 4
  • adjustStyles(areaStyle)
  • nameStyle = style["Heading2"]
  • nameStyle.fontName="Times-Italic"
  • nameStyle.spaceAfter = 0
  • nameStyle.spaceBefore = 2
  • nameStyle.leftIndent=.13*inch
  • adjustStyles(nameStyle)
  • infoStyle = style["Normal"]
  • infoStyle.leftIndent=.25*inch
  • infoStyle.leading = infoStyle.leading*.85
  • adjustStyles(infoStyle)
  • #go through each restaurant and store its info
  • area = ""
  • for restaurant in restaurants:
    • #don't break a restaurant apart
    • items = []
    • #the area should be accompanied by at least one restaurant in its column
    • if restaurant.area.name != area:
      • area = restaurant.area.name
      • items.append(platypus.Paragraph(area, areaStyle))
    • items.append(platypus.Paragraph(restaurant.name, nameStyle))
    • items.append(platypus.Paragraph(restaurant.address(), infoStyle))
    • items.append(platypus.Paragraph(restaurant.phone, infoStyle))
    • items.append(platypus.Paragraph(restaurant.typeline(), infoStyle))
    • items.append(platypus.Paragraph(restaurant.hoursline(), infoStyle))
    • item = platypus.KeepTogether(items)
    • parts.append(item)
  • #create the basic page
  • pagesize = reportlab.lib.pagesizes.landscape(reportlab.lib.pagesizes.letter)
  • leftMargin = .25*inch
  • rightMargin = .25*inch
  • topMargin = .25*inch
  • bottomMargin = .25*inch
  • document = platypus.BaseDocTemplate(destination, pagesize=pagesize, leftMargin=leftMargin, rightMargin=rightMargin, topMargin=topMargin, bottomMargin=bottomMargin)
  • #create the frames
  • frameCount = 3
  • frameWidth = document.width/frameCount
  • #leave space for the header
  • frameHeight = document.height-.25*inch
  • frames = []
  • #construct the column frames
  • for frame in range(frameCount):
    • leftMargin = document.leftMargin + frame*frameWidth
    • column = platypus.Frame(leftMargin, document.bottomMargin, frameWidth, frameHeight)
    • frames.append(column)
  • document.addPageTemplates(platypus.PageTemplate(frames=frames, onPage=addHeader))
  • document.build(parts)
  • pdfpage = destination.getvalue().decode('utf8', 'ignore')
  • %>
  • ${pdfpage}
  • {% endmako %}

This uses the “mako” template tag I described in Embedding Mako into Django to embed Python directly into a Django template. You can see the result of this script at the Dining After Midnight web site.
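One caveat about the template’s last line: decoding the PDF with 'utf8' and 'ignore' silently drops every byte sequence that isn’t valid UTF-8. PDF files are binary. Their second line is commonly a comment made of high (non-ASCII) bytes, and compressed streams can contain any byte at all. As a result, data can be lost, and the byte offsets in the file’s cross-reference table can end up wrong. PDF readers often repair such files, which may be why it appears to work.

This small Python 3 check shows the loss. survives_utf8_ignore is a hypothetical helper name.

```python
def survives_utf8_ignore(data):
    """Return True if decoding `data` as UTF-8 with errors="ignore" keeps every byte."""
    return data.decode("utf-8", "ignore").encode("utf-8") == data


# A PDF-style start: the second line is a comment of binary marker bytes.
pdf_start = b"%PDF-1.4\n%\x93\x8c\x8b\x9e ReportLab\n"
print(survives_utf8_ignore(pdf_start))  # False: the four marker bytes are dropped
```

Where the framework allows it, returning the raw bytes directly avoids the problem. In current Django versions, that would be a view returning HttpResponse(destination.getvalue(), content_type="application/pdf") rather than a template.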

The main new thing here is the adjustStyles function. I added it so that I could start from the sizes and spacing in ReportLab’s sample stylesheet, and still scale them down easily if the number of restaurants grows. I should really try to tie it to the number of columns. I could then check the final page count to see if it has exceeded two pages, and if so, run through again with a smaller font size and/or an extra column.
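The retry idea above could look something like this sketch, which tries denser layouts in order until one fits. build is a hypothetical callable that would do the following:

- reset the styles;
- run adjustStyles with the given reduction;
- lay out that many columns into a fresh buffer;
- return the number of pages produced.

```python
def fit_to_pages(build, max_pages=2, reductions=(0.7, 0.65, 0.6), column_counts=(3, 4)):
    """Try progressively denser layouts until one fits in max_pages.

    `build(reduction, columns)` is a hypothetical callable returning a page
    count. For each text size, an extra column is tried before shrinking the
    text further. Returns the first (reduction, columns, pages) that fits,
    or the densest attempt if none does.
    """
    attempt = None
    for reduction in reductions:
        for columns in column_counts:
            pages = build(reduction, columns)
            attempt = (reduction, columns, pages)
            if pages <= max_pages:
                return attempt
    return attempt
```

Because adjustStyles modifies the style objects in place, each attempt should start from a fresh getSampleStyleSheet(). Reusing the old objects would shrink the already-shrunk styles again.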
