Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, Swift, BASIC, and whatever else I happen to feel like hacking at.

The Dream of Poor Bazin

Jerry Stratton

What if the Three Musketeers were journalists in Washington, DC? What if journalists were swashbuckling, swaggering, hard-drinking warriors of truth? Find out in Jerry Stratton’s The Dream of Poor Bazin.

Goodreads: What books did I read last week and last month?

Jerry Stratton, February 26, 2016

I started using Goodreads in 2014, and it’s a nice way of tracking what books I read and what I thought about them. One thing I have definitely noticed missing, however, is an advanced search. Every once in a while I want to see what and how many books I read in the last week, or the last month, or the last x days, and there is no such search. You have to count them up yourself.

At the end of the year, I wanted to find all of the books from 2015 that I’d rated at 5. The only way I could find to do this was to eyeball the list. An advanced search would have made this easy.

In my Django database of books purchased, this sort of data is easy to drill into. It’s very easy to see how many books I purchased in February 2015, for example. And since one of my New Year’s Resolutions is that I am going to read more than I buy, comparing books purchased to books read is important!

However, while it does not provide an advanced search Goodreads does offer data export so that you can save your data to your hard drive. If you use Goodreads extensively it’s a good idea to make regular backups. The export creates a CSV, or comma-separated file, of your books, ratings, reviews, pretty much everything associated with each book you read. This allows us to make an advanced search of our own.

I chose Python because I’m familiar with it, and because it has both a CSV module and an in-memory SQL module (based on SQLite 3) built in.

The drawback to the CSV module is that it is relatively old and not unicode aware. In its defense it assumes everything is utf–8, but it doesn’t mark it that way so that the rest of Python knows. Goodreads, fortunately, provides its CSV file as utf–8. It’s not too hard to make a Python generator that will return unicode/UTF8 values when importing from the Goodreads csv file.

One more trick is that the csv reader only knows of one type of value, the string. But if we want last week’s data, we need to be able to search on a date. So, riffing on LMatter’s stack overflow code, I made a UnicodeDictReader that also converts Goodreads’s Date Read to a Python date.

This part of the script then loops through every line in the csv file and inserts the relevant data into an on-the-fly sqlite3 database.

[toggle code]

  • # import data
  • # csv is not unicode
  • #http://stackoverflow.com/questions/5004687/python-csv-dictreader-with-utf-8-data
  • def UnicodeDictReader(utf8_data):
    • csvReader = DictReader(utf8_data)
    • if args.verbose:
      • print csvReader.fieldnames
    • for row in csvReader:
      • yield {key: massageValues(key, value) for key, value in row.iteritems()}
  • def massageValues(key, value):
    • if key == 'Date Read':
      • if value:
        • (year, month, day) = value.split('/')
        • return date(int(year), int(month), int(day))
      • else:
        • return None
    • return unicode(value, 'utf-8')
  • csvfile = args.csvfile[0]
  • goodreads = UnicodeDictReader(csvfile)
  • # create database
  • sq3connection = sqlite3.connect(':memory:')
  • sq3connection.row_factory = sqlite3.Row
  • cursor = sq3connection.cursor()
  • cursor.execute('CREATE TABLE books (title, author, authorLastFirst, rating int, dateRead date, bookshelf)')
  • for row in goodreads:
    • book = {
      • 'title': row['Title'],
      • 'author': row['Author'],
      • 'authorLastFirst': row['Author l-f'],
      • 'rating': row['My Rating'],
      • 'dateRead': row['Date Read'],
      • 'bookshelf': row['Exclusive Shelf'],
    • }
    • cursor.execute('INSERT INTO books VALUES (:title, :author, :authorLastFirst, :rating, :dateRead, :bookshelf)', book)
  • # clean up
  • # commit is necessary for testing, if writing to a real file
  • sq3connection.commit()
  • sq3connection.close()

Note that if you are modifying the script to import other fields, you may find it useful to be able to view the database using the sqlite3 command-line command. Change “:memory:” to, say, “testfile.sqlite3” and you should see a file by that name show up in the current working directory.

As you can see, that code expects a csv file name. Here’s the code with the ArgumentParser setup and all of the command-line options:

[toggle code]

  • #!/usr/bin/python
  • import argparse, sqlite3
  • from datetime import date, timedelta
  • from csv import DictReader
  • sortFields = {
    • 'title': 'title ASC',
    • 'author': 'authorLastFirst ASC',
    • 'rating': 'rating DESC',
    • 'date': 'dateRead DESC',
  • }
  • parser = argparse.ArgumentParser(description='Search Goodreads export file.')
  • parser.add_argument('csvfile', metavar='CSVFILE', type=argparse.FileType('r'), nargs=1, help='a Goodreads file to search')
  • parser.add_argument('--author')
  • parser.add_argument('--bookshelf')
  • parser.add_argument('--rating', type=int, choices=range(1, 6))
  • parser.add_argument('--rating-exact', type=int, choices=range(1, 6))
  • parser.add_argument('--title')
  • parser.add_argument('--period', choices=['week', 'month', 'year', 'week-to-date', 'month-to-date', 'year-to-date'])
  • parser.add_argument('--days', type=int)
  • parser.add_argument('--order', default='date', choices=sortFields.keys())
  • parser.add_argument('--verbose', action='store_true')
  • args = parser.parse_args()

With these options, I can do searches such as for last week, or the last 14 days:

~/bin/goodreads goodreads_library_export.csv --period week
 **** Drunkard's Walk by Frederik Pohl (2016-01-21)
   ** The Mother Tongue - English & How It Got That Way by Bill Bryson (2016-01-21)
  *** Treks Not Taken by Steven R. Boyett (2016-01-20)
~/bin/goodreads goodreads_library_export.csv --days 14
 **** Drunkard's Walk by Frederik Pohl (2016-01-21)
   ** The Mother Tongue - English & How It Got That Way by Bill Bryson (2016-01-21)
  *** Treks Not Taken by Steven R. Boyett (2016-01-20)
 **** A Dangerous Friend by Ward Just (2016-01-15)
  *** Envisioning Information by Edward R. Tufte (2016-01-15)

The searches can be combined, as well. For example, the books I enjoyed the most from last year:

~/bin/goodreads goodreads_library_export.csv --rating 5 --period year
***** The Dying Earth (The Dying Earth, #1) by Jack Vance (2015-11-30)
***** Lebanese Cuisine: Over Two Hundred Authentic Recipes Designed for the Gourmet, the Vegetarian, the Healthfood Enthusiast by Madelain Farah (2015-11-10)
***** Advise & Consent by Allen Drury (2015-10-08)
***** The Collected Stories of Philip K. Dick, Volume 5: We Can Remember It For You Wholesale (2015-09-21)
***** The Congressman Who Loved Flaubert: 21 Stories and Novellas by Ward Just (2015-09-19)
***** The Case for Democracy: The Power of Freedom to Overcome Tyranny and Terror by Natan Sharansky (2015-07-09)
***** Mockingbird by Walter Tevis (2015-04-30)
***** Zen in the Art of Writing by Ray Bradbury (2015-03-04)
***** The Origins and Development of the English Language by John Algeo (2015-01-31)

And books that were at least okay last month:

~/bin/goodreads goodreads_library_export.csv --rating 2 --period month
 **** The Best of C. L. Moore by C.L. Moore (2015-12-29)
  *** The Space Scavengers by Cleve Cartmill (2015-12-20)
 **** Knights of the Dinner Table: Bundle of Trouble vol. 45 by Jolly R. Blackburn (2015-12-18)
  *** Astro City Vol. 8: Shining Stars by Kurt Busiek (2015-12-17)
  *** Green Magic: The Fantasy Realms of Jack Vance (2015-12-17)
  *** The Salads Cookbook by Southern Living Magazine (2015-12-16)
 **** Intellectuals and Society by Thomas Sowell (2015-12-14)
 **** Dear Committee Members by Julie Schumacher (2015-12-10)
  *** The Best of Fritz Leiber (2015-12-06)
 **** Scoop by Evelyn Waugh (2015-12-01)

Everything I’ve read by Ray Bradbury:

~/bin/goodreads goodreads_library_export.csv --author bradbury --bookshelf read
***** Zen in the Art of Writing by Ray Bradbury (2015-03-04)
***** Dandelion Wine (Green Town, #1) by Ray Bradbury (2014-09-03)
 **** The Illustrated Man by Ray Bradbury (2014-06-10)
 **** The October Country by Ray Bradbury
  *** I Sing the Body Electric! & Other Stories by Ray Bradbury
***** Something Wicked This Way Comes (Green Town, #2) by Ray Bradbury
 **** Fahrenheit 451 by Ray Bradbury
***** The Martian Chronicles by Ray Bradbury

It can also search through titles; for example, here are all of the books with “best of” in the title that I’ve marked on Goodreads as having been read:

~/bin/goodreads goodreads_library_export.csv --bookshelf read --title "best of"
 **** The Best of C. L. Moore by C.L. Moore (2015-12-29)
  *** The Best of Fritz Leiber (2015-12-06)
 **** For the Love of Mike: More of the Best of Mike Royko (2015-09-09)
 **** The Best of Edmond Hamilton (2015-08-30)
  *** The Best of Cordwainer Smith (2015-07-15)
 **** The Best of Fredric Brown (2015-07-02)
 **** One More Time: The Best of Mike Royko (2015-06-20)
  *** The Best of Omni Science Fiction No. 2 by Ben Bova (2015-04-16)

Each book’s line is printed out from a Book class. I often find it useful to put a database row into a class to make it easier to output custom data from the row. Here’s the simple one I use for a book row:

[toggle code]

  • # a book class to make it easier to access and display book data
  • class Book:
    • def __init__(self, row):
      • self.data = row
    • def __getattr__(self, name):
      • if name in self.data.keys():
        • return self.data[name]
      • else:
        • return None
    • def stars(self):
      • stars = self.rating * '*'
      • stars = stars.rjust(5)
      • return stars
    • def line(self):
      • line = '%s %s' % (self.stars(), self.title)
      • if self.author:
        • line = line + ' by %s' % self.author
      • if self.dateRead:
        • line = line +  ' (%s)' % self.dateRead
      • elif self.bookshelf and not args.bookshelf:
        • line = line + ' (%s)' % self.bookshelf
      • return line

And, finally, I construct a query from whatever options were provided on the command line:

[toggle code]

  • # construct query
  • query = 'SELECT * FROM books'
  • searches = []
  • searchValues = []
  • if args.author:
    • searches.append('author LIKE ?')
    • searchValues.append('%'+args.author+'%')
  • if args.bookshelf:
    • searches.append('bookshelf=?')
    • searchValues.append(args.bookshelf)
  • if args.rating:
    • searches.append('rating >= ?')
    • searchValues.append(args.rating)
  • if args.rating_exact:
    • searches.append('rating IS ?')
    • searchValues.append(args.rating_exact)
  • if args.title:
    • searches.append('title LIKE ?')
    • searchValues.append('%'+args.title+'%')
  • if args.days:
    • searches.append('dateRead >= ?')
    • searchValues.append(date.today() - timedelta(days=args.days))
  • if args.period:
    • searches.append('dateRead BETWEEN ? AND ?')
    • today = date.today()
    • if args.period.endswith('-to-date'):
      • endDate = today
      • if args.period.startswith('year-'):
        • startDate = today.replace(month=1, day=1)
      • elif args.period.startswith('month-'):
        • startDate = today.replace(day=1)
      • else:
        • weekSoFar = today.weekday()+1
        • if weekSoFar == 7:
          • weekSoFar = 0
        • startDate = today - timedelta(days=weekSoFar)
    • else:
      • if args.period == 'year':
        • startDate = date(today.year-1, 1, 1)
        • endDate = date(today.year-1, 12, 31)
      • elif args.period == 'month':
        • month = today.month - 1
        • year = today.year
        • if month == 0:
          • month = 12
          • year = year - 1
        • startDate = date(year, month, 1)
        • endDate = today.replace(day=1) - timedelta(days=1)
      • else:
        • weekSoFar = today.weekday()+2
        • if weekSoFar == 8:
          • weekSoFar = 1
        • endDate = today - timedelta(days=weekSoFar)
        • startDate = endDate - timedelta(days=7)
    • searchValues.append(startDate)
    • searchValues.append(endDate)
  • if searches:
    • query = query + ' WHERE ' + ' AND '.join(searches)
  • query = query + ' ORDER BY ' + sortFields[args.order]
  • # perform search
  • if args.verbose:
    • print query
    • print
  • resultCount = 0
  • results = cursor.execute(query, searchValues)
  • for row in results:
    • book = Book(row)
    • resultCount = resultCount + 1
    • print book.line()
  • print
  • print "Found %i book%s." % (resultCount, 's'[resultCount==1:])

The only particularly complex option is for the date-related periods, since it needs to calculate when the last month, year, or week was. For most search options, if you need to add a new one all you’ll have to do is add a new “if args.option”, append a query format to searches and the query data to search for to searchValues.

You can use the –verbose option to see what the current script passes to .execute for queries.

  1. <- Django-MPTT
  2. Model 100 data transfer ->