Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Timeout class with retry in Python

Jerry Stratton, April 16, 2014

In Paramiko, the SSHClient’s connect method has a timeout parameter, but in some common situations it rarely actually causes a timeout. Since moving from San Diego’s Cox Cable to Round Rock’s Time-Warner, I’ve been seeing stuck connections much more often.

The fix appears to be to use signals: set an alarm before running the line/function that might get stuck, and then remove the alarm afterward. If the alarm has time to go off, it will generate an exception that can then be handled.
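On its own, the alarm pattern can be sketched as follows. This is a minimal illustration written for Python 3, not code from the class in this article; the names `run_with_alarm`, `AlarmTimeout`, and `slow` are hypothetical. Note that `signal.alarm` is only available on Unix, and signal handlers can only be installed from the main thread.

```python
import signal
import time


class AlarmTimeout(Exception):
    """Raised by the alarm handler (hypothetical name for this sketch)."""


def _raise_timeout(signum, frame):
    # Runs in the main thread when SIGALRM is delivered.
    raise AlarmTimeout('timed out')


def run_with_alarm(function, seconds):
    """Call function(), raising AlarmTimeout if it runs longer than seconds."""
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)      # set the alarm (whole seconds only)
    try:
        return function()
    finally:
        signal.alarm(0)        # remove the alarm, whether or not it fired


def slow():
    # Stands in for a call that gets stuck.
    time.sleep(5)
    return 'done'


if __name__ == '__main__':
    try:
        run_with_alarm(slow, 1)
    except AlarmTimeout as error:
        print('Gave up:', error)
```

The `finally` clause makes sure the alarm is removed even when the function raises some other exception, so a leftover alarm cannot go off later in unrelated code.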

Since the failure to connect appears to be extremely random, I decided to combine the timeout with the option to retry the connection:

  • import signal
  • class TimeoutError(Exception):
    • pass
  • #Paramiko timeout does not work often, if at all
  • # So create a timeout class
  • class Timer(object):
    • def __init__(self, function=None, seconds=30, tries=3, errorMessage='Timeout'):
      • self.seconds = seconds
      • self.tryLimit = tries
      • self.tries = 1
      • self.function = function
      • self.errorMessage = errorMessage
    • def act(self):
      • signal.signal(signal.SIGALRM, self.handleTimeout)
      • signal.alarm(self.seconds)
      • self.function()
      • signal.alarm(0)
    • def handleTimeout(self, signum, frame):
      • if self.tries >= self.tryLimit:
        • raise TimeoutError(self.errorMessage)
      • else:
        • print('Timed out on try %s %s' % (self.tries, self.errorMessage))
        • self.tries = self.tries + 1
        • self.act()
        • print('Succeeded on try %s' % self.tries)

The first class, a subclass of Exception, doesn’t do anything except give us an appropriately-named exception. The second class will raise that exception if the passed function does not complete before the given number of seconds.

First, instantiate the timer; then run the “act” method. If it times out, it will (by default) try again twice; that is, it will try three times. The function is retried simply by calling the act method again.

For example:

  • import socket
  • from paramiko import SSHClient, SSHException
  • class SFTPClient(Maker):
    • def ensureConnection(self, purpose=None):
      • try:
        • timer = Timer(function=self.openConnection, errorMessage=purpose)
        • timer.act()
        • return True
      • except SSHException as errtext:
        • self.warning('Unable to open SSH connection:', errtext, purpose)
      • except socket.error as errtext:
        • self.warning('Socket error connecting:', errtext, purpose)
      • except EOFError as errtext:
        • self.warning('Server has terminated with EOFError:', errtext, purpose)
      • except TimeoutError as errtext:
        • self.warning('Timeout error connecting:', errtext)
      • return False
    • def openConnection(self):
      • self.client = SSHClient()
      • self.client.load_system_host_keys()
      • self.client.connect(hostname=self.host, username=self.user, timeout=20)
      • self.sftp = self.client.open_sftp()

Usually, when this code times out, it succeeds on the second try; only rarely does it fail on all three tries.

Timed out on try 1 uploading stage file string /Stage/Mimsy/Books/no-one-left-lie.html

Timed out on try 2 uploading stage file string /Stage/Mimsy/Books/no-one-left-lie.html

Succeeded on try 3

This could be used with other kinds of errors, but it is especially appropriate for timeouts, because the nature of the timeout is that whatever problem there was thirty seconds ago probably no longer exists.
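One design point worth noting: the retry above happens inside the signal handler, so when a retried call succeeds and the handler returns, Python goes back to the call that was interrupted. A variation, sketched here for Python 3 and not the structure used above, keeps the handler to a single raise and moves the retry into an ordinary loop, so each stuck attempt is abandoned before the next one starts. The names `LoopTimer`, `ConnectTimeout`, `calls`, and `flaky` are hypothetical.

```python
import signal
import time


class ConnectTimeout(Exception):
    """Raised when a try times out (hypothetical name for this sketch)."""


class LoopTimer(object):
    def __init__(self, function, seconds=30, tries=3, errorMessage='Timeout'):
        self.function = function
        self.seconds = seconds
        self.tryLimit = tries
        self.errorMessage = errorMessage

    def handleTimeout(self, signum, frame):
        # Only raise; the loop in act() decides whether to retry.
        raise ConnectTimeout(self.errorMessage)

    def act(self):
        signal.signal(signal.SIGALRM, self.handleTimeout)
        for attempt in range(1, self.tryLimit + 1):
            signal.alarm(self.seconds)
            try:
                result = self.function()
            except ConnectTimeout:
                if attempt == self.tryLimit:
                    raise  # out of tries: pass the timeout up to the caller
                print('Timed out on try', attempt, self.errorMessage)
            else:
                if attempt > 1:
                    print('Succeeded on try', attempt)
                return result
            finally:
                signal.alarm(0)  # always remove the alarm


calls = {'count': 0}


def flaky():
    # Stand-in for a connection that hangs on the first call only.
    calls['count'] += 1
    if calls['count'] == 1:
        time.sleep(5)
    return 'connected'
```

Calling `LoopTimer(flaky, seconds=1, errorMessage='demo').act()` should time out once, retry, and return `'connected'` on the second try.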

October 20, 2014: Retry SSH connections after transient error

The timeout class works great for retrying connections after they time out, but what about more prosaic errors? I’ve been getting a bunch of AuthenticationException errors in my Python/Paramiko connection attempts lately. I’d just been catching all SSHExceptions (of which AuthenticationException is a subclass) and reporting the error, but this is just a transient error that almost always goes away on the very next upload.

That makes it a perfect candidate for retrying the connection. I renamed the class from Timer to Persistence, because this more generic class is going to be more persistent at making connections.

  • import signal
  • from time import sleep
  • from paramiko import SSHException, AuthenticationException
  • class Persistence(object):
    • def __init__(self, function=None, seconds=30, tries=3, errorMessage='Timeout'):
      • self.seconds = seconds
      • self.tryLimit = tries
      • self.tries = 1
      • self.function = function
      • self.errorMessage = errorMessage
    • def act(self):
      • signal.signal(signal.SIGALRM, self.handleTimeout)
      • signal.alarm(self.seconds)
      • try:
        • self.function()
      • except AuthenticationException as error:
        • self.tryAgain(AuthenticationException(error), 'Authentication exception')
      • signal.alarm(0)
    • def tryAgain(self, exception, message):
      • if self.tries >= self.tryLimit:
        • raise exception
      • else:
        • print('%s try %s %s' % (message, self.tries, self.errorMessage))
        • sleep(2*self.tries)
        • self.tries = self.tries + 1
        • self.act()
        • print('Succeeded on try %s' % self.tries)
    • def handleTimeout(self, signum, frame):
      • self.tryAgain(TimeoutError(self.errorMessage), 'Timed out')

All it really does is add a tryAgain method that can be called both by the handleTimeout method and by any except clause in act. If the function still fails on the last try (the third, by default), the exception is passed back up as normal.
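The same idea extends to any set of transient errors. The sketch below, written for Python 3, leaves out the alarm and shows only the retry-with-increasing-pause part, using the same `2*tries` pause as tryAgain. The function `retry`, its `transient` and `pause` parameters, and the `Flaky` exception are hypothetical names for illustration; with Paramiko, `transient` might be `(AuthenticationException,)`.

```python
import time


def retry(function, transient, tries=3, pause=time.sleep):
    """Call function(), retrying when it raises one of the transient errors.

    Pauses 2 seconds after the first failure, 4 after the second, and so on.
    The exception from the final try is passed back up to the caller.
    """
    for attempt in range(1, tries + 1):
        try:
            return function()
        except transient as error:
            if attempt == tries:
                raise
            print('%s on try %d, retrying' % (type(error).__name__, attempt))
            pause(2 * attempt)


class Flaky(Exception):
    """Stand-in for a transient error such as a failed login."""


def make_flaky(failures):
    """Return a function that raises Flaky `failures` times, then succeeds."""
    state = {'left': failures}

    def attempt():
        if state['left'] > 0:
            state['left'] -= 1
            raise Flaky('try again')
        return 'connected'

    return attempt
```

Passing the pause function in as a parameter is only a convenience for trying the sketch out without waiting; `retry(make_flaky(1), (Flaky,))` uses the real `time.sleep` and pauses two seconds before the second try.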
