Retry SSH connections after transient error

Jerry Stratton, October 20, 2014

The Timeout class works great for retrying connections after they time out, but what about more prosaic errors? I’ve been getting a bunch of AuthenticationException errors in my Python/Paramiko connection attempts lately. I’d been catching all SSHExceptions (of which AuthenticationException is a subclass) and reporting the error, but this particular error is transient: it almost always goes away on the very next upload.

That makes it a perfect candidate for retrying the connection. I renamed the class from Timeout to Persistence, because this more generic class is going to be more persistent at making connections.[1]

  • import signal
  • from time import sleep
  • from paramiko import SSHException, AuthenticationException
  • # TimeoutError is built in to Python 3; Python 2 needs its own definition,
  • # such as: class TimeoutError(Exception): pass
  • class Persistence(object):
    • def __init__(self, function=None, seconds=30, tries=3, errorMessage='Timeout'):
      • self.seconds = seconds
      • self.tryLimit = tries
      • self.tries = 1
      • self.function = function
      • self.errorMessage = errorMessage
    • def act(self):
      • # SIGALRM interrupts the attempt after self.seconds (Unix only)
      • signal.signal(signal.SIGALRM, self.handleTimeout)
      • signal.alarm(self.seconds)
      • try:
        • self.function()
      • except AuthenticationException as error:
        • # cancel the pending alarm so it cannot fire during the retry delay
        • signal.alarm(0)
        • self.tryAgain(error, 'Authentication exception')
      • signal.alarm(0)
    • def tryAgain(self, exception, message):
      • if self.tries >= self.tryLimit:
        • # out of tries: pass the exception back up to the caller
        • raise exception
      • else:
        • print('%s try %s %s' % (message, self.tries, self.errorMessage))
        • # wait a little longer after each failure: 2 seconds, then 4
        • sleep(2*self.tries)
        • self.tries = self.tries + 1
        • self.act()
        • print('Succeeded on try %s' % self.tries)
    • def handleTimeout(self, signum, frame):
      • self.tryAgain(TimeoutError(self.errorMessage), 'Timed out')

All it really does is add a tryAgain method that can be called both by the handleTimeout method and by any except clause in act. If the connection still fails on the last allowed try (the third, by default), the exception is passed back up as normal.
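The retry policy described above can also be sketched on its own, without Paramiko or signals. This is a minimal illustration, not the code from this post: TransientError, retry, and FlakyConnection are hypothetical names standing in for AuthenticationException, Persistence, and a real connection, and a loop replaces the recursion in tryAgain. It is written for Python 3.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for paramiko's AuthenticationException."""


def retry(function, tries=3, delay=2, sleep=time.sleep):
    """Call function until it succeeds or `tries` attempts have failed.

    Waits delay * attempt seconds between attempts, mirroring the
    sleep(2*self.tries) backoff in Persistence. Returns a tuple of
    (result, number of attempts used).
    """
    for attempt in range(1, tries + 1):
        try:
            return function(), attempt
        except TransientError:
            if attempt >= tries:
                raise  # out of attempts: pass the exception back up
            sleep(delay * attempt)


class FlakyConnection(object):
    """Hypothetical connection that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def open(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError('simulated authentication failure')
        return 'connected'
```

With these definitions, retry(FlakyConnection(failures=2).open) waits 2 and then 4 seconds and returns ('connected', 3). Passing sleep=lambda seconds: None skips the waiting, which is handy when testing.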

    • def ensureConnection(self, purpose=None):
      • # requires "import socket" at the top of the module
      • try:
        • opener = Persistence(function=self.openConnection, errorMessage=purpose)
        • opener.act()
        • return True
      • except SSHException as errtext:
        • self.warning('Unable to open SSH connection:', errtext, purpose)
      • except TimeoutError as errtext:
        • # checked before socket.error, which in Python 3 is OSError, a parent of the built-in TimeoutError
        • self.warning('Timeout error connecting:', errtext)
      • except socket.error as errtext:
        • self.warning('Socket error connecting:', errtext, purpose)
      • except EOFError as errtext:
        • self.warning('Server has terminated with EOFError:', errtext, purpose)
      • return False
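
One thing to watch if a chain of except clauses like this is run under Python 3: socket.error is an alias of OSError there, and the built-in TimeoutError is a subclass of OSError, so a TimeoutError is caught by whichever of those two clauses comes first. The sketch below, using the hypothetical helper names classify and raiser, shows an ordering that keeps the timeout branch reachable. None of this applies to a TimeoutError class that you define yourself as a direct subclass of Exception.

```python
import socket


def classify(function):
    """Run function and report which except clause caught its exception."""
    try:
        function()
        return 'ok'
    except TimeoutError:
        # Listed first: under Python 3 this is a subclass of OSError,
        # which is what socket.error refers to.
        return 'timeout'
    except socket.error:
        return 'socket'
    except EOFError:
        return 'eof'


def raiser(exception):
    """Build a zero-argument function that raises the given exception."""
    def fail():
        raise exception
    return fail
```

Here classify(raiser(TimeoutError('slow'))) returns 'timeout'. Swapping the first two clauses would make the same call return 'socket' instead.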

  1. And not persistent in the sense of outliving its parent process.