Tuesday, November 2, 2010

Learning Twisted (part 8) - Anatomy of deferreds in Twisted

There are numerous posts and documents that explain, at a conceptual level, one of the central concepts in the Twisted framework: the deferred.

The book on Twisted network programming offers an analogy: a deferred is like the buzzer a restaurant owner hands to a waiting visitor. The buzzer notifies the visitor when the table is ready, so he can set aside whatever he has been doing and come over to take the table reserved for him.

Others describe a deferred as a placeholder, or a promise that is yet to be fulfilled. We can attach actions that should follow when the promise is fulfilled or broken. These actions form callback chains that are triggered when the deferred fires.

Deferreds allow you to set up follow-up actions for something that will take some time to complete. This in turn frees Twisted to attend to other tasks and come back to execute the follow-up actions once the result is available.

I will restrict myself to code commentary and the current behavior of deferreds.

Here is a simple piece of code that creates a deferred and adds callbacks that get executed when the deferred is fired.

Diag:
      +--------+
      |deferred|
      +--------+
          |
          |      +--+    \--/
          '......|f1|....|fe|
                 +--+    /--\

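from twisted.internet.defer import Deferred

def f1(result):
    # Callback: receives the value the deferred was fired with.
    print(result)
    return "f1 result"


def fe(result):
    # Errback: runs only when the chain is in the failure state.
    print('Callback on failure')
    print(result)

d = Deferred()

d.addCallback(f1)
d.addErrback(fe)

def traceit():
    # Fire the deferred with a success result.
    d.callback("defer results")

traceit()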
I am using the call tracing technique mentioned in one of my previous posts.
Output:
traceit              ----->                  callback
callback             ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                        f1
defer results
_runCallbacks        ----->                  passthru

Some observations:

  1. Deferreds are fired by calling the callback or errback method.
  2. _startRunCallbacks hands over to _runCallbacks, which runs through the list of registered callback functions and executes them.
  3. In this case it loops twice through the list, once for f1 and once for fe. fe is not executed, as it is registered to run only when an error occurs. In its place you can see the no-op function passthru.
  4. f1 is passed the deferred result, which in this case is the string "defer results".
  5. The callbacks are independent of the Twisted reactor; the reactor does not invoke these callback functions.
  6. When a deferred fires (deferred.callback is called), the callbacks are executed almost immediately. What I mean by "almost" is that they run as soon as possible; in the trace above, they run inside the callback call itself.
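
The passthru entries in the trace are easier to follow with a small model. As far as I understand it, addCallback and addErrback each register a (callback, errback) pair, with the do-nothing passthru filling the slot that was not supplied. What follows is a minimal sketch in plain Python, not Twisted's actual code: add_callback, add_errback and fire are names made up for illustration, and a plain Exception stands in for a Failure.

```python
def passthru(result):
    # Placeholder handler: hands the result on unchanged.
    return result

def add_callback(chain, func):
    # Hypothetical helper: success handler, passthru in the error slot.
    chain.append((func, passthru))

def add_errback(chain, func):
    # Hypothetical helper: passthru in the success slot, error handler.
    chain.append((passthru, func))

def fire(chain, result):
    # Walk the pairs in order and pick the slot matching the current result.
    called = []
    for callback, errback in chain:
        handler = errback if isinstance(result, Exception) else callback
        called.append(handler.__name__)
        result = handler(result)
    return called, result

def f1(result):
    return "f1 result"

def fe(failure):
    return None

chain = []
add_callback(chain, f1)
add_errback(chain, fe)

called, final = fire(chain, "defer results")
print(called)  # ['f1', 'passthru']
print(final)   # f1 result
```

The list of handler names mirrors the trace above: f1 runs for the success result, and the stage registered with the errback contributes only a passthru.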
You can also register multiple functions as callbacks on a deferred. Here, I am providing only the code added on top of the example above.

Diag:

      +--------+
      |deferred|
      +--------+
          |
          |      +--+   +--+    +--+    \--/
          '......|f1|...|f2|....|f3|....|fe|
                 +--+   +--+    +--+    /--\

def f2(result):
    print(result)
    return "f2 result"

def f3(result):
    print(result)
    return "f3 result"

d = Deferred()
d.addCallback(f1)
d.addCallback(f2)
d.addCallback(f3)
d.addErrback(fe)


Output:
traceit              ----->                  callback
callback             ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                        f1
defer results
_runCallbacks        ----->                        f2
f1 result
_runCallbacks        ----->                        f3
f2 result
_runCallbacks        ----->                  passthru

Some useful observations that we can make based on the call flow are:
  1. The callback functions are called one after another.
  2. Results are passed from one call to the next in the chain.
  3. All callback functions must accept a result.
    1. The first in the chain is passed the result with which deferred.callback is called. f1 is passed "defer results" and returns "f1 result".
  4. Each subsequent callback function receives the result returned by the previous callback function.
    1. f2 is passed "f1 result" and returns "f2 result", which is passed to f3.
  5. Given the way results are passed along, it makes sense for a callback function to return the original result passed to it, so that the other registered functions can make use of it.
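
Point 5 is worth seeing in action. As a rough sketch (plain Python, success path only; run_chain is a made-up helper, not part of Twisted), each callback's return value becomes the next callback's argument, so a callback that forgets to return hands None down the chain:

```python
def run_chain(callbacks, result):
    # Hypothetical stand-in for the success path of a deferred:
    # each return value becomes the next callback's input.
    seen = []
    for callback in callbacks:
        seen.append(result)
        result = callback(result)
    return seen, result

def log_only(result):
    # Forgets to return, so the next callback receives None.
    pass

def log_and_pass(result):
    # Returns its input, so the next callback still sees it.
    return result

def shout(result):
    return str(result).upper()

print(run_chain([log_only, shout], "defer results"))
# (['defer results', None], 'NONE')
print(run_chain([log_and_pass, shout], "defer results"))
# (['defer results', 'defer results'], 'DEFER RESULTS')
```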
Now, let's modify the above code to fire the deferred with a failure.
from twisted.python.failure import Failure

d = Deferred()
d.addCallback(f1)  
d.addCallback(f2)
d.addCallback(f3)
d.addErrback(fe) 

e = Exception('failed')
fail = Failure(e)

def traceit():
    d.errback(fail)

traceit()

Output:
traceit              ----->                   errback
errback              ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                        fe
Callback on failure
Some observations based on the call flow:
  1. _runCallbacks runs through the list of callbacks.
  2. It executes only those that are registered via addErrback. You can see three passthru calls, which are for f1, f2 and f3. These are not executed, as the deferred fires with an error (via errback).
  3. Error results are passed as Failure objects, which encapsulate an exception.

Now we add a twist to the way callbacks are registered.
Diag:

      +--------+
      |deferred|
      +--------+
          |
          |      +--+    \--/    +--+    +--+
          '......|f1|....|fe|....|f2|....|f3|
                 +--+    /--\    +--+    +--+

d = Deferred()
d.addCallback(f1)  
d.addErrback(fe) 
d.addCallback(f2)
d.addCallback(f3)

e = Exception('failed')
fail = Failure(e)

def traceit():
    d.errback(fail)

traceit()

The output:
traceit              ----->                   errback
errback              ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                        fe
Callback on failure
_runCallbacks        ----->                        f2
None
_runCallbacks        ----->                        f3
f2 result


There are some important observations to make here.
  1. The order in which you register your callback and errback functions determines how they are executed.
  2. In the above code, f1 does not run because errback is called. Function "fe" runs and returns a result that is not an error (it returns nothing, so the result is None, which is what f2 prints). This results in the execution of functions "f2" and "f3".

Now we make another small change: fe passes the failure object on by returning it.

def fe(result):
    print('Callback on failure')
    # Returning the failure keeps the chain in the failure state.
    return result

d = Deferred()
d.addCallback(f1) 
d.addErrback(fe) 

d.addCallback(f2)
d.addCallback(f3)

Output:
traceit              ----->                   errback
errback              ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                        fe
Callback on failure
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                  passthru
_runCallbacks        ----->              cleanFailure
cleanFailure         ----->              __getstate__
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: exceptions.Exception: failed

The observation to make here is that f2 and f3 are not executed, because fe returns a failure. Since no errback is registered after fe, the failure reaches the end of the chain unhandled, hence the "Unhandled error in Deferred" message. So what a handler returns matters: it decides whether the callbacks or the errbacks that follow will run, and we need to pay attention both to return values and to the order in which callbacks and errbacks are registered.

Continuing along the same lines, we add another catch-all errback function, fe1:

def fe1(result):
    print('fe1: Callback on failure')

d = Deferred()
d.addCallback(f1) 
d.addErrback(fe) 
d.addCallback(f2)
d.addCallback(f3)
d.addErrback(fe1)

Output:
traceit              ----->                   errback
errback              ----->        _startRunCallbacks
_startRunCallbacks   ----->             _runCallbacks
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                        fe
Callback on failure
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                  passthru
_runCallbacks        ----->                       fe1
fe1: Callback on failure

Now you can see that f2 and f3 are not executed, as fe passes on a failure, and the next registered errback, fe1, is called instead.

The correct picture of registered callbacks looks something like this:

Diag:

      +--------+
      |deferred|
      +--------+
          |
          |      +--+    \--/    +--+     +--+    \---/
          '......|f1|....|fe|....|f2|.....|f3|....|fe1|
                 +--+    /--\    +--+     +--+    /---\

When a deferred fires, the callbacks start getting executed in order. Depending on the current state of the result object, either the callback or the errback is executed.

  • If the deferred is fired with callback, then f1 is executed, and if f1 results in an error, fe also gets executed.
  • If the deferred is fired with errback, then f1 is not executed, but fe, which is the next errback registered in the sequence, gets executed.
    • If "fe" returns a positive result, then "f2" will get executed, and
    • if "fe" results in an error, "fe1", which is the next errback registered, gets executed.
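
These rules can be condensed into a toy model. This is only a sketch under simplifying assumptions, not how Twisted is implemented: run is a made-up function, handlers are plain (kind, function) tuples, and any Exception instance plays the role of a Failure. A handler that raises or returns an exception switches the chain to the errback side; anything else switches it to the callback side.

```python
def run(handlers, result):
    # handlers: list of ("cb", func) or ("eb", func) in registration order.
    executed = []
    for kind, func in handlers:
        failed = isinstance(result, Exception)
        if (kind == "eb") != failed:
            continue  # wrong kind for the current state: result passes through
        executed.append(func.__name__)
        try:
            result = func(result)
        except Exception as exc:
            result = exc  # a raising handler puts the chain in the failure state
    return executed, result

def f1(result):
    return "f1 result"

def f1_bad(result):
    raise ValueError("boom")

def f2(result):
    return "f2 result"

def f3(result):
    return "f3 result"

def fe(failure):
    return None  # recovers: returns a non-error value

def fe_pass(failure):
    return failure  # passes the failure on

def fe1(failure):
    return None

chain = [("cb", f1), ("eb", fe), ("cb", f2), ("cb", f3), ("eb", fe1)]
print(run(chain, "defer results")[0])      # ['f1', 'f2', 'f3']
print(run(chain, Exception("failed"))[0])  # ['fe', 'f2', 'f3']

chain[1] = ("eb", fe_pass)
print(run(chain, Exception("failed"))[0])  # ['fe_pass', 'fe1']

chain[0] = ("cb", f1_bad)
print(run(chain, "defer results")[0])      # ['f1_bad', 'fe_pass', 'fe1']
```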
It's already a long post. More on deferreds later. 
