Wednesday, September 22, 2010

Python and binary data - Part 3

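The file operations we normally use are line oriented:
# Open for writing, then close (this creates or truncates the file)
FILE = open(filename, "w")
FILE.close()
# Open for reading and print the file one line at a time
FILE = open(filename, "r")
for line in FILE:
    print line
FILE.close()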

We can also use byte-oriented I/O operations on these files:
FILE = open(filename, "r")
data = FILE.read(numBytes)  # This reads up to numBytes bytes from the file.
But if the file does not contain textual data, the contents may not be meaningful.

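It is much better to open such a file in binary mode, so that the bytes are returned exactly as they are stored (on Windows, text mode translates line endings):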
FILE = open(filename,"rb")
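
As a minimal sketch of why binary mode matters, the following writes a few non-textual bytes and reads them back unchanged. The scratch file and the sample payload are assumptions made up for this illustration, and the code uses Python 3 syntax (print as a function, `with` blocks) rather than the Python 2 style of the listings in this post.

```python
import os
import tempfile

# Sample payload containing bytes that text mode could alter or misread
payload = b"\x03\x00\x0d\x0a\xff"

# Hypothetical scratch file, used only for this demonstration
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)

# "wb" writes the bytes as they are, with no newline translation
with open(path, "wb") as f:
    f.write(payload)

# "rb" reads them back as they are; read(n) returns up to n bytes
with open(path, "rb") as f:
    first = f.read(2)
    rest = f.read()

os.remove(path)
print(first + rest == payload)
```

The `with` statement closes the file even if an error occurs, which replaces the explicit close() calls.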

Python objects and "C" struct storage
Python provides a different level of abstraction than "C" for data types such as integers, and the two languages also store them differently.
Data stored in binary files, or sent and received across the network, is a sequence of contiguous bytes. In Python, data such as a list is not necessarily stored as a contiguous chunk of bytes: the list holds references to separate objects.
C variable
     +-----+         +--------+
     |array|-------> |10|9|2|3|
     +-----+         +--------+
                     contiguous bytes

Python variable

     +-----+         +-------+
     |array|-------> |.|.|.|.|
     +-----+         +-------+
                      | | | |
                      | | | |
                     10 9 2 3

To handle such data, we need to convert Python values to "C" structs, i.e. pack them as contiguous bytes of data, and to unpack a contiguous chunk of bytes back into Python objects.

The "struct" module provides the facility to pack Python objects into a contiguous chunk of bytes and to unpack a chunk of bytes into Python values.
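
As a quick illustration of the idea, the sketch below takes the four values from the diagrams above and packs them into four contiguous bytes. The variable names are chosen only for this example, and the code uses Python 3 syntax, where the packed result is a bytes object.

```python
import struct

# A Python list: a sequence of references to separate int objects
values = [10, 9, 2, 3]

# "4b" means four signed chars, one byte each, stored back to back
packed = struct.pack("4b", *values)
print(len(packed))

# Unpacking turns the contiguous bytes back into Python ints
restored = list(struct.unpack("4b", packed))
print(restored)
```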

Packing a structure
The pack function takes a format string and one or more arguments, and returns a binary string. This looks very much like formatting a string, except that the output is not text but a chunk of bytes. In the format strings used below, "i" is an int, "h" is a short and "b" is a signed char.
import struct
import sys
print "Native byteorder: ", sys.byteorder
# If no byteorder is specified, native byteorder is used
# "ihb" = int (4 bytes), short (2 bytes), signed char (1 byte)
buffer = struct.pack("ihb", 3, 4, 5)
print "Byte chunk: ", repr(buffer)
print "Byte chunk unpacked: ", struct.unpack("ihb", buffer)
# Last element as a short ("h", 2 bytes) instead of a signed char ("b", 1 byte)
buffer = struct.pack("ihh", 3, 4, 5)
print "Byte chunk: ", repr(buffer)

Native byteorder:  little
Byte chunk:  '\x03\x00\x00\x00\x04\x00\x05'
Byte chunk unpacked:  (3, 4, 5)
Byte chunk:  '\x03\x00\x00\x00\x04\x00\x05\x00'
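
One detail worth checking when no prefix is given: native mode uses not only the native byte order but also the platform's C sizes and alignment, so padding bytes may appear between fields. The sketch below compares the size of the same format in native mode and in standard mode (the "=" prefix). The format "bi" is chosen only for illustration, and the native size is platform dependent, so it is printed rather than assumed.

```python
import struct

# Native mode (no prefix): C sizes and alignment, so padding may be inserted
native_size = struct.calcsize("bi")

# Standard mode ("=" prefix): native byte order, fixed sizes, no padding
standard_size = struct.calcsize("=bi")

print("native 'bi' size: %d" % native_size)
print("standard '=bi' size: %d" % standard_size)
```

In standard mode the size is always 1 + 4 = 5 bytes. On many common platforms the native size is 8, because the int is aligned to a 4-byte boundary.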

Use network byte order (big-endian, selected with the "!" prefix in the format string) to unpack data received from the network or to pack data before sending it over the network.

import struct
# If no byteorder is specified, native byteorder is used
buffer = struct.pack("hhh", 3, 4, 5)
print "Byte chunk native byte order: ", repr(buffer)
buffer = struct.pack("!hhh", 3, 4, 5)
print "Byte chunk network byte order: ", repr(buffer)

Byte chunk native byte order:  '\x03\x00\x04\x00\x05\x00'
Byte chunk network byte order:  '\x00\x03\x00\x04\x00\x05'
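
A typical use of network byte order is a fixed-size message header. The sketch below assumes a made-up header layout (a 2-byte message type followed by a 4-byte payload length); the layout and the helper names build_message and parse_message are hypothetical and do not belong to any real protocol. It uses Python 3 syntax.

```python
import struct

# Hypothetical header: message type (unsigned short), payload length (unsigned int)
HEADER_FORMAT = "!HI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes, no padding with "!"

def build_message(msg_type, payload):
    """Prefix the payload with a header in network byte order."""
    return struct.pack(HEADER_FORMAT, msg_type, len(payload)) + payload

def parse_message(data):
    """Split received bytes into the message type and the payload."""
    msg_type, length = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    return msg_type, data[HEADER_SIZE:HEADER_SIZE + length]

message = build_message(1, b"hello")
print(parse_message(message))
```

Because "!" fixes both the byte order and the field sizes, the sender and the receiver agree on the layout regardless of their native architectures.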

You can avoid the overhead of allocating a new buffer on every call by packing into a buffer that was created earlier, using pack_into.

import struct
from ctypes import create_string_buffer
bufferVar = create_string_buffer(8)
bufferVar2 = create_string_buffer(8)
# We use a buffer that has already been created
# provide format, buffer, offset and data
struct.pack_into("hhh", bufferVar, 0, 3, 4, 5)
print "Byte chunk: ", repr(bufferVar.raw)
struct.pack_into("hhh", bufferVar2, 2, 3, 4, 5)
print "Byte chunk: ", repr(bufferVar2.raw)

Byte chunk:  '\x03\x00\x04\x00\x05\x00\x00\x00'
Byte chunk:  '\x00\x00\x03\x00\x04\x00\x05\x00'
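
The same reuse idea can be sketched with a bytearray as the writable buffer and a precompiled struct.Struct object, so that the format string is parsed only once. This is a variation on the listing above, not a replacement for it; the names record and buf are made up for the example, and the code uses Python 3 syntax.

```python
import struct

# Precompiled format: three shorts, 6 bytes on common platforms
record = struct.Struct("hhh")

# One writable buffer, allocated once, with room for a 2-byte offset
buf = bytearray(record.size + 2)

for values in [(3, 4, 5), (6, 7, 8)]:
    # Overwrite the same region of the buffer on every iteration
    record.pack_into(buf, 2, *values)
    # unpack_from reads back from the same offset without slicing the buffer
    print(record.unpack_from(buf, 2))
```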

Unpacking, of course, is the reverse of packing: unpack(fmt, binary) returns a tuple of Python values, even when the format describes a single value.
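
A short sketch of unpacking, using Python 3 syntax: unpack expects the data to be exactly as long as the format describes, while unpack_from reads from an offset inside a larger chunk. The format and values are the ones used earlier in this post; the extra trailing byte is added only to trigger the error.

```python
import struct

fmt = "!hhh"
data = struct.pack(fmt, 3, 4, 5)

# The data length must match the format exactly
assert len(data) == struct.calcsize(fmt)
print(struct.unpack(fmt, data))

# A length mismatch raises struct.error instead of silently truncating
try:
    struct.unpack(fmt, data + b"\x00")
except struct.error as exc:
    print("unpack failed: %s" % exc)

# unpack_from reads one short starting at byte offset 2 (the second value)
second = struct.unpack_from("!h", data, 2)
print(second)
```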

