We'd like to provide a mechanism for ZODB users to save large, infrequently changing chunks of binary data (larger than roughly 1MB) into storage that makes later access to the object more memory- and space-efficient than the current strategies for accessing large objects.
Storing large, seldom-changing binary objects in ZODB requires somewhat complicated application logic. Large objects currently need to be broken up into many smaller ZODB records, primarily to reduce the amount of memory consumed when users wish to deal with the "blob" as a unit.
An example of using "straight" ZODB to store large binary content: The Zope 2 "Image"/"File" content objects use a "Pdata" class to service this requirement. Each Pdata object stores up to 64K of data. The "Image"/"File" object has a method which converts a file stream into multiple Pdata objects which are linked together in succession. A Pdata chain that is created from a 202KB file stream will end up looking like this:
              next            next            next            next
    64K Pdata ----> 64K Pdata ----> 64K Pdata ----> 10K Pdata ----> None
The None at the end of the Pdata chain delineates the end of the chain.
When a Zope "Image"/"File" object is rendered to the browser, the File object has a method which iterates over all the Pdata objects, sending each in turn to the remote browser via a Zope RESPONSE.write method call. It stops when the None pointer is encountered in the chain. This is fast and robust.
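The chunk-and-link scheme described above can be sketched in plain Python. This is only an illustration, not Zope's actual code: the `PdataSketch` class and the `build_chain` and `write_chain` helpers are hypothetical names, and the sketch leaves out persistence entirely.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64K per record, as described above


class PdataSketch:
    """Hypothetical stand-in for a Pdata record: one chunk plus a link."""

    def __init__(self, data):
        self.data = data
        self.next = None  # None marks the end of the chain


def build_chain(stream, chunk_size=CHUNK_SIZE):
    """Read a file-like object and return the head of a linked chunk chain."""
    head = tail = None
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        node = PdataSketch(chunk)
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def write_chain(head, write):
    """Walk the chain, passing each chunk to a write callable in turn."""
    node = head
    while node is not None:
        write(node.data)
        node = node.next


# A 202KB stream yields chunks of 64K, 64K, 64K and 10K.
payload = b"x" * (202 * 1024)
chain = build_chain(io.BytesIO(payload))

sizes = []
node = chain
while node is not None:
    sizes.append(len(node.data) // 1024)
    node = node.next
print(sizes)  # [64, 64, 64, 10]

# Sending the chain to a writer reproduces the original stream.
out = io.BytesIO()
write_chain(chain, out.write)
```

Here `out.write` plays the role that RESPONSE.write plays in Zope: the consumer sees one chunk at a time and never needs the whole payload in a single string.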
However, storing binary data in this way comes at a price. In order to increase the overall speed of access to objects stored within a ZODB storage, ZODB keeps an in-memory LRU cache of objects. When Pdata objects are accessed, they are placed in the cache and, if the cache is full, their insertion may evict other, possibly frequently accessed, objects from the cache.
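The eviction effect can be shown with a toy model. The sketch below assumes a simple cache bounded by object count and built on `collections.OrderedDict`. It is not ZODB's real cache implementation, and `TinyLRU` and the key names are made up for the example.

```python
from collections import OrderedDict


class TinyLRU:
    """Toy object-count LRU cache; an assumption, not ZODB's actual cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.evicted = []

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return
        self.items[key] = True
        if len(self.items) > self.capacity:
            # Drop the least recently used entry to make room.
            old_key, _ = self.items.popitem(last=False)
            self.evicted.append(old_key)


cache = TinyLRU(capacity=4)

# Objects the application touches all the time.
for key in ("folder", "index", "catalog"):
    cache.access(key)

# Serving one 202KB file touches four chunk records.
for i in range(4):
    cache.access("pdata-%d" % i)

print(cache.evicted)  # ['folder', 'index', 'catalog']
```

In this model a single file download pushes every previously cached object out, which is the cost the paragraph above describes.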
Also, in the case of Zope, when it serves up a large "file" from an object stored in ZODB, it tends to buffer data into a temporary file before actually sending content out to a remote browser. This requires IO/disk activity and CPU power.
It would be more efficient to actually store large objects as files (or at least as objects which could be opened as a stream) instead of storing them in a ZODB storage as many distinct objects that need to be reassembled into a unit at processing time.
We'd like to create a common API for blob-like objects, allowing people to create their own blob object implementations. We'd also like to create a "reference" Blob implementation.
The API for creating a blob may be used something like this within a method of a Persistent object:
    def createMyBlob(self):
        from ZODB.blob import FileBlob
        self.blob = FileBlob()
        # open() returns a file-like object; 'wb' opens it for binary writing
        fh = self.blob.open('wb')
        try:
            fh.write(b'some_data')
        finally:
            fh.close()
The API for retrieving all data from a blob (not a common thing to want to do, but useful for demonstration) within a method of a Persistent object might look something like this:
    def getAllMyBlobData(self):
        # Reads the entire blob into memory; for demonstration only
        fh = self.blob.open('rb')
        try:
            return fh.read()
        finally:
            fh.close()
Essentially, the "FileBlob" in the examples above is a persistent object with an "open" method that returns a file-like object. This is the minimum required API for blob objects.
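To make the minimum API concrete, here is a rough sketch of an object that satisfies it without using ZODB at all. `FileBlobSketch` and `copy_blob` are hypothetical names, the temporary-file backing is an assumption, and persistence and transaction handling are deliberately left out.

```python
import os
import tempfile


class FileBlobSketch:
    """Hypothetical blob: keeps a file path and hands out file objects."""

    def __init__(self, directory=None):
        fd, self._path = tempfile.mkstemp(suffix=".blob", dir=directory)
        os.close(fd)  # only the path is kept; open() reopens on demand

    def open(self, mode="rb"):
        if mode not in ("rb", "wb", "ab"):
            raise ValueError("unsupported mode: %r" % (mode,))
        return open(self._path, mode)


def copy_blob(blob, write, chunk_size=64 * 1024):
    """Stream a blob to a write callable without loading it whole."""
    total = 0
    with blob.open("rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            write(chunk)
            total += len(chunk)
    return total


blob = FileBlobSketch()
with blob.open("wb") as fh:
    fh.write(b"some_data")

received = []
copied = copy_blob(blob, received.append)
os.remove(blob._path)  # clean up the temporary file used by the sketch
```

Because the only contract is an "open" method returning a file-like object, a consumer such as `copy_blob` can stream any conforming implementation in fixed-size pieces, whatever the backing store is.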