Differences between revisions 34 and 35
Revision 34 as of 2009-11-23 08:36:31
Size: 16897
Editor: gabrielgrant
Comment: moved some extraneous info re:validators into a footnote
Revision 35 as of 2009-12-05 19:15:56
Size: 17790
Editor: gabrielgrant
Comment: Added a list of available hooks, and an example of when to use storm_invalidated. Adding examples of when to use the others would be good.
Deletions are marked like this. Additions are marked like this.
Line 277: Line 277:


== Hooks ==

Storm provides a few hooks - methods on Storm-managed objects triggered when certain operations are performed by the store.

=== __storm_invalidated__ ===

Called when an object becomes invalidated in Storm's internal cache.

This hook can be used when implementing a cache to store the result of calculations performed on Storm-managed data. By caching the results in the object itself, and then cleaning the cache in the `__storm_invalidated__` hook, the results are automatically invalidated whenever storm decides the underlying data is no longer valid.

=== __storm_pre_flush__ ===

Called just before an object is flushed to the database.

=== __storm_flushed__ ===

Called after an object has been flushed.

=== __storm_loaded__ ===

Called each time an object is loaded from the database. See the [[Tutorial]] for an example of its use.


----

Storm Manual

This is the detailed documentation for Storm. For a more hands-on approach, check out the Tutorial.

Databases

The create_database() function

The create_database() function has the following prototype:

database = create_database(uri)

The uri parameter may be either a URI string or a URI object. The following URIs are examples of acceptable syntax:

  • backend:database_name

  • backend://hostname/database_name

  • backend://hostname:port/database_name

  • backend://username:password@hostname/database_name

  • backend://hostname/database_name?option=value

  • backend://username@/database_name

MySQL

URIs

Example connection string:

mysql://username:password@hostname/database_name

Dependencies

MySQL support depends on the MySQLdb module for Python.

PostgreSQL

URIs

Example connection string:

postgres://username:password@hostname/database_name

Dependencies

PostgreSQL support depends on the psycopg2 module for Python.

SQLite

URIs

SQLite databases can be opened by using "sqlite" for the backend name (in the schema part of the URI). The hostname, username and password parts are ignored.

If no database name is specified, this will create an in-memory database:

database = create_database('sqlite:')

If a database name is specified, it may be a relative path (in which case it is relative to the current working directory), or an absolute path. Paths should be specified in the system dependent style (eg, with '\' as the file separator on windows).

For example, supposing the current working directory is /home/storm (on a unix system), the following statements all correspond to opening a database in "/home/storm/foo"

   1 >>> create_database('sqlite:foo')
   2 >>> create_database('sqlite:/home/storm/foo')
   3 >>> create_database('sqlite:///foo')
   4 >>> create_database('sqlite:////home/storm/foo')

If the database file doesn't already exist, it will be created when a Store object is first created using the returned Database object. If any of the intervening directories don't exist, or the path specified is inaccessible in some way, an exception (OperationalError) will be raised when a Store object is created with the returned Database object.

Note that SQLite databases generally consist of a single file, but associated temporary files are created in the same directory. The SQLite documentation suggests that you either place your SQLite databases on a journaling filesystem, or place them in a directory which contains no other files, to avoid a (small) potential cause of data loss in the case of system failure (See http://www.sqlite.org/lockingv3.html, section "How To Corrupt Your Database Files"). For this reason, it is often simplest to create an empty directory to hold each of your SQLite databases.

SQLite accepts one option in the option part of the URI (ie, following the filename part, separated from it by a "?"). This option specifies the time that SQLite will wait when trying to obtain a lock on the database. The timeout defaults to 5 seconds; the following command opens a database with a timeout of 0.3 seconds:

   1 >>> database = create_database("sqlite:foo?timeout=0.3")

Dependencies

SQLite support depends on the pysqlite2 module for Python.

The Store interface

The ResultSet interface

Properties

What is a property in Storm

Common constructor parameters

Most properties accept the following properties. All of them are optional.

  • name: The column name in the database. Setting it allows the class attribute name and the column name in the database schema to differ. If unset, the name will be considered the same for the column and the class attribute.

  • primary: If true, this property is considered to map the primary key of the database table. Compound keys should be defined using the class-level __storm_primary__ attribute instead (see the section about compound keys for more information).

  • default: The default value for the property.

  • default_factory: A factory which returns default values for the property. This should be used when the default value is mutable.

  • validator: a function (or other callable object) that takes three arguments: the object that the property is attached to, the attribute name as a string and the value that is being set. The validator should return the value that the property should be set to, and if the validator function raises an exception, then the property is not set at all. Validators can only be set on properties, not on references or reference sets1, but setting a validator on a foreign id property can be used to achieve the same result as having a validator on the reference itself. For example, if you have a reference like foo = Reference(foo_id, Foo.id), then executing "object.foo = bar" will end up doing "object.foo_id = bar.id", which will trigger foo_id's validator. 2

Defining compound keys

To define a compound key in Storm, use the __storm_primary__ class-level attribute. It should be set to a tuple with the names of the properties composing the primary key. For example:

   1 >>> class FooBar(object):
   2 ...     __storm_table__ = "foo_bar"
   3 ...     __storm_primary__ = "foo_id", "bar_id"
   4 ...     foo_id = Int()
   5 ...     bar_id = Int()

Table of properties vs. python vs. database types

Property

Python

PostgreSQL

MySQL

SQLite

Bool

bool

BOOL

TINYINT(1)

INT

Int

int, long

SERIAL, BIGSERIAL, SMALLINT, INT, BIGINT

TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT

INT

Float

float

FLOAT, REAL, DOUBLE PRECISION

FLOAT, REAL, DOUBLE PRECISION

REAL, FLOAT, DOUBLE

Decimal

Decimal

DECIMAL, NUMERIC, MONEY

DECIMAL, NUMERIC

TEXT, VARCHAR

Unicode

unicode

TEXT, VARCHAR, CHAR

TEXT, VARCHAR, CHAR

TEXT, VARCHAR

RawStr

str

BYTEA

BLOB, BINARY, VARBINARY

BLOB

Pickle

any

BYTEA

BLOB, BINARY, VARBINARY

BLOB

DateTime

datetime

TIMESTAMP

DATETIME, TIMESTAMP

TEXT, VARCHAR

Date

date

DATE

DATE

TEXT, VARCHAR

Time

time

TIME

TIME

TEXT, VARCHAR

TimeDelta

timedelta

INTERVAL

TEXT

TEXT, VARCHAR

List

list

ARRAY[]

?

TEXT, VARCHAR

References

One-to-one

Many-to-one

Many-to-many

Using on_remote

The Reference interface

The ReferenceSet interface

Expressions

When working with the repository, it is frequently necessary to write expressions which describe the data you want to manipulate (find, alter, or delete). In native SQL these expressions are most frequently found in the ‘WHERE’ portion of a query.

As an example, look at the query:

SELECT * FROM Foo WHERE Foo.name = ‘bar’

In this example, “Foo.name = ‘bar’” is an expression which describes the Foo objects to be returned.

Storm provides specialized expression classes and overloaded operators which permit you to build expressions using native Python syntax.

For example, to perform the previous query using Storm, you could use the statement:

   1 >>> store.find(Foo, Foo.name == u"bar")

Complex Expressions

Storm supports complex expressions, including those using AND, OR and IN. The following directions on building complex expressions were extracted from the Storm mailing list.

IN

An 'IN' expression evaluates true if its value is present within its list. An example of using an IN expression within Storm is provided below.

   1 >>> store.find(Foo, Foo.name.is_in([u"bar", u"baz"]))

An alternate syntax would be:

   1 from storm.expr import In
   2 store.find(FOO, In(Foo.name, ([u"bar", u"baz"])))

AND

An ‘AND’ expression evaluates true if all its sub-expressions each evaluate True. An example of using ‘AND’ is provided below:

   1 store.find(Foo, (Foo.name == u"bar") & (Foo.id == 5))

Note: The use of parenthesis around each sub expression is required due to operator precedence.

Alternately, the storm.expr.And class can be used explicitly:

   1 from storm.expr import And
   2 store.find(Foo, And(Foo.name == u"bar", Foo.id == 5))

Using an explicit ‘And’ is particularly useful if you have a loop or generator function which build lists of expressions. Since this is all serializable, you can easily build up a python list of expressions, and then And() it together, as shown in the following example:

   1 expressions = []
   2 for foo in foos:
   3         expressions += [ Foo.name.like(foo.name) ]
   4         expressions += [ Foo.gender == foo.gender ]
   5 store.find(Foo, And(*expressions))

OR

An ‘OR’ expression evaluates true if any of its sub-expressions evaluate True. The syntax for building OR expressions matches the syntax for AND expressions described above.

Managing stores

Working with multiple threads

Store objects are not threadsafe. This means that each Store must be accessed from only a single thread at a time. In fact, some databases (eg, SQLite) will require that the Store is only accessed by the thread in which it was created, which means that you can't keep a global pool of Stores and pass them to threads as needed, or simply implement a mutex around a single Store.

Therefore, the best policy is usually to create a Store object for each thread which needs one. One convenient way to manage this is to implement a "threadsafe" Store manager, which creates a Store for each thread when that thread first requests one, and then keeps the Store cached in a thread-local dictionary. This approach is taken by ZStorm to provide Store objects to each thread.

Note that Database objects are able to be shared between threads, and accessed concurrently, so you only need to create one of these for each underlying database.

SQLite and threads

Stores using SQLite, in particular, can be difficult to work with in a multi-threaded environment. This is because SQLite has very limited support for concurrent connections. In particular, any call which attempts to modify the database can only succeed if there are no other connections open on the database. Roughly, this translates into Storm as a requirement that no other Store objects exist for that database. If a call which attempts to modify the database is made while there are other connections open, the call check that there are no other connections, and if there are it will wait for a little while for those connections to go away (and new connections will be prevented from being opened while this wait is in progress). However, if the other connections don't go away within the timeout, the call will raise an OperationalError, with the message "database is locked".

By default, SQLite will wait for the other connections to go away for 5 seconds. However, this timeout can be changed by specifying a timeout option in the URI when opening the database. See the section on timeouts in SQLite database URIs for more details.

In most cases, if you get an exception from a call to the Store, you will need to call rollback() before continuing with use of the Store. However, in the specific case of a call which returns an OperationalError with the message "database is locked", the Store is left in a well defined state, such that you can simply retry the commit() call (possibly after taking some action to get rid of the other open connections to the database). However, unless you first do something to prevent the database being locked (such as closing another Store which is accessing the database) you will often just get the same "database is locked" exception, so it is often easiest to handle such a case like any other exception, and rollback the changes.

Warning: There is one exception to this: SQLite versions 3.3.0 to 3.3.17 inclusive have a bug which means that, if you get an OperationalError("database is locked") from any call other than commit(), further operations on the Store can lead to database corruption - unless you first call rollback(). More details of this bug are available at http://www.sqlite.org/cvstrac/wiki?p=CorruptionFollowingBusyError

It is not always possible to avoid getting "database is locked" exceptions; for example, the lock may be held by another process that you have little control over. However, it is often possible to structure your application to avoid concurrent connections blocking each other. One simple strategy is to perform all your access to SQLite from a single thread, using a single Store: this will mean that you won't get any concurrency, but won't have to wait for locks either. You could manage such a setup by having a threadsafe queue object, which any thread can put data to be written to the database on, or requests for data to be read from the database, and a single worker thread will then read items from the queue, perform the necessary actions on the Store, and put results back on a queue.

A slightly less restrictive setup is to restrict _write_ access to the Store to a single thread, but allow any thread to open a Store for read-only operations. As long as the read-only operations don't last long (specifically, they don't last longer than the timeout on the writer's Store), this will execute without error - however, read operations may have to wait for a commit to finish, which in turn may be waiting for other read operations to complete, so there could be noticeable delays in accessing information.

A different approach is to allow any thread to perform read or write access to the store, but to require that a mutex is obtained before a thread performs any write access to the database, and to require that write accesses are completed in a short time (such that readers don't timeout). The mutex is necessary to avoid a "deadlock" situation, where two writers begin a modification simultaneously, and thus cannot complete until the other goes away. Such deadlocks will be resolved, eventually, by a timeout, but no database access will be possible until the timeout finishes - better to use a mutex to prevent this situation occurring.

Whichever strategy you use (other than the single-worker-thread strategy), it is always possible that a call will take longer than you expect (perhaps due to an unusual spike in the server's load), resulting in other threads raising an OperationalError("database is locked") exception, so your code needs to anticipate this possibility, and handle it appropriately.

Finally, if you have a multi-threaded situation which requires greater concurrency, your best approach may well be to use a different database which is designed for this kind of situation.

Using a global function

Using a base class

Creating a backend

Testing

It's easy to test that Storm features are working with new backends. There are base test classes in tests/store/database.py and tests/store/base.py which may be inherited by database-specific test cases. Look at other backends to get a feeling of how it works.

If these tests pass with a new backend, you may be pretty sure that Storm is working fine with the created backend.

Hooks

Storm provides a few hooks - methods on Storm-managed objects triggered when certain operations are performed by the store.

__storm_invalidated__

Called when an object becomes invalidated in Storm's internal cache.

This hook can be used when implementing a cache to store the result of calculations performed on Storm-managed data. By caching the results in the object itself, and then cleaning the cache in the __storm_invalidated__ hook, the results are automatically invalidated whenever storm decides the underlying data is no longer valid.

__storm_pre_flush__

Called just before an object is flushed to the database.

__storm_flushed__

Called after an object has been flushed.

__storm_loaded__

Called each time an object is loaded from the database. See the Tutorial for an example of its use.


  1. This is more of a design decision than missing functionality, since it doesn't make much sense to have a validator attached to a reference but that is not run when the underlying field is changed. Also, a particular property might be involved in multiple references (e.g. a Reference(on_remote=True) or ReferenceSet() on the other class). If you were able to specify the validators on references and reference sets, then what would happen when conflicting validators exist? By having the validator on the property itself, it is unambiguous what will be run. (1)

  2. Note that if "bar" has not been flushed to the database yet (in which case it won't have a database ID yet), the validator won't run until the flush happens, though if you're using transactions properly, it shouldn't be a big issue. (2)

Manual (last edited 2009-12-05 20:36:43 by gabrielgrant)