Tutorial - Storm

Contents

It runs!
Importing
Basic definition
Creating a database and the store
Creating an object
The store of an object
Finding an object
Caching behavior
Flushing
Changing objects with the Store
Committing
Rolling back
Constructors
References and subclassing
Many-to-one reference sets
Many-to-many reference sets and composed keys
Joins
Sub-selects
Ordering and limiting results
Multiple types with one query
The Storm base class
Loading hook
Executing expressions
Auto-reloading values
Expression values
Aliases
Debugging
Much more!

It runs!

This Storm tutorial is included in the source code at tests/tutorial.txt, so that it may be tested and stay up to date.

Storm is packaged for Ubuntu by the Storm Team:

sudo add-apt-repository ppa:storm/ppa
sudo apt-get update
sudo apt-get install python-storm

Importing

Let's start by importing some names into the namespace.

   1 >>> from storm.locals import *
   2 >>>

Basic definition

Now we define a type with some properties describing the information we're about to map.

   1 >>> class Person(object):
   2 ...     __storm_table__ = "person"
   3 ...     id = Int(primary=True)
   4 ...     name = Unicode()

Notice that this has no Storm-defined base class or constructor.

Creating a database and the store

We still don't have anyone to talk to, so let's define an in-memory SQLite database to play with, and a store using that database.

   1 >>> database = create_database("sqlite:")
   2 >>> store = Store(database)
   3 >>>

Three databases are supported at the moment: SQLite, MySQL and PostgreSQL. The parameter passed to create_database() is an URI, as follows:

database = create_database("scheme://username:password@hostname:port/database_name")

The scheme may be "sqlite", "postgres", or "mysql".

Now we have to create the table that will actually hold the data for our class.

   1 >>> store.execute("CREATE TABLE person "
   2 ...               "(id INTEGER PRIMARY KEY, name VARCHAR)")
   3 <storm.databases.sqlite.SQLiteResult object at 0x...>

We got a result back, but we don't care about it for now. We could also use noresult=True to avoid the result entirely.

Creating an object

Let's create an object of the defined class.

   1 >>> joe = Person()
   2 >>> joe.name = u"Joe Johnes"
   3 
   4 >>> print "%r, %r" % (joe.id, joe.name)
   5 None, u'Joe Johnes'

So far this object has no connection to a database. Let's add it to the store we've created above.

   1 >>> store.add(joe)
   2 <Person object at 0x...>
   3 
   4 >>> print "%r, %r" % (joe.id, joe.name)
   5 None, u'Joe Johnes'

Notice that the object wasn't changed, even after being added to the store. That's because it wasn't flushed yet.

The store of an object

Once an object is added to a store, or retrieved from a store, it's relation to that store is known. We can easily verify which store an object is bound.

>>> Store.of(joe) is store
True

>>> Store.of(Person()) is None
True

Finding an object

Now, what would happen if we actually asked the store to give us the person named Joe Johnes?

   1 >>> person = store.find(Person, Person.name == u"Joe Johnes").one()
   2 
   3 >>> print "%r, %r" % (person.id, person.name)
   4 1, u'Joe Johnes'
   5 
   6 >>>

The person is there! Yeah, ok, you were expecting it.

We can also retrieve the object using its primary key.

   1 >>> store.get(Person, 1).name
   2 u'Joe Johnes'

Caching behavior

One interesting thing is that this person is actually Joe, right? We've just added this object, so there's only one Joe, why would there be two different objects? There isn't.

   1 >>> person is joe
   2 True

What's going on behind the scenes is that each store has an object cache. When an object is linked to a store, it will be cached by the store for as long as there's a reference to the object somewhere, or while the object is dirty (has unflushed changes).

Storm ensures that at least a certain number of recently used objects will stay in memory inside the transaction, so that frequently used objects are not retrieved from the database too often.

Flushing

When we tried to find Joe in the database for the first time, we've noticed that the id property was magically assigned. This happened because the object was flushed implicitly so that the operation would affect any pending changes as well.

Flushes may also happen explicitly.

   1 >>> mary = Person()
   2 >>> mary.name = u"Mary Margaret"
   3 >>> store.add(mary)
   4 <Person object at 0x...>
   5 
   6 >>> print "%r, %r" % (mary.id, mary.name)
   7 None, u'Mary Margaret'
   8 
   9 >>> store.flush()
  10 >>> print "%r, %r" % (mary.id, mary.name)
  11 2, u'Mary Margaret'

Changing objects with the Store

Besides changing objects as usual, we can also benefit from the fact that objects are tied to a database to change them using expressions.

   1 >>> store.find(Person, Person.name == u"Mary Margaret").set(name=u"Mary Maggie")
   2 >>> mary.name
   3 u'Mary Maggie'

This operation will touch every matching object in the database, and also objects that are alive in memory.

Committing

Everything we've done so far is inside a transaction. At this point, we can either make these changes and any pending uncommitted changes persistent by committing them, or we can undo everything by rolling them back.

We'll commit them, with something as simple as

   1 >>> store.commit()
   2 >>>

That was straightforward. Everything is still the way it was, but now changes are there "for real".

Rolling back

Aborting changes is very straightforward as well.

   1 >>> joe.name = u"Tom Thomas"
   2 >>>

Let's see if these changes are really being considered by Storm and by the database.

   1 >>> person = store.find(Person, Person.name == u"Tom Thomas").one()
   2 >>> person is joe
   3 True

Yes, they are. Now, for the magic step (suspense music, please).

   1 >>> store.rollback()
   2 >>>

Erm.. nothing happened?

Actually, something happened.. with Joe. He's back!

   1 >>> print "%r, %r" % (joe.id, joe.name)
   2 1, u'Joe Johnes'

Constructors

So, we've been working for too long with people only. Let's introduce a new kind of data in our model: companies. For the company, we'll use a constructor, just for the fun of it. It will be the simplest company class you've ever seen:

   1 >>> class Company(object):
   2 ...     __storm_table__ = "company"
   3 ...     id = Int(primary=True)
   4 ...     name = Unicode()
   5 ...
   6 ...     def __init__(self, name):
   7 ...         self.name = name

Notice that the constructor parameter isn't optional. It could be optional, if we wanted, but our companies always have names.

Let's add the table for it.

   1 >>> store.execute("CREATE TABLE company "
   2 ...               "(id INTEGER PRIMARY KEY, name VARCHAR)", noresult=True)

Then, create a new company.

>>> circus = Company(u"Circus Inc.")

>>> print "%r, %r" % (circus.id, circus.name)
None, u'Circus Inc.'

The id is still undefined because we haven't flushed it. In fact, we haven't even added the company to the store. We'll do that soon. Watch out.

References and subclassing

Now we want to assign some employees to our company. Rather than redoing the Person definition, we'll keep it as it is, since it's general, and will create a new subclass of it for employees, which include one extra field: the company id.

   1 >>> class Employee(Person):
   2 ...     __storm_table__ = "employee"
   3 ...     company_id = Int()
   4 ...     company = Reference(company_id, Company.id)
   5 ...
   6 ...     def __init__(self, name):
   7 ...         self.name = name

Pay attention to that definiton for a moment. Notice that it doesn't define what's already in person, and introduces the company_id, and a company property, which is a reference to another class. It also has a constructor, but which leaves the company alone.

As usual, we need a table. SQLite has no idea of what a foreign key is, so we'll not bother to define it.

   1 >>> store.execute("CREATE TABLE employee "
   2 ...               "(id INTEGER PRIMARY KEY, name VARCHAR, company_id INTEGER)",
   3 ...               noresult=True)

Let's give life to Ben now.

   1 >>> ben = store.add(Employee(u"Ben Bill"))
   2 
   3 >>> print "%r, %r, %r" % (ben.id, ben.name, ben.company_id)
   4 None, u'Ben Bill', None

We can see that they were not flushed yet. Even then, we can say that Bill works on Circus.

   1 >>> ben.company = circus
   2 
   3 >>> print "%r, %r" % (ben.company_id, ben.company.name)
   4 None, u'Circus Inc.'

Of course, we still don't know the company id since it was not flushed to the database yet, and we didn't assign an id explicitly. Storm is keeping the relationship even then.

If whatever is pending is flushed to the database (implicitly or explicitly), objects will get their ids, and any references are updated as well (before being flushed!).

   1 >>> store.flush()
   2 
   3 >>> print "%r, %r" % (ben.company_id, ben.company.name)
   4 1, u'Circus Inc.'

They're both flushed to the database. Now, notice that the Circus company wasn't added to the store explicitly in any moment. Storm will do that automatically for referenced objects, for both objects (the referenced and the referencing one).

Let's create another company to check something. This time we'll flush the store just after adding it.

   1 >>> sweets = store.add(Company(u"Sweets Inc."))
   2 >>> store.flush()
   3 >>> sweets.id
   4 2

Nice, we've already got the id of the new company. So, what would happen if we changed just the id for Ben's company?

   1 >>> ben.company_id = 2
   2 >>> ben.company.name
   3 u'Sweets Inc.'
   4 >>> ben.company is sweets
   5 True

Hah! That wasn't expected, was it?

Let's commit everything.

   1 >>> store.commit()
   2 >>>

Many-to-one reference sets

So, while our model says that employees work for a single company (we only design normal people here), companies may of course have multiple employees. We represent that in Storm using reference sets.

We won't define the company again. Instead, we'll add a new attribute to the class.

   1 >>> Company.employees = ReferenceSet(Company.id, Employee.company_id)
   2 >>>

Without any further work, we can already see which employees are working for a given company.

   1 >>> sweets.employees.count()
   2 1
   3 
   4 >>> for employee in sweets.employees:
   5 ...     print "%r, %r" % (employee.id, employee.name)
   6 ...     print employee is ben
   7 ...
   8 1, u'Ben Bill'
   9 True

Let's create another employee, and add him to the company, rather than setting the company in the employee (it sounds better, at least).

   1 >>> mike = store.add(Employee(u"Mike Mayer"))
   2 >>> sweets.employees.add(mike)
   3 >>>

That, of course, means that Mike's working for a company, and so it should be reflected elsewhere.

   1 >>> mike.company_id
   2 2
   3 
   4 >>> mike.company is sweets
   5 True

Many-to-many reference sets and composed keys

We want to represent accountants in our model as well. Companies have accountants, but accountants may also attend several companies, so we'll represent that using a many-to-many relationship.

Let's create a simple class to use with accountants, and the relationship class.

   1 >>> class Accountant(Person):
   2 ...     __storm_table__ = "accountant"
   3 ...     def __init__(self, name):
   4 ...         self.name = name
   5 
   6 >>> class CompanyAccountant(object):
   7 ...     __storm_table__ = "company_accountant"
   8 ...     __storm_primary__ = "company_id", "accountant_id"
   9 ...     company_id = Int()
  10 ...     accountant_id = Int()

Hey, we've just declared a class with a composed key!

Now, let's use it to declare the many-to-many relationship in the company. Once more, we'll just stick the new attribute in the existent object. It may easily be defined at class definition time. Later we'll see another way to do that as well.

   1 >>> Company.accountants = ReferenceSet(Company.id,
   2 ...                                    CompanyAccountant.company_id,
   3 ...                                    CompanyAccountant.accountant_id,
   4 ...                                    Accountant.id)

Done! The order in which attributes were defined is important, but the logic should be pretty obvious.

We're missing some tables, at this point.

   1 >>> store.execute("CREATE TABLE accountant "
   2 ...               "(id INTEGER PRIMARY KEY, name VARCHAR)", noresult=True)
   3 ...
   4 
   5 >>> store.execute("CREATE TABLE company_accountant "
   6 ...               "(company_id INTEGER, accountant_id INTEGER,"
   7 ...               " PRIMARY KEY (company_id, accountant_id))", noresult=True)

Let's give life to a couple of accountants, and register them in both companies.

   1 >>> karl = Accountant(u"Karl Kent")
   2 >>> frank = Accountant(u"Frank Fourt")
   3 
   4 >>> sweets.accountants.add(karl)
   5 >>> sweets.accountants.add(frank)
   6 
   7 >>> circus.accountants.add(frank)
   8 >>>

That's it! Really! Notice that we didn't even have to add them to the store, since it happens implicitly by linking to the other object which is already in the store, and that we didn't have to declare the relationship object, since that's known to the reference set.

We can now check them.

>>> sweets.accountants.count()
2

>>> circus.accountants.count()
1

Even though we didn't use the CompanyAccountant object explicitly, we can check it if we're really curious.

   1 >>> store.get(CompanyAccountant, (sweets.id, frank.id))
   2 <CompanyAccountant object at 0x...>

Notice that we pass a tuple for the get() method, due to the composed key.

If we wanted to know for which companies accountants are working, we could easily define a reversed relationship:

   1 >>> Accountant.companies = ReferenceSet(Accountant.id,
   2 ...                                     CompanyAccountant.accountant_id,
   3 ...                                     CompanyAccountant.company_id,
   4 ...                                     Company.id)
   5 
   6 >>> [company.name for company in frank.companies]
   7 [u'Circus Inc.', u'Sweets Inc.']
   8 
   9 >>> [company.name for company in karl.companies]
  10 [u'Sweets Inc.']

Joins

Since we've got some nice data to play with, let's try to make a few interesting queries.

Let's start by checking which companies have at least one employee named Ben. We have at least two ways to do it.

First, with an implicit join.

   1 >>> result = store.find(Company,
   2 ...                     Employee.company_id == Company.id,
   3 ...                     Employee.name.like(u"Ben %"))
   4 ...
   5 
   6 >>> [company.name for company in result]
   7 [u'Sweets Inc.']

Then, we can also do an explicit join. This is interesting for mapping complex SQL joins to Storm queries.

   1 >>> origin = [Company, Join(Employee, Employee.company_id == Company.id)]
   2 >>> result = store.using(*origin).find(Company, Employee.name.like(u"Ben %"))
   3 
   4 >>> [company.name for company in result]
   5 [u'Sweets Inc.']

If we already had the company, and wanted to know which of his employees were named Ben, that'd have been easier.

   1 >>> result = sweets.employees.find(Employee.name.like(u"Ben %"))
   2 
   3 >>> [employee.name for employee in result]
   4 [u'Ben Bill']

Sub-selects

Suppose we want to find all accountants that aren't associated with a company. We can use a sub-select to get the data we want.

   1 >>> laura = Accountant(u"Laura Montgomery")
   2 >>> store.add(laura)
   3 <Accountant ...>
   4 
   5 >>> subselect = Select(CompanyAccountant.accountant_id, distinct=True)
   6 >>> result = store.find(Accountant, Not(Accountant.id.is_in(subselect)))
   7 >>> result.one() is laura
   8 True
   9 >>>

Ordering and limiting results

Ordering and limiting results obtained are certainly among the simplest and yet most wanted features for such tools, so we want to make them very easy to understand and use, of course.

A code of line is worth a thousand words, so here are a few examples that demonstrate how it works:

>>> garry = store.add(Employee(u"Garry Glare"))

>>> result = store.find(Employee)

>>> [employee.name for employee in result.order_by(Employee.name)]
[u'Ben Bill', u'Garry Glare', u'Mike Mayer']

>>> [employee.name for employee in result.order_by(Desc(Employee.name))]
[u'Mike Mayer', u'Garry Glare', u'Ben Bill']

>>> [employee.name for employee in result.order_by(Employee.name)[:2]]
[u'Ben Bill', u'Garry Glare']

Multiple types with one query

Sometimes, it may be interesting to retrieve more than one object involved in a given query. Imagine, for instance, that besides knowing which companies have an employee named Ben, we also want to know who is the employee. This may be achieved with a query like follows:

   1 >>> result = store.find((Company, Employee),
   2 ...                     Employee.company_id == Company.id,
   3 ...                     Employee.name.like(u"Ben %"))
   4 
   5 >>> [(company.name, employee.name) for company, employee in result]
   6 [(u'Sweets Inc.', u'Ben Bill')]

The Storm base class

So far we've been defining our references and reference sets using classes and their properties. This has some advantages, like being easier to debug, but also has some disadvantages, such as requiring classes to be present in the local scope, what potentially leads to circular import issues.

To prevent that kind of situation, Storm supports defining these references using the stringified version of the class and property names. The only inconvenience of doing so is that all involved classes must inherit from the Storm base class.

Let's define some new classes to show that. To expose the point, we'll refer to a class before it's actually defined.

   1 >>> class Country(Storm):
   2 ...     __storm_table__ = "country"
   3 ...     id = Int(primary=True)
   4 ...     name = Unicode()
   5 ...     currency_id = Int()
   6 ...     currency = Reference(currency_id, "Currency.id")
   7 
   8 >>> class Currency(Storm):
   9 ...     __storm_table__ = "currency"
  10 ...     id = Int(primary=True)
  11 ...     symbol = Unicode()
  12 
  13 >>> store.execute("CREATE TABLE country "
  14 ...               "(id INTEGER PRIMARY KEY, name VARCHAR, currency_id INTEGER)",
  15 ...               noresult=True)
  16 
  17 >>> store.execute("CREATE TABLE currency "
  18 ...               "(id INTEGER PRIMARY KEY, symbol VARCHAR)", noresult=True)

Now, let's see if it works.

   1 >>> real = store.add(Currency())
   2 >>> real.id = 1
   3 >>> real.symbol = u"BRL"
   4 
   5 >>> brazil = store.add(Country())
   6 >>> brazil.name = u"Brazil"
   7 >>> brazil.currency_id = 1
   8 
   9 >>> brazil.currency.symbol
  10 u'BRL'

Questions!?

Loading hook

Storm allows classes to define a few different hooks are called to act when certain things happen. One of the interesting hooks available is the __storm_loaded__ one.

Let's play with it. We'll define a temporary subclass of Person for that.

   1 >>> class PersonWithHook(Person):
   2 ...     def __init__(self, name):
   3 ...         print "Creating %s" % name
   4 ...         self.name = name
   5 ...
   6 ...     def __storm_loaded__(self):
   7 ...         print "Loaded %s" % self.name
   8 
   9 
  10 >>> earl = store.add(PersonWithHook(u"Earl Easton"))
  11 Creating Earl Easton
  12 
  13 >>> earl = store.find(PersonWithHook, name=u"Earl Easton").one()
  14 
  15 >>> store.invalidate(earl)
  16 >>> del earl
  17 >>> import gc
  18 >>> collected = gc.collect()
  19 
  20 >>> earl = store.find(PersonWithHook, name=u"Earl Easton").one()
  21 Loaded Earl Easton

Note that in the first find, nothing was called, since the object was still in memory and cached. Then, we invalidated the object from Storm's internal cache and ensured that it was out-of-memory by triggering a garbage collection. After that, the object had to be retrieved from the database again, and thus the hook was called (and not the constructor!).

Executing expressions

Storm also offers a way to execute expressions in a database-agnostic way, when that's necessary.

For instance:

   1 >>> result = store.execute(Select(Person.name, Person.id == 1))
   2 >>> result.get_one()
   3 (u'Joe Johnes',)

This mechanism is used internally by Storm itself to implement the higher level features.

Auto-reloading values

Storm offers some special values that may be assigned to attributes under its control. One of these values is AutoReload. When used, it will make the object automatically reload the value from the database when touched. Even primary keys may benefit from its use, as shown below.

>>> from storm.locals import AutoReload

>>> ruy = store.add(Person())
>>> ruy.name = u"Ruy"
>>> print ruy.id
None

>>> ruy.id = AutoReload
>>> print ruy.id
4

This may be set as the default value for any attribute, making the object be automatically flushed if necessary.

Expression values

Besides auto-reloading, it's also possible to assign what we call a "lazy expression" to an attribute. Such expressions are flushed to the database when the attribute is accessed, or when the object is flushed to the database (INSERT/UPDATE time).

For instance:

   1 from storm.locals import SQL
   2 
   3 >>> ruy.name = SQL("(SELECT name || ? FROM person WHERE id=4)", (" Ritcher",))
   4 >>> ruy.name
   5 u'Ruy Ritcher'

Notice that this is just an example of what may be done. There's no need to write SQL statements this way, if you don't want to. You may also use class-based SQL expressions provided in Storm, or even not use lazy expressions at all.

Aliases

So now let's say that we want to find every pair of people that work for the same company. I have no idea about why one would want to do that, but that's a good case for us to exercise aliases.

First, we import ClassAlias into the local namespace (mental note: this should be in storm.locals as well), and create a reference to it.

   1 >>> from storm.info import ClassAlias
   2 >>> AnotherEmployee = ClassAlias(Employee)

Nice, isn't it?

Now we can easily make the query we want, in a straightforward way:

   1 >>> result = store.find((Employee, AnotherEmployee),
   2 ...                     Employee.company_id == AnotherEmployee.company_id,
   3 ...                     Employee.id > AnotherEmployee.id)
   4 
   5 >>> for employee1, employee2 in result:
   6 ...     print (employee1.name, employee2.name)
   7 (u'Mike Mayer', u'Ben Bill')

Woah! Mike and Ben work for the same company!

(Quiz for the attent reader: why is greater than being used in the query above?)

Debugging

Sometimes you just need to see which statements Storm is executing. A debug tracer built on top of Storm's tracing system can be used to see what's going on under the hood. A tracer is an object that gets notified when interesting events occur, such as when Storm executes a statement. A function to enable and disable statement tracing is provided. Statements are logged to sys.stderr by default, but a custom stream may also be used.

   1 >>> import sys
   2 >>> from storm.tracer import debug
   3 
   4 >>> debug(True, stream=sys.stdout)
   5 >>> result = store.find((Employee, AnotherEmployee),
   6 ...                     Employee.company_id == AnotherEmployee.company_id,
   7 ...                     Employee.id > AnotherEmployee.id)
   8 >>> list(result)
   9 EXECUTE: 'SELECT employee.company_id, employee.id, employee.name, "...".company_id, "...".id, "...".name FROM employee, employee AS "..." WHERE employee.company_id = "...".company_id AND employee.id > "...".id', ()
  10 [(<Employee object at ...>, <Employee object at ...>)]
  11 
  12 >>> debug(False)
  13 >>> list(result)
  14 [(<Employee object at ...>, <Employee object at ...>)]

Much more!

There's a lot more about Storm to be shown. This tutorial is just a way to get initiated on some of the concepts. While your questions are not answered somewhere else, feel free to ask them in the mailing list.

Wiki

Page

User

Trail