20395
Comment:
|
24198
Update the tutorial from trunk to include the new 'Debugging' section.
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
Line 135: | Line 134: |
Line 153: | Line 151: |
Let's try to show that this is the case in practice. {{{#!python >>> del person >>> joe.tainted = True >>> joe.tainted True >>> joe = store.get(Person, 1) >>> joe.tainted True }}} Now, let's get rid of the last reference, and see if it's still tainted. {{{#!python >>> del joe >>> import gc >>> collected = gc.collect() >>> joe = store.get(Person, 1) >>> joe.tainted Traceback (most recent call last): ... AttributeError: 'Person' object has no attribute 'tainted' >>> print "%r, %r" % (joe.id, joe.name) 1, u'Joe Johnes' }}} So, basically the last reference disapeared, and the object got deallocated, so the store has loaded the object again for us. |
Storm ensures that at least a certain number of recently used objects will stay in memory inside the transaction, so that frequently used objects are not retrieved from the database too often. |
Line 644: | Line 612: |
==== Sub-selects ==== Suppose we want to find all accountants that aren't associated with a company. We can use a sub-select to get the data we want. {{{#!python >>> laura = Accountant(u"Laura Montgomery") >>> store.add(laura) <Accountant ...> >>> subselect = Select(CompanyAccountant.accountant_id, distinct=True) >>> result = store.find(Accountant, Not(Accountant.id.is_in(subselect))) >>> result.one() is laura True >>> }}} ==== Ordering and limiting results ==== Ordering and limiting results obtained are certainly among the simplest and yet most wanted features for such tools, so we want to make them very easy to understand and use, of course. A code of line is worth a thousand words, so here are a few examples that demonstrate how it works: {{{ >>> garry = store.add(Employee(u"Garry Glare")) >>> result = store.find(Employee) >>> [employee.name for employee in result.order_by(Employee.name)] [u'Ben Bill', u'Garry Glare', u'Mike Mayer'] >>> [employee.name for employee in result.order_by(Desc(Employee.name))] [u'Mike Mayer', u'Garry Glare', u'Ben Bill'] >>> [employee.name for employee in result.order_by(Employee.name)[:2]] [u'Ben Bill', u'Garry Glare'] }}} ==== Multiple types with one query ==== Sometimes, it may be interesting to retrieve more than one object involved in a given query. Imagine, for instance, that besides knowing which companies have an employee named Ben, we also want to know who is the employee. This may be achieved with a query like follows: {{{#!python >>> result = store.find((Company, Employee), ... Employee.company_id == Company.id, ... Employee.name.like(u"Ben %")) >>> [(company.name, employee.name) for company, employee in result] [(u'Sweets Inc.', u'Ben Bill')] }}} |
|
Line 726: | Line 756: |
>>> store.invalidate(earl) | |
Line 727: | Line 758: |
>>> import gc | |
Line 735: | Line 767: |
was still in memory and cached. When we deleted the object and ensured that it was collected, the object had to be retrieved from the database again, and thus the hook was called (and not the constructor!). |
was still in memory and cached. Then, we invalidated the object from Storm's internal cache and ensured that it was out-of-memory by triggering a garbage collection. After that, the object had to be retrieved from the database again, and thus the hook was called (and not the constructor!). |
Line 787: | Line 820: |
Besides auto-reloading, it's also possible to assign a value that will run an arbitrary SQL expression to update a value. For instance: |
Besides auto-reloading, it's also possible to assign what we call a "lazy expression" to an attribute. Such expressions are flushed to the database when the attribute is accessed, or when the object is flushed to the database (INSERT/UPDATE time). For instance: |
Line 805: | Line 842: |
==== Aliases ==== So now let's say that we want to find every pair of people that work for the same company. I have no idea about why one would ''want'' to do that, but that's a good case for us to exercise aliases. First, we import `ClassAlias` into the local namespace (''mental note: this should be in storm.locals as well''), and create a reference to it. {{{#!python >>> from storm.info import ClassAlias >>> AnotherEmployee = ClassAlias(Employee) }}} Nice, isn't it? Now we can easily make the query we want, in a straightforward way: {{{#!python >>> result = store.find((Employee, AnotherEmployee), ... Employee.company_id == AnotherEmployee.company_id, ... Employee.id > AnotherEmployee.id) >>> for employee1, employee2 in result: ... print (employee1.name, employee2.name) (u'Mike Mayer', u'Ben Bill') }}} Woah! Mike and Ben work for the same company! (Quiz for the attent reader: why is ''greater than'' being used in the query above?) ==== Debugging ==== Sometimes you just need to see which statements Storm is executing. A debug tracer built on top of Storm's tracing system can be used to see what's going on under the hood. A tracer is an object that gets notified when interesting events occur, such as when Storm executes a statement. A function to enable and disable statement tracing is provided. Statements are logged to sys.stderr by default, but a custom stream may also be used. {{{#!python >>> import sys >>> from storm.tracer import debug >>> debug(True, stream=sys.stdout) >>> result = store.find((Employee, AnotherEmployee), ... Employee.company_id == AnotherEmployee.company_id, ... Employee.id > AnotherEmployee.id) >>> list(result) EXECUTE: 'SELECT employee.company_id, employee.id, employee.name, "...".company_id, "...".id, "...".name FROM employee, employee AS "..." WHERE employee.company_id = "...".company_id AND employee.id > "...".id', () [(<Employee object at ...>, <Employee object at ...>)] >>> debug(False) >>> list(result) [(<Employee object at ...>, <Employee object at ...>)] }}} |
It runs!
This Storm tutorial is included in the source code at tests/tutorial.txt, so that it may be tested and stay up to date.
Importing
Let's start by importing some names into the namespace.
Basic definition
Now we define a type with some properties describing the information we're about to map.
Notice that this has no Storm-defined base class or constructor.
Creating a database and the store
We still don't have anyone to talk to, so let's define an in-memory SQLite database to play with, and a store using that database.
Three databases are supported at the moment: SQLite, MySQL and PostgreSQL. The parameter passed to create_database() is an URI, as follows:
database = create_database("scheme://username:password@hostname:port/database_name")
The scheme may be "sqlite", "postgres", or "mysql".
Now we have to create the table that will actually hold the data for our class.
We got a result back, but we don't care about it for now. We could also use noresult=True to avoid the result entirely.
Creating an object
Let's create an object of the defined class.
So far this object has no connection to a database. Let's add it to the store we've created above.
Notice that the object wasn't changed, even after being added to the store. That's because it wasn't flushed yet.
The store of an object
Once an object is added to a store, or retrieved from a store, it's relation to that store is known. We can easily verify which store an object is bound.
>>> Store.of(joe) is store True >>> Store.of(Person()) is None True
Finding an object
Now, what would happen if we actually asked the store to give us the person named Joe Johnes?
The person is there! Yeah, ok, you were expecting it.
We can also retrieve the object using its primary key.
Caching behavior
One interesting thing is that this person is actually Joe, right? We've just added this object, so there's only one Joe, why would there be two different objects? There isn't.
What's going on behind the scenes is that each store has an object cache. When an object is linked to a store, it will be cached by the store for as long as there's a reference to the object somewhere, or while the object is dirty (has unflushed changes).
Storm ensures that at least a certain number of recently used objects will stay in memory inside the transaction, so that frequently used objects are not retrieved from the database too often.
Flushing
When we tried to find Joe in the database for the first time, we've noticed that the id property was magically assigned. This happened because the object was flushed implicitly so that the operation would affect any pending changes as well.
Flushes may also happen explicitly.
Changing objects with the Store
Besides changing objects as usual, we can also benefit from the fact that objects are tied to a database to change them using expressions.
This operation will touch every matching object in the database, and also objects that are alive in memory.
Committing
Everything we've done so far is inside a transaction. At this point, we can either make these changes and any pending uncommitted changes persistent by committing them, or we can undo everything by rolling them back.
We'll commit them, with something as simple as
That was straightforward. Everything is still the way it was, but now changes are there "for real".
Rolling back
Aborting changes is very straightforward as well.
Let's see if these changes are really being considered by Storm and by the database.
Yes, they are. Now, for the magic step (suspense music, please).
Erm.. nothing happened?
Actually, something happened.. with Joe. He's back!
Constructors
So, we've been working for too long with people only. Let's introduce a new kind of data in our model: companies. For the company, we'll use a constructor, just for the fun of it. It will be the simplest company class you've ever seen:
Notice that the constructor parameter isn't optional. It could be optional, if we wanted, but our companies always have names.
Let's add the table for it.
Then, create a new company.
>>> circus = Company(u"Circus Inc.") >>> print "%r, %r" % (circus.id, circus.name) None, u'Circus Inc.'
The id is still undefined because we haven't flushed it. In fact, we haven't even added the company to the store. We'll do that soon. Watch out.
References and subclassing
Now we want to assign some employees to our company. Rather than redoing the Person definition, we'll keep it as it is, since it's general, and will create a new subclass of it for employees, which include one extra field: the company id.
Pay attention to that definiton for a moment. Notice that it doesn't define what's already in person, and introduces the company_id, and a company property, which is a reference to another class. It also has a constructor, but which leaves the company alone.
As usual, we need a table. SQLite has no idea of what a foreign key is, so we'll not bother to define it.
Let's give life to Ben now.
We can see that they were not flushed yet. Even then, we can say that Bill works on Circus.
Of course, we still don't know the company id since it was not flushed to the database yet, and we didn't assign an id explicitly. Storm is keeping the relationship even then.
If whatever is pending is flushed to the database (implicitly or explicitly), objects will get their ids, and any references are updated as well (before being flushed!).
They're both flushed to the database. Now, notice that the Circus company wasn't added to the store explicitly in any moment. Storm will do that automatically for referenced objects, for both objects (the referenced and the referencing one).
Let's create another company to check something. This time we'll flush the store just after adding it.
Nice, we've already got the id of the new company. So, what would happen if we changed just the id for Ben's company?
Hah! That wasn't expected, was it?
Let's commit everything.
Many-to-one reference sets
So, while our model says that employees work for a single company (we only design normal people here), companies may of course have multiple employees. We represent that in Storm using reference sets.
We won't define the company again. Instead, we'll add a new attribute to the class.
Without any further work, we can already see which employees are working for a given company.
Let's create another employee, and add him to the company, rather than setting the company in the employee (it sounds better, at least).
That, of course, means that Mike's working for a company, and so it should be reflected elsewhere.
Many-to-many reference sets and composed keys
We want to represent accountants in our model as well. Companies have accountants, but accountants may also attend several companies, so we'll represent that using a many-to-many relationship.
Let's create a simple class to use with accountants, and the relationship class.
1 >>> class Accountant(Person):
2 ... __storm_table__ = "accountant"
3 ... def __init__(self, name):
4 ... self.name = name
5
6 >>> class CompanyAccountant(object):
7 ... __storm_table__ = "company_accountant"
8 ... __storm_primary__ = "company_id", "accountant_id"
9 ... company_id = Int()
10 ... accountant_id = Int()
Hey, we've just declared a class with a composed key!
Now, let's use it to declare the many-to-many relationship in the company. Once more, we'll just stick the new attribute in the existent object. It may easily be defined at class definition time. Later we'll see another way to do that as well.
Done! The order in which attributes were defined is important, but the logic should be pretty obvious.
We're missing some tables, at this point.
Let's give life to a couple of accountants, and register them in both companies.
That's it! Really! Notice that we didn't even have to add them to the store, since it happens implicitly by linking to the other object which is already in the store, and that we didn't have to declare the relationship object, since that's known to the reference set.
We can now check them.
>>> sweets.accountants.count() 2 >>> circus.accountants.count() 1
Even though we didn't use the CompanyAccountant object explicitly, we can check it if we're really curious.
Notice that we pass a tuple for the get() method, due to the composed key.
If we wanted to know for which companies accountants are working, we could easily define a reversed relationship:
1 >>> Accountant.companies = ReferenceSet(Accountant.id,
2 ... CompanyAccountant.accountant_id,
3 ... CompanyAccountant.company_id,
4 ... Company.id)
5
6 >>> [company.name for company in frank.companies]
7 [u'Circus Inc.', u'Sweets Inc.']
8
9 >>> [company.name for company in karl.companies]
10 [u'Sweets Inc.']
Joins
Since we've got some nice data to play with, let's try to make a few interesting queries.
Let's start by checking which companies have at least one employee named Ben. We have at least two ways to do it.
First, with an implicit join.
Then, we can also do an explicit join. This is interesting for mapping complex SQL joins to Storm queries.
If we already had the company, and wanted to know which of his employees were named Ben, that'd have been easier.
Sub-selects
Suppose we want to find all accountants that aren't associated with a company. We can use a sub-select to get the data we want.
Ordering and limiting results
Ordering and limiting results obtained are certainly among the simplest and yet most wanted features for such tools, so we want to make them very easy to understand and use, of course.
A code of line is worth a thousand words, so here are a few examples that demonstrate how it works:
>>> garry = store.add(Employee(u"Garry Glare")) >>> result = store.find(Employee) >>> [employee.name for employee in result.order_by(Employee.name)] [u'Ben Bill', u'Garry Glare', u'Mike Mayer'] >>> [employee.name for employee in result.order_by(Desc(Employee.name))] [u'Mike Mayer', u'Garry Glare', u'Ben Bill'] >>> [employee.name for employee in result.order_by(Employee.name)[:2]] [u'Ben Bill', u'Garry Glare']
Multiple types with one query
Sometimes, it may be interesting to retrieve more than one object involved in a given query. Imagine, for instance, that besides knowing which companies have an employee named Ben, we also want to know who is the employee. This may be achieved with a query like follows:
The Storm base class
So far we've been defining our references and reference sets using classes and their properties. This has some advantages, like being easier to debug, but also has some disadvantages, such as requiring classes to be present in the local scope, what potentially leads to circular import issues.
To prevent that kind of situation, Storm supports defining these references using the stringified version of the class and property names. The only inconvenience of doing so is that all involved classes must inherit from the Storm base class.
Let's define some new classes to show that. To expose the point, we'll refer to a class before it's actually defined.
1 >>> class Country(Storm):
2 ... __storm_table__ = "country"
3 ... id = Int(primary=True)
4 ... name = Unicode()
5 ... currency_id = Int()
6 ... currency = Reference(currency_id, "Currency.id")
7
8 >>> class Currency(Storm):
9 ... __storm_table__ = "currency"
10 ... id = Int(primary=True)
11 ... symbol = Unicode()
12
13 >>> store.execute("CREATE TABLE country "
14 ... "(id INTEGER PRIMARY KEY, name VARCHAR, currency_id INTEGER)",
15 ... noresult=True)
16
17 >>> store.execute("CREATE TABLE currency "
18 ... "(id INTEGER PRIMARY KEY, symbol VARCHAR)", noresult=True)
Now, let's see if it works.
Questions!?
Loading hook
Storm allows classes to define a few different hooks are called to act when certain things happen. One of the interesting hooks available is the __storm_loaded__ one.
Let's play with it. We'll define a temporary subclass of Person for that.
1 >>> class PersonWithHook(Person):
2 ... def __init__(self, name):
3 ... print "Creating %s" % name
4 ... self.name = name
5 ...
6 ... def __storm_loaded__(self):
7 ... print "Loaded %s" % self.name
8
9
10 >>> earl = store.add(PersonWithHook(u"Earl Easton"))
11 Creating Earl Easton
12
13 >>> earl = store.find(PersonWithHook, name=u"Earl Easton").one()
14
15 >>> store.invalidate(earl)
16 >>> del earl
17 >>> import gc
18 >>> collected = gc.collect()
19
20 >>> earl = store.find(PersonWithHook, name=u"Earl Easton").one()
21 Loaded Earl Easton
Note that in the first find, nothing was called, since the object was still in memory and cached. Then, we invalidated the object from Storm's internal cache and ensured that it was out-of-memory by triggering a garbage collection. After that, the object had to be retrieved from the database again, and thus the hook was called (and not the constructor!).
Executing expressions
Storm also offers a way to execute expressions in a database-agnostic way, when that's necessary.
For instance:
This mechanism is used internally by Storm itself to implement the higher level features.
Auto-reloading values
Storm offers some special values that may be assigned to attributes under its control. One of these values is AutoReload. When used, it will make the object automatically reload the value from the database when touched. Even primary keys may benefit from its use, as shown below.
>>> from storm.locals import AutoReload >>> ruy = store.add(Person()) >>> ruy.name = u"Ruy" >>> print ruy.id None >>> ruy.id = AutoReload >>> print ruy.id 4
This may be set as the default value for any attribute, making the object be automatically flushed if necessary.
Expression values
Besides auto-reloading, it's also possible to assign what we call a "lazy expression" to an attribute. Such expressions are flushed to the database when the attribute is accessed, or when the object is flushed to the database (INSERT/UPDATE time).
For instance:
Notice that this is just an example of what may be done. There's no need to write SQL statements this way, if you don't want to. You may also use class-based SQL expressions provided in Storm, or even not use lazy expressions at all.
Aliases
So now let's say that we want to find every pair of people that work for the same company. I have no idea about why one would want to do that, but that's a good case for us to exercise aliases.
First, we import ClassAlias into the local namespace (mental note: this should be in storm.locals as well), and create a reference to it.
Nice, isn't it?
Now we can easily make the query we want, in a straightforward way:
Woah! Mike and Ben work for the same company!
(Quiz for the attent reader: why is greater than being used in the query above?)
Debugging
Sometimes you just need to see which statements Storm is executing. A debug tracer built on top of Storm's tracing system can be used to see what's going on under the hood. A tracer is an object that gets notified when interesting events occur, such as when Storm executes a statement. A function to enable and disable statement tracing is provided. Statements are logged to sys.stderr by default, but a custom stream may also be used.
1 >>> import sys
2 >>> from storm.tracer import debug
3
4 >>> debug(True, stream=sys.stdout)
5 >>> result = store.find((Employee, AnotherEmployee),
6 ... Employee.company_id == AnotherEmployee.company_id,
7 ... Employee.id > AnotherEmployee.id)
8 >>> list(result)
9 EXECUTE: 'SELECT employee.company_id, employee.id, employee.name, "...".company_id, "...".id, "...".name FROM employee, employee AS "..." WHERE employee.company_id = "...".company_id AND employee.id > "...".id', ()
10 [(<Employee object at ...>, <Employee object at ...>)]
11
12 >>> debug(False)
13 >>> list(result)
14 [(<Employee object at ...>, <Employee object at ...>)]
Much more!
There's a lot more about Storm to be shown. This tutorial is just a way to get initiated on some of the concepts. While your questions are not answered somewhere else, feel free to ask them in the mailing list.