sqlalchemy is a popular object relational mapper (orm) for python; it provides a tutorial with many details, but might be difficult to follow as a beginner; so i write this article to make that even simpler, hopefully get novices started in a few minutes;

what is sqlalchemy

citing the first sentence in the tutorial:

The SQLAlchemy Object Relational Mapper presents a method of associating user-defined Python classes with database tables, and instances of those classes (objects) with rows in their corresponding tables. It includes a system that transparently synchronizes all changes in state between objects and their related rows, called a unit of work, as well as a system for expressing database queries in terms of the user defined classes and their defined relationships between each other.

so sqlalchemy does 2 jobs:

  • provide a mapping between python elements and database elements;

  • synchronize changes between python elements and database elements;

with sqlalchemy, you no longer need sql for database query; instead, you call python methods, and sqlalchemy does the rest for you;

install

sqlalchemy can be installed via pip:

pip install SQLAlchemy

example 0: a single table

our first example is very simple: create a python class User, mapped to rdbms table user; then create a User object user0, and add it as a row into the rdbms table user;

the code (with comments) speaks for itself:

#!/usr/bin/env python3

##  (boilerplate) imports;
from sqlalchemy import Column
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

##  (boilerplate) create database engine; here we create an in-memory sqlite
##  database, and echo sql statements issued to the rdbms for easy debugging;
##  you can use another rdbms (such as mysql) here;
engine = create_engine('sqlite:///:memory:', echo=True)

##  (boilerplate) create declarative base; this is the base class of all classes
##  mapped to rdbms tables, when using the *declarative* system; alternatively,
##  you can use *classical mappings*, but that wont be covered in this example;
Base = declarative_base()

##  declare a class `User` mapped to rdbms table `user`;
class User(Base):

    ##  rdbms table name;
    __tablename__ = 'user'

    ##  column spec; in this example, the table has 2 columns: an integer `id`
    ##  which is the primay key, and a string `name`; if multiple columns have
    ##  `primary_key=True`, they form a compound primary key;
    id      = Column(Integer, primary_key=True)
    name    = Column(String)

    ##  (optional) repr function; for easy debugging;
    def __repr__(self):
        return '<User:%r:%r>' % (self.id, self.name)

##  (boilerplate) create declared rdbms tables; this issues `CREATE TABLE`
##  statements to rdbms;
Base.metadata.create_all(engine)

##  now our table is created; then we create a row object;
user0 = User(name='Alice')
print(user0)        ## output: <User:None:'Alice'>

##  (boilerplate) create session factory; a session factory provides sessions;
Session = sessionmaker(bind=engine)

##  create a session; a session is the place to manipulate row objects; you may
##  think a session as a cache on the rows (though it isnt really a cache); you
##  may think a session as the staged area if you are familiar with git; you may
##  also think a thread as the eater, a session as the plate, and row objects as
##  the food; the session is not thread-safe; so each thread should get its own
##  session for database operation;
session = Session()

##  add the object to session (similar to `git add`);
session.add(user0)

##  flush the session; this implicitly begins a database transaction and issues
##  `INSERT INTO` statements; flush is optional per default, as sqlalchemy will
##  auto flush when necessary;
session.flush()
print(user0)        ##  now the row gets an id; output: <User:1:'Alice'>

##  commit the session (similar to `git commit`); this will first flush pending
##  operations to database then commit the transaction; if you want a rollback
##  instead of a commit, change this line to `session.rollback()`;
session.commit()
print(user0)        ##  output: <User:1:'Alice'>

##  query all the rows in rdbms table `user`;
rows = session.query(User).all()
print(rows)         ##  output: [<User:1:'Alice'>]

#  close the session;
session.close()

example 1: object states

the above example shows how an object works in a session; now it is good to know objects have states, depending on where they live; there are 5 object states, if you are using the latest version of sqlalchemy:

  • transient:

    an instance that is not in a session, and is not saved to the database; that is to say, it has no database identity;

  • pending:

    when you session.add() a transient instance, it becomes pending; it hasnt been flushed to the database yet, but will be when the next flush occurs;

  • persistent:

    an instance that is present in the session and has a record in the database; you get persistent instances by:

    • flushing, so that the pending instances become persistent, or

    • querying the database for existing instances;

  • deleted:

    an instance that has been deleted in a flush, but the transaction has not yet completed; objects in this state are essentially in the opposite of pending state; objects in this state have 2 possible fates:

    • if the transaction is committed, the object will move to detached state;

    • if the transaction is rolled back, the object will move back to persistent state;

  • detached:

    an instance which corresponds, or previously corresponded, to a record in the database, but is not currently in any session; the detached object will contain a database identity marker, however because it is not associated with a session, it is unknown whether or not this database identity actually exists in a target database;

we can inspect an object state using the inspect() function:

#!/usr/bin/env python3

from sqlalchemy import Column
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import inspect

engine = create_engine('sqlite:///:memory:', echo=False)
Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id      = Column(Integer, primary_key=True)
    name    = Column(String)
    def __repr__(self):
        return '<User:%r:%r>' % (self.id, self.name)

Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

user0 = User(name='Alice')
print(
    inspect(user0).transient,
    inspect(user0).pending,
    inspect(user0).persistent,
    inspect(user0).deleted,
    inspect(user0).detached,
)   ##  transient;
session.add(user0)
print(
    inspect(user0).transient,
    inspect(user0).pending,
    inspect(user0).persistent,
    inspect(user0).deleted,
    inspect(user0).detached,
)   ##  pending;
session.flush()
print(
    inspect(user0).transient,
    inspect(user0).pending,
    inspect(user0).persistent,
    inspect(user0).deleted,
    inspect(user0).detached,
)   ##  persistent;
session.delete(user0)
session.flush()
print(
    inspect(user0).transient,
    inspect(user0).pending,
    inspect(user0).persistent,
    inspect(user0).deleted,
    inspect(user0).detached,
)   ##  deleted;
session.commit()
print(
    inspect(user0).transient,
    inspect(user0).pending,
    inspect(user0).persistent,
    inspect(user0).deleted,
    inspect(user0).detached,
)   ##  detached;
session.close()

should output:

True False False False False
False True False False False
False False True False False
False False False True False
False False False False True

now that we have seen how one table works, we will add another table and build relationship between them; traditionally, to link two rdbms tables, we use a foreign key; this is still true with sqlalchemy;

the following example adds a second table address, which represents an email address; this is a one-to-many relationship: a user may have multiple email addresses, and each email address must have a user; the relationship is reflected by a foreign key user_id defined in table address, pointing to the column id in table user;

here is the code:

#!/usr/bin/env python3

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import create_engine
from sqlalchemy import inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=False)
Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id      = Column(Integer, primary_key=True)
    name    = Column(String)
    def __repr__(self):
        return '<User:%r:%r>' % (self.id, self.name)

class Address(Base):
    __tablename__ = 'address'
    id      = Column(Integer, primary_key=True)
    email   = Column(String)
    user_id = Column(Integer, ForeignKey('user.id'))
    def __repr__(self):
        return '<Address:%r:%r:%r>' % (self.id, self.email, self.user_id)

Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

user0 = User(name='Alice')
session.add(user0)
session.flush()     ##  necessary for getting an auto-assigned id;

addr0 = Address(email='[email protected]', user_id=user0.id)
session.add(addr0)
session.flush()
addr1 = Address(email='[email protected]', user_id=user0.id)
session.add(addr1)
session.flush()

session.commit()

users = session.query(User).all()
print(users)
addrs = session.query(Address).all()
print(addrs)
matched_rows = session.query(User, Address).filter(User.id == Address.user_id).all()
print(matched_rows)

session.close()

output:

[<User:1:'Alice'>]
[<Address:1:'[email protected]':1>, <Address:2:'[email protected]':1>]
[(<User:1:'Alice'>, <Address:1:'[email protected]':1>), (<User:1:'Alice'>, <Address:2:'[email protected]':1>)]

the above example shows how to build relationship at database level; however, sqlalchemy also provides its own relationship model at orm level, using sqlalchemy.orm.relationship; the orm-level relationship tracks relationship between objects in a session, which can help users manipulate objects in a smarter way; for example, when users add an object to the database, sqlalchemy also auto adds its linked objects; note that we still need the foreign key when using the orm-level relationship;

an example demonstrating orm-level relationship:

#!/usr/bin/env python3

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import create_engine
from sqlalchemy import inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=False)
Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id      = Column(Integer, primary_key=True)
    name    = Column(String)
    ##  relationship: a user has a list of addresses; sqlalchemy is clever
    ##  enough to infer this is a list by inspecting the foreign key;
    ##
    ##  the address object should keep a back reference in its field `user`;
    addrs   = relationship('Address', back_populates='user')
    def __repr__(self):
        return '<User:%r:%r>' % (self.id, self.name)

class Address(Base):
    __tablename__ = 'address'
    id      = Column(Integer, primary_key=True)
    email   = Column(String)
    user_id = Column(Integer, ForeignKey('user.id'))
    ##  relationship: an address has a user; sqlalchemy is clever enough to
    ##  infer this is not a list by inspecting the foreign key;
    ##
    ##  the user object should keep a back reference in its field `addrs`;
    user    = relationship('User', back_populates='addrs')
    def __repr__(self):
        return '<Address:%r:%r:%r>' % (self.id, self.email, self.user_id)

Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

user0 = User(name='Alice')
addr0 = Address(email='[email protected]')
addr1 = Address(email='[email protected]')

##  build object relationship at orm level;
user0.addrs.append(addr0)
user0.addrs.append(addr1)

##  note that we dont need to explicitly add `addr0` and `addr1`; they are added
##  automatically when we add `user0` because they are linked by `user0`;
session.add(user0)
session.commit()

users = session.query(User).all()
print(users)
addrs = session.query(Address).all()
print(addrs)
matched_rows = session.query(User, Address).filter(User.id == Address.user_id).all()
print(matched_rows)

session.close()

output (should be the same as the previous example):

[<User:1:'Alice'>]
[<Address:1:'[email protected]':1>, <Address:2:'[email protected]':1>]
[(<User:1:'Alice'>, <Address:1:'[email protected]':1>), (<User:1:'Alice'>, <Address:2:'[email protected]':1>)]