I have heard of eager loading data relationships a lot in the past, but it was only recently that I became more conscious of it in my APIs. I will be showing you guys what it is and why you should bother about it in this post.
Prerequisites
The prerequisites for this post is a basic understanding of SQL Joins, SQLAlchemy and Python
Definitions of Eager Loading VS Lazy Loading
Eager loading is a technique of retrieving a model's relationship data while querying the model either through a JOIN or subquery.
Lazy loading, on the other hand, retrieves the related models only when an attempt is made to retrieve the relationship field by emitting SELECT statements.
I think examples would help make this clearer.
Examples
Let's say we have three models, namely User, Membership and Company, such that there is a many-to-many relationship between users and companies like so:
In SQLAlchemy models:
class User(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
username = db.Column(
db.String(20),
nullable=False,
)
password_hash = db.Column(db.VARCHAR(130), nullable=False)
memberships = db.relationship('Membership',
back_populates='member')
class Company(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
name = db.Column(db.String(120), nullable=False)
website = db.Column(db.String(), nullable=False)
address = db.Column(db.String(), nullable=False)
memberships = db.relationship('Membership',
back_populates='company')
class Membership(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
company_id = db.Column(db.Integer,
db.ForeignKey('company.id'))
role = db.Column(db.String, default='REGULAR_USER')
member = db.relationship("User", back_populates="memberships")
company = db.relationship('Company', back_populates="memberships")
The Lazy Problem
Now if we query a membership from our database, like so:
>>> membership = Membership.query.first()
By default, SQLAlchemy retrieves only the data contained in the Membership table by running the following SQL query:
INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role
FROM membership
LIMIT %(param_1)s
Now when you try to access the member.username
contained in the membership object:
>>> membership.member.username
it makes another DB call to retrieve the member from the User's table which then contains the username:
INFO:sqlalchemy.engine.base.Engine:SELECT "user".id AS user_id, "user".username AS user_username, "user".password_hash AS user_password_hash
FROM "user"
WHERE "user".id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
Similarly, when we attempt to retrieve name
of the company
, it results in another DB call:
>>> membership.company.name
INFO:sqlalchemy.engine.base.Engine:SELECT company.id AS company_id, company.name AS company_name, company.website AS company_website, company.address AS company_address
FROM company
WHERE company.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
When you retrieve multiple rows(say 50) from a table using lazy loading and you run a loop to access a related field, SQLAlchemy would emit SELECT statements for each of the (50) models you retrieved.
Note that looping through each model described here is how marshmallow dumps nested fields to JSON.
That is lazy-loading relationship fields work. You try to retrieve data only when you need them.
The Eager Solution
When we Eager-Load relationship fields, we tell SQLAlchemy to retrieve those fields when the first query is being made either through a JOIN or a subquery. In general, JOINs tend to be more efficient than subqueries.
We do this by using sqlalchemy.orm
module like so:
>>> from sqlalchemy import orm
>>> Membership.query.options(orm.joinedload('company')).first()
This generates the following SQL:
INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role, company_1.id AS company_1_id, company_1.name AS company_1_name, company_1.website AS company_1_website, company_1.address AS company_1_address
FROM membership LEFT OUTER JOIN company AS company_1 ON company_1.id = membership.company_id
LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
As you can see a join is made to retrieve the company
relationship thus trying to access the company name
>>> membership.company.name
'Ade Store1'
does not lead to any extra DB call.
Eager Loading multiple fields
What if we wanted to return nested data from both the company
and member
fields, how do we eager load both? We can do that easily by adding more arguments to the options method call like so:
membership = Membership.query.options(
orm.joinedload('company'), orm.joinedload('member')
).first()
This generates the following SQL statement:
INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role, user_1.id AS user_1_id, user_1.username AS user_1_username, user_1.password_hash AS user_1_password_hash, company_1.id AS company_1_id, company_1.name AS company_1_name, company_1.website AS company_1_website, company_1.address AS company_1_address
FROM membership LEFT OUTER JOIN "user" AS user_1 ON user_1.id = membership.user_id LEFT OUTER JOIN company AS company_1 ON company_1.id = membership.company_id
LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
With that accessing either the company
or member
field does not lead to any extra SQL statement being run.
Takeaways
You should eager-load relationship fields only when you know that those fields would be used in your code else you would fall into a situation where you are retrieving values you don't actually need which can make your API a little slower. Also, be sure to avoid lazy-loading fields that you would need for your logic
One other thing to note is that this concept is not tied to only SQLAlchemy. It also exists in Django ORM and some other non-python ORMs.
You can get more info on this topic from the SQLAlchemy Docs
Conclusion
I would like to end by saying that the syntax for the eager-loading doesn't quite cumbersome. You can make this easier by putting the logic in a BaseModel as I did in the blog post below
Simplifying SQLAlchemy models by creating a BaseModel
Chidiebere Ogujeiofor ・ Feb 12 '20 ・ 5 min read
Finally, you can view all the code(including setup) can be gotten the gist below:
Thanks for your time
Top comments (0)