Django Database Call Optimization
Did you know that it takes longer to grab 1 piece of data from the database 100 times than it does to grab 100 pieces of data from the database 1 time? Apparently it takes a lot longer.
In a bit of code I was working on I was making two database calls within an iteration. It turns out it is much more efficient to make a slightly more complicated database call outside of the iteration and then access the prefetched information within the iteration without hitting the database again each time.
But it’s not just about fitting your query into one line or what looks like one call. When you build a query set the corresponding SQL is built but it is not run until one of 7 things happens. Here are the most common of the list, at least in the code I have been writing.
iteration -the first time you iterate through the query set it will evaluate the results in order to iterate through them. If you iterate through a second time it does not hit the database.
len() -evaluates the query set in order to figure out how many objects it returns.
list() -force the query set to become a list object/ force evaluation.
bool() -test whether or not a query set returns any objects (some=True, none=False).
repr() -this is called whenever you run a query in the python shell so you can see your results right away. Similarly, print() runs the SQL as well.
You can do all sorts of stuff and still only hit the database once!
queryset1 = Article.objects.filter(title__startswith=“The”) queryset2 = queryset1.filter(publication__in=publication_list) queryset3 = UserArticleShare.objects.filter(article__in=queryset2)
I still haven’t hit the database!
# print names of users who have shared articles starting with ‘The’ # and belonging to publications on the publication_list: for uas in queryset3: # now I finally hit the database one time! print(uas.user.name) # but this also hits the database! And since this happens within # the iteration, that's a lot of database calls
Why does uas.user hit the database?
Once queryset3 is evaluated I can access all of the data it grabbed from the database without hitting the database a second time. But it turns out user is not among that data. That's because user is not actually a field on UserArticleShare, in fact, the field is user_id. Calling user runs the SQL to grab the User object's fields given it's id.
Well, you could just print the user_id and it won't run any extra SQL:
for uas in queryset3: # hit the database once print(uas.user_id) # doesn't hit the database. But that's no fun!
So there's something called prefetch_related.
for uas in queryset3.prefetch_related('user'): # hit the database once print(uas.user.name) # DOES NOT hit the database because # we have prefetched all the user objects in the original query.
Quiz: Why doesn't it hit the database when you call name on user? We didn't prefetch that!
Answer: Because name is not a relationship to another model, it's just a field on the UserArticleShare model just like user_id.
P.S. Don't use len() on a query set. It will have to grab all the data and then count in in Python Use count(). This will count in in the SQL instead, which is much faster.