# Django Query Optimization with prefetch_related and select_related

Most developers like in their early days focus mostly on getting the code to work. However, as we grow and build more applications, the focus changes from getting the code to work to make it work better and utilize resources effectively - Optimization.

Code optimization is any method of code modification to improve code quality and efficiency. A program may be optimized so that it becomes a smaller size, consumes less memory, executes more rapidly, or performs fewer input/output operations. - viva64


In this article, I will focus on database query optimization with two Django built-in functions: prefetch_related and select_related.

Throughout this article, the model below will be used for examples.

from django.db import models
from django.utils import timezone

class Seller(models.Model):

    user_id = models.CharField(primary_key=True, max_length=100, editable=False, unique=True)
    email = models.EmailField(max_length=500, unique=True,verbose_name="Email")
    seller_name = models.CharField(max_length=500, null=True, unique=True, verbose_name="Seller name")
    date_added = models.DateTimeField(default=timezone.now)

    class Meta:
        db_table = "seller_db"

    def __str__(self):
        return self.seller_name

class Product(models.Model):

    seller_id = models.ForeignKey(Seller, on_delete=models.CASCADE, editable=False, blank=False, related_name="product_seller")
    product_id = models.CharField(primary_key=True, max_length=100, editable=False, unique=True)
    product_name = models.TextField(blank=False, verbose_name="Product Name")
    product_description = models.TextField(blank=False, verbose_name="Product Description")
    date_added = models.DateTimeField(default=timezone.now)

    class Meta:
        db_table = "product_db"

    def __str__(self):
        return self.product_name

class Buyer(models.Model):

    buyer_name = models.CharField(primary_key=True, max_length=300, editable=False)
    products = models.ManyToManyField(Product)
    date_added = models.DateTimeField(default=timezone.now)

    class Meta:
        db_table = "buyer_db"

    def __str__(self):
        return self.buyer_name

Evaluation of database queries will be done using a Python package - django-print-sql. Other tools like django-silk, django-debug-toolbar or django-query-profiler can be used to achieve similar results.


Returns a QuerySet that will "follow" foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries. - Django Documentation

select_related creates an SQL join and includes the fields of the items in the related object in the SELECT statement of a particular query. Thus ensuring that later use of the related object called will not trigger a new database call.

select_related in Django works with ForeignKey, OneToOneField and reverse OneToOneField.

# Example

from rest_framework.decorators import api_view
from rest_framework.response import Response
from .models import *
from django_print_sql import print_sql_decorator

@api_view(['GET'])
@print_sql_decorator(count_only=True)
def without_select_related(request):
    product_queryset = Product.objects.all()
    data = []
    for product in product_queryset:
        data.append({
            "product_name":product.product_name,
            "prod_desc": product.product_description,
            "seller_name": product.seller_id.seller_name
        })
    return_data = {
        "error" : 0,
        "message" : "Successful",
        "data" : data
    }
    return Response(return_data)

Runing the above function will give the output below on our console.

[10 queries executed, total time elapsed 9.99ms]

We currently have 9 products and 5 sellers in our database. The first query fetches all the products, for each product, there is another database call to fetch the seller details. This database lookup happens 9 times has there are 9 products giving us a total of 10 queries. The number of queries will increase as we have more products in the database and we may end up having 50 queries or more. select_related solves this easily as shown below.

@api_view(['GET'])
@print_sql_decorator(count_only=True)
def with_select_related(request):
    product_queryset = Product.objects.select_related("seller_id").all()
    data = []
    for product in product_queryset:
        data.append({
            "product_name":product.product_name,
            "prod_desc": product.product_description,
            "seller_name": product.seller_id.seller_name
        })
    return_data = {
        "error" : 0,
        "message" : "Successful",
        "data" : data
    }
    return Response(return_data)

Running the above function will give the output below on our console.

[1 query executed, total time elapsed 2.97ms]

From 10 queries to just one in 2.97ms.

prefetch_related in Django is similar to select_related. However, prefetch_related does not create an SQL join like select_related. prefetch_related performs a seperate lookup for each related object and does the joining in Python. Also prefetch_related is not limited to ForeignKey, OneToOneField and reverse OneToOneField only. prefetch_related also supports ManyToManyField, GenericRelation and GenericForeignKey.

** prefetch_related can also be used with the above examples for select_related

# Example

@api_view(['GET'])
@print_sql_decorator(count_only=True)
def without_prefetch_related(request):
    buyer_queryset = Buyer.objects.all()
    data = []
    for buyer in buyer_queryset:
        buyer_products = [ product.product_name for product in buyer.products.all() ]
        data.append({"buyer_name":buyer.buyer_name, "products":buyer_products})
    return_data = {
        "error" : 0,
        "message" : "Successful",
        "data" : data
    }
    return Response(return_data)

Runing the above function will give the output below on our console.

[6 queries executed, total time elapsed 6.27ms]

We currently have 5 buyers in our database. The first query fetches the list of all buyers. Then for each buyer, there is another database lookup to search for the buyer's purchases. Altogether, we have a total of 6 queries. This will however quickly increase with the increase in buyers. prefetch_related can be used to solve this has shown below. Here we are delaying with ManyToManyField.

@api_view(['GET'])
@print_sql_decorator(count_only=True)
def with_prefetch_related(request):
    buyer_queryset = Buyer.objects.prefetch_related('products').all()
    data = []
    for buyer in buyer_queryset:
        buyer_products = [ product.product_name for product in buyer.products.all() ]
        data.append({"buyer_name":buyer.buyer_name, "products":buyer_products})
    return_data = {
        "error" : 0,
        "message" : "Successful",
        "data" : data
    }
    return Response(return_data)
[2 queries executed, total time elapsed 5.29ms]

prefetch_related in Django also allows you fetch related fields of related fields. An example can be seen below.

buyer_queryset = Buyer.objects.prefetch_related('products','products__seller_id').all()

This produces a data fetch of Buyers, Products and the Sellers of each product with [3 queries executed, total time elapsed 3.82ms] compared to [31 queries executed, total time elapsed 41.13ms] without using prefetch_related.

Prefetch object can be used to control prefetch_related further.

Buyer.objects.prefetch_related(Prefetch("products",queryset=Product.objects.order_by("seller_id")))

For more use cases, you should check out the Django Documentation.

Go forth and write optimized database queries !

More from SoftNexus