If you’re working with resource-intensive operations on large databases in Django, chances are you’ll encounter a “Memory Error”. I’ve seen this error numerous times over the last couple of weeks as part of a large data conversion project and wanted to share some of the tips I learned. Number one, DON’T PANIC!!! It’s not Django’s fault. Django’s just caching query results for you to speed things up. This is a good thing, seriously… you want Django optimized for performance by default, since most resource-intensive operations are one-off tasks like data imports, bulk updates, special ad-hoc reports, etc. In these types of tasks all we need to do is slow Django down a bit and ease up on the memory. Here’s how…
- Use MyModel.objects.all().iterator() instead of MyModel.objects.all()
- If you don’t need all of the fields in a query, use values_list() to fetch only the ones you do
- Use the Memory Efficient Django Queryset Iterator, written by Thierry Schellenbach
- Set DEBUG = False in your settings.py file. If DEBUG is True, then Django saves a copy of every SQL statement it has executed. See Django FAQ: Why is Django leaking memory? (spoiler: Django’s not really leaking memory).
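To make the first two tips concrete, here is a minimal sketch that combines them. The helper name `stream_fields` is hypothetical (it is not part of Django); it only assumes it is handed a queryset, and it relies on the standard `values_list()` and `iterator()` queryset methods.

```python
def stream_fields(queryset, *fields):
    """Yield one tuple per row, containing only the requested fields.

    Hypothetical helper: `queryset` is assumed to be a Django QuerySet.
    values_list() trims each row down to the named columns, and
    iterator() reads the results without filling the QuerySet cache.
    """
    for row in queryset.values_list(*fields).iterator():
        yield row
```

In a project this might be called as `for pk, name in stream_fields(MyModel.objects.all(), "id", "name"): ...`, where `name` stands in for whatever field your model actually has.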
Here are the three approaches, ranked from fastest to slowest, which is also the order from most to least memory intensive:
- MyModel.objects.all() – fastest, for standard queries
- MyModel.objects.all().iterator() – medium, for large queries
- queryset_iterator(MyModel.objects.all()) – slowest, for monster queries
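For a feel of why the chunked approach is the slowest but the lightest on memory, here is a rough sketch of the general idea: walk the table in primary-key order, one small slice at a time. This is my own illustration, not Thierry Schellenbach’s actual snippet; the name `chunked_pk_iterator` and the default `chunk_size` are assumptions.

```python
import gc


def chunked_pk_iterator(queryset, chunk_size=1000):
    """Yield rows in primary-key order, loading at most chunk_size at once.

    Illustrative sketch only: `queryset` is assumed to be a Django
    QuerySet. Each pass issues a fresh query for the next slice of rows
    whose pk is greater than the last one seen.
    """
    ordered = queryset.order_by("pk")
    last_pk = None
    while True:
        page = ordered if last_pk is None else ordered.filter(pk__gt=last_pk)
        rows = list(page[:chunk_size])  # one query per chunk
        if not rows:
            break
        for row in rows:
            yield row
        last_pk = rows[-1].pk
        del rows
        gc.collect()  # nudge Python to release the finished chunk
```

The trade-off is visible in the loop: many small queries instead of one big one, so it takes longer, but only one chunk is ever held in memory.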
One size doesn’t fit all, so use the best method for your particular task. By the way, the Django Snippet “Memory Efficient Django Queryset Iterator” is solid. I’ve been using it a lot on this project and validating the results along the way, and it works like a champ.