...
- Never slice your queryset to get the rows to delete
ex: Grant.objects.filter(expires__lte=Now())[:batch_size]
- Create a management command that can be run again later if need be
- Management command should have command line options to configure the batch size and sleep time between batches so DevOps can tune resource usage to play nice with the production traffic
- Delete by an indexed column if you can
- id is usually best if you have contiguous ids
- Get the max id you want to delete, the current min id in the table, delete by id chunks by batch size until you reach max id.
- If you have to filter by a non-indexed column, try to keep to your batch size by also limiting by an indexed column
- Example: edx-django-oauth2-provider (this deleted ~1.6M production rows in 46 mins w/ chunk size of 3k and 5 secs between batches)
- NOTE: Django has a delete optimization that make makes this performant. But if the table has foreign keys or connected pre/post delete signals, that optimization WILL NOT be used and this will effectively turn into the ORM Delete Process below.
...