Debugging background tasks inside loops and transactions
I picked up a bug fix the other day that I was able to solve fairly quickly based on a hunch, but even after I’d solved it, I didn’t totally understand why the fix worked. So I went hunting (because c’mon, the why is the fun part!).
The crux of the problem was this: we were performing an operation on each object in a list, with the whole loop wrapped in a database transaction (more detail on defining your own transactions and using them as context managers here), and then calling a background task for each object. Users were reporting that the action the background task was supposed to perform wasn’t happening for all of the users in the loop, only some of them. To make this even more fun, I was able to replicate the issue fairly consistently on production, but not at all locally. Oh, concurrency (jk, I actually do 💙 distributed systems).
Here’s a pared down version of the problem code:
from django.db import transaction

with transaction.atomic():
    for user in users:
        # operate on the user, then enqueue a background task for them
        ...
The issue here lies in the transaction. If the task is enqueued and picked up by a worker before the transaction commits, the database object it operates on may not exist yet as far as the worker can tell. This issue is documented here, and can be resolved as the docs point out: by adding the function to a list of operations that run only once the transaction has committed. Like so:
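from django.db import transaction, connection

with transaction.atomic():
    for user in users:
        # operate on the user, then defer the enqueue until the transaction commits
        connection.on_commit(...)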
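To make the timing concrete, here is a minimal sketch in plain Python, with no Django involved. ToyConnection, committed_rows, and enqueue_task are hypothetical names made up for illustration, and the class is a toy stand-in, not Django's implementation. It only shows the idea: a task enqueued mid-transaction can't see the row yet, while a callback held until commit can.

```python
class ToyConnection:
    """Toy stand-in for a database connection (NOT Django's implementation)."""

    def __init__(self):
        self.in_transaction = False
        self.run_on_commit = []  # callbacks waiting for the commit

    def begin(self):
        self.in_transaction = True

    def on_commit(self, func):
        # Inside a transaction: hold the callback. Outside: run it right away.
        if self.in_transaction:
            self.run_on_commit.append(func)
        else:
            func()

    def commit(self):
        self.in_transaction = False
        callbacks, self.run_on_commit = self.run_on_commit, []
        for func in callbacks:
            func()


committed_rows = set()  # rows a background worker would be able to see
enqueued = []           # (user_id, was the row visible at enqueue time?)


def enqueue_task(user_id):
    # Hypothetical enqueue: record whether the worker could have seen the row.
    enqueued.append((user_id, user_id in committed_rows))


conn = ToyConnection()
conn.begin()
pending_rows = {1}                       # written, but not yet committed
enqueue_task(1)                          # enqueued too early
conn.on_commit(lambda: enqueue_task(1))  # deferred until commit
committed_rows |= pending_rows           # the commit makes the row visible
conn.commit()                            # deferred callbacks run now

print(enqueued)  # [(1, False), (1, True)]
```

The first entry is the race from the original bug; the second is the deferred version, which only fires once the row is there to be found.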
This solved the original problem by guaranteeing that the database record existed and could be modified in the background task, but it introduced a new bug, which manifested itself in a nearly identical way: the background task was only being run for exactly one user, the last one in the loop. The good news: this new bug was 100% consistent, while the previous bug had been only almost totally consistent. The plot thickens.
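One plausible explanation for a "last item only" symptom like this is Python's late-binding closures. That is an assumption on my part, not something the text so far confirms. If the deferred callback is a lambda that refers to the loop variable, every callback looks that variable up when it finally runs, by which point the loop has finished. A small sketch, with made-up strings standing in for user objects:

```python
users = ["ana", "bo", "cy"]  # hypothetical stand-ins for user objects

# Late binding: each lambda looks up `user` when it is called,
# not when it is defined.
late = []
for user in users:
    late.append(lambda: user)

# Passing the current value as a default argument binds it per iteration.
bound = []
for user in users:
    bound.append(lambda user=user: user)

late_results = [func() for func in late]
bound_results = [func() for func in bound]

print(late_results)   # ['cy', 'cy', 'cy']
print(bound_results)  # ['ana', 'bo', 'cy']
```

functools.partial is another common way to bind the value at definition time, if default arguments feel too sneaky.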
Another benefit of this new bug is that I was able to reproduce it locally. So I stuck some debuggers in the code, one of them right after my connection.on_commit(...) call. A connection has an attribute called run_on_commit, which is just a list of tasks that will be added to the queue of background…