I had never heard the word “idempotent” when I started at my first software job. I had a liberal arts education and had been working in nonprofit and school management before my career change. I suspect I’m not alone in this total lack of awareness of the concept! So let’s talk about it.
What does it mean?
It means you can do the same thing with the same inputs any number of times, and the end result will be the same as if you had done it once.
This is easy to achieve when we’re talking about data retrieval — retrieving data doesn’t change the data itself. But it gets a little more complex when we’re talking about running operations on data. In that context, another way to phrase the definition of “idempotent” would be that results are changed only the first time an operation is run, and any subsequent runs of the same operation will have no impact.
Let’s look at some examples to make this a bit clearer:
- Any GET request where you’re retrieving data is idempotent. If the data changes between requests, it’s because some other operation elsewhere in the system caused it to change; your requests had no bearing on that at all.
- Sorting is idempotent — sorting a collection the first time will put it in order, but if you run the sorting operation again, it will have no impact. The collection will still be in sorted order.
- In the context of data streaming, where you’re consuming data from a message bus, it means that the first time a given process consumes a message, it might have some side effects, but re-processing the same message would have no further impact. (This does not necessarily mean that a different process consuming messages from the same message bus won’t also have an impact, since that is a different operation entirely. Idempotence refers only to the same operation being run multiple times.)
- If a scheduled task fails partway through and is run again later, any operations that happened in the first portion of the script before the failure occurred should not occur again.
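To make the sorting example concrete, here is a minimal Python sketch. The function names `sort_scores` and `add_bonus` and the sample data are made up for illustration; the second function is there only as a contrast, to show what a non-idempotent operation looks like.

```python
def sort_scores(scores):
    """Idempotent: sorting an already-sorted list changes nothing."""
    return sorted(scores)


def add_bonus(scores):
    """Not idempotent: every run adds another 5 points."""
    return [score + 5 for score in scores]


scores = [42, 7, 19]

once = sort_scores(scores)
twice = sort_scores(once)
print(once == twice)  # True: the second run had no impact

bonus_once = add_bonus(scores)
bonus_twice = add_bonus(bonus_once)
print(bonus_once == bonus_twice)  # False: the second run changed the result
```

If `add_bonus` were part of a scheduled task that got retried after a failure, every retry would hand out extra points, which is exactly the kind of surprise idempotence is meant to prevent.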
How do we achieve it?
This is where idempotence in software differs from the mathematical concept. Mathematically speaking, an operation is idempotent if applying it repeatedly gives the same result as applying it once, so f(f(x)) = f(x). The operation really does run every time; it just has no further effect. In software, achieving idempotence usually means building in a mechanism that prevents the operation from running again if it has already run.
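Here is a rough Python sketch of that difference. Everything in it is hypothetical: `normalize_email` stands in for an operation that is naturally idempotent, and `OneTimeJob` is one simple way to put a guard around an operation that is not. The guard here is an in-memory flag, which I'm assuming only for brevity; in a real system that record would need to live somewhere that survives a restart.

```python
def normalize_email(address):
    """Idempotent in the mathematical sense: f(f(x)) == f(x)."""
    return address.strip().lower()


class OneTimeJob:
    """Hypothetical guard: runs the wrapped action only if it has not run yet."""

    def __init__(self, action):
        self.action = action
        self.has_run = False

    def run(self):
        if self.has_run:
            return False  # already done, so skip the action entirely
        self.action()
        self.has_run = True
        return True


# The function really runs twice; the second run simply changes nothing.
once = normalize_email("  Pat@Example.COM ")
twice = normalize_email(once)
print(once == twice)  # True

# Appending to a list is not idempotent, so the guard blocks the second run.
emails_sent = []
job = OneTimeJob(lambda: emails_sent.append("welcome email"))
job.run()
job.run()
print(len(emails_sent))  # 1
```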
A full catalog of mechanisms for achieving idempotence, along with detailed walkthroughs of their implementation, is out of scope for this post, but strategies might include:
- Using a nonce, or a unique identifier, for each message, which can be cached or put in some other data store. Then, before running any process, the data store should be checked to make sure that the message’s nonce is not already there, since its presence would indicate that the process has already handled that message. If it is found, skip processing the data entirely.
- In the case of a backfill, a failure could trigger a rollback of any operations that succeeded before it, so that a subsequent run of the same process would start again with the data in its original starting state.
- Some operations are idempotent by definition. Sorting is one example. Another is deleting something from a database — no matter how many times you try to do it, any attempt after the first one will have no impact, since the item will no longer exist (assuming your delete operation is of sufficient specificity to guarantee that you won’t accidentally delete a different element the next time around!)
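The three strategies above can be sketched in a few lines of Python. This is an illustration under assumptions, not a production recipe: the message shape, the `items` table, and the names `handle_message` and `backfill` are all made up, a plain `set` stands in for the cache or data store that holds nonces, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Strategy 1: check a nonce before processing.
processed_nonces = set()  # stand-in for a cache or other data store
balances = {"alice": 0}


def handle_message(message):
    """Apply a deposit once per nonce; return True if it was applied."""
    if message["nonce"] in processed_nonces:
        return False  # already handled, so skip processing entirely
    balances[message["account"]] += message["amount"]
    processed_nonces.add(message["nonce"])
    return True


msg = {"nonce": "msg-001", "account": "alice", "amount": 50}
first = handle_message(msg)
second = handle_message(msg)  # the same message delivered again
print(first, second, balances["alice"])  # True False 50


# Strategy 2: roll back a partially completed backfill.
def backfill(conn, rows, fail_at=None):
    """Insert all rows in one transaction; fail_at simulates a crash."""
    with conn:  # commits on success, rolls back if an exception escapes
        for index, (item_id, name) in enumerate(rows):
            if index == fail_at:
                raise RuntimeError("simulated failure partway through")
            conn.execute(
                "INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

rows = [(1, "a"), (2, "b"), (3, "c")]
try:
    backfill(conn, rows, fail_at=2)  # two inserts succeed, then it fails
except RuntimeError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count_after_failure)  # 0: the partial work was rolled back

backfill(conn, rows)  # the retry starts from the original state
count_after_retry = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count_after_retry)  # 3

# Strategy 3: a delete that targets one specific key is idempotent as is.
conn.execute("DELETE FROM items WHERE id = ?", (2,))
conn.execute("DELETE FROM items WHERE id = ?", (2,))  # nothing left to delete
conn.commit()
count_after_deletes = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count_after_deletes)  # 2
```

One thing this sketch glosses over: in `handle_message`, the deposit and the nonce are recorded in two separate steps, so a crash between them would leave the deposit applied with no record of it. A real implementation would need to make those two writes happen together.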