A few weeks ago we got a really interesting bug report from one of our API clients.
Hey, we’ve been playing around with the Invoice API and we’re running into this bug where the results vary depending on the number of items we request per page. Sometimes the order gets switched around, and it looks like an item is missing?
This was definitely weird, and I jumped on it right away. Missing records and non-deterministic orders are no joke, especially with invoicing.
Digging in the code, I found how the app orders the API results:
ORDER BY invoice_date DESC
Which makes sense, the invoices are sorted by the date they were issues. But, here’s the tricky thing about invoices: you can send out multiple invoices in a single day. In fact, that’s what’s generally considered, “a damn good day”.
What happens when multiple items fit into the same sorting block?
Think about how an automatic coin sorter works. It sends the coin down a slide, trying to guide the coin into progressively bigger slots. Once it hits the matching slot, it falls into the coin sleeve.
This works because each coin is a different size, so what would happen if two different coins were the same size? You’d end up with a sleeve full of mixed coins, which isn’t very helpful.
Sorting algorithms work in a similar way: they take the data and try to fit it into the right slot, and at the end you get an ordered list.
What happened here is that I accidentally let different coins be the same size. I’d considered the ordering of invoices over time, but I’d forgotten about invoices for the same date!
Ensuring deterministic sorting on overlapping records
The best way to fix this bug is to add another filter condition that ensures deterministic uniqueness. In this case:
ORDER BY invoice_date DESC, id DESC
This guarantees the newest invoices will be returned first, even if there are multiple invoices issued on the same date.
What about the missing record?
The part of the bug report that tripped me up the most was the fact a record was missing. How could that be possible? It’s not like the query changed conditions between requests!
Because of the non-deterministic ordering, the overlapping records had a slightly different order each time the query ran. Since the pagination logic grabs a slice of an ever-changing pie, and the page break could fall on a date with multiple invoices, voila, missing record!
How did you miss this?
Since it’s a slight-edge case to run into a situation where a page break would happen on a date with multiple invoices, I’d never run into it in my manual testing.
Running into this bug taught me a few lessons:
- Always make sure to think about the edge cases, even on something as “simple” as chronological ordering. Think about what your data looks like in the real world
- It’s crucial to understand the underlying technology you’re using. Knowing SQL and SQL orders helped me diagnose this problem and fix it quickly, and now I know how to prevent it from happening in the future.
- Be prepared for rough patches and bugs. API v2 is in beta because there is stuff we overlooked and need to fix. Likewise, you gotta be prepared to diagnose and fix bugs as quickly as possible.
Whenever you’re sorting records by a column that could have an overlap between records (same name, email, date, dollar amount, etc.), always have a secondary sort to make the ordering deterministic.