Tuesday, February 12, 2013

Thrashing and Amortization

Two new terms I've learned the definition of today in the contexts of Collections:

Amortisation

You might read comments in the JavaDocs such as this one for ArrayList:

"The add operation runs in amortized constant time, that is, adding n elements requires O(n) time."

without worrying too much about what it means. In the case of any collection backed by an array, this means that the array must resize when it doubles - an expensive operation - but that insertions are essentially free (adding an element to an array is much more efficient than adding it to, say, a linked list).

So, when the total number of elements goes from N to 2 x N, the array has to copy 2N elements into the new, resized array. But since N elements were added for "free", the average cost is (2N/N = 2) a constant*

"Every time you hit a power of 2 you take that many array accesses but in a sense you have already paid for them by putting those items on the stack. That's called amortised analysis, when you consider the total cost averaged over all operations." - Prof Rob Sedgewick, Coursera.

Thrashing

Thrashing is a pathological phenomena that could happen as the underlying array resizes at threshold X. If an element is added at X+1, the underlying array is enlarged. If an element is removed leaving X-1, the underlying array is shrunk.

"If a client calls push, pop, push, pop then it's going to be doubling, halving, doubling, halving creating arrays on every operation taking time proportional to N on every operation and therefore quadratic time for everything. The efficient solution is to wait for the array to get one quarter full before you halve it... There won't be another resizing operation until it gets totally full or half again full. So the invariant of that is that the array is always between 25% or 100% full and that every time time you resize you've already paid for it in an amortised sense." - Prof Rob Sedgewick, Coursera.


*  Note: this math is a simplification but you get the idea. A more accurate analysis is that the work of adding N elements is (of course) N accesses plus the number of resizes (2 + 4 + 8 + ... + N). But remember, 2 + 4 + 8 + 16 + ... + N = (2N - 2). We ignore the -2 for large N. Therefore, the total cost is actually 3N and so the amortised cost is the constant 3.

No comments:

Post a Comment