Why are so many SAS developers struggling with asymptotic runtime? Time to fix this gaping hole in your knowledge before you deploy yet another catastrophic O(n²) disaster to prod. Listen closely and I'll break it down into tiny, digestible pieces so you can finally comprehend this.
Asymptotic runtime is a mathematical framework used to describe how an algorithm's execution time changes as the input size (which we call n) grows toward infinity.
Note that I said infinity, not "ten items from your local test database."
We don't count actual seconds or milliseconds. Why? Because your slow, outdated laptop will run code differently than a high performance server. Measuring seconds is for amateurs. Professionals look at the growth rate of operations. We look at how the algorithm scales when the data gets massive.
There are three symbols you need to know.
Big O (O): This is the upper bound. It's the absolute worst case scenario. It guarantees that your code will never, ever perform worse than this curve. Since your code likely runs terribly, you'll spend most of your time dealing with this one.
Big Omega (Ω): This is the lower bound. It represents the best case scenario. It's the absolute minimum number of operations required.
Big Theta (Θ): This is the tight bound. We only use this when the best case and worst case growth rates are exactly the same. It means the runtime is perfectly sandwiched between two constant factors.
When analyzing runtime, we completely ignore constants and lower order terms. Why? Because when n is one billion, adding 5 or multiplying by 2 does not matter. Only the highest power of n matters.
Here's the hierarchy of efficiency, ordered from best to worst. Try to memorize it so you can stop writing terrible code:
O(1) - constant time, the gold standard. The input could be five items or five billion items; the algorithm takes the exact same number of steps. An example is looking up an element in an array by its index. It's instant. It's flawless. It's everything your code isn't.
O(log n) - logarithmic time, highly efficient. Every time the algorithm takes a step, it cuts the remaining problem size in half. Think of binary search. Even if you search through a billion sorted items, it takes a maximum of around 30 steps.
O(n) - linear time, acceptable, but uninspired. The time grows in direct proportion to the input size. If you double the data, you double the time. This is what happens when you use a simple loop to scan through an unsorted list from start to finish.
O(n log n) - linearithmic time, the standard for any decent sorting algorithm, like merge sort or quick sort, the absolute best we can do for comparison based sorting.
O(n²) - quadratic time, this is where we enter the danger zone. If you double the input size, the execution time quadruples. You usually achieve this embarrassment by nesting a loop inside another loop because you didn't know how to utilize a hash map.
O(2^n) - exponential time, complete and utter failure. The operations double with every single item you add. If n hits 50, your program will literally outlive the universe.
If you only ever write code that processes a list of ten users, you can ignore all of this and keep writing your sloppy nested loops.But the moment your data scales to millions of users, an O(n²) algorithm will lock up the CPU, crash the server, and force your manager to ask why you didn't learn this in school. Asymptotic runtime allows you to predict these catastrophic bottlenecks before they happen, independent of hardware.
Next I probably need to show you how to analyze a nested loop or how to identify a constant-time operation...