Algorithm Performance (Asymptotic Notation)
(C) Copyright 2016 by Peter Chapin
===========================================

OVERVIEW

In this lesson I will cover the following topics.

1. What an algorithm is.

2. The different ways an algorithm can be "efficient."

3. How the efficiency of algorithms is compared.

4. Asymptotic notation for expressing algorithm performance.

BODY

*** What is an algorithm and what does "efficient" mean?

Just what is an algorithm anyway? An algorithm is basically a "method" for
doing something. The classic example is sorting. Suppose you wanted to sort
an array. How would you do it? There are actually many different methods (or
algorithms). Each has its own advantages and disadvantages. It is often not
obvious which one is the most appropriate. In this course I will talk about
some different algorithms for commonly performed tasks, and I will show you
how to implement those algorithms in a general purpose way using C.

Yet to understand what people mean when they speak of one algorithm being
"better" than another, we need a way to compare algorithms. Informally people
like to talk about how "efficient" an algorithm is. It is generally assumed
that more efficient algorithms are better. But just what does "efficient"
mean?

Usually efficiency refers to the speed of the algorithm. A more efficient
algorithm gets the job done in less time than a less efficient one. However,
there are other resources besides computer time that an algorithm might
consume. Sometimes people use "efficient" to mean less memory is used. You
could use "efficient" to mean less of anything... less network bandwidth
required, less disk space used, fewer keystrokes entered by the user, or, of
particular relevance to mobile and embedded computing, less battery power
consumed.

Often there are trade-offs among these various types of efficiency. For
example, one algorithm for doing a particular job might be sluggish but
require very little memory. Another algorithm for doing that same job might
be much faster, but require a great deal of memory. Which algorithm is
better? It depends. Which do you have: extra memory or extra processor time?

Yet despite all of this, people usually mean execution speed when they talk
about efficiency. Unless I say otherwise, that is what I will mean too.

*** How fast is fast?

Using real time to compare the speed of two algorithms is not very useful.
What does it mean to say that a certain sorting algorithm takes 3.24 seconds?
It doesn't mean much. The actual clock time it takes to do a job depends not
only on how clever the algorithm is, but also on how fast the computer is. A
time of 3.24 seconds on an 8 MHz, 16 bit embedded processor is one thing; on
a supercomputer it's something else entirely! Yet despite this, it is still
meaningful to talk about a "fast" algorithm and a "slow" one. All other
things being equal (same machine, etc.) a fast algorithm will outperform a
slow one.

Most algorithms operate on a large collection of data objects. Consider
sorting. You would expect that sorting 1000 items would take longer than
sorting 100 items. How much longer? We can't give actual times, of course,
since actual times depend on the machine and many other factors. But we can
say things like: sorting 1000 items with this algorithm takes 10 times as
long as sorting 100 items with it. An algorithm that has this
property---where the time required is proportional to the amount of data
that it operates on---is said to be a "linear time" algorithm.
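To make that concrete, here is a minimal sketch of a linear time operation in
C. The function name sum_array is my own invention for illustration; it is
not part of any library.

    #include <stddef.h>

    /* Add up the elements of an array. The loop touches each of the N
       elements exactly once, so the time required grows in direct
       proportion to N. This is a linear time algorithm. */
    long sum_array(const int *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; ++i) {
            total += data[i];
        }
        return total;
    }

Double the number of items and you double the work; that is all "linear time"
means.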
This linear time property will be the same no matter what machine you use to
run the algorithm or what language you program it in. It is a property of the
algorithm itself. More formally, the time required to execute a linear time
algorithm would be

    T = k1*N + k0

where k0 is a constant "overhead" time that is required even if there are
zero items to process and k1 is a constant of proportionality. The number,
N, represents the amount of data the algorithm is working on.

The overhead time is almost always ignored. It doesn't matter how large or
small it is; that time is considered insignificant. Why? Because we are
really only worried about the case when N is large. All algorithms are fast
when they work on a small amount of data. Yet when N is large enough to be
interesting the value of k1*N will be much larger than k0 (normally... in
rare cases this is not true).

For example, suppose an algorithm requires 100 ms of computer time on a
particular machine to get set up and 1 ms per data item to execute. If you
only have 10 data items to process the time required would be

    T = (1 ms/item) * 10 items + 100 ms = 110 ms

In this case most of the time required is in the overhead time. Yet the whole
operation only took 110 ms -- not long regardless. Now suppose you had 1
million items to process

    T = (1 ms/item) * 1000000 items + 100 ms = 1000.1 seconds

Here the overhead time is totally negligible. This is the situation we are
worried about. How can we get that 1000 seconds down to something shorter?

Obviously we could use a faster machine, a faster language, or a more
carefully written program. Such techniques often can improve performance by a
factor of two or more. Suppose you use a new computer that is twice as fast.
Now the time to process each item is 500 microseconds. The overhead time has
also been reduced, to 50 ms, but that is irrelevant since it does not
contribute significantly to the overall time anyway. (This is something to
think about before you spend hours and hours trying to "optimize" a program.
If you optimize the wrong thing, your work will have no significant effect.)
Now you have

    T = (0.5 ms/item) * 1000000 items + 50 ms = 500.05 seconds

This is fine, but as you will soon see, choosing a different algorithm might
make a much more dramatic improvement.

*** Non-linear time.

You might think that every algorithm would be a linear time algorithm. After
all, if you have N items to process, what else is there to do but process
each item? In fact, linear time algorithms are very common. Yet they are not
universal. There are algorithms that are worse... and better.

Sorting algorithms typically have to fiddle with each item they are sorting
more than once. As the algorithm works, the data gets shuffled around again
and again. Many sorting algorithms operate in quadratic time (a sketch of one
such algorithm appears below).

    T = k2*N^2 + k1*N + k0

The total time to complete is proportional to the number of items squared
plus additional terms. Since we are only interested in the case where N is
large, the k1*N + k0 terms are irrelevant. For a large enough N, the N^2 term
will dominate.

Now here is the good part: a quadratic time algorithm is ultimately slower
than a linear time one -- no matter what machine you use. How is that
possible? Will an 8 MHz processor running a linear time algorithm outperform
a supercomputer running a quadratic time algorithm? Yes! If N is large enough
the microcontroller will outperform the supercomputer.
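First, here is the promised sketch of a typical quadratic time sorting
algorithm: selection sort, written in C. This particular implementation is
just mine, for illustration.

    #include <stddef.h>

    /* Selection sort: for each position, scan the rest of the array for
       the smallest remaining item and swap it into place. The nested
       loops perform roughly N^2/2 comparisons, which is what makes this
       a quadratic time algorithm. */
    void selection_sort(int *data, size_t n)
    {
        for (size_t i = 0; i + 1 < n; ++i) {
            size_t smallest = i;
            for (size_t j = i + 1; j < n; ++j) {
                if (data[j] < data[smallest]) smallest = j;
            }
            int temp = data[i];
            data[i] = data[smallest];
            data[smallest] = temp;
        }
    }

The nested loops over the same data are the telltale sign: the inner loop
runs (on average) N/2 times for each of the N iterations of the outer loop.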
To see why the linear time algorithm eventually wins, consider the two
formulas below. Since we are talking about a large N, I will ignore the lower
order terms.

    micro: T1 = k1 * N      (k1 is large because the micro is slow)
    super: T2 = k2 * N^2    (k2 is small because the supercomputer is fast)

Despite the fact that k1 > k2, the time T1 will be less than T2 for some
large value of N. As N becomes larger and larger, N^2 grows much faster than
N. Eventually N^2 times a small k2 value will yield a big number. Since k1*N
is growing more slowly, it will eventually fall behind, despite the large k1
value. Just take a look at how N and N^2 behave:

    Number of items          N          N^2
    ---------------    -------      -------
                  1          1            1
                 10         10          100
                100        100        10000
               1000       1000      1000000
            1000000    1000000      1.0e+12

(Not even the supercomputer will like that last entry.)

Quadratic time algorithms are bad news. They work fine for a small number of
items, but if you have to process a billion items using one, you should just
give up. If a linear time algorithm is available to do the same job, jump on
it. A weak machine running a linear time algorithm will blow a supercomputer
running a quadratic time algorithm out of the water. Believe it.

There are algorithms that are even slower than quadratic time. Using Cramer's
rule for solving systems of simultaneous equations requires time proportional
to N*(N!) where N is the number of variables being solved for. Solving two
equations with two unknowns this way is fine, but what if you wanted to solve
10000 equations with 10000 unknowns? It would take longer than the age of the
universe on any computer that could conceivably be constructed. Yet such
applications exist. Serious programs that need to solve a large number of
simultaneous equations might instead use a method called "Gaussian
Elimination." This method runs in cubic time which, while admittedly worse
than quadratic time, is far superior to the behavior of Cramer's rule.

You see what has happened here? We've started to talk about the performance
of different algorithms without fussing at all about the particulars of any
one machine. We don't need to know what machine you will use. A cubic time
algorithm is going to be slow compared to a quadratic time one and incredibly
slow compared to a linear time one.

*** Big O notation.

Computer scientists have developed a notation to express the ideas I have
been talking about informally so far. I will spare you the detailed
mathematical definition of this notation. Intuitively it works just as I've
been describing. An algorithm is said to be O(N) (read "order N") if it runs
in linear time. This means that in the limit, as N approaches infinity, the
time required for the algorithm is bounded above by something proportional to
N. What happens when N is small is not the point (and of little practical
significance in most cases). Here is a summary of the common running times
one sees.

O(N): Linear time. This is quite common. Many algorithms have this running
time. It is okay -- neither exceptionally good nor horribly bad.

O(N*log(N)): This means the running time is proportional to N times log(N).
This is not as good as O(N), but it is better than O(N^2). There are several
good sorting algorithms that do this, and for large N they are much better
than their O(N^2) cousins.

O(N^2): Quadratic time. This is starting to get not so good. Sometimes it's
the best you can do, and that hurts.

O(N^3): Cubic time. There are some important algorithms that are like this.
They are considered very slow.

O(e^N): Exponential time. Very bad news. Algorithms like this have the
potential to keep a supercomputer busy for times longer than the age of the
universe. Such algorithms are only practical for very small values of N.
Exponential algorithms are slower than any polynomial algorithm (N^3, N^4,
N^5, etc.).

O(log(N)): Logarithmic time. Very good news. Algorithms like this take only a
little more time even after adding a huge number of additional data items to
process. O(log(N)) algorithms are great. (Binary search, sketched below, is
the classic example.)

O(1): Constant time. The best news. Algorithms like this take the same amount
of time no matter how many data items they process. They consist purely of
constant overhead (which is often very small). They are "instantaneous" as
far as we are concerned here. Many important operations are constant time
operations and that is a very good thing.
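Here is the promised binary search sketch in C, again my own illustrative
code. It assumes the array is already sorted.

    #include <stddef.h>

    /* Binary search in a sorted array. Each comparison discards half of
       the remaining candidates, so doubling N adds only one more step:
       logarithmic time. Returns the index of target, or -1 if target is
       not present. */
    long binary_search(const int *data, size_t n, int target)
    {
        size_t low  = 0;
        size_t high = n;    /* search the half-open range [low, high) */

        while (low < high) {
            size_t mid = low + (high - low) / 2;
            if (data[mid] == target) return (long)mid;
            if (data[mid] < target)  low = mid + 1;
            else                     high = mid;
        }
        return -1;
    }

Searching a million sorted items takes only about 20 comparisons, because
2^20 is roughly a million.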
*** An example.

All of my talk so far has been in the abstract. Let's make it a bit more
concrete. Let's talk about measuring the length of a string. In C, strings
are null terminated (that is, their end is marked with a special "null"
character). To find out how long a string is you can use the library function
strlen.

    #include <string.h>

    char *p = "Hello, World!";
    int   Length;

    Length = strlen(p);

How does strlen work? It has to scan down the string, looking at each
character, until it finds the null character. It has to count every character
that it finds before that point. Thus strlen is an O(N) algorithm (linear
time). Do you see that?

Most strlen functions have been written in assembly language to use whatever
special instructions the processor provides for scanning blocks of memory.
This speeds them up considerably. Also, most strlen functions don't actually
count the characters as they go; they just find the ending address of the
string and subtract the starting address from it. This also speeds them up.
Yet the fact remains: it is an O(N) algorithm. The function has to touch
every character in the string to find its end.

In C++ the standard string class has a length member function you can use to
find the length of the string.

    std::string Message("Hello, World!");
    int         Length;

    Length = Message.length();

How does this work? It will depend on how std::strings are managed
internally. Every implementation might be a little different. Yet it is
possible for std::string to keep the length of the string "on hand" as a
separate integer member of the string class. Whenever an operation is done
that changes the length, it can update that member accordingly. For example,
if you appended two strings, the append function could add together the
lengths of each to get the new length. That would be quick and easy.

If std::string did this, it could give you the length by just returning the
separate member. It would be an O(1) algorithm (constant time). There would
be no need for the function to paw its way through the string looking for the
end. It would have the length just sitting there, ready to return at once.
Asking for the length of a string 1 million characters long would take the
same (short) time required to get the length of a string 10 characters long.
It would be FAR faster than strlen. Most commercial implementations of
std::string do this or something like it. In fact, the C++ standard pretty
much mandates it.

If your strings are all short, it probably doesn't matter much which approach
you use. But if you need to deal with very long strings, std::string may well
be significantly faster in this respect than its simpler C counterpart.
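To make the contrast concrete, here is a sketch in C of both approaches. The
names my_strlen and counted_string are hypothetical; real implementations
differ in detail, but the costs are the same.

    #include <stddef.h>

    /* A strlen-style scan: touch every character until the null
       terminator is found. The time grows with the length of the
       string: O(N). */
    size_t my_strlen(const char *s)
    {
        const char *p = s;
        while (*p != '\0') {
            ++p;
        }
        return (size_t)(p - s);  /* ending address minus starting address */
    }

    /* The length-caching idea: store the length alongside the characters
       and keep it up to date in every operation that changes the string. */
    struct counted_string {
        char  *text;     /* the characters, null terminated */
        size_t length;   /* always kept current             */
    };

    /* Asking for the length is now a single member access: O(1), no
       matter how long the string is. */
    size_t counted_length(const struct counted_string *s)
    {
        return s->length;
    }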
SUMMARY

1. An algorithm is a method for doing some task. Many tasks can be done in
   various ways, but each method has its own advantages and disadvantages.
   For example, there are several well known sorting algorithms, and yet
   there is no single "best" sorting algorithm because each algorithm tends
   to have its own advantages.

2. Normally when people speak of the efficiency of an algorithm, they are
   talking about its speed. However, algorithms can also be compared in
   terms of memory usage or, for that matter, the usage of any other scarce
   resource. Typically there is a trade-off between speed and memory
   consumption; fast algorithms sometimes take up a lot of extra memory
   (though not always).

3. When comparing the efficiency of two algorithms, you do not want to
   concern yourself with the details of the hardware on which they will run.
   Instead, the usual approach is to imagine each algorithm working on
   greater and greater quantities of data and to describe how the time
   required increases with the amount of data to be processed.

4. Big O notation gives us a way to quantify the performance of an
   algorithm. If an algorithm runs in a time proportional to the number of
   data items, we say that it is an O(N) or "linear time" algorithm. If it
   runs in a time proportional to the number of data items squared, we say
   that it is an O(N^2) or "quadratic time" algorithm. A fast algorithm
   might run in a time proportional to the log of the number of data items.
   Such an algorithm is said to be O(log(N)).