Algorithm Performance (Asymptotic Notation)
(C) Copyright 2016 by Peter Chapin
===========================================

OVERVIEW

In this lesson I will cover the following topics.

1. What an algorithm is.

2. The different ways an algorithm can be "efficient."

3. How the efficiency of algorithms is compared.

4. Asymptotic notation for expressing algorithm performance.

BODY

*** What is an algorithm and what does "efficient" mean?

Just what is an algorithm anyway? An algorithm is basically a "method" for
doing something. The classic example is sorting. Suppose you wanted to sort
an array. How would you do it? There are actually many different methods (or
algorithms). Each has its own advantages and disadvantages. It is often not
obvious which one is the most appropriate. In this course I will talk about
some different algorithms for commonly performed tasks, and I will show you
how to implement those algorithms in a general purpose way using C.

Yet to understand what people mean when they speak of one algorithm being
"better" than another, we need a way to compare algorithms. Informally people
like to talk about how "efficient" an algorithm is. It is generally assumed
that more efficient algorithms are better. But just what does "efficient"
mean?

Usually efficiency refers to the speed of the algorithm. A more efficient
algorithm gets the job done in less time than a less efficient one. However,
there are other resources besides computer time that an algorithm might
consume. Sometimes people use "efficient" to mean less memory is used. You
could use "efficient" to mean less of anything... less network bandwidth
required, less disk space used, fewer keystrokes entered by the user, or, of
particular relevance to mobile and embedded computing, less battery power
consumed.

Often there are trade-offs among these various types of efficiency. For
example, one algorithm for doing a particular job might be sluggish but
require very little memory. Another algorithm for doing that same job might
be much faster, but require a great deal of memory. Which algorithm is
better? It depends. Which do you have: extra memory or extra processor time?

Yet despite all of this, people usually mean execution speed when they talk
about efficiency. Unless I say otherwise, that is what I will mean too.

*** How fast is fast?

Using real time to compare the speed of two algorithms is not very useful.
What does it mean to say that a certain sorting algorithm takes 3.24 seconds?
It doesn't mean much. The actual clock time it takes to do a job depends not
only on how clever the algorithm is, but also on how fast the computer is. A
time of 3.24 seconds on an 8 MHz, 16 bit embedded processor is one thing; on
a supercomputer it's something else entirely! Yet despite this, it is still
meaningful to talk about a "fast" algorithm and a "slow" one. All other
things being equal (same machine, etc.) a fast algorithm will outperform a
slow one.

Most algorithms operate on a large collection of data objects. Consider
sorting. You would expect that sorting 1000 items would take longer than
sorting 100 items. How much longer? We can't give actual times, of course,
since actual times depend on the machine and many other factors. But we can
say things like: sorting 1000 items with this algorithm takes 10 times as
long as sorting 100 items with it. An algorithm that has this
property---where the time required is proportional to the amount of data
that it operates on---is said to be a "linear time" algorithm.
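To make that concrete, here is a minimal sketch of a linear time operation in
C. The function name sum_array is my own invention for illustration; it is
not part of any library.

    #include <stddef.h>

    /* Add up the elements of an array. The loop touches each of the N
       elements exactly once, so the time required grows in direct
       proportion to N. This is a linear time algorithm. */
    long sum_array(const int *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; ++i) {
            total += data[i];
        }
        return total;
    }

Double the number of items and you double the work; that is all "linear time"
means.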
This linear time property will be the same no matter what machine you use to
run the algorithm or what language you program it in. It is a property of the
algorithm itself. More formally, the time required to execute a linear time
algorithm would be

    T = k1*N + k0

where k0 is a constant "overhead" time that is required even if there are
zero items to process and k1 is a constant of proportionality. The number,
N, represents the amount of data the algorithm is working on.

The overhead time is almost always ignored. It doesn't matter how large or
small it is; that time is considered insignificant. Why? Because we are
really only worried about the case when N is large. All algorithms are fast
when they work on a small amount of data. Yet when N is large enough to be
interesting the value of k1*N will be much larger than k0 (normally... in
rare cases this is not true).

For example, suppose an algorithm requires 100 ms of computer time on a
particular machine to get set up and 1 ms per data item to execute. If you
only have 10 data items to process the time required would be

    T = (1 ms/item) * 10 items + 100 ms = 110 ms

In this case most of the time required is in the overhead time. Yet the whole
operation only took 110 ms -- not long regardless. Now suppose you had 1
million items to process

    T = (1 ms/item) * 1000000 items + 100 ms = 1000.1 seconds

Here the overhead time is totally negligible. This is the situation we are
worried about. How can we get that 1000 seconds down to something shorter?

Obviously we could use a faster machine, a faster language, or a more
carefully written program. Such techniques often can improve performance by a
factor of two or more. Suppose you use a new computer that is twice as fast.
Now the time to process each item is 500 microseconds. The overhead time has
also been reduced, to 50 ms, but that is irrelevant since it does not
contribute significantly to the overall time anyway. (This is something to
think about before you spend hours and hours trying to "optimize" a program.
If you optimize the wrong thing, your work will have no significant effect.)
Now you have

    T = (0.5 ms/item) * 1000000 items + 50 ms = 500.05 seconds

This is fine, but as you will soon see, choosing a different algorithm might
make a much more dramatic improvement.

*** Non-linear time.

You might think that every algorithm would be a linear time algorithm. After
all, if you have N items to process, what else is there to do but process
each item? In fact, linear time algorithms are very common. Yet they are not
universal. There are algorithms that are worse... and better.

Sorting algorithms typically have to fiddle with each item they are sorting
more than once. As the algorithm works, the data gets shuffled around again
and again. Many sorting algorithms operate in quadratic time (a sketch of one
such algorithm appears below).

    T = k2*N^2 + k1*N + k0

The total time to complete is proportional to the number of items squared
plus additional terms. Since we are only interested in the case where N is
large, the k1*N + k0 terms are irrelevant. For a large enough N, the N^2 term
will dominate.

Now here is the good part: a quadratic time algorithm is ultimately slower
than a linear time one -- no matter what machine you use. How is that
possible? Will an 8 MHz processor running a linear time algorithm outperform
a supercomputer running a quadratic time algorithm? Yes! If N is large enough
the microcontroller will outperform the supercomputer.
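First, here is the promised sketch of a typical quadratic time sorting
algorithm: selection sort, written in C. This particular implementation is
just mine, for illustration.

    #include <stddef.h>

    /* Selection sort: for each position, scan the rest of the array for
       the smallest remaining item and swap it into place. The nested
       loops perform roughly N^2/2 comparisons, which is what makes this
       a quadratic time algorithm. */
    void selection_sort(int *data, size_t n)
    {
        for (size_t i = 0; i + 1 < n; ++i) {
            size_t smallest = i;
            for (size_t j = i + 1; j < n; ++j) {
                if (data[j] < data[smallest]) smallest = j;
            }
            int temp = data[i];
            data[i] = data[smallest];
            data[smallest] = temp;
        }
    }

The nested loops over the same data are the telltale sign: the inner loop
runs (on average) N/2 times for each of the N iterations of the outer loop.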
To see why the linear time algorithm eventually wins, consider the two
formulas below. Since we are talking about a large N, I will ignore the lower
order terms.

    micro: T1 = k1 * N      (k1 is large because the micro is slow)
    super: T2 = k2 * N^2    (k2 is small because the supercomputer is fast)

Despite the fact that k1 > k2, the time T1 will be less than T2 for some
large value of N. As N becomes larger and larger, N^2 grows much faster than
N. Eventually N^2 times a small k2 value will yield a big number. Since k1*N
is growing more slowly, it will eventually fall behind, despite the large k1
value. Just take a look at how N and N^2 behave:

    Number of items          N          N^2
    ---------------    -------      -------
                  1          1            1
                 10         10          100
                100        100        10000
               1000       1000      1000000
            1000000    1000000      1.0e+12

(Not even the supercomputer will like that last entry.)

Quadratic time algorithms are bad news. They work fine for a small number of
items, but if you have to process a billion items using one, you should just
give up. If a linear time algorithm is available to do the same job, jump on
it. A weak machine running a linear time algorithm will blow a supercomputer
running a quadratic time algorithm out of the water. Believe it.

There are algorithms that are even slower than quadratic time. Using Cramer's
rule for solving systems of simultaneous equations requires time proportional
to N*(N!) where N is the number of variables being solved for. Solving two
equations with two unknowns this way is fine, but what if you wanted to solve
10000 equations with 10000 unknowns? It would take longer than the age of the
universe on any computer that could conceivably be constructed. Yet such
applications exist. Serious programs that need to solve a large number of
simultaneous equations might instead use a method called "Gaussian
Elimination." This method runs in cubic time which, while admittedly worse
than quadratic time, is far superior to the behavior of Cramer's rule.

You see what has happened here? We've started to talk about the performance
of different algorithms without fussing at all about the particulars of any
one machine. We don't need to know what machine you will use. A cubic time
algorithm is going to be slow compared to a quadratic time one and incredibly
slow compared to a linear time one.

*** Big O notation.

Computer scientists have developed a notation to express the ideas I have
been talking about informally so far. I will spare you the detailed
mathematical definition of this notation. Intuitively it works just as I've
been describing. An algorithm is said to be O(N) (read "order N") if it runs
in linear time. This means that in the limit, as N approaches infinity, the
time required for the algorithm is bounded above by something proportional to
N. What happens when N is small is not the point (and of little practical
significance in most cases). Here is a summary of the common running times
one sees.

O(N): Linear time. This is quite common. Many algorithms have this running
time. It is okay -- neither exceptionally good nor horribly bad.

O(N*log(N)): This means the running time is proportional to N times log(N).
This is not as good as O(N), but it is better than O(N^2). There are several
good sorting algorithms that do this, and for large N they are much better
than their O(N^2) cousins.

O(N^2): Quadratic time. This is starting to get not so good. Sometimes it's
the best you can do, and that hurts.

O(N^3): Cubic time. There are some important algorithms that are like this.
They are considered very slow.

O(e^N): Exponential time. Very bad news. Algorithms like this have the
potential to keep a supercomputer busy for times longer than the age of the
universe. Such algorithms are only practical for very small values of N.
Exponential algorithms are slower than any polynomial algorithm (N^3, N^4,
N^5, etc.).

O(log(N)): Logarithmic time. Very good news. Algorithms like this take only a
little more time even after adding a huge number of additional data items to
process. O(log(N)) algorithms are great. (Binary search, sketched below, is
the classic example.)

O(1): Constant time. The best news. Algorithms like this take the same amount
of time no matter how many data items they process. They consist purely of
constant overhead (which is often very small). They are "instantaneous" as
far as we are concerned here. Many important operations are constant time
operations and that is a very good thing.
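Here is the promised binary search sketch in C, again my own illustrative
code. It assumes the array is already sorted.

    #include <stddef.h>

    /* Binary search in a sorted array. Each comparison discards half of
       the remaining candidates, so doubling N adds only one more step:
       logarithmic time. Returns the index of target, or -1 if target is
       not present. */
    long binary_search(const int *data, size_t n, int target)
    {
        size_t low  = 0;
        size_t high = n;    /* search the half-open range [low, high) */

        while (low < high) {
            size_t mid = low + (high - low) / 2;
            if (data[mid] == target) return (long)mid;
            if (data[mid] < target)  low = mid + 1;
            else                     high = mid;
        }
        return -1;
    }

Searching a million sorted items takes only about 20 comparisons, because
2^20 is roughly a million.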
*** An example.

All of my talk so far has been in the abstract. Let's make it a bit more
concrete. Let's talk about measuring the length of a string. In C, strings
are null terminated (that is, their end is marked with a special "null"
character). To find out how long a string is you can use the library function
strlen.

    #include <string.h>

    char *p = "Hello, World!";
    int   Length;

    Length = strlen(p);

How does strlen work? It has to scan down the string, looking at each
character, until it finds the null character. It has to count every character
that it finds before that point. Thus strlen is an O(N) algorithm (linear
time). Do you see that?

Most strlen functions have been written in assembly language to use whatever
special instructions the processor provides for scanning blocks of memory.
This speeds them up considerably. Also, most strlen functions don't actually
count the characters as they go; they just find the ending address of the
string and subtract the starting address from it. This also speeds them up.
Yet the fact remains: it is an O(N) algorithm. The function has to touch
every character in the string to find its end.

In C++ the standard string class has a length member function you can use to
find the length of the string.

    std::string Message("Hello, World!");
    int         Length;

    Length = Message.length();

How does this work? It will depend on how std::strings are managed
internally. Every implementation might be a little different. Yet it is
possible for std::string to keep the length of the string "on hand" as a
separate integer member of the string class. Whenever an operation is done
that changes the length, it can update that member accordingly. For example,
if you appended two strings, the append function could add together the
lengths of each to get the new length. That would be quick and easy.

If std::string did this, it could give you the length by just returning the
separate member. It would be an O(1) algorithm (constant time). There would
be no need for the function to paw its way through the string looking for the
end. It would have the length just sitting there, ready to return at once.
Asking for the length of a string 1 million characters long would take the
same (short) time required to get the length of a string 10 characters long.
It would be FAR faster than strlen. Most commercial implementations of
std::string do this or something like it. In fact, the C++ standard pretty
much mandates it.

If your strings are all short, it probably doesn't matter much which approach
you use. But if you need to deal with very long strings, std::string may well
be significantly faster in this respect than its simpler C counterpart.
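To make the contrast concrete, here is a sketch in C of both approaches. The
names my_strlen and counted_string are hypothetical; real implementations
differ in detail, but the costs are the same.

    #include <stddef.h>

    /* A strlen-style scan: touch every character until the null
       terminator is found. The time grows with the length of the
       string: O(N). */
    size_t my_strlen(const char *s)
    {
        const char *p = s;
        while (*p != '\0') {
            ++p;
        }
        return (size_t)(p - s);  /* ending address minus starting address */
    }

    /* The length-caching idea: store the length alongside the characters
       and keep it up to date in every operation that changes the string. */
    struct counted_string {
        char  *text;     /* the characters, null terminated */
        size_t length;   /* always kept current             */
    };

    /* Asking for the length is now a single member access: O(1), no
       matter how long the string is. */
    size_t counted_length(const struct counted_string *s)
    {
        return s->length;
    }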
SUMMARY

1. An algorithm is a method for doing some task. Many tasks can be done in
   various ways, but each method has its own advantages and disadvantages.
   For example, there are several well known sorting algorithms, and yet
   there is no single "best" sorting algorithm because each algorithm tends
   to have its own advantages.

2. Normally when people speak of the efficiency of an algorithm, they are
   talking about its speed. However, algorithms can also be compared in
   terms of memory usage or, for that matter, the usage of any other scarce
   resource. Typically there is a trade-off between speed and memory
   consumption; fast algorithms sometimes take up a lot of extra memory
   (though not always).

3. When comparing the efficiency of two algorithms, you do not want to
   concern yourself with the details of the hardware on which they will run.
   Instead, the usual approach is to imagine each algorithm working on
   greater and greater quantities of data and to describe how the time
   required increases with the amount of data to be processed.

4. Big O notation gives us a way to quantify the performance of an
   algorithm. If an algorithm runs in a time proportional to the number of
   data items, we say that it is an O(N) or "linear time" algorithm. If it
   runs in a time proportional to the number of data items squared, we say
   that it is an O(N^2) or "quadratic time" algorithm. A fast algorithm
   might run in a time proportional to the log of the number of data items.
   Such an algorithm is said to be O(log(N)).