Sorting Numbers

In this example, I will write a program that accepts a sequence of integers on the standard input and then prints them out in sorted order. Because I will be printing the numbers out in a (potentially) very different order from what they were input I have to store the numbers into an array as I read them, sort the array, and then output the resulting array in order.

As with Example #5, I will create several functions to help organize the program. Since I have not yet talked about how arrays can be passed to functions I will make my array a global variable. Here are the functions that I'm visualizing.

int read_input(void); /* This function reads integers from the standard input device until it comes to EOF. It loads a global array with those values and returns the number of values it read. */ void sort(int size); /* This function sorts the global array. However, it only sorts the first Size elements. Any elements after the first Size are ignored. */ void output_results(int size); /* This function prints out the array elements in order. It only prints the first Size elements. Any other elements are ignored. */

With these functions my program becomes

#include <stdio.h> int array[1000]; // I can handle only 1000 numbers at the most. int main(void) { int size; size = read_input(); sort(size); output_results(size); return 0; }

So far so good! The two functions read_input and output_results are fairly easy. Let me take care of them first. To make this example more interesting, I will use pointers to access the array instead of indicies. You should be comfortable with both methods (see if you can write a version of these functions that uses indicies).

int read_input(void) { int number; int *current = array; // Keep reading integers from the standard input until I hit EOF. while (scanf("%d", &number) != -1) { *current = number; current++; // If I read 1000 of them, abort this loop! if (current - array == 1000) break; } return current - array; }

Let's be sure this logic is not off by one. The first time through the loop I store a number into array[0] and then advance current so that it is one past the start of the array. So with one item in the array I have current - array at 1. That sounds fine. The next time through the loop I put an item into array[1] and then advance current again. Now there are two items in the array and current - array has a value of 2. When the loop breaks current - array will contain a count of the number of items I stored in the array. This is the information I want to return. Fine.

Now suppose I store something into array[999] (the last element). Afterward I will make current point just off the end of the array. Thus current - array will be 1000 and that will cause the if statement to trigger and the loop will end. I will return 1000 but that is correct because there would be, in fact, 1000 items in the array. Furthermore I will have avoided trying to store anything into array[1000] (which does not exist). Looks good! You should be watchful of these situations. Errors are easy and common.

The output_results function is easy. I'll output one integer on each line. If I wanted to organize them in a fancier way (say in columns) this function would have to work harder. That's something to do in version 2.0.

void output_results(int size) { int *p; for (p = array; p < array + size; p++) { printf("%d\n", *p); } }

That's about it. Here I'm printing array elements from 0 to size - 1. That is exactly size elements so that's just right.

The sort function is a bit harder. I'm going to write a function that implements the Bubble Sort algorithm. This method for sorting an array is fairly easy to understand and not too difficult to write. It is, however, rather slow. The time it takes to sort the array is proportional to the square of the number of elements it is sorting. Much more efficient sorting algorithms exist. However, bubble sorting will be good enough for now.

The method involves comparing pairs of array elements and swaping them if they are out of order. Consider the array

{ 40, 23, 57, 12, 10 }

I first compare the 40 and 23. Since they are out of order (I'm going to sort in ascending order), I swap them. Thus I get

{ 23, 40, 57, 12, 10 }

Now I move down one position and compare the 40 and 57. Since they are in order I leave them alone. I move down another position and compare the 57 and 12. They are out of order so I swap them. I get

{ 23, 40, 12, 57, 10 }

I move down one more position and compare the 57 and 10. They are also out of order so I swap them too. I get

{ 23, 40, 12, 10, 57 }

I've reached the end of the array. However, since I did at least one swap, I have to start over. I compare the 23 and 40. They are in order so I leave them alone. I compare the 40 and 12. They are out of order so I swap them. I get

{ 23, 12, 40, 10, 57 }

I compare the 40 and 10. They are out of order so I get

{ 23, 12, 10, 40, 57 }

Do you see what's happening? The big values are floating or "bubbling" toward the end and the little values are bubbling toward the beginning. Now I compare the 40 and the 57. They are in order so I leave them alone. I'm done with the array, but since I did at least one swap in that last pass, I have to do it all again.

I compare the 23 and 12. They are out of order.

{ 12, 23, 10, 40, 57 }

Now I compare the 23 and the 10. They are out of order.

{ 12, 10, 23, 40, 57 }

Everything else is in order so I leave it all alone. However, I still did at least one swap so I have to start over again. I compare the 12 and the 10. They are out of order

{ 10, 12, 23, 40, 57 }

Everything else is in order so I leave it all alone. Since I did at least one swap I have to start over. This time everything is in order. There are no swaps and the algorithm ends. The array is sorted. It's pretty boring work. I think we should get a computer to do it!

I will now write some p-code for the bubble sort.

DO <Set swap flag to NO> <Point at first two elements> DO IF <The current pair of elements is out of order> THEN <Swap them> <Set swap flag to YES> END <Advance pointers by one> WHILE <We are not pointing off the end of the array> WHILE <I swapped something>

In this case do/while loops seem like the best choice because I have to look at the array at least once before I can say anything about it. The p-code above has a problem if the array has less than 2 elements in it. However, in that case it is already sorted. (If there is only one thing in the array it must be in the right order!) So I'll just add a check for that before I get into the algorithm above.

IF <Array has less than 2 elements> THEN RETURN END

Here is my function, following the p-code.

void sort(int size) { int *p1, *p2; int did_swap; // If the array is too small to be interesting it is already sorted. if (size < 2) return; // Keep looping as long as I have to swap something. do { // Initialize for this pass. did_swap = 0; p1 = &array[0]; p2 = &array[1]; // Sweep down the array. do { // If the current two elements are out of order, swap them. if (*p1 > *p2) { int Temp = *p1; *p1 = *p2; *p2 = Temp; did_swap = 1; } // Get ready for the next pair of elements. p1++; p2++; } while (p2 < array + size); } while (did_swap); }

Actually the speed of this function can be doubled. Notice how after each pass the biggest element finds its way to the end of the array? Since it is always bigger than anything after it, the biggest element will get swapped all the way down to its final position. Thus it is not really necessary to sweep over the entire array every time. The second time through you can ignore the last element. The third time through you can ignore the last two elements. In this way you end up cutting the number of comparisons you have to do (in general) in half. That's great, but it still doesn't change the fact that this method requires a time proportional to the square of the number of elements. No matter how much you tweak this algorithm it is still slow. Nevertheless I invite you to optimize it by making the adjustment I described. Can you do so without introducing a bug?

The final program is in sort_int.c.

© Copyright 2003 by Peter C. Chapin.Last Revised: