Example #3

Program to Compute Prime Factors

In this example, I will build a program that uses several functions together. My program will take a series of numbers at its standard input and for each number display the prime factorization of that number.

First a little background: every integer (greater than 1) can be broken down into a product of possibly many prime numbers. This breakdown is unique: there is only one way to factor a number into primes. For example, the number 10 is 2*5. Ten is also 5*2 but that involves the same prime factors and so is really the same thing. The number 12 is 2*2*3 or 2^2 * 3 (two raised to the power of two times three). It is true that 12 is also 2*6 but 6 isn't prime. To get the prime factorization of 12 we need to factor 6 into 2*3. It is also true that 12 is 3*4 but 4 isn't prime. To get the prime factorization of 12 we need to factor 4 into 2*2. No matter how we do it we end up with 2*2*3. The prime factors of a number are, in effect, a "signature" for that number. There are various applications where knowing a number's prime factors is useful.

Here's how I'm visualizing my program working. It is always important to first understand exactly what is to be done. If this were a large program I would read (or write) a written specification. For this program I can get away with just showing a few examples.

$ prime_factors
10                <-- I type 10
2 * 5             <-- The program responds
12                <-- I type 12
2^2 * 3           <-- The program responds
^D                <-- I type CTRL+D to end the program

This user interface is very spartan. There are no prompts or other pretty output. However, there are advantages to doing it this way. Suppose I had a file containing a large number of numbers. I could filter that file through this program to create a new file containing the prime factorizations of all the numbers in the original file. Also, because of the way scanf works I can optionally input numbers all on one line.

$ prime_factors
10 12             <-- I type this
2 * 5             <-- The program responds
2^2 * 3           <-- The program responds
^D                <-- I type CTRL+D to end the program

Notice that the program prints out the factors in order from smallest to largest. I want it to do this. It makes it a lot easier to read. Notice also that the program combines equal factors into a single term involve a power. Instead of printing "2 * 2 * 3" for 12 it printed "2^2 * 3". I also want it to do this. Large numbers often have many equal factors and it would be tedious to have them all printed out separately.

Hopefully at this point it is fairly clear what the program is supposed to do. Now... how am I going to make it happen? Let me start with some p-code.

WHILE <There is a number at stdin> LOOP
  <Compute and display the prime factorization of that number>
END

This is fine, but it needs refining.

WHILE <There is a number at stdin> LOOP
  WHILE <The number is not prime> LOOP
    <Find the smallest prime factor and print it>
    <Update the number by dividing it by its smallest factor>
  END
  <Print the number (so that the last prime factor is displayed)>
END

Okay... this logic should compute the prime factors pretty well. Let's check it. Suppose I'm given a number like 12. Since 12 is not prime the program will go inside the inner while loop. The smallest prime factor of 12 is 2 so the program prints that. Next it computes 12/2, which is 6, and updates the number to 6. Since 6 is not prime the loop happens again. The smallest prime factor of 6 is also 2 so the program prints that. Next it computes 6/2, which is 3, and updates the number to 3. Since 3 is prime, the inner while loop ends and the program prints 3. The program outputs 2, 2, and 3. These are the correct factors, although the output isn't in the right format (yet).

If the given number is prime, this logic will just end up printing that number back again. I didn't really specify above what was supposed to happen in this case. However, I feel that just echoing back prime numbers is pretty reasonable. After all a prime number, like 17, is just "1 * 17." We can say that there is an implicit "1 *" in front of whatever the program outputs and everything is totally consistent.

I don't like to think about what would happen if my logic got an input number less than 2. Thus I should check for that. I choose to ignore such numbers. Here is the refined p-code

FUNCTION <Main>(VOID) RETURN <Integer status code>
  BEGIN
    WHILE <There is a number at stdin> LOOP

      # Ignore numbers I don't like.
      IF <The number is less than 2> CONTINUE;

      # Otherwise process the number.
      WHILE <The number is not prime> LOOP
        <Find the smallest prime factor and print it>
        <Update the number by dividing it by its smallest factor>
      END
      <Print the number (so that the last prime factor is displayed)>
    END
  END

Now as I think about this problem I realize that checking to see if a number is prime and finding its smallest prime factor can be done in a single step. Imagine a function called find_first_factor that takes a number and returns the smallest prime factor. If that function returned a value equal to the original number, the number must be prime. Here's how that function might look.

FUNCTION <find_first_factor>(Some number)
  RETURN <The smallest prime factor or the given number if it is prime>
  BEGIN
    FOR <All numbers from 2 that are less than the given number> LOOP
      IF <The loop index divides into the given number evenly> THEN
        RETURN <The loop index>
      END
    END
    RETURN <The given number>
  END

If you find my formal p-code a little daunting looking, don't worry about it. The point is that I'm building a separate function to locate the first prime factor of a number. By putting this operation in a separate function, I don't have to clutter up my main logic with these loops and conditions. Besides, I might find a function like this handy in some other program some day.

The method I use in this function is the brute force method. It simply tries dividing the given number by everything that is less than it. The very first number it finds that works must be the first prime factor (it can't be a non-prime because if it was then some earlier number would have worked too). This logic also works for the value 2 (check it and see). That's not too surprising because I'm borrowing logic from one of my earlier examples and I already figured out how to handle 2 at that time. I don't really have to worry about numbers less than 2 because in the original program such numbers are ignored before this function is called. However, if I wanted this function to be general purpose, I probably should put in some error checking (or properly document the function's limitations).

Okay... I'm going to code up what I've got so far just to see how it works. I realize that the output isn't in the right format yet but that's okay for now. In general you shouldn't try to get everything working at once. That's usually too much to try at a time and it causes difficulties. Solve your problems one step at a time.

#include <stdio.h>

//
// The following function finds the first prime factor in its parameter
// It returns that factor or, if the parameter is prime, it returns the
// parameter itself. This function had undefined behavior if given a
// value less than 2.
//
int find_first_factor(int number)
{
  int i;

  // Loop over all values from 2 that are less than the given number.
  for (i = 2; i < number; i++) {

   // If we find a factor, return it. Since this is the first, it must
   // be prime.
   if (number % i == 0) return i;
  }

  // If we get here, number must be prime.
  return number;
}

/*----------------------------------*/
/*           Main Program           */
/*----------------------------------*/

int main(void)
{
  int number;
  int prime_factor;

  // Keep reading numbers from stdin until there are no more.
  while (scanf("%d", &number) != -1) {

    // Ignore numbers I don't like.
    if (number < 2) continue;

    // Keep looping as long as number isn't prime.
    while ((prime_factor = find_first_factor(number)) != number) {

      // Print the factor I found above and update the number.
      printf("%d, ", prime_factor);
      number /= prime_factor;
    }

    // Print out the last factor.
    printf("%d\n", number);
  }

  return 0;
}

A little explaination is in order here. So far we have used scanf as if it returned nothing. In fact, scanf returns an integer. In particular, it returns a count of the number of things it successfully scanned. In my case I am asking scanf to scan for an integer. If it finds one it will return 1. However, if it comes to the end of the input before finding an integer it will return -1 to indicate an "error" condition. In that case, my outer while loop will end and the program will terminate.

With my inner while loop I use a style similar to what I showed for reading characters in a filter program. In particular, I execute a function, stash what it returns in a variable for later, and perform a test all inside the loop condition. The parentheses around the assignment are necesary because the precedence of assignment is normally lower than that of the test for equality. I explained this in Lesson #9. This technique is particular to C and languages derived from C. Many languages can't use assignment this way. However, it is very common in C so it's good to get used to seeing it.

When I built my p-code I realized in the back of my mind that I could do these two things. That knowledge influenced my p-code. In a sense that is "cheating". I'm thinking about how the coding will go during my design. Yet that is often necessary. As you gain more experience with what your language can do and how, you will also find yourself taking that into account as you design.

Okay... so now I compile and run my program to see how it is coming...

Hey! It looks beautiful.

The next step is to get the output in the right format. Hmmm. Here's a possibility.

FUNCTION <Main>(VOID) RETURN <Integer status code>
  BEGIN
    WHILE <There is a number at stdin> LOOP

      # Ignore numbers I don't like.
      IF <The number is less than 2> CONTINUE;

      <Set a "factor counter" to zero>

      # Otherwise process the number.
      WHILE <The number is not prime> LOOP
        <Find the smallest prime factor>

        # Do we just remember this factor?
        IF <This is the first factor (factor counter at zero) OR
            This is the same as the last factor> THEN
          <Increment a "factor counter">
        ELSE
          # This is something new. Handle the old stuff.
          <Display a term expressing all the accumulated factors>
          <Set the factor counter to one>
        END
        <Save the current factor as the "old" factor>
        <Update the number by dividing it by its smallest factor>
      END

      # We are done with the loop. Process any leftovers as well as the
      # last factor. The IF handles the case where the last factor
      # is the same as the ones we were accumulating. The ELSE handles
      # the case where the last factor is something different.
      #
      IF <The number is the same as the old factor> THEN
        <Increment the factor counter>
        <Display a term expressing all the accumulated factors>
      ELSE
        <Display a term expressing all the accumulated factors>
        <Print number (so that the last factor is displayed)>
      END

    END
  END

As you can see this logic is much more nasty. The reason for all the complexity is because I have to remember things about factors generated in the past. In particular I keep looping and each time I generate a factor that is the same as the old one I count it. When I see something different I then output a term that represents the old, accumulated factors, and get ready to count copies of the new one. Finally when the loop is finished I have to deal with leftover old factors that I've been counting as well as the last one. It's altogether rather unpleasent.

To make my life a bit easier, I'm going to push the operation "Display a term expressing all the accumulated factors" into a function. That way I don't have to worry about its details here. In this program, it will be a fairly simple function:

void display_term(int factor, int count)
{
  printf("%d^%d", factor, count);
}

Although it might not seem worthwhile to make such a simple function, there is some value in it. A future version of this program might try to display each term in a fancier way. For example, some terminals can actually display exponents. This function might be modified to detect such terminals and do whatever must be done to display the exponent nicely. In that respect, this function is more of a placeholder.

The final program is in prime_factors.c. It works as desired.

P.S. While testing my final program I found that it did things like this:

$ prime_factors
10                      <-- I typed this.
2^1 * 5                 <-- The program responded like this

The "2^1" is correct, but looks bad. I realized that it was doing this because my display_term function always uses the format "%d^%d" no matter what value is given as the count. To fix this problem I just had to modify display_term like so

void display_term(int factor, int count)
{
  if (count == 1) printf("%d", factor);
    else printf("%d^%d", factor, count);
}

Because I put the printing of the terms into a function I only had to modify a single place instead of the several places where I use the function. This is another nice feature of functions. They make updating the program easier. This is a particularly interesting example because I argued just a few lines above that it made sense to make the printing of terms a separate function precisely to make these sorts of modifications easier.