
README
======

This folder contains the starter files for the VariableStars database. The files are as follows:

    gcvs5.txt
  
The raw data, downloaded from the General Catalogue of Variable Stars (GCVS), version 5.1. The
home page for this data, along with additional information about it, is here:
[http://www.sai.msu.su/gcvs/gcvs/](http://www.sai.msu.su/gcvs/gcvs/).


    process-stars.awk
  
An Awk script that cleans the raw data and formats it as SQL INSERT statements. The script has
already been run; it is included here for reference only.


    star-tables-simple.sql
  
Definitions of two relations: one to hold constellation information and one to hold star
information.


    star-data-const.sql
    
The data needed to populate the constellation table (constellation names and abbreviations).
This data must be loaded before star-data.sql so that the referential integrity checks on the
star data pass.


    star-data.sql
  
The data set produced by process-stars.awk. This file contains information on 47660 variable
stars. One of the records is for a star with a period of more than 10,000 days, which does not
fit into the NUMERIC(7,3) data type used for the period (it allows at most 9999.999 days). As a
result, only 47659 records insert successfully. Ideally, process-stars.awk would check that
each field's value fits that field's declared data type and print a warning if it does not.
That way, the database administrator would get a heads-up about troublesome records before
trying to insert the data.
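A minimal sketch of such a check is shown below, written in Python rather than Awk for illustration. It assumes PostgreSQL-style NUMERIC(p, s) semantics: a value is rounded to s decimal places, and at most p - s digits may appear before the decimal point. The function names `fits_numeric` and `check_period` are hypothetical and are not part of process-stars.awk.

```python
import sys
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def fits_numeric(text, precision=7, scale=3):
    """Return True if `text` can be stored in a NUMERIC(precision, scale) column.

    Assumes PostgreSQL-style semantics: the value is rounded (half away from
    zero) to `scale` decimal places, and its absolute value must then be
    smaller than 10 ** (precision - scale).
    """
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False  # not a number at all (including an empty field)
    if not value.is_finite():
        return False  # NaN or Infinity
    limit = Decimal(10) ** (precision - scale)  # 10000 for NUMERIC(7,3)
    if abs(value) >= limit:
        return False  # too many digits before the decimal point
    # Rounding can still push a value over the limit, e.g. 9999.9995 -> 10000.000.
    rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    return abs(rounded) < limit


def check_period(record_id, period_text):
    """Print a warning to stderr and return False if the period will not fit.

    `record_id` is a hypothetical identifier used only in the warning message.
    """
    if fits_numeric(period_text):
        return True
    print(f"warning: record {record_id}: period {period_text!r} "
          f"does not fit NUMERIC(7,3)", file=sys.stderr)
    return False
```

For example, `fits_numeric("9999.999")` returns `True`, while `fits_numeric("10000")` returns `False`. Empty strings are also reported as not fitting, so a caller would need to handle fields that are meant to become NULL separately.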

The original file (gcvs5.txt) contains 58202 records, so 10542 records (about 18% of the
original data) were discarded by process-stars.awk, presumably because they have no specified
minimum brightness. This is a significant share of the data, and it probably reflects the
number of novae and supernovae in the original data set.

