The AAVSO International Database, AID, contains 32.7 million observations. That's a lot of photometry!. The break down is:
visual 17.1M
CCD 15.4M
PEP 0.14M
DSLR 0.10M
other 0.03M
In recent years timeseries observations have become popular. I think that this kind of observation makes it hard to answer the question: How much CCD photometry is the AAVSO doing? Is it trending up or down? And with interest in exoplanet growing we are going to have hours-long time series submitted to the AID. This is going to swamp the CCD photometry observation count.
These observations are held in a MySQL database. I was trying to think of a way to define a new column in the db, InTimeseries. But that kind of query is beyond me. But surely there is a clever SQL programmer in our association. I'm looking for you!
Details: the relevant fields of the aid.observations table are:
(star) name varchar(30)
obscode varchar(5)
JD varchar(16)
obstype ENUM ( CCD= 2)
unique_id auto_increment primary key
I'm looking for a query that could go back and fill in the this new field. I'm thinking that an observation would qualify as in-a-timeseries if a consecutive block of observations with common starname and obscode was more than 50 long and spanned less than a day. If being consecutive is key maybe we should survey the data: maybe there is a distinct shape to a histogram of block sizes. I believe the timeseries data would be contiguous when it is submitted.
Anyway, contact me you can suggest a way to mine this data.
Thanks,
George
Hello George,
I'm pretty good in MySQL. One might begin by crafting some SELECT statements to see what they flagged. Only once I had full confidence in how a read-only operation worked would I then write an UPDATE statement that used the result of the SELECT statement.
The SELECT statement approach is going to involve a JOIN of the table with itself. The reason for this is that you are asking to compare an entry in the table with other entries in the same table. This may be too expensive an approach because a JOIN of a 32 * 10^6 record table with itself may cause 1024 * 10^12 comparisons to be done. So I have other ideas on how to make this less performance-intensive, probably involving some off-line processing.
I will email you directly.
~Brian