DevAnn #2: AEF Note sub-fields

Affiliation
American Association of Variable Star Observers (AAVSO)
Tue, 06/13/2017 - 21:05

This forum thread was introduced with the DevAnn #2  

     The definition of the AEF (AAVSO Extended File) Format is described here. Adherence to the standard format ensures that each part of each record in your report goes into the correct field in the AAVSO International Database (AID).

     The AEF was defined years ago. It is a marvel of conciseness and was designed to accommodate many different observing modes. But as time goes on and our observation techniques change, the AEF needs to evolve.
     What are some of the issues?
     - The AEF format assumes that the comp star reference magnitude will be accessible via the chart reference and the comp star name. That is required for visual obs, but not for CCD, so people are submitting comp star references where the chain from comp star name to chart to reference magnitude is broken. This is important because the comp ref is used to standardize the target magnitude.
     - The AEF accepts transformed data, but there is no way to submit the assumptions behind the transform computation.
     - Similarly, ensemble submissions have no backup or reference data.

     No one wants to consider a complete revamping of the format: there are too many dependencies on the current AID. And that will not be necessary if we use a sub-field mechanism which allows more data to be packed into the record without changing the number of fields of the basic record.

     So, what are sub-fields? According to the documentation:
     ----------------
NOTES: Comments or notes about the observation. If no comments, use "na". This field has a maximum length of several thousand characters, so you can be as descriptive as necessary. One convention for including a lot of information as concisely as possible is to use subfields in the format |A=B; the '|' character is the separator, A is a keyvalue name and B is its value. If you need an alternative delimiter from '|', use it but precede the first instance with "DELIM=". Using this mechanism you can document your transform process in more detail. Here is an example as used by TransformApplier:
5 records aggregated|VMAGINS=-7.244|VERR=0.006|CREFMAG=13.793|CREFERR=0.026|
KREFMAG=14.448|KREFERR=0.021|VX=1.1501|CX=1.1505|KX=1.1500|Tv_bv=0.0090|Tv_bvErr=0.0100|TAver=2.47
     -------------------
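
     A minimal Python sketch of how a reader of the data might unpack the |A=B convention quoted above. The function name parse_subfields is hypothetical, it handles only the default '|' separator (the DELIM= alternative is not implemented), and it treats any piece without an '=' as free text. This is not the TransformApplier or WebObs code.

```python
def parse_subfields(notes, sep="|"):
    """Return (free_text, subfields) from an AEF NOTES string.

    Assumption: pieces of the form A=B are subfields; anything else
    is ordinary comment text. Values are left as strings.
    """
    free_text = []
    subfields = {}
    for part in notes.split(sep):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from a trailing separator
        key, eq, value = part.partition("=")
        if eq and key.strip():
            subfields[key.strip()] = value.strip()
        else:
            free_text.append(part)
    return " ".join(free_text), subfields


# The TransformApplier example from the documentation, joined into one line
notes = ("5 records aggregated|VMAGINS=-7.244|VERR=0.006|CREFMAG=13.793|CREFERR=0.026|"
         "KREFMAG=14.448|KREFERR=0.021|VX=1.1501|CX=1.1505|KX=1.1500|"
         "Tv_bv=0.0090|Tv_bvErr=0.0100|TAver=2.47")
text, fields = parse_subfields(notes)
print(text)
print(float(fields["VMAGINS"]), float(fields["CREFMAG"]))
```

     Keeping the values as strings leaves the decision about numeric conversion to the caller, since the documentation does not say that every value must be a number.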

     This definition was set up a year ago with the advent of the TransformApplier application. It created a mechanism where the user of the data could see exactly the basis of the transform computation. It also documents the instrumental magnitude of the target (VMAGINS) and the reference magnitude of the comp (CREFMAG). So if someone doesn't believe the transformed mag, they can recover the untransformed standardized mag: (VMAGINS - CMAG) + CREFMAG.
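
     As a worked illustration of that recovery (target instrumental magnitude minus comp instrumental magnitude, plus comp reference magnitude), here is a small Python sketch. VMAGINS and CREFMAG are taken from the example record; the comp instrumental magnitude of -6.500 is invented for the illustration, since the example record does not carry it in the subfields.

```python
def standardized_mag(vmagins, cmag, crefmag):
    """Untransformed standardized magnitude of the target.

    vmagins: instrumental magnitude of the target
    cmag:    instrumental magnitude of the comp
    crefmag: reference (catalog) magnitude of the comp
    """
    return (vmagins - cmag) + crefmag


# cmag=-6.500 is a made-up value, not part of the example record
mag = standardized_mag(vmagins=-7.244, cmag=-6.500, crefmag=13.793)
print(round(mag, 3))
```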

     The point of including more documentation with your data submission is to give the user of your data more flexibility. They will better understand the conditions under which the obs was taken and will be able to get back to the raw observation in case they want to re-process it (e.g. the comp data has changed).

     So here are some subfields all submitters might consider including:
     VMAGINS   the instrumental magnitude of the target
     VERR      the error of the instrumental magnitude of the target

     Document the comp star, because this is the basis of the standardization and might not be available via the chartid and cname.
     CMAGREF
     CREFERR

     Ensembles are a sticky issue. The number of stars in the ensemble could be large and the methodology of the averaging could be various: almost impossible to document. So submit some information so a user can get back to the raw observation. Include the VMAGINS and VERR so they have your raw target info. And then include information on just one of your ensemble stars so they can re-standardize in a single comp mode.
     CNAME
     CMAGINS
     CERR
         as well as
     CMAGREF
     CMAGERR

     I present this here to get the discussion started. Let's see about creating a standard list of keywords and discuss when and how they should be used in the data submitted to WebObs via the AEF.

     George

Affiliation
Astronomical Society of South Australia (ASSAU)
Permitted characters and case

Hi George

This looks good!

Initial comments/questions:

  • Permitted characters in name and value: anything but "|", "=" or whitespace?
  • Case of name: upper only or case insensitive? Obviously you can answer that question with respect to TA, but do we want to constrain names (keys) to uppercase or allow any case or mixture thereof?

Thanks!

David

Affiliation
American Association of Variable Star Observers (AAVSO)
sub-fields as JSON?

I showed this plan to another programmer, and he asked: "Why are you reinventing JSON?". This schema of delimiter, keyword = value is just that: a reinvention. Re-cast the example that I showed as JSON:

{"NUMOBS":5, "VMAGINS":-7.244, "VERR":0.006, "CREFMAG":13.793, "CREFERR":0.026, "KREFMAG":14.448,  "KREFERR":0.021, "VX":1.1501, "CX":1.1505, "KX":1.1500, "Tv_bv":0.0090, "Tv_bvErr":0.0100, "TAver":2.47 }

- It's only slightly more verbose
- Still easily read by eye
- Sure, it's easy to program the parsing of |a=b, but every modern language has a library for JSON.
- JSON notation is extensible. My example had the values all as floats. Other value types (e.g. string or array) would need an extension of the definition. Not so with JSON; it all just works.
- David asked about keynames: Whitespace is not a problem for JSON. Case is up to our decision on what to standardize.
- The definition of the Notes field still allows for descriptive text at the beginning of the field. The JSON would begin with the first '{' (or with the '{' that matches the terminating '}'?)
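
One possible answer to that last question, sketched in Python: scan forward from each '{' and take the first one from which a complete JSON object parses, treating everything else as descriptive text. The function name split_notes and this "first parsable object" rule are assumptions for illustration, not an agreed convention.

```python
import json


def split_notes(notes):
    """Return (free_text, data); data is None when no JSON object is found."""
    decoder = json.JSONDecoder()
    start = notes.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and reports where it ended, ignoring whatever follows.
            data, end = decoder.raw_decode(notes, start)
        except json.JSONDecodeError:
            start = notes.find("{", start + 1)  # not JSON; try the next brace
            continue
        free_text = (notes[:start] + notes[end:]).strip()
        return free_text, data
    return notes.strip(), None


notes = ('5 records aggregated {"NUMOBS":5, "VMAGINS":-7.244, '
         '"COMPS":["000-BMH-815", "000-BMH-816"]}')
text, data = split_notes(notes)
print(text)
print(data["VMAGINS"], data["COMPS"])
```

With this rule a stray brace in the descriptive text is skipped rather than breaking the parse, at the cost of silently ignoring JSON that is malformed.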

George

Affiliation
Astronomical Society of South Australia (ASSAU)
JSON

Good point, George. I almost suggested this as well. Honest. ;)

JSON also has an advantage, not shown in your example, that values may not only be numbers and strings but arrays or maps (what JSON is), e.g. comp stars via an array {"COMPS":["000-BMH-815", "000-BMH-816"], ...}

It's not so much that key-value pairs, properties, re-invent JSON since they precede it, more that JSON generalises properties.

David

Affiliation
American Association of Variable Star Observers (AAVSO)
sub-fields as JSON

JSON looks like a good idea, but there are some constraints imposed by the AEF record format: there can be no newlines in the JSON, and there can be no DELIM characters in the JSON. Since DELIM is usually comma, this is a problem.

This issue of the DELIM can be addressed by opening up the webobs code and having it ignore DELIMs inside the JSON. I was hoping that the subfield formatting issue we are discussing in this thread could be handled without opening the code, but that is a forlorn hope. This is going to be a complicated change, and webobs should validate the JSON submission.
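
To make the conflict concrete, here is a Python sketch of the problem and of one way a parser could ignore DELIMs inside the JSON: if the Notes field is the last field of the record, the parser can stop splitting once it has reached the expected number of fields. The 4-field record below is a shortened, made-up layout, not the real AEF field list, and split_record is a hypothetical helper, not webobs code.

```python
import json


def split_record(line, delim=",", n_fields=4):
    """Split into at most n_fields parts; the last part keeps any delimiters.

    Assumption: the notes field is the final field of the record.
    """
    return line.split(delim, n_fields - 1)


# Hypothetical shortened record: name, date, magnitude, notes
line = 'SS CYG,2457918.5,12.345,{"VMAGINS":-7.244, "VERR":0.006}'

naive = line.split(",")       # the comma inside the JSON adds a bogus field
fields = split_record(line)   # the JSON survives intact in the last field
notes = json.loads(fields[-1])

print(len(naive), len(fields))
print(notes["VERR"])
```

On the submitting side, json.dumps with its default settings writes the object on a single line, which takes care of the no-newlines constraint.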

I've heard other objections to inserting JSON into a text field: it puts data into the db, but that data would be inaccessible to SQL queries, and if you need more data, then the db should be redesigned along with the input record format. But the bigger constraint is that we cannot change the number of fields in the AEF; it took a lot to convince various commercial products to package their photometry data into this record format. No one is going to accept a drastic change. Any change to the format must be backward compatible. And as it proves its worth, various systems will adopt it, we hope.

George

Affiliation
American Association of Variable Star Observers (AAVSO)
A modest proposal

Hello, George: After some thought following our chat at SAS/AAVSO, I think this initiative has developed a classic separation-of-functions problem. We're trying to make the reported-data lines of Extended Format submissions carry two meanings: the reported data per se (as currently in the Extended Format), and the new supporting data.

Long ago, non-fiction books and other documents solved this supporting-data problem with footnotes and endnotes. So, what if AAVSO (1) simply includes in each reported-data line's comment field (last field per line) something like <NOTE_0001> or even <NOTE_AP Cyg_V>, and then (2) at file bottom, below all the reported-data lines, appends supporting-data comment lines like:

#<NOTE_0001> whatever-supporting-data-we-agree-on

Advantages:

  • It's automatically 100% backward compatible, as all new data get sequestered into comment fields and comment lines.
  • Supporting data in AAVSO-specified format gets separated quite cleanly away from the user's own line comments. The user may still use the end-line comment field as wished. Observers will like this a lot, and it encourages format compliance.
  • Supporting-data lines (footnote lines) could be in any format, or even in user's choice of defined formats, depending on depth of supporting data (transforms or not, ensemble or not, etc).
  • Speaking of transform supporting data: this will normally be uniform (per filter) for a given upload, so this could have one footnote line per filter, once per uploaded file, rather than repeating it for every observation.
  • Supporting-data lines format could extend (now or later) to multiple lines, not possible in the one-comment-field-only direction that I see being considered.

I worry that trying to cram supporting data into the line-end comment field--mixed in with user's own comments--will cause endless headaches, not least of which is human-readability. Since we don't know what the scale of supporting data will end up being, perhaps we should think about a naturally extensible approach like footnote lines, which allows multiple lines of supporting data per reported observation, rather than limiting ourselves at the outset to (part of) a single field at the end of a single line.
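
The footnote idea above could be read back along these lines. This is only a Python sketch under stated assumptions: tags look like <NOTE_...>, footnote lines start with '#' followed immediately by the tag, and the data records are shortened, made-up lines rather than full Extended Format records. The name link_footnotes is hypothetical.

```python
import re

NOTE_TAG = re.compile(r"<NOTE_[^>]+>")


def link_footnotes(lines):
    """Pair each data record with the footnote text its tags refer to."""
    footnotes = {}
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            m = NOTE_TAG.match(line, 1)  # tag must directly follow the '#'
            if m:
                footnotes[m.group(0)] = line[m.end():].strip()
            continue  # other comment lines are ignored in this sketch
        if line.strip():
            records.append(line)
    # Footnotes sit at file bottom, so resolve tags after reading everything
    return [(rec, [footnotes.get(tag) for tag in NOTE_TAG.findall(rec)])
            for rec in records]


lines = [
    "# an ordinary comment line",
    "AP Cyg,2457918.5,12.345,my own comment <NOTE_0001>",
    "AP Cyg,2457918.6,12.350,<NOTE_0001>",
    "#<NOTE_0001> VX=1.1501|Tv_bv=0.0090",
]
for record, notes in link_footnotes(lines):
    print(record, "->", notes)
```

Both records here point at the same footnote, which is the once-per-file case described for transform data.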

Affiliation
American Association of Variable Star Observers (AAVSO)
The file the records came in..

Eric,

Let me raise a topic related to the footnote concept you introduced:

When we submit data by file to webobs, the system code combines the meta-data in that file with the AEF lines to create records for the AID. But what happens to that actual submission file? You mentioned to me at SAS that you had hoped the file was preserved in some fashion, because you often put extensive notes into that file, all in comment lines starting with '#'. I told you that those files were thrown away, because that is what I was told years ago. You were clearly dismayed at that news, and I can see that your footnote proposal would be a way to capture extensive notes and fix that issue in the future.

The interesting thing is that I recently discovered that the submission files were actually dumped into a cache directory, not thrown away. No attempt was made to link the files to the AID records, but the files are there, 130,000+ of them, dating back to 2012!

So, this raises a couple of questions that I invite comment on:
- Is there a need/interest/value in having extensive notes be captured for records going into AID? We could conceivably create a document reference field in the AID record that would point to the file that presented the data to webobs, or some variation of the footnote concept described in this thread by Eric.
- Or is that too complex for webobs/AID? If the data needs extensive explanation, then you should publish a JAAVSO article or find some other venue for explaining the data.
- If there is value in the idea, should we explore recovering the files sitting in that forgotten cache directory?

George