Spectra

Ingesting spectral data products into Data Central allows for population of the Single Object Viewer, and enables download of those data product files. The Schema Browser will also include an entry for each spectral data product type. The minimum requirements for ingesting spectral data products are described below.

Note

Please note: this is a living document; additional requirements for various data product visualizations (1D spectra etc.) will be set, and updated, in the near future. Although Data Central will make every effort to minimize changes to the format of the required metadata, we may occasionally need to update these requirements in order to provide survey teams with interactive visualizations and deliver the best possible user experience.

For ingested surveys, where possible, we will auto-generate any required metadata (from FITS headers), but teams should be aware that this document will likely gain additional or updated requirements on a roughly six-monthly timescale in order to remain compatible with the latest Data Central release.

Important

Prerequisite: You’ll need to have provided an input catalogue of sources, as per the Catalogues documentation, following all sections up to and including Source Catalogue Identification.

Note

Remember that the documentation mentioned here is the static, paper-like documentation; the documentation on Document Central is entirely separate.

Data Model

Data Central’s ingestion process will map your data onto the Data Central data model format. Spectra are organised hierarchically, as per:

Survey
└── DataRelease
    └── Schema:Spectra
        └── Facility
            └── Data Product

To explore the data model further, visit a survey with spectra (e.g., GAMA DR2) in the Schema Browser and examine the relationships between facilities and data products.

Directory Structure

To ingest spectra, you will provide two folders, one containing the data products themselves, and one containing the metadata. You’ll also need to have provided an input catalogue of sources, as per the Catalogues documentation, following all sections up to and including Source Catalogue Identification.

Data

The spectra directory should contain the FITS files themselves.

data
└── <survey>
    └── <datarelease>
        └── spectra
            ├── <facility_name>
            │   ├── product1.fits
            │   └── product2.gz
            └── <facility_name>
                ├── product3.fits
                └── product4.fits

Attention

<survey> and <datarelease> should be replaced with the values you chose in Getting Started, e.g., gama and dr2

Data Central supports spectra in FITS format only (a single file per spectrum).

A good rule of thumb is to keep your files succinct, with as few extensions as possible (i.e., do not pack tens of extensions into your FITS files). This makes mapping the correct file (by data type) to a browser visualizer simpler.
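For example, a quick way to see how many extensions a file carries is astropy's summary function (the file name below is hypothetical):

# A quick check (using astropy; the file name is hypothetical): summarise the
# HDUs in a product file before submission.
from astropy.io import fits

fits.info("product1.fits")  # prints one line per HDU: name, type, dimensions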

Metadata

The following file structure should be adopted. A top-level <survey> directory should contain a single directory per <datarelease>. Both directories should contain the metadata files described below, which will populate the Schema Browser.

metadata
└── <survey>
    ├── <survey>_survey_meta.txt
    └── <datarelease>
        ├── <survey>_<datarelease>_data_release_meta.txt
        └── spectra/
            ├── <survey>_<datarelease>_facility_meta.txt
            ├── <survey>_<datarelease>_product_ingestion.txt
            ├── <survey>_<datarelease>_product_meta.txt
            └── docs/

The metadata/<survey>/<datarelease>/spectra/ directory will contain a minimum of three metadata files, plus a docs/ directory if you have supplied additional documentation for a particular facility or product.

Metadata Files

Attention

Metadata files are always pipe-delimited, and have the extension .txt

<survey>_<datarelease>_facility_meta.txt

Provide the following as a single pipe-delimited .txt file containing an entry (row) for each facility, for example:

name | pretty_name | description | documentation
2dfgrs | 2dFGRS | All 2dFGRS spectra of GAMA DR2 objects obtained through the 2dFGRS query page. | 2dFGRS_survey.html

Please name this file: <survey>_<datarelease>_facility_meta.txt e.g., gama_dr2_facility_meta.txt

<survey>_<datarelease>_facility_meta.txt

This file should contain the following columns

name(required=True, type=char, max_limit=100)

Facility name. Use only alphanumeric characters. This must be unique per data release.

pretty_name(required=True, type=char, max_limit=100)

A human-readable version of the facility name. This can contain any characters (up to the character limit).

description(required=True, type=char, max_limit=1000)

A succinct paragraph describing the facility.

documentation(required=True, type=char, max_limit=1000)

If you would like formatted text to appear in the schema browser, please supply the name of the file containing html-formatted text (see Formatting for more info). Note, this is typically for 2-3 paragraphs of information. Detailed documentation should be written into a Document Central article. If you do not wish to supply documentation for a particular row, leave this entry blank.

<survey>_<datarelease>_product_meta.txt

All types of data products (regardless of whether a particular astro object has that data product) will appear in the Schema Browser. Provide a single pipe-delimited .txt file containing an entry (row) for each product, for example:

facility_name | name | description | documentation | version | contact
2dfgrs | spectrum_1d | Reduced 1D spectrum | 2dfgrs_spec.html | 1.2 | John Smith <john.smith@institute.org>
2qz | spectrum_1d | Reduced 1D spectrum | | 1.2 | John Smith <john.smith@institute.org>

Please name this file: <survey>_<datarelease>_product_meta.txt e.g., gama_dr2_product_meta.txt

Attention

Depending on the value for the name column, you will need to add additional columns to this table. See File Formats for more information.

<survey>_<datarelease>_product_meta.txt

This file should contain the following columns

name(required=True, type=char, max_limit=100)

Data product name. Choose from the following:

name | value | vis_type
spectrum_1d | 1D Spectrum | 1d_spectrum
spectrum_2d | 2D Spectrum | 2d_spectrum

Now check the File Formats section for any additional columns you may need to add to this file. If you have additional product names that aren’t covered by this list, please let us know as soon as possible.

facility(required=True, type=char, max_limit=100)

The name of the facility (must match a facility name from the <survey>_<datarelease>_facility_meta.txt file)

description(required=True, type=char, max_limit=1000)

A succinct paragraph describing the product.

documentation(required=True, type=char, max_limit=1000)

If you would like formatted text to appear in the schema browser, please supply the name of the file containing html-formatted text (see Formatting for more info). Note, this is typically for 2-3 paragraphs of information. Detailed documentation should be written into a Document Central article. If you do not wish to supply documentation for a particular row, leave this entry blank.

version(required=True, type=char, max_limit=100)

Product version as defined by the team e.g., v1.8

contact(required=True, type=char, max_limit=500)

Format as: John Smith <john.smith@institute.org>

data_format(required=True, type=char, max_limit=100)

This describes the format of the data, and tells Data Central how to read the files from the facility. Note that all the files from a facility must be in the same format (you can create multiple facilities if needed).

This can be one of three options:

  • Single-Split: the data is stored in multiple HDUs, with each HDU composed of a single 1D array.

  • Multiline-Single: the data is stored in a single HDU, across multiple lines (i.e. a 2D array).

  • specutils: the data is in neither of the above formats, and should be loaded via a specific specutils loader. If you think this might be the case, please contact us as soon as possible, as this option is somewhat more complex than the first two.

If specutils is selected, then add the additional column specutils_format, which is the name of the specutils loader to use to read the file.

If Multiline-Single or Single-Split are selected, then add the additional columns all_keywords, all_standard_units, valid_wcs and (if needed, see below) fixup_file.
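As an illustration of the difference between the two native formats, here is a minimal sketch (assuming astropy is available; file names are hypothetical) of how the two layouts appear when opened:

# A minimal sketch of the two supported layouts (file names hypothetical).
from astropy.io import fits

# Single-Split: each HDU holds a single 1D array (science, error, sky, ...)
with fits.open("product1.fits") as hdul:
    for hdu in hdul:
        shape = None if hdu.data is None else hdu.data.shape
        print(hdu.name, shape)          # e.g. SPECTRUM (4096,)

# Multiline-Single: one HDU holds a 2D array, one spectrum per row
with fits.open("product3.fits") as hdul:
    print(hdul[0].data.shape)           # e.g. (3, 4096): science, variance, sky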

specutils_format(required=True, type=char, max_limit=100)

This is the name of the specutils loader to use to read the FITS file. A list of possible loaders can be found here, and instructions for creating new loaders can be found here.
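Before recording a loader name here, it is worth checking that it can actually read one of your files; a minimal sketch (the loader and file names below are just examples):

# A minimal sketch: verify a specutils loader can read one of your files
# before recording it in specutils_format (loader/file names are examples).
from specutils import Spectrum1D

spec = Spectrum1D.read("product1.fits", format="wcs1d-fits")
print(spec.spectral_axis.unit, spec.flux.unit)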

valid_wcs(required=True, type=bool)

Is the WCS valid (e.g. do the axes match what is expected, do they meet the WCS standard)? Either true or false. The best strategy to check this is to use astropy to try to read the WCS from the FITS header.

If false, fixup_file must be set.
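For example, a minimal sketch (file name hypothetical) of that astropy check:

# A minimal sketch: try to parse the WCS with astropy and inspect the axes;
# if this fails or the axes look wrong, set valid_wcs to false.
from astropy.io import fits
from astropy.wcs import WCS

with fits.open("product1.fits") as hdul:     # hypothetical file name
    try:
        wcs = WCS(hdul[0].header)
        print(wcs.wcs.ctype, wcs.wcs.cunit)  # inspect the spectral axis
        valid_wcs = True
    except Exception as exc:
        print("WCS could not be parsed:", exc)
        valid_wcs = False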

all_standard_units(required=True, type=bool)

Are all the required units in the header, with the right keyword? Either true or false. In general, this means that both CUNIT1 (for the spectral unit, e.g. wavelength or frequency) and BUNIT (for the flux unit) are set. Try reading these header keywords with astropy if you are unsure.

If false, fixup_file must be set.
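A minimal sketch (file name hypothetical) of that check with astropy:

# A minimal sketch: confirm CUNIT1 and BUNIT exist and parse as valid units.
from astropy.io import fits
import astropy.units as u

with fits.open("product1.fits") as hdul:     # hypothetical file name
    header = hdul[0].header
    all_standard_units = True
    for key in ("CUNIT1", "BUNIT"):
        try:
            u.Unit(header[key])              # raises if missing or unparsable
        except (KeyError, ValueError) as exc:
            print(f"{key}: {exc}")
            all_standard_units = False
    print("all_standard_units =", all_standard_units)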

all_keywords(required=True, type=bool)

Do the HDU(s) include the required keywords to determine their purpose? Either true or false.

We look at different keywords depending on Single-Split or Multiline-Single.

For Single-Split, either the EXTNAME or HDUNAME keyword of each HDU is used, whereas for Multiline-Single, the per-row ROW or ARRAY keywords are used.

The value of that keyword is then matched (case-insensitively) against the values in the table below.

Value | Purpose
badpix | Skip this row/HDU; do not treat it as a spectrum.
(blank value) | Skip this row/HDU; do not treat it as a spectrum.
sky | Treat as a sky spectrum, matched to the last science spectrum read.
stdev | Treat as the standard deviation on the last science spectrum read.
sigma | Treat as the standard deviation on the last science spectrum read.
variance | Treat as the variance on the last science spectrum read.
spectrum | Treat as a science spectrum.

These values are then mapped to one of the purposes below (the purpose names in this table can also be used directly as the value in the FITS file).

Name | Purpose
skip | Skip this HDU/row; do not try to interpret it as a spectrum (e.g. it is an image or a table).
science | The actual science spectrum.
error_* | The error on a science spectrum. It is assumed that this applies to the previous science spectrum; if a different scheme is needed, a specutils loader should be used. Currently the possible values are error-stdev, error-variance and error-inversevariance, which correspond to the error handling supported by astropy. Other error schemes should add support to astropy, or mark the row/HDU as skip.
combined_* | A science or error spectrum combined from other spectra within the file. This will be used preferentially over other spectra found, so should only appear once.
sky | A spectrum of the sky (presumably used for reduction purposes).
unreduced_* | Unreduced science or error spectra; these will not be used in preference to other spectra, unless all spectra in the file are unreduced (unlikely).

If false (or you would like to specify more detail about the purpose of each HDU or row), fixup_file must be set.
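A minimal sketch (file names hypothetical) for listing these keywords so you can compare them against the tables above before setting all_keywords:

# A minimal sketch: print the purpose keywords found in your files
# (file names are hypothetical).
from astropy.io import fits

# Single-Split: one purpose keyword per HDU
with fits.open("product1.fits") as hdul:
    for i, hdu in enumerate(hdul):
        print(i, hdu.header.get("EXTNAME") or hdu.header.get("HDUNAME"))

# Multiline-Single: one purpose keyword per row of the single data HDU
with fits.open("product3.fits") as hdul:
    header = hdul[0].header
    for key in header:
        if key.startswith("ROW") or key.startswith("ARRAY"):
            print(key, header[key])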

fixup_file(required=True, type=char, max_limit=100)

The fixup file is a YAML file which includes additional metadata about how to read and interpret your spectra, and the format is documented below. fixup_file is the relative path to this file. You can have one fixup file per facility.

The fixup file

The fixup file is a YAML file with a specific set of information which tells Data Central how to understand your spectra. YAML is a common human-readable and writeable configuration format (unlike JSON and similar formats, which, while readable, are quite hard to write).

If you have not seen or written YAML before, https://camel.readthedocs.io/en/latest/yamlref.html is a good reference—we only use a small subset of the features of YAML, so don’t feel you need to understand everything on that page.

There are three sections in the fixup file: wcs, units and hdus/hdu.

wcs sets which FITS header keywords, or literal values, to use for the WCS. The following keys are used within the fixup file (for each entry, specify either the keyword form or the value form; beyond that choice, all entries are required):

  • pixel_reference_point_keyword or pixel_reference_point: The reference pixel to use; this is usually set by the FITS keyword CRPIX1.

  • pixel_reference_point_value_keyword or pixel_reference_point_value: The value of the reference pixel; this is usually set by the FITS keyword CRVAL1.

  • pixel_width_keyword or pixel_width: The width of each pixel in the WCS unit; this is usually set by the FITS keyword CDELT1.

  • wavelength_unit_keyword or wavelength_unit: The units that the WCS values are in; this is usually set by the FITS keyword CUNIT1.

units sets which FITS header keywords, or literal values, to use for the flux. The following keys are used within the fixup file (for each entry, specify either the keyword form or the value form):

  • flux_unit_keyword or flux_unit: The units the spectral flux is in. This is usually set by the BUNIT FITS keyword. One of these keywords is required.

  • flux_scale_keyword or flux_scale: How much to scale the flux by. This is usually set by the BSCALE FITS keyword, and if not specified is treated as 1.
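A minimal sketch (file name hypothetical) to see which of the keywords usually referenced by the wcs and units sections your files already carry, which helps decide between the *_keyword and literal-value forms:

# A minimal sketch: list the header keywords commonly referenced by the wcs
# and units sections (file name hypothetical).
from astropy.io import fits

with fits.open("product1.fits") as hdul:
    header = hdul[0].header
    for key in ("CRPIX1", "CRVAL1", "CDELT1", "CUNIT1", "BUNIT", "BSCALE"):
        print(key, "=", header.get(key, "<missing>"))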

The third and most complex section is the hdus/hdu section. hdus is used by the Single-Split format, whereas hdu is used by the Multiline-Single format. Both systems allow for quite precise specification of how the data should be treated. Ideally, this section isn’t needed, and your FITS headers contain sufficient metadata that everything can be automatically detected. Also, this section can only do so much—if there needs to be different metadata specified for each individual FITS file, then this section is insufficient. In that case, creating a specutils loader might be the best way forward.

For the hdu section, you can specify some metadata for the whole HDU, and some additional data on a per-row basis. For the whole hdu, the following options are available:

  • require_transpose: Defaults to False; set this to True if the data is stored in columns rather than rows.

  • purpose_prefix: Which keywords hold the purpose of each row. Should be of the form <PREFIX>n e.g. ROW0 ROW1 etc. Must cover all rows. The valid values can be found under the all_keywords column.

For each row, in addition to the keywords under the wcs and units sections, the purpose key can be set, and follows the same rules as all_keywords.

The hdus section is very similar to the hdu section, though rather than dealing with rows, it deals with HDUs. For the whole file, a purpose_prefix can be set (following the same rules given under the hdu section), and for each HDU a purpose key can be set (again using the same rules as the hdu section).

Finally, under the hdu or hdus section, a single cycle key can be set. This follows the same rules as a normal hdu or hdus section (other than it cannot contain another cycle), and allows specifying metadata when there may be a variable number of rows/HDUs with a single collective purpose. An example fixup file (from the OzDES survey) is shown below:

hdus:
  "0":
    purpose: "combined_science"
  "1":
    purpose: "combined_error_variance"
  "2":
    purpose: "skip"
  "cycle":
    "0":
      purpose: "science"
    "1":
      purpose: "error_variance"
    "2":
      purpose: "skip"

<survey>_<datarelease>_product_ingestion.txt

The final file you will need to provide links each data product to the relevant source from your input catalogue (as mentioned above, you'll need to have provided this catalogue as per the Catalogues documentation, following all sections up to and including Source Catalogue Identification). It should include an entry (row) for every file you wish to ingest into Data Central, for example:

facility_name | data_product_name | file_name | rel_file_path | source_name | specid | is_best | hdu | purpose | ra | dec | wmin | wmax | start_time | end_time | z | hrv | target_name | snr | resolving_power | spatial_res
gama | spectrum_1d | G12_Y3_017_187.fit | gama/G12_Y3_017_187.fit | 6802 | G12_Y3_017_187 | TRUE | 0 | science | 174.006 | 0.72093 | 3727.71 | 8857.67 | 2009-01-01T00:00:00.00Z | 2009-01-01T00:30:00.00Z | 0.05 | | GAMAJ113601.43+004315.3 | 3.37 | 1000 | 2.1

Please name this file: <survey>_<datarelease>_product_ingestion.txt e.g., gama_dr2_product_ingestion.txt
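If you are assembling this file programmatically, here is a minimal sketch using Python's csv module with a pipe delimiter (only a subset of the columns is shown; the values are illustrative):

# A minimal sketch: write pipe-delimited ingestion rows with Python's csv
# module. Only a subset of the columns is shown; values are illustrative.
import csv

columns = ["facility_name", "data_product_name", "file_name", "rel_file_path",
           "source_name", "specid", "is_best", "ra", "dec"]
rows = [
    ["gama", "spectrum_1d", "G12_Y3_017_187.fit", "gama/G12_Y3_017_187.fit",
     "6802", "G12_Y3_017_187", "TRUE", "174.006", "0.72093"],
]

with open("gama_dr2_product_ingestion.txt", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="|")
    writer.writerow(columns)
    writer.writerows(rows)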

<survey>_<datarelease>_product_ingestion.txt

This file should contain the following columns

facility_name(required=True, type=char, max_limit=100)

The name of the facility (must match a facility name from the <survey>_<datarelease>_facility_meta.txt file)

data_product_name(required=True, type=char, max_limit=100)

The name of the data product (must match a product name from the <survey>_<datarelease>_product_meta.txt file)

file_name(required=True, type=char, max_limit=100)

The filename of the data product you’ll be providing

rel_file_path(required=True, type=char, max_limit=100)

The relative path of the file. e.g., <facility_name>/product1.fits. See Directory structure.

source_name(required=True, type=char, max_limit=100)

The source name as provided in your source catalogue (see Source Catalogue Identification).

specid(required=False, type=char, max_limit=60)

A name/identifier used to distinguish between different spectra with the same source_name. Can include the source name, but that is not required.

is_best(required=False, type=bool, default=True)

You may associate multiple spectra with a single source, and define ONE of those spectra as the best (add a description of how this was determined in your <survey>_<datarelease>_product_meta.txt file or in the data release information on Document Central). The best spectrum for a source will be highlighted as such in the SOV.

ra(required=True, type=char, max_limit=30)

The RA of the spectrum (icrs) in decimal degrees. This is required by the Simple Spectrum Access service.

dec(required=True, type=char, max_limit=30)

The Declination of the spectrum (icrs) in decimal degrees. This is required by the Simple Spectrum Access service.

Simple Spectral Access Metadata

In addition to the other metadata needed to load and read the spectra files, surveys need to provide metadata such that the spectra files can be served over the Simple Spectral Access Protocol (SSA). This metadata can either be in the individual FITS files (preferred), or provided by additional files as part of the metadata preparation process.

<survey>_<datarelease>_product_ingestion.txt
start_time(required=True, type=char, max_limit=50)

The start time of the observation in UTC isoformat: yyyy-mm-ddThh:mm:ss.ss. If the spectrum is the combination of multiple spectra, use the start time of the earliest spectrum. By default, the FITS keyword DATE-OBS will be looked at.

end_time(required=True, type=char, max_limit=50)

The end time of the observation in UTC isoformat: yyyy-mm-ddThh:mm:ss.ss. If the spectrum is the combination of multiple spectra, use the end time of the latest spectrum. By default, the FITS keyword DATE-END will be looked at.
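A minimal sketch (file name hypothetical) of pulling these values from the header and normalising them to UTC ISO format with astropy:

# A minimal sketch: read DATE-OBS/DATE-END and convert to UTC ISO format
# (file name hypothetical).
from astropy.io import fits
from astropy.time import Time

with fits.open("G12_Y3_017_187.fit") as hdul:
    header = hdul[0].header
    start = Time(header["DATE-OBS"], scale="utc")  # default keyword, per above
    end = Time(header["DATE-END"], scale="utc")    # may be absent in your files
    print(start.isot, end.isot)                    # e.g. 2009-01-01T00:00:00.000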

wmin(required=True, type=float)

The start wavelength of the spectrum - in Angstrom. This should be computable via the WCS information within the file.

wmax(required=True, type=float)

The end wavelength of the spectrum - in Angstrom. This should be computable via the WCS information within the file.
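For example, a minimal sketch (file name hypothetical) of deriving wmin/wmax in Angstrom from the 1D WCS of a spectrum:

# A minimal sketch: compute wmin/wmax in Angstrom from the spectrum's WCS
# (file name hypothetical).
import numpy as np
from astropy.io import fits
from astropy.wcs import WCS
import astropy.units as u

with fits.open("G12_Y3_017_187.fit") as hdul:
    header = hdul[0].header
    npix = header["NAXIS1"]
    wcs = WCS(header)
    # world coordinates of the first and last pixel along the spectral axis
    edges = wcs.pixel_to_world_values(np.array([0, npix - 1]))
    unit = u.Unit(header.get("CUNIT1", "Angstrom"))
    wmin, wmax = sorted((edges * unit).to(u.AA).value)
    print(wmin, wmax)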

z(required=False, type=float)

The determined redshift of the spectrum. By default, the FITS keyword Z will be looked at.

hrv(required=False, type=float)

The determined heliocentric radial velocity of the spectrum - in km/s. By default, the FITS keyword TODO will be looked at. TODO

target_name(required=False, type=char, max_limit=150)

The preferred target name of the spectrum as a string. By default, the FITS keyword OBJECT will be looked at.

snr(required=False, type=float)

A representative signal-to-noise ratio of the spectrum (e.g. median or at the central wavelength). By default, the FITS keyword TODO will be looked at. TODO

resolving_power(required=False, type=float)

The resolving power (lambda/delta lambda) of the spectrum at its central wavelength. By default, the FITS keyword TODO will be looked at. TODO

spatial_res(required=False, type=float)

The spatial resolution corresponding to the PSF of the observed spectrum - in arcsec. By default, the FITS keyword TODO will be looked at. TODO

If a different FITS keyword should be read, an additional ssa section can be added to the appropriate fixup file (in the case of the Multiline-Single or Single-Split formats), or an ssa dictionary can be added to the meta object in the specutils loader.