Spectra

Ingesting spectral data products into Data Central allows for population of the Single Object Viewer, and enables download of those data product files. The Schema Browser will also include an entry for each spectral data product type. The minimum requirements for ingesting spectral data products are described below.

Note

Remember that the documentation mentioned here is uneditable once the data are released. Documentation intended for public use and/or subject to change (e.g., with detailed descriptions of analysis or version update information) should be maintained by the survey teams in Data Central’s Documentation portal: Documentation Central.

Data Model

Data Central’s ingestion process will map your data onto the Data Central data model. Within Data Central, spectral data are organised hierarchically, as follows:

<survey>
└── <datarelease>
    └── Schema:Spectra
        └── Facility
            └── Data Product

To explore the data model further, visit a survey with spectra (e.g., GAMA DR2) in the Schema Browser and look at the relationships between facilities and data products.

Directory Structure

To ingest spectral data into Data Central, you will provide two folders: one containing the data products themselves, and one containing the metadata.

Data

The spectra directory should contain the FITS files themselves.

data
└── <survey>
    └── <datarelease>
        └── spectra
            ├── <facility_name>
            │   ├── product1.fits
            │   └── product2.gz
            └── <facility_name>
                ├── product3.fits
                └── product4.fits

Attention

<survey> and <datarelease> should be replaced with the values you chose in Getting Started, e.g., gama and dr2.

Data Central supports spectra in .fits format only (single file per spectrum).

A good rule of thumb is to keep your files succinct, with as few extensions as possible (i.e., do not pack tens of extensions into your FITS files). This makes it simpler to map each file (by data type) to the correct browser visualiser.
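As an illustration of the layout above, the following minimal sketch walks a spectra directory and derives the facility name, file name and relative path for each file. The function name list_spectra is hypothetical and not part of any Data Central tooling; the sketch assumes files sit exactly one level below spectra/, inside a facility folder.

```python
import tempfile
from pathlib import Path


def list_spectra(spectra_dir):
    """Yield (facility_name, file_name, rel_file_path) for every file found
    in spectra_dir/<facility_name>/. Hypothetical helper for illustration."""
    root = Path(spectra_dir)
    for facility_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(p for p in facility_dir.iterdir() if p.is_file()):
            # The relative path is expressed from the spectra directory.
            yield facility_dir.name, path.name, f"{facility_dir.name}/{path.name}"


# Demo on a throwaway tree that mimics data/<survey>/<datarelease>/spectra.
with tempfile.TemporaryDirectory() as tmp:
    spectra = Path(tmp) / "data" / "gama" / "dr2" / "spectra"
    (spectra / "gama").mkdir(parents=True)
    (spectra / "gama" / "G12_Y3_017_187.fit").touch()
    rows = list(list_spectra(spectra))

print(rows)
```

The three values per file correspond to the facility_name, file_name and rel_file_path columns of the product ingestion file described later in this section.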

Metadata

The following file structure should be adopted. A top-level <survey> directory should contain a single directory per <datarelease>. Please follow the directory format indicated below. The metadata files will populate the Schema Browser.

dcmetadata
└── surveys
    └── <survey>
        ├── <survey>_survey_meta.txt
        └── <datarelease>
            ├── <survey>_<datarelease>_data_release_meta.txt
            └── spectra/
                ├── <survey>_<datarelease>_facility_meta.txt
                ├── <survey>_<datarelease>_product_ingestion.txt
                ├── <survey>_<datarelease>_product_meta.txt
                └── docs/

The dcmetadata/surveys/<survey>/<datarelease>/spectra/ directory will contain a minimum of 3 metadata files, plus an optional docs/ directory if you have supplied additional documentation for a particular facility/product.
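The naming convention can be sketched in a few lines. The helper below is hypothetical (it is not a Data Central tool) and simply builds the three expected metadata file paths from a survey name and data release name, following the directory format shown above.

```python
from pathlib import PurePosixPath


def expected_metadata_files(survey, datarelease):
    """Return the three spectra metadata file paths for one data release.
    Hypothetical helper; the names follow the directory format shown above."""
    base = PurePosixPath("dcmetadata") / "surveys" / survey / datarelease / "spectra"
    prefix = f"{survey}_{datarelease}"
    suffixes = ("facility_meta", "product_ingestion", "product_meta")
    return [str(base / f"{prefix}_{suffix}.txt") for suffix in suffixes]


paths = expected_metadata_files("gama", "dr2")
for path in paths:
    print(path)
```

A check like this can be run before submission to confirm that no file is missing or misnamed.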

Metadata Files

Attention

Metadata files are always pipe-delimited, and have the extension .txt.

<survey>_<datarelease>_facility_meta.txt

Please provide a single pipe-delimited .txt file containing an entry (row) for each facility, in the following format:

name | pretty_name | description | documentation
2dfgrs | 2dFGRS | All 2dFGRS spectra of GAMA DR2 objects obtained through the 2dFGRS query page. | 2dFGRS_survey.md

Please name this file: <survey>_<datarelease>_facility_meta.txt, replacing <survey> and <datarelease> with the survey name and datarelease name, respectively: e.g., sami_dr2_facility_meta.txt.

<survey>_<datarelease>_facility_meta.txt

This file should contain the following columns:

name(required=True, type=string, max_limit=100)

Facility name. Use only alphanumeric characters. This must be unique per data release.

pretty_name(required=True, type=string, max_limit=500)

A human-readable version of the facility name. This can contain any characters (up to the character limit).

description(required=True, type=string, max_limit=500)

A succinct paragraph describing the facility.

documentation(required=False, type=string, max_limit=1000)

If you would like formatted text to appear in the Schema Browser, you must supply a file in the docs/ directory. The file can be either plain text or markdown-formatted text (see Formatting for more info). Set this field to the name of that file. If you do not wish to supply documentation for a particular facility, leave this entry blank. Remember that the documentation mentioned here cannot be edited once the data are released. Documentation intended for public use and/or subject to change (e.g., with detailed descriptions of analysis or version update information) should be maintained by the survey teams in Data Central’s Documentation portal: Documentation Central.
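The column rules above can be checked before the file is written. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the file starts with a header row of column names, as in the example table. It does not check that names are unique within the data release.

```python
import csv
import io
import re

# Character limits and required columns, copied from the column list above.
FACILITY_LIMITS = {"name": 100, "pretty_name": 500,
                   "description": 500, "documentation": 1000}
REQUIRED = ("name", "pretty_name", "description")


def check_facility_row(row):
    """Return a list of problems found in one facility row (empty if none)."""
    problems = []
    for col, limit in FACILITY_LIMITS.items():
        value = row.get(col, "")
        if col in REQUIRED and not value:
            problems.append(f"{col} is required")
        if len(value) > limit:
            problems.append(f"{col} exceeds {limit} characters")
    if row.get("name") and not re.fullmatch(r"[A-Za-z0-9]+", row["name"]):
        problems.append("name must contain only alphanumeric characters")
    return problems


def write_facility_meta(rows, handle):
    """Write the rows as pipe-delimited text (header row is an assumption)."""
    writer = csv.DictWriter(handle, fieldnames=list(FACILITY_LIMITS),
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


row = {
    "name": "2dfgrs",
    "pretty_name": "2dFGRS",
    "description": "All 2dFGRS spectra of GAMA DR2 objects obtained "
                   "through the 2dFGRS query page.",
    "documentation": "2dFGRS_survey.md",
}
assert check_facility_row(row) == []

buffer = io.StringIO()
write_facility_meta([row], buffer)
text = buffer.getvalue()
print(text)
```

Note that the csv module will wrap a value in quotes if it contains a pipe character, so it is simplest to avoid pipes inside descriptions.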

<survey>_<datarelease>_product_meta.txt

All types of data products will appear in the Schema Browser, even if some AstroObjects do not contain all of them. If any AstroObject has an incomplete set of products, survey teams should document why, and whether or not to expect those products in the future.

Please provide a single pipe-delimited .txt file containing an entry (row) for each product, in the following format:

facility | name | type | description | documentation | version | contact | data_format | specutils_format
2dfgrs | spectrum_1d | spectrum_1d | Reduced 1D spectrum | 2dfgrs_spec.md | 1.2 | John Smith <john.smith@institute.org> | specutils | 2dFGRS obscore
2qz | spectrum_1d | spectrum_1d | Reduced 1D spectrum |  | 1.2 | John Smith <john.smith@institute.org> | specutils | 2dFGRS obscore

Please name this file: <survey>_<datarelease>_product_meta.txt, replacing <survey> and <datarelease> with the survey name and datarelease name, respectively: e.g., sami_dr2_product_meta.txt.

Attention

Depending on the value for the name column, you will need to add additional columns to this table. See File Formats for more information.

<survey>_<datarelease>_product_meta.txt

This file should contain the following columns:

facility(required=True, type=string, max_limit=100)

The name of the facility (this must match a facility name from the <survey>_<datarelease>_facility_meta.txt file above).

name(required=True, type=string, max_limit=100)

Data product name. Choose from the following names:

name | value | vis_type
spectrum_1d | 1D Spectrum | 1d_spectrum
spectrum_2d | 2D Spectrum | 2d_spectrum

Now check the Data File Formats section for any additional columns you may need to add to this file. If you have additional product names that aren’t covered by this list, please let us know as soon as possible.

type(required=True, type=string)

The type of the spectra. Choose from the following names:

name | value | vis_type
spectrum_1d | 1D Spectrum | 1d_spectrum
spectrum_2d | 2D Spectrum | 2d_spectrum
spectrum_1d_table | 1D Spectrum (stored in a FITS table) | 1d_spectrum

description(required=True, type=string, max_limit=500)

A succinct paragraph describing the product.

documentation(required=False, type=string, max_limit=1000)

If you would like formatted text to appear in the Schema Browser, you must supply a file in the docs/ directory. The file can be either plain text or markdown-formatted text (see Formatting for more info). Set this field to the name of that file. If you do not wish to supply documentation for a particular product, leave this entry blank. Remember that the documentation mentioned here cannot be edited once the data are released. Documentation intended for public use and/or subject to change (e.g., with detailed descriptions of analysis or version update information) should be maintained by the survey teams in Data Central’s Documentation portal: Documentation Central.

version(required=True, type=string, max_limit=100)

Product version as defined by the team, e.g., v1.8.

contact(required=True, type=string, max_limit=500)

Format as: John Smith <john.smith@institute.org>.

data_format(required=True, type=string, max_limit=100)

This describes the format of the data, and tells Data Central how to read the files from the facility. This is a vestigial field that now accepts only one value: please write “specutils” in this field. Data Central will map to the specific specutils loader identified in the specutils_format field.

specutils_format(required=True, type=string, max_limit=100)

This is the name of the specutils loader used to read the FITS file. A list of available loaders can be found at <https://specutils.readthedocs.io/en/latest/spectrum.html#list-of-loaders>, and instructions for creating new loaders can be found at <https://specutils.readthedocs.io/en/stable/custom_loading.html>. If you and your team are unable to create the loaders, please contact us for help.
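The constraints on the product metadata columns can be checked with a short script before submission. This is a minimal sketch, not Data Central's own validation: the function name and the contact pattern are hypothetical, and the allowed names and types are copied from the tables above (the list of names may grow if you agree additional product names with Data Central).

```python
import re

# Allowed values copied from the name and type tables above.
ALLOWED_NAMES = {"spectrum_1d", "spectrum_2d"}
ALLOWED_TYPES = {"spectrum_1d", "spectrum_2d", "spectrum_1d_table"}
REQUIRED = ("facility", "name", "type", "description", "version",
            "contact", "data_format", "specutils_format")
# Hypothetical pattern for the "John Smith <john.smith@institute.org>" form.
CONTACT_RE = re.compile(r"[^<>]+ <[^<>@\s]+@[^<>@\s]+>")


def check_product_row(row, facility_names):
    """Return a list of problems found in one product row (empty if none).
    facility_names is the set of names from the facility metadata file."""
    problems = [f"{col} is required" for col in REQUIRED
                if not row.get(col, "").strip()]
    if row.get("facility") and row["facility"] not in facility_names:
        problems.append("facility does not match any facility name")
    if row.get("name") and row["name"] not in ALLOWED_NAMES:
        problems.append("name is not one of the listed product names")
    if row.get("type") and row["type"] not in ALLOWED_TYPES:
        problems.append("type is not one of the listed types")
    if row.get("data_format") and row["data_format"] != "specutils":
        problems.append("data_format must be specutils")
    if row.get("contact") and not CONTACT_RE.fullmatch(row["contact"]):
        problems.append("contact should look like: Name <user@host>")
    return problems


product = {
    "facility": "2dfgrs", "name": "spectrum_1d", "type": "spectrum_1d",
    "description": "Reduced 1D spectrum", "documentation": "2dfgrs_spec.md",
    "version": "1.2", "contact": "John Smith <john.smith@institute.org>",
    "data_format": "specutils", "specutils_format": "2dFGRS obscore",
}
print(check_product_row(product, {"2dfgrs", "2qz"}))
```

The check does not confirm that the specutils_format value names a loader that actually exists; that still needs to be verified against the specutils loader list.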

<survey>_<datarelease>_product_ingestion.txt

This file will link each data product with the relevant source from your input catalogue (which was added in the Source Catalogue Identification). This file should include an entry (row) for every file you wish to ingest into Data Central.

facility_name | data_product_name | file_name | rel_file_path | source_name | specid | is_best | hdu | purpose | ra | dec | wmin | wmax | start_time | end_time | z | hrv | target_name | snr | resolving_power | spatial_res
gama | spectrum_1d | G12_Y3_017_187.fit | gama/G12_Y3_017_187.fit | 6802 | G12_Y3_017_187 | TRUE | 0 | science | 174.006 | 0.72093 | 3727.71 | 8857.67 | 2009-01-01T00:00:00.00Z | 2009-01-01T00:30:00.00Z | 0.05 |  | GAMAJ113601.43+004315.3 | 3.37 | 1000 | 2.1

Please name this file: <survey>_<datarelease>_product_ingestion.txt, replacing <survey> and <datarelease> with the survey name and datarelease name, respectively: e.g., sami_dr2_product_ingestion.txt.

<survey>_<datarelease>_product_ingestion.txt

This file should contain the following columns:

facility_name(required=True, type=string, max_limit=100)

The name of the facility (this must match a facility name from the <survey>_<datarelease>_facility_meta.txt file above).

data_product_name(required=True, type=string, max_limit=100)

The name of the data product (this must match a data product name from the <survey>_<datarelease>_product_meta.txt file).

file_name(required=True, type=string, max_limit=1000)

The filename of the data product you will be providing.

rel_file_path(required=True, type=string, max_limit=100)

The relative path of the file, e.g., <facility_name>/product1.fits. See Directory Structure.

source_name(required=True, type=string, max_limit=100)

The source name as provided in your source catalogue (see Source Catalogue Identification).

specid(required=False, type=string, max_limit=60)

A name/identifier used to distinguish between different spectra with the same source_name. This must be unique within a source, but need not be unique across different sources.

is_best(required=False, type=bool, default=True)

You may associate multiple spectra with a single source, and define ONE of those spectra as the best (describe how the best spectrum was chosen in your <survey>_<datarelease>_product_meta.txt file or in the data release information on Documentation Central). The best spectrum for a source will be highlighted as such in the SOV.

ra(required=True, type=string, max_limit=30)

The RA of the spectrum (ICRS) in decimal degrees. This is required by the Simple Spectral Access service.

dec(required=True, type=string, max_limit=30)

The Declination of the spectrum (ICRS) in decimal degrees. This is required by the Simple Spectral Access service.
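Two of the rules above concern groups of rows rather than single rows: specid must be unique within a source, and only one spectrum per source may be marked as the best. The following minimal sketch checks both. The function names are hypothetical, the second specid in the demo is made up for illustration, and the sketch assumes that is_best is written as TRUE or FALSE and that a blank entry takes the documented default of True.

```python
from collections import defaultdict


def parse_is_best(text):
    """Interpret an is_best cell; a blank cell takes the default (True)."""
    value = (text or "").strip().lower()
    if value == "":
        return True
    if value in ("true", "false"):
        return value == "true"
    raise ValueError(f"unrecognised is_best value: {text!r}")


def check_sources(rows):
    """Return a list of problems with specid and is_best, grouped by source."""
    specids = defaultdict(list)
    best_count = defaultdict(int)
    for row in rows:
        source = row["source_name"]
        specids[source].append(row.get("specid", ""))
        best_count[source] += parse_is_best(row.get("is_best"))
    problems = []
    for source, ids in specids.items():
        if len(set(ids)) != len(ids):
            problems.append(f"{source}: specid values are not unique")
        if best_count[source] != 1:
            problems.append(
                f"{source}: {best_count[source]} spectra flagged is_best, expected 1")
    return problems


rows = [
    {"source_name": "6802", "specid": "G12_Y3_017_187", "is_best": "TRUE"},
    # Hypothetical second spectrum of the same source, is_best left blank.
    {"source_name": "6802", "specid": "G12_Y3_018_001", "is_best": ""},
]
problems_before = check_sources(rows)
rows[1]["is_best"] = "FALSE"
problems_after = check_sources(rows)
print(problems_before, problems_after)
```

The demo shows why leaving is_best blank for every spectrum of a multi-spectrum source is risky: with the default of True, each of them would be flagged as the best.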

Simple Spectral Access Metadata

In addition to the other metadata needed to load and read the spectra files, surveys need to provide metadata such that the spectra files can be served over the Simple Spectral Access Protocol (SSA). This metadata can either be in the individual FITS files (preferred), or provided by additional files as part of the metadata preparation process.

<survey>_<datarelease>_product_ingestion.txt
wmin(required=True, type=float)

The start wavelength of the spectrum, in Angstrom. This should be computable from the WCS information within the file.

wmax(required=True, type=float)

The end wavelength of the spectrum, in Angstrom. This should be computable from the WCS information within the file.
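For a spectrum with a linear wavelength axis, wmin and wmax follow directly from the standard FITS WCS keywords: the wavelength at (1-indexed) pixel p is CRVAL1 + (p - CRPIX1) * CDELT1. The sketch below applies this to a plain dictionary of header values. It is an illustration under stated assumptions only: the axis is linear, the dispersion is stored in CDELT1 (some files use CD1_1 instead), and the values are already in Angstrom. The function name is hypothetical, and the result refers to the centres of the first and last pixels.

```python
def wavelength_range(header):
    """Return (wmin, wmax) for a linear wavelength axis described by the
    FITS keywords CRVAL1, CRPIX1, CDELT1 and NAXIS1 (assumed in Angstrom)."""
    crval = float(header["CRVAL1"])
    crpix = float(header["CRPIX1"])
    cdelt = float(header["CDELT1"])
    naxis = int(header["NAXIS1"])
    first = crval + (1 - crpix) * cdelt      # centre of the first pixel
    last = crval + (naxis - crpix) * cdelt   # centre of the last pixel
    # A negative CDELT1 means the axis runs from long to short wavelengths.
    return min(first, last), max(first, last)


# Made-up header values, chosen only to illustrate the arithmetic.
header = {"CRVAL1": 4000.0, "CRPIX1": 1.0, "CDELT1": 0.5, "NAXIS1": 2001}
wmin, wmax = wavelength_range(header)
print(wmin, wmax)
```

Spectra with non-linear or tabulated wavelength solutions need a full WCS evaluation instead of this shortcut.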

start_time(required=True, type=string, max_limit=50)

The start time of the observation in UTC isoformat: yyyy-mm-ddThh:mm:ss.ss. If the spectrum is the combination of multiple spectra, use the start time of the earliest spectrum. By default, the FITS keyword DATE-OBS will be read.

end_time(required=True, type=string, max_limit=50)

The end time of the observation in UTC isoformat: yyyy-mm-ddThh:mm:ss.ss. If the spectrum is the combination of multiple spectra, use the end time of the latest spectrum. By default, the FITS keyword DATE-END will be read.
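The rule for combined spectra (earliest start, latest end) and the time format can be sketched as follows. The function names are hypothetical. The trailing Z follows the example row shown earlier in this section rather than the format string itself, so it is exposed as a parameter; fractional seconds are truncated, not rounded, to two decimal places.

```python
from datetime import datetime, timezone


def format_utc(moment, suffix="Z"):
    """Format a timezone-aware datetime as yyyy-mm-ddThh:mm:ss.ss in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    moment = moment.astimezone(timezone.utc)
    hundredths = moment.microsecond // 10000  # truncate to two decimals
    return f"{moment:%Y-%m-%dT%H:%M:%S}.{hundredths:02d}{suffix}"


def combined_time_range(exposures):
    """Given (start, end) datetime pairs for the spectra that were combined,
    return (start_time, end_time) strings: earliest start and latest end."""
    starts, ends = zip(*exposures)
    return format_utc(min(starts)), format_utc(max(ends))


utc = timezone.utc
exposures = [
    (datetime(2009, 1, 1, 0, 10, tzinfo=utc), datetime(2009, 1, 1, 0, 30, tzinfo=utc)),
    (datetime(2009, 1, 1, 0, 0, tzinfo=utc), datetime(2009, 1, 1, 0, 20, tzinfo=utc)),
]
start_time, end_time = combined_time_range(exposures)
print(start_time, end_time)
```

Requiring timezone-aware datetimes avoids silently treating local observatory time as UTC.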

z(required=False, type=float)

The measured redshift of the spectrum. By default, the FITS keyword Z will be read.

hrv(required=False, type=float)

The measured heliocentric radial velocity of the spectrum, in km/s. The default FITS keyword for this field has not yet been specified (TODO).

target_name(required=False, type=string, max_limit=150)

The preferred target name of the spectrum as a string. By default, the FITS keyword OBJECT will be read.

snr(required=False, type=float)

A representative signal-to-noise ratio of the spectrum (e.g., the median, or the value at the central wavelength). The default FITS keyword for this field has not yet been specified (TODO).

resolving_power(required=False, type=float)

The resolving power (lambda/delta lambda) of the spectrum at its central wavelength. The default FITS keyword for this field has not yet been specified (TODO).

spatial_res(required=False, type=float)

The spatial resolution corresponding to the PSF of the observed spectrum, in arcsec. The default FITS keyword for this field has not yet been specified (TODO).