Catalogues¶
Ingesting catalogues into Data Central allows for SQL/ADQL querying, and broadcasts the table(s) through the Data Central TAP server. The Schema Browser will also include an entry for each catalogue.
If you provide an input catalogue as part of the catalogue ingestion (as described later in this article), additional functionality is provided:
sources will appear in the Name Resolver and can be automatically resolved by the image cutout.
sources will appear in the Cone Search
sources will be available in the Single Object Viewer. Individual data products (IFS, Spectra) can be linked to a source, and custom SQL run to populate the SOV with particular rows from your catalogues.
Note
source in Data Central is used interchangeably with AstroObject. It is a survey-team defined astronomical object with positional information that individual data product files can be linked to.
Note
Remember that the documentation mentioned here is the static, paper-like documentation; the documentation on Document Central is entirely separate.
Data Model¶
Data Central’s ingestion process will map your data onto the Data Central data model format. Within Data Central, catalogue data are organised hierarchically, as per:
Survey
└── DataRelease
    └── Schema:Catalogues
        └── Group
            └── Table
There are dozens of tables from multiple surveys in the Data Central database.
Groups are used to collect scientifically related tables together, in order to help the user locate the correct table more quickly.
To explore the relationships between groups and tables further, visit the catalogue section of the Schema Browser.
Directory Structure¶
To ingest data into Data Central, you will provide two folders: one containing the data products themselves, and one containing the metadata.
Data¶
The catalogues directory should contain the catalogue files themselves.
data
└── <survey>
    └── <datarelease>
        └── catalogues
            ├── my_input_cat.fits
            └── my_output_table.csv
Attention
<survey> and <datarelease> should be replaced with the values you chose in Getting Started, e.g., gama and dr2.
Data Central supports catalogues/tables in .csv or .fits formats.
Danger
If your input table is larger than 2 GB, please ensure the format is .csv (not .fits).
Metadata¶
The following file structure should be adopted.
A top-level <survey> directory should contain a single directory per <datarelease>.
Both directories should have the metadata files described below, which will populate the Schema Browser.
metadata
└── <survey>
    ├── <survey>_survey_meta.txt
    └── <datarelease>
        ├── <survey>_<datarelease>_data_release_meta.txt
        └── catalogues/
            ├── <survey>_<datarelease>_column_meta.txt
            ├── <survey>_<datarelease>_coordinate_meta.txt ** optional
            ├── <survey>_<datarelease>_group_meta.txt
            ├── <survey>_<datarelease>_sql_meta.txt ** optional
            ├── <survey>_<datarelease>_table_meta.txt
            └── docs/
The metadata/catalogues/ directory will contain a minimum of 3 metadata files, plus a docs/ directory if you have supplied additional documentation for a particular catalogue. The two optional metadata files (coordinate_meta and sql_meta) are described later in this article.
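The naming pattern in the tree above can be sketched in Python. This is only an illustration: the function name `catalogue_metadata_paths` and the constants are hypothetical, and the required/optional split is taken from the tree as shown here.

```python
from pathlib import PurePosixPath

# Metadata kinds taken from the directory tree above.
REQUIRED_KINDS = ("column_meta", "group_meta", "table_meta")
OPTIONAL_KINDS = ("coordinate_meta", "sql_meta")


def catalogue_metadata_paths(survey, datarelease):
    """Return the expected catalogue metadata paths for one data release.

    Hypothetical helper: it only builds path strings, it does not touch disk.
    """
    base = PurePosixPath("metadata") / survey / datarelease / "catalogues"
    prefix = f"{survey}_{datarelease}"
    return {
        "required": [str(base / f"{prefix}_{kind}.txt") for kind in REQUIRED_KINDS],
        "optional": [str(base / f"{prefix}_{kind}.txt") for kind in OPTIONAL_KINDS],
    }


paths = catalogue_metadata_paths("gama", "dr2")
```

Comparing `paths["required"]` against the files actually present in your metadata folder is a quick way to spot a missing or misnamed file before submission.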
Metadata Files¶
Attention
Metadata files are always pipe-delimited, and have the extension .txt
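As a minimal sketch, a pipe-delimited metadata file can be read with Python's standard `csv` module. The helper name `read_pipe_delimited` is hypothetical, and the sketch assumes the first row is a header and that rows carry no trailing pipe; adjust it if your files differ.

```python
import csv
import io


def read_pipe_delimited(text):
    """Parse pipe-delimited text (header row assumed) into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    # Strip padding around each cell and skip completely blank lines.
    rows = [[cell.strip() for cell in row] for row in reader
            if any(cell.strip() for cell in row)]
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Example content modelled on the group metadata example in this article;
# the documentation cell is left blank.
sample = (
    "name|pretty_name|description|documentation|contact|date|version\n"
    "ApMatchedPhotom|ApMatchedPhotom|"
    "This group provides aperture matched ugrizYJHK photometry.|"
    "|name <email@institute.org>|2012-04-23|v02\n"
)
groups = read_pipe_delimited(sample)
```

In practice you would pass the contents of a file such as `sami_dr2_group_meta.txt` instead of the inline `sample` string.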
<survey>_<datarelease>_group_meta.txt¶
This file describes the groups you would like to register, and will be used to populate the Schema Browser.
Provide a single pipe-delimited .txt file containing an entry (row) for each group:
name | pretty_name | description | documentation | contact | date | version |
---|---|---|---|---|---|---|
ApMatchedPhotom | ApMatchedPhotom | This group provides aperture matched ugrizYJHK photometry. | unique_group_documentation_filename.txt | name <email@institute.org> | 2012-04-23 | v02 |
Please name this file: <survey>_<datarelease>_group_meta.txt e.g., sami_dr2_group_meta.txt
<survey>_<datarelease>_group_meta.txt
This file should contain the following columns:
- name (required=True, type=char, max_limit=100): Group name. Use only alphanumeric characters. This must be unique per data release.
- pretty_name (required=True, type=char, max_limit=100): A human-readable version of the group name. This can contain any characters (up to the character limit).
- description (required=True, type=char, max_limit=1000): A succinct paragraph describing the group.
- documentation (required=True, type=char, max_limit=1000): If you would like formatted text to appear in the Schema Browser, please supply the name of the file containing html-formatted text (see Formatting for more info). Note that this is typically for 2-3 paragraphs of information. Detailed documentation should be written into a Document Central article. If you do not wish to supply documentation for a particular row, leave this entry blank.
- contact (required=True, type=char, max_limit=500): Format as: John Smith <john.smith@institute.org>
- date (required=True, type=char, max_limit=100): Group creation/update date as defined by the team, e.g., 2012-04-23
- version (required=True, type=char, max_limit=100): Group version as defined by the team, e.g., v1.8
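The column limits listed above for the group metadata file can be checked before submission. The following is a rough sketch, not Data Central's own validator: `GROUP_META_LIMITS` and `check_group_row` are hypothetical names, and the row is assumed to be a dict keyed by column name.

```python
# Maximum lengths copied from the group metadata column list above.
GROUP_META_LIMITS = {
    "name": 100,
    "pretty_name": 100,
    "description": 1000,
    "documentation": 1000,
    "contact": 500,
    "date": 100,
    "version": 100,
}


def check_group_row(row):
    """Return a list of human-readable problems found in one group row."""
    problems = []
    for column, max_len in GROUP_META_LIMITS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif len(row[column]) > max_len:
            problems.append(f"{column} exceeds {max_len} characters")
    # Group names should be alphanumeric only. A blank documentation entry
    # is allowed, so emptiness is deliberately not checked here.
    name = row.get("name", "")
    if name and not (name.isascii() and name.isalnum()):
        problems.append("name must be alphanumeric")
    return problems


example_row = {
    "name": "ApMatchedPhotom",
    "pretty_name": "ApMatchedPhotom",
    "description": "This group provides aperture matched ugrizYJHK photometry.",
    "documentation": "",
    "contact": "name <email@institute.org>",
    "date": "2012-04-23",
    "version": "v02",
}
issues = check_group_row(example_row)
```

The same pattern could be reused for the table and column metadata files by swapping in their column limits.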
<survey>_<datarelease>_table_meta.txt¶
This file describes the tables you would like to register, and will be used to populate the Schema Browser and be available for public SQL/ADQL querying, as well as discoverable through the Data Central TAP server.
Please provide a single .txt file with an entry per table, containing the following meta information:
name | description | documentation | group | filename | contact | date | version |
---|---|---|---|---|---|---|---|
ApMatchedCat | This table contains r-band aperture matched photometry and other Source Extractor outputs for all GAMA DR2 objects. | unique_table_documentation_filename.txt | ApMatchedPhotom | ApMatchedCat.fits | name <email@institute.org> | 2012-04-23 | v02 |
Please name this file: <survey>_<datarelease>_table_meta.txt e.g., sami_dr2_table_meta.txt
<survey>_<datarelease>_table_meta.txt
This file should contain the following columns:
- name (required=True, type=char, max_limit=100): Table name. Use only alphanumeric characters. This must be unique per data release.
- description (required=True, type=char, max_limit=1000): A succinct paragraph describing the table.
- documentation (required=True, type=char, max_limit=1000): If you would like formatted text to appear in the Schema Browser, please supply the name of the file containing html-formatted text (see Formatting for more info). Note that this is typically for 2-3 paragraphs of information. Detailed documentation should be written into a Document Central article. If you do not wish to supply documentation for a particular row, leave this entry blank.
- group_name (required=True, type=char, max_limit=100): The name of the group (must match a group name from the <survey>_<datarelease>_group_meta.txt file above).
- filename (required=True, type=char, max_limit=1000): The filename of the table you'll be providing.
- contact (required=True, type=char, max_limit=500): Format as: John Smith <john.smith@institute.org>
- date (required=True, type=char, max_limit=100): Table creation/update date as defined by the team, e.g., 2012-04-23
- version (required=True, type=char, max_limit=100): Table version as defined by the team, e.g., v1.8
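Because each table must point at a group declared in the group metadata file, a cross-check between the two files can catch mismatches early. The sketch below assumes both files have already been parsed into lists of dicts, and that the table rows use the key `group_name`; the function name `unknown_groups` is hypothetical.

```python
def unknown_groups(table_rows, group_rows):
    """Return names of tables whose group is not declared in the group metadata."""
    declared = {group["name"] for group in group_rows}
    return [table["name"] for table in table_rows
            if table["group_name"] not in declared]


# Illustrative rows only; "Orphan" and "Missing" are made-up values.
example_groups = [{"name": "ApMatchedPhotom"}]
example_tables = [
    {"name": "ApMatchedCat", "group_name": "ApMatchedPhotom"},
    {"name": "Orphan", "group_name": "Missing"},
]
orphans = unknown_groups(example_tables, example_groups)
```

An empty result means every table references a declared group. The same idea applies to checking that each `table_name` in the column metadata matches a table entry.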
<survey>_<datarelease>_column_meta.txt¶
This file describes the columns you would like to register for each table, and will be used to populate the Schema Browser, SQL/ADQL query service, and the TAP server.
Please provide a single pipe-delimited .txt file containing an entry (row) for each column:
name | table_name | description | ucd | unit | data_type |
---|---|---|---|---|---|
ALPHA_J2000 | ApMatchedCat | RA (r band) | pos.eq.ra;em.opt.R | deg | double |
CATAID | EnvironmentMeasures | Unique GAMA ID | meta.id | | double |
Please name this file: <survey>_<datarelease>_column_meta.txt e.g., sami_dr2_column_meta.txt
<survey>_<datarelease>_column_meta.txt
This file should contain the following columns:
- name (required=True, type=char, max_limit=100): Column name. Use only alphanumeric characters and underscores.
Attention
Column names must be SQL-queryable: use only letters, numbers and underscores in your column names. Column names cannot start with a number but can include numbers afterwards. Forbidden characters include: %^&({}+-/ ][‘’’
- description (required=True, type=char, max_limit=1000): A succinct paragraph describing the column.
- table_name (required=True, type=char, max_limit=100): The name of the table (must match a table name from the <survey>_<datarelease>_table_meta.txt file above).
- ucd (required=True, type=char, max_limit=100): UCDs can be found here: http://cds.u-strasbg.fr/UCD/tree/js/ (more info: https://arxiv.org/pdf/1110.0525.pdf)
- unit (required=True, type=char, max_limit=100): Column unit.
- data_type (required=True, type=char, max_limit=100): Data type of the column. Use the full name of the data type (e.g., integer) rather than a shortened form (e.g., int).
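The column naming rule above (letters, numbers and underscores, not starting with a number) maps naturally onto a regular expression. This is a sketch under one stated assumption: a leading underscore is treated as acceptable, which this article does not confirm. The names `COLUMN_NAME` and `is_sql_safe` are hypothetical.

```python
import re

# Letters, digits and underscores only; the first character may not be a digit.
# Assumption: a leading underscore is allowed.
COLUMN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_sql_safe(name):
    """Return True if the whole column name matches the naming rule."""
    return COLUMN_NAME.fullmatch(name) is not None


candidates = ["ALPHA_J2000", "CATAID", "2MASS_ID", "flux-err", "mag r"]
rejected = [name for name in candidates if not is_sql_safe(name)]
```

Running this over the `name` column of your column metadata file gives a list of names to rename before ingestion.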
Extra Requirements¶
Note: this section is optional. You do not have to provide a _coordinate_meta.txt file or _sql_meta.txt file if your data release does not lend itself to individual source identification.
You cannot ingest individual data products associated with a single astronomical object without completing this step.
By providing a metadata file pointing to the input catalogue of your data release, Data Central is able to populate the database with Astronomical Objects from your survey. These objects are then accessible in the name resolver, and cone search (as well as the image cutout overplotting functionality).
Source Catalogue Identification¶
To identify a table as a source catalogue (i.e. an input catalogue that has one row per source in your data release), please provide a metadata file that contains the name of a single table that contains the resolver info (source name, coordinates, format), as per:
table_name | source_name_col | long_col | lat_col | long_format | lat_format | frame | equinox |
---|---|---|---|---|---|---|---|
InputCatA | CATAID | RA_deg | Dec_deg | deg | deg | icrs | |
Please name this file: <survey_dr>_coordinate_meta.txt e.g., sami_dr2_coordinate_meta.txt
Danger
If your input table is larger than 2 GB, please ensure the format is .csv (not .fits).
Tip
It is advised to provide coordinates as RA, Dec (degrees, degrees). If your Long/Lat fields are not in an ICRS coordinate frame (degrees), Data Central will auto-generate these columns.
<survey_dr>_coordinate_meta.txt
This file should contain the following columns:
- table_name (required=True, type=char, max_limit=100): The table name (not filename) to be used (must have an entry in the <survey>_<datarelease>_table_meta.txt file).
- source_name_col (required=True, type=char, max_limit=100): The column name for source name (from the specified table).
- long_col (required=True, type=char, max_limit=100): The column name for longitude (from the specified table).
- lat_col (required=True, type=char, max_limit=100): The column name for latitude (from the specified table).
- long_format (required=True, type=char, max_limit=100): The longitude format. Depending on the formatting of your coordinate values (i.e., whether decimal/space delimited/colon delimited) and the value of long_format/lat_format (deg or h), coordinate data are interpreted as:

  value | format | interpretation |
  ---|---|---|
  10.2345 | deg | Degrees |
  1 2 3 | deg | Degrees, arcmin, arcsecond |
  1:2:30.40 | deg | Sexagesimal degrees |
  1 2 0 | hourangle | Sexagesimal hours |

- lat_format (required=True, type=char, max_limit=100): The latitude format. Depending on the formatting of your coordinate values (i.e., whether decimal/space delimited/colon delimited) and the value of long_format/lat_format (deg or h), coordinate data are interpreted as:

  value | format | interpretation |
  ---|---|---|
  10.2345 | deg | Degrees |
  1 2 3 | deg | Degrees, arcmin, arcsecond |
  1:2:30.40 | deg | Sexagesimal degrees |
  1 2 0 | hourangle | Sexagesimal hours |

- frame (required=True, type=char, max_limit=100): Coordinate frame. Accepted values are (fk5, fk4, icrs, galactic, supergalactic).
- equinox (required=True, type=char, max_limit=100): If appropriate (leave blank for icrs), the equinox of this frame. Accepted values are (j2000, j1950, b1950).
Data Central will auto-generate ICRS-frame RA(deg) Dec(deg) columns if that format has not been provided. Data Central is able to transform using the combinations of coordinate systems listed above. If you do not see the coordinate system your data are currently recorded in, it is advised to generate RA, Dec columns as ICRS for your catalogue to be included.
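To make the long_format/lat_format interpretation table concrete, here is a small standard-library sketch that converts the listed value styles to decimal degrees. It illustrates the table only and is not Data Central's conversion code; the function name `to_degrees` is hypothetical, and it performs no frame or equinox transformation.

```python
import re


def to_degrees(value, fmt):
    """Convert a decimal or sexagesimal string to decimal degrees.

    fmt is "deg" or "hourangle", matching the format column of the table.
    """
    parts = [p for p in re.split(r"[:\s]+", value.strip()) if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"cannot parse coordinate: {value!r}")
    # The sign is carried by the leading component only (e.g. "-0:30:00").
    sign = -1.0 if parts[0].startswith("-") else 1.0
    magnitudes = [abs(float(p)) for p in parts]
    # Units, then minutes (1/60), then seconds (1/3600).
    total = sum(m / 60 ** i for i, m in enumerate(magnitudes))
    if fmt == "hourangle":
        total *= 15.0  # one hour of angle is 15 degrees
    elif fmt != "deg":
        raise ValueError(f"unsupported format: {fmt!r}")
    return sign * total


decimal = to_degrees("10.2345", "deg")
spaced = to_degrees("1 2 3", "deg")
hours = to_degrees("1 2 0", "hourangle")
```

Converting your coordinates to decimal degrees in ICRS yourself, as advised above, avoids any ambiguity in how mixed formats are read.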
Danger
The values in source_name_col must be unique across all of the tables included in the source catalogue. If none of your existing tables meet this requirement, then you will need to generate a new table, which need only include source name, RA and Dec. You do not need to include this table in the column or table metadata files, but it would be preferred.
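A quick uniqueness check on the source name column can be sketched as follows. It assumes the names have already been read into a Python list (for example from the CSV version of the input catalogue); the function name `duplicate_source_names` and the sample values are hypothetical.

```python
from collections import Counter


def duplicate_source_names(names):
    """Return the sorted source names that occur more than once."""
    counts = Counter(str(name).strip() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up identifiers for illustration.
sample_names = ["6802", "6806", "6808", "6802"]
duplicates = duplicate_source_names(sample_names)
```

An empty list means the column can serve as `source_name_col`; otherwise the listed names need to be resolved or a dedicated source table generated.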
SOV SQL Functionality¶
If you wish for your tables to be queried and rows displayed as part of the Single Object Viewer, please provide the following table:
table_name | sql |
---|---|
InputCatA | "SELECT * FROM gama_dr2.InputCatA WHERE CATAID = {objid}" |
Please name this file: <survey_dr>_sql_meta.txt e.g., sami_dr2_sql_meta.txt
<survey_dr>_sql_meta.txt
This file should contain the following columns:
- table_name (required=True, type=char, max_limit=100): The table name (not filename) to be used (must have an entry in the <survey_dr>_table_meta.txt file).
- sql (required=True, type=char, max_limit=100): The SQL expression (following the DC syntax of survey_dr.table_name) to be run on SOV load. {objid} will be replaced by the AstroObject being requested.
Attention
Ensure your SQL runs before submitting this metadata file, e.g., check whether you need single quotes around the objid, as per: '{objid}'
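To preview what a template looks like once {objid} is filled in, a plain string substitution is enough. This sketch only mimics the replacement step so you can paste the result into a query tool; how Data Central performs the substitution internally is not described here, and `render_sov_sql` is a hypothetical name.

```python
def render_sov_sql(template, objid):
    """Fill the {objid} placeholder with a concrete object identifier."""
    return template.replace("{objid}", str(objid))


# Numeric identifier column: no quotes in the template.
numeric_template = "SELECT * FROM gama_dr2.InputCatA WHERE CATAID = {objid}"
# Text identifier column: single quotes are part of the template itself.
text_template = "SELECT * FROM gama_dr2.InputCatA WHERE CATAID = '{objid}'"

numeric_sql = render_sov_sql(numeric_template, 6802)
text_sql = render_sov_sql(text_template, 6802)
```

Running each rendered statement through the SQL/ADQL query service shows which form your identifier column needs.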