This chapter gives somewhat more in-depth information on what to do to
create services compliant with the various DAL protocols by data product
type. While dachs start should give you a fair chance of getting a
service running without reading any of this, it is still a good idea to
read the section for the sort of data you want to publish before setting
out.
SCS, the simple cone search, is the simplest IVOA DAL protocol – it is
just HTTP with RA, DEC, and SR parameters, a slightly constrained
VOTable response, plus a special way to encode errors (in a way somewhat
different from what has been specified for later DAL protocols).
The service discussed in the DaCHS Basics is a combined
SCS/form service. This section just briefly recapitulates what was
discussed there. For a quick start, just follow Quick start with
DaCHS.
SCS can expose any table that has exactly one column
each with the UCDs pos.eq.ra;meta.main, pos.eq.dec;meta.main,
and meta.id;meta.main, where the coordinates must be real or double
precision, and the id must be either some integral type or text; the
standard requires the id to be text, but the renderer will automatically
convert integral types. The main query is then run against the position
specified in this way.
You almost always want to have a spatial index on these columns. To do
that, use the //scs#pgs-pos-index mixin on the tables, like
this:
<table id="forSCS" onDisk="true" mixin="//scs#pgs-pos-index"> ...
The “pgs” in pgs-pos-index refers to pgSphere, a postgres database
extension for spherical geometry. You may see RDs around that use
the //scs#q3cindex mixin instead here. It does the same
thing (dramatically speeding up spatial queries) but uses a different
scheme. q3c is faster and takes up less space, but it is also less
general, which is why we are trying to phase it out. Only use it when
you are sure you cannot afford the (reasonable, i.e., mostly within a
factor of two) cost of pgSphere.
Note that to have a valid SCS service, you must make sure the
output table always contains the three required columns (as defined by
the UCDs) discussed above. To ensure that, these columns' verbLevel
attribute must be 10 or less (we advise setting it to 1).
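For orientation, a minimal table fulfilling these requirements might look
like this (the column names and units here are just an example, not
something DaCHS requires):

<table id="forSCS" onDisk="true" mixin="//scs#pgs-pos-index">
  <column name="main_id" type="text"
    ucd="meta.id;meta.main" tablehead="Id"
    description="Object identifier" verbLevel="1"/>
  <column name="raj2000" type="double precision"
    ucd="pos.eq.ra;meta.main" unit="deg"
    description="ICRS right ascension" verbLevel="1"/>
  <column name="dej2000" type="double precision"
    ucd="pos.eq.dec;meta.main" unit="deg"
    description="ICRS declination" verbLevel="1"/>
</table>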
SCS could work with a dbCore, but friendly cone search services include
a field with the distance between the object found and the position
passed in; this is added by the special element scsCore.
You (in effect) must include some pre-defined condDescs that make
up the SCS protocol, like this:
<scsCore queriedTable="main">
<FEED source="//scs#coreDescs"/>
</scsCore>
This will provide the RA, DEC, and SR parameters for most renderers. The
form renderer, however, will show a nice input box that lets humans enter
object names or, if they cannot live without them, sexagesimal
positions (if you are curious: this works by setting the
onlyForRenderer and notForRenderer attributes on
Element inputKey).
In addition, //scs#coreDescs gives you a parameter MAXREC to
limit or raise the number of matches returned. This parameter is not
required by SCS, but it is useful if people with sufficient technical
skills (they'll need those because common SCS clients don't support
MAXREC yet) want to raise or lower DaCHS' default match limit (which is
configured in [ivoa]dalDefaultLimit and can be raised up to
[ivoa]dalHardLimit).
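If you want to try MAXREC by hand, just append it to a plain SCS query;
the service path here is only an example and anticipates the cone service
defined further down:

curl "http://localhost:8080/myres/q/cone/scs.xml?RA=10&DEC=10&SR=0.01&MAXREC=20"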
SCS allows more query parameters; you can usually use condDesc's
buildFrom attribute to directly make one from an input column. If you
want to add a larger number of them, you might want to use active tags:
<dbCore id="xlcore" queriedTable="main">
<FEED source="//scs#coreDescs"/>
<LOOP listItems="ipix bmag rmag jmag pmra pmde">
<events>
<condDesc buildFrom="\item"/>
</events>
</LOOP>
</dbCore>
Note that most current SCS clients are not good at discovering such
additional parameters, since for SCS this requires going through the
Registry. In TOPCAT, for example, users would have to manually edit the
cone search URL.
Also note that SCS does not really define the syntax of these
parameters, which is relevant because most of the time they will be
float-valued, and hence you will generally need to use intervals as
constraints. The interval syntax used by the SCS renderer is DALI, so a
bounded interval would be 22.3 30e5, and you'd build half-bounded
intervals with IEEE infinity literals, like -Inf -1. Of course,
when accessed through a form, the usual VizieR parameter syntax applies.
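For illustration, a hand-written query constraining one of the extra
parameters built in the core above might look like this (the service path
is made up, and the space in the DALI interval is URL-encoded as +):

http://localhost:8080/myres/q/cone/scs.xml?RA=10&DEC=10&SR=0.5&bmag=-Inf+14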
To expose that core through a service, just allow the scs.xml renderer
on it. With the extra human-oriented positional constraint and mainly
buildFrom condDescs, you can usually have a web-based form interface
for free:
<service id="cone" allowed="scs.xml,form">
<meta name="title">Nice Catalogue Cone Search</meta>
<meta name="shortName">NC Cone</meta>
<meta name="testQuery.ra">10</meta>
<meta name="testQuery.dec">10</meta>
<meta name="testQuery.sr">0.01</meta>
<scsCore queriedTable="main">
<FEED source="//scs#coreDescs"/>
<LOOP listItems="ipix bmag rmag jmag pmra pmde">
<events>
<condDesc buildFrom="\item"/>
</events>
</LOOP>
</scsCore>
</service>
The meta information given is used when generating registry
records; the idea is that a query with the ra, dec, and sr you give
actually returns some data.
SIAP is a proven, if a little dated, way to make images available
uniformly. Note that there is a second major version of SIAP,
called SIAv2 here. Since it is much more like obscore than like
conventional SIAP, we discuss it there.
To generate a template RD for an image collection published through
SIAP, run:
dachs start siap
See Starting from Scratch for a discussion on how to fill out this
template.
While you can shoehorn DaCHS into pulling the necessary information from many
different types of images, anything but FITS files with halfway sane WCS
headers is going to be fiddly – and of course, FITS+modern WCS is about
the only thing that will work nicely on all relevant clients.
If you have to have images of a different sort, it is probably a good
idea to inquire on the dachs-support mailing list before spending a
major effort on local development.
Check out a sample resource directory:
cd `dachs config inputsDir`
svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/emi
cd emi
mkdir data
Now fetch some files to populate the data directory so you have
something to import:
cd data
SRCURL=http://dc.g-vo.org/emi/q/s/siap.xml
curl -s $SRCURL"?POS=163.3,57.8&SIZE=20,20&MAXREC=5&weighting=uniform&RESPONSEFORMAT=votabletd" \
| tr '<TD>' '\n' \
| grep "^http://" \
| sed -e 's/<[^>]*>//g' \
| xargs -n1 curl -sO
(no, this is not in general the way to operate SIAP services; use a
proper client for real work and pretend you never saw this).
This RD also publishes to obscore, so make sure you have the obscore table:
dachs imp //obscore
If you do not plan to publish via obscore yourself (which is reasonably
unlikely) and you try this on a box that the Registry will see later
(you shouldn't), be sure to drop it again (dachs drop --system //obscore)
when done.
Run the import:
cd ..
dachs imp q
Now start the server as necessary (see above), and start TOPCAT and
Aladin. In TOPCAT, open VO/SIA Query, enter your new service's access
URL (it's http://localhost:8080/emi/q/s/siap.xml unless you did
something cunning and should know better yourself) under “SIA URL”
pretty far down in the dialog.
Then have “Lockman Hole” as Object Name and resolve it, or manually
enter 161.25 and 58.0 as RA and Dec, respectively, and have 2 as Angular
Size. Send off the request. You'll get back a table that you can send
to Aladin (Interop/Send to/Aladin), which will respond by presenting a
load dialog. Double-click and load as you like. Yes, the images look a
bit like static noise. That's all right here – but do combine these
images with, say, DSS colored optical imagery and marvel at the wonders
of modern VLBI interferometry.
Incidentally, we made the detour through TOPCAT since there's no nice UI
to query unregistered SIAP services in Aladin.
SIAP-capable tables should mix in //siap#pgs. This mixin provides
all the columns necessary for valid SIAP responses, and it will prepare
the table so that spatial queries (which are the most common) will use a
pgSphere index.
So, in the simplest case, a table published through a SIAP service
would be declared like this:
<table id="images" onDisk="True" mixin="//siap#pgs"/>
This only has the minimal SIAP metadata. You will usually want to add
additional columns with extra metadata from your images.
The //siap#pgs mixin also takes care that anything added to
the table also ends up in the products table. This means that the
grammar filling the table needs a //products#define rowfilter.
When filling SIAP tables, you will almost always use the
//siap#computePGS and //siap#setMeta applys discussed below.
In practice, this might look like this:
<data id="import_main">
<sources recurse="True">
<pattern>data/*.fits</pattern>
</sources>
<fitsProdGrammar qnd="True">
<maxHeaderBlocks>80</maxHeaderBlocks>
<mapKeys>
<map key="object">OBJECT</map>
<map key="obsdec">OBSDEC</map>
<map key="obsra">OBSRA</map>
</mapKeys>
<rowfilter procDef="//products#define">
<bind key="table">"emi.main"</bind>
</rowfilter>
</fitsProdGrammar>
<make table="main" >
<rowmaker id="gen_rmk" idmaps="object, obsra, obsdec">
<apply procDef="//siap#computePGS"/>
<apply procDef="//siap#setMeta">
<bind name="bandpassLo">0.207</bind>
<bind name="bandpassHi">0.228</bind>
<bind name="bandpassId">"1.4 GHz"</bind>
<bind name="bandpassRefval">0.214</bind>
<bind name="bandpassUnit">"m"</bind>
<!-- since the images have a fairly complex provenance,
there's no way we can have sensible dates; this one here
ought to be reasonably representative -->
<bind name="dateObs"
>dateTimeToMJD(datetime.datetime(2010, 7, 4))</bind>
<bind name="instrument">@INSTRUME</bind>
<bind name="title">"VLBA 1.4 GHz "+@object</bind>
</apply>
<apply name="fixObjectName">
<setup>
<code>
import csv
with open(rd.getAbsPath(
"res/namemap.csv")) as f:
nameMap = dict(csv.reader(f))
</code>
</setup>
<code>
@object = nameMap[@object]
</code>
</apply>
<map key="weighting">\inputRelativePath.split("_")[-1][:-5]</map>
</rowmaker>
</make>
</data>
This does, step by step:
- The sources element is as always – with image collections, the
recurse attribute often comes in particularly handy.
- When ingesting images, you will very typically read from
FITS primary headers. That is what element
fitsProdGrammar does unless told otherwise: Its rawdicts simply are
the (astropy.io.fits) headers turned into plain python dictionaries.
- The qnd attribute of the grammar is recommended as long as you
get away with it. It makes some (weak) assumptions to yield
significant speedups, but it limits you to the primary header. You
cannot use qnd with compressed FITS images. Also note the
hdusField attribute when you have more complex FITSes to process.
- The fitsProdGrammar will map keys with hyphens to names with
underscores, which allows for smoother action with them in rowmakers.
The mapKeys element can produce additional mappings; in this case,
we abuse it a bit to let us have idmaps (rather than simplemaps)
in the rowmaker – and, really, to illustrate the feature, since this
data does not actually need any key mapping.
- Since we are defining a table serving data products here,
the grammar needs the //products#define rowfilter
discussed in the products table.
- We have mentioned the //siap#computePGS apply above; as
long as astropy can deal with your WCS, it's really automatic (though
you may want to pass parameters when you have cubes with suboptimal
WCS or want to keep products without WCS in your table). And if
you don't have proper WCS: see above on checking with dachs-support.
- The second apply you want when feeding SIAP tables is
//siap#setMeta. Try to give all its keys
somewhat sensible values; you will make your users' lives much easier.
- The exception to the “just fill all of setMeta's parameters” rule is the
various bandpass* parameters. If the bandpass used is known to DaCHS,
just set bandpassId and then use the
//siap#getBandFromFilter apply (see the sketch after this list). You can
find out whether a band is known by running dachs admin dumpDF
data/filters.txt – and we are grateful for further contributions to that
file.
- Typically, many values you find in the FITS headers will be messy and
fouled up; expect to spend some quality time fixing them.
Here, we translate somewhat broken object names using
a simple mapping file that was provided by the author. In many other
situations, the //procs#mapValue or
//procs#dictMap applys let you do fixes with less code.
- As is usual in DaCHS procedures, you can access the embedding RD as rd.
In our object name fixer, we use that to let DaCHS find the input
file independently of where the programme was started.
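The getBandFromFilter variant mentioned in the list above might look
roughly like this; the filter name is invented here, and you should check
it (and the exact calling conventions) against dachs admin dumpDF
data/filters.txt and the reference documentation:

<apply procDef="//siap#setMeta">
  <!-- no bandpassLo/Hi/Refval bindings; getBandFromFilter fills them -->
  <bind name="bandpassId">"Johnson V"</bind>
  <bind name="title">"Some survey "+@object</bind>
</apply>
<apply procDef="//siap#getBandFromFilter"/>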
Somewhat regrettably, //siap#setMeta cannot be used with idmaps="*",
as the settings from setMeta would then be overwritten. This is because
setMeta was written before the idmaps attribute existed and hence it
wrote directly into the rowdict. Changing setMeta now would break existing
RDs, which we try hard to avoid.
There are two cores you may want for SIAP services:
- Element dbCore, to which you add the necessary
condDescs manually as below, for “normal” SIAP services.
- siapCutoutCore, which speaks SIAP
but returns cutouts rather than full images; the size of these cutouts
is determined by the SIZE argument (i.e., the region of interest).
To furnish these cores with the parameters required by the standard, use
the //siap#protoInput condDesc.
If you want to re-use the core for a
form-based service, simply add the //siap#humanInput
condDesc; as for
SCS, the respective renderers will use parsers adapted to where the
inputs come from.
So, a basic core with a couple of additional fields would look like
this:
<dbCore id="query_images" queriedTable="main">
<condDesc original="//siap#protoInput"/>
<condDesc original="//siap#humanInput"/>
<condDesc buildFrom="dateObs"/>
<condDesc buildFrom="bandpassId" />
<condDesc>
<inputKey name="object" type="text"
tablehead="Target Object"
description="Object being observed, Simbad-resolvable form"
ucd="meta.name" verbLevel="5" required="True">
<values fromdb="object FROM lensunion.main"/>
</inputKey>
</condDesc>
</dbCore>
If you wrote the core to work for both SIAP and form as described above,
there's little more to say except you'll want to use the siap.xml
renderer, and you need some additional metadata for VO registration.
The latter is described with the siap.xml renderer.
With this, the service definition would look like this:
<service id="im" allowed="form,siap.xml" core="query_image">
<meta name="shortName">sample images</meta>
<meta name="title">Sample Image Archive</meta>
<meta name="sia.type">Pointed</meta>
<meta name="testQuery.pos.ra">230.444</meta>
<meta name="testQuery.pos.dec">52.929</meta>
<meta name="testQuery.size.ra">0.1</meta>
<meta name="testQuery.size.dec">0.1</meta>
<publish render="siap.xml" sets="ivo_managed"/>
<publish render="form" sets="local,ivo_managed"/>
</service>
– where again you can just write the above core inline rather than
referencing it; that's the style we usually recommend, and what the
dachs start template has.
SIAP's metadata is quite a bit poorer than Obscore's (cf. publishing
anything through Obscore). This means that you will usually have to
fill a few extra parameters to arrive at a full Obscore record. But
even a simple use of the //obscore#publishSIAP mixin will
let people find your SIAP data in the obscore table.
The defaults given in the reference documentation show how we are
mapping SIAP to Obscore. You should critically examine at least whatever
is NULL by default and fill it in as appropriate.
DaCHS' SIAP template gives the most important extra fields:
<mixin
calibLevel="2"
collectionName="'%a few letters identifying this data%'"
targetName="%column name of an object designation%"
expTime="%column name of an exposure time%"
targetClass="'%simbad target class%'"
>//obscore#publishSIAP</mixin>
You can delete any parameter you cannot fill or do not want to fill,
though you really should put in a useful collectionName.
If you have larger images or cubes and serve them through SIAP, consider
offering datalinks or perhaps even have Datalinks as Products.
The latter case is particularly attractive if your images are so large
that people just clicking on some row in Aladin might not expect a
download of that size (in 2020, I'd set that limit at, perhaps, 100 MB).
In both cases, people can select and download only parts of the image or
request scaled versions of it (a less powerful and transparent
alternative is the siapCutoutCore discussed above; the two options don't
mix very well).
Defining a datalink service for normal FITS images is not hard. In the
simplest case, you just give a bit of metadata, use the
//soda#fits_genDesc descriptor generator (you don't need to
understand exactly what that is; if you are curious:
Datalink and SODA has the full story) and FEED
//soda#fits_standardDLFuncs. Done.
The following example, taken from
lswscans/res/positions, adds a fixed link to a scaled
version, which might work a bit more smoothly with unsophisticated Datalink
clients, using a meta maker:
<service id="dl" allowed="dlget,dlmeta">
<meta name="title">HDAP Datalink</meta>
<meta name="description">This service lets you access cutouts
from HDAP plates and retrieve scaled versions.</meta>
<datalinkCore>
<descriptorGenerator procDef="//soda#fits_genDesc">
<bind key="accrefPrefix">lswscans</bind>
</descriptorGenerator>
<FEED source="//soda#fits_standardDLFuncs"/>
<metaMaker semantics="#science">
<code>
yield descriptor.makeLink(
makeProductLink(descriptor.accref+"?scale=4"),
contentType="image/fits",
description="FITS, scaled by 1/4",
contentLength=descriptor.estimateSize()/16.)
</code>
</metaMaker>
</datalinkCore>
</service>
See Meta Makers for more information on what is going on
inside the meta maker. The remaining material is either stereotypical
or pure metadata: title and description are as for any other service,
and the accrefPrefix should in general reflect your resource
directory name. DaCHS' datalink machinery will reject any publisher DID
asking for an accref not starting with that string. The idea here is
to avoid applying code to datasets it is not written for.
When you attach the datalink functionality (rather than having datalink
links as access URLs), the following is the recommended pattern:
Add a pub_did column to the SIAP table. Make sure you're giving it a
UCD of meta.id;meta.main and that you have no other such column
in your table:
<table id="data" onDisk="true" mixin="//siap#pgs">
<index columns="pub_did"/>
<column name="pub_did" type="text"
ucd="meta.id;meta.main"
tablehead="P. DID"
description="Dataset identifier assigned by the publisher"
verbLevel="15"/>
Populate the pub_did column in the row maker:
<map key="pub_did">\standardPubDID</map>
Declare your table as having datalink support; both SIAP and TAP
will pick that up and add the necessary declarations so
datalink-aware clients will know they can run datalink queries
against your service:
<meta name="_associatedDatalinkService">
<meta name="serviceId">dl</meta>
<meta name="idColumn">pub_did</meta>
</meta>
This block needs to sit in the table element. The serviceId meta
contains the id of the datalink service.
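Putting these three pieces together, the relevant part of the table
element looks roughly like this (assembled from the snippets above):

<table id="data" onDisk="true" mixin="//siap#pgs">
  <meta name="_associatedDatalinkService">
    <meta name="serviceId">dl</meta>
    <meta name="idColumn">pub_did</meta>
  </meta>
  <index columns="pub_did"/>
  <column name="pub_did" type="text"
    ucd="meta.id;meta.main"
    tablehead="P. DID"
    description="Dataset identifier assigned by the publisher"
    verbLevel="15"/>
  <!-- ... your other columns ... -->
</table>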
If you also produce HTML forms and tables, see datalinks in
columns.
Publishing spectra is harder than publishing catalogues or images; for
one, the Simple Spectral Access Protocol comes with a large bunch of
metadata, quite a bit of which regrettably repeats VOResource. And
there is no common format for spectra, just a few contradicting
loose conventions.
That is why dachs start produces a template that contains an
embedded datalink service. This lets you push out halfway rational
VOTables that most interesting clients can reliably deal with, while
still giving access to whatever upstream data you got.
In the past, we have tried to cope with the large and often constant
metadata set of SSAP
using various mixins that have a certain part of the metadata in PARAMs
(which is ok by the standard). These were, specifically, the mixins
//ssap#hcd and //ssap#mixc. Do not use them any more in new
data and ignore any references to them in the documentation.
The modern way to deal with SSAP – both for spectra and for time series
– is to use the //ssap#view mixin. In essence, this is
a relatively shallow way to map your own metadata to SSA metadata using
a SQL view. This is also what the dachs start template does.
Check out the feros resource directory into your inputs directory:
cd `dachs config inputsDir`
svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/feros
cd feros
mkdir data
As recommended, the checkout does not contain actual data, so let's
fetch a file:
cd data
curl -O http://dc.g-vo.org/getproduct/feros/data/f04031.fits
cd ..
This RD also publishes to obscore, so make sure you have the obscore table:
dachs imp //obscore
If you do not plan to publish via obscore yourself (which is reasonably
unlikely) and you try this on a box that the Registry will see later
(you shouldn't), be sure to drop it again (dachs drop --system //obscore)
when done.
Run the input and the regression tests:
dachs imp q
dachs test q
One regression test should fail since you've not yet pre-generated the
previews (which are optional but recommended for your datasets, too):
python3 bin/makepreviews.py
dachs test q
If the regression tests don't pass now, please complain to the authors
of this tutorial.
From here on, you can point your favourite spectral client (fallback:
TOPCAT's SSA client; note that TOPCAT itself cannot open this service's
native format and you'll have to go through datalink, which TOPCAT knows
how to do since about version 4.6) to
http://localhost:8080/feros/q/ssa/ssap.xml and do your queries (if you
don't know anything else, do a positional query for 149.734, 2.28216).
Please drop the dataset again when you're done playing with it:
dachs drop feros/q
Since the dataset is in the obscore table, it would otherwise be
globally discoverable, and that'd be bad.
Contrary to what DaCHS does with the relatively small
SCS, SIAP, and SLAP models, due to the size of the SSAP model, spectral
services are always based on a database view on top of a table whose
structure you control; you will save quite a bit of work, though, if you
keep your own table's columns as close to their SSA form as possible.
Another special situation is that most spectra are delivered in fairly
crazy formats, which means that it's usually a good idea to have a
datalink service that serves halfway standard files – in DaCHS, these
comply with the VO's spectral data model, which is a VOTable with a bit of
standard metadata. It's certainly not beautiful or very sensible, but
it sure beats the IRAF-style 1D images people still routinely push around.
So, to start a spectral service, use the ssap+datalink template:
$ mkdir myspectra; cd myspectra
$ dachs start ssap+datalink
This will result in the usual q.rd file as per starting from
scratch; see there for how to efficiently edit this and for
explanations on the common metadata items.
SSAP-specific material starts at the meta definitions for
ssap.dataSource and ssap.creationType. These are actually used
in service discovery, so you should be careful to select the right
words. Realistically, using survey/archival for
observational data and theory/archival for theoretical spectra
should be the right thing most of the time.
Next, you define the raw_data table; this should contain all
metadata “unpredictably” varying between datasets (or, for large data
collections, anything that needs to be indexed for good performance).
For instance, for observational data, the observed location is going to
change from row to row. The start and the end of the spectrum is
probably going to be fixed for a given instrument, and so if you have a
homogeneous data collection you probably will
not have columns for them and rather provide constant values when
defining the view.
To conveniently define the table, it is recommended to pull the SSA
columns for raw_data by name from DaCHS' SSAP definitions and use
SSAP conventions (e.g., units). The generated RD is set up for this by
giving namePath="//ssap#instance", which essentially means “if
someone requests an element by id, first look in the instance table
of the RD”. This is then used in the following LOOP (cf. Active
Tags). As generated, this will look somewhat like:
<LOOP listItems="ssa_dateObs ssa_dstitle ssa_targname ssa_length
ssa_specres ssa_timeExt">
– this will pull in the enumerated columns as if you had defined them
literally in the table. Depending on the nature of your data, you may
want to pull in more columns if they vary for your datasets (or throw
out ones you don't need, such as ssa_dateObs for theoretical data).
To see what is available, refer to the reference documentation of
the //ssap#view mixin. Any parameter that starts with
ssa_ can be pulled in as a column.
The template RD then mixes in //products#table (which you pretty
certainly want; see The Products Table for an explanation),
//ssap#plainlocation (which you want if
you have positions on observational data) and //ssap#simpleCoverage
(which you want if you want to publish your observational spectra
through obscore). The template then defines:
<FEED source="//scs#splitPosIndex"
long="degrees(long(ssa_location))"
lat="degrees(lat(ssa_location))"/>
This again is mainly useful for obscore as long as DaCHS' ADQL engine
may turn queries into q3c statements; just leave it if you have
positions, and remove it if you don't.
You can, of course, define further columns here, both for later use in
the view and for local management. SSAP lets you return arbitrary local
columns, and in particular for theory services, you will have to (to
define the physics of your model). As a DaCHS convention, please don't
use the ssa_ prefix on your custom columns. See
theossa/q for an example of a table with many extra
columns.
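For instance, a custom column for a theory-like service might be declared
like this (name, unit, and UCD are of course specific to your data):

<column name="t_eff" type="real"
  unit="K" ucd="phys.temperature.effective"
  tablehead="T_eff"
  description="Effective temperature of the model atmosphere"
  verbLevel="15"/>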
The SSA template then goes on with a data item filling the raw_data
table. The template assumes you're parsing from IRAF-style 1D images.
You will have to use a different grammar if that is not what you have,
and in that case you in particular cannot use the specAx var
defined in the rowmaker.
The data item has <recreateAfter>make_view</recreateAfter> quite
early on; this simply makes sure that the SSA view will be regenerated
after you import the table itself.
The rowfilter in the grammar is fairly complex here because we will
completely hide the original; if you simply want to serve your upstream
format, just cut it down to giving table, mime, preview
and preview_mime. If you do that, use the following strings in
mime:
- image/fits for IRAF-style 1D image spectra
- application/fits for spectra in FITS tables
- application/x-votable+xml for spectra in VOTables
Please do not put anything else into SSA tables, because you will most
certainly overstrain most SSA clients; if you have a different
upstream format and you want to make it available, turn it into SDM
VOTables and use datalink to link to the original source.
Hence, for most cases (including also ASCII spectra), here's what we
recommend as the product definition rowfilter (it's roughly what's in
the template) to isolate your clients from the odd upstream formats:
<rowfilter procDef="//products#define">
<bind name="table">"\\schema.main"</bind>
<bind name="path">\\fullDLURL{"sdl"}</bind>
<bind name="fsize">%typical size of an SDM VOTable%</bind>
<bind name="datalink">"\\rdId#sdl"</bind>
<bind name="mime">"application/x-votable+xml"</bind>
<bind name="preview">\\standardPreviewPath</bind>
<bind name="preview_mime">"image/png"</bind>
</rowfilter>
This points the accessed path to a datalink URL using the
fullDLURL macro, which expands to a URL retrieving the full dataset;
the “sdl” argument to the macro references the datalink service defined
further down. Since the data returned is generated on the fly, you will
have to give an estimate of how large the VOTable will be (overriding
DaCHS' default of the size of the source file). Don't sweat this too
much, just don't claim something is 1e9 bytes when you're really just
returning a few kilobytes. The rowfilter expects the size in bytes.
The bindings already prepare for making and serving previews, which is
discussed in more detail in Product Previews in the DaCHS
reference; see there for everything mentioning “preview”.
SSAP has a feature that lets users request certain formats, and for
clients that don't know Datalink, this may be a good idea. In that
scheme, you use a rowfilter to return a description of your native data
and the processed SDM-compliant dataset as used here. See
theossa/q for an example of what that would look like. Our
recommendation: don't bother, it's a misfeature that will most likely
just confuse your users.
The rowmaker is fairly standard; we should perhaps mention the
elements:
<var name="specAx">%use getWCSAxis(@header_, 1) for IRAF FITSes
(delete otherwise)%</var>
<map key="ssa_specstart">%typically, @specAx.pixToPhys(1)*1e-10%</map>
<map name="ssa_length">%typically, @specAx.axisLength%</map>
getWCSAxis is a function that looks at a FITS image's WCS
information to let you transform pixel to physical coordinates. This
currently uses a simplified DaCHS implementation that only does a small
part of WCS (but we may change that, keeping the interface stable). The
var, anyway, binds the resulting object to specAx. You can use
that later to find out the limits of the spectrum. The way it is written
here, you will still have to convert the value to metres manually. But as
said above, if you're publishing a homogeneous collection of spectra,
both values are probably constant, and you'll want to remove both maps
from the template.
The template goes on defining the data table that will serve as the
basis of the service. It starts with the declaration:
<meta name="_associatedDatalinkService">
<meta name="serviceId">sdl</meta>
<meta name="idColumn">ssa_pubDID</meta>
</meta>
This lets DaCHS add a link to the datalink service to results
generated from this (both via SSAP and TAP). There's nothing you need to
change here (unless you chuck datalink); see SSAP and Datalink and
Datalink for details.
The main body of the table definition is the //ssap#view
mixin. In it, you need to write the SSA parameters as SQL literals (i.e.,
strings need single quotes around them) or expressions involving column
references. To keep things fast, you should have SSA-ready
columns in the source
table, so you will usually have column references only here. Most of
these items default to NULL, so if you do not have a piece of metadata,
it is reasonably safe to just remove the attribute.
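To give you an idea of the shape of such a mixin invocation, here is a
minimal sketch; the attribute values are placeholders, and the parameters
used are discussed below:

<table id="main" onDisk="True" adql="True">
  <mixin
    sourcetable="raw_data"
    copiedcolumns="*"
    ssa_spectralunit="'Angstrom'"
    ssa_fluxunit="'Jy'"
  >//ssap#view</mixin>
</table>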
A few of the mixin's parameters deserve extra discussion:
- sourcetable – this is a reference, i.e., this must resolve to
the id of some table element. It can be cross-RD if really
necessary. It is not the SQL table reference (that would include
a schema reference).
- copiedcolumns – this lets you copy over columns from the source
table, i.e., the one you just defined using (comma-separated) shell
patterns of their names (yes, that's just like idmaps in
rowmakers). The * given in the template should work in most
cases, but if you have private columns in the source table, you can
suppress them in the view; a useful convention might be to start all
private columns with a p; you'd then say copiedcolumns="[^p]*".
Note that copied columns are automatically added in the view as 1:1
maps, and you cannot use view arguments to override them. Use
different column names in the source table and the view if you
(think you) have to do view-level processing of your values.
- customcode – use this if you have extra material to enter in the
view definition generated by the mixin. We hope you won't need that
and would be interested in your use case if you find yourself using
this.
- ssa_spectralunit, ssa_fluxunit – these are the only
mandatory parameters starting with ssa_ (but their values are
still overwritten if they are in copiedcolumns). There really is no
point in having them vary from row to row because their values are
metadata for the corresponding error columns (which is one of the
many spec bugs in SSAP).
- ssa_spectralucd, ssa_fluxucd – these are like the unit
parameters in that they contain data collection-level metadata. The only
reason they are not mandatory is that there are defaults that seem
sensible for a large number of cases. Check them, and again, you
cannot really let them vary from row to row.
- ssa_fluxSI, ssa_spectralSI, ssa_timeSI – these were an
attempt to work around a missing specification for unit strings in
the VO. Since we now have VOUnit, just ignore them.
The data item making this table is trivial. You should set it to
auto="False" (i.e., don't build this on an unadorned dachs imp).
The building of this data will normally be triggered by the
recreateAfter of the source table import.
Use the element ssapCore for SSAP services. You must
feed in the condition descriptors for the SSAP parameters you want to
support (some are mandatory). The
simplest way to do that is to FEED the
//ssap#hcd_condDescs stream. It
includes condition descriptors for all mandatory and optional parameters
that we can remotely see in use.
Some of them may not be relevant to your service because your table
never has values for them. For example, theoretical spectra will
typically not give information on positions. The SSAP spec says that
such a service should ignore POS rather than returning the empty set.
We consider that an unfortunate recommendation that you should
ignore; if someone queries your theoretical service with a position, it
is highly likely they do not want to see all your spectra.
If you nevertheless think you must ignore
certain conditions, you can use the PRUNE
active tag. This looks like this:
<ssapCore queriedTable="newdata">
<FEED source="//ssap#hcd_condDescs">
<PRUNE id="coneCond"/>
<PRUNE id="bandCond"/>
</FEED>
</ssapCore>
Again, do not do this just because you don't have, say, position
information.
Here is a table of parameter names and ids; you can always check them
by inspecting the output of dachs adm dumpDF //ssap:
Parameter name    condDesc id
--------------    -----------
POS, SIZE         coneCond
BAND              bandCond
TIME              timeCond
For APERTURE, SNR, REDSHIFT, TARGETNAME, TARGETCLASS, PUBDID,
CREATORDID, and MTIME, the condDesc id simply is <keyname>_cond,
e.g., APERTURE_cond.
To have custom parameters, simply add condDesc elements as usual:
<ssapCore queriedTable="newdata">
<FEED source="//ssap#hcd_condDescs"/>
<condDesc buildFrom="t_eff"/>
</ssapCore>
For SSAP cores, buildFrom will enable “PQL”-like query syntax such
that users can post arguments like 20000/30000,35000 to t_eff.
This is in keeping with the general SSAP parameter style, while
more modern VO services use 2-arrays for intervals (“DALI style”).
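For instance, with the t_eff condDesc from above, a hand-crafted SSAP
query might look like this (the service path is just an example and
anticipates the ssap.xml renderer described next):

curl "http://localhost:8080/myspectra/q/ssa/ssap.xml?REQUEST=queryData&t_eff=20000/30000,35000&MAXREC=10"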
To expose SSAP cores, use the ssap.xml renderer.
Using the
form renderer on SSAP cores is not terribly useful, because the core
returns XML directly, and there are far too many parameters no human
will ever be interested in anyway.
Hence, you will typically define extra browser-based
services. The example RD shows a compact way to do that:
<service id="web" defaultRenderer="form">
<meta name="shortName">\\schema Web</meta>
<dbCore queriedTable="main">
<condDesc buildFrom="ssa_location"/>
<condDesc buildFrom="ssa_dateObs"/>
<condDesc>
<inputKey original="data.ssa_targname" tablehead="Star">
<values fromdb="ssa_targname from theossa.data
order by ssa_targname"/>
</inputKey>
</condDesc>
</dbCore>
<outputTable>
<autoCols>accref, mime, ssa_targname,
ssa_aperture, ssa_dateObs, datalink</autoCols>
<FEED source="//ssap#atomicCoords"/>
<outputField original="ssa_specstart" displayHint="displayUnit=Angstrom"/>
<outputField original="ssa_specend" displayHint="displayUnit=Angstrom"/>
</outputTable>
</service>
Essentially, we only select a few fields people might want to query
against, and we directly build them out of the query fields; the SSA
condDescs are bound to the funny and insufficiently defined SSA input
syntax and probably not very useful in interactive applications.
The extra selector for object names with the names actually present in
the database is a nice service as long as you only have a few hundred
objects or so. Since the query over ssa_targname is executed at
each load of the RD, it should be fast, which means that even for
medium-sized tables, you should have an index on the object names in
the raw_data table, probably like this:
<index columns="ssa_targname"/>
In the output table, we only give a few of the many dozen SSAP output
fields, and we change the units of the spectral limits to Angstroms,
which will look nicer for optical spectra. For
educational reasons you might want to change this to nm (nanometer).
In the template, this form-based service is published as a capability of
the SSA service. This is done using the service attribute in the
Element publish in the SSAP service element:
<publish render="form" sets="ivo_managed,local" service="web"/>
See Registering Web Interfaces to DAL Services for more background.
The SSA metadata is not far from the Obscore metadata (cf. publishing
anything through obscore), and so an Obscore publication of SSAP data
almost comes for free: Minimally, just mix in
//obscore#publishSSAPMIXC and set calibLevel. The
template does a bit more:
<mixin
calibLevel="%likely one of 1 for uncalibrated or 2 for calibrated data%"
coverage="%ssa_region -- or remove this if you have no ssa_region%"
sResolution="ssa_spaceres"
oUCD="ssa_fluxucd"
emUCD="ssa_spectralucd"
>//obscore#publishSSAPMIXC</mixin>
– if you use one of the old hcd or mixc mixins, you do not want
oUCD and emUCD.
In particular for larger spectral collections, it is highly recommended
to also have the //ssap#simpleCoverage mixin in an
obscore-published spectral table; only then will you get indexed queries
when there are constraints on s_region, and having these non-indexed
will lead to really slow obscore queries.
Given that you will usually get fairly bizarre inputs and will probably
want to publish “repaired” spectra, using Datalink to provide both
native and SDM (“Spectral Data Model”) compliant spectra without having
to resort to SSAP's ill-thought-out FORMAT feature is a fairly natural
thing to do. That is why the SSAP+datalink template comes with almost
all that you need to do that; what is left is mainly to
write an embedded grammar to parse the spectra (if the parsing is
complex, you might want to go for an Element customGrammar, which
lets you keep the source outside of the RD).
Other than that, it is just a few formalities.
So, you first define the table that will later hold your spectrum.
Use the //ssap#sdm-instance mixin for that (this continues
material from the template):
<table id="instance" onDisk="False">
<mixin ssaTable="main"
spectralDescription="%something like 'Wavelength' or so%"
fluxDescription="%something like 'Flux density' or so%"
>//ssap#sdm-instance</mixin>
<meta name="description">%a few words what a spectrum represents%</meta>
</table>
The descriptions you need to enter here typically end up being labels on
axes of spectral plots, so it is particularly important to be concise and
precise here.
If your spectrum has additional columns (e.g., errors, noise estimates,
bin widths), just put more columns in here. The mixin pulls in all the
various params that SDM wants to see from the row of the spectrum in the
SSAP table.
Note that the table does not have onDisk="True"; these tables are
only made for the brief moment it takes to serialise them into what the
user receives.
As usual, to fill tables, you want a data element. The template just
gives a few hints on how that might work. As a working example,
zcosmos/q parses from 1D FITS images like this:
<data id="build_sdm_data" auto="False">
<embeddedGrammar>
<iterator>
<setup imports="gavo.protocols.products, gavo.utils.pyfits"/>
<code>
fitsPath = products.RAccref.fromString(
self.sourceToken["accref"]).localpath
hdus = pyfits.open(fitsPath)
ax = utils.getWCSAxis(hdus[0].header, 1)
for spec, flux in enumerate(hdus[0].data[0]):
yield {"spectral": ax.pix0ToPhys(spec), "flux": flux}
hdus.close()
</code>
</iterator>
</embeddedGrammar>
<make table="spectrum">
<parmaker>
<apply procDef="//ssap#feedSSAToSDM"/>
</parmaker>
</make>
</data>
The way this figures out the file from which to parse will work if you
have the actual file path in the product table. When you hide the
upstream format as recommended, you have to follow some custom
convention. The SSAP+datalink template has:
sourcePath = urllib.parse.unquote(
    self.sourceToken["ssa_pubDID"].split('?', 1)[-1])
This works if you use the macro standardPubDID as in the
template.
But you can do arbitrary things here; see the califa/q3 RD
for an example of how you can decode the accref to database rows.
A more complex scenario with ASCII data stored externally and cached is
in theossa/q, a case where multiple orders of echelle
spectra are being processed in flashheros/q.
You will notice that the rowmaker in both the example and the template
is missing. DaCHS will then fill in the
default one, which essentially is idmaps="*". Since you are writing
the grammar from scratch, just use the names of the columns defined in
the instance table and be done with it. The predefined column names are
spectral and flux, so make sure you always have keys for them in
the dictionaries you yield from your grammars.
While there is no rowmaker, the make does have
an Element parmaker; this is stereotypical,
just always use the procDef as here. It copies values from the SSA
input row to the params in the instance table.
Finally, you would need to write the service. For SSAP and SODA,
what's in the template ought to just work. Add Element
metaMaker-s to provide links to, e.g., your raw input files if you want.
In that case, please skim the endless
chapter on Datalink and SODA in the reference documentation
to get an idea of how
descriptor generators, data functions, and meta makers play together.
For reasons discussed in Datalinks in Columns,
it may be a good idea to include a custom
column with a datalink URL in the SSAP table. The ssap+datalink
template already has such a column in its source table and fills it in
import's rowmaker.
As long as there is no anointed successor to SSAP explicitly
catering to time series, you
can use SSAP to publish time series. It is a bit of a hack, but
clients like SPLAT do about the right thing with them.
As to examples, check out k2c9vst/q (which
parses the data from ASCII files) and
gaia/q2 (which stores the actual time series in the
database, a technique we believe is a very good idea).
To notify the Registry (and possibly clients, too) that you are
producing time series, do two things:
- Globally declare that you serve time series by setting:
  <meta name="productType">timeseries</meta>
  near the top of your RD. The ssap+datalink template has more
  information on the productType meta.
- Have 'timeseries' as ssa_dstype in your view definition.
This section is under construction
Since there is no actual agreed-upon standard for the serialisation of
time series, you will probably have to produce time series on the fly;
DaCHS helps you to produce something that will hopefully follow
standardisation efforts in the VO without much additional work later
on, using, you guessed it, a mixin. For now, the only mixin available
is for photometric timeseries: See the //timeseries#phot-0
mixin to build things. If you have other time series, please write
mail to dachs-support.
To see things in action, refer to k2c9vst/q, the instance
table and the corresponding makes.
However, there is an additional complication, showcased in
gaia/q2 and bgds/l:
it is quite common to have time series in multiple bands in one
resource. For DaCHS, this is a bit of a problem, because the band
influences quite a bit of the table metadata in DaCHS – this is in the
mixin, and what you set there is fixed once the table instance is made.
To get around this, look at the technique shown in bgds/l.
This first defines a STREAM time-series-template with macros where
the items vary between bands:
<STREAM id="time-series-template">
<table id="instance-\band_short">
<mixin
effectiveWavelength="\effective_wavelength"
filterIdentifier="\band_human"
longitude="@ra"
latitude="@dec"
It then uses a LOOP to fill these slots and create one table definition
per band:
<LOOP>
<csvItems>
band_short, band_human, band_ucd, effective_wavelength
i, SDSS/i, em.opt.I, 7.44e-7
r, SDSS/r, em.opt.R, 6.12e-7
</csvItems>
<events source="time-series-template"/>
</LOOP>
The dispatch between the different table templates then happens in the
data function of the tsdl service, using a somewhat obscure feature of
rsc.Data: when using the createWithTable class function, you can
pass in the table that the data item should make. This obviously only
works in specialised circumstances like the one here, but then it's
really convenient. So, while the make in make_instance claims to
build instance-i, this is really overridden in the datalink
service's data function to select the actual table definition:
<dataFunction>
<setup imports="gavo.rsc"/>
<code>
dd = rd.getById("make_instance")
descriptor.data = rsc.Data.createWithTable(dd,
rd.getById("instance_"+descriptor.band))
descriptor.data = rsc.makeData(
dd,
data=descriptor.data,
forceSource=descriptor)
</code>
</dataFunction>
(where descriptor.band has been pulled from the dataset identifier
in the custom descriptor generator of that datalink service; that
pattern is probably a good idea when you are in a similar situation).
This will not scale well to many dozens of bands – if you have that,
you probably want somewhat more hardcore means – but for the usual
handful of bands this is a relatively reasonable way to produce
time series with nice metadata.
“Obscore”, in VO jargon, refers to a publication of datasets by putting
their metadata into a TAP-queriable database table with a bespoke set of
columns. It lets people pose very complex constraints, even using
uploaded tables, and it is flexible enough to support almost any sort of
data the typed services (SIAP, SSAP) serve and a lot more.
You may ask: Why have the S*APs in the first place? The answer is,
mainly, history. Had we had TAP from the start, we would likely not have
bothered with defining standards for typed services. But that's not how
things worked out, and thus client support of Obscore still is inferior
to that of the typed services.
However, with a view to a future migrating towards obscore, it is
certainly a good idea to publish data through obscore, too. The good
news is that in DaCHS, that is generally close to trivial.
You will sometimes see something called ObsTAP mentioned. This was
meant to refer to “Obscore queried through TAP”, but since, really,
everyone uses Obscore through TAP, people do not say ObsTAP much any
more. If you see it somewhere, pretend it is really saying Obscore.
Before you can do anything with obscore, you have to run:
dachs imp //obscore
This will also declare support for the obscore data model in your TAP service's
registry record, which will make all-VO obscore queries use your
service. Avoid that if you do not really publish anything through
Obscore.
To drop Obscore if you have accidentally imported it, run:
dachs drop --system //obscore
Internally, the ivoa.obscore table is implemented as a view. If this
view contains bad SQL or tables that have been dropped, working with
obscore can result in rather confusing messages. If that happens, try:
dachs imp //obscore recover
This should remove the bad content from the view statement.
Obscore defines a single table called ivoa.obscore. In DaCHS, that
table typically contains data from a multitude of resources with
different metadata structures. To keep that manageable, DaCHS
implements the table as a view, where the individual tables are mapped
onto that common schema. These mappings are almost always created using
a mixin from the //obscore RD. Filling out its parameters will result
in SQL DDL fragments that are eventually combined to the view
definition. In case you are curious: The fragments are kept in the
ivoa._obscoresources table.
There is some documentation on what to
put where in the mixin documentation, but frankly, as a publisher, you
should have at least passing knowledge of the obscore data model
(2017ivoa.spec.0509L).
When you start with a table underlying a typed service, you can get away
with just adding something like (using SIAP as an example):
mixin="//obscore#publishSIAP"
to the table definition's start tag. You do not have to re-import a table to
publish it to Obscore when you have already imported it – dachs imp -m
<rd id> && dachs imp //obscore will include an existing table in the
obscore view.
When you import data without the -m flag, the mixins arrange for
everything, so you do not need the extra step of importing //obscore.
Since the Obscore data model is quite a bit richer than SIAP's and just a bit
richer than SSAP's, you will usually want to add extra metadata through
the mixin, for instance:
<mixin
sResolution="0.5"
calibLevel="2"
>//obscore#publishSIAP</mixin>
Again, a dachs imp -m followed by an import of //obscore would
be enough to make these changes visible in ivoa.obscore.
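For a hypothetical RD myres/q, the two steps thus are:

dachs imp -m myres/q
dachs imp //obscore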
See SIAP and Obscore and SSAP and Obscore for more information
on how to Obscore-publish typed data.
Dataset Identifiers
Obscore uses the concept of dataset identifiers rather extensively, and
it is not unlikely that queries against the obs_publisher_did column
will be run – not least in connection with datalink, in which the
DID has the role of something like a primary key. DaCHS'
obscore-associated datalink service, for instance, will do such queries,
and will be slow if postgres has to seqscan large tables for pubDIDs.
While DaCHS probably does a good job with creating usable (and globally
unique) publisher DIDs, it will not index them by default. Use the
createDIDIndex parameter of the various mixins to make one if your
data collection contains more than just a few hundred entries and there
is no index on it anyway.
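For example (the other parameters are of course specific to your data;
check the mixin's reference documentation for the exact spelling):

<mixin
  calibLevel="2"
  createDIDIndex="True"
>//obscore#publishSIAP</mixin>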
On the other hand, the creator DID would be assigned by whoever wrote
the data file, and you should not change or invent it. It was intended
to let people track who republishes a given data set, weed out
duplicates, and the like. Regrettably, only very few data providers
assign creator DIDs, so it's probably not worth bothering.
If you are in a position in which you could make your data provider
generate creator DIDs, you could make them set a good precedent. DaCHS
helps you by letting you claim an authority for them (which would be the
first step). See tutreg/gavo_edu_auth for an example RD
that, when dachs pub-ed, will claim an authority for your publishing
registry, and see Claiming an Authority for the background on
authorities.
target_class
The obscore model has the notion of a target class for pointed
observations; this is intended to cover use cases like “get me spectra
of Galaxies” or so. Of course, this only works with a common vocabulary
of object types, which does not actually exist in the VO at this time.
The next best thing is SIMBAD's types, which are to be used until
then.
s_region
Obscore has two ways to do spatial queries: using s_ra, s_dec, and
perhaps s_fov on the one hand, and using s_region on the other. That is
a bit unfortunate because in practice you have to have two indices over
at least three columns. Also, DaCHS really likes it if columns are
type-clean, and thus the mixins go to quite some pains to make sure
only polygons are in s_region. Given that s_region in our obscore
has an xtype of adql:REGION and is thus polymorphic, you might get
away with having other types in there. No promises on the long term,
though.
Having said all that: please make sure that whenever there is a position
of some kind, you also fill s_region; this is not a problem in SIAP, but
where you only have a position and aperture, in a pinch fill in
something like:
<map key="s_region">pgsphere.SCircle.fromDALI(
[alpha, delta, aperture]).asPoly(6)</map>
(the //ssap#setMeta apply already does that when both a
position and an aperture are available).
See also Creating pgSphere Geometries for more information on how to
fill geometry-valued columns.
You can also have “pure” Obscore tables which do not build on protocol
mixins. A live example is the cubes table in the
califa/q3 RD within
the GAVO data center. Here is a brief explanation of how this works.
Somewhat like with the SSA view, you define a table for
the obscore columns varying for your particular data collection.
In that table's definition, re-use the metadata given in
the global obscore table. A compact way to do that is through a LOOP
(see Active Tags) and original references, exploiting the
namePath on Element Table:
<table id="cubes" onDisk="True" namePath="//obscore#ObsCore">
<LOOP listItems="obs_id obs_title obs_publisher_did
target_name t_exptime t_min t_max s_region
t_exptime em_min em_max em_res_power">
<events>
<column original="\item"/>
</events>
</LOOP>
adql="True" is absent here as the obscore mixin will set it later.
If you do not have any additional columns
(which you can of course have) and just want to have your datasets in
the obscore table, consider having <adql>hidden</adql> after the
obscore mixin. This will make your table invisible to, but still
readable by, TAP. This is desirable in such a situation because the
entire information of the table would already be contained in the
obscore table, and thus there is no real reason to query the extra table. In
the Califa example cited above, that is obviously not the case; there is
a wealth of additional columns in the custom, non-obscore table.
We believe this will be the rule rather than the exception.
For a quick overview over what column names you can have in the
listItems above, see the obscore table description.
Even with a custom obscore-like table, you will
almost always want to have DaCHS manage your products. This
works even when all your files are external (i.e., you're entering http
URLs in //products#define's path), and so use the
//products#table mixin (which you don't see with SIAP and SSAP as their
mixins pull it in for you):
<mixin>//products#table</mixin>
Then, apply the //obscore#publish mixin, which is like the
protocol-specific mixins except it doesn't pre-set parameters based on
what is already in protocol-specific tables:
<mixin
accessURL="dlurl"
size="10"
mime="'application/x-votable+xml;content=datalink'"
calibLevel="3"
collectionName="'CALIFA'"
coverage="s_region"
dec="s_dec"
emMax="7e-7"
emMin="3.7e-7"
emResPower="4000/red_disp_mean"
expTime="t_exptime"
facilityName="'Calar Alto'"
fov="0.01"
instrumentName="'PMAS/PPAK at 3.5m Calar Alto'"
oUCD="'phot.flux;em.opt'"
productType="'cube'"
ra="s_ra"
sResolution="0.0002778"
title="obs_title"
tMax="t_min"
tMin="t_max"
targetClass="'Galaxy'"
targetName="target_name"
>//obscore#publish</mixin>
Essentially, what is constant is given in literals, what is variable is
given as a column reference. It is a bit unfortunate that you have to
enter quite a few identity mappings in here (and our foolish
camel-case mogrification of the parameter names doesn't help). Telling
us you are annoyed with this will certainly speed up our efforts to fix
things.
That's about it for defining the table. To fill the table, just have a
normal rowmaker; since the table contains products, don't forget the
//products#define rowfilter in the grammar.
Most of the time, you do not need to worry about telling the Registry
anything about what you do with obscore. As long as you have the
obscore table, your TAP registry record will tell the Registry about it,
and as long as that is published, clients looking for obscore-published
data will find your service and thus your datasets (if they match
the constraints in the obscore query, that is).
In the other direction, when you register a service for a data
collection published via a typed protocol, DaCHS will add a reference
such that clients can see that the data is available through obscore,
too.
But when you do not register a typed service for your data collection
for some reason, you should also register the standalone table as
described in publishing DaCHS-managed tables via TAP.
SIAP version 2 is just a thin layer of parameters on top of obscore.
To publish with SIAP version 2, simply ingest your data as described in
publishing images via SIAP and add the
//obscore#publishSIAP mixin.
In contrast to SIAP version 1, you do not define or register a service
for a SIAPv2-published data collection. Instead,
there is a sitewide SIAPv2 service at <root
URL>/__system__/siap2/sitewide/siap.xml. It is always there,
but it is unpublished by default. To publish it, you should furnish
some extra metadata in the userconfig RD and then run:
dachs pub //siap2
Specifically, get the sitewidesiap2-extras stream and follow the
instructions there to update the meta items as appropriate; at this
point, they are exactly analogous to the ones for SIAP version 1.
EPN-TAP is a standard for publishing planetary data via TAP. DaCHS
has had support for version 0.37, which is in the //epntap RD. Do not
use that for new projects.
The EPN-TAP proposed specification is now in version 2.0, and support
for that is provided by //epntap2. There's an official web-based client
for EPN-TAP at http://vespa.obspm.fr.
You can use EPN-TAP to publish data without any associated datasets;
this happens, for instance, in the catalogue of minor planets,
mpc/q. More commonly, however, there are data files
(“products”) associated with each row. In this case, have at least:
optional_columns="access_url access_format access_estsize"
– these are required to manage such products.
When publishing datasets, there are two basic scenarios:
- local files; you let DaCHS find the sources, parse them, and infer
metadata from this; DaCHS will then serve them. That's what's shown
in the quick start example below. We believe that is the more robust
model overall.
- ingest from pre-distilled metadata; this is when you don't have the
files locally (or at least DaCHS should not serve them). Instead,
you read metadata from dumps from other databases, metadata
stores, or whatever. The titan/q RD shows an example
for that.
To start an EPN-TAP service, do as per Starting from Scratch and use
the epntap template:
dachs start epntap
Data in planetary sciences often comes in PDS format, which
superficially resembles FITS but is quite a bit more sophisticated.
Unfortunately, python support for PDS is underwhelming. At least there
is PyPDS, which needs to be installed for DaCHS' Element
pdsGrammar to work.
Install PyPDS if you don't have it already:
curl -LO https://github.com/RyanBalfanz/PyPDS/archive/master.zip
unzip master.zip
cd PyPDS-master
python setup.py build
sudo python setup.py install
Get the sample data:
cd `dachs config inputsDir`
curl -O http://docs.g-vo.org/epntap-example.tar.gz
tar -xvzf epntap-example.tar.gz
cd lutetia
Import it and build the previews from the PDS images:
dachs imp q
python bin/makePreview.py
Start the server as necessary. Then go to your local ADQL endpoint
(something like http://localhost:8080/adql) and execute queries like:
SELECT * FROM lutetia.epn_core
For access through a standard protocol, start TOPCAT, select VO/TAP
Query, and at the bottom of the dialog enter http://localhost:8080/tap
(or whatever you configured) in “TAP URL”. Hit “Use Service”, wait
until the table metadata is in and then again query something like:
SELECT * FROM lutetia.epn_core
Hit “Run Query”, open the table and play with it. As a little visual treat,
in TOPCAT's main window hit “Activation Action”, and configure the
preview_url column under “View URL as Image”. Then click on the table rows.
To get into Vespa's query interface, you will have to register your
table. Do not do this with the sample data.
In essence, EPNcore is just a set of columns, some mandatory, some
optional. The mandatory ones are pulled into a table by using
the //epntap2#table-2_0 mixin,
which needs the spatial_frame_type
parameter (see the reference for what's supported for it) since that
determines the metadata on the spatial columns. Optional columns can be
pulled in through the optional_columns mixin parameter, and, as said
above, a few of these optional columns are actually required if you want
to publish data products through EPN-TAP. The reference
documentation lists what is available. You can, of course, define
further, non-standard columns as usual.
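Put together, a table definition might start out somewhat like this (the
spatial_frame_type and the set of optional columns of course depend on
your data):
<table id="epn_core" onDisk="True" adql="True">
  <mixin
    spatial_frame_type="body"
    optional_columns="access_url access_format access_estsize"
  >//epntap2#table-2_0</mixin>
  <!-- further, non-standard columns would go here -->
</table>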
To populate EPNcore tables, use the //epntap2#populate-2_0
apply, identifying the parameters applying to your data collection and
setting them as usual (cf. Mapping Data). You may need to refer to
the EPN-TAP proposed specification now and then while doing that.
Note again that
parameter values are python expressions, and so you have to use quotes
when specifying literal strings.
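As a sketch (all values and source keys here are invented for
illustration; pick the bindings that make sense for your collection):
<rowmaker id="make_epncore" idmaps="*">
  <apply procDef="//epntap2#populate-2_0">
    <bind key="granule_uid">@file_id</bind>
    <bind key="granule_gid">"raw_image"</bind>
    <bind key="obs_id">@file_id</bind>
    <bind key="dataproduct_type">"im"</bind>
    <bind key="target_name">"(21) Lutetia"</bind>
    <bind key="target_class">"asteroid"</bind>
    <bind key="instrument_host_name">"Rosetta"</bind>
    <bind key="instrument_name">"OSIRIS"</bind>
  </apply>
</rowmaker>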
If you have to evaluate complex expressions, it is recommended to do the
computations in Element var-s and then use the variables set
there in the binds (as @myvar). This also lets you re-use
values once computed. Even more complex, multi-statement computations
can be done in Element apply with custom code.
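For instance (the column names and the computation are made up):
<rowmaker idmaps="*">
  <!-- compute the value once... -->
  <var name="jd_mid">(@jd_start + @jd_end)/2.</var>
  <apply procDef="//epntap2#populate-2_0">
    <!-- ...and re-use it in several bindings -->
    <bind key="time_min">@jd_mid - @half_width</bind>
    <bind key="time_max">@jd_mid + @half_width</bind>
  </apply>
</rowmaker>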
Serving Local Products
When DaCHS is intended to serve local files itself (which is
preferable),
use the //products#define rowfilter in the grammar as usual
(cf. The Products Table). Note that this assumes by default that
you are serving FITS files, which in EPN-TAP most likely is not the
case. Hence, you will usually have to set the mime parameter as
in, perhaps:
<bind name="mime">"image/x-pds"</bind>
then, in your row maker, use the //epntap2#populate-localfile-2_0
apply (if this gives you errors, make sure you have the optional
columns for products as described above).
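Taken together, the relevant pieces might look like this (the grammar
and the mime type depend on what you actually serve; the rest of the
rowmaker is elided):
<pdsGrammar>
  <rowfilter procDef="//products#define">
    <bind key="table">"\schema.epn_core"</bind>
    <bind key="mime">"image/x-pds"</bind>
  </rowfilter>
</pdsGrammar>
...
<rowmaker idmaps="*">
  <apply procDef="//epntap2#populate-localfile-2_0"/>
  ...
</rowmaker>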
Incidentally, you could still use that even for external products, which
is useful if you have DaCHS-generated previews or want to attach a
datalink service. In that case, however, you have to invent some accref
for DaCHS (probably the remote file path) and set that in
products#define's accref parameter. The remote URI then needs to go
into the path parameter.
Serving External Products
When all you have are external URLs, you do not need to go through the
products table (though you still can, as described in Serving Local
Products). It is simpler, however, to just directly fill the
access_url, access_format and access_estsize columns using
plain Element map-s.
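For instance (the source keys on the right are whatever your input
metadata provides):
<rowmaker idmaps="*">
  <map key="access_url">@remote_url</map>
  <map key="access_format">"image/fits"</map>
  <!-- access_estsize is an estimated size; make sure it is an integer -->
  <map key="access_estsize">int(@size_kb)</map>
</rowmaker>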
The s_region parameter (see
//epntap2#populate-2_0) is essentially
a footprint describing the area covered by 2D spatially extended data products.
It uses pgSphere types such as spoly, scircle, smoc, or spoint
(we advise against using spoint as an s_region type: only spatially
extended types should be used). The default type is spoly; the others
must be specified using the regiontype mixin parameter (see
//epntap2#table-2_0).
For more information on how to create values for these regions, see
Creating pgSphere Geometries.
EPN-TAP tables are queried through the data center's TAP service. If
you have registered that, there is nothing else you need to do to access
your data.
For registration, just add:
<publish/>
to your table body and run dachs pub <rd-id>.
Datalink is not a discovery protocol like the others discussed so far;
rather, it is a file format and a simple access protocol for
representing relationships between parts of complex datasets.
Essentially, datalink is for you if you have parts of a dataset's
provenance chain, refined products like source lists and cutouts, masks, or
whatever else. Together with its companion standard SODA, it also lets
clients do server-side manipulations like cutouts, scaling, format
conversion, and the like.
Datalink is particularly attractive when you have large datasets and you
don't want to push out the whole thing in one go by default. Instead,
clients can then query their users for what part of the dataset they
would like to get – or to alert them of the fact that a large amount of
data is to be expected.
Since Datalink is very flexible, defining datalink services is a bit
involved. The reference documentation has a large section on it.
Here, we discuss some extra usage patterns. The concrete application to
spectra and images is discussed in SSAP and Datalink and
SIAP and Datalink.
In DaCHS, Datalink services are associated with tables. This
association is declared using the _associatedDatalinkService meta
item, which consists of a serviceId (a service reference as per
referencing in DaCHS) and an idColumn (stating from which column
the ID parameter to the datalink service is to be taken). So,
within the table, you add something like:
<meta name="_associatedDatalinkService">
<meta name="serviceId">dl</meta>
<meta name="idColumn">pub_did</meta>
</meta>
This implies that the service dl within the current RD will produce
a datalink document if passed a string from idColumn. The example
implies that this column ought to contain publisher DIDs (see Dataset
Identifiers), which is
what the standard descriptor generators that come with DaCHS like to
see. Since publisher DIDs tend to be a bit unwieldy (they are supposed
to be globally unique, after all), the standard descriptor generators
will also let you pass in plain accrefs.
If you write your own descriptor generator, you are free to stick whatever
you like into the idColumn, just as long as the table and the
descriptor generator agree on its interpretation.
The _associatedDatalinkService declaration discussed in the previous
section is all it takes when you serve data to datalink-aware clients.
If, however, you also want to cater to clients without native datalink
support, you may want to add links to the datalink documents in your
responses; this is particularly advisable when you have services working
through forms in web browsers.
One way to effect that is by defining a column like this:
<column name="datalink" type="text"
ucd="meta.ref.url"
tablehead="DL"
description="URL of a datalink document for this dataset"
verbLevel="1" displayHint="type=url">
<property name="targetType"
>application/x-votable+xml;content=datalink</property>
<property name="targetTitle">Datalink</property>
</column>
The property declarations add some elements to response VOTables that
inform clients like Aladin what to expect when following that link. At
this point, this is a nonstandard convention.
You will then have to fill that column in the rowmaker. As long as the
product is being managed through the products table and you thus used
the //products#define rowfilter in the grammar, all that
takes is a macro:
<map key="datalink">\dlMetaURI{dl}</map>
Here, the “dl” in the macro argument must be the id of the datalink
service.
This method will retain the datalink column even in protocol responses.
There is something to be said for that, because users immediately
discover that datalink is available. However, datalink-aware clients
will then see both the datalink declared through
_associatedDatalinkService and the in-table column; since they cannot
know that the two are really the same, this degrades the user
experience: why should the same datalink be present twice?
With increasing availability of datalink-aware protocol clients, we
therefore prefer a second alternative: produce the extra datalinks only
when rendering form responses. To do that,
furnish web-facing services with an Element outputTable. In
there, do not include the column with your publisher DID but instead
produce a link directly to the links response, somewhat like this:
<service id="web" core="siacore">
...
<outputTable>
<outputField name="dlurl" select="accref"
tablehead="Datalink Access"
description="URL of a datalink document for the dataset
(cutouts, different formats, etc)">
<formatter>
yield T.a(href=getDatalinkMetaLink(
rd.getById("dl"), data)
)["Datalink"]
</formatter>
<property name="targetType"
>application/x-votable+xml;content=datalink</property>
<property name="targetTitle">Datalink</property>
</outputField>
</outputTable>
</service>
In particular for large datasets, it is usually a good idea to keep
people from blindly pulling the data without first having been made
aware that what they're accessing is not just a few megabytes of FITS.
For that, datalink is a good mechanism by pointing to a links response
as the primary document retrieved.
Of course, without a datalink-enabled client people might be locked out
from the dataset entirely. On the other hand, DaCHS comes with a
stylesheet formatting links responses to be usable in a common web
browser, so that might still be preferable to overwhelming unsuspecting
clients with large amounts of data.
To have datalinks rather than the plain dataset as what the accref
points to, you need to change what DaCHS thinks of your dataset; this is
what the //products#define rowfilter in your grammar is for:
<fitsProdGrammar qnd="True">
<rowfilter procDef="//products#define">
<bind key="path">\dlMetaURI{dl}</bind>
<bind key="mime">'application/x-votable+xml;content=datalink'</bind>
<bind key="fsize">10000</bind>
[...]
</rowfilter>
[...]
</fitsProdGrammar>
The fsize here reflects an estimation of the size of the links
response.
When you do this, you must use a descriptor generator that does not
fetch the actual file location from the path in the products table,
since that column now contains the URI of the links response.
For FITS images, you can use the DLFITSProductDescriptor class as
//soda#fits_genDesc's descClass parameter. The base
functionality of a FITS cutout service with datalink products would then
be:
<service id="dl" allowed="dlget,dlmeta">
<meta name="title">My Cutout Service</meta>
<datalinkCore>
<descriptorGenerator procDef="//soda#fits_genDesc"
name="genFITSDesc">
<bind key="accrefPrefix">'mysvcs/data'</bind>
<bind key="descClass">DLFITSProductDescriptor</bind>
</descriptorGenerator>
<FEED source="//soda#fits_standardDLFuncs"/>
</datalinkCore>
</service>
If you have something else, you will have to write the resolution code
yourself – DLFITSProductDescriptor's sources (in
gavo.protocols.datalink) should give you a head start on how to do
that; see also the tsdl service in bgds/l for how to integrate
that into your RD.
Note that DaCHS will not produce automatic previews in this
situation. Have a look at Product Previews for what to do
instead.
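For instance, if you have pre-computed preview images, one option from
that section is to point //products#define at them; a sketch, assuming
PNG previews sitting in the standard preview path:
<rowfilter procDef="//products#define">
  <bind key="path">\dlMetaURI{dl}</bind>
  <bind key="mime">'application/x-votable+xml;content=datalink'</bind>
  <bind key="preview">\standardPreviewPath</bind>
  <bind key="preview_mime">'image/png'</bind>
</rowfilter>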