WIS2: topic hierarchies & dataset classifications
Notes from informal meeting, 16 Jul 2021
Present: Tom Kralidis, Peter Silva, Jeremy Tandy, Enrico Fucile
Topic hierarchies & dataset classifications.
Are these the same thing - or related in some way?
Topic (in message protocol) is not about discovery.
In S3 and other object stores, the key is a single name used to look up a resource. The topic name provides the same function for message brokers.
Topic tree is a hierarchical set of names; similar to OID.
Aim is to create a structured topic tree.
TT-Protocols suggest a hierarchy based on TTAAii and the WMO file naming convention.
Starting at the top-level with Country and Originating Centre to partition the hierarchy into sub-hierarchies that can be independently maintained - similar to barcode governance.
Country = authority of publisher … not the location of the observation / simulation data.
What about ships and mobile platforms? These change location.
Also note the challenge wrt ownership/publication: OceanOps runs platforms operated by coalitions of countries. Current practice is to use the Country/Origin for where the data is published.
…
Topic hierarchy = country | originating centre | {thematic stuff}
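A minimal sketch of how a topic name could be assembled from these levels. The "/" separator, the level values ("ca", "cmc", "observations", "synop") and the function name are illustrative assumptions, not agreed WIS2 conventions.

```python
def build_topic(country: str, centre: str, *thematic: str, sep: str = "/") -> str:
    """Join the levels into one topic name: country | originating centre | thematic levels."""
    levels = [country, centre, *thematic]
    for level in levels:
        # A level must be non-empty and must not contain the separator,
        # otherwise the hierarchy could no longer be split back into levels.
        if not level or sep in level:
            raise ValueError(f"invalid topic level: {level!r}")
    return sep.join(levels)


# Hypothetical example values; the thematic levels are still to be decided.
topic = build_topic("ca", "cmc", "observations", "synop")
print(topic)
```

Putting country and originating centre first means everything under "ca/cmc" can be maintained by that centre without coordinating with anyone else.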
…
Enrico > GTS headers don't fit anymore. GTS headers are broken. TTAAii-based topic hierarchy won't work in WIS2.
TT-Protocols proposal is based on backward compatibility with GTS. [further discussion needed]
…
Existing WMO Code Tables. What can we re-use?
…
[Data] Format is not included in the topic hierarchy.
…
Filtering - by topic.
Server-side filtering. You can wild-card at any point in the hierarchy.
Client-side filtering. Needs parsing of the message itself, after it's downloaded. For example, a regular expression on the filename. Or look inside the message to see if you need it (or need the data that the message refers to).
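The two kinds of filtering can be sketched as below. This assumes MQTT-style wildcards ("+" for exactly one level, "#" for all remaining levels); the notes do not name a protocol, and other brokers use different wildcard characters. The message layout (a dict with a "filename" key) and the topic values are also hypothetical.

```python
import re


def topic_matches(pattern: str, topic: str) -> bool:
    """Server-side style match: '+' matches one level, '#' matches all remaining levels."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            # '#' is only meaningful as the last level of the pattern.
            return i == len(p_levels) - 1
        if i >= len(t_levels):
            return False
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)


def filename_wanted(message: dict, filename_regex: str) -> bool:
    """Client-side filter: applied to a message that has already been received."""
    return re.search(filename_regex, message["filename"]) is not None


# The broker only delivers messages whose topic matches the subscription ...
print(topic_matches("ca/+/observations/#", "ca/cmc/observations/synop"))
# ... and the client then discards what it does not need.
print(filename_wanted({"filename": "synop_20210716_0600.bufr"}, r"_0600\."))
```

The wildcard match needs only the topic name, so the broker can apply it before sending anything; the regular expression can only run once the message is in the client's hands.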
Metadata:
Dataset is the primary citizen in WIS.
Search the catalogue to find the dataset you're interested in; you are then redirected to the data access end-point(s) based on information in the catalogue record - e.g. an actionable link pointing to the topic / sub-topic from where one can subscribe.
Assumption: people will want to subscribe to changes in a dataset.
For example: all Canadian SYNOPs could be a dataset, comprising thousands of files. Subscribers would be told about the availability of a new file.
Catalogue record will include descriptive metadata; including keywords and concepts from "themes" (hierarchical controlled vocabularies) - these might be similar to GCMD or GEMET
WIS Search API will include ability to search by theme topic.
Do these hierarchical controlled vocabularies map to the message broker topic hierarchies???
How does the WWW community want to acquire data?
How should we group data assets together into sets that make sense?
e.g. SYNOP, but not the more detailed level of "intermediate"
Enrico > is "intermediate" of use to anyone? it's just an artifact of history.
…
How big should the subset be? A whole country? A region?
Separation of concerns?
[A] search the catalogue --> [B] find the metadata record --> [C] find the actionable link(s) in the metadata (e.g. WAF, OGC-API, asynchronous API endpoint with topic hierarchy)
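The [A] --> [B] --> [C] flow above can be sketched with an in-memory catalogue. This is an illustration only: the record fields, the link types ("subscribe", "download"), the identifiers and the example.org URLs are all hypothetical and do not describe the actual WIS metadata schema or Search API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    rel: str   # hypothetical link type, e.g. "subscribe" or "download"
    href: str  # placeholder URL; for "subscribe" it carries the topic


@dataclass
class Record:
    identifier: str
    title: str
    keywords: List[str]
    links: List[Link] = field(default_factory=list)


# Hypothetical catalogue with a single dataset record.
CATALOGUE = [
    Record(
        identifier="ca-synop",
        title="Canadian SYNOP observations",
        keywords=["synop", "surface observations"],
        links=[
            Link("subscribe", "mqtts://broker.example.org/ca/cmc/observations/synop"),
            Link("download", "https://data.example.org/waf/ca/synop/"),
        ],
    ),
]


def search(catalogue: List[Record], term: str) -> List[Record]:
    """[A] -> [B]: return records whose title or keywords mention the term."""
    term = term.lower()
    return [
        r for r in catalogue
        if term in r.title.lower() or term in [k.lower() for k in r.keywords]
    ]


def actionable_links(record: Record, rel: str) -> List[Link]:
    """[B] -> [C]: pick the links of one type out of a metadata record."""
    return [link for link in record.links if link.rel == rel]


for record in search(CATALOGUE, "synop"):
    for link in actionable_links(record, "subscribe"):
        print(record.identifier, "->", link.href)
```

In this sketch the topic hierarchy only appears inside the link of the metadata record, which keeps discovery (catalogue) separate from delivery (message broker).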