WIS2: topic hierarchies & dataset classifications

Notes from informal meeting, 16 Jul 2021

Present: Tom Kralidis, Peter Silva, Jeremy Tandy, Enrico Fucile

 

  • Topic hierarchies & dataset classifications.

  • Are these the same thing - or related in some way?

 

  • Topic (in message protocol) is not about discovery.

  • In S3 and other object stores, the key is a single name used to look up a resource. The topic name serves the same function for message brokers.

  • Topic tree is a hierarchical set of names; similar to OID.

  • Aim is to create a structured topic tree.

  • TT-Protocols suggest a hierarchy based on TTAAii and the WMO file naming convention.

  • Start at the top level with Country and Originating Centre, which partitions the hierarchy into sub-hierarchies that can be independently maintained - similar to barcode governance.

  • Country = authority of publisher … not the location of the observation / simulation data.

  • What about ships and mobile platforms? These change location.

  • Also note the challenge with respect to ownership/publication: OceanOps runs platforms operated by coalitions of countries. Current practice is to use the Country/Origin of where the data is published.

  • …

  • Topic hierarchy = country | originating centre | {thematic stuff}

  • …

  • Enrico > GTS headers don't fit anymore. GTS headers are broken. TTAAii-based topic hierarchy won't work in WIS2.

  • TT-Protocols proposal is based on backward compatibility with GTS. [further discussion needed]

  • …

  • Existing WMO Code Tables. What can we re-use?

  • …

  • [Data] Format is not included in the topic hierarchy.

  • …

  • Filtering - by topic.

  • Server-side filtering. You can wild-card at any point in the hierarchy.

  • Client-side filtering. Needs parsing of the message itself, after it's downloaded. For example, a regular expression on the filename. Or look inside the message to see if you need it (or need the data that the message refers to).
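
The topic structure and the two filtering styles above can be sketched in Python. This is a minimal illustration, not an agreed WIS2 design: it assumes "/" as the level separator and MQTT-style wildcards ("+" for one level, "#" for all remaining levels), neither of which was decided in the meeting. The function names, the example levels ("ca", "cwao", "observations", "synop") and the filenames are hypothetical.

```python
import re


def build_topic(country, centre, *thematic):
    """Compose: country | originating centre | {thematic levels}."""
    return "/".join([country, centre, *thematic])


def topic_matches(pattern, topic):
    """Server-side style filter (assumed MQTT-like semantics).

    '+' matches exactly one level; '#' matches all remaining levels
    and is assumed to appear only as the last level of the pattern.
    """
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True
        if i >= len(t_levels):
            return False
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)


def client_side_filter(messages, filename_regex):
    """Client-side filter: applied after the messages have been received."""
    rx = re.compile(filename_regex)
    return [m for m in messages if rx.search(m["filename"])]


topic = build_topic("ca", "cwao", "observations", "synop")

# Wild-card at any point in the hierarchy.
print(topic_matches("ca/#", topic))                 # all levels under "ca"
print(topic_matches("+/+/observations/#", topic))   # any country and centre

# Format is not part of the topic, so it has to be filtered client-side.
messages = [
    {"topic": topic, "filename": "synop_20210716_0000.bufr"},
    {"topic": topic, "filename": "synop_20210716_0600.txt"},
]
print(client_side_filter(messages, r"\.bufr$"))
```

Because the data format is left out of the topic hierarchy, a subscriber that wants only one format would need the client-side step (here, a regular expression on the filename) on top of the topic subscription.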

 

  • Metadata:

  • The dataset is the first-class citizen in WIS.

  • Search the catalogue to find the dataset you're interested in; you are then redirected to the data access end-point(s) based on information in the catalogue record - e.g. an actionable link pointing to the topic / sub-topic from where one can subscribe.

  • Assumption: people will want to subscribe to changes in a dataset.

  • For example: all Canadian SYNOPs could be a dataset, comprising thousands of files. Subscribers would be told about the availability of a new file.

  • Catalogue record will include descriptive metadata; including keywords and concepts from "themes" (hierarchical controlled vocabularies) - these might be similar to GCMD or GEMET

  • WIS Search API will include ability to search by theme topic.

  • Do these hierarchical controlled vocabularies map to the message broker topic hierarchies???

 

  • How does the WWW community want to acquire data?

  • How should we group data assets together into sets that make sense?

  • e.g. SYNOP, but not the more detailed level of "intermediate"

  • Enrico > Is "intermediate" of use to anyone? It's just an artifact of history.

  • …

  • How big should the subset be? A whole country? A region?

 

  • Separation of concerns?

  • [A] search the catalogue --> [B] find the metadata record --> [C] find the actionable link(s) in the metadata (e.g. WAF, OGC-API, asynchronous API endpoint with topic hierarchy)
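
The [A] --> [B] --> [C] flow can be sketched against a small in-memory catalogue. This is an illustration only, not the WIS Search API: the record layout, the theme paths, the "rel" values and the example hosts are all hypothetical, and themes are assumed to be "/"-separated paths from a hierarchical controlled vocabulary.

```python
# Hypothetical catalogue records: descriptive metadata plus actionable links.
CATALOGUE = [
    {
        "id": "ca-synop",
        "title": "Canadian SYNOP observations",
        "themes": ["weather/observations/surface/synop"],
        "links": [
            {"rel": "subscribe",
             "href": "mqtt://broker.example/ca/cwao/observations/synop"},
            {"rel": "download", "href": "https://waf.example/ca/synop/"},
        ],
    },
    {
        "id": "ocean-buoys",
        "title": "Drifting buoy reports",
        "themes": ["ocean/observations/buoy"],
        "links": [
            {"rel": "download", "href": "https://waf.example/ocean/buoys/"},
        ],
    },
]


def search_by_theme(catalogue, theme):
    """[A] -> [B]: records with a theme equal to, or nested under, `theme`."""
    prefix = theme + "/"
    return [
        record for record in catalogue
        if any(t == theme or t.startswith(prefix) for t in record["themes"])
    ]


def actionable_links(record, rel):
    """[C]: the links of one kind (e.g. 'subscribe') in a metadata record."""
    return [link["href"] for link in record["links"] if link["rel"] == rel]


for record in search_by_theme(CATALOGUE, "weather/observations"):
    print(record["title"], actionable_links(record, "subscribe"))
```

In this sketch discovery uses the theme vocabulary, and the broker topic appears only inside the "subscribe" link. That keeps the two hierarchies separate, and it leaves open whether they should map onto each other.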