2022-03-14 ET-W2AT Meeting

Date

Mar 14, 2022 13:00-14:30 UTC

Participants

ET-W2AT

  • @Rémy Giraud

  • @Jeremy Tandy

  • @Tom Kralidis

  • @peter.silva

  • @Baudouin Raoult

  • @Kai Wirt

  • @Henning Weber

  • @thorsten.buesselberg

Other Experts

  • @Kari Sheets

WMO Secretariat

  • @Enrico Fucile

  • @HADDOUCH Hassan

  • @Xiaoxia Chen

Apologies

  • @Pablo Loyber

  • @Li Xiang

  • @Kenji Tsunoda

  • @Dana Ostrenga

  • @sabai.fatima

Goals

Continue discussion on message structure and metadata topic hierarchy

Discussion topics

No

Item

Presenter

Notes

1

Topic tree

Peter Silva

Peter continues with his slides on message structure:

  • Topic tree changes from WMO-386

  • TT-Protocols has been mapping from TTAAii (etc.), but there are some deficiencies

  • A common request is to take out references to time (e.g. forecast time)

  • Adding Geo and time Extent

  • …

2

Discussion

All

Enrico > TT-Protocols work is good: it maps from GTS and is now trying to simplify. But as an architecture team, we need to decide whether the mapping from GTS is what we need in WIS2, or whether we should just expose the legacy GTS part(s) within WIS2

Jeremy > WIS2 needs to cover more data than has been shared in GTS, so the topic tree needs to be larger, following the Earth system approach. GTS can only be a branch within the tree

 Peter > agreed, hoping that the GTS branch can fit within the WIS2 topic tree, with GTS data showing up in the right place

 Enrico > GTS data is broader than World Weather Watch; it contains pretty much everything. Hydrology is missing; oceanography is important

 Jeremy > what I want to see is how close the TT-Protocols mapping from GTS gets us

 Peter > TT-Protocols work is still in progress. We are working with data consumers to see which parts of the TTAAii or filename they use or don't use, to determine which parts are irrelevant as criteria for organising data. Time category (e.g. "intermediate") seems to be an irrelevant criterion. Another irrelevant criterion is geographic position. On semantic equivalence between TAC and BUFR: irrespective of the encoding/format, a given category of data should appear in the same place [noting that BUFR Table D sequences are what define the semantics of the data content - and there are 6 flavors of SYNOP!]

 Enrico > there's lots of complexity for observations. I suggest that we have "WIGOS" at a point in the tree, and another for "GDPFS" for NWP data. It’s important to re-use existing vocabularies. We need to be consistent across Activity Areas / Programmes. So, SC-ON / WIGOS need to do this work, with support from us

 [see http://codes.wmo.int/wmdr/ ]

 Peter > we need to hand-off these sub-trees to the domain experts, so in my example, "surface" should be the hand-off point

 Tom > so at the "surface" level in the tree, the Programmes decide that name too?

 Jeremy > avoid using WMO Programme names in the tree - these are potentially brittle

 ACTION (Enrico?) > link TT-Protocols team with WIGOS folk (Jorg Klausen et al.)

 Peter > it would be OK for data to appear at multiple points in the tree

 Peter > so - we delegate to communities of interest for them to develop their subtree? And provide a first attempt or some general guidance [agreed]

 Tom > we need to provide a framework to constrain how they develop their subtree [agreed]

 Peter > filenames? are current GTS filenames useful for WIS2? we have the TTAAii and hash [in the example shown]. This doesn't provide any useful information for MQP users, or WAF users. But if we remove the TTAAii from the filename, we'll struggle to roundtrip back to GTS. So I recommend that we keep the TTAAii in the filename for legacy products, but see this as an opaque identifier; usable as a key to lookup data [?]

 Enrico > we can hand this "product identifier" concern to the domain experts, is OK for WIGOS, but there isn't an equivalent for GDPFS

 Kai > agree that we hand off subtree structure and filenames to the domain experts, but it's our job to keep all the parts [of WIS] working together. For example, making sure that we can link filename and topic. We don't really care what the topic or filenames look like - we just need to be able to link them together

 Tom > agrees, example from WIS2BOX

 Enrico > suggests that we put the location information in the BUFR data file

 Peter > but you have to download the file to know if you want to download the file

 Jeremy > if we put geographic information in the message, can we do client-side filtering? Or would this begin to bulk out the message?

 Tom > STAC does bounding boxes; so does GeoJSON

 Peter > a simple bounding box would be OK; small enough, easy to understand

 Tom > keep the bounding box to WGS84

 Kai > let's look at the different layers of decisions, first: look at the discovery metadata to decide what topics to subscribe to; second: if you put information in the message

 Peter > we can do server side filtering on the topics - but only string pattern matching, hard to do string pattern matching for bounding boxes, so recommend putting it in the message, and allowing filtering on location there
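As a rough illustration of the client-side filtering discussed above, the sketch below assumes each notification message carries a WGS84 bounding box in GeoJSON order (west, south, east, north). The field names `bbox` and `url`, the example URLs, and the function names are hypothetical; the meeting did not agree on a message layout. Boxes that cross the antimeridian are not handled.

```python
from typing import Optional, Sequence


def bbox_intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if two [west, south, east, north] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def should_download(message: dict, area_of_interest: Sequence[float]) -> bool:
    """Decide from the message alone whether the file is worth fetching."""
    bbox: Optional[Sequence[float]] = message.get("bbox")
    if bbox is None:
        # Assumption: with no location information we cannot filter, so keep it.
        return True
    return bbox_intersects(bbox, area_of_interest)


# Hypothetical messages; only the "bbox" member matters for this sketch.
messages = [
    {"url": "https://example.org/data/a.bufr", "bbox": [5.0, 45.0, 10.0, 50.0]},
    {"url": "https://example.org/data/b.bufr", "bbox": [-80.0, 40.0, -70.0, 50.0]},
    {"url": "https://example.org/data/c.bufr"},
]

europe = [-10.0, 35.0, 30.0, 60.0]
selected = [m["url"] for m in messages if should_download(m, europe)]
print(selected)
```

The broker still does the coarse selection by topic; the bounding box check runs only on messages the client has already received, so nothing has to be downloaded to make the decision.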

 Peter > recommends adding metadata (e.g. valid time) in the filename because it enables users of the Web Accessible Folders to see data. They don't see MQ message, so metadata there can't be used

 Enrico > WAF isn't designed to browse. If you wanted to do this, I recommend using the API access or providing STAC metadata

 Tom > a lot of OGC Met Ocean DWG work delineates between the concepts of design time vs. run time. Consider discovery metadata as "design time" curation: people should look at the discovery metadata; from there they can jump to the WAF, API etc.

 Jeremy > don't like putting metadata in filenames - it's bad practice: 1) filenames with embedded metadata are "micro-formats" that people need to parse. It's OK for specialists - but not for generalists. 2) we're trying to work with other communities - it's unreasonable to expect other people to follow our filename conventions

 Kai > we use the topic and the message metadata to decide whether to download the files

 Peter > but this makes the WAF unusable … because people would be browsing "gibberish" (nonsensical filenames)

 Jeremy > but browsing the WAF is an edge case used to recover if your system loses its queue and needs to re-boot and grab files, it's not the main case

 Tom > the WAF will include the topic-tree in terms of the directory structure, so there will be some "metadata". Why wouldn't a Data Consumer just grab all the files in the sub-directory? An HTTP server could provide metadata like date of publication, so a Data Consumer could download from newest to oldest, and stop when they reach data they don't need?

 Enrico > perhaps we should put metadata alongside the files in the WAF, e.g. a copy of the message or STAC metadata

 Kai > the caches have a copy of all the messages anyway, so for recovery, you could review these

 Jeremy > but clients should keep some state information, e.g. what files have already been downloaded (or what messages have been received)
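One way to picture the recovery case discussed above: the client keeps a record of what it has already downloaded, walks the WAF directory listing from newest to oldest, and stops at the first file it already holds. The listing format, the file names, and the function name below are hypothetical, and the sketch assumes the client downloaded files in publication order before it lost its queue.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple


def files_to_fetch(listing: Iterable[Tuple[str, datetime]], seen: Set[str]) -> List[str]:
    """Return names to download, newest first, stopping at the first known file."""
    todo = []
    for name, _published in sorted(listing, key=lambda entry: entry[1], reverse=True):
        if name in seen:
            # Everything older is assumed to have been downloaded already.
            break
        todo.append(name)
    return todo


# Hypothetical directory listing: (file name, publication time from the HTTP server).
listing = [
    ("a1f3.bufr", datetime(2022, 3, 14, 9, 0, tzinfo=timezone.utc)),
    ("c3d5.bufr", datetime(2022, 3, 14, 11, 0, tzinfo=timezone.utc)),
    ("b2e4.bufr", datetime(2022, 3, 14, 10, 0, tzinfo=timezone.utc)),
]
seen = {"a1f3.bufr"}  # client-side state that survived the restart

print(files_to_fetch(listing, seen))
```

The file names are deliberately opaque: the ordering comes from the publication time reported by the server and from the client's own state, not from metadata embedded in the name.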

 Tom > is there any harm in making the Global Cache WAF browsable, and in having the Global Cache provide a STAC end-point to make the data browsable? This would add value for downstream data users

 Canadian examples:

WAF: https://dd.weather.gc.ca/

STAC: https://api.weather.gc.ca/stac/msc-datamart

https://api.weather.gc.ca/stac/msc-datamart/model_gem_global/15km/grib2/lat_lon/00/000/CMC_glb_ABSV_ISBL_200_latlon.15x.15_2022031300_P000?f=json

https://api.weather.gc.ca/stac/msc-datamart/model_gem_global/15km/grib2/lat_lon/00/000/CMC_glb_ABSV_ISBL_200_latlon.15x.15_2022031300_P000

 Jeremy > If we did this, would we make these Global Cache WAFs discoverable?

 Tom > yes

 Jeremy > And Global Catalogue would add the association links?

 Tom > yes

 Kai > STAC for the Global Cache may not work - how would it know how to build the STAC Items?

 Enrico > It's difficult to do this in general; even BUFR format is quite variable

 Jeremy > pause this discussion - let's get back to topic tree for now

we'll pick up the filename / WAF / browsable end-point discussion another time [28-March]

3

Metadata topic hierarchy

Tom Kralidis

Tom presents his slides on WIS2 Discovery metadata topic hierarchy perspectives

  • Topic hierarchy

  • Topic hierarchy components

    • Core

    • Data type

  • Temporal

  • …

4

Discussion

All

Jeremy > putting data policy ("core", "recommended") in the topic hierarchy seems problematic to me. What happens when data is moved from "recommended" to "core"? It would change the place of the data, and this will happen! What's the rationale?

 Enrico > this is my suggestion. The rationale is that you need to take different actions for core vs recommended data: core data is open and unrestricted. It's about managing access control

 Peter > recommends that we use different channels for open and non-open; both with the same topic-tree beneath. Best to have the point in the topic-tree relating to access control as high as possible

Kai > supports Peter's proposal. We will have Data Consumers that subscribe only to public / open data; if data then gets moved to "open", existing (wild-carded) subscriptions would simply start returning the new data

 Hassan > recommends "public" and "limited" for the top-level channel names
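To illustrate the channel idea discussed above, the sketch below places the same subtree under two top-level channels and checks a wild-carded subscription against both. It assumes MQTT-style wildcards ("+" for one level, a trailing "#" for all remaining levels); the levels below the channel ("WIS/surface/synop") and the function name are hypothetical.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style match: '+' matches one level, a trailing '#' matches the rest."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            return True  # multi-level wildcard: everything below matches
        if i >= len(topic_levels):
            return False  # pattern is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(pattern_levels) == len(topic_levels)


subscription = "public/WIS/surface/#"

# The same data before and after it is moved to the open channel.
before = "limited/WIS/surface/synop"
after = "public/WIS/surface/synop"

print(topic_matches(subscription, before))
print(topic_matches(subscription, after))
```

Because only the first level differs, the subscriber does not need to change anything when data moves between channels; the existing pattern simply starts matching.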

 Enrico > outstanding work item to determine how we deal with Recommended data with access controls.

 Tom > need to make sure that the Channel isn't part of the classification hierarchy, so from Peter's example "xpublic/v03/" wouldn't appear in the metadata classification hierarchy, these things are implementation specific [agreed]
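A small sketch of that separation, assuming the implementation-specific prefix is always the first two levels of the topic (channel and version, as in "xpublic/v03/"). The levels after "WIS", the second channel name, and the function name are hypothetical.

```python
def classification_path(topic: str, prefix_levels: int = 2) -> str:
    """Drop the implementation-specific prefix and keep the classification part."""
    levels = topic.split("/")
    if len(levels) <= prefix_levels:
        raise ValueError(f"topic has no classification part: {topic!r}")
    return "/".join(levels[prefix_levels:])


print(classification_path("xpublic/v03/WIS/surface/synop"))
```

With this split, the same classification path can be recorded in the discovery metadata regardless of which channel the data is published on.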

 Tom > but from "WIS" downward, this looks OK

 Enrico > but the CCCC (originating centre) is legacy, it's a badly managed resource of originating centres. I agree to manage originating codes at the country-level

 Jeremy > the originating centre will be in the discovery metadata

 Enrico > we might be able to allocate new identifiers for WIS2 nodes

Action items

  • (Enrico?) link TT-Protocols team with WIGOS folk (Jorg Klausen et al.)

Decisions

 

  1. We delegate subtree development to communities of interest, and provide a first attempt or some general guidance
  2. We need to provide a framework to constrain how they develop their subtree
  3. The Channel must not be part of the classification hierarchy