2022-03-14 ET-W2AT Meeting
Date
Mar 14, 2022 13:00-14:30 UTC
Participants
ET-W2AT
@Rémy Giraud
@Jeremy Tandy
@Tom Kralidis
@peter.silva
@Baudouin Raoult
@Kai Wirt
@Henning Weber
@thorsten.buesselberg
Other Experts
@Kari Sheets
WMO Secretariat
@Enrico Fucile
@HADDOUCH Hassan
@Xiaoxia Chen
Apologies
@Pablo Loyber
@Li Xiang
@Kenji Tsunoda
@Dana Ostrenga
@sabai.fatima
Goals
Continue discussion on message structure and metadata topic hierarchy
Discussion topics
No | Item | Presenter | Notes |
---|---|---|---|
1 | Topic tree | Peter Silva | Peter continues with his slides on message structure.
|
2 | Discussion | All | Enrico > TT-Protocols work is good: it maps from GTS and is now trying to simplify. But as an architecture team, we need to decide whether the mapping from GTS is what we need in WIS2, or whether we should just expose the legacy GTS part(s) within WIS2.
Jeremy > WIS2 needs to cover more data than has been shared in GTS, so the topic tree needs to be larger, following an Earth system approach. GTS can only be a branch within the tree.
Peter > agreed; hoping that the GTS branch can fit within the WIS2 topic tree, with GTS data showing up in the right place.
Enrico > GTS data is broader than World Weather Watch; it contains pretty much everything. Hydrology is missing; oceanography is important.
Jeremy > what I want to see is: how close does the TT-Protocols mapping from GTS get us?
Peter > TT-Protocols work is still in progress. We are working with data consumers to see which bits of the TTAAii or filename they use / don't use, to determine which bits are irrelevant as criteria to organise data. Time category (e.g. "intermediate") seems to be an irrelevant criterion. Another irrelevant criterion is geographic position. Semantic equivalence between TAC and BUFR: irrespective of the encoding/format, a given category of data should appear in the same place [noting that BUFR Table D sequences are what define the semantics of the data content - and there are 6 flavors of SYNOP!]
Enrico > there's lots of complexity for observations. I suggest that we have "WIGOS" at one point in the tree, and another ("GDPFS") for NWP data. It's important to re-use existing vocabularies. We need to be consistent across Activity Areas / Programmes. So SC-ON / WIGOS need to do this work, with support from us [see http://codes.wmo.int/wmdr/ ]
Peter > we need to hand off these sub-trees to the domain experts; so in my example, "surface" should be the hand-off point.
Tom > so at the "surface" level in the tree, do the Programmes decide that name too?
Jeremy > avoid using WMO Programme names in the tree - these are potentially brittle.
ACTION (Enrico?) > link TT-Protocols team with WIGOS folk (Jorg Klausen et al.)
Peter > it would be OK for data to appear at multiple points in the tree.
Peter > so - we delegate to communities of interest for them to develop their subtree? And provide a first attempt or some general guidance [agreed]
Tom > we need to provide a framework to constrain how they develop their subtree [agreed]
Peter > filenames? Are current GTS filenames useful for WIS2? We have the TTAAii and hash [in the example shown]. This doesn't provide any useful information for MQP users, or WAF users. But if we remove the TTAAii from the filename, we'll struggle to roundtrip back to GTS. So I recommend that we keep the TTAAii in the filename for legacy products, but treat it as an opaque identifier, usable as a key to look up data [?]
Enrico > we can hand this "product identifier" concern to the domain experts; this is OK for WIGOS, but there isn't an equivalent for GDPFS.
Kai > agree that we hand off subtree structure and filenames to the domain experts, but it's our job to keep all the parts [of WIS] working together. For example, making sure that we can link filename and topic. We don't really care what the topic or filenames look like - we just need to be able to link them together.
Tom > agrees; example from WIS2BOX.
Enrico > notes that we put the location information in the BUFR data file.
Peter > but you have to download the file to know if you want to download the file.
Jeremy > if we put geographic information in the message, can we do client-side filtering? Or would this begin to bulk out the message?
Tom > STAC does bounding boxes; so does GeoJSON.
Peter > a simple bounding box would be OK; small enough, easy to understand.
Tom > keep the bounding box to WGS84.
Kai > let's look at the different layers of decisions. First: look at the discovery metadata to decide what topics to subscribe to; second: if you put information in the message
Peter > we can do server-side filtering on the topics, but only string pattern matching; it's hard to do string pattern matching for bounding boxes, so I recommend putting the bounding box in the message and allowing filtering on location there.
Peter > recommends adding metadata (e.g. valid time) to the filename, because it enables users of the Web Accessible Folders to see data. They don't see the MQ message, so metadata there can't be used.
Enrico > WAF isn't designed for browsing. If you wanted to do this, I recommend using the API access or providing STAC metadata.
Tom > lots of OGC Met Ocean DWG work delineates between the concepts of design time vs. run time. Consider discovery metadata as "design time" curation: people should look at the discovery metadata, and from there they can jump to the WAF, API, etc.
Jeremy > I don't like putting metadata in filenames - it's bad practice: 1) filenames with embedded metadata are "micro-formats" that people need to parse; that's OK for specialists, but not for generalists. 2) we're trying to work with other communities - it's unreasonable to expect other people to follow our filename conventions.
Kai > we use the topic and the message metadata to decide whether to download the files.
Peter > but this makes the WAF unusable, because people would be browsing "gibberish" (nonsensical filenames).
Jeremy > but browsing the WAF is an edge case, used to recover if your system loses its queue and needs to reboot and grab files; it's not the main case.
Tom > the WAF will include the topic tree in its directory structure, so there will be some "metadata". Why wouldn't a Data Consumer just grab all the files in the sub-directory? An HTTP server could provide metadata like date of publication, so a Data Consumer could download from newest to oldest, and stop when they reach data they don't need?
Enrico > perhaps we should put metadata alongside the files in the WAF, e.g. a copy of the message or STAC metadata.
Kai > the caches have a copy of all the messages anyway, so for recovery, you could review these.
Jeremy > but clients should keep some state information, e.g. what files have already been downloaded (or what messages have been received).
Tom > is there any harm in making the Global Cache WAF browseable? And in making the Global Cache provide a STAC endpoint to make the data browseable? This would add value for downstream data users. Canadian examples: WAF at https://dd.weather.gc.ca/ and STAC at https://api.weather.gc.ca/stac/msc-datamart
Jeremy > if we did this, would we make these Global Cache WAFs discoverable?
Tom > yes.
Jeremy > and the Global Catalogue would add the association links?
Tom > yes.
Kai > STAC for the Global Cache may not work - how would it know how to build the STAC Items?
Enrico > it's difficult to do this in general; even BUFR format is quite variable.
Jeremy > pause this discussion and get back to the topic tree for now; we'll pick up the filename / WAF / browsable endpoint discussion another time [28-March] |
3 | Metadata topic hierarchy | Tom Kralidis | Tom presents his slides on WIS2 discovery metadata topic hierarchy perspectives
|
4 | Discussion | All | Jeremy > putting data policy ("core", "recommended") in the hierarchy seems problematic to me. What happens when data is moved from "recommended" to "core"? It would change the place of the data, and this will happen! What's the rationale?
Enrico > this is my suggestion. The rationale is that you need to take different actions for core vs. recommended data: core data is open and unrestricted, so it's about managing access control.
Peter > recommends that we use different channels for open and non-open data, both with the same topic tree beneath. Best to have the point in the topic tree relating to access control as high as possible.
Kai > supports Peter's proposal. We will have Data Consumers that subscribe only to public / open data; then, if data got moved to "open", existing (wild-carded) subscriptions would simply start returning the new data.
Hassan > recommends "public" and "limited" for the top-level channel names.
Enrico > outstanding work item: determine how we deal with Recommended data with access controls.
Tom > need to make sure that the Channel isn't part of the classification hierarchy; so from Peter's example, "xpublic/v03/" wouldn't appear in the metadata classification hierarchy - these things are implementation-specific [agreed]
Tom > but from "WIS" downward, this looks OK.
Enrico > but the CCCC (originating centre) is legacy; it's a badly managed list of originating centres. I agree to manage originating codes at the country level.
Jeremy > the originating centre will be in the discovery metadata.
Enrico > we might be able to allocate new identifiers for WIS2 nodes. |
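Peter's point in item 2 (server-side filtering on topics is limited to string pattern matching, so a bounding box belongs in the message and location filtering happens client-side) can be illustrated with a minimal filter. This is a sketch only: the `bbox` message field, its [west, south, east, north] ordering in WGS84 degrees (borrowed from the GeoJSON convention Tom mentioned) and the helper names are assumptions, not an agreed WIS2 message format, and boxes crossing the antimeridian are not handled.

```python
from typing import Sequence

# Assumption: [west, south, east, north] in WGS84 degrees (GeoJSON-style 2D bbox).
# Boxes crossing the antimeridian are not handled in this sketch.
BBox = Sequence[float]


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two bounding boxes overlap (touching edges count)."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (a_west <= b_east and b_west <= a_east
            and a_south <= b_north and b_south <= a_north)


def wanted(message: dict, area_of_interest: BBox) -> bool:
    """Client-side filter on a notification message.

    "bbox" is a hypothetical message field. A message without it is kept,
    because the client cannot rule it out without downloading the data.
    """
    bbox = message.get("bbox")
    if bbox is None:
        return True
    return intersects(bbox, area_of_interest)


if __name__ == "__main__":
    europe = [-25.0, 34.0, 45.0, 72.0]
    notifications = [
        {"id": "a", "bbox": [5.9, 45.8, 10.5, 47.8]},     # inside the area
        {"id": "b", "bbox": [-80.0, 40.0, -70.0, 45.0]},  # outside the area
        {"id": "c"},                                       # no bbox in message
    ]
    kept = [m["id"] for m in notifications if wanted(m, europe)]
    print(kept)  # ['a', 'c']
```

Keeping messages that carry no bounding box is a deliberately conservative choice here; a stricter client could drop them instead.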
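The WAF recovery case from item 2 (Tom: download from newest to oldest and stop when reaching data that is no longer needed; Jeremy: clients keep state about what has already been downloaded) could look roughly like the sketch below. The listing format, the file names and the `since` cutoff are hypothetical; how a Global Cache would actually expose publication times was not decided at the meeting.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple

# Hypothetical WAF listing entry: (file name, publication time reported by the
# HTTP server). How a Global Cache would expose this is an assumption.
Entry = Tuple[str, datetime]


def files_to_fetch(listing: Iterable[Entry], already_downloaded: Set[str],
                   since: datetime) -> List[str]:
    """Return the files to download, newest first.

    Files already recorded in the client's own state are skipped, and the walk
    stops at the first file published before `since` (e.g. the time of the
    client's last successful sync), because everything after it is older still.
    """
    to_fetch = []
    for name, published in sorted(listing, key=lambda e: e[1], reverse=True):
        if published < since:
            break
        if name not in already_downloaded:
            to_fetch.append(name)
    return to_fetch


def at_hour(hour: int) -> datetime:
    """Example helper: a UTC time on the meeting date."""
    return datetime(2022, 3, 14, hour, 0, tzinfo=timezone.utc)


if __name__ == "__main__":
    listing = [("obs_0900.bufr4", at_hour(9)), ("obs_1200.bufr4", at_hour(12)),
               ("obs_1100.bufr4", at_hour(11)), ("obs_1000.bufr4", at_hour(10))]
    state = {"obs_1100.bufr4"}  # already downloaded before the outage
    print(files_to_fetch(listing, state, since=at_hour(10)))
    # ['obs_1200.bufr4', 'obs_1000.bufr4']
```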
Action items
Decisions
- we delegate development of their subtrees to communities of interest, and provide a first attempt or some general guidance
- we need to provide a framework to constrain how they develop their subtree
- need to make sure that the Channel isn't part of the classification hierarchy
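To illustrate the channel decision together with Kai's point in item 4: if the channel sits outside the classification hierarchy, the same classification path can appear under either channel, and a wild-carded subscription on the open channel starts matching once data moves there. This sketch assumes MQTT-style topic filters ('+' for exactly one level, '#' for all remaining levels), Hassan's suggested "public"/"limited" channel names, and a made-up topic path; none of these were agreed at the meeting.

```python
from typing import Tuple

# Hypothetical channel names (Hassan's suggestion, not an agreed standard).
CHANNELS = ("public", "limited")


def topic_matches(pattern: str, topic: str) -> bool:
    """Match a topic against an MQTT-style filter.

    '+' matches exactly one level; '#' matches all remaining levels (including
    none, so "a/#" also matches "a") and is only valid as the last level.
    """
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return i == len(p_levels) - 1  # reject '#' that is not last
        if i >= len(t_levels):
            return False  # pattern is longer than the topic
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)


def split_channel(topic: str) -> Tuple[str, str]:
    """Separate the channel (access control) from the classification path."""
    channel, _, classification = topic.partition("/")
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    return channel, classification


if __name__ == "__main__":
    subscription = "public/WIS/#"              # wild-carded open-data subscription
    before = "limited/WIS/xx/surface/synop"    # hypothetical topic, restricted data
    after = "public/WIS/xx/surface/synop"      # same data after moving to open
    print(topic_matches(subscription, before))  # False
    print(topic_matches(subscription, after))   # True
    # The classification path is unchanged; only the channel differs.
    print(split_channel(before)[1] == split_channel(after)[1])  # True
```

Because the classification path is identical under both channels, moving data between them changes only its access-control prefix, not its place in the discovery metadata hierarchy.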