2022-02-28 ET-W2AT Meeting

Date

Feb 28, 2022 14:30-16:30 UTC

Participants

ET-W2AT

  • @Jeremy Tandy (Unlicensed)

  • @Rémy Giraud

  • @Dana Ostrenga (Unlicensed)

  • @thorsten.buesselberg (Unlicensed) (absent)

  • @Kai Wirt (Unlicensed)

  • @Henning Weber (Unlicensed) (absent)

  • @Tom Kralidis (Unlicensed)

  • @peter.silva (Unlicensed)

  • @Kenji Tsunoda (Unlicensed) (absent)

  • @Li Xiang (Unlicensed)

  • @Baudouin Raoult (Unlicensed)

Other Experts

  • @Kari Sheets (Unlicensed)

WMO Secretariat

  • @Peiliang Shi (Unlicensed)

  • @Enrico Fucile

  • @HADDOUCH Hassan

  • @Timo Proescholdt

  • @David Berry

  • @Anna Milan

  • @Xiaoxia Chen

 

Goals

  • Metadata and Catalog (Presented by Tom)

Discussion topics

Item

Presenter

Notes

1 Metadata

Tom

Tom reported on WIS2 metadata and catalogue perspectives

  • Standards context

    • this isn't just for WMO [it should be broadly applicable]

    • we're building on lots of existing standards in OGC, W3C, IETF

    • these are applicable for the Web, and promote Web-based data sharing

  • Metadata records

    • WCMP2 will be based on OGC API - Records - not ISO 19115/19139

    • OGC API - Records extends OGC API - Features; covers both content model and access mechanism

    • metadata can be deployed as a static, crawlable catalogue, or via API machinery

    • core record model is GeoJSON dialect; leveraging Dublin Core, DCAT, CSW3; provided as JSON / HTML

    • allows any kind of resource to be catalogued; data, services, etc.

  • Catalogue

    • crawlable [static] files or searchable via API machinery (spatial/temporal subsetting, keyword search, broader/narrower search within topic hierarchies)

  • Workflows

    • how do the shared services (Global components) interact: WIS2 Node registers to [administrative] WIS2 Global Service

      [Tom presents how the Global Catalogue is populated]

  • TODOs:

    • definition of mandatory and optional clauses for WCMP2

    • development of topic hierarchy: required for both classification/categorization and how the pub/sub topics are organized; what's fixed, what's devolved to other communities etc.

    • develop KPI for assessing metadata granularity

    • KPIs for metadata quality

    • quality assessment of discovery metadata; driving continuous improvement of metadata and a better user experience

    • develop business process for continuous feedback and improvement of metadata in the catalogue, e.g. we can create a report, but so what? Who generates the report (role, responsibility)? What should centers be obliged to do in response to a report? Etc.

    • low barrier (Web-friendly) documentation - for all WIS2 outreach aspects!
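
A minimal sketch of the record model and catalogue search summarized in the notes above, assuming a GeoJSON Feature shape for the record. The identifier, the field names under "properties", the topic string and the example.org URL are hypothetical placeholders, not the WCMP2 schema (whose mandatory and optional clauses are still a TODO). The search function only imitates, over a static list of records, the spatial and keyword filtering that API machinery would offer.

```python
import json

# Hypothetical discovery metadata record, shaped as a GeoJSON Feature.
# Every name and value below is an illustrative assumption.
record = {
    "type": "Feature",
    "id": "urn:example:surface-obs",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-10.0, 35.0], [30.0, 35.0], [30.0, 70.0],
                         [-10.0, 70.0], [-10.0, 35.0]]],
    },
    "properties": {
        "title": "Surface weather observations (example)",
        "keywords": ["surface", "observations"],
        "topic": "example/observations/surface",
    },
    "links": [
        {"rel": "item", "type": "application/json",
         "href": "https://example.org/data/surface-obs"},
    ],
}


def bbox_of(feature):
    """Return (min_lon, min_lat, max_lon, max_lat) of a Polygon's outer ring."""
    ring = feature["geometry"]["coordinates"][0]
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox=None, keyword=None):
    """Filter records by bounding box and/or case-insensitive keyword."""
    hits = []
    for rec in records:
        if bbox is not None and not intersects(bbox_of(rec), bbox):
            continue
        keywords = [k.lower() for k in rec["properties"]["keywords"]]
        if keyword is not None and keyword.lower() not in keywords:
            continue
        hits.append(rec)
    return hits


# The same record can be written out as a static, crawlable JSON file.
static_file_body = json.dumps(record, indent=2)
```

Because the record is plain JSON, the same document can be served as a static file or returned by a search API, which is the dual deployment option noted above.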

2

Discussion

ALL

Remy > two questions

1. OGC API is a 2021/2022 standard … how can we be sure that this will stay and be relevant in a decade?

2. also need more explanation about the link between metadata and the message topic hierarchy

 Tom > OGC API is leveraging many simple building blocks; OGC API - Features is also a joint ISO standard; OGC API - Records will be an ISO standard too, so I think it will be stable. Topic hierarchy provides a useful way to classify data, and browse broader/narrower classifications. The association link needs to map to something that someone can subscribe to

 Jeremy > the building blocks used in OGC API are based on how the Web works - should be durable. Data updates via MQP are not only for real-time data

 Remy > we have a history of ISO standards not being long-lasting - so we'll see. Topic hierarchy example: if we have hydrological information, this has nothing to do with our MQ data delivery

Baudouin > association with a topic doesn't seem to be compulsory - but if it's there we need the match between the record and the MQ topic
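
A minimal sketch of the record-to-topic match raised above, assuming MQTT-style wildcards ("+" for one level, "#" for all remaining levels). The minutes do not fix the protocol or the topic hierarchy, so the wildcard rules and the example topic strings are assumptions, and MQTT's special handling of topics beginning with "$" is left out.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` is covered by the subscription `pattern`.

    Assumes MQTT-style wildcards: '+' matches exactly one level and
    '#' (last level only) matches the parent level and everything below it.
    """
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(pattern_levels):
        if level == "#":
            # '#' is only valid as the final level of a pattern.
            return index == len(pattern_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(pattern_levels) == len(topic_levels)


# Hypothetical topic recorded in a discovery metadata record.
record_topic = "example/hydrology/river-discharge"

# A "broader" subscription covers the record; a sibling branch does not.
covered_by_broader = topic_matches("example/hydrology/#", record_topic)
covered_by_sibling = topic_matches("example/observations/#", record_topic)
```

With this kind of check, a topic stored in a record can be verified against what subscribers can actually subscribe to, and a broader subscription picks up every narrower topic beneath it.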

 Tom > OGC specification development is being done in an iterative way - with implementation and testing along the way, not a 500-page ISO spec that's never been tested

Jeremy > timing implications for WIS2 Technical Regulations in INFCOM2 (Nov 2022)?

 Enrico > INFCOM2 will be on 24 Oct 2022, which means everything will need to be done by the summer (end of July). I propose we have an INF paper with the latest versions, if not final versions, with "experimental" status, then update the manuals afterwards, on the next cycle. We did this for the CF-NetCDF profiles

 Remy > think about having everything ready from 1-Jan-2024, so that people can start officially exchanging data from then. We need to accommodate "agile" updates of the Technical Regulations

Enrico > I propose to develop and agree a plan for updating the WIS Technical Regulations (Enrico, Remy, Jeremy)

Peter > MQ messages are really just micro metadata records that people can look at to decide if they want to download the data. The topic hierarchy needs to match. Spatial and temporal extent metadata needs to be included in the "micro metadata" - it can only go in two places: filename convention or message body. So far it is not in either (we don't have a file-naming convention agreed)

Jeremy > need to be able to navigate from the "micro metadata" in the message, back to the discovery metadata record

Tom > agreed

Jeremy > further discussion needed: should the "micro metadata" message include a reference to the original discovery metadata record [in the Global Catalogue]?
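
A minimal sketch of the navigation discussed above, from a "micro metadata" message back to its discovery metadata record. The message structure was not agreed at this meeting, so every field name here (topic, data_url, metadata_id, extent) is a hypothetical placeholder, and the catalogue is stood in for by a plain dictionary keyed by record identifier.

```python
import json


def build_notification(topic, data_url, metadata_id, extent=None):
    """Serialize a hypothetical "micro metadata" message as JSON.

    `metadata_id` is the assumed reference back to the discovery
    metadata record; `extent` is optional because the need for
    spatial/temporal extents in the message is still under discussion.
    """
    message = {
        "topic": topic,
        "data_url": data_url,
        "metadata_id": metadata_id,
    }
    if extent is not None:
        message["extent"] = extent
    return json.dumps(message)


def resolve_record(message_json, catalogue):
    """Look up the discovery record referenced by a message, or None."""
    message = json.loads(message_json)
    return catalogue.get(message["metadata_id"])


# Stand-in for the catalogue: record identifier -> record.
catalogue = {"urn:example:surface-obs": {"title": "Surface observations (example)"}}

notification = build_notification(
    topic="example/observations/surface",
    data_url="https://example.org/data/obs-0001",
    metadata_id="urn:example:surface-obs",
)
linked_record = resolve_record(notification, catalogue)
```

Carrying only an identifier keeps the message small; whether extents also belong in the message body is the open question recorded in these notes.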

 Baudouin > caution - what killed WIS1 was meeting requirements that didn't really exist. Do we really have use cases for having spatial extents in the "micro metadata"?

 Baudouin > on WCMP2 GitHub repository I'm still seeing GTS-style TTAAii headers - we need to avoid these because they're super-specialized. I suggest we use the names of cities / locations instead

 Tom > noted; the GitHub repository has some draft examples; will get these cleaned up

 Baudouin > "examples become the norm" - they're more powerful than the other docs!

 Enrico > I can provide assistance in identifying good examples to use - we need to steer away from old GTS memes

Enrico > the topic hierarchy needs to be addressed for GDPFS / NWP data; we don't have much insight on how to do this. NWP data is a very high priority (ref. facilitating access to NWP data in response to GBON observation provision). Maybe we need better engagement with ALL the activity areas, and to work with the WMCs to agree the hierarchy for NWP data. I propose to set up a Task Team with representatives from all NWP global centers to agree on the vocabularies and hierarchy

Remy > notes that Kari, as Chair of ET-W2PE, needs to be involved. We also need to structure our engagement activity: we will need each activity area to "take over" the high-level standards we're developing. Tom and Peter can't be on every Task Team, so let's define the _method_ to be used by each activity area, and provide help where they get stuck. We need to scale out here, to devolve the work

Peter > right now we don't yet agree among ourselves - so it's hard to provide recommendations for other teams. We need to develop the framework for activity areas to develop their own domain-specific standards for vocabulary and topic structure

 Enrico > many centers are already distributing data - I worry they won't change their file-names or directory structures.

 Remy > for engagement with Regional Associations and Activity Areas, there's a thin line between too early and too late. We need to learn from WIS1 mistakes and be able to explain to our colleagues from GDPFS and Satellite what the benefits of joining WIS2 are; migrating from GTS and WIS1 feels like too soon right now. Need to develop the WIS2 "sales pitch" and road test this with a couple of activity areas; GDPFS and Satellite and maybe another

 Anna > satellite and NWP are well regulated - perhaps a different domain? Hydrology?

 Dana > in my experience of working with the satellite and modelling communities, there's appetite to engage with this

Jeremy presents the metadata workflow to populate the Global Catalogue

 Enrico > general statement - this workflow to populate the Global Catalogue spreads the logic out over several Global components. This adds complexity; systems like this are hard to manage

 Jeremy > our previous discussions pushed for each Global component to have a well-defined, singular task; Global Broker is as close to a MQ broker as we can make it, Global Cache is a simple edge cache populated via MQ messages, Global Catalogue only provides the catalogue and search API. In our design, each component is simple. I agree that the logic is distributed between components, but we have resilience through redundancy - we can afford the failure of Global Broker or Global Cache instances and the Global Catalogue will still get populated

 Enrico > but there are data types that won't be cached; for example "Recommended Data" in the Unified Data Policy, this may have access control T&Cs

 Jeremy > yes - but we agreed that all the _metadata_ would be cached, irrespective of whether it describes a cached dataset
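
A minimal sketch of the "resilience through redundancy" point above, assuming several redundant broker instances each forward a copy of the same metadata record and the catalogue ingests idempotently by record identifier. The class and function names are hypothetical, and deduplication by identifier is an assumption rather than something stated in these notes.

```python
class GlobalCatalogue:
    """Toy catalogue that stores records by identifier."""

    def __init__(self):
        self.records = {}

    def ingest(self, record):
        """Upsert a record; return True only if the catalogue changed."""
        record_id = record["id"]
        if self.records.get(record_id) == record:
            return False  # duplicate copy from another broker: ignore
        self.records[record_id] = record
        return True


def publish(record, broker_is_up, catalogue):
    """Forward one record via each available broker instance.

    `broker_is_up` is a list of booleans, one per redundant broker.
    Returns how many forwarded copies actually changed the catalogue.
    """
    changes = 0
    for is_up in broker_is_up:
        if is_up and catalogue.ingest(dict(record)):
            changes += 1
    return changes


catalogue = GlobalCatalogue()
example_record = {"id": "urn:example:surface-obs", "title": "Example"}

# One of three brokers has failed; the record still arrives exactly once.
changes_with_failure = publish(example_record, [True, False, True], catalogue)
```

In this toy model the catalogue is populated as long as at least one broker instance is available, and duplicate copies arriving via the others are harmless.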

 

Action items

  • Set up Task Team for NWP metadata (Tom, Peter involved), Enrico to take the lead with Yuki

  • need to resolve where the "file-level" granularity metadata (location, time) needs to be encoded

  • need to define our file-naming convention

  • need to finalize the message structure

  • update examples in the WCMP2 GitHub repo

  • we need to develop the framework for activity areas to develop their own domain-specific standards for vocabulary and topic structure

  • develop the WIS2 "sales pitch" and road test this with a couple of activity areas; GDPFS and Satellite and maybe another

Decision

  1. WCMP2 based on OGC API - Records
  2. WIS2 [Global] Catalogue implements OGC API - Records API
  3. Set up Task Team for NWP metadata (Tom, Peter involved), Enrico to take the lead with Yuki
  4. Agree a structured method and framework for activity areas to develop their own domain-specific standards for vocabulary and topic structure
  5. Next weekly meeting to discuss message structure
