2022-03-28 ET-W2AT Meeting

 Date

Mar 28, 2022 13:45-15:30 UTC

 Participants

  • @Rémy Giraud

  • @Jeremy Tandy

  • @Tom Kralidis

  • @peter.silva

  • @Baudouin Raoult

  • @Kai Wirt

  • @thorsten.buesselberg

  • @Dana Ostrenga

Other Experts

  • @Kari Sheets

WMO Secretariat

  • @Enrico Fucile

  • @HADDOUCH Hassan

  • @Xiaoxia Chen

  • @Timo Proescholdt

  • @David Berry

  • @Anna Milan

Apologies

  • @Henning Weber

  • @Pablo Loyber

  • @Wang Peng

  • @Kenji Tsunoda

  • @sabai.fatima

 Goals

  • Discuss (i) topic tree, (ii) filenames, and (iii) browsable WAF end-points at Global Caches

 Discussion topics

Item

Presenter

Notes

Discussion on Topic tree

All

Jeremy > Topic tree: the country level identifies who is issuing the bulletin, not the geographic location. What's important in WIS is authoritative data, and there's one authority for, e.g., France: the PR. So the topic tree needs to respect the authority for countries and territories

Countries and Territories

Baudouin > La Reunion has its own 3-letter country code (REU)

Country Codes List - Nations Online Project

so, if you need to say FR to find data from Tahiti or La Reunion, then the user somehow needs to know this? You've failed the users!

 Peter > no - because you can wild-card the (high-level) country code or issuing centre, the issuing office is all about access control

 Remy > also, we can use metadata for discovery [etc.]

Peter > anything that's not in the filename or topic tree is invisible to users of WAF
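
Peter's point about wild-carding can be illustrated with MQTT-style topic filters, where `+` matches exactly one level and `#` matches all remaining levels. The following is a minimal sketch only: the topic layout (`wis2/<country>/<centre>/...`) and the centre names are hypothetical and were not agreed in this meeting.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none) and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only in the final position.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics: a user who does not know the issuing country
# wild-cards the country and centre levels.
any_country = "wis2/+/+/observations/#"
print(topic_matches(any_country, "wis2/fra/centre-a/observations/surface"))
print(topic_matches(any_country, "wis2/reu/centre-b/observations/surface"))
```

With such a filter the user does not need to know whether data for La Reunion is published under FR or REU, although anything not encoded in the topic still has to be found through discovery metadata.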

Kai > would be helpful to have a clear understanding of which requirements are mandatory, and which are optional. WAF could be an optional requirement in WIS2

 Remy > need to be clear on the use-cases that are supported by these requirements

 Tom > is the URL in the message an "actionable" URL?

 Jeremy > no - it's in two parts: baseURL plus relPath (or retPath). Previously we agreed that this was acceptable

 Remy > do we need to review this again? because now we're seeing that the link (URL) is most important - the links that point to data
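
For illustration, a client could assemble an actionable URL from the two parts roughly as follows. This is a sketch under assumptions: the field names `baseURL` and `relPath` come from the discussion above, but the message layout and the example values are hypothetical.

```python
from urllib.parse import urljoin


def build_download_url(base_url: str, rel_path: str) -> str:
    """Join the base URL and relative path from a notification message."""
    # urljoin replaces the last path segment unless the base ends with "/".
    if not base_url.endswith("/"):
        base_url += "/"
    # A leading "/" on the relative path would discard the base path.
    return urljoin(base_url, rel_path.lstrip("/"))


# Hypothetical message content.
message = {"baseURL": "https://example.org/data", "relPath": "obs/file.bufr"}
print(build_download_url(message["baseURL"], message["relPath"]))
```

The extra joining step is the cost of the two-part form; a single complete URL in the message would be directly actionable.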

Remy > can we add a feature asking brokers to replay messages from the last 3-hours? e.g. to bridge over a period of system failure

 Peter > this functionality doesn't exist out of the box - there's no built-in replay function

 Remy > but we already have [subscribe & re-publish] features in Global Brokers - we're building additional functionality there.

 Jeremy > geographic information in the message structure?

Remy > notes the split between "upper-level" (common) and "lower-level" in the community-defined part, but that would be up to them

 Peter / Tom > but then geospatial semantics would be community specific

Tom > but geospatial queries should be by API. We need to differentiate between discovery metadata and where someone digs into the data itself - e.g. via an API call

 Jeremy > does anyone object to putting geographic information in the message?

 Anna > geographic information was well implemented in WIS1 metadata; it was the conformance test with a 100% pass rate. So it's well understood, and relevant for all data types

 Baudouin > the bounding box in the message may not match the bounding box in the discovery metadata - e.g. if the observing platform is moving, or data is a smaller part of a larger dataset

 Timo > but won't this mean that you have duplicate metadata in multiple places (in the data, in the message etc.)? Are there maintenance issues?

 Jeremy > discovery metadata is different - so no duplication there, but there could be duplication between BUFR message and the MQP message. the BUFR file is kept; the MQ message is transient, so there's no maintenance issue

Enrico > we already have some duplication in metadata - between BUFR message and OSCAR station metadata

 Tom > in WIS2box, we automatically generate the geographic information from the [BUFR] data file

 Kai > question is where do you want to filter? When you subscribe, or when you receive message?

Kai > so if we have geographic information in the message, we can support client-side filtering

 Anna > asks about identifying files with GUIDs

 Peter > the problem with GUIDs is that you need to cross reference with metadata every time, before you choose to download. Here, we're trying to identify the minimal set of metadata useful for people to decide whether to download the file

 Tom > bounding box only? What about stations that are points.

Baudouin > N-S-E-W bbox, and start- and end-time.

Peter > re-use what STAC adopts

Tom > that's geojson, geojson geometry. 99% of what you see used will be bounding box, but all geojson parsers will support it, so we get that for free

 Peter > but STAC also has a bounding box as well as geometry

 Tom > because the bounding box is for discovery

 Baudouin > have a bounding box with zero area - N-S equal, W-E equal - it works! that would be enough to determine whether to download the file

Tom > don't know how software will deal with zero-area bounding boxes. We could profile the geojson to only allow point and bbox?

 Agree: support extent and point

Agree: we want the smallest possible "footprint" of the standard - so having everything as bounding box would be simpler

 ACTION (Tom): test how software copes with zero-area bbox
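
As a starting point for that test, client-side filtering on a bounding box, including Baudouin's zero-area case for a point, could look like the sketch below. The `BBox` type and the coordinates are illustrative assumptions, and the check ignores boxes that cross the antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def point_bbox(lon: float, lat: float) -> BBox:
    """Represent a point as a zero-area box (N-S equal, W-E equal)."""
    return BBox(lon, lat, lon, lat)


def intersects(a: BBox, b: BBox) -> bool:
    """True if the boxes overlap or touch; zero-area boxes are handled too."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical area of interest and station position.
area_of_interest = BBox(west=50.0, south=-25.0, east=60.0, north=-15.0)
station = point_bbox(lon=55.5, lat=-21.1)
print(intersects(area_of_interest, station))
```

Because the comparison uses inclusive bounds, a zero-area box behaves like a point-in-box test; whether third-party geospatial software treats such boxes the same way is what the action is meant to establish.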

 Remy > what else do we need to filter on? Time: extent or instant? Actually, to decide whether to download the data, you just need the start time instant (or valid time). You can get the duration of the data from the discovery metadata, or from the data itself. Data format / mime-type [agreed]

 Enrico > temporal durations might go in the topic tree?

 Peter > TT-Protocols pulled the temporal duration

 Remy > notes the one-to-one mapping between dataset and topic, so the topic tree wouldn't differentiate between 0, 6, 12, 18 forecast runtimes

 Agree: message structure includes start-time (valid-time) of the data, as an instant

 Action (Jeremy, Tom): create some examples for time stamps in the message

 Kai > agree for observations, but for forecasts, users might be interested only in the first 48-hours.

 Remy > and we don't know about hydrology at all, so let's not close off options

 Peter > the proposal to have just a time-instant leaves the extent implicit

 Agree: support time-extent, or time-instant

 ACTION (Tom): test how zero-length time-extents are supported by software
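
A time filter that accepts either a time-extent or a time-instant can treat the instant as a zero-length extent. The sketch below assumes timezone-aware UTC datetimes; the function name and the example times are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def overlaps(start: datetime, end: Optional[datetime],
             window_start: datetime, window_end: datetime) -> bool:
    """True if the data's time extent overlaps the user's window.

    A time-instant is passed with end=None and handled as a
    zero-length extent (start == end).
    """
    if end is None:
        end = start
    return start <= window_end and window_start <= end


utc = timezone.utc
window = (datetime(2022, 3, 28, 12, tzinfo=utc), datetime(2022, 3, 28, 18, tzinfo=utc))

# Observation valid at a single instant.
print(overlaps(datetime(2022, 3, 28, 13, 45, tzinfo=utc), None, *window))
# Forecast covering a 48-hour extent that started before the window.
print(overlaps(datetime(2022, 3, 28, 0, tzinfo=utc),
               datetime(2022, 3, 30, 0, tzinfo=utc), *window))
```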

 Remy > suggests that communities may decide to add community-relevant filtering like the geojson "properties" area that's unbounded

 Peter > the TT-Protocols team has thought about this. Additional fields can be added to the message as appropriate, noting the downside that not everyone can leverage the semantics

 >> the key thing here is providing information to decide whether to download the file

 Tom > this sounds a lot like Core and Extensions

 >> didn't get around to discuss support for browsing [files in] Web Accessible Folders. Continue discussion by correspondence [based on email exchange between Peter and Jeremy]

Jeremy > summary:

1) Global Cache is not browsable by design, but we can use HTTP HEAD to identify files according to their age (publication time); or HEAD on the directory to get a list of files

>> [Remy notes that a Global Cache _may_ not need to organise data in directories - they might just have a massive set of files with unique filenames (retPath)]

 2) we _may_ want to support browsing Web-accessible folders at originating centres, this might be achieved by embedding metadata in the filename, or using a "companion file" with metadata, such as STAC. We also need to discuss whether browseable WAFs at originating centres are mandatory or optional [I think optional], and then how we recommend people enable that [an originating centre may choose to use filenames with embedded metadata or STAC or something else - they just need to describe what they've done to make the data browsable, for example, with a note in the discovery metadata]. This might be defined for a community, with a community file-naming standard or use of STAC etc.

Also note that some originating centres may not expose data as files, choosing to only use an API - APIs are browseable in a different sense
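
Point 1) of the summary can be sketched with the Python standard library as below. It assumes the server returns a `Last-Modified` header and that this header reflects publication time, which is an assumption and not something a Global Cache is known to guarantee; the example header value is made up.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def head_headers(url: str, timeout: float = 10.0) -> dict:
    """Issue an HTTP HEAD request and return the response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def age_seconds(headers: Mapping[str, str], now: datetime) -> Optional[float]:
    """Age of a file in seconds, derived from its Last-Modified header."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    last_modified = lowered.get("last-modified")
    if last_modified is None:
        return None
    return (now - parsedate_to_datetime(last_modified)).total_seconds()


# Offline example with a made-up header; head_headers(url) would supply
# real headers for a file whose URL is already known.
example = {"Last-Modified": "Mon, 28 Mar 2022 13:45:00 GMT"}
print(age_seconds(example, datetime(2022, 3, 28, 14, 45, tzinfo=timezone.utc)))
```

Note that this only works for a file whose URL the client already knows, which is why it does not by itself make a cache browsable.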

 pending discussion -

  • deprecate sFTP, use HTTPS because it offers more flexibility in terms of headers and access control

  • Should a Global Cache (or Global Broker?) offer a replay service - e.g. an API that enables a user to request all messages that were published on a given topic for a specified interval/duration? The user then looks at the messages to determine which files they need. Key point is that we don't assume the need for Web Accessible Folders everywhere

 Message Queue Protocols (MQP) | Replay / catchup
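
To make the replay question concrete, the sketch below shows one way a service could retain recent messages and return those published on a topic within an interval. It is purely illustrative: the class, its retention policy and the in-memory storage are assumptions, not a feature of any broker discussed above, and it assumes messages arrive in publication order.

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Deque, List, Tuple


class ReplayBuffer:
    """Keep messages for a retention period and replay them on request."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        # (publication time, topic, payload), oldest first.
        self._messages: Deque[Tuple[datetime, str, str]] = deque()

    def publish(self, published: datetime, topic: str, payload: str) -> None:
        self._messages.append((published, topic, payload))
        # Drop messages older than the retention period.
        cutoff = published - self.retention
        while self._messages and self._messages[0][0] < cutoff:
            self._messages.popleft()

    def replay(self, topic: str, start: datetime, end: datetime) -> List[str]:
        """Return payloads published on `topic` within [start, end]."""
        return [payload for published, message_topic, payload in self._messages
                if message_topic == topic and start <= published <= end]


t0 = datetime(2022, 3, 28, 12, tzinfo=timezone.utc)
buffer = ReplayBuffer(retention=timedelta(hours=3))
buffer.publish(t0, "topic/a", "msg-1")
buffer.publish(t0 + timedelta(hours=1), "topic/a", "msg-2")
buffer.publish(t0 + timedelta(hours=4), "topic/a", "msg-3")
print(buffer.replay("topic/a", t0, t0 + timedelta(hours=4)))
```

A client recovering from an outage would request the interval it missed and then decide from the returned messages which files to download, without needing a browsable folder.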

 

 

 

 Action items

  • Next week: (1-hr) discussion with VerneMQ about Global Broker implementation; (1-hr) supporting browseable WAFs for originating centres [see above], further definition of the topic tree (common part)

  • Tom: test how zero-length time-extents are supported by software

  • Jeremy, Tom: create some examples for time stamps in the message

 Decisions