2022-03-28 ET-W2AT Meeting
Date
Mar 28, 2022 13:45-15:30 UTC
Participants
@Rémy Giraud
@Jeremy Tandy
@Tom Kralidis
@peter.silva
@Baudouin Raoult
@Kai Wirt
@thorsten.buesselberg
@Dana Ostrenga
Other Experts
@Kari Sheets
WMO Secretariat
@Enrico Fucile
@HADDOUCH Hassan
@Xiaoxia Chen
@Timo Proescholdt
@David Berry
@Anna Milan
Apologies
@Henning Weber
@Pablo Loyber
@Wang Peng
@Kenji Tsunoda
@sabai.fatima
Goals
Discuss (i) topic tree, (ii) filenames, and (iii) browsable WAF end-points at Global Caches
Discussion topics
Item | Presenter | Notes |
---|---|---|
Discussion on Topic tree | All | Jeremy > The topic tree is about who is issuing the bulletin, not the geographic location. What's important in WIS is authoritative data, and there's one authority for, e.g., France: the PR (Permanent Representative). So the topic tree needs to respect the authority of countries and territories (ref: Countries and Territories).
Baudouin > La Reunion has its own 3-letter country code (REU) (ref: Country Codes List - Nations Online Project). So, if you need to say FR to find data from Tahiti or La Reunion, then the user somehow needs to know this? You've failed the users!
Peter > No - because you can wild-card the (high-level) country code or issuing centre. The issuing office is all about access control.
Remy > Also, we can use metadata for discovery [etc.].
Peter > Anything that's not in the filename or topic tree is invisible to users of WAF.
Kai > It would be helpful to have a clear understanding of which requirements are mandatory and which are optional. WAF could be an optional requirement in WIS2.
Remy > We need to be clear on the use-cases that are supported by these requirements.
Tom > Is the URL in the message an "actionable" URL?
Jeremy > No - it's in two parts: baseURL plus relPath, or retPath. Previously we agreed that this was acceptable.
Remy > Do we need to review this again? Because now we're seeing that the link (URL) is most important - the links that point to data.
Remy > Can we add a feature asking brokers to replay messages from the last 3 hours? E.g. to bridge over a period of system failure.
Peter > This functionality doesn't exist out of the box - there's no built-in replay function.
Remy > But we already have [subscribe & re-publish] features in Global Brokers - we're building additional functionality there.
Jeremy > Geographic information in the message structure?
Remy > notes the split between "upper-level" (common) and "lower-level" in the community-defined part, but that would be up to them.
Peter / Tom > But then geospatial semantics would be community specific.
Tom > But geospatial queries should be by API. We need to differentiate between discovery metadata and where someone digs into the data itself - e.g. via an API call.
Jeremy > Does anyone object to putting geographic information in the message?
Anna > Geographic information was well implemented in WIS1 metadata; it was the conformance test that was passed 100% of the time, so it's well understood, and relevant for all data types.
Baudouin > The bounding box in the message may not match the bounding box in the discovery metadata - e.g. if the observing platform is moving, or the data is a smaller part of a larger dataset.
Timo > But won't this mean that you have duplicate metadata in multiple places (in the data, in the message etc.)? Are there maintenance issues?
Jeremy > Discovery metadata is different - so no duplication there, but there could be duplication between the BUFR message and the MQP message. The BUFR file is kept; the MQ message is transient, so there's no maintenance issue.
Enrico > We already have some duplication in metadata - between the BUFR message and OSCAR station metadata.
Tom > In WIS2box, we automatically generate the geographic information from the [BUFR] data file.
Kai > The question is where do you want to filter? When you subscribe, or when you receive the message?
Kai > So if we have geographic information in the message, we can support client-side filtering.
Anna > asks about identifying files with GUIDs.
Peter > The problem with GUIDs is that you need to cross-reference with metadata every time, before you choose to download. Here, we're trying to identify the minimal set of metadata useful for people to decide whether to download the file.
Tom > Bounding box only? What about stations that are points?
Baudouin > N-S-E-W bbox, and start- and end-time.
Peter > Re-use what STAC adopts.
Tom > That's GeoJSON, GeoJSON geometry. 99% of what you see used will be bounding boxes, but all GeoJSON parsers will support it, so we get that for free.
Peter > But STAC also has a bounding box as well as geometry.
Tom > Because the bounding box is for discovery.
Baudouin > Have a bounding box with zero area - N-S equal, W-E equal - it works! That would be enough to determine whether to download the file.
Tom > Don't know how software will deal with zero-area bounding boxes. We could profile the GeoJSON to only allow point and bbox?
Agree: support extent and point.
Agree: we want the smallest "footprint" of standard possible - so having everything as a bounding box would be simpler.
ACTION (Tom): test how software copes with zero-area bbox.
Remy > What else do we need to filter on?
- time: extent or instant? Actually - to decide if you're downloading the data - you just need the start time instant (or valid time). You can get the duration of the data from the discovery metadata, or from the data itself.
- data format / mime-type [agreed]
Enrico > Temporal durations might go in the topic tree?
Peter > TT-Protocols pulled the temporal duration.
Remy > notes the one-to-one mapping between dataset and topic, so the topic tree wouldn't differentiate between 0, 6, 12, 18 forecast runtimes.
Agree: message structure includes start-time (valid-time) of the data, as an instant.
ACTION (Jeremy, Tom): create some examples for time stamps in the message.
Kai > Agree for observations, but for forecasts, users might be interested only in the first 48 hours.
Remy > And we don't know about hydrology at all, so let's not close off options.
Peter > The proposal to have just a time-instant leaves the extent implicit.
Agree: support time-extent or time-instant.
ACTION (Tom): test how zero-length time-extents are supported by software.
Remy > suggests that communities may decide to add community-relevant filtering, like the GeoJSON "properties" area, which is unbounded.
Peter > The TT-Protocols team has thought about this. Additional fields can be added to the message as appropriate, noting the downside that not everyone can leverage the semantics.
>> The key thing here is providing information to decide whether to download the file.
Tom > This sounds a lot like Core and Extensions.
>> Didn't get around to discussing support for browsing [files in] Web Accessible Folders. Continue discussion by correspondence [based on email exchange between Peter and Jeremy].
Jeremy > summary:
1) Global Cache is not browsable by design, but we can use HTTP HEAD to identify files according to their age (publication time), or HEAD on the directory to get a list of files.
>> [Remy notes that a Global Cache _may_ not need to organise data in directories - they might just have a massive set of files with unique filenames (retPath)]
2) We _may_ want to support browsing Web-accessible folders at originating centres. This might be achieved by embedding metadata in the filename, or by using a "companion file" with metadata, such as STAC. We also need to discuss whether browsable WAFs at originating centres are mandatory or optional [I think optional], and then how we recommend people enable that [an originating centre may choose to use filenames with embedded metadata or STAC or something else - they just need to describe what they've done to make the data browsable, for example, with a note in the discovery metadata]. This might be defined for a community, with a community file-naming standard or use of STAC etc.
Also note that some originating centres may not expose data as files, choosing to only use an API - APIs are browseable in a different sense pending discussion -
|
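The wild-carding Peter describes, and the two-part link (baseURL plus relPath) Jeremy describes, could look roughly like the sketch below. It assumes MQTT-style wildcards (`+` for exactly one level, `#` for all remaining levels). The topic layout `demo/<country>/<centre>/<data type>`, the centre names, the example URL and both function names are invented for illustration and are not the agreed WIS2 topic tree or message format.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if an MQTT-style subscription pattern matches a topic."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; matches the rest.
            return i == len(pattern_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(pattern_levels) == len(topic_levels)


def actionable_url(base_url: str, rel_path: str) -> str:
    """Join the two-part link (baseURL plus relPath) into one download URL."""
    return base_url.rstrip("/") + "/" + rel_path.lstrip("/")


# Hypothetical topics: demo/<country>/<issuing centre>/<data type>
tahiti = "demo/fr/centre-tahiti/synop"
reunion = "demo/fr/centre-reunion/synop"

# A user who does not know the issuing centre wild-cards that level.
print(topic_matches("demo/fr/+/synop", tahiti))
# A user who does not know the country either wild-cards both levels.
print(topic_matches("demo/+/+/synop", reunion))
print(actionable_url("https://example.org/data/", "/2022/03/28/obs.bufr"))
```

With this kind of matching, a user does not need to know in advance that data from Tahiti or La Reunion is issued under FR, provided they know the lower levels of the topic.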
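The client-side filtering Kai mentions, using the agreed bounding box and start-time instant, could be sketched as follows. The message fields `bbox` and `start_time`, the coordinate order (west, south, east, north) and the function names are assumptions for illustration only; the actual message structure was still under discussion. A point is written as a zero-area bounding box, as Baudouin suggests, and the comparisons are inclusive so that such a box still matches. Boxes crossing the antimeridian are not handled.

```python
from datetime import datetime, timezone


def bbox_intersects(a, b):
    """Boxes are (west, south, east, north); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def should_download(message, area, not_before):
    """Decide from the message alone whether the file is worth downloading."""
    # Hypothetical field names; start_time is an ISO 8601 instant with offset.
    start = datetime.fromisoformat(message["start_time"])
    return bbox_intersects(message["bbox"], area) and start >= not_before


# Hypothetical notification for a single station (zero-area bbox).
message = {
    "bbox": (55.5, -21.1, 55.5, -21.1),
    "start_time": "2022-03-28T12:00:00+00:00",
}

indian_ocean = (50.0, -25.0, 60.0, -15.0)
europe = (-10.0, 35.0, 30.0, 60.0)
since = datetime(2022, 3, 28, 6, 0, tzinfo=timezone.utc)

print(should_download(message, indian_ocean, since))
print(should_download(message, europe, since))
```

This only shows that a zero-area box can work with inclusive comparisons; it does not answer the open ACTION on how existing software copes with zero-area bounding boxes or zero-length time-extents.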