2022-02-02 ET-W2AT Meeting

Date

Feb 2, 2022 13:00-15:00 UTC

Participants

ET-W2AT

@Jeremy Tandy
@Rémy Giraud
@Dana Ostrenga
@thorsten.buesselberg
@Kai Wirt
@Henning Weber
@Tom Kralidis
@peter.silva
@Ken Tsunoda
@Li Xiang
@Baudouin Raoult

WMO Secretariat

@Peiliang Shi
@Enrico Fucile
@HADDOUCH Hassan
@Timo Proescholdt
@Anna Milan
@Xiaoxia Chen

Goals

To discuss the MQP protocol and the list of core shared services, and to reach agreement on both

Discussion topics

Item	Presenter	Notes

1. Opening and Discussion on MQP Protocol	Jeremy, Rémy	Jeremy recalled the key decisions on shared services taken in the weekly meetings and highlighted the need to approve them in this meeting.
MQP Protocol
Rémy> Enrico created a page for the decision on protocols: Which MQP Protocol
Jeremy> We specify two protocols: MQTT 3.1 and MQTT 5.
Henning> Is MQTT only for small messages? Is there any size limitation in MQTT?
Rémy> No limitation in MQTT 5.
Baudouin> Are the two MQTT versions interoperable?
Tom> Peter, Enrico and I are working on the MQP topic structure. We expect to have a first draft in the coming months. It would be good to make it a Monday topic, starting with alignment of granularity between topics and metadata/datasets.
Rémy> Topic to be discussed at the Monday meetings.
Action: align the topic structure with the metadata structure for datasets; put topics and metadata on the agenda for discussion.
Kai> We need to consider headers in the filenames, and link the filenames to some of the metadata.
Rémy> The global file naming is for GTS; we need to review the Global File-naming Convention in the context of WIS2.
Tom> Discovery metadata, pub/sub, providing notification of new files.
Enrico> The WIS2 topic principles are under discussion.
Decisions:
  Data will always be made available via an MQ message that points to where the data is for download, with an optimisation to embed small data in the message [approved]
  MQTT 3.1 and MQTT 5 are adopted in WIS2.0
Pending discussions:
  Dataset granularity, metadata structure, topic structure and implications for the [GTS] File-naming Convention (to discuss circa 21-Feb)
  What size is "small" data that can be embedded in a message?
  Are there kinds of data that should (always) be embedded in a message (e.g. Tsunami warnings)?
2. Discussion on Shared Service	ALL	Shared Services approach
A description of shared services is available on https://wmo-teams.atlassian.net/wiki/spaces/WIS2/pages/306970677
Rémy> Previous meetings discussed the concept of shared services; this meeting aims to reach agreement on the concept and to discuss the list of core shared services.
The team agreed the concept of shared services in WIS2.0:
  Global Broker
  Global Discovery Catalogue
  Global Cache
Baudouin> "GC" is the abbreviation for both Global Catalogue and Global Cache.
Li Xiang> Suggest using Global Discovery Catalogue.
Baudouin> If it is the GISC republishing messages, then the GISC will use its own URLs, so it is the responsibility of the GISC to make sure the URLs resolve.
Rémy> We should avoid using the term GISC; we are talking about a shared service. Who will run the shared service is not yet decided. The main point is to provide a point of aggregation so that people can subscribe in one place, and we avoid NC/DCPC sources being "hammered" by 193 WMO Members and everyone else!
Tom> TT-WIS-Metadata is working on the draft WCMP 2.0 document, to be available in 2-3 weeks. OGC API Records serves as the baseline for the catalogue protocol and metadata standard.
Action: discuss the metadata protocol at next Monday's meeting.
Action: Tom to share the links for OGC API Records and WCMP 2.0.
Hassan> Test them out in the pilot projects and add the deployments of the "WIS2 node in a box". Involve TT-GISC from the beginning, and involve the ET and TT in developing the details of each component of the shared services to complete the architecture.
Summary:
Decision: shared services approach approved by the team (Global Cache, Global Broker and Global Discovery Catalogue)
Pending discussion:
  Technical details of how Global Brokers work / are implemented, including:
    how we ensure all Global Broker instances publish messages from all NC/DCPCs (e.g. "synchronisation")
    strategies for protecting brokers from overload (e.g. from too many connections/subscribers)
    strategies for ensuring prioritized delivery of urgent messages (e.g. how MQ topics are organized to ensure priority message topics don't get clogged)
  What are the remaining functions of a GISC (i.e. what is left over once shared services are used to deliver some functions)?
  What support do GISCs need to provide to NC/DCPC in their Area of Responsibility?
  Technical details of how Global Catalogues work, including metadata harvesting (or crawling).
  What data (if any) should be "cached" (i.e. copied and republished) at (some) GISCs for global low-latency resilient access? For example: real-time [weather] data of global interest, or all "Core" data (as per WMO Unified Data Policy)?
  See "Is the concept of Area of Responsibility (AoR) still relevant?" for a starter discussion.
3. Discussion on NC/DCPC connection with the shared service	ALL	How many instances are there for the shared service? Do you agree that NC/DCPC will connect to more than one instance of a shared service?
Jeremy> What about connectivity between shared service instances, and with NC/DCPC? Should an NC/DCPC connect to more than one instance of a shared service? What is the minimum number of connections between shared-service instances?
Rémy> We have not yet agreed how many instances of the shared services we will have, so let's pause on that, but we do know that NC/DCPC will want their data to be available through the shared services. We learned from WIS1 that we should allow NC/DCPC to publish messages / data at least twice to ensure that data / messages don't get lost, while avoiding tight coupling between an NC/DCPC and a GISC. NC/DCPC depend on "shared services", which will most likely be provided by GISCs.
Timo> We can make synchronisation issues go away if NC/DCPC publish to all.
Rémy> No cache synchronization. Brokers should see all the messages. The GC will just download data made available by brokers.
Henning> No hard limit on the number of instances of shared services.
Kai> Agrees that inter-shared-service communication is needed; this will mitigate system failures.
Jeremy> Politically, not everyone wants to "talk" directly to each other, so we need intermediaries.
Decision: no requirement for NC/DCPC to connect to ALL instances of a shared-service
Kai> Are there instances where an NC may not have the capability (or capacity) to publish to two instances?
Decision: NC/DCPC MUST connect to at least one instance of a shared-service, and SHOULD connect to two or more instances of a shared-service
Timo> If there are more than 3 GBs and an NC connects to at most 2, the question is how the notifications get to the other GBs. One obvious way is for GBs to re-publish notifications. This makes the system more complicated (and likely requires non-standard components in the GB). We need to avoid infinite loops.
Decision: NC/DCPC connects directly to the shared-service instance(s) - not via their GISC.
Jeremy> Inter-connectivity between shared service instances … All-2-All, fully meshed, G=3 etc.
Baudouin> Unidata IDD has been avoiding circular re-publication for years: https://www.unidata.ucar.edu/projects/idd/ldmfaq.html ; the topology is here: https://rtstats.unidata.ucar.edu/rtstats/
Henning> Don't make explicit restrictions about the number of instances, and be clear on the expectation that we will have a small number of high-quality instances. So we need to work out the process to select those that host a shared service, e.g. quality gates, performance.
Rémy> Not all shared services are equal. An NC connecting to multiple caches to get the data gets around the problem of poor-quality cache instances. The main concern relates to the Global Broker - this needs to be highly performant! We need to avoid "prestige" being the reason to offer a Global Broker. We have audits, but we know from experience that it is difficult to "kick out" underperforming GISCs.
Jeremy> Service performance will be publicly shared; "red blobs" on maps are a motivator for Members to improve performance (or at least to resolve the performance issues).
Hassan> ET-AC can use the Audit and Certification process.
Peiliang> Monitoring will be more effective than audits.
Jeremy> Audit is important.
Peter> Audit effectiveness is important.
Henning> Rather than taking political issues into account, use a technical solution for data usage.
Kai> Service registry (global control centre); the solution is to have the connections made automatically.
Peiliang> An algorithm to optimize the connectivity sounds great. What we need to consider is having a monitoring system in place to monitor daily performance. Then there will be a report at the end of the year indicating the state of the global infrastructure.
Kai> Data or service monitoring? The two should be distinguished.
Rémy> We need to define the various metrics using the same approach.
Timo> The use of standards and a service-oriented architecture makes it technically easy to replace one component by another. For example, an NC plugs into another GC and GB, whose addresses it obtains from a registry. Only components that are working (as per monitoring) are listed in the registry. Since we have standardized the MQP, the message schema and possibly the download schema, GBs and GCs are interchangeable.
Decisions approved:
  No requirement for NC/DCPC to connect to ALL instances of a shared-service
  NC/DCPC MUST connect to at least one instance of a shared-service, and SHOULD connect to two or more instances of a shared-service
  NC/DCPC connects directly to the shared-service instance(s) - not via their GISC
  No requirement for all-to-all fully meshed connection
  There must be more than one instance of each shared service
  Global Monitoring approved as a shared service
  There needs to be automated service monitoring of the WIS2 system from day 1
Pending discussions:
  "Anti-loop" logic to avoid transmission of duplicate messages (and data).
  Governance and process to allocate shared service tasks [to GISCs], noting that the Secretariat plans to talk to each GISC operator to determine their aspirations for provision of shared-service instances.
  What service performance criteria are needed for use in audit and how should these be evaluated?
  What we are going to monitor and how, e.g. a SaaS monitoring service.
  Functional requirements for the Global Monitoring shared service.
4. Discussion on Global Cache connectivity	ALL	Rémy> No need for Global Caches to be connected to each other. Synchronization is only for Global Brokers and not for the Global Cache.
Kai> We should not forget about the structure of the connections (not only the number). We need to avoid having two halves of the brokers which are only connected by one link.
Peiliang> We need to ensure that centres are connected at least with a "spanning tree".
Pending discussion: define the minimum number of connections ("G") for each type of shared service (broker, catalogue, cache, monitor)
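
The structural concern raised by Kai and Peiliang can be checked mechanically. The following is a minimal sketch, not something agreed in the meeting: it tests whether a set of broker instances is connected at all (i.e. a spanning tree exists) and lists any single link whose loss would split the network into two halves. The function names and the node labels used in the usage note are hypothetical, and the brute-force approach assumes only a small number of instances.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node via the links."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search from an arbitrary starting node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours[current] - seen:
            seen.add(other)
            queue.append(other)
    return seen == nodes


def single_points_of_failure(nodes, links):
    """Return the links whose removal would disconnect the network.

    Only meaningful when the network is connected to begin with.
    Each link is removed in turn and connectivity is re-tested.
    """
    links = list(links)
    return [
        link for i, link in enumerate(links)
        if not is_connected(nodes, links[:i] + links[i + 1:])
    ]
```

For example, with hypothetical instances A, B, C and D, where A, B and C form a ring and D hangs off C, `single_points_of_failure` reports only the C-D link; a four-node ring reports none.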

Action items

to put the alignment of granularity between topics and metadata on the agenda to discuss at next Monday’s meeting

to further discuss the number of instances needed for GB/GCat/GCache

Decision

Data will always be made available via an MQ message that points to where the data is for download, with an optimisation to embed small data in the message [approved]
MQTT 3.1 and MQTT 5 are adopted in WIS2.0
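
As a rough illustration of the first decision, a notification could always carry a link to the data and, when the payload is small, the data itself. This is a sketch under stated assumptions only: the field names (`url`, `size`, `sha256`, `content`) and the 4096-byte threshold are hypothetical, since the meeting left the definition of "small" and the message schema open.

```python
import base64
import hashlib
import json

# Hypothetical threshold: the meeting did not decide what "small" means.
MAX_EMBED_BYTES = 4096


def build_notification(data: bytes, download_url: str) -> str:
    """Build a JSON notification that always links to the data.

    Small payloads are additionally embedded (base64-encoded) so that
    subscribers can skip the download. All field names are illustrative.
    """
    message = {
        "url": download_url,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    if len(data) <= MAX_EMBED_BYTES:
        message["content"] = base64.b64encode(data).decode("ascii")
    return json.dumps(message)
```

The resulting string could be published as the payload of an MQTT message; the link is present in every case, so a subscriber that ignores the embedded content still works.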

Shared services approach approved by the team (Global Cache, Global Broker and Global Discovery Catalogue)
No requirement for NC/DCPC to connect to ALL instances of a shared-service

NC/DCPC MUST connect to at least one instance of a shared service, and SHOULD connect to two or more instances of a shared-service
NC/DCPC connects directly to the shared-service instance(s) - not via their GISC
No requirement for all-to-all fully meshed connection
There must be more than one instance of each shared service
Global Monitoring approved as shared service
There needs to be automated service monitoring of the WIS2 system from day 1
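
The MUST/SHOULD connection rule above could feed into the automated monitoring. A minimal sketch, assuming the monitor already knows how many instances of one shared-service type an NC/DCPC is connected to; the function name and the three status labels are hypothetical.

```python
def check_connections(instances_connected: int) -> str:
    """Classify an NC/DCPC's connections to one type of shared service."""
    if instances_connected < 1:
        return "non-compliant"  # violates the MUST (at least one instance)
    if instances_connected < 2:
        return "compliant"      # meets the MUST but not the SHOULD
    return "recommended"        # meets the SHOULD (two or more instances)
```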
