2022-02-02 ET-W2AT Meeting

Date

Feb 2, 2022 13:00-15:00 UTC

Participants

ET-W2AT

  • @Jeremy Tandy

  • @Rémy Giraud

  • @Dana Ostrenga

  • @thorsten.buesselberg

  • @Kai Wirt

  • @Henning Weber

  • @Tom Kralidis

  • @peter.silva

  • @Ken Tsunoda

  • @Li Xiang

  • @Baudouin Raoult

WMO Secretariat

  • @Peiliang Shi

  • @Enrico Fucile

  • @HADDOUCH Hassan

  • @Timo Proescholdt

  • @Anna Milan

  • @Xiaoxia Chen

Goals

  1. To discuss the MQP protocol and the list of core shared services, and to reach agreement on them

Discussion topics

Item

Presenter

Notes

  1. Opening and Discussion on MQP Protocol

Jeremy

Rémy

Jeremy recalled the key decisions on shared services taken in the weekly meetings and highlighted the need to approve them in this meeting.

MQP Protocol

Rémy> Enrico has created a page for the decision on protocols:

Which MQP Protocol

Jeremy> We specify two protocols: MQTT 3.1 and MQTT 5.

Henning> Is MQTT only for small messages? Is there any size limitation in MQTT?

Rémy> No limitation in MQTT 5.

Baudouin> Are the two MQTT versions interoperable?

Tom> Peter, Enrico and I are working on the MQP topic structure; we expect to have a first draft next month. It would be good to put it on the agenda of a Monday meeting, starting with the alignment of granularity between topics and metadata/datasets.

Rémy> The topic will be discussed at the Monday meetings.

Action: put the alignment of the topic structure with the metadata structure for datasets on the agenda for discussion.

Kai> We need to consider headers in the filenames, and link the filenames to some of the metadata.

Rémy> The global file naming is for the GTS; we need to review the Global File Naming Convention in the context of WIS2.

Tom> discovery metadata, pub/sub, providing notification of new files

Enrico> The WIS2 topic principles are under discussion.

Decisions:

  • Data will always be made available via an MQ message that points to where the data is for download, with an optimisation to embed small data in the message [approved]

  • WIS2.0 adopts MQTT 3.1 and MQTT 5

Pending discussions:

  • dataset granularity, metadata structure, topic structure and implications for [GTS] File-naming Convention (to discuss circa 21-Feb)

  • What size counts as "small" data that can be embedded in a message? Are there kinds of data that should (always) be embedded in a message (e.g. tsunami warnings)?
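The first decision (always point to the data, optionally embed small data) can be illustrated with a minimal sketch. It assumes a JSON notification with hypothetical field names (`data_id`, `links`, `content`) and a hypothetical 4096-byte threshold for "small" data, because neither the message schema nor the threshold was agreed at this meeting.

```python
import base64
import json
from typing import Optional

# HYPOTHETICAL threshold: what counts as "small" data was left open at this meeting.
SMALL_DATA_MAX_BYTES = 4096


def build_notification(data_id: str, download_url: str, payload: bytes) -> dict:
    """Build a notification that always points to where the data can be downloaded,
    and additionally embeds the payload when it is "small".

    All field names here are illustrative assumptions, not an agreed WIS2 schema.
    """
    message = {
        "data_id": data_id,
        # The link is always present, whether or not the data is embedded.
        "links": [{"rel": "canonical", "href": download_url}],
        "size": len(payload),
    }
    if len(payload) <= SMALL_DATA_MAX_BYTES:
        # Optimisation: inline small payloads so subscribers can skip the download.
        message["content"] = {
            "encoding": "base64",
            "value": base64.b64encode(payload).decode("ascii"),
        }
    return message


def embedded_data(message: dict) -> Optional[bytes]:
    """Return the embedded payload, or None if the subscriber must follow the link."""
    content = message.get("content")
    if content is None:
        return None
    return base64.b64decode(content["value"])


if __name__ == "__main__":
    warning = build_notification(
        "example-warning", "https://example.org/data/warning.txt", b"TSUNAMI WARNING"
    )
    print(json.dumps(warning, indent=2))
```

Keeping the link even when the data is embedded means a subscriber that ignores `content` still works, so embedding stays a pure optimisation.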

2. Discussion on Shared Service

ALL

Shared Services approach

A description of shared services is available on https://wmo-teams.atlassian.net/wiki/spaces/WIS2/pages/306970677

Rémy> The previous meetings discussed the concept of shared services; this meeting aims to reach agreement on the concept and to discuss the list of core shared services.

The team agreed on the concept of shared services in WIS2.0:

  • Global Broker

  • Global Discovery Catalogue

  • Global Cache

Baudouin> "GC" is an ambiguous abbreviation: it could stand for both Global Catalogue and Global Cache.

Li Xiang> Suggests using "Global Discovery Catalogue".

Baudouin> if it's the GISC republishing messages then the GISC will use their URLs, so it’s the responsibility of the GISC to make sure the URLs resolve

Rémy> We should avoid using the term GISC; we're talking about a shared service, and who will run the shared service is not yet decided. The main point is to provide a point of aggregation so that people can subscribe in one place, and we avoid NC/DCPC sources being "hammered" by 193 WMO members and everyone else!

Tom> TT-WIS-Metadata is working on the draft document of WCMP 2.0, to be available in 2-3 weeks. OGC API - Records serves as the baseline for the catalogue protocol and metadata standard.

Action: to discuss the metadata protocol at next Monday’s meeting

Action: Tom to share the links for OGC API - Records and WCMP 2.0

Hassan> Suggests testing them out in the pilot projects and adding deployments of the WIS2node in a box; involving the TT-GISC from the beginning; and involving the ET and TTs in developing the details of each shared-service component to complete the architecture.

Summary:

Decision: shared services approach approved by the team (Global Cache, Global Broker and Global Discovery Catalogue)

Pending discussion:

  • technical details of how Global Brokers work / are implemented - including:

  1. how we ensure all Global Broker instances publish messages from all NC/DCPCs (e.g. "synchronisation")

  2. strategies for protecting brokers from overload (e.g. from too many connections/subscribers)

  3. strategies for ensuring prioritized delivery of urgent messages (e.g. how MQ topics are organized to ensure priority message topics don't get clogged).

  4. What are the remaining functions of a GISC (i.e. what is left over once shared-services are used to deliver some functions)?

  5. What support do GISCs need to provide to NC/DCPC in their Area of Responsibility?

  • Technical details of how Global Catalogues work - including metadata harvesting (or crawling).

  • What data (if any) should be “cached” (i.e. copied and republished) at (some) GISCs for global low-latency resilient access? For example: real-time [weather] data of global interest, or all "Core" data (as per WMO Unified Data Policy)?

  • See the page "Is the concept of Area of Responsibility (AoR) still relevant?" for a starter discussion.

3. Discussion on NC/DCPC connection with the shared service

ALL

How many instances are there for the shared service?

Do you agree that NC/DCPC will connect to more than one instance of a shared service?

Jeremy > What about connectivity between shared service instances, and with NC/DCPC?

Should a NC/DCPC connect to more than one instance of a shared service?

What is the minimum number of connections between shared-service instances?

Rémy> We've not yet agreed how many instances of the shared services we'll have, so let's pause on that; but we do know that NC/DCPCs will want their data to be available through the shared services. We learned from WIS1 that we should allow NC/DCPCs to publish messages/data at least twice to ensure that data/messages don't get lost, while avoiding the tight coupling between an NC/DCPC and a GISC. NC/DCPCs depend on "shared services", which will most likely be provided by GISCs.

Timo> We can make synchronisation issues go away if NC/DCPCs publish to all instances.

Rémy> No cache synchronisation. Brokers should see all the messages. The Global Cache will just download the data made available via the brokers.

(Henning) No hard limit on the number of instances of shared services.

Kai> Agrees that communication between shared-service instances is needed; this will mitigate system failures.

Jeremy> Politically, not everyone wants to "talk" directly to each other, so we need intermediaries.

Decision: no requirement for NC/DCPC to connect to ALL instances of a shared-service

Kai> Are there cases where an NC may not have the capability (or capacity) to publish to two instances?

Decision: NC/DCPC MUST connect to at least one instance of a shared-service, and SHOULD connect to two or more instances of a shared-service

Timo> If there are more than 3 GBs and an NC connects to at most 2, the question is how the notifications get to the other GBs.

One obvious way is for GBs to re-publish each other's notifications. This makes the system more complicated (and likely requires non-standard components in the GB), and we need to avoid infinite loops.

Decision: NC/DCPC connects directly to the shared-service instance(s) - not via their GISC.

Jeremy> Interconnectivity between shared-service instances … all-to-all (fully meshed), G=3, etc.

Baudouin> Unidata IDD has been avoiding circular re-publication for years (see https://www.unidata.ucar.edu/projects/idd/ldmfaq.html); the topology is here: https://rtstats.unidata.ucar.edu/rtstats/
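The "anti-loop" concern raised by Timo and Baudouin can be illustrated with one common technique: each broker remembers the identifiers of messages it has already handled and republishes a message to its peers only the first time it sees it. This is a minimal sketch, not a reproduction of how IDD/LDM actually does it. It assumes that a message is identified by a SHA-256 checksum of its content, and it uses hypothetical broker names (`GB-A`, `GB-B`, `GB-C`) in a simulated fully meshed topology.

```python
import hashlib
from collections import OrderedDict, deque
from typing import Dict, List


class SeenCache:
    """Bounded memory of message identifiers already handled (least recently seen evicted first)."""

    def __init__(self, capacity: int = 100_000) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def check_and_add(self, msg_id: str) -> bool:
        """Return True if msg_id is new (and remember it), False if it is a duplicate."""
        if msg_id in self._seen:
            self._seen.move_to_end(msg_id)
            return False
        self._seen[msg_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the least recently seen entry
        return True


def message_id(payload: bytes) -> str:
    # Assumption: a message is identified by a checksum of its content.
    return hashlib.sha256(payload).hexdigest()


def propagate(peers: Dict[str, List[str]], origin: str, payload: bytes) -> Dict[str, int]:
    """Flood one notification from `origin` through a broker mesh that may contain loops.

    Each broker republishes to its peers only the first time it sees a message, so
    propagation stops even though the topology is cyclic. Returns how many times each
    broker passed the message to its local subscribers.
    """
    caches = {name: SeenCache() for name in peers}
    delivered = {name: 0 for name in peers}
    msg_id = message_id(payload)
    queue = deque([origin])
    while queue:
        broker = queue.popleft()
        if not caches[broker].check_and_add(msg_id):
            continue  # duplicate: drop it and do not republish
        delivered[broker] += 1
        queue.extend(peers[broker])
    return delivered


if __name__ == "__main__":
    mesh = {"GB-A": ["GB-B", "GB-C"], "GB-B": ["GB-A", "GB-C"], "GB-C": ["GB-A", "GB-B"]}
    print(propagate(mesh, "GB-A", b"example notification"))
```

In the fully meshed example, every broker delivers the notification exactly once even though each one also receives copies back from its peers. The bounded cache is a design choice: memory stays constant, at the cost that a duplicate arriving after its identifier has been evicted would be republished again.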

Henning> Don't make explicit restrictions on the number of instances, but be clear on the expectation that we will have a small number of high-quality instances. So we need to work out the process for selecting those that host a shared service, e.g. quality gates, performance.

Rémy> Not all shared services are equal. An NC connecting to multiple caches to get the data gets around the problem of poor-quality cache instances. The main concern relates to the Global Broker, which needs to be highly performant!

We need to avoid "prestige" being the reason to offer a Global Broker. We have audits, but we know from experience that it's difficult to "kick out" underperforming GISCs.

Jeremy> Service performance will be shared publicly; "red blobs" on maps are a motivator for Members to improve performance (or at least to resolve performance issues).

Hassan> ET-AC can use the Audit and Certification process

Peiliang> Monitoring will be more effective than audits.

Jeremy> Audit is important.

Peter> The effectiveness of audits is important.

Henning> Rather than taking political issues into account, we should use technical solutions for data usage.

Kai> A service registry (a "global control centre"); the solution is to have the connections established automatically.

Peiliang> An algorithm to optimise the connectivity sounds great. What we need to consider is having a monitoring system in place to monitor daily performance. There would then be a report at the end of the year indicating the state of the global infrastructure.

Kai> Data monitoring or service monitoring? These should be distinguished.

Rémy> We need to define various metrics using the same approach.

Timo> The use of standards and a service-oriented architecture makes it technically easy to replace one component with another. For example, an NC plugs into another GC and GB, whose addresses it obtains from a registry. Only components that are working (as per monitoring) are listed in the registry. Since we have standardized the MQP, the message schema and possibly the download schema, GBs and GCs are interchangeable.
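Timo's registry idea, combined with the decision that an NC/DCPC MUST connect to at least one instance and SHOULD connect to two or more, can be sketched as a simple selection step. This is a minimal sketch under stated assumptions: the `RegistryEntry` structure, the service labels and the `healthy` flag (standing in for the status reported by Global Monitoring) are all hypothetical, as no registry format was discussed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegistryEntry:
    service: str   # hypothetical labels, e.g. "global-broker", "global-cache"
    url: str
    healthy: bool  # status as reported by (hypothetical) global monitoring


def select_instances(registry: List[RegistryEntry], service: str, preferred: int = 2) -> List[str]:
    """Pick up to `preferred` healthy instances of one shared service.

    Raises if no healthy instance is listed (an NC/DCPC MUST connect to at least one);
    returns fewer than `preferred` if that is all that is available (the SHOULD case).
    """
    candidates = [e.url for e in registry if e.service == service and e.healthy]
    if not candidates:
        raise LookupError(f"no healthy {service} instance listed in the registry")
    return candidates[:preferred]


if __name__ == "__main__":
    registry = [
        RegistryEntry("global-broker", "mqtts://gb1.example.org", True),
        RegistryEntry("global-broker", "mqtts://gb2.example.org", False),
        RegistryEntry("global-broker", "mqtts://gb3.example.org", True),
        RegistryEntry("global-cache", "https://gc1.example.org", True),
    ]
    print(select_instances(registry, "global-broker"))
    print(select_instances(registry, "global-cache"))
```

Because the selection depends only on what the registry lists, an instance that monitoring marks as unhealthy is simply skipped, which is the interchangeability Timo describes.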

Decisions approved:

  • No requirement for NC/DCPC to connect to ALL instances of a shared-service

  • NC/DCPC MUST connect to at least one instance of a shared-service, and SHOULD connect to two or more instances of a shared-service

  • NC/DCPC connects directly to the shared-service instance(s) - not via their GISC.

  • No requirement for all-to-all fully meshed connection

  • There must be more than one instance of each shared service

  • Global Monitoring [approved as shared service]

  • there needs to be automated service monitoring of the WIS2 system from day 1

Pending discussions:

  • "anti-loop" logic to avoid transmission of duplicate messages (and data).

  • governance and process to allocate shared service tasks [to GISCs] - noting that Secretariat plan to talk to each GISC operator to determine their aspirations for provision of shared-service instances.

  • What service performance criteria are needed for use in audit and how should these be evaluated?

  • What we're going to monitor and how; e.g. SaaS monitoring service

  • Functional requirements for the Global Monitoring shared service

4. Discussion on Global Cache connectivity

ALL

(Rémy) No need for Global caches to be connected to each other. Synchronization is only for global brokers and not for global cache.

(Kai) We should not forget about the structure of the connections (not only the number). We need to avoid having two halves of the brokers that are connected by only one link.

(Peiliang) need to ensure that centres are connected at least with a "spanning tree"
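Kai's and Peiliang's points can be checked mechanically on a proposed topology. Peiliang's minimum ("at least a spanning tree") means the network is connected. Kai's concern ("two halves connected by one link") means the network contains a *bridge*, a single link whose failure splits it. The sketch below uses Tarjan's bridge-finding algorithm (a depth-first search with low-link values) to report both properties. The instance names in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def check_topology(links: List[Link]) -> Tuple[bool, List[Link]]:
    """Return (is_connected, bridges) for an undirected network of service instances.

    A bridge is a single link whose failure would split the network into two halves.
    Uses Tarjan's bridge-finding algorithm (depth-first search with low-link values).
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    disc: Dict[str, int] = {}  # discovery time of each node in the DFS
    low: Dict[str, int] = {}   # earliest discovery time reachable via back edges
    bridges: List[Link] = []
    timer = 0

    def dfs(node: str, parent: Optional[str]) -> None:
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered instance another way.
                low[node] = min(low[node], disc[nxt])
            else:
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:
                    # No alternative path around (node, nxt): it is a single point of failure.
                    bridges.append(tuple(sorted((node, nxt))))

    components = 0
    for node in list(graph):
        if node not in disc:
            components += 1
            dfs(node, None)
    return components == 1, sorted(bridges)
```

For example, two fully meshed groups `A-B-C` and `D-E-F` joined only by `C-D` are connected, but `C-D` is reported as a bridge. Adding a second link such as `B-E` removes it. Note that a bare spanning tree satisfies Peiliang's minimum while making *every* link a bridge, which is exactly the situation Kai warns about.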

Pending discussion: Define the minimum number of connections ("G") for each type of shared service (broker, catalogue, cache, monitor)

Action items

to put the alignment of granularity between topics and metadata on the agenda to discuss at next Monday’s meeting
to further discuss how many instances of GB/GCat/GCache are needed

Decision

  1. Data will always be made available via an MQ message that points to where the data is for download, with an optimisation to embed small data in the message [approved]
  2. WIS2.0 adopts MQTT 3.1 and MQTT 5
  3. Shared services approach approved by the team (Global Cache, Global Broker and Global Discovery Catalogue)
  4. No requirement for NC/DCPC to connect to ALL instances of a shared-service
  5. NC/DCPC MUST connect to at least one instance of a shared service, and SHOULD connect to two or more instances of a shared-service
  6. NC/DCPC connects directly to the shared-service instance(s) - not via their GISC
  7. No requirement for all-to-all fully meshed connection
  8. There must be more than one instance of each shared service
  9. Global Monitoring approved as shared service
  10. There needs to be automated service monitoring of the WIS2 system from day 1