Date
13:00-15:00 UTC
👥 Invited Participants
Jeremy TANDY (ET-W2IT Chair)
Rémy GIRAUD (SC-IMT Chair)
Hyumin EOM (KMA)
Masato FUJIMOTO (JMA)
Kari Sheets (NOAA)
Steve Olson (NOAA)
Max Marno (Synoptic)
José Mauro (INMET)
Tom Kralidis (ECCC)
Kai Wirt-Thorsten (DWD)
Elena Arenskotter (DWD)
Saad Mohammed Almajnooni
Majed Mahjoub (NCM)
Chems eddine ELGARRAI (DGM)
Lei XUE (CMA)
Wenjing GU (CMA)
Xinqiang HAN (CMA)
WMO Secretariat
Enrico Fucile
Hassan Haddouch
Xiaoxia Chen
David Inglis Berry
Anna Milan
Timo Proescholdt
Apologies
Ping GUO (CMA)
Yoritsugi YUGE (JMA)
Meeting Note
Assessment of metrics implementation in Global Services
Agreement on consistent metrics behavior in Global Services
validation of discovery metadata at Global Discovery Catalogue; implementation of Global Broker “discard”
WIS2 Global Services metrics
Jeremy introduced the commentary, which is based on analysis performed by the WMO Secretariat (Maaike) of metrics from a 1-hour period on 20-Feb-2025.
Key outcomes from discussion: agree what the correct reporting is for each metric
Connected: 0 = not connected, 1 = connected; report 0 if you cannot connect (null / no data if you’ve never tried to connect)
Numbers: _after_ de-duplication, including errors or not?
This is needed to provide baseline information to assess whether Global Service behaviour is consistent.
Metrics can be used to understand WIS2 performance in a “stepwise” fashion - i.e., start with “connection”, then look at the numbers of messages/data-objects.
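The reporting rule discussed for the connection metric can be sketched as follows. This is a minimal illustration only, not the implementation used by any Global Service; the function name connected_flag and its two arguments are hypothetical.

```python
from typing import Optional


def connected_flag(attempted: bool, connected: bool) -> Optional[int]:
    """Return the value to report for a connection metric.

    Hypothetical helper illustrating the rule discussed in the meeting:
    - None (null / no data) if a connection has never been attempted
    - 0 if a connection was attempted but is not established
    - 1 if the connection is established
    """
    if not attempted:
        # Never tried to connect: report nothing rather than 0.
        return None
    return 1 if connected else 0
```

Keeping "never tried" (null) distinct from "tried and failed" (0) is what allows a missing value to be read as "no attempt" rather than as an outage.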
Global Monitor:
Looking at the metrics available from ma-meteomaroc-global-monitor there are gaps in the metrics from 18h-20h UTC, 19-Feb-2025.
This was due to an outage of the Global Monitor, which happens …
But: we’re not currently monitoring the Global Monitors, so we cannot easily identify whether “gaps” are due to an upstream Global Service not reporting data or to the GM being offline and not scraping the metrics every 15 seconds.
Recommendation: we monitor the Global Monitors
… should be able to see this from wmo_wis2_gb_connected_flag - but neither ma-meteomaroc-global-monitor nor cn-cma-global-monitor appear in the metrics for Global Brokers (below).
… do we need a dashboard indicating “Global Services available now” (red/green traffic light) etc. - based on connection to HTTP or MQTT end-points
(Rémy) Regarding the Global Monitors: since China and Morocco have confirmed that they each have a broker, Rémy proposed establishing a sensor centre to monitor the Global Monitors' connectivity. He also asked China and Morocco to share their broker endpoints with the WMO Secretariat.
Action: GM (Morocco) and GM (China) to share the GM broker endpoint with the Secretariat.
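As an illustration of how gaps like the one on 19-Feb-2025 could be located in a scraped series, here is a minimal sketch. It assumes sample timestamps in seconds and the 15-second scrape interval mentioned above; the function name find_gaps and the tolerance factor are hypothetical. On its own it does not tell whether the gap came from the upstream Global Service or from the Global Monitor.

```python
from typing import List, Tuple


def find_gaps(timestamps: List[float], interval: float = 15.0,
              tolerance: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are further
    apart than tolerance * interval seconds.

    The tolerance factor is an assumption to avoid flagging a single
    slightly late scrape as a gap.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > tolerance * interval:
            gaps.append((prev, curr))
    return gaps
```

For example, find_gaps([0, 15, 30, 7230, 7245]) reports one gap between 30 s and 7230 s (two hours without samples).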
Global Broker metrics:
gb_metrics_analysis.xlsx
Nomenclature:
BR-GB = br-inmet-global-broker
CN-GB = cn-cma-global-broker
FR-GB = fr-meteofrance-global-broker
US-GB = us-noaa-global-broker
Sheet #1: wmo_wis2_gb_connected_flag
Values are the “highest” value recorded during the hour (i.e., if “1” was recorded at any point during the hour, the value is “1”)
First challenge: getting consistent metrics on _connections_
Important so that we can trigger warnings/alerts from Global Monitors - if multiple GBs cannot connect to a WIS2 Node (or Global Service) this increases the likelihood that the centre is offline rather than there being a point-to-point connection issue.
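The two ideas above (taking the highest value in the hour, and treating agreement between several GBs as stronger evidence) can be sketched as below. This is an illustrative sketch only: the function names, the classification labels and the threshold of two agreeing GBs are assumptions, not agreed alerting rules.

```python
from typing import Dict, List, Optional


def hourly_flag(samples: List[int]) -> Optional[int]:
    """Highest value recorded during the hour; None if nothing was reported."""
    return max(samples) if samples else None


def classify_centre(flags_by_gb: Dict[str, Optional[int]]) -> str:
    """Rough classification of a centre from the flag reported by each GB."""
    reported = [f for f in flags_by_gb.values() if f is not None]
    if not reported:
        return "no data"
    if all(f == 1 for f in reported):
        return "connected"
    if all(f == 0 for f in reported):
        # Several GBs agreeing on 0 is stronger evidence than a single report.
        if len(reported) >= 2:
            return "likely offline"
        return "single report of no connection"
    # Some GBs connect and others do not.
    return "possible point-to-point issue"
```

A GB that reports no value at all is left out of the vote here, which is why missing metrics (as in the findings below) weaken the conclusion that can be drawn.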
[lines 6, 8, 92] US-GB does not report a metric for WIS2 Nodes (au-bom, bf-anam, za-weathersa) that are known to be offline - all other GBs report “0” (not connected)
Action: Steve to check and follow up on this.
[lines 10, 15, etc.] CN-GB does not appear to report metrics on other Global Services
Action: Lei to follow up on this.
[lines 17, 18, 23, 48, 62] CN-GB cannot connect to WIS2 Nodes from Chile (cl-meteochile), Cameroon (cm-meteocameroon), Cuba (cu-insmet), Italy (it-meteoam), Morocco (ma-marocmeteo) - confirm?
Action: Lei to check and follow up on this issue
Action: the Secretariat to resend the IP addresses of GB (China); GB (China) to check the username and password for these WIS2 Nodes, e.g. Morocco.
[line 35] Only BR-GB appears to be subscribing to FR-GB (specifically, only BR-GB is reporting being connected) - more connections expected (albeit CN-GB probably not reporting its connection)
Rémy reported that the Global Broker in Brazil uses different credentials for incoming and outgoing connections, and he suspects this is the source of the problem
Action: Lei to check and follow up on this connectivity issue
[line 36] US-GB is using the wrong centre-id for FR-GB (which probably also explains why it looks like there are so few connections to FR-GB - see above)
Action: Steve to check and follow up on this
[lines 41, 47] Only FR-GB connected to WIS2 Nodes hk-hko-swic and ir-irimo - recommended to have at least 2 subscriptions
Action: Secretariat to verify the correct IP address of GB (Brazil) and share it with the Hong Kong colleague (hk-hko-swic), and to check access control with the Iran colleague
[lines 46, 47, 48] US-GB is not reporting connection to EUMETSAT, Iran, and Italy - is this because US-GB never attempts to establish a connection?
Action: Steve to check. Secretariat to share the Global Broker (US) credentials, and Italy to verify their whitelist for the Global Broker (US)
[line 53] US-GB reports connection to ke-kmd which isn’t in the WIS2 Registry - what’s happening here?
Action: Steve to check
[line 59] Only US-GB reports connection to WIS2 Node from Kazakhstan (kz-kazhydromet) - confirm?
Rémy confirmed that FR-GB and BR-GB have connectivity with Kazakhstan and that the downtime was limited to a short period (one day)
[lines 75, 76] Confusion on centre-id for Sint Maarten (sx-met or sx-metservice)
Action: Secretariat to check with Sint Maarten
Sheet #2: wmo_wis2_gb_msg_received_total
Values are the _increase_ in numbers of messages received during the hour
Second challenge: diagnosing whether each GB is handling (roughly) the same number of messages
Important so that we can see if messages are getting lost, which would mean that subscribers to such a GB would be under-served
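For reference, the “increase” of a counter-type metric over a window can be computed roughly as below. This is a simplified sketch and not necessarily how the spreadsheet values were derived; the function name counter_increase is hypothetical, and treating a drop in value as a counter reset is an assumption.

```python
from typing import List


def counter_increase(samples: List[float]) -> float:
    """Increase of a monotonically increasing counter over a window.

    A drop between consecutive samples is treated as a counter reset
    (e.g., a service restart), and counting resumes from the new value.
    """
    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter restarted from zero: everything up to curr is new.
            increase += curr
    return increase
```

Comparing this hourly increase per WIS2 Node across the four GBs is what shows whether they handle roughly the same number of messages.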
[summary] FR-GB and BR-GB report roughly similar numbers (they’re running the same software!), CN-GB reports roughly 20x more messages, 3000-3500x more messages
Action: Steve and Marc to follow up on this
[line 8] CN-GB reports receiving 25k messages from Burkina Faso while that WIS2 Node is known to be offline (and CN-GB reports no connection)
Action: Lei to follow up on this
Decision: no need to run the performance tests. Instead, identify 3 or 4 centres as a reference to monitor whether all 4 GBs behave consistently. Each GB will test the metrics.
A discussion was raised regarding how to report on the functioning of Global Services and the required retention of metrics. For GB (France), Rémy informed the meeting that the retention period is two weeks of archived metrics. For GM China and GM Morocco, the retention period is also two weeks.
Chems noted that the retention for GM Morocco can be extended to one month.
(Hassan) to report on the functioning of the Global Services, we can consider using the Jira system to provide statistics of GS performance.
(Rémy) we need to collectively define KPIs of global services performance to be presented at INFCOM-4.
Action: Rémy, Jeremy, and Hassan to discuss the retention of the metrics in Geneva this week on the sidelines of the Gateways meeting
Sheet #3: wmo_wis2_gb_msg_no_metadata_tot
(Similar to sheet #2)
[lines 14, 17, 24, 25, etc.] CN-GB is reporting no metadata when the other GBs are not - is the metadata validation only enabled for CN-GB?
[line 82] CN-GB reports no missing metadata from UK, BR-GB and FR-GB report errors, US-GB doesn’t report
Global Cache metrics
gc_metrics_analysis.xlsx sent by email from Jeremy
Nomenclature:
CN-GC = cn-cma-global-cache
UK/USA-GC = data-metoffice-noaa-global-cache
DE-GC = de-dwd-global-cache
JP-GC = jp-jma-global-cache
KR-GC = kr-kma-global-cache
UK/USA-GC is still reporting metrics for the 100+ test nodes used in the Global Services testing - these are excluded from the analysis for clarity. But - these “old” centre-ids must be removed.
Update: Max fixed it.
Sheet #1: wmo_wis2_gb_connected_flag
[lines 3, 4] JP-GC reports no metrics for Antigua or ai-metservice (?) even though a small number of messages are sent
[line 5] DE-GC reports no metric for Argentina even though messages are being sent according to GB metrics; from looking at Grafana DE-GC appears to have gaps in provision of metrics
[lines 17, 28, etc.] DE-GC and JP-GC often don’t report a metric (no value available, e.g., Cameroon, Guinea) whereas other GCs are able to connect; GB metrics indicate that Cameroon and Guinea are connected but (excepting US-GB) not sending any messages. How are CN-GC, UK/USA-GC and KR-GC reporting “connected” when there probably wasn’t anything to download?
[line 21] UK/USA-GC can’t connect to Cyprus - confirm?
Note: “last-download” timestamp - if nothing has ever been downloaded for a given data server, the value will be null (not reported)
… Generally, metrics are only set once something has been tried. They are (mostly?) not initialised. CN-GC appears to initialise _download_total to zero even when there’s been no connection. Confirm?
Decision: we need to agree on consistent metrics, given that the DWD and UK/USA implementations currently differ.
Action: Jeremy, Rémy, Kai and Max to have a meeting this week to discuss the GC metrics.
Sheet #2: wmo_wis2_gc_download_total
[summary] Generally, CN-GC, UK/USA-GC and KR-GC tend to agree on numbers; DE-GC and JP-GC appear different.
[line 15] ca-eccc-msc-global-discovery-catalogue published ~30 cacheable data objects in the hour - what’s being published? (Metadata tarballs are once per day)
[line 20] GCs are not reporting a metric for cn-cma-global-discovery-catalogue - suggesting there has never been a download from the CN-GDC; is CN-GDC providing daily tarballs of the metadata records?
Action: pause for discussion
Sheet #3: wmo_wis2_gc_download_errors_total
[line 14] GC downloads from ca-eccc-msc: CN-GC and UK/USA-GC report ~200 errors, KR-GC reports 13k errors, DE-GC reports 172k errors, JP-GC reports 550k errors … what is happening?
[line 24] CN-GC, UK/USA-GC and KR-GC all report broadly consistent total-download from de-dwd-gts-to-wis2 (~500k messages), CN-GC and UK/USA-GC report ~300 errors, but KR-GC reports 147k errors … does the _total_downloads include _error_downloads?
Action: pause for discussion
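The open question of whether the download total includes failed downloads changes how the numbers should be read. A minimal sketch, with a hypothetical function name and flag, showing the error ratio under both interpretations:

```python
def error_ratio(total: int, errors: int, total_includes_errors: bool) -> float:
    """Fraction of download attempts that failed.

    total_includes_errors is an assumption the caller must supply, since
    the notes leave open whether the total counts failed downloads.
    """
    attempts = total if total_includes_errors else total + errors
    return errors / attempts if attempts else 0.0
```

Using the approximate KR-GC figures above (~500k downloads, 147k errors), the ratio is about 29% if errors are included in the total and about 23% if they are not.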
Global Discovery Catalogue metrics
gdc_metrics_analysis.xlsx
Global Discovery Catalogues (GDC) also appear to suffer from inconsistent metrics implementation.
But GDCs are not part of the data-exchange operations (mostly, the metrics are about metadata quality)
… so recommend that we prioritise getting metrics for GB and GC consistent first.
Validation of discovery metadata at Global Discovery Catalogue; implementation of Global Broker “discard”
Tom presented the GitHub issue (#119) on finding the best time to make properties.metadata_id required.
(Rémy) Currently, this metadata property is not mandatory. During the ET-WISOP kick-off meeting, we announced that metadata validation would be enabled by 1st September 2025. He highlighted that if we are going to change the regulations, the change needs to be presented to INFCOM in 2026 and then to EC for approval, which means it would take effect in 2027, which is too long. He proposed another approach, using the channel in the metadata record.
(Kai) We should enforce the metadata in WIS2 exchange. The issue is that if there's an error in a GDC, it could disrupt data exchange.
(Jeremy) the first part, because GB will need to look through GDC, if a metadata record is broken (or if the metadata fails validation by a GDC), the channel will be parsed down by GB. Then the data will be missing.
(Rémy) We cannot use the metadata ID starting from 1st September 2025. There's a risk that issues with one GDC could impact data sharing. To improve reliability, we need to be more flexible rather than too strict.
Rémy proposed combining the truth from all GDCs and creating a reference based on that on GitHub
Actions
GM (Morocco) and GM (China) to share the GM broker endpoint with the Secretariat.
Kari and Steve to follow up to ensure more consistent metrics reporting from GB (US)
All to go through the list shared in the email to achieve consistent metrics
Next meeting
10 March