Archive/Cache/Disposal

Note: the term "Disposal" is preferred because it is more generic, and less "confronting" to NMHSs, than the term "Destroy".

1. Quality Assurance/Control
A quality control process should be defined for all data and information being archived, to remove errors and to highlight and address deficiencies.
Data/information integrity checks, and checks that expected information has in fact been received, should be in place, with prompt remedial action taken when information is missing or corrupted.
The quality control process should be documented and versioned, and the documentation retained in line with the NMHS Retention Policy.
The results of quality control must be documented.
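The requirements above can be sketched in code. The following is a minimal illustration, not a prescribed implementation: it assumes records arrive as Python dicts with illustrative field names ("station", "timestamp", "value") and an assumed valid range, and it produces a timestamped report documenting the outcome of each check, as the policy requires.

```python
"""Sketch of an automated quality-control pass that documents its results.
Field names, the valid range, and the report structure are illustrative
assumptions, not part of the guidance."""
from datetime import datetime, timezone

def quality_control(records, valid_range=(-90.0, 60.0)):
    """Check each record and document the outcome of quality control."""
    report = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "checked": 0,
        "passed": 0,
        "failures": [],  # documented deficiencies, for remedial action
    }
    for rec in records:
        report["checked"] += 1
        # Check that the information expected is in fact received.
        missing = [f for f in ("station", "timestamp", "value") if f not in rec]
        if missing:
            report["failures"].append(
                {"record": rec, "reason": f"missing fields: {missing}"})
            continue
        # Simple plausibility check against an assumed valid range.
        lo, hi = valid_range
        if not (lo <= rec["value"] <= hi):
            report["failures"].append(
                {"record": rec, "reason": "value out of range"})
            continue
        report["passed"] += 1
    return report
```

The report itself would then be versioned and retained in line with the NMHS Retention Policy.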
2. Backup strategy
Elements of Strategy
Have a backup strategy for all data and information types with backups at regular, prescribed intervals.
At least one working copy and one backup copy must be kept. For high-value and irreproducible information, regular offsite backup is additionally required.
A data synchronization policy between the working copy and the backup copy is required (synchronized / desynchronized / synchronization frequency).
A rapid recovery process is required to ensure that information can be restored, and that corrupted data/information can be rebuilt, in a timely fashion and in line with operational priorities.
Details of the recovery procedure should be clearly and concisely described in easily accessible documentation, and regular testing of the procedures should be carried out.
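The "working copy plus verified backup copy" rule can be sketched as follows. This is an illustrative sketch only, assuming file-based storage on a local path; real backups would typically go to separate media or an offsite location, and the choice of SHA-256 for the integrity check is an assumption.

```python
"""Sketch of creating a backup copy and verifying its integrity before
trusting it. Paths and the hash algorithm are illustrative assumptions."""
import hashlib
import shutil
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Compute a checksum of a file in chunks, to support large archives."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_with_verification(working: Path, backup_dir: Path) -> Path:
    """Copy the working file to the backup location and verify integrity."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / working.name
    shutil.copy2(working, dest)  # preserves timestamps (lifecycle metadata)
    # Integrity check: the backup must match the working copy exactly.
    if sha256sum(dest) != sha256sum(working):
        dest.unlink()
        raise IOError(f"backup of {working} failed verification")
    return dest
```

Restoring from the backup and comparing checksums in the other direction gives a simple, regularly testable recovery procedure.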
3. Archival and retention
The purpose of archiving data is to be able to preserve and reuse the original data for future generations.
Planning
A data/information management plan is required, specifying among other things: storage requirements, frequency of update, security, restrictions, etc.
Responsibilities and accountabilities for each different data or information type must be clearly established, including who is the appropriate contact point.
Inventories and metadata should be maintained and kept up to date, and be archived.
Special attention should be paid to the type of storage chosen: robust hardware is required for long-term storage (e.g., magnetic tapes are preferable to optical storage).
Cloud storage may be a good alternative to in-house storage. Considerations include, among others: the cost of archival and extraction, and whether the data/information are to be shared.
Reading the archive
Recommendations must be established for data/information read times: rules must be put in place for the caching of read data (quantity and duration).
Data considered "popular", i.e., in high demand, should be archived on faster storage (permanent cache), even if this involves double archiving.
A review of these "popular" data should be carried out regularly.
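A regular review of "popular" data could work along the following lines. This is an illustrative sketch: the access counts, the promotion threshold, and the promote/demote decision are all assumptions standing in for whatever review rules a Member adopts.

```python
"""Sketch of a periodic cache review: datasets whose access count over the
review window meets a threshold are promoted to (or kept on) faster storage;
the rest are demoted to standard archive storage. Counts and the threshold
are illustrative assumptions."""

def review_cache(access_counts, threshold=100):
    """Split dataset identifiers into cache-promotion and demotion lists."""
    promote = sorted(k for k, n in access_counts.items() if n >= threshold)
    demote = sorted(k for k, n in access_counts.items() if n < threshold)
    return promote, demote
```

Running such a review on a schedule keeps the permanent cache aligned with actual demand rather than a one-off assessment.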
Retention and Disposal
Retention policy should be set by Member Countries based on national legislation and business needs. For public data the retention policy should be made publicly available.
Members should provide a technical solution that provides the level of resilience appropriate to the retention policy.
The retention policy needs to consider whether the data/information can be reproduced or not. In general, irreproducible observations will be retained in perpetuity.
Members are responsible for determining the retention policy and, for information and datasets that are to be permanently preserved, for setting up the proper preservation mechanisms.
Procedures and documentation are required to ensure that where data or information is to be disposed of, approved and verifiable processes are followed.
An appropriate decommissioning process is required once data have been migrated/disposed of.
Where there is a need for long-term preservation of data, Members may want to consider offsite backup twinning/mirroring with other centres or use of cloud solutions. Members remain responsible for the preservation of data as outlined in their retention policy even where a third party is tasked to implement the policy.
Documentation
Inventories: All archived information must be recorded in an inventory that is regularly maintained, accessible, and provides clear advice on how to access the information.
Details of the data lifecycle must be maintained, such as reception date, validity period, etc., and details of any replications, along with provenance information (see below).
When a dataset or information set is deleted, metadata should be retained and publicly accessible that specify the prior existence of the set, and describe the circumstances and reasons for the deletion.
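The retained metadata for a deleted set is sometimes called a "tombstone" record. A minimal sketch follows; the field names and the notion of a named approving body are illustrative assumptions, not prescribed fields.

```python
"""Sketch of a 'tombstone' metadata record retained after a dataset is
deleted, recording its prior existence and the circumstances and reasons
for the deletion. All field names are illustrative assumptions."""
from datetime import datetime, timezone

def make_tombstone(dataset_id, title, reason, approved_by):
    """Build the publicly retainable record of a disposed dataset."""
    return {
        "dataset_id": dataset_id,
        "title": title,
        "status": "deleted",
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "reason": reason,            # circumstances and reasons for deletion
        "approved_by": approved_by,  # supports a verifiable disposal process
    }
```

Keeping such records in the same inventory as live holdings makes the prior existence of the set discoverable alongside current data.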
4. Technology migration
A technology migration plan must be developed to ensure that the data and information remain accessible, readable and recoverable.
Where a media migration needs to be performed, the process should be well defined and have checks and verifications in place.
In migrating data and information, a snapshot of the latest version should be taken.
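The checks and verifications around a media migration can be sketched as follows: build a checksum manifest of the source (a snapshot of the latest version), copy, then verify that every file arrived intact. This is an illustrative sketch assuming file-based media; real migrations between tape generations or to cloud storage would follow the same pattern with different transfer tooling.

```python
"""Sketch of a verified media migration: snapshot the source as a checksum
manifest, copy everything, then verify the destination against the snapshot.
Paths and the hash choice are illustrative assumptions."""
import hashlib
import shutil
from pathlib import Path

def manifest(root: Path) -> dict:
    """Map each file's relative path to its SHA-256 checksum."""
    out = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            out[str(p.relative_to(root))] = hashlib.sha256(
                p.read_bytes()).hexdigest()
    return out

def migrate(src: Path, dst: Path) -> dict:
    """Copy src to dst and verify; return the snapshot manifest."""
    before = manifest(src)                    # snapshot of the latest version
    shutil.copytree(src, dst, dirs_exist_ok=True)
    after = manifest(dst)
    if after != before:                       # verification check
        raise RuntimeError("migration verification failed")
    return before
```

Retaining the manifest alongside the migrated data also supports the decommissioning step: the old media can be released only once verification has passed.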
5. Security
Cyber Security
Data/information must be protected from unauthorised modification, which requires clear processes and authorisations as part of the governance process.
Sensitive data requires higher levels of access control. Multi-key authentication and encryption may be required.
Virus protection software (where appropriate) must be kept up to date.
In general, public access should not be allowed to an NMHS database or information system. Where information is to be "pulled", the information requested should be accessed through a firewall or replicated version of the information.
Physical Security
Physical security of the information (both storage and archiving) includes protection from fire, water, earthquake and physical intrusion.
If the information is in hard-copy form, it must be managed in accordance with best-practice storage techniques for physical records.
6. Provenance and Versioning
Different versions of the data/information must be clearly identified;
Maintain an audit trail describing the processing history, quality procedures, etc., that change the data/information;
Clearly specify the original and authoritative versions of the data or information set;
For high-profile or closely-scrutinised products, full traceability is required, i.e., it should be possible to link products to the version of the data or information used to generate the product;
Metadata and associated artifacts (e.g., source code, algorithms, …) need to be versioned along with the data/information.
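The audit-trail and traceability points above can be sketched as an append-only processing history that links each new version to the versions used to generate it. The record structure is an illustrative assumption, not a prescribed format.

```python
"""Sketch of an append-only audit trail for provenance: each entry records a
version, the processing or quality step that produced it, and the input
versions it was generated from, so products can be traced back to their
source data. Structure and field names are illustrative assumptions."""
from datetime import datetime, timezone

def record_step(trail, version, step, inputs):
    """Append one processing step to the audit trail and return the trail."""
    trail.append({
        "version": version,        # version identifier of the output
        "step": step,              # processing / quality procedure applied
        "inputs": list(inputs),    # versions used to generate this output
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return trail
```

Walking the "inputs" links backwards from any product gives the full traceability chain down to the original, authoritative version.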