The first phase of the data lifecycle is the creation of data/information (D/I). The creation process includes the following steps:
Pre-processing and Documentation of Collected Data: To ensure correctness, completeness and normativity, pre-processing should be conducted before collected data goes into a repository or a data management system (CDMS). Pre-processing methods include data format checks, filename conversion, file merging/splitting and so on. The details of the pre-processing procedure should be documented.
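As a minimal sketch of such a pre-processing step, assuming a CSV-based workflow, the Python fragment below runs a format check, converts the filename to an assumed archive naming convention, and records what was done next to the data. All file names, extensions and conventions here are invented for illustration.

```python
import json
import pathlib
from datetime import datetime, timezone

def preprocess(raw_file: pathlib.Path, archive_dir: pathlib.Path) -> pathlib.Path:
    """Check, rename and document one incoming file before ingestion."""
    steps = []

    # Data format check: verify the expected extension and that the file
    # is non-empty; a real check would also validate the content.
    if raw_file.suffix.lower() != ".csv" or raw_file.stat().st_size == 0:
        raise ValueError(f"{raw_file} failed the format check")
    steps.append("format check passed")

    # Filename conversion to an assumed archive naming convention:
    # <original-stem>_<UTC date>.csv
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    target = archive_dir / f"{raw_file.stem}_{stamp}.csv"
    raw_file.rename(target)
    steps.append(f"renamed to {target.name}")

    # Document the details of the pre-processing procedure next to the data.
    target.with_suffix(".json").write_text(json.dumps(steps, indent=2))
    return target
```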
Documentation of Associated Information: All information associated with the product/dataset should be well documented, and the documentation should be persistently available. The information to be documented includes: the data source, how the product was produced and processed, what it contains, its ownership, its retention policy and retention status, how it is accessed, who can use it and how to use it, and what application areas it is intended to support.
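As an illustration (not a mandated schema), this associated information could be captured as a simple machine-readable record stored alongside the dataset; every field value below is an invented example.

```python
import json

dataset_info = {
    "data_source": "station network X (invented example)",
    "production": "raw observations processed with algorithm v1.3",
    "contents": "hourly surface temperature, 2010-2020",
    "ownership": "National Meteorological Service (example)",
    "retention_policy": "retain indefinitely; review every 5 years",
    "retention_status": "active",
    "access_method": "HTTPS download from the institutional archive",
    "authorised_users": "registered researchers",
    "usage_notes": "cite the dataset identifier in publications",
    "intended_applications": ["climate monitoring", "model validation"],
}

# Persist the record so the documentation remains persistently available.
with open("dataset_info.json", "w") as f:
    json.dump(dataset_info, f, indent=2)
```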
Version Control of the Product and Its Software Code: To ensure traceability and reproducibility, the producer should document, and keep under version control, the procedure by which the product was produced and processed. If a product (such as a satellite data time series) requires multiple processing steps, documentation of each step must be kept and made available. The software code used to create a product should be stored in a suitable code library, and a version control technique should be used to keep a clear record of how the code was created, developed and changed over time. A revision history is necessary to track changes; each entry should contain a unique version number, when the change was made, who made it, what was changed and the purpose of the change. Good version control practice should allow people to identify the latest final version easily and to revert to an earlier version whenever needed.
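The shape of such a revision history can be sketched as follows. The entries, names and version numbers are hypothetical, and in practice this record would typically live in the version control system itself (for example as commit history and release tags).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Revision:
    """One entry in a product's revision history."""
    version: str    # unique version number, e.g. "1.1.0"
    date: datetime  # when the change was made
    author: str     # who made it
    change: str     # what was changed
    purpose: str    # why it was changed

history = [
    Revision("1.0.0", datetime(2023, 1, 10, tzinfo=timezone.utc),
             "A. Producer", "initial release", "first public version"),
    Revision("1.1.0", datetime(2023, 6, 2, tzinfo=timezone.utc),
             "A. Producer", "updated calibration coefficients",
             "correct a known sensor bias"),
]

# The latest final version should be easy to identify.
latest = max(history, key=lambda r: tuple(map(int, r.version.split("."))))
```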
Quality Assurance: Ensure there is always a mechanism for quality assurance, and document the quality-related procedures and the definitions of the quality measures used. The terms quality assurance and quality control should be defined and used consistently.
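A documented quality measure might look like the sketch below; both the measure (the fraction of values passing a plausible-range check) and the temperature range are illustrative assumptions, not mandated thresholds.

```python
def fraction_within_range(values, lower=-80.0, upper=60.0):
    """Quality measure (example): fraction of observations inside the
    documented plausible range, here surface air temperature in deg C."""
    if not values:
        return 0.0
    valid = [v for v in values if lower <= v <= upper]
    return len(valid) / len(values)

# Example: two of the three observations pass the range check.
print(fraction_within_range([12.5, -95.0, 30.1]))  # 0.666...
```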
Discovery Metadata Generation: Ensure a process exists to define discovery metadata, which describes what a dataset contains, who is responsible for it and how it can be accessed. Any change to the algorithm, the software code or the input data should trigger this process to generate new discovery metadata.
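One way to implement this trigger, sketched here under the assumption that provenance can be fingerprinted with a hash, is to regenerate the metadata whenever the algorithm identifier, code version or input files change; the function and field names are invented for illustration.

```python
import hashlib
import json
import pathlib

def provenance_fingerprint(algorithm_id, code_version, input_files):
    """Hash everything whose change should trigger new discovery metadata."""
    h = hashlib.sha256()
    h.update(algorithm_id.encode())
    h.update(code_version.encode())
    for f in sorted(input_files):
        h.update(pathlib.Path(f).read_bytes())
    return h.hexdigest()

def maybe_regenerate(metadata_file, fingerprint, build_metadata):
    """Regenerate discovery metadata only when the fingerprint has changed."""
    path = pathlib.Path(metadata_file)
    if path.exists():
        current = json.loads(path.read_text())
        if current.get("fingerprint") == fingerprint:
            return  # algorithm, code and inputs are unchanged
    record = build_metadata()  # caller supplies contents, contact, access info
    record["fingerprint"] = fingerprint
    path.write_text(json.dumps(record, indent=2))
```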
Data Storage: Once D/I has been created within the organisation, it needs to be stored and protected.
Data Security Assurance: To ensure privacy, confidentiality and appropriate access, the relevant access authorities and use constraints should be clearly stated. For data coming from external sources or subject to shared intellectual property rights, an additional agreement defining rights, obligations and responsibilities is important.
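Clearly stated access constraints can also be made checkable. The sketch below assumes a simple role-based model; the dataset identifier, roles and constraint text are all invented for illustration.

```python
ACCESS_POLICY = {
    "dataset-123": {
        "allowed_roles": {"internal", "partner"},
        "use_constraints": "research use only; no redistribution",
    },
}

def may_access(user_roles, dataset_id):
    """Deny by default; grant access only when a stated policy allows it."""
    policy = ACCESS_POLICY.get(dataset_id)
    return policy is not None and bool(set(user_roles) & policy["allowed_roles"])

print(may_access({"partner"}, "dataset-123"))  # True
print(may_access({"public"}, "dataset-123"))   # False
```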
Policy Setting: Policies should be set to provide an authoritative statement of the principles for information management. The policies should cover the following issues (a minimal machine-readable sketch follows the list):
Data is accessible only to authorised users and is not disclosed to unauthorised users or to the public unless doing so is appropriate and lawful.
Software code is modified only under clearly defined circumstances, and testing must be carried out before changes go online.
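Such policy principles can be expressed in a machine-readable form that tools can check against; all field names and values in this sketch are invented for illustration.

```python
INFORMATION_POLICY = {
    "data_access": {
        "authorised_users_only": True,
        "public_disclosure": "only when appropriate and lawful",
    },
    "code_changes": {
        "allowed_circumstances": ["approved change request", "security fix"],
        "tests_required_before_release": True,
    },
}
```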