Demonstration Test Catchments: Data sharing and archival - WQ0219

Defra’s Demonstration Test Catchment (DTC) project examines aspects of three test catchments (the Eden [Cumbria], the Hampshire Avon and the Wensum [Norfolk]) in order to investigate innovative solutions to the long-standing water quality problem of diffuse pollution. A major component of the DTC approach is intensive monitoring of the environment in order better to understand the origin and nature of stressors, a key issue in the UK’s successful compliance with the Water Framework Directive. This project adds a key extra element to the DTCs: a dedicated data archive.
A small data archive consortium of the Freshwater Biological Association (FBA), a leader in environmental data archiving, and King’s College London, a major developer of processes for handling and using electronic data, has the crucial task of receiving data gathered by three larger consortia (one for each of the three test catchments) and storing it in a dedicated archive in such a way that it is secure for the long term and easily accessible by consortia members, the people living in the three catchments, and decision and policy makers. Tools using leading-edge technologies will be developed so that the data can be interrogated in an enjoyable, clear and efficient way, and these tools will be made as accessible as possible.

To achieve this aim, much innovative work will be needed to develop the data archive into a format in which users can add or access information easily. The types of data are varied, including water chemistry, hydrological information, weather records, ecological data and ‘soft data’ (photographic and video images, subjective assessments, etc.). An effective archive will link the interrelated components of this diverse data together so that users will be able to find relevant information in response to search queries. The task of this project, in addition to accepting and storing data, will be to make its use as intuitive and valuable as possible.

The data archive project team will talk to the researchers who will provide the data, and to potential users, including farmers and other land owners, in order to understand the uses to which the data may be put. Following this, the data archive team will develop ontologies, which use words and phrases (‘vocabularies’) specific to the types of use anticipated in order to allow users to find data easily; this will be combined with an intuitive approach to querying the data and a clear understanding of the formats in which users will want data to be made available. The entire process will be web-based, and will use web portals designed specifically to facilitate input and output of the data. Data provision will be connected, where appropriate, with supporting information to enable stakeholders better to understand its meaning and relevance to their needs.

Forums, newsletters and other communication activities will be organised for researchers and stakeholders to demonstrate the new techniques and to raise awareness of the work being done, the results and the implications for ways in which water quality can be improved in future. Care will be taken to complement the activities of each of the consortia to maximise the cost-effectiveness of these knowledge transfer activities, recognising that these must be of a high standard and can consume substantial resources. The FBA’s existing connections to farmers and their organisations, those interested in the natural history of water bodies and their surroundings, environmental charities, the water industry and regulators will be used to maximise the spread of knowledge from the project and to raise the profile of water quality.

Following the completion of the initial development and of the DTC projects, steps will be taken to ensure the long-term viability of the data archive as a mechanism for storing and accessing valuable data on river catchments and water quality.

7. (b) Objectives

The general objective of this work is to provide a mechanism for storing, collating and accessing data from the Demonstration Test Catchments (DTCs) in a way that is usable for data providers, specialist data users and other stakeholders in the three catchments and elsewhere.

The proposed programme of work will be broken down into eleven Work Packages (WPs: described in 7.(c)), each of which is associated with a set of time-bound objectives that together will fulfil the general objective. This breakdown will make the work easier to manage and progress easier to monitor. The specific objectives are as follows, the numbers matching those of the WPs.

Note 1: The dates indicated assume a project start date of 1 Jan 2011.

Note 2: Because of the iterative nature of our approach, project deliverables will be made available in phased versions, so it is important to understand to what the dates below refer.

For documents (e.g. the data model), the date refers to the publication of the first complete version, which will be regarded as quasi-definitive for the project. This may, however, be subject to further updates during the project in the light of changing circumstances, in which case a final version will be released at the end of the project as part of the documentation set.

For software, the date refers to the release of the first, complete, tested beta version of the code in question, for final, formal evaluation by stakeholders. Earlier, partial, releases will be made available in accordance with the agile development methodology followed by the project.

1.1 To organise a Project Management Board for reporting to Defra (end March 2011).
1.2 To organise a Project Steering Committee, comprising scientists and technology experts, to act as “critical friend” of the project (end March 2011).
1.3 To produce a Quality Plan, describing the QA procedures to be applied during the project, and in particular to assure quality of project outputs (end June 2011).
1.4 To produce a Dissemination Plan and to manage dissemination and communication of outputs and achievements (end Sept. 2011).
1.5 To produce progress reports and a final report for Defra (end Dec. 2011 and annually thereafter).

2.1 To engage with the data provider community through a series of targeted workshops (end Sept. 2011).
2.2 To deliver a Data Management Plan, as described in DTC-DMR-3 (end Dec. 2011, with updates ongoing thereafter).
2.3 To deliver an ongoing online communication mechanism for data providers (end Sept. 2011).

3.1 To engage with the broader user community through a series of workshops categorised by DTC area (end Sept. 2011).
3.2 To deliver a prioritised list of functionality and features required by or requested by particular stakeholder communities (complete version end Jan. 2012).

4.1 To develop data models that define the semantics and syntax of the data and metadata to be managed by the archive, using a standard formalism such as UML (end Mar. 2012).
4.2 To define metadata schemas for the digital material in the archive, using relevant international and open standards (end Jun. 2012).
4.3 To implement content models for the different types of digital objects to be managed in the archive (end Dec. 2012).

5.1 To define formal vocabularies for describing the domain-specific objects, procedures and relationships with which the archive will have to deal, based as far as practicable on existing vocabularies and standards (first complete release end Mar. 2012).
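
As an illustration only, the role of a formal vocabulary in helping users find data can be sketched in Python as follows. The vocabulary entries, the catalogue contents and the function names are all hypothetical placeholders; the actual vocabularies will be defined during the project and based as far as practicable on existing standards.

```python
# Hypothetical controlled vocabulary: preferred term -> alternative terms
# that users might type instead. These entries are placeholders only.
VOCABULARY = {
    "nitrate": {"no3", "nitrates"},
    "suspended sediment": {"suspended solids", "tss"},
    "discharge": {"flow", "river flow", "streamflow"},
}

# Invert the vocabulary so that every known term maps to its preferred term.
LOOKUP = {alt: pref for pref, alts in VOCABULARY.items() for alt in alts}
LOOKUP.update({pref: pref for pref in VOCABULARY})


def normalise(term):
    """Return the preferred term for a user-supplied term, or None if unknown."""
    return LOOKUP.get(term.strip().lower())


def find_datasets(term, catalogue):
    """Return the names of datasets tagged with the concept behind `term`."""
    concept = normalise(term)
    if concept is None:
        return []
    return sorted(name for name, concepts in catalogue.items() if concept in concepts)


# Hypothetical catalogue: dataset name -> preferred terms it is tagged with.
catalogue = {
    "eden_chemistry": {"nitrate", "suspended sediment"},
    "wensum_hydrology": {"discharge"},
}
matches = find_datasets("Streamflow", catalogue)
```

In this sketch a search for "Streamflow" resolves to the preferred term "discharge" before the catalogue is consulted, so users need not know the exact wording used by data providers.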

6.1 To deliver an operational web-accessible archive capable of ingesting, storing and preserving the data objects identified in WP2 (end Jun. 2014).
6.2 To deliver an archive that provides data curation and preservation services in accordance with the OAIS model (end Jun. 2014).
6.3 To deliver an archive that can ingest streams of data from the various sources identified in WP2, and which can be adapted to handle new data sources without major modification (end Jun. 2014).
6.4 To deliver an archive that can support a range of data object types (list to be determined in WP2) and can be adapted to handle new object types without major modification (end Jun. 2014).
6.5 To implement a flexible access control framework so that data is made available only in accordance with access rights, at a dataset and sub-dataset level (end Jun. 2014).
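
One possible reading of access control "at a dataset and sub-dataset level" (objective 6.5) is sketched below in Python. It assumes, purely for illustration, that datasets are addressed by slash-separated paths and that the most specific rule applying to a path decides access; the class name, role names and paths are hypothetical and do not describe the framework that will actually be implemented.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical read-access rules keyed by dataset or sub-dataset path."""

    # Example shape: {"wensum": {"public"}, "wensum/raw-sensor": {"researcher"}}
    rules: dict = field(default_factory=dict)

    def grant(self, path, role):
        """Allow `role` to read `path` (a dataset or sub-dataset)."""
        self.rules.setdefault(path, set()).add(role)

    def can_read(self, path, role):
        """Check access using the most specific rule that covers `path`."""
        parts = path.split("/")
        while parts:
            allowed = self.rules.get("/".join(parts))
            if allowed is not None:
                return role in allowed
            parts.pop()  # No rule at this level: fall back to the parent.
        return False  # No rule anywhere on the path: deny by default.


policy = AccessPolicy()
policy.grant("wensum", "public")                # whole dataset is public
policy.grant("wensum/raw-sensor", "researcher")  # sub-dataset is restricted
```

With these rules, `policy.can_read("wensum/chemistry", "public")` is allowed through the dataset-level rule, while `policy.can_read("wensum/raw-sensor", "public")` is refused because the sub-dataset rule takes precedence. Roles are not hierarchical in this sketch, so each role must be granted explicitly.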

7.1 To specify and implement a generic and flexible framework for querying the archive (end Jun. 2014).
7.2 To implement a number of specific query services using the framework (end Jun. 2014).
7.3 To write guidelines on developing additional query services using the framework (end Jun. 2014).

8.1 To support the delivery and export of datasets in a variety of common formats, to be determined through engagement with potential user communities (end Jun. 2014).
8.2 To support delivery and export of data through a flexible and extensible mechanism, so that additional formats can be added as required without excessive modifications (end Jun. 2014).
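
The kind of extensible export mechanism envisaged in objective 8.2 might, for example, take the form of a registry of exporters, so that a new format is added by registering one function rather than by modifying existing code. The Python sketch below is an assumption about one workable design, not the project's chosen approach; the function names, the record layout and the two example formats are hypothetical.

```python
import csv
import io
import json

# Registry mapping a format name to the function that produces it.
EXPORTERS = {}


def register_exporter(fmt):
    """Decorator that registers a function as the exporter for `fmt`."""
    def decorator(func):
        EXPORTERS[fmt] = func
        return func
    return decorator


@register_exporter("csv")
def to_csv(records):
    """Serialise a list of dictionaries as CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


@register_exporter("json")
def to_json(records):
    """Serialise a list of dictionaries as a JSON array."""
    return json.dumps(records, sort_keys=True)


def export(records, fmt):
    """Export `records` in the requested format, if an exporter is registered."""
    if fmt not in EXPORTERS:
        raise ValueError(f"Unsupported export format: {fmt}")
    return EXPORTERS[fmt](records)


# Hypothetical sample record, for illustration only.
sample = [{"site": "A", "nitrate_mg_l": 4.2}]
csv_text = export(sample, "csv")
json_text = export(sample, "json")
```

Supporting a further format would then require only an additional function decorated with `@register_exporter`, leaving `export` and the existing exporters unchanged.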

9.1 To deliver access portals to accommodate targeted communities, as specified by Defra (end Sept. 2014).
9.2 To provide a framework facilitating the creation of additional portals subsequent to the project (end Sept. 2014).

10.1 To ensure that the archive, together with associated tooling and interfaces, meets the requirements of the various stakeholder communities (end Nov. 2014).

11.1 To host the data, archive and associated portals and tools after the lifetime of the DTC project with guaranteed longevity of access in the absence of continuing funding (mechanism in place at end Sept. 2014).
11.2 To provide access to the data, archive and associated portals and tools that is free at the point of access for all interested stakeholders (mechanism in place at end Sept. 2014).
11.3 To produce an Exit and Sustainability Plan that describes the options and potential business models for taking the system forward once the funding period is over (end Sept. 2014).

