Data Life Cycle
The data life cycle, as described by DataONE's best practices (https://www.dataone.org/data-life-cycle), has eight components for managing and preserving data for use and reuse. The following describes how the components are implemented at the Arctic LTER.
Plan: Planning begins at the proposal stage, where a detailed plan for experiments and data collection is laid out. Both the Arctic LTER proposal and the proposals of any projects that use our sites or experiments include such plans. However, ad hoc opportunities sometimes arise for which no formal plan exists; in those cases the investigators determine how the resulting data fit into the overall data plans they follow.
Collect and Assure the quality of data: Each investigator is responsible for collecting and curating their data to the point where the data can be put online for others to use.
Describe: We encourage investigators to use the templates we provide when submitting datasets for upload to our web site and to data portals. Comments and lists within the templates guide investigators in entering metadata.
Once an investigator submits a dataset, the Information Manager (IM) uses several scripts (at present Excel macros) to check the metadata and data for completeness and correct formats. These checks cover standard units, consistent site descriptions and locations, keywords, missing-data definitions, and other common formatting errors. Once the dataset passes the checks, the data files are moved to a data directory on the web server, and the dataset, data sources, and any necessary person, research site, or project content are created or updated. Finally, the dataset is uploaded to a data portal, where additional checks may be made.
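As an illustration only, the following R sketch mimics the kinds of checks these scripts perform. The column names, the controlled site list, and the expected date format are hypothetical placeholders, and the production checks are implemented as Excel macros rather than in R:

    # Hypothetical sketch of submission checks; column names, the site
    # list, and the date format are placeholders, not the actual macros.
    check_data_table <- function(path) {
      dat <- read.csv(path, stringsAsFactors = FALSE)
      problems <- character(0)

      # Completeness: required columns must be present (hypothetical names)
      required <- c("Site", "Date", "Treatment")
      missing_cols <- setdiff(required, names(dat))
      if (length(missing_cols) > 0)
        problems <- c(problems,
                      paste("Missing columns:", paste(missing_cols, collapse = ", ")))

      # Consistent site descriptions: sites must come from a controlled list
      known_sites <- c("Toolik Lake", "Imnavait Creek")  # placeholder list
      bad_sites <- setdiff(unique(dat$Site), known_sites)
      if (length(bad_sites) > 0)
        problems <- c(problems,
                      paste("Unknown sites:", paste(bad_sites, collapse = ", ")))

      # Date format: dates must parse as YYYY-MM-DD
      if ("Date" %in% names(dat) && anyNA(as.Date(dat$Date, format = "%Y-%m-%d")))
        problems <- c(problems, "Unparseable dates (expected YYYY-MM-DD)")

      if (length(problems) == 0) message("All checks passed")
      problems
    }

A real submission would additionally be checked against the standard unit and keyword lists mentioned above.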
Preserve and Discover: The Arctic LTER is one of several LTER Drupal web sites that use the Drupal Ecological Information Management System (DEIMS) as a content management system for the web site and for entering and managing datasets. DEIMS uses several custom content types to populate dataset metadata fields: people, research sites, and projects are all separate content types (SQL data tables) that feed into a dataset content type. In addition, we use the Drupal Biblio module to maintain the Arctic LTER bibliography. Once datasets are entered or updated and published on the Arctic LTER web site, Ecological Metadata Language (EML) XML can be generated for uploading a dataset to the EDI data portal, https://portal.edirepository.org. Once uploaded to EDI, a Digital Object Identifier (DOI) is assigned and the dataset becomes discoverable through the EDI and DataONE data portals. EDI is an NSF-funded project for the curation and archiving of environmental data and provides long-term archiving of data. Other data centers can also be used for Arctic LTER and associated projects, e.g., the Arctic Data Center and GenBank. At the Arctic LTER the local databases are regularly backed up to local hard drives and to cloud backups; details on the hardware and software are available for those interested.
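DEIMS generates the EML itself, but for readers unfamiliar with the format, the rOpenSci EML package for R can build a minimal, schema-valid record. Every name and title below is a placeholder, not an actual Arctic LTER dataset:

    library(EML)

    # Placeholder creator; in production DEIMS populates these fields
    # from its person content type.
    creator <- list(individualName = list(givenName = "Jane", surName = "Doe"))

    my_eml <- list(dataset = list(
      title   = "Example Arctic LTER vegetation dataset",
      creator = creator,
      contact = creator
    ))

    write_eml(my_eml, "example-eml.xml")  # serialize the list to EML XML
    eml_validate("example-eml.xml")       # TRUE if the record is schema-valid

The same EML document, with the full complement of metadata fields, is what the EDI portal ingests when a dataset is uploaded.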
Integrate and Analyze: The software used for data integration and analysis has been evolving as datasets become larger and more complicated. Many researchers use spreadsheet programs for basic data entry and calculations, but MATLAB and R have become increasingly important for data analysis. One example is the set of R scripts being developed for analyzing and checking vegetation reflectance data.
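As a sketch of what such a script might do (the data layout, the spike threshold, and the function name are assumptions, not the actual scripts):

    # Hypothetical reflectance QC: flag values outside [0, 1] and abrupt
    # jumps between adjacent wavelengths. Thresholds are illustrative only.
    flag_reflectance <- function(spec) {
      out_of_range <- spec$reflectance < 0 | spec$reflectance > 1
      spike <- c(FALSE, abs(diff(spec$reflectance)) > 0.2)
      spec$flag <- ifelse(out_of_range, "out_of_range",
                          ifelse(spike, "spike", "ok"))
      spec
    }

    # Small synthetic spectrum: the jump to 0.45 and back is flagged
    spec <- data.frame(
      wavelength_nm = seq(400, 430, by = 10),
      reflectance   = c(0.05, 0.06, 0.45, 0.07)
    )
    flag_reflectance(spec)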