Research data management

Considerations

Starting your own research from the existing data collected by other researchers can have some major benefits:

  • much of the background work has already been completed, making it easier to undertake further research
  • it saves time and money, because you avoid the cost of collecting data that already exists
  • the data comes with a degree of pre-established validity and reliability

However, careful consideration is required before reusing data.

Source context

  • Is there enough description about the content of the data? Is the context of the research relevant?
  • Is the source trustworthy?
  • Do you know how long the data will be stored and made available?

Licenses and user agreements

  • Are there restrictions on, or conditions for, reusing the data?
  • What will be the impact of these restrictions on your research?

Methodology

  • What is the relationship between existing and new data?
  • How will the data be integrated?
  • How will any format differences be managed?

Considering these aspects will help you determine whether the data is suitable for reuse.

Finding datasets

For guidance on finding and reusing datasets, consult the How To Find Datasets library guide.

Data citation and standards

When reusing the data of others, it’s critical to give proper attribution to the work of the original creator. This is called data citation: the practice of referencing data to acknowledge its source, in the same way as you would reference a book or journal article.

Citing data is important because it:

  • acknowledges and gives credit to the originator of the data
  • allows replication or verification of the new results and data, improving their reliability and validity
  • enables the collection of statistics on the impact of the data (data citation metrics)

However, because data citation is a relatively new practice, the standards to follow are often unclear. Referencing software such as EndNote includes a template for datasets, but you may need to modify the generated references to meet other requirements.

You should consider the following in order of precedence:

  1. Any guidelines from your editor or publisher
  2. Any guidelines from your style guide or publication manual
  3. Any guidelines from the data source (either the dataset creator or the data repository)

If these requirements are unclear or informal, DataCite recommends including the following elements:

Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier

  • Publication year is the date when the dataset was published (not the collection or coverage date)
  • Publisher refers to the repository or data centre where the data is stored
  • Identifier should be displayed as a linkable, permanent URL
  • Version and ResourceType are optional and may be included where relevant
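
If you manage dataset references in a script or notebook, the element order above can be assembled programmatically. The following is a minimal Python sketch, not an official DataCite tool. The names DatasetRecord, as_url and format_citation are hypothetical, the example values are placeholders rather than a real dataset, and the sketch assumes that an identifier given without a URL scheme is a DOI that can be resolved through https://doi.org/.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the citation elements listed above."""
    creator: str
    publication_year: int          # year the dataset was published
    title: str
    publisher: str                 # repository or data centre holding the data
    identifier: str                # full URL, or a bare DOI (assumption)
    version: Optional[str] = None          # optional element
    resource_type: Optional[str] = None    # optional element


def as_url(identifier: str) -> str:
    """Return the identifier as a linkable URL.

    Assumption: anything that is not already an http(s) URL is a DOI.
    """
    if identifier.startswith(("http://", "https://")):
        return identifier
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:]
    return "https://doi.org/" + identifier


def format_citation(rec: DatasetRecord) -> str:
    """Join the elements in the order:
    Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier
    """
    parts = [f"{rec.creator} ({rec.publication_year})", rec.title]
    if rec.version:
        parts.append(rec.version)
    parts.append(rec.publisher)
    if rec.resource_type:
        parts.append(rec.resource_type)
    # Drop trailing full stops so joining does not produce "..".
    parts = [p.strip().rstrip(".") for p in parts]
    parts.append(as_url(rec.identifier))
    return ". ".join(parts)


# Placeholder values for illustration only; this is not a real dataset.
example = DatasetRecord(
    creator="Example, A.",
    publication_year=2020,
    title="Example survey data",
    publisher="Example Data Repository",
    identifier="doi:10.1234/example",
    version="Version 1.0",
    resource_type="Dataset",
)
print(format_citation(example))
```

Guidelines from your publisher, style guide or data source still take precedence, so treat the generated string as a starting point and adjust punctuation or element order as they require.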