TReQ: 5 Open data & code

16 December, 2021

Reading time: 10 minutes


TReQ (Improving the transparency, reproducibility and quality of your research): 5 Open data & code

Presented by Nicole Watson

This video tutorial explains how open data and code can help improve the transparency, reproducibility, and quality (TReQ) of applied research. Open data and open code mean making your data and code publicly available in online repositories for others to use and check. We discuss the benefits of open data and code, both for you as a researcher and for science and society in general, address common concerns, and show you how to go about it. This is video 5 of 6 from our team at University College London.

TReQ open data & code: video transcript

[Introduction slide 1: title “TReQ: improving the transparency, reproducibility and quality of your research”]

[Introduction slide 2: title “Video Five: Open Data and Code”]

Nicole Watson: In this video, we’re going to talk about open data and code. It’s a big topic and we can’t cover everything, so we’re just going to focus on some of the most helpful principles to bear in mind.

[Slide titled “What does it mean for data and code to be open?”]

Nicole Watson: So, first of all, what does it mean for data and code to be open?

[Definition slide with title “Open Data: research data that is freely available online for other researchers to view, access and download”]

Nicole Watson: Open data is research data that’s freely available online for other researchers to view, access and download. Similarly, making your code open means publicly sharing it online.

[Definition slide with title “Open Code: code for data analysis that is publicly shared online”]

Nicole Watson: It’s important to note that when we talk about code, we don’t just mean software or computer code used to analyse quantitative data. We’re talking about the general process of analysing your data. So this could also include things like inductive or deductive codes used to analyse qualitative data.

[Slide titled “Why share data and code?”]

Nicole Watson: So why might you want to make your data and code public?

[Slide with an animated icon depicting the head and shoulders of two people in separate circles, who become linked by arrows and title “Sharing data allows other researchers to make use of your dataset”.]

Nicole Watson: First of all, sharing your data allows other researchers to make use of it (with appropriate citation, of course).

[Text slide with bullet points:

  • Increases the impact and visibility of your work
  • Prevents duplication
  • More efficient use of resources]

Nicole Watson: This can hugely increase the impact and the visibility of your work, as well as benefiting scientific progress by preventing duplication of work and also sharing valuable resources.

[Slide with an animated white icon depicting a handshake on a plain turquoise background and title “Encourages collaboration between different research groups”.]

Nicole Watson: Sharing your data can encourage collaboration between different research groups, which can be really beneficial for your reputation as a scientist and also your own productivity. There’s also kind of a community element to it.

[Slide with an animated icon depicting three people coming together into a group and title “Benefits the scientific community as a whole”.]

Nicole Watson: So if you’ve ever benefited from the use of someone else’s data, there’s a high chance that someone else could equally benefit from using yours. The idea of letting other people see your code might seem slightly daunting, but it also has great benefits for you as a researcher.

[Slide with an animated icon depicting hands typing on a keyboard and title “Write better and cleaner code for yourself and others”.]

Nicole Watson: Personally, I’ve always found that whenever I’ve decided in advance that I’m going to be sharing my code, it’s made me much more conscious of writing better and cleaner code, and also making lots of comments and notes that are useful, not only for other people, but also myself in the future.

[Slide containing a screen recording of someone coding and adding notes to their code.]

Nicole Watson: There’s also evidence to suggest that papers with code available get cited more often than those that don’t have it.

[Slide with an animated icon depicting a hand held out flat with the palm upwards. A plant grows from the palm with a tick symbol in the centre and title “Shows that you have confidence in your own work”.]

Nicole Watson: Finally, if you’re willing to share your data and code, that shows that you’ve got the confidence in your work to put it out there and let other people check it. And when errors are made, they can be identified and put right, which is good for scientific progress.

[Slide titled “Key Considerations”]

Nicole Watson: There are a few things to bear in mind when sharing research data online. It’s recommended that you follow the FAIR principles. So this means data needs to be findable, accessible, interoperable and reusable.

[Slide titled “FAIR” down the side appears, with the text “findable”, “accessible”, “interoperable” and “reusable” appearing next to the relevant letters]

Nicole Watson: In other words, the data needs to be easy for both humans and machines to find, and people need to be able to actually access it. It should be possible to integrate it with other kinds of data, so that it is useful for as wide a variety of applications as possible. It also needs to be documented in enough detail to allow others to reuse it. Another thing to keep in mind when you’re sharing your research data is privacy and ethics, and this is really important. Typically, data needs to be anonymised before it’s shared, and you need to make sure you’ve got consent from your research participants.

[Slide containing a screen recording of the UK Data Service’s search page. The screen recording zooms in and slowly scrolls down to show the results for the search terms ‘data sharing’]

Nicole Watson: The UK Data Service has loads of great resources on this topic.

[Slide titled “Potential Concerns”]

Nicole Watson: One of the most common concerns with data sharing is that someone else might swoop in and use your data to run some analysis that you were planning to do before you get the chance to do it yourself.

[Slide titled “Concern One: someone might use my data before me”]

Nicole Watson: We understand why researchers might be concerned about this, but in reality, it happens pretty rarely. You can also choose to licence the data so that people need to either cite you or get your explicit permission before they can use it. And this can even lead to collaborations that might not have happened if you had kept the data closed in the first place.

[Slide containing a screen recording of the Turing Way Guide website. The screen recording zooms in to show a mouse clicking on the button titled “Data Licenses”. It then shows this page being loaded and slowly scrolls down to show more of this page’s content]

Nicole Watson: The Turing Way’s Guide to Reproducible Research has a really good webpage with a list of all the different licence types and when to apply them. Another potential concern is the time and effort that it might take to tidy up your data and code and make them fit for public viewing.

[Slide titled “Concern Two: it takes too long to tidy up my code”]

Nicole Watson: But many funders already require you to produce data management plans, which involve keeping your data well organised anyway. And some of them even require you to make your data open by default.

[Slide with an animated icon depicting three computer servers and title “Keeping your data and code well organised is good research practice”]

Nicole Watson: In general, keeping your data and code well organised and well managed is just good research practice.

[Slide with an animated icon depicting a large cog with a needle moving round it and title “Good data management also makes your workflow more efficient”]

Nicole Watson: Trust me, it makes your workflow much more efficient and your life so much easier. There are some legitimate situations where sharing data and code gets a little bit more tricky. For example, if you’re working with a commercial partner or if you’re using data that contains highly sensitive information.

[Slide titled “Concern Three: I’m using data that might not be shareable”]

Nicole Watson: The coding equivalent of this might be if you’re using proprietary software or models. Under these kinds of circumstances, maybe there’s an agreement that can be made to share part of the dataset or to create a synthetic dataset that structurally mirrors the real data without revealing any of the actual information.
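To make the synthetic-dataset idea more concrete, here is a minimal sketch in Python (standard library only). It assumes you describe the *structure* of your real data in a hand-written schema (column names, types, plausible ranges or categories) and then generate fake rows from that schema alone, so no real values are copied. The schema, column names and the `synthetic_rows` function are hypothetical illustrations, not a method from the video. Note that ranges and category lists can themselves be sensitive, so choose them with care.

```python
import random
from typing import Any

# Hypothetical schema: column name -> (type, parameters).
# It describes only the structure of the real data; no real values are copied.
SCHEMA: dict[str, tuple[str, Any]] = {
    "participant_id": ("id", None),
    "age": ("int", (18, 90)),
    "region": ("category", ["North", "South", "East", "West"]),
    "monthly_kwh": ("float", (50.0, 800.0)),
}


def synthetic_rows(schema: dict[str, tuple[str, Any]], n: int, seed: int = 0) -> list[dict[str, Any]]:
    """Generate n fake rows with the same columns and types as the schema."""
    rng = random.Random(seed)  # fixed seed so the synthetic file can be regenerated exactly
    rows = []
    for i in range(n):
        row: dict[str, Any] = {}
        for column, (kind, params) in schema.items():
            if kind == "id":
                row[column] = f"P{i + 1:04d}"  # sequential IDs that identify nobody
            elif kind == "int":
                low, high = params
                row[column] = rng.randint(low, high)  # inclusive of both ends
            elif kind == "float":
                low, high = params
                row[column] = round(rng.uniform(low, high), 1)
            elif kind == "category":
                row[column] = rng.choice(params)
            else:
                raise ValueError(f"unknown column type: {kind}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in synthetic_rows(SCHEMA, 3):
        print(row)
```

Because the fake rows have the same column names and types as the real data, others can run your shared analysis code on them end to end, even though the results will not be meaningful. You could write them out with `csv.DictWriter` and deposit that file alongside your code.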

[Slide titled “Practical Tips”]

Nicole Watson: So we’re just going to finish off with a few practical tips. So when I first started sharing my data, I found it really tricky to figure out which of the many different repositories to go for.

[Slide containing a screen recording of the Zenodo website homepage. The screen recording slowly scrolls down and then zooms in to show more of the page]

Nicole Watson: One of the most commonly used sites for data sharing is Zenodo, which is a free and open source platform.

[Slide containing a screen recording of the Figshare website homepage. The screen recording zooms in to show a mouse clicking on the button titled ‘Login’, and then loads the login screen]

Nicole Watson: Figshare is another really good repository, and many universities also have their own systems too. If your code is fairly simple, you can usually just upload it alongside your data or as supplementary material when you publish your paper. If your code is more complex, you might need to turn to tools such as Git and GitHub.

[Slide containing a screen recording of the Turing Way Guide website. The screen recording slowly scrolls down to show more of the page]

Nicole Watson: And again, The Turing Way Guide has lots of really good resources on this.

[Slide titled “Summary”]

Nicole Watson: So to summarise, sharing your data and code can have huge benefits for you as a researcher.

[Text slide with bullet points:

  • Increases the impact and visibility of your work
  • Encourages collaborations with other scientists
  • Incentivises good management of your data and code]

Nicole Watson: It can help improve the impact and the visibility of your work, encourage collaborations with other scientists, and it can even give you a really good incentive to keep your data and code well managed, which your future self will definitely thank you for. It also has huge benefits for science.

[Text slide with bullet points:

  • Errors can be spotted and corrected
  • Prevents duplication of work
  • More resources available for the scientific community]

Nicole Watson: It allows errors to be spotted and corrected when they are made. It prevents the duplication of work and allows valuable resources to be shared with the whole scientific community. As a researcher, you’ve almost certainly benefited from the use of someone else’s data, so this is a chance to give back to your fellow scientists by sharing your own. There is a lot to bear in mind, but there are also loads of great resources out there to help you.

[Closing slide 1: title “Up Next: Conclusion”]

[Closing slide 2: title “TReQ: improving the transparency, reproducibility and quality of your research”]

[Closing slide 3: title “For links to further resources and more about us visit: bit.ly/TReQtools”]

[Closing slide 4: UCL logo, CREDS: Centre for Research into Energy Demand Solutions logo with title “Supported by”]

Banner photo credit: Joel Filipe on Unsplash