Scripting
The 5 Do's of open science scripting
The following are meant as a set of guidelines for you to consider adopting in your scripting. We don't expect people to adopt all of these points instantly. Perhaps try adopting one point at a time.
- Do make your code available on GitHub (or similar) platforms even if it doesn't run top to bottom
- As academics, being able to share our code is a privilege, even if it may not always feel like one, so make use of it!
- Too often "my code doesn't run" is used as an excuse not to put code online
- You don't have to know how to use Git in order to use GitHub - drag and drop files into a repository using GitHub's web interface
- Do document your code
- "Your closest collaborator is you from 6 months ago, and they don't answer emails". Documenting your code through README documents and in-line comments is a way of being transparent about what your code does, and a way of being kind to yourself!
- Do learn to use dynamic documents, e.g. R Markdown / Quarto / Jupyter notebooks
- A clue that you could find them helpful is that you find yourself inventing your own comment syntax for headings in your scripts
- They make it really easy to share analyses with others and to iterate through versions without, for example, having to copy and paste output into Word documents
- Do consider separating your code from your data and output
- Using config files to store absolute file paths means that your code reads the location of the data from the config file instead of hard-coding it (example)
- It helps prevent accidentally sharing sensitive data when you share your code (e.g. in public spaces like GitHub)
- This can help with accessing RDSF locally and on HPC
- It's a good security practice to avoid sharing file paths publicly
- It makes it much easier to run your code in different locations, because the config file can be different in each location depending on where to find your data and outputs
- Do constantly keep learning and trying to improve your coding
- One way to do this is to read code on publicly available GitHub repositories
- Have a look at papers such as Wilson et al., Best Practices for Scientific Computing, PLOS Biology, 2014 DOI
- and Wilson et al., PLOS Computational Biology, 2017 DOI
- and the Reproducible Research section of the Turing Way here
- and Morton et al., BMJ Health & Care Informatics, 2022, 29:e100488 URL
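To illustrate the guideline above on separating code from data and output, here is a minimal Python sketch of the config-file approach. It is only one possible way to do it: the file name `config.json`, the keys `data_dir` and `output_dir`, and the helper `load_config` are hypothetical names chosen for this example, and a temporary directory stands in for wherever your data really lives.

```python
import json
import tempfile
from pathlib import Path


def load_config(config_path):
    """Read a JSON config file and return its entries as Path objects."""
    with open(config_path, encoding="utf-8") as handle:
        raw = json.load(handle)
    return {key: Path(value) for key, value in raw.items()}


def demo():
    """Run a toy analysis whose data and output locations come from a config file."""
    # A temporary directory stands in for the real data and output locations.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        data_dir = tmp / "data"
        output_dir = tmp / "output"
        data_dir.mkdir()
        output_dir.mkdir()
        (data_dir / "measurements.csv").write_text("id,value\n1,2.5\n2,3.5\n")

        # Hypothetical per-machine config file; in practice it would be written
        # by hand on each machine and kept out of the shared repository.
        config_path = tmp / "config.json"
        config_path.write_text(
            json.dumps({"data_dir": str(data_dir), "output_dir": str(output_dir)})
        )

        # The analysis code refers only to config entries, never to literal paths.
        config = load_config(config_path)
        lines = (config["data_dir"] / "measurements.csv").read_text().splitlines()
        values = [float(line.split(",")[1]) for line in lines[1:]]  # skip header
        mean_value = sum(values) / len(values)

        summary_path = config["output_dir"] / "summary.txt"
        summary_path.write_text(f"mean={mean_value}\n")
        return mean_value, summary_path.read_text()


if __name__ == "__main__":
    print(demo())
```

Because the analysis only asks the config for `data_dir` and `output_dir`, the same script could run on a laptop or on HPC by swapping the config file, and the config file itself can be left out of a public repository.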
Monthly meetings
Leads: Matthew Suderman
We'll hold meetings in OS6 at Oakfield House on the first Tuesday of every month at 12pm. Lunch will be provided! Sessions are themed and will be recorded.
List of meetings...
Topic suggestions
- IEU project portal
- Creating and managing projects in the IEU
- Development environments
- High performance computing
- Writing scripts to use HPC systems like BlueCrystal for analysis
- Documenting analyses
- Pipelining tools
- Project structure
- Project template; config files; pipeline; separating data, code and outputs
- Version control
- Git, GitHub and GitHub Desktop
- John D. Blischak, Emily R. Davenport, and Greg Wilson: "A Quick Introduction to Version Control with Git and GitHub", PLOS Computational Biology, 2016
- Magit https://www.youtube.com/watch?v=epp25eCFzd0&ab_channel=SamNeaves
- GitLab
- Reproducibility
- renv https://rstudio.github.io/renv/
- RStudio https://carpentries-incubator.github.io/Reproducible-Publications-with-RStudio/
- virtualenv https://pypi.org/project/virtualenv/
- conda https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/
- docker https://www.youtube.com/watch?v=EApuQ3Yffe8&ab_channel=SamNeaves
- singularity / apptainer https://apptainer.org/
- Open science
- Use R, RStudio, Git and GitHub to share methods, data and code https://carpentries-incubator.github.io/open-science-with-r
- Data science
- Trusted research environments