Data Reproducibility Workshop: Abstracts

Extending the lifetime of data and its analysis to ensure FAIR science

Session information

Sunday 24th of March 2019

9:00 - 11:00 Session 1: Electronic lab books and protocols => followed by a hands-on workshop

Keeping a record of your research activities is essential for reproducible research. In this seminar, we will introduce electronic lab books and online protocols as tools for improving quality management in the lab and the reproducibility of its workflows. Participants will learn about the available electronic lab book options and the advantages they offer over traditional pen-and-paper lab books. Additionally, we will give examples of online protocols and explain their role in quality management in the lab.

12:30 - 14:00 Session 2: Power analysis and experimental design => followed by a hands-on workshop

Experimental results should be generalizable to the study population. A prerequisite for this, and for any later attempt to reproduce an experiment, is a sufficient number of experimental subjects and an experimental protocol that maximizes the validity of the findings. In this workshop, participants will receive an introduction to experimental design (with a particular focus on preclinical research and small sample sizes, if participants are interested). Additionally, they will learn how to estimate the number of experimental units needed for reproducible inferences.
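Estimating the number of experimental units can be sketched for the simplest case, a two-group comparison of means, using the normal approximation n = 2 * (z_(1-alpha/2) + z_(power))^2 / d^2 per group, where d is the standardized effect size (Cohen's d). This is a minimal Python sketch under that assumption, not the workshop's own method; the function name `n_per_group` is hypothetical, and an exact t-test calculation (as done by dedicated power-analysis software) gives slightly larger numbers for small samples.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: approximate subjects per group for a two-sided,
    two-sample comparison of means, using the normal approximation
    n = 2 * (z_(1-alpha/2) + z_(power))^2 / d^2."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value of the two-sided test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)                  # round up: subjects come in whole numbers

for d in (0.5, 0.8, 1.2):
    print(f"d = {d}: {n_per_group(d)} subjects per group")
```

With alpha = 0.05 and 80% power, this gives 63 subjects per group for d = 0.5, 25 for d = 0.8 and 11 for d = 1.2, which shows how quickly the required sample size grows as the expected effect shrinks.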

14:30 - 15:00 Seminar: The importance of Data Reproducibility (common problems and how we can address them)

15:30 - 17:00 Session 3: Data management and open data => followed by a hands-on workshop and discussion

The amount of data being produced is tremendous, yet its usefulness depends on how it is recorded, analyzed, stored and shared with the wider scientific community. Although platforms that ensure a “happily ever after” for shared data are becoming increasingly available, many scientists lack practical knowledge of how to use them.

Participants will learn how to ensure that data is findable, accessible, interoperable and re-usable (FAIR). We will cover the technical options for achieving this, as well as how to handle privacy- and patent-related issues.

Monday 25th of March 2019

9:00 - 11:00 Session 4: Reproducible analysis pipelines (dry lab) => followed by a hands-on workshop

Once the data from an experiment is collected, it needs to be processed, turned into figures and analyzed statistically. This raw-data-to-figure process is often poorly documented in the methods sections of papers. Here, we will highlight several options for increasing computational reproducibility, so that others can rerun the analysis and obtain the same results. Participants will be introduced to simple tools like Stencila (currently endorsed by eLife). More advanced participants will have the opportunity to learn about reproducible notebooks in the scripting languages R and Python.
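One simple habit that supports computational reproducibility is to save, next to each result, a record of what produced it. The Python sketch below is a minimal illustration under our own assumptions (the function names and recorded fields are hypothetical, not a Stencila or notebook feature): it fingerprints the raw input with SHA-256, records the random seed and software environment, and uses a seeded local random generator so that a resampling step gives identical results when rerun.

```python
import hashlib
import json
import platform
import random
from datetime import datetime, timezone

def provenance_record(raw_data: bytes, seed: int) -> dict:
    """Hypothetical helper: collect the information needed to rerun an analysis step."""
    return {
        "input_sha256": hashlib.sha256(raw_data).hexdigest(),  # fingerprint of the exact raw data used
        "random_seed": seed,                                   # seed for any stochastic step
        "python_version": platform.python_version(),           # interpreter version
        "platform": platform.platform(),                       # operating system details
        "created_utc": datetime.now(timezone.utc).isoformat(), # when the result was produced
    }

def resample(values: list, seed: int) -> list:
    """Hypothetical helper: draw one bootstrap resample; the same seed yields the same resample."""
    rng = random.Random(seed)  # local generator, unaffected by other code using `random`
    return [rng.choice(values) for _ in values]

SEED = 2019
raw = b"subject,group,value\n1,A,4.2\n2,B,5.1\n3,A,3.9\n"  # hypothetical raw data file contents
values = [4.2, 5.1, 3.9]

record = provenance_record(raw, SEED)
print(json.dumps(record, indent=2))
print(resample(values, SEED) == resample(values, SEED))  # True: the step is repeatable
```

Storing such a record alongside the figure or table it belongs to lets a reader check that they are working from the same data and environment.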

12:30 - 14:30 Session 5: Version control with Git and GitHub => followed by a hands-on workshop

Working on computer code in teams, especially international ones, can sometimes be confusing. Many researchers and software developers use Git together with a hosting service such as GitHub, GitLab or Bitbucket. Participants will receive a general introduction to the logic behind version control. Beyond this, we will introduce basic Git concepts and show how to collaborate with others on GitHub.

15:00 - 17:00 Session 6: Statistics refresher

Inferential statistics is at the heart of experimental research. In this workshop, we will revisit some basic statistical principles and common misconceptions in how they are applied in research. We will also discuss tools and strategies that can help improve statistical practice. We will cover the following topics:

- Which statistical test is appropriate for your data

- Beyond numbers – visualizing data

- How to approach statistical questions beyond t-tests (what a linear model is and how to look for interactions)

- Limitations of using p-values (and why confidence intervals can be useful)

- Basic principles of multiple comparison correction
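To illustrate the basic principle of multiple comparison correction, here is a minimal Python sketch (function names hypothetical) contrasting the Bonferroni correction, which tests every p-value against alpha / m for m tests, with the Holm step-down procedure, which tests the k-th smallest p-value against alpha / (m - k + 1) and stops at the first non-rejection. Both control the family-wise error rate at alpha; Holm never rejects fewer hypotheses than Bonferroni.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 corresponds to k = 1
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # first non-rejection: all larger p-values are retained
    return reject

p = [0.010, 0.020, 0.060, 0.005]  # hypothetical p-values from four tests
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, False, True]
```

In this example, Bonferroni rejects two hypotheses while Holm rejects three, at the same family-wise error rate.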

Tuesday 26th of March 2019

9:00 - 11:00 Session 7: Data Visualization => followed by a hands-on workshop

Figures are the backbone of how a scientific publication conveys the results of a study. Here, we will introduce the basic concepts of figure design and how to strike a balance between storytelling and correct data representation. We will specifically address which display options (e.g. bar plot vs. box plot) are appropriate for a given data type. Additionally, we will introduce the grammar of graphics as a framework for figure design.
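To make the bar plot vs. box plot point concrete, here is a minimal Python sketch using two hypothetical groups with the same mean but very different spreads. It computes only the numbers each plot type would display (drawing is omitted, and the function names are hypothetical): the bar heights are identical, while the box-plot summaries reveal the difference.

```python
from statistics import mean, quantiles

def bar_plot_value(values):
    """The single number a bar plot of means displays."""
    return mean(values)

def box_plot_values(values):
    """Five-number summary a box plot displays: min, Q1, median, Q3, max."""
    q1, median, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    return min(values), q1, median, q3, max(values)

group_a = [4, 5, 5, 5, 6]  # hypothetical measurements, tightly clustered
group_b = [1, 1, 5, 9, 9]  # hypothetical measurements, widely spread

print(bar_plot_value(group_a), bar_plot_value(group_b))  # 5 5
print(box_plot_values(group_a))  # (4, 4.5, 5.0, 5.5, 6)
print(box_plot_values(group_b))  # (1, 1.0, 5.0, 9.0, 9)
```

A bar plot of these two groups would show two identical bars, which is why displays that show the distribution (or, for small samples, the individual data points) are often preferable.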

12:30 - 14:30 Session 8: Publishing models and peer review => followed by a discussion group

The last stage before successful publication is peer review. Over their careers, scientists are both reviewers and reviewed. Here, we will give guidance particularly on the reviewer side:

- Develop checklists to guide critical appraisal of the literature

- Assess to what extent bias, confounding, chance or reverse causality could drive the results

- Assess causal associations using the Bradford Hill criteria

- Practice critical appraisal skills using example papers

- Additionally, we will introduce publication models, compare green open access, gold open access and closed access models, and highlight the advantages of open access publishing.

- We will discuss open peer review and the experience of reviewing manuscripts as a postdoc.

15:00 - 17:00 Session 9: Bringing it all together and final round-up discussion

We plan to discuss the common problems in implementing Data Reproducibility in individual research and how KAUST infrastructure could be improved to support Data Reproducibility for researchers.