Towards Reproducible Data Science for Community and Policy Research

An experiential road map


Kailas Venkitasubramanian, PhD

Charlotte Urban Institute

Analytics Frontiers Conference, 2023

Mar. 9, 2023

A reproducibility storm has been brewing in science for some time now

“reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results. Reproducibility is a minimum necessary condition for a finding to be believable and informative.”

– U.S. National Science Foundation (NSF) subcommittee on replicability in science

Reproducibility in simple terms

  • Can we replicate/re-create a project end-to-end without dead-ends?
  • Can someone else replicate our project without breakdowns?
    • Is our data research-ready and replication-ready?
    • Is our software/code replication-ready?
    • Is our computational environment replication-ready?
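One way to make the data and environment questions above checkable is to record a small "run manifest" each time an analysis runs. The Python below is a minimal sketch under stated assumptions. It fingerprints each raw data file with SHA-256 and records the interpreter, platform and installed package versions. A later rerun compares its own manifest against the saved one. The function names, the `manifest.json` file name and the toy CSV are hypothetical illustrations, not part of any UI tool or workflow.

```python
# Hypothetical sketch: snapshot data fingerprints and the software environment
# so a later rerun can check whether anything changed. Names are illustrative.
import hashlib
import json
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files: list[Path]) -> dict:
    """Record the Python version, platform, installed packages and data hashes."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
        "data": {str(p): file_sha256(p) for p in sorted(data_files)},
    }


def diff_manifests(old: dict, new: dict) -> list[str]:
    """Describe differences between two manifests that could break a replication."""
    problems = []
    if old["python"] != new["python"]:
        problems.append(f"python: {old['python']} -> {new['python']}")
    for name, version in old["packages"].items():
        current = new["packages"].get(name)
        if current != version:
            problems.append(f"package {name}: {version} -> {current}")
    for path, digest in old["data"].items():
        if new["data"].get(path) != digest:
            problems.append(f"data changed or missing: {path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_extract.csv"  # hypothetical raw data file
        raw.write_text("site,value\nA,1\n")
        before = build_manifest([raw])
        # Save the manifest next to the code so collaborators can compare runs.
        Path(tmp, "manifest.json").write_text(json.dumps(before, indent=2))

        raw.write_text("site,value\nA,2\n")  # simulate a silent upstream change
        print(diff_manifests(before, build_manifest([raw])))
        # prints a one-item list flagging raw_extract.csv as changed
```

Hashing file contents, rather than trusting file names or timestamps, catches the silent upstream edits that most often derail a rerun. The package snapshot gives a collaborator a concrete list of what to install before attempting a replication.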

Our Work - Charlotte Urban Institute

  • 50+ years of community and policy research in the 14-county Charlotte Region
  • A community-engaged research arm of UNC Charlotte and part of UrbanCORE
  • Touch a wide range of domains in civic life: health, education, economy, environment and others
  • Keep a pulse on how the region, its needs, issues and solutions are evolving

UI Region

Our Work - Charlotte Regional Data Trust

CRDT

Why does reproducibility matter in community research?

  • Shorter runway to evidence-based public policy and public action
  • Active social and political visibility (and scrutiny)
  • Diverse stakeholders and interests
  • Limited resources
  • Translating research
  • Community trust

The path to an optimal reproducibility recipe

UI Data Science Team’s Goals

  • Develop reproducible analytical processes across the research life cycle

  • Build and sustain data and computational infrastructure that aligns with reproducibility goals

  • Adapt our project management strategy to enable reproducible work and collaboration

  • Demonstrate value for the organization and the community through reproducible projects

  • Establish partnerships that champion research transparency and verifiability

  • Be a steward in helping our partners build these principles into their work

The UI Data Science Ecosystem

The FAIR framework (Findable, Accessible, Interoperable, Reusable)

Walking the path...

The UI Reproducibility Project

UI Projects

Source: https://www.leibniz-fli.de/research/good-scientific-practice/rdm-at-fli

Our tool-verse

RMarkdown

Targets

Renku

Data & Operations Documentation

Project workflows

How the practice is evolving...

Has all this made a difference?

QoL Reproducible Workflow

So, what do we want you to take away?

Operating principles

  • Something is better than nothing

  • Expect and embrace pain

  • Be the change you want to see in others

  • Start small. Build in increments

  • Open source first, but be inclusive

  • You are doing it wrong if this is no fun

  • Help others. Share with others. Be nimble.

  • Don’t be a reproducibility fundamentalist

Thank you!

Acknowledgement