
Why switch?

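synthpop excels at generating disclosure-controlled, individual-level microdata, but it synthesizes columns one at a time and has no copula-based model of the joint distribution. rsdv instead fits a Gaussian copula across columns so that inter-column correlations are preserved in the synthetic data.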
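
To make the copula idea concrete, here is a minimal base-R sketch of Gaussian copula sampling on two made-up numeric columns. It is an illustration under simplifying assumptions (empirical marginals, numeric columns only) and is not rsdv's implementation; the names `real`, `to_normal`, and `synthetic` are hypothetical.

```r
# Minimal Gaussian copula sketch in base R (illustrative, not rsdv's code).
set.seed(1)
n <- 500

# Made-up "real" data: a skewed column and a second column correlated with it.
x <- rexp(n)
y <- x + rnorm(n, sd = 0.5)
real <- data.frame(x = x, y = y)

# 1. Map each column to normal scores through its empirical CDF (ranks).
to_normal <- function(v) {
  qnorm(rank(v, ties.method = "average") / (length(v) + 1))
}
z <- sapply(real, to_normal)

# 2. Estimate the correlation matrix of the normal scores.
rho <- cor(z)

# 3. Draw new correlated normals using the Cholesky factor of rho.
z_new <- matrix(rnorm(n * ncol(z)), ncol = ncol(z)) %*% chol(rho)

# 4. Map back to the data scale via each column's empirical quantiles.
synthetic <- as.data.frame(mapply(
  function(col, zcol) quantile(col, probs = pnorm(zcol), names = FALSE),
  real, as.data.frame(z_new)
))
```

Comparing `cor(real, method = "spearman")` with `cor(synthetic, method = "spearman")` gives a quick check of how well the rank correlation carries over.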

Side-by-side comparison

synthpop workflow

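library(synthpop)

# Synthesize the three columns of interest (syn() uses CART by default)
synth <- syn(adult_income[, c("age", "occupation", "income")])
# The synthetic data frame is stored in the $syn element
synthetic_data <- synth$syn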

rsdv workflow

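library(rsdv)

set.seed(42)  # make fitting and sampling reproducible

# Describe the table and declare each column's type
meta <- metadata(adult_income) |>
  set_column_type("age",        "numerical") |>
  set_column_type("occupation", "categorical") |>
  set_column_type("income",     "categorical")

# Create the synthesizer from the metadata, then fit it to the real data
syn <- gaussian_copula_synthesizer(meta)
syn <- fit(syn, adult_income)

# Draw as many synthetic rows as the original table has
synthetic_data <- sample(syn, n = nrow(adult_income))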

Key differences

| Feature | synthpop | rsdv |
|---|---|---|
| Correlation modeling | CART-based sequential | Gaussian copula over all column types |
| Column constraints | Limited | Equality, inequality, fixed combos, custom |
| Conditional sampling | Via predictor order | sample_conditions() on categorical values |
| Quality metrics | Built-in utility measures | KS, TVD, correlation & contingency similarity, ML efficacy |
| Diagnostics | None | Validity report (ranges, categories, key uniqueness) |
| Privacy metrics | None | NNDR, attribute disclosure risk |
| Python interop | No | API-compatible with SDV |
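
For readers unfamiliar with the KS and TVD scores listed under quality metrics, the following base-R sketch shows one common way to compute them as similarities in [0, 1]. It assumes the usual "1 minus distance" convention and uses made-up data; `ks_similarity` and `tvd_similarity` are hypothetical helper names, not rsdv functions.

```r
# Base-R sketch of two column-level similarity scores (not rsdv's functions).
set.seed(42)

# Made-up stand-ins for a real and a synthetic table.
real <- data.frame(
  age    = round(rnorm(1000, mean = 40, sd = 12)),
  income = sample(c("<=50K", ">50K"), 1000, replace = TRUE, prob = c(0.75, 0.25))
)
synthetic <- data.frame(
  age    = round(rnorm(1000, mean = 41, sd = 12)),
  income = sample(c("<=50K", ">50K"), 1000, replace = TRUE, prob = c(0.70, 0.30))
)

# Numerical column: 1 minus the two-sample Kolmogorov-Smirnov statistic.
ks_similarity <- function(x, y) {
  d <- suppressWarnings(ks.test(x, y))$statistic  # ties can trigger a warning
  1 - unname(d)
}

# Categorical column: 1 minus the total variation distance, i.e. half the
# summed absolute difference between the two category frequency vectors.
tvd_similarity <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  levels_all <- union(x, y)
  p <- table(factor(x, levels = levels_all)) / length(x)
  q <- table(factor(y, levels = levels_all)) / length(y)
  1 - 0.5 * sum(abs(p - q))
}

ks_age     <- ks_similarity(real$age, synthetic$age)
tvd_income <- tvd_similarity(real$income, synthetic$income)
```

Both scores equal 1 when the two columns have identical distributions and fall toward 0 as they diverge, so they can be averaged across columns for a rough overall fidelity score.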