Metadata

metadata()
Create a metadata object describing a dataset's column types
set_column_type()
Set the type of a column in metadata
set_primary_key()
Set the primary key column of the metadata
print(<rsdv_metadata>)
Print method for rsdv_metadata
metadata_to_json()
Serialize metadata to a JSON string
metadata_from_json()
Deserialize metadata from a JSON string
save_metadata()
Save metadata to a JSON file
load_metadata()
Load metadata from a JSON file

Constraints

add_constraint()
Add a constraint to metadata
equality_constraint()
Constraint: two columns must be equal row-wise
inequality_constraint()
Constraint: col_a must be less than / greater than col_b
fixed_combinations_constraint()
Constraint: only observed column combinations are valid
custom_constraint()
Constraint: arbitrary row-wise predicate
check_constraint()
Check a single constraint against each row of a data frame
check_constraints()
Check all constraints in metadata against a data frame
print(<equality_constraint>)
Print method for an equality_constraint
print(<inequality_constraint>)
Print method for an inequality_constraint
print(<fixed_combinations_constraint>)
Print method for a fixed_combinations_constraint
print(<custom_constraint>)
Print method for a custom_constraint

Synthesizers

gaussian_copula_synthesizer()
Create a Gaussian Copula synthesizer
fit() (reexport)
Generic re-exported from another package; used to fit a synthesizer
sample()
Sample synthetic rows from a fitted synthesizer
sample_conditions()
Sample synthetic rows that match fixed column values (conditional sampling)
is_fitted()
Check whether a synthesizer has been fitted
validate_data()
Validate that a data frame is compatible with metadata

Evaluation

quality_report()
Generate a quality report comparing real and synthetic data
print(<rsdv_quality_report>)
Print method for rsdv_quality_report
autoplot(<rsdv_quality_report>)
Plot a quality report
diagnostic_report()
Generate a diagnostic (validity) report for synthetic data
print(<rsdv_diagnostic_report>)
Print method for rsdv_diagnostic_report
autoplot(<rsdv_diagnostic_report>)
Plot a diagnostic report
privacy_report()
Generate a privacy report comparing real and synthetic data
print(<rsdv_privacy_report>)
Print method for rsdv_privacy_report
autoplot(<rsdv_privacy_report>)
Plot a privacy report
ks_similarity()
Kolmogorov-Smirnov similarity score per numerical column
tvd_similarity()
Total variation distance similarity score per categorical column
correlation_similarity()
Correlation similarity between real and synthetic numerical column pairs
contingency_similarity()
Contingency similarity between real and synthetic categorical column pairs
ml_efficacy()
ML efficacy: train-on-synthetic / test-on-real accuracy ratio (TSTR)
nndr()
Nearest-Neighbor Distance Ratio privacy score
attribute_disclosure_risk()
Attribute disclosure risk
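
The two per-column similarity scores listed above can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes "KS similarity" means one minus the two-sample Kolmogorov-Smirnov statistic and "TVD similarity" means one minus the total variation distance, which is a common convention but is not confirmed by this index. The helper names `ks_similarity_sketch()` and `tvd_similarity_sketch()` are hypothetical and exist only for this example.

```r
# Hypothetical helpers for illustration; not the package's own functions.

# KS similarity (assumed definition): 1 minus the two-sample
# Kolmogorov-Smirnov statistic. 1 means the empirical distributions
# coincide, 0 means they do not overlap at all.
ks_similarity_sketch <- function(real, synthetic) {
  # suppressWarnings() hides the ties warning that ks.test() can emit
  stat <- suppressWarnings(stats::ks.test(real, synthetic)$statistic)
  1 - unname(stat)
}

# TVD similarity (assumed definition): 1 minus half the summed absolute
# difference between the category frequencies of the two columns.
tvd_similarity_sketch <- function(real, synthetic) {
  levels_all <- union(unique(real), unique(synthetic))
  p <- table(factor(real, levels = levels_all)) / length(real)
  q <- table(factor(synthetic, levels = levels_all)) / length(synthetic)
  1 - 0.5 * sum(abs(p - q))
}

# Identical numeric samples: the KS statistic is 0
ks_identical <- ks_similarity_sketch(c(1, 2, 3, 4), c(1, 2, 3, 4))

# Frequencies (2/3, 1/3) versus (1/3, 2/3): the TVD is 1/3
tvd_shifted <- tvd_similarity_sketch(c("a", "a", "b"), c("a", "b", "b"))
```

Both scores lie in [0, 1] with higher meaning closer agreement, so they can be averaged across columns into a single summary if desired. Consult the help pages for ks_similarity() and tvd_similarity() for the definitions the package actually uses.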

Data

adult_income
Adult Income dataset (500-row sample)