Simulation study from empirical data with penetrance
BayesMendel Lab
Goal
Here we apply the penetrance package to simulated families whose data-generating penetrance function is known and is based on existing penetrance estimates, so that the estimated penetrance can be compared with the true one.
Simulated Data
The data-generating distribution of the age-specific penetrances is based on existing penetrance estimates for colorectal cancer in carriers of any pathogenic variant in MLH1, taken from the PanelPRO database (Lee et al. 2021).
The families were simulated using the PedUtils R package.
library(penetrance)
# Simulated families used in this study
dat <- test_fam2
Simple simulation
We then run the estimation, specifying the prior, the MLH1 allele frequency, and the baseline (non-carrier) penetrance as follows.
# Set the random seed for reproducibility
set.seed(2024)
# Set the prior parameters for the four penetrance-curve components
prior_params <- list(
  asymptote = list(g1 = 1, g2 = 1),
  threshold = list(min = 5, max = 30),
  median = list(m1 = 2, m2 = 2),
  first_quartile = list(q1 = 6, q2 = 3)
)
# Set the allele frequency for MLH1 based on the PanelPRO database
prevMLH1 <- 0.0004453125
# Use the default baseline (non-carrier) penetrance
print(baseline_data_default)
# Run the estimation procedure with one chain of 20,000 iterations,
# discarding the first 10% of the draws as burn-in
out_sim <- penetrance(
  pedigree = dat, twins = NULL, n_chains = 1, n_iter_per_chain = 20000,
  ncores = 2, baseline_data = baseline_data_default, prev = prevMLH1,
  prior_params = prior_params, burn_in = 0.1, median_max = TRUE,
  ageImputation = FALSE, removeProband = FALSE
)
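The prior above is placed on four quantities describing the carrier penetrance curve: an asymptote, a threshold age, a median and a first quartile. As a rough illustration only, assuming a Weibull-type curve that is zero before the threshold age and levels off at the asymptote (an assumption made here, not the package's own code), the median and first quartile can be read as the ages at which the curve reaches one half and one quarter of its asymptote. The helpers `weibull_penetrance()` and `penetrance_quantile_age()`, and the `shape` and `scale` parameters, are hypothetical and are not part of the penetrance package.

```r
# Hypothetical helpers for illustration only; NOT functions of the
# penetrance package. Assumes a Weibull-type curve with a threshold age.

# Cumulative penetrance at a given age: zero before `threshold`,
# rising towards `asymptote` (the lifetime risk) as age increases.
weibull_penetrance <- function(age, asymptote, threshold, shape, scale) {
  t <- pmax(age - threshold, 0)
  asymptote * (1 - exp(-(t / scale)^shape))
}

# Age at which the curve reaches a fraction `p` of its asymptote
# (p = 0.5 gives the median age, p = 0.25 the first-quartile age).
penetrance_quantile_age <- function(p, threshold, shape, scale) {
  threshold + scale * (-log(1 - p))^(1 / shape)
}

# Example with made-up parameter values (not estimates for MLH1)
ages <- seq(20, 90, by = 10)
curve <- weibull_penetrance(ages, asymptote = 0.5, threshold = 25,
                            shape = 2, scale = 30)
median_age <- penetrance_quantile_age(0.5, threshold = 25, shape = 2, scale = 30)
q1_age <- penetrance_quantile_age(0.25, threshold = 25, shape = 2, scale = 30)
print(round(curve, 3))
print(c(first_quartile = q1_age, median = median_age))
```

Under this reading, a prior on the median and first quartile is equivalent to a prior on how early and how steeply penetrance rises after the threshold age, which is easier to elicit than raw shape and scale values.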
References
Lee G, Liang JW, Zhang Q, Huang T, Choirat C, Parmigiani G, Braun D. Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO. eLife. 2021;10:e68699. doi:10.7554/eLife.68699