Bayesian Inference using Independent Metropolis-Hastings for Penetrance Estimation
Source: R/PenEstimMain.R (penetrance.Rd)
This function implements the Independent Metropolis-Hastings algorithm for Bayesian estimation of cancer penetrance. It can run multiple chains in parallel and can optionally return summary statistics, rejection rates, and density, trace, and penetrance plots of the results.
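For readers unfamiliar with the sampler named above, the following is a generic sketch of an independent (independence) Metropolis-Hastings sampler on a toy one-dimensional target. It is not the package's implementation: the standard normal target, the N(0, 2^2) proposal, and the function name independence_mh are illustrative assumptions. Because the proposal does not depend on the current state, the acceptance ratio must include the proposal densities at both the current and the proposed value. The sketch also tallies a rejection rate, similar in spirit to the output requested with rejection_rates = TRUE.

```r
set.seed(1)

# Toy target: standard normal log-density (stand-in for a log-posterior).
log_target <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

# Independent proposal: N(0, 2^2), drawn without reference to the current state.
prop_mean <- 0
prop_sd <- 2
log_prop <- function(x) dnorm(x, mean = prop_mean, sd = prop_sd, log = TRUE)

# Hypothetical helper: one chain of an independence Metropolis-Hastings sampler.
independence_mh <- function(n_iter, x0 = 0) {
  draws <- numeric(n_iter)
  rejected <- 0L
  x <- x0
  for (i in seq_len(n_iter)) {
    y <- rnorm(1, mean = prop_mean, sd = prop_sd)
    # log of [pi(y) q(x)] / [pi(x) q(y)]
    log_alpha <- (log_target(y) - log_target(x)) + (log_prop(x) - log_prop(y))
    if (log(runif(1)) < log_alpha) {
      x <- y                    # accept the proposal
    } else {
      rejected <- rejected + 1L # keep the current state
    }
    draws[i] <- x
  }
  list(draws = draws, rejection_rate = rejected / n_iter)
}

res <- independence_mh(5000)
c(mean = mean(res$draws), sd = sd(res$draws), rejection_rate = res$rejection_rate)
```

A proposal with heavier tails than the target, as here, keeps the importance ratio bounded; a proposal that is too narrow would leave the tails of the target poorly explored.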
Usage
penetrance(
pedigree,
twins = NULL,
n_chains = 1,
n_iter_per_chain = 10000,
ncores = 6,
max_age = 94,
baseline_data = baseline_data_default,
removeProband = FALSE,
ageImputation = FALSE,
median_max = TRUE,
BaselineNC = TRUE,
var = c(0.1, 0.1, 2, 2, 5, 5, 5, 5),
burn_in = 0,
thinning_factor = 1,
distribution_data = distribution_data_default,
af = 1e-04,
max_penetrance = 1,
sample_size = NULL,
ratio = NULL,
prior_params = prior_params_default,
risk_proportion = risk_proportion_default,
summary_stats = TRUE,
rejection_rates = TRUE,
density_plots = TRUE,
plot_trace = TRUE,
penetrance_plot = TRUE,
penetrance_plot_pdf = TRUE,
probCI = 0.95,
sex_specific = TRUE
)
Arguments
- pedigree
A data frame containing the pedigree data in the required format. It must include the following columns:
PedigreeID: A numeric identifier for each family. Each family must have a distinct identifier, and all members of the same family share it.
ID: A numeric value uniquely identifying each individual. There should be no duplicated entries.
Sex: A numeric value, 0 for female and 1 for male. Missing entries are not currently supported.
MotherID: A numeric value giving the ID of the individual's mother.
FatherID: A numeric value giving the ID of the individual's father.
isProband: A numeric value, 1 if the individual is a proband and 0 otherwise.
CurAge: A numeric value giving the age of censoring (current age if the individual is alive, or age at death if deceased). Allowed ages range from 1 to 94.
isAff: A numeric value giving the cancer affection status, 1 for diagnosed individuals and 0 otherwise. Missing entries are not supported.
Age: A numeric value giving the age at cancer diagnosis, coded as NA if the individual was not diagnosed. Allowed ages range from 1 to 94.
Geno: Results of germline testing or tumor marker testing. Positive results are coded as 1, negative results as 0, and unknown results as NA or left empty.
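A minimal sketch of a pedigree in the column layout described above, with a hypothetical helper that checks only the constraints stated in that list. The family, its values, and the name check_pedigree are illustrative assumptions, not part of the package; in particular, coding founders' MotherID and FatherID as NA is an assumption of this sketch, so check the package's own expectations for founders.

```r
# Hypothetical three-person family: two parents and an affected daughter.
# Founders' parent IDs are NA here (an assumption of this sketch).
pedigree <- data.frame(
  PedigreeID = c(1, 1, 1),
  ID         = c(101, 102, 103),
  Sex        = c(1, 0, 0),        # 0 = female, 1 = male
  MotherID   = c(NA, NA, 102),
  FatherID   = c(NA, NA, 101),
  isProband  = c(0, 0, 1),
  CurAge     = c(70, 68, 45),
  isAff      = c(0, 1, 1),
  Age        = c(NA, 60, 42),     # NA when not diagnosed
  Geno       = c(NA, 1, 1)        # 1 = positive, 0 = negative, NA = unknown
)

# Hypothetical helper: stops with a message if a documented constraint fails.
check_pedigree <- function(ped, min_age = 1, max_age = 94) {
  required <- c("PedigreeID", "ID", "Sex", "MotherID", "FatherID",
                "isProband", "CurAge", "isAff", "Age", "Geno")
  missing_cols <- setdiff(required, names(ped))
  if (length(missing_cols) > 0)
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  if (anyDuplicated(ped$ID)) stop("duplicated individual IDs")
  if (anyNA(ped$Sex) || !all(ped$Sex %in% c(0, 1)))
    stop("Sex must be 0 (female) or 1 (male) with no missing values")
  if (!all(ped$isProband %in% c(0, 1)))
    stop("isProband must be 0 or 1")
  if (anyNA(ped$isAff) || !all(ped$isAff %in% c(0, 1)))
    stop("isAff must be 0 or 1 with no missing values")
  if (any(!is.na(ped$Age[ped$isAff == 0])))
    stop("Age must be NA for undiagnosed individuals")
  ages <- c(ped$CurAge, ped$Age)
  ages <- ages[!is.na(ages)]
  if (any(ages < min_age | ages > max_age))
    stop("ages must lie between ", min_age, " and ", max_age)
  if (!all(is.na(ped$Geno) | ped$Geno %in% c(0, 1)))
    stop("Geno must be 0, 1, or NA")
  invisible(TRUE)
}

check_pedigree(pedigree)
```

Running such a check before calling penetrance() surfaces coding errors (for example, a Sex value of 2) as a readable message.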
- twins
A list specifying identical twins or triplets in the pedigree. For example, to indicate that "ora024" and "ora027" are identical twins, and that "aey063" and "aey064" are identical twins, use:
twins <- list(c("ora024", "ora027"), c("aey063", "aey064"))
- n_chains
Integer, the number of chains for parallel computation. Default is 1.
- n_iter_per_chain
Integer, the number of iterations for each chain. Default is 10000.
- ncores
Integer, the number of cores for parallel computation. Default is 6.
- max_age
Integer, the maximum age considered for analysis. Default is 94.
- baseline_data
Data for the baseline risk estimates (the probability of developing cancer), such as population-level risk from a cancer registry. The default data, provided for illustration, are colorectal cancer risks from the SEER database.
- removeProband
Logical, indicating whether to remove probands from the analysis. Default is FALSE.
- ageImputation
Logical, indicating whether to perform age imputation. Default is FALSE.
- median_max
Logical, indicating whether to use the baseline median age or max_age as the upper bound for the median proposal. Default is TRUE.
- BaselineNC
Logical, indicating whether the non-carrier penetrance is assumed to equal the baseline penetrance. Default is TRUE.
- var
Numeric vector of variances for the proposal distribution in the Metropolis-Hastings algorithm. Default is c(0.1, 0.1, 2, 2, 5, 5, 5, 5).
- burn_in
Numeric, the fraction of results to discard as burn-in (0 to 1). Default is 0 (no burn-in).
- thinning_factor
Integer, the factor by which to thin the results. Default is 1 (no thinning).
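A minimal sketch of the usual convention for a fractional burn-in followed by thinning, applied to the draws of a single chain. The helper name burn_and_thin is hypothetical, and whether the package applies burn-in per chain or to pooled draws, and how it rounds the cut point, are assumptions not confirmed here.

```r
# Hypothetical helper (not the package's code): drop the first burn_in
# fraction of draws, then keep every thinning_factor-th remaining draw.
burn_and_thin <- function(draws, burn_in = 0, thinning_factor = 1) {
  stopifnot(burn_in >= 0, burn_in < 1, thinning_factor >= 1)
  n <- length(draws)
  n_burn <- floor(burn_in * n)                      # draws discarded as burn-in
  kept <- draws[(n_burn + 1):n]                     # post-burn-in draws
  kept[seq(1, length(kept), by = thinning_factor)]  # thin: every k-th draw
}

draws <- 1:10000
out <- burn_and_thin(draws, burn_in = 0.1, thinning_factor = 10)
length(out)
```

With 10000 iterations, burn_in = 0.1 discards the first 1000 draws and thinning_factor = 10 keeps 900 of the remaining 9000.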
- distribution_data
Data used to generate the prior distributions. Default is distribution_data_default.
- af
Numeric, allele frequency for the risk allele. Default is 0.0001.
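To give a sense of scale for af, the sketch below converts an allele frequency into genotype frequencies assuming Hardy-Weinberg equilibrium. Whether the package uses exactly this model for founder genotypes is an assumption not stated in this page; the arithmetic itself is standard.

```r
af <- 1e-04

# Genotype frequencies under Hardy-Weinberg equilibrium (an assumption here).
p_noncarrier   <- (1 - af)^2
p_heterozygous <- 2 * af * (1 - af)
p_homozygous   <- af^2

# Probability of carrying at least one risk allele: 1 - (1 - af)^2.
carrier_freq <- p_heterozygous + p_homozygous
carrier_freq
```

With the default af = 1e-04, roughly 2 in 10,000 individuals carry at least one copy of the risk allele, almost all of them heterozygous.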
- max_penetrance
Numeric, the maximum penetrance considered for analysis. Default is 1.
- sample_size
Optional numeric, sample size for distribution generation. Default is NULL.
- ratio
Optional numeric, ratio parameter for distribution generation. Default is NULL.
- prior_params
List of parameters for the prior distributions. Default is prior_params_default.
- risk_proportion
Numeric, proportion of risk for distribution generation. Default is risk_proportion_default.
- summary_stats
Logical, indicating whether to include summary statistics in the output. Default is TRUE.
- rejection_rates
Logical, indicating whether to include rejection rates in the output. Default is TRUE.
- density_plots
Logical, indicating whether to include density plots in the output. Default is TRUE.
- plot_trace
Logical, indicating whether to include trace plots in the output. Default is TRUE.
- penetrance_plot
Logical, indicating whether to include penetrance plots in the output. Default is TRUE.
- penetrance_plot_pdf
Logical, indicating whether to include PDF plots in the output. Default is TRUE.
- probCI
Numeric, probability level for credible intervals in penetrance plots. Must be between 0 and 1. Default is 0.95.
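As an illustration of what probCI controls, the sketch below computes an equal-tailed credible interval from posterior draws with stats::quantile(). The beta-distributed draws are a hypothetical stand-in for posterior penetrance at a single age, and the equal-tailed construction is an assumption; the package may build its intervals differently.

```r
set.seed(42)

# Hypothetical posterior draws of penetrance at one age.
draws <- rbeta(4000, shape1 = 2, shape2 = 8)

probCI <- 0.95
alpha <- 1 - probCI

# Equal-tailed interval: cut alpha / 2 of the mass from each tail.
ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

Raising probCI widens the interval, since less probability mass is cut from each tail.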
- sex_specific
Logical, indicating whether to use sex-specific parameters in the analysis. Default is TRUE.