This function implements the Independent Metropolis-Hastings algorithm for Bayesian estimation of cancer penetrance. It can run multiple chains in parallel and offers options for summarizing and plotting the results.
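For readers unfamiliar with the sampler named above, here is a minimal sketch of an independence Metropolis-Hastings step on a toy problem. It is not the package's implementation: the Beta(3, 7) target, the Uniform(0, 1) proposal, and the helper name `independent_mh` are all assumptions chosen for illustration. The defining feature is that each candidate is drawn without reference to the current state, so the acceptance ratio compares the target-to-proposal weights of the candidate and the current value.

```r
# Toy independence Metropolis-Hastings sampler (illustrative only;
# independent_mh is a hypothetical helper, not part of the package).
set.seed(1)

independent_mh <- function(n_iter, log_target, r_proposal, log_proposal, init) {
  draws <- numeric(n_iter)
  current <- init
  accepted <- 0
  for (i in seq_len(n_iter)) {
    # The candidate does not depend on the current state.
    candidate <- r_proposal()
    # Log acceptance ratio: [pi(y) / q(y)] / [pi(x) / q(x)]
    log_ratio <- (log_target(candidate) - log_proposal(candidate)) -
      (log_target(current) - log_proposal(current))
    if (log(runif(1)) < log_ratio) {
      current <- candidate
      accepted <- accepted + 1
    }
    draws[i] <- current
  }
  list(draws = draws, rejection_rate = 1 - accepted / n_iter)
}

res <- independent_mh(
  n_iter       = 5000,
  log_target   = function(x) dbeta(x, 3, 7, log = TRUE),  # assumed toy target
  r_proposal   = function() runif(1),                     # assumed toy proposal
  log_proposal = function(x) dunif(x, log = TRUE),
  init         = 0.5
)

mean(res$draws)      # should land near the Beta(3, 7) mean of 0.3
res$rejection_rate   # share of candidates that were rejected
```

A high rejection rate in a sampler of this kind usually means the proposal is a poor match for the target.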

Usage

penetrance(
  pedigree,
  twins = NULL,
  n_chains = 1,
  n_iter_per_chain = 10000,
  ncores = 6,
  max_age = 94,
  baseline_data = baseline_data_default,
  removeProband = FALSE,
  ageImputation = FALSE,
  median_max = TRUE,
  BaselineNC = TRUE,
  var = c(0.1, 0.1, 2, 2, 5, 5, 5, 5),
  burn_in = 0,
  thinning_factor = 1,
  distribution_data = distribution_data_default,
  af = 1e-04,
  max_penetrance = 1,
  sample_size = NULL,
  ratio = NULL,
  prior_params = prior_params_default,
  risk_proportion = risk_proportion_default,
  summary_stats = TRUE,
  rejection_rates = TRUE,
  density_plots = TRUE,
  plot_trace = TRUE,
  penetrance_plot = TRUE,
  penetrance_plot_pdf = TRUE,
  probCI = 0.95,
  sex_specific = TRUE
)

Arguments

pedigree

A data frame containing the pedigree data in the required format. It should include the following columns:

  • PedigreeID: A numeric value identifying the family an individual belongs to. Each family should have its own unique identifier.

  • ID: A numeric value representing the unique identifier for each individual. There should be no duplicated entries.

  • Sex: A numeric value where 0 indicates female and 1 indicates male. Missing entries are not currently supported.

  • MotherID: A numeric value representing the unique identifier for an individual's mother.

  • FatherID: A numeric value representing the unique identifier for an individual's father.

  • isProband: A numeric value where 1 indicates the individual is a proband and 0 otherwise.

  • CurAge: A numeric value indicating the age of censoring (current age if the person is alive or age at death if the person is deceased). Allowed ages range from 1 to 94.

  • isAff: A numeric value indicating the affection status of cancer, with 1 for diagnosed individuals and 0 otherwise. Missing entries are not supported.

  • Age: A numeric value indicating the age of cancer diagnosis, encoded as NA if the individual was not diagnosed. Allowed ages range from 1 to 94.

  • Geno: A column for germline testing or tumor marker testing results. Positive results should be coded as 1, negative results as 0, and unknown results as NA or left empty.
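The column layout above can be sketched as a small data frame. This is a made-up four-person family for illustration only: the values are invented, all rows are assumed to share one PedigreeID because they belong to the same family, and coding the founders' MotherID and FatherID as NA is an assumption rather than something this page specifies.

```r
# Hypothetical single-family pedigree with the documented columns.
pedigree_example <- data.frame(
  PedigreeID = c(1, 1, 1, 1),
  ID         = c(1, 2, 3, 4),
  Sex        = c(1, 0, 0, 1),      # 0 = female, 1 = male
  MotherID   = c(NA, NA, 2, 2),    # NA for founders is an assumption
  FatherID   = c(NA, NA, 1, 1),
  isProband  = c(0, 0, 1, 0),
  CurAge     = c(70, 68, 45, 42),  # age at censoring
  isAff      = c(0, 1, 1, 0),
  Age        = c(NA, 55, 40, NA),  # age at diagnosis, NA if not diagnosed
  Geno       = c(NA, NA, 1, 0)     # 1 = positive, 0 = negative, NA = unknown
)

# Basic checks against the constraints listed above.
with(pedigree_example, stopifnot(
  !anyDuplicated(ID),
  all(Sex %in% c(0, 1)),
  all(isProband %in% c(0, 1)),
  all(isAff %in% c(0, 1)),
  all(CurAge >= 1 & CurAge <= 94),
  all(Age >= 1 & Age <= 94, na.rm = TRUE),
  all(is.na(Age[isAff == 0])),
  all(Geno %in% c(0, 1, NA))
))
```

Running checks like these before fitting can catch coding mistakes, such as a missing Sex value, that the function does not support.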

twins

A list specifying identical twins or triplets in the family. For example, to indicate that "ora024" and "ora027" are identical twins, and "aey063" and "aey064" are identical twins, use the following format: twins <- list(c("ora024", "ora027"), c("aey063", "aey064")).

n_chains

Integer, the number of chains for parallel computation. Default is 1.

n_iter_per_chain

Integer, the number of iterations for each chain. Default is 10000.

ncores

Integer, the number of cores for parallel computation. Default is 6.

max_age

Integer, the maximum age considered for analysis. Default is 94.

baseline_data

Data for the baseline risk estimates (probability of developing cancer), such as population-level risk from a cancer registry. The default, provided for illustration, is colorectal cancer data from the SEER database.

removeProband

Logical, indicating whether to remove probands from the analysis. Default is FALSE.

ageImputation

Logical, indicating whether to perform age imputation. Default is FALSE.

median_max

Logical, indicating whether to use the baseline median age or max_age as an upper bound for the median proposal. Default is TRUE.

BaselineNC

Logical, indicating that the non-carrier penetrance is assumed to be the baseline penetrance. Default is TRUE.

var

Numeric vector, variances for the proposal distribution in the Metropolis-Hastings algorithm. Default is c(0.1, 0.1, 2, 2, 5, 5, 5, 5).

burn_in

Numeric, the fraction of results to discard as burn-in (0 to 1). Default is 0 (no burn-in).

thinning_factor

Integer, the factor by which to thin the results. Default is 1 (no thinning).
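To make the interaction of burn_in and thinning_factor concrete, the sketch below applies a fractional burn-in and then keeps every k-th remaining draw. It illustrates the usual meaning of these two settings and is an assumption about the order of operations, not a copy of the package's internal code; `apply_burn_thin` is a hypothetical helper.

```r
# Hypothetical helper: drop a leading fraction of draws, then thin.
apply_burn_thin <- function(chain, burn_in = 0, thinning_factor = 1) {
  stopifnot(burn_in >= 0, burn_in < 1, thinning_factor >= 1)
  n_discard <- floor(burn_in * length(chain))
  # Logical indexing stays correct when n_discard is 0.
  kept <- chain[seq_along(chain) > n_discard]
  kept[seq(1, length(kept), by = thinning_factor)]
}

chain <- seq_len(10000)  # stand-in for one chain of 10000 draws
thinned <- apply_burn_thin(chain, burn_in = 0.1, thinning_factor = 5)

length(thinned)  # 1800: 9000 draws survive burn-in, every 5th is kept
head(thinned, 3) # 1001 1006 1011
```

With the defaults (burn_in = 0, thinning_factor = 1) the chain is returned unchanged.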

distribution_data

Data for generating prior distributions.

af

Numeric, allele frequency for the risk allele. Default is 0.0001.

max_penetrance

Numeric, the maximum penetrance considered for analysis. Default is 1.

sample_size

Optional numeric, sample size for distribution generation.

ratio

Optional numeric, ratio parameter for distribution generation.

prior_params

List, parameters for prior distributions.

risk_proportion

Numeric, proportion of risk for distribution generation.

summary_stats

Logical, indicating whether to include summary statistics in the output. Default is TRUE.

rejection_rates

Logical, indicating whether to include rejection rates in the output. Default is TRUE.

density_plots

Logical, indicating whether to include density plots in the output. Default is TRUE.

plot_trace

Logical, indicating whether to include trace plots in the output. Default is TRUE.

penetrance_plot

Logical, indicating whether to include penetrance plots in the output. Default is TRUE.

penetrance_plot_pdf

Logical, indicating whether to include PDF plots in the output. Default is TRUE.

probCI

Numeric, probability level for credible intervals in penetrance plots. Must be between 0 and 1. Default is 0.95.
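As an illustration of what a probability level of 0.95 means, the sketch below computes an equal-tailed credible interval from posterior draws. The draws are simulated from a Beta(3, 7) distribution purely as a stand-in, and the use of an equal-tailed interval is an assumption; this page does not state which interval type the plots use.

```r
set.seed(42)
draws <- rbeta(4000, 3, 7)  # stand-in for posterior draws of one quantity

probCI <- 0.95
alpha <- 1 - probCI

# Equal-tailed interval: cut alpha / 2 from each tail.
ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2))
ci

# Share of draws inside the interval; close to probCI by construction.
mean(draws >= ci[1] & draws <= ci[2])
```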

sex_specific

Logical, indicating whether to use sex-specific parameters in the analysis. Default is TRUE.

Value

A list containing combined results from all chains, including optional statistics and plots.