
A package for penetrance estimation in family-based studies.

This function implements the Independent Metropolis-Hastings algorithm for Bayesian estimation of cancer penetrance. It can run multiple chains in parallel and can optionally return summary statistics, rejection rates, and diagnostic plots of the results.
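To make the sampling idea concrete, here is a minimal sketch of an independence Metropolis-Hastings sampler on a toy problem. It is not the package's implementation: the target (a standard normal), the fixed normal proposal, and every variable name are assumptions chosen for illustration.

```r
# Toy independence Metropolis-Hastings sampler (illustrative only).
# Assumed target: standard normal. Assumed proposal: N(0, 2^2),
# drawn independently of the current state of the chain.
set.seed(1)
log_target   <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)
log_proposal <- function(x) dnorm(x, mean = 0, sd = 2, log = TRUE)

n_iter   <- 5000
samples  <- numeric(n_iter)
current  <- 0
rejected <- 0

for (i in seq_len(n_iter)) {
  candidate <- rnorm(1, mean = 0, sd = 2)
  # Log acceptance ratio for an independence sampler:
  # [target(candidate) / proposal(candidate)] / [target(current) / proposal(current)]
  log_ratio <- (log_target(candidate) - log_proposal(candidate)) -
    (log_target(current) - log_proposal(current))
  if (log(runif(1)) < log_ratio) {
    current <- candidate
  } else {
    rejected <- rejected + 1
  }
  samples[i] <- current
}

# Share of proposals that were rejected, analogous in spirit to a rejection rate
rejection_rate <- rejected / n_iter
```

Because the proposal does not depend on the current state, the acceptance ratio compares importance weights (target over proposal) rather than only target densities.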

Usage

penetrance(
  pedigree,
  twins = NULL,
  n_chains = 1,
  n_iter_per_chain = 10000,
  ncores = 6,
  max_age = 94,
  baseline_data = baseline_data_default,
  remove_proband = FALSE,
  age_imputation = FALSE,
  median_max = TRUE,
  BaselineNC = TRUE,
  var = c(0.1, 0.1, 2, 2, 5, 5, 5, 5),
  burn_in = 0,
  thinning_factor = 1,
  imp_interval = 100,
  distribution_data = distribution_data_default,
  prev = 1e-04,
  sample_size = NULL,
  ratio = NULL,
  prior_params = prior_params_default,
  risk_proportion = risk_proportion_default,
  summary_stats = TRUE,
  rejection_rates = TRUE,
  density_plots = TRUE,
  plot_trace = TRUE,
  penetrance_plot = TRUE,
  penetrance_plot_pdf = TRUE,
  plot_loglikelihood = TRUE,
  plot_acf = TRUE,
  probCI = 0.95,
  sex_specific = TRUE
)

Arguments

pedigree

A data frame containing the pedigree data in the required format. It should include the following columns:

  • PedigreeID: A numeric value representing the unique identifier for each family. All members of a family share the same value, and no two families should share an identifier.

  • ID: A numeric value representing the unique identifier for each individual. There should be no duplicated entries.

  • Sex: A numeric value where 0 indicates female and 1 indicates male. Unknown sex needs to be coded as NA.

  • MotherID: A numeric value representing the unique identifier for an individual's mother.

  • FatherID: A numeric value representing the unique identifier for an individual's father.

  • isProband: A numeric value where 1 indicates the individual is a proband and 0 otherwise.

  • CurAge: A numeric value indicating the age of censoring (current age if the person is alive or age at death if the person is deceased). Allowed ages range from 1 to 94. Unknown ages can be left empty or coded as NA.

  • isAff: A numeric value indicating cancer affection status: 1 for diagnosed individuals, 0 for unaffected individuals, and NA for unknown status.

  • Age: A numeric value indicating the age of cancer diagnosis, encoded as NA if the individual was not diagnosed. Allowed ages range from 1 to 94. Unknown ages can be left empty or coded as NA.

  • geno: A column for germline testing or tumor marker testing results. Positive results should be coded as 1, negative results as 0, and unknown results as NA or left empty.
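The column rules above can be checked before running the sampler. The following is a rough sketch of such a check; `check_pedigree_columns` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not a package function): checks a pedigree data frame
# against the documented column rules and returns a character vector of problems.
check_pedigree_columns <- function(ped, max_age = 94) {
  required <- c("PedigreeID", "ID", "Sex", "MotherID", "FatherID",
                "isProband", "CurAge", "isAff", "Age", "geno")
  missing_cols <- setdiff(required, names(ped))
  if (length(missing_cols) > 0) {
    return(paste("missing column:", missing_cols))
  }

  # NA is accepted wherever the documentation allows unknown values.
  coded_ok <- function(x, allowed) all(is.na(x) | x %in% allowed)
  age_ok   <- function(x) all(is.na(x) | (x >= 1 & x <= max_age))

  problems <- character(0)
  if (anyDuplicated(ped$ID) > 0) {
    problems <- c(problems, "ID has duplicated entries")
  }
  if (!coded_ok(ped$Sex, c(0, 1))) {
    problems <- c(problems, "Sex must be 0, 1, or NA")
  }
  if (!all(ped$isProband %in% c(0, 1))) {
    problems <- c(problems, "isProband must be 0 or 1")
  }
  if (!coded_ok(ped$isAff, c(0, 1))) {
    problems <- c(problems, "isAff must be 0, 1, or NA")
  }
  if (!coded_ok(ped$geno, c(0, 1))) {
    problems <- c(problems, "geno must be 0, 1, or NA")
  }
  if (!age_ok(ped$CurAge)) {
    problems <- c(problems, "CurAge outside allowed range")
  }
  if (!age_ok(ped$Age)) {
    problems <- c(problems, "Age outside allowed range")
  }
  problems
}

# Usage on a small four-person family
ped <- data.frame(
  PedigreeID = rep(1, 4), ID = 1:4, Sex = c(1, 0, 1, 0),
  MotherID = c(NA, NA, 2, 2), FatherID = c(NA, NA, 1, 1),
  isProband = c(0, 0, 1, 0), CurAge = c(70, 68, 45, 42),
  isAff = c(0, 0, 1, 0), Age = c(NA, NA, 40, NA), geno = c(NA, NA, 1, NA)
)
issues <- check_pedigree_columns(ped)  # empty character vector when all rules hold
```

Returning a vector of messages rather than stopping at the first failure lets all coding problems in a family be fixed in one pass.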

twins

A list specifying identical twins or triplets in the family. For example, to indicate that "ora024" and "ora027" are identical twins, and "aey063" and "aey064" are identical twins, use the following format: twins <- list(c("ora024", "ora027"), c("aey063", "aey064")).

n_chains

Integer, the number of chains for parallel computation. Default is 1.

n_iter_per_chain

Integer, the number of iterations for each chain. Default is 10000.

ncores

Integer, the number of cores for parallel computation. Default is 6.

max_age

Integer, the maximum age considered for analysis. Default is 94.

baseline_data

Data for the baseline risk estimates (probability of developing cancer), such as population-level risk from a cancer registry. The default data, provided as an example, are for colorectal cancer from the SEER database.

remove_proband

Logical, indicating whether to remove probands from the analysis. Default is FALSE.

age_imputation

Logical, indicating whether to perform age imputation. Default is FALSE.

median_max

Logical, indicating whether to use the baseline median age or max_age as an upper bound for the median proposal. Default is TRUE.

BaselineNC

Logical, indicating that the non-carrier penetrance is assumed to be the baseline penetrance. Default is TRUE.

var

Numeric vector, variances for the proposal distribution in the Metropolis-Hastings algorithm. Default is c(0.1, 0.1, 2, 2, 5, 5, 5, 5).

burn_in

Numeric, the fraction of results to discard as burn-in (0 to 1). Default is 0 (no burn-in).

thinning_factor

Integer, the factor by which to thin the results. Default is 1 (no thinning).
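As an illustration of how burn_in and thinning_factor interact, the sketch below applies a burn-in fraction and then a thinning factor to a vector of draws. The helper `apply_burn_in_and_thinning` is hypothetical, and the exact order and rounding used inside the package are assumptions here.

```r
# Hypothetical helper (not a package function): discard a leading fraction of
# draws as burn-in, then keep every thinning_factor-th remaining draw.
apply_burn_in_and_thinning <- function(draws, burn_in = 0, thinning_factor = 1) {
  stopifnot(burn_in >= 0, burn_in < 1, thinning_factor >= 1)
  n_discard <- floor(burn_in * length(draws))
  kept <- draws[seq_along(draws) > n_discard]
  kept[seq(1, length(kept), by = thinning_factor)]
}

draws <- 1:10000  # stand-in for 10000 iterations of one chain
kept <- apply_burn_in_and_thinning(draws, burn_in = 0.1, thinning_factor = 10)
length(kept)  # 900: the first 1000 draws are dropped, then every 10th is kept
```

With the defaults (burn_in = 0, thinning_factor = 1) the helper returns all draws unchanged.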

imp_interval

Integer, the interval (in iterations) at which age imputation should be performed when age_imputation = TRUE. Default is 100.

distribution_data

Data for generating prior distributions.

prev

Numeric, prevalence of the carrier status. Default is 0.0001.

sample_size

Optional numeric, sample size for distribution generation. Default is NULL.

ratio

Optional numeric, ratio parameter for distribution generation. Default is NULL.

prior_params

List, parameters for prior distributions.

risk_proportion

Numeric, proportion of risk for distribution generation.

summary_stats

Logical, indicating whether to include summary statistics in the output. Default is TRUE.

rejection_rates

Logical, indicating whether to include rejection rates in the output. Default is TRUE.

density_plots

Logical, indicating whether to include density plots in the output. Default is TRUE.

plot_trace

Logical, indicating whether to include trace plots in the output. Default is TRUE.

penetrance_plot

Logical, indicating whether to include penetrance plots in the output. Default is TRUE.

penetrance_plot_pdf

Logical, indicating whether to include PDF plots in the output. Default is TRUE.

plot_loglikelihood

Logical, indicating whether to include log-likelihood plots in the output. Default is TRUE.

plot_acf

Logical, indicating whether to include autocorrelation function (ACF) plots for posterior samples. Default is TRUE.

probCI

Numeric, probability level for credible intervals in penetrance plots. Must be between 0 and 1. Default is 0.95.
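For intuition about what a probability level such as probCI = 0.95 means, the sketch below computes an equal-tailed credible interval from simulated draws. Whether the package uses equal-tailed intervals is an assumption, and the Beta draws are only a stand-in for real posterior samples.

```r
# Illustrative only: equal-tailed credible interval from posterior draws.
set.seed(42)
posterior_draws <- rbeta(2000, shape1 = 20, shape2 = 30)  # stand-in samples

probCI <- 0.95
alpha  <- 1 - probCI
# Lower and upper bounds leave alpha / 2 of the draws in each tail
ci <- quantile(posterior_draws, probs = c(alpha / 2, 1 - alpha / 2))
```

A larger probCI widens the interval, since less probability mass is left in the tails.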

sex_specific

Logical, indicating whether to use sex-specific parameters in the analysis. Default is TRUE.

Value

A list containing combined results from all chains, including optional statistics and plots.

Author

Maintainer: Nicolas Kubista <bmendel@jimmy.harvard.edu>

Authors:

  • BayesMendel Lab

Examples

# Create example baseline data (simplified for demonstration)
baseline_data_default <- data.frame(
  Age = 1:94,
  Female = rep(0.01, 94),
  Male = rep(0.01, 94)
)

# Create example distribution data
distribution_data_default <- data.frame(
  Age = 1:94,
  Risk = rep(0.01, 94)
)

# Create example prior parameters
prior_params_default <- list(
  shape = 2,
  scale = 50
)

# Create example risk proportion
risk_proportion_default <- 0.5

# Create a simple example pedigree
example_pedigree <- data.frame(
  PedigreeID = rep(1, 4),
  ID = 1:4,
  Sex = c(1, 0, 1, 0),  # 1 for male, 0 for female
  MotherID = c(NA, NA, 2, 2),
  FatherID = c(NA, NA, 1, 1),
  isProband = c(0, 0, 1, 0),
  CurAge = c(70, 68, 45, 42),
  isAff = c(0, 0, 1, 0),
  Age = c(NA, NA, 40, NA),
  geno = c(NA, NA, 1, NA)
)

# Basic usage with minimal iterations
result <- penetrance(
  pedigree = list(example_pedigree),
  n_chains = 1,
  n_iter_per_chain = 10,  # Very small number for example
  ncores = 1,             # Single core for example
  summary_stats = TRUE,
  plot_trace = FALSE,     # Disable plots for quick example
  density_plots = FALSE,
  penetrance_plot = FALSE,
  penetrance_plot_pdf = FALSE,
  plot_loglikelihood = FALSE,
  plot_acf = FALSE
)
#> Rejection rates:  0.6 
#>   Median_Male    Median_Female   Threshold_Male  Threshold_Female
#>  Min.   :39.91   Min.   :49.86   Min.   :34.87   Min.   :34.79   
#>  1st Qu.:40.00   1st Qu.:50.00   1st Qu.:34.94   1st Qu.:34.83   
#>  Median :40.00   Median :50.00   Median :35.00   Median :35.00   
#>  Mean   :39.99   Mean   :49.99   Mean   :34.97   Mean   :34.92   
#>  3rd Qu.:40.00   3rd Qu.:50.00   3rd Qu.:35.00   3rd Qu.:35.00   
#>  Max.   :40.08   Max.   :50.04   Max.   :35.00   Max.   :35.00   
#>  First_Quartile_Male First_Quartile_Female Asymptote_Male   Asymptote_Female
#>  Min.   :38.92       Min.   :39.75         Min.   :0.4071   Min.   :0.4181  
#>  1st Qu.:38.99       1st Qu.:39.84         1st Qu.:0.4690   1st Qu.:0.4196  
#>  Median :39.00       Median :40.00         Median :0.4999   Median :0.4196  
#>  Mean   :38.98       Mean   :39.92         Mean   :0.4808   Mean   :0.4422  
#>  3rd Qu.:39.00       3rd Qu.:40.00         3rd Qu.:0.4999   3rd Qu.:0.4343  
#>  Max.   :39.00       Max.   :40.00         Max.   :0.5204   Max.   :0.5455  

# View basic results
head(result$summary_stats)
#>   Median_Male    Median_Female   Threshold_Male  Threshold_Female
#>  Min.   :39.91   Min.   :49.86   Min.   :34.87   Min.   :34.79   
#>  1st Qu.:40.00   1st Qu.:50.00   1st Qu.:34.94   1st Qu.:34.83   
#>  Median :40.00   Median :50.00   Median :35.00   Median :35.00   
#>  Mean   :39.99   Mean   :49.99   Mean   :34.97   Mean   :34.92   
#>  3rd Qu.:40.00   3rd Qu.:50.00   3rd Qu.:35.00   3rd Qu.:35.00   
#>  Max.   :40.08   Max.   :50.04   Max.   :35.00   Max.   :35.00   
#>  First_Quartile_Male First_Quartile_Female Asymptote_Male   Asymptote_Female
#>  Min.   :38.92       Min.   :39.75         Min.   :0.4071   Min.   :0.4181  
#>  1st Qu.:38.99       1st Qu.:39.84         1st Qu.:0.4690   1st Qu.:0.4196  
#>  Median :39.00       Median :40.00         Median :0.4999   Median :0.4196  
#>  Mean   :38.98       Mean   :39.92         Mean   :0.4808   Mean   :0.4422  
#>  3rd Qu.:39.00       3rd Qu.:40.00         3rd Qu.:0.4999   3rd Qu.:0.4343  
#>  Max.   :39.00       Max.   :40.00         Max.   :0.5204   Max.   :0.5455