Title: | Covariates, Interaction, and Selection for DGLM |
---|---|
Description: | An implementation of double generalized linear model (DGLM) building with variable selection procedures and handling of interaction terms and other complex situations. We also provide a method of handling convergence issues within the dglm() function. The package offers a simulation function for generating simulated data for testing purposes and utilizes the forward stepwise variable selection procedure in model-building. It also provides a new custom bootstrap function for mean and standard deviation estimation and functions for building crossplots and squareplots from a data set. |
Authors: | Ann Stapleton [aut], Yishi Wang [aut, cre], Kaitlyn Hohmeier [aut], Jordan Tanley [aut] |
Maintainer: | Yishi Wang <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2025-01-30 03:36:02 UTC |
Source: | https://github.com/cran/CIS.DGLM |
This function implements a custom bootstrap procedure to estimate the mean and standard deviation (SD) of stress for two environment states (A and B).
bootstrap( dataset, n.boot = 10^5, variables, stress_variable, alpha = 0.05, ran.seed = 12345 )
dataset |
Data set to be utilized. |
n.boot |
Number of bootstraps to perform. Defaults to 10^5. |
variables |
List of variables from mean and variance models in DGLM. |
stress_variable |
Name of the variable with the stress values. |
alpha |
Significance level by which to determine the confidence intervals for the bootstrap estimates. Defaults to 0.05, thus creating the 95 percent confidence intervals. |
ran.seed |
Random seed value for generating different random bootstrap samples. Defaults to 12345. |
Lists of confidence intervals for the bootstrap estimates of average stress in As and Bs for the variables in the mean model, and of the standard deviation of stress in As and Bs for the variables in the variance model.
# Simulate a small data set and bootstrap all non-response columns
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
variables <- colnames(test.data[-1])
bootstrap(test.data, n.boot = 100, variables, 'stress')
# Remove the text files written by bootstrap()
unlink(c('bootstrap mean A stress.txt', 'bootstrap mean B stress.txt',
         'bootstrap sd A stress.txt', 'bootstrap sd B stress.txt'))
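As a rough illustration of the idea behind such intervals, the sketch below computes percentile bootstrap confidence intervals for the mean and SD of a response in two groups, using base R only. The helper `boot_ci`, the toy data, and the choice of the percentile method are assumptions made for illustration; bootstrap() in this package may compute its intervals differently.

```r
# Percentile bootstrap for a statistic of a numeric vector.
# `boot_ci` is a hypothetical helper, not part of CIS.DGLM.
boot_ci <- function(x, stat, n.boot = 1000, alpha = 0.05) {
  est <- replicate(n.boot, stat(sample(x, replace = TRUE)))
  quantile(est, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(12345)
toy <- data.frame(
  stress = c(rnorm(40, mean = 36, sd = 2), rnorm(40, mean = 40, sd = 4)),
  locus  = rep(c(0, 1), each = 40)   # assumed coding: 0 = level "A", 1 = level "B"
)

stress_A <- toy$stress[toy$locus == 0]
stress_B <- toy$stress[toy$locus == 1]

# One row per estimate: lower and upper confidence limits
ci <- rbind(
  mean_A = boot_ci(stress_A, mean),
  mean_B = boot_ci(stress_B, mean),
  sd_A   = boot_ci(stress_A, sd),
  sd_B   = boot_ci(stress_B, sd)
)
colnames(ci) <- c("lower", "upper")
print(ci)
```

Lowering `alpha` widens the intervals, and a larger `n.boot` makes the limits more stable at the cost of run time.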
This function draws crossplots of the mean estimate against the standard deviation estimate for As and Bs of each variable in the mean and variance models.
draw.crossplots( fn.mean.A, fn.mean.B, fn.sd.A, fn.sd.B, fn.pe.mean, fn.pe.sd, variables, ishybrid, num.vars )
fn.mean.A |
Name of the file with confidence intervals of mean stress for environment level A. Can be from the hybrid, inbred, or full data set. This file must be produced by the bootstrap function, run with the desired data set. |
fn.mean.B |
Name of the file with confidence intervals of mean stress for environment level B. Can be from the hybrid, inbred, or full data set. This file must be produced by the bootstrap function, run with the desired data set. |
fn.sd.A |
Name of the file with confidence intervals of the SD of stress for environment level A. Can be from the hybrid, inbred, or full data set. This file must be produced by the bootstrap function, run with the desired data set. |
fn.sd.B |
Name of the file with confidence intervals of the SD of stress for environment level B. Can be from the hybrid, inbred, or full data set. This file must be produced by the bootstrap function, run with the desired data set. |
fn.pe.mean |
Name of the file with point estimates of the mean for each gene (both A and B environment levels present). Can be from the hybrid, inbred, or full data set. This file must be produced by the mean_stress function, run with the desired data set. |
fn.pe.sd |
Name of the file with point estimates of the SD for each gene (both A and B environment levels present). Can be from the hybrid, inbred, or full data set. This file must be produced by the sd.stress function, run with the desired data set. |
variables |
List of variables from mean and variance models. Mean variables need to be listed first, then variance variables. |
ishybrid |
Indicates the type of the data set being examined, for example 'Hybrid', 'Inbred', or 'All'. |
num.vars |
Number of variables per model. Used to ascertain if a variable falls in the mean or the variance model. |
This function returns no value; it prints a crossplot for each of the variables listed in the parameter 'variables'.
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
variables <- colnames(test.data[-1])
# Write the point-estimate files used by the plotting function
mean_stress(test.data, variables, 'stress')
sink()
sd.stress(test.data, variables, 'stress')
sink()
# Mean-model variables first, then variance-model variables
plot_vars <- c("loci_var.4", "loci_var.7.env_var.2", "loci_var.3",
               "loci_var.5", "loci_var.8.env_var.2", "loci_var.4")
bootstrap(test.data, n.boot = 100, variables, 'stress')
draw.crossplots('bootstrap mean A stress.txt', 'bootstrap mean B stress.txt',
                'bootstrap sd A stress.txt', 'bootstrap sd B stress.txt',
                'mean_stress.txt', 'sd_stress.txt', plot_vars, 'All', 3)
# Remove the intermediate text files
unlink(c('bootstrap mean A stress.txt', 'bootstrap mean B stress.txt',
         'bootstrap sd A stress.txt', 'bootstrap sd B stress.txt',
         'mean_stress.txt', 'sd_stress.txt'))
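To make the plot type concrete, here is a minimal base-graphics sketch of one way to draw a "cross": the point estimate sits at (mean, SD), a horizontal arm spans the confidence interval of the mean, and a vertical arm spans the confidence interval of the SD. The numbers, the `draw_cross` helper, and the colours are hypothetical, and the layout is an assumption about what a crossplot looks like rather than a copy of draw.crossplots().

```r
# Hypothetical point estimates and confidence limits for one variable
est <- data.frame(
  level   = c("A", "B"),
  mean    = c(36.2, 39.8), mean.lo = c(35.5, 38.6), mean.hi = c(36.9, 41.0),
  sd      = c(2.1, 4.0),   sd.lo   = c(1.7, 3.3),   sd.hi   = c(2.6, 4.9)
)

# `draw_cross` is an illustrative helper, not a package function
draw_cross <- function(est, main = "") {
  cols <- c("blue", "red")
  plot(est$mean, est$sd, pch = 19, col = cols,
       xlim = range(est$mean.lo, est$mean.hi),
       ylim = range(est$sd.lo, est$sd.hi),
       xlab = "Mean Estimate", ylab = "Standard Deviation Estimate",
       main = main)
  # Horizontal arm: CI of the mean; vertical arm: CI of the SD
  segments(est$mean.lo, est$sd, est$mean.hi, est$sd, col = cols)
  segments(est$mean, est$sd.lo, est$mean, est$sd.hi, col = cols)
  legend("topleft", legend = est$level, col = cols, pch = 19)
  invisible(NULL)
}

pdf(NULL)   # null device, so the sketch runs without a display
res <- draw_cross(est, main = "loci_var.4 (All)")
invisible(dev.off())
```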
This function draws square plots of the mean estimate against the standard deviation estimate for As and Bs of each variable in the mean and variance models.
draw.squareplots( fn.mean.A, fn.mean.B, fn.sd.A, fn.sd.B, fn.pe.mean, fn.pe.sd, variables, ishybrid, num.vars )
fn.mean.A |
Name of the file with confidence intervals of mean stress for environment level A. Can be either hybrid or inbred data. |
fn.mean.B |
Name of the file with confidence intervals of mean stress for environment level B. Can be either hybrid or inbred data. |
fn.sd.A |
Name of the file with confidence intervals of the SD of stress for environment level A. Can be either hybrid or inbred data. |
fn.sd.B |
Name of the file with confidence intervals of the SD of stress for environment level B. Can be either hybrid or inbred data. |
fn.pe.mean |
Name of the file with point estimates of the mean for each gene (both A and B environment levels present). Can be either hybrid or inbred data. |
fn.pe.sd |
Name of the file with point estimates of the SD for each gene (both A and B environment levels present). Can be either hybrid or inbred data. |
variables |
List of variables from mean and variance models. Mean variables need to be listed first, then variance variables. |
ishybrid |
Indicates the type of the data set being examined. Choose 'Hybrid', 'Inbred', or 'All'. |
num.vars |
Number of variables per model. Used to ascertain if a variable falls in the mean or the variance model. |
This function returns no value; it prints a square plot for each of the variables listed in the parameter 'variables'.
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
variables <- colnames(test.data[-1])
# Write the point-estimate files used by the plotting function
mean_stress(test.data, variables, 'stress')
sink()
sd.stress(test.data, variables, 'stress')
sink()
bootstrap(test.data, n.boot = 100, variables, 'stress')
# Mean-model variables first, then variance-model variables
plot_vars <- c("loci_var.4", "loci_var.7.env_var.2", "loci_var.3",
               "loci_var.5", "loci_var.8.env_var.2", "loci_var.4")
draw.squareplots('bootstrap mean A stress.txt', 'bootstrap mean B stress.txt',
                 'bootstrap sd A stress.txt', 'bootstrap sd B stress.txt',
                 'mean_stress.txt', 'sd_stress.txt', plot_vars, 'All', 3)
# Remove the intermediate text files
unlink(c('bootstrap mean A stress.txt', 'bootstrap mean B stress.txt',
         'bootstrap sd A stress.txt', 'bootstrap sd B stress.txt',
         'mean_stress.txt', 'sd_stress.txt'))
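One plausible reading of a "square" plot is a rectangle per environment level whose width is the confidence interval of the mean and whose height is the confidence interval of the SD. The base-graphics sketch below draws such rectangles from hypothetical numbers and checks whether they overlap. The layout, the values, and the `intervals_overlap` helper are assumptions for illustration only and are not taken from draw.squareplots().

```r
# Hypothetical point estimates and confidence limits for one variable
est <- data.frame(
  level   = c("A", "B"),
  mean    = c(36.2, 39.8), mean.lo = c(35.5, 38.6), mean.hi = c(36.9, 41.0),
  sd      = c(2.1, 4.0),   sd.lo   = c(1.7, 3.3),   sd.hi   = c(2.6, 4.9)
)
cols <- c("blue", "red")

pdf(NULL)   # null device, so the sketch runs without a display
plot(NA, xlim = range(est$mean.lo, est$mean.hi),
     ylim = range(est$sd.lo, est$sd.hi),
     xlab = "Mean Estimate", ylab = "Standard Deviation Estimate")
# One rectangle per level: CI of the mean (x) by CI of the SD (y)
rect(est$mean.lo, est$sd.lo, est$mean.hi, est$sd.hi, border = cols)
points(est$mean, est$sd, pch = 19, col = cols)
invisible(dev.off())

# Two rectangles overlap only if both their x and y intervals overlap
intervals_overlap <- function(lo, hi) lo[1] <= hi[2] && lo[2] <= hi[1]
boxes_overlap <- intervals_overlap(est$mean.lo, est$mean.hi) &&
  intervals_overlap(est$sd.lo, est$sd.hi)
boxes_overlap
```

With these made-up limits the two rectangles are disjoint, which is the visual cue that levels A and B differ in mean, in SD, or in both.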
This function implements the forward stepwise variable selection procedure on the simulated data set generated in simu.inter.dat.interboth. In this function, we utilize a dummy value of "1" when initializing the model to avoid issues with a NULL value when adding variables to the model.
forward.sel.dglm(dat.ana.num12.df, ouput.name = "out1.txt", num.loop = 10)
dat.ana.num12.df |
A data set filled with data based on a simulation, per the procedure to generate it in the simu.inter.dat.interboth function. |
ouput.name |
The name of the output file to which the results are to be saved. Defaults to 'out1.txt'. |
num.loop |
The number of iterations that forward stepwise selection is performed (and hence how many variables will be in the final mean and variance models). Defaults to 10 loops. |
A list with mean and variance mean effects and p-values associated with the coefficients.
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
forward.sel.dglm(test.data)
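The general shape of forward stepwise selection, including the intercept-only starting term "1", can be sketched with an ordinary linear model. This is a simplified stand-in, not the package's algorithm: it uses lm() rather than dglm(), selects for the mean model only, and picks the candidate with the smallest p-value at each step. `forward_select` and the toy data are hypothetical.

```r
set.seed(1)
n <- 200
dat <- data.frame(matrix(rbinom(n * 5, 1, 0.5), ncol = 5))
names(dat) <- paste0("loci_var.", 1:5)
# Only loci 3 and 4 affect the response in this toy example
dat$stress <- 36 + 1 * dat$loci_var.3 + 4 * dat$loci_var.4 + rnorm(n)

# Hypothetical helper: add one variable per loop, smallest p-value first
forward_select <- function(dat, response, num.loop = 2) {
  selected   <- "1"   # dummy term: start from an intercept-only model
  candidates <- setdiff(names(dat), response)
  for (i in seq_len(num.loop)) {
    if (length(candidates) == 0) break
    # p-value of each candidate when added to the current model
    pvals <- sapply(candidates, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = dat)
      coef(summary(fit))[v, "Pr(>|t|)"]
    })
    best       <- names(which.min(pvals))
    selected   <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  setdiff(selected, "1")
}

chosen <- forward_select(dat, "stress", num.loop = 2)
chosen
```

Starting from "1" means the vector of selected terms is never empty, so a formula can be built the same way on the first loop as on every later one.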
This function implements the forward stepwise variable selection procedure on a real data set. It uses the dglm function from the dglm package to build the model and helps to account for more complex situations, such as convergence issues with dglm and interaction terms in the model. In this function, we utilize a dummy value of "1" when initializing the model to avoid issues with a NULL value when adding variables to the model.
forward.sel.dglm.real( dat.ana.num12.df, ouput.name = "out1.txt", num.loop = 10, typ.err = 0.05 )
dat.ana.num12.df |
The data set to be used to build the DGLM. |
ouput.name |
The name of the output file to which the results will be saved. |
num.loop |
The number of iterations that forward stepwise selection is performed (and hence how many variables will be in the final mean and variance models). Defaults to 10 iterations. |
typ.err |
Type 1 error. The default value is 0.05. |
A data frame with mean and variance mean effects and p-values associated with the coefficients for each loop. The function also produces a text file containing the model-building information at each stage of the loop (i.e. variables causing errors or warnings, the state of the model at each iteration, etc.).
library(dplyr)
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
forward.sel.dglm.real(test.data)
# Remove the log file written during model building
unlink(c('out1.txt'))
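One common way to keep a selection loop running when a candidate model fails to fit is to wrap each fit in tryCatch() and record the problem instead of stopping. The sketch below shows that pattern with glm() as a stand-in for dglm(); `safe_glm`, the toy data, and the decision to treat warnings as failures are assumptions for illustration and do not describe how forward.sel.dglm.real() is implemented.

```r
# Hypothetical wrapper: return the fit, or a note on why it was rejected
safe_glm <- function(formula, data, family) {
  tryCatch(
    list(fit = glm(formula, data = data, family = family),
         problem = NA_character_),
    warning = function(w)
      list(fit = NULL, problem = paste("warning:", conditionMessage(w))),
    error = function(e)
      list(fit = NULL, problem = paste("error:", conditionMessage(e)))
  )
}

set.seed(1)
toy <- data.frame(y = rep(c(0, 1), each = 20))
toy$x.ok  <- toy$y + rnorm(40, sd = 2)  # noisy predictor, fits normally
toy$x.sep <- toy$y                      # separates y perfectly, so glm() warns

# "x.missing" is deliberately absent from the data to trigger an error
candidates <- c("x.ok", "x.sep", "x.missing")
results <- lapply(candidates, function(v) {
  safe_glm(reformulate(v, response = "y"), toy, binomial())
})
names(results) <- candidates

# Candidates that can be carried forward, and the log of rejected ones
usable   <- names(Filter(function(r) !is.null(r$fit), results))
problems <- sapply(results, function(r) r$problem)
usable
problems
```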
This function provides the mean stress among As and Bs, corresponding to different environment levels, for a list of variables.
mean_stress( dataset, var, stress_variable, output.name = "mean_stress.txt", use.output = TRUE, bin.levels = c(0, 1) )
dataset |
Data set to be utilized. For the purposes of this function, the binary values in each variable are considered to be -1 and 1. |
var |
A list of variables. If using variables from a DGLM, use the variables from the mean model (or, if trying to find intervals for use in the plotting functions draw.crossplots and draw.squareplots, use all variables in both mean and variance models). |
stress_variable |
Name of the variable with the stress values. |
output.name |
Name of the output file to which to save the outputs. Defaults to 'mean_stress.txt'. |
use.output |
A binary variable to indicate whether the output is automatically saved to an external text file. Defaults to TRUE. If FALSE, the output will not be saved to a file. |
bin.levels |
A list that provides the binary values utilized in the dataset. Defaults to c(0,1), indicating that 0 and 1 are used as the binary outcomes; can also be 1, -1. List the value for the "A" environment level first, then the value for the "B" environment level. |
Produces a data frame with three columns: var, AvgB, and AvgA. These provide the variable and its corresponding mean stress values for As and Bs, corresponding to different environment levels.
test.data <- simu.inter.dat.interboth(n.rep = 2, n.obs.per.rep = 15, ran.seed = 1)
variables <- colnames(test.data[-1])
# use.output = FALSE returns the result without writing a file
mean_stress(test.data, variables, 'stress', use.output = FALSE)
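The computation described above, a mean of the stress values within each binary level of each variable, can be sketched in a few lines of base R. `level_means` and the toy data are hypothetical; the column names follow the return value documented above, and the A-then-B ordering of `bin.levels` follows the argument description.

```r
# Hypothetical re-implementation of the per-level averaging idea
level_means <- function(dataset, var, stress_variable, bin.levels = c(0, 1)) {
  y <- dataset[[stress_variable]]
  rows <- lapply(var, function(v) {
    data.frame(
      var  = v,
      AvgB = mean(y[dataset[[v]] == bin.levels[2]]),  # second value = level "B"
      AvgA = mean(y[dataset[[v]] == bin.levels[1]])   # first value  = level "A"
    )
  })
  do.call(rbind, rows)
}

toy <- data.frame(
  stress = c(10, 12, 20, 22),
  g1     = c(0, 0, 1, 1),
  g2     = c(0, 1, 0, 1)
)

res <- level_means(toy, c("g1", "g2"), "stress")
res
```

Replacing mean() with sd() in the helper gives the analogous per-level standard deviations.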
This function provides the standard deviation of stress among As and Bs, corresponding to different environment levels, for a list of variables.
sd.stress( dataset, var, stress_variable, output.name = "sd_stress.txt", use.output = TRUE, bin.levels = c(0, 1) )
dataset |
Data set to be utilized. For the purposes of this function, the binary values in each variable are considered to be -1 and 1. |
var |
A list of variables. If using variables from a DGLM, use the variables from the variance model (or, if trying to find intervals for use in the plotting functions draw.crossplots and draw.squareplots, use all variables in both mean and variance models). |
stress_variable |
Name of the variable with the stress values. |
output.name |
Name of the output file to which to save the outputs. Defaults to 'sd_stress.txt'. |
use.output |
A binary variable to indicate whether the output is automatically saved to an external text file. Defaults to TRUE. If FALSE, the output will not be saved to a file. |
bin.levels |
A list that provides the binary values utilized in the dataset. Defaults to c(0,1), indicating that 0 and 1 are used as the binary outcomes; can also be 1, -1. List the value for the "A" environment level first, then the value for the "B" environment level. |
Produces a data frame with three columns: var, sd1, and sdneg1. These provide the variable and its corresponding standard deviation of stress values for As and Bs, corresponding to different environment levels.
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
variables <- colnames(test.data[-1])
# use.output = FALSE returns the result without writing a file
sd.stress(test.data, variables, 'stress', use.output = FALSE)
This function implements a simulation based on randomly-generated data following a known structure to create a double generalized linear model (DGLM). In particular, it supports the use of interaction terms in the DGLM.
simu.inter.dat.interboth(
  n.rep = 3,
  n.level.env = 2,
  n.obs.per.rep = 150,
  n.loci = 8,
  which.mean.loci = c(3:4),
  hypo.mean.para = c(1, 4),
  incept.mean = 36,
  which.var.loci = c(4:5),
  hypo.var.para = c(2, 5),
  incept.var = -2,
  which.env.inter.mean = 2,
  which.loci.inter.mean = 7,
  hypo.inter.para.mean = 2.5,
  which.env.inter.var = 2,
  which.loci.inter.var = 8,
  hypo.inter.para.var = 3.8,
  simu.prob.var = rep(0.5, n.loci),
  rc.val = 40,
  ran.seed = NULL
)
n.rep |
The number of repetitions of each environment level. Defaults to 3. |
n.level.env |
The number of environment variable levels. Defaults to 2. |
n.obs.per.rep |
A parameter to accommodate repetitions in the data set. Defaults to 150. |
n.loci |
The number of gene loci in data set. Defaults to 8. |
which.mean.loci |
A vector that specifies which gene loci are significant in the mean model. Defaults to the vector c(3:4). |
hypo.mean.para |
A vector that contains the slopes of the gene locations (from which.mean.loci) in the mean model. Defaults to the vector c(1,4). |
incept.mean |
The intercept for the mean model portion of the DGLM. Defaults to 36. |
which.var.loci |
A vector that specifies which gene loci are significant in the variance model. Defaults to c(4:5). |
hypo.var.para |
A vector that contains the slopes of the gene locations (from which.var.loci) in the variance model. Defaults to c(2,5). |
incept.var |
The intercept for the variance model portion of the DGLM. Defaults to -2. |
which.env.inter.mean |
Specifies which level of environment significantly interacts in the mean model. This is one of the parameters that supports model-building with interaction terms. Defaults to 2. |
which.loci.inter.mean |
Specifies which gene location interacts with the environment level in the mean model. This is one of the parameters that supports model-building with interaction terms. Defaults to 7. |
hypo.inter.para.mean |
The interaction effect between the specified environment level and the specified gene location in the mean model. This is another parameter that supports model-building with interaction terms. Defaults to 2.5. |
which.env.inter.var |
Specifies which level of environment significantly interacts in the variance model. This is another parameter that supports model-building with interaction terms. Defaults to 2. |
which.loci.inter.var |
Specifies which gene location interacts with the environment level in the variance model. This is another parameter that supports model-building with interaction terms. Defaults to 8. |
hypo.inter.para.var |
Interaction effect between the specified environment level and the specified gene location in the variance model. Defaults to 3.8. |
simu.prob.var |
The probability used to simulate each locus as zero or one. Different loci may have different probabilities, which can be adjusted as needed. Defaults to rep(0.5,n.loci). |
rc.val |
A parameter used to control right-censored data. It sets an upper limit such that an observation above this limit cannot be included in the data set. Default value is 40. |
ran.seed |
The random seed to be used. Defaults to NULL. |
The function returns a data frame containing the data built from the simulation. It provides data for stress values, an environment variable with two levels (0 and 1), and levels for each simulated gene variable.
test.data <- simu.inter.dat.interboth(n.rep = 3, n.obs.per.rep = 15, ran.seed = 1)
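For readers who want to see the kind of data-generating structure the arguments describe, the sketch below simulates a response whose mean and variance both depend on binary loci and on a locus-by-environment interaction. The coefficients mirror the documented defaults, but the log link for the variance, the normal errors, the 0/1 environment coding, and the dropping of observations above `rc.val` are all assumptions made for illustration; the package function may differ on each of these points.

```r
set.seed(1)
n      <- 500
n.loci <- 8
rc.val <- 40

# Binary loci, each equal to 1 with probability 0.5
loci <- matrix(rbinom(n * n.loci, 1, 0.5), ncol = n.loci)
colnames(loci) <- paste0("loci_var.", seq_len(n.loci))
env <- rep(c(0, 1), length.out = n)   # two environment levels

# Mean model: intercept 36, loci 3 and 4 (slopes 1 and 4),
# plus a locus 7 by environment interaction of 2.5
mu <- 36 + 1 * loci[, 3] + 4 * loci[, 4] + 2.5 * loci[, 7] * env

# Variance model on the log scale (assumed link): intercept -2,
# loci 4 and 5 (slopes 2 and 5), plus a locus 8 by environment
# interaction of 3.8
log.var <- -2 + 2 * loci[, 4] + 5 * loci[, 5] + 3.8 * loci[, 8] * env

stress <- rnorm(n, mean = mu, sd = sqrt(exp(log.var)))

sim <- data.frame(stress = stress, env = env, loci)
sim <- sim[sim$stress <= rc.val, ]   # assumed handling of the upper limit
str(sim)
```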