MarZIC estimates and tests the marginal mediation effects of
zero-inflated compositional mediators. It can be applied to microbiome
studies to estimate and test the marginal mediation effects of relative
abundances (RA). Let Y, M and X denote the outcome, mediator (RA)
and the independent variable respectively. And let 1(M > 0) denote the
binary indicator variable indicating whether M is positive. MarZIC decomposes the
mediation effect into two components of which one is the total mediation
effect going thru the continuous scale of the mediator, denoted by
NIE1 and the other is the
mediation effect going thru the binary change (from zero to non-zero
status) of the mediator, denoted by NIE2. For a microbiome data, let
Mj denote
the RA of the jth
taxon/OTU/ASV. And let Z
denote confounder variable (if any) and Z could contain multiple variables.
The marginal outcome model for Y is: Y = β0 + β1Mj + β21Mj > 0 + β3X + β4X1Mj > 0 + β5XMj + β6Z + ϵ
The mean of Mj depends on
X and Z through the following equation:
$$
\log \Bigg(\frac{E(M_j)}{1-E(M_j)}\Bigg)=\alpha_0 + \alpha_1X+\alpha_2Z
$$ Let Δj = P(Mj = 0)
which is the probability of a structural zero (ie, true zero). We use
the following logistic model for this probability:
$$
\log\bigg(\frac{\Delta_j}{1-\Delta_j}\bigg)=\gamma_0 +
\gamma_1X+\gamma_2Z
$$
Typically, users just need to feed the first six inputs to the
function: Experiment_dat
, lib_name
,
y_name
, x_name
, conf_name
and
taxa_of_interest
.
MicrobData
A dataset contains microbiome data. The
microbiome data could be relative abundance or absolute abundance.
Subjects with missing value will be removed during analysis.CovData
A dataset contains outcome, library size and
covariates.lib_name
Name of library size variable.y_name
Name of outcome variablex_name
Name of covariate of interestconf_name
Name of confounders Defaule is NULL, meaning
no confounder.x4_inter
Whether to include the interaction term β4. Default is TRUE.x5_inter
Whether to include the interaction term β5. Default is TRUE.taxa_of_interest
A character vector for taxa names
indicating taxa that should be analyzed. Default is “all”, meaning all
taxa should be included into analysis.mediator_mix_range
Number of mixtures in mediator.
Default is 1, meaning no mixture.transfer_to_RA
Logical variable indicating whether the
microbiome data should be transferred into relative abundance. Default
is TRUE. If FALSE, microbiome data will not be transferred (e.g., if the
input data is already RA data).num_cores
Number of CPU cores to be used in
parallelization task.adjust_method
P value adjustment method. Same as
p.adjust in R. Default is “fdr”.fdr_rate
FDR cutoff for significance. Default is
0.05.taxDropThresh
The threshold of dropping taxon due to
high zero percentage. Default is 0.9, meaning taxon will be dropped for
analysis if zero percentage is higher than 90%.taxDropCount
The threshold of dropping taxon due to not
enough non-zero observation counts. Default is 4 *
(length(conf_name)+2), meaning taxon will be dropped if non-zero
observation is less than four times of the total number of covariates
(including the independent variable) plus intercept.zero_prop_NIE2
The threshold of zero percentage for
calculating NIE2. Default is 0.1, meaning NIE2 will be calculated for
taxon with zero percentage greater than 10%.zero_count_NIE2
The threshold of zero counts for
calculating NIE2. Default is 4 * (length(conf_name)+2), meaning NIE2
will be calculated for taxon with zero counts greater than four times of
number of covariates plus 1.SDThresh
The threshold of dropping taxon due to low
coefficient of variation (CV) to avoid constant taxon. Default is 0.05,
meaning any taxon has CV less than 0.05 will be dropped.SDx
The threshold of stopping analysis due to low CV of
covariate of interest. Default is 0.05, meaning when CV of covariate of
interest is less than 0.05, the analysis will be stopped.SDy
The threshold of stopping analysis due to low CV of
outcome. Default is 0.05, meaning when CV of outcome is less than 0.05,
the analysis will be stopped.A list
of 4
datasets containing the results
for NIE1
, NIE2
, NDE
, and
NIE
. Each dataset has row representing each taxon, 6
columns for Estimates
, Standard Error
,
Lower bound for 95% Confidence Interval
,
Upper bound for 95% Confidence Interval
,
Adjusted p value
, Significance indicator
.
## A make up example with 1 taxon and 100 subjects.
set.seed(1)
nSub <- 200
nTaxa <- 10
## generate covariate of interest X
X <- rbinom(nSub, 1, 0.5)
## generate confounders
conf1<-rnorm(nSub)
conf2<-rbinom(nSub,1,0.5)
## generate mean of each taxon. All taxon are having the same mean for simplicity.
mu <- exp(-5 + X + 0.1 * conf1 + 0.1 * conf2) /
(1 + exp(-5 + X + 0.1 * conf1 + 0.1 * conf2))
phi <- 10
## generate true RA
M_taxon<-t(sapply(mu,function(x) dirmult::rdirichlet(n=1,rep(x*phi,nTaxa))))
P_zero <- exp(-3 + 0.3 * X + 0.1 * conf1 + 0.1 * conf2) /
(1 + exp(-3 + 0.3 * X + 0.1 * conf1 + 0.1 * conf2))
non_zero_ind <- t(sapply(P_zero,function(x) 1-rbinom(nTaxa,1,rep(x,nTaxa))))
True_RA<-t(apply(M_taxon*non_zero_ind,1,function(x) x/sum(x)))
## generate outcome Y based on true RA
Y <- 1 + 100 * True_RA[,1] + 5 * (True_RA[,1] > 0) + X + conf1 + conf2 + rnorm(nSub)
## library size was set to 10,000 for all subjects for simplicity.
libsize <- 10000
## generate observed RA
observed_AA <- floor(M_taxon*libsize*non_zero_ind)
observed_RA <- t(apply(observed_AA,1,function(x) x/sum(x)))
colnames(observed_RA)<-paste0("rawCount",seq_len(nTaxa))
CovData <- cbind(Y = Y, X = X, libsize = libsize, conf1 = conf1, conf2 = conf2)
Above steps could be ignored if a SummarizedExperiment object is
already available. Suppose we’re interested in the mediation effects of
rawCount1
, rawCount2
, and
rawCount3
, the analysis could be done as:
## run the analysis
res <- MarZIC(
MicrobData = observed_RA,
CovData = CovData,
lib_name = "libsize",
y_name = "Y",
x_name = "X",
conf_name = c("conf1","conf2"),
taxa_of_interest = c("rawCount1","rawCount2","rawCount3"),
num_cores = 1,
mediator_mix_range = 1
)
The results contain 4 datasets for NIE1, NIE2, NDE, NIE, respectively. The NIE1, for example, could be extracted by:
The significant result could be extracted by:
subset(NIE1, significance == TRUE)
#> est se CI low CI up p value unadj p value adj
#> rawCount1 7.138202 1.66603 3.872782 10.40362 1.831038e-05 5.493114e-05
#> significance
#> rawCount1 TRUE
From the result, we can find that rawCount1
is an
significant mediator of mediating the effect of X on Y.
Wu et al.(2022) MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data. Genes 2022, 13, 1049.
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MarZIC_1.0.0 rmarkdown_2.28
#>
#> loaded via a namespace (and not attached):
#> [1] dirmult_0.1.3-5 jsonlite_1.8.9 compiler_4.4.1 Rcpp_1.0.13
#> [5] parallel_4.4.1 jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0
#> [9] lattice_0.22-6 R6_2.5.1 lmtest_0.9-40 NlcOptim_0.6
#> [13] Formula_1.2-5 knitr_1.48 iterators_1.0.14 MASS_7.3-61
#> [17] maketools_1.3.1 nnet_7.3-19 bslib_0.8.0 rlang_1.1.4
#> [21] cachem_1.1.0 mathjaxr_1.6-0 xfun_0.48 quadprog_1.5-8
#> [25] modeltools_0.2-23 sass_0.4.9 sys_3.4.3 doParallel_1.0.17
#> [29] cli_3.6.3 digest_0.6.37 foreach_1.5.2 grid_4.4.1
#> [33] flexmix_2.3-19 sandwich_3.1-1 lifecycle_1.0.4 pracma_2.4.4
#> [37] evaluate_1.0.1 codetools_0.2-20 buildtools_1.0.0 zoo_1.8-12
#> [41] stats4_4.4.1 betareg_3.2-1 tools_4.4.1 htmltools_0.5.8.1