Title: | Identifying Significant SNPs in Genome Wide Association Studies (GWAS) via Clustering |
---|---|
Description: | Identifies disease-associated significant SNPs using a clustering approach. This package implements the method proposed in Xu et al. (2019) <DOI:10.1038/s41598-019-50229-6>. |
Authors: | Yan Xu, Li Xing, Jessica Su, Xuekui Zhang <[email protected]>, Weiliang Qiu <[email protected]> |
Maintainer: | Li Xing <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.7 |
Built: | 2025-02-23 03:46:03 UTC |
Source: | https://github.com/cran/GWASbyCluster |
An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has the same prior as that of controls.
data("esSim")
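In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$2 h_+(\theta_{x+}) h_+(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The function $h_+$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_+, \beta_+)$.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha_0, \beta_0)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$2 h_-(\theta_{x-}) h_-(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
The function $h_-$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_-, \beta_-)$.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$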
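The Hardy-Weinberg step can be illustrated with a minimal base R sketch. The helper names hweProbs and simGenotypes are hypothetical and are not part of GWASbyCluster; the sketch only assumes the additive coding 0, 1, 2 (copies of the minor allele) described above.

```r
# Hardy-Weinberg genotype probabilities for a minor allele frequency theta.
# Returned in the order of the additive coding: 0, 1, 2 copies of the minor allele.
hweProbs <- function(theta) {
  stopifnot(theta >= 0, theta <= 1)
  c("0" = (1 - theta)^2, "1" = 2 * theta * (1 - theta), "2" = theta^2)
}

# Draw n additive-coded genotypes; each subject is one multinomial trial.
simGenotypes <- function(n, theta) {
  sample(0:2, size = n, replace = TRUE, prob = hweProbs(theta))
}

set.seed(1)
probs <- hweProbs(0.2)
print(probs)

geno <- simGenotypes(n = 100, theta = 0.2)
print(table(factor(geno, levels = 0:2)))

# With additive coding, the sample MAF is half the mean genotype.
print(mean(geno) / 2)
```

Because the three probabilities are the terms of the expansion of ((1 - theta) + theta)^2, they always sum to 1.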
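The numbers of cases, controls, and SNPs are set through the arguments nCases, nControls, and nSNPs of simGenoFunc.
The hyperparameters are $\alpha_+$, $\beta_+$, $\pi_+$, $\alpha_0$, $\beta_0$, $\pi_0$, $\alpha_-$, $\beta_-$, and $\pi_-$ (arguments alpha.p, beta.p, pi.p, alpha0, beta0, pi0, alpha.n, beta.n, and pi.n of simGenoFunc).
Note that when we generate MAFs from the half-flat shape bivariate priors, we might get MAFs that are very small or that exceed the upper bound for a MAF (arguments low and upp of simGenoFunc). In these cases, we delete the SNP.
So the final number of SNPs generated might be less than the initially set number of SNPs.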
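One way to draw from a half-flat prior built from a single beta density is to order two independent beta draws, since the joint density of the ordered pair is 2 h(x) h(y) I(x > y). The sketch below is an illustration in base R, not the package's code: rHalfFlatPlus is a hypothetical helper, the numeric values are the defaults shown in the simGenoFunc usage, and filtering on both the case and the control MAF with inclusive bounds is an assumption.

```r
# Draw n pairs (case MAF, control MAF) for cluster "+" from the half-flat prior
# 2 h(x) h(y) I(x > y): take two independent Beta draws and order them.
rHalfFlatPlus <- function(n, alpha, beta) {
  a <- rbeta(n, alpha, beta)
  b <- rbeta(n, alpha, beta)
  cbind(case = pmax(a, b), control = pmin(a, b))
}

set.seed(2)
maf <- rHalfFlatPlus(n = 1000, alpha = 2, beta = 5)

# Assumption: a SNP is kept only if both MAFs lie within [low, upp].
low <- 0.02
upp <- 0.5
keep <- maf[, "case"] >= low & maf[, "case"] <= upp &
  maf[, "control"] >= low & maf[, "control"] <= upp
mafKept <- maf[keep, , drop = FALSE]

cat("SNPs requested:", nrow(maf), " SNPs kept:", nrow(mafKept), "\n")
```

Swapping pmax and pmin gives the corresponding draw for cluster $-$, where the case MAF is the smaller of the two.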
For the dataset stored in esSim, the number of SNPs retained and the numbers of SNPs in clusters $-$, $0$, and $+$ can be read from the feature data, for example with table(fData(esSim)$memGenes).
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides pData() and fData()
data(esSim)
print(esSim)
# phenotype data: one row per subject
pDat <- pData(esSim)
print(pDat[1:2, ])
print(table(pDat$memSubjs))
# feature data: one row per SNP
fDat <- fData(esSim)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))
An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has a different prior from that of controls.
data("esSimDiffPriors")
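In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$f(\theta_{x+}, \theta_{y+}) \propto h_{x+}(\theta_{x+}) h_{y+}(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The function $h_{x+}$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_{x+}, \beta_{x+})$.
The function $h_{y+}$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_{y+}, \beta_{y+})$.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha_0, \beta_0)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$f(\theta_{x-}, \theta_{y-}) \propto h_{x-}(\theta_{x-}) h_{y-}(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
The function $h_{x-}$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_{x-}, \beta_{x-})$.
The function $h_{y-}$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_{y-}, \beta_{y-})$.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$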
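When cases and controls have different beta priors, ordering two draws no longer applies, but a simple rejection sampler does: draw the case and control MAFs independently and keep only the pairs that satisfy the ordering constraint. The sketch below is one possible approach, not necessarily what the package does. rHalfFlatDiff and its argument names are hypothetical, and the shape values are the cluster $+$ defaults shown in the simGenoFuncDiffPriors usage.

```r
# Rejection sampler for cluster "+" when cases and controls have different
# Beta priors. Accepted pairs follow a density proportional to
# h_case(x) * h_control(y) * I(x > y).
rHalfFlatDiff <- function(n, aCase, bCase, aCtrl, bCtrl) {
  out <- matrix(NA_real_, nrow = 0, ncol = 2,
                dimnames = list(NULL, c("case", "control")))
  while (nrow(out) < n) {
    x <- rbeta(n, aCase, bCase)   # candidate case MAFs
    y <- rbeta(n, aCtrl, bCtrl)   # candidate control MAFs
    ok <- x > y                   # keep pairs where the case MAF is larger
    out <- rbind(out, cbind(case = x[ok], control = y[ok]))
  }
  out[seq_len(n), , drop = FALSE]
}

set.seed(3)
mafPlus <- rHalfFlatDiff(500, aCase = 2, bCase = 3, aCtrl = 2, bCtrl = 8)
print(head(mafPlus, 3))
print(colMeans(mafPlus))
```

For cluster $-$ the acceptance condition would be reversed to x < y.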
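The numbers of cases, controls, and SNPs are set through the arguments nCases, nControls, and nSNPs of simGenoFuncDiffPriors.
The hyperparameters, where the subscripts $x$ and $y$ denote cases and controls, are $\alpha_{x+}$, $\beta_{x+}$, $\alpha_{y+}$, $\beta_{y+}$, $\pi_+$, $\alpha_0$, $\beta_0$, $\pi_0$, $\alpha_{x-}$, $\beta_{x-}$, $\alpha_{y-}$, $\beta_{y-}$, and $\pi_-$ (arguments alpha.p.ca, beta.p.ca, alpha.p.co, beta.p.co, pi.p, alpha0, beta0, pi0, alpha.n.ca, beta.n.ca, alpha.n.co, beta.n.co, and pi.n of simGenoFuncDiffPriors).
Note that when we generate MAFs from the half-flat shape bivariate priors, we might get MAFs that are very small or that exceed the upper bound for a MAF (arguments low and upp of simGenoFuncDiffPriors). In these cases, we delete the SNP.
So the final number of SNPs generated might be less than the initially set number of SNPs.
For the dataset stored in esSimDiffPriors, the number of SNPs retained and the numbers of SNPs in clusters $-$, $0$, and $+$ can be read from the feature data, for example with table(fData(esSimDiffPriors)$memGenes).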
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides pData() and fData()
data(esSimDiffPriors)
print(esSimDiffPriors)
# phenotype data: one row per subject
pDat <- pData(esSimDiffPriors)
print(pDat[1:2, ])
print(table(pDat$memSubjs))
# feature data: one row per SNP
fDat <- fData(esSimDiffPriors)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))
Estimate SNP cluster membership. Only update cluster mixture proportions. Assume the 3 clusters have different sets of hyperparameters.
estMemSNPs(es, var.memSubjs = "memSubjs", eps = 0.001, MaxIter = 50, bVec = rep(3, 3), pvalAdjMethod = "fdr", method = "FDR", fdr = 0.05, verbose = FALSE)
es |
An ExpressionSet object storing SNP genotype data.
It contains 3 matrices. The first matrix, which can be extracted by exprs(es), stores the genotype data (rows are SNPs, columns are subjects). The second matrix, which can be extracted by pData(es), stores the phenotype data describing subjects. The third matrix, which can be extracted by fData(es), stores the feature data describing SNPs. |
var.memSubjs |
character. The name of the phenotype variable indicating each subject's case-control status. It must take only two values: 1 indicating case and 0 indicating control. |
eps |
numeric. A small positive number used as the convergence threshold of the EM algorithm. |
MaxIter |
integer. A positive integer giving the maximum number of iterations of the EM algorithm. |
bVec |
numeric. A vector of 3 elements giving the parameters of the symmetric Dirichlet prior for the cluster mixture proportions. |
pvalAdjMethod |
character. The p-value adjustment method, c.f. p.adjust. |
method |
character. The method used to obtain SNP cluster membership from the responsibility matrix. The default value is “FDR”. The other possible value is “max”. See details. |
fdr |
numeric. A small positive FDR threshold used to call SNP cluster membership. |
verbose |
logical. Indicates whether intermediate and final results should be printed. |
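The effect of the two pvalAdjMethod defaults that appear in this manual ("fdr" and "none") can be seen with stats::p.adjust on a few made-up p-values. This assumes that pvalAdjMethod is passed on to p.adjust, which the argument description suggests but does not state; the p-values are purely illustrative.

```r
# Raw p-values for five hypothetical SNPs
pRaw <- c(0.001, 0.012, 0.030, 0.200, 0.750)

# Benjamini-Hochberg adjustment ("fdr" is an alias of "BH" in p.adjust)
pFdr <- p.adjust(pRaw, method = "fdr")

# "none" leaves the p-values unchanged
pNone <- p.adjust(pRaw, method = "none")

print(cbind(pRaw, pFdr, pNone))
```

Adjusted p-values are never smaller than the raw ones, so an adjusted method gives a more conservative initial partition than "none".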
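We characterize the distribution of genotypes of SNPs by a mixture of 3 Bayesian hierarchical models, which correspond to 3 clusters of SNPs.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$2 h_+(\theta_{x+}) h_+(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The function $h_+$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_+, \beta_+)$.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha_0, \beta_0)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$2 h_-(\theta_{x-}) h_-(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
The function $h_-$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_-, \beta_-)$.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$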
For each SNP, we calculate the posterior probability that it belongs to cluster $k$, $k \in \{-, 0, +\}$. This forms a matrix with 3 columns. Rows are SNPs.
The 1st column is the posterior probability that the SNP belongs to cluster $-$.
The 2nd column is the posterior probability that the SNP belongs to cluster $0$.
The 3rd column is the posterior probability that the SNP belongs to cluster $+$.
We call this posterior probability matrix the responsibility matrix.
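In a mixture model, the responsibility of cluster k for a SNP is proportional to the mixture proportion of the cluster times the marginal likelihood of the SNP's genotypes under that cluster. The sketch below shows this normalization in base R on the log scale. The function name responsibilities and the log-likelihood values are hypothetical; computing the marginal likelihoods themselves under the beta priors is not shown.

```r
# logLik: matrix of log marginal likelihoods, one row per SNP and one column
#         per cluster, with columns ordered "-", "0", "+"
# piVec:  mixture proportions in the same column order
responsibilities <- function(logLik, piVec) {
  stopifnot(ncol(logLik) == length(piVec))
  logW <- sweep(logLik, 2, log(piVec), "+")   # log(pi_k) + log-likelihood
  logW <- logW - apply(logW, 1, max)          # stabilize before exponentiating
  w <- exp(logW)
  w / rowSums(w)                              # each row sums to 1
}

# Made-up log-likelihoods for two SNPs
logLik <- rbind(snp1 = c(-120.3, -118.0, -125.1),
                snp2 = c(-98.7, -104.2, -110.9))
colnames(logLik) <- c("-", "0", "+")

wMat <- responsibilities(logLik, piVec = c(0.1, 0.8, 0.1))
print(round(wMat, 3))
```

Subtracting the row maximum before exponentiating avoids underflow when the log-likelihoods are large and negative, which is typical with many subjects.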
To determine which cluster a SNP eventually belongs to, we can use 2 methods.
The first method (the default) is the “FDR” method, which uses an FDR criterion to determine SNP cluster membership.
The second method, “max”, assigns each SNP to the cluster with the maximum posterior probability.
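The “max” rule is simple enough to sketch in base R on a toy responsibility matrix. The matrix values and the character labels "-", "0", "+" are made up for illustration; the coding actually used in the memSNPs element returned by the package may differ. The “FDR” rule is not reproduced here because its exact criterion is not described on this page.

```r
# Toy responsibility matrix; columns ordered as clusters "-", "0", "+"
wMat <- rbind(c(0.90, 0.08, 0.02),
              c(0.10, 0.85, 0.05),
              c(0.05, 0.30, 0.65))
colnames(wMat) <- c("-", "0", "+")

# "max" rule: assign each SNP to the column with the largest posterior probability
memMax <- colnames(wMat)[max.col(wMat, ties.method = "first")]
print(memMax)

# Binary membership: 1 if the MAF differs between cases and controls, 0 otherwise
memBinary <- as.integer(memMax != "0")
print(memBinary)
```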
A list with the following elements:
wMat |
matrix of posterior probabilities. The rows are SNPs. There are 3 columns. The first column is the posterior probability that a SNP belongs to cluster - given genotypes of subjects. The second column is the posterior probability that a SNP belongs to cluster 0 given genotypes of subjects. The third column is the posterior probability that a SNP belongs to cluster + given genotypes of subjects. |
memSNPs |
a vector of SNP cluster membership for the 3-cluster partition from the mixture of 3 Bayesian hierarchical models. |
memSNPs2 |
a vector of binary SNP cluster membership. 1 indicates the SNP has different MAFs between cases and controls. 0 indicates the SNP has the same MAF in cases as in controls. |
piVec |
a vector of cluster mixture proportions. |
alpha.p |
the first shape parameter of the beta prior for MAF in cluster +, obtained from the initial 3-cluster partition based on GWAS. |
beta.p |
the second shape parameter of the beta prior for MAF in cluster +, obtained from the initial 3-cluster partition based on GWAS. |
alpha0 |
the first shape parameter of the beta prior for MAF in cluster 0, obtained from the initial 3-cluster partition based on GWAS. |
beta0 |
the second shape parameter of the beta prior for MAF in cluster 0, obtained from the initial 3-cluster partition based on GWAS. |
alpha.n |
the first shape parameter of the beta prior for MAF in cluster -, obtained from the initial 3-cluster partition based on GWAS. |
beta.n |
the second shape parameter of the beta prior for MAF in cluster -, obtained from the initial 3-cluster partition based on GWAS. |
loop |
number of iterations of the EM algorithm. |
diff |
sum of the squared differences of cluster mixture proportions between the current iteration and the previous iteration of the EM algorithm. If diff < eps, the EM algorithm is considered to have converged. |
res.limma |
object returned by limma. |
Yan Xu <[email protected]>, Li Xing <[email protected]>, Jessica Su <[email protected]>, Xuekui Zhang <[email protected]>, Weiliang Qiu <[email protected]>
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides fData()
data(esSimDiffPriors)
print(esSimDiffPriors)
# restrict the analysis to the first 500 SNPs
es <- esSimDiffPriors[1:500, ]
fDat <- fData(es)
print(fDat[1:2, ])
print(table(fDat$memGenes))
res <- estMemSNPs(es = es, var.memSubjs = "memSubjs")
# cross-tabulate simulated versus estimated SNP cluster membership
print(table(fDat$memGenes, res$memSNPs))
Estimate SNP cluster membership. Only update cluster mixture proportions. Assume all 3 clusters have the same set of hyperparameters.
estMemSNPs.oneSetHyperPara(es, var.memSubjs = "memSubjs", eps = 1.0e-3, MaxIter = 50, bVec = rep(3, 3), pvalAdjMethod = "none", method = "FDR", fdr = 0.05, verbose = FALSE)
es |
An ExpressionSet object storing SNP genotype data.
It contains 3 matrices. The first matrix, which can be extracted by exprs(es), stores the genotype data (rows are SNPs, columns are subjects). The second matrix, which can be extracted by pData(es), stores the phenotype data describing subjects. The third matrix, which can be extracted by fData(es), stores the feature data describing SNPs. |
var.memSubjs |
character. The name of the phenotype variable indicating each subject's case-control status. It must take only two values: 1 indicating case and 0 indicating control. |
eps |
numeric. A small positive number used as the convergence threshold of the EM algorithm. |
MaxIter |
integer. A positive integer giving the maximum number of iterations of the EM algorithm. |
bVec |
numeric. A vector of 3 elements giving the parameters of the symmetric Dirichlet prior for the cluster mixture proportions. |
pvalAdjMethod |
character. The p-value adjustment method, c.f. p.adjust. |
method |
character. The method used to obtain SNP cluster membership from the responsibility matrix. The default value is “FDR”. The other possible value is “max”. See details. |
fdr |
numeric. A small positive FDR threshold used to call SNP cluster membership. |
verbose |
logical. Indicates whether intermediate and final results should be printed. |
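The roles of eps, MaxIter, and bVec can be illustrated with a generic EM loop that updates only the mixture proportions while the component densities stay fixed. The sketch below uses the standard posterior-mode update under a Dirichlet prior; whether the package uses exactly this update is an assumption. emProportions and the toy density matrix are hypothetical.

```r
# EM that updates only the mixture proportions.
# dens: n x K matrix of fixed component densities (rows = SNPs)
# bVec: parameters of the Dirichlet prior on the proportions
emProportions <- function(dens, bVec, eps = 1e-3, MaxIter = 50) {
  K <- ncol(dens)
  n <- nrow(dens)
  piVec <- rep(1 / K, K)                      # start from equal proportions
  for (loop in seq_len(MaxIter)) {
    # E-step: responsibilities given the current proportions
    w <- sweep(dens, 2, piVec, "*")
    w <- w / rowSums(w)
    # M-step: posterior mode of the proportions under Dirichlet(bVec)
    piNew <- (colSums(w) + bVec - 1) / (n + sum(bVec) - K)
    # convergence criterion: sum of squared changes in the proportions
    diff <- sum((piNew - piVec)^2)
    piVec <- piNew
    if (diff < eps) break
  }
  list(piVec = piVec, loop = loop, diff = diff)
}

# Toy data: each SNP has density 1 under its own cluster and 0.2 elsewhere
set.seed(4)
n <- 200
z <- sample(1:3, n, replace = TRUE, prob = c(0.1, 0.8, 0.1))
dens <- matrix(0.2, n, 3)
dens[cbind(seq_len(n), z)] <- 1

res <- emProportions(dens, bVec = rep(3, 3))
print(res)
```

With bVec = rep(3, 3) the prior adds two pseudo-counts per cluster, which keeps every proportion strictly positive even when a cluster receives little support.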
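We characterize the distribution of genotypes of SNPs by a mixture of 3 Bayesian hierarchical models. The 3 Bayesian hierarchical models correspond to 3 clusters of SNPs.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$2 h(\theta_{x+}) h(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The function $h$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha, \beta)$.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha, \beta)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$2 h(\theta_{x-}) h(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$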
For each SNP, we calculate the posterior probability that it belongs to cluster $k$, $k \in \{-, 0, +\}$. This forms a matrix with 3 columns. Rows are SNPs.
The 1st column is the posterior probability that the SNP belongs to cluster $-$.
The 2nd column is the posterior probability that the SNP belongs to cluster $0$.
The 3rd column is the posterior probability that the SNP belongs to cluster $+$.
We call this posterior probability matrix the responsibility matrix.
To determine which cluster a SNP eventually belongs to, we can use 2 methods.
The first method (the default) is the “FDR” method, which uses an FDR criterion to determine SNP cluster membership.
The second method, “max”, assigns each SNP to the cluster with the maximum posterior probability.
A list with the following elements:
wMat |
matrix of posterior probabilities. The rows are SNPs. There are 3 columns. The first column is the posterior probability that a SNP belongs to cluster - given genotypes of subjects. The second column is the posterior probability that a SNP belongs to cluster 0 given genotypes of subjects. The third column is the posterior probability that a SNP belongs to cluster + given genotypes of subjects. |
memSNPs |
a vector of SNP cluster membership for the 3-cluster partition from the mixture of 3 Bayesian hierarchical models. |
memSNPs2 |
a vector of binary SNP cluster membership. 1 indicates the SNP has different MAFs between cases and controls. 0 indicates the SNP has the same MAF in cases as in controls. |
piVec |
a vector of cluster mixture proportions. |
alpha |
the first shape parameter of the beta prior for MAF, obtained from the initial 3-cluster partition based on GWAS. |
beta |
the second shape parameter of the beta prior for MAF, obtained from the initial 3-cluster partition based on GWAS. |
loop |
number of iterations of the EM algorithm. |
diff |
sum of the squared differences of cluster mixture proportions between the current iteration and the previous iteration of the EM algorithm. If diff < eps, the EM algorithm is considered to have converged. |
res.limma |
object returned by limma. |
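One common way to obtain beta shape parameters from a set of observed MAFs is the method of moments. The sketch below illustrates that technique in base R; it is not taken from the package, and whether GWASbyCluster estimates alpha and beta this way is an assumption. betaMoM is a hypothetical helper.

```r
# Method-of-moments estimates of Beta(alpha, beta) from values in (0, 1)
betaMoM <- function(maf) {
  m <- mean(maf)
  v <- var(maf)
  # the estimator requires 0 < v < m * (1 - m)
  stopifnot(v > 0, v < m * (1 - m))
  k <- m * (1 - m) / v - 1
  c(alpha = m * k, beta = (1 - m) * k)
}

# Check on simulated MAFs with known shape parameters
set.seed(5)
maf <- rbeta(5000, shape1 = 2, shape2 = 5)
est <- betaMoM(maf)
print(round(est, 2))
```

The estimator matches the sample mean and variance to the beta mean alpha / (alpha + beta) and variance alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)).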
Yan Xu <[email protected]>, Li Xing <[email protected]>, Jessica Su <[email protected]>, Xuekui Zhang <[email protected]>, Weiliang Qiu <[email protected]>
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides fData()
data(esSimDiffPriors)
print(esSimDiffPriors)
fDat <- fData(esSimDiffPriors)
print(fDat[1:2, ])
print(table(fDat$memGenes))
res <- estMemSNPs.oneSetHyperPara(es = esSimDiffPriors, var.memSubjs = "memSubjs")
# cross-tabulate simulated versus estimated SNP cluster membership
print(table(fDat$memGenes, res$memSNPs))
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has the same prior as that of controls.
simGenoFunc(nCases = 100, nControls = 100, nSNPs = 1000, alpha.p = 2, beta.p = 5, pi.p = 0.1, alpha0 = 2, beta0 = 5, pi0 = 0.8, alpha.n = 2, beta.n = 5, pi.n = 0.1, low = 0.02, upp = 0.5, verbose = FALSE)
nCases |
integer. Number of cases. |
nControls |
integer. Number of controls. |
nSNPs |
integer. Number of SNPs. |
alpha.p |
numeric. The first shape parameter of the Beta prior in cluster $+$. |
beta.p |
numeric. The second shape parameter of the Beta prior in cluster $+$. |
pi.p |
numeric. Mixture proportion for cluster $+$. |
alpha0 |
numeric. The first shape parameter of the Beta prior in cluster $0$. |
beta0 |
numeric. The second shape parameter of the Beta prior in cluster $0$. |
pi0 |
numeric. Mixture proportion for cluster $0$. |
alpha.n |
numeric. The first shape parameter of the Beta prior in cluster $-$. |
beta.n |
numeric. The second shape parameter of the Beta prior in cluster $-$. |
pi.n |
numeric. Mixture proportion for cluster $-$. |
low |
numeric. A small positive value. If a MAF generated from the half-flat shape bivariate prior is smaller than low, the SNP is deleted. |
upp |
numeric. A positive value. If a MAF generated from the half-flat shape bivariate prior is greater than upp, the SNP is deleted. |
verbose |
logical. Indicates whether intermediate or final results should be printed. |
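In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$2 h_+(\theta_{x+}) h_+(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The function $h_+$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_+, \beta_+)$.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha_0, \beta_0)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$2 h_-(\theta_{x-}) h_-(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
The function $h_-$ is the probability density function of the beta distribution $\mathrm{Beta}(\alpha_-, \beta_-)$.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$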
Note that when we generate MAFs from the half-flat shape bivariate priors, we might get MAFs smaller than low or greater than upp. In these cases, we delete the SNP.
So the final number of SNPs generated might be less than the initially set number of SNPs.
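The whole generating process can be condensed into a short base R sketch. It is a simplified illustration, not the package's implementation: simSketch is a hypothetical function, it uses one pair of shape parameters for all three clusters, it filters on both the case and the control MAF, and it returns a plain list rather than an ExpressionSet. Genotypes are drawn as Binomial(2, theta), which gives exactly the Hardy-Weinberg probabilities.

```r
simSketch <- function(nCases = 100, nControls = 100, nSNPs = 1000,
                      alpha = 2, beta = 5, piVec = c(0.1, 0.8, 0.1),
                      low = 0.02, upp = 0.5) {
  # cluster label of each SNP; piVec is ordered as clusters "-", "0", "+"
  mem <- sample(c("-", "0", "+"), nSNPs, replace = TRUE, prob = piVec)

  # two independent Beta draws per SNP; ordering them gives the half-flat prior
  a <- rbeta(nSNPs, alpha, beta)
  b <- rbeta(nSNPs, alpha, beta)
  hi <- pmax(a, b)
  lo <- pmin(a, b)
  mafCase <- ifelse(mem == "+", hi, ifelse(mem == "-", lo, a))
  mafCtrl <- ifelse(mem == "+", lo, ifelse(mem == "-", hi, a))

  # delete SNPs whose MAFs fall outside [low, upp]
  keep <- mafCase >= low & mafCase <= upp & mafCtrl >= low & mafCtrl <= upp
  mem <- mem[keep]
  mafCase <- mafCase[keep]
  mafCtrl <- mafCtrl[keep]

  # additive-coded genotypes: rows are SNPs, columns are subjects
  genoCase <- t(sapply(mafCase, function(th) rbinom(nCases, 2, th)))
  genoCtrl <- t(sapply(mafCtrl, function(th) rbinom(nControls, 2, th)))

  list(geno = cbind(genoCase, genoCtrl),
       memSubjs = rep(c(1, 0), c(nCases, nControls)),  # 1 = case, 0 = control
       memGenes = mem)
}

set.seed(7)
sim <- simSketch(nSNPs = 500)
print(dim(sim$geno))
print(table(sim$memGenes))
```

Because of the filtering step, the number of rows of sim$geno is usually smaller than the requested nSNPs, as noted above for simGenoFunc.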
An ExpressionSet object storing the simulated genotype data.
Yan Xu <[email protected]>, Li Xing <[email protected]>, Jessica Su <[email protected]>, Xuekui Zhang <[email protected]>, Weiliang Qiu <[email protected]>
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides pData() and fData()
set.seed(2)
esSim <- simGenoFunc(
  nCases = 100, nControls = 100, nSNPs = 500,
  alpha.p = 2, beta.p = 5, pi.p = 0.1,
  alpha0 = 2, beta0 = 5, pi0 = 0.8,
  alpha.n = 2, beta.n = 5, pi.n = 0.1,
  low = 0.02, upp = 0.5, verbose = FALSE
)
print(esSim)
# phenotype data: one row per subject
pDat <- pData(esSim)
print(pDat[1:2, ])
print(table(pDat$memSubjs))
# feature data: one row per SNP
fDat <- fData(esSim)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has different priors than that of controls.
simGenoFuncDiffPriors( nCases = 100, nControls = 100, nSNPs = 1000, alpha.p.ca = 2, beta.p.ca = 3, alpha.p.co = 2, beta.p.co = 8, pi.p = 0.1, alpha0 = 2, beta0 = 5, pi0 = 0.8, alpha.n.ca = 2, beta.n.ca = 8, alpha.n.co = 2, beta.n.co = 3, pi.n = 0.1, low = 0.02, upp = 0.5, verbose = FALSE)
nCases |
integer. Number of cases. |
nControls |
integer. Number of controls. |
nSNPs |
integer. Number of SNPs. |
alpha.p.ca |
numeric. The first shape parameter of the Beta prior for cases in cluster $+$. |
beta.p.ca |
numeric. The second shape parameter of the Beta prior for cases in cluster $+$. |
alpha.p.co |
numeric. The first shape parameter of the Beta prior for controls in cluster $+$. |
beta.p.co |
numeric. The second shape parameter of the Beta prior for controls in cluster $+$. |
pi.p |
numeric. Mixture proportion for cluster $+$. |
alpha0 |
numeric. The first shape parameter of the Beta prior in cluster $0$. |
beta0 |
numeric. The second shape parameter of the Beta prior in cluster $0$. |
pi0 |
numeric. Mixture proportion for cluster $0$. |
alpha.n.ca |
numeric. The first shape parameter of the Beta prior for cases in cluster $-$. |
beta.n.ca |
numeric. The second shape parameter of the Beta prior for cases in cluster $-$. |
alpha.n.co |
numeric. The first shape parameter of the Beta prior for controls in cluster $-$. |
beta.n.co |
numeric. The second shape parameter of the Beta prior for controls in cluster $-$. |
pi.n |
numeric. Mixture proportion for cluster $-$. |
low |
numeric. A small positive value. If a MAF generated from the half-flat shape bivariate prior is smaller than low, the SNP is deleted. |
upp |
numeric. A positive value. If a MAF generated from the half-flat shape bivariate prior is greater than upp, the SNP is deleted. |
verbose |
logical. Indicates whether intermediate or final results should be printed. |
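In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster $+$, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.
In cluster $0$, the MAF $\theta_0$ of cases is equal to the MAF of controls.
In cluster $-$, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.
The proportions of the 3 clusters of SNPs are $\pi_+$, $\pi_0$, and $\pi_-$, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster $+$:
$$f(\theta_{x+}, \theta_{y+}) \propto h_{x+}(\theta_{x+}) h_{y+}(\theta_{y+}) I(\theta_{x+} > \theta_{y+}),$$
where $I(a)$ is the indicator function taking value $1$ if the event $a$ is true, and value $0$ otherwise.
The functions $h_{x+}$ and $h_{y+}$ are the probability density functions of the beta distributions $\mathrm{Beta}(\alpha_{x+}, \beta_{x+})$ for cases and $\mathrm{Beta}(\alpha_{y+}, \beta_{y+})$ for controls.
We assume $\theta_0$ has the beta prior $\mathrm{Beta}(\alpha_0, \beta_0)$.
We also assume a “half-flat shape” bivariate prior for the MAF in cluster $-$:
$$f(\theta_{x-}, \theta_{y-}) \propto h_{x-}(\theta_{x-}) h_{y-}(\theta_{y-}) I(\theta_{x-} < \theta_{y-}).$$
The functions $h_{x-}$ and $h_{y-}$ are the probability density functions of the beta distributions $\mathrm{Beta}(\alpha_{x-}, \beta_{x-})$ for cases and $\mathrm{Beta}(\alpha_{y-}, \beta_{y-})$ for controls.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF $\theta$, the probabilities of genotypes are
$$\Pr(\text{geno} = 0) = (1-\theta)^2, \quad \Pr(\text{geno} = 1) = 2\theta(1-\theta), \quad \Pr(\text{geno} = 2) = \theta^2.$$
We also assume the genotypes $0$ (wild-type), $1$ (heterozygote), and $2$ (mutation) follow a multinomial distribution
$$\mathrm{Multinomial}\{1, [(1-\theta)^2, 2\theta(1-\theta), \theta^2]\}.$$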
Note that when we generate MAFs from the half-flat shape bivariate priors, we might get MAFs smaller than low or greater than upp. In these cases, we delete the SNP.
So the final number of SNPs generated might be less than the initially set number of SNPs.
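A quick sanity check on simulated data is to compare the sample MAFs of cases and controls for each SNP. The sketch below does this in base R on a small made-up genotype matrix. For an ExpressionSet returned by the simulation functions, the matrix would presumably come from exprs() and the case-control indicator from the memSubjs column of pData(), as in the examples on this page.

```r
set.seed(6)
# Toy additive-coded genotype matrix: 4 SNPs (rows) x 10 subjects (columns)
geno <- matrix(rbinom(40, 2, 0.3), nrow = 4,
               dimnames = list(paste0("snp", 1:4), paste0("subj", 1:10)))
# Case-control status: 1 = case, 0 = control
memSubjs <- rep(c(1, 0), each = 5)

# Sample MAF = mean number of minor alleles per subject, divided by 2
mafCases <- rowMeans(geno[, memSubjs == 1, drop = FALSE]) / 2
mafControls <- rowMeans(geno[, memSubjs == 0, drop = FALSE]) / 2

print(cbind(mafCases, mafControls, diff = mafCases - mafControls))
```

SNPs simulated in cluster $+$ should tend to show a positive difference and SNPs in cluster $-$ a negative one, up to sampling noise.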
An ExpressionSet object storing the simulated genotype data.
Yan Xu <[email protected]>, Li Xing <[email protected]>, Jessica Su <[email protected]>, Xuekui Zhang <[email protected]>, Weiliang Qiu <[email protected]>
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6
library(GWASbyCluster)
library(Biobase)  # provides pData() and fData()
set.seed(2)
esSimDiffPriors <- simGenoFuncDiffPriors(
  nCases = 100, nControls = 100, nSNPs = 500,
  alpha.p.ca = 2, beta.p.ca = 3, alpha.p.co = 2, beta.p.co = 8, pi.p = 0.1,
  alpha0 = 2, beta0 = 5, pi0 = 0.8,
  alpha.n.ca = 2, beta.n.ca = 8, alpha.n.co = 2, beta.n.co = 3, pi.n = 0.1,
  low = 0.02, upp = 0.5, verbose = FALSE
)
print(esSimDiffPriors)
# phenotype data: one row per subject
pDat <- pData(esSimDiffPriors)
print(pDat[1:2, ])
print(table(pDat$memSubjs))
# feature data: one row per SNP
fDat <- fData(esSimDiffPriors)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))