Paper reading (66): A Sparse Regression Model for Predictive Modeling of Microbial Community Data

Paper title: A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data

Scholar citations: 1

Pages: 12

Published: 2018.12

Journal: Frontiers in Microbiology

Authors: Jian Xiao, Li Chen, ..., and Jun Chen

Abstract:

Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could be potentially used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction since the microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are only associated with a small subset of “marker” taxa (sparse signal). Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.

Paper outline:

1. Introduction

2. Methods

2.1 A Phylogeny-Induced Correlation Structure Among OTUs

2.2 Phylogeny-Regularized Sparse Generalized Linear Model

2.3 Connection With Existing Methods

2.4 Some Theoretical Properties

2.5 Model Estimation and Computational Complexity

3. Simulation studies

3.1 Simulation Strategy

3.1.1 Simulating OTU Abundance Data

3.1.2 Selecting Outcome-Associated OTUs

3.1.3 Generating the Outcome Based on the Outcome-Associated OTUs

3.2 Competing Methods, Model Selection and Evaluation

3.2.1 Competing Methods

3.2.2 Model Selection and Evaluation

3.3 Simulation Results

3.3.1 Results for Continuous-Outcome Data

3.3.2 Results for Binary-Outcome Data

3.3.3 Comparison to SLS With Different Sparsity Levels in the Laplacian Matrix

4. Real Data Applications

4.1 Caffeine Intake Data

4.2 Smoking Data

5. Discussion

Excerpts from the paper:

1. Biological Problem: What biological problems have been solved in this paper?

One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history.
capturing sparse and clustered microbiome signals
One important task for microbiome analysis is to predict the phenotype/outcome (either quantitative or qualitative) based on the features of the underlying microbial community (relative abundances of the OTUs and their phylogeny).

2. Main discoveries: What are the main discoveries in this paper?

Main result: Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.
The new method is also fairly robust: We demonstrated the robustness of the proposed method when the tree was not informative or misspecified.

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

Main contribution 1: we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model.
Main contribution 2: A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree.
The proposed model is a phylogeny-regularized sparse regression model; its innovation is the design of a novel phylogeny-based smoothness penalty.
The inputs of our method are the OTU count table, the phylogenetic tree of the OTUs and the outcome measurements, and the outputs are the selected OTUs and the predictive function based on their abundances.
Caffeine Intake Data: 98 samples and 499 OTUs
Smoking Data: 32 non-smokers and 28 smokers with 174 OTUs
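
To make the inputs and outputs above concrete, here is a minimal Python sketch of a phylogeny-regularized sparse linear regression. It is not the authors' implementation. Several pieces are assumptions that do not appear in these notes: the correlation form C_ij = exp(-2 * rho * d_ij) over patristic distances d_ij, the quadratic penalty beta' C^{-1} beta, an L1 penalty standing in for whatever sparsity penalty the paper uses, the proximal-gradient solver, and all function and parameter names. The toy tree has four OTUs in two clades, with the signal placed in the first clade.

```python
import numpy as np

def phylo_correlation(D, rho):
    """ASSUMED correlation form: C_ij = exp(-2 * rho * d_ij) for patristic distances d_ij."""
    return np.exp(-2.0 * rho * D)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_phylo_sparse(X, y, D, lam1=0.05, lam2=0.01, rho=1.0, n_iter=500):
    """Proximal gradient descent on the (hypothetical) objective
        (1/2n) ||y - X b||^2 + lam1 ||b||_1 + (lam2/2) b' C(rho)^{-1} b.
    """
    n, p = X.shape
    C_inv = np.linalg.inv(phylo_correlation(D, rho))
    H = X.T @ X / n + lam2 * C_inv           # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H)[-1]   # 1 / largest eigenvalue (Lipschitz constant)
    Xty = X.T @ y / n
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = H @ b - Xty                   # gradient of the smooth part at b
        b = soft_threshold(b - step * grad, step * lam1)
    return b

# Toy tree: OTUs 0 and 1 form one clade, OTUs 2 and 3 another (distances are made up).
D = np.array([[0.0, 0.2, 1.0, 1.0],
              [0.2, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.2],
              [1.0, 1.0, 0.2, 0.0]])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                # stand-in for transformed OTU abundances
beta_true = np.array([1.0, 1.0, 0.0, 0.0])   # clustered signal in the first clade
y = X @ beta_true + rng.normal(scale=0.1, size=200)

beta_hat = fit_phylo_sparse(X, y, D)
selected = np.flatnonzero(np.abs(beta_hat) > 1e-8)  # "selected OTUs" output
print("coefficients:", np.round(beta_hat, 3))
print("selected OTUs:", selected)
```

In this sketch `selected` plays the role of the selected OTUs and `X @ beta_hat` the role of the predictive function; in practice `lam1`, `lam2` and `rho` would be tuned, for example by cross-validation.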

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In short, this had not been done before.
Shortcoming of traditional methods: these methods do not fully exploit the information in the microbiome data, particularly the phylogenetic relationship among OTUs.
Some earlier work did explore using OTUs in this way, but it has shortcomings of its own: there is still a need to develop prediction methods for sparse clustered signals while exploiting the full information of the phylogenetic tree, which consists of both the tree topology and branch lengths.
Advantage of main contribution 1: Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship.
Advantage of the new method over the traditional Laplacian-based smoothness penalty: We show that such inverse correlation-based smoothness penalty improved over the traditional Laplacian-based smoothness penalty for microbiome applications, due to its local smoothing property as well as the dual smoothing effects (i.e., data-driven and prior-driven smoothing).
Another reason for the new method's performance gain: an additional tuning parameter in the smoothness penalty allows our model to capture signals at various phylogenetic depths, further improving its prediction power.
Advantage of the new method as stated in the introduction: In summary, the sparse nature of the distribution of OTUs in complex microbiome data can be better captured by our model because it provides a data-adaptive way to group the OTUs according to their phylogeny as well as to select the most predictive OTUs, which leads to improved prediction and interpretation.
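
The two smoothness penalties contrasted above can be written side by side. The Laplacian form below is the standard graph-Laplacian quadratic form. The exponential correlation C_ij(rho) and the symbols rho, d_ij, w_ij and P_rho are assumptions introduced for illustration, since these notes do not quote the paper's formulas.

```latex
% Laplacian-based smoothness penalty, with edge weights w_{ij} \ge 0
% and L = \operatorname{diag}(W\mathbf{1}) - W:
\beta^{\top} L \beta \;=\; \sum_{i<j} w_{ij}\,(\beta_i - \beta_j)^2

% Inverse correlation-based smoothness penalty, with an ASSUMED
% exponential-decay correlation in the patristic distance d_{ij}:
C_{ij}(\rho) = e^{-2\rho\, d_{ij}}, \qquad
P_{\rho}(\beta) = \beta^{\top} C(\rho)^{-1} \beta

% Limit under that assumption (d_{ij} > 0 for i \ne j):
\rho \to \infty:\quad C(\rho) \to I \;\Rightarrow\; P_{\rho}(\beta) \to \lVert \beta \rVert_2^2
```

Under this assumed form, a large rho makes the penalty behave like ridge regression with no smoothing across OTUs, while a small rho keeps even distant OTUs correlated, so the smoothing reaches deeper clades. That is one way to read the remark that the extra tuning parameter captures signals at various phylogenetic depths.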

5. Biological Significance: What is the biological significance of these ML methods’ results?

We compared the proposed method (SICS) to Lasso, MCP and Elastic Net (Enet), the three sparse regression models without considering the phylogenetic tree. We also compared SICS to a Laplacian-regularized sparse regression model as implemented in glmgraph (SLS)
Besides those sparse regression models, we also compared SICS to a representative machine learning method, Random Forest (RF), which has demonstrated good prediction performance on microbiome data.
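
A comparison of this kind can be sketched with scikit-learn. The example covers only Lasso, Elastic Net and Random Forest, because MCP, SLS and SICS are not available in scikit-learn. The synthetic data, the mean-squared-error metric, the fold counts and the hyperparameters are all assumptions made for illustration, not the paper's simulation design.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV, LassoCV
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: 100 samples, 50 features, 5 truly predictive features.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.normal(scale=0.5, size=n)

# Each sparse model tunes its own penalty by inner cross-validation.
models = {
    "Lasso": LassoCV(cv=5),
    "Enet": ElasticNetCV(l1_ratio=0.5, cv=5),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Outer cross-validation gives an out-of-sample error estimate per method.
outer = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=outer,
                             scoring="neg_mean_squared_error")
    cv_mse[name] = float(-scores.mean())
    print(f"{name}: cross-validated MSE = {cv_mse[name]:.3f}")
```

Nesting the tuning inside the outer folds keeps the reported error from being biased by hyperparameter selection on the same samples.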

6. Prospect: What are the potential applications of these machine learning methods in biological science?

Ways the model can be extended: Our model can be extended to capture more complex nonlinear effects.
The simplest strategy is to apply various transformations, e.g., Box-Cox transformation, to the OTU abundance data and select the best transformation function based on cross-validation.
Alternatively, one could apply an additive model, which is more flexible and allows OTU-specific nonlinear effects.
After such extensions, though, a larger dataset is still needed to validate the performance: However, a larger sample size may be needed to achieve good performance.
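
The transformation strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the toy counts, the pseudo-count of 0.5, the candidate Box-Cox parameters, and the use of ridge regression as a stand-in for the sparse model are all hypothetical choices.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform for strictly positive x; lam = 0 gives the log transform."""
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge on standardized features; returns a prediction function."""
    mu, sd, y_mean = X.mean(axis=0), X.std(axis=0), y.mean()
    Z = (X - mu) / sd
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(X.shape[1]), Z.T @ (y - y_mean))
    return lambda X_new: ((X_new - mu) / sd) @ beta + y_mean

def cv_mse(X, y, k=5, seed=0):
    """Mean squared prediction error over k cross-validation folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = ridge_fit(X[train], y[train])
        errs.append(np.mean((y[fold] - predict(X[fold])) ** 2))
    return float(np.mean(errs))

# Toy skewed counts for 10 OTUs; a pseudo-count keeps proportions strictly positive.
rng = np.random.default_rng(1)
n, p = 150, 10
counts = rng.negative_binomial(2, 0.02, size=(n, p))
props = (counts + 0.5) / (counts + 0.5).sum(axis=1, keepdims=True)

# The outcome depends on log abundances of two OTUs (an assumption of this toy setup).
y = 2.0 * np.log(props[:, 0]) - 1.5 * np.log(props[:, 1]) + rng.normal(0, 0.3, n)

# Score each candidate transformation by cross-validation and keep the best one.
candidates = [0.0, 0.25, 0.5, 1.0]
scores = {lam: cv_mse(box_cox(props, lam), y) for lam in candidates}
best_lam = min(scores, key=scores.get)
print({lam: round(s, 3) for lam, s in scores.items()}, "best:", best_lam)
```

Because the toy outcome is generated from log abundances, the log transform should score much better than the untransformed proportions here; with real data the preferred transformation is an empirical question.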
Existing problem: the distribution of OTU abundances is very skewed, and a large number of OTUs are rare and of low-abundance.
For these rare OTUs, their sampling variability is very large.
One direction for future work: Accommodating the sampling error in the predictive model could potentially improve the prediction performance.
Finally, looking ahead: Jointly modeling the microbiome and the outcome data is thus a promising direction. We leave these extensions as our future work.
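
The point about rare OTUs having large sampling variability can be made concrete with a short calculation. It assumes a simple binomial sampling model with sequencing depth N and true proportion p_j; these notes do not name a sampling model, so both the model and the numbers are illustrative.

```latex
% ASSUMED sampling model: counts for OTU j at sequencing depth N
X_j \sim \mathrm{Binomial}(N, p_j), \qquad \hat{p}_j = X_j / N

% Coefficient of variation of the estimated relative abundance
\operatorname{CV}(\hat{p}_j)
  = \frac{\sqrt{p_j (1 - p_j)/N}}{p_j}
  = \sqrt{\frac{1 - p_j}{N p_j}}

% Worked values at N = 10^4:
%   abundant OTU, p_j = 10^{-1}:  CV = \sqrt{0.9 / 1000}  = 0.03
%   rare OTU,     p_j = 10^{-4}:  CV = \sqrt{0.9999 / 1} \approx 1.0
```

Under this model the relative error of a rare OTU's abundance is roughly as large as the abundance itself, which is why accounting for sampling error could matter for prediction.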

Posted: 2024-02-04 07:14:38
