Advance articles alert 26 July 2019

1. diSTruct v1.0: Generating Biomolecular Structures from Distance Constraints

Oskar Taubert, Ines Reinartz, Henning Meyerhenke, Alexander Schug

Bioinformatics, btz578, .1093/bioinformatics/btz578

Published: 22 July 2019

Abstract

Summary

The distance geometry problem is often encountered in molecular biology and the life sciences at large, as a host of experimental methods produce ambiguous and noisy distance data. In this note, we present diSTruct, an adaptation of the generic MaxEnt-Stress graph drawing algorithm to the domain of biological macromolecules. diSTruct is fast, provides reliable structural models even from incomplete or noisy distance data and integrates access to graph analysis tools.
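To make the distance geometry problem concrete, here is a minimal sketch of recovering coordinates from pairwise distances. It uses classical multidimensional scaling (MDS), not the MaxEnt-Stress algorithm that diSTruct adapts, and it assumes a complete, noise-free distance matrix, which is exactly the restriction diSTruct is meant to lift. The function name `classical_mds` and its parameters are illustrative and not part of diSTruct.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed points in `dim` dimensions from a full matrix of pairwise distances.

    Classical MDS: double-center the squared distances to obtain a Gram
    matrix, then take its leading eigenvectors scaled by sqrt(eigenvalue).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; keep the `dim` largest.
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by rounding.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(coords):
    """Euclidean distance matrix for an (n, dim) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The recovered coordinates match the original ones only up to rotation, translation and reflection, so a sensible check is to compare distance matrices rather than coordinates.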

Availability and Implementation

diSTruct is written in C++, Cython and Python 3. It is available from .git or in the Python package index under the MIT license.

Supplementary information

Supplementary data is available at Bioinformatics online.

2. How sequence alignment scores correspond to probability models

Martin C Frith

Bioinformatics, btz576, .1093/bioinformatics/btz576

Published: 22 July 2019

Abstract

Motivation

Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions, and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts, and measure sequence similarity integrated over possible alignments.

Results

This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a “temperature” parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias towards either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment.
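The idea of converting scores to probabilities through a partition function can be sketched as follows. This is an illustrative toy, assuming global alignment with a linear gap score and a simple match/mismatch scheme; it is not the formulation used in the paper, and every name and default value here is hypothetical. Each alignment with score S receives weight exp(S / T), and the partition function Z sums those weights over all alignment paths.

```python
import math


def _logsumexp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    m = max(terms)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(t - m) for t in terms))


def _score(x, y, match, mismatch):
    return match if x == y else mismatch


def log_partition(a, b, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """log Z, where Z sums exp(score / T) over all global alignment paths."""
    t = temperature
    prev = [j * gap / t for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap / t]
        for j in range(1, len(b) + 1):
            s = _score(a[i - 1], b[j - 1], match, mismatch)
            cur.append(_logsumexp([
                prev[j - 1] + s / t,   # align a[i-1] with b[j-1]
                prev[j] + gap / t,     # a[i-1] against a gap
                cur[j - 1] + gap / t,  # b[j-1] against a gap
            ]))
        prev = cur
    return prev[-1]


def best_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Maximum global alignment score (same recursion with max instead of sum)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = _score(a[i - 1], b[j - 1], match, mismatch)
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

As the temperature approaches zero, `temperature * log_partition(...)` approaches `best_score(...)`, so ordinary maximum-score alignment appears as the low-temperature limit of the probabilistic view.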

Supplementary information

Supplementary data are available at Bioinformatics online.

3. Genetic association testing using the GENESIS R/Bioconductor package

Stephanie M Gogarten, Tamar Sofer, Han Chen, Chaoyu Yu, Jennifer A Brody, Timothy A Thornton, Kenneth M Rice, Matthew P Conomos

Bioinformatics, btz567, .1093/bioinformatics/btz567

Published: 22 July 2019

Abstract

Summary

The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components, and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment.

Availability and Implementation

; vignettes included.

Supplementary Information

Supplementary tables and figures are available at Bioinformatics online.

4. Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data

Chan Wang, Jiyuan Hu, Martin J Blaser, Huilin Li

Bioinformatics, btz565, .1093/bioinformatics/btz56

Published: 22 July 2019

Abstract

Motivation

Recent microbiome association studies have revealed important associations between microbiome and disease/health status. Such findings encourage scientists to dive deeper to uncover the causal role of microbiome in the underlying biological mechanism, and have led to applying statistical models to quantify causal microbiome effects and to identify the specific microbial agents. However, there are no existing causal mediation methods specifically designed to handle high-dimensional and compositional microbiome data.

Results

We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high-dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are proposed to estimate the causal direct effect of treatment and the causal mediation effects of microbiome at both the community and individual taxon levels. Regularization techniques are used to perform variable selection in the proposed model framework to identify signature causal microbes. Two hypothesis tests on the overall mediation effect are proposed, and their statistical significance is estimated by permutation procedures. Extensive simulated scenarios show that SparseMCMM has excellent performance in estimation and hypothesis testing. Finally, we showcase the utility of the proposed SparseMCMM method in a study in which the murine microbiome was manipulated, providing a clear and sensible causal path among antibiotic treatment, microbiome composition and mouse weight.
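The permutation procedure mentioned above can be illustrated with a generic two-group sketch. The test statistic here (absolute difference in group means) is a stand-in chosen for simplicity; it is not the mediation statistic used by SparseMCMM, and all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def abs_mean_difference(group_a, group_b):
    """Illustrative test statistic: |mean(a) - mean(b)|."""
    return abs(mean(group_a) - mean(group_b))


def permutation_p_value(group_a, group_b, statistic=abs_mean_difference,
                        n_perm=999, seed=0):
    """Estimate a p-value by shuffling group labels.

    The observed statistic is compared with statistics computed after
    randomly reassigning the pooled observations to the two groups.
    Adding 1 to numerator and denominator keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Any statistic that takes two groups and returns a number can be passed as `statistic`, which is why permutation tests suit models whose null distribution has no convenient closed form.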

Availability

and .

Supplementary information

Supplementary data are available at Bioinformatics online.

5. GSMA: an approach to identify robust global and test Gene Signatures using Meta-Analysis

Adib Shafi, Tin Nguyen, Azam Peyvandipour, Sorin Draghici

Bioinformatics, btz561, .1093/bioinformatics/btz561

Published: 22 July 2019

Abstract

Motivation

Recent advances in biomedical research have made massive amounts of transcriptomic data available in public repositories from different sources. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: i) they are sensitive to outliers, and ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.
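For readers unfamiliar with the classical methods named above, the following sketch shows their standard textbook forms for combining independent p-values, one per study. It illustrates the baselines only and is not GSMA; the function names are hypothetical, and the inputs are assumed to be independent p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combined(pvalues):
    """Stouffer's method: average the z-scores, rescaled by sqrt(k)."""
    normal = NormalDist()
    z = sum(normal.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - normal.cdf(z)


def min_p_combined(pvalues):
    """minP: significance of the smallest of k independent p-values."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


def max_p_combined(pvalues):
    """maxP: significance of the largest of k independent p-values."""
    return max(pvalues) ** len(pvalues)
```

Each function reduces a whole study to a single p-value before combining, which is the second limitation the abstract points out: the per-sample information inside each study is no longer available at the combination step.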

Results

Here we propose GSMA, an intra- and inter-level meta-analysis framework that overcomes these limitations and provides a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using 9 data sets including 1,108 individuals. These signatures are then validated on 12 independent data sets including 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets.

Availability

For review purposes, source code is currently available at . It will be available as a package in Bioconductor soon.

Supplementary information

Supplementary data are available at Bioinformatics online.

Posted: 2024-01-30 06:29:08

Article link: https://www.4u4v.net/it/170656735019880.html
