<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>Journal of Computational Biology</title>
    <link>https://pubmed.ncbi.nlm.nih.gov/rss-feed/?feed_id=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;ff=20220524180447&amp;v=2.17.6&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;utm_medium=rss&amp;utm_source=Other</link>
    <description>Journal of Computational Biology: Latest results from PubMed</description>
    <atom:link href="https://pubmed.ncbi.nlm.nih.gov/rss-feed/?feed_id=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;ff=20220524180447&amp;v=2.17.6&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;utm_medium=rss&amp;utm_source=Other" rel="self"/>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>PubMed RSS feeds (2.17.6)</generator>
    <language>en</language>
    <lastBuildDate>Tue, 24 May 2022 22:04:47 +0000</lastBuildDate>
    <pubDate>Fri, 20 May 2022 06:00:00 -0400</pubDate>
    <ttl>120</ttl>
    <item>
      <title>kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35593919/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The comparison of DNA sequences is of great significance in genomics analysis. Although the traditional multiple sequence alignment (MSA) method is popularly used for evolutionary analysis, optimally aligning k sequences becomes computationally intractable when k increases due to the intrinsic computational complexity of MSA. Despite numerous k-mer alignment-free methods being proposed, the existing k-mer alignment-free methods may not truly capture the contextual structures of the sequences. In...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 20. doi: 10.1089/cmb.2021.0536. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>The comparison of DNA sequences is of great significance in genomics analysis. Although the traditional multiple sequence alignment (MSA) method is popularly used for evolutionary analysis, optimally aligning <i>k</i> sequences becomes computationally intractable when <i>k</i> increases due to the intrinsic computational complexity of MSA. Despite numerous <i>k</i>-mer alignment-free methods being proposed, the existing <i>k</i>-mer alignment-free methods may not truly capture the contextual structures of the sequences. In this study, we present a novel <i>k</i>-mer contextual alignment-free method (called kmer2vec), in which the sequence <i>k</i>-mers are semantically embedded to word2vec vectors, an essential technique in natural language processing. Consequently, the method converts each DNA/RNA sequence into a point in the word2vec high-dimensional space and compares DNA sequences in the space. Because the word2vec vectors are trained from the contextual relationship of <i>k</i>-mers in the genomes, the method may extract valuable structural information from the sequences and reflect the relationship among them properly. The proposed method is optimized on the parameters from word2vec training and verified in the phylogenetic analysis of large whole genomes, including coronavirus and bacterial genomes. The results demonstrate the effectiveness of the method on phylogenetic tree construction and species clustering. The method running speed is much faster than that of the MSA method, especially the phylogenetic relationships constructed by the kmer2vec method are more accurate than the conventional <i>k</i>-mer alignment-free method. 
Therefore, this approach can provide new perspectives for phylogeny and evolution and make it possible to analyze large genomes. In addition, we discuss special parameterization in the <i>k</i>-mer word2vec embedding construction. An effective tool for rapid SARS-CoV-2 typing can also be derived when combining kmer2vec with clustering methods.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35593919/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35593919</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0536>10.1089/cmb.2021.0536</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35593919</guid>
      <pubDate>Fri, 20 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Ruohan Ren</dc:creator>
      <dc:creator>Changchuan Yin</dc:creator>
      <dc:creator>Stephen S-T Yau</dc:creator>
      <dc:date>2022-05-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding</dc:title>
      <dc:identifier>pmid:35593919</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0536</dc:identifier>
    </item>
    <item>
      <title>Extracting Information from Gene Coexpression Networks of &lt;em&gt;Rhizobium leguminosarum&lt;/em&gt;</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35588362/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Nitrogen uptake in legumes is facilitated by bacteria such as Rhizobium leguminosarum. For this bacterium, gene expression data are available, but functional gene annotation is less well developed than for other model organisms. More annotations could lead to a better understanding of the pathways for growth, plant colonization, and nitrogen fixation in R. leguminosarum. In this study, we present a pipeline that combines novel scores from gene coexpression network analysis in a principled way to...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 19. doi: 10.1089/cmb.2021.0600. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Nitrogen uptake in legumes is facilitated by bacteria such as <i>Rhizobium leguminosarum</i>. For this bacterium, gene expression data are available, but functional gene annotation is less well developed than for other model organisms. More annotations could lead to a better understanding of the pathways for growth, plant colonization, and nitrogen fixation in <i>R. leguminosarum</i>. In this study, we present a pipeline that combines novel scores from gene coexpression network analysis in a principled way to identify the genes that are associated with certain growth conditions or highly coexpressed with a predefined set of genes of interest. This association may lead to putative functional annotation or to a prioritized list of genes for further study.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35588362/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35588362</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0600>10.1089/cmb.2021.0600</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35588362</guid>
      <pubDate>Thu, 19 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Javier Pardo-Diaz</dc:creator>
      <dc:creator>Mariano Beguerisse-Díaz</dc:creator>
      <dc:creator>Philip S Poole</dc:creator>
      <dc:creator>Charlotte M Deane</dc:creator>
      <dc:creator>Gesine Reinert</dc:creator>
      <dc:date>2022-05-19</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Extracting Information from Gene Coexpression Networks of &lt;em&gt;Rhizobium leguminosarum&lt;/em&gt;</dc:title>
      <dc:identifier>pmid:35588362</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0600</dc:identifier>
    </item>
    <item>
      <title>Translator: A &lt;em&gt;Trans&lt;/em&gt;fer &lt;em&gt;L&lt;/em&gt;earning Approach to Facilitate Single-Cell &lt;em&gt;AT&lt;/em&gt;AC-Seq Data Analysis fr&lt;em&gt;o&lt;/em&gt;m &lt;em&gt;R&lt;/em&gt;eference Dataset</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35584295/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Recent advances in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) have allowed simultaneous epigenetic profiling over thousands of individual cells to dissect the cellular heterogeneity and elucidate regulatory mechanisms at the finest possible resolution. However, scATAC-seq is challenging to model computationally due to the ultra-high dimensionality, low signal-to-noise ratio, complex feature interactions, and high vulnerability to various confounding factors....</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 17. doi: 10.1089/cmb.2021.0596. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Recent advances in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) have allowed simultaneous epigenetic profiling over thousands of individual cells to dissect the cellular heterogeneity and elucidate regulatory mechanisms at the finest possible resolution. However, scATAC-seq is challenging to model computationally due to the ultra-high dimensionality, low signal-to-noise ratio, complex feature interactions, and high vulnerability to various confounding factors. In this study, we present Translator, an efficient transfer learning approach to capture generalizable chromatin interactions from high-quality (HQ) reference scATAC-seq data to obtain robust cell representations in low-to-moderate quality target scATAC-seq data. We applied Translator on various simulated and real scATAC-seq datasets and demonstrated that Translator could learn more biologically meaningful cell representations than other methods by incorporating information learned from the reference data, thus facilitating various downstream analyses such as clustering and motif enrichment measurements. Moreover, Translator's block-wise deep learning framework can handle nonlinear relationships with restricted connections using fewer parameters to boost computational efficiency through Graphics Processing Unit (GPU) parallelism. 
Finally, we have implemented Translator as a free software package available for the community to leverage large-scale, HQ reference data to study target scATAC-seq data.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35584295/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35584295</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0596>10.1089/cmb.2021.0596</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35584295</guid>
      <pubDate>Wed, 18 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Siwei Xu</dc:creator>
      <dc:creator>Mario Skarica</dc:creator>
      <dc:creator>Ahyeon Hwang</dc:creator>
      <dc:creator>Yi Dai</dc:creator>
      <dc:creator>Cheyu Lee</dc:creator>
      <dc:creator>Matthew J Girgenti</dc:creator>
      <dc:creator>Jing Zhang</dc:creator>
      <dc:date>2022-05-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Translator: A &lt;em&gt;Trans&lt;/em&gt;fer &lt;em&gt;L&lt;/em&gt;earning Approach to Facilitate Single-Cell &lt;em&gt;AT&lt;/em&gt;AC-Seq Data Analysis fr&lt;em&gt;o&lt;/em&gt;m &lt;em&gt;R&lt;/em&gt;eference Dataset</dc:title>
      <dc:identifier>pmid:35584295</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0596</dc:identifier>
    </item>
    <item>
      <title>Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35584271/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy, or cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 17. doi: 10.1089/cmb.2021.0640. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy, or cancer immunotherapy responder versus nonresponder. We have previously developed a <i>subtractive assembly</i> approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into <i>k-mer coabundance groups</i> (kCAGs), which were subsequently used for the retrieval of <i>differential</i> kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes <i>all</i> genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. 
We also discussed other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35584271/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35584271</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0640>10.1089/cmb.2021.0640</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35584271</guid>
      <pubDate>Wed, 18 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Wontack Han</dc:creator>
      <dc:creator>Haixu Tang</dc:creator>
      <dc:creator>Yuzhen Ye</dc:creator>
      <dc:date>2022-05-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype</dc:title>
      <dc:identifier>pmid:35584271</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0640</dc:identifier>
    </item>
    <item>
      <title>WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35575747/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 17. doi: 10.1089/cmb.2021.0585. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:p1="http://pubmed.gov/pub-one"><b>Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses</b> <mml:math><mml:mi>k</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>1</mml:mn></mml:math> <b>HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. 
We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35575747/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35575747</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0585>10.1089/cmb.2021.0585</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35575747</guid>
      <pubDate>Mon, 16 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Chengze Shen</dc:creator>
      <dc:creator>Minhyuk Park</dc:creator>
      <dc:creator>Tandy Warnow</dc:creator>
      <dc:date>2022-05-16</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment</dc:title>
      <dc:identifier>pmid:35575747</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0585</dc:identifier>
    </item>
    <item>
      <title>Improvements Achieved by Multiple Imputation for Single-Cell RNA-Seq Data in Clustering Analysis and Differential Expression Analysis</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35575729/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>In a single-cell RNA-seq (scRNA-seq) data set, a high proportion of missing values (or an excessive number of zeroes) are frequently observed. For the related follow-up tasks, such as clustering analysis and differential expression analysis, a data set without missing values is generally required. Many imputation approaches have been proposed for this purpose. Multiple imputation (MI) is a well-established approach to address possible biases in a follow-up analysis result based on one-time...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 16. doi: 10.1089/cmb.2021.0597. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>In a single-cell RNA-seq (scRNA-seq) data set, a high proportion of missing values (or an excessive number of zeroes) are frequently observed. For the related follow-up tasks, such as clustering analysis and differential expression analysis, a data set without missing values is generally required. Many imputation approaches have been proposed for this purpose. Multiple imputation (MI) is a well-established approach to address possible biases in a follow-up analysis result based on one-time imputed data. There is a lack of investigation on this in the analysis of scRNA-seq data. In this study, we have investigated how to efficiently apply the MI approach to the clustering analysis and the differential expression analysis of scRNA-seq data. We proposed an MI procedure for clustering analysis and an MI procedure for differential expression analysis. To demonstrate the improvements achieved by MI in clustering analysis and differential expression analysis of scRNA-seq data, we analyzed three well-known scRNA-seq data sets. scIGANs, an scRNA-seq imputation method based on the generative adversarial networks (GANs), has been recently proposed for scRNA-seq data imputation. Multiple randomly imputed data sets can be conveniently generated by this method. We implemented our MI procedures based on scIGANs. We demonstrated that MI yielded improved performances on the clustering analysis and differential expression analysis results. 
Our applications to experimental scRNA-seq data illustrated the advantages of MI over one-time imputation of missing values in scRNA-seq data.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35575729/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35575729</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0597>10.1089/cmb.2021.0597</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35575729</guid>
      <pubDate>Mon, 16 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Mengqiu Zhu</dc:creator>
      <dc:creator>Yinglei Lai</dc:creator>
      <dc:date>2022-05-16</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Improvements Achieved by Multiple Imputation for Single-Cell RNA-Seq Data in Clustering Analysis and Differential Expression Analysis</dc:title>
      <dc:identifier>pmid:35575729</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0597</dc:identifier>
    </item>
    <item>
      <title>Fast Algorithms for the Simplified Partial Digest Problem</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35575710/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The simplified partial digest problem (SPDP) models an effective and robust method for the building of a physical map using restriction site analysis. The best known algorithm requires O(n2^(n)) time, using O(n2^(n)) working space. The high complexities in time and space impede its application to genomes of a large number of sites. This article gives two new algorithms. The first improves the time by a factor of O(n) and significantly reduces the space to O(n²). The second improves both the time...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 16. doi: 10.1089/cmb.2021.0641. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>The simplified partial digest problem (SPDP) models an effective and robust method for the building of a physical map using restriction site analysis. The best known algorithm requires <i>O</i>(<i>n</i>2</b><sup><b><i>n</i></b></sup><b>) time, using <i>O</i>(<i>n</i>2</b><sup><b><i>n</i></b></sup><b>) working space. The high complexities in time and space impede its application to genomes of a large number of sites. This article gives two new algorithms. The first improves the time by a factor of <i>O</i>(<i>n</i>) and significantly reduces the space to <i>O</i>(<i>n</i><sup>2</sup>). The second improves both the time and space to <i>O</i>(<i>n</i><sup>1.5</sup>2</b><sup><b><i>n</i>/2</b></sup><b>). Extensive experiments are conducted on real genomes. For instances that can be solved by the best known algorithm, the new algorithms achieve a speedup of up to 4000 times; in addition, due to the reduction in space, the new algorithms can solve many more instances. Experiments also reveal the following advantage of the SPDP method: almost every instance has at most four feasible solutions and for an instance that does not contain any pair of symmetric restriction sites, in all observed examples, the solution is unique.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35575710/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35575710</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0641>10.1089/cmb.2021.0641</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35575710</guid>
      <pubDate>Mon, 16 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Biing-Feng Wang</dc:creator>
      <dc:date>2022-05-16</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Fast Algorithms for the Simplified Partial Digest Problem</dc:title>
      <dc:identifier>pmid:35575710</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0641</dc:identifier>
    </item>
    <item>
      <title>Variational Approximation-Based Model Selection for Microbial Network Inference</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35549398/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Microbial associations are characterized by both direct and indirect interactions between the constituent taxa in a microbial community, and play an important role in determining the structure, organization, and function of the community. Microbial associations can be represented using a weighted graph (microbial network), whose nodes represent taxa and edges represent pairwise associations. A microbial network is typically inferred from a sample-taxa matrix that is obtained by sequencing...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 12. doi: 10.1089/cmb.2021.0595. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Microbial associations are characterized by both direct and indirect interactions between the constituent taxa in a microbial community, and play an important role in determining the structure, organization, and function of the community. Microbial associations can be represented using a weighted graph (microbial network), whose nodes represent taxa and edges represent pairwise associations. A microbial network is typically inferred from a sample-taxa matrix that is obtained by sequencing multiple biological samples and identifying the taxa counts in each sample. However, it is known that microbial associations are impacted by environmental and/or host factors. Thus, a sample-taxa matrix generated in a microbiome study involving a wide range of values for the environmental and/or clinical metadata variables may in fact be associated with more than one microbial network. In this study, we consider the problem of inferring multiple microbial networks from a given sample-taxa count matrix. Each sample is a count vector assumed to be generated by a mixture model consisting of component distributions that are multivariate Poisson log-normal. We present a variational expectation maximization algorithm for the model selection problem to infer the correct number of components of this mixture model. Our approach involves reframing the mixture model as a latent variable model, treating only the mixing coefficients as parameters, and subsequently approximating the marginal likelihood using an evidence lower bound framework. 
Our algorithm is evaluated on a large simulated dataset generated using a collection of different graph structures (band, hub, cluster, random, and scale-free).</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35549398/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35549398</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0595>10.1089/cmb.2021.0595</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35549398</guid>
      <pubDate>Fri, 13 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Shibu Yooseph</dc:creator>
      <dc:creator>Sahar Tavakoli</dc:creator>
      <dc:date>2022-05-13</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Variational Approximation-Based Model Selection for Microbial Network Inference</dc:title>
      <dc:identifier>pmid:35549398</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0595</dc:identifier>
    </item>
    <item>
      <title>The Probability of Joint Monophyly of Samples of Gene Lineages for All Species in an Arbitrary Species Tree</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35544237/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Monophyly is a feature of a set of genetic lineages in which every lineage in the set is more closely related to all other members of the set than it is to any lineage outside the set. Multiple sets of lineages that are separately monophyletic are said to be reciprocally monophyletic, or jointly monophyletic. The prevalence of reciprocal monophyly, or joint monophyly (JM), has been used to evaluate phylogenetic and phylogeographic hypotheses, as well as to delimit species. These applications...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 11. doi: 10.1089/cmb.2021.0647. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Monophyly is a feature of a set of genetic lineages in which every lineage in the set is more closely related to all other members of the set than it is to any lineage outside the set. Multiple sets of lineages that are separately monophyletic are said to be reciprocally monophyletic, or jointly monophyletic. The prevalence of reciprocal monophyly, or joint monophyly (JM), has been used to evaluate phylogenetic and phylogeographic hypotheses, as well as to delimit species. These applications often make use of a probability of JM under models of gene lineage evolution. Studies in coalescent theory have computed this JM probability for small numbers of separate groups in arbitrary species trees and for arbitrary numbers of separate groups in trivial species trees. In this study, generalizing existing results on monophyly probabilities under the multispecies coalescent, we derive the probability of JM for <i>arbitrary</i> numbers of separate groups in <i>arbitrary</i> species trees. We illustrate how our result collapses to previously examined cases. We also study the effect of tree height, sample size, and number of species on the probability of JM. We obtain relatively simple lower and upper bounds on the JM probability. 
Our results expand the scope of JM calculations beyond small numbers of species, subsuming past formulas that have been used in simpler cases.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35544237/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35544237</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0647>10.1089/cmb.2021.0647</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35544237</guid>
      <pubDate>Wed, 11 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Rohan S Mehta</dc:creator>
      <dc:creator>Mike Steel</dc:creator>
      <dc:creator>Noah A Rosenberg</dc:creator>
      <dc:date>2022-05-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>The Probability of Joint Monophyly of Samples of Gene Lineages for All Species in an Arbitrary Species Tree</dc:title>
      <dc:identifier>pmid:35544237</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0647</dc:identifier>
    </item>
    <item>
      <title>Mathematical Model of HIV/AIDS Considering Sexual Preferences Under Antiretroviral Therapy, a Case Study in San Juan de Pasto, Colombia</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35544039/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>While several studies on human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) in the homosexual and heterosexual population have demonstrated substantial advantages in controlling HIV transmission in these groups, the overall benefits of the models with a bisexual population and initiation of antiretroviral therapy have not had enough attention in dynamic modeling. Thus, we used a mathematical model based on studying the impacts of bisexual behavior in a global community...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):483-493. doi: 10.1089/cmb.2021.0323.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>While several studies on human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) in the homosexual and heterosexual population have demonstrated substantial advantages in controlling HIV transmission in these groups, the overall benefits of the models with a bisexual population and initiation of antiretroviral therapy have not had enough attention in dynamic modeling. Thus, we used a mathematical model based on studying the impacts of bisexual behavior in a global community developed in the PhD thesis work of Espitia (2021). The model is governed by a nonlinear ordinary differential equation system, the parameters of which are calibrated with data from the cumulative cases of HIV infection and AIDS reported in San Juan de Pasto in 2019. Our model estimations show which parameters are the most influential and how to modulate them to decrease the HIV infection.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35544039/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35544039</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC9125573/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC9125573</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0323>10.1089/cmb.2021.0323</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35544039</guid>
      <pubDate>Wed, 11 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Cristian C Espitia</dc:creator>
      <dc:creator>Miguel A Botina</dc:creator>
      <dc:creator>Marco A Solarte</dc:creator>
      <dc:creator>Ivan Hernandez</dc:creator>
      <dc:creator>Ricardo A Riascos</dc:creator>
      <dc:creator>João F Meyer</dc:creator>
      <dc:date>2022-05-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Mathematical Model of HIV/AIDS Considering Sexual Preferences Under Antiretroviral Therapy, a Case Study in San Juan de Pasto, Colombia</dc:title>
      <dc:identifier>pmid:35544039</dc:identifier>
      <dc:identifier>pmc:PMC9125573</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0323</dc:identifier>
    </item>
    <item>
      <title>Cuckoo Search-Based Optimization for Cancer Classification: A New Hybrid Approach</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35527646/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The design of an optimal framework for the prediction of cancer from high-dimensional and imbalanced microarray data is a challenging job in the fields of bioinformatics and machine learning. There are so many techniques for dimensionality reduction, but it is unclear which of these techniques performs best with different classifiers and datasets. This article focused on the independent component analysis (ICA) features (genes) extraction method for Naïve Bayes (NB) classification of microarray...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 6. doi: 10.1089/cmb.2021.0410. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>The design of an optimal framework for the prediction of cancer from high-dimensional and imbalanced microarray data is a challenging job in the fields of bioinformatics and machine learning. There are so many techniques for dimensionality reduction, but it is unclear which of these techniques performs best with different classifiers and datasets. This article focused on the independent component analysis (ICA) features (genes) extraction method for Naïve Bayes (NB) classification of microarray data, because ICA perfectly takes out an independent component from the datasets that satisfy the classification criteria of the NB classifier. A novel hybrid method based on a nature-inspired metaheuristic algorithm is proposed in this article for resolving optimization problems of ICA extracted genes. The cuckoo search (CS) algorithm and artificial bee colony (ABC) for finding the best subset of features to increase the performance of ICA for the NB classifier is designed and executed. According to our investigation, the CS-ABC with ICA was implemented for the first time to resolve the dimensionality reduction problem in high-dimensional microarray biomedical datasets. The CS algorithm improved the local search process of the ABC algorithm, and then the hybrid algorithm CS-ABC provided better optimal gene sets that improved the classification accuracy of the NB classifier. 
The experimental comparison shows that the CS-ABC approach with the ICA algorithm performs a deeper search in the iterative process, which can avoid premature convergence and produce better results compared with the previously published feature selection algorithm for the NB classifier.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35527646/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35527646</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0410>10.1089/cmb.2021.0410</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35527646</guid>
      <pubDate>Mon, 09 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Rabia Musheer Aziz</dc:creator>
      <dc:date>2022-05-09</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Cuckoo Search-Based Optimization for Cancer Classification: A New Hybrid Approach</dc:title>
      <dc:identifier>pmid:35527646</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0410</dc:identifier>
    </item>
    <item>
      <title>Data Set-Adaptive Minimizer Order Reduces Memory Usage in &lt;em&gt;k&lt;/em&gt;-Mer Counting</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35527644/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The rapid continuous growth of deep sequencing experiments requires development and improvement of many bioinformatic applications for analysis of large sequencing data sets, including k-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done by using minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of the order has a major impact on the performance of the applications. Here we...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May 6. doi: 10.1089/cmb.2021.0599. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>The rapid continuous growth of deep sequencing experiments requires development and improvement of many bioinformatic applications for analysis of large sequencing data sets, including <i>k</i>-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done by using minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of the order has a major impact on the performance of the applications. Here we introduce a method for tailoring the order to the data set. Our method repeatedly samples the data set and modifies the order so as to flatten the <i>k</i>-mer load distribution across minimizers. We integrated our method into Gerbil, a state-of-the-art memory-efficient</b> <i>k</i><b>-mer counter, and were able to reduce its memory footprint by 30%-50% for large</b> <i>k</i><b>, with only a minor increase in runtime. Our tests also showed that the orders produced by our method produced superior results when transferred across data sets from the same species, with little or no order change. This enables memory reduction with essentially no increase in runtime.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35527644/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35527644</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0599>10.1089/cmb.2021.0599</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35527644</guid>
      <pubDate>Mon, 09 May 2022 06:00:00 -0400</pubDate>
      <dc:creator>Dan Flomin</dc:creator>
      <dc:creator>David Pellow</dc:creator>
      <dc:creator>Ron Shamir</dc:creator>
      <dc:date>2022-05-09</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Data Set-Adaptive Minimizer Order Reduces Memory Usage in &lt;em&gt;k&lt;/em&gt;-Mer Counting</dc:title>
      <dc:identifier>pmid:35527644</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0599</dc:identifier>
    </item>
    <item>
      <title>A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35451885/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The statistical inference of high-order Markov chains (MCs) for biological sequences is vital for molecular sequence analyses but can be hindered by the high dimensionality of free parameters. In the seminal article by Bühlmann and Wyner, the variable length Markov chain (VLMC) model was proposed to embed the full-order MC in a sparse structured context tree. In the key procedure of tree pruning of their proposed context algorithm, the word count-based statistic for each branch was defined and...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 22. doi: 10.1089/cmb.2021.0604. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">The statistical inference of high-order Markov chains (MCs) for biological sequences is vital for molecular sequence analyses but can be hindered by the high dimensionality of free parameters. In the seminal article by Bühlmann and Wyner, the variable length Markov chain (VLMC) model was proposed to embed the full-order MC in a sparse structured context tree. In the key procedure of tree pruning of their proposed context algorithm, the word count-based statistic for each branch was defined and compared with a fixed cutoff threshold calculated from a common chi-square distribution to prune the branch of the context tree. In this study, we find that the word counts for each branch are highly intercorrelated, resulting in non-negligible effects on the distribution of the statistic of interest. We demonstrate that the inferred context tree based on the original context algorithm by Bühlmann and Wyner, which uses a fixed cutoff threshold based on a common chi-square distribution, can be systematically biased and error prone. We denote the original context algorithm as VLMC-Biased (VLMC-B). To solve this problem, we propose a new context tree inference algorithm using an adaptive tree-pruning scheme, termed VLMC-Consistent (VLMC-C). The VLMC-C is founded on the consistent branch-specific mixed chi-square distributions calculated based on asymptotic normal distribution of multiple word patterns. We validate our theoretical branch-specific asymptotic distribution using simulated data. 
We compare VLMC-C with VLMC-B on context tree inference using both simulated and real genome sequence data and demonstrate that VLMC-C outperforms VLMC-B for both context tree reconstruction accuracy and model compression capacity.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35451885/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35451885</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0604>10.1089/cmb.2021.0604</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35451885</guid>
      <pubDate>Fri, 22 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Shaokun An</dc:creator>
      <dc:creator>Jie Ren</dc:creator>
      <dc:creator>Fengzhu Sun</dc:creator>
      <dc:creator>Lin Wan</dc:creator>
      <dc:date>2022-04-22</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses</dc:title>
      <dc:identifier>pmid:35451885</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0604</dc:identifier>
    </item>
    <item>
      <title>Genome-Wide Causation Studies of Complex Diseases</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35451855/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), the signals identified by association analysis may not have specific pathological relevance to diseases so that a large fraction of disease-causing genetic variants is still hidden. Association is used to measure dependence between two variables or two sets of variables. GWAS test association between a disease and single-nucleotide polymorphisms (SNPs) (or other...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 22. doi: 10.1089/cmb.2021.0676. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), the signals identified by association analysis may not have specific pathological relevance to diseases so that a large fraction of disease-causing genetic variants is still hidden. Association is used to measure dependence between two variables or two sets of variables. GWAS test association between a disease and single-nucleotide polymorphisms (SNPs) (or other genetic variants) across the genome. Association analysis may detect superficial patterns between disease and genetic variants. Association signals provide limited information on the causal mechanism of diseases. The use of association analysis as a major analytical platform for genetic studies of complex diseases is a key issue that may hamper discovery of disease mechanisms, calling into question the ability of GWAS to identify loci underlying diseases. It is time to move beyond association analysis toward techniques that enable the discovery of the underlying causal genetic structures of complex diseases. To achieve this, we propose the concept of genome-wide causation studies (GWCS) as an alternative to GWAS and develop additive noise models (ANMs) for genetic causation analysis. Type 1 error rates and power of the ANMs in testing causation are presented. We conducted GWCS of schizophrenia. Both simulation and real data analysis show that the proportion of the overlapped association and causation signals is small. 
Thus, we anticipate that our analysis will stimulate serious discussion of the applicability of GWAS and GWCS.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35451855/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35451855</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0676>10.1089/cmb.2021.0676</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35451855</guid>
      <pubDate>Fri, 22 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Rong Jiao</dc:creator>
      <dc:creator>Xiangning Chen</dc:creator>
      <dc:creator>Eric Boerwinkle</dc:creator>
      <dc:creator>Momiao Xiong</dc:creator>
      <dc:date>2022-04-22</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Genome-Wide Causation Studies of Complex Diseases</dc:title>
      <dc:identifier>pmid:35451855</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0676</dc:identifier>
    </item>
    <item>
      <title>Feature Selection by Hybrid Brain Storm Optimization Algorithm for COVID-19 Classification</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35446145/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>A large number of features lead to very high-dimensional data. The feature selection method reduces the dimension of data, increases the performance of prediction, and reduces the computation time. Feature selection is the process of selecting the optimal set of input features from a given data set in order to reduce the noise in data and keep the relevant features. The optimal feature subset contains all useful and relevant features and excludes any irrelevant feature that allows machine...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 19. doi: 10.1089/cmb.2021.0256. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>A large number of features lead to very high-dimensional data. The feature selection method reduces the dimension of data, increases the performance of prediction, and reduces the computation time. Feature selection is the process of selecting the optimal set of input features from a given data set in order to reduce the noise in data and keep the relevant features. The optimal feature subset contains all useful and relevant features and excludes any irrelevant feature that allows machine learning models to understand better and differentiate efficiently the patterns in data sets. In this article, we propose a binary hybrid metaheuristic-based algorithm for selecting the optimal feature subset. Concretely, the brain storm optimization algorithm is hybridized by the firefly algorithm and adopted as a wrapper method for feature selection problems on classification data sets. The proposed algorithm is evaluated on 21 data sets and compared with 11 metaheuristic algorithms. In addition, the proposed method is adopted for the coronavirus disease data set. The obtained experimental results substantiate the robustness of the proposed hybrid algorithm. It efficiently reduces and selects the feature subset and at the same time results in higher classification accuracy than other methods in the literature.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35446145/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35446145</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0256>10.1089/cmb.2021.0256</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35446145</guid>
      <pubDate>Thu, 21 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Timea Bezdan</dc:creator>
      <dc:creator>Miodrag Zivkovic</dc:creator>
      <dc:creator>Nebojsa Bacanin</dc:creator>
      <dc:creator>Amit Chhabra</dc:creator>
      <dc:creator>Muthusamy Suresh</dc:creator>
      <dc:date>2022-04-21</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Feature Selection by Hybrid Brain Storm Optimization Algorithm for COVID-19 Classification</dc:title>
      <dc:identifier>pmid:35446145</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0256</dc:identifier>
    </item>
    <item>
      <title>Harnessing Fuzzy Rule Based System for Screening Major Histocompatibility Complex Class I Peptide Epitopes from the Whole Proteome: An Implementation on the Proteome of &lt;em&gt;Leishmania donovani&lt;/em&gt;</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35404099/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The development of peptide-based vaccines is enhanced by immunoinformatics, which predicts the patterns that B cells and T cells recognize. Although several tools are available for predicting the Major histocompatibility complex (MHC-I) binding peptides, the wide variants of human leucocyte antigen allele make it challenging to choose a peptide that will induce an immune response in a majority of people. In addition, for a peptide to be considered a potential vaccine candidate, factors such as T...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 11. doi: 10.1089/cmb.2021.0464. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">The development of peptide-based vaccines is enhanced by immunoinformatics, which predicts the patterns that B cells and T cells recognize. Although several tools are available for predicting the Major histocompatibility complex (MHC-I) binding peptides, the wide variants of human leucocyte antigen allele make it challenging to choose a peptide that will induce an immune response in a majority of people. In addition, for a peptide to be considered a potential vaccine candidate, factors such as T cell affinity, proteasome cleavage, and similarity to human proteins also play a major role. Identifying peptides that satisfy the earlier cited measures across the entire proteome is, therefore, challenging. Hence, the fuzzy inference system (FIS) is proposed to detect each peptide's potential as a vaccine candidate and assign it either a very high, high, moderate, or low ranking. The FIS includes input features from 6 modules (binding of 27 major alleles, T cell propensity, pro-inflammatory response, proteasome cleavage, transporter associated with antigen processing, and similarity with human peptide) and rules derived from an observation of features on positive samples. On validation of experimentally verified peptides, a balanced accuracy of ∼80% was achieved, with a Matthews correlation coefficient score of 0.67 and an F-1 score of 0.74. In addition, the method was implemented on the complete proteome of <i>Leishmania donovani</i>, which contains ∼4,800,000 peptides. Lastly, a searchable database of the ranked results of the <i>L. donovani</i> proteome was made and is available online (MHC-FIS-LdDB). 
It is hoped that this method will simplify the identification of potential MHC-I binding candidates from a large proteome.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35404099/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35404099</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0464>10.1089/cmb.2021.0464</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35404099</guid>
      <pubDate>Mon, 11 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Saravanan Vijayakumar</dc:creator>
      <dc:date>2022-04-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Harnessing Fuzzy Rule Based System for Screening Major Histocompatibility Complex Class I Peptide Epitopes from the Whole Proteome: An Implementation on the Proteome of &lt;em&gt;Leishmania donovani&lt;/em&gt;</dc:title>
      <dc:identifier>pmid:35404099</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0464</dc:identifier>
    </item>
    <item>
      <title>Statistical Methods for Microbiome Compositional Data Network Inference: A Survey</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35404093/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Microbes can be found almost everywhere in the world. They are not isolated, but rather interact with each other and establish connections with their living environments. Studying these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A widely used approach toward this objective involves the inference of microbiome interaction networks. However, owing to the...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 11. doi: 10.1089/cmb.2021.0406. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Microbes can be found almost everywhere in the world. They are not isolated, but rather interact with each other and establish connections with their living environments. Studying these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A widely used approach toward this objective involves the inference of microbiome interaction networks. However, owing to the compositional, high-dimensional, sparse, and heterogeneous nature of observed microbial data, applying network inference methods to estimate their associations is challenging. In addition, external environmental interference and biological concerns also make it more difficult to deal with the network inference. In this article, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to various research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks, and differential networks. Their assumptions, high-level ideas, advantages, as well as limitations, are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has, to date, captured all the aspects of interest. In addition, we discuss the challenges now confronting current microbial interaction study and future prospects. 
Finally, we point out several feasible directions of microbial network inference analysis and highlight that future research requires the joint promotion of statistical computation methods and experimental techniques.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35404093/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35404093</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0406>10.1089/cmb.2021.0406</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35404093</guid>
      <pubDate>Mon, 11 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Liang Chen</dc:creator>
      <dc:creator>Hui Wan</dc:creator>
      <dc:creator>Qiuyan He</dc:creator>
      <dc:creator>Shun He</dc:creator>
      <dc:creator>Minghua Deng</dc:creator>
      <dc:date>2022-04-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Statistical Methods for Microbiome Compositional Data Network Inference: A Survey</dc:title>
      <dc:identifier>pmid:35404093</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0406</dc:identifier>
    </item>
    <item>
      <title>Quantitative Biology Undergraduate Major at the University of Southern California</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35404078/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>In 2017, the University of Southern California started a new undergraduate major in quantitative biology. This major combines training in the biological sciences, mathematics, and computer science to prepare students for 21st century biology and medicine. In this article I will discuss the curriculum, the first two cohorts of graduates, the current students, and future plans for the major.</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 11. doi: 10.1089/cmb.2021.0605. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">In 2017, the University of Southern California started a new undergraduate major in quantitative biology. This major combines training in the biological sciences, mathematics, and computer science to prepare students for 21st century biology and medicine. In this article I will discuss the curriculum, the first two cohorts of graduates, the current students, and future plans for the major.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35404078/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35404078</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0605>10.1089/cmb.2021.0605</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35404078</guid>
      <pubDate>Mon, 11 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Peter Calabrese</dc:creator>
      <dc:date>2022-04-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Quantitative Biology Undergraduate Major at the University of Southern California</dc:title>
      <dc:identifier>pmid:35404078</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0605</dc:identifier>
    </item>
    <item>
      <title>DeepVir: Graphical Deep Matrix Factorization for In Silico Antiviral Repositioning-Application to COVID-19</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35394368/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>This study formulates antiviral repositioning as a matrix completion problem wherein the antiviral drugs are along the rows and the viruses are along the columns. The input matrix is partially filled, with ones in positions where the antiviral drug has been known to be effective against a virus. The curated metadata for antivirals (chemical structure and pathways) and viruses (genomic structure and symptoms) are encoded into our matrix completion framework as graph Laplacian regularization. We...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):441-452. doi: 10.1089/cmb.2021.0108. Epub 2022 Apr 7.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>This study formulates antiviral repositioning as a matrix completion problem wherein the antiviral drugs are along the rows and the viruses are along the columns. The input matrix is partially filled, with ones in positions where the antiviral drug has been known to be effective against a virus. The curated metadata for antivirals (chemical structure and pathways) and viruses (genomic structure and symptoms) are encoded into our matrix completion framework as graph Laplacian regularization. We then frame the resulting multiple graph regularized matrix completion (GRMC) problem as deep matrix factorization. This is solved by using a novel optimization method called HyPALM (Hybrid Proximal Alternating Linearized Minimization). Results of our curated RNA drug-virus association data set show that the proposed approach excels over state-of-the-art GRMC techniques. When applied to in silico prediction of antivirals for COVID-19, our approach returns antivirals that are either used for treating patients or are under trials for the same</b>.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35394368/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35394368</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0108>10.1089/cmb.2021.0108</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35394368</guid>
      <pubDate>Fri, 08 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Aanchal Mongia</dc:creator>
      <dc:creator>Stuti Jain</dc:creator>
      <dc:creator>Emilie Chouzenoux</dc:creator>
      <dc:creator>Angshul Majumdar</dc:creator>
      <dc:date>2022-04-08</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>DeepVir: Graphical Deep Matrix Factorization for In Silico Antiviral Repositioning-Application to COVID-19</dc:title>
      <dc:identifier>pmid:35394368</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0108</dc:identifier>
    </item>
    <item>
      <title>Special Issue: Biological Distributed Algorithms 2021</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35389753/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>No abstract</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):305. doi: 10.1089/cmb.2022.29060.ye.</p><p><b>NO ABSTRACT</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35389753/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35389753</a> | DOI:<a href=https://doi.org/10.1089/cmb.2022.29060.ye>10.1089/cmb.2022.29060.ye</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35389753</guid>
      <pubDate>Thu, 07 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Yuval Emek</dc:creator>
      <dc:creator>Saket Navlakha</dc:creator>
      <dc:date>2022-04-07</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Special Issue: Biological Distributed Algorithms 2021</dc:title>
      <dc:identifier>pmid:35389753</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2022.29060.ye</dc:identifier>
    </item>
    <item>
      <title>Identification of New Clusters from Labeled Data Using Mixture Models</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35384743/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Nowadays attempts to segment classes or groups are often found in various fields. Especially, one of emerging issues in biological and medical areas is identification of new subtypes of biological samples or patients. For the identification, we often need to find new subtypes from known classes. In such cases, we usually use clustering techniques. However, usual clustering methods could mix up the labels of the known classes in clustering outcomes and it might lead to wrong interpretation for...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr 5. doi: 10.1089/cmb.2021.0443. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Nowadays attempts to segment classes or groups are often found in various fields. Especially, one of emerging issues in biological and medical areas is identification of new subtypes of biological samples or patients. For the identification, we often need to find new subtypes from known classes. In such cases, we usually use clustering techniques. However, usual clustering methods could mix up the labels of the known classes in clustering outcomes and it might lead to wrong interpretation for the identified clusters. Also, they do not use the information about known classes. Thus, this study proposes a Gaussian mixture model-based approach for identifying new clusters from known classes while it maintains them. The performance of the proposed model is verified through simulations and it is applied to a breast cancer data set.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35384743/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35384743</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0443>10.1089/cmb.2021.0443</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35384743</guid>
      <pubDate>Wed, 06 Apr 2022 06:00:00 -0400</pubDate>
      <dc:creator>Yujung Kim</dc:creator>
      <dc:creator>Jaejik Kim</dc:creator>
      <dc:date>2022-04-06</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Identification of New Clusters from Labeled Data Using Mixture Models</dc:title>
      <dc:identifier>pmid:35384743</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0443</dc:identifier>
    </item>
    <item>
      <title>On the Number of Saturated and Optimal Extended 2-Regular Simple Stacks in the Nussinov-Jacobson Energy Model</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35353583/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>It is known that both RNA secondary structure and protein contact map can be presented using combinatorial diagrams, the combinatorial enumeration and related problems of which have been studied extensively. Motivated by previous enumeration works on saturated RNA secondary structures and extended stack structures of protein contact maps, we are interested in the enumeration problems of saturated and optimal extended stacks in the Nussinov-Jacobson energy model, in which each base pair...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):425-440. doi: 10.1089/cmb.2021.0421. Epub 2022 Mar 28.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>It is known that both RNA secondary structure and protein contact map can be presented using combinatorial diagrams, the combinatorial enumeration and related problems of which have been studied extensively. Motivated by previous enumeration works on saturated RNA secondary structures and extended stack structures of protein contact maps, we are interested in the enumeration problems of saturated and optimal extended stacks in the Nussinov-Jacobson energy model, in which each base pair contributes energy -1. Then optimal structures are those with most arcs, and locally optimal structures are exactly the saturated structures, in which no more arcs can be added without violating the structure definition. For saturated extended 2-regular simple stacks, whose degree configuration is related to the protein fold in two-dimensional honeycomb lattice, we obtain generating function equation and asymptotic formula for its number. Moreover, an explicit formula for the number of optimal extended 2-regular simple stacks is also obtained.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35353583/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35353583</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0421>10.1089/cmb.2021.0421</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35353583</guid>
      <pubDate>Wed, 30 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Qianghui Guo</dc:creator>
      <dc:creator>Yinglie Jin</dc:creator>
      <dc:creator>Mengqin Li</dc:creator>
      <dc:creator>Lisa Hui Sun</dc:creator>
      <dc:creator>Yanyan Xu</dc:creator>
      <dc:date>2022-03-30</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>On the Number of Saturated and Optimal Extended 2-Regular Simple Stacks in the Nussinov-Jacobson Energy Model</dc:title>
      <dc:identifier>pmid:35353583</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0421</dc:identifier>
    </item>
    <item>
      <title>Texture Enhancement of Medical Images for Efficient Disease Diagnosis with Optimized Fractional Derivative Masks</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35353538/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>For the past two decades, fractional-order derivatives have been used to model many systems in science and engineering with more accuracy than existing integer-order derivatives. Many of these applications have been employed in the image processing field. It is undeniable that an image enhancement algorithm is very much desirable for medical image analysis to diagnose various kinds of diseases more efficiently. These requirements demand that the image should be of high quality. Hence, accurate...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar 28. doi: 10.1089/cmb.2021.0267. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">For the past two decades, fractional-order derivatives have been used to model many systems in science and engineering with more accuracy than existing integer-order derivatives. Many of these applications have been employed in the image processing field. It is undeniable that an image enhancement algorithm is very much desirable for medical image analysis to diagnose various kinds of diseases more efficiently. These requirements demand that the image should be of high quality. Hence, accurate edge-detection and denoising models are required in medical image processing, improving, and enhancing the contrast of an image to attain a better texture and avoid noise. In this study, we employ and compare the conventional methods and recent and most popular fractional-order-based methods for medical image analysis texture enhancement. To make a fair comparison, the fractional-order operators are optimized for all images with gray wolf optimizer while considering the performance metric mean squared error. The results showed that fractional differential-based operators perform better than conventional integer-order operators for texture enhancement of medical images.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35353538/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35353538</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0267>10.1089/cmb.2021.0267</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35353538</guid>
      <pubDate>Wed, 30 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Priyanka Harjule</dc:creator>
      <dc:creator>Manva Mohd Tokir</dc:creator>
      <dc:creator>Tanuj Mehta</dc:creator>
      <dc:creator>Shivam Gurjar</dc:creator>
      <dc:creator>Anupam Kumar</dc:creator>
      <dc:creator>Basant Agarwal</dc:creator>
      <dc:date>2022-03-30</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Texture Enhancement of Medical Images for Efficient Disease Diagnosis with Optimized Fractional Derivative Masks</dc:title>
      <dc:identifier>pmid:35353538</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0267</dc:identifier>
    </item>
    <item>
      <title>Agreement in Spiking Neural Networks</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35333601/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>We study the problem of binary agreement in a spiking neural network (SNN). We show that binary agreement on n inputs can be achieved with O(n) auxiliary neurons. Our simulation results suggest that agreement can be achieved in our network in O(log n) time. We then describe a subclass of SNNs with a biologically plausible property, which we call size-independence. We prove that solving a class of problems, including agreement and Winner-Take-All, in this model requires Ω(n) auxiliary neurons,...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):358-369. doi: 10.1089/cmb.2021.0365. Epub 2022 Mar 23.</p><p><b>ABSTRACT</b></p><p xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:p1="http://pubmed.gov/pub-one">We study the problem of binary agreement in a spiking neural network (SNN). We show that binary agreement on <i>n</i> inputs can be achieved with <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> auxiliary neurons. Our simulation results suggest that agreement can be achieved in our network in <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> time. We then describe a subclass of SNNs with a biologically plausible property, which we call size-independence. We prove that solving a class of problems, including agreement and Winner-Take-All, in this model requires <mml:math><mml:mi>Ω</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> auxiliary neurons, which makes our agreement network size-optimal.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35333601/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35333601</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0365>10.1089/cmb.2021.0365</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35333601</guid>
      <pubDate>Fri, 25 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Martin Kunev</dc:creator>
      <dc:creator>Petr Kuznetsov</dc:creator>
      <dc:creator>Denis Sheynikhovich</dc:creator>
      <dc:date>2022-03-25</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Agreement in Spiking Neural Networks</dc:title>
      <dc:identifier>pmid:35333601</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0365</dc:identifier>
    </item>
    <item>
      <title>Correlation Imputation for Single-Cell RNA-seq</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35325552/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have yielded a powerful tool to measure gene expression of individual cells. One major challenge of the scRNA-seq data is that it usually contains a large amount of zero expression values, which often impairs the effectiveness of downstream analyses. Numerous data imputation methods have been proposed to deal with these "dropout" events, but this is a difficult task for such high-dimensional and sparse data. Furthermore,...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):465-482. doi: 10.1089/cmb.2021.0403. Epub 2022 Mar 21.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have yielded a powerful tool to measure gene expression of individual cells. One major challenge of the scRNA-seq data is that it usually contains a large amount of zero expression values, which often impairs the effectiveness of downstream analyses. Numerous data imputation methods have been proposed to deal with these "dropout" events, but this is a difficult task for such high-dimensional and sparse data. Furthermore, there have been debates on the nature of the sparsity, about whether the zeros are due to technological limitations or represent actual biology. To address these challenges, we propose Single-cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information (SCENA), a novel approach that imputes the correlation matrix of the data of interest instead of the data itself. SCENA obtains a gene-by-gene correlation estimate by ensembling various individual estimates, some of which are based on known auxiliary information about gene expression networks. Our approach is a reliable method that makes no assumptions on the nature of sparsity in scRNA-seq data or the data distribution. 
By extensive simulation studies and real data applications, we demonstrate that SCENA is not only superior in gene correlation estimation, but also improves the accuracy and reliability of downstream analyses, including cell clustering, dimension reduction, and graphical model estimation to learn the gene expression network.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35325552/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35325552</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC9125575/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC9125575</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0403>10.1089/cmb.2021.0403</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35325552</guid>
      <pubDate>Thu, 24 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Luqin Gan</dc:creator>
      <dc:creator>Giuseppe Vinci</dc:creator>
      <dc:creator>Genevera I Allen</dc:creator>
      <dc:date>2022-03-24</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Correlation Imputation for Single-Cell RNA-seq</dc:title>
      <dc:identifier>pmid:35325552</dc:identifier>
      <dc:identifier>pmc:PMC9125575</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0403</dc:identifier>
    </item>
    <item>
      <title>Use of DFT Distance Metrics for Classification of SARS-CoV-2 Genomes</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35325549/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>In this work, we investigate using Fourier coefficients (FCs) for capturing useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract geographic submission location from SARS-CoV-2 sequence headers submitted to the GISAID Initiative, calculate corresponding FCs, and use the FCs to classify these sequences according to geographic location. We show that the FCs serve as useful numerical summaries for sequences that allow manipulation,...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):453-464. doi: 10.1089/cmb.2021.0229. Epub 2022 Mar 21.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>In this work, we investigate using Fourier coefficients (FCs) for capturing useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract geographic submission location from SARS-CoV-2 sequence headers submitted to the GISAID Initiative, calculate corresponding FCs, and use the FCs to classify these sequences according to geographic location. We show that the FCs serve as useful numerical summaries for sequences that allow manipulation, identification, and differentiation via classical mathematical and statistical methods that are not readily applicable for character strings. Further, we argue that subsets of the FCs may be usable for the same purposes, which results in a reduction in storage requirements. We conclude by offering extensions of the research and potential future directions for subsequent analyses, such as the use of other series transforms for discretely indexed signals such as genomes.</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35325549/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35325549</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0229>10.1089/cmb.2021.0229</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35325549</guid>
      <pubDate>Thu, 24 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Micah Thornton</dc:creator>
      <dc:creator>Monnie Mcgee</dc:creator>
      <dc:date>2022-03-24</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Use of DFT Distance Metrics for Classification of SARS-CoV-2 Genomes</dc:title>
      <dc:identifier>pmid:35325549</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0229</dc:identifier>
    </item>
    <item>
      <title>Integrating Long-Range Regulatory Interactions to Predict Gene Expression Using Graph Convolutional Networks</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35325548/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Long-range regulatory interactions among genomic regions are critical for controlling gene expression, and their disruption has been associated with a host of diseases. However, when modeling the effects of regulatory factors, most deep learning models either neglect long-range interactions or fail to capture the inherent 3D structure of the underlying genomic organization. To address these limitations, we present a Graph Convolutional Model for Epigenetic Regulation of Gene Expression...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 May;29(5):409-424. doi: 10.1089/cmb.2021.0316. Epub 2022 Mar 21.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Long-range regulatory interactions among genomic regions are critical for controlling gene expression, and their disruption has been associated with a host of diseases. However, when modeling the effects of regulatory factors, most deep learning models either neglect long-range interactions or fail to capture the inherent 3D structure of the underlying genomic organization. To address these limitations, we present a Graph Convolutional Model for Epigenetic Regulation of Gene Expression (GC-MERGE). Using a graph-based framework, the model incorporates important information about long-range interactions via a natural encoding of genomic spatial interactions into the graph representation. It integrates measurements of both the global genomic organization and the local regulatory factors, specifically histone modifications, to not only predict the expression of a given gene of interest but also quantify the importance of its regulatory factors. We apply GC-MERGE to data sets for three cell lines-GM12878 (lymphoblastoid), K562 (myelogenous leukemia), and HUVEC (human umbilical vein endothelial)-and demonstrate its state-of-the-art predictive performance. Crucially, we show that our model is interpretable in terms of the observed biological regulatory factors, highlighting both the histone modifications and the interacting genomic regions contributing to a gene's predicted expression. We provide model explanations for multiple exemplar genes and validate them with evidence from the literature. Our model presents a novel setup for predicting gene expression by integrating multimodal data sets in a graph convolutional framework. 
More importantly, it enables interpretation of the biological mechanisms driving the model's predictions</b>.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35325548/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35325548</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC9125570/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC9125570</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0316>10.1089/cmb.2021.0316</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35325548</guid>
      <pubDate>Thu, 24 Mar 2022 06:00:00 -0400</pubDate>
      <dc:creator>Jeremy Bigness</dc:creator>
      <dc:creator>Xavier Loinaz</dc:creator>
      <dc:creator>Shalin Patel</dc:creator>
      <dc:creator>Erica Larschan</dc:creator>
      <dc:creator>Ritambhara Singh</dc:creator>
      <dc:date>2022-03-24</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Integrating Long-Range Regulatory Interactions to Predict Gene Expression Using Graph Convolutional Networks</dc:title>
      <dc:identifier>pmid:35325548</dc:identifier>
      <dc:identifier>pmc:PMC9125570</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0316</dc:identifier>
    </item>
    <item>
      <title>Emergence of Direction-Selective Retinal Cell Types in Task-Optimized Deep Learning Models</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35275740/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Convolutional neural networks (CNNs), a class of deep learning models, have experienced recent success in modeling sensory cortices and retinal circuits through optimizing performance on machine learning tasks, otherwise known as task optimization. Previous research has shown task-optimized CNNs to be capable of providing explanations as to why the retina efficiently encodes natural stimuli and how certain retinal cell types are involved in efficient encoding. In our work, we sought to use...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):370-381. doi: 10.1089/cmb.2021.0368. Epub 2022 Mar 11.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Convolutional neural networks (CNNs), a class of deep learning models, have experienced recent success in modeling sensory cortices and retinal circuits through optimizing performance on machine learning tasks, otherwise known as task optimization. Previous research has shown task-optimized CNNs to be capable of providing explanations as to why the retina efficiently encodes natural stimuli and how certain retinal cell types are involved in efficient encoding. In our work, we sought to use task-optimized CNNs as a means of explaining computational mechanisms responsible for motion-selective retinal circuits. We designed a biologically constrained CNN and optimized its performance on a motion-classification task. We drew inspiration from psychophysics, deep learning, and systems neuroscience literature to develop a toolbox of methods to reverse engineer the computational mechanisms learned in our model. Through reverse engineering our model, we proposed a computational mechanism in which direction-selective ganglion cells and starburst amacrine cells, both experimentally observed retinal cell types, emerge in our model to discriminate among moving stimuli. This emergence suggests that direction-selective circuits in the retina are ecologically designed to robustly discriminate among moving stimuli. 
Our results and methods also provide a framework for how to build more interpretable deep learning models and how to understand them.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35275740/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35275740</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0368>10.1089/cmb.2021.0368</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35275740</guid>
      <pubDate>Fri, 11 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Keith T Murray</dc:creator>
      <dc:creator>Mien Brabeeba Wang</dc:creator>
      <dc:creator>Nancy Lynch</dc:creator>
      <dc:date>2022-03-11</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Emergence of Direction-Selective Retinal Cell Types in Task-Optimized Deep Learning Models</dc:title>
      <dc:identifier>pmid:35275740</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0368</dc:identifier>
    </item>
    <item>
      <title>Coordinating Amoebots via Reconfigurable Circuits</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35255223/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>We consider an extension to the geometric amoebot model that allows amoebots to form so-called circuits. Given a connected amoebot structure, a circuit is a subgraph formed by the amoebots that permits the instant transmission of signals. We show that such an extension allows for significantly faster solutions to a variety of problems related to programmable matter. More specifically, we provide algorithms for leader election, consensus, compass alignment, chirality agreement, and shape...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):317-343. doi: 10.1089/cmb.2021.0363. Epub 2022 Mar 7.</p><p><b>ABSTRACT</b></p><p xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:p1="http://pubmed.gov/pub-one">We consider an extension to the geometric amoebot model that allows amoebots to form so-called <i>circuits</i>. Given a connected amoebot structure, a circuit is a subgraph formed by the amoebots that permits the instant transmission of signals. We show that such an extension allows for significantly faster solutions to a variety of problems related to programmable matter. More specifically, we provide algorithms for leader election, consensus, compass alignment, chirality agreement, and shape recognition. Leader election can be solved in <mml:math><mml:mi>Θ</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds, with high probability (w.h.p.), consensus in <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds, and both, compass alignment and chirality agreement, can be solved in <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds, w.h.p. For shape recognition, the amoebots have to decide whether the amoebot structure forms a particular shape. We show that the amoebots can detect a shape composed of triangles within <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds. 
Finally, we show how the amoebots can detect a parallelogram with linear and polynomial side ratio within <mml:math><mml:mi>Θ</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds, w.h.p.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35255223/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35255223</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0363>10.1089/cmb.2021.0363</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35255223</guid>
      <pubDate>Mon, 07 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Michael Feldmann</dc:creator>
      <dc:creator>Andreas Padalkin</dc:creator>
      <dc:creator>Christian Scheideler</dc:creator>
      <dc:creator>Shlomi Dolev</dc:creator>
      <dc:date>2022-03-07</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Coordinating Amoebots via Reconfigurable Circuits</dc:title>
      <dc:identifier>pmid:35255223</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0363</dc:identifier>
    </item>
    <item>
      <title>Improvement of Automatic Glioma Brain Tumor Detection Using Deep Convolutional Neural Networks</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35235381/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>This article introduces automatic brain tumor detection from a magnetic resonance image (MRI). It provides novel algorithms for extracting patches and segmentation trained with Convolutional Neural Networks (CNNs) to identify brain tumors. Further, this study provides deep learning and image segmentation with CNN algorithms. This contribution proposes two similar segmentation algorithms: one for the Higher Grade Gliomas (HGG) and the other for the Lower Grade Gliomas (LGG) for the brain tumor...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar 1. doi: 10.1089/cmb.2021.0280. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">This article introduces automatic brain tumor detection from a magnetic resonance image (MRI). It provides novel algorithms for extracting patches and segmentation trained with Convolutional Neural Networks (CNNs) to identify brain tumors. Further, this study provides deep learning and image segmentation with CNN algorithms. This contribution proposes two similar segmentation algorithms: one for the Higher Grade Gliomas (HGG) and the other for the Lower Grade Gliomas (LGG) for the brain tumor patients. The proposed algorithms (Intensity normalization, Patch extraction, Selecting the best patch, segmentation of HGG, and Segmentation of LGG) identify the gliomas and detect the stage of the tumor as per taking the MRI as input and segmented tumor from the MRIs and elaborated the four algorithms to detect HGG, and segmentation to detect the LGG works with CNN. The segmentation algorithm is compared with different existing algorithms and performs the automatic identification reasonably with high accuracy as per epochs generated with accuracy and loss curves. This article also describes how transfer learning has helped extract the image and resolution of the image and increase the segmentation accuracy in the case of LGG patients.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35235381/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35235381</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0280>10.1089/cmb.2021.0280</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35235381</guid>
      <pubDate>Wed, 02 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Ayman Altameem</dc:creator>
      <dc:creator>Basetty Mallikarjuna</dc:creator>
      <dc:creator>Abdul Khader Jilani Saudagar</dc:creator>
      <dc:creator>Meenakshi Sharma</dc:creator>
      <dc:creator>Ramesh Chandra Poonia</dc:creator>
      <dc:date>2022-03-02</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Improvement of Automatic Glioma Brain Tumor Detection Using Deep Convolutional Neural Networks</dc:title>
      <dc:identifier>pmid:35235381</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0280</dc:identifier>
    </item>
    <item>
      <title>A Mathematical Framework for Analyzing Wild Tomato Root Architecture</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35235373/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The root architecture of wild tomato, Solanum pimpinellifolium, can be viewed as a network connecting the main root to various lateral roots. Several constraints have been proposed on the structure of such biological networks, including minimizing the total amount of wire necessary for constructing the root architecture (wiring cost), and minimizing the distances (and by extension, resource transport time) between the base of the main root and the lateral roots (conduction delay). For a given...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):306-316. doi: 10.1089/cmb.2021.0361. Epub 2022 Mar 2.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">The root architecture of wild tomato, <i>Solanum pimpinellifolium</i>, can be viewed as a network connecting the main root to various lateral roots. Several constraints have been proposed on the structure of such biological networks, including minimizing the total amount of wire necessary for constructing the root architecture (wiring cost), and minimizing the distances (and by extension, resource transport time) between the base of the main root and the lateral roots (conduction delay). For a given set of lateral root tip locations, these two objectives compete with each other-optimizing one results in poorer performance on the other-raising the question how well <i>S. pimpinellifolium</i> root architectures balance this network design trade-off in a distributed manner. In this study, we describe how well <i>S. pimpinellifolium</i> roots resolve this trade-off using the theory of Pareto optimality. We describe a mathematical model for characterizing the network structure and design trade-offs governing the structure of <i>S. pimpinellifolium</i> root architecture. We demonstrate that <i>S. pimpinellifolium</i> arbors construct architectures that are more optimal than would be expected by chance. 
Finally, we use this framework to quantify structural differences between arbors grown in the presence of salt stress, classify arbors into four distinct architectural ideotypes, and test for heritability of variation in root architecture structure.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35235373/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35235373</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0361>10.1089/cmb.2021.0361</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35235373</guid>
      <pubDate>Wed, 02 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Arjun Chandrasekhar</dc:creator>
      <dc:creator>Magdalena M Julkowska</dc:creator>
      <dc:date>2022-03-02</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>A Mathematical Framework for Analyzing Wild Tomato Root Architecture</dc:title>
      <dc:identifier>pmid:35235373</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0361</dc:identifier>
    </item>
    <item>
      <title>Optimal Solution of a Fractional HIV/AIDS Epidemic Mathematical Model</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35230161/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>This article presents a fractional mathematical model of the human immunodeficiency virus (HIV)/AIDS spread with a fractional derivative of the Caputo type. The model includes five compartments corresponding to the variables describing the susceptible patients, HIV-infected patients, people with AIDS but not receiving antiretroviral treatment, patients being treated, and individuals who are immune to HIV infection by sexual contact. Moreover, it is assumed that the total population is constant....</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar;29(3):276-291. doi: 10.1089/cmb.2021.0253. Epub 2022 Feb 25.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">This article presents a fractional mathematical model of the human immunodeficiency virus (HIV)/AIDS spread with a fractional derivative of the Caputo type. The model includes five compartments corresponding to the variables describing the susceptible patients, HIV-infected patients, people with AIDS but not receiving antiretroviral treatment, patients being treated, and individuals who are immune to HIV infection by sexual contact. Moreover, it is assumed that the total population is constant. We construct an optimization technique supported by a class of basis functions, consisting of the generalized shifted Jacobi polynomials (GSJPs). The solution of the fractional HIV/AIDS epidemic model is approximated by means of GSJPs with coefficients and parameters in the matrix form. After calculating and combining the operational matrices with the Lagrange multipliers, we obtain the optimization method. Theorems on the existence, uniqueness, and convergence of the method are proved. Several illustrative examples show the performance of the proposed method. Mathematics Subject Classification: 97M60; 41A58; 92C42.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35230161/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35230161</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0253>10.1089/cmb.2021.0253</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35230161</guid>
      <pubDate>Tue, 01 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Hossein Hassani</dc:creator>
      <dc:creator>Zakieh Avazzadeh</dc:creator>
      <dc:creator>J A Tenreiro Machado</dc:creator>
      <dc:creator>Praveen Agarwal</dc:creator>
      <dc:creator>Maryam Bakhtiar</dc:creator>
      <dc:date>2022-03-01</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Optimal Solution of a Fractional HIV/AIDS Epidemic Mathematical Model</dc:title>
      <dc:identifier>pmid:35230161</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0253</dc:identifier>
    </item>
    <item>
      <title>Trade-offs of Linear Mixed Models in Genome-Wide Association Studies</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35230156/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar;29(3):233-242. doi: 10.1089/cmb.2021.0157. Epub 2022 Feb 25.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification to this technique to trade off velocity against veracity. Second, we investigate how mixed models can correct confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding factors-population stratification and environmental confounding factors-and study how different methods that are commonly used in practice trade off these two confounding factors differently.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35230156/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35230156</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8968846/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8968846</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0157>10.1089/cmb.2021.0157</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35230156</guid>
      <pubDate>Tue, 01 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Haohan Wang</dc:creator>
      <dc:creator>Bryon Aragam</dc:creator>
      <dc:creator>Eric P Xing</dc:creator>
      <dc:date>2022-03-01</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Trade-offs of Linear Mixed Models in Genome-Wide Association Studies</dc:title>
      <dc:identifier>pmid:35230156</dc:identifier>
      <dc:identifier>pmc:PMC8968846</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0157</dc:identifier>
    </item>
    <item>
      <title>Comparing Phylogenetic Trees Side by Side Through iPhyloC, a New Interactive Web-Based Framework</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35230147/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Current frameworks for side-by-side comparison of phylogenetic trees face two issues: (1) they accept mainly binary trees as input and (2) they assume that input trees have identical or highly overlapping taxa. However, cladistic comparative studies often deal with multiple not fully resolved trees with nonidentical sets of taxa. We tackle these issues in this study, presenting iPhyloC, an interactive web-based framework for comparing phylogenetic trees side by side. iPhyloC supports automatic...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar;29(3):292-303. doi: 10.1089/cmb.2021.0351. Epub 2022 Feb 25.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Current frameworks for side-by-side comparison of phylogenetic trees face two issues: (1) they accept mainly binary trees as input and (2) they assume that input trees have identical or highly overlapping taxa. However, cladistic comparative studies often deal with multiple not fully resolved trees with nonidentical sets of taxa. We tackle these issues in this study, presenting iPhyloC, an interactive web-based framework for comparing phylogenetic trees side by side. iPhyloC supports automatic identification of the common taxa in the input trees, comparison options between them, intuitive design, high usability, scalability to large trees, and cross-platform support. iPhyloC was tested using different trees and a supertree depicting the phylogenetic relationships within the insect order Diptera as examples.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35230147/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35230147</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0351>10.1089/cmb.2021.0351</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35230147</guid>
      <pubDate>Tue, 01 Mar 2022 06:00:00 -0500</pubDate>
      <dc:creator>Muhsen Hammoud</dc:creator>
      <dc:creator>Charles Morphy D Santos</dc:creator>
      <dc:creator>João Paulo Gois</dc:creator>
      <dc:date>2022-03-01</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Comparing Phylogenetic Trees Side by Side Through iPhyloC, a New Interactive Web-Based Framework</dc:title>
      <dc:identifier>pmid:35230147</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0351</dc:identifier>
    </item>
    <item>
      <title>An Upper and Lower Bound for the Convergence Time of House-Hunting in &lt;em&gt;Temnothorax&lt;/em&gt; Ant Colonies</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35196137/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>We study the problem of house-hunting in ant colonies, where ants reach consensus on a new nest and relocate their colony to that nest, from a distributed computing perspective. We propose a house-hunting algorithm that is biologically inspired by Temnothorax ants. Each ant is modeled as a probabilistic agent with limited power, and there is no central control governing the ants. We show an Ω(logn) lower bound on the running time of our proposed house-hunting algorithm, where n is the number of...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):344-357. doi: 10.1089/cmb.2021.0364. Epub 2022 Feb 22.</p><p><b>ABSTRACT</b></p><p xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:p1="http://pubmed.gov/pub-one">We study the problem of house-hunting in ant colonies, where ants reach consensus on a new nest and relocate their colony to that nest, from a distributed computing perspective. We propose a house-hunting algorithm that is biologically inspired by <i>Temnothorax</i> ants. Each ant is modeled as a probabilistic agent with limited power, and there is no central control governing the ants. We show an <mml:math><mml:mi>Ω</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> lower bound on the running time of our proposed house-hunting algorithm, where <i>n</i> is the number of ants. Furthermore, we show a matching upper bound of expected <mml:math><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>log</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math> rounds for environments with only one candidate nest for the ants to move to. Our work provides insights into the house-hunting process, giving a perspective on how environmental factors such as nest quality or a quorum rule can affect the emigration process.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35196137/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35196137</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0364>10.1089/cmb.2021.0364</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35196137</guid>
      <pubDate>Wed, 23 Feb 2022 06:00:00 -0500</pubDate>
      <dc:creator>Emily Zhang</dc:creator>
      <dc:creator>Jiajia Zhao</dc:creator>
      <dc:creator>Nancy Lynch</dc:creator>
      <dc:date>2022-02-23</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>An Upper and Lower Bound for the Convergence Time of House-Hunting in &lt;em&gt;Temnothorax&lt;/em&gt; Ant Colonies</dc:title>
      <dc:identifier>pmid:35196137</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0364</dc:identifier>
    </item>
    <item>
      <title>Scalable Species Tree Inference with External Constraints</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35196115/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although there is generally some information available about the species trees that could be used to speed up the estimation, only one species tree estimation method that addresses gene tree discordance-ASTRAL-J, a recent development in the ASTRAL family of methods-is able to use this information. Here we describe...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb 21. doi: 10.1089/cmb.2021.0543. Online ahead of print.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><b>Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although there is generally some information available about the species trees that could be used to speed up the estimation, only one species tree estimation method that addresses gene tree discordance-ASTRAL-J, a recent development in the ASTRAL family of methods-is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree, given a partial knowledge of the species tree in the form of a nonbinary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J and we prove that all three methods are statistically consistent under the multispecies coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. 
An analysis of the Avian Phylogenomics Project data set with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for ASTRAL-J and FASTRAL-J, respectively).</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35196115/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35196115</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0543>10.1089/cmb.2021.0543</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35196115</guid>
      <pubDate>Wed, 23 Feb 2022 06:00:00 -0500</pubDate>
      <dc:creator>Baqiao Liu</dc:creator>
      <dc:creator>Tandy Warnow</dc:creator>
      <dc:date>2022-02-23</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Scalable Species Tree Inference with External Constraints</dc:title>
      <dc:identifier>pmid:35196115</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0543</dc:identifier>
    </item>
    <item>
      <title>RECOMB 2021 Special Issue</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35179993/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>No abstract</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):91. doi: 10.1089/cmb.2021.29051.jp.</p><p><b>NO ABSTRACT</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35179993/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35179993</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.29051.jp>10.1089/cmb.2021.29051.jp</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35179993</guid>
      <pubDate>Fri, 18 Feb 2022 06:00:00 -0500</pubDate>
      <dc:creator>Jian Peng</dc:creator>
      <dc:date>2022-02-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>RECOMB 2021 Special Issue</dc:title>
      <dc:identifier>pmid:35179993</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.29051.jp</dc:identifier>
    </item>
    <item>
      <title>The Statistics of &lt;em&gt;k&lt;/em&gt;-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35108101/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>k-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):155-168. doi: 10.1089/cmb.2021.0431. Epub 2022 Feb 1.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><i>k</i>-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence <i>S</i> (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability <i>r</i>, under the assumption that there are no spurious <i>k</i>-mer matches. How does this process affect the <i>k</i>-mers of <i>S</i>? We derive the expectation and variance of the number of mutated <i>k</i>-mers and of the number of islands (a maximal interval of mutated <i>k</i>-mers) and oceans (a maximal interval of nonmutated <i>k</i>-mers). We then derive hypothesis tests and confidence intervals (CIs) for <i>r</i> given an observed number of mutated <i>k</i>-mers, or, alternatively, given the Jaccard similarity (with or without MinHash). We demonstrate the usefulness of our results using a few select applications: obtaining a CI to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long-read alignments to a de Bruijn graph by Jabba.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35108101/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35108101</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0431>10.1089/cmb.2021.0431</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35108101</guid>
      <pubDate>Wed, 02 Feb 2022 06:00:00 -0500</pubDate>
      <dc:creator>Antonio Blanca</dc:creator>
      <dc:creator>Robert S Harris</dc:creator>
      <dc:creator>David Koslicki</dc:creator>
      <dc:creator>Paul Medvedev</dc:creator>
      <dc:date>2022-02-02</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>The Statistics of &lt;em&gt;k&lt;/em&gt;-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches</dc:title>
      <dc:identifier>pmid:35108101</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0431</dc:identifier>
    </item>
    <item>
      <title>ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35073170/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly related templates are available. Here we report a novel deep learning approach...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):92-105. doi: 10.1089/cmb.2021.0430. Epub 2022 Jan 21.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific as their characteristic patterns are tightly related to sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by the current threading approaches. For a query protein and a template, we apply the neural network to directly infer likelihoods of all possible residue pairs in their entirety, which could effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. 
Tested on three independent data sets with a total of 6688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than the existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader. These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35073170/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35073170</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8892980/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8892980</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0430>10.1089/cmb.2021.0430</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35073170</guid>
      <pubDate>Mon, 24 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Lupeng Kong</dc:creator>
      <dc:creator>Fusong Ju</dc:creator>
      <dc:creator>Wei-Mou Zheng</dc:creator>
      <dc:creator>Jianwei Zhu</dc:creator>
      <dc:creator>Shiwei Sun</dc:creator>
      <dc:creator>Jinbo Xu</dc:creator>
      <dc:creator>Dongbo Bu</dc:creator>
      <dc:date>2022-01-24</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs</dc:title>
      <dc:identifier>pmid:35073170</dc:identifier>
      <dc:identifier>pmc:PMC8892980</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0430</dc:identifier>
    </item>
    <item>
      <title>Uncovering Molecular Mechanisms of Drug Resistance via Network-Constrained Common Structure Identification</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35073162/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Uncovering mechanisms of acquired drug resistance has garnered increasing attention worldwide as drug resistance reduces antibiotic and chemotherapy effectiveness. Most bioinformatics studies have elucidated these mechanisms based on differentially expressed gene (DEG) analysis. However, considering the associated complex network of biological systems, the specific molecular interactions must also be studied to obtain a complete understanding of the mechanisms related to drug resistance....</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Mar;29(3):257-275. doi: 10.1089/cmb.2021.0314. Epub 2022 Jan 21.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Uncovering mechanisms of acquired drug resistance has garnered increasing attention worldwide as drug resistance reduces antibiotic and chemotherapy effectiveness. Most bioinformatics studies have elucidated these mechanisms based on differentially expressed gene (DEG) analysis. However, considering the associated complex network of biological systems, the specific molecular interactions must also be studied to obtain a complete understanding of the mechanisms related to drug resistance. Accordingly, by analyzing sample-specific gene networks, we sought to elucidate mechanisms of acquired drug resistance of cells based on molecular interactions between genes. In the current study, we focused on gefitinib and erlotinib and characterized cell lines based on their sensitivity. We also considered CRISPR knockout screening of the target gene, epidermal growth factor receptor (<i>EGFR</i>), as a characteristic of cells. Subsequently, we constructed a drug sensitivity-CRISPR knockout screen-specific gene network. To identify the molecular mechanisms of drug resistance from the multiple large-scale networks, we proposed a novel computational method, designated network-constrained sparse common component analysis (NetSCCA), that extracts common structures of multiple networks characterizing molecular interaction in drug-sensitive and drug-resistant cell lines. We then applied NetSCCA to multilayer networks of candidate drug-response genes to identify common structures of the regulatory system in drug-sensitive and <i>EGFR</i>-dependent cells, and drug-resistant and <i>EGFR</i>-independent cells. 
NetSCCA identified crucial common targets and regulator genes that dominate multiple networks in drug-sensitive and drug-resistant cell lines, respectively. Our analysis for common structure identification based on NetSCCA has the capacity to characterize the molecular interplay between genes and crucial markers related to mechanisms of acquired drug resistance that cannot be revealed by analysis based solely on DEG analysis. The biological mechanisms associated with gefitinib and erlotinib sensitivity of identified genes were verified through the literature. We expect that the proposed method will serve as a useful tool for uncovering not only drug resistance mechanisms but also complex biological systems based on massive genomic data sets.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35073162/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35073162</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0314>10.1089/cmb.2021.0314</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35073162</guid>
      <pubDate>Mon, 24 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Heewon Park</dc:creator>
      <dc:creator>Rui Yamaguchi</dc:creator>
      <dc:creator>Seiya Imoto</dc:creator>
      <dc:creator>Satoru Miyano</dc:creator>
      <dc:date>2022-01-24</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Uncovering Molecular Mechanisms of Drug Resistance via Network-Constrained Common Structure Identification</dc:title>
      <dc:identifier>pmid:35073162</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0314</dc:identifier>
    </item>
    <item>
      <title>RECOMB 2021 Special Issue</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35050716/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>No abstract</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Jan;29(1):2. doi: 10.1089/cmb.2021.29050.jp.</p><p><b>NO ABSTRACT</b></p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35050716/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35050716</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.29050.jp>10.1089/cmb.2021.29050.jp</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35050716</guid>
      <pubDate>Thu, 20 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Jian Peng</dc:creator>
      <dc:date>2022-01-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>RECOMB 2021 Special Issue</dc:title>
      <dc:identifier>pmid:35050716</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.29050.jp</dc:identifier>
    </item>
    <item>
      <title>GRNUlar: A Deep Learning Framework for Recovering Single-Cell Gene Regulatory Networks</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35050715/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Jan;29(1):27-44. doi: 10.1089/cmb.2021.0437.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35050715/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35050715</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0437>10.1089/cmb.2021.0437</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35050715</guid>
      <pubDate>Thu, 20 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Harsh Shrivastava</dc:creator>
      <dc:creator>Xiuwei Zhang</dc:creator>
      <dc:creator>Le Song</dc:creator>
      <dc:creator>Srinivas Aluru</dc:creator>
      <dc:date>2022-01-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>GRNUlar: A Deep Learning Framework for Recovering Single-Cell Gene Regulatory Networks</dc:title>
      <dc:identifier>pmid:35050715</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0437</dc:identifier>
    </item>
    <item>
      <title>SCOT: Single-Cell Multi-Omics Alignment with Optimal Transport</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35050714/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Jan;29(1):3-18. doi: 10.1089/cmb.2021.0446.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise correspondences. We present single-cell alignment with optimal transport (SCOT), an unsupervised algorithm that uses the Gromov-Wasserstein optimal transport to align single-cell multi-omics data sets. SCOT performs on par with the current state-of-the-art unsupervised alignment methods, is faster, and requires tuning of fewer hyperparameters. More importantly, SCOT uses a self-tuning heuristic to guide hyperparameter selection based on the Gromov-Wasserstein distance. Thus, in the fully unsupervised setting, SCOT aligns single-cell data sets better than the existing methods without requiring any orthogonal correspondence information.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35050714/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35050714</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8812493/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8812493</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0446>10.1089/cmb.2021.0446</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35050714</guid>
      <pubDate>Thu, 20 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Pinar Demetci</dc:creator>
      <dc:creator>Rebecca Santorella</dc:creator>
      <dc:creator>Björn Sandstede</dc:creator>
      <dc:creator>William Stafford Noble</dc:creator>
      <dc:creator>Ritambhara Singh</dc:creator>
      <dc:date>2022-01-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>SCOT: Single-Cell Multi-Omics Alignment with Optimal Transport</dc:title>
      <dc:identifier>pmid:35050714</dc:identifier>
      <dc:identifier>pmc:PMC8812493</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0446</dc:identifier>
    </item>
    <item>
      <title>The Power of Population Effect in &lt;em&gt;Temnothorax&lt;/em&gt; Ant House-Hunting: A Computational Modeling Approach</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35049358/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>The decentralized cognition of animal groups is both a challenging biological problem and a potential basis for bioinspired design. In this study, we investigated the house-hunting algorithm used by emigrating colonies of Temnothorax ants to reach consensus on a new nest. We developed a tractable model that encodes accurate individual behavior rules, and estimated our parameter values by matching simulated behaviors with observed ones on both the individual and group levels. We then used our...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Apr;29(4):382-408. doi: 10.1089/cmb.2021.0369. Epub 2022 Jan 20.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">The decentralized cognition of animal groups is both a challenging biological problem and a potential basis for bioinspired design. In this study, we investigated the house-hunting algorithm used by emigrating colonies of <i>Temnothorax</i> ants to reach consensus on a new nest. We developed a tractable model that encodes accurate individual behavior rules, and estimated our parameter values by matching simulated behaviors with observed ones on both the individual and group levels. We then used our model to explore a potential, but as yet untested, component of the ants' decision algorithm. Specifically, we examined the hypothesis that incorporating site population (the number of adult ants at each potential nest site) into individual perceptions of nest quality can improve emigration performance. Our results showed that attending to site population accelerates emigration and reduces the incidence of split decisions. This result suggests the value of testing empirically whether nest site scouts use site population in this way, in addition to the well-demonstrated quorum rule. We also used our model to make other predictions with varying degrees of empirical support, including the high cognitive capacity of colonies and their rational time investment during decision-making. In addition, we provide a versatile and easy-to-use Python simulator that can be used to explore other hypotheses or make testable predictions. 
It is our hope that the insights and the modeling tools can inspire further research from both the biology and computer science community.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35049358/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35049358</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0369>10.1089/cmb.2021.0369</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35049358</guid>
      <pubDate>Thu, 20 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Jiajia Zhao</dc:creator>
      <dc:creator>Nancy Lynch</dc:creator>
      <dc:creator>Stephen C Pratt</dc:creator>
      <dc:date>2022-01-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>The Power of Population Effect in &lt;em&gt;Temnothorax&lt;/em&gt; Ant House-Hunting: A Computational Modeling Approach</dc:title>
      <dc:identifier>pmid:35049358</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0369</dc:identifier>
    </item>
    <item>
      <title>Set-Min Sketch: A Probabilistic Map for Power-Law Distributions with Application to &lt;em&gt;k&lt;/em&gt;-Mer Annotation</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35049334/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>k-mer counts are important features used by many bioinformatics pipelines. Existing k-mer counting methods focus on optimizing either time or memory usage, producing as output very large count tables explicitly representing k-mers together with their counts. Storing k-mers is not needed if the set of k-mers is known, making it possible to only keep counters and their association to k-mers. Solutions avoiding explicit representation of k-mers include Minimal Perfect Hash Functions (MPHFs) and...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):140-154. doi: 10.1089/cmb.2021.0429. Epub 2022 Jan 18.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one"><i>k</i>-mer counts are important features used by many bioinformatics pipelines. Existing <i>k</i>-mer counting methods focus on optimizing either time or memory usage, producing as output very large count tables explicitly representing <i>k</i>-mers together with their counts. Storing <i>k</i>-mers is not needed if the set of <i>k</i>-mers is known, making it possible to only keep counters and their association to <i>k</i>-mers. Solutions avoiding explicit representation of <i>k</i>-mers include Minimal Perfect Hash Functions (MPHFs) and Count-Min sketches. We introduce Set-Min sketch, a sketching technique for representing associative maps inspired by Count-Min, and apply it to the problem of representing <i>k</i>-mer count tables. Set-Min is provably more accurate than both Count-Min and Max-Min, an improved variant of Count-Min for static datasets that we define here. We show that Set-Min sketch provides a very low error rate, in terms of both the probability and the size of errors, at the expense of a very moderate memory increase. On the other hand, Set-Min sketches are shown to take up to an order of magnitude less space than MPHF-based solutions, for fully assembled genomes and large <i>k</i>. Space-efficiency of Set-Min in this case takes advantage of the power-law distribution of <i>k</i>-mer counts in genomic datasets.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35049334/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35049334</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0429>10.1089/cmb.2021.0429</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35049334</guid>
      <pubDate>Thu, 20 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Yoshihiro Shibuya</dc:creator>
      <dc:creator>Djamal Belazzougui</dc:creator>
      <dc:creator>Gregory Kucherov</dc:creator>
      <dc:date>2022-01-20</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Set-Min Sketch: A Probabilistic Map for Power-Law Distributions with Application to &lt;em&gt;k&lt;/em&gt;-Mer Annotation</dc:title>
      <dc:identifier>pmid:35049334</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0429</dc:identifier>
    </item>
    <item>
      <title>flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35041529/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):195-211. doi: 10.1089/cmb.2021.0436. Epub 2022 Jan 17.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called <i>flopp</i>. We show that flopp compares favorably with state-of-the-art algorithms: up to 30 times faster with 2 times fewer switch errors on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid <i>Solanum tuberosum</i> (potato).</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35041529/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35041529</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8892958/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8892958</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0436>10.1089/cmb.2021.0436</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35041529</guid>
      <pubDate>Tue, 18 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Jim Shaw</dc:creator>
      <dc:creator>Yun William Yu</dc:creator>
      <dc:date>2022-01-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning</dc:title>
      <dc:identifier>pmid:35041529</dc:identifier>
      <dc:identifier>pmc:PMC8892958</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0436</dc:identifier>
    </item>
    <item>
      <title>Finding Maximal Exact Matches Using the r-Index</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35041518/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Efficiently finding maximal exact matches (MEMs) between a sequence read and a database of genomes is a key first step in read alignment. But until recently, it was unknown how to build a data structure in [Formula: see text] space that supports efficient MEM finding, where r is the number of runs in the Burrows-Wheeler Transform. In 2021, Rossi et al. showed how to build a small auxiliary data structure called thresholds in addition to the r-index in [Formula: see text] space. This addition...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):188-194. doi: 10.1089/cmb.2021.0445. Epub 2022 Jan 17.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Efficiently finding maximal exact matches (MEMs) between a sequence read and a database of genomes is a key first step in read alignment. But until recently, it was unknown how to build a data structure in [Formula: see text] space that supports efficient MEM finding, where <i>r</i> is the number of runs in the Burrows-Wheeler Transform. In 2021, Rossi et al. showed how to build a small auxiliary data structure called <i>thresholds</i> in addition to the <i>r</i>-index in [Formula: see text] space. This addition enables efficient MEM finding using the <i>r</i>-index. In this article, we present the tool that implements this solution, which we call MONI. Namely, we give a high-level view of the main components of the data structure and show how the source code can be downloaded, compiled, and used to find MEMs between a set of sequence reads and a set of genomes.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35041518/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35041518</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8902461/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8902461</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0445>10.1089/cmb.2021.0445</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35041518</guid>
      <pubDate>Tue, 18 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Massimiliano Rossi</dc:creator>
      <dc:creator>Marco Oliva</dc:creator>
      <dc:creator>Paola Bonizzoni</dc:creator>
      <dc:creator>Ben Langmead</dc:creator>
      <dc:creator>Travis Gagie</dc:creator>
      <dc:creator>Christina Boucher</dc:creator>
      <dc:date>2022-01-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Finding Maximal Exact Matches Using the r-Index</dc:title>
      <dc:identifier>pmid:35041518</dc:identifier>
      <dc:identifier>pmc:PMC8902461</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0445</dc:identifier>
    </item>
    <item>
      <title>MONI: A Pangenomic Index for Finding Maximal Exact Matches</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35041495/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index itself cannot efficiently support popular and important queries such as finding...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):169-187. doi: 10.1089/cmb.2021.0290. Epub 2022 Jan 17.</p><p><b>ABSTRACT</b></p><p xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:p1="http://pubmed.gov/pub-one">Recently, Gagie et al. proposed a version of the FM-index, called the <i>r</i>-index, that can store thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to build the <i>r</i>-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the <i>r</i>-index itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds and showed that storing them together with the <i>r</i>-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the <i>r</i>-index and find the thresholds simultaneously, in linear time and space with respect to the size of the prefix-free parse. Our implementation, called MONI, can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared with other read aligners (PuffAligner, Bowtie2, BWA-MEM, and CHIC), MONI used 2-11 times less memory and was 2-32 times faster for index construction. Moreover, MONI was less than one thousandth the size of competing indexes for large collections of human chromosomes. Thus, MONI represents a major advance in our ability to perform MEM finding against very large collections of related references.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35041495/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35041495</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8892979/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8892979</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0290>10.1089/cmb.2021.0290</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35041495</guid>
      <pubDate>Tue, 18 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Massimiliano Rossi</dc:creator>
      <dc:creator>Marco Oliva</dc:creator>
      <dc:creator>Ben Langmead</dc:creator>
      <dc:creator>Travis Gagie</dc:creator>
      <dc:creator>Christina Boucher</dc:creator>
      <dc:date>2022-01-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>MONI: A Pangenomic Index for Finding Maximal Exact Matches</dc:title>
      <dc:identifier>pmid:35041495</dc:identifier>
      <dc:identifier>pmc:PMC8892979</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0290</dc:identifier>
    </item>
    <item>
      <title>Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35041494/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, and the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes where reads may be sequenced...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Feb;29(2):121-139. doi: 10.1089/cmb.2021.0444. Epub 2022 Jan 17.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, and the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes where reads may be sequenced from unannotated transcripts. Graph quantification is a generalization of transcript quantification, accounting for the reference incompleteness by allowing exponentially many unannotated transcripts to express reads. We propose methods to calculate a "confidence range of expression" for each transcript, representing its possible abundance across equally optimal estimates for both quantification models. This range informs both whether a transcript has potential estimation error due to nonidentifiability and the extent of the error. Applying our methods to the Human Body Map data, we observe that 35%-50% of transcripts potentially suffer from inaccurate quantification caused by nonidentifiability. When comparing the expression between isoforms in one sample, we find that the degree of inaccuracy of 20%-47% of transcripts can be so large that the ranking of expression between the transcript and other isoforms from the same gene cannot be determined. When comparing the expression of a transcript between two groups of RNA-seq samples in differential expression analysis, we observe that the majority of detected differentially expressed transcripts are reliable, with a few exceptions, after considering the ranges of the optimal expression estimates.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35041494/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35041494</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8892959/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8892959</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0444>10.1089/cmb.2021.0444</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35041494</guid>
      <pubDate>Tue, 18 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Hongyu Zheng</dc:creator>
      <dc:creator>Cong Ma</dc:creator>
      <dc:creator>Carl Kingsford</dc:creator>
      <dc:date>2022-01-18</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability</dc:title>
      <dc:identifier>pmid:35041494</dc:identifier>
      <dc:identifier>pmc:PMC8892959</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0444</dc:identifier>
    </item>
    <item>
      <title>Simulating Single-Cell Gene Expression Count Data with Preserved Gene Correlations by scDesign2</title>
      <link>https://pubmed.ncbi.nlm.nih.gov/35020490/?utm_source=Other&amp;utm_medium=rss&amp;utm_campaign=None&amp;utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&amp;fc=None&amp;ff=20220524180447&amp;v=2.17.6</link>
      <description>scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before...</description>
      <content:encoded><![CDATA[<div><p style="color: #4aa564;"><b>J Comput Biol</b>. 2022 Jan;29(1):23-26. doi: 10.1089/cmb.2021.0440. Epub 2022 Jan 11.</p><p><b>ABSTRACT</b></p><p xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:p1="http://pubmed.gov/pub-one">scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before model fitting and data simulation.</p><p style="color: lightgray">PMID:<a href="https://pubmed.ncbi.nlm.nih.gov/35020490/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">35020490</a> | PMC:<a href="https://www.ncbi.nlm.nih.gov/pmc/PMC8812500/?utm_source=Other&utm_medium=rss&utm_content=0Ee9dQsQb9k95_3hhH3_53l8gWboCiv_mwvdV_sakL5&ff=20220524180447&v=2.17.6">PMC8812500</a> | DOI:<a href=https://doi.org/10.1089/cmb.2021.0440>10.1089/cmb.2021.0440</a></p></div>]]></content:encoded>
      <guid isPermaLink="false">pubmed:35020490</guid>
      <pubDate>Wed, 12 Jan 2022 06:00:00 -0500</pubDate>
      <dc:creator>Tianyi Sun</dc:creator>
      <dc:creator>Dongyuan Song</dc:creator>
      <dc:creator>Wei Vivian Li</dc:creator>
      <dc:creator>Jingyi Jessica Li</dc:creator>
      <dc:date>2022-01-12</dc:date>
      <dc:source>Journal of computational biology : a journal of computational molecular cell biology</dc:source>
      <dc:title>Simulating Single-Cell Gene Expression Count Data with Preserved Gene Correlations by scDesign2</dc:title>
      <dc:identifier>pmid:35020490</dc:identifier>
      <dc:identifier>pmc:PMC8812500</dc:identifier>
      <dc:identifier>doi:10.1089/cmb.2021.0440</dc:identifier>
    </item>
  </channel>
</rss>
