SBOI Meeting Location and Schedule


Location


Schedule

Spring 2019

  • Jan 14: Prediction of bacterial E3 ubiquitin ligase effectors using reduced amino acid peptide fingerprinting
  • Jan 21: Martin Luther King Day
  • Jan 28
  • Feb 4
  • Feb 11: Identifying KDIGO trajectory subphenotypes after acute kidney injury with increased mortality rates
    • Presenter: Taylor Smith
  • Feb 18
  • Feb 25
  • Mar 4: Medication application pattern mining using recurrent neural networks
    • Presenter: Sajjad Fouladvand
  • Mar 11
  • Mar 18
  • Mar 25: Computational Tools for the Untargeted Assignment of FT-MS Metabolomics Datasets
    • Presenter: Joshua Mitchell
  • Apr 1
  • Apr 8: Evidence of Peroxidase Catalysed Formation of Cysteine-Tyrosine and Dityrosine Cross-Linking in Mammalian Sperm Protamines
    • Presenter: Christian Powell
  • Apr 15: CCTS Spring Conference
  • Apr 22
  • Apr 29
    • Presenter: Brian Davis

Fall 2018 (6 Presentations)

  • Sept 10: Mining emerging phenomena on large-scale longitudinal phenomics data
    • Presenter: Chen Jin
    • Type: Research Seminar
  • Sept 17: Review of annotation enrichment analyses for omics-level datasets
    • Presenter: Hunter Moseley
  • Sept 24: A Machine Learning Approach to Computational Polypharmacology and Lessons Learned
    • Abstract:
      • It is common knowledge that drugs have polypharmacological properties that can be explored for new insights in drug discovery. That is, a given drug interacts with many different proteins, and a given protein interacts with multiple drugs. The polypharmacological, promiscuous nature of pharmaceuticals can have both beneficial and detrimental consequences. This attribute can be exploited to improve drug efficacy and prevent drug resistance. In addition to the ability of chemical compounds to interact with an array of protein targets, many diseases have multiple genetic determinants, and individual genetic determinants may be involved in multiple diseases. Furthermore, protein function and expression are controlled by a regulatory network of other proteins. When targeted therapies work initially, patients often develop resistance due to secondary mutations or compensation from other parts of the underlying biological network. This illustrates the potential benefits of establishing computational polypharmacology methods: discovering drugs that intentionally target multiple proteins for a beneficial therapeutic result. Many adverse drug reactions (ADRs) result from drugs interacting with non-therapeutic off-targets (unintended interactions). Animal studies during preclinical trials are not always a good indication of these adverse interactions in humans, and such adverse effects are generally not discovered until a drug has reached clinical trials or is already on the market. With the number of different proteins in humans and the genetic variations observable in the population, a full understanding of all possible interactions through experiments and clinical testing alone is not feasible, making computational investigations particularly useful and relevant.
      • In our digitized, data-driven world, there is a wealth of knowledge available that is beyond the processing power of an individual researcher or even a team of researchers. The abundance of available biomedical data, combined with the massive computing power available today on leadership-class supercomputers, provides great opportunities to advance computational drug research. A tool that reliably predicts protein-drug binding would revolutionize the pharmaceutical industry. An accurate representation of polypharmacological networks would provide a wealth of knowledge and insights on drug repurposing, side-effect prediction, and drug efficacy. This would lead the way to personalized polypharmacological networks that include individuals' genetic variations, resulting in a breakthrough for precision medicine. However, there are still many obstacles to overcome when it comes to utilizing massive computational power and ensuring the accuracy of our predictions.
      • Using machine learning to score potential drug candidates may offer an advantage over traditional, imprecise scoring functions because the parameters and model structure can be learned from the data. However, such models may lack interpretability, are often overfit to the data, and may not generalize to drug targets and chemotypes absent from the training data. Benchmark datasets are prone to artificial enrichment and analogue bias due to the overrepresentation of certain scaffolds in experimentally determined active sets. Datasets can be evaluated using spatial statistics to quantify dataset topology and better understand potential biases. Dataset clumping, which combines self-similarity of actives with separation from decoys in chemical space, is associated with overoptimistic virtual screening results. This talk explores data, methods, and potential data biases relevant to computational drug binding predictions.
    • Type: Research Presentation
    • Presenter: Sally Ellingson
  • Oct 1: Untargeted lipidomics of NSCLC shows differentially abundant lipid categories in cancer vs non-cancer
    • Presenter: Joshua Mitchell
    • Type: Research Seminar
  • Oct 8
  • Oct 15
  • Oct 22
  • Oct 29: Investigating the role of iron and the tumor microenvironment in breast cancer progression
    • Presenter: Luis Sordo Vieira
    • Type: Research Seminar
    • Abstract:
      • Breast cancer cells are addicted to iron. The mechanisms by which malignant cells acquire and maintain high levels of iron are not completely understood. Macrophages and fibroblasts in the tumor microenvironment are significant contributors to iron acquisition. In this talk, we will summarize our research in progress on a mathematical model of how iron affects breast cancer progression, and survey some of the results on this question.
  • Nov 5: Automatic 13C Chemical Shift Reference Correction of Protein NMR Spectral Data Using Data Mining and Bayesian Statistical Modeling
    • Presenter: Xi Chen
    • Type: Research Seminar
  • Nov 12: Computational exploration of the molecular basis of calmodulin-dependent calcineurin activation
    • Presenter: Bin Sun
    • Type: Research Seminar
    • Abstract: Calmodulin (CaM) binds to the CaM-recognition motif of calcineurin (CaN) with an affinity in the low picomolar range; however, this alone is insufficient to fully activate CaN. It has been shown that the CaN regulatory domain folds upon CaM binding and that a region C-terminal to the canonical CaM-binding region, the 'distal helix', becomes helical and is critical for activation. Intriguingly, a soybean-derived CaM variant competes with mammalian CaM and suppresses CaN activation. Further, although the plant variant exhibits relatively high sequence homology with the mammalian isoform, many of the non-conserved amino acids are positioned far from the canonical binding site, which suggests that secondary protein-protein interactions are responsible for regulating CaN activation. We hypothesized that plant CaM variants exhibit impaired distal helix/CaM interactions that prevent CaN activation. To test this hypothesis, we utilized molecular simulations, including replica-exchange molecular dynamics to model distal helix conformations, Brownian dynamics to generate trial distal helix/CaM poses, and conventional molecular dynamics to evaluate the stability of the predicted binding modes. From these simulations, we have isolated a potential binding site (site IV), which yields poses characterized by strong interprotein interactions and comparatively small conformational fluctuations. Further, molecular simulations of distal helix (A454D) and site IV (soybean CaM-inspired substitutions K30E and G40D) variants exhibit impaired interactions that correlate with reduced CaN activation in those systems. This study therefore provides a potential structural basis for the role of secondary CaM/CaN interactions in mediating CaN activation.
  • Nov 26
  • Dec 3

Spring 2018 (9 Presentations)

  • Jan 29: Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors
  • Feb 5: The (gruesome) eating habits of solid tumors
    • Presenter: Andrew Lane
  • Feb 12: Finding New Molecular Targets for Hepatocellular Carcinoma
  • Feb 19: No meeting
  • Feb 26: No meeting
  • Mar 5: Dual roles of electrostatic-steering and conformational dynamics in the binding of calcineurin’s intrinsically-disordered recognition domain to calmodulin
    • Presenter: Bin Sun
    • Abstract: Calcineurin (CaN) is a serine/threonine phosphatase that regulates a variety of physiological and pathophysiological processes in most mammalian tissues. It has been established that the calcineurin regulatory domain (RD) is highly disordered when inhibiting CaN, yet it undergoes a disorder-to-order transition upon binding calmodulin (CaM) to activate the phosphatase. The prevalence of polar and charged amino acids in the RD implicates electrostatic interactions in mediating CaM binding, yet it is unclear whether properties of the RD conformational ensemble, such as its effective volume and the accessibility of its CaM binding motif, help or hinder its ability to participate in protein-protein recognition events. In the present study, we used computational modeling to investigate the extent to which electrostatics and structural disorder co-facilitate or hinder CaM/CaN association kinetics. We examined several peptides containing the CaM binding motif, with varied lengths and amino acid charge distributions, using microsecond-scale molecular dynamics (MD) and Brownian dynamics (BD) simulations to isolate the contributions of electrostatics versus conformational diversity to predicted, diffusion-limited association rates. Our results indicate that RD amino acid composition and sequence length influence both the dynamic availability of conformations amenable to CaM binding and the long-range electrostatic interactions that steer association. These findings provide intriguing insight into the interplay between conformational diversity and electrostatically driven protein-protein association involving CaN, and are likely to extend to wide-ranging processes regulated by intrinsically disordered proteins.
  • Mar 12: CANCELED! Automatic 13C Chemical Shift Reference Correction for Unassigned Protein NMR Spectra
    • Presenter: Xi Chen
    • Abstract: Poor chemical shift referencing, especially for 13C in protein Nuclear Magnetic Resonance (NMR) experiments, fundamentally limits and can even prevent effective study of biomacromolecules via NMR, including protein structure determination and analysis of protein dynamics. To solve this problem, we constructed a Bayesian probabilistic framework that circumvents the limitations of previous reference correction methods, which required protein resonance assignment and/or protein structure. Our software, named Bayesian Model Optimized Reference Correction (BaMORC), can detect and correct 13C chemical shift referencing errors on the order of +/- 0.45 ppm at a 90% confidence interval (CI) before the protein resonance assignment step of the analysis. By combining the BaMORC methodology with a new intra-peaklist grouping algorithm, we created a combined method, referred to as SoBaMORC, that can be applied to unassigned experimental peak lists. SoBaMORC kept all tested experimental three-dimensional HN(CO)CACB-type peak lists within +/- 0.4 ppm of the correct 13C reference value. SoBaMORC can be applied to correct 13C chemical shift referencing errors when it will have the most impact: right before protein resonance assignment and other downstream analyses are started. Moreover, the web application allows non-NMR experts to quickly detect and correct 13C referencing errors before they try to use spectral data with referencing errors. Thus, this software lowers the bar of NMR expertise required to perform effective protein NMR studies. Software implementing SoBaMORC is available for download and through a web-based interface for use by the broader scientific community.
  • Mar 19: Mutational Characterization of Squamous Cell Lung Cancers from Appalachian Kentucky: Moving Closer to Personalized Treatment
    • Presenter: Hunter Moseley*
  • Mar 19: CANCELED! How can public engagement change the way you do and present your science
    • Presenter: Sylvie Garneau-Tsodikova
  • Mar 26: Detangling PPI networks to uncover functionally meaningful clusters
    • Presenter: Eugene Hinderer
  • April 2: CANCELED!
    • Presenter: Varun Dwaraka
  • April 9: Metabolic network segmentation: A probabilistic graphical modeling approach to identify the sites and sequential order of metabolic regulation from non-targeted metabolomics data
  • April 16: Developing a Global Homology Analysis for Comparative Genomics
    • Presenter: Kelly Sovacool
  • April 23: Determination of Protein Functional Regions from Pre-existing Protein-Level Annotations
    • Presenter: Christian Powell

Fall 2017 (9 presentations)

  • Aug 14 - Systems Biology Using Amazon Web Services
    • Presenter: Peyton Biggs, AWS Genomics
  • Aug 28
  • Sep 6 - The 22q11.2 Deletion Syndrome: Transmission and Variation Annotation
    • Presenter: Matthew Hestand
    • Abstract: The 22q11.2 deletion syndrome is the most common chromosomal deletion syndrome in humans, with an incidence of 1 in 2,000-4,000 live births, and shows extremely variable clinical presentations. Here we utilize multiple state-of-the-art genomic strategies, including fiber-FISH and whole-genome sequencing, to characterize the structure of the region. This includes identifying common and atypical deletion sizes, fine-tuning the deletion breakpoints, and identifying nested inversion polymorphisms that predispose parents to transmitting chromosome 22q11.2 deletions to their offspring. In addition, the hemizygous nature of the region offers the opportunity to evaluate mutations in relation to known recessive disorders, such as SNAP29 mutations causative for cerebral dysgenesis, neuropathy, ichthyosis and keratoderma, Kousseff, and a potentially autosomal recessive form of Opitz G/BBB syndrome. However, we also identify variation that appears damaging but is actually benign, such as SCARF2 mutations in patients who do not clinically present signs of Van den Ende-Gupta syndrome. Overall, these in-depth studies on more than one thousand individuals are providing a rich annotation detailing the structure, pathogenic and benign variation, and transmission of the 22q11.2 deletion.
  • Sep 11
  • Sep 18 - Data Quality & Consistency in Various Scientific Repositories
    • Presenter: Andrey Smelter (& others)
    • Abstract: Our lab recently developed mwtab, an API to the metabolomics repository Metabolomics Workbench. Using this API package to investigate the data sets in Metabolomics Workbench revealed some interesting data issues. In addition to Metabolomics Workbench, we have also had experience working with data from the Biological Magnetic Resonance Bank, RefDB, Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and Protein Data Bank. Each of these data repositories has gotchas that casual users may not be aware of.
  • Sep 25 - Small Molecule Isotope Resolved Formula Enumerator (SMIRFE): a tool for assigning isotopologues and metabolites in Fourier transform mass spectra
    • Presenter: Joshua Mitchell
    • Abstract: Fourier-transform mass spectrometry (FTMS) is often utilized in the detection of small molecules derived from biological samples. What is directly detected in FTMS spectra are peaks for related sets of isotopologues (molecules that differ only in their isotopic composition) of various adducted and charged species corresponding to specific molecules present in a given biological sample or introduced by contamination. The sheer complexity of what is detected, along with a variety of analytically introduced variance, error, and artifacts, has hindered the systematic analysis of the complex patterns of detected peaks. We have developed and prototyped a novel algorithm, SMIRFE, that detects small biomolecules less than 2000 daltons in mass at a desired statistical confidence and determines their specific elemental molecular formula (EMF) using detected cliques of related isotopologue peaks with compatible isotope-resolved molecular formulae (IMFs). The methodology works on mass spectra derived from experiments without stable isotope tracing, but especially on mass spectra from stable isotope tracing experiments that contain metabolites labeled with specific stable isotopes like 13C, 15N, and 2H from a given labeling source and/or from natural abundance. The current prototype efficiently searches a roughly 4.8 quintillion (4.8 x 10^18) IMF space for each peak’s m/z, based on molecular masses <= 2000 daltons, but larger IMF spaces are searchable. This approach avoids the key limitation of current methods, which can only detect known metabolites in a database. Thus, this new method enables the full interpretation of untargeted metabolomics studies through the identification of metabolites at the level of structural isomers representing the same EMF.
      We validated the assignment performance using verified assignments from a peak list derived from a Thermo Orbitrap Fusion Tribrid FTMS spectrum of a biological sample that had been treated with the ECF (2Cl-CO2Et) chemoselection agent. The current SMIRFE prototype provided both high accuracy for untargeted assignment of verified metabolite cliques and unambiguous IMF assignment for over half of the detected peaks in the tested peak list.
  • Oct 2
    • Presenter: Bradley Stewart
  • Oct 9 - Between-scan peak correspondence and normalization for direct-injection Fourier transform mass spectrometry data
    • Abstract: Direct-injection Fourier-transform mass spectrometry (FTMS) is employed by many research groups as a method in metabolomics, gathering abundance information about all metabolites potentially present in a biological system of interest. In many cases, the data are acquired in multiple “scans”, and the point intensities are averaged across scans for peak identification, fitting, and integration, resulting in upwards of tens of thousands of individual peaks for assignment. However, differences in the relative scale between scans can span multiple orders of magnitude, reducing the effectiveness of simple averaging approaches and introducing a significant number of noise peaks. Previous work in our lab has identified peak artifacts present in FTMS-acquired data. As a parallel to that work, we have developed methods for characterizing peaks in individual scans and subsequently combining peaks across multiple scans. The developed methods include peak identification, peak fitting and integration, and noise peak determination, followed by peak correspondence, normalization, and averaging across scans. Through the combination of these methods, the data density for a given sample is greatly reduced, going from 13000 or 30000 peaks to 2000 or 4000 peaks, while gaining information about the reliability of a peak via the number of scans it was observed in, as well as the (relative) standard deviation of the peak mass-to-charge ratio, height (often reported as intensity), and area. These scan-level peak characterizations and aggregations significantly improve downstream data analyses, including assignment to specific metabolite isotopologues.
    • Presenter: Robert M Flight
  • Oct 17 - Commonwealth Computational Summit
  • Oct 30 - Statistical analyses to detect and refine genetic associations with neurodegenerative diseases
    • Presenter: Yuriko Katsumata
  • Nov 6 - The Landscape of Isoform Switches in Human Cancers
  • Nov 13
  • Nov 27 - Understanding the chemistry of calcium signaling through computation: The Calcium, Calmodulin, Calcineurin signaling ‘triad’
    • Presenter: Peter Kekenes-Huskey
  • Dec 4
  • Dec 11

Summer 2017 (1 presentation)

Spring 2017 (15 presentations)

  • Jan 2 - Observing New Year's Day
  • Jan 9 - ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
  • Jan 16 - Martin Luther King Jr Day
  • Jan 23 - Reconstruction of biological pathways and metabolic networks from in silico labeled metabolites
  • Jan 30 - Insights into Disease-Associated Mutations in the Human Proteome through Protein Structural Analysis
  • Feb 6 - No Speaker
  • Feb 13 - Navigating the Evolving Computational Landscape
    • Presenter: Jim Griffioen
  • Feb 15 - CANCELED! - Modular Ontology Modeling for Data Access and Reuse (IBI Seminar)
    • Presenter: Pascal Hitzler, Wright State University
    • Location: Hardymon Theater in the Marksbury Building
    • Abstract: One of the original motivations for developing ontologies was that they were to act as generic domain models which can be easily reused and repurposed. However, ontology modeling for applications in practice is often driven by very concrete use cases, and thus the corresponding ontologies are often strongly tailored towards meeting very specific use case requirements. As a consequence, ontologies in practice are often not easy to repurpose, and their added value for data access and reuse is limited. In this presentation, we discuss how to model ontologies in such a way as to simplify future reuse. In particular, we will discuss modularization of ontologies, the role of ontology design patterns, and ontology views.
  • Feb 20 - Big Compute, Big Data, and Better Drugs, Beyond Docking: Increasing the Accuracy of Virtual Screens
    • Presenters: Sally Ellingson & Amir Kucharski
  • Feb 27 - Citizen Science, Data Science
    • Presenter: Jin Chen
  • Mar 6 - Exogenous Metabolic Enzymes as Therapies
    • Presenter: Chang-Guo Zhan
  • Mar 13 - Metabolomics of thrombotic myocardial infarction: systems characterization of plasma metabolome perturbations and the development of a diagnostic classifier
    • Presenter: Patrick Trainor, invited speaker from University of Louisville
    • Abstract: Heart disease is the leading cause of global mortality. Acute Myocardial Infarction (MI) is an acute disease event that is characterized by myocardial ischemia and necrosis. While myocardial necrosis is a unifying pathological characteristic and a central tenet of diagnostic criteria, detection of necrosis does not inform clinicians as to the antecedent cause. Specifically, MI may follow spontaneous atherosclerotic plaque disruption that results in a coronary thrombus (thrombotic MI) or may follow non-thrombotic causes such as coronary vasospasm that result in oxygen supply and demand mismatch (non-thrombotic MI). We sought to achieve two aims using untargeted metabolomic profiling of human plasma: (1) to determine the differential effects on human metabolism of thrombotic MI versus non-thrombotic MI and (2) to develop a preliminary diagnostic classifier capable of discriminating between thrombotic MI, non-thrombotic MI, and stable disease. We enrolled subjects presenting with thrombotic MI, non-thrombotic MI, or stable coronary artery disease (CAD) and quantified plasma metabolites by untargeted UPLC-MS/MS and GC-MS in the acute event phase and in a stable disease state for each subject. In this talk we discuss a systems approach for evaluating the dynamic change in modules of interrelated metabolites across the transition from a stable disease state to an acute event. Modules were inferred by analyzing the topology of a weighted network constructed from plasma abundances. We then pivot to a discussion of a methodology we have developed for automated metabolite selection for diagnostic classifiers, which borrows from related work in biologically inspired computing and artificial intelligence. We conclude with a discussion of our important findings and future research.
  • Mar 20 - Gene regulation in ischemia-reperfusion injury of the retina
    • Presenter: Kalina Andreeva
    • Abstract: Ischemia-reperfusion (IR) injuries are associated with several diseases and disorders of the retina. Current treatments often have poor outcomes, in part due to a lack of understanding of the molecular mechanisms and potential therapeutic targets. Our lab has generated mRNA and miRNA microarray expression data to investigate the transcriptional and post-transcriptional regulation of gene expression following induction of ischemia-reperfusion injury in the rat retina. We have identified several regulatory elements, including transcription factors (TFs), microRNAs (miRs), and mRNAs, all of which play key roles in the early and late phases of IR injury. Recently, a new class of non-coding RNA regulators, termed circular RNAs (circRNAs), has been reported to be encoded in the genome and expressed in all tissues and cell types investigated thus far. We have examined the genome-wide expression of circRNAs in a rat model of retinal ischemia. The analyses revealed that thousands of circRNAs accumulate in the neural rat retina and that their accumulation was altered in the IR-injured eye when compared with the corresponding sham control.
  • Mar 27 - Comparative Transcriptomics of Limb Regeneration: Identification of Conserved Gene Expression Changes Among Three Species of Ambystoma
    • Presenter: Varun Dwaraka
    • Abstract: Advances in sequencing technologies and analyses are beginning to allow robust testing of the gene networks underlying limb regeneration. To elucidate a core set of genes that are commonly expressed among Ambystomatid salamanders that elicit a natural regenerative response, we used a comparative approach between close and distant relatives of Ambystoma mexicanum, the Mexican axolotl. We reasoned that it would be possible to identify and parse species-specific expression differences using newly developed expression analyses. Here we report commonly expressed genes among three naturally regenerating Ambystomatid species, A. mexicanum, A. andersoni, and A. maculatum, at 24 hours of wound healing.
  • April 3 - Bioinformatic approaches to characterizing salamander sex chromosomes
    • Presenters: Melissa Keinath & Nataliya Timoshevskaya
  • April 10 - Tools for assigning mass spectra from labeled chemoselectively derivatized samples
    • Presenter: Joshua Mitchell
  • April 17 - Association Kinetics of CaN binding to CaM
    • Abstract: Calcineurin (CaN) is a serine/threonine phosphatase that regulates a variety of physiological and pathophysiological processes in most mammalian tissues. It has been established that the CaN regulatory domain is highly disordered when inhibiting CaN, yet it undergoes a disorder-to-order transition upon binding calmodulin (CaM) to activate the phosphatase. Given the enrichment of negatively charged residues in CaM and of positively charged residues in the CaM-binding region of CaN, it is intuitive to postulate that the electrostatic interaction between these two binding partners plays an important role in association kinetics. Meanwhile, the conformational dynamics of highly flexible CaN could dictate the availability of CaM-accessible CaN states, which could affect the overall association kinetics as well. In this presentation, I will talk about a series of computational studies we performed to explore the roles of electrostatics and conformational dynamics in the CaN:CaM association process.
    • Presenter: Bin Sun
  • April 19 - IBI Seminar: High-throughput Biomedical Image Computing for Digital Health
    • Abstract: In biomedical informatics, a large amount of image data has been collected to support clinical diagnosis, treatment decisions, and medical prognosis. The large volume and the diversity of information across different imaging modalities require advanced, high-throughput image computing technologies for more accurate disease detection, deeper understanding of the mechanisms of disease progression, and better healthcare in precision medicine. With the ever-increasing amount of biomedical image data, it is very important to design and develop efficient technologies for large-scale biomedical image analysis. This talk will describe high-throughput biomedical image computing methods for digital health, focusing on three significant topics: object detection, segmentation, and image understanding in medical diagnosis. Specifically, I will present several novel machine learning and imaging informatics technologies to process biomedical big image data and introduce the applications of these technologies in medical diagnosis.
    • Presenter: Fuyong Xing, University of Florida
    • Location & Time: 12:00-12:50pm, April 19, 2017 in 170 Biopharm Complex (Todd Building)
  • April 24: The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans
  • April 24: IBI Seminar - Feature Selection and Learning on High-Dimensional and Large-Scale Data
    • Presenter: Qiang Cheng, PhD, Southern Illinois University Carbondale
    • Abstract: Diverse areas of scientific research and everyday life, such as healthcare, biomedicine, and finance, are now deluged with high-dimensional data and big data. There is a need for data mining and prediction techniques for finding patterns and discovering knowledge from such data. In this talk I will present our feature selection and learning methods for handling such data effectively and efficiently. The feature selection methods integrate intrinsic discriminative information and exploit global optimization techniques on Markov random fields, giving rise to a closed-form solution of linear complexity. The learning methods are built within our minimax pattern learning framework, extending lasso-type sparse representation with efficient complexity and fast convergence. I will present both supervised and unsupervised models that jointly exploit representation and learning. These methods are expected to have a potentially significant impact on various fields such as medicine and science.
  • May 1 - Automating the semantic enumeration and extraction of concepts from ontologies
    • Presenter: Eugene Hinderer

Fall 2016 (13 presentations)

  • Aug 29 - Canceled due to no one volunteering
  • Sep 5 - Labor Day, No Meeting
  • Sep 12 - Organizational Meeting
  • Sep 19 - Analysis of protein-coding genetic variation in 60,706 humans
  • Sep 26 - DeepSplice: Deep Classification of Novel Splice Junctions Revealed by RNA-seq
    • Abstract: Alternative splicing (AS) is a regulated process that enables the production of multiple mRNA transcripts from a single multi-exon gene. The availability of large-scale RNA-seq datasets has made it possible to predict splice junctions, as well as splice sites through spliced alignment to the reference genome. This greatly enhances the capability to decipher gene structures and explore the diversity of splicing variants. However, existing ab initio aligners are vulnerable to false positive spliced alignments as a result of sequence errors and random sequence matches. These spurious alignments can lead to a significant set of false positive splice junction predictions, confusing downstream analyses of splice variant detection and abundance estimation. In this work, we illustrate that splice junction sequence characteristics can be ascertained from experimental data with deep learning techniques. We employ deep convolutional neural networks for a novel splice junction classification tool named DeepSplice that (i) outperforms state-of-the-art methods for predicting splice sites, (ii) shows high computational efficiency and (iii) can be applied to self-defined training data by users.
    • Presenter: Yi Zhang
  • Oct 3 - Bhattacharyya distance – From concept to grant application
    • Abstract: Recent work by our group revisited feasible solution algorithms (FSAs) first popularized by Doug Hawkins in the early 1990s. We use FSAs to find interactions between explanatory variables in predictive models. Initial versions of the algorithm failed miserably for logistic regression in big data problems with n << p, where p is the number of explanatory variables. We were able to overcome this issue using the Bhattacharyya distance between two bivariate distributions, which allowed us to write grants to further investigate the idea.
    • Presenter: Arnold Stromberg
  • Oct 10 - The natural selection of bad science
  • Oct 10 - Hospital Access Audit Logs - Roles and Anomalies
    • Presenter: Carl Gunter
    • Time & Location: 4pm, Davis Marksbury Building
  • Oct 17 - Interactive, Visual Data Curation Using PREMISE
    • Presenter: Patrick Shepard
  • Oct 20 - IBI Seminar: Deep-Learning: Investigating feed-forward Deep Neural Networks for Modeling High Throughput Chemical Bioactivity Data
    • Presenter: Luke Huan, University of Kansas
  • Oct 24 - Higher-order Organization of Complex Networks
  • Oct 31 - Mining the lamprey genome for oncogenes
    • Presenter: Jeramiah Smith
  • Nov 7 - Canceled
  • Nov 14 - Analysis of the dynamic coexpression network of heart regeneration in the zebrafish
  • Nov 21 - Thanksgiving
  • Nov 28 - Putative Link Between ZIKV-encoded miRNAs, Microcephaly, and Other CNS and PNS Disorders
    • Presenter: Michael Sheetz
  • Dec 5 - Using Next-Generation Sequencing to Analyze miRNA Profiles in Placenta and Serum During Normal Equine Pregnancy
    • Presenter: Shavahn Loux
  • Dec 12 - Toward unraveling the molecular basis of ion affinity and selectivity in small, calcium binding proteins
    • Presenter: Pete Kekenes-Huskey

Summer 2016 (11 presentations)

  • May 16 - No Seminar due to the Bluegrass Molecular Biophysical Symposium
  • May 23 - Computationally characterizing genomic pipelines using high-confidence call sets
    • Presenter: Xiaofei Zhang
  • May 30 - No Seminar due to Memorial Day Holiday
  • June 6 - Gene expression features of articular cartilage
    • Presenter: Emma Adam
  • June 13 - Canceled
  • June 20 - Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control
  • June 27 - Identification of shared and unique susceptibility pathways among cancers of the lung, breast, and prostate from genome-wide association studies and tissue-specific protein interactions
  • July 4 - No Seminar due to Independence Day Holiday
  • July 11 - A gut (microbiome) feeling about the brain
  • July 18 - A More Comprehensive Examination of the Human Genome by Long-Read Sequencing
    • Presenter: Matthew Hestand, Laboratory for Cytogenetics and Genome Research
    • Abstract: Cheap, high-throughput, short-read sequencing has brought forth a genomics revolution. However, the nature of this technology limits variant detection primarily to unphased single-nucleotide variants and small indels, and it leaves 'black boxes' in the genome due to low-complexity sequences, repetitive elements, and regions of skewed GC content. Long-read technologies, by contrast, enable sequencing through many of these difficult regions, including those of clinical relevance. For example, we have determined the length and AGG interruptions of the CGG tandem repeat that causes FXTAS, Primary Ovarian Insufficiency, and Fragile X syndrome. Long-read sequencing also enabled us to determine the structure and breakpoint sequences of previously unobserved chromothripsis-like chromosomes and to postulate their underlying mechanism. Overall, we demonstrate the utility of PacBio long-read technology to evaluate structural variations, discriminate pseudogene sequences, directly phase single-nucleotide variants, identify variation in tandem repeats, and even de novo assemble a full human genome.
  • July 25 - Deep Convolutional Neural Networks: Concepts and Examples
    • Presenter: Nathan Jacobs, Associate Professor of Computer Science, Center for Visualization and Virtual Environments, University of Kentucky
    • Abstract: For the past 5 years, methods based on Deep Convolutional Neural Networks (CNNs) have been dramatically advancing the state of the art in computer vision, approaching, and often exceeding, human speed and accuracy. This has been made possible by a combination of factors, including novel neural network architectures, massive datasets, faster hardware, improved software abstractions, and various low-level algorithmic innovations. This talk will provide a technical introduction to CNNs, an overview of their recent rise in popularity in the field of computer vision, and examples of their use for a variety of semantic and geometric image understanding tasks.
  • Aug 1 - Understanding Cation Binding to SERCA using Molecular Dynamics
    • Presenter: Bradley Stewart
  • Aug 8 - Detailed gene-based association study of genes linked to hippocampal sclerosis of aging and cerebral age-related TDP-43 with sclerosis neuropathology: GRN, TMEM106B, ABCC9, and KCNMB2
    • Abstract: Hippocampal sclerosis of aging (HS-Aging) is a common and distinctive clinical-pathological entity that can cause dementia. To learn more about the genetic risk of HS-Aging pathology, we tested gene-based associations of the GRN, TMEM106B, ABCC9, and KCNMB2 genes, which were reported to be associated with HS-Aging in previous studies. We used genetic data obtained from the Alzheimer’s Disease Genetics Consortium (ADGC), linked to autopsy-derived neuropathological outcomes from the National Alzheimer’s Coordinating Center (NACC). Of the 3,251 subjects who were included in the study and died after age 60 years, 271 (8.3%) were identified as HS-Aging cases. The highest association signals came from SNPs in the ABCC9 gene (rs7966849) and the KCNMB2 gene (rs73183328). The ABCC9 gene had a significant gene-based association with HS-Aging, assuming a recessive mode of inheritance (MOI), after Bonferroni correction. We confirmed the same results in people aged 80 years or older. The significant gene-based association of the ABCC9 gene is driven by a region in which the most significant variants are intronic, whereas our studies underscore the many different SNPs in linkage disequilibrium with the HS-Aging-associated SNP in the TMEM106B gene.
    • Presenter: Yuriko Katsumata
  • Aug 15 - A new Booster Proposal for Inference in Population Genetics
    • Abstract: Importance sampling plays a fundamental role in likelihood-based inference for various population genetic models. Although exact algorithms are not practical for moderate datasets, they can provide valuable intuition for improving proposals. Using one such intuition, we propose a booster proposal that works with an existing proposal to bring about more than an order of magnitude improvement in accuracy under the standard neutral coalescent model of a single, well-mixed population of constant size over time, following the infinite sites model of mutation. The improvement is consistent in both simulated and real datasets. The method is not based on resampling and thus preserves the independence of the samples. It is also faster, and its memory requirements are comparable to those of existing methods. It is generic in nature and thus readily applicable to more complex models involving migration and recombination. It provides strong support for our continued advocacy, from earlier works, that a systems approach can be a viable solution to Felsenstein’s 2^8 programs problem.
    • Presenter: Susanta Tewari
  • Aug 22 - The FAIR Guiding Principles for scientific data management and stewardship

Spring 2016 (16 presentations)

  • Jan 11 - Current and proposed HPC resources supporting bioinformatics and systems biochemistry at UK
    • Presenter: Cody Bumgardner
  • Jan 18 - No Meeting, Martin Luther King Holiday
  • Jan 25 - Improved Sleep-Related Gene Ontologies through analysis of KOMP2 sleep phenotyping data and gene expression studies
    • Presenter: Shreyas Joshi
  • Feb 1 - Can so few viral proteins really cause so much cellular chaos? The involvement of putative viral-encoded miRNAs in the pathogenicity of Ebola
    • Presenter: Michael Sheetz
  • Feb 8 - Sensitive Positive Mode LC/MS/MS Method for Fatty Acid Analysis
  • Feb 15 - Discrete Models for the Simulation and Control of Gene Regulatory Networks
    • Abstract: Understanding how the physiology of organisms arises through the dynamic interaction of the molecular constituents of life is an important goal of molecular systems biology, for which mathematical modeling can be very helpful. Different modeling strategies have been used for this purpose. Dynamic mathematical models can be broadly divided into two classes: continuous, such as systems of differential equations and their stochastic variants, and discrete, such as Boolean networks and their generalizations. This talk will focus on the discrete modeling approach, which employs techniques from discrete mathematics, combinatorics, graph theory, and computational algebra. Discrete models play an important role in modeling processes that can be viewed as evolving in discrete time, in which state variables have only finitely many possible states. This talk will present an approach for stochastic simulations of discrete models. This approach will be used to study optimal control techniques to identify a control policy to navigate the system so that the probability of reaching a desirable state is maximized. The algorithms assume a set of intervention targets represented by control nodes and edges in the wiring diagram, and use techniques from Markov decision processes to identify a control policy that dictates how to move from one state to another.
    • Presenter: David Murrugarra
  • Feb 22 - Programming for Multicore CPUs with Python
    • Abstract: A brief introduction to techniques in Python for taking advantage of multiple CPU cores to accelerate calculations. The different types of parallel computing in Python will be described, but most of the talk will cover the different ways to use the multiprocessing library in Python. How to use linear algebra libraries (namely OpenBLAS) with NumPy will also be covered. Code examples will be provided.
    • Presenter: Joshua Mitchell
  • Feb 29 - Approaches to linkage mapping in diverse non-model vertebrates
    • Abstract: A relatively informal discussion of a few non-standard approaches that my lab has been using to generate dense linkage maps for non-model vertebrates, including outbred crossing designs, genotyping by sequencing, genotyping by RNA-seq, and single sperm sequencing. Time permitting, I also plan to discuss a few recently developed technologies that were presented at the recent AGBT meeting.
    • Presenter: Jeramiah Smith
  • Mar 7 - Computational Prediction of Adverse Drug Reactions
    • Presenter: Sally Ellingson
  • Mar 14 - Zhx2, liver metabolism, and sex-biased gene expression
    • Abstract: A discussion of my current project within the Spear lab. Our newest data provide evidence for Zhx2 as a novel regulator of cytochrome P450 gene expression in the liver, as well as of many other known sex-biased genes. These genes are important for lipid, drug, and steroidal metabolism in the liver and contribute to the development of non-alcoholic fatty liver disease (NAFLD) in our mouse model.
    • Presenter: Alexandra Nail
  • Mar 18 - Pheno-Informatics: A New Framework For Analyzing Phenomics Data
    • Location: Wethington 014 (Basement Auditorium)
    • Abstract: Nowadays, DNA sequence data are available for many species, but the systematic quantification and analysis of phenotypes remains a big challenge. My research aims to bridge the genotype-phenotype gap by developing novel data mining techniques so that multi-omics data can be transformed into testable hypotheses that identify important genes. In this talk, I will first introduce our recent progress in phenomics data modeling, including a new inter-functional phenomics clustering method and a new phenotype-environment relationship learning framework. I will illustrate how these tools have led us to discover new biological mechanisms. In the second part, I will discuss our future plans in bioinformatics and data science and their applications in biomedical research.
    • Presenter: Jin Chen
  • Mar 21 - Non-lethal Inhibition of Gut Microbial Trimethylamine Production for the Treatment of Atherosclerosis
  • Mar 28 - Protein NMR Reference Correction: A statistical approach for an old problem
    • Presenter: Bill (Xi) Chen
  • April 4 - Classification of Cancer Using Metabolomics
    • Abstract: Metabolomics is regularly applied to understand and characterize various cancers. This seminar will discuss the use of random forests to generate models that discriminate normal from cancer samples using small numbers of lipids from lung cancer tissue, and the development of a non-parametric method to evaluate the power of the classification method.
    • Presenter: Robert M Flight
  • April 11 - Canceled due to scheduling problems
  • April 18 - Metagenomic assessment of possible microbial contamination in the equine reference genome assembly
    • Presenter: Scotty DePriest
  • April 25 - Co-occurring Genomic Alterations Define Major Subsets of KRAS-Mutant Lung Adenocarcinoma with Distinct Biology, Immune Profiles, and Therapeutic Vulnerabilities
  • May 2 - Canceled
  • May 9 - Comparison between two chromatography/mass-spectrometry methods used in metabolomics
    • Presenter: Marc Warmoes
  • May 16 - Canceled due to the Bluegrass Biophysics Symposium

Fall 2015 (13 presentations)

  • Aug 31 - An atlas of genetic influences on human blood metabolites
  • Sep 7 - Labor Day, no meeting
  • Sep 14 - Working with STRING PPIs Offline for Cancer Network Analysis
    • Presenter: Robert Flight
  • Sep 21 - Big Data and Systems Biology Approaches to Explore Transcriptome and RNA Regulatory Networks
    • Presenter: Juw Won Park, University of Louisville
    • Abstract: High-throughput RNA sequencing (RNA-seq) has provided a powerful tool for transcriptome analysis. Due to the dramatic decrease in cost, it has become quite common to generate millions to billions of sequence reads from a given RNA sample to identify and quantify the abundance of mRNA isoforms across the entire transcriptome. Large consortium projects have also started generating massive RNA-seq data on tens of thousands of samples along with various other genomic and phenotypic measurements. However, the extraordinary potential embedded in these large, complex datasets cannot be fully realized without proper methods for analyzing these big transcriptome and genome datasets. In this presentation, I will discuss my recent efforts in developing computational and statistical methods for the analysis of transcriptome isoform complexity and RNA regulatory networks using RNA-seq datasets.
  • Sep 28 - Applying data fusion approaches on multi-platform metabolomics data acquired from breast cancer tumours
  • Oct 1 - Single Molecule Variant Detection: From Heteroduplexes in a Single DNA Molecule to Whole Chromosome Rearrangements
    • Location & Time: Gluck Equine Research Center Auditorium, October 1, 4pm
    • Presenter: Matthew Hestand
    • Abstract: The PacBio single-molecule sequencing platform produces long (average 12-15 kb) error-prone reads, though the errors are randomly distributed. Therefore, combining read coverage, or circularizing a DNA molecule and repeatedly sequencing both strands, produces highly accurate consensus sequences. We have used this circular sequencing approach to determine error rates and profiles across six commonly used polymerases. Besides accurately determining mutations in double strands, the platform permits the identification of heteroduplexes, where a base on one strand is not complementary to the base on the other strand. Interestingly, we observed that Watson-Crick base-pairing errors are not equally distributed; across most polymerases there is a bias for pyrimidine transitions over purine transitions. Moving from single-molecule errors to chromosome-spanning errors, the long reads also provide a unique resource to identify structural variation, including sequencing across repetitive elements. Indeed, we used PacBio to demonstrate an insertional translocation of chrX sequence into chrY, generating an extended pseudoautosomal region (PAR). The insertion is generated by non-allelic homologous recombination between a 548 bp LTR6B repeat within the chrY PAR1 and a second LTR6B repeat located 105 kb from the PAR boundary on chrX. PacBio phasing within the duplicated region also enabled identification of the paternally inherited insert sequence and the finding of multiple haplotypes among ancestrally related individuals, demonstrating X/Y recombination. In a separate cohort, aCGH identified three patients with distinct clusters of copy number gains only, across a single chromosome 18 or 22. A combination of Illumina, PacBio, and Sanger sequencing was used to identify and characterize the breakpoints in these patients. For these highly rearranged chromosomes, the breakpoint sequences led to the hypothesis of an origin different from traditional chromothripsis and chromoanasynthesis, possibly a repair process driven by non-canonical non-homologous end joining mediated by polymerase theta. In conclusion, we demonstrate that the PacBio platform provides unique capabilities to detect variation, from single molecules to whole-chromosome rearrangements.
  • Oct 5 - Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction
  • Oct 12 - Atlas of Cancer Signalling Network: a systems biology resource for integrative analysis of cancer data with Google Maps
  • Oct 19 - The lamprey genome: deep insights, deeper challenges
    • Presenter: Jeramiah Smith
  • Oct 26 - Molecular Dynamics Study of Divalent Ion coordination in EF Hand Proteins
    • Presenter: Caitlin Scott
  • Nov 2 - Toxicology Seminar: Identifying Cancer Drivers Using an Evolutionary Systems Approach
    • Presenter: Natarajan Kannan
    • Location: MN 263
  • Nov 9 - Identifying anti-growth factors for human cancer cell lines through genome-scale metabolic modeling
  • Nov 16 - The BioPlex Network: A Systematic Exploration of the Human Interactome
  • Nov 23 - Cancelled
  • Nov 30 - Establishing Precise Evolutionary History of a Gene Improves Predicting Disease Causing Missense Mutations
    • Presenter: Igor Zhulin
    • Abstract: Predicting the phenotypic effects of mutations has become an important application in population genetics studies and clinical genetic diagnostics. Computational tools, such as PolyPhen and SIFT, use comparative genomics to evaluate the behavior of a variant over evolutionary time and assume that variants seen during the course of evolution are likely benign in humans. However, because these tools are fully automated and applicable to all human genes, they do not reconstruct the detailed evolutionary history of any given gene, such as the assignment of orthologous/paralogous relationships. On the other hand, paralogs are known to have dramatically different roles in Mendelian diseases. For example, while inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease causing and are even implicated in protection from coronary heart disease. We identified major events in NPC1 evolution, and characterized and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. Based on the results, we built an algorithm to distinguish deleterious from neutral variants. We demonstrated that removing the NPC1 paralogs and distant homologs from the analysis improves the overall performance of categorizing damaging and benign single amino acid substitutions. Our results show that a thorough analysis of gene history followed by identification of functionally equivalent orthologs improves the accuracy of predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well.
  • Dec 7 - Canceled - Scheduling mixup
  • Dec 14 - Canceled - Room 501C needed for exams

Summer 2015 (15 presentations)

Spring 2015 (16 presentations)

  • Jan 5 - The Horse Genome Project
    • Presenter: Jamie MacLeod
    • Interest: genomic sequencing and annotation
  • Jan 8 - Illuminating the last 4 million years of equine evolution
    • Presenter: Ludovic Orlando
    • Location & Time: Gluck Equine Research Center Auditorium, January 8, 4pm
    • Abstract: Horses, zebras, and asses represent the only living members of the equid family. This family originated in North America some 55 million years ago and flourished into a large number of species during the Tertiary period. The deep evolutionary history of equids is well documented in the paleontological record and represents a textbook example of evolution. However, their recent evolutionary history remains largely unknown. By sequencing the genome of a 700,000-year-old horse, representing the oldest genome hitherto sequenced, our group has shown that the most recent common ancestor of extant equids lived some 4 million years ago. Further genome sequencing for each species within the family illuminated the patterns and processes of the equine radiation, from its early split in the New World to its subsequent migrations into Eurasia and Africa. This revealed large-scale demographic expansions and contractions following major climatic changes, as well as the genetic toolkit that underlies the species’ adaptations. Importantly, our comparative genome dataset revealed multiple cases of gene flow between species. This shows that the species barrier is not always waterproof, and it challenges current speciation models, which assume that changes in chromosomal structure often result in full reproductive isolation. In addition to helping us better understand the processes driving the origins of species and their adaptation, the equid family, which includes no fewer than two domesticated species, also offers a fantastic opportunity to study how humans transformed wild animals into domesticates that best suit their purpose. Using ancient DNA, our group reconstructed the complete genomes of horses that lived prior to domestication and identified 125 genes that have been positively selected since. This conservative set of genes reveals the range of physical, physiological and behavioral functions that have been reshaped by humans during history and antiquity.
  • Jan 12 - A Negative Binomial Model-Based Method for Differential Expression Analysis Based on NanoString nCounter Data
    • Presenter: Hong Wang
    • Type: Research Presentation
  • Jan 19 - Martin Luther King Day, No seminar
  • Jan 26 - ComPPI: a cellular compartment-specific database for protein-protein interaction network analysis
  • Feb 2 - DNA Cytosine Methylation: Structural and Thermodynamic Characterization of the Epigenetic Marking Mechanism
  • Feb 9 - Integrating Interactome and Transcriptome Data
  • Feb 16 - Canceled due to a huge snowstorm (15+ inches of snow)
  • Feb 23 - Tissue-based map of the human proteome
  • Mar 2 - Overview and perspective from the recent EMBL-EBI metabolomics training workshop
  • Mar 9 - ARK
    • Presenter: Andrew McCollam
  • Mar 16 - Spring Vacation (do we want a seminar??)
  • Mar 23 - Good Communication Makes for Really Bad Wine: Prion-Based Transformation of Metabolism
    • Papers: 1, 2
    • Presenter: Michael Sheetz
  • Mar 30 - An Underview of Enzymology and Metabolism
    • Presenter: Andrew Lane
  • April 6 - Gene set analysis: limitations in popular existing methods and proposed improvements
  • April 13 - Lipid profiling for early diagnosis and progression of colorectal cancer using direct-infusion electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry
  • April 20 - CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer
  • April 27 - Laboratory Information Management Systems (LIMS) built in Galaxy

Fall 2014 (5 presentations)

  • Sept 1 - Labor Day - No Journal Club
  • Sept 8 - IVT-seq reveals extreme bias in RNA sequencing
  • Sept 15 - New Frontiers in Data Analytics
    • Presenters: Xiangrong Yin and Arnold J. Stromberg
    • Type: Infrastructure Presentation
  • Sept 22 - An atlas of genetic influences on human blood metabolites
  • Sept 29 - RESCHEDULED - Galaxy: An Open, Web-based platform for Data-Intensive Biological Analysis
    • Presenter: Jeremy Goecks, George Washington University
    • Type: Invited Speaker Research Presentation
    • Abstract: Galaxy (http://galaxyproject.org) is a popular Web-based analysis platform for data-driven biology and for genomics in particular. Galaxy’s mission statement is to make computational biology analyses accessible, reproducible, and collaborative. Galaxy addresses the complete scientific data analysis process, with support for analysis tools and complete histories, reproducible workflows, data visualizations, and interactive publication supplements. Galaxy has been cited more than 1500 times in scientific publications, there are 62 active public servers, and our main public server (http://usegalaxy.org) processes ~130,000 analysis jobs each month. In this talk, I will describe the Galaxy platform, the Galaxy team and community, and future directions for the project.
  • Oct 6 - IBSeq: An island-based approach for RNA-seq differential expression analysis
    • Presenter: Abdallah Eteleeb, University of Louisville
    • Type: Invited Speaker Research Presentation
    • Abstract: High-throughput mRNA sequencing (also known as RNA-Seq) promises to be the technique of choice for studying transcriptome profiles. This technique provides the ability to develop precise methodologies for transcript and gene expression quantification, novel transcript and exon discovery, and splice variant detection. One of the limitations of current RNA-Seq methods is their dependence on annotated biological features (e.g. exons, transcripts, genes) to detect expression differences across samples. This restricts the identification of expression levels and the detection of significant changes to known genomic regions. Any significant changes that occur in unannotated regions will not be captured. To overcome this limitation, we developed a novel segmentation approach, Island-Based (IBSeq), for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. The IBSeq segmentation determines individual islands of expression based on windowed read counts that can be co
Topic revision: r307 - 18 Apr 2019, RobertFlight
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Moseley Bioinformatics Lab? Send feedback