Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

The MidSouth Computational Biology and Bioinformatics Society (MCBIOS) held its eleventh annual conference at the Wes Watkins Center at Oklahoma State University, Stillwater, on March 7-8, 2014. The theme was "From Genome to Phenome: Connecting the Dots". The Conference Chair this year was Rakesh Kaundal, who is also an MCBIOS board member, and the conference committee members were Ulrich K. Melcher and Doris Kupfer. The current president is Andy Perkins, and Cesar Compadre was elected President-Elect for 2015-16. There were 154 registrants and a total of 125 abstracts submitted (50 oral and 75 poster presentations).
 
Keynote speakers were Owen White from The University of Maryland School of Medicine, whose talk was titled "The Human Microbiome Project: Large-Scale Data Management and Analysis", and Jody Hey from Temple University, whose talk was titled "Designing Genealogy Samplers for Population Genetics". Dr. William Slikker, Director of the Food and Drug Administration's National Center for Toxicological Research, concluded with a talk on the past ten years of MCBIOS and a perspective on its future.
 
Participants also had the opportunity to attend a workshop on next-generation sequencing (NGS), hosted by Peter Hoyt of OSU and Dr. Graham Wiley of the Oklahoma Medical Research Foundation. The workshop had a keynote by Dr. Joshua Orvis of The University of Maryland School of Medicine and Johns Hopkins University on genome annotation. 
 
The winners of conference awards were: 
 
Best Paper Award: Zongliang Yue, Ping Wan, Hui Huang, Zhan Xie and Jake Y. Chen for "SLDR: A Method to Identify New Gene Regulatory Relationship Candidates" [1] 
 
Best Paper Runner-up: Nam S Vo and Vinhthuy Phan for "Exploiting dependencies of pairwise-comparison outcomes to predict patterns of gene response" [2] 
 
Best Oral Presentations (Post-Doctoral fellows): 
 
Michael A. Bauer, University of Arkansas for Medical Sciences 
 
Erich A. Peterson, University of Arkansas for Medical Sciences 
 
Best Oral Presentations (students): 
 
Karl Walker, Arkansas State University, 1st place

Shraddha Thakkar, University of Arkansas for Medical Sciences, 2nd place
 
Mihir Jaiswal, University of Arkansas at Little Rock, 3rd place 
 
Best Poster (Computation): 
 
Stephen Reichley, Mississippi State University, 1st place 
 
Kushal Bohra, Texas A&M University at Commerce, 2nd place 
 
Austin McCullough, John Brown University, 3rd place 
 
Best Poster (Biology): 
 
Shraddha Thakkar, University of Arkansas for Medical Sciences, 1st place 
 
Sunetra Das, University of Oklahoma, 2nd place (tie) 
 
Garima Saxena, University of North Texas, 2nd place (tie) 
 
Kangmei Zhao, University of Oklahoma, 2nd place (tie)


Selecting papers for the MCBIOS XI Proceedings
All papers were anonymously peer-reviewed by at least two reviewers and quantitatively evaluated on three criteria (novelty, impact, and clarity), which enabled the papers to be ranked. Editors who were also co-authors of submitted papers were not permitted to handle their own papers editorially. This year, 22 papers were submitted and 16 were deemed acceptable by reviewers, giving an acceptance rate of 73%, higher than last year's 59%. Papers generally fell into four categories:

Networks and pathways
Zongliang Yue et al. [1] developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. This technique takes advantage of functional genomics data for the same species under different perturbation conditions, and is therefore complementary to currently popular computational techniques. In particular, it can identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. SLDR is computationally efficient, with O(N^2) complexity, and may be applied to the mining of functional genomics big data for future network biology and network medicine applications.
The manuscript by Toby and colleagues [3] reports some interesting observations regarding the possible role of the SigB regulon in the divergence of members of the Bacillus cereus group, based on the protein-coding content of 25 completed whole genomes of B. cereus group isolates. The authors used cluster analysis of orthologous proteins to reconstruct the clade structure of the B. cereus group, and found that the resulting structure follows the pattern of which genes do or do not belong to the SigB regulon. Their observation suggests the hypothesis that horizontal gene transfer, gene duplication/divergence, and deletion dictate the underlying coding capacity in these genomes, and that this is likely a common pattern in prokaryotic evolution.
Peterson et al. developed a workflow based on whole exome sequencing to discover clonal lineages in tumor samples. They used samples from a single multiple myeloma patient, taken at initial presentation and at two subsequent relapses, to identify key single-nucleotide and insertion/deletion events in key genes that presumably gave rise to the initial presentation and the relapses. What is unique is the visualization tool, CloneViz, which displays these events by chromosome and applies Gaussian kernel density plots for easy comparison between samples. The benefit is a better understanding of the evolution of a cancer [4].

Genomics & transcriptomics
In light of a shrinking NIH budget, researchers can stretch funds further if they can decrease the number of samples used for gene expression profiling. The paper by Vo and Phan [2] addresses the issue of small sample size in microarray experiments that contain measurements of multiple responses. Although they developed their method for microarray studies, it could well be applied to gene expression profiling using RNA sequencing technology. The latter requires complex data processing, which the paper by Peng et al. makes easier. Their software, SeqAssist [5], processes raw FASTQ files and extracts useful biological data. Zhang et al. used two other state-of-the-art pipelines to analyze whole-genome sequencing data from 35 Korean individuals [6]. Not only did they identify genomic variants specific to the Korean population, but they also analyzed the functions of genes and the disease susceptibility associated with these SNPs.
Michael A. Bauer et al. describe custom software tools for the integration and analysis of data from various molecular profiling methods. This software allows the integration of microarray, RNA-Seq, and whole exome sequencing data, and was applied here, for illustrative purposes, to data from a patient with multiple myeloma [7].

Proteomics
The manuscript by Jaiswal and co-authors [8] presents XLPM, a new algorithm for analyzing cross-linking mass spectrometry data, together with its implementation. The results of applying the new software to model systems are shown. Central to this approach is the B-y ion filter, which the authors have used to analyze the NIST spectral library. The authors claim that the XLPM selection filter leads to increased analysis speed and higher-confidence non-specific crosslink data.
Zhang and Zhao [9] performed quality measurements on mass-spectrometry metabolomics data. They conclude that their proposed zigzag index is better than several other metrics, including the MCQ index, at evaluating the quality of extracted ion chromatograms (EICs). This work is important because effective metrics for evaluating EIC quality are needed to filter out low-quality data.
Millions of dollars are spent annually to better understand how pathogens infect their hosts and to identify potential targets for therapeutics. Protein-protein interactions (PPIs) play a crucial role in host-pathogen interactions and pathogenicity. Although several bioinformatics methods have been developed to predict PPIs at the intra-species level, there are very few studies at the inter-species level. As a case study on the Arabidopsis-Pseudomonas syringae interaction system, Sitanshu S. Sahu et al. [10] developed various domain-based and interolog-based approaches to predict a genome-scale PPI network. This is the first report of deciphering an interspecies interactome in any plant-microbe system.
Jo and Cheng [11] discussed an important problem in proteome science, namely protein fold recognition. They propose a random forest-based machine learning method to predict the fold of proteins, and demonstrate its efficiency by comparing it with existing approaches. The method should help in recognizing the correct structural fold for template-based protein structure modeling.
Laccases (E.C. 1.10.3.2) are multi-copper oxidases that have gained importance in many industries, such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Previous classification systems for laccase enzymes are based on multiple sequence alignments, and they largely follow species taxonomy rather than substrate ranges, enzyme properties, or specific function. Weirick et al. [12] developed a two-phase classification system: first, unsupervised learning approaches identify laccase subtypes based on sequence features and enzyme function; then, a supervised learning method classifies previously uncharacterized laccases. This tool will be a useful resource for the biotechnology community, especially those working in the area of biofuels.
Huiwen Ng et al. [13] developed a competitive docking approach (CDA) for performing ligand docking in estrogen receptors. The CDA compares the non-covalent interactions between a specific ligand and two separate docking models, based on the respective docking scores of the docked complexes, and therefore better reflects the receptor-ligand interaction. The CDA is extensible to other receptor targets, both to screen for potential binders and to differentiate between agonists and antagonists, and is as applicable to drug discovery as to regulatory testing.

Miscellaneous
Thousands of bioinformatics programs have been published and put online, but studies have shown that many of these resources become inaccessible over time [14]. Jason Hennessey et al. conduct the largest survey to date of the current availability of previously published URLs, covering over 27,000. They attempt to identify which ones are Scientific Data Analysis Resources (SDARs) and analyze factors associated with the probability that they are still available. They find that SDAR production tends to be distributed widely among institutions (in contrast to publications in general, the bulk of which tend to come from a relatively small fraction of institutions), and that a larger number of authors on an SDAR-producing paper tends to be associated with a greater probability of future accessibility [15].
Weizhong Zhao et al. discuss topic modeling as a method for summarizing text datasets and show its effectiveness on three separate biological datasets [16].
Stephen Grace et al. present Haystack, a web-based tool for metabolomics research. Haystack is designed to visualize, parse, filter, and extract significant features from liquid chromatography-mass spectrometry (LCMS) datasets. They show it is effective when comparing LCMS data from plants grown under two different light conditions [17].