Proceedings of the Eighth Annual UT-ORNL-KBRIN Bioinformatics Summit 2009

Over the past decade, the University of Tennessee (UT), Oak Ridge National Laboratory (ORNL), and the Kentucky Biomedical Research Infrastructure Network (KBRIN) have collaborated to share their extensive bioinformatics research and educational expertise to further strengthen bioinformatics in the Tennessee and Kentucky region. One of the results of these collaborations is joint sponsorship in an annual regional bioinformatics summit that brings together researchers, educators, and students with interests in bioinformatics from research and educational institutions in Kentucky, Tennessee, and other states. These summits provide unique opportunities for enhancing collaborative links in the region and for further integration of multidisciplinary research efforts across institutions. As a result, a number of new collaborative research and educational projects have been fostered across institutions. The Eighth Annual UT-ORNL-KBRIN Bioinformatics Summit was held at Fall Creek Falls State Park in Pikeville, Tennessee from March 20–22, 2009. A total of 202 participants registered for the summit, with 94 from various Tennessee institutions and 68 from various Kentucky institutions. A number of additional participants came from universities and research institutions from other states and countries, e.g. the National Institutes of Health, Virginia Commonwealth University, University of Cincinnati, Emory University, and the University of British Columbia. Seventy-seven registrants were faculty, with an additional 62 student, 43 staff, and 20 postdoctoral level participants.

doi:10.1186/1471-2105-10-S7-I1

The conference program included three days of presentations. The first day was devoted to workshops, including two Geospiza/Digital World Biology workshops, along with a bioinformatics education workshop and a microarray analysis workshop. The last day and a half were dedicated to scientific sessions in bioinformatics divided into three plenary sessions: Medical and Translational Informatics, Systems Biology, and Next-Generation Sequencing and Epigenetics.


Geospiza/Digital World Biology Workshops

Dr. Sandra Porter, president of Digital World Biology, kicked off the Summit with two workshops from Geospiza and Digital World Biology. The first workshop, "Polymorphism/SNP Discovery", focused on discovering single nucleotide polymorphisms (SNPs) using raw Sanger sequencing trace files [1,2] and the associated Phred [3,4] quality values in regions with low quality scores. These SNPs can be visualized with sequence chromatogram viewers such as Geospiza's FinchTV. The SNP with dbSNP [5] entry rs671 was used as an illustrative case. This SNP is a single base difference in the nucleotide sequence, from a G in the wild type to an A in the mutant, that changes a glutamate to a lysine at amino acid position 487 of aldehyde dehydrogenase 2 (ALDH2). Conformational changes in the ALDH2 protein structure cause an individual ingesting alcohol to lose the capability of efficiently metabolizing acetaldehyde [6]. HapMap [7] information for this SNP indicates that 31% of the Han Chinese population is heterozygous for this SNP. In fact, this allele is not typically found in any population outside of Asia [8]. Dr. Porter discussed using NCBI's Cn3D [9] to view structural locations to form hypotheses as to the possible molecular interactions of amino acids at specific locations and how these interactions can be affected by polymorphisms. This discussion was illustrated with the wild type and mutant ALDH2 structures, Protein Data Bank (PDB) [10] entries 1O05 and 1ZUM, respectively, where the change from a negatively to a positively charged amino acid alters how the protein subunits of ALDH2 interact.
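The Phred quality values mentioned in the first workshop follow a standard definition: a score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The short Python sketch below illustrates that relationship and flags low-quality positions in a read. The function names, the threshold of 20, and the sample quality values are hypothetical choices for illustration and are not taken from the workshop material or from FinchTV.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an estimated base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)


def low_quality_positions(quals, threshold=20):
    """Return 0-based positions whose quality score falls below the threshold.

    The default threshold of 20 (1% estimated error) is an illustrative
    assumption, not a value prescribed by the source.
    """
    return [i for i, q in enumerate(quals) if q < threshold]


# Hypothetical per-base quality values for a short stretch of a Sanger read.
example_quals = [38, 40, 12, 35, 9, 41]
print(low_quality_positions(example_quals))  # [2, 4]
```

A candidate SNP that falls at one of the flagged positions would typically call for visual inspection of the chromatogram before it is trusted.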

In the second Geospiza workshop, "Next Generation DNA Sequencing", Dr. Porter provided an overview of three next generation sequencing platforms (454 [11], Illumina [12], and SOLiD [13]), technologies that make DNA sequence information more widely available at a reduced cost per run compared to the traditional Sanger sequencing technique. The properties of each approach in terms of methodologies, data preparation, raw data, analysis, and sequence types interrogated (i.e. genomes, transcriptomes, miRNAs, copy number variants, SNP analysis, epigenetic methylation) were explained. Dr. Porter also discussed the pipeline that Geospiza has in place for dealing with data from sample to results. Use of next generation sequencing data and RNA-Seq [14] for transcriptome analysis, as opposed to microarrays [14], is becoming a more realistic possibility. One of the main advantages of such an approach is that it becomes possible to study all possible isoforms and SNPs, including those not previously described. Studies performed for transcriptome analysis [15,16] were analyzed using Geospiza's GeneSifter™ software.


Bioinformatics Education Workshop

Dr. Steven Jennings of the University of Arkansas-Little Rock led a discussion on the state of bioinformatics education. Building upon his experience in creating a Ph.D. program in Bioinformatics as well as serving on national society level bioinformatics education committees, Dr. Jennings offered several insights into how a student interested in bioinformatics should be trained. The issue of training students often leads to a struggle of breadth versus depth of training. As Dr. Jennings pointed out, many of the techniques students learn will be out-of-date within five years of graduation. Therefore, importance in bioinformatics training should be placed on producing independent thinkers who are able to adapt to changing technologies. He proposed a methodology for constructing a bioinformatics program based on a cube, in which each dimension represents topics in one of three fields: biology, computer science, and mathematical modeling/computation. The intersection of these areas shows the difficulty in producing a "one size fits all" program.


Statistical Analysis of Microarrays Workshop

Issues involved in analyzing microarray data from a statistical perspective were the topic of the workshop provided by Dr. Arnold Stromberg from the University of Kentucky. Dr. Stromberg has examined a number of research issues with microarrays, including pooling samples [17]. The main focus of this workshop was to encourage researchers to reduce the number of tests and gene lists used in order to raise the usable p-value cutoff and reduce the false discovery rate (FDR). A quadratic regression analysis technique was discussed that allows researchers to classify the behavior of genes into one of nine basic patterns over time [18]. Such an approach can be preferable to cluster analysis because it shows the actual behavior of the gene(s) of interest. Dr. Stromberg suggested that the best approach to solving issues with microarrays is to consider the experimental design from the outset, keeping in mind three key questions: 1) What do you want to know? 2) What is the simplest design that will do the job? 3) Can the design be modified to reduce variability?
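The effect of the number of tests on the false discovery rate can be illustrated with a small Python sketch. It uses the Benjamini-Hochberg step-up procedure, a standard FDR-controlling method; the summary does not state which FDR procedure was discussed in the workshop, so this choice, the function name, and the p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up procedure: find the largest
    rank k with p_(k) <= (k / m) * alpha and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for three genes chosen in advance.
focused = [0.004, 0.015, 0.03]
# The same three genes tested alongside 97 uninformative tests.
broad = focused + [0.5] * 97

print(sum(benjamini_hochberg(focused)))  # 3
print(sum(benjamini_hochberg(broad)))    # 0
```

With only three tests, all three hypothetical genes pass at an FDR of 0.05; when the same p-values are buried among 100 tests, none do, which is the practical motivation for limiting the number of tests up front.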


Medical and Translational Informatics

This year's Medical and Translational Informatics session included a plenary presentation by Bruce Aronow of the University of Cincinnati and Cincinnati Children's on "Integrative Biology and Disease." Dr. Aronow presented his perspective on building upon systems biology techniques to understand systems dynamics across concepts, allowing for a higher level of abstraction. Including databases of prior knowledge, such as molecular, clinical, and phenomic sources, is key to understanding what is going on biologically. For instance, a pathway can be analyzed by first understanding how miRNAs can knock down transcription factor expression, thereby altering gene expression, which in turn may affect a particular pathway. A discussion of ontological models for drug and disease correlation was included, which will hopefully lead to better personalized medicine by developing a greater connectivity of knowledge between drug interactions and their effects on genes, gene products, and transcriptional and translational control elements. The Systems Biology of Disease and Drug Ontology (SBD) as well as GATACA and ToppGene [19] were discussed as tools that allow for a better understanding of disease.

A second plenary presentation entitled "Slim-Prim: A bioinformatics database bridging basic and clinical science" was made by Ian Brooks of the University of Tennessee Health Science Center. Slim-Prim is a HIPAA-compliant system for managing information for either scientific laboratories or patient-care research. Slim-Prim was initially developed for use by members of the University of Tennessee Health Science Center's Clinical and Translational Science Institute (CTSI). At its base is an Oracle data/knowledge management system. Built upon this core is a web-based API for building forms for individual projects or patient studies. Each project can then be linked to additional information such as patient history and biorepository information, both locally and in a federated fashion. Dr. Brooks discussed two such sources of electronic health records currently housed in Slim-Prim: the Kids' Inpatient Database (KID) [20], which contains 7 million records, and the Mid-South eHealth Alliance (MSeHA) [21], which produces "RHIO" for electronic health records at a rate of 1.5 million records per year. A web-based report generator, Knowledge Informatics for Science and Medical Education and Training (KISMET), allows for access to local and national resources, including caBIG [22], for more complete analysis of the Slim-Prim data. The main benefits of the Slim-Prim system are that it is user-friendly, secure, versatile, and portable.


Systems Biology

The Systems Biology plenary session featured four speakers from Virginia Commonwealth University (VCU). Dr. Michael Miles presented his research on genetic characterization of robust ethanol-responsive gene networks in mouse prefrontal cortex [23-28]. His group's QTL mapping of genome-wide expression changes in response to ethanol in mice pinpointed multiple genomic loci showing strong signals, indicating a role for these loci in gene expression changes. A number of loci were suggested to influence regulation of the response to ethanol for gene networks. Epistatic interactions were observed for a number of loci, suggesting a role for DNA modification in regulation of gene expression in response to ethanol.

The presentation on systems vaccinology for Cryptosporidium, an important apicomplexan parasite, was made by Dr. Gregory Buck, head of the Center for the Study of Biological Complexity at VCU. He summarized his research, which yielded the genome sequences of C. hominis and C. parvum [29,30]. He further described the successful identification by his group of promising vaccine targets through a joint strategy: comparative analysis of gene expression and proteomics across different stages of the Cryptosporidium life cycle, genome analysis to identify predicted membrane- or surface-associated proteins, secreted proteins, and other relevant candidates, and a combination of experimental and in silico analysis [31-33].

Dr. Zhongming Zhao presented his research on gene networks and pathways in schizophrenia. In his presentation, he discussed his bioinformatic approach to identifying candidate genes for schizophrenia by combining results from gene mapping studies, including genome-wide association analysis, linkage analysis, gene expression information, and literature search, and by employing screening criteria of connectivity in the human protein-protein interaction networks [34]. He also outlined his research on the role of microRNA interaction networks in schizophrenia and the successful development of an online database for schizophrenia genes.

Dr. Ping Xu presented the final talk at this session, in which he described his integrative study of streptococcal virulence by employing comparative genomics and systems biology. He described the devastating effect of streptococcal infections and summarized a systematic experimental genome-wide deletion analysis of each open reading frame in the Streptococcus sanguinis genome, which will lead to better understanding of the phenotypic role of each of these genes [35,36].


Next-Gen Sequencing and Epigenetics

Robert Hanson of NIH/NIDDK was the first presenter in this session with a talk on "Genetic and Epigenetic Studies of Type 2 Diabetes in American Indians." His presentation included a discussion of the complexity of Type 2 diabetes, specifically in understanding the role of potential epigenetic factors. These studies involved looking at the birth weight of babies in addition to familial history in the American Indian population. Genome-wide linkage analysis and association mapping studies indicate potential candidates. Some variants show significantly weaker effects in American Indians than in Europeans, indicating the importance of epigenetics in terms of parent-of-origin effects and interaction with the diabetic intrauterine environment.

Jarret Glasscock of Cofactor Genomics followed with a presentation titled "New aspects of bioinformatics introduced by next-generation sequencing technologies." Dr. Glasscock has been involved in early testing and characterization of many of the Next-Gen sequencing platforms, including the Illumina, 454, and SOLiD technologies. His presentation covered the possibilities these technologies now provide, including large scale and single nucleotide polymorphism discovery, gene expression quantification, and epigenetic studies through bisulphite sequencing. An overview and contrast of these technologies was given in terms of which types of studies are most suitable for each. Dr. Glasscock's presentation led to an engaging discussion. These exciting technologies are rapidly evolving and lead to many interesting research questions, both with the data generated and in methodologies for handling and annotating the data itself.


Educational Opportunities

Dr. Cynthia Peterson, the director of the UT/ORNL Graduate School of Genome Science and Technology, presented an update on the educational opportunities at UT/ORNL. She discussed the progress made with the SCALE-IT (scalable computing and leading edge innovative technologies) program over the past year. In addition, she discussed the National Institute for Mathematical & Biological Synthesis (NIMBioS), a one-of-a-kind institute housed at the University of Tennessee. NIMBioS is the result of a $16 million National Science Foundation award to the University of Tennessee, Knoxville that will draw more than 600 national and international researchers each year to participate in working groups, workshops, and conferences. Support for working groups, postdoctoral and sabbatical fellowships, as well as graduate assistantships are all available through NIMBioS http://www.nimbios.org/. PEER, The Program for Excellence & Equity in Research, was also discussed as an avenue to increase the diversity of student populations in the STEM areas through graduate fellowships, scientific training, and caree