4273π: Bioinformatics education on low cost ARM hardware
© Barker et al.; licensee BioMed Central Ltd. 2013
Received: 24 April 2013
Accepted: 9 July 2013
Published: 12 August 2013
Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access.
We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. It includes minor customisations for classroom use and our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. The course is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews in Semester 1 of academic year 2012-2013.
4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.
Keywords: Bioinformatics education; Teaching material; Raspberry Pi; Linux
Bioinformatics is increasingly included in the undergraduate curriculum for biology students. Teaching bioinformatics is made difficult, however, by the constraints of a typical university computer classroom. Some areas of basic bioinformatics may be taught using such classrooms, where all that is required is an Internet connection and Web browser (e.g. BLAST searches at the NCBI). More in-depth teaching requires the re-creation of a bioinformatics research environment, consisting of a Linux or UNIX operating system, standard GNU utilities, specialist bioinformatics software, and sequence databases.
Undergraduate modules ought, ideally, to prepare students for research in an academic research group. Students who do pursue a research career will often find that institutional computer support is targeted to generic computer use (e.g. Microsoft software) rather than installing and maintaining systems suitable for bioinformatics. Particularly outside of bioinformatics research (but also occasionally within it), the principal investigator of the research group may never have used Linux, may have a limited idea of the procedures, and may expect group members to ‘pick things up’ and deal with problems themselves. This requires researchers to have a high level of proficiency with Linux, including the ability to install both standard Linux packages and software for which no standard package may be available. A single taught module cannot prepare a student for all eventualities, but ought to leave the student with the basic skills and confidence to be able to discover solutions, and implement them, as required. Hence, a certain amount of system administration should appear in an undergraduate bioinformatics module for biologists.
Traditionally, the environment required for an undergraduate bioinformatics module has been created in one of four ways. Firstly, one may set up a central GNU/Linux server on the campus and allow students to connect by Secure Shell, ssh (‘the server approach’). The server will typically run either a standard Linux distribution or a specialist bioinformatics distribution such as NEBC Bio-Linux. The server approach allows the instructor to have full control over the server, and allows students to connect from existing computing classrooms with little or no adjustment to the classroom software. For students to connect to the server via the intranet, classroom computers only require an ssh client, the X Window System (X11), and a means of file transfer such as secure copy (scp). Students may also connect to the server from home (typically requiring them to install virtual private network software in addition to ssh, X11 and an scp client) or elsewhere on campus.
Secondly, one may provide students with a virtual machine, consisting of an environment similar to that which they might experience on the Linux server but running on a classroom computer, either with a standard Linux distribution or a specialist bioinformatics distribution such as DNALinux Virtual Desktop Edition (‘the VM approach’). This has the advantage that students may be given administrator access to their virtual machine.
Thirdly, one may provide students with a Linux system on removable media (‘the USB stick approach’; where files and settings do not have to be saved, a DVD may be used instead). So long as students have the media to hand, this allows them to boot into ‘their own’ Linux. As with the VM approach, students may be given administrator access. The additional advantage is that the media may be portable between computer classrooms and home computers, without requiring students to move virtual machine image files.
Fourthly, students may be loaned or required to buy laptops of a specific kind, with a suitable operating system, data and software installed (‘the laptop approach’). This avoids hardware incompatibilities that the USB stick approach may, in practice, experience.
Because administrator access cannot be allowed, the server approach fails to give students experience of the standard mechanism of software installation. It also involves competition for resources such as CPU time, especially if the class is large or the server is also shared with research colleagues. The VM approach solves both these problems but is less portable. Although, in theory, students may transfer a VM from one computer to another (assuming the destination has the necessary virtualisation software installed), the task is non-trivial, and more time-consuming than a simple transfer of data or documents. The USB stick approach reduces the portability problem, since it is trivial to move a USB stick from one computer to another. However, smooth operation on all hardware is not guaranteed and requires ongoing efforts from the developers of the Linux distribution as new hardware is released. The laptop approach avoids all these problems by providing a portable computer holding everything required for the course. However, it is expensive.
As a fifth approach, we propose loaning a Raspberry Pi computer and associated peripherals to students for the duration of the course (‘the Raspberry Pi approach’). The loaned system includes a customised version of Linux, appropriate software and data. This allows students full administrator access to a suitable operating system, without the difficulties of the VM or USB stick approaches. Should the student accidentally damage critical files, the system can be re-written from a master image.
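Re-writing a damaged system from a master image amounts to a byte-for-byte copy followed by a check that the copy succeeded. The Python sketch below illustrates the idea under stated assumptions: the function names, the chunk size and any paths are hypothetical and are not part of 4273π. On a real system the target would be the SD card device, which requires administrator rights and great care over the device name; here the target may equally be an ordinary file, which makes the sketch safe to try.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def sha256_prefix(path, length):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def restore_from_master(master_image, target):
    """Copy `master_image` over `target` and report whether the copy verifies.

    `target` stands in for the SD card (hypothetical; an ordinary file also works).
    """
    image_size = os.path.getsize(master_image)
    with open(master_image, "rb") as source, open(target, "wb") as dest:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            dest.write(chunk)
        dest.flush()
        os.fsync(dest.fileno())  # ask the OS to push the data to the medium
    # The card may be larger than the image, so compare only the image-sized prefix.
    return sha256_prefix(target, image_size) == sha256_prefix(master_image, image_size)
```

The read-back comparison is a basic sanity check only; it may be served from the operating system's cache rather than from the card itself.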
The Raspberry Pi Model B - with 256 MB (now 512 MB) RAM, an ARM11 CPU running at 700 MHz before overclocking and a VideoCore IV GPU - was released for public sale in 2012 and costs £28.07 or £31.20 (see endnote a) [10, 11]. Though additional items are required to turn it into a functioning, general-purpose computer (case, charger, SD card, mouse, keyboard, monitor and cable; and an entirely separate computer for initialising the SD card), it is still relatively low-cost (Additional file 1: Table S1). The existence of the Raspberry Pi is partly a celebration of the early days of popular computing in the 1980s, and an attempt to recreate that excitement among young people today. It is also a symptom of the rapidly decreasing costs and increasing performance of computer hardware. The Raspberry Pi uses an ARM CPU. Because of their high performance-per-watt, ARM CPUs are frequently found in small electronic appliances such as mobile phones and tablets. With CPU innovation increasingly driven by such applications, as opposed to more traditional areas such as desktop, laptop and server computers, the prevalence and utility of ARM-based computer hardware is likely to increase. Indeed, ARM-based servers are starting to appear in data centres, due to their modest requirements for power.
Though far slower than current desktop and laptop computers, the Raspberry Pi is notably faster than the Cray 1 supercomputer, a marvel of computer speed in its day. This raises a valid question: how much computing power is actually required to teach bioinformatics to undergraduates? We propose that the answer is, by current standards, ‘not much’. The Raspberry Pi is more than adequate for the task. The Raspberry Pi approach includes all the benefits of the laptop approach, above, but at lower cost. In addition, the Raspberry Pi is a new and exciting computer system, which in itself can add interest to the course.
A variety of operating systems is available for the Raspberry Pi. These include Raspbian, which is based on Debian GNU/Linux. Over 35,000 Debian software packages are available pre-compiled for Raspbian, including Web browsers, text editors, word processors, and a wide range of bioinformatics packages. Other software will usually compile and run without problems. Some features of recent CPUs (e.g. 64-bit addressing or vector operations) are absent, but we have not found these to be at all necessary for our proposed use of the Raspberry Pi. The most serious limitation, affecting structure visualisation software in particular, is limited graphics performance. Even so, some structural visualisation software does work on the Raspberry Pi. New system software, improving graphics performance by making better use of the Raspberry Pi’s GPU, is under development.
We provide 4273π, a customised version of Linux for Raspberry Pi computer hardware. 4273π includes an Open Access bioinformatics course, 4273π Bioinformatics for Biologists.
4273π Bioinformatics for Biologists is based on the module BL4273 Bioinformatics for Biologists at the University of St Andrews, an optional module of 15 SCOTCAT credits, equivalent to 7.5 ECTS credits or ~4 US credits. BL4273 is intended for final-year undergraduate students on BSc(Hons) Biology, BSc(Hons) Biochemistry and other degree courses taught by the School of Biology. BL4273 was taught on Raspberry Pi hardware in Semester 1 of academic year 2012-2013. During this time, students were loaned a Raspberry Pi, SD card and USB stick for backup. Teaching was carried out in a small computer room in which students connected to monitors more generally used with Windows desktop PCs. Students were allowed to take all equipment home on loan (including keyboard, mouse and cables), with the exception of the monitors; several copies of the main textbook were also available. Course material was released week-by-week using an rsync server (running on a Raspberry Pi) on the university intranet. Following the conclusion of the module, material was edited to remove closed-access material (e.g. images in lectures), typographical errors were corrected, and the SD card image was re-created using a recent version of Raspbian.
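As an illustration of how a student machine might pull one week's material from an rsync server, the following Python sketch builds the corresponding rsync command. It is a sketch under stated assumptions, not the procedure used in BL4273: the host name, module name, directory layout (`week01/`, `week02/`, ...) and destination directory are all hypothetical. Only the `rsync://HOST/MODULE/PATH` address form and the `-a` and `-v` options are standard rsync features.

```python
import shlex


def weekly_rsync_command(host, module, week, dest_dir):
    """Build an rsync command that fetches one week's material from an rsync daemon.

    The layout MODULE/weekNN/ is an assumption made for this sketch.
    """
    if week < 1:
        raise ValueError("week numbers start at 1")
    # rsync://HOST/MODULE/PATH addresses a module exported by an rsync daemon.
    source = f"rsync://{host}/{module}/week{week:02d}/"
    # -a preserves the directory tree and file attributes; -v lists transferred files.
    return ["rsync", "-av", source, dest_dir]


if __name__ == "__main__":
    # Hypothetical host, module and destination, for illustration only.
    command = weekly_rsync_command("coursepi.example.org", "bl4273", 3, "4273pi/")
    print(shlex.join(command))
```

Building the command as a list, rather than as a single string, keeps it safe to pass to `subprocess.run` without shell quoting problems.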
Preparation of a release of 4273π begins with the latest Raspbian SD card image. This is then customised to produce a ‘master’ SD card for the release, partly by a series of scripts which alter the configuration and use Raspbian’s port of the APT mechanism of Debian to install specialist bioinformatics packages and more general packages, and partly by a series of commands entered manually (e.g. to install BLAST databases and 4273π Bioinformatics for Biologists in the ~/4273pi/ directory). The master SD card image is stored on a separate computer and uploaded to the 4273π Web site. A ‘work instruction’ detailing the steps performed to convert Raspbian into 4273π, and all scripts used, are distributed with 4273π.
For the permanent record, the teaching material included in the current release - excluding Linux, software and BLAST databases - is available as Additional file 2. The latest version may be downloaded from the 4273π Web site.
As the Raspbian operating system, Raspberry Pi firmware and hardware and 4273π Bioinformatics for Biologists teaching material develop, further releases of 4273π will be made available. It is anticipated that there will be a minimum of two releases per year during the next four years.
Results and discussion
Timetable for 4273π Bioinformatics for Biologists
Genomes, sequences and bioinformatics data.
Linux and Perl.
Linux, Perl and protein BLAST.
Linux, Perl and delimiting gene/protein families.
Multiple alignment and phylogeny.
Multiple alignment and phylogeny.
Gene family evolution.
Gene family evolution.
BLAST; DNA sequence analysis.
DNA sequence analysis.
Looking at species differences.
Detecting positive selection.
Function and evolution of enzymes.
Function and evolution of enzymes.
Seminar: student presentations.
(Administrator) power to the people
Rather than present systems administration as a complex task to be delegated to a technical support unit - which in students’ future careers may not be available - 4273π Bioinformatics for Biologists introduces software installation on Linux through standard package management (APT), through compilation (GNU ‘make’ for SNAP and PAML) and through command-line launch of a JAR (Modelgenerator and Mesquite). Other administrative tasks covered include upgrading Linux and installing a MySQL server. Although these matters of system administration are incidental to an intellectual understanding of bioinformatics, they do not take much time to teach, and we believe they leave students well-prepared for a bioinformatics research career - in many cases, better prepared than any other members of the research group within which they are working.
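The three installation routes named above can be summarised by the commands they typically involve. The Python sketch below only builds and prints those commands; it does not run them. The helper names, the package name, the source directory and the JAR file name are assumptions chosen for illustration and may not match what the course material uses. The underlying tools and options (`apt-get install -y`, `make -C`, `java -jar`) are standard.

```python
import shlex


def apt_install(packages):
    """Package management: install pre-built Debian/Raspbian packages with APT."""
    return ["sudo", "apt-get", "install", "-y", *packages]


def make_build(source_dir):
    """Compilation: run GNU make in a directory of unpacked source code."""
    return ["make", "-C", source_dir]


def run_jar(jar_path, *args):
    """JAR launch: start a Java program directly from its JAR file."""
    return ["java", "-jar", jar_path, *args]


if __name__ == "__main__":
    examples = (
        apt_install(["ncbi-blast+"]),    # example package name, an assumption
        make_build("paml/src"),          # hypothetical source directory
        run_jar("modelgenerator.jar"),   # hypothetical JAR file name
    )
    for command in examples:
        print(shlex.join(command))
```

Only the APT route needs administrator rights (hence `sudo`); compiling from source and launching a JAR can be done inside the student's own home directory.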
The virtue of patience
No serious compromises in content were required to teach bioinformatics on the Raspberry Pi. A BLAST search of the GenPept database (‘nr’) is, in practice, too slow. However, a BLAST search of the SwissProt database takes only a few minutes. Delimitation of protein families across two prokaryotic genomes using BLAST and OrthoMCL is entirely feasible, and is central to the coursework component of 4273π Bioinformatics for Biologists. Java programs (Modelgenerator, Mesquite) run slowly, but not unbearably so. Speed will likely improve once Oracle Java becomes available for the platform. Floating-point-intensive tasks (PhyML and PAML) are also slow, but feasible. Waiting an hour for an analysis is, in fact, realistic training for bioinformatics research. Research will tend to use a far faster computer or cluster, but with far larger input. Teaching bioinformatics on rather slow hardware was not universally popular among students (Additional file 1: Table S2). However, it is a valuable lesson in the transferable virtue of patience, and the value of checking the configuration of analyses before launching them.
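Checking the configuration of an analysis before launching it can be as simple as a few tests on the inputs. The Python sketch below shows one possible pre-flight check for a protein BLAST search. It assumes the BLAST+ command-line program `blastp` and a database formatted in the usual way (an index file ending `.pin`, or an alias file ending `.pal` for a multi-volume database); the function names and the default E-value threshold are hypothetical and are not taken from the course material.

```python
import os


def preflight_problems(query_path, db_prefix, evalue):
    """Return a list of configuration problems; an empty list means 'ready to launch'."""
    problems = []
    if not os.path.isfile(query_path):
        problems.append(f"query file not found: {query_path}")
    else:
        with open(query_path) as handle:
            first_line = handle.readline()
        if not first_line.startswith(">"):
            problems.append("query does not look like FASTA (no leading '>')")
    # A formatted protein database has a .pin index, or a .pal alias if multi-volume.
    if not any(os.path.isfile(db_prefix + ext) for ext in (".pin", ".pal")):
        problems.append(f"no formatted protein database at: {db_prefix}")
    if evalue <= 0:
        problems.append("E-value threshold must be positive")
    return problems


def blastp_command(query_path, db_prefix, out_path, evalue=1e-5):
    """Build a blastp command line (BLAST+ option names); the default E-value is arbitrary."""
    return [
        "blastp",
        "-query", query_path,
        "-db", db_prefix,
        "-evalue", str(evalue),
        "-out", out_path,
    ]
```

A student would call `preflight_problems` first and launch the command from `blastp_command` only if the returned list is empty, which costs seconds and can save an hour of waiting on a misconfigured run.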
Low-cost teaching and learning
When the undergraduate module BL4273 at the University of St Andrews ran previously, in Semester 1 of academic year 2008-2009, it used the server approach. Students used desktop computers running Windows in a computer classroom to connect to a 64-bit IBM System x3755 8877 server with 4 AMD CPU cores and 16 GB RAM, running Debian Linux. When purchased in 2008, this server cost £19,537 after educational discount, including 20 TB external storage (14 TB usable as RAID5), uninterruptible power supply (UPS), tape library for backup, and 3 years’ hardware support. It was housed in a secure air-conditioned server room, where it required the University’s IT Services department for installation and maintenance (and for setting up the tape library in a different building) and the University’s Estates department to set up a new 15A electrical connection for the UPS. Although this did not happen, there was always the fear of down-time at some crucial stage in teaching, which is more problematic for a single server being used by all students than for (say) one desktop computer in a classroom. As well as teaching, the server was used for research purposes, leading to worries about conflicting resource use between students and researchers.
To use 4273π, the investment in hardware per set of equipment, including the Raspberry Pi but excluding the monitor, before any quantity or educational discount is ~£147-£159, depending on what exactly is bought and from where (Additional file 1: Table S1; Background). This is far cheaper than the server used in 2008-2009, but is not extremely cheap. However, these are mostly one-off expenses, since the equipment may be re-used; some components are likely already present in an educational establishment; the maximum cost of any one part is no more than £31.20, allowing cheap repairs compared to ‘the laptop approach’; and the cost of some parts of the equipment (e.g. SD card) continues to fall noticeably. Among its other advantages, the Raspberry Pi approach is a low-cost method for bioinformatics teaching and learning.
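The scale of the difference can be made concrete with a little arithmetic. The Python sketch below uses only the two figures quoted in the text (the £19,537 server and the ~£147-£159 cost per set); the class size is a hypothetical input and the function names are illustrative.

```python
SERVER_COST_GBP = 19537          # 2008 server purchase price, from the text
SET_COST_RANGE_GBP = (147, 159)  # approximate cost per Raspberry Pi set, excluding monitor


def class_cost_range(n_students):
    """Return the (low, high) hardware cost in GBP of equipping `n_students` with one set each."""
    low, high = SET_COST_RANGE_GBP
    return n_students * low, n_students * high


def sets_for_server_price():
    """Return the (fewest, most) whole sets that the server's purchase price would buy."""
    low, high = SET_COST_RANGE_GBP
    # Dearer sets give the smaller count, cheaper sets the larger one.
    return SERVER_COST_GBP // high, SERVER_COST_GBP // low
```

For a hypothetical class of 20, `class_cost_range(20)` gives £2,940 to £3,180, and `sets_for_server_price()` shows that the server's purchase price corresponds to between 122 and 132 sets. The comparison is only rough, since the server price also covered storage, backup and hardware support.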
By including an explicit Open Access licence, and removing or replacing material incompatible with this from 4273π Bioinformatics for Biologists, we have been able to share it with anyone interested, the world over, in such a way that they can - with minimal care - re-use and adapt it without accusation of plagiarism or copyright violation. This approach is broadly in common with the pioneering EcoEd Digital Library and related portals, but is in contrast to most of the teaching material that can be found by an online search, for which the licence is unclear. We expect our approach will lead to mutual benefits, for example the contribution of corrections or teaching material by others. As Open Access publication is becoming more standard for research, we predict that Open Access will become more standard for teaching material.
Availability and requirements
Project name: 4273π
Project home page: http://eggg.st-andrews.ac.uk/4273pi
Operating Systems: Linux
Other requirements: Raspberry Pi computer hardware
Licence: 4273π Bioinformatics for Biologists has a Creative Commons Attribution licence (http://creativecommons.org/licenses/by/2.0)
Any restrictions to use by non-academics: no
Endnote a: Prices in British Pounds (GBP), including UK tax but excluding delivery charges, obtained on 26 June 2013. £1 converts to approximately $1.54 US Dollars, €1.18 Euro, or R93.04 Indian Rupees.
AMD: Advanced Micro Devices
APT: A package tool
BSc(Hons): Bachelor of Science with Honours
CPU: Central processing unit
ECTS: European credit transfer and accumulation system
GNU: GNU’s not UNIX!
GPU: Graphics processing unit
IBM: International Business Machines
NCBI: National Center for Biotechnology Information
NEBC: NERC Environmental Bioinformatics Centre
NERC: Natural Environment Research Council
RAID5: Redundant array of independent disks, level 5
SCOTCAT: Scottish Credit Accumulation and Transfer
USB: Universal serial bus
UPS: Uninterruptible power supply
X11: The X Window System.
Losia Lagisz contributed to the lecture material on Looking at Species Differences. Ian Grieve and Lianne Baker assisted with the task of obtaining sufficient Raspberry Pi computers, during summer and autumn 2012 when they were scarce. All students of BL4273 Bioinformatics for Biologists at the University of St Andrews are thanked for testing teaching material, particularly those in academic year 2012-2013, who used the Raspberry Pi. Their contribution has been invaluable. We thank the developers of Raspbian Linux, upon which 4273π is based, for their rapid and professional work. Clare Peddie’s enthusiasm (as Director of Teaching for the School of Biology, University of St Andrews) allowed the necessary equipment purchases to go ahead. JBOM is grateful to the Scottish Universities Life Sciences Alliance (SULSA) for financial support and thanks the BBSRC for funding his research through grant BB/I00596X/1. The University of St Andrews provided funding for the Open Access charge.
1. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
2. NCBI BLAST home. http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. GNU operating system. http://www.gnu.org
4. Field D, Tiwari B, Booth T, Houten S, Swan D, Bertrand N, Thurston M: Open software for biologists: from famine to feast. Nat Biotechnol. 2006, 24: 801-803. 10.1038/nbt0706-801.
5. Bassi S, Gonzalez VC: DNALinux virtual desktop edition. Nature Precedings. 2007, http://dx.doi.org/10.1038/npre.2007.670.1
6. Bio-Linux 7 USB memory sticks. http://nebc.nerc.ac.uk/tools/bio-linux/live-usbkey
7. Yu G, Wang LG, Meng XH, He QY: LXtoo: an integrated live Linux distribution for the bioinformatics community. BMC Res Notes. 2012, 5: 360. 10.1186/1756-0500-5-360.
8. Raspberry Pi: An ARM GNU/Linux box for $25. http://www.raspberrypi.org
9. BBC News: The Raspberry Pi computer goes on general sale. http://www.bbc.co.uk/news/technology-17190918
10. Element14 Community: Raspberry Pi. http://www.element14.com/community/groups/raspberry-pi
11. RS Components: Raspberry Pi. http://uk.rs-online.com/web/generalDisplay.html?id=raspberrypi
12. Roberts J: Is the Raspberry Pi the future of computing? techradar.pro, from Linux Format Issue 156. 2012, http://www.techradar.com/news/computing/pc/is-the-raspberry-pi-the-future-of-computing-1078276
13. ARM: The architecture for the digital world. http://www.arm.com
14. Latif L: ARM sees its 32-bit chips being deployed in future servers: not everything needs 64-bit addressing. The Inquirer. 2013, http://www.theinquirer.net/inquirer/news/2259386/arm-sees-its-32bit-chips-being-deployed-in-future-servers
15. 2000 Nickels: A Cray for $35. http://2000nickels.com/blog/2012/11/19/a-cray-for-35-dollars
16. Raspberry Pi: Operating system distributions. http://www.raspberrypi.org/phpBB3/viewforum.php?f=18
17. Raspbian. http://www.raspbian.org
18. Debian. http://www.debian.org
19. Möller S, Krabbenhöft HN, Tille A, Paleino D, Williams A, Wolstencroft K, Goble C, Holland R, Belhachemi D, Plessy C: Community-driven computational biology with Debian Linux. BMC Bioinformatics. 2010, 11 (Suppl 12): S5. 10.1186/1471-2105-11-S12-S5.
20. O’Boyle NM: Noel O’Blog: Chemistrify your Raspberry Pi Part III. http://baoilleach.blogspot.co.uk/2013/01/chemistrify-your-raspberry-pi-part-iii_19.html
21. Raspberry Pi: Wayland preview. http://www.raspberrypi.org/archives/4053
22. University of St Andrews: Undergraduate course catalogue. 2012, http://www.st-andrews.ac.uk/coursecatalogue/ug/2012-2013
23. Bradnam K, Korf I: Unix and Perl to the rescue! a field guide for the life sciences (and other data-rich pursuits). 2012, Cambridge: Cambridge University Press
24. Raspberry Pi: Downloads. http://www.raspberrypi.org/downloads
25. 4273π. http://eggg.st-andrews.ac.uk/4273pi
26. Korf I: Gene finding in novel genomes. BMC Bioinformatics. 2004, 5: 59. 10.1186/1471-2105-5-59.
27. Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14: 988-995. 10.1101/gr.1865504.
28. EMBL-EBI: GeneWise input form. http://www.ebi.ac.uk/Tools/psa/genewise
29. Marygold SJ, Leyland PC, Seal RL, Goodman JL, Thurmond J, Strelets VB, Wilson RJ, the FlyBase Consortium: FlyBase: improvements to the bibliography. Nucleic Acids Res. 2013, 41: D751-D757. 10.1093/nar/gks1024.
30. Gardiner A, Barker D, Butlin RK, Jordan WC, Ritchie MG: Drosophila chemoreceptor gene evolution: selection, specialization and genome size. Mol Ecol. 2008, 17: 1648-1657. 10.1111/j.1365-294X.2008.03713.x.
31. Chambers R: Vestiges of the natural history of creation. 1844, London: John Churchill
32. Yang Z: PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
33. Keane TM, Creevey CJ, Pentony MP, Naughton TJ, McInerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol. 2006, 6: 29. 10.1186/1471-2148-6-29.
34. Maddison WP, Maddison DR: Mesquite: a modular system for evolutionary analysis. Version 2.75. 2011, http://mesquiteproject.org
35. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.
36. EcoEd digital library. http://ecoed.esa.org
37. EvoEd digital library: DRD partners and people. http://evoed.evolutionsociety.org/index.php?P=DRD_People
38. XE Currency converter. http://www.xe.com/currencyconverter
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.