Not only Excel
Heikki Lehvaslaiho, European Bioinformatics Institute
30 June 2004
I quickly tested a few common open-source spreadsheet programs (OpenOffice.org Calc, Gnumeric and KSpread) to see whether they share this automatic symbol-mutation behavior.
The crude text table below shows whether the conversion happens by default in each program. "date" means that a DEC1-type string is converted to a date; "float" means that a RIKEN identifier such as "2310009E13" is converted to a floating-point number.
Program     "date"   "float"
calc        yes      yes
gnumeric    no       yes
kspread     no       yes
Be careful out there!
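A minimal Python sketch of how one might screen identifiers for the two kinds of conversion in the table before opening a file in a spreadsheet. The month prefixes and the 1 to 31 day range are assumptions chosen for illustration; this is a heuristic, not the actual parsing rules of Calc, Gnumeric, KSpread or Excel, so it can both miss and over-flag identifiers. The function name `conversion_risk` is hypothetical.

```python
import re
from typing import Optional

# Assumed month-name prefixes that a spreadsheet may read as a date when
# followed by a day number (e.g. "DEC1" -> 1-Dec). Illustrative, not exhaustive.
MONTH_PREFIXES = ["JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
                  "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC"]

# Month prefix, optional hyphen, then a one- or two-digit day number.
DATE_LIKE = re.compile(r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTH_PREFIXES),
                       re.IGNORECASE)

# Digits, "E", digits: readable as scientific notation ("2310009E13").
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(identifier: str) -> Optional[str]:
    """Return "date", "float" or None for a single identifier (heuristic)."""
    token = identifier.strip()
    match = DATE_LIKE.match(token)
    # Only treat it as a date if the trailing number is a plausible day.
    if match and 1 <= int(match.group(1)) <= 31:
        return "date"
    if FLOAT_LIKE.match(token):
        return "float"
    return None


if __name__ == "__main__":
    for ident in ["DEC1", "SEPT2", "2310009E13", "TP53", "MARCH1"]:
        print(ident, conversion_risk(ident))
```

Flagged identifiers can then be imported explicitly as text (or kept out of the spreadsheet altogether) rather than relying on the program's defaults.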
Competing interests
None declared
Well spotted
Andrew Clegg, Birkbeck
21 July 2004
One to pin up on lab walls everywhere. I shudder to think how many pieces of work this might have affected.
Competing interests
None declared
Special Interest group on spreadsheet risks
Patrick O'Beirne, Eusprig
26 July 2004
The European Spreadsheet Risk Interest Group (EUSPRIG) discusses the prevention and detection of spreadsheet errors. You can read about the emergence of the discipline of Spreadsheet Engineering and other related information at our website <a href="http://www.eusprig.org">www.eusprig.org</a>. We have just completed our fifth international conference and now have a corpus of approximately 100 peer-reviewed papers in our subject domain.
For more reports of spreadsheet errors, see
<a href="http://www.eusprig.org/stories.htm">our stories</a>
We are not specifically a group for discussing Excel bugs and workarounds; the <a href="http://peach.ease.lsoft.com/archives/excel-l.html">Excel-L list</a> is a very busy source of information on these, as, of course, is the MS Knowledge Base.
We are very interested in hearing from users about how they mitigate spreadsheet risks, what good practices they adopt, and so on. We are working with the ECDL Foundation on a syllabus of good practice for end users.
Patrick O'Beirne, chair, Eusprig
Competing interests
none
Good point.
Carol Bult, The Jackson Laboratory
27 July 2004
The article raises a very good point. I've experienced similar behavior in Excel with other data types. I would add that it is always a good idea to carry a unique numeric database ID along with gene names/symbols. Database accession IDs may be less likely to be munged by Excel (unless the IDs are alphanumeric!), and since they are usually unique and permanent, they can be used to restore and/or update lists of gene names/symbols (which change all the time).
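The ID-based repair suggested above could look roughly like this Python sketch. It assumes each row carries a database ID next to its symbol and that a fresh ID-to-symbol mapping is available from the source database; the function name `restore_symbols` and any IDs used with it are hypothetical.

```python
from typing import Dict, List, Tuple


def restore_symbols(rows: List[Tuple[str, str]],
                    reference: Dict[str, str]) -> List[Tuple[str, str]]:
    """Replace each row's symbol with the current symbol for its database ID.

    rows      -- (database_id, symbol) pairs as they came back from a spreadsheet
    reference -- database_id -> authoritative symbol, assumed to be loaded
                 from a fresh download of the source database
    Rows whose ID is missing from the reference keep their existing symbol.
    """
    restored = []
    for db_id, symbol in rows:
        # Compare IDs as stripped strings; a munged symbol such as "1-Dec"
        # is simply overwritten by the reference value.
        restored.append((db_id, reference.get(db_id.strip(), symbol)))
    return restored
```

The same lookup both repairs munged symbols and picks up symbols that have been renamed since the list was made, which is why keeping the ID column alongside the symbol pays off.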
Competing interests
No competing interests
19 probe sets in Affymetrix's human U133Plus2.0
Chao Lu, Hospital for Sick Children, Toronto
28 July 2004
A good point. Many people have not paid attention to this 'small' error.
Here is a list of 19 probe sets whose gene symbols (Affymetrix annotation of 23 June 2004) are corrupted when the file is opened in Excel:
1570394_at ===> 1-Sep
200902_at ===> 15-Sep
208999_at ===> 8-Sep
209000_s_at ===> 8-Sep
212413_at ===> 6-Sep
212414_s_at ===> 6-Sep
212415_at ===> 6-Sep
212698_s_at ===> 10-Sep
213666_at ===> 6-Sep
214298_x_at ===> 6-Sep
214720_x_at ===> 10-Sep
220781_at ===> 1-Dec
221129_at ===> 2-Apr
223362_s_at ===> 3-Sep
225814_at ===> 1-Sep
226627_at ===> 8-Sep
227034_at ===> 10-Sep
227552_at ===> 1-Sep
233632_s_at ===> 1-Sep
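To check whether an annotation file has already been through Excel, one could scan the symbol column for values in the "day-Mon" form shown in the list. The Python sketch below is hypothetical and recognizes only that one rendering (not date serial numbers or other locale formats); it flags suspects rather than converting them back, because the date alone does not reliably record the original spelling of the symbol.

```python
import re
from typing import Iterable, List, Tuple

# The "day-Mon" form seen in the list above, e.g. "1-Sep", "15-Sep", "2-Apr".
MANGLED = re.compile(
    r"^\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")


def find_mangled(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (probe_set, symbol) pairs whose symbol looks like a converted date."""
    return [(probe, sym) for probe, sym in pairs if MANGLED.match(sym.strip())]
```

The flagged probe sets can then be re-annotated from an unmodified copy of the annotation file.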
Competing interests
None declared
And the lesson is...
Neil Saunders, University of Queensland
11 April 2008
And that's why bioinformaticians don't use Excel for this purpose. Or more generally, don't use spreadsheets as "databases".
Competing interests
None declared
MS should pick this up
Richard Jackson, Independent
12 May 2011
I believe a large part of bioinformatics is about providing a conduit between experts in different fields, as well as novel discovery. People often have their own preferences for data-manipulation packages, and scientists with less technical expertise frequently tend towards Excel. Moving data back and forth between individuals in this way gives ample opportunity for errors like this to arise.
Hence, I think the problem is widespread and serious enough to warrant intervention by Microsoft. I don't know whether they have picked up on this article yet. Sadly, their website does not seem to offer anything like a suggestion box (I spent an hour looking!)
Competing interests
None declared