Using Spreadsheets In Bioinformatics Can Corrupt Data, Changing Gene Names Into Dates

from the careful,-now dept

A few years back, people were rather disturbed to find out about the famous Excel bug, whereby the multiplication of two numbers in Microsoft's spreadsheet gave the wrong number. It turns out there are other circumstances in which Excel (and, to be fair, presumably other spreadsheets) can give incorrect results, but they are unlikely to be encountered in typical everyday tasks. However, in the specialized world of bioinformatics, which uses computers to analyze data about genes and related areas, careless use of spreadsheets can throw up a significant numbers of errors, as this paper in BMC Bioinformatics explains:

Use of one of the research community's most valuable and extensively applied tools for manipulation of genomic data can introduce erroneous names. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] was being converted to '1-DEC.'
Here we have the interesting interaction of two very different fields, where the name of a gene involved in esophageal cancer, DEC1, was interpreted by Excel to mean the date, 1 December. As the paper points out, these kinds of substitution errors are already to be found in key public databases:
DEC1, a possible target for cancer therapy, was incorrectly rendered, and it could potentially be missed in downstream data analysis. The same type of error can infect, and propagate through, the major public data resources. For example, this type of error occurs several times in even the immaculately curated LocusLink database.
As that notes, a gene that might be relevant for treating cancer could well be missed because of this incorrect conversion to a date by Excel. Although it is unlikely that any serious harm has been caused by this -- yet -- it's a useful reminder of the dangers of depending a little too heavily on the results of software without checking for corruption of this kind.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+

Hide this

Thank you for reading this Techdirt post. With so many things competing for everyone’s attention these days, we really appreciate you giving us your time. We work hard every day to put quality content out there for our community.

Techdirt is one of the few remaining truly independent media outlets. We do not have a giant corporation behind us, and we rely heavily on our community to support us, in an age when advertisers are increasingly uninterested in sponsoring small, independent sites — especially a site like ours that is unwilling to pull punches in its reporting and analysis.

While other websites have resorted to paywalls, registration requirements, and increasingly annoying/intrusive advertising, we have always kept Techdirt open and available to anyone. But in order to continue doing so, we need your support. We offer a variety of ways for our readers to support us, from direct donations to special subscriptions and cool merchandise — and every little bit helps. Thank you.

–The Techdirt Team

Filed Under: bioinformatics, data, errors, genes, spreadsheets


Reader Comments

Subscribe: RSS

View by: Time | Thread


  • identicon
    Anonymous Coward, 30 Jul 2014 @ 9:09pm

    Some software excels, some not so much.

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 30 Jul 2014 @ 9:14pm

    "DAMN AUTOCORRECT!"

    link to this | view in chronology ]

    • icon
      Ninja (profile), 31 Jul 2014 @ 6:01am

      Re:

      Yeah, it keeps trying to correct my cursing and swearing. I want to call people a bunch of aunts damn it! I want to use dock, not some fancy medical name! Duck it!

      link to this | view in chronology ]

  • identicon
    Zem, 30 Jul 2014 @ 9:21pm

    A perfect example of why we need software patents. If Visicalc had had the proper patent protection MS would never have been able to produce this bug ridden product that now dominates the market.

    am I doing this right?

    link to this | view in chronology ]

  • icon
    sinsi (profile), 30 Jul 2014 @ 9:30pm

    Select column, Format as Text.

    link to this | view in chronology ]

    • icon
      Woadan (profile), 31 Jul 2014 @ 12:37am

      Re:

      Click on the square above row 1 and to the left of column a to highlight all cells, then right-click a cell, choose format, choose text, OK as many times as needed to get out and get back to the sheet.

      link to this | view in chronology ]

      • icon
        MrTroy (profile), 31 Jul 2014 @ 1:02am

        Re: Re:

        This looks to be a text import/entry bug rather than a formatting bug, so I'm not sure that this would help.

        link to this | view in chronology ]

        • identicon
          Anonymous Coward, 31 Jul 2014 @ 6:42am

          Re: Re: Re:

          Excel sucks at importing CSV data, it is a well known feature.

          link to this | view in chronology ]

    • identicon
      Anonymous Coward, 31 Jul 2014 @ 5:18am

      Re:

      Doesn't work. you import the data to excel and it changes the content. SEPT9 goes to 9-Sept, and when you "format as text", it doesn't get changed back to SEPT9.

      link to this | view in chronology ]

      • identicon
        Anonymous Coward, 31 Jul 2014 @ 5:45am

        Re: Re:

        And they call this a feature because they design to the lowest common denominator. They might release a "professional" version for a few more Benjamins.

        link to this | view in chronology ]

      • identicon
        Donglebert The Needlessly Unready, 31 Jul 2014 @ 6:31am

        Re: Re:

        If you open excel, and then try to open a text or csv file, the Import Wizard will start and give the option to define each column's data type before import. It won't convert it.

        The wizard doesn't run if you click on a file and select to open it in Excel, or if you set your exporting application to directly open Excel, or if you paste the data into a sheet. It will then autocorrect the data into a date field that, at it's heart, is a formatted number. If you convert that back to text, you'll see the number.

        link to this | view in chronology ]

      • identicon
        Anonymous Coward, 31 Jul 2014 @ 1:18pm

        Re: Re:

        It works fine IF you format the columns BEFORE the import. That takes a little planning and advance work, so maybe not.
        .

        link to this | view in chronology ]

    • icon
      mermaldad (profile), 31 Jul 2014 @ 10:12am

      Re:

      Nope. The problem is that Excel converts the *value* of the cell to the numeric equivalent that represents the date, and also converts the cell format to Date. So if you format as text, you fix one problem, but not the other. "DEC1" becomes 41974, for example.

      link to this | view in chronology ]

  • identicon
    Editor-In-Chief, 30 Jul 2014 @ 9:41pm

    This problem occurs in many different data analysis situations using Microsoft spreadsheets

    I first came across this problem around 2001 with loading sub-contract data for analysis. it just so happened that some of the text data would be interpreted as floating point numbers, thereby rendering the data analysis completely wrong.

    The data had to be pre-processed to insert "" around the relevant fields before loading the data into the spreadsheet. This did stop the spreadsheet conversion routines from touching that specific data. Unfortunately, the data was supplied in varying formats so that a uniform method couldn't be developed at the time in the timeframe required.

    David Oliver Graeme Samuel Offenbach

    link to this | view in chronology ]

  • identicon
    Lawrence D’Oliveiro, 30 Jul 2014 @ 11:09pm

    Spreadsheets Are Undebuggable

    As an experienced software developer, I just have to roll my eyes in disbelief at the number of people who trust complicated spreadsheets to give the right answers without the ability to check that they actually do.

    Even important economic research is being called into question because of spreadsheet errors.

    link to this | view in chronology ]

    • identicon
      Anonymous Coward, 31 Jul 2014 @ 5:36am

      Re: Spreadsheets Are Undebuggable

      The errors in Reinhart-Rogoff went beyond mere incompetence with Excel. But yes, they severely undercut their analysis with Excel based stupidity

      link to this | view in chronology ]

    • identicon
      Anonymous Coward, 31 Jul 2014 @ 5:48am

      Re: Spreadsheets Are Undebuggable

      But if it is in a pie chart then it has to be correct, even when the total percentages of one pie add up to over 100.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 30 Jul 2014 @ 11:26pm

    So instead of spreadsheets...

    what should they use? Clearly spreadsheets are efficient but unreliable, so what other data software is out there that can do the same job?

    link to this | view in chronology ]

    • icon
      Woadan (profile), 31 Jul 2014 @ 12:40am

      Re: So instead of spreadsheets...

      Sheets app for Android; OpenOffice or LibreOffice. (Not sure if they share similar issues.)

      link to this | view in chronology ]

      • icon
        MrTroy (profile), 31 Jul 2014 @ 1:09am

        Re: Re: So instead of spreadsheets...

        Confirmed that OpenOffice does, and I think that Google Docs does as well.

        link to this | view in chronology ]

    • icon
      Eldakka (profile), 31 Jul 2014 @ 1:49am

      Re: So instead of spreadsheets...

      It's not so much the spreadsheet that's the issue, it's how people are using it. If the data was imported in the correct way it wouldn't have been an issue. However, most of the people doing this sort of data analysis (geneticists, biologists, economists) aren't Computer Science people, therefore they may not even be aware that this type of issue can occur.

      It's the same as climate scientists stating "If you aren't a climate scientist, you have no authority to comment on our work." But the problem is, most of climate science (computer modelling, statistical analysis) isn't "Climate" Science, it's a mix of Computer Science (programming for computer modelling), Fluid Dynamics (also for computer modelling, you've got to accurately model fluid dynamics to have an accurate atmospheric model), Statistical analysis (all the climate chronologies that are done are based on Statistics) and so on.

      When a non-Computer Scientist programs an advanced model, without input from a physicist/Engineer (for fluid dynamics), and picks and uses statistical methods without an in-depth knowledge of statistics to know which method to use when, you are going to have problems.

      However, all these users (Climate Scientists, Biologist, Engineer and what have you) go: "Wonderful, I don't need all these specialists (Computer Scientist, programmer, Engineer, Climate Scientist, Biologist), I can just use an off the shelf spreadsheet, whip up a few formulas in it, and presto!".

      A Computer Scientist might go: "I'll just write my own program to do this climate modelling thing, I've got a fluid dynamics text book, I'll just read that and implement it in code." The code may be beautiful, concise, with the formula's implemented flawlessly...until a fluid dynamics specialist looks at it and goes "why did you use that formula/paradigm there? That's a rather exotic case, superheated plasmas, it's not very good when you are modelling a layer of air that's at -45C..."

      Of course, if you don't have an in-depth knowledge of spreadsheets (hell, a civil servant who lives, breathes, finance and the spreadsheets that are daily created probably knows how to use a spreadsheet program better than a 'scientist', whether that's an Engineer, Computer Scientist, Economist, Climate Scientist what have you), or Fluid Dynamics, or programming or Computer Science, then you'll probably end up with a balls-up.

      Unless you have expertise in the other field, you should engage an expert in that field to at least advise on what you are doing. If a Biologist needs to manipulate a lot of genes with a computer...consult a computer scientist on the best way to do what you want done, and have them implement it for you if necessary...and if the implementation requires expertise from another field, say fluid dynamics, then the computer scientist should probably consult a fluid dynamacist, or statistician or whatever else is necessary.

      link to this | view in chronology ]

    • identicon
      Lawrence D’Oliveiro, 31 Jul 2014 @ 3:37am

      Re: So instead of spreadsheets...

      How about proper stats analysis and data visualization software?

      * NumPy and SciPy
      * matplotlib
      * The R statistical language
      * SAGE
      * GNU Octave

      ... just to name a few.

      link to this | view in chronology ]

    • identicon
      Anonymous Coward, 31 Jul 2014 @ 5:54am

      Re: So instead of spreadsheets...

      "Clearly spreadsheets are efficient"

      I do not understand this statement. Are you saying that the software in question is efficient at producing erroneous output? How can this be considered efficient when the user then has to expend countless hours finding and correcting the introduced errors. I find this to be grossly inefficient. You would be better off writing your own scripts.

      link to this | view in chronology ]

      • icon
        nasch (profile), 31 Jul 2014 @ 6:39am

        Re: Re: So instead of spreadsheets...

        You would be better off writing your own scripts.

        Because that will work right the first time?

        link to this | view in chronology ]

        • identicon
          Anonymous Coward, 31 Jul 2014 @ 7:04am

          Re: Re: Re: So instead of spreadsheets...

          No, because you control the outcome.

          It was an example of how inefficient a poorly designed application is, when doing it yourself is potentially better.

          link to this | view in chronology ]

          • icon
            nasch (profile), 31 Jul 2014 @ 7:24am

            Re: Re: Re: Re: So instead of spreadsheets...

            It was an example of how inefficient a poorly designed application is, when doing it yourself is potentially better.

            So you're saying making mistakes using Excel (because that's what happened here) is inefficient, and it would be a better use of time for that same person (who is a scientist, not a software developer, and can't use Excel properly) to write their own software from scratch?

            link to this | view in chronology ]

            • identicon
              Lawrence D’Oliveiro, 31 Jul 2014 @ 5:44pm

              Re: would be a better use of time for that same person (who is a scientist, not a software developer

              Hence the well-known saying, that there is no time to do it right, but there is time to do it over.

              link to this | view in chronology ]

    • icon
      LduN (profile), 31 Jul 2014 @ 6:31am

      Re: So instead of spreadsheets...

      Any database should do the trick... quite simple too

      link to this | view in chronology ]

  • identicon
    Rekrul, 31 Jul 2014 @ 12:19am

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it.

    It's the same thinking that lead to them adding a virus distribution mechanism to Windows and calling it AutoRun/AutoPlay.

    link to this | view in chronology ]

    • icon
      Woadan (profile), 31 Jul 2014 @ 12:50am

      Re:

      The likelihood is that MS responded to many Excel community requests to make data entry easier for some sorts of data, such as dates, and so Excel now "senses" a date has been entered and converts the data to its default date setting.

      Excel is used by many organizations for many purposes. You need to at least be aware of how the software functions. Otherwise why are you using it?

      link to this | view in chronology ]

      • identicon
        Anonymous Coward, 31 Jul 2014 @ 5:57am

        Re: Re:

        Yes, it is sort of like using crayons to make your presentation to the board.

        link to this | view in chronology ]

      • identicon
        Rekrul, 31 Jul 2014 @ 10:02am

        Re: Re:

        The likelihood is that MS responded to many Excel community requests to make data entry easier for some sorts of data, such as dates, and so Excel now "senses" a date has been entered and converts the data to its default date setting.

        That should be an option.

        I'm sure that there are probably quite a few people who wouldn't mind having all their figures rounded up or down to make things simpler. Should the software just go ahead and do that for everyone?

        link to this | view in chronology ]

        • icon
          mdpopescu (profile), 31 Jul 2014 @ 10:57am

          Re: Re: Re:

          Yes. In general, the less options you present to the user, the better. See various articles on the subject, like Joel's or Jeff Atwood's.

          link to this | view in chronology ]

        • identicon
          Michael, 31 Jul 2014 @ 11:50am

          Re: Re: Re:

          It IS an option.

          You can turn this option off by selecting your worksheet and formatting the cells as text. Anything you add from that point forward will be treated as text.

          It's not rocket science, but sometimes it helps to RTFM.

          link to this | view in chronology ]

          • identicon
            Anonymous Coward, 31 Jul 2014 @ 12:40pm

            Re: Re: Re: Re:

            It's not rocket science, but sometimes it helps to RTFM.

            But Micro$oft claims their software is intuitive to use, therefore users should not need to RTFM.
            /big myth

            link to this | view in chronology ]

    • icon
      ottermaton (profile), 31 Jul 2014 @ 3:41am

      Re:

      This is a perfect example of Microsoft's attitude of thinking they know what's best for the user.

      This. A thousand times over.

      link to this | view in chronology ]

    • identicon
      Michael, 31 Jul 2014 @ 5:43am

      Re:

      This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it.

      Yay for Microsoft bashing. However, this is the DEFAULT behavior of excel when pasting in data from the clipboard that is in certain common formats. Someone actually familiar with the software would have know this was a possibility and could have taken lots of different steps to avoid it - the simplest being to set the format of the column to text before pasting (all of three clicks), or properly using the import features of the software to identify data types correctly.

      This particular "issue" is not a software problem.

      If I had to select the individual data type and formatting of everything I ever pasted into Excel, I would never get anything done.

      link to this | view in chronology ]

      • identicon
        Anonymous Coward, 31 Jul 2014 @ 6:03am

        Re: Re:

        "Yay for Microsoft bashing."

        Well deserved bashing in this case.

        Rather than make excuses and describe workarounds for a poorly designed software application, one might simply use a more sophisticated suite from a different source that is more suited to their purpose. I realize that in many corporate environments obtaining such special treatment is an uphill battle, so good luck with that.

        link to this | view in chronology ]

        • icon
          nasch (profile), 31 Jul 2014 @ 6:43am

          Re: Re: Re:

          Rather than make excuses and describe workarounds for a poorly designed software application

          It's not a workaround, it's using the software correctly. Maybe the default behavior isn't right for these users, but that doesn't mean the software is flawed (in this particular way). It's not "Excel for Bioinformatics" after all. If a user doesn't know how to import data correctly, doesn't learn how before doing it, and ends up with incorrect data, why is that Microsoft's fault?

          link to this | view in chronology ]

          • identicon
            Anonymous Coward, 31 Jul 2014 @ 7:01am

            Re: Re: Re: Re: Incorrect data

            If a user doesn't know how to import data correctly, doesn't learn how before doing it, and ends up with incorrect data, why is that Microsoft's fault?
            The data is mangled by a "feature" that is on-by-default in every new document, and as I understand it, the user is never proactively informed that the feature exists. I understand the appeal of a tool that is immediately ready for use on first install (after Click Once gets done downloading the multi-GB installer for 15+ minutes ;)), but presenting an interface that looks like it will always "do the right thing" is an attractive nuisance. It lures users into assuming Excel will do the right thing, without any hint that its idea of right differs from what the user intends.

            A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results. The burden would then be on the user to go back and reimport the data correctly. A fancier version might offer links to documentation that explains what kind of normalization Excel enables and options to re-run the import in place with those normalizations disabled. A very fancy version could even import the data in non-normalized form, but provide an option to have Excel interact with it (render, search, transform, etc.) in normalized or as-imported form.

            link to this | view in chronology ]

            • icon
              nasch (profile), 31 Jul 2014 @ 7:18am

              Re: Re: Re: Re: Re: Incorrect data

              A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results.

              Sadly, users rarely read dialogs. Reading documentation is probably even less likely. Given that they're not going to get much help from the user, software designers have to choose what is the best default behavior for the greatest number of users. Perhaps you disagree with the default behavior, but either way it's going to be wrong for someone.

              link to this | view in chronology ]

            • identicon
              Michael, 31 Jul 2014 @ 8:37am

              Re: Re: Re: Re: Re: Incorrect data

              A dialog on first import saying "Excel has normalized some date-like strings." could have alerted the user that the import mangled the results.

              I'm not sure if you have noticed that software has really stopped popping up dialogs like this, but while they used to be standard practice, sometime in the late 80's software companies realized that their customers HATE THOSE THINGS MORE THAN GETTING SOMETHING WRONG NOW AND THEN.

              link to this | view in chronology ]

          • icon
            Avatar28 (profile), 31 Jul 2014 @ 8:16am

            Re: Re: Re: Re:

            > It's not "Excel for Bioinformatics" after all.

            Exactly. People need to remember that Excel isn't marketed for that particular use. The vast VAST majority of users of this software benefit from these sorts of changes. Remember, most will be using it for things like financial data and stuff like that. For those people importing a date that displays in a different format can create issues. In the US we would write Dec 1 but in Europe 1 Dec is more common. I can pretty much guarantee that displaying gene abbreviations correctly is a niche use that NO designer ever had cross their mind. I'd also be surprised if the other packages on the market didn't do the same thing.

            link to this | view in chronology ]

      • icon
        John Fenderson (profile), 31 Jul 2014 @ 10:23am

        Re: Re:

        "This particular "issue" is not a software problem."

        I couldn't disagree more. This issue is most definitely a software problem. At the very least, conversion warnings should have been issued.

        link to this | view in chronology ]

        • icon
          nasch (profile), 31 Jul 2014 @ 11:15am

          Re: Re: Re:

          At the very least, conversion warnings should have been issued.

          That would merely be a CYA issue, since generally users would ignore them. It wouldn't have any substantive effect.

          link to this | view in chronology ]

        • identicon
          Michael, 31 Jul 2014 @ 11:56am

          Re: Re: Re:

          These options are easily turned off.

          They are on by default because that is what most of the users want most of the time. They have gotten years of feedback, complaints, and requests to come up with the defaults - they didn't arbitrarily decide that automatic date detection should be on and automatic fraction detection should be off.

          However, the included documentation and the ability to control all of these things - just in case you don't want them on for a specific purpose.

          If they had gone your way, you would type in a list of numbers, highlight them, and wonder why it didn't show you the total.

          link to this | view in chronology ]

    • icon
      John Fenderson (profile), 31 Jul 2014 @ 10:21am

      Re:

      "This is a perfect example of Microsoft's attitude of thinking they know what's best for the user. Instead of allowing the user to set such options, their software just goes ahead and does it."

      A million times this. This is a consistent problem with Microsoft software.

      link to this | view in chronology ]

      • icon
        MrTroy (profile), 31 Jul 2014 @ 7:17pm

        Re: Re:

        Microsoft does a lot of bad things in a lot of software, and don't get me started on the APIs... but this is just a case of Microsoft providing what most of its customers want.

        As mentioned above, the target audience for Excel was never bio-informatics... and yet they are still catered for because steps can be taken to import data correctly, instead of blindly pasting stuff around, not checking the results, then blaming the software for getting it wrong.

        There are lots of suggestions that Excel should have warned that it had converted data to a date... but nobody has considered how much they would be pissed off when Excel issued a warning every time they entered a date, or a number - that's just training your users to ignore warnings.

        There are lessons to be learned from this story, but blaming it all on Microsoft is choosing not to learn.

        link to this | view in chronology ]

        • icon
          John Fenderson (profile), 1 Aug 2014 @ 9:24am

          Re: Re: Re:

          "but this is just a case of Microsoft providing what most of its customers want."

          I'm aware of that, but catering to what most of its customers want means that they produce software which is dumbed down and allows for stupid errors to be easily overlooked. This effect is one of the reasons I avoid Microsoft software as much as I can.

          "blaming it all on Microsoft is choosing not to learn."

          Don't misunderstand me -- I'm not blaming it all on Microsoft at all. This type of problem is hard to resolve in a way that pleases everyone. I'm just saying that Microsoft's approach is one that consistently fails to meet my needs.

          link to this | view in chronology ]

        • identicon
          Anonymous Coward, 4 Aug 2014 @ 5:22am

          Re: Re: Re:

          Warning that it has performed inconsistent conversions on a column would be useful - it is very likely to mean that either it has munged something that happened to look date-like or number-like (phone numbers, SKU codes, etc.), or there was malformed data in the column which you'll need to fix.

          link to this | view in chronology ]

  • icon
    Woadan (profile), 31 Jul 2014 @ 12:57am

    Why is a major undertaking like what BMC Bioinformatics was doing not being entered into a database? Use the right tool for the job, and Excel is not the right tool for this job.

    link to this | view in chronology ]

    • icon
      MrTroy (profile), 31 Jul 2014 @ 1:07am

      Re:

      Why do you think Excel isn't a database? And even if it isn't a sufficiently generic database, there's no guarantee that the front-end for any other custom-built database won't have any data entry issues.

      Why do you even think you know what the requirements for the job are? Presumably being able to share the document around is one of the requirements, and databases aren't usually very good at that.

      Also, from the article: For example, this type of error occurs several times in even the immaculately curated LocusLink database.... use of spreadsheets doesn't preclude the use of databases.

      link to this | view in chronology ]

      • identicon
        Donglebert The Needlessly Unready, 31 Jul 2014 @ 3:01am

        Re: Re:

        True, there's no guarantee. But anyone writing a database has to define the field data types, whereas Excel by default makes assumptions for you.

        And no, Excel is not a database application. It's a spreadsheet application that can do some database type work.

        Excel could be used for this sort of data work, but Woadan is entirely correct that a database would be the better solution.

        link to this | view in chronology ]

        • identicon
          Michael, 31 Jul 2014 @ 5:56am

          Re: Re: Re:

          Database:
          a collection of pieces of information that is organized and used on a computer


          Excel most certainly falls into the category of database applications - although I would argue that Excel is a database, a user interface, a calculation engine, and programming platform, a reporting tool, and an analysis toolset.

          But anyone writing a database has to define the field data types, whereas Excel by default makes assumptions for you.

          Absolutely not true. I have worked with databases in the past that were entirely made up of un-typed data, lots of them I have worked with have effectively been defined as nothing but character data or blobs, and Excel makes default assumptions, but they are only defaults and you can override them very easily and it stores everything as typed data.

          Woadan is entirely correct that a database would be the better solution

          This is the kind of thing that gets me furious. Without any specifications, you cannot possibly make this determination. For all we know, the data we are talking about is a single table with 10 columns and 300 rows. Excel is a great solution for that. Frankly, a "database" would be pretty useless for this "problem" as it was not the data storage engine that set the type and reformatted the values - that was the user interface.

          The best solution here would be a user that knows how to use whatever software they were using to import data - and setting the type properly when they did it.

          link to this | view in chronology ]

          • identicon
            Anonymous Coward, 31 Jul 2014 @ 6:05am

            Re: Re: Re: Re:

            Technically, a flat file can be called a database.

            link to this | view in chronology ]

          • identicon
            Donglebert The Needlessly Unready, 31 Jul 2014 @ 6:51am

            Re: Re: Re: Re:

            I didn't say that Excel couldn't be a database, I said it was not explicitly a database application. That's why Microsoft also produce Access - which is a database app, and SQL server, which is a rdms.

            Excel is a spreadsheet app. Yes, like some database apps, it holds data in a tabular format. It also typically displays data in a tabular format, which most databases don't do (unless you run queries - an area where Excel is also weak (but, then, it's a spreadsheet app, not a db app). It can't cope with significantly large amounts of data. In day to day use it relies on the same formulas being duplicated again and again with different inputs. It doesn't create indexes effectively.

            A pen and paper can be a database too. So can Word. So can Photoshop. That doesn't mean they're database apps.

            And, if you come across databases where a field that requires a specific datatype has not been defined as that data type, then the developer didn't know what they were doing or the spec they were working from was flawed.

            And we're talking genome data here. It's not unreasonable to assume that the dataset was very large. Otherwise, this story wouldn't be a story.

            link to this | view in chronology ]

  • icon
    Richard (profile), 31 Jul 2014 @ 2:08am

    Once upon a time

    Once upon a time there was a spreadsheet that was different. The formulae were clearly visible (they weren't hidden in the cells). The data items had names - not cell addresses. In short it was to existing spreadsheets as high level languages are to assembly language.

    I bought a copy (one of my very few software purchases). Twenty-one years later I still use it. It is still hugely better than excel and its clones. It was (is) called Improv.

    The fact that it did not take over is a tragedy of market failure. The fact that I have increasing difficulty keeping it going (confined now to a virtual xp box) is a tragedy of copyright.

    link to this | view in chronology ]

    • icon
      MrTroy (profile), 31 Jul 2014 @ 2:18am

      Re: Once upon a time

      I feel your pain, but things aren't so bad in the rest of the world now. I know they were just two examples, but you can specify names for cells and cell ranges in most spreadsheeting software now, and at least Excel allows showing formulas instead of results across all cells. It's entirely possible that you could use Excel (or a different competing product) in the same way that you're used to using Improv.

      link to this | view in chronology ]

    • identicon
      Donglebert The Needlessly Unready, 31 Jul 2014 @ 3:08am

      Re: Once upon a time

      Yup, Improv was great. More of a db data modeller posing as a spreadsheet.

      link to this | view in chronology ]

    • identicon
      Kyle, 31 Jul 2014 @ 7:37am

      Re: Once upon a time

      Not all is lost, it can still be got from here:

      http://www.vetusware.com/download/Lotus%20Improv%202.1/?id=5797

      link to this | view in chronology ]

    • icon
      JP Jones (profile), 31 Jul 2014 @ 7:09pm

      Re: Once upon a time

      Um, Excel can do all of that. You can show formulas instead of results, you can assign cells and ranges specific names, and has an actual (if basic) programming language behind its macros.

      The thing is that Excel is designed for simple stuff. It can do more complex things but it isn't really designed for it. Excel is great for small forms with a couple of automatically formatted outputs, or a home budget, or keeping track of documents in a small office. It is not designed for advanced statistical analysis of gene structures.

      Office is designed with several specialized tools for specific purposes rather than one swiss army knife that tries to do everything. You can make a presentation in Word, a simple spreadsheet in Access, and write a report in Powerpoint. It works. It just doesn't do it well.

      There was an easy way to avoid this, which a simple Google search would have revealed...click the upper left box to select the entire sheet, and change the drop down box from "General" to "Text." Problem solved.

      I personally use Excel extensively in my own job and am very familiar with both its strengths and its limitations. Is it perfect? Heck, no...I'd love for the ability to write formulas in multiple lines, sort of like how VBA is formatted, to help keep track of coding in a complex sheet when macros aren't an option (usually due to business rules). But overall the quick formatting of data is useful for prototyping and reformatting data, creating useful rosters and graphs, and other simple tasks when your boss asks for a quick analysis of something.

      This is like complaining that using the flat side of a power drill to hammer a nail isn't very easy. Why not use a hammer? Well, I want to use my power drill, screw hammers.

      Sorry. It's not the drill's fault you don't know how to use a hammer.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 2:43am

    EXTRA EXTRA!!

    Journal publisher states the obvious. TD runs it.

    Get your copy now!

    link to this | view in chronology ]

    • identicon
      Anonymous Coward, 31 Jul 2014 @ 6:08am

      Re:

      Sounds like the voice of experience ... must've got butt hurt in a big meeting with pie chart enthusiasts.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 3:05am

    Old news, maybe?

    BMC Bioinformatics 2004, 5:80 doi:10.1186/1471-2105-5-80

    The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/5/80

    Received: 5 March 2004
    Accepted: 23 June 2004
    Published: 23 June 2004

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 5:03am

    Maybe someone from the open source community needs to create a spreadsheet system for just data analysis it would make sense that a hammer is a hammer until you need a wrench , keep things very simple , I don't open VLC to edit photos I use GIMP.

    link to this | view in chronology ]

    • identicon
      Kyle, 31 Jul 2014 @ 7:48am

      Re:

      A good drop in replacement for Excel and Access can be found at

      https://www.libreoffice.org/

      But as a number of posters above have said, nothing beats inputting the correct information and, just as important, make sure cell-formatting is done correctly.

      I suppose any spreadsheet / database program will fall flat if these two criteria aren't met.

      I remember the good old days of DBASE III. Currently, they can be found at

      http://www.dbase.com/

      and they are sitting at version 9

      Just my - stating the obvious - 2c worth. :-)

      link to this | view in chronology ]

  • icon
    Zeissmann (profile), 31 Jul 2014 @ 5:17am

    Idiocy

    If anything, this is a remainder that using spreadsheets for stuff they weren't meant to be used for (like maintaining databases) is a sign of idiocy. Especially if the spreadsheet is buggy and heavy on control-free spellchecking.

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 6:06am

    This is a perfect example of Microsoft's attitude of thinking they know what's best for the user.

    this is exactly the situation, and it's not just microsoft but pretty much all software are certain they know infinitely more about what the user should be doing than the user. for instance, doubling spacing after a sentence-ending period. most websites refuse that even though at times it really aids readability. momma knows what's right for you.

    another real pain to me is headers when sorting. about half the time excel declares the top cell to be a header and doesn't include it in the sort. if you aren't vigilant about checking, you may get an error there.

    super annoying to have to do handstands to use the stupid software. if that auto-dating feature (which annoys me, too) causes a serious error, i hope the offended party sues the shit out of microsoft.

    link to this | view in chronology ]

  • icon
    RadioactiveSmurf (profile), 31 Jul 2014 @ 6:22am

    This is why I hate Microsofts "helpful" features. If I wanted December 1st I would type that. I know what I'm doing and it takes twice as long to go back through and correct the mistakes that Microsoft created for me.

    link to this | view in chronology ]

    • icon
      Avatar28 (profile), 31 Jul 2014 @ 8:20am

      Re:

      And for every person like you there are probably 10 that find that sort of feature helpful. Just like when Word first went WYSIWYG. A few people bitched about it. They wanted their old display where it just showed the document with a markup-style layout instead of WYSIWYG. Most people wanted the latter and that's what we get now.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 6:43am

    Maybe they should have checked through the formatting on the spreadsheet. Maybe they should have used a different statistical analysis software. Maybe they shouldn't have assume that using a basic consumer level program in a basic manner would be fine for statistical analysis.

    We knew better than this in 2nd year stats classes in college. In 3rd year we were using alternative stat programs with more built in functionality. This isn't a failure of Microsoft. This is people not being careful, not doing work how they were supposed to, being cheap, and trying to find a scapegoat for their sloppy work.

    link to this | view in chronology ]

  • icon
    Sheogorath (profile), 31 Jul 2014 @ 7:13am

    A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates.

    Okay, which idiot didn't turn this 'default' feature off?

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 9:34am

    both the date-whether-you-need-it-or-not issue and the imposition-of-headers-sometimes-for-sorting-when-you-don't-have-them issue are really annoying and force the user to check what the stupid software is doing rather than concentrating on his or her task.

    bad design really is bad design.

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 11:36am

    In general, the less options you present to the user, the better.

    some years ago i was employed using the design software catia v4, which was very amendable to make the environment the most useful to a designer that it could be.

    one day a fairly new user complained to me that he wished there weren't so many things that could be adjusted. i told him i agreed exactly and that all machines should be set up the way i like mine.

    link to this | view in chronology ]

    • identicon
      Michael, 31 Jul 2014 @ 12:02pm

      Re:

      he wished there weren't so many things that could be adjusted

      Damned if you do, damned if you don't. In my experience it has always been that each user has their handful of things that should be configurable and "everything else is just clutter that should be removed". And guess what, it was a different handful of things for each user.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 11:54am

    The problem is that Excel converts the *value* of the cell to the numeric equivalent that represents the date, and also converts the cell format to Date. So if you format as text, you fix one problem, but not the other. "DEC1" becomes 41974, for example.

    yes, but it's even worse than that. if excel can't discern a year, it will insert this year into that date. so the spreadsheet you make that error on this year will equate that cell to 41974, but if you go back to a 2013 table, it won't be that number for dec1.

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 12:36pm

    michael, i hope you realize that my comment #69 and your comment #73 are essentially saying the same thing.

    i'm pleased, of course, when someone adds heft to my argument, but i always hope that someone realizes it.

    link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 5:30pm

    It works fine IF you format the columns BEFORE the import.

    no, it doesn't.  i just tried it.

    now, it does if i set the format of a cell or group of cells to be TEXT and enter the value into the one-line input field up top or copy/paste from a text document one item of information into that field.

    problem is if i copy/paste a large amount of info, say 30 football players' info, from a document, i can't put it all in that one-line entry field.  i have to pick a location in the sheet and dump all that info with respect to the place i picked to put it.  when i do that it changes the format of those TEXT cells to CUSTOM and changes those 5-10 football players' heights to 10-May.  if i force that format back to either GENERAL or TEXT, i get 41769.

    it doesn't work unless i handpick some cells to format and transfer the information one item at a time and be careful to select the correct cell and dump the info into that upper entry field. ONE HEIGHT AT A TIME.

    let me spell it: S U C K S.

    i tried mass dumping three players' heights and weights into that one-line upper entry field, and it took it, but put all those heights and weights of three players into that one cell, not an array of cells like i need.

    like i say, S U C K S.

    link to this | view in chronology ]

    • icon
      nasch (profile), 1 Aug 2014 @ 8:00am

      Re:

      no, it doesn't. i just tried it.

      now, it does if i set the format of a cell or group of cells to be TEXT and enter the value into the one-line input field up top or copy/paste from a text document one item of information into that field.


      I don't have Excel to test it, but are you only testing with copy/paste, or are you importing from a file? Because the latter has all kinds of options that the former doesn't.

      link to this | view in chronology ]

  • identicon
    Anonymous Coward, 31 Jul 2014 @ 5:58pm

    i found another way the transfer works fine.

    if i have info in a spreadsheet carefully formatted so that anything that excel would misread as a date is handled as TEXT, then i can select a bunch of cells, copy/paste that info anywhere in another spreadsheet, and it will reformat the cells to be TEXT in the new spreadsheet as needed.  i don't have to pre-format those cells.  so that's not too bad.

    it evens transfers the justification changes i had made in the original spreadsheet.  in other words, it brings the formatting with the data.

    transfers from any other source than another excel spreadsheet probably have to be very carefully handled, but if i painstakingly set up and excel spreadsheet, i can transfer without losing formatting.

    link to this | view in chronology ]

  • identicon
    Zonker, 1 Aug 2014 @ 5:25pm

    It looks like you're trying to insert some dates into your spreadsheet.

    Would you like help?

    * Convert all text into date format.

    * Leave all my gene names alone, you fool!

    * Yes, I desperately in need of a date! Do you have any attractive single friends who have low enough standards to go on a date with me? (Forever alone!)

    [ ] Don't show me this tip again

    link to this | view in chronology ]


Follow Techdirt
Essential Reading
Techdirt Deals
Report this ad  |  Hide Techdirt ads
Techdirt Insider Discord

The latest chatter on the Techdirt Insider Discord channel...

Loading...
Recent Stories

This site, like most other sites on the web, uses cookies. For more information, see our privacy policy. Got it
Close

Email This

This feature is only available to registered users. Register or sign in to use it.