Excel vs R: When to use what

Believe it or not, Excel is still my go-to analysis tool a lot of times, because it’s great at what it does.  I’m a shortcut fiend, so I can do things pretty quickly. So when do I opt for R?  People have asked me this many times.  Here is my unofficial checklist I loop through in my head to decide whether to Excel or not to Excel:

    1. Is the data not well structured or PivotTable-ready?  Does it have a lot of stuff within cells that needs to get broken out?
      If yes, then R, unless I can work my Excel magic to clean it up.
    2. Is this a quick and dirty one-time analysis?  Including quick visuals.
      If Yes, then Excel, as long as the data is not gigantic.
    3. Do I need anything beyond basic statistical analysis?  Regression, clustering, text mining, time series analysis, etc
      If Yes, then R.  No contest.
    4. Do I have to crunch a few disparate datasets to do my work?
      Depends on complexity.  If data sets are small and a simple vlookup can handle it, then Excel.  If more than three tables, most likely R.  If more than 1-2 columns vlookup’ing from each table, also R.
    5. Something I will want to share in a web-based, interactive format that is nice to look at?
      R with the Shiny framework
    6. Unique and beautiful visuals the world has rarely seen?

While I was learning R, I used a hybrid approach … doing the heavy-lifting data prep work in R, then using the write.csv() function to send my data frames back to Excel for visuals and basic analysis.  Over time, I have learned to do more complete analysis in R, from beginning to end.

I hope this helps!  What scenarios did I miss?

Leave a Reply

1 Comment on "Excel vs R: When to use what"


2 months 7 days ago

Nice post, John.
I use a lot of Excel when I have to present scenarios and change a “final” table in front of clients. Looking at the data is easier in Excel than in R, but of course I’m referring to a “final, condensed, aggregated” dataset (< ~30000 I guess?).
Knowing how to conduct a PCA analysis in R does not mean you understand the dataset and can play with human-generated scenarios on it in a spreadsheet.
For that sense, R is overrated. There's a reason the industry still relies heavily on spreadsheets. They are easy. They support the creation of scenarios and conversation about the data.
R programming will be gibbreish in front of a meeting room with the CEO. Unless you spent days/weeks working on a Shiny dashboard. And still you can be surprised with a simple question on "change a parameter x, please".