Believe it or not, Excel is still my go-to analysis tool a lot of times, because it’s great at what it does. I’m a shortcut fiend, so I can do things pretty quickly. So when do I opt for R? People have asked me this many times. Here is my unofficial checklist I loop through in my head to decide whether to Excel or not to Excel:

- Is the data **not well structured** or PivotTable-ready? Does it have a **lot of stuff within cells** that needs to get broken out?

  If yes, **R**, unless I can work my Excel magic to clean it up.

- Is this a **quick and dirty one-time analysis**? Including quick visuals.

  If yes, then **Excel**, as long as the data is not gigantic.

- Do I need anything **beyond basic statistical analysis**? Regression, clustering, text mining, time series analysis, etc.?

  If yes, then **R**. No contest.

- Do I have to **crunch a few disparate datasets** to do my work?

  Depends on complexity. If the data sets are small and a simple VLOOKUP can handle it, then **Excel**. If more than three tables, most likely **R**. If VLOOKUP'ing more than 1-2 columns from each table, also **R**.

- Is it something I will want to **share in a web-based, interactive format** that is nice to look at?

  **R**, with the Shiny framework.

- **Unique and beautiful visuals** the world has rarely seen?

  **R**.
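To make the multi-table point concrete, here is a minimal sketch in base R of the "crunch a few disparate datasets" case. The tables and column names are made up for illustration; each join that would be a block of VLOOKUPs in Excel is one `merge()` call here:

```r
# Hypothetical tables standing in for disparate datasets.
orders    <- data.frame(id = 1:3, customer = c("A", "B", "A"))
customers <- data.frame(customer = c("A", "B"), region = c("East", "West"))
returns   <- data.frame(id = c(2, 3), returned = c(TRUE, TRUE))

# One merge() per lookup table; all.x = TRUE makes the second a left join,
# so orders with no matching return are kept (with NA in `returned`).
joined <- merge(orders, customers, by = "customer")
joined <- merge(joined, returns, by = "id", all.x = TRUE)
```

With three or more tables, chaining joins like this stays readable, which is roughly where the checklist above tips from Excel to R.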

While I was learning R, I used a hybrid approach … doing the heavy-lifting data prep work in R, then using the write.csv() function to send my data frames back to Excel for visuals and basic analysis. Over time, I have learned to do more complete analysis in R, from beginning to end.
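That hybrid workflow looks something like the following sketch; the data frame and file name are invented for illustration:

```r
# Do the data-prep heavy lifting in R...
df <- data.frame(month = month.abb[1:3], sales = c(120, 95, 143))
df$sales_share <- df$sales / sum(df$sales)

# ...then hand the result back to Excel as a CSV for visuals and basic analysis.
write.csv(df, "monthly_sales.csv", row.names = FALSE)
```

Opening `monthly_sales.csv` in Excel picks up right where R left off.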

I hope this helps! What scenarios did I miss?


## 9 Comments on "Excel vs R: When to use what"

Nice post, John.

I use a lot of Excel when I have to present scenarios and change a "final" table in front of clients. Looking at the data is easier in Excel than in R, but of course I'm referring to a "final, condensed, aggregated" dataset (< ~30,000 rows, I guess?).

Knowing how to conduct a PCA analysis in R does not mean you understand the dataset and can play with human-generated scenarios on it in a spreadsheet.

In that sense, R is overrated. There's a reason the industry still relies heavily on spreadsheets. They are easy. They support the creation of scenarios and conversation about the data.

R programming will be gibberish in front of a meeting room with the CEO, unless you've spent days or weeks working on a Shiny dashboard. And even then you can be surprised by a simple request to "change parameter x, please."

What you have said is true about scenario analysis, but then R is meant for statisticians and written by statisticians. Using it for a simple thing like that is like using a thermonuclear bomb to kill a fly.

R truly shines when one actually has to use statistics to infer information, say the creation of neural networks, forecasting, or the like.

Also, when working in a file with more than 1,000 line items, where a bird's-eye view is necessary, R works well for me (this is my personal opinion, and given that I am proficient in R, it is biased).

Thanks for the input, guys. You’re both right. I will never argue against use of spreadsheets because I still use them on a daily basis, more than R, in fact.

Over time, each person figures out what works for them and when. To this day, I still pull out R as my secret weapon at work. The important thing is to diversify your skill set so you can build the confidence to think that nothing is impossible (at least as far as data work goes).

It’s apples and oranges. Spreadsheets are for displaying small data sets and doing simple calculations on them. That’s where Excel and the like shine. For advanced statistical analysis on larger data sets you need R. There is a small intersection between the two (e.g., joining tables), but they have very different purposes. R is not suitable for "changing a 'final' table in front of clients" because advanced statistical analysis needs more time and concentration than a meeting with clients could provide.

Thanks, your opinion is the same as mine. I use Excel and R together to analyze data: Excel helps me with data editing, and R with the analysis. In fact, R and Excel are just tools; the most important thing for data analysis is the mindset and a suitable theory to tell the audience a story based on boring data, don't you think?

Excellent points! I’ve been in that situation enough times in my career — dealing with other people’s highly convoluted Excel spreadsheets that are really sensitive to the most trivial changes. The linearity of code ensures that you can read it from top to bottom to follow the logic.

I should add that today there is a much higher limit on the size of the dataset you can work with in Excel using Power Query and/or Power Pivot. I’ve worked with tens of millions of rows of data in Power Query without noticeable performance issues. I’ve also created "live" connections to databases containing billions of records (Hadoop, ERP systems, etc.).

Anyone who thinks they can’t work with large datasets in Excel is still stuck in the “old school” ways of thinking. Microsoft has made massive strides in this area in recent years, so if you’re not aware of these new Power tools you really should avail yourself of them to see how they can totally transform the way you work with data in Excel.