You're not a stats genius
Neither am I. I get it.
You just want to up your skills and work on cooler projects.
But learning R and data science is a bit overwhelming.
Recent blog posts
You want to identify the nth largest or smallest item in a group using R. For example, to filter out the two rows in the table below: Any time there is some by-group processing, I almost always stick with the dplyr library because of its so-called window operations. Below are a few techniques: Let’s say[…]
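As a rough sketch of the window-function idea, assuming dplyr is installed: the `sales` table and its column names are made up for illustration and are not taken from the post. `min_rank(desc(...))` ranks values within each group, so filtering on rank 2 keeps the second-largest row per group.

```r
library(dplyr)

# Hypothetical data: three sales amounts per region
sales <- data.frame(
  region = c("East", "East", "East", "West", "West", "West"),
  amount = c(100, 250, 175, 300, 120, 210)
)

# Rank amounts in descending order within each region,
# then keep the row ranked second (the 2nd largest)
second_largest <- sales %>%
  group_by(region) %>%
  filter(min_rank(desc(amount)) == 2) %>%
  ungroup()

print(second_largest)
```

Swapping `desc(amount)` for `amount` would pick the second-smallest value instead.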
You want to calculate percent of column in R as shown in this example, or as you would in a PivotTable: Here are two ways: (1) using Base R, (2) using dplyr library. If you are dealing with many cases at once, you can also go with method (3) automating with a loop. Let’s say[…]
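A minimal Base R sketch of the percent-of-column idea, using a made-up `df` with hypothetical `item` and `sales` columns: divide each value by the column total.

```r
# Hypothetical data: sales per item
df <- data.frame(
  item  = c("A", "B", "C"),
  sales = c(50, 30, 20)
)

# Each row's share of the column total, as a percentage
df$pct <- df$sales / sum(df$sales) * 100

print(df)
```

`prop.table(df$sales) * 100` should give the same result if you prefer a built-in helper.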
[This post was also published on LinkedIn.] Today I met a colleague in the marketing team in Ireland. He is taking one of these popular online Data Analytics courses and needed help with a cluster analysis assignment. He was stuck, so I offered to take a look. The exercise involved clustering insurance customers in terms[…]
Let’s say we imported a .csv or .xlsx file into R, and it’s like this: Looks fine, until you try to do a calculation on the payment column: So we run str(df) to check the table structure. Lo and behold, that column is not a numeric variable, it is character (chr): How to remove the[…]
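The excerpt is cut off before it says what needs removing, so the following is only a sketch under an assumption: that the `payment` column was read in as text because it contains dollar signs and thousands separators. The data here is invented for illustration.

```r
# Hypothetical data: payment imported as character because of "$" and ","
df <- data.frame(
  payment = c("$1,200.50", "$300", "$45.25"),
  stringsAsFactors = FALSE
)

str(df)  # payment shows up as chr, so sum(df$payment) would fail

# Assumption: the offending characters are "$" and ","
# Strip them, then convert what is left to numeric
df$payment_num <- as.numeric(gsub("[$,]", "", df$payment))

total <- sum(df$payment_num)
print(total)
```

If your column holds different stray characters, adjust the pattern inside `gsub()` accordingly.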
MID(), LEFT() and RIGHT() make it easy to extract parts of strings in Excel. Let’s see how we apply those in R. Using substr() Let’s start with a simple example in Excel: Base R does not have exact equivalents to these functions. Instead, there is substr(), which generically extracts substrings from strings. Let’s replicate the[…]
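Here is a small sketch of how `substr()` can stand in for the three Excel functions. The sample string and variable names are my own, not the post's. `substr()` takes a start and a stop position, so RIGHT() needs `nchar()` to work out where to begin.

```r
x <- "Hello World"

# LEFT(x, 5): first five characters
left_part <- substr(x, 1, 5)

# MID(x, 7, 3): three characters starting at position 7
mid_part <- substr(x, 7, 7 + 3 - 1)

# RIGHT(x, 5): last five characters
right_part <- substr(x, nchar(x) - 5 + 1, nchar(x))

print(c(left_part, mid_part, right_part))
```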
Below is a short list of basic, commonly-used Excel formulas and their R counterparts. The R functions are part of Base R, in that they do not require third-party packages. This is not an exhaustive list by any stretch. Basic Arithmetic Functions Excel formula R function SUM sum AVERAGE mean COUNTA length MAX max (sometimes[…]
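To make the first few pairings concrete, here is a quick sketch on a made-up vector with no missing values. With `NA`s present, the results may differ from Excel's unless you handle them explicitly, for example with `na.rm = TRUE`.

```r
# Hypothetical column of numbers
values <- c(12, 7, 25, 3)

total   <- sum(values)     # Excel: SUM
average <- mean(values)    # Excel: AVERAGE
n_items <- length(values)  # Excel: COUNTA
largest <- max(values)     # Excel: MAX

print(c(total = total, average = average, n = n_items, max = largest))
```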
This is not an Excel feature, although it should be! Often we have a list of things in a column that needs to be combined into, say, a comma-delimited string. Column to Text in R… Column to Text in Excel That column looks like this: Typically I write a quick formula that cumulatively concatenates[…]
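On the R side, one common way to do this (a sketch, not necessarily the approach the full post takes) is `paste()` with the `collapse` argument. The `fruits` vector is invented for illustration.

```r
# Hypothetical column of values
fruits <- c("apple", "banana", "cherry")

# collapse joins all elements into one string with the given separator
combined <- paste(fruits, collapse = ", ")

print(combined)
```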
The apply family of functions takes the prize for being the most useful yet most confusing and unintuitive (at least initially). Here I hope to demystify this wonderful set of tools. In short, this set of functions is useful when we need to repeat something over a set of values (list values, vector, dataframe columns,[…]
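A few one-liners give a feel for the family. This is a minimal sketch on a made-up `scores` data frame: `sapply()` and `lapply()` walk over elements (columns, for a data frame), while `apply()` works across the rows or columns of a matrix-like object.

```r
# Hypothetical data: two test scores for three students
scores <- data.frame(
  math = c(80, 90, 70),
  art  = c(60, 75, 95)
)

# sapply: run mean() on each column, simplify the result to a vector
col_means <- sapply(scores, mean)

# apply with MARGIN = 1: run sum() across each row
row_totals <- apply(scores, 1, sum)

# lapply: run a function on each element, always returns a list
squares <- lapply(1:3, function(i) i^2)

print(col_means)
print(row_totals)
print(squares)
```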
It is good practice to keep a clean workspace by removing objects that are no longer being used. This is especially true if you have saved multiple large data frames in the course of your analysis. Let’s say you have a workspace with five objects stored, and it looks like this: The ls() function returns a[…]
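As a small sketch of the clean-up workflow, with hypothetical object names: `ls()` lists what is in the workspace and `rm()` removes objects by name.

```r
# Hypothetical objects left over from an analysis
keep_me_df     <- data.frame(x = 1:3)
scratch_values <- c(1, 2, 3)
scratch_total  <- sum(scratch_values)

ls()  # character vector of object names in the current environment

# Remove one object by name
rm(scratch_values)

# Remove several at once by passing their names as a character vector
rm(list = c("scratch_total"))

ls()
```

`rm(list = ls())` clears everything in the current environment, so use it with care.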