How to do MID, RIGHT an LEFT in R

MID(), LEFT() and RIGHT() make it easy to extract parts of strings in Excel. Let’s see how we apply those in R.

Using substr()

Let’s start with a simple example in Excel:

MID RIGHT LEFT in Excel

Base R does not have exact equivalents to these functions. Instead, there is substr(), which generically extracts substrings from strings. Let’s replicate the above image in R:

a = "Chicago"
b = "New York City"
c = "Los Angeles"

# LEFT
substr(a, 1, 3)

# MID
substr(b, 5, 8)

# RIGHT
substr(c, 5, 11)
[1] "Chi"
[1] "York"
[1] "Angeles"

The substr() function is structured as substr(x, start, stop) where start and stop are the first and last characters to be extracted, respectively.

Creating custom functions

Chapter 15 of the book shows how to create custom functions. We would write custom functions if we plan to execute these commands regularly. Or if we just want the comfort of a more familiar function. Here’s how I would do it:

left = function(text, num_char) {
  substr(text, 1, num_char)
}

mid = function(text, start_num, num_char) {
  substr(text, start_num, start_num + num_char - 1)
}

right = function(text, num_char) {
  substr(text, nchar(text) - (num_char-1), nchar(text))
}

left(a, 3)
mid(b, 5, 4)
right(c, 7)
[1] "Chi"
[1] "York"
[1] "Angeles"

The structure of these three functions exactly mimics the Excel formulas. right(), for example, now just takes text and num_char as input, just like Excel’s RIGHT(). We did this by hacking the input to the substr() function. Taking the Los Angeles example, right() ends up evaluating to:

substr("Los Angeles", 11 - (7 - 1), 11)

==> substr("Los Angeles", 5, 11)

Which is exactly what we did in the first example.

More Advanced string extraction

The above examples are just scratching the surface of what you can do with string extraction in R. For example:

Separating first and last names
Separate first and last name in R

This is not bad in Excel, but what if someone puts in their middle name or initial? It would get more complicated if you want to write a single formula to handle all the rows.

Or how about this one:

Extracting the query string from a URL
Extract query string in R

To just get the query string (the part where q=…), the first three are not too hard in Excel. But then we get a curveball with the &another_parameter, which we don’t want to grab.

Regular expressions come into play, and we will cover this in a later post.

Leave a Reply

Be the First to Comment!

avatar

wpDiscuz