`MID()`

, `LEFT()`

and `RIGHT()`

make it easy to extract parts of strings in Excel. Let’s see how we apply those in R.

### Using `substr()`

Let’s start with a simple example in Excel:

Base R does not have exact equivalents to these functions. Instead, there is `substr()`

, which generically extracts substrings from strings. Let’s replicate the above image in R:

a = "Chicago" b = "New York City" c = "Los Angeles" # LEFT substr(a, 1, 3) # MID substr(b, 5, 8) # RIGHT substr(c, 5, 11)

[1] "Chi" [1] "York" [1] "Angeles"

The `substr()`

function is structured as `substr(x, start, stop)`

where *start* and *stop* are the first and last characters to be extracted, respectively.

### Creating custom functions

Chapter 15 of the book shows how to create custom functions. We would write custom functions if we plan to execute these commands regularly. Or if we just want the comfort of a more familiar function. Here’s how I would do it:

left = function(text, num_char) { substr(text, 1, num_char) } mid = function(text, start_num, num_char) { substr(text, start_num, start_num + num_char - 1) } right = function(text, num_char) { substr(text, nchar(text) - (num_char-1), nchar(text)) } left(a, 3) mid(b, 5, 4) right(c, 7)

[1] "Chi" [1] "York" [1] "Angeles"

The structure of these three functions exactly mimics the Excel formulas. `right()`

, for example, now just takes *text* and *num_char* as input, just like Excel’s `RIGHT()`

. We did this by hacking the input to the `substr()`

function. Taking the Los Angeles example, `right()`

ends up evaluating to:

`substr("Los Angeles", 11 - (7 - 1), 11)`

==> `substr("Los Angeles", 5, 11)`

Which is exactly what we did in the first example.

### More Advanced string extraction

The above examples are just scratching the surface of what you can do with string extraction in R. For example:

**Separating first and last names**

This is not bad in Excel, but what if someone puts in their middle name or initial? It would get more complicated if you want to write a single formula to handle all the rows.

Or how about this one:

**Extracting the query string from a URL**

To just get the query string (the part where q=…), the first three are not too hard in Excel. But then we get a curveball with the *&another_parameter*, which we don’t want to grab.

Regular expressions come into play, and we will cover this in a later post.

## Leave a Reply

Be the First to Comment!