Let’s say we imported a .csv or .xlsx file into R, and it looks like this:
    person payment
1 Person 1  $56.11
2 Person 2  $20.42
3 Person 3 $104.20
4 Person 4 $201.21
5 Person 5   $5.06
Looks fine, until you try to do a calculation on the payment column:
Error in sum(df$payment) : invalid 'type' (character) of argument
So we run str(df) to check the table structure. Lo and behold, that column is not a numeric variable; it is character (chr):
'data.frame': 5 obs. of 2 variables:
 $ person : chr "Person 1" "Person 2" "Person 3" "Person 4" ...
 $ payment: chr "$56.11" "$20.42" "$104.20" "$201.21" ...
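To follow along without the original file, the data frame can be recreated by hand. This is only a minimal sketch: the values are copied from the printout above, and the `result` variable and the `stringsAsFactors` setting are our own additions (the latter keeps text columns as character on R versions before 4.0).

```r
# Recreate the example data frame by hand (values taken from the printout)
df <- data.frame(
  person  = paste("Person", 1:5),
  payment = c("$56.11", "$20.42", "$104.20", "$201.21", "$5.06"),
  stringsAsFactors = FALSE  # keep text columns as character on R < 4.0
)

# Check the type of every column at a glance
sapply(df, class)

# sum() refuses character input; capture the error message instead of stopping
result <- tryCatch(sum(df$payment), error = function(e) conditionMessage(e))
result
```

Wrapping the call in tryCatch() is just a convenience so the script keeps running; calling sum(df$payment) directly raises the same error.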
How to remove dollar signs from a column in R
One way to do it is with the gsub() function, in conjunction with as.numeric(). gsub() is used to substitute specific text in a string with other text, and as.numeric() can coerce a variable to numeric.
Let’s see it in action:
# replace $ with blank "" in the df$payment column,
# and coerce that result to numeric
df$payment_2 = as.numeric(gsub("\\$", "", df$payment))

# print the data frame just to eyeball it
df

# compute sum of payment_2 column
sum(df$payment_2)
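The gsub() function looks for any literal dollar sign (the pattern is written “\\$” in R code) and replaces it with “”. The backslashes are known as escape characters. They are needed because gsub() treats its first parameter as a “regular expression”, and since $ is reserved regular expression notation (it marks the end of a string), we must “escape” this reserved meaning and look for literal values of $ in the text strings. The backslash is doubled because R string literals also use the backslash as an escape character, so “\\$” in code reaches gsub() as \$.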
    person payment payment_2
1 Person 1  $56.11     56.11
2 Person 2  $20.42     20.42
3 Person 3 $104.20    104.20
4 Person 4 $201.21    201.21
5 Person 5   $5.06      5.06

[1] 387
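To see why the escape matters, it may help to compare the escaped and unescaped patterns side by side. The sketch below uses a small made-up vector `x`. It also shows `fixed = TRUE`, a standard gsub() argument that turns off regular expression matching altogether, as an alternative when only one literal character needs removing.

```r
x <- c("$56.11", "$20.42")

# Unescaped: "$" anchors the end of the string, so the dollar sign stays put
unescaped <- gsub("$", "", x)

# Escaped: "\\$" reaches gsub() as the regex \$, a literal dollar sign
escaped <- gsub("\\$", "", x)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
literal <- gsub("$", "", x, fixed = TRUE)

unescaped
escaped
literal
```

The unescaped call returns the strings unchanged, which is an easy mistake to miss because it produces no error.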
By the same token, we can replace commas and other currency-related notations that are being read as part of the string. We can do them individually, as we did above with the dollar sign, or we can specify any number of symbols to remove all at once. For example, to remove both the dollar sign and the comma, we use the following notation:
# replace $ and comma with blank "" in the df$payment column
df$payment_2 = as.numeric(gsub("[\\$,]", "", df$payment))
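Here we modified the gsub() pattern to use [\\$,]. The square brackets define a character class, which matches any single character listed inside it, so this pattern matches either a dollar sign or a comma. We could add any number of other symbols within the brackets that we wish to remove.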
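A quick way to check the bracket notation is to run it on values that actually contain commas. The vector `amounts` below is made up for illustration, including one entry with a "USD" prefix that the pattern does not cover. Inside brackets the dollar sign loses its special meaning, so the escape is optional there.

```r
# Made-up values for illustration; the last one has text the pattern misses
amounts <- c("$1,234.56", "$20.42", "USD 5.06")

# Inside a character class, "$" is literal, so "[$,]" also works
cleaned <- gsub("[$,]", "", amounts)

# as.numeric() yields NA (with a warning) for anything still non-numeric
values <- suppressWarnings(as.numeric(cleaned))
values

# List the original entries that failed to convert
amounts[is.na(values)]
```

Checking for NA values after the conversion is a cheap safeguard, since leftover symbols would otherwise silently drop out of later calculations that use na.rm = TRUE.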
Regular expressions are a whole massive topic unto themselves. Entire books are written about them. My favorite is Mastering Regular Expressions. And regexr.com is an excellent resource for learning and testing regular expressions on your text.
And don’t forget. We just used data frame columns for convenience here. But a column is simply a vector, so generically speaking, this approach can be used for any kind of vector.
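As a rough sketch of that idea, the cleanup can be wrapped in a small function and applied to any character vector. The name `parse_currency` and the `prices` vector are hypothetical, chosen only for this example.

```r
# Hypothetical helper: strip "$" and "," from a character vector, then coerce
parse_currency <- function(x) {
  as.numeric(gsub("[\\$,]", "", x))
}

# Works on a plain vector, no data frame required
prices <- c("$56.11", "$1,020.42", "$5.06")
parsed <- parse_currency(prices)
parsed

# The result is numeric, so arithmetic now works
total <- sum(parsed)
total
```

The same function could be used on a data frame column, for example parse_currency(df$payment).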