Calculate percent of row in R

You want to express each value as a percent of its row total, as shown in the table below or as you would with a PivotTable:

Percent of Row in R

In R, whenever you need to work with row or column totals, the apply() function is often the way to go. Let’s say our data frame is named fruits, with the fruit names in the first column and the weekly numbers in the remaining columns. We start by calculating the totals column:

total_col = apply(fruits[,-1], 1, sum)

Result

[1] 69 50 33 17 67

The apply() function runs an operation (in this case sum) across all rows or all columns. Its second argument, MARGIN, controls the direction: 1 runs the operation across each row, and 2 would run it down each column. As you can see, the result is a vector of five numbers, one for each row, and it matches the totals column in the table above.
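The post doesn't print the fruits data frame itself, so the sketch below rebuilds it from the row totals and percentages shown further down (an assumption: your real data may have more rows or columns). It compares apply() with MARGIN = 1 against base R's rowSums(), which computes the same row totals, and shows MARGIN = 2 for column totals.

```r
# Assumed data: weekly numbers reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)

# MARGIN = 1: apply sum() to each row (every column except the fruit names)
total_col = apply(fruits[,-1], 1, sum)

# rowSums() is a built-in, vectorised shortcut for the same row totals
total_rowsums = rowSums(fruits[,-1])

# MARGIN = 2: apply sum() down each column instead
col_totals = apply(fruits[,-1], 2, sum)

print(unname(total_col))      # [1] 69 50 33 17 67
print(unname(total_rowsums))  # same values as above
print(col_totals)
```

For plain sums, rowSums() is usually the simpler choice; apply() earns its place when you need a function other than sum.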

Two ways to calculate percent of row in R

It’s good to know the manual way first, so you understand what an automated version has to do. The automated way is more scalable and less prone to mistakes:

Solution 1: Manual method

Here is one way using the standard dollar sign referencing:

fruits$week1_pct = fruits$week1 / total_col
fruits$week2_pct = fruits$week2 / total_col
fruits$week3_pct = fruits$week3 / total_col
fruits$week4_pct = fruits$week4 / total_col
fruits$week5_pct = fruits$week5 / total_col
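A quick way to confirm the manual columns are right is to check that each row’s percentages add back up to 1 (100%). This sketch repeats the manual lines on a reconstructed version of the fruits data (an assumption, since the original data isn’t printed in the post) and then runs that check.

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = apply(fruits[,-1], 1, sum)

# The manual method from above
fruits$week1_pct = fruits$week1 / total_col
fruits$week2_pct = fruits$week2 / total_col
fruits$week3_pct = fruits$week3 / total_col
fruits$week4_pct = fruits$week4 / total_col
fruits$week5_pct = fruits$week5 / total_col

# Sanity check: the *_pct columns in each row should sum to 1
pct_cols = grep("_pct$", names(fruits), value = TRUE)
row_check = rowSums(fruits[, pct_cols])
print(all.equal(unname(row_check), rep(1, nrow(fruits))))  # [1] TRUE
```

all.equal() is used instead of == because the divisions produce floating-point values that may be off by a tiny rounding error.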

An alternative is to use a package like dplyr, but the benefit here is fairly small: the code is about as verbose, though it might run faster on larger data sets.
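Another option, which stays in base R and is not the method this post uses, is to divide the whole numeric part of the data frame by the totals in one step, or to call prop.table() with margin = 1. The sketch below shows both on reconstructed fruits data (assumed from the numbers in this post).

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = rowSums(fruits[,-1])

# Dividing a data frame by a vector whose length equals the number of rows
# recycles it down each column, so every value in row i is divided by total_col[i]
pcts_div = fruits[,-1] / total_col

# prop.table() with margin = 1 returns row proportions of a matrix
pcts_prop = prop.table(as.matrix(fruits[,-1]), margin = 1)

print(round(pcts_prop, 4))
```

The recycling trick only lines up correctly when the totals vector has exactly one entry per row, which is always true for row totals computed from the same data frame.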

Solution 2: Automated method using sapply or lapply

First we loop through each numeric column (all except the first one) and, in each iteration, divide that column by the total_col vector:

pcts = lapply(fruits[,-1], function(x) {
  x / total_col
})

The result is a list of five vectors, one for each column calculation:

$week1
[1] 0.27536232 0.10000000 0.03030303 0.05882353 0.20895522

$week2
[1] 0.2753623 0.0200000 0.4242424 0.3529412 0.1641791

$week3
[1] 0.1304348 0.2600000 0.1212121 0.1764706 0.2686567

$week4
[1] 0.2173913 0.2600000 0.2727273 0.2941176 0.2537313

$week5
[1] 0.1014493 0.3600000 0.1515152 0.1176471 0.1044776
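The heading also mentions sapply(). Because every column produces a vector of the same length, sapply() simplifies the results into a matrix rather than a list. This sketch (on reconstructed fruits data, an assumption) shows that, and builds the data frame with the fruit column first.

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = apply(fruits[,-1], 1, sum)

# sapply() simplifies the five equal-length results into a 5 x 5 matrix,
# with the week names as column names
pct_mat = sapply(fruits[,-1], function(x) x / total_col)

# Building the data frame this way puts the fruit column first
pcts = data.frame(fruit = fruits$fruit, pct_mat)
print(pcts)
```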

We could also have used a for() loop, but that takes more code for the same result.
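For comparison, here is a sketch of the for() loop version, again on reconstructed fruits data (an assumption). It produces the same proportions as the lapply() call, but you have to set up the result and assign each column yourself.

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = apply(fruits[,-1], 1, sum)

pcts_loop = fruits["fruit"]   # start with a one-column data frame of fruit names
for (col in names(fruits)[-1]) {
  # divide this week's column by the row totals and store it under the same name
  pcts_loop[[col]] = fruits[[col]] / total_col
}
print(pcts_loop)
```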

Next we convert the list to a data frame (a data frame is essentially a list of column vectors of equal length) and append the fruit column:

pcts = as.data.frame(pcts)
pcts$fruit = fruits$fruit

We could also have used cbind() to join the original fruits data frame with the lapply() output, giving both sets of numbers in one combined table.
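A minimal sketch of that cbind() approach, on reconstructed fruits data (an assumption). The _pct suffix is an addition of this sketch, not part of the post’s code; it keeps the combined table from having two columns named week1, two named week2, and so on.

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = apply(fruits[,-1], 1, sum)

pcts = as.data.frame(lapply(fruits[,-1], function(x) x / total_col))

# Rename the percentage columns so they don't clash with the originals
names(pcts) = paste0(names(pcts), "_pct")

# Counts and percentages side by side in one table
combined = cbind(fruits, pcts)
print(names(combined))
```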

Both methods above result in a table like this:

        fruit      week1     week2     week3     week4     week5
1      Apples 0.27536232 0.2753623 0.1304348 0.2173913 0.1014493
2     Bananas 0.10000000 0.0200000 0.2600000 0.2600000 0.3600000
3     Oranges 0.03030303 0.4242424 0.1212121 0.2727273 0.1515152
4     Mangoes 0.05882353 0.3529412 0.1764706 0.2941176 0.1176471
5  Pineapples 0.20895522 0.1641791 0.2686567 0.2537313 0.1044776

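(Technically, if you follow the automated method exactly as written, the fruit column ends up at the far right. The manual method, for its part, adds the percentages as new columns named week1_pct to week5_pct alongside the original weekly columns.)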
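If you want the fruit column back at the front after the automated method, you can reorder the columns by name. The sketch below does that on reconstructed fruits data (an assumption) and then, optionally, turns the proportions into percentages rounded to one decimal place for display.

```r
# Assumed data: reconstructed from the totals and percentages in this post
fruits = data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col = apply(fruits[,-1], 1, sum)

pcts = as.data.frame(lapply(fruits[,-1], function(x) x / total_col))
pcts$fruit = fruits$fruit                                  # lands in the last column
pcts = pcts[, c("fruit", setdiff(names(pcts), "fruit"))]   # move it to the front

# Optional: show proportions as percentages, rounded to one decimal place
pct_display = pcts
pct_display[-1] = lapply(pcts[-1], function(x) round(100 * x, 1))
print(pct_display)
```

Keeping the unrounded pcts table for any further calculations and using the rounded copy only for display avoids compounding rounding errors.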
