You want to calculate the percent of row as shown in the tables below, or as you would in a PivotTable:

In R, any time you have to deal with row or column totals in some way, the `apply()` function is often the way to go. Let’s say our data frame is named *fruits*. We start by calculating the totals column:

total_col = apply(fruits[,-1], 1, sum)

Result

[1] 69 50 33 17 67

The `apply()` function runs an operation (in this case `sum`) across all rows or all columns. Here we set the second argument to 1, which runs the operation across each row (2 would run it across each column). As you can see, the result is a vector of five numbers, one for each row, and it matches the totals column in the table above.
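To make the second argument concrete, here is a minimal sketch comparing the row and column cases. The *fruits* data frame is reconstructed from the totals and percentages shown in this post, so treat the raw counts as an assumption; `rowSums()` is shown as a base R shortcut for the row case.

```r
# Data reconstructed from the totals and percentages in this post (an assumption).
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)

# Second argument 1: run sum() across each row (one total per fruit).
row_totals <- apply(fruits[, -1], 1, sum)   # 69 50 33 17 67

# Second argument 2: run sum() down each column (one total per week).
col_totals <- apply(fruits[, -1], 2, sum)   # 40 51 47 59 39

# rowSums() is a built-in shortcut for the row case.
row_totals_fast <- rowSums(fruits[, -1])
```

For plain sums, `rowSums()` and `colSums()` are usually the simpler choice; `apply()` earns its keep when the operation is something other than a sum.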

### Two ways to calculate percent of row in R

It’s good to know the manual way so that you can grow into coding your own automated way. The automated way is more scalable and less prone to mistakes.

Here is one way using the standard dollar sign referencing:

    fruits$week1_pct = fruits$week1 / total_col
    fruits$week2_pct = fruits$week2 / total_col
    fruits$week3_pct = fruits$week3 / total_col
    fruits$week4_pct = fruits$week4 / total_col
    fruits$week5_pct = fruits$week5 / total_col

An alternative is to use a library like `dplyr`, but the benefit is fairly minimal and the code is equally verbose (though it might bring speed gains with larger data sets).

The automated way uses `sapply` or `lapply`; here we use `lapply`. First we loop through each numeric column (all except the first), and in each iteration divide that column by the *total_col* vector:

pcts = lapply(fruits[,-1], function(x) { x / total_col })

The result is a list of five vectors, one for each column calculation:

    $week1
    [1] 0.27536232 0.10000000 0.03030303 0.05882353 0.20895522

    $week2
    [1] 0.2753623 0.0200000 0.4242424 0.3529412 0.1641791

    $week3
    [1] 0.1304348 0.2600000 0.1212121 0.1764706 0.2686567

    $week4
    [1] 0.2173913 0.2600000 0.2727273 0.2941176 0.2537313

    $week5
    [1] 0.1014493 0.3600000 0.1515152 0.1176471 0.1044776

We could have also used a `for()` loop, but that is actually more work for you.
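For comparison, a `for()` loop version might look like the sketch below. The data frame is reconstructed from the numbers shown in this post (an assumption), and `pcts_loop` and `pcts_apply` are illustrative names only.

```r
# Data reconstructed from the totals and percentages in this post (an assumption).
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# Loop version: create the list yourself and fill one element per numeric column.
pcts_loop <- list()
for (col in names(fruits)[-1]) {
  pcts_loop[[col]] <- fruits[[col]] / total_col
}

# lapply() version: the list is built and named for you.
pcts_apply <- lapply(fruits[, -1], function(x) x / total_col)

# Compare the two results element by element.
same_result <- isTRUE(all.equal(pcts_loop, pcts_apply))
```

The loop needs its own bookkeeping (creating the list, naming each element), which is the extra work that `lapply()` saves.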

Next we can convert that to a data frame (a data frame is actually like a list of column vectors, each with the same number of items), and append the fruit column:

    pcts = as.data.frame(pcts)
    pcts$fruit = fruits$fruit

We could have also used `cbind()` on *fruits* and the `lapply()` output to have both sets of numbers in one combined table.
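A rough sketch of that `cbind()` variant follows. It assumes the data frame reconstructed from the numbers in this post, and it adds a `_pct` suffix (mirroring the manual method) so the percent columns do not share names with the raw counts; `combined` is an illustrative name.

```r
# Data reconstructed from the totals and percentages in this post (an assumption).
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)
pcts <- as.data.frame(lapply(fruits[, -1], function(x) x / total_col))

# Suffix the percent columns so they do not clash with the raw counts.
names(pcts) <- paste0(names(pcts), "_pct")

# Bind side by side: fruit, week1..week5, then week1_pct..week5_pct.
combined <- cbind(fruits, pcts)
```

Without the renaming step, `cbind()` would keep two sets of columns both called `week1` to `week5`, which makes later `$` referencing ambiguous.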

Both methods above result in a table like this:

           fruit      week1     week2     week3     week4     week5
    1     Apples 0.27536232 0.2753623 0.1304348 0.2173913 0.1014493
    2    Bananas 0.10000000 0.0200000 0.2600000 0.2600000 0.3600000
    3    Oranges 0.03030303 0.4242424 0.1212121 0.2727273 0.1515152
    4    Mangoes 0.05882353 0.3529412 0.1764706 0.2941176 0.1176471
    5 Pineapples 0.20895522 0.1641791 0.2686567 0.2537313 0.1044776

(Technically, in the automated example the *fruit* column ends up at the far right if you follow the same method.)
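If the column order matters, one possible fix is to reorder by name after appending *fruit*. The sketch below again assumes the data frame reconstructed from the numbers in this post, and it finishes with a quick sanity check that each row's percentages add up to 1.

```r
# Data reconstructed from the totals and percentages in this post (an assumption).
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# Automated method from above: fruit is appended last.
pcts <- as.data.frame(lapply(fruits[, -1], function(x) x / total_col))
pcts$fruit <- fruits$fruit

# Move fruit to the front, keeping the other columns in their existing order.
pcts <- pcts[, c("fruit", setdiff(names(pcts), "fruit"))]

# Sanity check: the percentages in each row should add up to 1
# (compare with all.equal() rather than ==, because of floating point).
row_check <- rowSums(pcts[, -1])
```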
