Function of the Week - dplyr::add_count()
In this document, I introduce the add_count() function and show what it is for. I also compare and contrast add_count() with count().
The function add_count() adds a column of group-wise counts to the output table. It is similar to count(), except that it works like mutate() rather than summarize(): instead of collapsing the data to one row per group, it keeps every row and adds a new column holding the count for that row's group. add_count() can be thought of as a shortcut for group_by() used together with mutate().
First, load the tidyverse and the data set. I used the penguins data set from the palmerpenguins package and kept only the columns needed for this example.
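The shortcut described above can be sketched on a small made-up table. The `toy` tibble and its column names are hypothetical and are not part of the penguins example; this is only a minimal illustration, assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, not part of the penguins example
toy <- tibble(
  group = c("a", "b", "a", "c", "a", "b"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Shortcut: one call adds the group-wise count as column `n`
with_add_count <- toy %>%
  add_count(group)

# Long form: group, count the rows in each group with n(), then ungroup
with_mutate <- toy %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  ungroup()

# Both approaches should produce the same count column
identical(with_add_count$n, with_mutate$n)
```

Both versions keep all six rows; the only difference is how much typing is needed.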
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(palmerpenguins)
data(penguins)
# Keep only the columns needed for this example
keeps <- c("species", "island", "body_mass_g", "sex", "year")
penguins_new <- penguins[keeps]
Using add_count() for “body_mass_g” variable
penguins_new %>%
dplyr::add_count(body_mass_g)
## # A tibble: 344 × 6
## species island body_mass_g sex year n
## <fct> <fct> <int> <fct> <int> <int>
## 1 Adelie Torgersen 3750 male 2007 5
## 2 Adelie Torgersen 3800 female 2007 12
## 3 Adelie Torgersen 3250 female 2007 5
## 4 Adelie Torgersen NA <NA> 2007 2
## 5 Adelie Torgersen 3450 female 2007 8
## 6 Adelie Torgersen 3650 male 2007 6
## 7 Adelie Torgersen 3625 female 2007 1
## 8 Adelie Torgersen 4675 male 2007 1
## 9 Adelie Torgersen 3475 <NA> 2007 3
## 10 Adelie Torgersen 4250 <NA> 2007 5
## # … with 334 more rows
The same call, but using the name argument so that the new column's name reflects the variable being counted:
penguins_new %>%
dplyr::add_count(body_mass_g, name = "n_body_mass_g")
## # A tibble: 344 × 6
## species island body_mass_g sex year n_body_mass_g
## <fct> <fct> <int> <fct> <int> <int>
## 1 Adelie Torgersen 3750 male 2007 5
## 2 Adelie Torgersen 3800 female 2007 12
## 3 Adelie Torgersen 3250 female 2007 5
## 4 Adelie Torgersen NA <NA> 2007 2
## 5 Adelie Torgersen 3450 female 2007 8
## 6 Adelie Torgersen 3650 male 2007 6
## 7 Adelie Torgersen 3625 female 2007 1
## 8 Adelie Torgersen 4675 male 2007 1
## 9 Adelie Torgersen 3475 <NA> 2007 3
## 10 Adelie Torgersen 4250 <NA> 2007 5
## # … with 334 more rows
Comparison to the count() function:
penguins_new %>%
dplyr::count(body_mass_g)
## # A tibble: 95 × 2
## body_mass_g n
## <int> <int>
## 1 2700 1
## 2 2850 2
## 3 2900 4
## 4 2925 1
## 5 2975 1
## 6 3000 2
## 7 3050 4
## 8 3075 1
## 9 3100 1
## 10 3150 4
## # … with 85 more rows
Number of rows using the add_count() function: 344
Number of rows using the count() function: 95
The add_count() output has 344 rows, the same number as the original data set, so add_count() preserves every row.
The count() output has 95 rows: the data have been collapsed to one row per distinct value of body_mass_g, with NA counted as its own value.
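The row-count difference can be checked directly. The sketch below uses a hypothetical `toy` tibble with a made-up `mass` column (including one NA) rather than the penguins data, and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(mass = c(3750, 3800, 3750, NA, 3800, 3750))

# add_count() keeps one row per observation
n_rows_add_count <- toy %>% add_count(mass) %>% nrow()

# count() keeps one row per distinct value (NA forms its own group)
n_rows_count <- toy %>% count(mass) %>% nrow()

# Compare against the original row count and the number of distinct values;
# n_distinct() also treats NA as a value by default
n_rows_add_count == nrow(toy)
n_rows_count == n_distinct(toy$mass)
```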
Using add_count() for “sex” variable
penguins_new %>%
dplyr::add_count(sex)
## # A tibble: 344 × 6
## species island body_mass_g sex year n
## <fct> <fct> <int> <fct> <int> <int>
## 1 Adelie Torgersen 3750 male 2007 168
## 2 Adelie Torgersen 3800 female 2007 165
## 3 Adelie Torgersen 3250 female 2007 165
## 4 Adelie Torgersen NA <NA> 2007 11
## 5 Adelie Torgersen 3450 female 2007 165
## 6 Adelie Torgersen 3650 male 2007 168
## 7 Adelie Torgersen 3625 female 2007 165
## 8 Adelie Torgersen 4675 male 2007 168
## 9 Adelie Torgersen 3475 <NA> 2007 11
## 10 Adelie Torgersen 4250 <NA> 2007 11
## # … with 334 more rows
The same call, but using the name argument so that the new column's name reflects the variable being counted:
penguins_new %>%
dplyr::add_count(sex, name = "n_sex")
## # A tibble: 344 × 6
## species island body_mass_g sex year n_sex
## <fct> <fct> <int> <fct> <int> <int>
## 1 Adelie Torgersen 3750 male 2007 168
## 2 Adelie Torgersen 3800 female 2007 165
## 3 Adelie Torgersen 3250 female 2007 165
## 4 Adelie Torgersen NA <NA> 2007 11
## 5 Adelie Torgersen 3450 female 2007 165
## 6 Adelie Torgersen 3650 male 2007 168
## 7 Adelie Torgersen 3625 female 2007 165
## 8 Adelie Torgersen 4675 male 2007 168
## 9 Adelie Torgersen 3475 <NA> 2007 11
## 10 Adelie Torgersen 4250 <NA> 2007 11
## # … with 334 more rows
Comparison to the count() function:
penguins_new %>%
dplyr::count(sex)
## # A tibble: 3 × 2
## sex n
## <fct> <int>
## 1 female 165
## 2 male 168
## 3 <NA> 11
Number of rows using the add_count() function: 344
Number of rows using the count() function: 3
The first observation (row one) is a male penguin. With add_count(), the “n_sex” column shows 168 for that row, which is the group-wise count for male penguins. The count() output confirms this: there are 168 male penguins in total in the data set.
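One way to see how the two outputs relate is to join the count() table back onto the data: each row then picks up the count for its own group, which is what add_count() does in a single step. The sketch below uses a hypothetical `toy` tibble (the `id` column is made up) and assumes dplyr's default join behavior, where NA keys match each other.

```r
library(dplyr)

# Hypothetical toy data, including a missing sex value
toy <- tibble(
  id  = 1:5,
  sex = c("male", "female", "female", NA, "male")
)

# Summary table: one row per distinct value of sex
sex_counts <- toy %>% count(sex, name = "n_sex")

# Attach each group's count to every matching row
joined <- toy %>% left_join(sex_counts, by = "sex")

# The same result in one step
added <- toy %>% add_count(sex, name = "n_sex")

# The two count columns should agree row by row
identical(joined$n_sex, added$n_sex)
```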
add_count() is useful when you want to see the group-wise counts for a variable at a glance while keeping every row of the original data set. It is also a handy shortcut for using group_by() and mutate() together.