Function of the Week - dplyr::add_count( )

In this document, I will introduce the add_count( ) function and show what it’s for. I will also compare and contrast add_count( ) with the count( ) function.

What is it for?

The function add_count( ) adds a column of group-wise counts to the data. It is similar to count( ), except that it works like mutate( ) rather than summarize( ): instead of collapsing the data to one row per group, it keeps every row and adds a new column holding the number of rows in that row’s group. add_count( ) can be thought of as a shortcut for group_by( ) followed by mutate( ), with n( ) doing the counting.
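The shortcut described above can be sketched on a tiny made-up table. The data frame toy and its columns id and group are hypothetical and exist only for this illustration; the comparison at the end checks that the short and long spellings agree.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble::tibble(
  id    = 1:5,
  group = c("a", "b", "a", "c", "a")
)

# add_count() keeps every row and appends the size of each row's group
with_add_count <- toy %>%
  add_count(group)

# The longer spelling: group, count within each group, then ungroup
with_mutate <- toy %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  ungroup()

# count() instead collapses the data to one row per group
collapsed <- toy %>%
  count(group)

# Compare the two five-row results
all.equal(with_add_count, with_mutate)
```

Here with_add_count and with_mutate both keep the five original rows, while collapsed has one row for each of the three groups.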

First, load the tidyverse and the data set. I used the penguins data set from the palmerpenguins package and kept only the columns needed for this example.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)

library(palmerpenguins)
data(penguins)

# Keep only the columns needed for this example
keeps <- c("species", "island", "body_mass_g", "sex", "year")
penguins_new <- penguins[keeps]

Using add_count( ) for the “body_mass_g” variable

penguins_new %>%
  dplyr::add_count(body_mass_g)
## # A tibble: 344 × 6
##    species island    body_mass_g sex     year     n
##    <fct>   <fct>           <int> <fct>  <int> <int>
##  1 Adelie  Torgersen        3750 male    2007     5
##  2 Adelie  Torgersen        3800 female  2007    12
##  3 Adelie  Torgersen        3250 female  2007     5
##  4 Adelie  Torgersen          NA <NA>    2007     2
##  5 Adelie  Torgersen        3450 female  2007     8
##  6 Adelie  Torgersen        3650 male    2007     6
##  7 Adelie  Torgersen        3625 female  2007     1
##  8 Adelie  Torgersen        4675 male    2007     1
##  9 Adelie  Torgersen        3475 <NA>    2007     3
## 10 Adelie  Torgersen        4250 <NA>    2007     5
## # … with 334 more rows

The same call, but using the name argument so that the new column’s name reflects the variable being counted:

penguins_new %>%
  dplyr::add_count(body_mass_g, name = "n_body_mass_g")
## # A tibble: 344 × 6
##    species island    body_mass_g sex     year n_body_mass_g
##    <fct>   <fct>           <int> <fct>  <int>         <int>
##  1 Adelie  Torgersen        3750 male    2007             5
##  2 Adelie  Torgersen        3800 female  2007            12
##  3 Adelie  Torgersen        3250 female  2007             5
##  4 Adelie  Torgersen          NA <NA>    2007             2
##  5 Adelie  Torgersen        3450 female  2007             8
##  6 Adelie  Torgersen        3650 male    2007             6
##  7 Adelie  Torgersen        3625 female  2007             1
##  8 Adelie  Torgersen        4675 male    2007             1
##  9 Adelie  Torgersen        3475 <NA>    2007             3
## 10 Adelie  Torgersen        4250 <NA>    2007             5
## # … with 334 more rows

Comparison to the count( ) function:

penguins_new %>%
  dplyr::count(body_mass_g)
## # A tibble: 95 × 2
##    body_mass_g     n
##          <int> <int>
##  1        2700     1
##  2        2850     2
##  3        2900     4
##  4        2925     1
##  5        2975     1
##  6        3000     2
##  7        3050     4
##  8        3075     1
##  9        3100     1
## 10        3150     4
## # … with 85 more rows

Number of rows using the add_count( ) function: 344

Number of rows using the count( ) function: 95

With add_count( ) there are 344 rows, the same number as in the original data set, so the number of rows is preserved.

With count( ) there are 95 rows: the data has been collapsed to one row per distinct value of body_mass_g, with the missing values (NA) forming a group of their own.
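The row counts above can also be checked in code rather than read off the printed tibbles. This is a minimal sketch that works on the full penguins data (the dropped columns do not affect the counts); the object names n_original, n_add_count, n_count and n_values are made up for the illustration.

```r
library(dplyr)
library(palmerpenguins)

# Rows in the original data
n_original <- nrow(penguins)

# add_count() should leave the number of rows unchanged
n_add_count <- penguins %>%
  add_count(body_mass_g) %>%
  nrow()

# count() should return one row per distinct body_mass_g value
n_count <- penguins %>%
  count(body_mass_g) %>%
  nrow()

# n_distinct() treats NA as a value by default, as count() does
n_values <- n_distinct(penguins$body_mass_g)

c(original = n_original, add_count = n_add_count,
  count = n_count, distinct_values = n_values)
```

The first two numbers should match each other, and so should the last two.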

Using add_count( ) for the “sex” variable

penguins_new %>%
  dplyr::add_count(sex)
## # A tibble: 344 × 6
##    species island    body_mass_g sex     year     n
##    <fct>   <fct>           <int> <fct>  <int> <int>
##  1 Adelie  Torgersen        3750 male    2007   168
##  2 Adelie  Torgersen        3800 female  2007   165
##  3 Adelie  Torgersen        3250 female  2007   165
##  4 Adelie  Torgersen          NA <NA>    2007    11
##  5 Adelie  Torgersen        3450 female  2007   165
##  6 Adelie  Torgersen        3650 male    2007   168
##  7 Adelie  Torgersen        3625 female  2007   165
##  8 Adelie  Torgersen        4675 male    2007   168
##  9 Adelie  Torgersen        3475 <NA>    2007    11
## 10 Adelie  Torgersen        4250 <NA>    2007    11
## # … with 334 more rows

The same call, but using the name argument so that the new column’s name reflects the variable being counted:

penguins_new %>%
  dplyr::add_count(sex, name = "n_sex")
## # A tibble: 344 × 6
##    species island    body_mass_g sex     year n_sex
##    <fct>   <fct>           <int> <fct>  <int> <int>
##  1 Adelie  Torgersen        3750 male    2007   168
##  2 Adelie  Torgersen        3800 female  2007   165
##  3 Adelie  Torgersen        3250 female  2007   165
##  4 Adelie  Torgersen          NA <NA>    2007    11
##  5 Adelie  Torgersen        3450 female  2007   165
##  6 Adelie  Torgersen        3650 male    2007   168
##  7 Adelie  Torgersen        3625 female  2007   165
##  8 Adelie  Torgersen        4675 male    2007   168
##  9 Adelie  Torgersen        3475 <NA>    2007    11
## 10 Adelie  Torgersen        4250 <NA>    2007    11
## # … with 334 more rows

Comparison to the count( ) function:

penguins_new %>%
  dplyr::count(sex)
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 female   165
## 2 male     168
## 3 <NA>      11

Number of rows using the add_count( ) function: 344

Number of rows using the count( ) function: 3

We can see that the first observation, in row one, is a male penguin. Using add_count( ), the “n_sex” column lists 168 as the group-wise count for male penguins. The count( ) output agrees: there are 168 male penguins in total in the data set.
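One way to picture the link between the two outputs: the add_count( ) result is what you would get by computing the count( ) summary and then copying each count back onto every matching row. The sketch below does this with left_join( ), which is my own choice for the illustration and not necessarily how dplyr implements add_count( ); the names via_add_count and via_join are made up.

```r
library(dplyr)
library(palmerpenguins)

# Same column subset as penguins_new in the text
penguins_new <- penguins[c("species", "island", "body_mass_g", "sex", "year")]

# The direct route
via_add_count <- penguins_new %>%
  add_count(sex, name = "n_sex")

# The long route: summarise with count(), then join the counts back.
# left_join() matches NA to NA by default, so penguins with a missing
# sex receive the count of the NA group.
via_join <- penguins_new %>%
  left_join(count(penguins_new, sex, name = "n_sex"), by = "sex")

# Compare the two results
all.equal(via_add_count, via_join)
```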

Is it helpful?

This function is useful when you want to see the group-wise count for a variable next to each observation while preserving the number of rows in the original data set. It is also helpful as a shortcut for the group_by( ) and mutate( ) combination.
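As one possible use, having the count on every row makes it easy to filter on group size. The sketch below keeps only penguins whose body mass value is shared by at least five birds; the threshold of 5 and the name common_masses are arbitrary choices for the illustration.

```r
library(dplyr)
library(palmerpenguins)

# Add the group-wise count, then keep rows from the larger groups.
# The threshold of 5 is an arbitrary choice for illustration.
common_masses <- penguins %>%
  add_count(body_mass_g, name = "n_body_mass_g") %>%
  filter(n_body_mass_g >= 5)

common_masses
```

Doing the same with count( ) would take an extra step, because the counts would first have to be joined back onto the original rows.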