haven::zap_label()

In this document, I introduce the haven::zap_label() function and show what it is for.

#load Packages
library(readr)
pacman::p_load(tidyverse, here, haven, labelled)

#Check the R documentation for zap_label
?zap_label

What is it for?

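The zap_label() function removes the variable label, that is, the single descriptive label attached to a whole variable. It does not touch value labels (the labels attached to individual values); haven has a separate function, zap_labels(), for those. Use zap_label() when you simply want to drop the variable labels from your data. It accepts one argument, x, which can be a vector or a data frame: zap_label(x).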
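
As a minimal sketch of this behaviour, the check below reads the attributes of a small labelled vector before and after zapping. The vector x and its labels are made up for illustration. Note that attr() is called with exact = TRUE, because base R would otherwise partially match "label" to the remaining "labels" attribute.

```r
library(haven)

# Toy vector (hypothetical) with a variable label and two value labels
x <- labelled(c(0, 1, 1, 0), labels = c(no = 0, yes = 1), label = "Smoker")

# Variable label before zapping
label_before <- attr(x, "label", exact = TRUE)
label_before

x_zapped <- zap_label(x)

# The variable label is gone ...
label_after <- attr(x_zapped, "label", exact = TRUE)
is.null(label_after)

# ... but the value labels are still attached
attr(x_zapped, "labels", exact = TRUE)
```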

Example 1: Remove label from a vector

#Let's say we have a variable with the variable label "Pain rating"
PR <- labelled(0:10, c(nopain = 0, moderate = 5, severe = 10), label = "Pain rating")
PR
## <labelled<integer>[11]>: Pain rating
##  [1]  0  1  2  3  4  5  6  7  8  9 10
## 
## Labels:
##  value    label
##      0   nopain
##      5 moderate
##     10   severe
#To change the variable label, we can first remove the old one with zap_label; the value labels are kept
PR <- zap_label(PR)
PR
## <labelled<integer>[11]>
##  [1]  0  1  2  3  4  5  6  7  8  9 10
## 
## Labels:
##  value    label
##      0   nopain
##      5 moderate
##     10   severe
#Recreate the variable with the new variable label "Pain Scale"
PR <- labelled(0:10, c(nopain = 0, moderate = 5, severe = 10), label = "Pain Scale")
PR
## <labelled<integer>[11]>: Pain Scale
##  [1]  0  1  2  3  4  5  6  7  8  9 10
## 
## Labels:
##  value    label
##      0   nopain
##      5 moderate
##     10   severe
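
Recreating the vector works, but it is not strictly necessary. As an alternative sketch, assuming the labelled package is loaded (it is in the setup chunk of this document), the new variable label can be attached to the zapped vector with var_label():

```r
library(haven)
library(labelled)

PR <- labelled(0:10, c(nopain = 0, moderate = 5, severe = 10), label = "Pain rating")

# Drop the old variable label, then attach a new one without rebuilding the vector
PR <- zap_label(PR)
var_label(PR) <- "Pain Scale"

# Read the variable label back
var_label(PR)
```

The value labels (nopain, moderate, severe) are carried along unchanged, because neither zap_label() nor var_label() modifies them.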

Example 2: Remove labels from a data frame, SPSS dataset

#load the namcs2015_spss dataset

namcs <- read_sav("~/Desktop/R-Programming/part5/data/namcs2015-spss.sav")
glimpse(namcs[,1:3])
## Rows: 28,332
## Columns: 3
## $ VMONTH <dbl+lbl> 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,  4,  4,  4,  4,…
## $ VDAYR  <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 3, 3,…
## $ AGE    <dbl+lbl> 65, 45, 61, 55, 53, 92, 59, 92, 65,  1, 78, 75, 58, 58, 69,…
#Check variable structure
str(namcs[,1:3])
## tibble [28,332 × 3] (S3: tbl_df/tbl/data.frame)
##  $ VMONTH: dbl+lbl [1:28332] 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,  4,  4, ...
##    ..@ label      : chr "Month of visit"
##    ..@ format.spss: chr "F2.0"
##    ..@ labels     : Named num [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##    .. ..- attr(*, "names")= chr [1:12] "January" "February" "March" "April" ...
##  $ VDAYR : dbl+lbl [1:28332] 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5...
##    ..@ label        : chr "Day of week of visit"
##    ..@ format.spss  : chr "F1.0"
##    ..@ display_width: int 7
##    ..@ labels       : Named num [1:7] 1 2 3 4 5 6 7
##    .. ..- attr(*, "names")= chr [1:7] "Sunday" "Monday" "Tuesday" "Wednesday" ...
##  $ AGE   : dbl+lbl [1:28332] 65, 45, 61, 55, 53, 92, 59, 92, 65,  1, 78, 75, 58, ...
##    ..@ label        : chr "Patient age in years"
##    ..@ format.spss  : chr "F3.0"
##    ..@ display_width: int 5
##    ..@ labels       : Named num [1:2] 0 92
##    .. ..- attr(*, "names")= chr [1:2] "Under 1 year" "92 years or older"
#Remove the variable labels with zap_label and check the structure (the result is not assigned, so namcs itself is unchanged)
str(zap_label(namcs[,1:3]))
## tibble [28,332 × 3] (S3: tbl_df/tbl/data.frame)
##  $ VMONTH: dbl+lbl [1:28332] 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,  4,  4, ...
##    ..@ format.spss: chr "F2.0"
##    ..@ labels     : Named num [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##    .. ..- attr(*, "names")= chr [1:12] "January" "February" "March" "April" ...
##  $ VDAYR : dbl+lbl [1:28332] 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5...
##    ..@ format.spss  : chr "F1.0"
##    ..@ display_width: int 7
##    ..@ labels       : Named num [1:7] 1 2 3 4 5 6 7
##    .. ..- attr(*, "names")= chr [1:7] "Sunday" "Monday" "Tuesday" "Wednesday" ...
##  $ AGE   : dbl+lbl [1:28332] 65, 45, 61, 55, 53, 92, 59, 92, 65,  1, 78, 75, 58, ...
##    ..@ format.spss  : chr "F3.0"
##    ..@ display_width: int 5
##    ..@ labels       : Named num [1:2] 0 92
##    .. ..- attr(*, "names")= chr [1:2] "Under 1 year" "92 years or older"
#Relabel those variables
var_label(namcs[,1:3]) <- c("Visit Month", "Visit Day", "Pt Age")
#Check if labels are correct
str(namcs[,1:3])
## tibble [28,332 × 3] (S3: tbl_df/tbl/data.frame)
##  $ VMONTH: dbl+lbl [1:28332] 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,  4,  4, ...
##    ..@ label      : chr "Visit Month"
##    ..@ format.spss: chr "F2.0"
##    ..@ labels     : Named num [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##    .. ..- attr(*, "names")= chr [1:12] "January" "February" "March" "April" ...
##  $ VDAYR : dbl+lbl [1:28332] 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5...
##    ..@ label        : chr "Visit Day"
##    ..@ format.spss  : chr "F1.0"
##    ..@ display_width: int 7
##    ..@ labels       : Named num [1:7] 1 2 3 4 5 6 7
##    .. ..- attr(*, "names")= chr [1:7] "Sunday" "Monday" "Tuesday" "Wednesday" ...
##  $ AGE   : dbl+lbl [1:28332] 65, 45, 61, 55, 53, 92, 59, 92, 65,  1, 78, 75, 58, ...
##    ..@ label        : chr "Pt Age"
##    ..@ format.spss  : chr "F3.0"
##    ..@ display_width: int 5
##    ..@ labels       : Named num [1:2] 0 92
##    .. ..- attr(*, "names")= chr [1:2] "Under 1 year" "92 years or older"
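
The str(zap_label(...)) call above only displays the zapped copy. To make the removal stick, the result has to be assigned back. The sketch below shows this on a small made-up tibble; the column names and values are hypothetical stand-ins for an imported dataset.

```r
library(haven)
library(tibble)

# Hypothetical stand-in for an imported SPSS file
df <- tibble(
  month = labelled(c(1, 2, 12),
                   labels = c(January = 1, December = 12),
                   label = "Month of visit"),
  age   = labelled(c(65, 45, 92),
                   labels = c("92 years or older" = 92),
                   label = "Patient age in years")
)

# Assign the result back so the data frame itself loses its variable labels
df <- zap_label(df)

# TRUE for every column whose variable label is gone
# (exact = TRUE avoids a partial match to the "labels" attribute)
no_label <- vapply(
  df,
  function(col) is.null(attr(col, "label", exact = TRUE)),
  logical(1)
)
no_label
```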

Example 3: Remove labels from a data frame, Stata dataset

#load the NHANES Stata dataset

nhanes1116 <- read_dta("NHANESFULL2011-2016 copy.dta")
glimpse(nhanes1116[,1:3])
## Rows: 29,902
## Columns: 3
## $ seqn     <dbl> 93674, 67472, 64879, 90043, 70388, 64115, 77799, 63774, 67990…
## $ sddsrvyr <dbl> 9, 7, 7, 9, 7, 7, 8, 7, 7, 9, 8, 8, 9, 7, 9, 8, 7, 8, 8, 8, 8…
## $ ridstatr <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
#Check variable structure
str(nhanes1116[,1:3])
## tibble [29,902 × 3] (S3: tbl_df/tbl/data.frame)
##  $ seqn    : num [1:29902] 93674 67472 64879 90043 70388 ...
##   ..- attr(*, "label")= chr "Respondent sequence number"
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ sddsrvyr: num [1:29902] 9 7 7 9 7 7 8 7 7 9 ...
##   ..- attr(*, "label")= chr "Data release cycle"
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ ridstatr: num [1:29902] 2 2 2 2 2 2 2 2 2 2 ...
##   ..- attr(*, "label")= chr "Interview/Examination status"
##   ..- attr(*, "format.stata")= chr "%10.0g"
#Remove the variable labels with zap_label and check the structure (the result is not assigned, so nhanes1116 itself is unchanged)
str(zap_label(nhanes1116[,1:3]))
## tibble [29,902 × 3] (S3: tbl_df/tbl/data.frame)
##  $ seqn    : num [1:29902] 93674 67472 64879 90043 70388 ...
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ sddsrvyr: num [1:29902] 9 7 7 9 7 7 8 7 7 9 ...
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ ridstatr: num [1:29902] 2 2 2 2 2 2 2 2 2 2 ...
##   ..- attr(*, "format.stata")= chr "%10.0g"
#Relabel those variables
var_label(nhanes1116[,1:3]) <- c("ID", "Data Cycle", "Int/Exm Stat")
#Check if labels are correct
str(nhanes1116[,1:3])
## tibble [29,902 × 3] (S3: tbl_df/tbl/data.frame)
##  $ seqn    : num [1:29902] 93674 67472 64879 90043 70388 ...
##   ..- attr(*, "label")= chr "ID"
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ sddsrvyr: num [1:29902] 9 7 7 9 7 7 8 7 7 9 ...
##   ..- attr(*, "label")= chr "Data Cycle"
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ ridstatr: num [1:29902] 2 2 2 2 2 2 2 2 2 2 ...
##   ..- attr(*, "label")= chr "Int/Exm Stat"
##   ..- attr(*, "format.stata")= chr "%10.0g"

Is it helpful?

I think this function is useful when we want to change how variables are labelled, especially after data manipulation, when new labels would describe the data better. It is also a quick way to strip the variable labels that come with data imported from other statistical software such as SPSS, SAS, or Stata.