Function: remove_empty()
In this document, I introduce the remove_empty() function and show what it is for.
#Load required packages
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
#Create the example data.frame with one empty column and one empty row
fxn <- data.frame(
A = c("1", "2", "3", NA, "5"),
B = c("6", "7", "8", NA, "10"),
C = c("11", "12", "13", NA, "15"),
D = c(NA, NA, NA, NA, NA)
)
print(fxn) #Data frame contains 5 rows and 4 columns, where the 4th row and column D are empty
## A B C D
## 1 1 6 11 NA
## 2 2 7 12 NA
## 3 3 8 13 NA
## 4 <NA> <NA> <NA> NA
## 5 5 10 15 NA
The remove_empty() function from the janitor package removes empty rows and/or columns from a data.frame or matrix, where "empty" means that every value in the row or column is missing (NA). The examples here use three of its arguments: dat, which, and quiet. The dat argument is the data.frame or matrix to clean. The which argument selects what to remove: "rows", "cols", or both with c("rows", "cols"). The optional quiet argument defaults to TRUE, which suppresses the summary of what was removed; setting it to FALSE prints messages reporting how many rows and columns were removed.
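To make the definition of "empty" concrete, here is a minimal base R sketch of the same idea. It is not janitor's actual implementation, only an equivalent for this example, and the names keep_rows, keep_cols, and base_result are my own.

```r
# Same example data as used throughout this document
fxn <- data.frame(
  A = c("1", "2", "3", NA, "5"),
  B = c("6", "7", "8", NA, "10"),
  C = c("11", "12", "13", NA, "15"),
  D = c(NA, NA, NA, NA, NA)
)

# A row or column is kept if it has at least one non-missing value
keep_rows <- rowSums(!is.na(fxn)) > 0
keep_cols <- colSums(!is.na(fxn)) > 0

# drop = FALSE keeps the result a data.frame even if only one column survives
base_result <- fxn[keep_rows, keep_cols, drop = FALSE]

# Rows 1, 2, 3, 5 and columns A, B, C remain
print(base_result)
```

The sketch shows that a row with even one non-missing value is not empty; only row 4 and column D qualify here.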
#The data.frame currently has one row (row 4) and one column (D) that contain only missing values (NA)
fxn
## A B C D
## 1 1 6 11 NA
## 2 2 7 12 NA
## 3 3 8 13 NA
## 4 <NA> <NA> <NA> NA
## 5 5 10 15 NA
#By default, when "which" is not specified, the function removes both rows and columns that are entirely missing; it prints a note about the default, but no summary of what was removed
remove_empty(fxn)
## value for "which" not specified, defaulting to c("rows", "cols")
## A B C
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 5 5 10 15
#Removes both empty rows and columns and prints out summary statement
remove_empty(fxn, which = c("rows", "cols"), quiet = FALSE)
## Removing 1 empty rows of 5 rows total (20%).
## Removing 1 empty columns of 4 columns total (Removed: D).
## A B C
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 5 5 10 15
#Removes the empty column but keeps the empty row; the summary statement is suppressed
remove_empty(fxn, which = c("cols"), quiet = TRUE)
## A B C
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 4 <NA> <NA> <NA>
## 5 5 10 15
#Removes the empty row but keeps the empty column; the summary statement is suppressed
remove_empty(fxn, which = c("rows"), quiet = TRUE)
## A B C D
## 1 1 6 11 NA
## 2 2 7 12 NA
## 3 3 8 13 NA
## 5 5 10 15 NA
#Alternatively, the function may be called with the pipe operator (%>%)
fxn %>% remove_empty() #removes empty rows and columns by default
## value for "which" not specified, defaulting to c("rows", "cols")
## A B C
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 5 5 10 15
fxn %>% remove_empty("rows", quiet = FALSE) #removes empty row with summary statement printed
## Removing 1 empty rows of 5 rows total (20%).
## A B C D
## 1 1 6 11 NA
## 2 2 7 12 NA
## 3 3 8 13 NA
## 5 5 10 15 NA
fxn %>% remove_empty("cols") #removes empty column with summary statement suppressed
## A B C
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 4 <NA> <NA> <NA>
## 5 5 10 15
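One caveat about the examples above: they all mark missing values with NA. To my understanding, remove_empty() treats only NA as missing, so rows or columns made up of empty strings ("") are left in place. The sketch below uses a hypothetical data.frame (blanks) to show one way to convert empty strings to NA first; the object names are illustrative only.

```r
library(janitor)

# Hypothetical data where the "empty" row and column hold "" instead of NA
blanks <- data.frame(
  A = c("1", "", "3"),
  B = c("4", "", "6"),
  C = c("", "", ""),
  stringsAsFactors = FALSE
)

# "" is not NA, so nothing is removed here
unchanged <- remove_empty(blanks, which = c("rows", "cols"))

# Convert empty strings to NA column by column, then remove empty rows/columns
blanks_na <- blanks
blanks_na[] <- lapply(blanks_na, function(x) replace(x, which(x == ""), NA))
cleaned <- remove_empty(blanks_na, which = c("rows", "cols"))

print(dim(unchanged)) # all 3 rows and 3 columns are still present
print(cleaned)        # rows 1 and 3, columns A and B
```

Whether an empty string should count as missing depends on the data, so this conversion is a judgment call rather than a required step.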
The remove_empty() function is a practical data-cleaning tool: it drops rows and/or columns that contain only missing values. It can be combined with other functions in the janitor package, such as clean_names() for standardizing column names, to tidy the data further. Removing empty rows and columns also makes tables built from the data easier to read. Tidy data is easier to manage and analyze and reduces the chance of errors.
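As a short sketch of combining remove_empty() with another janitor function, the example below cleans the column names with clean_names() and then drops the empty row and column. The messy data.frame and its column names are made up for illustration.

```r
library(janitor)

# Hypothetical messy data: inconsistent column names, one empty row, one empty column
messy <- data.frame(
  `Patient ID` = c("a1", NA, "a3"),
  `Visit Date` = c("2020-01-01", NA, "2020-03-01"),
  Notes = c(NA, NA, NA),
  check.names = FALSE
)

tidy <- messy %>%
  clean_names() %>%                        # "Patient ID" becomes "patient_id"
  remove_empty(which = c("rows", "cols"))  # drops row 2 and the notes column

print(tidy)
```

Naming which explicitly avoids the "value for which not specified" note shown in the earlier output.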