Recently I have been working on a very large legacy project which utilises the excellent data.table
package throughout. What this has resulted in is an R CMD check
containing literally thousands of NOTE
s similar to the following:
❯ checking R code for possible problems ... NOTE
my_fn: no visible binding for global variable ‘mpg’
There are several reasons why you might see these NOTE
s and, for our code base, some of the NOTE
s were potentially more damaging than others. This was a problem as these NOTE
s were hidden firstly by a suppression of them due to a manipulation of the _R_CHECK_CODETOOLS_PROFILE_
option of the .Renviron
file. Once this was removed we discovered the more damaging NOTE
s were hidden within the sheer amount of NOTE
s we had in the R CMD check
.
Non-standard Evaluation
If we have a function where we are using data.table
’s modification by reference features, i.e. we are using a variable in an unquoted fashion (also known as non-standard evaluation (NSE)) then this issue will occur. Take the following function as an example.
my_fn <- function() {
mtcars <- data.table::data.table(mtcars)
mtcars[, mpg_div_hp := mpg / hp]
mtcars[]
}
Here, we would find the following NOTE
s:
❯ checking R code for possible problems ... NOTE
my_fn: no visible binding for global variable ‘mpg_div_hp’
my_fn: no visible binding for global variable ‘mpg’
my_fn: no visible binding for global variable ‘hp’
Undefined global functions or variables:
hp mpg mpg_div_hp
Sometimes you may also see these NOTE
s for syntactic sugar such as !!
or :=
if you haven’t correctly imported the package they come from.
This is a well discussed issue on the internet which only became an issue after a change introduced to the core R code in version 2.15.1. There are two solutions to this problem.
Option One
Include all variable names within a globalVariables()
call in the package documentation file.
globalVariables(c("mpg", "hp", "mpg_div_hp"))
For our package, as there are literally thousands of variables to list in this file, it makes it very difficult to maintain this list and makes the file very long. If, however, the variables belong to data which are stored within your package then this can be greatly simplified to
globalVariables(names(my_data))
You may wish to import any syntactic sugar functionality here as well. For example
globalVariables(c(":=", "!!"))
Option Two
The second option involves binding the variable locally to the function. At the top of your function you can define the variable as a NULL
value.
my_fn <- function() {
mpg <- hp <- mpg_div_hp <- NULL
mtcars <- data.table::data.table(mtcars)
mtcars[, mpg_div_hp := mpg / hp]
mtcars[]
}
Therefore your variable(s) are now bound to object(s) and so the R CMD check
has nothing to complain about. This is the method that the data.table
team recommend and to me, feels like a much neater and more importantly maintainable solution than the first option.
A Note on the Tidyverse
You may also come across this problem whilst programming using the tidyverse
for which there is a very neat solution. You simply need to be more explicit within your function by using the .data
pronoun.
#' @importFrom rlang .data
my_fn <- function() {
mtcars %>%
mutate(mpg_div_hp = .data$mpg / .data$hp)
}
Note the import!
Selecting Variables with the data.table
..
Prefix
NOTE
s can occur when we are using the ..
syntax of data.table
, for example
double_dot <- function() {
mtcars <- data.table::data.table(mtcars)
select_cols <- c("cyl", "wt")
mtcars[, ..select_cols]
}
This will yield
❯ checking R code for possible problems ... NOTE
Undefined global functions or variables:
..select_cols
In this instance, this can be solved by avoiding the ..
syntax and using the alternative with = FALSE
notation.
double_dot <- function() {
mtcars <- data.table::data.table(mtcars)
select_cols <- c("cyl", "wt")
mtcars[, select_cols, with = FALSE]
}
Even though the ..
prefix is syntactic sugar, we cannot use globalVariables(c(".."))
since the actual variable in this case is ..select_cols
; we would therefore need to use globalVariables(c("..select_cols"))
if we wanted to use the globalVariables()
approach.
Missing Imports
In our code base, I also found NOTE
s for functions or datasets which were not correctly imported. For example, consider the following simple function.
Rversion <- function() {
info <- sessionInfo()
info$R.version
}
This gives the following NOTE
s:
❯ checking R code for possible problems ... NOTE
Rversion: no visible global function definition for ‘sessionInfo’
Consider adding
importFrom("utils", "sessionInfo")
to your NAMESPACE file.
Here the R CMD check
is rather helpful and tells us the solution; we need to ensure that we explicitly import the function from the utils
package in the documentation. This can easily be done with the roxygen2
package by including an @importFrom utils sessionInfo
tag.
Trying to Call Removed Functionality
If you have a function which has been removed from your package but attempt to call it from another function, R will only give you a NOTE
about this.
use_non_existent_function <- function() {
this_function_doesnt_exist()
}
This will give the NOTE
❯ checking R code for possible problems ... NOTE
use_non_existent_function: no visible global function definition for
‘this_function_doesnt_exist’
Of course it goes without saying that you should make sure to remove any calls to functions which have been removed from your package. As a side note, when I first started working on the project, I was initially unaware that within our package we had the option _R_CHECK_CODETOOLS_PROFILE_ = "suppressUndefined = TRUE"
set within our .Renviron
file which will suppresses all unbound global variable NOTE
s from appearing in the R CMD check
. However given that this can mask these deeper issues within your package, such as not recognising when a function calls functionality which has been removed from the package. This can end up meaning the end user can face nasty and confusing error messages. Therefore I would not recommend using this setting and would suggest tackling each of your packages NOTE
s individually to remove them all.
I actually discovered all of our package NOTE
s when introducing the lintr
package to our CI pipeline. lintr
will pick up on some – but not all – of these unbound global variable problems ('lintr
of course does not take the _R_CHECK_CODETOOLS_PROFILE_
into account). Take our original function as an example
my_fn <- function() {
mtcars <- data.table::data.table(mtcars)
mtcars[, mpg_div_hp := mpg / hp]
mtcars[]
}
Here, lintr
will highlight the variables mpg
and hp
as problems but it currently won’t highlight the variables on the LHS of :=
, i.e. mpg_div_hp
.
Conclusion
When developing your package, if you are experiencing these unbound global variables NOTE
s you should
- Strive to define any unbound variables locally within a function.
- Ensure that any functions or data from external packages (including
utils
,stats
, etc.) have the correct@importFrom
tag - Do not suppress this check in the
.Renviron
file and the solutions proposed here should remove the current need to do so - Any package wide unbound variables, which are typically syntactic sugar (e.g.
:=
), should be defined within the package description file inside aglobalVariables()
function, which should be a very short and maintainable list.