Creating an R-Package Template: A Guide

Posted by Ivan Navarro

Mar 8, 2018 6:05:00 AM

 

By Ivan Navarro, Data Scientist at Cytel

R is an open-source implementation of ‘S’, the statistical programming language. With its open character and ability to extend its functionality using external packages, R allows users to create their own packages that are easily loadable into the core instance.
In essence, R-packages are extensions that contain source code, documentation, data and examples of personal contributions. They can be extremely useful for data scientists, statisticians and programmers alike who need to create custom analyses and visualizations. However, creating your first R-package can be a complex task for inexperienced users.
In this post, I explain how to create a basic R-package which can be used as a template by anyone interested in making a contribution. Prior knowledge of R programming is required, but you will not need to deal with the technical aspects of the creation process because the R-package structure is shared at the end.


What is an R-Package?

In his book “Advanced R”, Hadley Wickham wrote “R is now widely used in academic research, education, and industry. It is constantly growing, with new versions of the core software released regularly and more than 5,000 packages available”. This quote clearly reflects the increasing importance and popularity of R-packages.
Because R is community-based software, the development of R-packages should follow a specific structure and guidelines to ensure that:
• It can be loaded into the core R software.
• New extensions are compatible with other computers.
• If the package is to be shared with the community on an official basis, it complies with the requirements of repositories like “The Comprehensive R Archive Network” (CRAN) and “Bioconductor”.

Note: This post will not explain the requirements to install and load extensions, or the use of R itself. For this, there are many excellent introductory R tutorials available, like those in Cytel’s blogs.


Basic R-Package creation
In this section, I detail the requirements and development steps: the required tools, the basic folder structure, and a simple example feature along with its documentation and tests.

Required tools
To start creating an R-Package you will need to have an R core instance running. If you don’t already have this installed, you can download it from CRAN (https://cran.r-project.org/).

Below are the packages that I personally recommend to develop an R-package:
• “devtools”: development tools to do the work easily.
• “roxygen2”: permits function auto-documentation.
• “testthat”: for unit-testing.
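
As a quick check before starting, the sketch below reports which of the three recommended packages are not yet available in your R library. The variable names are illustrative only, and the install line is left commented out because it needs an internet connection.

```r
# Packages recommended in this post for package development
needed <- c("devtools", "roxygen2", "testthat")

# requireNamespace() returns FALSE (instead of failing) when a package is absent
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # Uncomment to install them from CRAN (needs an internet connection):
  # install.packages(not_installed)
} else {
  message("All development packages are available.")
}
```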

Although details are not included in this post, I also suggest:
• “rmarkdown”: documentation language to create vignettes with code chunks.
• “shiny”: creation of examples using a web GUI.

Note: Throughout this guide, I use macOS, so be aware that some of the detailed steps may differ on your OS, especially on Windows or Linux. However, R is platform independent and every other OS has its own readily available solutions for these steps.

Package structure
Creating a basic R-Package structure is easier with the “devtools” solution. In fact, this package is the main tool for this purpose and offers a lot of functions, which help to simplify the process.

Let’s begin by loading this R-package and creating a minimal package structure in the desired system path:

library(devtools)
devtools::create("MyPack")

You can now see that our new package is named “MyPack”, and the line devtools::create() helps to create the minimum required structure.

It includes:
• MyPack.Rproj, with package compilation and project details. Identifies the folder as an R-package.
• DESCRIPTION, where we describe some metadata about the package (Authors, Maintainer, Description, Dependencies, etc.).
• NAMESPACE, an auto-generated file whose content describes the functions to be exported and the packages to be imported when the package is loaded.
• R folder, which will contain the R source code.

Note: You will also find files starting with a dot (e.g., .Rbuildignore, .gitignore). These are usually auto-generated and tell the build process and the version control system which files to ignore.
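
To get a feel for the DESCRIPTION metadata, the sketch below writes a minimal DESCRIPTION file into a temporary folder and reads it back with base R. All field values are placeholders chosen for illustration, not the actual metadata of the package shared at the end of this post.

```r
# Sketch: write and read back a minimal DESCRIPTION file in a temporary folder.
# All field values below are placeholders, not the real MyPack metadata.
pkg_dir <- file.path(tempdir(), "MyPack")
dir.create(pkg_dir, showWarnings = FALSE)

desc <- data.frame(
  Package = "MyPack",
  Title = "Weibull Distribution Helpers",
  Version = "0.0.1",
  Description = "Example package with a Weibull PDF and CDF.",
  License = "MIT + file LICENSE",
  stringsAsFactors = FALSE
)
write.dcf(desc, file = file.path(pkg_dir, "DESCRIPTION"))

# read.dcf() returns a character matrix with one column per field
meta <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
meta[1, "Package"]
meta[1, "Version"]
```

DESCRIPTION uses the Debian control file format (one "Field: value" pair per entry), which is why read.dcf() and write.dcf() can handle it directly.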

The R folder will host all our R development. Unlike typical R scripts, R-packages do not call library() or require() to load their dependencies; imports are declared in the NAMESPACE file instead. For this reason, the file “MyPack-package.R” should be created in the R folder to declare the package's shared library (if it contains compiled code) as well as all the required imports, as shown below:

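#' @useDynLib MyPack
#' @importFrom stats pweibull dweibull
NULL

The @useDynLib tag is only needed when the package contains compiled code (for example C or C++ sources in a “src” folder); omit it in a pure-R package. The closing NULL gives “roxygen2” an object to attach the block to. The “stats” package is attached by default in an R session, so pweibull() and dweibull() would be found even without the import; they are included in “MyPack-package.R” as an example and will be used for unit-testing.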

Feature example
As an example, the next lines implement the probability density function (PDF) and the cumulative distribution function (CDF) for the Weibull distribution. Thus, a new file is created (i.e., <MyPack_folder>/R/MyWeibull.R) to contain them as new functions in this package:

## PDF of Weibull distribution (x is a single number; a = shape, b = scale)
MyWpdf <- function(x, a, b) {
  if (x >= 0) (a/b) * (x/b)^(a-1) * exp(-(x/b)^a) else 0
}

## CDF of Weibull distribution (x is a single number; a = shape, b = scale)
MyWcdf <- function(x, a, b) {
  if (x >= 0) 1 - exp(-(x/b)^a) else 0
}
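
The two functions above expect a single value of x. As a rough sketch of how they could be extended to whole vectors, the variants below (the names MyWpdfVec and MyWcdfVec are hypothetical and not part of the package) fill in zeros for negative inputs and compare the results against stats::dweibull() and stats::pweibull().

```r
# Hypothetical vectorised variants; not part of the original MyPack code.
# PDF: f(x) = (a/b) * (x/b)^(a-1) * exp(-(x/b)^a) for x >= 0, else 0
MyWpdfVec <- function(x, a, b) {
  dens <- numeric(length(x))   # zeros by default, which covers x < 0
  pos <- x >= 0
  z <- x[pos] / b
  dens[pos] <- (a / b) * z^(a - 1) * exp(-z^a)
  dens
}

# CDF: F(x) = 1 - exp(-(x/b)^a) for x >= 0, else 0
MyWcdfVec <- function(x, a, b) {
  prob <- numeric(length(x))
  pos <- x >= 0
  prob[pos] <- 1 - exp(-(x[pos] / b)^a)
  prob
}

# Compare with the reference implementations in the stats package
x <- seq(-1, 3, by = 0.25)
all.equal(MyWpdfVec(x, 2, 1.5), dweibull(x, shape = 2, scale = 1.5))
all.equal(MyWcdfVec(x, 2, 1.5), pweibull(x, shape = 2, scale = 1.5))
```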

Now we have a first implementation of our R-package. The following R commands manage it in our system:
• devtools::build("./MyPack"), to build a *.tar.gz (distributable) package with the R-package content.
• devtools::build_win("./MyPack"), in case you want to check that your R-package builds on Windows. This function works by bundling the source package and uploading it to http://win-builder.r-project.org/. Once the build is complete, you will receive a link to the built package at the email address listed in the Maintainer field of the DESCRIPTION file. (In current devtools releases, this function has been replaced by devtools::check_win_devel() and devtools::check_win_release().)
• devtools::check("./MyPack"), to check that the built package is correct.
• devtools::install("./MyPack"), to install the built package in R’s library.
• devtools::uninstall("MyPack"), to uninstall the package from R’s library.


Documentation
Properly documenting all the exposed features in an R-package is extremely important because it helps other users (and even yourself when using it again) to understand what is happening in each piece of code, to know what input parameters are expected, and what one should expect after any execution.

One best practice is to auto-document the source code. This means that all the required information about a function is written directly above its “X <- function(…) {…}” statement.

Since we are using “roxygen2” to auto-generate content and documentation, each new function has to include the documentation mark #' plus the description and documentation tags (e.g., @param, @return) to describe its action.

To indicate that a function should be available to the user, an @export tag is also required. “roxygen2” will collect this information and will modify the NAMESPACE file to set these export rules at the same time as the documentation is generated.

As an example, find below the Weibull PDF that we created earlier in this post but auto-documented:

#' PDF of Weibull distribution
#'
#' This function calculates the probability density for x given 'a'
#' as the shape parameter and 'b' as the scale parameter.
#'
#' @param x Value.
#' @param a Shape parameter.
#' @param b Scale parameter.
#' @return Probability density.
#'
#' @export
MyWpdf <- function(x, a, b) {
  if (x >= 0) (a/b) * (x/b)^(a-1) * exp(-(x/b)^a) else 0
}

To generate the documentation, use the instruction devtools::document("./MyPack") from R. This will modify the NAMESPACE file and generate *.Rd files which will contain manual pages for functions in a “man” folder inside the project.
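
The post documents only the PDF; as a sketch, the CDF could be documented in the same way. The @examples tag is an addition not used elsewhere in this post: “roxygen2” copies its content into the Examples section of the generated manual page.

```r
#' CDF of Weibull distribution
#'
#' Calculates the cumulative probability P(X <= x) given 'a' as the
#' shape parameter and 'b' as the scale parameter.
#'
#' @param x Value (a single number).
#' @param a Shape parameter.
#' @param b Scale parameter.
#' @return Cumulative probability.
#'
#' @examples
#' MyWcdf(1, a = 1, b = 0.5)
#'
#' @export
MyWcdf <- function(x, a, b) {
  if (x >= 0) 1 - exp(-(x / b)^a) else 0
}
```

After running devtools::document("./MyPack") and installing the package, the manual page should be reachable with ?MyWcdf.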

Tests
Testing, along with features and documentation creation, is among the most important aspects of package development. The benefits include (not exclusively):


• Reliability: The feature works as expected because tests demonstrate it. It ensures that any future edits will not “break” earlier features.
• Repeatability: Ensures that the same inputs keep producing acceptable results every time the tests are run.
• Reusability: The tests could be considered as “second documentation”, because they are available with all considered options and parametrizations. So, the user can get inspiration from them.

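To create a set of tests in an R-package, we will use an external package called “testthat”. Use the instruction devtools::use_testthat("./MyPack") to generate the required folder and file structure from templates.
To create a new test file, execute devtools::use_test("MyWeibull", pkg="./MyPack") with a context name to identify this test file. (In devtools 2.0 and later, these helpers have moved to the “usethis” package: usethis::use_testthat() and usethis::use_test("MyWeibull"), called from within the package project.)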

An example of tests for the features included in this post would be:

test_that("Testing my PDF Weibull function", {

  ### INIT
  s <- seq(-1, 1, 0.01)
  weibull_shape <- 1
  weibull_scale <- 0.5

  ### EXECUTION
  ## We want to compare the theoretical PDF and stats::dweibull() results
  myWpdfRes <- sapply(s, FUN=function(x, sh, sc) MyWpdf(x, sh, sc), sh=weibull_shape, sc=weibull_scale)
  dWpdfRes <- sapply(s, FUN=function(x, sh, sc) dweibull(x, sh, sc), sh=weibull_shape, sc=weibull_scale)

  ### CHECK
  ## All values should be equal.
  expect_true(all(round(myWpdfRes, 9) == round(dWpdfRes, 9))) ## TRUE

})

The use of expect_true() is essential in tests because this and many other “expect_*” functions compare obtained results with what was expected and “testthat” considers them as formal test cases.
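
As an illustration of another “expect_*” function, a companion test for the CDF could look like the sketch below. It uses expect_equal(), which compares numbers within a tolerance, so no rounding is needed. The local copy of MyWcdf() is included only so that the snippet runs on its own; inside the package the function comes from R/MyWeibull.R.

```r
library(testthat)

# Local copy of the CDF so the snippet runs on its own;
# inside the package the function comes from R/MyWeibull.R.
MyWcdf <- function(x, a, b) {
  if (x >= 0) 1 - exp(-(x / b)^a) else 0
}

test_that("Testing my CDF Weibull function", {
  s <- seq(-1, 1, 0.01)
  weibull_shape <- 1
  weibull_scale <- 0.5

  # Apply the scalar CDF to every value and compare with stats::pweibull()
  myWcdfRes <- sapply(s, MyWcdf, a = weibull_shape, b = weibull_scale)
  pWcdfRes <- stats::pweibull(s, shape = weibull_shape, scale = weibull_scale)

  # expect_equal() compares with a numeric tolerance
  expect_equal(myWcdfRes, pWcdfRes, tolerance = 1e-9)
})
```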

To evaluate your test set, run devtools::test(), or devtools::test(filter="MyWeibull") to filter by context.

Top Tip: It is worth reading any of the many unit testing tutorials available by searching Google. One good example is the section “Tests” from the book “R-Packages”.

Maintenance and final considerations
Remember that an R-package is a distributable container of R features that can be expanded with additional or improved solutions. After developing, documenting, and testing each change, you will need to run all tests with devtools::test(), regenerate the documentation with devtools::document(), check that all package content and metadata are consistent with devtools::check(), create a distributable R-package with devtools::build(), and install it with devtools::install().
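
The routine described above can be wrapped in a small helper. The function below is a hypothetical convenience wrapper (release_cycle() is not part of “devtools”) and assumes the package lives in ./MyPack; treat it as a sketch rather than a definitive workflow.

```r
# Hypothetical helper (not part of devtools): runs the maintenance steps
# in the order described above. A step that raises an error aborts the
# remaining steps; depending on the devtools version, failing tests may
# only be reported rather than raised as an error.
release_cycle <- function(pkg = "./MyPack") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("The 'devtools' package is required.")
  }
  if (!dir.exists(pkg)) {
    stop("Package folder not found: ", pkg)
  }
  devtools::test(pkg)               # run the unit tests
  devtools::document(pkg)           # regenerate NAMESPACE and man/*.Rd
  devtools::check(pkg)              # check package content and metadata
  tarball <- devtools::build(pkg)   # create the distributable *.tar.gz
  devtools::install(pkg)            # install into the local library
  invisible(tarball)                # path of the built package
}

# Usage, from the folder that contains MyPack:
# release_cycle("./MyPack")
```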

At this point, you should have the introductory knowledge needed to go ahead and create your own package. Since R is a community-driven programming language (or platform, if we consider CRAN), it is highly extensible. So, reading external documentation is very much encouraged. It’s especially worth reading “Writing R extensions” and “R-Packages” among other resources.

From my experience working in the clinical data science field, I believe that it is preferable to create small, distributable R-packages with atomic features rather than overweight “all-in-one” ones. A good rule of thumb would be:
• First write some code.
• If it is needed again, write a function.
• If it is needed again, go for a package creation!

The basic R-package created in this post is publicly available on GitHub:

Click here to access the package.

Feel free to download, clone or fork it to use as template.

We hope you found this a useful exercise! 

Questions? Connect with me on LinkedIn

References
• R (programming language): https://en.wikipedia.org/wiki/R_(programming_language)
• Writing R extensions: https://cran.r-project.org/doc/manuals/r-release/R-exts.html
• “Advanced R”, Hadley Wickham. Ed: CRC press.
• “The Comprehensive R Archive Network” https://cran.r-project.org/
• “Weibull distribution” https://en.wikipedia.org/wiki/Weibull_distribution
• “R-Packages”, Hadley Wickham. Ed: O’Reilly; http://r-pkgs.had.co.nz

 

About the author

Ivan Navarro is a senior computing scientist at Cytel. He focuses on improving the computational performance of analytical tools, developing R-based solutions, and performing survival analysis. He has extensive experience in software development and data analysis in the life sciences domain.
He has acquired specialized skills and earned certifications in the fields of Big Data, data science, machine learning, and high-performance computing. Ivan obtained an MS in bioinformatics and biostatistics from Universitat Oberta de Catalunya.

 

 

 

 

 
