
Question

Patricio Moracho

Asked: 2020-10-14 12:40:15 +0800 CST

What to consider in R to build a reproducible example?


Whether it is when asking a question on this site or when we need to share an example with a colleague, what elements should we take into account to ensure the reproducibility of the example? (information, data, structures, etc.)

1 Answer


Patricio Moracho · Answer 1 · 2020-10-14T12:40:15+08:00

What follows is a translation and light adaptation of Joris Meys's excellent answer on the English site.

A good minimal and reproducible example should consist of the following elements:

a minimal data set, just enough to reproduce the error or make the question understandable
the minimal runnable code needed to reproduce the error or illustrate the question, which can be run on the given data set
the necessary information about the packages used, the R version, and the system it runs on
for random processes, a seed (normally set with set.seed())

It is often useful to look at the examples in the help files of the functions you use. In general, all the code given there meets the requirements of a minimal reproducible example: data is provided, the code is minimal, and everything is runnable.
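As a rough sketch of how the four elements listed above can fit together in one post, the snippet below fixes a seed, builds a tiny data set, and runs a single call. The object names (df, fit) and the choice of lm() are illustrative assumptions, not part of the original answer.

```r
# Seed first, so the "random" data is identical on every run
set.seed(123)

# Minimal data: only the two columns the question needs
df <- data.frame(
  x = 1:10,
  y = rnorm(10)
)

# Minimal code: the single call being asked about (lm() is just a stand-in)
fit <- lm(y ~ x, data = df)
print(coef(fit))

# Environment details that readers usually ask for
print(R.version.string)
```

Anyone who pastes this into a fresh session should get the same values for df$y, because the seed is set before the random draw.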

Producing a minimal data set

In most cases, this is easily done by providing a vector, data frame, matrix, etc. with a few example values. Alternatively, you can point to one of the built-in datasets that ship with most packages. A full list of the built-in datasets can be viewed with the command library(help = "datasets"). Each dataset has a brief description, and more information can be obtained with, for example, ?mtcars, where mtcars is one of the listed datasets. Other packages may contain additional datasets.

Creating a vector is easy. Sometimes you need to add some randomness, and there are plenty of functions for that. sample() can shuffle a vector or draw a random vector from only a few values. letters is a useful vector containing the alphabet, which can be used to build factors.

Some examples:

random values: x <- rnorm(10) for a normal distribution, x <- runif(10) for a uniform distribution, ...
a permutation of some values: x <- sample(1:10) for the vector 1:10 in random order.
a random factor: x <- sample(letters[1:4], 20, replace = TRUE)

For matrices, you can use matrix(), for example:

matrix(1:10, ncol = 2)

Data frames can be created with data.frame(). Keep it simple: do not make the data frame more complicated than necessary, and do not add variables that will not be used.

An example:

Data <- data.frame(
    X = sample(1:10),
    Y = sample(c("yes", "no"), 10, replace = TRUE)
)

In some cases, the specific type of each variable/column needs to be preserved. For that, you can use any of the as.<SomeType> functions, such as as.factor, as.Date, as.xts, etc.
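A minimal sketch of such conversions, assuming a small data frame whose columns arrive as plain character strings. The object and column names (raw, group, day) are hypothetical and chosen only for illustration.

```r
# Raw data as it often arrives: everything stored as character
raw <- data.frame(
  group = c("a", "b", "a"),
  day   = c("2020-10-14", "2020-10-15", "2020-10-16"),
  stringsAsFactors = FALSE
)

# Convert each column to the type the real data uses
raw$group <- as.factor(raw$group)
raw$day   <- as.Date(raw$day, format = "%Y-%m-%d")

# str() shows the resulting column types at a glance
str(raw)
```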

Copying your own data

If you have data that would be too difficult to construct with these methods, or that is needed to understand the problem (e.g. to diagnose a problem converting a date from a string, you have to "see" the format of the actual data, not an example that is surely correct), you can always take a subset of your original data, using for example head(), subset(), or indices. Then use dput() to give us something that can be pasted into R immediately:

> dput(head(iris,4))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 
3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 
0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = c("setosa", 
"versicolor", "virginica"), class = "factor")), .Names = c("Sepal.Length", 
"Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
4L), class = "data.frame")
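To illustrate why dput() output is convenient for the reader, here is a small sketch of the round trip: the printed text is captured and then evaluated to rebuild the object. Using capture.output() with eval(parse()) only simulates the copy-and-paste step; the name rebuilt is an assumption for this example.

```r
# Capture what dput() prints, as if it had been copied from a post
txt <- capture.output(dput(head(iris, 4)))

# Evaluate that text to rebuild the object, as a reader would by pasting it
rebuilt <- eval(parse(text = txt))

# The rebuilt object has the same shape and column types as the original
print(dim(rebuilt))
print(sapply(rebuilt, class))
```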

In some cases, a data frame has many columns stored as factors. Taking a subset() or a head() gives a smaller sample, but it still carries along the factor levels that the sample does not use. In these cases, you can drop the unused levels with droplevels(), for example:

> dput(droplevels(head(iris, 4)))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 
3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 
0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = "setosa",
class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", 
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
4L), class = "data.frame")

Note that Species now has only one level, .Label = "setosa", because that is the only value present in the sample: Species = structure(c(1L, 1L, 1L, 1L), ...)
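The effect can also be checked directly, without reading the dput() output. This is a short sketch on the built-in iris data; the names small and trimmed are assumptions made for this example.

```r
small <- head(iris, 4)

# The subset still carries all three species levels
print(levels(small$Species))

# droplevels() keeps only the levels that actually occur in the data
trimmed <- droplevels(small)
print(levels(trimmed$Species))
```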

Another caveat is that dput() does not work for keyed data.table objects or for grouped tbl_df objects (class grouped_df from dplyr). In these cases, convert the object to a plain data frame before sharing it: dput(as.data.frame(my_data)).

In the worst case, you can give a text representation that can be read with the text argument of read.table():

zz <- "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa"

Data <- read.table(text = zz, header = TRUE)

If the data cannot practically be shared in any of the ways above, consider using an external service. For example, pastebin.com can be used for up to 0.5 MB: d <- read.table("http://pastebin.com/raw.php?i=m1ZJuKLH"). Remember that you can also save any object with save(df, file = "archivo.Rda") and then load it with load("archivo.Rda").
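A minimal sketch of that save-and-load round trip, using save() and load() from base R. The temporary path from tempfile() is an assumption made so the example leaves nothing behind; in practice you would use a file name of your own.

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# tempfile() is used here only so the sketch cleans up after itself
path <- tempfile(fileext = ".Rda")
save(df, file = path)

# Remove the object, then restore it under its original name
rm(df)
load(path)
print(df)

# Delete the file once it is no longer needed
unlink(path)
```

Note that load() restores the object under the name it had when it was saved, so the reader gets df back without assigning anything.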

If the data cannot be shared at all, you should at least describe the structure and class of the objects. These calls usually provide the relevant information:

dim(df)
class(df)
typeof(df)
attributes(df)
length(df)
str(df)
head(df)

Sharing the minimal code

This should be the easy part, but it often isn't. What you shouldn't do is:

add all kinds of data conversions. Make sure the supplied data is already in the correct format (unless that is the problem, of course)
copy-paste an entire function or block of code that gives an error. First try to locate exactly which lines cause the error. More often than not, you will find out what the problem is yourself.

What you should do is:

add a simple and concise explanation of what your code does, what it is expected to do, and what it actually does. When the result is data, a sample of the expected output is much clearer than any explanation.
load the packages that are used, if any. E.g.: library("randomForest") or require("ggplot2").
if you open connections or create files, add some code to close them or delete the files when done (using unlink())
if you change options, make sure the code contains a statement to revert them to the original values (for example: op <- par(mfrow = c(1, 2)) ... some code ... par(op))
use concise, descriptive variable names and comments in the code; both are good practice when sharing code.
test your code in a new, empty R session to make sure it runs. We often do not realize, when sharing code, that certain variables are already initialized in our session; testing the code in a new session will reveal this. People should be able to copy and paste your data and code into the console and get exactly what you get.
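The clean-up points in the list above can be sketched as follows. This example uses options() rather than par() so that no graphics device is opened; the scratch file and the variable names (op, tmp, con, first_line) are assumptions for illustration.

```r
# Change an option and keep the previous value
op <- options(digits = 3)
print(pi)

# Create a scratch file, read from it, then clean up
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 3), tmp)
con <- file(tmp, open = "r")
first_line <- readLines(con, n = 1)
close(con)      # close the connection we opened
unlink(tmp)     # delete the file we created

# Restore the option to what it was before
options(op)
```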

Giving additional information

In most cases, just the version of R and the operating system will suffice. When conflicts between packages arise, the output of sessionInfo() can be of great help. When connections to other applications are involved (whether via ODBC or anything else), version numbers for these should also be provided, along with the relevant configuration details if possible.

If you are running R in RStudio, rstudioapi::versionInfo() can be useful for reporting your RStudio version.

If you have a problem with a specific package, you may want to provide its version by giving the output of packageVersion("package name").
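A short sketch of collecting this information with base R only. The package name "stats" is used as a stand-in because it ships with R; replace it with the package you are actually asking about.

```r
# R version and platform, one line each
print(R.version.string)
print(R.version$platform)

# Version of one specific package ("stats" is only a stand-in here)
print(packageVersion("stats"))

# Full report: OS, locale, attached and loaded packages
print(sessionInfo())
```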

Reprex

This package, which you can install with install.packages("reprex") or which, if you use RStudio, is already available as an addin, does something very simple and tremendously useful. Let's say you have code like the following:

plot(runif(100))

Select it, copy it to the clipboard, and call reprex::reprex(), or in RStudio go to Addins -> Reprex selection, and it will generate the complete code to paste as an example, for instance here on SOes.

``` r
plot(runif(100))
```

![](https://i.imgur.com/hUAOsNc.png)

<sup>Created on 2019-05-21 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

Which would end up being:

plot(runif(100))

Created on 2019-05-21 by the reprex package (v0.3.0)
