I would like to know how to obtain the list of versions of a certain package from the CRAN repository. From what I have been able to observe, the versions uploaded for each package are kept; taking shiny as an example, we can access this url, where we can see the uploaded versions and their dates, and eventually download any of them. It occurred to me to process the HTML and extract this data, but is there a more direct way?
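For what it's worth, a minimal scraping sketch of the idea (assuming the rvest package is installed and that the CRAN Archive page for a package is a plain HTML table of versions and dates):

```r
# Sketch: read the CRAN archive page for a package and extract the
# table of versions; the URL layout is an assumption to verify.
library(rvest)
url <- "https://cran.r-project.org/src/contrib/Archive/shiny/"
pagina <- read_html(url)
versiones <- html_table(pagina, fill = TRUE)[[1]]
head(versiones)
```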
Patricio Moracho's questions
I need to share a data.frame with a colleague, but I would like to somehow "anonymize" the data. The idea would be:
- Nothing too advanced (I don't need to adhere to any standard or norm)
- Not reversible
- Only for character strings
- Simple and fast
Suppose some data like this:
df <- data.frame(nombre = c('Juan', 'Pedro'),
                 Edad = c(34, 45),
                 dni = c('12345678', '87654321'))
The idea would be to apply the algorithm only to nombre and dni.
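One possible direction (a sketch, assuming the digest package, which is not part of base R) is to apply a one-way hash to each string, which is not reversible and only touches the character columns:

```r
# Sketch: hash every string with a one-way function (md5 here);
# assumes the digest package is installed.
library(digest)
anonimizar <- function(x) {
  vapply(as.character(x), digest, character(1), algo = "md5")
}
df$nombre <- anonimizar(df$nombre)
df$dni    <- anonimizar(df$dni)
```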
Let's say I have a table like the following:
LineId Linea
------- -----------
1 Linea 1
2 Linea 2
3 Linea 3
4 Linea 4
And I'm looking to get an output like this:
LineId Linea
------- -----------
1 Linea 1,
2 Linea 1, Linea 2,
3 Linea 1, Linea 2, Linea 3,
4 Linea 1, Linea 2, Linea 3, Linea 4,
I think the idea is clear: concatenate each line cumulatively, in the order given by LineId
(this is fundamental). A very crude way to solve it would be something like this:
DECLARE @Temporal VARCHAR(8000)
UPDATE #Ejemplo
SET @Temporal = ISNULL(@Temporal,'') + Linea + ', ',
Linea = @Temporal
FROM #Ejemplo
But this is where I wonder: since tables have no natural order, and since you cannot add an ORDER BY
to an UPDATE statement
(or at least I don't know how to do it), the statement above, which works fine in the example, does not guarantee the update order. It might just as well produce something like this:
LineId Linea
------- -----------
1 Linea 1,
2 Linea 1, Linea 3, Linea 2,
3 Linea 1, Linea 3,
4 Linea 1, Linea 3, Linea 2, Linea 4,
Additional details:
- The example is a minimal test of the idea of the problem; here it will probably always work fine
- In reality I have a similar case in legacy code, which processes sequential text files
- Erratically, cases are detected where the insertion order is not maintained
- I am clear that there is no "insertion order"; I am not complaining about the behavior of SQL Server, it is expected. That said, this behavior started to show up after upgrading the engine version (from 2008 to the next one)
- The solution that I don't like, but that works, is to use a cursor and update row by row
- I would like to know if there is a more elegant or natural way to solve it
- So far I have tried, without much success: a) adding an identity column to the table to represent the order, to see if the engine uses it by default; b) going through the generation of an XML, but so far I have not achieved the expected result
To reproduce the data:
CREATE TABLE #Ejemplo (
LineId INT IDENTITY,
Linea VARCHAR(8000)
)
INSERT INTO #Ejemplo(Linea)
VALUES ('Linea 1'), ('Linea 2'), ('Linea 3'), ('Linea 4')
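For reference, one set-based alternative I am aware of (a sketch, not necessarily the only way) is a correlated subquery with FOR XML PATH, where the order is explicit and does not depend on any UPDATE trick:

```sql
-- Sketch: build the cumulative string per row with a correlated
-- subquery; the ORDER BY over LineId inside it guarantees the order.
SELECT e.LineId,
       (SELECT e2.Linea + ', '
          FROM #Ejemplo e2
         WHERE e2.LineId <= e.LineId
         ORDER BY e2.LineId
           FOR XML PATH('')) AS Linea
  FROM #Ejemplo e
 ORDER BY e.LineId;
```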
There is a CASE
behavior that has always raised a certain doubt in me. Normally, when I see code like this:
case when id = 1 then 1 else '' end
I usually change it to something like this:
case when id = 1 then 1 else 0 end
Or else:
case when id = 1 then 1 else NULL end
That is to say, I am generating a column that, a priori, should have a numeric value, so it does not seem consistent for the ELSE
to return a string, even if it is a "blank"; it seems more consistent to me to return a numeric value or, failing that, a NULL
.
However, this code is fully functional, and the blank that is returned is somehow coerced to a 0:
select id,
case when id = 1 then 1 else '' end,
case when id = 2 then 1 else '' end
from (select 1 as id union
select 2
) T
+---+---+---+
| 1 | 1 | 0 |
+---+---+---+
| 2 | 0 | 1 |
+---+---+---+
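The coercion itself can be observed directly: in SQL Server, an empty string converted to int yields zero:

```sql
-- The empty string implicitly converts to int as 0:
SELECT CAST('' AS INT);  -- returns 0
```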
However, if instead of a blank string ''
we return some other string:
select id,
case when id = 1 then 1 else 'no' end,
case when id = 2 then 1 else 'no' end
from (select 1 as id union
select 2
) T
Msg 245, Level 16, State 1, Line 73
Conversion failed when converting the varchar value 'no' to data type int.
What is this behavior due to? Is this documented somewhere?
I have a machine with Windows 10 that came with an ancient version of Java pre-installed: Java 2 Runtime Environment, SE v1.4.2_04. Besides the fact that I don't need it, there is the issue of the security hole that leaving it installed implies. Since the "Add or Remove Programs" option did not work, I have tried to remove it in several other ways, all without success:
- JavaRa, which is recommended in multiple forums and certainly automates many Java maintenance tasks
- Wise Program Uninstaller, as well as several of the many other uninstall tools
In general we are used to seeing that collection-type objects (lists, arrays, matrices, recordsets, or whatever they are called in each language) are "indexed" starting at position 0. However, in R
, by some design decision, all objects (in fact, in R
there are no scalar objects) are "indexed" starting at 1.
> vector <- c(1,2,3)
> vector[1]
[1] 1
> vector[0]
numeric(0)
Historically speaking, what motivated this decision? Does it have any particular advantage over "indexing" starting at 0?
I am studying the igraph
package to solve a problem. Let's say I have the following topology:
From each node there are possible paths; you always have to follow the direction of the arrows and cannot go the opposite way. The nodes have an order: you can go from h4
to h5
but not to h3
. Anyway, this is only informative, because the topology already takes it into account. Nodes have an attribute, represented in this example by color.
Finally, what I am looking for is to find at least one path (ideally all of them), as short as possible, such that, starting at any point, I can make sure I pass through the three "colors" (attributes) at least once.
Example: h1 -> h6 -> h9
is ideal, since I pass through the three colors in 3 steps, but h1 -> h3 -> h4 -> h6
could also be valid: I repeat one of the colors but pass through all three.
To reproduce this topology:
library(igraph)
nodos <- structure(list(Hito = structure(1:9, .Label = c("h1", "h2", "h3",
"h4", "h5", "h6", "h7", "h8", "h9"), class = "factor"), tipo = structure(c(1L,
2L, 3L, 1L, 3L, 2L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
topology <- structure(list(Node.1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L,
7L, 8L), .Label = c("h1", "h2", "h3", "h4", "h5", "h6", "h7",
"h8"), class = "factor"), Node.2 = structure(c(1L, 3L, 4L, 6L,
7L, 1L, 2L, 3L, 5L, 7L, 2L, 4L, 6L, 4L, 3L, 6L, 7L, 4L, 5L, 6L,
5L, 7L, 6L, 7L, 7L), .Label = c("h3", "h4", "h5", "h6", "h7",
"h8", "h9"), class = "factor")), class = "data.frame", row.names = c(NA,
-25L))
g <- graph.data.frame(topology, vertices=nodos, directed=TRUE)
V(g)$color <- c("#006699", "#CC0000", "#009933")[as.numeric(factor(V(g)$tipo))]
plot.igraph(g,
vertex.size = 20,
edge.arrow.size = 0.5,
vertex.label.font=2,
vertex.label.color="gray85",
vertex.label.cex=1.4,
edge.color="gray45",
layout=layout.kamada.kawai)
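As a sketch of one possible approach (using the g and nodos objects defined above; not necessarily the shortest-path answer being asked for): enumerate the simple paths out of a starting node and keep those that visit all three types at least once.

```r
# Sketch: all simple paths starting at h1, filtered to those that
# cover the three values of the "tipo" vertex attribute.
paths <- all_simple_paths(g, from = "h1")
cubren <- Filter(function(p) length(unique(V(g)$tipo[p])) == 3, paths)
```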
Let's assume we have a data.frame
like the following:
set.seed(2019)
datos <- data.frame(ANO1=sample(1:10, 10, replace = TRUE),
ANO2=sample(1:10, 10, replace = TRUE))
datos
ANO1 ANO2
1 8 8
2 8 7
3 4 3
4 7 2
5 1 7
6 1 7
7 9 1
8 1 8
9 2 4
10 7 5
What I am looking for is to create a matrix with the logical value of comparing whether the two columns are less than a certain set of numbers, say the range 1 to 10. Taking the first row as a reference, I would like to obtain something like this:
ANO1 ANO2 1 2 3 4 5 6 7 8 9 10
1 8 8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
In this case, both values, 8 and 8, are less than 9 and 10, but not less than the rest of the values.
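The comparison above can be sketched, for example, by iterating over the thresholds with sapply() and binding the resulting logical matrix to the data:

```r
# Sketch: one logical column per threshold, TRUE when both ANO1 and
# ANO2 are below it. Note that cbind into a data.frame may prefix the
# numeric column names with "X".
comp <- sapply(1:10, function(n) datos$ANO1 < n & datos$ANO2 < n)
colnames(comp) <- 1:10
resultado <- cbind(datos, comp)
```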
I am plotting a histogram from a fairly large data set using geom_histogram()
, and I have noticed that as its "definition" increases, by increasing the number of bins
, the result gets slower and slower. The ratio with a base R histogram is at least 10 to 1 in time. Example:
library("ggplot2")
library("microbenchmark")
set.seed(2019)
x <- rnorm(100000)
df <- data.frame(x=x)
ggplot_hist <- function(data, bins=100000){
  print(ggplot(data, aes(x=x)) + geom_histogram(bins=bins))
}
base_hist <- function(x, breaks=100000){
  print(hist(x, breaks=breaks))
}
microbenchmark(
base_hist(x),
ggplot_hist(df),
times=3L
)
Unit: seconds
expr min lq mean median uq max neval
base_hist(x) 4.503556 4.632358 4.680143 4.761159 4.768436 4.775713 3
ggplot_hist(df) 56.330033 57.249490 60.182923 58.168946 62.109369 66.049791 3
Is there a way to optimize a histogram in ggplot?
I would like to build a function like the following:
function(x, y, operador)
The idea is that it receives an operator, for example: +
, -
, *
or /
, and can return the result of applying it to x
and y
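As a sketch of the idea: in R the operators are ordinary functions, so match.fun() can resolve one from its name and the body reduces to a single call.

```r
# Sketch: resolve the operator by name and apply it to x and y.
operar <- function(x, y, operador) {
  f <- match.fun(operador)
  f(x, y)
}
operar(2, 3, "+")   # 5
operar(10, 4, "-")  # 6
```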
I need to build a data.frame
with words from the Spanish language (or at least a significant number of them); the idea is to use them later to "clean" other data.frames
, removing patterns that do not correspond to valid words.
There is a resource from the RAE, the Reference Corpus of Current Spanish (CREA): a set of some 140,000 documents made up of books, press material and others. The mentioned document also talks about a Frequent Forms Report, and I am particularly interested in working with the Total List of Frequencies, which, as I understand it, is a complete list of the words of this corpus ordered by frequency.
The most specific question is: how can I incorporate this resource into a data.frame
? And the more general one: is this a valid resource for what I'm looking for?
I have a monthly sales table like the following:
create table ventas (
id int NOT NULL AUTO_INCREMENT,
year int,
month int,
monto numeric(15,2),
PRIMARY KEY (id)
);
insert into ventas (year, month, monto) values (2018, 1, 100);
insert into ventas (year, month, monto) values (2018, 1, 300);
insert into ventas (year, month, monto) values (2018, 3, 340);
insert into ventas (year, month, monto) values (2018, 5, 200);
insert into ventas (year, month, monto) values (2018, 5, 100);
insert into ventas (year, month, monto) values (2018, 7, 100);
insert into ventas (year, month, monto) values (2018, 8, 100);
insert into ventas (year, month, monto) values (2018, 9, 200);
insert into ventas (year, month, monto) values (2018,11, 350);
insert into ventas (year, month, monto) values (2018,12, 440);
I am trying to make a report of these sales per month; I tried it like this:
select year,
month,
sum(monto) as total
from ventas
group by year,
month
And I get something like this:
| year | month | total |
|------|-------|-------|
| 2018 | 1 | 400 |
| 2018 | 3 | 340 |
| 2018 | 5 | 300 |
| 2018 | 7 | 100 |
| 2018 | 8 | 100 |
| 2018 | 9 | 200 |
| 2018 | 11 | 350 |
| 2018 | 12 | 440 |
Which is correct, but as you can see there are "gaps", that is, months without values. I would like a report with all 12 months, filling those that had no sales with a 0, something like this:
| year | month | total |
|------|-------|-------|
| 2018 | 1 | 400 |
| 2018 | 2 | 0 |
| 2018 | 3 | 340 |
| 2018 | 4 | 0 |
| 2018 | 5 | 300 |
| 2018 | 6 | 0 |
| 2018 | 7 | 100 |
| 2018 | 8 | 100 |
| 2018 | 9 | 200 |
| 2018 | 10 | 0 |
| 2018 | 11 | 350 |
| 2018 | 12 | 440 |
Important
It may not be sales; it could be some other type of data. What matters is the conceptual problem: what do we do when information is missing from a table? When we do not have sensor readings for every hour, when there are accounting accounts that register no movements in certain months, when we want to list the sales of all the branches but some branches have had no sales, when we want to know how many people occupied a room but some rooms have never been occupied, and so on.
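The usual shape of a solution (sketched here for MySQL against the ventas table above) is to generate the missing rows with a calendar-like derived table and LEFT JOIN the real data onto it, filling the gaps with 0:

```sql
-- Sketch: a derived table with the twelve months, LEFT JOINed
-- against the sales; months with no rows come out as 0.
SELECT 2018 AS year,
       m.month,
       COALESCE(SUM(v.monto), 0) AS total
FROM (SELECT 1 AS month UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
      UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12) m
LEFT JOIN ventas v
       ON v.year = 2018 AND v.month = m.month
GROUP BY m.month
ORDER BY m.month;
```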
repl.it
is an excellent, useful and, for now, free tool for executing R code online. However, I have difficulties using non-base R packages; in fact, when I try to install something, the following happens:
install.packages("vegan")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("vegan") :
'lib = "/usr/local/lib/R/site-library"' is not writable
Error in install.packages("vegan") : unable to install packages
In the documentation, it is only mentioned that it is possible to install other libraries or packages in the case of Python
, Javascript
or Ruby
.
Is there a way to be able to use packages outside of the base distribution in this tool?
Surely you have already encountered a problem like the following:
> (2.3 - 1.8) == 0.5
[1] FALSE
> sqrt(2)^2 == 2
[1] FALSE
The explanation of the general problem of handling floating point numbers can be found here: Why can't my programs do arithmetic calculations correctly? .
This is not a problem particular to R but one of any language that handles floating point numbers.
Now, how can we resolve or handle these "inconsistencies" in the language when making comparisons?
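For example, the usual direction is to compare with a tolerance instead of exact equality, either with all.equal() or an explicit epsilon:

```r
# Both comparisons from above succeed once a tolerance is used.
isTRUE(all.equal(2.3 - 1.8, 0.5))              # TRUE
abs(sqrt(2)^2 - 2) < .Machine$double.eps^0.5   # TRUE
```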
In every language there are clauses to control the flow of execution; in R in particular I am talking about if/else
, while
and repeat
. These are not very different from those in any other language: they evaluate a condition and, depending on whether it is TRUE/FALSE
, decide which branch execution takes. But in R there is a small yet important difference.
Being a purely vector-based language, there is no "scalar" data in R: although there are different data types, a value can only exist inside a "container" (the most elementary being the vector). When we do a = 1
in any other language, we are assigning space to store a single integer value; in R it is the same, but with a subtle difference: a vector of type integer is created, with a single element.
The same thing happens when evaluating conditions: the result is not a scalar TRUE/FALSE
but a boolean vector. However, flow control, as in the rest of the languages, is clearly "scalar": a single TRUE/FALSE
determines the flow to follow. So: how does the language reconcile the need for a single value for the evaluation, when in reality the language does not have scalars?
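For illustration, the usual way to condense a logical vector into the single value that if() expects is any() or all():

```r
# any()/all() reduce a logical vector to one TRUE/FALSE.
x <- c(1, 5, 10)
if (any(x > 8)) print("at least one element is greater than 8")
if (all(x > 0)) print("all elements are positive")
```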
Assuming I have a Django model like the following:
class Comprobante(models.Model):
    punto_venta = models.IntegerField(blank=True)
And I want to validate the model and particularly that punto_venta
it is a value from 1 to 9999. I understand that there are two ways:
Use a validator in the field
from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _

def punto_venta_validate(value):
    if not value:
        raise ValidationError(_('El punto de venta es obligatorio'))
    if value < 1 or value > 9999:
        raise ValidationError(_('El punto de venta debe ser un valor entre 1 y 9999'))

class Comprobante(models.Model):
    punto_venta = models.IntegerField(blank=True, validators=[punto_venta_validate])
Validate in the model's clean()
method:
def clean(self):
    punto_venta_validate(self.punto_venta)
The only visible difference is that, when the field is validated through validators
, the ValidationError
message is shown next to the field in the admin interface, whereas when we validate in clean()
, I see that the error appears on all the fields. Eventually, in clean()
we could also validate multiple conditions and add each error to a list, and in this way show all the errors for each field, so there would not be a difference between the two methods there either. So: what is the difference between the two methods? Is it just a matter of how the ValidationError
is displayed, or is there something else I am missing?
I have a data.frame
with a certain structure:
ucba <- data.frame(UCBAdmissions)
ucba
Admit Gender Dept Freq
1 Admitted Male A 512
2 Rejected Male A 313
3 Admitted Female A 89
4 Rejected Female A 19
5 Admitted Male B 353
6 Rejected Male B 207
7 Admitted Female B 17
8 Rejected Female B 8
9 Admitted Male C 120
10 Rejected Male C 205
11 Admitted Female C 202
12 Rejected Female C 391
13 Admitted Male D 138
14 Rejected Male D 279
15 Admitted Female D 131
16 Rejected Female D 244
17 Admitted Male E 53
18 Rejected Male E 138
19 Admitted Female E 94
20 Rejected Female E 299
21 Admitted Male F 22
22 Rejected Male F 351
23 Admitted Female F 24
24 Rejected Female F 317
And I would like to reformulate it to the following form:
Dept Male/Admitted Male/Rejected Female/Admitted Female/Rejected
1 A 512 313 89 19
2 B 353 207 17 8
3 C 120 205 202 391
4 D 138 279 131 244
5 E 53 138 94 299
6 F 22 351 24 317
Basically:
- We group by department (Dept
)
- We summarize in columns the values of acceptance/rejection (Admit
) and gender (Gender
)
- The final output should be another data.frame
, and the column names should be self-explanatory
I have researched various options (aggregate
and xtabs
) which so far have not entirely convinced me.
When loading a package there are two ways to do it: library()
and require()
. What differences, if any, are there between the two methods?
Free translation and reworking of: What is the difference between require() and library()?
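A small sketch of the behavioral difference: require() returns a logical and only warns when the package is missing, while library() stops with an error.

```r
# require() warns and returns FALSE for a missing package...
ok <- require("paqueteInexistente")
ok  # FALSE
# ...while library("paqueteInexistente") would stop with an error.
```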
Whether it is when asking a question on this site or when we need to share an example with a colleague, what elements should we take into account to ensure the reproducibility of the example? (information, data, structures, etc.)
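As a minimal sketch of the usual ingredients: a fixed seed so random data is the same for everyone, dput() to serialize the exact structure, and sessionInfo() to document the environment.

```r
# Fixed seed: the "random" data can be regenerated identically.
set.seed(1)
df <- data.frame(x = rnorm(3))
# dput() prints R code that recreates the object exactly.
dput(df)
# sessionInfo() documents R and package versions.
sessionInfo()
```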
A few days ago I saw a question that, among other things, involved a problem similar to the one I am going to pose. I wanted to state it in a more general way, because I understand that a well-posed solution could serve as a reference for similar problems. Perhaps for some of you the answer is trivial or obvious, but in my case, only after chewing on it long enough did I find (or at least I think so) that it was simpler than I thought. I pose it in SQL, but it could just as well be about algorithms; the point is that it seemed more practical to be able to test the solutions.
Suppose the following example:
CREATE TABLE A (
NRO_DESDE INT,
NRO_HASTA INT
)
CREATE TABLE B (
NRO_DESDE INT,
NRO_HASTA INT
)
INSERT INTO A (NRO_DESDE, NRO_HASTA)
VALUES (5, 8)
INSERT INTO B (NRO_DESDE, NRO_HASTA)
VALUES (1, 2), (4, 5), (5, 8), (6, 7), (7, 9), (4, 10), (9, 11)
SELECT NRO_DESDE, NRO_HASTA FROM A;
SELECT NRO_DESDE, NRO_HASTA FROM B;
The table A
has a single row:
NRO_DESDE NRO_HASTA
========= =========
5 8
The table B
:
NRO_DESDE NRO_HASTA
========= =========
1 2
4 5
5 8
6 7
7 9
4 10
9 11
The tables A
and B
represent sets of intervals, but we do not have all the values; we only know the first and last element of each set. The idea is to compare the single set in A
with all of those in B
and determine whether they share any element. As an example, the record (4, 5)
of B
shares the 5
with A
, the (1, 2)
does not share any element, and the (7, 9)
shares 7 and 8
. The result would then be the records of B
that have elements in common with A
; it is not important to know which they are, just that they exist. We can also assume that the number of elements in each set is relatively manageable. Don't worry about the missing primary keys; it's just a conceptual example.
Note: The code is built in SQL Server but could be resolved in any "flavor" of SQL.
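For reference, the overlap test itself can be sketched in one join: two closed intervals share at least one element exactly when each one starts at or before the point where the other ends.

```sql
-- Sketch: returns the rows of B whose interval overlaps the one in A.
-- With the data above: (4,5), (5,8), (6,7), (7,9), (4,10) match;
-- (1,2) and (9,11) do not.
SELECT B.NRO_DESDE, B.NRO_HASTA
FROM B
JOIN A
  ON A.NRO_DESDE <= B.NRO_HASTA
 AND B.NRO_DESDE <= A.NRO_HASTA;
```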