I have the following data:
datos <- read.table(text = 'col1
"Casa Verde CA/BBE/565655/15"
"Casa CA/BBE/2345/15 VERDE"
"Casa Verde _CA/BBE/122281/15_B"
"Casa CA/ABC/1281/2015"
"Casa Azul 5_CA/AAA/1281/2017_B2"
"Casa Verde_5_CA/CCC/12/17"
', header = TRUE, stringsAsFactors = FALSE)
data:
col1
1 Casa Verde CA/BBE/565655/15
2 Casa CA/BBE/2345/15 VERDE
3 Casa Verde _CA/BBE/122281/15_B
4 Casa CA/ABC/1281/2015
5 Casa Azul 5_CA/AAA/1281/2017_B2
6 Casa Verde_5_CA/CCC/12/17
The last number is the year. If it is written with two digits (15), I want it changed to four digits (2015).
The structure to look for is:
2 or more letters / 3 letters / 2 or more digits / 2 digits
and the final two digits should be prefixed with 20 (15 -> 2015).
Anything before or after that structure can be discarded. The result would look something like this:
  col1                             col2
1 Casa Verde CA/BBE/565655/15      CA/BBE/565655/2015
2 Casa CA/BBE/2345/15 VERDE        CA/BBE/2345/2015
3 Casa Verde _CA/BBE/122281/15_B   CA/BBE/122281/2015
4 Casa CA/ABC/1281/2015            CA/ABC/1281/2015
5 Casa Azul 5_CA/AAA/1281/2017_B2  CA/AAA/1281/2017
6 Casa Verde_5_CA/CCC/12/17        CA/CCC/12/2017
I have tried with
datos$col2 <- stringr::str_replace(datos$col1,"(\\w{2}\\/\\w{3}\\/\\d{2,}\\/)(\\d{2})","\\120\\2")
But 2015 becomes 202015.
If I try with $
datos$col2 <- stringr::str_replace(datos$col1,"(\\w{2}\\/\\w{3}\\/\\d{2,}\\/)(\\d{2}$)","\\120\\2")
then the rows with text after the year are left unchanged. Any ideas?
Warning: I have never programmed in R.
Search Regex:
Replacement Regex:
Result:
As you can see, I am using named capture groups to make it more evident that the 20 is not part of the captured text. I also add a non-capturing negative assertion, which states that what follows it must not be present, so that years already written with 4 digits are ignored. Finally, the $ end-of-line anchor makes sure the match fails early instead of trying unnecessary matches against the rest of the text.
I understand that the final code should be something like:
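The original snippet did not survive here, so what follows is only a minimal sketch in the same spirit, not the answer's actual regex. It assumes base R's sub() with perl = TRUE, uses numbered groups instead of named ones (base R replacements only accept \\1 to \\9), and assumes every four-digit year starts with 20. The variable name patron is hypothetical.

```r
# Sample data from the question
datos <- data.frame(col1 = c(
  "Casa Verde CA/BBE/565655/15",
  "Casa CA/BBE/2345/15 VERDE",
  "Casa Verde _CA/BBE/122281/15_B",
  "Casa CA/ABC/1281/2015",
  "Casa Azul 5_CA/AAA/1281/2017_B2",
  "Casa Verde_5_CA/CCC/12/17"
), stringsAsFactors = FALSE)

# Assumed pattern, built piece by piece for readability
patron <- paste0(
  "^.*?",                                   # leading text, as little as possible
  "([A-Za-z]{2,}/[A-Za-z]{3}/[0-9]{2,}/)",  # group 1: letters/3 letters/digits/
  "(?:20)?",                                # optional century, matched but not captured
  "([0-9]{2})",                             # group 2: the two-digit year
  "(?![0-9])",                              # negative lookahead: no further digit
  ".*$"                                     # trailing text, dropped by the replacement
)

# "\\1" is group 1, "20" is literal text, "\\2" is group 2
datos$col2 <- sub(patron, "\\120\\2", datos$col1, perl = TRUE)
print(datos)
```

Letters are matched with [A-Za-z] rather than \\w because \\w also accepts digits and underscores, which would pull prefixes such as Verde_5_ into the code. Rows that do not match the pattern at all are returned unchanged by sub().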
If the year always occurs in the last piece of the string, taking the slash (/) as the separator, I would work only with that smaller portion of the string; this way it is probably simpler and safer to apply regular expressions. Working only with this piece of data, and assuming that 4 digits in a row are the year and that 2 digits mean the same thing counted from 2000, we could do something like this:
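The code for this answer is also missing, so this is a hedged sketch of the idea rather than the original: split at the last slash, normalise the digits that follow it, and then rebuild the code. It assumes base R only; the names tail_part, year_raw, head_part and code are hypothetical.

```r
# Sample data from the question
datos <- data.frame(col1 = c(
  "Casa Verde CA/BBE/565655/15",
  "Casa CA/BBE/2345/15 VERDE",
  "Casa Verde _CA/BBE/122281/15_B",
  "Casa CA/ABC/1281/2015",
  "Casa Azul 5_CA/AAA/1281/2017_B2",
  "Casa Verde_5_CA/CCC/12/17"
), stringsAsFactors = FALSE)

# Text after the last slash, e.g. "15", "15 VERDE", "2017_B2"
tail_part <- sub("^.*/", "", datos$col1)

# Leading run of digits in that piece
year_raw <- sub("^([0-9]+).*$", "\\1", tail_part)

# Assumption: two digits mean 20xx; four digits are kept as they are
year <- ifelse(nchar(year_raw) == 2, paste0("20", year_raw), year_raw)

# Everything up to and including the last slash
head_part <- sub("[^/]*$", "", datos$col1)

# Keep only the code at the end of that piece, dropping the leading text
code <- sub("^.*?([A-Za-z]{2,}/[A-Za-z]{3}/[0-9]{2,}/)$", "\\1",
            head_part, perl = TRUE)

datos$col2 <- paste0(code, year)
print(datos)
```

Because the year is handled separately from the rest of the string, the two-digit versus four-digit decision becomes a plain length check instead of a lookahead.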