I have a table Peticiones whose layout is:
NPeticion - nvarchar
FechaApertura - date
TipoPeticion - nvarchar
FechaResolucion - nvarchar
...
(more fields, irrelevant to this issue)
The issue is that I am trying to write a query that removes duplicate records sharing the same NPeticion, FechaApertura and TipoPeticion, keeping the record whose FechaResolucion is the most recent and is not 'NULL'. If all the records in a group are 'NULL', I would keep any one of them.
I made this query: (possibly not the most elegant/efficient but it's what I came up with)
;with aBorrar as
(
    select [NPeticion], [FechaApertura], [TipoPeticion], [FechaResolucion],
           row_number() over(partition by [NPeticion], [FechaApertura]
                             order by [FechaResolucion] desc) rn
    from [dbo].[Peticiones]
    where NPeticion in (
              SELECT [NPeticion]
              FROM [dbo].[Peticiones]
              GROUP BY [NPeticion], FechaApertura, TipoPeticion
              HAVING COUNT(*) > 1
          )
      and [FechaResolucion] <> 'NULL'
    Union
    select [NPeticion], [FechaApertura], [TipoPeticion], [FechaResolucion], 2
    from [dbo].[Peticiones]
    where NPeticion in (
              SELECT [NPeticion]
              FROM [dbo].[Peticiones]
              GROUP BY [NPeticion], FechaApertura, TipoPeticion
              HAVING COUNT(*) > 1
          )
      and [FechaResolucion] = 'NULL'
)
select *
from aBorrar
where rn > 1;
Explanation:
Since FechaResolucion is an nvarchar, if I order desc to keep the most recent, the 'NULL' values appear first. That's why I do one select that orders the records that are not 'NULL' and then union it with those that are 'NULL'.
But that doesn't solve the problem when all the duplicates have that column equal to 'NULL': I would delete them all...
How do I make it leave me one record when all the duplicates have FechaResolucion = 'NULL'?
Data example:
[NPeticion] [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001' 12/04/2017 'A' '12/04/2017'
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' 'NULL'
'20171204000002' 12/04/2017 'B' 'NULL'
'20171204000002' 12/04/2017 'B' 'NULL'
Current result (records to be deleted):
[NPeticion] [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' 'NULL'
'20171204000002' 12/04/2017 'B' 'NULL'
'20171204000002' 12/04/2017 'B' 'NULL'
Expected result (records to be deleted):
[NPeticion] [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' '10/04/2017'
'20171204000001' 12/04/2017 'A' 'NULL'
'20171204000002' 12/04/2017 'B' 'NULL'
Note: I did not design the database. Modifying its design is not possible.
Some comments on your current code. One of your problems is that by simply doing ORDER BY FechaResolucion DESC, with FechaResolucion being a string that holds a date in dd/mm/yyyy format, the rows are not sorted chronologically but alphabetically.

On the other hand, you don't need two separate queries to handle the rows that contain 'NULL' and those that don't. You can handle both in a single query with two chained CTEs: a first one that deals with the 'NULL' values and a second one that orders by CONVERT(DATETIME,FechaResolucion,103).

In this case, the first CTE doesn't even need to be separate: you could use the second one alone by replacing CONVERT(DATETIME,FechaResolucion,103) with CONVERT(DATETIME,NULLIF(FechaResolucion,'NULL'),103), but I prefer to leave it that way because I find it more readable.

I would simply order by FechaResolucion DESC with a ROW_NUMBER() in a subquery, then keep only the rows with Number = 1. That returns the most recent date, or any one of the NULLs when they are all null.
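
The approach described in the answers above could be sketched roughly as follows. This is only an illustration, assuming SQL Server and assuming that every FechaResolucion value other than 'NULL' is a valid dd/mm/yyyy string (style 103). The CTE names Normalizadas and Numeradas and the alias rn are made up for this example and do not come from the original answers.

```sql
-- Sketch only: assumes SQL Server and that every non-'NULL' FechaResolucion
-- is a valid dd/mm/yyyy string. CTE names are hypothetical.
WITH Normalizadas AS (
    SELECT NPeticion, FechaApertura, TipoPeticion,
           -- Turn the literal string 'NULL' into a real NULL
           NULLIF(FechaResolucion, 'NULL') AS FechaResolucion
    FROM dbo.Peticiones
),
Numeradas AS (
    SELECT NPeticion, FechaApertura, TipoPeticion, FechaResolucion,
           ROW_NUMBER() OVER (
               PARTITION BY NPeticion, FechaApertura, TipoPeticion
               -- Converted dates sort chronologically; with DESC,
               -- real NULLs are placed after every non-NULL date
               ORDER BY CONVERT(DATETIME, FechaResolucion, 103) DESC
           ) AS rn
    FROM Normalizadas
)
-- rn = 1 is the row to keep in each group; everything else is a duplicate
SELECT NPeticion, FechaApertura, TipoPeticion, FechaResolucion
FROM Numeradas
WHERE rn > 1;
```

Because every group gets exactly one row with rn = 1, a group whose rows are all 'NULL' still keeps one record, and groups without duplicates are left alone, so the GROUP BY ... HAVING COUNT(*) > 1 filter is no longer needed. Once the SELECT returns the rows you expect, the final statement can probably be changed to a DELETE on the same rn > 1 condition, but try it on a copy of the table first.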