I am working with a collection that includes two types of documents with different structures:
Some with information about videos projected on the screen:
{
"_id": {"$oid": "5e870200adbe1d000183fa4d"},
"data":
{
"inicio": "2020-03-30 10:20:29",
"fin": "2020-03-30 10:20:32",
"archivo": "salvamento4.mp4",
"tipo": "video"
},
"idSensor": 3,
"idDevice": 5
}
Others with information on the reaction of users to what is projected:
{
"_id": {"$oid": "5e86fe50adbe1d0001472c0f"},
"data":
{
"Trackings":
[{
"BeginTime": "2020-03-30T08:23:42.034893+00:00",
"FaceInfo":
{
"Age": 26.34,
"Emotion": "NEUTRAL",
"IsDetected": true,
"MaleProbability": 0.71,
"gazeTime": 2.37,
"numGazes": 71
},
"ImageSize":
{
"Height": 1080,
"Width": 1920
},
"LookingDuration": 2.37,
"PersonID": "P-2020-03-30_2749",
"ReIDInfo": {"NumReIDs": 1},
"RoiInfo": {"RoiDuration": 0.17},
"SensorID": 0,
"TrackingDuration": 2.77,
"Trajectory": null,
"direction": null,
"id": 1,
"roiName": 0,
"roiType": 1
}],
"timestamp": "2020-03-30T08:23:52.327678"
},
"idSensor": 2,
"idDevice": 5
}
At the moment I have created two simple queries (my first ones in Mongo) using the aggregation framework to extract the information separately and see what it looks like in my BI tool: on the one hand, which videos are shown, how many times each, and at what times; on the other, how many people have watched them, how long they stare at the screen, with what facial expression, etc.
Well, now I would like to cross-reference the data to try to determine who has seen each video, and to be able to quantify whether some videos have more impact on users than others.
Leaving aside the different time formats of the two document types, which I hope will be unified in the not-too-distant future, my idea is, for each idDevice, to determine the overlaps between the projection interval of each video (delimited by data.inicio and data.fin) and the viewing interval of each user (delimited by data.Trackings.BeginTime and data.Trackings.BeginTime plus data.Trackings.TrackingDuration, since the duration has to be added to the start to obtain the upper limit), in order to know how many people have seen each projection and how much of it. The same person could see several projections, since they are not very long.
For example, suppose a video is projected and seen by three people, whether or not they watch it in its entirety, and whether or not they watch other videos before and/or after. That relates the video (or rather, the projection of that video at that moment, since the number of distinct videos projected is small) to those three people (or rather, to the views those three people made of that projection at that moment, since they may be recurring users at different times or on different days). In table format it would be something like:
idDispositivo  idProyeccion  dtInicioProy  dtFinProy  idVisualizacion  dtInicioVis  dtFinVis
1              1             10:00         10:03      1                09:58        10:02
1              1             10:00         10:03      2                10:01        10:07
1              1             10:00         10:03      3                10:01        10:02
The real table would be more complex: it would include other columns relating the projection to the video name and the visualization to the person, the estimated age, their facial expression, etc., but these fields are the essential ones. In my BI tool I could even create additional columns, for example to determine what percentage of the video each person has seen. I hope this makes clear what I intend to achieve. The BI application I use creates datasets with the same tabular format regardless of the data source (Mongo, SQL, CSV, etc.).
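That percentage column could be derived from the length of the intersection of the two intervals over the length of the projection; a minimal sketch (times as plain numbers, e.g. epoch seconds):

```javascript
// Percentage of a projection covered by one viewing:
// intersection length over projection length, clamped at 0.
function pctSeen(visIni, visFin, proyIni, proyFin) {
  const overlap = Math.min(visFin, proyFin) - Math.max(visIni, proyIni);
  return Math.max(0, overlap) / (proyFin - proyIni) * 100;
}
```

With the first row of the table above (projection 10:00 to 10:03, viewing 09:58 to 10:02) the intersection is 120 of 180 seconds, about 67 %.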
I'm not going to lie: I don't even know where to start. The queries I have written so far are very simple, with $match and $project, plus $unwind in the case of the visualizations to decompose data.Trackings. I don't know whether it would be possible and/or necessary to use $lookup to relate the two types of documents (or some other method that allows joins), because from what I have read it applies to different collections, and in my case both types live in the same collection.
Anyway, any suggestion on where to start will be more than welcome.
EDIT 1:
I include a sample of 8 documents, 5 corresponding to views and 3 corresponding to projections:
[
{"_id":{"$oid":"5e86fe50adbe1d0001472c0f"},"data":{"Trackings":[{"BeginTime":"2020-03-30T10:20:28.034893+00:00","FaceInfo":{"Age":26.34,"Emotion":"NEUTRAL","IsDetected":true,"MaleProbability":0.71,"gazeTime":2.37,"numGazes":71},"ImageSize":{"Height":1080,"Width":1920},"LookingDuration":2.37,"PersonID":"P-2020-03-30_2749","ReIDInfo":{"NumReIDs":1},"RoiInfo":{"RoiDuration":0.17},"SensorID":0,"TrackingDuration":2.77,"Trajectory":null,"direction":null,"id":1,"roiName":0,"roiType":1}],"timestamp":"2020-03-30T08:23:52.327678"},"idSensor":2,"idDevice":5}
,{"_id":{"$oid":"5e86fe93adbe1d0001472c10"},"data":{"Trackings":[{"BeginTime":"2020-03-30T10:20:19.843470+00:00","FaceInfo":{"Age":26.04,"Emotion":"NEUTRAL","IsDetected":true,"MaleProbability":1,"gazeTime":4.1,"numGazes":123},"ImageSize":{"Height":1080,"Width":1920},"LookingDuration":4.1,"PersonID":"P-2020-03-30_2754","ReIDInfo":{"NumReIDs":1},"RoiInfo":{"RoiDuration":5},"SensorID":0,"TrackingDuration":4.97,"Trajectory":null,"direction":null,"id":1,"roiName":0,"roiType":1}],"timestamp":"2020-03-30T08:29:53.731042"},"idSensor":2,"idDevice":5}
,{"_id":{"$oid":"5e86feb7adbe1d0001472c11"},"data":{"Trackings":[{"BeginTime":"2020-03-30T10:20:35.329768+00:00","FaceInfo":{"Age":null,"Emotion":null,"IsDetected":false,"MaleProbability":null,"gazeTime":0,"numGazes":0},"ImageSize":{"Height":1080,"Width":1920},"LookingDuration":0,"PersonID":"P-2020-03-30_2763","ReIDInfo":{"NumReIDs":1},"RoiInfo":{"RoiDuration":1.43},"SensorID":0,"TrackingDuration":1.4,"Trajectory":null,"direction":null,"id":1,"roiName":0,"roiType":1}],"timestamp":"2020-03-30T08:31:29.528652"},"idSensor":2,"idDevice":5}
,{"_id":{"$oid":"5e86feceadbe1d0001472c12"},"data":{"Trackings":[{"BeginTime":"2020-03-30T10:20:23.658545+00:00","FaceInfo":{"Age":26.82,"Emotion":"NEUTRAL","IsDetected":true,"MaleProbability":1,"gazeTime":0.9,"numGazes":27},"ImageSize":{"Height":1080,"Width":1920},"LookingDuration":0.9,"PersonID":"P-2020-03-30_2766","ReIDInfo":{"NumReIDs":1},"RoiInfo":{"RoiDuration":0.2},"SensorID":0,"TrackingDuration":3.37,"Trajectory":null,"direction":null,"id":1,"roiName":0,"roiType":1}],"timestamp":"2020-03-30T08:32:45.731280"},"idSensor":2,"idDevice":5}
,{"_id":{"$oid":"5e86fef4adbe1d0001472c13"},"data":{"Trackings":[{"BeginTime":"2020-03-30T10:20:19.458345+00:00","FaceInfo":{"Age":28.38,"Emotion":"NEUTRAL","IsDetected":true,"MaleProbability":1,"gazeTime":1.23,"numGazes":37},"ImageSize":{"Height":1080,"Width":1920},"LookingDuration":1.23,"PersonID":"P-2020-03-30_2776","ReIDInfo":{"NumReIDs":1},"RoiInfo":{"RoiDuration":0.17},"SensorID":0,"TrackingDuration":1.27,"Trajectory":null,"direction":null,"id":1,"roiName":0,"roiType":1}],"timestamp":"2020-03-30T08:34:24.242392"},"idSensor":2,"idDevice":5}
,{"_id":{"$oid":"5e87018dadbe1d000183fa4b"},"data":{"inicio":"2020-03-30 10:20:20","fin":"2020-03-30 10:20:21","archivo":"salvamento5.mp4","tipo":"video"},"idSensor":3,"idDevice":5}
,{"_id":{"$oid":"5e8701baadbe1d000183fa4c"},"data":{"inicio":"2020-03-30 10:20:21","fin":"2020-03-30 10:20:29","archivo":"video2.mp4","tipo":"video"},"idSensor":3,"idDevice":5}
,{"_id":{"$oid":"5e870200adbe1d000183fa4d"},"data":{"inicio":"2020-03-30 10:20:29","fin":"2020-03-30 10:20:32","archivo":"salvamento4.mp4","tipo":"video"},"idSensor":3,"idDevice":5}
]
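As a sanity check, the expected pairs can be computed outside Mongo with the same overlap rule. A minimal Node.js sketch over the sample above, reduced to the relevant fields, with both timestamp formats normalized to UTC ISO strings (an assumption, since the question sets the format differences aside):

```javascript
// Minimal replica of the sample: 3 projections and 5 viewings (all UTC).
const proyecciones = [
  { archivo: "salvamento5.mp4", inicio: "2020-03-30T10:20:20Z", fin: "2020-03-30T10:20:21Z" },
  { archivo: "video2.mp4",      inicio: "2020-03-30T10:20:21Z", fin: "2020-03-30T10:20:29Z" },
  { archivo: "salvamento4.mp4", inicio: "2020-03-30T10:20:29Z", fin: "2020-03-30T10:20:32Z" }
];
const visualizaciones = [
  { persona: "P-2020-03-30_2749", begin: "2020-03-30T10:20:28.034Z", dur: 2.77 },
  { persona: "P-2020-03-30_2754", begin: "2020-03-30T10:20:19.843Z", dur: 4.97 },
  { persona: "P-2020-03-30_2763", begin: "2020-03-30T10:20:35.329Z", dur: 1.4 },
  { persona: "P-2020-03-30_2766", begin: "2020-03-30T10:20:23.658Z", dur: 3.37 },
  { persona: "P-2020-03-30_2776", begin: "2020-03-30T10:20:19.458Z", dur: 1.27 }
];

// A projection and a viewing are related unless one ends before the other starts.
function overlaps(proy, vis) {
  const visIni = Date.parse(vis.begin);
  const visFin = visIni + vis.dur * 1000;
  return !(visIni >= Date.parse(proy.fin) || visFin <= Date.parse(proy.inicio));
}

// Cross every projection with every viewing and keep the overlapping pairs.
const pares = [];
for (const p of proyecciones)
  for (const v of visualizaciones)
    if (overlaps(p, v)) pares.push({ archivo: p.archivo, persona: v.persona });
```

For this sample the cross gives 6 projection-view pairs; video2.mp4 overlaps three of the five viewings, and the viewing that starts at 10:20:35 matches no projection at all.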
EDIT 2:
If I am not mistaken, the pairs have to be filtered so that dtInicioVis is not greater than or equal to dtFinProy, nor dtFinVis less than or equal to dtInicioProy:
!(data.Trackings.BeginTime >= data.fin)
!((data.Trackings.BeginTime + data.Trackings.TrackingDuration) <= data.inicio)
And if neither of those two possibilities occurs, then the projections document has to be related to the visualizations document.
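In code form the two exclusions reduce to a single predicate. Note that with >= and <= the exclusions are inclusive, so intervals that merely touch at an endpoint do not count as an overlap:

```javascript
// True when the viewing [visIni, visFin] overlaps the projection [proyIni, proyFin]:
// neither "viewing starts after the projection ends" nor
// "viewing ends before the projection starts" may hold.
function seRelacionan(visIni, visFin, proyIni, proyFin) {
  return !(visIni >= proyFin || visFin <= proyIni);
}

seRelacionan(10, 20, 20, 30); // → false: intervals only touch at 20
seRelacionan(10, 21, 20, 30); // → true: one second of real overlap
```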
BEING EDITED
You can group your data according to idDevice: this will allow you to gather the corresponding documents together.
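A sketch of that grouping idea (the collection name "events" is an assumption): $group collects every document of a device into one array, and $filter then splits projections from viewings so the overlap logic can be applied within each group:

```javascript
// Gather all documents of a device, then split them by type.
const pipeline = [
  { $group: { _id: "$idDevice", docs: { $push: "$$ROOT" } } },
  { $project: {
      proyecciones: { $filter: {
        input: "$docs", as: "d",
        cond: { $eq: ["$$d.data.tipo", "video"] }   // projection documents
      } },
      visualizaciones: { $filter: {
        input: "$docs", as: "d",
        cond: { $ne: ["$$d.data.tipo", "video"] }   // viewing documents
      } }
  } }
];
// db.events.aggregate(pipeline)
```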
I finally managed to build the query. I suppose it is far from optimal (there is a $match as the last step), but it returns the results I expect, at least for these eight documents; let's see how it performs on the production collection: