More than a question, a debate arises when modeling the database of a software for occupational health management. The question is the following, we have two entities, one entity Persona
and the other called EPS
and when making the relationship between the two, the debate arises if Id
one should be in the other and vice versa. I explain:
A person MUST be in only one EPS
, they cannot be in several; one EPS
has many Personas
linked. By concept of cardinality it is said that in a one-to-many relationship, the many must have the Id
one of the one, in other words one Persona
must have the foreign key of Id
. But debating some say that in the entity EPS
must have the Id
of the Persona
. What would be the correct way to model this problem?
Taking as an example the following entities and some of their data:
EPS Entity :
Entity Person :
Knowing that one
Persona
must be related to only oneEPS
but oneEPS
can have more than onePersona
, I expose mainly 3 cases:Case 1: The entity
Persona
must have the foreign key of theEPSId
Finally, the table
Persona
would only have to add the foreign reference of the table,EPS
leaving something like this:(5 records in the table
Persona
and 3 records in the tableEPS
)Case 2: The entity
EPS
must have the foreign key ofPersonaId
This is a more complicated case, since you will have duplicate information and it would have to be added as part of the key so that in the end a key made up of and
PersonaId
is generated , leaving something like this:EPS
PersonaId
As you can see,
EPSId
they are repeated, and assuming you have large amounts of information, searches can be complicated.(5 records in the table
Persona
and 5 records in the tableEPS
)Case 3: An intermediate table with the reference of
PersonaId
yEPSId
This case would be functional in case you want a many-to-many relationship, this would avoid having duplicate information in one or another table. I do not consider it to be a bad practice since finally the information is kept normalized, leaving the entities as it was shown at the beginning and creating another table, leaving something like this:
Which for this example, would be something similar to Case 2.
(5 records in table
Persona
, 3 records in tableEPS
and 3 records in intermediate tablePersonaEPS
)As you mentioned in your question, " By concept of cardinality it is said that in a one-to-many relationship, the many must have the
Id
one of the one, in other words onePersona
must have the foreign key ofId
", then case 1 should be ideal for this scenario.Depends.
Let's first analyze why the focus
id
of the entitypersona
in the entityEPS
is wrong. Then we will discuss the advantages and disadvantages of using theid
deEPS
in the entitypersona
, and using an intermediate table.Let's suppose that our entity
EPS
is as shown in the diagram, with a primary key composed ofid_eps
andid_persona
:The entity is in first normal form (1NF). However, at first glance we can see that something is wrong: data redundancy, which leads us to a series of problems when trying to work with it ( update anomalies ):
id_persona
would be null, and the field can't be null because it's part of the primary key.id_eps
1? We would have to find all the corresponding records and change the name, or risk a data inconsistency if we forget to include one.id_persona
6, and this is the only affiliate that has the EPS withid_eps
3. We would be deleting not only the client, but also the only record we have of that EPS.These problems caused by the table for being only in 1FN, occur when we try to drive
id_persona
on it. As we can see, it is not only the fact that we are storing repeated information, but also that it is impossible to work with it if we want to keep the database consistent. At this point, there is no debate . It simply is not an alternative.What solution can we give you? Well, keep normalizing. We separate the attributes that are not dependent on the primary key into a new entity. So our EPS table would be the following, with a primary key
id_eps
(this table is in 3FN):Now yes, to the debate.
This means that our entity
persona
is more or less like this, with a primary keyid_persona
and a foreign keyid_eps
:The board is at 3NF. I would say that there are several criteria to analyze, but I am going to focus on three: performance, accuracy of information and scalability.
In terms of performance, by having id_eps as a foreign key, it allows queries on EPS and its clients to only handle a JOIN between two tables.
Regarding the veracity of the information, it is not possible to enter a client more than once for two or more EPS. Each client belongs to a single EPS, otherwise it would try to violate the integrity of the primary key.
The problem could be in scalability: What happens if the model is implemented, everyone happy and content, and one day we are asked to handle information on the dates of linking/unlinking with the EPS? Likewise, the user who made the registration and the date of the transaction.
We could deal with it in the following way (the fields are a continuation of the table
persona
, but I separated them for formatting reasons):We comply with the request, but we can begin to see that these data no longer have so much to do with the entity
persona
as such. We can create a new table for this, with the primary key made up ofid_eps
andid_persona
, but we could have seen this from the beginning with the following approach:Although cardinality invites us to the previous approach, we saw that if the database scales, we could have avoided some problems if instead of thinking about cardinality, we had thought about the relationship between the tables.
What is there between an EPS and a person who links them? It could be a
contrato
with primary key composed ofid_persona
andid_eps
:The table is in 3FN, and now if the link dates and release date fields make more sense in a
contrato
than in apersona
. In addition, we could easily add fields for reason of termination, registering user, registration date, etc; without altering the information we already have about our EPS and people.But not everything is perfect. In this example, a query involving the names of the EPSs and the names of their clients is equivalent to a JOIN between three tables. For the example, it is simple, but normally databases have dozens of tables and if we normalize too much (go beyond the 3NF, for example), such a simple query can easily involve a JOIN between five, six or more tables, causing information is expensive to obtain.
"Ha, but I created indexes and the query plan tells me that it uses them, so there is no problem with multiple joins" . Great, but an index adds cost to insert and update operations . If the system is very transactional, this could mean a drop in performance. Let's not say if we manage more than three or four indices per table, and if when scaling, we modify the tables and we have to recalculate all the statistics. Sometimes it becomes prohibitive. I've seen questions here on SO about queries that take hours to execute, and while many of the queries perform poorly, some can be seen to be designed not for high volume data.
In addition to this, suppose that by carelessness or mistake (or intentionally) a client that already existed is entered, with a new EPS in the table
contrato
:The second record is the problem. The database remains integrated and consistent, but its information is no longer accurate. Unlike the previous approach, the integrity of the primary key is not broken here, so it is necessary to take a previous action when inserting into this table: a previous search, for example, and this adds a cost to the process of creating a new contract.
In summary, knowing which approach to take will give you (in my opinion) experience (mainly work). If the task or exercise is about cardinality, take the first approach, but in the real world design and modeling goes far beyond cardinality:
Is the database relational? Is it an OLTP or an OLAP? If it is an OLTP, what level of transaction will it have? What volume of data do the tables handle? Are there prior security measures in sending information to the database? How feasible is it that they will ask for later modifications?