How to "contain" the join table

I have a number of situations where a join table has extra data in it, not just the foreign keys. For example, I have teams and people, and teams_people, which holds things like each person's role on the team (captain, player, etc.), their number, and so on. To date, I have been loading team rosters with something like $teams = $this->Teams->find()->where([...])->contain(['People']); and then, as I’m iterating through the people on a team, I access the join table record through $person->_joinData->role, for example.

I’ve been told by a trustworthy source that _joinData should be an implementation detail that my code is unaware of, and I should change my containment so that the data is present in a more reliable location, which will probably require ensuring that TeamsTable and PeopleTable both have a hasMany association to TeamsPeopleTable.

Assuming that I know how to set up all the required associations, what should my find statement above be changed to, in order to ensure all the data is present?

->contain(['TeamsPeople' => 'People'])? I don’t love that, as it means that Team entities loaded this way will not have a people property, while those loaded in places where I need the list of people but nothing about their roles, etc. would. That makes the Team entity a little unpredictable for common functions that might be used in both scenarios.

->contain(['People' => 'TeamsPeople'])? I think that will load all the join table records for each person, regardless of which team they are on?

Your idea of treating the join table as a separate relationship is a reasonable one. But I don’t think either of the relationships you’ve created will earn you any complexity reductions. Instead, you’ll simply change the name of the deep data from person->_joinData to person->TeamsPeople.

The approach I would take in this case is to expand the interface of your Person entity class to provide transparent access to the join table data. I personally favor accessor methods but you should be able to use virtual fields too.

//accessor method: called as $person->role()
public function role() {
    return $this->_joinData->role;
}

//virtual field: read as $person->role
protected function _getRole() {
    return $this->_joinData->role;
}

With this strategy, you have to guarantee the join table data will always be available in the Person entity when you have it. The association options or named queries should give you tools to enforce this.

Now, consumers of your entities can use the extended interface of your Person entity rather than being hard-coded to the structure of your Person entity.

As a further note on expanding the entity interface, I often make my entities provide boolean answers to various questions. Taking your person->teams_people->role example, I would almost certainly have this method:

//using the previously described virtual field technique
public function roleIs($role) {  // or simply is() if there are no conflicting method names
    if (is_array($role)) {
        // a list of roles: true if the person's role is any one of them
        $result = in_array($this->role, $role);
    }
    else {
        $result = $this->role === $role;
    }
    return $result;
}
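As a minimal, framework-free sketch of how such a method behaves with a single role and with a list of roles: PersonStub below is a hypothetical stand-in for the Person entity (it does not extend Cake's Entity class), and the role strings are made up for illustration.

```php
<?php
// Hypothetical stand-in for the Person entity. A real entity would extend
// Cake\ORM\Entity and read the role from the join table data.
class PersonStub
{
    /** @var string */
    public $role;

    public function __construct(string $role)
    {
        $this->role = $role;
    }

    /**
     * @param string|string[] $role One role or a list of acceptable roles
     */
    public function roleIs($role): bool
    {
        // Casting wraps a single role in an array, so both call styles share one path
        return in_array($this->role, (array)$role, true);
    }
}

$person = new PersonStub('captain');

var_dump($person->roleIs('captain'));            // bool(true)
var_dump($person->roleIs(['coach', 'captain'])); // bool(true)
var_dump($person->roleIs('player'));             // bool(false)
```

The cast to array is just one way to collapse the two branches; the explicit if/else shown above works equally well.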

The other thing I would do in this case would be to create constants for all my roles. Then I’d use those EVERYWHERE rather than literals. I do this so often that I have a special directory full of constant classes:

[Screenshot: directory of constant classes]

This would be typical content for one of these classes:

The use of constants means I have code-completion hints from my IDE once I have identified the constant class. No more typos of literals, no more forgetting what my choices are or their spelling.
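A constant class along these lines might look like the following sketch. The class name TeamRole, its values, and the STAFF grouping are all hypothetical, chosen only to illustrate the idea.

```php
<?php
// Hypothetical constant class; the name and values are illustrative only
final class TeamRole
{
    public const CAPTAIN = 'captain';
    public const COACH = 'coach';
    public const PLAYER = 'player';

    // A named group of roles, handy for checks that accept a list
    public const STAFF = [self::CAPTAIN, self::COACH];
}

// Comparing against constants instead of string literals
$role = TeamRole::COACH;
var_dump(in_array($role, TeamRole::STAFF, true));            // bool(true)
var_dump(in_array(TeamRole::PLAYER, TeamRole::STAFF, true)); // bool(false)
```

A grouped constant like STAFF is one case where a role check that accepts an array pays off: the whole group can be passed in a single call.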

Hopefully these examples make it clear why I would make the Person::roleIs($role) method accept arrays.

The main difference between person->_joinData, as it is loaded automatically when containing People on a Team, and person->teams_people, as it would be loaded if I used ->contain(['People' => 'TeamsPeople']), is that the latter will have the player’s entire team history; Teams <=> People is a belongsToMany association. When I’m loading a team roster, I want only that team’s join table record for each person.
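If the full history were loaded anyway, each person's teams_people list would have to be narrowed to the roster's team by hand. A rough sketch of that narrowing, using plain objects in place of entities; the joinRecordFor() helper and the team_id field name are assumptions for illustration, not part of Cake.

```php
<?php
// Hypothetical helper: find the join record that belongs to one specific team
function joinRecordFor(object $person, int $teamId): ?object
{
    foreach ($person->teams_people as $record) {
        if ($record->team_id === $teamId) {
            return $record;
        }
    }

    // The person has no record for this team
    return null;
}

// Plain objects standing in for a Person entity with its whole team history
$person = (object)[
    'first_name' => 'Ana',
    'teams_people' => [
        (object)['team_id' => 1, 'role' => 'player'],
        (object)['team_id' => 2, 'role' => 'captain'],
    ],
];

echo joinRecordFor($person, 2)->role, "\n"; // captain
var_dump(joinRecordFor($person, 3));        // NULL
```

This works, but it loads more rows than a roster needs, which is the drawback described above.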

Adding an accessor function that uses _joinData doesn’t eliminate the problematic _joinData reference, it just moves it. Not included in my original post (because I didn’t think it was relevant) is that this discussion started because I’m updating my unit tests to use the fixture factories, and _joinData isn’t always set reliably in the objects that it creates. It’s fine for controller method integration tests, or table finders, but not entity tests. I can use the factory method to persist the entity and then read it back right away so the structure is as expected, but that’s a code smell, which led to the conversation that _joinData should be a Cake-internal detail, not something that any of my code relies on.

I see I had not appreciated some of the nuance and alternate use cases you had in mind. Given all that, your idea of defining the TeamsPeople association makes the most sense. But with all this additional information I think the answer to your original question:

is, they both have merit depending on your specific needs.

Even with the use of your new association I personally would use the accessor techniques I described. One reason:

I would say rather, that it encapsulates it. That is a reasonable improvement. And having a named association that contains true entities (rather than the _joinData array) makes the technique even more powerful.

But I have the feeling you may not agree. Judging by many of your answers to forum questions in the past, I think we have slightly different style preferences for using the Cake framework.

Where you (I suspect) stick to ‘standard’ Table and Entity classes as they are baked, I expand my entity classes into more complex (but still ‘normal’) structures. For example, I have a system that tracks Addresses, each hasMany Coms (phone, email, url, whatever).

There is actually no use case in my system that uses an Address without also needing its Coms. So my classes are designed to make this larger Entity work as smoothly as possible. I have the feeling you would never do this.

When I have a use case for an Entity in both its simple form and a more complex form, I either make a second table that delivers the deeper entity type or tell the Table class which entity type I want before making the query. This sounds like an approach you would never take.

As to testing:

Oh yes! FixtureFactories is a great tool and I think that is highly relevant. Definitely I’d go with eliminating _joinData in that case.

To account for this in the case of my Address class described above I do this in my AddressFactory::setDefaultData():

In this way, any time I get an address, I also get the expected associated data which my system enforces.

PS I always enjoy reading your forum posts and really appreciate your efforts on behalf of the framework.

And because with me, there is always “one more thing”: here is an example of why

Consider this method from my deep Address entity:

_joinData is, in my cases at least, an actual entity. Maybe because I typically have through settings on my associations?

I think if you were to look at my code, you’d be surprised (and a little horrified; I know I am) at some of the things I’ve done for the sake of expediency. I try to answer questions with best practices, but I don’t always follow them myself. Not doing so does tend to get one into sticky situations like the one I find myself in now, though.

I very much expand my entities with all manner of accessor functions, some of which are certainly ill-advised, but many of which I am quite happy with. I’ve had things in the past (especially in the Cake 1.x days, when it was all arrays, and a person property would be accessed like $person['Person']['first_name'] if loaded directly, but $person['first_name'] if it was loaded through containment on another table) that tried to account for the same data taking different structures. But with entities, most of the need for that is gone, and I tend to find that any other time I find myself wanting to do that, it means that I’m making unwarranted assumptions. For example, when a function needs to work with a team and a roster, I’m generally better off passing those as separate arguments, even if it’s fn($team, $team->people) instead of assuming that $team will have a people property somewhere and making allowances for all the possible ways it might be called. And as I write this, I wonder if maybe that’s going to be part of the solution here…
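As a small sketch of the "pass them separately" idea, with plain objects standing in for entities: rosterSummary() and the field names are hypothetical, and the function never assumes where the roster lives on the team.

```php
<?php
// Hypothetical helper: the roster is passed explicitly rather than read from $team->people
function rosterSummary(object $team, iterable $roster): string
{
    $names = [];
    foreach ($roster as $person) {
        $names[] = $person->first_name;
    }

    return $team->name . ': ' . implode(', ', $names);
}

$team = (object)['name' => 'Tigers'];
$roster = [
    (object)['first_name' => 'Ana'],
    (object)['first_name' => 'Ben'],
];

echo rosterSummary($team, $roster), "\n"; // Tigers: Ana, Ben
```

The caller decides where the roster comes from, so the same function works whether the people were loaded through People, through TeamsPeople, or built by a factory in a test.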

Yes, I have structures like that too. And when join tables are involved, _joinData is present in the resulting entity structure (though only if I persist the factory result, not if I getEntity, so that’s part of the issue), and there’s no syntax for setting additional fields in the join table with this.

Thanks! I find that helping solve other problems is a good way to exercise my mind, and often think of ways to improve my own code while doing so.