Suggestion: optionally force JOIN on BelongsTo relationships

falmp | 3 years ago

I think optionally forcing BelongsTo relationships to always JOIN the tables would optimize the queries. Sorry if I'm talking non-sense, as I've never used an ORM framework I might be wrong and/or doing it the wrong way. ;)

But let me give you an example:

Suppose I have a blog application with tables posts and users, each post belong to a user. If my view goes through the list, displaying detailed information about the user, MySQL receives the following queries:

select * from posts;
select * from posts p join users u on p.userId = u.id where p.id = 1 and p.userId = x;
select * from posts p join users u on p.userId = u.id where p.id = 2 and p.userId = y;
(...)

If we would force the JOIN from the beginning, the only query being run would be:

select * from posts p join users u on p.userId = u.id;

Not sure how that would be done, probably by extending the BelongsTo annotation.

What do you think?



KrisJordan | 3 years ago

falmp,

Great observation, the N+1 queries issue can be optimized as you've suggested with relationship prefetching / forced joins. It's under consideration for the 0.3 release which focuses almost exclusively on the ORM. If you are interested in helping with implementation, that'd be great.

As I understand the queries above you're trying to iterate through users and show their posts? You should be able to get this down to 2 queries with the existing ORM. Here's some psuedo-code:

$users = Make::a('User')->all();
$usersCount = $users->count();
$posts = $users->posts();
$postsCount = $posts->count();

for($i = 0; $i < $usersCount; $i++) {
echo "$users[$i]->name:\n";
for($j = key($posts); $j < $postsCount; $j++) {
if($posts[$j]->userId != $users[$i]->id) {
break;
} else {
echo "$posts[$j]->title\n";
}
}
}

The problem is that a solution like this exposes the guts that SQL queries are going on under the covers. This isn't the most natural way to code it up, which would be:

$users = Make::a('User')->all();
foreach($user as $user) {
echo "$user->name:\n";
foreach($user->posts() as $post) {
echo "$post->title\n";
}
}

The dilemma with prefetching is that it will introduce non-trivial complexity to the ORM. For most applications the N+1 problem isn't a big deal. For high traffic applications with large enough databases using JOINs is optimized against (Flickr, for example, never uses joins, emphatically denormalizes their data, and caches on the front-end). So, there is question whether prefetching is worth the complexity. Prefetching a single relationship isn't too complex but when you account for prefetching multiple relationships things get uglier.

More thought needs to go into how necessary prefetching with forced joins is for 0.3 and it is worth looking at what other leaders are doing. Last I checked, though Rails has this capability, it is deemphasized. Not sure about Django. Even with great indices using JOINs to return data from multiple tables gets to be really inefficient. What we should think through is if prefetching/auto-joins is really the answer we need - or if we could simply make scenarios like the one above more natural to express without it.

reply

falmp | 3 years ago

Kris,

I was actually trying to retrieve all the posts already with the author information. There would be no inner foreach on the code:

$posts = Make::a('Post')->all();
foreach($posts as $post) {
echo "$post->title\n";
echo "$post->body\n";
echo "By: $post->user()->name\n";
}

If I forced the JOIN from the beginning, I'd have received the same number of results already with the users' information in 1 query (I think the HasMany relationship could also benefit from this on some situations regardless of the complexity). And I didn't mean to always force the JOIN on every relationship, this would have to be explicit on the annotation or some other way, maybe even while retrieving the results. I have tried doing a Make::a('Post')->all()->innerJoin('users', 'users.id', 'posts.userId'); with no success; the query being run is right, but I couldn't retrieve the results in the context.

I agree though that the point here might be the weight of JOINS vs. the number of queries.

I've been studying the Recess code the last few days, it's amazing the code base you have. Great work! I've been working with PHP for a few years now, but I'm quite new to the whole OOP, ORM, frameworks, etc. thing. I've been messing with the code on GitHub (which was new to me too, by the way), I'll try to contribute when possible.

I'm looking forward for the 0.2 version. Is there a roadmap somewhere?

reply

KrisJordan | 3 years ago

falmp,

Thank-you for this additional information. This will be considered in depth for the 0.3 release which focuses entirely on the Models/Database.

0.2 is nearly complete and there should be news on it this week. Stay tuned, Kris

reply

falmp | 3 years ago

Kris,

Sorry for coming back to this issue again, but I just noticed something. If the performance killer are the JOINs, Recess is doing it when it's not necessary (when I'm accessing a foreign property of my object). Like I said on my first example, if I do:

$posts = Make::a('Post')->all();
foreach($posts as $post) {
echo "$post->title\n";
echo "$post->body\n";
echo "By: $post->user()->name\n";
}

The queries run are:

select * from posts;
select * from posts p join users u on p.userId = u.id where p.id = 1 and p.userId = x;

The second query could save the JOIN and look up the users table by the user's ID (since this query is only used to get the user information).

I really hope you reconsider setting up an optional pre-fetch of the BelongsTo relationship. And if not pre-fetched, at least the unnecessary JOIN could be saved.

reply

KrisJordan | 3 years ago

Really need to make spaces & nbsp ; in code blocks, that really looks ugly, sorry!

reply



Welcome KrisJordan!

Web: http://www.krisjordan.com/

Location: North Carolina

Edit Profile