The other day when I visited a group page on Facebook and looked at the "Related groups" section I started to think about how that SQL question might be implemented.
Screen capture from the MSDN Sweden Facebook group.
This is not a list that the group owner can control, instead the list automatically shows groups that has a lot of members in common with the group you're currently visiting.
Without any knowledge of Facebooks design for this, let's assume that the database structure is like this, a very ordinary many-to-many relationship:
To have some test data I filled these tables with the following data.
As you can see the "Comic Book Characters" and the "Superheroes" groups have a lot of members in common while the "Celebrities" group is an island.
So, based on a given GroupId, how can we get a resultset that lists all of the other groups with a column that specifies the amount of shared members? Let's build this step by step, in fact I do this as I'm writing so the final solution is not totally clear to me yet, even though I have some idea. If you want to follow along and run these SQL commands you can download the database file here (this database also includes the final SQL command as a stored procedure).
Let's say that the given GroupId is 2 (so this might come from a querystring or some other input, in these commands the number 2 will be hard coded). This means that the final result we want is this:
GroupId | GroupName | SharedMemberCount |
---|---|---|
1 | Comic Book Characters | 4 |
3 | Celebrities | 0 |
Let's start with selecting all groups except the given one;
SELECT GroupId, GroupName FROM Groups WHERE GroupId <> 2
Now, we will need a nested select statement to add the third column, I usually put the outer structure in place, and then I start writing the SELECT statement within the parenthesis:
SELECT GroupId, GroupName, (SELECT something) AS SharedMemberCount FROM Groups WHERE GroupId <> 2
Let's start with just counting the total numbers of members in each group, to do this the Groups table needs an alias so that we can distinguish the two GroupId columns;
SELECT GroupId, GroupName, (SELECT COUNT(*) FROM PersonGroupMemberships AS pgm WHERE pgm.GroupId = g.GroupId ) AS SharedMemberCount FROM Groups AS g WHERE GroupId <> 2
This is the current resultset;
Now we need to filter the inner select so that it only counts the members that are also members in the given group (GroupId=2).
SELECT GroupId, GroupName, (SELECT COUNT(*) FROM PersonGroupMemberships AS pgm WHERE pgm.GroupId = g.GroupId AND pgm.PersonId IN (SELECT PersonId FROM PersonGroupMemberships WHERE GroupId=2) ) AS SharedMemberCount FROM Groups AS g WHERE GroupId <> 2
And really, this was no brain surgery. We simply say that the PersonId of the rows we count has to have a value that corresponds to any PersonId that is a member of group 2 by using the IN statement and yet another nested SELECT statement.
Next we should add sorting so that the groups with the most shared members comes on top, I also add a TOP statement since we probably just want the most related groups, not all groups. These changes wont make any difference with this testdata, but they would if we had more groups:
SELECT TOP 4 GroupId, GroupName, (SELECT COUNT(*) FROM PersonGroupMemberships AS pgm WHERE pgm.GroupId = g.GroupId AND pgm.PersonId IN (SELECT PersonId FROM PersonGroupMemberships WHERE GroupId=2) ) AS SharedMemberCount FROM Groups AS g WHERE GroupId <> 2 ORDER BY SharedMemberCount DESC
And now we get the resultset we wanted;
To think about
Of course you wouldn't want to display the "Celebrities" group in this case, since it has no members in common with the "Superheroes" group. How could you filter the resultset so that it only includes groups where SharedMemberCount is more than zero? You might wanna use this technique.
(By the way, I have installed SQL Server 2008 Express, and it gives neat intellisense when working with SQL queries! :-)
No comments :
Post a Comment