CSCI-1080 Intro To CS: World Wide WebSaint Louis University, Department of Computer Science
|
Each red rectangle represents a tag used (for example a keyword of interest for a song)
Each blue circle represents a page bookmarked (for example a song or a product)
A tag and a page are joined by an edge if that tag has been used to describe that item (page, product or song)
Because of the way it is constructed, the above graph has two notable properties:
Each node is colored with one of two colors (red or blue)
Each edge joins two nodes of different colors. (In other words, no two nodes of the same color are adjacent.)
This is an example of a bipartite graph. Another way to describe a bipartite graph is by drawing the red nodes on one side of the picture and the blue nodes on the other side of the picture, just as above. Then each edge crosses from the left side to the right side. No two nodes on the same side are adjacent.
Most graphs are non-bipartite. A graph is non-bipartite when there is:
No way to color its vertices red and blue without creating red-red or blue-blue edges
No way to draw its vertices in two columns so that edges only cross from column to column and never join vertices in the same column
Note that the above two conditions are both saying the same thing.
Finding clusters is one way to discover meaningful groups of nodes in a graph. Another way is based on the idea that nodes x and y are similar to the extent that they have similar network neighborhoods – even if x and y are not adjacent to each other. This is the idea behind structural equivalence.
We define the structural equivalence of two nodes x and y as the similarity of their neighborhoods, as measured by the Jaccard index:
.
This equals 1 when x and y have identical neighborhoods and 0 when the neighborhoods of x and y have no nodes in common.
Example: Consider the bipartite graph drawn below. The yellow nodes below left are tags on delicious and the orange nodes below right are some of the URLs associated with these tags:
The structural equivalence of YouTube and Video is
The structural equivalence of MySpace and Video is:
The content, structure, and computation discussed above give us a simple framework for an abbreviated impersonal version of collaborative filtering (adapted from Principia Cybernetica):
The two steps of impersonal collaborative filtering:
A large set of affiliation data is collected. For example, a set of 3-tuples defined above, from which we derive a bipartite graph indicating which tags are affiliated with which Web pages.
Given a tag of interest and the bipartite graph from step 1, structural equivalence is used to compute the similarity of other tags to that tag of interest. A subgroup of “Related Tags” is identified and presented in rank order.
Imagine that you are a habitual Amazon customer. All you have to do is log in and you will see personalized recommendations informed by your history of purchases. The following four steps (again adapted from Principia Cybernetica) describe how this works.
The four steps of personalized collaborative filtering:
A large set of affiliation data is collected. For Amazon and most other CF sites, the relevant bipartite graph indicates which users have bought (downloaded, used, recommended, etc.) which items.
You log in. Given the bipartite graph from step 1, structural equivalence is used to compute the similarity of other users to you. A subgroup of “people like you” is identified.
Using the subgroup derived in step 2, an average preference function is determined. This function describes what “people like you” typically buy (download, etc.) on the site.
Using the preference function derived in step 3, the site displays the most preferred items that you have not yet bought.
The contect of this site is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.