original:http://www.mzanin.com/Papers/NW07%20-%20PA.pdf

Proceedings of the Workshop Net-Works 2007,

Aranjuez, 10{11 September 2007, pp. 1{16.

Preferential attachment, aging and weights in

recommendation systems

Massimiliano Zanin

1

, Pedro Cano

1

and Javier M. Buld¶u

21

1 Music Technology Group, Universitat Pompeu Fabra

2

Nonlinear Dynamics and Chaos Group, Departamento de F¶³sica, Universidad

Rey Juan Carlos

Abstract

In the present work, algorithms based on complex networks applica-

tions are applied to Recommendation Systems in order to improve their

quality of predictions. We show how some networks are grown under the

in°uence of trendiness forces, and how this can be used to enhance the re-

sults of a recommendation system, i.e. increase their percentage of right

predictions. After de¯ning a base algorithm, we create recommendation

networks which are based on an histogram of user ratings, using there-

fore a principle of preferential attachment. We show the in°uence of data

aging in the prediction of user habits and how the exact moment of the

prediction in°uences the recommendation. Finally, we design weighted

networks that take into account the age of the information used to gener-

ate the links. In this way, we obtain a better approximation to evaluate

the users’ tastes.

Keywords: Recommendation systems, preferential attachment, network

evolution

1. Introduction

Since the experiment of Milgram in 1967 [1], the study of (social) networks

have attracted the interest of many scientists from completely di®erent ¯elds.

Boosted by the seminal paper of Watts and Strogatz [2], complex networks

theory has become a strong utility to analyze di®erent kinds of data structures.

The application of complex networks to social problems has generated special

interest, and it has given fruitful results in di®erent subjects, raging from sex-

ual disease control [3, 4] to music community identi¯cation [6, 7]. Another ¯eld

1

emails: massimiliano.zanin@hotmail.com, pedro@bmat.com, javier.buldu@urjc.es2 Preferential attachment, aging and weights in recommendation systems

where complex networks have been successfully implemented is in Recommen-

dation Systems. In the last years, developments in computer and information

technologies have created new channels of commerce, mainly electronic, where

millions of customers are served each day, generating an enormous quantity

of information about their habits. On the other hand, this innovation has

created the need for personalization in customer cares, and this has supposed

a great interest in generating algorithms that recommend items to users that

enter an \e-store”.

In the search for better recommendation algorithms using complex net-

works theory, properties of the system like Clustering Coe±cient [13] or Jac-

card’s Coe±cient [12] have been explored, obtaining di®erent results. When

the growth of the recommendation system is considered, the Preferential At-

tachment strategy has been recently proposed [12], but without much consid-

eration within this community.

In this paper, we want to go deeper in the idea of applying preferential

attachment to a recommendation system: after de¯ning a base algorithm, we

study the e®ect of time in the network evolution, and ¯nd a better approxi-

mation to evaluate the users’ tastes.

2. Preparing the ground

The item-based strategy [8, 11] is one of the most popular in recommendation

systems: it presents interesting advantages, like short computation time and

low sensitivity to network sparsity. Since it is the most extended way of

generate a recommendation matrix, we take this algorithm as the ground to

compare with any other results.

The basic idea behind an item-based strategy is to look into the set of

items related with the target user, to compute the similarity of these items

with the others in the network, and select the most similar (see [8] for details).

For this purpose, a cosine-based similarity is commonly used. For each item, a

vector of length N is created, being N the total number of users. The vector

accounts for the relation between items thanks to user choises: for example, if

the n

th

element of the vector has a value of 1, it means that the user number n

has selected that item (or 0 if not). In some datasets, moreover, each element

can represent the rating of a given user for an item: e.g. a value between 1

and 5. After creating those vectors, the similarity between two items i and j

is de¯ned as:

sim(i; j) = cos(~i;~j) =

~i ¢ ~j

j~ij ¢ j~jj

(1)M. Zanin, P. Cano and J.M. Buld¶u 3

In this paper, we will only use this measure of similarity, for being well-

known and easy to implement; nevertheless, other ways to calculate this char-

acteristic have been developed in the past: the Correlation-Based Similarity

(by computing the Pearson-r correlation) and the Adjusted Cosine Similarity

[8].

In our experimental study we have used two datasets, each one with dif-

ferent characteristics, to observe results in di®erent backgrounds.

The ¯rst dataset is the collection of ratings of NetFlix [9], a web page of

movie renting where users can also evaluate movies (from 1 to 5). In order

to work with a network of simple (unweighted) connections, we ¯lter ratings

di®erent from 5 (the highest mark), so that we only keep users connected with

their top-rated movies. The result is a set of 17770 items (movies), 2.6 millions

users and more than 23 millions of operations (links).

Figure 1: Degree distribution for items (a) (i.e. movies) and users (b) in the

NetFlix dataset.

The second dataset, is from Art Of The Mix [10]. In this network, we have

90000 users, 472000 items (songs, in this case) and 1.3 millions of links. The

Art Of The Mix is a project started at the end of 1997 and consists of a web

site where users upload and interchange playlists of their favorite music. The

songs, somehow, ¯t in those lists, even though they do not need to belong to

the same country, decade or musical genre. In this way, a certain connection

results between songs of the list, whose origin is based on the musical taste

of the playlist’ author. Both datasets share the same structure: a line of

the network ¯le includes a connection between an user (speci¯cally, the ID of

the user) and an item (again, an anonimous ID de¯ning the item), and the

timestamp of the connection.

Once networks are de¯ned, it is worth noting that the size of the present4 Preferential attachment, aging and weights in recommendation systems

datasets is much higher than previous results in other networks, like [12], where

10000 items and 2000 users where considered, or [13] with a dataset close to

40000 items.

3. Preferential attachment

The initial step to improve a recommendation algorithm by taking advantage

of complex networks theory is to use the concept of preferential attachment.

First introduced by Barab¶asi and Albert in [16], the preferential attachment

has become a paradigmatic growing algorithm in order to explain the struc-

tures and evolution of social networks.

The main idea in [16] is that nodes with higher degrees (i.e., with more

links) acquire new links at higher rates than low-degree nodes; the probability

that a link will connect a new node j with another existing node i is linearly

proportional to the actual degree of i:

p(j ! i)=

ki

PN

j=1

kj

(2)

where ki

is the degree of node i and N is the total number of nodes. When

de¯ning a recommendation algorithm, this is equivalent to suppose that a given

user has a higher probability of selecting a popular item than an unknown one.

Intuitively, it may be clear that in some cases it will be right: every time the

algorithm is applied to a selling system, where goods being sold depend on

trendiness, items that are well-known will have a higher probability of being

bought. Nevertheless, there can be cases where the popularity of an item, or

the existence of a certain fashion, do not a®ect the creation of new links, and

users make their choices only following personal criteria.

As we will see, both considerations should be taken into account and some

kind of balance between them should also be included. Another interesting

point is that the initial dataset consist on a bipartite network [14] with two

di®erent kind of nodes, users or items (movies/songs). The bipartite net-

work could be projected in two di®erent networks; one with users being the

fundamental nodes and other with movies/songs being the nodes. Neverthe-

less, both projected networks disregard part of the information when they are

considered independently and we must de¯ne a way of accounting for all the

information within the dataset.

At this point, let us explain the way of implementing a preferential at-

tachment strategy in our recommendation algorithm, i.e., an algorithm that

favors the recommendation of the most connected items. The procedure can

be summarized in four steps:M. Zanin, P. Cano and J.M. Buld¶u 5

² First, we de¯ne a distance between a target user and any other user.

As in the case of items, a vector is created for each user, accounting for

his/her selected items. The vector has length M which corresponds to

the total number of items, and it will have a value of 1 at position m if

the m ¡ th item has been chosen by the user. Next, the cosine-distance

dis(j) with respect to the target user is calculated, and values are stored

in a linear array:

dis(j) = cos(~i;~j) =

~i ¢ ~j

j~ij ¢ j~jj

(3)

where i is the target user, and j is other user of the network. As be-

fore, other measures can be chosen, and this one has been selected for

simplicity.

² For each item l of the network, we de¯ne a compatibility value comp(l; user)

between an item and the target user, which is calculated as the sum of

the closeness of users related with that item; closeness is de¯ned as

1 ¡ dis:

comp(l; user) =

X

j

(1 ¡ dis(j)) (4)

where l is the item, and j accounts for users that have connections with

l.

² Finally, items are ordered according to their compatibility, in descending

order. Items in the beginning of the list are the more compatible, i.e.

the more suitable for recommendation. In this way, items in the top

of the list are the best for the target user, and should be submitted to

his/her attention.

Two important features of this approach need to be explained in detail.

First of all, this scheme has a very small calculation time; the most ex-

pensive operation, i.e. the calculation of distance between users, is executed

only one time: this leads to a complexity function of O(m), where m is the

number of users. On the contrary, for the basic item-based scheme, the al-

gorithm should calculate the compatibility between an item and each one of

the items connected to the target user. This is equivalent to carry out this

calculation u times, where u is the number of items related with the target

user; or, in other words, O(l¢u), with l being the number of items. As a result,

the computational cost of the basic algorithm is up to 100 times worse for the6 Preferential attachment, aging and weights in recommendation systems

NetFlix dataset: of course, this can be an important feature when working

with large datasets and real-time recommendations.

Second, unlike the basic algorithm, now we see that the global score (the

measure of the quality of the recommendation) of an item depends on how

many users have a connection with it: for each one of this connections, its

compatibility value (i.e. the compatibility between the selected and the target

user) is summed up, and the result of the sum is the global compatibility of that

item. This means that an item with many links will have a higher compatibility

value than another item with only a few links (because of the higher quantity

of values summed up); this is the basis of preferential attachment: the more

connections, the more the probability of being chosen by another user. On

the other side, not only the number of links is considered: the compatibility is

calculated, like in the basic algorithm, to be a representation of the user tastes;

if an item is well-known, but is far from the tastes of the target user, its total

compatibility value will be small, and that item will not be recommended.

4. Aging e®ect

4.. 1 Trendiness in real networks

As explained before, it makes sense that preferential attachment may improve

the quality of recommendations when the underlying network has a strong

trendiness component, where the trendiness component is the preference for

an user for items with high popularity: in the case of customers datasets,

buying items in an e-store, as the NetFlix dataset, the more an item is known,

the more is likely for that item to be chosen by the target user. Up to now, all

data previous to prediction date has been considered. Except some works like

[17] or [18], this has been the traditional approach, since it is a generalized

opinion that the more data is used in calculation, the better the result will

be. Nevertheless, trendiness of an item greatly depends on time: one item can

have a high popularity on a time t0, but it can lose all interest after a certain

time t1.

This fact can be observed in Fig. 2. The left plot shows an hypothetical

evolution of the number of new links for an item A (i.e., the derivative of

its degree): in time t0 this item has a great instantaneous degree (i.e. a

great popularity in a given moment, with many new users connecting to this

item), while close to t1 its number of new links decreases. On the other side,

item B has an overall lower degree, with a greater degree close to time t1.

It is important to note, that item A has a greater number of connections

if we consider the global data, while B wins in instantaneous degree after

time t1. A simple recommendation algorithm, like the one exposed before,M. Zanin, P. Cano and J.M. Buld¶u 7

Figure 2: Example of degree evolution for two items; item on the left has a

higher global degree, while item B has a higher degree in time t1.

would consider all data of the network, resulting in a greater probability for

item A; nevertheless, if we want a real-time suggestion, e.g. just after t1, the

recommendation algorithm should advantage B.

The example above explains the importance of the link aging: when the

global network is used in calculations, many data that are not strictly necessary

are included; sometimes, that unwanted data can lead to mistakes, and in

addition they always increase the calculation time.

Figure 3: Global degree evolution for NetFlix (a) and Art Of The Mix (b)

networks: the central point represents the moment of greatest degree of every

item. In the insets, are represented the degree evolution of an example item

for each network; note as for (b), the degree shows no clear peak: the mean

degree evolution for that network is therefore °at.

In Fig. 3 we represent how the instantaneous degree of the items evolves8 Preferential attachment, aging and weights in recommendation systems

in time. The instantaneous degree takes into account the number of new links

per day. We can see in the inset of Fig. 3-(a) an example of the instantaneous

degree evolution for a given item. In order to account for all items, we add

the instantaneous degree of all items, but aligned at their absolute maxima.

Fig. 3 shows how we obtain di®erent results for both networks. For the

NetFlix dataset, a great peak is observed, with the degree value increasing

and decreasing continuously around the central point: from the aging point

of view, it means that, ¯rst, there is a certain correlation time in the process

of achieving the highest popularity. Second, popularity depends on time, and

therefore, we must take it into account at the moment of recommending an

item.

The opposite case is Art Of The Mix, where the instantaneous degree level

of the whole dataset is quite constant, with only a central delta-shaped peak.

In fact, the central peak is an arti¯ce: since we align all items at their absolute

maxima, we will always have the highest value at time zero. Nevertheless, the

°at spectrum of the rest of the series indicates that °uctuations of the instan-

taneous degree are ¯ltered when adding all items. The absence of correlation

in the degree evolution indicates that relations between users and items do

not depend on time, as we don’t have a transaction like structure, and that

trendiness is not important to explain network growth: aging should not help

in improving results.

4.. 2 The cut-o® time

Starting from the above considerations, we de¯ne an improvement of the basic

preferential attachment algorithm: before calculating the result, the network

is ¯ltered to include only data (i.e. links) enclosed in a time window. We

assign a cut-o® time d to the window, and for a given time t1 and a target

item, only links within the window t1 and t1 ¡ d are considered.

Results of applying aging-based ¯ltering to both networks are shown in

Fig. 4 and Fig. 5 (NetFlix), and Fig. 6 (ArtOfTheMix). In order to evaluate

the recommendation algorithm we compute the score of the predictions, which

will be explained in detail in the next section. For the time being, the score

must be taken as an indicator of the quality of the recommendation. As

expected, thanks to the strong trendiness in the NetFlix dataset, the cut-

o® dimension of the window results in an improved score. Obviously, when

the window is too small, there is not enough information to perform a good

recommendation and the score decreases. When applying an aging ¯ltering to

Art Of The Mix network we do not obtain an improvement of the score (see

Fig. 6): as degree evolution is not important in this kind of network, reducing

the dimension of the window excludes important data from the analysis, andM. Zanin, P. Cano and J.M. Buld¶u 9

therefore the score decreases.

When network growth is based on rules that are equivalent to preferential

attachment, an important improvement in recommendation results can be

achieved; we go from the 0:924 of the item-based algorithm, to 0:933 of the

preferential attachment algorithm without aging, and ¯nally to 0:939 when link

aging is considered. At the same time, calculation time has been optimized:

when window size is small, there is less information to be processed and the

recommendation speeds up; moreover, we have previously seen how an user-

based strategy is more e±cient that an item-based one: once again, the speed

of the new algorithm is more suitable for real-time implementations.

Figure 4: Recommendation score as a function of cut-o® window dimension

d, for NetFlix dataset. The horizontal line represent the score for the basic

item-based algorithm, while the right point, marked with no cut-o®, is the

result of using the preferential attachment algorithm without ¯ltering data

(as if d = 1).10 Preferential attachment, aging and weights in recommendation systems

Figure 5: Recommendation score for NetFlix dataset. (a) Mean score obtained

for users with di®erent number of items; the system needs a minimum of 3

movies connected to an user to recommend to him with good results. (b)

Mean score obtained as a function of the mean degree of the movies chosen

by an user, for the standard and the preferential attachment algorithm.

4.. 3 Score calculation

In the previous section, he have used a score value to compare results com-

ing from di®erent algorithms: it is time to explain how it is calculated, and

moreover, why we have used this strategy.

When we evaluate a recommendation system, we randomly choose a target

user and a target item already selected by this user: that item should be

recommended by the algorithm for the given user, using only data prior to

link date and time. No restriction is applied to links position: it can be at the

beginning of the dataset (thus, only a few data can be used), or it can be at

the end (improving the amount of information available, but also increasing

the computational cost). The recommendation algorithm would return a list

of items, ordered by compatibility, so that the items on the top of the list

should be the best for the target user.

The Score value is simply calculated as a function of the position of the

target item in that list:

Score = 1 ¡

P ositem

#items

The more the target item is in the upper part of the recommendation list,

the more score approximates to 1.

In the past, other algorithms have been de¯ned to check the perfor-

mance of recommendation algorithms, and some of them (i.e. MEA, RMSE orM. Zanin, P. Cano and J.M. Buld¶u 11

Figure 6: Recommendation score as a function of cut-o® window dimension d,

for Art Of The Mix dataset.

Precision/Recall/F-measure) are often taken as standard measures. To make

an example, in [15] a great part of the dataset is used for training the system,

while the last part is the testing period; using data of the ¯rst set, the algo-

rithm should generate a ranked list of recommendations for each user, and the

quality of the recommendation system is then measured using the number of

hits and their position in the ranked list.

This method of evaluation is not suitable when preferential attachment

is used, and even more when an aging e®ect is applied, due to the fact that

time has a great in°uence in calculations. When we choose a time t0 and

a given user for evaluating the recommendation, all data related with item’s

rank depend on t0. If an item i is a hit at a distant time t1, let us say t1 << t0,

we should disregard that result.

5. Links weight

Finally, let us mention some details about the link heterogeneity. When de¯n-

ing recommendation algorithms, links between users and items are normally12 Preferential attachment, aging and weights in recommendation systems

identical, and the network is de¯ned as unweighted. In our case, we have a

parameter that can be used to discriminate the importance of each connection:

the age of that link.

For a given link, we can assign a weight that is de¯ned as a function W

of the number of days passed since its creation. Although any function can be

used for this purpose, we have chosen a piecewise linear function, that can be

tuned by two parameters ® and ¯:

W(i) =

½

1; ai > ¯

1 +

¯¡ai

¯

®; ai · ¯

where ai

is the age of the link. In this way, we modify the compatibility

of a given item l, which now reads:

comp(l) =

X

j

(1 ¡ dis(j)) W(j ! l) (5)

where (j ! l) is the link connecting user j to item l.

Figure 7: E®ect of considering weighted links. Results refer to the NetFlix

dataset for a window dimension of 120 days.M. Zanin, P. Cano and J.M. Buld¶u 13

The obtained score for di®erent values of ® and ¯ on the NetFlix data col-

lection is shown in Fig. 7. A maximum is detected around ¯ = 20 for di®erent

®, while large values of ¯ lead to a reduction of the score. This behavior is

expected since high values of ¯ are equivalent to increase the importance of

old links, a fact that is not favorable for a preferential attachment strategy.

On the other side, low values of ¯ are equivalent to include only very young

links, excluding a great quantity of information, and making the score value

to decrease.

6. Discussion

In order to better explain how preferential attachment algorithm works, we

report an example of recommendation for the NetFlix dataset. The target

user, randomly chosen, is the user number 658088, and the target item is item

number 872 (for privacy issues, users and items are encoded with sequential

numbers). Target user has links with 24 other items in the moment of the

recommendation.

First, we calculate the score using the basic algorithm. After making the

ranking ordered by compatibility, in ¯rsts positions the followings items are

founded, along with their compatibility score:

Item (1

st

) 7843 (2

nd

) 5085 (3

rd

) 11038 (4

th

) 14241

Compatibility 0.16734 0.14864 0.14591 0.14381

Target item is in position 830, with a compatibility of 0:04993: that is,

we get a score of 0:95329 (Score = 1 ¡ 830=17770, where 17770 is the total

number of items) for this case.

Next step is executing the preferential attachment algorithm with aging

on the same user and item. The dimension of the window d used for data

¯ltering can take di®erent values, and for each value the results obtained (i.e.

number of connections of the target user, rankings, score) are di®erent.

To show an example, we report what can be obtained with d = 70 days.

In this case, after ¯ltering the dataset, we have only 2:26 millions operations

(about ten time less than original data), and target user has 3 more links to

other items. Target item 872 is connected with 198 users in that interval of

time, and their compatibility with target user are the following:

User (1

st

) 698478 (2

nd

) 2081171 (3

rd

) 1558760 . . .

Compatibility 0.04352 0.06337 0.05803 . . .

Summing up all 198 values give a total compatibility of 14:69987. In this14 Preferential attachment, aging and weights in recommendation systems

example, we can see as the compatibility value is greater than the one obtained

with the basic algorithm: this is because we are summing up hundreds of

values, so the system must work with wider ranges. For this value of d, the

ranking obtained starts with the following values:

Item (1

st

) 13728 (2

nd

) 14240 (3

rd

) 2782 (4

th

) 11521

Compatibility 756.14 165.59 160.98 146.96

Target item is at position 357, that represent a score of 0:9799: comparing

the result of the item-based algorithm, target item climbed 515 positions.

Scores obtained with di®erent values of d are shown below:

d 30 50 100 140 180 1

Score 0.87530 0.97794 0.97766 0.97535 0.9740 0.97840

7. Conclusions

In recommendation systems, it is a common opinion that the bigger the

dataset, the better the result will be. In this paper, we show that in cer-

tain case this reasoning is not true. When recommendation systems refer to

networks with strong trendiness component, a preferential attachment strat-

egy can improve results, while at the same time, smaller computational cost is

required. This fact is due to the aging of the existing information, which can

be crucial in certain kind of networks. We demonstrate that, when fashion

or trends are present in the evolution of a given network, the age of the links

must be taken into account when developing a recommendation algorithm.

Moreover, we have seen that weighted links, based on its age, are suitable for

discriminating between recent and old information, increasing the quality of

the prediction in trendiness networks.

References

[1] S. Milgram, \The small world problem”, Psychol. Today, 1 (1967) 61{

67.

[2] D. J. Watts and S. H. Strogatz, \Collective dynamics of small-world

networks”, Nature , 393 (1998) 440{442.

[3] R.M. May and A.L.Lloyd, \Infection Dynamics on Scale-free Net-

works”, Phys. Rev. E, 64 (2001) 066112.M. Zanin, P. Cano and J.M. Buld¶u 15

[4] R. Pastor-Satorras and A. Vespignani, \Epidemic dynamics in ¯-

nite size scale-free networks”, Phys. Rev. E 65 (2002) 035108.

[5] Peter S. Bearman, James Moody and Katherine Stovel, \Chain

of A®ection: The Structure of Adolescent Romantic and Sexual Net-

works”, AJS, 110-1 (2004) 44{91.

[6] R. Lambiotte and M. Ausloos, \On the genre-¯cation of music: a

percolation approach”, Eur. Phys. J. B 50 (2006) 183{188.

[7] Juyoung Park, Oscar Celma, Markus Koppenberger, Pedro

Cano and Javier M. Buldu¶, \The Social Network of Contemporary

Popular Musicians”, arXiv:physics/0609229v1 (2006).

[8] Badrul Sarwar, George Karypis, Joseph Konstan and John

Riedl, \Item-based Collaborative Filtering Recommendation Algo-

rithms” In Proc. of the 10th International World Wide Web Conference

(WWW10) Hong Kong, May (2001).

[9] http://www.net°ix.com

[10] http://www.artofthemix.org

[11] J. Wang, A.P. de Vries and M.J.T. Reinders \Unifying Userbased

and Itembased Collaborative Filtering Approaches by Similarity Fusion”

Proceedings of SIGIR’06 Seattle (2006).

[12] Zan Huang, Xin Li and Hsinchun Chen, \Link Prediction Approach

to Collaborative Filtering”, In Proc. of the 5th ACM/IEEE-CS joint con-

ference on Digital libraries, 141{142 (2005).

[13] Zan Huang, \Link Prediction Based on Graph Topology: The Predictive

Value of the Generalized Clustering Coe±cient”, LinkKDD (2006).

[14] M.E.J. Newman, \The structure and function of complex networks”,

SIAM Review, 45 (2003) 67{256.

[15] Zan Huang, Daniel D. Zeng and Hsinchun Chen, \Analyzing

Consumer-product Graphs: Empirical Findings and Applications in Rec-

ommender Systems”, SSRN September (2005).

[16] A.-L. Barabasi, R. Albert ¶ , \Emergence of Scaling in Random Net-

works”, Science 286 (1999).16 Preferential attachment, aging and weights in recommendation systems

[17] Xiaodan Song, Ching-Yung Lin, Belle L. Tseng and Ming-Ting

Sun, \Modeling Evolutionary Behaviors for Community-based Dynamic

Recommendation”, Proc. of the SIAM Intl. Conf. on Data Mining (2006).

[18] J.L. Herlocker, J.A. Konstan, L.G. Terveen and J.T. Riedl,

\Evaluating Collaborative Filtering Recommender Systems”, ACM

Transactions on Information Systems (TOIS) (2004).