From: TSim: a system for discovering similar users on Twitter
The total similarity formula |
---|
\( \text{Sim}_{\text{Total}}(u_i, u_j) = \sum\limits_{m=1}^{7} \left( \text{Sim}_{m}(u_i, u_j) \cdot \text{weight}_{m} \right) \) |
where \( u_{i} \) is the examined user, for whom TSim attempts to find similar users; \( u_{j} \) is the candidate user, whose similarity to \( u_{i} \) TSim is computing; \( \text{Sim}_{m} \) is the similarity score of signal \( m \) between \( u_{i} \) and \( u_{j} \) (Sim1 through Sim7 are explained below); \( \text{weight}_{m} \) is the weight assigned to the score of signal \( m \) |
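
The weighted sum above can be sketched in a few lines of Python. This is only an illustration, assuming the seven signal scores have already been computed for one pair of users; the function name `total_similarity`, the example scores, and the equal weights are hypothetical and do not come from the source.

```python
from typing import Sequence


def total_similarity(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum over signals: sum of Sim_m * weight_m for m = 1..7."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical scores for Sim1..Sim7 and equal weights, for illustration only.
example_scores = [3, 0.5, 2, 1, 0.75, 2, 3]
example_weights = [1.0] * 7
example_total = total_similarity(example_scores, example_weights)
```

Because the signals are on different scales (counts, fractions, and a 0 to 3 profile score), the weights also act as scaling factors, so in practice they would need to be chosen with those ranges in mind.
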
Signal | Name | Formula | Explanation |
---|---|---|---|
Signal 1 or Sim1 | Followings and Followers Relationship Similarity | \( \text{Sim}_{\text{Relationship}}(u_i, u_j) = \begin{cases} 1 & \text{if the candidate user appears in one list} \\ 2 & \text{if the candidate user appears in two lists} \\ \;\vdots & \\ n + k & \text{if the candidate user appears in all lists} \end{cases} \) | \( n \) is the number of \( u_i \)'s followers; \( k \) is the number of \( u_i \)'s friends |
Signal 2 or Sim2 | Mention Similarity | \( \text{Sim}_{\text{Mention}}(u_i, u_j) = \sum\limits_{l=1}^{w} \frac{\text{twtsThrd}(l, u_i, u_j)}{\text{twtsThrdTot}(l, u_i)} \cdot \frac{1}{\text{accntsTwt}(l, u_i)} \) | twtsThrd is a function that returns the number of \( u_i \)'s tweets in communication thread \( l \) with \( u_j \) that mention the account \( u_j \); twtsThrdTot is a function that returns the total number of tweets in communication thread \( l \); accntsTwt is the total number of accounts in the tweets of thread \( l \); \( w \) is the total number of communication threads mentioning both \( u_i \) and \( u_j \) |
Signal 3 or Sim3 | Retweet Similarity | \( \text{Sim}_{\text{Retweet}}(u_i, u_j) = \text{numOfTwtsInRetwtList}(u_i, u_j) \) | numOfTwtsInRetwtList is the number of \( u_j \)'s tweets that \( u_i \) retweeted |
Signal 4 or Sim4 | Favorite Similarity | \( \text{Sim}_{\text{Favorite}}(u_i, u_j) = \text{numOfTwtsInFavList}(u_j, u_i) \) | numOfTwtsInFavList is the number of \( u_j \)'s tweets that \( u_i \) favorited |
Signal 5 or Sim5 | Common Hashtags Similarity | \( \text{Sim}_{\text{Hashtag}}(u_i, u_j) = \sum\limits_{l=1}^{w} \frac{1}{1 + \text{HTOffset}(u_i, u_j, \text{HT}_{l})} \) where \( \text{HTOffset}(u_i, u_j, \text{HT}) = \lvert \text{PT}(u_i, \text{HT}) - \text{PT}(u_j, \text{HT}) \rvert + \lvert \text{NT}(u_i, \text{HT}) - \text{NT}(u_j, \text{HT}) \rvert + \lvert \text{NTT}(u_i, \text{HT}) - \text{NTT}(u_j, \text{HT}) \rvert \) | PT, NT, and NTT are functions that each take a user id and a hashtag HT: PT returns the number of the user's positive tweets in the hashtag, NT the number of negative tweets, and NTT the number of neutral tweets; \( w \) is the total number of hashtags that both \( u_i \) and \( u_j \) tweeted in |
Signal 6 or Sim6 | Common Interests Similarity | \( \text{Sim}_{\text{Interests}}(u_i, u_j) = \text{count}\left( \text{ints}(u_i) \cap \text{ints}(u_j) \right) \) | ints is a function that takes a user id and returns the user's top 5 interests, obtained by performing topic analysis on the user's tweets |
Signal 7 or Sim7 | Profile Similarity | \( \begin{aligned} \text{Sim}_{\text{Profile}}(u_i, u_j) = {} & \left[ \text{gender}(u_i) \text{ is equal to } \text{gender}(u_j) \right] \\ & + \left[ \text{language}(u_i) \text{ is equal to } \text{language}(u_j) \right] \\ & + \left[ \text{location}(u_i) \text{ is equal to } \text{location}(u_j) \right] \end{aligned} \) | gender, language, and location are functions that each take a user id and return the corresponding field (gender, language, or location) from the user's profile on Twitter; each bracketed comparison contributes 1 when it holds and 0 otherwise |
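
To make the content-based signals (Sim5 through Sim7) more concrete, the following is a minimal Python sketch of their formulas, not the authors' implementation. It assumes the per-hashtag sentiment counts, the top-5 interest sets, and the profile fields have already been extracted. The data layouts, the function names, and the choice to treat a missing profile field as a non-match are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout: hashtag -> (positive, negative, neutral) tweet counts,
# i.e. (PT, NT, NTT) for one user.
SentimentCounts = Dict[str, Tuple[int, int, int]]
Profile = Dict[str, Optional[str]]


def hashtag_similarity(counts_i: SentimentCounts, counts_j: SentimentCounts) -> float:
    """Sim_Hashtag: sum of 1 / (1 + HTOffset) over hashtags both users tweeted in."""
    total = 0.0
    for tag in counts_i.keys() & counts_j.keys():
        # HTOffset = |PT_i - PT_j| + |NT_i - NT_j| + |NTT_i - NTT_j|
        offset = sum(abs(a - b) for a, b in zip(counts_i[tag], counts_j[tag]))
        total += 1.0 / (1 + offset)
    return total


def interests_similarity(ints_i: Set[str], ints_j: Set[str]) -> int:
    """Sim_Interests: size of the intersection of the two interest sets."""
    return len(ints_i & ints_j)


def profile_similarity(profile_i: Profile, profile_j: Profile) -> int:
    """Sim_Profile: one point per matching field (gender, language, location).

    Assumption: a field that is missing (None) for user i never counts as a match.
    """
    return sum(
        1
        for field in ("gender", "language", "location")
        if profile_i.get(field) is not None
        and profile_i.get(field) == profile_j.get(field)
    )


# Hypothetical data for two users.
counts_i = {"#ai": (3, 1, 2), "#nba": (0, 2, 0)}
counts_j = {"#ai": (3, 1, 2), "#nba": (1, 2, 0), "#f1": (1, 0, 0)}
sim5 = hashtag_similarity(counts_i, counts_j)
sim6 = interests_similarity({"sports", "tech", "music"}, {"tech", "music", "food"})
sim7 = profile_similarity(
    {"gender": "f", "language": "en", "location": "Riyadh"},
    {"gender": "f", "language": "en", "location": "Jeddah"},
)
```

In this example the two users have identical sentiment counts on `#ai` (offset 0, contributing 1) and differ by one positive tweet on `#nba` (offset 1, contributing 0.5), so a shared hashtag adds most to the score when both users hold the same sentiment distribution toward it.
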