Table 1 Total similarity formula and the formulas used for each signal

From: TSim: a system for discovering similar users on Twitter

The total similarity formula
\( \text{Sim}_{\text{Total}}(u_{i}, u_{j}) = \sum_{m=1}^{7} \left( \text{Sim}_{m}(u_{i}, u_{j}) \cdot \text{weight}_{m} \right) \)
where:
\( u_{i} \) is the examined user, i.e. the user for whom TSim attempts to find similar users;
\( u_{j} \) is the candidate user, i.e. the user whose similarity to \( u_{i} \) TSim is computing;
\( \text{Sim}_{m} \) is the similarity score of signal m between \( u_{i} \) and \( u_{j} \) (Sim1 through Sim7 are explained below);
\( \text{weight}_{m} \) is the weight assigned to the score of signal m.
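The total similarity formula is a plain weighted sum, which the following minimal Python sketch illustrates. The function name `total_similarity` and the example scores and weights are hypothetical placeholders chosen for illustration; they are not values reported by TSim.

```python
from typing import Sequence


def total_similarity(signal_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum over m of Sim_m(u_i, u_j) * weight_m for one candidate user."""
    if len(signal_scores) != len(weights):
        raise ValueError("need exactly one weight per signal score")
    return sum(score * weight for score, weight in zip(signal_scores, weights))


# Hypothetical scores for Sim1..Sim7 and hypothetical weights (illustrative only).
scores = [2, 0.5, 3, 1, 0.25, 2, 3]
weights = [1, 2, 1, 1, 4, 1, 1]
total = total_similarity(scores, weights)
```

Because the signals are on different scales (counts versus fractions), the weights are what determine how much each signal contributes to the final ranking of candidate users.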
Name Formula Explanation
Signal 1 or Sim1: Followings and Followers Relationship Similarity
\( \text{Sim}_{\text{Relationship}}(u_{i}, u_{j}) = \begin{cases} 1 & \text{if the candidate user appears in one list} \\ 2 & \text{if the candidate user appears in two lists} \\ \vdots & \\ n + k & \text{if the candidate user appears in all lists} \end{cases} \)
n is the number of \( u_{i} \)'s followers; k is the number of \( u_{i} \)'s friends.
Signal 2 or Sim2: Mention Similarity
\( \text{Sim}_{\text{Mention}}(u_{i}, u_{j}) = \sum_{l=1}^{w} \frac{\text{twtsThrd}(l, u_{i}, u_{j})}{\text{twtsThrdTot}(l, u_{i})} \cdot \frac{1}{\text{accntsTwt}(l, u_{i})} \)
twtsThrd is a function that returns the number of \( u_{i} \)'s tweets in communication thread l with \( u_{j} \) that mention the account \( u_{j} \);
twtsThrdTot is a function that returns the total number of tweets in communication thread l;
accntsTwt is the total number of accounts in the tweets in thread l;
w is the total number of communication threads mentioning both \( u_{i} \) and \( u_{j} \).
Signal 3 or Sim3: Retweet Similarity
\( \text{Sim}_{\text{Retweet}}(u_{i}, u_{j}) = \text{numOfTwtsInRetwtList}(u_{i}, u_{j}) \)
numOfTwtsInRetwtList is the number of \( u_{j} \)'s tweets that \( u_{i} \) retweeted.
Signal 4 or Sim4: Favorite Similarity
\( \text{Sim}_{\text{Favorite}}(u_{i}, u_{j}) = \text{numOfTwtsInFavList}(u_{j}, u_{i}) \)
numOfTwtsInFavList is the number of \( u_{j} \)'s tweets that \( u_{i} \) favorited.
Signal 5 or Sim5: Common Hashtags Similarity
\( \text{Sim}_{\text{Hashtag}}(u_{i}, u_{j}) = \sum_{l=1}^{w} \frac{1}{1 + \text{HTOffset}(u_{i}, u_{j}, \text{HT}_{l})} \)
where
\( \text{HTOffset}(u_{i}, u_{j}, \text{HT}) = \left| \text{PT}(u_{i}, \text{HT}) - \text{PT}(u_{j}, \text{HT}) \right| + \left| \text{NT}(u_{i}, \text{HT}) - \text{NT}(u_{j}, \text{HT}) \right| + \left| \text{NTT}(u_{i}, \text{HT}) - \text{NTT}(u_{j}, \text{HT}) \right| \)
PT is a function that takes a user id and a hashtag HT and returns the number of the user's positive tweets in the hashtag;
NT is a function that takes a user id and a hashtag HT and returns the number of the user's negative tweets in the hashtag;
NTT is a function that takes a user id and a hashtag HT and returns the number of the user's neutral tweets in the hashtag;
w is the total number of hashtags that both \( u_{i} \) and \( u_{j} \) tweeted in.
Signal 6 or Sim6: Common Interests Similarity
\( \text{Sim}_{\text{Interests}}(u_{i}, u_{j}) = \text{count}\left( \text{ints}(u_{i}) \cap \text{ints}(u_{j}) \right) \)
ints is a function that takes a user id and returns the user's top 5 interests, obtained by performing topic analysis on the user's tweets.
Signal 7 or Sim7: Profile Similarity
\( \begin{aligned} \text{Sim}_{\text{Profile}}(u_{i}, u_{j}) = {} & \left[ \text{gender}(u_{i}) \text{ is equal to } \text{gender}(u_{j}) \right] \\ & + \left[ \text{language}(u_{i}) \text{ is equal to } \text{language}(u_{j}) \right] \\ & + \left[ \text{location}(u_{i}) \text{ is equal to } \text{location}(u_{j}) \right] \end{aligned} \)
gender is a function that takes a user id and returns the user's gender from the user's Twitter profile;
language is a function that takes a user id and returns the user's language from the user's profile;
location is a function that takes a user id and returns the user's location from the user's profile.
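To make the signal formulas concrete, here is a minimal Python sketch of signals 2, 5, 6 and 7 operating on precomputed inputs. It is an illustration under stated assumptions, not TSim's implementation: all function names, input shapes and sample data are hypothetical, sentiment labels and top-5 interests are assumed to have been computed elsewhere, each bracketed term in the profile formula is assumed to count as 1 when the equality holds and 0 otherwise, and a missing profile field is assumed never to match.

```python
from typing import Iterable, Mapping, Optional, Tuple

# (positive, negative, neutral) tweet counts of one user in one hashtag,
# i.e. the values PT, NT and NTT from the table. Hypothetical input shape.
SentimentCounts = Tuple[int, int, int]


def ht_offset(a: SentimentCounts, b: SentimentCounts) -> int:
    """HTOffset: sum of absolute differences of the PT, NT and NTT counts."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hashtag_similarity(counts_i: Mapping[str, SentimentCounts],
                       counts_j: Mapping[str, SentimentCounts]) -> float:
    """Signal 5: sum of 1 / (1 + HTOffset) over hashtags both users tweeted in."""
    common = counts_i.keys() & counts_j.keys()
    return sum(1 / (1 + ht_offset(counts_i[h], counts_j[h])) for h in common)


def mention_similarity(threads: Iterable[Tuple[int, int, int]]) -> float:
    """Signal 2: each thread is (twtsThrd, twtsThrdTot, accntsTwt)."""
    return sum((mentions / total) * (1 / accounts)
               for mentions, total, accounts in threads)


def interests_similarity(ints_i: Iterable[str], ints_j: Iterable[str]) -> int:
    """Signal 6: number of interests the two users have in common."""
    return len(set(ints_i) & set(ints_j))


def profile_similarity(profile_i: Mapping[str, Optional[str]],
                       profile_j: Mapping[str, Optional[str]]) -> int:
    """Signal 7: count of matching profile fields (assumption: missing != match)."""
    fields = ("gender", "language", "location")
    return sum(
        profile_i.get(f) is not None and profile_i.get(f) == profile_j.get(f)
        for f in fields
    )


# Hypothetical sample data for an examined user u_i and a candidate user u_j.
sim_hashtag = hashtag_similarity(
    {"#ai": (3, 1, 0), "#ml": (2, 0, 1)},
    {"#ai": (3, 1, 0), "#ml": (1, 0, 1), "#db": (5, 0, 0)},
)
sim_mention = mention_similarity([(2, 4, 2), (1, 2, 4)])
sim_interests = interests_similarity(["sports", "tech", "music"],
                                     ["tech", "music", "food"])
sim_profile = profile_similarity(
    {"gender": "f", "language": "en", "location": "London"},
    {"gender": "m", "language": "en", "location": "London"},
)
```

In this sample, identical sentiment counts in a shared hashtag contribute a full 1 to the hashtag score, while any difference in counts shrinks that hashtag's contribution toward 0; the interests and profile signals are bounded by 5 and 3 respectively.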