
Table 1 Total similarity formula and the formulas used for each signal

From: TSim: a system for discovering similar users on Twitter

The total similarity formula

\( \text{Sim}_{\text{Total}}(u_i, u_j) = \sum_{m=1}^{7} \left( \text{Sim}_m(u_i, u_j) \cdot \text{weight}_m \right) \)

where \( u_i \) is the examined user, the user for whom TSim tries to find similar users

\( u_j \) is the candidate user, the user whose similarity to \( u_i \) TSim is computing

\( \text{Sim}_m \) is the similarity score of signal m between \( u_i \) and \( u_j \); \( \text{Sim}_1 \) through \( \text{Sim}_7 \) are explained below

\( \text{weight}_m \) is the weight assigned to the score of signal m
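The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `total_similarity`, the example scores, and the example weights are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def total_similarity(signal_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of the per-signal similarity scores."""
    if len(signal_scores) != len(weights):
        raise ValueError("need exactly one weight per signal score")
    return sum(score * weight for score, weight in zip(signal_scores, weights))


# Hypothetical values for Sim_1..Sim_7 and weight_1..weight_7 (assumed, not from the paper).
scores = [3, 0.5, 2, 1, 0.75, 2, 3]
weights = [1, 2, 1, 1, 4, 1, 1]
total = total_similarity(scores, weights)
```

With these made-up numbers, signals 2 and 5 are given more weight than the others, so small raw scores on those signals still contribute noticeably to the total.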

Name

Formula

Explanation

Signal 1 or \( \text{Sim}_1 \)

Followings and Followers Relationship Similarity

\( \text{Sim}_{\text{Relationship}}(u_i, u_j) = \begin{cases} 1 & \text{if the candidate user appears in one list} \\ 2 & \text{if the candidate user appears in two lists} \\ \;\vdots & \\ n + k & \text{if the candidate user appears in all lists} \end{cases} \)

n is the number of \( u_i \)’s followers

k is the number of \( u_i \)’s friends (the accounts that \( u_i \) follows)
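Read this way, the relationship score is simply the number of lists in which the candidate appears. The sketch below assumes that each of the n followers and k friends contributes one list of accounts; the table does not say what each list contains, so the data and the function name `relationship_similarity` are hypothetical. Returning 0 when the candidate appears in no list is also an assumption, since the formula only defines scores from 1 upward.

```python
from typing import Hashable, Iterable, Set


def relationship_similarity(candidate: Hashable, lists: Iterable[Set[Hashable]]) -> int:
    """Count how many of the n + k lists contain the candidate user."""
    return sum(1 for members in lists if candidate in members)


# Hypothetical data: u_i has n = 2 followers and k = 1 friend, giving 3 lists.
lists = [
    {"alice", "bob"},   # list contributed by follower 1
    {"bob", "carol"},   # list contributed by follower 2
    {"alice", "dave"},  # list contributed by friend 1
]
score_bob = relationship_similarity("bob", lists)    # appears in two lists
score_erin = relationship_similarity("erin", lists)  # appears in no list (assumed score 0)
```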

Signal 2 or \( \text{Sim}_2 \)

Mention Similarity

\( \text{Sim}_{\text{Mention}}(u_i, u_j) = \sum_{l=1}^{w} \frac{\text{twtsThrd}(l, u_i, u_j)}{\text{twtsThrdTot}(l, u_i)} \cdot \frac{1}{\text{accntsTwt}(l, u_i)} \)

twtsThrd is a function that returns the number of \( u_i \)’s tweets in communication thread l with \( u_j \) that mention the account \( u_j \)

twtsThrdTot is a function that returns the total number of tweets in communication thread l

accntsTwt is a function that returns the total number of accounts appearing in the tweets of thread l

w is the total number of communication threads that mention both \( u_i \) and \( u_j \)
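As a rough illustration of the mention formula, the sketch below assumes the three per-thread quantities have already been counted. The `Thread` record and its field names are hypothetical stand-ins for twtsThrd, twtsThrdTot, and accntsTwt, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Thread:
    tweets_mentioning_candidate: int  # twtsThrd(l, u_i, u_j)
    total_tweets: int                 # twtsThrdTot(l, u_i)
    accounts_in_tweets: int           # accntsTwt(l, u_i)


def mention_similarity(threads: Iterable[Thread]) -> float:
    """Sum, over threads, the share of tweets mentioning u_j divided by the account count."""
    return sum(
        (t.tweets_mentioning_candidate / t.total_tweets) / t.accounts_in_tweets
        for t in threads
    )


# Hypothetical threads between u_i and u_j.
threads = [
    Thread(tweets_mentioning_candidate=2, total_tweets=4, accounts_in_tweets=2),
    Thread(tweets_mentioning_candidate=1, total_tweets=2, accounts_in_tweets=1),
]
score = mention_similarity(threads)  # (2/4) * (1/2) + (1/2) * (1/1)
```

The second factor means a thread shared with many accounts counts for less than a one-to-one exchange of the same length.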

Signal 3 or \( \text{Sim}_3 \)

Retweet Similarity

\( \text{Sim}_{\text{Retweet}}(u_i, u_j) = \text{numOfTwtsInRetwtList}(u_i, u_j) \)

numOfTwtsInRetwtList is the number of \( u_j \)’s tweets that \( u_i \) retweeted

Signal 4 or \( \text{Sim}_4 \)

Favorite Similarity

\( \text{Sim}_{\text{Favorite}}(u_i, u_j) = \text{numOfTwtsInFavList}(u_i, u_j) \)

numOfTwtsInFavList is the number of \( u_j \)’s tweets that \( u_i \) favorited
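Signals 3 and 4 are both plain counts, so one helper can illustrate the pair. The sketch assumes that the retweet list and the favorite list of the examined user are available as sequences of tweet author ids; the helper name `count_tweets_by` and the sample ids are hypothetical.

```python
from typing import Iterable


def count_tweets_by(author_ids: Iterable[str], candidate: str) -> int:
    """Count the tweets in a list whose author is the candidate user."""
    return sum(1 for author in author_ids if author == candidate)


# Hypothetical author ids of the tweets that u_i retweeted and favorited.
retweeted_authors = ["u_j", "u_x", "u_j", "u_y"]
favorited_authors = ["u_x", "u_j"]

sim_retweet = count_tweets_by(retweeted_authors, "u_j")
sim_favorite = count_tweets_by(favorited_authors, "u_j")
```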

Signal 5 or \( \text{Sim}_5 \)

Common Hashtags Similarity

\( \text{Sim}_{\text{Hashtag}}(u_i, u_j) = \sum_{l=1}^{w} \frac{1}{1 + \text{HTOffset}(u_i, u_j, \text{HT}_l)} \)

where

\( \text{HTOffset}(u_i, u_j, \text{HT}) = \left| \text{PT}(u_i, \text{HT}) - \text{PT}(u_j, \text{HT}) \right| + \left| \text{NT}(u_i, \text{HT}) - \text{NT}(u_j, \text{HT}) \right| + \left| \text{NTT}(u_i, \text{HT}) - \text{NTT}(u_j, \text{HT}) \right| \)

PT is a function that takes a user id and a hashtag HT and returns the number of the user’s positive tweets under that hashtag

NT is a function that takes a user id and a hashtag HT and returns the number of the user’s negative tweets under that hashtag

NTT is a function that takes a user id and a hashtag HT and returns the number of the user’s neutral tweets under that hashtag

w is the total number of hashtags that both \( u_i \) and \( u_j \) have tweeted under, and \( \text{HT}_l \) is the l-th of these hashtags
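A small sketch of the hashtag signal follows. It assumes the sentiment counts per hashtag have already been computed and are stored as (positive, negative, neutral) tuples, standing in for PT, NT, and NTT. The function names, the dictionary layout, and the sample counts are hypothetical.

```python
from typing import Dict, Tuple

# (positive, negative, neutral) tweet counts, i.e. (PT, NT, NTT) for one user and hashtag.
Counts = Tuple[int, int, int]


def ht_offset(counts_i: Counts, counts_j: Counts) -> int:
    """Sum of absolute differences between the two users' sentiment counts."""
    return sum(abs(a - b) for a, b in zip(counts_i, counts_j))


def hashtag_similarity(tags_i: Dict[str, Counts], tags_j: Dict[str, Counts]) -> float:
    """Sum 1 / (1 + HTOffset) over the hashtags that both users tweeted under."""
    shared = tags_i.keys() & tags_j.keys()
    return sum(1 / (1 + ht_offset(tags_i[tag], tags_j[tag])) for tag in shared)


# Hypothetical sentiment counts for u_i and u_j.
tags_i = {"#ai": (3, 1, 0), "#nba": (2, 0, 1), "#tea": (1, 0, 0)}
tags_j = {"#ai": (3, 1, 0), "#nba": (1, 0, 1)}
score = hashtag_similarity(tags_i, tags_j)
```

Here "#ai" has identical counts (offset 0) and contributes 1, "#nba" differs by one positive tweet (offset 1) and contributes 0.5, and "#tea" is ignored because only one user tweeted under it.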

Signal 6 or \( \text{Sim}_6 \)

Common Interests Similarity

\( \text{Sim}_{\text{Interests}}(u_i, u_j) = \text{count}\left( \text{ints}(u_i) \cap \text{ints}(u_j) \right) \)

ints is a function that takes a user id and returns the user’s top 5 interests, obtained by performing topic analysis on the user’s tweets
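Because each user has at most five interests, this score ranges from 0 to 5. The sketch below assumes the topic analysis has already produced the interest lists; the topic labels and the function name `interests_similarity` are hypothetical.

```python
from typing import Iterable


def interests_similarity(ints_i: Iterable[str], ints_j: Iterable[str]) -> int:
    """Count the interests that appear in both users' top-interest lists."""
    return len(set(ints_i) & set(ints_j))


# Hypothetical top 5 interests for u_i and u_j.
ints_i = ["sports", "politics", "technology", "music", "travel"]
ints_j = ["technology", "food", "music", "finance", "sports"]
score = interests_similarity(ints_i, ints_j)  # shared: sports, technology, music
```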

Signal 7 or \( \text{Sim}_7 \)

Profile Similarity

\( \begin{aligned} \text{Sim}_{\text{Profile}}(u_i, u_j) = {} & [\text{gender}(u_i) \text{ is equal to } \text{gender}(u_j)] \\ & + [\text{language}(u_i) \text{ is equal to } \text{language}(u_j)] \\ & + [\text{location}(u_i) \text{ is equal to } \text{location}(u_j)] \end{aligned} \)

gender is a function that takes a user id and returns the user’s gender from the user’s Twitter profile

language is a function that takes a user id and returns the user’s language from the user’s profile

location is a function that takes a user id and returns the user’s location from the user’s profile
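The profile formula can be read as a sum of three comparisons, assuming each bracketed test contributes 1 when it holds and 0 otherwise. The sketch below follows that reading and further assumes exact string equality and that all three fields are present in both profiles; the dictionary layout, the function name `profile_similarity`, and the sample profiles are hypothetical.

```python
from typing import Mapping

PROFILE_FIELDS = ("gender", "language", "location")


def profile_similarity(profile_i: Mapping[str, str], profile_j: Mapping[str, str]) -> int:
    """Count the profile fields whose values are identical for both users."""
    return sum(1 for field in PROFILE_FIELDS if profile_i[field] == profile_j[field])


# Hypothetical profiles for u_i and u_j.
profile_i = {"gender": "female", "language": "en", "location": "Riyadh"}
profile_j = {"gender": "male", "language": "en", "location": "Riyadh"}
score = profile_similarity(profile_i, profile_j)  # language and location match
```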