Table 2 Brief descriptions of the map and the reduce functions used in processing each signal

From: TSim: a system for discovering similar users on Twitter

MapReduce job

Map function

Reduce function

Signal 1: Followings and followers relationship similarity

Map function: takes the examined user's ID and each of his/her followings and followers, and emits one pair (follower/following user ID, “1”) per entry.

Reduce function: receives each follower/following user ID together with a list of “1”s, one for each list in which that user ID appeared. It adds up the “1”s and outputs the user ID with the sum, which is that user's score on this signal.
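The counting pattern of this job can be sketched in Python as follows. This is a minimal in-memory illustration, not the system's actual MapReduce code; the names `map_relationship`, `reduce_counts`, and `run_job`, the sample IDs, and the choice to model the input as one (examined ID, related ID) record per list entry are all assumptions made for the example.

```python
from collections import defaultdict

def map_relationship(examined_id, related_id):
    """Emit (related user ID, 1) for one following/follower entry."""
    yield related_id, 1

def reduce_counts(user_id, ones):
    """Sum the 1s emitted for a user ID to get its score on the signal."""
    return user_id, sum(ones)

def run_job(records, map_fn, reduce_fn):
    """Tiny in-memory stand-in for the shuffle step of MapReduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())

# Hypothetical input: user "u0" with a followings list and a followers list.
followings = ["a", "b"]
followers = ["b", "c"]
records = [("u0", uid) for uid in followings + followers]
scores = run_job(records, map_relationship, reduce_counts)
print(scores)
```

In this sketch, user "b" appears in both lists and therefore receives a score of 2, while "a" and "c" each receive 1.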

Signal 2: Mention similarity

Map function: takes the examined user's ID and each of his/her tweet threads. It extracts the user IDs mentioned in these tweets (those preceded by the @ symbol), calculates a score for each user ID using the formula in Table 1, and outputs each user ID along with its score.

Reduce function: receives each user ID mentioned in the examined user's tweets together with a set of scores, one for each thread in which that user was mentioned. It adds up these scores and outputs the mentioned user ID with the sum, which is that user's score on this signal.
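A rough Python sketch of this job is given below. The per-thread scoring formula from Table 1 is not reproduced here, so the sketch takes it as a parameter (`score_fn`) and plugs in a simple mention count as a placeholder; that placeholder, the function names, and the sample threads are hypothetical.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")

def map_mentions(examined_id, thread, score_fn):
    """Emit (mentioned user, score) for each user mentioned in one thread."""
    mentioned = MENTION.findall(" ".join(thread))
    for user in sorted(set(mentioned)):
        yield user, score_fn(user, mentioned)

def reduce_mentions(user, scores):
    """Sum the per-thread scores of one mentioned user."""
    return user, sum(scores)

def count_score(user, mentioned):
    # Placeholder only: NOT the Table 1 formula, just a mention count.
    return mentioned.count(user)

# Hypothetical threads of the examined user "u0"; each thread is a list of tweets.
threads = [
    ["@alice have you seen this?", "agree with @bob and @alice"],
    ["thanks @bob"],
]
grouped = defaultdict(list)  # stands in for the shuffle step
for thread in threads:
    for user, score in map_mentions("u0", thread, count_score):
        grouped[user].append(score)
mention_scores = dict(reduce_mentions(u, s) for u, s in grouped.items())
print(mention_scores)
```

Passing the scoring rule as a function keeps the map and reduce structure intact while leaving room to substitute the actual Table 1 formula.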

Signal 3: Retweet similarity

Map function: takes the examined user's ID and each of his/her retweets, and emits one pair (original tweeter user ID, “1”) per retweet.

Reduce function: receives each user ID whose tweets the examined user has retweeted, together with a list of “1”s, one for each time the examined user retweeted that user. It adds up the “1”s and outputs the retweeted user ID with the sum, which is that user's score on this signal.

Signal 4: Favorite similarity

Map function: takes the examined user's ID and each of his/her favorited tweets, and emits one pair (original tweeter user ID, “1”) per favorited tweet.

Reduce function: receives each user ID whose tweets the examined user has favorited, together with a list of “1”s, one for each time the examined user favorited a tweet by that user. It adds up the “1”s and outputs the favorited user ID with the sum, which is that user's score on this signal.

Signal 5: Common hashtags similarity

Map function: takes the candidate user's ID and each of his/her tweets that contain the hashtag symbol (#). It compares the sentiment of these tweets against the sentiment of the examined user's tweets on the same hashtag (obtained in preprocessing) using the formula in Table 1 (HTOffset), and emits (candidate ID, hashtag + score).

Reduce function: receives a candidate user ID and a list of (hashtag, score) pairs. It loops through this list and sums the scores that share the same hashtag, then applies the similarity formula in Table 1 to compute the final score for the candidate. It outputs the candidate ID and score.
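The reduce step of this job can be sketched as below. The similarity formula from Table 1 is not reproduced here, so it is passed in as `similarity_fn`; the mean used in the example is only a stand-in, and all names and sample values are hypothetical.

```python
from collections import defaultdict

def reduce_hashtags(candidate_id, hashtag_scores, similarity_fn):
    """Sum scores per hashtag, then turn the sums into one candidate score."""
    per_hashtag = defaultdict(float)
    for hashtag, score in hashtag_scores:
        per_hashtag[hashtag] += score
    return candidate_id, similarity_fn(dict(per_hashtag))

def mean_similarity(per_hashtag):
    # Placeholder only: NOT the Table 1 similarity formula.
    return sum(per_hashtag.values()) / len(per_hashtag)

# Hypothetical map output for candidate "c1": (hashtag, score) pairs.
pairs = [("#ai", 0.5), ("#ml", 0.25), ("#ai", 0.25)]
candidate, score = reduce_hashtags("c1", pairs, mean_similarity)
print(candidate, score)
```

Here the two "#ai" scores are first combined into 0.75, so the placeholder formula sees one value per hashtag rather than one per tweet.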

Signal 6: Common interests similarity

Map function: takes the candidate user's ID and a list of his/her tweets. It applies LDA to obtain the candidate's top 5 interests, computes a score by comparing them with the examined user's top 5 interests (obtained in preprocessing) according to the formula in Table 1, and emits (candidate ID, score).

Reduce function: simply passes its input through as output.

Signal 7: Profile similarity

Map function: takes the candidate user's ID and his/her profile information. It computes a score by comparing the profile with the examined user's gender, location, and language (obtained in preprocessing) according to the formula in Table 1, and emits (candidate ID, score).

Reduce function: simply passes its input through as output.

Mid and final MapReduce

Map function: takes the candidate user's ID along with his/her score on a signal, and emits (candidate ID, signal weight + score).

Reduce function: receives a candidate user ID and a list of (signal weight, score) pairs. It loops through this list and sums the scores that share the same weight. It then multiplies each summed score by its associated weight and adds up the weighted sums to produce the final score for that candidate. It outputs the candidate ID and score.
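The weighted combination in this final job can be sketched in Python as follows. The function names, the weights, and the scores are hypothetical, and the sketch assumes the map step is told the weight of the signal whose output it is processing.

```python
from collections import defaultdict

def map_final(candidate_id, score, weight):
    """Tag one per-signal score with the weight of its signal."""
    yield candidate_id, (weight, score)

def reduce_final(candidate_id, weighted_scores):
    """Sum scores per weight, then add up weight * summed score."""
    per_weight = defaultdict(float)
    for weight, score in weighted_scores:
        per_weight[weight] += score
    total = sum(weight * summed for weight, summed in per_weight.items())
    return candidate_id, total

# Hypothetical per-signal scores for candidate "c1" as (score, weight).
signal_outputs = [(4.0, 0.5), (2.0, 0.25), (2.0, 0.5)]
pairs = [
    value
    for score, weight in signal_outputs
    for _, value in map_final("c1", score, weight)
]
candidate, final = reduce_final("c1", pairs)
print(candidate, final)
```

With these sample values, the two scores sharing weight 0.5 are summed to 6.0 before weighting, giving 0.5 * 6.0 + 0.25 * 2.0 = 3.5.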