Some Statistics about data

User - Term Matrix:
  Users: 70021
  Terms: 55808
  Non Zeros: 23243200
  Rank for svd: 1000

Group - Term Matrix:
  Groups: 87700
  Terms: 55808
  Non Zeros: 75696630

Tag Sets used for query:
  Total tag sets: 642
  Avg no. of tags: 13.8

Database:
  Users: 84547
  Avg number of pics per user: 746
  Avg number of tag sets per user: 235

Number of tags in a tag-set:
  [1,2)      3.81%
  [2,4)      17.13%
  [4,8)      41.13%
  [8,16)     30.33%
  [16,32)    6.75%
  [32,64)    0.74%
  [64,2048)  0.10%

"tag-uniqueness" of users
On average, for a user:
  50% of her tags are used by less than 1% of the users
  10% of her tags are used by 1-2% of the users
  10% of her tags are used by 2-4% of the users
  10% of her tags are used by 4-8% of the users
  10% of her tags are used by 8-16% of the users
  08% of her tags are used by 16-32% of the users
  02% of her tags are used by 32-64% of the users

Query Selection
  Segment the users based on the number of tag sets they have. The range of a bucket is [2^i, 2^(i+1)), i = 0, 1, 2, ...
  Select 50 users from each bucket.
  Randomly select 1 tag set (with at least 10 tags) for each user selected above.

Query Expansion
  Input: a weighted list of initial tags (all weights equal by default) and a query expansion weight.
  Get all the tag sets for the given user.
  For each tag set, find the tags it has in common with the given tags. The sum of the weights of the common tags gives the weight of the tag set.
  Give a weight equal to the above weight to each tag in the tag set. The weight of a tag is the weight it accumulates over all the tag sets.
  Change the weight of the initial tags as weight = weight / (query expansion weight).
  The tags with non-zero weights will be referred to as search tags.

Evaluation
  For a given (user)-(tag set), consider the first two tags as the given tags.
  Remove the given tag set from the user's profile.
  Don't suggest the tags given in the query.
  Calculate precision-recall for the top 20 positions. Empty positions are counted as garbage output, i.e. as incorrect suggestions.
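The Query Expansion and Evaluation steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the code used for the experiments: the function names are made up, tag sets are modelled as Python sets, and the "relevant" tags for a query are assumed to be the remaining tags of the held-out tag set (the text does not spell this out).

```python
from collections import defaultdict


def expand_query(initial, tag_sets, expansion_weight=1.0):
    """Expand a weighted query against a user's tag sets.

    initial: dict tag -> weight (the given tags).
    tag_sets: iterable of sets of tags (the user's profile).
    Returns the search tags: dict tag -> accumulated weight (non-zero only).
    """
    weights = defaultdict(float)
    for tag_set in tag_sets:
        # Weight of a tag set = sum of the weights of the tags it shares with the query.
        set_weight = sum(w for tag, w in initial.items() if tag in tag_set)
        # Every tag in the tag set accumulates that weight.
        for tag in tag_set:
            weights[tag] += set_weight
    # Rescale the initial tags exactly as the text states: weight / expansion weight.
    for tag in initial:
        if tag in weights:
            weights[tag] /= expansion_weight
    return {tag: w for tag, w in weights.items() if w != 0}


def suggest(search_tags, given_tags):
    """Rank search tags by weight, never suggesting the tags given in the query."""
    candidates = [t for t in search_tags if t not in given_tags]
    # Ties are broken alphabetically (an assumption, to keep the output deterministic).
    return sorted(candidates, key=lambda t: (-search_tags[t], t))


def precision_recall(ranked, relevant, k=20):
    """Return [(precision, recall)] for positions 1..k.

    Positions beyond the end of `ranked` are empty and count as incorrect.
    """
    rows, hits = [], 0
    for pos in range(1, k + 1):
        if pos <= len(ranked) and ranked[pos - 1] in relevant:
            hits += 1
        recall = hits / len(relevant) if relevant else 0.0
        rows.append((hits / pos, recall))
    return rows
```

With a query expansion weight of 1 and `suggest` applied directly to the expanded query, this corresponds to the baseline (System 0) described below; averaging the per-position values over all queries would give tables of the kind shown in the Results sections.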
System -1 : What Flickr seems to do

Method
  Give out a ranked list of tags based on the frequency of tags in the user's own profile (query independent).

Results
Overall (position, avg precision, avg recall, n):
  01  0.47  0.04  614
  02  0.42  0.08  614
  03  0.37  0.11  614
  04  0.34  0.13  614
  05  0.32  0.15  614
  06  0.29  0.16  614
  07  0.27  0.18  614
  08  0.26  0.19  614
  09  0.24  0.20  614
  10  0.23  0.21  614
  11  0.22  0.22  614
  12  0.21  0.23  614
  13  0.20  0.24  614
  14  0.20  0.25  614
  15  0.19  0.25  614
  16  0.18  0.26  614
  17  0.18  0.27  614
  18  0.17  0.27  614
  19  0.16  0.28  614
  20  0.16  0.28  614

Pros
  Plain and simple. Works well many times.
Cons
  Gives out only the tags which are already present in the user's profile. Plus, it's query independent.

System 0 : Baseline system

Method
  Expand the query using the user's profile with a query expansion weight of 1.
  Give out a ranked list of tags based on the above expansion.

Results
Overall (position, avg precision, avg recall, n):
  01  0.56  0.05  639
  02  0.51  0.10  639
  03  0.47  0.14  639
  04  0.43  0.17  639
  05  0.40  0.19  639
  06  0.37  0.21  639
  07  0.35  0.23  639
  08  0.33  0.24  639
  09  0.31  0.26  639
  10  0.29  0.27  639
  11  0.27  0.28  639
  12  0.26  0.28  639
  13  0.25  0.29  639
  14  0.23  0.30  639
  15  0.22  0.30  639
  16  0.21  0.30  639
  17  0.20  0.31  639
  18  0.20  0.31  639
  19  0.19  0.32  639
  20  0.18  0.32  639

With User Study
Overall (position, avg precision, n):
  01  0.72  130
  02  0.68  130
  03  0.64  130
  04  0.59  130
  05  0.55  130
  06  0.52  130
  07  0.49  130
  08  0.46  130
  09  0.43  130
  10  0.41  130
  11  0.39  130
  12  0.37  130
  13  0.36  130
  14  0.34  130
  15  0.32  130
  16  0.31  130
  17  0.30  130
  18  0.29  130
  19  0.28  130
  20  0.27  130

With at least 4 tags in the tag set
Overall (position, avg precision, avg recall, n):
  01  0.42  0.09  653
  02  0.37  0.14  653
  03  0.32  0.18  653
  04  0.28  0.20  653
  05  0.25  0.22  653
  06  0.23  0.24  653
  07  0.21  0.25  653
  08  0.19  0.25  653
  09  0.18  0.26  653
  10  0.17  0.27  653
  11  0.16  0.27  653
  12  0.15  0.28  653
  13  0.14  0.28  653
  14  0.13  0.29  653
  15  0.13  0.29  653
  16  0.12  0.30  653
  17  0.11  0.30  653
  18  0.11  0.30  653
  19  0.11  0.31  653
  20  0.10  0.31  653

Pros
  High precision and recall.
Cons
  Gives out only the tags which are already present in the user's profile.

System 1 : Tag Similarity + User Similarity

Method
  Build a User-Term matrix where U_T(i,j) = IDF weight of tag j if user i has that tag in his profile, otherwise 0. Each user has L2 norm = 1.
  Expand the query using the user's profile with a query expansion weight of 0.8.
  Build a tag vector U using the user's profile. An entry in the vector is the IDF weight of the corresponding tag. L2 norm = 1.
  Compute user-user similarity as S = U_T * U
  Rank users according to the search tags: R = sum over all search tags (search tag weight * U_T(:,tagid))
  Join the above two criteria as F = S .* R
  Take the top 10 users. These will be referred to as suggested users.
  For each suggested user, do a query expansion using the search tags as the query and a query expansion weight of 1. This gives a weighted list of suggested tags from this user.
  Overall weight of a tag = sum over all suggested users (user weight * tag weight from this user as calculated above)
  Rerank the tag weights according to IDF as Tag weight = Tag weight / IDF weight
  Output a ranked list of tags.

Results
Overall (position, avg precision, avg recall, n):
  01  0.33  0.03  642
  02  0.30  0.06  642
  03  0.26  0.07  642
  04  0.24  0.09  642
  05  0.22  0.10  642
  06  0.20  0.11  642
  07  0.19  0.12  642
  08  0.18  0.13  642
  09  0.17  0.14  642
  10  0.17  0.15  642
  11  0.16  0.16  642
  12  0.15  0.16  642
  13  0.15  0.17  642
  14  0.14  0.18  642
  15  0.14  0.18  642
  16  0.13  0.19  642
  17  0.13  0.19  642
  18  0.12  0.20  642
  19  0.12  0.20  642
  20  0.12  0.21  642

With User Study
Overall (position, avg precision, n):
  01  0.56  130
  02  0.50  130
  03  0.44  130
  04  0.41  130
  05  0.37  130
  06  0.34  130
  07  0.32  130
  08  0.31  130
  09  0.30  130
  10  0.29  130
  11  0.28  130
  12  0.27  130
  13  0.26  130
  14  0.25  130
  15  0.24  130
  16  0.23  130
  17  0.22  130
  18  0.22  130
  19  0.21  130
  20  0.21  130

With at least 4 tags in the tag set
Overall (position, avg precision, avg recall, n):
  01  0.21  0.04  653
  02  0.17  0.06  653
  03  0.15  0.08  653
  04  0.14  0.09  653
  05  0.13  0.11  653
  06  0.12  0.12  653
  07  0.11  0.13  653
  08  0.11  0.14  653
  09  0.10  0.14  653
  10  0.09  0.15  653
  11  0.09  0.15  653
  12  0.09  0.16  653
  13  0.08  0.17  653
  14  0.08  0.17  653
  15  0.08  0.18  653
  16  0.07  0.18  653
  17  0.07  0.19  653
  18  0.07  0.19  653
  19  0.07  0.20  653
  20  0.07  0.20  653

When we take TF also in the user vector:
Overall (position, avg precision, avg recall, n):
  01  0.29  0.03  642
  02  0.25  0.05  642
  03  0.23  0.07  642
  04  0.22  0.08  642
  05  0.20  0.10  642
  06  0.19  0.11  642
  07  0.18  0.12  642
  08  0.17  0.13  642
  09  0.16  0.14  642
  10  0.16  0.15  642
  11  0.15  0.16  642
  12  0.14  0.16  642
  13  0.14  0.17  642
  14  0.13  0.17  642
  15  0.13  0.18  642
  16  0.12  0.18  642
  17  0.12  0.19  642
  18  0.11  0.19  642
  19  0.11  0.20  642
  20  0.11  0.20  642

Pros
  Suggests tags which the user might not have, based on similarity with other users.
Cons
  Low precision and recall. There could be a number of reasons for this. The first two tags are not always the most descriptive. Further, the suggested tags might still be relevant if a human judges the result.

System 2 : Tag Similarity + concept space User Similarity

Method
  Same as System 1 except that the user-user similarity is calculated in the concept space in the following way:
  Map the search tags to the concept space as QC = Q' * svd_V
  Map the user tag vector to the concept space as UC = U' * svd * svd_V
  Reweight the user concepts using the concept weights of the search tags: UC = UC .* QC
  Compute user-user similarity as S = svd_U * UC'

Results
Overall (position, avg precision, avg recall, n):
  01  0.19  0.02  642
  02  0.15  0.03  642
  03  0.13  0.04  642
  04  0.12  0.04  642
  05  0.11  0.05  642
  06  0.10  0.05  642
  07  0.09  0.06  642
  08  0.09  0.06  642
  09  0.09  0.07  642
  10  0.08  0.07  642
  11  0.08  0.08  642
  12  0.08  0.08  642
  13  0.07  0.08  642
  14  0.07  0.09  642
  15  0.07  0.09  642
  16  0.07  0.09  642
  17  0.06  0.09  642
  18  0.06  0.10  642
  19  0.06  0.10  642
  20  0.06  0.10  642

With TF:
Overall (position, avg precision, avg recall, n):
  01  0.29  0.03  642
  02  0.22  0.04  642
  03  0.18  0.05  642
  04  0.17  0.06  642
  05  0.15  0.07  642
  06  0.14  0.07  642
  07  0.13  0.08  642
  08  0.12  0.08  642
  09  0.11  0.09  642
  10  0.10  0.09  642
  11  0.10  0.10  642
  12  0.09  0.10  642
  13  0.09  0.10  642
  14  0.09  0.11  642
  15  0.08  0.11  642
  16  0.08  0.11  642
  17  0.08  0.12  642
  18  0.08  0.12  642
  19  0.07  0.12  642
  20  0.07  0.13  642

Pros
  A user might have different concepts in his/her profile. Instead of using the complete profile for computing the similarity with other users, give higher weight to concepts present in the query.
Cons
  Even lower precision and recall (totally unexpected, btw!)
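As a rough illustration of the user-ranking steps in Systems 1 and 2 (S, R, F = S .* R, top 10 users, and the concept-space variant), here is a minimal NumPy sketch. It is not the original implementation: dense arrays stand in for what would have to be sparse matrices at this scale, the function names are hypothetical, and the concept-space projection assumes a plain mapping through svd_V for both the query and the user vector, which may differ from the exact formula used above.

```python
import numpy as np


def l2_normalize_rows(m):
    """Scale every row to unit L2 norm (all-zero rows are left as they are)."""
    m = np.asarray(m, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def rank_users(U_T, u, search_weights, top_k=10):
    """System 1 style ranking of users.

    U_T: (users x terms) IDF-weighted, row-normalised user-term matrix.
    u: (terms,) normalised tag vector of the querying user.
    search_weights: (terms,) search-tag weights, zero for non-search tags.
    Returns the indices of the top_k users and their scores F.
    """
    S = U_T @ u                # user-user similarity, S = U_T * U
    R = U_T @ search_weights   # sum over search tags of weight * U_T(:, tag)
    F = S * R                  # elementwise product, F = S .* R
    order = np.argsort(-F)[:top_k]
    return order, F[order]


def concept_similarity(svd_U, svd_V, u, q):
    """System 2 style similarity in concept space.

    ASSUMPTION: both vectors are projected with svd_V only.
    svd_U: (users x k), svd_V: (terms x k), u and q: (terms,).
    """
    QC = q @ svd_V             # search tags mapped to concept space
    UC = (u @ svd_V) * QC      # user concepts reweighted by query concepts
    return svd_U @ UC          # one similarity value per user
```

For a toy matrix the factors could be obtained with `svd_U, s, Vt = np.linalg.svd(U_T, full_matrices=False)` and truncated to the first k columns of `svd_U` and `Vt.T`; for a matrix of the size reported above, a sparse truncated SVD would be needed instead.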
System 3 : Tag Similarity + Group Similarity

Method
  Same as System 1 except that we have a Group-Term matrix and we compute User-Group similarity instead of User-User similarity.

Results
Overall (position, avg precision, avg recall, n):
  01  0.35  0.03  642
  02  0.30  0.06  642
  03  0.28  0.08  642
  04  0.27  0.10  642
  05  0.24  0.11  642
  06  0.22  0.12  642
  07  0.21  0.14  642
  08  0.20  0.15  642
  09  0.19  0.16  642
  10  0.18  0.17  642
  11  0.17  0.17  642
  12  0.17  0.18  642
  13  0.16  0.19  642
  14  0.15  0.19  642
  15  0.15  0.20  642
  16  0.14  0.20  642
  17  0.14  0.21  642
  18  0.13  0.21  642
  19  0.13  0.22  642
  20  0.13  0.22  642

With at least 4 tags in the tag set
Overall (position, avg precision, avg recall, n):
  01  0.24  0.05  653
  02  0.21  0.08  653
  03  0.18  0.09  653
  04  0.16  0.11  653
  05  0.15  0.13  653
  06  0.13  0.14  653
  07  0.12  0.15  653
  08  0.12  0.15  653
  09  0.11  0.16  653
  10  0.10  0.17  653
  11  0.10  0.18  653
  12  0.09  0.18  653
  13  0.09  0.19  653
  14  0.09  0.20  653
  15  0.08  0.21  653
  16  0.08  0.21  653
  17  0.08  0.22  653
  18  0.08  0.22  653
  19  0.07  0.23  653
  20  0.07  0.23  653

With TF:
Overall (position, avg precision, avg recall, n):
  01  0.17  0.02  642
  02  0.13  0.02  642
  03  0.12  0.03  642
  04  0.11  0.04  642
  05  0.10  0.05  642
  06  0.09  0.05  642
  07  0.09  0.06  642
  08  0.08  0.06  642
  09  0.08  0.07  642
  10  0.08  0.07  642
  11  0.07  0.07  642
  12  0.07  0.08  642
  13  0.07  0.08  642
  14  0.06  0.08  642
  15  0.06  0.08  642
  16  0.06  0.09  642
  17  0.06  0.09  642
  18  0.05  0.09  642
  19  0.05  0.09  642
  20  0.05  0.09  642

System 4 : Tag Similarity + concept space Group Similarity

Method
  Same as System 3 except that we compute User-Group similarity in concept space, as was done in System 2.

Results
Overall (position, avg precision, avg recall, n):
  01  0.26  0.03  642
  02  0.22  0.04  642
  03  0.20  0.06  642
  04  0.19  0.07  642
  05  0.18  0.08  642
  06  0.17  0.09  642
  07  0.16  0.10  642
  08  0.15  0.11  642
  09  0.14  0.12  642
  10  0.14  0.13  642
  11  0.13  0.13  642
  12  0.13  0.14  642
  13  0.12  0.14  642
  14  0.12  0.15  642
  15  0.12  0.16  642
  16  0.11  0.16  642
  17  0.11  0.17  642
  18  0.11  0.17  642
  19  0.10  0.17  642
  20  0.10  0.18  642

With TF:
Overall (position, avg precision, avg recall, n):
  01  0.26  0.02  642
  02  0.21  0.04  642
  03  0.18  0.05  642
  04  0.16  0.06  642
  05  0.15  0.07  642
  06  0.14  0.07  642
  07  0.13  0.08  642
  08  0.12  0.09  642
  09  0.12  0.09  642
  10  0.11  0.10  642
  11  0.10  0.10  642
  12  0.10  0.10  642
  13  0.09  0.11  642
  14  0.09  0.11  642
  15  0.09  0.11  642
  16  0.08  0.12  642
  17  0.08  0.12  642
  18  0.08  0.12  642
  19  0.08  0.12  642
  20  0.07  0.13  642

System 0 + System 1 : Profile + Similar Users

Method
  Linear combination with different weights

Results
Overall (position, avg precision, avg recall, n):

With User Study
Overall (position, avg precision, n):

With at least 4 tags:
Overall (position, avg precision, avg recall, n):

System 0 + System 2 : Profile + concept space similar Users

Method
  Linear combination with different weights

Results
Overall (position, avg precision, avg recall, n):

System 0 + System 3 : Profile + similar Groups

Method
  Linear combination with different weights

Results
Overall (position, avg precision, avg recall, n):

With at least 4 tags:
Overall (position, avg precision, avg recall, n):

System 0 + System 4 : Profile + concept space Similar Groups

Method
  Linear combination with different weights

Results
Overall (position, avg precision, avg recall, n):
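
The combined systems are described only as a "linear combination with different weights", so the following is one possible reading rather than the method actually used. It assumes each system produces a tag -> score mapping, that scores are max-normalised before mixing (so the two systems are on a comparable scale), and that a single hypothetical parameter `alpha` sets the weight of the profile-based scores.

```python
def combine_scores(profile_scores, other_scores, alpha=0.5):
    """Rank tags by alpha * profile score + (1 - alpha) * other score.

    profile_scores: dict tag -> score from System 0 (profile expansion).
    other_scores: dict tag -> score from System 1, 2, 3 or 4.
    alpha: weight of the profile scores, between 0 and 1 (hypothetical parameter).
    """
    def normalise(scores):
        # Max-normalisation is an assumption; the text does not say how
        # scores from the two systems are made comparable.
        top = max(scores.values(), default=0.0)
        if top <= 0:
            return dict(scores)
        return {tag: s / top for tag, s in scores.items()}

    p = normalise(profile_scores)
    o = normalise(other_scores)
    combined = {
        tag: alpha * p.get(tag, 0.0) + (1 - alpha) * o.get(tag, 0.0)
        for tag in set(p) | set(o)
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda tag: (-combined[tag], tag))
```

Sweeping `alpha` over a few values (for example 0.2, 0.5, 0.8) and evaluating each ranking with the same precision-recall procedure would be one way to fill in results for "different weights".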