Flickr related stuff

Some Statistics about data

User - Term Matrix:
Users : 70021
Terms: 55808
Non Zeros : 23243197
Rank for svd : 1000

Group - Term Matrix
Groups : 87700
Terms : 55808
Non Zeros : 75696630

Tag Sets used for query:
Total tag sets: 500
Avg no. of tags: 13.8

Database:
Users: 84547
Avg number of pics per user : 746
Avg number of tag sets per user : 235

Number of tags in a tag-set
[1,2)        3.81%
[2,4)       17.13%
[4,8)       41.13%
[8,16)      30.33%
[16,32)      6.75%
[32,64)      0.74%
[64,2048)    0.10%

"tag-uniqueness" of users
On average, for a user:
50% of her tags are used by less than 1% of the users
10% of her tags are used by 1-2% of the users
10% of her tags are used by 2-4% of the users
10% of her tags are used by 4-8% of the users
10% of her tags are used by 8-16% of the users
8% of her tags are used by 16-32% of the users
2% of her tags are used by 32-64% of the users

Query Selection
Segment the users based on the number of tag sets they have.
The range of a bucket is [2^i , 2^(i+1)), i=0,1,2...
Select 50 users from each bucket
Randomly select 1 tag set (with at least 10 tags) for each user selected above
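
The selection steps above can be sketched in Python as follows. This is only an illustration: the function names, the `user_tag_sets` dictionary layout and the fixed random seed are assumptions, and so is the choice to drop users without any tag set of at least 10 tags before sampling (the notes do not say when that filter is applied).

```python
import random
from collections import defaultdict


def bucket_index(num_tag_sets):
    """Return i such that 2^i <= num_tag_sets < 2^(i+1), for num_tag_sets >= 1."""
    return num_tag_sets.bit_length() - 1


def select_queries(user_tag_sets, per_bucket=50, min_tags=10, seed=0):
    """user_tag_sets: dict mapping user id -> list of tag sets (each a list of tags).

    Returns a list of (user, tag_set) pairs, one per selected user.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for user, tag_sets in user_tag_sets.items():
        # Assumption: only users with at least one eligible tag set are candidates.
        if any(len(ts) >= min_tags for ts in tag_sets):
            buckets[bucket_index(len(tag_sets))].append(user)

    queries = []
    for i in sorted(buckets):
        users = sorted(buckets[i])
        # Take up to `per_bucket` users from this bucket.
        for user in rng.sample(users, min(per_bucket, len(users))):
            eligible = [ts for ts in user_tag_sets[user] if len(ts) >= min_tags]
            queries.append((user, rng.choice(eligible)))
    return queries
```

Using `int.bit_length` avoids floating-point rounding that `math.log2` could introduce at exact powers of two.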

Some Statistics about data

User - Term Matrix:
Users : 70021
Terms: 55808
Non Zeros : 23243200
Rank for svd : 1000

Tag Sets used for query:
Total tag sets: 642
Avg no. of tags: 13.8

Query Expansion
Input a weighted list of initial tags (all weights equal by default) and a query expansion weight
Get all the tag sets for the given user
For each tag set, compute the number of common tags with the given tags. The sum of weights of the common tags gives the weight of the tag set.
Give a weight equal to the above weight to each tag in the tag set.
The weight for a tag is the weight it accumulates over all the tag sets.
Change the weight of the initial tags as weight = weight / (query expansion weight)
The tags with non zero weights will be referred to as search tags.
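
A minimal Python sketch of the expansion steps above, under stated assumptions: the name `expand_query` and the data layout are illustrative, and the rescaling step is read as dividing the accumulated weight of each initial tag by the query expansion weight. The notes could also be read as rescaling the original input weight, so treat this as one possible interpretation.

```python
from collections import defaultdict


def expand_query(initial_tags, user_tag_sets, expansion_weight=1.0):
    """initial_tags: dict tag -> weight; user_tag_sets: iterable of tag collections.

    Returns a dict of search tags (tags with non-zero weight) -> weight.
    """
    weights = defaultdict(float)
    for tag_set in user_tag_sets:
        tags = set(tag_set)
        # Weight of a tag set = sum of the weights of the query tags it contains.
        set_weight = sum(w for t, w in initial_tags.items() if t in tags)
        if set_weight == 0:
            continue
        # Every tag in the set receives that weight; weights accumulate over sets.
        for t in tags:
            weights[t] += set_weight
    # Assumed reading: rescale the accumulated weight of the initial tags.
    for t in initial_tags:
        if t in weights:
            weights[t] /= expansion_weight
    return {t: w for t, w in weights.items() if w > 0}


profile = [["beach", "sunset", "sea"], ["beach", "sand"], ["city", "night"]]
search_tags = expand_query({"beach": 1.0, "sunset": 1.0}, profile)
# search_tags == {"beach": 3.0, "sunset": 2.0, "sea": 2.0, "sand": 1.0}
```

With an expansion weight below 1 (for example the 0.8 used in System 1), this reading boosts the initial tags relative to the expanded ones.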

Evaluation
For a given (user)-(tag set), consider the first two tags as the given tags
Remove the given tag set from the user's profile
Don't suggest the tags given in the query
Calculate precision-recall for top 20 positions. Empty positions are considered as garbage output.
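
The evaluation protocol can be sketched as below. It assumes that the relevant tags are the remaining tags of the held-out tag set (everything except the two given tags) and that the suggestion list contains no repeated tags; the function name is illustrative.

```python
def precision_recall_at_k(suggested, tag_set, max_k=20):
    """suggested: ranked list of tags; tag_set: the held-out tag set.

    Returns a list of (k, precision, recall) for k = 1..max_k.
    """
    given = set(tag_set[:2])             # the first two tags form the query
    relevant = set(tag_set) - given      # assumed ground truth
    # The given tags must not be suggested back to the user.
    ranked = [t for t in suggested if t not in given][:max_k]

    rows = []
    hits = 0
    for k in range(1, max_k + 1):
        # An empty position counts as a wrong suggestion: no hit, but k still grows.
        if k <= len(ranked) and ranked[k - 1] in relevant:
            hits += 1
        precision = hits / k
        recall = hits / len(relevant) if relevant else 0.0
        rows.append((k, precision, recall))
    return rows


rows = precision_recall_at_k(["a", "x", "b"], ["g1", "g2", "a", "b", "c", "d"])
# rows[0] == (1, 1.0, 0.25); rows[19] == (20, 0.1, 0.5)
```

Averaging the per-position precision and recall over all query tag sets gives tables in the format used in the Results sections.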

System -1 : What Flickr seems to do

Method
Give out a ranked list of tags based on the frequency of tags in the user's own profile (query independent)
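
As a rough illustration, a query-independent frequency ranking could look like this. The sketch assumes that "frequency" means the number of the user's tag sets that contain the tag; ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def rank_by_profile_frequency(user_tag_sets, top_n=20):
    """Rank a user's own tags by how many of her tag sets contain them."""
    counts = Counter(tag for tag_set in user_tag_sets for tag in set(tag_set))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ordered][:top_n]


ranking = rank_by_profile_frequency([["a", "b"], ["a", "c"], ["a", "b", "d"]])
# ranking == ["a", "b", "c", "d"]
```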

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.47     0.04 614
02 0.42     0.08 614
03 0.37     0.11 614
04 0.34     0.13 614
05 0.32     0.15 614
06 0.29     0.16 614
07 0.27     0.18 614
08 0.26     0.19 614
09 0.24     0.20 614
10 0.23     0.21 614
11 0.22     0.22 614
12 0.21     0.23 614
13 0.20     0.24 614
14 0.20     0.25 614
15 0.19     0.25 614
16 0.18     0.26 614
17 0.18     0.27 614
18 0.17     0.27 614
19 0.16     0.28 614
20 0.16     0.28 614

Pros
Plain and simple. Works well many times.

Cons
Gives out only the tags which are already present in the user's profile. It is also query independent.

System 0 : Baseline system

Method
Expand the query using user's profile with a query expansion weight of 1
Give out a ranked list of tags based on the above expansion

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.56     0.05 639
02 0.51     0.10 639
03 0.47     0.14 639
04 0.43     0.17 639
05 0.40     0.19 639
06 0.37     0.21 639
07 0.35     0.23 639
08 0.33     0.24 639
09 0.31     0.26 639
10 0.29     0.27 639
11 0.27     0.28 639
12 0.26     0.28 639
13 0.25     0.29 639
14 0.23     0.30 639
15 0.22     0.30 639
16 0.21     0.30 639
17 0.20     0.31 639
18 0.20     0.31 639
19 0.19     0.32 639
20 0.18     0.32 639

With User Study
Overall (position [tab] avg precision [tab] n):
01      0.72    130
02      0.68    130
03      0.64    130
04      0.59    130
05      0.55    130
06      0.52    130
07      0.49    130
08      0.46    130
09      0.43    130
10      0.41    130
11      0.39    130
12      0.37    130
13      0.36    130
14      0.34    130
15      0.32    130
16      0.31    130
17      0.30    130
18      0.29    130
19      0.28    130
20      0.27    130

With at least 4 tags in the tag set
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.42 0.09 653
02 0.37 0.14 653
03 0.32 0.18 653
04 0.28 0.20 653
05 0.25 0.22 653
06 0.23 0.24 653
07 0.21 0.25 653
08 0.19 0.25 653
09 0.18 0.26 653
10 0.17 0.27 653
11 0.16 0.27 653
12 0.15 0.28 653
13 0.14 0.28 653
14 0.13 0.29 653
15 0.13 0.29 653
16 0.12 0.30 653
17 0.11 0.30 653
18 0.11 0.30 653
19 0.11 0.31 653
20 0.10 0.31 653

Pros
High precision and recall

Cons
Gives out only the tags which are already present in the user's profile

System 1 : Tag Similarity + User Similarity

Method
Build User-Term matrix where U_T(i,j) = IDF weight of tag j if user i has that tag in his profile, otherwise 0. Each user has L2 norm = 1
Expand the query using user's profile with a query expansion weight of 0.8.
Build a tag vector using the user's profile. An entry in the vector is the IDF weight of the corresponding tag. L2 norm = 1
Compute user-user similarity as S = U_T*U
Rank users according to the search tags: R = sum over all tags (search tag weight * U_T(:,tagid))
Join the above two criteria as F = S .* R
Take the top 10 users. These will be referred to as suggested users.
For each suggested user, do a query expansion using search tags as the query and a query expansion weight of 1. This gives a weighted list of suggested tags from this user.
Overall weight of a tag = sum over all suggested users (user weight * tag weight from this user as calculated above)
Rerank the tag weights according to IDF as Tag weight = Tag weight / IDF weight
Output a ranked list of tags
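
The user-ranking and tag-aggregation steps above can be sketched in pure Python with sparse dictionaries. Several details are assumptions, not taken from the notes: IDF is computed as log(N / document frequency), the query user is excluded from the candidates, tags with zero IDF are skipped to avoid dividing by zero, and all function names are illustrative. The per-user tag weights passed to `combine_suggestions` stand in for the output of the query expansion step.

```python
import math
from collections import defaultdict


def idf_weights(profiles):
    """profiles: dict user -> set of tags. Assumed IDF = log(N / document frequency)."""
    df = defaultdict(int)
    for tags in profiles.values():
        for t in tags:
            df[t] += 1
    return {t: math.log(len(profiles) / c) for t, c in df.items()}


def unit_vector(tags, idf):
    """One row of the User-Term matrix: IDF weights scaled to L2 norm 1."""
    vec = {t: idf[t] for t in tags if idf.get(t, 0.0) > 0.0}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def rank_users(profiles, query_user, search_tags, top_n=10):
    """Return up to top_n (user, F) pairs where F = S * R."""
    idf = idf_weights(profiles)
    rows = {u: unit_vector(tags, idf) for u, tags in profiles.items()}
    q = rows[query_user]
    scores = {}
    for u, row in rows.items():
        if u == query_user:
            continue  # assumption: do not suggest the user to herself
        s = sum(w * q.get(t, 0.0) for t, w in row.items())            # S = U_T * U
        r = sum(w * row.get(t, 0.0) for t, w in search_tags.items())  # R
        scores[u] = s * r                                             # F = S .* R
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(u, f) for u, f in ranked[:top_n] if f > 0]


def combine_suggestions(suggested_users, per_user_tag_weights, idf):
    """Sum user weight * tag weight over suggested users, then divide by IDF."""
    total = defaultdict(float)
    for user, user_weight in suggested_users:
        for tag, w in per_user_tag_weights.get(user, {}).items():
            total[tag] += user_weight * w
    reranked = {t: w / idf[t] for t, w in total.items() if idf.get(t, 0.0) > 0.0}
    return sorted(reranked, key=lambda t: (-reranked[t], t))


profiles = {
    "me": {"beach", "sea"},
    "a": {"beach", "sea", "sand"},
    "b": {"city", "night"},
    "c": {"beach", "city"},
}
suggested_users = rank_users(profiles, "me", {"beach": 1.0, "sea": 1.0})
```

In this toy example user "a" ranks above "c", and "b" is dropped because it shares no tag with either the profile or the search tags.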

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.33     0.03 642
02 0.30     0.06 642
03 0.26     0.07 642
04 0.24     0.09 642
05 0.22     0.10 642
06 0.20     0.11 642
07 0.19     0.12 642
08 0.18     0.13 642
09 0.17     0.14 642
10 0.17     0.15 642
11 0.16     0.16 642
12 0.15     0.16 642
13 0.15     0.17 642
14 0.14     0.18 642
15 0.14     0.18 642
16 0.13     0.19 642
17 0.13     0.19 642
18 0.12     0.20 642
19 0.12     0.20 642
20 0.12     0.21 642

With user study
Overall (position [tab] avg precision [tab] n):
01      0.56    130
02      0.50    130
03      0.44    130
04      0.41    130
05      0.37    130
06      0.34    130
07      0.32    130
08      0.31    130
09      0.30    130
10      0.29    130
11      0.28    130
12      0.27    130
13      0.26    130
14      0.25    130
15      0.24    130
16      0.23    130
17      0.22    130
18      0.22    130
19      0.21    130
20      0.21    130

With at least 4 tags in the tag set
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.21 0.04 653
02 0.17 0.06 653
03 0.15 0.08 653
04 0.14 0.09 653
05 0.13 0.11 653
06 0.12 0.12 653
07 0.11 0.13 653
08 0.11 0.14 653
09 0.10 0.14 653
10 0.09 0.15 653
11 0.09 0.15 653
12 0.09 0.16 653
13 0.08 0.17 653
14 0.08 0.17 653
15 0.08 0.18 653
16 0.07 0.18 653
17 0.07 0.19 653
18 0.07 0.19 653
19 0.07 0.20 653
20 0.07 0.20 653

When term frequency (TF) is also included in the user vector:
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01      0.29    0.03    642
02      0.25    0.05    642
03      0.23    0.07    642
04      0.22    0.08    642
05      0.20    0.10    642
06      0.19    0.11    642
07      0.18    0.12    642
08      0.17    0.13    642
09      0.16    0.14    642
10      0.16    0.15    642
11      0.15    0.16    642
12      0.14    0.16    642
13      0.14    0.17    642
14      0.13    0.17    642
15      0.13    0.18    642
16      0.12    0.18    642
17      0.12    0.19    642
18      0.11    0.19    642
19      0.11    0.20    642
20      0.11    0.20    642

Pros
Suggest tags which the user might not have based on similarity with other users

Cons
Low precision and recall. There could be a number of reasons for this. The first two tags are not always the most descriptive. Further, the suggested tags might still be relevant if a human judges the result.

System 2 : Tag Similarity + concept space User Similarity

Method
Same as System 1 except that the user-user similarity is calculated in the concept space in the following way:
Map the search tags to the concept space as QC = Q' * svd_V
Map the user tag vector to the concept space as UC = U' * svd * svd_V
Reweight the user concepts using the concept weights of the search tag. UC = UC .* QC
Compute user-user similarity as S = svd_U * UC'

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.19     0.02 642
02 0.15     0.03 642
03 0.13     0.04 642
04 0.12     0.04 642
05 0.11     0.05 642
06 0.10     0.05 642
07 0.09     0.06 642
08 0.09     0.06 642
09 0.09     0.07 642
10 0.08     0.07 642
11 0.08     0.08 642
12 0.08     0.08 642
13 0.07     0.08 642
14 0.07     0.09 642
15 0.07     0.09 642
16 0.07     0.09 642
17 0.06     0.09 642
18 0.06     0.10 642
19 0.06     0.10 642
20 0.06     0.10 642

With TF:
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01      0.29    0.03    642
02      0.22    0.04    642
03      0.18    0.05    642
04      0.17    0.06    642
05      0.15    0.07    642
06      0.14    0.07    642
07      0.13    0.08    642
08      0.12    0.08    642
09      0.11    0.09    642
10      0.10    0.09    642
11      0.10    0.10    642
12      0.09    0.10    642
13      0.09    0.10    642
14      0.09    0.11    642
15      0.08    0.11    642
16      0.08    0.11    642
17      0.08    0.12    642
18      0.08    0.12    642
19      0.07    0.12    642
20      0.07    0.13    642

Pros
A user might have different concepts in his/her profile. Instead of using the complete profile for computing the similarity with other users, give higher weight to concepts present in the query.

Cons
Further lower precision and recall (totally unexpected, btw!)

System 3 : Tag Similarity + Group Similarity

Method
Same as System 1 except that we have a Group-Term matrix and we compute User-Group similarity instead of User-User similarity

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.35     0.03 642
02 0.30     0.06 642
03 0.28     0.08 642
04 0.27     0.10 642
05 0.24     0.11 642
06 0.22     0.12 642
07 0.21     0.14 642
08 0.20     0.15 642
09 0.19     0.16 642
10 0.18     0.17 642
11 0.17     0.17 642
12 0.17     0.18 642
13 0.16     0.19 642
14 0.15     0.19 642
15 0.15     0.20 642
16 0.14     0.20 642
17 0.14     0.21 642
18 0.13     0.21 642
19 0.13     0.22 642
20 0.13     0.22 642

With at least 4 tags in the tag set
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.24 0.05 653
02 0.21 0.08 653
03 0.18 0.09 653
04 0.16 0.11 653
05 0.15 0.13 653
06 0.13 0.14 653
07 0.12 0.15 653
08 0.12 0.15 653
09 0.11 0.16 653
10 0.10 0.17 653
11 0.10 0.18 653
12 0.09 0.18 653
13 0.09 0.19 653
14 0.09 0.20 653
15 0.08 0.21 653
16 0.08 0.21 653
17 0.08 0.22 653
18 0.08 0.22 653
19 0.07 0.23 653
20 0.07 0.23 653

With TF:
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01      0.17    0.02    642
02      0.13    0.02    642
03      0.12    0.03    642
04      0.11    0.04    642
05      0.10    0.05    642
06      0.09    0.05    642
07      0.09    0.06    642
08      0.08    0.06    642
09      0.08    0.07    642
10      0.08    0.07    642
11      0.07    0.07    642
12      0.07    0.08    642
13      0.07    0.08    642
14      0.06    0.08    642
15      0.06    0.08    642
16      0.06    0.09    642
17      0.06    0.09    642
18      0.05    0.09    642
19      0.05    0.09    642
20      0.05    0.09    642

System 4 : Tag Similarity + concept space Group Similarity

Method
Same as System 3 except that we compute User-Group similarity in concept space as was done in System 2

Results
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01 0.26     0.03 642
02 0.22     0.04 642
03 0.20     0.06 642
04 0.19     0.07 642
05 0.18     0.08 642
06 0.17     0.09 642
07 0.16     0.10 642
08 0.15     0.11 642
09 0.14     0.12 642
10 0.14     0.13 642
11 0.13     0.13 642
12 0.13     0.14 642
13 0.12     0.14 642
14 0.12     0.15 642
15 0.12     0.16 642
16 0.11     0.16 642
17 0.11     0.17 642
18 0.11     0.17 642
19 0.10     0.17 642
20 0.10     0.18 642

With TF:
Overall (position [tab] avg precision [tab] avg recall [tab] n):
01      0.26    0.02    642
02      0.21    0.04    642
03      0.18    0.05    642
04      0.16    0.06    642
05      0.15    0.07    642
06      0.14    0.07    642
07      0.13    0.08    642
08      0.12    0.09    642
09      0.12    0.09    642
10      0.11    0.10    642
11      0.10    0.10    642
12      0.10    0.10    642
13      0.09    0.11    642
14      0.09    0.11    642
15      0.09    0.11    642
16      0.08    0.12    642
17      0.08    0.12    642
18      0.08    0.12    642
19      0.08    0.12    642
20      0.07    0.13    642

System 0 + System 1 : Profile + Similar Users

Method
Linear combination with different weights
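
A minimal sketch of such a linear combination is given below. The notes do not say how the two systems' tag weights are scaled before mixing; this sketch assumes each weight list is normalised to sum to 1 first, and the function name and `alpha` parameter are illustrative.

```python
def combine_rankings(weights_a, weights_b, alpha, top_n=20):
    """Mix two tag -> weight dicts as alpha * A + (1 - alpha) * B and rank the tags."""

    def normalise(weights):
        total = sum(weights.values())
        return {t: v / total for t, v in weights.items()} if total else {}

    a, b = normalise(weights_a), normalise(weights_b)
    combined = {
        t: alpha * a.get(t, 0.0) + (1 - alpha) * b.get(t, 0.0)
        for t in set(a) | set(b)
    }
    # Highest combined weight first; ties broken alphabetically.
    return sorted(combined, key=lambda t: (-combined[t], t))[:top_n]


profile_tags = {"x": 3.0, "y": 1.0}   # e.g. System 0 output
similar_user_tags = {"z": 4.0}        # e.g. System 1 output
mostly_profile = combine_rankings(profile_tags, similar_user_tags, alpha=0.75)
# mostly_profile == ["x", "z", "y"]
```

With `alpha=0.75` the call corresponds to the "System 0 (75%) + System 1 (25%)" setting; the same function applies to the other system pairs.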

Results
Overall (position, then avg precision and avg recall for each weighting; n = 642 in every row). Columns give the System 0 % / System 1 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.49 0.05   0.57 0.06   0.59 0.06   0.59 0.06   0.59 0.06
02    0.46 0.09   0.52 0.10   0.53 0.10   0.54 0.10   0.54 0.10
03    0.43 0.12   0.48 0.14   0.49 0.14   0.50 0.14   0.50 0.14
04    0.40 0.15   0.45 0.17   0.46 0.17   0.46 0.18   0.46 0.18
05    0.38 0.18   0.42 0.20   0.43 0.20   0.43 0.20   0.43 0.20
06    0.36 0.20   0.39 0.22   0.39 0.22   0.40 0.22   0.40 0.22
07    0.33 0.22   0.36 0.24   0.37 0.24   0.37 0.24   0.37 0.24
08    0.31 0.23   0.34 0.25   0.34 0.26   0.35 0.26   0.35 0.26
09    0.30 0.25   0.32 0.26   0.32 0.27   0.33 0.27   0.33 0.27
10    0.28 0.26   0.30 0.27   0.30 0.28   0.31 0.28   0.31 0.28
11    0.26 0.26   0.28 0.29   0.29 0.29   0.29 0.29   0.29 0.29
12    0.25 0.27   0.27 0.29   0.27 0.30   0.28 0.30   0.28 0.30
13    0.24 0.28   0.26 0.30   0.26 0.31   0.26 0.31   0.26 0.31
14    0.23 0.29   0.24 0.31   0.25 0.31   0.25 0.32   0.25 0.32
15    0.22 0.30   0.23 0.31   0.24 0.32   0.24 0.32   0.24 0.32
16    0.21 0.30   0.22 0.32   0.23 0.32   0.23 0.33   0.23 0.33
17    0.20 0.31   0.21 0.32   0.22 0.33   0.22 0.33   0.22 0.33
18    0.19 0.31   0.21 0.33   0.21 0.34   0.21 0.34   0.21 0.34
19    0.19 0.32   0.20 0.33   0.20 0.34   0.20 0.34   0.20 0.34
20    0.18 0.32   0.19 0.34   0.19 0.34   0.20 0.35   0.20 0.35

With User Study
Overall (position, then avg precision for each weighting; n = 130 in every row). Columns give the System 0 % / System 1 % weights; - marks a weighting with no results listed:

pos   50/50   75/25   85/15   95/5    99/1
01    0.65    -       0.72    -       -
02    0.62    -       0.70    -       -
03    0.59    -       0.66    -       -
04    0.57    -       0.62    -       -
05    0.53    -       0.58    -       -
06    0.51    -       0.54    -       -
07    0.48    -       0.52    -       -
08    0.45    -       0.48    -       -
09    0.43    -       0.45    -       -
10    0.41    -       0.43    -       -
11    0.39    -       0.41    -       -
12    0.37    -       0.40    -       -
13    0.36    -       0.38    -       -
14    0.35    -       0.36    -       -
15    0.34    -       0.35    -       -
16    0.33    -       0.34    -       -
17    0.32    -       0.33    -       -
18    0.31    -       0.32    -       -
19    0.30    -       0.31    -       -
20    0.29    -       0.30    -       -

With at least 4 tags:
Overall (position, then avg precision and avg recall for each weighting; n = 653 in every row). Columns give the System 0 % / System 1 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.35 0.07   0.40 0.08   0.42 0.09   0.42 0.09   0.42 0.09
02    0.30 0.12   0.35 0.14   0.36 0.14   0.37 0.14   0.37 0.14
03    0.27 0.15   0.31 0.17   0.32 0.18   0.32 0.18   0.32 0.18
04    0.25 0.18   0.28 0.20   0.29 0.21   0.29 0.21   0.29 0.21
05    0.23 0.20   0.25 0.22   0.26 0.23   0.26 0.23   0.26 0.23
06    0.21 0.22   0.23 0.24   0.24 0.24   0.24 0.25   0.24 0.25
07    0.20 0.23   0.21 0.25   0.21 0.26   0.22 0.26   0.22 0.26
08    0.18 0.24   0.19 0.26   0.20 0.26   0.20 0.27   0.20 0.27
09    0.17 0.25   0.18 0.27   0.18 0.27   0.18 0.27   0.19 0.27
10    0.16 0.26   0.17 0.28   0.17 0.28   0.17 0.28   0.17 0.28
11    0.15 0.26   0.16 0.28   0.16 0.29   0.16 0.29   0.16 0.29
12    0.14 0.27   0.15 0.29   0.15 0.29   0.15 0.30   0.15 0.30
13    0.13 0.28   0.14 0.30   0.14 0.30   0.15 0.30   0.15 0.30
14    0.13 0.28   0.14 0.30   0.14 0.31   0.14 0.31   0.14 0.31
15    0.12 0.29   0.13 0.31   0.13 0.31   0.13 0.31   0.13 0.31
16    0.11 0.29   0.12 0.31   0.12 0.31   0.13 0.32   0.13 0.32
17    0.11 0.29   0.12 0.31   0.12 0.32   0.12 0.32   0.12 0.32
18    0.10 0.30   0.11 0.32   0.11 0.32   0.11 0.32   0.11 0.32
19    0.10 0.30   0.11 0.32   0.11 0.33   0.11 0.33   0.11 0.33
20    0.10 0.31   0.10 0.33   0.10 0.33   0.11 0.33   0.11 0.33

System 0 + System 2 : Profile + concept space similar Users

Method
Linear combination with different weights

Results
Overall (position, then avg precision and avg recall for each weighting; n = 642 in every row). Columns give the System 0 % / System 2 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.40 0.04   0.54 0.05   0.57 0.06   0.57 0.06   0.57 0.06
02    0.38 0.07   0.50 0.10   0.52 0.10   0.52 0.10   0.53 0.10
03    0.37 0.10   0.46 0.13   0.47 0.14   0.48 0.14   0.48 0.14
04    0.35 0.13   0.43 0.16   0.45 0.17   0.45 0.17   0.45 0.17
05    0.33 0.16   0.40 0.19   0.42 0.20   0.42 0.20   0.42 0.20
06    0.31 0.18   0.37 0.21   0.38 0.22   0.39 0.22   0.39 0.22
07    0.30 0.19   0.35 0.23   0.36 0.23   0.36 0.24   0.36 0.24
08    0.28 0.21   0.32 0.24   0.33 0.25   0.34 0.25   0.34 0.25
09    0.26 0.22   0.31 0.25   0.31 0.26   0.32 0.27   0.32 0.27
10    0.25 0.23   0.29 0.26   0.30 0.27   0.30 0.28   0.30 0.28
11    0.24 0.24   0.27 0.27   0.28 0.28   0.29 0.29   0.29 0.29
12    0.23 0.24   0.26 0.28   0.27 0.29   0.27 0.30   0.27 0.30
13    0.22 0.25   0.25 0.29   0.25 0.30   0.26 0.30   0.26 0.30
14    0.21 0.26   0.23 0.29   0.24 0.30   0.24 0.31   0.24 0.31
15    0.20 0.26   0.22 0.30   0.23 0.31   0.23 0.31   0.23 0.31
16    0.19 0.27   0.21 0.30   0.22 0.31   0.22 0.32   0.22 0.32
17    0.18 0.27   0.20 0.31   0.21 0.32   0.21 0.32   0.21 0.32
18    0.18 0.28   0.20 0.31   0.20 0.32   0.20 0.33   0.21 0.33
19    0.17 0.28   0.19 0.32   0.19 0.32   0.20 0.33   0.20 0.33
20    0.16 0.29   0.18 0.32   0.19 0.33   0.19 0.33   0.19 0.33

System 0 + System 3 : Profile + similar Groups

Method
Linear combination with different weights

Results
Overall (position, then avg precision and avg recall for each weighting; n = 642 in every row). Columns give the System 0 % / System 3 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.52 0.05   0.59 0.06   0.60 0.06   0.59 0.06   0.60 0.06
02    0.48 0.09   0.52 0.10   0.54 0.10   0.54 0.10   0.54 0.11
03    0.44 0.13   0.49 0.14   0.50 0.14   0.50 0.15   0.50 0.14
04    0.41 0.15   0.46 0.17   0.46 0.18   0.46 0.18   0.46 0.18
05    0.39 0.18   0.42 0.20   0.43 0.20   0.43 0.20   0.43 0.20
06    0.36 0.20   0.39 0.22   0.40 0.22   0.40 0.23   0.40 0.22
07    0.34 0.22   0.37 0.24   0.37 0.24   0.38 0.25   0.38 0.25
08    0.32 0.24   0.34 0.25   0.35 0.26   0.35 0.26   0.35 0.26
09    0.30 0.25   0.32 0.27   0.33 0.27   0.33 0.27   0.33 0.27
10    0.29 0.26   0.30 0.28   0.31 0.28   0.31 0.29   0.31 0.29
11    0.27 0.27   0.29 0.29   0.29 0.29   0.29 0.29   0.29 0.29
12    0.26 0.28   0.27 0.30   0.28 0.30   0.28 0.30   0.28 0.30
13    0.24 0.29   0.26 0.30   0.26 0.31   0.26 0.31   0.26 0.31
14    0.23 0.29   0.25 0.31   0.25 0.32   0.25 0.32   0.25 0.32
15    0.22 0.30   0.23 0.32   0.24 0.32   0.24 0.32   0.24 0.32
16    0.21 0.31   0.22 0.32   0.23 0.32   0.23 0.33   0.23 0.33
17    0.21 0.31   0.22 0.33   0.22 0.33   0.22 0.33   0.22 0.33
18    0.20 0.32   0.21 0.33   0.21 0.34   0.21 0.34   0.21 0.34
19    0.19 0.32   0.20 0.34   0.20 0.34   0.20 0.34   0.20 0.34
20    0.18 0.32   0.19 0.34   0.19 0.34   0.20 0.35   0.20 0.35

With at least 4 tags:
Overall (position, then avg precision and avg recall for each weighting; n = 653 in every row). Columns give the System 0 % / System 3 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.36 0.07   0.41 0.08   0.42 0.08   0.42 0.09   0.43 0.09
02    0.32 0.12   0.35 0.14   0.36 0.14   0.37 0.14   0.37 0.14
03    0.28 0.15   0.31 0.17   0.32 0.18   0.32 0.18   0.32 0.18
04    0.25 0.18   0.28 0.20   0.29 0.21   0.29 0.21   0.29 0.21
05    0.23 0.20   0.25 0.22   0.26 0.23   0.26 0.23   0.26 0.23
06    0.21 0.22   0.23 0.24   0.23 0.24   0.24 0.25   0.24 0.25
07    0.20 0.23   0.21 0.25   0.21 0.25   0.22 0.26   0.22 0.26
08    0.18 0.24   0.19 0.26   0.20 0.26   0.20 0.26   0.20 0.26
09    0.17 0.25   0.18 0.27   0.18 0.27   0.18 0.27   0.18 0.27
10    0.16 0.26   0.17 0.28   0.17 0.28   0.17 0.28   0.17 0.28
11    0.15 0.26   0.16 0.28   0.16 0.28   0.16 0.29   0.16 0.29
12    0.14 0.27   0.15 0.29   0.15 0.29   0.15 0.29   0.15 0.29
13    0.13 0.28   0.14 0.29   0.14 0.30   0.14 0.30   0.14 0.30
14    0.13 0.28   0.13 0.30   0.14 0.30   0.14 0.30   0.14 0.31
15    0.12 0.29   0.13 0.30   0.13 0.30   0.13 0.31   0.13 0.31
16    0.11 0.29   0.12 0.30   0.12 0.31   0.12 0.31   0.12 0.31
17    0.11 0.30   0.12 0.31   0.12 0.31   0.12 0.31   0.12 0.32
18    0.11 0.30   0.11 0.31   0.11 0.32   0.11 0.32   0.11 0.32
19    0.10 0.31   0.11 0.31   0.11 0.32   0.11 0.32   0.11 0.33
20    0.10 0.31   0.10 0.32   0.10 0.32   0.10 0.33   0.10 0.33

System 0 + System 4 : Profile + concept space Similar Groups

Method
Linear combination with different weights

Results
Overall (position, then avg precision and avg recall for each weighting; n = 642 in every row). Columns give the System 0 % / System 4 % weights:

pos   50/50       75/25       85/15       95/5        99/1
01    0.48 0.05   0.57 0.06   0.58 0.06   0.58 0.06   0.59 0.06
02    0.46 0.09   0.52 0.10   0.53 0.10   0.54 0.10   0.54 0.10
03    0.42 0.12   0.47 0.14   0.49 0.14   0.50 0.14   0.50 0.14
04    0.39 0.15   0.44 0.17   0.45 0.17   0.46 0.17   0.46 0.17
05    0.37 0.17   0.41 0.19   0.42 0.20   0.42 0.20   0.42 0.20
06    0.34 0.19   0.38 0.21   0.39 0.22   0.39 0.22   0.39 0.22
07    0.32 0.21   0.35 0.23   0.36 0.24   0.37 0.24   0.37 0.24
08    0.30 0.22   0.33 0.24   0.34 0.25   0.34 0.25   0.34 0.25
09    0.28 0.23   0.31 0.26   0.32 0.26   0.32 0.27   0.32 0.27
10    0.27 0.24   0.29 0.27   0.30 0.27   0.30 0.28   0.30 0.28
11    0.25 0.25   0.28 0.28   0.28 0.28   0.29 0.29   0.29 0.29
12    0.24 0.26   0.26 0.28   0.27 0.29   0.27 0.30   0.27 0.30
13    0.23 0.27   0.25 0.29   0.25 0.30   0.26 0.30   0.26 0.30
14    0.22 0.27   0.24 0.30   0.24 0.31   0.24 0.31   0.24 0.31
15    0.21 0.28   0.22 0.30   0.23 0.31   0.23 0.31   0.23 0.31
16    0.20 0.28   0.21 0.31   0.22 0.31   0.22 0.32   0.22 0.32
17    0.19 0.29   0.21 0.31   0.21 0.32   0.21 0.32   0.21 0.32
18    0.18 0.29   0.20 0.32   0.20 0.32   0.21 0.33   0.21 0.33
19    0.18 0.30   0.19 0.32   0.19 0.33   0.20 0.33   0.20 0.33
20    0.17 0.30   0.18 0.32   0.19 0.33   0.19 0.34   0.19 0.34