International Conference on Computer Systems and Technologies - CompSysTech’07
Analysis and experimental study of an algorithm for automatic
assignment of reviewers to papers
Yordan Kalmukov
Abstract: This paper is a continuation of the one entitled “An algorithm for automatic assignment of
reviewers to papers” [1] published in CompSysTech 2006 Conference Proceedings. The main aim of the
present paper is to outline the results of the analysis and experimental study of the suggested algorithm [1].
It has been compared in terms of accuracy and running time to the maximum-weighted matching algorithm of
Kuhn and Munkres (also known as the Hungarian Algorithm) implemented in The MyReview System [3].
Key words: automatic assignment of reviewers to papers, conference management system, the
assignment problem, matching in bipartite graphs.
INTRODUCTION
During the last several years, conference management systems have become
widely used for organizing scientific conferences. Among the major reasons for that is
their functionality for automatic assignment of reviewers to papers. In case of a small
number of submitted papers, manual assignment is still acceptable. However, when the
number of papers and reviewers grows, manual assignment becomes harder and more
error-prone. It is impossible for the Programme Committee (PC) chairs to know all authors,
reviewers and their fields of research. Even if they had all that information, PC chairs are
still not likely to enumerate all possible <paper, reviewer> pairs and then find
the most accurate combination. At this stage, using a conference management system and its
functionality for automatic assignment is not just helpful but essential.
An accurate automatic assignment can be made only if reviewers provide information
about their interests (competences) and authors describe their papers. In the next
paragraphs "PC members" and "reviewers" are used as synonyms, although it is arguable
whether this is correct in principle.
There are various ways of stating competences and describing papers. One of them
is the "keywords mechanism". Authors and reviewers are required to select keywords
(usually from a set previously defined by the PC chairs) that best describe
their papers and competences, respectively. Then a compatibility (similarity) factor between the i-th
paper and the j-th reviewer can easily be calculated by applying some basic set operations to
the keywords selected by the author of the i-th paper and by the j-th reviewer. So, instead
of browsing and rating all submitted papers, PC members just specify their interests and
competences. The disadvantage of the keywords mechanism is that the common set of all
keywords has to be both finite (with a reasonable number of elements, 20 to 30) and
complete with respect to the area of science. This is not easily achievable. Situations where
an author cannot find any keyword relevant to his/her paper should be avoided, as they
result in lower accuracy.
Another way of expressing interests is by using conference topics together with an
Iterative Rating Method [2], as used by Philippe Rigaux in The MyReview System [3]. The
idea here is to suggest a small set of papers (based on the conference topics) to each PC
member. He/she rates the papers from this set, and a collaborative filtering
algorithm is run to obtain a new set of papers to be suggested for either further rating
or assignment to that PC member. The collaborative filtering algorithm predicts
the compatibility factors between reviewers and the papers they have never rated.
In this way PC members are not required to rate all submitted papers but
just a small number of them. The disadvantage of this method is that, to achieve
sufficiently high accuracy, PC members have to rate papers several times.
These two ways of expressing reviewers' interests and preferences have one thing
in common: both compute a compatibility (similarity) factor
for each <paper, reviewer> pair. Thus a matrix of similarity factors can be built and used
as input for the assignment algorithm.
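As an illustration of how such a matrix can be produced from keyword sets, here is a minimal Python sketch. It assumes the Jaccard-style similarity factor of formula (19) given later in this paper; the data and function names are illustrative and do not come from [1] or [6].

```python
# Minimal sketch: building the matrix of similarity factors from keyword
# sets, using the similarity measure of formula (19). Illustrative only.

def similarity(paper_keywords: set, reviewer_keywords: set) -> float:
    """Shared keywords divided by all distinct keywords of the pair."""
    union = paper_keywords | reviewer_keywords
    if not union:
        return 0.0                      # no keywords selected at all
    return len(paper_keywords & reviewer_keywords) / len(union)

def build_weight_matrix(papers: list, reviewers: list) -> list:
    """papers, reviewers: lists of keyword sets. Returns a P x R matrix."""
    return [[similarity(p, r) for r in reviewers] for p in papers]

papers = [{"databases", "sql"}, {"graphs", "matching"}]
reviewers = [{"databases", "xml"}, {"matching", "graphs", "complexity"}]
print(build_weight_matrix(papers, reviewers))
# [[0.333..., 0.0], [0.0, 0.666...]]
```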
This paper is a continuation of the one entitled "An algorithm for automatic
assignment of reviewers to papers" [1], published in the CompSysTech 2006 Conference
Proceedings. The main aim of the present paper is to outline the results of the analysis
and experimental study of the algorithm suggested in [1]. The latter has been
compared in terms of accuracy and running time to the maximum-weighted matching
algorithm of Kuhn and Munkres (also known as the Hungarian Algorithm) implemented in
The MyReview System [3, 2]. The paper describing the algorithm [1] can be downloaded
from the CompSysTech virtual library; please see the references section.
An algorithm for automatic assignment of reviewers to papers – features
The suggested assignment algorithm [1] is independent of the way similarity
factors are calculated (by using keywords or iterative rating). However, it has been used and
tested with keywords only. It:
• Provides a uniform distribution of papers to reviewers (if possible), i.e. all
reviewers evaluate an equal number of papers.
• Guarantees that if a paper has at least one keyword in common with a reviewer,
then the paper will have a reviewer assigned to it (feasible if the number of papers
that can be evaluated by one particular reviewer only does not exceed the maximum
allowed number of papers per reviewer).
• Runs many times faster than similar algorithms.
At the end of the algorithm each paper will have 1 or 0 reviewers assigned to it. If
papers have to be evaluated by more than one PC member (as they usually do), the
algorithm is repeated as many times as needed. If the number of papers m to be assigned
to each reviewer (calculated by formula (1) in [1]) is not an integer but a real
value, the algorithm rounds it down to the nearest integer (this operation is denoted
floor(m)) and performs n iterations, where n is the number of reviewers per paper. After
that all reviewers will have floor(m) papers to evaluate, and there will be papers with
not enough reviewers (i.e. fewer than n, or more precisely n-1). The algorithm then sets the
maximum allowed number of papers per reviewer to 1 and performs one more iteration. In
this way it provides a uniform distribution of papers to reviewers, and the number of papers
assigned to each reviewer will be either floor(m) or floor(m)+1. However, such a uniform
distribution is not always achievable. It is guaranteed that
nobody will be given more than floor(m)+1 papers, but if a PC member is competent to
review only k papers, where k < floor(m), then just k papers are given to him/her. As a
result the number of reviewers evaluating floor(m)+1 papers increases (if possible) in order
to keep the original number of assignments that have to be made. If this is not possible,
some papers may be left with fewer than n reviewers. In this situation no "blind" assignments
are made. Instead, the PC chair is warned that there are not enough reviewers competent
to evaluate the specified papers, so he/she can manually assign reviewers to them.
Fortunately such situations occur rarely.
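The repetition scheme just described can be outlined as follows. This is a schematic sketch only: assign_one_round is a placeholder for one complete run of the algorithm of [1], whose matching logic is not reproduced here.

```python
import math

def assign_one_round(max_papers_per_reviewer: int) -> None:
    """Placeholder for one run of the algorithm of [1]: each paper gains
    at most one additional reviewer, subject to the given maximum allowed
    number of papers per reviewer."""
    pass  # the actual matching logic of [1] is not reproduced here

def assign_all(P: int, R: int, n: int) -> None:
    """Schematic outline of the repetition scheme described above."""
    m = P * n / R                       # papers per reviewer, formula (1) in [1]
    for _ in range(n):                  # n iterations with the load rounded down
        assign_one_round(math.floor(m))
    if m != math.floor(m):              # non-integer load: one extra iteration
        assign_one_round(1)             # with the per-reviewer maximum set to 1
    # Every reviewer now holds floor(m) or floor(m)+1 papers, unless
    # competence limits force fewer (in which case the PC chair is warned).

assign_all(P=175, R=100, n=2)           # m = 3.5: two rounds, then the extra one
```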
The second feature of the algorithm is provided by the C1 correction (calculated by
formulas (4), (5) and (6) in [1]) used to modify the similarity factors in the first row of
the weight matrix [1]. Although the algorithm is stated to guarantee that if a paper has at
least one keyword in common with a reviewer then the paper will have a reviewer
assigned to it, this cannot always be guaranteed. Imagine a situation where n papers
can be evaluated by "reviewer 1" only. He is the only one competent to review them, but
the maximum allowed number of papers per reviewer is n-1. Then at least one paper will not
be assigned to any reviewer, and this is unavoidable. No algorithm can handle
this situation without assigning an incompatible reviewer to that paper.
The third feature, together with an accuracy analysis, is a primary topic of this paper
and will be discussed in detail in the next sections.
Analysis and experimental study
The algorithm can be divided into 4 main steps (figures 1 and 2 in [1]). During the
first step it builds a weight matrix (a matrix of similarity factors). The algorithm then iterates
through steps 2 and 3, finding a solution of the assignment problem. Once the solution is
found, the algorithm goes to step 4 and stores the solution within the relevant data
structures, PS and RS [1]. At the end of the fourth step, 1 or 0 reviewers will be assigned to
each paper.
The number of iterations through steps 2 and 3 (CI1) does not depend on the number
of submitted papers and registered PC members. It depends on the distribution of
reviewers' competences over papers' thematic fields. Its value varies depending on whether
many reviewers have an interest in a small group of papers, or a small number of PC
members are the most competent reviewers for a large number of papers, etc. Over
dozens of experiments its value has been observed to vary from 1 to 12. Thus for this
analysis CI1 can be assumed to be a constant. If a small number of reviewers dominate
the first several rows of the weight matrix, then CI1 will be high.
Let T(a) denote the overall execution time of the algorithm, and T(si) the execution
time of step i, where i ∈ [1, 4]. If each paper has to be evaluated by several reviewers,
then T(a) is calculated as:

T(a) = T(s1) + k[CI1(T(s2) + T(s3)) + T(s4)]

where:
k – number of repetitions of the algorithm (= reviewers per paper + 1);
CI1 – number of iterations through steps 2 and 3.
The execution time of step 1 (figure 1 in [1]) is

T(s1) = C11*P*R + C12*P*SORT(R)

where:
C11 – the time needed to calculate the similarity factor between the i-th paper and the j-th reviewer;
C12 – constant factor of the sorting algorithm.

If the sorting algorithm's worst case is O(n lg n), then

T(s1) = C11*P*R + C12*P*R*lg(R)    (1)

Obviously the highest-order term is C12*P*R*lg(R); however, it has been experimentally
observed that if the number of reviewers is less than 1000 (which seems to be true for all
conferences), then

C11 >> C12*lg(R)    (2)
So the whole term C12*P*R*lg(R) has an insignificant influence on T(s1) and can
simply be ignored (or absorbed by the first term): even for R = 1000, lg(R) is only about 10,
while many operations stand behind the constant C11 – not just calculating the similarity
factor between Pi and Rj, but also detecting various conflict situations (the author and the
reviewer being from the same institution or country; the reviewer to be assigned to a paper
being a co-author of that paper; and others). Some of these operations may involve additional
SQL queries and regular expressions.
1000 reviewers is not actually a threshold value, so (2) may remain true even if the
number of reviewers exceeds a thousand. The value 1000 is mentioned because it is the
largest value used in the experiments. The experimental results fully confirm this statement;
take a look at table 2. The weight matrix used by the suggested algorithm and the one used
by the maximum-weighted matching algorithm are built by almost the same programming
code. The only difference is that the columns of the matrix used by the suggested
algorithm are sorted.
After ignoring the second term of (1), it can finally be written that

T(s1) ∈ O(P*R)    (3)
As shown in figure 2 in [1], the algorithm has to remove the similarity factors between
each paper and all reviewers specified in the reviewersToRemove structure, so it goes P
times through reviewersToRemove. The size of reviewersToRemove cannot exceed
P/papersPerReviewer, where papersPerReviewer is calculated by formula (1) in [1]. Hence:

$$T(s_2) = C_{21}\,P + C_{22}\,P \cdot \frac{P}{P \cdot reviewersPerPaper / R}$$

The fraction simplifies to R/reviewersPerPaper, and as reviewersPerPaper is a constant
defined by the PC chair (usually 2 or 3), it can be absorbed by C22. So,

T(s2) = C21*P + C22*P*R    (4)

Thus:

T(s2) ∈ O(P*R)    (5)
The execution time of step 3 is:

$$T(s_3) = C_{31}\,\frac{R}{CI_2}\,\mathrm{SORT}\!\left(\frac{P \cdot CI_2}{R}\right) + C_{32}\,\frac{R}{CI_2}$$
where CI2 does not depend on the number of papers and reviewers, but on the distribution
of reviewers' competences over papers' thematic fields. The more reviewers appear in the
first row of the weight matrix, the smaller CI2 is; how many reviewers appear in the first
row of the matrix depends on the keywords chosen by authors and reviewers.
The value of CI2 cannot be controlled or predicted, as it depends on users' preferences,
but experiments show that it varies within a small range. Thus it can be assumed to be a
constant, absorbed by C31 and C32:
$$T(s_3) = C_{31}\,R\,\frac{P}{R}\,\lg\!\left(\frac{P}{R}\right) + C_{32}\,R \qquad (6)$$

P and R are always interconnected in such a way that each reviewer evaluates a
reasonable number of papers; usually P/R ∈ [1, 3], so P and R cancel each other inside
the logarithm. Then T(s3) is transformed to:

T(s3) = C31*P + C32*R    (7)

So

T(s3) ∈ O(P)    (8)
Step 4 is the easiest one to analyze. From figure 2 in [1] it is obvious
that

T(s4) ∈ O(P)    (9)

Then

T(a) ∈ MAX(O(P*R), O(P*R), O(P), O(P)), or

T(a) ∈ O(P*R)    (10)
Experimental study
The algorithm's complexity can be experimentally confirmed by performing a regression
analysis. For lack of space, only the most essential points of the analysis are outlined
here rather than the whole analysis in detail.
Let's assume the algorithm is a "black box" with 2 independent (controlled)
variables as input (the number of papers and the number of reviewers) and
one dependent (response) variable as output: the execution time. The aim of this
analysis is to find a mathematical model (regression equation) that shows how the
execution time (denoted Y) depends on the number of papers (denoted x1) and the
number of reviewers (denoted x2). The uncertainty in the value of Y is caused by
random variables that also influence the execution time (the distribution of reviewers'
interests over papers' thematic fields, the server utilization, the operating system, etc.).
Although x1 and x2 are somehow interconnected (so that every reviewer evaluates a
reasonable number of papers), there is no equation that can be used to calculate x1
from a given x2, or x2 from a given x1. For the algorithm, x1 and x2 are linearly
independent: changing the level of x1 will not change the level of x2, and vice versa.
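The measurement setup can be pictured as a small harness around this black box. The sketch below uses hypothetical names: run_assignment stands for one complete run of the algorithm on generated test data of the given size.

```python
import time

def run_assignment(num_papers: int, num_reviewers: int) -> None:
    """Placeholder: one complete run of the algorithm on randomly
    generated test data of the given size (hypothetical)."""
    pass

def measure(x1: int, x2: int, repetitions: int = 3) -> list:
    """Black-box view: set the controlled variables x1 (papers) and
    x2 (reviewers), record the response variable Y (execution time)."""
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_assignment(x1, x2)
        times.append(time.perf_counter() - start)
    return times

print(measure(150, 100))   # one cell of the experiment plan, repeated 3 times
```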
Based on the complexity analysis just made and the theory in [4], it is estimated that the
regression model that can best fit the data looks like:

ŷ = b0 + b1x1 + b2x2 + b12x1x2    (11)

However, a second-order polynomial model (12) will be used for this experiment:

ŷ = b0 + b1x1 + b2x2 + b12x1x2 + b11x1² + b22x2²    (12)

It is expected that both b11 and b22 will be zeros, so eventually (12) will be reduced to (11).
A B2 plan of experimental study [4] and the method of least squares are used for
finding the regression parameters. According to the B2 plan, 3 different levels should be
chosen for each independent variable. These levels are then coded and take part in the
so-called "experiment matrix" (X). The levels are: 0 – basic level (xib); +1 – high level (xih);
-1 – low level (xil).
The most frequently observed values are usually taken as basic levels. In case of
scientific conferences the most common values are 150 papers and 100 reviewers, so
x1 basic = 150, x2 basic = 100. As specified in [4], the high and low levels can be calculated by
the following equation:

(0.1 ÷ 0.2)(xi max – xi min) = xi high – xi basic = xi basic – xi low    (13)

Let's assume that in most cases x1 ∈ [0, 500] and x2 ∈ [0, 300].
Then x1 max = 500; x1 min = 0; x2 max = 300; x2 min = 0;

x1 high = 250; x1 basic = 150; x1 low = 50; x2 high = 160; x2 basic = 100; x2 low = 40;    (14)
Or, in coded form: xih = +1; xib = 0; xil = -1.
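The coding is the usual linear transformation x̃ = (x − x basic) / (x high − x basic), which maps the basic level to 0 and the high and low levels to +1 and −1. A quick check against the levels in (14):

```python
def code_level(x: float, basic: float, high: float) -> float:
    """Map a natural factor level to its coded value:
    basic -> 0, high -> +1, low (= 2*basic - high) -> -1."""
    return (x - basic) / (high - basic)

# Levels from (14): x1 in {50, 150, 250}, x2 in {40, 100, 160}
print([code_level(x, 150, 250) for x in (50, 150, 250)])   # [-1.0, 0.0, 1.0]
print([code_level(x, 100, 160) for x in (40, 100, 160)])   # [-1.0, 0.0, 1.0]
```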
Table 1

| N | Group  | X0 | X1 | X1 real | X2 | X2 real | X1X2 | X1² | X2² | y(1)    | y(2)    | y(3)    | Ȳ       | Sj²     |
|---|--------|----|----|---------|----|---------|------|-----|-----|---------|---------|---------|---------|---------|
| 1 | Core   | 1  | 1  | 250     | 1  | 160     | 1    | 1   | 1   | 21.5887 | 21.7964 | 20.9226 | 21.4359 | 0.20839 |
| 2 | Core   | 1  | -1 | 50      | 1  | 160     | -1   | 1   | 1   | 5.1228  | 4.9775  | 5.0108  | 5.03703 | 0.00579 |
| 3 | Core   | 1  | 1  | 250     | -1 | 40      | -1   | 1   | 1   | 6.4274  | 6.7207  | 6.4021  | 6.5167  | 0.03136 |
| 4 | Core   | 1  | -1 | 50      | -1 | 40      | 1    | 1   | 1   | 1.5808  | 1.6095  | 1.6729  | 1.62107 | 0.00222 |
| 5 | Star   | 1  | 1  | 250     | 0  | 100     | 0    | 1   | 0   | 13.988  | 14.2449 | 13.6655 | 13.9661 | 0.08428 |
| 6 | Star   | 1  | -1 | 50      | 0  | 100     | 0    | 1   | 0   | 3.4608  | 3.3442  | 3.4646  | 3.4232  | 0.00468 |
| 7 | Star   | 1  | 0  | 150     | 1  | 160     | 0    | 0   | 1   | 13.8292 | 13.4768 | 13.5479 | 13.618  | 0.03473 |
| 8 | Star   | 1  | 0  | 150     | -1 | 40      | 0    | 0   | 1   | 3.9192  | 4.2583  | 4.1362  | 4.1046  | 0.0295  |
| 9 | Center | 1  | 0  | 150     | 0  | 100     | 0    | 0   | 0   | 8.46    | 8.6382  | 9.034   | 8.7107  | 0.08631 |
According to the plan, 9 experiments have to be performed with different levels of x1
and x2. Each experiment is repeated 3 times. The results of each measurement of the
execution time are shown in columns 11 to 13 (y(1), y(2), y(3)) of table 1. Column 14 gives
the average execution time Ȳj of the j-th experiment, where j ∈ [1, 9]. Column 15 contains
the sample variance Sj² of the execution time of the j-th experiment.
For a correct regression analysis it is required that the variances of all
experiments be statistically equal. This can be verified by Bartlett's test or
Cochran's criterion [4]. Placing the sample variances of all experiments into the relevant
formula for Cochran's criterion shows that all variances are statistically
equal, so regression analysis can correctly be done with the present experimental data.
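For illustration, Cochran's criterion can be recomputed directly from the Sj² column of table 1. The sketch assumes the standard form of the test statistic (the largest variance divided by the sum of all variances); the critical value is quoted approximately from standard tables, not from [4].

```python
# Sample variances S_j^2 of the 9 experiments (column 15 of table 1)
variances = [0.20839, 0.00579, 0.03136, 0.00222, 0.08428,
             0.00468, 0.03473, 0.02950, 0.08631]

C = max(variances) / sum(variances)    # Cochran's test statistic
print(round(C, 4))                     # ~0.4277

# For 9 samples with 3 repetitions (2 degrees of freedom) each, the
# tabulated critical value at alpha = 0.05 is approximately 0.478.
# Since C < 0.478, the variances can be treated as statistically equal.
```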
The well-known method of least squares is the key tool for finding the regression
parameters. According to it:

B = (XTX)-1XTY    (15)

where:
X – experiment matrix; it consists of the x0, x1, x2, x1x2, x1², x2² columns of table 1;
XT – the transpose of X;
(XTX)-1 – information matrix; it depends only on the number of independent variables
and the type of regression model, and is taken from a reference book [4];
Y – vector containing all average execution times (column 14 of table 1);
B – vector containing all calculated regression parameters.
$$
B =
\begin{bmatrix}
0.5556 & 0 & 0 & 0 & -0.3333 & -0.3333\\
0 & 0.1667 & 0 & 0 & 0 & 0\\
0 & 0 & 0.1667 & 0 & 0 & 0\\
0 & 0 & 0 & 0.2500 & 0 & 0\\
-0.3333 & 0 & 0 & 0 & 0.5000 & 0\\
-0.3333 & 0 & 0 & 0 & 0 & 0.5000
\end{bmatrix}
\times
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & -1 & 1 & -1 & 1 & -1 & 0 & 0 & 0\\
1 & 1 & -1 & -1 & 0 & 0 & 1 & -1 & 0\\
1 & -1 & -1 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}
\times
\begin{bmatrix}
21.4359\\ 5.0370\\ 6.5167\\ 1.6211\\ 13.9661\\ 3.4232\\ 13.6180\\ 4.1046\\ 8.7107
\end{bmatrix}
=
\begin{bmatrix}
8.8033\\ 5.3073\\ 4.6423\\ 2.8758\\ -0.1418\\ 0.0248
\end{bmatrix}
$$
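This computation is easy to reproduce; below is a short NumPy sketch of formula (15) applied to the coded design matrix and the Ȳ column of table 1. It matches the B vector above up to small rounding differences, because the information matrix printed above is rounded to four decimal places.

```python
import numpy as np

# Coded design matrix X: columns x0, x1, x2, x1x2, x1^2, x2^2 of table 1
X = np.array([
    [1,  1,  1,  1, 1, 1],
    [1, -1,  1, -1, 1, 1],
    [1,  1, -1, -1, 1, 1],
    [1, -1, -1,  1, 1, 1],
    [1,  1,  0,  0, 1, 0],
    [1, -1,  0,  0, 1, 0],
    [1,  0,  1,  0, 0, 1],
    [1,  0, -1,  0, 0, 1],
    [1,  0,  0,  0, 0, 0],
], dtype=float)

# Average execution times (column 14 of table 1)
Y = np.array([21.4359, 5.0370, 6.5167, 1.6211, 13.9661,
              3.4232, 13.6180, 4.1046, 8.7107])

B = np.linalg.inv(X.T @ X) @ X.T @ Y   # formula (15)
print(B.round(4))
```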
Then all elements of the B vector are tested for statistical significance by Student's
t-test. If an element is insignificant, the whole term associated with it has to be removed
from (12). Removing terms from (12) means removing the corresponding columns of the X
matrix; this changes the information matrix as well, so the B vector has to be recalculated.
For more information on how exactly this is done, please refer to [4]. Here b11 and b22
(the last two elements of B) prove to be insignificant, so b11x1² and b22x2² are removed
from (12).
Finally:

$$B = \begin{bmatrix} 8.7148\\ 5.3062\\ 4.6414\\ 2.8758 \end{bmatrix}$$

or b0 = 8.7148; b1 = 5.3062; b2 = 4.6414 and b12 = 2.8758.
Then the regression model is:

ŷ = 8.7148 + 5.3062x1 + 4.6414x2 + 2.8758x1x2    (16)

where x1 and x2 are taken in their coded form.
If (16) is adequate then it will prove (10). So (16) is tested for goodness of fit (or
adequacy). This is done by using Fisher’s criterion [4]. According to the theory in [4]:
$$F = \frac{S^2_{ad}}{S^2_{dr}};\qquad
S^2_{ad} = \frac{\sum_{j=1}^{N} n_j\,(\bar{Y}_j - \hat{y}_j)^2}{N - m'};\qquad
S^2_{dr} = \frac{\sum_{j=1}^{N} S_j^2}{N} \qquad (17)$$
where:
S²ad – variance of model adequacy;
S²dr – variance of data reproduction;
Sj² – variance of the j-th experiment;
Ȳj – average execution time of the j-th experiment;
ŷj – value of Y predicted (calculated by the regression model) for the j-th experiment;
nj – number of repetitions of the j-th experiment;
N – number of experiments;
m' – number of regression parameters.
The regression model is considered adequate (a good fit) if the calculated
value of F < Fα;Kad;Kdr [4], where α = 0.05, i.e. a confidence level of 95%; Kad – degrees of
freedom of S²ad; Kdr – degrees of freedom of S²dr. For this analysis:
F = 1.137939 < F0.05;5;18 = 2.78, so the regression model is adequate. The constants
in (16) depend on hardware performance: if the experiment were done on another machine,
the regression parameters would be completely different. However, the regression model
would still look like (11), which is why (16) confirms the computational complexity (10).
The experiments were performed on a laptop computer: Celeron 2 GHz, 512 MB RAM,
Windows XP Home, Apache 1.3x, PHP 4.3.4, MySQL 3.23.45.
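The adequacy test itself can be reproduced from table 1 and model (16). The following sketch recomputes F by (17) and takes the critical value from SciPy instead of a reference table; it yields F ≈ 1.1379, as reported above.

```python
import numpy as np
from scipy.stats import f as fisher_f

Y_avg = np.array([21.4359, 5.0370, 6.5167, 1.6211, 13.9661,
                  3.4232, 13.6180, 4.1046, 8.7107])   # column 14 of table 1
S2 = np.array([0.20839, 0.00579, 0.03136, 0.00222, 0.08428,
               0.00468, 0.03473, 0.02950, 0.08631])   # column 15 of table 1
x1 = np.array([1, -1, 1, -1, 1, -1, 0, 0, 0])         # coded levels, table 1
x2 = np.array([1, 1, -1, -1, 0, 0, 1, -1, 0])

y_hat = 8.7148 + 5.3062*x1 + 4.6414*x2 + 2.8758*x1*x2  # model (16)

N, n_rep, m_params = 9, 3, 4
S2_ad = n_rep * np.sum((Y_avg - y_hat) ** 2) / (N - m_params)
S2_dr = np.sum(S2) / N
F = S2_ad / S2_dr
print(round(F, 4))                                     # ~1.1379
F_crit = fisher_f.ppf(0.95, N - m_params, N * (n_rep - 1))
print(F < F_crit)                                      # True: model is adequate
```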
Comparison to other algorithms. Accuracy and running time.
The automatic assignment of reviewers to papers is a typical assignment problem in
bipartite graphs. Consider the complete bipartite graph G = (P ∪ R, P × R), where P is the
set of all submitted papers and R is the set of all registered reviewers. There is an edge
from every paper to every reviewer, and every edge has a weight. The weight of the edge
between paper Pi and reviewer Rj is exactly the similarity (compatibility) factor between Pi
and Rj.
Obviously (common sense), the accuracy of an assignment algorithm is higher when
the weights of the edges included in the matching are higher. Thus accuracy can be
measured by the weight of the matching: a higher weight of matching means higher accuracy.
The weight of a matching M is the sum of the weights of all individual edges e included in
that matching:

$$w(M) = \sum_{e \in M} w(e) \qquad (18)$$
As written in [1], the similarity factors (the weights of the edges) are calculated as follows:

$$SF_{P_iR_j} = w(e_{P_iR_j}) = \frac{count(KW_{P_i} \cap KW_{R_j})}{count(KW_{P_i} \cup KW_{R_j})} \qquad (19)$$

where:
SFPiRj – similarity factor between the i-th paper and the j-th reviewer;
KWPi – set of keywords describing the i-th paper;
KWRj – set of keywords chosen by the j-th reviewer;
count() – gives the number of elements within a set; duplicates are ignored, i.e. the result
set contains unique values only.
(19) shows not only how competent the j-th reviewer is to evaluate the i-th paper, but
also how appropriate it is for the i-th paper to be evaluated by the j-th reviewer rather than
by someone else. For more information please refer to [1].
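For example (with hypothetical keyword sets), if the author of Pi selects {databases, indexing, query optimization} and reviewer Rj selects {databases, indexing, data mining, XML}, the intersection contains 2 keywords and the union contains 5, so SFPiRj = 2/5 = 0.4.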
In this sense the accuracy of the suggested algorithm can easily be found just by
summing the similarity factors of all assigned <paper, reviewer> pairs. However, the
calculated absolute number means nothing by itself. What if the accuracy is 249.15 or 0.43!?
How accurate is the algorithm? This question cannot be answered from these
numbers alone, so measuring the accuracy in absolute numbers is far from enough for
evaluating the algorithm's quality. The real question is what the highest achievable
accuracy is. If it is known, the accuracy of the suggested algorithm can simply be
compared to the best possible one, which is a realistic way of evaluating the algorithm's
quality. Obviously the most accurate algorithm is the one that guarantees finding the
maximum-weighted matching.
To evaluate the quality of the suggested algorithm [1], it is compared in terms of
accuracy and running time to the maximum-weighted matching algorithm of Kuhn and
Munkres (also known as the Hungarian Algorithm). The latter, of course, as a maximum-
weighted matching algorithm, delivers the best possible assignment [2, 5]. It originally runs
in O(n⁴), but it can be optimized to run in O(n³) [5]. The Hungarian algorithm is
implemented in The MyReview System [3] by Miki Hermann and Philippe Rigaux. To make
a fair enough comparison, this implementation has been integrated (not for commercial use,
but for experimental purposes only) into the conference management system [6] used by the
Department of Computing at the University of Ruse. So both algorithms are part of one
and the same conference management system and share the same input data. Although
[6] relies on keywords and The MyReview System [3] uses an Iterative Rating Method [2],
the two algorithms can be objectively compared, as both are independent of the way
of expressing reviewers' interests and describing papers. The conference management
system [6] uses formula (19) to calculate a similarity (compatibility) factor for every
<paper, reviewer> pair.
The algorithms have been tested dozens of times using randomly generated test data and
several times using real data taken from two real conferences. Both algorithms assign 2
reviewers to each paper and, of course, share one and the same input data in every
experiment. The results are summarized in table 2.
This comparison experiment was performed on a desktop computer: AMD Duron 1.3 GHz,
512 MB RAM, Windows XP Professional, Apache 2.0, PHP 5.1.6, MySQL 5.0.
Table 2. For each algorithm the columns are: execution time in seconds, ZM, avg. match in %, and variance.

| Papers/Reviewers | [3]: execution time, s | [3]: ZM | [3]: avg match, % | [3]: variance | [1]: execution time, s | [1]: ZM | [1]: avg match, % | [1]: variance |
|------------------|------------------------|---------|-------------------|---------------|------------------------|---------|-------------------|---------------|
| 20/10            | 0.1196 + 0.273         | 2       | 22.25             | 0.0075        | 0.1394 + 0.0306        | 2       | 21.97             | 0.0078        |
| 50/30            | 0.5613 + 5.6739        | 1       | 32.12             | 0.0094        | 0.5796 + 0.0869        | 1       | 31.83             | 0.0102        |
| 75/40            | 1.4005 + 15.0243       | 0       | 38.69             | 0.0088        | 1.055 + 0.1468         | 0       | 38.59             | 0.0095        |
| 100/50           | 1.5669 + 41.3718       | 0       | 37.06             | 0.0095        | 1.6781 + 0.2094        | 0       | 36.89             | 0.0108        |
| 150/80           | 3.2265 + 129.9825      | 0       | 41.92             | 0.0088        | 3.5274 + 0.5228        | 0       | 41.67             | 0.0099        |
| 150/100          | 4.8523 + 135.7165      | 0       | 43                | 0.0099        | 4.4708 + 0.5581        | 0       | 42.71             | 0.0102        |
| 175/100          | 4.8785 + 248.662       | 0       | 43.49             | 0.0089        | 5.0131 + 0.6686        | 0       | 43.26             | 0.0091        |
| 200/100          | 6.257 + 392.8176       | 0       | 43.9              | 0.0094        | 5.7635 + 0.7266        | 0       | 43.73             | 0.0108        |
| 200/150          | 8.4473 + 367.6491      | 0       | 47.15             | 0.0101        | 8.4756 + 1.1822        | 0       | 46.81             | 0.0106        |
| * 121/54 (conference 1, 2 reviewers per paper) | 1.9259 + 84.9161 | 0 | 26.26 | 0.0215 | 2.2675 + 0.3243 | 0 | 26.45 | 0.0246 |
| 77/55 (conference 2, 3 reviewers per paper)    | 1.034 + 23.326   | 0 | 38.57 | 0.0171 | 1.1384 + 0.2155 | 0 | 37.62 | 0.02   |

The column headers [3] and [1] denote the optimal (maximum-weighted) matching algorithm implemented in [3] and the suggested algorithm [1], respectively.
The execution time is given as a sum of two terms. The first is the time spent
building the weight matrix; as expected, this term has almost the same value for both
algorithms. The second term is more interesting: it is the time spent finding a
solution of the assignment problem, and this is where the suggested algorithm has a great
advantage. In the case of 150 papers and 100 reviewers, the suggested algorithm is more
than 200 times faster than the implementation of the Hungarian algorithm.
The "avg. match" (average weight of the assignment) is the weight of the matching
calculated by (18), divided by the number of assigned pairs and multiplied by 100. The
accuracy of the algorithms is compared by this parameter. When talking about matchings
here, note that in this particular case it is possible for 2 (or more) edges to share a
single vertex: every paper is reviewed by more than one PC member,
and every reviewer evaluates several papers.
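For example, in the 150/100 experiment the suggested algorithm creates 300 <paper, reviewer> pairs (150 papers with 2 reviewers each), so its avg. match of 42.71% corresponds by (18) to a total matching weight of about 128.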
Let's assume the Hungarian algorithm is 100% accurate. Such an assumption is
adequate, as the Hungarian algorithm is a maximum-weighted matching algorithm. Then it
can be calculated that in almost all measurements the suggested algorithm achieves
about 98% of the accuracy of the Kuhn-Munkres algorithm implemented by Miki Hermann
and Philippe Rigaux. However, there is an anomaly as well. Take a look at the data
concerning the first conference (the row marked with *: 121 papers, 54 reviewers). In that
experiment the suggested algorithm achieves higher accuracy than the maximum-weighted
matching algorithm, which in theory cannot be true. This case should therefore
be a subject of further study and investigation.
CONCLUSIONS AND FUTURE WORK
All the analyses and experiments presented in this paper confirm the 3 main features
of the suggested algorithm [1]. Furthermore, it works with about 98% of the accuracy of the
Hungarian algorithm but many times faster (more than 200 times faster in the case of
150 papers and 100 reviewers), as it has better asymptotic efficiency.
The suggested algorithm [1] has been tested and successfully used to automatically
assign reviewers to papers for several conferences during 2006 and 2007. In the future, tools
for improving the assignment will be added to the assignment module of [6]. As [6] relies
on keywords, these tools may be based on collaborative filtering or on techniques for
determining how semantically close the keywords are: if a reviewer has selected
keyword i, and keyword i is strongly related to keyword j, then the latter can be
automatically added to the set of keywords chosen by the reviewer. The same could be
done for papers as well.
REFERENCES
[1] Kalmukov, Y. An algorithm for automatic assignment of reviewers to papers.
Proceedings of CompSysTech 2006, Veliko Turnovo, 2006.
http://ecet.ecs.ru.acad.bg/cst06/Docs/cp/sV/V.5.pdf
[2] Rigaux, Ph. An iterative rating method: application to web-based conference
management. ACM International Conference on Applied Computing (ACM-SAC'04), 2004.
[3] Rigaux, Ph. The MyReview System. http://myreview.lri.fr
[4] Mitkov, A., D. Minkov. Statistical methods for research and optimization, Part II.
Zemizdat, Sofia, 1993.
[5] Khuller, S. Design and Analysis of Algorithms: Course Notes. University of
Maryland, USA, 1996.
[6] Web-based conference management system. www.science-library.com/conf/
ABOUT THE AUTHOR
Yordan Kalmukov, PhD Student, Department of Computing, University of Ruse,
Tel.: +359 877 421102, e-mail: JKalmukov@gmail.com