The evaluation will be conducted in three phases. In each phase, we will choose only one conference on which to evaluate each team’s results. A team’s final result is the weighted sum of its scores on the three chosen conferences, i.e., 20%*Conf1 + 40%*Conf2 + 40%*Conf3.
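As a minimal sketch of the weighting above: the helper name `final_score` and the sample per-phase scores are hypothetical and chosen only for illustration.

```python
# Phase weights as stated in the rules: 20%, 40%, 40%.
PHASE_WEIGHTS = (0.2, 0.4, 0.4)


def final_score(phase_scores):
    """Weighted sum of the three per-phase scores (hypothetical helper)."""
    if len(phase_scores) != len(PHASE_WEIGHTS):
        raise ValueError("expected exactly one score per phase")
    return sum(w * s for w, s in zip(PHASE_WEIGHTS, phase_scores))


# Made-up per-phase scores, for illustration only.
example = final_score([0.70, 0.65, 0.80])
```

Because the last two phases carry 40% each, a weak first phase costs a team less than a weak second or third phase.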
The ground truth will be determined by all the full research papers accepted in each conference. We use the following simple policy to determine the Affiliation Ranking:
- Each accepted paper has an equal vote (i.e., they are equally important).
- Each author has an equal contribution to a paper.
- If an author has multiple affiliations, each affiliation also contributes equally.
Suppose there is a paper which has three authors "author1", "author2" and "author3", who belong to different affiliations "affiliation1", "affiliation2" and "affiliation3", respectively. Then in this case, each affiliation will receive 1/3 vote for its score.
Suppose another paper has two authors, "author1" (who belongs to "affiliation1") and "author4" (who belongs to both "affiliation3" and "affiliation4"). In this case, "affiliation1" will receive a score of 1/2, while "affiliation3" and "affiliation4" will each receive a score of 1/4.
By aggregating the scores from the above two papers, the final affiliation ranking scores are:
"affiliation1" = 1/3 + 1/2 = 0.8333
"affiliation3" = 1/3 + 1/4 = 0.5833
"affiliation2" = 1/3 = 0.3333
"affiliation4" = 1/4 = 0.2500
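The voting policy and the two-paper example above can be reproduced with a short Python sketch. The function name `affiliation_scores` and the nested-list input layout are assumptions made for illustration; the organizers' actual tooling is not described here.

```python
from collections import defaultdict
from fractions import Fraction


def affiliation_scores(papers):
    """Score affiliations under the equal-vote policy.

    papers: list of papers; each paper is a list of authors; each author
    is a list of affiliation names (assumed non-empty).
    """
    scores = defaultdict(Fraction)
    for authors in papers:
        # Each paper has one vote, split equally among its authors.
        author_share = Fraction(1, len(authors))
        for affiliations in authors:
            # Each author's share is split equally among their affiliations.
            share = author_share / len(affiliations)
            for aff in affiliations:
                scores[aff] += share
    return dict(scores)


papers = [
    [["affiliation1"], ["affiliation2"], ["affiliation3"]],
    [["affiliation1"], ["affiliation3", "affiliation4"]],
]
scores = affiliation_scores(papers)
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for aff, score in ranking:
    print(f"{aff} = {float(score):.4f}")
```

Exact fractions are used so that shares such as 1/3 accumulate without rounding error; the values are converted to decimals only for display.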
Once we obtain the above ground truth ranking and relevance scores, we will use the NDCG@20 metric to measure relevance. We define NDCG@N as follows:
where i is the rank of an institution and rel_i is that institution's relevance score.
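The formula itself does not appear in this text. For reference, a commonly used definition of NDCG that matches the symbols described here is given below in LaTeX notation. The logarithmic discount is the standard choice and is an assumption, not something this text confirms; IDCG@N denotes the DCG@N of the ideal (ground truth) ordering.

```latex
\mathrm{DCG@}N = \sum_{i=1}^{N} \frac{rel_i}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N}
```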
We use the following example to illustrate how we calculate the NDCG scores.
Suppose for KDD 2016, the final institution ranking and relevance scores are:
The result submitted by the team "team1" is:
Then we will calculate the NDCG score for "team1" as follows:
To normalize DCG values, an ideal ordering is needed. For this example, that ordering is the relevance scores sorted in monotonically decreasing order: 8.465, 6.238, 4.186, 2.515, 0.878, 0.
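The normalization step can be sketched in Python as follows. This assumes the common log2(i + 1) discount, which this text does not confirm. The `submitted_order` list is hypothetical, since the team's submitted table is not reproduced here; only `ideal_order` comes from the values quoted above.

```python
import math


def dcg(relevances, n):
    """DCG@n using the common log2(i + 1) discount (an assumption)."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:n], start=1))


def ndcg(submitted_relevances, all_relevances, n=20):
    """Divide the submission's DCG by the DCG of the ideal ordering."""
    ideal = sorted(all_relevances, reverse=True)
    ideal_dcg = dcg(ideal, n)
    return dcg(submitted_relevances, n) / ideal_dcg if ideal_dcg > 0 else 0.0


# Ideal ordering quoted in the text above.
ideal_order = [8.465, 6.238, 4.186, 2.515, 0.878, 0.0]
# Hypothetical submission: the true relevance of each institution,
# listed in the order in which the team ranked the institutions.
submitted_order = [6.238, 8.465, 0.0, 4.186, 2.515, 0.878]

score = ndcg(submitted_order, ideal_order)
```

A submission that reproduces the ideal ordering scores exactly 1, and any other ordering of distinct relevance values scores strictly less.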
In the ground truth data, two different institutions may have the same relevance score. In this case, we assign them the same rank, as in the following example:
In this example, "Yahoo!" and "UIUC" tie on score, so both are ranked second, and "IBM" is therefore ranked 4th.
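This tie handling corresponds to what is often called standard competition ranking (1, 2, 2, 4). A minimal sketch follows, in which the numeric scores and the placeholder name "InstitutionA" are made up; only the tie pattern mirrors the example in the text.

```python
def competition_ranks(scores):
    """Map each name to its rank; tied scores share a rank and the
    ranks that follow are skipped accordingly (1, 2, 2, 4, ...)."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (name, score) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie, otherwise use the position.
        rank = prev_rank if score == prev_score else position
        ranks[name] = rank
        prev_score, prev_rank = score, rank
    return ranks


# Hypothetical scores; only the tie between "Yahoo!" and "UIUC" matters.
scores = {"InstitutionA": 9.0, "Yahoo!": 5.0, "UIUC": 5.0, "IBM": 3.0}
ranks = competition_ranks(scores)
```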
In the submitted ranking results, we ask participants to avoid assigning the same ranking score to any two affiliations. If some affiliations do have the same ranking score, we will rank them in the order in which they appear in the submitted results.
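One way to realize this rule is a stable sort, which keeps equal-scored rows in their submitted order. The sketch below relies on the documented stability of Python's built-in sort; the function name `rank_submission` and the affiliation IDs are hypothetical.

```python
def rank_submission(rows):
    """rows: (affiliation_id, score) pairs in the order they appear
    in the submitted file.

    Returns affiliation IDs by descending score. Rows with equal scores
    keep their file order because Python's sort is stable.
    """
    return [aff for aff, _ in sorted(rows, key=lambda row: -row[1])]


# Hypothetical rows: affB and affC share a score, and affB is listed first.
rows = [("affA", 0.9), ("affB", 0.5), ("affC", 0.5), ("affD", 0.7)]
ranking = rank_submission(rows)
```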
In the event of a tie between any eligible entries, we will evaluate the NDCG beyond the top 20 positions until we break the ties. In the extreme case that we cannot break the tie using NDCG, we will determine the winner based on the submissions' timestamps.
Forming a team
There is no limit on the number of people on any given team. To register a team, you only need to register one team leader and choose a nickname for the team. During the competition, a user cannot participate on multiple teams. To be ranked in the competition and qualify for prizes, each registered participant (individual or team leader) must disclose their name. If a person participates on multiple teams, all teams associated with that person will be disqualified. Furthermore, on May 20, 2016, the team leader must declare the composition of the team. No changes are allowed after that.
Team mergers are not allowed in this competition.
Participation is not conditioned on delivering the code. However, to claim the winning prizes, the teams have to submit and present a paper describing their solution at the KDD Cup Workshop. We will ask the top ranking participants to voluntarily fill out a fact sheet about their methods, and to help reproduce their results. The KDD Cup 2016 Workshop is part of the KDD 2016 conference and is open to the registered conference participants. The team agrees in advance that the submitted manuscript from the winning team will be published in the KDD Workshop proceedings if the Workshop organizers choose to do so.
Besides duplicating team membership, using any information that is not publicly available is considered cheating and any teams who do so will be disqualified from the competition. For instance, leveraging any information from the submitted papers is not allowed since only a subset of persons (e.g. PC members) can access such information.
Please log in to submit your results once we open team registration.
Your entry must be a single .tsv text file that contains a row for each item in the following format:
- [conference id] \t [affiliation id] \t [probability score] \n
- The Conference ID and the Affiliation ID must match the format given in the graph data. The probability score is a decimal value between 0 and 1 representing the importance of an affiliation in relation to the given conference. If an affiliation is missing from the results, its value will be assumed to be 0.
- Affiliation IDs must match the IDs provided in the "2016 KDD Cup Selected Affiliations" file on the data download page.
- The filename must be results.tsv
- In each phase, the submitted results.tsv file must include only the conferences required in that phase (phase 1: SIGIR, SIGMOD, SIGCOMM; phase 2: KDD, ICML; phase 3: FSE, MobiCom, MM).
While there is no limit to the number of entries you can submit, we will only evaluate the last submission in each phase.
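A minimal sketch of producing rows in the required tab-separated format is shown below. The helper name `write_results` and the placeholder IDs are hypothetical; real IDs must come from the graph data and the "2016 KDD Cup Selected Affiliations" file. The six-decimal score formatting is also just one possible choice.

```python
import csv
import io


def write_results(rows, out):
    """Write (conference_id, affiliation_id, score) rows as
    tab-separated lines, rejecting scores outside [0, 1]."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for conf_id, aff_id, score in rows:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {aff_id}: {score}")
        writer.writerow([conf_id, aff_id, f"{score:.6f}"])


# Placeholder IDs, for illustration only.
rows = [("CONF_ID_1", "AFF_ID_1", 0.83), ("CONF_ID_1", "AFF_ID_2", 0.25)]
buffer = io.StringIO()
write_results(rows, buffer)
tsv_text = buffer.getvalue()
```

To produce the actual file, the same function can be given a file object opened with `open("results.tsv", "w", newline="")` in place of the in-memory buffer.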
Prize Eligibility Rules
Prizes are subject to eligibility criteria. View the full Official Rules of the Contest.