KDD Cup 2016
Whose papers are accepted the most: towards measuring the impact of research institutions
Finding influential nodes in a social network, whether to identify patterns or to maximize information diffusion, has been an active research area with many practical applications. Beyond its obvious value to the advertising industry, the topic is highly valued in the research community by those who have long sought mechanisms to effectively disseminate new scientific discoveries and technological breakthroughs, so as to advance our collective knowledge and elevate our civilization. For students, parents, and funding agencies that are planning academic pursuits or evaluating grant proposals, having an objective picture of the institutions in question is particularly essential. Against this backdrop, releasing a yearly Research Institution or University Ranking has become a tradition for many popular newspapers, magazines, and academic institutes (see Appendix for some example rankings). The published rankings not only attract a lot of attention from governments, universities, students, and parents, but also spark much debate about the scientific soundness of the rankings. The most criticized aspect is that the data used and the methodology employed are mostly unknown to the public.
This KDD Cup aims to galvanize the community to address this very important problem using any publicly available dataset, such as the Microsoft Academic Graph (MAG), a freely available dataset that includes information about academic publications and citations. Being a heterogeneous graph, MAG can be used to study influential nodes of various types, including authors, affiliations, and venues; however, we will focus on affiliations in this competition. In effect, given a research field, we are challenging the KDD Cup community to jointly develop data mining techniques to identify the best research institutions based on their publications and how they are cited in research articles.
The high-level task of this challenge is: given any research field, such as Machine Learning or Data Mining, rank the most influential institutions, such as CMU or UIUC, using any publicly available information such as the Microsoft Academic Graph. However, for the purpose of a competition, a faithful evaluation metric is required. We thus transform this task into another innovative and interesting task: given an upcoming top conference in 2016, such as KDD, SIGIR, or ICML, rank the importance of institutions by predicting how many of their papers will be accepted.
Participants are expected to utilize any information on the Web, including the heterogeneous information in the Microsoft Academic Graph, to predict next year’s top institutions. Taking KDD as an example, the information that is helpful for the ranking might include, but is not limited to:
- Previous years’ KDD top institutions.
- Topic trends of previous years’ KDD papers.
- Previous years’ KDD top authors’ impact factor based on the citation graph.
- Location of each year’s KDD, since institutions close to the location may have more appearances.
- Information from other conferences and journals that are related to KDD, like ICDM, ICML, WWW, CIKM, TKDD, etc.
- Co-authorship factors.
- Associated temporal information.
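As a rough illustration of how the first kind of signal in the list above might be turned into a ranking, the following is a minimal sketch of a recency-weighted baseline. It is not the competition's evaluation method or any team's approach. The function name `rank_institutions`, the `decay` parameter and its value, the input tuple layout, and the institution names and counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_institutions(history, target_year, decay=0.5):
    """Rank institutions by a recency-weighted sum of past paper counts.

    history: iterable of (year, institution, paper_count) tuples
             (a hypothetical layout, not a MAG schema).
    decay:   weight multiplier per year of age; 0.5 is an arbitrary choice.
    """
    scores = defaultdict(float)
    for year, institution, count in history:
        if year >= target_year:
            continue  # only years before the target may inform the prediction
        age = target_year - 1 - year  # 0 for the most recent past year
        scores[institution] += count * decay ** age
    # Highest score first; ties broken alphabetically by institution name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up counts for illustration only; these are not real KDD statistics.
history = [
    (2013, "Institution A", 4.0),
    (2014, "Institution A", 2.0),
    (2015, "Institution A", 1.0),
    (2013, "Institution B", 1.0),
    (2014, "Institution B", 2.0),
    (2015, "Institution B", 3.0),
]

ranking = rank_institutions(history, target_year=2016)
for institution, score in ranking:
    print(institution, score)
```

In this toy data, the institution whose counts are rising is ranked above the one whose counts are falling, even though both have the same total. The other signals listed above (topic trends, author impact, conference location, related venues) could be added as further score terms or as features of a learned model.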
This year’s KDD Cup is novel and challenging in several aspects:
- The problem itself is an open problem, and teams do not necessarily have to use supervised learning algorithms;
- The evaluation setting is significantly different from previous KDD Cup challenges because the ground truth is not known beforehand, which makes the problem even more challenging;
- The teams are encouraged to use a publicly available dataset to derive knowledge and insights.
1st Place - $10,000
2nd Place - $6,500
3rd Place - $3,500
Note: In addition to the prizes above, which are based on the results of all three phases, we also provide honorable mention awards to recognize the top team in each phase. A certificate will be presented to the team that wins each individual phase. So even if you missed the submission deadlines of the first two phases for some reason, you are still encouraged to participate in the third phase.