Microsoft Academic Graph (MAG)
Although we encourage participants to use any publicly available information in this challenge, we provide all teams with the Microsoft Academic Graph (MAG). MAG is a large, heterogeneous graph containing scientific publication records, citation relationships between publications, as well as authors, institutions, journal and conference "venues," and fields of study. The data is distributed as a set of zipped text files stored in Microsoft Azure blob storage and accessible via HTTP. The latest zipped file is ~28.2GB; it is also split into several smaller zipped files for easier downloading.
The data provided for the challenge can be accessed and downloaded from http://aka.ms/academicgraph. Please use the data version “2016-02-05” since it contains two specific files for this KDD Cup 2016.
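Because the zipped download is large, it may be easier to stream rows straight out of an archive than to unpack everything first. The following is a minimal sketch, assuming the text files are tab-separated, UTF-8 encoded, and have no header row. The function name `iter_rows` is hypothetical, and the archive and member names you pass in must be taken from the download page.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_rows(zip_path: str, member: str) -> Iterator[List[str]]:
    """Yield rows from one tab-separated text file inside a zip archive.

    Rows are read lazily, so the member is never fully loaded into memory.
    The tab delimiter and UTF-8 encoding are assumptions about the file format.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quotation marks in titles as literal characters.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                yield row
```

A caller would loop over `iter_rows(path_to_zip, member_name)` and process one row at a time, which keeps memory use flat regardless of file size.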
Note: In order to emphasize an important technical challenge that is common in web-scale data collection and aggregation, the data released here have undergone only rudimentary processing, for example in areas of author and paper conflation/deduplication. This noisy yet realistic dataset can provide additional avenues for research in the big data arena.
We encourage competitors that require computational resource support to apply for the Microsoft Azure for Research Award. For details about the Microsoft Azure for Research Award submission process, go to http://www.windowsazurepass.com/research. Please include the hashtag #academicgraph in your submission title for easier tracking.
In addition to the Microsoft Academic Graph data, we also provide the following tools for exploring the Academic Graph:
- Academic Knowledge API, a new Project Oxford service that enables developers to directly query the Academic Graph
- Microsoft Academic, a preview of the new Academic semantic search experience powered by the Academic Knowledge API
* Note: The data in the Academic Knowledge API is updated weekly and reflects the most current Academic content available. The downloadable data, by contrast, is a fixed snapshot of the Academic Graph and will remain unchanged for the duration of the competition. If you want to query the "2016-02-05" version of the data through the Academic Knowledge API, please set the model to kdd2016 in the API. See details here.
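As an illustration of pinning a query to the competition snapshot, the sketch below only builds a request URL and does not contact the service. The endpoint address, the query parameter names (`expr`, `model`, `count`, `attributes`), the attribute codes, and the example expression are all assumptions that should be checked against the API documentation. The only detail taken from the note above is the model name kdd2016.

```python
from urllib.parse import urlencode

# Assumed address of the API's "evaluate" method; verify before use.
EVALUATE_ENDPOINT = "https://api.projectoxford.ai/academic/v1.0/evaluate"


def build_evaluate_url(expr: str, attributes: str = "Id,Ti,Y",
                       count: int = 10, model: str = "kdd2016") -> str:
    """Build a query URL that names the model to evaluate against.

    Parameter names are assumptions about the API; `model` defaults to the
    KDD Cup 2016 snapshot instead of the weekly-updated data.
    """
    params = {
        "expr": expr,
        "model": model,
        "count": count,
        "attributes": attributes,
    }
    return EVALUATE_ENDPOINT + "?" + urlencode(params)


# Hypothetical query expression: papers published in 2015.
url = build_evaluate_url("Y=2015")
print(url)
```

Sending the request would additionally require a subscription key, which is outside the scope of this sketch.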
Affiliation List to be Ranked
In the above dataset, we released a list of affiliation names and IDs for KDD Cup 2016. The participating teams only need to rank the affiliations that appear in the list. Please see the "2016 KDD Cup Selected Affiliations" file on the data download page.
Full Paper List in the Past Five Years
The conference data in MAG contain many types of papers, including full research papers, industry track papers, short papers, poster papers, and workshop papers. It is nontrivial for participants to identify the correct type for each conference paper. Since we will only evaluate the affiliations from accepted full research papers, we also provide the list of full research papers from the past five years (2011 to 2015) for the eight targeted conferences. Please see the "2016 KDD Cup Selected Papers" file on the data download page.
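One way to apply both lists is to keep only the paper-author-affiliation records whose paper appears in the selected papers list and whose affiliation appears in the selected affiliations list. The following is a minimal sketch, assuming the records have already been parsed into (paper ID, author ID, affiliation ID) tuples. That layout, the function name, and the IDs are hypothetical. The simple per-affiliation paper count is only an illustration, not the challenge's evaluation metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed record layout: (paper_id, author_id, affiliation_id).
Record = Tuple[str, str, str]


def count_full_papers(records: Iterable[Record],
                      selected_papers: Set[str],
                      selected_affiliations: Set[str]) -> Dict[str, int]:
    """Count distinct selected papers per selected affiliation."""
    papers_by_affiliation = defaultdict(set)
    for paper_id, _author_id, affiliation_id in records:
        if paper_id in selected_papers and affiliation_id in selected_affiliations:
            papers_by_affiliation[affiliation_id].add(paper_id)
    return {aff: len(papers) for aff, papers in papers_by_affiliation.items()}


# Toy data with made-up IDs.
records = [
    ("P1", "A1", "F1"),
    ("P1", "A2", "F1"),  # same paper and affiliation: counted once
    ("P2", "A3", "F2"),
    ("P3", "A4", "F1"),  # P3 is not in the selected papers list
    ("P2", "A5", "F9"),  # F9 is not in the selected affiliations list
]
counts = count_full_papers(records, {"P1", "P2"}, {"F1", "F2"})
print(counts)
```

Using sets for both lists keeps each membership check constant-time, which matters when scanning the full set of records.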
External Data Sources
Teams are welcome to use external data in their approaches as long as that data is publicly accessible. The winners are required to disclose all the information they have used to generate the final ranking.