Knowledge graphs have recently attracted significant attention from both industry and academia in scenarios that require exploiting large-scale heterogeneous data collections. They have found applications in domains ranging from social media to the telecom industry, and in recent years many companies across industry sectors have started building and maintaining their own knowledge graphs for internal use. These applications all involve reasoning over the heterogeneous relational data the graph stores in order to infer novel information that is not already present. We refer to this process as knowledge graph completion.
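To make the notion of completion concrete, the following toy sketch represents a knowledge graph as a set of (head, relation, tail) triples and infers a triple not explicitly stored. All entity names, relation names, and the inference rule are illustrative assumptions, not details from this project.

```python
# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# Names are illustrative only, not taken from the project's data.
kg = {
    ("alice", "works_for", "swisscom"),
    ("swisscom", "based_in", "switzerland"),
}

def infer_lives_in(triples):
    """Toy completion rule: works_for(x, c) and based_in(c, l) => lives_in(x, l)."""
    inferred = set()
    employers = {(h, t) for (h, r, t) in triples if r == "works_for"}
    locations = {(h, t) for (h, r, t) in triples if r == "based_in"}
    for person, company in employers:
        for comp, place in locations:
            if comp == company:
                inferred.add((person, "lives_in", place))
    return inferred

# Completion infers ("alice", "lives_in", "switzerland"), which is
# not among the stored triples.
print(infer_lives_in(kg))
```

Real systems replace such hand-written rules with learned models, but the task is the same: producing plausible triples that are absent from the stored graph.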
However, modern knowledge graph data possesses an additional property that gives rise to a new challenge: graphs can contain hundreds of millions of entities or more. At such orders of magnitude, a delicate balance is required between model performance on one hand and computational cost on the other, and a model that achieves this balance has yet to be proposed.
This project presents an approach to constructing a model that generates meaningful graph representations while preserving scalability and prediction performance as far as possible. During preprocessing, network analysis techniques provide graph features, which are consumed by a novel graph embedding model that integrates local representations, obtained using standard and state-of-the-art techniques, into a global picture. Evaluation results show that the approach performs well on the link prediction and query answering tasks on Swisscom data, reproducing results reported in related work. Experiments on academic data confirm that further improvement is possible through more focused research.
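As a rough illustration of how embedding-based link prediction works, the sketch below scores triples in the TransE style, one standard technique of the kind the text refers to. The entity and relation names, the embedding dimension, and the random initialization are all assumptions for illustration; the project's actual model and features are not reproduced here.

```python
import numpy as np

# Minimal TransE-style link prediction sketch (illustrative assumptions:
# toy entities/relations, random embeddings instead of trained ones).
rng = np.random.default_rng(0)
dim = 16
entities = ["customer_a", "product_x", "region_y"]
relations = ["bought", "located_in"]

ent_emb = {e: rng.normal(size=dim) for e in entities}
rel_emb = {r: rng.normal(size=dim) for r in relations}

def score(head, relation, tail):
    """TransE plausibility score: -||h + r - t||; higher means more plausible."""
    h, r, t = ent_emb[head], rel_emb[relation], ent_emb[tail]
    return -np.linalg.norm(h + r - t)

def predict_tail(head, relation):
    """Link prediction: rank all candidate tails by score, return the best."""
    return max(entities, key=lambda t: score(head, relation, t))

print(predict_tail("customer_a", "bought"))
```

In practice the embeddings are trained so that observed triples score higher than corrupted ones, and ranking candidate tails in this way is exactly how link prediction benchmarks are evaluated.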