Knowledge Graphs (KGs) are large collections of entities, their descriptions, and the relationships between them. KGs are a vital resource for QA systems, as they can be processed and aggregated to construct comprehensive answers. Collectively created KGs exploit the “wisdom of the crowd” to produce and maintain high-quality, up-to-date information. In this context, an ongoing project that has attracted significant community interest is Wikidata, from the Wikimedia Foundation. Since its inception in October 2012, Wikidata has grown to include more than a hundred thousand registered users. These users have already gathered facts about more than 15 million entities and help maintain the Wikidata KG, keeping the representation of these entities and facts up to date. All the information in Wikidata is released under an open license, which allows the data to be freely reused and shared and offers a variety of opportunities to build knowledge-based systems on top of it. Rich, up-to-date information on millions of entities makes Wikidata particularly suitable as a knowledge resource for QA systems. To enable effective use of Wikidata in these systems, it is crucial to better understand the processes through which the Wikidata community produces knowledge. This understanding can provide valuable insights into data quality in Wikidata and uncover hidden relationships between entities to support serendipitous information discovery in QA processes. In particular, we have two aims. First, we aim to investigate how Wikidata community processes affect data quality. Second, we aim to verify whether these processes can be leveraged to provide serendipitous answers in QA systems.

As a first step, we studied the effects of a particular Wikidata feature, the lack of enforced property constraints, on the presence of conflicts and the expression of knowledge diversity. Complete Wikidata dumps up to 4 July 2015 were used for our analysis. Our results showed that, although the current levels of conflict and knowledge diversity in Wikidata are low, they cannot be clearly connected to the feature analysed. Following that, we analysed how group features influence outcome quality. We first selected a number of characteristics deemed to be related to high performance, i.e. group size, group diversity, and level of coordination. We then analysed whether Wikidata entities found to have high quality matched what previous literature states about these group features, and compared them to the average of Wikidata entities. The next step will involve, first of all, the creation of a framework to perform large-scale data quality evaluation of Wikidata, grounded in previous findings on data quality. Subsequently, we will conduct a study to explore how connections can be made between Wikidata entities on the basis of community processes, and how these connections can be exploited to provide serendipitous answers in a QA system. Finally, a user study will be performed to assess the relevance and accuracy of the answers provided by this system.



Wikidatians Are Born: Paths to Full Participation in a Collaborative Structured Knowledge Base. Alessandro Piscopo, Christopher Phethean, Elena Simperl. HICSS-50. 2017 (to appear)