Understanding and predicting Web content credibility using the Content Credibility Corpus
The goal of our research is to create a predictive model of Web content credibility evaluations, based on human evaluations.
The model has to be based on a comprehensive set of independent factors that can be used to guide users' credibility evaluations in crowdsourced systems like WOT, but also to design machine classifiers of Web content credibility.
The factors described in this article are based on empirical data. We have created a dataset obtained from an extensive crowdsourced Web credibility assessment study (over 15 thousand evaluations of over 5000 Web pages from over 2000 participants). First, online participants evaluated a multi-domain corpus of selected Web pages. Using the acquired data and text mining techniques, we have prepared a code book and conducted another crowdsourcing round to label textual justifications of the former responses.
We have extended the list of significant credibility assessment factors described in previous research and analyzed their relationships to credibility evaluation scores. Discovered factors that affect Web content credibility evaluations are also weakly correlated, which makes them more useful for modeling and predicting credibility evaluations. Based on the newly identified factors, we propose a predictive model for Web content credibility. The model can be used to determine the significance and impact of discovered factors on credibility evaluations. These findings can guide future research on the design of automatic or semi-automatic systems for Web content credibility evaluation support. This study also contributes the largest credibility dataset currently publicly available for research: the Content Credibility Corpus (C3).
© 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
According to the findings of a 2011 survey (Purcell, 2011), 92% of American adult Internet users use search engines to find information on the Web, and 59% do so on a typical day. This and other studies confirm our intuitions regarding the important role of Web information. The Web continues to provide an extremely low-cost means of publishing information, often coupled with high incentives for doing so, since Web content can affect purchasing behaviors, opinions, and other important decisions of Web users. This combination of factors has led to large volumes of non-credible and unreliable information being published on the Web.