
  • Open access
  • Published: 18 May 2024

Emotions unveiled: detecting COVID-19 fake news on social media

  • Bahareh Farhoudinia   ORCID: orcid.org/0000-0002-2294-8885 1 ,
  • Selcen Ozturkcan   ORCID: orcid.org/0000-0003-2248-0802 1 , 2 &
  • Nihat Kasap   ORCID: orcid.org/0000-0001-5435-6633 1  

Humanities and Social Sciences Communications volume 11, Article number: 640 (2024)


  • Business and management
  • Science, technology and society

The COVID-19 pandemic has highlighted the pernicious effects of fake news, underscoring the critical need for researchers and practitioners to detect and mitigate its spread. In this paper, we examined the importance of detecting fake news and incorporated sentiment and emotional features to detect this type of news. Specifically, we compared the sentiments and emotions associated with fake and real news using a COVID-19 Twitter dataset with labeled categories. By utilizing different sentiment and emotion lexicons, we extracted sentiments categorized as positive, negative, and neutral, as well as eight basic emotions: anticipation, anger, joy, sadness, surprise, fear, trust, and disgust. Our analysis revealed that fake news tends to elicit more negative emotions than real news. Therefore, we propose that negative emotions could serve as vital features in developing fake news detection models. To test this hypothesis, we compared the performance metrics of three machine learning models: random forest, support vector machine (SVM), and Naïve Bayes. We evaluated the models’ effectiveness with and without emotional features. Our results demonstrated that integrating emotional features into these models substantially improved the detection performance, resulting in a more robust and reliable ability to detect fake news on social media. In this paper, we propose the use of novel features and methods that enhance the field of fake news detection. Our findings underscore the crucial role of emotions in detecting fake news and provide valuable insights into how machine learning models can be trained to recognize these features.


Introduction

Social media has changed human life in multiple ways. People from all around the world are connected via social media. Seeking information, entertainment, communicatory utility, convenience utility, expressing opinions, and sharing information are some of the gratifications of social media (Whiting and Williams, 2013 ). Social media is also beneficial for political parties or companies since they can better connect with their audience through social media (Kumar et al., 2016 ). Despite all the benefits that social media adds to our lives, there are also disadvantages to its use. The emergence of fake news is one of the most important and dangerous consequences of social media (Baccarella et al., 2018 , 2020 ). Zhou et al. ( 2019 ) suggested that fake news threatens public trust, democracy, justice, freedom of expression, and the economy. In the 2016 United States (US) presidential election, fake news engagement outperformed mainstream news engagement and significantly impacted the election results (Silverman, 2016 ). In addition to political issues, fake news can cause irrecoverable damage to companies. For instance, Pepsi stock fell by 4% in 2016 when a fake story about the company’s CEO spread on social media (Berthon and Pitt, 2018 ). During the COVID-19 pandemic, fake news caused serious problems, e.g., people in Europe burned 5G towers because of a rumor claiming that these towers damaged the immune system of humans (Mourad et al., 2020 ). The World Health Organization (WHO) asserted that misinformation and propaganda propagated more rapidly than the COVID-19 pandemic, leading to psychological panic, the circulation of misleading medical advice, and an economic crisis.

This study, which is a part of a completed PhD thesis (Farhoudinia, 2023), focuses on analyzing the emotions and sentiments elicited by fake news in the context of COVID-19. The purpose of this paper is to investigate how emotions can help detect fake news. This study aims to address the following research questions: 1. How do the sentiments associated with real news and fake news differ? 2. How do the emotions elicited by fake news differ from those elicited by real news? 3. What particular emotions are most prevalent in fake news? 4. How can these emotions be used to recognize fake news on social media?

This paper is arranged into six sections: Section “Related studies” reviews the related studies; Section “Methods” explains the proposed methodology; and Section “Results and analysis” presents the implemented models, analysis, and related results in detail. Section “Discussion and limitations” discusses the research limitations, and the conclusion of the study is presented in Section “Conclusion”.

Related studies

Research in the field of fake news began following the 2016 US election (Carlson, 2020 ; Wang et al., 2019 ). Fake news has been a popular topic in multiple disciplines, such as journalism, psychology, marketing, management, health care, political science, information science, and computer science (Farhoudinia et al., 2023 ). Therefore, fake news has not been defined in a single way; according to Berthon and Pitt ( 2018 ), misinformation is the term used to describe the unintentional spread of fake news. Disinformation is the term used to describe the intentional spread of fake news to mislead people or attack an idea, a person, or a company (Allcott and Gentzkow, 2017 ). Digital assets such as images and videos could be used to spread fake news (Rajamma et al., 2019 ). Advancements in computer graphics, computer vision, and machine learning have made it feasible to create fake images or movies by merging them together (Agarwal et al., 2020 ). Additionally, deep fake videos pose a risk to public figures, businesses, and individuals in the media. Detecting deep fakes is challenging, if not impossible, for humans.

The reasons for believing and sharing fake news have attracted the attention of several researchers (e.g., Al-Rawi et al., 2019; Apuke and Omar, 2020; Talwar et al., 2019). Studies have shown that people have a tendency to favor news that reinforces their existing beliefs, a cognitive phenomenon known as confirmation bias. This inclination can lead individuals to embrace misinformation that aligns with their preconceived notions (Kim and Dennis, 2019; Meel and Vishwakarma, 2020). Although earlier research focused significantly on the factors that lead people to believe and spread fake news, it is equally important to understand the cognitive mechanisms involved in this process. These cognitive mechanisms, as proposed by Kahneman (2011), center on two distinct systems of thinking. In system-one cognition, conclusions are reached without deep or conscious thought; in system-two cognition, a deeper analysis precedes decisions. According to Moravec et al. (2020), social media users evaluate news using ‘system-one’ cognition; therefore, they believe and share fake news without deep thinking. It is also essential to examine the structural aspects of social media platforms that enable the rapid spread of fake news. Social media platforms are structured to show users posts and news that align with their ideas and beliefs, which is known as the root cause of the echo chamber effect (Cinelli et al., 2021). The echo chamber effect has been identified as a factor that leads people to believe and share fake news on social media (e.g., Allcott and Gentzkow, 2017; Berthon and Pitt, 2018; Chua and Banerjee, 2018; Peterson, 2019).

In the context of our study, we emphasize the existing body of research that specifically addresses the detection of fake news (Al-Rawi et al., 2019 ; Faustini and Covões, 2020 ; Ozbay and Alatas, 2020 ; Raza and Ding, 2022 ). Numerous studies that are closely aligned with the themes of our present investigation have delved into methodological approaches for identifying fake news (Er and Yılmaz, 2023 ; Hamed et al., 2023 ; Iwendi et al., 2022 ). Fake news detection methods are classified into three categories: (i) content-based, (ii) social context, and (iii) propagation-based methods. (i) Content-based fake news detection models are based on the content and linguistic features of the news rather than user and propagation characteristics (Zhou and Zafarani, 2019 , p. 49). (ii) Fake news detection based on social context employs user demographics such as age, gender, education, and follower–followee relationships of the fake news publishers as features to recognize fake news (Jarrahi and Safari, 2023 ). (iii) Propagation-based approaches are based on the spread of news on social media. The input of the propagation-based fake news detection model is a cascade of news, not text or user profiles. Cascade size, cascade depth, cascade breadth, and node degree are common features of detection models (Giglietto et al., 2019 ; de Regt et al., 2020 ; Vosoughi et al., 2018 ).

Machine learning methods are widely used in the literature because they enable researchers to handle and process large datasets (Ongsulee, 2017 ). The use of machine learning in fake news research has been extremely beneficial, especially in the domains of content-based, social context-based, and propagation-based fake news identification. These methods leverage the advantages of a range of characteristics, including sentiment-related, propagation, temporal, visual, linguistic, and user/account aspects. Fake news detection frequently makes use of machine learning techniques such as logistic regressions, decision trees, random forests, naïve Bayes, and support vector machine (SVM). Studies on the identification of fake news also include deep learning models, such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks, which can provide better accuracy in certain situations. Even with a small amount of training data, pretrained language models such as bidirectional encoder representations from transformers (BERT) show potential for identifying fake news (Kaliyar et al., 2021 ). Amer et al. ( 2022 ) investigated the usefulness of these models in benchmark studies covering different topics.

The role of emotions in identifying fake news within academic communities remains an area with considerable potential for additional research. Despite many theoretical and empirical studies, this topic remains inadequately investigated. Ainapure et al. ( 2023 ) analyzed the sentiments elicited by tweets in India during the COVID-19 pandemic with deep learning and lexicon-based techniques using the valence-aware dictionary and sentiment reasoner (Vader) and National Research Council (NRC) lexicons to understand the public’s concerns. Dey et al. ( 2018 ) applied several natural language processing (NLP) methods, such as sentiment analysis, to a dataset of tweets about the 2016 U.S. presidential election. They found that fake news had a strong tendency toward negative sentiment; however, their dataset was too limited (200 tweets) to provide a general understanding. Cui et al. ( 2019 ) found that sentiment analysis was the best-performing component in their fake news detection framework. Ajao et al. ( 2019 ) studied the hypothesis that a relationship exists between fake news and the sentiments elicited by such news. The authors tested hypotheses with different machine learning classifiers. The best results were obtained by sentiment-aware classifiers. Pennycook and Rand ( 2020 ) argued that reasoning and analytical thinking help uncover news credibility; therefore, individuals who engage in reasoning are less likely to believe fake news. Prior psychology research suggests that an increase in the use of reason implies a decrease in the use of emotions (Mercer, 2010 ).

In this study, we apply sentiment analysis to the more general topic of fake news detection. The focus of this study is on the tweets that were shared during the COVID-19 pandemic. Many scholars focused on the effects of media reports, providing comprehensive information and explanations about the virus. However, there is still a gap in the literature on the characteristics and spread of fake news during the COVID-19 pandemic. A comprehensive study can enhance preparedness efforts for any similar future crisis. The aim of this study is to answer the question of how emotions aid in fake news detection during the COVID-19 pandemic. Our hypothesis is that fake news carries negative emotions and is written with different emotions and sentiments than those of real news. We expect to extract more negative sentiments and emotions from fake news than from real news. Existing works on fake news detection have focused mainly on news content and social context. However, emotional information has been underutilized in previous studies (Ajao et al., 2019 ). We extract sentiments and eight basic emotions from every tweet in the COVID-19 Twitter dataset and use these features to classify fake and real news. The results indicate how emotions can be used in differentiating and detecting fake and real news.
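
Methods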

With our methodology, we employed a multifaceted approach to analyze tweet text and discern sentiment and emotion. The steps involved were as follows: (a) Lexicons such as Vader, TextBlob, and SentiWordNet were used to identify sentiments embedded in the tweet content. (b) The NRC emotion lexicon was utilized to recognize the range of different emotions expressed in the tweets. (c) Machine learning models, including the random forest, naïve Bayes, and SVM classifiers, as well as a deep learning model, BERT, were integrated. These models were strategically applied to the data for fake news detection, both with and without considering emotions. This comprehensive approach allowed us to capture nuanced patterns and dependencies within the tweet data, contributing to a more effective and nuanced analysis of the fake news content on social media.

An open, science-based, publicly available dataset was utilized. The dataset comprises 10,700 English tweets with hashtags relevant to COVID-19, categorized with real and fake labels. Previously used by Vasist and Sebastian ( 2022 ) and Suter et al. ( 2022 ), the manually annotated dataset was compiled by Patwa et al. ( 2021 ) in September 2020 and includes tweets posted in August and September 2020. According to their classification, the dataset is balanced, with 5600 real news stories and 5100 fake news stories. The dataset used for the study was generated by sourcing fake news data from public fact-checking websites and social media outlets, with manual verification against the original documents. Web-based resources, including social media posts and fact-checking websites such as PolitiFact and Snopes, played a key role in collecting and adjudicating details on the veracity of claims related to COVID-19. For real news, tweets from official and verified sources were gathered, and each tweet was assessed by human reviewers based on its contribution of relevant information about COVID-19 (Patwa et al., 2021 ; Table 2 on p. 4 of Suter et al., 2022 , which is excerpted from Patwa et al. ( 2021 ), also provides an illustrative overview).

Preprocessing is an essential step in any data analysis, especially when dealing with textual data. Appropriate preprocessing steps can significantly enhance the performance of the models. The following preprocessing steps were applied to the dataset: removing any non-alphabetic characters, converting the letters to lowercase, deleting stop words such as “a,” “the,” “is,” and “are,” which carry very little useful information, and performing lemmatization. The text data were transformed into quantitative data with the scikit-learn ordinal encoder class.
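A minimal sketch of such a preprocessing pipeline is shown below, assuming NLTK (for stop words and lemmatization) and scikit-learn; the file and column names are hypothetical, and the authors’ exact code may differ.

```python
# Illustrative preprocessing sketch (not the authors' exact code); assumes the NLTK
# stopwords and WordNet data have been downloaded via nltk.download().
import re
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import OrdinalEncoder

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_tweet(text):
    text = re.sub(r"[^a-zA-Z\s]", " ", text)                 # keep alphabetic characters only
    tokens = text.lower().split()                             # lowercase and tokenize
    tokens = [t for t in tokens if t not in stop_words]       # drop stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)  # lemmatize

df = pd.read_csv("covid19_tweets.csv")                        # hypothetical file name
df["clean_text"] = df["tweet"].apply(clean_tweet)
# Encode the fake/real labels as integers with the ordinal encoder class.
df["label_enc"] = OrdinalEncoder().fit_transform(df[["label"]]).ravel()
```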

The stages involved in this research are depicted in a high-level schematic that is shown in Fig. 1 . First, the sentiments and emotions elicited by the tweets were extracted, and then, after studying the differences between fake and real news in terms of sentiments and emotions, these characteristics were utilized to construct fake news detection models.

figure 1

The figure depicts the stages involved in this research in a high-level schematic.

Sentiment analysis

Sentiment analysis is the process of deriving the sentiment of a piece of text from its content (Vinodhini and Chandrasekaran, 2012 ). Sentiment analysis, as a subfield of natural language processing, is widely used in analyzing the reviews of a product or service and social media posts related to different topics, events, products, or companies (Wankhade et al., 2022 ). One major application of sentiment analysis is in strategic marketing. Păvăloaia et al. ( 2019 ), in a comprehensive study on two companies, Coca-Cola and PepsiCo, confirmed that the activity of these two brands on social media has an emotional impact on existing or future customers and the emotional reactions of customers on social media can influence purchasing decisions. There are two methods for sentiment analysis: lexicon-based and machine-learning methods. Lexicon-based sentiment analysis uses a collection of known sentiments that can be divided into dictionary-based lexicons or corpus-based lexicons (Pawar et al., 2015 ). These lexicons help researchers derive the sentiments generated from a text document. Numerous dictionaries, such as Vader (Hutto and Gilbert, 2014 ), SentiWordNet (Esuli and Sebastiani, 2006 ), and TextBlob (Loria, 2018 ), can be used for scholarly research.

In this research, Vader, TextBlob, and SentiWordNet are the three lexicons used to extract the sentiments generated from tweets. The Vader lexicon is an open-source lexicon attuned specifically to social media (Hutto and Gilbert, 2014 ). TextBlob is a Python library that processes text specifically designed for natural language analysis (Loria, 2018 ), and SentiWordNet is an opinion lexicon adapted from the WordNet database (Esuli and Sebastiani, 2006 ). Figure 2 shows the steps for the sentiment analysis of tweets.

figure 2

The figure illustrates the steps for the sentiment analysis of tweets.
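As an illustration of this step, the sketch below derives a positive, negative, or neutral label for a tweet from each of the three lexicons; the thresholds, helper names, and use of the vaderSentiment and NLTK SentiWordNet interfaces are our assumptions rather than the authors’ exact implementation.

```python
# Illustrative per-lexicon sentiment labeling (thresholds are assumptions).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob
from nltk.corpus import sentiwordnet as swn  # requires nltk.download("sentiwordnet") and "wordnet"

vader = SentimentIntensityAnalyzer()

def vader_label(text, pos=0.05, neg=-0.05):
    score = vader.polarity_scores(text)["compound"]
    return "positive" if score >= pos else "negative" if score <= neg else "neutral"

def textblob_label(text):
    polarity = TextBlob(text).sentiment.polarity
    return "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"

def sentiwordnet_label(text):
    # Sum positive-minus-negative scores over the first synset of each token.
    total = 0.0
    for token in text.split():
        synsets = list(swn.senti_synsets(token))
        if synsets:
            total += synsets[0].pos_score() - synsets[0].neg_score()
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"
```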

Different methods and steps were used to choose the best lexicon. First, a random partition of the dataset was manually labeled as positive, negative, or neutral. The results of every lexicon were compared with the manually labeled sentiments, and the performance metrics for every lexicon are reported in Table 1. Second, assuming that misclassifying negative or positive tweets as neutral is not as critical as misclassifying negative tweets as positive, the neutral tweets were ignored, and a comparison was made on only the positive and negative tweets. The three-class and two-class classification metrics are compared in Table 1.
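A sketch of how each lexicon’s output could be scored against the manually labeled subset, in both the three-class and two-class settings, is shown below; the data frame and column names are assumptions.

```python
# Sketch of the lexicon evaluation against the hand-labeled sample (names are assumptions).
from sklearn.metrics import classification_report

manual = df_sample["manual_label"]            # hand-labeled positive/negative/neutral
for col in ["vader_label", "textblob_label", "sentiwordnet_label"]:
    # Three-class comparison
    print(col, "\n", classification_report(manual, df_sample[col]))
    # Two-class comparison: drop tweets that were manually labeled neutral
    mask = manual != "neutral"
    print(col, "(pos/neg only)\n",
          classification_report(manual[mask], df_sample.loc[mask, col]))
```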

Third, this study’s primary goal was to identify the precise distinctions between fake and real tweets to improve the detection algorithm. We therefore assessed how well fake news was detected using the sentiments from each of the three lexicons, as they produced different results. To this end, a fake news detection model was trained with the dataset using the outputs of each lexicon: Vader, TextBlob, and SentiWordNet. As previously indicated, the dataset includes labels for fake and real news, which allows for the application of supervised machine learning detection models and the evaluation of how well various models performed. The Random Forest algorithm is a supervised machine learning method that has achieved good performance in the classification of text data. The dataset contains many tweets consisting mainly of numerical data reporting the numbers of hospitalized, deceased, and recovered individuals, which do not carry any sentiment. During this phase, tweets containing numerical data were excluded; this portion constituted 20% of the tweets. Table 2 provides information on the classification power using the three lexicons with the nonnumerical data. The models were more accurate when using sentiments drawn from Vader, suggesting that the Vader lexicon better discriminates between fake and real news. Vader was selected as the superior sentiment lexicon after evaluating all three processes. The steps for choosing the best lexicon are presented in Fig. 3 (also see Appendix A in Supplementary Information for further details on the procedure). Based on the results achieved with Vader, the tweets labeled as fake include more negative sentiments than real tweets. Conversely, real tweets include more positive sentiments.

figure 3

The figure exhibits the steps for choosing the best lexicon.

Emotion extraction

Emotions elicited in tweets were extracted using the NRC emotion lexicon. This lexicon measures emotional effects from a body of text, contains ~27,000 words, and is based on the National Research Council Canada’s affect lexicon and the natural language toolkit (NLTK) library’s WordNet synonym sets (Mohammad and Turney, 2013). The lexicon includes eight scores for eight emotions based on Plutchik’s model of emotion (Plutchik, 1980): joy, trust, fear, surprise, sadness, anticipation, anger, and disgust. These emotions can be classified into four opposing pairs: joy–sadness, anger–fear, trust–disgust, and anticipation–surprise. The NRC lexicon assigns each text the emotion with the highest score. Emotion scores from the NRC lexicon for every tweet in the dataset were extracted and used as features for the fake news detection model. The features of the model include the text of the tweet, sentiment, and eight emotions. The model was trained with 80% of the data and tested with 20%. Fake news had a greater prevalence of negative emotions, such as fear, disgust, and anger, than did real news, and real news had a greater prevalence of positive emotions, such as anticipation, joy, and surprise, than did fake news.
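The sketch below illustrates one way to compute the eight NRC emotion scores per tweet and assign the dominant emotion, assuming the publicly distributed word-level lexicon file; the authors may have used a wrapper library instead.

```python
# Illustrative NRC emotion scoring (file name and tab-separated format are assumptions).
from collections import defaultdict

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def load_nrc(path="NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"):
    lexicon = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, affect, flag = line.strip().split("\t")
            if affect in EMOTIONS:
                lexicon[word][affect] = int(flag)
    return lexicon

nrc = load_nrc()

def emotion_scores(text):
    scores = {e: 0 for e in EMOTIONS}
    for token in text.split():
        for emotion, flag in nrc.get(token, {}).items():
            scores[emotion] += flag
    return scores

def dominant_emotion(text):
    scores = emotion_scores(text)
    return max(scores, key=scores.get)   # emotion with the highest score
```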

Fake news detection

In the present study, the dataset was divided into a training set (80%) and a test set (20%). The dataset was analyzed using three machine learning models: random forest, SVM, and naïve Bayes. Appendices A and B provide information on how the results were obtained and how they correlate with the research corpus.

Random forest : An ensemble learning approach that fits several decision trees to random data subsets. This classifier is popular for text classification, high-dimensional data, and feature importance since it overfits less than decision trees. The Random Forest classifier in scikit-learn was used in this study (Breiman, 2001 ).

Naïve Bayes : This model uses Bayes’ theorem to solve classification problems, such as sorting documents into groups and blocking spam. This approach works well with text data and is simple to use, robust, and well suited to multi-class problems. The Naïve Bayes classifier from scikit-learn was used in this study (Zhang, 2004).

Support vector machines (SVMs) : Supervised learning methods that are used to find outliers, classify data, and perform regression. These methods work well with data involving many dimensions. SVMs find the best hyperplanes for dividing classes. In this study, the SVM model from scikit-learn was used (Cortes and Vapnik, 1995 ).
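A minimal sketch of this setup is given below; for brevity it uses only the sentiment and emotion features (the study also uses the tweet text), and the column names, Gaussian naïve Bayes variant, and random seed are our assumptions. EMOTIONS refers to the list of eight emotion names from the earlier sketch.

```python
# Sketch of the three scikit-learn classifiers with an 80/20 split (feature names assumed).
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X = df[["sentiment_enc"] + EMOTIONS]        # sentiment plus the eight emotion scores
y = df["label_enc"].astype(int)             # 1 = fake, 0 = real (assumed encoding)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [("random forest", RandomForestClassifier(random_state=42)),
                    ("naive Bayes", GaussianNB()),
                    ("SVM", SVC())]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="binary")
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f}, "
          f"precision={prec:.3f}, recall={rec:.3f}, f1={f1:.3f}")
```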

Deep learning models can learn how to automatically describe data in a hierarchical way, making them useful for tasks such as identifying fake news (Salakhutdinov et al., 2012 ). A language model named bidirectional encoder representations from transformers (BERT) was used in this study to help discover fake news more easily.

BERT : A cutting-edge NLP model that uses deep neural networks and bidirectional learning and can distinguish patterns on both sides of a word in a sentence, which helps it understand the context and meaning of text. BERT has been pretrained with large datasets and can be fine-tuned for specific applications to capture unique data patterns and contexts (Devlin et al., 2018 ).

In summary, we applied machine learning models (random forest, naïve Bayes, and SVM) and a deep learning model (BERT) to analyze text data for fake news detection. The impact of emotion features on detecting fake news was compared between models that include these features and models that do not include these features. We found that adding emotion scores as features to machine learning and deep learning models for fake news detection can improve the model’s accuracy. A more detailed analysis of the results is given in the section “Results and analysis”.

Results and analysis

In the sentiment analysis using tweets from the dataset, positive and negative sentiment tweets were categorized into two classes: fake and real. Figure 4 shows a visual representation of the differences, while the percentages of the included categories are presented in Table 3 . In fake news, the number of negative sentiments is greater than the number of positive sentiments (39.31% vs. 31.15%), confirming our initial hypothesis that fake news disseminators use extreme negative emotions to attract readers’ attention.

figure 4

The figure displays a visual representation of the differences of sentiments in each class.

Fake news disseminators aim to attack or satirize an idea, a person, or a brand using negative words and emotions. Baumeister et al. ( 2001 ) suggested that negative events are stronger than positive events and that negative events have a more significant impact on individuals than positive events. Accordingly, individuals sharing fake news tend to express more negativity for increased impressiveness. The specific topics of the COVID-19 pandemic, such as the source of the virus, the cure for the illness, the strategy the government is using against the spread of the virus, and the spread of vaccines, are controversial topics. These topics, known for their resilience against strong opposition, have become targets of fake news featuring negative sentiments (Frenkel et al., 2020 ; Pennycook et al., 2020 ). In real news, the pattern is reversed, and positive sentiments are much more frequent than negative sentiments (46.45% vs. 35.20%). Considering that real news is spread among reliable news channels, we can conclude that reliable news channels express news with positive sentiments so as not to hurt their audience psychologically and mentally.

The eight scores for the eight emotions of anger, anticipation, disgust, fear, joy, sadness, surprise, and trust were extracted from the NRC emotion lexicon for every tweet. Each text was assigned the emotion with the highest score. Table 4 and Fig. 5 include more detailed information about the emotion distribution.

figure 5

The figure depicts more detailed information about the emotion distribution.

The NRC lexicon provides scores for each emotion. Therefore, the intensities of emotions can also be compared. Table 5 shows the average score of each emotion for the two classes, fake and real news.

A two-sample t -test was performed using the pingouin (PyPI) statistical package in Python (Vallat, 2018 ) to determine whether the difference between the two groups was significant (Tables 6 and 7 ).
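A sketch of the per-emotion two-sample t-test with pingouin is shown below; the column names are assumptions, and EMOTIONS is the list of eight emotion names from the earlier sketch.

```python
# Sketch of the per-emotion fake vs. real t-tests (column names are assumptions).
import pingouin as pg

for emotion in EMOTIONS:
    fake_scores = df.loc[df["label"] == "fake", emotion]
    real_scores = df.loc[df["label"] == "real", emotion]
    result = pg.ttest(fake_scores, real_scores)        # returns a one-row DataFrame
    print(emotion, float(result["T"].iloc[0]), float(result["p-val"].iloc[0]))
```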

As shown in Table 6 , the P values indicate that the differences in fear, anger, trust, surprise, disgust, and anticipation were significant; however, for sadness and joy, the difference between the two groups of fake and real news was not significant. Considering the statistics provided in Tables 4 , 5 , and Fig. 5 , the following conclusions can be drawn:

Anger, disgust, and fear are more commonly elicited in fake news than in real news.

Anticipation and surprise are more commonly elicited in real news than in fake news.

Fear is the most commonly elicited emotion in both fake and real news.

Trust is the second most commonly elicited emotion in fake and real news.

The most significant differences were observed for trust, fear, and anticipation (5.92%, 5.33%, and 3.05%, respectively). The differences between fake and real news in terms of joy and sadness were not significant.

In terms of intensity, based on Table 5 ,

Fear is the most strongly elicited emotion in both fake and real news; however, fake news has a higher fear intensity score than real news.

Trust is the second most strongly elicited emotion in both categories, real and fake, but is stronger in real news.

Positive emotions, such as anticipation, surprise, and trust, are more strongly elicited in real news than in fake news.

Anger, disgust, and fear are among the stronger emotions elicited by fake news. Joy and sadness are elicited in both classes almost equally.

During the COVID-19 pandemic, fake news disseminators seized the opportunity to create fearful messages aligned with their objectives. The existence of fear in real news is also not surprising because of the extraordinary circumstances of the pandemic. The most crucial point of the analysis is the significant presence of negative emotions elicited by fake news. This observation confirms our hypothesis that fake news elicits extremely negative emotions. Positive emotions such as anticipation, joy, and surprise are elicited more often in real news than in fake news, which also aligns with our hypothesis. The largest differences in elicited emotions are as follows: trust, fear, and anticipation.

We used nine features for every tweet in the dataset: the sentiment label and the eight emotion scores. These features were utilized for supervised machine learning fake news detection models. A schematic explanation of the models is given in Fig. 6. The dataset was divided into training and test sets with an 80%–20% split. The scikit-learn random forest, SVM, and Naïve Bayes models with default hyperparameters were implemented using emotion features to detect fake news in the nonnumerical data. Then, we compared the predictive power of these models with that of models without the emotion features. The performance metrics of the models, such as accuracy, precision, recall, and F1-score, are given in Table 7.

figure 6

The figure exhibits a schematic explanation of the model.
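This comparison can be sketched as follows, reusing the train/test split from the earlier classifier sketch; the text representation used by the models without emotion features is omitted here, and the feature sets are assumptions.

```python
# Sketch of the with/without-emotion comparison and the random forest importances.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

base_features = ["sentiment_enc"]             # without emotion features (text omitted here)
full_features = base_features + EMOTIONS      # with the eight emotion scores

for cols in (base_features, full_features):
    rf = RandomForestClassifier(random_state=42).fit(X_train[cols], y_train)
    print(cols, accuracy_score(y_test, rf.predict(X_test[cols])))

# Feature importances of the emotion-aware model (cf. the importance figure)
rf_full = RandomForestClassifier(random_state=42).fit(X_train[full_features], y_train)
for feat, imp in sorted(zip(full_features, rf_full.feature_importances_), key=lambda t: -t[1]):
    print(feat, round(imp, 3))
```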

When joy and sadness were removed from the models, the accuracy decreased. Thus, the models performed better when all the features were included (see Table C.1. Feature correlation scores in Supplementary Information). The results confirmed that elicited emotions can help identify fake and real news. Adding emotion features to the detection models significantly increased the performance metrics. Figure 7 presents the importance of the emotion features used in the random forest model.

figure 7

The figure illustrates the importance of the emotion features used in the Random Forest model.

In the random forest classifier, the predominant attributes were anticipation, trust, and fear. The difference in the emotion distribution between the two classes of fake and real news was also more considerable for anticipation, trust, and fear. It can be claimed that fear, trust, and anticipation emotions have good differentiating power between fake and real news.

BERT was the other model employed for fake news detection using emotion features. Its pipeline includes several preprocessing stages: the text input is segmented using the BERT tokenizer, with truncation and padding ensuring that sequences do not exceed 128 tokens, a reduction from the usual 512 tokens due to constraints on computing resources. Optimization used the AdamW optimizer with a learning rate of 0.00001. To determine the best number of training cycles, 5-fold cross-validation was applied, which established that three epochs were optimal; the model was therefore trained for three epochs, executed on Google Colab in Python, and evaluated with the test set after training. Table 8 shows the performance of the BERT model with and without using emotions as features.
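A minimal fine-tuning sketch with the reported settings (128-token sequences, AdamW, a 0.00001 learning rate, three epochs) is given below using the Hugging Face transformers library; the pretrained checkpoint, batch size, and data handling are assumptions, and the way emotion features were combined with the text is not shown.

```python
# Minimal BERT fine-tuning sketch (checkpoint, batch size, and data handling assumed).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# train_texts is a list of tweet strings and train_labels a list of 0/1 labels (assumed).
enc = tokenizer(train_texts, truncation=True, padding="max_length",
                max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):                                    # three epochs, as reported
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        out.loss.backward()
        optimizer.step()
```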

The results indicate that adding emotion features had a positive impact on the performance of the random forest, SVM, and BERT models; however, the naïve Bayes model achieved better performance without adding emotion features.

Discussion and limitations

This research makes a substantial impact on the domain of detecting fake news. The goal was to explore the range of sentiments and emotional responses linked to both real and fake news in pursuit of fulfilling the research aims and addressing the posed inquiries. By identifying the emotions provoked as key indicators of fake news, this study adds valuable insights to the existing corpus of related scholarly work.

Our research revealed that fake news triggers a higher incidence of negative emotions compared to real news. Sentiment analysis indicated that creators of fake news on social media platforms tend to invoke more negative sentiments than positive ones, whereas real news generally elicits more positive sentiments than negative ones. We extracted eight emotions—anger, anticipation, disgust, fear, joy, sadness, surprise, and trust—from each tweet analyzed. Negative and potent emotions such as fear, disgust, and anger were more frequently found elicited in fake news, in contrast to real news, which was more likely to arouse lighter and positive emotions such as anticipation, joy, and surprise. The difference in emotional response extended beyond the range of emotions to their intensity, with negative feelings like fear, anger, and disgust being more pronounced in fake news. We suggest that the inclusion of emotional analysis in the development of automated fake news detection algorithms could improve the effectiveness of the machine learning and deep learning models designed for fake news detection in this study.

Due to negativity bias (Baumeister et al., 2001), bad news, emotions, and feedback tend to have an outsized influence compared with positive experiences. This suggests that humans are more likely to assign greater weight to negative events than to positive ones (Lewicka et al., 1992). Our findings indicate that similar effects are reflected in social media user behavior, such as sharing and retweeting. Furthermore, the addition of emotional features to the fake news detection models was found to improve their performance, providing an opportunity to investigate their moderating effects on fake news dissemination in future research.

The majority of the current research on identifying fake news involves analyzing the social environment and news content (Amer et al., 2022 ; Jarrahi and Safari, 2023 ; Raza and Ding, 2022 ). Despite its possible importance, the investigation of emotional data has not received sufficient attention in the past (Ajao et al., 2019 ). Although sentiment in fake news has been studied in the literature, earlier studies mostly neglected a detailed examination of certain emotions. Dey et al. ( 2018 ) contributed to this field by revealing a general tendency toward negativity in fake news. Their results support our research and offer evidence for the persistent predominance of negative emotions elicited by fake news. Dey et al. ( 2018 ) also found that trustworthy tweets, on the other hand, tended to be neutral or positive in sentiment, highlighting the significance of sentiment polarity in identifying trustworthy information.

Expanding upon this sentiment-focused perspective, Cui et al. ( 2019 ) observed a significant disparity in the sentiment polarity of comments on fake news as opposed to real news. Their research emphasized the clear emotional undertones in user reactions to false material, highlighting the importance of elicited emotions in the context of fake news. Similarly, Dai et al. ( 2020 ) analyzed false health news and revealed a tendency for social media replies to real news to be marked by a more upbeat tone. These comparative findings highlight how elicited emotions play a complex role in influencing how people engage with real and fake news.

Our analysis revealed that the emotions conveyed in fake tweets during the COVID-19 pandemic are in line with the more general trends found in other studies on fake news. However, our research extends beyond that of current studies by offering detailed insights into the precise distribution and strength of emotions elicited by fake tweets. This detailed research closes a significant gap in the body of literature by adding a fresh perspective on our knowledge of emotional dynamics in the context of disseminating false information. Our research contributes significantly to the current discussion on fake news identification by highlighting these comparative aspects and illuminating both recurring themes and previously undiscovered aspects of emotional data in the age of misleading information.

The present analysis was performed with a COVID-19 Twitter dataset, which does not cover the whole period of the pandemic. A complementary study on a dataset that covers a wider time interval might yield more generalizable findings, while our study represents a new effort in the field. In this research, the elicited emotions of fake and real news were compared, and the emotion with the highest score was assigned to each tweet, while an alternative method could be to compare the emotion score intervals for fake and real news. The performance of detection models could be further improved by using pretrained emotion models and adding additional emotion features to the models. In a future study, our hypothesis that “fake news and real news are different in terms of elicited emotions, and fake news elicits more negative emotions” could be examined in an experimental field study. Additionally, the premises and suppositions underlying this study could be tested in emergency scenarios beyond the COVID-19 context to enhance the breadth of crisis readiness.

The field of fake news research is interdisciplinary, drawing on the expertise of scholars from various domains who can contribute significantly by formulating pertinent research questions. Psychologists and social scientists have the opportunity to delve into the motivations and objectives behind the creators of fake news. Scholars in management can offer strategic insights for organizations to deploy in countering the spread of fake news. Legislators are in a position to draft laws that effectively stem the flow of fake news across social media channels. In addition, the combined efforts of researchers from other academic backgrounds can make substantial additions to the existing literature on fake news.
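
Conclusion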

The aim of this research was to propose novel attributes for current fake news identification techniques and to explore the emotional and sentiment distinctions between fake news and real news. This study was designed to tackle the subsequent research questions: 1. How do the sentiments associated with real news and fake news differ? 2. How do the emotions elicited by fake news differ from those elicited by real news? 3. What particular elicited emotions are most prevalent in fake news? 4. How could these elicited emotions be used to recognize fake news on social media? To answer these research questions, we thoroughly examined tweets related to COVID-19. We employed a comprehensive strategy, integrating lexicons such as Vader, TextBlob, and SentiWordNet together with machine learning models, including random forest, naïve Bayes, and SVM, as well as a deep learning model named BERT. We first performed sentiment analysis using the lexicons. Fake news elicited more negative sentiments, supporting the idea that disseminators use extreme negativity to attract attention. Real news elicited more positive sentiments, as expected from trustworthy news channels. For fake news, there was a greater prevalence of negative emotions, including fear, disgust, and anger, while for real news, there was a greater frequency of positive emotions, such as anticipation, joy, and surprise. The intensity of these emotions further differentiated fake and real news, with fear being the most dominant emotion in both categories. We applied machine learning models (random forest, naïve Bayes, SVM) and a deep learning model (BERT) to detect fake news using sentiment and emotion features. The models demonstrated improved accuracy when incorporating emotion features. Anticipation, trust, and fear emerged as significant differentiators between fake and real news, according to the random forest feature importance analysis.

The findings of this research could lead to reliable resources for communicators, managers, marketers, psychologists, sociologists, and crisis and social media researchers to further explain social media behavior and contribute to the existing fake news detection approaches. The main contribution of this study is the introduction of emotions as a role-playing feature in fake news detection and the explanation of how specific elicited emotions differ between fake and real news. The elicited emotions extracted from social media during a crisis such as the COVID-19 pandemic could not only be an important variable for detecting fake news but also provide a general overview of the dominant emotions among individuals and the mental health of society during such a crisis. Investigating and extracting further features of fake news has the potential to improve the identification of fake news and may allow for the implementation of preventive measures. Furthermore, the suggested methodology could be applied to detecting fake news in fields such as politics, sports, and advertising. We expect to observe a similar impact of emotions on other topics as well.

Data availability

The datasets analyzed during the current study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.10951346 .
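
References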

Agarwal S, Farid H, El-Gaaly T, Lim S-N (2020) Detecting Deep-Fake Videos from Appearance and Behavior. 2020 IEEE International Workshop on Information Forensics and Security (WIFS), 1–6. https://doi.org/10.1109/WIFS49906.2020.9360904

Ainapure BS, Pise RN, Reddy P, Appasani B, Srinivasulu A, Khan MS, Bizon N (2023) Sentiment analysis of COVID-19 tweets using deep learning and lexicon-based approaches. Sustainability 15(3):2573. https://doi.org/10.3390/su15032573


Ajao O, Bhowmik D, Zargari S (2019) Sentiment Aware Fake News Detection on Online Social Networks. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2507–2511. https://doi.org/10.1109/ICASSP.2019.8683170

Al-Rawi A, Groshek J, Zhang L (2019) What the fake? Assessing the extent of networked political spamming and bots in the propagation of# fakenews on Twitter. Online Inf Rev 43(1):53–71. https://doi.org/10.1108/OIR-02-2018-0065

Allcott H, Gentzkow M (2017) Social media and fake news in the 2016 election. J Econ Perspect 31(2):211–236. https://doi.org/10.1257/jep.31.2.211

Amer E, Kwak K-S, El-Sappagh S (2022) Context-based fake news detection model relying on deep learning models. Electronics (Basel) 11(8):1255. https://doi.org/10.3390/electronics11081255

Apuke OD, Omar B (2020) User motivation in fake news sharing during the COVID-19 pandemic: an application of the uses and gratification theory. Online Inf Rev 45(1):220–239. https://doi.org/10.1108/OIR-03-2020-0116

Baccarella CV, Wagner TF, Kietzmann JH, McCarthy IP (2018) Social media? It’s serious! Understanding the dark side of social media. Eur Manag J 36(4):431–438. https://doi.org/10.1016/j.emj.2018.07.002

Baccarella CV, Wagner TF, Kietzmann JH, McCarthy IP (2020) Averting the rise of the dark side of social media: the role of sensitization and regulation. Eur Manag J 38(1):3–6. https://doi.org/10.1016/j.emj.2019.12.011

Baumeister RF, Bratslavsky E, Finkenauer C, Vohs KD (2001) Bad is stronger than good. Rev Gen Psychol 5(4):323–370. https://doi.org/10.1037/1089-2680.5.4.323

Berthon PR, Pitt LF (2018) Brands, truthiness and post-fact: managing brands in a post-rational world. J Macromark 38(2):218–227. https://doi.org/10.1177/0276146718755869

Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324

Carlson M (2020) Fake news as an informational moral panic: the symbolic deviancy of social media during the 2016 US presidential election. Inf Commun Soc 23(3):374–388. https://doi.org/10.1080/1369118X.2018.1505934

Chua AYK, Banerjee S (2018) Intentions to trust and share online health rumors: an experiment with medical professionals. Comput Hum Behav 87:1–9. https://doi.org/10.1016/j.chb.2018.05.021

Cinelli M, De Francisci Morales G, Galeazzi A, Quattrociocchi W, Starnini M (2021) The echo chamber effect on social media. Proc Natl Acad Sci USA 118(9). https://doi.org/10.1073/pnas.2023301118

Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297. https://doi.org/10.1007/BF00994018

Cui L, Wang S, Lee D (2019) SAME: sentiment-aware multi-modal embedding for detecting fake news. 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 41–48. https://doi.org/10.1145/3341161.3342894

Dai E, Sun Y, Wang S (2020) Ginger cannot cure cancer: battling fake health news with a comprehensive data repository. In: Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM 2020), pp 853–862. AAAI Press

de Regt A, Montecchi M, Lord Ferguson S (2020) A false image of health: how fake news and pseudo-facts spread in the health and beauty industry. J Product Brand Manag 29(2):168–179. https://doi.org/10.1108/JPBM-12-2018-2180

Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. https://doi.org/10.48550/arXiv.1810.04805

Dey A, Rafi RZ, Parash SH, Arko SK, Chakrabarty A (2018) Fake news pattern recognition using linguistic analysis. Paper presented at the 2018 joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Kitakyushu, Japan. pp. 305–309

Er MF, Yılmaz YB (2023) Which emotions of social media users lead to dissemination of fake news: sentiment analysis towards Covid-19 vaccine. J Adv Res Nat Appl Sci 9(1):107–126. https://doi.org/10.28979/jarnas.1087772

Esuli A, Sebastiani F (2006) Sentiwordnet: A publicly available lexical resource for opinion mining. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Farhoudinia B (2023) Analyzing effects of emotions on fake news detection: a COVID-19 case study. PhD Thesis, Sabanci Graduate Business School, Sabanci University

Farhoudinia B, Ozturkcan S, Kasap N (2023) Fake news in business and management literature: a systematic review of definitions, theories, methods and implications. Aslib J Inf Manag https://doi.org/10.1108/AJIM-09-2022-0418

Faustini PHA, Covões TF (2020) Fake news detection in multiple platforms and languages. Expert Syst Appl 158:113503. https://doi.org/10.1016/j.eswa.2020.113503

Frenkel S, Davey A, Zhong R (2020) Surge of virus misinformation stumps Facebook and Twitter. N Y Times (Online) https://www.nytimes.com/2020/03/08/technology/coronavirus-misinformation-social-media.html

Giglietto F, Iannelli L, Valeriani A, Rossi L (2019) ‘Fake news’ is the invention of a liar: how false information circulates within the hybrid news system. Curr Sociol 67(4):625–642. https://doi.org/10.1177/0011392119837536

Hamed SK, Ab Aziz MJ, Yaakub MR (2023) Fake news detection model on social media by leveraging sentiment analysis of news content and emotion analysis of users’ comments. Sensors (Basel, Switzerland) 23(4):1748. https://doi.org/10.3390/s23041748


Hutto C, Gilbert E (2014) VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550

Iwendi C, Mohan S, Khan S, Ibeke E, Ahmadian A, Ciano T (2022) Covid-19 fake news sentiment analysis. Comput Electr Eng 101:107967. https://doi.org/10.1016/j.compeleceng.2022.107967


Jarrahi A, Safari L (2023) Evaluating the effectiveness of publishers’ features in fake news detection on social media. Multimed Tools Appl 82(2):2913–2939. https://doi.org/10.1007/s11042-022-12668-8


Kahneman D (2011) Thinking, fast and slow, 1st edn. Farrar, Straus and Giroux

Kaliyar RK, Goswami A, Narang P (2021) FakeBERT: fake news detection in social media with a BERT-based deep learning approach. Multimed Tools Appl 80(8):11765–11788. https://doi.org/10.1007/s11042-020-10183-2

Kim A, Dennis AR (2019) Says who? The effects of presentation format and source rating on fake news in social media. MIS Q 43(3):1025–1039. https://doi.org/10.25300/MISQ/2019/15188

Kumar A, Bezawada R, Rishika R, Janakiraman R, Kannan PK (2016) From social to sale: the effects of firm-generated content in social media on customer behavior. J Mark 80(1):7–25. https://doi.org/10.1509/jm.14.0249

Lewicka M, Czapinski J, Peeters G (1992) Positive-negative asymmetry or when the heart needs a reason. Eur J Soc Psychol 22(5):425–434. https://doi.org/10.1002/ejsp.2420220502

Loria S (2018) TextBlob documentation. Release 0.15.2. https://readthedocs.org/projects/textblob/downloads/pdf/latest/

Meel P, Vishwakarma DK (2020) Fake news, rumor, information pollution in social media and web: a contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst Appl 153:112986. https://doi.org/10.1016/j.eswa.2019.112986

Mercer J (2010) Emotional beliefs. Int Organ 64(1):1–31. https://www.jstor.org/stable/40607979

Mohammad SM, Turney PD (2013) Crowdsourcing a word–emotion association lexicon. Comput Intell 29(3):436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x


Moravec PL, Kim A, Dennis AR (2020) Appealing to sense and sensibility: system 1 and system 2 interventions for fake news on social media. Inf Syst Res 31(3):987–1006. https://doi.org/10.1287/isre.2020.0927

Mourad A, Srour A, Harmanai H, Jenainati C, Arafeh M (2020) Critical impact of social networks infodemic on defeating coronavirus COVID-19 pandemic: Twitter-based study and research directions. IEEE Trans Netw Serv Manag 17(4):2145–2155. https://doi.org/10.1109/TNSM.2020.3031034

Ongsulee P (2017) Artificial intelligence, machine learning and deep learning. Paper presented at the 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE)

Ozbay FA, Alatas B (2020) Fake news detection within online social media using supervised artificial intelligence algorithms. Physica A 540:123174. https://doi.org/10.1016/j.physa.2019.123174

Patwa P, Sharma S, Pykl S, Guptha V, Kumari G, Akhtar MS, Ekbal A, Das A, Chakraborty T (2021) Fighting an Infodemic: COVID-19 fake news dataset. In: Combating online hostile posts in regional languages during emergency situation. Cham, Springer International Publishing

Păvăloaia V-D, Teodor E-M, Fotache D, Danileţ M (2019) Opinion mining on social media data: sentiment analysis of user preferences. Sustainability 11(16):4459. https://doi.org/10.3390/su11164459

Pawar KK, Shrishrimal PP, Deshmukh RR (2015) Twitter sentiment analysis: a review. Int J Sci Eng Res 6(4):957–964


Pennycook G, McPhetres J, Zhang Y, Lu JG, Rand DG (2020) Fighting COVID-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychol Sci 31(7):770–780. https://doi.org/10.1177/0956797620939054

Pennycook G, Rand DG (2020) Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. J Personal 88(2):185–200. https://doi.org/10.1111/jopy.12476

Peterson M (2019) A high-speed world with fake news: brand managers take warning. J Product Brand Manag 29(2):234–245. https://doi.org/10.1108/JPBM-12-2018-2163

Plutchik R (1980) A general psychoevolutionary theory of emotion. In: Plutchik R, Kellerman H (eds) Theories of emotion. Elsevier, pp 3–33. https://doi.org/10.1016/B978-0-12-558701-3.50007-7

Rajamma RK, Paswan A, Spears N (2019) User-generated content (UGC) misclassification and its effects. J Consum Mark 37(2):125–138. https://doi.org/10.1108/JCM-08-2018-2819

Raza S, Ding C (2022) Fake news detection based on news content and social contexts: a transformer-based approach. Int J Data Sci Anal 13(4):335–362. https://doi.org/10.1007/s41060-021-00302-z

Salakhutdinov R, Tenenbaum JB, Torralba A (2012) Learning with hierarchical-deep models. IEEE Trans Pattern Anal Mach Intell 35(8):1958–1971. https://doi.org/10.1109/TPAMI.2012.269

Silverman C (2016) This Analysis Shows How Viral Fake Election News Stories Outperformed Real News On Facebook. BuzzFeed News 16. https://www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook

Suter V, Shahrezaye M, Meckel M (2022) COVID-19 Induced misinformation on YouTube: an analysis of user commentary. Front Political Sci 4:849763. https://doi.org/10.3389/fpos.2022.849763

Talwar S, Dhir A, Kaur P, Zafar N, Alrasheedy M (2019) Why do people share fake news? Associations between the dark side of social media use and fake news sharing behavior. J Retail Consum Serv 51:72–82. https://doi.org/10.1016/j.jretconser.2019.05.026

Vallat R (2018) Pingouin: statistics in Python. J Open Source Softw 3(31):1026. https://doi.org/10.21105/joss.01026


Vasist PN, Sebastian M (2022) Tackling the infodemic during a pandemic: A comparative study on algorithms to deal with thematically heterogeneous fake news. Int J Inf Manag Data Insights 2(2):100133. https://doi.org/10.1016/j.jjimei.2022.100133

Vinodhini G, Chandrasekaran R (2012) Sentiment analysis and opinion mining: a survey. Int J Adv Res Comput Sci Softw Eng 2(6):282–292

Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science 359(6380):1146–1151. https://doi.org/10.1126/science.aap9559


Wang Y, McKee M, Torbica A, Stuckler D (2019) Systematic literature review on the spread of health-related misinformation on social media. Soc Sci Med 240:112552. https://doi.org/10.1016/j.socscimed.2019.112552

Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55(7):5731–5780. https://doi.org/10.1007/s10462-022-10144-1

Whiting A, Williams D (2013) Why people use social media: a uses and gratifications approach. Qual Mark Res 16(4):362–369. https://doi.org/10.1108/QMR-06-2013-0041

Zhang H (2004) The optimality of naive Bayes. In: Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004). AAAI Press

Zhou X, Zafarani R (2019) Network-based fake news detection: A pattern-driven approach. ACM SIGKDD Explor Newsl 21(2):48–60. https://doi.org/10.1145/3373464.3373473

Zhou X, Zafarani R, Shu K, Liu H (2019) Fake news: Fundamental theories, detection strategies and challenges. Paper presented at the Proceedings of the twelfth ACM international conference on web search and data mining. https://doi.org/10.1145/3289600.3291382


Open access funding provided by Linnaeus University.

Author information

Authors and affiliations

Sabancı Business School, Sabancı University, Istanbul, Turkey

Bahareh Farhoudinia, Selcen Ozturkcan & Nihat Kasap

School of Business and Economics, Linnaeus University, Växjö, Sweden

Selcen Ozturkcan


Contributions

Bahareh Farhoudinia (first author) conducted the research, retrieved the open access data collected by other researchers, conducted the analysis, and drafted the manuscript as part of her PhD thesis successfully completed at Sabancı University in the year 2023. Selcen Ozturkcan (second author and PhD co-advisor) provided extensive guidance throughout the research process, co-wrote sections of the manuscript, and offered critical feedback on the manuscript. Nihat Kasap (third author and PhD main advisor) oversaw the overall project and provided valuable feedback on the manuscript.

Corresponding author

Correspondence to Selcen Ozturkcan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was not required as the study did not involve a design that requires consent.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Farhoudinia, B., Ozturkcan, S. & Kasap, N. Emotions unveiled: detecting COVID-19 fake news on social media. Humanit Soc Sci Commun 11 , 640 (2024). https://doi.org/10.1057/s41599-024-03083-5

Download citation

Received: 02 June 2023

Accepted: 22 April 2024

Published: 18 May 2024

DOI: https://doi.org/10.1057/s41599-024-03083-5



Open Access

Peer-reviewed

Research Article

A systematic review on fake news research through the lens of news creation and consumption: Research efforts, challenges, and future directions

Bogoan Kim, Aiping Xiong, Dongwon Lee, Kyungsik Han

Affiliations: School of Intelligence Computing, Hanyang University, Seoul, Republic of Korea; College of Information Sciences and Technology, Pennsylvania State University, State College, PA, United States of America

* Corresponding author. E-mail: [email protected]

Published: December 9, 2021 | https://doi.org/10.1371/journal.pone.0260080

Correction (28 Dec 2023): The PLOS One Staff (2023) Correction: A systematic review on fake news research through the lens of news creation and consumption: Research efforts, challenges, and future directions. PLOS ONE 18(12): e0296554. https://doi.org/10.1371/journal.pone.0296554

Abstract

Although fake news creation and consumption are mutually related and one can turn into the other, our review indicates that a significant amount of research has focused primarily on news creation. To address this research gap, we present a comprehensive survey of fake news research, conducted in the fields of computer and social sciences, through the lens of news creation and consumption with internal and external factors.

We collect 2,277 fake news-related articles by searching six primary publishers (ACM, IEEE, arXiv, APA, ELSEVIER, and Wiley) from July to September 2020. These articles are screened according to specific inclusion criteria (see Fig 1). Eligible articles are categorized, and temporal trends of fake news research are examined.

To acquire a more comprehensive understanding of fake news and identify effective countermeasures, our review suggests (1) developing a computational model that considers the characteristics of news consumption environments leveraging insights from social science, (2) understanding the diversity of news consumers through mental models, and (3) increasing consumers' awareness of the characteristics and impacts of fake news through the support of transparent information access and education.

We discuss the importance and direction of supporting one’s “digital media literacy” in various news generation and consumption environments through the convergence of computational and social science research.

Citation: Kim B, Xiong A, Lee D, Han K (2021) A systematic review on fake news research through the lens of news creation and consumption: Research efforts, challenges, and future directions. PLoS ONE 16(12): e0260080. https://doi.org/10.1371/journal.pone.0260080

Editor: Luigi Lavorgna, Universita degli Studi della Campania Luigi Vanvitelli, ITALY

Received: March 24, 2021; Accepted: November 2, 2021; Published: December 9, 2021

Copyright: © 2021 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript.

Funding: This research was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (2019-0-01584, 2020-0-01373).

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

The spread of fake news not only deceives the public but also affects society, politics, the economy, and culture. For instance, Buzzfeed ( https://www.buzzfeed.com/ ) compared and analyzed engagement (e.g., likes, comments, share activities) with the 20 real news and 20 fake news articles that spread the most on Facebook during the last three months of the 2016 US Presidential Election. According to the results, total engagement with fake news (8.7 million) was higher than that with mainstream news (7.3 million), and 17 of the 20 fake news articles favored the eventual winner of the election [1]. Pakistan's Ministry of Defense posted a tweet fiercely condemning Israel after coming to believe that Israel had threatened Pakistan with nuclear weapons, a threat that was later found to be false [2]. Recently, the spread of the absurd rumor that COVID-19 propagates through 5G base stations in the UK upset many people and resulted in a base station being set on fire [3].

The fake news phenomenon has been evolving rapidly with the emergence of social media [4, 5]. Fake news can be shared by friends, followers, or even strangers within only a few seconds, and repeating this process could lead the public to form a distorted collective intelligence [6]. This can further develop into diverse social problems (e.g., a base station being set on fire because of rumors). In addition, some people believe and propagate fake news because of their personal norms, regardless of the factuality of the content [7]. Research in social science has suggested that cognitive bias (e.g., confirmation bias, bandwagon effect, and choice-supportive bias) [8] is one of the most pivotal factors behind irrational decisions in both the creation and consumption of fake news [9, 10]. Cognitive bias greatly contributes to the formation and reinforcement of the echo chamber [11], meaning that news consumers share and consume information only in the direction of strengthening their existing beliefs [12].

Research using computational techniques (e.g., machine or deep learning) has been actively conducted for the past decade to investigate the current state of fake news and detect it effectively [13]. In particular, research into text-based feature selection and the development of detection models has been very active and extensive [14–17]. Research has also been active in the collection of fake news datasets [18, 19] and fact-checking methodologies for model development [20–22]. Recently, Deepfake, which can manipulate images or videos through deep learning technology, has been used to create fake news images or videos, significantly increasing social concerns [23], and a growing body of research seeks ways of mitigating such concerns [24–26]. In addition, some research on system development (e.g., a game to increase awareness of the negative aspects of fake news) has been conducted to educate the public and to keep them from falling into echo chambers, misunderstandings, poor decision-making, blind belief, and the propagation of fake news [27–29].

While the creation and consumption of fake news are clearly different behaviors, the characteristics of the online environment (e.g., information can be easily created, shared, and consumed by anyone, anytime, anywhere) have blurred the boundaries between fake news creators and consumers. Depending on the situation, people can quickly change their roles from fake news consumers to creators, or vice versa (with or without intention). Furthermore, news creation and consumption are the most fundamental aspects that form the relationship between news and people. However, a significant amount of fake news research has focused on news creation, while considerably less attention has been paid to news consumption (see Figs 1 & 2). This suggests that we must consider fake news from a comprehensive perspective that covers both news creation and consumption.

[Fig 1. PRISMA flow chart of the articles identified, screened, excluded, and included in the review. https://doi.org/10.1371/journal.pone.0260080.g001]

[Fig 2. The papers published in IEEE, ACM, ELSEVIER, arXiv, Wiley, and APA from 2010 to 2020, classified by publisher, main category, sub-category, and evaluation method (left to right). https://doi.org/10.1371/journal.pone.0260080.g002]

In this paper, we looked into fake news research through the lens of news creation and consumption ( Fig 3 ). Our survey results offer different yet salient insights on fake news research compared with other survey papers (e.g., [ 13 , 30 , 31 ]), which primarily focus on fake news creation. The main contributions of our survey are as follows:

  • We investigate trends in fake news research from 2010 to 2020 and confirm the need to apply a comprehensive perspective to the fake news phenomenon.
  • We present fake news research through the lens of news creation and consumption with external and internal factors.
  • We examine key findings with a mental model approach, which highlights individuals’ differences in information understandings, expectations, or consumption.
  • We summarize our review and discuss complementary roles of computer and social sciences and potential future directions for fake news research.

[Fig 3. We investigate fake news research trends (Section 2) and examine fake news creation and consumption through the lenses of external and internal factors: (a) indicates fake news creation (Section 3), and (b) indicates fake news consumption (Section 4). "Possible moves" indicates that news consumers may create or propagate fake news without being aware of any negative impact. https://doi.org/10.1371/journal.pone.0260080.g003]

2 Fake news definition and trends

There is still no definition of fake news that encompasses false news and the various types of disinformation (e.g., satire, fabricated content) and that has reached a social consensus [30]. The definition continues to change over time and may vary depending on the research focus. Some research has defined fake news as false news based on the intention and factuality of the information [4, 15, 32–36]. For example, Allcott and Gentzkow [4] defined fake news as "news articles that are intentionally and verifiably false and could mislead readers." Other studies have defined it as "a news article or message published and propagated through media, carrying false information regardless of the means and motives behind it" [13, 37–43]. Under this definition, fake news refers to false information that causes an individual to be deceived or to doubt the truth, and fake news becomes effective only if it actually deceives or confuses consumers. Zhou and Zafarani [31] proposed a broad definition ("Fake news is false news.") that encompasses false online content and a narrow definition ("Fake news is intentionally and verifiably false news published by a news outlet."). The narrow definition is valid from the fake news creation perspective. However, given that fake news creators and consumers are now interchangeable (e.g., news consumers also act as gatekeepers for fake news propagation), it has become important to understand and investigate fake news from the consumption perspective as well. Thus, in this paper, we use the broad definition of fake news.

Our motivation for considering both news creation and consumption in fake news research was based on a trend analysis. We collected 2,277 fake news-related articles using four keywords (i.e., fake news, false information, misinformation, rumor) to identify longitudinal trends in fake news research from 2010 to 2020. The data collection was conducted from July to September 2020, and the inclusion criterion was whether any of these keywords appeared in the title or abstract. To reflect diverse research backgrounds/domains, we considered six primary publishers (ACM, IEEE, arXiv, APA, ELSEVIER, and Wiley). The number of papers collected from each publisher is as follows: 852 IEEE (37%), 639 ACM (28%), 463 ELSEVIER (20%), 142 arXiv (7%), 141 Wiley (6%), 40 APA (2%). We excluded 59 papers that did not have an abstract and used 2,218 papers for the analysis. We then randomly chose 200 papers, and two coders conducted manual inspection and categorization. Inter-coder reliability was verified with Cohen's Kappa. The scores for each main/sub-category were at least 0.72 (min: 0.72, max: 0.95, avg: 0.85), indicating that the inter-coder reliability lies between "substantial" and "perfect" [44]. Through the coding procedure, we excluded non-English studies (n = 12) and reports on study protocol only (n = 6), and 182 papers were included in the synthesis. The PRISMA flow chart depicts the number of articles identified, included, and excluded (see Fig 1).
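To make the agreement statistic concrete, the short sketch below shows how Cohen's Kappa can be computed for two coders assigning papers to the creation/consumption categories; the labels are hypothetical and do not reproduce the study's actual coding data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two coders for the same ten papers
# (main categories "creation" vs. "consumption"); not the authors' actual data.
coder_a = ["creation", "creation", "consumption", "creation", "consumption",
           "creation", "creation", "consumption", "creation", "creation"]
coder_b = ["creation", "consumption", "consumption", "creation", "consumption",
           "creation", "creation", "consumption", "creation", "consumption"]

# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
# p_e is the agreement expected by chance.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```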

The papers were categorized into two main categories: (1) creation (studies with efforts to detect fake news or mitigate spread of fake news) and (2) consumption (studies that reported the social impacts of fake news on individuals or societies and how to appropriately handle fake news). Each main category was then classified into sub-categories. Fig 4 shows the frequency of the entire literature by year and the overall trend of fake news research. It appears that the consumption perspective of fake news still has not received sufficient attention compared with the creation perspective ( Fig 4(a) ). Fake news studies have exploded since the 2016 US Presidential Election, and the trend of increase in fake news research continues. In the creation category, the majority of papers (135 out of 158; 85%) were related to the false information (e.g., fake news, rumor, clickbait, spam) detection model ( Fig 4(b) ). On the other hand, in the consumption category, much research pertains to data-driven fake news trend analysis (18 out of 42; 43%) or fake content consumption behavior (16 out of 42; 38%), including studies for media literacy education or echo chamber awareness ( Fig 4(c) ).

[Fig 4. We collected 2,277 fake news-related papers and randomly chose and categorized 200 of them. Each marker indicates the number of fake news studies per type published in a given year. Fig 4(a) shows the research trend of news creation and consumption (main category). Fig 4(b) and 4(c) show trends in the sub-categories of news creation and consumption. In Fig 4(b), "Miscellaneous" includes studies on stance/propaganda detection and a survey paper. In Fig 4(c), "Data-driven fake news trend analysis" mainly covers studies reporting the influence of fake news spread around specific political/social events (e.g., fake news in the 2016 Presidential Election, rumors on Weibo after the 2015 Tianjin explosions). "Conspiracy theory" refers to an unverified rumor that was passed on to the public. https://doi.org/10.1371/journal.pone.0260080.g004]

3 Fake news creation

Fake news is no longer merely propaganda spread by inflammatory politicians; it is also made for financial benefit or personal enjoyment [ 45 ]. With the development of social media platforms people often create completely false information for reasons beyond satire. Further, there is a vicious cycle of this false information being abused by politicians and agitators.

Fake news creators produce fake news indiscriminately while exploiting the behavioral and psychological characteristics of today's news consumers [46]. For instance, the sleeper effect [47] refers to a phenomenon in which the persuasive effect of a message increases over time, even though its source has low credibility. In other words, after a long period of time, memory of the source fades and only the content tends to be remembered, regardless of the source's reliability. Through this process, less reliable information becomes more persuasive over time. Fake news creators have effectively created and propagated fake news by targeting the public's preference for consuming news through peripheral processing routes [35, 48].

The notion of peripheral routes comes from the elaboration likelihood model (ELM) [49], one of the representative psychological theories of persuasive message processing. According to the ELM, the processing of a persuasive message can follow a central or a peripheral route depending on the level of involvement. If the message recipient puts a great deal of cognitive effort into processing, the central route is taken. If processing of the message is limited due to personal characteristics or distractions, the peripheral route is taken. Through a peripheral route, a decision is made based on secondary cues (e.g., speakers, comments) rather than the logic or strength of the argument.

Wang et al. [50] demonstrated that most of the links shared or mentioned in social media have never even been clicked. This implies that many people perceive and process information in only a fragmentary way, such as via news headlines and the people sharing the news, rather than considering the logical flow of the news content.

In this section, we closely examine the external and internal factors affecting fake news creation, as well as the research efforts carried out to mitigate their negative consequences from the fake news creation perspective.

3.1 External factors: Fake news creation facilitators

We identified two external factors that facilitate fake news creation and propagation: (1) the unification of news creation, consumption, and distribution, and (2) the misuse of AI technology (see Fig 5). A third factor, the use of social media as a news platform, is discussed from the consumption perspective in Section 4.1.3.

[Fig 5. We identify two external factors—the unification of news creation, consumption, and distribution, and the misuse of AI technology—that facilitate fake news creation. https://doi.org/10.1371/journal.pone.0260080.g005]

3.1.1 The unification of news creation, consumption, and distribution.

The public's perception of news and the major media for news consumption have gradually changed. The public no longer passively consumes news exclusively through traditional news organizations with specific formats (e.g., the inverted pyramid style, verified sources), nor views such news simply as a medium for information acquisition. The public's active news consumption began in earnest with the advent of citizen journalism, in which citizens themselves engage in journalistic activity [51], and became commonplace with the emergence of social media. As a result, the public began to prefer interactive media, in which new information can be acquired, opinions can be offered, and the news can be discussed with other consumers. This environment has motivated the public to produce content about their beliefs and deliver it to many people as "news." For example, a video of a police crackdown posted on social media quickly spread around the world, influenced protesters and civic movements, and was only later reported by the mainstream media [52].

The boundaries between professional journalists and amateurs, as well as between news consumers and creators, are disappearing. This has led to a potential increase in deceptive communication, leaving news consumers suspicious and prone to misinterpreting reality. Online platforms (e.g., YouTube, Facebook) that allow users to freely produce and distribute content have grown significantly. As a result, fake news content can be used to attract secondary income (e.g., advertising fees from multinational enterprises), which accelerates fake news creation and propagation. An environment in which the public can consume only news that suits their preferences and personal cognitive biases has made it much easier for fake news creators to achieve their specific purposes (e.g., supporting a certain political party or a favored candidate).

3.1.2 The misuse of AI technology.

The development of AI technology has made it easier to develop and utilize tools for creating fake news, and many studies have confirmed the impact of these technologies— (1) social bots, (2) trolls, and (3) fake media —on social networks and democracy over the past decade.

3.1.2.1 Social bots. Shao et al. [53] analyzed the pattern of fake news spread and confirmed that social bots play a significant role in fake news propagation and that bot-based automated accounts are heavily involved in the initial stage of spreading fake news. In general, it is not easy for the public to determine whether such accounts are operated by people or bots. In addition, social bots are not illegal tools, and many companies legally purchase them as part of their marketing; thus, it is difficult to curb the use of social bots systematically.

3.1.2.2 Trolls . The term “trolls” refers to people who deliberately cause conflict or division by uploading inflammatory, provocative content or unrelated posts to online communities. They work with the aim of stimulating people’s feelings or beliefs and hindering mature discussions. For example, the Russian troll army has been active in social media to advance its political agenda and cause social turmoil in the US [ 54 ]. Zannettou et al. [ 55 ] confirmed how effectively the Russian troll army has been spreading fake news URLs on Twitter and its significant impact on making other Twitter users believe misleading information.

3.1.2.3 Fake media . It is now possible to manipulate or reproduce content in 2D or even 3D through AI technology. In particular, the advent of fake news using Deepfake technology (combining various images on an original video and generating a different video) has raised another major social concern that had not been imagined before. Due to the popularity of image or video sharing on social media, such media types have become the dominant form of news consumption, and the Deepfake technology itself is becoming more advanced and applied to images and videos in a variety of domains. We witnessed a video clip of former US President Barack Obama criticizing Donald Trump, which was manipulated by the US online media company BuzzFeed to highlight the influence and danger of Deepfake, causing substantial social confusion [ 56 ].

3.2 Internal factors: Fake news creation purposes

We identified three main purposes for fake news creation— (1) ideological purposes, (2) monetary purposes, and (3) fear/panic reduction .

3.2.1 Ideological purpose.

Fake news has been created and propagated for political purposes by individuals or groups seeking to benefit the parties or candidates they support or to undermine those on the opposing side. Fake news with this political purpose has been shown to negatively influence people and society. For instance, Russia created fake Facebook accounts that caused many political disputes and enhanced polarization, affecting the 2016 US Presidential Election [57]. As polarization has intensified, there has also been a trend in the US of "unfriending" people who have different political tendencies [58]. This has led the public to decide whether to trust news regardless of its factuality and has worsened in-group biases. During the Brexit campaign in the UK, many selective news articles were exposed on Facebook, and social bots and trolls were also confirmed to be involved in shaping public opinion [59, 60].

3.2.2 Monetary purpose.

Financial benefit is another strong motivation for many fake news creators [34, 61]. Fake news websites usually reach the public through social media and make profits through posted advertisements. The majority of fake websites focus on earning advertising revenue by spreading fake news that attracts readers' attention, rather than on political goals. For example, during the 2016 US Presidential Election, young Macedonians in their teens and twenties used content from some extreme right-leaning blogs in the US to mass-produce fake news, earning huge advertising revenues [62]. This is also why fake news creators use provocative titles, such as clickbait headlines, to induce clicks and attempt to produce as many fake news articles as possible.

3.2.3 Fear and panic reduction.

In general, when epidemics spread around the world, rumors and absurd, false medical tips spread rapidly on social media. When there is a lack of verified information, people feel great anxiety and fear and easily believe such tips, regardless of whether they are true [63, 64]. The term infodemic, which first appeared during the 2003 SARS pandemic, describes this phenomenon [65]. Regarding COVID-19, health authorities have announced that preventing the creation and propagation of fake news about the virus is as important as curbing the spread of COVID-19 itself [66, 67]. The spread of fake news in the absence of verified information has become more common around health-related social issues (e.g., infectious diseases), natural disasters, etc. For example, people with disorders affecting cognition (e.g., neurodegenerative disorders) tend to easily believe unverified medical news [68–70]. Robledo and Jankovic [68] confirmed that many fake or exaggerated medical articles mislead people with Parkinson's disease by giving false hope through unfounded claims. Another example: when wildfires broke out in Australia in 2019, a rumor that climate activists had set the fires to raise awareness of climate change quickly spread as fake news [71]. As a result, people became suspicious and tended to believe that the causes of climate change (e.g., global warming) might not be related to humans, despite scientific evidence and research data.

3.3 Fake news detection and prevention

The main purpose of fake news creation is to confuse or deceive people regardless of topic, social atmosphere, or timing. Because of this purpose, fake news tends to have similar frames and structural patterns. Many studies have attempted to mitigate the spread of fake news based on these identifiable patterns. In particular, research on developing computational models that detect fake information (text/images/videos) based on machine or deep learning techniques has been actively conducted, as summarized in Table 1. Other modeling studies address the credibility of weblogs [84, 85], communication quality [88], susceptibility level [90], and political stance [86, 87]. The table is intended to characterize the scope and direction of research on detecting fake information (e.g., the features employed in each model), not to present an exhaustive list.

[Table 1. Summary of fake information detection models. https://doi.org/10.1371/journal.pone.0260080.t001]

3.3.1 Fake text information detection.

Research has considered many text-based features, such as structural information (e.g., website URLs and headlines with all capital letters or exclamation marks) and linguistic information (e.g., grammar, spelling, and punctuation errors) about the news. The sentiments of news articles, the frequency of the words used, information about the users who left comments on the articles, and social network information among users (connected through commenting, replying, liking, or following) have also been used as key features for model development. These text-based models have been developed not only for fake news articles but also for other types of fake information, such as clickbait, fake reviews, spam, and spammers. Many of the models developed in this context perform a binary classification that distinguishes between fake and non-fake articles, with accuracy ranging from 86% to 93%. Mainstream news articles were used to build most models, and some studies used articles from social media, such as Twitter [15, 17]. Some studies developed fake news detection models by extracting features from images, as well as text, in news articles [16, 17, 75].
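As a minimal illustration of this kind of binary text classifier, the sketch below trains a TF-IDF plus logistic regression pipeline with scikit-learn. The tiny corpus, labels, and resulting accuracy are made up for illustration and do not reproduce any of the reviewed models, which typically use far larger datasets and richer feature sets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: each article is labeled 1 (fake) or 0 (real).
texts = [
    "BREAKING!!! Miracle cure hidden by doctors, click to learn more",
    "City council approves new budget for public transit next year",
    "SHOCKING: celebrity secretly controls world governments",
    "Researchers publish peer-reviewed study on flu vaccine efficacy",
]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

# TF-IDF word features stand in for the linguistic features discussed above.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("toy accuracy:", accuracy_score(y_test, model.predict(X_test)))
```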

3.3.2 Fake visual media detection.

The generative adversarial network (GAN) is an unsupervised learning method that estimates the probability distribution of the original data and allows an artificial neural network to produce samples from a similar distribution [109]. With the advancement of GANs, it has become possible to transform faces in images into those of others. However, photos of famous celebrities have been misused (e.g., distorted into pornographic videos), increasing concerns about the possible misuse of such technology [110] (e.g., creating rumors about a certain political candidate). To mitigate this, research has been conducted to develop detection models for fake images. Most studies developed binary classification models (fake image or not), and the accuracy of fake image detection models was high, ranging from 81% to 97%. However, challenges still exist. Unlike fake news detection models that employ fact-checking websites or mainstream news for data verification or ground truth, fake image detection models were developed using the same or slightly modified image datasets (e.g., CelebA [97], FFHQ [99]), calling for the collection and preparation of a large amount of highly diverse data.
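For the image side, the following minimal PyTorch sketch shows the general shape of a binary real-vs-fake image classifier of the kind reported above. The architecture, the assumed 3×64×64 input size, and the random stand-in tensors are illustrative assumptions and do not correspond to any specific model from the reviewed studies.

```python
import torch
import torch.nn as nn

class FakeImageDetector(nn.Module):
    """Toy binary classifier: outputs one logit per image (fake vs. real)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16 x 32 x 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32 x 16 x 16
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, 1))

    def forward(self, x):
        return self.classifier(self.features(x))

model = FakeImageDetector()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random stand-in tensors; in practice these would be
# batches from a labeled dataset of real and generated (e.g., GAN-produced) images.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8, 1)).float()

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("toy training loss:", loss.item())
```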

4 Fake news consumption

4.1 External factors: Fake news consumption circumstances

The implicit social contract between civil society and the media has gradually disintegrated in modern society, and accordingly, citizens’ trust in the media began to decline [ 111 ]. In addition, the growing number of digital media platforms has changed people’s news consumption environment. This change has increased the diversity of news content and the autonomy of information creation and sharing. At the same time, however, it blurred the line between traditional mainstream media news and fake news in the Internet environment, contributing to polarization.

Here, we identified three external factors that have forced the public to encounter fake news: (1) the decline of trust in the mainstream media, (2) a high-choice media environment, and (3) the use of social media as a news platform .

4.1.1 Fall of mainstream media trust.

Misinformation and unverified or biased reports have gradually undermined the credibility of the mainstream media. According to the 2019 American mass media trust survey conducted by Gallup, only 13% of Americans said they trusted traditional mainstream media: newspapers or TV news [ 112 ]. The decline in traditional media trust is not only a problem for the US, but also a common concern in Europe and Asia [ 113 – 115 ].

4.1.2 High-choice media environment.

Over the past decade, news consumption channels have diversified radically, and the mainstream has shifted from broadcasting and print media to mobile and social media environments. Despite this diversity of channels, personalized preferences and repetitive patterns have led people to be exposed to limited information and to consume it ever more heavily [116]. This selective news consumption attitude has increased the polarization of the public in many multi-media environments [117]. In addition, the commercialization of digital platforms has created an environment in which cognitive bias can easily be strengthened. In other words, a digital platform based on recommendation algorithms conveniently provides similar content continuously after a given type of content is consumed. As a result, it is easy for users to fall into an echo chamber because they access only recommended content. A survey of 1,000 YouTube videos found that more than two-thirds of the videos contained content in favor of a particular candidate [118].

News consumption in social media does not simply mean the delivery of messages from creators to consumers. The multi-directionality of social media has blurred the boundaries between information creators and consumers. In other words, users are already interacting with one another in various fashions, and when a new interaction type emerges and is supported by the platform, users will display other types of new interactions, which will also influence ways of consuming news information.

4.1.3 Use of social media as news platform.

Here we focus on the most widely used social media platforms—YouTube, Facebook, and Twitter—where each has characteristics of encouraging limited news consumption.

First, YouTube is the most unidirectional of social media. Many YouTube creators tend to convey arguments in a strong, definitive tone through their videos, and these content characteristics lead viewers to judge the objectivity of the information via non-verbal elements (e.g., speaker, thumbnail, title, comments) rather than facts. Furthermore, many comments support the content of the video, which may increase the chances of viewers accepting somewhat biased information. In addition, YouTube's video recommendation algorithm causes users who watch certain news to be continuously exposed to other news containing the same or similar information. This isolated pattern of content consumption could undermine viewers' media literacy and is likely to create a screening effect that blocks out other information.

Second, Facebook obscures the details of news articles because the platform ostensibly shows only the title, the number of likes, and the comments on a post. Often, users have to click on the article and go to the URL to read it. This structure presents obstacles that prevent users from checking the details of posts. As a result, users are likely to make limited and biased judgments and to perceive content through provocative headlines and comments.

Third, the most distinctive feature of Twitter is anonymity, because Twitter lets users create their own pseudonyms [119]. Tweets are limited in length, and compared with other platforms, users can produce and spread information indiscriminately and anonymously, without others knowing who is behind an account [120, 121]. By contrast, many accounts on Facebook operate under real names and generally share information with friends or followers. Anonymous information creators are not held accountable for the information they spread.

4.2 Internal factors: Cognitive mechanism

Due to the characteristics of the Internet and social media, people are accustomed to consuming information quickly, such as reading only news headlines and checking photos in news articles. This type of news consumption practice could lead people to consider news information mostly based on their beliefs or values. This practice can make it easier for people to fall into an echo chamber and further social confusion. We identified two internal factors affecting fake news consumption: (1) cognitive biases and (2) personal traits (see Fig 6 ).

[Fig 6. https://doi.org/10.1371/journal.pone.0260080.g006]

4.2.1 Cognitive biases.

Cognitive bias is an observer effect that is broadly recognized in cognitive science and includes basic statistical and memory errors [ 8 ]. However, this bias may vary depending on what factors are most important to affect individual judgments and choices. We identified five cognitive biases that affect fake news consumption: confirmation bias, in-group bias, choice-supportive bias, cognitive dissonance, and primacy effect.

Confirmation bias relates to a human tendency to seek out information in line with personal thoughts or beliefs, as well as to ignore information that goes against such beliefs. This stems from the human desire to be reaffirmed, rather than accept denials of one’s opinion or hypothesis. If the process of confirmation bias is repeated, a more solid belief is gradually formed, and the belief remains unchanged even after encountering logical and objective counterexamples. Evaluating information with an objective attitude is essential to properly investigating any social phenomenon. However, confirmation bias significantly hinders this. Kunda [ 122 ] discussed experiments that investigated the cognitive processes as a function of accuracy goals and directional goals. Her analysis demonstrated that people use different cognitive processes to achieve the two different goals. For those who pursue accuracy goals (reaching a “right conclusion”), information is used as a tool to determine whether they are right or not [ 123 ], and for those with directional goals (reaching a desirable conclusion), information is used as a tool to justify their claims. Thus, biased information processing is more frequently observed by people with directional goals [ 124 ].

People with directional goals have a desire to reach the conclusion they want. The more we emphasize the seriousness and omnipresence of fake news, the less people with directional goals can identify fake news. Moreover, their confirmation bias through social media could result in an echo chamber, triggering a differentiation of public opinion in the media. The algorithm of the media platform further strengthens the tendency of biased information consumption (e.g., filter bubble).

In-group bias is a phenomenon in which an individual favors a group that he or she belongs to. There are two causes of in-group bias [125]. One is the categorization process, which exaggerates the similarities between members within one category (the in-group) and the differences with others (the out-groups). Consequently, positive reactions toward the in-group and negative reactions (e.g., hostility) toward the out-group both increase. The other is self-esteem, grounded in social identity theory: to evaluate the in-group positively, a member tends to perceive that other group members are similar to himself or herself.

In-group bias has a significant impact on fake news consumption because of radical changes in the media environment [ 126 ]. The public recognizes and forms groups based on issues through social media. The emotions and intentions of such groups of people online can be easily transferred or developed into offline activities, such as demonstrations and rallies. Information exchanges within such internal groups proceeds similarly to the situation with confirmation bias. If confirmation bias is keeping to one’s beliefs, in-group bias equates the beliefs of my group with my beliefs.

Choice-supportive bias refers to an individual's tendency to justify his or her decision by highlighting evidence that he or she did not actually consider when making the decision [127]. For instance, people sometimes have no particular reason when they purchase a certain brand of product or service, or support a particular politician or political party, yet they emphasize that their choice was right and inevitable. They also tend to focus more on positive aspects than on negative effects or consequences in order to justify their choice. However, these positive aspects can be distorted because they are mainly based on memory. Thus, choice-supportive bias can be regarded as a cognitive error caused by memory distortion.

The behavioral condition of choice-supportive bias is used to justify oneself, which usually occurs in the context of external factors (e.g., maintaining social status or relationships) [ 7 ]. For example, if people express a certain political opinion within a social group, people may seek information with which to justify the opinion and minimize its flaws. In this procedure, people may accept fake news as a supporting source for their opinions.

Cognitive dissonance refers to the psychological tension that occurs when an individual holds two inconsistent cognitions [128]. Humans have a desire to identify and resolve this tension. Regarding fake news consumption, people easily accept fake news if it is aligned with their beliefs or faith. However, if news is seen as working against their beliefs or faith, people may label even real news as fake and consume biased information in order to avoid cognitive dissonance. This is quite similar to confirmation bias. Selective exposure to biased information intensifies its extent and impact on social media. In these circumstances, an individual's cognitive state is likely to be formed by information from unclear sources, which can be seen as a negative state of perception; in that case, information consumers selectively consume only information that can be in harmony with those negative perceptions.

Primacy effect means that information presented previously will have a stronger effect on the memory and decision-making than information presented later [ 129 ]. The “interference theory [ 130 ]” is often referred to as a theoretical basis for supporting the primacy effect, which highlights the fact that the impression formed by the information presented earlier influences subsequent judgments and the process of forming the next impression.

The significance of the primacy effect for fake news consumption is that it can be the starting point of a biased cognitive process. If an individual first encounters an issue through fake news and does not go through a critical thinking process about that information, he or she may form false attitudes toward the issue [131, 132]. Fake news is a complex combination of fact and fiction, making it difficult for information consumers to judge correctly whether the news is right or wrong. These cognitive biases induce the selective collection of information that feels more valid to news consumers, rather than information that is actually valid.

4.2.2 Personal traits.

We identified two aspects of personal characteristics or traits that can influence one's news consumption behavior: susceptibility and personality.

4.2.2.1 Susceptibility. The most prominent feature of social media is that consumers can also be creators, so the boundary between the creators and consumers of information becomes unclear. New media literacy (i.e., the ability to critically and suitably consume messages in a variety of digital media channels, such as social media) can have a significant impact on the degree to which fake news is consumed and disseminated [133, 134]. In other words, the higher an individual's new media literacy, the more likely he or she is to take a critical standpoint toward fake news. The susceptibility level for fake news is also related to one's selective news consumption behaviors. Bessi et al. [35] studied misinformation on Facebook and found that users who frequently interact with alternative media tend to interact with intentionally false claims more often.

Personality is an individual’s traits or behavior style. Many scholars have agreed that the personality can be largely divided into five categories (Big Five)—extraversion, agreeableness, neuroticism, openness, and conscientiousness [ 135 , 136 ]—and used them to understand the relationship between personality and news consumption.

Extroversion is related to active information use. Previous studies have confirmed that extroverts tend to use social media mainly to acquire information [137] and are better at determining the factuality of news on social media [138]. Furthermore, people with high agreeableness, which refers to how friendly, warm, and tactful a person is, tend to trust real news more than fake news [138]. Neuroticism is a broad personality trait dimension representing the degree to which a person experiences the world as distressing, threatening, and unsafe. People with high neuroticism usually show negative emotions or negative information-sharing behavior [139], and neuroticism is positively related to fake news consumption [138]. Openness refers to the degree to which one enjoys new experiences. High openness is associated with high curiosity and engagement in learning [140], which enhances critical thinking ability and decreases the negative effects of fake news consumption [138, 141]. Conscientiousness refers to a person's work ethic, orderliness, and thoroughness [142]. People with high conscientiousness tend to regard social media use as a distraction from their tasks [143–145].

4.3 Fake news awareness and prevention

4.3.1 Decision-making support tools.

News on social media does not go through a verification process because of the high degree of freedom to create, share, and access information. One study predicted that by 2022 citizens in advanced countries would encounter more fake information than real information [146]. This indicates that the potential personal and social damage from fake news may increase. Paradoxically, many countries that suffer from fake news problems strongly guarantee freedom of expression under their constitutions; thus, it would be very difficult to block all production and distribution of fake news through laws and regulations. In this respect, it is necessary to put in place not only technical efforts to detect and prevent the production and dissemination of fake news but also social efforts to make news consumers aware of the characteristics of online fake information.

Inoculation theory highlights that human attitudes and beliefs can form psychological resistance by being exposed to counter-arguments in advance. To build the ability to strongly resist an argument, it is necessary first to be exposed to, and refute, a weakened version of the same argument. Doris-Down et al. [147] asked people from different political backgrounds to communicate directly through mobile apps and investigated whether this alleviated their echo-chamber tendencies. As a result, the participants made changes, such as realizing that they had a lot in common with people who had conflicting political backgrounds and that what they thought was different was actually trivial. Karduni et al. [148] provided comprehensive information (e.g., connections among news accounts and a summary of location entities) to study participants through a visual analytic system they developed and examined how the participants accepted fake news. Another study examined how people determine the veracity of news by building a system similar to social media and analyzing participants' eye movements while they read fake news articles [28].

Some research has applied the inoculation theory to gamification. A “Bad News” game was designed to proactively warn people and expose them to a certain amount of false information through interactions with the gamified system [ 29 , 149 ]. The results confirmed the high effectiveness of inoculation through the game and highlighted the need to educate people about how to respond appropriately to misinformation through computer systems and games [ 29 ].

4.3.2 Fake information propagation analysis.

Fake information tends to show a certain pattern in terms of consumption and propagation, and many studies have attempted to identify the propagation patterns of fake information (e.g., the count of unique users, the depth of a network) [ 150 – 153 ].

4.3.2.1 Psychological characteristics . The theoretical foundation of research intended to examine the diffusion patterns of fake news lies in psychology [ 154 , 155 ] because psychological theories explain why and how people react to fake news. For instance, a news consumer who comes across fake news will first have doubts, judge the news against his background knowledge, and want to clarify the sources in the news. This series of processes ends when sufficient evidence is collected. Then the news consumer ends in accepting, ignoring, or suspecting the news. The psychological elements that can be defined in this process are doubts, negatives, conjectures, and skepticism [ 156 ].

4.3.2.2 Temporal characteristics. Fake news exhibits different propagation patterns from real news. The propagation of real news tends to decrease slowly over time after a single peak in the public's interest, whereas fake news has no fixed timing for peak consumption, and multiple peaks appear in many cases [157]. Tambuscio et al. [151] showed that the pattern of rumor spread resembles existing epidemic models [158]. Their empirical observations confirmed that the same fake news reappears periodically and infects news consumers. For example, rumors containing the malicious political message that "Obama is a Muslim" were still being spread a decade later [159]. This pattern of proliferation and consumption shows that fake news may be consumed for a certain purpose.
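To make the epidemic analogy concrete, here is a minimal, assumption-laden simulation in the spirit of SIR-style spreading models (unaware, spreading, stopped). The parameters, random contact structure, and the absence of fact-checking dynamics are simplifications for illustration and do not reproduce the specific models cited above.

```python
import random

def simulate_rumor(n_users=1000, contacts_per_step=5, p_spread=0.05,
                   p_forget=0.02, steps=50, seed=1):
    """Simplified SIR-style rumor spreading on a randomly mixed population."""
    random.seed(seed)
    state = ["S"] * n_users      # S: unaware, I: spreading the rumor, R: stopped spreading
    state[0] = "I"               # a single initial spreader
    history = []
    for _ in range(steps):
        new_state = state[:]
        for u in range(n_users):
            if state[u] == "I":
                # each spreader contacts a few random users per step
                for v in random.sample(range(n_users), contacts_per_step):
                    if state[v] == "S" and random.random() < p_spread:
                        new_state[v] = "I"
                if random.random() < p_forget:
                    new_state[u] = "R"
        state = new_state
        history.append(state.count("I"))
    return history

# Number of active spreaders over the first 10 steps of one toy run.
print(simulate_rumor()[:10])
```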

5 A mental-model approach

We have examined news consumers’ susceptibility to fake news due to internal and external factors, including personal traits, cognitive biases, and the contexts. Beyond an investigation on the factor level, we seek to understand people’s susceptibility to misinformation by considering people’s internal representations and external environments holistically [ 5 ]. Specifically, we propose to comprehend people’s mental models of fake news. In this section, we first briefly introduce mental models and discuss their connection to misinformation. Then, we discuss the potential contribution of using a mental-model approach to the field of misinformation.

5.1 Mental models

A mental model is an internal representation or simulation that people carry in their minds of how the world works [160, 161]. Typically, mental models are constructed in people's working memory, where information from long-term memory and from the environment is combined [162]. Individuals represent complex phenomena with some abstraction based on their own experiences and understanding of the context. People rely on mental models to understand and predict their interactions with environments, artifacts, and computing systems, as well as with other individuals [163, 164]. Generally, individuals' ability to represent continually changing environments is limited and unique. Thus, mental models tend to be functional and dynamic but not necessarily accurate or complete [163, 165]. Mental models also differ between various groups, in particular between experts and novices [164, 166].

5.2 Mental models and misinformation

Mental models have been proposed to understand human behaviors in spatial navigation [167], learning [168, 169], deductive reasoning [170], mental representations of real or imagined situations [171], risk communication [172], and usable cybersecurity and privacy [166, 173, 174]. People use mental models to facilitate their comprehension, judgment, and actions, and mental models can form the basis of individual behaviors. In particular, the connection between a mental-model approach and misinformation has been revealed in risk communication regarding vaccines [175, 176]. For example, Downs et al. [176] interviewed 30 parents from three US cities to understand their mental models about vaccination for their children aged 18 to 23 months. The results revealed two mental models about vaccination: (1) health oriented: parents who focused on health-oriented topics trusted anecdotal communication more than statistical arguments; and (2) risk oriented: parents with some knowledge about vaccine mechanisms trusted communication with statistical arguments more than anecdotal information. The authors also found that many parents, even those favorable to vaccination, can be confused by the ongoing debate, suggesting some incompleteness in their mental models.

5.3 Potential contributions of a mental-model approach

Recognizing and dealing with the plurality of news consumers' perceptions, cognition, and actions is currently considered a key aspect of misinformation research. Thus, a mental-model approach could significantly improve our understanding of people's susceptibility to misinformation, as well as inform the development of mechanisms to mitigate misinformation.

One possible direction is to investigate demographic differences in the context of mental models. As more Americans have adopted social media, social media users have become more representative of the population. Usage by older adults has increased in recent years, from a use rate of about 12% in 2012 to about 35% in 2016 ( https://www.pewresearch.org/internet/fact-sheet/social-media/ ). Guess et al. (2019) analyzed participants' profiles and their sharing activity on Facebook during the 2016 US Presidential campaign, and a strong age effect was revealed. After controlling for the effects of ideology and education, their results showed that Facebook users over 65 years old shared nearly seven times as many articles from fake news domains as those aged 18–29, and about 2.3 times as many as those aged 45–65.

Besides older adults, college students have been shown to be more susceptible to misinformation [177]. We can identify which mental models a particular age group ascribes to and compare the incompleteness or incorrectness of mental models by age. On the other hand, such comparisons might inform the design of general mechanisms to mitigate misinformation independent of the different concrete mental models possessed by different types of users.

Users’ actions and decisions are directed by their mental models. We can also explore news consumers’ mental models and discover unanticipated and potentially risky human system interactions, which will inform the development and design of user interactions and education endeavors to mitigate misinformation.

A mental-model approach supplies an important, and as yet unconsidered, dimension to fake news research. To date, research on people's susceptibility to fake news in social media has lagged behind research on the computational aspects of fake news. Scholars have not considered issues of news consumers' susceptibility across the spectrum of their internal representations and external environments. An investigation from the mental-model perspective is a step toward addressing this need.

6 Discussion and future work

In this section, we highlight the importance of balancing research efforts on fake news creation and consumption and discuss potential future directions of fake news research.

6.1 Leveraging insights of social science to model development

Fake news detection models have achieved strong performance. The feature groups used in these models are diverse, including linguistic, visual, sentiment, topic, user, and network features, and many models combine multiple groups to improve performance. Using datasets of different sizes and characteristics, researchers have demonstrated the effectiveness of these models through comparative analyses. However, much of this research relies on features that are easily quantifiable, many of which lack a clear justification or rationale for their use in modeling. For example, what is the relationship between the use of question marks (?), exclamation marks (!), or quotation marks (“…”) and fake news? What does it mean that a longer description relates to news trustworthiness? There are also many important aspects that could serve as additional features but have not yet been quantified. For example, journalistic style is an important characteristic that determines the credibility of information [156], but it is challenging to quantify accurately and reliably. There are many intentions (e.g., ideological standpoint, financial gain, panic creation) that authors may implicitly or explicitly display in a post, but measuring them is not straightforward. Social science research can play a role here by providing valid methodologies to measure such subjective perceptions or notions, taking into account how their types and characteristics vary with context and environment. Some efforts in this direction include quantifying salient factors of people's decision-making identified in social science research and demonstrating their effectiveness in improving model performance and interpreting model results [70]. Yet more research that applies socio-technical aspects to model development and application is needed to better study the complex characteristics of fake news.
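To make the discussion of easily quantifiable features concrete, the minimal sketch below (in Python with scikit-learn, our own illustrative choice rather than the pipeline of any model reviewed here) shows how surface cues such as question marks, exclamation marks, quotation marks, and text length can be combined with bag-of-words features in a simple classifier. The toy texts and labels are invented for demonstration; a real study would train on a fact-checked corpus and justify each feature.

```python
# Minimal sketch: combining surface punctuation features with bag-of-words
# features for fake news classification. Illustrative only; feature choices
# and the toy data are assumptions, not a reviewed model's actual pipeline.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def surface_features(texts):
    # Count a few surface cues per document: question marks, exclamation
    # marks, straight/curly quotation marks, and word count.
    rows = []
    for t in texts:
        rows.append([
            t.count("?"),
            t.count("!"),
            t.count('"') + t.count("\u201c") + t.count("\u201d"),
            len(t.split()),
        ])
    return csr_matrix(np.array(rows, dtype=float))

# Toy examples (1 = fake, 0 = real); a real study would use a fact-checked corpus.
texts = [
    "SHOCKING!!! You won't believe what they found?!",
    "The city council approved the annual budget on Tuesday.",
    'Scientists "admit" the cure was hidden all along!!!',
    "Researchers published a peer-reviewed study on local election results.",
]
labels = [1, 0, 1, 0]

tfidf = TfidfVectorizer()
X = hstack([tfidf.fit_transform(texts), surface_features(texts)])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))  # sanity check on the training examples
```

The point of the sketch is not the classifier itself but the question raised above: each added feature (e.g., exclamation-mark counts) should come with a justified, ideally theory-backed, link to news trustworthiness.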

6.1.1 Future direction.

Insights from social science may help develop transparent and applicable fake news detection models. Such socio-technical models may give news consumers a better understanding of fake news detection results and their application, as well as enable more appropriate actions to control the fake news phenomenon.

6.2 Lack of research on fake news consumption

Regarding fake news consumption, we confirmed that only a few studies involve the development of web- or mobile-based systems to help consumers become aware of the possible dangers of fake news. Those studies [28, 29, 147, 148] attempted to demonstrate the feasibility of the developed self-awareness systems through user studies. However, due to the limited number of participants (min: 11, max: 60) and their lack of demographic diversity (e.g., only college students from one school, or the psychology research pool at the authors' institution), the generalizability and applicability of these systems remain questionable. On the other hand, research that develops fake news detection models or uses network analysis to identify patterns of fake news propagation has been relatively active. These results can be used to identify people (or entities) who intentionally create malicious fake content; however, it is still challenging to restrict people who initially showed no indication of sharing or creating fake information but who later manipulate real news into fake news or disseminate fake news out of malicious intent or cognitive bias.

In other words, although fake news detection models have shown promising performance, their influence may be limited in practice. Detection models rely heavily on data labeled as fake by fact-checking institutions or sites. If someone manipulates news that has not been covered by fact-checking, the format or characteristics of the manipulated news may differ from the conventional features identified and managed in the detection model, and such differences may not be captured. Therefore, to prevent the fake news phenomenon more effectively, research needs to consider changes in news consumption.

6.2.1 Future direction.

It may be desirable to help people recognize that their news consumption behaviors (e.g., liking, commenting, sharing) can have a significant ripple effect. Developing a system that tracks people's news consumption and creation activities, measures similarities and differences between those activities, and presents the resulting behaviors or patterns back to users would be helpful.
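As a rough illustration of what such a system might compute, the sketch below compares a user's consumption activity (articles liked or shared) with their creation activity (articles posted), using a simple Jaccard similarity over news domains. The activity lists, URLs, and the domain-level comparison are hypothetical choices made for illustration; a deployed system would need richer activity logs and content-level comparison.

```python
# Minimal sketch: comparing a user's news consumption and creation activity.
# Hypothetical data; domain-level Jaccard similarity is only one of many
# possible comparison measures.
from urllib.parse import urlparse

def domains(urls):
    # Reduce a list of article URLs to the set of news domains they come from.
    return {urlparse(u).netloc.lower() for u in urls}

def jaccard(a, b):
    # Jaccard similarity between two sets: 0 = no overlap, 1 = identical.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical activity log for one user.
consumed = [  # articles the user liked or shared
    "https://example-news.com/story-one",
    "https://rumor-site.net/claim",
]
created = [   # articles the user posted or reposted with commentary
    "https://rumor-site.net/another-claim",
]

overlap = jaccard(domains(consumed), domains(created))
print(f"Consumption/creation domain overlap: {overlap:.2f}")
```

Surfacing such a score back to the user is one concrete way to make the "ripple effect" of their own sharing behavior visible.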

6.3 Limited coverage of fact-checking websites and regulatory approach

Some well-known fact-checking websites (e.g., snopes.com, politifact.com) cover news shared mostly on the Internet and label the authenticity or deficiencies of the content (e.g., miscaptioned, legend, misattributed). However, these websites have limited reach in that they are consulted only by those who are already willing to check the veracity of certain news articles. Social media platforms have been making continuous efforts to mitigate the spread of fake news. For example, Facebook reduces the news feed exposure of content that fact-checkers have rated as false and displays warning indicators [178]. Instagram has also changed the way warning labels are displayed when users attempt to view content that has been rated as false [179]. However, this type of interface could lead news consumers to rely on algorithmic decision-making rather than their own judgment, because these ostensible regulations (e.g., warning labels) tend to lack transparency about how the decision was made. As explained previously, this is related to filter bubbles. Therefore, it is important to provide a clearer and more transparent communicative interface that lets news consumers access and understand the information underlying the algorithmic results.

6.3.1 Future direction.

It is necessary to create a news consumption environment that provides wider coverage of fake news and more transparent information about algorithmic decisions on news credibility. This would help news consumers preemptively avoid consuming fake news and contribute more to preventing its propagation. Consumers could also make more appropriate and accurate decisions based on their understanding of the news.

6.4 New media literacy

With the diversification of news channels, we can consume news easily. However, we are also in a media environment that asks us to verify news content ourselves (e.g., whether the news title reads like clickbait, whether the title and content are related), which in reality is difficult to do. Moreover, on social media, news consumers can also be news creators or reproducers. During this process, news information can be changed according to a consumer's beliefs or interests. The problem is that people may not know how to verify news content or may not be aware that the information could be distorted or biased. As the news consumption environment changes rapidly amid the modern media deluge, media literacy education becomes increasingly important. Media literacy refers narrowly to the ability to decipher media content and, in a broader sense, to the ability to understand the principles of media operation, to interpret media content sensibly and critically, and in turn to utilize and creatively reproduce content. Being a “lazy thinker” makes one more susceptible to fake news than having a “partisan bias” [32]. As “screen time” (i.e., time spent looking at smartphone, computer, or television screens) has become more common, people increasingly consume only stimulating information (e.g., sensual pleasure and excitement) [180]. This could gradually erode the capacity for critical, reasoned thinking, leading to poor judgments and actions. In France, as the fake news problem became more serious, substantial efforts were made to establish a “European Media Literacy Week” in schools [181]. The US is also making legislative efforts to add media literacy to the general education curriculum [182]. However, the acquisition of new media literacy through education may be limited to people in school (e.g., young students) and would be challenging to expand to wider populations. Thus, there is also a need for supplementary tools and research efforts that support more people in critically interpreting and appropriately consuming news.

In addition, more critical social attention is needed because visual content (e.g., images, videos), which has traditionally been accepted as factual, can be maliciously manipulated while still looking natural. People increasingly prefer to watch YouTube videos for news consumption rather than read news articles. Compared with text-based information, visual content makes it relatively easy for news consumers to trust what they see, and information can be obtained simply by playing the video. Since visual content will become an even more dominant medium in future news consumption, educating and inoculating news consumers against the potential threats of fake information in such media is important. More attention and research are needed on technology that supports awareness of fake visual content.

6.4.1 Future direction.

Research in both computer science and social science should find ways (e.g., developing a game-based education system or curriculum) to help news consumers become aware of their news consumption practices and maintain sound news consumption behaviors.

7 Conclusion

We presented a comprehensive summary of fake news research through the lenses of news creation and consumption. The trend analysis indicated growing interest in fake news research, with far more attention given to news creation than to news consumption. By looking into internal and external factors, we unpacked the characteristics of fake news creation and consumption and presented the use of people's mental models to better understand their susceptibility to misinformation. Based on the review, we suggested four future directions for fake news research—(1) socio-technical model development using insights from social science, (2) in-depth understanding of news consumption behaviors, (3) preemptive decision-making and action support, and (4) educational, new media literacy support—as ways to reduce the gap between news creation and consumption and between computer science and social science research, and to support healthy news environments.

Supporting information

S1 Checklist.

https://doi.org/10.1371/journal.pone.0260080.s001

  • 2. Goldman R. Reading fake news, Pakistani minister directs nuclear threat at Israel. The New York Times . 2016;24.
  • 6. Lévy P, Bononno R. Collective intelligence: Mankind’s emerging world in cyberspace. Perseus Books; 1997.
  • 11. Jamieson KH, Cappella JN. Echo chamber: Rush Limbaugh and the conservative media establishment. Oxford University Press; 2008.
  • 14. Shu K, Cui L, Wang S, Lee D, Liu H. defend: Explainable fake news detection. In: In Proc. of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD); 2019. p. 395–405.
  • 15. Ruchansky N, Seo S, Liu Y. Csi: A hybrid deep model for fake news detection. In: In Proc. of the 2017 ACM on Conference on Information and Knowledge Management (CIKM); 2017. p. 797–806.
  • 16. Cui L, Wang S, Lee D. Same: sentiment-aware multi-modal embedding for detecting fake news. In: In Proc. of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM); 2019. p. 41–48.
  • 17. Wang Y, Ma F, Jin Z, Yuan Y, Xun G, Jha K, et al. Eann: Event adversarial neural networks for multi-modal fake news detection. In: In Proc. of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data mining (KDD); 2018. p. 849–857.
  • 18. Nørregaard J, Horne BD, Adalı S. Nela-gt-2018: A large multi-labelled news for the study of misinformation in news articles. In: In Proc. of the International AAAI Conference on Web and Social Media (ICWSM). vol. 13; 2019. p. 630–638.
  • 20. Nguyen AT, Kharosekar A, Krishnan S, Krishnan S, Tate E, Wallace BC, et al. Believe it or not: Designing a human-ai partnership for mixed-initiative fact-checking. In: In Proc. of the 31st Annual ACM Symposium on User Interface Software and Technology (UIST); 2018. p. 189–199.
  • 23. Brandon J. Terrifying high-tech porn: creepy 'deepfake' videos are on the rise. Fox News. 2018;20.
  • 24. Nguyen TT, Nguyen CM, Nguyen DT, Nguyen DT, Nahavandi S. Deep Learning for Deepfakes Creation and Detection. arXiv . 2019;1.
  • 25. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M. Faceforensics++: Learning to detect manipulated facial images. In: IEEE International Conference on Computer Vision (ICCV); 2019. p. 1–11.
  • 26. Nirkin Y, Keller Y, Hassner T. Fsgan: Subject agnostic face swapping and reenactment. In: In Proc. of the IEEE International Conference on Computer Vision (ICCV); 2019. p. 7184–7193.
  • 28. Simko J, Hanakova M, Racsko P, Tomlein M, Moro R, Bielikova M. Fake news reading on social media: an eye-tracking study. In: In Proc. of the 30th ACM Conference on Hypertext and Social Media (HT); 2019. p. 221–230.
  • 35. Horne B, Adali S. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In: In Proc. of the 11th International AAAI Conference on Web and Social Media (ICWSM); 2017. p. 759–766.
  • 36. Golbeck J, Mauriello M, Auxier B, Bhanushali KH, Bonk C, Bouzaghrane MA, et al. Fake news vs satire: A dataset and analysis. In: In Proc. of the 10th ACM Conference on Web Science (WebSci); 2018. p. 17–21.
  • 37. Mustafaraj E, Metaxas PT. The fake news spreading plague: was it preventable? In: In Proc. of the 9th ACM Conference on Web Science (WebSci); 2017. p. 235–239.
  • 40. Jin Z, Cao J, Zhang Y, Luo J. News verification by exploiting conflicting social viewpoints in microblogs. In: In Proc. of the 13th AAAI Conference on Artificial Intelligence (AAAI); 2016. p. 2972–2978.
  • 41. Rubin VL, Conroy N, Chen Y, Cornwell S. Fake news or truth? using satirical cues to detect potentially misleading news. In: In Proc. of the Second Workshop on Computational Approaches to Deception Detection ; 2016. p. 7–17.
  • 45. Kahneman D, Tversky A. Prospect theory: An analysis of decision under risk. In: Handbook of the fundamentals of financial decision making: Part I. World Scientific; 2013. p. 99–127.
  • 46. Hanitzsch T, Wahl-Jorgensen K. Journalism studies: Developments, challenges, and future directions. The Handbook of Journalism Studies . 2020; p. 3–20.
  • 48. Osatuyi B, Hughes J. A tale of two internet news platforms-real vs. fake: An elaboration likelihood model perspective. In: In Proc. of the 51st Hawaii International Conference on System Sciences (HICSS); 2018. p. 3986–3994.
  • 49. Cacioppo JT, Petty RE. The elaboration likelihood model of persuasion. ACR North American Advances. 1984; p. 673–675.
  • 50. Wang LX, Ramachandran A, Chaintreau A. Measuring click and share dynamics on social media: a reproducible and validated approach. In Proc of the 10th International AAAI Conference on Web and Social Media (ICWSM). 2016; p. 108–113.
  • 51. Bowman S, Willis C. How audiences are shaping the future of news and information. We Media . 2003; p. 1–66.
  • 52. Hill E, Tiefenthäler A, Triebert C, Jordan D, Willis H, Stein R. 8 Minutes and 46 Seconds: How George Floyd Was Killed in Police Custody; 2020. Available from: https://www.nytimes.com/2020/06/18/us/george-floyd-timing.html .
  • 54. Carroll O. St Petersburg ‘troll farm’ had 90 dedicated staff working to influence US election campaign; 2017.
  • 55. Zannettou S, Caulfield T, Setzer W, Sirivianos M, Stringhini G, Blackburn J. Who let the trolls out? towards understanding state-sponsored trolls. In: Proc. of the 10th ACM Conference on Web Science (WebSci); 2019. p. 353–362.
  • 56. Vincent J. Watch Jordan Peele use AI to make Barack Obama deliver a PSA about fake news. The Verge . 2018;17.
  • 58. Linder M. Block. Mute. Unfriend. Tensions rise on Facebook after election results. Chicago Tribune . 2016;9.
  • 60. Howard PN, Kollanyi B. Bots, #StrongerIn, and #Brexit: computational propaganda during the UK-EU referendum. arXiv . 2016; p. arXiv–1606.
  • 61. Kasra M, Shen C, O’Brien JF. Seeing is believing: how people fail to identify fake images on the Web. In Proc of the 2018 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI). 2018; p. 1–6.
  • 62. Kirby EJ. The city getting rich from fake news. BBC News . 2016;5.
  • 63. Hu Z, Yang Z, Li Q, Zhang A, Huang Y. Infodemiological study on COVID-19 epidemic and COVID-19 infodemic. Preprints . 2020; p. 2020020380.
  • 71. Knaus C. Disinformation and lies are spreading faster than Australia’s bushfires. The Guardian . 2020;11.
  • 72. Karimi H, Roy P, Saba-Sadiya S, Tang J. Multi-source multi-class fake news detection. In: In Proc. of the 27th International Conference on Computational Linguistics ; 2018. p. 1546–1557.
  • 73. Wang WY. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. arXiv . 2017; p. arXiv–1705.
  • 74. Pérez-Rosas V, Kleinberg B, Lefevre A, Mihalcea R. Automatic Detection of Fake News. arXiv . 2017; p. arXiv–1708.
  • 75. Yang Y, Zheng L, Zhang J, Cui Q, Li Z, Yu PS. TI-CNN: Convolutional Neural Networks for Fake News Detection. arXiv . 2018; p. arXiv–1806.
  • 76. Kumar V, Khattar D, Gairola S, Kumar Lal Y, Varma V. Identifying clickbait: A multi-strategy approach using neural networks. In: In Proc. of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR); 2018. p. 1225–1228.
  • 77. Yoon S, Park K, Shin J, Lim H, Won S, Cha M, et al. Detecting incongruity between news headline and body text via a deep hierarchical encoder. In: Proc. of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. p. 791–800.
  • 78. Lu Y, Zhang L, Xiao Y, Li Y. Simultaneously detecting fake reviews and review spammers using factor graph model. In: In Proc. of the 5th Annual ACM Web Science Conference (WebSci); 2013. p. 225–233.
  • 79. Mukherjee A, Venkataraman V, Liu B, Glance N. What yelp fake review filter might be doing? In: In Proc. of The International AAAI Conference on Weblogs and Social Media (ICWSM); 2013. p. 409–418.
  • 80. Benevenuto F, Magno G, Rodrigues T, Almeida V. Detecting spammers on twitter. In: In Proc. of the 8th Annual Collaboration , Electronic messaging , Anti-Abuse and Spam Conference (CEAS). vol. 6; 2010. p. 12.
  • 81. Lee K, Caverlee J, Webb S. Uncovering social spammers: social honeypots+ machine learning. In: In Proc. of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR); 2010. p. 435–442.
  • 82. Li FH, Huang M, Yang Y, Zhu X. Learning to identify review spam. In: In Proc. of the 22nd International Joint Conference on Artificial Intelligence (IJCAI); 2011. p. 2488–2493.
  • 83. Wang J, Wen R, Wu C, Huang Y, Xion J. Fdgars: Fraudster detection via graph convolutional networks in online app review system. In: In Proc. of The 2019 World Wide Web Conference (WWW); 2019. p. 310–316.
  • 84. Castillo C, Mendoza M, Poblete B. Information credibility on twitter. In: In Proc. of the 20th International Conference on World Wide Web (WWW); 2011. p. 675–684.
  • 85. Jo Y, Kim M, Han K. How Do Humans Assess the Credibility on Web Blogs: Qualifying and Verifying Human Factors with Machine Learning. In: In Proc. of the 2019 CHI Conference on Human Factors in Computing Systems (CHI); 2019. p. 1–12.
  • 86. Che X, Metaxa-Kakavouli D, Hancock JT. Fake News in the News: An Analysis of Partisan Coverage of the Fake News Phenomenon. In: In Proc. of the 21st ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW); 2018. p. 289–292.
  • 87. Potthast M, Kiesel J, Reinartz K, Bevendorff J, Stein B. A Stylometric Inquiry into Hyperpartisan and Fake News. arXiv . 2017; p. arXiv–1702.
  • 89. Popat K, Mukherjee S, Strötgen J, Weikum G. Credibility assessment of textual claims on the web. In: In Proc. of the 25th ACM International on Conference on Information and Knowledge Management (CIKM); 2016. p. 2173–2178.
  • 90. Shen TJ, Cowell R, Gupta A, Le T, Yadav A, Lee D. How gullible are you? Predicting susceptibility to fake news. In: In Proc. of the 10th ACM Conference on Web Science (WebSci); 2019. p. 287–288.
  • 91. Gupta A, Lamba H, Kumaraguru P, Joshi A. Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In: In Proc. of the 22nd International Conference on World Wide Web ; 2013. p. 729–736.
  • 92. He P, Li H, Wang H. Detection of fake images via the ensemble of deep representations from multi color spaces. In: In Proc. of the 26th IEEE International Conference on Image Processing (ICIP). IEEE; 2019. p. 2299–2303.
  • 93. Sun Y, Chen Y, Wang X, Tang X. Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems . 2014; p. 1–9.
  • 94. Huh M, Liu A, Owens A, Efros AA. Fighting fake news: Image splice detection via learned self-consistency. In: In Proc. of the European Conference on Computer Vision (ECCV); 2018. p. 101–117.
  • 95. Dang H, Liu F, Stehouwer J, Liu X, Jain AK. On the detection of digital face manipulation. In: In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 5781–5790.
  • 96. Tariq S, Lee S, Kim H, Shin Y, Woo SS. Detecting both machine and human created fake face images in the wild. In Proc of the 2nd International Workshop on Multimedia Privacy and Security (MPS). 2018; p. 81–87.
  • 97. Liu Z, Luo P, Wang X, Tang X. Deep learning face attributes in the wild. In: In Proc. of the IEEE International Conference on Computer Vision (ICCV); 2015. p. 3730–3738.
  • 98. Wang R, Ma L, Juefei-Xu F, Xie X, Wang J, Liu Y. Fakespotter: A simple baseline for spotting ai-synthesized fake faces. arXiv . 2019; p. arXiv–1909.
  • 99. Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In: In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2019. p. 4401–4410.
  • 100. Yang X, Li Y, Qi H, Lyu S. Exposing GAN-synthesized faces using landmark locations. In Proc of the ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec). 2019; p. 113–118.
  • 101. Zhang X, Karaman S, Chang SF. Detecting and simulating artifacts in gan fake images. In Proc of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS). 2019; p. 1–6.
  • 102. Amerini I, Galteri L, Caldelli R, Del Bimbo A. Deepfake video detection through optical flow based cnn. In Proc of the IEEE International Conference on Computer Vision Workshops (ICCV). 2019; p. 1205–1207.
  • 103. Li Y, Lyu S. Exposing deepfake videos by detecting face warping artifacts. arXiv . 2018; p. 46–52.
  • 104. Korshunov P, Marcel S. Deepfakes: a new threat to face recognition? assessment and detection. arXiv . 2018; p. arXiv–1812.
  • 105. Jeon H, Bang Y, Woo SS. Faketalkerdetect: Effective and practical realistic neural talking head detection with a highly unbalanced dataset. In Proc of the IEEE International Conference on Computer Vision Workshops (ICCV). 2019; p. 1285–1287.
  • 106. Chung JS, Nagrani A, Zisserman A. Voxceleb2: Deep speaker recognition. arXiv . 2018; p. arXiv–1806.
  • 107. Songsri-in K, Zafeiriou S. Complement face forensic detection and localization with facial landmarks. arXiv. 2019; p. arXiv–1910.
  • 108. Ma S, Cui L, Dai D, Wei F, Sun X. Livebot: Generating live video comments based on visual and textual contexts. In Proc of the AAAI Conference on Artificial Intelligence (AAAI). 2019; p. 6810–6817.
  • 109. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Advances in Neural Information Processing Systems . 2014; p. arXiv–1406.
  • 110. Metz R. The number of deepfake videos online is spiking. Most are porn; 2019. Available from: https://cnn.it/3xPJRT2 .
  • 111. Strömbäck J. In search of a standard: Four models of democracy and their normative implications for journalism. Journalism Studies . 2005; p. 331–345.
  • 112. Brenan M. Americans’ Trust in Mass Media Edges Down to 41%; 2019. Available from: https://bit.ly/3ejl6ql .
  • 114. Ladd JM. Why Americans hate the news media and how it matters. Princeton University Press; 2012.
  • 116. Weisberg J. Bubble trouble: Is web personalization turning us into solipsistic twits; 2011. Available from: https://bit.ly/3xOGFqD .
  • 117. Pariser E. The filter bubble: How the new personalized web is changing what we read and how we think. Penguin; 2011.
  • 118. Lewis P, McCormick E. How an ex-YouTube insider investigated its secret algorithm. The Guardian . 2018;2.
  • 120. Kavanaugh AL, Yang S, Li LT, Sheetz SD, Fox EA, et al. Microblogging in crisis situations: Mass protests in Iran, Tunisia, Egypt; 2011.
  • 121. Mustafaraj E, Metaxas PT, Finn S, Monroy-Hernández A. Hiding in Plain Sight: A Tale of Trust and Mistrust inside a Community of Citizen Reporters. In Proc of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM) . 2012; p. 250–257.
  • 125. Tajfel H. Human groups and social categories: Studies in social psychology. CUP Archive; 1981.
  • 127. Correia V, Festinger L. Biased argumentation and critical thinking. Rhetoric and Cognition: Theoretical Perspectives and Persuasive Strategies . 2014; p. 89–110.
  • 128. Festinger L. A theory of cognitive dissonance. Stanford University Press; 1957.
  • 136. John OP, Srivastava S, et al. The Big Five trait taxonomy: History, measurement, and theoretical perspectives. Handbook of Personality: theory and research . 1999; p. 102–138.
  • 138. Shu K, Wang S, Liu H. Understanding user profiles on social media for fake news detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE; 2018. p. 430–435.
  • 142. Costa PT, McCrae RR. The NEO personality inventory. Psychological Assessment Resources; 1985.
  • 146. Panetta K. Gartner top strategic predictions for 2018 and beyond; 2017. Available from: https://gtnr.it/33kuljQ .
  • 147. Doris-Down A, Versee H, Gilbert E. Political blend: an application designed to bring people together based on political differences. In Proc of the 6th International Conference on Communities and Technologies (C&T). 2013; p. 120–130.
  • 148. Karduni A, Wesslen R, Santhanam S, Cho I, Volkova S, Arendt D, et al. Can You Verifi This? Studying Uncertainty and Decision-Making About Misinformation Using Visual Analytics. In Proc of the 12th International AAAI Conference on Web and Social Media (ICWSM). 2018;12(1).
  • 149. Basol M, Roozenbeek J, van der Linden S. Good news about bad news: gamified inoculation boosts confidence and cognitive immunity against fake news. Journal of Cognition . 2020;3(1).
  • 151. Tambuscio M, Ruffo G, Flammini A, Menczer F. Fact-checking effect on viral hoaxes: A model of misinformation spread in social networks. In Proc of the 24th International Conference on World Wide Web (WWW). 2015; p. 977–982.
  • 152. Friggeri A, Adamic L, Eckles D, Cheng J. Rumor cascades. In Proc of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM) . 2014;8.
  • 153. Lerman K, Ghosh R. Information contagion: An empirical study of the spread of news on digg and twitter social networks. arXiv . 2010; p. arXiv–1003.
  • 155. Cantril H. The invasion from Mars: A study in the psychology of panic. Transaction Publishers; 1952.
  • 158. Bailey NT, et al. The mathematical theory of infectious diseases and its applications. Charles Griffin & Company Ltd; 1975.
  • 159. Pew Forum on Religion & Public Life. Growing Number of Americans Say Obama Is a Muslim; 2010.
  • 160. Craik KJW. The nature of explanation. Cambridge University Press; 1943.
  • 161. Johnson-Laird PN. Mental models: Towards a cognitive science of language, inference, and consciousness. 6. Harvard University Press; 1983.
  • 162. Johnson-Laird PN, Girotto V, Legrenzi P. Mental models: a gentle guide for outsiders. Sistemi Intelligenti . 1998;9(68).
  • 164. Rouse WB, Morris NM. On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin . 1986;100(3).
  • 166. Wash R, Rader E. Influencing mental models of security: a research agenda. In Proc of the 2011 New Security Paradigms Workshop (NSPW). 2011; p. 57–66.
  • 167. Tversky B. Cognitive maps, cognitive collages, and spatial mental models. In Proc of European conference on spatial information theory (COSIT). 1993; p. 14–24.
  • 169. Mayer RE, Mathias A, Wetzell K. Fostering understanding of multimedia messages through pre-training: Evidence for a two-stage theory of mental model construction. Journal of Experimental Psychology: Applied . 2002;8(3).
  • 172. Morgan MG, Fischhoff B, Bostrom A, Atman CJ, et al. Risk communication: A mental models approach. Cambridge University Press; 2002.
  • 174. Kang R, Dabbish L, Fruchter N, Kiesler S. “My Data Just Goes Everywhere:” User mental models of the internet and implications for privacy and security. In Proc of 11th Symposium On Usable Privacy and Security . 2015; p. 39–52.
  • 178. Facebook Journalism Project. Facebook’s Approach to Fact-Checking: How It Works; 2020. https://bit.ly/34QgOlj .
  • 179. Sardarizadeh S. Instagram fact-check: Can a new flagging tool stop fake news?; 2019. Available from: https://bbc.in/33fg5ZR .
  • 180. Greenfield S. Mind change: How digital technologies are leaving their mark on our brains. Random House Incorporated ; 2015.
  • 181. European Commission. European Media Literacy Week; 2020. https://bit.ly/36H9MR3 .
  • 182. Media Literacy Now. U.S. media literacy policy report 2020; 2020. https://bit.ly/33LkLqQ .


MIT Sloan research about social media, misinformation, and elections

Oct 5, 2020

False information has become a feature of social media — especially during election years. Research shows false news peaked on Twitter during the 2012 and 2016 presidential elections, and a bipartisan Senate committee found that before and after the 2016 election, the Russian government used Facebook, Instagram, and Twitter to spread false information and conspiracy theories and stoke divisions.

Over the last several years, MIT Sloan researchers have studied the spread of false information, or so-called fake news, described by researchers as “entirely fabricated and often partisan content presented as factual.” Understanding more about why people share misinformation, and how it spreads, leads to proposed solutions — a goal that becomes more important as people spend more time on social media platforms, and the connections between misinformation and election results become clearer.

Below is an overview of some MIT Sloan research about social media, fake news, and elections.

False rumors spread faster and wider than true information, according to a 2018 study published in Science by MIT Sloan professor Sinan Aral and Deb Roy and Soroush Vosoughi of the MIT Media Lab. They found falsehoods are 70% more likely to be retweeted on Twitter than the truth, and reach their first 1,500 people six times faster. This effect is more pronounced with political news than other categories. Bots spread true and false information at the same rates, the researchers found, so people are the ones hitting retweet on false information. One potential reason is the novelty hypothesis: people are drawn to information that is novel and unusual, as false news often is. (Not that bots don't play a role in spreading misinformation — in fact, they can easily manipulate people's opinions.)


People who share false information are more likely to be distracted or lazy than biased, according to MIT Sloan professor David Rand and his co-author Gordon Pennycook. Their 2018 study asking people to rate the accuracy of news headlines on Facebook found that people who engage in more analytical thinking are more likely to discern true from false, regardless of their political views.

Some misinformation comes from politicians — and it might help them get votes. Under certain circumstances, people appreciate a candidate who tells obvious lies, even seeing that candidate as more “authentic,” according to research co-authored by Ezra Zuckerman Sivan, an associate dean and professor at MIT Sloan. A norm-breaking candidate who tells lies appeals to aggrieved constituencies because those voters see norms as illegitimately imposed by the establishment. The paper was co-authored by Minjae Kim, PhD '18 and assistant professor at Rice University, and Oliver Hahl, PhD '13 and assistant professor at Carnegie Mellon.

Attaching warnings to social media posts that feature information disputed by fact-checkers can backfire. A study by Rand and his co-authors outlined a potential downside of labeling misinformation online: the “implied truth effect,” where people assume all information without a label is true. As a result, false headlines that fail to get tagged, or aren't tagged quickly, could be taken as truth. Attaching verifications to some true headlines could be a possible fix.

Social media can also skew opinions because of what people don't see. Another study by Rand and several co-authors looked at “information gerrymandering,” or how people tend to live in partisan bubbles where they receive just a partial picture of how others feel about political issues. This can distort what people think about how others plan to vote — and even influence the outcome of elections.

Aral and MIT Sloan professor Dean Eckles outlined a four-step plan for researchers to measure and analyze social media manipulation and turn that information into a defense against future interference. The steps, in brief: catalog exposure to social media manipulation; combine exposure and voter behavior datasets; assess the effectiveness of manipulative messages; and calculate consequences of voting behavior changes. 

And in his new book “The Hype Machine,” Aral goes more in-depth, exploring the promise and peril of social media and how to protect society and democracy from its threats.

When asked directly, most people say it is important to share information that is accurate, according to a study co-authored by Rand. Yet people tend to share false information online because the social media context focuses their attention on factors other than truth and accuracy — not because they don't care about the truth of what they are sharing. Reminding people to think about the importance of accuracy — an “accuracy nudge” — can increase the quality of news they subsequently share. (The same is true for inaccurate information about COVID-19.)

Taking time to think also helps. In another study, Rand and his co-authors found that when people had a chance to deliberate about the accuracy of news headlines, they were more likely to identify false headlines than when they made a snap judgment. This was true regardless of a person's political beliefs and whether the headline affirmed them.

Look at how advertising works on social media platforms. Advertising spreads fake news through methods like Facebook’s marketing tools, which allow advertisers to pay to target certain groups of people. A study co-authored by MIT Sloan marketing professor Catherine Tucker found a 75% reduction in fake news being shared after Facebook rolled out a new advertising system designed to intercept articles with fake news stories.

Crowdsource ratings for online news sources. After initial concerns about Facebook's idea to survey users about the validity of various news sources, Rand and his colleagues found in a study that people generally came to the same conclusion as fact-checkers — showing that using the wisdom of the crowd could work. One caveat: the decision to only allow people familiar with a news source to rate its validity was a “terrible idea,” Rand said.

Rand, MIT Sloan research scientist Mohsen Mosleh, and MIT Sloan graduate student Cameron Martel also studied whether the type of correction (for example, polite and hedged versus more direct) makes people more likely to reply or correct their behavior. The bottom line: it does not. But analytic thinking and active open-minded thinking are associated with updating beliefs in response to corrections.



Fake news and the spread of misinformation: A research roundup

This collection of research offers insights into the impacts of fake news and other forms of misinformation, including fake Twitter images, and how people use the internet to spread rumors and misinformation.



by Denise-Marie Ordway, The Journalist's Resource, September 1, 2017


It’s too soon to say whether Google’s and Facebook’s attempts to clamp down on fake news will have a significant impact. But fabricated stories posing as serious journalism are not likely to go away, as they have become a means for some writers to make money and potentially influence public opinion. Even as Americans recognize that fake news causes confusion about current issues and events, they continue to circulate it. A December 2016 survey by the Pew Research Center suggests that 23 percent of U.S. adults have shared fake news, knowingly or unknowingly, with friends and others.

“Fake news” is a term that can mean different things, depending on the context. News satire is often called fake news, as are parodies such as the “Saturday Night Live” mock newscast Weekend Update. Much of the fake news that flooded the internet during the 2016 election season consisted of written pieces and recorded segments promoting false information or perpetuating conspiracy theories. Some news organizations published reports spotlighting examples of hoaxes, fake news and misinformation on Election Day 2016.

The news media has written a lot about fake news and other forms of misinformation, but scholars are still trying to understand it — for example, how it travels and why some people believe it and even seek it out. Below, Journalist’s Resource has pulled together academic studies to help newsrooms better understand the problem and its impacts. Two other resources that may be helpful are the Poynter Institute’s tips on debunking fake news stories and the First Draft Partner Network, a global collaboration of newsrooms, social media platforms and fact-checking organizations that was launched in September 2016 to battle fake news. In mid-2018, JR’s managing editor, Denise-Marie Ordway, wrote an article for Harvard Business Review explaining what researchers know to date about the amount of misinformation people consume, why they believe it and the best ways to fight it.

—————————

“The Science of Fake News” Lazer, David M. J.; et al. Science, March 2018. DOI: 10.1126/science.aao2998.

Summary: “The rise of fake news highlights the erosion of long-standing institutional bulwarks against misinformation in the internet age. Concern over the problem is global. However, much remains unknown regarding the vulnerabilities of individuals, institutions, and society to manipulations by malicious actors. A new system of safeguards is needed. Below, we discuss extant social and computer science research regarding belief in fake news and the mechanisms by which it spreads. Fake news has a long history, but we focus on unanswered scientific questions raised by the proliferation of its most recent, politically oriented incarnation. Beyond selected references in the text, suggested further reading can be found in the supplementary materials.”

“Who Falls for Fake News? The Roles of Bullshit Receptivity, Overclaiming, Familiarity, and Analytical Thinking” Pennycook, Gordon; Rand, David G. May 2018. Available at SSRN. DOI: 10.2139/ssrn.3023545.

Abstract:  “Inaccurate beliefs pose a threat to democracy and fake news represents a particularly egregious and direct avenue by which inaccurate beliefs have been propagated via social media. Here we present three studies (MTurk, N = 1,606) investigating the cognitive psychological profile of individuals who fall prey to fake news. We find consistent evidence that the tendency to ascribe profundity to randomly generated sentences — pseudo-profound bullshit receptivity — correlates positively with perceptions of fake news accuracy, and negatively with the ability to differentiate between fake and real news (media truth discernment). Relatedly, individuals who overclaim regarding their level of knowledge (i.e. who produce bullshit) also perceive fake news as more accurate. Conversely, the tendency to ascribe profundity to prototypically profound (non-bullshit) quotations is not associated with media truth discernment; and both profundity measures are positively correlated with willingness to share both fake and real news on social media. We also replicate prior results regarding analytic thinking — which correlates negatively with perceived accuracy of fake news and positively with media truth discernment — and shed further light on this relationship by showing that it is not moderated by the presence versus absence of information about the new headline’s source (which has no effect on perceived accuracy), or by prior familiarity with the news headlines (which correlates positively with perceived accuracy of fake and real news). Our results suggest that belief in fake news has similar cognitive properties to other forms of bullshit receptivity, and reinforce the important role that analytic thinking plays in the recognition of misinformation.”

“Social Media and Fake News in the 2016 Election” Allcott, Hunt; Gentzkow, Matthew. Working paper for the National Bureau of Economic Research, No. 23089, 2017.

Abstract: “We present new evidence on the role of false stories circulated on social media prior to the 2016 U.S. presidential election. Drawing on audience data, archives of fact-checking websites, and results from a new online survey, we find: (i) social media was an important but not dominant source of news in the run-up to the election, with 14 percent of Americans calling social media their “most important” source of election news; (ii) of the known false news stories that appeared in the three months before the election, those favoring Trump were shared a total of 30 million times on Facebook, while those favoring Clinton were shared eight million times; (iii) the average American saw and remembered 0.92 pro-Trump fake news stories and 0.23 pro-Clinton fake news stories, with just over half of those who recalled seeing fake news stories believing them; (iv) for fake news to have changed the outcome of the election, a single fake article would need to have had the same persuasive effect as 36 television campaign ads.”

“Debunking: A Meta-Analysis of the Psychological Efficacy of Messages Countering Misinformation” Chan, Man-pui Sally; Jones, Christopher R.; Jamieson, Kathleen Hall; Albarracín, Dolores. Psychological Science, September 2017. DOI: 10.1177/0956797617714579.

Abstract: “This meta-analysis investigated the factors underlying effective messages to counter attitudes and beliefs based on misinformation. Because misinformation can lead to poor decisions about consequential matters and is persistent and difficult to correct, debunking it is an important scientific and public-policy goal. This meta-analysis (k = 52, N = 6,878) revealed large effects for presenting misinformation (ds = 2.41–3.08), debunking (ds = 1.14–1.33), and the persistence of misinformation in the face of debunking (ds = 0.75–1.06). Persistence was stronger and the debunking effect was weaker when audiences generated reasons in support of the initial misinformation. A detailed debunking message correlated positively with the debunking effect. Surprisingly, however, a detailed debunking message also correlated positively with the misinformation-persistence effect.”

“Displacing Misinformation about Events: An Experimental Test of Causal Corrections” Nyhan, Brendan; Reifler, Jason. Journal of Experimental Political Science, 2015. doi: 10.1017/XPS.2014.22.

Abstract: “Misinformation can be very difficult to correct and may have lasting effects even after it is discredited. One reason for this persistence is the manner in which people make causal inferences based on available information about a given event or outcome. As a result, false information may continue to influence beliefs and attitudes even after being debunked if it is not replaced by an alternate causal explanation. We test this hypothesis using an experimental paradigm adapted from the psychology literature on the continued influence effect and find that a causal explanation for an unexplained event is significantly more effective than a denial even when the denial is backed by unusually strong evidence. This result has significant implications for how to most effectively counter misinformation about controversial political events and outcomes.”

“Rumors and Health Care Reform: Experiments in Political Misinformation” Berinsky, Adam J. British Journal of Political Science, 2015. doi: 10.1017/S0007123415000186.

Abstract: “This article explores belief in political rumors surrounding the health care reforms enacted by Congress in 2010. Refuting rumors with statements from unlikely sources can, under certain circumstances, increase the willingness of citizens to reject rumors regardless of their own political predilections. Such source credibility effects, while well known in the political persuasion literature, have not been applied to the study of rumor. Though source credibility appears to be an effective tool for debunking political rumors, risks remain. Drawing upon research from psychology on ‘fluency’ — the ease of information recall — this article argues that rumors acquire power through familiarity. Attempting to quash rumors through direct refutation may facilitate their diffusion by increasing fluency. The empirical results find that merely repeating a rumor increases its power.”

“Rumors and Factitious Informational Blends: The Role of the Web in Speculative Politics” Rojecki, Andrew; Meraz, Sharon. New Media & Society, 2016. doi: 10.1177/1461444814535724.

Abstract: “The World Wide Web has changed the dynamics of information transmission and agenda-setting. Facts mingle with half-truths and untruths to create factitious informational blends (FIBs) that drive speculative politics. We specify an information environment that mirrors and contributes to a polarized political system and develop a methodology that measures the interaction of the two. We do so by examining the evolution of two comparable claims during the 2004 presidential campaign in three streams of data: (1) web pages, (2) Google searches, and (3) media coverage. We find that the web is not sufficient alone for spreading misinformation, but it leads the agenda for traditional media. We find no evidence for equality of influence in network actors.”

“Analyzing How People Orient to and Spread Rumors in Social Media by Looking at Conversational Threads” Zubiaga, Arkaitz; et al. PLOS ONE, 2016. doi: 10.1371/journal.pone.0150989.

Abstract: “As breaking news unfolds people increasingly rely on social media to stay abreast of the latest updates. The use of social media in such situations comes with the caveat that new information being released piecemeal may encourage rumors, many of which remain unverified long after their point of release. Little is known, however, about the dynamics of the life cycle of a social media rumor. In this paper we present a methodology that has enabled us to collect, identify and annotate a dataset of 330 rumor threads (4,842 tweets) associated with 9 newsworthy events. We analyze this dataset to understand how users spread, support, or deny rumors that are later proven true or false, by distinguishing two levels of status in a rumor life cycle i.e., before and after its veracity status is resolved. The identification of rumors associated with each event, as well as the tweet that resolved each rumor as true or false, was performed by journalist members of the research team who tracked the events in real time. Our study shows that rumors that are ultimately proven true tend to be resolved faster than those that turn out to be false. Whilst one can readily see users denying rumors once they have been debunked, users appear to be less capable of distinguishing true from false rumors when their veracity remains in question. In fact, we show that the prevalent tendency for users is to support every unverified rumor. We also analyze the role of different types of users, finding that highly reputable users such as news organizations endeavor to post well-grounded statements, which appear to be certain and accompanied by evidence. Nevertheless, these often prove to be unverified pieces of information that give rise to false rumors. Our study reinforces the need for developing robust machine learning techniques that can provide assistance in real time for assessing the veracity of rumors. The findings of our study provide useful insights for achieving this aim.”

“Miley, CNN and The Onion” Berkowitz, Dan; Schwartz, David Asa. Journalism Practice, 2016. doi: 10.1080/17512786.2015.1006933.

Abstract: “Following a twerk-heavy performance by Miley Cyrus on the Video Music Awards program, CNN featured the story on the top of its website. The Onion — a fake-news organization — then ran a satirical column purporting to be by CNN’s Web editor explaining this decision. Through textual analysis, this paper demonstrates how a Fifth Estate comprised of bloggers, columnists and fake news organizations worked to relocate mainstream journalism back to within its professional boundaries.”

“Emotions, Partisanship, and Misperceptions: How Anger and Anxiety Moderate the Effect of Partisan Bias on Susceptibility to Political Misinformation” Weeks, Brian E. Journal of Communication, 2015. doi: 10.1111/jcom.12164.

Abstract: “Citizens are frequently misinformed about political issues and candidates but the circumstances under which inaccurate beliefs emerge are not fully understood. This experimental study demonstrates that the independent experience of two emotions, anger and anxiety, in part determines whether citizens consider misinformation in a partisan or open-minded fashion. Anger encourages partisan, motivated evaluation of uncorrected misinformation that results in beliefs consistent with the supported political party, while anxiety at times promotes initial beliefs based less on partisanship and more on the information environment. However, exposure to corrections improves belief accuracy, regardless of emotion or partisanship. The results indicate that the unique experience of anger and anxiety can affect the accuracy of political beliefs by strengthening or attenuating the influence of partisanship.”

“Deception Detection for News: Three Types of Fakes” Rubin, Victoria L.; Chen, Yimin; Conroy, Niall J. Proceedings of the Association for Information Science and Technology, 2015, Vol. 52. doi: 10.1002/pra2.2015.145052010083.

Abstract: “A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is intentionally deceptive is based on the analysis of previously seen truthful and deceptive news. A scarcity of deceptive news, available as corpora for predictive modeling, is a major stumbling block in this field of natural language processing (NLP) and deception detection. This paper discusses three types of fake news, each in contrast to genuine serious reporting, and weighs their pros and cons as a corpus for text analytics and predictive modeling. Filtering, vetting, and verifying online information continues to be essential in library and information science (LIS), as the lines between traditional news and online information are blurring.”

“When Fake News Becomes Real: Combined Exposure to Multiple News Sources and Political Attitudes of Inefficacy, Alienation, and Cynicism” Balmas, Meital. Communication Research, 2014, Vol. 41. doi: 10.1177/0093650212453600.

Abstract: “This research assesses possible associations between viewing fake news (i.e., political satire) and attitudes of inefficacy, alienation, and cynicism toward political candidates. Using survey data collected during the 2006 Israeli election campaign, the study provides evidence for an indirect positive effect of fake news viewing in fostering the feelings of inefficacy, alienation, and cynicism, through the mediator variable of perceived realism of fake news. Within this process, hard news viewing serves as a moderator of the association between viewing fake news and their perceived realism. It was also demonstrated that perceived realism of fake news is stronger among individuals with high exposure to fake news and low exposure to hard news than among those with high exposure to both fake and hard news. Overall, this study contributes to the scientific knowledge regarding the influence of the interaction between various types of media use on political effects.”

“Faking Sandy: Characterizing and Identifying Fake Images on Twitter During Hurricane Sandy” Gupta, Aditi; Lamba, Hemank; Kumaraguru, Ponnurangam; Joshi, Anupam. Proceedings of the 22nd International Conference on World Wide Web, 2013. doi: 10.1145/2487788.2488033.

Abstract: “In today’s world, online social media plays a vital role during real world events, especially crisis events. There are both positive and negative effects of social media coverage of events. It can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter during Hurricane Sandy (2012) to spread fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis, to understand the temporal, social reputation and influence patterns for the spread of fake images. Eighty-six percent of tweets spreading the fake images were retweets, hence very few were original tweets. Our results showed that the top 30 users out of 10,215 users (0.3 percent) resulted in 90 percent of the retweets of fake images; also network links such as follower relationships of Twitter, contributed very little (only 11 percent) to the spread of these fake photos URLs. Next, we used classification models, to distinguish fake images from real images of Hurricane Sandy. Best results were obtained from Decision Tree classifier, we got 97 percent accuracy in predicting fake images from real. Also, tweet-based features were very effective in distinguishing fake images tweets from real, while the performance of user-based features was very poor. Our results showed that automated techniques can be used in identifying real images from fake images posted on Twitter.”

“The Impact of Real News about ‘Fake News’: Intertextual Processes and Political Satire” Brewer, Paul R.; Young, Dannagal Goldthwaite; Morreale, Michelle. International Journal of Public Opinion Research , 2013. doi: 10.1093/ijpor/edt015.

Abstract: “This study builds on research about political humor, press meta-coverage, and intertextuality to examine the effects of news coverage about political satire on audience members. The analysis uses experimental data to test whether news coverage of Stephen Colbert’s Super PAC influenced knowledge and opinion regarding Citizens United, as well as political trust and internal political efficacy. It also tests whether such effects depended on previous exposure to The Colbert Report (Colbert’s satirical television show) and traditional news. Results indicate that exposure to news coverage of satire can influence knowledge, opinion, and political trust. Additionally, regular satire viewers may experience stronger effects on opinion, as well as increased internal efficacy, when consuming news coverage about issues previously highlighted in satire programming.”

“With Facebook, Blogs, and Fake News, Teens Reject Journalistic ‘Objectivity’” Marchi, Regina. Journal of Communication Inquiry , 2012. doi: 10.1177/0196859912458700.

Abstract: “This article examines the news behaviors and attitudes of teenagers, an understudied demographic in the research on youth and news media. Based on interviews with 61 racially diverse high school students, it discusses how adolescents become informed about current events and why they prefer certain news formats to others. The results reveal changing ways news information is being accessed, new attitudes about what it means to be informed, and a youth preference for opinionated rather than objective news. This does not indicate that young people disregard the basic ideals of professional journalism but, rather, that they desire more authentic renderings of them.”

Keywords: alt-right, credibility, truth discovery, post-truth era, fact checking, news sharing, news literacy, misinformation, disinformation


USC study reveals the key reason why fake news spreads on social media

The USC-led study of more than 2,400 Facebook users suggests that platforms — more than individual users — have a larger role to play in stopping the spread of misinformation online.

USC researchers may have found the biggest influencer in the spread of fake news: social platforms’ structure of rewarding users for habitually sharing information.

The team’s findings, published Monday by Proceedings of the National Academy of Sciences , upend popular misconceptions that misinformation spreads because users lack the critical thinking skills necessary for discerning truth from falsehood or because their strong political beliefs skew their judgment.

Just 15% of the most habitual news sharers in the research were responsible for spreading about 30% to 40% of the fake news.

The research team from the USC Marshall School of Business and the USC Dornsife College of Letters, Arts and Sciences wondered: What motivates these users? As it turns out, much like any video game, social media has a rewards system that encourages users to stay on their accounts and keep posting and sharing. Users who post and share frequently, especially sensational, eye-catching information, are likely to attract attention.

“Due to the reward-based learning systems on social media, users form habits of sharing information that gets recognition from others,” the researchers wrote. “Once habits form, information sharing is automatically activated by cues on the platform without users considering critical response outcomes, such as spreading misinformation.”

Posting, sharing and engaging with others on social media can, therefore, become a habit.


“Our findings show that misinformation isn’t spread through a deficit of users. It’s really a function of the structure of the social media sites themselves,” said Wendy Wood , an expert on habits and USC emerita Provost Professor of psychology and business.

“The habits of social media users are a bigger driver of misinformation spread than individual attributes. We know from prior research that some people don’t process information critically, and others form opinions based on political biases, which also affects their ability to recognize false stories online,” said Gizem Ceylan, who led the study during her doctorate at USC Marshall and is now a postdoctoral researcher at the Yale School of Management . “However, we show that the reward structure of social media platforms plays a bigger role when it comes to misinformation spread.”

In a novel approach, Ceylan and her co-authors sought to understand how the reward structure of social media sites drives users to develop habits of posting misinformation on social media.

Why fake news spreads: behind the social network

Overall, the study involved 2,476 active Facebook users ranging in age from 18 to 89 who volunteered in response to online advertising to participate. They were compensated to complete a “decision-making” survey approximately seven minutes long.

Surprisingly, the researchers found that users’ social media habits doubled and, in some cases, tripled the amount of fake news they shared. Their habits were more influential in sharing fake news than other factors, including political beliefs and lack of critical reasoning.

Frequent, habitual users forwarded six times more fake news than occasional or new users.

“This type of behavior has been rewarded in the past by algorithms that prioritize engagement when selecting which posts users see in their news feed, and by the structure and design of the sites themselves,” said second author Ian A. Anderson , a behavioral scientist and doctoral candidate at USC Dornsife. “Understanding the dynamics behind misinformation spread is important given its political, health and social consequences.”

Experimenting with different scenarios to see why fake news spreads

In the first experiment, the researchers found that habitual users of social media share both true and fake news.

In another experiment, the researchers found that habitual sharing of misinformation is part of a broader pattern of insensitivity to the information being shared. In fact, habitual users shared politically discordant news — news that challenged their political beliefs — as much as concordant news that they endorsed.

Lastly, the team tested whether social media reward structures could be devised to promote sharing of true over false information. They showed that incentives for accuracy rather than popularity (as is currently the case on social media sites) doubled the amount of accurate news that users share on social platforms.

The study’s conclusions:

  • Habitual sharing of misinformation is not inevitable.
  • Users could be incentivized to build sharing habits that make them more sensitive to sharing truthful content.
  • Effectively reducing misinformation would require restructuring the online environments that promote and support its sharing.

These findings suggest that, rather than relying solely on moderating what information is posted, social media platforms could pursue structural changes in their reward systems to limit the spread of misinformation.

About the study:  The research was supported and funded by the USC Dornsife College of Letters, Arts and Sciences Department of Psychology, the USC Marshall School of Business and the Yale University School of Management.


Fake news detection based on news content and social contexts: a transformer-based approach

  • Regular Paper
  • Published: 30 January 2022
  • Volume 13 , pages 335–362, ( 2022 )

Cite this article


  • Shaina Raza   ORCID: orcid.org/0000-0003-1061-5845 1 &
  • Chen Ding 1  

41k Accesses

84 Citations

5 Altmetric

Explore all metrics

Fake news is a real problem in today’s world, and it has become more extensive and harder to identify. A major challenge in fake news detection is to detect it in the early phase. Another challenge in fake news detection is the unavailability or the shortage of labelled data for training the detection models. We propose a novel fake news detection framework that can address these challenges. Our proposed framework exploits the information from the news articles and the social contexts to detect fake news. The proposed model is based on a Transformer architecture, which has two parts: the encoder part to learn useful representations from the fake news data and the decoder part that predicts the future behaviour based on past observations. We also incorporate many features from the news content and social contexts into our model to help us classify the news better. In addition, we propose an effective labelling technique to address the label shortage problem. Experimental results on real-world data show that our model can detect fake news with higher accuracy within a few minutes after it propagates (early detection) than the baselines.

Similar content being viewed by others


Fake News Detection by Weakly Supervised Learning Based on Content Features


FN2: Fake News DetectioN Based on Textual and Contextual Features


Profiling Fake News: Learning the Semantics and Characterisation of Misinformation


1 Introduction

Fake news detection is a subtask of text classification [ 1 ] and is often defined as the task of classifying news as real or fake. The term ‘fake news’ refers to the false or misleading information that appears as real news. It aims to deceive or mislead people. Fake news comes in many forms, such as clickbait (misleading headlines), disinformation (with malicious intention to mislead the public), misinformation (false information regardless of the motive behind), hoax, parody, satire, rumour, deceptive news and other forms as discussed in the literature [ 2 ].

Fake news is not a new topic; however, it has become a hot topic since the 2016 US election. Traditionally, people got news from trusted sources, media outlets and editors, usually following a strict code of practice. In the late twentieth century, the internet provided a new way to consume, publish and share information with little or no editorial standards. Lately, social media has become a significant source of news for many people. According to a report by Statista, Footnote 1 there are around 3.6 billion social media users (about half the world's population). There are obvious benefits of social media sites and networks in news dissemination, such as instantaneous access to information, free distribution, no time limit, and variety. However, these platforms are largely unregulated. Therefore, it is often difficult to tell whether some news is real or fake.

Recent studies [ 2 , 3 , 4 ] show that the speed at which fake news travels is unprecedented, and the outcome is its wide-scale proliferation. A clear example of this is the spread of anti-vaccination misinformation Footnote 2 and the rumour that incorrectly compared the number of registered voters in 2018 to the number of votes cast in the 2020 US elections. Footnote 3 The implications of such news are seen in the anti-vaccine movements that hampered the global fight against COVID-19 and in post-election unrest. Therefore, it is critically important to stop the spread of fake news at an early stage.

A significant research gap in the current state-of-the-art is that it focuses primarily on fake news detection rather than early fake news detection. The seminal works [ 4 , 5 ] on early detection of fake news usually detect the fake news after at least 12 h of news propagation, which may be too late [ 6 ]. An effective model should be able to detect fake news early, which is the motivation of this research.

Another issue that we want to highlight here is the scarcity of labelled fake news data (news labelled as real or fake) in real-world scenarios. Existing state-of-the-art works [ 4 , 7 , 8 ] generally use fully labelled data to classify fake news. However, the real-world data is likely to be largely unlabelled [ 5 ]. Considering the practical constraints, such as unavailability of the domain experts for labelling, cost of manual labelling, and difficulty of choosing a proper label for each news item, we need to find an effective way to train a large-scale model. One alternative approach is to leverage noisy, limited, or imprecise sources to supervise labelling of large amounts of training data. The idea is that the training labels may be imprecise and partial but can be used to create a strong predictive model. This scheme of training labels is the weak supervision technique [ 9 ].

Usually, the fake news detection methods are trained on the current data (available during that time), which may not generalize to future events. Many of the labelled samples from the verified fake news get outdated soon with the newly developed events. For example, a model trained on fake news data before the COVID-19 may not classify fake news properly during COVID-19. The problem of dealing with a target concept (e.g. news as ‘real’ or ‘fake’) when the underlying relationship between the input data and target variable changes over time is called concept drift [ 10 ]. In this paper, we investigate whether concept drift affects the performance of our detection model, and if so, how we can mitigate them.

This paper addresses the challenges mentioned above (early fake news detection and scarcity of labelled data) to identify fake news. We propose a novel framework based on a deep neural network architecture for fake news detection. The existing works, in this regard, rely on the content of news [ 7 , 11 , 12 ], social contexts [ 1 , 4 , 5 , 8 , 13 , 14 ], or both [ 4 , 8 , 15 ]. We include a broader set of news-related features and social context features compared to the previous works. We try to detect fake news early (i.e. after a few minutes of news propagation). We address the label shortage problem that happens in real-world scenarios. Furthermore, our model can combat concept drift.

Inspired by the bidirectional and autoregressive Transformer (BART) [ 16 ] model from Facebook that is successfully used in language modelling tasks, we propose to apply a deep bidirectional encoder and a left-to-right decoder under the hood of one unified model for the task of fake news detection. We choose to work with the BART model over the state-of-the-art BERT model [ 17 ], which has demonstrated its abilities in NLP (natural language processing) tasks (e.g. question answering and language inference), as well as the GPT-2 model [ 18 ], which has impressive autoregressive (time-series) properties. The main reason is that the BART model combines the unique features (bidirectional and autoregressive) of both text generation and temporal modelling, which we require to meet our goals.

Though we take inspiration from BART, our model is different from the original BART in the following aspects: (1) in comparison with the original BART, which takes a single sentence/document as input, we incorporate a rich set of features (from news content and social contexts) into the encoder part; (2) we use a decoder to get predictions not only from previous text sequences (in this case, news articles) as in the original BART but also from previous user behaviour (how users respond to those articles) sequences, and we detect fake news early by temporally modelling user behaviour; (3) on top of the original BART model, we add a single linear layer to classify news as fake or real.

Our contributions are summarized as follows:

We propose a novel framework that exploits news content and social contexts to learn useful representations for predicting fake news. Our model is based on a Transformer [ 19 ] architecture, which facilitates representation learning from fake news data and helps us detect fake news early. We also use the side information (metadata) from the news content and the social contexts to support our model to classify the truth better.

We present a systematic approach to investigate the relationship between the user profile and news veracity. We propose a novel Transformer-based model using zero-shot learning [ 20 ] to determine the credibility levels of the users. The advantage of our approach is that it can determine the credibility of both long-term and new users, and it can detect the malicious users who often change their tactics to come back to the system or vulnerable users who spread misinformation.

We propose a novel weak supervision model to label the news articles. The proposed model is an effective labelling technique that lessens the burden of extensive labelling tasks. With this approach, the labels can be extracted instantaneously from known sources and updated in real-time.

We evaluate our system by conducting experiments on real-world datasets: (i) NELA-GT-19 [ 21 ] that consists of news articles from multiple sources and (ii) Fakeddit [ 22 ] that is a multi-modal dataset containing text and images in posts on the social media website Reddit. While the social contexts used in this model are from Reddit, consisting of upvotes, downvotes, and comments on posts, the same model can be generalized to fit other social media datasets. The same method is also generalizable for any other news dataset. The results show that our proposed model can detect fake news earlier and more accurately than baselines.

The rest of the paper is organized as follows. Section  2 is the related work. Section  3 discusses the proposed framework. Section  4 explains the details of our fake news detection model, Sect.  5 describes the experimental set-up, and Sect.  6 shows the results and analyses. Finally, Sect.  7 is about the limitations, and Sect.  8 gives the conclusion and lists the future directions.

2 Literature review

Fake news is information that is false or misleading and is presented as real news [ 23 ]. The term ‘fake news’ became mainstream during the 2016 presidential election in the United States. Following this, Google, Twitter, and Facebook took steps to combat fake news. However, due to the exponential growth of information on online news portals and social media sites, distinguishing between real and fake news has become difficult.

In the state-of-the-art, the fake news detection methods are categorized into two types: (1) manual fact-checking; (2) automatic detection methods. Fact-checking websites, such as Reporterslab, Footnote 4 Politifact Footnote 5 and others [ 2 ], rely on human judgement to decide the truthfulness of some news. Crowdsourcing, e.g. Amazon’s Mechanical Turk, Footnote 6 is also used for detecting fake news in online social networks. These fact-checking methods provide the ground truth (true/false labels) to determine the veracity of news. The manual fact-checking methods have some limitations: (1) it is time-consuming to detect and report every piece of fake news produced on the internet; (2) it is challenging to scale with the bulk of newly created news, especially on social media; (3) it is quite possible that the fact-checkers’ biases (such as gender, race, prejudices) may affect the ground truth label.

The automatic detection methods are alternative to the manual fact-checking ones, which are widely used to detect the veracity of the news. In the previous research, the characteristics of fake news are usually extracted from the news-related features (e.g. news content) [ 21 ] or from the social contexts (social engagements of the users) [ 4 , 22 , 24 ] using automatic detection methods.

The content-based methods [ 25 , 26 , 27 , 28 ] use various types of information from the news, such as article content, news source, headline, image/video, to build fake news detection classifiers. Most content-based methods use stylometry features (e.g. sentence segmentation, tokenization, and POS tagging) and linguistic features (e.g. lexical features, bag-of-words, frequency of words, case schemes) of the news articles to capture deceptive cues or writing styles. For example, Horne and Adalı [ 29 ] extract stylometry and psychological features from the news titles to differentiate fake news from real. Przybyla et al. [ 26 ] develop a style-based text classifier, in which they use bidirectional Long short-term memory (LSTM) to capture style-based features from the news articles. Zellers et al. [ 12 ] develop a neural network model to determine the veracity of news from the news text. Some other works [ 27 , 30 ] consider lexicons, bag-of-words, syntax, part-of-speech, context-free grammar, TFIDF, latent topics to extract the content-based features from news articles.
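To make the content-based family of methods concrete, the sketch below shows a minimal lexical baseline of the kind described above (TF-IDF features feeding a logistic regression classifier). The toy texts and labels are placeholders for illustration only and are not drawn from the cited studies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus (placeholder texts): 1 = fake, 0 = real
texts = [
    "Miracle cure suppressed by doctors, share before it is deleted",
    "Central bank raises interest rates by 25 basis points",
    "Celebrity secretly replaced by a clone, insiders say",
    "City council approves new public transport budget",
]
labels = [1, 0, 1, 0]

# Bag-of-words/TF-IDF features feeding a linear classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Shocking: officials hide the truth about this simple trick"]))
```

Such purely lexical baselines are easy to train but inherit the weaknesses discussed next: they are tied to the language, style, and topics seen during training.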

A general challenge of content-based methods is that fake news's style, platform, and topics keep changing. Models that are trained on one dataset may perform poorly on a new dataset with different content, style, or language. Furthermore, the target variables in fake news change over time, and some labels become obsolete, while others need to be re-labelled. Most content-based methods are not adaptable to these changes, which necessitates re-extracting news features and re-labelling data based on new features. These methods also require a large amount of training data to detect fake news. By the time these methods collect enough data, fake news has spread too far. Because the linguistic features used in content-based methods are mostly language-specific, their generality is also limited.

To address the shortcomings of content-based methods, a significant body of research has begun to focus on social contexts to detect fake news. The social context-based detection methods examine users’ social interactions and extract relevant features representing the users’ posts (review/post, comments, replies) and network aspects (followers–followee relationships) from social media. For example, Liu and Wu [ 5 ] propose a neural network classifier that uses social media tweets, retweet sequences, and Twitter user profiles to determine the veracity of the news.

The existing social contexts-based approaches are categorized into two types: (1) stance-based methods and (2) propagation-based methods. The stance-based approaches exploit the users’ viewpoints from social media posts to determine the truth. The users express the stances either explicitly or implicitly. The explicit stances are the direct expressions of users’ opinions usually available from their reactions on social media. Previous works [ 4 , 5 , 22 ] mostly use upvotes/downvotes, thumbs up/down to extract explicit stances. The implicit stance-based methods [ 5 , 31 ], on the other hand, are usually based on extracting linguistic features from social media posts. To learn the latent stances from topics, some studies [ 11 ] use topic modelling. Other studies [ 13 , 32 ] look at fake users’ accounts and behaviours to see if they can detect implicit stances. A recent study also analyses users’ views on COVID-19 by focusing on people who interact and share information on Twitter [ 33 ]. This study provides an opportunity to assess early information flows on social media. Other related studies [ 34 , 35 ] examine users’ feelings about fake news on social media and discover a link between sentiment analysis and fake news detection.

The propagation-based methods [ 36 , 37 , 38 , 39 ] utilize information related to fake news, e.g. how users spread it. In general, the input to a propagation-based method can be either a news cascade (direct representation of news propagation) or self-defined graph (indirect representation capturing information on news propagation) [ 2 ]. Hence, these methods use graphs and multi-dimensional points for fake news detection [ 36 , 39 ]. The research in propagation-based methods is still in its early stages.

To conclude, social media contexts, malicious user profiles, and user activities can be used to identify fake news. However, these approaches pose additional challenges. Gathering social contexts, for example, is a substantial undertaking: the data is not only big but also incomplete, noisy, and unstructured, which may render existing detection algorithms ineffective.

Other than NLP methods, visual information is also used as a supplement to determine the veracity of the news. A few studies investigate the relationship between images and tweet credibility [ 40 ]. However, the visual information in this work [ 40 ] is hand-crafted, limiting its ability to extract complex visual information from the data. In capturing automatic visual information from data, Jin et al. [ 41 ] propose a deep neural network approach to combine high-level visual features with textual and social contexts automatically.

Recently, transfer learning has been applied to detect fake news [ 1 , 7 ]. Although transfer learning has shown promising results in image processing and NLP tasks, its application in fake news detection is still under-explored. This is because fake news detection is a delicate task in which transfer learning must deal with semantics, hidden meanings, and contexts from fake news data. In this paper, we propose a transfer learning-based scheme, and we pay careful attention to the syntax, semantics and meanings in fake news data.

2.1 State-of-the-art fake news detection models

In one of the earlier works, Karimi et al. [ 42 ] use convolutional neural network (CNN) and LSTM methods to combine various text-based features, such as those from statements (claims) related to news data. Liu et al. [ 39 ] also use RNN- and CNN-based methods to build propagation paths for detecting fake news at the early stage of its propagation. Shu et al. [ 4 ] propose a matrix factorization method, TriFN, to model the relationships among the publishers, news stories and social media users for fake news detection.

Cui et al. [ 12 ] propose an explainable fake news detection system DEFEND based on LSTM networks. The DEFEND considers users’ comments to explain if some news is real or fake. Nguyen et al. [ 15 ] propose a fake news detection method FANG that uses the graph learning framework to learn the representations of social contexts. These methods discussed above are regarded as benchmark standards in the field of fake news research.

In recent years, there has been a greater focus in NLP research on pre-trained models. BERT [ 17 ] and GPT-2 [ 43 ] are two state-of-the-art pre-trained language models. In the first stage, the language model (e.g. BERT or GPT-2) is pre-trained on the unlabelled text to absorb maximum amount of knowledge from data (unsupervised learning). In the second stage, the model is fine-tuned on specific tasks using a small-labelled dataset. It is a semi-supervised sequence learning task. These models are also used in fake news research.

BERT is used in some fake news detection models [ 1 , 7 , 44 ] to classify news as real or fake. BERT uses bidirectional representations to learn information and is generally more suitable for NLP tasks, such as text classification and translation. The GPT-2, on the other hand, uses the unidirectional representation to predict the future using left-to-right context and is better suited for autoregressive tasks, where timeliness is a crucial factor. In related work, Zellers et al. [ 12 ] propose a Grover framework for the task of fake news detection, which uses a language model close to the architecture of GPT-2 trained on millions of news articles. Despite these models’ robust designs, there are a few research gaps. First, these models do not consider a broader set of features from news and social contexts. Second, these methods ignore the issue of label scarcity in real-world scenarios. Finally, the emphasis is not on early fake news detection.

The state-of-the-art focuses primarily on fake news detection methods rather than early fake news detection. A few works [ 4 , 5 ] propose early detection of fake news. However, to detect fake news, these methods [ 4 , 5 ] usually rely on a large amount of fake news data observed over a long period of time (depending upon the availability of the social contexts). The work in [ 4 ] detects fake news after at least 12 h of news propagation, as demonstrated in their experiments, which may be too late. According to research [ 6 ], fake news spreads within minutes once planted. For example, the fake claim that Elon Musk’s Tesla team was inviting people to send any amount of bitcoin (ranging from 0.1 to 20) in exchange for double the amount back caused losses of millions of dollars within the first few minutes. Footnote 7 Therefore, it is critical to detect fake news early on, before it spreads.

Our work is intended to address these issues (early fake news detection, labels scarcity) in fake news research. BERT and GPT-2 (or similar) have not been used to their full potential for representation learning and autoregression tasks in a single unifying model that we intend to work on going forward in our research. We propose a combination of Transformer architectures that can be applied to a wide range of scenarios, languages, and platforms.

3 Overview of the proposed framework

3.1 Problem definition

Given a multi-source news dataset and social contexts of news consumers (social media users), the task of fake news detection is to determine if a news item is fake or real. Formally, we define the problem of fake news detection as:

Input: News items, social contexts and associated side information

Output: One of two labels: ‘fake’ or ‘real’.

3.2 Proposed architecture

Figure  1 shows an overview of our proposed framework. Initially, the news comes from the news ecosystem [ 45 ], which we refer to as the dataset (input) in this work. The news content and social contexts go into the respective components where the data is being preprocessed. The input to the embedding layer is the features from news content and social contexts. The output from the embedding layer is the vector representations of news content and social contexts. These vector representations are combined to produce a single representation that is passed as input to the Transformer block. The output from the Transformer is transferred to the classification layer, followed by the cross-entropy layer. We get a label (fake or real) for each news as the final output.

Fig. 1 Overview of the proposed framework

We utilize three types of embeddings in the embedding layer: (1) token embeddings: to transform words into vector representations; (2) segment embeddings: to distinguish different segments or sentences from the content; (3) positional embeddings: to show tokens’ positions within sequences.

We create sequences from the news content and social contexts (user behaviours). In our work, we maintain a temporal order in the sequences through positional encodings. The intuition is that each word in the sequence is temporally arranged and is assigned to a timestep, which means that the first few words correspond to the timestep 0, timestep 1, and so on, till the last word corresponding to the last timestep. We use the sinusoidal property of position encodings in sequences [ 46 ], where the distance between the neighbouring timesteps is symmetrical and decays over time.
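As an illustration of how token embeddings and sinusoidal positional encodings can be combined into position-aware input vectors, a minimal PyTorch sketch is shown below; the vocabulary size, embedding dimension, and sequence length are arbitrary values chosen for the example, not the settings used in the paper.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))    # shape: (1, max_len, d_model)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        return token_embeddings + self.pe[:, : token_embeddings.size(1)]

# Toy usage: embed a batch of token ids and add positional information
vocab_size, d_model = 30522, 64
embed = nn.Embedding(vocab_size, d_model)
pos = SinusoidalPositionalEncoding(d_model)
token_ids = torch.randint(0, vocab_size, (2, 10))      # (batch=2, seq_len=10)
x = pos(embed(token_ids))                              # position-aware input to the encoder
```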

We discuss the Transformer block that consists of the encoder, decoder and attention mechanism in detail in Sect.  4 and Fig.  3 .

3.3 The news ecosystem

The news ecosystem consists of three basic entities: publishers (news media or editorial companies that publish the news article), information (news content) and users [ 4 , 45 , 47 ]. As shown in Fig.  1 , initially, the news comes from publishers. Then, it goes to different websites or online news platforms. The users get news from these sources, sharing news on different platforms (blogs, social media). The multiple friends’ connections, followers–followees links, hashtags, and bots make up a social media network.

3.3.1 News content

The news content component takes news content from the news ecosystem. The news body (content) and the corresponding side information represent a news article. The news body is the main text that elaborates the news story; generally, the way a news story is written reflects an author's main argument and viewpoint. We include the following side information related to news:

Source: The source of news (e.g. CNN, BBC).

Headline: The title text that describes the main topic of the article. Usually, headlines are designed to catch the attention of readers.

Author: The author of the news article.

Publication time: The time when the news is published; it is an indicator of the recency or lateness of the news.

Partisan information: The adherence of a news source to a particular party. For example, a news source with many articles favouring the right wing reflects the source’s and authors’ partisan bias.

3.3.2 Social contexts

The social contexts component takes the social contexts on the news, such as posts, likes, shares, replies, followers–followees and their activities. When the features related to the content of news articles are not enough or available, social contexts can provide helpful information on fake news. Each social context is represented by a post (comment, review, reply) and the corresponding side information (metadata). The post is a social media object posted by a user; it contains useful information to understand a user’s view on a news article. We include the following side information related to social contexts:

User: A person or bot that registers on social media.

Title: The headline or short explanation of the post. The title of the post matches the news headline.

Score: A numeric score given to a post by another user; this feature determines whether another user approves or disapproves of the post.

Number of comments: The count of comments on a post; this feature gives the popularity level of a post.

Source: The source of the news.

Upvote–Downvote ratio: An estimate of other users’ approval/disapproval of a post.

Crowd (aggregate) response: We calculate the aggregate responses of all users on each news article. To calculate the aggregate response, we take all the scores on a post to determine a user’s overall view of a news story. We assume that a news story or theme with a score less than 1 is not reliable and vice versa.

User credibility: We determine the credibility level of social media users as an additional social context feature. This feature is helpful to determine whether a user tends to spread fake news or not. For example, similar posts by a non-credible user on a news item are an indicator of the news being real or fake. We determine user credibility through a user credibility component, shown in Fig. 2 and discussed next.

Fig. 2 The user credibility module

3.4 User credibility module

The topic of determining the credibility of social media users is not new in the literature. Some previous works apply community detection [ 48 ], sentiment analysis [ 33 ] and profile ranking techniques [ 49 ]. However, there is not much work in fake news detection that considers the credibility of social media users. The seminal work [ 4 ], in this regard, uses a simple clustering approach that assigns a credibility level to each user. We adopt a different approach to build the user credibility module, as shown in Fig. 2 .

We use zero-shot learning (ZSL) [ 20 ] to determine the credibility of users. ZSL is a mechanism by which a computer program learns to recognize objects in an image or extract information from text without labelled training data. For example, a common approach to classifying news categories is training a model from scratch on task-specific data. However, ZSL enables this task to be completed without any previous task-specific training. ZSL can also detect and predict unknown classes that a model has never seen during training based on prior knowledge from the source domain [ 43 , 51 ] or auxiliary information.

To determine the credibility level, we first group each user’s engagements (comments, posts, replies) and then feed this information into our ZSL classifier. We build our ZSL classifier based on the Transformer architecture. We attach a checkpoint Footnote 8 (the weights of a model saved during training) pre-trained on a huge dataset, multi-genre natural language inference (MNLI) [ 50 ], to our classifier.

Crowdsourcing is frequently used to collect information or opinions from large groups of people who submit their data through social media or the internet. We use MNLI because it is a large-scale crowd-sourced dataset that covers a range of genres of spoken and written text. User credibility and crowdsourcing have been linked in previous research [ 55 , 56 ]. Therefore, we anticipate that the large amount of crowdsourced data in MNLI could reveal a link between users’ credibility and how they express their opinions. It would be expensive if we needed to gather such crowd-sourced opinions and direct user feedback ourselves. By using MNLI, we gain the benefits of a pre-trained model in terms of size and training time, as well as the benefit of accuracy.

Through ZSL, the checkpoint that is pre-trained can be fine-tuned for a specialized task, e.g. the user credibility task in our work. We could classify the users into different unseen classes (user credibility levels). In total, we define five credibility levels: ‘New user’, ‘Very uncredible’, ‘Uncredible’, ‘Credible’, ‘Very credible’. We use the prior knowledge of a fine-tuned ZSL model and its checkpoint, and we also use the semantics of the auxiliary information to determine known user classes. Our model can also determine new unknown user classes. Later, we incorporate this information as the weak labels into our fake news detection model.
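A minimal sketch of how such a zero-shot credibility classifier could be assembled from an off-the-shelf MNLI-pretrained BART checkpoint is given below; the example engagements, the candidate label names, and the simple concatenation of a user's posts are illustrative assumptions rather than the exact pipeline used in this work.

```python
from transformers import pipeline

# Zero-shot classifier built on a BART checkpoint pre-trained on MNLI
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

credibility_levels = ["new user", "very uncredible", "uncredible", "credible", "very credible"]

# Group a user's engagements (comments, posts, replies) into one text (illustrative examples)
user_engagements = [
    "5G towers cause the virus, wake up people!!",
    "Doctors are hiding the real cure, share before it gets deleted.",
]
user_text = " ".join(user_engagements)

result = classifier(user_text, candidate_labels=credibility_levels)
print(result["labels"][0], result["scores"][0])   # highest-scoring credibility level
```

The highest-scoring candidate label can then be attached to the user as a weak credibility feature.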

Another module in this framework is the weak supervision module, which is related to our datasets and labelling scheme, so we discuss it together with the datasets (Sect. 5.2 ).

4 Proposed method

4.1 Preliminaries

Let \(N = \left\{ {n_{1} , n_{2} , \ldots , n_{\left| N \right|} } \right\}\) be a set of news items, each of which is labelled as \(y_{i} \in \left\{ {0,1} \right\}\) , \(y_{i}\)  = 1 is the fake news and \(y_{i} = 0\) is the real news. The news item \(n_{i} \) is represented by its news body (content) and the side information (headline, body, source, etc.). When a news item \(n_{i}\) is posted on social media, it is usually responded to by a number of social media users \(U = \left\{ {u_{1} , u_{2} , \ldots , u_{\left| U \right|} } \right\}\) . The social contexts include users’ social engagements (interactions), such as comments on news, posts, replies, and upvotes/downvotes.

We define social contexts on a news item \(n_{i} { }\) as: \({\text{SC}}\left( {n_{i} } \right) = \left( {\left( {u_{1} ,{\text{sc}}_{1} ,t_{1} } \right), \left( {u_{2} ,{\text{sc}}_{2} ,t_{2} } \right), \ldots ,\left( {u_{{\left| {{\text{sc}}} \right|}} ,{\text{sc}}_{{\left| {{\text{sc}}} \right|}} ,t_{{\left| {{\text{sc}}} \right|}} } \right)} \right)\) , where each tuple \(\left( {u,{\text{sc}},t} \right)\) refers to a user u ’s social contexts sc on a news item \(n_{i} { }\) during time t . Here, a user may interact with a post multiple times, and each interaction is recorded with its timestamp.

The task of fake news detection is to find a model M that predicts a label \(\hat{y}(n_{i}) \in \{0,1\}\) for each news item based on its news content and the corresponding social contexts. Therefore, the task of fake news detection, in this paper, is defined as shown in Eq. ( 1 ):

\(\hat{y}(n_{i}) = M\big(C(n_{i}),\, \mathrm{SC}(n_{i})\big) \qquad (1)\)

where \(C(n_{i} )\) refers to the content of news and \({\text{SC}}(n_{i} )\) refers to the social contexts on the news. The notations used in this paper can be found in “Appendix A”.

4.2 Proposed model: FND-NS

Here, we introduce our proposed classification model called FND-NS (Fake News Detection through News content and Social context), which adapts the bidirectional and auto-regressive Transformers (BART) for a new task—fake news detection, as shown in Fig.  3 . The original BART [ 16 ] is a denoising autoencoder that is trained in two steps. It first corrupts the text with an arbitrary noising function, and then it learns a model to reconstruct the original text. We use the BART as sequence-to-sequence Transformer with a bidirectional encoder (like BERT) and a left-to-right autoregressive decoder (like GPT-2).

Fig. 3 The encoder and decoder blocks in FND-NS model

Models such as BERT [ 17 ], which captures the text representations in both directions, and GPT-2 [ 18 ], which has autoregressive properties, are examples of self-supervised methods. Both are Transformer models with their strengths: BERT excels in discriminative tasks (identifying existing data to classify) but is limited in generative tasks. At the same time, GPT-2 is capable of generative tasks (learning the regularities in data to generate new samples) but not discriminative tasks due to its autoregressive properties. In comparison with these models, BART integrates text generation and comprehension using both bidirectional and autoregressive Transformers. Due to this reason, we choose to work on the BART architecture.

Though we get inspiration from BART, our network architecture is different from the original BART in the following manner. The first difference between our model and the original BART is the method of input. Original BART takes one piece of text as input in the encoder part. In contrast, we incorporate a rich set of features (from news content and social contexts) into the encoder part. We use multi-head attentions to weigh the importance of different pieces of information. For example, if the headline is more convincing in persuading readers to believe something, or if a post receives an exceptionally large number of interactions, we pay closer attention to such information. We have modified the data loader of the original BART to feed more information into our neural network architecture.

The second difference is the way the next token is predicted. By token, we mean the token-level tasks such as named entity recognition and question answering, where models are required to produce fine-grained output at the token level [ 52 ]. We randomly mask some tokens in the input sequence. We follow the default masking probability, i.e. masking 15% of tokens with (MASK), as in the original paper [ 16 ]. However, we predict the ids of those masked items based on the positions of missing inputs from the sequence. This way, we determine the next item in the sequence based on its temporal position. In our work, we use the decoder to make predictions based on the previous sequences of text (news articles) and the previous sequences of user behaviours (how users respond to those articles). Modelling user behaviours in such a temporal manner helps us detect fake news in the early stage.

Finally, different from the original BART, we add a linear transformation and SoftMax layer to output the final target label.
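For illustration, the sketch below shows this kind of classification set-up on top of a publicly available BART checkpoint with a sequence-classification head; the checkpoint name, the way the news and social-context fields are flattened into a single input string, and the field names are assumptions made for the example, not the exact FND-NS implementation.

```python
import torch
from transformers import BartTokenizer, BartForSequenceClassification

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForSequenceClassification.from_pretrained("facebook/bart-base", num_labels=2)

# Flatten news content and social-context side information into one input sequence (illustrative)
news = {"headline": "Scientists confirm miracle cure", "body": "An unnamed source claims ...", "source": "example-news.com"}
social = {"comments": "this is obviously fake | link please", "upvote_ratio": 0.35}

text = (
    f"headline: {news['headline']} body: {news['body']} source: {news['source']} "
    f"comments: {social['comments']} upvote_ratio: {social['upvote_ratio']}"
)

inputs = tokenizer(text, truncation=True, padding="max_length", max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape (1, 2): one score per class
probs = torch.softmax(logits, dim=-1)          # class probabilities
print(probs.squeeze().tolist())
```

In practice the classification head would be fine-tuned on labelled (or weakly labelled) news before these probabilities become meaningful.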

Next, we discuss our model (Fig.  3 ) and explain how we use it in fake news detection. Let N represents a set of news items. Each news item has a set of text and social context features. These features are merged to form a combined feature set, as shown in the flowchart in Fig.  4 .

Fig. 4 Flowchart of proposed FND-NS model

These combined features are then encoded into a vector representation. Let \(X\) represent a sequence of k combined features for a news item, as shown in Eq. ( 2 ):

\(X = \{x_{1}, x_{2}, \ldots, x_{k}\} \qquad (2)\)

These features are given as input to the embedding layers. The embedding layer gives us a word embedding vector for each word (feature in our case) in the combined feature set. We also add a positional encoding vector with each word embedding vector. The word embedding vector gives us the (semantic) information for each word. The positional encoding describes the position of a token (word) in a sequence. Together they give us the semantic as well as temporal information for each word. We define this sequence of embedding vectors as \(X^{\prime } = \left\{ {x_{1}^{\prime } ,x_{2}^{\prime } , \ldots ,x_{k}^{\prime } } \right\}\) .

In the sequence-to-sequence problem, we find a mapping f from an input sequence of \(k\) vectors \(X_{1:k}^{\prime}\) to a sequence of \(l\) target vectors \(Y_{1:l}\). The number of target vectors is unknown a priori and depends on the input sequence. The mapping f is shown in Eq. ( 3 ):

\(f: X_{1:k}^{\prime} \rightarrow Y_{1:l} \qquad (3)\)

4.2.1 Encoder

The encoder is a stack of encoder blocks, as shown in green in Fig. 3 . The encoder maps the input sequence to a contextualized encoding sequence. We use the bidirectional encoder to encode the input from both directions to get the contextualized information. The input to the encoder is the input sequence \(X_{1:k}^{\prime}\). The encoder maps the input sequence \(X^{\prime}\) to a contextualized encoding sequence \(\overline{X}\), as shown in Eq. ( 4 ):

\(f_{\mathrm{enc}}: X_{1:k}^{\prime} \rightarrow \overline{X}_{1:k} \qquad (4)\)

The first encoder block transforms each context-independent input vector to a context-dependent vector representation. The next encoder blocks further refine the contextualized representation until the last encoder block outputs final contextualized encoding \(\overline{X}_{1:k}\) . Each encoder block consists of a bidirectional self-attention layer, followed by two feed-forward layers. We skip the details of feed-forward layers, which are the same as in [ 17 ]. We focus more on the bidirectional self-attention layer that we apply to the given inputs.

The bidirectional self-attention layer takes the vector representation \(x_{i}^{\prime} \in X_{1:k}^{\prime}\) as the input. Each input vector \(x_{i}^{\prime}\) in the encoder block is projected to a key vector \(\kappa_{i} \in \mathcal{K}_{1:k}\), a value vector \(v_{i} \in V_{1:k}\), and a query vector \(q_{i} \in Q_{1:k}\) through three trainable weight matrices \(W_{q}\), \(W_{v}\), \(W_{k}\), as shown in Eq. ( 5 )

\(\kappa_{i} = W_{k} x_{i}^{\prime}, \quad v_{i} = W_{v} x_{i}^{\prime}, \quad q_{i} = W_{q} x_{i}^{\prime} \qquad (5)\)

where \(\forall i \in \left\{ {1, 2 , \ldots , k} \right\}\) . The same weight matrices are applied to each input vector \(x_{i}^{\prime }\) . After projecting each input vector \(x_{i}^{\prime }\) to a query, key and value vector, each query vector is compared to all the key vectors. The intuition is that the higher the similarity between a key vector and a query vector, the more important is the corresponding value for the output vector. The output from the self-attention layer is the output vector representation \(x_{i}^{\prime \prime }\) , which is a refined contextualized representation of \(x_{i}^{\prime }\) . An output vector x ″ is defined as the weighted sum of all value vectors \(V\) plus the input vector \(x^{\prime }\) . The weights are proportional to the cosine similarity between the query vectors and respective key vectors, shown in Eq. ( 6 ):

here X ″ is the sequence of output vectors generated from the input X ′. X ″ is given to the last encoder block, and the output from the last encoder is a sequence of encoder hidden states \(\overline{X}\) . The final output from the encoder is the contextualized encoded sequence \(\overline{X}_{1:k}\) , which is passed to the decoder.
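For illustration, a single bidirectional self-attention step in the standard scaled dot-product form, including the residual connection described above, can be sketched as follows; the sequence length and dimensions are arbitrary.

```python
import torch
import torch.nn.functional as F

def bidirectional_self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d). Every position attends to every other position (no mask)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v          # query, key and value vectors
    scores = q @ k.T / (k.size(-1) ** 0.5)       # similarity of each query to every key
    weights = F.softmax(scores, dim=-1)          # attention weights
    return x + weights @ v                       # weighted sum of values plus the input vector

seq_len, d = 6, 16
x = torch.randn(seq_len, d)                      # context-independent input vectors
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
x_refined = bidirectional_self_attention(x, W_q, W_k, W_v)   # contextualized representations
```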

4.2.2 Decoder

The decoder only models on the leftward context, so it does not learn bidirectional interactions. Generally, the news (either real or fake) is shown or read in the order of publication timestamps. So, news reading is a left-to-right (backward-to-forward) process. Naturally, the timestamps of users’ engagements also follow the order of the news. In our work, we model the left-to-right interdependencies in the sequences through the decoder part. The recurrent structure inside the decoder helps us use the predictions from a previous state to generate the next state. With autoregressive modelling, we can detect fake news in a timely manner, contributing to early detection.

The Transformer-based decoder is a stack of decoder blocks, as shown in orange in Fig.  3 , and the dense layer language modelling (LM) head is on the top. The LM head is a linear layer with weights tied to the input embeddings. Each decoder block has a unidirectional self-attention layer, followed by a cross-attention layer and two feed-forward layers. The details about the feed-forward layers can be found in the paper [ 18 ]. Here, we focus more on the details of attention layers.

The input to the decoder is the contextualized encoding sequence \(\overline{X}_{1:k}\) from the encoder part. The decoder models the conditional probability distribution of the target vector sequence \(Y_{1:l}\), given the input \(\overline{X}_{1:k}\), shown in Eq. ( 7 ):

\(p_{\theta_{\mathrm{dec}}}\!\left(Y_{1:l} \mid \overline{X}_{1:k}\right) \qquad (7)\)

here l is the number of the target vectors and depends on the input sequence k. By the chain rule of probability, this distribution can be factorized into conditional distributions over each target vector \(y_{i} \in Y_{1:l}\), as shown in Eq. ( 8 ):

\(p_{\theta_{\mathrm{dec}}}\!\left(Y_{1:l} \mid \overline{X}_{1:k}\right) = \prod_{i=1}^{l} p_{\theta_{\mathrm{dec}}}\!\left(y_{i} \mid Y_{0:i-1}, \overline{X}_{1:k}\right) \qquad (8)\)

where \(\forall i \in \left\{ {1, 2 , \ldots , l} \right\}\) . The LM head maps the encoded sequence of target vectors \(\overline{Y}_{0:i - 1}\) to a sequence of logit vectors \(\mathcal{L}_{1:k} = \ell_{1} , \ldots ,\ell_{k}\) , where the dimensionality of each logit vector \(\ell_{i}\) corresponds to the size of the input vocabulary \(1:k\) . A probability distribution over the whole vocabulary is obtained by applying a SoftMax operation on  \(\ell_{i}\) , as shown in Eq. ( 9 ):

here \(W_{{{\text{emb}}}}^{T}\) is transpose of the word embedding matrix. We autoregressively generate output from the input sequences through probability distribution in \(p_{{\theta_{{{\text{dec}}}} }} {(}y_{i} { |}\overline{X}_{1:k} { },{ }Y_{0:i - 1} )\) .

The unidirectional attention takes the input vector y ′ (representation of \(y\) ), and the output is the vector representation y ″. Each query vector in the unidirectional self-attention layer is compared only to its respective key vector and previous ones to yield the respective attention weights. The attention weights are then multiplied by their respective value vectors and summed together, as in Eq. ( 10 ):

The cross-attention layer takes as input two vector sequences: (1) outputs of the unidirectional self-attention layer, i.e. \(Y_{0:l - 1}^{\prime \prime }\) ; (2) contextualized encoding vectors \(\overline{X}_{1:k} \) from the encoder. The cross-attention layer puts each of its input vectors to condition the probability distribution of the next target vectors on the encoder's input. We summarize cross-attention in Eq. ( 11 ):

The index range of the key and value vectors is \(1:l\) , which corresponds to the number of contextualized encoding vectors. Y ‴ is given to the last decoder block and the output from the decoder is a sequence of hidden states \(\overline{Y}\) .
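The unidirectional self-attention in the decoder can be illustrated with a causal mask that prevents each position from attending to later positions; the simplified single-head sketch below uses arbitrary dimensions and omits the cross-attention and feed-forward layers.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(y, W_q, W_k, W_v):
    """y: (seq_len, d). Each position attends only to itself and earlier positions."""
    q, k, v = y @ W_q, y @ W_k, y @ W_v
    scores = q @ k.T / (k.size(-1) ** 0.5)
    future = torch.triu(torch.ones(y.size(0), y.size(0), dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))   # block attention to future positions
    weights = F.softmax(scores, dim=-1)
    return weights @ v

y = torch.randn(5, 16)                                    # target-side input vectors
W_q, W_k, W_v = (torch.randn(16, 16) for _ in range(3))
y_refined = causal_self_attention(y, W_q, W_k, W_v)
```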

4.2.3 Model training

In this work, we implement the transfer learning solution [ 19 ] for fake news detection. We leverage the previous learnings from a BART pre-trained checkpoint Footnote 9 and fine-tune the model on the downstream task of fake news detection. We perform the classification task for fake news detection. For the classification task, we input the same sequences into the encoder and decoder. The final hidden state of the final decoder token is fed into an output layer for classification. This approach is like the [CLS] representation (CLS for classification) in BERT that serves as the token for the output classification layer. The BERT has the CLS token returned by the encoder, but in BART, we need to add this additional token in the final decoder part. Therefore, we add the token < S > in the decoder to attend to other decoder states from the complete input. We show the classification process in Fig.  5 [ 16 ].

Fig. 5 The classification process: the same input is fed into the encoder and the decoder, and the output is the predicted label

We represent the last hidden state [ \(\mathcal{S}\) ] of the decoder as \(h_{[\mathcal{S}]}\). The number of classes is two (fake is 1, real is 0). A probability distribution \(p \in [0,1]^{2}\) is computed over the two classes using a fully connected layer with two output neurons on top of \(h_{[\mathcal{S}]}\), which is followed by the SoftMax activation function, as shown in Eq. ( 12 ):

\(p = \mathrm{softmax}\!\left(\mathcal{W}\, h_{[\mathcal{S}]} + b\right) \qquad (12)\)

where \(\mathcal{W}\) is the learnable projection matrix and b is the bias. We train our model for the sequence-pair classification task [ 17 ] to classify fake news. Unlike the typical sequence-pair classification task, we use the binary cross-entropy with logits loss function instead of the vanilla cross-entropy loss used for multi-class classification. However, the same model can be adapted for multi-class classification if there is a need. Through the binary cross-entropy loss, our model can assign independent probabilities to the labels. The cross-entropy function H determines the distance between the true probability distribution and the predicted probability distribution, as shown in Eq. ( 13 ):

\(H(y, \hat{y}) = -\sum_{j} \big[\, y_{j} \log \hat{y}_{j} + (1 - y_{j}) \log (1 - \hat{y}_{j}) \,\big] \qquad (13)\)

where \(y_{j}\) is the ground truth for observation j and \(\hat{y}_{j}\) is the model prediction.
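A toy illustration of the binary cross-entropy-with-logits objective in Eq. ( 13 ), using PyTorch's built-in loss, is shown below; the scores and labels are placeholders.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()           # combines a sigmoid with binary cross-entropy

logits = torch.tensor([1.3, -0.7, 2.1])      # raw (pre-sigmoid) model scores for three news items
labels = torch.tensor([1.0, 0.0, 1.0])       # ground truth: 1 = fake, 0 = real

loss = criterion(logits, labels)
print(float(loss))                           # distance between predicted and true distributions
```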

Based on the Transformer architecture, our model naturally takes the sequences of words as the input, which keeps flowing up the stacks from encoder to decoder, while the new sequences are coming in. We organize the news data according to the timestamps of users’ engagements so that the temporal order is retained during the creation of the sequences. We use paddings to fill up the shorter readers’ sequences, while the longer sequences are truncated.
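A minimal sketch of this temporal ordering, truncation, and padding of reader sequences might look as follows; the maximum length, padding id, and helper function are illustrative assumptions.

```python
from typing import List, Tuple

MAX_LEN = 8      # illustrative maximum sequence length
PAD_ID = 0       # illustrative padding token id

def build_sequence(engagements: List[Tuple[int, int]], max_len: int = MAX_LEN) -> List[int]:
    """engagements: (timestamp, token_id) pairs for one reader; returns a fixed-length sequence."""
    ordered = [tok for _, tok in sorted(engagements, key=lambda e: e[0])]  # keep temporal order
    ordered = ordered[:max_len]                                            # truncate longer sequences
    return ordered + [PAD_ID] * (max_len - len(ordered))                   # pad shorter sequences

print(build_sequence([(3, 17), (1, 42), (2, 9)]))   # [42, 9, 17, 0, 0, 0, 0, 0]
```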

5 Experimental set-up

5.1 Datasets

It was not a trivial task to find a suitable dataset to evaluate our proposed model because most of the standard datasets available for fake news detection are either too small, sparse, or void of temporal information.

A few state-of-the-art datasets, such as FakeNewsNet [ 23 ], are not available in full and can only be found as sample data. This is mainly because most of these datasets use Twitter data for social contexts and thus cannot be made publicly accessible due to license policies. Other available datasets that consider fake news content are outdated. Since fake news producers typically change their strategies over time, such datasets are not suitable for detecting fake news in recent data. After extensive research and careful consideration, we found that NELA-GT-19 and Fakeddit are the most suitable for our proposed problem regarding the number of news articles, temporal information, social contexts, and associated side information.

To evaluate the effectiveness of our proposed FND-NS model, we conducted comprehensive experiments on two real-world datasets: NELA-GT-2019 [ 21 ] and Fakeddit [ 22 ]. Both datasets are in English, and we use the same timeline for both.

5.1.1 NELA-GT-2019

For our news component, we use the NELA-GT-2019 dataset [ 21 ], a large-scale, multi-source, multi-labelled benchmark dataset for news veracity research. This dataset can be accessed from here. Footnote 10 The dataset consists of 260 news sources with 1.12 million news articles published between January 1, 2019, and December 31, 2019. The individual news articles are not labelled; we get the ground truth labels (0 = reliable, 1 = mixed, 2 = unreliable) at the source level and use weak supervision (discussed in Sect. 5.2) to assign a label to each news article. As news features, we use the article ID, publication timestamp, news source, title, content (body), and the article's author. We only use the ‘reliable’ and ‘unreliable’ source-level labels; ‘mixed’ labels are changed to ‘unreliable’ if the majority of the assessment sites report the source as ‘mixed’, and the left-over ‘mixed’ sources are omitted. The statistics of the actual data can be found in the original paper [ 21 ].

5.1.2 Fakeddit

For the social contexts, we use the Fakeddit dataset [ 22 ], a large-scale, multi-modal (text, image), multi-labelled dataset sourced from Reddit (a social news and discussion website). This dataset can be accessed from here. Footnote 11 Fakeddit consists of over 1 million submissions from 22 different subreddits (users’ community boards) and over 300,000 unique individual users. The data were collected from March 19, 2008, until October 24, 2019; we consider the data from January 01, 2019, until October 24, 2019, to match the timeline of the NELA-GT-19. According to previous work [ 3 ], this amount of data is sufficient for testing concept drift. From this dataset we use the following social context features: submission (the post on a news article), submission title (the title of the post, matching the headline of the news story), users’ comments on the submission, user IDs, subreddit (a forum dedicated to a specific topic on Reddit) source, news source, number of comments, up-vote to down-vote ratio, and timestamp of interaction. The statistics of the actual data can be found in the original paper [ 22 ].

5.2 Weak supervision

The weak supervision module is a part of our proposed framework as shown in Fig.  6 . We utilize weak (distant) labelling to label news articles. Weak supervision (distant supervision) is an alternative approach to label creation, in which labels are created at the source level and can be used as proxies for the articles. One advantage of this method is that it reduces the labelling workload. Furthermore, the labels for articles from known sources are known instantly, allowing for real-time labelling, as well as parameter updates and news analysis. This method is also effective in the detection of misinformation [ 9 , 53 , 54 , 55 ].

Figure 6: Weak supervision module.

The intuition behind weak supervision is that the weak labels on the training data may be imprecise but can still be used to make predictions with a strong model [ 54 ]. We overcome the scarcity of hand-labelled data by compiling such a dataset, which can be done almost automatically and yields good results, as shown in Sect. 6.3.

In our work, we use weak supervision to assign article-level labels for the NELA-GT-2019 dataset, whose source-level labels are provided with the dataset; this method is also suggested by the providers of the NELA-GT-2019 dataset [ 21 ]. For the Fakeddit dataset, the ground truth labels are provided by the dataset itself, and we only create two new labels for it: ‘crowd response’ and ‘user credibility’. We combine the labels provided by the datasets into a new weighted aggregate label assigned to each news article.

From the NELA-GT-19, we get the ground truth labels associated with each source (e.g. NYTimes, CNN, BBC, theonion and many more). These labels are assigned to each news source by seven different assessment sites: (1) Media Bias/Fact Check, (2) Pew Research Center, (3) Wikipedia, (4) OpenSources, (5) AllSides, (6) BuzzFeed News, and (7) Politifact. Based on these seven assessments, Gruppi et al. [ 21 ] created an aggregated 3-class label (unreliable, mixed, reliable) for each source. We use the source-level labels as proxies for the article-level labels, on the assumption that each news story belongs to a news source and that the reliability of the source affects the story. This approach is also suggested in the NELA-GT-18 [ 56 ] and NELA-GT-20 [ 57 ] papers and has shown promising results in recent fake news detection work [ 3 ].

Once we get the label for each news article, we perform another processing step over the article-level labels. As mentioned earlier, the NELA-GT-19 provides the 3-class source-level labels {‘Unreliable’, ‘Mixed’, ‘Reliable’}. According to Gruppi et al. [ 21 ], the ‘mixed’ label means mixed factual reporting. We have not used the ‘mixed’ label in our work: we change it to ‘unreliable’ if the source is reported as ‘mixed’ by the majority of the assessment sites, and we remove the remaining ‘mixed’ sources to avoid ambiguity. This gives a final news dataset with 2-class labels: {‘Reliable’, ‘Unreliable’}.
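To make this relabelling step concrete, the sketch below remaps source-level labels with pandas. The column names (source, label, n_mixed_votes, n_sites) are hypothetical placeholders rather than the actual NELA-GT-2019 field names.

```python
import pandas as pd

# Hypothetical source-level table: aggregated label plus per-site 'mixed' votes
sources = pd.DataFrame({
    "source": ["outletA", "outletB", "outletC"],
    "label": ["reliable", "mixed", "mixed"],
    "n_mixed_votes": [0, 5, 2],   # how many of the assessment sites reported 'mixed'
    "n_sites": [7, 7, 7],
})

# 'mixed' becomes 'unreliable' when a majority of assessment sites report it as mixed
majority_mixed = (sources["label"] == "mixed") & (sources["n_mixed_votes"] > sources["n_sites"] / 2)
sources.loc[majority_mixed, "label"] = "unreliable"

# Remaining 'mixed' sources are dropped to avoid ambiguity
sources = sources[sources["label"] != "mixed"].reset_index(drop=True)
print(sources)   # outletA -> reliable, outletB -> unreliable, outletC dropped
```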

The other dataset used in this work is Fakeddit. Nakamura et al. [ 22 ] also use the weak supervision to create labels in the Fakeddit dataset. They use the Reddit themes to assign a label to each news story. More details about Reddit themes and the distant labelling process are available in their paper [ 22 ].

The dataset itself provides 2-way, 3-way and 6-way labels. We use the 6-way label scheme, in which each sample is assigned one of: ‘True’, ‘Satire’, ‘Misleading Content’, ‘Imposter Content’, ‘False Connection’, and ‘Manipulated Content’. In addition to the 6-way labels, we assign two more weak labels, user credibility and crowd response, which we compute from the social contexts. The user credibility level has five classes: ‘New user’, ‘Very uncredible’, ‘Uncredible’, ‘Credible’, ‘Very credible’. The crowd response has two classes: ‘Fake’ and ‘Real’.

We obtain the user credibility levels through our ZSL classifier (Fig. 2). For the crowd response, we simply aggregate the scores of all users’ comments (posts) on a news story to determine the overall view of users on that story. The goal is to make the label learning more accurate by adding more weak labels to those already available. In a preliminary test, we find that weak supervision with multiple weak labels (our setting) achieves better results than using the Fakeddit theme-based weak labels alone [ 22 ], which were learned with their own weak supervision model.

Based on this, we design a formula to assign the final label (‘Real’, ‘Fake’) to each sample in the aggregate functionality part. We assign the final label ‘ Fake ’ to a news article if one of the following conditions is satisfied: (1) its 6-way label specified in Fakeddit is ‘Satire’, ‘Misleading content’, ‘Imposter’, ‘False connection’, or ‘Manipulated content’; (2) its label specified in NELA-GT-19 is ‘Unreliable’; (3) its label according to user credibility is ‘Very uncredible’ or ‘Uncredible’; (4) its label according to crowd response is ‘Fake’. We assign the label ‘ Real ’ to the news if all of the following conditions are satisfied: (1) its label in Fakeddit is ‘True’; (2) its label in NELA-GT-19 is ‘Reliable’; (3) its label according to user credibility is ‘New user’, ‘Credible’, or ‘Very credible’; (4) its label according to crowd response is ‘Real’. We do not penalize a new user because we do not have sufficient information for a new user.
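This aggregation rule can be written as a small function over the four weak labels; the function name and signature below are ours, while the label strings follow the description above.

```python
FAKE_6WAY = {"Satire", "Misleading Content", "Imposter Content",
             "False Connection", "Manipulated Content"}

def aggregate_label(six_way: str, nela_label: str,
                    user_credibility: str, crowd_response: str) -> str:
    """Return the final weak label ('Fake' or 'Real') for one news article."""
    if (six_way in FAKE_6WAY
            or nela_label == "Unreliable"
            or user_credibility in {"Very uncredible", "Uncredible"}
            or crowd_response == "Fake"):
        return "Fake"
    if (six_way == "True"
            and nela_label == "Reliable"
            and user_credibility in {"New user", "Credible", "Very credible"}
            and crowd_response == "Real"):
        return "Real"
    return "Fake"  # unreachable for the label sets above; kept as a conservative default

print(aggregate_label("True", "Reliable", "New user", "Real"))    # Real
print(aggregate_label("Satire", "Reliable", "Credible", "Real"))  # Fake
```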

Our FND-NS model implicitly assumes that these weak labels are precise and heuristically matches the labels against the corpus to generate the training data. The model predicts the final label: ‘Real’ or ‘Fake’ for the news.

To handle the class imbalance in both datasets, we use the under-sampling technique [ 58 ], in which the majority class is brought closer to the minority class by removing records from the majority class. The numbers of fake and real news items from the datasets used in this research are given in Table 1.
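For completeness, random under-sampling of the majority class can be done in a few lines of pandas; the dataframe layout and column name below are hypothetical.

```python
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str = "label", seed: int = 42) -> pd.DataFrame:
    """Randomly drop majority-class rows until both classes have the same size."""
    minority_size = df[label_col].value_counts().min()
    balanced = pd.concat([
        group.sample(n=minority_size, random_state=seed)
        for _, group in df.groupby(label_col)
    ])
    return balanced.reset_index(drop=True)
```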

We split the data temporally for model training: the last 15% of the chronologically sorted data forms the test set, the second-to-last 10% forms the validation set, and the initial 75% forms the training set. We also split the history of each user based on the interaction timestamps, considering the last 15% of each user's interactions as the test set.
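The chronological split can be expressed as a simple index computation; this sketch assumes a pandas dataframe with a timestamp column and is illustrative only.

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, time_col: str = "timestamp"):
    """Chronological 75/10/15 train/validation/test split (no shuffling)."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    train_end = int(n * 0.75)
    val_end = int(n * 0.85)
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]
```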

5.3 Evaluation metrics

In this paper, the fake news detection task is a binary decision problem in which the detection result is either fake or real news. To assess the performance of our proposed model, we use accuracy ACC, precision Prec, recall Rec, F1-score F1, area under the curve AUC and average precision AP as the evaluation metrics. The confusion matrix summarizes the actual and predicted classifications, as shown in Table 2.

The variables TP, FP, TN and FN in the confusion matrix refer to the following:

True positive (TP): the number of fake news items correctly identified as fake.

False positive (FP): the number of real news items incorrectly identified as fake.

True negative (TN): the number of real news items correctly identified as real.

False negative (FN): the number of fake news items incorrectly identified as real.

For Prec, Rec, F1 and ACC, the specific calculations are:

$$\mathrm{Prec} = \frac{TP}{TP + FP},\quad \mathrm{Rec} = \frac{TP}{TP + FN},\quad F1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}},\quad \mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}$$

To calculate the AUC, we compute the true positive rate (TPR) and the false positive rate (FPR). TPR is a synonym for the recall, whereas FPR is calculated as:

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

The receiver operating characteristic (ROC) curve plots the trade-off between the TPR and FPR at different thresholds of a binary classifier, and the AUC is an aggregate measure of the model's performance across all those possible thresholds. Compared to the accuracy ACC, the AUC is better at evaluating the ranking of predictions. For example, if there are more fake news samples in the classification, the accuracy measure may favour the majority class, whereas the AUC reflects how well the model orders (ranks) its scores in addition to how accurate they are. We also include the average precision AP, which averages precision over all such thresholds and is similar to the area under the precision–recall curve.
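All of these metrics can be computed directly from the predicted scores, for example with scikit-learn; the arrays below are placeholder values for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                     # 1 = fake, 0 = real
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])    # predicted probability of "fake"
y_pred = (y_score >= 0.5).astype(int)                           # hard predictions at a 0.5 threshold

print("ACC :", accuracy_score(y_true, y_pred))
print("Prec:", precision_score(y_true, y_pred))
print("Rec :", recall_score(y_true, y_pred))
print("F1  :", f1_score(y_true, y_pred))
print("AUC :", roc_auc_score(y_true, y_score))                  # threshold-free, ranking-based
print("AP  :", average_precision_score(y_true, y_score))
```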

5.4 Hyperparameters

We implement our model with PyTorch on the GPUs provided by Google Colab Pro. Footnote 12 We use the pre-trained checkpoint of bart-large-mnli. Footnote 13 MNLI is a crowd-sourced dataset that can be used for tasks such as sentiment analysis, hate speech detection, sarcasm detection, and textual entailment (determining whether one text fragment can be inferred from another). The model is pre-trained with 12 encoder and 12 decoder layers, 24 layers in total, with a hidden dimensionality of 1024. The model has 16 attention heads and around 400 million parameters. We add a 2-layer classification head fine-tuned on the MNLI. The model hyperparameters are shown in Table 3.

Our model is trained with the Adam optimizer [ 59 ]. In our experiments, larger batch sizes caused memory problems, so we decreased the batch size from the commonly used 32 down to 8, at which point the memory issues were resolved. We keep the same batch size of 8 during training and validation. The number of training epochs is 10. The default sequence length supported by BART is 1024. An initial analysis of our datasets shows that the mean length of a news story is around 550 words, whereas a Reddit post is on average 50 words. The maximum sequence length of BERT and GPT-2 is 512, which is shorter than the mean length of a news story. We therefore set the sequence length to 700 to accommodate the average news length plus the side information from the news and social contexts. The sequences are created based on the timestamps of the users' engagements; longer sequences are truncated, while shorter ones are padded to the maximum length.
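A minimal sketch of this set-up with the Hugging Face transformers library is shown below. It loads the facebook/bart-large-mnli checkpoint for sequence classification (re-initializing the 3-way MNLI head for our two classes), truncates or pads inputs to 700 tokens, and optimizes with Adam. It uses the library's default cross-entropy classification loss rather than the BCE-with-logits variant described above, the learning rate shown is illustrative, and the weak-supervision and side-information components of the full framework are omitted.

```python
import torch
from transformers import BartTokenizer, BartForSequenceClassification

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-mnli")
model = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large-mnli", num_labels=2, ignore_mismatched_sizes=True)

# Toy, temporally ordered input sequences (news text followed by social-context text)
texts = ["<headline> <news body> <reddit post> ...",
         "<headline> <news body> <reddit post> ..."]
labels = torch.tensor([1, 0])                          # 1 = fake, 0 = real

enc = tokenizer(texts, max_length=700, truncation=True,
                padding="max_length", return_tensors="pt")

optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # illustrative learning rate

model.train()
for epoch in range(10):                                # the paper uses 10 epochs, batch size 8
    out = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"],
                labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(epoch, out.loss.item())
```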

5.5 Baseline approaches

We compare our model with state-of-the-art fake news detection methods, including deep neural and traditional machine learning methods. We also consider other baselines, including a few recent Transformer models, a neural method (TextCNN) and a traditional baseline (XGBoost).

A few state-of-the-art methods, such as a recent one by Liu and Wu [ 5 ], are not publicly accessible, so we have not included them in this experiment. Some of these baselines by default use content features only (e.g. exBAKE, Grover, the Transformer-based baselines, TextCNN), and some use social contexts only (e.g. 2-Stage Tranf., SVM-L, SVM-G and the LG group). A few baselines use the social contexts together with content-based features (e.g. TriFN, FANG, Declare). Our model uses both the news content and the social contexts with the side information. For a fair comparison, we test the baselines using their default settings, and we additionally test them with both news content and social contexts as input. In this case, we create variants of the baselines (the default setting and a setting with both news and social contexts).

To determine the optimal hyperparameter settings for the baselines, we primarily consult the original papers. However, these papers provide little information on how the baselines were tuned, so we optimize the hyperparameters of each baseline on our dataset and train all the models from scratch. We optimize the standard hyperparameters (epochs, batch size, dimensions, etc.) for each baseline. Some hyperparameters specific to individual models are reported below (along with the description of each method).

FANG [ 15 ]: it is a deep neural model to detect fake news using graph learning. We optimize the following losses simultaneously during the model training: 1) unsupervised proximity loss, 2) self-supervised stance loss, and 3) supervised fake news detection loss (details can be found in the paper [ 15 ]), whereas the implementation details are available here. Footnote 14 We feed both the news-related information and social contexts into the model.

2-Stage Tranf. [ 1 ]: it is a deep neural fake news detection model that focuses on fake news with short statements. The original model is based on BERT-base [ 17 ], whose checkpoint the authors recommend, so we build this model in the same way. We feed the news-related information and social contexts into the model. We also evaluate another variant in which we remove the news body and news source, keeping only the social contexts (as in the default model), and denote it 2-Stage Tranf. ( nc -).

exBAKE [ 7 ]: it is another fake news detection method based on deep neural networks. This model is also based on BERT and is designed for news content. Besides reporting the original model's results, we also incorporate the social contexts by introducing another variant of this model. The variants are exBAKE (with both news content and social contexts) and exBAKE ( sc- ) (the default model, without social contexts).

Declare [ 8 ]: it is a deep neural network model that assesses the credibility of news claims. This model uses both the news content and social contexts by default, so we feed this information to the model. An implementation of the model can be found here. Footnote 15

TriFN [ 4 ]: it is a matrix factorization based model that uses both news content and social contexts to detect fake news. We give both the news and social contexts to the model. The model implementation can be accessed here. Footnote 16

Grover [ 12 ]: it is a deep neural network-based fake news detection model based on GPT-2 [ 18 ] architecture. The model takes news related information and can incorporate additional social contexts too. We give both the news content and social contexts to Grover. In addition, we remove the social contexts and keep the content information only (as in default Grover model), which we represent as Grover ( sc -). We use the Grover-base implementation of the model and initialize the model using the GPT-2 checkpoint. Footnote 17 The model implementation is available here. Footnote 18

SVM-L; SVM-G; LG [ 14 ]: these are machine learning models based on the similarity among friends' networks, used to discover fake accounts in social networks and detect fake news. We use all the proposed variants: linear support vector machine (SVM), medium Gaussian SVM and logistic regression, and tune them to their optimal settings.

BERT [ 17 ]: BERT (bidirectional encoder representations from Transformers) is a Google-developed Transformer-based model. We use both the cased (BERT-c) and uncased (BERT-u) version of the BERT, with 24-layer, 1024-hidden, 16-heads, and 336 M parameters. The model implementation can be found here. Footnote 19

VGCN-BERT [ 60 ]: it is a deep neural network-based model that combines the capability of BERT with a vocabulary graph convolutional network (VGCN). The model implementation is available here. Footnote 20

XLNET [ 61 ]: it is an extension of the Transformer-XL model, which was pre-trained with an autoregressive method to learn bidirectional contexts. We use the hyperparameters: 24-layer, 1024-hidden, 16-heads, 340 M parameters. The model implementation is available here. Footnote 21

GPT-2 [ 18 ]: it is a causal (unidirectional) Transformer pre-trained using language modelling. We use the hyperparameters: 24-layer, 1024-hidden, 16-heads, 345 M parameters. The model implementation is available here. Footnote 22

DistilBERT [ 62 ]: it is a BERT-based small, fast, cheap, and light Transformer model, which uses 40% fewer parameters than BERT-base, runs 60% faster, and keeps over 95% of BERT’s results, as measured in the paper [ 62 ]. We only use the cased version for this model (based on the better performance of the BERT cased version, also shown in the later experiments). We use the hyperparameters: 6-layer, 768-hidden, 12-heads, 65 M parameters, and the model implementation is available here. Footnote 23

Longformer [ 63 ]: it is a Transformer-based model that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. We use the hyperparameters with 24-layer, 1024-hidden, 16-heads, ~ 435 M parameters and the model is initiated from the RoBERTa-large Footnote 24 checkpoint, trained on documents of max length 4,096. The model implementation is available here. Footnote 25

We use both news content and social contexts to train the Transformer-based models (BERT, VGCN-BERT, XLNET, GPT-2, DistilBERT), which are built to take textual input. These models can also handle the social contexts, as evidenced by preliminary tests in which we first fed only the news content, then the news content together with the social contexts, and found only a marginal difference in performance.

Text CNN [ 64 ]: it is a convolution neural network (CNN) with one layer of convolution on top of word vectors, where the vectors are pre-trained on a large number (~ 100 billion) of words from Google News. Footnote 26 The model implementation is available here. Footnote 27

XGBoost [ 65 ]: it is an optimized distributed machine learning algorithm under the gradient boosting framework, with implementation from here. Footnote 28

We report the results for each baseline based on the best-performing hyperparameters for each evaluation metric.

6 Results and analyses

In this section, we present the results and analyse them.

6.1 Model performance

We show the learning curve for training loss and validation loss during model training in Fig.  7 .

Figure 7: Training versus validation loss.

In our model, the validation loss is quite close to the training loss. The validation loss is slightly higher than the training loss, but both values converge when plotted over time, which indicates a good fit for model learning.

We also test the model's performance on the test data to show the confusion matrix in Table 4 .

Based on the confusion matrix in Table 4, the model accuracy is 74.89%, meaning more than 74% of the predictions are correct. We obtain a precision of 72.40%, which means we have few false positives (real news predicted as fake) and correctly predict a large portion of the true positives (fake news predicted as fake). We obtain a recall of 77.68%, which shows that we have many more true positives than false negatives. Generally, a false negative (fake news predicted as real) is worse than a false positive in fake news detection, and in our experiment we get fewer false negatives than false positives. Our F1-score is 74.95%, which is also quite high.

6.2 Overall performance comparison

We show the best results of all baselines and our FND-NS model using all the evaluation metrics in Table 5 . The results are based on data from both datasets, i.e. social contexts from Fakeddit on the NELA-GT-19 news. The input and hyperparameter optimization settings for each baseline model are given above (Sect.  5.5 ). The best scores are shown in bold.

Overall, we see that our proposed FND-NS model has the highest accuracy (74.8%), precision (72.4%), recall (77.6%), AUC (70.4%) and average precision (71%) among all the models. The superiority of our model is attributed to its advantages:

Our model utilizes rich features from news content and social contexts to extract the specifics of fake news.

We exploit the knowledge transition from large-scale pre-trained models (e.g. MNLI) to the fake news detection problem with transfer learning. The right choice of the pre-trained checkpoints on the specific corpus helps us make better predictions. During empirical testing, we check the performance of our model with and without including the pre-trained checkpoints. We find better results with the inclusion of the MNLI checkpoint.

We model the timeliness in our model through an autoregressive model, which helps us detect fake news in a timely and early manner.

We address the label shortage problem through the proposed weak supervision module, which helps us make better predictions on unforeseen news.

Further findings from the results are as follows:

Among the fake news detection baselines, FANG performs the best overall and is second only to our FND-NS model. FANG uses graph learning to detect fake news and focuses on learning context representations from the data. The overall performance of exBAKE and 2-Stage Tranf., as indicated by most metrics, is the next best. Both models are based on BERT and are well suited to representation learning. Our model outperforms them, most likely because we focus on both autoregression and representation learning.

The 2-Stage Tranf. uses claim data from social media. We also test this model in its default input setting, 2-Stage Tranf. ( nc- ), omitting the news content (news body, headline, source) and allowing only social context features (such as post, title, and score). With this change, we do not find much difference in performance. We find marginally better performance of 2-Stage Tranf. when we keep only the news-related features (without social contexts), most likely because of this model's support for auxiliary information. Our model performs better than 2-Stage Tranf. despite the latter's support for side information, likely because our model can handle longer sequences than the baselines, which lose some information, and thus accuracy, through truncation.

Then comes the performance of Declare, TriFN and Grover models, all of which are considered the benchmark models in fake news research. Grover is a content-based neural fake news detection model. Declare is a neural network framework that detects fake news based on the claim data. TriFN is a non-negative matrix factorization algorithm that includes news content and social contexts to detect fake news.

We also test Grover (content-based model) without social contexts in Grover ( sc -). We find some better performance of Grover ( sc -) than Grover's (with both inputs). This result shows that a model built on rich content features (news body, headline, publication date) with autoregressive properties (GPT-2 like architecture) can perform better even without social contexts.

The SVM and LG are also used for fake news detection. Due to their limited capabilities and reliance on hand-crafted dataset features, the accuracies of SVM and LG are lower in these experiments. These results should not be taken as generalizing the performance of these models to all situations in this field.

In general, the performance of the Transformer-based methods is better than the traditional neural-based methods (Text CNN) and the linear models (SVM, LG, XGBoost). This is probably because the Transformer-based methods use the multi-head attention and positional embeddings, which are not by-default integrated with the CNNs (of text CNN) and the linear methods. With the default attention mechanisms and more encoding schemes (e.g. token, segment and position), the Transformers compute input and output representations better than the traditional neural methods. Our FND-NS model, however, performs better than these Transformer models. This is because our framework includes many add-ons, such as weak supervision, representation learning, autoregression, which (all of them together) are not present in the typical Transformer models.

The general performance of simple neural methods (e.g. Text CNN, Declare) that are not Transformer-based is better than the linear methods (SVM, LG, XGBoost). This is probably because the linear methods use manual feature engineering, which is not optimal. On the other hand, the neural-based methods can capture both the global and the local contexts in the news content and social contexts to detect the patterns of fake news.

Among the Transformers, the cased model (e.g. BERT-c), in general, performs better than its respective uncased version (e.g. BERT-u). Generally, fake or false news uses capital letters and emotion-bearing words to present something provoking. Horne and Adalı [ 29 ] also present several examples where fake titles use capitalized words excessively. This shows why the cased models can detect fake news better compared to the uncased versions.

The overall performance of the distilled (condensed) version (DistilBERT) is slightly lower than that of its respective full model (BERT). Based on the better performance of the cased BERT over its uncased version, we use the cased version of DistilBERT. DistilBERT does not use token-type embeddings and retains only half of the layers and parameters of the full BERT, which probably results in the overall lower prediction accuracy. The distilled versions balance computational complexity and accuracy; this result suggests that the distilled version can achieve results comparable to the original model with better speed.

We also see that the general performance of the autoregressive models (XLNet and GPT-2) is better than that of most autoencoding models (DistilBERT, Longformer, BERT-u), with BERT-c being the exception for some scores. Autoregressive Transformers usually model the data from left to right and are suitable for time-series modelling; they predict the next token after reading all the previous ones. Autoencoding models, on the other hand, build a bidirectional representation of whole sentences and are suited to natural language understanding tasks such as GLUE (general language understanding evaluation), classification, and text categorization [ 17 ]. Our fake news detection problem implicitly involves data that vary over time, so the autoregressive models show relatively better results. Our FND-NS model performs the best because it combines an autoencoding component with an autoregressive one.

We find VGCN-BERT as a competitive model. The VGCN is an extension of the CNN model combined with the graph-based method and the BERT model. The results in Table 5 show the good performance of CNN in the TextCNN method and that of the BERT model. The neural graph networks have recently demonstrated noticeable performance in the representative learning tasks by modelling the dependencies among the graph states [ 66 ]. That is why the performance of VGCN-BERT (using BERT-u) is better than TextCNN and BERT-u alone. This result also indicates that hybrid models are better than standalone models. FANG also uses a graph neural network with the supervised learning loss function and has shown promising results.

6.3 Effectiveness of weak supervision

In this experiment, we test the effectiveness of the weak supervision module on the validation data for the accuracy measure.

We show different settings for weak supervision. These settings are:

M1: Weak supervision on both datasets, NELA-GT-19 and Fakeddit with original labels + user credibility label + crowd response label;

M2: Weak supervision on both datasets, NELA-GT-19 and Fakeddit with original labels + user credibility label;

M3: Weak supervision on both datasets, NELA-GT-19 and Fakeddit with original labels + crowd response label;

M4: Weak supervision on both datasets, NELA-GT-19 and Fakeddit with original labels;

M5: Weak supervision on NELA-GT-19 only;

M6: Weak supervision on Fakeddit only with original labels;

M7: Weak supervision on Fakeddit with original + user credibility labels;

M8: Weak supervision on Fakeddit with original + crowd response labels;

M9: Weak supervision on Fakeddit with original + user credibility labels + crowd response labels.

The results of FND-NS on these settings are shown in Fig.  8 . The results show that our model performs better when we include newly learned weak labels and worse when we omit any one of the weak labels. This is seen with the best performance of FND-NS in the M1 setting. The crowd response label proves to be more productive than the user credibility label. This is seen with > 2% loss in accuracy in M2 and M7 (without crowd response) compared to the M3 and M8 (with crowd response). This conclusion is also validated by another experiment in the ablation study (Sect.  6.4 ) discussed later.

Figure 8: Accuracy percentage for different weak supervision settings of the FND-NS model.

We also see that the model performance improves when we include both datasets rather than either one alone. In general, all these results (Fig. 8) indicate that the weak labels may be imprecise but can still be used to provide accurate predictions. Some of these FND-NS results also appear in the ablation study (Sect. 6.4) and are explored in more detail there.

In the Fakeddit paper [ 22 ], we see the performance of the original BERT model to be around 86%, which is understandable because the whole Fakeddit dataset (ranging from the year 2008 till 2019) is used in that work. In our paper, we use the Fakeddit data only for the year 2019. Usually, the models perform better with more data. In particular, deep neural networks (e.g. BERT) perform better with more training examples. Omitting many training examples could affect the performance of the model. This is the possible reason we see lower accuracy of our model in this experiment using the Fakeddit data. For the same reason, we see the performance of the original BERT a bit lower with the Fakeddit data in Table 5 .

The results on the original NELA-GT-19 may also be different in our work. This is because we do not consider much of the mixed labels from the original dataset. Also, since we use under-sampling for data balancing for both of our datasets, the results may vary for the experiments in this paper versus the other papers using these datasets.

6.4 Ablation study

In the ablation study, we remove one key component from our model at a time and investigate its impact on performance. The reduced variants of our model are listed below:

FND-NS: The original model with news and social contexts component;

FND-N: FND-NS with news component—removing social contexts component;

FND-N(h-): FND-N with headlines removed from the news component;

FND-N(b-): FND-N with news body removed from the news component;

FND-N(so-): FND-N with news source removed from the news component;

FND-N(h-)S: FND-NS with headlines removed from the news component;

FND-N(b-)S: FND-NS with news body removed from the news component;

FND-N(so-)S: FND-NS with news source removed from the news component;

FND-S: FND-NS with social context component—removing news component;

FND-S (uc-): FND-S with user credibility removed from the social contexts;

FND-S (cr-): FND-S with crowd responses removed from the social contexts;

FND-NS (uc-): FND-NS with user credibility removed from the social contexts;

FND-NS (cr-): FND-NS with crowd responses removed from the social contexts;

FND (en-)-NS: FND-NS with the encoder block removed—sequences from both the news and social contexts components are fed directly into the decoder;

FND (de-)-NS: FND-NS with the decoder block removed;

FND (12ly-)-NS: FND-NS with 12 layers removed (6 from encoder and 6 from decoder).

The results of the ablation study are shown in Table 6 .

The findings from the results are summarized below:

When we remove the news component, the model accuracy drops. This is demonstrated by the lower scores of FND-S, compared to the original model FND-NS in Table 6 . However, when we remove the social context component, the model accuracy drops more. This is seen with the lower accuracy of FND-N (without social contexts) compared to the FND-S. This result indicates that both the news content and social contexts play an essential role in fake news detection, as indicated in the best performance of the FND-NS model.

The results also show that the performance of the FND-NS model is impacted more when we remove the news body than removing the headline or the source of the news. This is seen with relatively lower accuracy of FND-N(b-) compared to both the FND-N(h-) and FND-N(so-). The same results are seen in the lower accuracy of FND-N(b-)S compared to both the FND-N(h-)S and FND-N(so-)S. The result shows that the headline and source are important, but the news body alone carries more information about fake news. The source seems to carry more information than the headline; this is perhaps related to the partisan information.

From the social contexts, we find that when we remove the user credibility or the crowd responses, the model performance in terms of accuracy is decreased. Between the user credibility and crowd responses, the model performance is impacted more when we remove crowd responses. This is seen with the lower performance of FND-S(cr-) and FND-NS(cr-) compared to FND-S(uc-) and FND-NS(uc-). The same finding is also observed in Fig.  8 for the results of weak supervision. The probable reason for the crowd responses being more helpful for fake news detection could be that they provide users’ overall score on a news article directly, whereas the user credibility only plays an indirect role in the prediction process. According to the concept drift theory, the credibility levels of the users may change over time. Some users leave the system permanently, some change their viewpoints, and new users keep coming into the system. Therefore, the user credibility may not be as informative as crowd responses and thus has less effect on the overall detection result.

The model performance is impacted when we remove the encoder from the FND-NS. The model performance is affected even more when we remove the decoder. This is seen with the lower scores of FND(de-)-NS, which is lower than FND(en-)-NS. In our work, the decoder is the autoregressive model, and the encoder is the autoencoding model. This result also validates our previous finding from the baselines (Table 5 ), where we find the better performance of the autoregressive model (e.g. GPT-2) compared to most autoencoding models (Longformer, DistillBERT, BERT-c).

Lastly, we find that removing layers from the model lowers the accuracy of the FND-NS model. We get better speed upon removing almost half the layers and parameters, but this comes at the cost of information loss and lower accuracy. This also validates our baseline results in Table 5, where the distilled models are faster but do not perform as well as the original models.

We also test sequence lengths in {50, 100, 250, 500, 700} in our model. It is important to mention that large sequence lengths often cause memory issues and interrupt training; however, we adjust the sequence length according to the batch size. The ability to use sequence lengths greater than 512 is provided by BART, whereas most models (e.g. BERT, GPT-2) do not support sequence lengths above 512. Our model's performance with different sequence lengths is shown in Fig. 9.

Figure 9: FND-NS with different sequence lengths.

The results in Fig.  9 show that our model performs the best when we use a sequence length of 700. Our datasets consist of many features from the news content and social contexts over the span of close to one year. The news stories are on average 500 words or more, which carries important information about the news veracity. The associated side information is also essential.

The results clearly show that truncating the text can result in information loss, which is why performance drops at smaller sequence lengths. With a larger sequence length, we can include more news features and user engagement data and thus reflect the patterns in users' behaviour more accurately.

We also observe that the sequence length depends on the average sequence length of the dataset. Since our datasets are large and by default contain longer sequences, we get better performance with a larger sequence length. Due to the resource limitations, we could not test on further larger lengths, which we leave for future work.

6.5 The impact of concept drift

Concept drift occurs when the interpretation of the data changes over time, even when the data themselves may not have changed [ 10 ]. It is an issue that may cause the predictions of trained classifiers to become less accurate as time passes. For example, news classified as real may become fake after some time. The news profiles and user profiles classified as fake may also change over time (some profiles become obsolete and some are removed). Most importantly, the tactics of fake news producers change over time as new ways of producing fake news are developed. These types of changes result in concept drift, and a good model should be able to combat it.

In this experiment, we train our model twice a month and then test it on each week moving forward. At first, we train on the first two weeks' data and test on the data from the third week. Next, the model is trained on the data from the next two weeks plus the previous two weeks (e.g. weeks 1, 2, 3, 4) and tested on the following week (e.g. week 5), and this process (training on four weeks' data and testing on the following week) continues. We evaluate the performance of the model using the AUC and report the results in Fig. 10. We choose the AUC here because it is good at ranking predictions (compared to other metrics).
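Under one reading of this protocol, in which the training window keeps all data seen so far and is refreshed every two weeks, the evaluation loop can be sketched as below; train_and_score is a placeholder callable (not part of our released code) that trains the model on the given slice and returns the AUC on the test slice.

```python
import pandas as pd

def rolling_evaluation(df: pd.DataFrame, train_and_score, time_col: str = "timestamp"):
    """Retrain every two weeks on the data seen so far; test on the following week."""
    df = df.sort_values(time_col)
    start, end = df[time_col].min(), df[time_col].max()
    train_end = start + pd.Timedelta(weeks=2)
    aucs = []
    while train_end + pd.Timedelta(weeks=1) <= end:
        train = df[df[time_col] < train_end]
        test = df[(df[time_col] >= train_end) &
                  (df[time_col] < train_end + pd.Timedelta(weeks=1))]
        aucs.append(train_and_score(train, test))   # fit on `train`, return AUC on `test`
        train_end += pd.Timedelta(weeks=2)          # the next cycle adds two more weeks of data
    return aucs
```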

Figure 10: AUC of FND-NS during different weeks.

Overall, concept drift impacts the performance of our FND-NS model, but these changes happen slowly over time. As shown in Fig. 10, the model performance initially improves and is then affected by concept drift in mid-March, which probably indicates the arrival of unforeseen events during this period. Once the model is trained on these events, performance rises again, reflected in the better and steadier performance in April. We then see a sudden rise in performance in mid-April, probably because the model has by then been trained on those events; after this point, the performance remains steady.

Overall, the results show that our model deals with concept drift effectively, as its performance is not greatly impacted across the timesteps. In general, these results suggest that simply retraining the model every so often is enough to keep up with changes in fake news. We also observe that concept drift occurs less often for fake news than for real news. A similar analysis appears in related work [ 3 ], where the authors performed extensive experiments on concept drift and concluded that the content of fake news does not drift as abruptly as that of real news. Although fake news does not evolve as often as real news, once planted it travels farther, faster and more broadly than real news. Therefore, it is important to detect fake news as early as possible.

6.6 The effectiveness of early fake news detection

In this experiment, we compare the performance of our model and the baselines on early fake news detection. We follow the methodology of Liu and Wu [ 39 ] to define the propagation path of a news story up to a detection deadline, shown in Eq. ( 19 ):

$$\mathcal{P}(\mathcal{T}) = \left\{ {x_{j} \mid t_{j} \le \mathcal{T}} \right\}$$

where \(x_{j}\) is an observation sample with timestamp \(t_{j}\) and \(\mathcal{T}\) is the detection deadline. The idea is that any observation data after the detection deadline \(\mathcal{T}\) cannot be used for training. For example, a piece of news with timestep t means that the news was propagated t timesteps ago. Following [ 39 ], we measure the detection deadlines in minutes. According to research in fake news detection, fake news usually takes less than an hour to spread. It is easy to detect fake news after 24 h, but earlier detection is a challenge, as discussed earlier.
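Operationally, this means discarding every engagement observed after the deadline before the sequences are built; the sketch below uses hypothetical field names.

```python
from datetime import timedelta
import pandas as pd

def observations_before_deadline(engagements: pd.DataFrame,
                                 publish_time: pd.Timestamp,
                                 deadline_minutes: int) -> pd.DataFrame:
    """Keep only engagements observed within `deadline_minutes` of publication."""
    cutoff = publish_time + timedelta(minutes=deadline_minutes)
    return engagements[engagements["timestamp"] <= cutoff]

# Example: evaluate with a 30-minute detection deadline
# early = observations_before_deadline(engagements_df, article_publish_time, 30)
```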

In this experiment, we evaluate the performance of our model and the baselines on different detection deadlines or timesteps. To report the results, we take the observations under the detection deadlines: 15, 30, 60, 100 and 120 min, as shown in Fig.  11 . For simplicity, we keep the best performing models among the available variants, e.g. among the Transformers, we keep only BERT-c, GPT-2 and VGCN-BERT based on better scores in the previous experiment (Table 5 ). Similarly, we keep the LG from its group [ 14 ]. Among the other fake news detection baselines, we include all (FANG, exBAKE, 2-Stage Tranf, Grover and Declare). We also keep the other baselines (TextCNN and XGBoost). We evaluate the performance of the models using the AUC measure.

Figure 11: a Fake news detection with a 15-min deadline. b Fake news detection with a 30-min deadline. c Fake news detection with a 60-min deadline. d Fake news detection with a 100-min deadline. e Fake news detection with a 120-min deadline.

The results show that our FND-NS model outperforms all the other models for early fake news detection and maintains a steady performance across all the detection deadlines. The autoregressive modelling (decoder) in FND-NS helps in modelling future values based on past observations. Further observations are listed below:

The autoregressive models (GPT-2, Grover) perform better for early detection, probably because these models implicitly assume future values based on previous observations.

The autoencoding models (BERT, exBAKE) show relatively lower performance than autoregressive models (GPT-2, Grover) in the early detection tasks. This is because these models are for representation learning tasks. These models perform well when more training data is fed into the models (as seen with the better performance in BERT-c in Table 5 ), but the deadline constraints have perhaps limited their capacity to do early detection.

The FANG, exBAKE, 2-Stage Transf., TriFN, Declare, VGCN-BERT perform better during later time steps. This is understandable, as the model learns over time.

The LG, TextCNN and XGBoost do not perform as well as the other baselines.

Overall, the results suggest that because the linguistic features of fake news and the social contexts around it are scarcer in the early stage of a news story, all the models perform worse during the early timesteps. Our model shows better accuracy than the other models because we consider both the news and the social contexts. The news data and the social media posts contain sufficient linguistic features and complement each other, which helps us detect fake news earlier than the other methods.

7 Limitations

Our data and approach have some limitations that we mention below:

7.1 Domain-level error analysis

The NELA-GT-19 comprises 260 news sources, which can only support a limited fake news detection analysis over a given period of time; as a result, the current results are bounded by the information it provides. Other datasets that are more recent, cover different languages or target audiences, or align with other fake news outlets (sources) may be missed in these results. In future, we would like to use other datasets such as NELA-GT-20 [ 57 ], or to scrape more news sources from various websites and social media platforms.

Due to concept drift, the model trained on our datasets may have biases [ 67 ], causing some legitimate news sites to be incorrectly labelled. This may necessitate a re-labelling and re-evaluation process using more recent data.

According to recent research [ 21 ], the producers of disinformation change their tactics over time. We also want to see how these tactics evolve and incorporate these changes into our detection models.

At the moment, we evaluate our models on a binary classification problem. Our next step will be to consider multi-label classification, which will broaden the model’s applicability to various levels of fake news detection.

7.2 Ground truth source-level labels for news articles

We have used the Media Bias Fact Check’s source-level ground truth labels as proxies for the news articles. According to previous research, the choice of ground truth labels impacts downstream observations [ 68 ]. Our future research should evaluate models using different ground truth from fake and mainstream news sites. Furthermore, some sources consider more fine-grained fake news domains and more specific subcategories. Understanding whether existing models perform better in some subcategories than others can provide helpful information about model bias and weaknesses.

7.3 Weak supervision

Motivated by the success of weak supervision in similar previous works [ 9 , 57 , 69 ], we are currently using weak supervision to train deep neural network models effectively. In our specific scenario, applying this weak supervision scheme to the fake news classification problem also reduced the model development time from weeks to days. Moreover, despite noisy labels in weakly labelled training data, our results show that our proposed model performs well using weakly labelled data. However, we acknowledge that if we rely too much on weakly labelled data, the model may not generalize in all cases. This limitation can be overcome by considering manual article-level labelling, which has its own set of consequences (e.g. laborious and time-consuming process).

In future, we intend to use semi-supervised learning [ 70 ] techniques to leverage unlabelled data using structural assumptions automatically. We could also use the transfer learning technique [ 71 ] to pre-train the model only on fake news data. Furthermore, we plan to try knowledge-based weak supervision [ 54 ], which employs structured data to label a training corpus heuristically. The knowledge-based weak supervision also allows the automated learning of an indefinite number of relation extractors.

7.4 User profiles

Another limitation in this study is that we only use a small portion of users’ profiles from the currently available dataset (i.e. Fakeddit). Though Fakeddit covers users’ interactions over a long range of timestamps, we could only use a portion because we need to match users’ interactions (social contexts) from the Fakeddit dataset with the timeline of news data from the NELA-GT-19 dataset. This limitation, however, only applies to our test scenarios. The preceding issue will not arise if a researcher or designer uses complete data to implement our model on their social media platform.

One future direction for our research is to expand the modelling of users’ social contexts. First, we can include user connections in a social network in our model. User connections information can reveal the social group to which a user belongs and how the social network nurtures the spread of fake news. Second, we may incorporate user historical data to better estimate the user status, as a user’s tendency to spread fake news may change over time.

Another approach is to crawl more real-world data from news sites and social media platforms (such as Twitter) to include more social contexts, which could help identify more fake news patterns. Crawling multi-modal data such as visual content and video information can also be useful for detecting fake news.

Our proposed fake news detection method can be applied to other domains, such as question-answering systems, news recommender systems [ 47 , 72 ], to provide authentic news to readers.

7.5 Transfer learning

We have used transfer learning to match the tasks of fake news detection and user credibility classification. We have evidence that the MNLI can be useful for such tasks [ 73 , 74 , 75 ]. However, we must be cautious to avoid negative transfer, which is an open research problem.

We conducted preliminary research to understand the transferability between the source and target domains and to avoid negative transfer. Based on appropriate transferability measures, we then chose MNLI to extract knowledge for learning fake news detection and user credibility. We understand that an entire domain (for example, MNLI) cannot be used wholesale for transfer learning; for the time being, we rely on a portion of the source domain for useful learning in our target domain. The next step in this research will be to identify a more specific transfer learning domain.

7.6 User credibility

As previously stated, we transfer relevant knowledge from MNLI to user credibility, and we admit that the relatedness between the two tasks can be partial. In future, we plan to get user credibility scores through other measures such as FaceTrust [ 76 ], Alexa Rank [ 77 ], community detection algorithms [ 48 ], sentiment analysis [ 33 ] and profile ranking techniques [ 49 ].

7.7 Baselines

We include a variety of baseline methods in our experiment. While we choose algorithms with different behaviours and benchmarking schemes in mind, we must acknowledge that our baseline selection is small compared to what is available in the entire field. Our ultimate goal is to understand broad trends. We recognize that our research does not evaluate enough algorithms to make a broad statement about the whole fake news detection field.

7.8 Sequence length

We find that the difference in sequence length is the most critical factor contributing to FND-NS outperforming the benchmark models in our experiments. We acknowledge that most of the models used in this study do not support sequence lengths larger than 512. We did not shorten the sequence lengths during the ablation study, but ablating heavier features such as the news body or headline tends to shorten the total sequences, which is why our model performed differently (worse than expected) in the ablation study. Nevertheless, we would like to draw the readers' attention to the trade-off between predictive performance and computational cost: in our experiments, models that consider shorter sequences sacrifice some predictive performance for relatively shorter processing times, and the predictive power of the classifiers usually improves as the sequence length increases [ 19 , 63 ].

7.9 Experimental set-up

Another limitation of this study is the availability of limited resources (like GPUs, memory, data storage, etc.), due to which we could not perform many experiments on other large-scale data sources. In future, we plan to expand our experiments using better infrastructure.

So far, our model is trained offline. To satisfy the real-time requirement, we just need to train and update the model periodically.

8 Conclusion

In this paper, we propose a novel deep neural framework for fake news detection. We identify and address two unique challenges related to fake news: (1) early fake news detection and (2) label shortage. The framework has three essential parts: (1) a news module, (2) a social contexts module and (3) a detection module. For the detection part, we design a unique Transformer model inspired by the BART architecture. The encoder blocks in our model perform representation learning, while the decoder blocks predict future behaviour from past observations, which also helps us address the challenge of early fake news detection. Since the decoders depend on the output of the encoders, both components are essential for fake news detection. To address the label shortage issue, we propose an effective weak supervision labelling scheme within the framework. To sum up, the inclusion of rich information from both the news and the social contexts, together with the weak labels, proves helpful in building a strong fake news detection classifier.

https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/ .

https://www.wrcbtv.com/story/43076383/first-doses-of-covid19-vaccines-administered-at-chattanooga-hospital-on-thursday .

https://archive.is/OXJ60 .

https://reporterslab.org/fact-checking/ .

https://www.politifact.com/ .

https://www.mturk.com/ .

https://www.bbc.com/news/technology-56402378 .

https://dl.fbaipublicfiles.com/fairseq/models/bart.large.mnli.tar.gz .

https://doi.org/10.7910/DVN/O7FWPO

https://github.com/entitize/fakeddit .

https://colab.research.google.com/ .

https://github.com/nguyenvanhoang7398/FANG .

https://github.com/atulkumarin/DeClare .

https://github.com/KaiDMML/FakeNewsNet .

https://openai.com/blog/tags/gpt-2/ .

https://github.com/rowanz/grover .

https://github.com/google-research/bert .

https://github.com/Louis-udm/VGCN-BERT .

https://github.com/zihangdai/xlnet .

https://github.com/openai/gpt-2 .

https://huggingface.co/transformers/model_doc/distilbert.html .

https://github.com/pytorch/fairseq/tree/master/examples/roberta .

https://github.com/allenai/longformer .

https://code.google.com/archive/p/word2vec/ .

https://github.com/dennybritz/cnn-text-classification-tf .

https://xgboost.readthedocs.io/en/latest/ .

Liu, C., Wu, X., Yu, M., Li, G., Jiang, J., Huang, W., Lu, X.: A two-stage model based on BERT for short fake news detection. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 11776 LNAI, pp. 172–183 (2019). https://doi.org/10.1007/978-3-030-29563-9_17

Zhou, X., Zafarani, R.: A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Comput. Surv. (2020). https://doi.org/10.1145/3395046


Horne, B.D., NØrregaard, J., Adali, S.: Robust fake news detection over time and attack. ACM Trans. Intell. Syst. Technol. (2019). https://doi.org/10.1145/3363818

Shu, K., Wang, S., Liu, H.: Beyond news contents: The role of social context for fake news detection. In: WSDM 2019—Proceedings of 12th ACM International Conference on Web Search Data Mining, vol. 9, pp. 312–320 (2019). https://doi.org/10.1145/3289600.3290994

Liu, Y., Wu, Y.F.B.: FNED: a deep network for fake news early detection on social media. ACM Trans. Inf. Syst. (2020). https://doi.org/10.1145/3386253

Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science 359 , 1146–1151 (2018)

Jwa, H., Oh, D., Park, K., Kang, J.M., Lim, H.: exBAKE: automatic fake news detection model based on Bidirectional Encoder Representations from Transformers (BERT). Appl. Sci. 9 , 4062 (2019). https://doi.org/10.3390/app9194062

Popat, K., Mukherjee, S., Yates, A., Weikum, G.: Declare: debunking fake news and false claims using evidence-aware deep learning. arXiv Preprint. http://arxiv.org/abs/1809.06416 . (2018)

Wang, Y., Yang, W., Ma, F., Xu, J., Zhong, B., Deng, Q., Gao, J.: Weak supervision for fake news detection via reinforcement learning. In: AAAI 2020—34th AAAI Conference on Artificial Intelligence, pp. 516–523 (2020)

Hoens, T.R., Polikar, R., Chawla, N.: V: Learning from streaming data with concept drift and imbalance: an overview. Prog. Artif. Intell. 1 , 89–101 (2012)

Kaliyar, R.K., Goswami, A., Narang, P., Sinha, S.: FNDNet—a deep convolutional neural network for fake news detection. Cogn. Syst. Res. 61 , 32–44 (2020). https://doi.org/10.1016/j.cogsys.2019.12.005

Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., Choi, Y.: Defending against neural fake news. Neurips (2020)

Yang, S., Shu, K., Wang, S., Gu, R., Wu, F., Liu, H.: Unsupervised fake news detection on social media: a generative approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5644–5651 (2019)

Mohammadrezaei, M., Shiri, M.E., Rahmani, A.M.: Identifying fake accounts on social networks based on graph analysis and classification algorithms. Secur. Commun. Netw. (2018). https://doi.org/10.1155/2018/5923156

Nguyen, V.H., Sugiyama, K., Nakov, P., Kan, M.Y.: FANG: leveraging social context for fake news detection using graph representation. Int. Conf. Inf. Knowl. Manag. Proc. (2020). https://doi.org/10.1145/3340531.3412046

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv (2019). https://doi.org/10.18653/v1/2020.acl-main.703

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv Preprint. http://arxiv.org/abs/1810.04805 . (2018)

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI Blog. 1 , 9 (2019)

Google Scholar  

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., Schiele, B.: Latent embeddings for zero-shot classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 69–77 (2016)

Gruppi, M., Horne, B.D., Adalı, S.: NELA-GT-2019: A large multi-labelled news dataset for the study of misinformation in news articles. arXiv Preprint. http://arxiv.org/abs/2003.08444v2 (2020)

Nakamura, K., Levy, S., Wang, W.Y.: r/fakeddit: a new multimodal benchmark dataset for fine-grained fake news detection. arXiv Preprint. http://arxiv.org/abs/1911.03854 (2019)

Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: FakeNewsNet: a data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 8 , 171–188 (2020). https://doi.org/10.1089/big.2020.0062

Pizarro, J.: Profiling bots and fake news spreaders at PAN’19 and PAN’20: bots and gender profiling 2019, profiling fake news spreaders on Twitter 2020. In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp. 626–630 (2020)

Horne, B.D., Dron, W., Khedr, S., Adali, S.: Assessing the news landscape: a multi-module toolkit for evaluating the credibility of news. In: The Web Conference 2018—Companion of the World Wide Web Conference, WWW 2018, pp. 235–238 (2018)

Przybyla, P.: Capturing the style of fake news. Proc. AAAI Conf. Artif. Intell. 34 , 490–497 (2020). https://doi.org/10.1609/aaai.v34i01.5386

Silva, R.M., Santos, R.L.S., Almeida, T.A., Pardo, T.A.S.: Towards automatically filtering fake news in Portuguese. Expert Syst. Appl. 146 , 113199 (2020). https://doi.org/10.1016/j.eswa.2020.113199

Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news. arXiv Preprint. http://arxiv.org/abs/1702.05638 . (2017)

Horne, B., Adali, S.: This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In: Proceedings of the International AAAI Conference on Web and Social Media (2017)

Zhou, X., Wu, J., Zafarani, R.: SAFE: similarity-aware multi-modal fake news detection. Adv. Knowl. Discov. Data Min. 12085 , 354 (2020)

De Maio, C., Fenza, G., Gallo, M., Loia, V., Volpe, A.: Cross-relating heterogeneous Text Streams for Credibility Assessment. In: IEEE Conference on Evolving and Adaptive Intelligent Systems, 2020-May, (2020). https://doi.org/10.1109/EAIS48028.2020.9122701

Wanda, P., Jie, H.J.: DeepProfile: finding fake profile in online social network using dynamic CNN. J. Inf. Secur. Appl. (2020). https://doi.org/10.1016/j.jisa.2020.102465

Naseem, U., Razzak, I., Khushi, M., Eklund, P.W., Kim, J.: Covidsenti: a large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Trans. Comput. Soc. Syst. (2021)

Naseem, U., Razzak, I., Eklund, P.W.: A survey of pre-processing techniques to improve short-text quality: a case study on hate speech detection on twitter. Multimed. Tools Appl. 80 , 1–28 (2020)

Naseem, U., Razzak, I., Hameed, I.A.: Deep context-aware embedding for abusive and hate speech detection on Twitter. Aust. J. Intell. Inf. Process. Syst. 15 , 69–76 (2019)

Huang, Q., Zhou, C., Wu, J., Liu, L., Wang, B.: Deep spatial–temporal structure learning for rumor detection on Twitter. Neural Comput. Appl. (2020). https://doi.org/10.1007/s00521-020-05236-4

Jiang, S., Chen, X., Zhang, L., Chen, S., Liu, H.: User-characteristic enhanced model for fake news detection in social media. In: CCF International Conference on Natural Language Processing and Chinese Computing, pp. 634–646 (2019)

Qian, F., Gong, C., Sharma, K., Liu, Y.: Neural user response generator: fake news detection with collective user intelligence. In: IJCAI International Joint Conference on Artificial Intelligence, pp. 3834–3840 (2018)

Liu, Y., Wu, Y.F.B.: Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 354–361 (2018)

Cao, J., Qi, P., Sheng, Q., Yang, T., Guo, J., Li, J.: Exploring the role of visual content in fake news detection. Disinformation, Misinformation, Fake News Social Media, pp. 141–161 (2020)

Jin, Z., Cao, J., Guo, H., Zhang, Y., Luo, J.: Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 795–816 (2017)

Karimi, H., Roy, P., Saba-Sadiya, S., Tang, J.: Multi-source multi-class fake news detection. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1546–1557 (2018)

Wu, X., Lode, M.: Language models are unsupervised multitask learners (summarization). OpenAI Blog. 1 , 1–7 (2020)

Vijjali, R., Potluri, P., Kumar, S., Teki, S.: Two stage transformer model for covid-19 fake news detection and fact checking. arXiv Preprint. http://arxiv.org/abs/2011.13253 . (2020)

Anderson, C.W.: News ecosystems. SAGE Handb. Digit. J. 410–423 (2016)

Wang, B., Shang, L., Lioma, C., Jiang, X., Yang, H., Liu, Q., Simonsen, J.G.: On position embeddings in BERT. In: International Conference on Learning Representations (2021)

Raza, S., Ding, C.: News recommender system: a review of recent progress, challenges, and opportunities. Artif. Intell. Rev. (2021). https://doi.org/10.1007/s10462-021-10043-x

Papadopoulos, S., Kompatsiaris, Y., Vakali, A., Spyridonos, P.: Community detection in social media. Data Min. Knowl. Discov. 24 , 515–554 (2012)

Abu-Salih, B., Wongthongtham, P., Chan, K.Y., Zhu, D.: CredSaT: credibility ranking of users in big social data incorporating semantic analysis and temporal factor. J. Inf. Sci. 45 , 259–280 (2019). https://doi.org/10.1177/0165551518790424

Williams, A., Nangia, N., Bowman, S.R.: A broad-coverage challenge corpus for sentence understanding through inference. arXiv Preprint. http://arxiv.org/abs/1704.05426 (2017)

Pushp, P.K., Srivastava, M.M.: Train once, test anywhere: zero-shot learning for text classification. arXiv Preprint. http://arxiv.org/abs/1712.05972 (2017)

Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., Gao, J., Zhou, M., Hon, H.-W.: Unified language model pre-training for natural language understanding and generation. arXiv Preprint. http://arxiv.org/abs/1905.03197 (2019)

Helmstetter, S., Paulheim, H.: Weakly supervised learning for fake news detection on Twitter. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 274–277 (2018)

Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 541–550 (2011)

Baly, R., Karadzhov, G., Alexandrov, D., Glass, J., Nakov, P.: Predicting factuality of reporting and bias of news media sources. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pp. 3528–3539 (2020)

Nørregaard, J., Horne, B.D., Adalı, S.: NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. In: Proceedings of 13th International Conference on Web and Social Media, ICWSM 2019, pp. 630–638 (2019). https://doi.org/10.7910/DVN/ULHLCB

Horne, Benjamin; Gruppi, M.: NELA-GT-2020: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles. arXiv Preprint. http://arxiv.org/abs/2102.04567 . (2021). https://doi.org/10.7910/DVN/CHMUYZ

Drummond, C., Holte, R.C., et al.: C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on Learning from Imbalanced Datasets II, pp. 1–8 (2003)

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv Preprint. http://arxiv.org/abs/1711.05101 (2017)

Lu, Z., Du, P., Nie, J.Y.: VGCN-BERT: augmenting BERT with graph embedding for text classification. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 369–382 (2020)

Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: Xlnet: generalized autoregressive pretraining for language understanding. In: Advances in Neural Information Processing Systems, pp. 5753–5763 (2019)

Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv Preprint. http://arxiv.org/abs/1910.01108 (2019)

Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv Preprint. http://arxiv.org/abs/2004.05150 (2020)

Kim, Y.: Convolutional neural networks for sentence classification. arXiv Preprint. http://arxiv.org/abs/1408.5882 (2014)

Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., et al.: Xgboost: extreme gradient boosting. R Package version 0.4-2.1, (2015)

Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7370–7377 (2019)

Raza, S., Ding, C.: News recommender system considering temporal dynamics and news taxonomy. In: Proceedings—2019 IEEE International Conference on Big Data, Big Data 2019, pp. 920–929. Institute of Electrical and Electronics Engineers Inc. (2019)

Bozarth, L., Saraf, A., Budak, C.: Higher ground? How groundtruth labeling impacts our understanding of fake news about the 2016 US presidential nominees. In: Proceedings of the International AAAI Conference on Web and Social Media, pp. 48–59 (2020)

Wang, Y., Sohn, S., Liu, S., Shen, F., Wang, L., Atkinson, E.J., Amin, S., Liu, H.: A clinical text classification paradigm using weak supervision and deep representation. BMC Med. Inform. Decis. Mak. 19 , 1–13 (2019). https://doi.org/10.1186/s12911-018-0723-6

Zhu, X., Goldberg, A.B.: Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 3 , 1–130 (2009)

MATH   Google Scholar  

Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: International Conference on Artificial Neural Networks, pp. 270–279 (2018)

Raza, S., Ding, C.: A Regularized Model to Trade-off between Accuracy and Diversity in a News Recommender System. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 551–560 (2020)

Bhuiyan, M., Zhang, A., Sehat, C., Mitra, T.: Investigating “who” in the crowdsourcing of news credibility. In: Computational Journalism Symposium (2020)

Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of textual claims on the web. In: International Conference on Information and Knowledge Managenent Proceedings, 24–28-October-2016, pp. 2173–2178 (2016). https://doi.org/10.1145/2983323.2983661

Yang, K.-C., Niven, T., Kao, H.-Y.: Fake News Detection as Natural Language Inference. arXiv Preprint. http://arxiv.org/abs/1907.07347 (2019)

Sirivianos, M., Kim, K., Yang, X.: FaceTrust: Assessing the credibility of online personas via social networks. In: Proceedings of 4th USENIX Conferences on Hot Topics in Security (2009)

Thakur, A., Sangal, A.L., Bindra, H.: Quantitative measurement and comparison of effects of various search engine optimization parameters on Alexa Traffic Rank. Int. J. Comput. Appl. 26 , 15–23 (2011)


Raza, S., Ding, C. Fake news detection based on news content and social contexts: a transformer-based approach. Int J Data Sci Anal 13 , 335–362 (2022). https://doi.org/10.1007/s41060-021-00302-z


The tentacles of retracted science reach deep into social media. A simple button could change that.


In 1998, a paper linking childhood vaccines with autism was published in the prestigious journal The Lancet, only to be retracted in 2010 when the science was debunked.

Fourteen years since its retraction, the paper’s original claim continues to flourish on social media, fuelling misinformation and disinformation around vaccine safety and efficacy.

A University of Sydney team is hoping to help social media users identify posts featuring misinformation and disinformation arising from now-debunked science. They have developed and tested a new interface that helps users discover further information about potentially fraught claims on social media.

They created and tested the efficacy of adding a "more information" button to social media posts. The button links to a drop-down that allows users to see more details about claims in news posts, including whether that news is based on retracted science. The researchers say social media platforms could use an algorithm to link posts to details of retracted science.

Testing of the interface among a group of participants showed that when people understand the idea of retraction and can easily find when health news is based on a claim from retracted research, it can help reduce the impact and spread of misinformation as they are less likely to share it.

“Knowledge is power,” said Professor Judy Kay from the School of Computer Science, who led the research. “During the height of the COVID-19 pandemic, myths around the efficacy and safety of vaccines abounded. We want to help people to better understand when science has been debunked or challenged so they can make informed decisions about their health,” she said.

“The ability to read and properly interpret often complex scientific papers is a very niche skill – not everybody has that literacy or is up to date on the latest science. Many people would have seen posts about now-debunked vaccine research and thought: ‘it was published in a medical journal, so it must be true’. Sadly, that isn’t the case for retracted publications.”

“Social media platforms could do much better than they do now,” said co-author and PhD student Waheeb Yaqub. “During the height of the COVID-19 pandemic, myths around the efficacy and safety of vaccines spread like wildfire.”

“Our approach shows that when people understand the idea of retraction and can find when health news is based on a retracted science article, it can reduce the impact and spread of misinformation,” he said.

Tool boosts literacy of processes behind scientific research

The research was conducted with 44 participants who started with little or no understanding of scientific retraction. After completing a five-minute tutorial, they rated how various reasons for retraction make a paper’s findings invalid.

The researchers then studied how participants used the “more information” button. They found the new information altered the participants’ beliefs on three health claims based on retracted papers shared on social media.

These claims were: that masks are effective in limiting the spread of coronavirus; that the Mediterranean diet is effective in reducing heart disease; and that snacking while watching an action movie leads to overeating.

The first claim was based on two papers, one which had been retracted and one which hadn’t. The other two claims were based on retracted papers. The researchers specifically chose papers of which participants would have differing knowledge.

“Participants were confident that masks were effective. Most didn't know about the Mediterranean diet and so were unsure whether that claim was true. Many believed the snacking claim because of their own experience of snacking during films.”

The button influenced participants when they knew little about a topic to begin with. When the participants discovered the post was based on a retracted paper, they were less likely to like or share it.

On social media, both misinformation (the inadvertent spread of false information) and disinformation (false information deliberately spread with malicious intent) are rising.

Papers can be retracted when problems with methodology, results or experiments are found.

The researchers say it would be feasible for social media platforms to develop back-end software that links databases of retracted papers.

“If social media platforms want to maintain their quality and integrity, they should look to implement simple methods like ours,” Professor Kay said.

The study was published in  Proceedings of the ACM on Human-Computer Interaction .

DECLARATION 

The authors declare no conflicts of interest. Waheeb Yaqub is the recipient of a research scholarship.



Social Media: Fake News


What is "Fake News?"

"Fake news" refers to news articles or other items that are deliberately false and intended to manipulate the viewer. While the concept of fake news stretches back to antiquity, it has become a large problem in recent years due to the ease with which it can be spread on social media and other online platforms, as people are often less likely to critically evaluate news shared by their friends or that confirms their existing beliefs. Fake news is alleged to have contributed to important political and economic outcomes in recent years. 

This page presents strategies for detecting fake news and researching the topic through reliable sources so you can draw your own conclusions. The most important thing you can do is to recognize fake news and halt its dissemination by not spreading it to your social circles. 

What are Social Media Platforms Doing?

Several social media platforms have responded to the rise in fake news by adjusting their news feeds, labeling news stories as false or contested, or through other approaches. Google has also made changes to address the problem. The websites below explain some of the steps these platforms are taking. These steps can only go so far, however; it's always the responsibility of the reader to question and verify information found online. 

  • Working to Stop Misinformation and False News--Facebook
  • Facebook Tweaks Its 'Trending Topics' Algorithm To Better Reflect Real News--Facebook
  • Facebook says it will act against 'information operations' using false accounts--Facebook
  • Fact Check now available in Google Search and News around the world--Google
  • Google has banned 200 publishers since it passed a new policy against fake news
  • Solutions that can stop fake news spreading--BBC

Recognizing Fake News

You've all seen fake news before, and it's easy to recognize the worst examples. 

Fake news is often too good to be true, too extreme, or too out of line with what you know to be true and what other news sources are telling you. The following suggestions for recognizing fake news are taken from the NPR story  Fake News Or Real? How To Self-Check The News And Get The Facts . 

  • Pay attention to the domain and URL : Reliable websites have familiar names and standard URLs, like .com or .edu. 
  • Read the "About Us" section : Insufficient or overblown language can signal an unreliable source. 
  • Look at the quotes in a story : Good stories quote multiple experts to get a range of perspectives on an issue.
  • Look at who said them : Can you verify that the quotes are correct? What kind of authority do the sources possess?
  • Check the comments : Comments on social media platforms can alert you when the story doesn't support the headline.
  • Reverse image search : If an image used in a story appears on other websites about different topics, it's a good sign the image isn't actually what the story claims it is. 

If in doubt, contact a librarian for help evaluating the claim and the source of the news using reliable sources. 

Strategies for Combatting Fake News

  • Be aware of the problem : Many popular sources, particularly online news sources and social media, are competing for your attention through outlandish claims, and sometimes with the intent of manipulating the viewer
  • Think critically : Critically evaluate news that you encounter. If it sounds too good (or sometimes too bad) to be true, it probably is. Most fake news preys on our desire to have our beliefs confirmed, whether they be positive or negative
  • Check facts against reliable sources : When you encounter a claim in the news, particularly if it sets off alarm bells for you, take the time to evaluate the claim using reputable sources including library databases, fact checking websites like Snopes.com or PolitiFact, and authoritative news sources like the New York Times and the BBC. While established news sources can also be wrong from time to time, they take care to do extensive fact checking to validate their articles
  • Stop the spread of fake news : You can do your part in halting the spread of fake news by not spreading it further on social media, through email, or in conversation

Real or Fake?

  • News Story 1
  • News Story 2

Is this news item real or fake? Try evaluating it using the tips presented on this page.

Story 1 (November 2011): SHOCK - Brain surgeon confirms ObamaCare rations care, has death panels!


Story 2 (January 2017): The State Department’s entire senior administrative team just resigned

How Social Media Spreads Fake News

Social media is one of the main ways that fake news is spread online. Platforms like Facebook and Twitter make it easy to share trending news without taking the time to critically evaluate it.

People are also less likely to critically evaluate news shared by their friends, so misleading news stories end up getting spread throughout social networks with a lot of momentum.

Read the articles below to get a better understanding of how social media can reinforce our preexisting beliefs and make us more likely to believe fake news. 

  • Stanford study examines fake news and the 2016 presidential election
  • The reason your feed became an echo chamber--and what to do about it
  • How fake news goes viral: a case study
  • Is Facebook keeping you in a political bubble?
  • 2016 Lie of the Year: Fake news
  • Fake News Expert On How False Stories Spread and Why People Believe Them
  • BBC News--Filter Bubbles
  • NPR--Researchers Examine When People Are More Susceptible To Fake News


Approaches to Identify Fake News: A Systematic Literature Review

Dylan de Beer

Department of Informatics, University of Pretoria, Pretoria, 0001 South Africa

Machdel Matthee

With the widespread dissemination of information via digital media platforms, it is of utmost importance for individuals and societies to be able to judge its credibility. Fake news is not a recent concept, but it is a commonly occurring phenomenon in current times. The consequences of fake news can range from being merely annoying to influencing and misleading societies or even nations. A variety of approaches exist to identify fake news. By conducting a systematic literature review, we identify the main approaches currently available to identify fake news and how these approaches can be applied in different situations. Some approaches are illustrated with a relevant example, along with the challenges and the appropriate context in which each approach can be used.

Introduction

Paskin ( 2018 : 254) defines fake news as “particular news articles that originate either on mainstream media (online or offline) or social media and have no factual basis, but are presented as facts and not satire”. The importance of combatting fake news is starkly illustrated during the current COVID-19 pandemic. Social networks are stepping up in using digital fake news detection tools and educating the public towards spotting fake news. At the time of writing, Facebook uses machine learning algorithms to identify false or sensational claims used in advertising for alternative cures, they place potential fake news articles lower in the news feed, and they provide users with tips on how to identify fake news themselves (Sparks and Frishberg 2020 ). Twitter ensures that searches on the virus result in credible articles and Instagram redirects anyone searching for information on the virus to a special message with credible information (Marr 2020 ).

These measures are possible because different approaches exist that assist the detection of fake news. For example, platforms based on machine learning use fake news from the biggest media outlets to refine algorithms for identifying fake news (Macaulay 2018). Some approaches detect fake news by using metadata, such as a comparison of the release time of the article with the timeline and locations of its spread (Macaulay 2018).

The purpose of this research paper is to categorize, through a systematic literature review, current approaches to counter the wide-ranging epidemic of fake news.

The Evolution of Fake News and Fake News Detection

Fake news is not a new concept. Before the era of digital technology, it was spread mainly through yellow journalism, with a focus on sensational news such as crime, gossip, disasters and satirical news (Stein-Smith 2017). The prevalence of fake news relates to the availability of mass media digital tools (Schade 2019). Since anyone can publish articles via digital media platforms, online news articles include well-researched pieces but also opinion-based arguments or simply false information (Burkhardt 2017). There is no custodian of credibility standards for information on these platforms, making the spread of fake news possible. To make things worse, it is by no means straightforward to tell the difference between real news and semi-true or false news (Pérez-Rosas et al. 2018).

The nature of social media makes it easy to spread fake news, as a user potentially sends fake news articles to friends, who then send them on to their friends, and so on. Comments on fake news sometimes fuel its 'credibility', which can lead to rapid sharing and, in turn, further fake news (Albright 2017).

Social bots are also responsible for the spreading of fake news. Bots are sometimes used to target super-users by adding replies and mentions to posts. Humans are manipulated through these actions to share the fake news articles (Shao et al. 2018 ).

Clickbait is another tool encouraging the spread of fake news. Clickbait is an advertising tool used to get the attention of users. Sensational headlines or news items are often used as clickbait that navigates the user to advertisements; more clicks on the advert mean more money (Chen et al. 2015a).

Fortunately, tools have been developed for detecting fake news. For example, one tool identifies fake news that spreads through social media by examining lexical choices that appear in headlines and other intense language structures (Chen et al. 2015b). Another tool, developed to identify fake news on Twitter, has a component called the Twitter Crawler, which collects and stores tweets in a database (Atodiresei et al. 2018). When a Twitter user wants to check the accuracy of news they have found, they can copy a link into this application, after which the link is processed for fake news detection. This process is built on an algorithm called NER (Named Entity Recognition) (Atodiresei et al. 2018).

There are many available approaches to help the public to identify fake news and this paper aims to enhance understanding of these by categorizing these approaches as found in existing literature.

Research Method

Research Objective

The purpose of this paper is to categorize approaches used to identify fake news. In order to do this, a systematic literature review was done. This section presents the search terms that were used, the selection criteria and the source selection.

Search Terms

Specific search terms were used to enable the finding of relevant journal articles such as the following:

  • (“what is fake news” OR “not genuine information” OR “counter fit news” OR “inaccurate report*” OR “forged (NEAR/2) news” OR “mislead* information” OR “false store*” OR “untrustworthy information” OR “hokes” OR “doubtful information” OR “incorrect detail*” OR “false news” OR “fake news” OR “false accusation*”)
  • AND (“digital tool*” OR “digital approach” OR “automated tool*” OR “approach*” OR “programmed tool*” OR “digital gadget*” OR “digital device*” OR “digital machan*” OR “digital appliance*” OR “digital gizmo” OR “IS gadget*” OR “IS tool*” OR “IS machine*” OR “digital gear*” OR “information device*”)
  • AND (“fake news detection” OR “approaches to identify fake news” OR “methods to identify fake news” OR “finding fake news” OR “ways to detect fake news”).

Selection Criteria

Inclusion Criteria.

Studies that adhere to the following criteria: (1) studies published between 2008 and 2019; (2) studies available in English; (3) with a main focus on fake news on digital platforms; (4) articles published in IT or other technology-related journals (e.g. Computers in Human Behavior) as well as conference proceedings; (5) journal articles that are cited more than 10 times.

Exclusion Criteria.

Studies that adhered to the following criteria: (1) studies not presented in journal articles (e.g. in the form of a slide show or overhead presentation); (2) published studies not relating to technology or IT; (3) articles on fake news that do not address its identification.

The search terms were used to find relevant articles on ProQuest, ScienceDirect, EBSCOhost and Google Scholar (seen here as ‘other sources’).

Flowchart of Search Process

Figure  1 below gives a flowchart of the search process: the identification of articles, the screening, the selection process and the number of the included articles.

Figure 1: A flowchart of the selection process

In this section of the article we list the categories of approaches that are used to identify fake news. We also discuss how the different approaches interlink with each other and how they can be used together to get a better result.

The following categories of approaches for fake news detection are proposed: (1) language approach, (2) topic-agnostic approach, (3) machine learning approach, (4) knowledge-based approach, (5) hybrid approach.

The five categories mentioned above are depicted in Fig.  2 below. Figure  2 shows the relationship between the different approaches. The sizes of the ellipses are proportional to the number of articles found (given as the percentage of total included articles) in the systematic literature review that refer to that approach.

Figure 2: Categories of fake news detection approaches resulting from the systematic literature review

The approaches are discussed in depth below with some examples for illustration purposes.

Language Approach

This approach focuses on the use of linguistics by a human or software program to detect fake news. Most of the people responsible for spreading fake news have control over what their story is about, but they can often be exposed through the style of their language (Yang et al. 2018). The approach considers all the words in a sentence and the letters in a word, how they are structured and how they fit together in a paragraph (Burkhardt 2017). The focus is therefore on grammar and syntax (Burkhardt 2017). There are currently three main methods that contribute to the language approach:

Bag of Words (BOW):

In this approach, each word in a paragraph is considered of equal importance and treated as an independent entity (Burkhardt 2017). Individual word frequencies are analysed to find signs of misinformation. These representations are also called n-grams (Thota et al. 2018). This ultimately helps to identify patterns of word use, and by investigating these patterns, misleading information can be identified. The bag-of-words model is of limited practicality because converting text into numerical representations discards context, and the position of a word is not always taken into consideration (Potthast et al. 2017).
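As a minimal sketch of this idea, assuming scikit-learn and a two-document toy corpus (the texts and labels below are invented for illustration), each document is reduced to unigram and bigram counts that a linear classifier can then use:

```python
# Bag-of-words / n-gram features with scikit-learn on a toy example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Scientists confirm the vaccine passed all clinical trials",
    "SHOCKING secret cure the government does not want you to see",
]
labels = [0, 1]  # 0 = real, 1 = fake (toy labels)

# Each document becomes a vector of unigram and bigram counts; word order is
# ignored beyond the n-gram window, which is exactly the limitation noted above.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["SHOCKING cure the government hid from trials"]))
```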

Semantic Analysis:

Chen et al. (2015b) explain that truthfulness can be determined by comparing a personal experience (e.g. a restaurant review) with a profile on the topic derived from similar articles. An honest writer will be more likely to make remarks about a topic that are similar to those of other truthful writers. Different compatibility scores are used in this approach.

Deep Syntax:

The deep syntax method is carried out through probabilistic context-free grammars (Stahl 2018), which extend context-free grammars and make deep syntax analysis possible through parse trees (Zhou and Zafarani 2018). Sentences are converted into a set of rewrite rules, and these rules are used to analyse various syntax structures. The syntax can be compared to known structures or patterns of lies, which can ultimately help to tell the difference between fake news and real news (Burkhardt 2017).
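The deep-syntax idea can be illustrated with a toy probabilistic context-free grammar in NLTK; the grammar, probabilities and sentence below are invented for demonstration and are far simpler than the grammars used in real deception-detection studies.

```python
# Parse a sentence with a toy PCFG and inspect the probability of its most
# likely parse tree.
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.6] | N [0.4]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'report' [0.5] | 'cure' [0.5]
    V -> 'reveals' [1.0]
""")

parser = ViterbiParser(grammar)
for tree in parser.parse("the report reveals the cure".split()):
    print(tree)          # the most probable parse tree
    print(tree.prob())   # its probability under the grammar
```

In a fake news setting, the production rules actually used in a document would be compared against rule patterns learned from known deceptive and truthful texts.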

Topic-Agnostic Approach

This category of approaches detects fake news by considering not the content of articles but rather topic-agnostic features. The approach uses linguistic features and web mark-up capabilities to identify fake news (Castelo et al. 2019). Some examples of topic-agnostic features are (1) a large number of advertisements, (2) longer headlines with eye-catching phrases, (3) text patterns that differ from mainstream news and are designed to induce emotive responses, and (4) the presence of an author name (Castelo et al. 2019; Horne and Adali 2017).
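A minimal sketch of extracting such topic-agnostic signals from a page, assuming BeautifulSoup and a deliberately simplified notion of what counts as an advertisement or an author byline:

```python
# Illustrative topic-agnostic features: an ad-count proxy, headline length and
# punctuation, and presence of an author byline. Real systems use far richer
# web mark-up and stylistic features.
from bs4 import BeautifulSoup

def topic_agnostic_features(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    headline = soup.title.get_text() if soup.title else ""
    return {
        "num_iframes": len(soup.find_all("iframe")),   # crude proxy for embedded ads
        "headline_length": len(headline.split()),       # eye-catching headlines tend to be long
        "headline_exclamations": headline.count("!"),
        "has_author": soup.find(attrs={"class": "author"}) is not None,
    }

html = "<html><head><title>You will NOT believe this cure!</title></head><body></body></html>"
print(topic_agnostic_features(html))
```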

Machine Learning Approach

Machine learning algorithms can be used to identify fake news. This is achieved by using different types of training datasets to refine the algorithms. Datasets enable computer scientists to develop new machine learning approaches and techniques, and they are used to train the algorithms to identify fake news. How are these datasets created? One way is through crowdsourcing. Pérez-Rosas et al. (2018) created a fake news dataset by first collecting legitimate information in six different categories: sports, business, entertainment, politics, technology and education. Crowdsourcing was then used, and a task was set up which asked workers to generate a false version of the news stories (Pérez-Rosas et al. 2018). Over 240 stories were collected and added to the fake news dataset.

A machine learning approach called the rumor identification framework has been developed that surfaces signals of ambiguous posts so that a person can more easily identify fake news, alerting people to posts that might be fake (Sivasangari et al. 2018). The framework is built to combat fake tweets on Twitter and focuses on four main areas: the metadata of the tweet, its source, and the date and area of the tweet, that is, where and when the tweet was created (Sivasangari et al. 2018). By studying these four parts of a tweet, the framework can be used to check the accuracy of the information and to separate the real from the fake (Sivasangari et al. 2018). Supporting this framework, rumor-spread data are collected to create datasets with the use of a Twitter Streaming API (Sivasangari et al. 2018).

A related solution to identify and prevent the spread of misleading information through fake accounts, likes and comments on Twitter is the Twitter Crawler (Atodiresei et al. 2018), a machine learning approach that collects tweets and adds them to a database, making comparisons between different tweets possible.
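The common pattern behind these crawler-based approaches, collecting posts into a store and deriving metadata features from them, might look roughly like the sketch below; the schema, fields and features are illustrative assumptions, not the cited systems' actual designs.

```python
# Store collected tweets in a local SQLite database and derive simple
# metadata features (text length, location, retweets, posting hour).
import sqlite3

conn = sqlite3.connect("tweets.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tweets
                (id TEXT PRIMARY KEY, text TEXT, user TEXT,
                 created_at TEXT, location TEXT, retweets INTEGER)""")

def store_tweet(tweet: dict) -> None:
    conn.execute("INSERT OR REPLACE INTO tweets VALUES (?, ?, ?, ?, ?, ?)",
                 (tweet["id"], tweet["text"], tweet["user"], tweet["created_at"],
                  tweet.get("location", ""), tweet.get("retweets", 0)))
    conn.commit()

def metadata_features(tweet_id: str) -> dict:
    text, created_at, location, retweets = conn.execute(
        "SELECT text, created_at, location, retweets FROM tweets WHERE id = ?",
        (tweet_id,)).fetchone()
    return {
        "text_length": len(text),
        "has_location": bool(location),
        "retweets": retweets,
        "posted_hour": int(created_at[11:13]),  # assumes ISO 8601 timestamps
    }

store_tweet({"id": "1", "text": "Breaking: 5G towers cause illness!",
             "user": "anon123", "created_at": "2020-04-01T03:15:00",
             "retweets": 950})
print(metadata_features("1"))
```

Features like these could then be fed, together with the text itself, into any of the classifiers discussed above.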

Knowledge Based Approach

Recent studies argue for the integration of machine learning and knowledge engineering to detect fake news. A challenging problem for some of these fact-checking methods is the speed at which fake news spreads on social media. Microblogging platforms such as Twitter cause small pieces of false information to spread very quickly to a large number of people (Qazvinian et al. 2011). The knowledge-based approach aims to use external sources to verify whether the news is fake or real and to identify it before it spreads widely. There are three main categories: (1) expert-oriented fact checking, (2) computational-oriented fact checking, and (3) crowdsourcing-oriented fact checking (Ahmed et al. 2019).

Expert Oriented Fact Checking.

With expert oriented fact checking it is necessary to analyze and examine data and documents carefully (Ahmed et al. 2019 ). Expert-oriented fact-checking requires professionals to evaluate the accuracy of the news manually through research and other studies on the specific claim. Fact checking is the process of assigning certainty to a specific element by comparing the accuracy of the text to another which has previously been fact checked (Vlachos and Riedel 2014 ).

Computational Oriented Fact Checking.

The purpose of computational-oriented fact checking is to provide users with an automated fact-checking process that is able to identify whether a specific piece of news is true or false (Ahmed et al. 2019). Examples are knowledge graphs and open web sources that are based on practical referencing to help distinguish between real and fake news (Ahmed et al. 2019). A recent tool called ClaimBuster is an example of how fact checking can automatically identify fake news (Hassan et al. 2017). It combines machine learning techniques with natural language processing and a variety of database queries. It analyses content from social media, interviews and speeches in real time to determine 'facts', compares them with a repository that contains verified facts, and delivers the result to the reader (Hassan et al. 2017).
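The repository-matching step of such automated fact checking can be sketched with plain TF-IDF cosine similarity; the "verified facts" and the scoring below are illustrative and much cruder than what ClaimBuster actually does.

```python
# Match an incoming claim against a small repository of verified statements.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

verified_facts = [
    "The WHO states that 5G networks do not spread the coronavirus.",
    "Clinical trials showed the vaccine to be safe and effective.",
]

vectorizer = TfidfVectorizer().fit(verified_facts)
fact_vectors = vectorizer.transform(verified_facts)

def closest_fact(claim: str):
    """Return the most similar verified statement and its similarity score."""
    sims = cosine_similarity(vectorizer.transform([claim]), fact_vectors)[0]
    best = sims.argmax()
    return verified_facts[best], float(sims[best])

print(closest_fact("5G towers are spreading the coronavirus"))
```

A low similarity to every verified statement would mean the claim cannot be checked automatically and should be routed to expert or crowd review.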

Crowd Sourcing Oriented.

Crowdsourcing gives a group of people the opportunity to make a collective decision by examining the accuracy of news (Pennycook and Rand 2019). The accuracy of the news is completely based on the wisdom of the crowd (Ahmed et al. 2019). Fiskkit is an example of a platform that can be used for crowdsourcing, where a group of people evaluate pieces of a news article (Hassan et al. 2017). After one piece has been evaluated, the crowd moves to the next piece, until the entire news article has been evaluated and its accuracy determined by the wisdom of the crowd (Hassan et al. 2017).

Hybrid Approach

There are three generally agreed-upon elements of fake news articles: the first is the text of the article, the second is the response the article received, and the third is the source that promotes the article (Ruchansky et al. 2017). A recent study proposes a hybrid model which helps to identify fake news on social media by combining human judgment and machine learning (Okoro et al. 2018). Unaided, humans identify fake news only about 54% of the time, a mere 4% better than chance, and the hybrid model has been shown to increase this percentage (Okoro et al. 2018). To make the hybrid model effective, it combines social media news with machine learning and a network approach, and its purpose is to estimate the probability that the news is fake (Okoro et al. 2018). Another hybrid model, called CSI (capture, score, integrate), functions on three main elements: (1) capture - extracting representations of articles by using a Recurrent Neural Network (RNN), (2) score - creating a score and a representation vector, and (3) integrate - combining the outputs of capture and score into a vector that is used for classification (Ruchansky et al. 2017).
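A loose, minimal PyTorch sketch of the capture-score-integrate idea follows; the layer sizes, inputs and wiring are illustrative assumptions for this sketch rather than the CSI authors' implementation.

```python
# Capture: a GRU encodes the article tokens. Score: a small layer embeds
# user/source features. Integrate: both vectors are concatenated and
# classified as fake vs. real.
import torch
import torch.nn as nn

class CaptureScoreIntegrate(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=32, user_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.capture = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(user_dim, hidden_dim)
        self.integrate = nn.Linear(2 * hidden_dim, 2)

    def forward(self, token_ids, user_features):
        _, h = self.capture(self.embed(token_ids))      # final hidden state (capture)
        u = torch.relu(self.score(user_features))       # user/source representation (score)
        return self.integrate(torch.cat([h.squeeze(0), u], dim=-1))  # integrate

model = CaptureScoreIntegrate()
logits = model(torch.randint(0, 5000, (4, 20)),  # 4 articles, 20 token ids each
               torch.randn(4, 16))               # 4 user-feature vectors
print(logits.shape)  # torch.Size([4, 2])
```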

In this paper we discussed the prevalence of fake news and how technology has changed over recent years, enabling us to develop tools that can be used in the fight against fake news. We also explored the importance of identifying fake news, the influence that misinformation can have on the public's decision making, and the approaches that exist to combat fake news. The current battle against fake news on COVID-19, and the uncertainty surrounding it, shows that a hybrid approach to fake news detection is needed: human wisdom as well as digital tools must be harnessed in this process. Hopefully some of these measures will stay in place, and digital media platform owners and the public will take responsibility and work together in detecting and combatting fake news.

  • Ahmed, S., Hinkelmann, K., Corradini, F.: Combining machine learning with knowledge engineering to detect fake news in social networks - a survey. In: Proceedings of the AAAI 2019 Spring Symposium, vol. 12 (2019)
  • Albright, J.: Welcome to the era of fake news. Media Commun. 5(2), 87 (2017). https://doi.org/10.17645/mac.v5i2.977
  • Atodiresei, C.-S., Tănăselea, A., Iftene, A.: Identifying fake news and fake users on Twitter. Procedia Comput. Sci. 126, 451–461 (2018). https://doi.org/10.1016/j.procs.2018.07.279
  • Burkhardt, J.M.: History of fake news. Libr. Technol. Rep. 53(8), 37 (2017)
  • Castelo, S., Almeida, T., Elghafari, A., Santos, A., Pham, K., Nakamura, E., Freire, J.: A topic-agnostic approach for identifying fake news pages. In: Companion Proceedings of the 2019 World Wide Web Conference - WWW 2019, pp. 975–980 (2019). https://doi.org/10.1145/3308560.3316739
  • Chen, Y., Conroy, N.J., Rubin, V.L.: Misleading online content: recognizing clickbait as false news? In: Proceedings of the 2015 ACM Workshop on Multimodal Deception Detection - WMDD 2015, Seattle, Washington, USA, pp. 15–19. ACM Press (2015a). https://doi.org/10.1145/2823465.2823467
  • Chen, Y., Conroy, N.K., Rubin, V.L.: News in an online world: the need for an "automatic crap detector". Proc. Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015b)
  • Hassan, N., Arslan, F., Li, C., Tremayne, M.: Toward automated fact-checking: detecting check-worthy factual claims by ClaimBuster. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2017, Halifax, NS, Canada, pp. 1803–1812. ACM Press (2017). https://doi.org/10.1145/3097983.3098131
  • Horne, B.D., Adali, S.: This just in: fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In: International AAAI Conference on Web and Social Media, vol. 8 (2017)
  • Macaulay, T.: Can technology solve the fake news problem it helped create? (2018). https://www.techworld.com/startups/can-technology-solve-fake-news-problem-it-helped-create-3672139/
  • Marr, B.: Coronavirus fake news: how Facebook, Twitter, and Instagram are tackling the problem. Forbes (2020). https://www.forbes.com/sites/bernardmarr/2020/03/27/finding-the-truth-about-covid-19-how-facebook-twitter-and-instagram-are-tackling-fake-news/
  • Okoro, E.M., Abara, B.A., Umagba, A.O., Ajonye, A.A., Isa, Z.S.: A hybrid approach to fake news detection on social media. Niger. J. Technol. 37(2), 454 (2018). https://doi.org/10.4314/njt.v37i2.22
  • Paskin, D.: Real or fake news: who knows? J. Soc. Media Soc. 7(2), 252–273 (2018)
  • Pennycook, G., Rand, D.G.: Fighting misinformation on social media using crowdsourced judgments of news source quality. Proc. Natl. Acad. Sci. 116(7), 2521–2526 (2019). https://doi.org/10.1073/pnas.1806781116
  • Pérez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3391–3401. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/C18-1287
  • Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news (2017). arXiv preprint arXiv:1702.05638
  • Qazvinian, V., Rosengren, E., Radev, D.R., Mei, Q.: Rumor has it: identifying misinformation in microblogs. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, pp. 1589–1599 (2011)
  • Ruchansky, N., Seo, S., Liu, Y.: CSI: a hybrid deep model for fake news detection. In: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, CIKM 2017, pp. 797–806 (2017). https://doi.org/10.1145/3132847.3132877
  • Schade, U.: Software that can automatically detect fake news. Comput. Sci. Eng. 3 (2019)
  • Shao, C., Ciampaglia, G.L., Varol, O., Yang, K., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018). https://doi.org/10.1038/s41467-018-06930-7
  • Sivasangari, V., Anand, P.V., Santhya, R.: A modern approach to identify the fake news using machine learning. Int. J. Pure Appl. Math. 118(20), 10 (2018)
  • Sparks, H., Frishberg, H.: Facebook gives step-by-step instructions on how to spot fake news (2020). https://nypost.com/2020/03/26/facebook-gives-step-by-step-instructions-on-how-to-spot-fake-news/
  • Stahl, K.: Fake news detection in social media. California State University Stanislaus, 6 (2018)
  • Stein-Smith, K.: Librarians, information literacy, and fake news. Strat. Libr. 37 (2017)
  • Thota, A., Tilak, P., Ahluwalia, S., Lohia, N.: Fake news detection: a deep learning approach. SMU Data Sci. Rev. 1(3), 21 (2018)
  • Vlachos, A., Riedel, S.: Fact checking: task definition and dataset construction. In: Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, MD, USA, pp. 18–22. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/W14-2508
  • Yang, Y., Zheng, L., Zhang, J., Cui, Q., Li, Z., Yu, P.S.: TI-CNN: convolutional neural networks for fake news detection (2018). arXiv preprint arXiv:1806.00749
  • Zhou, X., Zafarani, R.: Fake news: a survey of research, detection methods, and opportunities (2018). arXiv preprint arXiv:1812.00315

Facebook went away. Political divides didn't budge.

In the weeks before and after the 2020 presidential election, researchers at Stanford and elsewhere ran a number of tests to try to understand how much Facebook and its corporate cousin, Instagram, may be contributing to the nation's political divide.

One of those experiments — led by Matthew Gentzkow and Hunt Allcott , senior fellows at the Stanford Institute for Economic Policy Research (SIEPR) — centered on more than 35,000 Facebook and Instagram users who were paid to stay off the platforms in the run-up to Election Day. There’s a lot that researchers could glean from the social media hiatus, including whether people’s political attitudes shifted and in what ways. If views changed dramatically, that would support the argument that Facebook and Instagram, and social media generally, are helping to drive Americans apart.

The results of that deactivation exercise — the largest ever involving social media users and the first to include Instagram — are in: Staying off Facebook and Instagram in the final stretch of the November vote had little or no effect on people’s political views, their negative opinions of opposing parties, or beliefs around claims of election fraud.

But when it comes to Facebook’s impact on what people believed about current events, the researchers reached two conclusions. Those who were off Facebook were worse at answering news quiz questions, but they were also less likely to fall for widely circulated misinformation, suggesting that the platform can be an important conduit for both real and false news.

These findings, newly published by the Proceedings of the National Academy of Sciences, are in line with the main takeaways of the other experiments into Facebook and Instagram’s potential influence around the 2020 election, in which changing news feeds and limiting re-sharing of posts didn’t reduce polarization or change beliefs about whether the voting process was tainted. Those tests were detailed in four papers published in July 2023 in Science and Nature .

Taken together, the papers suggest that, when it comes to U.S. politics, Facebook and Instagram may not have as much ability to shape political attitudes during an election season as the popular narrative suggests.

And like the previous studies, the Gentzkow and Allcott-led study doesn’t absolve Meta Platforms, which owns Facebook and Instagram, from the messy state of U.S. politics. For one thing, the results support the view that Facebook may create harm by distributing misinformation. Gentzkow says it’s also possible that the platforms contributed to polarization in the past, even if people’s use of them in the run-up to the election had limited impact.


“We are not ruling out the possibility that Facebook and Instagram contribute to polarization in other ways over time,” says Gentzkow, who is the Landau Professor of Technology and the Economy in the Stanford School of Humanities and Sciences.

He also notes another finding suggesting that using Facebook in the weeks before the 2020 presidential election may have made people somewhat more likely to vote for Donald Trump and somewhat less likely to vote for Joe Biden. This could suggest that, for Facebook users who still were on the site, Trump’s campaign was savvier at building support than Biden’s team was.

“This effect was not quite statistically significant, so we need to take it with a grain of salt,” Gentzkow says. “But if it’s real, it’s big enough that it could impact the outcome of a close election.”

Meta opens up a trove of data

The study led by Gentzkow and Allcott — and the four that preceded it — are part of a massive research project that has been billed as the most comprehensive, evidence-based investigation yet into the role of social media in American democracy.

The project came together following critiques of Meta’s role in the spread of fake news, Russian influence, and the Cambridge Analytica data breach. The collaboration between academics and Meta researchers involved a series of steps to protect the integrity of the research , which builds on work by Stanford Law School Professor Nathaniel Persily on how to structure partnerships between academia and social media companies. Meta, for example, agreed not to prohibit any findings from being published.

In all, nearly 20 independent social scientists — including Gentzkow; Allcott, a professor at the Stanford Doerr School of Sustainability; Neil Malhotra, a political economist at the Stanford Graduate School of Business; and, Shanto Iyengar and Jennifer Pan, both political scientists in the Stanford School of Humanities and Sciences — are part of the project.

“Access to Meta’s proprietary data has allowed us to jump over big barriers to research on extremely important issues involving social media and politics,” Gentzkow says.

Gentzkow and Allcott’s study — whose 31 co-authors include five current and former SIEPR predocs and one former SIEPR undergraduate research fellow — involved roughly 19,900 Facebook users and 15,600 Instagram users who agreed to stop using the platforms ahead of the 2020 election. About a quarter of them agreed to deactivate their accounts for six weeks before the November vote. The rest comprised a control group that logged off for just one week.

The study’s analysis relies on a number of measures, among them participant surveys, state voting records, campaign donations, and Meta platform data. Some participants also allowed the researchers to track how they used other news and social media services when they were off Facebook or Instagram.

On top of the findings on polarization, knowledge, and Republican support, the authors conclude that Facebook and Instagram help people engage in the political process — mostly through posting about politics and signing petitions online (voter turnout didn’t change).

Takeaways for 2024 and beyond

Gentzkow says that the study’s finding that Facebook and Instagram didn’t change people’s political attitudes or beliefs in claims of electoral fraud in 2020 is especially interesting in light of his previous research with Allcott. In an earlier smaller-scale study of Facebook users who stayed off the platform for a month ahead of the 2018 midterms, the authors did find evidence that it contributes to polarization.

The distinction, Gentzkow says, could be that people are aware enough of political issues during a presidential election that Facebook and Instagram have little or no effect on their beliefs or attitudes. But during other elections, when information about candidates or issues is not so front and center, social media may have more influence over what people think.

“Even though Facebook and Instagram did not contribute to polarization in the runup to the 2020 election, it’s possible that they are helping to widen political divides in other contexts where people’s views are less entrenched,” Gentzkow says.

And though the study was limited to the six weeks leading up to the presidential vote, it's still a critical time in U.S. politics — hence the phenomenon known as the "October surprise."

“Things happen in the home stretch of a presidential election that can change poll numbers,” he says. “We’ve learned from this study that altering how much time people spend on Facebook and Instagram during that period isn’t likely to make a huge difference.”


Deactivating Facebook for just a few weeks reduces belief in fake news

The largest study ever carried out on social media deactivation has found that disconnecting lowers users' political participation as well as their propensity to believe misinformation.

Former U.S. President Donald Trump (left) and President Joe Biden (right).

Before the 2020 U.S. presidential election, more than 35,000 Facebook and Instagram users agreed to participate in an experiment. Twenty-seven percent of that randomly chosen group were paid to deactivate their accounts for six weeks. The rest disconnected for only a week. The objective was to analyze what happens when users disconnect from two of the biggest social media networks during the most heated weeks of the U.S. political calendar.

The result is that hardly anything happens at all, except for one detail: the group that disconnected from Facebook (not Instagram) tended not to believe the misinformation that was circulating online. At the same time, their political participation, especially online, decreased.

The new article, published on Monday in the journal PNAS, is the work of more than 30 academics from U.S. universities and Meta researchers. It is part of a broader research project whose first results appeared last summer in the journals Science and Nature; that earlier research found, among other results, that conservatives consume more misinformation on Facebook. The project stems from an August 2020 agreement between Meta and two professors, who then selected the rest of the researchers.

One of the authors of the article, Stanford University professor Matthew Gentzkow, says there are two main findings from the PNAS study: “First, stopping using Facebook and Instagram in the final stretch of the election had little or no effect on political opinions, their negative opinions about opposing parties, or beliefs about complaints of electoral fraud. Second, stopping using Facebook does affect people’s knowledge and beliefs. Those who went off Facebook responded worse to news tests, but they were also less likely to believe widespread misinformation, suggesting that the platform may be an important channel for both true and fake news,” he says.

Although the result is not completely clear, for Gentzkow, the finding is quite surprising: “Previous research has shown that exposure to misinformation is often quite low for most people, so I was really surprised to see this effect, which was large enough to be marginally detectable,” he says.

Unprecedented macro study

Aside from being part of an unprecedented macro study using internal Meta data, the work is also the largest ever done on social media deactivation. The sample is 10 times larger than any previous experiment, according to the article. The authors, however, admit that the work has its limitations when it comes to measuring the real impact of a network like Facebook in democracies.

“This study has the same problem as the previous ones,” says David García, a professor at the University of Konstanz in Germany, who commented on the summer 2023 articles in Nature . “It is only able to experiment with individuals who are within a society where many other people continue to use Facebook and Instagram normally. When we talk about the Facebook effect, we think about what society would be like without Facebook compared to what it is like with Facebook, not what the people who do not use the network are like compared to those who do use it,” García explains.

Gentzkow admits that investigating that issue is beyond the reach of academia: “It is not possible to do an experiment now to directly answer the question of what polarization would have been like if Facebook had never existed.”

But the experiment, at least, did clear up doubts about whether Instagram was in the same league as Facebook: the article did not find any kind of effect on the influencer-centric network. “Aside from a reduction in online participation, we find no significant impacts of Instagram deactivation on any other primary outcomes,” the article says. “This is true even among younger users, and it suggests that despite Instagram’s rapid growth, Facebook likely remains the platform with the largest impacts on political outcomes.”

The study also found that users who went off Facebook for six weeks were more wary of the political information they saw on Facebook, a shift that also occurred on Instagram: “One potential explanation is that time away from a platform made users more aware of the amount of low-quality or inaccurate information to which they had been exposed,” the study explains.

While the study found that political participation (especially online) fell among the group that disconnected from Facebook, voter intention remained the same. In other words, less participation online did not lead to fewer voters at the polls.

Nor did Instagram and Facebook deactivation have an effect on polarization, perceived legitimacy of the elections, or candidate preference. Another question that the researchers wanted to answer was how important Facebook was to Donald Trump. The study did not find definitive data that suggested the platform helped the former U.S. president. Although the results are "statistically insignificant," they do suggest that deactivation had an effect on who participants voted for. "Deactivation decreased Trump favorability, decreased turnout among Republicans and increased turnout among Democrats," the article states.

Researcher David García believes that the work could have been clearer on this point: "The result regarding Trump favorability is very interesting. It does not reach the level pre-specified by the scientists, but that is because the standard of evidence they set was very high. If they had assumed that Facebook deactivation lowers the vote for Trump, it would have passed the test. I see more important evidence than what appears in the text. The effect was small for the amount of data they had, but it doesn't seem so small to me when you think about how close elections in the U.S. tend to be," he explains.
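To illustrate García's point about the evidence bar, consider a purely hypothetical calculation: when many outcomes are pre-registered and tested together, the threshold each result must clear is far stricter than the conventional p < 0.05, so an effect that would look significant on its own can still be reported as statistically insignificant. All numbers below are invented for illustration only.

```python
# Invented numbers illustrating why a "significant-looking" effect can fail
# a pre-registered evidence standard when many outcomes are tested at once.
p_value = 0.03      # hypothetical unadjusted p-value for one outcome
n_outcomes = 20     # hypothetical number of pre-registered primary outcomes

# Bonferroni-style correction: each test must clear a much smaller threshold.
adjusted_threshold = 0.05 / n_outcomes  # 0.0025

print(p_value < 0.05)                # True: passes the conventional standard
print(p_value < adjusted_threshold)  # False: fails the pre-specified standard
```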

Facebook invited 10.6 million users to take part in the study and 637,388 clicked the invitation. Of this figure, only 19,857 people completed the experiment, which also involved a series of surveys. Participants who deactivated their account for six weeks were paid $150. On Instagram, Meta invited 2.6 million users and achieved 15,585 participants.
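Those figures imply a steep participation funnel. The quick calculation below uses only the numbers reported for Facebook in the article to make the rates explicit.

```python
# Participation rates implied by the Facebook recruitment figures above.
invited = 10_600_000
clicked = 637_388
completed = 19_857

print(f"Clicked the invitation: {clicked / invited:.1%}")    # about 6.0%
print(f"Completed (of invited): {completed / invited:.2%}")  # about 0.19%
print(f"Completed (of clickers): {completed / clicked:.1%}") # about 3.1%
```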


'Doing your own research' can make fake news seem believable

While it's healthy to question what we see and hear in the media, those quick internet searches to fact-check news stories can unexpectedly backfire and lead people to believe false stories, according to the director of the University of Oregon's undergraduate journalism program.

As more people tune into the press for the upcoming election cycle, Seth Lewis, who holds the Shirley Papé Chair in Emerging Media at UO's School of Journalism and Communication, said caution is in order when trying to verify media accounts.

For those who plan to cast a vote in this year's statewide and presidential elections, not knowing which media sources and stories to trust can leave them more misinformed.

"The big takeaway is there are social costs to not trusting journalists and institutions," Lewis said. "There's the cost of encountering poor-quality information and the cost in time that could be spent on other activities besides trying to fact-check the news."

Drawing on interviews conducted in 2020, a time when people were relying heavily on the news for guidance on the COVID-19 pandemic, Lewis and his University of Utah colleague Jacob L. Nelson found that Americans had greater faith in their abilities to fact-check the news than they had in the news itself. Many of those interviewed reported feeling the need to "do their own research" using search engines because of their distrust in journalism as biased and politicized.

But those who reject journalism in favor of their own internet research can wind up more misinformed, falling into conspiracy theories, rabbit holes and low-quality data voids, a problem heightened during election season, Lewis said.

As supported by recent work from a different set of researchers, published in the journal Nature, when people were encouraged to do additional searching after reading true and fake stories about the COVID-19 pandemic, for example, they were more likely to believe the fake news than those who hadn't performed an online search.

As ballots for Oregon's statewide election hit mailboxes in May and the 2024 presidential campaign heats up, equipping voters with the tools to more effectively navigate the infinite information environment can increase their access to high-quality news sources, research shows.

In their 2020 interviews, Lewis and Nelson found that frustration and distrust in the news surprisingly crossed partisan lines. People who were interviewed shared the sentiment that only "sheep" would trust journalists and also had a common desire to better understand the world. Yet to uncover that clear, accurate picture, information seekers must challenge not only a news source's biases and reputability but also one's own biases that might influence what stories they trust or dismiss, Lewis said.

"That skepticism should be applied as much to ourselves as to others," he said. "You should be a little bit skeptical of your own opinions."

Waning trust in news media can be traced back to the 1970s and has been rapidly accelerating in recent years because of several challenging crises the United States has faced, Lewis said.

"We're in a moment where we are increasingly realizing that news is both everywhere and nowhere," he said. "News is all around us yet seems to have, in some sense, less impact than it did before. It's never been easier to stumble upon news, but people often talk about being exhausted by it and, therefore, are turning away from it at unprecedented levels."

Journalists can do better to earn the public's trust, Lewis said. Many individuals don't see journalists as experts, nor do they have a strong relationship with them as they do with their doctors, for example.

Although there is a fair bit of distrust in both journalism and health care as institutions, people are more trusting of individual doctors and don't feel the need to fact-check them as they do for individual journalists, Lewis found in a 2023 study published in the journal Media and Communication.

"But journalists are experts," Lewis said. "They are experts in finding accurate information and trying to present it in a professional manner, but they can also do better in presenting themselves as practitioners with expertise."

Bringing transparency into the practice of journalism can illuminate what some people see as a black box. In their latest research study, published April 25 in the journal Journalism, Lewis and his team noticed in interviews that many Americans perceived journalists as motivated by profits. But in reality, most journalists are paid rather poorly and are motivated more by passion than by the pursuit of profit, he said. Widespread job cuts also have hit the industry, with hundreds of journalists laid off at the start of 2024.

A disconnect exists between how people perceive journalism and how it actually works, and journalists should share the principles, techniques and challenges that go into it, Lewis said.

Journalists can also embrace more public engagement in their work. For instance, Lewis' UO colleague Ed Madison leads the Journalistic Learning Initiative, which gives middle- and high-schoolers the opportunity to learn journalistic techniques, become more media literate and tell factual stories about their world.

"What it takes to build trust in journalism is the same as anywhere else," Lewis said. "By building relationships."

Provided by University of Oregon

