

Differences in misinformation sharing can lead to politically asymmetric sanctions


    Sample and basic data collection for 2020 election study

    First, we collected a list of Twitter users who tweeted or retweeted either of the election hashtags #Trump2020 or #VoteBidenHarris2020 on 6 October 2020. We also collected the most recent 3,200 tweets sent by each of those accounts. We processed tweets and extracted tweeted domains from 34,920 randomly selected users (15,714 shared #Trump2020 and 19,206 shared #VoteBidenHarris2020), and filtered down to 12,238 users who shared at least five links to domains used by the ideology estimator of ref. 57. We also excluded 426 ‘elite’ users with more than 15,000 followers who are probably unrepresentative of Twitter users more generally (because of this exclusion, suspension data were not collected for these users; however, as described in Supplementary Information section 2, our main results on the association between political orientation and low-quality news sharing are also observed among these elite users). These data were collected as part of a project that was approved by the Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Subjects Protocol 91046.

    We then constructed a politically balanced set of users by randomly selecting 4,500 users each from the remaining 4,756 users who shared #Trump2020 and 7,056 users who shared #VoteBidenHarris2020. After 9 months, on 30 July 2021, we checked the status of the 9,000 users and assessed suspension. We classify an account as having been suspended if the Twitter application programming interface (API) returned error code 63 (‘User has been suspended’) when querying that user.

    To measure a user’s tendency to share misinformation, we follow most other researchers in this space (refs. 11,12,58,59) and use news source quality as a proxy for article accuracy, because it is not feasible to rate the accuracy of individual tweets at scale. Specifically, to quantify the quality of news shared by each user, we leveraged a previously published set of 60 news sites (20 mainstream, 20 hyper-partisan and 20 fake news; Table 1) whose trustworthiness had been rated by 8 professional fact-checkers as well as politically balanced crowds of laypeople. The crowd ratings were determined as follows. A sample of 971 participants from the USA, quota-matched to the national distribution on age, gender, ethnicity and geographic region, were recruited through Lucid (ref. 60). Each participant indicated how much they trusted each of the 60 news outlets using a 5-point Likert scale. For each outlet, we then calculated politically balanced crowd ratings by calculating the average trust among Democrats and the average trust among Republicans, and then averaging those two average ratings.
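The balanced-crowd averaging described above (per-party means averaged, so the partisan mix of the crowd cannot tilt an outlet's score) can be sketched in Python; the ratings and party labels below are invented for illustration, not the study's data:

```python
def balanced_crowd_rating(ratings):
    """Politically balanced trust rating for one outlet: average trust
    within each party first, then average the two party means, so the
    number of raters from each party cannot tilt the outlet's score."""
    dem = [trust for party, trust in ratings if party == "D"]
    rep = [trust for party, trust in ratings if party == "R"]
    return (sum(dem) / len(dem) + sum(rep) / len(rep)) / 2

# Invented 5-point Likert trust ratings for one hypothetical outlet.
sample = [("D", 4), ("D", 5), ("D", 4), ("R", 2), ("R", 3)]
print(round(balanced_crowd_rating(sample), 3))  # ≈ 3.417
```

Averaging the party means (rather than pooling all raters) is what makes the rating "politically balanced": here the three Democrat raters do not outvote the two Republican raters.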

    We also examined Reliability ratings for a set of 283 sites from Ad Fontes Media, Inc., Factual Reporting ratings for a set of 3,216 sites from Media Bias/Fact Check and Accuracy ratings for a set of 4,767 sites from a recent academic paper by Lasser et al. (ref. 33). We then used the Twitter API to retrieve the last 3,200 posts (as of 6 October 2020) for each user in our study, and collected all links to any of those sites shared (tweeted or retweeted) by each user. Following the approach used in previous work (refs. 58,59), we calculated a news quality score for each user (bounded between 0 and 1) by averaging the ratings of all sites whose links they shared, separately for each set of site ratings. Finally, we transform these ratings into low-quality news sharing scores by subtracting the news quality ratings from 1. Over 99% of users in our study had shared at least one link to a rated domain. When combining the four expert-based measures into an aggregate news quality score, we replaced missing values with the sample mean; PCA indicated that only one component should be retained (87% of variation explained), which had weights of 0.50 on Pennycook and Rand (ref. 38) fact-checker ratings, 0.51 on Ad Fontes Media Reliability ratings, 0.48 on Media Bias/Fact Check Factual Reporting ratings and 0.51 on Lasser et al. (ref. 33) Accuracy ratings. In all PCA analyses, we use parallel analysis to determine the number of retained components.
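A minimal sketch of the per-user score, under the assumption that each shared domain is simply looked up in one rating set and unrated domains are ignored (the site names and 0–1 ratings here are invented, not the published ratings):

```python
def low_quality_score(shared_domains, site_quality):
    """Average the 0-1 quality rating over every rated link the user
    shared, then subtract from 1 so that higher = more low-quality
    sharing. Links to unrated domains are ignored."""
    rated = [site_quality[d] for d in shared_domains if d in site_quality]
    if not rated:
        return None  # user shared no links to rated sites
    return 1 - sum(rated) / len(rated)

# Invented ratings: one mainstream, one hyper-partisan, one fake-news site.
quality = {"mainstream.example": 0.9, "hyperpartisan.example": 0.4, "fake.example": 0.1}
links = ["mainstream.example", "fake.example", "fake.example", "unrated.example"]
print(round(low_quality_score(links, quality), 3))  # ≈ 0.633
```

Retweeted links count the same as tweeted ones in this sketch, matching the "tweeted or retweeted" collection above; the real analysis repeats this separately for each of the four rating sets before the PCA aggregation.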

    To measure a user’s political orientation, we first classify their partisanship on the basis of whether they shared more #Trump2020 or #VoteBidenHarris2020 hashtags. Additionally, we retrieved all accounts followed by users in our sample and used the statistical model from ref. 39 to obtain a continuous measure of users’ ideology on the basis of the ideological leaning of the accounts they followed. Similarly, we used the statistical models from ref. 40 and ref. 12 to estimate users’ ideology using the ideological leanings of the news sites that the users shared content from. We also calculated user ideology by averaging political leanings of domains they shared through tweets or retweets on the basis of the method in ref. 12. The intuition behind these approaches is that users on social media are more likely to follow accounts (and share news stories from sources) that are aligned with their own ideology than those that are politically distant. Thus, the ideology of the accounts the user follows, and the ideology of the news sources the user shares, provide insight into the user’s ideology. When combining these four measures into an aggregate political orientation score, we replaced missing values with the sample mean; PCA indicated that only one component should be retained (88% of variation explained), which had weights of 0.49 on hashtag-based partisanship, 0.49 on follower-based ideology, 0.51 on sharing-based ideology estimated through ref. 40 and 0.51 on sharing-based ideology estimated through ref. 12. We also used this aggregate measure to calculate a user’s extent of ideological extremity by taking the absolute value of the aggregate ideology measure; and we used PCA to combine measures of the standard deviation across a user’s tweets of news site ideology scores from ref. 12 and ref. 40, and standard deviation of ideology of accounts followed from ref. 39, as a measure of the ideological uniformity (versus diversity) of news shared by the user.

    Policy simulations

    In addition to the regression analyses, we also simulate politically neutral suspension policies and determine each user’s probability of suspension; and from this, determine the level of differential impact we would expect in the absence of differential treatment. The procedure is as follows. First, we identify a set of low-quality sources that could potentially lead to suspension. We do so using the politically balanced layperson trustworthiness ratings from ref. 38, as well as using the fact-checker trustworthiness ratings from that same paper. For both sets of ratings, there is a natural discontinuity at a value of 0.25 (on a normalized trust scale from 0 = Not at all to 1 = Entirely) (Extended Data Fig. 2). Thus, we consider sites with average trustworthiness ratings below 0.25 to be ‘low quality’; and for each user, we count the number of times they tweet links to any of these low-quality sites.

    We then define a suspension policy as the probability of a user getting suspended each time they share a link to a low-quality news site. We model suspension as probabilistic because many (almost certainly most) of the articles from low-quality news sites are not actually false, and sharing such articles does not constitute an offence. Thus, we consider who would get suspended under suspension policies that differ in their harshness, varying from a 0.01% chance of getting suspended for each shared link to a low-quality news site up to a 10% chance. Specifically, for each user, we calculate their probability of getting suspended as

    $$P\left({\rm{suspended}}\right)=1-{\left(1-k\right)}^{L}$$

    where L is the number of low-quality links shared, and k is the probability of suspension for each shared link (that is, the policy harshness). The only way the user would not get suspended is if on each of the L times they share a low-quality link, they are not suspended. Because they do not get suspended with probability (1 − k), the probability that they would never get suspended is (1 − k)L. Therefore, the probability that they would get suspended at some point is 1 − (1 − k)L.

    We then calculate the mean (and 95% confidence interval) of that probability across all Democrats versus Republicans in our sample (as determined by sharing Biden versus Trump election hashtags). The results of these analyses are shown in Fig. 3b, and Supplementary Information section 2 presents statistical analyses of estimated probability of suspension on the basis of each measure of political orientation.
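The simulated neutral policy reduces to evaluating 1 − (1 − k)^L for each user and then averaging within each group; a sketch with invented link counts (the real analysis uses the observed counts and hashtag-based partisanship):

```python
def p_suspended(n_links, k):
    """Probability of at least one suspension when each of n_links shared
    low-quality links independently triggers suspension with probability k."""
    return 1 - (1 - k) ** n_links

def mean_p(link_counts, k):
    """Mean suspension probability across a group of users."""
    return sum(p_suspended(n, k) for n in link_counts) / len(link_counts)

# Invented per-user counts of shared low-quality links, under a 1% policy.
group_a = [0, 2, 10]   # shares few low-quality links
group_b = [5, 30, 60]  # shares many
print(round(mean_p(group_a, 0.01), 3))  # ≈ 0.039
print(round(mean_p(group_b, 0.01), 3))  # ≈ 0.254
```

Even though the policy is identical for everyone, the group that shares more low-quality links faces a far higher expected suspension rate, which is the differential-impact-without-differential-treatment point of the simulation.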

    We also do a similar exercise using the likelihood of being a bot, rather than low-quality news sharing. The algorithm of ref. 43 provides an estimated probability of being a bot for each user, on the basis of the contents of their tweets. We define a suspension policy as the minimum probability of being human, k, required to avoid suspension (or, in other words, a threshold on bot likelihood above which the user gets suspended). Specifically, for a policy of harshness k, users with bot probability greater than (1 − k) are suspended. The results of these analyses are shown in Fig. 3c.
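The bot-based policy is a plain threshold rule; a sketch with invented bot probabilities:

```python
def suspended_under_bot_policy(bot_prob, k):
    """Suspend when the estimated bot probability exceeds (1 - k);
    k is the minimum probability of being human required to stay."""
    return bot_prob > 1 - k

bot_probs = [0.05, 0.55, 0.97]  # invented per-user bot probabilities
print([suspended_under_bot_policy(b, 0.1) for b in bot_probs])  # [False, False, True]
```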

    Reanalyses of extra datasets

    Facebook sharing in 2016 by users recruited through YouGov

    Here we analyse data presented in ref. 11. A total of n = 1,191 survey respondents recruited using YouGov gave the researchers permission to collect the links they shared on Facebook for 2 months (through a Facebook app), starting in November 2016. As part of the survey, participants self-reported their ideology (using a 5-point Likert scale; not including participants who selected ‘Not sure’, yielding n = 995 respondents with usable ideology data) and their party affiliation (Democrat, Republican, Independent, Other, Not sure). As in our Twitter studies, we calculate low-quality information sharing scores for each user by using the fact-checker and politically balanced crowd ratings for the 60 news sites from ref. 38, as described above in Table 1. A total of 893 participants shared at least one rated link.

    Twitter sharing in 2018 and 2020 by users recruited through Prolific

    Here we analyse data presented in ref. 41. A total of n = 2,100 participants were recruited using the online labour market Prolific in June 2018. Twitter IDs were provided by participants at the beginning of the study. However, some participants entered obviously fake Twitter IDs—for example, the accounts of celebrities. To screen out such accounts, we followed the original paper and excluded accounts with follower counts above the 95th percentile in the dataset. We had complete data and usable Twitter IDs for 1,901 users. As part of the survey, participants self-reported the extent to which they were economically liberal versus conservative, and socially liberal versus conservative, using 5-point Likert scales. We construct an overall ideology measure by averaging over the economic and social measures. The Twitter API was used to retrieve the content of their last 3,200 tweets (capped by the Twitter API limit). Data were retrieved from Twitter on 18 August 2018, and then again on 12 April 2020 (the latter data pull excludes tweets collected during the former data pull). We calculate low-quality information sharing scores for each user by using the fact-checker and politically balanced crowd ratings for the 60 news sites from ref. 38, as described above in Table 1. A total of 594 participants shared at least one rated link in the 2018 data pull and 379 participants shared at least one rated link in the 2020 data pull; 288 participants shared at least one rated link in both data pulls.
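The follower-count screen can be sketched as below; the nearest-rank percentile method and the counts are illustrative assumptions, not necessarily the original paper's exact procedure:

```python
import math

def drop_above_95th(follower_counts):
    """Exclude accounts whose follower count exceeds the sample's 95th
    percentile (nearest-rank method); keep everything else."""
    ranked = sorted(follower_counts)
    cutoff = ranked[math.ceil(0.95 * len(ranked)) - 1]
    return [c for c in follower_counts if c <= cutoff]

counts = list(range(1, 101))         # invented follower counts 1..100
print(len(drop_above_95th(counts)))  # 95 accounts survive the screen
```

The screen is relative to the sample itself, so it removes the implausibly popular tail (likely celebrity IDs) without needing an absolute follower-count cut-off.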

    Twitter sharing in 2021 by users who followed at least three political elites

    Here we analyse data presented by Mosleh and Rand (ref. 13), in which Twitter accounts for 816 elites were identified, and then 5,000 Twitter users were randomly sampled from the set of 38,328,679 users who followed at least three of the elite accounts. Each user’s last 3,200 tweets were collected on 23 July 2021, and sharing of low-quality news domains was assessed using the fact-checker and politically balanced crowd ratings from ref. 38. A total of 3,070 users shared at least one rated link. The statistical model from ref. 39 was used to obtain a continuous measure of users’ ideology on the basis of the ideological leaning of the accounts they followed.

    Twitter sharing in 2022 by users who followed at least three political elites

    Here we analyse previously unpublished data, in which 11,805 Twitter users were sampled from a set of 296,202,962 users who followed at least one of the political elite accounts from ref. 41. We randomly sampled from users who had more than 20 lifetime tweets and followed at least three political elites for whom we had a partisanship rating. Each user’s last 3,200 tweets were collected on 25 December 2022, and sharing of low-quality news domains was assessed using the fact-checker and politically balanced crowd ratings from ref. 38. A total of 4,040 users shared at least one rated link. The statistical model from ref. 39 was used to obtain a continuous measure of users’ ideology on the basis of the ideological leaning of the accounts they followed.

    Twitter sharing in 2023 by users who followed at least one political elite, stratified on follower count

    Here we analyse previously unpublished data in which 11,886 Twitter users were randomly sampled, stratified on the basis of log10-transformed number of followers (rounded to the nearest integer), from the same set of 296,202,962 users who followed at least one political elite account. On 4 March 2023, we retrieved all tweets made by each user since 22 December 2022 using the Twitter Academic API. Sharing of low-quality news domains was assessed using the fact-checker and politically balanced crowd ratings from ref. 38. A total of 4,408 users shared at least one rated link. The statistical model from ref. 39 was used to obtain a continuous measure of users’ ideology on the basis of the ideological leaning of the accounts they followed.

    Sharing of false claims on Twitter

    Here we analyse data from Ghezae et al.53. Unlike the previous analyses, this dataset does not use domain quality as a proxy for misinformation sharing. Instead, sets of specific false versus true headlines were used. The headline sets were assembled by collecting claims that third-party fact-checking websites such as snopes.com or politifact.com had indicated were false, and collecting veridical claims from reputable news outlets. Furthermore, the headlines were pre-tested to determine their political orientation (on the basis of survey respondents’ evaluation of how favourable the headline, if entirely accurate, would be for the Democrats versus Republicans; see ref. 56 for details of the pre-testing procedure).

    Survey participants were recruited to rate the accuracy of each URL’s headline claim. Specifically, each participant was shown ten headlines randomly sampled from the full set of headlines, and rated how likely they thought it was that the headline was true using a 9-point scale from ‘not at all likely’ to ‘very likely’. For each headline, we created politically balanced crowd ratings by averaging the accuracy ratings of participants who identified as Democrats, averaging the accuracy ratings of participants who identified as Republicans and then averaging these two average ratings. We then classify URLs as inaccurate (and thus as misinformation) on the basis of crowd ratings if the politically balanced crowd rating was below the accuracy scale midpoint.
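The politically balanced crowd rating described above can be sketched as follows. This is a minimal illustration; the party labels and the midpoint of 5 on the 9-point scale are assumptions consistent with the description:

```python
def politically_balanced_rating(ratings):
    """ratings: list of (party, score) pairs, where party is "D" or "R"
    and score is a 1-9 accuracy rating. Returns the mean of the
    Democrat mean and the Republican mean, so each party's crowd
    contributes equally regardless of sample size."""
    dem = [s for p, s in ratings if p == "D"]
    rep = [s for p, s in ratings if p == "R"]
    return (sum(dem) / len(dem) + sum(rep) / len(rep)) / 2

def is_misinformation(ratings, midpoint=5):
    """Classify a URL as misinformation if its politically balanced
    rating falls below the scale midpoint (5 on a 1-9 scale)."""
    return politically_balanced_rating(ratings) < midpoint
```

Averaging the two party means before combining them prevents whichever party happens to have more raters from dominating the crowd verdict.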

    Additionally, the Twitter Academic API was used to identify all Twitter users who had posted primary tweets containing each URL. These primary tweets occurred between 2016 and 2022 (2016, 1%; 2017, 2%; 2018, 4%; 2019, 5%; 2020, 34%; 2021, 27%; 2022, 27%). The ideology of each of those users was estimated using the statistical model from ref. 39 on the basis of the ideological leaning of the accounts they followed. This allows us to count the number of liberals and conservatives who shared each URL on Twitter.

    The dataset pools across three different iterations of this procedure. The first iteration used 104 headlines selected to be politically balanced, such that the Democrat-leaning headlines were as Democrat-leaning as the Republican-leaning headlines were Republican-leaning; n = 1,319 participants from Amazon Mechanical Turk were then shown a random subset of headlines that were half politically neutral and half aligned with the participant’s partisanship. The second iteration used 155 headlines (of which 30 overlapped with headlines used in the first iteration); n = 853 participants recruited using Lucid rated randomly selected headlines. The third iteration used 149 headlines (no overlap with previous iterations); n = 866 participants recruited using Lucid rated randomly selected headlines. The Amazon Mechanical Turk sample was a pure convenience sample, whereas the Lucid samples were quota-matched to the national distribution on age, gender, ethnicity and geographic region, and then true independents were excluded. For the 30 headlines that overlapped between iterations 1 and 2, the politically balanced crowd accuracy ratings from Amazon Mechanical Turk and Lucid correlated with each other at r(28) = 0.75. Therefore, we collapsed the politically balanced ratings across platforms for those 30 headlines. In total, this resulted in a final dataset with fact-checker ratings, politically balanced crowd ratings and counts of numbers of posts by liberals and conservatives on Twitter for 378 unique URLs.

    Finally, we also classified the topic of each URL. To do so, we used Claude, an artificial intelligence system designed by Anthropic that emphasizes reliability and predictability, and has text summarization as one of its primary functions. We uploaded the full set of headlines to the artificial intelligence system, and first asked it to summarize the topics discussed in the headlines. We then asked it to indicate the topic covered in each specific headline, and manually inspected the results to ensure that the classifications were sensible. Next, we examined the frequency of each topic, synthesized the results into a set of six overarching topics and then finally asked the artificial intelligence system to categorize each headline into one of these six topics. This process led to the following distribution of topics: US Politics (174 headlines), Social Issues (91 headlines), COVID-19 (48 headlines), Business/Economy (41 headlines), Foreign Affairs (28 headlines) and Crime/Justice (26 headlines). As a test of the robustness of the classification, we also asked another artificial intelligence system, GPT-4, to classify the first 100 headlines into the six topics. We found that Claude and GPT-4 agreed on 80% of the headlines.

    Sharing intentions of false COVID-19 claims across 16 countries

    Here, we examine survey data from ref. 37. In these experiments, participants were recruited from 16 different countries using Lucid, with respondents quota-matched to the national distributions on age and gender in each country. Participants were shown ten false and ten true claims about COVID-19 (sampled from a larger set of 45 claims), presented without any source attribution. The claims were collected from fact-checking organizations in numerous countries, as well as sources such as the World Health Organization’s list of COVID-19 myths. This approach removes ideological variation in exposure to misinformation online13, as well as any potential source cues/effects, and directly measures variation in the decision about what to share.

    As in our other analyses, we complement the professional veracity ratings with crowd ratings. Specifically, n = 8,527 participants in the Accuracy condition rated the accuracy of each of the headlines they were shown using a 6-point Likert scale. We calculate the average accuracy rating for each statement in each country, and classify statements as misinformation if that average rating is below the scale midpoint.

    Our main analyses then focus on the responses of the n = 8,597 participants from the Sharing condition, in which participants indicated their likelihood of sharing each claim using a 6-point Likert scale. To calculate each user’s level of misinformation sharing, we first discretize the sharing intentions responses such that choices of 1 (Extremely unlikely), 2 (Moderately unlikely) or 3 (Slightly unlikely) on the Likert scale are counted as not shared, whereas choices of 4 (Slightly likely), 5 (Moderately likely) or 6 (Extremely likely) are counted as shared. We then determine, for each user, the fraction of shared articles that were (1) rated as false by fact-checkers, and (2) rated as below the accuracy scale midpoint on average by respondents in the Accuracy condition.
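The discretization and per-user scoring step can be sketched as follows. In this illustration, the `is_false` flag can carry either the fact-checker verdict or the crowd-based classification, so the same helper serves both of the measures described above:

```python
def misinformation_sharing_rate(responses):
    """responses: list of (likert, is_false) pairs, where likert is the
    1-6 sharing-intention response and is_false is whether the claim
    was classified as misinformation. Responses of 4-6 count as
    'shared'; returns the fraction of shared claims that were false,
    or None if the participant shared nothing."""
    shared = [is_false for likert, is_false in responses if likert >= 4]
    return sum(shared) / len(shared) if shared else None
```

Returning None for participants with no shared claims keeps them out of the denominator rather than treating them as sharing no misinformation.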

    We then ask how misinformation sharing varies with ideology within each country. Specifically, we construct a conservatism measure by averaging responses to two items from the World Values Survey that were included in the survey, which asked how participants would place their views on the scales of ‘Incomes should be made more equal’ versus ‘There should be greater incentives for individual effort’ and ‘Government should take more responsibility to ensure that everyone is provided for’ versus ‘People should take more responsibility to provide for themselves’ using 10-point Likert scales. Pilot data collected in the USA confirmed that responses to these two items correlated with self-report conservatism (r(956) = 0.32 for the first item and r(956) = 0.40 for the second item).

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


  • Persistent interaction patterns across social media platforms and over time

    Data collection

    In our study, data collection from various social media platforms was strategically designed to encompass various topics, ensuring maximal heterogeneity in the discussion themes. For each platform, where feasible, we focus on gathering posts related to diverse areas such as politics, news, environment and vaccinations. This approach aims to capture a broad spectrum of discourse, providing a comprehensive view of conversation dynamics across different content categories.

    Facebook

    We use datasets from previous studies that covered discussions about vaccines50, news51 and brexit52. For the vaccines topic, the resulting dataset contains around 2 million comments retrieved from public groups and pages over a period ranging from 2 January 2010 to 17 July 2017. For the news topic, we selected a list of pages from the Europe Media Monitor that reported the news in English. As a result, the obtained dataset contains around 362 million comments between 9 September 2009 and 18 August 2016. Furthermore, we collect a total of about 4.5 billion likes that users placed on posts and comments on these pages. Finally, for the brexit topic, the dataset contains around 460,000 comments from 31 December 2015 to 29 July 2016.

    Gab

    We collect data from the Pushshift.io archive (https://files.pushshift.io/gab/) concerning discussions taking place from 10 August 2016, when the platform was launched, to 29 October 2018, when Gab went temporarily offline due to the Pittsburgh shooting53. As a result, we collect a total of around 14 million comments.

    Reddit

    Data were collected from the Pushshift.io archive (https://pushshift.io/) for the period ranging from 1 January 2018 to 31 December 2022. For each topic, whenever possible, we manually identified and selected subreddits that best represented the targeted topics. As a result of this operation, we obtained about 800,000 comments from the r/conspiracy subreddit for the conspiracy topic. For the vaccines topic, we collected about 70,000 comments from the r/VaccineDebate subreddit, focusing on the COVID-19 vaccine debate. We collected around 400,000 comments from the r/News subreddit for the news topic. We collected about 70,000 comments from the r/environment subreddit for the climate change topic. Finally, we collected around 550,000 comments from the r/science subreddit for the science topic.

    Telegram

    We created a list of 14 channels, associating each with one of the topics considered in the study. For each channel, we manually collected messages and their related comments. As a result, from the four channels associated with the news topic (news notiziae, news ultimora, news edizionestraordinaria, news covidultimora), we obtained around 724,000 comments from posts between 9 April 2018 and 20 December 2022. For the politics topic, instead, the corresponding two channels (politics besttimeline, politics polmemes) produced a total of around 490,000 comments between 4 August 2017 and 19 December 2022. Finally, the eight channels assigned to the conspiracy topic (conspiracy bennyjhonson, conspiracy tommyrobinsonnews, conspiracy britainsfirst, conspiracy loomeredofficial, conspiracy thetrumpistgroup, conspiracy trumpjr, conspiracy pauljwatson, conspiracy iononmivaccino) produced a total of about 1.4 million comments between 30 August 2019 and 20 December 2022.

    Twitter

    We used a list of datasets from previous studies that includes discussions about vaccines54, climate change49 and news55 topics. For the vaccines topic, we collected around 50 million comments from 23 January 2010 to 25 January 2023. For the news topic, we extend the dataset used previously55 by collecting all threads composed of less than 20 comments, obtaining a total of about 9.5 million comments for a period ranging from 1 January 2020 to 29 November 2022. Finally, for the climate change topic, we collected around 9.7 million comments between 1 January 2020 and 10 January 2023.

    Usenet

    We collected data for the Usenet discussion system by querying the Usenet Archive (https://archive.org/details/usenet?tab=about). We selected topics expected to contain a large, broad and heterogeneous set of discussions involving active and well-populated newsgroups. As a result of this selection, we chose conspiracy, politics, news and talk as topic candidates for our analysis. For the conspiracy topic, we collected around 280,000 comments between 1 September 1994 and 30 December 2005 from the alt.conspiracy newsgroup. For the politics topic, we collected around 2.6 million comments between 29 June 1992 and 31 December 2005 from the alt.politics newsgroup. For the news topic, we collected about 620,000 comments between 5 December 1992 and 31 December 2005 from the alt.news newsgroup. Finally, for the talk topic, we collected all of the conversations from the newsgroup of the same name over a period ranging from 13 February 1989 to 31 December 2005, totalling around 2.1 million comments.

    Voat

    We used a dataset presented previously56 that covers the entire lifetime of the platform, from 9 January 2018 to 25 December 2020, including a total of around 16.2 million posts and comments shared by around 113,000 users in about 7,100 subverses (the equivalent of a subreddit for Voat). Similarly to previous platforms, we associated the topics to specific subverses. As a result of this operation, for the conspiracy topic, we collected about 1 million comments from the greatawakening subverse between 9 January 2018 and 25 December 2020. For the politics topic, we collected around 1 million comments from the politics subverse between 16 June 2014 and 25 December 2020. Finally, for the news topic, we collected about 1.4 million comments from the news subverse between 21 November 2013 and 25 December 2020.

    YouTube

    We used a dataset from a previous study that collected conversations about the climate change topic49, which we extended, consistently with the other platforms, by including conversations about the vaccines and news topics. The data collection process for YouTube is performed using the YouTube Data API (https://developers.google.com/youtube/v3). For the climate change topic, we collected around 840,000 comments between 16 March 2014 and 28 February 2022. For the vaccines topic, we collected conversations between 31 January 2020 and 24 October 2021 containing keywords about COVID-19 vaccines, namely Sinopharm, CanSino, Janssen, Johnson&Johnson, Novavax, CureVac, Pfizer, BioNTech, AstraZeneca and Moderna. As a result of this operation, we gathered a total of around 2.6 million comments to videos. Finally, for the news topic, we collected about 20 million comments between 13 February 2006 and 8 February 2022, including videos and comments from a list of news outlets, limited to the UK and provided by Newsguard (see the ‘Polarization and user leaning attribution’ section).

    Content moderation policies

    Content moderation policies are guidelines that online platforms use to monitor the content that users post on their sites. Platforms have different goals and audiences, and their moderation policies may vary greatly, with some placing more emphasis on free expression and others prioritizing safety and community guidelines.

    Facebook and YouTube have strict moderation policies prohibiting hate speech, violence and harassment57. To address harmful content, Facebook follows a ‘remove, reduce, inform’ strategy and uses a combination of human reviewers and artificial intelligence to enforce its policies58. YouTube’s community guidelines similarly cover a wide range of behaviours, such as vulgar language59 and harassment60, and in general do not allow hate speech or violence against individuals or groups based on various attributes61. To ensure that these guidelines are respected, the platform uses a mix of artificial intelligence algorithms and human reviewers62.

    Twitter also has a comprehensive content moderation policy and specific rules against hateful conduct63,64. They use automation65 and human review in the moderation process66. At the date of submission, Twitter’s content policies have remained unchanged since Elon Musk’s takeover, except that they ceased enforcing their COVID-19 misleading information policy on 23 November 2022. Their policy enforcement has faced criticism for inconsistency67.

    Reddit falls somewhere in between regarding how strict its moderation policy is. Reddit’s content policy has eight rules, including prohibiting violence, harassment and promoting hate based on identity or vulnerability68,69. Reddit relies heavily on user reports and volunteer moderators. Thus, it could be considered more lenient than Facebook, YouTube and Twitter regarding enforcing rules. In October 2022, Reddit announced that they intend to update their enforcement practices to apply automation in content moderation70.

    By contrast, Telegram, Gab and Voat take a more hands-off approach with fewer restrictions on content. Telegram has ambiguity in its guidelines, which arises from broad or subjective terms and can lead to different interpretations71. Although they mentioned they may use automated algorithms to analyse messages, Telegram relies mainly on users to report a range of content, such as violence, child abuse, spam, illegal drugs, personal details and pornography72. According to Telegram’s privacy policy, reported content may be checked by moderators and, if it is confirmed to violate their terms, temporary or permanent restrictions may be imposed on the account73. Gab’s Terms of Service allow all speech protected under the First Amendment to the US Constitution, and unlawful content is removed. They state that they do not review material before it is posted on their website and cannot guarantee prompt removal of illegal content after it has been posted74. Voat was once known as a ‘free-speech’ alternative to Reddit and allowed content even if it may be considered offensive or controversial56.

    Usenet is a decentralized online discussion system created in 1979. Owing to its decentralized nature, Usenet has been difficult to moderate effectively, and it has a reputation for being a place where controversial and even illegal content can be posted without consequence. Each individual group on Usenet can have its own moderators, who are responsible for monitoring and enforcing their group’s rules, and there is no single set of rules that applies to the entire platform75.

    Logarithmic binning and conversation size

    Owing to the heavy-tailed distributions of conversation length (Extended Data Fig. 1), to plot the figures and perform the analyses, we used logarithmic binning. Thus, according to its length, each thread of each dataset is assigned to 1 out of 21 bins. To ensure a minimal number of points in each bin, we iteratively change the left bound of the last bin so that it contains at least N = 50 elements (we set N = 100 in the case of Facebook news, due to its larger size). Specifically, considering threads ordered by increasing length, the size of the largest thread is set to that of the second-largest one, and the binning is recalculated accordingly until the last bin contains at least N points.

    For visualization purposes, we provide a normalization of the logarithmic binning outcome that consists of mapping discrete points into coordinates of the x axis such that the bins correspond to {0, 0.05, 0.1, …, 0.95, 1}.
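The binning procedure above can be sketched as follows. This is a simplified reconstruction assuming log-spaced bin edges computed with `numpy`; the authors' implementation may differ in detail (thread lengths are assumed to be at least 1):

```python
import numpy as np

def log_bin_threads(lengths, n_bins=21, min_count=50):
    """Assign each thread length to one of n_bins logarithmic bins.
    While the last bin holds fewer than min_count threads, the largest
    length is clipped to the second-largest and the edges recomputed.
    Returns the bin index of each (sorted) length and the bin edges."""
    lengths = np.sort(np.asarray(lengths, dtype=float))
    while True:
        edges = np.logspace(np.log10(lengths.min()),
                            np.log10(lengths.max()), n_bins + 1)
        idx = np.clip(np.digitize(lengths, edges) - 1, 0, n_bins - 1)
        if (idx == n_bins - 1).sum() >= min_count or lengths.max() == lengths.min():
            return idx, edges
        lengths[-1] = lengths[-2]  # clip the largest thread and retry
```

Dividing the resulting bin indices by n_bins − 1 then yields the normalized x-axis coordinates {0, 0.05, …, 0.95, 1} used for visualization.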

    To perform this part of the analysis, we select conversations belonging to the [0.7, 1] interval of the normalized logarithmic binning of thread length. This interval ensures that the conversations are sufficiently long and that we have a substantial number of threads. Participation and toxicity trends are obtained by applying a linear binning of 21 elements to the chronologically ordered sequence of comments in each such conversation. A breakdown of the resulting datasets is provided in Supplementary Table 2.

    Finally, to assess the equality of the growth rates of participation values in toxic and non-toxic threads (see the ‘Conversation evolution and toxicity’ section), we implemented the following linear regression model:

    $$\mathrm{participation}={\beta }_{0}+{\beta }_{1}\cdot \mathrm{bin}+{\beta }_{2}\cdot (\mathrm{bin}\cdot \mathrm{isToxic}),$$

    where the term β2 accounts for the effect that being a toxic conversation has on the growth of participation. Our results show that β2 is not significantly different from 0 in most original and validation datasets (Supplementary Tables 8 and 11).
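The model above can be fitted with ordinary least squares. A self-contained sketch using `numpy` (the function and variable names are illustrative, and the original analysis may have used a dedicated statistics package that additionally reports standard errors for β2):

```python
import numpy as np

def participation_growth_fit(bins, participation, is_toxic):
    """OLS fit of participation = b0 + b1*bin + b2*(bin*isToxic).
    Returns (b0, b1, b2); b2 captures how being a toxic conversation
    changes the participation growth rate across bins."""
    bins = np.asarray(bins, dtype=float)
    X = np.column_stack([np.ones_like(bins),            # intercept b0
                         bins,                          # slope b1
                         bins * np.asarray(is_toxic, dtype=float)])  # interaction b2
    beta, *_ = np.linalg.lstsq(X, np.asarray(participation, dtype=float),
                               rcond=None)
    return beta
```

If toxic and non-toxic threads share the same participation trend, the fitted β2 is indistinguishable from zero, which is the pattern reported in the text.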

    Toxicity detection and validation of the models used

    The problem of detecting toxicity is highly debated, to the point that there is currently no agreement on the very definition of toxic speech64,76. A toxic comment can be regarded as one that includes obscene or derogatory language32, that uses harsh, abusive language and personal attacks33, or contains extremism, violence and harassment11, just to give a few examples. Even though toxic speech should, in principle, be distinguished from hate speech, which is commonly more related to targeted attacks that denigrate a person or a group on the basis of attributes such as race, religion, gender, sex, sexual orientation and so on77, it sometimes may also be used as an umbrella term78,79. This lack of agreement directly reflects the challenging and inherent subjective nature of the concept of toxicity. The complexity of the topic makes it particularly difficult to assess the reliability of natural language processing models for automatic toxicity detection despite the impressive improvements in the field. Modern natural language processing models, such as Perspective API, are deep learning models that leverage word-embedding techniques to build representations of words as vectors in a high-dimensional space, in which a metric distance should reflect the conceptual distance among words, therefore providing linguistic context. A primary concern regarding toxicity detection models is their limited ability to contextualize conversations11,80. These models often struggle to incorporate factors beyond the text itself, such as the participant’s personal characteristics, motivations, relationships, group memberships and the overall tone of the discussion11. Consequently, what is considered to be toxic content can vary significantly among different groups, such as ethnicities or age groups81, leading to potential biases. These biases may stem from the annotators’ backgrounds and the datasets used for training, which might not adequately represent cultural heterogeneity. 
Moreover, subtle forms of toxic content, like indirect allusions, memes and inside jokes targeted at specific groups, can be particularly challenging to detect. Word embeddings equip current classifiers with a rich linguistic context, enhancing their ability to recognize a wide range of patterns characteristic of toxic expression. However, the requirements for understanding the broader context of a conversation, such as personal characteristics, motivations and group dynamics, remain beyond the scope of automatic detection models. We acknowledge these inherent limitations in our approach. Nonetheless, reliance on automatic detection models is essential for large-scale analyses of online toxicity like the one conducted in this study. We specifically resort to the Perspective API for this task, as it represents state-of-the-art automatic toxicity detection, offering a balance between linguistic nuance and scalable analysis capabilities. To define an appropriate classification threshold, we draw from the existing literature64, which uses 0.6 as the threshold for considering a comment to be toxic. This threshold can also be considered a reasonable one as, according to the developer guidelines offered by Perspective, it would indicate that the majority of the sample of readers, namely 6 out of 10, would perceive that comment as toxic. Due to the limitations mentioned above (for a criticism of Perspective API, see ref. 82), we validate our results by performing a comparative analysis using two other toxicity detectors: Detoxify (https://github.com/unitaryai/detoxify), which is similar to Perspective, and IMSYPP, a classifier developed for a European Project on hate speech16 (https://huggingface.co/IMSyPP). In Supplementary Table 14, the percentages of agreement among the three models in classifying 100,000 comments taken randomly from each of our datasets are reported. For Detoxify we used the same binary toxicity threshold (0.6) as used with Perspective. 
Although IMSYPP operates on a distinct definition of toxicity as outlined previously16, our comparative analysis shows a general agreement in the results. This alignment, despite the differences in underlying definitions and methodologies, underscores the robustness of our findings across various toxicity detection frameworks. Moreover, we perform the core analyses of this study using all classifiers on a further, vast and heterogeneous dataset. As shown in Supplementary Figs. 1 and 2, the results regarding toxicity increase with conversation size and user participation and toxicity are quantitatively very similar. Furthermore, we verify the stability of our findings under different toxicity thresholds. Although the main analyses in this paper use the threshold value recommended by the Perspective API, set at 0.6, to minimize false positives, our results remain consistent even when applying a less conservative threshold of 0.5. This is demonstrated in Extended Data Fig. 5, confirming the robustness of our observations across varying toxicity levels. For this study, we used the API support for languages prevalent in the European and American continents, including English, Spanish, French, Portuguese, German, Italian, Dutch, Polish, Swedish and Russian. Detoxify also offers multilingual support. However, IMSYPP is limited to English and Italian text, a factor considered in our comparative analysis.

    Polarization and user leaning attribution

    Our approach to measuring controversy in a conversation is based on estimating the degree of political partisanship among the participants. This measure is closely related to the political science concept of political polarization. Political polarization is the process by which political attitudes diverge from moderate positions and gravitate towards ideological extremes, as described previously83. By quantifying the level of partisanship within discussions, we aim to provide insights into the extent and nature of polarization in online debates. In this context, it is important to distinguish between ‘ideological polarization’ and ‘affective polarization’. Ideological polarization refers to divisions based on political viewpoints. By contrast, affective polarization is characterized by positive emotions towards members of one’s group and hostility towards those of opposing groups84,85. Here we focus specifically on ideological polarization. The subsequent description of our procedure for attributing user political leanings will further clarify this focus. On online social media, the individual leaning of a user toward a topic can be inferred through the content produced or the endorsement shown toward specific content. In this study, we consider the endorsement of users to news outlets of which the political leaning has been evaluated by trustworthy external sources. Although not without limitations—which we address below—this is a standard approach that has been used in several studies, and has become a common and established practice in the field of social media analysis due to its practicality and effectiveness in providing a broad understanding of political dynamics on these online platforms1,43,86,87,88. We label news outlets with a political score based on the information reported by Media Bias/Fact Check (MBFC) (https://mediabiasfactcheck.com), integrating with the equivalent information from Newsguard (https://www.newsguardtech.com/). 
    MBFC is an independent fact-checking organization that rates news outlets on the basis of the reliability and the political bias of the content that they produce and share. Similarly, Newsguard is a tool created by an international team of journalists that provides news outlet trust and political bias scores. Following standard methods used in the literature1,43, we calculated the individual leaning of a user l ∈ [−1, 1] as the average of the leaning scores lc ∈ [−1, 1] attributed to each piece of content they produced/shared, where lc results from a mapping of the news organizations’ political scores provided by MBFC and Newsguard, respectively: [left, centre-left, centre, centre-right, right] to [−1, −0.5, 0, 0.5, 1], and [far left, left, right, far right] to [−1, −0.5, 0.5, 1]. Our datasets have different structures, so we have to evaluate user leanings in different ways. For Facebook News, we assign a leaning score to users who posted a like at least three times and commented at least three times under news outlet pages that have a political score. For Twitter News, a leaning is assigned to users who posted at least 15 comments under scored news outlet pages. For Twitter Vaccines and Gab, we consider users who shared content produced by scored news outlet pages at least three times. A limitation of our approach is that engaging with politically aligned content does not always imply agreement; users may interact with opposing viewpoints for critical discussion. However, research indicates that users predominantly share content aligning with their own views, especially in politically charged contexts87,89,90. Moreover, our method captures users who actively express their political leanings, omitting the ‘passive’ ones. This is due to the lack of available data on users who do not explicitly state their opinions. Nevertheless, analysing active users offers valuable insights into the discourse of those most engaged and influential on social media platforms.
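The category-to-score mapping and averaging step can be sketched as follows (the dictionary keys mirror the category labels quoted above; the helper name is ours):

```python
# Mapping of outlet bias categories to leaning scores in [-1, 1]
MBFC_SCORES = {"left": -1.0, "centre-left": -0.5, "centre": 0.0,
               "centre-right": 0.5, "right": 1.0}
NEWSGUARD_SCORES = {"far left": -1.0, "left": -0.5,
                    "right": 0.5, "far right": 1.0}

def user_leaning(shared_labels, scores=MBFC_SCORES):
    """Average the mapped political scores of the outlets a user
    endorsed; returns a leaning in [-1, 1], or None if the user
    endorsed no rated outlet."""
    vals = [scores[lab] for lab in shared_labels if lab in scores]
    return sum(vals) / len(vals) if vals else None
```

Users who never engage with a rated outlet receive no leaning, matching the text's note that only actively expressing users are captured.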

    Burst analysis

    We applied the Kleinberg burst detection algorithm46 (see the ‘Controversy and toxicity’ section) to all conversations with at least 50 comments in a dataset. In our analysis, we randomly sample up to 5,000 conversations for each conversation size. To ensure the reliability of our data, we exclude conversations with an excessive number of duplicate timestamps, defined as more than 10 consecutive duplicates or over 100 within the first 24 h. This criterion helps to mitigate the influence of bots, which could distort the patterns of human activity. Furthermore, we focus on the first 24 h of each thread to analyse streams of comments during their peak activity period. Consequently, Usenet was excluded from this analysis: its unique usage characteristics render such a time-constrained analysis inappropriate, as its activity patterns do not align with those of the other platforms under consideration. By reconstructing the density profile of the comment stream, the algorithm divides the stream’s entire interval into subintervals on the basis of their level of intensity. Burst levels are labelled as discrete positive values, with higher levels representing segments of higher activity. To avoid considering flat-density phases, threads with a maximum burst level equal to 2 are excluded from this analysis. To assess whether a higher intensity of comments results in higher comment toxicity, we perform a Mann–Whitney U-test91, with Bonferroni correction for multiple testing, between the distributions of the fraction of toxic comments ti in three intensity phases: during the peak of engagement and at the highest levels before and after it. Extended Data Table 4 shows the corrected P values of each test, at a 0.99 confidence level, with H1 indicated in the column header. An example of the distribution of the frequency of toxic comments in threads at the three phases of a conversation (pre-peak, peak and post-peak) is reported in Fig. 4c.
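The timestamp-based bot filter described above can be sketched as follows. This is an illustrative reading of the criterion, not the authors’ implementation: we interpret ‘consecutive’ duplicates as a run of identical timestamps, and all names are assumptions.

```python
# Sketch of the thread-filtering rule: discard conversations with a
# run of more than 10 identical timestamps, or more than 100 duplicate
# timestamps overall, within the first 24 h of the thread.
DAY_SECONDS = 24 * 3600

def keep_thread(timestamps, max_run=10, max_total=100):
    """timestamps: Unix seconds, sorted ascending. Returns False for
    threads whose duplicate-timestamp pattern suggests bot activity."""
    window = [t for t in timestamps if t - timestamps[0] <= DAY_SECONDS]
    dup_total, run = 0, 1
    for prev, cur in zip(window, window[1:]):
        if cur == prev:
            run += 1
            dup_total += 1
            if run > max_run or dup_total > max_total:
                return False
        else:
            run = 1
    return True
```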

    Toxicity detection on Usenet

    As discussed in the section on toxicity detection and the Perspective API above, automatic detectors derive their understanding of toxicity from the annotated datasets that they are trained on. The Perspective API is predominantly trained on recent texts, and its human labellers conform to contemporary cultural norms. Thus, although our dataset dates back no further than the early 1990s, we discuss the viability of applying the Perspective API to Usenet and provide a validation analysis. Contemporary society, especially in Western contexts, is more sensitive to issues of toxicity, including gender, race and sexual orientation, than it was a few decades ago. This means that some comments identified as toxic today, including those from older platforms such as Usenet, might not have been considered as such in the past. However, this discrepancy does not significantly affect our analysis, which is centred on current standards of toxicity. On the other hand, changes in linguistic features may have some repercussions: words and locutions that were frequently used in the 1990s may appear only sparsely in today’s language, potentially making Perspective less effective at classifying short texts that contain them. We therefore evaluated the impact that such a scenario could have on our results. In light of the above considerations, we treat texts labelled as toxic as correctly classified, whereas we assume that there is a fixed probability p that a comment may be incorrectly labelled as non-toxic. Consequently, we randomly designate a proportion p of non-toxic comments, relabel them as toxic and compute the toxicity versus conversation size trend (Fig. 2) on the altered dataset across various values of p. Specifically, for each value, we simulate 500 different trends, collecting their regression slopes to obtain a null distribution. 
To assess whether the probability of error could lead to significant differences in the observed trend, we compute the fraction f of slopes lying outside the interval (−|s|, |s|), where s is the slope of the observed trend. We report the results in Supplementary Table 9 for different values of p. In agreement with our previous analyses, we consider the slope to differ significantly from those obtained from randomized data if f is less than 0.05.
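The relabelling experiment described above can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions (binary per-comment toxicity labels, an ordinary least-squares slope over per-size toxicity fractions); all names are ours, not the authors’.

```python
# Sketch of the robustness check: flip a random fraction p of
# non-toxic labels to toxic, recompute the toxicity-vs-size trend and
# collect the regression slopes into a null distribution.
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def null_slopes(conversations, p, n_sim=500, seed=0):
    """conversations: list of (size, [0/1 toxicity labels]) pairs."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sim):
        xs, ys = [], []
        for size, labels in conversations:
            flipped = [1 if (lab == 1 or rng.random() < p) else 0
                       for lab in labels]
            xs.append(size)
            ys.append(sum(flipped) / len(flipped))
        slopes.append(ols_slope(xs, ys))
    return slopes

def fraction_outside(slopes, s):
    """Fraction f of null slopes outside the interval (-|s|, |s|)."""
    return sum(abs(v) >= abs(s) for v in slopes) / len(slopes)
```

With this machinery, the criterion above amounts to checking whether `fraction_outside(null_slopes(data, p), observed_slope)` falls below 0.05.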

    We observed that only the Usenet Talk dataset is sensitive to small error probabilities; the others do not show a significant difference. Consequently, our results indicate that the Perspective API is suitable for application to Usenet data in our analyses, notwithstanding the potential linguistic and cultural shifts that might affect the classifier’s reliability on older texts.

    Toxicity of short conversations

    Our study focuses on the relationship between user participation and the toxicity of conversations, particularly in engaged or prolonged discussions. A potential concern is that concentrating on longer threads overlooks conversations that terminate quickly owing to early toxicity, thereby potentially biasing our analysis. To address this, we analysed shorter conversations, comprising 6 to 20 comments, in each dataset. In particular, we computed the distributions of the toxicity scores of the first three and last three comments in each thread. This approach helps to ensure that our analysis accounts for a range of conversation lengths and patterns of toxicity development, providing a more comprehensive understanding of the dynamics at play. As shown in Supplementary Fig. 3, for each dataset, the two distributions are highly similar, meaning that, in short conversations, the last comments are not significantly more toxic than the initial ones, indicating that the potential effects mentioned above do not undermine our conclusions. Regarding our analysis of longer threads, we note that the participation quantity can give rise to similar trends in rather different situations. For example, high participation can result from many users taking part in the conversation, but also from a small group of users each contributing equally over time. Similarly, in very large discussions, the contributions of individual outliers may remain hidden. By measuring participation, these and other borderline cases may not be distinguished from the statistically most likely discussion dynamics but, ultimately, this lack of discriminatory power does not affect our findings or the validity of the conclusions that we draw.
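The first-versus-last-comment comparison for short threads can be sketched as follows, under a hypothetical data layout (one chronologically ordered list of toxicity scores per thread); the function name and parameters are illustrative.

```python
# Sketch: pool the toxicity scores of the first three and last three
# comments of each thread with 6-20 comments, so that the two pooled
# distributions can be compared.
def first_last_scores(threads, min_len=6, max_len=20, k=3):
    """threads: list of chronologically ordered toxicity-score lists."""
    first, last = [], []
    for scores in threads:
        if min_len <= len(scores) <= max_len:
            first.extend(scores[:k])
            last.extend(scores[-k:])
    return first, last
```

A two-sample test (for example, `scipy.stats.mannwhitneyu`) could then quantify the similarity of the two pooled distributions.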

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


