<aside>
🔮 Scope: The goal of this research is to identify existing patterns for prosocial norming, proactive moderation, and moderation more broadly.
</aside>
AI moderation options
- GPT-4
- OpenAI is experimenting with using GPT-4 for content moderation. The process works as follows:
- Once a policy guideline is written, policy experts can create a golden set of data by identifying a small number of examples and assigning them labels according to the policy. (Lilian Weng; Vik Goel; Andrea Vallone)
- Then, GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers. (Lilian Weng; Vik Goel; Andrea Vallone)
- By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion, and provide further clarification in the policy accordingly. These labeling and review steps can be repeated until the experts are satisfied with the policy quality. (Lilian Weng; Vik Goel; Andrea Vallone)
- This iterative process yields refined content policies that are translated into classifiers, enabling deployment of the policy and content moderation at scale (a minimal sketch of the loop appears after this list). (Lilian Weng; Vik Goel; Andrea Vallone)
- Benefits of AI moderation: OpenAI argues that AI-based content moderation improves on human moderation in three ways: more consistent labels, faster feedback loops, and reduced mental burden on human moderators.
- “More consistent labels. Content policies are continually evolving and often very detailed. People may interpret policies differently or some moderators may take longer to digest new policy changes, leading to inconsistent labels. In comparison, LLMs are sensitive to granular differences in wording and can instantly adapt to policy updates to offer a consistent content experience for users.” (Lilian Weng; Vik Goel; Andrea Vallone)
- “Faster feedback loop. The cycle of policy updates – developing a new policy, labeling, and gathering human feedback – can often be a long and drawn-out process. GPT-4 can reduce this process down to hours, enabling faster responses to new harms.” (Lilian Weng; Vik Goel; Andrea Vallone)
- “Reduced mental burden. Continual exposure to harmful or offensive content can lead to emotional exhaustion and psychological stress among human moderators. Automating this type of work is beneficial for the wellbeing of those involved.” (Lilian Weng; Vik Goel; Andrea Vallone)
- Accuracy: Students in a Stanford trust and safety engineering course found that GPT-4’s moderation judgments were more accurate than either their own models or Google Jigsaw’s Perspective API (a comparison of this kind is sketched after this list). “GPT-4 was often the winner, with only a little bit of prompt engineering necessary to get to good results,” according to Alex Stamos, who teaches the course. (Newton)
- Limitations
- Potential bias: “Judgments by language models are vulnerable to undesired biases that might have been introduced into the model during training. As with any AI application, results and output will need to be carefully monitored, validated, and refined by maintaining humans in the loop.” (Lilian Weng; Vik Goel; Andrea Vallone)
- Cost: Given the high cost of GPT-4, using it for content moderation is also currently more expensive than other moderation tools. (Newton)
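The iterative policy-refinement loop described above can be illustrated in code. The sketch below is a minimal interpretation of OpenAI's described workflow, not their actual implementation: the model name, prompt wording, two-label scheme, and placeholder policy text are all illustrative assumptions.

```python
# Sketch of the iterative policy-refinement loop (steps paraphrased from
# OpenAI's description; prompts, labels, and model choice are assumptions).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

POLICY = "..."  # the written policy guideline (step 1) goes here

# Step 1 (continued): a small golden set labeled by policy experts.
golden_set = [
    {"text": "example post A", "human_label": "violates"},
    {"text": "example post B", "human_label": "allowed"},
]

def gpt4_label(text: str, policy: str) -> str:
    """Step 2: GPT-4 labels an example from the policy alone, without seeing the human answer."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": (f"You are a content-policy labeler. Policy:\n{policy}\n"
                         "Answer with exactly one word: violates or allowed.")},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

def gpt4_explain(text: str, policy: str, label: str) -> str:
    """Step 3: ask GPT-4 for its reasoning so experts can spot ambiguous policy wording."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": f"Policy:\n{policy}"},
            {"role": "user",
             "content": (f"You labeled this text as '{label}':\n{text}\n"
                         "Explain which part of the policy drove that judgment.")},
        ],
    )
    return resp.choices[0].message.content

# Surface disagreements between GPT-4 and the golden labels; experts clarify the
# policy wording in response, and the labeling repeats until the labels converge.
for example in golden_set:
    model_label = gpt4_label(example["text"], POLICY)
    if model_label != example["human_label"]:
        print("Disagreement on:", example["text"])
        print("GPT-4 reasoning:", gpt4_explain(example["text"], POLICY, model_label))
```

Once the loop converges, the accumulated labels could be used to train a smaller, cheaper classifier for at-scale deployment, which corresponds to the "translated into classifiers" step in the notes above.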
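For the accuracy comparison mentioned above, a minimal sketch of scoring the same comment with OpenAI's moderation endpoint and Jigsaw's Perspective API might look like the following. The environment-variable name, the choice of the TOXICITY attribute, and the example comment are illustrative assumptions, not the Stanford course's actual methodology.

```python
# Score one comment with OpenAI's moderation endpoint and Perspective API.
import os
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
PERSPECTIVE_KEY = os.environ["PERSPECTIVE_API_KEY"]  # hypothetical variable name

def openai_flag(text: str) -> bool:
    """Return True if OpenAI's moderation endpoint flags the text."""
    resp = client.moderations.create(input=text)
    return resp.results[0].flagged

def perspective_toxicity(text: str) -> float:
    """Return Perspective API's TOXICITY summary score (0.0 to 1.0)."""
    url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           f"comments:analyze?key={PERSPECTIVE_KEY}")
    body = {"comment": {"text": text}, "requestedAttributes": {"TOXICITY": {}}}
    resp = requests.post(url, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

comment = "Example comment to evaluate"
print("OpenAI flagged:", openai_flag(comment))
print("Perspective toxicity:", perspective_toxicity(comment))
```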
Moderation challenges
- What are the key challenges faced by moderators when dealing with offensive content, misinformation, or harassment?
- “Platforms’ community standards change constantly, and thus must be enforced differently from day to day. Communicating those changes to a global workforce speaking dozens of languages introduces daily opportunities for mass confusion.” (Newton)
- “For platform policy makers, this introduces significant friction into the development and implementation of new rules. Mostly it just means that everything takes more time — time to distribute the policy, time to educate the moderators, time to test the effect of implementing it, and so on.” (Newton)
- What psychological toll does moderation take on moderators, and how can platforms support their mental well-being?
- In the case of news-site comment moderation, Gina Masullo and Martin Riedl at the Center for Media Engagement found that when participants moderated uncivil comments, their own trust in the news outlet decreased and their emotional exhaustion increased. (Gina Masullo; Martin Riedl)
- “There are key steps platforms can and occasionally do take to protect the mental health of moderators, whether they work in house or for outsourcing companies. They can offer high wages and good health care, including mental health care. They can build moderating tools that reduce the impact of reviewing disturbing content by turning it grayscale, turning off audio by default, and blurring it until the moderator is ready to look at it.” (Newton) A minimal sketch of that grayscale-and-blur handling follows below.
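The grayscale-and-blur safeguard Newton describes could be prototyped with a few lines of image handling. This is a minimal sketch using Pillow; the function name and blur radius are illustrative, and real moderation consoles would implement this in the review UI rather than as a standalone script.

```python
# Produce a softened preview of an image queued for human review: grayscale
# plus heavy blur, shown by default until the moderator chooses to reveal it.
from PIL import Image, ImageFilter

def soften_for_review(path: str, blur_radius: int = 12) -> Image.Image:
    """Return a grayscale, blurred copy of an image awaiting moderation."""
    img = Image.open(path)
    grayscale = img.convert("L")  # strip color to reduce visual impact
    return grayscale.filter(ImageFilter.GaussianBlur(radius=blur_radius))

# Usage: show the softened copy first; load the original only on request.
preview = soften_for_review("queued_item.jpg")
preview.save("queued_item_preview.jpg")
```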
References (Kinds of moderators and moderation)