How to implement content moderation with OpenAI's API in Python

Content moderation is an important aspect of maintaining a safe and respectful online environment. OpenAI’s Moderation API provides a tool to automatically check whether content complies with OpenAI’s usage policies. In this Answer, we’ll explore how to implement content moderation using OpenAI’s API in Python.

Understanding OpenAI’s Moderation API

OpenAI’s Moderation API is designed to identify content that violates specific categories, such as hate speech, harassment, self-harm, sexual content, and violence. The API classifies content into various categories, each with a specific description. For example:

  • Hate: Content that expresses or promotes hate based on race, gender, etc.

  • Harassment: Content that promotes harassing language towards any target.

  • Self-harm: Content that promotes or depicts acts of self-harm.

  • Sexual: Content meant to arouse sexual excitement.

  • Violence: Content that depicts death, violence, or physical injury.

The moderation endpoint is free to use when monitoring the inputs and outputs of OpenAI APIs, and it’s continuously being improved for accuracy.

Setting up the environment

Before we start, you'll need Python installed on your system, along with the OpenAI Python client library. You can install the latter using pip:

pip install openai

You’ll also need to obtain an API key from OpenAI, which will be used to authenticate your requests.
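Rather than hard-coding the key, it's common to store it in an environment variable. The example later in this Answer reads the key from a variable named SECRET_KEY, which you can set in a shell like so (the value shown is a placeholder):

```shell
export SECRET_KEY="your-api-key"
```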

Implementing content moderation in Python

Importing the library

First, import the OpenAI client and create a client instance with your API key. (With version 1.0+ of the openai package, requests go through an OpenAI client object rather than module-level calls.)

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

Creating a moderation request

You can create a moderation request by calling the client's moderations.create method and passing the text you want to check.

response = client.moderations.create(
    input="Educative is a website for developers"
)

Analyzing the response

The response will contain information about whether the content is flagged and the categories it violates.

import os
from openai import OpenAI

# Read the API key from an environment variable instead of hard-coding it
client = OpenAI(api_key=os.environ["SECRET_KEY"])

response = client.moderations.create(
    input="Input text goes here"
)

result = response.results[0]
flagged = result.flagged
categories = result.categories
category_scores = result.category_scores

print("Flagged:", flagged)
print("Categories:", categories)
print("Category Scores:", category_scores)

The flagged field is true if the content violates OpenAI's policies and false otherwise. The categories object contains a boolean flag for each category, and the category_scores object contains the model's confidence score for each category.
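In practice, the threshold for blocking content depends on your application, so you might combine the flagged field with your own score check. Below is a minimal, illustrative sketch: moderate_decision is a hypothetical helper name (not part of the OpenAI API), and it takes the scores as a plain dictionary (which you can obtain from the response object, e.g. via its model_dump() method in the v1 SDK):

```python
def moderate_decision(flagged: bool, category_scores: dict, threshold: float = 0.5) -> str:
    """Return 'block' if the API flagged the text or any category score
    meets or exceeds the chosen threshold; otherwise return 'allow'."""
    if flagged or any(score >= threshold for score in category_scores.values()):
        return "block"
    return "allow"

# Example with mock score payloads (shapes mirror results[0] of a response):
print(moderate_decision(False, {"hate": 0.01, "violence": 0.02}))  # allow
print(moderate_decision(True, {"hate": 0.91, "violence": 0.02}))   # block
```

A lower threshold blocks more aggressively; tune it against samples of your own content.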

Handling different content types

For higher accuracy, especially with long pieces of text, consider splitting the text into chunks of fewer than 2,000 characters each and moderating each chunk separately.

Conclusion

OpenAI’s Moderation API offers an easy solution for content moderation, allowing developers to ensure that user-generated content aligns with community guidelines and usage policies. By integrating this API into your applications, you can create a safer and more respectful online environment.

Copyright ©2025 Educative, Inc. All rights reserved