How to use OpenAI APIs: Text, image, and audio generation

Integrating OpenAI’s chat completion, image, and audio APIs can supercharge your applications with the ability to generate text, create images, and work with audio data. In this Answer, we provide a step-by-step guide, with code examples, to help you set up these APIs and start building quickly.

This Answer covers the following examples:

  1. Text generation with GPT-4

  2. Image generation with DALL·E-3

  3. Generating an audio response to a prompt with gpt-4o-audio-preview

  4. Text-to-audio conversion with TTS-1

  5. Audio-to-text conversion with Whisper-1

Let's start!

Quick setup

To integrate OpenAI models, make sure you have the following:

  1. OpenAI API key: Get your API key from the OpenAI platform if you don't have one.

  2. Python and OpenAI library: Ensure you have Python installed, along with OpenAI's official library.

Install the OpenAI library if you haven’t done so yet with the following command:

pip install openai
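
To verify the installation, you can import the library and print its version; the examples in this Answer assume a 1.x release of the openai package:

# Quick sanity check: import the library and print its version
import openai
print(openai.__version__)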

Text generation with GPT-4

OpenAI’s GPT models can generate text, answer questions, and handle other natural language processing tasks. Here’s how to integrate and use them:

import openai
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
def generate_text(prompt):
    response = openai.chat.completions.create(
        model="gpt-4", # Or choose another model like "gpt-3.5-turbo"
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
# Example
prompt = "Write a poem about the ocean."
print(generate_text(prompt))
  • Line 1: Import the openai library to interact with the OpenAI API.

  • Line 3: Set your OpenAI API key to authenticate the API requests using openai.api_key = 'YOUR_API_KEY'.

  • Line 5: Make a call to the OpenAI API using chat.completions.create(). This method requires the model parameter (e.g., "gpt-4") and a list of messages.

  • Lines 6–7: Choose the model and provide the prompt within the messages array. The model generates a response based on the prompt provided.

  • Line 9: Extract and return the content from the response. The response from the API call contains the model’s reply in response.choices[0].message.content.
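
The messages list can hold more than a single user prompt. As a minimal sketch (the system prompt text below is purely illustrative), you can add a system message to steer the model’s tone:

import openai
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
def generate_text_with_persona(prompt):
    # The system message sets the assistant's behavior before the user prompt
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a concise poetry tutor."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content
# Example
print(generate_text_with_persona("Write a haiku about the ocean."))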

Image generation with DALL·E-3

To create images, OpenAI offers the DALL·E model, which generates images based on text prompts. Here’s how to integrate image generation:

import openai
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
def generate_image(prompt):
    response = openai.images.generate(
        prompt=prompt,
        model="dall-e-3",
        size="1024x1024",
        quality="standard",
        n=1,
    )
    image_url = response.data[0].url
    return image_url
# Example usage
prompt = "A futuristic cityscape at sunset."
print(generate_image(prompt))

Note: Copy and paste the generated image URL into your browser to view the image.

  • Line 5: Make a call to OpenAI's images.generate() method to generate an image based on the provided prompt.

  • Lines 6–10: Pass prompt as the input text that will guide the image generation. Also, specify the model (e.g., "dall-e-3"), the desired image size ("1024x1024"), and the image quality ("standard"). You can also set n=1 to generate one image.

  • Line 12: The API returns a response, and the URL of the generated image is stored in response.data[0].url.
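
Because the generated URL points to temporary storage and typically expires after a short time, you may want to download the image right away. Here’s a minimal sketch that reuses generate_image() from above and assumes the requests library is installed (pip install requests):

import requests
def save_image(image_url, file_path="generated_image.png"):
    # Fetch the image bytes from the temporary URL and write them to disk
    image_bytes = requests.get(image_url).content
    with open(file_path, "wb") as image_file:
        image_file.write(image_bytes)
    return file_path
# Example usage
print(save_image(generate_image("A futuristic cityscape at sunset.")))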

Generating an audio response to a prompt with gpt-4o-audio-preview

OpenAI’s gpt-4o-audio-preview model enables realistic audio responses to text prompts. Here’s how to integrate audio generation into your applications:

import base64
import openai
import streamlit as st
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
# Function to interact with OpenAI's API and generate audio
def generate_audio(prompt):
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-audio-preview",
            modalities=["text", "audio"],
            audio={"voice": "alloy", "format": "mp3"},
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )
        # Decode the base64-encoded audio data and save it to a file
        audio_data = base64.b64decode(response.choices[0].message.audio.data)
        audio_file_path = "response.mp3"
        with open(audio_file_path, "wb") as audio_file:
            audio_file.write(audio_data)
        return audio_file_path, None
    except Exception as e:
        return None, str(e)

# Streamlit UI
st.title("Audio Response Generator with OpenAI")

# Input prompt
prompt = st.text_input("Enter your text prompt:", "Is a golden retriever a good family dog?")

# Button to generate audio
if st.button("Generate audio response"):
    if prompt.strip():
        with st.spinner("Generating response..."):
            audio_file_path, error = generate_audio(prompt)

            if error:
                st.error(f"An error occurred: {error}")
            else:
                st.audio(audio_file_path, autoplay=True)

                # Provide a download link
                with open(audio_file_path, "rb") as file:
                    b64_audio = base64.b64encode(file.read()).decode()
                download_link = f'<a href="data:audio/mp3;base64,{b64_audio}" download="response.mp3">Download the Audio</a>'
                st.markdown(download_link, unsafe_allow_html=True)
    else:
        st.warning("Please enter a prompt before generating audio.")
Code to generate audio responses to a text prompt using OpenAI's gpt-4o-audio-preview model
  • Lines 9–19: Make a call to OpenAI's chat.completions.create() method to generate a response that includes both text and audio. Pass the required parameters:

    • model specifies the GPT model being used; use "gpt-4o-audio-preview" to generate the audio response.

    • modalities defines the types of output expected, in this case, both "text" and "audio".

    • audio specifies the audio parameters, such as the voice type (e.g., "alloy") and the audio format (e.g., "mp3").

    • messages contains the prompt, structured with a role ("user") and the content of the prompt.

  • Line 21: The response from the API includes audio data in response.choices[0].message.audio.data. Decode this base64-encoded data using base64.b64decode() and store it in the audio_data variable, which is then written to an audio file.
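
The generate_audio() function above doesn’t depend on Streamlit, so you can also call it from a plain script. A minimal usage sketch:

# Call the helper directly, without the Streamlit UI
audio_file_path, error = generate_audio("Is a golden retriever a good family dog?")
if error:
    print(f"An error occurred: {error}")
else:
    print(f"Audio response saved to {audio_file_path}")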

Text-to-audio conversion with TTS-1

OpenAI’s TTS-1 model transforms text into natural-sounding speech. Here’s how to integrate text-to-audio conversion into your applications:

import openai
import streamlit as st
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
# Function to generate audio from OpenAI's speech API
def generate_audio(prompt):
    try:
        # Use OpenAI's speech model to generate audio
        with openai.audio.speech.with_streaming_response.create(
            model="tts-1",
            voice="alloy",
            input=prompt,
            response_format="mp3"
        ) as response:
            # Save the audio response to a file
            audio_file_path = "audio.mp3"
            response.stream_to_file(audio_file_path)

        return audio_file_path, None
    except Exception as e:
        return None, str(e)

# Streamlit UI
st.title("OpenAI Text to Speech Converter")

# Input prompt for generating audio
prompt = st.text_input("Enter your text:", "Hello, I am speaking out loud the text you provided.")

# Button to generate audio
if st.button("Give My Text the Voice"):
    if prompt.strip():
        with st.spinner("Generating speech..."):
            audio_file_path, error = generate_audio(prompt)

            if error:
                st.error(f"An error occurred: {error}")
            else:
                # Play the generated audio
                st.audio(audio_file_path, autoplay=True)
    else:
        st.warning("Please input the text you want to convert to speech.")
Code to convert text to audio using OpenAI's TTS model
  • Lines 9–14: Use OpenAI's audio.speech.with_streaming_response.create() method to generate speech from text. Pass the required parameters:

    • The with statement ensures that the streamed response is opened and closed properly.

    • model specifies the TTS (Text-to-Speech) model being used, in this case, "tts-1".

    • voice sets the voice type for the audio (e.g., "alloy").

    • input contains the text prompt that will be converted into speech.

    • response_format defines the audio format; here, it is set to "mp3".

  • Lines 16–17: The audio_file_path specifies where the audio file will be saved (e.g., "audio.mp3"), and response.stream_to_file() streams the audio data to that file.
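
The same call works with other built-in voices. Here’s a sketch that renders one sentence with a few of the TTS-1 voices; treat the voice list as illustrative, since availability may change:

import openai
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
text = "Hello, I am speaking out loud the text you provided."
# Render the same sentence with several voices, saving each to its own file
for voice in ["alloy", "nova", "onyx"]:
    with openai.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=voice,
        input=text,
        response_format="mp3"
    ) as response:
        response.stream_to_file(f"audio_{voice}.mp3")
    print(f"Saved audio_{voice}.mp3")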

Audio-to-text conversion with Whisper-1

OpenAI’s Whisper-1 model enables accurate transcription of audio into text. Here’s how to integrate audio-to-text conversion into your applications:

import openai
import streamlit as st
# Replace 'YOUR_API_KEY' with your actual API key
openai.api_key = 'YOUR_API_KEY'
# Function to transcribe audio using OpenAI Whisper
def transcribe_audio(file):
    try:
        # Send the audio file to OpenAI Whisper model for transcription
        transcription = openai.audio.transcriptions.create(
            model="whisper-1",
            file=file
        )
        return transcription.text, None
    except Exception as e:
        return None, str(e)

# Streamlit UI
st.title("Audio-to-Text Transcription App")

# File uploader for audio
uploaded_file = st.file_uploader("Upload an audio file (MP3, WAV, etc.)", type=["mp3", "wav", "m4a"])

# Button to process the uploaded audio file
if uploaded_file is not None:
    st.audio(uploaded_file, format="audio/mp3", start_time=0)  # Display the uploaded audio with a player

    if st.button("Transcribe Audio"):
        with st.spinner("Transcribing audio..."):
            # Transcribe the uploaded file
            transcription, error = transcribe_audio(uploaded_file)

            if error:
                st.error(f"An error occurred: {error}")
            else:
                st.success("Transcription completed!")
                st.write(f"**Transcribed Text**: {transcription}")
Code to convert audio to text using OpenAI's Whisper model
  • Lines 9–12: Use OpenAI’s audio.transcriptions.create() method to send an audio file to the Whisper model for transcription. Pass the following parameters:

    • model specifies the transcription model to use; here, it is "whisper-1".

    • file is the audio file that will be transcribed. This can be a file object (e.g., opened using open() in binary mode).

  • Line 13: The transcription result is returned through the text attribute of the API response, which contains the transcribed text.
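
As noted above, transcribe_audio() also works outside Streamlit with a local file opened in binary mode. A minimal sketch (the file path is just an example):

# Transcribe a local recording instead of an uploaded one
with open("audio.mp3", "rb") as audio_file:
    transcription, error = transcribe_audio(audio_file)
if error:
    print(f"An error occurred: {error}")
else:
    print(f"Transcribed text: {transcription}")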

Conclusion

OpenAI's APIs make it easy to build engaging applications, whether you’re creating a chatbot, generating images, or transcribing audio. These tools help you add powerful features to your projects and deliver rich user experiences.

While this Answer offers a foundational overview of integrating OpenAI's APIs for text, image, and audio generation, you can further deepen your understanding through our comprehensive course: Mastering OpenAI API and ChatGPT for Innovative Applications.

Frequently asked questions



Can I use OpenAI APIs commercially?

Yes, OpenAI APIs can be used commercially, provided you comply with OpenAI’s usage policies (https://openai.com/policies/terms-of-use/). Be sure to review these policies to understand restrictions and obligations clearly.


Are there free tiers available?

OpenAI provides a limited trial credit for new users. Check OpenAI Pricing (https://openai.com/chatgpt/pricing/) for details.


How can I secure my API key?

Never expose your API key publicly. Use environment variables or secure storage to manage keys.
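
For example, you can read the key from an environment variable instead of hardcoding it (this sketch assumes you have set OPENAI_API_KEY in your shell):

import os
import openai
# Read the key from the environment instead of embedding it in code
openai.api_key = os.environ["OPENAI_API_KEY"]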

