OpenAI Models - CometAPI

GPT-5.2 Pro

Context: 400,000

gpt-5.2-pro is the highest-capability, production-oriented member of OpenAI’s GPT-5.2 family, exposed through the Responses API for workloads that demand maximal fidelity, multi-step reasoning, extensive tool use and the largest context/throughput budgets OpenAI offers.

GPT-5.2 Chat

Context: 128,000

gpt-5.2-chat-latest is the Chat-optimized snapshot of OpenAI’s GPT-5.2 family (branded in ChatGPT as GPT-5.2 Instant). It is the model for interactive/chat use cases that need a blend of speed, long-context handling, multimodal inputs and reliable conversational behaviour.

GPT-5.2

Context: 400,000

GPT-5.2 is a multi-flavored model suite (Instant, Thinking, Pro) engineered for better long-context understanding, stronger coding and tool use, and materially higher performance on professional “knowledge-work” benchmarks.

GPT-5.1 Chat

GPT-5.1 Chat is an instruction-tuned conversational language model for general-purpose chat, reasoning, and writing. It supports multi-turn dialogue, summarization, drafting, knowledge-base QA, and lightweight code assistance for in-app assistants, support automation, and workflow copilots. Technical highlights include chat-optimized alignment, controllable and structured outputs, and integration paths for tool invocation and retrieval workflows when available.

GPT-5.1

GPT-5.1 is a general-purpose instruction-tuned language model focused on text generation and reasoning across product workflows. It supports multi-turn dialogue, structured output formatting, and code-oriented tasks such as drafting, refactoring, and explanation. Typical uses include chat assistants, retrieval-augmented QA, data transformation, and agent-style automation with tools or APIs when supported. Technical highlights include text-centric modality, instruction following, JSON-style outputs, and compatibility with function calling in common orchestration frameworks.

GPT Image 1.5

GPT-Image-1.5 is OpenAI’s image model in the GPT Image family. It is a natively multimodal GPT model designed to generate images from text prompts and to perform high-fidelity edits of input images while following user instructions closely.

GPT-5 nano

GPT-5 nano is an artificial intelligence model provided by OpenAI.

GPT-5 mini

GPT-5 mini is OpenAI’s cost- and latency-optimized member of the GPT-5 family, intended to deliver much of GPT-5’s multimodal and instruction-following strengths at substantially lower cost for large-scale production use. It targets environments where throughput, predictable per-token pricing, and fast responses are the primary constraints while still providing strong general-purpose capabilities.

GPT 5 Chat

GPT-5 Chat (latest) is an artificial intelligence model provided by OpenAI.

GPT-5

GPT-5 is OpenAI's most powerful coding model to date. It shows significant improvements in complex front-end generation and in debugging large codebases. From a single prompt, it can create responsive websites, applications, and games with intuitive, aesthetically pleasing results. Early testers have also noted its design choices, which reflect a deeper understanding of elements such as spacing, typography, and white space.

GPT-4.1 nano

GPT-4.1 nano is an artificial intelligence model provided by OpenAI. It features a larger context window, supporting up to 1 million context tokens, and makes better use of that context through improved long-context understanding. Its knowledge cutoff is June 2024. This model supports a maximum context length of 1,047,576 tokens.

GPT-4.1

GPT-4.1 is an artificial intelligence model provided by OpenAI. It features a larger context window, supporting up to 1 million context tokens, and makes better use of that context through improved long-context understanding. Its knowledge cutoff is June 2024. This model supports a maximum context length of 1,047,576 tokens.
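
The context lengths listed on this page can be used to pre-check whether a request is likely to fit a given model. The following is a minimal sketch, not an official client: the dictionary keys are the display names used on this page (not confirmed API model identifiers), the helper names are hypothetical, and it assumes that prompt tokens and reserved output tokens share the same context window.

```python
# Context limits copied from the listing on this page. The keys are display
# names from the page, NOT confirmed API model identifiers.
CONTEXT_LIMITS = {
    "GPT-5.2 Pro": 400_000,
    "GPT-5.2 Chat": 128_000,
    "GPT-5.2": 400_000,
    "GPT-4.1 nano": 1_047_576,
    "GPT-4.1": 1_047_576,
}


def fits_context(model: str, prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if prompt plus reserved output stays within the listed limit.

    Assumption: input and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    limit = CONTEXT_LIMITS[model]  # raises KeyError for models not in the table
    return prompt_tokens + reserved_output_tokens <= limit


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """List models whose window can hold the request, smallest window first."""
    candidates = [
        name
        for name in CONTEXT_LIMITS
        if fits_context(name, prompt_tokens, reserved_output_tokens)
    ]
    # Sort by window size, then by name so the order is deterministic.
    return sorted(candidates, key=lambda name: (CONTEXT_LIMITS[name], name))
```

Token counts must come from a tokenizer that matches the target model; this sketch only compares numbers that the caller supplies.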

GPT-4o mini

GPT-4o mini is an artificial intelligence model provided by OpenAI.

Whisper-1

Speech-to-text transcription and translation.

TTS

OpenAI Text-to-Speech

Sora 2 Pro

Per Second: $0.3

Sora 2 Pro is OpenAI's most advanced and powerful media generation model, capable of generating videos with synchronized audio. It can create detailed, dynamic video clips from natural language or images.

Sora 2

Per Second: $0.1

A powerful video generation model with sound effects that supports the chat format.
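
The per-second prices listed for Sora 2 Pro and Sora 2 make it easy to estimate what a clip might cost. The sketch below is illustrative only: it assumes billing is strictly linear in generated seconds with no minimum charge or rounding, which this page does not confirm, and the dictionary keys are display names rather than API model identifiers.

```python
from decimal import Decimal

# Per-second prices in USD as listed on this page. Keys are display names,
# NOT confirmed API model identifiers.
PRICE_PER_SECOND = {
    "Sora 2 Pro": Decimal("0.3"),
    "Sora 2": Decimal("0.1"),
}


def estimate_video_cost(model: str, seconds: int, clips: int = 1) -> Decimal:
    """Estimate the cost in USD of generating `clips` videos of `seconds` each.

    Assumption: cost = price per second * seconds * clips, with no minimums.
    """
    if seconds <= 0 or clips <= 0:
        raise ValueError("seconds and clips must be positive")
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return PRICE_PER_SECOND[model] * seconds * clips
```

Under these assumptions, a 10-second clip works out to 10 × $0.3 = $3 on Sora 2 Pro and 10 × $0.1 = $1 on Sora 2.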

GPT Image 1 mini

A cost-optimized version of GPT Image 1. It is a natively multimodal language model that accepts both text and image input and generates image output.

GPT-4o

GPT-4o is OpenAI's most advanced multimodal model, faster and cheaper than GPT-4 Turbo, with stronger visual capabilities. It has a maximum context length of 128,000 tokens and a knowledge cutoff of October 2023. Models in the 1106 series and above support tool_calls and function_call.
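
To illustrate what tool_calls support involves on the client side, here is a minimal offline sketch of handling an assistant message that requests a tool. The message shape follows the OpenAI Chat Completions format (a `tool_calls` list whose entries carry an `id` and a `function` with a `name` and JSON-encoded `arguments`). The `get_weather` tool, the `REGISTRY` mapping, and `run_tool_calls` are hypothetical names introduced for this example, and whether CometAPI returns this shape unchanged is an assumption.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"Weather lookup for {city} is not implemented in this sketch."


# Tool schema in the Chat Completions "tools" format, to be sent with a request.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, str]]:
    """Run each requested tool and build the "tool" role messages to send back."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # The arguments field arrives as a JSON string, not a dict.
        args = json.loads(call["function"]["arguments"])
        result = REGISTRY[name](**args)
        replies.append(
            {"role": "tool", "tool_call_id": call["id"], "content": result}
        )
    return replies
```

The returned messages would be appended to the conversation and sent in a follow-up request so the model can use the tool results. A production version should also validate the arguments and handle unknown tool names.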

GPT 4.1 mini

GPT-4.1 mini is an artificial intelligence model provided by OpenAI. It is a significant leap in small-model performance, beating GPT-4o on many benchmarks. It matches or exceeds GPT-4o in intelligence evaluations while reducing latency by nearly half and cost by 83%. This model supports a maximum context length of 1,047,576 tokens.
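
As a worked illustration of the 83% cost figure, the sketch below applies a percentage reduction to a baseline bill. The baseline amount is hypothetical, and the helper name `cost_after_reduction` is introduced only for this example; actual savings depend on the input/output token mix and on current pricing.

```python
from decimal import Decimal


def cost_after_reduction(baseline_cost: Decimal, reduction_percent: Decimal) -> Decimal:
    """Return the cost remaining after reducing `baseline_cost` by a percentage."""
    if not (Decimal(0) <= reduction_percent <= Decimal(100)):
        raise ValueError("reduction_percent must be between 0 and 100")
    return baseline_cost * (Decimal(100) - reduction_percent) / Decimal(100)


# Hypothetical example: a workload that costs $100 on GPT-4o.
baseline = Decimal("100")
remaining = cost_after_reduction(baseline, Decimal("83"))
print(f"Estimated cost at an 83% reduction: ${remaining}")
```

In other words, an 83% reduction leaves 17% of the original bill, so a hypothetical $100 workload would cost about $17.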

GPT Image 2

o4-mini

o4-mini is an artificial intelligence model provided by OpenAI.

O3 Pro

OpenAI o3-pro is a “pro” variant of the o3 reasoning model, engineered to think longer and deliver the most dependable responses. It employs private chain-of-thought reinforcement learning and sets new state-of-the-art benchmarks across domains such as science, programming, and business. Within the API, it autonomously integrates tools such as web search, file analysis, Python execution, and visual reasoning.

o3-mini

o3-mini is an artificial intelligence model provided by OpenAI.

o3

o3 is an artificial intelligence model provided by OpenAI.

GPT-4o mini Audio

GPT-4o mini Audio is a multimodal model for speech and text interactions. It performs speech recognition, translation, and text-to-speech, follows instructions, and can call tools for structured actions with streaming responses. Typical uses include real-time voice assistants, live captioning and translation, call summarization, and voice-controlled applications. Technical highlights include audio input and output, streaming responses, function calling, and structured JSON output.

codex-mini-latest

Codex Mini is an artificial intelligence model provided by OpenAI. It is OpenAI's latest achievement in code generation, a lightweight model specifically optimized for the Codex command-line interface (CLI). As a fine-tuned version of o4-mini, this model inherits the base model's high efficiency and response speed while being specially optimized for code understanding and generation.

GPT-4o Audio Preview

This model supports a maximum context length of 128,000 tokens.

GPT-4o mini TTS

GPT-4o mini TTS is a neural text-to-speech model designed for natural, low-latency voice generation in user-facing applications. It converts text to natural-sounding speech with selectable voices, multi-format output, and streaming synthesis for responsive experiences. Typical uses include voice assistants, IVR and contact flows, product read-aloud, and media narration. Technical highlights include API-based streaming and export to common audio formats such as MP3 and WAV.

GPT-4o mini Search Preview

GPT-4o mini Search Preview is a compact multimodal model in the GPT-4o family geared toward search-oriented interactions and retrieval workflows. It interprets and reformulates queries, synthesizes concise answers, and can ground responses via external search when integrated through tool/function calling. Typical uses include in-product search assistants, knowledge-base QA, e-commerce discovery, and query understanding for ranking and routing. Technical highlights include text-and-image inputs, instruction following, structured output formats, and tool use integration for RAG pipelines.

GPT-4o Transcribe

GPT-4o Transcribe is an audio-to-text model for multilingual, low-latency speech recognition. It supports real-time streaming and batch transcription from common audio formats with punctuation and sentence segmentation. Typical uses include live captions, voice assistant input, meeting notes, and media or call recording transcription. Technical highlights include audio modality support, long-form processing, and APIs suited for interactive and server-side workflows.

GPT-4o Search

GPT-4o Search is a GPT-4o-based multimodal model configured for search-augmented reasoning and grounded, current answers. It follows instructions and uses web search tools to retrieve, evaluate, and synthesize external information, with source context when available. Typical uses include research assistance, fact-checking, news and trend monitoring, and answering time-sensitive queries. Technical highlights include tool/function calling for browsing and retrieval, long-context handling, and structured outputs suitable for citations and links.

GPT-4o Realtime

The Realtime API allows developers to build low-latency, multimodal experiences, including speech-to-speech functionality. Text and audio processed by the Realtime API are priced separately. This model supports a maximum context length of 128,000 tokens.

GPT-4o mini Realtime Preview

GPT-4o mini Realtime Preview is a real-time multimodal model for interactive voice and visual experiences. It handles speech, text, and images with streaming input and output, plus tool/function calling for grounded actions. Typical uses include voice assistants, live call handling, real-time captioning, and visual question answering over camera or screen content. Technical highlights include bidirectional audio, vision understanding, streaming responses, and structured outputs via functions.

GPT-4o mini Audio Preview

GPT-4o mini Audio Preview is a compact multimodal model for building conversational audio applications. It supports speech input and output alongside text, enabling speech recognition, speech synthesis, and mixed text-audio dialogs with tool/function calling for structured actions. Typical uses include voice assistants, streaming transcription with summarization, IVR and call-bot workflows, and audio-enabled in-app helpers. Technical highlights include audio I/O, streaming responses, instruction following, and integration via chat and tools APIs.

Gemini omni fast

Per Second: $0.4

Omni is the new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge. You can also easily edit your videos through conversation.

gpt-5.5

111

Per Request: $20

FLUX.2

tts-1-hd-1106

tts-1-hd

tts-1-1106

tts-1

text-embedding-ada-002

An Ada-based text embedding model optimized for various NLP tasks.

text-embedding-3-small

A small text embedding model for efficient processing.

text-embedding-3-large

A large text embedding model for a wide range of natural language processing tasks.

omni-moderation-latest

Per Request: $0.002

omni-moderation-2024-09-26

Per Request: $0.002

o1-pro-all

o1-pro-2025-03-19

o1-pro

o1-pro is an artificial intelligence model provided by OpenAI.

o1-preview-all

Per Request: $0.2

o1-preview-2024-09-12

o1-preview

o1-preview is an artificial intelligence model provided by OpenAI.

o1-mini-all

Per Request: $0.1

o1-mini-2024-09-12

o1-mini

o1-mini is an artificial intelligence model provided by OpenAI.

o1-all

Per Request: $0.2

o1-2024-12-17

o1

o1 is an artificial intelligence model provided by OpenAI.

gpt-realtime-mini

An economical version of the real-time GPT model, capable of responding to audio and text input in real time via WebRTC, WebSocket, or SIP connections.

gpt-oss-20b

gpt-oss-20b is an artificial intelligence model provided by cloudflare-workers-ai.

gpt-oss-120b

gpt-oss-120b is an artificial intelligence model provided by cloudflare-workers-ai.

gpt-image-1

An advanced AI model for generating images from text descriptions.

gpt-4o-all

GPT-4o is OpenAI's most advanced multimodal model, faster and cheaper than GPT-4 Turbo, with stronger visual capabilities. It has a maximum context length of 128,000 tokens and a knowledge cutoff of October 2023. Models in the 1106 series and above support tool_calls and function_call.

gpt-4-vision-preview

This model supports a maximum context length of 128,000 tokens.

gpt-4-vision

This model supports a maximum context length of 128,000 tokens.

gpt-4-v

Per Request: $0.05

gpt-4-turbo-preview

gpt-4-turbo-preview is an upgraded version with stronger code generation capabilities, reduced model "laziness", and fixes for non-English UTF-8 generation issues. This model supports a maximum context length of 128,000 tokens.

gpt-4-turbo-2024-04-09

gpt-4-turbo-2024-04-09 is an upgraded version with stronger code generation capabilities, reduced model "laziness", and fixes for non-English UTF-8 generation issues. This model supports a maximum context length of 128,000 tokens.

gpt-4-turbo

GPT-4 Turbo is an artificial intelligence model provided by OpenAI.

gpt-4-search

Per Request: $0.05

gpt-4-gizmo-*

gpt-4-gizmo

gpt-4-dalle

Per Request: $0.05

gpt-4-all

gpt-4-32k

GPT-4 32K is an artificial intelligence model provided by Azure.

gpt-4-1106-preview

gpt-4-0613

gpt-4-0314

gpt-4-0125-preview

gpt-4

GPT-4 is an artificial intelligence model provided by OpenAI.

gpt-3.5-turbo-0125

GPT-3.5 Turbo 0125 is an artificial intelligence model provided by OpenAI. It is an official high-speed model in the GPT-3.5 series that supports tool_calls. This model supports a maximum context length of 4,096 tokens.

dall-e-3

Per Request: $0.02

New version of DALL-E for image generation.

dall-e-2

An AI model that generates images from text descriptions.