Create chat completion
POST /v1/chat/completions
Creates a model response for the given chat conversation. OpenAI-compatible endpoint.
Authorizations: API key (requests with an invalid or missing key return 401)
Request Body (required): object

- `model` (string, required): ID of the model to use. Example: `vllm-primary`
- `messages` (Array<object>, required): A list of messages comprising the conversation. Each message object:
  - `role` (string, required): The role of the message author
  - `content` (string, required): The content of the message
- `temperature` (number, optional): Sampling temperature
- `max_tokens` (integer, optional): Maximum tokens to generate
- `stream` (boolean, optional): Enable streaming responses
- `top_p` (number, optional): Nucleus sampling parameter
- `stop` (string or Array<string>, optional)
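A minimal request sketch in Python using only the standard library. The base URL (`http://localhost:8000/v1`) and API key are placeholder assumptions; the payload fields match the request-body schema above:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/v1"  # assumption: local OpenAI-compatible server
API_KEY = "sk-placeholder"             # placeholder API key

def build_chat_request(model: str, user_message: str, **options) -> dict:
    """Assemble a request body matching the schema above.

    `options` may carry the optional parameters (temperature,
    max_tokens, top_p, stop, stream).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    payload.update(options)
    return payload

def create_chat_completion(payload: dict) -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("vllm-primary", "Hello!", temperature=0.7, max_tokens=64)
```

Calling `create_chat_completion(payload)` then returns the response object described below.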
Responses

200: Successful response (object)

- `id` (string): Example: `chatcmpl-abc123`
- `object` (string): Example: `chat.completion`
- `created` (integer): Unix timestamp
- `model` (string)
- `choices` (Array<object>): each choice object:
  - `index` (integer)
  - `message` (object):
    - `role` (string, required): The role of the message author
    - `content` (string, required): The content of the message
  - `finish_reason` (string)
- `usage` (object):
  - `prompt_tokens` (integer)
  - `completion_tokens` (integer)
  - `total_tokens` (integer)
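A sketch of reading the fields of a successful response. The JSON below is a constructed example shaped like the 200 schema above, not captured server output:

```python
import json

# Constructed example response matching the 200 schema.
sample = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "vllm-primary",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
""")

# The generated text lives in choices[n].message.content.
reply = sample["choices"][0]["message"]["content"]
finish = sample["choices"][0]["finish_reason"]
total = sample["usage"]["total_tokens"]
```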
400: Invalid request (object)

401: Invalid or missing API key (object)

429: Rate limit exceeded (object)

All three error responses share the same body:

- `error` (object):
  - `message` (string)
  - `type` (string)
  - `code` (string)
  - `param` (string)
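The 400, 401, and 429 responses all carry the same error object, so one handler covers them. A sketch of mapping that body to an exception; treating 429 as retryable is an assumption about client policy, not part of the documented schema:

```python
import json

class ChatAPIError(Exception):
    """Carries the fields of the documented error object plus the HTTP status."""

    def __init__(self, status: int, message: str, err_type: str, code: str, param: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.type = err_type
        self.code = code
        self.param = param
        # Assumption: 429 (rate limit exceeded) is worth backing off and retrying.
        self.retryable = status == 429

def raise_for_error(status: int, body: str) -> None:
    """Raise ChatAPIError for a 400/401/429 body shaped like the schema above."""
    err = json.loads(body).get("error", {})
    raise ChatAPIError(
        status,
        err.get("message", ""),
        err.get("type", ""),
        err.get("code", ""),
        err.get("param", ""),
    )

try:
    raise_for_error(
        429,
        '{"error": {"message": "Rate limit exceeded",'
        ' "type": "rate_limit_error", "code": "429", "param": ""}}',
    )
except ChatAPIError as exc:
    caught = exc
```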