Create chat completion
POST /v1/chat/completions
Creates a model response for the given chat conversation. OpenAI-compatible endpoint.
Authorizations: API key (requests with an invalid or missing key return 401)
Request Body (required): object

- `model` (string, required): ID of the model to use. Example: `vllm-primary`
- `messages` (Array<object>, required): A list of messages comprising the conversation. Each message object:
  - `role` (string, required): The role of the message author
  - `content` (string, required): The content of the message
- `temperature` (number, optional): Sampling temperature
- `max_tokens` (integer, optional): Maximum tokens to generate
- `stream` (boolean, optional): Enable streaming responses
- `top_p` (number, optional): Nucleus sampling parameter
- `stop` (string or Array<string>, optional)
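A minimal request sketch in Python using only the standard library. The base URL (`http://localhost:8000/v1`) and API key are placeholder assumptions; the payload fields match the request-body schema above:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/v1"  # assumption: local OpenAI-compatible server
API_KEY = "sk-placeholder"             # placeholder API key

def build_chat_request(model: str, user_message: str, **options) -> dict:
    """Assemble a request body matching the schema above.

    `options` may carry the optional parameters (temperature,
    max_tokens, top_p, stop, stream).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    payload.update(options)
    return payload

def create_chat_completion(payload: dict) -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("vllm-primary", "Hello!", temperature=0.7, max_tokens=64)
```

Calling `create_chat_completion(payload)` then returns the response object described below.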
Responses

200: Successful response (object)

- `id` (string): Example: `chatcmpl-abc123`
- `object` (string): Example: `chat.completion`
- `created` (integer): Unix timestamp
- `model` (string)
- `choices` (Array<object>): each choice object:
  - `index` (integer)
  - `message` (object):
    - `role` (string, required): The role of the message author
    - `content` (string, required): The content of the message
  - `finish_reason` (string)
- `usage` (object):
  - `prompt_tokens` (integer)
  - `completion_tokens` (integer)
  - `total_tokens` (integer)
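A sketch of reading the fields of a successful response. The JSON below is a constructed example shaped like the 200 schema above, not captured server output:

```python
import json

# Constructed example response matching the 200 schema.
sample = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "vllm-primary",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
""")

# The generated text lives in choices[n].message.content.
reply = sample["choices"][0]["message"]["content"]
finish = sample["choices"][0]["finish_reason"]
total = sample["usage"]["total_tokens"]
```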
400: Invalid request (object)

401: Invalid or missing API key (object)

429: Rate limit exceeded (object)

All three error responses share the same body:

- `error` (object):
  - `message` (string)
  - `type` (string)
  - `code` (string)
  - `param` (string)
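The 400, 401, and 429 responses all carry the same error object, so one handler covers them. A sketch of mapping that body to an exception; treating 429 as retryable is an assumption about client policy, not part of the documented schema:

```python
import json

class ChatAPIError(Exception):
    """Carries the fields of the documented error object plus the HTTP status."""

    def __init__(self, status: int, message: str, err_type: str, code: str, param: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.type = err_type
        self.code = code
        self.param = param
        # Assumption: 429 (rate limit exceeded) is worth backing off and retrying.
        self.retryable = status == 429

def raise_for_error(status: int, body: str) -> None:
    """Raise ChatAPIError for a 400/401/429 body shaped like the schema above."""
    err = json.loads(body).get("error", {})
    raise ChatAPIError(
        status,
        err.get("message", ""),
        err.get("type", ""),
        err.get("code", ""),
        err.get("param", ""),
    )

try:
    raise_for_error(
        429,
        '{"error": {"message": "Rate limit exceeded",'
        ' "type": "rate_limit_error", "code": "429", "param": ""}}',
    )
except ChatAPIError as exc:
    caught = exc
```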