ChatGPT Streaming Output
References
Checking whether a message violates policy (moderation): https://platform.openai.com/docs/guides/moderation/quickstart
How to enable streaming: https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream
stream (boolean, optional, defaults to false)
How to keep a continuous conversation context: https://platform.openai.com/docs/guides/chat/introduction; this can be tested at https://platform.openai.com/playground?mode=chat, as shown below. At the start of a conversation, the assistant content is empty.
# Including the conversation history is helpful when the user's instructions refer to earlier messages.
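Keeping context simply means resending the accumulated history in the messages array on every call. A minimal sketch (the strings are placeholder content, not from a real response):

```python
# Turn 1: only the user message
messages = [{"role": "user", "content": "Explain bubble sort"}]

# After the first call returns, append the assistant's reply...
messages.append({"role": "assistant", "content": "Bubble sort works by ..."})

# ...then append the follow-up question and send the WHOLE list again,
# so the model can resolve references like "its" against the earlier turns.
messages.append({"role": "user", "content": "What is its time complexity?"})

print([m["role"] for m in messages])  # → ['user', 'assistant', 'user']
```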
In Python, the assistant's reply can be extracted with response['choices'][0]['message']['content'].
Every response will include a finish_reason. The possible values for finish_reason are:
- stop: API returned complete model output
- length: Incomplete model output due to max_tokens parameter or token limit
- content_filter: Omitted content due to a flag from our content filters
- null: API response still in progress or incomplete
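Putting the two points above together, extracting the reply and checking finish_reason looks like this (the response dict here is hardcoded in the documented shape rather than fetched from the API):

```python
# Sample chat completion response in the documented (non-streaming) shape;
# a real response would come back from the /v1/chat/completions endpoint.
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Bubble sort repeatedly swaps adjacent elements."},
            "finish_reason": "stop",
        }
    ]
}

reply = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]

if finish_reason == "length":
    print("Warning: output truncated by max_tokens or the context limit")
print(reply)
```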
With ChatGPTUnofficialProxyAPI, if you want to track the conversation you need to pass parentMessageId, like this:
const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })
You can add streaming via the onProgress handler:
const res = await api.sendMessage('Write a 500 word essay on frogs.', { onProgress: (partial) => console.log(partial.text) })
python: https://github.com/labteral/chatgpt-python/blob/master/chatgpt/chatgpt.py
https://www.reddit.com/r/OpenAI/comments/10x67vc/stream_responses_from_openai_api_with_python_a/
Obtaining an accessToken: log in at https://chat.openai.com/chat, then fetch it from https://chat.openai.com/api/auth/session.
nginx configuration
The issue has been tracked down:
location /api
{
rewrite ^/api/?(.*)$ /$1 break;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header REMOTE-HOST $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Connection "";
proxy_http_version 1.1;
proxy_pass http://127.0.0.1:3002/;
}
This local configuration works as-is and produces the typewriter effect. The key point: if you are going through a proxied API, the proxy must be configured as well. nginx enables proxy_buffering by default with a 4k buffer, so the response accumulates inside nginx until 4k is reached or the connection closes before anything reaches the client. It is recommended to turn proxy_buffering off. A reference configuration:
location /openai/ {
default_type application/octet-stream;
proxy_buffering off;
chunked_transfer_encoding on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 120;
proxy_pass https://api.openai.com/;
proxy_set_header Host api.openai.com;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header REMOTE-HOST $remote_addr;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade; # requires a "map $http_upgrade $connection_upgrade" block in the http context
proxy_http_version 1.1;
}
Testing streaming output in Python:
import os
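A sketch of consuming streamed chunks. With the early-2023 openai Python library the iterator would come from openai.ChatCompletion.create(..., stream=True); here the chunks are hardcoded in the chat.completion.chunk shape shown in the captures below, so the sketch runs offline:

```python
def collect_stream(chunks):
    """Accumulate the content deltas of chat.completion.chunk objects into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            # For a typewriter effect, print(delta["content"], end="", flush=True) here.
            parts.append(delta["content"])
    return "".join(parts)

# Hardcoded chunks: first carries the role, the last carries finish_reason "stop".
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "index": 0, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Bubble"}, "index": 0, "finish_reason": None}]},
    {"choices": [{"delta": {"content": " sort"}, "index": 0, "finish_reason": None}]},
    {"choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]},
]
print(collect_stream(fake_chunks))  # → Bubble sort
```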
Request analysis
Without streaming
- Request flow
Request URL: https://nephengpt.zeabur.app/proxy/v1/chat/completions
Request Method: POST
Status Code: 200
Remote Address: 119.23.226.212:3008
Referrer Policy: strict-origin-when-cross-origin
- Response headers
access-control-allow-origin: http://localhost:5174
cache-control: no-cache, must-revalidate
content-encoding: gzip
content-type: application/json # not streaming
date: Sun, 12 Mar 2023 05:33:10 GMT
openai-model: gpt-3.5-turbo-0301
openai-organization: user-utn66hl1ou1ok1dzfhyndxj3
openai-processing-ms: 13562
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
vary: Origin, Accept-Encoding
x-request-id: 3bae3759ea1365e84ad6d8edccb7d8a1
- Request headers
:authority: nephengpt.zeabur.app
:method: POST
:path: /proxy/v1/chat/completions
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
authorization: Bearer
cache-control: no-cache
content-length: 88
content-type: application/json
origin: http://localhost:5174
pragma: no-cache
referer: http://localhost:5174/
sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
sec-ch-ua-mobile: ?1
sec-ch-ua-platform: “Android”
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
- Request body
{
"messages": [
{
"role": "user",
"content": "说一下冒泡排序"
}
],
"model": "gpt-3.5-turbo"
}
With streaming, e.g. https://freegpt.one/
- Request flow
Request URL: https://freegpt.one/backend-api/conversation
Request Method: POST
Status Code: 200
Remote Address: 119.23.226.212:3008
Referrer Policy: strict-origin-when-cross-origin
- Response headers
access-control-allow-origin: *
cache-control: no-cache
cf-cache-status: DYNAMIC
cf-ray: 7a69bd4d9900aac9-SYD
content-type: text/event-stream # format of the response
date: Sun, 12 Mar 2023 05:46:55 GMT
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
report-to: {"endpoints":[{"url":"https://a.nel.cloudflare.com/report/v3?s=uqkV8Rbu6KwATAq5Enx2gEs62OtzHb1OzxOEA%2FPPDTmEOU8rB2mTJRsbo%2FhBhozu4QjYf4hB8PB95lI3j7wflQxjXd59ZFJdC3nZIJrV5EAvf2UHqpr8C%2BKLgJ%2Bc"}],"group":"cf-nel","max_age":604800}
server: cloudflare
- Request headers
:authority: freegpt.one
:method: POST
:path: /backend-api/conversation
:scheme: https
accept: text/event-stream # explicitly requests an event stream
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
authorization: Bearer
cache-control: no-cache
content-length: 262
content-type: application/json
cookie: cf_clearance=SCO.6UKnQWds7f7f0di8FY7YoR3SvGuRh8GLNecfsYo-1678599991-0-250
origin: https://freegpt.one # the proxy site
pragma: no-cache
referer: https://freegpt.one/
sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
sec-ch-ua-mobile: ?1
sec-ch-ua-platform: “Android”
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
x-openai-assistant-app-id
- Request body
{
"action": "next",
"messages": [
{
"id": "60d068fc-54b8-43aa-a10c-59f6d009a95c",
"role": "user",
"content": {
"content_type": "text",
"parts": [
"python怎么流式输出response"
]
}
}
],
"parent_message_id": "283deb9b-ac7a-430f-95a9-92394ce9e149",
"model": "text-davinci-002-render"
}
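A text/event-stream body arrives as `data: <json>` lines and ends with `data: [DONE]`. A minimal parser sketch (the sample payload is made up, in the delta shape used by the chat API):

```python
import json

def parse_sse(raw: str):
    """Yield the JSON payload of each `data:` line, stopping at [DONE]."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Made-up two-chunk stream followed by the terminator
sample = (
    'data: {"choices":[{"delta":{"content":"Hi"},"index":0}]}\n\n'
    'data: {"choices":[{"delta":{"content":"!"},"index":0}]}\n\n'
    "data: [DONE]\n\n"
)
text = "".join(e["choices"][0]["delta"].get("content", "") for e in parse_sse(sample))
print(text)  # → Hi!
```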
Another streaming approach, e.g. https://gpt.gcchen.cn/:
- Request flow
Request URL: https://gpt.gcchen.cn/api/chat-process
Request Method: POST
Status Code: 200
Remote Address: 1.12.228.197:443
Referrer Policy: strict-origin-when-cross-origin
- Response headers
access-control-allow-credentials: true
access-control-allow-headers: Content-Type
access-control-allow-headers: *
access-control-allow-methods: *
access-control-allow-methods: *
access-control-allow-origin: *
access-control-allow-origin: *
content-type: application/octet-stream
date: Sun, 12 Mar 2023 06:05:49 GMT
server: nginx
strict-transport-security: max-age=31536000
x-powered-by: Express
- Request headers
:authority: gpt.gcchen.cn
:method: POST
:path: /api/chat-process
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: no-cache
content-length: 38
content-type: application/json
cookie: Hm_lvt_837d979bc9b73183a508c3486c37a02c=1678601121; Hm_lpvt_837d979bc9b73183a508c3486c37a02c=1678601121
origin: https://gpt.gcchen.cn
pragma: no-cache
referer: https://gpt.gcchen.cn/
sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
sec-ch-ua-mobile: ?1
sec-ch-ua-platform: “Android”
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
- Request body
{
"prompt": "冒泡排序",
"options": {}
}
Second request body
{
"prompt": "它的时间复杂度是多少",
"options": {
"parentMessageId": "chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS"
}
}
Third request body
{
"prompt": "和插入排序比怎么样",
"options": {
"parentMessageId": "chatcmpl-6t9BNZ47EjAa9j2xIkY3ZbJdYKNNK" # carries the conversation history forward
}
}
- Response data
{"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}}
{"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒","delta":"冒","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"冒"},"index":0,"finish_reason":null}]}}
{"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡","delta":"泡","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"泡"},"index":0,"finish_reason":null}]}}
…
"parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡排序是一种简单的排序算法,它重复地走访过要排序的数列,每次比较相邻的两个元素,如果顺序错误就交换它们的位置。经过一轮的比较后,最大(或最小)的元素就被交换到了数列的末尾(或开头),然后再从头开始进行下一轮比较和交换,直到全部元素都有序排列为止。","delta":"。","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"。"},"index":0,"finish_reason":null}]}}
{"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡排序是一种简单的排序算法,它重复地走访过要排序的数列,每次比较相邻的两个元素,如果顺序错误就交换它们的位置。经过一轮的比较后,最大(或最小)的元素就被交换到了数列的末尾(或开头),然后再从头开始进行下一轮比较和交换,直到全部元素都有序排列为止。","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}} # at the end finish_reason is stop and the data line becomes [DONE]
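Unlike the SSE endpoint, this chat-process endpoint streams newline-delimited JSON over application/octet-stream, and each line carries the cumulative text so far, so the client only needs the latest complete line and then feeds the returned id back as parentMessageId. A sketch with made-up data:

```python
import json

def latest_state(body: str) -> dict:
    """Return the last complete JSON line of a chat-process style stream."""
    lines = [line for line in body.strip().splitlines() if line]
    return json.loads(lines[-1])

# Made-up stream: each line repeats the full text accumulated so far
body = (
    '{"id":"chatcmpl-x","parentMessageId":"p1","text":"Bub","delta":"Bub"}\n'
    '{"id":"chatcmpl-x","parentMessageId":"p1","text":"Bubble","delta":"ble"}\n'
)
final = latest_state(body)
print(final["text"])  # cumulative text to render
next_options = {"parentMessageId": final["id"]}  # send with the next prompt to keep context
```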