ChatGPT Streaming Output

References

Checking messages for policy violations (moderation): https://platform.openai.com/docs/guides/moderation/quickstart
How to enable streaming: https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream

stream (boolean, optional, defaults to false)
Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
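To make the "data-only server-sent events" wording above concrete, here is a minimal parser sketch. The `data:` prefix and the `[DONE]` terminator come from the docs; the sample payloads themselves are made up for illustration:

```python
import json

# Made-up sample of what a streamed chat response looks like on the wire:
# each event is a "data: ..." line, and the stream ends with "data: [DONE]".
sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n\n'
)

def collect_sse(stream_text):
    """Concatenate the content deltas from a data-only SSE stream."""
    pieces = []
    for line in stream_text.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # terminator per the API docs
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            pieces.append(delta["content"])
    return "".join(pieces)

print(collect_sse(sample))  # → Hello
```

In real use you would read these lines from the HTTP response body as they arrive instead of from a string.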

How to maintain conversation context: https://platform.openai.com/docs/guides/chat/introduction. You can experiment at https://platform.openai.com/playground?mode=chat, as in the screenshot below; on the first question the assistant content is empty.

```python
# Including the conversation history helps when user instructions refer to prior messages.
messages = [
    # The main input is the messages parameter. It must be an array of message objects,
    # each with a role ("system", "user", or "assistant") and content (the message text).
    # A conversation can be as short as one message or fill many pages.
    # Typically it starts with a system message, followed by alternating user and assistant messages.
    {"role": "system", "content": "You are a helpful assistant."},  # the system message sets the assistant's behavior
    {"role": "user", "content": "Who won the world series in 2020?"},  # user messages instruct the assistant; they can come from end users or be set by the developer
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}  # this follow-up only makes sense with the history above
]

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages
)

print(completion.choices[0].message)
```

In Python, the assistant's reply can be extracted with response['choices'][0]['message']['content'].

Every response will include a finish_reason. The possible values for finish_reason are:

  • stop: API returned complete model output
  • length: Incomplete model output due to max_tokens parameter or token limit
  • content_filter: Omitted content due to a flag from our content filters
  • null: API response still in progress or incomplete
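Because the API is stateless, "context" simply means resending the growing messages array on every turn. A minimal offline sketch of that loop (the create() call is replaced with a canned reply so the sketch runs without an API key):

```python
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text, canned_reply):
    """Append the user turn, obtain a reply, and append it so the next call has context."""
    history.append({"role": "user", "content": user_text})
    # Real use would be:
    # reply = openai.ChatCompletion.create(
    #     model="gpt-3.5-turbo", messages=history).choices[0].message["content"]
    reply = canned_reply  # stand-in so the sketch runs offline
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Who won the world series in 2020?", "The Los Angeles Dodgers.")
ask("Where was it played?", "Arlington, Texas.")
print(len(history))  # system + 2 user turns + 2 assistant turns = 5
```

Each request therefore costs tokens proportional to the whole history, which is why long conversations eventually hit the length finish_reason.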

ChatGPTUnofficialProxyAPI: if you want to track the conversation, you need to pass parentMessageId like this:

```js
const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })

// send a message and wait for the response
let res = await api.sendMessage('What is OpenAI?')
console.log(res.text)

// send a follow-up
res = await api.sendMessage('Can you expand on that?', {
  parentMessageId: res.id
})
console.log(res.text)

// send another follow-up
res = await api.sendMessage('What were we talking about?', {
  parentMessageId: res.id
})
console.log(res.text)
```

You can add streaming via the onProgress handler:

```js
const res = await api.sendMessage('Write a 500 word essay on frogs.', {
  // print the partial response as the AI is "typing"
  onProgress: (partialResponse) => console.log(partialResponse.text)
})

// print the full text at the end
console.log(res.text)
```

Python implementations for reference:
https://github.com/labteral/chatgpt-python/blob/master/chatgpt/chatgpt.py
https://www.reddit.com/r/OpenAI/comments/10x67vc/stream_responses_from_openai_api_with_python_a/

Getting an accessToken: log in at https://chat.openai.com/chat, then retrieve the token from https://chat.openai.com/api/auth/session.

Nginx configuration

Found the issue:

```nginx
location /api
{
    rewrite ^/api/?(.*)$ /$1 break;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header REMOTE-HOST $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Connection "";
    proxy_http_version 1.1;
    proxy_pass http://127.0.0.1:3002/;
}
```

This local configuration is fine and achieves the typewriter effect. The key point is that if you use a proxied API, the proxy itself must also be configured: nginx enables proxy_buffering by default with 4k buffers, so response data piles up in nginx until it reaches 4k or the connection closes before anything is forwarded to the client. It is recommended to turn proxy_buffering off; a reference configuration follows.
```nginx
location /openai/ {
    default_type octet-stream;
    proxy_buffering off;
    chunked_transfer_encoding on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 120;
    proxy_pass https://api.openai.com/;
    proxy_set_header Host api.openai.com;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header REMOTE-HOST $remote_addr;
    proxy_set_header Upgrade $http_upgrade;
    # note: $connection_upgrade is not built in; it must be defined by a map block in the http context
    proxy_set_header Connection $connection_upgrade;
    proxy_http_version 1.1;
}
```

Testing streaming output with Python:

```python
import openai

openai.api_key = ""
openai.api_base = "https://api.openai.com/v1"  # swap in your proxy here; be sure to keep the /v1 suffix

for resp in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "hello"}
        ],
        stream=True):
    if 'content' in resp.choices[0].delta:
        print(resp.choices[0].delta.content, end="", flush=True)
```
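Beyond printing deltas as they arrive, the chunks can be accumulated into the full reply. In this sketch the stream is faked with plain dicts shaped like the SDK's stream=True chunks, so it runs offline:

```python
# Fake chunks shaped like streamed responses: a role delta first,
# then content deltas, then an empty delta with finish_reason "stop".
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": " world"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

def collect(chunks):
    """Join the content deltas into the complete assistant reply."""
    pieces = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            pieces.append(delta["content"])
    return "".join(pieces)

print(collect(fake_stream))  # → Hello world
```

The same accumulation is what you need to store the reply back into the conversation history for the next turn.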

Request analysis


Without streaming

  1. Request
    Request URL: https://nephengpt.zeabur.app/proxy/v1/chat/completions
    Request Method: POST
    Status Code: 200
    Remote Address: 119.23.226.212:3008
    Referrer Policy: strict-origin-when-cross-origin
  2. Response headers
    access-control-allow-origin: http://localhost:5174
    cache-control: no-cache, must-revalidate
    content-encoding: gzip
    content-type: application/json # not streaming
    date: Sun, 12 Mar 2023 05:33:10 GMT
    openai-model: gpt-3.5-turbo-0301
    openai-organization: user-utn66hl1ou1ok1dzfhyndxj3
    openai-processing-ms: 13562
    openai-version: 2020-10-01
    strict-transport-security: max-age=15724800; includeSubDomains
    vary: Origin, Accept-Encoding
    x-request-id: 3bae3759ea1365e84ad6d8edccb7d8a1
  3. Request headers
    :authority: nephengpt.zeabur.app
    :method: POST
    :path: /proxy/v1/chat/completions
    :scheme: https
    accept: */*
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    authorization: Bearer
    cache-control: no-cache
    content-length: 88
    content-type: application/json
    origin: http://localhost:5174
    pragma: no-cache
    referer: http://localhost:5174/
    sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
    sec-ch-ua-mobile: ?1
    sec-ch-ua-platform: "Android"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: cross-site
    user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
  4. Request data
    {
      "messages": [
        {
          "role": "user",
          "content": "说一下冒泡排序"
        }
      ],
      "model": "gpt-3.5-turbo"
    }

With streaming
  5. Request, e.g. https://freegpt.one/
    Request URL: https://freegpt.one/backend-api/conversation
    Request Method: POST
    Status Code: 200
    Remote Address: 119.23.226.212:3008
    Referrer Policy: strict-origin-when-cross-origin
  6. Response headers
    access-control-allow-origin: *
    cache-control: no-cache
    cf-cache-status: DYNAMIC
    cf-ray: 7a69bd4d9900aac9-SYD
    content-type: text/event-stream # the streamed response format
    date: Sun, 12 Mar 2023 05:46:55 GMT
    nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=uqkV8Rbu6KwATAq5Enx2gEs62OtzHb1OzxOEA%2FPPDTmEOU8rB2mTJRsbo%2FhBhozu4QjYf4hB8PB95lI3j7wflQxjXd59ZFJdC3nZIJrV5EAvf2UHqpr8C%2BKLgJ%2Bc"}],"group":"cf-nel","max_age":604800}
    server: cloudflare
  7. Request headers
    :authority: freegpt.one
    :method: POST
    :path: /backend-api/conversation
    :scheme: https
    accept: text/event-stream # explicitly requests an event stream
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    authorization: Bearer
    cache-control: no-cache
    content-length: 262
    content-type: application/json
    cookie: cf_clearance=SCO.6UKnQWds7f7f0di8FY7YoR3SvGuRh8GLNecfsYo-1678599991-0-250
    origin: https://freegpt.one # the proxy site
    pragma: no-cache
    referer: https://freegpt.one/
    sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
    sec-ch-ua-mobile: ?1
    sec-ch-ua-platform: "Android"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: same-origin
    user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
    x-openai-assistant-app-id
  8. Request data
    {
      "action": "next",
      "messages": [
        {
          "id": "60d068fc-54b8-43aa-a10c-59f6d009a95c",
          "role": "user",
          "content": {
            "content_type": "text",
            "parts": [
              "python怎么流式输出response"
            ]
          }
        }
      ],
      "parent_message_id": "283deb9b-ac7a-430f-95a9-92394ce9e149",
      "model": "text-davinci-002-render"
    }
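The two captures above differ mainly in the response content-type, so a client can branch on that header to decide how to consume the body. A hypothetical dispatch sketch:

```python
def response_mode(content_type):
    """Decide how to consume the body based on the Content-Type response header."""
    if content_type.startswith("text/event-stream"):
        return "stream"  # render deltas as SSE events arrive
    if content_type.startswith("application/json"):
        return "whole"   # wait for, then parse, one complete JSON body
    return "raw"         # e.g. application/octet-stream: read bytes as they arrive

print(response_mode("application/json"))   # → whole
print(response_mode("text/event-stream"))  # → stream
```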

Another streaming approach, e.g. https://gpt.gcchen.cn/:

  1. Request
    Request URL: https://gpt.gcchen.cn/api/chat-process
    Request Method: POST
    Status Code: 200
    Remote Address: 1.12.228.197:443
    Referrer Policy: strict-origin-when-cross-origin
  2. Response headers
    access-control-allow-credentials: true
    access-control-allow-headers: Content-Type
    access-control-allow-methods:
    access-control-allow-origin: *
    content-type: application/octet-stream
    date: Sun, 12 Mar 2023 06:05:49 GMT
    server: nginx
    strict-transport-security: max-age=31536000
    x-powered-by: Express
  3. Request headers
    :authority: gpt.gcchen.cn
    :method: POST
    :path: /api/chat-process
    :scheme: https
    accept: application/json, text/plain, */*
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    cache-control: no-cache
    content-length: 38
    content-type: application/json
    cookie: Hm_lvt_837d979bc9b73183a508c3486c37a02c=1678601121; Hm_lpvt_837d979bc9b73183a508c3486c37a02c=1678601121
    origin: https://gpt.gcchen.cn
    pragma: no-cache
    referer: https://gpt.gcchen.cn/
    sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
    sec-ch-ua-mobile: ?1
    sec-ch-ua-platform: "Android"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: same-origin
    user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36
  4. Request data
    {
      "prompt": "冒泡排序",
      "options": {}
    }
    Second request data:
    {
      "prompt": "它的时间复杂度是多少",
      "options": {
        "parentMessageId": "chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS"
      }
    }
    Third request data:
    {
      "prompt": "和插入排序比怎么样",
      "options": {
        "parentMessageId": "chatcmpl-6t9BNZ47EjAa9j2xIkY3ZbJdYKNNK" # carries the conversation history forward
      }
    }
  5. Response data
    {"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}}
    {"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒","delta":"冒","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"冒"},"index":0,"finish_reason":null}]}}
    {"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡","delta":"泡","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"泡"},"index":0,"finish_reason":null}]}}

    "parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡排序是一种简单的排序算法,它重复地走访过要排序的数列,每次比较相邻的两个元素,如果顺序错误就交换它们的位置。经过一轮的比较后,最大(或最小)的元素就被交换到了数列的末尾(或开头),然后再从头开始进行下一轮比较和交换,直到全部元素都有序排列为止。","delta":"。","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"。"},"index":0,"finish_reason":null}]}}
    {"role":"assistant","id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","parentMessageId":"63e9fcea-8c58-4174-b8bf-73898f366cf1","text":"冒泡排序是一种简单的排序算法,它重复地走访过要排序的数列,每次比较相邻的两个元素,如果顺序错误就交换它们的位置。经过一轮的比较后,最大(或最小)的元素就被交换到了数列的末尾(或开头),然后再从头开始进行下一轮比较和交换,直到全部元素都有序排列为止。","detail":{"id":"chatcmpl-6t981Jwm9KjjCvfqlS6uCP66ChYnS","object":"chat.completion.chunk","created":1678601149,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}} # the final chunk has finish_reason "stop"; on the wire the stream then ends with data: [DONE]
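The parentMessageId chaining seen in the three requests above can be sketched server-side as a simple linked store. All names here are hypothetical illustrations, not the site's actual implementation:

```python
import uuid

store = {}  # message id -> {"text": ..., "parent": ...}

def record(text, parent_id=None):
    """Store a turn and link it to its parent, returning its id for the next request."""
    mid = str(uuid.uuid4())
    store[mid] = {"text": text, "parent": parent_id}
    return mid

def thread(mid):
    """Walk parent links back to the root and return the turns oldest-first."""
    turns = []
    while mid is not None:
        node = store[mid]
        turns.append(node["text"])
        mid = node["parent"]
    return list(reversed(turns))

# The three prompts from the captures, chained the same way the client chains them:
m1 = record("冒泡排序")
m2 = record("它的时间复杂度是多少", parent_id=m1)
m3 = record("和插入排序比怎么样", parent_id=m2)
print(thread(m3))  # the full conversation, oldest prompt first
```

Rebuilding the thread this way is what lets the backend resend the whole history to the stateless chat API on every turn.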