Global Hosting Exchange Forum (全球主机交流论坛)

Title: Calling Python experts: a question about the format of scraped data

Author: 张大牛    Time: 2023-9-18 00:17
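Baidu recently launched its chat service at https://chat.baidu.com/

I wanted to build my own wrapper around it to play with. I originally planned to use PHP, but since the response is streamed, PHP's curl didn't seem able to handle it, so I switched to Python.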
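For reference, one way to consume a streamed HTTP response in Python using only the standard library is to read it line by line as it arrives instead of waiting for the whole body. This is a minimal sketch: the thread does not show how baidu.py builds its request, so the URL, body and headers in the usage comment are placeholder assumptions, and `iter_stream_lines` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Iterator


def iter_stream_lines(stream: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded lines from a binary stream one at a time.

    Works with any binary file-like object, including the response object
    returned by urllib.request.urlopen(), which yields one line per iteration.
    """
    for raw in stream:
        # Drop the trailing newline (and a '\r' if the server sends CRLF).
        yield raw.decode(encoding).rstrip("\r\n")


# Usage sketch (url, body and headers are placeholders, not from the thread):
#   import urllib.request
#   req = urllib.request.Request(url, data=body, headers=headers)
#   with urllib.request.urlopen(req) as resp:
#       for line in iter_stream_lines(resp):
#           print(line)

if __name__ == "__main__":
    demo = io.BytesIO(b"event:ping\n\nevent:message\ndata:{}\n")
    for line in iter_stream_lines(demo):
        print(repr(line))
```

Splitting on the newline byte before decoding is safe for UTF-8, because no multi-byte UTF-8 sequence contains that byte. If the script already uses the third-party requests library, `requests.get(..., stream=True)` together with `Response.iter_lines()` serves the same purpose.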

The code is about 90% done.


The code runs and does return data,

but all of it comes back in this format:

C:\>python baidu.py
请求成功
event:ping

event:message
data:{"status":0,"qid":"12643291455431891488","pkgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7_0","sessionId":"43d95fc2-b94f-442f-9c5b-b8078f712862","isDefault":1,"isShow":0,"data":{"message":{"msgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7","isRebuild":false,"updateTime":"1694966929746","metaData":{"state":"waiting-resp","endTurn":false,"userInfo":{"status":3}},"content":{}}}}


event:message
data:{"status":0,"qid":"12643291455431891488","pkgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7_1","sessionId":"43d95fc2-b94f-442f-9c5b-b8078f712862","isDefault":1,"isShow":0,"data":{"message":{"msgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7","isRebuild":false,"updateTime":"1694966933900","metaData":{"state":"waiting-resp","endTurn":false,"userInfo":{"status":3}},"content":{"searchQuery":{"querys":["鲁迅是谁"]}}}}}


event:message
data:{"status":0,"qid":"12643291455431891488","pkgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7_2","sessionId":"43d95fc2-b94f-442f-9c5b-b8078f712862","isDefault":1,"isShow":0,"data":{"message":{"msgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7","isRebuild":false,"updateTime":"1694966933978","metaData":{"state":"generating-resp","endTurn":false,"userInfo":{"status":3}},"content":{"generator":{"text":"鲁迅,原名周樟","type":"txt","showType":"append","antiFlag":0,"isFinished":false}}}}}


event:message
data:{"status":0,"qid":"12643291455431891488","pkgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7_3","sessionId":"43d95fc2-b94f-442f-9c5b-b8078f712862","isDefault":1,"isShow":0,"data":{"message":{"msgId":"cd971053-9663-4d88-8a54-2c428ddbf3b7","isRebuild":false,"updateTime":"1694966934570","metaData":{"state":"generating-resp","endTurn":false,"userInfo":{"status":3}},"content":{"generator":{"text":"寿,后改名周树人,字豫山,后改字豫才,是浙江绍兴的人。","type":"txt","showType":"append","antiFlag":0,"isFinished":false}}}}}

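How can I pull out the text fields I need and assemble them into the complete answer?

Any pointers from the experts would be much appreciated.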
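The output above appears to follow the Server-Sent Events (text/event-stream) layout: each event is a group of `field:value` lines ended by a blank line, `event:` names the event, and `data:` carries the payload (here one JSON object per event). A minimal sketch of grouping such lines into events; `iter_sse_events` is a hypothetical helper name, and it is slightly more lenient than a strict SSE parser because it also emits a final event that is not followed by a blank line.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Group SSE-style lines into (event_name, data) pairs.

    A blank line ends an event. Several 'data:' lines in one event are joined
    with newlines. Events without any data (such as 'event:ping') are skipped.
    """
    event = "message"  # default event name when no 'event:' line is given
    data: List[str] = []
    for line in lines:
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        # Split on the first colon only; the JSON payload contains colons too.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon is allowed
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Lines starting with ':' (comments) and unknown fields are ignored.
    # Lenient: flush a trailing event that had no closing blank line.
    if data:
        yield event, "\n".join(data)
```

Each `data` string yielded here is a JSON document that can then be decoded with `json.loads`; the `event:ping` keep-alive carries no data and is dropped.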
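Author: taiyi747    Time: 2023-9-18 00:21
Regex can handle anything. I dealt with exactly this format a couple of days ago: just hand the content to an AI and have it write the regular expression for you.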
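Along the lines of the regex suggestion, a sketch that pulls every `"text":"..."` value straight out of the raw stream. It assumes, judging only from the sample, that the answer chunks are the only fields named `text` and that there is no whitespace around the colon; if either assumption breaks, the regex silently grabs the wrong text or nothing at all, which is why decoding each `data:` line as JSON tends to be sturdier. `extract_text_with_regex` is a hypothetical name.

```python
import json
import re

# Captures the body of every "text":"..." string, allowing escaped
# characters such as \" or \n inside it.
TEXT_FIELD = re.compile(r'"text":"((?:[^"\\]|\\.)*)"')


def extract_text_with_regex(raw: str) -> str:
    """Concatenate every "text" value found anywhere in the raw stream."""
    parts = []
    for match in TEXT_FIELD.finditer(raw):
        # Re-wrap the captured body in quotes so json.loads undoes
        # escape sequences such as \" or \u9c81.
        parts.append(json.loads('"' + match.group(1) + '"'))
    return "".join(parts)
```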
Author: BackDoor    Time: 2023-9-18 00:24
Notice: The author has been banned or deleted; content automatically hidden.
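Author: 张大牛    Time: 2023-9-18 00:31
taiyi747 posted on 2023-9-18 00:21
Regex can handle anything. I dealt with exactly this format a couple of days ago: just hand the content to an AI and have it write the regular expression for you ...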

Thanks, I hadn't thought of that; I'd been fixated on JSON parsing.
Author: 张大牛    Time: 2023-9-18 01:03
BackDoor posted on 2023-9-18 00:24
Python has a json library that gives you this directly.

parsed_data['data']['message']['content']['generator']['text']

OK, thanks!
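Decoding each `data:` line with `json.loads` and walking the key path quoted above works, with one catch visible in the sample: the first two events (`"state":"waiting-resp"`) have no `generator` object (`content` is `{}` or only holds `searchQuery`), so plain chained indexing would raise `KeyError` on them. A minimal sketch that skips such events and joins the rest; the function names are hypothetical, and simple concatenation rests on the assumption, suggested by `"showType":"append"` but not confirmed in the thread, that each chunk is meant to be appended to the previous one.

```python
import json
from typing import Iterable, Optional


def generator_text(payload: dict) -> Optional[str]:
    """Return payload['data']['message']['content']['generator']['text'],
    or None if any key along the way is missing."""
    node = payload
    for key in ("data", "message", "content", "generator", "text"):
        if not isinstance(node, dict) or key not in node:
            return None  # e.g. early events whose content is {} or searchQuery
        node = node[key]
    return node if isinstance(node, str) else None


def assemble_answer(lines: Iterable[str]) -> str:
    """Join the text chunks from every 'data:' line, in arrival order."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip 'event:' lines and blank separators
        payload = json.loads(line[len("data:"):])
        text = generator_text(payload)
        if text is not None:
            parts.append(text)  # assumed: showType "append" means concatenate
    return "".join(parts)
```

Fed the four events shown in the question, this would return the two `text` chunks joined into one sentence, while the `event:ping` line and the two `waiting-resp` events contribute nothing.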
Author: william2ct    Time: 2023-9-18 01:42
json.loads()
Author: zqm840527    Time: 2023-9-18 08:48
Isn't this just JSON?
Author: 秋秋0827    Time: 2023-9-18 10:34
This is a basic Python JSON question. Your avatar is a bit flashy, by the way.




Welcome to Global Hosting Exchange Forum (https://d.168530.xyz/) Powered by Discuz! X3.4