
Web Scraper Reverse Engineering in Practice (12)


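I. Analyzing the data interface

Home page: a certain exchange

1. Packet capture

Capturing the traffic shows that login is done through a form submission.

2. Checking for encrypted parameters

Are the request parameters encrypted?
The "Payload" tab shows two encrypted parameters: password and execution.
Are the request headers encrypted?
No.
Is the response encrypted?
No.
Are the cookies encrypted?
No.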

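II. Locating the encryption

1. password

(1) Check the Initiator

Because this login is a form submission, the Initiator panel cannot be used to locate the code.

(2) Search for keywords

Searching for the keyword "password =" leads to the place where password is encrypted.

2. execution

Searching for the keyword execution shows that this value is written directly into the static HTML page.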

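III. Extracting the JS code

At the located position, the site only passes the password through an encode64 method. In case the site uses a modified version of that method, test it first: Base64-decode the password ciphertext from the captured request. It decodes to the plaintext password, which confirms that the site does nothing more than standard Base64 encoding. There is therefore no need to extract any JS code.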
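The check described above can be sketched in a few lines. This is only an illustration: the sample password is made up, and the helper names encode_password and decode_password are hypothetical; it assumes encode64 behaves like standard Base64, which is what the test on the captured value indicated.

```python
import base64


def encode_password(plain: str) -> str:
    """Base64-encode a password, mirroring what the page's encode64 appears to do."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_password(cipher: str) -> str:
    """Decode a captured value to check that it is plain Base64."""
    return base64.b64decode(cipher).decode("utf-8")


# Hypothetical value standing in for the password parameter seen in the capture.
captured = encode_password("MyPassw0rd!")
print(captured)
print(decode_password(captured))
```

If decoding the captured value returns the password that was typed in, no custom algorithm is involved.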
execution is written directly into the static page, so we only need to request that page first and then match the value with the regular expression '<input type="hidden" name="execution" value="(.*?)"'.
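A minimal sketch of that extraction, run against a made-up HTML fragment rather than the real login page. The function name extract_execution and the sample value are assumptions for illustration; only the regular expression comes from the article.

```python
import re

# The pattern quoted in the article; the lazy group stops at the first closing quote.
EXECUTION_PATTERN = re.compile(
    r'<input type="hidden" name="execution" value="(.*?)"'
)


def extract_execution(html: str) -> str:
    """Return the hidden execution value, or raise if the field is missing."""
    match = EXECUTION_PATTERN.search(html)
    if match is None:
        raise ValueError("execution field not found in page")
    return match.group(1)


# Hypothetical fragment standing in for the static login page.
sample_html = '<form><input type="hidden" name="execution" value="e1s1-abc123"/></form>'
print(extract_execution(sample_html))
```

Raising an error when the field is missing makes a changed page layout visible immediately, instead of failing later with an IndexError on an empty match list.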

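IV. CAPTCHA

1. Interface analysis

Clicking the image to refresh the CAPTCHA shows that each refresh sends a request to sso/picture.

The "Payload" tab shows that this request carries four parameters: receiver, enuuid, mark and rand. mark is the account name and rand is a random number, so neither needs special attention. Looking closely at the captured traffic, receiver and enuuid both come from an endpoint named enuuid.

So we can request that endpoint to obtain both parameters.
When the CAPTCHA text is typed into the input box, the site calls the sso/validlogin endpoint with the entered text to validate it.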
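A minimal sketch of assembling the sso/picture query parameters. It assumes the enuuid endpoint returns JSON with uuid and enuuid fields and that receiver takes the uuid value, which is how the article's source code reads them. The function name build_picture_params and the sample values are hypothetical.

```python
import random


def build_picture_params(enuuid_response: dict, username: str) -> dict:
    """Map the enuuid endpoint's JSON onto the four sso/picture parameters."""
    return {
        "receiver": enuuid_response["uuid"],   # assumed field name
        "enuuid": enuuid_response["enuuid"],   # assumed field name
        "mark": username,                      # the account name
        "rand": str(random.random()),          # any random number in [0, 1)
    }


# Made-up response standing in for the real enuuid endpoint.
fake_response = {"uuid": "u-123", "enuuid": "e-456"}
params = build_picture_params(fake_response, "demo_user")
print(params)
```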

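V. Sending the requests

1. Approach

Putting the analysis together: first request the static HTML page to get the execution parameter, then request the enuuid endpoint to get the uuid and enuuid parameters. With those two values, request the sso/picture endpoint to fetch the CAPTCHA image and recognize it (I use a CAPTCHA-solving platform for this). Send the recognized text to the sso/validlogin endpoint; once it returns a success response, Base64-encode the password and submit the login request.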
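Recognition can fail, so the fetch, recognize and validate steps usually need to be repeated. The sketch below shows one way to bound that loop; it is not the author's code. The name solve_captcha, the max_attempts parameter and the stand-in callables are all hypothetical, and the network calls are replaced with fakes so the example runs on its own.

```python
from typing import Callable


def solve_captcha(
    fetch_and_recognize: Callable[[], str],
    validate: Callable[[str], bool],
    max_attempts: int = 5,
) -> str:
    """Retry until the validator accepts a recognized CAPTCHA text."""
    for _ in range(max_attempts):
        text = fetch_and_recognize()   # would call sso/picture + the solving platform
        if validate(text):             # would call sso/validlogin
            return text
    raise RuntimeError(f"CAPTCHA not solved after {max_attempts} attempts")


# Stand-ins: the first guess is wrong, the second is accepted.
guesses = iter(["xx00", "ab12"])
result = solve_captcha(lambda: next(guesses), lambda text: text == "ab12")
print(result)
```

Capping the number of attempts keeps a misbehaving recognizer from spending solving-platform credits indefinitely.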

2. Source code

"""
Email：912917367@qq
Date: 2023/8/14 13:37
"""
import base64
import re
import time

import requests

# Local wrapper around the Chaojiying CAPTCHA-solving service.
from utils.chaojiying import ChaojiyingClient


class Spider:
    def __init__(self, username, password):
        self.session = requests.session()
        self.session.headers = {
            "Origin": "",
            "Referer": "=/&locale=zh",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
        }
        self.execution = ''
        self.uuid = ''
        self.enuuid = ''
        self.pic_str = ''
        self.username = username
        self.password = password

    def get_execution(self):
        """Fetch the static login page and pull out the hidden execution value."""
        url = '=/&locale=zh'
        response = self.session.get(url=url)
        pattern = r'<input type="hidden" name="execution" value="(.*?)"'
        self.execution = re.findall(pattern, response.text)[0]

    def get_uuid(self):
        """Request the enuuid endpoint to obtain uuid and enuuid."""
        url = ""
        params = {
            "service": "/",
            "locale": "zh",
            "_": int(time.time() * 1000),
        }
        response = self.session.get(url, params=params)
        info_data = response.json()
        self.uuid = info_data['uuid']
        self.enuuid = info_data['enuuid']

    def get_img_code(self):
        """Download the CAPTCHA image and have the solving platform recognize it."""
        url = ""
        params = {
            "receiver": self.uuid,
            "enuuid": self.enuuid,
            "rand": "0.004521081820116013",
        }
        response = self.session.get(url, params=params)
        with open('img.png', 'wb') as f:
            f.write(response.content)
        cjy = ChaojiyingClient('Chaojiying username', 'Chaojiying password', 'Chaojiying app id')
        with open('img.png', 'rb') as f:
            im = f.read()
        pic_data = cjy.post_pic(im, 1902)
        self.pic_str = pic_data['pic_str']
        print(self.pic_str)

    def check_img_code(self):
        """Ask sso/validlogin whether the recognized CAPTCHA text is right."""
        url = ""
        params = {
            "text": self.pic_str,
            "receiver": self.uuid,
            "mark": self.username,
            "type": "3",
            "_": int(time.time() * 1000),
        }
        response = self.session.get(url, params=params)
        # The server's message contains '正确' ("correct") on success.
        return '正确' in response.json()['message']

    def login(self):
        """Base64-encode the password and submit the login form."""
        encoded_bytes = base64.b64encode(self.password.encode('utf-8'))
        pwd = encoded_bytes.decode('utf-8')
        url = ""
        params = {
            "service": "/",
            "locale": "zh",
        }
        data = {
            "receiver": self.username,
            "iframe": "false",
            "password": pwd,
            "text": self.pic_str,
            "uuid": self.uuid,
            "type": "PL",
            "execution": self.execution,
            "_eventId": "submit",
        }
        response = self.session.post(url, params=params, data=data)
        print(response)

    def run(self):
        self.get_execution()
        self.get_uuid()
        # Keep fetching new CAPTCHAs until one is recognized correctly.
        while True:
            self.get_img_code()
            if self.check_img_code():
                break
        self.login()


if __name__ == '__main__':
    spider = Spider('username', 'password')
    spider.run()

Published: 2024-01-31 13:50:30

Article link: https://www.4u4v.net/it/170668023228978.html
