
Data Extraction Methods for Web Scraping (Part 1: JSON Extraction)


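Extracting data from JSON

1 What is JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for people to read and write, and easy for machines to parse and generate. It suits data-exchange scenarios such as communication between a website's front end and back end.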
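
As a small illustration of the exchange format, the sketch below parses a JSON text with Python's standard json module and shows how JSON types map onto Python types. The payload is made up for illustration and does not come from any real site.

```python
import json

# Hypothetical payload, e.g. what a back end might send to a front end
text = '{"name": "demo", "tags": ["a", "b"], "active": true, "score": null, "count": 3}'

data = json.loads(text)

# JSON object -> dict, array -> list, true -> True, null -> None, number -> int/float
print(type(data).__name__)
print(type(data["tags"]).__name__)
print(data["active"], data["score"], data["count"])
```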

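2 The methods of the json module

A note on file-like objects:

Any object that has a read() or write() method is a file-like object. For example, after f = open("a.txt", "r"), f is a file-like object.

Usage: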
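
To make the idea of a file-like object concrete, here is a minimal sketch that uses io.StringIO, an in-memory text buffer from the standard library, in place of a real file. The sample dictionary is an assumption chosen for illustration.

```python
import io
import json

# io.StringIO lives in memory but has read() and write(), so it is file-like
buffer = io.StringIO()
json.dump({"city": "北京"}, buffer, ensure_ascii=False)

buffer.seek(0)  # rewind to the start before reading back
restored = json.load(buffer)
print(restored)
```

Because json.dump and json.load only rely on write() and read(), the same calls work unchanged with a file opened by open().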

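import json

# mydict is any JSON-serializable Python object defined earlier
# json.dumps converts a Python object into a JSON string
# indent adds line breaks and indentation
# ensure_ascii=False keeps Chinese characters as Chinese in the output
json_str = json.dumps(mydict, indent=2, ensure_ascii=False)

# json.loads converts a JSON string into a Python data type
my_dict = json.loads(json_str)

# json.dump writes a Python object to a file-like object
# (the file name is garbled in the source; "data.json" is a placeholder)
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(mydict, f, ensure_ascii=False, indent=2)

# json.load converts the JSON string in a file-like object into a Python type
with open("data.json", "r", encoding="utf-8") as f:
    my_dict = json.load(f)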
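
The effect of the indent and ensure_ascii parameters is easier to see side by side. The following is a minimal sketch; the contents of mydict are sample data assumed for illustration.

```python
import json

mydict = {"name": "张三", "langs": ["python", "json"]}  # sample data, assumed

escaped = json.dumps(mydict)                        # default: non-ASCII escaped as \uXXXX
readable = json.dumps(mydict, ensure_ascii=False)   # Chinese characters kept as-is
pretty = json.dumps(mydict, indent=2, ensure_ascii=False)  # multi-line, indented

print(escaped)
print(readable)
print(pretty)

# All three strings decode back to the same Python object
assert json.loads(escaped) == json.loads(readable) == json.loads(pretty) == mydict
```

The three strings differ only in presentation, so the choice of parameters matters for human readability and file size, not for what json.loads returns.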

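3 The jsonpath module

3.1 Introduction to jsonpath

jsonpath is used to parse deeply nested JSON data. JsonPath is an information-extraction library, a tool for pulling specified information out of a JSON document, with implementations in several languages, including JavaScript, Python, PHP, and Java.

3.2 JsonPath is to JSON what XPath is to XML.

    Installation: pip install jsonpath
    Official documentation:

3.3 JsonPath and XPath syntax compared:

3.4 Syntax examples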

{
  "store": {
    "book": [
      {"category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95},
      {"category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99},
      {"category": "fiction", "author": "Herman Melville", "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
      {"category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99}
    ],
    "bicycle": {"color": "red", "price": 19.95}
  }
}

JSONPath	Result
$.store.book[*].author	The authors of all books in the store
$..author	All authors
$.store.*	Everything directly under store
$.store..price	The price of everything in the store
$..book[2]	The third book
$..book[(@.length-1)] | $..book[-1:]	The last book
$..book[0,1] | $..book[:2]	The first two books
$..book[?(@.isbn)]	All books that have an isbn
$..book[?(@.price<10)]	All books priced below 10
$..*	All data
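
To show what several of these expressions select, the sketch below writes plain-Python equivalents against the same store document. It does not use the jsonpath package; find_all is a hypothetical helper written for this illustration, and the recursive-descent operator `..` is approximated by walking every dict and list.

```python
def find_all(node, key):
    """Collect every value stored under `key` at any depth (roughly `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


doc = {"store": {
    "book": [
        {"category": "reference", "author": "Nigel Rees",
         "title": "Sayings of the Century", "price": 8.95},
        {"category": "fiction", "author": "Evelyn Waugh",
         "title": "Sword of Honour", "price": 12.99},
        {"category": "fiction", "author": "Herman Melville",
         "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
        {"category": "fiction", "author": "J. R. R. Tolkien",
         "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99},
    ],
    "bicycle": {"color": "red", "price": 19.95},
}}

books = doc["store"]["book"]
authors = find_all(doc, "author")                # $..author
prices = find_all(doc["store"], "price")         # $.store..price
third = books[2]                                 # $..book[2]
last = books[-1:]                                # $..book[-1:]
first_two = books[:2]                            # $..book[:2]
with_isbn = [b for b in books if "isbn" in b]    # $..book[?(@.isbn)]
cheap = [b for b in books if b["price"] < 10]    # $..book[?(@.price<10)]

print(authors)
print(prices)
print([b["title"] for b in cheap])
```

The filter in the last line keeps books priced below 10, which matches the `<` in the expression.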

3.5 Code example:

Taking Lagou's city JSON file .json as an example, we fetch all the cities.

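import json
import jsonpath
import requests

url = '.json'  # the URL is truncated in the source
response = requests.get(url)
html_str = response.content.decode()

# Convert the JSON string into a Python object
jsonobj = json.loads(html_str)

# Starting from the root node, match every name node
citylist = jsonpath.jsonpath(jsonobj, '$..name')

# Serialize the list, keeping Chinese characters unescaped, and save it as UTF-8
content = json.dumps(citylist, ensure_ascii=False)
with open('city.json', 'w', encoding='utf-8') as fp:
    fp.write(content)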

The result looks like this:

Published on 2024-02-01 08:21:39.

Article link: https://www.4u4v.net/it/170674689935190.html
