正文
python3 + 正则表达式爬取猫眼电影 TOP100
小程序:扫一扫查出行
【扫一扫了解最新限行尾号】
'''Request+正则表达式抓取猫眼电影TOP100内容'''
import requests
from requests.exceptions import RequestException
import re
import json
from multiprocessing import Pool #进程池
def get_one_page(url):
    """Fetch *url* and return the response body as text, or None on failure.

    Fix: the original returned the string "error!" on RequestException,
    which is inconsistent with the None returned for non-200 responses and
    would crash re.findall() downstream.  Both failure paths now return None.
    """
    try:
        # timeout keeps the crawler from hanging forever on a dead host.
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        # Network-level failure (DNS error, connection reset, timeout, ...).
        return None


# Compiled once at module level: the same pattern is reused for every page.
# Raw strings fix the invalid "\d" escape in the original non-raw literal.
_MOVIE_PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,  # re.S lets .*? span the newlines inside each <dd> entry
)


def parse_one_page(html):
    """Yield one dict per movie entry found in *html*.

    Keys are kept in Chinese to preserve the original output format:
    排名 (rank), 海报连接 (poster URL), 电影名 (title), 主演 (stars),
    上映时间 (release date), 评分 (score).
    """
    for item in _MOVIE_PATTERN.findall(html):
        yield {
            "排名": item[0],
            "海报连接": item[1],
            "电影名": item[2],
            # strip() removes surrounding whitespace; [3:] drops the
            # literal "主演:" prefix embedded in the page markup.
            "主演": item[3].strip()[3:],
            # [5:] drops the literal "上映时间:" prefix.
            "上映时间": item[4].strip()[5:],
            # Integer and fraction parts are split in the markup; join them.
            "评分": item[5] + item[6],
        }


def write_to_file(content):
    """Append *content* (a dict) to content.txt as one JSON line."""
    with open("content.txt", "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese text readable in the file.
        # The redundant f.close() is gone: the with-block closes the file.
        f.write(json.dumps(content, ensure_ascii=False) + '\n')
def main(offset):
    """Crawl one page of the Maoyan TOP100 board and persist every entry.

    offset: pagination offset (0, 10, 20, ...) appended to the board URL.
    """
    url = "https://maoyan.com/board/4?offset=" + str(offset)
    html = get_one_page(url)
    # get_one_page returns None on any failure; skip this page instead of
    # crashing parse_one_page with a None argument (the original bug).
    if html is None:
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    # Sequential alternative, kept for reference:
    # for i in range(10):
    #     main(i * 10)
    # Fetch the 10 pages in parallel; the with-block guarantees the pool's
    # worker processes are cleaned up (the original never closed the pool).
    with Pool() as pool:
        pool.map(main, [i * 10 for i in range(10)])