re: regular expressions
lxml: XPath syntax
BeautifulSoup library: BeautifulSoup's own search syntax
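For comparison, here is a minimal sketch of the first option, regular expressions, using only the standard library. The markup in SAMPLE_HTML and the helper name extract_titles are made up for illustration and are not taken from the real site. A regex like this is brittle against real-world HTML, which is one reason to prefer a parser.

```python
import re

# Hypothetical markup, loosely modelled on a listing page (not copied from the real site).
SAMPLE_HTML = """
<ul class="carlist clearfix js-top">
  <li><h2 class="t">Car A 2018</h2></li>
  <li><h2 class="t">Car B 2016</h2></li>
</ul>
"""

def extract_titles(html):
    """Return the text of every <h2> element matched by a regular expression."""
    # Non-greedy match between the opening and closing h2 tags; re.S lets '.' span newlines.
    return [m.strip() for m in re.findall(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S)]

titles = extract_titles(SAMPLE_HTML)
print(titles)
```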
This walkthrough uses the BeautifulSoup library. First, import the modules:
import requests
from bs4 import BeautifulSoup
headers = {
'Cookie': 'cainfo=%7B%22ca_a%22%3A%22-%22%2C%22ca_b%22%3A%22-%22%2C%22ca_s%22%3A%22seo_baidu%22%2C%22ca_n%22%3A%22default%22%2C%22ca_medium%22%3A%22-%22%2C%22ca_term%22%3A%22-%22%2C%22ca_content%22%3A%22%22%2C%22ca_campaign%22%3A%22%22%2C%22ca_kw%22%3A%22-%22%2C%22ca_i%22%3A%22-%22%2C%22scode%22%3A%2210103000312%22%2C%22keyword%22%3A%22-%22%2C%22ca_keywordid%22%3A%22-%22%2C%22ca_transid%22%3A%22%22%2C%22platform%22%3A%221%22%2C%22version%22%3A1%2C%22track_id%22%3A%224319658499235840%22%2C%22display_finance_flag%22%3A%22-%22%2C%22client_ab%22%3A%22-%22%2C%22guid%22%3A%22451910d6-aaa6-4408-ad0b-e6db61a0fb37%22%2C%22ca_city%22%3A%22linfen%22%2C%22sessionid%22%3A%22db60e9f7-55fa-4127-bbc3-67cdc766a720%22%7D; uuid=451910d6-aaa6-4408-ad0b-e6db61a0fb37; cityDomain=sh; ganji_uuid=5711987712877171316020; antipas=86302508296801922D41dx; clueSourceCode=%2A%2300; user_city_id=13; preTime=%7B%22last%22%3A1583804158%2C%22this%22%3A1583804141%2C%22pre%22%3A1583804141%7D; Hm_lvt_936a6d5df3f3d309bda39e92da3dd52f=1583804168; Hm_lpvt_936a6d5df3f3d309bda39e92da3dd52f=1583804184; sessionid=db60e9f7-55fa-4127-bbc3-67cdc766a720; lg=1; close_finance_popup=2020-03-10',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
}
url = 'https://www.guazi.com/sh/buy/'
resp = requests.get(url, headers=headers)
html = resp.content.decode('utf-8')
soup = BeautifulSoup(html, 'html.parser')
print(soup)
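Note: html.parser is the HTML parser from Python's standard library, which BeautifulSoup can use without installing anything extra (it is slower than lxml). It parses the page source.
In the parsed result the printed output is laid out differently and looks tidier, which makes it easier to locate and extract the data.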
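As a small illustration of the parsing step, the sketch below runs html.parser over a made-up one-line snippet instead of the downloaded page. If an indented view is wanted, BeautifulSoup's prettify() method returns the tree as a string with each tag on its own line. The variable names and the markup are assumptions for the example.

```python
from bs4 import BeautifulSoup

# Made-up, single-line markup standing in for the downloaded page source.
raw = '<ul class="carlist"><li><h2>Car A</h2></li><li><h2>Car B</h2></li></ul>'

# 'html.parser' needs no extra install; 'lxml' is a faster alternative
# if that package happens to be available.
soup = BeautifulSoup(raw, "html.parser")

# prettify() returns the parsed tree as a string, one tag per line, indented by depth.
pretty = soup.prettify()
print(pretty)

# The parsed tree can be navigated directly: soup.h2 is the first <h2> in the document.
first_title = soup.h2.get_text()
print(first_title)
```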
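In the page source, all of the vehicle listings sit inside the <ul class="carlist clearfix js-top"> tag, so we need to go into that tag to get the information.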
Extracting the information
infos = soup.find('ul', {'class': 'carlist clearfix js-top'}).find_all('li')
print(infos)
for info in infos:
    # vehicle type (read from the h2 tag of each listing)
    car_type = info.find('h2').get_text()
    print(car_type)
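Note: soup.find first locates the ul tag whose class attribute is "carlist clearfix js-top"; find_all then returns every li tag beneath it, together with its contents.
The for loop then walks through those li tags and pulls out the details of each vehicle, calling find on whichever tag holds the value in question (price, mileage, and so on). The vehicle type sits in the h2 tag, so it is read from there.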
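The find / find_all pattern described above can be sketched on a static snippet. The class "carlist clearfix js-top" comes from the code above; everything else (the "t-i" class, the sample values, the parse_listings helper) is hypothetical. Since find returns None when nothing matches, the sketch checks for that rather than chaining calls directly, which would otherwise raise AttributeError if the page layout changes or the site returns a different page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the ul class name is taken from the text above.
SAMPLE_HTML = """
<ul class="carlist clearfix js-top">
  <li><h2 class="t">Car A 2018</h2><div class="t-i">2018 | 30,000 km</div></li>
  <li><h2 class="t">Car B 2016</h2><div class="t-i">2016 | 52,000 km</div></li>
  <li><div class="t-i">listing without a title</div></li>
</ul>
"""

def parse_listings(html):
    """Return one dict per <li> inside the car list; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    car_list = soup.find("ul", {"class": "carlist clearfix js-top"})
    if car_list is None:
        # The expected container is absent, so there is nothing to parse.
        return []
    rows = []
    for item in car_list.find_all("li"):
        title = item.find("h2")
        detail = item.find("div", {"class": "t-i"})  # assumed class name
        rows.append({
            "title": title.get_text(strip=True) if title is not None else None,
            "detail": detail.get_text(strip=True) if detail is not None else None,
        })
    return rows

rows = parse_listings(SAMPLE_HTML)
for row in rows:
    print(row)
```

Further fields can be collected the same way by adding another find call for the tag that holds the value.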
Result: each pass through the loop prints the vehicle type of one listing.
Author: ZoomToday