[Python Crawler] Guazi Used Car Site (2): Extracting Information

Sally · Updated: 2024-09-21 · 501 views

[Python Crawler] Guazi Used Car Site (1): Fetching Data

Parsing tools

re: regular expressions

lxml: XPath syntax

BeautifulSoup library: BeautifulSoup syntax
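To make the difference between these tools concrete, here is a minimal sketch that pulls the same title out of a tiny HTML fragment with re and with BeautifulSoup. The fragment is made up for illustration and is not copied from the real page; lxml is left out only to keep the dependencies down.

```python
import re
from bs4 import BeautifulSoup

# Made-up fragment for illustration; not taken from the real site.
html = '<li><h2 class="t">Sample Car 2018</h2></li>'

# re: match the text between the opening and closing h2 tags.
match = re.search(r'<h2[^>]*>(.*?)</h2>', html)
title_re = match.group(1) if match else None

# BeautifulSoup: parse the markup first, then navigate to the tag.
soup = BeautifulSoup(html, 'html.parser')
title_bs = soup.find('h2').get_text()

print(title_re)
print(title_bs)
```

The regular expression depends on the exact shape of the markup, while the BeautifulSoup version only depends on the tag name, which is one reason a parser tends to be easier to maintain for pages like this.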

This time we use the BeautifulSoup library. Import the modules:

import requests
from bs4 import BeautifulSoup

# Request headers sent with the request: Cookie and User-Agent
headers = {
    'Cookie': 'cainfo=%7B%22ca_a%22%3A%22-%22%2C%22ca_b%22%3A%22-%22%2C%22ca_s%22%3A%22seo_baidu%22%2C%22ca_n%22%3A%22default%22%2C%22ca_medium%22%3A%22-%22%2C%22ca_term%22%3A%22-%22%2C%22ca_content%22%3A%22%22%2C%22ca_campaign%22%3A%22%22%2C%22ca_kw%22%3A%22-%22%2C%22ca_i%22%3A%22-%22%2C%22scode%22%3A%2210103000312%22%2C%22keyword%22%3A%22-%22%2C%22ca_keywordid%22%3A%22-%22%2C%22ca_transid%22%3A%22%22%2C%22platform%22%3A%221%22%2C%22version%22%3A1%2C%22track_id%22%3A%224319658499235840%22%2C%22display_finance_flag%22%3A%22-%22%2C%22client_ab%22%3A%22-%22%2C%22guid%22%3A%22451910d6-aaa6-4408-ad0b-e6db61a0fb37%22%2C%22ca_city%22%3A%22linfen%22%2C%22sessionid%22%3A%22db60e9f7-55fa-4127-bbc3-67cdc766a720%22%7D; uuid=451910d6-aaa6-4408-ad0b-e6db61a0fb37; cityDomain=sh; ganji_uuid=5711987712877171316020; antipas=86302508296801922D41dx; clueSourceCode=%2A%2300; user_city_id=13; preTime=%7B%22last%22%3A1583804158%2C%22this%22%3A1583804141%2C%22pre%22%3A1583804141%7D; Hm_lvt_936a6d5df3f3d309bda39e92da3dd52f=1583804168; Hm_lpvt_936a6d5df3f3d309bda39e92da3dd52f=1583804184; sessionid=db60e9f7-55fa-4127-bbc3-67cdc766a720; lg=1; close_finance_popup=2020-03-10',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
}
url = 'https://www.guazi.com/sh/buy/'
resp = requests.get(url, headers=headers)
html = resp.content.decode('utf-8')
# Parse the page source with html.parser
soup = BeautifulSoup(html, 'html.parser')
print(soup)
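The request above assumes the server answers successfully and that the body is valid UTF-8. A slightly more defensive sketch is shown here; the helper name fetch_html, the timeout value, and the injectable get parameter are assumptions added for illustration and are not part of the original post.

```python
import requests


def fetch_html(url, headers, timeout=10, get=requests.get):
    """Fetch a page and return its body decoded as UTF-8.

    `get` defaults to requests.get; it is a parameter only so the
    function can be exercised without touching the network.
    """
    resp = get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of parsing an error page.
    resp.raise_for_status()
    # Replace undecodable bytes rather than failing on a stray character.
    return resp.content.decode('utf-8', errors='replace')


# Usage (performs a real request, so it is left commented out):
# html = fetch_html('https://www.guazi.com/sh/buy/', headers)
```

Failing early on a bad status code makes it easier to tell an expired cookie or a blocked request apart from a parsing problem later on.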

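Note: html.parser is the parser from Python's standard library, which BeautifulSoup can use without any extra install (it parses relatively slowly). Here it parses the page source. Parsed result: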

The printed output is laid out differently and looks tidier, which makes it easier to match and parse the data.
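If the goal is indented, one-tag-per-line output, BeautifulSoup's prettify() method produces it explicitly. A small sketch on a made-up fragment (the markup and sample values are assumptions, not the real page):

```python
from bs4 import BeautifulSoup

# Made-up, single-line markup standing in for the downloaded page source.
raw = ('<ul class="carlist"><li><h2>Sample Car A</h2></li>'
       '<li><h2>Sample Car B</h2></li></ul>')

soup = BeautifulSoup(raw, 'html.parser')

# prettify() returns a string with one tag per line, indented by depth.
pretty = soup.prettify()
print(pretty)
```

Reading the prettified text is a convenient way to spot which tags and class names wrap the data before writing any find calls.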

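In the page source, all of the vehicle information sits inside the <ul class="carlist clearfix js-top"> tag, so we need to go into that tag to get the information.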

Extracting the information

infos = soup.find('ul', {'class': 'carlist clearfix js-top'}).find_all('li')
print(infos)
for info in infos:
    # Vehicle type (named car_type to avoid shadowing the built-in type)
    car_type = info.find('h2').get_text()
    print(car_type)

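Note: soup.find first locates the ul tag whose class attribute is "carlist clearfix js-top"; find_all then returns every li tag beneath it, together with everything those tags contain.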
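The find and find_all chain can be tried offline on a small stand-in document. The sample markup below is an assumption modeled on the structure described here, not the real page; the None check is an added safeguard in case the list is missing from the response.

```python
from bs4 import BeautifulSoup

# Made-up markup with the same ul class as described above.
sample = '''
<ul class="carlist clearfix js-top">
  <li><h2>Sample Car A</h2></li>
  <li><h2>Sample Car B</h2></li>
</ul>
<ul class="other"><li>not a car</li></ul>
'''

soup = BeautifulSoup(sample, 'html.parser')

# find returns the first matching tag, or None when nothing matches.
car_list = soup.find('ul', {'class': 'carlist clearfix js-top'})

# find_all returns a list of every li tag under that ul.
infos = car_list.find_all('li') if car_list is not None else []

print(len(infos))
```

Only the li tags under the matching ul are collected; the second list is ignored because find_all is called on car_list rather than on the whole document.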

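The for loop then parses what was retrieved, pulling the specific vehicle details out of each li tag. Call find on whichever tag holds each detail, such as the price or the mileage. As the figure below shows, the vehicle type is in the h2 tag, so it can be read from there.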

Result:
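The same loop extends to other fields such as price and mileage. The sketch below is hedged: the div tags, the class names mileage and price, the sample values, and the helper text_of are all hypothetical, since the post does not show which tags hold those fields on the real page. Inspect the page source and substitute the real ones.

```python
from bs4 import BeautifulSoup

# Made-up markup; the 'mileage' and 'price' class names are hypothetical.
sample = '''
<ul class="carlist clearfix js-top">
  <li>
    <h2>Sample Car A</h2>
    <div class="mileage">32,000 km</div>
    <div class="price">125,000 CNY</div>
  </li>
  <li><div class="ad">banner without a title</div></li>
</ul>
'''


def text_of(node, name, attrs=None):
    """Return the stripped text of the first matching child tag, or None."""
    tag = node.find(name, attrs or {})
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(sample, 'html.parser')
cars = []
for info in soup.find('ul', {'class': 'carlist clearfix js-top'}).find_all('li'):
    car_type = text_of(info, 'h2')
    if car_type is None:
        # Skip list items that are not vehicle entries (no h2 title).
        continue
    cars.append({
        'type': car_type,
        'mileage': text_of(info, 'div', {'class': 'mileage'}),
        'price': text_of(info, 'div', {'class': 'price'}),
    })

print(cars)
```

Checking for None before calling get_text avoids an AttributeError when a list item lacks the expected tag.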

[Python Crawler] Guazi Used Car Site (3)
[Python Crawler] Guazi Used Car Site: Complete Source Code
Author: ZoomToday



