A Detailed Guide to Using Python's urllib Library for Web Scraping

Below is a complete guide to using Python's urllib library for web scraping.

1. Introduction

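urllib is a package in Python's standard library for working with URLs. It bundles modules for sending web requests, parsing URLs, and handling request errors, so you can issue GET and POST requests and process the data a server returns. It retrieves pages but does not parse HTML itself.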

2. Common urllib Modules

urllib contains four modules: urllib.request, urllib.parse, urllib.error, and urllib.robotparser. The most commonly used is urllib.request, which sends HTTP and HTTPS requests.
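The two less familiar modules can be tried without any network access. The following is a minimal sketch, assuming a hand-written set of robots.txt rules and a hypothetical crawler name ("my-crawler"); neither comes from the original article.

```python
import urllib.error
import urllib.robotparser

# Hypothetical robots.txt rules, supplied as lines so no download is needed.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# "my-crawler" is an assumed user agent name used only for illustration.
allowed_public = parser.can_fetch("my-crawler", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("my-crawler", "http://www.example.com/private/data.html")
print(allowed_public, allowed_private)  # True False

# urllib.error: HTTPError is a subclass of URLError, so an except clause
# for HTTPError must come before one for URLError.
is_subclass = issubclass(urllib.error.HTTPError, urllib.error.URLError)
print(is_subclass)  # True
```

In a real crawler you would more likely call set_url() and read() on the parser to load the site's actual robots.txt before checking can_fetch().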

The urllib.request module

The urllib.request module provides the basic building blocks for making HTTP requests. The simplest way to send one is urlopen().

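import urllib.request

# The with block closes the connection once the body has been read.
with urllib.request.urlopen('http://www.baidu.com') as response:
    print(response.read())  # print the Baidu home page as raw bytes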
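Many sites respond differently to urllib's default User-Agent, so it is common to wrap the URL in a Request object and set headers explicitly. The sketch below is one way to do that; the USER_AGENT string and the helper names build_request and fetch_bytes are assumptions made for illustration, not part of urllib.

```python
import urllib.request

# Hypothetical User-Agent string; pick one that identifies your own crawler.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_request(url):
    """Return a GET Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_bytes(url, timeout=10):
    """Send the request and return the raw body (requires network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


req = build_request("http://www.baidu.com")
print(req.get_method())              # GET, because no data is attached
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

Calling fetch_bytes('http://www.baidu.com') would perform the actual download. The timeout argument keeps a stalled connection from blocking the program indefinitely.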

The urllib.parse module

The urllib.parse module parses and assembles URLs. Its urlencode() function encodes a dict of parameters into a URL query string.

import urllib.parse
params = {'name': 'Python', 'age': 30}
qs = urllib.parse.urlencode(params)
url = 'http://www.example.com?'+qs
print(url) # prints http://www.example.com?name=Python&age=30
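urllib.parse also works in the other direction, splitting a URL into its parts. The following sketch shows a few of its other functions; the URLs and query values are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode, urljoin, urlparse

# Split a URL into its components.
parts = urlparse("http://www.example.com/search?name=Python&age=30#top")
print(parts.scheme, parts.netloc, parts.path)  # http www.example.com /search

# Turn the query string back into a dict; every value is a list of strings.
query = parse_qs(parts.query)
print(query)  # {'name': ['Python'], 'age': ['30']}

# Resolve a relative link against the page it was found on.
joined = urljoin("http://www.example.com/docs/index.html", "../about.html")
print(joined)  # http://www.example.com/about.html

# quote() percent-encodes a single component; urlencode() uses '+' for spaces.
quoted = quote("web scraping")
encoded = urlencode({"q": "web scraping"})
print(quoted, encoded)  # web%20scraping q=web+scraping
```

urljoin() is particularly useful in crawlers, because the links found in a page are often relative to that page's URL.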

3. Examples

Example 1: Fetching a web page

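import urllib.request

# The with block closes the connection once the body has been read.
with urllib.request.urlopen('http://www.baidu.com') as response:
    print(response.read())  # print the Baidu home page as raw bytes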

Output:

b'<!DOCTYPE html>\n<!--STATUS OK-->\n<html>....'
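The b prefix in that output shows that read() returns bytes rather than text. A minimal sketch of decoding the body into a str follows. The helper names decode_body and fetch_text are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption that will not suit every site.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 if no charset is given."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url, timeout=10):
    """Fetch url and return its body as str (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


text = decode_body(b"<!DOCTYPE html>\n<html>")
print(text.splitlines()[0])  # <!DOCTYPE html>
```

Calling fetch_text('http://www.baidu.com') would return the page as text instead of bytes. The errors="replace" option substitutes a marker character for undecodable bytes instead of raising an exception.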

Example 2: Sending a POST request

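import urllib.request
import urllib.parse

# Passing data to urlopen() turns the request into a POST.
data = urllib.parse.urlencode({'name': 'Python', 'age': 30})
data = data.encode('ascii')  # urlopen() requires the body as bytes
url = 'http://www.example.com/login'  # placeholder login URL
with urllib.request.urlopen(url, data) as response:
    print(response.read())  # print the HTML the server returns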

Output (illustrative, since www.example.com/login is a placeholder URL):

b'<!DOCTYPE html>\n<html><head><title>Welcome to example.com</title></head><body><h1>Login success!</h1>...</body></html>'
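A real login endpoint may reject the request or be unreachable, in which case urlopen() raises an exception from urllib.error. The sketch below shows one possible way to build the POST explicitly and handle those failures; the helper names build_post and send and the (status, body) return convention are assumptions made for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_post(url, fields):
    """Return a POST Request whose body is the form-encoded fields."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send(req, timeout=10):
    """Send req; return (status, body), with status None if unreachable."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404 or 500.
        return err.code, err.read()
    except urllib.error.URLError as err:
        # No HTTP response at all: DNS failure, refused connection, etc.
        return None, str(err.reason).encode("utf-8")


# Placeholder URL and form fields taken from the example above.
req = build_post("http://www.example.com/login", {"name": "Python", "age": 30})
print(req.get_method(), req.data)  # POST b'name=Python&age=30'
```

Calling send(req) would perform the request. HTTPError is caught before URLError because it is a subclass of URLError.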

That concludes this guide to using Python's urllib library for web scraping. We hope you find it helpful.

