How to handle garbled HTML files in Python - 群英


Posted by Admin on 2022-07-19 17:34:10, 1343 views

This article covers how to handle garbled Chinese text in HTML files written from Python. It is offered as a reference that may help in your study or work.

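Garbled Chinese text when writing an HTML file in Python

When the HTML fetched by a crawler is written to a file with the open function, the Chinese text sometimes displays correctly in the console but comes out garbled in the written file.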

Case analysis

Consider the following code:

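# Crawler that does not use cookies
from urllib import request

if __name__ == '__main__':
    url = "http://www.renren.com/967487029/profile"

    rsp = request.urlopen(url)

    # decode() with no argument decodes the response bytes as UTF-8
    html = rsp.read().decode()

    # No encoding is passed, so open() uses the platform's default encoding
    with open("rsp.html", "w") as f:
        # Print the fetched page and write it to the file
        print(html)
        f.write(html)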

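The code looks fine, and the HTML printed to the console shows the Chinese text correctly, but the Chinese text in the created HTML file is garbled.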
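The article does not state the cause, so the following is an assumption: when no encoding argument is given, open() uses the platform's locale encoding (often GBK/cp936 on Chinese-language Windows), while browsers and editors typically read the page as UTF-8. The sketch below uses a made-up sample string, not the crawled page, to show that the same text becomes different bytes under the two encodings, and that GBK bytes read back as UTF-8 no longer match the original.

```python
import locale

# Encoding that open() falls back to when no encoding argument is given.
# The value depends on the platform and on whether UTF-8 mode is enabled.
print(locale.getpreferredencoding(False))

sample = "中文"  # hypothetical sample text, not taken from the crawled page

utf8_bytes = sample.encode("utf-8")
gbk_bytes = sample.encode("gbk")

# Same text, different bytes on disk
print(utf8_bytes)
print(gbk_bytes)

# Reading GBK bytes as if they were UTF-8 does not give back the original text
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled == sample)  # False
```

If the first line printed is something other than UTF-8 on your machine, that mismatch is a plausible source of the garbled file.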

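Solution

The open function accepts an encoding parameter. Without it, open() uses the platform's default (locale) encoding, which is not necessarily UTF-8. Adding encoding="utf-8" fixes the problem.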

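# Crawler that does not use cookies
from urllib import request

if __name__ == '__main__':
    url = "http://www.renren.com/967487029/profile"

    rsp = request.urlopen(url)

    # decode() with no argument decodes the response bytes as UTF-8
    html = rsp.read().decode()

    # Write the file as UTF-8 explicitly, regardless of the platform default
    with open("rsp.html", "w", encoding="utf-8") as f:
        # Print the fetched page and write it to the file
        print(html)
        f.write(html)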
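To check the fix without hitting the network, here is a minimal offline sketch. It assumes a hard-coded HTML string as a stand-in for the crawled page (the string and the temporary path are illustrative, not from the original article), writes it with encoding="utf-8", and confirms that the text reads back unchanged and that the bytes on disk are valid UTF-8.

```python
import os
import tempfile

# Hypothetical stand-in for the crawled page
html = "<html><body><p>中文内容</p></body></html>"

# Use a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rsp.html")

    # Explicit encoding on write, independent of the platform default
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

    # Explicit encoding on read as well
    with open(path, "r", encoding="utf-8") as f:
        round_tripped = f.read()

    # Raw bytes as stored on disk
    with open(path, "rb") as f:
        raw = f.read()

print(round_tripped == html)        # True
print(raw == html.encode("utf-8"))  # True
```

Passing the encoding explicitly on both the write and the read keeps the result the same across machines with different locale settings.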
