
Rishi's Notes


Web scraping


April 09, 2022

For web scraping

Sources: real python web scraping (imp), geeksforgeeks

Packages:
To get HTML: 1. urllib 2. requests
To parse HTML: 3. bs4

    from urllib.request import urlopen
    import requests  # either urllib or requests can fetch the page
    from bs4 import BeautifulSoup

    # Fetch with urllib: read() returns bytes, decode() turns them into a str
    response = urlopen('https://jeemain.nta.nic.in')
    h_tml = response.read().decode("utf-8")  # str

    # Fetch with requests: .content is the raw body
    response2 = requests.get('https://jeemain.nta.nic.in')
    h2_tml = response2.content  # bytes

    # Parse the HTML
    t = BeautifulSoup(h2_tml, "html.parser")
    print(h2_tml)  # or print(h_tml)

response.text returns the output as a string object; use it when you're downloading a text file, such as an HTML file.

response.content returns the output as a bytes object; use it when you're downloading a binary file, such as a PDF, an audio file, or an image.
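The bytes vs. string difference can also be seen with urllib alone. Below is a minimal sketch, not a definitive implementation: `fetch_bytes` and `fetch_text` are hypothetical helper names, and `SAMPLE_URL` is a made-up `data:` URL standing in for a real page so the example runs without a network connection. `fetch_bytes` is roughly comparable to `response.content` in requests, and `fetch_text` to `response.text`.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_bytes(url):
    """Return the raw response body as bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the response body decoded to a str (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header, if any.
        encoding = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(encoding)


# Assumption: a data: URL used as a stand-in for a real web page.
SAMPLE_URL = "data:text/html;charset=utf-8," + quote("<html><title>Demo</title></html>")

raw = fetch_bytes(SAMPLE_URL)   # bytes, suited to binary files
text = fetch_text(SAMPLE_URL)   # str, suited to HTML and other text
print(type(raw).__name__, type(text).__name__)
```

For a real page, pass its address instead of `SAMPLE_URL`. Write the bytes result to a file opened in "wb" mode when saving PDFs or images, and use the string result when the HTML is going to be searched or parsed.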
