Web scraping

 

For web scraping in Python you need two things: a way to download a page's HTML and a way to parse it.

Sources:


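Packages:

To get the HTML:
1. urllib (part of the Python standard library)
2. requests (third-party; install with pip install requests)

To parse the HTML:
3. bs4, i.e. BeautifulSoup (third-party; install with pip install beautifulsoup4)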

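from urllib.request import urlopen   # standard library
import requests                      # third-party
from bs4 import BeautifulSoup        # third-party

# You can fetch the page with either urllib or requests.
url = 'https://jeemain.nta.nic.in'

# Option 1: urllib. read() returns bytes, so decode them into a str.
response = urlopen(url)
h_tml = response.read().decode("utf-8")   # str

# Option 2: requests. .text is already a str (.content would be bytes).
response2 = requests.get(url)
h2_tml = response2.text                   # str

# Parse the HTML so it can be searched by tag, attribute, etc.
t = BeautifulSoup(h2_tml, "html.parser")

print(h2_tml)
# or print(h_tml)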
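Once the HTML is parsed into a BeautifulSoup object, you can pull out specific pieces of the page. The sketch below runs on a small made-up HTML string (SAMPLE_HTML is hypothetical, used instead of the live site so the example works offline), and the helper name extract and the tags it looks for are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for the downloaded HTML.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Notices</h1>
    <a href="/notice1.pdf">Notice 1</a>
    <a href="https://example.com/info">More info</a>
    <p class="note">Last updated today</p>
  </body>
</html>
"""


def extract(html: str) -> dict:
    """Parse the HTML and return a few example fields (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": str(soup.title.string),                     # text inside <title>
        "heading": soup.find("h1").get_text(strip=True),     # first <h1>
        "links": [a["href"] for a in soup.find_all("a", href=True)],  # every link target
        "note": soup.select_one("p.note").get_text(strip=True),       # CSS selector lookup
    }


if __name__ == "__main__":
    print(extract(SAMPLE_HTML))
```

To try it on the real page, pass h2_tml instead of SAMPLE_HTML; the actual tags and classes on that site are not known here, so the lookups would likely need adjusting.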


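For a requests response (response2 in the code above), response2.text returns the body as a str, decoded using response2.encoding. Use it when you're downloading a text file, such as an HTML page.

response2.content returns the body as a bytes object. Use it when you're downloading a binary file, such as a PDF, an audio file or an image.

The urllib response object has neither .text nor .content; its read() method returns bytes, which is why the urllib version calls .decode("utf-8").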
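A minimal sketch of both cases with requests. The function names get_html and save_binary are hypothetical helpers, and the 10-second timeout and the raise_for_status() check are choices added here rather than part of the original notes.

```python
import requests


def get_html(url: str) -> str:
    """Download a text resource (e.g. an HTML page) and return it as a str."""
    resp = requests.get(url, timeout=10)  # timeout value is an arbitrary choice
    resp.raise_for_status()               # raise an error for 4xx/5xx responses
    return resp.text                      # str, decoded with resp.encoding


def save_binary(url: str, path: str) -> int:
    """Download a binary resource (PDF, image, audio) and write the raw bytes to path."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    with open(path, "wb") as f:           # "wb" = write binary, since resp.content is bytes
        f.write(resp.content)
    return len(resp.content)              # number of bytes written
```

Writing resp.content in binary mode stores the file exactly as the server sent it; decoding binary data as text and writing it back would generally not reproduce the original bytes.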


