Web scraping
For web scraping
Sources:
Packages:
To get HTML:
1. urllib
2. requests
To parse HTML:
3. bs4
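from urllib.request import urlopen  # standard library
import requests  # third-party; either urllib or requests can fetch the page
from bs4 import BeautifulSoup

# Option 1: urllib. read() returns bytes, so decode it to get a string.
response = urlopen('https://jeemain.nta.nic.in')
h_tml = response.read().decode("utf-8")  # str

# Option 2: requests. .content is bytes; .text would be the decoded string.
response2 = requests.get('https://jeemain.nta.nic.in')
h2_tml = response2.content  # bytes

# Parse the HTML. BeautifulSoup accepts either bytes or str.
t = BeautifulSoup(h2_tml, "html.parser")
print(h2_tml)
# or print(h_tml)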
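Once the page is parsed, the BeautifulSoup object can be queried for specific elements. Here is a minimal sketch of that step. It uses a small made-up HTML string (sample_html, invented for illustration) instead of the live page, so it runs without a network connection:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/notice.pdf">Notice</a>
    <a href="https://example.com/info">Info</a>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Text inside the <title> tag.
page_title = soup.title.string

# The href value of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(page_title)
print(links)
```

The same calls should work on the object built from the downloaded page, though what they return depends on that page's markup.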
In requests, response.text returns the body as a string (str) object. Use it when you're downloading text, such as an HTML page.
response.content returns the body as a bytes object. Use it when you're downloading a binary file, such as a PDF, an audio file, or an image.
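The difference between the two can be seen without making a request at all. This is only a rough sketch: raw_bytes stands in for response.content and decoded_text for response.text. It assumes UTF-8, whereas requests picks the encoding from the HTTP headers or guesses it.

```python
# Stand-in for response.content: the raw bytes a server might send.
raw_bytes = "<p>café</p>".encode("utf-8")

# Roughly what response.text gives: the same bytes decoded into a str.
# Assumption: UTF-8. requests chooses the encoding for you.
decoded_text = raw_bytes.decode("utf-8")

print(type(raw_bytes).__name__)
print(type(decoded_text).__name__)

# The lengths differ because "é" takes two bytes in UTF-8.
print(len(raw_bytes), len(decoded_text))
```

For a binary download such as a PDF, the bytes would normally be written to a file opened in "wb" mode rather than decoded.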