Web scraping

Notes on web scraping in Python.

Sources:

Real Python web scraping tutorial (important)

GeeksforGeeks

Packages:

To fetch HTML:

1. urllib (part of the Python standard library)

2. requests (third-party)

To parse HTML:

3. bs4 (Beautiful Soup)

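from urllib.request import urlopen
import requests  # third-party: pip install requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Either urllib (standard library) or requests can fetch the page.

# Option 1: urllib
response = urlopen('https://jeemain.nta.nic.in')
h_tml = response.read().decode("utf-8")  # read() gives bytes; decode() turns them into a str

# Option 2: requests
response2 = requests.get('https://jeemain.nta.nic.in')
h2_tml = response2.content  # bytes (use response2.text for a str)

# Parse the HTML into a searchable tree
t = BeautifulSoup(h2_tml, "html.parser")

print(h2_tml)
# or print(h_tml)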
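The code above builds the parse tree t but never queries it. As a minimal sketch of what usually comes next, the example below runs on a small made-up HTML string instead of the live site (SAMPLE_HTML and extract_links are hypothetical names, and the markup is not taken from jeemain.nta.nic.in). It reads the <title>, finds one element by tag and class, and collects every link.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup used only for illustration (not from any real site)
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/notices">Notices</a>
    <a href="https://example.com/brochure.pdf">Brochure</a>
    <p class="note">Last updated today</p>
  </body>
</html>
"""


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(soup.title.string)                          # text inside <title>
print(soup.find("p", class_="note").get_text())   # first <p> with class="note"
print(extract_links(SAMPLE_HTML))                 # every link on the page
```

find() returns the first match or None, while find_all() returns a list, so on a real page check for None before calling get_text() on an element that might be missing.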

In requests, response.text returns the body as a str (decoded text); use it when you're downloading a text file such as an HTML page.

response.content returns the body as a bytes object; use it when you're downloading a binary file such as a PDF, an audio file, or an image.
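urllib's response object has no .text or .content attribute: read() always returns bytes (the counterpart of .content), and decoding those bytes gives the str that .text would. A minimal sketch, assuming the page either declares its charset in the Content-Type header or is UTF-8 (that fallback is an assumption, not something urllib does for you); fetch_bytes and fetch_text are hypothetical helper names.

```python
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Return the raw body, like requests' response.content (PDFs, images, audio)."""
    with urlopen(url) as resp:
        return resp.read()


def fetch_text(url: str, default_encoding: str = "utf-8") -> str:
    """Return the decoded body, roughly like requests' response.text (HTML, plain text)."""
    with urlopen(url) as resp:
        # Use the charset from the Content-Type header if present, else the assumed default.
        charset = resp.headers.get_content_charset() or default_encoding
        return resp.read().decode(charset, errors="replace")
```

urlopen also accepts file:// URLs, so both helpers can be tried on a saved page, e.g. fetch_text(pathlib.Path("page.html").resolve().as_uri()).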

Rishi's Notes
