user
How to get HTML page description with BeautifulSoup and Python?
alphonsio

In Python, the simplest way to get a page's description with BeautifulSoup is to look up its description <meta> tag:

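from bs4 import BeautifulSoup

# Placeholder HTML for illustration; replace it with the page source you want to parse.
htmlStr = """<html><head>
<meta name="description" content="Example description">
<meta property="og:description" content="Example Open Graph description">
</head><body></body></html>"""

# The "html5lib" parser is a separate package: pip install html5lib
soup = BeautifulSoup(htmlStr, features="html5lib")

# Find the description in <meta name="description" content="...">
description = soup.find("meta", attrs={"name": "description"})
if description is not None:  # find() returns None when the tag is absent
    print(description.get("content"))

# Find the description in <meta property="og:description" content="...">
description = soup.find("meta", property="og:description")
if description is not None:
    print(description.get("content"))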
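
Either tag can be missing on a given page, so the two lookups above can be combined into one helper that falls back from the standard description to the Open Graph one. This is a minimal sketch rather than part of the original answer: the function name get_description and the sample HTML are hypothetical, and it uses the built-in "html.parser" so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup


def get_description(html):
    """Return the page description, or None if no suitable meta tag exists."""
    soup = BeautifulSoup(html, "html.parser")

    # Try the standard tag first, then the Open Graph tag.
    candidates = (
        {"name": "description"},
        {"property": "og:description"},
    )
    for attrs in candidates:
        tag = soup.find("meta", attrs=attrs)
        if tag is None:
            continue
        # Treat a missing or blank content attribute as "no description".
        content = (tag.get("content") or "").strip()
        if content:
            return content
    return None


# Hypothetical sample page that only has the Open Graph description.
sample_html = """
<html><head>
  <meta property="og:description" content="A page about parsing HTML.">
</head><body></body></html>
"""

print(get_description(sample_html))
```

Returning None instead of raising lets the caller decide how to handle pages that have no description at all.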

To get the description of a live HTML page, fetch it with Requests, parse it with BeautifulSoup, and look for the <meta> tag whose name attribute is set to "description". Here’s a step-by-step guide:

Step 1: Install BeautifulSoup and Requests

If you haven't already, you need to install the BeautifulSoup and Requests libraries. You can do this using pip:

pip install beautifulsoup4 requests

Step 2: Write the Python Code

Here’s a sample code snippet that demonstrates how to fetch an HTML page and extract the description:

import requests
from bs4 import BeautifulSoup

# URL of the page you want to scrape
url = 'https://example.com'

# Send a GET request to the URL
# (the timeout stops the call from hanging forever; 10 seconds is an arbitrary choice)
response = requests.get(url, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Parse the content of the page
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find the meta description tag
    description_tag = soup.find('meta', attrs={'name': 'description'})
    
    # Extract the content attribute if the tag is found
    if description_tag and 'content' in description_tag.attrs:
        description = description_tag['content']
        print("Description:", description)
    else:
        print("No description found.")
else:
    print("Failed to retrieve the page. Status code:", response.status_code)

Explanation:

  1. Requests: We use the requests library to fetch the content of the HTML page.
  2. BeautifulSoup: This library is used to parse the HTML content.
  3. Finding the Description: We look for the <meta> tag with name="description" and extract the content from its content attribute.
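
One caveat about point 3: a plain string filter in soup.find matches the attribute value exactly, so a page that writes name="Description" would be missed. A possible workaround, sketched here, is to pass a compiled regular expression as the attribute value, which BeautifulSoup accepts as a filter. The helper name find_meta_description and the sample markup are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Matches "description" regardless of letter case.
DESCRIPTION_NAME = re.compile(r"^description$", re.IGNORECASE)


def find_meta_description(html):
    """Return the content of the meta description tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": DESCRIPTION_NAME})
    if tag is None:
        return None
    # get() returns None when the tag has no content attribute.
    return tag.get("content")


# Made-up markup with a capitalized attribute value.
html = '<head><meta name="Description" content="Capitalized name"></head>'
print(find_meta_description(html))
```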

Note:

Make sure to replace 'https://example.com' with the actual URL of the page you want to scrape. Also, be mindful of the website's robots.txt and scraping policies.
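
As a rough illustration of the robots.txt check, Python's standard urllib.robotparser module can report whether a given URL is allowed. The rules below are an invented example, parsed from a list of lines so that the sketch runs without network access. For a real site you would instead point the parser at the site's own robots.txt with set_url() and read(). The helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; a real site publishes its own at /robots.txt.
example_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(example_rules)


def is_allowed(url, user_agent="*"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/"))
print(is_allowed("https://example.com/private/page"))
```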