Python Module 4

String Manipulation
Question-1
The organizers want to ensure that all participants provide valid email
addresses during sign-up. The developer assigned to this task is
responsible for creating an email validation system. This system will
help filter out incorrect email addresses and ensure that participants
receive important updates and notifications about the exhibition.
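def validate_email(email):
    # Accept the address only if it contains "@" and ends in a listed domain.
    if "@" in email:
        domain = email.split("@")[1]
        valid_domains = ["gmail.com", "yahoo.com", "rediff.com", "hotmail.com"]
        if domain in valid_domains:
            return True
    return False

# Assumed input step: the address is read from the user.
user_email = input("Enter your email address: ")
if validate_email(user_email):
    print("Email is valid.")
else:
    print("Email is not valid.")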
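The check above is case-sensitive and accepts strings such as "@gmail.com" that have an empty local part. A minimal sketch of a slightly stricter variant follows. The name validate_email_strict and the decision to reject addresses with more than one "@" are assumptions made for illustration, not part of the original exercise.

```python
VALID_DOMAINS = {"gmail.com", "yahoo.com", "rediff.com", "hotmail.com"}

def validate_email_strict(email):
    """Hypothetical stricter variant of validate_email."""
    # Ignore surrounding whitespace and letter case.
    email = email.strip().lower()
    # Require exactly one "@" between the local part and the domain.
    if email.count("@") != 1:
        return False
    local_part, domain = email.split("@")
    # The local part must not be empty and the domain must be on the list.
    return bool(local_part) and domain in VALID_DOMAINS

for sample in ["Asha@Gmail.com", "@gmail.com", "a@b@yahoo.com", "user@example.org"]:
    print(sample, "->", validate_email_strict(sample))
```

Only the first sample passes here, because the other three fail the "@" count, the empty local part check, or the domain list.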
Question-2
In the world of art and literature, finding and highlighting specific
phrases or keywords within a text is crucial for creating impactful
narratives. In this assignment, you will develop a Python program that
helps users find and highlight a given substring within a larger text.
This program can be useful for curators and writers seeking to emphasize
certain themes or ideas within their content.
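def find_and_highlight(text, substring):
    # str.find returns -1 when the substring is absent.
    index = text.find(substring)
    if index == -1:
        return "Substring not found in the text."

    highlight_start = "["
    highlight_end = "]"
    # Rebuild the text with the first occurrence wrapped in brackets.
    modified_text = text[:index] + highlight_start + substring + highlight_end + text[index+len(substring):]
    return modified_text

# Assumed input steps: both strings are read from the user.
larger_text = input("Enter the larger text: ")
substring_to_find = input("Enter the substring to find: ")
highlighted_text = find_and_highlight(larger_text, substring_to_find)
print("Modified Text:")
print(highlighted_text)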
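Because str.find reports only the first match, find_and_highlight marks a single occurrence. If every occurrence should be marked, one possible approach is str.replace, sketched below. The helper name highlight_all and its default markers are illustrative assumptions, and unlike the original function it returns the text unchanged when nothing matches.

```python
def highlight_all(text, substring, start="[", end="]"):
    """Hypothetical helper: wrap every occurrence of substring in markers."""
    # An empty substring would match everywhere, so treat it as "no match".
    if not substring or substring not in text:
        return text
    return text.replace(substring, start + substring + end)

print(highlight_all("art for art's sake", "art"))
```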
Question-3
Bella, a creative developer, has been entrusted with designing the digital
badges for the exhibition participants. Each badge will feature the
initials of the artist or attendee. She decides to automate the process by
creating a script that takes the participant's full name and generates
their personalized initials to be featured on the badges.
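def get_initials(name):
    # Take the first letter of each whitespace-separated word, uppercased.
    words = name.split()
    initials = [word[0].upper() for word in words]
    return ''.join(initials)

# Assumed input step: the full name is read from the user.
full_name = input("Enter the participant's full name: ")
initials = get_initials(full_name)
print("Initials: ",initials)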
Question-4
Dana, an aspiring journalist, has been assigned the task of creating
engaging and concise descriptions for the artworks to be displayed. To
ensure the descriptions are effective, she wants to avoid repetitive
words. With help from a programmer and her own programming skills, Dana
develops a program that analyses the descriptions and reports how often
each word occurs. This helps her craft compelling descriptions that
captivate the exhibition visitors.
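def count_words(text):
    # Build a dictionary mapping each word to the number of times it occurs.
    words = text.split()
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] = word_count[word] + 1
        else:
            word_count[word] = 1
    return word_count

# Assumed input step: the description is read from the user.
article = input("Enter the artwork description: ")
article = article.lower()
word_frequency = count_words(article)
print("Word Frequency:")
for word, count in word_frequency.items():
    print(f"{word}: {count}")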
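Because count_words splits only on whitespace, tokens such as "colours," and "colours" are counted as different words. A minimal sketch of one way to handle this, assuming punctuation should be stripped from both ends of each token, is shown below. The helper name count_words_clean and the sample sentence are illustrative assumptions.

```python
import string

def count_words_clean(text):
    """Hypothetical variant of count_words that ignores case and edge punctuation."""
    word_count = {}
    for raw_word in text.lower().split():
        # Remove leading and trailing punctuation such as commas and full stops.
        word = raw_word.strip(string.punctuation)
        if not word:
            continue  # the token consisted of punctuation only
        word_count[word] = word_count.get(word, 0) + 1
    return word_count

sample = "Vivid colours, vivid lines. Colours everywhere!"
print(count_words_clean(sample))
```

The dict.get call with a default of 0 replaces the if/else branch of the original loop without changing the counting logic.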
Question-5
The organizers want to get feedback from the participants and analyse the
feedback received after the exhibition. In this program, visitors can
provide feedback, and the program demonstrates the usage of various string
functions to analyse and manipulate the feedback. The analysis includes
counting the number of characters and words, converting to uppercase,
lowercase, and title case, reversing the feedback, and counting the number
of vowels.
You can further enhance and customize this program to include additional
string functions, graphical representations of the analysis, and other
interactive features.
def analyze_feedback(feedback):
    # Basic counts.
    num_characters = len(feedback)
    num_words = len(feedback.split())
    # Case conversions and reversal.
    uppercase_feedback = feedback.upper()
    lowercase_feedback = feedback.lower()
    title_case_feedback = feedback.title()
    reversed_feedback = feedback[::-1]
    # Count vowels in either case.
    vowels = "AEIOUaeiou"
    num_vowels = sum(1 for char in feedback if char in vowels)
    print("Feedback Analysis:")
    print("--------------------------------------------------")
    print(f"Number of characters: {num_characters}")
    print(f"Number of words: {num_words}")
    print(f"Uppercase feedback: {uppercase_feedback}")
    print(f"Lowercase feedback: {lowercase_feedback}")
    print(f"Title case feedback: {title_case_feedback}")
    print(f"Reversed feedback: {reversed_feedback}")
    print(f"Number of vowels: {num_vowels}")
    print("--------------------------------------------------")

def main():
    print("Welcome to the Visitor Feedback Analysis Program!")
    # Assumed input step: the feedback is read from the visitor.
    feedback = input("Please enter your feedback: ")
    analyze_feedback(feedback)
    print("\nThank you for your feedback!")

main()
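One detail worth knowing when using feedback.title(): str.title() starts a new word after every non-letter character, so apostrophes and hyphens produce capitals in the middle of a word. The short sketch below contrasts it with string.capwords from the standard library, which splits on whitespace only. The sample sentence is invented for illustration.

```python
import string

feedback = "it's a well-curated show"

# str.title() capitalizes the letter that follows any non-letter character.
print(feedback.title())

# string.capwords() capitalizes only the first letter of each whitespace-separated word.
print(string.capwords(feedback))
```

Which behaviour is preferable depends on how the title-cased feedback will be displayed; the exercise itself only asks for title().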
WebScraping
Question-1
The organizers of the exhibition received many entries online through
their website and wanted to extract that information from the site.
However, some visitors complained that the website was not working, so the
organizers decided to first confirm whether the website could be reached.
A developer recommended testing this with Python's requests library.
import requests

url = "https://www.vadehraart.com/"
response = requests.get(url)
print(response.status_code)
if response.status_code == 200:
    print(response.content)
else:
    print("The webpage fetch has failed.")
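The listing above raises an exception if the site cannot be contacted at all, for example on a DNS failure or a timeout, instead of reaching the else branch. A hedged sketch of a reachability check that handles those cases follows. It swaps in the standard library's urllib.request in place of requests so that it runs without extra packages, and the helper names fetch_status and describe are assumptions introduced for illustration.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def fetch_status(url, timeout=10):
    """Hypothetical helper: return the HTTP status code, or None if unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None

def describe(status):
    """Turn a status code (or None) into a short message."""
    if status is None:
        return "The website could not be reached."
    if status == 200:
        return "The website is reachable."
    return f"The website responded with status {status}."
```

A call such as describe(fetch_status("https://www.vadehraart.com/")) would then report the outcome as a single message. With requests, the same idea can be expressed by passing timeout to requests.get and catching requests.RequestException.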
Beautiful Soup
Question-1
In the context of an art exhibition that celebrates creativity and
inspiration, the organizers have asked the developers to build a digital
curation system that gathers and presents a curated collection of
inspirational quotes, each paired with captivating visuals, to enhance
visitors' artistic experience. The objective is to create a Python-based
solution that automates the extraction of quotes, their associated themes,
images, and authors from a designated webpage.
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
print(r)
soup = BeautifulSoup(r.content, 'html.parser') # If
soup
quotes = []
# Container div that holds all the quote cards.
table = soup.find('div', attrs = {'id':'all_quotes'})
table
for row in table.find_all('div',
        attrs = {'class':'col-6 col-lg-4 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    # Keep only the part of the alt text before the first " #".
    quote['lines'] = row.img['alt'].split(" #")[0]
    quotes.append(quote)
print(quotes)
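The statement quote['lines'] = row.img['alt'].split(" #")[0] relies on a plain string operation: it keeps the part of the image's alt text that precedes the first " #". The sketch below shows that operation in isolation. The sample alt text is invented for illustration and is not taken from the site.

```python
# Invented alt text: a quote followed by hashtag-style labels.
alt_text = "Creativity takes courage. #creativity #art"

# split(" #") cuts the string at every " #"; element 0 is the text before the first one.
parts = alt_text.split(" #")
quote_line = parts[0]

print(parts)
print(quote_line)

# When the separator is absent, split returns the whole string as a single element.
print("No labels here".split(" #")[0])
```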
WebScraping with String Manipulation

Question-1
In the creative realm of art exhibitions, the fusion of technology and
aesthetics can elevate the visitor experience to new heights. As a part of
an artistic endeavor, the team of developers has been commissioned to
enhance an upcoming art exhibition's digital representation. Your role in
this project is pivotal: you are tasked with extracting header tags from
the exhibition's dedicated webpage and conducting a comprehensive analysis
of the frequency of words used in these headers. By infusing technology
into art curation, you will contribute to designing a digital space that
not only showcases artistic brilliance but also captivates visitors with a
seamless and visually engaging online journey.
from urllib.request import urlopen
from bs4 import BeautifulSoup

webpage_url = 'https://en.wikipedia.org/wiki/Main_Page'
response = urlopen(webpage_url)
soup = BeautifulSoup(response, 'html.parser')
# Collect every heading element, h1 through h6.
header_tags = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
print('List of header tags:')
header_list = []
for header in header_tags:
    print(header.text)
    header_list.append(header.text)
# Count how often each word appears across all headers.
word_frequency = {}
for header in header_list:
    words = header.split()
    for word in words:
        if word in word_frequency:
            word_frequency[word] = word_frequency[word] + 1
        else:
            word_frequency[word] = 1
# Sort by frequency, highest first.
sorted_word_frequency = dict(sorted(word_frequency.items(), key=lambda item: item[1],
                                    reverse=True))
print('\nWord Frequency Analysis:')
print('--------------------------')
# Print only the five most frequent words.
count = 0
for word, freq in sorted_word_frequency.items():
    print(f"{word}: {freq}")
    count = count + 1
    if count >= 5:
        break
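The counting, sorting, and "first five" steps above can also be expressed with collections.Counter from the standard library. The following is a sketch of that alternative, not the method the exercise prescribes. The header strings are invented stand-ins for header_list, and lowercasing the words is an added assumption so that capitalized and uncapitalized forms are counted together.

```python
from collections import Counter

# Invented header strings standing in for header_list.
header_list = [
    "From today's featured article",
    "In the news",
    "On this day",
    "Today's featured picture",
]

word_frequency = Counter()
for header in header_list:
    # update() adds one count per word in the list it receives.
    word_frequency.update(header.lower().split())

# most_common(5) returns the five (word, count) pairs with the highest counts.
for word, freq in word_frequency.most_common(5):
    print(f"{word}: {freq}")
```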
Capstone
Question-1
Gather and extract relevant information from the IMDb website's 'Top 250'
movie list, including movie rankings, titles, release years, IMDb ratings,
vote counts, and genres, using web scraping techniques and data parsing in
Python.
import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating"
response = requests.get(url)  # fetch the page so its HTML can be parsed
soup = BeautifulSoup(response.content, 'html.parser')
movie_containers = soup.find_all('div',{'class':"lister-item mode-advanced"})
print(len(movie_containers))
movie_data = {}
for container in movie_containers:
    movie_info = {}
    # Ranking appears as "1."; drop the trailing dot.
    ranking = container.h3.span.text.replace('.', '')
    movie_info['ranking'] = ranking
    name = container.h3.a.text
    movie_info['name'] = name
    # Year appears as "(1994)"; strip the parentheses.
    year = container.h3.find('span', {'class':'lister-item-year text-muted unbold'}).text.strip('()')
    movie_info['year'] = year
    imdb = container.strong.text
    movie_info['imdb_rating'] = float(imdb)
    vote = container.find('span', {'name':'nv'}).text
    movie_info['votes'] = vote
    genres = container.p.find('span', {'class' : 'genre'}).text.strip()
    movie_info['genre'] = genres
    # Key each record by the movie title.
    movie_data[name] = movie_info
print(movie_data)
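The scraper stores ranking, year, and votes as strings, with votes using thousands separators such as '2,791,397'. If those fields are needed for arithmetic or numeric sorting, they can be converted as sketched below. The helper name to_int is an assumption, and the sketch assumes each field contains only digits and commas.

```python
def to_int(text):
    """Hypothetical helper: turn a string like '2,791,397' into the integer 2791397."""
    return int(text.replace(",", ""))

# One record in the same shape the scraper produces.
movie_info = {"ranking": "1", "name": "The Shawshank Redemption", "year": "1994",
              "imdb_rating": 9.3, "votes": "2,791,397", "genre": "Drama"}

numeric_info = dict(movie_info)  # copy so the scraped record is left untouched
for field in ("ranking", "year", "votes"):
    numeric_info[field] = to_int(movie_info[field])

print(numeric_info)
```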
Question-2
Create a dictionary from the data extracted above, then write a Python
program to perform the following operation:
Sorting Movie Data by IMDb Ratings:

After extracting movie data, sort it based on IMDb ratings in descending
order using the sorted() function with a lambda function as the custom
sorting key.
movie_data = {
    'The Shawshank Redemption': {'ranking': '1', 'name': 'The Shawshank Redemption', 'year': '1994', 'imdb_rating': 9.3, 'votes': '2,791,397', 'genre': 'Drama'},
    'The Godfather': {'ranking': '2', 'name': 'The Godfather', 'year': '1972', 'imdb_rating': 9.2, 'votes': '1,944,229', 'genre': 'Crime, Drama'},
    'The Dark Knight': {'ranking': '3', 'name': 'The Dark Knight', 'year': '2008', 'imdb_rating': 9.0, 'votes': '2,771,569', 'genre': 'Action, Crime, Drama'},
    "Schindler's List": {'ranking': '4', 'name': "Schindler's List", 'year': '1993', 'imdb_rating': 9.0, 'votes': '1,403,821', 'genre': 'Biography, Drama, History'},
    'The Godfather Part II': {'ranking': '5', 'name': 'The Godfather Part II', 'year': '1974', 'imdb_rating': 9.0, 'votes': '1,320,870', 'genre': 'Crime, Drama'},
    'The Lord of the Rings: The Return of the King': {'ranking': '6', 'name': 'The Lord of the Rings: The Return of the King', 'year': '2003', 'imdb_rating': 9.0, 'votes': '1,911,924', 'genre': 'Action, Adventure, Drama'},
    '12 Angry Men': {'ranking': '7', 'name': '12 Angry Men', 'year': '1957', 'imdb_rating': 9.0, 'votes': '829,179', 'genre': 'Crime, Drama'},
    'Pulp Fiction': {'ranking': '8', 'name': 'Pulp Fiction', 'year': '1994', 'imdb_rating': 8.9, 'votes': '2,141,402', 'genre': 'Crime, Drama'},
    'Spider-Man: Across the Spider-Verse': {'ranking': '9', 'name': 'Spider-Man: Across the Spider-Verse', 'year': '2023', 'imdb_rating': 8.8, 'votes': '243,641', 'genre': 'Animation, Action, Adventure'},
    'Inception': {'ranking': '10', 'name': 'Inception', 'year': '2010', 'imdb_rating': 8.8, 'votes': '2,460,643', 'genre': 'Action, Adventure, Sci-Fi'},
    'The Lord of the Rings: The Fellowship of the Ring': {'ranking': '11', 'name': 'The Lord of the Rings: The Fellowship of the Ring', 'year': '2001', 'imdb_rating': 8.8, 'votes': '1,940,118', 'genre': 'Action, Adventure, Drama'},
    'Fight Club': {'ranking': '12', 'name': 'Fight Club', 'year': '1999', 'imdb_rating': 8.8, 'votes': '2,225,606', 'genre': 'Drama'},
    'Forrest Gump': {'ranking': '13', 'name': 'Forrest Gump', 'year': '1994', 'imdb_rating': 8.8, 'votes': '2,171,468', 'genre': 'Drama, Romance'},
    'Il buono, il brutto, il cattivo': {'ranking': '14', 'name': 'Il buono, il brutto, il cattivo', 'year': '1966', 'imdb_rating': 8.8, 'votes': '787,484', 'genre': 'Adventure, Western'},
    'The Lord of the Rings: The Two Towers': {'ranking': '15', 'name': 'The Lord of the Rings: The Two Towers', 'year': '2002', 'imdb_rating': 8.8, 'votes': '1,725,289', 'genre': 'Action, Adventure, Drama'},
    'Jai Bhim': {'ranking': '16', 'name': 'Jai Bhim', 'year': '2021', 'imdb_rating': 8.8, 'votes': '209,719', 'genre': 'Crime, Drama, Mystery'},
    'Interstellar': {'ranking': '17', 'name': 'Interstellar', 'year': '2014', 'imdb_rating': 8.7, 'votes': '1,978,037', 'genre': 'Adventure, Drama, Sci-Fi'},
    'Goodfellas': {'ranking': '18', 'name': 'Goodfellas', 'year': '1990', 'imdb_rating': 8.7, 'votes': '1,211,198', 'genre': 'Biography, Crime, Drama'},
    'The Matrix': {'ranking': '19', 'name': 'The Matrix', 'year': '1999', 'imdb_rating': 8.7, 'votes': '1,985,257', 'genre': 'Action, Sci-Fi'},
    "One Flew Over the Cuckoo's Nest": {'ranking': '20', 'name': "One Flew Over the Cuckoo's Nest", 'year': '1975', 'imdb_rating': 8.7, 'votes': '1,041,219', 'genre': 'Drama'},
    'Star Wars: Episode V - The Empire Strikes Back': {'ranking': '21', 'name': 'Star Wars: Episode V - The Empire Strikes Back', 'year': '1980', 'imdb_rating': 8.7, 'votes': '1,338,664', 'genre': 'Action, Adventure, Fantasy'},
    'Oppenheimer': {'ranking': '22', 'name': 'Oppenheimer', 'year': '2023', 'imdb_rating': 8.6, 'votes': '399,297', 'genre': 'Biography, Drama, History'},
    'Se7en': {'ranking': '23', 'name': 'Se7en', 'year': '1995', 'imdb_rating': 8.6, 'votes': '1,727,769', 'genre': 'Crime, Drama, Mystery'},
    'The Silence of the Lambs': {'ranking': '24', 'name': 'The Silence of the Lambs', 'year': '1991', 'imdb_rating': 8.6, 'votes': '1,489,061', 'genre': 'Crime, Drama, Thriller'},
    'The Green Mile': {'ranking': '25', 'name': 'The Green Mile', 'year': '1999', 'imdb_rating': 8.6, 'votes': '1,356,698', 'genre': 'Crime, Drama, Fantasy'},
    'Saving Private Ryan': {'ranking': '26', 'name': 'Saving Private Ryan', 'year': '1998', 'imdb_rating': 8.6, 'votes': '1,445,823', 'genre': 'Drama, War'},
    'Terminator 2: Judgment Day': {'ranking': '27', 'name': 'Terminator 2: Judgment Day', 'year': '1991', 'imdb_rating': 8.6, 'votes': '1,138,887', 'genre': 'Action, Sci-Fi'},
    'Star Wars': {'ranking': '28', 'name': 'Star Wars', 'year': '1977', 'imdb_rating': 8.6, 'votes': '1,410,486', 'genre': 'Action, Adventure, Fantasy'},
    'Sen to Chihiro no kamikakushi': {'ranking': '29', 'name': 'Sen to Chihiro no kamikakushi', 'year': '2001', 'imdb_rating': 8.6, 'votes': '806,573', 'genre': 'Animation, Adventure, Family'},
    'Cidade de Deus': {'ranking': '30', 'name': 'Cidade de Deus', 'year': '2002', 'imdb_rating': 8.6, 'votes': '779,885', 'genre': 'Crime, Drama'},
    'La vita è bella': {'ranking': '31', 'name': 'La vita è bella', 'year': '1997', 'imdb_rating': 8.6, 'votes': '720,584', 'genre': 'Comedy, Drama, Romance'},
    "It's a Wonderful Life": {'ranking': '32', 'name': "It's a Wonderful Life", 'year': '1946', 'imdb_rating': 8.6, 'votes': '477,856', 'genre': 'Drama, Family, Fantasy'},
    'Shichinin no samurai': {'ranking': '33', 'name': 'Shichinin no samurai', 'year': '1954', 'imdb_rating': 8.6, 'votes': '357,156', 'genre': 'Action, Drama'},
    'Seppuku': {'ranking': '34', 'name': 'Seppuku', 'year': '1962', 'imdb_rating': 8.6, 'votes': '63,281', 'genre': 'Action, Drama, Mystery'},
    'Gladiator': {'ranking': '35', 'name': 'Gladiator', 'year': '2000', 'imdb_rating': 8.5, 'votes': '1,558,002', 'genre': 'Action, Adventure, Drama'},
    'The Prestige': {'ranking': '36', 'name': 'The Prestige', 'year': '2006', 'imdb_rating': 8.5, 'votes': '1,391,963', 'genre': 'Drama, Mystery, Sci-Fi'},
    'The Departed': {'ranking': '37', 'name': 'The Departed', 'year': '2006', 'imdb_rating': 8.5, 'votes': '1,376,163', 'genre': 'Crime, Drama, Thriller'},
    'Back to the Future': {'ranking': '38', 'name': 'Back to the Future', 'year': '1985', 'imdb_rating': 8.5, 'votes': '1,259,666', 'genre': 'Adventure, Comedy, Sci-Fi'},
    'Django Unchained': {'ranking': '39', 'name': 'Django Unchained', 'year': '2012', 'imdb_rating': 8.5, 'votes': '1,629,532', 'genre': 'Drama, Western'},
    'Gisaengchung': {'ranking': '40', 'name': 'Gisaengchung', 'year': '2019', 'imdb_rating': 8.5, 'votes': '882,754', 'genre': 'Drama, Thriller'},
    'Alien': {'ranking': '41', 'name': 'Alien', 'year': '1979', 'imdb_rating': 8.5, 'votes': '916,536', 'genre': 'Horror, Sci-Fi'},
    'Whiplash': {'ranking': '42', 'name': 'Whiplash', 'year': '2014', 'imdb_rating': 8.5, 'votes': '927,771', 'genre': 'Drama, Music'},
    'Léon': {'ranking': '43', 'name': 'Léon', 'year': '1994', 'imdb_rating': 8.5, 'votes': '1,205,981', 'genre': 'Action, Crime, Drama'},
    'The Usual Suspects': {'ranking': '44', 'name': 'The Usual Suspects', 'year': '1995', 'imdb_rating': 8.5, 'votes': '1,116,458', 'genre': 'Crime, Drama, Mystery'},
    'The Pianist': {'ranking': '45', 'name': 'The Pianist', 'year': '2002', 'imdb_rating': 8.5, 'votes': '874,314', 'genre': 'Biography, Drama, Music'},
    'The Lion King': {'ranking': '46', 'name': 'The Lion King', 'year': '1994', 'imdb_rating': 8.5, 'votes': '1,102,361', 'genre': 'Animation, Adventure, Drama'},
    'American History X': {'ranking': '47', 'name': 'American History X', 'year': '1998', 'imdb_rating': 8.5, 'votes': '1,155,786', 'genre': 'Crime, Drama'},
    'Psycho': {'ranking': '48', 'name': 'Psycho', 'year': '1960', 'imdb_rating': 8.5, 'votes': '696,715', 'genre': 'Horror, Mystery, Thriller'},
    'The Intouchables': {'ranking': '49', 'name': 'The Intouchables', 'year': '2011', 'imdb_rating': 8.5, 'votes': '895,241', 'genre': 'Biography, Comedy, Drama'},
    'Casablanca': {'ranking': '50', 'name': 'Casablanca', 'year': '1942', 'imdb_rating': 8.5, 'votes': '590,567', 'genre': 'Drama, Romance, War'},
}
sorted_movie_data = sorted(movie_data.items(), key=lambda x: x[1]['imdb_rating'], reverse=True)
top_5_movies = sorted_movie_data[:5]
for movie_name, movie_info in top_5_movies:
    print(f"Movie: {movie_name}")
    print(f"IMDb Rating: {movie_info['imdb_rating']}")
    print(f"Year: {movie_info['year']}")
    print(f"Votes: {movie_info['votes']}")
    print(f"Genre: {movie_info['genre']}")
    print("=" * 30)
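Several movies in the data share the same rating (five entries have 9.0, for example), and sorted() keeps tied entries in their original order. If a different tie-break is wanted, the key can return a tuple. The sketch below breaks ties by vote count, using four records copied from the dictionary above. The tie-break rule and the function name rating_then_votes are assumptions added for illustration.

```python
movies = {
    "The Dark Knight": {"imdb_rating": 9.0, "votes": "2,771,569"},
    "Schindler's List": {"imdb_rating": 9.0, "votes": "1,403,821"},
    "12 Angry Men": {"imdb_rating": 9.0, "votes": "829,179"},
    "The Godfather": {"imdb_rating": 9.2, "votes": "1,944,229"},
}

def rating_then_votes(item):
    """Sort key: rating first, then the vote count converted to an integer."""
    info = item[1]
    return (info["imdb_rating"], int(info["votes"].replace(",", "")))

# reverse=True sorts both tuple elements from highest to lowest.
ranked = sorted(movies.items(), key=rating_then_votes, reverse=True)
for name, info in ranked:
    print(name, info["imdb_rating"], info["votes"])
```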
Question-3
Begin by importing the necessary libraries: requests for sending HTTP
requests, BeautifulSoup for parsing HTML content, and any other required
libraries.
Sending GET Request and Parsing HTML:
1. Define the URL of the IMDb page containing user reviews. Send a GET
request to the URL and create a BeautifulSoup object to parse the HTML
content.
Finding User Review Containers:
2. Use soup.find_all() to locate all the containers containing user
reviews. In this case, we are looking for div elements with the class
lister-item-content.
Extracting User Reviews:
3. Iterate through each review container and extract the text content of
the reviews.
Checking for Negative Sentiment:
4. Define a list (['not good', 'pathetic', 'cannot', 'poor',
'disappointed', 'disappointment', 'bad', 'uninspired', 'negative']) of
predefined bad words that indicate negative sentiment. Create a
user-defined function contains_bad_words() to check if a review contains
any of the bad words.
Analyzing Reviews:
5. Iterate through the list of user reviews and use the
contains_bad_words() function to analyze each review for negative
sentiment.
import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt13375076/reviews/?ref_=ttexr_ql_2"
response = requests.get(url)  # fetch the reviews page
soup = BeautifulSoup(response.content, 'html.parser')
reviews = soup.find_all('div',{'class':"lister-item-content"})
print(len(reviews))
user_reviews = []
for r in reviews:
    ur = r.find("div", {"class":"text show-more__control"}).text
    user_reviews.append(ur)
print(user_reviews)
bad_words = ['not good', 'pathetic', 'cannot', 'poor', 'disappointed', 'disappointment', 'bad', 'uninspired',
             'negative']

def contains_bad_words(comment_text):
    # Case-insensitive substring check against every entry in bad_words.
    comment_text_lower = comment_text.lower()
    for word in bad_words:
        if word in comment_text_lower:
            return True
    return False

# Number the reviews starting from 1.
user_reviews = enumerate(user_reviews, start=1)
for idx, rev in user_reviews:
    if contains_bad_words(rev):
        print(f"Review {idx}: Contains bad words")
    else:
        print(f"Review {idx}: Good")
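Because contains_bad_words uses substring matching, a review containing "badge" is flagged on account of "bad". One possible refinement, sketched here under the assumption that only whole words and phrases should count, is a regular expression with word boundaries. The name contains_bad_words_strict and the sample reviews are invented for illustration.

```python
import re

bad_words = ['not good', 'pathetic', 'cannot', 'poor', 'disappointed',
             'disappointment', 'bad', 'uninspired', 'negative']

# \b anchors each phrase at word boundaries, so 'bad' no longer matches 'badge'.
bad_word_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in bad_words) + r")\b",
    re.IGNORECASE,
)

def contains_bad_words_strict(comment_text):
    """Hypothetical whole-word variant of contains_bad_words."""
    return bad_word_pattern.search(comment_text) is not None

# Invented sample reviews.
for review in ["The badge scene was lovely.", "A BAD script and poor pacing."]:
    print(contains_bad_words_strict(review), "-", review)
```

Keyword matching of either kind remains a rough signal: a phrase such as "not bad" would still be flagged.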
