Report Final

This document presents a project report on ImagineX, a system for creating images using artificial intelligence. The report includes an introduction on the brief history and objectives of AI image generation. It also describes the system requirements, feasibility analysis, system design, and implementation of the ImagineX system. The project aims to showcase the potential of generative AI tools like DALL-E for producing high-quality images for various applications.

Uploaded by

Soumya Bera
Copyright
© All Rights Reserved
DEPARTMENT OF INFORMATION TECHNOLOGY

FUTURE INSTITUTE OF ENGINEERING AND MANAGEMENT


SONARPUR , 700150, WEST BENGAL, INDIA

ImagineX:
Creating Images with Ease using AI Wizardry

A Project Report Submitted in Partial Fulfilment of the


Requirement for the Degree of

Bachelor of Technology
in
Information Technology

by Group No. : 04

Under the Guidance of


Prof. Arindam Sinharay

Ajmal Danish - 14800220009
Warisha Hussam - 14800220019
Muzammil Aftab - 14800220001
Soumya Bera - 14800220025

Department of Information Technology
FUTURE INSTITUTE OF ENGINEERING AND MANAGEMENT
Sonarpur Station Road, Kolkata – 700150
Tel: 033-2434 5640 (Extn. – 238) URL: www.futureengineering.in

CERTIFICATE

We do hereby declare that the work which is being presented in the
Project Report entitled ImagineX: Creating Images with Ease using AI
Wizardry, in partial fulfilment of the requirements for the award of the
Bachelor of Technology in Information Technology and submitted to the
Department of Information Technology of Future Institute of Engineering
and Management, Kolkata, is an authentic record of our own work
carried out during the period from July 2023 to May 2024, under the
supervision of Prof. Arindam Sinharay.
The matter presented in this thesis has not been submitted by us for the
award of any other degree elsewhere.
Full Signature of the Student(s)
a)
b)
c)
d)
This is to certify that the above statement made by the students, is correct
to the best of my knowledge.

Signature of the Supervisor


Date:

Head
Department of Information Technology
Future Institute of Engineering and Management
Kolkata, WB

Acknowledgement

During our work on this project, we learned many things, not only
professionally but also personally. This was an absolute group effort;
however, it would not have been possible without the kind support and
help of many individuals. We would like to extend our sincere thanks to
all of them.

We are highly indebted to our guide, Prof. Arindam Sinharay, for his
guidance and constant supervision, as well as for providing necessary
information regarding the project and for his support in completing
the project.

We express our thanks to our Principal, Dr. Anirban Chakrabarty,
and our Head of the Department, Prof. Poly Sil Sen, for extending their
support. We would also like to thank our Institution and the faculty
members, without whom this project would have been a distant reality.
Our thanks and appreciation also go to all the people who have willingly
helped us with their abilities.

Group No. 4

Class Roll No.   Name             MAKAUT Roll No.   Signature
20/IT/063        Ajmal Danish     14800220009
20/IT/005        Warisha Hussam   14800220019
20/IT/056        Muzammil Aftab   14800220001
20/IT/009        Soumya Bera      14800220025

INDEX

Abstract

1. Introduction
1.1 Brief History
1.2 Objective
1.3 What is Web Development?
1.4 What is a Website?

2. System Requirements and System Analysis
2.1 Requirement Collection
2.2 System Requirements
2.2.1 Functional Requirements
2.2.2 Non-Functional Requirements
2.3 Feasibility Analysis
2.3.1 Economic Feasibility
2.3.2 Operational Feasibility
2.3.3 Technical Feasibility
2.3.4 Schedule Feasibility

3. System Design
3.1 Overall Design
3.2 Design Diagrams

4. Implementation
4.1 Implementation Tools
4.2 Front End Tools
4.3 Back End Tools
4.4 Model Implementation

5. Future Scope
6. References

Abstract

Artificial Intelligence (AI) has revolutionized the field of image
generation, offering an efficient and effective alternative to traditional
image creation methods. AI-powered image generation algorithms are
capable of producing high-quality, realistic images quickly, making them
a valuable tool for a wide range of industries, including gaming,
e-commerce, and advertising.
AI image generation algorithms are based on deep learning techniques,
which involve training a neural network on a large dataset of images to
learn patterns and relationships between different features. Once trained,
the algorithm can generate new images by combining these learned
features in novel ways, producing unique and visually appealing results.
The benefits of using AI for image generation are numerous. It can
significantly reduce the time and cost associated with traditional image
creation methods, while also allowing for greater customization and
flexibility. Additionally, AI-powered image generation may help reduce
the subjectivity of manually produced images, although a model can still
inherit biases from its training data.
Overall, the ability of AI to generate high-quality images quickly and
efficiently has made it an important tool for businesses and creative
professionals looking to enhance their visual content. As AI technology
continues to advance, we can expect further innovation and new
applications in the field of image generation.

1. Introduction

1.1 Brief History :

Artificial Intelligence (AI) has made significant strides in recent years,
and one area that has seen remarkable advancements is generative models.
Generative models are a subset of machine learning models that can
generate data that is similar to the training data. One such generative
model is DALL-E, developed by OpenAI.
OpenAI is an AI research organization that was founded in 2015 by a
group including Elon Musk, Sam Altman, Greg Brockman, and others. The
organization's goal was to create safe and beneficial AI for humanity.
Since its inception, OpenAI has developed several AI models that have
pushed the boundaries of what is possible in AI research.
DALL-E is a generative model that was introduced by OpenAI in January
2021. The model is based on the Transformer architecture, which uses
attention mechanisms to model dependencies between the input and output.
DALL-E was trained on a large corpus of text-image pairs, which allowed
it to learn the relationships between language and visual content. This
training enables the model to generate coherent, high-quality images
from text prompts.
This project aims to showcase the potential of text-to-image models such
as DALL-E in generating high-quality images that can be used in various
applications. The project includes several stages, including testing and
evaluation of the AI tool. The evaluation focuses on the quality,
similarity to the input description, and diversity of the generated
images. The results demonstrate the effectiveness of the AI tool in
producing high-quality images that meet user requirements.
Overall, this project showcases the potential of AI in generating images
that can be used in various applications, such as art, advertising, and
entertainment. The project also highlights the challenges and
opportunities in developing AI tools that can generate high-quality
images that meet user requirements. DALL-E, based on the Transformer
model, is an excellent example of how AI can be applied to generative
tasks and push the boundaries of what is possible in AI research.
DALL-E quickly gained widespread attention after its release due to its
impressive image generation capabilities. The name "DALL-E" is a
reference to the surrealist artist Salvador Dalí and the animated movie
character WALL-E, symbolizing the model's ability to generate surreal
and imaginative images.

DALL-E builds on OpenAI's earlier language model, GPT-3, which was
primarily designed for natural language processing tasks: DALL-E is a
version of GPT-3 trained to generate images from text. Its focus on image
generation makes it a significant step forward in the field of generative
models, and it has the potential to change many industries that rely on
visual content.
One of the most remarkable features of DALL-E is its ability to generate
images from textual descriptions. For example, if given a textual
description such as "a green chair shaped like a pear," DALL-E can
generate a corresponding image that matches the description. This
capability opens up many possibilities for creative and practical
applications, such as generating custom product images for e-commerce
websites or creating artwork based on textual descriptions.
Despite being a relatively new model, DALL-E has already generated
significant interest in the AI community and has been used in various
applications, including art, design, and marketing. OpenAI has released
several examples of images generated by DALL-E, showcasing its ability
to generate imaginative and detailed images that can resemble work
created by human designers.
Overall, DALL-E represents a significant step forward in the field of
generative models, and its potential applications are vast. As AI
technology continues to advance, models like DALL-E are likely to play
an increasingly important role in many industries that rely on visual
content.

1.2 Objective :

This college project is about creating an AI tool that generates images
using the Stable Diffusion model. The project aims to showcase the
potential of AI in generating high-quality images for various purposes.
The AI tool is built on CompVis/stable-diffusion-v1-4, a latent
text-to-image diffusion model released by the CompVis group. The
pre-trained model generates images based on textual inputs given by the
user. It can produce images in various styles and formats, including
sketches, paintings, and photographs.
The project includes several stages, including evaluation. To use the AI
tool, the user provides a textual description of the desired image and
specifies the style and format. The model then generates an image that
matches the user's requirements.
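
One simple way to combine the description, style, and format into a single text prompt is sketched below. The report does not show how the inputs are merged, so the function name build_prompt, its parameters, and the comma-separated layout are assumptions made for illustration only.

```python
def build_prompt(description: str, style: str = "", image_format: str = "") -> str:
    """Combine a description with optional style and format hints.

    Hypothetical helper: the comma-separated layout is an assumption,
    not the project's documented behaviour.
    """
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    # Keep only the hints the user actually supplied.
    hints = [part.strip() for part in (style, image_format) if part.strip()]
    return ", ".join([description, *hints])


if __name__ == "__main__":
    print(build_prompt("an elephant in a forest", style="watercolor painting"))
```

Keeping this step in its own function means the text sent to the model can be inspected or adjusted without touching the generation code.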
The evaluation of the project focuses on the quality, similarity to the
input description, and diversity of the generated images. The results
demonstrate the effectiveness of the AI tool in producing high-quality
images that meet user requirements.
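
The report does not state how similarity and diversity were measured. As a rough sketch of one possible approach, assume that the prompt and each generated image have already been converted to embedding vectors by some joint text-image model (that model, and every function name below, is an assumption). Similarity can then be scored with cosine similarity, and diversity as one minus the mean pairwise similarity between images.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def prompt_similarity(prompt_embedding, image_embeddings):
    """Mean cosine similarity between the prompt and each generated image."""
    scores = [cosine_similarity(prompt_embedding, e) for e in image_embeddings]
    return sum(scores) / len(scores)


def diversity(image_embeddings):
    """One minus the mean pairwise similarity; higher means more varied."""
    pairs = list(combinations(image_embeddings, 2))
    if not pairs:
        return 0.0  # a single image has no pairwise diversity
    mean_sim = sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim
```

Image quality is harder to reduce to a single formula and is often judged by human raters, so it is left out of this sketch.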
Overall, the project aims to demonstrate the potential of AI in generating
images that can be used in different applications, such as art, advertising,
and entertainment. The project also highlights the challenges and
opportunities in developing AI tools that can generate high-quality
images that meet user requirements.

1.3 What is Web Development ?

Web development is the process of building and maintaining websites
and web applications. It involves a combination of programming
languages, tools, and frameworks to create interactive and dynamic
websites that can be accessed by users over the internet. Web
development includes designing the user interface, creating the
functionality, and implementing the backend infrastructure required to
support the website or application. Web developers typically use HTML
and CSS together with programming languages such as JavaScript and
server-side languages like PHP, Ruby, or Python. They may also work with
web development frameworks like React, Angular, or Vue to streamline the
development process and improve the performance of the website or
application. Web development is a complex process that requires
attention to detail, technical knowledge, and creativity to deliver
engaging and responsive web experiences.

1.4 What is a website ?

A website is a collection of web pages that are hosted on a web server
and can be accessed by users over the internet. It can contain a variety of
content, such as text, images, videos, and interactive features, and can be
used for a range of purposes, including personal or business websites,
e-commerce sites, blogs, social media platforms, and more.
A website typically has a homepage, which serves as an entry point for
the site and provides an overview of its content and purpose. From there,
users can navigate to other pages on the site through links or menus.
Websites can be created using a variety of tools and technologies,
including web languages like HTML, CSS, and JavaScript, as well as
content management systems like WordPress, Drupal, and Joomla.
Websites can be accessed through web browsers like Chrome, Safari,
Firefox, and Internet Explorer, and can be viewed on a range of devices,
including desktop and laptop computers, tablets, and smartphones. The
design and functionality of a website can vary widely depending on its
purpose and target audience, but all websites share the common goal of
providing information and engaging users with content.

2. System Requirements and System Analysis

2.1 Requirement Collection :

Front end development requirements are a list of the necessary functions,
capabilities, or characteristics related to the design theme and the plans
for creating it. The process followed while collecting the requirements of
the system was as follows:

• Team Discussion :

The process of discussing how the project should be implemented.

• Understanding the focus group :

Understanding, as a team, what the audience wants to see in the theme.

2.2 System Requirements :

2.2.1 Functional Requirements :

Functional requirements describe what the system is supposed to do. The
main functional requirements of this system are as follows:
• The user should be able to view all the necessary information and
specifications about this project.
• Browser compatibility.
• Responsive on all devices.

2.2.2 Non-Functional Requirements :

A non-functional requirement describes how the system performs a
certain function. Non-functional requirements generally specify the
system's quality attributes or characteristics. Our tool targets properties
such as reliability, usability, storage occupancy, performance, and
response time. The tools and minimum hardware used are:
• AI Model Development Language: Python.
• Scripting Language: JavaScript.
• Markup Language: HTML.
• Styling Language: CSS.
• Server-Side Language: Python.
• IDE: Microsoft Visual Studio, Google Colab.
• Processor: Intel Core i3 or equivalent, or the processor provided by
Google Colab.
• Hard Disk: At least 50 GB, or the disk provided by Google Colab.
• RAM: 2 GB, or the RAM provided by Google Colab.

2.3 Feasibility Analysis :
A feasibility study is an analysis of how successfully a project can be
completed, accounting for factors that affect it such as economic,
technological, and scheduling factors. Project managers use feasibility
studies to determine the potential positive and negative outcomes of a
project before investing a considerable amount of time and money into it.
Feasibility studies allow companies to determine and organize all of the
necessary details to make a business work. A feasibility study helps to
identify logistical and other business-related problems, along with the
solutions to alleviate them.

2.3.1 Economic Feasibility :


Economically, our tool is expected to do well. There is little cost
associated with using the system. Hence, the system is economically
feasible. If a user needs any support, it will be available upon email
request.

2.3.2 Operational Feasibility :


Operational feasibility asks whether the system will work once it is
developed and installed. The system is user friendly, so users can adopt
it readily. The following points were taken into account for the
operational feasibility of the system:
• The system causes no harm.
• The system is affordable and has low operational cost.

2.3.3 Technical Feasibility :

The website must first be evaluated from the technical aspect. The
evaluation of this feasibility must be based on an outline design of the
website requirements. Having identified an outline system, the
investigation must go on to suggest the type of equipment required, the
method of developing the system, and the method of running the system
once it has been designed. Technical issues raised during the
investigation are:
• Does the necessary technology exist to do what is
suggested/assigned?
• Can the system be upgraded if developed?

2.3.4 Schedule Feasibility :


Schedule feasibility is a measure of how reasonable the project timetable
is. A feasible schedule was maintained by following a proper time plan.

3. System Design

3.1 Overall Design :

[Figure: Overall Coding Implementation]

[Figure: Generating Output]

Overall Input to Output :-

Input text box :- Elephant

[Figure: Output image]

3.2 Design Diagrams :

Data Flow Diagram :-

Input Text --> Application
Application --> CompVis/stable-diffusion-v1-4 (user request sent to the model)
CompVis/stable-diffusion-v1-4 --> Application (model output received)
Application --> Output Image

4. Implementation

4.1 Implementation Tools:

Implementation is an activity that runs throughout the development
phase. It is the process of bringing the designed system into
operational use. The system is first tested and then turned into a
working system. Every task identified in the design specification is
carried out in this phase.

4.2 Front End Tools:


HTML:
HTML stands for Hypertext Markup Language. It is a standard markup
language used to create and design web pages and web applications.
HTML is the backbone of web development and is used to structure
content on the web, including text, images, videos, and other media.
HTML consists of a series of tags and attributes that are used to define
the structure and content of a web page. Tags are used to create elements
like headings, paragraphs, links, images, and forms, while attributes are
used to provide additional information about these elements, such as their
size, color, and alignment.

CSS:
CSS stands for Cascading Style Sheets. It is a style sheet language used to
describe the presentation of HTML and XML documents, including web
pages and web applications. CSS is used to define the layout, fonts,
colors, and other visual aspects of a web page, making it an essential part
of web development.
CSS works by assigning styles to HTML elements using selectors. For
example, a CSS selector can be used to apply a specific font family and
size to all headings on a web page. CSS also supports the use of classes
and IDs to apply styles to specific elements or groups of elements.

JAVASCRIPT:
JavaScript is a programming language used to create interactive and
dynamic web pages and web applications. In the browser it is a
client-side language, which means that it runs on the user's computer
rather than on the web server. JavaScript is used to add interactivity
and functionality to a web page, such as form validation, animations,
and user interface enhancements. It can be used to manipulate HTML and
CSS elements dynamically, allowing developers to create engaging and
responsive web pages that adapt to user actions and input.

4.3 Back End Tools:

PYTHON:
Python is a versatile and powerful language commonly used for web
development. With frameworks like Django and Flask, Python excels as a
backend language, providing robust solutions for building scalable and
efficient web applications. Its clean syntax and extensive libraries
streamline development, allowing programmers to focus on application
logic rather than intricate details. Django, a high-level web framework,
simplifies database interactions, URL routing, and templating, making it
ideal for rapid development. Flask, a lightweight alternative, offers
flexibility, enabling developers to choose components as needed.
Python's rich ecosystem, coupled with its readability, makes it a preferred
choice for creating dynamic and feature-rich web applications.

FLASK:
Flask is a lightweight Python web framework that lets developers build
web applications quickly. Its simplicity and flexibility support rapid
development while keeping the codebase minimal. Flask is built on the
Werkzeug toolkit and the Jinja2 templating engine, which provide request
routing and dynamic content rendering. It supports RESTful APIs, making
it well suited to a microservices architecture. Flask's modular design
encourages extensibility through a range of extensions, which integrate
with databases, authentication systems, and more. Whether for a small
project or a larger web solution, Flask's intuitive structure and active
community support make it a popular choice for backend development.
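
A minimal sketch of how a Flask backend could accept a text prompt and return the generated image is shown below. The report does not include its server code, so the /generate route, the JSON field name "prompt", the 200-character limit, and the helper names validate_prompt and create_app are all assumptions. The generate_image argument stands for any function that takes a prompt and returns a PIL image.

```python
import io


def validate_prompt(payload, max_length=200):
    """Return a cleaned prompt, or raise ValueError if it is unusable."""
    if not isinstance(payload, dict):
        raise ValueError("a JSON object is required")
    prompt = payload.get("prompt", "")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    prompt = prompt.strip()
    if len(prompt) > max_length:
        raise ValueError("prompt is too long")
    return prompt


def create_app(generate_image):
    """Build a Flask app around generate_image(prompt) -> PIL image."""
    # Imported here so the validation helper works without Flask installed.
    from flask import Flask, jsonify, request, send_file

    app = Flask(__name__)

    @app.route("/generate", methods=["POST"])
    def generate():
        try:
            prompt = validate_prompt(request.get_json(silent=True))
        except ValueError as err:
            return jsonify(error=str(err)), 400
        # Encode the image in memory and stream it back as a PNG.
        buffer = io.BytesIO()
        generate_image(prompt).save(buffer, format="PNG")
        buffer.seek(0)
        return send_file(buffer, mimetype="image/png")

    return app
```

With a generator function available (here called my_generator, a hypothetical name), the server could be started for local testing with create_app(my_generator).run(debug=True).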

4.4 Model Implementation :

The model script integrates several libraries to create a Stable
Diffusion image generation pipeline. The script employs the Python
Imaging Library (PIL) for image processing and IPython's display
module to visualize the generated image interactively. Additionally, it
relies on the StableDiffusionPipeline class from the "diffusers" library,
which loads a pre-trained Stable Diffusion model.

The script begins by setting up essential parameters, including an
authentication token for accessing the model, the specific model ID
("CompVis/stable-diffusion-v1-4"), and the computational device (in this
case, "cuda" for GPU acceleration). The pipeline is then instantiated with
these configurations, employing mixed-precision computation for
improved efficiency.

The central function, generate_image, takes a prompt as input and uses
the Stable Diffusion pipeline to generate an image based on the provided
text. The resulting image is then displayed using IPython's display
functionality.

This code uses Stable Diffusion to generate visually appealing and
contextually relevant images based on user prompts. The integration of
PyTorch, the "diffusers" library, and IPython's display capabilities
demonstrates a cohesive approach to incorporating deep learning models
into interactive applications, facilitating the exploration of creative
image generation through textual input.
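
The original script appears in the report only as a screenshot, so the following is a sketch reconstructed from the description above, not the authors' exact code. It assumes a recent release of the "diffusers" library (older releases named the token argument use_auth_token) and a CUDA-capable GPU. Half-precision weights stand in for the mixed-precision setup, and the function signatures are assumptions.

```python
MODEL_ID = "CompVis/stable-diffusion-v1-4"


def load_pipeline(model_id=MODEL_ID, device="cuda", token=None):
    """Load the pre-trained pipeline in half precision on the given device."""
    # Heavy imports are kept local so the module can be imported without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to reduce GPU memory use
        token=token,                # Hugging Face access token, if required
    )
    return pipe.to(device)


def generate_image(pipe, prompt):
    """Run the pipeline on a text prompt and return the first PIL image."""
    result = pipe(prompt)
    return result.images[0]


# Example use in a notebook (downloads the model weights on first run):
#   from IPython.display import display
#   pipe = load_pipeline(token="<your token>")
#   display(generate_image(pipe, "Elephant"))
```

Passing the pipeline into generate_image, rather than reading it from a global variable, makes it possible to try the function with a stand-in pipeline before the real model is downloaded.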

5. Future Scope :

■ Potential Improvements :
– Model Fine-Tuning: Enhancing model performance
for diverse prompt interpretations.
– Code Optimization: Streamlining code for faster
execution.
– User Interface Enhancement: Creating a user-
friendly interface for easy interaction.
– Building Our Own Model: Creating our own AI
model from scratch using a GAN.

■ Extended Features :
– Multi-Modal Generation: Text-to-image generation
with additional audio or video prompts.
– Collaborative Platform: Enabling multiple users to
contribute to image generation simultaneously.

6. References

1. Brown, T. B., et al. (2020). Language Models are Few-Shot Learners.
arXiv preprint arXiv:2005.14165.
2. DALL-E: Creating Images from Text. (n.d.). https://openai.com/dall-e/
3. Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator
Architecture for Generative Adversarial Networks. arXiv preprint
arXiv:1812.04948.
4. Ray, S. (2021). AI-Powered Art: What You Need to Know.
https://www.creativebloq.com/inspiration/ai-powered-art-what-you-need-to-know
5. The democratization of artificial intelligence. (2021).
https://www.economist.com/special-report/2021/06/24/the-democratization-of-artificial-intelligence
6. W3Schools. (n.d.). HTML Tutorial. https://www.w3schools.com/html/
7. W3Schools. (n.d.). CSS Tutorial. https://www.w3schools.com/css/
8. MDN Web Docs. (n.d.). JavaScript.
https://developer.mozilla.org/en-US/docs/Web/JavaScript
9. Hugging Face. (n.d.). CompVis/stable-diffusion-v1-4.
https://huggingface.co/CompVis/stable-diffusion-v1-4
