A Report of Six Weeks Industrial Training at Think-Next Private Limited
At
OF THE DEGREE OF
BACHELOR OF ENGINEERING
JUNE-JULY, 2018
SUBMITTED BY:
ABHISHEK JOSHI
UNIVERSITY UID:-
16BCS3171
CHANDIGARH UNIVERSITY 16BCS3171
CONTENTS
Certificate by Company/Industry/Institute
Candidate’s Declaration
Abstract
Acknowledgement
About the Company/Industry/Institute
List of Figures
CHAPTER 1 INTRODUCTION
3.1 Result
3.2 Discussion
3.3 Snapshots of results
REFERENCES
CANDIDATE’S DECLARATION
I, ABHISHEK JOSHI, hereby declare that I have undertaken six weeks of industrial training at
THINK-NEXT PRIVATE LIMITED from 14 May 2018 to 29 June 2018 in
partial fulfilment of the requirements for the award of the degree of B.E. (COMPUTER SCIENCE &
is being presented in the training report submitted to Department of Computer Science &
training work.
ABSTRACT
My project is based on scraping data from various websites.
A very large amount of data is produced every minute, and much of it is stored in databases behind various websites. My project helps to extract the important data from those websites so that it can be used for data analysis or research work.
Since the system is meant to be accessed only by the admin and registered users, I have created a login system and a signup system.
The project is developed in Python, and its code combines C and Python.
The project describes how to use the various tools available for web scraping. Because web scraping is not easy, the web app includes a separate section called Documentation.
The app is oriented towards data analytics and covers the tools used in data analytics.
Web scraping is a technique for extracting large amounts of data from websites, whereby the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.
Web scraping software automatically loads and extracts data from multiple pages of websites based on your requirements. It is either custom built for a specific website or can be configured to work with any website. With the click of a button you can save the data available on a website to a file on your computer.
The problem with most generic web scraping software is that it is difficult to set up and use, with a steep learning curve. My project aims to solve this problem: with an intuitive point-and-click interface, you can start extracting data from a website within minutes.
ACKNOWLEDGEMENT
I would like to express my special thanks to my teacher, Mr. Sunil Kumar, and to my friend Daksh Agarwal, who gave me the opportunity to do this project on web scraping. The project led me to do a lot of research and I learned many new things, for which I am thankful to them. I would also like to thank my parents and friends, who helped me a lot in finishing this project within the limited time frame.
Think Next Technologies Pvt. Ltd. is an ISO 9001:2008 certified software, electronics and CAD/CAM trainer that is also approved by the Ministry of Corporate Affairs. The company offers training in Web Designing and Development, Mobile Apps Development, Digital Marketing, College/School ERP Software, and University Conferences and Journals Management.
It is approved by the Ministry of Corporate Affairs, Govt. of India (Corporate Identity No. U72200PB2011PTCO35677), affiliated with the Indian Testing Board and ISTQB (International Software Testing Qualifications Board), and a member of CII (Confederation of Indian Industry), Membership No. N5238P.
ThinkNEXT offers 6-month, 3-month and 6-week industrial training programs for B.Tech, MCA, BCA, Diploma, M.Sc (IT), B.Sc (IT) and other related students. The training covers CSE/IT, Electronics (ECE), Mechanical, Civil and Electrical Engineering and aims to make students industry-ready.
LIST OF FIGURES
FIGURE 2:-Page 24
1. Signup page
2. Login page
3. Main page
4. Application Page
5. Documentation section
1. INTRODUCTION
1. 4 GB RAM
1. Python 3.5.1
2. SQLite
3. Django 1.10
4. Apache Server
(Web Scraping)
My project is based on scraping data from various websites.
The project is developed in Python, and its code combines C and Python.
The app is oriented towards data analytics and covers the tools used in data analytics.
Web scraping is a technique for extracting large amounts of data from websites, whereby the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.
Web scraping software automatically loads and extracts data from multiple pages of websites based on your requirements. It is either custom built for a specific website or can be configured to work with any website. With the click of a button you can save the data available on a website to a file on your computer.
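The idea of extracting data and saving it in table (spreadsheet) format can be illustrated with a minimal sketch. It uses only the Python standard library and is not the code of the web app itself; the class name TableRowParser, the function html_table_to_csv and the sample page are assumptions made up for this illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the rows of the HTML tables in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical page content, used only to demonstrate the function.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Pen</td><td>10</td></tr>
  <tr><td>Notebook</td><td>45</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_PAGE)
```

Writing csv_text to a file with open("data.csv", "w") would give a file that opens directly in a spreadsheet program.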
Since the system is meant to be accessed only by the admin and registered users, I have created a login system and a signup system.
The project describes how to use the various tools available for web scraping. Because web scraping is not easy, the web app includes a separate section called Documentation.
The problem with most generic web scraping software is that it is difficult to set up and use, with a steep learning curve. My project aims to solve this problem: with an intuitive point-and-click interface, you can start extracting data from a website within minutes.
2.1.1 HTML:-
Every webpage we look at is written in a language called HTML. You can think of HTML as
the skeleton that gives every webpage structure. In this course, we'll use HTML to add
HTML is the code written with angle brackets (<>). Like any language, it has its own special syntax (rules
for communicating).
2.1.2 CSS:-
Cascading Style Sheets, fondly referred to as CSS, is a simple design language intended to
CSS handles the look and feel part of a web page. Using CSS, you can control the colour of the
text, the style of fonts, the spacing between paragraphs, how columns are sized and laid out,
what background images or colours are used, layout designs, and variations in display for
CSS is easy to learn and understand but it provides powerful control over the presentation of an
HTML document. Most commonly, CSS is combined with the markup languages HTML or
XHTML.
2.1.3 JAVASCRIPT:-
JavaScript is a dynamic computer programming language. It is lightweight and most commonly
used as a part of web pages, whose implementations allow client-side script to interact with the
user and make dynamic pages. It is an interpreted programming language with object-oriented
capabilities.
JavaScript was first known as LiveScript, but Netscape changed its name to JavaScript,
possibly because of the excitement being generated by Java. JavaScript made its first
appearance in Netscape 2.0 in 1995 with the name LiveScript. The general-purpose core of the
language has been embedded in Netscape, Internet Explorer, and other web browsers.
2.1.4 PYTHON:-
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. Guido van Rossum began developing it in the late 1980s, and it was first released in 1991. Python source code is available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).
Python is derived from many other languages, including ABC, Modula-3, C, C++ and Algol-68.
Python is now maintained by a core development team.
2.1.5 DJANGO:-
Django is a high-level Python Web framework that encourages rapid development and clean,
pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web
development, so you can focus on writing your app without needing to reinvent the wheel. It’s
When a request comes to a web server, it is passed to Django, which tries to figure out what is
actually requested. It takes the web page address first and tries to figure out what to do. This
part is done by Django's urlresolver (a website address is called a URL, Uniform
Resource Locator, so the name urlresolver makes sense). The resolver is not very smart: it takes a list
of patterns and tries to match the URL. Django checks the patterns from top to bottom, and if
one matches, Django passes the request to the associated function (which is
called a view).
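The top-to-bottom matching described above can be sketched in plain Python. This is a simplified illustration of the idea, not Django's actual implementation; the view functions, the URL_PATTERNS list and the resolve function are hypothetical names chosen for this example.

```python
import re


def home(request):
    return "home page"


def article(request, year):
    return "articles from " + year


def not_found(request):
    return "404: no pattern matched"


# Patterns are checked from top to bottom; the first match wins.
URL_PATTERNS = [
    (re.compile(r"^$"), home),
    (re.compile(r"^articles/(?P<year>[0-9]{4})/$"), article),
]


def resolve(path, request=None):
    """Call the view of the first pattern that matches `path`."""
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments of the view.
            return view(request, **match.groupdict())
    return not_found(request)
```

Because the list is scanned in order, a more specific pattern has to be placed above a more general one that would also match the same address.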
(USING PYTHON)
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language.
The concepts and rules used in Python programming provide these important benefits:
Interactive
Interpreted
Modular
Dynamic
Object-oriented
Portable
High level
Structured Query Language (SQL) is a database query language used for storing and
managing data in a relational DBMS. SQL was the first commercial language introduced for
E. F. Codd's relational model of databases. Today almost all RDBMS (MySQL, Oracle,
Informix, Sybase, MS Access) use SQL as the standard database query language. SQL is
SQL Command
This includes changes to the structure of the table like creation of table, altering table,
All DDL commands are auto-committed. That means it saves all the changes
DML commands are used for manipulating the data stored in the table and not the table
itself. DML commands are not auto-committed. It means changes are not permanent to
Command Description
Data query language is used to fetch data from tables based on conditions that we can easily
apply.
Command Description
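The three groups of SQL commands can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch under stated assumptions: the table name pages, its columns and the example URLs are hypothetical, and an in-memory database is used so nothing is written to disk. Commit behaviour for DDL differs between database engines, so the sketch commits explicitly after creating the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the structure of a table.
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.commit()

# DML: the insert stays pending until commit(), so rollback() discards it.
conn.execute("INSERT INTO pages (url) VALUES (?)", ("http://example.com/a",))
conn.rollback()
count_after_rollback = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]

# DML again, this time made permanent with commit().
conn.execute("INSERT INTO pages (url) VALUES (?)", ("http://example.com/b",))
conn.commit()

# DQL: fetch the rows that satisfy a condition.
rows = conn.execute(
    "SELECT id, url FROM pages WHERE url LIKE ?", ("%example.com%",)
).fetchall()
conn.close()
```

The rolled-back insert leaves the table empty, which shows that DML changes are not permanent until they are committed.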
CONNECTIVITY
1. Django determines the root URLconf module to use. Ordinarily, this is the value of
the ROOT_URLCONF setting.
2. Django loads that Python module and looks for the variable urlpatterns. This should be
3. Django runs through each URL pattern, in order, and stops at the first one that matches
4. Once one of the URL patterns matches, Django imports and calls the given view, which
is a simple Python function (or a class-based view). The view gets passed the following
arguments:
o An instance of HttpRequest.
o If the matched URL pattern returned no named groups, then the matches from the
o The keyword arguments are made up of any named parts matched by the path
optional kwargs argument to django.urls.path() or django.urls.re_path().
5. If no URL pattern matches, or if an exception is raised during any point in this process,
Captured values can optionally include a converter type. For example, use <int:name> to
a / character, is matched.
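How a converter such as <int:name> turns part of an address into a typed argument can be sketched in plain Python. This is an illustration of the behaviour, not Django's own code: the CONVERTERS table and the functions compile_route and match_route are hypothetical names, and only the int and str converters are modelled.

```python
import re

# Regular expression and Python type for each converter modelled here.
# "str" is used when no converter is given and never matches a "/".
CONVERTERS = {
    "int": (r"[0-9]+", int),
    "str": (r"[^/]+", str),
}

# Matches "<name>" or "<converter:name>" inside a route string.
_PARAM = re.compile(r"<(?:(?P<conv>\w+):)?(?P<name>\w+)>")


def compile_route(route):
    """Translate a route such as 'articles/<int:year>/' into a regex."""
    casts = {}
    regex = "^"
    pos = 0
    for m in _PARAM.finditer(route):
        pattern, cast = CONVERTERS[m.group("conv") or "str"]
        casts[m.group("name")] = cast
        regex += re.escape(route[pos:m.start()])
        regex += "(?P<%s>%s)" % (m.group("name"), pattern)
        pos = m.end()
    regex += re.escape(route[pos:]) + "$"
    return re.compile(regex), casts


def match_route(route, path):
    """Return the converted keyword arguments, or None if `path` does not match."""
    regex, casts = compile_route(route)
    m = regex.match(path)
    if m is None:
        return None
    return {name: casts[name](value) for name, value in m.groupdict().items()}
```

With this sketch, match_route("articles/<int:year>/", "articles/2018/") yields the integer 2018 rather than the string "2018", which is the point of declaring a converter type.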
3.1 RESULTS
This project supports data analysis: the application saves users time by fetching data from a website in minutes.
Through this project I explored Django and Apache Server and strengthened my understanding of hybrid programming, JavaScript, HTML, CSS and Python.
As a result, I am able to create web applications using Django and Python.
Since the system is meant to be accessed only by the admin and registered users, I have created a login system and a signup system.
Direct links are provided to websites that give a complete explanation of the tools used in the scraping process and more information about web scraping.
For describing the complete functioning of the web app documentation is made that provide the
The application section has an interface that asks for a URL from which to fetch data. It has three sections. The first section asks for a URL and copies the complete HTML code of the document. The second section asks for a URL and lists the links associated with the HTML document. The third section asks for a URL and extracts all the text written on the website.
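The three sections of the application can be sketched with the Python standard library alone. This is a minimal illustration, not the code of the web app: the function names fetch_html, extract_links and extract_text and the sample page are assumptions, and the app itself may rely on other scraping tools.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Section 1: download a page and return its HTML source as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _LinkAndTextParser(HTMLParser):
    """Collect href values of <a> tags and visible text in one pass."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def _parse(html):
    parser = _LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser


def extract_links(html):
    """Section 2: return every link target found in the document."""
    return _parse(html).links


def extract_text(html):
    """Section 3: return the text of the page, without scripts and styles."""
    return " ".join(_parse(html).text_parts)


# Hypothetical page used to try the functions without network access.
SAMPLE_PAGE = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    '<body><p>Hello <a href="/docs">docs</a></p>'
    '<a href="http://example.com/">site</a>'
    "<script>var x = 1;</script></body></html>"
)
```

For a live page, extract_links(fetch_html(url)) chains the first and second sections. Relative links such as /docs would still need to be joined with the page address, for example with urllib.parse.urljoin.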
Lastly, the layout pages are designed with quotes, a gallery and information about web scraping, and are coded with HTML, CSS and Bootstrap.
4.1 CONCLUSION
This project helps users save time by fetching data from websites in minutes, and through it I learned to create web applications using Django and Python.
Scraping methods may change in the future, but scraping will always be in demand, because people and businesses want to gather data instantly and no one likes manual effort.
There is always need to compare your business with competitor’s business and web
Web scraping is the most important part of marketing. Because there is always need of
Methods for web scraping are changing. PHP scripts were used first and are still popular, but Python is now becoming more popular for web scraping.
REFERENCES
1. www.w3schools.com
2. www.tutorialpoint.com
3. A reference to HTML/CSS/Bootstrap
4. Django Tutorials
5. www.datacamp.com