Social Media Datasets For Analysis and Modeling Drug Usage


International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 3 Issue 5, August 2019 Available Online: e-ISSN: 2456 – 6470

Social Media Datasets for Analysis and Modeling Drug Usage

Sindhu S. B, Dr. B. N Veerappa
Department of Studies in Computer Science & Engineering,
University BDT College of Engineering, Davangere, Karnataka, India

How to cite this paper: Sindhu S. B | Dr. B. N Veerappa "Social Media Datasets for
Analysis and Modeling Drug Usage" Published in International Journal of Trend in
Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5,
August 2019, pp.150-152, Unique Paper ID IJTSRD25246

Copyright © 2019 by author(s) and International Journal of Trend in Scientific
Research and Development Journal. This is an Open Access article distributed under
the terms of the Creative Commons Attribution License (CC BY 4.0)

ABSTRACT
This paper is based on research carried out in the area of data mining. Managing the
bulk of data mined from social media depends on composite applications that can
perform more sophisticated analysis, and enhancements to social media may address
this need. The objective of this paper is to introduce a tool of this type, which is
used on social networks to characterise medicine usage. The paper outlines a
structured approach to analysing social media in order to capture emerging trends in
medicine abuse by applying powerful methods such as Machine Learning. It describes
how to fetch important data for analysis from a social network, and then discusses
big data techniques for extracting useful content for analysis.

KEYWORDS: social media; data mining; Big data

These days social media is used to extract patient information in order to understand
patient symptoms. Ranging from individual messaging to live fora, it gives patients
immense opportunities to talk about their experiences with drugs and devices. Social
media allows message contribution, information gathering and distribution in the
health care space. Health care holds the data of patients with their permission. It
provides an effective social networking environment. The proper way of mining
information and trends from this knowledge is the cloud. A network-based analysis
method is used to model the social media.
Social networks (social media) are a source from which information can be extracted
from the internet. Nowadays they are used to extract patients' data in order to
understand patient symptoms. Social media, ranging from individual messaging to live
fora, is providing immeasurable opportunities for patients to converse about their
experiences with drugs and devices. Social media allows message contribution,
information gathering and distribution in the health care space. Health care holds
the information of patients with their permission. It provides an effective social
networking environment. The proper way of mining information and trends from this
knowledge is the cloud. A network-based analysis method is used to model social media
such as Facebook, Twitter and WebMD.

Numerous studies have been published recently in this realm, including studies on
pharmacovigilance,2 identifying smoking cessation patterns,3 identifying user social
circles with common experiences (like drug abuse),4 monitoring malpractice,5 and
tracking infectious disease spread.6–8 A systematic review9 conducted in 2014 found
numerous attempts to use this user-generated data, but none yet integrated in
national surveillance programs, noting the promise and challenges of the field quite
succinctly: “More direct access to such [social media] data could enable surveillance
epidemiologists to detect potential public health threats such as rare, new diseases
or early-level warnings for epidemics. But how useful are data from social media and
the Internet, and what is the potential to enhance surveillance? The challenges of
using these emerging surveillance systems for infectious disease epidemiology,
including the specific resources needed, technical requirements, and acceptability to
public health practitioners and policymakers, have wide-reaching implications for
public health surveillance in the 21st century.”9

The use of social media for health monitoring and surveillance indeed has many
drawbacks and difficulties, particularly if done automatically. For example,
traditional NLP methods that are applied to longer texts have proven to be inadequate
when applied to short texts, such as those found in Twitter.2 Something seemingly
simple, such as searching and collecting relevant postings, has also proven to be
quite challenging, given the amount of data and the diverse styles and wording used
by people to refer to the topic of interest in colloquial terms (semantic
heterogeneity) inherent to this type of media.

The goal of this session was to attract researchers that have explored automatic
methods for the collection, extraction, representation, analysis, and validation of
social media data for public health surveillance and monitoring, including
epidemiological and behavioral studies. It serves as a unique forum to discuss novel
approaches to text and data mining methods that respond to the specific requirements
of social media and that can prove invaluable for public health surveillance.
Research topics presented at this session include:
• Early detection of disease outbreaks
• Medication safety, including medicine interactions and dietary supplement safety
• Health behaviors, including diet success
• Individual well-being which affects mental and physical health.

@ IJTSRD | Unique Paper ID – IJTSRD25246 | Volume – 3 | Issue – 5 | July - August 2019 Page 150
Literature Survey
Noemie Elhadad, et al. [1]: Social media is a major source of user-generated feedback
on nearly all products and services. Users frequently rely on social media to
disclose sometimes serious incidents rather than going to official communication
channels. This valuable, actionable, user-generated information, if extracted
reliably and robustly from social media, has the potential to have a positive impact
on critical applications related to public health and safety, and beyond.
Unfortunately, the extraction of information from social media, where the output of
the extraction process is used to take concrete actions in the real world, is not
well supported by existing technology. Traditional information extraction approaches
do not work well over the highly informal and ungrammatical language in social media.
They do not handle the extraction and aggregation of rare content. In an ongoing
collaborative project between Columbia University and the New York City Department of
Health and Mental Hygiene (DOHMH), the authors aim to address these gaps in research
and technology for one important public health application.

Erwan Le Martelot et al. [2]: Today networks are available everywhere. Community
detection has received increasing attention as a way to uncover the structure of
networks by grouping nodes that are more densely connected internally than
externally. However, most of the effective methods available do not consider the
possible levels of organisation, or scales, a network may include, and are therefore
limited. In this paper the authors present a method compatible with global and local
criteria that enables fast multi-scale community detection. The method is derived in
two algorithms, one for each type of criterion, and implemented with 6 known
criteria. Discovering communities at various scales is a computationally expensive
task. Consequently, this work puts a strong emphasis on the reduction of
computational complexity. Some heuristics are introduced for speed-up purposes.
Experiments demonstrate the efficiency and accuracy of the method with respect to
each algorithm and criterion by testing them against large generated multi-scale
networks. This work also offers a comparison between criteria and between the global
and local approaches.

Big data is a term characterized by increasing volume, velocity, variety and
veracity. All of these characteristics make processing big data a complex task.
Therefore, such data needs to be processed differently, for example with the Map
Reduce framework. When an organization exchanges data in order to mine useful
information from Big Data, the privacy of the data becomes an important issue. In
past years, several privacy-preserving models have been proposed. Anonymizing the
dataset [3] can be done through several operations such as generalization,
suppression and specialization. These algorithms are mostly suitable for datasets
that do not have the characteristics of Big Data. To preserve the privacy of the
dataset an algorithm was proposed recently. The author discusses the growth of Big
Data characteristics and the Map Reduce framework for privacy preservation as a
direction for future research.

E. Srimathi, K. A. Apoorva [4]: As many web services require users to share their
private electronic health records for research analysis or data mining, privacy
issues arise. The scale of data in cloud infrastructure grows in line with the nature
of Big Data, which makes it a challenge for conventional software tools to process
such mass data within a tolerable elapsed time. As a consequence, it is a challenge
for current anonymization techniques to preserve privacy on confidential scalable
data sets because of their lack of scalability. The authors present a scalable
two-phase approach to anonymizing scalable datasets using a dynamic Map Reduce
framework and the LKC privacy model.

Methodology
Fig 1 below shows the proposed architecture, which is used to mine the social media
data. The diagram consists of a database where the social media data is stored; from
it the particular data is selected and extracted. Data extraction is simply the
extraction of the particular data for which a feature is selected.

After this step data transformation is done, in which the selected data is
transformed. The data is then sent for processing to the machine learning model,
where it is tested to determine which class it belongs to. The machine learning model
holds a training set of instances, which is used to predict which class new test
instances belong to.

The algorithm works in the following steps. The first step is to collect the raw data
from social media as the input.

Fig1: Block Diagram

The next step after the input is to apply a supervised learning algorithm for feature
extraction.
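
As a rough illustration of the flow described in this section (raw posts as input,
feature extraction, a training set, a trained classifier, and class prediction for
test instances), the following minimal Python sketch may help. It is not the
software used in this paper: the posts, labels, keyword list and function names are
all hypothetical. The classifier is a small decision tree, the technique this paper
names for classification, but the entropy-based (ID3-style) splitting shown here is
an assumption.

```python
import math
from collections import Counter

# Hypothetical labelled posts (text, class); not the dataset used in the paper.
TRAIN = [
    ("took two oxy pills last night to get high", "abuse"),
    ("crushed and snorted my oxy again", "abuse"),
    ("mixing xanax with alcohol to get high", "abuse"),
    ("doctor prescribed oxy after my knee surgery", "medical"),
    ("my prescribed xanax dose helps with anxiety", "medical"),
    ("pharmacy refilled my prescription today", "medical"),
]
TEST = [
    ("got high on pills at the party", "abuse"),
    ("my doctor prescribed a new dose", "medical"),
]
# Assumed feature vocabulary.
KEYWORDS = ["high", "snorted", "prescribed", "doctor", "alcohol"]


def extract_features(text):
    """Feature extraction: one boolean per keyword (present or absent)."""
    tokens = set(text.lower().split())
    return {kw: kw in tokens for kw in KEYWORDS}


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, features):
    """Build a tree from (feature_dict, label) rows.

    A leaf is a class label; an inner node is (feature, yes_branch, no_branch).
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(feature):
        yes = [lab for feats, lab in rows if feats[feature]]
        no = [lab for feats, lab in rows if not feats[feature]]
        if not yes or not no:
            return 0.0
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) == 0.0:
        return majority
    rest = [f for f in features if f != best]
    yes_rows = [row for row in rows if row[0][best]]
    no_rows = [row for row in rows if not row[0][best]]
    return (best, build_tree(yes_rows, rest), build_tree(no_rows, rest))


def predict(tree, features):
    """Walk the tree until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        feature, yes_branch, no_branch = tree
        tree = yes_branch if features[feature] else no_branch
    return tree


# Train on the training set, then classify the held-out test instances.
train_rows = [(extract_features(text), label) for text, label in TRAIN]
tree = build_tree(train_rows, KEYWORDS)
predictions = [predict(tree, extract_features(text)) for text, _ in TEST]
accuracy = sum(p == label for p, (_, label) in zip(predictions, TEST)) / len(TEST)
```

Keeping the test posts separate from the training posts is what makes the accuracy
figure meaningful: the classifier is scored only on instances it has not seen while
the tree was being built.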

The feature extraction step also produces the extracted data as its output. After
this step the training set is generated, and the testing data is likewise created by
applying various algorithms. The next step is to apply a machine learning algorithm
to the training dataset, to train the model so that it gives the correct predicted
output. After these steps our main aim is to build the classifier model to classify
the test data. The choice of classifier algorithm is up to us; there are many
classifier algorithms to choose from. After building the classifier model, the next
step is to apply that model to classify the test data sets. The last step gives the
output, which is the predicted class of that particular test data.

Module Description
There are different operational modules in this application:
Selecting Data set: For any big data application, the first job is to collect the
data set and maintain it for further processing.
Analyze the Data set: Analyzing means converting unstructured data into a proper
format for further processing.
Finding Attributes/Properties of Data set: For processing queries we need to identify
the data attributes of the data set by which queries are to be predicted.
Processing User Queries: After finding the attributes we need to process user queries
with different parameters, then predict and show the result.

Prediction of medicine usage is used to identify the requirement, availability and
scarcity of medicine in different places, based on the usage of medicine in different
areas of cities. For the prediction of medicine usage we developed a machine learning
software model, to which we provide dataset information recording which medicine is
used by which patient and in which area, along with additional properties. The
dataset is provided in an unreadable format and needs to be converted to a structured
one for further processing. After conversion we extract the features of the medicine
dataset that are necessary for identifying usage, using a supervised learning
technique. Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs. It infers a
function from labelled training data consisting of a set of training examples. In
supervised learning, each example is a pair consisting of an input object and a
desired output value. A supervised learning algorithm analyses the training data and
produces an inferred function, which can be used for mapping new examples. An optimal
scenario will allow the algorithm to correctly determine the class labels for unseen
instances.

After the data is classified, the labelled items that are related to each other are
grouped based on the subject under consideration. Clusters then form groups of
similar elements, for example by medicine name or area. The supervised learning
method categorizes our data into a desired and distinct number of classes, where we
can assign a label to each class. Here the decision tree classification technique is
used to make decisions on the available data items and classify them according to
critical information. A decision tree is simple to understand and visualise, requires
little data preparation, and can handle both numerical and categorical data.

RESULT
The study was carried out on a large number of data sets collected from a real-world
health organization. Classification is done on the dataset with feature extraction,
resulting in different categories of data being available for different medicines.
Here we analyze data based on usage attributes such as medicine name, area and
gender. The dataset considered was processed according to the requirements and
resulted in 98 percent accuracy using the machine learning technique.

Fig1. Home Screen
Fig2: Classified Dataset based On Subject
Fig 3: Analysis of Medicine Details.

CONCLUSION
This paper presented our approach for mining and managing data from social networks,
which depends on combining the bulk of data from social networks with big data and
infrastructure paradigms. A Machine Learning model is used to mine, store and process
bulk data from social networks. Processing of the mined data is also performed by ML,
which simplifies the development of new algorithms and provides high scalability and
flexibility. This paper presents the development of an implementation of a Machine
Learning technique that scales to large clusters of machines comprising thousands of
machines. The implementation makes efficient use of these machine resources and is
suitable for many large computational issues encountered at

REFERENCES
[1] Noemie Elhadad, et al., “Information extraction from social media for public
health”.
[2] Erwan Le Martelot, “Fast multi scale detection of relevant communities”.
[3] Hari Kumar. R, M.E (CSE), Dr. P. Uma Maheshwari, Ph.D, “Literature survey on big
data in cloud,” International Journal of Technical Research and Applications, e-ISSN:
2320-8163.
[4] E. Srimathi, K. A. Apoorva, “Preserving identity privacy of healthcare records in
big data publishing using dynamic MR”, International Journal of Advanced Research in
Computer Science and Software Engineering, Vol 5, Issue 4, 2015.
