International Journal of Soft Computing and Engineering (IJSCE)

ISSN: 2231-2307, Volume-1, Issue-5, November 2011

Data Mining and Its Approaches towards Higher Education Solutions
Tripti Arjariya*, Shiv Kumar, Rakesh Shrivastava, Dinesh Varshney
Abstract— The major objective of this research is to advance knowledge and theoretical understanding of the relations among variables in the study and development of data mining for the higher education system, and of its solutions for Madhya Pradesh state. New knowledge takes three main forms: exploratory research, which structures and identifies new problems; constructive research, which develops solutions to a problem; and empirical research, which tests the feasibility of a solution using empirical evidence. In terms of research methodology, there are two distinct methods of research, primary and secondary. This study is a survey type of research followed by a developmental study of data mining in higher education in Madhya Pradesh state. The terms basic or fundamental indicate that, through theory generation, basic research provides the foundation for further, sometimes applied, research. As there is no guarantee of short-term practical gain, researchers may find it difficult to obtain funding for basic research. This research shows how data mining approaches and issues are helpful for the development of, and solutions for, higher education in Madhya Pradesh state.

Index Terms— Knowledge development, data mining approaches and issues, higher education system.

I. INTRODUCTION
Hardware and software advances change technological applications, and from generation to generation these changes make society more scientific in different areas. In the teaching-learning process, the intervention of Information and Communication Technology (ICT) is able to solve the problems of teachers, learners and administrators in a systematic way. In India and abroad, no country has been able to solve its basic educational problems, even with growth in the literacy rate from the elementary to the higher education system. Developed and developing countries worldwide are trying to focus on developing the educational systems of underdeveloped countries. Education systems are also being reformed through the intervention of international organizations such as UNESCO [2]. In the higher education system it is a great challenge to take advantage of ICT applications in general, and to find the root causes of problems and the prospects for the ennoblement of learners. In the Indian context, still fewer than twenty percent of learners are able to gain admission to higher education. The new approaches of IT application bear on the factors that facilitate or obstruct higher education. The totality of IT application, in the form of data, is mined across the different areas of higher education studies.

HOW DO WE CATEGORIZE DATA MINING SYSTEMS
There are many data mining (DM) systems available or being developed [7]. Some are specialized systems dedicated to a given data source or confined to limited DM functionalities; others are more versatile and comprehensive. DM systems can be categorized according to the following classification criteria [3]:
A. Classification according to the type of data source mined, which categorizes DM systems according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, World Wide Web, etc.
B. Classification according to the data model drawn on, which categorizes DM systems based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional, etc.
C. Classification according to the kind of knowledge discovered [5], which categorizes DM systems based on the kind of knowledge discovered or DM functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several DM functionalities together.
D. Classification according to the mining techniques used. DM systems employ and provide different techniques; this classification categorizes DM systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc. The classification can also take into account the degree of user interaction involved in the data mining process, such as query-driven systems, interactive exploratory systems, or autonomous systems [6].
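The four classification criteria for DM systems can be read as independent axes along which a system is described. The following Python sketch records them as a small data structure. The class names, field names and the example system "CampusMiner" are hypothetical and serve only as illustration; the enumerated values are limited to the examples named in the text.

```python
from dataclasses import dataclass
from enum import Enum


class DataSource(Enum):        # criterion A: type of data source mined
    SPATIAL = "spatial"
    MULTIMEDIA = "multimedia"
    TIME_SERIES = "time-series"
    TEXT = "text"
    WEB = "World Wide Web"


class DataModel(Enum):         # criterion B: data model drawn on
    RELATIONAL = "relational"
    OBJECT_ORIENTED = "object-oriented"
    DATA_WAREHOUSE = "data warehouse"
    TRANSACTIONAL = "transactional"


class Functionality(Enum):     # criterion C: kind of knowledge discovered
    CHARACTERIZATION = "characterization"
    DISCRIMINATION = "discrimination"
    ASSOCIATION = "association"
    CLASSIFICATION = "classification"
    CLUSTERING = "clustering"


class Technique(Enum):         # criterion D: mining technique used
    MACHINE_LEARNING = "machine learning"
    NEURAL_NETWORKS = "neural networks"
    GENETIC_ALGORITHMS = "genetic algorithms"
    STATISTICS = "statistics"
    VISUALIZATION = "visualization"


class Interaction(Enum):       # criterion D: degree of user interaction
    QUERY_DRIVEN = "query-driven"
    INTERACTIVE_EXPLORATORY = "interactive exploratory"
    AUTONOMOUS = "autonomous"


@dataclass(frozen=True)
class DMSystem:
    """Describes one DM system along the four criteria (hypothetical type)."""
    name: str
    data_sources: frozenset
    data_model: DataModel
    functionalities: frozenset
    techniques: frozenset
    interaction: Interaction

    def is_comprehensive(self) -> bool:
        # "Comprehensive" here means offering several DM functionalities.
        return len(self.functionalities) > 1


# Hypothetical example system, used only to show how the axes combine.
campus_miner = DMSystem(
    name="CampusMiner",
    data_sources=frozenset({DataSource.WEB, DataSource.TEXT}),
    data_model=DataModel.RELATIONAL,
    functionalities=frozenset({Functionality.CLASSIFICATION,
                               Functionality.CLUSTERING}),
    techniques=frozenset({Technique.MACHINE_LEARNING, Technique.STATISTICS}),
    interaction=Interaction.QUERY_DRIVEN,
)
print(campus_miner.name, campus_miner.is_comprehensive())
```

Keeping each criterion as a separate enumeration mirrors the text: a system is placed by choosing values on every axis, rather than being assigned to a single category.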

Manuscript received October 22, 2011.
* Corresponding Author
Tripti Arjariya, Department of Computer Science and Engineering, Madhya Pradesh Bhoj (Open) University, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Shiv Kumar, Associate Professor, Department of Information Technology, Technocrats Institute of Technology, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Dr. Rakesh Shrivastava, Professor, Department of Higher Education, Govt. of Madhya Pradesh, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Dr. Dinesh Varshney, Professor, Multimedia Regional Center, Madhya Pradesh Bhoj (Open) University, Indore (M.P.)-452001, India. (E-mail: [email protected]).

II. WHAT ARE THE ISSUES IN DATA MINING
DM algorithms embody techniques [3] [7] that have existed for many years, but have only lately been applied as reliable and scalable tools that time and again outperform older classical statistical methods. While DM is still in its infancy, it is becoming a trend and ubiquitous. Before DM develops into a conventional, mature and trusted discipline, many pending issues have to be addressed. Some of these issues are discussed below.


A. Security and social issues: Security is an important issue with any data collection that is shared and/or intended to be used for strategic decision-making. In addition, when data is collected for customer profiling, user behaviour understanding, correlating personal data with other information, etc., large amounts of sensitive and private information about individuals or companies are gathered and stored. This becomes controversial given the confidential nature of some of this data and the potential for illegal access to the information. DM could disclose new implicit knowledge about individuals or groups that could be against privacy policies, especially if there is potential dissemination of the discovered information. Another issue that arises from this concern is the appropriate use of DM.

B. User interface issues: The knowledge discovered by DM tools is useful as long as it is interesting and, above all, understandable by the user. Good data visualization eases the interpretation of DM results and helps users better understand their needs. There are many visualization ideas and proposals for effective graphical presentation of data. However, there is still much research to be done in this area. The major issues related to user interfaces and visualization are “screen real-estate”, information rendering, and interaction. Interactivity with the data and DM results is crucial since it provides the means for the user to focus and refine the mining tasks, as well as to picture the discovered knowledge [6] from different angles and at different conceptual levels.

C. Mining methodology issues: These issues pertain to the DM approaches applied and their limitations. Topics such as versatility of the mining approaches, diversity of the data available, dimensionality of the domain, broad analysis needs (when known), assessment of the knowledge discovered, exploitation of background knowledge and metadata [6], control and handling of noise in data, etc. are all examples that can dictate mining methodology choices. Most algorithms assume data to be noise-free. This is of course a strong assumption. Most datasets contain exceptions and invalid or incomplete information, which may complicate the analysis process and in many cases compromise the accuracy of the results. As a consequence, data pre-processing (data cleaning and transformation) becomes vital and is the most important phase in the knowledge discovery process. DM techniques should be able to handle noise in data or incomplete information. More than the size of the data, the size of the search space is decisive for DM techniques. The search space usually grows exponentially when the number of dimensions increases. This is known as the curse of dimensionality. This “curse” affects the performance of some data mining approaches so badly that it is becoming one of the most urgent issues to solve.

D. Performance issues: Many artificial intelligence and statistical methods exist for data analysis and interpretation [1]. However, these methods were often not designed for the very large data sets DM is dealing with today. Terabyte sizes are common. This raises the issues of scalability and efficiency of the DM methods when processing considerably large data. Algorithms with exponential and even medium-order polynomial complexity cannot be of practical use for DM. Linear algorithms are usually the norm. However, concerns such as completeness and choice of samples may arise. Other topics in the issue of performance are incremental updating and parallel programming. There is no doubt that parallelism can help solve the size problem if the dataset can be subdivided and the results can be merged later. Incremental updating is important for merging results from parallel mining, or for updating DM results when new data becomes available without having to re-analyze the complete dataset.

E. Data source issues: There are various issues related to the data sources. Some are practical, such as the diversity of data types, while others are philosophical, like the data glut problem. We certainly have an excess of data, since we already have more data than we can handle and we are still collecting data at an even higher rate. If the spread of database management systems has helped increase the gathering of information, the advent of DM is certainly encouraging more data harvesting. The current practice is to collect as much data as possible now and process it. The concern is whether we are collecting the right data in the appropriate amount, whether we know what we want to do with it, and whether we distinguish between what data is important and what data is insignificant. Regarding the practical issues related to data sources, there is the subject of heterogeneous databases and the focus on diverse complex data types. We are storing different types of data in a variety of repositories.

III. PROPOSED ALGORITHM

Algorithm -1:
Input: Data set R, Attribute set Ai
Output: Data set R’
R’ <- R
For I=1 to n do
    Max(Ai) = the deepest node in the attribute set Ai
    If Max(Ai).distance_to_max < Ii then
        Newnode = node.root_path_array[Ii - node.distance_to_max]
    Else
        Newnode = Max(Ai)
    Endif
    Replace node with Newnode
Endfor
Remove duplication from R’
End

Algorithm -2:
Input: Primitive rules set R
Output: Generalized rules set R’

R’ <- 0
N = |R|
For I=0 to N-1 do
    r <- ri
    M <- |r|
    For j=0 to M-1 do
        If ri inconsistent with rule rn E then
            Restore the dropped condition aj
        Endif
    End for
    Included in rule r
    If rule r is not logically included in a rule r’ ∈ MRULE then
        MRULE <- r ∪ MRULE
    Endif
End

R: the data set, which is any college web site, because the ontology is used for a specific domain.
R’: the output.
Ai: the attribute set.
Max(Ai): the function which finds the deepest node in the attribute set Ai.

IV. RESULTS AND DISCUSSIONS

Figure 3. Recall ratios of query methods

The user inputs the query words in the ontology as expansion words, and the performance can be shown through the precision and recall ratios calculated from the experimental results. Over 10 different information requests, we compute the recall and precision ratios and compare them with the traditional query method.

V. CONCLUSIONS
DM is directly associated with the use of technology for accessing data and delivering results in the desired way. In the Indian context, although computer literacy among users is very low, the applicability of DM in different sectors of society is in growing demand day by day. In the education sector specifically, it is in great demand for both teaching and learning. The management aspects are subject to heavy intervention by the Information Communication Technology and DM areas. The higher education system in India nowadays depends heavily on DM measures. The demand and problem-solving abilities, within the framework of logical argument and accuracy of results, need to be explored through research and development. Efforts are being made by the Government, NGOs and independent bodies to make social problems more easily solvable through DM. From the algorithm and the experimental results we can conclude that data mining techniques are very useful in the development of, and in finding solutions for, higher education in Madhya Pradesh state.

REFERENCES

1. Cave, M., Kogan, M. and Hanney, S. (1990), “The scope and effects of performance measurement in British higher education”, in F. J. R. C. Dochy, M. S. R. Segers and W. H. F. W. Wijnen (Eds.), “Management Information and Performance Indicators in Higher Education”, Van Gorcum and Comp, 48–49.
2. Fielden, J., and Abercromby, K. (2000), “UNESCO Higher Education Indicators Study: Accountability and International Co-operation in the Renewal of Higher Education”, Georgia Professional Standards. UNESCO, Paris.
3. Han, J. and Kamber, M. (2001), “Data Mining: Concepts and Techniques”, Simon Fraser University, Morgan Kaufmann.
4. Johnstone, J.N. (1976), “Indicators of the Performance of Educational Systems”, UNESCO, International Institute for Educational Planning, Paris.
5. Luan, J. (2002a), “Data mining and knowledge management in higher education – potential applications”, In Proceedings of AIR Forum, Toronto, Canada.
6. Luan, J. (2002b), “Data Mining Application in Higher Education”, SPSS Executive Report.
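As an illustration of the precision and recall ratios discussed in Section IV, the following minimal Python sketch shows one common way to compute both ratios for a single information request and to average them over several requests. The function names and the item identifiers are hypothetical and are not taken from the experiments; the sketch assumes that each request is represented by the list of retrieved items and the list of relevant items.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one information request."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant items actually retrieved
    # Guard against empty sets so that both ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_ratios(requests):
    """Average precision and recall over (retrieved, relevant) pairs."""
    if not requests:
        raise ValueError("at least one request is required")
    ratios = [precision_recall(ret, rel) for ret, rel in requests]
    mean_precision = sum(p for p, _ in ratios) / len(ratios)
    mean_recall = sum(r for _, r in ratios) / len(ratios)
    return mean_precision, mean_recall


# Hypothetical requests: (retrieved items, relevant items)
requests = [
    (["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]),
    (["a", "b"], ["a", "b", "c", "d"]),
]
print(precision_recall(*requests[0]))
print(mean_ratios(requests))
```

Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved, so a query method can be compared with another by running both on the same set of requests and comparing the averaged ratios.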
