Article 7
Manuscript received October 22, 2011.
* Corresponding Author
Tripti Arjariya, Department of Computer Science and Engineering, Madhya Pradesh Bhoj (Open) University, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Shiv Kumar, Associate Professor, Department of Information Technology, Technocrats Institute of Technology, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Dr. Rakesh Shrivastava, Professor, Department of Higher Education, Govt. of Madhya Pradesh, Bhopal (M.P.)-462021, India. (E-mail: [email protected]).
Dr. Dinesh Varshney, Professor, Multimedia Regional Center, Madhya Pradesh Bhoj (Open) University, Indore (M.P.)-452001, India. (E-mail: [email protected]).

II. WHAT ARE THE ISSUES IN DATA MINING

DM algorithms embody techniques [3] [7] that have existed for many years, but have only lately been applied as reliable and scalable tools that time and again outperform older classical statistical methods. While DM is still in its infancy, it is becoming widespread. Before DM develops into a conventional, mature and trusted discipline, many pending issues have to be addressed. Some of these issues are discussed below.
Data Mining and It’s Approaches towards Higher Education Solutions
A. Security and social issues: Security is an important issue with any data collection that is shared and/or is intended to be used for strategic decision-making. In addition, when data is collected for customer profiling, user behaviour understanding, correlating personal data with other information, etc., large amounts of sensitive and private information about individuals or companies are gathered and stored. This becomes controversial given the confidential nature of some of this data and the potential for illegal access to the information. DM could disclose new implicit knowledge about individuals or groups that could be against privacy policies, especially if there is potential dissemination of discovered information. Another issue that arises from this concern is the appropriate use of DM.

B. User interface issues: The knowledge discovered by DM tools is useful as long as it is interesting and, above all, understandable by the user. Good data visualization eases the interpretation of DM results and helps users better understand their needs. There are many visualization ideas and proposals for effective graphical presentation of data. However, there is still much research to accomplish in this area. The major issues related to user interfaces and visualization are “screen real-estate”, information rendering, and interaction. Interactivity with the data and DM results is crucial since it provides means for the user to focus and refine the mining tasks, as well as to picture the discovered knowledge [6] from different angles and at different conceptual levels.

C. Mining methodology issues: These issues pertain to the DM approaches applied and their limitations. Topics such as versatility of the mining approaches, diversity of data available, dimensionality of the domain, broad analysis needs (when known), assessment of the knowledge discovered, exploitation of background knowledge and metadata [6], and control and handling of noise in data are all examples that can dictate mining methodology choices. Most algorithms assume data to be noise-free. This is of course a strong assumption. Most datasets contain exceptions, invalid or incomplete information, which may complicate the analysis process and in many cases compromise the accuracy of the results. As a consequence, data pre-processing (data cleaning and transformation) becomes vital and is often the most important phase in the knowledge discovery process. DM techniques should be able to handle noise in data or incomplete information. More than the size of data, the size of the search space is decisive for DM techniques. The search space usually grows exponentially when the number of dimensions increases. This is known as the curse of dimensionality. This “curse” affects the performance of some data mining approaches so badly that it is becoming one of the most urgent issues to solve.

D. Performance issues: [...] not designed for the very large data sets DM is dealing with today. Terabyte sizes are common. This raises the issues of scalability and efficiency of the DM methods when processing considerably large data. Algorithms with exponential and even medium-order polynomial complexity cannot be of practical use for DM. Linear algorithms are usually the norm. Mining a sample instead of the whole dataset reduces the cost, but concerns such as completeness and choice of samples may arise. Other topics in the issue of performance are incremental updating and parallel programming. There is no doubt that parallelism can help solve the size problem if the dataset can be subdivided and the results can be merged later. Incremental updating is important for merging results from parallel mining, or for updating DM results when new data becomes available without having to re-analyze the complete dataset.

E. Data source issues: There are various issues related to the data sources; some are practical, such as the diversity of data types, while others are philosophical, like the data glut problem. We certainly have an excess of data, since we already have more data than we can handle and we are still collecting data at an even higher rate. If the spread of database management systems has helped increase the gathering of information, the advent of DM is certainly encouraging more data harvesting. The current practice is to collect as much data as possible now and process it later. The concern is whether we are collecting the right data in the appropriate amount, whether we know what we want to do with it, and whether we distinguish between what data is important and what data is insignificant. Regarding the practical issues related to data sources, there is the subject of heterogeneous databases and the focus on diverse complex data types. We are storing different types of data in a variety of repositories.

III. PROPOSED ALGORITHM

Algorithm-1:
Input: Data set R, Attribute set Ai
Output: Data set R’
R’ <- R
For I=1 to n do
    Max(Ai) = the deepest node in the attribute set Ai
    If Max(Ai).distance_to_max < Ii
        Newnode = node.root_path_array[Ii - node.distance_to_max]
    Else
        Newnode = Max(Ai)
    Endif
    Replace node with Newnode
Endfor
Remove duplication from R’
End
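One possible reading of Algorithm-1 is attribute generalization over a concept hierarchy (ontology): each attribute value is replaced by an ancestor taken from its path to the root, and duplicate records are then removed from R’. The Python sketch below illustrates that reading only. The hierarchy `PARENT`, the helper names `root_path` and `generalize`, the `levels` parameter and the sample records are all hypothetical, and the `distance_to_max` test of the pseudocode is simplified here to "climb a fixed number of levels, stopping at the root".

```python
# Hypothetical concept hierarchy for a "course" attribute:
# child -> parent, with None marking the root. Not taken from the paper.
PARENT = {
    "B.Tech CSE": "Engineering",
    "B.Tech IT": "Engineering",
    "M.Sc Physics": "Science",
    "Engineering": "Any course",
    "Science": "Any course",
    "Any course": None,
}


def root_path(node, parent):
    """Return the path [node, parent, grandparent, ..., root]."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def generalize(rows, hierarchies, levels):
    """Replace each value by an ancestor, then drop duplicate rows.

    rows        -- list of dicts (the data set R)
    hierarchies -- attribute name -> child/parent mapping
    levels      -- attribute name -> number of levels to climb (assumption)
    """
    seen = set()
    result = []
    for row in rows:
        new_row = {}
        for attr, value in row.items():
            if attr in hierarchies:
                path = root_path(value, hierarchies[attr])
                # Never climb past the root of the hierarchy.
                step = min(levels.get(attr, 0), len(path) - 1)
                new_row[attr] = path[step]
            else:
                new_row[attr] = value
        # "Remove duplication from R'": keep the first copy of each row.
        key = tuple(sorted(new_row.items()))
        if key not in seen:
            seen.add(key)
            result.append(new_row)
    return result


rows = [
    {"course": "B.Tech CSE", "city": "Bhopal"},
    {"course": "B.Tech IT", "city": "Bhopal"},
    {"course": "M.Sc Physics", "city": "Indore"},
]
generalized = generalize(rows, {"course": PARENT}, {"course": 1})
```

Climbing one level merges the two B.Tech records into a single "Engineering" record, which is the effect the duplicate-removal step is meant to have. A faithful implementation would need the exact definitions of `distance_to_max` and `Ii`, which the pseudocode does not give.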
International Journal of Soft Computing and Engineering (IJSCE)
ISSN: 2231-2307, Volume-1, Issue-5, November 2011
R’ <- 0
N = |R|
For I=0 to N-1 do
    r <- ri
    M <- |r|
    For j=0 to M-1 do
        If ri inconsistent with rule rn E then
            Restore the dropped condition aj
        Endif
    End for
    Included in rule r
    If rule r is not logically included in a rule r’ E MRULE then
        MRULE <- r U MRULE
    Endif
End

R: the data set, which is any college web site, because an ontology is used for a specific domain.
R’: the output data set.
Ai: the attribute set.
Max(Ai): the function which finds the deepest node in the attribute set Ai.

IV. RESULTS AND DISCUSSIONS

V. CONCLUSIONS

DM is directly associated with the use of technology for accessing data and delivering results in the desired form. In the Indian context, although computer literacy among users is very low, the demand for its application in different sectors of society grows day by day. In the education sector specifically, it is in great demand for both teaching and learning. The management aspects are highly influenced by the Information and Communication Technology and DM areas. The higher education system in India is nowadays heavily dependent on DM. The demand for DM, its problem-solving abilities within the framework of logical argument, and the accuracy of its results need to be explored through research and development. Efforts are being made by the Government, NGOs and independent bodies to make social problems easier to solve through DM. From the algorithm and the experimental results, we conclude that data mining techniques are very useful in developing and finding solutions for higher education in Madhya Pradesh state.

REFERENCES

1. Cave, M., Kogan, M. and Hanney, S. (1990), “The scope and effects of performance measurement in British higher education”, in F. J. R. C. Dochy, M. S. R. Segers and W. H. F. W. Wijnen (Eds.), “Management Information and Performance Indicators in Higher Education”, Van Gorcum and Comp, 48–49.
2. Fielden, J. and Abercromby, K. (2000), “UNESCO Higher Education Indicators Study: Accountability and International Co-operation in the Renewal of Higher Education”, Georgia Professional Standards. UNESCO, Paris.
3. Han, J. and Kamber, M. (2001), “Data Mining: Concepts and Techniques”, Simon Fraser University, Morgan Kaufmann.
4. Johnstone, J. N. (1976), “Indicators of the Performance of Educational Systems”, UNESCO, International Institute for Educational Planning, Paris.
5. Luan, J. (2002a), “Data mining and knowledge management in higher education – potential applications”, in Proceedings of AIR Forum, Toronto, Canada.
6. Luan, J. (2002b), “Data Mining Application in Higher Education”, SPSS Executive Report.