
IBIMA Publishing

Journal of e-Learning & Higher Education


http://www.ibimapublishing.com/journals/JELHE/jelhe.html
Vol. 2012 (2012), Article ID 622480, 8 pages
DOI: 10.5171/2012.622480

Student Profiling on Academic Performance Using Cluster Analysis
Osman N. Darcan and Bertan Y. Badur
MIS Department, Boğaziçi University, Istanbul, Turkey
______________________________________________________________________________________________________________

This study was carried out in a Management Information Systems (MIS) department that accepts
students from general and vocational high schools with widely varying educational
backgrounds. As an emerging interdisciplinary field, MIS education demands both technical and
managerial skills from its students. However, students with different backgrounds have to
pursue the same diversified set of courses. The aim of this study is to investigate student
segments and profiles based on the various dimensions of academic ability they possess, by
performing cluster analysis. The data set consists of the students’ official grades for the
required courses. First, the dimensionality of the course grades is reduced to a few independent
abilities by performing factor analysis. The summed scales representing the independent factors
are then used in the cluster analysis to obtain student segments. Finally, the variation of
student background, measured by high school type, is profiled for each segment. The students
from general high schools have been more successful in MIS education than students from
vocational schools, where only basic knowledge of management or computer skills is
offered. The results of this analysis are also used in shaping various macro- and micro-level
strategies in our MIS department.

Keywords: Educational data mining, factor analysis, cluster analysis.


______________________________________________________________________________________________________________

Introduction

Management Information Systems (MIS) combines the disciplines of management and computer science to manage information (Laudon and Laudon, 2009). As an emerging interdisciplinary field, MIS demands both technical and managerial skills from its graduates. The curriculum of the MIS department at Boğaziçi University is designed to deliver a balanced set of management and computer courses in order to prepare students for developing and maintaining business information systems. Courses offered in the MIS curriculum cover a wide range of topics that include management and organization, economics, marketing, accounting and finance, computer programming, system design concepts, database management, data communication and operations research. In the first two years, students take basic management and computing courses. Specialized courses are offered in the last two years to provide the student with a strong foundation in information management.

In Turkey, students have to take a nationwide entrance exam to study at a university. The main objective of this exam is to measure the candidate’s basic knowledge in social and technical high school courses. Based on these measurements, composite scores are calculated for the selection of candidates. As a direct consequence, students from general high schools and vocational high schools (mainly from computer and management departments) with widely varying backgrounds are admitted to the MIS department. Students with different backgrounds have to pursue the same diversified set of courses, such as programming, managerial and quantitative subjects as well as analysis and design.

Copyright © 2012 Osman N. Darcan and Bertan Y. Badur. This is an open access article distributed under
the Creative Commons Attribution License unported 3.0, which permits unrestricted use, distribution, and
reproduction in any medium, provided that original work is properly cited. Contact author: Osman N.
Darcan E-mail: [email protected]

The aim of this study is to investigate the profiles of students in the MIS department by performing cluster analysis on various dimensions of academic ability, based on their official grade data for the required courses. Characteristics of students in each cluster are examined to gain insight into how attributes such as educational background and high school type are distributed over each segment. In particular, how the distribution of high school types varies among segments is of interest for shaping the strategic decisions of our department.

The outline of this paper is as follows. In Section 2, basic data mining functionalities are introduced and related works in educational data mining are summarized. The methodology of this study is presented in Section 3, which is followed by the description of data in Section 4. Section 5 discusses the results in detail. Finally, the last section summarizes our work and presents how the results of the analysis are used in the department in question.

Educational Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining functionalities are classified into two broad categories, descriptive and predictive (Han et al., 2011). Descriptive functionalities help understand characteristics of data in databases. Description focuses on finding human-interpretable patterns describing the data. Data visualization, association analysis and clustering are examples of descriptive functionalities. Data visualization aims to communicate data clearly and effectively through graphical representation. Association analysis is the discovery of rules in transactional databases. It is widely used for market basket analysis to uncover the items that are purchased together. Clustering analysis identifies clusters embedded in the data, where a cluster is a collection of data objects that are similar to one another. On the other hand, predictive functionalities make predictions based on inferences. Predictive functionalities are generally based on models, which are simplified abstract views of the complex reality. Quantitative models such as classification and regression are examples of predictive functionalities. Prediction involves using some known variables to predict unknown or future values of other variables of interest. Classification is the process of finding a set of models that describe and distinguish data using a training dataset. The derived model is used to predict class labels that are unknown. Regression analysis is similar to classification, but it is used to predict a continuous target variable.

This study can be categorized as educational data mining, an emerging discipline concerned with developing methods for exploring the unique types of data that come from the educational context. A survey of the application of data mining techniques to various educational systems is given in Romero and Ventura (2007). These techniques include data visualization, clustering, classification and association analysis applied to educational systems such as traditional education, distance education and learning content management systems.

In another work on educational data mining, Romero et al. (2008) discuss the application of various data mining techniques to data collected from the activities of students who use the Moodle e-learning course management system. Typical applications of educational data mining are in the following areas: predicting academic success (Ma et al., 2000; Barker et al., 2004; Herzog, 2006), predicting course outcomes (Hämäläinen et al., 2006; Bresfelean et al., 2008), cluster analysis of e-learning material (Drigas and Vrettaros, 2004; Tane et al., 2004; Hammouda and Kamel, 2005) and learning log analysis (Hadwin et al., 2005; Nesbit and Hadwin, 2006).

To the best of our knowledge, a closely related study is Dzemyda (2005), where a method for the analysis of curricula via the statistical analysis of examination results is proposed.

The method is grounded on the visualization of a set of 25 academic subjects characterized by their correlation matrix, which is obtained from multidimensional examination results data. The correlation matrix has been analyzed to test the relation between the aptitudes of students and the marks earned in the related subjects.

Methodology

This study aims at clustering undergraduate students in the MIS department of Boğaziçi University based on course grade data. After forming student clusters, a profile analysis was carried out to examine the variation of other student characteristics across student segments. These characteristics are qualitative variables that are not included in the cluster analysis (Sharma, 1995).

As can be seen in Table 1, there are 31 required courses in our current MIS undergraduate curriculum, so a dimension reduction strategy is needed to obtain independent factors. Courses requiring similar abilities from students are expected to fall under the same factors. One possible approach is to identify these dimensions subjectively using domain knowledge; this can be accomplished by assigning a weight to each of these ability dimensions for each course. These weights can be obtained from instructors or students by designing appropriate questionnaires and combining their opinions accordingly.
segments. These characteristics are

Table 1: MIS Department Course List

Course Code Course Name


MIS 111 Economics I
MIS 112 Economics II
MIS 113 Management and Organization
MIS 114 Business Law
MIS 116 Principles of Marketing
MIS 125 Intr. to Info. Systems and Technology
MIS 131 Introduction to Algorithms
MIS 134 Introduction to Database
MIS 143 Business Mathematics I
MIS 144 Business Mathematics II
MIS 211 Financial Accounting
MIS 212 Managerial Accounting
MIS 213 Quantitative Techniques
MIS 224 Research Methodology
MIS 231 Introduction to Programming
MIS 236 Intermediate Programming
MIS 251 Computer Hardware and Sys. Software
MIS 252 Business Data Communications
MIS 313 Quantitative Analysis for Decision Making
MIS 316 Finance
MIS 317 Interpersonal Communication
MIS 321 Systems Analysis and Design
MIS 326 Object Oriented Modeling
MIS 335 Database Systems
MIS 336 Business Program Development
MIS 374 Internet Info. Services
MIS 415 Human Factors in Computing
MIS 417 Legal and Ethical Issues in Computing
MIS 424 Information Systems Management
MIS 426 Enterprise Information Systems
MIS 463 Decision Support Systems for Business

The second approach, as followed in this study, is factor analysis, a multivariate statistical method whose primary purpose is to define the structure of data. It can be utilized to examine the underlying patterns for a large number of variables and to determine whether these patterns can be condensed or summarized in a smaller set of factors or components. The correlations between the original variables and the factors are called factor loadings. Once the initial solution (set of independent factors and their loadings) is obtained, a rotation method can be applied to facilitate interpretation of the solution. The rotation method is expected to alter the decomposition of variance explained by different factors. Factor analysis can be carried out with different techniques such as principal component factoring, principal axis factoring and maximum likelihood (Basilevsky, 1994; Hair et al., 2009; Sharma, 1995).

In this study, factor analysis using principal component factoring is applied to obtain the underlying factors representing the ability dimensions of student grades. The Varimax rotation method is used to obtain the rotated factor loadings.

For each factor, summed scales are computed by taking the arithmetic averages of highly loading courses on that factor (Hair et al., 2009). Since all variables are numerical with ratio scale, cluster analysis is performed by the k-means algorithm (Han et al., 2011; Mirkin, 2005). After forming the clusters, qualitative characteristics of students in each segment are examined by cross tabulations.

Only the grades of the required courses offered by the department are used in this study. The elective course grades are omitted due to the heterogeneous nature of these grades. In addition, considering the fact that each instructor may have different grading policies and even the same instructor’s grading patterns may change over time, for each year the course grades are standardized by mapping the course average grade to 2.0 and the standard deviation to 1.0. That is, a grade g in a course whose grades in that year have mean m and standard deviation s becomes 2.0 + (g - m)/s. Hence, for each specific year, the success of a particular student is measured in units of standard deviations above or below the average course grade of that year. Course averages are mapped to 2.0 for the sake of easy interpretation instead of using the well-known z scores.

Description of Data

The data set is obtained from the Registration Office of the University. The data set contains records that include information about the student number, course code, semester, letter grade and status, as well as records that contain the student’s personal information such as gender, high school name and type. The sample period ranges from the fall 2000 semester to the fall 2007 semester. There are 467 MIS students in the data set. The letters ranging from AA (excellent) to F (fail) are mapped to a ratio scale numerical variable with scores ranging from 4 (for AA) to 0 (for F). In the case of a student taking the same course more than once, the average of all earned grades is used as the final score. The data is converted into tabular format where rows represent students and columns represent courses, hence each cell contains the score of a particular student for a specific course. In this format, the data contains many missing values since new students do not have any junior or senior course grades. No missing value imputation is applied: factor analysis is based on the correlations among variables, and replacing missing scores with the means may introduce bias into the estimates.

Results

Both factor and cluster analysis were performed using SPSS version 16.0 (SPSS, 2009). The results of the factor analysis are shown in Table 2.

The Kaiser-Meyer-Olkin (Kaiser, 1970) measure of sampling adequacy gives a value of 0.846, and Bartlett’s test of sphericity has a chi-square value of 1467 with 435 degrees of freedom, for which the p value is reported as 0.000 (that is, below 0.001). Both of these statistics indicate that the data set is suitable for factor analysis. There are four factors whose eigenvalues

exceed unity and that together explain 63% of the total variance in the data set. Considering the description of the required courses in the MIS department given in Table 1, the rotated factor (component) loading matrix presented in Table 2 reveals that factor 1 consists of programming courses. Factor 2 basically contains quantitative courses such as mathematics and statistics, as well as the two introductory economics courses. Factor 3 collects the courses that require system thinking and design ability. Finally, factor 4 includes managerial courses such as marketing and organizational behavior. The reliability of these four factors is examined by computing Cronbach’s alpha for each factor individually. The Cronbach’s alpha values shown in Table 3 indicate that factors 1 to 4 are all reliable.

Results of the cluster analysis are presented in Table 4. The k-means algorithm was run with different numbers of clusters. In all these experiments, similar clustering patterns were observed. The number of clusters was chosen as 6.

Table 2: Results of Factor Analysis: Rotated Component Matrix

Component
1 2 3 4
MIS 231 .735 .241 -.037 .226
MIS 374 .703 .142 .156 .075
MIS 236 .693 .347 .180 .103
MIS 251 .691 .328 .332 .226
MIS 134 .654 .142 .349 .281
MIS 131 .630 .307 -.002 .366
MIS 335 .586 .332 .309 -.062
MIS 252 .575 .252 .417 .075
MIS 125 .574 -.015 .262 .457
MIS 212 .501 .238 .172 .135
MIS 336 .493 .323 .420 -.072
MIS 316 .464 .416 .409 .119
MIS 144 .249 .818 .122 .109
MIS 143 .267 .751 .120 .234
MIS 112 .378 .656 .133 .284
MIS 313 .498 .594 .348 -.026
MIS 111 .416 .566 .056 .448
MIS 213 .447 .537 .366 .261
MIS 417 .088 -.103 .747 .182
MIS 463 .179 .095 .647 .230
MIS 426 .146 .083 .646 .020
MIS 224 .236 .450 .558 .285
MIS 317 .227 .354 .553 .318
MIS 424 .098 .344 .503 .033
MIS 321 .469 .288 .498 -.033
MIS 113 .100 .248 .213 .751
MIS 116 .303 .364 .435 .508
MIS 211 .408 .209 .237 .475
MIS 114 -.011 .023 .031 .281
MIS 326 .432 .369 .200 -.148
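
As a rough illustration of how a summed scale can be derived from loadings such as those in Table 2, the sketch below selects the courses that load highly on one component and averages a student's standardized grades over them. The 0.5 cutoff is an assumption made for illustration (the paper does not state its threshold), the student's grades are hypothetical, and averaging only over the courses already taken is likewise an assumed way of coping with missing grades.

```python
# Rotated loadings for a few courses, copied from Table 2 (components 1-4).
loadings = {
    "MIS 144": (0.249, 0.818, 0.122, 0.109),
    "MIS 143": (0.267, 0.751, 0.120, 0.234),
    "MIS 112": (0.378, 0.656, 0.133, 0.284),
    "MIS 231": (0.735, 0.241, -0.037, 0.226),
    "MIS 113": (0.100, 0.248, 0.213, 0.751),
}
CUTOFF = 0.5  # assumed threshold for "highly loading"; not given in the paper

def courses_on_factor(factor_index):
    """Courses whose loading on the given component reaches the cutoff."""
    return [c for c, row in loadings.items() if row[factor_index] >= CUTOFF]

def summed_scale(grades, courses):
    """Mean of the available standardized grades; None if none were taken."""
    taken = [grades[c] for c in courses if c in grades]
    return sum(taken) / len(taken) if taken else None

quantitative = courses_on_factor(1)  # component 2
# Hypothetical student who has not yet taken MIS 112.
student = {"MIS 144": 2.5, "MIS 143": 3.0, "MIS 231": 1.5}
print(quantitative, summed_scale(student, quantitative))
```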

Table 3: Results of Reliability Analysis

          Cronbach’s Alpha  Courses
Factor 1  0.937             MIS125, MIS131, MIS134, MIS212, MIS213, MIS231, MIS236, MIS251, MIS252, MIS313, MIS316, MIS321, MIS335, MIS326, MIS336, MIS374
Factor 2  0.914             MIS111, MIS112, MIS143, MIS144, MIS213, MIS224, MIS313, MIS316
Factor 3  0.856             MIS224, MIS317, MIS321, MIS424, MIS426, MIS463, MIS417
Factor 4  0.819             MIS111, MIS113, MIS116, MIS125, MIS211
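
For readers unfamiliar with the reliability measure, a minimal sketch of Cronbach's alpha follows, using the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The grades are hypothetical and complete; the paper does not describe how SPSS treated missing grades in this step, so no missing-value handling is shown.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per course, all over the same students."""
    k = len(items)
    # Total score of each student across the k courses.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical standardized grades of five students in three courses.
grades = [
    [2.0, 3.0, 1.0, 2.5, 3.5],
    [2.2, 2.8, 1.1, 2.4, 3.6],
    [1.9, 3.1, 0.8, 2.7, 3.4],
]
alpha = cronbach_alpha(grades)
print(round(alpha, 3))
```

Values close to 1 mean that the courses grouped under a factor rank students in nearly the same way, which is the sense in which the factors in Table 3 are called reliable.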

Table 4: Final Cluster Centers

Cluster
1 2 3 4 5 6
F1 (Programming) 1.28880 .18754 2.88221 1.93007 2.38013 1.38250

F2 (Quantitative) 1.29870 .99043 3.13647 1.84014 2.42301 1.44350

F3 (System) 1.31171 .55253 2.86094 1.83856 2.36766 .95564

F4 (Managerial) 1.13553 -.22458 2.77570 1.99219 2.42438 1.89805

Number of Students 45 7 78 76 93 40
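
Since grades were standardized to a mean of 2.0 and a standard deviation of 1.0, the centers in Table 4 can be read as deviations from 2.0. The short sketch below does that arithmetic on the published centers; ranking clusters by the average of their four centers is a convenience chosen here, not a step described in the paper.

```python
MEAN = 2.0  # course averages were mapped to 2.0, with standard deviation 1.0

# Final cluster centers from Table 4: (F1, F2, F3, F4) per cluster.
centers = {
    1: (1.28880, 1.29870, 1.31171, 1.13553),
    2: (0.18754, 0.99043, 0.55253, -0.22458),
    3: (2.88221, 3.13647, 2.86094, 2.77570),
    4: (1.93007, 1.84014, 1.83856, 1.99219),
    5: (2.38013, 2.42301, 2.36766, 2.42438),
    6: (1.38250, 1.44350, 0.95564, 1.89805),
}

# Express each center as standard deviations above (+) or below (-) the mean.
deviations = {c: tuple(round(v - MEAN, 2) for v in vals) for c, vals in centers.items()}

# Order clusters from strongest to weakest by their average center.
ranking = sorted(centers, key=lambda c: sum(centers[c]) / 4, reverse=True)

# Cluster 6 relative to cluster 1 on the managerial (F4) and system (F3) scales.
f4_gap = centers[6][3] - centers[1][3]
f3_gap = centers[6][2] - centers[1][2]
print(ranking, round(f4_gap, 2), round(f3_gap, 2))
```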

Examination of the cluster centers in Table 4 reveals the following. Cluster 3 represents the most successful students. In all four dimensions, their grades are approximately one standard deviation above the mean. The students in Cluster 5 represent the second most successful group, whose grades are in general 0.4 standard deviation above the average. Cluster 4 is characterized by the average students. Unsuccessful students are grouped in Clusters 1 and 6, whose grades are below the average in all abilities. Students in these two clusters have similar programming and quantitative abilities (F1, F2), but they are differentiated in the system and managerial dimensions (F3, F4). Compared to Cluster 1, the managerial abilities of students in Cluster 6 are higher by 0.7 standard deviation, whereas their system thinking abilities are lower by 0.4 standard deviation. There are only 7 students in Cluster 2, which can be treated as outliers. These students have performed very poorly in all abilities.

Table 5 presents cross tabulations of the clusters and high school types. By examining the students’ personal information records, three main categories of high school types can be identified: vocational commerce (School Type 1), vocational computer (School Type 2) and general high schools (School Type 3). The chi-square statistic with 8 degrees of freedom has a p-value of 0.000. The null hypothesis of independence of school type and student clusters can be rejected at the 1% significance level. 68% of the general high school students are successful in MIS education (fall in Clusters 3 and 5). On the other hand, for students from computer and commerce high schools, this percentage is approximately 48.6 and 35.5

respectively. 19.8% of the students from commerce high schools fall in Cluster 6, which is approximately two times the share of students from other types of schools (9.7% and 8.1%). Approximately one third of computer and commerce high school students are average students (Cluster 4).

Table 5: Cross Tabulation of School Types and Student Clusters

                                 Student Segments
School Type                      1        3        4        5        6        Total
1       Count                    12       13       29       19       18       91
        % within school          13.2%    14.3%    31.9%    20.9%    19.8%    100.0%
        % within cluster         30.8%    17.1%    38.7%    21.1%    47.4%    28.6%
2       Count                    12       18       31       32       10       103
        % within school          11.7%    17.5%    30.1%    31.1%    9.7%     100.0%
        % within cluster         30.8%    23.7%    41.3%    35.6%    26.3%    32.4%
3       Count                    15       45       15       39       10       124
        % within school          12.1%    36.3%    12.1%    31.5%    8.1%     100.0%
        % within cluster         38.5%    59.2%    20.0%    43.3%    26.3%    39.0%
Total   Count                    39       76       75       90       38       318
        % within school          12.3%    23.9%    23.6%    28.3%    11.9%    100.0%
        % within cluster         100.0%   100.0%   100.0%   100.0%   100.0%   100.0%
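
The independence test on Table 5 can be retraced from the published counts. The sketch below recomputes the Pearson chi-square statistic and the "% within school" rows in plain Python; the statistic is recomputed here rather than quoted from the paper, and the comparison uses the tabulated critical value for 8 degrees of freedom at the 1% level instead of an exact p-value.

```python
# Observed counts from Table 5: rows are school types 1-3,
# columns are clusters 1, 3, 4, 5 and 6.
observed = [
    [12, 13, 29, 19, 18],
    [12, 18, 31, 32, 10],
    [15, 45, 15, 39, 10],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected counts assume school type and cluster are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_1PCT_DF8 = 20.09  # tabulated chi-square critical value, alpha = 0.01, df = 8

# Row percentages, i.e. the "% within school" lines of Table 5.
pct_within_school = [
    [round(100 * x / t, 1) for x in row] for row, t in zip(observed, row_totals)
]
print(dof, round(chi2, 2), chi2 > CRITICAL_1PCT_DF8)
```

A statistic above the critical value is consistent with the authors' rejection of independence between school type and cluster membership.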

Conclusions

In this study, we have explored different student segments by performing cluster analysis on various dimensions of academic ability for the MIS department of Boğaziçi University. Based on these segments, the profiles of students, including categorical variables such as educational background and high school type, are determined. These profiles are used in two ways: (1) to investigate how high school type, which varies among segments, affects education; and (2) to distribute students in various elective courses and projects as well as to revise the educational strategies of the department related to the curriculum.

Since students have to take a nationwide entrance exam to enter a university, the department has no control over the selection process of the undergraduate students. Therefore, the evaluation as defined in this paper cannot be applied to the selection of students. However, the results of this study can be used in the following areas to improve the quality of the MIS education:

• The programming courses are offered in the first two years of the MIS curriculum. The assignment of the students to different sections of the programming courses as well as the curriculum design for these sections can be carried out by considering students’ backgrounds.

• The projects assigned to students in courses in the last two years of the program require different skills (programming, managerial, quantitative, or system) of the students; hence, the composition of project groups can be determined based on the results of this study.

• The results can be used to offer different types of elective courses according to the background of the current students in a particular semester, as well as to design elective tracks.

• The academic advisors of the students can consider the findings to guide students in selecting appropriate complementary or departmental elective courses.

References

Barker, K., Trafalis, T. & Rhoads, T. R. (2004). "Learning from Student Data," Proceedings of the 2004 IEEE Systems and Information Engineering Design Symposium, University of Virginia, Charlottesville, 79–86.

Basilevsky, A. (1994). "Statistical Factor Analysis and Related Methods: Theory and Applications," Wiley-Interscience, New York.

Bresfelean, V. P., Bresfelean, M., Ghisoiu, N. & Comes, C. A. (2008). "Determining Students’ Academic Failure Profile Founded on Data Mining Methods," Proceedings of the 30th International Conference on Information Technology Interfaces (ITI 2008), Dubrovnik, Croatia, 317–322.

Drigas, A. & Vrettaros, J. (2004). "An Intelligent Tool for Building e-Learning Contend-Material Using Natural Language in Digital Libraries," WSEAS Trans. Inf. Sci. Appl., 5(1), 1197–1205.

Dzemyda, G. (2005). "Multidimensional Data Visualization in the Statistical Analysis of Curricula," Computational Statistics & Data Analysis, 49(1), 265–281.

Hadwin, A. F., Winne, P. H. & Nesbit, J. C. (2005). "Annual Review: Roles for Software Technologies in Advancing Research and Theory in Educational Psychology," Br. J. Educ. Psychol., 75, 1–24.

Hair, J. F., Anderson, R. E., Tatham, R. L. & Black, W. C. (2009). 'Multivariate Data Analysis,' Prentice Hall, New Jersey.

Hämäläinen, W., Laine, T. H. & Sutinen, E. (2006). "Data Mining in Personalizing Distance Education Courses," Data Mining in e-Learning, Romero, C. and Ventura, S. (eds), WitPress, Southampton, U.K., 157–171.

Hammouda, K. & Kamel, M. (2005). 'Data Mining in E-learning, E-Learning Networked Environments and Architectures: A Knowledge Processing Perspective,' Pierre, S. (ed), Springer-Verlag, Berlin, Germany.

Han, J., Kamber, M. & Pei, J. (2011). 'Data Mining: Concepts and Techniques,' Morgan Kaufmann, San Francisco, CA.

Herzog, S. (2006). "Estimating Student Retention and Degree-Completion Time: Decision Trees and Neural Networks Vis-a-Vis Regression," New Directions for Institutional Research, 131, 17–33.

Kaiser, H. F. (1970). "A Second Generation Little Jiffy," Psychometrika, 35(4), 401–415.

Laudon, K. C. & Laudon, J. P. (2009). 'Management Information Systems,' Prentice Hall, New Jersey.

Ma, Y., Liu, B., Wong, C. K., Yu, P. S. & Lee, S. M. (2000). "Targeting the Right Students Using Data Mining," Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), ACM Press, New York, 457–464.

Mirkin, B. (2005). "Clustering for Data Mining: A Data Recovery Approach," Chapman & Hall/CRC, Boca Raton, Florida.

Nesbit, J. C. & Hadwin, A. F. (2006). ‘Methodological Issues in Educational Psychology,’ Handbook of Educational Psychology, Alexander, P. A. and Winne, P. H. (eds), Mahwah, NJ: Erlbaum, 825–847.

Romero, C. & Ventura, S. (2007). "Educational Data Mining: A Survey from 1995 to 2005," Expert Systems with Applications, 33(1), 135–146.

Romero, C., Ventura, S. & Garcia, E. (2008). "Data Mining in Course Management Systems: Moodle Case Study and Tutorial," Computers & Education, 51(1), 368–384.

Sharma, S. (1995). "Applied Multivariate Techniques," J. Wiley, New York.

SPSS Inc. (2009). Clementine 12.0 User Manual.

Tane, J., Schmitz, C. & Stumme, G. (2004). "Semantic Resource Management for the Web: An E-Learning Application," Proceedings 13th World Wide Web Conference, WWW2004, Fieldman, S. and Uretsky, M. (eds.), ACM Press, New York, 1–10.
