Predicting the Admission of Students to Master’s Programs Using Machine Learning

Many students now pursue their studies away from their home countries, and the main destination for
these foreign students is the United States of America. Most foreign students in the United States
come from India and China. Over the past decade, the number of Indian students pursuing graduate
studies in the USA has increased rapidly. With the growing number of foreign students studying in the
USA, each applicant faces tough competition to get into their dream university. Because students often
know little about the procedures, requirements and details of USA universities, they seek the help of
educational consulting firms to secure entry into universities that suit their profile, and they may
invest substantial sums in consultation fees. In addition to these consulting firms, a few websites and
blogs guide students through admission procedures. The drawback of these existing resources is that
they are very limited and not truly reliable in terms of accuracy. The purpose of this study is to
develop a system based on machine learning algorithms, which we call the Student Admission
Predictor (SAP). It will help students estimate the chances of their university applications being
accepted. It will also help them identify the universities most relevant to their profile and provide
details of those universities. A simple user interface will be created so that users can access the SAP.

In the expanding world of computation, technologies such as machine learning and AI are bringing new
changes from the field of computer science to every domain of industry. Within this broad scope, we
use machine learning techniques to predict the admission of students to higher studies (master’s
programs) from their overall performance, scores and various other factors, and we critically analyze
the performance and outcome of the predictive model across a variety of machine learning algorithms.

A process that can quickly and intelligently predict the above scenario is much needed in order to
obtain an analytical view of admissions to higher studies (and, more generally, to programs abroad).
Machine learning provides robust algorithms that are capable of predicting this outcome efficiently.

The ability of machine learning to handle well-defined problems under uncertainty has grown with the
wide range of mathematically defined classification algorithms now available. Predicting admission to
a master’s program is an uncertain hypothesis, and the generated results may tend towards true
negatives. Modern machine learning algorithms may offer a way forward for this problem and generate
true positive results on future data.

The prediction of a student’s admission depends on various factors, which are called features in
machine learning terms. The class (either 0 or 1) specifies whether a particular student gets
admission or not, depending on the features (independent variables).

Research Methodology

Machine learning algorithms tend to perform better on a well-defined set of problems, depending on the
size of the dataset provided. For the proof of concept of this project, we therefore use data in CSV
form, obtained internally. Various algorithms can be applied in this project to predict the required
outcome, so that the results generalize and can be analyzed optimally. We also use the LR algorithm as
the naïve base case.


The methods used in this project are generally defined for classification problems, but they are not
limited to that particular use case:
• This is a binary classification problem. The output has only two possibilities: either Yes (1) or
No (0).
• Support Vector Machine (SVM): We apply this algorithm to the given problem. In the SVM algorithm,
each data item is plotted as a point in n-dimensional space (where n is the number of features), with
the value of each feature being the value of a particular coordinate. Classification is then performed
by finding the hyperplane that separates the two classes.

Fig:-a) SVM representation for given data
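The SVM description above can be illustrated with a minimal sketch. It assumes scikit-learn's SVC with a linear kernel and uses synthetic two-feature data as a stand-in for the admissions records; the data, the variable names and the choice of C are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the admissions data (NOT the study's dataset):
# two features per applicant and a 0/1 admission label.
rng = np.random.default_rng(0)
admitted = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
rejected = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X = np.vstack([admitted, rejected])
y = np.array([1] * 50 + [0] * 50)

# A linear kernel yields a separating hyperplane w . x + b = 0.
svm_model = SVC(kernel="linear", C=1.0)
svm_model.fit(X, y)

w = svm_model.coef_[0]        # normal vector of the hyperplane
b = svm_model.intercept_[0]   # offset of the hyperplane
svm_predictions = svm_model.predict(X)
svm_train_accuracy = float((svm_predictions == y).mean())
```

Each applicant is one point in feature space, and the sign of w . x + b decides the predicted class. With n features the same code applies unchanged; only the length of w grows.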

Linear Discriminant Analysis (LDA): It is used as a dimensionality reduction technique in the
pre-processing step for pattern-classification and machine learning applications. In this project we
use it in the same way, as a pre-processor for linear analysis of the feature set.
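As a rough illustration of LDA used as a pre-processing step, the sketch below assumes scikit-learn's LinearDiscriminantAnalysis and synthetic six-feature data. The data and variable names are hypothetical and do not come from the study's records.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Synthetic stand-in: 100 applicants, 6 features, binary admission label.
X_pos = rng.normal(loc=1.0, scale=1.0, size=(50, 6))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(50, 6))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# With two classes, LDA can project onto at most (n_classes - 1) = 1 axis.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
```

For a binary problem the projection collapses the feature set to a single discriminant axis, which a downstream classifier can then consume in place of the original features.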

Define the Hypothesis: The hypotheses are based on the admission criteria for master’s programs in
India, with admissions made through (COAP). Various features are mandatory apart from the score. The
defined set of hypotheses is:

1) GATE Score

3) College Rating
4) Research work (MS program, IITs)
5) Achievements
6) Letter of Recommendation for the research-oriented program
7) SOP

These are the hypotheses defined for processing admissions to master’s programs.
There are various other factors, but they do not affect the underlying project.

All of the hypotheses play an important role and are provided for in the given dataset. We use
our university’s student records (primary data) to specify admissions to higher studies (MS
programs). The dataset consists of 250*7 entries excluding 1 column of the feature set. The dataset
for verification and processing of the project is accompanied by a test set of 80*6 entries,
excluding the prediction


In this project, we use state-of-the-art machine learning algorithms to classify the feature set,
including the dependent variable (entity). Various validations are applied to further increase the
accuracy, depending on the prediction set.


The project is implemented with the main data science libraries (NumPy, pandas and Matplotlib) for
processing data frames and collections, cleaning, and further pre-processing. The complete module is
implemented in Jupyter using Python (3.7). Code snippets and a sample of the implementation are
attached in the file.

1- Load the dataset into the software console (Jupyter).
2- Clean and pre-process the data, including testing of the null hypothesis.
3- Manipulate the data frame to obtain a unique description of the variables in the data.
4- Visualize the entries of the dataset in different charts and plots.
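The steps above can be sketched with pandas as follows. The inline CSV, the column names and the median-fill strategy for missing values are all assumptions made for illustration; the study's actual file and cleaning rules are not reproduced here.

```python
import io
import pandas as pd

# Tiny inline CSV standing in for the real file; column names are hypothetical.
csv_text = """gate_score,cgpa,college_rating,admitted
720,8.9,1,1
650,,2,0
,7.4,3,0
580,7.1,3,0
700,8.5,1,1
"""

# Step 1: load the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: count missing values per column, then fill them with the column median
# (an assumed cleaning rule, chosen only for this sketch).
null_counts = df.isnull().sum()
df_clean = df.fillna(df.median(numeric_only=True))

# Step 3: summarise each variable (count, mean, quartiles and so on).
summary = df_clean.describe()

# Step 4 (plotting) would follow, e.g. df_clean.plot.scatter(x="cgpa", y="admitted").
```

In a Jupyter notebook, evaluating `null_counts` and `summary` in separate cells displays tables comparable to the count, dtype and description figures.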

Fig:-b) Detailed count and dtype of the desired columns

The dataset has a total null count of 110, and its columns are of int and float type; the prediction
class is of type float. The data is distributed into train and test sets accordingly.

Fig: Description of the complete data-frame for prediction

• Unified training approach: the same loss (MSE), with a linear regression model as the base model.
• The data is split into a train set and a test set in an 80:20 ratio; using cross-fold validation,
the model reaches an accuracy above 98% with a validation loss of ±2%.
• No null entries remain in the dataset after the cleaning methods of the Pandas library were applied
to transform it into the desired form.
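The training setup described in these points can be sketched as below, assuming scikit-learn. The synthetic data, the 0.5 decision threshold and the choice of five folds are assumptions; the sketch does not reproduce the accuracy figures reported here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: 250 applicants, 6 features; label follows a linear score.
X = rng.normal(size=(250, 6))
y = (X @ np.array([1.5, 1.0, 0.5, 0.5, 0.3, 0.2]) > 0).astype(float)

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Linear regression minimises squared error; it is the base model here.
model = LinearRegression().fit(X_train, y_train)
test_predictions = model.predict(X_test)
test_mse = mean_squared_error(y_test, test_predictions)

# Turn the continuous output into a 0/1 decision with an assumed 0.5 cut-off.
test_accuracy = float(((test_predictions >= 0.5).astype(float) == y_test).mean())

# 5-fold cross-validation on the training portion (scores are negated MSE).
cv_mse = -cross_val_score(
    LinearRegression(), X_train, y_train, cv=5, scoring="neg_mean_squared_error"
)
```

Because linear regression outputs a continuous value, an accuracy figure only exists once a threshold is chosen. The spread of `cv_mse` across folds gives a rough sense of how stable the validation loss is.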

Fig: c- The subject proposal score vs Admission plot

Fig: d- plot of gate score vs college tier

• The plot indicates that GATE score and CGPA play a major role in deciding admissions for higher
studies in top government colleges.
• Apart from that, GATE score is directly proportional to college tier: tier-1 has better scores than
tier-2, and so on. Some scattered points deviate from this pattern.
• The letter of recommendation also plays a key role in MS admissions involving research-oriented
work. For research work, the LOR is a key factor along with the GATE score.
• The linear regression model performs at a state-of-the-art level and generates a prediction score
of almost 100% after fine-tuning some hyper-parameters and removing the few outliers. The hypothesis
generated for the dependent variable along the feature set class tends to an actual conversion score
of (t-test > 0.5) on the data.
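One simple way to quantify how strongly a feature relates to admission, as a complement to the plots, is the Pearson correlation. The sketch below uses pandas on synthetic data; the column names and the way the label is generated are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 200
# Synthetic stand-in data; column names and relationships are assumptions.
gate_score = rng.normal(600, 80, size=n)
cgpa = rng.normal(7.5, 0.8, size=n)
noise = rng.normal(0, 0.5, size=n)

# Label generated so that both features raise the chance of admission.
z = (gate_score - 600) / 80 + (cgpa - 7.5) / 0.8 + noise
frame = pd.DataFrame({
    "gate_score": gate_score,
    "cgpa": cgpa,
    "admitted": (z > 0).astype(int),
})

# Pearson correlation of each feature with the admission label.
correlations = frame.corr()["admitted"].drop("admitted")
```

A larger positive value in `correlations` suggests a feature that moves together with admission, which is the kind of relationship the plots show visually.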

Fig: e- CGPA vs Admission plot representation


The project is evaluated on state-of-the-art machine learning algorithms. Using the linear regression
model, the score exceeds 98%, with an approximate validation loss of ±2%. The LR model achieves the
best prediction score; fine-tuning some parameters and using an ensemble model gives the best
predicted score. There is no difference between the actual and predicted results. With the SVM model
the score is apparently not much affected, with a classification accuracy of 98%. Both models perform
at their best, with the best accuracy rate. Fine-tuning some parameters gives the highest model
accuracy. The loss function and confusion matrix are not presented for the higher rates of tuning and
iteration (train_test split).

Fig: Actual score vs predicted score for admissions in MS program (LR)
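For reference, accuracy and a confusion matrix for a binary admission classifier can be computed as in the following sketch, which assumes scikit-learn's metrics module. The eight labels are hypothetical and are not results from this study.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for eight applicants (1 = admitted, 0 = not admitted).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes, in label order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
```

Here six of the eight predictions match, so the accuracy is 0.75, and the matrix separates the two kinds of error: one false positive and one false negative.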


The aim is to examine the admission prediction rate of a student in higher studies using the
complexity analysis of various machine learning algorithms (SVM, LDA) and their performance on the
selected dataset (various student factors, including marks in competitive exams, class scores,
projects and many more). Given this aim, the problem is essentially a classification problem. Using
machine learning, we predict the result set and analyze the outcome. Machine learning techniques help
predict results so that students gain probabilistic insight into their admissions and an idea of the
major features that are the deciding factors for admission.

The linear regression model attained the best accuracy for predicting scores based on various features
for admission to higher programs. ML-based approaches work best when the data is within its desired
limits. ML algorithms tend to be better optimized for real problems with careful fine-tuning of
parameters and labelling of datasets.


Machine learning algorithms work best with well-defined datasets. In predicting admissions to higher
studies, state-of-the-art machine learning algorithms work better and predict the result with better
accuracy. The built-in support of Scikit-learn lets models predict, analyze and define the result
scores more accurately and in a scalable way.

• The LR model performs best and predicts with an accuracy score > 98%.
• The state-of-the-art approach works best with the desired datasets.
• Data cleaning and pre-processing are also a deciding factor in model tuning and higher prediction
accuracy.
• The machine learning model is defined on the given sets, depending on the feature set and the class
to predict.


