Human Annotator For Imbalanced Dossier
TABLE OF CONTENTS
3.2 EXISTING SYSTEM 14
3.2.1 Disadvantage 14
3.3 PROPOSED SYSTEM 15
3.3.1 Advantage 15
3.4 MODULES OF THE PROJECT 16
3.4.1 Authentication and Authorization 16
3.4.2 Material Upload 16
3.4.3 Active Learning 17
3.4.4 Active Learning with Extreme Learning Machine 17
3.4.5 Learning Material 17
3.5 FEASIBILITY REPORT 17
3.5.1 Technical Feasibility 18
3.5.2 Operational Feasibility 19
3.5.3 Economic Feasibility 19
3.6 SOFTWARE REQUIREMENT SPECIFICATION 19
3.6.1 Developer's Responsibilities Overview 20
3.6.2 Functional Requirements 20
3.6.3 Non-Functional Requirements 20
3.6.4 Performance Requirements 21
3.7 THE .NET FRAMEWORK ARCHITECTURE 21
3.7.1 Common Language Runtime Engine 22
3.7.2 Language Independence 23
3.7.3 Framework Class Library 23
3.7.4 Simplified Deployment 23
3.7.5 Security 23
3.7.6 Portability 23
3.7.7 Common Language Specification 24
3.8 SQL SERVER 2014 24
3.9 SYSTEM DESIGN 25
3.10 SYSTEM TESTING AND IMPLEMENTATION 26
3.10.1 Strategic Approach to Software Testing 26
3.11 SYSTEM SECURITY 27
3.11.1 Security in Software 27
3.11.1.1 Client Side Validation 28
3.11.1.2 Server Side Validation 28
4 RESULTS AND DISCUSSIONS 29
4.1 NORMALIZATION 29
4.2 E-R DIAGRAMS 32
4.3 DATA FLOW DIAGRAM 37
4.4 ALGORITHM 38
5 CONCLUSION AND FUTURE WORK 41
REFERENCES 42
APPENDIX 43
A. SAMPLE CODE 43
B. SCREENSHOTS 53
C. PUBLICATION WITH PLAGIARISM REPORT 59
LIST OF FIGURES

LIST OF TABLES
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSION
ANSI American National Standards Institute
AOW-ELM Active Online Weighted-Extreme Learning Machine
ASP Active Server Pages
ASR Automatic Speech Recognition
BCL Base Class Library
BLOBs Binary Large Objects
CBIR Content Based Image Retrieval
CLR Common Language Runtime
ELM Extreme Learning Machine
FCL Framework Class Library
MLP Multilayer Perceptron
SQL Structured Query Language
SRS Software Requirement Specification
SVM Support Vector Machine
WCF Windows Communication Foundation
CHAPTER 1
INTRODUCTION
collect the imbalanced data, he will segregate labeled and unlabeled data to provide complete learning material.
considers the unlabeled instance that has the most predictive divergence among multiple diverse baseline classifiers to be more significant. In addition, active learning models can also be divided into different categories according to the kind of classifier that has been adopted.
Some popular classifiers, including naive Bayes, k-nearest neighbors, decision tree, multilayer perceptron (MLP), logistic regression, support vector machine (SVM), and extreme learning machine (ELM), have all been adapted to satisfy the requirements of active learning. In the past decade, active learning has also been deployed in a variety of real-world applications, such as video annotation, image retrieval, text classification, remote sensing, image annotation, speech recognition, network intrusion detection, and bioinformatics.
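The significance measures referred to above can be made concrete with a short sketch. The following Python fragment is illustrative only and is not taken from the project's code: the helper names are hypothetical, and it shows a margin-based uncertainty score for a single probabilistic classifier together with a vote-entropy divergence score for a committee of classifiers, either of which can be used to rank the unlabeled pool and pick the next batch of instances to annotate.

# Illustrative sketch (not the project's code): two common significance
# measures for pool-based active learning -- single-classifier uncertainty
# and predictive divergence (vote entropy) within a committee.
import numpy as np

def uncertainty_scores(probs):
    """Margin-based uncertainty: small gap between the two best classes."""
    part = np.sort(probs, axis=1)                 # probs: (n_samples, n_classes)
    return 1.0 - (part[:, -1] - part[:, -2])      # higher = more uncertain

def vote_entropy(committee_labels, n_classes):
    """Divergence among committee members' predicted labels per instance."""
    scores = np.zeros(committee_labels.shape[1])  # committee_labels: (n_members, n_samples)
    for c in range(n_classes):
        frac = (committee_labels == c).mean(axis=0)
        nz = frac > 0
        scores[nz] -= frac[nz] * np.log(frac[nz])
    return scores                                 # higher = more disagreement

def query_batch(scores, batch_size):
    """Indices of the most significant unlabeled instances to annotate next."""
    return np.argsort(scores)[-batch_size:]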
The proposed algorithm is named active online weighted ELM (AOW-ELM), and it is designed to be applied in the pool-based batch-mode active learning scenario with an uncertainty significance measure and an ELM classifier. In AOW-ELM, we first take advantage of the idea of cost-sensitive learning and select the weighted ELM (WELM) as the base learner to address the class imbalance problem that exists throughout the active learning procedure. Then, we adopt the AL-ELM algorithm presented in our previous paper to construct an active learning framework. Next, we derive an efficient online learning mode of WELM in theory and design an effective weight update rule. Finally, benefiting from the idea of the margin exhaustion criterion, we present a more flexible and effective early stopping criterion.

Moreover, we briefly discuss why active learning can be disturbed by a skewed instance distribution, and further investigate the influence of three main distribution factors: the class imbalance ratio, class overlapping, and small disjunction. Specifically, we suggest adopting clustering techniques to select the initially labeled seed set in advance, thereby avoiding the missed cluster effect and the cold start phenomenon as far as possible. Experiments are conducted on 32 binary-class imbalanced data sets, and the results demonstrate that the proposed algorithmic framework is generally more effective and efficient than several state-of-the-art active learning algorithms that were specifically designed for the class imbalance scenario.
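As one deliberately simplified illustration of two of the ideas above, the Python sketch below trains a weighted ELM whose instance weights are the inverse of the class sizes (a common WELM weighting scheme) and selects the initially labeled seed set by clustering the unlabeled pool with scikit-learn's KMeans and taking one representative per cluster. The function names, the choice of k-means, and all parameter values are assumptions made for illustration; this is not the AOW-ELM implementation, and the online weight update rule and the margin-exhaustion early stopping criterion are not shown.

# Minimal sketch of weighted ELM training and clustering-based seed selection,
# under illustrative assumptions; not the AOW-ELM implementation.
import numpy as np
from sklearn.cluster import KMeans

def train_welm(X, y, n_hidden=100, C=1.0, seed=0):
    """Batch weighted ELM for binary labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W_in = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                     # random hidden biases
    H = np.tanh(X @ W_in + b)                             # hidden-layer outputs
    T = np.where(y == 1, 1.0, -1.0).reshape(-1, 1)        # bipolar targets
    w = np.where(y == 1,                                  # per-instance weights:
                 1.0 / max((y == 1).sum(), 1),            # inverse class frequency,
                 1.0 / max((y == 0).sum(), 1))            # so the minority class counts more
    HtW = H.T * w                                         # H^T W with W = diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return lambda Z: np.tanh(Z @ W_in + b) @ beta         # real-valued decision values

def initial_seed_set(X_pool, n_seeds=10, seed=0):
    """One representative per k-means cluster as the initial labeled seed set,
    which helps avoid the missed cluster effect and cold start noted above."""
    km = KMeans(n_clusters=n_seeds, n_init=10, random_state=seed).fit(X_pool)
    dists = np.linalg.norm(X_pool - km.cluster_centers_[km.labels_], axis=1)
    return np.array([np.where(km.labels_ == c)[0][np.argmin(dists[km.labels_ == c])]
                     for c in range(n_seeds)])

In a pool-based batch-mode loop of this kind, the seed set would be labeled first, a WELM trained on it, the most uncertain pool instances queried batch by batch, and the model updated (online, in AOW-ELM) after each newly labeled batch until the early stopping criterion fires.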
1.4.1 Machine Learning Methods