
GeoPlanning
Journal of Geomatics and Planning Vol. x, No. x, year
e-ISSN: 2355-6544 Original Research

Determinant of Recent In-Migrants in Yogyakarta using Bayesian Regression Logistics
Devi Azarina Manzilir Rohmah1*, Ani Budi Astuti1, Achmad Efendi1
1. Brawijaya University, Indonesia
*Corresponding author(s) email: [email protected]
Received: ..........; Accepted: ..........; Published: ..........
Keywords: Recent In-migrants, Bayesian Logistic Regression, Yogyakarta
DOI: 10.14710/geoplanning.8.1.pp-pp
Abstract
Recent in-migrants are residents who live in a province different from the one in which they lived five years
earlier. This research aims to identify the determinants of recent in-migrant status in the Province of Yogyakarta,
Indonesia, in 2021 using Bayesian logistic regression combined with nonlinear principal component analysis to form
the latent variables. The data were obtained from the National Socio-Economic Survey (SUSENAS) KOR of March
2021. The results indicate that age, latest education, main activity, home ownership status, housing quality, and
asset ownership have significant influences on recent in-migrant status in the Province of Yogyakarta, Indonesia.
Copyright © 2021 GJGP-Undip
This open access article is distributed under a
Creative Commons Attribution (CC-BY-NC-SA) 4.0 International license
1. Introduction
Logistic regression is a statistical method for describing the relationship between a categorical dependent
variable with two or more categories and independent variables measured on a categorical or interval scale
(Hosmer and Lemeshow, 2013). Logistic regression comprises binary, multinomial, and ordinal logistic
regression. Binary logistic regression analyzes the relationship between one response variable and several
predictor variables. The response variable consists of dichotomous qualitative data, with the value 1 indicating
the presence of a characteristic and 0 indicating its absence. This research aims to identify the determinants of
recent in-migrant status in the Province of Yogyakarta in 2021. Logistic regression is used because the
dependent variable consists of dichotomous qualitative data.
Migration is a response to variation in the conditions of the neighbourhood where a population resides.
Lee (1976) argues that several factors affect migration, the economic motive among them. Recent migration,
one type of migration, refers to migrants whose province of residence five years earlier differed from their
province at the time of the census. In other words, recent migrants entering the Province of Yogyakarta are
residents who lived in a province other than Yogyakarta five years before the census took place. The data on
recent migrants were obtained from the National Socio-Economic Survey (SUSENAS) KOR of March 2021.
According to SUSENAS KOR of March 2021, 5.9% of Yogyakarta's residents were recent in-migrants, the
highest figure among the provinces of Indonesia. That is, 5.9% of the residents of Yogyakarta lived outside the
province five (5) years before the enumeration in 2021. This is notable because Yogyakarta has had the lowest
Provincial Minimum Wage (UMP) for years, including in 2021. This contradicts common theory, such as Lee
(1976), which states that the economic motive is one of the determinants of migration.
Rohmah, D. A. M., et al. / Geoplanning: Journal of Geomatics and Planning, Vol x, No x, year, pp-pp
The study by Syairozi and Wijaya (2020) implies that age, education, sex, land ownership, and real wage
differences significantly affect the decision to migrate. Furthermore, Dustmann and Glitz (2011) argue that
migration and education are inseparable. Sarmita and Simamora (2018) studied the social and economic
characteristics of migrants from Java using descriptive statistics, while the Statistics and Data Center of
Education and Culture (2017) defines the component variables forming the socio-economic status of households
in terms of housing quality and asset ownership. These two studies show that the socio-economic variable is
constructed from indicators such as housing quality and asset ownership; in other words, the socio-economic
variable is a latent variable.
Solimun et al. (2017) state that a latent variable is generally defined as a variable that cannot be measured
directly but is instead captured through reflective or formative indicators. If the indicators forming a latent
variable are analyzed separately, more variables must be studied, which makes the analysis and interpretation
inefficient. Arithmetic operations cannot be applied directly to these indicators because their measurement
scales differ. To address this, logistic regression can be combined with nonlinear principal component analysis
(NLPCA), which transforms the indicator data into principal component scores. The transformed data can then
serve as input for subsequent analyses, such as logistic regression.
In logistic regression modelling, parameter estimation is a vital stage. The performance of the estimation is
often affected by sample size and data characteristics. An unbalanced dependent variable, in which one of the
classes is rare, is common in logistic regression (Owen, 2007). This condition can degrade the performance of
the estimation method, and the Bayesian method can be employed to deal with it.
King and Zeng (2001) indicate that the Bayesian method is unbiased for unbalanced data. They add that,
when the dependent variable is unbalanced, Bayesian parameter estimation yields more relevant results than
Maximum Likelihood, the conventional method for estimating logistic regression parameters. Therefore, this
research adopted the Bayesian method to estimate the parameters of the logistic regression.
Building on previous studies, this research used pre-existing variables to identify the factors affecting the
status of recent migrants entering the Province of Yogyakarta in 2021 by employing a Bayesian approach to
logistic regression.
2. Methods and Material

2.1. Migration
In broad terms, migration is defined as a permanent or semi-permanent change of residence
(Tjiptoherijanto, 2009). Mantra (2012) argues that a person migrates when the person moves to another place of
residence permanently or relatively permanently (for a minimum period of time), covering a particular minimum
distance or moving from one geographical unit to another. According to the Statistics Center, migration is
defined with reference to administrative boundaries (provinces, regencies, villages, and sub-districts/hamlets)
and a minimum period of six months, or less than six months if the person concerned plans to reside in the
place of destination. According to Mantra (2012), recent migration is one type of migration, in which a person
is categorized as a migrant if the area of residence at the time of the census differs from that of five years before
the census.
Lee (1976) argues that four factors need attention in the process of population migration: factors in the
place of origin, factors in the place of destination, obstacles between the place of origin and the destination,
and personal factors.
2.2. Non-linear Principal Component Analysis

Principal component analysis is a multivariate analysis used to transform an original set of variables into a
smaller set of new variables that explains most of the variance of the original set (Dillon and Goldstein, 1984).
Gifi (1981) divides this multivariate analysis into linear and non-linear forms. Indicators on mixed scales
(metric and non-metric) are transformed using non-linear principal component analysis. The non-linear
transformation refers to principal component analysis with optimal scaling, which converts qualitative scales
into quantitative values (Markos et al., 2010; Meulman et al., 2004).
In non-linear principal component analysis, the categories of every variable measured on a non-numeric
scale are assigned numeric values, called category quantifications. The analysis aims to find the quantifications
that optimize the mean squared correlation between the quantified variables.
Gifi (1981) states that the observations analyzed using non-linear principal components form a matrix H of
size n × m. Each column h_j of H is then transformed and normalized using PRINCALS in the homals package
of R. The transformation of matrix H into G via the vectors h_j can be written as the block matrix in equation (1).
$$G \triangleq [\, G_1 \;\vdots\; G_2 \;\vdots\; \cdots \;\vdots\; G_m \,] \tag{1}$$
From matrix G_j, the following process refers to equation (2).
$$D_j \triangleq G_j' G_j \tag{2}$$
where:
D_j : diagonal matrix of size k_j × k_j with the relative frequency of variable j on the main diagonal
G_j : indicator matrix for each indicator
≜ means "is defined as".
Matrix of quantification category of variable j is formulated with equation (3):
(3)
with
(4)
Y_j : multiple category quantifications (k_j × p)
D_j : diagonal matrix of size k_j × k_j (relative frequency of variable j on the main diagonal)
G_j : indicator matrix for each indicator
X : matrix of object component scores (n × m)
m : number of variables
G : block matrix composed of the G_j
Y : the set of multiple and single category quantifications
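
Assuming the analysis follows the standard homals formulation of Gifi, in which the category quantifications and the object scores are updated alternately, the relations between Y_j, D_j, G_j, and X can be sketched as below. The number of dimensions p and the n × p shape of X are assumptions of this sketch, and the centring and normalization of X after each update are omitted.

```latex
Y_j = D_j^{-1} G_j' X, \qquad j = 1, \dots, m

X = \frac{1}{m} \sum_{j=1}^{m} G_j Y_j
```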

In the non-linear principal component analysis, the optimal linear combination model used to calculate the
principal component score refers to equation (5):
(5)
where:
z : principal component score
a_j : component weight of order p × 1
G_j : indicator matrix j of size n × k_j
Y_j : multiple category quantifications of order k_j × p
q_j : transformed data
2.3. Logistic Regression Analysis

Logistic regression is a basic classification method initially intended for a response (dependent) variable
with two classes, known as binary logistic regression, and later extended to dependent variables with more than
two classes, known as multinomial logistic regression. Binary logistic regression is used to examine the
relationship between a binary (dichotomous) dependent variable and one or more predictor (independent)
variables, which may be polychotomous (Hosmer and Lemeshow, 2013).
Given an observation (X, Y), in which X represents the independent variables and Y the dependent
variable, the probability π(X) that Y = 1 is given by equation (6), which the logit transformation turns into the
linear form in equation (7):
$$\pi(X) = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)} \tag{6}$$
$$\ln\left(\frac{\pi(X)}{1 - \pi(X)}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m = \beta^T X \tag{7}$$
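
As an illustration of equations (6) and (7), the following minimal Python sketch evaluates the logistic probability and confirms that the logit recovers the linear predictor. The coefficients and predictor values are hypothetical and are not estimates from this study.

```python
import math

def logistic_probability(beta, x):
    """Equation (6): probability that Y = 1, for beta = [b0, b1, ..., bm] and x = [x1, ..., xm]."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # 1 / (1 + exp(-eta)) is algebraically identical to exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Left-hand side of equation (7): the log-odds of probability p."""
    return math.log(p / (1.0 - p))

beta = [-0.5, 0.8, -0.3]   # hypothetical coefficients b0, b1, b2
x = [1.0, 2.0]             # hypothetical predictor values x1, x2
p = logistic_probability(beta, x)
linear_predictor = beta[0] + beta[1] * x[0] + beta[2] * x[1]
print(p, logit(p), linear_predictor)
```

The last two printed values agree up to rounding, which is the content of equation (7).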
2.3.1. Parameter Estimation in Logistic Regression

Two methods commonly used to estimate the parameters of logistic regression are maximum likelihood
estimation (MLE) and the Bayesian method. The Bayesian method is built on Bayes' theorem. In this study,
the method is used to combine the information in the data with prior probabilities regarding model validity, so
that the model with the highest posterior probability can be selected as the best model and an averaged result is
obtained (Rachev et al., 2008).
In Bayes' theorem, let B_i, with i = 1, 2, ..., n, be mutually exclusive events that partition the sample space
S, with P(B_i) ≠ 0. Then, for any event A with P(A) ≠ 0, the probability of B_i given A is:
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)} \tag{8}$$
Furthermore, because the denominator of equation (8) is constant with respect to B_i, equation (8) reduces to
equation (9):
$$P(B_i \mid A) \propto P(A \mid B_i)\,P(B_i) \tag{9}$$
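
Equations (8) and (9) can be illustrated with a small numerical sketch in Python. The priors and likelihoods below are hypothetical values for three events B_1, B_2, B_3; the normalizing sum in equation (8) is the only step that equation (9) leaves out.

```python
def posterior(priors, likelihoods):
    """Equation (8): P(B_i | A) from P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # numerators, as in equation (9)
    evidence = sum(joint)                                  # denominator of equation (8)
    return [j / evidence for j in joint]

priors = [0.5, 0.3, 0.2]        # hypothetical P(B_i), summing to one
likelihoods = [0.1, 0.4, 0.7]   # hypothetical P(A | B_i)
post = posterior(priors, likelihoods)
print(post)
```

Dividing by the evidence only rescales the products so that they sum to one; the ranking of the events is already determined by the unnormalized products in equation (9).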
According to Ghosh et al. (2007), in addition to the model f(x | θ), or the likelihood, the Bayesian method
requires a distribution for θ, known as the prior. Liu and Powers (2012) suggest that a non-informative prior can
be used when there is no initial knowledge of the parameter distribution, while Genkin et al. (2007) argue that
the prior for the parameters of a Bayesian binary logistic regression model follows the normal distribution.
According to Walpole et al. (2012), estimating the parameter θ draws on the distribution f(x | θ) and on π(θ),
the prior distribution of θ. The distribution of θ given the observed data x is called the posterior distribution
and is given by:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{g(x)} \tag{10}$$
where g(x) is the marginal distribution of x, so that:
$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta) \tag{11}$$
2.3.2. Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is based on constructing a Markov chain that converges to the
posterior distribution f(θ | x) as its target distribution. MCMC is an iterative method, since each stage yields a
value that depends on the previous stage. Gibbs sampling is one of the MCMC algorithms and is used when the
conditional distributions have a known form.
Convergence checking in MCMC is intended to determine whether the generated samples come from the
target posterior distribution. Convergence can be checked using the trace plot, the MC error, and the
autocorrelation (Ntzoufras, 2011).
Trace plot: if the model has converged, the trace plot does not form a particular pattern.
MC error: if the model has converged, the MC error is very low (less than 5% of the standard deviation).
Autocorrelation: if the model has converged, the autocorrelation is close to one at the first lag and heads
towards zero at the following lags.
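
The MC error and autocorrelation criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the MC error is estimated with the batch-means method, which is one common estimator and not necessarily the one used by the software in this study, and the chain is a hypothetical sequence of independent normal draws standing in for real MCMC output.

```python
import random
import statistics

def mc_error(chain, n_batches=50):
    """Batch-means estimate of the Monte Carlo standard error of the chain mean."""
    size = len(chain) // n_batches
    means = [statistics.fmean(chain[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5

def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag (lag 0 gives one)."""
    mean = statistics.fmean(chain)
    var = sum((v - mean) ** 2 for v in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(len(chain) - lag))
    return cov / var

rng = random.Random(1)
chain = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # hypothetical, well-mixed chain

sd = statistics.stdev(chain)
err = mc_error(chain)
passes_mc_error = err < 0.05 * sd                     # the 5% rule quoted above
acf = [autocorrelation(chain, lag) for lag in range(4)]
print(err, 0.05 * sd, passes_mc_error, acf)
```

For a poorly mixing chain the autocorrelation would decay slowly over many lags and the batch-means MC error would be correspondingly larger.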
Moreover, parameter testing is required to determine whether a predictor variable significantly affects the
response variable. In the Bayesian method, the test uses the credible interval bounded by the 2.5% and 97.5%
quantiles of the posterior distribution. If the credible interval does not contain 0, the predictor variable
significantly affects the response variable.
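
The credible-interval test can be sketched in Python as below. The two sets of draws are hypothetical stand-ins for posterior samples of two coefficients; they are simulated from normal distributions chosen only for illustration and are not output of the model in this study.

```python
import random

def credible_interval(samples, lower=0.025, upper=0.975):
    """Empirical 2.5% and 97.5% quantiles of the posterior draws."""
    ordered = sorted(samples)
    n = len(ordered)
    return ordered[int(lower * n)], ordered[int(upper * n) - 1]

def is_significant(samples):
    """A parameter is deemed significant when the interval does not contain zero."""
    low, high = credible_interval(samples)
    return not (low <= 0.0 <= high)

rng = random.Random(7)
draws_a = [rng.gauss(0.16, 0.06) for _ in range(10000)]    # hypothetical posterior draws
draws_b = [rng.gauss(-0.13, 0.10) for _ in range(10000)]   # hypothetical posterior draws

print(credible_interval(draws_a), is_significant(draws_a))
print(credible_interval(draws_b), is_significant(draws_b))
```

The first interval lies entirely above zero, so the parameter would be deemed significant; the second straddles zero, so it would not.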
2.3.3. Interpretation
In logistic regression modelling, parameter interpretation aims to explain the estimated effect of each
predictor variable. The parameters of a categorical variable are interpreted using the odds ratio (Hosmer and
Lemeshow, 2013). The odds are the ratio of the probability of success to the probability of failure, and the
odds ratio compares the odds of two groups. The odds ratio is also referred to as the association between an
exposure (risk factor) and an event. The odds and the odds ratio are given in equations (12) and (13) (Azen and
Walker, 2011):
$$\text{odds} = \frac{\pi}{1 - \pi} \tag{12}$$
$$\text{Odds Ratio} = \frac{\text{Odds}_1}{\text{Odds}_2} \tag{13}$$
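
Equations (12) and (13) can be illustrated with a short Python sketch. The two probabilities are hypothetical; the last line shows how an odds ratio below one is restated from the point of view of the other group by taking its reciprocal, using 0.9552 as an example value.

```python
def odds(p):
    """Equation (12): odds of an event with probability p."""
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    """Equation (13): odds of group 1 relative to group 2."""
    return odds(p1) / odds(p2)

ratio = odds_ratio(0.2, 0.1)     # hypothetical probabilities for two groups
reciprocal = 1.0 / 0.9552        # an odds ratio of 0.9552 seen from the other group
print(round(ratio, 4), round(reciprocal, 4))
```

Because the model is linear in the log-odds, as in equation (7), a one-unit increase in a predictor multiplies the odds by the exponential of its coefficient.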
2.4. Material
This research employed secondary data from the Migrant Profiles of the National Socio-Economic Survey
(SUSENAS) KOR of March 2021, downloaded from Silastik BPS. The respondents were Indonesian citizens aged
17 to 64 residing in Yogyakarta at the time of the census. The variables and indicators used in this research are
presented in Table 1.
Table 1. Variable Outlines
Variable | Indicator | Answer | Data Scale
Recent Migrants (Y) | - | (0) No; (1) Yes | Nominal
Age (X1) | - | 15-64 years old | Ratio
Sex (X2) | - | (0) Men; (1) Women | Nominal
Latest Education (X3) | - | (0) Not going to school; (1) Primary-Secondary School; (2) High School; (3) University Qualifications | Ordinal
Main Activities (X4) | - | (0) Others; (1) Studying; (2) Working; (3) Taking care of the household | Nominal
Home ownership (X5) | - | (0) Not under their ownership; (1) Under their ownership | Ordinal
Housing quality (X6) | Widest roof type (I6.2) | (0) Fibers and others; (1) Not fibers | Ordinal
Housing quality (X6) | Widest wall type (I6.3) | (0) Others; (1) Brick | Ordinal
Housing quality (X6) | Widest floor type (I6.4) | (0) Ground; (1) Not ground | Ordinal
Housing quality (X6) | Defecation facility (I6.5) | (0) Public property; (1) Own/shared | Ordinal
Housing quality (X6) | Toilet type (I6.6) | (0) Others; (1) Gooseneck Toilet | Ordinal
Housing quality (X6) | Lighting source (I6.7) | (0) Others; (1) Electricity with meter | Ordinal
Housing quality (X6) | Drinking water facilities (I6.8) | (0) Others; (1) Bottled water/refill | Ordinal
Housing quality (X6) | Cooking fuel (I6.9) | (0) Others; (1) Electricity/gas | Ordinal
Asset Ownership (X7) | Motorcycle (I7.1) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Gold (min. 10 gr) (I7.2) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Flat screen TV (min. 30 inch) (I7.3) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Air Conditioner (I7.4) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Water heater (I7.5) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Gas cylinder >5.5 kg (I7.6) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Refrigerator (I7.7) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Laptop (I7.8) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Car (I7.9) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Land (I7.10) | (0) No; (1) Yes | Ordinal
Asset Ownership (X7) | Landline (I7.11) | (0) No; (1) Yes | Ordinal
3. Result and Discussion

3.1. Results of Non-linear Principal Component Analysis
One of the outputs of the non-linear principal component analysis, run in R with the homals package, is
the component loading. The component loadings are shown in Table 2.
Table 2. Component Loading
Variable | Indicator | Loading
X6 | I6.1 | -0.00546
X6 | I6.2 | -0.17883
X6 | I6.3 | -0.2073
X6 | I6.4 | -0.21008
X6 | I6.5 | -0.25341
X6 | I6.6 | -0.12467
X6 | I6.7 | -0.08904
X6 | I6.8 | -0.12449
X7 | I7.1 | -0.0812
X7 | I7.2 | -0.18931
X7 | I7.3 | -0.19984
X7 | I7.4 | -0.21103
X7 | I7.5 | -0.14889
X7 | I7.6 | -0.21292
X7 | I7.7 | -0.16007
X7 | I7.8 | -0.17334
X7 | I7.9 | -0.21122
X7 | I7.10 | -0.05051
X7 | I7.11 | -0.1562
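Based on the loadings in Table 2, the models for X6 and X7 can be written as follows:
$$X_6 = -0.00546\,I_{6.1} - 0.17883\,I_{6.2} - 0.20730\,I_{6.3} - 0.21008\,I_{6.4} - 0.25341\,I_{6.5} - 0.12467\,I_{6.6} - 0.08904\,I_{6.7} - 0.12449\,I_{6.8}$$
$$X_7 = -0.08120\,I_{7.1} - 0.18931\,I_{7.2} - 0.19984\,I_{7.3} - 0.21103\,I_{7.4} - 0.14889\,I_{7.5} - 0.21292\,I_{7.6} - 0.16007\,I_{7.7} - 0.17334\,I_{7.8} - 0.21122\,I_{7.9} - 0.05051\,I_{7.10} - 0.15620\,I_{7.11}$$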
The next output of the non-linear principal component analysis is the category quantification score.
Table 3 shows the category quantifications obtained from the analysis. These quantifications replaced the
respondents' qualitative data and were used to calculate the transformed data and the principal component
scores in line with equation (5).
Table 3. Category quantification score
Variable | Indicator | Category 0 | Category 1
X6 | I6.1 | 0.003857 | -0.0000019
X6 | I6.2 | 0.011322 | -0.0006985
X6 | I6.3 | 0.019054 | -0.0005577
X6 | I6.4 | 0.02715 | -0.000402
X6 | I6.5 | 0.028231 | -0.0005625
X6 | I6.6 | 0.008002 | -0.0004803
X6 | I6.7 | 0.000782 | -0.002508
X6 | I6.8 | 0.003509 | -0.0010922
X7 | I7.1 | 0.002845 | -0.000573
X7 | I7.2 | 0.001597 | -0.0055467
X7 | I7.3 | 0.001328 | -0.0074347
X7 | I7.4 | 0.00096 | -0.0114818
X7 | I7.5 | 0.000386 | -0.0142123
X7 | I7.6 | 0.001331 | -0.0084216
X7 | I7.7 | 0.003118 | -0.0020285
X7 | I7.8 | 0.001816 | -0.0040918
X7 | I7.9 | 0.001455 | -0.0075821
X7 | I7.10 | 0.00126 | -0.0004987
X7 | I7.11 | 0.000474 | -0.0127457
The next step is to calculate the principal component score from the category quantifications and the
component loadings. For each indicator of the dimension used, the category quantification is multiplied by the
component loading. The principal component score is then obtained by summing, over the indicators, the
product of each respondent's transformed data and the corresponding component loading.
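
The weighted sum described above can be sketched in Python using the X6 loadings from Table 2 and the X6 category quantifications from Table 3. The two respondents are hypothetical, and the sketch implements only the sum of products as described in the text; it is not expected to reproduce the scores in Table 4, because any further scaling applied by the software is not stated.

```python
# Component loadings for the X6 indicators (Table 2)
loadings = {
    "I6.1": -0.00546, "I6.2": -0.17883, "I6.3": -0.20730, "I6.4": -0.21008,
    "I6.5": -0.25341, "I6.6": -0.12467, "I6.7": -0.08904, "I6.8": -0.12449,
}

# Category quantifications for answers 0 and 1 (Table 3)
quantifications = {
    "I6.1": (0.003857, -0.0000019), "I6.2": (0.011322, -0.0006985),
    "I6.3": (0.019054, -0.0005577), "I6.4": (0.02715, -0.000402),
    "I6.5": (0.028231, -0.0005625), "I6.6": (0.008002, -0.0004803),
    "I6.7": (0.000782, -0.002508), "I6.8": (0.003509, -0.0010922),
}

def component_score(answers):
    """Sum over indicators of loading times the quantification of the chosen category."""
    return sum(loadings[k] * quantifications[k][answers[k]] for k in loadings)

all_yes = {k: 1 for k in loadings}   # hypothetical respondent answering 1 everywhere
all_no = {k: 0 for k in loadings}    # hypothetical respondent answering 0 everywhere
print(component_score(all_yes), component_score(all_no))
```

Because every loading is negative, a respondent whose answers all fall in category 1 (negative quantifications) receives a positive score, and one whose answers all fall in category 0 receives a negative score.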
Table 4. Combined data
Subject Y X1 X2 X3 X4 X5 X6 X7
1 0 41 0 2 1 0 0.005938 0.013365
2 1 39 1 3 3 0 0.005938 0.013365
3 1 54 0 3 3 0 0.005938 0.078863
… … … … … … … … …
8731 0 61 1 0 1 1 -0.034718 -0.01411
8732 0 21 0 1 1 1 -0.034718 -0.01411
3.2. Multicollinearity Test

The multicollinearity test was performed using Pearson-Spearman correlation scores. The correlation score
for each pair of variables is shown in Table 5.
Table 5. Pearson-Spearman Correlation Score
Corr. Y X1 X2 X3 X4 X5 X6 X7
Y 1.00 -0.17 -0.00 0.09 0.08 -0.26 0.03 0.01
X1 -0.17 1.00 0.02 -0.28 -0.12 0.17 -0.03 -0.00
X2 -0.00 0.02 1.00 -0.01 0.25 0.01 0.01 0.03
X3 0.09 -0.28 -0.00 1.00 0.04 -0.19 0.18 0.41
X4 0.08 -0.12 0.25 0.04 1.00 -0.05 0.05 0.08
X5 -0.26 0.17 0.01 -0.12 -0.06 1.00 -0.00 0.09
X6 0.03 -0.03 0.01 0.18 0.05 -0.00 1.00 0.21
X7 0.01 -0.00 0.03 0.41 0.08 0.09 0.21 1.00
Table 5 shows that no correlation between variables exceeded 0.6 in absolute value, meaning that the
assumption of non-multicollinearity was fulfilled.
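
The check against the 0.6 threshold can be sketched in Python by scanning the predictor block of Table 5 for the largest absolute off-diagonal correlation. The values are transcribed from Table 5 as printed, and only the upper triangle is scanned.

```python
names = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]

# Predictor-by-predictor correlations transcribed from Table 5
corr = [
    [1.00, 0.02, -0.28, -0.12, 0.17, -0.03, -0.00],
    [0.02, 1.00, -0.01, 0.25, 0.01, 0.01, 0.03],
    [-0.28, -0.00, 1.00, 0.04, -0.19, 0.18, 0.41],
    [-0.12, 0.25, 0.04, 1.00, -0.05, 0.05, 0.08],
    [0.17, 0.01, -0.12, -0.06, 1.00, -0.00, 0.09],
    [-0.03, 0.01, 0.18, 0.05, -0.00, 1.00, 0.21],
    [-0.00, 0.03, 0.41, 0.08, 0.09, 0.21, 1.00],
]

threshold = 0.6
largest, pair = 0.0, None
for i in range(len(names)):
    for j in range(i + 1, len(names)):          # upper triangle only
        if abs(corr[i][j]) > largest:
            largest, pair = abs(corr[i][j]), (names[i], names[j])

no_multicollinearity = largest < threshold
print(pair, largest, no_multicollinearity)
```

The largest absolute correlation is between X3 and X7, and it remains below the threshold.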
3.3. Bayesian Logistic Regression

The Bayesian method obtains the posterior distribution from the product of the prior distribution and the
likelihood. Previous studies using the Bayesian method often chose a non-informative prior because no prior
knowledge of the parameter distribution was available. In the absence of such knowledge, previous studies used
the normal distribution. This distribution was chosen because of its two parameters: the mean (𝜇), representing
the true parameter value, and the standard deviation (𝜎), representing the uncertainty about that value.
Therefore, in this research, the prior was set to a normal distribution with mean zero and variance 1. Gibbs
sampling was used as the MCMC algorithm, with 1,000,000 iterations, a burn-in of 500,000, and a thinning
interval of 4.
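
To make the roles of the prior, the likelihood, the burn-in, and the thinning concrete, the following Python sketch samples the posterior of a logistic regression with one predictor and normal priors with mean zero and variance 1. It is a simplified illustration under stated assumptions: it uses a random-walk Metropolis sampler rather than the Gibbs sampler used in this research, the data are synthetic, and the iteration counts, step size, and "true" coefficients are hypothetical.

```python
import math
import random

def log_posterior(beta, xs, ys):
    """Log of prior times likelihood, up to a constant: N(0, 1) priors, Bernoulli-logit likelihood."""
    log_prior = -0.5 * sum(b * b for b in beta)
    log_lik = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # log p(y | eta) = y * eta - log(1 + exp(eta)), evaluated in a numerically stable form
        log_lik += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return log_prior + log_lik

def metropolis(xs, ys, n_iter=5000, burn_in=1000, thin=5, step=0.15, seed=0):
    """Random-walk Metropolis; returns the thinned draws kept after the burn-in."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    kept = []
    for it in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append(beta)
    return kept

# Synthetic data with hypothetical true coefficients b0 = -1.0 and b1 = 1.5
data_rng = random.Random(42)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x))) else 0 for x in xs]

draws = metropolis(xs, ys)
mean_intercept = sum(d[0] for d in draws) / len(draws)
mean_slope = sum(d[1] for d in draws) / len(draws)
print(len(draws), mean_intercept, mean_slope)
```

The draws discarded during the burn-in are those generated before the chain settles near the posterior, and thinning keeps every fifth draw to reduce the autocorrelation of the retained sample.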
3.3.1. Convergence Test

In the Bayesian method, a convergence test is required to determine whether the generated samples follow
the posterior distribution. Convergence in MCMC was assessed using the trace plot, the MC error, and the
autocorrelation plot. The convergence test using the MC error is presented in Table 6.
Table 6. Convergence Test using MC Error
Parameter | SD | 1% SD | MC Error | Result
β0 | 0.2162 | 0.0022 | 0.00165 | Converged
β1 | 0.0041 | 0.0000 | 0.00003 | Converged
β2 | 0.0990 | 0.001 | 0.00030 | Converged
β3 | 0.0588 | 0.0006 | 0.00028 | Converged
β4 | 0.0622 | 0.0006 | 0.00032 | Converged
β5 | 0.0974 | 0.001 | 0.00030 | Converged
β6 | 0.17070 | 0.00171 | 0.00052 | Converged
β7 | 0.02917 | 0.00029 | 0.00010 | Converged
The trace plot from the analysis result is presented in Figure 1.
[Figure 1: trace plots of beta_0 to beta_7 against iteration]
Figure 1. Trace plot

Figure 1 shows that the trace plot did not show any strong pattern or periodicity. The autocorrelation plot
is presented in Figure 2.
Figure 2. Autocorrelation Plot

Figure 2 shows that the autocorrelation within each parameter's chain was low, so the generated samples
were approximately independent. According to the MCMC convergence tests, every parameter had converged;
that is, the generated samples came from the expected posterior distribution.
3.3.2. Parameter Significance Test

In the Bayesian method, the parameter significance test is performed by examining the credible interval.
A parameter is deemed significant if the credible interval between the 2.5% and 97.5% percentiles does not
contain zero. The credible interval for each parameter is presented in Table 7.
Table 7. Credible Interval
Parameter | 2.50% | 97.50% | Result
β0 | -1.140 | -0.293 | Significant
β2 | -0.321 | 0.067 | Insignificant
β3 | 0.046 | 0.277 | Significant
β4 | 0.159 | 0.403 | Significant
Table 7 shows that, of the seven variables used, only one, X2 (sex), was not significant based on the
credible interval, while the other six variables were significant.
3.3.3. Classification Accuracy

The accuracy of a model in classifying data indicates the goodness of the Bayesian logistic regression
model. The higher the classification accuracy, the better the model. The classification accuracy of the binary
logistic regression is presented in Table 8.
Table 8. Classification Accuracy
Actual Class | Predicted: Not recent migrants | Predicted: Recent migrants | Correctly Predicted
Not recent migrants | 6554 | 10 | 0.9985
Recent migrants | 390 | 16 | 0.0394
Overall | | | 0.9426
Table 8 indicates that, taking recent migrants as the positive class, the model correctly classified 6,554
subjects who were not recent migrants, or 99.85% (specificity), and correctly classified 16 recent migrants, or
3.94% (sensitivity). Overall, the model gave accurate predictions for 94.26% of the subjects.
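
The rates reported for Table 8 can be recomputed from its four cell counts, as in the Python sketch below. Treating recent migrants as the positive class is an assumption of this sketch; under the opposite convention the labels sensitivity and specificity swap.

```python
# Cell counts from Table 8, with recent migrants taken as the positive class
true_negative = 6554    # not recent migrants predicted as not recent migrants
false_positive = 10     # not recent migrants predicted as recent migrants
false_negative = 390    # recent migrants predicted as not recent migrants
true_positive = 16      # recent migrants predicted as recent migrants

total = true_negative + false_positive + false_negative + true_positive
specificity = true_negative / (true_negative + false_positive)
sensitivity = true_positive / (true_positive + false_negative)
accuracy = (true_negative + true_positive) / total
print(round(specificity, 4), round(sensitivity, 4), round(accuracy, 4))
```

The high overall accuracy is driven almost entirely by the majority class, which is the pattern expected with an unbalanced dependent variable.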
In addition to the classification table, the ROC curve was used to assess the adequacy of the model. The
ROC curve of the analysis result is presented in Figure 3.
Figure 3. ROC curve
Figure 3 indicates that the model is adequate, since the generated curve rises towards one. This is
consistent with the area under the curve (AUC) of 0.768, which places the model in the fair discrimination
category.
3.4. Interpretation
The analysis results based on the odds ratio are presented in Table 9:
Table 9. Odds ratio
Variable | Category | Odds ratio
X1 | 17 years old | 1 (reference)
X1 | 18 years old | 0.9552 (0.9499, 0.9606)
X2 | 0 Men | 1 (reference)
X2 | 1 Women | 0.8808 (0.88, 0.8751)
X3 | 0 Not going to school | 1 (reference)
X3 | 1 Primary/Secondary school | 1.94 (1.11, 3.41)
X3 | 2 High school | 4.79 (2.79, 8.23)
X3 | 3 University | 4.425 (2.42, 7.46)
X4 | 0 Others | 1 (reference)
X4 | 1 Studying | 1.57 (0.58, 4.27)
X4 | 2 Working | 8.71 (3.18, 23.8)
X4 | 3 Taking care of household | 2.35 (0.85, 6.48)
X5 | 0 Not under their ownership | 1 (reference)
X5 | 1 Under their ownership | 0.14 (0.12, 0.17)
X6 | Housing quality score -0.1928 (lowest) | 1 (reference)
X6 | Housing quality score -0.1428 | 0.964 (0.961, 0.967)
X7 | Asset ownership score -0.026 (lowest) | 1 (reference)
X7 | Asset ownership score 0.024 | 0.9975 (0.9972, 0.9979)
3.5. Discussion
According to the parameter significance test, age (X1) is significantly associated with recent migrant
status, with an odds ratio of 0.9552 for each one-year increase in age starting from 17 years. That is, a
17-year-old resident was 1.0469 times as likely to be a recent migrant as an 18-year-old resident. This is
consistent with Zaiceva (2014), who reports that the relationship between migration and age is negative and
significant, meaning that the probability of migrating falls as people get older. Zaiceva (2014) adds that the
likelihood of migration is highest among residents aged 20 to 30, and that when people reach the end of
productive age, another wave of migration occurs as retirees return to their place of origin. This result is in line
with the UNECE Policy Brief on Ageing (2016), which implies that, although main activities are among the
biggest motivations for migration, ageing is another factor that motivates people to return to their place of
origin or to their families and relatives, no longer on economic grounds. Elderly migrants also face challenges
when they migrate; health conditions or difficulty readjusting to a new environment make them more likely to
return to their families and relatives than to migrate elsewhere. Since this research is restricted to residents
aged up to 64, the migration wave among people of retirement age is not discussed.
Moreover, the odds ratio shows that women were 0.8808 times as likely as men to be recent migrants. In
other words, men were 1.135 times as likely as women to be recent migrants, although the credible interval in
Table 7 indicates that this effect was not significant. The United Nations General Assembly (2019) reported
that, globally, the number of female migrants was twice that of male migrants everywhere except Africa and
Asia. This was attributed to discriminatory social and cultural norms and to policies that contributed little to
protecting vulnerable women. Many were affected by gender discrimination, harassment, and violations of
women's rights while migrating. The European Institute for Gender Equality (EIGE, 2020) reported several
issues regarding gender inequality in migration status, including job market participation and deskilling.
Factors such as qualifications and skills contribute to deskilling, and they affect men and women in different
ways. For example, the role of women as housewives may hamper the improvement of their qualifications and
skills for the job market, since they are bound to household responsibilities. In other words, attending
retraining or obtaining a qualification for their skills is no longer a priority because of gender-based roles in the
family. Stereotypes about the role of women in society probably reinforce gender inequality in migration.
Furthermore, latest education is a significant variable. According to the odds ratios, residents whose latest
education was primary to secondary school were 1.94 times as likely to be recent migrants as those who did not
go to school. Residents whose latest education was high school were 4.79 times as likely, and residents with
university qualifications were 4.425 times as likely, to be recent migrants as those who did not go to school.
This is in line with Todaro (2000), who explains that there is a positive correlation between access to education
and migration: people with higher education levels are more likely to migrate than those with lower education
levels. Dustmann and Glitz (2011) argue that migration and education decisions are interrelated in multiple
dimensions, since education and skills play a vital role at several stages of an individual's migration. Economic
success in the destination region is often determined by the migrants' level of education and by how well their
skills transfer to their jobs. Moreover, higher education makes migrants more future-oriented and reduces
salary gaps at the regional level (Handler, 2018).
For the main activities variable, the working category had the highest odds ratio of the four categories, at
8.71. This indicates that people who were working were 8.71 times as likely to be recent migrants as people in
the reference category (others). This is in line with Todaro (2000), who reports that the economic factor is one
of the strongest motives for migration. Meanwhile, the studying category had an odds ratio of 1.57, indicating
that those whose main activity was studying were 1.57 times as likely to be recent migrants as those in the
reference category. Munir (2010) also explains that access to higher and better education is one of the pull
factors for migration. Furthermore, Hagen-Zanker and Mallett (2016) argue that the education factor
determines the destination of migration. This is supported by the existence of 110 public and private higher
education institutions in the Province of Yogyakarta in 2019 (BPS, 2020), and two of the five best universities
in Indonesia in 2019 according to 4ICU (2019), Universitas Gadjah Mada and Universitas Negeri Yogyakarta,
were located in Yogyakarta. These seem to underlie the decision to migrate to Yogyakarta. The household
category had an odds ratio of 2.35, indicating that those taking care of the household were 2.35 times as likely
to be recent migrants as those in the reference category. Similarly, Mitchell in Mantra (2012) explains that
several forces drive people either to remain in or to leave their place of origin. In this case, the responsibility of
taking care of the household serves as a force that encourages migration.
Furthermore, house ownership also has a significant effect on recent migrant status, with an odds
ratio of 0.14. This figure indicates that residents owning their house were 0.14 times as likely to be recent migrants as
those not owning houses. That is, people living in houses they did not own were 7.14 times more likely to
migrate recently than people living in houses they owned. In line with this conclusion,
Helderman et al. (2006) argue that generally it is not easy for people to decide to migrate, and there are several
reasons making some remain in a particular place. In addition, several studies also imply that house ownership
has negative effects on the decision to migrate, mainly because moving out of a house one owns costs more
than moving out of a house one rents, since the transaction costs are usually borne by
house owners.
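The step from 0.14 to 7.14 in the paragraph above is a change of reference category: swapping the reference group inverts the odds ratio, which is the same as flipping the sign of the log-odds coefficient. A minimal sketch of that arithmetic is given below; the function name is illustrative only.

```python
import math


def flip_reference(odds_ratio: float) -> float:
    """Odds ratio obtained when the comparison and reference groups are swapped."""
    return 1.0 / odds_ratio


# Odds ratio reported in the text: house owners versus non-owners.
owner_vs_non_owner = 0.14

# Same effect expressed with owners as the reference group.
non_owner_vs_owner = flip_reference(owner_vs_non_owner)

# On the log-odds scale, swapping the reference only changes the sign.
log_or_owner = math.log(owner_vs_non_owner)
log_or_non_owner = math.log(non_owner_vs_owner)

print(f"non-owners vs owners: {non_owner_vs_owner:.2f}")
print(f"log-odds: {log_or_owner:.3f} and {log_or_non_owner:.3f}")
```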
Housing quality and asset ownership also have significant influences on the status of recent migrants.
Housing quality had an odds ratio of 0.963 and asset ownership had an odds ratio of 0.975 for every 0.05
increase, indicating that the lower the housing quality and asset ownership are, the lower the likelihood is to
migrate, compared to the residents with higher values of housing quality and asset ownership.
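Because the two odds ratios above are quoted per 0.05 increase, it can help to see how they rescale to other step sizes. The sketch below assumes the usual logistic regression form, in which the log-odds are linear in the score, so an odds ratio for one step size converts to another by exponentiation. The function names and the 0.10 step are illustrative choices, not part of the original analysis.

```python
import math


def implied_coefficient(odds_ratio: float, step: float) -> float:
    """Log-odds change per one unit of the score, given the odds ratio for `step`."""
    return math.log(odds_ratio) / step


def rescale_odds_ratio(odds_ratio: float, from_step: float, to_step: float) -> float:
    """Odds ratio for a change of `to_step`, given the odds ratio for `from_step`."""
    return odds_ratio ** (to_step / from_step)


# Odds ratios reported in the text for every 0.05 increase.
reported = {"housing quality": 0.963, "asset ownership": 0.975}
STEP = 0.05

for name, ratio in reported.items():
    beta = implied_coefficient(ratio, STEP)
    per_tenth = rescale_odds_ratio(ratio, STEP, 0.10)
    print(f"{name}: coefficient per unit {beta:.3f}, odds ratio per 0.10 {per_tenth:.4f}")
```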
4. Conclusion
This research concludes that the Bayesian approach to logistic regression, run with 1,000,000 iterations, a
thinning interval of 4, and a burn-in of 500,000, indicates that six of the seven variables have significant influences
on the status of recent migrants entering the Province of Yogyakarta: age (X1), latest education (X3),
main activities (X4), house ownership (X5), housing quality (X6), and asset ownership (X7). Of these six
variables, younger residents, residents with high school or equivalent as their latest education, residents
whose main activity is working, residents renting a house, and residents with a high housing quality
score and a high asset ownership score are more likely to migrate recently to the Province of Yogyakarta.
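The sampler settings quoted above determine how many posterior draws remain for inference. The sketch below counts them under one common convention, which is an assumption here: the burn-in is discarded from the total number of iterations, and every fourth draw of the remainder is kept. Software packages differ in how they count burn-in and thinning, so the figure may not match what the authors' software reported.

```python
def retained_draws(total_iterations: int, burn_in: int, thin: int) -> int:
    """Count draws kept after discarding the burn-in and keeping every `thin`-th draw."""
    if thin < 1:
        raise ValueError("thin must be at least 1")
    if burn_in >= total_iterations:
        raise ValueError("burn_in must be smaller than total_iterations")
    # Indices of the kept iterations: burn_in, burn_in + thin, ...
    return len(range(burn_in, total_iterations, thin))


# Settings quoted in the conclusion.
kept = retained_draws(total_iterations=1_000_000, burn_in=500_000, thin=4)
print(f"posterior draws retained: {kept}")
```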
This research is expected to help the local government anticipate matters related to migration
to the Province of Yogyakarta, such as the improvement of school systems and facilities and the procurement of
student-friendly public transport.
Further research could use rare event logistic regression analysis to reduce bias in
the prediction of the minority class; latent class regression analysis could also be considered as another option.
5. References
Dillon, R., & Goldstein, M. (1984). Multivariate Analysis: Methods and Applications. John Wiley & Sons.
Dustmann, C., & Glitz, A. (2011). Migration and Education. In Handbook of the Economics of Education (pp. 327–
439). Elsevier. https://doi.org/10.1016/B978-0-444-53444-6.00004-3
Genkin, A., Lewis, D. D., & Madigan, D. (2007). Large Scale Bayesian Logistic Regression for Text Categorization.
Technometrics, 49(3), 291–304.
Ghosh, J. K., Delampady, M., & Samanta, T. (2007). An Introduction to Bayesian Analysis: Theory and Methods.
Springer Science & Business Media.
Gifi, A. (1981). Nonlinear Multivariate Analysis. University of Leiden.
Hagen-Zanker, J., & Mallett, R. (2016). Journeys to Europe: The Role of Policy in Migrant Decision Making – Policy
Brief.
Helderman, A. C., van Ham, M., & Mulder, C. H. (2006). Migration and Home Ownership. Tijdschrift Voor Economische
En Sociale Geografie, 97(2), 111–125. https://doi.org/10.1111/j.1467-9663.2006.00506.x
Hosmer, D. W., & Lemeshow, S. (2013). Applied Logistic Regression (Third Edition). John Wiley & Sons.
King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137–163.
Lee, E. S. (1976). Teori Migrasi. Pusat Penelitian Kependudukan UGM.
Lestari, F. N. F. (2017). Analisis Diskriminan Dengan Komponen Utama Non Linier (Akunl) Pada Data Berskala
Campuran (Studi Pada Mahasiswa Bidikmisi Fmipa Universitas Brawijaya Angkatan Tahun 2014). Universitas
Brawijaya.
Liu, H., & Powers, D. A. (2012). Bayesian Inference for Zero-Inflated Poisson Regression Models. Journal of
Statistics: Advances in Theory and Applications, 7(2), 155–188.
Mantra, I. B. (2012). Demografi Umum. Pustaka Pelajar.
Markos, A. I., Vozalis, M. G., & Margaritis, K. G. (2010). An Optimal Scaling Approach to Collaborative Filtering
Using Categorical Principal Component Analysis and Neighborhood Formation. In IFIP Advances in
Information and Communication Technology (pp. 22–29). Springer. https://doi.org/10.1007/978-3-642-16239-8_6
Meulman, J., van der Kooij, A., & Heiser, W. (2004). Principal Components Analysis With Nonlinear Optimal
Scaling Transformations for Ordinal and Nominal Data. In Handbook of Quantitative Methodology for the
Social Sciences (pp. 50–71). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311.n3
Munir, R. (2010). Migrasi dalam Dasar-Dasar Demografi (2nd Ed). Lembaga Demografi FEUI dan Lembaga
Penerbit Salemba Empat.
Ntzoufras, I. (2011). Bayesian Modeling Using WinBUGS. John Wiley & Sons.
Owen, A. B. (2007). Infinitely Imbalanced Logistic Regression. In Journal of Machine Learning Research (Vol. 8).
Rachev, S. T., Hsu, J. S., Bagasheva, B. S., & Fabozzi, F. J. (2008). Bayesian Methods in Finance. John Wiley &
Sons.
Sarmita, I. M., & Simamora, A. H. (2018). Karakteristik Sosial Ekonomi Dan Tipologi Migrasi Migran Asal Jawa Di
Kuta Selatan-Bali. Jurnal Ilmiah Ilmu Sosial, 4(2).
Syairozi, M., & Wijaya, K. (2020). Migrasi Tenaga Kerja Informal: Studi Pada Kecamatan Sukorejo Kabupaten
Pasuruan. Seminar Nasional Sistem Informasi (SENASIF), 4(1), 2383–2394.
The European Institute for Gender Equality (EIGE). (2020). Gender mainstreaming Sectoral Brief: Gender and
Migration. The European Institute for Gender Equality.
Tjiptoherijanto, P. (2009). Dimensi Kependudukan Dalam Pembangunan Berkelanjutan. Bappenas.
Todaro, M. P. (2000). Pembangunan Ekonomi (5th Ed). Bumi Aksara.
UNECE Policy Brief on Ageing. (2016). Migration and Older Age: Older migrants and migrant care workers.
United Nations General Assembly. (2019). The impact of migration on migrant women and girls: a gender
perspective.
Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists (9th
ed.). Pearson Education, Inc.
Zaiceva, A. (2014). The impact of aging on the scale of migration. IZA World of Labor, Institute of Labor Economics
(IZA).
