Modeling Student Mathematics Achievement in Senior High School Based on Selection Results Using the GEE2 Method with Natural Spline

Every school has a vision and mission to become a superior institution so that it can compete and gain the trust of the public. To achieve this, schools select new students at the beginning of each academic year. In the Lumajang region, new student admission (PPDB) uses several components, such as the national test score (NUN) and the Mapping/Placement test (MP). This research explores the best model of the relationship between the selection components (and other conditions of students at the time of selection) and academic achievement during high school (in the form of semester mathematics grades) from semester 1 to semester 5 at 3 schools in the Lumajang region. We apply the Generalized Estimating Equation of order 2 (GEE2) with a natural spline. The results show that (i) the three schools have different models, and PGRI has the highest mean, followed by SMA1 and SMA3, as shown by significant negative estimates of the coefficients; (ii) distance from school has a negative contribution to the mathematics grade, as shown by a small but significant negative coefficient; (iii) the Junior High School NUN has a nonlinear (and nonparametric) contribution, as shown by the graphical representation and the coefficients of the natural spline; (iv) the Placement Test score contributes positively and significantly to the semester mathematics grade.


INTRODUCTION
Every school has a vision and mission to become a superior institution so that it can compete and gain the trust of the public. To realize this, schools recruit new students (PPDB) at the beginning of each school year. In Lumajang, PPDB uses several requirements or criteria, such as the Junior High School National Test Score (NUN) and the Mapping Test. Other conditions of the accepted students, such as distance of residence from school and parents' income, may also affect their achievement in senior high school and need to be considered.
This study models the relationship between selection scores (serving as predictors) and students' mathematics achievement during 5 semesters in high school (serving as responses). The common statistical methods for modeling the relationship between responses and predictors are the various regression analyses (statistical models). Statistical models developed from the linear model, which has one or more predictors but a single univariate, independent response with a Gaussian (normal) distribution. The linear model was extended to the Generalized Linear Model (GLM) to accommodate responses that are not normally distributed but are still independent (Nelder and Wedderburn 1972). For data whose responses may be both non-normal and correlated, Liang and Zeger in 1986 introduced the Generalized Estimating Equation (GEE), a multivariate generalization of GLM. In the GEE method, it is necessary to select an appropriate correlation structure that can describe the correlation among responses. The best model in the GEE method is selected using the Quasi-likelihood under the Independence model Criterion (QIC). GEE2 extends GEE by introducing an estimating equation for the scale parameter, which is solved simultaneously with the first estimating equation of ordinary GEE (Tirta et al., 2016). To accommodate nonlinearity (indicated by data that tend to rise or fall sharply) and to produce a well-behaved curve, GEE2 may be combined with natural spline or B-spline methods to include a nonparametric component.
This research aims to (i) find the best model for describing the relationship between semester mathematics achievement during high school and the various components of student selection commonly used in the Lumajang region, and (ii) describe the components that are significantly related to semester mathematics achievement, by applying the GEE2 method with a possible extension using a natural spline. The findings will help schools.
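For reference, the mean estimating equation of GEE and the QIC criterion can be written in their usual textbook form as below. The notation is an assumption made here for illustration and is not taken from the source: $y_i$ is the vector of five semester grades of student $i$, $\mu_i(\beta)$ its modeled mean, $A_i$ the diagonal matrix of variance functions, $R(\alpha)$ the working correlation matrix, and $\phi$ the scale parameter.

```latex
\begin{align}
\sum_{i=1}^{n} D_i^{\top} V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) &= 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta^{\top}}, \\
V_i &= \phi\, A_i^{1/2}\, R(\alpha)\, A_i^{1/2}, \\
\mathrm{QIC}(R) &= -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
  + 2\, \operatorname{trace}\bigl(\hat{\Omega}_I\, \hat{V}_R\bigr).
\end{align}
```

Here $Q(\cdot\,; I)$ is the quasi-likelihood evaluated under the independence working model, $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under independence, and $\hat{V}_R$ is the robust (sandwich) covariance estimate under the working correlation $R$. GEE2, as described above, adds a second estimating equation for the scale parameter that is solved together with the first; its exact form depends on the implementation and is not reproduced here.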
Since the responses are vectors (assumed to be correlated with each other), the most appropriate method is GEE, in particular GEE2 with a natural spline extension. With GEE2 we can model the mean and the scale parameters (depending on the distribution of the response variable), a correlation structure for the correlation among responses, and a nonparametric component in the form of a natural spline. The best (most appropriate) model in terms of the number of predictors, correlation structure, response distribution and link, and nonparametric components is selected by computing the QIC and choosing the model with the smallest QIC value.
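The two working correlation structures considered as candidates in this study, exchangeable and AR-1, can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the value 0.7 is only an illustrative correlation, not an estimate from the fitted model.

```python
def exchangeable(n, alpha):
    """n x n working correlation with one common off-diagonal value alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1(n, alpha):
    """n x n working correlation that decays as alpha ** |j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


# Five semester grades per student; alpha = 0.7 is an assumed, illustrative value.
n_semesters = 5
alpha = 0.7

R_exch = exchangeable(n_semesters, alpha)
R_ar1 = ar1(n_semesters, alpha)

# Implied correlation between semester 1 and semester 5 under each structure.
print("exchangeable:", R_exch[0][4])
print("AR-1:", round(R_ar1[0][4], 4))
```

Under the exchangeable structure every pair of semesters shares the same correlation, whereas under AR-1 the correlation shrinks with the gap between semesters. Comparing the two against the observed pairwise correlations gives a first impression of which structure is plausible before the QIC is computed.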

RESULTS AND DISCUSSION
We begin by exploring the correlation among the semester grades (RS1, RS2, RS3, RS4, RS5). As shown in Figure 1, the correlations among (RS1, RS2, RS3, RS4) are relatively constant at around 0.7, but the correlations with RS5 mostly drop to around 0.5. Therefore two types of correlation structure, AR-1 and exchangeable, are worth considering. To find the most suitable predictors, we start with an initial model that includes all the available Xs as predictors; the results (the estimates and their p-values) are as follows.
After considering the smoother and the predictor for the scale parameter, we then check some candidates for distributions and correlation structures. Since the response is a continuous variable, we consider Gamma (log) and Gaussian as candidate distributions, while the candidate correlation structures are exchangeable and AR-1. So we have several candidate models to explore. We compute the QIC value for each model and choose the model with the smallest QIC as the best (most appropriate) model. Based on this criterion, we find that the QIC values are similar for Gaussian and Gamma, where for AR-1 the QIC = 3017.701 and for exchangeable the QIC = 3017.701. Therefore we choose Gaussian (identity) and exchangeable as the most appropriate (final) model. The fitting gives rise to the following estimates (we report only the estimates and their p-values, since space is limited). Although parents' income does not contribute significantly in either the mean model or the scale model, we retain it in the scale model since it improves the model by slightly reducing the standard errors (se) of the estimates.
The results show that student achievement, as measured by the semester mathematics grades, is positively correlated following an exchangeable structure, which means the correlations are significant and relatively constant. The 3 schools have different models: SMA1 has a lower mean mathematics grade than SMA PGRI, and SMA3 has the lowest of all, as shown by significant negative estimates of the coefficients. However, this finding does not indicate the quality of learning in each school since, except for the NUN, the predictors and the grades are local and may not be comparable with each other. Distance from school has a negative contribution to the mathematics grade, as shown by a small but significant negative coefficient. This result agrees with Surani (2012), who found that the closer students live to the school, the more opportunity they have to access school facilities; distance may also be related to students' fitness in attending class. Perhaps the unexpected result is that the Junior High School NUN has no significant linear contribution, but has a slightly nonlinear (and nonparametric)

CONCLUSION
The fitting of students' mathematics achievement using GEE of order 2 gives the following results:
1. Student achievement during high school, in the form of mathematics report grades from semester 1 to semester 5, is positively correlated with an exchangeable structure.
2. The 3 schools have different models: SMA1 has a lower mean mathematics grade than SMA PGRI, and SMA3 has the lowest of all.
3. Distance of student residence from school has a negative contribution to the mathematics grade, as shown by a negative and significant coefficient.
4. The Junior High School National Test Scores have no significant linear contribution, but have a slightly nonlinear (and nonparametric) contribution.
5. Local Test (Placement Test) scores contribute positively and significantly to the semester mathematics grade.

Recommendations
Further study is needed to model the relationship between local high school mathematics achievement (semester grades, together with other non-academic factors) and the senior high school national test score or acceptance/rejection to university, to find out how the various local/internal scores relate to national or regional tests.

Figure 2. Correlation between the junior
Figure 1. The plot of NUN and Mathematics Grade (for 5 semesters) using a natural spline with 3 degrees of freedom.

Figure 2 indicates a nonlinear and nonparametric relationship between NUN and Mathematics Grade. So for the next model exploration we consider NUN as a nonparametric component (smoother) using a natural spline with 3 degrees of freedom.
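To make the role of the natural spline with 3 degrees of freedom concrete, the following is a minimal sketch of a natural cubic spline basis in its truncated-power form. It is an illustration under stated assumptions, not the authors' implementation: the function names and the knot positions are hypothetical, and statistical software typically uses a different but equivalent parameterization, so the fitted coefficients would not match term by term.

```python
def natural_spline_basis(x, knots):
    """Natural cubic spline basis at x (truncated-power form, no intercept).

    With K knots this returns K - 1 columns: x itself plus K - 2 cubic terms
    that are constrained to be linear beyond the first and last knots.
    """
    def pos_cube(u):
        # Truncated cubic: u**3 for u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    last = knots[-1]

    def d(k):
        return (pos_cube(x - knots[k]) - pos_cube(x - last)) / (last - knots[k])

    K = len(knots)
    return [x] + [d(k) - d(K - 2) for k in range(K - 2)]


# Hypothetical knot positions on the NUN scale (NOT taken from the study):
# two boundary knots and two interior knots give 3 columns, i.e. 3 df.
knots = [20.0, 26.0, 32.0, 38.0]

row = natural_spline_basis(30.0, knots)
print(len(row))  # number of basis columns for one NUN value


def second_difference(column, x, h=1.0):
    """Discrete curvature of one basis column around x."""
    f = lambda t: natural_spline_basis(t, knots)[column]
    return f(x + h) - 2 * f(x) + f(x - h)


# Curvature is present between the knots and vanishes beyond the last knot.
print(second_difference(1, 30.0))
print(second_difference(1, 40.0))
```

In a regression, the three columns would enter the mean model in place of a single linear NUN term, so the NUN effect is described by three coefficients and is best read from the fitted curve rather than from any single coefficient.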

contribution, as shown by the graphical representation and the coefficients of the natural spline. This finding needs further investigation, since the NUN is a national-level score whereas the semester mathematics grades are local and school based. In contrast to the nationwide NUN score, the Local Test (Placement Test) score contributes positively and significantly to the semester mathematics grade. The final model can be formulated as follows.