Does poor educational infrastructure influence school dropout and child labor in Brazil?
This project (Write a Data Science Blog Post) is part of the Udacity Data Scientist Nanodegree Program. The detailed analysis, with all the required code, is available in my GitHub repository.
The current school dropout rate in Brazil is 20%, and more than 1.8 million children are in child labor. It’s common sense to associate high-quality school infrastructure with low school dropout rates, but does this hold in the Brazilian reality?
The objective of this article is to apply some Data Science techniques I’ve learned in the Udacity Data Scientist Nanodegree Program to extract valuable insights while exploring and analyzing data to answer the following questions:
- Is school dropout rate higher in schools with less infrastructure?
- Is child labor rate higher in regions where the schools have less infrastructure?
- Is child labor rate higher in regions with high school dropout rate?
The federal government’s website provides access to data from its different areas, such as education.
I joined two datasets: the 2012 school infrastructure data (link) and the 2013 ENEM survey data on school dropout and child labor (link). Both sources are in Brazilian Portuguese.
ENEM is a nationwide exam whose grades students can use to apply to universities across Brazil and also in Portugal.
When taking the exam, candidates fill in a socioeconomic survey in which they answer questions about their studies, social conditions and family composition, among other aspects. I used some of those answers to identify candidates who worked as children or had dropped out of school at least once.
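Since the ENEM data has one row per candidate and the infrastructure data one row per school, one way to join them is to aggregate both to the city level first. The sketch below assumes both files are already loaded into pandas DataFrames sharing a municipality code; the column names (`city_code`, `dropped_out`, `child_labor`, `infra_score`) and the toy rows are hypothetical, not the actual fields of the government files.

```python
import pandas as pd

# Hypothetical toy data: one row per ENEM candidate (2013 survey answers)
candidates = pd.DataFrame({
    "city_code": [1, 1, 2, 2, 3],
    "dropped_out": [True, False, False, False, True],
    "child_labor": [True, True, False, False, False],
})

# Hypothetical toy data: one row per school (2012 infrastructure data)
schools = pd.DataFrame({
    "city_code": [1, 2, 2, 4],
    "infra_score": [3.0, 5.0, 7.0, 2.0],
})

# Aggregate each dataset to the city level: share of candidates with each flag,
# and the average infrastructure score of the city's schools
city_rates = candidates.groupby("city_code")[["dropped_out", "child_labor"]].mean()
city_infra = schools.groupby("city_code")["infra_score"].mean()

# Inner join keeps only cities present in both datasets
cities = city_rates.join(city_infra, how="inner").reset_index()
print(cities)
```

An inner join drops cities that appear in only one source (cities 3 and 4 here), which is worth checking on the real data, since the two datasets come from different years.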
The charts below show that the school dropout problem isn’t confined to a particular region: the Brazilian dropout rate based on ENEM candidate answers is 4.3%, and states in four regions have rates above this number. The only region where all states are below the national rate is the Northeast.
The 20% school dropout rate reported by a national survey conducted by IBGE (the national research institute) suggests that ENEM candidates may differ from the general Brazilian profile.
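Comparing each state with the national rate can be sketched in pandas as follows. This assumes one row per candidate with a boolean `dropped_out` flag already derived from the survey answers; the column names and toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per ENEM candidate, with state (UF), region
# and a boolean flag derived from the survey answers
candidates = pd.DataFrame({
    "state": ["MS", "MS", "MA", "MA", "SP", "SP", "RS", "RS"],
    "region": ["Midwest", "Midwest", "Northeast", "Northeast",
               "Southeast", "Southeast", "South", "South"],
    "dropped_out": [True, False, False, False, True, False, False, False],
})

# National rate: share of all candidates who dropped out at least once
national_rate = candidates["dropped_out"].mean()

# Rate per state, keeping only the states above the national rate
state_rates = candidates.groupby(["region", "state"])["dropped_out"].mean()
above = state_rates[state_rates > national_rate]

# Regions with at least one state above the national rate
regions_above = sorted(above.index.get_level_values("region").unique())
print(f"national rate: {national_rate:.1%}")
print(regions_above)
```

The same code applies to the child labor rate by swapping in the corresponding flag column.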
Based on Brazilian law, I classified as child labor every candidate who stated they had started working before the age of 14, since children under 14 are forbidden to work under any circumstances. According to IBGE, Brazil had 1.8 million children in child labor in 2019.
As with school dropout, child labor is not specific to a single region. The Brazilian child labor rate based on ENEM candidate answers is 10.2%, and states in four regions have rates higher than the national average. The only exception is again the Northeast, where all states are below the national rate.
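The classification rule above can be sketched as a simple threshold. This assumes the survey answer has already been converted into a numeric starting age (`start_work_age`, a hypothetical column), with missing values for candidates who never worked; the real survey may encode this answer differently.

```python
import pandas as pd

# Minimum legal working age used as the child labor threshold in this analysis
MIN_WORKING_AGE = 14

# Hypothetical survey answers: age at which each candidate started working,
# with None (stored as NaN) for candidates who never worked
answers = pd.DataFrame({"start_work_age": [12, 14, 16, None, 13]})

# A candidate counts as child labor if they started working before turning 14;
# comparisons with NaN evaluate to False, so non-workers are not flagged
answers["child_labor"] = answers["start_work_age"] < MIN_WORKING_AGE
print(answers["child_labor"].mean())
```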
In both aspects, school dropout and child labor, Midwest candidates have the highest rates, while the Northeast sits at the opposite end. Proportionally, Mato Grosso do Sul (MS) has 5 times as many candidates stating they had dropped out of school at least once, and 3 times as many who had started working before turning 14, as Maranhão (MA), which is among the lowest rates in both charts.
To dive deeper into this analysis, I used another dataset with specific information about school infrastructure across the country, such as the number of bathrooms available to students, the electricity source and the sewage system.
I was particularly surprised to find several towns where the schools have no bathrooms available to students, no electricity or not even a water supply.
The charts above show a seemingly contradictory picture: almost half of the schools in the North have no bathroom and almost a quarter have no electricity, yet it is the region with the highest meal offering rate. Around 95% of Northern schools offer meals to students during school hours, and the rate is even higher among Northern schools with no electricity: 99% of them provide meals.
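The per-region shares above, including the meal rate restricted to schools without electricity, could be computed along these lines. This is a sketch assuming one row per school with boolean `bathroom`, `electricity` and `meals` columns; these names and the toy rows are hypothetical, not the dataset’s field names.

```python
import pandas as pd

# Hypothetical toy data: one row per school with boolean infrastructure flags
schools = pd.DataFrame({
    "region": ["North", "North", "North", "North", "South", "South"],
    "bathroom": [False, True, False, True, True, True],
    "electricity": [False, True, True, True, True, True],
    "meals": [True, True, True, False, True, False],
})

# Per region: share of schools lacking each feature, and share offering meals
summary = schools.groupby("region").agg(
    no_bathroom=("bathroom", lambda s: (~s).mean()),
    no_electricity=("electricity", lambda s: (~s).mean()),
    meals=("meals", "mean"),
)

# Meal rate restricted to Northern schools without electricity
north_no_power = schools[(schools["region"] == "North") & ~schools["electricity"]]
print(summary)
print(north_no_power["meals"].mean())
```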
School infrastructure score
According to Joaquim Soares Neto, a professor at Universidade de Brasília, a school’s infrastructure seems to affect the quality of the education it provides. In a study with other researchers, he found that 84.5% of Brazilian schools lack the basic structure for a quality education.
In this study, the researchers divided the features a school can have into two categories: basic and advanced. The basic category includes electricity and water supply, sewage disposal, bathrooms and a few pieces of equipment such as TVs, DVRs, computers and printers. All other items make up the advanced category, for example a teachers’ room, library, science lab and sports court.
To measure a school’s infrastructure, I assigned different weights to each basic or advanced feature it has and combined them into a single score. This value let me compare different combinations of features effectively.
The Southern, Midwestern and Southeastern regions have similar average infrastructure scores, and their states make up the top half of the infrastructure-per-state chart.
When it comes to connecting school dropout with the infrastructure score, the charts above suggest there isn’t a strong relationship. The Northeastern region has the lowest school dropout rate, yet its highest-scoring state in terms of infrastructure, Ceará (CE), which has a low school dropout rate compared with the other states, is not even in the “top 10” for school infrastructure.
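The actual weights are not listed in this post, so the sketch below uses hypothetical ones (1 for each basic feature, 2 for each advanced feature) purely to illustrate the mechanics: a school’s score is the weighted sum of the features it has. The feature column names are also hypothetical.

```python
import pandas as pd

# Hypothetical weights for illustration only; the actual weights used in the
# analysis are not given here. Feature names follow the basic/advanced split.
WEIGHTS = {
    # basic features
    "electricity": 1.0, "water": 1.0, "sewage": 1.0, "bathroom": 1.0, "computer": 1.0,
    # advanced features
    "library": 2.0, "science_lab": 2.0, "sports_court": 2.0, "teachers_room": 2.0,
}

def infrastructure_score(schools: pd.DataFrame) -> pd.Series:
    """Weighted sum of the 0/1 feature columns present in `schools`."""
    features = [name for name in WEIGHTS if name in schools.columns]
    weights = pd.Series({name: WEIGHTS[name] for name in features})
    # Each school's score is the dot product of its feature flags and the weights
    return schools[features].astype(float) @ weights

schools = pd.DataFrame({
    "electricity": [1, 1, 0],
    "water": [1, 1, 1],
    "bathroom": [1, 0, 0],
    "library": [1, 0, 0],
    "science_lab": [0, 0, 0],
})
schools["infra_score"] = infrastructure_score(schools)
print(schools["infra_score"].tolist())
```

Giving advanced features a larger weight is just one possible design choice; the ranking of schools can change noticeably with different weights, so they are worth stating explicitly in any real analysis.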
Is school dropout rate higher in schools with less infrastructure?
By fitting a linear regression of each city’s school dropout rate on its average school infrastructure score, we can see that the trend is essentially flat, in fact slightly upward.
This means that I couldn’t find a significant reduction in the school dropout rate when comparing the highest school infrastructure scores with the lowest ones.
Even when analyzing a single state, for example Maranhão (MA), which has the lowest school dropout rate, it’s not possible to confirm a trend when comparing these values.
So the answer to this question is no: in this data, school dropout is not higher where school infrastructure is poorer.
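A minimal sketch of such a fit with NumPy, using hypothetical city-level values. `np.polyfit` with `deg=1` performs an ordinary least-squares line fit, which is one common way to draw this kind of trend line; the original analysis may have used a different tool.

```python
import numpy as np

# Hypothetical city-level values: average infrastructure score and dropout rate
infra_score = np.array([2.0, 3.5, 4.0, 5.5, 7.0, 8.0])
dropout_rate = np.array([0.040, 0.038, 0.045, 0.041, 0.044, 0.043])

# Least-squares straight line: dropout_rate ≈ slope * infra_score + intercept
slope, intercept = np.polyfit(infra_score, dropout_rate, deg=1)
print(f"slope={slope:.5f}, intercept={intercept:.4f}")
```

A small positive slope, as in this toy data, corresponds to a trend line that is nearly flat and slightly upward.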
Is child labor rate higher in regions where the schools have less infrastructure?
Here’s an interesting outcome: the data shows that child labor is slightly higher where schools have more infrastructure, as the chart below shows.
Going deeper into this matter, I analyzed the charts for the states with the lowest and highest child labor rates, Piauí (PI) and Mato Grosso (MT), respectively.
Both states have similar, nearly flat trend lines, showing that child labor is not strongly connected with school infrastructure, whether the child labor rate is high or low.
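Fitting one trend line per state can be sketched by grouping the city-level data by state and running the same least-squares fit on each group. The state rows and values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical city-level data for two states
cities = pd.DataFrame({
    "state": ["PI"] * 4 + ["MT"] * 4,
    "infra_score": [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0],
    "child_labor": [0.06, 0.05, 0.07, 0.06, 0.15, 0.16, 0.14, 0.16],
})

def trend_slope(group: pd.DataFrame) -> float:
    """Slope of the least-squares line of child labor rate on infrastructure score."""
    slope, _intercept = np.polyfit(group["infra_score"], group["child_labor"], deg=1)
    return float(slope)

# Fit one trend line per state and compare the slopes
slopes = {state: trend_slope(group) for state, group in cities.groupby("state")}
print(slopes)
```

The same per-state loop works for the dropout analysis of Maranhão (MA) by changing the filtered states and the target column.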
Another aspect pointing in the opposite direction of the first impression is the 41% correlation between the two axes, which is considered moderate.
So the answer to this question is also no, although the data doesn’t allow us to state the opposite either.
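Assuming the 41% figure is a Pearson correlation coefficient (r = 0.41), it can be computed with pandas as below; the city-level values are hypothetical.

```python
import pandas as pd

# Hypothetical city-level values
cities = pd.DataFrame({
    "infra_score": [1.0, 2.0, 3.0, 4.0, 5.0],
    "child_labor": [0.08, 0.10, 0.09, 0.12, 0.11],
})

# Pearson correlation coefficient between the two axes (pandas' default method)
r = cities["infra_score"].corr(cities["child_labor"])
print(f"r = {r:.2f}")
```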
Is child labor rate higher in regions with high school dropout rate?
Performing a linear regression, I found a slightly upward trend line.
Unlike in the previous questions, the trend line shows a positive relationship between the two rates.
Going further, I repeated the analysis from the previous question for the states with the lowest and highest child labor rates, Piauí (PI) and Mato Grosso (MT), respectively.
Although the 31% correlation between the two axes is only considered moderate, the upward trend in all scenarios leads me to believe that school dropout and child labor are related in some way.
So the answer to this question is yes: the data shows that higher child labor rates are accompanied by higher school dropout rates.
According to the analyzed data, it’s not possible to confirm that poor school infrastructure directly influences school dropout and child labor rates.
As a Data Science exercise, this showed me that when the dataset is interesting, the CRISP-DM process tends to be fun and pleasant.