This repository scrapes several job websites in Egypt to gather information on labor market demands. This information can be useful for tracking occupational trends over time, informing educational investments, and designing curriculums that can help improve labor market outcomes.
In Egypt, as in many developing countries, timely data for driving key policy decisions and informing individuals on how and where to invest in education is limited. Data-driven policy decisions have typically been achieved only through considerable survey expense, and such data initiatives are often difficult for developing countries to fund given the many competing priorities and needs they face. Online job-ad data is a potential source of low-cost information on the skills in demand that could be useful for policy makers and learners.
This type of information is especially critical in Egypt, where nearly three-quarters of working-age people were out of the labor force, unemployed, informally employed, or poorly matched to their jobs (estimates from the Egypt Labour Force Survey 2014). Among the university-educated population, unemployment is especially high, standing at over 20 percent even for science, technology, engineering, and mathematics (STEM) degrees. One of the few exceptions is medical and health degrees, where supply is closely matched to labor market needs. While weak competition, a lack of good jobs, and limited labor mobility are part of the explanation, these outcomes likely also reflect an education system that has struggled to develop the critical skills graduates need to succeed in the labor market. Filling the much-needed gap in information on labor market trends and the skills in demand is an initial, data-driven step toward rectifying one of the major problems in labor market matching.
The two websites that are scraped are:
- OLX.com
- Wuzzuf.net
OLX.com contains nearly 130K job ads at any given time, with around one-third of these ads posted by job seekers and the other two-thirds advertising open vacancies. Most of these ads are listed in Arabic and therefore capture a good deal of the local market. Their geographic coverage is extensive, spanning nearly 365 regions and all 27 governorates of Egypt. Each job ad stays live for approximately 90 days or until removed by the user. The extent to which these ads represent valid, real job vacancies requires further investigation, and the frequency and volume of postings may also be correlated with the availability and accessibility of the internet in different areas. Still, this website was assessed as one of the better sources for online job advertisements in the context of Egypt.
Wuzzuf.net is a job platform targeted at export-oriented jobs and jobs in foreign companies. It contains about 5K job ads at any given time, with ads typically expiring after 30 days. Most job ads are listed in English. Wuzzuf is a professionalized platform where employers fill in many details about a vacancy's requirements and the desired qualifications and skills, and where job seekers can upload resumes and apply directly through the website. As a result, it allows tracking over time of the number of applications, resumes reviewed, and candidates shortlisted for each job ad. The geographic scope is currently limited primarily to greater Cairo and Alexandria. While the scope of these jobs is more limited, they provide a picture of the skill and educational requirements desired for some of the top private and non-governmental sector jobs in Egypt.
The data being gathered is stored in a SQLite database. As a result, the user should create the relevant databases by running OLXDatabaseConversion.py and WuzzufDatabaseConversion.py and ensuring that the function reset_tables() is uncommented. This code can also be used to update or replace tables as needed and to report key statistics on the data inserted in the various tables.
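A minimal sketch of what the reset_tables() step does, using Python's built-in sqlite3 module. The database filename, table name, and columns below are illustrative assumptions; the actual schemas live in OLXDatabaseConversion.py and WuzzufDatabaseConversion.py:

```python
import sqlite3

def reset_tables(db_path="olx_jobs.db"):
    """Drop and recreate the job-ads table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS job_ads")
    cur.execute("""
        CREATE TABLE job_ads (
            ad_id      TEXT PRIMARY KEY,  -- unique ID taken from the ad URL
            title      TEXT,
            region     TEXT,
            posted_on  TEXT,              -- ISO date the ad was first seen
            views      INTEGER            -- view count at the last scrape
        )
    """)
    conn.commit()
    conn.close()

reset_tables()
```

Because dropping and recreating tables is destructive, keeping the call commented out by default (as the scripts do) is a sensible safeguard against accidentally wiping collected data.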
The code that scrapes the websites are contained in scrapeEgyptOLX_cloudv2.py and scrapeWuzzuf_cloudv2.py. The code is designed to scrape the websites at daily intervals through a UNIX/LINUX based system where you set a crontab that runs the code once daily. The code scrapes each page on day 1, and at weekly intervals thereafter until the job ad expires or ceases to exist. This allows for some moderate tracking of job ad views on OLX.com and applications to different job ads on the Wuzzuf site over time.
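The day-1-then-weekly revisit rule described above can be sketched as a simple date check. This is an illustrative sketch, not the actual scheduling logic in the scrape scripts; a crontab entry along the lines of `0 2 * * * python3 /path/to/scrapeEgyptOLX_cloudv2.py` (the time of day is arbitrary) would then run the script once daily:

```python
from datetime import date

def due_for_scrape(first_seen: date, today: date) -> bool:
    """Scrape an ad on the day it is first seen, then every 7 days after."""
    days_elapsed = (today - first_seen).days
    return days_elapsed % 7 == 0

# An ad first seen on 1 March is re-scraped on 8 March, 15 March, ...
print(due_for_scrape(date(2021, 3, 1), date(2021, 3, 8)))   # True
print(due_for_scrape(date(2021, 3, 1), date(2021, 3, 10)))  # False
```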
This code draws the data from the SQLite database and does some basic cleaning and translation from Arabic to English using googletrans. At the moment it simply outputs key summary statistics by job sector, such as the share of managerial versus entry-level jobs, whether a bachelor's education is desired, and whether the job is full-time. The analysis is not yet optimized to take advantage of the potential time-series nature of the dataset or the description content that might allow for highlighting of the skills in demand.
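The kind of per-sector summary described above can be sketched with standard-library tools. The field names and categories here are hypothetical, not the actual column names in the database:

```python
from collections import defaultdict

def sector_summary(ads):
    """Share of managerial, bachelor's-desired, and full-time ads per sector."""
    counts = defaultdict(lambda: {"n": 0, "managerial": 0, "bachelors": 0, "full_time": 0})
    for ad in ads:
        c = counts[ad["sector"]]
        c["n"] += 1
        c["managerial"] += ad["level"] == "managerial"   # booleans add as 0/1
        c["bachelors"] += ad["bachelors_desired"]
        c["full_time"] += ad["full_time"]
    return {
        sector: {k: c[k] / c["n"] for k in ("managerial", "bachelors", "full_time")}
        for sector, c in counts.items()
    }

ads = [
    {"sector": "IT", "level": "managerial", "bachelors_desired": True, "full_time": True},
    {"sector": "IT", "level": "entry", "bachelors_desired": True, "full_time": False},
]
print(sector_summary(ads))  # {'IT': {'managerial': 0.5, 'bachelors': 1.0, 'full_time': 0.5}}
```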
This code cleans the Wuzzuf data stored in the SQL database to turn it into a dataset ready for basic analysis. This step also performs some basic tagging of words to identify the key skills in demand for various occupations, cleaning the job requirements section to focus on key words associated with skills and qualifications. Skill and qualification demands are initially visualized through word clouds (see AnalyzeWuzzuf.py).
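The keyword-tagging step can be sketched as tokenizing the requirements text and counting matches against a skill vocabulary. The vocabulary below is illustrative, not the curated list used in AnalyzeWuzzuf.py:

```python
import re
from collections import Counter

# Illustrative skill vocabulary -- the real list would be larger and curated.
SKILL_TERMS = {"python", "sql", "excel", "accounting", "sales", "english"}

def tag_skills(requirements: str) -> Counter:
    """Count skill keywords appearing in a job ad's requirements text."""
    tokens = re.findall(r"[a-z+#]+", requirements.lower())
    return Counter(t for t in tokens if t in SKILL_TERMS)

text = "Strong Excel and SQL skills; fluent English. Excel modelling a plus."
print(tag_skills(text))  # Counter({'excel': 2, 'sql': 1, 'english': 1})
```

Frequency counts of this form can be passed directly to a word-cloud library to produce the visualizations mentioned above.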
A primary goal is to develop this information so that it is more useful for potential job seekers and students in Egypt, helping to inform educational and skill investments that improve their chances of entering better jobs. As a result, building a website that can draw on more real-time information and interactively link to the cleaned data and modeling of job trends will be a critical step in achieving this goal.