Job Quality Extractor

This project is about extracting dimensions of job quality from online job adverts. This work was funded by the Economic Statistics Centre of Excellence.

The term "job quality" refers to aspects of a job that affect worker wellbeing, for example how much the job pays and whether the contract is permanent. Most research on job quality rightly focuses on data from the employee's point of view, using surveys, interviews or, more recently, online reviews.

Here, we provide a method for identifying dimensions of job quality in online job adverts.

What dimensions of job quality do you extract?

We took as our starting point CIPD's seven dimensions of job quality:

pay and benefits
contract (elsewhere called terms of employment)
work-life balance
job design and the nature of work
relationships at work
employee voice
health and wellbeing

We added a further category, “barriers to access”, to our taxonomy, so that dimensions of job quality that directly affect marginalised groups can be gathered together. We made one more addition, “atmosphere, culture and environment”, which fits under “Social support and cohesion” and which we took from Sleeman 2024. Our taxonomy of job quality can be seen here.

Installation

To install the package, run

pip install git+https://github.com/nestauk/dap_job_quality.git

Quickstart

To extract dimensions of job quality from a single job advert or from a list of job adverts, use the extract_job_quality() method of the JobQuality class. This method takes a dataframe of job adverts as input and returns two objects:

- A dataframe with the job adverts split into sentences. Each sentence is labelled 0 or 1 according to whether it relates to job quality, and sentences labelled 1 are also matched to the taxonomy.
- A concise dict that contains just the ID of each advert and the target phrases it was matched to.

Example usage:

from dap_job_quality.pipeline.find_job_quality import JobQuality
import pandas as pd

# Initialize JobQuality class
job_quality = JobQuality()
job_quality.load()

# Example job adverts dataframe
job_adverts = pd.DataFrame(
    [
        {'id': 123, 'description': '[This is a job advert. It has many benefits such as a pension scheme and a cycle to work scheme.]'},
        {'id': 234, 'description': '[This is a job advert for a bank job. There are free childcare vouchers. We also offer a yearly bonus and generous salary.]'}
    ]
)

# Extract job quality
jq_df_filtered, job_id_to_target_phrase = job_quality.extract_job_quality(
    job_adverts, id_col="id", text_col="description"
)

The output dataframe jq_df_filtered should look like this:

id	description	clean_description	job_quality_label	sentences_split	ngrams	target_phrase	cosine_similarity	subcategory
123	[This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	a cycle to work	Cycle to work	0.965111	PERKS
123	[This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	many benefits such as	benefits	0.874949	PERKS
123	[This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	such as a pension	pension	0.821573	COMP
123	[This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	a pension scheme and	pension scheme	0.964935	COMP
234	[This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	There are free childcare vouchers.	There are free childcare vouchers.	childcare vouchers	0.838904	CARING
234	[This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	We also offer a yearly bonus and generous salary.	bonus and generous salary.	compensation	0.576268	COMP
234	[This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	We also offer a yearly bonus and generous salary.	a yearly bonus and	performance bonus	0.618560	COMP
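
The similarity scores in this example range from roughly 0.58 to 0.97, so a downstream user may want to keep only the stronger matches. The snippet below is a minimal sketch of that idea and is not part of the package: the helper name strong_matches_by_subcategory and the threshold of 0.8 are illustrative assumptions, and the rows are plain dicts of the shape that jq_df_filtered.to_dict("records") would produce.

```python
from collections import defaultdict

# Rows copied from the example output above (only the columns used here).
rows = [
    {"id": 123, "target_phrase": "Cycle to work", "cosine_similarity": 0.965111, "subcategory": "PERKS"},
    {"id": 123, "target_phrase": "benefits", "cosine_similarity": 0.874949, "subcategory": "PERKS"},
    {"id": 123, "target_phrase": "pension", "cosine_similarity": 0.821573, "subcategory": "COMP"},
    {"id": 123, "target_phrase": "pension scheme", "cosine_similarity": 0.964935, "subcategory": "COMP"},
    {"id": 234, "target_phrase": "childcare vouchers", "cosine_similarity": 0.838904, "subcategory": "CARING"},
    {"id": 234, "target_phrase": "compensation", "cosine_similarity": 0.576268, "subcategory": "COMP"},
    {"id": 234, "target_phrase": "performance bonus", "cosine_similarity": 0.618560, "subcategory": "COMP"},
]


def strong_matches_by_subcategory(rows, threshold=0.8):
    """Group target phrases by (advert id, subcategory), keeping only
    matches whose cosine similarity is at or above the threshold.

    Hypothetical helper; the threshold is an arbitrary illustration.
    """
    grouped = defaultdict(list)
    for row in rows:
        if row["cosine_similarity"] >= threshold:
            grouped[(row["id"], row["subcategory"])].append(row["target_phrase"])
    return dict(grouped)


result = strong_matches_by_subcategory(rows)
print(result)
```

With the threshold at 0.8, the two weaker COMP matches for advert 234 are dropped; lowering the threshold brings them back.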

Meanwhile, the more concise output, job_id_to_target_phrase, should look like this:

{
    123: ['Cycle to work', 'benefits', 'pension', 'pension scheme'],
    234: ['childcare vouchers', 'compensation', 'performance bonus']
}
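
Because the concise output is an ordinary dict, it can be summarised with the standard library alone. The following sketch, which is an illustration rather than part of the package, counts how many adverts mention each target phrase and inverts the mapping; the variable names phrase_counts and phrase_to_job_ids are assumptions introduced here.

```python
from collections import Counter

# The concise output from the example above.
job_id_to_target_phrase = {
    123: ["Cycle to work", "benefits", "pension", "pension scheme"],
    234: ["childcare vouchers", "compensation", "performance bonus"],
}

# Count in how many adverts each target phrase appears.
phrase_counts = Counter(
    phrase
    for phrases in job_id_to_target_phrase.values()
    for phrase in set(phrases)  # set() so a phrase counts once per advert
)

# Invert the mapping: target phrase -> sorted list of advert IDs mentioning it.
phrase_to_job_ids = {}
for job_id, phrases in job_id_to_target_phrase.items():
    for phrase in set(phrases):
        phrase_to_job_ids.setdefault(phrase, []).append(job_id)
phrase_to_job_ids = {p: sorted(ids) for p, ids in phrase_to_job_ids.items()}

print(phrase_counts.most_common(3))
print(phrase_to_job_ids["childcare vouchers"])
```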

How does it work?

The pipeline comprises four basic steps:

1. Clean the text minimally, then separate the advert into sentences
2. Classify the sentences as either relating to job quality (e.g. "We are a friendly supportive team") or not relating to job quality (e.g. "You must have a friendly supportive demeanour")
3. Chunk up the sentences
4. Match the sentence chunks to the taxonomy (our taxonomy of job quality can be seen here)

You can find more detail on these steps in the documentation.
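
To make the shape of these four steps concrete, here is a toy sketch using only the Python standard library. It is not the project's implementation: the cue-word check stands in for the trained sentence classifier, bag-of-words cosine similarity stands in for whatever similarity model the package uses, and TOY_TAXONOMY, CUE_WORDS, the n-gram size of 3 and the threshold of 0.8 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical miniature taxonomy: target phrase -> subcategory.
TOY_TAXONOMY = {
    "pension scheme": "COMP",
    "cycle to work": "PERKS",
    "childcare vouchers": "CARING",
}
# Hypothetical cue words standing in for the trained sentence classifier.
CUE_WORDS = {"benefits", "pension", "vouchers", "bonus", "salary"}


def split_sentences(text):
    """Step 1: minimal cleaning, then a naive split on sentence-final punctuation."""
    cleaned = re.sub(r"\s+", " ", text.strip("[] \n"))
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", cleaned) if s.strip()]


def is_job_quality(sentence):
    """Step 2: stand-in classifier that flags sentences containing a cue word."""
    return bool(CUE_WORDS & set(re.findall(r"[a-z]+", sentence.lower())))


def ngrams(sentence, n=3):
    """Step 3: chunk a sentence into overlapping word n-grams."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two phrases."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def extract(text, threshold=0.8):
    """Step 4: match each chunk to its closest target phrase, keeping strong matches."""
    matches = set()
    for sentence in split_sentences(text):
        if not is_job_quality(sentence):
            continue
        for chunk in ngrams(sentence):
            phrase = max(TOY_TAXONOMY, key=lambda p: cosine(chunk, p))
            if cosine(chunk, phrase) >= threshold:
                matches.add((phrase, TOY_TAXONOMY[phrase]))
    return sorted(matches)


advert = (
    "[This is a job advert. It has many benefits such as a pension scheme "
    "and a cycle to work scheme.]"
)
print(extract(advert))
# [('cycle to work', 'PERKS'), ('pension scheme', 'COMP')]
```

The first sentence of the advert is discarded at step 2 because it contains no cue word, so only chunks from the second sentence reach the matching step.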

Developer setup

Meet the data science cookiecutter requirements. In brief:
- Install direnv and conda
Run make install to configure the development environment:
- Set up the conda environment
- Configure pre-commit
Download the spaCy model: python -m spacy download en_core_web_sm

Contributor guidelines

Technical and working style guidelines

This project was made possible by funding from the Economic Statistics Centre of Excellence.

Project based on Nesta's data science project template (Read the docs here).
