Patent classification

This project aims to classify patents at the subclass level of the IPC and CPC
systems. There are more than 600 target classes in total.

The project pipeline consists of the following steps:

Extract dataset
EDA of the dataset
Train a model

A Jupyter notebook is provided for each of the above tasks.

The dataset for the classification task is generated with Google BigQuery and stored as CSV files, with a separate file for each year from 2009 to 2019. The dataset is publicly available for experimentation. The attributes of these CSV files are shown in the table below:

ID	Date	Title	Claim	cpc_subclass
8844051	2014-09-23	Lithium-ion secondary battery	A lithium-ion secondary battery comprising ...	H01M,Y02E,Y02T
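
As the sample row shows, the cpc_subclass attribute can hold several comma-separated subclass codes, so each patent may carry more than one label. The sketch below is one possible way to read such rows and split that field into a list. It assumes the files are comma-delimited with a header row matching the table and with quoted text fields; the actual layout may differ. The `read_patents` helper and the `SAMPLE_CSV` string are illustrative names, not part of the project.

```python
import csv
import io

# Sample row copied from the table above. The comma delimiter and the
# quoting of the Claim and cpc_subclass fields are assumptions about
# how the real CSV files are laid out.
SAMPLE_CSV = (
    "ID,Date,Title,Claim,cpc_subclass\n"
    '8844051,2014-09-23,Lithium-ion secondary battery,'
    '"A lithium-ion secondary battery comprising ...","H01M,Y02E,Y02T"\n'
)


def read_patents(handle):
    """Yield one dict per patent, with cpc_subclass split into a list of codes."""
    for row in csv.DictReader(handle):
        labels = [code.strip() for code in row["cpc_subclass"].split(",") if code.strip()]
        yield {**row, "cpc_subclass": labels}


patents = list(read_patents(io.StringIO(SAMPLE_CSV)))
print(patents[0]["cpc_subclass"])  # ['H01M', 'Y02E', 'Y02T']
```

The same function works on a real file by passing `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.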

Links to download the dataset by year are provided below.

2009 CSV Link
2010 CSV Link
2011 CSV Link
2012 CSV Link
2013 CSV Link
2014 CSV Link
2015 CSV Link
2016 CSV Link
2017 CSV Link
2018 CSV Link
2019 CSV Link
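
Once the yearly files are downloaded, a first EDA step could be to count how often each subclass occurs across all years. The following is a minimal sketch under stated assumptions, not the code from the project notebooks: it assumes the files are saved locally under the hypothetical names `2009.csv` through `2019.csv` and that they are comma-delimited with a `cpc_subclass` column as in the table above. The function names `yearly_paths` and `count_subclasses` are illustrative.

```python
import csv
from collections import Counter
from pathlib import Path


def yearly_paths(data_dir, first_year=2009, last_year=2019):
    """Return the per-year CSV paths (hypothetical names) that exist in data_dir."""
    candidates = (Path(data_dir) / f"{year}.csv" for year in range(first_year, last_year + 1))
    return [path for path in candidates if path.exists()]


def count_subclasses(csv_paths):
    """Count how often each CPC subclass code appears across the given files."""
    counts = Counter()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # One patent may list several comma-separated subclass codes.
                counts.update(
                    code.strip() for code in row["cpc_subclass"].split(",") if code.strip()
                )
    return counts
```

For example, `count_subclasses(yearly_paths("data")).most_common(10)` would list the ten most frequent subclasses, which gives a quick view of class imbalance before training.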
