Patent Data Analysis and Visualization

Overview

This repository contains code, resources, and workflows for analyzing patent data using Python, Apache Spark, AWS, and Microsoft Azure services. The objective of this project is to extract actionable insights and trends from patent datasets to aid intellectual property strategies and business decisions.

Architecture

The workflow follows a four-phase architecture:

Sourcing: Data is scraped and ingested from major patent repositories such as:
- Google Patents
- WIPO
- USPTO
- FPO
- Espacenet
Storage: Patent data is stored in cloud solutions:
- Amazon S3
- Microsoft Azure Blob Storage
ETL (Extract, Transform, Load):
- Tools Used: Apache Spark (Azure Databricks) and Delta Lake
- Data pipelines are built using Azure Data Factory to clean and transform data.
- The Medallion Architecture organizes the data into three layers:
  - Bronze: Raw ingestion
  - Silver: Filtered and cleaned data
  - Gold: Aggregated and analytics-ready data
Visualization: Insights are visualized using:
- Power BI
- Matplotlib & Seaborn (Python libraries)
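
The Bronze, Silver, and Gold layers described above can be illustrated with a minimal sketch. It uses plain Python rather than Spark or Delta Lake so that it runs anywhere; the record fields (`patent_id`, `title`, `year`, `country`), the sample rows, and the function names `to_silver` and `to_gold` are assumptions made for illustration and are not taken from the project's notebooks.

```python
from collections import Counter

# Hypothetical raw records as they might arrive from scraping (Bronze layer).
BRONZE = [
    {"patent_id": "P-001", "title": " Viral vector design ", "year": "2019", "country": "us"},
    {"patent_id": "P-002", "title": "Capsid engineering", "year": "2020", "country": "FR"},
    {"patent_id": "P-002", "title": "Capsid engineering", "year": "2020", "country": "FR"},  # duplicate
    {"patent_id": "P-003", "title": "", "year": "n/a", "country": "JP"},  # unusable year
    {"patent_id": "P-004", "title": "Phage display method", "year": "2020", "country": "jp"},
]


def to_silver(bronze):
    """Silver layer: drop duplicates and rows without a numeric year, normalize fields."""
    seen = set()
    silver = []
    for row in bronze:
        pid = row.get("patent_id")
        year = str(row.get("year", "")).strip()
        if not pid or pid in seen or not year.isdigit():
            continue
        seen.add(pid)
        silver.append({
            "patent_id": pid,
            "title": row.get("title", "").strip(),
            "year": int(year),
            "country": row.get("country", "").strip().upper(),
        })
    return silver


def to_gold(silver):
    """Gold layer: aggregate patent counts per year, ready for charting."""
    counts = Counter(row["year"] for row in silver)
    return dict(sorted(counts.items()))


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

In a Spark-based pipeline each layer would typically be a separate table, with the same filter-then-aggregate progression between them.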

Key Features

Web Scraping: Patent data is extracted using BeautifulSoup and Python scripts.
Preprocessing:
- Data cleaning
- Parsing XML, JSON, CSV, and PDF formats
Feature Engineering:
- Keyword extraction
- Citation network analysis
ETL Pipelines: Scalable data processing with Apache Spark.
Visualizations: Interactive charts for patent trends, keyword frequency, and metrics.
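
The two feature-engineering steps listed above, keyword extraction and citation network analysis, can be sketched with the standard library alone. This is an illustrative approximation, not the project's implementation: the stop-word list, the sample abstract, the patent IDs, and the function names `extract_keywords` and `citation_counts` are all hypothetical, and simple term frequency stands in for whatever extraction method the notebooks actually use.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real pipeline would likely use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def citation_counts(citations):
    """Count how often each patent is cited (its in-degree in the citation graph).

    `citations` maps a citing patent ID to the list of patent IDs it cites.
    """
    cited = Counter()
    for cited_ids in citations.values():
        cited.update(set(cited_ids))  # count each citing patent at most once
    return cited


# Hypothetical abstract and citation data, for illustration only.
abstract = (
    "A vector for gene delivery. The vector carries a modified capsid, "
    "and the capsid improves delivery of the vector."
)
keywords = extract_keywords(abstract)

citations = {"P-002": ["P-001"], "P-003": ["P-001", "P-002"], "P-004": ["P-001"]}
most_cited = citation_counts(citations).most_common(1)

print(keywords)
print(most_cited)
```

Counting in-degree is the simplest citation metric; a graph library would be the natural next step for measures such as centrality.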

Count of patents by year

Count of inventors by country

The development of countries' interest in patenting

Project Structure

├── Analysis of Patents on Virus Engineering.pdf   # PDF report on virus engineering patents
├── ETL_PROCESS.ipynb                              # Notebook for the ETL process
├── Interface_DEMO.rar                             # Demo interface (compressed file)
├── Patents_Scraping.ipynb                         # Notebook for web scraping patent data
├── Project_Architecture.png                       # Architecture diagram for the project
├── Project_Presentation.pdf                       # Project presentation file
├── Projet_visualizations.pdf                      # Visualizations and insights in PDF
└── README.md                                      # Project documentation

Installation

Prerequisites

Python 3.x
Apache Spark
AWS credentials for S3
Microsoft Azure access

Steps:

Clone the repository:

git clone https://github.com/your_username/patent-analysis.git
cd patent-analysis
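
Before running the notebooks, the prerequisites listed above can be checked with a short script. This is a sketch under several assumptions: that a local Spark installation exposes `spark-submit` on the `PATH`, that AWS credentials are supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, and that Azure storage access is configured through `AZURE_STORAGE_CONNECTION_STRING`. The project may configure access differently (for example through a credentials file or a Databricks workspace), in which case a failed check here does not necessarily indicate a problem.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a dict mapping each prerequisite to True or False."""
    env = os.environ if env is None else env
    return {
        "python3": sys.version_info >= (3,),
        # Assumes Spark is installed locally and spark-submit is on the PATH.
        "spark": shutil.which("spark-submit") is not None,
        # Assumes credentials are passed through environment variables.
        "aws_credentials": bool(
            env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY")
        ),
        "azure_storage": bool(env.get("AZURE_STORAGE_CONNECTION_STRING")),
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```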

Data Sources

The project leverages patent data from:

Google Patents
WIPO
USPTO
FPO
Espacenet

Usage

Data Scraping: Use Patents_Scraping.ipynb to collect and store patent data.
ETL Process: Run the ETL_PROCESS.ipynb notebook to clean, transform, and prepare the data.
Visualization: Load the processed data into Power BI or Python notebooks to generate insights.
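
Between the scraping and ETL steps above, records that arrive in different formats need a common shape. The sketch below shows one way to normalize JSON, CSV, and XML inputs into the same list of dictionaries using only the standard library. The field names (`id`, `title`), the XML layout, and the sample data are hypothetical and not taken from the notebooks; PDF parsing is left out because it requires a third-party library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_json(text):
    """Parse a JSON array of patent objects."""
    return [{"patent_id": r["id"], "title": r["title"]} for r in json.loads(text)]


def from_csv(text):
    """Parse CSV text with a header row containing id and title columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"patent_id": r["id"], "title": r["title"]} for r in reader]


def from_xml(text):
    """Parse <patent id="..."><title>...</title></patent> elements."""
    root = ET.fromstring(text)
    return [
        {"patent_id": p.get("id"), "title": p.findtext("title", default="")}
        for p in root.iter("patent")
    ]


# Hypothetical inputs, one per format.
json_text = '[{"id": "P-001", "title": "Viral vector design"}]'
csv_text = "id,title\nP-002,Capsid engineering\n"
xml_text = '<patents><patent id="P-003"><title>Phage display method</title></patent></patents>'

records = from_json(json_text) + from_csv(csv_text) + from_xml(xml_text)
for record in records:
    print(record)
```

Once every source yields the same record shape, the downstream cleaning and aggregation steps do not need to know which format a record came from.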

Contributions

Contributions are welcome! Follow these steps:

Fork this repository.
Create a new branch: git checkout -b feature/new-feature.
Commit your changes: git commit -m "Add new feature".
Push to the branch: git push origin feature/new-feature.
Submit a Pull Request.

Contact

For questions, feedback, or collaborations, contact:

Najma El boutaheri
Email: najmaelboutaheri@gmail.com

Acknowledgments

Special thanks to all contributors and the open-source libraries used in this project.

