A powerful Python tool for analyzing GitHub repositories, fetching README files, mapping repository structures, and extracting non-binary file contents. Ideal for developers, researchers, and project managers looking to gain insights into repository content efficiently.

waveupHQ/github-repo-analyzer

GitHub Repository Analyzer

This Python package provides tools for analyzing GitHub repositories, including fetching README files, repository structure, and non-binary file contents. It also generates structured outputs with pre-formatted prompts to guide further analysis of the repository's content.

Features

  • README Retrieval: Automatically extracts the content of README.md to provide an initial insight into the repository.
  • Structured Repository Traversal: Maps the repository's structure with an iterative traversal rather than a recursive one, so deeply nested directories do not run into recursion limits.
  • Selective Content Extraction: Retrieves the text contents of files and skips binary files, keeping the analysis focused on readable content.
  • Generate Repository Content File: Creates a text file containing all non-binary file contents from the repository, providing a comprehensive view of the repository's textual content in a single file.
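
The traversal and binary-skipping ideas above can be sketched as follows. This is a minimal illustration, not the package's implementation: the nested-dict repository, the `looks_binary` heuristic, and the `walk_text_files` name are all assumptions made for the example, and the real package works against the GitHub API rather than in-memory data.

```python
from typing import Any, Dict, List, Tuple


def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Heuristic (an assumption, not the package's rule): treat content as
    binary if the first `sample_size` bytes contain a NUL byte, or if the
    content as a whole is not valid UTF-8."""
    if b"\x00" in data[:sample_size]:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def walk_text_files(root: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Visit every entry with an explicit stack instead of recursion and
    return sorted (path, text) pairs for the files that look like text.

    `root` is a hypothetical stand-in for a repository: directories are
    dicts and files are raw bytes.
    """
    results: List[Tuple[str, str]] = []
    stack: List[Tuple[str, Dict[str, Any]]] = [("", root)]
    while stack:
        prefix, directory = stack.pop()
        for name, entry in directory.items():
            path = f"{prefix}/{name}" if prefix else name
            if isinstance(entry, dict):
                stack.append((path, entry))  # defer the subdirectory
            elif not looks_binary(entry):
                results.append((path, entry.decode("utf-8")))
    return sorted(results)
```

Because pending directories live in an ordinary list, the depth of the repository tree is bounded by available memory rather than by Python's recursion limit.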

Installation

  1. Clone the repository:
    git clone https://github.com/waveuphq/github-repo-analyzer.git
    cd github-repo-analyzer
  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Copy .env.example to .env and add your GitHub personal access token:
    cp .env.example .env
    Then edit .env and replace your_github_token_here with your actual GitHub token.
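
This README does not say whether the package reads .env on its own. If it does not, the token has to reach the process environment some other way. The sketch below is one standard-library way to do that; `load_env_file` is a hypothetical helper, not part of the package, and it assumes the file holds simple KEY=VALUE lines such as a `GITHUB_TOKEN` entry.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical helper: copy KEY=VALUE pairs from a .env file into
    os.environ, leaving variables that are already set untouched."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```

Calling `load_env_file()` before reading the token with `os.getenv` would make the value from .env visible to the script.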

Usage

Here's a basic example of how to use the GitHubRepoAnalyzer:

from github_repo_analyzer import GitHubRepoAnalyzer
import os

# Load GitHub token from environment variable
github_token = os.getenv('GITHUB_TOKEN')

# Initialize the analyzer
analyzer = GitHubRepoAnalyzer("owner", "repo", github_token)

# Analyze the repository
analysis = analyzer.analyze_repo()

# Generate structured output
output = analyzer.generate_structured_output(analysis)
print(output)

# Generate content file
analyzer.generate_content_file(analysis)
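
The layout of the file written by generate_content_file is not documented here. As a rough illustration of the idea of gathering all text contents into one file, the sketch below assumes a simple layout in which each file's text follows a header naming its path. The `write_content_file` function, its parameters, and the header style are hypothetical.

```python
from pathlib import Path
from typing import Dict


def write_content_file(files: Dict[str, str], output_path: str) -> Path:
    """Hypothetical illustration: write each text file's contents under a
    header naming its path, in sorted path order, into a single file."""
    sections = []
    for path in sorted(files):
        sections.append(f"===== {path} =====\n{files[path].rstrip()}\n")
    out = Path(output_path)
    out.write_text("\n".join(sections), encoding="utf-8")
    return out
```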

Running Tests

To run the unit tests:

python -m unittest discover tests

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.
