Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1069 from NANDAGOPALNG/main
Added new project called "Turn Images Into Story"
Showing 3 changed files with 129 additions and 0 deletions.
@@ -0,0 +1,75 @@

# Turn Images into Story with AI

## Project Overview

This project uses artificial intelligence to generate stories from images. The goal is to create a system that can take an image as input and produce a coherent and engaging story as output.

## How it Works

1. **Image Input**: The user uploads an image to the system.
2. **Image Analysis**: The system uses computer vision techniques to analyze the image and extract relevant features such as objects, scenes, and actions.
3. **Story Generation**: The system uses a natural language processing (NLP) model to generate a story based on the extracted features.
4. **Post-processing**: The system refines the generated story to ensure coherence, grammar, and readability.
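
The four stages above can be sketched as a chain of plain functions. This is a minimal sketch under stated assumptions, not the project's implementation: `StoryPipeline`, `tidy`, and the two stub stages are hypothetical; the stubs stand in for real models (the committed code captions images with `Salesforce/blip-image-captioning-large` and narrates the caption with `kakao-enterprise/vits-ljs` rather than writing a separate story); and post-processing here only normalizes whitespace, sentence capitalization, and final punctuation.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryPipeline:
    """Hypothetical four-stage chain: image -> features -> story -> cleaned story."""
    analyze: Callable[[bytes], list[str]]    # image analysis: image bytes -> feature labels
    write_story: Callable[[list[str]], str]  # story generation: features -> draft text
    post_process: Callable[[str], str]       # post-processing: draft -> refined text

    def run(self, image: bytes) -> str:
        features = self.analyze(image)
        draft = self.write_story(features)
        return self.post_process(draft)


def tidy(text: str) -> str:
    """Collapse whitespace, capitalize each sentence, and end with punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    # Split after sentence-ending punctuation that is followed by a space
    sentences = re.split(r"(?<=[.!?]) ", text)
    text = " ".join(s[:1].upper() + s[1:] for s in sentences if s)
    if text and text[-1] not in ".!?":
        text += "."
    return text


# Stub stages (assumption): stand-ins for a real vision model and language model
def fake_analyze(image: bytes) -> list[str]:
    return ["a dog", "a beach", "running"]


def fake_write_story(features: list[str]) -> str:
    subject, place, action = features
    return f"  once upon a time, {subject} was {action} on {place}.   it never wanted to stop"


story_pipeline = StoryPipeline(fake_analyze, fake_write_story, tidy)
print(story_pipeline.run(b"image bytes"))
# Once upon a time, a dog was running on a beach. It never wanted to stop.
```

Keeping each stage behind a function signature means a stub can be swapped for a captioning or text-generation model without touching the rest of the chain.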

## Features

* **Image Upload**: Users can upload images in various formats (e.g., JPEG, PNG, GIF).
* **Story Generation**: The system generates a story based on the uploaded image.
* **Story Customization**: Users can customize the story by selecting from various genres, tones, and styles.
* **Image-Story Pairing**: The system allows users to view the original image alongside the generated story.
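
Story customization could be implemented by folding the user's genre, tone, and style choices into the prompt sent to a text-generation model. The sketch below assumes such a prompt-based design; the option sets, `build_story_prompt`, and the prompt wording are all hypothetical and not part of the committed code.

```python
from typing import Optional

# Hypothetical option sets that a UI dropdown might offer
GENRES = {"fantasy", "mystery", "science fiction", "fairy tale"}
TONES = {"whimsical", "dark", "uplifting"}
STYLES = {"short", "poetic", "first-person"}


def build_story_prompt(caption: str,
                       genre: Optional[str] = None,
                       tone: Optional[str] = None,
                       style: Optional[str] = None) -> str:
    """Turn an image caption plus optional user choices into a generation prompt."""
    # Reject values outside the supported sets so the UI and the prompt stay in sync
    for value, allowed, name in ((genre, GENRES, "genre"),
                                 (tone, TONES, "tone"),
                                 (style, STYLES, "style")):
        if value is not None and value not in allowed:
            raise ValueError(f"unsupported {name}: {value!r}")

    parts = ["Write a story"]
    if genre:
        parts.append(f"in the {genre} genre")
    if tone:
        parts.append(f"with a {tone} tone")
    if style:
        parts.append(f"in a {style} style")
    return " ".join(parts) + f" about this scene: {caption.strip()}"


print(build_story_prompt("a dog running on a beach", genre="fantasy", tone="whimsical"))
# Write a story in the fantasy genre with a whimsical tone about this scene: a dog running on a beach
```

Keeping the options in fixed sets lets the same lists populate the dropdowns and validate incoming requests.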

## Technical Requirements

* **Programming Language**: Python 3.x
* **Frameworks**: PyTorch and Hugging Face Transformers (image captioning and text-to-speech), Gradio (web interface)
* **Libraries**: Pillow (image handling), SciPy (WAV output), phonemizer (text-to-speech preprocessing)
* **Hardware**: A GPU is recommended for faster image processing and generation; the app falls back to the CPU when CUDA is unavailable.

## Installation

1. **Clone the repository**: `git clone https://github.com/NANDAGOPALNG/ML-Nexus.git`, then change into the project folder with `cd "ML-Nexus/Generative Models/Turn Images Into Story with AI"`. The folder can also be browsed at https://github.com/NANDAGOPALNG/ML-Nexus/tree/main/Generative%20Models/Turn%20Images%20Into%20Story%20with%20AI.
2. **Install dependencies**: `pip install -r requirements.txt`
3. **Set up the environment**: `python setup.py`
4. **Run the application**: `python app.py`

## Usage

1. Upload an image to the system.
2. Select the desired story genre, tone, and style (optional).
3. Click the "Generate Story" button.
4. View the generated story alongside the original image.

## Contributing

Contributions are welcome! If you'd like to contribute to this project, please:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Commit your changes with a descriptive message.
4. Open a pull request.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

We acknowledge the contributions of the open-source community and the developers of the libraries and frameworks used in this project.

## Future Work

* **Improving Story Quality**: Refine the NLP model to generate more coherent and engaging stories.
* **Expanding Image Support**: Add support for more image formats and sizes.
* **User Feedback Mechanism**: Implement a feedback system to allow users to rate and improve the generated stories.
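
Expanding image support starts with knowing which format an upload actually is. The sketch below identifies JPEG, PNG, and GIF (the formats named under Features), plus WebP as an example of a format that could be added, from their leading signature bytes rather than trusting the file extension. `detect_image_format` and `check_upload` are hypothetical helpers; a production app might instead rely on Pillow's `Image.open` and `verify()`.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for each recognized format
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', 'gif', or 'webp' based on the file header, else None."""
    for signature, fmt in _SIGNATURES.items():
        if data.startswith(signature):
            return fmt
    # WebP is a RIFF container whose form type at bytes 8..11 is "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_upload(path: Path, allowed=frozenset({"png", "jpeg", "gif"})) -> str:
    """Read the first bytes of an uploaded file and reject formats not in `allowed`."""
    with path.open("rb") as f:
        header = f.read(16)
    fmt = detect_image_format(header)
    if fmt not in allowed:
        raise ValueError(f"{path.name}: unsupported or unrecognized image format ({fmt})")
    return fmt
```

With this layout, supporting a new format means adding its signature and listing it in `allowed`.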
@@ -0,0 +1,49 @@

```python
import torch
import gradio as gr
from PIL import Image
import scipy.io.wavfile as wavfile

# Use a pipeline as a high-level helper
from transformers import pipeline

# Run on the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Optional: load the models from local snapshots instead of the Hugging Face Hub
# model_path = ("../Models/models--Salesforce--blip-image-captioning-large"
#               "/snapshots/2227ac38c9f16105cb0412e7cab4759978a8fd90")
#
# tts_model_path = ("../Models/models--kakao-enterprise--vits-ljs/snapshots"
#                   "/3bcb8321394f671bd948ebf0d086d694dda95464")

# Image-to-text model that produces a short caption for the image
caption_image = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-large", device=device)

# Text-to-speech model that reads the caption aloud
narrator = pipeline("text-to-speech",
                    model="kakao-enterprise/vits-ljs", device=device)

# caption_image = pipeline("image-to-text",
#                          model=model_path, device=device)
#
# narrator = pipeline("text-to-speech",
#                     model=tts_model_path, device=device)


def generate_audio(text):
    # Synthesize speech; the pipeline returns a dict with "audio" and "sampling_rate"
    speech = narrator(text)

    # The waveform has a leading batch axis; write its first (only) row to a WAV file
    wavfile.write("output.wav", rate=speech["sampling_rate"],
                  data=speech["audio"][0])
    # Return the path so Gradio can serve the file
    return "output.wav"


def caption_my_image(pil_image: Image.Image):
    # Caption the image, then narrate the caption
    semantics = caption_image(pil_image)[0]["generated_text"]
    return generate_audio(semantics)


demo = gr.Interface(fn=caption_my_image,
                    inputs=[gr.Image(label="Select Image", type="pil")],
                    outputs=[gr.Audio(label="Image Caption")],
                    title="@GenAILearniverse Project 8: Image Captioning",
                    description="Upload an image to generate a caption and hear it read aloud.")

if __name__ == "__main__":
    demo.launch()
```
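
`generate_audio` hands the pipeline's floating-point waveform straight to `scipy.io.wavfile.write`, which, for floating-point input, stores an IEEE-float WAV. If a player or downstream tool expects 16-bit PCM instead, the samples can be clipped to [-1, 1] and scaled first. The sketch below does that with only the standard library's `wave` module; it assumes mono samples arriving as a flat sequence of floats in [-1, 1], and `write_pcm16_wav` is a hypothetical helper, not part of the project.

```python
import struct
import wave
from typing import Iterable


def write_pcm16_wav(path: str, samples: Iterable[float], sampling_rate: int) -> str:
    """Write mono float samples in [-1.0, 1.0] to `path` as 16-bit PCM and return the path."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))                   # clip to avoid integer overflow
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono, matching the single row written by generate_audio
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sampling_rate)
        wav.writeframes(bytes(frames))
    return path
```

To use it in the app, the pipeline output's first `"audio"` row and its `"sampling_rate"` would be passed in place of the `wavfile.write` call. A NumPy version would be faster for long clips; the loop keeps the sketch dependency-free.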
@@ -0,0 +1,5 @@

```text
transformers
torch
gradio
scipy
phonemizer
```
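
Because the requirements above are bare, unpinned package names, a quick pre-flight check can confirm they are importable before the app starts downloading models. This is a hypothetical helper, not part of the commit; it assumes each line's package name matches its import name, which holds for the five packages listed here but not in general (and would not hold for version specifiers).

```python
import importlib.util
from pathlib import Path


def missing_modules(requirements: str) -> list[str]:
    """Return names from a requirements.txt body that cannot be found for import.

    Assumes each non-comment line is a bare package name equal to its import name.
    """
    missing = []
    for line in requirements.splitlines():
        name = line.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not name:
            continue
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    reqs = Path("requirements.txt")
    if reqs.exists():
        gaps = missing_modules(reqs.read_text())
        print("Missing:", ", ".join(gaps) if gaps else "none")
```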