Merge pull request #1069 from NANDAGOPALNG/main

Added new project called "Turn Images Into Story"
UTSAVS26 · Nov 4, 2024 · 588da58
2 parents 1e0fcf9 + 16b79da
commit 588da58
Showing 3 changed files with 129 additions and 0 deletions.
diff --git a/Generative-AI/Turn Pictures Into Story/README.md b/Generative-AI/Turn Pictures Into Story/README.md
@@ -0,0 +1,75 @@
+# Turn Images into Story with AI
+
+
+## Project Overview
+
+This project uses artificial intelligence to generate stories from images. The goal is to create a system that can take an image as input and produce a coherent and engaging story as output.
+
+## How it Works
+
+**Image Input**: The user uploads an image to the system.
+
+**Image Analysis**: The system uses computer vision techniques to analyze the image and extract relevant features such as objects, scenes, and actions.
+
+**Story Generation**: The system uses a natural language processing (NLP) model to generate a story based on the extracted features.
+
+**Post-processing**: The system refines the generated story to ensure coherence, grammar, and readability.
+
+## Features
+
+**Image Upload**: Users can upload images in various formats (e.g., JPEG, PNG, GIF).
+
+**Story Generation**: The system generates a story based on the uploaded image.
+
+**Story Customization**: Users can customize the story by selecting from various genres, tones, and styles.
+
+**Image-Story Pairing**: The system allows users to view the original image alongside the generated story.
+
+## Technical Requirements
+
+**Programming Languages**: Python 3.x, JavaScript (for frontend)
+
+**Frameworks**: TensorFlow, Keras (for image analysis and NLP), React (for frontend)
+
+**Libraries**: OpenCV, Pillow (for image processing), NLTK, spaCy (for NLP)
+
+**Hardware**: GPU acceleration recommended for faster image processing and story generation
+
+## Installation
+
+**Clone the repository**: `git clone https://github.com/NANDAGOPALNG/ML-Nexus.git`, then change into this project's directory
+
+**Install dependencies**: `pip install -r requirements.txt`
+
+**Set up the environment**: `python setup.py`
+
+**Run the application**: `python app.py`
+
+## Usage
+
+1. Upload an image to the system.
+2. Select the desired story genre, tone, and style (optional).
+3. Click the "Generate Story" button.
+4. View the generated story alongside the original image.
+
+## Contributing
+
+Contributions are welcome! If you'd like to contribute to this project, please:
+
+1. Fork the repository.
+2. Create a new branch for your feature or bug fix.
+3. Commit your changes with a descriptive message.
+4. Open a pull request.
+
+## License
+This project is licensed under the MIT License. See the LICENSE file for details.
+
+## Acknowledgments
+
+We acknowledge the contributions of the open-source community and the developers of the libraries and frameworks used in this project.
+
+## Future Work
+
+- **Improving Story Quality**: Refine the NLP model to generate more coherent and engaging stories.
+- **Expanding Image Support**: Add support for more image formats and sizes.
+- **User Feedback Mechanism**: Implement a feedback system to allow users to rate and improve the generated stories.
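Review note on the README: the "Story Generation", "Story Customization", and "Post-processing" steps are described in prose only, and the `app.py` in this commit stops at captioning plus narration. As a minimal sketch of how those steps might be wired in, the helpers below turn a caption and optional style choices into a prompt and trim generated text to its last complete sentence. The names `build_story_prompt` and `tidy_story`, their parameters, and the prompt wording are all hypothetical and not part of this pull request.

```python
def build_story_prompt(caption, genre="adventure", tone="lighthearted", max_words=150):
    """Turn an image caption plus optional style choices into a text prompt.

    Hypothetical helper: the prompt wording is an illustration, not the
    project's actual prompt.
    """
    caption = " ".join(caption.split())  # collapse stray whitespace
    if not caption:
        raise ValueError("caption must not be empty")
    return (
        f"Write a {tone} {genre} story of at most {max_words} words "
        f"inspired by this scene: {caption}"
    )


def tidy_story(text):
    """Post-process generated text by trimming it to the last complete sentence."""
    text = " ".join(text.split())
    # Position of the last sentence-ending mark, or -1 if there is none.
    end = max(text.rfind(mark) for mark in ".!?")
    return text[: end + 1] if end != -1 else text
```

The resulting prompt could be passed to whichever text-generation model the project eventually adopts, and `tidy_story` could run on that model's output before it is shown or narrated.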
diff --git a/Generative-AI/Turn Pictures Into Story/app.py b/Generative-AI/Turn Pictures Into Story/app.py
@@ -0,0 +1,49 @@
+import torch
+import gradio as gr
+from PIL import Image
+import scipy.io.wavfile as wavfile
+
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# model_path = ("../Models/models--Salesforce--blip-image-captioning-large"
+#               "/snapshots/2227ac38c9f16105cb0412e7cab4759978a8fd90")
+#
+# tts_model_path = ("../Models/models--kakao-enterprise--vits-ljs/snapshots"
+#                   "/3bcb8321394f671bd948ebf0d086d694dda95464")
+
+caption_image = pipeline("image-to-text",
+                model="Salesforce/blip-image-captioning-large", device=device)
+
+narrator = pipeline("text-to-speech",
+                    model="kakao-enterprise/vits-ljs")
+
+# caption_image = pipeline("image-to-text",
+#                 model=model_path, device=device)
+#
+# narrator = pipeline("text-to-speech",
+#                     model=tts_model_path)
+
+def generate_audio(text):
+    # Generate the narrated text
+    narrated_text = narrator(text)
+
+    # Save the audio to a WAV file
+    wavfile.write("output.wav", rate=narrated_text["sampling_rate"],
+                  data=narrated_text["audio"][0])
+    # Return the path to the saved audio file
+    return "output.wav"
+
+
+def caption_my_image(pil_image):
+    semantics = caption_image(images=pil_image)[0]['generated_text']
+    return generate_audio(semantics)
+
+demo = gr.Interface(fn=caption_my_image,
+                    inputs=[gr.Image(label="Select Image",type="pil")],
+                    outputs=[gr.Audio(label="Image Caption")],
+                    title="@GenAILearniverse Project 8: Image Captioning",
+                    description="Upload an image to hear a spoken caption describing it.")
+demo.launch()
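Review note on `generate_audio`: every call writes to the same `output.wav`, so overlapping requests could overwrite each other's audio. One possible alternative, sketched here with the standard library only, writes each result to its own temporary file. It assumes the samples are a flat sequence of floats in the range -1.0 to 1.0 (as `narrated_text["audio"][0]` appears to be) and stores them as 16-bit PCM. The name `save_wav` is hypothetical.

```python
import array
import sys
import tempfile
import wave


def save_wav(samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a new 16-bit PCM WAV file.

    Returns the path of the file that was created.
    """
    # Clamp each sample to [-1.0, 1.0] and scale it to the int16 range.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
        path = handle.name

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(pcm.tobytes())
    return path
```

Inside `generate_audio`, this could be called as `save_wav(narrated_text["audio"][0], narrated_text["sampling_rate"])`, with the returned path handed to `gr.Audio` in place of the fixed filename.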
diff --git a/Generative-AI/Turn Pictures Into Story/requirements.txt b/Generative-AI/Turn Pictures Into Story/requirements.txt
@@ -0,0 +1,5 @@
+transformers
+torch
+gradio
+scipy
+phonemizer
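Review note on `requirements.txt`: `app.py` imports `PIL`, which is not listed here (Pillow typically arrives as a dependency of `gradio`), and `phonemizer` is assumed to need an eSpeak executable installed on the system. A small pre-flight check along these lines might surface missing pieces before the models are downloaded. The function name `missing_modules` is hypothetical, and the eSpeak check reflects that assumption and not anything stated in this commit.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by app.py or listed in requirements.txt.
    required = ["transformers", "torch", "gradio", "scipy", "phonemizer", "PIL"]
    missing = missing_modules(required)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    # Assumption: phonemizer relies on a system eSpeak install for this model.
    if not (shutil.which("espeak-ng") or shutil.which("espeak")):
        print("No espeak-ng/espeak executable found on PATH")
```

Running the script before `python app.py` prints nothing when every package and the eSpeak executable are found.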