Merge branch 'main' into Fixes-UTSAVS26#931-Spotify
Showing 15 changed files with 2,135 additions and 0 deletions.
@@ -0,0 +1,33 @@
import torch
from transformers import AutoModel, AutoProcessor

# load the model and processor
model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d")

# download and preprocess the image
import requests
from PIL import Image
from io import BytesIO

image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg'
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# preprocess the image and get the source camera
image, source_camera = processor(image)

# generate planes (default output)
output_planes = model(image, source_camera)
print("Planes shape:", output_planes.shape)

# generate a 3D mesh
output_planes, mesh_path = model(image, source_camera, export_mesh=True)
print("Planes shape:", output_planes.shape)
print("Mesh saved at:", mesh_path)

# generate a video
output_planes, video_path = model(image, source_camera, export_video=True)
print("Planes shape:", output_planes.shape)
print("Video saved at:", video_path)
@@ -0,0 +1,58 @@
# VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model.

**Quick Start**
Getting started with VFusion3D is super easy! 🤗 Here’s how you can use the model with Hugging Face:

**Install Dependencies (Optional)**
Depending on your needs, you may want to enable specific features like mesh generation or video rendering. We've got you covered with these additional packages:

```python
!pip --quiet install imageio[ffmpeg] PyMCubes trimesh rembg[gpu,cli] kiui
```

**Load Model Directly**
```python
import torch
from transformers import AutoModel, AutoProcessor

# load the model and processor
model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d")

# download and preprocess the image
import requests
from PIL import Image
from io import BytesIO

image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg'
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# preprocess the image and get the source camera
image, source_camera = processor(image)

# generate planes (default output)
output_planes = model(image, source_camera)
print("Planes shape:", output_planes.shape)

# generate a 3D mesh
output_planes, mesh_path = model(image, source_camera, export_mesh=True)
print("Planes shape:", output_planes.shape)
print("Mesh saved at:", mesh_path)

# generate a video
output_planes, video_path = model(image, source_camera, export_video=True)
print("Planes shape:", output_planes.shape)
print("Video saved at:", video_path)
```

- **Default (Planes):** By default, VFusion3D outputs planes—ideal for further 3D operations.
- **Export Mesh:** Want a 3D mesh? Just set `export_mesh=True`, and you'll get a `.obj` file ready to roll. You can also customize the mesh resolution by adjusting the `mesh_size` parameter.
- **Export Video:** Fancy a 3D video? Set `export_video=True`, and you'll receive a beautifully rendered video from multiple angles. You can tweak `render_size` and `fps` to get the video just right (see the sketch below).
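
As a quick illustration, here is a hedged sketch that passes those knobs explicitly; the keyword names (`mesh_size`, `render_size`, `fps`) come from the description above, but the values are purely illustrative:

```python
# illustrative values only; mesh_size, render_size and fps are the knobs described above
output_planes, mesh_path = model(image, source_camera, export_mesh=True, mesh_size=384)
output_planes, video_path = model(image, source_camera, export_video=True, render_size=384, fps=30)
```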

**License**
The majority of VFusion3D is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license.
The model weights of VFusion3D are also licensed under CC-BY-NC.
@@ -0,0 +1,4 @@
torch
transformers
Pillow
requests
@@ -0,0 +1,98 @@
LungCancerProject | ||
============================== | ||

Deep learning is a fast-evolving field that has many implications for medical imaging.

Currently, medical images are interpreted by radiologists and physicians, but this interpretation can be very subjective. After years of looking at ultrasound images, my co-workers and I still get into arguments about whether we are actually seeing a tumor in a scan. Radiologists often have to look through large volumes of these images, which can cause fatigue and lead to mistakes, so there is a need to automate this.

Machine learning algorithms such as support vector machines are often used to detect and classify tumors, but they are limited by the assumptions we make when we define features, which results in reduced sensitivity. Deep learning, however, could be an ideal solution because these algorithms are able to learn features from raw image data.

One challenge in implementing these algorithms is the scarcity of labeled medical image data. While this is a limitation for all applications of deep learning, it is even more so for medical images because of patient confidentiality concerns.

In this post you will learn how to build a convolutional neural network, train it, and have it detect lung nodules. I used data from the Lung Image Database Consortium and Infectious Disease Research Institute [(LIDC/IDRI) database](https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI). As these images were huge (124 GB), I ended up using the reformatted version available from [LUNA16](https://luna16.grand-challenge.org/data/). This dataset consists of 888 CT scans with annotations describing coordinates and ground truth labels. The first step was to create an image database for training.

### Creating an image database

The images were formatted as .mhd and .raw files. The header data is contained in the .mhd files and the multidimensional image data is stored in the .raw files. I used the [SimpleITK](http://www.simpleitk.org/) library to read the .mhd files. Each CT scan has dimensions of 512 x 512 x n, where n is the number of axial scans; there are about 200 images in each CT scan.
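
For reference, a minimal sketch (not the project's exact script) of reading one scan with SimpleITK; the file path is a placeholder:

```python
import numpy as np
import SimpleITK as sitk

# read the .mhd header; SimpleITK pulls in the matching .raw data automatically
itk_image = sitk.ReadImage("subset0/some_scan_uid.mhd")  # placeholder path

scan = sitk.GetArrayFromImage(itk_image)    # numpy array, shape (n_slices, 512, 512)
origin = np.array(itk_image.GetOrigin())    # world-space origin (x, y, z) in mm
spacing = np.array(itk_image.GetSpacing())  # voxel spacing (x, y, z) in mm
print(scan.shape, origin, spacing)
```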

There were a total of 551,065 annotations. Of all the annotations provided, 1,351 were labeled as nodules; the rest were labeled negative. So there is a big class imbalance. The easy way to deal with it is to under-sample the majority class and augment the minority class by rotating images.

We could potentially train the CNN on all the pixels, but that would increase the computational cost and training time. So instead I decided to crop the images around the coordinates provided in the annotations. The annotations were provided in Cartesian (world) coordinates, so they had to be converted to voxel coordinates. Also, the image intensity was defined on the Hounsfield scale, so it had to be rescaled for image-processing purposes.
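
Here is a minimal sketch of those two steps, reusing the `origin` and `spacing` read above; the Hounsfield window bounds are a common lung-window choice, not necessarily what the project uses:

```python
import numpy as np

def world_to_voxel(world_xyz, origin, spacing):
    """Convert an annotation's world (Cartesian, mm) coordinates to voxel indices."""
    return np.rint((np.asarray(world_xyz) - origin) / spacing).astype(int)

def normalize_hu(patch, min_hu=-1000.0, max_hu=400.0):
    """Clip to a lung window on the Hounsfield scale and rescale to [0, 1]."""
    patch = np.clip(patch, min_hu, max_hu)
    return (patch - min_hu) / (max_hu - min_hu)

# example usage (coordX/coordY/coordZ come from the annotation CSV):
#   x, y, z = world_to_voxel((coord_x, coord_y, coord_z), origin, spacing)
#   patch = normalize_hu(scan[z, y - 25:y + 25, x - 25:x + 25])
```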

The [script](https://github.com/swethasubramanian/LungCancerDetection/blob/master/src/data/create_images.py) below generates 50 x 50 grayscale images for training, testing, and validating a CNN.

<script src="https://gist.github.com/swethasubramanian/8483c5a21d0727e99976b0b9e2b60e68.js"></script>

While the script above under-sampled the negative class so that 1 in every 6 images had a nodule, the dataset was still vastly imbalanced for training. I decided to augment my training set by rotating images. The [script](https://github.com/swethasubramanian/LungCancerDetection/blob/master/src/data/augment_images.py) below does just that.

<script src="https://gist.github.com/swethasubramanian/72697b5cff4c5614c06460885dc7ae23.js"></script>

So for an original image, my script would create these two rotated images:
<table>
<tr>
<td><img src="https://cloud.githubusercontent.com/assets/5193925/22851059/03e29daa-efcc-11e6-953f-4d0eba54f7ff.jpg" width="300"/></td>
<td><img src="https://cloud.githubusercontent.com/assets/5193925/22851058/f65f3aee-efcb-11e6-8f7e-e35a3cffb1d9.jpg" width="300"/></td>
<td><img src="https://cloud.githubusercontent.com/assets/5193925/22851043/cccb990c-efcb-11e6-9850-b621d21c8bed.jpg" width="300"/></td>
</tr>
<tr><td align="center">original image</td><td align="center">90 degree rotation</td><td align="center">180 degree rotation</td></tr>
</table>
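
For reference, a minimal sketch of this kind of rotation augmentation with NumPy (not the linked script itself):

```python
import numpy as np

def augment_with_rotations(patch):
    """Return the original 50 x 50 patch plus its 90- and 180-degree rotations."""
    return [patch, np.rot90(patch, k=1), np.rot90(patch, k=2)]
```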

Augmentation resulted in an 80-20 class distribution, which was not entirely ideal. But I also did not want to augment the minority class too much, because it might result in a minority class with little variation.

### Building a CNN

Now we are ready to build a CNN. After dabbling a bit with TensorFlow, I decided it was way too much work for something incredibly simple, so I decided to use [tflearn](http://tflearn.org/). Tflearn is a high-level API wrapper around TensorFlow, and it made coding a lot more palatable. The approach I used was similar to [this one](https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721#.if8n708rd). I used 3 convolutional layers in my architecture.

![arch](https://cloud.githubusercontent.com/assets/5193925/22851323/0b6a1d80-efd3-11e6-9bdf-23ce19796549.png)

My CNN model is defined in a class as shown in the script below.

<script src="https://gist.github.com/swethasubramanian/45be51b64d1595e78fb171c5dbb6cce6.js"></script>
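
If you don't want to click through to the gist, here is a rough TFLearn sketch of a comparable 3-conv-layer network. The 32/64/64 filter counts and 5x5/5x5/3x3 sizes follow the feature-map descriptions later in this post, while the fully connected size, dropout, and optimizer settings are assumptions of mine rather than the actual model's:

```python
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# 50 x 50 grayscale patches in, nodule / non-nodule out
net = input_data(shape=[None, 50, 50, 1])
net = conv_2d(net, 32, 5, activation='relu')   # first conv layer: 32 filters of 5x5
net = max_pool_2d(net, 2)                      # downsample by a factor of 2
net = conv_2d(net, 64, 5, activation='relu')   # second conv layer: 64 filters of 5x5
net = conv_2d(net, 64, 3, activation='relu')   # third conv layer: 64 filters of 3x3
net = max_pool_2d(net, 2)
net = fully_connected(net, 512, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 2, activation='softmax')
net = regression(net, optimizer='adam', loss='categorical_crossentropy', learning_rate=0.001)

model = tflearn.DNN(net)
```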

I had a total of 6878 images in my training set.

### Training the model

Because the data required to train a CNN is very large, it is often desirable to train the model in batches. Loading all the training data into memory is not always possible, because you need enough memory to hold both the data and the features. I was working off a 2012 MacBook Pro, so I decided to load all the images into an HDF5 dataset using the h5py library. You can find the script I used to do that [here](https://github.com/swethasubramanian/LungCancerDetection/blob/master/src/data/build_hdf5_datasets.py).
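
Here is a minimal h5py sketch of that packaging step; the array contents and file name are placeholders, not the project's actual ones:

```python
import h5py
import numpy as np

# placeholders standing in for the real cropped patches and one-hot labels
X_train = np.zeros((6878, 50, 50, 1), dtype=np.float32)
Y_train = np.zeros((6878, 2), dtype=np.float32)

with h5py.File("train_dataset.h5", "w") as f:  # placeholder filename
    f.create_dataset("X", data=X_train)
    f.create_dataset("Y", data=Y_train)
```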

Once I had the training data in an HDF5 dataset, I trained the model using this script.

<script src="https://gist.github.com/swethasubramanian/dca76567afe1c175e016b2ce299cb7fb.js"></script>
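
The training call itself looks roughly like this, assuming the `model` sketched above and the HDF5 file from the previous step; the epoch count and batch size here are guesses:

```python
import h5py

# h5py datasets can be passed to TFLearn's fit, so batches are streamed from disk
h5f = h5py.File("train_dataset.h5", "r")  # placeholder filename
X, Y = h5f["X"], h5f["Y"]

model.fit(X, Y, n_epoch=10, validation_set=0.1, show_metric=True,
          batch_size=96, run_id="lung_nodule_cnn")
h5f.close()
```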

The training took a couple of hours on my laptop. Like any engineer, I wanted to see what goes on under the hood. As the filters are of low resolution (5x5), it is more useful to visualize the feature maps they generate.

So if I pass an input image through the first convolutional layer (50 x 50 x 32), it generates a feature map that looks like this:
![conv_layer_0](https://cloud.githubusercontent.com/assets/5193925/22851574/5bb733ba-efdb-11e6-8943-b248f4bcaf58.png)

The max pooling layer following the first convolutional layer downsampled the feature map by a factor of 2. So when the downsampled feature map is passed into the second convolutional layer of 64 5x5 filters, the resulting feature map is:
![conv_layer_1](https://cloud.githubusercontent.com/assets/5193925/22851575/5cf648d8-efdb-11e6-9363-bb4da8c346fa.png)

The feature map generated by the third convolutional layer containing 64 3x3 filters is:
![conv_layer_2](https://cloud.githubusercontent.com/assets/5193925/22851577/5e79d026-efdb-11e6-8a2f-6860716d582b.png)

### Testing data

I tested my CNN model on 1623 images and got a validation accuracy of 93%. The model has a precision of 89.3%, a recall of 71.2%, and a specificity of 98.2%.

Here is the confusion matrix:

![confusion_matrix](https://cloud.githubusercontent.com/assets/5193925/22851647/9ac21532-efdd-11e6-8618-fd46af1da3a3.png)
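
For reference, here is how precision, recall, and specificity fall out of the confusion-matrix counts; the counts below are placeholders, not the actual values from the matrix above:

```python
# placeholder counts: true positives, false positives, true negatives, false negatives
tp, fp, tn, fn = 80, 10, 1400, 33

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)     # of predicted nodules, how many were real
recall = tp / (tp + fn)        # of real nodules, how many were caught (sensitivity)
specificity = tn / (tn + fp)   # of real non-nodules, how many were correctly rejected
print(accuracy, precision, recall, specificity)
```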

I looked deeper into the kinds of predictions the model made:

False Negative Predictions:
![preds_fns](https://cloud.githubusercontent.com/assets/5193925/22851625/1d578b90-efdd-11e6-82d7-74d6f8fb69c8.png)
False Positive Predictions:
![preds_fps](https://cloud.githubusercontent.com/assets/5193925/22851626/1f47302c-efdd-11e6-8f17-70128e52875a.png)
True Negative Predictions:
![preds_tns](https://cloud.githubusercontent.com/assets/5193925/22851627/211a12e8-efdd-11e6-921f-62a8e55353c6.png)
True Positive Predictions:
![preds_tps](https://cloud.githubusercontent.com/assets/5193925/22851629/22a94886-efdd-11e6-90bb-a7331ceae520.png)
--Ananya Gupta | ||