<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AiAI Research Projects</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 0;
padding: 0;
}
.container {
width: 80%;
margin: 20px auto;
display: flex;
flex-direction: column;
align-items: center;
text-align: center;
}
.logo {
margin-bottom: 20px;
}
.projects {
display: flex;
justify-content: space-between;
width: 100%;
}
.left-projects,
.right-projects {
width: 48%;
}
.project {
margin-bottom: 20px;
}
.project h2 {
margin-bottom: 10px;
}
.project p {
margin: 5px 0;
}
</style>
</head>
<body>
<div class="container">
<img src="AiAI.png" alt="Lab logo" class="logo" width="15%">
<h1>AiAI Research Projects</h1>
<div class="projects">
<div class="left-projects">
<div class="project">
<h2>KatzBot</h2>
<p>A chatbot, developed from scratch, that answers questions about academics, faculty, and other information in real time.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/KatzBot">GitHub Repository</a>
<a href="https://kiran-vutukuri.github.io/katz/">Project Report</a>
</div>
<div class="project">
<h2>Image Generation for Medical Applications</h2>
<p>This study uses a diffusion model to augment the original dataset and thereby improve the performance of the Norberg Angle prediction model.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/Image%20Generation%20for%20Medical%20Applications/Dog_Hip">GitHub Repository</a>
</div>
<div class="project">
<h2>Course Video Understanding</h2>
<p>Visual Question Answering (VQA) research aims to create AI systems that can answer natural language questions about images. However, traditional VQA methods often provide simplistic responses. This work introduces Visual Question Explanation (VQE), which extends VQA with detailed explanations and supports more complex interaction with visual content. We developed the MLVQE dataset from a machine learning course; it comprises slide images, transcripts, and question-answer pairs. We propose SparrowVQE, a small multimodal model trained in three stages: multimodal pre-training, instruction tuning, and domain fine-tuning. SparrowVQE, which uses SigLIP and Phi-MLP models, outperforms state-of-the-art methods on benchmark VQA datasets and demonstrates a detailed understanding of visual content.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/Course%20Video%20Understanding">GitHub Repository</a>
</div>
<div class="project">
<h2>Veterinarian GPT</h2>
<p>VetMedGPT is a specialized tool that assists with initial diagnosis and first aid for animals. It aims to bridge a gap in artificial intelligence (AI) by providing tailored support for veterinary healthcare.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/VetMedGPT">GitHub Repository</a>
</div>
<div class="project">
<h2>Machine Learning Chat Robot for Students</h2>
<p>This research introduces a novel generative pre-trained transformer-based model, MLGPT, which utilizes a specialized machine learning question and answer dataset to enhance depth and precision in domain-specific queries. Additionally, we developed the MLGPT-C chatbot that supports interactive, audio-based conversations with real-time interruption capabilities, significantly outperforming existing methods in machine learning query resolution.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/Machine%20Learning%20Chat%20Robot%20for%20Students">GitHub Repository</a>
</div>
</div>
<div class="right-projects">
<div class="project">
<h2>High-Quality Voice Cloning</h2>
<p>This work presents two different approaches to text-to-speech (TTS) synthesis. We incorporate speaker variance information beyond duration (pitch, energy) to enrich the model’s understanding of speech variations. These features are used during training and inference, which eases the one-to-many mapping problem in TTS.</p>
<a href="https://youshanzhang.github.io/AiAI/High-Quality-Voice-Cloning">Project Page</a>
</div>
<div class="project">
<h2>3D Human Motion Generation</h2>
<p>Explore the forefront of animation technology with our 3D Human Motion project. Utilizing advanced AI, our platform translates textual descriptions into realistic 3D human animations, revolutionizing the way digital content is created. Harness the power of our Text Residual Motion Encoder (TRME) to bring dynamic, precise human movements to life with just a few words.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/3D%20Human%20Motion">GitHub Repository</a>
</div>
<div class="project">
<h2>Breast Cancer Detection Mobile App Design</h2>
<p>Description of Project 8.</p>
<a href="https://github.com/kanchanmaurya95/PinkGuardian.git">GitHub Repository</a>
</div>
<div class="project">
<h2>Voice Cloning</h2>
<p>The proposed Text-to-Speech (TTS) architecture is a sequence of components designed to synthesize natural, expressive speech from input text. It has three major components: the Text Encoder, the Mel Spectrogram Encoder, and the Voice Cloning Model. The Text Encoder converts the input text into a robust representation using character embeddings, bidirectional GRU processing, and attention mechanisms. The Mel Spectrogram Encoder then produces an encoded representation of the mel spectrogram that captures fine acoustic detail. Finally, the Voice Cloning Model combines these encoded representations and synthesizes speech with a decoder built from RNN layers and attention mechanisms.</p>
<a href="https://github.com/YoushanZhang/AiAI/tree/main/Voice_cloning/">GitHub Repository</a>
</div>
<div class="project">
<h2>Course Attendance Robot</h2>
<p>This project introduces a comprehensive system for managing attendance, harnessing facial detection and recognition technologies to identify individual students and register their attendance.</p>
<a href="https://ankitagg2008.github.io/Course-Attendance-Robot/">Project Page</a>
</div>
<div class="project">
<h2>Video Generation</h2>
<p>Description of Project 10.</p>
<a href="https://github.com/username/project1">GitHub Repository</a>
</div>
</div>
</div>
</div>
</body>
</html>