Add files via upload
xiaoqian-shen authored Oct 21, 2024
1 parent 19663cc commit fa5f9ca
Showing 1 changed file with 68 additions and 25 deletions.
93 changes: 68 additions & 25 deletions index.html
@@ -81,7 +81,7 @@ <h1 class="title is-2 publication-title">LongVU: Spatiotemporal Adaptive Compres
</span>

<span class="link-block">
<a href="" target="_blank"
<a href="https://github.com/Vision-CAIR/LongVU" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
@@ -91,7 +91,7 @@ <h1 class="title is-2 publication-title">LongVU: Spatiotemporal Adaptive Compres
</span>

<span class="link-block">
<a href="" target="_blank"
<a href="https://huggingface.co/Vision-CAIR/LongVU_Qwen2_7B" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
🤗
@@ -103,7 +103,7 @@ <h1 class="title is-2 publication-title">LongVU: Spatiotemporal Adaptive Compres


<span class="link-block">
<a href="" target="_blank"
<a href="https://huggingface.co/spaces/Vision-CAIR/LongVU" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
🎨
@@ -147,45 +147,75 @@ <h2 class="title is-3">Examples</h2> <br></div>
<div class="hero-body">
<div class="container" style="margin-top: -60px;">
<div id="results-carousel" class="carousel results-carousel">

<div class="item">
<video width="100%" controls style="margin-bottom: 30px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/robot-video-game.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> Please provide a detailed description of the video.</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The video begins with a scene set in a futuristic cityscape at night, where a humanoid robot stands in the foreground, its back to the camera, against a backdrop of illuminated buildings and a helicopter in flight. The setting is characterized by a blend of modern and cyberpunk elements, with neon signs and a wet street reflecting the city lights. As the video progresses, the robot remains stationary, with the helicopter moving closer, eventually hovering above the street, indicating a possible interaction or event unfolding between the robot and the helicopter.
<br><br>
The scene then shifts to an interior setting, likely a workshop or garage, filled with various electronic equipment and machinery. The robot is now in the foreground, facing the camera, with a person in the background, suggesting a narrative progression. The environment is cluttered, with the lighting casting a blue hue over the scene, emphasizing the technological theme.
<br><br>
As the video continues, the camera angle changes to show the robot from behind, with the person in the background now more visible, indicating movement. The workshop setting remains consistent, with the clutter and electronic equipment unchanged. The lighting and color scheme continue to highlight the technological ambiance.
<br><br>
The perspective then shifts to a side view of the robot, revealing more of its structure and the cluttered workshop environment. The person in the background is now closer to the robot, suggesting interaction or movement towards it. The lighting and color scheme remain consistent, maintaining the technological atmosphere.
<br><br>
Finally, the camera zooms in on the robot, focusing on its upper body and the intricate details of its internal components. The cluttered workshop environment becomes less visible, with the lighting casting a blue hue over the scene, emphasizing the technological theme. The camera's close-up shot highlights the robot's design and the complexity of its internal mechanisms, suggesting a narrative focus on the robot's internal workings or its role within the technological setting.</p>
</div>
</div>

<div class="item">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/video_10834.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> Can you identify the direction of the yellow sphere's movement in the video?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The yellow sphere is moving away from the red cube and towards the right side of the frame.</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 30px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/H_H_3_0709_0916.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> What happens to the man as he tries to free himself from the narrow gap in the chair?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> As the man tries to free himself from the narrow gap in the chair, he gets stuck and struggles to get out.</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 30px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/Top081_02555.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> Describe this video in detail.</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The video begins with a scene featuring two animated characters in a fantastical setting, suggesting a narrative of adventure or conflict. The first character, dressed in a yellow and red martial arts uniform with a mask, is seen in a defensive or preparatory stance, while the second character, an older man with a white beard and a blue robe, appears surprised or concerned. The background is filled with green, leaf-like structures and a mountainous landscape, indicating a natural, possibly magical environment.</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 30px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/1E7VU.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> What happened after the person held the dish?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> After holding the dish, the person placed it in the cabinet.</p>
</div>
@@ -207,50 +237,61 @@ <h2 class="title is-3">Long Video Examples</h2> <br></div>
<div class="container" style="margin-top: -60px;">
<div id="results-carousel" class="carousel results-carousel">
<div class="item">
<video width="100%" controls style="margin-bottom: 20px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 20px;">
<source src="https://longvu.s3.amazonaws.com/assets/count_139.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> In this video, how many times does the scene of the 'making jewelry' action appear in total?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The scene of the 'making jewelry' action appears twice in the video.</p>
<p style="color: rgb(168, 71, 71);">(For your reference, the relevant sections start around 00:50 and 03:15)</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 20px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 20px;">
<source src="https://longvu.s3.amazonaws.com/assets/ego_36.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> Who did I interact with when I skated?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> During your skating, you interacted with a lady wearing a black dress.</p>
<p style="color: rgb(168, 71, 71);">(For your reference, the relevant section starts around 05:00)</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 20px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 20px;">
<source src="https://longvu.s3.amazonaws.com/assets/needle_43.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> What is the condition of the highway where the SUV is parked for green screen or chroma key?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The highway where the SUV is parked for green screen or chroma key is empty and surrounded by trees.</p>
<p style="color: rgb(168, 71, 71);">(For your reference, the relevant section starts around 17:53)</p>
</div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 30px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 30px;">
<source src="https://longvu.s3.amazonaws.com/assets/order_160.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> Arrange the following events from the video in the correct chronological order: (1)Woman begins ripping the wrapping paper with her hands; (2)Woman sets the box on top of the wrapping paper and begins wrapping the box; (3)Woman lifts up a box and sets it on the table; (4)Woman grabs a pair of scissors and tape. Options:
<ul>
(A) 1->2->3->4 (B) 4->3->2->1 (C) 2->1->3->4 (D) 3->4->1->2
<li>(A) 1->2->3->4</li>
<li>(B) 4->3->2->1</li>
<li>(C) 2->1->3->4</li>
<li>(D) 3->4->1->2</li>
</ul>
</p>
<p><strong style="color: #008AD7;">LongVU:</strong> (D)3->4->1->2.</p>
@@ -259,11 +300,13 @@ <h2 class="title is-3">Long Video Examples</h2> <br></div>
</div>

<div class="item">
<video width="100%" controls style="margin-bottom: 20px;">
<div style="text-align: center;">
<video width="60%" controls style="margin-bottom: 20px;">
<source src="https://longvu.s3.amazonaws.com/assets/needle_76.mp4" type="video/mp4">
</video>
</div>

<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 180px; overflow-y: auto;">
<div class="answer-text" style="font-size: 16px; font-family: 'Arial'; margin-bottom: 20px; max-height: 100px; overflow-y: auto;">
<p><strong style="color: #ec9720;">User:</strong> What is the chef doing with the lobster in the dinner preparation?</p>
<p><strong style="color: #008AD7;">LongVU:</strong> The chef is cutting the lobster in half.</p>
<p style="color: rgb(168, 71, 71);">(For your reference, the relevant section starts around 05:37)</p>
@@ -300,7 +343,7 @@ <h2 class="title is-3">LongVU Architecture</h2>
<h2 class="content has-text-justified">
Architecture of <strong>LongVU</strong>. Given densely sampled video frames, we first use the DINOv2 prior to remove redundant frames and fuse the remaining frame features from both SigLIP and DINOv2. Then we selectively reduce visual tokens via cross-modal query. Finally, we apply spatial token compression based on temporal dependencies so that the remaining tokens fit within the limited context length of LLMs.
</h2>
<img src="static/images/method.png" height="100%"/>
<img src="static/images/method.png" width="80%"/>
</div>
</div>
</div>
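The architecture caption in the hunk above names three token-reduction stages: DINOv2-based removal of redundant frames, token reduction via a cross-modal (text) query, and spatial token compression based on temporal dependencies. The NumPy sketch below illustrates one plausible reading of each stage on toy arrays. It is not the LongVU implementation. The function names, cosine-similarity thresholds, window size, pooling factor, and the rule of comparing each frame only with the last kept frame are all hypothetical choices made for illustration. A single embedding vector stands in for the text query, and the SigLIP/DINOv2 feature-fusion step is omitted, so the inputs are treated as already-fused per-frame token arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    # Small epsilon avoids division by zero for all-zero vectors.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)


def drop_redundant_frames(dino_frame_feats, sim_threshold=0.9):
    """Stage 1 (hypothetical sketch): greedily drop a frame whose pooled
    DINOv2-style feature has cosine similarity >= sim_threshold with the
    last kept frame. dino_frame_feats: (T, d). Returns kept frame indices."""
    feats = l2_normalize(dino_frame_feats)
    kept = [0]
    for t in range(1, feats.shape[0]):
        if float(feats[t] @ feats[kept[-1]]) < sim_threshold:
            kept.append(t)
    return kept


def query_guided_reduction(frame_tokens, query_emb, keep_full=2, pool=4):
    """Stage 2 (hypothetical sketch): keep the keep_full frames most similar
    to the query embedding at full token count; average-pool every other
    frame's tokens in consecutive groups of `pool` (a 1-D simplification of
    spatial pooling; leftover tokens are dropped).
    frame_tokens: (T, N, d); query_emb: (d,). Returns a list of (n_t, d)."""
    scores = l2_normalize(frame_tokens.mean(axis=1)) @ l2_normalize(query_emb)
    top = set(np.argsort(-scores)[:keep_full].tolist())
    out = []
    for t, tokens in enumerate(frame_tokens):
        if t in top:
            out.append(tokens)
        else:
            n, d = tokens.shape
            out.append(tokens[: n - n % pool].reshape(-1, pool, d).mean(axis=1))
    return out


def spatial_token_compression(frames, window=4, sim_threshold=0.8):
    """Stage 3 (hypothetical sketch): within each window of frames, the first
    frame is an anchor kept whole; in later frames, a token is dropped when
    its cosine similarity to the anchor token at the same position is
    >= sim_threshold. Frames whose token count differs from the anchor
    (e.g. pooled in stage 2) are kept whole, since positions do not align."""
    out = []
    for start in range(0, len(frames), window):
        anchor = frames[start]
        out.append(anchor)
        for tokens in frames[start + 1 : start + window]:
            if tokens.shape != anchor.shape:
                out.append(tokens)
                continue
            sim = np.sum(l2_normalize(tokens) * l2_normalize(anchor), axis=-1)
            out.append(tokens[sim < sim_threshold])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 16, 32
    scene_a, scene_b = rng.normal(size=(N, d)), rng.normal(size=(N, d))
    # Toy video: 4 near-identical frames of scene A, then 4 of scene B.
    frame_tokens = np.stack(
        [scene_a + 0.01 * rng.normal(size=(N, d)) for _ in range(4)]
        + [scene_b + 0.01 * rng.normal(size=(N, d)) for _ in range(4)]
    )
    # Stand-ins (assumptions): mean-pooled tokens play the role of DINOv2
    # frame features, and scene B's mean token plays the text query.
    kept = drop_redundant_frames(frame_tokens.mean(axis=1))
    reduced = query_guided_reduction(frame_tokens[kept], scene_b.mean(axis=0), keep_full=1)
    compressed = spatial_token_compression(reduced, window=4)
    print("kept frames:", kept)
    print("tokens per frame:", [t.shape[0] for t in compressed])
```

In this toy setup, stage 1 collapses each run of near-duplicate frames to a single kept frame, and stage 2 pools the kept frame least related to the query. Stage 3 only prunes tokens in frames that still share the anchor's token layout. The greedy "compare with the last kept frame" rule in stage 1 was picked for brevity. A windowed or pairwise redundancy criterion would fit the caption just as well.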
@@ -315,7 +358,7 @@ <h2 class="content has-text-justified">
<div class="column is-five-sixths">
<h2 class="title is-3">Video Understanding Results</h2>
<div class="myrow">
<img src="static/images/result1.png" height="100%"/>
<img src="static/images/result1.png" width="80%"/>
</div>

</div>
@@ -330,7 +373,7 @@ <h2 class="title is-3">Video Understanding Results</h2>
<div class="column is-five-sixths">
<h2 class="title is-3">Edge Model Results</h2>
<div class="myrow">
<img src="static/images/result2.png" height="100%"/>
<img src="static/images/result2.png" width="80%"/>
</div>

</div>