<!DOCTYPE html>
<html>
<head>
<title>Simon Schwaiger - GitHub IO</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=0.9">
<link rel="stylesheet" href="https://www.w3schools.com/w3css/4/w3.css">
<link rel="stylesheet" href="./src/styles.css">
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico"> <!--http://tools.dynamicdrive.com/favicon/-->
</head>
<body class="w3-light-grey">
<!-- w3-content defines a container for fixed size centered content,
and is wrapped around the whole page content, except for the footer in this example -->
<div class="w3-content">
<!-- Grid -->
<div class="w3-row">
<!-- Introduction menu -->
<div class="w3-col l4">
<!-- About Card -->
<div class="w3-card w3-margin w3-margin-top">
<a href="./index.html">
<img src="./img/portrait.jpg" style="width:100%; border-top-left-radius: 20px; border-top-right-radius: 20px;">
</a>
<div class="w3-container w3-white" style="border-top-left-radius: 0px; border-top-right-radius: 0px;">
<a href="./index.html">
<h4><b>Simon Schwaiger, MSc</b></h4>
</a>
<p>I am a Lecturer/Researcher at University of Applied Sciences Technikum Wien and Doctoral Student at Graz University of Technology, working on machine learning and modern control approaches in robotics as well as personal projects.</p>
<ul style="list-style-type:none; padding:0px; margin:0px;">
<hr>
<li>
<a href="./cv.html">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/briefcase.svg"><span><b>Curriculum Vitae</b></span>
</button>
</a>
</li>
<li>
<a href="./publications.html">
<button class="button-highlight w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/document-signed.svg"><span><b>Publications</b></span>
</button>
</a>
</li>
<li>
<a href="./portfolio.html">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/square-terminal.svg"><span><b>Portfolio</b></span>
</button>
</a>
</li>
<li>
<a href="./teaching.html">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/graduation-cap.svg"><span><b>Teaching Experience</b></span>
</button>
</a>
</li>
<hr>
<li>
<a href="https://github.com/SimonSchwaiger" target="_blank">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/github.svg"><span><b>GitHub</b></span>
</button>
</a>
</li>
<li>
<a href="https://www.researchgate.net/profile/Simon-Schwaiger" target="_blank">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-50%,-2.5px);" src="./img/icons/book-bookmark.svg"><span><b>ResearchGate</b></span>
</button>
</a>
</li>
<li>
<a href="https://www.linkedin.com/in/simon-schwaiger-90354519a/" target="_blank">
<button class="w3-button w3-padding-16 w3-white w3-block w3-left-align">
<img style="height:20px; transform:translate(-40%,-2.5px);" src="./img/icons/linkedin.svg"><span><b>LinkedIn</b></span>
</button>
</a>
</li>
</ul>
<p></p>
</div>
</div>
<!-- END Introduction Menu -->
</div>
<!-- Information Cards -->
<div class="w3-col l8 s12">
<!-- Content Class -> Holds one Publication -->
<div class="w3-card-4 w3-margin w3-white" style="padding: 15pt;">
<!-- START GENERATED HTML HERE --><h1 align="center">
From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models
</h1>
<h3 align="center">
Tessa Pulli<sup>1</sup>, Stefan Thalhammer<sup>2</sup>, Simon Schwaiger<sup>2</sup> and Markus Vincze<sup>1</sup>
</h3>
<i align="center">
<p><sup>1</sup> Vision for Robotics Laboratory, Automation and Control Institute, TU Wien, Austria</p>
<p><sup>2</sup> University of Applied Sciences Technikum Wien, Faculty of Industrial Engineering, 1200 Vienna, Austria</p>
<p><a href="mailto:pulli@acin.tuwien.ac.at">pulli@acin.tuwien.ac.at</a></p>
</i>
<table align="center" style="border-collapse: collapse; max-width: 300pt;">
<tr>
<td align="middle" style="border: none;">
<a href="https://arxiv.org/pdf/2409.05413" style="color: white; font-size: 14pt;">
<div style="background-color: #363636; border-radius: 50px; padding: 10px 20px; color: white; width: 80pt;">
<img src="img/document_icon.png" height="14" style="transform:translate(-10%,-1px);"> Paper
</div>
</a>
</td>
<td align="middle" style="border: none;">
<a href="" style="color: white; font-size: 14pt;">
<div style="background-color: #363636; border-radius: 50px; padding: 10px 20px; color: white; width: 80pt;">
<img src="img/logo_github.png" height="14" style="transform:translate(-10%,-1px);"> Code
</div>
</a>
</td>
<td align="middle" style="border: none;">
<a href="https://arxiv.org/abs/2409.05413" style="color: white; font-size: 14pt;">
<div style="background-color: #363636; border-radius: 50px; padding: 10px 20px; color: white; width: 80pt;">
<img src="img/logo_arxiv.png" height="14" style="transform:translate(-10%,-1px);"> arXiv
</div>
</a>
</td>
</tr>
</table>
<h2 align="center"> Abstract</h2>
<i align="center">
<p>Robots are increasingly envisioned to interact in real-world scenarios, where they must continuously adapt to new situations. To detect and grasp novel objects, zero-shot
pose estimators determine poses without prior knowledge. Recently, vision language models (VLMs) have shown considerable advances in robotics applications by establishing an understanding between language input and image input. In our work, we take advantage of VLMs' zero-shot capabilities and translate this ability to 6D object pose estimation. We propose a novel framework for promptable zero-shot 6D object pose estimation using language embeddings. The idea is to derive a coarse location of an object based on the relevancy map of a language-embedded NeRF reconstruction and to compute the pose estimate with a point cloud registration method. Additionally, we provide an analysis of LERF’s suitability for open-set object pose estimation. We examine hyperparameters, such as activation thresholds for relevancy maps, and investigate the zero-shot capabilities at the instance and category level. Furthermore, we plan to conduct robotic grasping experiments in a real-world setting.</p>
</i>
<hr />
<h2>Citation</h2>
<p>If you use this work in your research, please cite our paper:</p>
<pre><code class="language-bibtex">Coming soon!
</code></pre>
<!-- END GENERATED HTML HERE -->
</div>
<!-- END Information Cards -->
</div>
<!-- END GRID -->
</div>
<!-- END w3-content -->
</div>
<!-- Footer -->
<footer>
<p style="margin-top:3cm;"><b>2024 Simon Schwaiger.</b>
Powered by <u><a href="https://www.w3schools.com/w3css/default.asp" target="_blank">w3.css</a></u>.
Icons by <u><a href="https://www.flaticon.com/icon-fonts-most-downloaded" target="_blank">flaticon.com</a></u>.
Styling modified from <u><a href="https://github.com/KrauseFx/markdown-to-html-github-style/tree/master" target="_blank">markdown-to-html-github-style.</a></u>
Publication pages inspired by the <u><a href="https://nerfies.github.io/" target="_blank">nerfies github.io page</a></u>.</p>
<p>This website is licensed under a <u><a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative Commons Attribution-ShareAlike 4.0 International License.</a></u>
The website source code may be reused, provided you link back to the template in your footer.</p>
</footer>
</body>
</html>