<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="description"
content="A differentiable visual prompting mechanism to extend the learning abilities of existing adaptors for foundation models.">
<meta name="keywords" content="Visual Prompts, Adverse Conditions, Object-Segmentation, Semantic Segmentation, DiffPrompter">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<!-- <link rel="icon" href="./static/images/favicon.svg"> -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://www.linkedin.com/in/sanket-kalwar-458801119/">Sanket Kalwar</a><sup>1*</sup>,</span>
<span class="author-block">
<a href="#">Mihir Ungarala</a><sup>1*</sup>,</span>
<span class="author-block">
<a href="#">Shruti Jain</a><sup>1*</sup>,
</span>
<span class="author-block">
<a href="#">Aaron Monis</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=sLJrNq0AAAAJ&hl=en">Krishna Reddy Konda</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.co.in/citations?user=oVS3HHIAAAAJ&hl=en">Sourav Garg</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.co.in/citations?user=QDuPGHwAAAAJ&hl=en">K Madhava Krishna</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Robotics Research Centre, IIIT Hyderabad, India</span>
<br/>
<span class="author-block"><sup>2</sup>AIML, University of Adelaide</span>
<br/>
<span class="author-block"><sup>3</sup>ZF TCI, Hyderabad, India</span>
<br/><br/>
<span class="author-block"><i>* indicates equal contributions</i></span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2310.04181.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2310.04181"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Video Link. -->
<span class="link-block">
<a href="https://www.youtube.com/watch?v=UwAy8gNo3Cg"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-youtube"></i>
</span>
<span>Video</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/DiffPrompter/diff-prompter"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-2">Abstract</h2>
<div class="content has-text-justified">
<p>
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems.
While foundation models have shown promise, specialized adaptors are still needed to handle more challenging scenarios.
</p>
<p>
We introduce <span class="dnerf">DiffPrompter</span>, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed ∇HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios.
</p>
<p>
Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
<!-- Paper video. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<iframe width="560" height="315" src="https://www.youtube.com/embed/UwAy8gNo3Cg?si=wcbMwG6FFO4wbZDQ"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
</div>
</div>
</div>
<!--/ Paper video. -->
<!-- </div> -->
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-2">Overview</h2>
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/Teaser_v0.6.jpg" alt="Teaser Image" width="85%">
</figure>
<p>
<span class="dnerf">DiffPrompter</span> is a semantic segmentation method that utilizes the visual and latent prompts generated by the prompt generator.
These prompts are then used by the prompt decoder to generate a semantic mask for segmenting objects, especially in adverse conditions and for low-level segmentation tasks.
</p>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column">
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/Teaser_final.jpg" alt="Teaser Image Results" width="95%">
</figure>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="content has-text-justified">
<p>
The <span class="dnerf">DiffPrompter</span> framework serves as the inspiration for creating <span class="is-italic has-text-weight-semibold">Serial Differentiable Adapter</span> (SDA) and <span class="is-italic has-text-weight-semibold">Parallel Differentiable Adapter</span> (PDA),
both of which achieve superior results compared to the current state-of-the-art (SOTA) methods, <a href="https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_Explicit_Visual_Prompting_for_Low-Level_Structure_Segmentations_CVPR_2023_paper.pdf">EVP</a> and <a href="https://arxiv.org/pdf/2304.09148.pdf">SAM-Adapter</a>.
</p>
<p>
On the left side of the vertical dashed line in the above figure, the columns respectively represent the ground-truth segmentation mask, the SAM-Adapter (ViT-B) output,
and the PDA (SAM init.) output. In this representation, the model's masked output is shown in red, with green bounding boxes highlighting correct predictions
and red bounding boxes indicating missed segmentation outputs (false negatives) or incorrect segmentation outputs (false positives).
On the right side of the vertical dashed line, we showcase qualitative results for EVP (ViT-B) and our SDA model.
</p>
<p>
It is evident that our proposed methods, PDA and SDA, demonstrate superior qualitative performance compared to the SOTA methods, EVP and SAM-Adapter.
</p>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Proposed Method. -->
<div class="columns is-centered has-text-centered">
<div class="column">
<h2 class="title is-2">Architecture</h2>
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/EVP-Final_PM_V0.1.jpg" alt="Proposed Method" class="mb-1">
</figure>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Proposed Method. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="content has-text-justified">
<p>
We propose a Differentiable Visual Prompt block, referred to as <span class="dnerf">DiffVP</span>, described in Sec. III-A.
This block learns visual prompts using <span class="dnerf">DiffIP</span> and latent prompts through a shallow vision encoder.
</p>
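<p>
As an illustration of the kind of operation a differentiable high-frequency-component block makes learnable, the sketch below extracts high frequencies by masking the FFT spectrum. The cutoff and masking scheme here are assumptions for illustration, not the exact ∇HFC design from the paper.
</p>
<pre><code>import numpy as np

def hfc(image, cutoff=4):
    """Zero out the lowest frequencies around the spectrum centre and
    invert the FFT, keeping only high-frequency detail (edges, texture)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # drop low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

img = np.ones((32, 32))  # a constant image carries only low-frequency (DC) content
out = hfc(img)           # the result is therefore numerically zero everywhere
</code></pre>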
<p>
The Differentiable Adaptor, denoted as <span class="dnerf">DiffAdaptor</span> (Sec. III-B), employs <span class="dnerf">VPTune</span> to learn the local information of the visual prompt.
Local information from the transformer layer of the encoder is provided to the <span class="dnerf">Embedding Tune</span> layer.
The outputs of the <span class="dnerf">Embedding Tune</span> and <span class="dnerf">VPTune</span> layers are combined and fed into the adaptor layer, which produces local features that, when added to the transformer layers, help learn locally invariant features.
</p>
<p>
To introduce global invariance in the features, the latent embedding generated by the shallow vision encoder is added to the transformer output features. This combination of local and global invariance aids the <span class="is-italic has-text-weight-semibold">Parallel Differentiable Adaptor</span> (Sec. III-C.1) and <span class="is-italic has-text-weight-semibold">Serial Differentiable Adaptor</span> (Sec. III-C.2) in improving semantic segmentation tasks.
</p>
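<p>
The fusion flow above can be sketched in a few lines. This is a minimal, hypothetical NumPy sketch of the fusion only; the layer names, shapes, and use of plain linear maps are illustrative assumptions, not the authors' implementation.
</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)
d = 8                                # embedding dimension (illustrative)
W_vp = rng.normal(size=(d, d))       # stands in for VPTune
W_emb = rng.normal(size=(d, d))      # stands in for Embedding Tune
W_adapt = rng.normal(size=(d, d))    # stands in for the adaptor layer

def diff_adaptor(transformer_feat, visual_prompt, latent_embed):
    """Combine the VPTune and Embedding Tune outputs, pass them through the
    adaptor layer (local invariance), then add the result and the latent
    embedding (global invariance) back onto the transformer features."""
    local = (visual_prompt @ W_vp + transformer_feat @ W_emb) @ W_adapt
    return transformer_feat + local + latent_embed

feat = rng.normal(size=(4, d))       # tokens from a transformer layer
prompt = rng.normal(size=(4, d))     # visual prompt from DiffVP
latent = rng.normal(size=(4, d))     # latent prompt from the shallow encoder
fused = diff_adaptor(feat, prompt, latent)
</code></pre>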
</div>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@article{diffprompter2023,
author = {Kalwar, Sanket and Ungarala, Mihir and Jain, Shruti and Monis, Aaron and Konda, Krishna Reddy and Garg, Sourav and Krishna, K Madhava},
title = {DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions},
journal = {arXiv preprint arXiv:2310.04181},
year = {2023},
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<a class="icon-link"
href="https://arxiv.org/pdf/2310.04181.pdf">
<i class="fas fa-file-pdf"></i>
</a>
<a class="icon-link external-link" href="https://github.com/DiffPrompter/diff-prompter">
<i class="fab fa-github"></i>
</a>
</div>
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This website is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>. This website borrows its <a href="https://github.com/nerfies/nerfies.github.io">source code</a> from the Nerfies project page.
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>