<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="description"
content="A differentiable visual prompting mechanism to extend the learning abilities of existing adaptors for foundation models.">
<meta name="keywords" content="Visual Prompts, Adverse Conditions, Object-Segmentation, Semantic Segmentation, DiffPrompter">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<!-- <link rel="icon" href="./static/images/favicon.svg"> -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://www.linkedin.com/in/sanket-kalwar-458801119/">Sanket Kalwar</a><sup>1*</sup>,</span>
<span class="author-block">
<a href="#">Mihir Ungarala</a><sup>1*</sup>,</span>
<span class="author-block">
<a href="#">Shruti Jain</a><sup>1*</sup>,
</span>
<span class="author-block">
<a href="#">Aaron Monis</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=sLJrNq0AAAAJ&hl=en">Krishna Reddy Konda</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.co.in/citations?user=oVS3HHIAAAAJ&hl=en">Sourav Garg</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.co.in/citations?user=QDuPGHwAAAAJ&hl=en">K Madhava Krishna</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Robotics Research Centre, IIIT Hyderabad, India</span>
<br/>
<span class="author-block"><sup>2</sup>AIML, University of Adelaide</span>
<br/>
<span class="author-block"><sup>3</sup>ZF TCI, Hyderabad, India</span>
<br/><br/>
<span class="author-block"><i>* indicates equal contributions</i></span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2310.04181.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2310.04181"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Video Link. -->
<span class="link-block">
<a href="https://www.youtube.com/watch?v=UwAy8gNo3Cg"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-youtube"></i>
</span>
<span>Video</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/DiffPrompter/diff-prompter"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-2">Abstract</h2>
<div class="content has-text-justified">
<p>
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems.
While foundation models have shown promise, specialized adaptors are still needed to handle more challenging scenarios.
</p>
<p>
We introduce <span class="dnerf">DiffPrompter</span>, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed ∇HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios.
</p>
<p>
Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
<!-- Paper video. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<iframe width="560" height="315" src="https://www.youtube.com/embed/UwAy8gNo3Cg?si=wcbMwG6FFO4wbZDQ"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
</div>
</div>
</div>
<!--/ Paper video. -->
<!-- </div> -->
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-2">Overview</h2>
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/Teaser_v0.6.jpg" alt="Teaser Image" width="85%">
</figure>
<p>
<span class="dnerf">DiffPrompter</span> is a semantic segmentation method that utilizes the visual and latent prompts generated by the prompt generator.
These prompts are then used by the prompt decoder to generate a semantic mask for segmenting objects, especially in adverse conditions and for low-level segmentation tasks.
</p>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column">
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/Teaser_final.jpg" alt="Teaser Image Results" width="95%">
</figure>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Overview. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="content has-text-justified">
<p>
The <span class="dnerf">DiffPrompter</span> framework serves as the inspiration for creating <span class="is-italic has-text-weight-semibold">Serial Differentiable Adapter</span> (SDA) and <span class="is-italic has-text-weight-semibold">Parallel Differentiable Adapter</span> (PDA),
both of which achieve superior results compared to the current state-of-the-art (SOTA) methods, <a href="https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_Explicit_Visual_Prompting_for_Low-Level_Structure_Segmentations_CVPR_2023_paper.pdf">EVP</a> and <a href="https://arxiv.org/pdf/2304.09148.pdf">SAM-Adapter</a>.
</p>
<p>
On the left side of the vertical dashed line in the above figure, the columns respectively represent the ground-truth segmentation mask, the SAM-Adapter (ViT-B) output,
and the PDA (SAM init.) output. In this representation, the model's masked output is shown in red, with green bounding boxes highlighting correct predictions
and red bounding boxes indicating missed segmentation outputs (false negatives) or incorrect segmentation outputs (false positives).
On the right side of the vertical dashed line, we showcase qualitative results for EVP (ViT-B) and our SDA model.
</p>
<p>
It is evident that our proposed methods, PDA and SDA, demonstrate superior qualitative performance compared to the SOTA methods, EVP and SAM-Adapter.
</p>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Proposed Method. -->
<div class="columns is-centered has-text-centered">
<div class="column">
<h2 class="title is-2">Architecture</h2>
<div class="content has-text-justified">
<figure class="is-inline-block">
<img src="./static/images/EVP-Final_PM_V0.1.jpg" alt="Proposed Method" class="mb-1">
</figure>
</div>
</div>
</div>
</div>
<div class="container is-max-desktop">
<!-- Proposed Method. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="content has-text-justified">
<p>
We propose a Differentiable Visual Prompt block, referred to as <span class="dnerf">DiffVP</span>, described in Sec. III-A.
This block learns visual prompts using <span class="dnerf">DiffIP</span> and latent prompts through a shallow vision encoder.
</p>
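<p>
As an illustration of the kind of operation a differentiable high-frequency-component block makes learnable, the sketch below extracts high frequencies by masking the FFT spectrum. The cutoff and masking scheme here are assumptions for illustration, not the exact ∇HFC design from the paper.
</p>
<pre><code>import numpy as np

def hfc(image, cutoff=4):
    """Zero out the lowest frequencies around the spectrum centre and
    invert the FFT, keeping only high-frequency detail (edges, texture)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # drop low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

img = np.ones((32, 32))  # a constant image carries only low-frequency (DC) content
out = hfc(img)           # the result is therefore numerically zero everywhere
</code></pre>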
<p>
The Differentiable Adaptor, denoted as <span class="dnerf">DiffAdaptor</span> (Sec. III-B), employs <span class="dnerf">VPTune</span> to learn the local information of the visual prompt.
Local information from the transformer layer of the encoder is provided to the <span class="dnerf">Embedding Tune</span> layer.
The outputs of the <span class="dnerf">Embedding Tune</span> and <span class="dnerf">VPTune</span> layers are combined and fed into the adaptor layer, which produces local features that, when added to the transformer layers, help learn locally invariant features.
</p>
<p>
To introduce global invariance in the features, the latent embedding generated by the shallow vision encoder is added to the transformer output features. This combination of local and global invariance aids the <span class="is-italic has-text-weight-semibold">Parallel Differentiable Adaptor</span> (Sec. III-C.1) and <span class="is-italic has-text-weight-semibold">Serial Differentiable Adaptor</span> (Sec. III-C.2) in improving semantic segmentation tasks.
</p>
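<p>
The fusion flow above can be sketched in a few lines. This is a minimal, hypothetical NumPy sketch of the fusion only; the layer names, shapes, and use of plain linear maps are illustrative assumptions, not the authors' implementation.
</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)
d = 8                                # embedding dimension (illustrative)
W_vp = rng.normal(size=(d, d))       # stands in for VPTune
W_emb = rng.normal(size=(d, d))      # stands in for Embedding Tune
W_adapt = rng.normal(size=(d, d))    # stands in for the adaptor layer

def diff_adaptor(transformer_feat, visual_prompt, latent_embed):
    """Combine the VPTune and Embedding Tune outputs, pass them through the
    adaptor layer (local invariance), then add the result and the latent
    embedding (global invariance) back onto the transformer features."""
    local = (visual_prompt @ W_vp + transformer_feat @ W_emb) @ W_adapt
    return transformer_feat + local + latent_embed

feat = rng.normal(size=(4, d))       # tokens from a transformer layer
prompt = rng.normal(size=(4, d))     # visual prompt from DiffVP
latent = rng.normal(size=(4, d))     # latent prompt from the shallow encoder
fused = diff_adaptor(feat, prompt, latent)
</code></pre>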
</div>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@article{diffprompter2023,
author = {Kalwar, Sanket and Ungarala, Mihir and Jain, Shruti and Monis, Aaron and Konda, Krishna Reddy and Garg, Sourav and Krishna, K Madhava},
title = {DiffPrompter: Differentiable Implicit Visual Prompts for Object-Segmentation in Adverse Conditions},
journal = {arXiv preprint arXiv:2310.04181},
year = {2023},
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<a class="icon-link"
href="https://arxiv.org/pdf/2310.04181.pdf">
<i class="fas fa-file-pdf"></i>
</a>
<a class="icon-link external-link" href="https://github.com/DiffPrompter/diff-prompter">
<i class="fab fa-github"></i>
</a>
</div>
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This website is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>. This website borrows its <a href="https://github.com/nerfies/nerfies.github.io">source code</a> from the Nerfies project page.
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>