🚀 VLM-FO1 + SAM3 Demo

📋 Instructions

This demo combines SAM3 detection results with the VLM-FO1 model to enhance detection and segmentation performance on complex labeling tasks.

How it works

  1. Upload or pick an example image.
  2. Describe the target object in natural language.
  3. Hit Submit to run SAM3 + VLM-FO1.

Outputs

  • SAM3 Result: raw detections with masks/bboxes generated by SAM3.
  • VLM-FO1 Result: filtered detections plus labels generated by VLM-FO1.
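The two outputs correspond to two stages: SAM3 proposes candidates, and VLM-FO1 keeps a subset and labels it. The sketch below illustrates only that data flow. It assumes, for simplicity, that each SAM3 candidate is a box plus score (and optional mask) and that the VLM-FO1 step makes a per-candidate keep/label decision. `Detection`, `refine_with_vlm`, and `toy_judge` are hypothetical names, not the SAM3 or VLM-FO1 APIs. `toy_judge` thresholds the score purely as a placeholder for the model's decision, and the detection values are made up.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple

# (x1, y1, x2, y2) in pixel coordinates; an assumed box convention.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one candidate object."""
    bbox: BBox
    score: float
    mask: Optional[object] = None  # e.g. a binary mask array; unused in this sketch
    label: Optional[str] = None    # filled in by the second stage


# A judge sees one candidate plus the prompt and returns a label to keep it,
# or None to drop it. It is a hypothetical stand-in for the VLM-FO1 step.
Judge = Callable[[Detection, str], Optional[str]]


def refine_with_vlm(candidates: List[Detection], prompt: str, judge: Judge) -> List[Detection]:
    """Second stage: keep the candidates the judge accepts and attach its label."""
    kept: List[Detection] = []
    for det in candidates:
        label = judge(det, prompt)
        if label is not None:
            # replace() returns a labeled copy, so the raw first-stage list is unchanged.
            kept.append(replace(det, label=label))
    return kept


def toy_judge(det: Detection, prompt: str) -> Optional[str]:
    """Toy placeholder: accept by score threshold and reuse the prompt as the label."""
    return prompt if det.score >= 0.5 else None


# Stage 1: made-up values standing in for SAM3's raw detections.
sam3_raw = [
    Detection((10, 10, 50, 50), 0.9),
    Detection((60, 60, 80, 90), 0.4),
    Detection((0, 0, 20, 30), 0.7),
]

# Stage 2: filtered, labeled detections (the "VLM-FO1 Result" in this sketch).
vlm_fo1_result = refine_with_vlm(sam3_raw, "red car", toy_judge)
for det in vlm_fo1_result:
    print(det.label, det.bbox, det.score)
```

Keeping the judge as a plain callable makes it easy to swap the toy threshold for a real model call. With these values, two of the three raw candidates survive, mirroring how the VLM-FO1 Result is a filtered, labeled version of the SAM3 Result.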

Tips

  • Only one label prompt per run is currently supported; support for multiple label prompts is coming soon.
  • Use the provided examples to quickly explore the pipeline.

