🚀 VLM-FO1 + SAM3 Demo

📋 Instructions

Combine SAM3 detection results with the VLM-FO1 model to enhance detection and segmentation performance on complex label tasks.

How it works

1. Upload or pick an example image.
2. Describe the target object in natural language.
3. Hit Submit to run SAM3 + VLM-FO1.

Outputs

SAM3 Result: raw detections with masks/bboxes generated by SAM3.
VLM-FO1 Result: filtered detections plus labels generated by VLM-FO1.
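
The relationship between the two outputs can be sketched as a propose-then-filter step. This is only an illustration under stated assumptions, not the demo's actual code: `Detection`, `refine_detections`, and `toy_judge` are hypothetical names, and the score threshold in `toy_judge` merely stands in for the real VLM-FO1 decision, which depends on the image and the prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box layout.
BBox = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Hypothetical container for one detection."""
    bbox: BBox
    score: float
    label: Optional[str] = None  # raw proposals carry no label yet


def refine_detections(
    proposals: List[Detection],
    prompt: str,
    judge: Callable[[Detection, str], Optional[str]],
) -> List[Detection]:
    """Keep the proposals the judge accepts and attach the label it returns.

    `judge` returns a label string to keep a proposal, or None to drop it.
    """
    kept = []
    for det in proposals:
        label = judge(det, prompt)
        if label is not None:
            kept.append(Detection(det.bbox, det.score, label))
    return kept


def toy_judge(det: Detection, prompt: str) -> Optional[str]:
    """Placeholder judge (assumption): accept proposals scoring at least 0.5."""
    return prompt if det.score >= 0.5 else None


# Made-up proposals standing in for a raw "SAM3 Result".
proposals = [
    Detection((10, 20, 110, 220), 0.91),
    Detection((300, 40, 360, 90), 0.32),
    Detection((150, 60, 240, 200), 0.77),
]

# Filtered and labeled list standing in for a "VLM-FO1 Result".
refined = refine_detections(proposals, "person holding an umbrella", toy_judge)
for det in refined:
    print(det.label, det.bbox)
```

Passing the judge in as a callable keeps the filtering loop independent of whichever model makes the keep-or-drop decision.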

Tips

Only one prompt is supported at a time for now. Multiple label prompts will be supported soon.
Use the examples below to quickly explore the pipeline.

🔗 References

SAM3
VLM-FO1
