Text-only chat interfaces limit the kinds of research and assistance a conversational system can support. This project explored a Python-based visual chatbot that accepts text, images, and video as inputs.
The system combines natural-language processing with multimodal input handling and model APIs. A user can provide visual context alongside a question, allowing the interface to support tasks that would be difficult to express through text alone.
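As a rough illustration of what "visual context alongside a question" could look like in Python, the sketch below bundles a user's text with attached files and labels each file by modality. The names `Attachment`, `UserTurn`, `classify`, and `build_turn` are hypothetical and are not taken from the published system; modality detection here relies only on the standard library's filename-based MIME guess, which is an assumption about how inputs might be sorted.

```python
import mimetypes
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """A file supplied with a question (hypothetical structure)."""
    path: str
    modality: str  # "image", "video", or "other"


@dataclass
class UserTurn:
    """One user message: the question text plus any visual context."""
    text: str
    attachments: list[Attachment] = field(default_factory=list)


def classify(path: str) -> str:
    """Guess the modality from the file extension via its MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "other"
    top_level = mime.split("/", 1)[0]
    return top_level if top_level in ("image", "video") else "other"


def build_turn(text: str, paths: list[str]) -> UserTurn:
    """Bundle the question and its attachments into a single turn."""
    return UserTurn(text, [Attachment(p, classify(p)) for p in paths])


turn = build_turn("What trend does this chart show?", ["chart.png", "clip.mp4"])
for attachment in turn.attachments:
    print(attachment.path, attachment.modality)
```

Keeping the text and attachments in one object means whatever model API sits downstream receives the question and its visual context together, rather than as separate, loosely related messages.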
Research focus
- How should a conversation represent visual context?
- What feedback helps a user understand what the system processed?
- How can multimodal responses remain clear and reviewable?
- Which interface patterns reduce friction in research applications?
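One possible way to approach the feedback question is a short "receipt" that tells the user which inputs were actually used. The sketch below is illustrative only: `ProcessedInput` and `format_receipt` are hypothetical names, and the fields are assumptions about what a user might want to see, not a description of the published interface.

```python
from dataclasses import dataclass


@dataclass
class ProcessedInput:
    """Outcome for one attached input (hypothetical structure)."""
    name: str
    modality: str   # e.g. "image" or "video"
    used: bool      # False if the input was skipped
    note: str = ""  # optional short detail shown to the user


def format_receipt(inputs: list[ProcessedInput]) -> str:
    """Build a plain-text summary of what was and was not processed."""
    if not inputs:
        return "No visual inputs were attached."
    lines = []
    for item in inputs:
        status = "processed" if item.used else "skipped"
        detail = f" ({item.note})" if item.note else ""
        lines.append(f"- {item.name} [{item.modality}]: {status}{detail}")
    return "\n".join(lines)


print(format_receipt([
    ProcessedInput("chart.png", "image", True),
    ProcessedInput("talk.mp4", "video", False, "file too large"),
]))
```

Showing skipped inputs next to processed ones gives the user a way to review a response against what the system actually saw.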
The work was published at IITCEE 2026 in Bangalore as "Development of a Python-Based Visual Chatbot for Advancing Human-Computer Interaction in Research Applications." The publication is indexed with DOI 10.1109/IITCEE67948.2026.11394241.
For me, the project connected AI engineering with interface design: a model may support multiple modalities, but the product still has to help people use those capabilities with confidence.