Evaluating and Complementing Vision-to-Language Technology for People who are Blind with Conversational Crowdsourcing

2019-11-06
Abstract

We study how real-time crowdsourcing can be used both for evaluating the value provided by existing automated approaches and for enabling workflows that provide scalable and useful alt text to blind users. We show that the shortcomings of existing AI image captioning systems frequently hinder a user's understanding of an image they cannot see to a degree that even clarifying conversations with sighted assistants cannot correct. Based on analysis of clarifying conversations collected from our studies, we design experiences that can effectively assist users in a scalable way without the need for real-time interaction. Our results provide lessons and guidelines that the designers of future AI captioning systems can use to improve labeling of social media imagery for blind users.
