Q. Vera Liao, Daniel Gruen, and Sarah Miller

The research question addressed in this paper [1] is whether AI can replace human explanations. The authors explain the need to make systems explainable and discuss the "after-questions" users raise: why or why not a certain thing is happening, how it is happening, and what would occur if something else happened. Answering these questions aims to give users a satisfactory explanation. The goals of the study were to identify scope for intervention to make XAI interfaces better for users, and to investigate the explainability of current interfaces and comment on their challenges. The authors maintained an algorithm-informed question bank for the investigation, intended as a guide for future work.

The chosen research methodology was semi-structured interviews with twenty user experience and design practitioners. All of these informants were IBM employees. This is a limitation of the study, similar to the one conducted by Wang et al. [2], where power dynamics within the company could affect responses. Factors such as company ideology and reluctance to say anything that might be frowned upon in the workplace also need to be considered among the limitations. The recruitment process was based on a combination of snowball and purposive sampling. Internal chat groups within IBM were targeted, and there was a screening criterion for recruitment. This criterion is stated only in general terms in the paper, and how it was enforced is not explained. Open coding was used to analyse the interview transcripts. However, it was done with only the theme of explainability in mind; details that may not be directly relevant but could have been worth exploring further were set aside.

A significant shortcoming of the adopted methodology is that Liao et al. try to understand users' needs for understanding and working with AI through design practitioners rather than the users themselves. Although these domain experts have enough experience to form a general picture of users' thoughts, they cannot comment on them in detail, since needs vary across people, and results may also vary with environment, ease of use, desirability, and so on. Involving users as informants would be a useful extension of the study. Participants were asked to talk about any AI software they were familiar with and had experience using. The domains of this software are not specified. There might be a bias towards certain software or domains, and many might be missed. It can be assumed that the software was specific to the participants' own domains; since a majority of the informants were from business and healthcare, the study would be bent towards their experiences, which reduces the generalisability of the findings. Alternatively, a fixed set of software could be chosen for contextual studies, and insights from those interviews could be used to compare the explainability of the software. Another possible extension for a more detailed review would be a diary study: recruited participants would perform multiple tasks with the AI over a few days or a set time period and record their experience every time they use it.

The paper is hard to read. It lacks a narrative flow of thought, and the placement of findings seems haphazard, which makes it challenging to keep track of the information presented. Moreover, the point that explainability of interfaces is essential to making AI better is repeated many times; this adds words rather than new information. The paper does not touch upon tangibility, and many of the points stated seem vague and abstract. That said, the title is justified and the research is novel.

Certain questions remain unanswered as the paper concludes and makes its recommendations. There is no commentary on how to judge whether performance is good enough for large-scale deployment. Who judges it? Should there be specific criteria for judgement apart from accuracy? How transparent should the systems be in their working, for both experts and users? A recommendation the paper ends with is that XAI products need user-friendly interfaces. There is scope for further work here: suggestions specific to the software discussed could be made to make their interfaces more explainable.

References

  1. Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20), 1–15. https://doi.org/10.1145/3313831.3376590

  2. Ruoxu Wang, Fan Yang, and Michel M. Haigh. 2017. Let me take a selfie: Exploring the psychological effects of posting and viewing selfies and groupies on social media. Telematics and Informatics 34, 4: 274–283. https://doi.org/10.1016/j.tele.2016.07.004