Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz

The paper [1] addresses user interface design for the expanding landscape of AI technologies. The authors work through existing literature to select recommendations relevant to their field of interest and follow a thorough process to validate and rewrite them. The paper notes the struggle faced by designers and developers of AI interfaces, since most existing work is theoretical, scattered across different studies, and not directly adoptable. This paper is therefore more practical in its approach, catering to the implementation side of the field.

The authors followed a four-phase methodology. The first phase focused on collecting existing guidelines and recommendations in the field in question. These were drawn from academic literature and from non-peer-reviewed sources such as blog posts and public articles, and the authors broadened their knowledge base by studying and analysing reviews of industry products. This phase resulted in a collection of 168 guidelines, from which the authors consolidated 20 to work with for the scope of this study. The guidelines were classified under four headings corresponding to the stages of user interaction with an AI system: before the interaction, during the interaction, when the system behaves wrongly, and over time. The second phase applied a modified version of heuristic evaluation to the selected guidelines; the modification was that the guidelines themselves were under test, rather than the interface, as in the original method. The authors ran this evaluation against a varied set of widely used AI products. Based on the evaluation, they revised the set into 18 guidelines, which they then worked on in greater detail; however, the purpose, need, and method behind this decision are not explained.

Moving forward, the authors take care to justify the involvement of thirteen authors. The study is comprehensive in its approach: not only does it include a preliminary evaluation conducted by the authors themselves, but it also incorporates users in phase 3 to gather their perspectives on the applicability of each recommendation and the clarity of its phrasing. This user study involved 49 HCI practitioners. Participants were asked to use a particular AI-driven feature of a product and then comment, through a questionnaire, on the scope of the recommendations. A semantic differential scale was used to capture detail rather than a simple binary response, and incentives were awarded according to the number of responses provided. The AI-based products were chosen based on their online ratings, their use of AI, and their non-offensive nature. Sampling was done through mailing lists spread across four countries, targeting people with at least some experience in HCI and user research. However, no details are given regarding the demographics of the recruited informants, although the authors explain well how they mitigated possible institutional or representation biases. Figure 1 visually represents the results of the study; these graphs make interpretation easier and reduce ambiguity for an application-based reading. The authors discuss in detail the responses to each guideline and how they revised their collection accordingly.

The final phase used expert feedback to judge and validate the revisions made to the guidelines in light of the user responses. Recruitment was done through snowball sampling, and respondents were spread across sub-domains within user experience; however, they all came from the same organisation, which may have introduced bias into the study. Overall, the research methodology was iterative, with informed decisions made at every step.

Amershi et al. cite sources that justify their decisions, such as the selection criteria for their research methodology, rather than only citing the resources they used. They make good use of tables and figures to add engagement, value, and visual impact to the research. The paper is very well written and presented, though some guidelines lack clarity and need more detail to be understood by everyone involved in the field. Despite the scope for straying into detail on any particular guideline, the authors stick to the primary idea. This can be seen as an advantage in terms of writing style, but also as a disadvantage, since participants were not offered the option to share interesting experiences; one set of informants was asked for examples they felt were relevant, and these could have been cited to add value. The title is to the point rather than overstated, but it misses the holistic, implementation-oriented approach to interaction that the paper adopts.

This research is a good starting point for putting into practice recommendations from research across the globe. The study is general and can be extended to other fields. Its results may not replicate in domains other than those from which the informants were recruited, and other populations may not respond in a similar pattern. Extensions of the study could address the shortcomings highlighted in this review.

References

  1. Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–13. https://doi.org/10.1145/3290605.3300233