Although conversational agents (CAs) are increasingly used to control smart devices, recent research has revealed that the frequency of interaction with these agents decreases over time because of a gap between users’ expectations and their actual experience. To reduce this gap, previous studies have explored the mental models underlying users’ expectations through verbal approaches such as interviews, but these approaches are insufficient because a mental model can contain abstract images that are difficult to express in words. Therefore, in this paper, we aim to understand user perceptions through a drawing approach. We asked 34 smart speaker users to draw what the CA looks like. We found that the participants drew not only the CA but also the environment surrounding it, and that their perception of this environment influences their expectations of and intimacy with the CA. Based on these findings, we suggest that environmental factors be considered significant in designing a CA persona.