Harvard Data Science Review’s Founding Editor-in-Chief, Xiao-Li Meng, and Media Feature Editor, Liberty Vittert, engage in a conversation with Dr. Seema Iyer, the Senior Director of the Hive at USA for UNHCR, the UN Refugee Agency. The Hive is the innovation lab responsible for bringing data science, machine learning, and new technologies into the organization’s operations to address the needs of refugees around the world.
The conversation between Dr. Iyer and HDSR revolves around the global refugee crisis and the pivotal role of data science in addressing it. Dr. Iyer delves into the types of data gathered to understand the needs of refugees, the challenges in utilizing this data, and the potential role of AI in facilitating new approaches. She provides specific examples about the use of AI for pro bono legal work, speedier processing of refugee statutes, and aiding communication to raise awareness about the refugee crisis. Dr. Iyer reflects on the inaugural #Innovate4Refugees convening hosted by The Hive in September to create a space to share insights about the complexities of data collection in the refugee space, emphasizing the need for a broad view of what constitutes data and the importance of creative approaches to make sense of the information gathered. The discussion also addresses the challenges of misinformation and disinformation in the digital age, with a focus on the amplification of misinformation through social media and the efforts to create safer platforms for refugees. The interview concludes with reflections on the role of AI in communication, education, and legal aspects related to refugees, pointing towards the potential of generative AI in transforming how information is disseminated and understood.
This interview is episode 34 of The Harvard Data Science Review Podcast. This episode was released October 25, 2023.
HDSR includes both an audio recording and written transcript of the interview below. The transcript that appears below has been edited for purposes of grammar and clarity with approval from all contributors.
Liberty Vittert: [00:00:00] Welcome to The Harvard Data Science Review Podcast. I'm Liberty Vittert, the feature editor of Harvard Data Science Review, and joining me is my cohost and editor-in-chief, Xiao-Li Meng. Today we'll be discussing the pressing issue of the global refugee crisis and how data is playing a crucial role in addressing it. According to the United Nations Refugee Agency, a staggering 108 million people were displaced from their homes by the end of 2022 due to various reasons such as persecution, conflict, human violence, and human rights violations. To shed light on this topic, we have special guest Seema Iyer, the senior director of The Hive, USA for the United Nations Refugee Agency's Innovation Lab that's responsible for bringing data science, machine learning, and new technologies into the organization's operations to enhance fundraising for refugees. During our conversation, we will explore the types of data that have been gathered on refugees and the challenges that organizations face in utilizing this data to streamline their efforts. Furthermore, we will delve into the role of generative AI in aiding the cause and how it's contributing to finding the solutions. Join us to learn all of this and more on this month's insightful episode of The Harvard Data Science Review Podcast.
[00:01:29] Seema, thank you so much for being here. I want to dive right in with probably the most basic question in data science, which is what kind of data are we using to help the refugee crisis? What data is being collected and what data do you need?
Seema Iyer: [00:01:47] Thanks for having me on. There are two throughputs to answer that question. One is looking at the point of the distress. So that is the refugees themselves, when a crisis occurs, what the impetus is that causes somebody to potentially have to flee. That is violence or persecution and, increasingly, unfortunately, climate change around the world as different kinds of weather events are occurring that cause people to flee and potentially not return—not just a regular disaster. So that's one end of the kinds of data that we need to really understand the problem. This is a lot of the work that UNHCR, the United Nations High Commissioner for Refugees, the UN Refugee Agency, does directly. But we can use some of that data to think about the full journey; as people potentially are leaving, where are they going? What are the host countries doing? Is that a temporary situation or a permanent resettlement, or are people even repatriating back? And then on the other end, thinking about raising awareness in the United States about the problem to the kinds of people that would be most interested in learning about it and helping. Helping typically means through money and other financial resources, but helping in so many other ways like communicating about the issue and the problem, storytelling, thinking about how technology and whatever you might have that might not be monetary but in-kind donations that you might be able to do to support people as they're experiencing this tragedy in their lives. There's a throughput, to me, one on the unfortunate origin of the problem. Then on the other side, thinking about how do we raise awareness on our end using data science along the entire spectrum.
Xiao-Li Meng: [00:03:53] Well, thank you. A few years ago, HDSR and IOM [International Organization for Migration] together organized a joint symposium. One of the topics there was about misinformation and disinformation. The idea here is obviously that data are being used by good actors as well as by bad actors. How has the data been used by the bad actors to, in some sense, hurt these refugees? We heard that during the Syrian refugee crisis, there were data being used to target these refugees.
Seema Iyer: [00:04:28] I think misinformation and disinformation are not necessarily new. We've probably had that since the beginning of time. The big change now is the amplification of misinformation, its spread and dissemination, and the kinds of platforms on which you can disseminate misinformation much faster and disguise it as potentially real information. That is the biggest problem: the speed that social media can play. And on the one hand, social media becomes a lifeline when you are fleeing, and maybe the only thing you have is your phone and that's the way to get access in a secure and anonymized way. Your phone may not be a great way for you to directly communicate anymore, so these other kinds of platforms are the way that people on the move might be trying to access information. And unfortunately, they're not terribly well-regulated. From the ground, almost anybody can kind of say anything and there aren’t many guardrails. So part of the work that I know many organizations are trying to do is an attempt to get in front of that potential dissemination of bad information and create safer platforms for refugees to access information.
Xiao-Li Meng: [00:05:50] A follow up question on that is broadly—this is for both the good data and bad data—generally, collecting data is not easy, particularly collecting good quality data. And I would imagine in this space, in the space of refugees, you probably have organizations or even countries that may or may not want you to have that data. So the question for you is what are the particular challenges in your work in terms of ensuring the data is of good quality? How do you deal with these problems? That would be something quite interesting to know.
Seema Iyer: [00:06:27] I think in today's world, we have to take a very broad view of what data even is. Any way that you can get some read on some situation, to me, becomes information and data that we could potentially use. And they come in a variety of different sources, right? Sometimes it's a tweet, or sometimes it's an actual data set. Maybe sometimes it's a government-based collection of information. Sometimes it's a census, if you're lucky, an actual census. To think very broadly about this idea of sensing what's happening, having a kind of ear to the ground, using a broad range of data, is what I like to think of as information. Because in the end, we're trying to use data to address the situation or understand a problem, and the data can come at us in many different ways.
[00:07:16] So when we think about poor quality data, in some ways, it's maybe in our imagination. How creative can we be with whatever information we might be getting any kind of access to? How can we convert that into some type of sensemaking? Because that's what we're really after. And really thinking about what kind of modeling we might be able to do based on whatever information we get. So, I always say that all data is messy. [Laughing.] All data is messy. Our job as data scientists is to really wrangle that data into some way that we can analyze it. Those are the kind of skills that I think of as a 21st century skill: to take, again, a very broad view of what data might even look like, and creating some algorithms and creating some rules and assumptions so that you can convert that information that's just coming at us into some type of sensemaking.
[00:08:15] So even though UNHCR, at the moment when somebody comes into a refugee camp, they attempt to try and get the registration of that refugee, you don't always know what happens to that refugee if they don't remain in the camp. They don't know where they might have gone. And so they are using different ways of understanding—not necessarily for a tracking purpose, but to see did an integration program actually work? If they connected them to a job, did that actually create a sustainable livelihood for them in the future? So really thinking about accessing information wherever you can get it, and maybe really thinking broadly about how that information can get you what you're looking for, which is some understanding of what is happening on the ground. I think that might be something useful for people to think about. Even if you hear one story of a terrible situation, do we really need more stories? Maybe that one story is sufficient to act on it, to actually potentially help somebody or change something. I'm not saying that one story is always the way I want to go—I'd love to have more data, but sometimes, if that's the best you can do, I would take that over nothing, right?
Xiao-Li Meng: [00:09:29] I see. Let me just follow up very quickly on that, because one common problem in data analysis, as you know, is that the data are not representative of what the situations are. But you just said something really quite important, which is that in the situation here, any indication is probably a cause for action, not just waiting for lots of data. But do you run into the problem that the places where you don't have data are the places that probably need the most help? Wherever the more terrible situations are is where it is most difficult to get data—do you run into that kind of issue?
Seema Iyer: [00:10:09] Yeah. Thinking very broadly about what data could be, I would always love more data. Setting up systems where data collection is not a burden to the people that are on the ground is really at the root of what good data science and good systems and data systems could look like. At the moment of intake, when you actually have somebody for 5 minutes—if you're lucky, if you have 5 minutes—really grabbing as much good information as potentially possible to answer as many questions as you can. That's the moment to do it. But you also don't want to overburden the person answering all these questions. So making sure you don't ask superfluous questions and solicit information that you won't need. That's another kind of opportunity to really think about the system and the moment where some data might be collected. Can you make that a better experience not only for the person giving data, but also so that you can answer your question? So really thinking about the piping, the plumbing of data collection, and where you might have some influence to collect data.
[00:11:19] In a best-case scenario, you have a chance to collect what we call primary data, where you might individually go and talk to refugees or talk to anybody along this pathway, maybe somebody who's already received asylum in a different country. But while those are very amazing data sets for other people, not only just the people that primarily are collecting it, those are getting harder and harder to collect now as people have a lot of interview fatigue and survey fatigue. So I think we do also have to think very creatively along the entire pipeline of potential data collection about where we might try to access information. And that maybe is not so new, but maybe a new place where staff or students can really think through and other people can think through 'How can you help the whole system of understanding by collecting critical data where you might have a moment to be able to do that?'
Liberty Vittert: [00:12:17] Seema, I remember I was with another group in Ecuador, and it was mostly people coming from Venezuela through Ecuador. A lot of these people are migrants, so they didn't have a refugee camp to go to. So the data that was being collected was literally volunteers or people who worked with the organization along the main highway, stopping groups, giving them backpacks of supplies, and in return, the migrants would fill out a little survey on the iPad. And I couldn't help but think that there would be a better way, or that there could be some AI-enabled way to do a better job of collecting this data. So, how do you really see AI—not necessarily just in that case, but in any way—really changing the game for the refugee crisis? Are there certain use cases that you see for that?
Seema Iyer: [00:13:08] Oh, 100%. Along that journey, I'm sure people hear a lot of conversations and stories and things like that. Imagine if we could use AI to convert those conversations into data, which in some ways is what social media is attempting to help us do. We hear these snippets of conversations in social media, but imagine if you could use AI to really put together a data set about experiences for people that they don't even have to stop, but we could actually collect those audio experiences and turn that into data. I think that's a really exciting use of generative AI in the future.
Xiao-Li Meng: [00:13:48] To follow up on that, we knew that you just had the Innovate4Refugees Forum to learn about and engage with the latest innovation, technology, and data science to address the needs of refugees, focusing on the digital and legal aspects. Can you tell us what you learned there? What other topics? This seems really interesting, and any information you can share would be terrific.
Seema Iyer: [00:14:13] Thank you. Thank you for allowing me to talk about it. It was a great day. I oversee what's called The Hive Data Science Innovation Lab at USA for UNHCR. And one of the things that we recognized is that The Hive itself is, I guess, an attractor of a variety of different people that are interested in using data and technology to support refugees along that entire throughput from origin to potentially awareness on the other end. So the question was how can we at The Hive bring all the people that are interested in this space into a place where multisector actors can actually get something out of it? Normally when you have technologists in the same room as UNHCR staff, in the same room as a foundation, or in the same room as refugees' voices themselves, it could become a cacophony of misunderstanding and you don't really get anything out of it. So we wanted to create a way of learning across different perspectives. People that are experts in technology and people who are experts in addressing the needs of refugees coming together to both share problems but also think through potential solutions.
[00:15:29] This year we focused on the protection of refugees from a legal and digital sense. So things like misinformation and human rights obviously came up a lot during the conversations. And we enabled panels to think through, 'Okay, who's got the problem?' We had, for example, UNHCR, a division of international protection, who is on the ground, helping refugees right now understand their rights. And then we also had pro-bono lawyers thinking about their experiences as somebody who wants to help and the bottlenecks that they face. In our case, we were specifically looking at automating case law citations. The longer it takes them to do this kind of background research, the fewer refugees they can help on the other end. Is there a machine learning or artificial intelligence way to help them with their case law automation and citation work so that they can basically help more refugees?
[00:16:29] I think what was awesome about the experience—and this actually goes back to my very long experience in this before I came to USA for UNHCR—was really bringing people together who have disparate perspectives on the same problem and normally don't have the opportunity to take a moment and learn from each other in a way that's mutually beneficial. Sometimes you might just get one side of that or the other side of that where, for example, if you don't know anything about technology, sitting around talking about generative AI is pretty daunting and creates a barrier. And so on the one hand, we wanted to make sure that we were talking across sectors, but we also didn't want to dumb anything down. It was very intentional that we enabled real conversations with people who have real technical skills and enabled them to talk across lines.
[00:17:27] One of our keynote speakers said it much more eloquently than I could. Her name is Malika Saada Saar, Global Head of Human Rights at YouTube. And she said 'Where the river and the ocean meet is the most fertile place for innovative thinking.' And that is what we hope Innovate4Refugees is: a place for technologists and people who are on the ground helping refugees to come together and really see each other's perspectives and hone in on potential solutions.
Xiao-Li Meng: [00:18:01] I can't help but follow up with the following question because the refugee problem, by nature, by definition, is an international problem. So what I want to get a sense of is that from this forum—I assume the speakers were from multiple countries—for the technologies and the use of innovations, do you feel there is a commonality across different countries or are some countries doing better? Because I assume when you talk about legal aspects, even the digital aspect, there are different cultures that have different backgrounds, different takes on those things. How do you manage this very international collaboration to get the best out of it?
Seema Iyer: [00:18:48] So, two things. We specifically hosted the event during the week that the UN General Assembly is in session in New York. We knew that people would be flying in that week anyway and that we could essentially capture a few hours of their time to join us. The other thing that we did was really ask people who are involved in this work what they want to learn about. On a day like this, we asked our multi-country office in Washington of UNHCR, and they were really fantastic thought partners in thinking about, 'Well, here are the real problems and here are the kinds of things that I would love to learn.' We also asked technologists, what do they want to learn? We have a relationship with Microsoft AI for Humanitarian Action and asked them the same thing. We have a Hive Advisory Board, which gratefully Liberty is joining us on, and we asked them what they would want to learn about. So thinking about and asking people what they wanted to learn really helped us curate a good overview of exactly what you're asking.
[00:19:49] One of the things that we specifically honed in on is a repository of case law that UNHCR Division of International Protection manages. It's called Refworld—Refugee World, but it's called Refworld—and it is a repository of case law where you can imagine natural language processing is a huge benefit to really digging into so many different kinds of international law that might apply very specifically to different refugees because of their country of origin or because of where they're located. One of the people that came to the forum, because they were interested in this particular work, was a lawyer from Australia who was coming to UNGA (the UN General Assembly), and he came and he stayed for the entire day, which I was grateful for. And DLA Piper, his organization, is a donor to UNHCR and provides pro-bono legal services for the displacement space. And he said, 'I didn't know anything about Refworld. And I don't know who your target audience was for this particular event, but I think I am your target audience. I am somebody who has resources to help refugees.' They have lawyers who want to help. They would love to be able to help more refugees. And if they could support better integration of technology and the Refworld platform—now that he knows about it—that was something that he got personally out of the event. So to me, that's a win-win right there of getting that international perspective for a specific problem, but it really has broader legs than that.
Liberty Vittert: [00:21:32] I think that brings me to the question of what do you really see as the next frontier in how AI can help with the refugee crisis? What's next? What's your forum going to be on next year?
Seema Iyer: [00:21:44] [Laughing].
Liberty Vittert: [00:21:44] Or maybe that's a little bit too soon to ask that question. But what do you see as the next really big challenge to tackle in this space, whether it's a use case or an idea?
Seema Iyer: [00:21:58] I will claim upfront that I'm probably not an expert in AI, personally. We have used it on our team to find patterns and audience targeting and I see a lot of the benefit. We got a lot out of the event to learn more about the new kinds of generative AI breakthroughs, thinking about how it can create content and summarize content—that's a whole different ballgame. One of the biggest things that I think could come out of this—goes back to our conversation about misinformation—is this idea of really communicating in a way that people understand.
[00:22:45] As a former professor, as we all are, sometimes you say the same thing to five different students and only one of them actually understood it the way you said it, so you kind of have to say it again in a different way so that different perspectives can hear what you're saying. I think generative AI actually has the chance of taking the same bit of information and creating the written word from it, or creating a data visualization from it, or creating a picture for it, or creating a diagram or creating a video, and that different people can actually access the same bit of information using multiple channels in ways that we wouldn't have been able to do before.
[00:23:25] So you can imagine, for example, in the refugee space, not everything's going to be written down. Not everything can even be disseminated person to person. But if we can translate things quicker and we can show videos quicker and we can provide audio quicker to meet the kind of needs and learning needs that we know all of us have different learning applications—if gen AI could really help that, I think that to me was a huge gamechanger that I learned from the event. The ways that we can disseminate and communicate in ways beyond maybe the typical report or the typical data visualization that requires some data visualization knowledge—not to say that I don't love my data viz staff, because I do—but if we can help disseminate that faster, I think that would be great.
Xiao-Li Meng: [00:24:15] That is a really great point. You mentioned that as professors, we all try to find ways to be more effective. Often, I'd be pretty lucky that you talk to five students and you actually got one of them to understand, right? Sometimes you don't even get that. So seriously, I think what you mentioned, a forum about this better communication to help people to better understand situations, that's incredible. How much discussion was kind of focusing on the education aspect? Because I assume in this great space now, both the technology and the challenges, I think the best way to combat any future problems is to have more well-trained people that have been equipped with this new technology but also a better understanding of the complexity of the problems in the refugee space. So, how do we train the future generations to be a better workforce in this area?
Seema Iyer: [00:25:13] Yeah, actually one of the questions in the room was, 'Is this going to eliminate jobs?' which is a tough question. [Laughing.]
Xiao-Li Meng: [00:25:21] [Laughing.] No.
Seema Iyer: [00:25:21] Which was a tough question to answer. The answer is no, it's not necessarily going to eliminate a job. It's going to change the nature of that job, for sure. Any time we do any kind of data analysis, no matter what the technique that we use, there's some type of data going in and there's some type of data coming out or output coming, which you have to interpret. I think the next generation of AI specialists, that aspect is not going to change, but what is going into a machine learning model or what is going into a generative AI chatbot, you still have to curate that. You still have to know what your data sources are, no matter what kind of other tool you might be massaging it through. And then interpreting the output is always going to be something that we're going to need, but that interpretation is going to change as the tools are outputting different things.
[00:26:14] I've heard things like generative AI, making sure that you attempt to curate and control the input into your generative AI—so for example, don't have it open to the entire Internet. [Laughing.] Maybe that's not a great source of information because there are some things on the Internet that are actually misinformation and wrong. So you want to attempt to try and curate that. But on the other end, generative AI is not human, so you want to make sure that what comes out is accurate. It might be logical, but it might not be factual, right? And so making sure that you fact check, which may be a different kind of skill than you had in the past or a different aspect of a job than you might have had in the past. So I don't think the job is necessarily going to go away—humans are still going to be a part of the entire process—but the output, there's going to be more of it with generative AI than we've ever seen. Being able to be discerning that it's actually accurate, I think is something that we do need to help our teams or students better understand.
Xiao-Li Meng: [00:27:18] Actually, before I turn to Liberty for the magic wand question, I can't help but add one more question for you. In terms of creating new jobs, now we know that because of ChatGPT and generative AI, one fast growing area is called prompt engineering. How do you ask the question?
Seema Iyer: [00:27:37] How do you even ask the question?
Xiao-Li Meng: [00:27:38] Right. Because that's become so important. So I'm curious, during the forum, were there any particular discussions about prompt engineering in this space? How do you communicate with the machine to get what you want? Is there a particular challenge or any particular tips?
Seema Iyer: [00:27:54] That was a huge component of what we were talking about because right now the prompt engineers are the pro-bono lawyers. [Laughing.]
Xiao-Li Meng: [00:28:02] [Laughing.] I see.
Seema Iyer: [00:28:03] They're the ones that are having to do it right now. Getting in the heads of these pro-bono lawyers to essentially automate some of the stuff that they might have been doing in the past was a part of the project. And how useful was it to learn about the way that a lawyer thinks about doing background research and how to cite a case law that might have occurred? That became a key component to making sure that it was even a good outcome. So the users in our case happened to be pro-bono lawyers, but if they didn't see that the output was accurate according to the way that they would have done it themselves, that was a great kind of outcome for that. And I think the law itself—international law, state law, local law—in some ways it's very much code. There is logic around the law.
Xiao-Li Meng: [00:28:54] Absolutely. Absolutely, yes.
Seema Iyer: [00:28:55] And in analyzing law, there are a lot of great applications. I heard one that was very interesting, using it from an advocacy point of view and using generative AI across a bunch of different laws in California to figure out if, let’s say, I am an affordable housing expert or affordable housing advocate or a refugee advocate, which of these laws actually are the kinds of things that I want to advocate for? But if I'm not a lawyer, how am I going to read through all those laws? You could ask generative AI to help you think through which laws you want to support or which laws you want to lobby for. It was fascinating to think about where, even with respect to the law, generative AI could help. So we honed in on a specific need for refugees, which was legal and digital protection. But to Liberty's point, next year we could pick a different area of need such as housing or we could pick livelihoods or we could pick education, as you already mentioned. There's no dearth, unfortunately, of needs of refugees. But the more we can help technologists understand what those very specific needs are, the more they can deploy their skills in a way that's relevant.
Liberty Vittert: [00:30:11] I feel like I have a thousand more questions I could ask, but we inevitably always have to wrap this up. So we always do a magic wand question at the end. This is if you could wave your magic wand, what would the answer be? So, not to put you on the spot—I can see your face is like, 'oh my God'—but if you could wave your magic wand to have one set, magically have one perfect set of data, that you could have to help the refugee crisis, what would it be?
Seema Iyer: [00:30:43] I would probably put it on the awareness side of this work. I think a lot of people are unaware of the plight of people around the world and how directly related it could be to your lives. One of the things that happened during the event and actually has happened a lot since I've taken this job is that people tell me about their connection to refugees—and even if I think of my own family, we're not so far removed from a refugee family. None of us. It could be by the grace of God that any one of us could be in this situation all of a sudden. So if there was a perfect data set, it would be how could we use communication and generative AI to really help people empathize, sympathize, and put themselves in the shoes of refugees? That would be the best data set that I can think of based on who you are, really thinking deeply about your own family history. Like I said, none of us is too far removed from a situation where this could have been us. That would be my magic wand data set.
Xiao-Li Meng: [00:31:52] That's a really terrific answer. I think in general, I feel like our society or any society, if we all can put ourselves in other people's shoes as much as possible, we'll be a much better society, because these days we just don't think that way. Thanks for reminding us. None of us is that remote from these situations that we don't like to get into. So I really want to thank you on behalf of Harvard Data Science Review and the entire team for coming to talk to us, but most importantly for the great work you are doing for our society. Thank you.
Seema Iyer: [00:32:25] I'm so appreciative of the opportunity. Thank you so much.
Liberty Vittert: [00:32:27] Thank you for listening to this week's episode of the Harvard Data Science Review Podcast. To stay updated with all things HDSR, you can visit our website at hdsr.mitpress.mit.edu, or follow us on Twitter and Instagram @theHDSR. A special thanks to our executive producer Rebecca McLeod, producer Tina Tobey Mack, and assistant producer Arianwyn Frank. If you liked this episode, please leave us a review on Spotify, Apple, or wherever you get your podcasts. This has been Harvard Data Science Review: everything data science and data science for everyone.
Seema Iyer, Xiao-Li Meng, and Liberty Vittert have no financial or non-financial disclosures to share for this interview.
©2023 Seema Iyer, Xiao-Li Meng, and Liberty Vittert. This interview is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the interview.