Column Editor’s Note: In this issue's “Minding the Future” column, high school student Anne Mykland recounts her experiences at a data science internship, and discusses how her real-life research connects to—yet differs from—classroom lessons. Practical research through internships and similar opportunities is a great way to build up data science acumen in the next generation. If you have organized data science research experiences for pre-college students, consider submitting a column proposal. We'd love to hear from you!
Keywords: high school, natural language processing, data science education, STEM research
From weather prediction to chatbots, data science and statistics are everywhere today. Yet over the course of my four years in high school, only three months of statistics education were required. These months have mainly consisted of memorizing formulas and definitions from a textbook. What are the mean, median, and mode? How do we calculate the standard deviation? While these concepts are crucial in building a solid foundation in statistics or data science, hand-calculating means from fictional agriculture data quickly begins to feel long-winded and purposeless. Furthermore, many people are intimidated by numbers and are pushed away from pursuing data science or statistics after seeing numerous rows of quantitative data. Even for students who are more comfortable with numbers, standard questions from high school statistics textbooks can be inaccessible.
The coding aspect of data science is similar. Even in intermediate computer science classes, coding is often taught through a traditional instructive approach, which walks students step by step through each line of code. Emphasis is likely placed on the memorization of built-in functions. The traditional approach may work well to produce in-depth learning in the long term. However, over a shorter period, such as a single semester, it could leave many students confused by complicated terminology and by a lack of context and intuition for what is being learned. The instructive method also causes students to become caught up in details. Students may fail to understand the overarching principles behind coding and be unable to see the breadth of interesting things that can be achieved through code.
In the summer of 2022, I attended the Data Science Institute (DSI) Summer Lab at the University of Chicago, a 10-week internship program that brings together high school, undergraduate, and graduate students to conduct formal data science research with a professor.
I found the internship to be an invaluable experience, as it provided me with a different approach to learning data science topics. Consider the above example of learning programming. Instead of coding line by line, research begins with a broad topic and a question that needs to be answered. To approach this larger task, I had to break the problem down into smaller pieces and explore the logic required to implement my ideas. This process helped me develop more intuition for the code; the functions that I was using naturally made sense to me because I could place them in the context of the problem. Memorization also came automatically as I coded more. Additionally, to get my code to run through large amounts of data in short periods of time, I learned to code efficiently. Good coding practices were also crucial so that the numerous pieces of analysis stayed organized. This was a practical aspect of computer science that did not exist in my high school classes.
The open-ended nature of research required me to study literature relating to my project. While I had previous experience reviewing literature in my history classes, I could not say the same for many of my STEM classes. Searching for relevant papers and navigating through them are two important skills I learned in the summer of 2022. These skills are especially significant in data science, where cutting-edge knowledge is rapidly updated, and concepts can be highly technical. Furthermore, when the reading materials were not directly related to the project, they opened my eyes to many exciting new methodologies and drove me to learn more than I ‘needed’ to know. This contrasts with traditional high school classes, where the data science or statistics curriculum does not encourage much exploration. Literature review also allowed me to dive more deeply into topics that piqued my interest.
Looking back on the internship, I ask myself what made the experience so exciting. Problem-based learning is one reason. However, there are other factors, such as communication and teamwork, the interdisciplinary nature of data science, and relatedness to one’s daily life.
During the first few days of the internship, I set my research goal for the summer: I wanted to use quantitative methods to better understand the divisiveness of language in American political speech. I often watched the presidential debates on YouTube and discussed politics with peers and family members. This was therefore a topic that was familiar to me and connected to my own life. The question of language divisiveness is also very open-ended: In contrast to high school classes, where I was handed numerical data and asked to perform specific operations on it, I now had greater freedom to collect the kinds of data I wanted to use and choose the features that I wanted to analyze. The freedom encouraged me to think creatively and made learning data science topics a more personalized experience.
Furthermore, from toxic posts on social media to the 2021 riot at the U.S. Capitol, the harmful consequences of divisiveness can be seen everywhere. Therefore, pursuing this research felt meaningful on both a societal and a personal level. This sense of relevance provided me with a degree of interest and motivation that I did not gain from solving textbook questions.
Along with the personal connection, I was excited about my project because of the political science aspect that it contained. In school, subjects are taught separately, creating the illusion that they stand isolated. Perhaps the analytical skills used to understand classical literature in English class can be applied to history articles, and perhaps adding numbers in binary is a skill useful in both math and computer science. Yet the crossover between STEM and social science or humanities classes is minute. Many students begin to feel as if they can only excel in humanities, while STEM is beyond their comprehension, and vice versa. In particular, students may label themselves as ‘math people’ or ‘nonmath people,’ and they often perceive this characteristic as immutable. However, research has shown that the math mindset can be learned (Boaler, 2022). Engaging in data science research is one way to expose those who feel intimidated by numbers to STEM-related concepts.
Indeed, data science incorporates concepts from many fields. This notion can be seen on the level of the data itself, which does not necessarily take on a numerical form. The debate transcripts that I worked with, for example, were in text form. Along with the data, my topic of research helped me understand the connection between various fields of study. Applying the knowledge from my history classes and daily experiences to my data science research was gratifying and helped me understand just how far the reach of data and data science extends.
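As a minimal illustration of how text can be turned into something countable, the Python sketch below tallies the words in a single sentence. The sentence and the function name `word_counts` are hypothetical, invented for this example, and simple word counting is only an assumed starting point; it is not necessarily the analysis used in the project described here.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text, split it into words, and count each word."""
    # Keep runs of letters and apostrophes; punctuation and digits are dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sentence, invented for illustration (not from any transcript).
sample = "We must work together, and together we will succeed."
counts = word_counts(sample)

# Show how often a few chosen words appear in the sample.
for word in ("we", "together", "succeed"):
    print(word, counts[word])
```

Once words have been counted in this way, the familiar classroom tools, such as means and standard deviations, can in principle be applied to text just as they are to numerical tables.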
The interdisciplinary nature of data science can spark the interest of students whose passions lie in a wide variety of subjects, breaking down the walls of the math mindset. For example, analyzing the language in old plays could interest students who enjoy theater, and developing algorithms to help disadvantaged populations access various health care options would captivate students who are passionate about public service. For those who enjoy visual art, image processing could be an engaging project. Textual and visual data are familiar to many and could be intuitive ways to introduce people to data science.
The program gave me a glimpse into the world of academic collaboration and teamwork. I was part of a research team and worked with two professors, including my mentor, as well as two graduate and two undergraduate students. Attending weekly team meetings, I listened to the experiences of people in different stages of their data science careers. Following their suggestions for analysis, I was exposed to a variety of machine-learning techniques and lexicons that were beyond the scope of a high school class. Reading through their code for the project, I learned good coding practices. Team members also taught me many real-life skills, from working with servers to searching for relevant literature.
My time at the internship also acquainted me with the research process, from brainstorming to analysis and presentation. Giving presentations was particularly valuable, as this was my first time presenting a highly technical topic. I learned how to explain my research to people who were not necessarily experts on the subject, which is crucial given the applications and implications that data science research often has for the wider community and the public. Weekly seminars given by other students and faculty were also very stimulating, broadening my understanding of what it means to be a data scientist. The seminars exposed me to a large variety of data science applications, including human–robot interaction, dark patterns, genetics, and many others.
Most high school students have written research papers in humanities classes. However, STEM research seems to be a missing piece. My internship experience makes me believe that incorporating data science research into the high school curriculum would be an excellent way to engage students.
This is not to say that the traditional approach can be discarded. Basic coding syntax, density curves, and statistical hypothesis tests, which I learned about in my high school classes, provided me with a level of data science familiarity that made my research easier to approach. It would have been difficult for me to enter the internship with no prior experience.
Nevertheless, research remains an invaluable educational tool. It allows for a more intuitive understanding of data science concepts and a hands-on experience. Returning to high school classes after the conclusion of my internship, I feel more comfortable coding, employing statistical tools, and devising creative solutions. I believe that the problem-solving skills learned throughout the research experience are also applicable in a professional work setting. Thus, a combination of the problem-solving approach and the traditional instructive approach would be most beneficial in teaching data science to students.
Encouraging students to explore data science literature is a potential first step in introducing them to research. Personally, reading literature helped me broaden my understanding of data science as a field and sparked my interest in natural language processing. Each student could, for example, read through various pieces of literature and present a favorite finding. This would provide students with real-life examples of data science that they found personally interesting.
This combined approach of instruction and problem-solving does not just apply to high school. As data science becomes an increasingly important tool in a wide range of areas and lines of work, data science education has never been more crucial. Research projects can be incorporated within data science education both at the university level and in further education programs within companies and government agencies.
Anne Mykland has no financial or non-financial disclosures to share for this article.
Boaler, J. (2022). Mathematical mindsets: Unleashing students' potential through creative mathematics, inspiring messages and innovative teaching. John Wiley & Sons.
©2023 Anne Mykland. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.