Much of the current postsecondary training in core data science fields treats ‘practice’ as something to be relegated to capstone projects or other final preparations before students leave their programs. Here we argue for a paradigm shift, placing a so-called practicum course at the center of a data science program, intentionally organized as a hybrid between an educational classroom and an industry-like environment. As a case study, we detail our experience of the past 5 years developing the Statistics Practicum in Boston University’s M.S. in Statistical Practice (MSSP) program. We describe the motivation, organization, and logistics of our practicum, as well as both successes and challenges we have faced. In particular, the challenge of fairly and effectively assessing student achievement and program impact in this novel setting is discussed.
Keywords: consulting, education, industry partnership
Most postsecondary students in core data science fields have traditionally encountered substantive practical experiences only toward the end of their degree program, through capstone projects, internships, and so on. This approach can leave students, upon exiting academia, needing a nontrivial ramping-up period before they can truly have an impact with their first employers. In marked contrast, we have developed a data science program in which a so-called practicum course is placed at the center, intentionally organized as a hybrid between an educational classroom and an industry-like environment. Our experience is that this approach fundamentally changes the student experience and leaves students well-positioned to have immediate impact upon employment. We describe here the overall structure of the program and the practicum, detailing both successes and challenges we have faced.
This vast enterprise we call data science can be said to be many things to many people. So much so that the parable of the Blind Men and an Elephant is often invoked in analogy. Nevertheless, at the core, there are several aspects of data science that appear to be fairly uniformly agreed upon, and perhaps primary among those is the importance of actually practicing data science.
In the specific context of a novel master’s program, practice is intentionally at the center of what we have been doing for the past 5 years. The Boston University M.S. in Statistical Practice (MSSP) program was launched in 2015. Figure 1 shows a schematic representation of the program concept. Importantly, it was designed from the ground up—starting tabula rasa—rather than as a modification or perturbation of the traditional M.A. in Statistics program we had already had for roughly 30 years. The MSSP consists of eight courses (four credits each): two electives, four core courses in computing, methods and modeling, and statistical theory, and at the center, a two-semester Statistics Practicum sequence. The relationship between the core courses and the practicum is an essential feature of the program. The choice, tenor, and timing of topics in the core courses support active learning in the practicum, in part by accelerating the core classes in the first month of fall semester classes. The other key component of the MSSP program is substantial sources of practice, coordinated and managed through the practicum, in the form of (i) semester-long external-partner projects and (ii) consulting projects for the university community. Ultimately, the aim of the MSSP program is ambitious—we seek to transform individuals from students focused largely on courses to already experienced data science professionals in as little as 9 months.
The Statistics Practicum is the heart of the BU MSSP program—it is the center about which everything else revolves. The definition of practicum given by Wikipedia is “a graduate level course, often in a specialized field of study, that is designed to give students supervised practical application of a previously or concurrently studied theory.”1 The use of a practicum has a long and respected history in fields like education, psychology, public health, social work, and others. However, to the best of our knowledge, the concept has little by way of an established presence in statistics education. Of course, most statistics programs include practical work to varying extents, often incorporated through mechanisms like class projects, capstone courses, and consulting courses or units.
Capstones in particular are popular, with many programs emphasizing sponsored projects on which student teams work over the course of an entire academic year. However, capstones, by definition, typically come at the end of a period of study, where core courses and laying ‘foundation’ were emphasized. And, generally, capstones are not integrated with those foundational courses. In contrast, our practicum is a part of the student learning experience from start to finish, and the learning done in the practicum is integrated with that of the other core courses. In our view, the iterative nature of practice is an essential part of practitioner education. MSSP student teams begin working with clients through the MSSP Consulting Service one month into the program and complete multiple projects each semester. The MSSP Consulting Service operates as an integrated part of the practicum. Students also work on semester-long projects for MSSP partner organizations, typically Boston-based business and governmental units. All of this while progressively building a stronger and stronger foundation.
The aims of this article are threefold: (i) to lay out the structure and execution of our Statistics Practicum, (ii) to illustrate the nature and evolution of the practical experiences of a typical student, and (iii) to characterize the challenges encountered. An overview of the organization and flow of our Statistics Practicum is given in Section 2. This is followed in Section 3 by a detailed discussion of sources of practice, both partner projects and consulting. A discussion of logistics and resources may be found in Section 4. In Section 5, we focus on the challenges (and our solutions to date) around the logistics and resourcing of the practicum. The specific challenge of assessment is tackled in Section 6, albeit raising more questions than answers. We close with some discussion in Section 7. In the Appendix we offer some additional brief historical context, from the perspective of statistics education.
The Statistics Practicum by design takes place within the confines of a typical semester-long course, meeting twice a week as a full class with smaller groups meeting for weekly discussions and labs. Aside from its weekly schedule, however, the practicum bears no resemblance to a traditional lecture course. The central organizing principle of the course is the data science life cycle, encountered first in the fall semester from a statistics-centric perspective and revisited in the spring semester from the perspective of a statistician working in the larger data science environment. We use a flipped classroom, requiring pre-class preparation for in-class discussions. The majority of class time is devoted to various aspects of the external partner projects and university consulting projects that are the source of the bulk of the students’ practical experiences. Discussion sections are used to augment development of select skills, while labs are used to allow small-group work on consulting projects under the guidance of teaching fellows. Below we describe in some detail the general organization and flow of the Statistics Practicum. A description of the two major sources of practice, that is, external partner projects and statistical consulting, may be found in the next section. All characterizations here correspond to the MSSP program in its current steady state of approximately 45–55 students per year.
Figure 2 shows a graphical depiction of the flow of the course in the first semester of the Statistics Practicum. The progression of topics over time is represented moving from top to bottom, while themes are indicated with colors/symbols. Prior to the start of the fall semester, students will have met for an intensive, two-week boot camp, focused largely on helping establish a relatively level playing field in terms of technical background in mathematics, probability, statistics, and computing.2 In contrast, the first three weeks of the practicum are spent establishing necessary context and perspective for statistical practice, introducing early stages of the data science life cycle, and addressing key elements of communication. In select weeks, later throughout the semester and in a just-in-time fashion, topics relevant to the latter stages of the data science life cycle are introduced. During those weeks not represented in the figure, the focus is on preparation for and presentation of milestones for external partner projects (see Figure 3 in Section 3).
In those weeks with topics, students complete a reading assignment and an online reading quiz prior to a 20–30 minute in-class discussion. During the first 3 weeks of the fall semester, as part of ramping up for the year, both classes each week have readings, with group-based active learning exercises following the discussion. From the fourth week onward, the MSSP Consulting Service is open to clients and students begin working on projects. Once consulting has begun, only one day per week is available for reading assignments. Of the two weekly practicum classes, one day is used for reading and partner projects. The other day is devoted to consulting project presentations and discussions (see Section 3 for details).
Students start the semester reading sources like the American Statistical Association’s (ASA) THIS is Statistics website, the document When You Consult a Statistician … What to Expect, from the ASA Statistical Consulting Section, and David Donoho’s “50 Years of Data Science” article (Donoho, 2017). All of these have been found to be important in helping students broaden their perhaps surprisingly limited perspective on what it means to be a statistician and to work in the larger framework of data science. Associated with these readings is an in-class discussion of career goals, which in turn kicks off a homework assignment asking them to develop a so-called target resume (i.e., a sketch of what they would like their resume to look like in 9 months). Reading and discussion about project life cycle models provides an early introduction to reproducible research, a concept that features prominently throughout the course. The phrase ‘What is the question?’, deriving from an assigned article of the same title (Leek & Peng, 2015), is fundamental to the approach to statistical practice taught in the course. These and related readings are supplemented by in-class role-playing exercises in the first few weeks, wherein students are exposed to various commonly encountered scenarios that require thoughtful consideration of what question(s) are driving a project, what the role of the statistician is in that project, and how elements ranging from ethics to reproducibility may come into play.
Throughout the practicum, students receive instruction and coaching to build communication skills. Class preparation readings for communication classes early in the fall semester include material that is simple and to the point. Recent articles have included “Ten Simple Rules for Making Good Oral Presentations” (Bourne, 2007) and “Ten Simple (Empirical) Rules for Writing Science” (Weinberger et al., 2015), from the PLoS Computational Biology series of the same name, as well as selected pieces from the Nature series English Communication for Scientists. Several short in-class activities and homework assignments are paired with these readings. These include an exercise in group oral presentations, where students are asked to prepare and present a 5-minute ‘pitch’ dubbed the ‘Why you should …’ talk. The talk can address any topic of their choice, as long as that topic has nothing to do with statistics or data science! This last aspect, although the source of much consternation among students at first, allows us to highlight and encourage good oral presentation habits without the necessity of commenting on technical content. These activities and homework assignments at the start of the year are generally accompanied by a formal rubric (mapping to principles of best practice from the readings), shared with the students ahead of time, against which their performance is measured. Having set expectations and established a common vocabulary, we apply the same rubrics informally and in real time throughout the rest of the year (i.e., as in a typical industry setting). By the end of the spring semester practicum, students have been giving presentations as individuals and in teams with such regularity that most can lead class and team discussions with little or no preparation.
Throughout the rest of the semester, students encounter readings tied to various other key aspects of the data science life cycle. These include data provenance, cleaning, and manipulation, at which time elements of confidentiality, privacy, and security are discussed; data visualization; (mis)interpretation of statistical inferences, with reading that includes recent ASA-led publications and statements on statistical significance and p values (Wasserstein & Lazar, 2016; Wasserstein et al., 2019); and the assessment and reporting of statistical results, which includes John Ioannidis’s article “Why Most Published Research Findings Are False” (2005). In-class activities associated with these topics are generally pursued directly within the context of the external partner project groups.
Tying the organization of the practicum to the data science life cycle means that we time the readings with key milestones of the external partner projects (see below for details). In turn, the timing of certain topics in other courses is also informed by the needs of the practicum. For example, the instructor of the computing course (a data science course in R) makes sure students are exposed to tools for data manipulation (based on the tidyverse) and visualization (based on ggplot2) prior to or concurrently with receiving data for their external partner projects and conducting exploratory data analysis, respectively. Similarly, the statistical methods course that students take concurrently, focused on statistical regression (from simple linear regression through generalized linear models, inclusive of mixed effects modeling), necessarily employs a more cyclic approach to topics rather than the more traditional linear approach. This approach allows students to encounter more sophisticated methods earlier in the semester when they first need them in the exploratory phases of their project work. Later they circle back and revisit those methods in greater depth as the project work itself begins to solidify.
The spring semester Statistics Practicum has a flow that is similar to the fall semester, but without the need for the groundwork of the early fall. The focus in the MSSP program shifts in the spring to the role of statisticians in the larger data science environment. In the practicum, this means a shift toward aspects of big data, data engineering, systems for automated data collection and management, relevant aspects of predictive analytics, and artificial intelligence. This shift is parallel to a change in focus in the required methodology course from statistical methods in the fall to statistical machine learning in the spring. In the practicum, there is a corresponding effort to focus the external partner projects on prediction. The fact that students see a similar pattern of topics in the Statistics Practicum over the fall and spring semesters has led us to schedule our required statistical theory course in the second semester. This approach allows them to reflect on theoretical foundations with a semester of practice under their belts, providing them with tools they will carry into practice.
The MSSP program is a practice-centric program. Our goal is for each student to emerge from the program with roughly a half-dozen new project experiences. In order to succeed in this goal for a program of even moderate size, a substantial amount of practical work must be sourced. In our program, we draw on two sources of practical work: (i) projects with external partners, and (ii) statistical consulting projects with university researchers. Importantly, each of these yields experiences that are real.3 They involve real people, asking real questions, for which they seek answers from real data. Simply put, the work matters to someone. We find that this last element both highly motivates the students and allows us to naturally hold them to professional-level expectations. By extension, this element then also inspires (indeed, obligates) us to provide students with the infrastructure and training to achieve these expectations.
In this section, we describe each of our two sources of practice in turn. Throughout, we refer to Figure 3, which shows an illustrative (and authentic) timeline of a year of practical work for one student.
Projects with external partners share a symbiotic relationship with the data science life cycle that defines the organization of the overall Statistics Practicum. The rhythm and pace of these projects is both informed by and itself informs what is covered in the course and when. These projects typically are lined up by MSSP program faculty one to six months in advance, in partnership with external entities in industry, government, and the nonprofit sector. Scoping of projects generally involves a series of initial meetings to match goals of prospective partners and MSSP, iteration by email on formal project statements, and frequently (sometimes extensive) discussion between university and partner legal representatives to define and agree upon parameters like intellectual property, nondisclosure agreements, data sharing and privacy, and so on.
Each project in a semester usually has a team of 10 to 15 students, working under the direction of an MSSP faculty member, in conjunction with one or more primary contacts within the partner institution. Teams are often broken into subteams as the semester progresses. Scope of work includes a series of predetermined milestones to be met by MSSP and/or partners, including regular check-ins, a mid-semester progress report (usually a presentation following an initial period of data cleaning/manipulation and exploratory data analysis), and a final presentation. Deliverables beyond the presentations usually include a project report and the corresponding code, the latter of which is frequently made accessible through an app (e.g., built using the shiny package in R, Chang et al., 2017).
Consider the fall semester partner project in Figure 3, shown in yellow and labeled “HPT Medicare Claims Project.” The work on this project was done during the fall of 2018 in concert with an international consulting firm focused primarily on the health care sector. The goal of the partner was to better understand various aspects associated with hypoparathyroidism (a rare condition, characterized by low levels of a hormone related to mineral balance in the body) by looking at claims data from the Medicare population. The partner expected that this project would leave them better positioned to make business recommendations to biopharma clients, by developing offerings (e.g., summary reports, prediction of trends, etc.) that leveraged the information gained through the work. Conversely, MSSP students considering the project could expect to work with gigabytes of data in an industry setting (i.e., they worked on a secure, cloud-based platform hosted by the partner), on tasks of immediate relevance to a major industry sector, with a team of professionals having an array of expertise. Students were divided into three subteams of four students each to drill down on various aspects of hypoparathyroidism, guided by the business needs of the partner: segmenting patients to define subpopulations, determining drivers of cost associated with having hypoparathyroidism, and predicting onset of hypoparathyroidism based on patient claim history. Students spent the initial couple of weeks simply trying to figure out how to navigate such data (big enough that it did not fit into memory in R). They then gradually brought to bear the necessary preliminary steps around cleaning, summary, and visualization that helped them begin to assess what the data might have to say about the business questions being asked. Each team then spent the rest of the semester pursuing analyses to achieve the goals set by the partner, generally in ways that required them to go beyond what they were learning in their methods courses at the time (e.g., self-teaching select advanced statistical and machine learning tools, guided by the lead instructor).
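To make concrete the difficulty of navigating data too large to load at once, the following is a minimal sketch of one common workaround: streaming a file row by row and keeping only running summaries in memory. It is written in Python for illustration rather than in the R tools the students used, and it is not a description of the partner's platform or the students' actual workflow. The column names (`patient_id`, `diagnosis_group`, `claim_cost`) and the helper `total_cost_by_group` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def total_cost_by_group(rows, group_col, cost_col):
    """Accumulate per-group cost totals one row at a time.

    `rows` can be any iterable of dicts (for example a csv.DictReader),
    so only the running totals are held in memory, never the full table.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += float(row[cost_col])
    return dict(totals)


# Tiny in-memory stand-in for a large claims file (hypothetical columns).
sample = io.StringIO(
    "patient_id,diagnosis_group,claim_cost\n"
    "1,A,100.0\n"
    "2,B,250.5\n"
    "3,A,40.0\n"
)
totals = total_cost_by_group(
    csv.DictReader(sample), "diagnosis_group", "claim_cost"
)
print(totals)
```

In practice the `io.StringIO` stand-in would be replaced by an open file handle; the same one-pass pattern extends to counts, means, and other summaries that can be updated incrementally.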
For the spring semester project in the figure, labeled “Public School Enrollment Projection,” there is a similar narrative, with some natural variation due to the work being done in the public sector (i.e., with the school district of a major U.S. city) rather than in an industry setting. In particular, students explored different sources of information such as grades, attendance, and student demographic information, as well as school-level information, to determine drivers of enrollment variation. After subdividing the project into subtasks (grades, special education, etc.) based on initial exploration, students explored multiple machine learning approaches that they were learning at the time in their core methodology course (i.e., boosted trees and Gaussian processes), as well as multilevel linear models and auto-regressive integrated moving average (ARIMA) models that fall under the label of more classical statistical modeling. In the end, students showed that stacking (in a machine learning sense) multilevel linear models with the currently used projection model produced the best projection. The students presented their results and submitted a 10-page written report that was delivered to the client through Google Drive.
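The idea of stacking two projection models can be illustrated with a minimal sketch, assuming the simplest possible form: a single blending weight chosen by least squares. This is not the students' implementation, and all numbers and names (`stacking_weight`, `blend`, the example enrollments) are hypothetical. Given observed values y and predictions a and b from two models, the weight w minimizing the squared error of w·a + (1 − w)·b is Σ(y − b)(a − b) / Σ(a − b)², clipped here to [0, 1].

```python
def stacking_weight(y, pred_a, pred_b):
    """Least-squares weight w in [0, 1] for the blend w*pred_a + (1-w)*pred_b."""
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        # The two models agree everywhere, so every weight gives the same blend.
        return 0.5
    # The objective is a convex quadratic in w, so clipping the unconstrained
    # minimizer to [0, 1] gives the constrained minimizer.
    return min(1.0, max(0.0, num / den))


def blend(pred_a, pred_b, w):
    """Combine two sets of predictions with weight w on the first."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical held-out enrollments and predictions from two models.
observed = [100, 120, 90]
multilevel_pred = [110, 130, 80]   # stand-in for a multilevel linear model
current_pred = [90, 110, 100]      # stand-in for an existing projection model

w = stacking_weight(observed, multilevel_pred, current_pred)
stacked = blend(multilevel_pred, current_pred, w)
print(w, stacked)
```

To avoid rewarding an overfit model, the weight is normally estimated on held-out or out-of-fold predictions rather than on the data used to fit the base models.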
Interweaving weekly course topics with partner project milestones is a hallmark of our practicum, allowing us to create an industry-like feel within an academic classroom. The names of partners and the corresponding project statements are revealed—with due fanfare!—soon after students have been introduced to the importance of the principle of asking ‘What is the Question?’ at which point some initial discussion of ethics, confidentiality, and related concepts is had as well. Project teams are formed through combined consideration of student preference and a need to balance skill sets. Each team receives its data the week that students read about data provenance, cleaning, and manipulation. In-class presentations are made midsemester, soon after we cover data visualization and the use of modeling in the context of exploratory data analysis. Later, as project teams are beginning to accelerate toward convergence and a final set of deliverables, students are reading about (mis)interpretation of statistical inference and the assessment and reporting of statistical results. Content covered earlier in the semester pertaining to oral and written communication is revisited repeatedly.
The external partner projects play a fundamental role in the Statistics Practicum and, by extension, in the overall MSSP program. However, due to both their length and complexity, it would be a challenge to scale the number of such projects to an extent that allowed us to achieve our goal of providing each student in the program with roughly a half-dozen real project experiences. Instead, we source the rest of the project work internally, through a statistical consulting service.
In 2015, we opened up a statistical consulting service (named MSSP Consulting) for the university concurrently with the start of the MSSP program. Until that point in time, the university lacked a single, coordinated entity capable of supplying statistical consulting and collaboration at scale. By launching both the MSSP program and MSSP Consulting together, the idea was to grow the university’s capacity for statistical consulting as we grew the program itself. Students are given a steady supply of projects on which to work, for which they receive course credit in the practicum, and clients receive consulting in an academic environment at a level of organization and professionalism targeting industry levels. As of the 2019–2020 academic year, each class of roughly 45–55 students helps provide consulting to approximately 100 university researchers annually.
Several examples of consulting projects are indicated in Figure 3. The student’s first project began with the opening of our consulting service on October 1.4 This involved working with a faculty member in our School of Education to begin understanding the nature and value of data from a large, multicohort survey seeking insight into violence exposure and substance use among LGBTQ adolescents. The end product of this work was a collection of exploratory tools that were packaged into a single R Shiny app for the client, enabling her own ongoing exploration of the data going forward. The second project was with a doctoral student in our Department of Biology, modeling experimental data obtained with the goal of better understanding the growth of nitrogenous bacteria. In this case, the client could not share the data with our consultants, and so the work primarily consisted of modeling advice provided interactively over a series of meetings, with an emphasis on how best to deal with zero inflation. The third project was one of a number of such projects MSSP did with master’s students doing research in the context of our university’s M.S. in Genetic Counseling program. Here the work consisted of advising on and prototyping the coding for a collection of visualizations that helped the client summarize survey data on the perception of genetic counseling of parents of children with a form of autism. Finally, the last project was with a doctoral student in our Department of Forensic Anthropology, where the goal was to examine the differences in the extent of trauma inflicted by blade-like implements under various conditions.
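For readers unfamiliar with the zero inflation mentioned in the second project, the following is a minimal sketch of one standard way to represent it, assuming count-valued outcomes (the source does not say what form the client's data took). In a zero-inflated Poisson model, an observation is a structural zero with probability `pi` and otherwise follows a Poisson distribution with mean `lam`. The function names and example counts are hypothetical.

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson model.

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


def zip_loglik(counts, pi, lam):
    """Log-likelihood of observed counts under the model above."""
    return sum(math.log(zip_pmf(k, pi, lam)) for k in counts)


# Hypothetical counts with more zeros than a plain Poisson would suggest.
counts = [0, 0, 0, 0, 1, 2, 0, 3]
mean_count = sum(counts) / len(counts)

plain = zip_loglik(counts, 0.0, mean_count)   # pi = 0 reduces to Poisson
inflated = zip_loglik(counts, 0.4, 1.5)       # illustrative parameter values
print(plain, inflated)
```

Comparing the two log-likelihoods (or, more carefully, fitted models with a penalty for the extra parameter) is one simple way to judge whether the excess zeros warrant the richer model.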
Note that the MSSP student whose experiences are summarized in Figure 3 was involved with several consulting projects at once, which is common. Importantly, despite the diversity of topics and tasks, at a certain level of granularity all four projects were approached through an identical process. This process is reflected in our management system and the training we provide those doing the managing, as well as the steps defining the beginning, middle, and end of each project, along with the software infrastructure we use to support those steps, as we describe below.
Adequately managing and staffing the Statistics Practicum demands more in the way of logistics and resources than a typical two-semester course sequence. It is important to note that the MSSP program is run as a cohort system and, as structured in its current steady-state, has both a director of the MSSP program as a whole and a director for the MSSP Consulting. This approach facilitates a high degree of coordination among elements of the program, both within and outside of the practicum. Here we sketch some of the most pertinent details around logistics and resources, for the program as a whole and, in particular, the practicum. A summary of the personnel requirements may be found in Table 1.
Our MSSP program operates on a traditional academic year cycle, with all students starting in the fall. Accordingly, admissions are run in the spring. We aim for approximately 45–55 students in each entering class. Students must have covered foundational topics such as two semesters of calculus, a semester or more of programming, and at least one year of training in statistics. They are also expected to have had at least one of a semester of discrete probability or a semester of linear algebra. While this last element may seem odd from the perspective of a traditional statistics master’s program, we have found that it is sufficient to operationalize the goal of admitting students beyond just undergraduate majors in mathematics or statistics. This approach has proven to be important in empowering students with diverse backgrounds and training to enter into and succeed in the program. It has been especially useful in helping us recruit working professionals hoping to retrain as data scientists.
Note. Among the instructional faculty there is the director of the MSSP program and the director of the MSSP Consulting service. Note that the number of research faculty is variable (shown as “1+” in the table), in that the need and financial support for their role is driven by research needs of various colleges and schools within the university. They are therefore not so much a necessary component of the practicum as they are a set of synergistic partners, benefiting in turn by the opportunity to incorporate students into their work.
Generally, we find ourselves with roughly two-thirds of our students having been mathematics and statistics majors, and usually attending the program directly after undergraduate studies. The other one-third of our students are a mixed group, both in terms of majors and in terms of whether they have worked previously after graduation or not. Among this latter group, we have had students succeed with backgrounds as diverse as economics and psychology, chemistry and engineering, computer science, marketing, fashion, and even music. In turn, this diversity of backgrounds has proven to be an enormous benefit to the quality of work done in the program. In particular, we engineer our project and consulting teams to balance qualities across our students, so that to the extent possible each team has, for example, core strengths in mathematics, statistics, and programming, as well as speaking and writing, and also both younger and older students (e.g., usually with a difference ranging from 2 to 5 years post-baccalaureate). Students learn from each other’s strengths (where, say, a strong writer may not necessarily do the writing, but rather support weaker writers in improving) and the overall quality of deliverables to partners and clients benefits as a result.
The summer before matriculation and a boot camp then become important pieces of the program as a whole. Prior to matriculation, various reading materials are made available to students and they are pointed to various resources for self-learning. In addition, students begin to receive weekly newsletters at this time, which already helps to define the notion of ‘cohort’ upon which we will build when they arrive. In the boot camp itself, as described earlier, students receive intensive exposure to core material in mathematics, probability, statistics, and computing. For this we use traditional resources and instructional methodologies, intentionally seeking to leverage what is presumably familiar to them. At the same time, however, various team-building exercises are interspersed throughout the boot camp. Through these we introduce early the importance of teamwork, and allow students to begin getting to know each other better.
The Statistics Practicum consists of two semester-long, four-credit courses (i.e., the number of credits for a standard course at our institution). It is team taught, usually with a ratio of roughly 15 students to one faculty member. So, for example, at our current size of roughly 50 students per year, we equip the practicum with three faculty members. Each faculty instructor receives full credit for teaching one course. The MSSP program director is one of the instructors and generally the lead coordinating instructor for the course. The other instructors are in turn typically drawn from among faculty teaching the other core MSSP courses. To date, instructors have been drawn from a mix of tenure-line faculty and (nontenure line) professors of the practice. Salary lines for the latter, as well as most other teaching resources (e.g., teaching fellows / assistants), are supported through tuition return from the MSSP program.
The course-related duties for instructors primarily involve (i) preparation of materials, (ii) facilitating in-class discussions of readings, (iii) providing feedback during in-class consulting presentations, and (iv) leadership on partner projects. Materials for readings and the development of associated quizzes are reviewed and updated each summer. Instructors tend to rotate among themselves the leading of the in-class discussions around readings. Feedback on consulting is provided by all instructors (see Section 4.3 for more details around consulting). And each instructor takes leadership for one partner project.
Partner projects arguably constitute the single largest time commitment for instructors each semester. Work on formal project descriptions and partner agreements (e.g., MOUs and data use agreements) is usually begun at the start of each summer, since it can take some time to work through the necessary formalities on both academic and industry sides. Realistically, however, the projects stem from discussions that often start months or even a year in advance. The MSSP program director formally spearheads this work, but generally each corresponding faculty instructor is involved in the relevant discussions early, often carrying much of the load, since relationships play an important role in all of this work. During the semester, the instructors have sole responsibility for their projects, but all instructors are working to the same semester-long schedule and touch base regularly to ensure timing remains well-coordinated. Such timing is further solidified by having common deadlines for in-class and partner presentations throughout the semester.
In addition to the instructors, the practicum usually employs at least one teaching assistant in the form of an MSSP student from the previous year’s cohort. Such students are responsible for supplementary material in the discussion sections, either presenting it themselves (e.g., a hands-on illustration of git workflows used in consulting) or coordinating speakers (e.g., a panel discussion about internships from a subset of the previous year’s students). These students also will often serve in support roles for students on each project team. The availability of such students derives from the fact that, although MSSP can be completed in just two semesters, a substantial portion (roughly 60%) of students prefer to graduate in three or even four semesters (i.e., the latter going part-time), which, among other things, allows them time to find and complete a summer internship while studying. The presence of these students in class helps provide an important element of continuity in the program and a strong sense of connection among peers across cohorts.
We employ a hierarchical management system in running the consulting service. At the base of this hierarchy are the roughly 45–55 students in the MSSP program, whose training and preparation in the practicum has already been described. Directly overseeing these students are three Ph.D. student mentors. These mentors commit to this role for the full academic year and, in exchange, receive credit as teaching fellows in the department. Finally, there are a handful of faculty associated with the MSSP program. This includes the MSSP program director, all faculty teaching in the Statistics Practicum, and certain research faculty associated with the program. The Ph.D. student mentors and the faculty come together weekly for a 90-minute administrative meeting, during which time new projects are discussed, progress on current projects is assessed, and any troubleshooting that is needed is done. In turn, the Ph.D. student mentors meet weekly with their teams, which consist of roughly 15 MSSP students per team, each of which is broken down into subteams of four or five students. It is these subteams that work together on individual consulting projects like those described previously (i.e., the student corresponding to Figure 3 would have had three or four teammates on each of the four consulting projects).
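The size arithmetic behind this structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `subteam_sizes` and `split_team` are hypothetical, and the sketch handles only the "four or five per subteam" constraint. It does not model how members are actually chosen, which the program does by balancing skills and backgrounds rather than by list order.

```python
import math


def subteam_sizes(team_size):
    """Return subteam sizes (each 4 or 5) that add up to team_size.

    Hypothetical helper: uses the fewest subteams possible and raises
    ValueError when no combination of 4s and 5s fits (e.g., 11).
    """
    n_subteams = math.ceil(team_size / 5)
    if team_size < 4 or 4 * n_subteams > team_size:
        raise ValueError(f"cannot split {team_size} into subteams of 4 or 5")
    n_fives = team_size - 4 * n_subteams  # subteams that need a fifth member
    return [5] * n_fives + [4] * (n_subteams - n_fives)


def split_team(members):
    """Cut a mentor's team list into consecutive subteams of 4 or 5."""
    subteams, start = [], 0
    for size in subteam_sizes(len(members)):
        subteams.append(members[start:start + size])
        start += size
    return subteams


# A mentor's team of 17 students (placeholder names, not real data).
team = [f"student_{i}" for i in range(1, 18)]
for subteam in split_team(team):
    print(len(subteam), subteam)
```

With a team of exactly 15, the sketch yields three subteams of five; with 16 to 18 students it yields four subteams mixing fours and fives.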
We note that the Ph.D. student mentors receive intensive training during the first month of the fall semester, in addition to attending all classes in the Statistics Practicum during that period of time. This includes training on both technical and nontechnical topics. The former includes background in R and other aspects of our software infrastructure (see below), reading on specific topics frequently encountered (e.g., mixed effect modeling, mediation analysis), and exposure to case studies drawn from previous years. The latter includes reading and discussion of principles of project management and techniques for working effectively as ‘middle managers.’
As for any service-based entity, maintaining the quality of our service is critical. Given the volume of people and projects involved, we have found that the only way to ensure a high quality of service with high probability is to lay out clear, general, replicable processes across the entire endeavor. Every consulting project request comes into the system through the same online web form, through which we gather some minimal information on the client, their questions, their data, and any progress to date. After initial review and approval by an MSSP faculty or staff member, the project is assigned to a Ph.D. student mentor, who then reaches out to the client to arrange an initial meeting. These initial meetings involve the client, the Ph.D. student mentor, and a single subteam of four to five MSSP students. Meetings are held during predefined slots of time during the week, which helps simplify time management for the Ph.D. and MSSP students (subsequent meetings, at the discretion of both clients and consultants, may be held at other times). Presuming the project is nontrivial in nature, a so-called intake form is filled out by the consulting team following this initial meeting and brought by the Ph.D. student mentor to the next administrative meeting. The project is ‘triaged’ by faculty and Ph.D. students together, resulting in an intake form (possibly edited) that is returned to both consultants and client and acts as an informal contract for the proposed scope of work. This work is then pursued along the lines outlined. Most projects will include one or more intermediate milestones, each of which involves a short presentation by the consultants to the entire MSSP class (students and faculty) during the one class period per week reserved for this purpose.5 Upon completion of the work, typically a report is written and sent to the client (upon faculty approval), along with any relevant software deliverables.
The principle of reproducible research is central to our approach. Even beyond the usual rationale motivating reproducible research as best practice, there is the added motivation that we not infrequently have clients return from year to year to a service that is largely based on MSSP students who typically stay with us no more than two or three semesters before graduating. All projects are archived through the Open Science Framework. All project materials are kept in a private repository dedicated to the purpose on GitHub and/or other data storage devices that meet the data security standards of our university and partners. Our use of GitHub, integrated with RStudio, allows us to introduce and insist upon appropriate best practices around software versioning and sharing within the larger context of the students’ statistical work.6 Final reports use a format that follows standard principles of best practice for scientific writing, as introduced in the early weeks of the Statistics Practicum. Whenever possible, these reports are generated using the R Markdown environment. For those projects in which our consultants work directly with data (i.e., as opposed to, say, inquiries about experimental design or projects in which the client cannot or will not share their data), any and all stages of analysis are executed in code (i.e., as opposed to, say, by hand in Excel). Management of projects is supported through the Basecamp software platform and, more recently, using Microsoft Teams.
The MSSP Statistics Practicum goes well beyond being ‘just a course.’ Accordingly, as could be expected, its development has entailed a number of unique challenges. Here we summarize some key challenges and our efforts to address them to date.
As an integral part of a master’s degree program, the practicum necessarily takes place in an academic environment. Among other things, this means that the practicum must ‘fit’ within standard academic ‘units’, that is, using established labels like courses, discussion sections, and labs, based on personnel with titles like student, teaching assistant/fellow, and instructor. Yet clearly the practicum is in many ways nontraditional in intent, form, and execution. The inherent tension that results requires working with the department and administration to map what we want to do onto the appropriate units. For us, such mapping has included assigning instructors (all of whom do largely identical work) to varying flavors of ‘instructor’ titles, because the course registration system allows for only one primary instructor. It has also involved a substantial exercise early on in defining the roles and duties of our Ph.D. student mentors, as middle managers in our consulting hierarchy, so that they could receive credit as teaching fellows for their work.
As mentioned already, logistics of running all of the practical work in the Statistics Practicum are nontrivial. The course is team-taught and therefore entails many of the standard challenges inherent to any sort of team teaching. At the same time, there’s more involved than, say, the need to coordinate lectures, given a half-dozen industry projects per year and roughly 100 consulting projects per year. Such challenges are alleviated by formalizing and abstracting the process to the level of key steps common to most projects, which enables instructors to leverage past experience and hence navigate each process less with a sense of ‘one off’ and more with a sense of ‘déjà vu.’ This is in fact a key benefit of adopting the data science life cycle as a central organizing principle (although, for instructors, the life cycle necessarily extends both before and after the course). Running multiple projects in parallel, each with a faculty lead, within one course also creates useful synergies. The use of student (sub)team leaders within each project helps further distribute the load while, at the same time, providing select students with additional opportunities for growth.
Working with industry partners brings its own set of challenges (albeit with the prospect for substantial reward in the balance!). Projects start and end with relationships, between MSSP and its faculty, on the one hand, and individuals at partner entities, on the other. Projects require planning, organization, and regular communication, before, during, and frequently even after the semester. And because not all potential projects come to fruition, we generally find ourselves scoping more projects than we actually implement. Additionally, it is important to get the legal aspects right, which can take some time (and, frequently, patience!). And afterward, it is important to monitor students for compliance. These sorts of challenges are not unique to our practicum, of course. Increasingly, they are being faced by a large fraction of data science programs across the United States. So much so that several bodies and gatherings around the topic of data science and education recently have focused on this topic.7 From this point of view, our Statistics Practicum arguably offers a valuable mechanism by which to navigate some of the challenges.
Standards for best practice in statistics and data science are rapidly evolving. In just the 5 years since we launched the MSSP program, topics like reproducible research, inferential statements following variable selection, the (mis)use of p values, ethics, fairness in machine learning, explainability, and the general area of artificial intelligence have received enormous scrutiny and attention, both within our field and increasingly outside. Accordingly, there is a challenge to revisit each year not only the choice of reading materials for the Statistics Practicum but also, to some extent, the topics (particularly in the second semester). Students in our program need to not only be well-trained in foundational topics but also to be aware of and capable of discussing current ‘hot topics’ in practice. Their ability to do so frequently is tested in job interviews.
As the centerpiece of our MSSP program, and given the cohort-based style in which we run the program, the Statistics Practicum necessarily must be run in a manner more tightly integrated with the other required aspects of the program than, say, in a traditional master’s program. To facilitate such integration, we hold a short, weekly MSSP staff meeting at which we allow time for curricular aspects across the courses to be raised. Relevant issues might pertain to anything from when our computing or methodology courses cover a particular topic during the semester to how well particular students appear to be juggling assigned tasks. Of course, over time, and among faculty teaching regularly in the program, many of these discussions begin to happen naturally. Additionally, at the end of each academic year, we hold a one-day program retreat, where faculty examine what worked well and what worked less well. These discussions, while critical to program creation in the first few years, still continue to hold surprises. For example, two years ago we conjectured that our statistical theory course might better serve the students if encountered in their second semester, rather than their first. Having implemented the change, we find that anecdotal evidence to date suggests this may indeed be the case: our students appear to be better served by seeing theory after developing some foundation in practice, rather than laying foundations in both theory and practice simultaneously in the first semester.
Lastly, in running the Statistics Practicum (an academic course) in a quasi-industry-like manner, with multiple teams working on multiple projects throughout the year, we find that how best to set expectations, and to incentivize students to meet those expectations, is a fascinating puzzle. Students cannot be fired, of course, yet we hold them (in time) to increasingly professional-level standards of conduct and quality of work. At the same time, students are there to be students. That means, in part, having the freedom to learn and explore, and to both succeed and fail. Ultimately, we have come to see the development of assessment criteria and mechanisms as arguably the most powerful device through which to approach this puzzle.
Ultimately, we are in a classroom and academic environment. So assessment is a necessary aspect—assessment of both student success and practicum success. Given the highly nontraditional character of the practicum—consciously blending aspects of traditional pedagogy and a quasi-industry work environment—the question of assessment is nontrivial. And one that we—and, we believe, most data science programs—are only in the early stages of beginning to tackle in a methodical fashion. Heeding the call of Gelman and Loken (2012), our aim is to develop in the coming years an infrastructure for assessment that allows us to practice what we preach!
Assessment is a term that covers a range of topics, a portmanteau that requires unpacking to be useful. In its most productive form, assessment includes feedback that enables, enhances, and accelerates learning. Feedback can be multidirectional, including input to faculty or students as individuals or in groups. Grading, the default feedback mechanism in education, is widely understood to be imperfect. In addition to its lack of dynamic range, grading reduces a complex multidimensional problem to a single scale and does not resemble evaluation practices common in actual practice—except, perhaps, some compensation decisions. In the practicum, we are challenged by the fact that the feedback we offer students is often drowned out by the grades we are required to give at the end of each semester.
We are working toward formal learning outcomes-based assessment, such as proposed in Chance and Peck (2015). To some extent, however, the very innovation and mechanisms that make our practicum work are also some of the most prominent barriers that make this kind of assessment challenging. First and foremost, because all of the projects that students work on are real, it is not possible to provide uniform experiences that lead to a strictly uniform assessment rubric. Another important aspect is the diversity of backgrounds. Since we recruit students with diverse backgrounds, the rate of student success on specific aspects of the program can vary in a nontrivial manner from one student to another—yet all may (and typically do) prove highly successful in finding a job and beginning their career soon after graduation, which arguably is most students’ primary concern! Despite understanding these limitations, here are some of the ways in which we currently perform assessment at MSSP.
Throughout the practicum, students are assessed on individual knowledge and performance using multiple traditional assessment tools. Online reading quizzes (offered through our university’s course management software) are used to assess their reading comprehension of practicum topics. Peer-review assessment using an online platform, guided by rubrics published for students with each assignment, is used to assess their writing assignments. Their in-class consulting presentations and reports are given direct feedback and noted for completion.
What we find most challenging to assess, however, is their consulting and project work. Since the outcomes of these projects are explicitly managed to be successful, we rely on self-, peer, and co-assessment with a detailed rubric, which has roots in Dochy et al. (1999). This approach allows us to assess both the group members and the direct supervisor (e.g., the Ph.D. student mentor, in the case of the consulting projects). As these metrics have a nontrivial subjective element to them (e.g., see Bryan et al., 2005, for discussion stemming from a formal study in the context of education of medical students), we go through multiple rounds of calibration to maintain consistency across different groups and supervisors (i.e., similar in spirit to the problem of calibrating rankings across multiple raters). Despite the inherent challenges in this exercise, we have found that there is educational value even simply through the act of doing these assessments, as they promote self-reflection and further understanding of the evaluation criteria.
Student grades for each semester are based on a weighted average of attendance/participation (10%), reading and quizzes (25%), homework assignments (15%), partner project work (25%), and consulting work (25%). These percentages are meant, in part, to suggest to students a rough sense of how we expect them to allocate their time. Passing grades at the graduate level are set to a B- or better by university dictates. It is extremely rare for us to encounter a student who is somehow able to sufficiently avoid engaging throughout the semester so as to merit a failing grade. As a result, at the end of each semester we face the further challenge of how best to distinguish among student investment on a limited grading scale. Grading is curved and ultimately we find it reflects the truth in what we advertise at the start of each academic year: that, with sufficient hard work (and they do indeed work hard), failing is typically not a concern—but to excel is still a challenge.
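To make the weighting concrete, the following minimal Python sketch computes a semester score as a weighted average using the percentages stated above. Several details are assumptions made only for illustration: the component names, the 0-100 scale for each component, and the example scores are hypothetical, and the sketch does not model the curve or the mapping from scores to letter grades, neither of which is specified here.

```python
# Weights taken from the percentages in the text; the dictionary keys
# are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "attendance_participation": 0.10,
    "reading_quizzes": 0.25,
    "homework": 0.15,
    "partner_project": 0.25,
    "consulting": 0.25,
}


def semester_score(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Illustrative scores for one hypothetical student (not real data).
example = {
    "attendance_participation": 100,
    "reading_quizzes": 90,
    "homework": 80,
    "partner_project": 85,
    "consulting": 95,
}
print(round(semester_score(example), 2))
```

Because partner project and consulting work together carry half of the weight, a student's hands-on work moves the overall score as much as all other components combined, which is consistent with the time allocation the percentages are meant to suggest.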
Currently, we have no direct measure of success for the practicum as a whole. But we have many indirect measures. Some of these latter include metrics associated with the success of the consulting service, which last year served roughly 100 university researchers. Our students are placing well, with what we estimate to be a 95% placement rate within 6 months of graduation, with students frequently joining major multinational companies in the industry sectors of their respective interest. Overall, students are expressing their satisfaction in the exit surveys. And we are finding that our quickly growing alumni base is highly enthusiastic, maintaining a level of connection back to the program that easily surpasses what we have traditionally seen in any of our other graduate degree programs.
Assessment is an essential part of practice. In businesses, so-called 360-degree assessments (e.g., Heathfield, 2020) can move the focus of conversation from individual performance to organizational performance and how individuals affect it. However, conducting effective 360-degree assessments is nontrivial (e.g., Edleson, 2012). And there is a delicate balancing act between acknowledging values promoted in a corporate setting, such as change management (how organizations need to constantly evolve), and how a traditional educational institution sees assessment as meeting certain predefined criteria. This is the yin-and-yang of a professional academic program, and it will be an inevitable tug-of-war throughout its existence.
One thing that we are sure of is that this balance cannot be achieved by the judgment of academic faculty alone. How best to strike this balance deserves an extended dialogue between academia and industry, both at the level of individual programs like ours and more broadly across data science. Within MSSP we try to facilitate conversations with our industry partners through various events, such as our MSSP Dinner Dialogues, our Stats@Work industry speaker series, local Meetup events, and so on. We are also in the process of forming a corporate advisory board that we envision will serve as an important source of feedback, formalizing much of the network we have drawn upon to date.
In August 2020, we held our sixth bootcamp,8 taking the first steps to prepare this year's new students for the practicum and all of the elements of the MSSP program that integrate through it. Between mid-August and early October, students learned the essential skills for day-one consulting, while a team of three Ph.D. student mentors reviewed past consulting engagements to prepare for managing consulting projects and mentoring students. A practice-centric program must begin each year by re-creating the organization through which services are delivered, and educational goals are achieved. When the program started in 2015, we did not fully understand the implications of practice-centric education or the role the practicum would play as a reaction chamber in which evolution accelerated.
For MSSP students, the pace and rhythms of the practicum permeate the program, binding skills and knowledge learned in required and elective courses to direct experience in an iterating cycle of presentations, readings, discussions, and seminars. For faculty, teaching fellows, and administrators, the start of another year sets in motion a weekly schedule of meetings focused on coordinating consulting engagements with teaching fellows, managing partner projects, scheduling consulting presentations in the practicum, and maintaining the program's calendar of seminars and events. These meetings themselves constitute a practice through which the practicum and the MSSP program are refined and improved.
MSSP was initiated and has developed as the data science wave began to surge across many universities, including Boston University itself. As we were defining and operating a practice-centric, statistically flavored data science program, one of us was serving as co-chair of the Roundtable on Data Science Postsecondary Education convened by the National Academies of Sciences, Engineering, and Medicine. Through that lens, it was apparent that many programs across the nation were and are exploring creative ways to incorporate additional elements of practice into their curricula. Our practicum arguably represents an outlier in the spectrum of existing approaches, both in the extent to which it incorporates practice and the manner. This is a large and active space, with similar (and also early) outliers being programs like the M.S. in Analytics at North Carolina State University and the M.S. in Data Science at the University of San Francisco. See the NCSU interactive visualization for some of the many other graduate degree programs in analytics and data science.
A natural concern we have heard from colleagues in describing our practicum model is, ‘But what about statistical theory and methods?’ Central to our approach is the perspective that this is not an either/or proposition. The practicum is a unifying device (i.e., recall Figure 1) and, in particular, an amplifier for what students learn about statistical theory and methods. In bringing theory and methods to bear in practice they are guided and supported in our program through, on the one hand, the tight coupling of content and timing between core courses and the practicum needs, and on the other hand, the iterative nature of learning used in the practicum, which in turn reflects the natural progression of refinement in practical work. Ultimately, our sense is that students come away with a richer understanding of the interaction of theory and practice than those we have taught in more traditional settings.
Our use of words like ‘practice’ and ‘practicum,’ as opposed to ‘applications’ and ‘applied,’ is deliberate and intentional. Our feeling is that the ‘theory versus applications’ perspective that has so dominated statistics and other core data science fields for decades is a false dichotomy. And one that will hold data science back from its full potential if similarly adopted. We are encouraged that increasingly others appear to agree. The number one finding and recommendation of the report from the 2018–2019 National Science Foundation-sponsored workshop series Statistics at a Crossroads: Challenges and Opportunities in the Data Science Era (He et al., 2019) is: “the central role of practice.” Specifically, the authors state, “Today it is imperative for us to put practice at the center of our discipline with relevant computation and theory as supports.” At the same time, they emphasize, “Good statistical theory must inform and strengthen practice, or we are wasting our time and energy ….” Put simply, theory informs principle, and principle informs practice; and practice, in turn, informs theory.9
Looking back on the organizing logic of the MSSP program, it only made sense to center a statistical practice program on a practicum course. The practicum is the driving force in the program. Among its roles, the practicum is a committee of the whole where students, instructors, and teaching fellows interact in an ongoing discussion where theory and principle affect the outcomes of consulting engagements and partner projects in which students are actively engaged. Making the practicum the center of the program—not a capstone, lab, or external project—tears down the wall between academic abstraction and the work MSSP students came to the university to begin.
Disciplines engaged in data science, and their respective members, arguably are engaged with practice to varying extents. Within our own field of statistics—rather ironically, for a field that is central to the data science endeavor as a whole—historically the reception to arguments for the importance of practice has been curiously mixed. On the one hand, statistical practice has been pervasive throughout the sciences for over a century, and remains so in the modern age, whether through machine learning applications in automatic facial recognition systems, or the market basket analysis in online markets, or longitudinal studies in the development of new drugs. On the other hand, there has been a pervasive and consistent inclination in statistics education toward statistical theory and the mathematics of statistics, with statistical practice generally being relegated to a secondary role. This situation is arguably most acute at the university level—a uniquely important and formative point in time for students, just prior to the beginning of their careers as data scientists.
This tension within statistics is not new, going back at least to the 1960s. And calls for shifting statistical practice to the center of our field’s collective focus (calls we echo here) are not new either. For example, John Tukey (1962) famously wrote:
What of the future? The future of data analysis can involve great progress, the overcoming of real difficulties, and the provision of a great service to all fields of science and technology. Will it? That remains to us, to our willingness to take up the rocky road of real problems in preference to the smooth road of unreal assumptions, arbitrary criteria, and abstract results without real attachments. Who is for the challenge?
Similarly, roughly 20 years later, the Committee on Training Statisticians for Industry, within the American Statistical Association’s Section on Statistics Education, published a report (Boardman et al., 1980) recommending that educational programs “focus on real problems and the statistical theory and methodology that are useful in their solution.”
Another notable event was a symposium funded by the National Academies of Sciences, Engineering, and Medicine (NASEM) on interdisciplinary statistics education for 21st-century statisticians, held by the Committee on Applied and Theoretical Statistics (CATS) in San Francisco in August 1993. The published proceedings (Kettenring, 1994) open with a quote from G. E. P. Box: “statisticians must learn to be good scientists, a talent which has to be acquired by experience and example.” While the symposium discussion included many issues that remain important today, the symposium may be more notable for not discussing the 1993 event that has had the most profound effect on statisticians (and the rest of humanity!) of the 21st century: in April 1993, CERN open-sourced the code for the World Wide Web.
But the momentum toward a shift in perspective has been increasing, and perhaps is even beginning to surge now in a way that can (and, we ourselves would argue, should!) have a profound impact on the field of statistics, and thereby on data science. Twenty-five years later, the pace of change brought on by the Internet and the Web is assumed, and interdisciplinary statistics has morphed into data science, with perspectives beginning to look even further down the road. For example, in the NASEM (2018) report Data Science for Undergraduates: Opportunities and Options, the committee began by envisioning the world of 2040, when 2018 newborns will be completing training courses and college degrees, ready to enter the workforce. By envisioning the conditions of midcentury data science practice, the committee provides insight into how the data science curriculum of today might evolve to prepare students for successful careers.
Practice is central because it is the vehicle on which we ride waves of change. Particularly supportive evidence for this assertion comes from the outcomes of the previously referenced 2018–2019 NSF-sponsored workshop series Statistics at a Crossroads: Challenges and Opportunities in the Data Science Era.
Eric D. Kolaczyk, Haviland Wright, and Masanao Yajima have no financial or non-financial disclosures to share for this article.
Boardman, T. J., Hahn, G. J., Hill, W. J., Hocking, R. R., Hunter, W. G., Lawton, W. H., Ott, R. L, Snee, R. D., & Strawderman, W. E. (1980). Preparing statisticians for careers in industry: Report of the ASA section on statistical education committee on training of statisticians for industry. American Statistician, 36(2), 65–75. https://doi.org/10.2307/2684106
Bourne, P. E. (2007). Ten simple rules for making good oral presentations. PLoS Computational Biology, 3(4), Article e77. https://doi.org/10.1371/journal.pcbi.0030077
Bryan, R. E., Krych, A. J., Carmichael, S. W., Viggiano, T. R., & Pawlina, W. (2005). Assessing professionalism in early medical education: Experience with peer evaluation and self-evaluation in the gross anatomy course. Annals-Academy of Medicine Singapore, 34(8), 486. https://www.semanticscholar.org/paper/Assessing-professionalism-in-early-medical-with-and-Bryan-Krych/129e26f1dee338060fcc0ad58ba7ce60a1540091?p2df
Chance, B., & Peck, R. (2015). From curriculum guidelines to learning outcomes: Assessment at the program level. The American Statistician, 69(4), 409–416. https://doi.org/10.1080/00031305.2015.1077730
Chang, W., Cheng, J., Allaire, J., Xie, Y., & McPherson, J. (2017). Shiny: Web application framework for R (Version 1.5.0). RStudio. https://CRAN.R-project.org/package=shiny
Dochy, F. J. R. C., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A review. Studies in Higher Education, 24(3), 331–350. https://doi.org/10.1080/03075079912331379935
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766. https://doi.org/10.1080/10618600.2017.1384734
Edleson, H. (2012). Do 360 evaluations work? Monitor on Psychology, 43(10), 58. https://www.apa.org/monitor/2012/11/360-evaluations
Gelman, A., & Loken, E. (2012). Statisticians: When we teach, we don’t practice what we preach. Chance, 25(1), 47–48. https://chance.amstat.org/2012/02/ethics-25-1/
He, X., Madigan, D., Wellner, J., & Yu, B. (2019). Statistics at a crossroads: Who is for the challenge? [Workshop report]. National Science Foundation. https://www.nsf.gov/mps/dms/documents/Statistics_at_a_Crossroads_Workshop_Report_2019.pdf
Heathfield, S. M. (2020, July 15). What is a 360 review? Definition and examples of a 360 review. The Balance Careers. https://www.thebalancecareers.com/what-is-a-360-review-1917541
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), Article e124. https://doi.org/10.1371/journal.pmed.0020124
Kettenring, J. R. (1994). Modern interdisciplinary university statistics education: Proceedings of a symposium. National Academies Press. https://www.nap.edu/catalog/2355/modern-interdisciplinary-university-statistics-education-proceedings-of-a-symposium
Leek, J. T., & Peng, R. D. (2015). What is the question? Science, 347(6228), 1314–1315. https://doi.org/10.1126/science.aaa6146
National Academies of Sciences, Engineering, and Medicine. (2018). Data science for undergraduates: Opportunities and options. National Academies Press. https://www.nap.edu/catalog/25104/data-science-for-undergraduates-opportunities-and-options
Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1–67. https://projecteuclid.org/euclid.aoms/1177704711
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05.” The American Statistician, 73(Suppl. 1), 1–19. https://doi.org/10.1080/00031305.2019.1583913
Weinberger, C. J., Evans, J. A., & Allesina, S. (2015). Ten simple (empirical) rules for writing science. PLoS Computational Biology, 11(4), Article e1004205. https://doi.org/10.1371/journal.pcbi.1004205
©2021 Eric D. Kolaczyk, Haviland Wright, and Masanao Yajima. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.