Data Science and Computing: The View From a Sister Campus

Hal S. Stern (1), Debra J. Richardson (2), Marios Papaefthymiou (3)
(1) Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California Irvine, Irvine, California, United States of America
(2) Department of Informatics, Donald Bren School of Information and Computer Sciences, University of California Irvine, Irvine, California, United States of America
(3) Department of Computer Science, Donald Bren School of Information and Computer Sciences, University of California Irvine, Irvine, California, United States of America

algorithms, databases, graphics, and software engineering. It also developed expertise in human-computer interaction and the social and organizational impacts of computing, and welcomed interdisciplinary collaborations that were unusual at the time. Statistics developed late at UC Irvine; the department was founded in 2002, at roughly the same time that the burgeoning Department of ICS was transitioning to a school with multiple departments. It was natural to include statistics, a discipline with both information and computational elements, in the new school.
Bringing the three key elements (computing, statistics, and information) together in a single academic entity, as Berkeley is now doing, provides a number of advantages. At Irvine, we share a common physical environment (a single building houses the large majority of our faculty) and work together often on hiring committees, seminar series, grant proposals, and so on. Collaborations in teaching, research, and program development are easily managed in such an environment. We briefly describe the experience at UC Irvine in building up educational programs, carrying out impactful research, and partnering with our colleagues on campus.

Educational Programs
Having the computer science, informatics, and statistics departments in a single school makes developing educational programs straightforward. University resources often depend on student enrollments within a school, and partitioning credit for programs that are joint with other academic units can be a challenge or impediment, though one that can usually be worked out. Not having to face that challenge enabled us to set up a data science major relatively quickly; our program began admitting students in 2016. Our B.S. in Data Science program is focused on developing students with strong skills in mathematics, statistics, and computing who are well equipped to partner in developing novel methods for a range of data types and to participate in collaborative teams that use data to inform science and policy. Upon graduation, students have the background to assume positions in industry or, if they wish, to pursue graduate school in either statistics or computer science. The program includes course sequences in statistical methods and calculus-based statistical theory. It also covers a range of core computer science topics, such as algorithms, data structures, data management, and computer organization. The program concludes with a senior-level requirement: a two-quarter (20-week), team-based project course. This capstone requirement reinforces the critical role that teams play in successful efforts to address science and policy questions.
To reach students beyond our major, UCI has developed its own Introduction to Data Science course. With no prerequisites, the course introduces the full data cycle. Topics include data collection and retrieval, data cleaning, exploratory analysis and visualization, an introduction to statistical modeling and inference, and communicating findings. Applications use real data from a wide range of fields and follow reproducible practices. Students learn to make predictions and inferences using models and to consider the impacts of data-based decisions.
More recently, UC Irvine has developed a Master of Data Science (MDS) program aimed at developing a cadre of well-trained data professionals to support the needs of industry and government. This full-time program can be completed in approximately 15 months. With minimal prerequisites (programming, calculus, linear algebra, and introductory probability/statistics), the program provides background sufficient to allow students to identify data relevant to a scientific or policy question, apply the appropriate statistical and computational skills to gather the data, and implement relevant analytical procedures over large, heterogeneous data sets in a modern cloud computing environment. As in the undergraduate program, a key element is a capstone course that provides additional training in writing and communication. This new program will welcome its first cohort in the Fall of 2021.
At the doctoral level, the Ph.D. programs in all three departments allow students to focus on different aspects of the data and computing world. Our students readily enroll in courses across the departments to develop the skills they need to address their research questions.

Research
ICS at UC Irvine has a long history of leadership in data-centric research, machine learning, and artificial intelligence. For example, from its earliest days in the late 1960s, ICS offered graduate coursework in artificial intelligence, which was quite rare at the time. In 1987, ICS graduate students set up the UC Irvine Machine Learning Repository, a collection of data sets that the research community can use to benchmark new methods or to support educational efforts. The archive has been cited more than 1,000 times. Recently refreshed through funding from the National Science Foundation, the repository now contains over 500 data sets.
Faculty in ICS carry out highly regarded research programs in the core disciplines central to data science and computing. Rather than describe these in detail, we focus here on the collaborative research partnerships that complement core research activities in statistics, machine learning, artificial intelligence, and informatics. These partnerships involve other researchers at UCI, researchers at other universities, and corporate partners.
Our partnerships span the range of disciplines that Chayes describes in Section 5 of her article. Notable initiatives include the Connected Learning Lab, the Center for Statistics and Applications in Forensic Evidence (CSAFE), and the Institute for Genomics and Bioinformatics. The Connected Learning Lab includes faculty from six different academic schools at UCI who study and mobilize learning technologies in equitable, innovative, and learner-centered ways. CSAFE is a multi-university research partnership focused on developing novel methods to accurately assess forensic evidence and thereby assure a more equitable justice system. For nearly 15 years, the Institute for Genomics and Bioinformatics has fostered innovative basic and applied research at the intersection of the life and computational sciences. Its members partner with UCI researchers in medicine and the biological sciences on issues associated with personalizing medical treatments.
These centers are just a few examples. Faculty in ICS partner with colleagues across the campus through collaborative grants and projects, and they are central members of nearly all of the extramurally funded interdisciplinary collaborations on campus that are the hallmark of modern science. The latest innovation in data science and computing at UC Irvine is the philanthropically funded Irvine Initiative in AI, Law, and Society, which brings together faculty, graduate students, and researchers from across campus (especially from the UCI Schools of Law, Social Science, and ICS) to focus on the promise and challenges of the increasing use of algorithms in science and policy.

Challenges and Opportunities
The academic world is a place of great opportunity for those with interests in data science and computing and their combined impact on science and policy. As described by Chayes and reiterated here, there are countless examples where large, heterogeneous, and diverse data sets are changing the way research is done in the physical sciences, the life sciences, the social sciences, the humanities, and beyond. As noted above, modern research teams require collaboration between data science and computing researchers and those in a wide range of disciplines. In many areas, data sets are also posing novel questions about society: How can we ensure that data and algorithms are inclusive and equitable? How can we ensure that data are secure? How can we protect the privacy of data sources? Answering such questions also requires partnerships among individuals from many disciplines. Yet all too often, university communities evaluate their members by the standards of individual disciplines. We need to make sure that faculty reviews and promotions appropriately value the interdisciplinary collaborations that are key to scientific breakthroughs.

Harvard Data Science Review • Issue 3.2, Spring 2021