Software Engineering Practices in Academia: Promoting the 3Rs— Readability, Resilience, and Reuse

Over the past decade as data science has become integral to the research workflow, we, like many others, have learned that good data science requires high-quality software engineering. Unfortunately, our experience is that many data science projects can be limited by the absence of software engineering processes. We advocate that data science projects should incorporate what we call the 3Rs of software engineering: readability (human understandable codes), resilience (fails rarely/gracefully), and reuse (can easily be used by others and can be embedded in other software). This article discusses engineering practices that promote 3R software in academia. We emphasize that best practices in academia may differ from those in industry because of substantial differences in project scope (most academic projects have a single developer who is the sole user) and the reward systems in place in academia. We provide a framework for selecting a level of software engineering rigor that aligns well with the project scope, something that may change over time. We further discuss how to improve training in software engineering skills in an academic environment and how to build communities of practice that span across disciplines.

for software documentation along with an automated system for documenting Tellurium application programming interfaces.

Astropy
Astrophysics has a long history of community software development extending back to the 1980s. These initiatives have been orchestrated by large organizations (e.g., those funded through government support such as IRAF [Image Reduction and Analysis Facility]; Tody, 1986, and Starlink; Currie, 2014) and by smaller groups of users who built domain-specific applications (e.g., the analysis package for gamma-ray astronomy, gammapy; Deil et al., 2017). More recently, a third development strategy has emerged in which packages (or frameworks) of larger scope are created through the integration of many smaller packages. Astropy (Price-Whelan et al., 2018), created in 2011, is an example of such a strategy.
Astropy was motivated by the rise of Python as a lingua franca in astronomy. A central goal of the Astropy Project was to provide consistency and completeness for common calculations and tools used by astronomers.
Examples of these tools include unit conversions, the manipulation of sky coordinates (e.g., transforming from Galactic coordinates to Right Ascension and Declination), and software to read and write common astronomical data formats.
The packages integrated into Astropy were, in large part, developed by researchers well versed in software engineering practices. For example, constituent packages made use of version control via GitHub, had unit tests for the core libraries and functions, and issued pull requests as a means of developing new features.
Packages were distributed using common software repositories such as PyPI (Python Package Index). Tools for continuous integration (e.g., Travis CI and Jenkins) were adopted early in the development of Astropy to improve the reliability and robustness of the software. This solid engineering foundation greatly facilitated the construction of Astropy and its adoption by the community.
Even with this engineering foundation, package integration was nontrivial because of the need for common abstractions across packages. An example of this is the sub-package astropy.units, which provides a representation of physical units used in astrophysics, enables translation between units, and has the ability to decompose complex parameters (e.g., the Hubble parameter) into their base units (i.e., inverse time). As with many early Astropy packages, the units package was developed from an existing application that had introduced units to cosmological simulation software and was then extended to support the needs of the broader astronomical community. The lack of existing standards within astronomy for units led to the inclusion of all available standards within the package to make it as general as possible. Functionality to translate between conventions enabled the units package to provide general support without forcing the community to agree on a set of standards. This 'ease of use' philosophy underlined many of the Astropy design choices.
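To make the idea of decomposing a quantity into base units concrete, the following is a minimal sketch in plain Python. It is not the astropy.units implementation or its API: the Unit class, its methods, and the conversion constants are hypothetical names and illustrative values introduced here, and the sketch assumes that a unit can be represented as a scale factor multiplied by integer powers of base units.

```python
from dataclasses import dataclass

# Illustrative conversion factors to metres (assumed values, not taken from Astropy).
METRE_PER_KM = 1.0e3
METRE_PER_MPC = 3.0857e22


@dataclass(frozen=True)
class Unit:
    """A unit stored as a scale factor times powers of base units (hypothetical class)."""

    scale: float
    powers: tuple  # sorted tuple of (base_unit_name, exponent) pairs

    @staticmethod
    def base(name):
        """Create a base unit such as the metre or the second."""
        return Unit(1.0, ((name, 1),))

    def _combine(self, other, sign):
        """Add (sign=+1) or subtract (sign=-1) the exponents of another unit."""
        merged = dict(self.powers)
        for name, exponent in other.powers:
            merged[name] = merged.get(name, 0) + sign * exponent
        # Drop base units whose exponents cancel (e.g., km / Mpc leaves no length).
        return tuple(sorted((n, e) for n, e in merged.items() if e != 0))

    def __mul__(self, other):
        if isinstance(other, (int, float)):
            return Unit(self.scale * other, self.powers)
        return Unit(self.scale * other.scale, self._combine(other, +1))

    __rmul__ = __mul__  # allows expressions such as 70.0 * km

    def __truediv__(self, other):
        return Unit(self.scale / other.scale, self._combine(other, -1))


metre = Unit.base("m")
second = Unit.base("s")
km = METRE_PER_KM * metre
mpc = METRE_PER_MPC * metre

# A Hubble parameter of 70 km/s/Mpc: both length units cancel on decomposition.
hubble = 70.0 * km / second / mpc
print(hubble.powers)  # only the second remains, with exponent -1
print(hubble.scale)   # roughly 2.27e-18 in units of 1/s
```

Because the two length units cancel, the only base unit left is inverse time, which matches the description above. A production package has to handle far more (named units, equivalencies between conventions, fractional powers), which is part of why the integration work was nontrivial.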
Building the community of Astropy users, maintainers, and developers required convincing astronomers with little or no formal training in software engineering to adopt these standard tools and procedures if they wanted to contribute to the code base. With few financial resources, the training and education of the user community was supported by the Astropy developers themselves. The availability of GitHub, on which Astropy was built, provided the tools and infrastructure on which to develop community-agreed engineering practices for version control, issue tracking, and communication. The availability of infrastructure tools and repositories such as GitHub is key to the sustainability of software projects in astronomy and enables common approaches to be adopted within a community.

3R Software Engineering Practices
The foregoing academic software projects, while different in size and scope, show how good software engineering practices can increase the adoption and trust of software packages beyond the developers who wrote them. Based on this and our experiences in software development, we have developed a set of recommendations for software engineering practices that aid in the development of 3R software. We do this with great humility since software engineering is a field with a long history and a vast literature (see Boehm, 1976; Glass et al., 2002). A good starting point in this literature is the work on artifact sharing (Timperley et al., 2020).
By engineering practice, we mean a collection of related activities used in building, evolving, and managing software systems. Examples of engineering activities include coding, quality assurance, and distributing software. We use the term artifact to refer to work products of software engineering, especially code, documentation, and data. We note in passing that data science produces other artifacts, such as predictive models and analysis pipelines. These artifacts are beyond the scope of this article.
Our recommendations come in part from the experience of members of the eScience Institute at the University of Washington and discussions. eScience has more than 20 technical staff, almost all of whom have a PhD in a primary discipline such as Computer Science, Physics, Human Centered Design, Statistics, and Chemistry.
Technical staff are engaged in many educational programs. Some teach formal courses in their departments.
Others orchestrate and/or instruct Software Carpentries. eScience oversees the development of data science curricula and courses at the graduate and undergraduate level. We also have an extensive outreach program for on-campus researchers, our Winter Incubator, in which domain researchers dedicate 16 hours per week for a quarter to work with a member of eScience technical staff. In the summer we run a program for matriculating undergraduates and graduate students from across the country (and even globally) to undertake data science projects for social good (DSSG). Beyond this, we offer approximately 20 hours per week of office hours to researchers who seek focused consultations with technical staff. These interactions provided us with extensive insights into the challenges encountered in academic projects of varying size and duration.
The engineering practices we propose combine the collective insights of eScience technical staff with feedback from software engineers in industry and at national laboratories as well as researchers who teach software engineering in an academic environment. Where possible, we point to published recommendations and draw on work from the field of software architecture and development practices. The vast majority of this literature focuses on team processes for complex projects, with only a modest discussion of software development (actually building software) and almost nothing on maintaining and extending existing academic software.
From our experience, since many academic projects are short-term and of small scale, it is appropriate to apply a level of engineering rigor that is commensurate with the project scope. An important consideration is, however, to highlight the requirements for transitioning a project to a larger scope. We provide recommendations of best practices and how these practices may evolve as a project moves from a single developer, to use within a self-contained team, and then possibly to broad adoption by a research community.
Since our interest is in data science, we focus on Python and R, the most widely used languages in data science.

Engineering Practices and Their Interactions
A central theme in this article is that engineering practices should be scaled to the project scope. That is, in smaller projects, a practice may be greatly simplified or absent altogether. However, if a project grows, there must be awareness of how to incorporate engineering practices that were not considered previously. In the following, we describe how engineering practices need to evolve as projects grow. Although we have recommendations for what practices to change, there is less agreement from the use cases we have studied about what should trigger a change in engineering practices or how to build consensus within a developer community to adopt these changes. We refer readers to studies on this topic related to the adoption of developer tools (Brooke Jordan, 2014) and security analysis (Jaspan et al., 2007).
Broadly, there are technical and people management activities (which we use synonymously with practices) within software engineering. Technical practices produce code, data, and documentation of the software internals. Refining this further, the production of code and data includes design, quality assurance (e.g., testing), and packaging and deployment. People management activities address coordination and communication within the project and communication between project developers and users. The people management practices have associated artifacts as well, such as project plans and prioritized lists of features and fixes.
The nature of engineering practices depends strongly on the scope of the project. An example of a project with a small scope is a short-term exploratory effort by a single researcher. In contrast, a project with large scope often involves multiple teams at different locations. We consider three project scopes:
Developing for your own use (solo). Our experience is that the vast majority of academic software projects consist of a single developer who is the sole user of the software. These projects are often undertaken as part of a research exploration. Few academic projects advance beyond this stage.
Developing for your research lab (lab). Many researchers work in teams. They often find that the problem solved by their software is shared by others in their team. In these projects, developers and users are in frequent contact.
Developing for a broad research community (community). There are a small number of projects that are used by a broader research community and/or are of sufficient technical scope that a large team is required.
We emphasize that the boundaries between these scopes are fluid. For example, a solo project may evolve into a lab project, and the reverse can happen as well.

Details of Engineering Practices
There is a vast literature on software engineering and engineering practices. In this section, we describe a subset of these practices that we feel are most relevant to data science. These practices relate to: version control, design, coding, quality assurance, packaging and deployment, user documentation, team management, and user engagement. We organize the discussion by project scope (including some references to more in-depth discussions of these software engineering practices). For each topic, there are two bullets. The first describes what the practice is and why it is important; the second bullet outlines some recommended tools and best practices.
We begin with version control (Blokdyk, 2022).
What: Version control deals with tracking changes to artifacts (e.g., code, documents, data) in shared collections of files called repositories. Version control is an essential part of making software resilient and reusable.
How: For software, services such as GitHub (Ponuthorai, 2023; Wikipedia, n.d.-e) allow users to have a code repository where changes can be viewed. Commonly used features are (a) undoing a change that introduced an error and (b) coordinating changes among multiple developers. Another widely used feature is a version control 'branch' that allows developers to make changes in parallel (and also facilitates managing experimental data). Branches enable experimentation and exploration of new ideas without impacting the primary or main branch. A solo project needs version control to ensure that the code is not lost and to revert to previous versions if a bug is introduced. A lab project has additional requirements, such as resolving 'change conflicts' (changes to the same line in a file by different developers). In a community project, more formal coordination is done to manage releases, develop new packages or features based on the initial code base (i.e., forks), integrate codes from other groups, and handle urgent bug fixes ('hot fixes') that are done between formal releases.
Software design (Keeling, 2017) is at the core of creating resilient (e.g., by early consideration of error conditions) and reusable software (e.g., by a modular design).
What: There are multiple components to software design, of which we consider two critical for data science applications. The first is the design of the user experience or use cases, often referred to as functional design. This is about how a user interacts with the system to accomplish their objectives. The second is component design. This specifies how to create and interconnect software artifacts that perform the use cases.
How: Appendices B and C contain simplified templates that we developed for functional and component design and that we use in CSE 583 (University of Washington, 2023-b), DATA 515A (University of Washington, 2023-c), and CHEME 545 and 546 (University of Washington, 2023-a) at the University of Washington. The functional design specifies a set of use cases that are detailed descriptions of user interactions with the software system. Appendix B contains a template for functional design. A component design can be expressed in many ways, such as: a data flow diagram (Li & Chen, 2009), UML diagrams that describe objects with properties and behaviors (Fowler, 2003), and entity-relationship diagrams (Li & Chen, 2009). Appendix C contains a template for component design. In solo projects, design may be done informally (e.g., in a notebook). In a lab project, there is often some discussion that requires a shared white board and sometimes an informal write-up. In community projects, more formality is required, such as a standard template for functional and component design documents.
Computer programming or coding (Kernighan & Pike, 1999) is the process of writing detailed instructions so that a computer can perform a desired task.
How: Over the last 50 years, programming has evolved into a systematic engineering activity with powerful productivity tools. Examples of such tools are integrated development environments (IDEs) for Python (e.g., PyCharm; Nguyen, 2019) and R (e.g., RStudio; Allaire, 2011). In recent years there has been a trend toward 'literate programming' (Knuth, 1992), especially 'notebooks' (e.g., Jupyter) that intermix code with text to provide a narrative for an analysis. For a solo project, readability is greatly improved by providing notes about decisions made (e.g., in a GitHub README file or within the notebook) as well as by the use of consistent naming conventions to facilitate understanding codes written months earlier. In a lab project, readability and resilience are enhanced by agreement on common data structures and coding styles (with tools such as linters to enforce style). A community project often takes this a step further by having code reviews in which developers explain their motivations for engineering decisions and reviewers advise on approaches to improve reuse.
In industry, quality assurance (Patton, 2005) is a very broad term that encompasses the entire engineering process.
What: Quality assurance is about ensuring resilience, good performance, security, and privacy.
How: For academic projects, the focus is mostly on testing for errors at various levels. For a solo project, it likely means implementing unit tests (codes that check for errors in functions and methods) for key elements of the project. Excellent open source packages are available to enable these tests (e.g., Python unittest and R testthat). For a lab project, unit tests are more extensive, and there is continuous integration (e.g., run all unit tests after every commit to the software repository). A community project likely includes additional quality tests for each software release to ensure there is no 'regression' in future releases. (See Nielsen, 2000, for a more detailed discussion.)
Packaging and deployment (e.g., Waldon, 2012) are activities that make software available to users, an essential element of making software reusable.
What: The goal is to make software developed by one user available to other users. This often requires that the software developer structure their codes into a package that can be shared with other users. Users may have different software installed on their computers, even different operating systems. So, the package must specify its dependencies, such as a particular version of the Python library numpy. This raises a further challenge that two packages may have conflicting requirements, such as different versions of numpy.
How: Most academic projects use an install model of package deployment in which the user's computer is updated to incorporate the software. PyPI is the most common mechanism for distributing Python packages, and CRAN is widely used for R packages. Other software repositories include Conda and SourceForge. There are also service models for software distribution in which the software runs on servers owned by the provider and users are not aware of the software updates (e.g., Gmail). Still another approach is container-based distribution (e.g., Docker). For a solo project, there may be no packaging and deployment since the code runs on a single machine for a single user, but a well-defined and reproducible development environment can be critical to supporting the reproducibility of the research. For a lab project, it is common that all machines in the lab run an almost identical software stack (e.g., the same version of Linux and Python packages), and often codes are relatively machine independent (e.g., Python, R); so, deployment is done via PyPI for Python and CRAN for R (with their associated packaging requirements). A recent trend is to use virtual machines in the cloud so that even if physical machines have different software, the virtual machines are identical. A community project often involves multiple languages and hardware platforms, and so packaging and distribution are more complex. One such complexity is that quality assurance must include testing of packaging and installs.
User documentation (Bhatti, 2021) is the written descriptions that accompany a software package so that nondevelopers can effectively use the software, a key consideration in building reusable software.
What: User documentation covers installation, basic usage, and a detailed reference manual for advanced users. For example, a screen scraper application might specify a command line to install the tool, illustrate its usage on a page from The New York Times, and point to detailed documentation on options for different kinds of web pages.
How: Solo projects have modest needs here, mostly to ensure that the developer can easily recall how to use their software some months after it was written (and the methods or research papers underlying its development). In a lab project, developers may provide a 'help' option for command line tools and/or a one-page summary of usage (a 'manual page'; Linux, 2009) or a Jupyter Notebook (Ragan-Kelley et al., 2014). In community projects, there are more extensive capabilities (e.g., Read the Docs; Cotton, 2016) that contain detailed descriptions of the software features, examples, and capabilities for searching documentation.
We use the term team management (Project Management Institute, 2017) to refer to those aspects of project management that address the internals of the project.
What: Examples of team management include: agreeing on common objectives, developing a plan (tasks, people, deadlines), progress monitoring, and plan evolution.
How: Agile practices (Martin, 2003), which iteratively deliver prototypes, are widely used for managing software projects and apply to all project scopes. Little is required for a solo project beyond an individual prioritizing their activities. For a lab project, lab managers (e.g., principal investigators) may find it useful to have a spreadsheet that describes who is working on which feature and the expected completion dates. For a community project, there is often coordination across physical locations. Managing the dependencies between teams may demand the use of project management software (Nieto-Rodriguez, 2022) as well as a designated project manager or package maintainer who tracks progress of the project plan.
User engagement (Cagan, 2018) addresses interactions between the software developers and users of the software.
What: Sometimes this is included in project management or product management (delivering products customers want). We separate this aspect because the role often goes unnoticed as a lab project grows into a community project.
How: In a solo or lab project, user engagement typically involves a hallway conversation with a peer researcher. However, in a community project, communicating with users may require using GitHub issues, a designated email account, and even periodic user group meetings.
Table 1 summarizes the foregoing discussion, providing examples of software packages that can support the software engineering practices (e.g., linters and unit test frameworks). The rows are software engineering practices, and the columns are the three project scopes: solo, lab, and community. The rigor of engineering practices increases as the scope of the project progresses from solo to community.
We use this table to recommend engineering activities for an academic environment. Our expectation is that most projects are not adequately characterized by a single column, and so we expect that projects may adjust their practices by employing recommendations from more than one column. We note that the table includes a number of technical terms. Rather than defining these inline, we have included a glossary as Appendix A.
Harvard Data Science Review • Issue 5.2, Spring 2023
engineering processes to researchers from a broad set of domains. The areas covered in these courses include the UNIX shell environment, version control using git, and an introduction to Python and plotting with Python.
More advanced topics such as automating the building of software and the use of databases and SQL are available. Software Carpentry provides much less depth on more advanced software engineering practices such as unit testing and continuous integration. The limited extent of the Carpentry courses (typically a few days) means that it is harder to integrate the processes within the everyday work or research by a student (particularly more advanced practices). Only through repeated use of these practices do they become embedded in the way that we work.
We have two broad thoughts about changes in academic curriculum that are required to address the development of 3R skills. First, we believe that the focus should be on undergraduate courses. One reason is that it provides a scalable mechanism to prepare students for 21st-century careers in academia and research.
The other reason is that these undergraduate courses can also be available to graduate and postgraduate students who need 3R skills in software engineering. That is, our focus is on undergraduate courses, but the training will be done at all levels in the university.
Our second thought is that the courses for developing 3R skills need to be radically redesigned. At present, these course sequences are a lightweight version of the material taught to CS undergraduate majors. That is, courses early in the sequence focus on theory; only toward the end of the course sequence do students acquire 3R skills. We recommend that the material be restructured so that 3R skills are taught (and practiced) early on.
More advanced courses in the sequence should provide greater sophistication in areas such as programming (e.g., abstraction techniques) and data structures (e.g., complexity analysis). There are a couple of examples of a first course in such a sequence. At the University of Washington, CSE 583 "Software Development for Data Scientists" (Beck, 2018) is a one-quarter course on software engineering for non-CS graduate students that covers all of the engineering practices described above and includes a capstone project to practice these skills.
We close with more details about the syllabus for CSE 583. The intent of this course is to develop 3R skills for students who have little programming background. Key topics are: review of Python programming; version control with GitHub; the bash command line; constructing Python modules; unit tests (both what to test and how to use the unittest package); creating PyPI packages; continuous integration; and team processes.
Team processes include code reviews, technology reviews (how to choose a software dependency), and project planning. After the topics are addressed individually, students gain practice in their use by doing a class project with a team of three to four students.
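To make the unit-testing topic concrete, the following is a minimal sketch using Python's unittest package. The function normalize and its three test cases are hypothetical illustrations chosen for this example, not material taken from the course. The sketch assumes a common pattern: one test for the expected result, one for a known numerical case, and one for an error condition, since checking that bad input fails with a clear message is part of making software resilient.

```python
import unittest


def normalize(values):
    """Scale non-negative numbers so that they sum to 1 (hypothetical example function)."""
    total = sum(values)
    if total <= 0:
        # Fail with a clear message rather than dividing by zero.
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # The defining property of the function.
        self.assertAlmostEqual(sum(normalize([1, 1, 2])), 1.0)

    def test_preserves_proportions(self):
        # A small case whose answer can be worked out by hand.
        self.assertEqual(normalize([1, 3]), [0.25, 0.75])

    def test_rejects_empty_input(self):
        # Error conditions deserve tests too.
        with self.assertRaises(ValueError):
            normalize([])


# Run the tests programmatically so the outcome can be inspected afterward.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a project repository the test class would more typically live in its own file and be run from the command line (for example, with python -m unittest), which is also what a continuous integration service would invoke after each commit.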

The Future of Software Engineering for Academic Researchers
One major direction we are pursuing at the University of Washington is to develop a community of practice for looking for a 'second act.' A critical aspect to the success of such a program is the retention of good talent. Carver et al.'s (2022) survey found an overwhelming concern about the lack of career paths for software professionals in academia. This will require careful thought about the career paths for software engineers within the academic environment, an environment that puts a premium on published articles, not software projects. Retaining skilled software engineers will require providing appealing career paths in academic institutions.
We have a few insights as to how to attract experienced software engineers. We have learned much from hiring software engineers for the recently created Scientific Software Engineering Center at eScience. The goal of the center is to apply industry-grade software engineering practices to the development of research software for science. Hence, we mostly targeted industry for sourcing software engineering talent.
We have several observations based on our experience over the last 6 months of hiring. First, it is easier to recruit senior software engineers who have spent a decade or more in industry. They are attracted to the mission, engineering autonomy and scope, and potential to have impact on scientific breakthroughs after spending years on commercial projects that are mainly focused on profit through extremely specific engineering optimizations, often as very small cogs in large engineering-product teams. Second, it is extremely difficult to reach parity with private industry in terms of compensation, which makes it hard to attract junior to mid-level software engineers with industry experience as they are less likely to depart from lucrative careers in the private sector. Third, there is a lack of formal structure for software development in academia. This is both an opportunity for engineers to extend their skills in eliciting software requirements and a challenge as it slows the pace of engineering output due to the high degree of uncertainty when projects are launched. This is sometimes a constraint during recruiting as the uncertainty could be seen as a lack of investment in supplemental roles such as customer success, product design, program/product management, and software ecosystem and servicing, roles that allow software engineers to focus on the software coding-related tasks in which they intend to continue growing their skills.
The biggest advantage of software engineering in academia is the culture of openness and the opportunity to change the trajectory of a multiyear investment in science by contributing highly sought after engineering products to a dedicated community of scientists and researchers. This community impact goes beyond the organization or region to benefit society at large. We expect this to be a primary factor in retaining software engineers in academia, and in establishing the perception of research software engineering in academia as a highly fulfilling career path.

Conclusions
Our experience at the eScience Institute is that successful data science projects create software that is readable by others, resilient to variations in usage, and reusable by embedding within other software. We refer to these considerations as the 3Rs of software engineering.
This article addresses engineering practices that create 3R software. By engineering practice, we mean much more than coding, although coding is an important element. Among the engineering practices we discuss are: version control, design, quality assurance, packaging, documentation, and project management.
There are robust industry practices for creating 3R software. However, many of these practices are skills-intensive and time-consuming. Further, although application of these practices can result in a high level of 3R capabilities, this outcome is poorly matched with the needs of most academic projects. Most academic projects are quite small; they consist of a single researcher who is the sole user of the software. A modest number of academic software projects address multiple users in the same lab. Very few academic projects are directed at a large research community. Often the transition from a single-user application to community-developed software arises organically rather than from a decision at the start of a project. These considerations led us to restructure software engineering practices into a progression of increasing rigor to better match the needs of academic projects with different scopes.
The need for 3R skills for academic software led us to examine teaching and training of software engineering.
We provide an in-depth analysis of our institution, the University of Washington, and we provide some insights into the situations at Carnegie Mellon University and the University of California at Berkeley. We conclude that undergraduates outside of CS (or related departments, such as electrical engineering) face significant challenges with acquiring 3R skills because of the limited time available in undergraduate majors to take prerequisite courses and the competition to take these courses.
We touch on another path to creating 3R software: building a 'community of practice.' This is a team of experienced research software engineers (i.e., an RSE team) who apply engineering best practices to research projects. This is not an alternative to teaching and training; rather, it complements those efforts. One example of an RSE team is LINCC Frameworks (n.d.), a joint project between the University of Washington, Carnegie Mellon University, and the LSST Corporation to develop scientific software to analyze data from the Rubin Observatory Legacy Survey of Space and Time (LSST). A broader initiative is the recently announced Virtual Institute for Scientific Software (VISS) (Boyle, 2022) that seeks to accelerate scientific discoveries through the development of 3R software for a diverse set of academic projects.
A further consideration is cultural. In academia, the criterion for success is the publication of results. In contrast, success in a software engineering culture is creating software that is widely used and has a reputation for good quality. These cultural differences can create an 'impedance mismatch' that may present challenges for an RSE team and for promoting 3R software.
If we can address these challenges, we have an opportunity to increase the readability, resilience, and reuse of research software in the United States and throughout the world. Doing so will accelerate the progress of research. It will also aid in workforce development by having more undergraduates trained in software development and by providing a community of practice to support the careers and advancement of those software developers in academia.