Many data science students and practitioners are reluctant to adopt good coding practices as long as the code ‘works.’ However, code standards are an important part of modern data science practice, and they play an essential role in the development of data acumen. Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We believe that principled coding is vital for quality data science practice. To effectively instill these practices within academic programs, instructors and programs need to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students. We describe key aspects of good coding practices for data science, illustrating with examples in R and in Python, though similar standards are applicable to other software environments. Practical coding guidelines are organized into a top ten list.
Keywords: data acumen, data science, data science practice, data science education, code quality, code style
08/30/2023: To preview this content, click below for the Just Accepted version of the article. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR’s style and formatting requirements.
©2023 Randall Pruim, Maria-Cristiana Gîrjău, and Nicholas J. Horton. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.