The scale and speed of the generative AI (artificial intelligence) revolution, while offering unprecedented opportunities to advance science, are also challenging the traditional academic research model in fundamental ways. Academic institutions and the academic research model are not set up to be nimble in the face of rapidly advancing technologies, so the task of adopting new technologies usually falls on individual researchers. Excitement about the opportunities that generative AI brings is leading researchers with widely varying levels of technical expertise and access to resources to rush to adopt the technology, which could lead to many researchers ‘reinventing the wheel’ and to research outcomes lacking in ethics, rigor, and reproducibility. This problem applies not only to generative AI but also to other upcoming and similarly disruptive technologies. We argue that the current norm of relying on individual researchers for new technology adoption is no longer adequate. It is time for academic institutions and their research organizations, such as our own (the Michigan Institute for Data Science), to develop new mechanisms that help researchers adopt new technologies, especially those that cause seismic shifts such as generative AI. We believe this is essential for helping academic researchers stay at the forefront of research and discovery while preserving the validity and trustworthiness of science.
Keywords: institutional transformation, best practices, training, academic researcher, rigor and reproducibility, institutional support
Generative AI (artificial intelligence) is a type of AI algorithm that can generate new content (such as text, images, audio, video, and other modalities) that is statistically probable given the data that the algorithm is trained on (Bommasani et al., 2021; Cao et al., 2023; Dwivedi et al., 2023; Gozalo-Brizuela & Garrido-Merchan, 2023; Vaswani et al., 2017). Compared with earlier types of AI technology, such as traditional natural language processing, generative AI is based on newer architectures, most notably transformers and diffusion models, trained on enormous volumes of (sometimes multimodal) data in their natural forms (such as raw text and images from the internet), without the need to label the training data. Generative AI thus opens up enormous possibilities to revolutionize how AI assists humans in all types of activities that involve interacting with a computer.
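To make this concrete, the following minimal sketch, which is ours rather than part of any cited work, uses the open source Hugging Face `transformers` library to have a small pretrained transformer continue a prompt; the model (GPT-2) and the sampling parameters are illustrative assumptions, not recommendations.

```python
# A minimal illustration of a generative transformer continuing a prompt.
# The model choice (gpt2) is an assumption for demonstration purposes;
# any causal language model on the Hugging Face Hub would work similarly.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model produces a statistically probable continuation of the prompt,
# learned from its (unlabeled) training corpus.
result = generator(
    "Generative AI can assist academic research by",
    max_new_tokens=40,
    do_sample=True,    # sample rather than decode greedily
    temperature=0.8,   # higher temperature yields more varied continuations
)
print(result[0]["generated_text"])
```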
The emergence of generative AI has tantalized academic researchers with its potential to vastly accelerate research, and even to enable new research, in multiple ways (Boyko et al., 2023; Dwivedi et al., 2023; Microsoft Research AI4Science and Microsoft Azure Quantum, 2023; Morris, 2023; Wang et al., 2023):
The use of domain-agnostic generative tools (such as text and image generation) to improve research productivity by assisting with routine tasks, such as drafting and editing emails and manuscripts, checking for compliance, and facilitating communication with lay audiences.
The use of domain-agnostic generative AI to enhance the research expertise of individual researchers and research teams. This includes summarizing and representing knowledge within disciplines, gathering interdisciplinary insights, and supporting communication for interdisciplinary collaboration.
The use of domain-agnostic and domain-specific generative AI to accelerate and automate the research process, such as data cleaning, formatting, and imputation; suggesting research hypotheses and selecting experimental parameters; coding, data analysis, and visualization.
The use of domain-specific generative models, such as for aerospace engineering or protein structure models, to enable new paths for research discovery.
Such possibilities are fueling researchers’ enthusiasm for incorporating generative AI in research, even though most of generative AI’s potential benefits for research remain to be tested and validated. Of the four types of generative AI use that we mention, the use of domain-specific models has been reported extensively (for examples, see Andrade & Walsh, 2023; Chenthamarakshan et al., 2023; Grisoni et al., 2021; Gu et al., 2021; Hie et al., 2023; Madani et al., 2023; Zeng et al., 2022), but successes of the first three types of generative AI use in research are only beginning to be reported (see, for example, Boiko et al., 2023; Ciucă & Ting, 2023; Jablonka et al., 2023; Lyu et al., 2023; Mahjour et al., 2023). This enthusiasm is accompanied by a lack of preparedness among researchers. In the academic research environment, many faculty members have no concrete idea how to implement generative AI in their research, or even how to work with generative AI at all, including at the basic level of using prompts to query information.
Many also do not know where to start, because new generative AI tools emerge almost daily and there is no obvious path of skill progression. A survey that we conducted in November 2023 of 60 faculty affiliates of the Michigan Institute for Data Science (MIDAS) (Table 1) gives us a glimpse of this picture. Only 12% of the respondents have the expertise to train their own generative AI models; fewer than one-third can run existing models or fine-tune models. Even though ChatGPT, which is supposed to be an easy-to-use tool, had been available for almost a year, half of all respondents were not able to use prompts with it to obtain good results. The faculty members’ biggest need is to develop skills through training and learning from peers. This closely mirrors a brief survey that we conducted in the summer of 2023 with MIDAS faculty, in which 70% of the 92 respondents indicated that they had no knowledge of generative AI or only a conceptual understanding (as opposed to hands-on practice). We believe this is representative of the academic research scene across institutions at this moment.
Table 1. Results of a November 2023 survey of 60 MIDAS faculty affiliates.

| Question | Responses |
|---|---|
| How do you want to use generative AI in your research? | Improving productivity (drafting documents, summarizing documents, etc.): 72%. Coding: 63%. Data analysis and modeling: 55%. Communication (email, presentation, etc.): 45%. Helping with data generation, processing, and documentation: 38%. |
| What is the skill level in your research group with regard to generative AI? | Can use things like ChatGPT with prompts, but not using them well yet: 47%. Can use things like ChatGPT with prompts and can get some good results: 48%. Can run existing models: 28%. Can fine-tune existing models: 22%. Can train models: 12%. |
| What support is important for you to use generative AI in your research? | Technical tutorials: 68%. Connecting with other researchers exploring generative AI to learn from each other: 60%. Brainstorming sessions to develop project/grant ideas: 51%. Finding collaborators on grants and projects: 42%. Finding students: 42%. |
The enthusiasm and the unpreparedness are naturally accompanied by researchers’ concerns about using generative AI. Some concerns are common among generative AI users in many lines of work: data privacy and confidentiality, the biases that models inherit from their training data, AI confabulation or hallucination, and the opacity of data and training algorithms to the users of generative AI models, which makes it hard to assess whether a model is appropriate for a certain type of use (Birhane et al., 2023; Liebrenz et al., 2023; Ray, 2023; Zhuo et al., 2023). In addition, there are concerns specific to using generative AI for scientific research. The rigor and reproducibility of research with generative AI in the workflow has already become a major consideration. Any research that relies on AI models that are not developed locally, and that lack transparency of data and algorithms, faces fundamental challenges throughout the research workflow, from study design and data query all the way to results validation (Li et al., 2023; Sohn, 2023; Spirling, 2023). Many researchers are already aware of such issues to various degrees. For example, stories about generative AI hallucinating citations are shared widely. But there is very little discussion yet of how well generative AI systems do in coming up with research hypotheses that are creative, testable, and of practical value. So it remains to be seen how the benefits of generative AI in research weigh against the negative consequences.
None of these concerns are new to academic researchers. There are always hopes and fears when a new technology emerges with the promise of transforming research, and people wonder how best to adopt it for research innovation while upholding research integrity. These concerns, however, are amplified in the case of generative AI because of how quickly new AI systems are developed, while our understanding of the functions and limitations of these systems remains very limited (Bengio et al., 2023; Bommasani et al., 2023). The issues are further exacerbated when researchers at all skill levels rush to adopt generative AI methods and there is no standard or process for model selection or for quality control of model use. What we will almost surely witness, then, is a flood of research outcomes and publications of uncertain quality using generative AI, which will likely distract scientists from doing good research in the short term and may even have long-term impacts. Academic researchers are quite aware of these challenges. In fact, at a generative AI faculty workshop in the summer of 2023 (described further in Section 2.2), the concerns of the attendees were reflected in the following specific topics:
A. Understanding model output, upholding research rigor and reproducibility.
How to think about research rigor and reproducibility when generative AI models lack transparency and when model output depends on the specific prompts used.
How to assess the novelty of the model’s output.
How to identify and correct bias, misinformation, and errors in training data and in model outputs.
How to think about data provenance and governance with generative AI models.
How to quantify the uncertainty of model outputs (a simple probe is sketched after this list).
B. Understanding issues of ethics, authorship, copyright, and privacy.
How to cite, acknowledge, and report generative AI in research work.
How to assess issues related to copyrighted training data, and model outputs based on copyrighted training data.
How to assess data privacy and confidentiality issues when researchers have little knowledge about the training data.
How to assess the balance between privacy / confidentiality and the need for data and model transparency.
How to handle patent issues if a research idea is first suggested by generative AI.
C. Technical and infrastructure considerations with the use of generative AI in research.
Choosing a model and comparing models for a particular research question.
Fine-tuning models locally and the local resources needed for this.
Keeping up-to-date knowledge of generative AI models.
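To illustrate the last concern under topic A, here is a deliberately simple probe of output uncertainty. It is our sketch under assumed tooling (the Hugging Face `transformers` library and the GPT-2 model are placeholders), not an established standard: sample the same prompt repeatedly and measure how often the answers agree.

```python
# Crude self-consistency probe: repeated sampling of one prompt, then the
# agreement rate of the answers. Model, prompt, and sample count are
# illustrative assumptions, not recommendations.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The capital of France is"

answers = []
for _ in range(10):
    out = generator(prompt, max_new_tokens=3, do_sample=True, temperature=1.0)
    # Keep only the first generated word as the "answer."
    completion = out[0]["generated_text"][len(prompt):].strip().split()
    answers.append(completion[0] if completion else "")

top_answer, top_count = Counter(answers).most_common(1)[0]
# Agreement of 1.0 means the model answered identically every time.
print(f"Most common answer: {top_answer!r}, agreement = {top_count / len(answers):.2f}")
```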
It is obvious to us that it is not feasible, or at least highly inefficient, to expect individual researchers to address such issues on their own, not only because most lack the expertise, time, and resources, but also because each would be reinventing the wheel. The typical researcher learns a new research method on their own or through collaborators, getting pointers to resources from whomever they happen to interact with. This somewhat random social diffusion will not suffice when researchers need to acquire skills with a new technology overnight and put it to immediate research use, and it also runs against the nationwide drive to ensure equitable access to AI technologies (National Artificial Intelligence Research Resource Task Force, 2023). It is also virtually impossible for researchers to individually assess model quality, validity, and reliability, leaving at least some guesswork in adoption and implementation choices.
We believe a new model of enabling the adoption of rapidly emerging technologies is sorely needed at this point, and we believe academic institutions and their research centers should play a critical role. Universities are already responsible for providing research infrastructure, such as computing centers and research cores for scientific instrumentation, and for supporting resource-intensive, large-scale, and high-throughput research. They should also be responsible for enabling the adoption of new technologies in research. Indeed, many universities are keenly aware of the importance of generative AI and are already developing capacity, such as computing resources. The University of Michigan, for example, has just launched UMGPT, which provides a relatively secure environment for campus use, including research use. Some institutions are also training generative AI models for academic research, such as OLMo (Open Language Model) and GatorTron (Yang et al., 2022).
However, these are not enough. We believe that the emergence of generative AI is a call for universities, as the home of new knowledge and the home of academic researchers, to play a much more active role in enabling academic researchers to develop new skills and adopt new research methods in ethical, responsible, and effective ways. This will likely have long-lasting benefits to research and discovery. Universities, however, are not set up to be nimble in ways that some businesses can be in response to new technology developments and ‘market trends.’ So what can be done?
We advocate for university-level research institutes to fill this need and help complete a solution–implementation–outcome process that will help academic researchers adopt new technologies or research standards (solutions) to achieve better research innovation and outcomes (Figure 1). While it is difficult to imagine an entire university being nimble in the face of an emerging technology, an organization within a university can be so. Universities often set up a research institute to advance a research area of importance. Indeed, there are many examples of institutes that have spearheaded research in a ‘hot’ area and risen to well-deserved prominence for their work in advancing the frontier, particularly in interdisciplinary areas.
But we believe there can be a very different role for a research institute at a university, one with an even greater impact on science: to serve as a knowledge base and facilitator for the adoption of new methods that have the potential to transform research across a range of disciplines. Such methods frequently arise in fields such as data science and AI. Generative AI is perhaps the best example, because of its applicability in almost every line of work and its fast pace of advancement. But it is surely only among the first of many technologies that could bring sweeping changes. Hence, what we advocate for, supporting the adoption of generative AI in research, will be equally relevant for future waves of new technologies. In other words, academic research institutes can play a significant role in institutional transformation by developing and disseminating tools, training researchers, and establishing best practices, all of which are essential for researchers to swiftly adopt new technologies and stay at the forefront of research and innovation.
In the next section, we describe some of the work that we have already started to develop in this new role for our institute. The work is still very preliminary, given that we and the researchers that we support are still at the initial stage of understanding myriad considerations associated with generative AI. But it provides a starting point for further discussion on the institutional effort needed for adopting new technologies in academic research.
The Michigan Institute for Data Science (MIDAS) at the University of Michigan (U-M) has been investing in institutional transformation over the past few years, with an initial focus on technical skill development and on rigor and reproducibility in data-intensive research. As U-M’s focal point for data science and AI research, MIDAS has the central goal of enabling the transformative use of data science and AI methods for both scientific and societal impact, across an enormous array of disciplines with wildly different epistemological approaches and data use practices.
Among its many threads of work in enabling research, providing training, and building research collaboration, one component is teaching new research methodology to faculty and staff researchers through a set of summer academies that introduce data science and AI research skills from the beginner level to advanced topics. These summer academies started as an experiment because we were uncertain of the needs, but in the past three years the offering has expanded from one week-long bootcamp per year to multiple week-long sessions and has trained nearly 300 faculty and staff researchers. Our experience demonstrates that faculty and staff researchers need such opportunities to systematically learn new research methodologies.
MIDAS’s effort to improve rigor and reproducibility focuses on filling another important gap (Liu et al., 2022). Many journals, funding agencies, and professional societies have developed clear guidelines, requirements, and incentives for research rigor and reproducibility. Many researchers have a reasonable understanding of the issue and know what outcomes are expected from them. But the reproducibility problem remains serious, especially for data-intensive research that has a long and complex workflow (Hardwicke et al., 2021; Laurinavichyute et al., 2022; Stodden et al., 2018). Through collaboration with the university’s research community, MIDAS has coordinated grassroots efforts and developed online resources and training to enable rigor and reproducibility in data-intensive research. The MIDAS reproducibility online resource hub has had more than 10,000 visits. MIDAS is now developing a nationwide training program for faculty and staff scientists, funded by the National Institutes of Health, on improving the rigor and reproducibility of data-intensive research.
More importantly, through this work we have come to realize that a major gap in the researchers’ efforts to improve reproducibility is that they often lack the means or the expertise to translate guidelines into outcomes. In other words, researchers need to be handed validated methods/tools and know how to use them in order to complete the solution–implementation–outcome process (Figure 1). In this case, the solution is the reproducible research guidelines; the outcome is more reproducible research; and the implementation is the phase where researchers are equipped with appropriate tools and processes.
Such previous work developed a mindset at MIDAS that allowed the team to plunge into action when generative AI ‘stormed’ the world stage. Since early 2023, MIDAS has been developing best-practice guidelines, coordinating the exploration of generative AI for research, and providing training for researchers.
Just like researchers themselves, almost all research organizations are scrambling to cope with generative AI and its quickly changing regulatory landscape. Guidelines, in addition to researchers’ discretion, are essential because the use of generative AI in research is fraught with issues at every step, from whether the training data are appropriate for a particular type of research to the validation of output. Its use to improve productivity can also be entangled with additional issues such as confidentiality and copyright. The National Institutes of Health and the National Science Foundation, for example, have already formally forbidden the use of generative AI in grant proposal review (National Institutes of Health, 2023; National Science Foundation, 2023). Many journals, such as Nature and Science, also prohibit certain uses of text and images created by generative AI (Flanagin et al., 2023; Harker, 2023). Understanding what they are or are not allowed to do is an additional challenge for researchers. We expect many more such guidelines, and we expect them to evolve quickly over time.
To provide a starting point for researchers, MIDAS compiled a set of guidelines that include the following topics:
Writing with generative AI
Can I use generative AI to write research papers?
Can I use generative AI to write grants?
Can I use generative AI to help me when I write a literature review section for my paper?
Can I use generative AI to write nontechnical summaries, create presentations, and translate my work?
Using generative AI to improve productivity
Can I use generative AI to review grant proposals or papers?
Can I use generative AI to write letters of support?
How can I use generative AI as a brainstorming partner in my research?
Using generative AI for data generation and analysis
Can I use generative AI to write code?
Can I use generative AI for data analysis and visualization?
Can I use generative AI as a substitute for human participants in surveys?
Can I use generative AI to label data?
Can I use generative AI to review data for errors and biases?
Reporting the use of generative AI
How do I cite contents created or assisted by generative AI?
How do I report the use of generative AI models in a paper?
Considerations for choosing generative AI models
How do I decide which generative AI to use in research?
Open source
Accuracy and precision
Cost
What issues unique to generative AI should I consider when adopting it in my research?
Ethical issues
Bias in data
AI hallucination
Plagiarism
Prompt engineering
Knowledge cutoff date
Model continuity
Security
We selected these topics based on our discussions with researchers in our community. We update the guide several times a month as new guidelines are published by federal agencies, other funding agencies, professional societies, and journals.
2.2. Demonstrating the Use of Generative AI in Research and Exploring Possibilities
Many academic researchers may have only tried using ChatGPT to draft an email or edit some text, but most are aware that generative AI could do much more to accelerate research and enable new research ideas. However, how this can be done is still elusive. For example, many have heard that generative AI can be used to summarize research literature. Yet successful implementations are still very few, and researchers are concerned about issues associated with such use, such as the indiscriminate inclusion of published work that is of poor quality or irreproducible, and bias against work in non-English languages. Many researchers are also aware that generative AI can help with data analysis, but it is unclear to many what skills they need in order to ensure that the analysis is correct. Domain-specific generative AI models have been used for protein structure research, drug design, materials science, and many other fields of inquiry, yet many researchers are unclear about what special skills and data are needed to train and deploy such models. Exposing researchers to successful examples, therefore, has been one of our top priorities.
MIDAS organized a faculty workshop in the summer of 2023 with 92 U-M faculty attendees. Twelve speakers demonstrated how they incorporated generative AI in research (health care research, chemistry, social science, arts and design), and discussed ethical and technical considerations as well as infrastructure challenges. The attendees participated in a few rounds of breakout discussions focusing on how generative AI can be used in research to improve productivity, significant research questions that can be boosted with generative AI, and ethical and technical challenges. The attendees came from 45 academic units at the university, with a diverse range of research areas (Table 2). Such diverse participation is a strong indicator of the widespread interest in generative AI.
Table 2. Academic units of the faculty workshop attendees, grouped by research area.

| Research area | Academic units |
|---|---|
| Arts and Design | Architecture and Urban Planning; Arts and Design |
| Biological Science | Biostatistics; Physiology |
| Engineering | Aerospace Engineering; Chemical Engineering; Civil and Environmental Engineering; Electrical Engineering and Computer Science; Industrial and Operations Engineering; Mechanical Engineering; Nuclear Engineering and Radiological Sciences; Robotics |
| Environmental and Earth Sciences | Environment and Sustainability; Climate and Space Science and Engineering |
| Medical Science | Anesthesiology; Cardiac Surgery; Computational Medicine and Bioinformatics; Internal Medicine; Kinesiology; Learning Health Sciences; Ophthalmology; Pediatrics; Pharmacy; Psychiatry; Radiation Oncology |
| Math and Physical Science | Chemistry; Mathematics; Physics; Statistics |
| Social Science and Business | Business; Communications and Media; Information; Political Science; Public Policy |
Also in summer 2023, MIDAS organized a webinar series, Generative AI Coast-to-Coast (C2C), together with Johns Hopkins University, Rice University, the International Computer Science Institute, The Ohio State University, and the University of Washington. The webinars featured eight speakers from the six institutions on “Generative AI in Healthcare,” “Generative AI in the Lab,” “Policy, Ethics and Generative AI,” and “An Under the Hood Look at Generative AI: Potentials and Pitfalls.” The goal of the series was likewise to demonstrate successful implementations of generative AI in research, build collaboration, and point out the cautions to take.
Our faculty survey indicates that training is researchers’ top priority regarding generative AI. MIDAS has thus offered a series of hands-on tutorials as a starting point for academic researchers. The focus was not on domain-specific generative AI models trained on technical data, such as protein structures; instead, it was on using generative AI, including large language models (LLMs), in domain-agnostic ways. The topics included:
Writing, planning, and literature review: enhancing professional productivity with generative AI
Code smarter, not harder: harnessing generative AI for research programming efficiency
Integrating generative AI into your research workflow: using image generation as the example
Making generative AI better for you: fine-tuning and experimentation for custom research solutions
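To give a flavor of what the fine-tuning session covers, here is a deliberately bare-bones sketch; it is our illustration rather than the tutorial’s actual material, and the model, the two-sentence ‘corpus,’ and the hyperparameters are all placeholder assumptions.

```python
# Minimal local fine-tuning of a small open model with the Hugging Face
# Trainer. A real research project would use far more data, held-out
# evaluation, and careful hyperparameter choices.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder "domain" corpus; a real project would load its own data here.
texts = [
    "In our assay, enzyme activity increased with substrate concentration.",
    "The survey instrument was validated on a pilot sample of respondents.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-demo",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the adapted model can then be saved and queried locally
```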
However, feedback from workshop attendees and researchers in our community indicated that we should have started at an even more basic level and been as hands-on as possible. As Table 1 shows, the vast majority of researchers would like to use generative AI to improve productivity, but only about half get reasonable results even when using ChatGPT with prompts. We are therefore planning new tutorial sessions that are more hands-on and focus on more basic tasks. The topics will include:
Improving general productivity with ChatGPT: non-research writing (emails, posters and presentations, checking for compliance, letters of recommendation, translation)
Finding, synthesizing, and summarizing literature with LLMs
Generating simulated data with LLMs
Data analysis and visualization with ChatGPT (text, image, and numeric data)
Drafting research articles with ChatGPT (drafting, writing, and formatting bibliographies)
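As one hedged illustration of the literature-related topics above, the sketch below condenses a passage with an off-the-shelf summarization model from the Hugging Face Hub; the model choice and input text are our assumptions, and real literature-review workflows must still guard against the quality and bias issues discussed earlier.

```python
# Summarizing a short passage with a pretrained summarization model.
# facebook/bart-large-cnn is an illustrative choice, not an endorsement.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

passage = (
    "Generative AI models trained on large corpora can draft text, write "
    "code, and assist with data analysis, but their opacity raises questions "
    "about rigor and reproducibility when they are used in academic research."
)
summary = summarizer(passage, max_length=30, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```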
Through this work over the past few months, and through discussions with our research community, it has become increasingly obvious to us that the support MIDAS provides to academic researchers for generative AI is, in essence, filling the same ‘implementation’ gap that we identified in the solution–implementation–outcome model for research reproducibility (Table 3). In both cases, researchers know that there is a solution that can help them do better research: incorporating generative AI in research, or following guidelines for research reproducibility. They also know what the ideal outcome should be: accelerated research and innovation, or more rigorous and reproducible research. In both cases, however, the gap is that researchers are left to their own devices to implement the solution and achieve the outcome. In both cases, effort from MIDAS can fill the implementation gap with significant impact.
Table 3. The implementation gap in adopting generative AI for research and in improving research reproducibility.

| | Adopting generative AI for research | Improving research reproducibility |
|---|---|---|
| Solution | Generative AI as a powerful tool to accelerate the research process and enable previously infeasible research. | Guidelines for improving research reproducibility. |
| Implementation gap | Most researchers do not have the skills and resources to swiftly and responsibly adopt generative AI in research. | Most researchers do not have methods/tools at their disposal, nor the skills and resources to develop their own tools to improve research reproducibility. |
| Role of academic research institutes | Identify researchers’ needs; develop guidelines, standard processes, and tools for adopting generative AI in research; and enable the wide adoption of such processes and tools. | Identify researchers’ needs; develop standard processes and tools to improve research reproducibility; and enable the wide adoption of such processes and tools. |
| Outcome | Faster and more innovative research. | Improved research reproducibility. |
However, we do not believe simply providing guidelines, training, and research incubation is enough. Providing expertise to address common considerations might also be essential: helping researchers assess and choose generative AI models for specific research applications; helping them assess model transparency and bias, and validate outputs; and, frankly, helping them stay up to date. As wave after wave of generative AI methods emerges, it is simply not feasible to expect academic researchers to always know the newest developments and to upskill in a timely manner, even with all the support their institutions can provide. For example, just a year ago when ChatGPT first came out, many were dismissive because of its limited input and output formats, the generic types of information it could handle, privacy concerns, and its knowledge cutoff date. Yet within merely a year, with improvements to ChatGPT itself and numerous plug-ins and other additions, ChatGPT can handle many types of files and data, perform many types of data analysis and visualization, incorporate specialized knowledge, and search the internet (no knowledge cutoff dates anymore) for updated information. Generative AI models can now be downloaded and run locally to get around privacy issues. Multiple generative AI–powered tools are now available for summarizing scientific literature. The number of models available to researchers is also growing exponentially: Hugging Face, for example, now hosts more than 110,000 transformer-based AI models and more than 20,000 diffusion models. Similarly, many model evaluation and validation tools and processes (external or built-in) are now publicly available. In other words, generative AI models and related tools are advancing much faster than individual researchers can keep up with. The next critical step for institutes like MIDAS may be to develop processes, and to provide effort, that help researchers choose models, assess and validate outputs, and curate models. In Section 1.2, we described many considerations and concerns that faculty researchers have expressed about using generative AI. Most of these can be addressed much more effectively with institutional effort, and this is a direction in which we are starting to invest.
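As a small, hedged illustration of what such curation support could build on, the sketch below queries the Hugging Face Hub programmatically; it assumes a recent version of the `huggingface_hub` library, and the search term is a placeholder.

```python
# Programmatic model discovery on the Hugging Face Hub: search for models
# matching a keyword and sort by download count. Parameter names follow
# recent versions of huggingface_hub and may differ in older releases.
from huggingface_hub import HfApi

api = HfApi()
for m in api.list_models(search="biomedical", sort="downloads",
                         direction=-1, limit=5):
    # modelId is the repository name; downloads may be absent in some versions
    print(m.modelId, getattr(m, "downloads", "n/a"))
```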
The academic research model has been evolving, in part because of new demands as the scale of research grows and its nature becomes more complex and interdisciplinary. New AI technologies that use ever more massive and complex data, and the rapidly evolving requirements for technical expertise, demonstrate this point. Two features of generative AI, distinct from other technologies that have disrupted academic research in the past, seem to accelerate this process. First, generative AI can be used in almost every research field in some way, even though the specific usage in many cases is still being worked out. Second, generative AI is being adopted at an unprecedented speed, with new models announced almost daily; it is a technology that advances rapidly and is instantly available to most researchers. We believe that academic researchers should no longer be left to rely on themselves to adopt such new technologies, because on their own they will not be able to fully leverage these technologies’ potential and stay at the forefront of research and discovery. Instead, their institutions should play a much more active role beyond providing an enabling infrastructure. Academic institutions evolve at a much slower pace than technologies: academic appointments, assessments, and support structures are difficult to change, let alone rapidly. However, research institutes within universities can be much more nimble than the institution as a whole, and they can have a significant impact on institutional transformation if they help researchers adopt new technologies swiftly and responsibly, ‘at scale and at speed.’
With regard to generative AI, helping researchers explore its use, building technical skills, developing best practices, and developing processes still constitute only the beginning stage. We anticipate that, very soon, researchers will encounter the next set of challenges, which will resemble the challenges that we have identified in achieving research reproducibility (Liu et al., 2022). A few examples include:
How do researchers choose among domain-agnostic and domain-specific generative AI models? This concerns the utility of a particular model trained either on ‘generic’ data or on data pertaining only to certain research fields: Is a model good only for a narrow range of research questions, and if so, how narrow? Are most researchers able to assess this individually?
How high is the barrier for individual academic researchers to use generative AI models in specific research scenarios? The barrier may well be higher than many expect. On the surface, publicly available generative AI models are accessible to anyone in the research community. But using such models rigorously for research requires not only proper prompt engineering but also, more importantly, the technical skills to assess whether a model is trained appropriately for the specific research purpose, to fine-tune the model locally, and to assess and validate the model output. The barrier is even higher for models trained on specialized data (such as protein structures).
How extensively is any given generative AI model validated? How should models be compared and benchmarked for addressing a set of research questions? At this point, this is largely unexplored, and it is unlikely to be feasible for most researchers to take on this task on their own (a toy version of such a comparison is sketched after this list).
How do we avoid black boxes while balancing transparency with the protection of privacy? Issues of transparency are almost unavoidable when researchers use models that are not developed locally and whose data, training algorithms, and parameters are not accessible. This poses a significant new challenge for assessing the rigor, reproducibility, interpretability, and ethics of research studies. How much transparency is acceptable, however, also depends on many other factors, and it will be difficult for individual researchers to decide.
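The toy comparison below makes the benchmarking question concrete; it is our sketch only, with placeholder models, a two-item hand-made test set, and a naive exact-match metric, none of which would suffice for a real benchmark.

```python
# Comparing two candidate models on a tiny evaluation set with a naive
# exact-match check. Models, prompts, answers, and metric are assumptions
# for illustration; real benchmarks need validated test sets and
# task-appropriate metrics.
from transformers import pipeline

eval_set = [
    {"prompt": "2 + 2 =", "answer": "4"},
    {"prompt": "The chemical symbol for gold is", "answer": "Au"},
]

for model_name in ["gpt2", "distilgpt2"]:
    generator = pipeline("text-generation", model=model_name)
    correct = 0
    for item in eval_set:
        out = generator(item["prompt"], max_new_tokens=3, do_sample=False)
        completion = out[0]["generated_text"][len(item["prompt"]):]
        correct += item["answer"] in completion
    print(f"{model_name}: {correct}/{len(eval_set)} exact match")
```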
We believe such challenges can only be met through much more intensive institutional effort. Few researchers will be able to develop, all by themselves, the processes and tools needed to translate guidelines and technical solutions into research outcomes; they are limited by resources, competing priorities (publishing and securing grants), and technical skills. This is true not only for adopting generative AI in research but also for new challenges that will arise, including the next wave of AI systems beyond generative AI. As such, the role of existing or future research institutes in supporting such transformation will need to be strengthened. Such efforts differ from the traditional functions of both research cores (such as computing centers and DNA sequencing centers) and interdisciplinary ‘mission’ centers/institutes: traditional cores provide instrumentation, and technical support for that instrumentation, as the hardware for new technologies, while traditional research centers/institutes deliver research advances in a specific field or toward a specific objective rather than providing broad-based support. What is needed for broad-based institutional transformation is the development of guidelines, tools, and training that actively bridge the implementation gap. Another example of such transformational effort is software engineering for academic research, which strives to build general-use research software to industry standards (Connolly et al., 2023).
In conclusion, we advocate for research institutes within academic institutions to play a leading role in institutional transformation in the adoption of rapidly advancing and game-changing technologies. This role is becoming critical as technologies such as generative AI are challenging the traditional academic research model and the role of academia in research discovery and innovation. Concerted effort from academic institutions will be indispensable to support the swift and responsible adoption of new technologies. Specifically, for generative AI:
University research institutes can coordinate the effort to create, validate, and benchmark generative AI tools for various types of research and develop protocols for using them.
Through training and disseminating tools and protocols, these organizations can help researchers adopt generative AI efficiently and rigorously.
These organizations can collaborate with model developers and methodology experts to develop generative AI for various research contexts in ways that ensure the rigor and reproducibility of the resulting research.
In the longer term, these organizations should play a major role to bridge new solutions for research innovation and the desired outcomes.
Jing Liu and H. V. Jagadish have no financial or non-financial disclosures to share for this article.
Andrade, S. R., & Walsh, H. S. (2023, June 12–16). SafeAeroBERT: Towards a safety-informed aerospace-specific language model [Video presentation]. AIAA AVIATION 2023 Forum, San Diego, CA, and online. Online publication, Article 3437. https://arc.aiaa.org/doi/10.2514/6.2023-3437
Bengio, Y. et al. (2023, March 22). Pause giant AI experiments: An open letter. Future of Life Institute. https://futureoflife.org/open-letter/pause-giant-ai-experiments/
Birhane, A., Kasirzadeh, A., Leslie, D., & Wachter, S. (2023). Science in the age of large language models. Nature Reviews Physics, 5, 277–280. https://doi.org/10.1038/s42254-023-00581-4
Boiko, D. A., MacKnight, R., & Gomes, G. (2023). Emergent autonomous scientific research capabilities of large language models. ArXiv. https://doi.org/10.48550/arXiv.2304.05332
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N. S., Chen, A. S., Creel, K. A., Davis, J., Demszky, D., ... Liang, P. (2021). On the opportunities and risks of foundation models. ArXiv. https://doi.org/10.48550/arXiv.2108.07258
Bommasani, R., Klyman, K., Longpre, S., Kapoor, S., Maslej, N., Xiong, B., Zhang, D., & Liang, P. (2023). The foundation model transparency index. ArXiv. https://doi.org/10.48550/arXiv.2310.12941
Boyko, J., Cohen, J., Fox, N., Veiga, M. H., Li, J. I., Liu, J., Modenesi, B., Rauch, A. H., Reid, K. N., Tribedi, S., Visheratina, A., & Xie, X. (2023). An interdisciplinary outlook on large language models for scientific research. ArXiv. https://doi.org/10.48550/arXiv.2303.04226
Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P. S., & Sun, L. (2023). A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. ArXiv. https://doi.org/10.48550/arXiv.2303.04226
Chenthamarakshan, V., Hoffman, S. C., Owen, C. D., Lukacik, P., Strain-Damerell, C., Fearon, D., Malla, T. R., Tumber, A., Schofield, C. J., Duyvesteyn, H. M., Dejnirattisai, W., Carrique, L., Walter, T. S., Screaton, G. R., Matviiuk, T., Mojsilovic, A., Crain, J., Walsh, M. A., Stuart, D. I., & Das, P. (2023). Accelerating drug target inhibitor discovery with a deep generative foundation model. Science Advances, 9(25), Article eadg7865. https://doi.org/10.1126/sciadv.adg7865
Ciucă, I., & Ting, Y. S. (2023). Galactic ChitChat: Using large language models to converse with astronomy literature. ArXiv. https://doi.org/10.48550/arXiv.2304.05406
Connolly, A., Hellerstein, J., Alterman, N., Beck, D., Fatland, R., Lazowska, E., Mandava, V., & Stone, S. (2023). Software engineering practices in academia: Promoting the 3Rs—readability, resilience, and reuse. Harvard Data Science Review, 5(2). https://doi.org/10.1162/99608f92.018bf012
Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M. K., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L. D., Buhalis, D., ... Wright, R. D. (2023). “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, Article 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642
Flanagin, A., Bibbins-Domingo, K., Berkwits, M., & Christiansen, S. L. (2023). Nonhuman “authors” and implications for the integrity of scientific publication and medical knowledge. JAMA, 329(8), 637–639. https://doi.org/10.1001/jama.2023.1344
Gozalo-Brizuela, R., & Garrido-Merchan, E. C. (2023). ChatGPT is not all you need. A state of the art review of large generative AI models. ArXiv. https://doi.org/10.48550/arXiv.2301.04655
Grisoni, F., Huisman, B. J., Button, A. L., Moret, M., Atz, K., Merk, D., & Schneider, G. (2021). Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Science Advances, 7(24), Article eabg3338. https://doi.org/10.1126/sciadv.abg3338
Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1), 1–23. https://doi.org/10.1145/3458754
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. E., Long, B., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8(1), Article 201494. https://doi.org/10.1098/rsos.201494
Harker, J. (2023, March). Science journals set new authorship guidelines for AI-generated text. Environmental Factor. https://factor.niehs.nih.gov/2023/3/feature/2-artificial-intelligence-ethics
Hie, B. L., Shanker, V. R., Xu, D., Bruun, T. U., Weidenbacher, P. A., Tang, S., & Kim, P. S. (2023). Efficient evolution of human antibodies from general protein language models. Nature Biotechnology, 42(2), 275–283. https://doi.org/10.1038/s41587-023-01763-2
Jablonka, K. M., Ai, Q., Al-Feghali, A., Badhwar, S., Bocarsly, J. D., Bran, A. M., Bringuier, S., Brinson, L. C., Choudhary, K., Circi, D., Cox, S., de Jong, W., Evans, M., Gastellu, N., Genzling, J., Gil, M. V., Gupta, A., Hong, Z., Imran, A., ... Blaiszik, B. J. (2023). 14 examples of how LLMs can transform materials science and chemistry: A reflection on a large language model hackathon. Digital Discovery, 2, 1233–1250. https://doi.org/10.1039/d3dd00113j
Laurinavichyute, A., Yadav, H., & Vasishth, S. (2022). Share the code, not just the data: A case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. Journal of Memory and Language, 125, Article 104332. https://doi.org/10.1016/j.jml.2022.104332
Li, B., Qi, P., Liu, B., Di, S., Liu, J., Pei, J., Yi, J., & Zhou, B. (2023). Trustworthy AI: From principles to practices. ACM Computing Surveys, 55(9), Article 177. https://doi.org/10.1145/3555803
Liebrenz, M., Schleifer, R., Buadze, A., Bhugra, D., & Smith, A. (2023). Generating scholarly content with ChatGPT: Ethical challenges for medical publishing. The Lancet Digital Health, 5(3), e105–e106. https://doi.org/10.1016/s2589-7500(23)00019-5
Liu, J., Carlson, J., Pasek, J., Puchala, B., Rao, A., & Jagadish, H. V. (2022). Promoting and enabling reproducible data science through a reproducibility challenge. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.9624ea51
Lyu, Q., Tan, J., Zapadka, M. E., Ponnatapuram, J., Niu, C., Wang, G., & Whitlow, C. T. (2023). Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: Promising results, limitations, and potential. ArXiv. https://doi.org/10.48550/arXiv.2303.09038
Madani, A., Krause, B., Greene, E. R., Subramanian, S., Mohr, B. P., Holton, J. M., Olmos, J. L., Xiong, C., Sun, Z. Z., Socher, R., Fraser, J. S., & Naik, N. (2023). Large language models generate functional protein sequences across diverse families. Nature Biotechnology, 41(8), 1099–1106. https://doi.org/10.1038/s41587-022-01618-2
Mahjour, B., Hoffstadt, J., & Cernak, T. (2023). Designing chemical reaction arrays using phactor and ChatGPT. Organic Process Research & Development, 27(8), 1510–1516. https://doi.org/10.1021/acs.oprd.3c00186
Microsoft Research AI4Science & Microsoft Azure Quantum. (2023). The impact of large language models on scientific discovery: A preliminary study using GPT-4. ArXiv. https://doi.org/10.48550/arXiv.2311.07361
Morris, M. R. (2023). Scientists' perspectives on the potential for generative AI in their fields. ArXiv. https://doi.org/10.48550/arXiv.2304.01420
National Artificial Intelligence Research Resource Task Force. (2023). Strengthening and democratizing the U.S. Artificial Intelligence innovation ecosystem: An implementation plan for a National Artificial Intelligence Research Resource. https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf
National Institutes of Health. (2023). The use of generative Artificial Intelligence technologies is prohibited for the NIH peer review process. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-23-149.html
National Science Foundation. (2023). Notice to research community: Use of generative artificial intelligence technology in the NSF merit review process. https://new.nsf.gov/news/notice-to-the-research-community-on-ai
Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121–154. https://doi.org/10.1016/j.iotcps.2023.04.003
Sohn, E. (2023). The reproducibility issues that haunt health-care AI. Nature, 613(7943), 402–403. https://doi.org/10.1038/d41586-023-00023-2
Spirling, A. (2023). Why open-source generative AI models are an ethical way forward for science. Nature, 616(7957), 413. https://doi.org/10.1038/d41586-023-01295-4
Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008. https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Wang, H., Fu, T., Du, Y., Gao, W., Huang, K., Liu, Z., Chandak, P., Liu, S., Van Katwyk, P., Deac, A., Anandkumar, A., Bergen, K. J., Gomes, C. P., Ho, S., Kohli, P., Lasenby, J., Leskovec, J., Liu, T., Manrai, A. K., ... Zitnik, M. (2023). Scientific discovery in the age of artificial intelligence. Nature, 620(7972), 47–60. https://doi.org/10.1038/s41586-023-06221-2
Yang, X., Chen, A., PourNejatian, N., Shin, H. C., Smith, K. E., Parisien, C., Compas, C. B., Martin, C., Flores, M. G., Zhang, Y., Magoc, T., Harle, C. A., Lipori, G. P., Mitchell, D. A., Hogan, W. R., Shenkman, E. A., Bian, J., & Wu, Y. (2022). GatorTron: A large clinical language model to unlock patient information from unstructured electronic health records. ArXiv. https://doi.org/10.48550/arXiv.2203.03540
Zeng, X., Wang, F., Luo, Y., Kang, S. G., Tang, J., Lightstone, F. C., Fang, E. F., Cornell, W., Nussinov, R., & Cheng, F. (2022). Deep generative molecular design reshapes drug discovery. Cell Reports Medicine, 3(12), Article 100794. https://doi.org/10.1016/j.xcrm.2022.100794
Zhuo, T. Y., Huang, Y., Chen, C., & Xing, Z. (2023). Red teaming ChatGPT via Jailbreaking: Bias, robustness, reliability and toxicity. ArXiv. https://doi.org/10.48550/arXiv.2301.12867
©2024 Jing Liu and H. V. Jagadish. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.