Philanthropic foundations have anecdotally played an important role in supporting university research. In one of the few detailed empirical studies of the level of funding, Murray estimates that philanthropic foundations (gifts and grants) provided over 30% of university research funding in leading universities (Murray, 2013). Beyond the levels of funding, the philanthropic foundations also are likely to be more innovative and take more risks than do federal science agencies. There is evidence that the peer review process adopted by federal agencies is conservative (Carson et al., 2023)—as Nicholson and Ioannidis (2012, p. 34) point out, the mantra is “conform and be funded.” There is also evidence that philanthropic foundations successfully pursue different funding approaches, funding people not projects (Azoulay et al., 2011).
Four philanthropic foundations certainly played a critical role in supporting the initial work of many of the articles in this special issue (Emecz et al., 2024; Hausen & Azarbonyad, 2024; Lane et al., 2024; Pallotta et al., 2024; Potok, 2024; Zdawczyk et al., 2024), which report on the vision and operational implementation of a successful platform—the Democratizing Data platform—which has subsequently been piloted by federal statistical agencies in support of the Foundations for Evidence-Based Policymaking Act (2019). Without their initial support, none of the work could have happened.
The editors thought that many readers of this special issue would be interested in how the four foundations—Schmidt Sciences, the Alfred P. Sloan Foundation, the Patrick J. McGovern Foundation, and the Overdeck Family Foundation—support innovative ideas. We interviewed the four foundation officers who managed the Democratizing Data project and asked them to answer three questions: What does their foundation do? Why was the project of interest? and What is their vision of the future of data as a public good and for evidence building? The following is a summary of four separate conversations on those three topics, organized by the date of initial support of the Democratizing Data project.
Our official slogan is solving the hardest problems and enabling the best people to solve the hardest problems. We try to be early funders and intentionally risk bearing, so not everything has to work. We look to fund the areas where things are not working right and we can help fix it. It is essentially applying the fortunes of Eric and Wendy Schmidt for the good of society.
One area in which things were not working right was access to data. It wasn’t accessible because it was behind the equivalent of a barbed wire fence, and then the fence got twice as high with COVID-19. In general, we support open data and expect all of our grantees to make their data available as promptly as possible. Field norms occasionally get in the way, where embargoes on data are still traditional. We do our best to avoid that.
We were also interested in how to get data broadly trusted and available, both nationally and internationally.
We don’t promise forever funding. We have a very straightforward model. We want to get the best people to work on the hardest problems for general cultural and societal value. Some of the things we do are pure science and pure knowledge. Others are application. We are happy with both. We do, however, primarily fund natural sciences.
The Alfred P. Sloan Foundation is a private philanthropic foundation that has been around for about 90 years. It supports research in the natural and social sciences with a broad vision that science and technology are drivers of American prosperity. Sloan is in the business of supporting the development of public goods, particularly the advancement of knowledge, not a monetary return.
This is particularly true for investments in data. Most research produces data in some form. But one of the things that is pretty deep in Sloan’s DNA is an attention to scaling to the broader community. We are proud of our investment in the Sloan Digital Sky Survey, which, in addition to its massive impact on astronomy, has also demonstrated a different way of provisioning data as a resource for a research community (Szalay et al., 1999; Taghizadeh-Popp et al., 2020). The power of pooling resources is critical. You can build something up that, beyond advancing science and making careers, has just tremendous spillover benefit.
The Overdeck Family Foundation seeks to provide all children with the opportunity to unlock their potential. Our grant making focuses on unlocking innovation, evidence, and growth opportunities to scale cost-effective programs accelerating key academic and socio-emotional outcomes. We fund efforts both inside and outside of school in the areas of early childhood, STEM education, and K-9 programs that include supporting educators and student-centered learning environments. We operate with a strong belief in the importance of evidence and learning. By investing in enhancing the quality and utilization of research and data in education, we hope to accelerate adoption and funding for evidence-based programs and practices.
The McGovern Foundation conceptualized our mission 5 years ago to recenter human-centered, morally grounded leadership for data and AI (artificial intelligence) in civil society. We observed that technical development that was happening was tightly held by private sector organizations, with a corollary lack of access for governments, for civil society, and for those who are focusing on social good. The foundation is geared to transform the traditional model of philanthropy, moving from the exclusive provision of grants to building in-house technical capacity. This unlocks three concentric opportunities: first, to partner very closely with civil society and government to build new structures, systems, and practices around use of data and predictive AI and now generative AI. The second is to build an in-house team of experts who partner with those organizations to build organizational transformation, aggregating best practices around data use, and enabling organizations to use those data and their derivative insights to deploy new program interventions at scale. And the third is to establish an in-house team of technologists who build consumer-facing applications enabled by data and AI, with a focus on purpose over profit. These applications could range from uses of generative AI for supporting small journalistic newsrooms to population-level health products using decades of frontline health access data to identify clusters of behavioral practices leading to worse health outcomes.
The foundation also supports government in policymaking and advocacy for private sector organizations to use better practices, supporting boards and CEOs in understanding where the forefront of these technologies is and how they should be focusing on deploying them.
In broad terms, the foundation supports long-term thinking about new institutions necessary to ensure that advances in data science and artificial intelligence are used for the betterment of humanity.
We are intensely interested in the value of the work we fund, not simply what we get done. The value can be pure science, but there can also be other outcomes. We look for chances of funding raw value, either addressing some mushy problems early with really great people who can clarify and address the problem, or by creating a series of software products or usable tools that others can be encouraged to use.
As I noted earlier, we were also quite prepared to take risks and jump in early. We were the first funder for this project, and our goal here was to get something launched where it was not clear exactly what we would be getting. Both my background and Julia’s support the observation that many platforms are made, but few are in use. The goal here was to make something really useful and neat. Governmental data are attractive in that sense, because they’re often much more easily accessible to researchers and policymakers than private sector data.
The democratizing data aspect was very attractive because it was creating a platform that would provide a narrow but excellent beachhead on this problem of how do you know what’s going on with data use. If you know how a few other people have been using data, maybe that’s a clever idea for how you might use it.
As I noted earlier, we look for areas where we can get something going and help fund it while it’s getting up and moving. So we’re very pleased that the work that we started is now a seed for much of the evidence-based data opportunities with much broader use initially by federal agencies who have a genuine reason to want to have information about data use. Ideally there would be a closing of the loop which gave information to agencies on how to improve data provision if they were provided evidence on what people wanted, as demonstrated by their real revealed preference, that is, what they do with it.
Sloan sees many proposals from people who want to build infrastructure in one form or another. It could be a new instrument. It could be a large data set, a longitudinal data project, an open source code base, a trained model, any number of things. When we look at ambitious projects like these, there are a few things we look for, and one of the first is whether there is a clear initial use case. We talk a lot about Field of Dreams thinking (from the 1989 movie). Does the project assume that “If you build it, they will come?” Or is the audience there already? We examine whether the project is feasible. Does the technology exist? Does needed infrastructure exist, or could existing pieces be brought to bear, not just from a technical perspective but also from a data perspective? Does the data exist? And if there’s a gap between the world we want to have and what currently exists, what are those obstacles, can they be surmounted, and does a community exist? Is there demand for it?
We also look for early engagement with the user community. One of the things that’s been really wonderful about the Democratizing Data project is the way that the community has been engaged through the early Kaggle competition and the workshops and convenings along the way. It has not just been an investment in building a resource and assuming researchers will use it. Bringing early users along as part of the conversation ensures that the resources are well fit to the needs of the community as a whole, while making sure they’re not overfit to the needs of any one particular user. It’s a delicate balance, and it’s one of the things we very much look for in this kind of project.
In sum, we are particularly enthusiastic about successful innovations around infrastructure. What that means is that you start to see institutionalization, actual commitment by agencies to get involved.
There were four attractive features. One was the crowd-sourcing aspect. The Kaggle competition represented an effective enlisting of the broader community to develop dashboard solutions and engagement through hosting community events. Another was the pilot nature of the project. That meant testing the potential of the work with a single agency to speed buy-in and then replicating with other agencies. A third was the data visualization aspect, because the compelling design of the dashboard improves engagement and salience of effort. Finally, we found the partnership approach attractive, particularly the relationships with federal agencies and the engagement with publishers.
We recognize the immense opportunity if the federal government steps in and becomes a leader in data management. Our recent experiences tell us that innovation in the governmental use of data to directly inform policymaking really has come from a small set of leaders who often have not had the resources to deploy proofs of concept and then build products and platforms at scale.
The philanthropic opportunity is first to support those imaginative and visionary leaders to be able to enumerate a vision of transformative application of these tools and methods. The second is to support them and their organizations to partner with government to deploy proofs of concept, and the third, to provide educational resources and advocacy materials that then cement those opportunities inside of federal policymaking and resource allocation.
Philanthropies are particularly well equipped to sustain this work because we can do it without a self-interested economic agenda. We can take risks that allow us to really examine what’s possible rather than what’s likely to drive the baseline outcome. And because we can convene representatives from civil society, the private sector, and government, and also collaborate with other funders, we can support a long-term overall transformation.
This project was both pragmatic and visionary. It looked for ways to address what felt like the core and foundational problem, which is the identification and aggregation of already available data sets into environments where they could advise and support decision-making. And that itself felt like an unsolved problem. The ability to support agencies in understanding the data that they had access to was an amazing first step. Going from there to creating cross-institutional collaboration and decision-making becomes the next frontier.
We have been pleased with the early success in integrating data not just across federal decision-making, but in integrating state and local communities as well, and recognizing that many of the problems that we are trying to address require participation from a variety of stakeholders.
In general, we are very interested in maximizing the value of scientific work. And this means not just making the results available but also making them available to people who would really care about them and who care about data. Frequently it’s the students who created it. Much of the hard work gets done by grad students and postdocs and the occasional undergrad who’s actually in the lab, who’s actually beating data into plowshares. As a result, we support a number of fellowship and postdoc programs, some of them quite large. One of them focuses on improving interdisciplinary, cross-disciplinary work by encouraging postdocs to learn a whole new field. We have another very large program to help natural scientists take advantage of AI in their regular research.
Why do we focus on the postdoc level? Mostly because they move. They do a lot of the hard work. Also they have a few years to prove themselves, and proving yourself sometimes means doing something exceptional. We give them funding that allows them to take a risk because they can always get another postdoc. And, if something they’re doing that’s risky doesn’t quite work out, then after 3 years, they’re almost certain to go somewhere else.
So we’re really interested in the way people who carry new thoughts bring them elsewhere, where they get hybridized with a new set of people they work with. They’re likely to take the data they have produced to do new work with it. So there is, we hope, a virtuous circle of both the people and the information interacting.
It’s particularly useful, in other words, to know what happens to the data. Knowing what people did is easy if they stay in the academic research world, because they will continue to publish visibly under their name. We can go and see what their record is. But you don’t get very good reports on the use of the data. So I’m very much in favor of the large direction of figuring out which data are being used and how and therefore, in some sense, which data have been the most valued in later years. There is also a secondary value: the people who paid for the research and were paying for the data, such as agencies and other philanthropies, can get information on whether their investments are being used and how, which provides a justification for doing more of it and continuing.
In reality, what we’re going to be seeing, I think, is a growing need to be very international in some areas. Some data matter at a human and societal level, not only a cultural, intellectual, and medical data level. If you’re making medical policy or making diagnostic decisions, it would be good if the data were good instead of bad.
The whole issue of provenance, the whole issue of review, knowing where data came from, knowing who it came from, knowing who vetted it and so on are the standard questions, but the answers have hitherto not been available.
So we think this data search and discovery is a brand-new, important topic. If I look at the world of climate change, my hope is that in not this decade, but next decade, there will be some very serious major mitigation and adaptation efforts, not just talking about it and starting around it. If you’re going to be making major policy decisions with literally trillions of dollars of equivalent funding around the world, it’s going to have to be based on real facts and real data. People aren’t going to trust data from one nation if it’s going to affect their economies. So we’re going to have to know where the data came from, be able to match multiple international data streams, be able to curate and manage the provenance. But then also, you would like to know which data went into which decisions, whether it’s an industrial investment or a national program. This is going to be critical.
In other words, all of the topics that have been discussed so far have to move up one notch onto the global, not just the national scene, but first, get it fixed locally, in the state and federal realm here. That’s really valuable. And when I look at the sorts of data and projects we’re likely to fund, we’re funding a bunch of work in astrophysics, for example. We’re interested in the quickest possible provision of data, not embargoing behind some of the traditional one- or two-year delays that were common in the field last century. An example is the ocean research that the Schmidt Ocean Institute funds, where we provide a platform otherwise known as a 360-foot boat for use by researchers. We insist that some of the data are made publicly available within hours or days. We strongly encourage publication of all the relevant data thereafter. And if you don’t agree with that, you don’t have to use our ship. Similarly, you don’t have to use our funding if you don’t want to make your data reasonably available.
There is a lot of use of phrases like fair and open, and everybody is in favor of the usual way of other people contributing. But how do you avoid the natural free-rider problem? Who’s going to do the hard work of making information available? And now we get back into incentives—the carrots and sticks. Funders have sticks and academic colleagues have carrots. This is all really important. So I’m really happy to see how there’s a rapidly growing beachhead across federal agencies through evidence-based research.
The challenge is that as a national policy it’s much easier to talk about it than to do it. If you don’t have the machinery, you don’t have the tools. And if you don’t have the people who have the role of making it happen, it’s not going to happen because it’s always more fun to do new stuff than to make your old stuff more accessible. It’s more fun to do something over again than to get it out of a literature.
We hope that the end result is not just data, but actually useful information, and then finding out that it’s been used and how it’s being used.
Something that I think we’re starting to see that excites me is the end of the single web interface. People are instead building API (application programming interface)-level access, which means that we’re going to see new things built on top of existing infrastructures. We’re very lucky at Sloan to have a board that really values and understands our role in not just funding the shiny thing that’s most visible, but understanding the layers of infrastructure that that sits on. That means that we can have a really substantive conversation about the ways that our investments in different areas contribute to the foundational things that are sunk into the bedrock that enable the skyscrapers to be built.
One of the fantastic examples is that 9 out of 10 of the papers written using Sloan Digital Sky Survey data are authored, not by members of the collaboration, but by people who gained access to it as open data. They can then find their own insights and research or join it with other data sets. And because of that openness and flexibility, as well as the technology infrastructure for that, we were able to support its generalization so that it’s not just that the data are used elsewhere in astronomy, it’s that the technology infrastructure can be used in other contexts where people are dealing with data at a certain scale.
We think a lot about socio-technical infrastructure and understand that both technology infrastructure (things like Jupyter), as well as the practices and the ability of a research community to use that technology, lead to new insights from data.
We look for a few critical components:
Value Proposition: clear use cases and projects addressing well-articulated challenges.
Team: leaders and project teams with the skills, experiences, and determination to effect change.
Partnerships: relationships with and buy-in from key stakeholders required to achieve success.
Theory of Scale: projects with the potential and vision for achieving large-scale, sustainable impact.
Catalytic Role: opportunities where our funding and expertise could truly make a significant difference to the outcome of the project.
In just the past 12 months, we have seen transformative use cases when data science is applied to specific verticals, from drug discovery to pragmatic applications of data science for labor and workforce transformation. But what remains missing, and where a great opportunity lies, is in building the infrastructure that moves those from individual pilots or products that are being generated or pieces of research to transforming the way we think about entering into an academic discourse.
Three components are critical to this transition: One is the creation of shared data infrastructure that physically allows us to aggregate and share this information. The second is the development of more robust data protection and governance guidelines. And the third is recognizing the role of government in supporting these tools and governance mechanisms as matters of public infrastructure. Each of those components requires significant work that has to happen in partnership between catalytic funding coming from philanthropy, and significant intellectual work that has to happen between practitioners and federal government resources.
Julia Lane, Stuart Feldman, Joshua Greenberg, Jonathan Sotsky, and Vilas Dhar have no financial or non-financial disclosures to share for this article.
Azoulay, P., Graff Zivin, J. S., & Manso, G. (2011). Incentives and creativity: Evidence from the academic life sciences. The RAND Journal of Economics, 42(3), 527–554. https://doi.org/10.3386/w15466
Carson, R. T., Zivin, J. S. G., & Shrader, J. G. (2023). Choose your moments: Peer review and scientific risk taking (NBER Working Paper No. 31409). https://doi.org/10.3386/w31409
Emecz, A., Mitschang, A., Zdawczyk, C., Dahan, M., Baas, J., & Lemson, G. (2024). Turning visions into reality: Lessons learned from building a search and discovery platform. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.d8a3742f
Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. No. 115-435, 132 Stat. 5529 (2019). https://www.congress.gov/bill/115th-congress/house-bill/4174
Hausen, R., & Azarbonyad, H. (2024). Discovering data sets through machine learning: An ensemble approach to uncovering the prevalence of government-funded data sets. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.18df5545
Lane, J., Spector, A. Z., & Stebbins, M. (2024). An invisible hand for creating public value from data. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.03719804
Murray, F. (2013). Evaluating the role of science philanthropy in American research universities. Innovation Policy and the Economy, 13(1), 23–60. https://doi.org/10.1086/668238
Nicholson, J. M., & Ioannidis, J. P. (2012). Conform and be funded. Nature, 492(7427), 34–36. https://doi.org/10.1038/492034a
Pallotta, N., Lane, J., Locklear, J. M., Ren, X., Robila, V., & Alaeddini, A. (2024). Discovering data sets in unstructured corpora: Discovering use and identifying new opportunities. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.77bfa1c9
Potok, N. (2024). Data usage information and connecting with data users: U.S. mandates and guidance for government agency evidence building. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.652877ca
Szalay, A. S., Kunszt, P., Thakar, A., Gray, J., & Slutz, D. (1999). The Sloan Digital Sky Survey and its archive. ArXiv. https://doi.org/10.48550/arXiv.astro-ph/9912382
Taghizadeh-Popp, M., Kim, J. W., Lemson, G., Medvedev, D., Raddick, M. J., Szalay, A. S., Thakar, A. R., Booker, J., Chhetri, C., & Dobos, L. (2020). SciServer: A science platform for astronomy and beyond. Astronomy and Computing, 33, Article 100412. https://doi.org/10.1016/j.ascom.2020.100412
Zdawczyk, C., Lane, J., Rivers, E., & Aydin, M. (2024). Searching for how data have been used: Intuitive labels for data search and discovery. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.f1cbbfbb
©2024 Julia Lane, Stuart Feldman, Joshua Greenberg, Jonathan Sotsky, and Vilas Dhar. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.