Artificial intelligence company Google DeepMind and Google.org have announced a new partnership with the UK’s Wellcome Sanger Institute to build high-quality genomic datasets, which will form the basis of training data for new AI models.
The consortium is initially planned to last 5 years, with Google.org and DeepMind providing $5 million per year in funding.
DeepMind, the company behind the ubiquitous protein structure prediction software AlphaFold, officially released AlphaGenome in January. The publicly available model can predict the function of DNA sequences, and Žiga Avsec, a Google DeepMind researcher and lead author of the AlphaGenome paper, told C&EN that it can go beyond expression and predict more-detailed aspects like DNA accessibility and transcription-factor binding.
In May, DeepMind also detailed Co-Scientist, a multiagent AI platform that can scan existing literature and generate hypotheses.
All of these tools are built on open-access datasets, but not all areas of the life sciences have suitably indexed and curated resources to train new AI algorithms. In a news release, Julia Wilson, chief innovation and impact officer at the Sanger Institute, says the consortium aims “to create resources that will be shared widely with the community to enable transformative scientific discoveries and deliver broad impact across the life sciences.”