Just how many proteins are in the human genome? The consensus estimate of 19,500 excludes an unknown number of microproteins—gene products whose tiny size makes them tricky to detect and study. An international consortium announced today that they have identified several protein-coding genes that had previously been overlooked, and laid out a roadmap for identifying up to 7,200 more (Nature 2026, DOI: 10.1038/s41586-026-10459-x).
After the human genome was sequenced, researchers were surprised by how small a fraction of it encodes proteins. Geneticists went on to find numerous biological roles for the noncoding regions of the genome. Then came another surprise: new techniques for sequencing ribosome-associated RNA showed that thousands of RNAs once labeled “noncoding” appear to be translated into short microproteins.
“Whenever we as humans discover something new, we create a box for it, and we create terms for it … and then within a second we realize biology is much more complex than we think,” says Sebastiaan van Heesch, a systems biologist at Utrecht’s Princess Maxima Center for Pediatric Oncology who helped lead the work.
Despite its pitfalls, classification is critical for the scientists who curate genome databases—and the people who use that information, such as clinical geneticists, biobank curators, and researchers all over the world.
Questions about the “noncoding” RNAs that are translated after all led Jonathan Mudge, who leads a genome annotation project at the European Bioinformatics Institute (EBI), to think hard about validating these proteins’ existence. One conversation led to another, says van Heesch, and “before we knew it, we had a very nice worldwide consortium.”
Itty bitty proteins slip past some filters
Working from a list of 7,264 stretches of DNA that contain a start codon and a stop codon and could plausibly encode a microprotein of at least 16 amino acids, the research team went looking for evidence that the microproteins exist.
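To make the idea of a candidate concrete, here is a minimal Python sketch of scanning a DNA sequence for open reading frames long enough to encode a 16-amino-acid microprotein. It is an illustration under simplifying assumptions, not the consortium’s pipeline: the function name `find_small_orfs` is hypothetical, the scan reads only the forward strand, and it treats ATG as the sole start codon.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, min_aa: int = 16) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (start, end, aa_length) for each
    ATG-initiated open reading frame on the forward strand that encodes
    at least `min_aa` amino acids.

    `end` is exclusive and includes the stop codon; `aa_length` counts the
    codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if aa_len >= min_aa:
                    orfs.append((start, i + 3, aa_len))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


# ATG plus 15 alanine (GCT) codons plus a stop: exactly 16 amino acids.
candidate = "ATG" + "GCT" * 15 + "TAA"
print(find_small_orfs(candidate))  # [(0, 51, 16)]
```

A fuller scan would also cover the reverse strand and, if desired, non-ATG start codons; both are left out here to keep the sketch short.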
The consortium built two massive proteomics datasets, cobbling together billions of mass spectra collected on different instruments in more than 400 publicly available studies.
The smaller a protein is, the more likely it is to be discarded by quality filters during data analysis. For instance, to identify a protein by mass spectrometry, researchers typically require two unique peptides that together span at least 18 amino acids. That’s impossible for the shortest microproteins, some of which are fewer than 18 amino acids long.
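The arithmetic behind that filter can be sketched in a few lines of Python. This is a hypothetical simplification, not the consortium’s code or any standard tool’s: `passes_two_peptide_rule` is an illustrative name, peptides are given as (start, end) positions on the protein, and the 9-residue minimum per peptide and the exclusion of nested peptides are assumptions added for the example.

```python
from typing import List, Tuple

# (start, end) position of a uniquely mapping peptide on the protein,
# 0-based and end-exclusive
Peptide = Tuple[int, int]


def covered_residues(peptides: List[Peptide]) -> int:
    """Count protein residues covered by at least one peptide."""
    covered = set()
    for start, end in peptides:
        covered.update(range(start, end))
    return len(covered)


def passes_two_peptide_rule(
    peptides: List[Peptide],
    min_peptides: int = 2,
    min_len: int = 9,        # assumed per-peptide minimum length
    min_coverage: int = 18,  # residues the peptides must span together
) -> bool:
    # Discard duplicate entries and peptides shorter than the assumed minimum.
    long_enough = [p for p in set(peptides) if p[1] - p[0] >= min_len]
    # A peptide nested inside another adds no independent evidence.
    independent = [
        p for p in long_enough
        if not any(q != p and q[0] <= p[0] and p[1] <= q[1] for q in long_enough)
    ]
    if len(independent) < min_peptides:
        return False
    return covered_residues(independent) >= min_coverage


# A 16-residue microprotein: two overlapping 9-mers cover only 16 residues.
print(passes_two_peptide_rule([(0, 9), (7, 16)]))    # False
# A longer protein with two adjacent 10-residue peptides covers 20 residues.
print(passes_two_peptide_rule([(0, 10), (10, 20)]))  # True
```

Because the peptides must together cover 18 or more residues, a 16-residue microprotein fails no matter how its peptides are arranged, which is why relaxing the rule was necessary to see it at all.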
The researchers assessed a few dozen well-studied microproteins and found that just 5% could be detected using typical proteomics rules. According to Robert Moritz, a proteomics researcher at the Institute for Systems Biology, that prompted the team to ask themselves, “What if we start relaxing some of those rules?” When they did, they found evidence that thousands of microproteins are indeed produced.
To strengthen the case for the microproteins, the researchers also wove in lines of evidence that test for function. They developed a new metric for evolutionary pressure, which can serve as a proxy for a gene’s importance, and ran CRISPR screens that measure how cells change when a gene is missing, pointing to the gene product’s biological role. The results were mixed.
“The evidence for some [microproteins] was striking and clear,” van Heesch says. For others, evidence that they are made, or that they have a biological role, “was much harder to establish.”
An ongoing process
The EBI has already added at least three microproteins to its human genome database. Researchers are still vetting many of the other 7,264 candidates; for now, they are calling sequences that are definitely translated but may not qualify as coding genes “peptideins.”
“It’s too large to ignore now,” Moritz says about the evidence for microproteins’ existence, adding that there may be thousands more peptideins to evaluate. A preliminary analysis of molecules as small as 10 amino acids pushes the number toward 20,000, which could double the human proteome, he says.
“We know they’re there. And now it’s up to the entire scientific community to help us understand which ones do what—or maybe some of them do nothing,” van Heesch says.
© 2026 American Chemical Society