Statement from Human Longevity Inc. on PNAS Paper on Face and other Trait Predictions from Whole Genome Sequencing and Machine Learning for Identification of Individuals
September 5, 2017
As outlined in a peer-reviewed study recently published in the journal Proceedings of the National Academy of Sciences (PNAS), Human Longevity researchers set out to see which traits could be predicted, and thus used to identify individuals, by applying machine learning algorithms to whole genome data, without assuming that any additional information such as age, sex, or ethnicity is shared alongside the genome. The study centered on 1,061 people from diverse ethno-geographic backgrounds. The authors readily acknowledged that this was a very small cohort and that much larger cohorts would be needed to make much more precise predictions and identifications. The researchers stand behind their methodology (there are more than 40 pages in the paper’s supplemental material outlining all methods) and invite all to review the PNAS paper.
As the team states in the conclusion of the paper:
We have presented predictive models for facial structure, voice, eye color, skin color, height, weight, and BMI from common genetic variation and have developed a model for estimating age from WGS data. Despite limitations in statistical power due to the small sample size of 1,061 individuals, predictions are sound. Although individually, each predictive model provided limited information about an individual’s identity, we have derived an optimal similarity measure from multiple prediction models that enabled matching between genomes and phenotypic profiles with good accuracy. Over time, predictions will get more precise, and, thus, the results of this work will be of greater consideration in the current discussion on genome privacy protection.
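The matching idea described in that conclusion can be illustrated with a minimal sketch. It is not the paper's method: the trait names, all numbers, and the `similarity` and `best_match` helpers are hypothetical, and a simple error-weighted squared distance stands in for the similarity measure the authors derived.

```python
# Traits predicted from each genome (invented numbers, for illustration only).
predicted = {
    "genome_A": {"height_cm": 180.0, "eye_darkness": 0.2, "age_years": 35.0},
    "genome_B": {"height_cm": 162.0, "eye_darkness": 0.9, "age_years": 52.0},
}

# Traits actually observed for each candidate person (also invented).
observed = {
    "person_1": {"height_cm": 164.0, "eye_darkness": 0.8, "age_years": 49.0},
    "person_2": {"height_cm": 178.0, "eye_darkness": 0.3, "age_years": 38.0},
}

# Assumed typical prediction error per trait. Dividing by it means that
# imprecisely predicted traits contribute less to the overall score.
trait_sd = {"height_cm": 8.0, "eye_darkness": 0.3, "age_years": 8.0}


def similarity(pred, obs):
    """Negative sum of squared, error-scaled differences (higher is more similar)."""
    return -sum(((pred[t] - obs[t]) / trait_sd[t]) ** 2 for t in trait_sd)


def best_match(genome_id):
    """Return the observed profile most similar to the genome's predicted traits."""
    return max(observed, key=lambda pid: similarity(predicted[genome_id], observed[pid]))


print(best_match("genome_A"))
print(best_match("genome_B"))
```

Each trait on its own separates the two people only weakly; combining several traits into one score is what makes the match clear in this toy case.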
As also stated in the paper and publicly, a central reason for doing this study was to point out that as larger and larger databases of genomic and associated phenotypic information come into existence (both public and private), individuals who participate in these studies need to fully understand the implications of having their genomes in such databases. Those in our field and policymakers must also understand this situation. A core belief of the HLI researchers is that there is now no such thing as true de-identification and full privacy in publicly accessible databases, because one’s genome is the ultimate identifier in that it codes for all the physical traits by which that individual is recognized. Put simply, given a genome from the public domain, researchers can sketch a picture of that individual, thus identifying that person. And while current methodologies are less sophisticated, the field is rapidly advancing, so methodologies will only improve.
We agree that sharing of genomic data is invaluable for research; however, to reiterate, our results suggest that genomes cannot be considered fully de-identifiable and should be shared using appropriate levels of security and due diligence. At HLI we employ some of the best minds and tools to ensure the security of our data. We look forward to continuing to work with interested researchers, policy makers and legislators to ensure the safety and privacy of genomic and other health-related information.
# # #
Embracing the Perks of Cloud Computing
By Yaron Turpaz, Ph.D., MBA, CIO, Human Longevity, Inc.
Human Longevity, Inc. was launched in March 2014 with a mission to transform healthcare and accelerate the practice of personalized, preventive healthcare via detailed and comprehensive genomics analysis and risk assessment, integrated with high-quality phenotypic and medical information. We have built the largest human genome sequencing facility in the world, with 24 Illumina HiSeq X machines and two Pacific Biosciences instruments. With an unprecedented capacity to produce more than 30,000 whole human genomes per year at 30X genome coverage, we have built an integrated Knowledgebase™ and developed cloud-based solutions to process, analyze and visualize such complex multi-dimensional data, providing time-sensitive, meaningful scientific and personalized health insights. To date, we have processed more than 4PB of genomics data from more than 20,000 integrated genomes and health records, and we are on track to have more than 1M integrated health records in our Knowledgebase™ by 2020. This requires us to build solutions that support very large unstructured data sets with real-time analysis of complex queries.
HLI’s Franz Och, Chief Data Scientist, wins NAACL Best Paper Award for Machine Learning Word Study
The North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT) honored HLI’s Franz Och with the award for best long paper. Three awards were presented: one for best long paper and two for best student papers.
Franz Och, HLI’s Chief Data Scientist, produced this paper while at Google, where he devised Google Translate, before joining HLI. The paper, entitled Unsupervised Morphology Induction Using Word Embeddings (co-written with Radu Soricut of Google), explored how to learn word morphology in an unsupervised way from large amounts of text.
The approach does this by representing words in a high-dimensional vector space and representing morphological rules with ‘vector arithmetic’. The system learns, for example, how the suffix ‘ly’ changes the meaning of a word (immediate -> immediately) or what the prefix ‘over’ does to a word. “The cool thing of the paper”, as Franz describes his work, is that the machine does this completely unsupervised, so it can be applied to all languages that have prefix or suffix morphology. The paper shows results for a diverse set of languages and shows that modeling this improves, for example, word similarity tests.
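The ‘vector arithmetic’ idea can be sketched in a few lines. This is a toy illustration only, not the method from the paper: the three-dimensional vectors are hand-made rather than learned from text, and the `cosine` and `apply_rule` helpers are hypothetical names introduced here.

```python
from math import sqrt

# Toy 3-dimensional "embeddings", invented for illustration. Real word
# embeddings are learned from large text corpora and have far more dimensions.
vectors = {
    "immediate":   [1.0, 0.0, 0.0],
    "immediately": [1.0, 0.0, 1.0],
    "quick":       [0.0, 1.0, 0.0],
    "quickly":     [0.0, 1.0, 1.0],
    "slow":        [-1.0, 0.0, 0.0],
    "slowly":      [-1.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# A morphological rule represented as a vector offset, taken from one
# example pair: vec("immediately") - vec("immediate").
ly_rule = [t - s for s, t in zip(vectors["immediate"], vectors["immediately"])]


def apply_rule(word, rule=ly_rule):
    """Add the rule offset to a word vector and return the nearest other word."""
    target = [x + r for x, r in zip(vectors[word], rule)]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(apply_rule("quick"))  # quickly
print(apply_rule("slow"))   # slowly
```

In this toy space the offset taken from a single word pair carries over to other words. Learned embeddings are much noisier, so any real system would need a way to decide which candidate rules and word pairs to trust.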
Milken Institute Global Health Conference Video
Human Longevity Inc. Co-founders Craig Venter and Peter Diamandis talk pioneering genomics for a precision medicine future at the Milken Institute Global Health Conference.
Read more about the event on the Milken Institute website.
Dr J Craig Venter wins the Leeuwenhoek Medal
The Leeuwenhoek Medal 2015 has been awarded to Dr J Craig Venter. The medal was presented to him by the Dutch State Secretary of Education, Culture and Science, Sander Dekker, at the annual conference of the Royal Netherlands Society for Microbiology (KNVM) and the Netherlands Society for Medical Microbiology (NVMM) on April 14 at Papendal, Arnhem.
The Leeuwenhoek Medal is the most prestigious prize in the field of microbiology. It has been awarded every 10 years since 1877 to the scientist who has made the most important contributions to microbiology during the preceding 10 years. The first 13 medals were awarded by the Royal Netherlands Academy of Arts and Sciences. From 2015 onwards, KNVM awards the medal. Dr Venter is the 14th winner. Previous award winners include Louis Pasteur and the Nobel Prize winners André Lwoff and Selman Waksman.
Dr Venter has developed novel approaches in genome sequencing. He showed the power of large-scale sequencing combined with novel bioinformatics analytical tools. He provided the first full genome sequence of a bacterium, Haemophilus influenzae. Many genomes followed, including the human genome. Dr Venter also discovered millions of novel genes through his famous Global Ocean Sampling Expedition. The achievements of Dr Venter are not limited to genome sequencing. For instance, he has made milestone contributions to synthetic biology. He developed methods for whole bacterial chromosome synthesis and assembly, followed by successful transplantation of a synthetic chromosome into a chromosome-less bacterial cell. The achievements of Dr Venter will pave the way for the development of newly designed bacterial species with unprecedented functionalities, for instance for the production of fuels for the future.
The award is an authentic gold-plated silver medal. Dr Venter is now also an honorary member of KNVM.
Prof Dr HAB Wösten
J. Craig Venter, Ph.D., Co-Founder and CEO, Human Longevity, Inc. (HLI) Participates in White House Precision Medicine Event
Prepared Statement by J. Craig Venter, Ph.D.
It is gratifying to see that the Obama Administration realizes the great power and potential for genomic science as a means to better understand human biology, and to aid in disease prevention and treatment. I was honored to participate in today’s White House event outlining a potential new, government-funded precision medicine program.
Since the 1980s my teams have been focused on advancing the science of genomics, from the first sequenced genome of a free-living organism to the first complete human genome, microbiome and synthetic cell, to better all our lives.
We founded HLI in 2013 with the goal of revolutionizing healthcare and medicine by systematically harnessing genomics data to address disease. Our comprehensive database is already in place with thousands of complete human genomes, microbiomes and phenotypic information together with accompanying clinical records, and is enabling the pharmaceutical industry, academics, physicians and patients to use these data to advance understanding about disease and wellness, and to apply them for personalized care.
When we founded HLI, we envisioned a new era in medicine in which millions of lives will be improved through genomics and comprehensive phenotype data.
Now, through sequencing and analyzing thousands of genomes with private funds – with the goal of reaching 1 million genomes by 2020 – we believe that we can get a holistic understanding of human biology and the individual.
It is encouraging that the US government is discussing taking a role in a genomic-enabled future, especially funding the Food and Drug Administration (FDA) to develop high-quality, curated databases and develop additional genomic expertise. We agree, though, that there are still significant issues that must be addressed in any government-funded and led precision medicine program. Issues surrounding who will have access to the data, privacy and patient medical/genomic records are some of the most pressing.
We look forward to continuing the dialogue with the Administration, FDA and other stakeholders as this is an important initiative in which government must work hand in hand with the commercial sector and academia.
Additional Background on Human Longevity, Inc.
HLI, a privately held company headquartered in San Diego, CA, was founded in 2013 by pioneers in the fields of genomics and stem cell therapy. Using advances in genomic sequencing, the human microbiome, proteomics, informatics, computing, and cell therapy technologies, HLI is building the world’s largest and most comprehensive database of human genomic and phenotype data.
The company is also building advanced health centers – called HLI Health Hubs – which will be the embodiment of our philosophies of genomic science-based longevity care – where we will apply this learning and deliver it to the general public for the greatest benefit. Individuals and families will be seen in welcoming environments for one-stop, advanced evaluations (advanced genotype and phenotype analysis including whole body MRI, wireless digital monitoring, etc.). Our first prototype center is slated to open in July 2015 in San Diego, California.
For more information please visit www.humanlongevity.com
# # #
HLI Media Contact:
Heather Kowalski, firstname.lastname@example.org, 858-361-0466
News Round Up from Launch
Following are a few highlights from the coverage of our launch a little more than a week ago.
- Microbes and Metabolites Fuel an Ambitious Aging Project
Susan Young, MIT Technology Review (March 11, 2014)
- J. Craig Venter launches new project on aging
Karen Weintraub, USA Today (March 5, 2014)
- Malaysian billionaire backs RM230mil ‘fountain of youth’ research
Julie Steenhuysen, The Star Online (March 5, 2014)
- For his next act, genome wiz Craig Venter takes on aging
Julie Steenhuysen, Reuters (March 4, 2014)
- A Genetic Entrepreneur Sets His Sights on Aging and Death
Andrew Pollack, The New York Times (March 4, 2014)
- Venter Starts DNA-Scanning Company to Boost Longevity
Robert Langreth, Bloomberg (March 4, 2014)
- Q&A: Genome Pioneer Craig Venter Plans Largest Human Genome Project to Aid Longevity
Dan Vergano, National Geographic (March 4, 2014)
- J. Craig Venter’s Latest Venture Has Ambitions Across Human Lifespan
Aaron Krol, BioIT World (March 4, 2014)
- Biggest gene sequence project to launch
Bradley J. Fikes and Gary Robbins, UT San Diego (March 4, 2014)