HerbalEGram: Volume 8, Number 10, October 2011

Sequencing the Cannabis Genome: Impact, History, and Future

The August announcement that Massachusetts-based company Medicinal Genomics had sequenced the entire genome of Cannabis sativa L. received much national attention, including coverage by media outlets from National Public Radio (NPR) and CBS to recreational marijuana blog HailMaryJane.com.1,2,3 While the development makes for attention-grabbing headlines—“Marijuana Genome Sequenced for Health, Not Highs,” “Science Cracks the Cannabis Genome”—how will it impact research and public health?

Medicinal Genomics founder Kevin McKernan became interested in decoding the cannabis genome when working with a clinical oncologist to sequence the DNA of cancer tumors and patients.

“As a result of this,” said McKernan, “[I] had a few friends with cancer ask about medical marijuana” (e-mail, September 29, 2011).

Then he read Spanish scientist Manuel Guzmán’s research documenting that cannabinoids, some of the biologically active compounds in cannabis, have a favorable therapeutic index in cancerous cell cultures and animal models.4 McKernan said this “really drove it home,” as that finding is rare with most potential cancer drugs. Additionally, McKernan read Etienne de Meijer's work emphasizing that the cannabis chemotype is strictly governed by genetics,5 “but we only knew CBD [cannabidiol] and THC [tetrahydrocannabinol] synthase sequences to date.”

“I naively figured we could sequence the whole genome for under $50k and that this had to be a priority,” he said. “Turned out to be far more complicated of genome than one could gather from the literature.”

Though other scientists and organizations have been working on sequencing cannabis, Medicinal Genomics is said to have produced the “largest known gene collection” at more than 131 billion bases of sequence.6 The sequence bases of C. sativa were made available on August 18th on Amazon EC2, a public cloud computing service, via Nimbus Informatics, an open source data management website. A data assembly is also available for download.

Impact of Genome Decoding

Before the sequencing of the cannabis genome, about 12 cannabis genes were known, and now tens of thousands are known, said McKernan. Additionally, the genome has provided a better understanding of THC synthase genomics and has revealed more than 2 million single nucleotide variants.

McKernan also noted that the genome has made apparent that THC synthase is not one gene with 2 copies in a diploid genome, “as all of the previous papers have postulated.” Instead, the recent sequencing implies that the gene has been replicated 8 times—with the potential assistance of a transposable element—and diverged.

In discussing the genome's importance with NPR, McKernan said it will allow scientists to investigate the genes that govern cannabis compounds other than THC and CBD, and to sequence other cannabis strains to highlight different traits.1 Leading cannabinoid researcher Ethan Russo, MD, finds these possibilities “very exciting.”

“The publication of the cannabis genome is a welcome scientific development,” said Dr. Russo, “but one whose potential applications remain to be determined. The possibilities are enticing, and it seems certain that many able minds will apply their imagination to the task. Every phytocannabinoid that has been closely researched so far has demonstrated unique therapeutic potential. There are hundreds of strains available on the black market, but these are not necessarily stable and reproducible” (e-mail, September 22-29, 2011).

Dr. Russo thinks the most promising new investigations that might stem from the genome could be in the area of epigenetics—the study of heritable gene function changes that occur with no change in DNA—such as determining the factors regulating cannabinoid production, biosynthetic pathways, and terpenoid regulation. 

“One example,” he said, “might be the production of high-CBD strains. Some researchers, like myself, believe that terpenoids synergize phytocannabinoid effects. Thus, it might be theoretically possible to produce plants that express one cannabinoid and one terpenoid to therapeutic advantage, say in treating anxiety.”7

The genome's release has prompted discussion of whether researchers can now study cannabis without using actual cannabis plants, which can be difficult to obtain within the strict US regulatory environment surrounding cannabis. Researchers could study cannabis’s genome through bioinformatics, said McKernan, noting that the Human Genome Project led to an understanding of how to reprogram cells so that they can make several different cell types, such as adult stem cells. The cannabis genome, he said, could enable people to discover novel genes related to terpenoid synthesis by comparing the sequence to those of grape (Vitis vinifera) and hops (Humulus lupulus).

“This isn’t to say no one will ever need the plant again,” said McKernan. “But a much larger audience can now legally study the plant than prior to August 18, 2011.”

But Donald Abrams, MD, professor of clinical medicine at the University of California San Francisco, chief of hematology/oncology at San Francisco General Hospital, and a researcher of cannabis's effects in humans, stressed the importance of having access to actual cannabis material. “We know already a lot about the plant and its components without knowing the genome,” said Dr. Abrams (e-mail, September 19, 2011). “You don’t need the genome; you need the plant.”8

Previous Cannabis Genetics Work

Though several media outlets reported McKernan as saying that not much has been accomplished in cannabis genomics,6,9 he told the American Botanical Council (ABC), “One article took a comment out of context and it went viral through the news.”

Instead, McKernan recognized that much valuable work has been completed. “Many people intimate with the science of cannabis have offered up their time to help guide us on good places to apply this work in both the medical and hemp areas,” he said. “There is a lot of previous groundwork on THC and CBD synthase so there is a lot to learn from the road which has been paved to date.”

According to Dr. Russo, “Arguably, the genes for the most pharmacologically versatile components in cannabis have already been identified.”10 Dr. Russo, who is the senior medical advisor to GW Pharmaceuticals—manufacturer of Sativex®, an oromucosal spray containing cannabis extracts that is in late stage clinical trials in the United States for treatment of cancer pain—further explained, “While this development will, without doubt, spur further investigation, a tremendous amount of genetic work on cannabis has been accomplished previously.”

Among these genetic breakthroughs, Dr. Russo listed the biochemical characterization and synthesis of THC; cloning and crystallization of THCA synthase; purification and sequencing of cannabidiolic acid; and the isolation of THCA synthase and identification of a unique single nucleotide polymorphism from an ancient cannabis sample found in a Chinese tomb.10,11

Additionally, Dr. Russo noted that high-THC, -CBD, -CBG, and -CBC strains of cannabis plants have been produced already, as have high-THCV, -CBDV, -CBGV, and -CBCV strains that are currently being researched.

“A lot of great work has already been done on CBD and THC synthase,” said McKernan, “and all of those cannabinoids mentioned above are derivatives within those 2 isolated pathways. The next question is what makes the other 77 cannabinoids reported to be in the plant?”

And though some news stories reported that the genome will enable scientists to breed cannabis plants containing no cannabinoids, McKernan said in his interview with ABC that this has already been accomplished and clarified that he hopes the genome will give scientists a better understanding of the cannabinoid pathways that can be regulated up or down.

McKernan pointed to the genome of mustard weed (Arabidopsis thaliana), the first plant genome ever sequenced, which has been said to potentially enable “crops to be grown without pesticides, or to be grown in poorer soil… help native plants fight invasive species… and bring about new medicines for humans.”12

“Every plant genome which has been decoded has resulted in a radical change of the field,” said McKernan. “I think I'd be called myopic if I claimed cannabis genetics ends at cannabinoid RNA sequences in terms of potential for the plant when one considers hemp and the many unknowns about the genetics governing the terpenoid and cannabinoid pathways.

“Compared to other plant genomes like Arabidopsis, where hundreds to thousands of genomes are now being sequenced,” McKernan continued, “cannabis was a desert and our work has simply built an aqueduct, but by no means makes it a rain forest. The reference we have to date is still a draft. The important thing to note is that once you have a reference genome you can then leverage these next gen sequencing tools to drive the whole genome sequencing costs down tremendously.”

McKernan said he hopes the genome will open investigation into why the other cannabinoids are expressed at such low levels, whether their expression could be increased, and which genes are responsible for them.

Medicinal Genomics, which also has offices in the Netherlands, is currently working to sequence the genome of C. indica, whose assembly McKernan said is 50% larger than the public assembly of the Chemdawg strain of C. sativa. The C. indica genome will debut as an iPad® app.

—Lindsay Stafford


1. Barclay E. Buzz kill: marijuana genome sequenced for health, not highs. National Public Radio. August 19, 2011: Shots NPR Health Blog. Available at: www.npr.org/blogs/health/2011/08/19/139762352/cracking-the-marijuana-genome-in-search-of-therapeutic-highs. Accessed September 28, 2011.

2. Genetic code of cannabis reported unlocked. CBS News. August 18, 2011. Available at: www.cbsnews.com/stories/2011/08/18/scitech/main20094132.shtml. Accessed September 28, 2011.

3. Science cracks the cannabis genome. Hail Mary Jane. August 29, 2011. Available at: http://hailmaryjane.com/science-cracks-the-cannabis-genome/.

4. Blázquez C, Salazar M, Carracedo A, et al. Cannabinoids inhibit glioma cell invasion by down-regulating matrix metalloproteinase-2 expression. Cancer Res. 2008;68(6):1945-52.

5. de Meijer EP, Bagatta M, Carboni A, et al. The inheritance of chemical phenotype in Cannabis sativa L. Genetics. 2003;163(1):335-46.

6. Medicinal Genomics sequences the cannabis genome using Roche's GS FLX+ System [press release]. Branford, CT: Roche. August 18, 2011. Available at: www.roche.com/media/media_releases/med_dia_2011-08-18b.htm. Accessed October 3, 2011.

7. Russo E. Taming THC: potential cannabis synergy and phytocannabinoid-terpenoid entourage effects. Br J Pharmacol. 2011;163:1344-1364.

8. Tirrell M. Marijuana DNA sequenced by startup in search for medicinal uses. Bloomberg. August 17, 2011. Available at: www.bloomberg.com/news/2011-08-18/marijuana-dna-sequenced-by-startup.html. Accessed September 28, 2011.

9. Johnson C. Marblehead startup seeks to unlock secrets of cannabis. Boston Globe. August 18, 2011. Available at: http://articles.boston.com/2011-08-18/news/29901462_1_sequencing-cannabis-genetic-blueprint. Accessed October 3, 2011.

10. Russo E. Cannabis genome uncloaked: commentary on the scientific implications. Article for the International Cannabinoid Research Society. Unpublished. Sent to L. Stafford by E. Russo, September 22, 2011.

11. Russo E, Jiang HE, Li X, et al. Phytochemical and genetic analyses of ancient cannabis from Central Asia. J Exp Bot. 2008;59(15):4171-4182.

12. Hansen A. A small plant’s genome has huge impact. National Science Foundation. July 23, 2004. Available at: www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=100162. Accessed October 2, 2011.