Mr Sam Santhosh
Introduction
We are in the midst of an AI (Artificial Intelligence) summer. Billions of dollars are being invested in both AI hardware and software, and there is great excitement about the benefits that AI can provide to humanity. There is also a feeling of trepidation, as AI systems have finally become capable of taking over many white-collar jobs as well. Will AI become smarter than humans? Can it become ‘conscious’? It may take a decade or more to answer these questions – but it is very clear that the AI tools we have today can make a major impact in every industry. In this article, I will attempt to explain how AI can be leveraged in Cancer Genomics.
What is Cancer Genomics?
Cancer is a genetic disease that arises when normal cells acquire changes in their DNA—either by chance, or due to environmental exposures—that disrupt the regulation of normal cell growth and division, transforming them into cancer cells. Over the past two decades, the advent of high-throughput omics technologies—particularly next-generation sequencing, which enables the parallel sequencing of millions of DNA fragments—has revolutionised our understanding of cancer. These technologies have revealed that cancers are not singular diseases but collections of distinct molecular subtypes, each with its own pattern of genetic and epigenetic alterations. They have uncovered key oncogenic drivers and shown that even within the same cancer type, tumours can be highly heterogeneous. As a result, cancer treatment has begun shifting from a one-size-fits-all paradigm to precision oncology, where therapeutic decisions are increasingly guided by the unique molecular profile of a patient’s tumour.
One of the most striking examples of how cancer genomics has transformed treatment comes from lung cancer. For many patients with non-small cell lung cancer (NSCLC), which was once treated with standard chemotherapy offering limited benefit, the discovery of mutations in a gene called EGFR changed everything. These mutations make the cancer cells heavily reliant on EGFR signalling to grow. Targeted drugs that block EGFR function lead to dramatic and often long-lasting responses in patients whose tumours carry the mutation. Similar success stories followed in other cancers—such as HER2-targeted therapies in breast cancer, or BRAF inhibitors in melanoma—where understanding the genetic makeup of the tumour allowed for treatments that are more effective and often better tolerated than traditional chemotherapy. These advances have shown that looking at the DNA of a cancer can reveal vulnerabilities that are invisible under the microscope but critical for choosing the right therapy.
Another major advance made possible by cancer genomics is the ability to monitor cancer using a simple blood test. As tumours grow and evolve, they shed fragments of their DNA into the bloodstream—known as circulating tumour DNA (ctDNA). By sequencing this DNA, clinicians can track cancer progression, including the appearance of drug resistance, even before symptoms or imaging changes appear. This allows them to switch to a second-line treatment early, before the disease progresses further. In addition to lung cancer, ctDNA is increasingly being used in colorectal, breast, and other cancers, and it is opening the door to earlier detection of relapse and more timely treatment decisions.
Despite the vast amount of data generated from cancer patients, our understanding of the disease as a whole feels fragmented. One major reason is that this data often exists in silos. Genomic data may be stored separately from imaging or clinical data, making it difficult to see the full picture of the disease. Even when different types of data (multimodal data) are combined, our traditional methods of analysis fall short, unable to easily handle the complexity, noise and variability present in real-world patient data. As a result, much of the rich information hidden in these datasets remains untapped. We still struggle to predict which patient will benefit from a specific drug, why some tumours relapse despite initial response, or how the micro-environment around the tumour shapes its behaviour. In short, we have the data—but we don’t yet have the tools to fully understand what it’s telling us. AI, with its ability to find patterns in complex, multimodal data, could be powerfully applied to human health and disease.
A Brief History of AI
For many decades, AI promised a lot but failed to deliver – somewhat like nuclear fusion: great in theory but unable to make an impact in practice. I remember trying out AI programming during my MBA at IIM-Calcutta in the mid-1980s – we used LISP (List Processing) then, to create Expert Systems (rule-based programs). It was good fun compared to the drudgery of programming in Basic, Fortran or Pascal (which were commonly used at that time), but real-life applications for AI were few. Later, in the mid-1990s, when I was running my software company, neural networks had come into vogue, and we actually got a commercial AI project – a neural network tool to predict oil prices based on past data (prices as well as events) – but it did not work well. The main challenge was hardware limitations, and we gave up after some time.
Well, the world has changed now – huge clusters of servers and tremendously fast communications give us practically unlimited processing power, accessible from anywhere in the world. This growth in computer processing capability gave the first inkling of the future when IBM’s Deep Blue beat the world chess champion Garry Kasparov in 1997. Still, that type of processing power was available only to a few. But by the early 2000s, the second generation of AI started making an impact. Dedicated processors initially developed for graphics and gaming started being used for data mining, machine learning and neural networks, which found many applications – though mostly in the background, with users not being much aware of them. When IBM’s Watson beat the world champions in Jeopardy! in 2011, AI came more into the public consciousness. But still, despite huge efforts by IBM, Watson could not add much value in practical applications like healthcare. Even some hospitals in India tried out Watson in cancer diagnosis and treatment but found it wanting.
The next breakthrough came in 2016 when AlphaGo from DeepMind (it had become a Google company by then) beat the Go world champion. The game Go is much more complex than chess – for example, after 3 moves by each player in chess there are about 121 million possible configurations of the board, but after 3 moves each in Go, there are on the order of 2 quadrillion (2x10^15) possible configurations. While IBM’s Deep Blue had used the ‘brute force’ approach, where the algorithm tries to evaluate as many moves as possible, that approach would not work in games like Go. So, how did AlphaGo work? It combined deep neural networks with advanced search algorithms, specifically Monte Carlo Tree Search (MCTS), and was trained through a combination of supervised and reinforcement learning. Deep learning uses neural networks loosely modelled on the human brain; these systems ‘learn’ when their networks are ‘trained’ on large amounts of data. The trainers could also create many copies of AlphaGo and get it to play against itself repeatedly, so the algorithm was able to simulate millions of games and even learn new strategies in the process. Later versions of the software, like AlphaZero, dispensed with any prior human knowledge: after being fed the rules of the game, the system trained on its own, playing itself millions of times and learning from scratch, reaching a level of performance at which it could trounce AlphaGo with just one day’s training! Not only that, it could also beat the world champions in other games such as chess and shogi. In 2018, the same team came out with AlphaFold, which went on to solve the protein folding problem – a 50-year-old challenge in biology. (This would lead to the two AI scientists who led the development of AlphaFold, Demis Hassabis and John Jumper, receiving the 2024 Nobel Prize in Chemistry.)
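The self-play idea behind AlphaZero can be illustrated with a toy example. The sketch below is entirely my own illustrative code (AlphaZero itself uses deep neural networks and MCTS, not a lookup table): a tabular learner teaches itself the simple game of Nim purely by playing against itself. Two players alternately remove 1–3 stones, and whoever takes the last stone wins.

```python
import random

def train(n_stones=10, episodes=20000, alpha=0.5, eps=0.2, seed=0):
    """Self-play learning for Nim. Q[s][a] is the estimated value of
    removing `a` stones from a pile of `s`, from the perspective of
    the player about to move. The same table plays both sides."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in range(1, min(3, s) + 1)}
         for s in range(1, n_stones + 1)}
    for _ in range(episodes):
        s = n_stones
        while s > 0:
            moves = list(Q[s])
            # explore a random move sometimes, else play greedily
            if rng.random() < eps:
                a = rng.choice(moves)
            else:
                a = max(moves, key=lambda m: Q[s][m])
            s2 = s - a
            if s2 == 0:
                target = 1.0          # we took the last stone: a win
            else:
                # the opponent's best reply is our loss (negamax idea)
                target = -max(Q[s2].values())
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def greedy(Q, s):
    """Best learned move with `s` stones left."""
    return max(Q[s], key=lambda m: Q[s][m])
```

After enough self-play games, the table converges to the known optimal strategy for Nim – always leave your opponent a multiple of four stones – without ever being told that rule.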
Recent Breakthroughs
In November 2022, OpenAI launched ChatGPT, a chatbot using a Large Language Model (LLM) based on the ‘transformer’ architecture that Google had released in 2017. (GPT stands for Generative Pre-trained Transformer.) It was so powerful and polymathic that it could answer any question immediately in fluent prose. It did not matter whether you wanted an essay, a marketing plan or a piece of software code – ChatGPT could provide it in a fraction of a second. In the last three years, LLMs have taken the world by storm, with newer and more powerful versions coming out so quickly that we are left overwhelmed. However, LLMs are not perfect and will not replace human thinking right away. Their capability lies in their massive processing power, and though they seem to ‘understand’ and ‘reason’, they are not actually thinking. You can think of an LLM as a very advanced version of the autocomplete on your phone, which completes the word or phrase you are typing. Having parsed billions of pages of text from the internet, LLMs like ChatGPT build a representation of the relationships between words, pictures, etc. and their context, allowing them to create output that often looks better than what a human can produce. However, LLMs have many limitations – not only the huge processing power and energy they require, but also the ‘un-explainability’ of their results (it is not possible to figure out how the LLM arrived at an answer) and occasional ‘hallucinations’ (it makes up plausible-sounding facts when it does not know the answer) – so massive commercial use will take some time. Another major limitation of an LLM is that it cannot take any action. It can create a recipe but cannot make the dish. It can give a great tour itinerary but cannot book your ticket or check for hotel availability. But this is now being solved by AI agents – with a new technology called ‘Agentic AI’.
AI agents are software programs that can interact with applications or people, manipulate data, control hardware, and carry out real tasks that a human can do. A self-driving car is one example, and numerous other solutions in customer service, marketing, healthcare support, education, etc. are currently being developed. Combining the power of Generative AI and Agentic AI, supported by developments in robotics and nanotechnology, we are now entering an exciting phase of technology convergence.
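The ‘advanced autocomplete’ analogy above can be made concrete with a deliberately tiny sketch. This is my own illustrative code, not how a real LLM works – actual models use transformer networks trained on billions of documents, not word-pair counts – but it shows the basic idea of predicting the next word from patterns seen in training text.

```python
from collections import defaultdict

def train_bigrams(text):
    """Count how often each word follows another: a crude stand-in
    for the statistical relationships an LLM learns from text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.lower().split()
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    return counts

def autocomplete(counts, word):
    """Predict the most frequent follower of `word`, or None if
    the word was never seen in training."""
    followers = counts.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)

corpus = ("the cat sat on the mat the cat chased the mouse "
          "the cat ran under the mat")
model = train_bigrams(corpus)
print(autocomplete(model, "the"))  # prints "cat" - the most common follower
```

A real LLM does the same kind of next-token prediction, but over whole contexts rather than single words, which is why it can produce fluent paragraphs instead of one-word completions.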
The Future of Cancer Genomics & AI
As discussed in the earlier section, older non-AI models have a limited ability to capture the true behaviour of cancer because they cannot combine multimodal data. Newer AI models, such as deep learning and large language models trained on genomic data, can process multimodal data to find patterns that humans might never notice. For example, AI has been used to discover, from biopsy images, new gene signatures that predict which cancers are more likely to spread or resist treatment. It can also identify combinations of mutations that may not be dangerous alone but become problematic when they occur together. In some studies, AI has uncovered hidden subtypes within a cancer that explain why some patients respond better to certain therapies—even when their tumours appear similar under a microscope or by standard genetic tests. AI has also helped make sense of vast single-cell genomics datasets, revealing how different cells within the same tumour behave differently and respond to stress or therapy.
These are insights that were difficult, if not impossible, to achieve with earlier tools—bringing us closer to truly understanding cancer at a systems level.
While AI holds tremendous promise in advancing cancer understanding, it also faces several significant challenges. First, cancer is an incredibly complex and dynamic disease. Tumours evolve over time, differ across patients, and interact with their surrounding environment in ways that are hard to capture in static datasets. AI models are only as good as the data they are trained on—and much of today’s biomedical data is incomplete, biased, or collected under different standards. This can lead to models that work well in one setting but fail in others. Another major hurdle is interpretability: AI systems can make accurate predictions but often cannot explain why, which makes it difficult for clinicians to trust or act on their outputs. Additionally, many AI models are developed using retrospective datasets, which may not fully reflect real-world clinical complexity. Finally, integrating AI tools into existing clinical workflows requires not only technical compatibility but also regulatory approval, clinician education and patient trust.
Sam Santhosh
Entrepreneur and Author
Founder SciGenom Labs
Incubator - Medgenome Labs