I am a clinical epidemiologist and bioinformatician by background. After completing my MBBS and MD in Internal Medicine at Christian Medical College, Vellore, India, I completed an MPhil in Epidemiology and Biostatistics at the University of Cambridge in 2010, followed by a PhD (2013) examining genetic factors associated with disease in genetically diverse populations. My subsequent work as a post-doctoral fellow at the Wellcome Sanger Institute focused on population history and historical migration across Africa (as co-lead of the African Genome Variation Project). As a senior staff scientist at the Wellcome Sanger Institute, I co-led the Uganda Genome Resource Project (Gurdasani et al., Cell, 2019), studying genetic determinants of disease across ethnically diverse populations. As a Senior Lecturer in Machine Learning at Queen Mary University of London, and Turing Fellow, I used machine learning approaches to predict outcomes from health and genomic data. I work closely with Genomics England and co-lead the GeCIP project on predicting loss-of-function effects on health and disease within Genomics England. My recent work has focused on using machine learning and deep learning approaches in health to cluster patient trajectories and improve prediction of outcomes (Alaa, Gurdasani et al., Nature Machine Intelligence). My research interests range from developing new NLP and machine learning methods for data mining and phenotype clustering (e.g. in Long COVID) to developing new pipelines for drug discovery using large-scale multi-dimensional data.
My current work focuses on using Large Language Models (LLMs) to detect early warning signals for epidemics in vast amounts of open-source data (EPIWATCH, https://www.epiwatch.org/). I work within the EPIWATCH team (Biosecurity Programme led by Prof Raina MacIntyre), leading the NLP, epidemiology research, and software development teams within the Programme. Our work focuses on using state-of-the-art LLMs (e.g. GPT-3.5, LLaMA, Orca, Platypus) for text mining, classification, named entity recognition, relation extraction, and information retrieval within the domain of public health. A key focus of the team is developing novel LLMs for public health through supervised training on the EPIWATCH database of curated, labelled data from epidemic surveillance, which has been operational since 2016.
I hold an honorary Senior Lecturer position in Machine Learning at Queen Mary University of London, UK.