About me

Current. I'm a staff data scientist at Meta, where I work on Content Understanding using Vision Language Models (VLMs). I focus on evals for multimodal models, where I focus on making state of the art models that can understand videos and images. I also worked in MSL on the router for Meta's LLMs, which is responsible for routing user queries to the best model and ensuring that the models have the necessary context to answer user queries.

About. I'm a data scientist and machine learning engineer. I've built machine learning systems to solve problems in robotics (Amazon Robotics), social media (Meta/Instagram), health care and finance (Pearl Health, Cooper Health). I've been both a manager and an individual contributor. See RESUME for more details.

Investor. I'm an LP investor in software, AI, and robotics.

Founder & Builder. I've also founded and developed several online software companies. See PORTFOLIO for more details.

Research. I earned my PhD from UPenn, where I was a member of the Network Dynamics Group. I studied the complex dynamics of large social systems, such as the spread of health behavior, the emergence of social consensus, and the collective problem-solving ability of teams of researchers. My research has been published in Science, Proceedings of the National Academy of Sciences, PLOS ONE, and elsewhere. See PUBLICATIONS for more details.

Blog

Does Better Embedding Quality Actually Improve Bayesian Optimization with LLMs?

2026-04-22

A systematic investigation into BOPRO for molecular optimization

AI Therapy: What the First RCT of a Generative‑AI Therapist Really Tells Us

2025-05-04

Therabot’s NEJM AI RCT shows that an open‑weight LLM, fine‑tuned on rigorously curated therapy transcripts and guarded by strict safety layers, yields a moderate (~d 0.6) short‑term improvement over wait‑list controls—impressive for generative‑AI healthcare, but likely an upper‑bound effect size given design inflation.

How to train your Llama – and ChatGPT-3.5 – for a retrieval augmented generation (RAG) task.

2023-09-12

How to fine-tune Llama-2 and ChatGPT-3.5 for a retrieval augmented generation (RAG) task

Produce a vivid and high-resolution illustration of a llama inspired by the art and design aesthetics of Disney's 'How to Train Your Dragon'. This llama should appear majestic, wise, and powerful, much like the dragons from the film, yet retain recognizable traits of a real-life llama. The scene should have a mystical ambiance, with the llama possibly standing on a scenic mountaintop or amidst drifting mists. Overlay this with the title 'How to Train and Host your Llama' in a font that complements the whimsical and adventurous vibe of the image.

Causal Discovery

2020-09-01

Causal discovery is the process of inferring the causal structure of a closed system using observational data.

This post outlines how causal discovery is possible using time series data, and explores some novel techniques developed by Runge and colleagues in a series of papers in Nature Communications, and Science Advances. The team has also created a software package, Tigramite, that implements these methods.

causes of smoking

Collective Intelligence

2017-06-12

Our new publication on the wisdom of crowds has come out in Proceedings of the National Academy of Sciences here.

Engineering Behavior Change Through Social Media

2015-06-10

Our research on a social media intervention to increase health in a fitness program was presented in Helskini, Finland during the IC2S2 conference.