I'm Eric

I work on AI with applications in vision, language and biology

About

I'm a PhD student at Stanford in the Bioengineering department. I'm advised by Steve Baccus in neurobiology and Chris Ré in computer science. I'm part of the Baccus lab and Hazy Research.

Updates

02/27/24: Our preprint on Evo is announced! We try to answer whether "DNA is all you need" for a biological foundation model.

01/15/24: Our ICLR '24 paper on FlashFFTConv has been accepted! Excited to visit Vienna, Austria, never been!

06/28/23: I'm very excited to share HyenaDNA, a long-range foundation model for DNA! (update, accepted at NeurIPS '23 as a *spotlight* !) arxiv, blog, colab, github, checkpoints, YouTube talk

04/24/23: Our ICML '23 paper on Hyena has been accepted! And as an oral presentation, that's a first! Very fortunate to be a part of it.

03/07/23: Excited to share our work on Hyena, an alternative to attention that can learn on sequences *10x longer* and is up to *100x faster* than optimized attention, using implicit long convolutions & gating! arxiv, code, blog

09/14/22: Our NeurIPS '22 paper on S4ND has been accepted!!! I am SO excited! We extend work on S4 to multidimensional continuous signals like images and video.

06/21/22: I started my 2nd internship with Google Research on the Machine Intelligence and Image Understanding team, working on text-guided image generation!

05/19/22: We submitted our paper on S4ND to NeurIPS 2022, an extension of S4 to multidimensional signals for modeling images and video! Fingers crossed...!

12/01/21: I officially joined Steve Baccus' lab in neurobiology as my thesis lab! Steve studies the visual system in humans and animals. I'll also be co-advised by Chris Ré in computer science! I'm excited to fuse neuroscience and AI!

07/22/21: My paper, OSCAR-Net, just got accepted into ICCV 2021! So excited for my first paper...!

06/15/21: I started my internship at Google Research! I'll be working on multimodal generation (for joint video and audio)!

01/05/21: I started a (joint) lab rotation with Chris Ré and Fei-Fei Li, continuing work on CPR quality prediction in videos.

12/01/20: I'll be joining Fei-Fei Li's lab in January for a rotation! I'll be in the Partnership in AI-Assisted Care - think smart hospitals with computer vision.

11/16/20: I submitted my first paper to CVPR 2021 on my work at Adobe, and we're patenting the algorithm! (Update - got my first conference rejection!)

09/21/20: I started my first lab rotation with Leo Guibas in computer science. I'll be working on detecting walkable floor space for an assistive-robotic suit.

Sequence modeling and design from molecular to genome scale with Evo

We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context length of 131 kilobases (kb) at single-nucleotide, byte resolution. Trained on whole prokaryotic genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods. Advances in multi-modal and multi-scale learning with Evo provide a promising path toward improving our understanding and control of biology across multiple levels of complexity.

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution (NeurIPS '23, Spotlight)

HyenaDNA is a long-range genomic foundation model pretrained on the human genome with context lengths of up to 1 million tokens at single nucleotide resolution. It uses a simple stack of Hyena operators - a new layer based on implicit convolutions that's been shown to match attention in quality on natural language at lower time complexity. HyenaDNA enables sequences 500x longer and trains 160x faster than previous Transformer-based genomic models. HyenaDNA achieves state-of-the-art results on 23 (of 28) genomic downstream tasks, and explores the first use of in-context learning in genomics. Our entire codebase is public! Go biology :)
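
For the unfamiliar: single nucleotide resolution just means character-level tokenization, one token per base. Here's a minimal sketch of the idea (the vocabulary and unknown-base handling below are my own simplifications, not HyenaDNA's actual tokenizer, which lives in the repo):

```python
# A minimal sketch of single-nucleotide (character-level) tokenization.
# Illustrative only: this vocab is an assumption, not HyenaDNA's tokenizer.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}  # N = ambiguous/unknown base

def tokenize(seq: str) -> list[int]:
    """Map each nucleotide to an integer id, one token per base."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]

print(tokenize("acgtn"))  # [0, 1, 2, 3, 4]
```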

Hyena Hierarchy: Towards Larger Convolutional Language Models (ICML '23, Oral Presentation)

Hyena is a drop-in replacement for attention built from long convolutions & gating: it can in-context learn on sequences 10x longer, is up to 100x faster than optimized attention, and scales subquadratically in sequence length. This lets us match attention quality in large language models while training 100x faster at 64k tokens, and train on sequences up to 131k tokens long. Just like attention, Hyena can be used in a Vision Transformer, matching attention on ImageNet-1k and demonstrating its potential as a general deep learning operator.
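
To make "long convolutions & gating" concrete, here's a toy sketch of the core computational pattern: a global (input-length) convolution computed with FFTs, followed by an elementwise gate. This is a simplification, not the actual Hyena operator - Hyena parameterizes its filters implicitly with a small network and stacks several of these stages (see the code link above):

```python
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of u (batch, length) with a kernel k (length,)
    as long as the input, in O(L log L) via FFT instead of O(L^2)."""
    L = u.shape[-1]
    # Zero-pad to 2L so circular convolution matches linear convolution.
    u_f = torch.fft.rfft(u, n=2 * L)
    k_f = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L]

L = 1024
u = torch.randn(2, L)                    # "value" branch
gate = torch.sigmoid(torch.randn(2, L))  # elementwise gating branch
k = torch.randn(L) / L                   # explicit kernel stand-in; Hyena
                                         # generates this implicitly
y = gate * fft_long_conv(u, k)           # gated long convolution
```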

S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces (NeurIPS 2022)

The visual world is made up of multidimensional and naturally continuous signals, but state-of-the-art computer vision models, e.g., Transformers and CNNs, use discrete pixel representations. We present S4ND, a new deep learning layer for computer vision that learns continuous-signal representations of images and videos by extending work on S4 (Gu et al.) to multiple dimensions. We show S4ND can boost or maintain the performance of canonical vision architectures by replacing standard 2D/3D convolutions and self-attention with a continuous and global convolutional kernel.
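
One way to build intuition for a "continuous and global convolutional kernel": instead of storing a fixed grid of weights, parameterize the kernel as a function of continuous coordinates and sample it at whatever resolution the input arrives in. The sketch below uses a small coordinate MLP as a stand-in; S4ND actually derives its kernel from state-space (S4) dynamics, so treat this purely as an illustration of the resolution-independence idea:

```python
import torch
import torch.nn as nn

class ContinuousKernel2d(nn.Module):
    """A kernel defined as a function over (y, x) in [-1, 1]^2, so it can
    be sampled at any spatial resolution. A coordinate-MLP stand-in for
    illustration; S4ND's kernel comes from state-space dynamics instead."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.GELU(), nn.Linear(hidden, 1)
        )

    def sample(self, h: int, w: int) -> torch.Tensor:
        ys = torch.linspace(-1, 1, h)
        xs = torch.linspace(-1, 1, w)
        coords = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
        return self.net(coords).squeeze(-1)  # (h, w) kernel values

kernel = ContinuousKernel2d()
k_low = kernel.sample(16, 16)     # same underlying kernel...
k_high = kernel.sample(128, 128)  # ...sampled at a higher resolution
```

The sampled kernel could then be applied as a global convolution (e.g., with FFTs, as in the Hyena sketch above), in place of a standard fixed-size 2D convolution.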

OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution (ICCV '21)

Our work to help fight fake news was accepted at ICCV 2021! We developed OSCAR-Net, an algorithm that creates a unique image fingerprint which can tell whether an image has been manipulated (Photoshopped). Given any image in the wild, the algorithm builds a scene graph and uses graph neural networks and transformer encoders to learn an embedding of object visual features and their spatial relationships. We reached state-of-the-art on the PSBattles dataset.

Chest compression quality prediction from "in-the-wild" videos using self-supervised learning

Working with the Partnership in AI-Assisted Care in Fei-Fei Li's lab, we talked to clinicians to identify opportunities for AI to help patients in high-impact settings. The quality of chest compressions during CPR can vary widely, leading to unnecessary deaths. We trained a vision model to detect and rate the quality of chest compressions in order to provide real-time feedback. Our model can predict the rate of compression and detect when too much time has passed between compressions, both key clinical metrics correlated with survival rates. We collected and annotated raw CPR videos from YouTube, and used self-supervision to increase the data signal, in particular for measuring the rate of compression.
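
As a toy illustration of the rate-estimation piece (hypothetical, not our actual model): once you have a 1D signal of vertical hand position over time, the compression rate is just the dominant frequency of that signal.

```python
import numpy as np

def compressions_per_minute(y: np.ndarray, fps: float) -> float:
    """Estimate compression rate from a 1D vertical-motion signal as the
    dominant frequency of its spectrum. A hypothetical sketch, not our model."""
    y = y - y.mean()                              # drop the DC component
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)  # bin frequencies in Hz
    return freqs[spectrum.argmax()] * 60.0        # Hz -> per minute

# A 2 Hz oscillation at 30 fps reads ~120/min, at the top of the
# commonly recommended 100-120 compressions-per-minute range.
t = np.arange(0, 10, 1 / 30)
print(compressions_per_minute(np.sin(2 * np.pi * 2.0 * t), fps=30))
```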

Dataset for egocentric walkable floor space detection

I rotated with Leo Guibas' Geometric Computing Lab, working on the multi-camera perception system for an exoskeleton suit that helps people walk and avoid obstacles. I created a dataset for egocentric walkable floor space detection using sparse footprint signals. Here's a sample of the depth video we collected.

Stool Classification Using Deep Metric Learning

I researched human stool classification using deep metric learning at Cornell Tech. You can read about our methods in the paper here. My team crowdsourced a dataset and had three physicians annotate images of human stool. We then tried several architectures and techniques in deep learning, and ultimately we were able to predict a key clinical metric, the Bristol Stool Scale, with near doctor-level accuracy.
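
For readers new to deep metric learning: instead of classifying images directly, you learn an embedding space where images with similar labels sit close together. A minimal triplet-loss sketch of the idea (illustrative; the architectures and losses we actually used are in the paper):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull the anchor toward a same-class positive and push it away
    from a different-class negative in embedding space."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Embeddings would come from a CNN backbone; random stand-ins here.
a, p, n = (F.normalize(torch.randn(8, 128), dim=1) for _ in range(3))
print(triplet_loss(a, p, n))
```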

I worked on a research collaboration between UC Berkeley and the Tokyo Institute of Technology on seismically resistant bridge columns. I built a finite-element model to predict the stress-strain relationship and failure modes of a novel interlocking steel reinforcement column. My work was published in the 2014 Berkeley McNair Scholar Journal.

Auggi AI

Auggi is a startup out of Cornell that's passionate about gut health. I worked on the technology that lets you take a picture of your stool on your phone and automatically characterize it and extract clinical data from the image.


TVision

TVision is a startup in NYC disrupting the TV analytics industry. I worked on a prototype to analyze how people watched TV in their living rooms as part of opt-in home studies. I used face detection and head pose algorithms to predict a person's level of attention second by second while watching shows or ads.

Here's one of the prototypes I built for TVision in the summer of 2018.
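
A toy version of the attention heuristic might look like the sketch below (the thresholds are hypothetical, not TVision's actual values; the real prototype combined face detection, head pose, and per-second aggregation):

```python
def is_attentive(yaw_deg: float, pitch_deg: float,
                 max_yaw: float = 30.0, max_pitch: float = 20.0) -> bool:
    """Treat a viewer as watching the screen when their head pose is
    roughly frontal. Thresholds are illustrative, not TVision's."""
    return abs(yaw_deg) <= max_yaw and abs(pitch_deg) <= max_pitch

print(is_attentive(yaw_deg=10.0, pitch_deg=-5.0))  # True: facing the TV
print(is_attentive(yaw_deg=75.0, pitch_deg=0.0))   # False: looking away
```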

AirCam

With AirCam, you can ask "hey Google, where is the TV remote?" and it will verbally tell you where it is. AirCam is a system that gives smart speakers "eyes" using an overhead camera. On voice command, it will turn on a camera and find everyday objects you lose around your living room, including your phone and keys.

AirCam was an applied research project with Professor Serge Belongie at Cornell and was demoed at Cornell Tech's Open Studio in December 2018.

DrowsyCam

DrowsyCam makes driving safer by monitoring your sleepiness level. When your eyes are closed for too long, it will send an alert message to wake you up. It uses keypoint detection around your eyes and calculates the ratio of the eye's height to its width, distinguishing between open and closed eyes.
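
This height-to-width ratio is the classic eye aspect ratio (EAR). A minimal sketch, assuming six dlib-style landmarks per eye (the exact landmarks and threshold on our device may have differed):

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks around one eye, ordered dlib-style.
    The ratio drops toward 0 as the eye closes."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    height = dist(eye[1], eye[5]) + dist(eye[2], eye[4])
    width = dist(eye[0], eye[3])
    return height / (2.0 * width)

EAR_CLOSED = 0.2  # illustrative threshold: below this for several
                  # consecutive frames -> eyes closed, trigger the alert
```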

Our team of 3 engineers at Cornell built a standalone IoT device that runs on a Raspberry Pi and was demoed in April 2018.

Education

Stanford University, PhD Bioengineering + AI (2020-)

AI in vision, language and biology

Cornell University, MEng Computer Science (2019)

Computer vision and machine learning

Stanford University, MS Civil Engineering (2009)

Sustainable construction and design

UC Berkeley, BS Civil Engineering (2007)

Earthquake engineering

Work Experience

Google Research (Jun 2022 - Sep 2022)

Research Intern

Multimodal image inversion & generation.

Google Research (Jun 2021 - Nov 2021)

Research Intern

Multimodal content generation (simultaneous video + audio).

Adobe Research (Jun 2020 - Dec 2020)

Deep Learning Research Intern

Detecting authentic and tampered images for journalists using deep learning on the Content Authenticity Initiative. Using scene graphs and graph neural networks to model discrepancies in objects and their spatial relationships in images.

Facebook AI (Jun 2019 - Apr 2020)

Computer Vision Researcher

Image tampering and fake ID detection using deep learning and image forensics. Self-supervision, anomaly detection, metric learning, representation learning.

TVision

Computer Vision Engineer Intern

A startup measuring at-home TV viewership and attention of shows and ads using computer vision in the living room

Power Advocate

Energy Consultant Manager

Strategic sourcing consultant for energy companies

Aspen Environmental Group

Energy Analyst

Energy policy analyst for regulatory agencies in California

Calera Corporation

Lab Engineer

A startup developing green cement products from power plant flue gas CO2 and seawater

Curtins Consulting

Structural Engineer Intern

Structural analysis of buildings in the London area

University of Tokyo

Earthquake Engineer Research Intern

Researching earthquake-resistant bridge designs in Tokyo

Pacific Earthquake Engineering Research Center

Structural Engineer Research Intern

Testing new earthquake-resistant concrete materials