Who am I?

Hi! My name is Flavius, and I've been working as a machine learning engineer for almost 7 years now. I've worked on various computer vision tasks, ranging from object detection, back when YOLO wasn't the coolest kid on the block, to encoding images and signals in the same embedding space using multi-modal models.

Last year, together with my partner, I built and grew Audiolizer to over 1k signups. This year I've decided to share my experience with both Audiolizer and my research career, and writing a blog seemed like the best option. We aim to touch on a variety of technical subjects here, hopefully on a weekly basis, while also releasing personal posts and updates from time to time.

Why Audiolizer?

The AI industry evolves at a crazy pace, and I always felt like I was behind.

Every week brings a new state-of-the-art paper, and I just don't find it humanly possible to keep up while also having to code and maintain an active life in general. I also have a very bad character trait: I constantly compare myself with my peers, and everyone seems able to put in more effort and ingest more information than I can.

This frustration slowly built up, and I had to find a solution that would make me feel like I was doing more. I figured I didn't have enough research ideas because I never had enough time to read and understand new papers, yet I always had plenty of dead time while washing dishes or driving to work. And that's how the idea for Audiolizer was born.

What is Audiolizer?

Audiolizer is a platform where you can convert any research work, from papers to journal articles, into audio content. I quickly understood that simple text-to-speech would never work, because audio content requires far more focus to be understood.

Because they are meant to be read, papers contain a ton of visual information that can't be translated into audio without losing content. Audiolizer solves this by first sending the paper to GPT-4o. The LLM reads the paper, including its tables and figures, and breaks it down into a few main sections explained in natural language, aiming to preserve as much of the content as possible once it is translated into audio.
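For the technically curious, here's a minimal sketch of what such a pipeline could look like using the OpenAI Python SDK. This is an illustration, not Audiolizer's actual implementation: the function names, prompt wording, and model and voice choices are all my assumptions, and a real pipeline would also need to pass the paper's figures along as images.

```python
# A minimal sketch of a paper-to-audio pipeline (not Audiolizer's actual code).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def summarize_paper(paper_text: str) -> str:
    """Ask the LLM to rewrite a paper into audio-friendly sections."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Break this research paper into a few main sections, "
                    "explained in natural language. Describe any tables and "
                    "figures in words so nothing is lost in audio form."
                ),
            },
            {"role": "user", "content": paper_text},
        ],
    )
    return response.choices[0].message.content

def to_speech(script: str, out_path: str = "paper.mp3") -> None:
    """Turn the narration script into an audio file with a TTS model."""
    # A real system would chunk long scripts: TTS inputs are length-limited.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=script,
    )
    speech.write_to_file(out_path)
```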

One of the features that has helped me the most in my ML journey is that the LLM's interpretation can be tuned to a knowledge level of your choice. This means that if you want to jump into a new field and don't understand the jargon yet, instead of being lost for the first few weeks you can simply select the student knowledge level on Audiolizer, and domain-specific terms will be explained in an understandable manner so you can grasp concepts as easily as possible.
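One simple way to implement this kind of tuning is to swap the system prompt based on the selected level, as in the sketch below. The level names and prompt wording here are hypothetical, not Audiolizer's actual settings.

```python
# Hypothetical illustration of knowledge-level tuning via the system prompt.
# Level names and wording are made up, not Audiolizer's real prompts.
LEVEL_PROMPTS = {
    "student": (
        "Explain the paper for a newcomer: define every domain-specific "
        "term the first time it appears, and prefer plain language."
    ),
    "practitioner": (
        "Explain the paper for a working ML engineer: keep standard "
        "terminology and focus on methods and results."
    ),
    "expert": (
        "Summarize concisely for a researcher in the field: skip the "
        "background and highlight what is novel."
    ),
}

def build_system_prompt(level: str) -> str:
    # Fall back to the most accessible level if the key is unknown.
    return LEVEL_PROMPTS.get(level, LEVEL_PROMPTS["student"])
```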

Since we developed this platform, I've been able to go through at least 3 papers per day. That's not much, but it means I now read 10x the papers I used to read in a year, and it helped me land a senior ML position, coming from a mid-level one.

What's next for audio papers?

Audio notes. Jokes aside, our goal is to build an all-in-one research helper: a place where you can store all your papers, annotate them, and take audio and written notes about their content. Beyond that, we aim to see this product evolve into a research network, where you can share your notes and ideas about papers, and listen to (or read about) what other people are studying.

But until that point, here is what we're working on:

Improved interpretation
Better voice selection
Podcast abilities
UI improvements

We're committed to the continuous improvement of this platform that we've built.