Activeloop raises $11 million to give enterprises a better way to harness multimodal data for artificial intelligence

California-based Activeloop, a startup offering a dedicated database for streamlining AI projects, today announced that it has raised $11 million in Series A funding from Streamlined Ventures, Y Combinator, Samsung Next (the startup acceleration arm of the Samsung Group) and several other investors.

While several data platforms exist, Activeloop, founded by Princeton dropout Davit Buniatyan, has carved a niche for itself with a system that tackles one of the biggest challenges facing enterprises today: using unstructured multimodal data to train artificial intelligence models. The company says this technology, dubbed “Deep Lake,” enables teams to build AI applications at costs up to 75% lower than market offerings, while increasing the productivity of engineering teams by up to five times.

The work matters as more enterprises look for ways to leverage their complex data sets for AI applications across a variety of use cases. According to McKinsey research, generative AI could add between $2.6 trillion and $4.4 trillion annually to the global economy, with significant impact across dozens of areas, including supporting customer interactions, generating creative content for marketing and sales, and writing software code from natural-language prompts.

How does Activeloop Deep Lake help?

Today, training high-performing foundation AI models involves working with petabyte-scale unstructured data spanning modalities such as text, audio and video. The task typically forces teams to locate relevant datasets scattered across disorganized silos and juggle a variety of storage and retrieval technologies on the fly, an effort that demands considerable boilerplate coding and integration work from engineers and can drive up project costs.

Activeloop replaces this fragmented approach with Deep Lake, which stores complex data (including images, videos and annotations) as mathematical representations (tensors) native to machine learning (ML), and makes it easy to stream those tensors to its SQL-like Tensor Query Language, an in-browser visualization engine, or deep learning frameworks such as PyTorch and TensorFlow.
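For illustration, here is a minimal sketch of what creating such a tensor-based dataset can look like with the open-source deeplake Python package; the bucket path, tensor names and sample file are hypothetical, and the exact API can vary between versions:

```python
# Hedged sketch using the open-source `deeplake` package (v3-era API);
# the path, tensor names and sample file below are hypothetical examples.
import deeplake

# Create a dataset whose columns are ML-native tensors, not rows or blobs.
ds = deeplake.empty("s3://my-bucket/wildlife-dataset")

with ds:
    ds.create_tensor("images", htype="image", sample_compression="jpeg")
    ds.create_tensor("labels", htype="class_label")

    # Append one sample; the image is read lazily and stored as a tensor.
    ds.images.append(deeplake.read("elephant.jpg"))
    ds.labels.append("elephant")
```

The same dataset can then be queried, visualized in the browser, or streamed directly into a training framework.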

This gives developers a single platform for everything from filtering and searching multimodal data to tracking and comparing its versions over time and streaming it to train models targeting different use cases.

[Screenshot: Searching for elephants in Activeloop Deep Lake]

In an interview with VentureBeat, Buniatyan said that Deep Lake offers all the benefits of a regular data lake (such as ingesting multimodal data from silos), but what makes it stand out is that it converts all of that data into the tensor format that deep learning algorithms expect as input.

The tensors are stored in cloud object storage, such as AWS S3, or in on-premises storage, and then seamlessly streamed to graphics processing units (GPUs) for training, delivering just enough data to keep the compute fully utilized. Previous approaches to handling large datasets required copying the data over in batches, leaving GPUs idle while they waited.
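As a rough illustration of that streaming path, the open-source package exposes a PyTorch-style data loader; the snippet below is a sketch only (the dataset path is hypothetical, and exact method signatures can vary by version):

```python
# Hedged sketch: stream batches from object storage straight into a PyTorch
# training loop, so the GPU is fed while it computes. Path is hypothetical.
import deeplake
import torch

ds = deeplake.load("s3://my-bucket/wildlife-dataset")

# ds.pytorch() wraps the dataset in a streaming DataLoader; no bulk download.
loader = ds.pytorch(batch_size=32, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
for batch in loader:
    images = batch["images"].to(device)  # tensors arrive on demand
    labels = batch["labels"].to(device)
    # ... forward/backward pass here ...
```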

Buniatyan said he started working on Activeloop and its technology in 2018, when he faced the challenge of storing and preprocessing thousands of high-resolution mouse brain scans at the Princeton Neuroscience Lab. Since then, the company has developed core database functionality in two main categories: open source and proprietary.

“The open source aspect includes, but is not limited to, the dataset format, version control, and a wide range of APIs for streaming and querying. On the other hand, the proprietary segment includes advanced visualization tools, knowledge mining and a powerful streaming engine, which together increase the overall functionality and attractiveness of the product,” he told VentureBeat.
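To make the open-source side concrete, here is a hedged sketch of what the version-control and query APIs can look like; the path and label value are hypothetical, and Tensor Query Language support has varied across versions and tiers:

```python
# Hedged sketch of Git-style dataset versioning and a TQL-style query;
# the path and label value are hypothetical, and TQL availability varies.
import deeplake

ds = deeplake.load("s3://my-bucket/wildlife-dataset")

# Commit the current state, then branch off for a relabeling experiment.
commit_id = ds.commit("Initial labeled snapshot")
ds.checkout("relabeling-experiment", create=True)

# Filter samples with a SQL-like Tensor Query Language expression.
elephants = ds.query("SELECT * WHERE labels == 'elephant'")
print(len(elephants), "matching samples")
```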

While the CEO did not share the exact number of customers Activeloop works with, he noted that the open source project has been downloaded over a million times to date, strengthening the company’s presence in the enterprise segment. Currently, the enterprise offering features a usage-based pricing model and is used by Fortune 500 companies in highly regulated industries, including biopharma, life sciences, medical technology, automotive and legal.

One customer, Bayer Radiology, used Deep Lake to unify several data modalities into a single storage solution, cutting data preprocessing time and enabling a new “X-ray chat” feature that lets data analysts query scans in natural language.

“Activeloop’s knowledge search functionality has been optimized to help data teams create solutions at costs up to 75% lower than anything else on the market, while significantly increasing search accuracy, which is important in the industries served by Activeloop,” the founder added.

The plan for growth

With this round of funding, Activeloop plans to expand its enterprise offerings and attract more customers to its AI database, enabling them to organize complex, unstructured data and easily search for knowledge.

The company also plans to use the funds to expand its engineering team.

“A key development in the pipeline is the upcoming release of Deep Lake v4, which includes faster concurrent I/O, the fastest streaming data loader for training models, fully reproducible data lineage and integration of external data sources,” Buniatyan noted, adding that while multiple companies operate in this space, “there are no direct competitors.”

Ultimately, he hopes, the technology will save enterprises from spending millions on in-house data organization and retrieval solutions, while sparing engineers from manual work and boilerplate coding, making them more productive.
