A new company named Vana aims to compensate users for providing training data.
In the era of generative AI, data is incredibly valuable. So, why shouldn’t you be able to sell your own data?
From major tech companies to startups, AI developers are buying e-books, images, videos, audio, and more from data brokers to train their AI products. However, many individual creators and owners of this data aren’t getting paid for it. Vana, a startup founded by Anna Kazlauskas and Art Abal in 2021, aims to change that.
Anna Kazlauskas studied computer science and economics at MIT and previously launched a fintech automation startup called Iambiq. Art Abal, on the other hand, worked as a corporate lawyer before joining Vana. Together, they created a platform that allows users to combine their data, such as chats, speech recordings, and photos, into datasets for training generative AI models. They also want to personalize experiences, like sending daily motivational voicemails based on wellness goals or creating an art-generating app that understands individual style preferences.
According to Kazlauskas, Vana’s infrastructure creates a user-owned data treasury by letting users aggregate their personal data in a secure way. This allows users to own AI models and use their data across various AI applications.
This is how Vana presents its platform and API to developers:
“The Vana API links a user’s personal data from different platforms. This allows you to customize your app for each user. Your app can quickly access a user’s customized AI model or data, making it easier to get started and removing worries about computing costs. We believe users should have the freedom to bring their data from closed platforms like Instagram, Facebook, and Google to your app. This lets you create incredible personalized experiences right from the start.”
Signing up for Vana is easy. Once you verify your email, you can personalize your account by adding data like photos, a brief about yourself, and voice recordings. Then, you can browse through various apps made with Vana’s platform and data. These apps include things like chatbots, interactive stories, and even tools to help create profiles for dating apps like Hinge.
You might wonder why anyone would willingly give their personal information to a startup, especially in a time when people are more concerned about privacy and cyberattacks. After all, Vana, backed by $20 million from investors like Paradigm and Polychain Capital, is still a profit-driven company. Can we really trust them not to misuse or mishandle the data they collect?
In response to concerns about data privacy, Kazlauskas emphasizes that Vana aims to empower users to control their own data. Users have the option to store their data on their own servers instead of Vana’s and decide how it’s shared with apps and developers. Vana operates on a subscription model, starting at $3.99 per month, and charges developers a fee for using data sets for AI training. Because its revenue comes from these fees rather than from selling the data itself, Kazlauskas argues, the company has less incentive to exploit users’ data.
“We aim to create user-owned models where everyone contributes their data,” Kazlauskas explained, “and users can take their data and models with them to any app.”
Although Vana claims it doesn’t sell users’ data for AI training, it enables users to do so themselves, beginning with their Reddit posts.
Vana recently introduced the Reddit Data DAO (decentralized autonomous organization), which pools users’ Reddit data, such as karma and post history. Members of the DAO collectively decide how to use this data. By joining with their Reddit account, requesting their data from Reddit, and uploading it to the DAO, users earn the right to vote on decisions such as whether to license the pooled data to AI companies and share the profits.
Vana’s Reddit Data DAO is a response to Reddit’s decision to sell data from its platform to companies. Reddit used to allow access to its posts and communities for AI training without restrictions, but it changed its policy last year, earning millions in licensing fees. Vana’s DAO aims to free user data from platforms like Reddit that profit from it. However, Reddit is not supportive of this initiative and has banned Vana’s subreddit discussing the DAO. Reddit claims that Vana is exploiting its data export system, which is designed to comply with privacy laws.
Vana’s DAO has over 141,000 members, but it’s still a small fraction of Reddit’s user base. There are challenges in fairly distributing payments from data buyers, as the current system rewards users based on their Reddit karma, which may not accurately reflect their contributions. Vana suggests that users share more data to make the DAO more valuable, but this requires trusting Vana with sensitive information.
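To see why karma-based rewards can look lopsided, here is a minimal sketch of a payout split in proportion to karma. The article only says rewards are "based on" karma, so the proportional rule, the function name `split_payout`, the member names, and all the numbers below are assumptions made for illustration, not Vana's actual formula.

```python
def split_payout(total, karma_by_user):
    """Split `total` in proportion to each member's karma.

    Hypothetical rule for illustration only; the DAO's real formula
    is not described in the article.
    """
    total_karma = sum(karma_by_user.values())
    if total_karma <= 0:
        raise ValueError("total karma must be positive")
    return {
        user: total * karma / total_karma
        for user, karma in karma_by_user.items()
    }


# Made-up members and a made-up $1,000 licensing payment.
members = {"alice": 9000, "bob": 900, "carol": 100}
shares = split_payout(1000.0, members)

for user, amount in shares.items():
    print(f"{user}: ${amount:.2f}")
```

Under this assumed rule, a member holding 90% of the pooled karma collects 90% of the payment, no matter how much data each member actually uploaded.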
Given these obstacles, it’s uncertain whether Vana’s DAO will gain widespread adoption. However, other startups and companies are also exploring ways to give users control over their data. There’s no perfect solution yet, but efforts to address concerns around data privacy and AI training are ongoing.