CEO of writer.com May Habib attends the Harper’s Bazaar At Work Summit, in partnership with Porsche and One&Only One Za’abeel, at Raffles London at The OWO on November 21, 2023 in London, England.
Dave Benett | Getty Images
San Francisco-based AI startup Writer debuted a large artificial intelligence model on Wednesday to compete with enterprise offerings from OpenAI, Anthropic and others. But, unlike some of those competitors, it doesn’t need to spend as much to train its AI.
The company told CNBC it spent about $700,000 to train its latest model, including the data and GPUs, compared to the millions of dollars competing startups spend to build their own models. Its strategy has caught the attention of investors.
Writer is raising up to $200 million at a $1.9 billion valuation, according to a source familiar with the situation who spoke with CNBC. That’s nearly quadruple the company’s valuation last September, when it raised $100 million at a valuation of more than $500 million.
The company cuts costs by using synthetic data, or data created by AI. Synthetic data is designed to mimic the real-world information that’s usually fed into models without compromising privacy, and it is becoming a more popular method for training.
A study by AI researchers, revised in June, found that if current AI development trends continue, tech companies will “fully exhaust” the publicly available training data between 2026 and 2032. The researchers wrote that “human-generated public text data cannot sustain scaling beyond this decade.”
Amazon has used synthetic data in training Alexa, Meta has used it to fine-tune its Llama models and Microsoft-backed OpenAI is incorporating it into its models, according to job descriptions posted by the company. Some experts, however, have warned that synthetic data should be used cautiously, as it has the potential to degrade model performance and exacerbate existing biases.
Waseem Alshikh, Writer’s co-founder and CTO, told CNBC that Writer has been working on its synthetic data pipeline for years.
“There’s some confusion in the industry about the definition of ‘synthetic’ data,” Alshikh said. “To be clear, we don’t train our models on fake or hallucination data, and we don’t use a model to generate random data… We take real, factual data and convert it to synthetic data that is specifically structured in a clearer and cleaner way for model training.”
The company’s generative AI allows corporate clients to use its large language models (LLMs) to generate human-sounding text for anything from LinkedIn posts to job descriptions to mission statements, to analyze and summarize data or text, and to build custom AI applications for market analysis and more. The company has more than 250 enterprise customers, including Accenture, Uber, Salesforce, L’Oreal and Vanguard, which use the tech across functions such as support, IT, operations, sales and marketing.
The generative AI market is poised to top $1 trillion in revenue within a decade. To date in 2024, investors have pumped $26.8 billion into 498 generative AI deals, according to PitchBook, and companies in the sector raised $25.9 billion in 2023, up more than 200% from 2022.