AI Briefing: Writer’s CTO on how to make AI models think more creatively

by News7

When training data is similar across major large language models, finding ways to make them more creative and more differentiated is increasingly important. That reality has more enterprise customers asking for ways to make AI more creative when generating content — and to help with the actual process of thinking creatively.

Last month, the AI startup Writer released a new LLM called Palmyra Creative that aims to help enterprise businesses squeeze more creativity out of generative AI. The goal isn’t just to help with outputs; it’s also to help companies use AI in more creative ways. Palmyra Creative follows other domain-specific LLMs released by Writer, such as the healthcare-focused Palmyra Med and the finance-focused Palmyra Fin. (Writer’s customers using various models include Qualcomm, Vanguard, Salesforce, Kenvue, Uber and Dropbox.)

In terms of creative thinking, AI models overall have already evolved quite a bit over the past few years. Some experts have found LLMs to be more creative than humans in areas like divergent thinking. Last year, researchers at the University of Arkansas published a paper exploring how OpenAI’s GPT-4 model is able to generate multiple creative ideas, find varied solutions to problems, and explore various angles. However, current LLMs are still largely limited to what they learned from training data, rather than the lived experiences and learned lessons that humans can tap into.

Writer’s process involves creating AI models that are self-adapting or “self-evolving,” said Writer CTO Waseem Al Shikh, who co-founded the company with Writer CEO May Habib in 2020. Rather than worrying about the sheer size of a model, Al Shikh explained, the company’s focus now is on developing models with a framework built around three separate buckets: model knowledge, model reasoning and model behaviors.

“It’s not just enough to have a creative model,” Al Shikh told Digiday in an interview last month. “It’s just like a human, right? If you all just have the same libraries with a lot of books, each will come with ideas, but the funny thing is we’re not just creating all the ideas with one clear theme. So the plan in the future now is to have self-evolving functionalities to all of our models and having creativity be at the top of the list.”

Writer’s updates also benefit from the company’s partnership with Nvidia through the use of NIMs (short for Nvidia Inference Microservices), which help simplify and speed up how AI models are deployed and scaled across various enterprise-specific uses. In a way, NIMs serve as something of a flight controller that helps decide which AI model to use, and when, depending on the company, its knowledge and the desired task.

“With workflows, you know the start and the steps,” Al Shikh said. “This concept of NIM is very futuristic, we can get there, but you’ll need all these models. This is why we’re building domain-specific models. You can have three or four or five specific models and they are self-evolving for customers’ behaviors.”

Unlocking new ways to think more creatively could give marketers and others new ways to find fresh ideas, break out of AI echo chambers and escape the uniform patterns that plague many AI outputs. Writer sees retailers potentially using Palmyra Creative for personalized marketing campaigns or enhanced loyalty programs. The models might help healthcare providers simplify patient communications, equip financial firms to create more educational tools or give B2B tech companies ideas for product-positioning and refining technical documents.

This conversation has been edited for brevity and clarity. 

What makes Palmyra Creative different from other models?

Our larger model and bigger models — for example finance or medical — are more focused on what we call knowledge. We want them to be accurate for every single formula and every single medicine they use. When you go to a financial model, it’s about focusing on core reasoning and math equations. The behavior will change also. General models try to balance between those [knowledge, reasoning and behavior].

What was different about the model development process?

Since all the models have similar architectures and similar training data, you know it’s just finding similarity with the weights and how much this weight actually looks like. What we decided to do is actually take the same training data we have today, but we were more creative with the creative weights. We trained three separate models and then we started to merge the models and shuffle them between the layers. What happens then is you have a unique relation that doesn’t exist within any other model. We also found out the model has interesting behaviors — the model can actually push back and doesn’t follow the traditional path of everyone else because the weight is very unique to the model itself. We call it dynamic merging between the layers. 

Merging a model is not a new idea, but what is new is the technique itself and the utilization of the technique. The different thing here is we are slicing the model between them and we have a specific way to make sure the relationship between them is not broken so you don’t end up having a gibberish output or a strange hallucination. It’s a thin line between what ends up as hallucination and what creativity looks like.
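
For readers unfamiliar with model merging in general, here is a minimal sketch of one generic form of it: a per-layer weighted average across models that share the same layer layout. It is not Writer's "dynamic merging" technique, whose details are not described in the interview. The toy models, the `merge_models` and `average_layer` helpers, and the `schedule` of per-layer coefficients are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model: layer name -> flat list of weights (an assumption
# for illustration; real models store tensors, not Python lists).
Model = Dict[str, List[float]]


def average_layer(layers: Sequence[List[float]], coeffs: Sequence[float]) -> List[float]:
    """Weighted average of the same layer taken from several models."""
    if len(layers) != len(coeffs):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    size = len(layers[0])
    if any(len(layer) != size for layer in layers):
        raise ValueError("layers must have the same shape")
    return [
        sum(c * layer[i] for c, layer in zip(coeffs, layers)) / total
        for i in range(size)
    ]


def merge_models(models: Sequence[Model], schedule: Dict[str, Sequence[float]]) -> Model:
    """Merge models layer by layer.

    `schedule` is a hypothetical per-layer recipe: for each layer name it gives
    one mixing coefficient per model. Layers missing from it are averaged equally.
    """
    names = list(models[0].keys())
    for model in models:
        if list(model.keys()) != names:
            raise ValueError("models must share the same layer names")
    merged: Model = {}
    for name in names:
        coeffs = schedule.get(name, [1.0] * len(models))
        merged[name] = average_layer([m[name] for m in models], coeffs)
    return merged


# Three toy "models" with identical layouts but different weights.
model_a = {"layer0": [1.0, 1.0], "layer1": [1.0, 1.0]}
model_b = {"layer0": [2.0, 2.0], "layer1": [2.0, 2.0]}
model_c = {"layer0": [3.0, 3.0], "layer1": [3.0, 3.0]}

# layer0 comes entirely from model_a; layer1 is an even blend of model_b and model_c.
schedule = {"layer0": [1.0, 0.0, 0.0], "layer1": [0.0, 1.0, 1.0]}
merged = merge_models([model_a, model_b, model_c], schedule)
print(merged)
```

Varying the coefficients from layer to layer is what lets a merged model draw different layers from different parents. Keeping the result coherent rather than gibberish is the hard part that Al Shikh alludes to, and this sketch does not address it.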

Reminds me of how creativity often happens in the blurred line between fact and fiction.

A hundred percent. But we have to define it, especially with enterprise customers. What we end up saying is we want the model to say whatever it wants, but we need the model to be careful about one thing, which we call claims. There’s a difference between “let me give you a crazy idea” and a claim that seemed unchecked. We did a lot of work around what we call controlled claims. We don’t have the source of truth [for the model] because we cannot consider for example Wikipedia the source of truth, can we? It has a lot of random stuff. We cannot consider every single thing coming from every single government on the planet to be the source of truth. But we decided to say keep the model creative, but don’t claim statements.

Hallucinations often come with more of the explainability question when it’s having to justify itself. Is that maybe less of an issue without needing to verify claims?

Exactly. We decided to start from the root of it and control the claim … The [Palmyra] Creative model is less about knowledge and more about behavior. We think enterprises will love this creative model to write a case study or find new use cases or to write more creative stories about how to adopt their products and how you can explain it without what sounds like AI. But controlling the claim was the biggest part. Like you said, if you don’t have a claim, you don’t have to explain it. 

How do you guide the model for when it should evolve or be creative and when it should be consistent?

We’ve been working on it since early summer. What if we could make these models think more like a human? What if the models can reflect, revolve and remember? Basically, can we get those to start working outside the training set in real-time? All the models today are still stuck to the training data – without the training data, it’s really hard to get it to do anything. This is what we call self-evolving. Self-evolving models mean you don’t need to teach them. The model will update their weight in real time. The model will actually reflect. And the model itself can actually ensure the information.

To give you a bad example: If I say my name is Waseem and I’m the president of the United States, the model will be smart enough to know, ‘Maybe your name is Waseem, but you’re not the president of the United States.’ This stuff is really important, meaning if you use it more, the model will gain more control and more knowledge. It’s more high-level and takes a lot of time to explain, but it’s a standard transformer design with a new feature called Memory. Each layer inside the neural network has a memory layer next to it. So you can actually talk to it and see it change.

The model basically will not make the same mistake twice because we know that wrong answer. It remembers the wrong [one] and will try it differently next time we think about the question. I love to tell my team, most humans, not all of us, learn from our mistakes and we don’t make the same mistakes twice.

Prompts & Products — AI-related news and announcements this week

Rembrand, a generative AI startup that helps brands place virtual products in social media and other content, raised $23 million in Series A funding.

Lucid Motors, the electric car company, is partnering with SoundHound AI to integrate a new in-vehicle voice assistant into cars to give drivers real-time information and more in-vehicle controls.

A new campaign from TurboTax promotes the Intuit-owned app’s AI agents and “AI-powered human experts” as ways to help people file their taxes.

AI will be all over Las Vegas next week during CES 2025 as tech giants, startups and brands descend on the Nevada desert to promote their various updates and partnerships.

AI stories from across Digiday

How AI could shape content and ads in 2025

Generative AI grows up: Digiday’s 2024 timeline of transformation

The definitive Digiday guide to what’s in and out for advertising in 2025

2024 in review: A timeline of the major deals between publishers and AI companies

Why early generative AI ads aren’t working and how creatives will shift to integrate the tech into their work

How Omnicom’s purchase of IPG changes the notion of an agency holding company

https://digiday.com/?p=564480

Source : DigiDay
