How can Adobe create an image generation model like Firefly without scraping training data from the internet?

Since the surge in generative artificial intelligence began, there has been a heated debate on how to train large AI models effectively.

One camp, composed primarily of tech companies like OpenAI, claims that it is "impossible" to train good AI models without leveraging copyrighted data from the internet.

Opposing this stance is a camp of artists who argue that AI companies have taken their intellectual property without their consent and without offering compensation.

As a tech company, Adobe is unusually aligned with the artists' camp. Its approach serves as a prime example of how to build generative AI products without scraping copyrighted data from the internet.


A year ago, Adobe released its image generation model, Firefly, which has been integrated into its popular photo-editing tool, Photoshop. In an exclusive interview with MIT Technology Review, Adobe's AI leaders said they firmly believe that this is the only way forward.

They say that it's not just about the livelihoods of creators, but about our entire information ecosystem. What they have learned suggests that building responsible technology does not have to come at the expense of commercial goals.

Adobe's president of digital media, David Wadhwani, said: "We worry that the industry, Silicon Valley in particular, doesn't stop to ask the 'how' or the 'why'. Just because you can create something doesn't mean you should do so without considering the impact you might have."

These questions guided the development of Firefly. When the image generation craze began in 2022, the creative community pushed back strongly against artificial intelligence. Many people see generative AI models as "derivative content machines" that create images in the style of another artist, which has sparked legal battles over copyright and fair use.

The latest generative AI technology also makes it easier to create deepfake content and misinformation.

Wadhwani said, "We are well aware that in order to provide creators with proper recognition and commercial legal certainty, we cannot build models by scraping data from the internet."

Adobe's Chief Technology Officer of Digital Media, Ely Greenfield, stated that Adobe wishes to "acknowledge the foundation of human labor upon which these artificial intelligences are built," while also benefiting from generative artificial intelligence.

"We must figure out how to fairly reward people's labor, now and in the future," he said.

Should Data Be Collected from the Internet?

The common practice of web data scraping in artificial intelligence has recently become highly controversial. AI companies such as OpenAI, Stability AI, Meta, and Google are facing numerous lawsuits over AI training data. The technology companies argue that scraping publicly available data from the internet is not problematic.

Writers and artists disagree with this perspective. They are advocating for a licensing-based model where creators would be compensated for the inclusion of their works in training datasets.

Greenfield said that the content used to train Firefly was explicitly licensed for AI training, which means most of the training data comes from Adobe's own stock photo library. He added that when creators' content is used to train AI models, the company pays them additional compensation.

This stands in stark contrast to mainstream practice in AI today: many technology companies indiscriminately scrape data from the internet and have limited understanding of what their training datasets actually contain.

Because of these practices, AI datasets inevitably include copyrighted content and personal data. Some studies have also found toxic content, such as child sexual abuse material.

Scraping data from the internet gives technology companies a cheap way to obtain large amounts of training data, and generally speaking, more data allows developers to build more powerful models.

Greenfield said that restricting Firefly's training to licensed data was a risky decision. "Honestly, when we started building Firefly and image generation models, we didn't know if we could meet customer needs without scraping web data," he said.

"Then we found out we could, which is fantastic."

Human content moderators also review the training data to remove offensive or harmful content, copyrighted materials, and images of well-known people. The company holds the corresponding licenses for the data used to train the product.

Greenfield stated that Adobe's strategy has always been to integrate generative artificial intelligence tools into its existing products.

For example, in Photoshop, users can give the Firefly tool a text prompt and have it fill specific areas of an image as requested. This gives users more control over the creative process and helps spark their creativity.

Despite this, Adobe still has more work to do. The company wants to make Firefly faster: Greenfield said the current content moderation algorithm takes about 10 seconds to review generated output.

Adobe is also working out how to let some commercial clients generate content featuring their own copyrighted characters, such as Marvel characters or Mickey Mouse.

Adobe has partnered with companies such as IBM, Mattel, Nvidia, and NASCAR, allowing them to use its tools with their own intellectual property. It is also pushing deeper into areas such as audio, lip-syncing, and 3D generation tools.

If there is a problem with the data, there is a problem with the model

The decision not to scrape internet data also gives Adobe an advantage in content moderation. Generative AI is notoriously difficult to control, and developers themselves do not know why these models generate the images and text they do.

This leads to the situation where generative artificial intelligence models often output problematic and toxic content.

Greenfield said it all comes down to the model's training data. For example, he said, Adobe's model has never seen photos of Joe Biden or Donald Trump, so users cannot trick it into producing false political imagery.

The training data for Adobe's AI model includes neither news content nor celebrities. It has not been trained on copyrighted material such as images of Mickey Mouse.

"It just doesn't understand what that concept is," Greenfield said.

Adobe has also built an automatic content review mechanism into the creation process to check whether Firefly's output is suitable for professional use. The model is blocked from creating news imagery or violent images, and the names of some artists are also blocked.

Content generated by Firefly is also tagged to indicate that it was created with artificial intelligence, along with the image's editing history.

In a crucial year for the U.S. elections, it is particularly important for people to know who made a piece of content with artificial intelligence and how it was made.

Adobe has been a strong advocate for labeling AI-generated content to indicate its source and the tool that created it.

Together with The New York Times and X (formerly Twitter), the company launched the Content Authenticity Initiative, an association that promotes labeling AI-generated content. Such labels tell people whether a piece of content was generated by artificial intelligence. The initiative now has more than 2,500 members.

Adobe is also part of the development of C2PA, an industry-standard label that shows the source of a piece of content and how it was made.

Greenfield said: "We should have done a better job in terms of media literacy and tools, and we should empower people to verify the authenticity of any so-called 'seeing is believing' content."

Claire Leibowicz, the head of Artificial Intelligence and Media Integrity at the non-profit organization Partnership on AI, said that Adobe's approach highlights the necessity for AI companies to think deeply about content moderation.

Leibowicz added that Adobe's approach to generative AI, which balances combating misinformation with business objectives such as preserving creators' autonomy and ownership, serves many social goals well.

"Adobe's business mission is not to eradicate misinformation, but to empower creators," she said. "Isn't this the perfect integration of mission and business strategy, ultimately a win-win?"

Wadhwani agreed. The company says Firefly-powered features are among its most popular, and that 90% of Firefly's web app users are new to Adobe products.

Wadhwani said: "From a business perspective, I believe our approach is definitely beneficial."
