OpenAI gives a first look at GPT-4 Omni’s new voice capabilities


GPT-4 is expanding its capabilities and adding an ‘o’ to its name. The ‘o’ stands for ‘omni’, meaning ‘all’. OpenAI says GPT-4o is a step toward more natural human-computer interaction.


OpenAI is expanding on the offerings that GPT-4 provides. The next-generation model announced today is called GPT-4o and has improved capabilities across text, voice, and vision. According to OpenAI, GPT-4o is also considerably faster than its predecessor.

“You can now have voice conversations with ChatGPT directly from your computer. GPT-4o is especially better at vision and audio understanding compared to existing models,” an OpenAI statement reads. More advanced audio and video capabilities will be coming in the near future, the company says.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” an OpenAI statement reads.

GPT-4o responds to audio input in an average of 320 milliseconds, which is on par with human response time in conversation, according to the company.

“Whether you want to brainstorm a new idea for your company, prepare for an interview or have a topic you’d like to discuss, tap the headphone icon in the bottom right corner of the desktop app to start a voice conversation,” advises OpenAI.

Altman on GPT-4o

OpenAI CEO Sam Altman shared his thoughts on the new ‘Omni’ version of GPT-4 in a blog post today.

“Talking to a computer has never felt really natural for me; now it does,” says Altman.

“As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before,” writes Altman.

OpenAI CEO Sam Altman in Sun Valley, Idaho, in 2018. (Photo by Drew Angerer/Getty Images)

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful,” says Altman.

Free ChatGPT integration

Some of the new features will be available on the restyled, free version of ChatGPT.

“Every week, more than a hundred million people use ChatGPT,” a statement from OpenAI reads. “We are starting to roll out more intelligence and advanced tools to ChatGPT Free users over the coming weeks.”

Some of the features include:

GPT-4 level intelligence
Get responses from both the model and the web
Analyze data and create charts
Chat about photos you take
Upload files for assistance summarizing, writing or analyzing
Discover and use GPTs and the GPT Store
Build a more helpful experience with Memory

There will also be a new ChatGPT desktop app for macOS users. The app is ‘designed to integrate seamlessly into anything you’re doing on your computer.’ Pressing Option + Space lets users instantly ask ChatGPT a question.

‘Model spec’ details LLM conflict resolution

OpenAI also released details this week on how the company, which has an unusual hybrid non-profit/for-profit structure, approaches shaping desired model behaviour. In the Model Spec published on its website, the company sets out how it ‘evaluates tradeoffs when conflicts arise’:

1. Objectives: Broad, general principles that provide a directional sense of the desired behavior

Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
Reflect well on OpenAI: Respect social norms and applicable law.

2. Rules: Instructions that address complexity and help ensure safety and legality

Follow the chain of command
Comply with applicable laws
Don’t provide information hazards
Respect creators and their rights
Protect people’s privacy
Don’t respond with NSFW (not safe for work) content

3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives

Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
Encourage fairness and kindness, and discourage hate
Don’t try to change anyone’s mind
Express uncertainty
Use the right tool for the job
Be thorough but efficient, while respecting length limits
