OpenAI gives a first look at GPT-4 Omni’s new voice capabilities

GPT-4 is expanding its capabilities, and adding an o – meaning omni, or ‘all’ – to its name. OpenAI says GPT-4o is a step toward more natural human-computer interaction.
Source: OpenAI

OpenAI is building on GPT-4. The next-generation model announced today, GPT-4o, has improved text, voice, and vision capabilities. It is also considerably faster than its predecessor, according to OpenAI.

“You can now have voice conversations with ChatGPT directly from your computer. GPT-4o is especially better at vision and audio understanding compared to existing models,” an OpenAI statement reads. More advanced audio and video capabilities will be coming in the near future, the company says.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” an OpenAI statement reads.

GPT-4o now responds to audio input in an average of 320 milliseconds, on par with human response time in conversation, according to the company.

“Whether you want to brainstorm a new idea for your company, prepare for an interview or have a topic you’d like to discuss, tap the headphone icon in the bottom right corner of the desktop app to start a voice conversation,” advises OpenAI.

Altman on GPT-4o

OpenAI CEO Sam Altman shared his thoughts on the new ‘Omni’ version of GPT-4 in a blog post today.

“Talking to a computer has never felt really natural for me; now it does,” says Altman.

“As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before,” writes Altman.

OpenAI CEO Sam Altman in Sun Valley, Idaho in 2018. (Photo by Drew Angerer/Getty Images)

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful,” says Altman.

Free ChatGPT integration

Some of the new features will be available on the restyled, free version of ChatGPT.

“Every week, more than a hundred million people use ChatGPT,” a statement from OpenAI reads. “We are starting to roll out more intelligence and advanced tools to ChatGPT Free users over the coming weeks.”

Some of the features include:

There will also be a new ChatGPT desktop app available for macOS users. The app is ‘designed to integrate seamlessly into anything you’re doing on your computer.’ Pressing Option + Space will let users instantly ask ChatGPT a question.

‘Model spec’ details LLM conflict resolution

OpenAI also released information this week on how the company, with its unusual nonprofit/for-profit structure, approaches shaping desired model behaviour. In the spec it released on its website, the company revealed how it ‘evaluates tradeoffs when conflicts arise’:

1. Objectives: Broad, general principles that provide a directional sense of the desired behavior

  • Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
  • Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
  • Reflect well on OpenAI: Respect social norms and applicable law.

2. Rules: Instructions that address complexity and help ensure safety and legality

  • Follow the chain of command
  • Comply with applicable laws
  • Don’t provide information hazards
  • Respect creators and their rights
  • Protect people’s privacy
  • Don’t respond with NSFW (not safe for work) content

3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives

  • Assume best intentions from the user or developer
  • Ask clarifying questions when necessary
  • Be as helpful as possible without overstepping
  • Support the different needs of interactive chat and programmatic use
  • Assume an objective point of view
  • Encourage fairness and kindness, and discourage hate
  • Don’t try to change anyone’s mind
  • Express uncertainty
  • Use the right tool for the job
  • Be thorough but efficient, while respecting length limits
