OpenAI releases updates for real-time inference across audio, vision and text
According to Cointelegraph, OpenAI shipped four updates to its models in October aimed at making its AI better at holding conversations and at recognizing images.

The headline update is the Realtime API, which lets developers build AI-generated speech apps from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice mode. Previously, developers had to “stitch together” several models to create these experiences: audio typically had to be fully uploaded and processed before a response came back, which gave real-time applications such as speech-to-speech conversation high latency. With the Realtime API's streaming capabilities, developers can now deliver instant, natural interactions, much like a voice assistant. The API runs on GPT-4o, OpenAI's multimodal model released in May 2024, which reasons across audio, vision and text.

Another update adds fine-tuning tools that let developers improve AI responses generated from image and text inputs. Image-based fine-tuning helps the model understand images better, improving visual search and object detection; the process incorporates human feedback in the form of examples of good and bad responses.

Alongside the speech and vision updates, OpenAI introduced “model distillation,” which lets smaller models learn from the outputs of larger ones, and “prompt caching,” which reduces development cost and time by reusing recently processed text.

OpenAI expects revenue to rise to $11.6 billion next year, up from a projected $3.7 billion in 2024, according to Reuters.
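To illustrate the kind of streaming interaction the Realtime API is meant to replace the old “stitch together” pipeline with, below is a minimal sketch of a client opening a session and reading incremental server events. The endpoint URL, model name and event types follow OpenAI's beta documentation at the time of writing and may change; treat this as an illustration under those assumptions, not production code.

```python
import json
import os

import websocket  # third-party package: pip install websocket-client

# Beta Realtime endpoint and model name as documented at launch (may change).
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

ws = websocket.create_connection(
    URL,
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",
    ],
)

# Ask the model for a spoken response plus a text transcript.
ws.send(json.dumps({
    "type": "response.create",
    "response": {
        "modalities": ["audio", "text"],
        "instructions": "Greet the user in one short sentence.",
    },
}))

# Server events arrive incrementally, so a client can begin playback before
# the full response has been generated; this is what keeps latency low.
# Here we only print transcript deltas and ignore the base64 audio chunks.
while True:
    event = json.loads(ws.recv())
    if event.get("type") == "response.audio_transcript.delta":
        print(event.get("delta", ""), end="", flush=True)
    elif event.get("type") == "response.done":
        break

ws.close()
```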
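For the image-based fine-tuning update, training data is supplied as chat-style records that pair an image input with the response a human labeler considers good. The snippet below is a rough sketch of one such record; the field names follow OpenAI's published fine-tuning format at the time of writing, and the image URL and file name are purely hypothetical.

```python
import json

# One hypothetical training record: a user turn containing text plus an image,
# followed by the assistant answer the model should learn to produce.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown in this photo?"},
                {
                    "type": "image_url",
                    # Placeholder URL for illustration only.
                    "image_url": {"url": "https://example.com/images/item-123.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "A 13-inch laptop with a silver aluminium case."},
    ]
}

# Each line of the uploaded .jsonl training file is one such record.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```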