A DEEP DIVE INTO ONE OF THE COOLEST AI APPLICATIONS.
Demystifying ChatGPT
Reverse engineering the OpenAI chat demo to better understand the next-generation chatbot's intrinsic behavior.
Everyone not living under a rock should be well aware of all the buzz related to OpenAI just released ChatGPT application. Its near-human response accuracy and conversation capabilities are astonishing and opened a wide range of possible applications.
One of the critical features of ChatGPT is its ability to generate relevant and coherent responses within a given conversation. This is achieved through combining techniques, including deep learning, unsupervised learning, and fine-tuning large amounts of conversational data.
In addition to its ability to generate responses, ChatGPT also understands and interprets a conversation's context. This allows it to create appropriate responses for the situation, making it more effective at maintaining a natural and flowing conversation.
Overall, ChatGPT is a powerful and versatile tool that has the potential to revolutionize the way we interact with machines. Combining deep learning, unsupervised learning, and fine-tuning large amounts of data can generate human-like responses and help understand the context of a conversation. This makes it a valuable tool for many applications, from customer service to language translation.
Unfortunately, no API is available.
Almost every developer is looking for APIs to interact and integrate chatGPT into their application. Unfortunately, OpenAPI does not offer a public API to be used through their SDK or a simple HTTP interface, as the model explains upon request.
Using chatGPT through an SDK or directly calling remote APIs would open a whole set of applications, enabling developers to integrate its astonishing capabilities into every app.
In this article, reverse engineering of the private APIs offers a better understanding of chatGPT behavior and some insights into its functioning under the hood.
Reverse engineering the demo app
Disclaimer: this article is for the sole purpose of providing a better understanding of chatGPT intrinsics, while we wait OpenAI to release a full documentation and support through their official SDKs with relative billing. Please be careful because APIs could change without notice because they are not public and/or your account could be banned for improper service usage. If you use the results of this analysis do it at your own risk.
ChatGPT demo app is a web application released in HTML, CSS, and Javascript. Its code is fully minified and chunked using webpack, but with Chrome inspector is relatively easy to track remote network calls and identity some exciting facts.
After the login, three endpoints are invoked for every action a user does and provide support for conversation management.
Invoking these endpoints from any REST client allows interaction with the chat model, like the demo app. Much work is done through HTTP headers, which pass parameters to the stateless backend, allowing to recover session and user state. To ensure chat services are accessed through a web app, OpenAI requires to specify a user-agent header, which we have to set appropriately in our HTTP client to a valid value; otherwise, our requests will be rejected. In our case, we used Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 to emulate a Chrome user agent on a Mac OSX. This header must be set in any request.
Obtaining a session token
Once the username is known after login, the first step is to obtain a session object from its relative endpoint (https://chat.openai.com/api/auth/session). This is achieved through a call that returns the following payload:
An access token is also returned with meaningful user information lasting one month. The token is a standard JWT token that can be decoded to extract user information, the identity provider (basically Google through Auth0), and authorized scopes for user data.
Please refer to the official documentation for more information about the JWT standard and its usage in the OAuth2 workflow. For our purposes, it is enough to get the token we will use in our Authorization HTTP header for the following calls.
2. Start the conversation
Using the obtained token, we can invoke the conversation endpoint (https://chat.openai.com/backend-api/conversation) with the following parameters. An interesting aspect is that the first message requires setting the action attribute as "variant" and then an array of messages.
The response to this payload is a data stream containing the whole set of text tokens building the response. Streams are sent incrementally and terminated by a [DONE] sequence.
To provide an example, the first response in the stream is
While the last chunk of the stream is
Every chunk is sent back as a stringified JSON containing text data and a conversation_id, which helps handle the conversation follow-up.
3. Continuing conversation
Invoking the same endpoint with the "next" action and two follow-up attributes ensures the conversation context is maintained between different calls. The two fundamental parameters are conversation_id and parent_message_id. The first ensures all the messages belong to the same conversation, while the latter provides support for message ordering.
One of the most exciting attributes of the payload is a model attribute that points to text-davinci-002-render, suggesting it is using the OpenAI davinci-002 model under the hood. This model has been fine-tuned to provide chatGPT-specific information and moderate results.
Moderation monitoring is also achieved through the moderation (https://chat.openai.com/backend-api/moderations) endpoint, which receives the whole chat text every time a new sentence is appended (either by the model or the user) and returns feedback about whether the conversation contains sensitive information or not.
Where to go from here?
The release of the ChatGPT demo app gained unprecedented interest from the vast user community. Many people started discussing the power of domain-free conversation enabled by such models. However, even if no API had been released, the developer community started imagining how a possible integration could work. In such a context, we could reasonably expect the rise of conversational agents into several real-world applications shortly. In the meantime, some proofs-of-concept can be developed leveraging existing API, as discussed in this article.
My name is Luca Bianchi. I am the Chief Technology Officer at Neosperience and the author of Serverless Design Patterns and Best Practices. I have built software architectures for large-scale production workloads on AWS for nearly a decade.