Jarvis GPT: Create your own version of Jarvis from Iron Man using Python and OpenAI API

Photo by Igor Bumba on Unsplash

ChatGPT is a hot topic right now and it’s taking the world by storm. Its ability to respond in a human-like way is widely appreciated and sparks many discussions about the future use of this tool. Many people, me included, think this solution is extremely similar to Jarvis, Tony Stark’s supercomputer assistant in the Iron Man movies. As a child, I always wanted to create something at least a bit similar to one of Iron Man’s tools, so I didn’t hesitate to check whether there is an API I could use to create my own version of Jarvis. Wanna see the result? Then I encourage you to keep reading this post :)

OpenAI API

Fortunately, OpenAI has an API, hooray!
Unfortunately, it is not free, booh…
Fortunately, it’s quite affordable, hooray! You can check it right here: OpenAI API Link

We are going to use text completion: we pass in some text, most likely a question or a task to solve, and in return we should get a text response generated by the chosen model.

But wait… Jarvis did speak, right?

Of course, he did! The whole fun of this project is that we can create a tool that communicates with us by speaking. Let’s break it down into functionalities. Here is what we need:

  • Listen to a human voice and convert it to text. Ideally, the solution would be multilingual
  • Communicate with the API and read the text response
  • Say it out loud (just like Jarvis would do)

Those are 3 simple steps that will lead us to create our own version of Jarvis. Time to see the code.

Listen to human voice and convert it to text

To do that I used a speech recognition library called SpeechRecognition. This is the module I created to handle the first step.

# speech_recognition module
import speech_recognition as sr

r = sr.Recognizer()

def speech_to_text(language: str):
    """Language reference: https://gist.github.com/msikma/8912e62ed866778ff8cd"""

    print("Talk now")
    with sr.Microphone() as source:
        # read the audio data from the default microphone
        audio_data = r.listen(source)

    # convert speech to text
    print("Converting...")
    text = r.recognize_google(audio_data, language=language)
    return text

There is just one function, and it relies on what the library provides. It listens to the microphone and, once it gets some input, converts the audio to text. In this case, we listen until the person stops talking. This is one of the available settings: we could also listen for a fixed number of seconds, but that is not what we want. We want the most human experience possible, as if we were talking to someone and then got an answer. The library supports both online and offline recognition engines; the recognize_google call used here needs an internet connection. You can find more details in the documentation.
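For comparison, here is a minimal sketch of the time-limited variant. It assumes the timeout and phrase_time_limit arguments of Recognizer.listen; the helper name listen_limited and the default values are made up for illustration, and the recognizer and audio source are passed in so the function can be tried without a microphone.

```python
def listen_limited(recognizer, source, language: str,
                   timeout: float = 5, phrase_time_limit: float = 15) -> str:
    """Listen with time limits, then convert the audio to text.

    recognizer: an object with the speech_recognition.Recognizer interface
    source:     an already opened audio source, e.g. sr.Microphone()
    """
    # timeout: maximum number of seconds to wait for speech to start
    # phrase_time_limit: maximum number of seconds a phrase may last
    audio_data = recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )
    return recognizer.recognize_google(audio_data, language=language)


# Hypothetical usage with the real library:
#   import speech_recognition as sr
#   with sr.Microphone() as source:
#       print(listen_limited(sr.Recognizer(), source, "en"))
```

With the real library, listen raises sr.WaitTimeoutError when nobody starts speaking within the timeout, so a caller would probably want to catch that.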

Communicate with the API

This is another module of the whole program. This time it takes care of the communication with OpenAI API. Let’s break this module into smaller parts.

# OpenAI API communication module
import os

import openai

class ChatGPTCommunication:
    URL = "https://api.openai.com/v1/completions"

    def __init__(self):
        # Read the API key from the environment
        self.AUTH_TOKEN = os.getenv("OPENAI_API_KEY")

A single class handles the whole communication. In the code above, you can see a simple reference to the OPENAI_API_KEY variable, which is stored in the .env file. There is no magic here.
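Keep in mind that os.getenv returns None when the variable is not set, and Python does not read a .env file on its own, so it can help to fail early with a clear message. The sketch below is one possible approach; the helper name require_api_key is hypothetical, and it assumes the .env file has already been loaded into the environment (for example by a tool such as python-dotenv).

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the given environment or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or load your .env file first")
    return key
```

Calling require_api_key() in __init__ instead of os.getenv would surface a missing key right away rather than at the first API request.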

    # OpenAI API communication module
    # this is inside the class ...

    def text_completion(self, text: str):
        """
        Call ChatGPT API and return the response as json.
        API link: https://beta.openai.com/docs/api-reference/completions/create
        """
        openai.api_key = self.AUTH_TOKEN
        # Note: openai.Completion is the interface of openai library versions before 1.0
        return openai.Completion.create(
            model="text-davinci-003",
            prompt=text,
            max_tokens=100,
            temperature=0
        )

The method above talks to the API directly, using the openai library. This is the place where we could set the model (at the moment it is text-davinci-003, but there are diverse models available). prompt contains the text we send to the model, i.e. this is what we will say to the program.

max_tokens and temperature are explained in the documentation, so I will only cover them briefly. The API is not free: each request is billed by the number of tokens it uses. The more text there is, the more tokens we pay for, so max_tokens acts as a limit on how long the generated response can be per API request. temperature controls how random the output is: 0 makes the API generate “a well-defined” answer, while the closer it gets to 1, the more creative answers we might get (according to the documentation). If you’re interested in API pricing and tokenization, here is a link to the source.

Now, the response we receive contains some special characters that we have to deal with. Those are most likely newline characters, which appear at the beginning of the text. We need to get rid of them to pass clean text to our speaking module, which will be introduced in a moment. For now, I will show you how I got rid of '\n' characters.
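To get a feeling for what a request might cost, the back-of-the-envelope sketch below estimates an upper bound. It assumes OpenAI's rule of thumb that one token corresponds to roughly 4 characters of English text; the function name is made up and the price is a parameter you have to fill in yourself from the pricing page, since no real prices are given here.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not an exact tokenizer


def estimate_max_cost(prompt: str, max_tokens: int, price_per_1k_tokens: float) -> float:
    """Estimate the highest possible cost of one completion request."""
    # Prompt tokens are only approximated from the character count
    prompt_tokens = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    # The response can use at most max_tokens tokens
    total_tokens = prompt_tokens + max_tokens
    return total_tokens / 1000 * price_per_1k_tokens
```

The estimate is an upper bound because the model may stop before it reaches max_tokens.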

    # this code is still inside a class ...
    def _remove_first_and_last_newline_characters(self, text: str):
        while text.startswith('\n'):
            text = text[1:]

        while text.endswith('\n'):
            text = text[:-1]

        return text

    def _clean_text(self, text: str):
        """Returns a string with removed special characters."""
        cleaned_text = self._remove_first_and_last_newline_characters(text)

        # We're expecting mostly double or single '\n' characters
        cleaned_text = cleaned_text.replace("\n\n", '. ')
        cleaned_text = cleaned_text.replace("\n", '. ')

        return cleaned_text

These two methods deal with the '\n' characters in the response. We remove those at the beginning and the end of the text, while those in the middle of the response are replaced with a period, which marks the end of a sentence. This way we get text that the speaking module can read properly.
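As an alternative, the same cleanup could be done in one pass with a regular expression. This is a sketch rather than the code from the repository: the function name clean_text_regex is made up, and unlike the methods above it keeps an existing '.', '!' or '?' in front of a newline instead of adding a second period.

```python
import re

# One or more newlines, optionally preceded by sentence-ending punctuation
_NEWLINE_RUN = re.compile(r"([.!?])?\n+")


def clean_text_regex(text: str) -> str:
    """Strip outer newlines and turn inner newline runs into sentence breaks."""
    stripped = text.strip("\n")
    # Reuse the existing punctuation if there is any, otherwise insert a period
    return _NEWLINE_RUN.sub(lambda m: (m.group(1) or ".") + " ", stripped)
```

Whether the extra punctuation handling matters depends on how the speech engine reads a double period, so treat it as optional.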

    def ask(self, text: str):
        """Send some text to ChatGPT API and return the text response of AI"""
        json_response = self.text_completion(text)
        ai_response = json_response['choices'][0]['text']
        clean_response = self._clean_text(ai_response)
        return clean_response

The method above combines all of the methods mentioned before: it sends some text to the API, receives a response, cleans it and returns the desired text.
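Since max_tokens is set to 100, longer answers may be cut off. The sketch below is one way to detect that, assuming the response behaves like a dict with the shape documented for the completions endpoint, where finish_reason is "length" when the token limit was hit. The helper name extract_answer and the sample response are made up for illustration.

```python
from typing import Tuple


def extract_answer(response: dict) -> Tuple[str, bool]:
    """Return (text, was_truncated) for the first choice of a completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("The response contains no choices")
    first = choices[0]
    # "length" means generation stopped because max_tokens was reached
    was_truncated = first.get("finish_reason") == "length"
    return first.get("text", ""), was_truncated


# Hand-written sample in the documented response shape (not real API output)
sample = {"choices": [{"text": "\n\nHello, sir.", "index": 0, "finish_reason": "stop"}]}
print(extract_answer(sample))
```

A caller could then raise max_tokens or ask Jarvis to say that the answer was shortened.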

Say it out loud

This module contains the tools that make our program speak. It expects some text and can read it out in several languages.

import pyttsx3

class SpeakingPython:
    AVAILABLE_LANGUAGES = ('en', 'pl', 'es', 'pt', 'it', 'fr')

    VOICES = {
        'en': 'com.apple.speech.synthesis.voice.Alex',
        'pl': 'com.apple.speech.synthesis.voice.zosia',
        'es': 'com.apple.speech.synthesis.voice.jorge',
        'pt': 'com.apple.speech.synthesis.voice.joana',
        'it': 'com.apple.speech.synthesis.voice.alice',
        'fr': 'com.apple.speech.synthesis.voice.thomas'
    }

    def __init__(self, language='en'):
        if language not in self.AVAILABLE_LANGUAGES:
            raise ValueError(f"Given language is not supported! Has to be one of:\n {self.AVAILABLE_LANGUAGES}")

        self.language = language
        self.engine = pyttsx3.init()
        self.engine.setProperty('voice', self.VOICES[language])
        self.engine.setProperty('rate', 170)

    def talk(self, phrase: str):
        """Make your system say given phrase"""
        self.engine.say(phrase)
        self.engine.runAndWait()

Here, I decided to use the pyttsx3 library, as it supports multiple languages and performed well when I tested it locally. The class contains a tuple of available languages, which you can adjust using general language codes. The voice identifiers above are the ones available on macOS; there might also be diverse voices for the same language. It is also possible to adjust the speed of speech, which is handy because in some languages the default rate of 200 was a bit too fast and it was difficult to understand what was being said. The talk() method is the main one, and it makes your program speak.
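Because the voice identifiers are specific to the operating system, you may prefer to look a voice up at runtime. The sketch below assumes the voice objects returned by pyttsx3's engine.getProperty('voices'), which expose id and name attributes; the helper name find_voice_id is hypothetical, and the sample voices stand in for a real engine.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_voice_id(voices: Iterable, name_fragment: str) -> Optional[str]:
    """Return the id of the first voice whose name contains name_fragment."""
    wanted = name_fragment.lower()
    for voice in voices:
        if wanted in (voice.name or "").lower():
            return voice.id
    return None


# Stand-in voices for illustration; with pyttsx3 you would pass
# engine.getProperty('voices') instead.
fake_voices = [
    SimpleNamespace(id="com.apple.speech.synthesis.voice.Alex", name="Alex"),
    SimpleNamespace(id="com.apple.speech.synthesis.voice.zosia", name="Zosia"),
]
print(find_voice_id(fake_voices, "zosia"))
```

If the function returns None, falling back to the engine's default voice is probably the safest option.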

Put it all together

You have now seen almost all the parts required to create your version of Jarvis. Take a look at one more method from the API communication module.

    # Communicate with API module
    # ...
    def voice_ask(self, language: str):
        """Ask a question using your microphone and hear back from chat gpt"""
        text = speech_to_text(language)
        response = self.ask(text)

        sp = SpeakingPython(language)
        sp.talk(response)

This method combines all the tools mentioned before and, in the end, says the response out loud. To make the script more useful, I used Python’s click library to add a --lang flag that determines the language the script expects.

# main.py
import click

# ChatGPTCommunication is the class from the API communication module shown above

@click.command()
@click.option('--lang', default='en', type=click.Choice(['en','fr','es','pl','pt','it']), help='A language that Jarvis will expect')
def ask_jarvis(lang):
    chat_gpt = ChatGPTCommunication()
    chat_gpt.voice_ask(lang)

if __name__ == '__main__':
    ask_jarvis()

Besides handling the flag, the click module can set default values and validate that the given value is as expected (in this case, the language code has to be among those in the list). The help interface also formats everything nicely, so the user can easily understand how to use the script. It has many more features, and I encourage you to give it a try whenever you create a command-line script.

You can see all the code in my repository: HERE
I also recorded a video to show you how it actually works.

Conclusion

That’s it guys! Now you have seen it all and you are able to create your own Jarvis ❤
This is a project I had been thinking about for a long time, but only the arrival of ChatGPT made it come true.

I hope you guys had or will have fun with this solution. I hope you have learned something valuable (I presented 3 libraries to create Jarvis and introduced you to the handy “click” module).

For more interesting content like this, I encourage you to check out my other articles, and I would be grateful if you followed me on Medium. You might also want to see how to automate generating invoices, which I described in my last article.

Thanks again, hope to see you soon here and enjoy using your Jarvis!



