Build Your Own ChatGPT with AWS SageMaker in Less than 10 Minutes

Imagine being able to get a brief summary of a video, judge whether it is worth your time, receive clear step-by-step instructions, and even take a quiz afterwards to check your comprehension. Wouldn't that be incredible?
In this tutorial, we will build exactly that!

Solution Overview

This tutorial consists of 4 parts:

  • Part 1 – Setting up SageMaker Studio Lab and OpenAI API keys
  • Part 2 – Getting the transcript of a YouTube video
  • Part 3 – Summarizing and translating the transcript with machine learning models
  • Part 4 – Extracting steps and generating a quiz with the ChatGPT API

Let’s get started!

Part 1 – Setting up SageMaker Studio Lab and OpenAI API Keys

In this first part, we will set up SageMaker Studio Lab and obtain the OpenAI API keys.

To get started, go to the Studio Lab landing page and click Request free account. Fill in the required details and submit the form. A confirmation email will be sent to your address for verification; follow the instructions in it.

You should receive the account confirmation email within a week at the latest.

In this tutorial, we will use the GPT-3.5 Turbo model. The pricing for API calls to this model is very low, just a small fraction of a cent per call. On top of that, the complimentary $5 credit lets you run numerous experiments at no charge!

Part 2 – Getting the Transcript of a YouTube Video

Once your account is approved, sign in to Amazon SageMaker Studio Lab to begin working.


In My Project, you can select a compute type and start a project runtime. Studio Lab offers a choice between CPU, designed for compute-intensive algorithms, and GPU, which is recommended for deep learning tasks, especially those involving transformers and computer vision.


Choose the GPU instance, as it will significantly reduce execution time. Start the runtime by clicking Start runtime, then open the project. You may be asked to solve a CAPTCHA when the runtime starts. If you need to take a break, click Stop runtime, since GPU usage is limited to 4 hours per day; your progress will be saved.



A project consists of files and folders, including Jupyter notebooks. The image shows the Studio Lab Launcher. Under Notebook, select default: Python to create a new notebook.



Give the notebook a name and save it as build-ai.ipynb.

First, we will use pip to install all the packages needed to complete this tutorial. Copy the code below, paste it into a cell, and click the Play button at the top to run it.

# Installing libraries

!pip install python-dotenv
!pip install openai
!pip install openai-whisper yt-dlp
!pip install youtube_dl
!pip install youtube_transcript_api
!pip install torchaudio
!pip install sentencepiece
!pip install sacremoses
!pip install transformers

Next, we import all the required dependencies. Copy and run the code below:


#importing dependencies

import re
from youtube_transcript_api import YouTubeTranscriptApi
import torch
import torchaudio
import openai
import textwrap
from transformers import pipeline


That's it! All the necessary preparations are complete.
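Since we chose a GPU runtime earlier, an optional sanity check (not part of the original steps) confirms that PyTorch can actually see the GPU:

# Optional: verify that the GPU runtime is visible to PyTorch
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

If this prints False, make sure you selected the GPU compute type before starting the runtime.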

Now we are ready for the second task from the solution overview: getting the transcript of a YouTube video. I used an AWS SageMaker video for this purpose. To fill in the youtube_url variable, copy the part of the video link before the "&" symbol, as shown in the screenshot.

Copy and run the following code:


# Specify the YouTube video URL
youtube_url = "https://www.youtube.com/watch?v=i4W7SfP6_38"

# Extract the video ID from the URL using regular expressions
match = re.search(r"v=([A-Za-z0-9_-]+)", youtube_url)
if match:
    video_id = match.group(1)
else:
    raise ValueError("Invalid YouTube URL")

# Get the transcript from YouTube
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Concatenate the transcript into a single string
transcript_text = ""
for segment in transcript:
    transcript_text += segment["text"] + " "

print(transcript_text)
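As a side note, some videos offer transcripts in several languages. If you need a specific language track, youtube_transcript_api also provides a list_transcripts helper; here is a minimal sketch (the "en" language code is just an example):

# Optional: pick a specific transcript language when several are available
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
transcript_obj = transcript_list.find_transcript(["en"])  # raises an exception if no matching track exists
segments = transcript_obj.fetch()
transcript_text = " ".join(segment["text"] for segment in segments)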

Part 3 – Summarizing and Translating the Transcript with Machine Learning Models

Now that we have the complete transcript of the YouTube video, we can apply open-source models to various natural language processing tasks, such as summarization and translation, and use them to extract valuable information from the transcript.
Suppose you are not a native English speaker and want to translate the YouTube transcript into Spanish. For this, we can use a pre-trained machine learning model designed specifically for translation.


from transformers import pipeline

# Replace this with your own checkpoint
model_checkpoint = "Helsinki-NLP/opus-mt-en-es"
translator = pipeline("translation", model=model_checkpoint)

# Define the maximum sequence length
max_length = 512

# Split the input text into smaller segments
segments = [transcript_text[i:i+max_length] for i in range(0, len(transcript_text), max_length)]

# Translate each segment and concatenate the results
translated_text = ""
for segment in segments:
    result = translator(segment)
    translated_text += result[0]['translation_text']

print(translated_text)
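The same pipeline works for other language pairs: swapping in a different Helsinki-NLP checkpoint is enough. For example, here is a quick sketch for English-to-French, assuming the opus-mt-en-fr checkpoint:

# Example: translate into French by swapping the model checkpoint
translator_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(translator_fr("Machine learning is fun.")[0]["translation_text"])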
 

Next, we will create a summary of the video using a pre-trained summarization model. I will use the original English transcript, but if you prefer to work with the translated version, simply replace the transcript_text variable with the translated_text variable, which holds the translated text.


from transformers import pipeline, AutoTokenizer

# Instantiate the tokenizer and the summarization pipeline
tokenizer = AutoTokenizer.from_pretrained('stevhliu/my_awesome_billsum_model')
summarizer = pipeline("summarization", model='stevhliu/my_awesome_billsum_model', tokenizer=tokenizer)

# Define chunk size in number of words
chunk_size = 200  # you may need to adjust this value depending on the average length of your words

# Split the text into chunks
words = transcript_text.split()
chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

# Summarize each chunk
summaries = []
for chunk in chunks:
    summary = summarizer(chunk, max_length=100, min_length=30, do_sample=False)

    # Extract the summary text
    summary_text = summary[0]['summary_text']

    # Add the summary to our list of summaries
    summaries.append(summary_text)

# Join the summaries back together into a single summary
final_summary = ' '.join(summaries)

print(final_summary)
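One caveat: chunking by word count only approximates the model's real limit, which is measured in tokens. If you run into length errors, here is a sketch of token-based chunking with the same tokenizer (the 512-token limit is an assumption; check your model's actual maximum input length):

# Chunk by tokens rather than words so each chunk fits the model's context window
max_tokens = 512  # assumed limit for this checkpoint
token_ids = tokenizer.encode(transcript_text)
token_chunks = [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]
chunks = [tokenizer.decode(ids, skip_special_tokens=True) for ids in token_chunks]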
 

We have obtained a concise overview of the video's content, free of promotions, commercials, and unnecessary details.

Part 4 – Extracting Steps and Generating a Quiz Using the ChatGPT API

First, you need an OpenAI account. After finishing the registration and sign-up procedure, you will need to generate an API key, which is essential for making requests to the OpenAI models from external services. Make sure you also adhere to OpenAI's usage policies to ensure ethical and responsible use of the models.

Navigate to the OpenAI API keys page and click Create new secret key. Enter a name for the key and save it somewhere safe, as you will not be able to view it again. Then paste it into the openai.api_key variable in the code below.
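As an aside, we installed python-dotenv earlier, so instead of hardcoding the key in the notebook you can keep it in a .env file and load it at runtime. A minimal sketch (the OPENAI_API_KEY variable name is just a convention):

# The .env file in the project folder contains a line such as: OPENAI_API_KEY=your-key
import os
from dotenv import load_dotenv

load_dotenv()  # read variables from .env into the environment
openai.api_key = os.getenv("OPENAI_API_KEY")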

def split_text_into_chunks(text, max_chunk_size):
    return textwrap.wrap(text, max_chunk_size)

openai.api_key = "provide your key here"
max_chunk_size = 4000

transcript_chunks = split_text_into_chunks(transcript_text, max_chunk_size)
summaries = ""

for chunk in transcript_chunks:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"{chunk}\n\nCreate short concise summary"}
        ],
        max_tokens=250,
        temperature=0.5
    )
    summaries += response['choices'][0]['message']['content'].strip() + " "

print("Summary:")
print(summaries)
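When looping over many chunks, the API may occasionally return rate-limit errors. Here is a minimal retry sketch for the pre-1.0 openai client used in this tutorial (chat_with_retry is a hypothetical helper, not part of the library):

import time

def chat_with_retry(messages, retries=3):
    # Retry transient rate-limit errors with exponential backoff (1s, 2s, 4s)
    for attempt in range(retries):
        try:
            return openai.ChatCompletion.create(
                model="gpt-3.5-turbo-16k",
                messages=messages,
            )
        except openai.error.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Rate limit persisted after retries")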


Next, let's extract a set of steps to follow from the video:

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a technical instructor."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate steps to follow from text."},
    ]
)

# The assistant's reply
guide = response['choices'][0]['message']['content']

print("Steps:")
print(guide)





To complete our experiment, let's create a quiz based on the information presented in the video. The quiz will help us evaluate our comprehension of the material.

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that generates questions."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate 10 quiz questions based on the text with multiple choices."},
    ]
)

# The assistant's reply
quiz_questions = response['choices'][0]['message']['content']

print("Quiz Questions:")
print(quiz_questions)




You will see a quiz with 10 generated questions to test your knowledge, which can be especially helpful if you are preparing for exams. You can also modify the prompt so the model explains the right answers, as in the sketch below.
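For example, a small change to the last user message (one possible prompt, not the only one) makes the model include and explain the correct answers:

# Variation: ask the model to reveal and explain the correct answers
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that generates questions."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate 10 multiple-choice quiz questions based on the text. After each question, state the correct answer and briefly explain why."},
    ]
)

print(response['choices'][0]['message']['content'])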







In summary, we built a ChatGPT-powered learning assistant on AWS SageMaker Studio Lab in just a few steps: fetching the transcript of a YouTube video, summarizing and translating it with open-source models, and calling the ChatGPT API to extract instructions and generate a quiz. Studio Lab provides an easy-to-use, free platform for experimenting with machine learning models, and a basic understanding of natural language processing concepts will help you get the most out of it.
