Raktim Singh


What is PROMPT ENGINEERING


Prompt engineering (PE) is a concept in Natural Language Processing (NLP), which is a subfield of Artificial Intelligence (AI).

In prompt engineering, the description of the task is embedded in the input itself, for example as a question, rather than being given implicitly.

In other words, a task is described to the AI engine as a set of inputs, or prompts.

We can think of the AI engine as having a model. The prompt should carry enough information about the problem and its context for the model to narrow down the space of possible responses and return the best one for your query.

Prompt engineering typically works by converting one or more tasks into a prompt-based dataset and training a language model on it. Because the model learns from the data and information supplied in the prompts, this approach is called “prompt-based learning”.

Prompt engineering can also work with a large “frozen” pretrained language model, whose weights are left unchanged.

In that case, only the representation of the prompt is learned (i.e., optimized), using methods such as “prefix-tuning” or “prompt tuning”.

A prompt can have one or more of the following components:

  • Instruction(s) or Question(s)
  • Input data (optional)
  • Examples (optional)
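
As a minimal sketch of how these components fit together, the snippet below assembles a prompt from an instruction, optional examples, and optional input data. The function name `build_prompt` and the "Input:/Output:" layout are assumptions made for illustration; they are not a format required by any particular model.

```python
def build_prompt(instruction, input_data=None, examples=None):
    """Assemble a text prompt from an instruction, optional examples, and optional input."""
    parts = [instruction.strip()]
    # Each example is an (input, output) pair that shows the expected pattern.
    for example_input, example_output in (examples or []):
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    if input_data is not None:
        # End with an open "Output:" slot for the model to complete.
        parts.append(f"Input: {input_data}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical task and examples, used only to show the structure.
prompt = build_prompt(
    instruction="Classify the sentiment of each review as Positive or Negative.",
    examples=[
        ("I loved this phone.", "Positive"),
        ("The battery died in a day.", "Negative"),
    ],
    input_data="The screen is gorgeous.",
)
print(prompt)
```

Calling `build_prompt` with only an instruction gives a bare question-style prompt; adding examples and input data gives a prompt that also demonstrates the expected pattern.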

Language Model

To go further, we need to understand the language models behind prompt engineering.

Natural Language Processing (NLP) is an important component of AI technology, and it keeps improving with the arrival of Large Language Models (LLMs).

LLMs are neural networks that can have hundreds of billions of parameters.

They are trained on very large text corpora, on the order of hundreds of gigabytes.

In 2007, Google researchers T. Brants et al. showed empirically that translation quality (measured by the BiLingual Evaluation Understudy (BLEU) score) improves as the size of the language model increases.

Over the years, a lot of work has happened across companies in this direction.

A few well-known large language models are:

  1. OpenAI GPT-3 – Introduced in 2020 (the paper appeared in May and the API beta in June), this model has 175 billion parameters and was trained on over 570 GB of text data. The model was trained on NVIDIA V100 GPUs.

Training ran on a supercomputer hosted in Microsoft’s Azure cloud, consisting of 285,000 CPU cores and 10,000 high-end GPUs.

2. Cohere platform – A platform that offers both generative language models and representation language models.

3. Megatron-Turing Natural Language Generation model (MT-NLG) – Has 530 billion parameters and was developed by NVIDIA in collaboration with Microsoft. It was trained on NVIDIA’s Selene machine learning supercomputer.

4. GPT-Neo, GPT-J, and GPT-NeoX – These LLMs are trained on the Pile, an open-source language modeling dataset of about 825 GB. They work well when provided with a few examples.

Models like OpenAI GPT-3 can accept input instructions in English, Spanish, French, etc. However, since the model is trained on a dataset that is largely in English, the most likely outputs are in English.

The model exposes parameters (such as Temperature and Top_p) that control how varied or flexible its output is.
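
To make these two parameters concrete, here is a rough sketch of how temperature scaling and top-p (nucleus) filtering are commonly described. It works on a toy list of scores, not on a real model; the token list, the scores, and all function names are assumptions made for illustration, and a hosted model's actual implementation may differ.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize the kept tokens; every other token gets probability zero.
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def sample_token(tokens, logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token after applying temperature and top-p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(tokens, weights=probs, k=1)[0]


tokens = ["cat", "dog", "fish", "rock"]
logits = [2.0, 1.5, 0.5, -1.0]  # hypothetical scores for the next token

print(sample_token(tokens, logits, temperature=0.2, top_p=0.9))  # conservative settings
print(sample_token(tokens, logits, temperature=1.5, top_p=1.0))  # more varied settings
```

With a low temperature and a small Top_p, the sampler almost always picks the highest-scoring token. Raising either value lets less likely tokens through, which is what makes the output more varied.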

If you use GPT-3, ChatGPT, or any other model, keep these points in mind:

  1. Choose the correct level of learning for the prompt, that is, how many examples it includes: none (zero-shot), one (one-shot), or a few (few-shot)
  2. Design the prompt effectively

Designing the prompt is very important. This entails:

  1. Clearly defining the context
  2. Clearly defining the intent
  3. Providing examples that establish a clear ‘pattern’ between the inputs and the expected output
  4. Setting the model parameters correctly

The input or instruction, along with its context, that is provided to the model is called a prompt. The model responds to the prompt with a response, or completion. The better the prompt is designed, the more accurate the response.

The accuracy of the output from the LLM depends on the accuracy and level of detail of the prompt itself. In other words, the input prompts to GPT-3 must be engineered to obtain output that is close to the desired output.

OpenAI provides a series of models such as ‘code-davinci-002’ (which belongs to the Codex series) that translate between natural language and code.

These models currently support over a dozen programming languages. Some of the popular ones are Python, JavaScript, SQL, CSS, Go, and Shell.

The input prompt may be an instruction in natural language or a piece of code. The Codex models can bring significant efficiency to the Software Development Life Cycle (SDLC). They can:

  1. Generate code from requirements written in natural language
  2. Refactor code and correct various errors in it
  3. Add the necessary comments to the code
  4. Create documentation for a given piece of input code

At the time of writing, GPT-3 is most capable at Python code generation (per the OpenAI documentation).

GPT-3 – the hottest buzzword

GPT, or Generative Pre-trained Transformer, is the hottest buzzword in the field of AI, specifically in Natural Language Processing. GPT is a large-scale machine learning model that uses deep learning techniques to support natural language conversations that fit the context.

GPT-3 is the third iteration of the GPT language model developed by OpenAI. GPT-3 is trained on existing text to predict what comes next, which is what enables conversational interactions.

Given a small amount of input text, it can generate large volumes of contextually relevant text.

It can perform various tasks, such as:

  1. Translating text from one language to another
  2. Writing songs or poems
  3. Generating new text or stories
  4. Generating software code

Output is produced by sampling from the probabilities the model predicts. In general, the larger the model and the data it was trained on, the more accurate and sophisticated the conversation between a GPT-3 chatbot and a customer is likely to be.

When it was first released in 2018, GPT had 117 million parameters.

By comparison, the largest GPT-3 model has 175 billion parameters, which give it far more capacity to capture patterns in its training data.

GPT-3 175B identifies patterns and associations in text and weighs the relevance of words in context, which supports the conversational interface between itself and its users. It processes input through a stack of layers.

Each layer refines the representation produced by the previous one, so the final layers carry the richest view of the input.

The number of layers is a design choice: versions of the model exist with fewer or more layers, and depth is one of the factors that determine how well a model performs.

Based on examples, GPT-3 can form associations between various inputs and generate text such as sentences, passages, paragraphs, and new songs.

It can also identify patterns and associate words and concepts with each other. Thus, when it is asked to generate a sentence, passage, or song, the output, which is based on its training data, reads as if it had been “created” by a human.

Benefits of GPT-3

  • GPT has the potential to reduce the number of labeled examples needed to train a deep neural network.
  • It relies on techniques that train a model to predict parts of the input data itself, so the data does not need to be labeled.
  • This allows the neural network to learn high-quality features that were earlier possible only with high-quality labeled data.
  • The current iterations require minimal task-specific training and input data to generate output that is akin to human-written text.

On March 15, 2022, OpenAI made new versions of GPT-3 available. On November 30, 2022, OpenAI released ChatGPT, which was fine-tuned from a model in the GPT-3.5 series.

ChatGPT: Engaging naturally

As the name suggests, ChatGPT is a chatbot built on OpenAI’s GPT-3.5 family of Large Language Models. It improves on earlier chatbots in that it remembers earlier turns of a conversation, can filter content for defined purposes, and offers a few additional features.

Launched in November 2022, ChatGPT garnered unprecedented attention, given its ability to generate detailed responses for queries across many areas.

ChatGPT is an AI-powered tool that users can query in natural language.

OpenAI developed ChatGPT using a methodology known as Reinforcement Learning from Human Feedback or RLHF.

In this method, the AI is trained using a system of rewards and penalties: desired behavior is rewarded, while undesired behavior is penalized.

This feedback steers the model toward answers that people judge to be right and helpful. The methodology also enables conversational interaction that is free of jargon and technical terms.

In simpler terms, ChatGPT is a way of prompting AI to address a question in a human-like manner.

This feature makes it possible for a user to engage in a human-like conversation with ChatGPT and even generate original content.

But what makes ChatGPT stand apart is the “extra training” that it received from human AI trainers.

The initial language model was fed with a vast number of questions and answers, which were later incorporated into its dataset.

The program was also prompted to respond to a variety of questions, which experts ranked from best to worst.
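
One common way to turn such rankings into a training signal is to split each ranking into (preferred, rejected) pairs and score a reward model with a pairwise loss. The sketch below illustrates that idea only; it is not OpenAI's code, and the answers, reward scores, and function names are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(difference)): small when the preferred response scores higher."""
    margin = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-margin))


# Hypothetical ranking from a human labeler, best answer first.
ranking = ["answer A", "answer B", "answer C"]
# Hypothetical scores that a reward model assigned to the same answers.
rewards = {"answer A": 1.2, "answer B": 0.3, "answer C": -0.8}

pairs = ranking_to_pairs(ranking)
losses = [pairwise_loss(rewards[better], rewards[worse]) for better, worse in pairs]
print(pairs)
print(sum(losses) / len(losses))
```

The loss shrinks as the reward model scores the human-preferred answer further above the rejected one, so minimizing it pushes the reward model to agree with the human ranking.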

This is the real reason for ChatGPT’s ability to generate original, human-like responses after understanding the question and gathering the appropriate information.

The question to be asked now is how companies can make use of ChatGPT.

Since it is built on a Large Language Model and refined with Reinforcement Learning from Human Feedback, it can be used not just to automate repetitive tasks, but also to compile research, draft marketing content, generate code or customized instructions, provide after-sales service, increase customer engagement, and more.

An important point to note here is that, unlike other conversational chatbots, which learn from their interactions with the users asking questions, ChatGPT has been trained on external data, i.e., data that was available on various digital platforms such as Wikipedia and a filtered version of Common Crawl.

DALL-E 2

DALL-E and DALL-E 2 are deep learning models developed by OpenAI to generate digital images from natural language descriptions, called “prompts”.

With ChatGPT, one can generate various kinds of text (stories, poems, etc.). In a similar way, with DALL-E 2, one can generate various images. The original DALL-E uses a version of GPT-3 modified to generate images.

The name DALL-E is a blend of two names: WALL-E, the animated Pixar robot character, and Salvador Dalí, the Spanish surrealist artist.

 

You can watch my videos related to technology and fintech at my YouTube channel https://www.youtube.com/@raktim_hindi
