Meta’s LLaMA and the Great Grand LLM War — Here’s what you need to know.

ISTE NIT DURGAPUR
5 min read · Mar 27, 2023
Generated with Midjourney.com

Salesforce’s announcement of a new $250 million investment in generative AI is just a grain of sand among the AI advancements 2023 has seen so far.

HubSpot has already promised an OpenAI-powered chat experience for users of its marketing platform, while Discord launched a series of tools built on ChatGPT. Microsoft’s investment in OpenAI was enough to fuel the fire in a race that has now exploded. Microsoft, which was among the last of the big tech companies to enter the AI war (long led by Google’s DeepMind), has now sprung ahead, upending the entire game and forcing its competitors to change both their pace and direction.

After Google’s blunder in the launch of its LaMDA-based Bard, Amazon too published research on a model that it claims beats ChatGPT on ethics and bias benchmarks. The next giant to release a new product is Meta, with LLaMA.

The sentiments and stakes are high, especially after Google lost billions of dollars in market cap over the erred launch video. But Meta’s LLaMA competes on different ground. What is remarkable is that Meta released LLaMA openly: the code behind the model is public, the training data comes entirely from publicly available sources, and the model weights are available to researchers as well. This is a breather for the scientific community, if not revolutionary, at a time when “Open” AI, which was established to bring ethics into AI, has been training its models on licensed or purchased data (which it surely cannot release publicly due to legal bindings).

We are witnessing an unprecedented time: Microsoft’s Bing is seeing 100 million active users, and generative AI startups are raising billions of dollars even as the market is said to be in a recession. Amazing!

If you don’t know what LLMs are, here’s a heads-up.

A Large Language Model, or LLM, is a deep learning model that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive datasets.

Large Language Models are essentially transformer models, used not just for understanding or generating text but also for modeling human genetics and protein structures, writing software code, and more.

Meta launched LLaMA (Large Language Model Meta AI) on Feb 24th as a set of four pre-trained models of varied parameter sizes (7B, 13B, 33B, and 65B). The launch makes two points: one, a model as small as 13B parameters can strongly compete with a 175B-parameter model (GPT-3, the base of ChatGPT); and two, state-of-the-art models can be trained exclusively on publicly available datasets.

The LLaMA architecture, based on the original Transformer architecture with a few modernizing tweaks.

The technical nitty-gritty of how LLaMA works under the hood is not what this blog addresses; instead, it discusses the consequences and provides some context on the development.

How does it perform against other LLMs?

You might have heard that bigger is better, but that doesn’t always hold, especially for LLaMA: it’s smaller and still better!

The point OpenAI made by introducing GPT, and especially GPT-2, was that a model’s performance can be improved not only by tuning hyperparameters and other clever tricks but also by increasing the size of the model and the number of training tokens. Even without fine-tuning or benchmark-specific techniques, language models that are truly “large” perform well: they can predict the next token, or “word,” in a sequence of text and hence generate text remarkably.
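To make “predict the next token” concrete, here is a toy sketch, nothing like LLaMA’s actual transformer, just a bigram counter over a made-up corpus, showing the bare mechanic of picking the most likely next word:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows another in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent next token after `token`, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Tiny invented corpus: "cat" follows "the" most often.
corpus = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "the cat likes milk",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # -> "cat"
```

A real LLM replaces these frequency counts with a learned probability distribution over tens of thousands of tokens, conditioned on the whole preceding context rather than a single word, but the generation loop (predict, append, repeat) is the same idea.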

The training data is not highly structured text; it is ordinary internet text that underwent obvious cleaning (but surely not a strict bias and ethics review). The gains came at the expense of maintaining such large models: scaling them is a great challenge, if not a nightmare.

One can appreciate the usefulness of the technology that becomes open with LLaMA’s launch from the fact that Stanford researchers could nearly match ChatGPT’s results with the smallest pre-trained LLaMA model (7B parameters) after training it on roughly 52k instruction-following examples generated by an OpenAI model. The scale at which the two models operate is enormously different, and yet LLaMA could be made to match it for less than $600. Read that story here.

Why could LLaMA outcompete ChatGPT and even Bard?

Even though ChatGPT and Bard can visibly outperform LLaMA today, there are several reasons why LLaMA could stand out or even surpass both.

1. It is really hard to anonymize data from the internet, and too expensive to prepare data in a format that reflects how humans actually communicate. Meta has a real advantage here: Facebook’s vast trove of data is among the best real records of human interaction for building human-level AI, and it is multi-modal too.

2. Open access is essential for developing human-level AI, not just for the greater good but also for the government regulations to come. Meta’s strategy of open-sourcing the model, and training it only on public data, is a step ahead of the rest.

3. To get the best results, you need to fine-tune model weights to your specific use case, which is almost impossible with the present state of ChatGPT or Bard. But you can fine-tune LLaMA on Stanford’s Alpaca dataset to create a model of comparable quality to ChatGPT, or on your own custom dataset to get your own LLM!
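Conceptually, fine-tuning just means continuing gradient descent from the pre-trained weights on a small task-specific dataset. Doing this with LLaMA requires the released weights and a training framework; the snippet below is only a self-contained toy (all weights and data are invented) that shows the mechanic: start from “pre-trained” parameters, take a few gradient steps on new data, and watch the loss on that data drop:

```python
import math

def loss(w, b, data):
    """Mean logistic loss of a one-feature classifier on (x, label) pairs."""
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def fine_tune(w, b, data, lr=0.5, steps=200):
    """Continue gradient descent from existing weights on new data."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

# "Pre-trained" weights from some generic prior task (made up).
w0, b0 = 0.1, 0.0
# Small task-specific dataset: label 1 for positive x.
task_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

before = loss(w0, b0, task_data)
w1, b1 = fine_tune(w0, b0, task_data)
after = loss(w1, b1, task_data)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

The real recipe swaps the two scalars for billions of transformer weights and the four toy examples for the ~52k Alpaca instruction-response pairs, but the principle, adapting existing weights rather than training from scratch, is why the Stanford result was so cheap.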

But this doesn’t mean the end of the “Great Grand LLM War”; it is just the beginning. While OpenAI built on the early advances of Google’s DeepMind, Google’s Bard too has breakthroughs behind it in LaMDA and the Pathways architecture. More companies, like Baidu with its Ernie Bot, are set to join the race.

But this race carries serious threats. Almost everybody now complains that ChatGPT generates content that is made up: the tendency of large language models to “hallucinate.” One cannot rely on ChatGPT for anything that must be accurate or important. An honourable mention that has tried to address this is Amazon’s Alexa AI.

There are many ethical and social issues, such as data privacy, bias, fairness, and the environmental impact of these models, which are yet to be studied deeply.

Overall, this is going to be the year of generative AI. Great Times are Ahead!

Image generated with Midjourney, Generative AI

We hope you found this article informative and insightful. It was contributed by Ayush Anand. If you’re passionate about technology and have unique insights to share with our community, we welcome your contributions. Mail your article to blog@istenitdgp.com, and you can see it featured on our social media handles.

Join us in spreading awareness, fostering innovation, and creating value for tech enthusiasts everywhere!


ISTE NIT DURGAPUR

Premier society for technical education, career development & innovation. The oldest & most prolific chapter in East India, contributing to the IJTE since 1995