The AI Canvas Newsletter #14

Explore the latest AI innovations: Google's Gemini 1.5, OpenAI's Sora, and Stable Diffusion 3's creative surge.

This Week in AI: Innovations, Reports, and Features

🚀 Explore Google's Gemini 1.5: A breakthrough in AI with a million-token context window for deeper data analysis.

🎬 Meet Sora by OpenAI: Transforming text into videos, from cityscapes to animations, with minute-long creations.

🎨 Experience Stable Diffusion 3: Elevating text-to-image AI with multi-subject, high-quality image generation.

🌐 Discover Mistral Large: A multilingual language model challenging GPT-4 with advanced reasoning.

🛠️ Introducing Gemma: Google's suite of open models for responsible AI development, including a toolkit for safe applications.

🗣️ Hear Amazon's BASE TTS: A text-to-speech technology delivering natural-sounding voices from complex text.

🌌 Create with Genie: Google's AI that turns images into interactive, playable worlds without action labels.

📹 Delve into Meta's V-JEPA: Advancing machine perception with self-supervised video analysis for a deeper understanding of the physical world.

Google's Gemini 1.5 and the Leap in AI Contextual Understanding

Google's latest AI model, Gemini 1.5, pairs improved performance with a long-context window that can process up to 1 million tokens. This allows deeper analysis and understanding across data types, from extensive codebases to lengthy video content. The model also uses a Mixture-of-Experts architecture, which makes it more efficient to train and serve, and is currently available in limited preview to developers and enterprise customers.

Find out more on the announcement page.
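
For readers new to Mixture-of-Experts, the general idea is that a small router scores a set of expert sub-networks and only the top-scoring few run for a given input. The sketch below is a toy illustration of top-k routing in plain Python, not Gemini's actual design: the scalar input, the `gate_weights` values, and the `moe_forward` helper are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Route a scalar input to the top-k experts and mix their outputs.

    gate_weights: one hypothetical router weight per expert (toy stand-in
    for a learned router). Only the k selected experts are evaluated,
    which is where the efficiency gain comes from.
    """
    logits = [w * x for w in gate_weights]
    # Indices of the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the scores of the selected experts only.
    weights = softmax([logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Four toy "experts"; a real model would use feed-forward sub-networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
gate_weights = [0.1, 2.0, -1.0, 2.0]

output, chosen = moe_forward(3.0, gate_weights, experts, k=2)
print(chosen, output)
```

With four experts and k=2, only half of the expert computation runs per input, while the total parameter count stays the same.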

Sora: The AI That Crafts Videos from Text Descriptions

Sora is an AI model designed to create videos from textual prompts, producing scenes that range from realistic cityscapes to imaginative animations. The model, which is being tested by visual artists and red teamers, can generate videos up to a minute long, with a focus on adhering to the details of the user's instructions. Despite its capabilities, Sora is still being refined to overcome challenges in physical simulation and temporal consistency.

Check out OpenAI’s announcement here.

Stable Diffusion 3: Text-to-Image AI Evolves

Stable Diffusion 3, the latest text-to-image AI model, offers enhanced capabilities for generating multi-subject images, with improved image quality and more accurate spelling of rendered text. It is currently in early preview, with a waitlist open for sign-ups, and the model family ranges from 800M to 8B parameters to suit different scalability and quality needs.

Find out more here.

Mistral Large: A Competitor to GPT-4 with Multilingual Prowess

Mistral AI introduces Mistral Large, a language model that rivals GPT-4 in performance, offering advanced reasoning and multilingual support for English, French, Spanish, German, and Italian. Available on La Plateforme and Azure, it provides developers with features like JSON formatting and function calling, alongside the efficient Mistral Small for latency-sensitive tasks.

Read more here.

Gemma: Google's New Open Models

Google has unveiled Gemma, a suite of open models designed to empower developers and researchers in creating AI responsibly. The release includes lightweight Gemma 2B and 7B models, a Responsible Generative AI Toolkit for safe application development, and comprehensive support across major AI frameworks and hardware platforms. Gemma models are optimised for performance and safety, with a commitment to commercial usage under responsible terms.

Read more here.

BASE TTS: Amazon's Pioneering Text-to-Speech Model

Amazon has developed BASE TTS, a new technology that turns text into speech that sounds strikingly natural. By learning from a vast amount of speech data, this system can handle complex sentences with ease, making computer-generated voices more relatable and easier to understand.

Have a listen to the samples here.

Meta’s V-JEPA: Enhancing Machine Perception with Self-Supervised Video Analysis

Meta's release of the Video Joint Embedding Predictive Architecture (V-JEPA) marks a significant advancement in machine intelligence, focusing on self-supervised learning to interpret complex interactions within videos. The model, which is released under a non-commercial license, offers researchers a new tool to enhance AI's grasp of the physical world, promising more efficient learning and adaptability for a variety of tasks without the need for extensive labelled data.

Have a read here.

Genie: Crafting Interactive Worlds from Images

Genie is a novel foundation world model that can create interactive, playable environments from various image prompts, including photographs and sketches. Trained on internet videos without action labels, Genie learns to understand controllable elements and infer consistent latent actions, paving the way for endless virtual world generation and the development of generalist AI agents.

Have a read here.

Technical Reads

Thinking about High-Quality Human Data – Lilian Weng

“High-quality data is the fuel for modern data deep learning model training. Most of the task-specific labeled data comes from human annotation, such as classification task or RLHF labeling (which can be constructed as classification format) for LLM alignment training. Lots of ML techniques in the post can help with data quality, but fundamentally human data collection involves attention to details and careful execution.”

Beyond Self-Attention: How a Small Language Model Predicts the Next Token – Shyam Pather

“I trained a small (~10 million parameter) transformer following Andrej Karpathy’s excellent tutorial, Let’s build GPT: from scratch, in code, spelled out. After getting it working, I wanted to understand, as deeply as possible, what it was doing internally and how it produced its results.”

Large Language Models: A Survey

“The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations.”

Synthetic Data for Finetuning: Distillation and Self-Improvement – Eugene Yan

“It is increasingly viable to use synthetic data for pretraining, instruction-tuning, and preference-tuning. Synthetic data refers to data generated via a model or simulated environment, instead of naturally occurring on the internet or annotated by humans.”

Neural network training makes beautiful fractals – Jascha Sohl-Dickstein

“My five year old daughter came home from kindergarten a few months ago, and told my partner and I that math was stupid (!). We have since been working (so far successfully) to make her more excited about all things math, and more proud of her math accomplishments. One success we've had is that she is now very interested in fractals in general, and in particular enjoys watching deep zoom videos into Mandelbrot and Mandelbulb fractal sets, and eating romanesco broccoli. My daughter's interest has made me think a lot about fractals, and about the ways in which fractals relate to a passion of mine, which is artificial neural networks.”

Needs Before Tools: A Pragmatic Approach to AI Workflow Integration – TfT Hacker

“Exploring the crucial step of identifying specific needs before selecting AI tools, ensuring technology serves as a solution, not just innovation.”

Projects and Code

Say What? Chat With RTX Brings Custom Chatbot to NVIDIA RTX AI PCs

“Tech demo gives anyone with an RTX GPU the power of a personalized GPT chatbot.”

Temporian

“Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖”

openllmetry

“Open-source observability for your LLM application, based on OpenTelemetry.”

magika

“Detect file content types with deep learning”

Learning

Let's build the GPT Tokenizer

“In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.”
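
As a taster, the GPT tokenizers are based on byte-pair encoding (BPE): start from raw UTF-8 bytes and repeatedly merge the most frequent adjacent pair into a new token. The sketch below is a minimal illustration of that training loop, not the lecture's code; the function names (`most_frequent_pair`, `merge`, `train_bpe`) and the sample string are assumptions chosen for the example.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0-255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(len("aaabdaaabac"), "bytes ->", len(ids), "tokens")
```

Each merge shortens the sequence at the cost of a larger vocabulary, which is the trade-off a tokenizer's vocabulary size controls.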

Business and Trends

Journey Through the AI Canvas Podcast

Dive into 'The AI Canvas', our podcast exploring the transformative potential of generative AI. Engage in fireside chats, case studies, and innovative discussions on AI’s impact on industries and creativity.

Latest Episode of the AI Canvas

The AI Canvas - Generative AI in the Classroom: The Future of Learning with Francisco Recalde

In this enlightening episode of the AI Canvas podcast, host David Foster sits down with Francisco Recalde, Head of the Department of Languages at Dixon's Unity Academy, to explore the transformative effects of AI on education. They discuss AI’s potential in teaching and learning, the fear of AI replacing teachers, and the role of AI as a guide for students.

David Foster

Founding Partner, ADSP

Looking for more specialised consultancy?

At ADSP we’re a team of data experts who build AI products with purpose.

Get in Touch Today!