The AI Canvas Newsletter #14
The AI Canvas: Your weekly palette of inspiration, insights, and innovation in the world of AI.
🚀 Explore Google's Gemini 1.5: A breakthrough in AI with a million-token context window for deeper data analysis.
🎬 Meet Sora by OpenAI: Transforming text into videos, from cityscapes to animations, with minute-long creations.
🎨 Experience Stable Diffusion 3: Elevating text-to-image AI with multi-subject, high-quality image generation.
🌐 Discover Mistral Large: A multilingual language model challenging GPT-4 with advanced reasoning.
🛠️ Introducing Gemma: Google's suite of open models for responsible AI development, including a toolkit for safe applications.
🗣️ Hear Amazon's BASE TTS: A text-to-speech technology delivering natural-sounding voices from complex text.
🌌 Create with Genie: Google's AI that turns images into interactive, playable worlds without action labels.
📹 Delve into Meta's V-JEPA: Advancing machine perception with self-supervised video analysis for a deeper understanding of the physical world.
Written by Oli Wilkins.
Google's Gemini 1.5 and the Leap in AI Contextual Understanding
Google's latest AI model, Gemini 1.5, introduces a substantial enhancement in performance with a long-context window capable of processing up to 1 million tokens. This allows for deeper analysis and understanding across various data types, from extensive codebases to lengthy video content. The model also uses a Mixture-of-Experts architecture, which improves efficiency in training and serving, and is currently available in limited preview to developers and enterprise customers.
Find out more on the announcement page.
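For readers curious about what a Mixture-of-Experts layer does, here is a minimal sketch of generic sparse top-k routing in plain Python. It is not Gemini 1.5's implementation, whose routing details are not described here; the names `moe_forward`, `gate_weights`, and `experts` are hypothetical and chosen only for illustration. The idea shown is that a small gating function scores every expert, only the top `k` experts actually run, and their outputs are blended by the normalised gate scores.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Route input vector x through the top-k experts (illustrative only).

    gate_weights: one weight vector per expert; the gate logit for an
                  expert is the dot product of its weights with x.
    experts:      list of callables mapping a vector to a vector of the
                  same length.
    Returns the blended output and the indices of the experts that ran.
    """
    # One gate logit per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate scores over the selected experts.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out, top
```

Because only `k` of the experts are evaluated per input, compute per token stays roughly constant even as the total number of experts (and parameters) grows, which is the efficiency argument usually made for this family of architectures.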
Sora: The AI That Crafts Videos from Text Descriptions
Sora is an AI model designed to create videos from textual prompts, producing scenes that range from realistic cityscapes to imaginative animations. The model, which is being tested by visual artists and red teamers, can generate videos up to a minute long, with a focus on adhering to the details of the user's instructions. Despite its capabilities, Sora is still being refined to overcome challenges in physical simulation and temporal consistency.
Check out OpenAI’s announcement here.
Stable Diffusion 3: Text-to-Image AI Evolves
Stable Diffusion 3, the latest text-to-image AI model, offers enhanced capabilities for generating multi-subject images with superior quality and accurate spelling. Currently in early preview with a waitlist open for sign-ups, the model suite ranges from 800M to 8B parameters, giving users options for scalability and creative flexibility.
Find out more here.
Mistral Large: A Competitor to GPT-4 with Multilingual Prowess
Mistral AI introduces Mistral Large, a language model that rivals GPT-4 in performance, offering advanced reasoning and multilingual support for English, French, Spanish, German, and Italian. Available on La Plateforme and Azure, it provides developers with features like JSON formatting and function calling, alongside the efficient Mistral Small for latency-sensitive tasks.
Read more here.
Gemma: Google's New Open Models
Google has unveiled Gemma, a suite of open models designed to empower developers and researchers in creating AI responsibly. The release includes lightweight Gemma 2B and 7B models, a Responsible Generative AI Toolkit for safe application development, and comprehensive support across major AI frameworks and hardware platforms. Gemma models are optimised for performance and safety, with a commitment to commercial usage under responsible terms.
Read more here.
BASE TTS: Amazon's Pioneering Text-to-Speech Model
Amazon has developed BASE TTS, a new technology that turns text into speech that sounds strikingly natural. By learning from a vast amount of speech data, this system can handle complex sentences with ease, making computer-generated voices more relatable and easier to understand.
Have a listen to the samples here.
Meta’s V-JEPA: Enhancing Machine Perception with Self-Supervised Video Analysis
Meta's release of the Video Joint Embedding Predictive Architecture (V-JEPA) marks a significant advancement in machine intelligence, focusing on self-supervised learning to interpret complex interactions within videos. The model, which is released under a non-commercial license, offers researchers a new tool to enhance AI's grasp of the physical world, promising more efficient learning and adaptability for a variety of tasks without the need for extensive labelled data.
Have a read here.
Genie: Crafting Interactive Worlds from Images
Genie is a novel foundation world model that can create interactive, playable environments from various image prompts, including photographs and sketches. Trained on internet videos without action labels, Genie learns to understand controllable elements and infer consistent latent actions, paving the way for endless virtual world generation and the development of generalist AI agents.
Have a read here.
Technical Reads
Thinking about High-Quality Human Data – Lilian Weng
“High-quality data is the fuel for modern data deep learning model training. Most of the task-specific labeled data comes from human annotation, such as classification task or RLHF labeling (which can be constructed as classification format) for LLM alignment training. Lots of ML techniques in the post can help with data quality, but fundamentally human data collection involves attention to details and careful execution.”
Large Language Models: A Survey
“The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations.”
Synthetic Data for Finetuning: Distillation and Self-Improvement – Eugene Yan
“It is increasingly viable to use synthetic data for pretraining, instruction-tuning, and preference-tuning. Synthetic data refers to data generated via a model or simulated environment, instead of naturally occurring on the internet or annotated by humans.”
Neural network training makes beautiful fractals – Jascha Sohl-Dickstein
“My five year old daughter came home from kindergarten a few months ago, and told my partner and I that math was stupid (!). We have since been working (so far successfully) to make her more excited about all things math, and more proud of her math accomplishments. One success we've had is that she is now very interested in fractals in general, and in particular enjoys watching deep zoom videos into Mandelbrot and Mandelbulb fractal sets, and eating romanesco broccoli. My daughter's interest has made me think a lot about fractals, and about the ways in which fractals relate to a passion of mine, which is artificial neural networks.”
Needs Before Tools: A Pragmatic Approach to AI Workflow Integration – TfT Hacker
“Exploring the crucial step of identifying specific needs before selecting AI tools, ensuring technology serves as a solution, not just innovation.”
Projects and Code
Say What? Chat With RTX Brings Custom Chatbot to NVIDIA RTX AI PCs
“Tech demo gives anyone with an RTX GPU the power of a personalized GPT chatbot.”
Temporian
“Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖”
openllmetry
“Open-source observability for your LLM application, based on OpenTelemetry.”
magika
“Detect file content types with deep learning”
Learning
Let's build the GPT Tokenizer – Andrej Karpathy
“In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.”
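To give a flavour of what building a tokenizer from scratch involves, here is a minimal sketch of byte-level byte pair encoding (BPE), the general technique behind the GPT tokenizers. It is a simplified illustration, not the lecture's code or OpenAI's implementation: it omits the regex pre-splitting and special tokens that production tokenizers use, and the function names (`train_bpe`, `merge`, `decode`) are hypothetical.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there are no pairs."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the raw UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id_a, id_b) -> new token id, kept in training order
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Map token ids back to text by expanding merges in training order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Calling `train_bpe("aaabdaaabac", 3)` and then `decode` on the result round-trips the original string while using fewer tokens than bytes. Because the learned merges depend entirely on the training text, the same word can split very differently across tokenizers.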
Business and Trends
🚀 Don't miss your weekly dose of cutting-edge AI innovations with The AI Canvas newsletter!
Subscribe now to ensure you never miss out on these transformative insights.
Looking for more specialised consultancy? At ADSP we’re a team of data experts who build AI products with purpose.
We deliver data science projects for companies who want to harness the power that AI can bring to their organisation. Get in touch at hello@adsp.ai.
Stay tuned with The AI Canvas podcast for in-depth episodes exploring Generative AI's transformative role across various industries.