Paper page - Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Join the discussion on this paper page
@jamalir
rasbt/llama-3.2-from-scratch · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://x.com/llamafactory_ai/status/1893879214727991504?t=0rz_iG3YO_ppFRatiDqh0A&s=09
Meta AI Releases the Video Joint Embedding Predictive Architecture (V-JEPA) Model: A Crucial Step in Advancing Machine Intelligence - MarkTechPost
Paper page - Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Join the discussion on this paper page
SmolVLM2: Bringing Video Understanding to Every Device
Magma: A Foundation Model for Multimodal AI Agents
Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face
A Blog post by Daniel Voigt Godoy on Hugging Face
TGI Multi-LoRA: Deploy Once, Serve 30 Models
Paper page - DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Join the discussion on this paper page
Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL! | Qwen
We release Qwen2.5-VL, the new flagship vision-language model of Qwen and also a significan
Open-R1: a fully open reproduction of DeepSeek-R1
GitHub - DAMO-NLP-SG/VideoLLaMA3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper page - MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Join the discussion on this paper page
Paper page - 1.58-bit FLUX
Join the discussion on this paper page
Apollo: An Exploration of Video Understanding in Large Multimodal Models