Distributed ML infrastructure, built from LatAm
An open architecture for running ML workloads across heterogeneous GPUs, distributed across three sites in Peru and Argentina. Forecasting, efficient LLMs and tabular ML in production today.
Where ColabHive is today
Honest snapshot. What runs in production today, what we're actively exploring, and what's still on the roadmap.
- 42 models available across forecasting, LLMs, tabular ML and specialists
- Forecasting workloads in production for Aureus Finance (anchor customer)
- LLM, embedding and classification workloads for GiftAI and TakeYourSide
- Distributed orchestration across 3 LatAm sites (Lima, Córdoba, Buenos Aires)
- REST API and Python SDK
- Intel Arc GPU integration to diversify beyond Nvidia/CUDA — 8 cards in evaluation, courtesy of Intel
- Auto-mode for automatic model selection
- Onboarding additional compute contributors
- HiveCredits economic layer (rewards for GPU contributors and model publishers)
- Third-party node onboarding with privacy guarantees
- Energy-aware routing with measured telemetry — architecture supports it, instrumentation in progress
- Self-serve signup and public pricing
Anchor customer. Financial forecasting in production.
AI inference hardware partner.
Exploring Arc GPU integration. 8 cards in evaluation.
What is ColabHive?
ColabHive is a distributed ML infrastructure, not a single model. The platform runs right-sized models across heterogeneous GPUs, distributed across multiple sites.
Instead of one massive model running in a centralized datacenter, ColabHive runs many specialized models — efficient LLMs, embedding and classification heads, forecasting transformers — across heterogeneous hardware in three sites today, with an architecture designed to incorporate additional contributors over time.
Right-sized models on appropriate hardware, scheduled across whatever capacity is available.
Who's Who in the Hive
End Users / Product Teams
People who send prompts to ColabHive via the API or an app. They see "one AI endpoint" that solves complex tasks behind the scenes.
Builders / Developers
People who use ColabHive to build products and models: call the API, upload datasets, train models, deploy agents (private or public).
GPU Owners / Node Operators
People who run the ColabHive Node Runtime on their GPUs. Their machines become nodes that host and run expert models/agents. Compute contributor program in design.
Behind the scenes, the Orchestrator reads requests, breaks them into subtasks, selects the right specialist agents, and combines results into unified answers.
Why This Architecture Matters
Train smarter, not bigger
Traditional approach:
- Requires thousands of GPUs in one location
- Needs NVLink, InfiniBand, or similar high-speed interconnects
- Massive upfront capital investment
- Single monolithic model = single point of failure
- Old hardware becomes useless quickly
ColabHive approach:
- Runs on heterogeneous GPUs (consumer and datacenter)
- No special interconnects required
- Trains Core + Specialist Agents independently
- Automatic dataset partitioning by domain/language
- Distributes workloads across multiple sites
The key insight: Instead of training one 70B+ parameter model that requires enterprise infrastructure, ColabHive trains a core model + specialized expert agents (adapters, LoRA, domain fine-tunes) on partitioned datasets. Experts train as independent but coordinated jobs across the hive, then work together during inference.
This means you can train production-grade ML using distributed, heterogeneous hardware across multiple sites — without needing thousands of identical GPUs in one rack.
Here's how it works: As an end user, when you send a request, the orchestrator reads it, breaks it into subtasks, selects the right expert agents from the network, and orchestrates their collaboration.
You receive one coherent, unified answer—but behind the scenes, multiple specialized AIs running on node operators' GPUs worked together to solve your problem.
Compute contributors (program in design)
All compute today is provided by our anchor customer across three sites in LatAm. We're designing a contributor program for organizations with available GPU capacity who want to run workloads on a heterogeneous-hardware architecture.
The economic model that compensates contributors (working name: HiveCredits) is still in design. We are not yet onboarding third-party nodes in production.
If you have hardware you'd like to contribute and want early conversations, get in touch.
Build on a Multi-Agent Platform
ColabHive isn't just an API to one model. It's a platform for building AI products that leverage many models and distributed compute.
As a builder, you'll call the ColabHive API to send complex tasks that only a swarm of specialized agents can solve. Need legal analysis combined with financial modeling and code generation? Send one request—the hive assembles the right team of expert agents automatically.
You don't have to pick models or manage infrastructure. You describe the problem, and the hive orchestrates the solution. Direct model endpoints remain available when you need control.
What You Can Do as a Builder
1. Discover & Register Models from Hugging Face
Search the Hugging Face Hub directly from ColabHive. Filter by task type, model size, license, and popularity. We check compatibility automatically—architecture, license, VRAM requirements—and let you register with one click.
- → Task types: embeddings, reranking, translation, OCR, speech-to-text, moderation, text generation
- → One-click registration: Add to your Hive and run inference immediately
- → Lifecycle management: Track models from candidate → ready → preferred
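The candidate → ready → preferred lifecycle can be read as a small state machine. The sketch below is only an illustration: the `Lifecycle` enum, the `promote` function, and the rule that a model advances one stage at a time are assumptions, not ColabHive's actual registry logic.

```python
from enum import Enum


class Lifecycle(Enum):
    CANDIDATE = "candidate"
    READY = "ready"
    PREFERRED = "preferred"


# Assumed rule: a model may only advance one stage at a time.
_NEXT = {
    Lifecycle.CANDIDATE: Lifecycle.READY,
    Lifecycle.READY: Lifecycle.PREFERRED,
}


def promote(state: Lifecycle) -> Lifecycle:
    """Return the next lifecycle stage, or raise if already at the last one."""
    if state not in _NEXT:
        raise ValueError(f"{state.value} is already the final stage")
    return _NEXT[state]


state = Lifecycle.CANDIDATE
state = promote(state)  # candidate -> ready
state = promote(state)  # ready -> preferred
print(state.value)      # prints "preferred"
```

Keeping the transitions in one table makes it easy to add a demotion or retirement path later without touching callers.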
2. Run Inference via API or SDK
Call your registered models via our unified API or Python SDK. Send tasks to specific endpoints or let the orchestrator route to the right specialist automatically.
- → Direct endpoints: Call specific models when you need control
- → Orchestrated routing: Let the hive choose the best model for the task
3. Train & Fine-Tune Your Own Models
Upload datasets via our Developer Console or API. Configure training jobs: select base models, define hyperparameters, specify hardware requirements (e.g. "24GB+ GPU", "multi-GPU").
Train where it fits. You define the job; ColabHive finds the nodes with enough VRAM and capacity. Track metrics, logs, and versions in our model registry.
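Matching a job's hardware requirements against available nodes might look roughly like the sketch below. Everything here is hypothetical: the `Node` and `HardwareProfile` types, the node names, and the "smallest adequate node first" ordering are assumptions made for illustration, not the scheduler's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    vram_gb: int      # VRAM per GPU
    gpu_count: int
    busy: bool = False


@dataclass(frozen=True)
class HardwareProfile:
    min_vram_gb: int
    min_gpus: int = 1


def eligible_nodes(nodes, profile):
    """Return idle nodes that satisfy the profile, smallest adequate node first."""
    fits = [
        n for n in nodes
        if not n.busy
        and n.vram_gb >= profile.min_vram_gb
        and n.gpu_count >= profile.min_gpus
    ]
    # Assumed policy: prefer the smallest node that fits, keeping big ones free.
    return sorted(fits, key=lambda n: (n.vram_gb, n.gpu_count))


# Hypothetical inventory; names and sizes are made up.
nodes = [
    Node("lima-01", vram_gb=12, gpu_count=1),
    Node("cordoba-01", vram_gb=24, gpu_count=2),
    Node("bsas-01", vram_gb=80, gpu_count=4),
    Node("bsas-02", vram_gb=24, gpu_count=1, busy=True),
]

# Encodes the requirements "24GB+ GPU" and "multi-GPU".
profile = HardwareProfile(min_vram_gb=24, min_gpus=2)
print([n.name for n in eligible_nodes(nodes, profile)])  # ['cordoba-01', 'bsas-01']
```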
Distributed Training at Scale
ColabHive supports Core Model + Expert Agents training: automatically partition your datasets, train multiple specialized experts independently, and deploy them as a coordinated group.
Traditional: Needs thousands of GPUs connected via NVLink/InfiniBand for tight gradient sync.
ColabHive: Reuse old hardware anywhere in the world. Train experts independently, with no high-speed interconnect required.
- → Automatic dataset partitioning by domain/language/difficulty
- → Train experts as independent jobs (adapters/LoRA, fine-tunes)
- → Deploy as a preferred group for coordinated inference
- → Orchestrator routes to the right expert based on request taxonomy
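As a rough illustration of partition-then-train, the sketch below groups records by metadata and plans one independent adapter job per shard. The record fields, the `partition` and `plan_jobs` helpers, and the job dictionary layout are all assumptions; no actual training is performed.

```python
from collections import defaultdict


def partition(records, keys=("domain", "language")):
    """Group records into shards keyed by the chosen metadata fields."""
    shards = defaultdict(list)
    for record in records:
        shard_key = tuple(record[k] for k in keys)
        shards[shard_key].append(record)
    return dict(shards)


def plan_jobs(shards, base_model):
    """Plan one independent adapter job per shard; jobs share no state."""
    return [
        {
            "base_model": base_model,
            "shard": key,
            "num_examples": len(rows),
            "method": "lora",
        }
        for key, rows in sorted(shards.items())
    ]


# Made-up sample records, for illustration only.
records = [
    {"text": "contrato de arrendamiento", "domain": "legal", "language": "es"},
    {"text": "lease agreement", "domain": "legal", "language": "en"},
    {"text": "flujo de caja", "domain": "finance", "language": "es"},
    {"text": "cláusula penal", "domain": "legal", "language": "es"},
]

jobs = plan_jobs(partition(records), base_model="core-model")
for job in jobs:
    print(job["shard"], job["num_examples"])
```

Because each job only reads its own shard, the jobs can be dispatched to different nodes without any gradient exchange between them.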
4. Deploy Models as Endpoints (Private or Public)
After training, choose how to deploy:
Private: Only accessible to your apps/organization. Not visible in the public catalog.
Public: Exposed across the hive. Callable by other builders. Subject to review, benchmarking, and safety checks.
Public agents appear in the catalog with domain, capabilities, version info, and evaluation stats. The orchestrator can automatically choose among them when solving tasks.
Three Layers of the Hive
- Home GPU rigs
- Small datacenters
- Future: ASICs

- Versioned
- Benchmarked
- Specialized

- Node registry
- Availability tracking
- Reputation system
From User Prompt to Unified Answer
See how a single request flows through the hive—from your prompt to expert execution to a unified response.
1. End User / App
Sends a request via the ColabHive API
2. Orchestrator
Reads request, breaks into subtasks
- → Legal analysis
- → Math/optimization
- → Code snippet
3. Agents on Nodes
Each expert runs its part
- Legal agent → Node A
- Math agent → Node B
- Code agent → Node C
4. Aggregation
Orchestrator collects outputs, resolves conflicts, synthesizes insights
5. Unified Answer
Returned to user/app via API, as if one model solved everything
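The five steps above can be sketched as a toy pipeline. This is a deliberately simplified stand-in: the keyword table replaces whatever request classifier the real orchestrator uses, the agents are plain functions rather than models on remote nodes, and every name here is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical specialist agents: each is just a function from text to text.
AGENTS: Dict[str, Callable[[str], str]] = {
    "legal": lambda task: f"[legal] reviewed: {task}",
    "math": lambda task: f"[math] solved: {task}",
    "code": lambda task: f"[code] wrote: {task}",
}

# Assumed keyword table standing in for a real request classifier.
KEYWORDS = {
    "legal": ("contract", "clause", "liability"),
    "math": ("optimize", "minimize", "forecast"),
    "code": ("script", "function", "python"),
}


def decompose(request: str) -> List[Tuple[str, str]]:
    """Split a request into sentences and tag each with a specialist domain.

    Sentences that match no keyword are dropped in this toy version.
    """
    subtasks = []
    for sentence in filter(None, (s.strip() for s in request.split("."))):
        lowered = sentence.lower()
        for domain, words in KEYWORDS.items():
            if any(w in lowered for w in words):
                subtasks.append((domain, sentence))
                break
    return subtasks


def orchestrate(request: str) -> str:
    """Route each subtask to its agent and join the outputs into one answer."""
    outputs = [AGENTS[domain](task) for domain, task in decompose(request)]
    return "\n".join(outputs)


print(orchestrate(
    "Check the liability clause. Minimize the shipping cost. "
    "Write a Python script for it."
))
```

A real aggregation step would also reconcile conflicting outputs; here it only concatenates them.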
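Behind the scenes: The orchestrator is designed to weigh not only performance and capability but also energy and carbon cost, routing jobs to cleaner grids when latency allows. The architecture supports this today; the measured telemetry it depends on is still being instrumented.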
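One simple way to express this trade-off is a weighted cost with a hard latency budget. The sketch below is an assumption-heavy illustration: the `Candidate` fields, the weights, the scaling constants, and all numbers are invented for the example and are not measurements or ColabHive's actual cost function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node: str
    latency_ms: float         # expected end-to-end latency
    capability: float         # 0..1, how well the model fits the task
    grid_gco2_per_kwh: float  # carbon intensity of the site's grid
    energy_kwh: float         # estimated energy for the job


def cost(c, w_latency=1.0, w_capability=1.0, w_carbon=1.0):
    """Weighted sum, lower is better. Scaling constants are arbitrary."""
    carbon_g = c.grid_gco2_per_kwh * c.energy_kwh
    return (
        w_latency * c.latency_ms / 1000.0
        + w_capability * (1.0 - c.capability)
        + w_carbon * carbon_g / 100.0
    )


def route(candidates, max_latency_ms):
    """Pick the cheapest candidate among those meeting the latency budget."""
    allowed = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not allowed:
        raise ValueError("no candidate meets the latency budget")
    return min(allowed, key=cost)


# Illustrative numbers only.
candidates = [
    Candidate("site-a", latency_ms=200, capability=0.9,
              grid_gco2_per_kwh=800, energy_kwh=0.05),
    Candidate("site-b", latency_ms=450, capability=0.9,
              grid_gco2_per_kwh=100, energy_kwh=0.05),
]

print(route(candidates, max_latency_ms=1000).node)  # loose budget: cleaner grid wins
print(route(candidates, max_latency_ms=300).node)   # tight budget: only site-a qualifies
```

Treating latency as a hard constraint first and a soft cost second is what makes "when latency allows" explicit in the routing decision.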
From Training to Deployment to Usage
See how builders train models on the hive, deploy them as agents, and make them available for orchestration.
Builder / Developer Workspace
- → Uploads datasets via console or API
- → Defines training/fine-tuning job: base model, hyperparameters, hardware profile
Training Scheduler & Orchestrator
- → Validates the job
- → Finds eligible nodes (e.g. GPUs with enough VRAM)
- → Dispatches training job across one or more nodes
Nodes (Hive Cells) – Training Execution
- → Nodes download data shards or receive batches
- → Run training loops
- → Report metrics (loss, accuracy, etc.) to builder's dashboard
Model Registry
Trained weights and configs are versioned. Metadata includes datasets used, code version, training run identifiers.
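A minimal sketch of a versioned registry entry, assuming an append-only history per model name. The `ModelVersion` and `Registry` classes and their field names are hypothetical and only mirror the metadata listed above (datasets, code version, run identifier).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    datasets: Tuple[str, ...]
    code_version: str
    run_id: str


@dataclass
class Registry:
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name, datasets, code_version, run_id) -> ModelVersion:
        """Append a new immutable version; earlier versions are never overwritten."""
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(history) + 1, tuple(datasets),
                             code_version, run_id)
        history.append(entry)
        return entry

    def latest(self, name) -> ModelVersion:
        """Return the most recently registered version."""
        return self._versions[name][-1]

    def history(self, name) -> List[ModelVersion]:
        """Return a copy of all versions, oldest first."""
        return list(self._versions[name])


# Hypothetical model name, dataset names, and identifiers.
registry = Registry()
registry.register("demand-forecaster", ["sales-2023"], "abc123", "run-001")
registry.register("demand-forecaster", ["sales-2023", "sales-2024"], "def456", "run-002")

print(registry.latest("demand-forecaster").version)  # 2
print(len(registry.history("demand-forecaster")))    # 2
```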
Deployment as Agent
Builder chooses:
Private: used only by their own apps
Public: submitted for review
Review & Publication (Public Agents Only)
- ✓ Automated evaluations & benchmarks
- ✓ Safety and robustness checks
- ✓ Domain-specific tests (legal, medical, finance, etc.)
- ✓ Once approved, agent is listed in hive catalog
Usage & Orchestration
Orchestrator can now select this agent when solving relevant tasks. Other builders can explicitly call it if allowed.
Continuous Improvement
Builder can retrain/fine-tune. New versions go through the same flow. Old versions remain traceable in the registry.
Hybrid Nodes: Private-first, Public-when-idle
Every ColabHive node is designed to serve private workloads for its owner and public marketplace jobs when idle. The control plane, data plane, and financial core work together to enable secure, fair hybrid execution.
Elastic hybrid execution (architectural design)
The flows below describe how the architecture is designed to handle private vs. shared capacity once the contributor program and economic layer (HiveCredits) are operational. These are design targets, not the current operating mode. Today, all production capacity is provided by our anchor customer.
The intent: contributors run their own workloads on their own hardware with priority, and optionally expose idle capacity to other workloads scheduled across the network. Borrowing capacity from the network when local hardware is saturated is the symmetric flow.
The Financial Core is the settlement layer that will mediate these flows once the economic model is finalized: shared-capacity jobs go through escrow; jobs running on a contributor's own hardware are tracked for telemetry but don't trigger settlement.
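The private-first rule and the escrow split described above could be modeled as follows. Since the economic layer is still in design, treat this purely as a sketch: the `Job` type, the two-level priority, and the `needs_escrow` rule are assumptions drawn from the description, not an implemented protocol.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Job:
    job_id: str
    submitter: str


def needs_escrow(job, node_owner):
    """Shared-capacity jobs settle through escrow; owner jobs are telemetry-only."""
    return job.submitter != node_owner


def run_order(jobs, node_owner):
    """Owner jobs first (priority 0), shared jobs after (priority 1), FIFO within each."""
    # The arrival index breaks ties, so Job objects are never compared directly.
    heap = [(0 if j.submitter == node_owner else 1, i, j) for i, j in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, job = heapq.heappop(heap)
        order.append(job)
    return order


# Hypothetical submitters and job ids.
queue = [Job("j1", "acme"), Job("j2", "owner"), Job("j3", "beta")]
for job in run_order(queue, node_owner="owner"):
    print(job.job_id, "escrow" if needs_escrow(job, "owner") else "telemetry only")
```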
Distributed Training at Scale
Train Core Models + Expert Agents on partitioned datasets without needing enterprise-grade interconnects
❌ Traditional Approach
Training large AI models traditionally requires:
- Thousands of identical GPUs in one location
- NVLink / InfiniBand high-speed interconnects
- Tight gradient synchronization (all-reduce, ring all-reduce)
- Massive capital investment + ongoing cooling/power costs
Result: Teams without datacenter access — particularly outside the US — are locked out, regardless of how much heterogeneous capacity they could otherwise muster.
✅ ColabHive Solution
Core + Expert Groups architecture:
- Any GPU anywhere (3060, 3090, 4090, A100...)
- No special interconnects required
- Automatic dataset partitioning (domain, language, difficulty)
- Independent expert training (adapters, LoRA, fine-tunes)
Result: Train production-grade ML on distributed heterogeneous hardware across multiple sites, without requiring uniform datacenter infrastructure.
Training Program Flow
1. Developer uploads training data + config
2. Data is split by domain, language, and taxonomy
3. Experts train as independent jobs on distributed nodes
4. Benchmarks, quality checks, and provenance tracking
5. Core + Experts registered as a coordinated group
6. Orchestrator routes to the right expert based on taxonomy
Key Advantage: Each expert trains independently on its partition—no tight gradient sync required. Nodes can be anywhere in the world on any hardware. After training, the group acts as a coordinated team during inference, with the orchestrator routing requests to the right specialist.
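At inference time, a coordinated group needs some rule for choosing between its experts and the core model. The sketch below assumes a simple fallback chain (exact domain and language match, then any expert for the domain, then the core). The `ExpertGroup` class, the adapter names, and the fallback policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ExpertGroup:
    core: str
    experts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_expert(self, domain, language, adapter):
        """Register an adapter for one (domain, language) partition."""
        self.experts[(domain, language)] = adapter

    def route(self, domain, language):
        """Exact match first, then any expert for the domain, then the core model."""
        if (domain, language) in self.experts:
            return self.experts[(domain, language)]
        for (d, _lang), adapter in self.experts.items():
            if d == domain:
                return adapter
        return self.core


group = ExpertGroup(core="core-model")
group.add_expert("legal", "es", "legal-es-lora")
group.add_expert("finance", "es", "finance-es-lora")

print(group.route("legal", "es"))    # exact match
print(group.route("legal", "pt"))    # same domain, different language
print(group.route("medical", "es"))  # no expert: falls back to the core
```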
Constraint became the thesis
ColabHive didn't start as an ideological project about decentralized AI. It started because we needed to train forecasting models and couldn't get reasonable access to datacenter GPUs from where we operate.
Cloud GPU pricing denominated in USD against local revenue, import restrictions and tax structures on hardware, regional quotas in hyperscalers — the standard path most ML teams take wasn't really open to us. So we built differently: GPUs distributed across three sites in Lima, Córdoba and Buenos Aires, orchestrated through a registry-based architecture that treats hardware as heterogeneous from day one.
The constraint became the thesis. Modern ML doesn't need uniform hyperscale clusters running giant models for every task. It needs right-sized models on appropriate hardware, scheduled intelligently across whatever's available.
That's also why we're exploring Intel Arc integration. Reducing dependency on a single hardware vendor isn't just a cost play — it's structural for any ML project that doesn't have unlimited cloud budget in USD. If our orchestration layer genuinely treats hardware as heterogeneous, it should run on Arc as well as on Nvidia. We're testing exactly that.
Research & Technical Papers
ColabHive is grounded in rigorous technical analysis. We publish our architectural designs, energy models, and system evaluations openly to enable scientific scrutiny and collaborative development of distributed AI infrastructure.
Global Scale ECO
Analysis of deployment scenarios from 1M to 10M GPUs, estimating capacity and environmental impact
Energy-Aware
Multi-objective cost function balancing latency, capability, and energy consumption
Specialized Agents
Case studies showing how expert ensembles outperform monolithic models on complex tasks
Want to talk?
If you're building ML workloads on heterogeneous hardware, evaluating distributed infrastructure, or want to discuss contributing compute capacity, we'd like to hear from you.
Compute contributors
Contributor program in design. Get in touch if you have GPU capacity and want early conversations.
Get in touch
ML Builders
Train models, deploy agents, and build on a distributed ML platform across heterogeneous hardware.
Access Console