Early-stage. Three sites. Three production workloads.

Distributed ML infrastructure, built from LatAm

An open architecture for running ML workloads across heterogeneous GPUs, distributed across three sites in Peru and Argentina. Forecasting, efficient LLMs and tabular ML in production today.

PROJECT STATUS

Where ColabHive is today

Honest snapshot. What runs in production today, what we're actively exploring, and what's still on the roadmap.

In production today
  • 42 models available across forecasting, LLMs, tabular ML and specialists
  • Forecasting workloads in production for Aureus Finance (anchor customer)
  • LLM, embedding and classification workloads for GiftAI and TakeYourSide
  • Distributed orchestration across 3 LatAm sites (Lima, Córdoba, Buenos Aires)
  • REST API and Python SDK
Currently exploring
  • Intel Arc GPU integration to diversify beyond Nvidia/CUDA — 8 cards in evaluation, courtesy of Intel
  • Auto-mode for automatic model selection
  • Onboarding additional compute contributors
Roadmap, not reality yet
  • HiveCredits economic layer (rewards for GPU contributors and model publishers)
  • Third-party node onboarding with privacy guarantees
  • Energy-aware routing with measured telemetry — architecture supports it, instrumentation in progress
  • Self-serve signup and public pricing
Anchor customer and partners
Aureus Finance

Anchor customer. Financial forecasting in production.

Tab-Core

AI inference hardware partner.

Intel

Exploring Arc GPU integration. 8 cards in evaluation.

INTRODUCING

What is ColabHive?

ColabHive is distributed ML infrastructure, not a single model. The platform runs right-sized models on heterogeneous GPUs spread across multiple sites.

Instead of one massive model running in a centralized datacenter, ColabHive runs many specialized models — efficient LLMs, embedding and classification heads, forecasting transformers — across heterogeneous hardware in three sites today, with an architecture designed to incorporate additional contributors over time.

Right-sized models on appropriate hardware, scheduled across whatever capacity is available.

Who's Who in the Hive

End Users / Product Teams

People who send prompts to ColabHive via the API or an app. They see "one AI endpoint" that solves complex tasks behind the scenes.

Builders / Developers

People who use ColabHive to build products and models: call the API, upload datasets, train models, deploy agents (private or public).

GPU Owners / Node Operators

People who run the ColabHive Node Runtime on their GPUs. Their machines become nodes that host and run expert models/agents. Compute contributor program in design.

Behind the scenes, the Orchestrator reads requests, breaks them into subtasks, selects the right specialist agents, and combines results into unified answers.

Why This Architecture Matters

Train smarter, not bigger

❌ Traditional AI Training
  • Requires thousands of GPUs in one location
  • Needs high-speed interconnects such as NVLink and InfiniBand
  • Massive upfront capital investment
  • Single monolithic model = single point of failure
  • Old hardware becomes useless quickly
✅ ColabHive Approach
  • Run on heterogeneous GPUs (consumer and datacenter)
  • No special interconnects required
  • Train Core + Specialist Agents independently
  • Automatic dataset partitioning by domain/language
  • Distribute workloads across multiple sites

The key insight: Instead of training one 70B+ parameter model that requires enterprise infrastructure, ColabHive trains a core model + specialized expert agents (adapters, LoRA, domain fine-tunes) on partitioned datasets. Experts train as independent but coordinated jobs across the hive, then work together during inference.

This means you can train production-grade ML using distributed, heterogeneous hardware across multiple sites — without needing thousands of identical GPUs in one rack.
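LoRA is one of the expert types named above, and it shows concretely why adapter-style experts fit on smaller GPUs. LoRA freezes a pretrained weight matrix and trains only a low-rank update. The layer size (4096 × 4096) and rank (r = 8) below are illustrative values, not ColabHive's configuration:

```latex
\begin{aligned}
W' &= W + \Delta W = W + BA,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\text{trainable parameters} &= r(d + k) \quad \text{vs.} \quad dk \text{ for full fine-tuning} \\
d = k = 4096,\; r = 8: \quad r(d + k) &= 8 \cdot 8192 = 65{,}536
  \quad \text{vs.} \quad dk = 16{,}777{,}216 \;(256\times \text{fewer})
\end{aligned}
```

Only A and B receive gradients, so an expert's optimizer state is a small fraction of what full fine-tuning needs. The frozen base weights and the activations still have to fit in the node's memory.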

Here's how it works: when you send a request as an end user, the orchestrator reads it, breaks it into subtasks, selects the right expert agents from the network, and coordinates their work.

You receive one coherent, unified answer. Behind the scenes, several specialized models running on GPUs across the hive worked together to solve your problem.

  1. User: question sent in
  2. Orchestrator: breaks down the task, selects experts
  3. Expert Agents: multiple specialized agents process simultaneously
  4. Unified Answer: combined, coherent response returned
CONTRIBUTOR PROGRAM

Compute contributors (program in design)

All compute today is provided by our anchor customer across three sites in LatAm. We're designing a contributor program for organizations with available GPU capacity who want to run workloads on a heterogeneous-hardware architecture.

The economic model that compensates contributors (working name: HiveCredits) is still in design. We are not yet onboarding third-party nodes in production.

If you have hardware you'd like to contribute and want early conversations, get in touch.

In design: contributor program • node runtime documentation • economic model
Run specialist models
Host the models that match your hardware profile — efficient LLMs, embedding models, classifiers, forecasting heads.
Heterogeneous hardware
The orchestration layer is designed to schedule workloads across whatever's available — Nvidia today, Intel Arc in evaluation.
Future: economic layer
HiveCredits — the layer that compensates contributors for compute and model usage — is in design.
FOR BUILDERS / DEVELOPERS

Build on a Multi-Agent Platform

ColabHive isn't just an API to one model. It's a platform for building AI products that leverage many models and distributed compute.

As a builder, you call the ColabHive API to send complex tasks that span several specialized agents. Need legal analysis combined with financial modeling and code generation? Send one request, and the hive assembles the right team of expert agents.

You don't have to pick models or manage infrastructure. Describe the problem, and the hive orchestrates the solution.

What You Can Do as a Builder

1 Discover & Register Models from Hugging Face

Search the Hugging Face Hub directly from ColabHive. Filter by task type, model size, license, and popularity. We check compatibility automatically—architecture, license, VRAM requirements—and let you register with one click.

  • Task types: embeddings, reranking, translation, OCR, speech-to-text, moderation, text generation
  • One-click registration: Add to your Hive and run inference immediately
  • Lifecycle management: Track models from candidate → ready → preferred
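The compatibility check and lifecycle described above can be sketched as a small state machine. This is a minimal illustration under assumptions, not ColabHive's implementation. The following are all hypothetical:

- the `ModelCandidate` fields
- the license and architecture allow-lists
- the rule that leaving `candidate` requires passing every check

Fetching metadata from the Hugging Face Hub is out of scope.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"
    READY = "ready"
    PREFERRED = "preferred"


# Hypothetical policy values, chosen only for illustration.
ALLOWED_LICENSES = {"apache-2.0", "mit", "bsd-3-clause"}
SUPPORTED_ARCHITECTURES = {"bert", "llama", "t5", "whisper"}

# A model may only advance one stage at a time.
NEXT_STAGE = {Stage.CANDIDATE: Stage.READY, Stage.READY: Stage.PREFERRED}


@dataclass
class ModelCandidate:
    repo_id: str          # Hub-style id, e.g. "org/model"
    architecture: str
    license: str
    vram_gb: float        # estimated VRAM needed for inference
    stage: Stage = Stage.CANDIDATE
    problems: list[str] = field(default_factory=list)


def check_compatibility(model: ModelCandidate, max_node_vram_gb: float) -> bool:
    """Record every failed check on the model; return True only if all pass."""
    model.problems.clear()
    if model.architecture not in SUPPORTED_ARCHITECTURES:
        model.problems.append(f"unsupported architecture: {model.architecture}")
    if model.license not in ALLOWED_LICENSES:
        model.problems.append(f"license not allowed: {model.license}")
    if model.vram_gb > max_node_vram_gb:
        model.problems.append(
            f"needs {model.vram_gb} GB, largest node has {max_node_vram_gb} GB"
        )
    return not model.problems


def promote(model: ModelCandidate, max_node_vram_gb: float) -> None:
    """Advance one stage; leaving CANDIDATE requires passing the compatibility checks."""
    if model.stage not in NEXT_STAGE:
        raise ValueError(f"{model.repo_id} is already {model.stage.value}")
    if model.stage is Stage.CANDIDATE and not check_compatibility(model, max_node_vram_gb):
        raise ValueError(f"{model.repo_id} failed checks: {model.problems}")
    model.stage = NEXT_STAGE[model.stage]
```

Consider a 24 GB fleet. A 2 GB `bert` embedder under `apache-2.0` moves to `ready`. An 80 GB model with an unlisted license stays a candidate and keeps both reasons in `problems`, which is the kind of feedback a one-click registration flow would need to surface.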

2 Run Inference via API or SDK

Call your registered models via our unified API or Python SDK. Send tasks to specific endpoints or let the orchestrator route to the right specialist automatically.

  • Direct endpoints: Call specific models when you need control
  • Orchestrated routing: Let the hive choose the best model for the task
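The difference between the two calling styles can be sketched with an in-memory registry. Everything here is hypothetical and no ColabHive SDK calls are shown:

- `Endpoint` and `route`
- the task-type labels
- the rule that the highest evaluation score wins

These stand in for whatever the real orchestrator does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Endpoint:
    name: str
    task_type: str      # e.g. "embeddings", "translation"
    eval_score: float   # hypothetical benchmark score, higher is better
    handler: Callable[[str], str]


REGISTRY: dict[str, Endpoint] = {}


def register(ep: Endpoint) -> None:
    REGISTRY[ep.name] = ep


def route(payload: str, task_type: str, model: Optional[str] = None) -> tuple[str, str]:
    """Return (endpoint name, result).

    Direct mode: the caller names a model and gets exactly that endpoint.
    Orchestrated mode: pick the best-scoring endpoint for the task type.
    """
    if model is not None:
        ep = REGISTRY[model]  # KeyError if the caller names an unknown model
    else:
        candidates = [e for e in REGISTRY.values() if e.task_type == task_type]
        if not candidates:
            raise LookupError(f"no endpoint registered for {task_type!r}")
        ep = max(candidates, key=lambda e: e.eval_score)
    return ep.name, ep.handler(payload)
```

Direct calls trade flexibility for predictability. The result always comes from the named endpoint, which matters for reproducible pipelines. Orchestrated calls can pick up a better-scoring model as soon as it is registered.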

3 Train & Fine-Tune Your Own Models

Upload datasets via our Developer Console or API. Configure training jobs: select base models, define hyperparameters, specify hardware requirements (e.g. "24GB+ GPU", "multi-GPU").

Train where it fits. You define the job; ColabHive finds the nodes with enough VRAM and capacity. Track metrics, logs, and versions in our model registry.
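Matching a job's hardware profile to nodes can be sketched as a filter over a node inventory. The profile strings mirror the examples above ("24GB+ GPU", "multi-GPU"). The following are assumptions for illustration:

- the parsing rules
- the `Node` fields and node ids
- the best-fit ordering, which tries the smallest sufficient node first so large cards stay free for large jobs

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    gpu_vram_gb: list[int]  # free VRAM per GPU on this node
    busy: bool = False


def parse_profile(profile: str) -> tuple[int, int]:
    """Turn a profile like '24GB+ GPU' or 'multi-GPU' into (min_vram_gb, min_gpus)."""
    min_vram, min_gpus = 0, 1
    match = re.search(r"(\d+)\s*GB\+", profile, re.IGNORECASE)
    if match:
        min_vram = int(match.group(1))
    if "multi-gpu" in profile.lower():
        min_gpus = 2
    return min_vram, min_gpus


def eligible_nodes(nodes: list[Node], profile: str) -> list[Node]:
    """Idle nodes with enough GPUs that each meet the VRAM floor."""
    min_vram, min_gpus = parse_profile(profile)
    fits = []
    for node in nodes:
        if node.busy:
            continue
        good_gpus = [v for v in node.gpu_vram_gb if v >= min_vram]
        if len(good_gpus) >= min_gpus:
            fits.append(node)
    # Best fit: the node with the least total VRAM that still qualifies comes first.
    return sorted(fits, key=lambda n: sum(n.gpu_vram_gb))
```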

Distributed Training at Scale

ColabHive supports Core Model + Expert Agents training: automatically partition your datasets, train multiple specialized experts independently, and deploy them as a coordinated group.

❌ Traditional Datacenters

Need thousands of GPUs connected via NVLink/InfiniBand for tight gradient sync

✅ ColabHive

Reuse older hardware across sites. Train experts independently, with no high-speed interconnect required.

  • Automatic dataset partitioning by domain/language/difficulty
  • Train experts as independent jobs (adapters/LoRA, fine-tunes)
  • Deploy as a preferred group for coordinated inference
  • Orchestrator routes to the right expert based on request taxonomy
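Partitioning by domain and language, then emitting one independent job per partition, can be sketched as follows. The record fields, the `TrainingJob` shape and the minimum-partition-size rule are hypothetical. The point is that each job only sees its own shard, so no gradients are exchanged between experts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TrainingJob:
    expert_name: str
    base_model: str
    method: str          # e.g. "lora"
    examples: list[dict]


def partition(records: list[dict],
              keys: tuple[str, ...] = ("domain", "language")) -> dict[tuple, list[dict]]:
    """Group records by the given metadata keys; missing values become 'unknown'."""
    shards: dict[tuple, list[dict]] = defaultdict(list)
    for rec in records:
        shards[tuple(rec.get(k, "unknown") for k in keys)].append(rec)
    return dict(shards)


def plan_expert_jobs(records: list[dict], base_model: str,
                     min_examples: int = 2) -> list[TrainingJob]:
    """One independent LoRA job per partition, in sorted key order.

    Partitions smaller than min_examples are skipped here; a real system
    might fold them into the core model's training data instead.
    """
    jobs = []
    for key, shard in sorted(partition(records).items()):
        if len(shard) < min_examples:
            continue
        jobs.append(TrainingJob("-".join(key), base_model, "lora", shard))
    return jobs
```

Because the jobs share nothing but the base model, they can be dispatched to different nodes in any order and finish at different times.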

4 Deploy Models as Endpoints (Private or Public)

After training, choose how to deploy:

Private Agent

Only accessible to your apps/organization. Not visible in the public catalog. Your private tools.

Public Hive Agent

Exposed across the hive. Callable by other builders. Subject to review, benchmarking, and safety checks.

Public agents appear in the catalog with domain, capabilities, version info, and evaluation stats. The orchestrator can automatically choose among them when solving tasks.
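The visibility rules can be summarized as an access check. This is a sketch under assumptions, paraphrasing the description above rather than a published ColabHive policy. The assumed parts are:

- the `Agent` fields
- the organization-ID comparison
- the names of the required review checks
- the rule that a public agent is listed only after every check passes

```python
from dataclasses import dataclass, field

# Hypothetical names for the review steps described for public agents.
REQUIRED_CHECKS = ("benchmarks", "safety", "domain_tests")


@dataclass
class Agent:
    name: str
    owner_org: str
    public: bool = False
    # Outcome of each review step, e.g. {"benchmarks": True, "safety": False}
    review_results: dict[str, bool] = field(default_factory=dict)


def is_listed(agent: Agent) -> bool:
    """A public agent appears in the catalog only after every required check passes."""
    return agent.public and all(agent.review_results.get(c, False) for c in REQUIRED_CHECKS)


def can_call(agent: Agent, caller_org: str) -> bool:
    """The owner's organization can always call its agents; others only listed ones."""
    return caller_org == agent.owner_org or is_listed(agent)
```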

ARCHITECTURE

Three Layers of the Hive

Nodes
(The Hive Cells)
Machines with GPUs running the ColabHive runtime. Each can host one or more expert models.
  • Home GPU rigs
  • Small datacenters
  • Future: ASICs
Models & Agents
(The Specialists)
Specialized LLMs and AI models optimized for specific domains: legal, code, math, science, trading.
  • Versioned
  • Benchmarked
  • Specialized
Orchestrator
(Coordinator)
Coordinator AI that understands requests, selects experts, and combines results.
  • Node registry
  • Availability tracking
  • Reputation system
User (sends request) → Orchestrator (routes to expert agents) → Legal, Code and Math experts → Combined result → Unified answer (returned to user)
INFERENCE FLOW

From User Prompt to Unified Answer

See how a single request flows through the hive—from your prompt to expert execution to a unified response.

📱

1. End User / App

Sends a request via the ColabHive API

"Analyze this contract and optimize payment terms."
🧠

2. Orchestrator

Reads request, breaks into subtasks

  • → Legal analysis
  • → Math/optimization
  • → Code snippet

3. Agents on Nodes

Each expert runs its part

  • Legal agent → Node A
  • Math agent → Node B
  • Code agent → Node C
🔄

4. Aggregation

Orchestrator collects outputs, resolves conflicts, synthesizes insights

5. Unified Answer

Returned to user/app via API, as if one model solved everything
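The decompose, fan-out and aggregate steps can be sketched with `asyncio`. The keyword-based decomposition, the stub agents and the join-based aggregation are placeholders; a real orchestrator would use a model for planning and for synthesizing conflicting outputs. The sketch shows two things: subtasks run concurrently, and the caller gets a single answer back.

```python
import asyncio

# Hypothetical keyword rules standing in for the orchestrator's planner.
SUBTASK_RULES = {
    "contract": "legal",
    "payment": "math",
    "optimize": "math",
    "script": "code",
}


def decompose(prompt: str) -> list[str]:
    """Return the distinct expert domains a prompt needs, in first-seen order."""
    domains: list[str] = []
    for word in prompt.lower().replace(".", " ").split():
        domain = SUBTASK_RULES.get(word)
        if domain and domain not in domains:
            domains.append(domain)
    return domains or ["general"]


async def run_agent(domain: str, prompt: str) -> str:
    """Stub for a call to an expert agent hosted on some node."""
    await asyncio.sleep(0.01)  # simulated network + inference latency
    return f"[{domain}] partial answer"


async def answer(prompt: str) -> str:
    domains = decompose(prompt)
    # Fan out: every subtask runs concurrently; gather keeps results in input order.
    partials = await asyncio.gather(*(run_agent(d, prompt) for d in domains))
    # Aggregate: a real system would resolve conflicts; here the parts are joined.
    return "\n".join(partials)
```

Running `asyncio.run(answer("Analyze this contract and optimize payment terms."))` sends one subtask to a legal stub and one to a math stub, then returns both parts as one string.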

Behind the scenes: the orchestrator is designed to weigh energy and carbon cost alongside performance and capability, routing jobs to cleaner grids when latency allows. Measured energy telemetry is still being instrumented.
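One way to express that trade-off is a weighted multi-objective score, in the spirit of the cost function described in the architecture paper. The weights, the normalization, the per-node energy figures and the grid carbon intensities below are illustrative assumptions, not measured values. Measured telemetry is still on the roadmap.

```python
from dataclasses import dataclass


@dataclass
class NodeOption:
    node_id: str
    est_latency_ms: float
    capability: float          # 0..1, how well the hosted expert fits the task
    est_energy_wh: float       # estimated energy per request
    grid_g_co2_per_kwh: float  # carbon intensity of the node's grid


def cost(opt: NodeOption, latency_budget_ms: float,
         w_latency: float = 0.4, w_capability: float = 0.4, w_carbon: float = 0.2) -> float:
    """Lower is better. Options over the latency budget get infinite cost."""
    if opt.est_latency_ms > latency_budget_ms:
        return float("inf")
    latency_term = opt.est_latency_ms / latency_budget_ms            # 0..1
    capability_term = 1.0 - opt.capability                           # 0..1
    grams_co2 = opt.est_energy_wh / 1000.0 * opt.grid_g_co2_per_kwh  # Wh -> kWh
    carbon_term = grams_co2 / (1.0 + grams_co2)                      # squashed to 0..1
    return w_latency * latency_term + w_capability * capability_term + w_carbon * carbon_term


def pick(options: list[NodeOption], latency_budget_ms: float) -> NodeOption:
    """Choose the lowest-cost option; fail if none meets the latency budget."""
    best = min(options, key=lambda o: cost(o, latency_budget_ms))
    if cost(best, latency_budget_ms) == float("inf"):
        raise RuntimeError("no node meets the latency budget")
    return best
```

The latency budget acts as a hard constraint and everything else as weighted terms, which matches "when latency allows". Carbon can only shift the choice among options that are already fast enough.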

TRAINING & LIFECYCLE

From Training to Deployment to Usage

See how builders train models on the hive, deploy them as agents, and make them available for orchestration.

1

Builder / Developer Workspace

  • Uploads datasets via console or API
  • Defines training/fine-tuning job: base model, hyperparameters, hardware profile
2

Training Scheduler & Orchestrator

  • Validates the job
  • Finds eligible nodes (e.g. GPUs with enough VRAM)
  • Dispatches training job across one or more nodes
3

Nodes (Hive Cells) – Training Execution

  • Nodes download data shards or receive batches
  • Run training loops
  • Report metrics (loss, accuracy, etc.) to builder's dashboard
4

Model Registry

Trained weights and configs are versioned. Metadata includes datasets used, code version, training run identifiers.

5

Deployment as Agent

Builder chooses:

Private Agent:

Used only by their own apps

Public Hive Agent:

Submitted for review

6

Review & Publication (Public Agents Only)

  • Automated evaluations & benchmarks
  • Safety and robustness checks
  • Domain-specific tests (legal, medical, finance, etc.)
  • Once approved, agent is listed in hive catalog
7

Usage & Orchestration

Orchestrator can now select this agent when solving relevant tasks. Other builders can explicitly call it if allowed.

8

Continuous Improvement

Builder can retrain/fine-tune. New versions go through the same flow. Old versions remain traceable in the registry.
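The versioning rule maps naturally onto an append-only registry: new versions go through the same flow, and old versions stay traceable. The sketch below is hypothetical. Its metadata fields follow the Model Registry description above, and pinning weights with a SHA-256 content hash is an assumption about how provenance could be tracked.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    weights_sha256: str
    datasets: tuple[str, ...]
    code_version: str
    training_run_id: str


class ModelRegistry:
    """Append-only: versions are added, never overwritten or deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, weights: bytes, datasets: tuple[str, ...],
                 code_version: str, training_run_id: str) -> ModelVersion:
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(history) + 1,
            weights_sha256=hashlib.sha256(weights).hexdigest(),
            datasets=datasets,
            code_version=code_version,
            training_run_id=training_run_id,
        )
        history.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        history = self._versions[name]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]
```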

BACKEND ARCHITECTURE

Hybrid Nodes: Private-first, Public-when-idle

In the target design, every ColabHive node serves private workloads for its owner and can take public marketplace jobs when idle. The technical diagram below shows how the control plane, data plane, and financial core are designed to work together for secure, fair hybrid execution.

  • GPU Owner / Org: private jobs
  • Public Users: marketplace jobs
  • Agent Developer: publishes agents/models
Control Plane
  • API Gateway + Auth
  • Agent Registry: versions + permissions
  • Orchestrator: workflow
  • Scheduler
  • Node Manager
  • Reputation
  • SLA Service
  • Pricing Engine
  • HiveCredits
Financial Core
  • Double-entry Ledger: accounting
  • Escrow per task: hold → capture
  • Settlement Engine: GPU / Agent / Platform
  • Private Usage Record: no charge
Data Plane / Execution
  • Lane A, Private Pool (owner-only): ✓ priority execution, ✓ ACL enforcement, ✓ no escrow
  • Lane B, Public Pool (marketplace): ✓ when idle, ✓ preemptible, ✓ escrow → settle
  • Metering Service: usage events
  • Receipt Signer
  • Verifier
  • 💡 Idle Detector: when utilization < threshold → enable Public Pool
  • ⚡ Preemption: at safe points (between requests / at checkpoints)
  • 🔒 Non-preemptible Tier: avoids hybrid-preemptible nodes
  • 🔄 Bidirectional Elasticity: spare capacity → marketplace (earn); need more → borrow from hive (pay); automatic
Key Flows:
  • GPU Owner → Orchestrator → Scheduler → Private Pool (priority, no charge)
  • Public User → Pricing → Ledger (Hold) → Scheduler → Public Pool
  • Idle GPUs → Public Pool (in design) → Earn credits (future)
  • Need More → Borrow from Public Hive (in design) → Pay credits (future)
  • Node → Metering → Settlement (Escrow → GPU/Agent/Platform), public jobs only
  • Node → Metering → Private Usage Record (no escrow)

Elastic hybrid execution (architectural design)

The flows below describe how the architecture is designed to handle private vs. shared capacity once the contributor program and economic layer (HiveCredits) are operational. These are design targets, not the current operating mode. Today, all production capacity is provided by our anchor customer.

The intent: contributors run their own workloads on their own hardware with priority, and optionally expose idle capacity to other workloads scheduled across the network. Borrowing capacity from the network when local hardware is saturated is the symmetric flow.
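The diagram gives two node rules: enable the public pool when utilization drops below a threshold, and preempt only at safe points. Both can be sketched as a small controller. This models a design target, not running code. The thresholds and the averaging window are assumptions, and so is the hysteresis band, which is added so a node does not flap between lanes.

```python
from collections import deque


class LaneController:
    """Decides whether a node's idle capacity is offered to the public pool.

    Public admission turns on when the owner's average utilization over a full
    window drops below `enable_below`, and turns off as soon as one sample
    exceeds `disable_above`. The gap between the two is a hysteresis band.
    """

    def __init__(self, enable_below: float = 0.2, disable_above: float = 0.5,
                 window: int = 5) -> None:
        if not enable_below < disable_above:
            raise ValueError("enable_below must be lower than disable_above")
        self.enable_below = enable_below
        self.disable_above = disable_above
        self.samples: deque = deque(maxlen=window)
        self.public_enabled = False

    def observe(self, owner_utilization: float) -> bool:
        """Feed one owner-utilization sample in [0, 1]; return the lane state."""
        self.samples.append(owner_utilization)
        window_full = len(self.samples) == self.samples.maxlen
        avg = sum(self.samples) / len(self.samples)
        if not self.public_enabled and window_full and avg < self.enable_below:
            self.public_enabled = True
        elif self.public_enabled and owner_utilization > self.disable_above:
            self.public_enabled = False  # owner demand is back: stop admitting public jobs
        return self.public_enabled


def preemption_action(owner_waiting: bool, at_safe_point: bool) -> str:
    """Public jobs yield to the owner, but only at a safe point
    (between requests or at a checkpoint)."""
    if not owner_waiting:
        return "continue"
    return "preempt" if at_safe_point else "defer_until_safe_point"
```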

The Financial Core is the settlement layer that will mediate these flows once the economic model is finalized: shared-capacity jobs go through escrow; jobs running on a contributor's own hardware are tracked for telemetry but don't trigger settlement.
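The escrow flow can be sketched on a tiny double-entry ledger: a hold, then a capture, then settlement to GPU, agent and platform. Because the economic model is still in design, everything here is an assumption: the account names, the integer credit units and the 70/20/10 split. The placeholders are there only to show that every posting balances and that unused holds are refunded. A private job would produce a usage record and never call `hold`.

```python
from collections import defaultdict


class Ledger:
    """Toy double-entry ledger: each posting moves an amount from one account
    to another, so all balances always sum to zero."""

    def __init__(self) -> None:
        self.balances: defaultdict = defaultdict(int)
        self.journal: list[tuple[str, str, int]] = []

    def post(self, source: str, dest: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[source] -= amount
        self.balances[dest] += amount
        self.journal.append((source, dest, amount))

    def total(self) -> int:
        return sum(self.balances.values())


# Hypothetical settlement split, in percent; the real model is still in design.
SPLIT = {"gpu_owner": 70, "agent_dev": 20, "platform": 10}


def hold(ledger: Ledger, user: str, task_id: str, amount: int) -> str:
    """Move the quoted price from the user into a per-task escrow account."""
    escrow = f"escrow:{task_id}"
    ledger.post(user, escrow, amount)
    return escrow


def capture_and_settle(ledger: Ledger, escrow: str, user: str, used: int) -> None:
    """Pay out the metered amount and refund the unused part of the hold."""
    held = ledger.balances[escrow]
    if not 0 <= used <= held:
        raise ValueError("metered usage exceeds the hold")
    paid = 0
    for party, pct in SPLIT.items():
        # The last party (platform) takes the remainder so integer rounding never leaks.
        share = used - paid if party == "platform" else used * pct // 100
        if share:
            ledger.post(escrow, party, share)
        paid += share
    if held > used:
        ledger.post(escrow, user, held - used)
```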

TRAINING ARCHITECTURE

Distributed Training at Scale

Train Core Models + Expert Agents on partitioned datasets without needing enterprise-grade interconnects

Traditional Approach

Training large AI models traditionally requires:

  • Thousands of identical GPUs in one location
  • NVLink / InfiniBand high-speed interconnects
  • Tight gradient synchronization (all-reduce, ring-allreduce)
  • Massive capital investment + ongoing cooling/power costs

Result: Teams without datacenter access — particularly outside the US — are locked out, regardless of how much heterogeneous capacity they could otherwise muster.

ColabHive Solution

Core + Expert Groups architecture:

  • Consumer and datacenter GPUs (e.g. RTX 3060, 3090, 4090, A100)
  • No special interconnects required
  • Automatic dataset partitioning (domain, language, difficulty)
  • Independent expert training (adapters, LoRA, fine-tunes)

Result: Train production-grade ML on distributed heterogeneous hardware across multiple sites, without requiring uniform datacenter infrastructure.

Training Program Flow

1
Dataset Upload

Developer uploads training data + config

2
Auto Partition

Split by domain, language, taxons

3
Train Experts

Independent jobs on distributed nodes

4
Eval & Publish

Benchmarks, quality checks, provenance tracking

5
Form Group

Core + Experts registered as coordinated group

6
Orchestrated Inference

Orchestrator routes to right expert based on taxons

Key Advantage: Each expert trains independently on its partition, so no tight gradient sync is required. Nodes can sit at different sites and run different hardware. After training, the group acts as a coordinated team during inference, with the orchestrator routing requests to the right specialist.
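Routing inside a Core + Experts group can be sketched as a lookup from request taxons to experts, with the core model as the fallback when no expert matches. The taxon labels, the `ExpertGroup` shape and the most-specific-match-wins rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertGroup:
    core: str
    # Maps a taxon path such as ("finance", "es") to an expert name.
    experts: dict[tuple[str, ...], str] = field(default_factory=dict)

    def route(self, taxons: tuple[str, ...]) -> str:
        """Try the full taxon path, then shorter prefixes, then fall back to the core."""
        for n in range(len(taxons), 0, -1):
            expert = self.experts.get(taxons[:n])
            if expert is not None:
                return expert
        return self.core
```

Falling back to shorter prefixes lets a broad expert (say, finance) serve languages that have no dedicated expert yet. The core model catches everything else.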

WHY DISTRIBUTED, WHY LATAM

Constraint became
the thesis

ColabHive didn't start as an ideological project about decentralized AI. It started because we needed to train forecasting models and couldn't get reasonable access to datacenter GPUs from where we operate.

Cloud GPU pricing denominated in USD against local revenue, import restrictions and tax structures on hardware, regional quotas in hyperscalers — the standard path most ML teams take wasn't really open to us. So we built differently: GPUs distributed across three sites in Lima, Córdoba and Buenos Aires, orchestrated through a registry-based architecture that treats hardware as heterogeneous from day one.

The constraint became the thesis. Modern ML doesn't need uniform hyperscale clusters running giant models for every task. It needs right-sized models on appropriate hardware, scheduled intelligently across whatever's available.

That's also why we're exploring Intel Arc integration. Reducing dependency on a single hardware vendor isn't just a cost play — it's structural for any ML project that doesn't have unlimited cloud budget in USD. If our orchestration layer genuinely treats hardware as heterogeneous, it should run on Arc as well as on Nvidia. We're testing exactly that.

PUBLICATIONS

Research & Technical Papers

We publish our architectural designs, energy models, and system evaluations openly to enable scientific scrutiny and collaborative development of distributed AI infrastructure.

NEW: December 2025

ColabHive: A Distributed Hive-Mind Architecture for Energy-Aware Collaborative AI

An architectural study describing a distributed network of specialized agents running on heterogeneous GPUs, orchestrated by energy-aware routing — the architectural foundation of the platform we're building today.

5 pages • IEEE format • Energy & carbon analysis • System architecture

For academic use: LaTeX source also available

Global Scale ECO

Analysis of deployment scenarios from 1M to 10M GPUs, estimating capacity and environmental impact

Energy-Aware

Multi-objective cost function balancing latency, capability, and energy consumption

Specialized Agents

Case studies showing how expert ensembles outperform monolithic models on complex tasks

Want to talk?

If you're building ML workloads on heterogeneous hardware, evaluating distributed infrastructure, or want to discuss contributing compute capacity, we'd like to hear from you.

Compute contributors

Contributor program in design. Get in touch if you have GPU capacity and want early conversations.

Get in touch

ML Builders

Train models, deploy agents, and build on a distributed ML platform across heterogeneous hardware.

Access Console