Early-stage. Three sites. Three production workloads.

Distributed ML infrastructure,
built from LatAm

An open architecture for running ML workloads across heterogeneous GPUs, distributed across three sites in Peru and Argentina. Forecasting, efficient LLMs and tabular ML in production today.

Read the docs Read the architecture paper

PROJECT STATUS

Where ColabHive is today

Honest snapshot. What runs in production today, what we're actively exploring, and what's still on the roadmap.

In production today

42 models available across forecasting, LLMs, tabular ML and specialists
Forecasting workloads in production for Aureus Finance (anchor customer)
LLM, embedding and classification workloads for GiftAI and TakeYourSide
Distributed orchestration across 3 LatAm sites (Lima, Córdoba, Buenos Aires)
REST API and Python SDK

Currently exploring

Intel Arc GPU integration to diversify beyond Nvidia/CUDA — 8 cards in evaluation, courtesy of Intel
Auto-mode for automatic model selection
Onboarding additional compute contributors

Roadmap, not reality yet

HiveCredits economic layer (rewards for GPU contributors and model publishers)
Third-party node onboarding with privacy guarantees
Energy-aware routing with measured telemetry — architecture supports it, instrumentation in progress
Self-serve signup and public pricing

Anchor customer and partners

Aureus Finance

Anchor customer. Financial forecasting in production.

Tab-Core

AI inference hardware partner.

Intel

Exploring Arc GPU integration. 8 cards in evaluation.

INTRODUCING

What is ColabHive?

ColabHive is a distributed ML infrastructure, not a single model. The platform runs right-sized models across heterogeneous GPUs, distributed across multiple sites.

Instead of one massive model running in a centralized datacenter, ColabHive runs many specialized models — efficient LLMs, embedding and classification heads, forecasting transformers — across heterogeneous hardware in three sites today, with an architecture designed to incorporate additional contributors over time.

Right-sized models on appropriate hardware, scheduled across whatever capacity is available.

Who's Who in the Hive

End Users / Product Teams

People who send prompts to ColabHive via the API or an app. They see "one AI endpoint" that solves complex tasks behind the scenes.

Builders / Developers

People who use ColabHive to build products and models: call the API, upload datasets, train models, deploy agents (private or public).

GPU Owners / Node Operators

People who run the ColabHive Node Runtime on their GPUs. Their machines become nodes that host and run expert models/agents. Compute contributor program in design.

Behind the scenes, the Orchestrator reads requests, breaks them into subtasks, selects the right specialist agents, and combines results into unified answers.

Why This Architecture Matters

Train smarter, not bigger

❌ Traditional AI Training

• Requires thousands of GPUs in one location
• Needs NVLink, InfiniBand, high-speed interconnects
• Massive upfront capital investment
• Single monolithic model = single point of failure
• Old hardware becomes useless quickly

✅ ColabHive Approach

• Run on heterogeneous GPUs (consumer and datacenter)
• No special interconnects required
• Train Core + Specialist Agents independently
• Automatic dataset partitioning by domain/language
• Distribute workloads across multiple sites

The key insight: Instead of training one 70B+ parameter model that requires enterprise infrastructure, ColabHive trains a core model + specialized expert agents (adapters, LoRA, domain fine-tunes) on partitioned datasets. Experts train as independent but coordinated jobs across the hive, then work together during inference.

This means you can train production-grade ML using distributed, heterogeneous hardware across multiple sites — without needing thousands of identical GPUs in one rack.

Here's how it works: As an end user, when you send a request, the orchestrator reads it, breaks it into subtasks, selects the right expert agents from the network, and orchestrates their collaboration.

You receive one coherent, unified answer—but behind the scenes, multiple specialized AIs running on node operators' GPUs worked together to solve your problem.

User

Question sent in

Orchestrator

Breaks down task, selects experts

Expert Agents

Multiple specialized agents process simultaneously

Unified Answer

Combined coherent response returned

CONTRIBUTOR PROGRAM

Compute contributors
(program in design)

All compute today is provided by our anchor customer across three sites in LatAm. We're designing a contributor program for organizations with available GPU capacity who want to run workloads on a heterogeneous-hardware architecture.

The economic model that compensates contributors (working name: HiveCredits) is still in design. We are not yet onboarding third-party nodes in production.

If you have hardware you'd like to contribute and want early conversations, get in touch.

Run specialist models

Host the models that match your hardware profile — efficient LLMs, embedding models, classifiers, forecasting heads.

Heterogeneous hardware

The orchestration layer is designed to schedule workloads across whatever's available — Nvidia today, Intel Arc in evaluation.

Future: economic layer

HiveCredits — the layer that compensates contributors for compute and model usage — is in design.

FOR BUILDERS / DEVELOPERS

Build on a
Multi-Agent Platform

ColabHive isn't just an API to one model. It's a platform for building AI products that leverage many models and distributed compute.

As a builder, you'll call the ColabHive API to send complex tasks that only a swarm of specialized agents can solve. Need legal analysis combined with financial modeling and code generation? Send one request—the hive assembles the right team of expert agents automatically.

You don't pick models. You don't manage infrastructure. You describe the problem, and the hive orchestrates the solution.

What You Can Do as a Builder

1 Discover & Register Models from Hugging Face

Search the Hugging Face Hub directly from ColabHive. Filter by task type, model size, license, and popularity. We check compatibility automatically—architecture, license, VRAM requirements—and let you register with one click.

→ Task types: embeddings, reranking, translation, OCR, speech-to-text, moderation, text generation
→ One-click registration: Add to your Hive and run inference immediately
→ Lifecycle management: Track models from candidate → ready → preferred

2 Run Inference via API or SDK

Call your registered models via our unified API or Python SDK. Send tasks to specific endpoints or let the orchestrator route to the right specialist automatically.

→ Direct endpoints: Call specific models when you need control
→ Orchestrated routing: Let the hive choose the best model for the task

3 Train & Fine-Tune Your Own Models

Upload datasets via our Developer Console or API. Configure training jobs: select base models, define hyperparameters, specify hardware requirements (e.g. "24GB+ GPU", "multi-GPU").

Train where it fits. You define the job; ColabHive finds the nodes with enough VRAM and capacity. Track metrics, logs, and versions in our model registry.

Distributed Training at Scale

ColabHive supports Core Model + Expert Agents training: automatically partition your datasets, train multiple specialized experts independently, and deploy them as a coordinated group.

❌ Traditional Datacenters

Need thousands of GPUs connected via NVLink/InfiniBand for tight gradient sync

✅ ColabHive

Reuse old hardware anywhere in the world. Train experts independently, no high-speed interconnect required

→ Automatic dataset partitioning by domain/language/difficulty
→ Train experts as independent jobs (adapters/LoRA, fine-tunes)
→ Deploy as a preferred group for coordinated inference
→ Orchestrator routes to the right expert based on request taxonomy

4 Deploy Models as Endpoints (Private or Public)

After training, choose how to deploy:

Private Agent

Only accessible to your apps/organization. Not visible in the public catalog. Your private tools.

Public Hive Agent

Exposed across the hive. Callable by other builders. Subject to review, benchmarking, and safety checks.

Public agents appear in the catalog with domain, capabilities, version info, and evaluation stats. The orchestrator can automatically choose among them when solving tasks.

ARCHITECTURE

Three Layers of
the Hive

Nodes

(The Hive Cells)

Machines with GPUs running ColabHive runtime. Each can host one or more expert models.

• Home GPU rigs
• Small datacenters
• Future: ASICs

Models & Agents

(The Specialists)

Specialized LLMs and AI models optimized for specific domains: legal, code, math, science, trading.

• Versioned
• Benchmarked
• Specialized

Orchestrator

(Coordinator)

Coordinator AI that understands requests, selects experts, and combines results.

• Node registry
• Availability tracking
• Reputation system

USER

Sends request

ORCHESTRATOR

Routes to expert agents

Legal

Expert

Code

Expert

Math

Expert

Combined Result

UNIFIED ANSWER

Returned to user

INFERENCE FLOW

From User Prompt to Unified Answer

See how a single request flows through the hive—from your prompt to expert execution to a unified response.

📱

1. End User / App

Sends a request via the ColabHive API

"Analyze this contract and optimize payment terms."

🧠

2. Orchestrator

Reads request, breaks into subtasks

→ Legal analysis
→ Math/optimization
→ Code snippet

⚡

3. Agents on Nodes

Each expert runs its part

Legal agent → Node A
Math agent → Node B
Code agent → Node C

🔄

4. Aggregation

Orchestrator collects outputs, resolves conflicts, synthesizes insights

✅

5. Unified Answer

Returned to user/app via API, as if one model solved everything

Behind the scenes: The orchestrator not only thinks about performance and capability, but also about energy and carbon cost—routing jobs to cleaner grids when latency allows.

TRAINING & LIFECYCLE

From Training to Deployment to Usage

See how builders train models on the hive, deploy them as agents, and make them available for orchestration.

Builder / Developer Workspace

→ Uploads datasets via console or API
→ Defines training/fine-tuning job: base model, hyperparameters, hardware profile

Training Scheduler & Orchestrator

→ Validates the job
→ Finds eligible nodes (e.g. GPUs with enough VRAM)
→ Dispatches training job across one or more nodes

Nodes (Hive Cells) – Training Execution

→ Nodes download data shards or receive batches
→ Run training loops
→ Report metrics (loss, accuracy, etc.) to builder's dashboard

Model Registry

Trained weights and configs are versioned. Metadata includes datasets used, code version, training run identifiers.

Deployment as Agent

Builder chooses:

Private Agent:

Used only by their own apps

Public Hive Agent:

Submitted for review

Review & Publication (Public Agents Only)

✓ Automated evaluations & benchmarks
✓ Safety and robustness checks
✓ Domain-specific tests (legal, medical, finance, etc.)
✓ Once approved, agent is listed in hive catalog

Usage & Orchestration

Orchestrator can now select this agent when solving relevant tasks. Other builders can explicitly call it if allowed.

Continuous Improvement

Builder can retrain/fine-tune. New versions go through the same flow. Old versions remain traceable in the registry.

BACKEND ARCHITECTURE

Hybrid Nodes: Private-first, Public-when-idle

Every ColabHive node can serve private workloads for its owner and public marketplace jobs when idle. This technical diagram shows how the control plane, data plane, and financial core work together to enable secure, fair hybrid execution.

GPU Owner / Org

Private jobs

Public Users

Marketplace jobs

Agent Developer

Publishes agents/models

Control Plane

API Gateway

+ Auth

Agent Registry

Versions + Permissions

Orchestrator

Workflow

Scheduler

Node Manager

Reputation

SLA Service

Pricing Engine

HiveCredits

Financial Core

Double-entry Ledger

Accounting

Escrow per task

Hold → Capture

Settlement Engine

GPU / Agent / Platform

Private Usage Record

No charge

Data Plane / Execution

Lane A

Private Pool

(Owner-only)

✓ Priority execution

✓ ACL enforcement

✓ No escrow

Lane B

Public Pool

(Marketplace)

✓ When idle

✓ Preemptible

✓ Escrow → Settle

Metering Service

Usage events

Receipt Signer

Verifier

💡 Idle Detector

When utilization < threshold → enable Public Pool

⚡ Preemption

At safe points (between requests / at checkpoints)

🔒 Non-preemptible Tier

Avoids hybrid-preemptible nodes

🔄 Bidirectional Elasticity

Spare capacity → marketplace (earn). Need more → borrow from hive (pay). Automatic.

Key Flows:

█ GPU Owner → Orchestrator → Scheduler → Private Pool (priority, no charge)

█ Public User → Pricing → Ledger (Hold) → Scheduler → Public Pool

█ Idle GPUs → Public Pool (in design) → Earn credits (future)

█ Need More → Borrow from Public Hive (in design) → Pay credits (future)

█ Node → Metering → Settlement (ESCROW → GPU/Agent/Platform) Public only

█ Node → Metering → Private Usage Record (no escrow)

Elastic hybrid execution (architectural design)

The flows below describe how the architecture is designed to handle private vs. shared capacity once the contributor program and economic layer (HiveCredits) are operational. These are design targets, not the current operating mode. Today, all production capacity is provided by our anchor customer.

The intent: contributors run their own workloads on their own hardware with priority, and optionally expose idle capacity to other workloads scheduled across the network. Borrowing capacity from the network when local hardware is saturated is the symmetric flow.

The Financial Core is the settlement layer that will mediate these flows once the economic model is finalized: shared-capacity jobs go through escrow; jobs running on a contributor's own hardware are tracked for telemetry but don't trigger settlement.

TRAINING ARCHITECTURE

Distributed Training at Scale

Train Core Models + Expert Agents on partitioned datasets without needing enterprise-grade interconnects

❌ Traditional Approach

Training large AI models traditionally requires:

Thousands of identical GPUs in one location
NVLink / InfiniBand high-speed interconnects
Tight gradient synchronization (all-reduce, ring-allreduce)
Massive capital investment + ongoing cooling/power costs

Result: Teams without datacenter access — particularly outside the US — are locked out, regardless of how much heterogeneous capacity they could otherwise muster.

✅ ColabHive Solution

Core + Expert Groups architecture:

Any GPU anywhere (3060, 3090, 4090, A100...)
No special interconnects required
Automatic dataset partitioning (domain, language, difficulty)
Independent expert training (adapters, LoRA, fine-tunes)

Result: Train production-grade ML on distributed heterogeneous hardware across multiple sites, without requiring uniform datacenter infrastructure.

Training Program Flow

Dataset Upload

Developer uploads training data + config

Auto Partition

Split by domain, language, taxons

Train Experts

Independent jobs on distributed nodes

Eval & Publish

Benchmarks, quality checks, provenance tracking

Form Group

Core + Experts registered as coordinated group

Orchestrated Inference

Orchestrator routes to right expert based on taxons

Key Advantage: Each expert trains independently on its partition—no tight gradient sync required. Nodes can be anywhere in the world on any hardware. After training, the group acts as a coordinated team during inference, with the orchestrator routing requests to the right specialist.

WHY DISTRIBUTED, WHY LATAM

Constraint became
the thesis

ColabHive didn't start as an ideological project about decentralized AI. It started because we needed to train forecasting models and couldn't get reasonable access to datacenter GPUs from where we operate.

Cloud GPU pricing denominated in USD against local revenue, import restrictions and tax structures on hardware, regional quotas in hyperscalers — the standard path most ML teams take wasn't really open to us. So we built differently: GPUs distributed across three sites in Lima, Córdoba and Buenos Aires, orchestrated through a registry-based architecture that treats hardware as heterogeneous from day one.

The constraint became the thesis. Modern ML doesn't need uniform hyperscale clusters running giant models for every task. It needs right-sized models on appropriate hardware, scheduled intelligently across whatever's available.

That's also why we're exploring Intel Arc integration. Reducing dependency on a single hardware vendor isn't just a cost play — it's structural for any ML project that doesn't have unlimited cloud budget in USD. If our orchestration layer genuinely treats hardware as heterogeneous, it should run on Arc as well as on Nvidia. We're testing exactly that.

PUBLICATIONS

Research & Technical Papers

ColabHive is grounded in rigorous technical analysis. We publish our architectural designs, energy models, and system evaluations openly to enable scientific scrutiny and collaborative development of distributed AI infrastructure.

NEW: June 2026

Reducing NVIDIA Dependency with Intel Arc

A measured head-to-head of an Intel Arc Pro B70 (Battlemage) against an NVIDIA RTX 3090 across six AI workloads — 95–112% of the throughput at 56–64% of the power (~1.45–2.0× better performance-per-watt), with device-attributed energy measurement.

~8 pages • GPU benchmarking study Energy-efficiency & perf/watt Intel Arc / Battlemage

Read Full Paper Download PDF (English) Descargar PDF (Español)

For academic use: LaTeX source also available

NEW: December 2025

ColabHive: A Distributed Hive-Mind Architecture for Energy-Aware Collaborative AI

An architectural study describing a distributed network of specialized agents running on heterogeneous GPUs, orchestrated by energy-aware routing — the architectural foundation of the platform we're building today.

5 pages • IEEE format Energy & carbon analysis System architecture

Read Full Paper Download PDF

For academic use: LaTeX source also available

Global Scale ECO

Analysis of deployment scenarios from 1M to 10M GPUs, estimating capacity and environmental impact

Energy-Aware

Multi-objective cost function balancing latency, capability, and energy consumption

Specialized Agents

Case studies showing how expert ensembles outperform monolithic models on complex tasks

Want to talk?

If you're building ML workloads on heterogeneous hardware, evaluating distributed infrastructure, or want to discuss contributing compute capacity, we'd like to hear from you.

Compute contributors

Contributor program in design. Get in touch if you have GPU capacity and want early conversations.

Get in touch

ML Builders

Train models, deploy agents, and build on a distributed ML platform across heterogeneous hardware.

Access Console

Distributed ML infrastructure, built from LatAm

Where ColabHive is today

What is ColabHive?

Who's Who in the Hive

End Users / Product Teams

Builders / Developers

GPU Owners / Node Operators

Why This Architecture Matters

Compute contributors(program in design)

Build on aMulti-Agent Platform

What You Can Do as a Builder

1 Discover & Register Models from Hugging Face

2 Run Inference via API or SDK

3 Train & Fine-Tune Your Own Models

Distributed Training at Scale

4 Deploy Models as Endpoints (Private or Public)

Three Layers ofthe Hive

From User Prompt to Unified Answer

1. End User / App

2. Orchestrator

3. Agents on Nodes

4. Aggregation

5. Unified Answer

From Training to Deployment to Usage

Builder / Developer Workspace

Training Scheduler & Orchestrator

Nodes (Hive Cells) – Training Execution

Model Registry

Deployment as Agent

Review & Publication (Public Agents Only)

Usage & Orchestration

Continuous Improvement

Hybrid Nodes: Private-first, Public-when-idle

Elastic hybrid execution (architectural design)

Distributed Training at Scale

❌ Traditional Approach

✅ ColabHive Solution

Training Program Flow

Constraint becamethe thesis

Research & Technical Papers

Reducing NVIDIA Dependency with Intel Arc

ColabHive: A Distributed Hive-Mind Architecture for Energy-Aware Collaborative AI

Global Scale ECO

Energy-Aware

Specialized Agents

Want to talk?

Compute contributors

ML Builders

Distributed ML infrastructure,
built from LatAm

Compute contributors
(program in design)

Build on a
Multi-Agent Platform

Three Layers of
the Hive

Constraint became
the thesis