
AI Coding Tools

Understanding the stack, tools, and models powering modern AI coding

What Is AI Coding?

AI coding is no longer one tool plus one model. Real workflows mix editor, agent, runtime, model host, and hardware depending on task, privacy, and budget.

This guide is compact by design:

  1. Shared industry stack
  2. Provider deltas that actually matter
  3. Minimal local stack
  4. Tool catalog by layer

The Industry Stack

A frontier AI system is a vertical stack, from silicon at the bottom to the app at the top. Below are two ecosystems: the closed / proprietary labs (OpenAI · Anthropic · Google) and the open-source stack you can self-host. The table reads from the top (user-facing) down to the bottom (foundation); proprietary entries generally follow OpenAI/Anthropic/Google order, and a / within a cell separates alternatives.

Stack Layer | Closed / Proprietary Ecosystem | Open-Source / Self-Hosted Ecosystem
App (UI/UX) | ChatGPT, Claude, Gemini | Open WebUI, AnythingLLM, Jan
Coding Agent | GitHub Copilot, Claude Code, Cursor | Cline, Aider, OpenDevin (All Hands)
Agent SDK / Orchestrator | OpenAI Assistants API, LangSmith (managed) | LangChain, LlamaIndex, CrewAI, AutoGen
Vector DB / Memory | Pinecone, Vertex AI Search, Enterprise Weaviate | pgvector, Qdrant, Chroma, Milvus
Model | GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro | Llama 3, Qwen 2.5, DeepSeek-V3, Phi-4
API / Model Hosting | OpenAI Platform, Anthropic API, Vertex AI | Self-hosted: vLLM, Ollama, TGI. Managed open-model providers: OpenRouter, Together AI
Inference Runtime | Proprietary internal engines (Google/OpenAI) | vLLM, Ollama, llama.cpp, SGLang, TensorRT-LLM
Pre-training Data | Undisclosed web scraping, licensed media, private synthetic data | Open corpora (FineWeb, RedPajama, The Stack)
Alignment (RLHF/DPO) | Proprietary RLHF, Constitutional AI, RL methods | Open-source alignment pipelines (TRL, Alignment Handbook), open datasets
Training Frameworks | Custom internal orchestration layers | PyTorch, JAX, DeepSpeed, Megatron-LM
Infrastructure Orchestration | Managed hyperscaler clusters | Kubernetes, Ray, Slurm (bare-metal)
Cloud Compute | AWS, Azure, Google Cloud (TPUs) | RunPod, Vast.ai, Lambda Labs, on-prem clusters
Hardware Accelerator | Custom Cloud ASICs (TPU, Trainium, Axion) | NVIDIA GPUs (H100/B200), AMD Instinct (MI300X), Apple Silicon
Interconnect | InfiniBand, NVLink / NVSwitch (NVIDIA ecosystem) | Ultra Ethernet, Standard RoCE/Ethernet

Note: Closed stacks are vertically integrated — lower cost and tighter control, but heavier lock-in. The open-source stack trades turnkey convenience for portability: every layer, from the chip to the app, can be swapped or self-hosted.

What Is Mostly Standardized

  • Markdown-first text output
  • SSE + JSON delta streaming
  • Markdown -> AST -> component render path
  • MCP as practical tool-calling standard

The middle layers have converged; real differences are concentrated in model behavior, context reliability, product UX, and ecosystem lock-in. For the provider-by-provider comparison — and what each chatbot can actually do (web search, Canvas/Artifacts, Mermaid, maps) — see AI Chatbot Platforms.
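
To make the "SSE + JSON delta streaming" item concrete, here is a minimal sketch of a client-side parser. It assumes a hypothetical event schema of `{"delta": "<text>"}` and a `[DONE]` sentinel as the end-of-stream marker; real providers nest the delta text differently and use their own termination signals, so treat the field names as placeholders.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data joins with "\n".
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (": ...") and other fields are ignored in this sketch.


def collect_text(lines: Iterable[str], done_marker: str = "[DONE]") -> str:
    """Concatenate the text deltas from a stream of SSE lines."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == done_marker:  # assumed sentinel, not universal
            break
        event = json.loads(payload)
        parts.append(event.get("delta", ""))  # "delta" is a hypothetical field
    return "".join(parts)
```

Feeding it the lines `data: {"delta": "Hel"}`, a blank line, `data: {"delta": "lo"}`, a blank line, and `data: [DONE]` followed by a blank line returns `"Hello"`. A UI would typically re-render the Markdown after each delta rather than waiting for the full string.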

The Minimal Stack: What You Actually Need

A developer needs four layers; two tools cover it:

ollama run qwen3.6    # runtime + model
opencode              # optional agent

Layer | Tool / Component | What it does
Agent | OpenCode (optional) | Intent → prompts + tools
Runtime | Ollama (llama.cpp) | Transformer forward pass
Model | Qwen Coder (GGUF) | Learned weight matrices
Hardware | GPU / CPU | Matrix multiply + attention
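
Once the runtime is up, any script can act as a very thin agent by talking to it over HTTP. The sketch below assumes Ollama is listening on its default local port (11434) and uses its `/api/generate` endpoint with streaming turned off; the helper names `build_generate_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the runtime and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("qwen3.6", "Write a function that reverses a string.")` would return the completion as a string, provided the model tag from the command above has already been pulled. A real agent adds tool calls and a loop on top of this single request.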

The computation: Prompt → tokens → embeddings → stacked Transformer blocks (self-attention + feed-forward = mostly GEMM). Runtime schedules operations; chip executes them — billions of multiply-adds per token.
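
The "billions of multiply-adds per token" figure can be checked with a back-of-the-envelope estimate. The sketch below uses the common rule of thumb that a dense decoder-only model performs about one multiply-add per parameter per token (roughly 2 FLOPs per parameter). It ignores attention over the context and memory bandwidth, which often dominates local inference, so the time figure is only a compute-bound lower bound. The 7-billion-parameter model and the 10 TFLOP/s chip in the usage note are hypothetical.

```python
def macs_per_token(n_params: float) -> float:
    """Approximate multiply-adds per token: about one per parameter."""
    return float(n_params)


def flops_per_token(n_params: float) -> float:
    """Approximate FLOPs per token: one multiply plus one add per parameter."""
    return 2.0 * n_params


def compute_bound_seconds_per_token(n_params: float, sustained_flops: float) -> float:
    """Lower bound on per-token latency if compute were the only limit."""
    return flops_per_token(n_params) / sustained_flops
```

For a hypothetical 7e9-parameter model, `macs_per_token(7e9)` gives 7e9 multiply-adds, and on a chip sustaining 1e13 FLOP/s the compute-bound latency works out to about 1.4 ms per token. Measured latency is usually higher because every weight must also be read from memory for each token.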

Summary: Model = numbers. Runtime = recipe. Hardware = executor. Everything else is optional.

AI Components

AI components | Open-Source | Proprietary
Local Runtimes | llama.cpp, Ollama, LM Studio, vLLM, GPT4All
Code Editors | VS Code, Zed | Antigravity, Cursor, Windsurf
Agents / CLI | Codex, OpenCode, Open Interpreter, Hermes, MiniMax, MiMo Code | Claude Code, Devin, GitHub Copilot
Model Platforms | Hugging Face, OpenRouter, Replicate, Vast.ai
AI Models | Llama, Qwen, DeepSeek, Mistral | Claude, GPT, Gemini, Kimi

Multimodal AI: Beyond Text and Code

The same machinery (transformers, plus diffusion models for pixels and audio) generates speech, images, video, and music. These aren’t tools; they’re models. They run the same kind of forward-pass math, mostly matrix multiplies, on the same hardware; what changes is the training domain and, for diffusion models, an iterative denoising loop around the forward pass. The closed-vs-open split repeats across every modality:

Modality | Closed / Proprietary | Open-Source | Run it locally with
Speech-to-text (STT) | OpenAI gpt-4o-transcribe / Deepgram | Whisper / Moonshine | whisper.cpp
Text-to-speech (TTS) | ElevenLabs / OpenAI TTS | Kokoro / XTTS | ComfyUI
Image generation | Midjourney / GPT Image | FLUX.1 / Stable Diffusion 3.5 | ComfyUI
Video generation | Sora / Runway Gen-4 | HunyuanVideo / LTX-Video | ComfyUI
Embeddings | OpenAI text-embedding-3 / Cohere | BGE / Nomic Embed | Ollama

Note: Image, video, and text-to-speech generation use ComfyUI as the common open runtime, the Ollama of pixels. Speech-to-text uses whisper.cpp, and embeddings run through Ollama. Reverse flows (image/audio → understanding) are handled by multimodal LLMs above.


Build Strategy in 2026

Use this quick decision rule:

  • Local-first when privacy, cost control, or offline operation dominates
  • Cloud-first when setup speed and model quality dominate
  • Hybrid when you want local dev loops plus cloud fallback for hard tasks
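
The decision rule above can be sketched as a small function. The `Priorities` fields and the tie-breaking choices are assumptions added for illustration: conflicting priorities map to hybrid, a hard offline requirement forces local-first because a cloud fallback would be unreachable, and hybrid is an arbitrary default when nothing is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Priorities:
    """Hypothetical flags describing what dominates for a project."""
    privacy: bool = False
    cost_control: bool = False
    offline: bool = False
    setup_speed: bool = False
    model_quality: bool = False


def choose_strategy(p: Priorities) -> str:
    """Map priorities to 'local-first', 'cloud-first', or 'hybrid'."""
    if p.offline:
        # Assumption: strict offline operation rules out any cloud fallback.
        return "local-first"
    wants_local = p.privacy or p.cost_control
    wants_cloud = p.setup_speed or p.model_quality
    if wants_local and not wants_cloud:
        return "local-first"
    if wants_cloud and not wants_local:
        return "cloud-first"
    # Mixed or unspecified priorities: local dev loop plus cloud fallback.
    return "hybrid"
```

For example, `choose_strategy(Priorities(cost_control=True, model_quality=True))` returns `"hybrid"`, while `Priorities(privacy=True)` alone yields `"local-first"`. In practice the inputs are rarely boolean, so this is a starting checklist rather than a policy.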

At this point, the hard problem is no longer model availability. It is integration quality: latency, reliability, memory design, and tool orchestration.