AI Coding Tools
Understanding the stack, tools, and models powering modern AI coding
What Is AI Coding?
AI coding is no longer one tool plus one model. Real workflows combine an editor, an agent, a runtime, a model host, and hardware, chosen according to the task, privacy requirements, and budget.
This guide is compact by design:
- Shared industry stack
- Provider deltas that actually matter
- Minimal local stack
- Tool catalog by layer
The Industry Stack
A frontier AI system is a vertical stack, from silicon at the bottom to the app at the top. The table below compares two ecosystems: the closed / proprietary labs (OpenAI, Anthropic, Google) and the open-source stack you can self-host. Rows run from the user-facing app at the top to the hardware foundation at the bottom; where a row lists one closed product per lab, entries follow OpenAI, Anthropic, Google order.
| Stack Layer | Closed / Proprietary Ecosystem | Open-Source / Self-Hosted Ecosystem |
|---|---|---|
| App (UI/UX) | ChatGPT, Claude, Gemini | Open WebUI, AnythingLLM, Jan |
| Coding Agent | GitHub Copilot, Claude Code, Cursor | Cline, Aider, OpenHands (formerly OpenDevin, by All Hands AI) |
| Agent SDK / Orchestrator | OpenAI Assistants API, LangSmith (managed) | LangChain, LlamaIndex, CrewAI, AutoGen |
| Vector DB / Memory | Pinecone, Vertex AI Search, Enterprise Weaviate | pgvector, Qdrant, Chroma, Milvus |
| Model | GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro | Llama 3, Qwen 2.5, DeepSeek-V3, Phi-4 |
| API / Model Hosting | OpenAI Platform, Anthropic API, Vertex AI | Self-hosted: vLLM, Ollama, TGI; managed open-model providers: OpenRouter, Together AI |
| Inference Runtime | Proprietary internal engines (Google/OpenAI) | vLLM, Ollama, llama.cpp, SGLang, TensorRT-LLM |
| Pre-training Data | Undisclosed web scraping, licensed media, private synthetic data | Open corpora (FineWeb, RedPajama, The Stack) |
| Alignment (RLHF/DPO) | Proprietary RLHF, Constitutional AI, RL methods | Open-source alignment pipelines (TRL, Alignment Handbook), Open datasets |
| Training Frameworks | Custom internal orchestration layers | PyTorch, JAX, DeepSpeed, Megatron-LM |
| Infrastructure Orchestration | Managed hyperscaler clusters | Kubernetes, Ray, Slurm (bare-metal) |
| Cloud Compute | AWS, Azure, Google Cloud (TPUs) | RunPod, Vast.ai, Lambda Labs, on-prem clusters |
| Hardware Accelerator | Custom cloud ASICs (TPU, Trainium, Maia) | NVIDIA GPUs (H100/B200), AMD Instinct (MI300X), Apple Silicon |
| Interconnect | InfiniBand, NVLink / NVSwitch (NVIDIA ecosystem) | Ultra Ethernet, Standard RoCE/Ethernet |
Note: Closed stacks are vertically integrated, which brings lower cost and tighter control at the price of heavier lock-in. The open-source stack trades turnkey convenience for portability: every layer, from the chip to the app, can be swapped or self-hosted.
What Is Mostly Standardized
- Markdown-first text output
- SSE (server-sent events) + JSON delta streaming
- Markdown → AST → component render path
- MCP (Model Context Protocol) as the practical standard for connecting tools to models
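As a concrete picture of the streaming convention, here is a minimal Python sketch of an SSE client loop. It follows the SSE line format (`data:` fields, a blank line ends an event, a leading `:` marks a comment) but assumes a simplified, hypothetical payload of `{"delta": "..."}` and an OpenAI-style `[DONE]` sentinel; each provider nests the text differently, so the `.get("delta")` lookup is the part you would adapt.

```python
import json

def iter_sse_data(lines):
    """Yield the data payload of each complete server-sent event."""
    data = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":                      # a blank line dispatches the pending event
            if data:
                yield "\n".join(data)       # multi-line data fields are joined with newlines
                data = []
        elif line.startswith(":"):          # comment line, often used as a keep-alive
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)  # drop one leading space
        # event:, id:, and retry: fields are ignored in this sketch
    # an event cut off before its terminating blank line is discarded, as the SSE format requires

def accumulate_deltas(lines):
    """Concatenate text deltas until the [DONE] sentinel (assumed {"delta": ...} payloads)."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload).get("delta", ""))
    return "".join(parts)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, and the UI would append each delta to the Markdown buffer it is rendering.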
The middle layers have converged; the real differences are concentrated in model behavior, context reliability, product UX, and ecosystem lock-in. For a provider-by-provider comparison, including what each chatbot can actually do (web search, Canvas/Artifacts, Mermaid, maps), see AI Chatbot Platforms.
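The MCP item in the standardized list rests on plain JSON-RPC 2.0 messages. The sketch below builds a `tools/list` and a `tools/call` request and extracts the text blocks from a call result, assuming the request and result shapes described in the MCP specification (`name`/`arguments` params; a `content` list plus an `isError` flag in the result). The `read_file` tool used in the usage note is hypothetical, and a real client must first complete the `initialize` handshake and send messages over a transport (for example, newline-delimited JSON on stdio).

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session

def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

def list_tools():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")

def call_tool(name, arguments):
    """Invoke one tool by name with a JSON object of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})

def tool_result_text(response_json):
    """Join the text blocks of a tools/call result; raise on protocol or tool errors."""
    resp = json.loads(response_json)
    if "error" in resp:                                   # JSON-RPC level failure
        raise RuntimeError(resp["error"].get("message", "JSON-RPC error"))
    result = resp["result"]
    text = "\n".join(
        block["text"] for block in result.get("content", []) if block.get("type") == "text"
    )
    if result.get("isError"):                             # the tool ran but reported failure
        raise RuntimeError(text or "tool reported an error")
    return text
```

For example, `call_tool("read_file", {"path": "README.md"})` produces the request an agent would send; the agent then feeds `tool_result_text(...)` back to the model as the tool's output.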
The Minimal Stack: What You Actually Need
A local coding setup has four layers, and two tools cover the software side (the agent is optional):
```shell
ollama run qwen3.6   # runtime + model
opencode             # optional agent
```
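Agents and editors talk to Ollama over its local HTTP API (port 11434 by default). Here is a minimal standard-library sketch, assuming Ollama's `/api/generate` endpoint, its newline-delimited JSON stream in which each chunk carries a `response` fragment and the final chunk has `"done": true`, and that the model tag from the command above has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt, url=OLLAMA_URL):
    """Build a streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def collect_stream(lines):
    """Join the 'response' fragments of an NDJSON stream until a chunk reports done."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)                 # json.loads accepts str or bytes
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def generate(model, prompt):
    """Send the prompt and return the full completion (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return collect_stream(resp)             # iterating the response yields byte lines
```

With the server running, `generate("qwen3.6", "Reverse a string in Python")` returns the whole completion; `collect_stream` is kept separate so the parsing can be checked without a server.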
| Layer | Tool / Component | What it does |
|---|---|---|
| Agent | OpenCode (optional) | Intent → prompts + tools |
| Runtime | Ollama (llama.cpp) | Transformer forward pass |
| Model | Qwen Coder (GGUF) | Learned weight matrices |
| Hardware | GPU / CPU | Matrix multiply + attention |
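The computation: prompt → tokens → embeddings → stacked Transformer blocks (self-attention + feed-forward, mostly GEMM) → logits → next token. The runtime schedules the operations and the chip executes them: for a dense model that is roughly one multiply-add per weight for every generated token, so billions of multiply-adds per token.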
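To make "mostly GEMM" concrete, here is a toy, pure-Python Transformer block with a counter that tallies every multiply-add. The dimensions and random weights are made up for illustration, and LayerNorm, multiple heads, and biases are left out; it sketches the data flow, not any runtime's kernels.

```python
import math
import random

def matmul(a, b, macs):
    """Naive GEMM: (n x k) @ (k x m). Adds n*k*m multiply-adds to macs[0]."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)] for i in range(n)]
    macs[0] += n * k * m
    return out

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    out = []
    for row in a:
        peak = max(row)                               # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]      # exp(-inf) == 0.0 for masked entries
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def transformer_block(x, w, macs):
    """Causal single-head self-attention + 2-layer ReLU feed-forward, each with a residual."""
    n, d = len(x), len(x[0])
    q = matmul(x, w["wq"], macs)                      # queries (n x d)
    k = matmul(x, w["wk"], macs)                      # keys    (n x d)
    v = matmul(x, w["wv"], macs)                      # values  (n x d)
    scores = matmul(q, transpose(k), macs)            # attention logits (n x n)
    for i in range(n):
        for j in range(n):
            # scale by 1/sqrt(d); mask future positions so token i only sees tokens <= i
            scores[i][j] = scores[i][j] / math.sqrt(d) if j <= i else float("-inf")
    attn = matmul(softmax_rows(scores), v, macs)      # weighted sum of values (n x d)
    x = add(x, matmul(attn, w["wo"], macs))           # output projection + residual
    hidden = relu(matmul(x, w["w1"], macs))           # feed-forward expand (n x d_ff)
    return add(x, matmul(hidden, w["w2"], macs))      # feed-forward contract + residual

def random_weights(d, d_ff, seed=0):
    """Hypothetical small random weights standing in for a trained model."""
    rng = random.Random(seed)
    def mat(r, c):
        return [[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(r)]
    return {"wq": mat(d, d), "wk": mat(d, d), "wv": mat(d, d), "wo": mat(d, d),
            "w1": mat(d, d_ff), "w2": mat(d_ff, d)}

if __name__ == "__main__":
    n, d, d_ff = 4, 8, 16
    rng = random.Random(1)
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # stand-in token embeddings
    macs = [0]
    y = transformer_block(x, random_weights(d, d_ff), macs)
    print(len(y), len(y[0]), macs[0])  # prints: 4 8 2304
```

For sequence length n, width d, and feed-forward width d_ff the tally is 4nd² + 2n²d + 2n·d·d_ff. The weight-matrix terms (4d² + 2d·d_ff per token per layer) are the ones that reach billions per token once d and the layer count grow to production sizes.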
Summary: Model = numbers. Runtime = recipe. Hardware = executor. Everything else is optional.
AI Components
| AI components | Open-Source | Proprietary |
|---|---|---|
| Local Runtimes | | |
| Code Editors | | |
| Agents / CLI | | |
| Model Platforms | | |
| AI Models | | |
Multimodal AI: Beyond Text and Code
The same machinery (transformers, plus diffusion models for pixels and audio) generates speech, images, video, and music. These are models, not tools: they run the same kind of forward-pass tensor math on the same hardware, and mainly the training domain changes. The closed-vs-open split repeats across every modality:
| Modality | Closed / Proprietary | Open-Source | Run it locally with |
|---|---|---|---|
| Speech-to-text (STT) | OpenAI gpt-4o-transcribe / Deepgram | Whisper / Moonshine | whisper.cpp |
| Text-to-speech (TTS) | ElevenLabs / OpenAI TTS | Kokoro / XTTS | ComfyUI |
| Image generation | Midjourney / GPT Image | FLUX.1 / Stable Diffusion 3.5 | ComfyUI |
| Video generation | Sora / Runway Gen-4 | HunyuanVideo / LTX-Video | ComfyUI |
| Embeddings | OpenAI text-embedding-3 / Cohere | BGE / Nomic Embed | Ollama |
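Note: Image, video, and audio generation commonly run in ComfyUI, the closest thing to a universal open runtime for these modalities (the Ollama of pixels). Speech recognition runs in whisper.cpp. Reverse flows (image/audio → understanding) are handled by multimodal LLMs such as those in the Model row of the stack table.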
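The Embeddings row is also what powers the Vector DB / Memory layer: texts become vectors, and retrieval is nearest-neighbour search by cosine similarity. A minimal sketch, assuming Ollama's `/api/embed` endpoint (which takes `model` and `input` and returns an `embeddings` list) and an embedding model such as `nomic-embed-text` pulled locally; a vector database replaces the brute-force `top_k` scan below with an index.

```python
import json
import math
import urllib.request

def build_embed_request(model, texts, url="http://localhost:11434/api/embed"):
    """Build a batch embedding request for a local Ollama server."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def embed(model, texts):
    """Return one vector per input text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_embed_request(model, texts)) as resp:
        return json.loads(resp.read())["embeddings"]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k documents most similar to the query (brute-force scan)."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]
```

A retrieval step would call `embed` once for the document chunks and once for the query, then pass the `top_k` chunks to the model as context.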
Build Strategy in 2026
Use this quick decision rule:
- Local-first when privacy, cost control, or offline operation dominates
- Cloud-first when setup speed and model quality dominate
- Hybrid when you want local dev loops plus cloud fallback for hard tasks
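The hybrid option can be as small as a routing function. In this sketch `local`, `cloud`, and `is_hard` are hypothetical callables you would supply (for example, `local` could wrap an Ollama request and `cloud` a hosted API client); a local failure such as a refused connection or a timeout falls through to the cloud.

```python
from typing import Callable

def route(prompt: str,
          local: Callable[[str], str],
          cloud: Callable[[str], str],
          is_hard: Callable[[str], bool]) -> tuple[str, str]:
    """Send easy prompts to the local model; send hard ones, or local failures, to the cloud."""
    if not is_hard(prompt):
        try:
            return "local", local(prompt)
        except OSError:          # covers ConnectionError, TimeoutError, urllib's URLError
            pass                 # local runtime unavailable: fall back below
    return "cloud", cloud(prompt)
```

A crude `is_hard` might flag long prompts or multi-file edits; the function returns which backend answered so latency and cost can be logged per route.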
At this point, the hard problem is no longer model availability. It is integration quality: latency, reliability, memory design, and tool orchestration.