AI Coding Tools

Understanding the stack, tools, and models powering modern AI coding

What Is AI Coding?

AI coding is no longer one tool plus one model. Real workflows mix editor, agent, runtime, model host, and hardware depending on task, privacy, and budget.

This guide is compact by design:

Shared industry stack
Provider deltas that actually matter
Minimal local stack
Tool catalog by layer

The Industry Stack

A frontier AI system is a vertical stack, from silicon at the bottom to the app at the top. The table below compares two ecosystems: the closed, proprietary labs (OpenAI, Anthropic, Google) and the open-source stack you can self-host. Read it top-down, from the user-facing app to the hardware foundation. Proprietary entries roughly follow OpenAI, Anthropic, Google order, with commas separating alternatives.

Stack Layer	Closed / Proprietary Ecosystem	Open-Source / Self-Hosted Ecosystem
App (UI/UX)	ChatGPT, Claude, Gemini	Open WebUI, AnythingLLM, Jan
Coding Agent	GitHub Copilot, Claude Code, Cursor	Cline, Aider, OpenHands (formerly OpenDevin)
Agent SDK / Orchestrator	OpenAI Assistants API, LangSmith (managed)	LangChain, LlamaIndex, CrewAI, AutoGen
Vector DB / Memory	Pinecone, Vertex AI Search, Enterprise Weaviate	pgvector, Qdrant, Chroma, Milvus
Model	GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro	Llama 3, Qwen 2.5, DeepSeek-V3, Phi-4
API / Model Hosting	OpenAI Platform, Anthropic API, Vertex AI	Self-hosted: vLLM, Ollama, TGI; managed open-model providers: OpenRouter, Together AI
Inference Runtime	Proprietary internal engines (Google/OpenAI)	vLLM, Ollama, llama.cpp, SGLang, TensorRT-LLM
Pre-training Data	Undisclosed web scraping, licensed media, private synthetic data	Open corpora (FineWeb, RedPajama, The Stack)
Alignment (RLHF/DPO)	Proprietary RLHF, Constitutional AI, RL methods	Open-source alignment pipelines (TRL, Alignment Handbook), Open datasets
Training Frameworks	Custom internal orchestration layers	PyTorch, JAX, DeepSpeed, Megatron-LM
Infrastructure Orchestration	Managed hyperscaler clusters	Kubernetes, Ray, Slurm (bare-metal)
Cloud Compute	AWS, Azure, Google Cloud (TPUs)	RunPod, Vast.ai, Lambda Labs, on-prem clusters
Hardware Accelerator	Custom cloud ASICs (TPU, Trainium, Maia)	NVIDIA GPUs (H100/B200), AMD Instinct (MI300X), Apple Silicon
Interconnect	InfiniBand, NVLink / NVSwitch (NVIDIA ecosystem)	Ultra Ethernet, Standard RoCE/Ethernet

Note: Closed stacks are vertically integrated, which lowers integration cost and tightens control but deepens lock-in. The open-source stack trades turnkey convenience for portability: every layer, from the chip to the app, can be swapped or self-hosted.

What Is Mostly Standardized

Markdown-first text output
SSE + JSON delta streaming
Markdown -> AST -> component render path
MCP as practical tool-calling standard

The middle layers have converged; real differences are concentrated in model behavior, context reliability, product UX, and ecosystem lock-in. For the provider-by-provider comparison — and what each chatbot can actually do (web search, Canvas/Artifacts, Mermaid, maps) — see AI Chatbot Platforms.
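The "SSE + JSON delta streaming" item can be illustrated with a small parser. This is a minimal sketch, not any provider's actual wire format: it assumes each event carries one JSON object with a hypothetical `delta` text field and that the stream ends with a `[DONE]` sentinel, a convention some APIs use. Real providers nest the delta differently, so the key lookup would need adjusting.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    An event is one or more `data:` lines terminated by a blank line.
    Comment lines (starting with ":") and other fields are ignored here.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    # An unterminated trailing event is dropped, as the SSE rules require.


def accumulate_text(lines: Iterable[str], delta_key: str = "delta") -> str:
    """Concatenate text deltas until the stream ends or [DONE] arrives.

    `delta_key` and the "[DONE]" sentinel are assumptions for illustration.
    """
    parts = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        parts.append(event.get(delta_key, ""))
    return "".join(parts)
```

A UI would typically re-render its Markdown view after each delta instead of waiting for the joined string, which is why the streaming and rendering conventions tend to be discussed together.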

The Minimal Stack: What You Actually Need

A developer needs four layers, and two tools cover the software side:

ollama run qwen3.6    # runtime + model
opencode              # optional agent

Layer	Tool / Component	What it does
Agent	OpenCode (optional)	Intent → prompts + tools
Runtime	Ollama (llama.cpp)	Transformer forward pass
Model	Qwen Coder (GGUF)	Learned weight matrices
Hardware	GPU / CPU	Matrix multiply + attention
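Once the runtime is up, an agent or script talks to it over HTTP. The sketch below assumes Ollama is listening on its default local port and uses its `/api/generate` endpoint in non-streaming mode. The model tag is copied from the command above, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the runtime and return the completed text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False the server replies with a single JSON object.
    return body["response"]
```

With a server running, `generate("qwen3.6", "Reverse a string in Python.")` would return the completion as plain text. This is the whole contract between the agent layer and the runtime layer: a prompt goes in, text comes out.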

The computation: Prompt → tokens → embeddings → stacked Transformer blocks (self-attention + feed-forward, mostly GEMM). The runtime schedules the operations and the chip executes them: billions of multiply-adds per token.
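To make "billions of multiply-adds per token" concrete, here is a rough back-of-the-envelope estimate. It relies on a common approximation for dense Transformers: each weight takes part in about one multiply-add per generated token, so the work is about 2 FLOPs per parameter. It ignores attention over the context and memory-bandwidth limits, which often dominate in practice. The model size and throughput figures are hypothetical.

```python
def multiply_adds_per_token(n_params: float) -> float:
    """Approximate multiply-adds per generated token for a dense model.

    Assumption: every weight is used once per token (no sparsity or MoE),
    and the cost of attention over the context is ignored.
    """
    return float(n_params)


def compute_seconds_per_token(n_params: float, sustained_flops: float) -> float:
    """Compute-bound time per token: 2 FLOPs per multiply-add over throughput."""
    flops = 2.0 * multiply_adds_per_token(n_params)
    return flops / sustained_flops


# Hypothetical example: a 7-billion-parameter model on a chip that sustains
# 10 TFLOP/s on these matrix multiplies.
macs = multiply_adds_per_token(7e9)
seconds = compute_seconds_per_token(7e9, 1e13)
```

Under these assumptions the model needs about 7 billion multiply-adds per token, or about 1.4 ms of pure compute. Real decoding is usually slower, because streaming the weights from memory takes longer than the arithmetic itself.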

Summary: Model = numbers. Runtime = recipe. Hardware = executor. Everything else is optional.

AI Components

AI components	Open-Source	Proprietary
Local Runtimes
Code Editors
Agents / CLI
Model Platforms
AI Models

Multimodal AI: Beyond Text and Code

The same machinery, transformers plus diffusion models for pixels and audio, generates speech, images, video, and music. These are models, not tools: they run the same kind of forward-pass math on the same hardware, and mainly the training domain changes. The closed-vs-open split repeats across every modality:

Modality	Closed / Proprietary	Open-Source	Run it locally with
Speech-to-text (STT)	OpenAI gpt-4o-transcribe / Deepgram	Whisper / Moonshine	whisper.cpp
Text-to-speech (TTS)	ElevenLabs / OpenAI TTS	Kokoro / XTTS	ComfyUI
Image generation	Midjourney / GPT Image	FLUX.1 / Stable Diffusion 3.5	ComfyUI
Video generation	Sora / Runway Gen-4	HunyuanVideo / LTX-Video	ComfyUI
Embeddings	OpenAI text-embedding-3 / Cohere	BGE / Nomic Embed	Ollama

Note: Image, video, and audio generation use ComfyUI as the common open runtime, the Ollama of pixels. Speech-to-text uses whisper.cpp, and embeddings run in Ollama. Reverse flows (image/audio → understanding) are handled by the multimodal LLMs in the Model layer above.

Build Strategy in 2026

Use this quick decision rule:

Local-first when privacy, cost control, or offline operation dominates
Cloud-first when setup speed and model quality dominate
Hybrid when you want local dev loops plus cloud fallback for hard tasks
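The decision rule can be written down as a small function. This is one possible reading, not a definitive policy: it treats a mix of local and cloud pressures as the hybrid case and defaults to cloud-first when nothing pulls either way. Both choices are assumptions. The `local`, `cloud`, and `is_good_enough` callables are hypothetical stand-ins for your own model clients and quality check.

```python
from enum import Enum
from typing import Callable


class Strategy(Enum):
    LOCAL_FIRST = "local-first"
    CLOUD_FIRST = "cloud-first"
    HYBRID = "hybrid"


def choose_strategy(
    privacy_or_offline: bool,
    cost_sensitive: bool,
    needs_top_quality: bool,
    needs_fast_setup: bool,
) -> Strategy:
    """Map the three rules above onto a single choice."""
    local_pull = privacy_or_offline or cost_sensitive
    cloud_pull = needs_top_quality or needs_fast_setup
    if local_pull and cloud_pull:
        return Strategy.HYBRID  # assumption: mixed pressures mean hybrid
    if local_pull:
        return Strategy.LOCAL_FIRST
    return Strategy.CLOUD_FIRST  # assumption: default when nothing dominates


def hybrid_complete(
    prompt: str,
    local: Callable[[str], str],
    cloud: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
) -> str:
    """Try the local model first and fall back to the cloud for hard tasks."""
    answer = local(prompt)
    return answer if is_good_enough(answer) else cloud(prompt)
```

In a hybrid setup the interesting design work sits in `is_good_enough`: a failing test run, a syntax error, or an empty answer are all plausible triggers for the cloud fallback.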

At this point, the hard problem is no longer model availability. It is integration quality: latency, reliability, memory design, and tool orchestration.