The New CAD

New tools, new benchmarks, new technology

Jun 15, 2026

The New (Experimental) CAD

The industrial computer-aided design (CAD) software environment is transitioning from an ecosystem dominated by monolithic, desktop-bound modeling kernels and rigid Product Lifecycle Management (PLM) architectures toward a decoupled, multi-tier landscape[cite: 2, 6]. This structural shift is characterized by the rise of web-native, algorithmic, and API-driven geometric tools that segregate core modeling tasks into specialized software layers[cite: 2, 6].

According to industry analysis by Ralph Grabowski in “Experimental CAD Is Exploding All Over”, the contemporary CAD sector is experiencing a proliferation of specialized, algorithmic modeling paradigms—exemplified by independent projects such as Waterfall CAD, ForgeCAD, CADara, and Trinix[cite: 5, 6]. Rather than presenting single-seat replacements for established corporate platforms, these tools represent a functional fragmentation of the engineering pipeline[cite: 2, 6]. Instead of relying on a single monolithic system to manage the entire lifecycle from conceptualization to engineering documentation, specific workflow bottlenecks are being isolated and addressed by point solutions and Large Language Model (LLM) wrappers[cite: 2, 6].

Technical documentation and market evaluations indicate that this architectural fragmentation is driven by tools integrating directly into existing software environments or serving as headless infrastructure APIs[cite: 2]. This targeted automation primarily addresses two specific operational bottlenecks:

API-Driven Geometry Generation and Infrastructure: Rather than interacting via conventional graphical user interfaces (GUIs), developers and engineering teams are utilizing automated geometry infrastructure[cite: 2]. Zoo.dev provides a browser-based Design API and an automated conversational agent named Zookeeper[cite: 2]. This architecture functions as an agentic program that orchestrates LLMs alongside specialized geometry tools to interpret natural language or scripted commands, translating them directly into programmatic geometry workflows[cite: 2].
Documentation and Drafting Automation: The labor-intensive requirement of translating 3D solids into 2D technical drawings is targeted by specialized, geometry-conditioned plugins[cite: 2]. Software such as DraftAid integrates directly as a subsystem inside legacy kernels, including SolidWorks and Autodesk Inventor[cite: 2]. By evaluating existing models and learning company-specific drafting styles, these tools isolate and automate view generation and annotation placement within the local session, with claimed reductions in drafting time of up to 90%[cite: 2].

Consequently, the mechanical engineering workflow is evolving from a single-application paradigm toward a multi-tier pipeline where diverse, specialized agents manage discrete geometric and administrative tasks[cite: 2, 6]. To evaluate the technical viability of these emerging systems, engineers must move past high-level integration features and analyze the underlying mathematical frameworks used to synthesize and validate the resulting boundary representations (B-reps)[cite: 1, 4].

The Metrology of AI (Separating Hype from the Kernel)

To an engineer evaluating software for production deployment, a generative model’s underlying parameter count or natural language proficiency is secondary. The functional value of an AI CAD tool is measured by its regeneration success rate, topological validity, and feature tree integrity[cite: 6]. Historically, generative 3D models were evaluated using visual and surface-level metrics such as Chamfer distance or Intersection over Union (IoU)[cite: 1]. While adequate for rendering and concept art, these metrics are completely inadequate for mechanical engineering; a generated model might visually resemble a spur gear, but without precise mathematical definitions, it cannot be toleranced, subjected to structural analysis, or prepared for CNC machining[cite: 1]. Consequently, the industry has pivoted toward rigorous benchmarks that function as automated geometric quality assurance[cite: 1].
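To make the gap between visual metrics and engineering tolerances concrete, here is a minimal sketch comparing a nominal bore profile against one that is 0.05 mm oversize on the radius. The 10 mm radius, the 0.05 mm error, and the ±0.01 mm tolerance are illustrative assumptions, not values taken from any benchmark.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, summed over both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def circle_points(radius, n):
    """Sample n evenly spaced points on a circle centred at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Assumed example: nominal bore radius 10.00 mm, generated bore radius 10.05 mm.
nominal = circle_points(10.0, 360)
generated = circle_points(10.05, 360)

cd = chamfer_distance(nominal, generated)   # about 0.10 mm in total
iou = (10.0 / 10.05) ** 2                   # area ratio of the two concentric discs, about 0.99

# Assumed tolerance of +/-0.01 mm on the radius: the part is out of spec
# even though both shape metrics look excellent.
within_tolerance = abs(10.05 - 10.0) <= 0.01
print(f"Chamfer distance: {cd:.3f} mm, IoU: {iou:.4f}, in tolerance: {within_tolerance}")
```

Both metrics report a near-perfect match, yet the bore fails the assumed tolerance by a factor of five. A shape-similarity score on its own says nothing about whether a part can be manufactured to spec.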

BenchCAD: The Automated Inspection Harness

BenchCAD operates as a capability-decomposed evaluation framework for industrial CAD reasoning[cite: 3, 6]. Rather than assessing mesh outputs, it evaluates an AI’s programmatic logic against a dataset of over 17,900 execution-verified CadQuery programs spanning 106 industrial part families, such as bevel gears, springs, and twist drills[cite: 3, 6]. BenchCAD functions as an automated inspector, actively penalizing generative models that misinterpret strict industrial parameters or attempt to replace complex mechanical operations—such as precise lofts or helical twists—with simplistic extrusions[cite: 1, 6].
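The "simplistic extrusion" failure mode can be pictured with a crude static check on the generated program. The sketch below is not BenchCAD's actual scoring method; `required_ops` and the sample script are hypothetical, and the check only looks at which CadQuery-style methods (such as `extrude` or `twistExtrude`) the script calls. The script is parsed, never executed, so CadQuery does not need to be installed.

```python
import ast

def called_methods(source):
    """Collect the names of all methods invoked as `obj.method(...)` in a script."""
    tree = ast.parse(source)
    return {
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    }

def missing_operations(source, required):
    """Return the required operations that the script never calls."""
    return set(required) - called_methods(source)

# Hypothetical model output for a part that needs a helical twist.
generated = '''
import cadquery as cq
result = cq.Workplane("XY").rect(4, 4).extrude(20)
'''

# Hypothetical requirement for this part family.
required_ops = {"twistExtrude"}

missing = missing_operations(generated, required_ops)
print("missing operations:", missing)
```

A check like this only catches the absence of an operation. Confirming that the operation was used with the right parameters still requires executing the program and inspecting the resulting geometry.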

Parametric CAD Bench: The Kernel Stress Test

Modern AI CAD tools are increasingly deployed as autonomous agents that iteratively interact with underlying CAD engines[cite: 1]. Parametric CAD Bench evaluates these agents by executing them within a sandboxed CAD environment[cite: 3, 6]. It serves as a rigorous stress test for multi-step engineering workflows[cite: 6]. Instead of merely evaluating the final shape of a single part, this benchmark measures an agent’s ability to generate assemblies with correct mathematical mates, enforce component hierarchies, adhere strictly to dimensional specifications, and ensure that the final design passes kernel-level rebuildability tests without throwing topological errors[cite: 6].

CADGenBench: The Drafting Translation Challenge

A significant bottleneck in mechanical design is the translation of legacy 2D data into functional 3D representations. CADGenBench directly targets this translation phase, presenting a standardized drawing-to-CAD challenge[cite: 3, 6]. It measures a model’s capacity to interpret explicit 2D engineering drawings and textual edit instructions, and correctly map them into valid, deterministic 3D models[cite: 3, 6]. By requiring strict engineering-grade geometry quality, CADGenBench ensures that the generated models are not just approximations, but precise parametric translations of the source diagrams[cite: 3, 6].

Together, BenchCAD, Parametric CAD Bench, and CADGenBench are a first step toward a metrology for AI-driven design. By moving beyond surface-level visual metrics toward execution-verified logic, assembly reasoning, and drawing translation, they provide a foundational framework for assessing engineering intent. They are only a beginning, however: fully autonomous, production-grade engineering will require more rigorous and comprehensive evaluation suites capable of validating the structural integrity, material performance, and manufacturing feasibility of AI-generated models.

The Underlying Tech (The Good, the Bad, and the Dumb)

To safely deploy AI generation tools into a production engineering environment, experts must understand exactly how the underlying architecture dictates where the geometry will inevitably break down. Mapping the AI mechanics directly to their real-world downstream impacts reveals that commercial offerings currently rely on three distinct frameworks for generating 3D forms, each with explicit limitations [cite: 1].

Mesh-Based Diffusion

In this approach, neural networks trained on 3D datasets generate surface meshes (typically in STL, OBJ, or glTF format) directly from text prompts [cite: 1]. While this generation is computationally fast and highly effective for industrial styling or visual concepting, it yields a dumb lump of polygons [cite: 1]. A generated model might visually resemble a functional component, but it is fundamentally non-parametric [cite: 1]. Because it lacks exact mathematical definitions, analytic faces, and sketch constraints, the output is structurally useless for precision tolerancing, functional assembly mating, or deterministic CNC toolpaths [cite: 0]. Converting such meshes into feature-based models is an active research topic.
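A small calculation shows why a faceted surface cannot carry a tolerance the way an analytic face can. The sketch below assumes the best case, where the mesh is an exact inscribed polygon of a known circle; a diffusion-generated mesh has no underlying circle at all, so the real situation is worse. The radius and tolerance values are illustrative assumptions.

```python
import math

def chordal_deviation(radius, segments):
    """Largest gap between a true circle and an inscribed regular polygon with `segments` sides."""
    return radius * (1.0 - math.cos(math.pi / segments))

def segments_for_tolerance(radius, tolerance):
    """Smallest segment count whose chordal deviation does not exceed `tolerance`."""
    return math.ceil(math.pi / math.acos(1.0 - tolerance / radius))

# Assumed example: a 10 mm radius hole tessellated with 32 facets.
deviation_32 = chordal_deviation(10.0, 32)

# Facets needed to stay within an assumed 0.01 mm form tolerance.
needed = segments_for_tolerance(10.0, 0.01)

print(f"32 facets deviate by {deviation_32:.4f} mm; {needed} facets needed for 0.01 mm")
```

Even when the facet count is high enough, the mesh still stores only vertices and triangles. There is no radius parameter to edit, dimension, or mate against.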

Code-Generation (Programmatic CAD)

To bridge the gap between visual approximation and engineering utility, programmatic architectures utilize LLMs to translate text into executable CAD scripts using languages like CadQuery, OpenSCAD, or FreeCAD Python [cite: 1, 2]. The primary advantage of this framework is that executing the script yields an actual, editable parametric feature history [cite: 1]. However, the architectural flaw is that these models hit a hard ceiling on long-horizon command sequences [cite: 5]. As geometries become intricate, the generated code becomes fragile; the AI frequently hallucinates invalid Boolean operations or conflicting dimensional constraints, causing the script to fail at execution or crash the CAD kernel entirely [cite: 5].
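Because a bad script can take the kernel down with it, a harness that measures regeneration success would plausibly run each candidate in its own process. The following is a minimal sketch under that assumption; `try_regenerate` and the three stand-in scripts are hypothetical and use plain Python in place of real CadQuery code so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile

def try_regenerate(script_source, timeout_s=30.0):
    """Run one generated script in its own interpreter process.

    Returns (ok, detail), where detail is the last stderr line on failure.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
        fh.write(script_source)
        path = fh.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    finally:
        os.unlink(path)
    if proc.returncode == 0:
        return True, ""
    lines = proc.stderr.strip().splitlines()
    return False, lines[-1] if lines else "unknown error"

# Stand-in scripts (hypothetical): one valid, one with a constraint failure,
# one that is not even syntactically complete.
candidates = [
    "radius = 5.0\nassert radius > 0\n",
    "raise ValueError('fillet radius exceeds edge length')\n",
    "result = box(10, 10,\n",
]

outcomes = [try_regenerate(src) for src in candidates]
success_rate = sum(ok for ok, _ in outcomes) / len(outcomes)
print(f"regeneration success rate: {success_rate:.2f}")
```

Process isolation means a crash or hang in one candidate counts as a single failure instead of ending the whole evaluation run. The captured error line can also be fed back to the model for a repair attempt.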

Direct B-rep Diffusion

This architecture represents the most relevant frontier for production engineering. Rather than outputting brittle code or dumb meshes, geometry-native models directly synthesize mathematical boundary representations (B-reps) [cite: 3]. Systems such as BrepGen and BrepDiff utilize structured latent geometry trees or masked UV grids to sequentially generate faces, edges, and vertices, ultimately outputting kernel-valid STEP files [cite: 0, 3]. These models are excellent for localized geometric completions or direct modeling workflows [cite: 3]. However, the technology is currently bottlenecked by topology: handling complex topological intersections and guaranteeing watertight manifolds without breaking global historical constraints remains a massive computational hurdle for direct B-rep generation [cite: 3].
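The topology bottleneck is easier to appreciate with a toy validity check. The sketch below assumes a heavily simplified shell in which every face is a single loop of vertex IDs; real B-reps also carry curved surfaces, trimming curves, and seam edges that this ignores. It tests three conditions that a closed, genus-zero solid must satisfy: every edge is shared by exactly two faces, adjacent faces traverse shared edges in opposite directions, and V - E + F = 2.

```python
from collections import Counter

def directed_edges(face):
    """Yield the directed edges of a face loop given as a list of vertex IDs."""
    return [(face[i], face[(i + 1) % len(face)]) for i in range(len(face))]

def check_shell(faces):
    """Check watertightness, orientation consistency, and Euler characteristic."""
    directed = Counter(e for f in faces for e in directed_edges(f))
    undirected = Counter(frozenset(e) for e in directed.elements())
    vertices = {v for f in faces for v in f}
    watertight = all(n == 2 for n in undirected.values())
    oriented = all(
        n == 1 and directed.get((b, a), 0) == 1
        for (a, b), n in directed.items()
    )
    euler = len(vertices) - len(undirected) + len(faces)
    return {
        "watertight": watertight,
        "consistently_oriented": oriented,
        "euler_characteristic": euler,
    }

# Unit cube with outward-facing loops; vertices 0-3 are the bottom, 4-7 the top.
cube = [
    [0, 3, 2, 1],  # bottom
    [0, 1, 5, 4],  # front
    [1, 2, 6, 5],  # right
    [2, 3, 7, 6],  # back
    [3, 0, 4, 7],  # left
    [4, 5, 6, 7],  # top
]

closed = check_shell(cube)
open_box = check_shell(cube[:-1])  # drop the top face to simulate a missing patch
print(closed)
print(open_box)
```

A generative model that emits faces one at a time has to satisfy all three conditions globally. A single missing or flipped face is enough to turn a valid solid into an open shell.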

Conclusion

In summary, while the transition toward AI-driven CAD represents a significant evolution in engineering workflows—leveraging programmatic logic and specialized geometry tools to enhance productivity—it also highlights a critical maturity gap. Existing evaluation frameworks like BenchCAD, Parametric CAD Bench, and CADGenBench provide essential foundational benchmarks for verifying geometric and procedural logic, but they represent only the starting point. To move toward fully autonomous, production-grade engineering, the industry requires significantly more rigorous and comprehensive evaluation suites. True reliability in AI-generated design will only be achieved when we can validate deep structural integrity, material performance, and manufacturing feasibility with the same precision we currently apply to basic geometric validity.

References & Resources

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD.[cite: 3]
https://github.com/BenchCAD/BenchCAD-main
Parametric CAD Bench: Benchmarking CAD Models and Agents.[cite: 6]
https://www.gnucleus.ai/cad-bench
CADGenBench: A benchmark for AI-driven CAD generation and editing.[cite: 3]
https://github.com/huggingface/cadgenbench
BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry.[cite: 4]
https://brepgen.github.io
BrepDiff: Single-Stage B-rep Diffusion Model.
https://brepdiff.github.io
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward.
https://github.com/anniedoris/CAD-Coder
Zoo.dev: API-Driven CAD Infrastructure.
https://zoo.dev
AI CAD Software Guide (DraftAid context).
https://thecadhub.com/blog/ai-cad-software/
Experimental CAD Is Exploding All Over by Ralph Grabowski.