LangSmith
Build and deploy LLM applications with confidence
LangSmith is a platform to help developers close the gap between prototype and production. It's designed for building and iterating on products that can harness the power, and wrangle the complexity, of LLMs.
Reviews for LangSmith
Hear what real users highlight about this tool.
LangSmith earns strong praise for reliability, observability, and faster iteration from both makers and users. Makers of Watchman AI highlight flexible, non-LangChain use, real-time analytics, and versioning that keep agents stable. Makers of DryMerge call out robust metadata filtering essential for agent monitoring. Makers of Intryc applaud observability and prompt management. Users value tracing, datasets, evals, and CI/CD integration, noting smoother debugging and quicker feature evaluation. Common asks: a more scalable UI for large datasets and better sharable, persistent filters.
This AI-generated snapshot distills top reviewer sentiments.
Shoutout for all the observability and prompt management capabilities!
LangSmith gives me visibility into what my LLMs are actually doing — tracing, evaluating, and debugging end-to-end. Alternatives feel piecemeal, but LangSmith is built for production from day one. I can test, monitor, and improve models in the same place, which saves hours and makes iteration faster.
For evals and developing agents.
LangSmith has made it so much easier to take my LLM projects from idea to production. The debugging and iteration tools save me tons of trial and error.
Shoutout to LangSmith — the best tool we've found for debugging, evaluating, and improving LLM apps in production.
We’re building on OpenAI, Claude, and some custom RAG pipelines. Prompt engineering used to feel like trying to tune a car engine blindfolded. LangSmith gave us eyes, logs, metrics, and iteration velocity.
Why LangSmith stood out
We considered:
- Traceloop – beautiful UI, but lacks first-class support for LangChain agents/tools
- PromptLayer – good for tracking OpenAI usage, but not built for structured evals or agents
- Weights & Biases – amazing for ML ops, but too heavyweight for LLM chains
- Manual logging + BigQuery – flexible, but high overhead and no evaluation layer
LangSmith hit the sweet spot of observability, evaluation, and dev UX.
Why we use it in production:
- Structured logs across complex chains, agents, tools, retrievers
- Real-time debugging of streaming outputs and nested calls
- Trace replay to see exactly what happened inside a failed agent run
- Built-in evals for regressions, hallucinations, quality benchmarks
- Clean Python SDK that doesn't force a rewrite of your stack
Tips from our experience
- Use dataset runs + evals early, even on toy prompts — you'll thank yourself later
- When deploying an agent, wrap it in LangSmith tracing from day one — the logs alone are worth it
- Pair LangSmith with a feature flag system (like GrowthBook or LaunchDarkly) to ship LLM prompt updates safely
- We ship a lot faster now by building A/B evals around new prompts or RAG logic before rollout
LangSmith makes LLM dev feel like real software engineering. Observability and testability were missing from this space — until now.
I've been using LangSmith for a couple of months at our startup and it's been incredibly useful for running ongoing LLM evaluations and for evaluating new features. The Python SDK is handy and we've automated LangSmith evals as part of CI/CD on GitHub to spot regressions.
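A CI hook like the one described might look like the following. This is a hypothetical GitHub Actions workflow, not the reviewer's actual setup: the `run_evals.py` script and the secret name are assumptions.

```yaml
name: llm-evals
on: [pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install langsmith
      # Hypothetical eval script that runs LangSmith evaluations
      # and exits non-zero when a regression is detected.
      - run: python run_evals.py
        env:
          LANGSMITH_API_KEY: ${{ secrets.LANGSMITH_API_KEY }}
```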
My main struggle with LangSmith has been that the UI can be tricky to work with for larger datasets or datasets with a large experiment history. I would also love for the filters to be persisted in the URL. That way I can send a filtered set of failing examples to someone without having to provide instructions for how to reconstruct it.
We use LangSmith to streamline and manage our conversational agents, helping us build and refine the AI-driven coach within our app. It allows us to evaluate and improve LLM performance quickly, optimizing for user engagement and ensuring the chatbot provides accurate feedback.
I've loved using LangSmith! It's efficient and user-friendly, making it a joy to work with. The platform's comprehensive visibility into the chain sequence of calls simplifies debugging and enhances the development process.