Kotlin

A concise multiplatform language developed by JetBrains

AI Ecosystem Kotlin

Introducing Tracy: The AI Observability Library for Kotlin

By Anton Bragin

Tracy is an open-source Kotlin library that adds production-grade observability to AI-powered applications in minutes. It helps you debug failures, measure execution time, and track LLM usage across model calls, tool calls, and your own custom application logic. Ultimately, comprehensive observability gives you the data you need to understand how your application behaves in the real world, to analyze performance from high-level trends down to individual traces, and to support both online and offline evaluations.

It works seamlessly with common Kotlin/LLM stacks, including the OkHttp and Ktor HTTP clients as well as the OpenAI, Anthropic, and Gemini API clients, and it is built on OpenTelemetry under the hood. This architecture gives you full control over your trace data. You can export it in the standard format to any compatible backend (such as Jaeger, Zipkin, or Grafana), or send it directly to dedicated LLM engineering platforms such as Langfuse and W&B Weave.

Full-fledged AI frameworks such as Spring AI or Koog provide built-in observability. However, they trace LLM calls only when those calls go through the framework's own APIs, and they offer no easy way to trace the application's internal flow. Tracy, in contrast, monitors LLM usage by instrumenting API or HTTP clients directly. By annotating Kotlin functions or wrapping blocks of code in spans, you can also untangle the timing of, and causal relationships between, AI components or internal AI-agent states.

Tracy is open source, and we invite you to help extend its functionality. You can request new integrations for AI backends and API clients, or submit pull requests that implement them.

Components of AI observability and how Tracy helps

As engineers, whether we’re adding observability to an existing application or building a new one from scratch, we want to trace, store, and analyze the following:

  1. LLM call metadata, including the API being called, the model, and its parameters. Optionally, we may want to track LLM inputs and outputs during development for debugging, while ensuring they are not traced in production.
  2. Application logic flow that leads to and from LLM calls: where a given call originates and which tools are involved.

Imagine a very simple LLM chat application that greets the user, employing tools to make the greeting more personal. Using the OpenAI client, the application code might look like this:

/** Interface for LLM tool */
interface Tool<T> {
   /** Tool call */
   fun execute(): T
}

/** Gets the current user's name from the system */
class GetUserName() : Tool<GetUserName.UserNameResult> { ... }

/** Gets the current date and time */
class GetCurrentDateTime() : Tool<GetCurrentDateTime.DateTimeResult> { ... }

fun main() {
   // Create an OpenAI client configured from environment variables
   val client: OpenAIClient = OpenAIOkHttpClient.fromEnv()
   ...
   val params = ResponseCreateParams.builder()
       .model(ChatModel.GPT_4O_MINI)
       .maxOutputTokens(2048)
       .addTool(GetUserName::class.java)
       .addTool(GetCurrentDateTime::class.java)
       .input(ResponseCreateParams.Input.ofResponse(inputs))
       .build()

   // Get the response. 
   // In a real application, it would use a loop to process tool calls.
   val response: Response = client.responses().create(params)
   ...
   println(finalGreeting)
}

The important things to trace here are:

  1. The fact that the greeting agent was called.
  2. The LLM calls.
  3. The tool executions.

We could use the basic OpenTelemetry SDK, but we would have to write the instrumentation code by hand, and every tool implementation would end up repeating the same span-handling code.

In an ideal scenario, we would be able to configure tool tracing once and have all implementations traced automatically, ensuring we never end up in a situation where newly added tools go untraced. Tracy makes this scenario a reality.
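Here is a rough illustration of the difference. This minimal Kotlin sketch uses only the standard library, not Tracy or OpenTelemetry. `SimpleTool`, `ToolRecord`, `traced`, and `ToolRegistry` are hypothetical names. The timing and error-handling code lives in a single wrapper, and the registry applies that wrapper to every tool it accepts, so a newly added tool cannot skip tracing.

```kotlin
/** Hypothetical tool abstraction, mirroring the article's Tool<T> interface. */
interface SimpleTool<out T> {
    val name: String
    fun execute(): T
}

/** One recorded tool execution: which tool ran, how long it took, whether it failed. */
data class ToolRecord(val tool: String, val durationNanos: Long, val failed: Boolean)

/**
 * Wraps a tool so every execute() call is timed and recorded. Without a wrapper like
 * this, the same try/finally code would be copied into every tool implementation.
 */
fun <T> SimpleTool<T>.traced(log: MutableList<ToolRecord>): SimpleTool<T> {
    val delegate = this
    return object : SimpleTool<T> {
        override val name = delegate.name
        override fun execute(): T {
            val start = System.nanoTime()
            var failed = false
            try {
                return delegate.execute()
            } catch (e: Throwable) {
                failed = true
                throw e
            } finally {
                log += ToolRecord(name, System.nanoTime() - start, failed)
            }
        }
    }
}

/** The single entry point for tools, so tracing is configured once for all of them. */
class ToolRegistry {
    val records = mutableListOf<ToolRecord>()
    private val tools = mutableMapOf<String, SimpleTool<*>>()

    fun register(tool: SimpleTool<*>) {
        tools[tool.name] = tool.traced(records)
    }

    fun call(name: String): Any? = tools.getValue(name).execute()
}
```

This guarantee holds only as long as every tool goes through `register`. Tracy's annotation-based approach, covered below, attaches tracing to the interface itself.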

Adding observability with Tracy

Tracy provides three high-level APIs that help us fully cover our chat application with tracing.

Scoped spans

The withSpan API allows you to create scoped spans. These spans automatically activate when a block starts and end when the block finishes, ensuring correct nesting and timing. 

fun main() {
   // Wrapping the body in withSpan ensures that all nested events are
   // traced as part of the greeting agent’s work.
   withSpan("Greeting agent") {
       ...
   }  
}
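To show what "scoped" buys you, the following stdlib-only sketch implements a toy tracer with a similar `withSpan` shape. It is an illustration built on assumptions, not Tracy's implementation. `ToySpan` and `ToyTracer` are hypothetical, and the active span is tracked with a thread-local stack. When a span is opened inside another span's block, it records the outer span as its parent. The `finally` clause ends the span even if the block throws.

```kotlin
/** A span in this toy tracer (not Tracy's or OpenTelemetry's data model). */
class ToySpan(val name: String, val parent: ToySpan?) {
    val startNanos: Long = System.nanoTime()
    var endNanos: Long? = null
    var error: Throwable? = null
}

class ToyTracer {
    /** Spans that have ended, in the order they finished. */
    val finished = mutableListOf<ToySpan>()

    // Stack of active spans for the current thread; the top is the "current" span.
    private val active = ThreadLocal.withInitial { ArrayDeque<ToySpan>() }

    /** Runs [block] inside a new span that stays current for the duration of the block. */
    fun <R> withSpan(name: String, block: (ToySpan) -> R): R {
        val stack = active.get()
        val span = ToySpan(name, parent = stack.lastOrNull())
        stack.addLast(span) // activate: spans opened inside the block see this one as parent
        try {
            return block(span)
        } catch (e: Throwable) {
            span.error = e
            throw e
        } finally {
            stack.removeLast() // deactivate, restoring the previous current span
            span.endNanos = System.nanoTime()
            finished += span
        }
    }
}
```

Because the current span lives in a `ThreadLocal`, this toy loses track of the parent when a coroutine resumes on a different thread. OpenTelemetry models the active span as a `Context` that has to be carried across such thread hops. For Kotlin coroutines, the `opentelemetry-extension-kotlin` artifact provides a coroutine context element for this.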

LLM client instrumentation 

LLM calls are a crucial part of any AI agent. They largely determine the application's cost, latency, and efficiency, and they are the first thing to investigate when something goes wrong. That’s why adding observability to an LLM client should be straightforward and require minimal changes to the codebase. For example, instrumenting your OpenAI client takes a single call:

val client = OpenAIOkHttpClient.fromEnv()
// All calls made with the instrumented client are traced.
instrument(client)
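Conceptually, client instrumentation routes every request through tracing code without changing the call sites. The stdlib-only sketch below models this with a decorator. `ChatClient`, `ChatRequest`, `ChatResponse`, `LlmCallRecord`, and `instrumented` are hypothetical stand-ins, not OpenAI SDK or Tracy types. In the snippet above, the result of `instrument(client)` is not reassigned, which suggests that Tracy instruments the existing client object. Returning a wrapper is just the simplest way to show the idea here. The record holds metadata only, matching the default behavior described next.

```kotlin
/** Hypothetical minimal LLM client abstraction (not the OpenAI SDK's types). */
data class ChatRequest(val model: String, val maxOutputTokens: Int, val prompt: String)
data class ChatResponse(val text: String, val outputTokens: Int)

fun interface ChatClient {
    fun create(request: ChatRequest): ChatResponse
}

/** Metadata-only record of one LLM call: no prompt or response text is stored. */
data class LlmCallRecord(
    val model: String,
    val maxOutputTokens: Int,
    val outputTokens: Int?,
    val latencyNanos: Long,
    val error: String?,
)

/** Returns a client that behaves like [client] but records metadata for every call. */
fun instrumented(client: ChatClient, sink: MutableList<LlmCallRecord>): ChatClient =
    ChatClient { request ->
        val start = System.nanoTime()
        val result = runCatching { client.create(request) }
        sink += LlmCallRecord(
            model = request.model,
            maxOutputTokens = request.maxOutputTokens,
            outputTokens = result.getOrNull()?.outputTokens,
            latencyNanos = System.nanoTime() - start,
            error = result.exceptionOrNull()?.message,
        )
        result.getOrThrow() // callers still see the original response or exception
    }
```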

By default, client instrumentation traces metadata only. Tracing LLM inputs and outputs, which may contain sensitive data, must be enabled explicitly in code:

TracingManager.traceSensitiveContent()

Alternatively, you can enable it at runtime by setting the TRACY_CAPTURE_INPUT and TRACY_CAPTURE_OUTPUT environment variables to true.
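The opt-in rule fits in a few lines. Metadata is always attached, while inputs and outputs are attached only when capture is enabled. The sketch below reads the two environment variables named above and builds an attribute map for one LLM call. Several parts are assumptions for illustration: treating only a case-insensitive `true` as enabled, the `CaptureConfig` type, and the `llm.input`/`llm.output` keys. The `gen_ai.request.*` keys are modeled on the OpenTelemetry GenAI semantic conventions. None of this describes Tracy's internals.

```kotlin
/** Which sensitive payloads may be attached to spans; both are off unless enabled. */
data class CaptureConfig(val input: Boolean = false, val output: Boolean = false)

/** Assumed rule: a flag is on only if its variable equals "true", ignoring case. */
fun captureConfigFromEnv(env: Map<String, String> = System.getenv()): CaptureConfig =
    CaptureConfig(
        input = env["TRACY_CAPTURE_INPUT"].equals("true", ignoreCase = true),
        output = env["TRACY_CAPTURE_OUTPUT"].equals("true", ignoreCase = true),
    )

/** Span attributes for one LLM call: metadata always, prompt/completion only when allowed. */
fun llmSpanAttributes(
    config: CaptureConfig,
    model: String,
    maxOutputTokens: Int,
    prompt: String,
    completion: String,
): Map<String, Any> {
    val attributes = linkedMapOf<String, Any>(
        "gen_ai.request.model" to model,
        "gen_ai.request.max_tokens" to maxOutputTokens,
    )
    if (config.input) attributes["llm.input"] = prompt // hypothetical key
    if (config.output) attributes["llm.output"] = completion // hypothetical key
    return attributes
}
```

Passing the environment in as a parameter, with `System.getenv()` as the default, keeps the decision testable without modifying the process environment.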

Tool calls and function tracing

LLMs love tools: They help the LLMs effectively complete deterministic tasks, save tokens, and interact with the environment they operate in. As developers, we love tools as well, but adding observability for each and every LLM tool in the codebase is a mundane task that is easy to forget.

While decorators shine in such scenarios in Python frameworks, Kotlin developers could previously only look on with envy. Tracy changes that. With annotation-based tracing, you simply add the @Trace annotation to an interface method to enable tracing in all implementing classes. Tracing an isolated method or function is just as easy: annotate it with @Trace directly.

/** Interface for LLM tool */
interface Tool<T> {
   // All tool calls are now traced
   @Trace(name = "Tool Call")
   fun execute(): T
}
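This post does not describe how Tracy processes `@Trace`, and the sketch below makes no claim about it. It only illustrates why annotating the interface method is enough. If calls are intercepted at the interface level, every implementation is covered, including ones added later. The sketch uses a hypothetical runtime-retained `@Traced` annotation and a JDK dynamic proxy (`java.lang.reflect.Proxy`).

```kotlin
import java.lang.reflect.InvocationTargetException
import java.lang.reflect.Proxy

/** Hypothetical marker, similar in spirit to @Trace; retained at runtime for reflection. */
@Target(AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class Traced(val name: String)

interface Tool<T> {
    @Traced(name = "Tool Call")
    fun execute(): T
}

/**
 * Wraps any Tool implementation in a JDK dynamic proxy. Calls to methods annotated
 * on the interface are logged; all calls are forwarded to [target] unchanged.
 */
@Suppress("UNCHECKED_CAST")
fun <T> traceTool(target: Tool<T>, log: MutableList<String>): Tool<T> =
    Proxy.newProxyInstance(
        Tool::class.java.classLoader,
        arrayOf(Tool::class.java),
    ) { _, method, args ->
        // The Method seen here is the interface's, so the interface annotation is visible.
        val traced = method.getAnnotation(Traced::class.java)
        val outcome = runCatching { method.invoke(target, *(args ?: emptyArray<Any?>())) }
        if (traced != null) {
            val status = if (outcome.isSuccess) "ok" else "failed"
            log += "${traced.name} ${method.name}: $status"
        }
        // Unwrap reflection's wrapper so callers see the tool's own exception.
        outcome.getOrElse { e -> throw (e as? InvocationTargetException)?.targetException ?: e }
    } as Tool<T>
```

A dynamic proxy only sees calls made through the interface, so it cannot trace standalone functions, which the post says `@Trace` also supports. Covering those needs a different mechanism, such as compile-time code generation or bytecode instrumentation.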

Bringing it all together

Capturing telemetry from the application is only half the battle. The other half is routing it to a proper backend where it can be stored and analyzed. We recommend observability solutions built specifically for LLM tracing, and Tracy supports Langfuse and W&B Weave out of the box. It can also send traces to any OpenTelemetry-compatible backend, to a file, or to the console. The repository contains a number of examples, and the complete code for the example from this article is available here.

Configuring telemetry export to Langfuse takes seconds with Tracy. The result is a hierarchical application trace that captures both LLM and tool calls.
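Under the hood, a hierarchical trace is a flat collection of spans in which each child refers to its parent's ID, and the viewer rebuilds the tree from those links. The sketch below shows that reconstruction for a trace shaped like the greeting example. `FlatSpan` and `renderTrace` are hypothetical. Real OpenTelemetry spans carry more data, such as the trace ID, start and end timestamps, attributes, and status.

```kotlin
/** A flattened span as an exporter might write it: children point to their parent by ID. */
data class FlatSpan(val id: String, val parentId: String?, val name: String, val durationMs: Long)

/** Rebuilds the parent/child hierarchy and renders it as an indented tree, one span per line. */
fun renderTrace(spans: List<FlatSpan>): String {
    // Group children under their parent's ID; root spans end up under the null key.
    val children = spans.groupBy { it.parentId }
    val out = StringBuilder()
    fun visit(span: FlatSpan, depth: Int) {
        out.append("  ".repeat(depth))
            .append(span.name)
            .append(" (").append(span.durationMs).append(" ms)\n")
        children[span.id].orEmpty().forEach { visit(it, depth + 1) }
    }
    children[null].orEmpty().forEach { visit(it, 0) }
    return out.toString()
}
```

In this sketch, a span whose parent was never exported is silently dropped. A more forgiving renderer would show such spans as additional roots.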

What’s next

We truly believe that regardless of the pace of LLM progress in the coming years, observability will remain a cornerstone of effective and reliable AI engineering. No matter how good the underlying LLMs become, the applications using them must still be debugged and evaluated – both during development and in the field. We created Tracy in response to this demand, aiming to bring production-grade AI observability to the Kotlin ecosystem.

And we are just getting started! You can contribute to the growth of the Kotlin AI ecosystem by filing issues, submitting pull requests, or simply by trying Tracy in your projects and sharing your feedback. Let’s trace together!