Plotline | 2025

Conversational AI builder for consumer apps

Conversational AI for consumer apps //
Can AI actually help users in-context?

Every customer has a unique set of aspirations, sensibilities and expectations from consumer apps. Capturing the intent and assisting users at the right time is what every app aims for, but as B2C apps grow more complex, users struggle to complete key actions.

Team

1 Designer, 1 PM, 4 Engineers

From scoping to launch

4-5 weeks

Improvement in discovery

~40%

Reduction in time-to-value

~20%

Reduction in drop-offs

~15-20%

Avg reduction in support tickets

~30-40%

CONTEXT

What does Plotline do?
and what it couldn't do yet.

Plotline helps growth teams at consumer apps build in-app experiences - stories, nudges, walkthroughs, scratch cards - without writing code. The platform sits between a marketer's intent and their end user's behaviour. Our customers are product and growth teams at B2C apps, mostly in fintech, e-commerce, and gaming.

"I'm using Plotline's nudges. They've helped. But users still have questions I can't anticipate, and when they don't get answers, they leave."

Growth team, Kredivo (one of Indonesia's top 5 finance companies)

The problem wasn't that nudges weren't working. It was that nudges are one-directional. Nudges can prompt and inform. Nudges can't listen or talk. And that's what users were trying to do - have a conversation - at exactly the moments that mattered most.

When a user midway through a loan application suddenly wonders about a lock-in clause, a nudge can't answer that in the moment. A static FAQ won't cover every organic question. A support ticket takes 1–2 hours. By then, the user is gone.

Plotline's current tools allowed teams to create in-app experiences like these

Who is affected

End users are not able to complete key actions and derive value from the product.

Product owners are not able to convert their users and plug holes in business growth.

Why does it matter

For a user applying for a loan or making a first investment, that wait is the conversion killer.

Dropped-off users tend to raise support tickets, leading to overhead on the business team and delayed clarity.

PROBLEM // THE COST OF SILENCE

What happens when a user has a question and nobody answers

For a finance app (60% of Plotline's user base), these are the core UX flows where users seek assistance:

Add/Withdraw money from wallets

Adding money and retrieving money for use across the platform

Investments onboarding

Securities and fund investments

Lending through application

Products such as personal, housing, auto loans and credit against investments, P2P lending

A user is midway through a loan application. They hit a clause about lock-in periods. They need to understand what happens if they withdraw early. Right now, there are exactly two paths available to them.

Path 1

Raise a support ticket. Average resolution time: 1–2 hours. By then, the intent has cooled and trust has started eroding. Most users don't come back.

Path 2

Dig through an FAQ. The question they actually have - specific to their situation, their flow, their moment - isn't there. It never is. FAQs cover what product teams think users will ask, not what users actually ask.

"Over the past 2 years, we have seen that the time taken to resolve support tickets is inversely proportional to lifetime value of our customers."

ALETHIA TAN

SVP, Growth, Kredivo Indonesia

The pattern was consistent across every fintech customer we talked to. The gap wasn't information - the information existed somewhere, in a policy doc, or in an FAQ buried three taps deep. The gap was timing and context: the right information was not available at the moment the user needed it, in the place they were looking for it.

RESEARCH // UNDERSTANDING THE SHAPE OF THE PROBLEM

We talked to the people losing these users

What we didn't know

Before designing anything, I needed to understand what "answering user questions in real-time" actually meant to the people who'd be responsible for it - product and growth teams at our customer companies.

Where do drop-offs actually happen, and which of those moments could a conversation genuinely help?
What would make a marketer trust an AI agent enough to deploy it to their users?
What do end users expect from an AI inside a financial app - and where does trust break down?
What does "good" look like for a conversational interaction in a high-stakes context (lending, investments)?

How we found the answers

We ran 1:1 semi-structured interviews with product and growth leads at fintech apps in our customer base. Semi-structured because we wanted to follow threads: we had a guide, but the most valuable findings came from places we didn't expect. We talked to teams at Kredivo, Dream11, and several others across lending, investing, and wallets.

We also did a journey audit, mapping the core user flows (loan application, investment, wallet top-up/withdrawal) against where support tickets were being raised. This gave us a quantitative layer to ground the qualitative interviews.

One deliberate gap: no end-user research upfront. The timeline didn't allow it. We'd validate with real users during the pilot. That was a tradeoff we named out loud.

What are the main customer interactions within your app that could benefit from AI-powered conversations (e.g., customer support, product recommendations, order tracking)?
How do you currently gather customer feedback, troubleshoot issues, and upsell products? Would a conversational agent be suitable for any of these?
How do you estimate the agent's impact in your app (e.g., multilingual support, personalization, deep product knowledge)?
How important is AI-human handover in complex cases? What is your expectation of bot vs. human interactions?
What are your top concerns about integrating conversational AI agents? (Options: technical complexity, security and privacy, customer trust, handling edge cases, impact on brand, regulatory compliance)

"An agentic experience inside my app should aid the overall discoverability and usage. It should intelligently understand when it is needed and what it should help with."

Rishabh

Growth team, Dream11

Three things we learned that reshaped our direction

We expected the dominant concern to be "Will the AI give wrong answers?" That was a concern, but it wasn't the primary one.

Training: "How do I make it know what it needs to know?"

Marketers were anxious about knowledge gaps - stale information, missing context, wrong answers. But what surprised us was how they wanted to solve this. They wanted to teach the system from conversations they'd already had.

This insight directly shaped the benchmark system we use to evaluate the agent.

Testing: "How do I know it'll behave the right way before I push it live?"

There was near-universal anxiety about deploying something they couldn't fully preview. Teams wanted to simulate conversations - not just check settings. This wasn't about technical QA. It was about confidence. A marketer needs to be able to say "I have talked to this thing and it makes sense" before they trust it with their users.

Deployment: "What happens when it doesn't know something or gets it wrong?"

The question of escalation - when does the bot hand off to a human, and how - came up in every single interview. More importantly, several people raised brand risk: "If my AI agent says something incorrect about a loan product, I'm liable." This wasn't paranoia. It was valid. It changed how we thought about the autonomy spectrum.

APPROACH // WHAT A SOLUTION WOULD NEED TO DO

Key requirements that narrowed the field

Coming out of research, the shape of the solution was getting clearer. Whatever we built needed to solve for these.

Know where the user is in the app, not just "on the home screen" but "midway through a loan application, on the documentation step."

Know what they're trying to do and tailor the response to that specific intent, not a generic FAQ answer.

Handle questions it hasn't been specifically programmed for: the organic, contextual, long-tail questions that no scripted system can anticipate.

Respond in real time, because a 1–2 hour wait on a support ticket risks user abandonment.

Know when to stop: when a question touches compliance, liability, or something it genuinely doesn't know, it needs to hand off or escalate.

Why the obvious options didn't work

Decision trees / scripted flows

Decision trees are deterministic and auditable, but they fail requirement three immediately. You can't pre-build branches for "what happens to my lock-in if I want to prepay in 6 months?" The organic tail is infinite.

FAQ overlays / static knowledge surfaces

Plotline's nudge toolkit already did a version of this. Research told us users had outgrown it. Fails requirement three again, and partially fails requirement one (no awareness of where the user is in their flow).

LLM-based responses

LLMs are flexible: they can synthesise context from multiple sources, hold a conversation across turns, and adapt to tone and intent. We settled on an LLM at the core, wrapped in enough structure that a marketer could configure it, test it, trust it, and ship it.

The risks were real: hallucination, inconsistency, latency, brand voice drift. But these were design problems, not reasons to abandon the approach. Every structural decision in the product - the knowledge base architecture, the benchmark system, the testing simulator, the deployment controls - exists to constrain and direct the LLM, not to replace it.

Context

Who the agent is, how it speaks, what it can and can't discuss.

Knowledge

What it knows, from which sources, scoped to which flows.

Tools & Actions

What it can do beyond talking, and how it gets information.

Deployment

When it appears, who sees it, and how it fails gracefully.
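
One way to picture the four blocks is as a single agent definition. The sketch below is illustrative only: the class names, fields, and sample values are assumptions made for this example, not Plotline's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    persona: str                                          # who the agent is
    tone: str                                             # how it speaks
    off_limits: list[str] = field(default_factory=list)   # what it can't discuss


@dataclass
class KnowledgeSource:
    name: str          # what it knows
    flows: list[str]   # which user flows the source is scoped to


@dataclass
class Deployment:
    trigger_screens: list[str]   # when it appears
    audience: str                # who sees it
    fallback_message: str        # how it fails gracefully


@dataclass
class Agent:
    context: Context
    knowledge: list[KnowledgeSource]
    tools: list[str]             # what it can do beyond talking
    deployment: Deployment


# Hypothetical example values for a loan FAQ agent.
loan_agent = Agent(
    context=Context(
        persona="Loan assistant",
        tone="warm, plain language",
        off_limits=["legal advice"],
    ),
    knowledge=[KnowledgeSource(name="Loan policy PDF", flows=["loan_application"])],
    tools=["fetch_application_status"],
    deployment=Deployment(
        trigger_screens=["loan_documents"],
        audience="all users",
        fallback_message="Let me connect you with our support team.",
    ),
)
```

Keeping the blocks as separate types mirrors the idea that each one can be configured and reviewed on its own.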

TRAINING YOUR AGENT

How do you teach an AI what it should know (and only what it should know)?

Knowledge base

Collection of data points that the agent can retrieve as required such as FAQs, policy PDFs, product specs. The raw material for your agent to get started.

Benchmark conversations

Curated Q&A pairs that set the standard for how the agent should respond. Not just "here's the information" but "here's what a good answer sounds like."

Knowledge base

Problem to design for

A knowledge base that ingests everything and scopes nothing is dangerous. It hallucinates confidently from irrelevant sources. The design problem isn't "how do you give the agent more information?" It's "how do you make it reach for the right information at the right moment and ignore the rest?"

Solution

Easy addition of documents/URLs with guided flows
A marketer can tag a source to a specific user flow. "This document should only be referenced when the user is in the loan application." Scoping reduces hallucination by limiting what the model can reach for in any given moment.

Specific addition of URL based content for only relevant information

Adding usage context with your knowledge documents
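
The scoping rule can be sketched as a simple filter that runs before retrieval. This is a minimal illustration, assuming each source carries a set of flow tags and that an untagged source is available in every flow; the names `Source` and `sources_for_flow` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    title: str
    flows: frozenset[str]  # assumption: an empty set means "usable in any flow"


def sources_for_flow(sources: list[Source], current_flow: str) -> list[Source]:
    """Keep only the sources the agent may reference in the user's current flow."""
    return [s for s in sources if not s.flows or current_flow in s.flows]


# Hypothetical knowledge base for a finance app.
knowledge_base = [
    Source("Loan policy", frozenset({"loan_application"})),
    Source("Wallet FAQ", frozenset({"wallet"})),
    Source("General help", frozenset()),
]

in_scope = sources_for_flow(knowledge_base, "loan_application")
print([s.title for s in in_scope])
```

Filtering before retrieval means the wallet FAQ never reaches the model while the user is in the loan application, which is the behaviour the tagging is meant to produce.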

Benchmark conversations

Problem to design for

Your brand voice is built through the way you interact with your users. Marketers and growth managers were very conscious of preserving their brand voice.

This came directly from the research finding about invisible failure - marketers wanted to train the system from conversations that had already happened, not only by uploading PDFs and hoping for the best. They wanted the system to sound like the best of their manual responses and preserve the overall quality of responses.

Solution

The benchmark system works like this: a marketer adds a conversation pair - a real theme or topic and the ideal response. The agent calibrates against these.
In the simulator, the marketer rates the agent's actual response against the benchmark. Low-rated responses become a training signal.
Every design decision in this interface (the rating mechanism, the side-by-side comparison, the feedback input) was tested against one question: would a growth lead at Kredivo/SBI/BharatPe/Upstox understand what they're doing here?

The benchmark interface frames this as teaching by example. Not weights. Not fine-tuning. Not parameters. Just: here's what good looks like. The agent moves toward those responses over time.
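
The rate-then-improve loop can be sketched in a few lines. This is an illustration under stated assumptions, not how the product calibrates internally: the 1 to 5 rating scale, the `min_acceptable` threshold, and the names `RatedResponse` and `training_queue` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RatedResponse:
    topic: str
    benchmark_answer: str   # the ideal response the marketer wrote
    agent_answer: str       # what the agent actually said in the simulator
    rating: int             # assumption: 1 (poor) to 5 (matches the benchmark)


def training_queue(rated: list[RatedResponse], min_acceptable: int = 4) -> list[RatedResponse]:
    """Collect low-rated responses, worst first, as items to review and retrain on."""
    low = [r for r in rated if r.rating < min_acceptable]
    return sorted(low, key=lambda r: r.rating)


# Hypothetical simulator ratings.
ratings = [
    RatedResponse("wallet top-up", "Ideal answer A", "Agent answer A", 5),
    RatedResponse("lock-in period", "Ideal answer B", "Agent answer B", 3),
    RatedResponse("prepayment", "Ideal answer C", "Agent answer C", 2),
]

queue = training_queue(ratings)
print([r.topic for r in queue])
```

Sorting worst first keeps the marketer's attention on the responses that drift furthest from the benchmark.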

Benchmark addition for human-like responses in your authentic brand voice

Blind testing via a side-by-side rating mechanism compares agent responses to previous benchmarks, which helps give the system unbiased direction.

Blind rating system for actual agent responses

CONFIGURING YOUR AGENT

How do you make "configure an AI" feel like a job a marketer already knows how to do?

Setting up your agents

Since the whole idea of an AI agent handling users' needs, aspirations and frustrations was new, I designed a library of pre-configured templates for common use cases to reduce the cognitive load on marketers and let them explore in a low-friction way: a loan FAQ agent, an investment onboarding assistant, a wallet support agent. They start from something recognisable and shape it.

Pre-configured agent templates as the starting block. These can be contextual to any industry the dashboard is configured for.

Once a marketer picks a template, they're configuring the agent's behaviour. The risk here was dumping everything into one long settings panel - tone, rules, boundaries, flow logic - and hoping they'd figure it out. So I broke context into three blocks, each with a distinct job:

Communication style, Conversation flow and Escalation rules

The three blocks create a natural sequence: first you decide how the agent sounds, then how it navigates, then where it draws the line. Each block is completable independently, and each has sensible defaults that work out of the box.
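
As a rough illustration of the third block, escalation rules with out-of-the-box defaults might reduce to a check like the one below. The sensitive terms, the confidence threshold, and the names `EscalationRules` and `should_hand_off` are assumptions for this sketch; a real system would need far more careful detection than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationRules:
    # Hypothetical defaults a marketer could override.
    sensitive_terms: set[str] = field(
        default_factory=lambda: {"liable", "legal", "compliance"}
    )
    min_confidence: float = 0.6  # below this, the agent should not answer alone


def should_hand_off(question: str, confidence: float, rules: EscalationRules) -> bool:
    """Hand off when the question touches a sensitive topic or the agent is unsure."""
    text = question.lower()
    touches_sensitive = any(term in text for term in rules.sensitive_terms)
    return touches_sensitive or confidence < rules.min_confidence


rules = EscalationRules()  # defaults work without any configuration
print(should_hand_off("Am I liable if I prepay early?", 0.9, rules))
print(should_hand_off("How do I top up my wallet?", 0.9, rules))
```

Because every field has a default, the block can be left untouched at first and tightened later, which matches the "sensible defaults" goal described above.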

Broken down context for a clear set of instructions to the system

Measuring success for this flow

Because this is a completely new product inside our core dashboard, we are tracking its usability closely.

Task success rate - Creation

Currently ~ 57%

Usability support ticket ratio

Currently ~ 45%

Time to first value

Currently ~ 8 minutes

TESTING AND BUILDING TRUST IN YOUR AGENT

How do you make a non-technical person confident enough to deploy AI to their users?

Configuring an agent and trusting it are two different things. A marketer can fill in every setting correctly and still not feel confident enough to ship it to millions of users. The gap isn't knowledge but evidence. They need to see the agent perform before they believe it will.

Research was unambiguous on this: every team we spoke to wanted to have an actual conversation with the agent before going live. Not review a settings summary. Not check a preview screenshot. Talk to it. Break it. See how it recovers.

Here’s a snapshot of how we can simulate the entire conversation experience, rate previous conversations and help the agent learn exactly how it is supposed to communicate with your users.

"How will I simulate my user's conversation"

"Can I test agent's responses at scale"

"What is causing latency in replies"

ENSURING THE AGENT SHOWS UP AND BEHAVES THE WAY WE WANT IT TO

How will I deploy the system with confidence?
Giving enough context and situation handling directions to the agent

Context broken down into communication style, conversation guidance, and escalation and hand-overs

How will the system detect when to intervene?

Adding the right tools and knowledge bases - can we reduce the cognitive load here?

Setting up for success? How will I define it?

MAKING SENSE OF IT ALL

Analytics that feed back into the system and keep improving it

Agent performance broken down into actionable intelligence

We started by focusing on core metrics such as conversation volume, goal completion rate, and human handoff rate (when users are escalated to live agents). Real-time conversation logs and knowledge and tool performance also help in targeting the agent better.

Four metrics. Each one connects to a specific action the marketer can take.

Conversation quality

Gaps in responses today become training tasks. The marketer adds a benchmark, updates a source, re-tests. The system improves through use, without engineering involvement.

Goal completion rate

Did the user continue their journey after the conversation ended? This is the metric that ties agent performance to the business outcome customers actually care about.

Agent deployment

Is the agent being triggered at the right instances? If not, deployment rules need adjustment. Does latency degrade when many conversations run in parallel?

Human handoff rate

How often is the agent reaching its limits? High rates signal knowledge gaps or over-tight escalation thresholds.
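
Two of these metrics are simple ratios over conversation logs. The sketch below shows one possible calculation, assuming each logged conversation records whether it was handed off and whether the user went on to complete their goal; the `Conversation` fields and the `summarise` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    handed_off: bool       # escalated to a live agent
    goal_completed: bool   # user continued their journey after the conversation


def rate(count: int, total: int) -> float:
    """Share of conversations, guarding against an empty log."""
    return count / total if total else 0.0


def summarise(conversations: list[Conversation]) -> dict[str, float]:
    total = len(conversations)
    return {
        "conversation_volume": total,
        "goal_completion_rate": rate(sum(c.goal_completed for c in conversations), total),
        "human_handoff_rate": rate(sum(c.handed_off for c in conversations), total),
    }


# Hypothetical log of four conversations.
log = [
    Conversation(handed_off=False, goal_completed=True),
    Conversation(handed_off=False, goal_completed=True),
    Conversation(handed_off=True, goal_completed=True),
    Conversation(handed_off=False, goal_completed=False),
]

summary = summarise(log)
print(summary)
```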

"I want the system to learn from its mistakes and not repeat them"

Ensuring a system of record for diving deeper and debugging

WHAT'S NEXT

What I learnt & where we are taking this next

Conceptualising modular agentic experiences for a platform like Plotline, for marketers at leading consumer apps, and visualising the experience for their end users was a great opportunity to understand and design for:

Building conversational agents in a modular way

Breaking the whole process into functions such as context, knowledge, tools & actions ensured a very gradual learning curve and progressive complexity.

Segregating global and agent-specific building blocks

Centralising appearance, communication and brand guidelines reduces the potential for inconsistent experiences.

Building trust and traceability into every AI decision

Building unbiased testing and learning flows for the agent's training solves for trust at a scale of millions.

Where we are taking this next

Working on agentic experiences opens a whole world of possibilities. Ideation and concept work for the next versions has already started.

Improving handover flow to manual agents

Real-time view into handoffs as they happen for live escalation visibility

In-depth end-user research

Sessions with real users of pilot customers to close the gap we deliberately left open.

Introducing new access points and interactions

Access points such as floating buttons and pinned banners, and gestures such as long hold and bottom swipe, can be introduced.

The interesting problem in AI product design isn't the model. It's the person sitting in the configuration screen, deciding whether this system knows enough, behaves well enough, and fails reliably enough to represent their brand to their users. That's the interface we built.

Can users talk to my app in real-time and get their doubts and queries resolved? I designed the system that lets growth teams build contextual, in-app conversational AI — chat and voice — for the moments where a nudge isn't enough and a support ticket is too late.
