Fintech · Africa · Voice AI · 2025

Voice AI for Informal Merchants: Lessons from Sentreso

Sentreso launch — voice-first financial assistant for informal merchants

It's 6:30 AM in Dakar. Fatou is arranging bags of rice and onions on a wooden table outside her house. By 7, the first customers arrive — neighbors buying breakfast ingredients on the way to work. She makes change from a small plastic bag of coins. She remembers who owes her from yesterday. She keeps no written records. By noon, she'll have processed thirty or forty transactions, and if you ask her how much she's made, she'll give you an approximate number from memory.

Fatou isn't unusual. She's the norm. Across Francophone West Africa, millions of merchants operate exactly like this — selling from carts, tables, and storefronts, managing inventory in their heads, and tracking finances through a combination of memory, notebook scribbles, and trust.

We built Sentreso for merchants like Fatou. And we got almost everything wrong before we got something right.

The Dashboard That Nobody Used

The first version of Sentreso had a dashboard. It was clean. It had charts. It showed income vs. expenses over time, category breakdowns, daily summaries. We were proud of it.

Nobody used it.

Not because it didn't work. Because the assumptions behind it were wrong at every level:

Literacy assumption. A dashboard assumes you can read labels, interpret axes, and understand what a bar chart means. Many of our target users are functionally literate in their daily lives but not in the visual language of data visualization. A pie chart of expense categories is not intuitive; it's a learned convention.

Screen-time assumption. A dashboard assumes you sit down and review data. Fatou doesn't have a moment between 7 AM and 2 PM when she's not actively serving customers, restocking, or moving between locations. A dashboard is a tool for someone with an office. We were building for someone with a cart.

Data cost assumption. Loading a chart-heavy dashboard costs data. In markets where users buy connectivity in small increments — 100 FCFA for an hour of data — every megabyte has a price. A dashboard that auto-refreshes is literally burning money.

The deeper failure. We weren't just building the wrong interface. We were solving the wrong problem. We assumed merchants wanted analysis. They wanted records.

The Insight: Voice Is the Existing Workflow

We tried everything before we tried voice. Simplified data entry. Fewer fields. Offline sync. SMS-based logging. Every time we added a feature, we added friction. The product got more capable and less used.

Then we asked a different question: what does the merchant already know how to do?

They know how to talk.

And here's the thing we almost missed: they're already narrating their transactions. Watch a merchant in Dakar or Abidjan with a notebook. They don't silently write figures. They talk through the transaction as they record it — "twelve bags of rice, two thousand each, that's twenty-four thousand." The voice is the workflow. The notebook is the artifact.

We weren't introducing a new behavior. We were replacing the notebook. The merchant keeps talking exactly the way they already do. The phone listens instead of the paper.

That reframe changed everything.

What We Built

Sentreso is a voice-first financial assistant. You talk to it in French — the language our users actually speak for commerce — and it records your transactions, tracks your balance, and gives you summaries in the same conversational format.

The stack: React Native + Expo for mobile, VAPI for voice AI, Supabase for backend, EAS for the build and deployment pipeline.

Transaction Extraction: The Hard Part

For widely spoken languages like French, voice recognition is now a commodity. The hard part is what happens after transcription: turning unstructured, conversational speech into structured financial data.

When someone says "I sold twelve bags at two thousand each," the system needs to parse that into a structured record: type=income, quantity=12, unit_price=2000, currency=FCFA, total=24000. When they say "I spent fifteen thousand on transport this morning," that's: type=expense, amount=15000, category=transport, time=morning.
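To make the target of extraction concrete, here is a minimal TypeScript sketch of what such a record could look like. The interface, the helper functions, and the `timeHint` field are illustrative assumptions, not Sentreso's actual schema. The sketch also uses a single `total` field for both sales and expenses, which is a simplification of the two examples above.

```typescript
// Hypothetical record shape; field names beyond those in the examples are assumptions.
type TransactionType = "income" | "expense";

interface ExtractedTransaction {
  type: TransactionType;
  currency: "FCFA";
  total: number;        // amount actually received or spent
  quantity?: number;    // present when the speaker gave a count
  unitPrice?: number;   // present when the speaker gave a per-unit price
  category?: string;    // e.g. "transport"
  timeHint?: string;    // raw time expression, left unresolved at this stage
}

// "I sold twelve bags at two thousand each": total = quantity * unit price.
function saleFromUnits(quantity: number, unitPrice: number): ExtractedTransaction {
  return { type: "income", currency: "FCFA", quantity, unitPrice, total: quantity * unitPrice };
}

// "I spent fifteen thousand on transport this morning": a single stated amount.
function expenseOf(amount: number, category: string, timeHint: string): ExtractedTransaction {
  return { type: "expense", currency: "FCFA", total: amount, category, timeHint };
}

const sale = saleFromUnits(12, 2000);
const transport = expenseOf(15000, "transport", "morning");
console.log(sale, transport);
```

Keeping `timeHint` as the raw phrase rather than a timestamp is deliberate in this sketch: whether "this morning" means today is a separate resolution step.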

The extraction has to handle ambiguity. "Two thousand" — is that a unit price or a total? "This morning" — does that mean today or is the user reporting yesterday's transactions? Context matters, and the NLP layer needs to resolve these ambiguities correctly or ask for clarification naturally, in the flow of conversation.
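One way to picture the "resolve or ask" decision for the unit-price-versus-total case is a small function that returns either a resolved total or a clarification question. This is a sketch under assumptions: the `perUnitCue` and `totalCue` flags stand in for whatever signal the real NLP layer derives from the utterance, and the English question text is only a placeholder.

```typescript
// Either we are confident about the total, or we ask a short follow-up question.
type Resolution =
  | { kind: "resolved"; total: number }
  | { kind: "ask"; question: string };

interface ParsedAmount {
  amount: number;
  quantity?: number;
  perUnitCue: boolean; // hypothetical flag: utterance contained a cue such as "each"
  totalCue: boolean;   // hypothetical flag: utterance contained a cue such as "in total"
}

function resolveAmount(p: ParsedAmount): Resolution {
  // No quantity mentioned: the amount can only be a total.
  if (p.quantity === undefined) return { kind: "resolved", total: p.amount };
  // Exactly one kind of cue: trust it.
  if (p.perUnitCue && !p.totalCue) return { kind: "resolved", total: p.quantity * p.amount };
  if (p.totalCue && !p.perUnitCue) return { kind: "resolved", total: p.amount };
  // No cue, or conflicting cues: ask instead of guessing.
  return {
    kind: "ask",
    question: `${p.amount} FCFA each, or ${p.amount} FCFA for all ${p.quantity}?`,
  };
}

const perUnit = resolveAmount({ amount: 2000, quantity: 12, perUnitCue: true, totalCue: false });
const unclear = resolveAmount({ amount: 2000, quantity: 12, perUnitCue: false, totalCue: false });
console.log(perUnit, unclear);
```

The point of the design is that a wrong guess gets silently written into the merchant's records, while a short question costs only a second of conversation.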

Speed matters as much as accuracy. If the user has to wait three seconds for confirmation, the product fails. If they have to repeat themselves because the system misheard, the product fails. The feedback loop needs to be as fast as a human saying "got it" — because that's what it's replacing.

Offline-First: Not a Feature, the Architecture

In most fintech products, offline mode is an afterthought — a degraded experience you fall back to when connectivity drops. That approach fails completely in markets where connectivity is the exception, not the rule.

Sentreso is built local-first. Every transaction is stored on-device immediately. The app is fully functional with no internet connection — you can record transactions, check your balance, generate receipts, and review history without a single byte leaving the phone.

When connectivity returns, the sync engine kicks in: outbound transactions queue locally and push to Supabase when a connection is available. Conflicts are resolved with a last-write-wins strategy at the field level, which works because in practice these are single-user devices and conflicts are rare. The key design choice: the local database is the source of truth, and the server is a backup. Not the other way around.
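Field-level last-write-wins can be sketched in a few lines of TypeScript. The types and the `mergeLww` function below are hypothetical and not taken from Sentreso's code. They assume each field carries its own `updatedAt` timestamp, and that ties go to the local copy, in keeping with the local database being the source of truth.

```typescript
// Each field carries its own timestamp so that merges happen per field, not per record.
interface FieldValue<T> {
  value: T;
  updatedAt: number; // assumed: milliseconds since epoch, set when the field was written
}

type SyncedRecord = Record<string, FieldValue<unknown>>;

// Start from the local record; a remote field replaces it only if strictly newer.
function mergeLww(local: SyncedRecord, remote: SyncedRecord): SyncedRecord {
  const merged: SyncedRecord = { ...local };
  for (const [field, incoming] of Object.entries(remote)) {
    const current = merged[field];
    if (current === undefined || incoming.updatedAt > current.updatedAt) {
      merged[field] = incoming;
    }
  }
  return merged;
}

const localCopy: SyncedRecord = {
  amount: { value: 15000, updatedAt: 100 },
  category: { value: "transport", updatedAt: 300 },
};
const remoteCopy: SyncedRecord = {
  amount: { value: 16000, updatedAt: 200 },
  category: { value: "food", updatedAt: 250 },
};

const merged = mergeLww(localCopy, remoteCopy);
console.log(merged);
```

In this example the merged record takes `amount` from the remote copy (newer write) and keeps `category` from the local copy, so neither side's edit is discarded wholesale. The sketch relies on device clocks being roughly right, which is one more reason the approach suits single-user devices.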

Why does "offline as afterthought" fail? Because it treats the server as the source of truth and the local store as a cache. When the cache can't reach the server, the app shows spinners, disabled buttons, and "no connection" banners. The user learns that the app only works sometimes, and "sometimes" isn't good enough to replace a notebook that always works.

The Code-Switching Problem

We assumed French was enough. It's not.

Merchants in Dakar don't speak pure French. They speak a fluid mix of French and Wolof, switching mid-sentence without thinking about it. "J'ai vendu" (I sold) might be followed by a Wolof numeral instead of "douze." A transaction description might start in French and end with a Wolof product name that has no French equivalent.

Most off-the-shelf ASR (automatic speech recognition) models are trained largely on monolingual corpora. They expect French or Wolof, not both in the same utterance. When they encounter a code-switch, they either hallucinate a French word that sounds similar to the Wolof one, or they drop the segment entirely. Both failure modes are silent: the user doesn't know the system misheard until they check their records and find wrong numbers.

What we tried: fine-tuning on code-switched audio, building a custom vocabulary layer for common Wolof commercial terms, and post-processing heuristics that detect likely code-switches and re-route them through a Wolof-aware model. What worked best was the vocabulary layer — a focused set of ~200 Wolof terms that appear frequently in market transactions (product names, numerals, greetings, units of measure) injected into the French ASR pipeline as known tokens. It's not a general solution to multilingual ASR. It's a pragmatic one for a specific domain.
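As a rough illustration of a domain lexicon, the TypeScript sketch below keeps a handful of Wolof terms in one table and uses it two ways: as a list of hint terms to hand to whatever vocabulary mechanism the ASR provider offers, and as a lookup that tags tokens in a transcript. Everything here is an assumption made for illustration: the entry shape, the function names, and the four sample terms. It does not show how terms are actually injected into the pipeline, since that depends on the provider.

```typescript
type TermKind = "numeral" | "product" | "unit" | "greeting";

interface LexiconEntry {
  term: string;    // surface form as spoken
  kind: TermKind;
  value?: number;  // only meaningful for numerals
}

// A tiny sample; the lexicon described above has roughly 200 terms.
const lexicon: LexiconEntry[] = [
  { term: "benn", kind: "numeral", value: 1 },
  { term: "ñaar", kind: "numeral", value: 2 },
  { term: "fukk", kind: "numeral", value: 10 },
  { term: "ceeb", kind: "product" }, // rice
];

// Normalize Unicode and case so "ñ" matches however it was encoded.
const normalize = (s: string): string => s.normalize("NFC").toLowerCase();

const byTerm = new Map<string, LexiconEntry>();
for (const entry of lexicon) byTerm.set(normalize(entry.term), entry);

// Terms to pass to the ASR as hints (the hint interface itself is not shown).
function keywordHints(): string[] {
  return lexicon.map((e) => e.term);
}

// Tag each whitespace-separated token with its lexicon entry, or null if unknown.
function tagTokens(transcript: string): Array<{ token: string; entry: LexiconEntry | null }> {
  return normalize(transcript)
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((token) => ({ token, entry: byTerm.get(token) ?? null }));
}

const tagged = tagTokens("J'ai vendu fukk ceeb");
console.log(keywordHints(), tagged);
```

Here the French tokens come back untagged, while "fukk" is tagged as the numeral 10 and "ceeb" as a product, which is enough for a downstream extractor to treat them as known terms rather than noise.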

This is still an active problem. A real solution requires multilingual ASR models trained on naturalistic African language data — data that barely exists in public datasets. We're contributing to that effort, but it's a multi-year project.

The Receipt Pivot

We thought people wanted insights. They wanted proof.

The most-used feature turned out to be voice receipt generation — the ability to say "generate a receipt for this sale" and send a professional PDF to a customer via WhatsApp. We hadn't planned for this to be the primary use case. We built it as an afterthought. It became the product.

The reason is trust. An informal merchant selling wholesale to a restaurant wants to look professional. A receipt — formatted, dated, with a business name — transforms the interaction from "I bought rice from a lady at the market" to "I have a documented purchase from a supplier." It's not about record-keeping. It's about legitimacy.

This reframed Sentreso entirely. We thought we were building a personal finance tool. We were actually building a business operations tool. The distinction matters because it changes what features you prioritize, how you talk about the product, and who you build for. A personal finance tool helps you understand your money. A business operations tool helps you run your business. The merchants didn't need more understanding. They needed better tools.

The Family Finance Gap

Western fintech assumes a clean separation between personal and business finances. You have a business account and a personal account. Revenue goes here, groceries go there. This assumption is baked into every financial product from QuickBooks to Mint.

That assumption breaks completely in the context we're building for. Many of the merchants we spoke to manage finances across a household and a business simultaneously. Today's rice inventory is also tonight's dinner supply. The profit from this morning's sales pays for a child's school fees this afternoon. Money flows between business and family constantly, and the boundaries are fluid by design — not because of poor financial discipline, but because that's how household economics works when the household is the business.

The single-user, single-entity model we launched with misses this entirely. A useful financial tool for these merchants needs to understand that "I spent 5,000 on rice" might be inventory or might be dinner, and that both are legitimate financial events worth tracking. It needs to handle shared accounts — a husband and wife running a stall together, a mother tracking expenses for three market-stall businesses operated by her children.

We haven't solved this yet. The next version of Sentreso needs to support multi-entity tracking without adding the complexity that killed the dashboard. That's the design challenge.

The Broader Lesson

There's a pattern I've seen across Sentreso, the work at Medic, and the Wave years: technology built for markets with high smartphone penetration doesn't work when you just translate it.

The assumptions are wrong at a fundamental level. Assumptions about connectivity, about literacy, about how people organize their financial lives, about what "professional" looks like. You can't retrofit. You have to design from scratch, from the user's actual context.

Voice AI gets interesting in these markets not because it's novel but because it's appropriate. Speaking is universal. Typing is learned. For a merchant who didn't grow up with smartphones, talking to a financial assistant is more natural than tapping through menus.

That's the bet Sentreso is built on. The infrastructure is still catching up — voice recognition in African languages, low-bandwidth audio streaming, payment integration across mobile money networks. But the direction is right.

The next version is going to be harder to build and more useful. That's the right trade-off.
