MOBILITY · UX RESEARCH · 2018–2019

Pickup & Rating Research

Pathao · UX Researcher

Drop pin redesign + 6-tag rating taxonomy shipped to 10M+ users in Bangladesh.

Mobility · UX Research · Ride-hailing · Bangladesh · Mixed Methods

Published 26th May 2026

Background

In 2018–2019, Pathao was Bangladesh's fastest-growing ride-hailing super-app with over 10 million users. Two friction points were generating consistent complaints: riders struggled to pin their exact pickup location, and the rating system produced almost no useful signal — nearly everyone gave 5 stars regardless of experience.

I led a mixed-methods research initiative across both problems simultaneously, running five parallel research streams over several months to inform what became live product changes.

Research Methods

UAT sessions: 26 participants across pickup & rating flows

Follow-up interviews: 6 in-depth sessions post-UAT

Survey: ~50 responses on pickup accuracy & rating behaviour

Contextual observation: on-ground rides to observe real pickup behaviour

Prototype testing: two pickup UI prototypes tested head-to-head

26 UAT participants · ~50 survey responses · 5 research methods

Track A — Pickup Experience

The problem with "where are you?"

Pickup failure was one of the biggest sources of Pathao's support tickets. Drivers would arrive at an intersection while the rider was 200 m away on a parallel street. The app showed a map pin, but the pin wasn't where the rider actually stood.

UAT sessions revealed three distinct mental models: riders who trusted GPS automatically, riders who manually corrected the pin, and riders who typed a landmark instead. The manual correctors were the most accurate — and the most interesting.

The Aha Moment

They weren't just fixing accuracy — they were saving money.

Several riders told us they deliberately moved the drop pin from the north side of a road to the south side. This wasn't about getting the driver closer; it was about changing the fare calculation zone. Pinned on the north side of the road, the trip was quoted at Tk 136; pinned on the south side, the same physical destination cost only Tk 108.

Users had reverse-engineered the pricing model and were exploiting drop pin freedom to optimise fares. This was simultaneously a design flaw in the pricing logic and a signal of how deeply engaged power users were with the product.
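The size of the incentive is easy to check. This is a minimal sketch that uses only the two fares reported above (Tk 136 and Tk 108); the helper name `fare_saving` is hypothetical, and nothing here models how Pathao's fare calculation zones were actually defined.

```python
def fare_saving(north_fare_tk: int, south_fare_tk: int) -> tuple[int, float]:
    """Return the saving in taka and as a percentage of the higher (north-side) fare.

    Hypothetical helper: it only compares two quoted fares and does not
    reproduce the real zone-pricing logic.
    """
    saving = north_fare_tk - south_fare_tk
    percent = 100 * saving / north_fare_tk
    return saving, percent


# Fares quoted for the same physical destination, pinned on opposite sides of the road.
saving, percent = fare_saving(136, 108)
print(f"Saving: Tk {saving} ({percent:.1f}% of the north-side fare)")
```

Moving the pin across a single road cut the quoted fare by Tk 28, roughly a fifth of the trip price, which explains why riders bothered to learn the trick.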

Prototype showdown

We tested two design directions head-to-head in UAT. Both aimed to reduce pickup confusion — but they made different bets on what the real problem was.

Prototype A

Landmark-first input

Lead with a text search for landmarks and intersections. Deprioritise the map pin as a primary interaction. Better for low-GPS-confidence users.

Prototype B — Selected

Precision drop pin

Enhance the map pin with street-level snapping and visual confirmation of the nearest road. Keeps the spatial mental model but makes accuracy effortless.

Prototype B won on task completion rate and perceived accuracy. Users felt more in control with a map-first interaction — even when they didn't understand GPS limitations.

Track B — Rating System

When everyone gives 5 stars, nothing means anything.

The rating system had a classic rating inflation problem. Survey data confirmed what the product team suspected: roughly 9 out of 10 riders gave a 5-star rating regardless of how the ride actually went. The star score had collapsed as a signal.

The real feedback lived in the comments — unstructured, hard to analyse, and invisible to the driver. We needed to capture the nuance without friction.

~9/10 riders gave 5 stars, regardless of experience

Useful signal from star scores alone

What riders actually complained about

We coded open-ended survey responses and interview transcripts to surface the real complaint themes. Four categories dominated — with a critical gender split.

Navigation errors

Drivers taking wrong turns, unfamiliarity with areas, or not using the in-app map. Most common complaint across all segments.

Cleanliness and vehicle condition

Dirty seats, strong odors, damaged interiors. Mentioned frequently but with lower emotional intensity compared to safety concerns.

Professionalism and behaviour

Unprompted conversation, phone use while driving, not following the agreed route. Riders felt they couldn't raise this in real time.

Safety — especially for women

Female riders reported safety anxiety at much higher rates than male riders. Comments pointed to late-night rides, driver behaviour, and the lack of in-app escalation paths. This couldn't be a secondary tag.

Research Finding

Safety cannot be optional.

When we analysed responses by gender, the gap was stark. Safety appeared as a minor secondary concern for male riders but a primary, high-emotion concern for female riders. Any tag taxonomy that buried Safety behind Professionalism or Cleanliness would systematically erase the feedback that mattered most to the most vulnerable user segment.

The 6-tag taxonomy

Rather than free-text fields that most riders skipped, we designed a tag system: one tap to flag an experience dimension, with structured signal flowing directly to driver coaching and support. App and payment issues were separated into their own category to keep ride quality data clean.

🤝 Professionalism: driver behaviour, communication, phone use

✨ Cleanliness: vehicle condition, odor, seat quality

🗺️ Navigation: route accuracy, map usage, area knowledge

🛡️ Safety: driving behaviour, late-night concerns, escalation

📱 App Issue: in-app problems, pickup flow, map accuracy

💳 Payment Issue: fare disputes, payment method failures
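One way to picture the taxonomy as data: six fixed tags, with app and payment issues kept apart from ride-quality tags. This is a minimal sketch under stated assumptions; the enum values, the queue names `driver_coaching` and `support`, and the rule that all four ride-quality tags (including Safety) go to coaching are hypothetical illustrations, not Pathao's actual routing.

```python
from collections import Counter
from enum import Enum


class RatingTag(Enum):
    """The six rating tags; the string values are hypothetical identifiers."""
    PROFESSIONALISM = "professionalism"
    CLEANLINESS = "cleanliness"
    NAVIGATION = "navigation"
    SAFETY = "safety"
    APP_ISSUE = "app_issue"
    PAYMENT_ISSUE = "payment_issue"


# Assumption: ride-quality tags feed driver coaching; app and payment tags go
# to support so they never dilute the ride-quality data.
RIDE_QUALITY_TAGS = frozenset({
    RatingTag.PROFESSIONALISM,
    RatingTag.CLEANLINESS,
    RatingTag.NAVIGATION,
    RatingTag.SAFETY,
})


def route_tags(tags: list[RatingTag]) -> dict[str, Counter]:
    """Split a batch of tapped tags into a coaching count and a support count."""
    routed = {"driver_coaching": Counter(), "support": Counter()}
    for tag in tags:
        queue = "driver_coaching" if tag in RIDE_QUALITY_TAGS else "support"
        routed[queue][tag] += 1
    return routed


routed = route_tags([RatingTag.NAVIGATION, RatingTag.PAYMENT_ISSUE, RatingTag.NAVIGATION])
print(routed["driver_coaching"][RatingTag.NAVIGATION])  # 2
print(routed["support"][RatingTag.PAYMENT_ISSUE])  # 1
```

Because each tap maps to exactly one fixed category, counts can be compared across drivers and over time, which free-text comments never allowed. A real system would likely also need a separate escalation path for Safety, which this sketch does not model.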

Impact

10M+ users reached: both features shipped to the full Pathao user base

11% pickup accuracy improvement (post-launch measurement)

6 structured rating tags, vs. 0 before the redesign

Reflection

Users will adapt around constraints you don't know exist.

The fare optimisation behaviour was invisible in the product data — it only appeared when we sat next to people and watched them use the app. Desk research would have missed it entirely. This reinforced for me that field observation is non-negotiable, not a nice-to-have.

Aggregate ratings hide the most important feedback.

A 4.8-star average looks healthy. But that average was masking a segment of female riders who felt unsafe and had no structured way to say so. When you design feedback systems, you have to ask who the system is designed to make comfortable — the rater or the operator.

Running two research tracks in parallel forces prioritisation discipline.

With the pickup and rating tracks running at the same time, I had to make constant calls about where to spend observation time and whom to recruit for which track. It was a useful constraint: it prevented either problem from expanding to fill all available research time.

Prototype testing is fastest when it's most specific.

The two-prototype showdown for pickup was deliberately narrow: we tested one dimension at a time. Broader prototypes tend to produce ambiguous results. The more specific the question, the cleaner the signal.

Gender-disaggregated analysis should be standard, not special.

The safety finding only emerged because we deliberately split the data. If we'd analysed responses in aggregate, the signal would have been averaged away. Disaggregating by gender, device type, and experience level should be the default starting point, not a secondary step.
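Making disaggregation the default can be as simple as computing every theme share per segment alongside the overall figure. This is a minimal sketch assuming a hypothetical schema in which each coded response is a dict with a set of themes under `"themes"` and demographic fields such as `"gender"`; the function name `theme_share_by_segment` is also hypothetical.

```python
from collections import defaultdict


def theme_share_by_segment(responses, theme, segment_key):
    """Return the share of responses mentioning `theme`, overall and per segment.

    Assumes each response is a dict like {"gender": ..., "themes": {...}}
    (hypothetical schema for coded survey and interview data).
    """
    if not responses:
        raise ValueError("need at least one response")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for response in responses:
        segment = response[segment_key]
        totals[segment] += 1
        if theme in response["themes"]:
            hits[segment] += 1
    overall = sum(hits.values()) / sum(totals.values())
    by_segment = {segment: hits[segment] / totals[segment] for segment in totals}
    return overall, by_segment
```

Calling it once per `segment_key` (gender, then device type, then experience level) puts the per-segment shares next to the aggregate on the first pass, so a theme that is concentrated in one group cannot be averaged away unnoticed.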

shahed@rrshahed.xyz