The eval scores each task on two distinct dimensions — kept separate, never collapsed into one number. The shape of an app's weakness matters as much as its strength.
D1: How efficiently can marketers perform recurring work: editing, publishing, translation, SEO, approvals, asset management?
This dimension tests work within the existing implementation. It is expected to favor Contentstack.
D2: How much structural / experience-creation work can marketing complete on their own, and at what governance cost?
This dimension tests what happens when work exceeds the demo scope. It is expected to favor Uniform.
Do not average D1 and D2 into one score. A 4.5/2.5 split tells a different story than 3.5/3.5 even though the means match.
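The point about matching means can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `summarize` and the `gap` field are assumptions introduced here, not part of the eval, and the mean is computed solely to show what averaging would hide.

```python
from statistics import mean


def summarize(d1: float, d2: float) -> dict:
    """Keep both dimensions side by side; 'mean' is shown only to
    demonstrate that it cannot tell the two profiles apart."""
    return {"d1": d1, "d2": d2, "mean": mean((d1, d2)), "gap": abs(d1 - d2)}


# The two score pairs used in the example above.
lopsided = summarize(4.5, 2.5)
balanced = summarize(3.5, 3.5)

print(lopsided)  # {'d1': 4.5, 'd2': 2.5, 'mean': 3.5, 'gap': 2.0}
print(balanced)  # {'d1': 3.5, 'd2': 3.5, 'mean': 3.5, 'gap': 0.0}
```

Both profiles report a mean of 3.5, but the gap of 2.0 versus 0.0 is exactly the information a single averaged score discards.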
D1 comprises ten marketer-productivity tasks, each scored on the per-task axes below. The tasks assume that the three content types (Home, Generic Landing, Event Landing) and the shared modular block library are in place.
D2 comprises three tasks that deliberately exceed the demo's pre-modeled scope, testing whether marketing can do structural / experience-creation work without a delivery team. A successful completion means marketing did this part themselves; it does not mean no agency is needed for review or platform stewardship.
| Axis | Scale |
|---|---|
| Time to complete | minutes |
| Clicks / steps | count |
| Required dev help | Y / N |
| Confidence in result | 1–5 |
| Would I want to do this again here | 1–5 |
The last axis is the adoption predictor. Cumulative time and clicks are interesting; the "again" score is the one that maps to whether ICSC actually lives in the tool.
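One possible way to record the five productivity axes per task is sketched below. The class name `ProductivityTaskScore`, its field names, and the `summarize_run` helper are all hypothetical and chosen for illustration; the sample values used in the usage note are placeholders, not results.

```python
from dataclasses import dataclass


@dataclass
class ProductivityTaskScore:
    """One scored task, one field per axis in the table above."""
    task: str
    minutes: float          # time to complete
    clicks: int             # clicks / steps
    needed_dev_help: bool   # required dev help (Y / N)
    confidence: int         # confidence in result, 1-5
    would_repeat: int       # "would I want to do this again here", 1-5

    def __post_init__(self) -> None:
        # Reject values outside the 1-5 scale for the two rated axes.
        for name in ("confidence", "would_repeat"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def summarize_run(scores: list) -> dict:
    """Total the time and clicks, but keep the 'again' scores per task
    so the adoption signal stays visible rather than being averaged away."""
    return {
        "total_minutes": sum(s.minutes for s in scores),
        "total_clicks": sum(s.clicks for s in scores),
        "dev_help_tasks": [s.task for s in scores if s.needed_dev_help],
        "would_repeat_scores": [s.would_repeat for s in scores],
    }
```

For example, `summarize_run([ProductivityTaskScore("task-a", 4.0, 12, False, 4, 5)])` returns the totals alongside the list of "again" scores, which can then be read task by task.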
| Axis | Scale |
|---|---|
| Production-ready output | Y / N |
| Brand alignment | 1–5 |
| Structural quality (a developer's eye) | 1–5 |
| Hit a wall requiring dev help | Y / N + where |
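The four axes in the table above could be captured in a similar record. This is a minimal sketch under stated assumptions: the class name `StructuralTaskScore`, the field names, and the `walls` helper are hypothetical, and the "Y / N + where" axis is modeled here as an optional string (absent means no wall was hit).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuralTaskScore:
    """One scored task, one field per axis in the table above."""
    task: str
    production_ready: bool       # production-ready output (Y / N)
    brand_alignment: int         # 1-5
    structural_quality: int      # 1-5, judged with a developer's eye
    wall: Optional[str] = None   # where dev help was required; None if no wall

    def __post_init__(self) -> None:
        # Reject values outside the 1-5 scale for the two rated axes.
        for name in ("brand_alignment", "structural_quality"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def hit_wall(self) -> bool:
        return self.wall is not None


def walls(scores: list) -> dict:
    """Map each task that hit a wall to the place where it happened."""
    return {s.task: s.wall for s in scores if s.hit_wall}
```

Keeping the location of each wall as free text preserves the "where" detail that a plain Y / N flag would lose.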
Answered qualitatively, not scored. For D2 these carry more weight than per-task numbers — the demos all succeed at small scale, so the questions worth asking are about what happens next.
Full task battery and strategic framing live in docs/CONTENTSTACK_VS_UNIFORM_TASKS.md. Edits to this page should be mirrored to the source doc.