ICSC CMS Evaluation

Scoring criteria for the marketer task battery.

The eval scores each platform on two distinct dimensions, kept separate and never collapsed into one number. The shape of a platform's weakness matters as much as its strength.

Two dimensions

D1 — Operational Excellence

How efficiently can marketers perform recurring work — editing, publishing, translation, SEO, approvals, asset management?

Tests work within the existing implementation. Expected to favor Contentstack.

D2 — Organizational Independence

How much structural / experience-creation work can marketing complete on their own, and at what governance cost?

Tests what happens when the demo scope is exceeded. Expected to favor Uniform.

Do not average D1 and D2 into one score. A 4.5/2.5 split tells a different story than 3.5/3.5 even though the means match.
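A minimal sketch of why the two dimensions stay separate, using the 4.5/2.5 and 3.5/3.5 splits mentioned above. The `DimensionScores` name and its fields are hypothetical and not part of any existing eval tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container: D1 and D2 are reported side by side, never merged."""
    d1: float  # Operational Excellence
    d2: float  # Organizational Independence

    def gap(self) -> float:
        """Signed difference D1 - D2: the shape that a single mean would hide."""
        return self.d1 - self.d2


lopsided = DimensionScores(d1=4.5, d2=2.5)
balanced = DimensionScores(d1=3.5, d2=3.5)

# Both pairs have the same mean, which is why the mean is not reported.
mean_lopsided = (lopsided.d1 + lopsided.d2) / 2
mean_balanced = (balanced.d1 + balanced.d2) / 2
print(mean_lopsided == mean_balanced)  # True
print(lopsided.gap(), balanced.gap())  # 2.0 0.0
```

Reporting the pair (and, if useful, the gap) preserves the difference between a platform that is strong on one dimension and weak on the other and one that is middling on both.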

Operational Excellence battery (D1)

Ten marketer-productivity tasks. Each scored on the per-task axes below. Assumes the three content types (Home, Generic Landing, Event Landing) and the shared modular block library are in place.

  1. Hero swap + scheduled publish. Update the homepage hero image, headline, and CTA destination, then schedule it to flip live at 6am PT on ICSC LAS VEGAS opening day. Tests visual editing, scheduling, and preview fidelity.
  2. New generic landing from existing components. Spin up a campaign page from existing modular blocks (hero, agenda, sponsor strip, register CTA) — no dev help. Tests component reuse and page-composition speed.
  3. AI rewrite + Spanish translation. Tighten a body section with the platform's AI assist, then generate a Spanish variant for LATAM members. Tests AI authoring quality and the localization workflow.
  4. Personalize a CTA by audience. Show different CTA variants to first-time vs. returning visitors (or by geo). Three audiences pre-seeded; rater builds a fourth as part of the task. Tests audience authoring + variant binding.
  5. A/B test the Event "Register" CTA. Run "Register Now" vs. "Secure Your Spot" with a defined goal. Tests experiment setup and the reporting view.
  6. SEO + Open Graph metadata, verified externally. Set page title, meta description, canonical URL, and OG image on an Event landing. Verify in LinkedIn Post Inspector / Meta Sharing Debugger / opengraph.xyz. Score CMS authoring and external-tool verification separately.
  7. Send a draft through approval before publish. Request review from a teammate, address an inline comment, then publish. Tests workflow, comments, and role permissions.
  8. Bulk upload + organize sponsor logos. Upload ~15 logos, tag by sponsorship tier, normalize sizes, drop into an Event landing sponsor block. Tests DAM bulk ops + asset reuse.
  9. Audience-driven publish decision. Open the CDP / insights tool. Find the highest-intent audience segment for LAS VEGAS. Choose which of two CTA variants to promote next week and write a one-line rationale. Deliverable is the decision, not a screenshot.
  10. Reusable template + three pages. Save a Generic Landing as a pattern/template, instantiate three times. Tests governance and template authoring.
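For task 6, a rater might run a quick local pre-check before pasting the URL into the external inspectors. The sketch below is only an assumption about how such a check could look: it uses Python's standard `html.parser`, expects the rendered page HTML as a string, and the `missing_fields` helper and the example.com URL are hypothetical. It does not replace verification in LinkedIn Post Inspector, Meta Sharing Debugger, or opengraph.xyz.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the four fields task 6 asks the rater to set."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; normalize to "".
        a = {k: (v or "") for k, v in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name") == "description":
            self.found["description"] = a.get("content", "")
        elif tag == "meta" and a.get("property") == "og:image":
            self.found["og:image"] = a.get("content", "")
        elif tag == "link" and a.get("rel") == "canonical":
            self.found["canonical"] = a.get("href", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.found["title"] = self.found.get("title", "") + data


def missing_fields(html: str) -> list[str]:
    """Return the required fields that are absent or empty in the HTML."""
    parser = HeadMetadataParser()
    parser.feed(html)
    required = ["title", "description", "canonical", "og:image"]
    return [f for f in required if not parser.found.get(f, "").strip()]


# Placeholder markup; the URL and copy are illustrative only.
sample = """
<head>
  <title>ICSC LAS VEGAS</title>
  <meta name="description" content="Example description.">
  <link rel="canonical" href="https://example.com/events/las-vegas">
</head>
"""
print(missing_fields(sample))  # ['og:image']
```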

Organizational Independence battery (D2)

Three tasks that deliberately exceed the demo's pre-modeled scope — testing structural / experience creation without a delivery team. A successful completion means marketing did this part themselves; it does not mean no agency is needed for review or platform stewardship.

  • I1 · Net-new experience that wasn't pre-modeled. "ICSC is launching an AMA series for LATAM. There's no AMA page type. Build a launch page: needs a different layout (panelist grid, RSVP form, member-only gate). Reuse blocks where possible, create new structure where needed. No dev or designer."
  • I2 · AI-generated experience from prompt. Using each platform's agentic AI (Uniform Scout / Contentstack AI Assistant + Brand Kit), prompt for a complete page concept — e.g. "an industry research report landing with download CTA, key statistics, and three related case studies." Iterate. Evaluate quality, brand alignment, and what's usable as-is vs. needs rework.
  • I3 · Net-new component creation. A marketer wants a "sponsor tier comparison" block: three columns, each with photo, name, role, expertise tags, and a contact button. Doesn't exist in the library. Build it without a dev. Score completion, time, accessibility, reusability, and "would I want to use this component again."

Per-task axes (D1)

Axis | Scale
Time to complete | minutes
Clicks / steps | count
Required dev help | Y / N
Confidence in result | 1–5
Would I want to do this again here | 1–5

The last axis is the adoption predictor. Cumulative time and clicks are interesting; the "again" score is the one that maps to whether ICSC actually lives in the tool.
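One way to keep per-rater entries consistent is to capture each D1 task result as a small validated record. This is a sketch under assumptions: the `D1TaskScore` name and field names are hypothetical, and the values in the example are illustrative, not real results.

```python
from dataclasses import dataclass


@dataclass
class D1TaskScore:
    task: int              # 1-10, position in the D1 battery
    minutes: float         # time to complete
    steps: int             # clicks / steps
    needed_dev_help: bool  # Y / N
    confidence: int        # 1-5
    would_do_again: int    # 1-5, the adoption predictor

    def __post_init__(self) -> None:
        if not 1 <= self.task <= 10:
            raise ValueError(f"task must be 1-10, got {self.task}")
        if self.minutes < 0 or self.steps < 0:
            raise ValueError("minutes and steps must be non-negative")
        for name in ("confidence", "would_do_again"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be 1-5, got {value}")


# Illustrative entry only.
entry = D1TaskScore(task=1, minutes=12.5, steps=18,
                    needed_dev_help=False, confidence=4, would_do_again=4)
print(entry.would_do_again)  # 4
```

Rejecting out-of-range values at entry time keeps the later roll-up from silently absorbing a mistyped score.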

Additional axes (D2 only)

Axis | Scale
Production-ready output | Y / N
Brand alignment | 1–5
Structural quality (a developer's eye) | 1–5
Hit a wall requiring dev help | Y / N + where
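The "Y / N + where" scale implies that a location is recorded whenever a rater hits a wall. A small sketch of that rule, assuming a hypothetical `D2TaskScore` record whose field names are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class D2TaskScore:
    task: str                # "I1", "I2", or "I3"
    production_ready: bool   # Y / N
    brand_alignment: int     # 1-5
    structural_quality: int  # 1-5, judged with a developer's eye
    hit_wall: bool           # Y / N
    wall_location: Optional[str] = None  # the "where"; required when hit_wall is True

    def __post_init__(self) -> None:
        if self.task not in ("I1", "I2", "I3"):
            raise ValueError(f"unknown D2 task: {self.task}")
        for name in ("brand_alignment", "structural_quality"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be 1-5, got {value}")
        if self.hit_wall and not (self.wall_location or "").strip():
            raise ValueError("hit_wall is True, so wall_location must say where")


# Illustrative entry only; the location text is a placeholder.
d2_entry = D2TaskScore(task="I3", production_ready=False, brand_alignment=3,
                       structural_quality=2, hit_wall=True,
                       wall_location="placeholder: step where the rater got stuck")
print(d2_entry.hit_wall)  # True
```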

Post-battery reflection prompts

Answered qualitatively, not scored. For D2 these carry more weight than per-task numbers — the demos all succeed at small scale, so the questions worth asking are about what happens next.

  • Governance: If marketers can create anything (especially via AI), what mechanisms enforce brand, accessibility, and structural standards? Does flexibility create governance debt over 6–12 months?
  • Reusability: Are AI-generated experiences patternized and maintainable, or one-off artifacts? Can today's AI-generated solution become tomorrow's standard?
  • Consistency at scale: If 10 marketers used this independently, would they produce one coherent system or 10 slightly different ones? What enforces consistency?
  • Where complexity accumulates: Traditional workflows produce design debt and dev debt. AI-first workflows may produce experience debt, content debt, and governance debt. Where does it actually accumulate here?
  • Exit risk: If ICSC wants to leave the platform in five years — how portable, inspectable, and recoverable are the experiences? Are AI-generated experiences exportable in a useful way?
  • Guardrails for a no-review team: ICSC's marketing has no internal design / dev / accessibility review. What review function fills the gap? What breaks first without one?

Calibration with 2–10 raters

  • Per-rater template (one tab per person, identical structure)
  • Roll-up tab per dimension: mean, range, and standard deviation
  • Qualitative-comments column on each task — don't lose the "why" behind a low score
  • Tasks with wide rater disagreement are the interesting ones, regardless of average. Flag them in the writeup.
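The roll-up and the disagreement flag can be sketched with Python's standard `statistics` module. The scores below are illustrative, not real data, and the `DISAGREEMENT_RANGE` threshold is a hypothetical choice that the team would need to set for itself. The sketch uses the sample standard deviation, which requires at least two raters.

```python
from statistics import mean, stdev

# Illustrative "again" scores (1-5) per task, one entry per rater. Not real data.
again_scores = {
    "Task 1": [4, 4, 5],
    "Task 5": [1, 5, 3],
}

DISAGREEMENT_RANGE = 2  # hypothetical threshold: flag when max - min >= 2


def roll_up(scores: list[int]) -> dict[str, float]:
    """Mean, range, and sample standard deviation for one task."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "stdev": stdev(scores),  # sample standard deviation; needs >= 2 raters
    }


# Tasks are flagged on spread alone, regardless of their average.
flagged = [
    task for task, scores in again_scores.items()
    if roll_up(scores)["range"] >= DISAGREEMENT_RANGE
]

for task, scores in again_scores.items():
    summary = roll_up(scores)
    marker = "FLAG" if task in flagged else "ok"
    print(f"{task}: mean={summary['mean']:.2f} range={summary['range']} "
          f"stdev={summary['stdev']:.2f} [{marker}]")
```

Flagging on range rather than mean matches the point above: a task averaging 3 from scores of 1, 3, and 5 deserves more attention in the writeup than one where every rater gave a 3.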

Full task battery and strategic framing live in docs/CONTENTSTACK_VS_UNIFORM_TASKS.md. Edits to this page should be mirrored to the source doc.