ICSC CMS Evaluation

Scoring criteria for the marketer task battery.

The eval scores each task on two distinct dimensions — kept separate, never collapsed into one number. The shape of an app's weakness matters as much as its strength.

Two dimensions

D1 — Operational Excellence

How efficiently can marketers perform recurring work — editing, publishing, translation, SEO, approvals, asset management?

Tests work within the existing implementation.

D2 — Organizational Independence

How much structural / experience-creation work can marketing complete on their own, and at what governance cost?

Tests what happens when the demo scope is exceeded.

Do not average D1 and D2 into one score. A 4.5/2.5 split tells a different story than 3.5/3.5 even though the means match.
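The arithmetic behind that warning can be sketched in a few lines. The platform names and scores below are hypothetical, chosen only to match the 4.5/2.5 and 3.5/3.5 example; the point is that the mean is identical while the gap between the dimensions is not.

```python
from statistics import mean

# Hypothetical roll-up scores per platform: (D1, D2). Not real results.
profiles = {
    "Platform A": (4.5, 2.5),
    "Platform B": (3.5, 3.5),
}


def describe(d1: float, d2: float) -> str:
    """Report both dimensions and the gap between them, never a blended score."""
    return f"D1 {d1:.1f} / D2 {d2:.1f} (gap {d1 - d2:+.1f})"


for name, (d1, d2) in profiles.items():
    # The mean is printed only to show what it hides.
    print(f"{name}: {describe(d1, d2)}; mean would be {mean((d1, d2)):.1f}")
```

Both platforms show a mean of 3.5, but only the side-by-side view exposes Platform A's two-point gap between operational strength and independence.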

Operational Excellence battery (D1)

Ten marketer-productivity tasks. Each scored on the per-task axes below. Assumes the three content types (Home, Generic Landing, Event Landing) and the shared modular block library are in place.

Most of these are Comms tasks. To give the Publishing team a relevant test, the Web team builds an article page template at the start of sandbox testing; Publishing runs the applicable tasks (personalize, A/B, SEO, approval) against that template. The tag on each task shows who it's for.

Hero swap + scheduled publish. Update the homepage hero image, headline, and CTA destination, then schedule it to go live at 6 a.m. PT on ICSC LAS VEGAS opening day. Tests visual editing, scheduling, and preview fidelity. Tag: Comms · Publishing skips?
New generic landing from existing components. Spin up a campaign page from existing modular blocks (hero, agenda, sponsor strip, register CTA) with no dev help. Tests component reuse and page-composition speed. Tag: Comms only
AI rewrite + French translation. Tighten a body section with the platform's AI assist, then generate a French variant for a Montreal event. (ICSC runs no events outside North America; French applies only to the two Montreal events.) Tests AI authoring quality and the localization workflow. Tag: Comms
Personalize a CTA by audience. Show different CTA variants to first-time vs. returning visitors (or by geo). Three audiences are pre-seeded; the rater builds a fourth as part of the task. Tests audience authoring and variant binding. Tag: Comms + Publishing
A/B test the Event "Register" CTA. Run "Register Now" vs. "Secure Your Spot" with a defined goal. Tests experiment setup and the reporting view. Tag: Comms + Publishing
SEO + Open Graph metadata, verified externally. Set page title, meta description, canonical URL, and OG image on an Event landing. Verify in LinkedIn Post Inspector / Meta Sharing Debugger / opengraph.xyz. Score CMS authoring and external-tool verification separately. Authoring is Comms + Publishing; the Web team owns the external verification step. Tag: Comms + Publishing · Web verifies
Send a draft through approval before publish. Request review from a teammate, address an inline comment, then publish. Tests workflow, comments, and role permissions. Tag: Comms + Publishing
Bulk upload + organize sponsor logos. Upload ~15 logos, tag by sponsorship tier, normalize sizes, and drop them into an Event landing sponsor block. Tests DAM bulk ops and asset reuse. Tag: Comms only
Audience-driven publish decision. Open the CDP / insights tool. Find the highest-intent audience segment for LAS VEGAS. Choose which of two CTA variants to promote next week and write a one-line rationale. The deliverable is the decision, not a screenshot. Tag: Scope TBD, candidate to drop
Reusable template + three pages. Save a Generic Landing as a pattern/template, then instantiate it three times. Tests governance and template authoring. Separately, the Web team saves an article page template up front so Publishing can run their own version. Tag: Comms · Pub via article template
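For the SEO + Open Graph task above, a rater could run a quick local pre-check on the published HTML before handing off to the Web team. The sketch below is only an illustration using Python's standard-library HTML parser; it does not replace the external tools named in the task. The sample markup and the example.com URL are hypothetical, and `MetaAudit` and `missing_fields` are names invented here.

```python
from html.parser import HTMLParser

# The four fields the task asks the rater to set.
REQUIRED = ("title", "description", "canonical", "og:image")


class MetaAudit(HTMLParser):
    """Collects the title, meta description, canonical link, and og:image."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = a.get("property") or a.get("name")
            if key in ("description", "og:image") and a.get("content"):
                self.found[key] = a["content"]
        elif tag == "link" and a.get("rel") == "canonical" and a.get("href"):
            self.found["canonical"] = a["href"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.found["title"] = data.strip()


def missing_fields(html: str) -> list:
    """Return the required fields that the page does not set."""
    audit = MetaAudit()
    audit.feed(html)
    return [field for field in REQUIRED if field not in audit.found]


# Hypothetical page source with the OG image left out on purpose.
sample = """<html><head>
<title>ICSC LAS VEGAS</title>
<meta name="description" content="Sample description">
<link rel="canonical" href="https://example.com/events/sample">
</head></html>"""

print(missing_fields(sample))
```

An empty list means all four fields are present in the markup; it says nothing about how LinkedIn or Meta will render the preview, which is why the external verification step stays with the Web team.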

Organizational Independence battery (D2)

Three tasks that deliberately exceed the demo's pre-modeled scope — testing structural / experience creation without a delivery team. A successful completion means marketing did this part themselves; it does not mean no agency is needed for review or platform stewardship.

I1 · Net-new experience that wasn't pre-modeled. "ICSC is launching a new education series. There's no template for this. Build a launch page: needs a different layout (panelist grid, RSVP form, member-only gate). Reuse blocks where possible, create new structure where needed. No dev or designer."
I2 · AI-generated experience from prompt. Using each platform's agentic AI, prompt for a complete page concept. Use the Proptech Ceros landing page (proptech.icsc.com) as the reference for the type of page to recreate, and iterate on the prompt until the output reflects that layout. Evaluate quality, brand alignment, and what's usable as-is vs. what needs rework.
I3 · Net-new component creation. A "sponsor tier" block has been requested for Women in CRE: three columns, each showing photo, name, role, expertise tags, and a contact button. It doesn't exist in the library. Build it without a dev. Note completion, accessibility, and reusability; they feed the vibe: "would I want to use this component again?"

Scoring — vibes-first (D1)

Axis	Scale
Vibe — how did doing this here feel?	1–5
Needed dev help?	Y / N
One-line note — what made it feel that way	text

Scored mostly on vibes — the gut read of what the task felt like, not a stopwatch (the group won't track time or clicks precisely, so we don't pretend to). 1 = painful · 3 = fine · 5 = loved it. The Vibe rolls confidence, speed, and "would I do this again" into one honest read — the number that maps to whether ICSC actually lives in the tool. "Needed dev help?" is the one hard fact we keep.

Scoring — vibes-first (D2)

Axis	Scale
Vibe — would I want to build here again?	1–5
Hit a wall that needed dev?	Y / N + where
One-line note — brand + structure impression	text

Same one-number / one-flag / one-line shape as D1. Brand alignment and structural quality aren't separate scores anymore — they live in the vibe and the note. Keep the two vibe scores side by side; never average D1 and D2.
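The one-number / one-flag / one-line shape shared by D1 and D2 can be captured as a single record type, which keeps the per-rater tabs consistent. This is a minimal sketch, not a prescribed schema: the field names are invented here, whole-number vibe scores are an assumption, and so is the rule that a D2 wall must say where it was hit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskScore:
    """One rater's score for one task (hypothetical field names)."""

    dimension: str    # "D1" or "D2"; kept on the record so the two are never pooled
    task: str
    rater: str
    vibe: int         # 1 = painful, 3 = fine, 5 = loved it (assumed whole numbers)
    needed_dev: bool  # D1: "Needed dev help?"  D2: "Hit a wall that needed dev?"
    note: str         # the one-line "why"
    wall: str = ""    # D2 only: where the wall was hit

    def __post_init__(self) -> None:
        if self.dimension not in ("D1", "D2"):
            raise ValueError("dimension must be 'D1' or 'D2'")
        if self.vibe not in range(1, 6):
            raise ValueError("vibe must be a whole number from 1 to 5")
        if not self.note.strip():
            raise ValueError("the one-line note is required")
        if self.dimension == "D2" and self.needed_dev and not self.wall.strip():
            raise ValueError("D2 scores that hit a wall must say where")


# Hypothetical example entry for a D2 task.
example = TaskScore(
    dimension="D2",
    task="I3",
    rater="rater-1",
    vibe=4,
    needed_dev=True,
    note="Grid came together fast; brand styles held.",
    wall="member-only gate",
)
print(example)
```

Rejecting an empty note at entry time is one way to make sure the "why" behind a low score is never lost.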

Post-battery reflection prompts

Answered qualitatively, not scored. For D2 these carry more weight than per-task numbers — the demos all succeed at small scale, so the questions worth asking are about what happens next.

Governance: If marketers can create anything (especially via AI), what mechanisms enforce brand, accessibility, and structural standards? Does flexibility create governance debt over 6–12 months?
Reusability: Are AI-generated experiences patternized and maintainable, or one-off artifacts? Can today's AI-generated solution become tomorrow's standard?
Consistency at scale: If 10 marketers used this independently, would they produce one coherent system or 10 slightly different ones? What enforces consistency?
Where complexity accumulates: Traditional workflows produce design debt and dev debt. AI-first workflows may produce experience debt, content debt, and governance debt. Where does it actually accumulate here?
Exit risk: If ICSC wants to leave the platform in five years — how portable, inspectable, and recoverable are the experiences? Are AI-generated experiences exportable in a useful way?
Guardrails for a no-review team: ICSC's marketing team has no internal design / dev / accessibility review. What review function fills the gap? What breaks first without one?

Calibration with 2–10 raters

Per-rater template (one tab per person, identical structure)
Roll-up tab per dimension: mean, range, and standard deviation
Qualitative-comments column on each task — don't lose the "why" behind a low score
Tasks with wide rater disagreement are the interesting ones, regardless of average. Flag them in the writeup.
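The roll-up and the disagreement flag could be computed along these lines. This is a sketch under stated assumptions: the scores are made up, the standard deviation is the sample form (`statistics.stdev`, which needs at least two raters), and the range threshold of 3 points for "wide disagreement" is a placeholder the group would need to agree on.

```python
from statistics import mean, stdev

# Hypothetical vibe scores for one dimension: task -> one score per rater.
scores = {
    "Hero swap + scheduled publish": [4, 5, 4, 4],
    "A/B test the Event CTA": [1, 5, 2, 5],
}

# Assumed threshold: a spread of 3+ points on the 1-5 scale counts as wide.
DISAGREEMENT_RANGE = 3


def roll_up(vibes):
    """Mean, range, and sample standard deviation for one task's vibe scores."""
    return {
        "mean": round(mean(vibes), 2),
        "range": max(vibes) - min(vibes),
        "stdev": round(stdev(vibes), 2),
    }


def flag_disagreement(task_scores, threshold=DISAGREEMENT_RANGE):
    """Tasks whose rater spread meets the threshold, regardless of average."""
    return [
        task
        for task, vibes in task_scores.items()
        if max(vibes) - min(vibes) >= threshold
    ]


for task, vibes in scores.items():
    print(task, roll_up(vibes))
print("Flag in writeup:", flag_disagreement(scores))
```

Run it once per dimension so D1 and D2 roll-ups stay on separate tabs. In the sample data the A/B task has a middling mean but a four-point spread, which is exactly the kind of task the writeup should call out.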