Public Safety Pulse
City Science Lab SF × MIT Media Lab
Measure the feeling. Improve the response.
San Francisco invests heavily in public safety — policing, cleaning, activations, ambassador programs. No existing system can tell you whether residents and visitors actually experience those neighborhoods as safer and more welcoming. Public Safety Pulse builds that signal: continuous, block-level, and designed to close the gap between operational output and lived public experience.
The Problem
Quantifying the feeling of safety.
Cities currently track incidents: crimes reported, complaints filed, response times. What no system captures is how the city's response shapes public experience in a neighborhood — and how that experience shapes decision making. Without that signal, every program is evaluated on its outputs, never on the outcome it was meant to serve.
The Approach
Add the sentiment layer that's been missing.
Public Safety Pulse (PSP) places brief sentiment surveys at the touchpoints people already use — point-of-sale systems, transit kiosks, community check-ins — collecting direct, point-in-time feedback from residents and visitors at block level. That signal, paired with 22 years of operational data and foot traffic context, produces a composite picture of how people actually experience each block. Not a proxy or a model output — a direct measure, on a cadence agencies can act on.
The Outcome
A more responsive, accountable city.
For the first time, a city will be able to pair operational data with direct, block-level resident experience — updated monthly, tied to specific interventions. Agencies and districts won't just see what was deployed; they'll see whether it changed how people experience their neighborhoods. Felt impact and conditions data, measured together, is the feedback loop that makes a city genuinely responsive and accountable to the people it serves.
Explore PSP
✨
The Vision
Where PSP is going — Phase 1 product roadmap, the full data-to-impact pipeline, and an interactive prototype of block-level experience mapping in Mid-Market.
See the vision →
📊
Perception & the Gap
SF residents report feeling less safe even as many conditions have improved. This tab shows 20 years of that divergence — and why the gap between what's measured and what's felt is where civic investment gets lost.
See the gap →
⚡
Interventions
What current data can and can't tell us about whether city and CBD programs are working — and why connecting operational outputs to resident experience is the measurement layer SF is missing.
See evidence →
🗺️
Neighborhood Baseline Map
Ten years of change across all 41 SF neighborhoods — five data layers, interactive time-lapse from 2016, and four validated findings on the 2018 structural break and early warning signals. The analytical foundation PSP builds from.
View the historical baseline →
🤝
Partners
City Science Lab San Francisco and MIT Media Lab City Science — the institutional partnership behind Public Safety Pulse.
Meet the team →
Phase 0 · Pre-publication · Working Instrument · All findings labeled by validation status
What you're seeing: Each neighborhood is colored by how it compares to its own 20-year average — not to other neighborhoods. Red = conditions currently elevated above that neighborhood's norm. Green = quieter than usual. Five data layers: safety incidents, street conditions, foot traffic, economic activity, and public sentiment. Toggle to isolate any one. Time Lapse shows change since 2016.
Phase 0 · historical conditions + City Survey proxy · not real-time block-level data
Layer
Window
Composite Concern
Lower → Higher
↓ Scroll for findings from this data
Scores show deviation from each neighborhood's own historical baseline — not a comparison between neighborhoods. Showing: 12-month average
2016
What This Data Reveals
Phase 0 is not just setup for Phase 1 — it produced real findings. Four validated signals from 28.6 million public records that change how SF's public safety data should be read. Each is labeled by confidence level.
Phase 0 Data Foundation
Five Layers — What Each One Measures
Phase 0 integrates five independent public data streams. Four of them — safety incidents, street conditions, foot traffic, and economic vitality — predicted the fifth (public sentiment) with R²=0.824 for 15 years, meaning the model explains about 82% of the variation in safety-perception scores across supervisor districts. After 2019, that predictive relationship broke down sharply.
Safety Incidents
Crimes reported to SFPD, 311 safety calls, drug incident reports. 41 neighborhoods · monthly · 2003–2026 · 11.2M records.
Street Conditions
Graffiti, encampments, illegal dumping, cleaning requests from 311. 41 neighborhoods · monthly · 8.4M records.
Foot Traffic & Transit
SFMTA ridership, parking occupancy, meter revenue — proxy measures for neighborhood presence. These signals are directional indicators, not direct foot traffic counts; interpret as relative patterns across neighborhoods rather than absolute volumes.
Economic Vitality
Business registrations and closures, quarterly sales tax revenue. 41 neighborhoods · monthly + quarterly.
Public Sentiment (Proxy)
SF Controller's City Survey — 11 supervisor districts, biennial. The thinnest layer. Phase 1 replaces this with direct block-level monthly measurement.
✓ Validated · Operational Finding
Two Types of Neighborhoods Require Two Different Responses
118 escalation events detected across 41 neighborhoods reveal a consistent split. Chronically elevated neighborhoods (Tenderloin pattern) have persistently high baseline scores with slow, structural decline — they require sustained intervention. Disruptive spike neighborhoods (Visitacion Valley 11 weeks, Russian Hill 12 weeks, Outer Mission 10 weeks) show sharp, fast-moving escalation that standard lagging indicators miss entirely.
The implication: a single alert threshold applied citywide will either miss spikes or flood chronic areas with false positives. These two types need different detection logic and different response protocols.
✓ Validated · Panel Model
The Predictive Model Broke in 2019
For 15 years, the four conditions layers predicted safety perception with R²=0.824. Post-2019, prediction error increased 4.7× (RMSE 0.13 → 0.61). Same data inputs, dramatically worse predictive power. Something structurally changed in how residents form their sense of safety — and conditions data alone can no longer explain it.
2019 is the inflection point. Pre-2018: stable relationship. 2018–2019: early shift. 2020+: model breaks down.
✓ Validated · Early Warning
Escalation Has Detectable Precursors
Analysis of the 118 escalation events shows consistent precursor patterns that appear 2–4 weeks before conditions scores cross alert thresholds. Specific combinations of 311 call type shifts and foot traffic changes reliably precede escalation in both neighborhood types — giving CBDs and city agencies an operational window to intervene before conditions deteriorate.
What 28.6 million public records across five data layers reveal about San Francisco's public safety — and where the data runs out. These findings live in the Explore the Data tab as contextual insights alongside the interactive map.
Phase 0 Data Foundation
Five Layers of Public Data — What Each One Measures
Phase 0 integrates five independent public data streams into a single analytical framework. Each layer captures a different dimension of neighborhood conditions. Four of these layers — safety incidents, street conditions, foot traffic, and economic vitality — predicted the fifth (public sentiment) with R²=0.824 for 15 years. After 2019, that predictive relationship broke down sharply.
Safety Incidents
Crimes reported to SFPD, 311 safety-related calls, drug incident reports. 41 neighborhoods · monthly · 2003–2026 · 11.2M records.
Street Conditions
Graffiti, encampments, illegal dumping, street and sidewalk cleaning requests from 311. 41 neighborhoods · monthly · 8.4M records.
Foot Traffic & Transit
SFMTA ridership, parking occupancy, parking meter revenue — proxy measures for how many people are moving through and spending time in each neighborhood.
Economic Vitality
Business registrations and closures, quarterly sales tax revenue, commercial vacancy trends. 41 neighborhoods · monthly + quarterly.
Public Sentiment (Proxy)
SF Controller's City Survey safety perception scores — 11 supervisor districts, biennial. The thinnest layer: high geographic aggregation, low time resolution. Phase 1 replaces this proxy with direct, block-level, monthly measurement.
The key relationship: For 15 years, the first four layers predicted the fifth with an R² of 0.82. After 2019, that predictive power dropped sharply — the same conditions began producing significantly different levels of felt safety depending on neighborhood, time, and context. The five findings below document what the data shows about this shift.
✓ Validated · 118 Events · Operational Finding
Two Types of Neighborhoods Require Two Different Responses
118 escalation events detected across 41 neighborhoods using rate-ratio methodology (observed vs. 12-month rolling baseline). The analysis revealed a consistent split that has direct implications for how resources and alerts should be configured.
High-volume neighborhoods need sustained management
Tenderloin, SoMa, Mission, Bayview, Financial District are chronically elevated but trending down. Intensive city investment is visible in the data. These neighborhoods don't "spike" — they require consistent operational attention, not emergency response activation.
Medium-volume neighborhoods produce the surprises
Visitacion Valley (11 weeks, fall 2024), Russian Hill (12 weeks, summer 2023), Outer Mission (10 weeks, late 2024) — extended escalation events in neighborhoods that don't typically appear on dashboards. These are the events most likely to blindside city officials without early-warning infrastructure.
Current uniform alert thresholds miss both patterns: they fire too often in high-volume areas (alarm fatigue) and too late in medium-volume areas (response lag). Phase 1 calibrates thresholds per neighborhood type.
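The baseline-relative detection logic can be sketched in a few lines. This is an illustrative simplification: the 1.3× threshold, monthly cadence, and minimum-history settings below are placeholders, not PSP's calibrated per-neighborhood-type values.

```python
import pandas as pd

def detect_escalations(counts: pd.Series, threshold: float = 1.3) -> pd.Series:
    """Flag periods where a neighborhood's incident count exceeds its
    own trailing 12-month rolling baseline by more than `threshold`.

    `counts`: monthly incident counts for one neighborhood, date-indexed.
    """
    # Trailing baseline, shifted one period so the current month never
    # contributes to its own baseline.
    baseline = counts.rolling(window=12, min_periods=6).mean().shift(1)
    rate_ratio = counts / baseline
    return rate_ratio > threshold

# Illustrative series: 18 quiet months, then a multi-month escalation.
idx = pd.date_range("2023-01-01", periods=24, freq="MS")
counts = pd.Series([20] * 18 + [32, 34, 33, 30, 21, 20], index=idx)
flags = detect_escalations(counts)
```

Because the comparison is against the neighborhood's own baseline, a low-volume block going from 5 to 8 incidents surfaces just as clearly as a high-volume block going from 200 to 320, which is the point of the two-typology finding above.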
Directional · Pooled cross-correlation
Government Responsiveness Is Now the Key Predictor
Pre-2020, commercial activity (sales tax, business count) directionally predicted safety conditions — more commercial vitality correlated with lower disorder. Post-2020, 311 resolution time and response volume became the stronger signal. Residents appear to use "is the city maintaining this?" as a primary safety cue.
Best responsiveness (directional): Tenderloin and Castro — city investment and CBD management are visible in 311 resolution patterns relative to each neighborhood's own baseline
Largest gap (directional): Visitacion Valley, Mission — city response rate slowing beyond what demand alone explains, per 311 data
Directional finding from pooled cross-correlation analysis. Pre-2020: commercial health negatively associated with safety anomalies. Post-2020: 311 volume and resolution time became stronger signals. Specific coefficients pending formal model run.
✓ Validated · District Level
Perception and Conditions Are Telling Different Stories
Safety perception hit its lowest recorded level in 2023 while many objective conditions were stable or improving. Downtown perception has since recovered — but the recovery is geographically concentrated and the measurement gap remains.
3.63
City Survey 2023 (1–5)
83%
Downtown day, CityBeat 2026
Muni safety perception: +15 points (2021–2025, SFMTA Rider Survey) while general neighborhood perception declined — context-specific interventions can move targeted perception without requiring citywide improvement
CityBeat surveys SF likely voters (n=500, ±4.4pp). Downtown recovery is a positive signal but limited in geographic scope. Not citywide.
✓ Validated · Panel FE Model
The Four Conditions That Stopped Working in 2019
For 15 years, four specific signals reliably predicted safety perception across all 11 supervisor districts: noise complaint share, violent crime per capita, drug offense rate, and graffiti share. Model error: 0.13. After 2019, the same four signals produced 4.7× more error (0.61). The predictive relationship broke.
0.82
R² pre-2019
0.61
RMSE post-2019
4.7×
Error increase
This is not an argument against addressing noise, crime, drugs, and graffiti. It's evidence that those actions no longer reliably translate into residents feeling safer — and that without measuring perception directly, there's no way to know if interventions are working.
✓ Validated · Change Point Detection
2018 Was the Real Turning Point — Not 2020
Change point detection across 41 neighborhoods identified 2018 as the year with the highest concentration of structural shifts across all layers. COVID arrived after the underlying fabric had already begun fraying. Economic Vitality layer showed the earliest directional shifts — commercial corridor hollowing preceded visible safety and environmental deterioration in the pooled analysis.
Key implication: Interventions focused on 2020 and beyond are addressing consequences, not causes. The commercial and economic hollowing that set the conditions for the 2019–2023 perception collapse began at least two years earlier.
Caveat: "Economic Vitality leading" is a directional finding from pooled cross-neighborhood analysis. It has not been validated individually for each neighborhood in the dataset. Phase 1 formalizes this with neighborhood-specific lead-lag analysis.
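The criterion behind change point detection can be illustrated with a minimal single-break detector: scan every candidate split and keep the one that minimizes squared error around two segment means. Phase 0's actual detection runs across multiple layers and allows multiple breaks; this sketch shows only the core idea, on fabricated data.

```python
import numpy as np

def best_changepoint(y: np.ndarray, min_seg: int = 3) -> int:
    """Index that best splits `y` into two constant-mean segments
    under a least-squares criterion (single structural break)."""
    best_k, best_cost = min_seg, np.inf
    for k in range(min_seg, len(y) - min_seg):
        left, right = y[:k], y[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Toy annual series starting in 2003, with a mean shift at index 15 (~2018).
y = np.r_[np.full(15, 1.0), np.full(9, 3.0)]
break_year = 2003 + best_changepoint(y)  # -> 2018
```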
Perception & the Measurement Gap
How safe do San Franciscans feel in their neighborhoods — and is that changing? This tab presents what the data shows.
The citywide standard is the SF Controller's City Survey: a biennial resident survey tracking safety perception since 1996. City departments run their own service-specific surveys; outside research firms conduct supplemental polls. Together, these sources provide a directional read on public sentiment.
None of them tell us how a specific block or corridor is doing right now. They capture citywide or district-level sentiment, measured every two years. That temporal and geographic gap — between monthly operational data and biennial, district-level sentiment — is what Phase 1 is designed to close.
✓ Verified · City Survey + CityBeat
Safety Perception Over Time: Decline, Then Partial Recovery
The SF Controller's City Survey has tracked citywide resident safety perception since 1996 — it is the standard measure of how San Franciscans feel about safety in their neighborhoods. Scores were stable near 4.2 on a 5-point scale from 2013 through 2019, then fell to a historic low of 3.63 in 2023. The CityBeat poll (SF likely voters, downtown focus, annual, n=500) provides a supplemental read on downtown-specific perception — a different population, different geography, different method. Both are shown below.
City Survey (Citywide)
Source: SF Controller's Office City Survey, 42K responses, 1–5 scale. Biennial. District level only.
CityBeat Downtown Perception
Source: CityBeat 2026, EMC Research / United Airlines. SF likely voters (n=500, ±4.4pp) asked about downtown SF. Annual. Not a citywide sample.
City Survey 2023 citywide: 3.63/5.0 — lowest in survey history. Perception declined even as many objective conditions improved.
CityBeat 2026 downtown daytime: 83% report feeling safe (up from 64% in 2023). Nighttime: 51% (up from 30%). Downtown recovery is real and spatially concentrated.
The gap between citywide City Survey and downtown CityBeat recovery is itself a finding: improvement is spatially concentrated and has not reached all neighborhoods equally.
✓ Verified · City Survey 2023
Perception by Supervisor District (2023)
Finest resolution available from public data. 11 districts × biennial measurement. Day (amber) vs. Night (blue) scores on 1–5 scale.
Daytime safety · Nighttime safety
✓ Validated · Panel Model Finding
This Changes How We Think About the Problem
Before 2019, the data was predictable: clean streets, lower crime rates, and faster response times reliably corresponded with residents feeling safer. That relationship held for 15 years. Then it broke. Phase 0 analysis of the same data post-2019 found that a single factor emerged as the dominant predictor of whether people experience their neighborhoods as safe — not crime rates, not cleanliness, but whether government appears to be paying attention and responding.†
This has a direct operational implication: improving public safety perception runs through being seen as responsive — a signal no conditions dataset measures. That's what Phase 1 builds.
† Panel fixed-effects model, R²=0.824, 15-year training window 2003–2018, 11 supervisor districts. Full specification in Data & Methods →
The Phase 1 Gap
What Phase 0 Sees vs. What's Missing
The four conditions layers characterize neighborhoods with 47 metrics at monthly resolution across 22 years. The perception layer — how residents actually feel — is measured at district level, every two years. That gap is where intervention money gets wasted: conditions can improve while perception stays stuck, or perception can recover in ways the data doesn't capture.
🔴 Safety Incidents
41 neighborhoods · monthly · 2003–2026
🟡 Street Conditions
41 neighborhoods · monthly · 8.4M records
🔵 Foot Traffic & Transit
Proxy measures only · transit, parking, meters
🟢 Economic Vitality
41 neighborhoods · quarterly sales tax + monthly business
Phase 1 fills this gap by deploying direct perception measurement via NPS touchpoints on existing digital infrastructure: point-of-sale systems, Clipper card reader prompts, Envoy visitor kiosks. Target: 5,000–50,000 monthly responses at neighborhood level. This turns the empty bar above into monthly data at the same resolution as conditions.
Interventions & Early Warning Patterns
San Francisco currently tracks incident rates, response times, and conditions across all 41 neighborhoods. These signals identify where pressure is building. What they don't provide is the granular, high-frequency resident feedback needed to contextualize those incidents — whether people experienced the response, whether conditions shifts were noticed, or whether a specific deployment changed the ambient experience of a block. That is the measurement layer Phase 1 provides.
Directional · Cross-Neighborhood Pattern
What the Current System Measures — and What It Doesn't
The current analysis baseline (Phase 0) tracks three operational signals for each neighborhood:
Incident rate — how often events are occurring relative to each neighborhood's own historical baseline
Response time — how quickly city services are deployed after a call or complaint
Resolution rate — whether 311 requests and service calls are being closed
These tell you whether operations are functioning. What they do not capture is the ambient experience between service calls — whether the street environment improved in ways residents noticed, how interactions with city services felt, or whether the neighborhood as a whole experienced a change. That feedback layer is what Phase 1 is built to collect.
Phase 0 measures operational outputs: incident rate, response time, resolution rate. Phase 1 adds the missing dimension — direct resident feedback on the ambient experience between and around those operations.
Phase 1 uses synthetic control methodology to compare sentiment trends on intervention blocks against similar non-intervention blocks — isolating the effect of specific programs from broader citywide trends.
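A bare-bones version of that comparison: fit weights over non-intervention "donor" blocks so their weighted mix reproduces the treated block's pre-intervention sentiment, then read the post-period gap as the estimated effect. This sketch uses plain least squares on toy data; a full synthetic control implementation would additionally constrain the weights to be non-negative and sum to one.

```python
import numpy as np

def synthetic_control_effect(treated, donors, t0):
    """Average post-intervention gap between a treated block and a
    weighted combination of donor blocks fit on the pre-period.

    treated: (T,) sentiment series for the intervention block
    donors:  (T, J) sentiment series for J comparison blocks
    t0:      index of the first post-intervention period
    """
    # Fit donor weights on pre-intervention periods only.
    w, *_ = np.linalg.lstsq(donors[:t0], treated[:t0], rcond=None)
    synthetic = donors @ w            # counterfactual trajectory
    gap = treated - synthetic
    return gap[t0:].mean(), w

# Toy example: treated block tracks an even mix of two donors, then
# gains +0.3 sentiment after an intervention at period 6.
T, t0 = 10, 6
donors = np.column_stack([np.ones(T), np.linspace(0.0, 1.0, T)])
treated = 0.5 * donors[:, 0] + 0.5 * donors[:, 1]
treated[t0:] += 0.3
effect, w = synthetic_control_effect(treated, donors, t0)  # effect ≈ 0.3
```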
Directional · Pooled Analysis
Precursor Patterns: What to Watch
Cross-layer lead-lag analysis identified consistent directional patterns in commercial corridor neighborhoods. These are not validated predictive models — they are directional signals worth tracking and formalizing in Phase 1.
Environmental → Safety (lag ~1 month): Waste and odor complaints tend to precede increased 911 calls for service in commercial corridors. Directional signal — worth tracking as an early warning.
Economic → Safety (lag ~6 months): Sales tax declines in commercial corridors showed directional correlation with property crime increases at a 6-month lag. Commercial hollowing appears to precede visible safety decline.
Critical caveat: These patterns were detected in pooled cross-neighborhood analysis. They are NOT validated at individual neighborhood level and may not hold for any specific neighborhood. Phase 1 formalizes these with neighborhood-specific baselines.
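The pooled lead-lag check is conceptually simple: correlate the candidate leading series, shifted forward by k periods, against the lagging one and look for a correlation peak at k > 0. A minimal sketch on constructed data (variable names are illustrative, not PSP's pipeline):

```python
import numpy as np

def lagged_correlation(leading, lagging, max_lag=6):
    """Correlation of `leading` at time t against `lagging` at t+k,
    for k = 0..max_lag. A peak at k > 0 is consistent with `leading`
    moving first: directional evidence, not causal proof."""
    corrs = {}
    for k in range(max_lag + 1):
        a = leading if k == 0 else leading[:-k]
        b = lagging if k == 0 else lagging[k:]
        corrs[k] = float(np.corrcoef(a, b)[0, 1])
    return corrs

# Toy series where the "lagging" signal is a 2-period-delayed copy.
base = np.sin(np.linspace(0, 12, 60))
leading = base
lagging = np.concatenate([np.zeros(2), base[:-2]])
corrs = lagged_correlation(leading, lagging, max_lag=4)
best_lag = max(corrs, key=corrs.get)  # peaks at lag 2 for this example
```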
Phase 1 Framework
Four Intervention Domains: What Phase 1 Will Measure
Phase 1 deploys a before/after measurement framework using synthetic control methods to estimate causal effects of specific intervention types. Phase 0 can identify where conditions are anomalous; Phase 1 measures whether an intervention moved perception.
Pop-up markets, public performances, outdoor programming. Proxy: 311 noise complaint pattern shifts (positive noise vs disorder noise distinction). Phase 1: event-level perception surveys at activation points.
Phase 1 design ready
🌿
Smells — Cleanliness Infrastructure
Street cleaning frequency, encampment response, Pit Stop program expansion. Proxy: waste/odor 311 report trends. Phase 0 shows environmental signals precede 911 service calls at a 1-month lag — cleanliness intervention may interrupt the cascade.
Cascade signal directional
📡
Civic Signals — Non-Police Safety
Ambassador programs, crisis intervention teams, CAHOOTS model deployments. These non-police alternative response programs are currently invisible to operational data — their activity isn't logged in systems PSP can access. Phase 1 creates the measurement infrastructure to evaluate what city-funded and philanthropically-supported alternative response programs actually deliver in public experience terms.
Early Warning Patterns: The Value of Proactive Detection
Historical analysis retrospectively identified 118 escalation events across 41 SF neighborhoods between 2003 and 2024. Cities inevitably respond to these events — the question is when and with what precision. The three cards below break down what proactive detection enables at each stage.
01 · What Triggers It
Cross-layer signals — a spike in environmental complaints followed by elevated 911 call volumes, or property crime rising against a neighborhood's own baseline rather than a citywide threshold. The pattern precedes the visible crisis by weeks.
02 · What Data Confirms It
Rate-ratio analysis compares each neighborhood against its own 12-month rolling baseline — not citywide averages. Low-baseline neighborhoods that generate modest raw counts still surface as escalating. 118 events were confirmed across the 22-year dataset.
03 · What Action It Enables
Earlier detection means a wider window to deploy a targeted response before escalation peaks. Pairing the signal with direct resident sentiment enables measurement of whether that response actually changed public experience — turning each event into a validated reference case.
Three illustrative events from the dataset
Visitacion Valley
Fall 2024
HIGH · 11 weeks
Signals Detected
Violent crime ↑1.4× · 911 calls ↑1.3× · weeks 1–11
Both signals crossed threshold simultaneously — longest sustained multi-signal event in dataset
Proactive Detection Enables
📍 Pattern flagged at Week 1–2 — wider response window before escalation peaks
🎯 Specific intervention deployed against emerging signal, not established crisis
💬 Resident sentiment confirms whether the block was experiencing deterioration before conditions data crossed threshold
📊 Response logged against event — builds evidence base for future similar patterns
Russian Hill
Summer 2023
MODERATE · 12 weeks
Signals Detected
Property crime ↑1.3× · 311 calls ↑1.5× · weeks 1–12
Low-baseline neighborhood — raw numbers stay low. Citywide thresholds miss it. Baseline-relative detection surfaces it.
Proactive Detection Enables
📍 Neighborhood-relative threshold surfaces the signal that citywide counts miss entirely
🎯 Response calibrated to neighborhood type and baseline — not applied from a generic citywide protocol
💬 Did residents notice before conditions data flagged it? Sentiment answers this.
📊 12-week event becomes a validated reference case for comparable neighborhoods
Outer Mission
Late 2024
HIGH · 10 weeks
Signals Detected
911 calls ↑1.4× · EMS calls ↑1.3× · weeks 1–10
911 + EMS co-escalation suggests single underlying driver. Early alert has highest value in this co-occurrence pattern.
Proactive Detection Enables
📍 Co-escalation pattern triggers coordinated response at Week 1, before either signal peaks
🎯 Single underlying driver hypothesis informs which agencies to coordinate — not generic multi-agency dispatch
💬 Resident experience tracked through the event window — did the response shorten what residents experienced?
📊 Resolution logged with sentiment delta — refines recommended response for the next co-escalation event
The evidence-building value: Each measured intervention — response logged, sentiment tracked before and after, outcome recorded — adds to a neighborhood-specific evidence base. Over time, that base answers: which response types work for which escalation patterns in which neighborhood contexts? The 118 events identified here represent 118 opportunities to build that knowledge. Phase 1 starts capturing it in real time.
The Vision
Phase 1 builds the infrastructure to directly measure how people experience public space — continuously, at the block level, and connected to the interventions being deployed. Below is the full product roadmap, the data-to-impact pipeline, and an interactive prototype of what the tool looks like in Mid-Market.
What PSP Does
A continuous feedback loop between public experience and the decisions that improve it.
📍
Block-level resolution
Public experience varies dramatically within a single neighborhood — one block can feel safe while the next does not. PSP measures at the block level so Community Benefit Districts (CBDs) and city agencies can see exactly where experience is declining and target resources precisely, not broadly.
📅
Continuous measurement. Operational cadence.
PSP collects sentiment continuously through Net Promoter Score (NPS) touchpoints, behavioral signals, and imagery analysis. The reported signal is aggregated monthly — the cadence at which block-level samples become large enough to be statistically meaningful, and the cadence on which CBDs run their planning cycles. Collection is always on; insight arrives on a rhythm agencies can act on.
🔁
Closed-loop learning
PSP connects interventions to outcomes. When a CBD deploys ambassadors on a block, PSP measures whether public experience on that block improved — compared to similar blocks without the intervention. Each cycle adds to a growing evidence base that trains the recommendation model, so the system gets more accurate about what works as it's used.
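A minimal sketch of the monthly block-level aggregation described above, using the standard NPS convention (promoters score 9–10, detractors 0–6). Column names here are illustrative, not PSP's actual schema.

```python
import pandas as pd

def monthly_block_nps(responses: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw 0-10 survey scores into a monthly block-level NPS.

    Expects columns: block_id, timestamp, score (0-10). NPS is the
    share of promoters (>= 9) minus the share of detractors (<= 6),
    expressed as a -100..100 score.
    """
    df = responses.copy()
    df["month"] = df["timestamp"].dt.to_period("M")
    df["promoter"] = df["score"] >= 9
    df["detractor"] = df["score"] <= 6
    out = df.groupby(["block_id", "month"]).agg(
        n=("score", "size"),
        promoter_share=("promoter", "mean"),
        detractor_share=("detractor", "mean"),
    )
    out["nps"] = 100 * (out["promoter_share"] - out["detractor_share"])
    return out.reset_index()

# Four March responses on one hypothetical Mid-Market block:
data = pd.DataFrame({
    "block_id": ["mkt-01"] * 4,
    "timestamp": pd.to_datetime(["2026-03-02", "2026-03-10",
                                 "2026-03-15", "2026-03-28"]),
    "score": [10, 9, 5, 7],
})
result = monthly_block_nps(data)  # two promoters, one detractor -> NPS 25
```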
Phase 1 Vision Prototype · Illustrative — not live data
Mid-Market CBD — Block-Level Sentiment Map
This is an illustrative prototype of how Public Safety Pulse would operate in the Mid-Market pilot zone. Each circle represents a block-level measurement unit along the Market/Mission corridor. Fill color reflects the current sentiment score for that block; border color signals operational urgency — from critical (purple) to low (green). In the live system, these scores update monthly from direct NPS surveys and integrated civic signal data. The dashed boundary marks the proposed Phase 1 pilot zone.
Click any block to see its simulated sentiment score, conditions, perception gap, and recommended interventions. Use the layer toggle to switch between data dimensions.
Block Detail
Click any block to see sentiment score, safety, conditions, perception gap, and recommended interventions.
PSP turns dispersed public data and direct sentiment signals into actionable insight — then measures whether the response shifted how people experience public space. Each phase feeds the next. The loop gets more precise with every cycle.
01
Sense
Gather public experience signals across the pilot area
📡
Collect
NPS micro-surveys at point-of-sale, transit kiosks, and community touchpoints. Paired with existing streams: 911 calls for service, 311 conditions reports, SFMTA ridership, foot traffic index, business revenue signals, and commercial occupancy rates.
🧩
Integrate
All signals joined spatially to block level and matched by time — so a survey response from Tuesday afternoon is compared to conditions on that specific block that same week.
Google BigQuery
🤖
Analyze
Natural language processing on survey open text. Computer vision on street-level imagery. Behavioral patterns from foot traffic data — all processed to extract sentiment signals at scale.
Gemini AI
→
02
Understand
Track how conditions and felt experience move in relation to each other
📐
Model
Statistical modeling produces a block-level public experience index — combining sentiment responses, conditions scores, and behavioral signals into a single comparable measure across the pilot area.
Vertex AI
🗺️
Map
Scores visualized as a real-time heat map at block level across the pilot area. Foot traffic denominator from Google PDFM ensures high-traffic blocks aren't penalized for more reported incidents.
Maps + PDFM
💡
Identify
Automatic detection of blocks where how people feel diverges sharply from what conditions data shows — these gaps are the highest-value intervention targets and are surfaced as alerts.
→
03
Act
Deploy resources precisely. Measure what changed.
🎯
Intervene
Ranked intervention recommendations matched to each block's typology and gap profile — from ambassador deployment to lighting upgrades to event activation — drawn from evidence of what has worked in comparable contexts.
📊
Measure
Sentiment tracked before and after each intervention using synthetic control methodology — isolating the effect of the specific action from broader citywide trends, so CBDs know what actually moved the needle.
🔄
Improve
Each cycle adds evidence to the model. Interventions that consistently improve sentiment scores get weighted higher in future recommendations. The system gets more accurate as the CBD uses it.
✓ Validated Need
What Phase 1 Unlocks
Capability · Phase 0 (Now) · Phase 1 (Adds)
Sentiment measurement · District level, biennial survey · Block level, monthly via NPS touchpoints
Foot traffic denominator · Proxy: transit + parking · Direct: Google PDFM, 5K–50K obs/month
Update frequency · Monthly conditions, biennial perception · Weekly conditions, monthly sentiment
Intervention attribution · None · Synthetic control: causal effect per intervention type
Alert type · Conditions z-score thresholds · Perception-conditions divergence — the real signal
Street imagery · None · Computer vision on 30K blocks via Gemini Vision
Spatial resolution · Neighborhood level (41 units) · Block level (Mid-Market pilot ~14 blocks)
The Ask · Google.org Impact Challenge
36-month grant · AI for Government Innovation
Phase 1: Mid-Market pilot → Phase 2: multi-corridor SF deployment → Phase 3: replication in a second US city. MIT City Science Lab academic oversight + full open-source methodology publication.
Near-Term Path · Philanthropic Pilot
6-month pilot · Mid-Market corridor
Mid-Market + 2 comparison corridors. Validates the measurement methodology. Produces the first neighborhood-level monthly sentiment dataset. Generates Phase 2 application with real numbers.
Partners
CS
City Science Lab San Francisco
Fiscally sponsored by SPUR · San Francisco, CA
City Science Lab SF applies AI, systems modeling, and civic innovation to real problems in the places people live. The lab's model: bring the rigor of research institutions into the operating rhythm of city government — not as consultants, but as embedded scientific partners.
Public Safety Pulse is a flagship project: it takes a civic problem (the disconnect between resource deployment and resident experience), applies the lab's cross-sector convening capacity to unlock access to city data and agency relationships, and produces a tool that is genuinely useful to the people running the city day-to-day.
"Like SimCity — but for real." The lab builds models that city officials can actually use to test interventions before committing resources.
Civic Innovation · AI + Systems Modeling · City Government · Urban Data · San Francisco
MIT Media Lab City Science
MIT Media Lab · Cambridge, MA · Prof. Kent Larson, Director
MIT City Science is a global research program at the MIT Media Lab studying how cities can be designed and managed to enhance human flourishing. The group created CityScope — augmented tabletop platforms for collaborative urban planning — and the City Science Network, spanning 28 labs worldwide.
Public Safety Pulse builds directly on the intellectual lineage of MIT Place Pulse: a landmark project that generated 1.17M pairwise comparisons of street-level images to understand perceived urban safety, producing the StreetScore algorithm. PSP evolves that approach from static image scoring to dynamic, multi-signal, continuous tracking.
MIT provides peer review rigor, research publication infrastructure, and the credibility of academic independence for all findings and methodology.
CityScope · Place Pulse · Urban Architecture · Mobility on Demand · City Science Network
Data & Methods
Full technical documentation for the Phase 0 analytical pipeline. Intended for researchers, data scientists, and city technical staff reviewing the methodology. All findings are traceable to public datasets.
Panel Fixed Effects Model
The structural break finding uses a Panel Fixed Effects OLS model estimated on supervisor district × survey year observations. This is the primary statistical contribution of Phase 0.
Perception_dt = α_d + β₁(n311_noise_dt) + β₂(n_violent_dt)
+ β₃(n_drug_dt) + β₄(n311_graffiti_dt)
+ β₅(median_income_d) + ε_dt
Where:
d = supervisor district (11 units, district fixed effects)
t = survey year (11 matched waves, 1996–2023)
n = 121 matched observations
Features: raw counts per district-year (not normalized shares)
SE: HC1 heteroskedasticity-consistent (statsmodels)
Validation: Leave-one-year-out (LOYO) cross-validation
Note: Feature normalization (share-based) is a planned improvement for the v0 formal submission.
The structural break test compares coefficient stability across pre-2019 and post-2019 sub-samples using an F-test for equality (Chow-type test with HC1 standard errors).
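The specification and break test above can be sketched in statsmodels. Everything below runs on synthetic data; the wave years, rates, and coefficients are illustrative stand-ins for the matched SF City Survey / DataSF panel, not the project's actual inputs. Note that the median_income_d term is omitted from the estimated equation because district fixed effects absorb any time-invariant district covariate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic district × survey-year panel standing in for the 121 matched
# observations; column names mirror the specification above, but the data
# are illustrative.
rng = np.random.default_rng(42)
years = np.arange(2003, 2024, 2)                    # 11 waves
rows = []
for d in range(11):
    alpha_d = rng.normal(0.0, 0.2)                  # district fixed effect
    for t in years:
        noise, violent, drug, graffiti = rng.poisson([400, 120, 80, 250])
        y = 3.0 + alpha_d - 0.0006 * violent - 0.0004 * drug \
            + rng.normal(0.0, 0.1)
        rows.append(dict(district=d, year=t, perception=y,
                         n311_noise=noise, n_violent=violent,
                         n_drug=drug, n311_graffiti=graffiti))
df = pd.DataFrame(rows)

# District fixed effects absorb time-invariant covariates, so
# median_income_d drops out of the estimated equation.
formula = ("perception ~ n311_noise + n_violent + n_drug + n311_graffiti"
           " + C(district)")
pooled = smf.ols(formula, data=df).fit(cov_type="HC1")

# Chow-type break test at 2019 (classic SSR form shown for brevity; the
# HC1-robust variant requires an interacted single-regression setup).
pre = smf.ols(formula, data=df[df.year < 2019]).fit()
post = smf.ols(formula, data=df[df.year >= 2019]).fit()
k = pooled.df_model + 1                             # params incl. intercept
f_chow = ((pooled.ssr - pre.ssr - post.ssr) / k) / \
         ((pre.ssr + post.ssr) / (len(df) - 2 * k))
print(f"n={int(pooled.nobs)}  R²={pooled.rsquared:.3f}  Chow F={f_chow:.2f}")
```

The same pooled fit supplies the HC1 coefficient table; the Chow F-statistic tests whether the pre- and post-2019 coefficient vectors can plausibly be equal.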
Period | RMSE | R² | Notes
Full period (1996–2023) | 0.146 | 0.824 | 121 observations, in-sample
LOYO cross-validation (all years) | 0.252 | — | Honest out-of-sample estimate
Pre-2019 LOYO | 0.13 | — | Conditions reliably predicted perception
Post-2019 (held out) | 0.61 | — | 4.7× error increase — structural break
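The LOYO estimates follow the standard pattern: hold out each survey wave in turn, fit on the remaining waves, and pool the squared prediction errors. A minimal sketch on synthetic data (single illustrative regressor, not the project's feature set):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the 121-observation district × wave panel.
rng = np.random.default_rng(3)
years = np.arange(2003, 2024, 2)
df = pd.DataFrame({"district": np.repeat(np.arange(11), len(years)),
                   "year": np.tile(years, 11)})
df["n_violent"] = rng.poisson(120, len(df))
df["perception"] = 3.0 - 0.004 * df.n_violent + rng.normal(0, 0.1, len(df))

# Leave-one-year-out: each wave is predicted by a model that never saw it.
sq_errs = []
for held_out in years:
    train, test = df[df.year != held_out], df[df.year == held_out]
    fit = smf.ols("perception ~ n_violent + C(district)", data=train).fit()
    sq_errs.append((test.perception - fit.predict(test)) ** 2)
loyo_rmse = float(np.sqrt(pd.concat(sq_errs).mean()))
print(f"LOYO RMSE: {loyo_rmse:.3f}")
```

Restricting the loop to pre-2019 waves, or holding out only post-2019 waves, reproduces the sub-sample comparison used for the structural break finding.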
Z-Score Framework
Neighborhood conditions are expressed as z-scores measuring deviation from each neighborhood's own historical baseline. Baselines are computed using a regime-aware approach: separate means and standard deviations for pre-COVID (pre-2020), lockdown (2020), recovery (2020–2022), and post-COVID (2022+) periods.
A score of +1.5 means the metric is 1.5 standard deviations above that neighborhood's typical level for that period — not above the city average. This is intentional: it measures deviation from a neighborhood's own pattern, not comparison to other neighborhoods.
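The regime-aware baseline reduces to a grouped transform: score each neighborhood-month against that neighborhood's own mean and standard deviation within its regime. A minimal pandas sketch; the exact cutover dates below are assumptions, since the text gives only year-level boundaries.

```python
import numpy as np
import pandas as pd

def regime(ts):
    # Assumed regime boundaries (the text specifies pre-COVID, lockdown,
    # recovery, post-COVID only at year granularity).
    if ts < pd.Timestamp("2020-03-01"):
        return "pre_covid"
    if ts < pd.Timestamp("2020-07-01"):
        return "lockdown"
    if ts < pd.Timestamp("2022-01-01"):
        return "recovery"
    return "post_covid"

# Illustrative monthly metric for two neighborhoods.
rng = np.random.default_rng(1)
months = pd.date_range("2017-01-01", "2024-12-01", freq="MS")
df = pd.DataFrame({
    "month": np.tile(months, 2),
    "neighborhood": np.repeat(["Mid-Market", "Mission"], len(months)),
    "value": rng.normal(100, 10, 2 * len(months)),
})
df["regime"] = df["month"].map(regime)

# Deviation from the neighborhood's own baseline, not the city average.
g = df.groupby(["neighborhood", "regime"])["value"]
df["z"] = (df["value"] - g.transform("mean")) / g.transform("std")
```

Because the mean and standard deviation are computed per neighborhood-regime group, a +1.5 in one neighborhood is +1.5 relative to that neighborhood's own typical level for that period.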
Escalation Detection
Escalation events are detected using rate ratios against 12-month rolling baselines, following CDC EARS-style aberration detection methodology adapted for urban safety signals.
Rate ratio = observed_30day / expected_30day
Expected = 12-month rolling mean per neighborhood per signal
Escalation threshold: ≥1.3× for ≥2 consecutive weeks
on ≥2 of 3 signal types simultaneously
(Priority A dispatch, violent crime, EMS emergency)
Requiring simultaneous elevation across ≥2 signal types serves as a practical false-positive filter across ~4,400 implicit tests. This methodology should be reviewed by a biostatistician for Phase 1 formalization.
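The detection rule above can be sketched in a few lines of pandas. The weekly counts, baseline rates, and injected escalation below are synthetic; signal column names are placeholders for the real CAD, incident, and EMS feeds.

```python
import numpy as np
import pandas as pd

# Synthetic weekly counts per signal for one neighborhood.
rng = np.random.default_rng(7)
weeks = pd.date_range("2023-01-02", periods=80, freq="W-MON")
signals = ["priority_a_dispatch", "violent_crime", "ems_emergency"]
df = pd.DataFrame({s: rng.poisson(50, len(weeks)) for s in signals},
                  index=weeks)
df.iloc[-4:] = (df.iloc[-4:] * 1.6).astype(int)     # inject an escalation

# Rate ratio: trailing 30-day observed vs a 12-month rolling expectation.
# min_periods=52 requires a full year of history before any ratio is scored.
observed = df.rolling("30D").sum()
expected = df.rolling("365D", min_periods=52).sum() / 365 * 30
ratio = observed / expected

# Alert rule: ≥1.3× for ≥2 consecutive weeks on ≥2 of 3 signal types.
elevated = ratio >= 1.3
persistent = elevated & elevated.shift(1, fill_value=False)
alert = persistent.sum(axis=1) >= 2
print(alert[alert].index.tolist())                  # weeks that fire
```

The persistence requirement (two consecutive weeks) and the cross-signal requirement (two of three) are exactly the two filters that keep the implicit multiple-testing burden from flooding operators with one-off spikes.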
Data Sources
All data publicly available via DataSF (data.sfgov.org), BART, US Census Bureau, NOAA, and open APIs. No proprietary data in Phase 0. The current z-score framework integrates 25 metrics across 4 conditions layers. An additional 20+ datasets are on disk and identified for Phase 1 integration.
Layer | Source | Records | Coverage
Safety Incidents | SFPD Incidents (DataSF), SFPD Historical | 3.1M | 2003–2026
Safety Incidents | CAD Dispatch (DataSF) | 2.39M | 2016–2026
Safety Incidents | Fire/EMS Calls (DataSF) | 1.49M | 2003–2026
Safety Incidents | Traffic Crashes (DataSF), SWITRS | 14.6K intersections | 2006–2026
Street Conditions | 311 Service Requests (DataSF) | 8.4M | 2008–2026
Street Conditions | Tent/Structure Counts (DataSF) | 1.7K obs. | 2017–2025
Street Conditions | Streetlight Outages (DataSF) | 4.9K | 2018–2026
Foot Traffic & Transit | SFMTA Ridership (DataSF, GTFS) | 514 routes | 2012–2026
Foot Traffic & Transit | BART Station Exits | 8 stations | 2015–2026
Foot Traffic & Transit | Bay Wheels Trip Data | Trip-level | 2017–2026
Foot Traffic & Transit | SFPark Meter Occupancy | 28K meters | 2019–2026
Economic Vitality | Business Register (DataSF) | 200K+ businesses | 2003–2026
Economic Vitality | CDTFA Sales Tax by Neighborhood | Quarterly | 2015–2024
Economic Vitality | Commercial Vacancy Filings (DataSF) | 19K | 2022–2024
Economic Vitality | Building Permits (DataSF) | 50K+ | 2013–2026
Public Sentiment | SF City Survey (DataSF) | 42.7K responses | 1996–2023
Public Sentiment | CityBeat (EMC Research) | 500/wave | 2023–2026
Public Sentiment | Social/news sentiment (18 sources) | 10.8K records | 2025–2026
Context | US Census ACS (Census Bureau) | 244 tracts | 2020
Known Limitations
Monthly temporal resolution limits the analysis to within-month patterns. Sub-monthly dynamics (daily/weekly rhythms, time-of-day effects) are not detectable. Phase 1 deploys event-level data where possible.
Perception data is available only at supervisor district level (11 units) and biennial frequency. All neighborhood-level perception values in this dashboard are directional estimates from conditions data — not directly measured.
Cross-layer correlation findings (environmental → safety cascade) were detected in pooled analysis across all neighborhoods. They have NOT been validated at individual neighborhood level. They may not hold for any specific neighborhood and should not be used for operational decision-making without Phase 1 neighborhood-specific validation.
ACS (US Census) commute and population data is currently used as static neighborhood context only (population, income, renter share). Annual publication lag and 1–2 year data delay make it unsuitable for integration into monthly z-score composites. A planned Phase 1 enhancement pairs ACS residential population shifts with building permit and commercial vacancy data to capture neighborhood composition changes — but this requires Google PDFM foot traffic as a dynamic denominator to be meaningful. Not implemented in Phase 0.
The Z-score framework measures deviation from a neighborhood's own historical pattern. It is not a comparison between neighborhoods. A score of 60 in the Tenderloin and a score of 60 in Pacific Heights mean something structurally different in absolute terms.
Activity denominator: Cross-neighborhood comparisons of raw counts require normalization by true foot traffic, which is not available in public data. Phase 1's Google PDFM integration solves this.
The active_business_count metric shows a systematic artifact (uniformly elevated across all neighborhoods in recent data pulls), likely reflecting a data-refresh methodology change in the Business Register. It is excluded from composite scoring pending investigation.