Public Safety Pulse
City Science Lab SF × MIT Media Lab
Measure the feeling. Improve the response.
San Francisco invests heavily in public safety — policing, cleaning, activations, ambassador programs. No existing system can tell you whether residents and visitors actually experience those neighborhoods as safer and more welcoming. Public Safety Pulse builds that signal: continuous, block-level, and designed to close the gap between operational output and lived public experience.
The Problem
Quantifying the feeling of safety.
Cities currently track incidents: crimes reported, complaints filed, response times. What no system captures is how the city's response shapes public experience in a neighborhood — and how that experience shapes decision making. Without that signal, every program is evaluated on its outputs, never on the outcome it was meant to serve.
The Approach
Add the sentiment layer that's been missing.
Public Safety Pulse (PSP) places brief sentiment surveys at the touchpoints people already use — point-of-sale systems, transit kiosks, community check-ins — collecting direct, point-in-time feedback from residents and visitors at block level. That signal, paired with 22 years of operational data and foot traffic context, produces a composite picture of how people actually experience each block. Not a proxy or a model output — a direct measure, on a cadence agencies can act on.
The Outcome
A more responsive, accountable city.
For the first time, a city will be able to pair operational data with direct, block-level resident experience — updated monthly, tied to specific interventions. Agencies and districts won't just see what was deployed; they'll see whether it changed how people experience their neighborhoods. Felt impact and conditions data, measured together, is the feedback loop that makes a city genuinely responsive and accountable to the people it serves.
Explore PSP
✨
The Vision
Where PSP is going — Phase 1 product roadmap, the full data-to-impact pipeline, and an interactive prototype of block-level experience mapping in Mid-Market.
See the vision →
📊
Perception & the Gap
SF residents report feeling less safe even as many conditions have improved. This tab shows 20 years of that divergence — and why the gap between what's measured and what's felt is where civic investment gets lost.
See the gap →
⚡
Interventions
What current data can and can't tell us about whether city and CBD programs are working — and why connecting operational outputs to resident experience is the measurement layer SF is missing.
See evidence →
🗺️
Neighborhood Baseline Map
Ten years of change across all 41 SF neighborhoods — five data layers, interactive time-lapse from 2016, and four validated findings on the 2018 structural break and early warning signals. The analytical foundation PSP builds from.
View the historical baseline →
🤝
Partners
City Science Lab San Francisco and MIT Media Lab City Science — the institutional partnership behind Public Safety Pulse.
Meet the team →
Phase 0 · Pre-publication · Working Instrument · All findings labeled by validation status
What you're seeing: Each neighborhood is colored by how it compares to its own 20-year average — not to other neighborhoods. Red = conditions currently elevated above that neighborhood's norm. Green = quieter than usual. Five data layers: safety incidents, street conditions, foot traffic, economic activity, and public sentiment. Toggle to isolate any one. Time Lapse shows change since 2016.
Phase 0 · historical conditions + City Survey proxy · not real-time block-level data
Layer
Window
Composite Concern
Lower → Higher
↓ Scroll for findings from this data
Scores show deviation from each neighborhood's own historical baseline — not a comparison between neighborhoods. Showing: 12-month average
2016
What This Data Reveals
Phase 0 is not just setup for Phase 1 — it produced real findings. Four validated signals from 28.6 million public records that change how SF's public safety data should be read. Each is labeled by confidence level.
Phase 0 Data Foundation
Five Layers — What Each One Measures
Phase 0 integrates five independent public data streams. Four of them — safety incidents, street conditions, foot traffic, and economic vitality — predicted the fifth (public sentiment) with R²=0.824 for 15 years, meaning the model explains about 82% of the variation in safety-perception scores across supervisor districts. After 2019, that predictive relationship broke down sharply.
Safety Incidents
Crimes reported to SFPD, 311 safety calls, drug incident reports. 41 neighborhoods · monthly · 2003–2026 · 11.2M records.
Street Conditions
Graffiti, encampments, illegal dumping, cleaning requests from 311. 41 neighborhoods · monthly · 8.4M records.
Foot Traffic & Transit
SFMTA ridership, parking occupancy, meter revenue — proxy measures for neighborhood presence. These signals are directional indicators, not direct foot traffic counts; interpret as relative patterns across neighborhoods rather than absolute volumes.
Economic Vitality
Business registrations and closures, quarterly sales tax revenue. 41 neighborhoods · monthly + quarterly.
Public Sentiment (Proxy)
SF Controller's City Survey — 11 supervisor districts, biennial. The thinnest layer. Phase 1 replaces this with direct block-level monthly measurement.
✓ Validated · Operational Finding
Two Types of Neighborhoods Require Two Different Responses
118 escalation events detected across 41 neighborhoods reveal a consistent split. Chronically elevated neighborhoods (Tenderloin pattern) have persistently high baseline scores with slow, structural decline — they require sustained intervention. Disruptive spike neighborhoods (Visitacion Valley 11 weeks, Russian Hill 12 weeks, Outer Mission 10 weeks) show sharp, fast-moving escalation that standard lagging indicators miss entirely.
The implication: a single alert threshold applied citywide will either miss spikes or flood chronic areas with false positives. These two types need different detection logic and different response protocols.
✓ Validated · Panel Model
The Predictive Model Broke in 2019
For 15 years, the four conditions layers predicted safety perception with R²=0.824. Post-2019, prediction error increased 4.7× (RMSE 0.13 → 0.61). Same data inputs, dramatically worse predictive power. Something structurally changed in how residents form their sense of safety — and conditions data alone can no longer explain it.
2019 is the inflection point. Pre-2018: stable relationship. 2018–2019: early shift. 2020+: model breaks down.
✓ Validated · Early Warning
Escalation Has Detectable Precursors
Analysis of the 118 escalation events shows consistent precursor patterns that appear 2–4 weeks before conditions scores cross alert thresholds. Specific combinations of 311 call type shifts and foot traffic changes reliably precede escalation in both neighborhood types — giving CBDs and city agencies an operational window to intervene before conditions deteriorate.
What 28.6 million public records across five data layers reveal about San Francisco's public safety — and where the data runs out. These findings live in the Explore the Data tab as contextual insights alongside the interactive map.
Phase 0 Data Foundation
Five Layers of Public Data — What Each One Measures
Phase 0 integrates five independent public data streams into a single analytical framework. Each layer captures a different dimension of neighborhood conditions. Four of these layers — safety incidents, street conditions, foot traffic, and economic vitality — predicted the fifth (public sentiment) with R²=0.824 for 15 years. After 2019, that predictive relationship broke down sharply.
Safety Incidents
Crimes reported to SFPD, 311 safety-related calls, drug incident reports. 41 neighborhoods · monthly · 2003–2026 · 11.2M records.
Street Conditions
Graffiti, encampments, illegal dumping, street and sidewalk cleaning requests from 311. 41 neighborhoods · monthly · 8.4M records.
Foot Traffic & Transit
SFMTA ridership, parking occupancy, parking meter revenue — proxy measures for how many people are moving through and spending time in each neighborhood.
Economic Vitality
Business registrations and closures, quarterly sales tax revenue, commercial vacancy trends. 41 neighborhoods · monthly + quarterly.
Public Sentiment (Proxy)
SF Controller's City Survey safety perception scores — 11 supervisor districts, biennial. The thinnest layer: high geographic aggregation, low time resolution. Phase 1 replaces this proxy with direct, block-level, monthly measurement.
The key relationship: For 15 years, the first four layers predicted the fifth with an R² of 0.82. After 2019, that predictive power dropped sharply — the same conditions began producing significantly different levels of felt safety depending on neighborhood, time, and context. The five findings below document what the data shows about this shift.
✓ Validated · 118 Events · Operational Finding
Two Types of Neighborhoods Require Two Different Responses
118 escalation events detected across 41 neighborhoods using rate-ratio methodology (observed vs. 12-month rolling baseline). The analysis revealed a consistent split that has direct implications for how resources and alerts should be configured.
High-volume neighborhoods need sustained management
Tenderloin, SoMa, Mission, Bayview, Financial District are chronically elevated but trending down. Intensive city investment is visible in the data. These neighborhoods don't "spike" — they require consistent operational attention, not emergency response activation.
Medium-volume neighborhoods produce the surprises
Visitacion Valley (11 weeks, fall 2024), Russian Hill (12 weeks, summer 2023), Outer Mission (10 weeks, late 2024) — extended escalation events in neighborhoods that don't typically appear on dashboards. These are the events most likely to blindside city officials without early-warning infrastructure.
Current uniform alert thresholds miss both patterns: they fire too often in high-volume areas (alarm fatigue) and too late in medium-volume areas (response lag). Phase 1 calibrates thresholds per neighborhood type.
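The baseline-relative detection logic can be sketched in a few lines. This is an illustrative simplification: the 1.3× threshold, monthly cadence, and minimum-history settings below are placeholders, not PSP's calibrated per-neighborhood-type values.

```python
import pandas as pd

def detect_escalations(counts: pd.Series, threshold: float = 1.3) -> pd.Series:
    """Flag periods where a neighborhood's incident count exceeds its
    own trailing 12-month rolling baseline by more than `threshold`.

    `counts`: monthly incident counts for one neighborhood, date-indexed.
    """
    # Trailing baseline, shifted one period so the current month never
    # contributes to its own baseline.
    baseline = counts.rolling(window=12, min_periods=6).mean().shift(1)
    rate_ratio = counts / baseline
    return rate_ratio > threshold

# Illustrative series: 18 quiet months, then a multi-month escalation.
idx = pd.date_range("2023-01-01", periods=24, freq="MS")
counts = pd.Series([20] * 18 + [32, 34, 33, 30, 21, 20], index=idx)
flags = detect_escalations(counts)
```

Because the comparison is against the neighborhood's own baseline, a low-volume block going from 5 to 8 incidents surfaces just as clearly as a high-volume block going from 200 to 320, which is the point of the two-typology finding above.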
Directional · Pooled cross-correlation
Government Responsiveness Is Now the Key Predictor
Pre-2020, commercial activity (sales tax, business count) directionally predicted safety conditions — more commercial vitality correlated with lower disorder. Post-2020, 311 resolution time and response volume became the stronger signal. Residents appear to use "is the city maintaining this?" as a primary safety cue.
Best responsiveness (directional): Tenderloin and Castro — city investment and CBD management are visible in 311 resolution patterns relative to each neighborhood's own baseline
Largest gap (directional): Visitacion Valley, Mission — city response rate slowing beyond what demand alone explains, per 311 data
Directional finding from pooled cross-correlation analysis. Pre-2020: commercial health negatively associated with safety anomalies. Post-2020: 311 volume and resolution time became stronger signals. Specific coefficients pending formal model run.
✓ Validated · District Level
Perception and Conditions Are Telling Different Stories
Safety perception hit its lowest recorded level in 2023 while many objective conditions were stable or improving. Downtown perception has since recovered — but the recovery is geographically concentrated and the measurement gap remains.
3.63
City Survey 2023 (1–5)
83%
Downtown day, CityBeat 2026
Muni safety perception: +15 points (2021–2025, SFMTA Rider Survey) while general neighborhood perception declined — context-specific interventions can move targeted perception without requiring citywide improvement
CityBeat surveys SF likely voters (n=500, ±4.4pp). Downtown recovery is a positive signal but limited in geographic scope. Not citywide.
✓ Validated · Panel FE Model
The Four Conditions That Stopped Working in 2019
For 15 years, four specific signals reliably predicted safety perception across all 11 supervisor districts: noise complaint share, violent crime per capita, drug offense rate, and graffiti share. Model error: 0.13. After 2019, the same four signals produced 4.7× more error (0.61). The predictive relationship broke.
0.82
R² pre-2019
0.61
RMSE post-2019
4.7×
Error increase
This is not an argument against addressing noise, crime, drugs, and graffiti. It's evidence that those actions no longer reliably translate into residents feeling safer — and that without measuring perception directly, there's no way to know if interventions are working.
✓ Validated · Change Point Detection
2018 Was the Real Turning Point — Not 2020
Change point detection across 41 neighborhoods identified 2018 as the year with the highest concentration of structural shifts across all layers. COVID arrived after the underlying fabric had already begun fraying. Economic Vitality layer showed the earliest directional shifts — commercial corridor hollowing preceded visible safety and environmental deterioration in the pooled analysis.
Key implication: Interventions focused on 2020 and beyond are addressing consequences, not causes. The commercial and economic hollowing that set the conditions for the 2019–2023 perception collapse began at least two years earlier.
Caveat: "Economic Vitality leading" is a directional finding from pooled cross-neighborhood analysis. It has not been validated individually for each neighborhood in the dataset. Phase 1 formalizes this with neighborhood-specific lead-lag analysis.
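The criterion behind change point detection can be illustrated with a minimal single-break detector: scan every candidate split and keep the one that minimizes squared error around two segment means. Phase 0's actual detection runs across multiple layers and allows multiple breaks; this sketch shows only the core idea, on fabricated data.

```python
import numpy as np

def best_changepoint(y: np.ndarray, min_seg: int = 3) -> int:
    """Index that best splits `y` into two constant-mean segments
    under a least-squares criterion (single structural break)."""
    best_k, best_cost = min_seg, np.inf
    for k in range(min_seg, len(y) - min_seg):
        left, right = y[:k], y[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Toy annual series starting in 2003, with a mean shift at index 15 (~2018).
y = np.r_[np.full(15, 1.0), np.full(9, 3.0)]
break_year = 2003 + best_changepoint(y)  # -> 2018
```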
Perception & the Measurement Gap
How safe do San Franciscans feel in their neighborhoods — and is that changing? This tab presents what the data shows.
The citywide standard is the SF Controller's City Survey: a biennial resident survey tracking safety perception since 1996. City departments run their own service-specific surveys; outside research firms conduct supplemental polls. Together, these sources provide a directional read on public sentiment.
None of them tell us how a specific block or corridor is doing right now. They capture citywide or district-level sentiment, measured every two years. That temporal and geographic gap — between monthly operational data and biennial, district-level sentiment — is what Phase 1 is designed to close.
✓ Verified · City Survey + CityBeat
Safety Perception Over Time: Decline, Then Partial Recovery
The SF Controller's City Survey has tracked citywide resident safety perception since 1996 — it is the standard measure of how San Franciscans feel about safety in their neighborhoods. Scores were stable near 4.2 on a 5-point scale from 2013 through 2019, then fell to a historic low of 3.63 in 2023. The CityBeat poll (SF likely voters, downtown focus, annual, n=500) provides a supplemental read on downtown-specific perception — a different population, different geography, different method. Both are shown below.
City Survey (Citywide)
Source: SF Controller's Office City Survey, 42K responses, 1–5 scale. Biennial. District level only.
CityBeat Downtown Perception
Source: CityBeat 2026, EMC Research / United Airlines. SF likely voters (n=500, ±4.4pp) asked about downtown SF. Annual. Not a citywide sample.
City Survey 2023 citywide: 3.63/5.0 — lowest in survey history. Perception declined even as many objective conditions improved.
CityBeat 2026 downtown daytime: 83% report feeling safe (up from 64% in 2023). Nighttime: 51% (up from 30%). Downtown recovery is real and spatially concentrated.
The gap between citywide City Survey and downtown CityBeat recovery is itself a finding: improvement is spatially concentrated and has not reached all neighborhoods equally.
✓ Verified · City Survey 2023
Perception by Supervisor District (2023)
Finest resolution available from public data. 11 districts × biennial measurement. Day (amber) vs. Night (blue) scores on 1–5 scale.
Daytime safety · Nighttime safety
✓ Validated · Panel Model Finding
This Changes How We Think About the Problem
Before 2019, the data was predictable: clean streets, lower crime rates, and faster response times reliably corresponded with residents feeling safer. That relationship held for 15 years. Then it broke. Phase 0 analysis of the same data post-2019 found that a single factor emerged as the dominant predictor of whether people experience their neighborhoods as safe — not crime rates, not cleanliness, but whether government appears to be paying attention and responding.†
This has a direct operational implication: improving public safety perception runs through being seen as responsive — a signal no conditions dataset measures. That's what Phase 1 builds.
† Panel fixed-effects model, R²=0.824, 15-year training window 2003–2018, 11 supervisor districts. Full specification in Data & Methods →
The Phase 1 Gap
What Phase 0 Sees vs. What's Missing
The four conditions layers characterize neighborhoods with 47 metrics at monthly resolution across 22 years. The perception layer — how residents actually feel — is measured at district level, every two years. That gap is where intervention money gets wasted: conditions can improve while perception stays stuck, or perception can recover in ways the data doesn't capture.
🔴 Safety Incidents
41 neighborhoods · monthly · 2003–2026
🟡 Street Conditions
41 neighborhoods · monthly · 8.4M records
🔵 Foot Traffic & Transit
Proxy measures only · transit, parking, meters
🟢 Economic Vitality
41 neighborhoods · quarterly sales tax + monthly business
Phase 1 fills this gap by deploying direct perception measurement via NPS touchpoints on existing digital infrastructure: point-of-sale systems, Clipper card reader prompts, Envoy visitor kiosks. Target: 5,000–50,000 monthly responses at neighborhood level. This turns the empty bar above into monthly data at the same resolution as conditions.
Interventions & Early Warning Patterns
San Francisco currently tracks incident rates, response times, and conditions across all 41 neighborhoods. These signals identify where pressure is building. What they don't provide is the granular, high-frequency resident feedback needed to contextualize those incidents — whether people experienced the response, whether conditions shifts were noticed, or whether a specific deployment changed the ambient experience of a block. That is the measurement layer Phase 1 provides.
Directional · Cross-Neighborhood Pattern
What the Current System Measures — and What It Doesn't
The current analysis baseline (Phase 0) tracks three operational signals for each neighborhood:
Incident rate — how often events are occurring relative to each neighborhood's own historical baseline
Response time — how quickly city services are deployed after a call or complaint
Resolution rate — whether 311 requests and service calls are being closed
These tell you whether operations are functioning. What they do not capture is the ambient experience between service calls — whether the street environment improved in ways residents noticed, how interactions with city services felt, or whether the neighborhood as a whole experienced a change. That feedback layer is what Phase 1 is built to collect.
Phase 0 measures operational outputs: incident rate, response time, resolution rate. Phase 1 adds the missing dimension — direct resident feedback on the ambient experience between and around those operations.
Phase 1 uses synthetic control methodology to compare sentiment trends on intervention blocks against similar non-intervention blocks — isolating the effect of specific programs from broader citywide trends.
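A bare-bones version of that comparison: fit weights over non-intervention "donor" blocks so their weighted mix reproduces the treated block's pre-intervention sentiment, then read the post-period gap as the estimated effect. This sketch uses plain least squares on toy data; a full synthetic control implementation would additionally constrain the weights to be non-negative and sum to one.

```python
import numpy as np

def synthetic_control_effect(treated, donors, t0):
    """Average post-intervention gap between a treated block and a
    weighted combination of donor blocks fit on the pre-period.

    treated: (T,) sentiment series for the intervention block
    donors:  (T, J) sentiment series for J comparison blocks
    t0:      index of the first post-intervention period
    """
    # Fit donor weights on pre-intervention periods only.
    w, *_ = np.linalg.lstsq(donors[:t0], treated[:t0], rcond=None)
    synthetic = donors @ w            # counterfactual trajectory
    gap = treated - synthetic
    return gap[t0:].mean(), w

# Toy example: treated block tracks an even mix of two donors, then
# gains +0.3 sentiment after an intervention at period 6.
T, t0 = 10, 6
donors = np.column_stack([np.ones(T), np.linspace(0.0, 1.0, T)])
treated = 0.5 * donors[:, 0] + 0.5 * donors[:, 1]
treated[t0:] += 0.3
effect, w = synthetic_control_effect(treated, donors, t0)  # effect ≈ 0.3
```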
Directional · Pooled Analysis
Precursor Patterns: What to Watch
Cross-layer lead-lag analysis identified consistent directional patterns in commercial corridor neighborhoods. These are not validated predictive models — they are directional signals worth tracking and formalizing in Phase 1.
Environmental → Safety (lag ~1 month): Waste and odor complaints tend to precede increased 911 calls for service in commercial corridors. Directional signal — worth tracking as an early warning.
Economic → Safety (lag ~6 months): Sales tax declines in commercial corridors showed directional correlation with property crime increases at a 6-month lag. Commercial hollowing appears to precede visible safety decline.
Critical caveat: These patterns were detected in pooled cross-neighborhood analysis. They are NOT validated at individual neighborhood level and may not hold for any specific neighborhood. Phase 1 formalizes these with neighborhood-specific baselines.
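The pooled lead-lag check is conceptually simple: correlate the candidate leading series, shifted forward by k periods, against the lagging one and look for a correlation peak at k > 0. A minimal sketch on constructed data (variable names are illustrative, not PSP's pipeline):

```python
import numpy as np

def lagged_correlation(leading, lagging, max_lag=6):
    """Correlation of `leading` at time t against `lagging` at t+k,
    for k = 0..max_lag. A peak at k > 0 is consistent with `leading`
    moving first: directional evidence, not causal proof."""
    corrs = {}
    for k in range(max_lag + 1):
        a = leading if k == 0 else leading[:-k]
        b = lagging if k == 0 else lagging[k:]
        corrs[k] = float(np.corrcoef(a, b)[0, 1])
    return corrs

# Toy series where the "lagging" signal is a 2-period-delayed copy.
base = np.sin(np.linspace(0, 12, 60))
leading = base
lagging = np.concatenate([np.zeros(2), base[:-2]])
corrs = lagged_correlation(leading, lagging, max_lag=4)
best_lag = max(corrs, key=corrs.get)  # peaks at lag 2 for this example
```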
Phase 1 Framework
Four Intervention Domains: What Phase 1 Will Measure
Phase 1 deploys a before/after measurement framework using synthetic control methods to estimate causal effects of specific intervention types. Phase 0 can identify where conditions are anomalous; Phase 1 measures whether an intervention moved perception.
Pop-up markets, public performances, outdoor programming. Proxy: 311 noise complaint pattern shifts (positive noise vs disorder noise distinction). Phase 1: event-level perception surveys at activation points.
Phase 1 design ready
🌿
Smells — Cleanliness Infrastructure
Street cleaning frequency, encampment response, Pit Stop program expansion. Proxy: waste/odor 311 report trends. Phase 0 shows environmental signals precede 911 service calls at a 1-month lag — cleanliness intervention may interrupt the cascade.
Cascade signal directional
📡
Civic Signals — Non-Police Safety
Ambassador programs, crisis intervention teams, CAHOOTS model deployments. These non-police alternative response programs are currently invisible to operational data — their activity isn't logged in systems PSP can access. Phase 1 creates the measurement infrastructure to evaluate what city-funded and philanthropically-supported alternative response programs actually deliver in public experience terms.
Early Warning Patterns: The Value of Proactive Detection
Historical analysis retrospectively identified 118 escalation events across 41 SF neighborhoods between 2003 and 2024. Cities inevitably respond to these events — the question is when and with what precision. The three cards below break down what proactive detection enables at each stage.
01 · What Triggers It
Cross-layer signals — a spike in environmental complaints followed by elevated 911 call volumes, or property crime rising against a neighborhood's own baseline rather than a citywide threshold. The pattern precedes the visible crisis by weeks.
02 · What Data Confirms It
Rate-ratio analysis compares each neighborhood against its own 12-month rolling baseline — not citywide averages. Low-baseline neighborhoods that generate modest raw counts still surface as escalating. 118 events were confirmed across the 22-year dataset.
03 · What Action It Enables
Earlier detection means a wider window to deploy a targeted response before escalation peaks. Pairing the signal with direct resident sentiment enables measurement of whether that response actually changed public experience — turning each event into a validated reference case.
Three illustrative events from the dataset
Visitacion Valley
Fall 2024
HIGH · 11 weeks
Signals Detected
Violent crime ↑1.4× · 911 calls ↑1.3× · weeks 1–11
Both signals crossed threshold simultaneously — longest sustained multi-signal event in dataset
Proactive Detection Enables
📍 Pattern flagged at Week 1–2 — wider response window before escalation peaks
🎯 Specific intervention deployed against emerging signal, not established crisis
💬 Resident sentiment confirms whether the block was experiencing deterioration before conditions data crossed threshold
📊 Response logged against event — builds evidence base for future similar patterns
Russian Hill
Summer 2023
MODERATE · 12 weeks
Signals Detected
Property crime ↑1.3× · 311 calls ↑1.5× · weeks 1–12
Low-baseline neighborhood — raw numbers stay low. Citywide thresholds miss it. Baseline-relative detection surfaces it.
Proactive Detection Enables
📍 Neighborhood-relative threshold surfaces the signal that citywide counts miss entirely
🎯 Response calibrated to neighborhood type and baseline — not applied from a generic citywide protocol
💬 Did residents notice before conditions data flagged it? Sentiment answers this.
📊 12-week event becomes a validated reference case for comparable neighborhoods
Outer Mission
Late 2024
HIGH · 10 weeks
Signals Detected
911 calls ↑1.4× · EMS calls ↑1.3× · weeks 1–10
911 + EMS co-escalation suggests single underlying driver. Early alert has highest value in this co-occurrence pattern.
Proactive Detection Enables
📍 Co-escalation pattern triggers coordinated response at Week 1, before either signal peaks
🎯 Single underlying driver hypothesis informs which agencies to coordinate — not generic multi-agency dispatch
💬 Resident experience tracked through the event window — did the response shorten what residents experienced?
📊 Resolution logged with sentiment delta — refines recommended response for the next co-escalation event
The evidence-building value: Each measured intervention — response logged, sentiment tracked before and after, outcome recorded — adds to a neighborhood-specific evidence base. Over time, that base answers: which response types work for which escalation patterns in which neighborhood contexts? The 118 events identified here represent 118 opportunities to build that knowledge. Phase 1 starts capturing it in real time.
The Vision
Phase 1 builds the infrastructure to directly measure how people experience public space — continuously, at the block level, and connected to the interventions being deployed. Below is the full product roadmap, the data-to-impact pipeline, and an interactive prototype of what the tool looks like in Mid-Market.
What PSP Does
A continuous feedback loop between public experience and the decisions that improve it.
📍
Block-level resolution
Public experience varies dramatically within a single neighborhood — one block can feel safe while the next does not. PSP measures at the block level so Community Benefit Districts (CBDs) and city agencies can see exactly where experience is declining and target resources precisely, not broadly.
📅
Continuous measurement. Operational cadence.
PSP collects sentiment continuously through Net Promoter Score (NPS) touchpoints, behavioral signals, and imagery analysis. The reported signal is aggregated monthly — the cadence at which block-level samples become large enough to be statistically meaningful, and the cadence on which CBDs run their planning cycles. Collection is always on; insight arrives on a rhythm agencies can act on.
🔁
Closed-loop learning
PSP connects interventions to outcomes. When a CBD deploys ambassadors on a block, PSP measures whether public experience on that block improved — compared to similar blocks without the intervention. Each cycle adds to a growing evidence base that trains the recommendation model, so the system gets more accurate about what works as it's used.
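A minimal sketch of the monthly block-level aggregation described above, using the standard NPS convention (promoters score 9–10, detractors 0–6). Column names here are illustrative, not PSP's actual schema.

```python
import pandas as pd

def monthly_block_nps(responses: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw 0-10 survey scores into a monthly block-level NPS.

    Expects columns: block_id, timestamp, score (0-10). NPS is the
    share of promoters (>= 9) minus the share of detractors (<= 6),
    expressed as a -100..100 score.
    """
    df = responses.copy()
    df["month"] = df["timestamp"].dt.to_period("M")
    df["promoter"] = df["score"] >= 9
    df["detractor"] = df["score"] <= 6
    out = df.groupby(["block_id", "month"]).agg(
        n=("score", "size"),
        promoter_share=("promoter", "mean"),
        detractor_share=("detractor", "mean"),
    )
    out["nps"] = 100 * (out["promoter_share"] - out["detractor_share"])
    return out.reset_index()

# Four March responses on one hypothetical Mid-Market block:
data = pd.DataFrame({
    "block_id": ["mkt-01"] * 4,
    "timestamp": pd.to_datetime(["2026-03-02", "2026-03-10",
                                 "2026-03-15", "2026-03-28"]),
    "score": [10, 9, 5, 7],
})
result = monthly_block_nps(data)  # two promoters, one detractor -> NPS 25
```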
Phase 1 Vision Prototype · Illustrative — not live data
Mid-Market CBD — Block-Level Sentiment Map
This is an illustrative prototype of how Public Safety Pulse would operate in the Mid-Market pilot zone. Each circle represents a block-level measurement unit along the Market/Mission corridor. Fill color reflects the current sentiment score for that block; border color signals operational urgency — from critical (purple) to low (green). In the live system, these scores update monthly from direct NPS surveys and integrated civic signal data. The dashed boundary marks the proposed Phase 1 pilot zone.
Click any block to see its simulated sentiment score, conditions, perception gap, and recommended interventions. Use the layer toggle to switch between data dimensions.
Block Detail
Click any block to see sentiment score, safety, conditions, perception gap, and recommended interventions.
PSP turns dispersed public data and direct sentiment signals into actionable insight — then measures whether the response shifted how people experience public space. Each phase feeds the next. The loop gets more precise with every cycle.
01
Sense
Gather public experience signals across the pilot area
📡
Collect
NPS micro-surveys at point-of-sale, transit kiosks, and community touchpoints. Paired with existing streams: 911 calls for service, 311 conditions reports, SFMTA ridership, foot traffic index, business revenue signals, and commercial occupancy rates.
🧩
Integrate
All signals joined spatially to block level and matched by time — so a survey response from Tuesday afternoon is compared to conditions on that specific block that same week.
Google BigQuery
🤖
Analyze
Natural language processing on survey open text. Computer vision on street-level imagery. Behavioral patterns from foot traffic data — all processed to extract sentiment signals at scale.
Gemini AI
→
02
Understand
Track how conditions and felt experience move in relation to each other
📐
Model
Statistical modeling produces a block-level public experience index — combining sentiment responses, conditions scores, and behavioral signals into a single comparable measure across the pilot area.
Vertex AI
🗺️
Map
Scores visualized as a real-time heat map at block level across the pilot area. Foot traffic denominator from Google PDFM ensures high-traffic blocks aren't penalized for more reported incidents.
Maps + PDFM
💡
Identify
Automatic detection of blocks where how people feel diverges sharply from what conditions data shows — these gaps are the highest-value intervention targets and are surfaced as alerts.
→
03
Act
Deploy resources precisely. Measure what changed.
🎯
Intervene
Ranked intervention recommendations matched to each block's typology and gap profile — from ambassador deployment to lighting upgrades to event activation — drawn from evidence of what has worked in comparable contexts.
📊
Measure
Sentiment tracked before and after each intervention using synthetic control methodology — isolating the effect of the specific action from broader citywide trends, so CBDs know what actually moved the needle.
🔄
Improve
Each cycle adds evidence to the model. Interventions that consistently improve sentiment scores get weighted higher in future recommendations. The system gets more accurate as the CBD uses it.
✓ Validated Need
What Phase 1 Unlocks
Capability · Phase 0 (Now) · Phase 1 (Adds)
Sentiment measurement · District level, biennial survey · Block level, monthly via NPS touchpoints
Foot traffic denominator · Proxy: transit + parking · Direct: Google PDFM, 5K–50K obs/month
Update frequency · Monthly conditions, biennial perception · Weekly conditions, monthly sentiment
Intervention attribution · None · Synthetic control: causal effect per intervention type
Alert type · Conditions z-score thresholds · Perception-conditions divergence — the real signal
Street imagery · None · Computer vision on 30K blocks via Gemini Vision
Spatial resolution · Neighborhood level (41 units) · Block level (Mid-Market pilot ~14 blocks)
The Ask · Google.org Impact Challenge
36-month grant · AI for Government Innovation
Phase 1: Mid-Market pilot → Phase 2: multi-corridor SF deployment → Phase 3: replication in a second US city. MIT City Science Lab academic oversight + full open-source methodology publication.
Near-Term Path · Philanthropic Pilot
6-month pilot · Mid-Market corridor
Mid-Market + 2 comparison corridors. Validates the measurement methodology. Produces the first neighborhood-level monthly sentiment dataset. Generates Phase 2 application with real numbers.
Partners
CS
City Science Lab San Francisco
Fiscally sponsored by SPUR · San Francisco, CA
City Science Lab SF applies AI, systems modeling, and civic innovation to real problems in the places people live. The lab's model: bring the rigor of research institutions into the operating rhythm of city government — not as consultants, but as embedded scientific partners.
Public Safety Pulse is a flagship project: it takes a civic problem (the disconnect between resource deployment and resident experience), applies the lab's cross-sector convening capacity to unlock access to city data and agency relationships, and produces a tool that is genuinely useful to the people running the city day-to-day.
"Like SimCity — but for real." The lab builds models that city officials can actually use to test interventions before committing resources.
Civic Innovation · AI + Systems Modeling · City Government · Urban Data · San Francisco
MIT Media Lab City Science
MIT Media Lab · Cambridge, MA · Prof. Kent Larson, Director
MIT City Science is a global research program at the MIT Media Lab studying how cities can be designed and managed to enhance human flourishing. The group created CityScope — augmented tabletop platforms for collaborative urban planning — and the City Science Network, spanning 28 labs worldwide.
Public Safety Pulse builds directly on the intellectual lineage of MIT Place Pulse: a landmark project that generated 1.17M pairwise comparisons of street-level images to understand perceived urban safety, producing the StreetScore algorithm. PSP evolves that approach from static image scoring to dynamic, multi-signal, continuous tracking.
MIT provides peer review rigor, research publication infrastructure, and the credibility of academic independence for all findings and methodology.
CityScope · Place Pulse · Urban Architecture · Mobility on Demand · City Science Network
Data & Methods
Full technical documentation for the Phase 0 analytical pipeline. Intended for researchers, data scientists, and city technical staff reviewing the methodology. All findings are traceable to public datasets.
Panel Fixed Effects Model
The structural break finding uses a Panel Fixed Effects OLS model estimated on supervisor district × survey year observations. This is the primary statistical contribution of Phase 0.
Perception_dt = α_d + β₁(n311_noise_dt) + β₂(n_violent_dt)
+ β₃(n_drug_dt) + β₄(n311_graffiti_dt)
+ β₅(median_income_d) + ε_dt
Where:
d = supervisor district (11 units, district fixed effects)
t = survey year (11 matched waves, 1996–2023)
n = 121 matched observations
Features: raw counts per district-year (not normalized shares)
SE: HC1 heteroskedasticity-consistent (statsmodels)
Validation: Leave-one-year-out (LOYO) cross-validation
Note: Feature normalization (share-based) is a planned improvement for the v0 formal submission.
The structural break test compares coefficient stability across pre-2019 and post-2019 sub-samples using an F-test for equality (Chow-type test with HC1 standard errors).
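The specification and break test above can be sketched in statsmodels. Everything below runs on synthetic data; the wave years, rates, and coefficients are illustrative stand-ins for the matched SF City Survey / DataSF panel, not the project's actual inputs. Note that the median_income_d term is omitted from the estimated equation because district fixed effects absorb any time-invariant district covariate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic district × survey-year panel standing in for the 121 matched
# observations; column names mirror the specification above, but the data
# are illustrative.
rng = np.random.default_rng(42)
years = np.arange(2003, 2024, 2)                    # 11 waves
rows = []
for d in range(11):
    alpha_d = rng.normal(0.0, 0.2)                  # district fixed effect
    for t in years:
        noise, violent, drug, graffiti = rng.poisson([400, 120, 80, 250])
        y = 3.0 + alpha_d - 0.0006 * violent - 0.0004 * drug \
            + rng.normal(0.0, 0.1)
        rows.append(dict(district=d, year=t, perception=y,
                         n311_noise=noise, n_violent=violent,
                         n_drug=drug, n311_graffiti=graffiti))
df = pd.DataFrame(rows)

# District fixed effects absorb time-invariant covariates, so
# median_income_d drops out of the estimated equation.
formula = ("perception ~ n311_noise + n_violent + n_drug + n311_graffiti"
           " + C(district)")
pooled = smf.ols(formula, data=df).fit(cov_type="HC1")

# Chow-type break test at 2019 (classic SSR form shown for brevity; the
# HC1-robust variant requires an interacted single-regression setup).
pre = smf.ols(formula, data=df[df.year < 2019]).fit()
post = smf.ols(formula, data=df[df.year >= 2019]).fit()
k = pooled.df_model + 1                             # params incl. intercept
f_chow = ((pooled.ssr - pre.ssr - post.ssr) / k) / \
         ((pre.ssr + post.ssr) / (len(df) - 2 * k))
print(f"n={int(pooled.nobs)}  R²={pooled.rsquared:.3f}  Chow F={f_chow:.2f}")
```

The same pooled fit supplies the HC1 coefficient table; the Chow F-statistic tests whether the pre- and post-2019 coefficient vectors can plausibly be equal.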
Period | RMSE | R² | Notes
Full period (1996–2023) | 0.146 | 0.824 | 121 observations, in-sample
LOYO cross-validation (all years) | 0.252 | — | Honest out-of-sample estimate
Pre-2019 LOYO | 0.13 | — | Conditions reliably predicted perception
Post-2019 (held out) | 0.61 | — | 4.7× error increase — structural break
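The LOYO estimates follow the standard pattern: hold out each survey wave in turn, fit on the remaining waves, and pool the squared prediction errors. A minimal sketch on synthetic data (single illustrative regressor, not the project's feature set):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the 121-observation district × wave panel.
rng = np.random.default_rng(3)
years = np.arange(2003, 2024, 2)
df = pd.DataFrame({"district": np.repeat(np.arange(11), len(years)),
                   "year": np.tile(years, 11)})
df["n_violent"] = rng.poisson(120, len(df))
df["perception"] = 3.0 - 0.004 * df.n_violent + rng.normal(0, 0.1, len(df))

# Leave-one-year-out: each wave is predicted by a model that never saw it.
sq_errs = []
for held_out in years:
    train, test = df[df.year != held_out], df[df.year == held_out]
    fit = smf.ols("perception ~ n_violent + C(district)", data=train).fit()
    sq_errs.append((test.perception - fit.predict(test)) ** 2)
loyo_rmse = float(np.sqrt(pd.concat(sq_errs).mean()))
print(f"LOYO RMSE: {loyo_rmse:.3f}")
```

Restricting the loop to pre-2019 waves, or holding out only post-2019 waves, reproduces the sub-sample comparison used for the structural break finding.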
Z-Score Framework
Neighborhood conditions are expressed as z-scores measuring deviation from each neighborhood's own historical baseline. Baselines are computed using a regime-aware approach: separate means and standard deviations for pre-COVID (pre-2020), lockdown (2020), recovery (2020–2022), and post-COVID (2022+) periods.
A score of +1.5 means the metric is 1.5 standard deviations above that neighborhood's typical level for that period — not above the city average. This is intentional: it measures deviation from a neighborhood's own pattern, not comparison to other neighborhoods.
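The regime-aware baseline reduces to a grouped transform: score each neighborhood-month against that neighborhood's own mean and standard deviation within its regime. A minimal pandas sketch; the exact cutover dates below are assumptions, since the text gives only year-level boundaries.

```python
import numpy as np
import pandas as pd

def regime(ts):
    # Assumed regime boundaries (the text specifies pre-COVID, lockdown,
    # recovery, post-COVID only at year granularity).
    if ts < pd.Timestamp("2020-03-01"):
        return "pre_covid"
    if ts < pd.Timestamp("2020-07-01"):
        return "lockdown"
    if ts < pd.Timestamp("2022-01-01"):
        return "recovery"
    return "post_covid"

# Illustrative monthly metric for two neighborhoods.
rng = np.random.default_rng(1)
months = pd.date_range("2017-01-01", "2024-12-01", freq="MS")
df = pd.DataFrame({
    "month": np.tile(months, 2),
    "neighborhood": np.repeat(["Mid-Market", "Mission"], len(months)),
    "value": rng.normal(100, 10, 2 * len(months)),
})
df["regime"] = df["month"].map(regime)

# Deviation from the neighborhood's own baseline, not the city average.
g = df.groupby(["neighborhood", "regime"])["value"]
df["z"] = (df["value"] - g.transform("mean")) / g.transform("std")
```

Because the mean and standard deviation are computed per neighborhood-regime group, a +1.5 in one neighborhood is +1.5 relative to that neighborhood's own typical level for that period.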
Escalation Detection
Escalation events are detected using rate ratios against 12-month rolling baselines, following CDC EARS-style aberration detection methodology adapted for urban safety signals.
Rate ratio = observed_30day / expected_30day
Expected = 12-month rolling mean per neighborhood per signal
Escalation threshold: ≥1.3× for ≥2 consecutive weeks
on ≥2 of 3 signal types simultaneously
(Priority A dispatch, violent crime, EMS emergency)
Requiring simultaneous elevation across ≥2 signal types serves as a practical false-positive filter across ~4,400 implicit tests. This methodology should be reviewed by a biostatistician for Phase 1 formalization.
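The detection rule above can be sketched in a few lines of pandas. The weekly counts, baseline rates, and injected escalation below are synthetic; signal column names are placeholders for the real CAD, incident, and EMS feeds.

```python
import numpy as np
import pandas as pd

# Synthetic weekly counts per signal for one neighborhood.
rng = np.random.default_rng(7)
weeks = pd.date_range("2023-01-02", periods=80, freq="W-MON")
signals = ["priority_a_dispatch", "violent_crime", "ems_emergency"]
df = pd.DataFrame({s: rng.poisson(50, len(weeks)) for s in signals},
                  index=weeks)
df.iloc[-4:] = (df.iloc[-4:] * 1.6).astype(int)     # inject an escalation

# Rate ratio: trailing 30-day observed vs a 12-month rolling expectation.
# min_periods=52 requires a full year of history before any ratio is scored.
observed = df.rolling("30D").sum()
expected = df.rolling("365D", min_periods=52).sum() / 365 * 30
ratio = observed / expected

# Alert rule: ≥1.3× for ≥2 consecutive weeks on ≥2 of 3 signal types.
elevated = ratio >= 1.3
persistent = elevated & elevated.shift(1, fill_value=False)
alert = persistent.sum(axis=1) >= 2
print(alert[alert].index.tolist())                  # weeks that fire
```

The persistence requirement (two consecutive weeks) and the cross-signal requirement (two of three) are exactly the two filters that keep the implicit multiple-testing burden from flooding operators with one-off spikes.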
Data Sources
All data publicly available via DataSF (data.sfgov.org), BART, US Census Bureau, NOAA, and open APIs. No proprietary data in Phase 0. The current z-score framework integrates 25 metrics across 4 conditions layers. An additional 20+ datasets are on disk and identified for Phase 1 integration.
Layer | Source | Records | Coverage
Safety Incidents | SFPD Incidents (DataSF), SFPD Historical | 3.1M | 2003–2026
Safety Incidents | CAD Dispatch (DataSF) | 2.39M | 2016–2026
Safety Incidents | Fire/EMS Calls (DataSF) | 1.49M | 2003–2026
Safety Incidents | Traffic Crashes (DataSF), SWITRS | 14.6K intersections | 2006–2026
Street Conditions | 311 Service Requests (DataSF) | 8.4M | 2008–2026
Street Conditions | Tent/Structure Counts (DataSF) | 1.7K obs. | 2017–2025
Street Conditions | Streetlight Outages (DataSF) | 4.9K | 2018–2026
Foot Traffic & Transit | SFMTA Ridership (DataSF, GTFS) | 514 routes | 2012–2026
Foot Traffic & Transit | BART Station Exits | 8 stations | 2015–2026
Foot Traffic & Transit | Bay Wheels Trip Data | Trip-level | 2017–2026
Foot Traffic & Transit | SFPark Meter Occupancy | 28K meters | 2019–2026
Economic Vitality | Business Register (DataSF) | 200K+ businesses | 2003–2026
Economic Vitality | CDTFA Sales Tax by Neighborhood | Quarterly | 2015–2024
Economic Vitality | Commercial Vacancy Filings (DataSF) | 19K | 2022–2024
Economic Vitality | Building Permits (DataSF) | 50K+ | 2013–2026
Public Sentiment | SF City Survey (DataSF) | 42.7K responses | 1996–2023
Public Sentiment | CityBeat (EMC Research) | 500/wave | 2023–2026
Public Sentiment | Social/news sentiment (18 sources) | 10.8K records | 2025–2026
Context | US Census ACS (Census Bureau) | 244 tracts | 2020
Known Limitations
Monthly temporal resolution limits the analysis to within-month patterns. Sub-monthly dynamics (daily/weekly rhythms, time-of-day effects) are not detectable. Phase 1 deploys event-level data where possible.
Perception data is available only at supervisor district level (11 units) and biennial frequency. All neighborhood-level perception values in this dashboard are directional estimates from conditions data — not directly measured.
Cross-layer correlation findings (environmental → safety cascade) were detected in pooled analysis across all neighborhoods. They have NOT been validated at individual neighborhood level. They may not hold for any specific neighborhood and should not be used for operational decision-making without Phase 1 neighborhood-specific validation.
ACS (US Census) commute and population data is currently used as static neighborhood context only (population, income, renter share). Annual publication lag and 1–2 year data delay make it unsuitable for integration into monthly z-score composites. A planned Phase 1 enhancement pairs ACS residential population shifts with building permit and commercial vacancy data to capture neighborhood composition changes — but this requires Google PDFM foot traffic as a dynamic denominator to be meaningful. Not implemented in Phase 0.
The Z-score framework measures deviation from a neighborhood's own historical pattern. It is not a comparison between neighborhoods. A score of 60 in the Tenderloin and a score of 60 in Pacific Heights mean something structurally different in absolute terms.
Activity denominator: Cross-neighborhood comparisons of raw counts require normalization by true foot traffic, which is not available in public data. Phase 1's Google PDFM integration solves this.
The active_business_count metric shows a systematic artifact (uniformly elevated across all neighborhoods in recent data pulls), likely reflecting a data-refresh methodology change in the Business Register. It is excluded from composite scoring pending investigation.