Points of Interest

How to Work with POI Data

POI stands for Points of Interest. Most of us have encountered POI data at some point when assessing the “intensity” of an area — which neighborhoods are vibrant and which feel inactive. However, outside of serious academic research, analysis often stops at basic metrics like functional diversity or density.

Let’s go deeper.

A more nuanced understanding of POI metrics significantly improves urban environment analysis and is especially valuable for B2G analytical projects.

Key Metrics for Describing POI Structure

1) POI Density / POI Count

A measure of overall functional intensity — how many services and facilities are available within a given area.
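Below is a minimal sketch of POI count and density on a square grid. It assumes the POIs are already in a projected coordinate system measured in metres (for example, a local UTM zone). The 500 m cell size is a hypothetical choice.

```python
import math
from collections import Counter


def grid_cell(x: float, y: float, cell_size: float) -> tuple:
    """Map projected coordinates (metres) to an integer (col, row) grid index."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def poi_density(points, cell_size: float = 500.0) -> dict:
    """Return {cell: POIs per km²} for every cell that contains at least one POI."""
    counts = Counter(grid_cell(x, y, cell_size) for x, y in points)
    cell_area_km2 = (cell_size / 1000.0) ** 2
    return {cell: n / cell_area_km2 for cell, n in counts.items()}


pois = [(120.0, 80.0), (310.0, 450.0), (490.0, 10.0), (760.0, 230.0)]
print(poi_density(pois))
```

Cells without POIs do not appear in the result. In a full analysis grid they should be added explicitly with a density of zero.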

2) Category Counts / Category Shares (cat__*)

Defines the functional profile of a territory — which types of functions are present and in what proportions.
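One way to build count and share columns for a single cell is sketched below. The exact column names are hypothetical and simply follow the cat__* prefix.

```python
from collections import Counter


def category_features(categories: list) -> dict:
    """Turn one cell's POI category labels into cat__* count and share columns."""
    counts = Counter(categories)
    total = sum(counts.values())
    features = {}
    for cat, n in sorted(counts.items()):
        features[f"cat__{cat}"] = n                 # absolute count
        features[f"cat__{cat}__share"] = n / total  # share of the cell's POIs
    return features


print(category_features(["food", "food", "food", "retail"]))
```

For a modeling table, categories absent from a cell should be filled with zeros so that every cell has the same columns.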

3) Richness

The number of distinct functional categories, ignoring both their balance and their volume. A grid cell with 10 categories of 1 POI each, a cell with 10 categories of 50 POIs each, and a cell where one of 10 categories accounts for 90% of all POIs all have the same richness: 10.
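The following is a quick check of the richness example. It adds a skewed cell where one category holds 90% of POIs. Each cell is assumed to be a plain list of category labels, and the labels cat0 to cat9 are placeholders.

```python
def richness(categories) -> int:
    """Number of distinct POI categories in a cell."""
    return len(set(categories))


labels = [f"cat{i}" for i in range(10)]
cell_sparse = labels                       # 10 categories x 1 POI
cell_dense = labels * 50                   # 10 categories x 50 POIs
cell_skewed = ["cat0"] * 81 + labels[1:]   # cat0 holds 81 of 90 POIs (90%)
print(richness(cell_sparse), richness(cell_dense), richness(cell_skewed))
```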

4) Shannon Entropy

The most common mixed-use metric. It captures both the number of categories and their balance. However, entropy values (e.g., 0.9 vs. 1.3) are abstract. Without context, they’re difficult to interpret, compare across areas, or explain to non-specialists.
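A minimal sketch of Shannon entropy over a cell's category shares, H = −Σ pᵢ ln pᵢ, follows. It uses the natural logarithm. Some studies use log base 2, which rescales the values and is one more reason raw entropies are hard to compare. The sample cells are hypothetical: one with 70% Food and 10% each of Transport, Retail and Services, and one with four equal categories.

```python
import math
from collections import Counter


def shannon_entropy(categories) -> float:
    """H = -sum(p_i * ln p_i) over the category shares p_i of one cell."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


skewed = ["food"] * 7 + ["transport", "retail", "services"]   # 70/10/10/10
balanced = ["food", "transport", "retail", "services"]        # 25% each
print(round(shannon_entropy(skewed), 2), round(shannon_entropy(balanced), 2))
```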

5) Evenness (Pielou’s Evenness)

A normalized measure of balance: Shannon entropy divided by its maximum possible value for the observed richness, ln(S) for S categories. It ranges from 0 to 1, allowing fair comparisons between areas with different numbers of categories.
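Pielou's evenness, J = H / ln(S), can be sketched as follows. It is undefined when a cell has a single category (0/0). Returning NaN in that case is a convention chosen here, not a standard.

```python
import math
from collections import Counter


def pielou_evenness(categories) -> float:
    """J = H / ln(S): Shannon entropy divided by its maximum for S categories."""
    counts = Counter(categories)
    s = len(counts)
    if s < 2:
        return float("nan")  # undefined when only one category is present
    total = sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(s)


skewed = ["food"] * 7 + ["transport", "retail", "services"]
balanced = ["food", "transport", "retail", "services"]
print(round(pielou_evenness(skewed), 2), round(pielou_evenness(balanced), 2))
```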

6) Simpson / Gini–Simpson Index

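The Simpson index is the probability that two randomly selected POIs (drawn with replacement) belong to the same category. The Gini–Simpson index (1 minus Simpson) is the probability that they belong to different categories. Both are more sensitive to dominant categories and less sensitive to rare ones.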
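Below is a sketch of both indices using the with-replacement form, λ = Σ pᵢ². For very small cells, a without-replacement variant, Σ n(n−1) / (N(N−1)), is sometimes preferred. In the hypothetical 70/10/10/10 cell, the squared 70% share alone contributes 0.49 of λ = 0.52, which shows why this family reacts strongly to dominant categories.

```python
from collections import Counter


def simpson_index(categories) -> float:
    """lambda = sum(p_i^2): chance that two POIs drawn with replacement share a category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def gini_simpson(categories) -> float:
    """1 - lambda: chance that the two POIs belong to different categories."""
    return 1.0 - simpson_index(categories)


skewed = ["food"] * 7 + ["transport", "retail", "services"]
print(round(simpson_index(skewed), 2), round(gini_simpson(skewed), 2))
```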

7) Hill Numbers (q = 1, q = 2)

A more interpretable alternative to entropy-based metrics.

Hill numbers answer the question:

“This level of diversity is equivalent to how many equally represented categories?”

Example:

Suppose a grid cell contains:

70% Food
10% Transport
10% Retail
10% Services

Richness = 4
Shannon ≈ 0.94 (natural log; the maximum for 4 categories is ln 4 ≈ 1.39)
But Hill q=1 ≈ 2.6 (and Hill q=2 ≈ 1.9) indicates that this area behaves like a place with only two to three equally represented categories.

The other categories exist — but their contribution is marginal. This makes Hill numbers far easier to communicate in policy or business contexts.
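The standard Hill-number formula is D_q = (Σ pᵢ^q)^(1/(1−q)). Here q = 0 gives richness, the limit q → 1 gives exp(Shannon entropy), and q = 2 gives 1 / Simpson. A minimal sketch applied to the 70/10/10/10 cell follows. The function name is ours, not a library API.

```python
import math
from collections import Counter


def hill_number(categories, q: float) -> float:
    """Effective number of equally common categories of order q."""
    counts = Counter(categories)
    total = sum(counts.values())
    shares = [n / total for n in counts.values()]
    if q == 1:
        # The q -> 1 limit is exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in shares))
    return sum(p ** q for p in shares) ** (1.0 / (1.0 - q))


cell = ["food"] * 7 + ["transport", "retail", "services"]  # 70/10/10/10
for q in (0, 1, 2):
    print(f"q={q}: {hill_number(cell, q):.2f}")
```

For this cell, the sketch gives 4.00, 2.56 and 1.92 for q = 0, 1 and 2. The higher the order, the more the dominant Food category pulls the effective number of categories down.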

8) Dominance / Top-Category Share

Simply the share (%) of the leading category. Used to assess mono-functionality.

9) HHI / Concentration Indices

These measure concentration, not diversity. If each POI category is treated as a “market player,” the Herfindahl–Hirschman Index (HHI), the sum of squared category shares, indicates how monopolized the functional structure is. Unlike dominance (which only considers the leader), HHI accounts for the shares of all categories.

Example:

Case A:

60% Food
40% Retail
Dominance = 60%
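HHI = 0.6² + 0.4² = 0.52 (two strong categories: a functional “duopoly”)

Case B:

60% Food
10% + 10% + 10% + 10%
Dominance = 60%
HHI = 0.6² + 4 × 0.1² = 0.40 (the non-leading share is fragmented across four minor functions)

Dominance cannot distinguish these cases, but HHI can. Case A is the more concentrated structure, because everything outside the leader sits in a single strong category.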
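Computing both indicators for Cases A and B shows that dominance is identical (0.60). HHI, however, is higher for Case A (0.52) than for Case B (0.40), because a single 40% runner-up concentrates the structure more than four 10% categories do. Computed on category shares, HHI is the same quantity as the Simpson index, and 1/HHI equals Hill q=2. In market analysis, HHI is often reported on a 0 to 10,000 scale using percentages. This sketch uses fractions instead.

```python
def dominance(shares) -> float:
    """Share of the leading category."""
    return max(shares)


def hhi(shares) -> float:
    """Herfindahl–Hirschman Index on a 0-1 scale: sum of squared shares."""
    return sum(p * p for p in shares)


cases = {"A": [0.6, 0.4], "B": [0.6, 0.1, 0.1, 0.1, 0.1]}
for name, shares in cases.items():
    h = hhi(shares)
    print(f"Case {name}: dominance={dominance(shares):.2f}, HHI={h:.2f}, 1/HHI={1 / h:.2f}")
```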

10) Kernel Density Estimation (KDE)

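Smoothed spatial density that accounts for neighborhood effects. Each POI spreads its weight over nearby locations with a kernel that decays with distance, so the result depends far less on where grid-cell boundaries happen to fall. Often used as a preprocessing step before functional zoning or clustering.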
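A pure-Python sketch of Gaussian-kernel POI intensity evaluated at cell centres follows. It assumes projected coordinates in metres and a hypothetical 200 m bandwidth, and it runs in O(POIs × locations). `scipy.stats.gaussian_kde` is an alternative, but it returns a probability density that integrates to 1 and picks its bandwidth with Scott's rule by default. Its output would need to be multiplied by the number of POIs to match this intensity.

```python
import math


def kde_intensity(pois, locations, bandwidth: float):
    """Gaussian-kernel POI intensity (expected POIs per unit area) at each location.

    Each POI spreads one unit of mass as an isotropic 2D Gaussian with standard
    deviation `bandwidth`, so the surface integrates to the number of POIs.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    result = []
    for lx, ly in locations:
        total = 0.0
        for px, py in pois:
            d2 = (lx - px) ** 2 + (ly - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        result.append(norm * total)
    return result


pois = [(0.0, 0.0), (100.0, 0.0), (50.0, 80.0)]   # projected metres
cell_centres = [(50.0, 30.0), (2000.0, 2000.0)]
density = kde_intensity(pois, cell_centres, bandwidth=200.0)
print([f"{d * 1e6:.2f} POIs/km²" for d in density])
```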

11) TF–IDF Weights for POI Categories

Identifies signature functions that distinguish an area from others. This metric highlights categories that are not merely frequent, but disproportionately characteristic of a zone.
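The sketch below treats each grid cell as a “document” and each POI category as a “word.” It uses one common, unsmoothed TF–IDF variant: tf is the category's share in the cell and idf is ln(cells / cells containing the category). Other variants exist; scikit-learn's `TfidfTransformer`, for instance, smooths idf by default. The three cells are hypothetical.

```python
import math
from collections import Counter


def poi_tfidf(cells: dict) -> dict:
    """TF-IDF weight of each category in each cell.

    tf  = share of the category among the cell's POIs
    idf = ln(number of cells / number of cells containing the category)
    """
    n_cells = len(cells)
    doc_freq = Counter()
    for cats in cells.values():
        doc_freq.update(set(cats))
    weights = {}
    for cell_id, cats in cells.items():
        counts = Counter(cats)
        total = sum(counts.values())
        weights[cell_id] = {
            cat: (n / total) * math.log(n_cells / doc_freq[cat])
            for cat, n in counts.items()
        }
    return weights


cells = {
    "A": ["food", "food", "retail", "museum"],
    "B": ["food", "retail", "retail"],
    "C": ["food", "food", "services"],
}
for cell_id, w in poi_tfidf(cells).items():
    signature = max(w, key=w.get)
    print(cell_id, signature, round(w[signature], 3))
```

Food is the most frequent category, yet it gets a weight of zero everywhere because every cell has it. This is exactly the “frequent but not characteristic” case the metric is meant to filter out.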

12) POI-based composite indices (UDI, vibrancy, livability)

There is a growing trend toward standardized composite indicators that integrate density and diversity for inter-territorial comparisons.
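To show the mechanics only, here is a hypothetical density and diversity composite: min–max rescaling of each input followed by a weighted mean with an assumed 50/50 weight. It is not UDI or any published index. Rescaling within the sample makes scores relative to the study area, so comparisons between cities would require fixed reference bounds instead.

```python
def min_max(values):
    """Rescale to [0, 1] within the sample; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(density, diversity, w_density: float = 0.5):
    """Hypothetical composite: weighted mean of rescaled density and diversity."""
    d = min_max(density)
    h = min_max(diversity)
    return [w_density * a + (1.0 - w_density) * b for a, b in zip(d, h)]


density = [12.0, 4.0, 30.0]     # POIs per km²
shannon = [0.94, 1.39, 1.10]    # Shannon entropy per cell
print([round(v, 3) for v in composite_index(density, shannon)])
```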

How to Interpret Metrics Together

1) High Richness + Low Evenness

Formally diverse, practically specialized. Many categories exist, but one dominates.

Typical for:

Tourist centers
Railway stations
Retail-entertainment clusters

2) High Shannon + High Hill q1

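Balanced mixed-use environment. Functions are distributed without a clear leader. Since Hill q1 = exp(Shannon), the two indicators always rise together; Hill q1 simply restates the same signal as an effective number of categories.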

Typical for:

Mixed-use neighborhoods
Transit-Oriented Development (TOD) areas
“15-minute city” environments

3) Low Hill q2 + High Top-Category Share

Strong specialization. Many POIs, but mostly of one type.

Typical for:

University campuses
Hospital clusters
Airports
Industrial zones
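The three patterns can be turned into a rule-based labeling function. Every threshold below is a hypothetical placeholder to be calibrated on the city at hand. The rules are checked in order because patterns 1 and 3 can overlap.

```python
def profile_label(richness, evenness, hill_q1, hill_q2, top_share,
                  rich_min=8, even_low=0.5, even_high=0.8,
                  hill_q1_min=4.0, hill_q2_max=2.0, top_min=0.5):
    """Map a cell's metrics to one of the three patterns (thresholds are hypothetical)."""
    if richness >= rich_min and evenness < even_low:
        return "formally diverse, practically specialized"
    if hill_q1 >= hill_q1_min and evenness >= even_high:
        return "balanced mixed-use"
    if hill_q2 < hill_q2_max and top_share > top_min:
        return "strong specialization"
    return "intermediate"


print(profile_label(12, 0.44, 3.0, 1.8, 0.70))  # many categories, one dominates
print(profile_label(8, 0.90, 6.5, 5.0, 0.20))   # balanced mix
print(profile_label(3, 0.63, 2.0, 1.5, 0.75))   # few categories, one dominates
```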

POI Data Sources

1) OpenStreetMap (OSM)

Example query:

[out:json][timeout:120];
area["name:ru"="Москва"]->.searchArea;
(
  node["amenity"="hospital"](area.searchArea);
  way["amenity"="hospital"](area.searchArea);
  relation["amenity"="hospital"](area.searchArea);
);
out center tags;
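A query like the one above can be sent to an Overpass server and flattened into point records with the standard library alone. This sketch assumes the public endpoint https://overpass-api.de/api/interpreter, which accepts the query in a form field named `data` and applies usage limits. With `out center`, ways and relations carry a `center` object instead of their own lat/lon. The sample payload is made up so the example runs offline.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public instance, rate-limited


def fetch_overpass(query: str, timeout: int = 180) -> dict:
    """POST an Overpass QL query (form field `data`) and parse the JSON reply."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=timeout) as resp:
        return json.load(resp)


def elements_to_points(payload: dict) -> list:
    """Flatten Overpass elements into point records.

    Nodes carry lat/lon directly; ways and relations carry a `center`
    object because the query ends with `out center`.
    """
    points = []
    for el in payload.get("elements", []):
        if el["type"] == "node":
            lat, lon = el.get("lat"), el.get("lon")
        else:
            center = el.get("center", {})
            lat, lon = center.get("lat"), center.get("lon")
        if lat is None or lon is None:
            continue  # skip elements without usable coordinates
        tags = el.get("tags", {})
        points.append({"osm_type": el["type"], "osm_id": el["id"],
                       "lat": lat, "lon": lon, "name": tags.get("name")})
    return points


# Offline example shaped like an Overpass JSON reply (values are made up).
sample = {"elements": [
    {"type": "node", "id": 1, "lat": 55.75, "lon": 37.62,
     "tags": {"amenity": "hospital", "name": "Hospital A"}},
    {"type": "way", "id": 2, "center": {"lat": 55.70, "lon": 37.50},
     "tags": {"amenity": "hospital"}},
]}
print(elements_to_points(sample))
```

Calling `fetch_overpass(query)` with the query text above would perform the network request. It is deliberately not called here.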

2) Google Places API

3) Foursquare

Personally, I prefer OSM. Commercial APIs tend to overrepresent business entities, and historical archives are easier to extract from OSM. If there’s interest, I can share parsers for these three sources on GitHub or integrate them into a product.

What Can Be Forecasted?

POI dynamics are increasingly forecasted in two main directions:

Functional zoning evolution
Urban activity / attraction shifts

1) Category Growth or Decline (Grid-Level)

Objective: predict whether the number of POIs in a category (e.g., Food & Beverage) will increase or decrease within 1–3 years.

Formulated as:

Count regression
Or classification (growth / stable / decline)
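For the classification formulation, the target can be derived from counts at two dates, as sketched below. The ±10% band for “stable” is a hypothetical choice. For the count-regression formulation, the target would simply be the later count, often modeled with a Poisson-type loss; XGBoost, for example, offers the `count:poisson` objective.

```python
def growth_label(count_before: int, count_after: int, threshold: float = 0.10) -> str:
    """Three-class target for one category in one cell over the forecast horizon.

    A relative change above +threshold is "growth", below -threshold is
    "decline", anything in between is "stable".
    """
    if count_before == 0:
        return "growth" if count_after > 0 else "stable"
    change = (count_after - count_before) / count_before
    if change > threshold:
        return "growth"
    if change < -threshold:
        return "decline"
    return "stable"


# Food & Beverage POIs per cell at year t and t+2 (hypothetical numbers).
pairs = {"cell_1": (10, 14), "cell_2": (10, 10), "cell_3": (10, 7), "cell_4": (0, 2)}
print({cell: growth_label(a, b) for cell, (a, b) in pairs.items()})
```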

2) Functional Type Transition

Objective: predict whether a grid cell will change its dominant function (e.g., residential → mixed-use or education → mixed-use).

This approach is highly popular in research on functional urban areas, where researchers first classify zones based on their POI structure and then forecast their evolution — often using CA–Markov models. For example: Exploring the Predictive Ability of the CA–Markov Model for Urban Functional Areas in Nanjing Old City (2024).
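A minimal sketch of the first step follows: classify each cell's functional type at two dates and count the observed transitions. It uses a simple hypothetical rule: a cell takes its leading category's name if that category holds at least 50% of POIs, and is “mixed-use” otherwise. Published studies typically classify with clustering or more detailed criteria.

```python
from collections import Counter


def functional_type(counts: dict, threshold: float = 0.5) -> str:
    """Dominant category if it holds at least `threshold` of POIs, else mixed-use."""
    total = sum(counts.values())
    top = max(sorted(counts), key=counts.__getitem__)  # ties -> alphabetical first
    return top if counts[top] / total >= threshold else "mixed-use"


def type_transitions(cells_t0: dict, cells_t1: dict) -> Counter:
    """Count (type at t0, type at t1) pairs for cells observed in both periods."""
    pairs = Counter()
    for cell, counts in cells_t0.items():
        if cell in cells_t1:
            pairs[(functional_type(counts), functional_type(cells_t1[cell]))] += 1
    return pairs


t0 = {"c1": {"education": 8, "food": 2}, "c2": {"residential": 5, "food": 1}}
t1 = {"c1": {"education": 8, "food": 6, "retail": 4}, "c2": {"residential": 6, "food": 2}}
print(type_transitions(t0, t1))
```

In this made-up example, cell c1 shifts from education to mixed-use, while c2 stays residential.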

3) Diversification / Specialization Forecasting

Objective: predict whether a grid cell becomes:

More mixed-use (↑ Shannon / ↑ Hill q1)
More specialized (↑ dominance / ↑ HHI)

Often implemented via:

Time-series forecasting of diversity metrics
Or prediction of functional profile clusters
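The simplest form of time-series forecasting for a diversity metric is a linear trend fitted by least squares and extrapolated forward, as sketched below. It is a baseline, not a recommended model. The direction rule assumes a metric where “up” means more mixing (Shannon or Hill); for dominance or HHI the interpretation flips. The yearly values and the tolerance are hypothetical.

```python
def linear_trend(values: list) -> tuple:
    """Ordinary least-squares slope and intercept of values against 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast_metric(values: list, steps_ahead: int = 1) -> float:
    """Extrapolate the fitted line `steps_ahead` periods past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


def direction(values: list, tol: float = 0.01) -> str:
    """Classify the trend of a diversity metric (Shannon or Hill)."""
    slope, _ = linear_trend(values)
    if slope > tol:
        return "diversifying"
    if slope < -tol:
        return "specializing"
    return "stable"


shannon_by_year = [1.10, 1.15, 1.22, 1.26]
print(round(forecast_metric(shannon_by_year), 3), direction(shannon_by_year))
```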

4) Emergence of Rare Functions

Objective: predict whether new categories (e.g., Culture, University, Hospital) will appear.

Approaches:

Binary classification (appear / not appear)
Survival analysis (time-to-event), if long time series exist
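For the binary formulation, only cells where a category is currently absent are “at risk” of gaining it. The sketch below marks already-present categories as None so they can be excluded from training. The watched categories are hypothetical. For survival analysis, the equivalent record would hold the time until first appearance, with cells that never gain the category treated as censored.

```python
def emergence_labels(cats_before: set, cats_after: set, watched: list) -> dict:
    """1 = absent before and present after, 0 = still absent,
    None = already present before (not at risk, so excluded from training)."""
    labels = {}
    for cat in watched:
        if cat in cats_before:
            labels[cat] = None
        else:
            labels[cat] = int(cat in cats_after)
    return labels


before = {"food", "retail"}
after = {"food", "retail", "culture"}
print(emergence_labels(before, after, ["culture", "hospital", "food"]))
```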

Features Typically Used

1) The POI-based metrics described above (density, category shares, richness/Shannon/Hill indices, dominance/HHI), along with lagged features — i.e., historical values of these metrics (t−1, t−2, etc.).
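Lagged features can be built from each cell's metric history as sketched below. Periods without the full lag history are dropped rather than imputed, which is one design choice among several. The column naming is hypothetical.

```python
def lagged_rows(series: dict, name: str, lags=(1, 2)) -> dict:
    """Build {period: {name, name__lagK, ...}} rows from one cell's metric history.

    Periods without the full lag history are skipped rather than imputed.
    """
    rows = {}
    for t in sorted(series):
        if all(t - k in series for k in lags):
            row = {name: series[t]}
            for k in lags:
                row[f"{name}__lag{k}"] = series[t - k]
            rows[t] = row
    return rows


shannon_by_year = {2019: 1.10, 2020: 1.15, 2021: 1.22, 2022: 1.26}
for year, row in lagged_rows(shannon_by_year, "shannon").items():
    print(year, row)
```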

2) Inbound trips / attractor strength (e.g., derived from GPS traces) as a proxy for demand across different functions, plus origin–destination (OD) flows and distance-decay profiles as indicators of the “accessible market.”
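One common way to turn a distance-decay profile into an “accessible market” feature is a gravity-type sum, shown below: Σⱼ weightⱼ · exp(−β · dⱼ). Several assumptions apply. The weights could be inbound trips or POI counts at each destination. Straight-line distance stands in for network distance. The exponential form is one choice among several (power functions are also used). β = 0.002 per metre is hypothetical and halves the weight roughly every 350 m.

```python
import math


def accessible_market(origin: tuple, opportunities: list, beta: float = 0.002) -> float:
    """Gravity-type accessibility: sum of weight_j * exp(-beta * d_j).

    `opportunities` is a list of ((x, y), weight) pairs in projected metres.
    """
    ox, oy = origin
    return sum(w * math.exp(-beta * math.hypot(x - ox, y - oy))
               for (x, y), w in opportunities)


opps = [((0.0, 0.0), 100.0), ((500.0, 0.0), 200.0), ((0.0, 2000.0), 1000.0)]
print(round(accessible_market((0.0, 0.0), opps), 1))
```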

3) Network-based distances (i.e., along the street network) to centers, transit stops, arterial roads, and services, as well as accessibility indicators to POIs.
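Network distance is a shortest-path length over the street graph. Below is a self-contained Dijkstra sketch on a hypothetical graph whose edge weights are segment lengths in metres. On a real OSM street network, `networkx.single_source_dijkstra_path_length(G, source, weight="length")` computes the same quantity.

```python
import heapq
import math


def build_graph(edges):
    """Undirected adjacency list {node: [(neighbor, length_m), ...]}."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


def network_distances(graph: dict, source) -> dict:
    """Dijkstra shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


# Hypothetical street segments (metres) between a cell centroid and a transit stop.
edges = [("centroid", "a", 120), ("a", "stop", 300), ("centroid", "b", 200), ("b", "stop", 150)]
print(network_distances(build_graph(edges), "centroid")["stop"])
```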

4) Features of neighboring grid cells: average surrounding density/diversification, presence of “cores,” and graph-based features (grid adjacency or road-network connectivity) — particularly in neural network models.
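Below is a minimal sketch of neighbor-cell features on a square grid: the mean over the 8 surrounding cells, and whether any of them is a “core” above a threshold. Two things are assumptions. Cells missing from the grid are treated as empty, which underestimates values at the edge of the study area, and the core threshold is hypothetical.

```python
def neighbor_features(grid: dict, cell: tuple, core_threshold: float) -> dict:
    """Mean value over the 8 surrounding cells and whether any of them is a "core".

    Cells missing from `grid` are treated as 0 (no POIs).
    """
    x, y = cell
    values = [grid.get((x + dx, y + dy), 0.0)
              for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return {"nbr_mean": sum(values) / len(values),
            "nbr_has_core": any(v >= core_threshold for v in values)}


density = {(0, 0): 12.0, (1, 0): 4.0, (0, 1): 8.0, (5, 5): 40.0}
print(neighbor_features(density, (0, 0), core_threshold=30.0))
print(neighbor_features(density, (4, 4), core_threshold=30.0))
```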

Modeling Approaches

I. CA–Markov

Combines:

Markov chains (how zones transition over time)
Cellular Automata (where transitions occur spatially)

Used for modeling structural transitions rather than precise POI counts.

Pipeline:

Classify functional type
Build transition matrix
Apply CA spatial rules
Forecast functional maps

There’s a solid GeoAI-based implementation of this on GitHub.

Source: Urban land surface temperature forecasting: a data-driven approach using regression and neural network models (2024).
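The pipeline can be sketched in a deliberately simplified form. The Markov part estimates a transition matrix from two classified maps and projects how many cells of each type to expect. The CA part allocates those changes to the cells whose Moore neighborhood already has the most cells of the target type. Full CA–Markov implementations usually also use suitability maps and iterative allocation. The grid, the type codes and the tie-breaking are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(map_t0: dict, map_t1: dict) -> dict:
    """Markov step: P[a][b] = share of type-a cells at t0 that are type b at t1."""
    counts = defaultdict(Counter)
    for cell, a in map_t0.items():
        if cell in map_t1:
            counts[a][map_t1[cell]] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}


def markov_targets(map_now: dict, P: dict) -> dict:
    """Markov quantity forecast: expected number of cells per type one step ahead."""
    expected = Counter()
    for a, n in Counter(map_now.values()).items():
        for b, p in P.get(a, {a: 1.0}).items():
            expected[b] += n * p
    return {b: round(v) for b, v in expected.items()}


def neighbor_share(grid: dict, cell: tuple, state: str) -> float:
    """CA neighborhood: share of the (up to 8) Moore neighbors in `state`."""
    x, y = cell
    nbrs = [grid[(x + dx, y + dy)] for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]
    return sum(s == state for s in nbrs) / len(nbrs) if nbrs else 0.0


def ca_markov_step(map_now: dict, P: dict) -> dict:
    """Allocate the Markov targets spatially, preferring cells whose neighbors
    already have the target type (a simplified CA rule)."""
    targets = markov_targets(map_now, P)
    new_map = dict(map_now)
    current = Counter(new_map.values())
    for target in sorted(targets):
        deficit = targets[target] - current[target]
        if deficit <= 0:
            continue
        candidates = [c for c, s in new_map.items()
                      if s != target and P.get(s, {}).get(target, 0.0) > 0.0
                      and current[s] > targets.get(s, 0)]
        candidates.sort(key=lambda c: (-neighbor_share(new_map, c, target),
                                       -P[new_map[c]][target], c))
        for cell in candidates:
            if deficit == 0:
                break
            source = new_map[cell]
            if current[source] <= targets.get(source, 0):
                continue  # the source type has already shrunk to its target
            new_map[cell] = target
            current[source] -= 1
            current[target] += 1
            deficit -= 1
    return new_map


def from_rows(rows):
    """Build a {(x, y): type} map from strings, one character per cell."""
    return {(x, y): ch for y, row in enumerate(rows) for x, ch in enumerate(row)}


# Hypothetical 3x3 grid: R = residential, E = education, M = mixed-use.
t0 = from_rows(["RRE", "RME", "RRE"])
t1 = from_rows(["RRM", "RME", "RRE"])
P = transition_matrix(t0, t1)
t2 = ca_markov_step(t1, P)
print(["".join(t2[(x, y)] for x in range(3)) for y in range(3)])
```

In this toy grid, one education cell is expected to become mixed-use. The CA rule picks the one with more mixed-use neighbors.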

II. XGBoost and Similar Gradient Boosting Models

If the goal is to predict growth or decline in POI categories or diversity metrics, gradient boosting methods and other tabular models perform particularly well — especially when the time series is relatively short. Mobility, accessibility, and lagged features can also be incorporated naturally and without much friction.

These models are a strong choice when quantitative accuracy is the priority: they can estimate how many and which types of functions are likely to appear. However, they do not explicitly model spatial dynamics and typically capture spatial effects only weakly.
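A common pitfall with tabular models on panel data is shuffling rows randomly, which lets future information leak into training. A minimal sketch of a time-based split follows. The rows and column names are hypothetical.

```python
def temporal_split(rows: list, time_key: str, cutoff: int):
    """Train on periods before `cutoff`, evaluate on `cutoff` and later."""
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test


def to_xy(rows: list, features: list, target: str):
    """Convert row dicts into a feature matrix X and a target vector y."""
    X = [[float(r[f]) for f in features] for r in rows]
    y = [r[target] for r in rows]
    return X, y


rows = [
    {"cell": "A", "year": 2021, "shannon__lag1": 1.15, "access_km": 0.4, "food_next": 14},
    {"cell": "A", "year": 2022, "shannon__lag1": 1.22, "access_km": 0.4, "food_next": 15},
    {"cell": "B", "year": 2021, "shannon__lag1": 0.80, "access_km": 1.9, "food_next": 3},
    {"cell": "B", "year": 2022, "shannon__lag1": 0.78, "access_km": 1.9, "food_next": 2},
]
train_rows, test_rows = temporal_split(rows, "year", cutoff=2022)
X_train, y_train = to_xy(train_rows, ["shannon__lag1", "access_km"], "food_next")
print(len(train_rows), len(test_rows), X_train, y_train)
```

With the xgboost package installed, `X_train` and `y_train` (converted to NumPy arrays) could then be passed to `xgboost.XGBRegressor().fit(...)`. For growth/stable/decline labels, `XGBClassifier` would take their place.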

III. LSTM / Transformers / Graph Neural Networks

These models are used when many time steps are available and when complex “neighbors ↔ time” interactions matter. Their key feature is that they model temporal and spatial dependencies jointly, for example by combining sequence layers over time with graph layers over adjacent cells.

This approach is more demanding, since it requires long time series, but it can offer the highest predictive power when enough data is available. At the same time, interpretability is more challenging, which is why it is applied less frequently in policy-making contexts.

Recent example: Urban Grid Population Inflow Prediction via POI-Enhanced Conditional Diffusion with Dual-Dimensional Attention (2025).

Applied Value

For Urban Planners & Local Authorities

Preventing mono-functionality
High dominance + low Hill → risk of “dead zones” outside peak hours.
Targeted development policies

Where to stimulate services
Where to limit overconcentration

Policy evaluation
Growth in Shannon or Hill q1 → indicator of successful mixed-use strategies.

For Private Business

Risk and potential assessment

High Shannon → stable all-day demand
High specialization → peak-driven and fragile demand

Expansion strategy
POI metrics serve as proxies for pedestrian flows, customer mix, and temporal demand patterns.

Who Else Benefits?

Developers (project concept selection)
Transport agencies (function–mobility linkage)
Retail analysts (area segmentation)
Banks and investors (location resilience assessment)
Researchers (POI–mobility–price–livability relationships)

@urban_mash