Points of Interest
How to Work with POI Data
POI stands for Points of Interest. Most of us have encountered POI data at some point when assessing the “intensity” of an area — which neighborhoods are vibrant and which feel inactive. However, outside of serious academic research, analysis often stops at basic metrics like functional diversity or density.
A more nuanced understanding of POI metrics significantly improves urban environment analysis and is especially valuable for B2G analytical projects.
Key Metrics for Describing POI Structure
1) POI Density
A measure of overall functional intensity: how many services and facilities are available within a given area (e.g., POIs per grid cell or per km²).
2) Category Counts / Category Shares (cat__*)
Defines the functional profile of a territory — which types of functions are present and in what proportions.
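Below is a minimal sketch of the first two metrics for a single grid cell. It assumes each POI has already been mapped to one category label and that the cell area is known. The category names and the 0.25 km² cell size are illustrative only.

```python
from collections import Counter

def poi_density(n_pois: int, cell_area_km2: float) -> float:
    """POI density: number of POIs per square kilometre in one grid cell."""
    return n_pois / cell_area_km2

def category_shares(categories: list[str]) -> dict[str, float]:
    """Share of each category among the cell's POIs, keyed like cat__* columns."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty cell: no shares defined
    return {f"cat__{cat}": n / total for cat, n in counts.items()}

# Hypothetical 500 m x 500 m cell (0.25 km^2) containing five POIs.
cell = ["food", "food", "food", "retail", "education"]
print(poi_density(len(cell), 0.25))  # 20.0
print(category_shares(cell))  # {'cat__food': 0.6, 'cat__retail': 0.2, 'cat__education': 0.2}
```

Raw counts (`Counter(categories)`) can be kept alongside the shares. Counts carry volume and shares carry the profile, so models usually benefit from having both.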
3) Richness
The number of distinct functional categories, regardless of how POIs are distributed among them. If one grid cell contains 10 categories with 1 POI each and another contains 10 categories with 50 POIs each, both have identical richness.
4) Shannon Entropy
The most common mixed-use metric. It captures both the number of categories and their balance. However, entropy values (e.g., 0.9 vs. 1.3) are abstract: without context, they are difficult to interpret, compare across areas, or explain to non-specialists.
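The sketch below computes richness and Shannon entropy, H = −Σ pᵢ ln pᵢ, where pᵢ is the share of category i. It reproduces the richness example above and shows that Shannon entropy is also unaffected by scaling every category equally. It uses the natural logarithm. Other bases are also common, so absolute values depend on that choice.

```python
import math
from collections import Counter

def richness(categories):
    """Number of distinct categories present in the cell."""
    return len(set(categories))

def shannon(categories):
    """Shannon entropy (natural log) of the category distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

a = [f"c{i}" for i in range(10)]                       # 10 categories, 1 POI each
b = [f"c{i}" for i in range(10) for _ in range(50)]    # 10 categories, 50 POIs each
print(richness(a), richness(b))                        # 10 10
print(round(shannon(a), 3), round(shannon(b), 3))      # 2.303 2.303
```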
5) Evenness (Pielou’s Evenness)
A normalized measure of balance. It divides Shannon entropy by its maximum possible value for the observed richness, ln(richness), so values range from 0 to 1 and can be compared fairly between areas with different numbers of categories.
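Here is a minimal sketch of Pielou's evenness, J = H / ln(S), where S is richness. A cell with a single category has no meaningful evenness because ln(1) = 0. Returning NaN in that case is this sketch's own convention. The category lists are illustrative.

```python
import math
from collections import Counter

def pielou_evenness(categories):
    """Shannon entropy divided by its maximum ln(S); NaN if fewer than 2 categories."""
    counts = Counter(categories)
    s = len(counts)
    if s < 2:
        return float("nan")  # undefined: ln(1) = 0
    total = sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(s)

balanced = ["food", "retail", "office", "education"]
skewed = ["food"] * 7 + ["retail", "office", "education"]
print(pielou_evenness(balanced))  # close to 1.0
print(pielou_evenness(skewed))    # well below 1: same richness, one category dominates
```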
6) Simpson / Gini–Simpson Index
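With p as the share of each category, the Simpson index (Σp²) is the probability that two randomly selected POIs belong to the same category, and the Gini–Simpson index (1 − Σp²) is the probability that they belong to different categories. Both are more sensitive to dominant categories and less sensitive to rare ones than Shannon entropy.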
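The sketch below uses the "draw with replacement" form, Σp². A finite-sample variant, Σ n(n−1) / (N(N−1)), also exists and differs noticeably only for cells with very few POIs. The example cell is illustrative.

```python
from collections import Counter

def simpson(categories):
    """Probability that two POIs drawn at random (with replacement) share a category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

def gini_simpson(categories):
    """Probability that two randomly drawn POIs belong to different categories."""
    return 1.0 - simpson(categories)

cell = ["food"] * 7 + ["retail", "office", "education"]
print(round(simpson(cell), 2), round(gini_simpson(cell), 2))  # 0.52 0.48
```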
7) Hill Numbers (q = 1, q = 2)
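A more interpretable alternative to entropy-based metrics. Hill q=1 is the exponential of Shannon entropy, and Hill q=2 is the inverse of the Simpson index (1/Σp²).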
Hill numbers answer the question:
“This level of diversity is equivalent to how many equally represented categories?”
For example, suppose an area has:
- Richness = 4
- Shannon = moderate
Hill q=1 may nonetheless indicate that this area behaves like a place with only ~2 equally represented categories.
The other categories exist — but their contribution is marginal. This makes Hill numbers far easier to communicate in policy or business contexts.
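The general Hill number of order q is D_q = (Σ pᵢ^q)^(1/(1−q)), with q = 1 defined as the limit exp(H). The sketch below applies it to a made-up cell with four categories in which one category holds 80% of the POIs. The cell "behaves like" about 2 categories at q=1 and about 1.5 at q=2.

```python
import math
from collections import Counter

def hill_number(categories, q):
    """Effective number of equally common categories of order q."""
    counts = Counter(categories)
    total = sum(counts.values())
    p = [n / total for n in counts.values()]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

# Hypothetical cell: richness 4, but food dominates.
cell = ["food"] * 80 + ["retail"] * 10 + ["office"] * 5 + ["education"] * 5
for q in (0, 1, 2):
    print(q, round(hill_number(cell, q), 2))
# 0 4.0   (q=0 equals richness)
# 1 2.03
# 2 1.53
```

Hill numbers never increase with q, so the gap between q=0 and q=2 is itself a quick indicator of how strongly the top categories dominate.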
8) Dominance / Top-Category Share
Simply the share (%) of the leading category. Used to assess mono-functionality.
9) HHI / Concentration Indices
These measure concentration, not diversity. If each POI category is treated as a “market player,” the Herfindahl–Hirschman Index (HHI), the sum of squared category shares, indicates how monopolized the functional structure is. Unlike dominance (which only considers the leader), HHI accounts for the shares of all categories.
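To illustrate the difference, the sketch below builds two made-up cells with the same top-category share (50%) but a different split of the remainder. Dominance cannot tell them apart, but HHI can. HHI is computed on a 0–1 scale here. The competition-economics convention uses percentage shares and a 0–10,000 scale. On the 0–1 scale, HHI is numerically the same as the Simpson index Σp².

```python
from collections import Counter

def shares(categories):
    """Category shares of one cell."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [n / total for n in counts.values()]

def dominance(categories):
    """Share of the leading category."""
    return max(shares(categories))

def hhi(categories):
    """Herfindahl-Hirschman Index on a 0-1 scale (sum of squared shares)."""
    return sum(s * s for s in shares(categories))

cell_a = ["food"] * 5 + ["retail"] * 5
cell_b = ["food"] * 5 + ["retail", "office", "education", "health", "culture"]
for name, cell in [("A", cell_a), ("B", cell_b)]:
    print(name, dominance(cell), round(hhi(cell), 2))
# A 0.5 0.5
# B 0.5 0.3
```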
10) Kernel Density Estimation (KDE)
Smoothed spatial density accounting for neighborhood effects. Often used as a preprocessing step before functional zoning or clustering.
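Below is a minimal, dependency-free sketch of a 2D Gaussian KDE evaluated at grid-cell centroids. It assumes POI coordinates are already projected to metres (not raw lat/lon), and the 200 m bandwidth is an arbitrary illustrative choice. In practice, library implementations such as `scipy.stats.gaussian_kde` or `sklearn.neighbors.KernelDensity` are faster and offer bandwidth selection.

```python
import math

def gaussian_kde_2d(points, grid, bandwidth):
    """Evaluate a 2D Gaussian KDE of POI locations at each grid point.

    points, grid: lists of (x, y) in projected metres; bandwidth in metres.
    Returns a probability density per square metre (integrates to 1);
    multiply by len(points) for an expected POI count per square metre.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities

# Three clustered POIs and one isolated POI (hypothetical coordinates).
pois = [(0, 0), (50, 20), (-30, 40), (2000, 2000)]
grid = [(0, 0), (1000, 1000)]
dens = gaussian_kde_2d(pois, grid, bandwidth=200.0)
print(dens[0] > dens[1])  # True: the cluster centre is far denser
```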
11) TF–IDF Weights for POI Categories
Identifies signature functions that distinguish an area from others. This metric highlights categories that are not merely frequent, but disproportionately characteristic of a zone.
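One common way to set this up is to treat each grid cell as a "document" and each POI category as a "term". The sketch below uses term frequency = category share within the cell and inverse document frequency = ln(number of cells / number of cells containing the category). Smoothed IDF variants exist and change the exact weights. The cells are made up.

```python
import math
from collections import Counter

def poi_tfidf(cells):
    """cells: dict cell_id -> list of POI categories.
    Returns dict cell_id -> {category: tf-idf weight}."""
    n_cells = len(cells)
    doc_freq = Counter()
    for cats in cells.values():
        doc_freq.update(set(cats))  # count each category once per cell
    weights = {}
    for cell_id, cats in cells.items():
        counts = Counter(cats)
        total = sum(counts.values())
        weights[cell_id] = {
            cat: (n / total) * math.log(n_cells / doc_freq[cat])
            for cat, n in counts.items()
        }
    return weights

cells = {
    "A": ["food"] * 6 + ["retail"] * 2 + ["museum"] * 2,
    "B": ["food"] * 5 + ["retail"] * 5,
    "C": ["food"] * 8 + ["office"] * 2,
}
w = poi_tfidf(cells)
print(max(w["A"], key=w["A"].get))  # museum
```

Food is the most frequent category in cell A, but because every cell has it, its weight is zero. The museum is what distinguishes A.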
12) POI-based composite indices (UDI, vibrancy, livability)
There is a growing trend toward standardized composite indicators that integrate density and diversity for inter-territorial comparisons.
How to Interpret Metrics Together
1) High Richness + Low Evenness
Formally diverse, practically specialized. Many categories exist, but one dominates.
2) High Shannon + High Hill q1
Balanced mixed-use environment. Functions are distributed without a clear leader.
3) Low Hill q2 + High Top-Category Share
Strong specialization. Many POIs, but mostly of one type.
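These combinations can be turned into simple rule-based labels for a first-pass screening. In the sketch below, every threshold is a hypothetical default that would need calibrating to the city and grid size. Because Hill q1 is exp(Shannon), the two are monotonically related, so the sketch checks only Hill q1 for the "balanced" pattern.

```python
def interpret(richness, evenness, hill_q1, hill_q2, top_share, *,
              high_richness=8, low_evenness=0.5, high_hill_q1=4.0,
              low_hill_q2=2.0, high_top_share=0.6):
    """Map a cell's metrics to the interpretation patterns (thresholds are assumptions)."""
    labels = []
    if richness >= high_richness and evenness < low_evenness:
        labels.append("formally diverse, practically specialized")
    if hill_q1 >= high_hill_q1:
        labels.append("balanced mixed-use")
    if hill_q2 < low_hill_q2 and top_share >= high_top_share:
        labels.append("strong specialization")
    return labels

print(interpret(richness=10, evenness=0.3, hill_q1=2.0, hill_q2=1.4, top_share=0.8))
print(interpret(richness=6, evenness=0.95, hill_q1=5.5, hill_q2=5.0, top_share=0.2))
```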
POI Data Sources
1) OpenStreetMap (Overpass API)
Example query: all hospitals in Moscow.
[out:json][timeout:120];
area["name:ru"="Москва"]->.searchArea;
(
  node["amenity"="hospital"](area.searchArea);
  way["amenity"="hospital"](area.searchArea);
  relation["amenity"="hospital"](area.searchArea);
);
out center tags;
3) Foursquare
Personally, I prefer OSM. Commercial APIs tend to overrepresent business entities, and historical archives are easier to extract from OSM. If there’s interest, I can share parsers for these three sources on GitHub or integrate them into a product.
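Below is a minimal sketch of sending an Overpass QL query and flattening the JSON response into POI records. It is not the author's parser. The function names are hypothetical, and the public endpoint `https://overpass-api.de/api/interpreter` is one of several Overpass instances. With `out center`, Overpass returns `lat`/`lon` directly on nodes and a `center` object on ways and relations. The parser handles both cases. The sample payload is made up so the parsing step can be checked offline.

```python
import json
import urllib.parse
import urllib.request

def fetch_overpass(query, url="https://overpass-api.de/api/interpreter"):
    """POST an Overpass QL query and return the decoded JSON (requires network)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(url, data=data, timeout=180) as resp:
        return json.load(resp)

def elements_to_pois(payload):
    """Flatten Overpass 'elements' into simple POI dicts with one point each."""
    pois = []
    for el in payload.get("elements", []):
        if "lat" in el and "lon" in el:          # nodes
            lat, lon = el["lat"], el["lon"]
        elif "center" in el:                     # ways/relations with `out center`
            lat, lon = el["center"]["lat"], el["center"]["lon"]
        else:
            continue                             # no usable geometry
        tags = el.get("tags", {})
        pois.append({
            "osm_type": el["type"], "osm_id": el["id"],
            "lat": lat, "lon": lon,
            "name": tags.get("name"), "amenity": tags.get("amenity"),
        })
    return pois

# Illustrative payload shaped like an Overpass JSON response.
sample = {"elements": [
    {"type": "node", "id": 1, "lat": 55.75, "lon": 37.61,
     "tags": {"amenity": "hospital", "name": "Hospital A"}},
    {"type": "way", "id": 2, "center": {"lat": 55.70, "lon": 37.50},
     "tags": {"amenity": "hospital"}},
]}
print(elements_to_pois(sample))
```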
What Can Be Forecasted?
POI dynamics are increasingly forecasted in four main directions:
1) Category Growth or Decline (Grid-Level)
Objective: predict whether the number of POIs in a category (e.g., Food & Beverage) will increase or decrease within 1–3 years.
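This is usually framed as regression on the future count, or as classification into growth / stable / decline. The sketch below builds the three-class target from counts at time t and t + horizon. The ±10% "stable" band is a hypothetical choice, and a cell that goes from zero to any positive count is treated as growth.

```python
def growth_label(count_t, count_future, rel_threshold=0.10):
    """Return +1 (growth), 0 (stable), or -1 (decline) for one cell/category.

    rel_threshold is an assumed tolerance band around zero relative change.
    """
    if count_t == 0:
        return 1 if count_future > 0 else 0
    change = (count_future - count_t) / count_t
    if change > rel_threshold:
        return 1
    if change < -rel_threshold:
        return -1
    return 0

# Food & Beverage POIs in three hypothetical cells: now vs. three years later.
print([growth_label(20, 25), growth_label(20, 21), growth_label(20, 15)])  # [1, 0, -1]
```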
2) Dominant Function Change
Objective: predict whether a grid cell will change its dominant function (e.g., residential → mixed-use or education → mixed-use).
This approach is highly popular in research on functional urban areas, where researchers first classify zones based on their POI structure and then forecast their evolution — often using CA–Markov models. For example: Exploring the Predictive Ability of the CA–Markov Model for Urban Functional Areas in Nanjing Old City (2024).
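The Markov component of such models starts from a transition matrix estimated from two classified snapshots: P[i][j] is the share of cells in class i at time t that are in class j at the next step. Below is a minimal sketch. The class names and labels are illustrative, and rows for classes with no observed cells are left as zeros.

```python
def transition_matrix(labels_t, labels_next, classes):
    """Row-normalized transition matrix between two snapshots of cell labels."""
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels_t, labels_next):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix

classes = ["residential", "mixed-use", "education"]
labels_t = ["residential", "residential", "education", "mixed-use", "residential"]
labels_next = ["residential", "mixed-use", "mixed-use", "mixed-use", "residential"]
for cls, row in zip(classes, transition_matrix(labels_t, labels_next, classes)):
    print(cls, [round(p, 2) for p in row])
# residential [0.67, 0.33, 0.0]
# mixed-use [0.0, 1.0, 0.0]
# education [0.0, 1.0, 0.0]
```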
3) Diversification / Specialization Forecasting
Objective: predict whether a grid cell becomes:
- more diversified (mixed-use), or
- more specialized (mono-functional).
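One way to define this target is the change in Hill q=1 (effective number of categories) between two dates. The sketch below does this. The ±0.5 "stable" band is a hypothetical threshold, and the category counts are made up.

```python
import math

def hill_q1(counts):
    """exp(Shannon entropy) of a {category: count} mapping."""
    total = sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
    return math.exp(h)

def diversification_label(counts_t, counts_next, threshold=0.5):
    """Label a cell by the change in effective number of categories (threshold assumed)."""
    delta = hill_q1(counts_next) - hill_q1(counts_t)
    if delta > threshold:
        return "diversifying"
    if delta < -threshold:
        return "specializing"
    return "stable"

before = {"food": 8, "retail": 2}               # Hill q1 of about 1.65
after = {"food": 5, "retail": 3, "office": 2}   # Hill q1 of about 2.80
print(diversification_label(before, after))     # diversifying
print(diversification_label(after, before))     # specializing
```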
4) Emergence of Rare Functions
Objective: predict whether new categories (e.g., Culture, University, Hospital) will appear in a cell. Typical formulations:
- Binary classification (appear / not appear)
- Survival analysis (time-to-event), if long time series exist
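Below is a minimal sketch of both targets for one cell and one category. For the binary case, only cells where the category is currently absent are "at risk", so cells that already have it get `None` and would be excluded from training. For the survival case, the sketch returns a (duration, event observed) pair from yearly snapshots and treats non-appearance as right-censoring. The function names and data are illustrative.

```python
def emergence_label(cats_t, cats_future, category):
    """1 if the category appears, 0 if not; None if it is already present (not at risk)."""
    if category in cats_t:
        return None
    return int(category in cats_future)

def time_to_emergence(snapshots, category):
    """snapshots: list of category sets, one per period, starting at a period
    where the category is absent. Returns (duration, event_observed)."""
    for t, cats in enumerate(snapshots[1:], start=1):
        if category in cats:
            return t, True
    return len(snapshots) - 1, False  # right-censored: not observed yet

print(emergence_label({"food"}, {"food", "hospital"}, "hospital"))  # 1
print(time_to_emergence([{"food"}, {"food"}, {"food", "hospital"}], "hospital"))  # (2, True)
```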
Features Typically Used
1) The POI-based metrics described above (density, category shares, richness/Shannon/Hill indices, dominance/HHI), along with lagged features — i.e., historical values of these metrics (t−1, t−2, etc.).
2) Inbound trips / attractor strength (e.g., derived from GPS traces) as a proxy for demand across different functions, plus origin–destination (OD) flows and distance-decay profiles as indicators of the “accessible market.”
3) Network-based distances (i.e., along the street network) to centers, transit stops, arterial roads, and services, as well as accessibility indicators to POIs.
4) Features of neighboring grid cells: average surrounding density/diversification, presence of “cores,” and graph-based features (grid adjacency or road-network connectivity) — particularly in neural network models.
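Two of these feature families can be sketched on a plain row/column grid: lagged values of a metric, and the mean of that metric over the 8 surrounding cells (Moore neighbourhood). The data layout below (`dict` keyed by `(row, col)`) and the function names are assumptions for illustration. Cells on the grid edge simply have fewer neighbours.

```python
def lag_features(series, t, lags=(1, 2)):
    """Historical values of one cell's metric at t-1, t-2, ... (skips unavailable lags)."""
    return {f"lag{k}": series[t - k] for k in lags if t - k >= 0}

def neighbour_mean(grid_values, row, col):
    """Mean of a metric over the up-to-8 surrounding cells of (row, col)."""
    vals = [
        grid_values[(row + dr, col + dc)]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and (row + dr, col + dc) in grid_values
    ]
    return sum(vals) / len(vals) if vals else float("nan")

# Hypothetical POI density of one cell over four years.
print(lag_features([10, 12, 15, 18], t=3))  # {'lag1': 15, 'lag2': 12}

# Hypothetical 3x3 density grid with values 1..9 in row-major order.
grid = {(r, c): 3 * r + c + 1 for r in range(3) for c in range(3)}
print(neighbour_mean(grid, 1, 1))  # 5.0
```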
Modeling Approaches
I. CA–Markov
Used for modeling structural transitions rather than precise POI counts.
There’s a solid GeoAI-based implementation of this on GitHub.
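The Markov half of CA–Markov projects how many cells will be in each functional class: shares_next[j] = Σᵢ shares[i] · P[i][j]. The cellular-automaton half then decides *where* those changes happen, using neighbourhood and suitability rules. That spatial allocation step is omitted in the sketch below. The matrix and shares are made-up numbers.

```python
def project_shares(shares, P, steps=1):
    """Apply a Markov transition matrix P to class shares for a number of steps."""
    for _ in range(steps):
        shares = [
            sum(shares[i] * P[i][j] for i in range(len(shares)))
            for j in range(len(P[0]))
        ]
    return shares

# Classes: residential, mixed-use, education (hypothetical).
P = [
    [0.80, 0.20, 0.00],
    [0.05, 0.95, 0.00],
    [0.00, 0.30, 0.70],
]
print([round(s, 3) for s in project_shares([0.6, 0.3, 0.1], P)])  # [0.495, 0.435, 0.07]
```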
II. XGBoost and Similar Gradient Boosting Models
If the goal is to predict growth or decline in POI categories or diversity metrics, gradient boosting methods and other tabular models perform particularly well — especially when the time series is relatively short. Mobility, accessibility, and lagged features can also be incorporated naturally and without much friction.
These models are a strong choice when quantitative accuracy is the priority: they can estimate how many and which types of functions are likely to appear. However, they do not explicitly model spatial dynamics and typically capture spatial effects only weakly.
III. LSTM / Transformers / Graph Neural Networks
These models are used when many time steps are available and complex “neighbors ↔ time” interactions matter. Their key advantage is that they jointly model temporal and spatial dependencies.
This approach is more demanding, since it requires a long time series, but it can offer the highest predictive power. At the same time, interpretability is more challenging, which is why it is less frequently applied in policy-making contexts.
Recent example: Urban Grid Population Inflow Prediction via POI-Enhanced Conditional Diffusion with Dual-Dimensional Attention (2025).
Applied Value
For Urban Planners & Local Authorities
- Preventing mono-functionality
High dominance + low Hill → risk of “dead zones” outside peak hours.
- Targeted development policies
For Private Business
- Expansion strategy
POI metrics serve as proxies for pedestrian flows, customer mix, and temporal demand patterns.