Examples¶
This section provides real-world examples demonstrating how to use pyBDL for common data analysis tasks.
Basic Data Retrieval¶
Getting Started¶
In [1]:
from pybdl import BDL
# Initialize client
bdl = BDL()
# List available administrative levels
levels = bdl.levels.list_levels()
print("Administrative levels:")
print(levels[['id', 'name']])
# List available years
years = bdl.years.list_years()
print(f"\nAvailable years: {years['id'].min()} - {years['id'].max()}")
Fetching levels: 1 pages [00:00, 11.46 pages/s, items=8]
Administrative levels:
   id                               name
0   0                      Poziom Polski
1   1               Poziom Makroregionów
2   2                  Poziom Województw
3   3                    Poziom Regionów
4   4                 Poziom Podregionów
5   5                    Poziom Powiatów
6   6                        Poziom Gmin
7   7  Poziom miejscowości statystycznej
Fetching years: 1 pages [00:00, 1.47 pages/s, items=32]
Available years: 1995 - 2026
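The levels table above can be turned into a plain id-to-name lookup for labelling results later. A minimal sketch, using a stand-in DataFrame shaped like the output above (in practice the frame comes from `bdl.levels.list_levels()`):

```python
import pandas as pd

# Stand-in for the DataFrame returned by bdl.levels.list_levels()
levels = pd.DataFrame({
    "id": [0, 2, 5, 6],
    "name": ["Poziom Polski", "Poziom Województw", "Poziom Powiatów", "Poziom Gmin"],
})

# Map level id -> human-readable name for quick labelling
level_names = dict(zip(levels["id"], levels["name"]))
print(level_names[2])  # Poziom Województw
```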
Finding Variables¶
In [2]:
# Search for population-related variables
population_vars = bdl.variables.search_variables(name="population")
print(f"Found {len(population_vars)} population-related variables")
print(population_vars[['id', 'n1']].head())
# Get details for a specific variable
var_details = bdl.variables.get_variable("3643")
print("\nVariable details:")
print(var_details[['id', 'n1', 'n2']])
Fetching search: 18 pages [01:44, 5.82s/ pages, items=1760]
Found 1760 population-related variables
id n1
0 9179 concerning self-taxation of the population
1 1365239 total net migration per 1000 population
2 498816 net migration in internal movement per 1000 po...
3 745534 net migration abroad per 1000 population
4 453193 net migration per 1000 population
Variable details:
id n1 n2
0 3643 total 30-39
Retrieving Data¶
In [3]:
# Get data for a variable at voivodeship level (level 2)
data = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2 # Voivodeship level
)
print(f"Retrieved {len(data)} data points")
print(data[['unit_name', 'year', 'val']].head())
Fetching 3643: 1 pages [00:00, 2.57 pages/s, items=16]
Retrieved 16 data points
unit_name year val
0 MAŁOPOLSKIE 2021 2
1 ŚLĄSKIE 2021 6
2 LUBUSKIE 2021 0
3 WIELKOPOLSKIE 2021 2
4 ZACHODNIOPOMORSKIE 2021 5
In [4]:
data_aggr = bdl.aggregates.list_aggregates()
print(data_aggr)
Fetching aggregates: 1 pages [00:00, 13.47 pages/s, items=8]
id name level \
0 1 TOTAL 7
1 2 URBAN GMINAS 5
2 3 URBAN-RURAL GMINAS 5
3 4 RURAL GMINAS 5
4 7 URBAN AREAS 5
5 8 RURAL AREAS 5
6 91 NP- Górnośląsko-Zagłębiowska Metropolia 3
7 92 NP- Metropolia Krakowska 3
description
0 Aggregates for items collected on the level: <...
1 It is a sum of data for urban gminas (unit typ...
2 It is a sum of data for urban-rural gminas (un...
3 It is a sum of these data for rural gminas (un...
4 It is a sum of data for urban areas according...
5 It is a sum of data for rural areas, i.e. rura...
6 NaN
7 NaN
Population Analysis by Region¶
Finding Population Variables¶
In [5]:
# Search for population variables
pop_vars = bdl.variables.search_variables(name="population")
# Filter for total population (usually contains "total" or "ogółem")
total_pop = pop_vars[
pop_vars['n2'].str.contains('total', case=False, na=False)
]
print(f"Found {len(total_pop)} total population variables")
Fetching search: 18 pages [00:00, 284.41 pages/s, items=1760]
Found 321 total population variables
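Once the search results are filtered, the variable id for subsequent data calls can be read straight off the DataFrame. A sketch on a stand-in frame (ids taken from the search output shown earlier; the real frame comes from `search_variables()`):

```python
import pandas as pd

# Stand-in for a filtered search_variables() result
total_pop = pd.DataFrame({
    "id": [453193, 1365239],
    "n1": ["net migration per 1000 population", "total net migration per 1000 population"],
    "n2": ["total", "total"],
})

# Take the first matching id; variable ids are passed to the data API as strings
variable_id = str(total_pop.iloc[0]["id"])
print(variable_id)  # 453193
```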
Getting Regional Population Data¶
In [6]:
# Get population data for all voivodeships in 2021
pop_data = bdl.data.get_data_by_variable(
variable_id="3643", # Example: total population variable
years=[2021],
unit_level=2 # Voivodeship level
)
# Sort by population
pop_sorted = pop_data.sort_values('val', ascending=False)
print("Top 5 voivodeships by population:")
print(pop_sorted[['unit_name', 'val']].head())
Fetching 3643: 1 pages [00:00, 235.52 pages/s, items=16]
Top 5 voivodeships by population:
unit_name val
15 MAZOWIECKIE 12
1 ŚLĄSKIE 6
4 ZACHODNIOPOMORSKIE 5
8 POMORSKIE 4
10 ŁÓDZKIE 4
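A ranking like the one above is often easier to read as shares of the total. A pandas sketch on a stand-in frame mirroring the top-5 output (the real frame comes from `get_data_by_variable()`):

```python
import pandas as pd

# Stand-in for the sorted population DataFrame shown above
pop_sorted = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ZACHODNIOPOMORSKIE", "POMORSKIE", "ŁÓDZKIE"],
    "val": [12, 6, 5, 4, 4],
})

# Each region's share of the displayed total, in percent
pop_sorted["share_pct"] = (pop_sorted["val"] / pop_sorted["val"].sum() * 100).round(1)
print(pop_sorted)
```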
Economic Indicator Comparison¶
Finding Economic Variables¶
In [7]:
# Search for unemployment variables
unemployment_vars = bdl.variables.search_variables(name="unemployment")
print(f"Found {len(unemployment_vars)} unemployment variables")
# Search for GDP-related variables
gdp_vars = bdl.variables.search_variables(name="GDP")
print(f"Found {len(gdp_vars)} GDP-related variables")
Fetching search: 5 pages [00:28, 5.76s/ pages, items=458]
Found 458 unemployment variables
Fetching search: 1 pages [00:05, 5.25s/ pages, items=11]
Found 11 GDP-related variables
Comparing Voivodeships¶
In [8]:
# Get unemployment data for all voivodeships
unemployment_data = bdl.data.get_data_by_variable(
variable_id="1234", # Example unemployment variable ID
years=[2021],
unit_level=2
)
# Sort and display
sorted_unemployment = unemployment_data.sort_values('val', ascending=False)
print("Unemployment by voivodeship (2021):")
print(sorted_unemployment[['unit_name', 'val']].head(10))
Fetching 1234: 1 pages [00:00, 5.28 pages/s, items=16]
Unemployment by voivodeship (2021):
unit_name val
15 MAZOWIECKIE 766241
1 ŚLĄSKIE 594875
0 MAŁOPOLSKIE 544790
3 WIELKOPOLSKIE 411015
5 DOLNOŚLĄSKIE 334400
10 ŁÓDZKIE 315051
8 POMORSKIE 297680
12 LUBELSKIE 286248
13 PODKARPACKIE 268623
7 KUJAWSKO-POMORSKIE 199304
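To compare two indicators side by side, the result frames can be merged on the unit name. A pandas sketch with stand-in frames (in practice each frame comes from a separate `get_data_by_variable()` call; the numbers here are illustrative):

```python
import pandas as pd

# Stand-ins for two get_data_by_variable() results
unemployment = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ŁÓDZKIE"],
    "val": [766241, 594875, 315051],
})
population = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ŁÓDZKIE"],
    "val": [12, 6, 4],
})

# Merge on unit_name; suffixes keep the two 'val' columns apart
combined = unemployment.merge(
    population, on="unit_name", suffixes=("_unemployment", "_population")
)
print(combined.columns.tolist())
# ['unit_name', 'val_unemployment', 'val_population']
```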
Time Series Analysis¶
Preparing Data for Visualization¶
In [9]:
# Get time series data for a specific variable and region
time_series = bdl.data.get_data_by_variable(
variable_id="3643",
unit_level=2,
unit_parent_id="020000000000" # Example: Mazovian Voivodeship
)
# Filter to recent years
recent_years = time_series[time_series['year'] >= 2015]
# Sort by year
recent_years = recent_years.sort_values('year')
# Prepare for plotting
plot_data = recent_years[['year', 'val']].set_index('year')
print(plot_data.head())
Fetching 3643: 1 pages [00:00, 8.17 pages/s, items=3]
      val
year
2015    2
2015    3
2015    3
2016    1
2016    2
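With the series indexed by year, pandas can compute year-over-year change directly. A sketch on a stand-in frame; note that raw results can hold several rows per year (as in the output above), so the values are aggregated first:

```python
import pandas as pd

# Stand-in for recent_years[['year', 'val']]; aggregate before computing changes
recent_years = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017],
    "val": [2, 3, 1, 2, 4],
})

# One value per year, then percentage change between consecutive years
yearly = recent_years.groupby("year")["val"].sum()
change = yearly.pct_change() * 100
print(yearly.tolist())           # [5, 3, 4]
print(change.round(1).tolist())  # [nan, -40.0, 33.3]
```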
Error Handling¶
Handling Missing Data¶
In [10]:
# Check if data exists before processing
data = bdl.data.get_data_by_variable("3643", years=[2021], unit_level=2)
if data.empty:
print("No data available for this variable/year/level combination")
else:
# Process data
print(f"Found {len(data)} records")
print(data.head())
Fetching 3643: 1 pages [00:00, 258.60 pages/s, items=16]
Found 16 records
   year  val      unit_id           unit_name  attr_id
0  2021    2  11200000000         MAŁOPOLSKIE        1
1  2021    6  12400000000             ŚLĄSKIE        1
2  2021    0  20800000000            LUBUSKIE        0
3  2021    2  23000000000       WIELKOPOLSKIE        1
4  2021    5  23200000000  ZACHODNIOPOMORSKIE        1
Enrichment: Adding Human-Readable Labels¶
The access layer can automatically join reference data (levels, attributes, subjects, etc.)
onto the result DataFrame via the enrich parameter. Enrichment fetches each lookup table
once per session and caches it in memory.
In [1]:
from pybdl import BDL
bdl = BDL()
# Enrich variables with level names, measure descriptions, and subject names.
# Without enrichment, the DataFrame only contains raw IDs (level, measure_unit_id, subject_id).
variables = bdl.variables.search_variables(name="population", max_pages=1, enrich=["levels", "measures", "subjects"])
print("Variables with enrichment:")
print(variables[["id_x", "n1", "level", "level_name", "measure_unit_id", "measure_unit_description"]].head())
Fetching levels: 1 pages [00:00, 202.55 pages/s, items=8]
Fetching measures: 1 pages [00:00, 200.37 pages/s, items=75]
Fetching subjects: 1 pages [00:00, 243.37 pages/s, items=33]
Variables with enrichment:
id_x n1 level \
0 9179 concerning self-taxation of the population 5
1 1365239 total net migration per 1000 population 6
2 498816 net migration in internal movement per 1000 po... 6
3 745534 net migration abroad per 1000 population 6
4 453193 net migration per 1000 population 6
level_name measure_unit_id measure_unit_description
0 Poziom Powiatów 8 number of pieces
1 Poziom Gmin 26 number of persons
2 Poziom Gmin 26 number of persons
3 Poziom Gmin 26 number of persons
4 Poziom Gmin 26 number of persons
In [2]:
# Enrich data with attribute labels and unit details.
# 'attributes' adds attr_name / attr_symbol columns; 'units' adds unit_name_enriched, unit_level, etc.
data = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2,
enrich=["attributes", "units"],
)
print("Data with enriched attribute and unit columns:")
print(data[["unit_id", "unit_name_enriched", "year", "val", "attr_id", "attr_name"]].head())
Fetching 3643: 1 pages [00:00, 215.93 pages/s, items=16]
Fetching units: 10 pages [11:54, 71.46s/ pages, items=1000]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
(traceback truncated)
KeyboardInterrupt:
Note: the units enrichment downloads the full units lookup table (here 10 pages, roughly a minute per page under rate limiting), so this cell was interrupted manually. Enriching with attributes only, as in the next cell, is much faster.
In [23]:
# Combine enrichment with metadata retrieval.
# return_metadata=True returns a (DataFrame, metadata_dict) tuple.
df, meta = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2,
enrich=["attributes"],
return_metadata=True,
)
print(f"Total pages: {meta.get('totalPages')}, Total records: {meta.get('totalRecords')}")
print(df[["unit_id", "year", "val", "attr_name"]].head())
Fetching 3643: 1 pages [00:00, 161.21 pages/s, items=16]
Fetching attributes: 1 pages [00:00, 15.45 pages/s, items=18]
Total pages: None, Total records: 16
unit_id year val attr_name
0 11200000000 2021 2 value
1 12400000000 2021 6 value
2 20800000000 2021 0 0
3 23000000000 2021 2 value
4 23200000000 2021 5 value
Best Practices¶
- Use the access layer: prefer bdl.data over bdl.api.data for DataFrame output
- Enable caching: use use_cache=True for repeated queries
- Handle pagination: use max_pages=None to retrieve complete datasets
- Use enrichment: let the library add human-readable names automatically
- Async for bulk operations: use the async methods when fetching multiple datasets
- Error handling: always handle RateLimitError and check for empty results
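The rate-limit advice above can be wrapped in a small retry helper. This is a generic sketch, not part of pyBDL: the exception class is passed in as a parameter, since the exact import path for RateLimitError may vary between versions.

```python
import time

def fetch_with_retry(fetch, rate_limit_error, retries=3, backoff=2.0):
    """Call fetch(); on a rate-limit error, wait with linear backoff and retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except rate_limit_error:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(backoff * (attempt + 1))

# Hypothetical usage with pybdl (adjust the import to your installed version):
# from pybdl.exceptions import RateLimitError
# data = fetch_with_retry(
#     lambda: bdl.data.get_data_by_variable("3643", years=[2021], unit_level=2),
#     RateLimitError,
# )
```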
See also¶
- access_layer: access layer documentation
- rate_limiting: rate limiting details
- config: configuration options