Examples¶
This section provides real-world examples demonstrating how to use pyBDL for common data analysis tasks.
Basic Data Retrieval¶
Getting Started¶
In [1]:
from pybdl import BDL
# Initialize client
bdl = BDL()
# List available administrative levels
levels = bdl.levels.list_levels()
print("Administrative levels:")
print(levels[['id', 'name']])
# List available years
years = bdl.years.list_years()
print(f"\nAvailable years: {years['id'].min()} - {years['id'].max()}")
Fetching levels: 1 pages [00:00, 11.46 pages/s, items=8]
Administrative levels:
   id                               name
0   0                      Poziom Polski
1   1               Poziom Makroregionów
2   2                  Poziom Województw
3   3                    Poziom Regionów
4   4                 Poziom Podregionów
5   5                    Poziom Powiatów
6   6                        Poziom Gmin
7   7  Poziom miejscowości statystycznej
Fetching years: 1 pages [00:00, 1.47 pages/s, items=32]
Available years: 1995 - 2026
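The levels table above can be turned into a plain id-to-name lookup for labelling results later. A minimal sketch, using a stand-in DataFrame shaped like the output above (in practice the frame comes from `bdl.levels.list_levels()`):

```python
import pandas as pd

# Stand-in for the DataFrame returned by bdl.levels.list_levels()
levels = pd.DataFrame({
    "id": [0, 2, 5, 6],
    "name": ["Poziom Polski", "Poziom Województw", "Poziom Powiatów", "Poziom Gmin"],
})

# Map level id -> human-readable name for quick labelling
level_names = dict(zip(levels["id"], levels["name"]))
print(level_names[2])  # Poziom Województw
```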
Finding Variables¶
In [2]:
# Search for population-related variables
population_vars = bdl.variables.search_variables(name="population")
print(f"Found {len(population_vars)} population-related variables")
print(population_vars[['id', 'n1']].head())
# Get details for a specific variable
var_details = bdl.variables.get_variable("3643")
print("\nVariable details:")
print(var_details[['id', 'n1', 'n2']])
Fetching search: 18 pages [01:44, 5.82s/ pages, items=1760]
Found 1760 population-related variables
id n1
0 9179 concerning self-taxation of the population
1 1365239 total net migration per 1000 population
2 498816 net migration in internal movement per 1000 po...
3 745534 net migration abroad per 1000 population
4 453193 net migration per 1000 population
Variable details:
id n1 n2
0 3643 total 30-39
Retrieving Data¶
In [3]:
# Get data for a variable at voivodeship level (level 2)
data = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2 # Voivodeship level
)
print(f"Retrieved {len(data)} data points")
print(data[['unit_name', 'year', 'val']].head())
Fetching 3643: 1 pages [00:00, 2.57 pages/s, items=16]
Retrieved 16 data points
unit_name year val
0 MAŁOPOLSKIE 2021 2
1 ŚLĄSKIE 2021 6
2 LUBUSKIE 2021 0
3 WIELKOPOLSKIE 2021 2
4 ZACHODNIOPOMORSKIE 2021 5
In [4]:
data_aggr = bdl.aggregates.list_aggregates()
print(data_aggr)
Fetching aggregates: 1 pages [00:00, 13.47 pages/s, items=8]
id name level \
0 1 TOTAL 7
1 2 URBAN GMINAS 5
2 3 URBAN-RURAL GMINAS 5
3 4 RURAL GMINAS 5
4 7 URBAN AREAS 5
5 8 RURAL AREAS 5
6 91 NP- Górnośląsko-Zagłębiowska Metropolia 3
7 92 NP- Metropolia Krakowska 3
description
0 Aggregates for items collected on the level: <...
1 It is a sum of data for urban gminas (unit typ...
2 It is a sum of data for urban-rural gminas (un...
3 It is a sum of these data for rural gminas (un...
4 It is a sum of data for urban areas according...
5 It is a sum of data for rural areas, i.e. rura...
6 NaN
7 NaN
Population Analysis by Region¶
Finding Population Variables¶
In [5]:
# Search for population variables
pop_vars = bdl.variables.search_variables(name="population")
# Filter for total population (usually contains "total" or "ogółem")
total_pop = pop_vars[
pop_vars['n2'].str.contains('total', case=False, na=False)
]
print(f"Found {len(total_pop)} total population variables")
Fetching search: 18 pages [00:00, 284.41 pages/s, items=1760]
Found 321 total population variables
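Once the search results are filtered, the variable id for subsequent data calls can be read straight off the DataFrame. A sketch on a stand-in frame (ids taken from the search output shown earlier; the real frame comes from `search_variables()`):

```python
import pandas as pd

# Stand-in for a filtered search_variables() result
total_pop = pd.DataFrame({
    "id": [453193, 1365239],
    "n1": ["net migration per 1000 population", "total net migration per 1000 population"],
    "n2": ["total", "total"],
})

# Take the first matching id; variable ids are passed to the data API as strings
variable_id = str(total_pop.iloc[0]["id"])
print(variable_id)  # 453193
```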
Getting Regional Population Data¶
In [6]:
# Get population data for all voivodeships in 2021
pop_data = bdl.data.get_data_by_variable(
variable_id="3643", # Example: total population variable
years=[2021],
unit_level=2 # Voivodeship level
)
# Sort by population
pop_sorted = pop_data.sort_values('val', ascending=False)
print("Top 5 voivodeships by population:")
print(pop_sorted[['unit_name', 'val']].head())
Fetching 3643: 1 pages [00:00, 235.52 pages/s, items=16]
Top 5 voivodeships by population:
unit_name val
15 MAZOWIECKIE 12
1 ŚLĄSKIE 6
4 ZACHODNIOPOMORSKIE 5
8 POMORSKIE 4
10 ŁÓDZKIE 4
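A ranking like the one above is often easier to read as shares of the total. A pandas sketch on a stand-in frame mirroring the top-5 output (the real frame comes from `get_data_by_variable()`):

```python
import pandas as pd

# Stand-in for the sorted population DataFrame shown above
pop_sorted = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ZACHODNIOPOMORSKIE", "POMORSKIE", "ŁÓDZKIE"],
    "val": [12, 6, 5, 4, 4],
})

# Each region's share of the displayed total, in percent
pop_sorted["share_pct"] = (pop_sorted["val"] / pop_sorted["val"].sum() * 100).round(1)
print(pop_sorted)
```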
Economic Indicator Comparison¶
Finding Economic Variables¶
In [7]:
# Search for unemployment variables
unemployment_vars = bdl.variables.search_variables(name="unemployment")
print(f"Found {len(unemployment_vars)} unemployment variables")
# Search for GDP-related variables
gdp_vars = bdl.variables.search_variables(name="GDP")
print(f"Found {len(gdp_vars)} GDP-related variables")
Fetching search: 5 pages [00:28, 5.76s/ pages, items=458]
Found 458 unemployment variables
Fetching search: 1 pages [00:05, 5.25s/ pages, items=11]
Found 11 GDP-related variables
Comparing Voivodeships¶
In [8]:
# Get unemployment data for all voivodeships
unemployment_data = bdl.data.get_data_by_variable(
variable_id="1234", # Example unemployment variable ID
years=[2021],
unit_level=2
)
# Sort and display
sorted_unemployment = unemployment_data.sort_values('val', ascending=False)
print("Unemployment by voivodeship (2021):")
print(sorted_unemployment[['unit_name', 'val']].head(10))
Fetching 1234: 1 pages [00:00, 5.28 pages/s, items=16]
Unemployment by voivodeship (2021):
unit_name val
15 MAZOWIECKIE 766241
1 ŚLĄSKIE 594875
0 MAŁOPOLSKIE 544790
3 WIELKOPOLSKIE 411015
5 DOLNOŚLĄSKIE 334400
10 ŁÓDZKIE 315051
8 POMORSKIE 297680
12 LUBELSKIE 286248
13 PODKARPACKIE 268623
7 KUJAWSKO-POMORSKIE 199304
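To compare two indicators side by side, the result frames can be merged on the unit name. A pandas sketch with stand-in frames (in practice each frame comes from a separate `get_data_by_variable()` call; the numbers here are illustrative):

```python
import pandas as pd

# Stand-ins for two get_data_by_variable() results
unemployment = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ŁÓDZKIE"],
    "val": [766241, 594875, 315051],
})
population = pd.DataFrame({
    "unit_name": ["MAZOWIECKIE", "ŚLĄSKIE", "ŁÓDZKIE"],
    "val": [12, 6, 4],
})

# Merge on unit_name; suffixes keep the two 'val' columns apart
combined = unemployment.merge(
    population, on="unit_name", suffixes=("_unemployment", "_population")
)
print(combined.columns.tolist())
# ['unit_name', 'val_unemployment', 'val_population']
```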
Time Series Analysis¶
Preparing Data for Visualization¶
In [9]:
# Get time series data for a specific variable and region
time_series = bdl.data.get_data_by_variable(
variable_id="3643",
unit_level=2,
unit_parent_id="020000000000" # Example: Mazovian Voivodeship
)
# Filter to recent years
recent_years = time_series[time_series['year'] >= 2015]
# Sort by year
recent_years = recent_years.sort_values('year')
# Prepare for plotting
plot_data = recent_years[['year', 'val']].set_index('year')
print(plot_data.head())
Fetching 3643: 1 pages [00:00, 8.17 pages/s, items=3]
      val
year
2015    2
2015    3
2015    3
2016    1
2016    2
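With the series indexed by year, pandas can compute year-over-year change directly. A sketch on a stand-in frame; note that raw results can hold several rows per year (as in the output above), so the values are aggregated first:

```python
import pandas as pd

# Stand-in for recent_years[['year', 'val']]; aggregate before computing changes
recent_years = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017],
    "val": [2, 3, 1, 2, 4],
})

# One value per year, then percentage change between consecutive years
yearly = recent_years.groupby("year")["val"].sum()
change = yearly.pct_change() * 100
print(yearly.tolist())           # [5, 3, 4]
print(change.round(1).tolist())  # [nan, -40.0, 33.3]
```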
Error Handling¶
Handling Missing Data¶
In [10]:
# Check if data exists before processing
data = bdl.data.get_data_by_variable("3643", years=[2021], unit_level=2)
if data.empty:
print("No data available for this variable/year/level combination")
else:
# Process data
print(f"Found {len(data)} records")
print(data.head())
Fetching 3643: 1 pages [00:00, 258.60 pages/s, items=16]
Found 16 records
   year  val      unit_id           unit_name  attr_id
0  2021    2  11200000000         MAŁOPOLSKIE        1
1  2021    6  12400000000             ŚLĄSKIE        1
2  2021    0  20800000000            LUBUSKIE        0
3  2021    2  23000000000       WIELKOPOLSKIE        1
4  2021    5  23200000000  ZACHODNIOPOMORSKIE        1
Enrichment: Adding Human-Readable Labels¶
The access layer can automatically join reference data (levels, attributes, subjects, etc.)
onto the result DataFrame via the enrich parameter. Enrichment fetches each lookup table
once per session and caches it in memory.
In [1]:
from pybdl import BDL
bdl = BDL()
# Enrich variables with level names, measure descriptions, and subject names.
# Without enrichment, the DataFrame only contains raw IDs (level, measure_unit_id, subject_id).
variables = bdl.variables.search_variables(name="population", max_pages=1, enrich=["levels", "measures", "subjects"])
print("Variables with enrichment:")
print(variables[["id_x", "n1", "level", "level_name", "measure_unit_id", "measure_unit_description"]].head())
Fetching levels: 1 pages [00:00, 202.55 pages/s, items=8]
Fetching measures: 1 pages [00:00, 200.37 pages/s, items=75]
Fetching subjects: 1 pages [00:00, 243.37 pages/s, items=33]
Variables with enrichment:
id_x n1 level \
0 9179 concerning self-taxation of the population 5
1 1365239 total net migration per 1000 population 6
2 498816 net migration in internal movement per 1000 po... 6
3 745534 net migration abroad per 1000 population 6
4 453193 net migration per 1000 population 6
level_name measure_unit_id measure_unit_description
0 Poziom Powiatów 8 number of pieces
1 Poziom Gmin 26 number of persons
2 Poziom Gmin 26 number of persons
3 Poziom Gmin 26 number of persons
4 Poziom Gmin 26 number of persons
In [2]:
# Enrich data with attribute labels and unit details.
# 'attributes' adds attr_name / attr_symbol columns; 'units' adds unit_name_enriched, unit_level, etc.
data = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2,
enrich=["attributes", "units"],
)
print("Data with enriched attribute and unit columns:")
print(data[["unit_id", "unit_name_enriched", "year", "val", "attr_id", "attr_name"]].head())
Fetching 3643: 1 pages [00:00, 215.93 pages/s, items=16]
Fetching units: 10 pages [11:54, 71.46s/ pages, items=1000]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
(traceback truncated)
KeyboardInterrupt:
Note: the units enrichment downloads the full units lookup table (here 10 pages, roughly a minute per page under rate limiting), so this cell was interrupted manually. Enriching with attributes only, as in the next cell, is much faster.
In [23]:
# Combine enrichment with metadata retrieval.
# return_metadata=True returns a (DataFrame, metadata_dict) tuple.
df, meta = bdl.data.get_data_by_variable(
variable_id="3643",
years=[2021],
unit_level=2,
enrich=["attributes"],
return_metadata=True,
)
print(f"Total pages: {meta.get('totalPages')}, Total records: {meta.get('totalRecords')}")
print(df[["unit_id", "year", "val", "attr_name"]].head())
Fetching 3643: 1 pages [00:00, 161.21 pages/s, items=16]
Fetching attributes: 1 pages [00:00, 15.45 pages/s, items=18]
Total pages: None, Total records: 16
unit_id year val attr_name
0 11200000000 2021 2 value
1 12400000000 2021 6 value
2 20800000000 2021 0 0
3 23000000000 2021 2 value
4 23200000000 2021 5 value
Best Practices¶
- Use the access layer: prefer bdl.data over bdl.api.data for DataFrame output
- Enable caching: use use_cache=True for repeated queries
- Handle pagination: use max_pages=None to retrieve complete datasets
- Use enrichment: let the library add human-readable names automatically
- Async for bulk operations: use the async methods when fetching multiple datasets
- Error handling: always handle RateLimitError and check for empty results
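The rate-limit advice above can be wrapped in a small retry helper. This is a generic sketch, not part of pyBDL: the exception class is passed in as a parameter, since the exact import path for RateLimitError may vary between versions.

```python
import time

def fetch_with_retry(fetch, rate_limit_error, retries=3, backoff=2.0):
    """Call fetch(); on a rate-limit error, wait with linear backoff and retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except rate_limit_error:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(backoff * (attempt + 1))

# Hypothetical usage with pybdl (adjust the import to your installed version):
# from pybdl.exceptions import RateLimitError
# data = fetch_with_retry(
#     lambda: bdl.data.get_data_by_variable("3643", years=[2021], unit_level=2),
#     RateLimitError,
# )
```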
See also¶
- access_layer: access layer documentation
- rate_limiting: rate limiting details
- config: configuration options