{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Examples\n", "\n", "This section provides real-world examples demonstrating how to use pyBDL for common data analysis tasks.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Data Retrieval\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting Started\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching levels: 1 pages [00:00, 15.30 pages/s, items=8]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Administrative levels:\n", " id name\n", "0 0 Poziom Polski\n", "1 1 Poziom Makroregionów\n", "2 2 Poziom Województw\n", "3 3 Poziom Regionów\n", "4 4 Poziom Podregionów\n", "5 5 Poziom Powiatów\n", "6 6 Poziom Gmin\n", "7 7 Poziom miejscowości statystycznej\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Fetching years: 1 pages [00:00, 12.90 pages/s, items=31]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Available years: 1995 - 2025\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from pybdl import BDL\n", "\n", "# Initialize the client with its default configuration\n", "bdl = BDL()\n", "\n", "# List available administrative levels\n", "levels = bdl.levels.list_levels()\n", "print(\"Administrative levels:\")\n", "print(levels[['id', 'name']])\n", "\n", "# List available years\n", "years = bdl.years.list_years()\n", "print(f\"\\nAvailable years: {years['id'].min()} - {years['id'].max()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding Variables\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching search: 18 pages [00:42, 2.37s/ pages, items=1756]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Found 1756 population-related variables\n", " id n1\n", "0 9179 
concerning self-taxation of the population\n", "1 1365239 total net migration per 1000 population\n", "2 498816 net migration in internal movement per 1000 po...\n", "3 745534 net migration abroad per 1000 population\n", "4 453193 net migration per 1000 population\n", "\n", "Variable details:\n", " id n1 n2\n", "0 3643 total 30-39\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Search for population-related variables\n", "population_vars = bdl.variables.search_variables(name=\"population\")\n", "print(f\"Found {len(population_vars)} population-related variables\")\n", "print(population_vars[['id', 'n1']].head())\n", "\n", "# Get details for a specific variable\n", "var_details = bdl.variables.get_variable(\"3643\")\n", "print(\"\\nVariable details:\")\n", "print(var_details[['id', 'n1', 'n2']])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieving Data\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching 3643: 1 pages [00:00, 13.40 pages/s, items=16]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Retrieved 16 data points\n", " unit_name year val\n", "0 MAŁOPOLSKIE 2021 2\n", "1 ŚLĄSKIE 2021 6\n", "2 LUBUSKIE 2021 0\n", "3 WIELKOPOLSKIE 2021 2\n", "4 ZACHODNIOPOMORSKIE 2021 5\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Get data for a variable at voivodeship level (level 2)\n", "data = bdl.data.get_data_by_variable(\n", " variable_id=\"3643\",\n", " years=[2021],\n", " unit_level=2 # Voivodeship level\n", ")\n", "\n", "print(f\"Retrieved {len(data)} data points\")\n", "print(data[['unit_name', 'year', 'val']].head())\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching aggregates: 1 pages [00:00, 14.59 pages/s, items=8]\n" ] }, { "name": "stdout", 
"output_type": "stream", "text": [ " id name level \\\n", "0 1 TOTAL 7 \n", "1 2 URBAN GMINAS 5 \n", "2 3 URBAN-RURAL GMINAS 5 \n", "3 4 RURAL GMINAS 5 \n", "4 7 URBAN AREAS 5 \n", "5 8 RURAL AREAS 5 \n", "6 91 NP- Górnośląsko-Zagłębiowska Metropolia 3 \n", "7 92 NP- Metropolia Krakowska 3 \n", "\n", " description \n", "0 Aggregates for items collected on the level: <... \n", "1 It is a sum of data for urban gminas (unit typ... \n", "2 It is a sum of data for urban-rural gminas (un... \n", "3 It is a sum of these data for rural gminas (un... \n", "4 It is a sum of data for urban areas according... \n", "5 It is a sum of data for rural areas, i.e. rura... \n", "6 NaN \n", "7 NaN \n" ] } ], "source": [ "data_aggr = bdl.aggregates.list_aggregates()\n", "print(data_aggr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Population Analysis by Region\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding Population Variables\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching search: 18 pages [00:00, 516.38 pages/s, items=1756]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Found 321 total population variables\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Search for population variables\n", "pop_vars = bdl.variables.search_variables(name=\"population\")\n", "\n", "# Filter for total population (usually contains \"total\" or \"ogółem\")\n", "total_pop = pop_vars[\n", " pop_vars['n2'].str.contains('total', case=False, na=False)\n", "]\n", "print(f\"Found {len(total_pop)} total population variables\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting Regional Population Data\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching 3643: 1 pages [00:00, 216.45 pages/s, 
items=16]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Top 5 voivodeships by population:\n", " unit_name val\n", "15 MAZOWIECKIE 12\n", "1 ŚLĄSKIE 6\n", "4 ZACHODNIOPOMORSKIE 5\n", "8 POMORSKIE 4\n", "10 ŁÓDZKIE 4\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Get population data for all voivodeships in 2021\n", "pop_data = bdl.data.get_data_by_variable(\n", " variable_id=\"3643\", # Placeholder ID; substitute a total-population variable found above\n", " years=[2021],\n", " unit_level=2 # Voivodeship level\n", ")\n", "\n", "# Sort by value, largest first\n", "pop_sorted = pop_data.sort_values('val', ascending=False)\n", "print(\"Top 5 voivodeships by population:\")\n", "print(pop_sorted[['unit_name', 'val']].head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Economic Indicator Comparison\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding Economic Variables\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching search: 5 pages [00:00, 424.99 pages/s, items=458]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Found 458 unemployment variables\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Fetching search: 1 pages [00:03, 3.34s/ pages, items=11]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Found 11 GDP-related variables\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Search for unemployment variables\n", "unemployment_vars = bdl.variables.search_variables(name=\"unemployment\")\n", "print(f\"Found {len(unemployment_vars)} unemployment variables\")\n", "\n", "# Search for GDP-related variables\n", "gdp_vars = bdl.variables.search_variables(name=\"GDP\")\n", "print(f\"Found {len(gdp_vars)} GDP-related variables\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing Voivodeships\n" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching 1234: 1 pages [00:00, 3.31 pages/s, items=16]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Unemployment by voivodeship (2021):\n", " unit_name val\n", "15 MAZOWIECKIE 766241\n", "1 ŚLĄSKIE 594875\n", "0 MAŁOPOLSKIE 544790\n", "3 WIELKOPOLSKIE 411015\n", "5 DOLNOŚLĄSKIE 334400\n", "10 ŁÓDZKIE 315051\n", "8 POMORSKIE 297680\n", "12 LUBELSKIE 286248\n", "13 PODKARPACKIE 268623\n", "7 KUJAWSKO-POMORSKIE 199304\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Get unemployment data for all voivodeships\n", "unemployment_data = bdl.data.get_data_by_variable(\n", " variable_id=\"1234\", # Example unemployment variable ID\n", " years=[2021],\n", " unit_level=2\n", ")\n", "\n", "# Sort and display\n", "sorted_unemployment = unemployment_data.sort_values('val', ascending=False)\n", "print(\"Unemployment by voivodeship (2021):\")\n", "print(sorted_unemployment[['unit_name', 'val']].head(10))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time Series Analysis\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing Data for Visualization\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching 3643: 1 pages [00:00, 17.97 pages/s, items=3]" ] }, { "name": "stdout", "output_type": "stream", "text": [ " val\n", "year \n", "2015 2\n", "2015 3\n", "2015 3\n", "2016 1\n", "2016 2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "import pandas as pd\n", "\n", "# Get time series data for a specific variable and region\n", "time_series = bdl.data.get_data_by_variable(\n", " variable_id=\"3643\",\n", " unit_level=2,\n", " unit_parent_id=\"020000000000\" # Example: Mazovian Voivodeship\n", ")\n", "\n", "# Filter to recent 
years\n", "recent_years = time_series[time_series['year'] >= 2015]\n", "\n", "# Sort by year\n", "recent_years = recent_years.sort_values('year')\n", "\n", "# Prepare for plotting\n", "plot_data = recent_years[['year', 'val']].set_index('year')\n", "print(plot_data.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Error Handling\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Handling Missing Data\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fetching 3643: 1 pages [00:00, 217.25 pages/s, items=16]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Found 16 records\n", " year val unit_id unit_name attr_id\n", "0 2021 2 11200000000 MAŁOPOLSKIE 1\n", "1 2021 6 12400000000 ŚLĄSKIE 1\n", "2 2021 0 20800000000 LUBUSKIE 0\n", "3 2021 2 23000000000 WIELKOPOLSKIE 1\n", "4 2021 5 23200000000 ZACHODNIOPOMORSKIE 1\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Check if data exists before processing\n", "data = bdl.data.get_data_by_variable(\"3643\", years=[2021], unit_level=2)\n", "\n", "if data.empty:\n", " print(\"No data available for this variable/year/level combination\")\n", "else:\n", " # Process data\n", " print(f\"Found {len(data)} records\")\n", " print(data.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Best Practices\n", "\n", "1. **Use the access layer**: Prefer ``bdl.data`` over ``bdl.api.data`` for DataFrame output\n", "2. **Enable caching**: Use ``use_cache=True`` for repeated queries\n", "3. **Handle pagination**: Use ``max_pages=None`` to get complete datasets\n", "4. **Use enrichment**: Let the library automatically add human-readable names\n", "5. **Async for bulk operations**: Use async methods when fetching multiple datasets\n", "6. 
**Error handling**: Always handle ``RateLimitError`` and check for empty results\n", "\n", "```{seealso}\n", "- {doc}`access_layer` for access layer documentation\n", "- {doc}`rate_limiting` for rate limiting details\n", "- {doc}`config` for configuration options\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" } }, "nbformat": 4, "nbformat_minor": 2 }