filter-stations

Installation

To install the package, run the following command in your terminal:

pip install -U filter-stations

Getting Started

All methods require an API key and secret, which can be obtained by contacting TAHMO.

  • The retreive_data class is used to retrieve data from the TAHMO API endpoints.
  • The Filter class is used to filter weather station data by criteria such as distance and region.
  • The pipeline class is used to create a pipeline of filters to apply to weather stations based on how they correlate with water level data.
  • The Interactive_maps class is used to plot weather stations on an interactive map.
  • The Water_level class is used to retrieve water level data and coordinates of gauging stations.
# Import the necessary modules
from filter_stations import retreive_data, Filter, pipeline, Interactive_maps

# Define the API key and secret
apiKey = 'your_api_key' # request from TAHMO
apiSecret = 'your_api_secret' # request from TAHMO
maps_key = 'your_google_maps_key' # retrieve from google maps platform

# Initialize the class
ret = retreive_data(apiKey, apiSecret, maps_key)
fs = Filter(apiKey, apiSecret, maps_key)
pipe = pipeline(apiKey, apiSecret, maps_key)
maps = Interactive_maps(apiKey, apiSecret, maps_key)
API_BASE_URL = 'https://datahub.tahmo.org'
API_MAX_PERIOD = '365D'
endpoints = {'VARIABLES': 'services/assets/v2/variables',
             'STATION_INFO': 'services/assets/v2/stations',
             'WEATHER_DATA': 'services/measurements/v2/stations',
             'DATA_COMPLETE': 'custom/sensordx/latestmeasurements',
             'STATION_STATUS': 'custom/stations/status',
             'QUALITY_OBJECTS': 'custom/sensordx/reports'}
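These constants suggest how request URLs are formed: the base URL joined with an endpoint path. A hypothetical sketch (illustrative only; the package assembles its requests internally):

```python
API_BASE_URL = 'https://datahub.tahmo.org'

# Subset of the endpoint paths listed above
endpoints = {'VARIABLES': 'services/assets/v2/variables',
             'STATION_INFO': 'services/assets/v2/stations'}

def build_url(base, endpoint_key):
    # Join the base URL and the endpoint path with a single slash
    return f"{base.rstrip('/')}/{endpoints[endpoint_key]}"
```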
class retreive_data:
retreive_data(apiKey, apiSecret, api_key)
apiKey
apiSecret
api_key
def get_stations_info(self, station=None, multipleStations=[], countrycode=None):

Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.

Parameters:

  • station (str, optional): Code for a single station to retrieve information for. Defaults to None.
  • multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
  • countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing information about the requested weather stations.

Usage:

To retrieve information about a single station:

station_info = ret.get_stations_info(station='TA00001')

To retrieve information about multiple stations:

station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])

To retrieve information about all stations in a country:

station_info = ret.get_stations_info(countrycode='KE')
def get_coordinates(self, station_sensor, normalize=False):

Retrieves longitude and latitude coordinates for a list of station_sensor names; coordinates are duplicated for stations with multiple sensors.

Parameters:

  • station_sensor (list): List of station_sensor names.
  • normalize (bool): If True, normalize the coordinates using MinMaxScaler to the range (0,1).

Returns:

  • pd.DataFrame: DataFrame containing longitude and latitude coordinates for each station_sensor.

Usage:

To retrieve coordinates

start_date = '2023-01-01'
end_date = '2023-12-31'
country = 'Kenya'

# get the precipitation data for the stations
ke_pr = fs.filter_pr(start_date=start_date, end_date=end_date,
                     country=country).set_index('Date')

# get the coordinates
xs = ret.get_coordinates(ke_pr.columns, normalize=True)
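With normalize=True the coordinates are rescaled to the (0, 1) range using MinMaxScaler. A minimal pure-pandas sketch of that rescaling (hypothetical data; not the package's internal code):

```python
import pandas as pd

def min_max_normalize(df):
    # Rescale each column to the range [0, 1]: (x - min) / (max - min)
    return (df - df.min()) / (df.max() - df.min())

# Hypothetical longitude/latitude values
coords = pd.DataFrame({'longitude': [36.8, 37.2, 36.9],
                       'latitude': [-1.3, -0.4, -1.0]})
normalized = min_max_normalize(coords)
```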
def get_variables(self):

Retrieves information about available weather variables from an API endpoint.

Returns:

  • dict: Dictionary containing information about available weather variables, keyed by variable shortcode.
def k_neighbours(self, station, number=5):

Returns a dictionary of the nearest neighbouring stations to the specified station.

Parameters:

  • station (str): Code for the station to find neighbouring stations for.
  • number (int, optional): Number of neighbouring stations to return. Defaults to 5.

Returns:

  • dict: Dictionary containing the station codes and distances of the nearest neighbouring stations.
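The ranking is presumably by great-circle distance from the target station. A self-contained sketch with hypothetical coordinates and the haversine formula (illustrative only; not the package's internal code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points in kilometres
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical station coordinates: code -> (lat, lon)
stations = {'TA00001': (-1.3, 36.8), 'TA00002': (-0.4, 37.2), 'TA00003': (-3.2, 40.1)}

def k_neighbours_sketch(target, number=2):
    lat, lon = stations[target]
    others = {code: haversine_km(lat, lon, *coords)
              for code, coords in stations.items() if code != target}
    # Keep the `number` closest stations, sorted by distance
    return dict(sorted(others.items(), key=lambda kv: kv[1])[:number])
```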
def station_status(self):

Retrieves the status of all weather stations

Returns:

  • pandas.DataFrame: DataFrame containing the status of all weather stations.
def trained_models(self, columns=None):

Retrieves trained models from the MongoDB.

Parameters:

  • columns (list of str, optional): List of column names to include in the returned DataFrame. If None, all columns are included. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing trained models with the specified columns.
def aggregate_variables(self, dataframe, freq='1D', method='sum'):

Aggregates a pandas DataFrame of weather variables by applying a specified method across a given frequency.

Parameters:

  • dataframe (pandas.DataFrame): DataFrame containing weather variable data.
  • freq (str, optional): Frequency to aggregate the data by. Defaults to '1D'. Examples include '1H' for hourly, '12H' for every 12 hours, '1D' for daily, '1W' for weekly, '1M' for monthly, etc.
  • method (str or callable, optional): Method to use for aggregation. Defaults to 'sum'. Acceptable string values are 'sum', 'mean', 'min', 'max'. Alternatively, you can provide a custom aggregation function (callable).

                                Example of a custom method:

                                def custom_median(x):
                                    return np.nan if x.isnull().all() else x.median()

                                daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)

Returns:

  • pandas.DataFrame: DataFrame containing aggregated weather variable data according to the specified frequency and method.

Usage:

Define the DataFrame containing the weather variable data:

dataframe = ret.get_measurements('TA00001', '2020-01-01', '2020-01-31', ['pr']) # data comes in 5 minute interval

To aggregate data hourly:

hourly_data = aggregate_variables(dataframe, freq='1H')

To aggregate data by 12 hours:

half_day_data = aggregate_variables(dataframe, freq='12H')

To aggregate data by day:

daily_data = aggregate_variables(dataframe, freq='1D')

To aggregate data by week:

weekly_data = aggregate_variables(dataframe, freq='1W')

To aggregate data by month:

monthly_data = aggregate_variables(dataframe, freq='1M')

To use a custom aggregation method:

def custom_median(x):
    return np.nan if x.isnull().all() else x.median()

daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)
def aggregate_qualityflags(self, dataframe, freq='1D'):

Aggregate quality flags in a DataFrame by day.

Parameters:

  • dataframe (pd.DataFrame): The DataFrame containing the measurements.

Returns:

  • pd.DataFrame: A DataFrame with aggregated quality flags, where values greater than 1 are rounded up.
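The description suggests a resample to the target frequency in which aggregated flag values above 1 are rounded up to the next whole number. A plausible pandas sketch (not necessarily the package's exact implementation):

```python
import numpy as np
import pandas as pd

def aggregate_flags_sketch(df, freq='1D'):
    # Resample to the target frequency, then round mean values
    # greater than 1 up to the next whole number
    agg = df.resample(freq).mean()
    return agg.where(agg <= 1, np.ceil(agg))

# Hypothetical 12-hourly flags over two days
idx = pd.date_range('2023-01-01', periods=4, freq='12h')
flags = pd.DataFrame({'TA00001_clogflag': [0, 1, 2, 3]}, index=idx)
daily = aggregate_flags_sketch(flags)
```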
def get_measurements( self, station, startDate=None, endDate=None, variables=None, dataset='controlled', aggregate='5min', quality_flags=False):

Get measurements from a station.

Parameters:

  • station (str): The station ID.
  • startDate (str, optional): The start date of the measurement period in the format 'YYYY-MM-DD'.
  • endDate (str, optional): The end date of the measurement period in the format 'YYYY-MM-DD'.
  • variables (list, optional): The variables to retrieve measurements for. If None, all variables are retrieved.
  • dataset (str, optional): The dataset to retrieve measurements from. Default is 'controlled'.
  • aggregate (str, optional): Aggregation interval for the measurements (e.g., '5min', '30min', '1D'). Default is '5min'.
  • quality_flags (bool, optional): Whether to include quality flag data. Default is False.

Returns:

  • A DataFrame containing the measurements.

Usage:

To retrieve precipitation data for a station for the last month:

from datetime import datetime, timedelta

# Get today's date
today = datetime.now()

# Calculate one month ago
last_month = today - timedelta(days=30)

# Format date as a string
last_month_str = last_month.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the station you want to retrieve data from
station = 'TA00001'
variables = ['pr']
dataset = 'raw'

# aggregate the data to 30 minutes interval
aggregate = '30min'

# Call the get_measurements method to retrieve and aggregate data
TA00001_data = ret.get_measurements(station, last_month_str, 
                                    today_str, variables, 
                                    dataset, aggregate)
def multiple_measurements( self, stations_list, startDate, endDate, variables, dataset='controlled', csv_file=None, aggregate='1D', quality_flags=False):

Retrieves measurements for multiple stations within a specified date range.

Parameters:

  • stations_list (list): A list of strings containing the codes of the stations to retrieve data from.
  • startDate (str): The start date for the measurements, in the format 'yyyy-mm-dd'.
  • endDate (str): The end date for the measurements, in the format 'yyyy-mm-dd'.
  • variables (list): A list of strings containing the names of the variables to retrieve.
  • dataset (str): The name of the database to retrieve the data from. Default is 'controlled' alternatively 'raw' database.
  • csv_file (str, optional): Name of the CSV file to save the data to. If None, the DataFrame is returned instead.
  • aggregate (str): Frequency to aggregate the data by (e.g., '1D' for daily). Default is '1D'; data is otherwise available at 5-minute intervals.
  • quality_flags (bool, optional): Whether to include quality flag data. Default is False.

Returns:

  • df (pandas.DataFrame): A DataFrame containing the aggregated data for all stations.

Raises:

  • ValueError: If stations_list is not a list.

Example Usage:

To retrieve precipitation data for stations in Kenya for the last week and save it as a csv file:

# Import the necessary modules
from datetime import datetime, timedelta
from filter_stations import retreive_data

# An instance of the retreive_data class
ret = retreive_data(apiKey, apiSecret, maps_key)

# Get today's date
today = datetime.now()

# Calculate one week ago
last_week = today - timedelta(days=7)

# Format date as a string
last_week_str = last_week.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the list of stations you want to retrieve data from, for example stations in Kenya
stations = list(ret.get_stations_info(countrycode='KE')['code'])

# Get the precipitation data for the stations in the list
variables = ['pr']

# retrieve the raw data for the stations, aggregate the data and save it as a csv file
dataset = 'raw'
aggregate = '1D'
csv_file = 'Kenya_precipitation_data'

# Call the multiple_measurements method to retrieve and aggregate data
aggregated_data = ret.multiple_measurements(stations, last_week_str, 
                                            today_str, variables, 
                                            dataset, csv_file, aggregate)
def multiple_qualityflags(self, stations_list, startDate, endDate, csv_file=None):

Retrieves and aggregates quality flag data for multiple stations within a specified date range.

Parameters:

  • stations_list (list): A list of station codes for which to retrieve data.
  • startDate (str): The start date in 'YYYY-MM-DD' format.
  • endDate (str): The end date in 'YYYY-MM-DD' format.
  • csv_file (str, optional): The name of the CSV file to save the aggregated data. Default is None.

Returns:

  • pandas.DataFrame or None: A DataFrame containing the aggregated quality flag data for the specified stations, or None if an error occurs.

Raises:

  • Exception: If an error occurs while retrieving data for a station.

def anomalies_report(self, start_date, end_date=None):

Retrieves anomaly reports for a specified date range.

Parameters:

  • start_date (str): The start date for the report in 'yyyy-mm-dd' format.
  • end_date (str, optional): The end date for the report in 'yyyy-mm-dd' format. If not provided, only data for the start_date is returned.

Returns:

  • pandas.DataFrame: A DataFrame containing anomaly reports with columns 'startDate', 'station_sensor', and 'level'. The 'startDate' column is used as the index.

Raises:

  • Exception: If there's an issue with the API request.

Usage:

To retrieve anomaly reports for a specific date range:

start_date = '2023-01-01'
end_date = '2023-01-31'
report_data = ret.anomalies_report(start_date, end_date)

To retrieve anomaly reports for a specific date:

start_date = '2023-01-01'
report_data = ret.anomalies_report(start_date)
def ground_truth(self, start_date, end_date=None):

Retrieves ground truth data for a specified date range.

Parameters:

  • start_date (str): The start date for the report in 'yyyy-mm-dd' format.
  • end_date (str, optional): The end date for the report in 'yyyy-mm-dd' format. If not provided, only data for the start_date is returned.

Returns:

  • pandas.DataFrame: A DataFrame containing ground truth data with columns 'startDate', 'station_sensor', 'description' and 'level'. The 'startDate' column is used as the index.

Raises:

  • Exception: If there's an issue with the API request.

Usage:

To retrieve ground truth data for a specific date range:

start_date = '2023-01-01'
end_date = '2023-01-31'
report_data = ret.ground_truth(start_date, end_date)

To retrieve ground truth data for a specific date:

start_date = '2023-01-01'
report_data = ret.ground_truth(start_date)
class Kieni:
Kieni(api_key, api_secret)
api_key
api_secret
def kieni_weather_data( self, start_date=None, end_date=None, variable=None, method='sum', freq='1D'):

Retrieves weather data from the Kieni API endpoint and returns it as a pandas DataFrame after processing.

Parameters:

  • start_date (str, optional): The start date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None; if None, data is returned from the beginning of the record.
  • end_date (str, optional): The end date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None; if None, data is returned up to the end of the record.
  • variable (str, optional): The weather variable to retrieve, using the same shortcodes as TAHMO (e.g., 'pr', 'ap', 'rh').
  • method (str, optional): The aggregation method to apply to the data ('sum', 'mean', 'min', 'max', or a custom function). Defaults to 'sum'.
  • freq (str, optional): The frequency for data aggregation (e.g., '1D' for daily, '1H' for hourly). Defaults to '1D'.

Returns:

  • pandas.DataFrame: DataFrame containing the weather data for the specified parameters, with columns containing NaN values dropped.

Usage:

To retrieve daily rainfall data from January 1, 2024, to January 31, 2024:

# Instantiate the Kieni class
api_key, api_secret = '', '' # Request DSAIL for the API key and secret
kieni = Kieni(api_key, api_secret)

kieni_weather_data = kieni.kieni_weather_data(start_date='2024-01-01', end_date='2024-01-31', variable='pr', freq='1D', method='sum')

To retrieve hourly temperature data from February 1, 2024, to February 7, 2024:

kieni_weather_data = kieni.kieni_weather_data(start_date='2024-02-01', end_date='2024-02-07', variable='te', method='mean', freq='1H')
class pipeline(retreive_data):
pipeline(apiKey, apiSecret, api_key)
def stations_within_radius(self, radius, latitude, longitude, df=False):

Retrieves stations within a specified radius from a given latitude and longitude.

Parameters:

  • radius (float): Radius (in kilometers) within which to search for stations.
  • latitude (float): Latitude of the center point.
  • longitude (float): Longitude of the center point.
  • df (bool, optional): Flag indicating whether to return the result as a DataFrame. Defaults to False.

Returns:

  • DataFrame or list: DataFrame or list containing the stations within the specified radius. If df is True, a DataFrame is returned with the columns 'code', 'location.latitude', 'location.longitude', and 'distance'. If df is False, a list of station codes is returned.
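Conceptually, the method keeps stations whose distance from the centre point does not exceed the radius, returning either a DataFrame or a plain list of codes. A toy sketch with precomputed distances (hypothetical data; not the package's internal code):

```python
import pandas as pd

# Hypothetical stations with precomputed distances (km) from the centre point
stations = pd.DataFrame({
    'code': ['TA00001', 'TA00002', 'TA00003'],
    'distance': [12.5, 85.0, 240.0],
})

def within_radius(df, radius, as_df=False):
    # Mirror the documented behaviour: DataFrame when requested,
    # a plain list of station codes otherwise
    subset = df[df['distance'] <= radius]
    return subset if as_df else subset['code'].tolist()
```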
def stations_data_check( self, stations_list, percentage=1, start_date=None, end_date=None, data=None, variables=['pr'], csv_file=None):

Performs a data check on the stations' data and returns the stations with a percentage of missing data below a threshold.

Parameters:

  • stations_list (list): List of station names or IDs.
  • percentage (float, optional): Minimum fraction of non-missing data required to keep a station, between 0 and 1. Defaults to 1 (i.e., no missing data allowed).
  • start_date (str, optional): Start date for the data range in the format 'YYYY-MM-DD'. Defaults to None.
  • end_date (str, optional): End date for the data range in the format 'YYYY-MM-DD'. Defaults to None.
  • data (DataFrame, optional): Preloaded data for the stations. Defaults to None.
  • variables (list, optional): List of variables to consider for the data check. Defaults to ['pr'].
  • csv_file (str, optional): File name for saving the data as a CSV file. Defaults to None.

Returns:

  • DataFrame: DataFrame containing the stations' data with less than the specified percentage of missing data.
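The percentage threshold can be read as the minimum fraction of non-missing records a station must have. A pandas sketch of that completeness filter (hypothetical data; not the package's internal code):

```python
import numpy as np
import pandas as pd

def completeness_filter(df, percentage=1.0):
    # Keep columns whose fraction of non-missing values is at least `percentage`
    complete_enough = df.notna().mean() >= percentage
    return df.loc[:, complete_enough]

# Hypothetical station columns: TA00002 is half missing
data = pd.DataFrame({'TA00001': [1.0, 2.0, 3.0, 4.0],
                     'TA00002': [1.0, np.nan, np.nan, 4.0]})
filtered = completeness_filter(data, percentage=0.9)
```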
def calculate_lag( self, weather_stations_data, water_level_data, lag=3, above=None, below=None):

Calculates the lag and coefficient of correlation between weather station data and water level data, identifying stations with positive correlations.

Parameters:

  • weather_stations_data (DataFrame): A DataFrame containing weather station data columns for analysis.
  • water_level_data (Series): A time series of water level data used for correlation analysis.
  • lag (int): The maximum lag, in hours, to consider for correlation. Default is 3 hours.
  • above (float or None): If specified, stations with correlations and lags above this threshold are identified.
  • below (float or None): If specified, stations with correlations and lags below this threshold are identified.

Returns:

  • above_threshold_lag (dict): A dictionary where keys represent weather station column names, and values represent the lag in hours if positive correlation exceeds the specified threshold (above).
  • below_threshold_lag (dict): A dictionary where keys represent weather station column names, and values represent the lag in hours if positive correlation falls below the specified threshold (below).
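Lagged correlation of this kind can be sketched with pandas shift and corr. The example below uses synthetic data in which the water level follows rainfall with a 2-step lag (illustrative only; not the package's internal code):

```python
import numpy as np
import pandas as pd

def best_lag(rain, level, max_lag=3):
    # Correlate the water level with progressively lagged rainfall and
    # return (lag, correlation) for the strongest positive correlation
    scores = {lag: level.corr(rain.shift(lag)) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: scores[k])
    return lag, scores[lag]

# Synthetic data: water level tracks rainfall shifted by 2 steps, plus noise
rng = np.random.default_rng(0)
rain = pd.Series(rng.random(100))
level = rain.shift(2).fillna(0) + rng.normal(0, 0.01, 100)
lag, corr = best_lag(rain, level)
```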
def shed_stations( self, weather_stations_data, water_level_data, gauging_station_coords, radius, lag=3, percentage=1):

Filters and processes weather station data to identify stations potentially contributing to water level changes above or below specified thresholds.

Parameters:

  • weather_stations_data (DataFrame): A DataFrame containing weather station data over a specific date range.
  • water_level_data (Series): A time series of water level data corresponding to the same date range as weather_stations_data.
  • gauging_station_coords (tuple): A tuple containing latitude and longitude coordinates of the gauging station.
  • radius (float): The radius in kilometers for identifying nearby weather stations.
  • lag (int): The time lag, in hours, used for correlation analysis. Default is 3 hours.
  • percentage (float): The minimum percentage of valid data required for a weather station to be considered. Default is 1 (100%).
  • above (float or None): The threshold above which water level changes are considered significant. If provided, stations contributing to changes above this threshold are identified.
  • below (float or None): The threshold below which water level changes are considered significant. If provided, stations contributing to changes below this threshold are identified.

Returns:

  • above_threshold_lag (list): List of weather stations with positive correlations and lagged changes above the specified threshold.
  • below_threshold_lag (list): List of weather stations with positive correlations and lagged changes below the specified threshold.

Usage:

Get the TAHMO stations that correlate with the water level data

import pandas as pd
from filter_stations import pipeline

# An instance of the pipeline class
pipe = pipeline(apiKey, apiSecret, maps_key)

# load the water level data and the weather stations data
water_level_data = pd.read_csv('water_level_data.csv')
weather_stations_data = pd.read_csv('weather_stations_data.csv') 

# get the coordinates of the gauging station
gauging_station_coords = (-0.416, 36.951)

# get the stations within a radius of 200km from the gauging station
radius = 200

# get the stations that correlate with the water level data
above_threshold_lag, below_threshold_lag = pipe.shed_stations(weather_stations_data, water_level_data, 
                                                              gauging_station_coords, radius, 
                                                              lag=3, percentage=1)
def plot_figs( self, weather_stations, water_list, threshold_list, save=False, dpi=500, date='11-02-2021'):

Plots figures showing the relationship between rainfall and water level/stage against time.

Parameters:

  • weather_stations (DataFrame): DataFrame containing weather station data.
  • water_list (list): List of water levels/stages.
  • threshold_list (list): List of columns in the weather_stations DataFrame to plot.
  • save (bool, optional): Flag indicating whether to save the figures as PNG files. Defaults to False.
  • dpi (int, optional): Dots per inch for saving the figures. Defaults to 500.
  • date (str, optional): Start date for plotting in the format 'dd-mm-yyyy'. Defaults to '11-02-2021'.

Returns:

  • Displays the plots; if save is True, the figures are also saved as PNG files in the current directory.
Figure: Muringato rainfall and water level plots
class Filter(pipeline):
Filter(apiKey, apiSecret, api_key)
def get_stations_info(self, station=None, multipleStations=[], countrycode=None):

Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.

Parameters:

  • station (str, optional): Code for a single station to retrieve information for. Defaults to None.
  • multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
  • countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing information about the requested weather stations.

Usage:

To retrieve information about a single station:

station_info = ret.get_stations_info(station='TA00001')

To retrieve information about multiple stations:

station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])

To retrieve information about all stations in a country:

station_info = ret.get_stations_info(countrycode='KE')
def centre_point(self, address):

This method retrieves the latitude and longitude coordinates of a given address using the Google Maps Geocoding API.

Parameters:

  • address (str): The address of the location to retrieve the coordinates for. The lookup uses the Google Maps Geocoding API key supplied when the class was initialized.

Returns:

  • Tuple (float, float) or None: The latitude and longitude coordinates of the location if found, or None if the address is not found.
def calculate_new_point(self, lat, lon, distance, bearing):

Calculates a new geographic point based on the given latitude, longitude, distance and bearing.

Parameters:

  • lat (float): The latitude of the starting point in decimal degrees.
  • lon (float): The longitude of the starting point in decimal degrees.
  • distance (float): The distance in kilometers from the starting point to the new point.
  • bearing (float): The bearing in degrees from the starting point to the new point, measured clockwise from true north.

Returns:

  • Tuple[float, float]: A tuple containing the latitude and longitude of the new point, respectively, in decimal degrees.
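This is the standard spherical destination-point computation. A sketch assuming an Earth radius of 6371 km (the package's exact constants may differ):

```python
from math import radians, degrees, sin, cos, asin, atan2

def destination_point(lat, lon, distance_km, bearing_deg, radius_km=6371.0):
    # Spherical destination point: travel `distance_km` from (lat, lon)
    # along `bearing_deg`, measured clockwise from true north
    lat1, lon1, brng = radians(lat), radians(lon), radians(bearing_deg)
    d = distance_km / radius_km  # angular distance in radians
    lat2 = asin(sin(lat1) * cos(d) + cos(lat1) * sin(d) * cos(brng))
    lon2 = lon1 + atan2(sin(brng) * sin(d) * cos(lat1),
                        cos(d) - sin(lat1) * sin(lat2))
    return degrees(lat2), degrees(lon2)
```

Travelling about 111.19 km due north from the equator moves the latitude close to one degree, which makes a quick sanity check possible.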
def compute_filter(self, lat, lon, distance):

Calculates the bounding box coordinates for a given location and distance.

Parameters:

  • lat (float): The latitude of the location.
  • lon (float): The longitude of the location.
  • distance (float): The distance from the location, in kilometers, to the edge of the bounding box.

Returns:

  • A tuple containing four floats representing the bounding box coordinates: (min_lat, min_lon, max_lat, max_lon).
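A bounding box of this kind can be approximated by converting the distance to degrees (about 111.32 km per degree of latitude, scaled by cos(latitude) for longitude). A minimal sketch of that approximation (hypothetical; not the package's exact method):

```python
from math import cos, radians

def bounding_box(lat, lon, distance_km):
    # Approximate degrees spanned by `distance_km` at this latitude
    dlat = distance_km / 111.32
    dlon = distance_km / (111.32 * cos(radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)
```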
def filter_stations( self, address, distance, startDate=None, endDate=None, csvfile='pr_clog_flags.csv'):

This method filters weather station data within a certain distance from a given address.

Parameters:

  • address (str): Address to center the bounding box around.
  • distance (float): The distance (in kilometers) from the center to the edge of the bounding box.
  • startDate (str): The start date for filtering the weather station data in the format 'YYYY-MM-DD'.
  • endDate (str): The end date for filtering the weather station data in the format 'YYYY-MM-DD'.
  • csvfile (str): The name of the csv file containing the weather station data.

Returns:

  • pandas.DataFrame: The filtered weather station data within the bounding box.
def filter_stations_list(self, address, distance=100):

Filters stations based on their proximity to a given address and returns a list of station codes that fall within the specified distance.

Parameters:

  • address (str): Address to filter stations by.
  • distance (float, optional): Maximum distance (in kilometers) between the stations and the address. Default is 100 km.

Returns:

  • List of station codes that fall within the specified distance from the given address.
def stations_region(self, region, plot=False):

Subsets weather stations by a specific geographical region and optionally plots them on a map with a scale bar.

Parameters:

  • region (str): The name of the region to subset stations from (47 Kenyan counties).
  • plot (bool, optional): If True, a map with stations and a scale bar is plotted. Default is False.

Returns:

  • list or None: If plot is False, returns a list of station codes in the specified region. Otherwise, returns None.

Usage:

To get a list of station codes in the 'Nairobi' region without plotting:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
station_list = fs.stations_region('Nairobi')

To subset stations in the 'Nairobi' region and display them on a map with a scale bar:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
fs.stations_region('Nairobi', plot=True)
Figure: Nairobi region map with stations and scale bar
def remove_zero_columns(self, df):

Removes columns with all zeros from a DataFrame.

Parameters:

  • df (DataFrame): The DataFrame to remove columns from.

Returns:

  • DataFrame: The DataFrame with columns containing all zeros removed.
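In pandas this amounts to a boolean column mask; a small sketch with hypothetical data:

```python
import pandas as pd

def drop_all_zero_columns(df):
    # Keep only columns with at least one non-zero value
    return df.loc[:, (df != 0).any()]

data = pd.DataFrame({'TA00001': [0.0, 1.2, 0.0],
                     'TA00002': [0.0, 0.0, 0.0]})
cleaned = drop_all_zero_columns(data)
```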
def filter_pr( self, start_date, end_date, country=None, region=None, radius=None, multiple_stations=None, station=None):

Retrieves precipitation data from BigQuery based on specified parameters.

Parameters:

  • start_date (str): Start date for data query.
  • end_date (str): End date for data query.
  • country (str): Country name for filtering stations.
  • region (str): Region name for filtering stations.
  • radius (float): Radius (in kilometers) for selecting stations within the specified region.
  • multiple_stations (list): List of station IDs.
  • station (str): Single station ID for data filtering.

Returns:

  • pd.DataFrame: A Pandas DataFrame containing the filtered precipitation data.

Usage:

To get precipitation data for a specific date range:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
pr_data = fs.filter_pr(start_date, end_date)

To get precipitation data for a specific date range and country:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
country = 'Kenya'
pr_data = fs.filter_pr(start_date, end_date, country=country)

To get precipitation data for a specific date range and region:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
pr_data = fs.filter_pr(start_date, end_date, region=region)

To get precipitation data for a specific date range and region with a radius:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
radius = 100
pr_data = fs.filter_pr(start_date, end_date, region=region, radius=radius)

To get precipitation data for a specific date range and multiple stations:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
multiple_stations = ['TA00001', 'TA00002', 'TA00003']
pr_data = fs.filter_pr(start_date, end_date, multiple_stations=multiple_stations)

To get precipitation data for a specific date range and a single station:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
station = 'TA00001'
pr_data = fs.filter_pr(start_date, end_date, station=station)
def clogs( self, startdate, enddate, flags_json='qualityobjects.json', as_csv=False, csv_file=None):

Generate clog flags DataFrame based on start and end dates.

Parameters:

  • startdate (str): Start date in 'YYYY-MM-DD' format.
  • enddate (str): End date in 'YYYY-MM-DD' format.
  • flags_json (str, optional): Path to the JSON file containing clog flags data. Defaults to 'qualityobjects.json'.
  • as_csv (bool, optional): Whether to save the resulting DataFrame as a CSV file. Defaults to False.
  • csv_file (str, optional): Name of the CSV file to save. Only applicable if as_csv is True. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing the clog flags.
class Interactive_maps(retreive_data):
Interactive_maps(apiKey, apiSecret, api_key)
def draw_map(self, map_center):

Creates a Folium map centered on the specified location and adds markers for each weather station in the area.

Parameters:

  • map_center: a tuple with the latitude and longitude of the center of the map

Returns:

  • A Folium map object
def animation_image( self, sensors, start_date, end_date, day=100, T=10, interval=500, data=None):

Creates an animation of pollutant levels for a given range of days and valid sensors.

Parameters:

  • data (DataFrame, optional): A pandas DataFrame containing station data. Defaults to None; if None, data is read from pr_clog_flags.
  • sensors (list): A list of valid sensor names.
  • day (int): The starting day of the animation (default is 100).
  • T (int): The range of days for the animation (default is 10).
  • interval (int): The interval between frames in milliseconds (default is 500).

Returns:

  • HTML: An HTML object containing the animation.
def animation_grid(self, mu_pred, xi, xj, valid_station_df, clogged_station_df, T=10):

Creates an animation of the predicted data on a grid over time.

Parameters:

  • mu_pred (ndarray): The predicted data on a grid over time.
  • xi (ndarray): The x-coordinates of the grid.
  • xj (ndarray): The y-coordinates of the grid.
  • valid_station_df (DataFrame): A DataFrame containing the information of the valid stations.
  • clogged_station_df (DataFrame): A DataFrame containing the information of the clogged stations.
  • T (int): The number of time steps.

Returns:

  • HTML: The animation as an HTML object. The animation is also saved as an MP4 file.

def plot_station(self, ws, df_rainfall):

Plot the rainfall data for a specific weather station.

Parameters:

  • ws: string, the code of the weather station to plot
  • df_rainfall: DataFrame, a pandas DataFrame with rainfall data

Returns:

  • None if no data is available for the specified station
  • a Matplotlib figure showing rainfall data for the specified station otherwise
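A usage sketch, assuming the `maps` instance from Getting Started and a rainfall DataFrame loaded from the pr_clog_flags CSV (the filename and station code are illustrative):

```python
import pandas as pd

# Rainfall data produced elsewhere in the pipeline.
df_rainfall = pd.read_csv('pr_clog_flags.csv')
fig = maps.plot_station('TA00001', df_rainfall)  # None if no data is available
```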
def encode_image(self, ws, df_rainfall):

Encodes a station's rainfall data plot as a base64-encoded image.

Parameters:

  • ws (str): the code for the station to encode the image for
  • df_rainfall (pandas.DataFrame): a DataFrame containing the rainfall data for all stations

Returns:

  • str: a string containing an HTML image tag with the encoded image data, or a message indicating no data is available for the given station
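The encoding step is the standard Matplotlib-to-base64 pattern; a self-contained sketch of that pattern, independent of the package and using illustrative data:

```python
import base64
import io

import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt

def figure_to_img_tag(fig):
    """Encode a Matplotlib figure as an inline HTML <img> tag."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight')
    plt.close(fig)
    encoded = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f'<img src="data:image/png;base64,{encoded}">'

# Illustrative rainfall plot for a hypothetical station.
fig, ax = plt.subplots()
ax.bar(['2021-01-01', '2021-01-02'], [5.0, 12.5])
ax.set_title('Rainfall for station TA00001')
tag = figure_to_img_tag(fig)
```

The resulting string can be embedded directly in a Folium marker popup.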
def get_map(self, subset_list, start_date=None, end_date=None, data_values=False, csv_file='pr_clog_flags.csv', min_zoom=8, max_zoom=11, width=2000, height=2000, png_resolution=300):

Creates a Folium map showing the locations of the weather stations in the given subsets.

Parameters:

  • subset_list (list of lists of str): List of subsets of weather stations, where each subset is a list of station codes.
  • start_date (str, optional): Start date in the format YYYY-MM-DD, default is None.
  • end_date (str, optional): End date in the format YYYY-MM-DD, default is None.
  • data_values (bool, optional): If True, the map markers will display a plot of rainfall data, default is False.
  • csv_file (str, optional): The name of the CSV file containing the rainfall data, default is 'pr_clog_flags.csv'.
  • min_zoom (int, optional): The minimum zoom level of the map, default is 8.
  • max_zoom (int, optional): The maximum zoom level of the map, default is 11.
  • width (int, optional): The width of the map in pixels, default is 2000.
  • height (int, optional): The height of the map in pixels, default is 2000.
  • png_resolution (int, optional): The resolution of the PNG image if data_values is True, default is 300.

Returns:

  • my_map (folium.folium.Map): A Folium map object showing the locations of the weather stations in the given subsets.
Subset Map
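A usage sketch (requires valid TAHMO credentials; `maps` as in Getting Started, with illustrative station subsets):

```python
# `maps` is the Interactive_maps instance created in Getting Started.
subsets = [['TA00001', 'TA00002'], ['TA00003']]
my_map = maps.get_map(subsets, start_date='2021-01-01',
                      end_date='2021-01-31', data_values=True)
my_map.save('subset_map.html')  # Folium maps can be saved to HTML
```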
class Water_level:
def coordinates(self, region):

Get the latitude and longitude coordinates for a specified region.

Parameters:

  • region (str): The region for which coordinates are requested. Valid values are 'muringato' or 'ewaso'.

Returns:

  • tuple: A tuple containing the latitude and longitude coordinates.

Raises:

  • ValueError: If the region provided is not 'muringato' or 'ewaso'.

Example:

# Example usage:
wl = Water_level()
coords = wl.coordinates('muringato')
print(coords)  # Output: (-0.406689, 36.96301)
def water_level_data(self, region, start_date=None, end_date=None):

Retrieve water level data for a specified region and optional date range.

Parameters:

  • region (str): The region for which water level data is requested. Valid values are 'muringato' or 'ewaso'.
  • start_date (str, optional): Start date for filtering the data. Format: 'YYYY-MM-DD'.
  • end_date (str, optional): End date for filtering the data. Format: 'YYYY-MM-DD'.

Returns:

  • pd.DataFrame: A Pandas DataFrame containing water level data with a DateTime index.

Raises:

  • ValueError: If the region provided is not 'muringato' or 'ewaso'.
  • ValueError: If the request to the API is not successful.

Usage:

from filter_stations import Water_level
wl = Water_level()
# get water level data for the muringato gauging station
muringato_data = wl.water_level_data('muringato')
# get water level data for the ewaso gauging station
ewaso_data = wl.water_level_data('ewaso') 
class transform_data:
transform_data(apiKey, apiSecret, api_key)
def transform_station_status(self, station_status, today=datetime.date.today(), transformed_data=True):

Transforms the station status data into a dictionary with date as the key and online status as the value.

Parameters:

  • station_status (DataFrame): The original DataFrame containing 'id' and 'status' of the stations.
  • today (datetime.date, optional): The date to be used as the index when the job is run. Default is the current date.
  • transformed_data (bool, optional): If True, the data will be transposed and formatted. If False, the original DataFrame will be used with an additional 'Date' column. Default is True.

Returns:

  • dict: A dictionary containing the transformed station status data.

Note:

  • If transformed_data is True: The returned dictionary will have the date (today) as the key and a mapping of each station's id to its online status for that day as the value. Example: {datetime.date(2023, 7, 29): {1: True, 2: False, 3: True}}

  • If transformed_data is False: The returned dictionary will have each row of the original DataFrame with an additional 'Date' column. Example: {0: {'id': 1, 'online': True, 'Date': datetime.date(2023, 7, 29)}, 1: {'id': 2, 'online': False, 'Date': datetime.date(2023, 7, 29)}, 2: {'id': 3, 'online': True, 'Date': datetime.date(2023, 7, 29)}}
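The two shapes described in the Note can be reproduced with plain pandas; a self-contained sketch using the illustrative data from the Note (independent of the class itself, which in practice receives the DataFrame from the station status endpoint):

```python
import datetime

import pandas as pd

# Illustrative station-status data mirroring the Note above.
station_status = pd.DataFrame({'id': [1, 2, 3],
                               'online': [True, False, True]})
today = datetime.date(2023, 7, 29)

# transformed_data=True: the date keys a mapping of station id -> status.
transformed = {today: dict(zip(station_status['id'].tolist(),
                               station_status['online'].tolist()))}

# transformed_data=False: each original row plus a 'Date' column.
untransformed = station_status.assign(Date=today).to_dict(orient='index')
```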

def parse_args():