filter-stations

Installation

To install the package, run the following command in your terminal:

pip install -U filter-stations

Getting Started

All methods require an API key and secret, which can be obtained by contacting TAHMO.

  • The retreive_data class is used to retrieve data from the TAHMO API endpoints.
  • The Filter class is used to filter weather station data by criteria such as distance and region.
  • The pipeline class is used to create a pipeline of filters to apply to weather stations based on how they correlate with water level data.
  • The Interactive_maps class is used to plot weather stations on an interactive map.
  • The Water_level class is used to retrieve water level data and coordinates of gauging stations.
# Import the necessary modules
from filter_stations import retreive_data, Filter, pipeline, Interactive_maps

# Define the API key and secret
apiKey = 'your_api_key' # request from TAHMO
apiSecret = 'your_api_secret' # request from TAHMO
maps_key = 'your_google_maps_key' # retrieve from google maps platform

# Initialize the class
ret = retreive_data(apiKey, apiSecret, maps_key)
fs = Filter(apiKey, apiSecret, maps_key)
pipe = pipeline(apiKey, apiSecret, maps_key)
maps = Interactive_maps(apiKey, apiSecret, maps_key)
API_BASE_URL = 'https://datahub.tahmo.org'
API_MAX_PERIOD = '365D'
endpoints = {'VARIABLES': 'services/assets/v2/variables',
             'STATION_INFO': 'services/assets/v2/stations',
             'WEATHER_DATA': 'services/measurements/v2/stations',
             'DATA_COMPLETE': 'custom/sensordx/latestmeasurements',
             'STATION_STATUS': 'custom/stations/status',
             'QUALITY_OBJECTS': 'custom/sensordx/reports'}
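These constants suggest how request URLs are formed: the base URL joined with an endpoint path. A hypothetical sketch (illustrative only; the package assembles its requests internally):

```python
API_BASE_URL = 'https://datahub.tahmo.org'

# Subset of the endpoint paths listed above
endpoints = {'VARIABLES': 'services/assets/v2/variables',
             'STATION_INFO': 'services/assets/v2/stations'}

def build_url(base, endpoint_key):
    # Join the base URL and the endpoint path with a single slash
    return f"{base.rstrip('/')}/{endpoints[endpoint_key]}"
```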
class retreive_data:
retreive_data(apiKey, apiSecret, api_key)
apiKey
apiSecret
api_key
def get_stations_info(self, station=None, multipleStations=[], countrycode=None):

Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.

Parameters:

  • station (str, optional): Code for a single station to retrieve information for. Defaults to None.
  • multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
  • countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing information about the requested weather stations.

Usage:

To retrieve information about a single station:

station_info = ret.get_stations_info(station='TA00001')

To retrieve information about multiple stations:

station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])

To retrieve information about all stations in a country:

station_info = ret.get_stations_info(countrycode='KE')
def get_coordinates(self, station_sensor, normalize=False):

Retrieves longitude and latitude coordinates for a list of station_sensor names; coordinates are duplicated for stations with multiple sensors.

Parameters:

  • station_sensor (list): List of station_sensor names.
  • normalize (bool): If True, normalize the coordinates using MinMaxScaler to the range (0,1).

Returns:

  • pd.DataFrame: DataFrame containing longitude and latitude coordinates for each station_sensor.

Usage:

To retrieve coordinates

start_date = '2023-01-01'
end_date = '2023-12-31'
country = 'Kenya'

# get the precipitation data for the stations
ke_pr = fs.filter_pr(start_date=start_date, end_date=end_date,
                     country=country).set_index('Date')

# get the coordinates
xs = ret.get_coordinates(ke_pr.columns, normalize=True)
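With normalize=True the coordinates are rescaled to the (0, 1) range using MinMaxScaler. A minimal pure-pandas sketch of that rescaling (hypothetical data; not the package's internal code):

```python
import pandas as pd

def min_max_normalize(df):
    # Rescale each column to the range [0, 1]: (x - min) / (max - min)
    return (df - df.min()) / (df.max() - df.min())

# Hypothetical longitude/latitude values
coords = pd.DataFrame({'longitude': [36.8, 37.2, 36.9],
                       'latitude': [-1.3, -0.4, -1.0]})
normalized = min_max_normalize(coords)
```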
def get_variables(self):

Retrieves information about available weather variables from an API endpoint.

Returns:

  • dict: Dictionary containing information about available weather variables, keyed by variable shortcode.
def k_neighbours(self, station, number=5):

Returns a dictionary of the nearest neighbouring stations to the specified station.

Parameters:

  • station (str): Code for the station to find neighbouring stations for.
  • number (int, optional): Number of neighbouring stations to return. Defaults to 5.

Returns:

  • dict: Dictionary containing the station codes and distances of the nearest neighbouring stations.
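The ranking is presumably by great-circle distance from the target station. A self-contained sketch with hypothetical coordinates and the haversine formula (illustrative only; not the package's internal code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points in kilometres
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical station coordinates: code -> (lat, lon)
stations = {'TA00001': (-1.3, 36.8), 'TA00002': (-0.4, 37.2), 'TA00003': (-3.2, 40.1)}

def k_neighbours_sketch(target, number=2):
    lat, lon = stations[target]
    others = {code: haversine_km(lat, lon, *coords)
              for code, coords in stations.items() if code != target}
    # Keep the `number` closest stations, sorted by distance
    return dict(sorted(others.items(), key=lambda kv: kv[1])[:number])
```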
def station_status(self):

Retrieves the status of all weather stations

Returns:

  • pandas.DataFrame: DataFrame containing the status of all weather stations.
def trained_models(self, columns=None):

Retrieves trained models from the MongoDB.

Parameters:

  • columns (list of str, optional): List of column names to include in the returned DataFrame. If None, all columns are included. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing trained models with the specified columns.
def aggregate_variables(self, dataframe, freq='1D', method='sum'):

Aggregates a pandas DataFrame of weather variables by applying a specified method across a given frequency.

Parameters:

  • dataframe (pandas.DataFrame): DataFrame containing weather variable data.
  • freq (str, optional): Frequency to aggregate the data by. Defaults to '1D'. Examples include '1H' for hourly, '12H' for every 12 hours, '1D' for daily, '1W' for weekly, '1M' for monthly, etc.
  • method (str or callable, optional): Method to use for aggregation. Defaults to 'sum'. Acceptable string values are 'sum', 'mean', 'min', 'max'. Alternatively, you can provide a custom aggregation function (callable).

                                Example of a custom method:

                                def custom_median(x):
                                    return np.nan if x.isnull().all() else x.median()

                                daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)

Returns:

  • pandas.DataFrame: DataFrame containing aggregated weather variable data according to the specified frequency and method.

Usage:

Define the DataFrame containing the weather variable data:

dataframe = ret.get_measurements('TA00001', '2020-01-01', '2020-01-31', ['pr']) # data comes in 5 minute interval

To aggregate data hourly:

hourly_data = aggregate_variables(dataframe, freq='1H')

To aggregate data by 12 hours:

half_day_data = aggregate_variables(dataframe, freq='12H')

To aggregate data by day:

daily_data = aggregate_variables(dataframe, freq='1D')

To aggregate data by week:

weekly_data = aggregate_variables(dataframe, freq='1W')

To aggregate data by month:

monthly_data = aggregate_variables(dataframe, freq='1M')

To use a custom aggregation method:

def custom_median(x):
    return np.nan if x.isnull().all() else x.median()

daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)
def aggregate_qualityflags(self, dataframe, freq='1D'):

Aggregate quality flags in a DataFrame by day.

Parameters:

  • dataframe (pd.DataFrame): The DataFrame containing the measurements.

Returns:

  • pd.DataFrame: A DataFrame with aggregated quality flags, where values greater than 1 are rounded up.
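The description suggests a resample to the target frequency in which aggregated flag values above 1 are rounded up to the next whole number. A plausible pandas sketch (not necessarily the package's exact implementation):

```python
import numpy as np
import pandas as pd

def aggregate_flags_sketch(df, freq='1D'):
    # Resample to the target frequency, then round mean values
    # greater than 1 up to the next whole number
    agg = df.resample(freq).mean()
    return agg.where(agg <= 1, np.ceil(agg))

# Hypothetical 12-hourly flags over two days
idx = pd.date_range('2023-01-01', periods=4, freq='12h')
flags = pd.DataFrame({'TA00001_clogflag': [0, 1, 2, 3]}, index=idx)
daily = aggregate_flags_sketch(flags)
```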
def get_measurements( self, station, startDate=None, endDate=None, variables=None, dataset='controlled', aggregate='5min', quality_flags=False):

Get measurements from a station.

Parameters:

  • station (str): The station ID.
  • startDate (str, optional): The start date of the measurement period in the format 'YYYY-MM-DD'.
  • endDate (str, optional): The end date of the measurement period in the format 'YYYY-MM-DD'.
  • variables (list, optional): The variables to retrieve measurements for. If None, all variables are retrieved.
  • dataset (str, optional): The dataset to retrieve measurements from. Default is 'controlled'.
  • aggregate (str, optional): Aggregation interval for the measurements (e.g., '5min', '30min', '1D'). Default is '5min'.
  • quality_flags (bool, optional): Whether to include quality flag data. Default is False.

Returns:

  • A DataFrame containing the measurements.

Usage:

To retrieve precipitation data for a station for the last month:

from datetime import datetime, timedelta

# Get today's date
today = datetime.now()

# Calculate one month ago
last_month = today - timedelta(days=30)

# Format date as a string
last_month_str = last_month.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the station you want to retrieve data from
station = 'TA00001'
variables = ['pr']
dataset = 'raw'

# aggregate the data to 30 minutes interval
aggregate = '30min'

# Call the get_measurements method to retrieve and aggregate data
TA00001_data = ret.get_measurements(station, last_month_str, 
                                    today_str, variables, 
                                    dataset, aggregate)
def multiple_measurements( self, stations_list, startDate, endDate, variables, dataset='controlled', csv_file=None, aggregate='1D', quality_flags=False):

Retrieves measurements for multiple stations within a specified date range.

Parameters:

  • stations_list (list): A list of strings containing the codes of the stations to retrieve data from.
  • startDate (str): The start date for the measurements, in the format 'yyyy-mm-dd'.
  • endDate (str): The end date for the measurements, in the format 'yyyy-mm-dd'.
  • variables (list): A list of strings containing the names of the variables to retrieve.
  • dataset (str): The name of the database to retrieve the data from. Default is 'controlled' alternatively 'raw' database.
  • csv_file (str, optional): Name of the CSV file to save the data to. If None, the DataFrame is returned instead.
  • aggregate (str): Frequency to aggregate the data by (e.g., '1D' for daily). Default is '1D'; data is otherwise available at 5-minute intervals.
  • quality_flags (bool, optional): Whether to include quality flag data. Default is False.

Returns:

  • df (pandas.DataFrame): A DataFrame containing the aggregated data for all stations.

Raises:

  • ValueError: If stations_list is not a list.

Example Usage:

To retrieve precipitation data for stations in Kenya for the last week and save it as a csv file:

# Import the necessary modules
from datetime import datetime, timedelta
from filter_stations import retreive_data

# An instance of the retreive_data class
ret = retreive_data(apiKey, apiSecret, maps_key)

# Get today's date
today = datetime.now()

# Calculate one week ago
last_week = today - timedelta(days=7)

# Format date as a string
last_week_str = last_week.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the list of stations you want to retrieve data from, for example stations in Kenya
stations = list(ret.get_stations_info(countrycode='KE')['code'])

# Get the precipitation data for the stations in the list
variables = ['pr']

# retrieve the raw data for the stations, aggregate the data and save it as a csv file
dataset = 'raw'
aggregate = '1D'
csv_file = 'Kenya_precipitation_data'

# Call the multiple_measurements method to retrieve and aggregate data
aggregated_data = ret.multiple_measurements(stations, last_week_str, 
                                            today_str, variables, 
                                            dataset, csv_file, aggregate)
def multiple_qualityflags(self, stations_list, startDate, endDate, csv_file=None):

Retrieves and aggregates quality flag data for multiple stations within a specified date range.

Parameters:

  • stations_list (list): A list of station codes for which to retrieve data.
  • startDate (str): The start date in 'YYYY-MM-DD' format.
  • endDate (str): The end date in 'YYYY-MM-DD' format.
  • csv_file (str, optional): The name of the CSV file to save the aggregated data. Default is None.

Returns:

  • pandas.DataFrame or None: A DataFrame containing the aggregated quality flag data for the specified stations, or None if an error occurs.

Raises:

  • Exception: If an error occurs while retrieving data for a station.

def anomalies_report(self, start_date, end_date=None):

Retrieves anomaly reports for a specified date range.

Parameters:

  • start_date (str): The start date for the report in 'yyyy-mm-dd' format.
  • end_date (str, optional): The end date for the report in 'yyyy-mm-dd' format. If not provided, only data for the start_date is returned.

Returns:

  • pandas.DataFrame: A DataFrame containing anomaly reports with columns 'startDate', 'station_sensor', and 'level'. The 'startDate' column is used as the index.

Raises:

  • Exception: If there's an issue with the API request.

Usage:

To retrieve anomaly reports for a specific date range:

start_date = '2023-01-01'
end_date = '2023-01-31'
report_data = ret.anomalies_report(start_date, end_date)

To retrieve anomaly reports for a specific date:

start_date = '2023-01-01'
report_data = ret.anomalies_report(start_date)
def ground_truth(self, start_date, end_date=None):

Retrieves ground truth data for a specified date range.

Parameters:

  • start_date (str): The start date for the report in 'yyyy-mm-dd' format.
  • end_date (str, optional): The end date for the report in 'yyyy-mm-dd' format. If not provided, only data for the start_date is returned.

Returns:

  • pandas.DataFrame: A DataFrame containing ground truth data with columns 'startDate', 'station_sensor', 'description' and 'level'. The 'startDate' column is used as the index.

Raises:

  • Exception: If there's an issue with the API request.

Usage:

To retrieve ground truth data for a specific date range:

start_date = '2023-01-01'
end_date = '2023-01-31'
report_data = ret.ground_truth(start_date, end_date)

To retrieve ground truth data for a specific date:

start_date = '2023-01-01'
report_data = ret.ground_truth(start_date)
class Kieni:
Kieni(api_key, api_secret)
api_key
api_secret
def kieni_weather_data( self, start_date=None, end_date=None, variable=None, method='sum', freq='1D'):

Retrieves weather data from the Kieni API endpoint and returns it as a pandas DataFrame after processing.

Parameters:

  • start_date (str, optional): The start date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None; if None, data is returned from the beginning of the record.
  • end_date (str, optional): The end date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None; if None, data is returned up to the end of the record.
  • variable (str, optional): The weather variable to retrieve, using the same shortcodes as TAHMO (e.g., 'pr', 'ap', 'rh').
  • method (str, optional): The aggregation method to apply to the data ('sum', 'mean', 'min', 'max', or a custom function). Defaults to 'sum'.
  • freq (str, optional): The frequency for data aggregation (e.g., '1D' for daily, '1H' for hourly). Defaults to '1D'.

Returns:

  • pandas.DataFrame: DataFrame containing the weather data for the specified parameters, with columns containing NaN values dropped.

Usage:

To retrieve daily rainfall data from January 1, 2024, to January 31, 2024:

# Instantiate the Kieni class
api_key, api_secret = '', '' # Request DSAIL for the API key and secret
kieni = Kieni(api_key, api_secret)

kieni_weather_data = kieni.kieni_weather_data(start_date='2024-01-01', end_date='2024-01-31', variable='pr', freq='1D', method='sum')

To retrieve hourly temperature data from February 1, 2024, to February 7, 2024:

kieni_weather_data = kieni.kieni_weather_data(start_date='2024-02-01', end_date='2024-02-07', variable='te', method='mean', freq='1H')
class pipeline(retreive_data):
pipeline(apiKey, apiSecret, api_key)
def stations_within_radius(self, radius, latitude, longitude, df=False):

Retrieves stations within a specified radius from a given latitude and longitude.

Parameters:

  • radius (float): Radius (in kilometers) within which to search for stations.
  • latitude (float): Latitude of the center point.
  • longitude (float): Longitude of the center point.
  • df (bool, optional): Flag indicating whether to return the result as a DataFrame. Defaults to False.

Returns:

  • DataFrame or list: DataFrame or list containing the stations within the specified radius. If df is True, a DataFrame is returned with the columns 'code', 'location.latitude', 'location.longitude', and 'distance'. If df is False, a list of station codes is returned.
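Conceptually, the method keeps stations whose distance from the centre point does not exceed the radius, returning either a DataFrame or a plain list of codes. A toy sketch with precomputed distances (hypothetical data; not the package's internal code):

```python
import pandas as pd

# Hypothetical stations with precomputed distances (km) from the centre point
stations = pd.DataFrame({
    'code': ['TA00001', 'TA00002', 'TA00003'],
    'distance': [12.5, 85.0, 240.0],
})

def within_radius(df, radius, as_df=False):
    # Mirror the documented behaviour: DataFrame when requested,
    # a plain list of station codes otherwise
    subset = df[df['distance'] <= radius]
    return subset if as_df else subset['code'].tolist()
```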
def stations_data_check( self, stations_list, percentage=1, start_date=None, end_date=None, data=None, variables=['pr'], csv_file=None):

Performs a data check on the stations' data and returns the stations with a percentage of missing data below a threshold.

Parameters:

  • stations_list (list): List of station names or IDs.
  • percentage (float, optional): Minimum fraction of non-missing data required to keep a station, between 0 and 1. Defaults to 1 (i.e., no missing data allowed).
  • start_date (str, optional): Start date for the data range in the format 'YYYY-MM-DD'. Defaults to None.
  • end_date (str, optional): End date for the data range in the format 'YYYY-MM-DD'. Defaults to None.
  • data (DataFrame, optional): Preloaded data for the stations. Defaults to None.
  • variables (list, optional): List of variables to consider for the data check. Defaults to ['pr'].
  • csv_file (str, optional): File name for saving the data as a CSV file. Defaults to None.

Returns:

  • DataFrame: DataFrame containing the stations' data with less than the specified percentage of missing data.
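The percentage threshold can be read as the minimum fraction of non-missing records a station must have. A pandas sketch of that completeness filter (hypothetical data; not the package's internal code):

```python
import numpy as np
import pandas as pd

def completeness_filter(df, percentage=1.0):
    # Keep columns whose fraction of non-missing values is at least `percentage`
    complete_enough = df.notna().mean() >= percentage
    return df.loc[:, complete_enough]

# Hypothetical station columns: TA00002 is half missing
data = pd.DataFrame({'TA00001': [1.0, 2.0, 3.0, 4.0],
                     'TA00002': [1.0, np.nan, np.nan, 4.0]})
filtered = completeness_filter(data, percentage=0.9)
```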
def calculate_lag( self, weather_stations_data, water_level_data, lag=3, above=None, below=None):

Calculates the lag and coefficient of correlation between weather station data and water level data, identifying stations with positive correlations.

Parameters:

  • weather_stations_data (DataFrame): A DataFrame containing weather station data columns for analysis.
  • water_level_data (Series): A time series of water level data used for correlation analysis.
  • lag (int): The maximum lag, in hours, to consider for correlation. Default is 3 hours.
  • above (float or None): If specified, stations with correlations and lags above this threshold are identified.
  • below (float or None): If specified, stations with correlations and lags below this threshold are identified.

Returns:

  • above_threshold_lag (dict): A dictionary where keys represent weather station column names, and values represent the lag in hours if positive correlation exceeds the specified threshold (above).
  • below_threshold_lag (dict): A dictionary where keys represent weather station column names, and values represent the lag in hours if positive correlation falls below the specified threshold (below).
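Lagged correlation of this kind can be sketched with pandas shift and corr. The example below uses synthetic data in which the water level follows rainfall with a 2-step lag (illustrative only; not the package's internal code):

```python
import numpy as np
import pandas as pd

def best_lag(rain, level, max_lag=3):
    # Correlate the water level with progressively lagged rainfall and
    # return (lag, correlation) for the strongest positive correlation
    scores = {lag: level.corr(rain.shift(lag)) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: scores[k])
    return lag, scores[lag]

# Synthetic data: water level tracks rainfall shifted by 2 steps, plus noise
rng = np.random.default_rng(0)
rain = pd.Series(rng.random(100))
level = rain.shift(2).fillna(0) + rng.normal(0, 0.01, 100)
lag, corr = best_lag(rain, level)
```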
def shed_stations( self, weather_stations_data, water_level_data, gauging_station_coords, radius, lag=3, percentage=1):

Filters and processes weather station data to identify stations potentially contributing to water level changes above or below specified thresholds.

Parameters:

  • weather_stations_data (DataFrame): A DataFrame containing weather station data over a specific date range.
  • water_level_data (Series): A time series of water level data corresponding to the same date range as weather_stations_data.
  • gauging_station_coords (tuple): A tuple containing latitude and longitude coordinates of the gauging station.
  • radius (float): The radius in kilometers for identifying nearby weather stations.
  • lag (int): The time lag, in hours, used for correlation analysis. Default is 3 hours.
  • percentage (float): The minimum percentage of valid data required for a weather station to be considered. Default is 1 (100%).
  • above (float or None): The threshold above which water level changes are considered significant. If provided, stations contributing to changes above this threshold are identified.
  • below (float or None): The threshold below which water level changes are considered significant. If provided, stations contributing to changes below this threshold are identified.

Returns:

  • above_threshold_lag (list): List of weather stations with positive correlations and lagged changes above the specified threshold.
  • below_threshold_lag (list): List of weather stations with positive correlations and lagged changes below the specified threshold.

Usage:

Get the TAHMO stations that correlate with the water level data

import pandas as pd
from filter_stations import pipeline

# An instance of the pipeline class
pipe = pipeline(apiKey, apiSecret, maps_key)

# load the water level data and the weather stations data
water_level_data = pd.read_csv('water_level_data.csv')
weather_stations_data = pd.read_csv('weather_stations_data.csv') 

# get the coordinates of the gauging station
gauging_station_coords = (-0.416, 36.951)

# get the stations within a radius of 200km from the gauging station
radius = 200

# get the stations that correlate with the water level data
above_threshold_lag, below_threshold_lag = pipe.shed_stations(weather_stations_data, water_level_data, 
                                                              gauging_station_coords, radius, 
                                                              lag=3, percentage=1)
def plot_figs( self, weather_stations, water_list, threshold_list, save=False, dpi=500, date='11-02-2021'):

Plots figures showing the relationship between rainfall and water level/stage against time.

Parameters:

  • weather_stations (DataFrame): DataFrame containing weather station data.
  • water_list (list): List of water levels/stages.
  • threshold_list (list): List of columns in the weather_stations DataFrame to plot.
  • save (bool, optional): Flag indicating whether to save the figures as PNG files. Defaults to False.
  • dpi (int, optional): Dots per inch for saving the figures. Defaults to 500.
  • date (str, optional): Start date for plotting in the format 'dd-mm-yyyy'. Defaults to '11-02-2021'.

Returns:

  • Displays the plots; if save is True, the figures are also saved as PNG files in the current directory.
Figure: Muringato rainfall and water level plots
class Filter(pipeline):
Filter(apiKey, apiSecret, api_key)
def get_stations_info(self, station=None, multipleStations=[], countrycode=None):

Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.

Parameters:

  • station (str, optional): Code for a single station to retrieve information for. Defaults to None.
  • multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
  • countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing information about the requested weather stations.

Usage:

To retrieve information about a single station:

station_info = ret.get_stations_info(station='TA00001')

To retrieve information about multiple stations:

station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])

To retrieve information about all stations in a country:

station_info = ret.get_stations_info(countrycode='KE')
def centre_point(self, address):

This method retrieves the latitude and longitude coordinates of a given address using the Google Maps Geocoding API.

Parameters:

  • address (str): The address of the location to retrieve the coordinates for. The lookup uses the Google Maps Geocoding API key supplied when the class was initialized.

Returns:

  • Tuple (float, float) or None: The latitude and longitude coordinates of the location if found, or None if the address is not found.
def calculate_new_point(self, lat, lon, distance, bearing):

Calculates a new geographic point based on the given latitude, longitude, distance and bearing.

Parameters:

  • lat (float): The latitude of the starting point in decimal degrees.
  • lon (float): The longitude of the starting point in decimal degrees.
  • distance (float): The distance in kilometers from the starting point to the new point.
  • bearing (float): The bearing in degrees from the starting point to the new point, measured clockwise from true north.

Returns:

  • Tuple[float, float]: A tuple containing the latitude and longitude of the new point, respectively, in decimal degrees.
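This is the standard spherical destination-point computation. A sketch assuming an Earth radius of 6371 km (the package's exact constants may differ):

```python
from math import radians, degrees, sin, cos, asin, atan2

def destination_point(lat, lon, distance_km, bearing_deg, radius_km=6371.0):
    # Spherical destination point: travel `distance_km` from (lat, lon)
    # along `bearing_deg`, measured clockwise from true north
    lat1, lon1, brng = radians(lat), radians(lon), radians(bearing_deg)
    d = distance_km / radius_km  # angular distance in radians
    lat2 = asin(sin(lat1) * cos(d) + cos(lat1) * sin(d) * cos(brng))
    lon2 = lon1 + atan2(sin(brng) * sin(d) * cos(lat1),
                        cos(d) - sin(lat1) * sin(lat2))
    return degrees(lat2), degrees(lon2)
```

Travelling about 111.19 km due north from the equator moves the latitude close to one degree, which makes a quick sanity check possible.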
def compute_filter(self, lat, lon, distance):

Calculates the bounding box coordinates for a given location and distance.

Parameters:

  • lat (float): The latitude of the location.
  • lon (float): The longitude of the location.
  • distance (float): The distance from the location, in kilometers, to the edge of the bounding box.

Returns:

  • A tuple containing four floats representing the bounding box coordinates: (min_lat, min_lon, max_lat, max_lon).
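A bounding box of this kind can be approximated by converting the distance to degrees (about 111.32 km per degree of latitude, scaled by cos(latitude) for longitude). A minimal sketch of that approximation (hypothetical; not the package's exact method):

```python
from math import cos, radians

def bounding_box(lat, lon, distance_km):
    # Approximate degrees spanned by `distance_km` at this latitude
    dlat = distance_km / 111.32
    dlon = distance_km / (111.32 * cos(radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)
```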
def filter_stations( self, address, distance, startDate=None, endDate=None, csvfile='pr_clog_flags.csv'):

This method filters weather station data within a certain distance from a given address.

Parameters:

  • address (str): Address to center the bounding box around.
  • distance (float): The distance (in kilometers) from the center to the edge of the bounding box.
  • startDate (str): The start date for filtering the weather station data in the format 'YYYY-MM-DD'.
  • endDate (str): The end date for filtering the weather station data in the format 'YYYY-MM-DD'.
  • csvfile (str): The name of the csv file containing the weather station data.

Returns:

  • pandas.DataFrame: The filtered weather station data within the bounding box.
def filter_stations_list(self, address, distance=100):

Filters stations based on their proximity to a given address and returns a list of station codes that fall within the specified distance.

Parameters:

  • address (str): Address to filter stations by.
  • distance (float, optional): Maximum distance (in kilometers) between the stations and the address. Default is 100 km.

Returns:

  • List of station codes that fall within the specified distance from the given address.
def stations_region(self, region, plot=False):

Subsets weather stations by a specific geographical region and optionally plots them on a map with a scale bar.

Parameters:

  • region (str): The name of the region to subset stations from (47 Kenyan counties).
  • plot (bool, optional): If True, a map with stations and a scale bar is plotted. Default is False.

Returns:

  • list or None: If plot is False, returns a list of station codes in the specified region. Otherwise, returns None.

Usage:

To get a list of station codes in the 'Nairobi' region without plotting:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
station_list = fs.stations_region('Nairobi')

To subset stations in the 'Nairobi' region and display them on a map with a scale bar:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
fs.stations_region('Nairobi', plot=True)
Figure: Nairobi region map with stations and scale bar
def remove_zero_columns(self, df):

Removes columns with all zeros from a DataFrame.

Parameters:

  • df (DataFrame): The DataFrame to remove columns from.

Returns:

  • DataFrame: The DataFrame with columns containing all zeros removed.
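In pandas this amounts to a boolean column mask; a small sketch with hypothetical data:

```python
import pandas as pd

def drop_all_zero_columns(df):
    # Keep only columns with at least one non-zero value
    return df.loc[:, (df != 0).any()]

data = pd.DataFrame({'TA00001': [0.0, 1.2, 0.0],
                     'TA00002': [0.0, 0.0, 0.0]})
cleaned = drop_all_zero_columns(data)
```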
def filter_pr( self, start_date, end_date, country=None, region=None, radius=None, multiple_stations=None, station=None):

Retrieves precipitation data from BigQuery based on specified parameters.

Parameters:

  • start_date (str): Start date for data query.
  • end_date (str): End date for data query.
  • country (str): Country name for filtering stations.
  • region (str): Region name for filtering stations.
  • radius (float): Radius (in kilometers) for selecting stations within the specified region.
  • multiple_stations (list): List of station IDs.
  • station (str): Single station ID for data filtering.

Returns:

  • pd.DataFrame: A Pandas DataFrame containing the filtered precipitation data.

Usage:

To get precipitation data for a specific date range:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
pr_data = fs.filter_pr(start_date, end_date)

To get precipitation data for a specific date range and country:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
country = 'Kenya'
pr_data = fs.filter_pr(start_date, end_date, country=country)

To get precipitation data for a specific date range and region:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
pr_data = fs.filter_pr(start_date, end_date, region=region)

To get precipitation data for a specific date range and region with a radius:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
radius = 100
pr_data = fs.filter_pr(start_date, end_date, region=region, radius=radius)

To get precipitation data for a specific date range and multiple stations:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
multiple_stations = ['TA00001', 'TA00002', 'TA00003']
pr_data = fs.filter_pr(start_date, end_date, multiple_stations=multiple_stations)

To get precipitation data for a specific date range and a single station:

fs = Filter(api_key, api_secret, maps_key)  # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
station = 'TA00001'
pr_data = fs.filter_pr(start_date, end_date, station=station)
def clogs( self, startdate, enddate, flags_json='qualityobjects.json', as_csv=False, csv_file=None):

Generate clog flags DataFrame based on start and end dates.

Parameters:

  • startdate (str): Start date in 'YYYY-MM-DD' format.
  • enddate (str): End date in 'YYYY-MM-DD' format.
  • flags_json (str, optional): Path to the JSON file containing clog flags data. Defaults to 'qualityobjects.json'.
  • as_csv (bool, optional): Whether to save the resulting DataFrame as a CSV file. Defaults to False.
  • csv_file (str, optional): Name of the CSV file to save. Only applicable if as_csv is True. Defaults to None.

Returns:

  • pandas.DataFrame: DataFrame containing the clog flags.
class Interactive_maps(retreive_data):
Interactive_maps(apiKey, apiSecret, api_key)
def draw_map(self, map_center):

Creates a Folium map centered on the specified location and adds markers for each weather station in the area.

Parameters:

  • map_center: a tuple with the latitude and longitude of the center of the map

Returns:

  • A Folium map object
def animation_image( self, sensors, start_date, end_date, day=100, T=10, interval=500, data=None):

Creates an animation of pollutant levels for a given range of days and valid sensors.

Parameters:

  • data (DataFrame, optional): A pandas DataFrame containing station data. Defaults to None; if None, data is read from pr_clog_flags.
  • sensors (list): A list of valid sensor names.
  • day (int): The starting day of the animation (default is 100).
  • T (int): The range of days for the animation (default is 10).
  • interval (int): The interval between frames in milliseconds (default is 500).

Returns:

  • HTML: An HTML object containing the animation.
def animation_grid(self, mu_pred, xi, xj, valid_station_df, clogged_station_df, T=10):

Creates an animation of the predicted data on a grid over time.

Parameters:

  • mu_pred (ndarray): The predicted data on a grid over time.
  • xi (ndarray): The x-coordinates of the grid.
  • xj (ndarray): The y-coordinates of the grid.
  • valid_station_df (DataFrame): A DataFrame containing the information of the valid stations.
  • clogged_station_df (DataFrame): A DataFrame containing the information of the clogged stations.
  • T (int): The number of time steps.

Returns:

  • HTML: The animation as an HTML object. The animation is also saved as an MP4 file.

def plot_station(self, ws, df_rainfall):

Plot the rainfall data for a specific weather station.

Parameters:

  • ws: string, the code of the weather station to plot
  • df_rainfall: DataFrame, a pandas DataFrame with rainfall data

Returns:

  • None if no data is available for the specified station
  • a Matplotlib figure showing rainfall data for the specified station otherwise
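A usage sketch, assuming the `maps` instance from Getting Started and a rainfall DataFrame loaded from the pr_clog_flags CSV (the filename and station code are illustrative):

```python
import pandas as pd

# Rainfall data produced elsewhere in the pipeline.
df_rainfall = pd.read_csv('pr_clog_flags.csv')
fig = maps.plot_station('TA00001', df_rainfall)  # None if no data is available
```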
def encode_image(self, ws, df_rainfall):

Encodes a station's rainfall data plot as a base64-encoded image.

Parameters:

  • ws (str): the code for the station to encode the image for
  • df_rainfall (pandas.DataFrame): a DataFrame containing the rainfall data for all stations

Returns:

  • str: a string containing an HTML image tag with the encoded image data, or a message indicating no data is available for the given station
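The encoding step is the standard Matplotlib-to-base64 pattern; a self-contained sketch of that pattern, independent of the package and using illustrative data:

```python
import base64
import io

import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt

def figure_to_img_tag(fig):
    """Encode a Matplotlib figure as an inline HTML <img> tag."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight')
    plt.close(fig)
    encoded = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f'<img src="data:image/png;base64,{encoded}">'

# Illustrative rainfall plot for a hypothetical station.
fig, ax = plt.subplots()
ax.bar(['2021-01-01', '2021-01-02'], [5.0, 12.5])
ax.set_title('Rainfall for station TA00001')
tag = figure_to_img_tag(fig)
```

The resulting string can be embedded directly in a Folium marker popup.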
def get_map(self, subset_list, start_date=None, end_date=None, data_values=False, csv_file='pr_clog_flags.csv', min_zoom=8, max_zoom=11, width=2000, height=2000, png_resolution=300):

Creates a Folium map showing the locations of the weather stations in the given subsets.

Parameters:

  • subset_list (list of lists of str): List of subsets of weather stations, where each subset is a list of station codes.
  • start_date (str, optional): Start date in the format YYYY-MM-DD, default is None.
  • end_date (str, optional): End date in the format YYYY-MM-DD, default is None.
  • data_values (bool, optional): If True, the map markers will display a plot of rainfall data, default is False.
  • csv_file (str, optional): The name of the CSV file containing the rainfall data, default is 'pr_clog_flags.csv'.
  • min_zoom (int, optional): The minimum zoom level of the map, default is 8.
  • max_zoom (int, optional): The maximum zoom level of the map, default is 11.
  • width (int, optional): The width of the map in pixels, default is 2000.
  • height (int, optional): The height of the map in pixels, default is 2000.
  • png_resolution (int, optional): The resolution of the PNG image if data_values is True, default is 300.

Returns:

  • my_map (folium.folium.Map): A Folium map object showing the locations of the weather stations in the given subsets.
Subset Map
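A usage sketch (requires valid TAHMO credentials; `maps` as in Getting Started, with illustrative station subsets):

```python
# `maps` is the Interactive_maps instance created in Getting Started.
subsets = [['TA00001', 'TA00002'], ['TA00003']]
my_map = maps.get_map(subsets, start_date='2021-01-01',
                      end_date='2021-01-31', data_values=True)
my_map.save('subset_map.html')  # Folium maps can be saved to HTML
```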
class Water_level:
def coordinates(self, region):

Get the latitude and longitude coordinates for a specified region.

Parameters:

  • region (str): The region for which coordinates are requested. Valid values are 'muringato' or 'ewaso'.

Returns:

  • tuple: A tuple containing the latitude and longitude coordinates.

Raises:

  • ValueError: If the region provided is not 'muringato' or 'ewaso'.

Example:

# Example usage:
wl = Water_level()
coords = wl.coordinates('muringato')
print(coords)  # Output: (-0.406689, 36.96301)
def water_level_data(self, region, start_date=None, end_date=None):

Retrieve water level data for a specified region and optional date range.

Parameters:

  • region (str): The region for which water level data is requested. Valid values are 'muringato' or 'ewaso'.
  • start_date (str, optional): Start date for filtering the data. Format: 'YYYY-MM-DD'.
  • end_date (str, optional): End date for filtering the data. Format: 'YYYY-MM-DD'.

Returns:

  • pd.DataFrame: A Pandas DataFrame containing water level data with a DateTime index.

Raises:

  • ValueError: If the region provided is not 'muringato' or 'ewaso'.
  • ValueError: If the request to the API is not successful.

Usage:

from filter_stations import Water_level
wl = Water_level()
# get water level data for the muringato gauging station
muringato_data = wl.water_level_data('muringato')
# get water level data for the ewaso gauging station
ewaso_data = wl.water_level_data('ewaso') 
class transform_data:
transform_data(apiKey, apiSecret, api_key)
def transform_station_status(self, station_status, today=datetime.date.today(), transformed_data=True):

Transforms the station status data into a dictionary with date as the key and online status as the value.

Parameters:

  • station_status (DataFrame): The original DataFrame containing 'id' and 'status' of the stations.
  • today (datetime.date, optional): The date to be used as the index when the job is run. Default is the current date.
  • transformed_data (bool, optional): If True, the data will be transposed and formatted. If False, the original DataFrame will be used with an additional 'Date' column. Default is True.

Returns:

  • dict: A dictionary containing the transformed station status data.

Note:

  • If transformed_data is True: The returned dictionary will have the date (today) as the key and a mapping of each station's id to its online status for that day as the value. Example: {datetime.date(2023, 7, 29): {1: True, 2: False, 3: True}}

  • If transformed_data is False: The returned dictionary will have each row of the original DataFrame with an additional 'Date' column. Example: {0: {'id': 1, 'online': True, 'Date': datetime.date(2023, 7, 29)}, 1: {'id': 2, 'online': False, 'Date': datetime.date(2023, 7, 29)}, 2: {'id': 3, 'online': True, 'Date': datetime.date(2023, 7, 29)}}
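The two shapes described in the Note can be reproduced with plain pandas; a self-contained sketch using the illustrative data from the Note (independent of the class itself, which in practice receives the DataFrame from the station status endpoint):

```python
import datetime

import pandas as pd

# Illustrative station-status data mirroring the Note above.
station_status = pd.DataFrame({'id': [1, 2, 3],
                               'online': [True, False, True]})
today = datetime.date(2023, 7, 29)

# transformed_data=True: the date keys a mapping of station id -> status.
transformed = {today: dict(zip(station_status['id'].tolist(),
                               station_status['online'].tolist()))}

# transformed_data=False: each original row plus a 'Date' column.
untransformed = station_status.assign(Date=today).to_dict(orient='index')
```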

def parse_args():