Machine Learning and Smart Agriculture using Python and Dash

ML-Dash-Agro

In the fast-evolving world of agriculture, the integration of machine learning and smart technologies has paved the way for revolutionary advancements. Smart Agriculture, also known as Precision Agriculture, employs cutting-edge technologies like Machine Learning to enhance crop yields, optimize resource utilization, and improve overall farm management. In this article, we will explore how Python and Dash, two powerful tools, come together to enable the implementation of smart agriculture solutions. We will delve into various Machine Learning algorithms, including Support Vector Machines (SVM) with linear, polynomial, and radial basis function (RBF) kernels, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, and K-Nearest Neighbors (KNN) Classifier, to highlight their role in shaping the future of agriculture.

SVM with Linear, Polynomial, and RBF Kernels

Support Vector Machines (SVM) are a group of supervised learning models that excel in both classification and regression tasks. SVMs classify data by finding an optimal hyperplane that best separates different classes. With Python and Dash, implementing SVM becomes more accessible than ever. Dash, a Python framework for building analytical web applications, provides a user-friendly interface to visualize and interact with SVM models.

1. SVM Introduction: Support Vector Machines are a powerful class of algorithms used in various fields, including agriculture. SVM aims to find the best hyperplane that effectively separates different classes of data points in a high-dimensional feature space. It is particularly useful for scenarios with complex decision boundaries, making it a valuable tool for analyzing agricultural data.

2. SVM with Linear Kernel: The linear kernel is the simplest form of SVM, suitable for linearly separable data. It performs well in scenarios where the classes can be separated by a straight line or a plane. We’ll explore how to implement SVM with a linear kernel using Python and visualize the results with Dash.

3. SVM with Polynomial Kernel: When the data is not linearly separable, the polynomial kernel comes into play. It allows SVM to handle curved decision boundaries, enabling better classification in more complex scenarios. We’ll discuss the implementation of SVM with a polynomial kernel and its applications in smart agriculture.

4. SVM with RBF Kernel: The Radial Basis Function (RBF) kernel is a popular choice for SVM due to its flexibility in capturing intricate decision boundaries. This kernel can handle non-linearly separable data effectively, making it highly suitable for real-world agricultural datasets. We’ll explore how to utilize the RBF kernel with SVM and visualize its performance using Dash (see the sketch after this list).
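
To make the kernel comparison concrete, here is a minimal, self-contained sketch. It uses a synthetic dataset from scikit-learn rather than the crop data, so names like X_demo are purely illustrative:

# Minimal sketch: comparing the three SVM kernels on a synthetic dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

for kernel in ('linear', 'poly', 'rbf'):
    model = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(f"{kernel} kernel accuracy: {model.score(X_te, y_te):.3f}")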

Decision Tree Classifier

Decision Tree Classifier is a widely used algorithm that creates a tree-like model for classification tasks. It recursively partitions the data into subsets based on different features, and each partition represents a node in the tree. Python and Dash make it convenient to build, visualize, and interpret Decision Tree models.

1. Decision Tree Overview: Before delving into the implementation details, let’s understand the basics of Decision Trees and how they work in the context of agricultural data analysis. Decision Trees offer an intuitive way to make decisions based on a series of if-else conditions.

2. Building a Decision Tree Classifier: We’ll explore how to build a Decision Tree Classifier using Python and the popular scikit-learn library (see the sketch after this list). Additionally, we’ll see how to handle hyperparameters to optimize the model’s performance for agricultural datasets.

3. Visualizing Decision Trees with Dash: Dash provides an interactive platform to visualize Decision Tree models. We’ll learn how to use Dash to create a user-friendly interface to explore and understand the decision-making process of the model.
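
As a quick illustration of the if-else structure a tree learns, here is a minimal sketch using scikit-learn’s bundled Iris dataset (not the crop data, so it is purely illustrative):

# Minimal sketch: fitting a Decision Tree and printing its learned if-else rules
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)
# export_text renders the fitted tree as nested if-else conditions
print(export_text(tree_clf, feature_names=list(iris.feature_names)))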

Random Forest Classifier

Random Forest is an ensemble learning method that combines multiple Decision Trees to achieve better predictive performance. It mitigates overfitting issues and enhances the accuracy and robustness of the model. Python and Dash can be utilized to implement and visualize Random Forest models.

1. Understanding Random Forest: To effectively implement Random Forest in agriculture, it’s essential to grasp its core concepts and advantages over individual Decision Trees. We’ll discuss how Random Forest improves the accuracy and generalization of predictions.

2. Building a Random Forest Classifier: We’ll walk through the process of building a Random Forest Classifier using Python’s scikit-learn library. This will include tuning hyperparameters and evaluating the model’s performance (see the sketch after this list).

3. Visualizing Random Forest with Dash: Dash provides an interactive environment to visualize Random Forest models, enabling users to explore the collective decision-making process of multiple trees. We’ll create a dynamic dashboard to enhance model interpretation.
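
Here is a minimal sketch of hyperparameter tuning for a Random Forest, again on the Iris dataset for illustration; the grid is deliberately tiny and should be widened for real crop data:

# Minimal sketch: tuning a Random Forest with cross-validated grid search
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(iris.data, iris.target)
print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")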

Gradient Boosting Classifier

Gradient Boosting is another ensemble learning technique that sequentially builds multiple weak learners to create a strong predictive model. It excels in handling complex relationships within data and is widely used in agriculture for tasks like yield prediction and disease detection.

1. Introduction to Gradient Boosting: Before diving into the implementation details, let’s understand the principles of Gradient Boosting and its significance in improving predictive accuracy.

2. Building a Gradient Boosting Classifier: We’ll implement Gradient Boosting using scikit-learn’s GradientBoostingClassifier, as in the script below (the XGBoost library is a popular alternative implementation of the same technique). Additionally, we’ll discuss strategies to fine-tune the model for agricultural applications (see the sketch after this list).

3. Visualizing Gradient Boosting with Dash: Dash offers an excellent platform to create dynamic visualizations of Gradient Boosting models. We’ll utilize Dash to build an interactive dashboard that provides insights into the model’s predictions.
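
To show the sequential nature of boosting, here is a minimal sketch (Iris dataset, illustrative only) that tracks test accuracy as weak learners are added one by one:

# Minimal sketch: watching Gradient Boosting improve as trees are added
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3).fit(X_tr, y_tr)
# staged_predict yields test predictions after each additional weak learner
for i, y_pred in enumerate(gb.staged_predict(X_te)):
    if (i + 1) % 25 == 0:
        print(f"After {i + 1} trees: accuracy {(y_pred == y_te).mean():.3f}")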

K-Nearest Neighbors (KNN) Classifier

K-Nearest Neighbors is a simple yet effective algorithm used for both classification and regression tasks. It makes predictions based on the majority class of its k-nearest data points. Python and Dash can be employed to implement and visualize KNN models.

1. Understanding K-Nearest Neighbors: We’ll explore the underlying principles of KNN and how it operates in the context of smart agriculture. KNN is particularly useful when dealing with spatially correlated agricultural data.

2. Building a K-Nearest Neighbors Classifier: We’ll implement KNN using Python’s scikit-learn library and discuss different distance metrics and k-value selection for optimal results (see the sketch after this list).

3. Visualizing K-Nearest Neighbors with Dash: Dash provides a visually appealing platform to visualize the KNN model’s performance. We’ll create an interactive dashboard to explore the model’s predictions and understand its decision-making process.
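
Here is a minimal sketch of comparing distance metrics and k values with cross-validation (Iris dataset, illustrative only; the main script below does a simpler train/test sweep over k from 1 to 10):

# Minimal sketch: cross-validating KNN over distance metrics and k values
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
for metric in ('euclidean', 'manhattan'):
    for k in (3, 5, 7):
        knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
        score = cross_val_score(knn, iris.data, iris.target, cv=5).mean()
        print(f"metric={metric}, k={k}: CV accuracy {score:.3f}")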

Machine Learning and Smart Agriculture using Python and Dash bring forth a new era of data-driven farming practices. By leveraging algorithms like SVM, Decision Trees, Random Forest, Gradient Boosting, and KNN, farmers can optimize crop yields, make informed decisions, and ensure sustainable agricultural practices. The user-friendly nature of Python and the interactive visualizations provided by Dash make these technologies accessible and valuable tools for modern farmers.

The Python script in this article implements a dashboard for Crop Recommendation in Smart Agriculture using Machine Learning techniques. The dashboard is built with Python and Dash, and it allows users to visualize and analyze crop data, including crop counts, correlations between features, histograms, joint plots, box plots, and the accuracy of various machine learning models.

Here’s an overview of the script’s functionality:

  1. Data Loading and Preprocessing: The script loads the crop recommendation dataset from a CSV file. If the file is not found, it creates a mock dataset with random values. The data is then preprocessed, converting the categorical ‘label’ column into numerical ‘target’ values.
  2. Machine Learning Models: The script uses several machine learning models to predict crop recommendations. The models used are:
    • Support Vector Machines (SVM) with linear, polynomial, and radial basis function (RBF) kernels.
    • Decision Tree Classifier.
    • Random Forest Classifier.
    • Gradient Boosting Classifier.
    • K-Nearest Neighbors (KNN) Classifier.
  3. Model Evaluation: The script evaluates the performance of each machine learning model by calculating accuracy scores and generating detailed classification reports.
  4. Data Visualization: The script uses Dash to create an interactive web-based dashboard with various plots and visualizations to explore the crop data and machine learning model results. The dashboard includes:
    • Summary table of basic statistics for the dataset.
    • Heatmap of feature correlations.
    • Scatter plots of crop data with different feature combinations.
    • Histograms for temperature and pH.
    • Joint plots to show relationships between rainfall and humidity, and between K and N.
    • Box plots to compare pH values across different crops.
    • Line plot to analyze the relationship between K and rainfall with humidity less than 65.
    • Accuracy plot for the KNN model with different K values.
  5. Mock Trends: If the dataset CSV file is not found, the script generates mock trends for each crop by modifying the data based on specific conditions.
  6. Running the Dashboard: The script creates a Dash app and sets up the layout with various components and plots. The app is then launched and can be accessed using a web browser.

To run this script, ensure you have all the required libraries installed (Dash, pandas, NumPy, scikit-learn, and Plotly) and that the dataset CSV file is correctly located at the specified file path. Then execute the script, and the Crop Recommendation Dashboard will be available for exploration through the local server that Dash starts.
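
If any of these libraries are missing, they can typically be installed from PyPI in one step (note that scikit-learn is imported as sklearn but installed under the name scikit-learn):

pip install dash pandas numpy scikit-learn plotly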

Python Code: Available on GitHub Repository

# Import the required libraries
import dash
from dash import dcc, html, dash_table
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
import plotly.express as px
import plotly.graph_objs as go
import colorsys

file = 'C:/Downloads/Crop_recommendation.csv'

# Load the dataset or create mock data if file not found
try:
    df = pd.read_csv(file)
except FileNotFoundError:
    df = pd.DataFrame({
        'N': np.random.randint(0, 140, 1000),
        'P': np.random.randint(0, 145, 1000),
        'K': np.random.randint(0, 205, 1000),
        'temperature': np.random.uniform(10, 40, 1000),
        'humidity': np.random.uniform(30, 100, 1000),
        'ph': np.random.uniform(4, 9, 1000),
        'rainfall': np.random.uniform(20, 300, 1000),
        'label': np.random.choice(['rice', 'wheat', 'maize', 'potato', 'cotton', 'sugar cane'], 1000)
    })

# Data Preprocessing
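# Convert the categorical 'label' column to numeric codes; 'targets' maps each
# code back to its crop name, and MinMaxScaler rescales every feature to [0, 1].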
c = df.label.astype('category')
targets = dict(enumerate(c.cat.categories))
df['target'] = c.cat.codes
y = df.target
X = df[['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=1)

# Adding SVM
svc_linear = SVC(kernel='linear').fit(X_train, y_train)
svc_poly = SVC(kernel='poly').fit(X_train, y_train)
svc_rbf = SVC(kernel='rbf').fit(X_train, y_train)
# Adding Decision Tree Classifier
dt_clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Adding Random Forest
rf_clf = RandomForestClassifier(max_depth=4, n_estimators=100, random_state=42).fit(X_train, y_train)
# Adding Gradient Boost
grad_clf = GradientBoostingClassifier().fit(X_train, y_train)

# Get a detailed classification report for each model.
# target_names must follow the category-code order (targets.values());
# df['label'].unique() would list crops in order of appearance instead.
class_names = list(targets.values())
linear_report = classification_report(y_test, svc_linear.predict(X_test), target_names=class_names)
poly_report = classification_report(y_test, svc_poly.predict(X_test), target_names=class_names)
rbf_report = classification_report(y_test, svc_rbf.predict(X_test), target_names=class_names)
dt_report = classification_report(y_test, dt_clf.predict(X_test), target_names=class_names)
rf_report = classification_report(y_test, rf_clf.predict(X_test), target_names=class_names)
grad_report = classification_report(y_test, grad_clf.predict(X_test), target_names=class_names)


k_range = range(1, 11)
scores = []
# Adding KNN
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)  # fit on the scaled training features
    scores.append(knn.score(X_test, y_test))

# Visualize the results
def generate_crop_colors(n):
    """Generate n visually distinct hex colors by spacing hues evenly around the HSV color wheel."""
    hsv_colors = [(i/n, 0.9, 0.9) for i in range(n)]
    rgb_colors = [tuple(int(255 * c) for c in colorsys.hsv_to_rgb(*color)) for color in hsv_colors]
    hex_colors = [f'#{color[0]:02x}{color[1]:02x}{color[2]:02x}' for color in rgb_colors]
    return hex_colors

# Generate the colors
num_crops = len(df['label'].unique())
crop_colors = generate_crop_colors(num_crops)

# Create a DataFrame for crop colors
crop_colors_df = pd.DataFrame({'label': df['label'].unique(), 'color': crop_colors})

# Create a dictionary to map each unique crop label to a specific color
crop_label_to_color = dict(zip(crop_colors_df['label'], crop_colors_df['color']))

# Add the 'target' column to the DataFrame
df['target'] = df['label'].map({crop: i for i, crop in enumerate(targets.values())})



# Mock Trends for each crop if csv not found
try:
    df = pd.read_csv(file)
except FileNotFoundError:
    # Iterate over crop names (not hex colors) so the branches below can match
    for crop in df['label'].unique():
        if crop == 'rice':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 40
            df.loc[df['label'] == crop, 'P'] += trend_value * 30
            df.loc[df['label'] == crop, 'K'] += trend_value * 20
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 3
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 5
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.5
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 70

        elif crop == 'wheat':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 35
            df.loc[df['label'] == crop, 'P'] += trend_value * 28
            df.loc[df['label'] == crop, 'K'] += trend_value * 22
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 4
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 4
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.3
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 80
        
        elif crop == 'maize':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 50
            df.loc[df['label'] == crop, 'P'] += trend_value * 35
            df.loc[df['label'] == crop, 'K'] += trend_value * 25
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 6
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 6
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.8
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 90

        elif crop == 'potato':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 45
            df.loc[df['label'] == crop, 'P'] += trend_value * 40
            df.loc[df['label'] == crop, 'K'] += trend_value * 35
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 2
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 8
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.2
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 60

        elif crop == 'cotton':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 55
            df.loc[df['label'] == crop, 'P'] += trend_value * 25
            df.loc[df['label'] == crop, 'K'] += trend_value * 15
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 8
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 2
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.7
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 110

        elif crop == 'sugar cane':
            trend_value = np.linspace(0, 1, len(df[df['label'] == crop]))
            df.loc[df['label'] == crop, 'N'] += trend_value * 60
            df.loc[df['label'] == crop, 'P'] += trend_value * 30
            df.loc[df['label'] == crop, 'K'] += trend_value * 18
            df.loc[df['label'] == crop, 'temperature'] += trend_value * 5
            df.loc[df['label'] == crop, 'humidity'] -= trend_value * 3
            df.loc[df['label'] == crop, 'ph'] -= trend_value * 0.5
            df.loc[df['label'] == crop, 'rainfall'] += trend_value * 120

# create app
app = dash.Dash(__name__)

# create layout
app.layout = html.Div(children=[
    html.H1("Crop Recommendation Dashboard", style={'textAlign': 'center'}),
    dash_table.DataTable(
        id='summary-table',
        columns=[
            {'name': 'Statistics', 'id': 'index'},
            * [{'name': col, 'id': col} for col in df.describe().columns]
        ],
        data=round(df.describe(), 2).reset_index().to_dict('records'),  # Round the values to 2 decimal places and include row headers
        style_table={'width': '50%', 'margin': '20px auto'}
    ),
    html.H3("Linear Kernel Accuracy:"),
    dcc.Markdown(f"```\n{linear_report}\n```"),

    html.H3("Poly Kernel Accuracy:"),
    dcc.Markdown(f"```\n{poly_report}\n```"),

    html.H3("RBF Kernel Accuracy:"),
    dcc.Markdown(f"```\n{rbf_report}\n```"),

    html.H3("Decision Tree Accuracy:"),
    dcc.Markdown(f"```\n{dt_report}\n```"),

    html.H3("Random Forest Accuracy:"),
    dcc.Markdown(f"```\n{rf_report}\n```"),

    html.H3("Gradient Boosting Accuracy:"),
    dcc.Markdown(f"```\n{grad_report}\n```"),
    dcc.Graph(
        id='heatmap',
        figure={
            'data': [go.Heatmap(z=X.corr(), x=X.columns, y=X.columns, colorscale='Viridis')],
            'layout': {'title': 'Correlation Heatmap'}
        }
    ),

    dcc.Graph(
        id='scatter-plot',
        figure={
            'data': [
                go.Scatter(
                    x=df['K'],
                    y=df['N'],
                    mode='markers',
                    text=df['label'],
                    marker=dict(color=[crop_label_to_color[crop] for crop in df['label']])
                )
            ],
            'layout': {'title': 'Scatter Plot (K vs N)'}
        }
    ),

    dcc.Graph(
        id='count-plot',
        figure={
            'data': [
                {
                    'x': df['label'].value_counts().index,
                    'y': df['label'].value_counts().values,
                    'type': 'bar',
                    'marker': {'color': [crop_label_to_color[crop] for crop in df['label'].value_counts().index]}
                }
            ],
            'layout': {'title': 'Crop Counts'}
        }
    ),

    dcc.Graph(
        id='pair-plot',
        figure=px.scatter_matrix(df, dimensions=['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall'], color='label', color_discrete_map=crop_label_to_color).update_layout(
            title='Pair Plot',
            xaxis=dict(title='Features'),
            yaxis=dict(title='Features')
        )
    ),

    dcc.Graph(
        id='hist-temperature',
        figure=go.Figure(data=go.Histogram(x=df['temperature'], marker=dict(color="purple"), nbinsx=15, opacity=0.2)).update_layout(
            title='Temperature Histogram',
            xaxis=dict(title='Temperature'),
            yaxis=dict(title='Count')
        )
    ),

    dcc.Graph(
        id='hist-ph',
        figure=go.Figure(data=go.Histogram(x=df['ph'], marker=dict(color="green"), nbinsx=15, opacity=0.2)).update_layout(
            title='pH Histogram',
            xaxis=dict(title='pH'),
            yaxis=dict(title='Count')
        )
    ),

    dcc.Graph(
        id='joint-rainfall-humidity',
        figure=go.Figure(data=go.Scatter(
            x=df[(df['temperature'] < 30) & (df['rainfall'] > 120)]['rainfall'],
            y=df[(df['temperature'] < 30) & (df['rainfall'] > 120)]['humidity'],
            mode='markers',
            text=df[(df['temperature'] < 30) & (df['rainfall'] > 120)]['label'],
            marker=dict(color=[crop_label_to_color[crop] for crop in df[(df['temperature'] < 30) & (df['rainfall'] > 120)]['label']])
        )).update_layout(
            title='Joint Plot of Rainfall vs Humidity',
            xaxis=dict(title='Rainfall'),
            yaxis=dict(title='Humidity'),
        )
    ),

    dcc.Graph(
        id='joint-k-n',
        figure=go.Figure(data=go.Scatter(
            x=df[(df['N'] > 40) & (df['K'] > 40)]['K'],
            y=df[(df['N'] > 40) & (df['K'] > 40)]['N'],
            mode='markers',
            text=df[(df['N'] > 40) & (df['K'] > 40)]['label'],
            marker=dict(color=[crop_label_to_color[crop] for crop in df[(df['N'] > 40) & (df['K'] > 40)]['label']])
        )).update_layout(
            title='Joint Plot of K vs N',
            xaxis=dict(title='K'),
            yaxis=dict(title='N')
        )
    ),

    dcc.Graph(
        id='joint-k-humidity',
        figure=go.Figure(data=go.Scatter(
            x=df['K'],
            y=df['humidity'],
            mode='markers',
            text=df['label'],
            marker=dict(color=[crop_label_to_color[crop] for crop in df['label']])
        )).update_layout(
            title='Joint Plot of K vs Humidity',
            xaxis=dict(title='K'),
            yaxis=dict(title='Humidity')
        )
    ),

    dcc.Graph(
        id='box-ph-crop',
        figure=go.Figure(data=[go.Box(
            y=df[df['label'] == crop]['ph'],
            name=crop,
            marker=dict(color=crop_label_to_color[crop])
        ) for crop in df['label'].unique()]).update_layout(
            title='Box Plot of pH vs Crop',
            xaxis=dict(title='Crop'),
            yaxis=dict(title='pH')
        )
    ),

    dcc.Graph(
        id='box-p-rainfall',
        figure=go.Figure(data=[
            go.Box(
                y=df[df['label'] == crop]['P'],
                name=crop,
                marker=dict(color=crop_label_to_color[crop])
            ) for crop in df['label'].unique() if df[df['label'] == crop]['rainfall'].mean() > 150
        ]).update_layout(
            title='Box Plot of P by Crop (crops with mean rainfall > 150)',
            xaxis=dict(title='Crop'),
            yaxis=dict(title='P')
        )
    ),

    dcc.Graph(
        id='line-rainfall-k-humidity',
        figure=go.Figure(data=go.Scatter(
            x=df[(df['humidity'] < 65)]['K'],
            y=df[(df['humidity'] < 65)]['rainfall'],
            mode='lines+markers',
            text=df[(df['humidity'] < 65)]['label'],
            marker=dict(color=[crop_label_to_color[crop] for crop in df[(df['humidity'] < 65)]['label']])
        )).update_layout(
            title='Line Plot of Rainfall vs K with Humidity < 65',
            xaxis=dict(title='K'),
            yaxis=dict(title='Rainfall')
        )
    ),

    dcc.Graph(
        id='knn-accuracy-plot',
        figure=go.Figure(data=go.Scatter(x=list(k_range), y=scores, mode='markers+lines', marker=dict(color='blue'))).update_layout(
            title='KNN Accuracy Plot',
            xaxis=dict(title='K'),
            yaxis=dict(title='Accuracy')
        )
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)

FAQs

Q: How can Machine Learning benefit smart agriculture using Python and Dash?

Machine Learning offers numerous benefits to smart agriculture, including crop yield prediction, disease detection, resource optimization, and automated decision-making. Python and Dash provide powerful tools to implement and visualize Machine Learning models efficiently.

Q: What are the key advantages of using SVM with RBF kernels in agriculture?

SVM with RBF kernels can handle complex and non-linearly separable data, making it highly effective for agricultural tasks that involve intricate relationships between variables. It excels in crop classification and precision farming applications.

Q: How can Decision Trees improve farm management in smart agriculture?

Decision Trees provide transparent and interpretable models that can aid in making informed decisions related to crop management, pest control, and resource allocation. Their visual representation through Dash enhances their usability for farmers and agronomists.

Q: What makes Random Forest a popular choice for smart agriculture applications?

Random Forest’s ability to mitigate overfitting, handle large datasets, and provide robust predictions makes it a popular choice in agriculture. It is particularly useful when dealing with diverse and complex agricultural data.

Q: How does Gradient Boosting enhance the accuracy of predictive models in agriculture?

Gradient Boosting sequentially improves model performance by correcting errors made by previous weak learners. In agriculture, this leads to enhanced accuracy in yield prediction, disease identification, and crop monitoring.

Q: What role does K-Nearest Neighbors play in precision agriculture?

K-Nearest Neighbors is valuable in precision agriculture for tasks such as soil mapping, disease detection, and variable rate application of resources. Its simplicity and efficiency make it an attractive choice for on-farm decision support systems.
